Multilingual translation pipeline
EN ↔ ZH ↔ JA ↔ KO with a brand glossary, streaming output, and a built-in quality
score — using only nex-pro. ~$0.0001 per typical paragraph.
The two non-obvious things
Most translation pipelines fail on two boring things:
- Brand terms get translated. "NexToken" becomes "下一令牌". Bad.
- Quality is invisible. The model returns something, but is it good? The same model can self-score each translation on a 0-10 scale across a small rubric.
This recipe solves both with a glossary injected into the system prompt and a single eval pass.
Step 1 · Define glossary + system prompt
# Source term -> required rendering per target language.
# Brand and product names map to themselves so they are never translated.
GLOSSARY = {
    "NexToken": {"zh": "NexToken", "ja": "NexToken", "ko": "NexToken"},
    "nex-pro": {"zh": "nex-pro", "ja": "nex-pro", "ko": "nex-pro"},
    "Singapore": {"zh": "新加坡", "ja": "シンガポール", "ko": "싱가포르"},
    "wallet": {"zh": "钱包", "ja": "ウォレット", "ko": "지갑"},
}

def system_prompt(target_lang, glossary):
    # One "source → target" line per glossary entry for the requested language.
    pairs = "\n".join(f" {src} → {dst[target_lang]}" for src, dst in glossary.items())
    return (
        f"You are a professional translator. Translate the user's text into {target_lang}.\n"
        f"Preserve product/brand names exactly. Glossary:\n{pairs}\n"
        "Output ONLY the translation, no preamble, no quotes."
    )
Step 2 · Translate with streaming
import os

from openai import OpenAI

# NEX_API_KEY must be set in the environment.
client = OpenAI(api_key=os.environ["NEX_API_KEY"], base_url="https://api.nextoken.biz/v1")

def translate(text, target="zh"):
    out = []
    stream = client.chat.completions.create(
        model="nex-pro",
        messages=[
            {"role": "system", "content": system_prompt(target, GLOSSARY)},
            {"role": "user", "content": text},
        ],
        temperature=0.2,
        stream=True,
    )
    for chunk in stream:
        if not chunk.choices:  # skip chunks that carry no choices
            continue
        delta = chunk.choices[0].delta.content or ""
        print(delta, end="", flush=True)  # live UX
        out.append(delta)
    print()
    return "".join(out)

translated = translate("NexToken is hosted in Singapore. Top up your wallet.", target="zh")
# Example output: NexToken 部署在新加坡。请充值你的钱包。
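Putting the glossary in the system prompt does not guarantee the model follows it. A minimal post-check, sketched below, looks for each glossary source term in the input and confirms the expected rendering appears in the output. The helper name `glossary_violations` and the case-insensitive substring match are assumptions for illustration, not part of the recipe; substring matching can miss inflected or spaced variants.

```python
# Hypothetical helper, not part of the original recipe.
GLOSSARY = {
    "NexToken": {"zh": "NexToken", "ja": "NexToken", "ko": "NexToken"},
    "Singapore": {"zh": "新加坡", "ja": "シンガポール", "ko": "싱가포르"},
    "wallet": {"zh": "钱包", "ja": "ウォレット", "ko": "지갑"},
}

def glossary_violations(source, translation, target_lang, glossary=GLOSSARY):
    """Return glossary terms that occur in `source` but whose expected
    rendering for `target_lang` is missing from `translation`."""
    missing = []
    src_lower = source.lower()
    out_lower = translation.lower()
    for term, renderings in glossary.items():
        expected = renderings[target_lang]
        if term.lower() in src_lower and expected.lower() not in out_lower:
            missing.append(term)
    return missing

# A translation that respects the glossary.
ok = glossary_violations(
    "NexToken is hosted in Singapore. Top up your wallet.",
    "NexToken 部署在新加坡。请充值你的钱包。",
    "zh",
)
# A translation that rendered the brand name literally.
bad = glossary_violations("NexToken is fast.", "下一令牌很快。", "zh")
print(ok, bad)
```

A non-empty result is a cheap signal to retry or flag the row before spending tokens on the scoring call.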
Step 3 · Quality-score every translation
A separate call scores 0-10 on five rubrics. Cheap because nex-pro charges per token, not per call, and the eval prompt is small.
def score(source, translation, target_lang):
    r = client.chat.completions.create(
        model="nex-pro",
        messages=[{
            "role": "user",
            "content": (
                f"Score this {target_lang} translation on a 0–10 scale for each rubric. "
                f"Return ONLY JSON like {{\"accuracy\":N,\"fluency\":N,\"brand_safety\":N,\"tone\":N,\"completeness\":N}}.\n\n"
                f"Source: {source}\n\nTranslation: {translation}"
            ),
        }],
        temperature=0,
        response_format={"type": "json_object"},
    )
    # Returns the raw JSON string; parse it with json.loads() before comparing scores.
    return r.choices[0].message.content

print(score("Top up your wallet.", "请充值你的钱包。", "zh"))
# Example output: {"accuracy":10, "fluency":10, "brand_safety":10, "tone":9, "completeness":10}
Why score with the same model? It is cheaper than a separate judge and good enough
for regression-style monitoring. If you need a stronger judge (e.g. for releases or A/B testing),
swap model="nex-pro" for model="nex-reasoning": 6× the cost, but
significantly better at catching subtle errors.
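The scoring call returns a raw JSON string, so anything downstream has to parse and validate it first. A sketch of that step, assuming the five rubric keys named in the scoring prompt; `parse_scores` is a hypothetical name, and treating a malformed or incomplete reply as `None` is only one possible policy.

```python
import json

RUBRICS = ("accuracy", "fluency", "brand_safety", "tone", "completeness")

def parse_scores(raw):
    """Parse the judge's reply into {rubric: float}, or None if unusable."""
    try:
        data = json.loads(raw)
    except (TypeError, ValueError):  # JSONDecodeError is a ValueError
        return None
    if not isinstance(data, dict):
        return None
    scores = {}
    for key in RUBRICS:
        value = data.get(key)
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return None
        if not 0 <= value <= 10:
            return None
        scores[key] = float(value)
    return scores

good = parse_scores(
    '{"accuracy":10,"fluency":10,"brand_safety":10,"tone":9,"completeness":10}'
)
broken = parse_scores("Sure! Here is the score: 9/10")
print(good, broken)
```

Returning `None` rather than raising lets a bulk job log the row and move on instead of aborting mid-file.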
Step 4 · Bulk pipeline (CSV in, CSV out)
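import csv

# Expects source.csv to have "id" and "text" columns.
# newline="" is what the csv module requires; utf-8 keeps CJK output intact on every platform.
with open("source.csv", newline="", encoding="utf-8") as inp, \
     open("translated.csv", "w", newline="", encoding="utf-8") as out:
    reader = csv.DictReader(inp)
    writer = csv.DictWriter(out, fieldnames=["id", "source", "zh", "ja", "ko", "score_zh"])
    writer.writeheader()
    for row in reader:
        s = row["text"]
        zh = translate(s, "zh")
        ja = translate(s, "ja")
        ko = translate(s, "ko")
        sc = score(s, zh, "zh")  # only the zh translation is scored here
        writer.writerow({"id": row["id"], "source": s, "zh": zh, "ja": ja, "ko": ko, "score_zh": sc})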
For 1,000 short strings (≈ 50 tokens each) across 3 target languages + score: about $0.30 total.
Same job on gpt-4o: ~$6.
Production extras
- Use /v1/batches for jobs > 10K rows — gets you another 30% off and runs async (see recipe 04)
- Cache translations by content hash. The semantic cache hit fee is 5% of retail.
- Quality threshold: re-translate anything scoring < 7 on accuracy with nex-reasoning.
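The caching and threshold bullets can be combined in one wrapper. What follows is a sketch under stated assumptions: the cache is a local exact-match dictionary keyed by content hash (separate from any provider-side semantic cache), `translate_fn` is assumed to accept a `model` argument (the `translate` function in Step 2 does not), and `translate_checked` and `cache_key` are hypothetical names. Stub functions stand in for the API calls so the flow can be followed without a network.

```python
import hashlib
import json

_cache = {}  # in-memory stand-in; a real deployment might use Redis or SQLite

def cache_key(text, target_lang, model):
    """Stable key from the content hash plus language and model."""
    payload = f"{model}\x00{target_lang}\x00{text}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def translate_checked(text, target_lang, translate_fn, score_fn, threshold=7):
    """Translate with caching; retry once on the stronger model if accuracy is low.

    translate_fn(text, target_lang, model) -> str            (assumed signature)
    score_fn(text, translation, target_lang) -> JSON string  (as in Step 3)
    """
    key = cache_key(text, target_lang, "nex-pro")
    if key in _cache:
        return _cache[key]
    result = translate_fn(text, target_lang, "nex-pro")
    accuracy = json.loads(score_fn(text, result, target_lang)).get("accuracy", 0)
    if accuracy < threshold:
        result = translate_fn(text, target_lang, "nex-reasoning")
    _cache[key] = result
    return result

# Stubs for illustration only: the first model scores low, the second scores high.
calls = []

def fake_translate(text, target_lang, model):
    calls.append(model)
    return f"[{model}:{target_lang}] {text}"

def fake_score(text, translation, target_lang):
    return '{"accuracy": 5}' if "nex-pro" in translation else '{"accuracy": 9}'

first = translate_checked("Top up your wallet.", "zh", fake_translate, fake_score)
second = translate_checked("Top up your wallet.", "zh", fake_translate, fake_score)
print(first, calls)
```

The second call is served from the cache, so only two model calls are made in total. Whether to re-score the retried translation before caching it is a design choice this sketch leaves out.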