Multilingual translation pipeline

EN ↔ ZH ↔ JA ↔ KO with a brand glossary, streaming output, and a built-in quality score — using only nex-pro. ~$0.0001 per typical paragraph.

⏱ 10 minnex-prostreamingtool calling

The two non-obvious things

Most translation pipelines fail on two boring things:

Brand terms get translated. "NexToken" becomes "下一令牌". Bad.
Quality is invisible. The model returns something — is it good? Same model can self-score on a 1-10 rubric.

This recipe solves both with a glossary tool + a single eval pass.

Step 1 · Define glossary + system prompt

GLOSSARY = {
    "NexToken":  {"zh": "NexToken", "ja": "NexToken", "ko": "NexToken"},
    "nex-pro":   {"zh": "nex-pro",  "ja": "nex-pro",  "ko": "nex-pro"},
    "Singapore": {"zh": "新加坡",   "ja": "シンガポール", "ko": "싱가포르"},
    "wallet":    {"zh": "钱包",     "ja": "ウォレット",   "ko": "지갑"},
}

def system_prompt(target_lang, glossary):
    pairs = "\n".join(f"  {src} → {dst[target_lang]}" for src, dst in glossary.items())
    return (
        f"You are a professional translator. Translate the user's text into {target_lang}.\n"
        f"Preserve product/brand names exactly. Glossary:\n{pairs}\n"
        "Output ONLY the translation, no preamble, no quotes."
    )

Step 2 · Translate with streaming

from openai import OpenAI
import os

client = OpenAI(api_key=os.environ["NEX_API_KEY"], base_url="https://api.nextoken.biz/v1")

def translate(text, target="zh"):
    out = []
    stream = client.chat.completions.create(
        model="nex-pro",
        messages=[
            {"role": "system", "content": system_prompt(target, GLOSSARY)},
            {"role": "user",   "content": text},
        ],
        temperature=0.2,
        stream=True,
    )
    for chunk in stream:
        delta = chunk.choices[0].delta.content or ""
        print(delta, end="", flush=True)   # live UX
        out.append(delta)
    print()
    return "".join(out)

translated = translate("NexToken is hosted in Singapore. Top up your wallet.", target="zh")
# Output: NexToken 部署在新加坡。请充值你的钱包。

Step 3 · Quality-score every translation

A separate call scores 0-10 on five rubrics. Cheap because nex-pro charges per token, not per call, and the eval prompt is small.

def score(source, translation, target_lang):
    r = client.chat.completions.create(
        model="nex-pro",
        messages=[{
            "role": "user",
            "content": (
                f"Score this {target_lang} translation on a 0–10 scale for each rubric. "
                f"Return ONLY JSON like {{\"accuracy\":N,\"fluency\":N,\"brand_safety\":N,\"tone\":N,\"completeness\":N}}.\n\n"
                f"Source: {source}\n\nTranslation: {translation}"
            ),
        }],
        temperature=0,
        response_format={"type": "json_object"},
    )
    return r.choices[0].message.content

print(score("Top up your wallet.", "请充值你的钱包。", "zh"))
# {"accuracy":10, "fluency":10, "brand_safety":10, "tone":9, "completeness":10}

Why score with the same model? Cheaper than a different judge and good enough for regression-style monitoring. If you need a stronger judge (e.g. for releases or A/B testing), swap model="nex-pro" to model="nex-reasoning" — 6× cost but significantly better at catching subtle errors.

Step 4 · Bulk pipeline (CSV in, CSV out)

import csv

with open("source.csv") as inp, open("translated.csv", "w", newline="") as out:
    reader = csv.DictReader(inp)
    writer = csv.DictWriter(out, fieldnames=["id", "source", "zh", "ja", "ko", "score_zh"])
    writer.writeheader()
    for row in reader:
        s = row["text"]
        zh = translate(s, "zh")
        ja = translate(s, "ja")
        ko = translate(s, "ko")
        sc = score(s, zh, "zh")
        writer.writerow({"id": row["id"], "source": s, "zh": zh, "ja": ja, "ko": ko, "score_zh": sc})

For 1,000 short strings (≈ 50 tokens each) across 3 target languages + score: about $0.30 total. Same job on gpt-4o: ~$6.

Production extras

Use /v1/batches for jobs > 10K rows — gets you another 30% off and runs async (see recipe 04)
Cache translations by content hash. The semantic cache hit fee is 5% of retail.
Quality threshold: re-translate anything scoring < 7 on accuracy with nex-reasoning

← Chinese RAG Next: Batch classification →