Drop-in for OpenAI. Half the price.
Same SDK. Just change base_url and the model name. Pay half.
| Feature | OpenAI text-embedding-3-small | Cohere embed-multilingual-v3 | NexToken nex-embed-zh |
|---|---|---|---|
| Price (per 1M input tokens) | $0.020 | $0.100 | $0.010 (−50% / −90%) |
| Output dimensions | 1,536 | 1,024 | 1,024 |
| Max input tokens | 8,191 | 512 | 512 |
| Strong on Chinese | — | Generic | ✓ BGE-large-zh, trained for CN |
| OpenAI SDK drop-in | — | ✗ | ✓ Just change base_url |
| APAC compliance hosting | ✗ (US) | Multi-region (premium) | ✓ Singapore default |
| Wallet-based prepaid billing | ✗ | ✗ | ✓ No surprise overages |
| Free tier | $5 in trial credit | Card required | ✓ $5 free, no card |
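To make the price row concrete, here is a minimal sketch of the arithmetic behind the "−50% / −90%" figures. The prices are copied from the table; the helper names (`embedding_cost_usd`, `savings_percent`) and the 500M-token monthly workload are hypothetical, chosen only for illustration.

```python
# List prices per 1M input tokens, copied from the comparison table above.
PRICE_PER_MILLION_USD = {
    "text-embedding-3-small": 0.020,
    "embed-multilingual-v3": 0.100,
    "nex-embed-zh": 0.010,
}


def embedding_cost_usd(model: str, input_tokens: int) -> float:
    """Return the list-price cost in USD of embedding `input_tokens` tokens."""
    return PRICE_PER_MILLION_USD[model] * input_tokens / 1_000_000


def savings_percent(baseline: str, candidate: str) -> float:
    """Return how much cheaper `candidate` is than `baseline`, in percent."""
    base = PRICE_PER_MILLION_USD[baseline]
    return (base - PRICE_PER_MILLION_USD[candidate]) / base * 100


if __name__ == "__main__":
    monthly_tokens = 500_000_000  # hypothetical workload, not a measured figure
    for model in PRICE_PER_MILLION_USD:
        cost = embedding_cost_usd(model, monthly_tokens)
        print(f"{model}: ${cost:.2f} per month")
```

Only list prices are compared here; free credits and any volume discounts are left out.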
Three lines of code, half the price.
If you're already using the OpenAI SDK for embeddings, switching is a small change: use your NexToken API key, point base_url at NexToken, and set the model to nex-embed-zh.
- Real OpenAI-compatible response — works with the official Python and Node SDKs unchanged
- 1024-dim float vectors served from a Singapore GPU; ~40 ms P50 latency end-to-end
- Per-call cost shown in the response so you always know what you're spending
    from openai import OpenAI

    client = OpenAI(
        api_key="NEX_...",
        base_url="https://api.nextoken.biz/v1",
    )

    # Self-hosted Chinese-strong embedding
    # BGE-large-zh-v1.5, 1024 dim, hosted in Singapore
    resp = client.embeddings.create(
        model="nex-embed-zh",
        input=["hello", "你好", "こんにちは"],
    )

    print(resp.data[0].embedding[:4])
    print(f"Cost: ${resp.nex.cost_usd}")
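Once the vectors come back, a common next step is comparing them. The sketch below computes cosine similarity using only the standard library. `cosine_similarity` is a hypothetical helper, not part of the NexToken or OpenAI SDK, and the short toy vectors stand in for real 1024-dim embeddings.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Toy vectors for illustration; real embeddings would have 1024 entries.
    query = [1.0, 2.0, 3.0]
    document = [2.0, 4.0, 6.0]
    print(cosine_similarity(query, document))
```

With real responses you would pass `resp.data[i].embedding` for each vector.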
Today: nex-embed-zh. Soon: more.
We're scaling under an AWS GPU quota constraint in Singapore. The first model ships today; the multilingual and long-context models land as soon as our second GPU is approved.
nex-embed-zh
512 token max input
Strong on Chinese + English
~50% cheaper than text-embedding-3-small
nex-embed-multilingual
8,192 token context
100+ languages
Dense + sparse + multi-vector
nex-embed-large
32,768 token context
SOTA on Chinese benchmarks
Pairs with nex-rerank
Self-hosted. Singapore-private. Transparent.
Every call is served on our Singapore GPU. Nothing fans out to OpenAI behind the scenes. The response.nex object tells you exactly what happened and what it cost.
You POST /v1/embeddings
OpenAI-compatible request body. nex-embed-zh is the current model id.
Singapore GPU serves
Private VPC; your data never leaves the region. P50 ~40ms end-to-end through the public API.
You see what you paid
Response includes nex.cost_usd, nex.latency_ms, nex.request_id. No surprise invoices.
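The three steps above can be sketched without the SDK, using only the standard library. The base URL, model id, and `nex` field names come from this page. The `Authorization: Bearer` header is an assumption based on the OpenAI convention. The helper names and every value in `SAMPLE_RESPONSE` are made up for illustration and are not real output.

```python
import json
import urllib.request

BASE_URL = "https://api.nextoken.biz/v1"  # base_url from the SDK example on this page


def build_embeddings_request(api_key: str, texts: list) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style POST /v1/embeddings request."""
    body = json.dumps({"model": "nex-embed-zh", "input": texts}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/embeddings",
        data=body,
        headers={
            # Assumption: Bearer auth, as in the OpenAI API.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def read_nex_metadata(payload: dict) -> dict:
    """Pull the per-call metadata out of a decoded response body.

    Missing fields come back as None instead of raising.
    """
    nex = payload.get("nex") or {}
    return {
        "cost_usd": nex.get("cost_usd"),
        "latency_ms": nex.get("latency_ms"),
        "request_id": nex.get("request_id"),
    }


# Illustrative payload only: the values below are invented, not a real response.
SAMPLE_RESPONSE = {
    "object": "list",
    "model": "nex-embed-zh",
    "data": [{"object": "embedding", "index": 0, "embedding": [0.1, 0.2]}],
    "nex": {"cost_usd": 4e-08, "latency_ms": 40, "request_id": "req_example"},
}

if __name__ == "__main__":
    req = build_embeddings_request("NEX_...", ["hello", "你好"])
    print(req.get_method(), req.full_url)
    # Sending needs a valid key and network access:
    # with urllib.request.urlopen(req) as resp:
    #     payload = json.load(resp)
    print(read_nex_metadata(SAMPLE_RESPONSE))
```

Reading the metadata defensively keeps the same code working against other OpenAI-compatible endpoints that return no `nex` object.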
Ready to halve your embedding bill?
Free $5 credit. No card. OpenAI SDK works as-is.