★ NEW · NexToken Embeddings (Public Beta)

Chinese-strong embeddings, ~50% cheaper than OpenAI text-embedding-3-small.

Name: NexToken Embeddings
Brand: NexToken
Price: 0.01 USD per 1M input tokens

Self-hosted BGE-M3 behind one OpenAI-compatible endpoint — 1024-dim, 8K context, multilingual. Switch your base_url, save half on every embedding call. Hosted in Singapore for APAC compliance — no data leaves the region.


−50% vs OpenAI
1024 dimensions
~40ms P50 latency
APAC hosted

Drop-in for OpenAI. Half the price.

Same SDK. Just change base_url and the model name. Pay half.

Feature	OpenAI text-embedding-3-small	Cohere embed-multilingual-v3	NexToken nex-embed-zh
Price (per 1M input tokens)	$0.020	$0.100	$0.010 (−50% / −90%)
Output dimensions	1,536	1,024	1,024
Max input tokens	8,191	512	8,192
Strong on Chinese	—	Generic	✓ BGE-M3, SOTA on C-MTEB
OpenAI SDK drop-in	—	✗	✓ Just change base_url
APAC compliance hosting	✗ (US)	Multi-region (premium)	✓ Singapore default
Wallet-based prepaid billing	✗	✗	✓ No surprise overages
Free tier	$5 in trial credit	Card required	✓ $5 free, no card
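To make the price row concrete, here is a minimal sketch of the arithmetic behind the −50% and −90% figures. The per-1M-token prices are copied from the table above; the function names and the monthly token volume are illustrative assumptions, not part of any NexToken SDK.

```python
# Prices in USD per 1M input tokens, copied from the comparison table above.
PRICE_PER_MILLION = {
    "text-embedding-3-small": 0.020,
    "embed-multilingual-v3": 0.100,
    "nex-embed-zh": 0.010,
}


def embedding_cost_usd(model: str, input_tokens: int) -> float:
    """Return the list-price cost of embedding `input_tokens` tokens."""
    return PRICE_PER_MILLION[model] * input_tokens / 1_000_000


def savings_fraction(baseline: str, candidate: str) -> float:
    """Fraction saved by moving from `baseline` to `candidate` (0.5 means 50%)."""
    return 1 - PRICE_PER_MILLION[candidate] / PRICE_PER_MILLION[baseline]


if __name__ == "__main__":
    monthly_tokens = 500_000_000  # hypothetical workload, for illustration only
    for model in PRICE_PER_MILLION:
        print(f"{model}: ${embedding_cost_usd(model, monthly_tokens):.2f}")
```

At list prices the saving is a fixed fraction, so it does not depend on volume; only the absolute dollar amount scales with the number of tokens you embed.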

Three lines of code, half the price.

If you're already using the OpenAI SDK for embeddings, switching takes three small edits: swap in your NexToken API key, point base_url at NexToken, and set the model to nex-embed-zh.

Real OpenAI-compatible response — works with the official Python and Node SDKs unchanged
1024-dim float vectors served from a Singapore GPU; ~40ms P50 latency end-to-end
Per-call cost shown in the response so you always know what you're spending

from openai import OpenAI

client = OpenAI(
    api_key="NEX_...",
    base_url="https://api.nextoken.biz/v1",
)

# Self-hosted Chinese-strong embedding
# BGE-M3, 1024 dim, 8K context, hosted in Singapore
resp = client.embeddings.create(
    model="nex-embed-zh",
    input=["hello", "你好", "こんにちは"],
)

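print(resp.data[0].embedding[:4])

# Non-standard fields such as "nex" are not part of the SDK's typed response;
# the official Python SDK exposes them as plain data through model_extra.
nex = (resp.model_extra or {}).get("nex", {})
print(f"Cost: ${nex.get('cost_usd')}")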
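A common next step with the returned vectors is comparing them, for example to check how close "hello" and "你好" land. The following is a minimal sketch of cosine similarity in plain Python; the helper name is hypothetical and the snippet makes no claim about the scores nex-embed-zh actually produces.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Toy vectors for illustration; real embeddings have 1,024 components.
    print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

With the response from the snippet above, the call would look like `cosine_similarity(resp.data[0].embedding, resp.data[1].embedding)`.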

Today: `nex-embed-zh`. Soon: more.

We're scaling under an AWS GPU quota constraint in Singapore. The first model ships today; larger multilingual and long-context models land as soon as our second GPU card is approved.

● Live

nex-embed-zh (BAAI/bge-m3 · MIT)

$0.01 per 1M input tokens

1,024 dimensions
8,192 token max input
Strong on Chinese, plus 100+ other languages
~50% under text-embedding-3-small

Coming next

nex-embed-large (Qwen/Qwen3-Embedding-8B · Apache-2.0)

~$0.030 per 1M input tokens (planned)

4,096 dimensions
32,768 token context
SOTA on Chinese benchmarks
Pairs with nex-rerank

Self-hosted. Singapore-private. Transparent.

Every call is served on our Singapore GPU. Nothing fans out to OpenAI behind the scenes. The response.nex object tells you exactly what happened and what it cost.

You POST /v1/embeddings

OpenAI-compatible request body. nex-embed-zh is the current model id.
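If you are not using an SDK, the same call can be sketched as a raw HTTP request. This example only builds the request and does not send it. The URL is the base_url from the SDK example plus /embeddings; the bearer-token header is an assumption based on the OpenAI-compatible claim, and `build_embeddings_request` is a hypothetical helper name.

```python
import json
import urllib.request

BASE_URL = "https://api.nextoken.biz/v1"  # base_url from the SDK example on this page


def build_embeddings_request(
    api_key: str, texts: list[str], model: str = "nex-embed-zh"
) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/embeddings request."""
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/embeddings",
        data=body,
        headers={
            # Assumed: OpenAI-style bearer authentication.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_embeddings_request("NEX_...", ["hello", "你好"])
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req)
```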

Singapore GPU serves

Runs in a private VPC; your data never leaves the region. P50 ~40ms end-to-end through the public API.

You see what you paid

Response includes nex.cost_usd, nex.latency_ms, nex.request_id. No surprise invoices.
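For logging or budgeting, the three metadata fields named above can be pulled out of a raw JSON response body. This is a sketch under stated assumptions: it assumes the nex object sits at the top level of the body and that cost and latency are numeric. The `NexMeta` class, the parser name, and the sample payload are hypothetical placeholders, not real API output.

```python
import json
from dataclasses import dataclass


@dataclass
class NexMeta:
    cost_usd: float
    latency_ms: float
    request_id: str


def parse_nex_meta(response_body: str) -> NexMeta:
    """Extract the nex metadata fields from a JSON response body."""
    payload = json.loads(response_body)
    nex = payload["nex"]  # assumed: top-level key in the response body
    return NexMeta(
        cost_usd=float(nex["cost_usd"]),
        latency_ms=float(nex["latency_ms"]),
        request_id=str(nex["request_id"]),
    )


if __name__ == "__main__":
    # Placeholder values for illustration only.
    sample = (
        '{"object": "list", "data": [], '
        '"nex": {"cost_usd": 0.0, "latency_ms": 0.0, "request_id": "example"}}'
    )
    print(parse_nex_meta(sample))
```

Summing `cost_usd` across calls gives a running spend figure you can compare against your prepaid wallet balance.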

Ready to halve your embedding bill?

Free $5 credit. No card. OpenAI SDK works as-is.
