★ NEW · NexToken Embeddings (Public Beta)

Chinese-strong embeddings, ~50% cheaper than OpenAI.

Self-hosted BGE-large-zh-v1.5 behind one OpenAI-compatible endpoint. Switch your base_url, save half on every embedding call. Hosted in Singapore for APAC compliance — no data leaves the region.

−50% vs OpenAI
1024 dimensions
~40ms P50 latency
SG, APAC hosted

Drop-in for OpenAI. Half the price.

Same SDK. Just change base_url and the model name. Pay half.

Feature | OpenAI text-embedding-3-small | Cohere embed-multilingual-v3 | NexToken nex-embed-zh
Price (per 1M input tokens) | $0.020 | $0.100 | $0.010 (−50% / −90%)
Output dimensions | 1,536 | 1,024 | 1,024
Max input tokens | 8,191 | 512 | 512
Strong on Chinese | Generic | | ✓ BGE-large-zh, trained for CN
OpenAI SDK drop-in | | | ✓ Just change base_url
APAC compliance hosting | ✗ (US) | Multi-region (premium) | ✓ Singapore default
Wallet-based prepaid billing | | | ✓ No surprise overages
Free tier | $5 in trial credit | Card required | ✓ $5 free, no card
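
To sanity-check the percentages in the table, the arithmetic can be sketched in a few lines. The prices are the ones listed above; the 500M-token monthly workload and the helper names are made up for illustration.

```python
# Per-1M-input-token prices in USD, copied from the comparison table.
PRICE_PER_1M = {
    "openai/text-embedding-3-small": 0.020,
    "cohere/embed-multilingual-v3": 0.100,
    "nextoken/nex-embed-zh": 0.010,
}


def monthly_cost(tokens_per_month: int, price_per_1m: float) -> float:
    """Cost in USD for a given monthly input-token volume."""
    return tokens_per_month / 1_000_000 * price_per_1m


def savings_pct(baseline: float, candidate: float) -> float:
    """Percentage saved by paying `candidate` instead of `baseline`."""
    return (baseline - candidate) / baseline * 100


# Hypothetical workload: 500M input tokens per month.
TOKENS_PER_MONTH = 500_000_000

for name, price in PRICE_PER_1M.items():
    print(f"{name}: ${monthly_cost(TOKENS_PER_MONTH, price):.2f} per month")

nex = PRICE_PER_1M["nextoken/nex-embed-zh"]
for name, price in PRICE_PER_1M.items():
    if price != nex:
        print(f"savings vs {name}: {savings_pct(price, nex):.0f}%")
```

At that volume the listed prices work out to $10, $50, and $5 per month respectively.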

Three lines of code, half the price.

If you're already using the OpenAI SDK for embeddings, switching touches three lines: swap in your NexToken API key, point base_url at NexToken, and set the model to nex-embed-zh.

  • Real OpenAI-compatible response — works with the official Python and Node SDKs unchanged
  • 1024-dim float vectors served from a Singapore GPU; ~40ms P50 latency end-to-end
  • Per-call cost shown in the response so you always know what you're spending

from openai import OpenAI

client = OpenAI(
    api_key="NEX_...",
    base_url="https://api.nextoken.biz/v1",
)

# Self-hosted Chinese-strong embedding
# BGE-large-zh-v1.5, 1024 dim, hosted in Singapore
resp = client.embeddings.create(
    model="nex-embed-zh",
    input=["hello", "你好", "こんにちは"],
)

print(resp.data[0].embedding[:4])
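# The OpenAI Python SDK exposes unknown response fields as plain dicts,
# so the nex object is indexed by key rather than by attribute
print(f"Cost: ${resp.nex['cost_usd']}")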
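
Because nex-embed-zh accepts at most 512 input tokens, longer documents need to be split before embedding. Below is a minimal sketch, assuming a character-window heuristic rather than a real tokenizer. The names chunk_text and embed_chunks, the 400-character window, and the 50-character overlap are all hypothetical and not part of the NexToken API. The window size is a rough guess; count tokens with the model's own tokenizer if you need a hard guarantee.

```python
def chunk_text(text: str, max_chars: int = 400, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows (heuristic, not token-exact)."""
    if max_chars <= 0 or not 0 <= overlap < max_chars:
        raise ValueError("need max_chars > 0 and 0 <= overlap < max_chars")
    step = max_chars - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + max_chars])
        # Stop once this window has reached the end of the text.
        if start + max_chars >= len(text):
            break
    return chunks


def embed_chunks(client, text: str, model: str = "nex-embed-zh"):
    """Embed each chunk in one request; `client` is an openai.OpenAI instance."""
    chunks = chunk_text(text)
    if not chunks:
        return []
    resp = client.embeddings.create(model=model, input=chunks)
    # Sort by index so vectors line up with the chunks they came from.
    ordered = sorted(resp.data, key=lambda item: item.index)
    return [(chunk, item.embedding) for chunk, item in zip(chunks, ordered)]
```

With the client from the example above, embed_chunks(client, long_document) would return a list of (chunk, vector) pairs that can be indexed individually.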

Today: nex-embed-zh. Soon: more.

We're scaling under an AWS GPU quota constraint in Singapore. The first model ships today; the multilingual and long-context models land as soon as our second GPU is approved.

● Live

nex-embed-zh

BAAI/bge-large-zh-v1.5 · MIT
$0.01 per 1M input tokens
1,024 dimensions
512 token max input
Strong on Chinese + English
~50% under text-embedding-3-small

Coming next

nex-embed-multilingual

BAAI/bge-m3 · MIT
~$0.005 per 1M input tokens (planned)
1,024 dimensions
8,192 token context
100+ languages
Dense + sparse + multi-vector

Coming next

nex-embed-large

Qwen/Qwen3-Embedding-8B · Apache-2.0
~$0.030 per 1M input tokens (planned)
4,096 dimensions
32,768 token context
SOTA on Chinese benchmarks
Pairs with nex-rerank

Self-hosted. Singapore-private. Transparent.

Every call is served on our Singapore GPU. Nothing fans out to OpenAI behind the scenes. The response.nex object tells you exactly what happened and what it cost.

1. You POST /v1/embeddings

OpenAI-compatible request body. nex-embed-zh is the current model id.

2. Singapore GPU serves

Private VPC; never leaves the region. P50 ~40ms end-to-end through the public API.

3. You see what you paid

Response includes nex.cost_usd, nex.latency_ms, nex.request_id. No surprise invoices.
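
The three steps above can be sketched without the SDK, using only Python's standard library. This assumes Bearer-token authentication (as in other OpenAI-compatible APIs) and that the nex object sits at the top level of the JSON response with the three fields named above. The helpers build_request, read_nex_metadata, and embed are hypothetical names.

```python
import json
import urllib.request

# Base URL taken from the SDK example on this page.
BASE_URL = "https://api.nextoken.biz/v1"


def build_request(texts, api_key, model="nex-embed-zh"):
    """Step 1: build the OpenAI-compatible POST /v1/embeddings request."""
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/embeddings",
        data=body,
        headers={
            # Assumption: Bearer auth, as in other OpenAI-compatible APIs.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def read_nex_metadata(payload: dict) -> dict:
    """Step 3: pull the billing/latency fields out of the response body.

    Assumes a top-level "nex" object; missing fields come back as None.
    """
    nex = payload.get("nex") or {}
    return {key: nex.get(key) for key in ("cost_usd", "latency_ms", "request_id")}


def embed(texts, api_key):
    """Step 2 happens server-side; this sends the request and parses the reply."""
    with urllib.request.urlopen(build_request(texts, api_key), timeout=30) as resp:
        payload = json.load(resp)
    return payload, read_nex_metadata(payload)
```

Calling embed(["你好"], api_key) would return the parsed payload together with a dict of cost_usd, latency_ms, and request_id, which is enough to keep a running total of spend per request.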

Ready to halve your embedding bill?

Free $5 credit. No card. OpenAI SDK works as-is.