NexToken API Reference
NexToken provides a unified REST API that routes your LLM requests to the optimal provider — OpenAI, Anthropic, Google DeepMind, Meta, DeepSeek, and Mistral — automatically. Drop-in compatible with the OpenAI API format. No SDK changes required for most integrations.
base_url to https://api.nextoken.biz/v1 and your api_key to your NexToken key. No other changes needed for basic usage.
Authentication
All requests must include your NexToken API key in the Authorization header using the Bearer scheme.
API keys are prefixed nxt_sk_ for standard keys and nxt_sub_ for sub-keys. Manage your keys in the API Keys dashboard.
Quickstart
Make your first request in under 60 seconds. The example below sends a chat completion request routed automatically to the best available provider.
Base URL & Versioning
All API endpoints are served from the base URL below. The current stable version is v1.
Breaking changes will be introduced under a new version prefix (e.g. /v2). Minor additions are non-breaking and released without version bumps. Subscribe to the status page for API deprecation notices.
⭐ NexToken Native Models
NexToken's proprietary models — built for cost-efficiency and Asia-Pacific compliance.
A single stable API across providers. We handle infrastructure, you focus on building. Underlying inference architecture is proprietary.
| Model | Context | Price ($/1M in / out) | Best for |
|---|---|---|---|
nex-pro ★ Default | 32k | $0.10 / $0.40 | Default choice for chat, code, content, summarisation. Self-hosted Singapore GPU. Strong Chinese + English. Lowest latency in APAC. ~95% cheaper than GPT-4o |
nex-auto | smart | $0.30 / $1.20 | Network picks per-request between nex-pro / nex-reasoning. Actual target surfaced in nex.smart_router. |
nex-reasoning | 128k | $1.20 / $4.80 | Multi-step math, logic, structured analysis. No tool calling. ~90% cheaper than o1 |
nex-embed-zh | 512 | $0.01 / — | Chinese-strong embeddings, 1024-dim. ~50% cheaper than text-embedding-3-small |
Legacy IDs. nex-smart and nex-coder still work — both are transparent aliases of nex-pro. No code changes required if you're already using them.
Quick example: nex-pro
When to use which
- nex-pro — Default for everything. Chat, code, content, summarisation, classification, customer support. 32K context, OpenAI tool calling, lowest latency in APAC. Start here.
- nex-reasoning — Multi-step math, logical proofs, structured analysis. Not for tool calling — if you need both reasoning and tools, use
nex-pro. - nex-auto — Don't want to pick? The gateway routes per-request between
nex-proandnex-reasoning. Actual target reported innex.smart_router. - nex-embed-zh — Chinese-strong embeddings (1024-dim BGE-large-zh-v1.5).
All NexToken Native chat models support streaming and tool calling (nex-reasoning excepted — see above). Response includes a nex.provider field set to "nex". Detailed pricing comparison: see pricing page.
Chat Completions
Create a model response for a given chat conversation. Fully compatible with the OpenAI Chat Completions schema.
Request body
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | required | Model identifier. Recommended: nex-pro (self-hosted Singapore GPU, 32K context, ~95% cheaper than GPT-4o). Other Nex models: nex-reasoning, nex-auto. Or pass a vendor model directly: gpt-4o, claude-sonnet-4-6, gemini-2.5-pro, deepseek-v3. |
| messages | array | required | Array of message objects with role (system/user/assistant) and content. |
| stream | boolean | optional | If true, returns a Server-Sent Events stream. Default: false. |
| max_tokens | integer | optional | Maximum tokens in the response. Defaults to model maximum. |
| temperature | number | optional | Sampling temperature 0–2. Higher = more random. Default: 1. |
| top_p | number | optional | Nucleus sampling. Alterative to temperature. Default: 1. |
| tools | array | optional | Array of tool definitions for function calling. Only supported on compatible models. |
| nex_routing | object | optional | NexToken routing hints. See Routing Hints below. |
The nex object in every response provides routing transparency: which provider served the request, total latency, exact cost charged, and the router's confidence score.
Streaming (SSE)
Set stream: true to receive a Server-Sent Events stream. Each event contains a delta with partial content. The stream terminates with a data: [DONE] sentinel.
usage field (where supported by the provider). If unavailable, NexToken estimates tokens using tiktoken and logs a warning in your request detail view.Routing Hints (nex_routing)
Pass the optional nex_routing object to influence how NexToken routes your request.
| Field | Type | Description |
|---|---|---|
| strategy | string | "cost" · "latency" · "quality" · "balanced" (default) |
| providers | array | Allowlist of provider names. E.g. ["openai","anthropic"] pins routing to those two. |
| exclude_providers | array | Denylist of providers to never route to for this request. |
| fallback | boolean | If true (default), automatically retry with next-best provider on failure. |
| max_fallback_attempts | integer | Maximum fallback retries. Default: 2. |
Embeddings
Create vector embeddings from input text(s). Compatible with the OpenAI Embeddings schema, so existing OpenAI SDK code works unchanged. Routed to NexToken's self-hosted GPU in Singapore — your data stays in the region.
Request body
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | required | Embedding model id. Currently nex-embed-zh (BGE-large-zh-v1.5, 1024-dim). |
| input | string | string[] | required | A single string or batch of strings to embed. Max 256 strings per call. Each string up to 512 tokens. |
| encoding_format | string | float (default) or base64. | |
| dimensions | integer | Accepted but ignored — native dim is 1024. | |
| user | string | Optional end-user identifier for abuse monitoring. |
Quick example
Response shape
Errors: 404 NEX_EMBED_MODEL_UNKNOWN, 400 NEX_EMBED_EMPTY_INPUT, 401 NEX_AUTH_REQUIRED, 403 NEX_MODEL_NOT_ALLOWED, 429 NEX_RATE_LIMIT, 503 NEX_EMBED_UPSTREAM_FAILED. The full OpenAPI spec lives at embedding/api-spec/openapi.yaml.
List Models
Returns all models available for routing through your NexToken account.
List Providers
Returns real-time health and availability status for all connected providers.
Tokenize new · May 2026
Count tokens for a string or an OpenAI-style messages list without paying for an upstream call.
The response carries an accuracy band — exact for OpenAI / GPT-4 family,
approx_5pct for Claude / Llama / Mistral, approx_15pct for Chinese-friendly
tokenizers (Qwen / DeepSeek / GLM).
Estimate Cost new · May 2026
Quote a chat completion before sending it. Returns wholesale + retail USD plus a fits
flag that tells you whether the input is within the model's context window. Useful for
budget gates and "show price before send" client UX.
Batch new · May 2026 · 30% off
Fan out up to 100 chat-completion items in one call. Each item gets its own response in
the same order, with per-item retail cost. 30% retail discount applies to every
successful item. Item shape mirrors OpenAI's /v1/batches input format so
existing JSONL builders work unchanged.
Images new · May 2026
Generate images via DALL-E 3 / DALL-E 3 HD. OpenAI-compatible payload — the response carries
a nex envelope with request id, provider, cost, and latency.
Audio new · May 2026
Two endpoints: Whisper transcription + OpenAI TTS speech synthesis. Both bill at OpenAI list × 1.20 markup. Transcription bills by estimated minutes; speech bills by 1K input characters.
Prompt Templates new · May 2026
Customer-managed prompt templates with {{variable}} substitution. CRUD plus a
server-side /render endpoint that's handy for testing variable substitution
without firing a chat completion. Quotas: 200 templates × 64 KB / user.
Fine-tunes new · May 2026 · beta
API surface to queue, list, and poll fine-tune jobs against your training files. The shape
mirrors OpenAI's /v1/fine_tuning/jobs so the OpenAI SDK targets it unchanged. Jobs
currently stay in status: "queued" until the LoRA training worker is enabled —
integrate today, take results when the backend comes online.
Response nex Metadata expanded · May 2026
Every /v1/chat/completions response carries a nex envelope alongside the
OpenAI-standard fields. The block grew in May 2026 to surface the new gateway capabilities:
Clients that ignore unknown JSON fields (the OpenAI SDK does by default) are unaffected by these additions — every new field is opt-in for whoever wants to inspect it.
List API Keys
Returns all API keys in your account. Key secrets are never returned after creation — only masked prefixes.
Create API Key
| Parameter | Type | Required | Description |
|---|---|---|---|
| name | string | required | Human-readable label for this key. |
| budget_usd | number | optional | Monthly spending cap in USD. Requests return 402 when exceeded. |
| rpm_limit | integer | optional | Per-key RPM ceiling. Inherits account limit if omitted. |
| allowed_models | array | optional | Allowlist of model IDs. All models allowed if omitted. |
| expires_at | string | optional | ISO 8601 expiry timestamp. Key auto-revokes at this time. |
nxt_sk_…) is returned only once at creation. Store it securely — it cannot be retrieved again.Revoke API Key
Immediately revokes the key. In-flight streaming requests complete within 120 seconds. New requests with this key return 401 Unauthorized immediately.
Wallet Balance
Top Up Wallet
Initiates a top-up via Stripe. Returns a checkout_url to redirect the user for payment. Programmatic top-up (saved card) is available for Business and Enterprise plans.
Usage Summary
| Query Param | Type | Description |
|---|---|---|
| from | string | ISO 8601 start date. Default: start of current month. |
| to | string | ISO 8601 end date. Default: now. |
| group_by | string | day · model · provider · key |
Request Logs
Returns paginated request logs. Retention period depends on your plan: 7 days (Developer), 30 days (Pro), 90 days (Business), 365 days (Enterprise + Extended Audit Logs add-on).
Error Codes
NexToken uses standard HTTP status codes. All error responses include a JSON body with error.code and error.message.
Retry-After header.Rate Limits
Rate limits are enforced per API key using a sliding window algorithm. The current window and remaining capacity are returned in every response header.
| Plan | RPM | Daily Requests | Concurrent Streams |
|---|---|---|---|
| Developer | 100 | 10,000 | 5 |
| Pro | 1,000 | 100,000 | 25 |
| Business | 10,000 | 1,000,000 | 100 |
| Enterprise | Custom | Custom | Custom |
SDKs & Libraries
NexToken is compatible with any OpenAI-compatible SDK. Simply point base_url at https://api.nextoken.biz/v1.
- Python —
pip install openai(official OpenAI SDK, setbase_url) - Node.js / TypeScript —
npm install openai - Go —
github.com/sashabaranov/go-openai - Rust —
async-openaicrate - LangChain — Use
ChatOpenAIwith customopenai_api_base - LlamaIndex — Use
OpenAI(api_base=...)
A native NexToken SDK with routing-specific features (provider pinning, cost callbacks, routing telemetry) is on the roadmap for Q3 2025.
Changelog
v1.2.0 — June 2025
- Added
nex_routing.strategyfield for per-request routing hints - Added
nex.routing_scoreto response metadata - Fixed: streaming token count now uses provider
usagefield where available - Fixed:
budget:zeroRedis flag now permanent (no TTL) — eliminates 1-hour grace period
v1.1.0 — April 2025
- Added DeepSeek V3 and Mistral Large 2 support
- Added sub-key management endpoints
- Added
X-RateLimit-*response headers - Improved GST invoice generation — tax base now calculated on post-discount amount
v1.0.0 — January 2025
- Initial stable release
- OpenAI-compatible Chat Completions endpoint
- Provider routing: OpenAI, Anthropic, Google, Meta (Llama)
- Wallet top-up via Stripe