API v1 · Stable

NexToken API Reference

NexToken provides a unified REST API that routes your LLM requests to the optimal provider — OpenAI, Anthropic, Google DeepMind, Meta, DeepSeek, and Mistral — automatically. Drop-in compatible with the OpenAI API format. No SDK changes required for most integrations.

💡

OpenAI-Compatible

If you already use the OpenAI Python or Node.js SDK, simply change the base_url to https://api.nextoken.biz/v1 and your api_key to your NexToken key. No other changes needed for basic usage.

Authentication

All requests must include your NexToken API key in the Authorization header using the Bearer scheme.

Authorization header HTTP

Authorization: Bearer nxt_sk_••••••••••••••••••••••••••••••••

API keys are prefixed nxt_sk_ for standard keys and nxt_sub_ for sub-keys. Manage your keys in the API Keys dashboard.

⚠️

Keep your keys secret

Never expose API keys in client-side code, public repositories, or log files. Use environment variables or a secrets manager. Rotate keys immediately if compromised.
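
As a minimal sketch of the environment-variable approach, the helper below builds the request headers from `NEX_API_KEY` (the variable name used in the Quickstart). The function name `auth_headers` and the prefix check are illustrative assumptions, not part of the NexToken API.

```python
import os

VALID_PREFIXES = ("nxt_sk_", "nxt_sub_")  # standard keys and sub-keys


def auth_headers(env=None):
    """Build request headers from the NEX_API_KEY environment variable."""
    env = os.environ if env is None else env
    key = env.get("NEX_API_KEY", "").strip()
    if not key.startswith(VALID_PREFIXES):
        # The key itself is deliberately kept out of the error message.
        raise ValueError("NEX_API_KEY is missing or lacks a nxt_sk_/nxt_sub_ prefix")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

Passing a plain dict as `env` makes the helper easy to exercise in tests without touching the real environment.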

Quickstart

Make your first request in under 60 seconds. The example below sends a chat completion request routed automatically to the best available provider.

quickstart.sh bash

curl https://api.nextoken.biz/v1/chat/completions \
  -H "Authorization: Bearer $NEX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nex-pro",
    "messages": [
      {"role": "user", "content": "Hello, world!"}
    ]
  }'

Base URL & Versioning

All API endpoints are served from the base URL below. The current stable version is v1.

Base URL

https://api.nextoken.biz/v1

Breaking changes will be introduced under a new version prefix (e.g. /v2). Minor additions are non-breaking and released without version bumps. Subscribe to the status page for API deprecation notices.

⭐ NexToken Native Models

NexToken's proprietary models — built for cost-efficiency and Asia-Pacific compliance.

A single stable API across providers. We handle the infrastructure; you focus on building. The underlying inference architecture is proprietary.

Model	Context	Price ($/1M in / out)	Best for
`nex-pro` ★ Default	32K	$0.10 / $0.40	Default choice for chat, code, content, summarisation. Self-hosted Singapore GPU (Qwen2.5-7B on aibox-gpu). Strong Chinese + English. Lowest latency in APAC. ~95% cheaper than GPT-4o.
`nex-embed-zh`	8,192	$0.01 / —	Chinese-strong embeddings, 1024-dim (BGE-M3, self-hosted Singapore GPU). ~50% cheaper than text-embedding-3-small, multilingual
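
The per-million prices in the table translate into a per-request cost with simple arithmetic: tokens × price ÷ 1,000,000, summed over input and output. The sketch below works this through with `Decimal` to avoid floating-point drift. It covers list prices only; the helper name `list_price_usd` is hypothetical, and actual billing may differ by plan or discount.

```python
from decimal import Decimal

# USD per 1M tokens (input, output), copied from the table above.
# None means the model has no output price (embeddings).
PRICES = {
    "nex-pro": (Decimal("0.10"), Decimal("0.40")),
    "nex-embed-zh": (Decimal("0.01"), None),
}
PER_MILLION = Decimal(1_000_000)


def list_price_usd(model, input_tokens, output_tokens=0):
    """Estimate the list-price cost of one request in USD."""
    price_in, price_out = PRICES[model]
    cost = price_in * input_tokens / PER_MILLION
    if price_out is not None:
        cost += price_out * output_tokens / PER_MILLION
    return cost


print(list_price_usd("nex-pro", 1_000, 500))
```

For 1,000 input and 500 output tokens on `nex-pro`, this is 0.0001 + 0.0002 = 0.0003 USD.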

Legacy IDs. nex-smart, nex-coder, nex-reasoning, and nex-auto still resolve for existing integrations. nex-smart/nex-coder are transparent aliases of nex-pro; nex-reasoning/nex-auto are kept for backwards compatibility but no longer advertised — call nex-pro, deepseek-v3, or claude-sonnet-4-6 directly for new code.

Quick example: nex-pro

nex_pro_demo.py python

from openai import OpenAI

client = OpenAI(
    api_key="nxt_sk_your_key",  # standard keys use the nxt_sk_ prefix
    base_url="https://api.nextoken.biz/v1",
)

# nex-pro — Singapore-hosted Qwen2.5-7B, 32K context, ~95% cheaper than GPT-4o
response = client.chat.completions.create(
    model="nex-pro",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in 3 sentences."},
    ],
)

print(response.choices[0].message.content)
# The OpenAI SDK exposes unknown top-level fields such as "nex" as plain dicts.
print(f"Cost: ${response.nex['cost_usd']}")
# Output:
# Quantum computing uses qubits that can exist in superposition...
# Cost: $0.000054   ← roughly 1/10 the cost of GPT-4o

When to use which

nex-pro — Default for everything. Chat, code, content, summarisation, classification, customer support. 32K context, OpenAI tool calling, lowest latency in APAC. Start here.
nex-embed-zh — Chinese-strong embeddings (1024-dim BGE-M3, 8K context, multilingual).
Need reasoning? Call deepseek-v3 or claude-sonnet-4-6 directly through the same key — no separate model setup, same billing envelope.

All NexToken Native chat models support streaming and tool calling. Response includes a nex.provider field set to "nex". Detailed pricing comparison: see pricing page.

Chat Completions

Create a model response for a given chat conversation. Fully compatible with the OpenAI Chat Completions schema.

POST /v1/chat/completions

Request body

Parameter	Type	Required	Description
model	string	required	Model identifier. Recommended: `nex-pro` (self-hosted Singapore GPU, 32K context, ~95% cheaper than GPT-4o). Or pass a vendor model directly: `gpt-4o`, `claude-sonnet-4-6`, `gemini-2.5-pro`, `deepseek-v3`.
messages	array	required	Array of message objects with `role` (system/user/assistant) and `content`.
stream	boolean	optional	If `true`, returns a Server-Sent Events stream. Default: `false`.
max_tokens	integer	optional	Maximum tokens in the response. Defaults to model maximum.
temperature	number	optional	Sampling temperature 0–2. Higher = more random. Default: `1`.
top_p	number	optional	Nucleus sampling. Alternative to temperature. Default: `1`.
tools	array	optional	Array of tool definitions for function calling. Only supported on compatible models.
nex_routing	object	optional	NexToken routing hints. See Routing Hints below.
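
To illustrate the parameter table, here is a small sketch that checks inputs client-side before a request is sent. The helper `build_chat_request` is hypothetical, and it only covers basic text chat (the three roles listed above); tool-calling messages would need extra handling.

```python
ALLOWED_ROLES = {"system", "user", "assistant"}


def build_chat_request(model, messages, *, stream=False, max_tokens=None,
                       temperature=None, top_p=None):
    """Validate inputs against the parameter table and return a request body."""
    if not messages:
        raise ValueError("messages must contain at least one message")
    for message in messages:
        if message.get("role") not in ALLOWED_ROLES:
            raise ValueError(f"unsupported role: {message.get('role')!r}")
        if "content" not in message:
            raise ValueError("each message needs a content field")
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")

    body = {"model": model, "messages": list(messages), "stream": stream}
    # Leave optional fields out entirely so the server-side defaults apply.
    optional = {"max_tokens": max_tokens, "temperature": temperature, "top_p": top_p}
    body.update({k: v for k, v in optional.items() if v is not None})
    return body
```

Omitting unset fields, rather than sending explicit nulls, keeps the body identical to what the Quickstart sends for a minimal call.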

200 Success response

response.json json

{
  "id": "chatcmpl-nxt-7f3a9c2b",
  "object": "chat.completion",
  "created": 1718000000,
  "model": "gpt-4o",
  "choices": [{
    "index": 0,
    "message": { "role": "assistant", "content": "Hello! How can I help you today?" },
    "finish_reason": "stop"
  }],
  "usage": { "prompt_tokens": 10, "completion_tokens": 9, "total_tokens": 19 },
  "nex": {
    "provider": "openai",
    "latency_ms": 387,
    "cost_usd": 0.000052,
    "routing_score": 0.94
  }
}

The nex object in every response provides routing transparency: which provider served the request, total latency, exact cost charged, and the router's confidence score.

Streaming (SSE)

Set stream: true to receive a Server-Sent Events stream. Each event contains a delta with partial content. The stream terminates with a data: [DONE] sentinel.

stream chunk json

data: {
  "id": "chatcmpl-nxt-7f3a9c2b",
  "object": "chat.completion.chunk",
  "choices": [{
    "delta": { "content": "Hello" },
    "finish_reason": null
  }]
}

data: [DONE]

ℹ️

Token usage for streaming responses is reported in the final chunk's usage field (where supported by the provider). If unavailable, NexToken estimates tokens using tiktoken and logs a warning in your request detail view.
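
The stream format described above can be consumed with a few lines of standard-library code. This is a sketch under one assumption: each event's JSON arrives on a single `data:` line (the chunk above is pretty-printed for readability). The function name `iter_stream_text` is illustrative.

```python
import json


def iter_stream_text(lines):
    """Yield content fragments from an iterable of SSE text lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


sample = [
    'data: {"choices": [{"delta": {"content": "Hello"}, "finish_reason": null}]}',
    "",
    'data: {"choices": [{"delta": {"content": ", world"}, "finish_reason": null}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_stream_text(sample)))
```

The OpenAI SDKs do this parsing for you when `stream=True`; a hand-rolled parser like this is mainly useful with a bare HTTP client.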

Routing Hints (`nex_routing`)

Pass the optional nex_routing object to influence how NexToken routes your request.

Field	Type	Description
strategy	string	`"cost"` · `"latency"` · `"quality"` · `"balanced"` (default)
providers	array	Allowlist of provider names. E.g. `["openai","anthropic"]` pins routing to those two.
exclude_providers	array	Denylist of providers to never route to for this request.
fallback	boolean	If `true` (default), automatically retry with next-best provider on failure.
max_fallback_attempts	integer	Maximum fallback retries. Default: `2`.
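
A sketch of assembling the `nex_routing` object from the fields in the table, with a couple of client-side sanity checks. The helper `build_routing_hints` and its overlap check are assumptions for illustration; the server may validate differently.

```python
STRATEGIES = {"cost", "latency", "quality", "balanced"}


def build_routing_hints(strategy="balanced", providers=None, exclude_providers=None,
                        fallback=True, max_fallback_attempts=2):
    """Return a nex_routing object built from the fields in the table above."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy!r}")
    if providers and exclude_providers:
        overlap = set(providers) & set(exclude_providers)
        if overlap:
            raise ValueError(f"providers both allowed and excluded: {sorted(overlap)}")
    hints = {
        "strategy": strategy,
        "fallback": fallback,
        "max_fallback_attempts": max_fallback_attempts,
    }
    if providers:
        hints["providers"] = list(providers)
    if exclude_providers:
        hints["exclude_providers"] = list(exclude_providers)
    return hints


hints = build_routing_hints("cost", providers=["openai", "anthropic"])
```

With the OpenAI Python SDK, non-standard request fields are usually passed through the `extra_body` argument, for example `client.chat.completions.create(..., extra_body={"nex_routing": hints})`.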

Embeddings

Create vector embeddings from input text(s). Compatible with the OpenAI Embeddings schema, so existing OpenAI SDK code works unchanged. Routed to NexToken's self-hosted GPU in Singapore — your data stays in the region.

POST /v1/embeddings

Request body

Field	Type	Required	Description
model	string	required	Embedding model id. Currently `nex-embed-zh` (BGE-M3, 1024-dim, 8K context).
input	string | string[]	required	A single string or batch of strings to embed. Max 256 strings per call. Each string up to 512 tokens.
encoding_format	string	optional	`float` (default) or `base64`.
dimensions	integer	optional	Accepted but ignored; the native dimension is 1024.
user	string	optional	End-user identifier for abuse monitoring.
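
Because each call accepts at most 256 strings, larger corpora need to be split before embedding. A minimal sketch (the helper name `batch_inputs` is hypothetical):

```python
MAX_STRINGS_PER_CALL = 256  # per-call limit from the table above


def batch_inputs(texts, batch_size=MAX_STRINGS_PER_CALL):
    """Split texts into consecutive batches no larger than batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    texts = list(texts)
    return [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]


batches = batch_inputs([f"doc {n}" for n in range(600)])
print([len(b) for b in batches])
```

For 600 inputs this yields batches of 256, 256, and 88 strings. Each batch can then be passed as the `input` of one embeddings call. The per-string token limit is not checked here.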

Quick example

embed_demo.py python

from openai import OpenAI

client = OpenAI(
    api_key="nxt_sk_your_key",  # standard keys use the nxt_sk_ prefix
    base_url="https://api.nextoken.biz/v1",
)

resp = client.embeddings.create(
    model="nex-embed-zh",
    input=["Hello world", "你好，世界"],
)

print(len(resp.data[0].embedding))  # 1024
print(resp.usage.total_tokens)
print(resp.nex["cost_usd"])  # extra fields such as "nex" arrive as plain dicts

Response shape

response.json json

{
  "object": "list",
  "model": "nex-embed-zh",
  "data": [
    { "object": "embedding", "index": 0, "embedding": [0.0334, 0.0146, … 1024 floats] },
    { "object": "embedding", "index": 1, "embedding": [0.0653, …] }
  ],
  "usage": { "prompt_tokens": 5, "total_tokens": 5 },
  "nex": {
    "provider": "nex",
    "cost_usd": "0.00000005",
    "latency_ms": 35,
    "request_id": "req_5d4500331a944fca9bba373e"
  }
}

Errors: 404 NEX_EMBED_MODEL_UNKNOWN, 400 NEX_EMBED_EMPTY_INPUT, 401 NEX_AUTH_REQUIRED, 403 NEX_MODEL_NOT_ALLOWED, 429 NEX_RATE_LIMIT, 503 NEX_EMBED_UPSTREAM_FAILED. The full OpenAPI spec lives at embedding/api-spec/openapi.yaml.

List Models

Returns all models available for routing through your NexToken account.

GET /v1/models

200 Response

models.json json

{
  "data": [
    { "id": "gpt-4o",          "provider": "openai",    "context_window": 128000, "streaming": true },
    { "id": "claude-sonnet-4",  "provider": "anthropic", "context_window": 200000, "streaming": true },
    { "id": "gemini-2.5-pro",  "provider": "google",    "context_window": 1000000,"streaming": true }
  ]
}
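
One common use of this response is picking models that can hold a given prompt. The sketch below filters the sample payload above by context window; the function name `models_with_context` is illustrative.

```python
import json

models_json = """
{ "data": [
  { "id": "gpt-4o", "provider": "openai", "context_window": 128000, "streaming": true },
  { "id": "claude-sonnet-4", "provider": "anthropic", "context_window": 200000, "streaming": true },
  { "id": "gemini-2.5-pro", "provider": "google", "context_window": 1000000, "streaming": true }
] }
"""


def models_with_context(payload, min_tokens):
    """Return ids of streaming-capable models whose window holds min_tokens."""
    return [
        m["id"]
        for m in payload["data"]
        if m["context_window"] >= min_tokens and m.get("streaming", False)
    ]


payload = json.loads(models_json)
print(models_with_context(payload, 150_000))
```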

List Providers

Returns real-time health and availability status for all connected providers.

GET /v1/providers

Tokenize new · May 2026

Count tokens for a string or an OpenAI-style messages list without paying for an upstream call. The response carries an accuracy band — exact for OpenAI / GPT-4 family, approx_5pct for Claude / Llama / Mistral, approx_15pct for Chinese-friendly tokenizers (Qwen / DeepSeek / GLM).

POST /v1/tokenize

{
  "model": "gpt-4o",
  "input": "Hello, NexToken!"
}
// or messages:
{
  "model": "gpt-4o",
  "input": [{ "role": "user", "content": "Hi" }]
}
// → 200 OK
{
  "model": "gpt-4o",
  "tokens": 6,
  "encoding": "tiktoken/o200k_base",
  "accuracy": "exact"
}
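
When a count is only approximate, it can be safer to budget for the top of the band. The sketch below assumes that `approx_5pct` and `approx_15pct` mean the true count may be up to 5% or 15% higher than reported; that reading of the band names is an assumption, and `token_upper_bound` is a hypothetical helper.

```python
# Assumed worst-case margin, in percent, for each accuracy band.
MARGIN_PCT = {"exact": 0, "approx_5pct": 5, "approx_15pct": 15}


def token_upper_bound(tokens, accuracy):
    """Return a conservative token count for the given accuracy band."""
    pct = MARGIN_PCT[accuracy]
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-tokens * (100 + pct) // 100)
```

For the response above (6 tokens, `exact`) the bound is simply 6; a reported 100 tokens at `approx_5pct` would be budgeted as 105.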

Estimate Cost new · May 2026

Quote a chat completion before sending it. Returns wholesale + retail USD plus a fits flag that tells you whether the input is within the model's context window. Useful for budget gates and "show price before send" client UX.

POST /v1/estimate-cost

{
  "model": "gpt-4o",
  "input": [{ "role": "user", "content": "Summarise this article ..." }],
  "expected_output_tokens": 500,
  "billing_tier": "pro"
}
// → 200 OK
{
  "model": "gpt-4o",
  "input_tokens": 312,
  "output_tokens": 500,
  "wholesale_total_usd": "0.00578000",
  "retail_total_usd": "0.00686772",
  "context_window": 128000,
  "fits": true,
  "accuracy": "exact"
}
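
A budget gate built on this response might look like the following sketch. It only reads the `fits` and `retail_total_usd` fields shown above; the helper name `should_send` and the idea of passing the ceiling as a string are illustrative choices.

```python
from decimal import Decimal


def should_send(quote, max_retail_usd):
    """Return True if the quoted request fits the context window and the budget."""
    if not quote["fits"]:
        return False
    # Amounts arrive as strings, so compare them as decimals rather than floats.
    return Decimal(quote["retail_total_usd"]) <= Decimal(str(max_retail_usd))


quote = {
    "model": "gpt-4o",
    "input_tokens": 312,
    "output_tokens": 500,
    "wholesale_total_usd": "0.00578000",
    "retail_total_usd": "0.00686772",
    "context_window": 128000,
    "fits": True,
    "accuracy": "exact",
}
print(should_send(quote, "0.01"))
```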

Batch new · May 2026 · 30% off

Fan out up to 100 chat-completion items in one call. Each item gets its own response in the same order, with per-item retail cost. 30% retail discount applies to every successful item. Item shape mirrors OpenAI's /v1/batches input format so existing JSONL builders work unchanged.

POST /v1/batch

{
  "items": [
    {
      "custom_id": "row-1",
      "method": "POST",
      "url": "/v1/chat/completions",
      "body": { "model": "gpt-4o-mini", "messages": [{"role":"user","content":"Translate: Hello"}] }
    }
    // up to 100 items per call
  ]
}
// → 200 OK
{
  "id": "batch_...",
  "item_count": 1,
  "success_count": 1,
  "discount_factor": 0.7,
  "total_retail_usd": "0.00000378",
  "items": [{"custom_id": "row-1", "response": {...}, "retail_usd": "0.00000378"}]
}
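
A sketch of building the request body from a list of prompts and reading per-item costs back out of the response. Both helper names are hypothetical, and only the fields shown in the example above are used; how failed items are represented is not covered.

```python
MAX_ITEMS_PER_CALL = 100  # documented per-call limit


def build_batch_body(prompts, model="gpt-4o-mini"):
    """Wrap each prompt in a batch item with a sequential custom_id."""
    prompts = list(prompts)
    if len(prompts) > MAX_ITEMS_PER_CALL:
        raise ValueError(f"at most {MAX_ITEMS_PER_CALL} items per call")
    return {
        "items": [
            {
                "custom_id": f"row-{n}",
                "method": "POST",
                "url": "/v1/chat/completions",
                "body": {"model": model, "messages": [{"role": "user", "content": p}]},
            }
            for n, p in enumerate(prompts, start=1)
        ]
    }


def retail_by_id(batch_response):
    """Map each custom_id to its per-item retail cost string."""
    return {item["custom_id"]: item["retail_usd"] for item in batch_response["items"]}
```

Keying results by `custom_id` rather than by position makes downstream joins robust even if you later reorder or filter the input rows.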

Images Coming soon · Beta — join waitlist

Heads up. The /v1/images/* endpoints below describe the planned interface (OpenAI-compatible). They are not live yet — calls will return 501 Not Implemented. Email waitlist@nextoken.biz to be notified when the beta opens.

Generate images via DALL-E 3 / DALL-E 3 HD. OpenAI-compatible payload — the response carries a nex envelope with request id, provider, cost, and latency.

POST /v1/images/generations

{
  "model": "dall-e-3",
  "prompt": "a kitten coding in cyberpunk style, neon lights",
  "n": 1,
  "size": "1024x1024"
}
// → 200 OK
{
  "data": [{ "url": "https://..." }],
  "nex": { "provider": "openai", "cost_usd": "0.05000000", "request_id": "img_..." }
}

Audio Coming soon · Beta — join waitlist

Heads up. The /v1/audio/* endpoints below describe the planned interface (OpenAI-compatible). They are not live yet — calls will return 501 Not Implemented. Email waitlist@nextoken.biz to be notified when the beta opens.

Two endpoints: Whisper transcription and OpenAI TTS speech synthesis. Both bill at the OpenAI list price × 1.20 (a 20% markup). Transcription bills by estimated minutes; speech bills per 1K input characters.

POST /v1/audio/transcriptions

// multipart/form-data — file=<audio bytes> · model=whisper-1
{
  "text": "Hello, this is NexToken.",
  "nex": {
    "provider": "openai",
    "cost_usd": "0.00072000",
    "estimated_minutes": 0.1
  }
}

POST /v1/audio/speech

{
  "model": "tts-1",
  "input": "Hello from NexToken",
  "voice": "alloy"
}
// → 200 OK · audio/mpeg body · X-Nex-Cost-Usd / X-Nex-Request-Id headers

Prompt Templates new · May 2026

Customer-managed prompt templates with {{variable}} substitution. Supports CRUD operations plus a server-side /render endpoint, which is handy for testing variable substitution without firing a chat completion. Quotas: 200 templates per user, up to 64 KB each.

POST /v1/templates

GET /v1/templates

GET /v1/templates/{id}

POST /v1/templates/{id}/render

// Create
POST /v1/templates
{
  "name": "customer-greeting",
  "content": "Hello, {{name}}! How can I help with your {{product}} order?"
}
// Render
POST /v1/templates/<id>/render
{ "variables": { "name": "Alice", "product": "NexPro" } }
// → { "rendered": "Hello, Alice! How can I help with your NexPro order?" }
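
For local previews, the `{{variable}}` substitution can be approximated client-side. This is a sketch, not the server's implementation: how `/render` treats whitespace inside the braces or missing variables is not documented here, so the behaviour below (tolerate whitespace, raise on missing names) is an assumption.

```python
import re

PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_template(content, variables):
    """Replace each {{name}} placeholder with its value from variables."""
    def substitute(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"missing template variable: {name}")
        return str(variables[name])

    # A callable replacement keeps backslashes in values from being interpreted.
    return PLACEHOLDER.sub(substitute, content)
```

Applied to the `customer-greeting` template with `name` set to Alice and `product` set to NexPro, it produces the same rendered string as the example response above.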

Fine-tunes Coming soon · Beta — join waitlist

Heads up. The /v1/fine_tuning/* endpoints below describe the planned interface (OpenAI-compatible). They are not live yet — calls will return 501 Not Implemented. Email waitlist@nextoken.biz to be notified when the beta opens.

API surface to queue, list, and poll fine-tune jobs against your training files. The shape mirrors OpenAI's /v1/fine_tuning/jobs so the OpenAI SDK targets it unchanged. Jobs currently stay in status: "queued" until the LoRA training worker is enabled — integrate today, take results when the backend comes online.

POST /v1/fine_tunes

GET /v1/fine_tunes/{id}

GET /v1/fine_tunes

Response `nex` Metadata expanded · May 2026

Every /v1/chat/completions response carries a nex envelope alongside the OpenAI-standard fields. The block grew in May 2026 to surface the new gateway capabilities:

{
  // always present
  "provider": "openai",                           // resolved upstream
  "cost_usd": "0.00006300",                      // retail charged to wallet
  "request_id": "req_...",
  "latency_ms": 412,

  // only when relevant (otherwise null)
  "cached_input_tokens": 1024,                  // upstream prompt cache hit
  "semantic_cache_hit": { "similarity": 0.99, "age_seconds": 120, "original_request_id": "req_..." },
  "smart_router": { "target_model": "nex-pro", "tier": "general", "reason": "chat content" },
  "pii_redactions": { "cn_phone": 1, "email": 2 },
  "injection_score": 3.5                       // only in warn-mode; block-mode returns 422
}

Clients that ignore unknown JSON fields (the OpenAI SDK does by default) are unaffected by these additions — every new field is opt-in for whoever wants to inspect it.
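
For clients that do want to inspect the envelope, a small typed wrapper can make the optional fields easier to handle. The sketch below uses only field names from the block above; the `NexMeta` class, the `parse_nex` helper, and the choice of which optional fields to surface are illustrative.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass
class NexMeta:
    provider: str
    cost_usd: Decimal
    request_id: str
    latency_ms: int
    cached_input_tokens: Optional[int] = None
    injection_score: Optional[float] = None
    pii_redaction_total: int = 0


def parse_nex(nex):
    """Convert a raw nex dict into a NexMeta, tolerating absent optional keys."""
    redactions = nex.get("pii_redactions") or {}
    return NexMeta(
        provider=nex["provider"],
        cost_usd=Decimal(str(nex["cost_usd"])),  # accepts a string or a number
        request_id=nex["request_id"],
        latency_ms=int(nex["latency_ms"]),
        cached_input_tokens=nex.get("cached_input_tokens"),
        injection_score=nex.get("injection_score"),
        pii_redaction_total=sum(redactions.values()),
    )
```

Optional fields that are missing or null both end up as `None` (or zero for the redaction total), so callers need only one check.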

List API Keys

GET /v1/keys

Returns all API keys in your account. Key secrets are never returned after creation — only masked prefixes.

Create API Key

POST /v1/keys

Parameter	Type	Required	Description
name	string	required	Human-readable label for this key.
budget_usd	number	optional	Monthly spending cap in USD. Requests return 403 budget_exceeded once the cap is reached.
rpm_limit	integer	optional	Per-key RPM ceiling. Inherits account limit if omitted.
allowed_models	array	optional	Allowlist of model IDs. All models allowed if omitted.
expires_at	string	optional	ISO 8601 expiry timestamp. Key auto-revokes at this time.

⚠️

The full key secret (nxt_sk_…) is returned only once at creation. Store it securely — it cannot be retrieved again.
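
A sketch of assembling the request body for key creation from the parameters in the table. The helper `build_key_request` is hypothetical; normalising `expires_at` to UTC is a design choice here, not a documented requirement.

```python
from datetime import datetime, timedelta, timezone


def build_key_request(name, *, budget_usd=None, rpm_limit=None,
                      allowed_models=None, expires_at=None):
    """Return a JSON-ready body for POST /v1/keys, omitting unset options."""
    if not name:
        raise ValueError("name is required")
    body = {"name": name}
    if budget_usd is not None:
        body["budget_usd"] = budget_usd
    if rpm_limit is not None:
        body["rpm_limit"] = rpm_limit
    if allowed_models is not None:
        body["allowed_models"] = list(allowed_models)
    if expires_at is not None:
        if expires_at.tzinfo is None:
            raise ValueError("expires_at must be timezone-aware")
        # ISO 8601 timestamp, normalised to UTC.
        body["expires_at"] = expires_at.astimezone(timezone.utc).isoformat()
    return body


body = build_key_request(
    "ci-pipeline",
    budget_usd=25,
    allowed_models=["nex-pro"],
    expires_at=datetime.now(timezone.utc) + timedelta(days=30),
)
```

Rejecting naive datetimes up front avoids keys that expire at an unintended local time.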

Revoke API Key

DELETE /v1/keys/{key_id}

Immediately revokes the key. In-flight streaming requests complete within 120 seconds. New requests with this key return 401 Unauthorized immediately.

Wallet Balance

GET /v1/wallet/balance

balance.json json

{
  "balance_usd": 48.32,
  "currency": "USD",
  "loyalty_tier": "silver",
  "billing_tier": "pro",
  "spend_this_month_usd": 201.68
}

Top Up Wallet

POST /v1/wallet/topup

Initiates a top-up via Stripe. Returns a checkout_url to redirect the user for payment. Programmatic top-up (saved card) is available for Business and Enterprise plans.

Usage Summary

GET /v1/usage/summary

Query Param	Type	Description
from	string	ISO 8601 start date. Default: start of current month.
to	string	ISO 8601 end date. Default: now.
group_by	string	`day` · `model` · `provider` · `key`
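
The query parameters above can be assembled with the standard library. In this sketch the helper name `usage_summary_url` is an assumption, and dates are sent as plain ISO 8601 calendar dates.

```python
from datetime import date
from urllib.parse import urlencode

BASE_URL = "https://api.nextoken.biz/v1"
GROUP_BY = {"day", "model", "provider", "key"}


def usage_summary_url(start=None, end=None, group_by=None):
    """Build the GET /usage/summary URL, omitting parameters left at default."""
    params = {}
    if start is not None:
        params["from"] = start.isoformat()
    if end is not None:
        params["to"] = end.isoformat()
    if group_by is not None:
        if group_by not in GROUP_BY:
            raise ValueError(f"unsupported group_by: {group_by!r}")
        params["group_by"] = group_by
    query = urlencode(params)
    return f"{BASE_URL}/usage/summary" + (f"?{query}" if query else "")


print(usage_summary_url(date(2026, 5, 1), date(2026, 5, 31), "model"))
```

Calling it with no arguments returns the bare endpoint, which relies on the documented defaults (start of the current month through now).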

Request Logs

GET /v1/usage/logs

Returns paginated request logs. Retention period depends on your plan: 7 days (Developer), 30 days (Pro), 90 days (Business), 365 days (Enterprise + Extended Audit Logs add-on).

Error Codes

NexToken uses standard HTTP status codes. All error responses include a JSON body with error.code and error.message.

400 bad_request

Malformed request body or invalid parameters.

401 unauthorized

Missing or invalid API key. Key may be revoked.

402 wallet_empty

Wallet balance is zero. Top up to resume requests.

403 budget_exceeded

Per-key monthly budget cap reached.

404 not_found

Resource (key, log entry, etc.) not found.

429 rate_limited

RPM limit exceeded. Check Retry-After header.

502 provider_error

Upstream provider returned an error. Fallback attempted.

503 no_provider

No healthy provider available for this model.
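
A sketch of turning an error response into a typed exception using the `error.code` and `error.message` fields. The class name, the helper, and the choice of which statuses count as retryable (429, 502, 503) are assumptions for illustration rather than documented behaviour.

```python
import json

# Assumption: these statuses are transient and worth retrying.
RETRYABLE_STATUSES = {429, 502, 503}


class NexTokenError(Exception):
    def __init__(self, status, code, message):
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.message = message

    @property
    def retryable(self):
        return self.status in RETRYABLE_STATUSES


def error_from_response(status, body_text):
    """Build a NexTokenError from an HTTP status and a JSON error body."""
    try:
        payload = json.loads(body_text)
    except ValueError:
        payload = {}  # body was not valid JSON
    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict):
        error = {}  # body did not have the documented shape
    return NexTokenError(status, error.get("code", "unknown"), error.get("message", ""))
```

Falling back to `unknown` for malformed bodies means callers can always branch on `status` and `retryable`, even when an intermediary returns a non-JSON error page.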

Rate Limits

Rate limits are enforced per API key using a sliding-window algorithm. The current limit, remaining capacity, and window reset time are returned in the headers of every response.

Rate limit headers HTTP

X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 947
X-RateLimit-Reset: 1718000060
Retry-After: 13  (only on 429 responses)
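
These headers can drive a simple client-side wait calculation. The sketch below assumes that `X-RateLimit-Reset` is a Unix timestamp in seconds (the sample value suggests this) and that `Retry-After` is given in seconds; the function name `seconds_until_retry` is hypothetical, and header names are matched exactly as written above.

```python
def seconds_until_retry(headers, now):
    """Return how long to wait before the next request, in seconds.

    headers: mapping of response header names to string values.
    now: current Unix time in seconds.
    """
    if "Retry-After" in headers:
        return max(0.0, float(headers["Retry-After"]))
    if int(headers.get("X-RateLimit-Remaining", "1")) > 0:
        return 0.0  # capacity left in the current window
    reset = float(headers.get("X-RateLimit-Reset", now))
    return max(0.0, reset - now)
```

Passing `now` explicitly (for example `time.time()`) keeps the function deterministic and easy to test.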

Plan	RPM	Daily Requests	Concurrent Streams
Developer	100	10,000	5
Pro	1,000	100,000	25
Business	10,000	1,000,000	100
Enterprise	Custom	Custom	Custom

SDKs & Libraries

NexToken is compatible with any OpenAI-compatible SDK. Simply point base_url at https://api.nextoken.biz/v1.

Python — pip install openai (official OpenAI SDK, set base_url)
Node.js / TypeScript — npm install openai
Go — github.com/sashabaranov/go-openai
Rust — async-openai crate
LangChain — Use ChatOpenAI with custom openai_api_base
LlamaIndex — Use OpenAI(api_base=...)

A native NexToken SDK with routing-specific features (provider pinning, cost callbacks, routing telemetry) is on the roadmap for Q3 2025.

Changelog

v1.2.0 — June 2025

Added nex_routing.strategy field for per-request routing hints
Added nex.routing_score to response metadata
Fixed: streaming token count now uses provider usage field where available
Fixed: budget:zero Redis flag now permanent (no TTL) — eliminates 1-hour grace period

v1.1.0 — April 2025

Added DeepSeek V3 and Mistral Large 2 support
Added sub-key management endpoints
Added X-RateLimit-* response headers
Improved GST invoice generation — tax base now calculated on post-discount amount

v1.0.0 — January 2025

Initial stable release
OpenAI-compatible Chat Completions endpoint
Provider routing: OpenAI, Anthropic, Google, Meta (Llama)
Wallet top-up via Stripe
