No hidden fees. No vendor lock-in. NexToken routes your requests to the optimal provider — automatically.
nex-auto
Every chat completion now passes upstream prompt-cache savings through automatically.
Near-duplicate prompts bill at 5% of normal retail via our self-hosted semantic cache.
Switch model: "nex-auto" and let the gateway pick the right tier per prompt.
No client code changes required.
nex-auto smart router · batch endpoint 30% off · tokenize + estimate-costNexToken charges provider cost + a small routing markup. Prices per 1,000,000 tokens (1M tokens). Input / Output priced separately.
| Model | Input ($/1M) | Output ($/1M) | Best For | Capabilities |
|---|---|---|---|---|
| nex-pro32k ★ Default | $0.10 | $0.40 | The default Nex model. Chat, code, content, summarisation. Self-hosted Singapore GPU. Strong Chinese + English. ~95% cheaper than GPT-4o | stream tools |
| nex-autosmart | $0.30 | $1.20 | Don't want to pick a model? Network picks per-request between nex-pro / nex-reasoning. Actual target surfaced in nex.smart_router. |
stream tools |
| nex-reasoning128k | $1.20 | $4.80 | Multi-step math, logic, structured analysis. No tool calling. ~90% cheaper than o1 | stream |
| nex-embed-zh512 | $0.01 | — | Chinese-strong embeddings, 1024-dim. Self-hosted, marginal cost ~0. ~50% cheaper than text-embedding-3-small | /v1/embeddings |
nex-smart and nex-coder still work — they are transparent aliases for nex-pro. No code changes needed.
Why NexToken Native? Single stable API, no vendor lock-in, optimised cost-performance. Powered by NexToken's intelligent routing infrastructure. Underlying inference architecture details are proprietary.
| Model | Input ($/1M) | Output ($/1M) | Markup | Capabilities |
|---|---|---|---|---|
| gpt-4o128k | $2.60 | $10.40 | +8% | stream vision tools |
| gpt-4o-mini128k | $0.16 | $0.65 | +8% | stream vision tools |
| o1200k | $16.20 | $64.80 | +8% | tools |
| o3-mini200k | $1.13 | $4.40 | +8% | stream tools |
| Model | Input ($/1M) | Output ($/1M) | Markup | Capabilities |
|---|---|---|---|---|
| claude-opus-4200k | $15.00 | $75.00 | +7% | stream vision tools |
| claude-sonnet-4200k | $3.00 | $15.00 | +7% | stream vision tools |
| claude-haiku-4200k | $0.80 | $4.00 | +7% | stream vision tools |
| Model | Input ($/1M) | Output ($/1M) | Markup | Capabilities |
|---|---|---|---|---|
| gemini-2.5-pro1M | $1.25 | $10.00 | +8% | stream vision tools |
| gemini-2.5-flash1M | $0.075 | $0.30 | +8% | stream vision tools |
| Model | Input ($/1M) | Output ($/1M) | Markup | Capabilities |
|---|---|---|---|---|
| llama-3.3-70b128k | $0.59 | $0.79 | +10% | stream tools |
| deepseek-v364k | $0.27 | $1.10 | +10% | stream tools |
| mistral-large-2128k | $2.00 | $6.00 | +10% | stream tools |
The higher your monthly token spend, the lower your effective markup. Tiers reset on the 1st of each month.
| Tier | Monthly Token Spend | Markup Rate | Effective Saving | Unlocks |
|---|---|---|---|---|
| Developer | $0 – $500 | Standard | — | 3 keys, 20 RPM |
| Pro | $500 – $5,000 | −1% | Up to $50/mo | 200 RPM, analytics |
| Business | $5,000 – $50,000 | −2.5% | Up to $1,250/mo | Custom routing, SLA |
| Enterprise | $50,000+ | Negotiated | Up to 15%+ | Dedicated cluster, custom terms |
* Billing tiers are independent from Loyalty tiers. Billing tiers reflect monthly spend volume; Loyalty tiers reflect cumulative top-up history.
Cumulative top-up milestones unlock permanent wallet bonuses. Tiers do not reset — they track your total lifetime top-up.
Loyalty bonus credits are applied to your wallet at time of top-up. Example: Gold user tops up $1,000 → receives $1,080 wallet balance (+8% bonus). Bonuses do not stack with promotional codes.
Optional extras available on Pro and Business plans.
Detailed breakdown of what's included in each plan.
| Feature | Developer | Pro | Business | Enterprise |
|---|---|---|---|---|
| Limits & Access | ||||
| Monthly free credits | $5 one-time | $10 incl. | $30 incl. | Custom |
| Requests per minute (RPM) | 20 | 200 | 1,000 | Custom |
| API keys | 3 | 20 | Unlimited | Unlimited |
| Sub-keys per key | ✕ | 3 | 10 | Unlimited |
| Context window support | Up to 128k | Up to 200k | Up to 1M | Up to 1M+ |
| Routing & Intelligence | ||||
| Smart auto-routing | ✓ | ✓ | ✓ | ✓ |
| Provider fallback | ✕ | ✓ | ✓ | ✓ |
| Custom routing rules | ✕ | ✕ | ✓ | ✓ |
| Dedicated routing cluster | ✕ | ✕ | ✕ | ✓ |
| Streaming (SSE) | ✓ | ✓ | ✓ | ✓ |
| Budget & Controls | ||||
| Per-key budget limits | ✕ | ✓ | ✓ | ✓ |
| Auto top-up | ✕ | ✓ | ✓ | ✓ |
| Spend alerts | Email only | ✓ | ✓ | ✓ |
| Multi-wallet (teams) | ✕ | ✕ | ✓ | ✓ |
| Observability | ||||
| Request logs retention | 7 days | 30 days | 90 days | 365 days |
| Usage analytics dashboard | Basic | Standard | Advanced | Advanced + export |
| Cost attribution labels | ✕ | ✕ | ✓ | ✓ |
| SLA & Support | ||||
| Uptime SLA | ✕ | ✕ | 99.9% | 99.99% |
| Support channel | Community | Priority email | Dedicated Slack | |
| Response time | Best effort | <48h | <8h | <2h |
| Custom invoicing / PO | ✕ | ✕ | ✕ | ✓ |
Cete Ventures Pte. Ltd. (UEN: 202421160G) is GST-registered in Singapore. Platform subscription fees are subject to 9% GST for Singapore-based customers. Model usage credits for SG GST-registered businesses with a valid GST number may qualify for input tax claim — contact us to provide your registration number. Non-SG customers are invoiced without GST. A valid GST invoice is issued for every transaction.
Join hundreds of developers and teams using NexToken to reduce LLM costs and improve reliability.