
Qwen3 Next 80B A3B Thinking

qwen/qwen3-next-80b-a3b-thinking
Qwen3-Next employs a highly sparse MoE design: 80 billion total parameters, but only approximately 3 billion are activated per inference step. Experiments show that, with global load balancing, increasing the total number of expert parameters while keeping the number of activated experts fixed steadily reduces training loss. Compared to Qwen3's MoE (128 total experts, 8 routed), Qwen3-Next expands to 512 total experts, activating 10 routed experts plus 1 shared expert per token — maximizing resource usage without compromising performance.

Qwen3-Next-80B-A3B-Thinking excels at complex reasoning tasks: it outperforms higher-cost models such as Qwen3-30B-A3B-Thinking-2507 and Qwen3-32B-Thinking, beats the closed-source Gemini-2.5-Flash-Thinking on multiple benchmarks, and approaches the performance of the flagship Qwen3-235B-A22B-Thinking-2507.
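The routing scheme described above can be illustrated with a minimal sketch: a toy router scores all 512 experts per token, keeps the top 10, and always adds the shared expert. This is purely illustrative stdlib Python (the `router` weights and `route` helper are invented for the example), not the actual Qwen3-Next kernel.

```python
import random

NUM_EXPERTS = 512   # total experts in Qwen3-Next's MoE layer
TOP_K = 10          # routed experts activated per token
HIDDEN = 64         # toy hidden size for the sketch

random.seed(0)
# Toy router: one score vector per expert (illustrative only).
router = [[random.gauss(0, 1) for _ in range(HIDDEN)] for _ in range(NUM_EXPERTS)]

def route(token_hidden):
    """Return the expert indices activated for one token: top-10 routed + 1 shared."""
    logits = [sum(w * h for w, h in zip(row, token_hidden)) for row in router]
    routed = sorted(range(NUM_EXPERTS), key=lambda i: logits[i])[-TOP_K:]
    shared = NUM_EXPERTS  # the always-on shared expert
    return sorted(routed) + [shared]

active = route([random.gauss(0, 1) for _ in range(HIDDEN)])
# Only 11 of the 513 expert slots run for this token, which is why just
# ~3B of the 80B parameters are active per inference step.
```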
Price
Input: $0.15 per million tokens
Output: $1.50 per million tokens
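Given these per-million-token rates, the cost of a request follows directly from its token counts. A quick sketch (the `estimate_cost` helper is just for illustration):

```python
# Prices from the table above, converted to dollars per token.
INPUT_PRICE = 0.15 / 1_000_000   # $0.15 per million input tokens
OUTPUT_PRICE = 1.50 / 1_000_000  # $1.50 per million output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated dollar cost of a single request."""
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

# e.g. 10,000 input tokens and 2,000 output tokens:
cost = estimate_cost(10_000, 2_000)  # 0.0015 + 0.0030 = 0.0045 dollars
```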

Use the following code example to integrate our API:

from openai import OpenAI

client = OpenAI(
    api_key="<Your API Key>",
    base_url="https://api.jiekou.ai/openai"
)

response = client.chat.completions.create(
    model="qwen/qwen3-next-80b-a3b-thinking",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello, how are you?"}
    ],
    max_tokens=65536,
    temperature=0.7
)

print(response.choices[0].message.content)

Information

Provider
Quantization
bf16

Supported Features

Context length
65536
Maximum output
65536
Function call
Support
Structured output
Support
Reasoning
Support
Serverless
Support
Input Capabilities
text
Output Capabilities
text
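The function-calling support listed above is exercised through the standard OpenAI-compatible `tools` parameter. The sketch below builds a hypothetical `get_weather` tool schema (not part of the service) to show the expected shape; pass it as `tools=tools` to the `client.chat.completions.create(...)` call from the earlier example, and the model may reply with entries in `response.choices[0].message.tool_calls`.

```python
# Hypothetical tool definition in the OpenAI function-calling format.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    }
]

# Usage: client.chat.completions.create(model=..., messages=..., tools=tools)
```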