
Qwen3 Next 80B A3B Thinking

qwen/qwen3-next-80b-a3b-thinking
Qwen3-Next employs a highly sparse MoE design: 80 billion total parameters, but only approximately 3 billion are activated per inference step. Experiments show that, with global load balancing, increasing the total number of expert parameters while keeping the number of activated experts fixed steadily reduces training loss. Compared to Qwen3's MoE (128 total experts, 8 routed), Qwen3-Next expands to 512 total experts, activating 10 routed experts plus 1 shared expert per token — maximizing resource usage without compromising performance.

Qwen3-Next-80B-A3B-Thinking excels at complex reasoning tasks: it outperforms higher-cost models such as Qwen3-30B-A3B-Thinking-2507 and Qwen3-32B-Thinking, beats the closed-source Gemini-2.5-Flash-Thinking on multiple benchmarks, and approaches the performance of the flagship Qwen3-235B-A22B-Thinking-2507.
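The routing scheme described above can be illustrated with a minimal sketch: a toy router scores all 512 experts per token, keeps the top 10, and always adds the shared expert. This is purely illustrative stdlib Python (the `router` weights and `route` helper are invented for the example), not the actual Qwen3-Next kernel.

```python
import random

NUM_EXPERTS = 512   # total experts in Qwen3-Next's MoE layer
TOP_K = 10          # routed experts activated per token
HIDDEN = 64         # toy hidden size for the sketch

random.seed(0)
# Toy router: one score vector per expert (illustrative only).
router = [[random.gauss(0, 1) for _ in range(HIDDEN)] for _ in range(NUM_EXPERTS)]

def route(token_hidden):
    """Return the expert indices activated for one token: top-10 routed + 1 shared."""
    logits = [sum(w * h for w, h in zip(row, token_hidden)) for row in router]
    routed = sorted(range(NUM_EXPERTS), key=lambda i: logits[i])[-TOP_K:]
    shared = NUM_EXPERTS  # the always-on shared expert
    return sorted(routed) + [shared]

active = route([random.gauss(0, 1) for _ in range(HIDDEN)])
# Only 11 of the 513 expert slots run for this token, which is why just
# ~3B of the 80B parameters are active per inference step.
```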
Price
Input: $0.15 per million tokens
Output: $1.50 per million tokens
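Given these per-million-token rates, the cost of a request follows directly from its token counts. A quick sketch (the `estimate_cost` helper is just for illustration):

```python
# Prices from the table above, converted to dollars per token.
INPUT_PRICE = 0.15 / 1_000_000   # $0.15 per million input tokens
OUTPUT_PRICE = 1.50 / 1_000_000  # $1.50 per million output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated dollar cost of a single request."""
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

# e.g. 10,000 input tokens and 2,000 output tokens:
cost = estimate_cost(10_000, 2_000)  # 0.0015 + 0.0030 = 0.0045 dollars
```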

Use the following code example to integrate our API:

from openai import OpenAI

client = OpenAI(
    api_key="<Your API Key>",
    base_url="https://api.jiekou.ai/openai"
)

response = client.chat.completions.create(
    model="qwen/qwen3-next-80b-a3b-thinking",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello, how are you?"}
    ],
    max_tokens=65536,
    temperature=0.7
)

print(response.choices[0].message.content)

Information

Provider
Quantization
bf16

Supported Features

Context length
65536
Maximum output
65536
Function call
Support
Structured output
Support
Reasoning
Support
Serverless
Support
Input Capabilities
text
Output Capabilities
text
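The function-calling support listed above is exercised through the standard OpenAI-compatible `tools` parameter. The sketch below builds a hypothetical `get_weather` tool schema (not part of the service) to show the expected shape; pass it as `tools=tools` to the `client.chat.completions.create(...)` call from the earlier example, and the model may reply with entries in `response.choices[0].message.tool_calls`.

```python
# Hypothetical tool definition in the OpenAI function-calling format.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    }
]

# Usage: client.chat.completions.create(model=..., messages=..., tools=tools)
```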