qwen/qwen3-next-80b-a3b-instruct

Qwen3 Next 80B A3B Instruct

Qwen3-Next uses a highly sparse MoE architecture: of its 80 billion total parameters, only about 3 billion are activated per inference step. Experiments show that, with global load balancing, increasing the total number of expert parameters while keeping the number of activated experts fixed steadily reduces training loss. Compared with Qwen3's MoE (128 total experts, 8 routed), Qwen3-Next expands to 512 total experts, combining 10 routed experts and 1 shared expert, which maximizes resource usage without compromising performance. Qwen3-Next-80B-A3B-Instruct performs comparably to the Qwen team's flagship model, Qwen3-235B-A22B-Instruct-2507, and shows clear advantages on tasks requiring ultra-long context (up to 256K tokens).
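
To make the sparse-activation idea concrete, here is a minimal sketch of top-k routing with one always-on shared expert. It is an illustration only, not Qwen3-Next's actual implementation: the softmax-then-top-k gating, the renormalization of the selected weights, the random router logits, and the names `route_token`, `NUM_EXPERTS`, and `TOP_K` are all assumptions. Only the counts (512 experts, 10 routed, 1 shared) come from the description above.

```python
import math
import random

NUM_EXPERTS = 512   # total experts (from the description above)
TOP_K = 10          # routed experts activated per token (from the description above)


def route_token(router_logits, top_k=TOP_K):
    """Pick the top_k experts for one token and return (expert_id, weight) pairs.

    Assumed gating scheme: softmax over all logits, keep the top_k
    probabilities, then renormalize them so they sum to 1.
    """
    # Numerically stable softmax over all experts.
    peak = max(router_logits)
    exps = [math.exp(x - peak) for x in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Indices of the top_k highest-probability experts.
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]

    # Renormalize the selected weights.
    mass = sum(probs[i] for i in chosen)
    return [(i, probs[i] / mass) for i in chosen]


# Hypothetical router output for a single token.
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
routed = route_token(logits)

# The shared expert runs for every token, in addition to the routed ones.
experts_run = len(routed) + 1
print(f"experts run for this token: {experts_run}")
```

Only the selected experts are evaluated for a given token, which is how the model keeps roughly 3 billion of its 80 billion parameters active per step.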
Price
Input: $0.15 per million tokens
Output: $1.50 per million tokens
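
As a rough guide, the listed rates translate into a per-request cost as sketched below. The helper name `estimate_cost_usd` is hypothetical, and the calculation assumes billing is strictly linear in token counts, with no minimum charge or discount (the page does not say either way).

```python
INPUT_USD_PER_MILLION = 0.15   # from the price list above
OUTPUT_USD_PER_MILLION = 1.5   # from the price list above


def estimate_cost_usd(input_tokens, output_tokens):
    """Return the estimated USD cost of one request at the listed rates."""
    input_cost = input_tokens / 1_000_000 * INPUT_USD_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_USD_PER_MILLION
    return input_cost + output_cost


# Hypothetical request: 20,000 prompt tokens and 2,000 completion tokens.
cost = estimate_cost_usd(20_000, 2_000)
print(f"${cost:.4f}")  # prints $0.0060
```

Because output tokens cost ten times as much as input tokens, the 2,000 completion tokens in this example account for half of the total.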

Use the following code example to integrate our API:

from openai import OpenAI

# Point the OpenAI SDK at the OpenAI-compatible endpoint.
client = OpenAI(
    api_key="<Your API Key>",
    base_url="https://api.jiekou.ai/openai"
)

response = client.chat.completions.create(
    model="qwen/qwen3-next-80b-a3b-instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello, how are you?"}
    ],
    max_tokens=65536,  # the maximum output listed for this model
    temperature=0.7
)

# Print the assistant's reply.
print(response.choices[0].message.content)
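
The example above requests 65536 output tokens, which equals the context length listed on this page. If the prompt and the completion share that window (an assumption; the page does not say how the limit is enforced), the room left for output shrinks as the prompt grows. A minimal sketch of that arithmetic, with `safe_max_tokens` as a hypothetical helper and the prompt token count supplied by the caller:

```python
CONTEXT_LENGTH = 65536   # context length listed for this model
MAX_OUTPUT = 65536       # maximum output listed for this model


def safe_max_tokens(prompt_tokens, requested=MAX_OUTPUT):
    """Clamp a max_tokens request so prompt + completion fit in the context.

    Assumes the prompt and the completion share one context window.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    remaining = CONTEXT_LENGTH - prompt_tokens
    if remaining <= 0:
        raise ValueError("prompt already fills the context window")
    return min(requested, MAX_OUTPUT, remaining)


# Hypothetical prompt of 1,500 tokens.
print(safe_max_tokens(1_500))  # prints 64036
```

The returned value can be passed as `max_tokens` in place of the fixed 65536.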

Information

Provider
Quantization: bf16

Supported Features

Context length: 65536 tokens
Maximum output: 65536 tokens
Function calling: Supported
Structured output: Supported
Serverless: Supported
Input capabilities: text
Output capabilities: text
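
Since function calling is listed as supported, a request with a tool definition might look like the sketch below. It assumes the endpoint accepts the OpenAI-style `tools` parameter through the same kind of `client` object created in the integration example; the page does not document the exact request format. The `get_weather` tool, its schema, and the helper names `build_tool_request` and `requested_tool_calls` are hypothetical.

```python
import json

# Hypothetical tool definition in the OpenAI-style "tools" format.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def build_tool_request(user_message):
    """Build keyword arguments for client.chat.completions.create."""
    return {
        "model": "qwen/qwen3-next-80b-a3b-instruct",
        "messages": [{"role": "user", "content": user_message}],
        "tools": [WEATHER_TOOL],
    }


def requested_tool_calls(client, user_message):
    """Send the request and return (tool_name, arguments) pairs, if any."""
    response = client.chat.completions.create(**build_tool_request(user_message))
    calls = response.choices[0].message.tool_calls or []
    # Arguments arrive as a JSON string and are decoded here.
    return [(c.function.name, json.loads(c.function.arguments)) for c in calls]
```

Calling `requested_tool_calls(client, "What is the weather in Paris?")` with a configured client would return the tool calls the model asked for, or an empty list if it answered directly. Running the tool and sending its result back to the model is left to the caller.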