Qwen3 Next 80B A3B Instruct

qwen/qwen3-next-80b-a3b-instruct

Qwen3-Next uses a highly sparse mixture-of-experts (MoE) architecture: it has 80 billion total parameters, but only about 3 billion are activated per inference step. Experiments show that, with global load balancing, increasing the total number of expert parameters while keeping the number of activated experts fixed steadily reduces training loss. Compared with Qwen3's MoE (128 total experts, 8 routed), Qwen3-Next expands to 512 total experts and activates 10 routed experts plus 1 shared expert, maximizing resource utilization without compromising performance. Qwen3-Next-80B-A3B-Instruct performs comparably to Qwen's flagship model Qwen3-235B-A22B-Instruct-2507 and shows clear advantages on tasks that require ultra-long context (up to 256K tokens).
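The per-token expert selection described above can be sketched in a few lines. This is a simplified illustration, not Qwen3-Next's actual implementation: it assumes a plain top-k pick over 512 router logits, a softmax over only the selected logits, and one always-on shared expert. The real router's normalization and the shared expert's gating may differ, and `route_token` is a hypothetical helper.

```python
import math
import random

NUM_EXPERTS = 512  # routed experts (from the model description)
TOP_K = 10         # routed experts activated per token
NUM_SHARED = 1     # shared expert that every token always uses


def route_token(router_logits, top_k=TOP_K):
    """Pick the top_k experts for one token and return (expert_id, weight) pairs.

    Simplified sketch: the softmax is taken over the selected logits only,
    so the returned weights sum to 1.
    """
    if len(router_logits) < top_k:
        raise ValueError("need at least top_k router logits")
    # Indices of the top_k largest logits, highest first.
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:top_k]
    # Numerically stable softmax over the selected logits.
    m = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
    chosen = route_token(logits)
    # Routed experts plus the shared expert actually run for this token.
    print(f"experts run for this token: {len(chosen) + NUM_SHARED}")
```

With these constants each token runs 11 expert networks (10 routed plus 1 shared) regardless of how many experts exist in total, which is the mechanism that keeps the activated parameter count far below the total parameter count.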

Price

Input	$0.15 per million tokens
Output	$1.50 per million tokens
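To estimate what a request costs at these rates, multiply each token count by its per-million price. This sketch assumes billing is strictly linear in tokens, with no minimum charge, caching discount, or rounding on the provider's side; `request_cost` is a hypothetical helper.

```python
INPUT_PRICE_PER_M = 0.15   # USD per million input tokens (from the price table)
OUTPUT_PRICE_PER_M = 1.50  # USD per million output tokens (from the price table)


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request at the listed per-token prices."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


# Example: a 20,000-token prompt with a 2,000-token reply.
print(f"${request_cost(20_000, 2_000):.4f}")  # prints $0.0060
```

Because output tokens cost ten times as much as input tokens, long replies dominate the bill even when the prompt is much larger.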

Use the following code example to integrate our API:

```python
from openai import OpenAI

# OpenAI-compatible client pointed at the provider's endpoint.
client = OpenAI(
    api_key="<Your API Key>",
    base_url="https://api.highwayapi.ai/openai"
)

response = client.chat.completions.create(
    model="qwen/qwen3-next-80b-a3b-instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello, how are you?"}
    ],
    # Keep prompt tokens + max_tokens within the 65,536-token context length,
    # so request less than the full window.
    max_tokens=4096,
    temperature=0.7
)

# The generated reply is in the first choice.
print(response.choices[0].message.content)
```
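When choosing `max_tokens`, the prompt and the completion have to share the context window listed on this page. The sketch below assumes the server rejects requests where prompt tokens plus `max_tokens` exceed the context length, as many OpenAI-compatible servers do. `completion_budget` is a hypothetical helper, and the prompt token count must come from the model's tokenizer (including chat-template overhead), which is not shown here.

```python
CONTEXT_LENGTH = 65_536  # "Context length" listed on this page
MAX_OUTPUT = 65_536      # "Maximum output" listed on this page


def completion_budget(prompt_tokens: int, desired: int = MAX_OUTPUT) -> int:
    """Largest max_tokens that keeps prompt + completion inside the context window."""
    room = CONTEXT_LENGTH - prompt_tokens
    if room <= 0:
        raise ValueError(f"prompt of {prompt_tokens} tokens leaves no room for output")
    return min(desired, MAX_OUTPUT, room)


print(completion_budget(1_200))         # 64336
print(completion_budget(1_200, 4_096))  # 4096
```

Capping at the smallest of the requested value, the maximum output, and the remaining room avoids a rejected request when the prompt is long.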

Information

Provider	Alibaba
Quantization	bf16

Supported Features

Context length	65,536 tokens
Maximum output	65,536 tokens

Function calling	Supported
Structured output	Supported
Serverless	Supported

Input capabilities	Text
Output capabilities	Text
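Function calling is listed as supported. The sketch below assumes the provider accepts the standard OpenAI Chat Completions `tools` parameter and returns calls in `message.tool_calls`; it shows only the local side: a tool schema and a dispatcher that executes a returned call. `get_weather`, `REGISTRY`, and `run_tool_call` are hypothetical, and the weather data is stubbed so the example runs without network access.

```python
import json


def get_weather(city: str, unit: str = "celsius") -> dict:
    """Hypothetical local tool; returns stubbed data."""
    return {"city": city, "temperature": 22, "unit": unit}


# Tool definitions in the OpenAI Chat Completions "tools" format.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["city"],
            },
        },
    }
]

REGISTRY = {"get_weather": get_weather}


def run_tool_call(name: str, arguments_json: str) -> str:
    """Execute one tool call and return its result as JSON text.

    In a real request, name and arguments_json would come from
    response.choices[0].message.tool_calls[i].function.name / .arguments
    after client.chat.completions.create(..., tools=TOOLS).
    """
    if name not in REGISTRY:
        raise KeyError(f"unknown tool: {name}")
    args = json.loads(arguments_json)
    return json.dumps(REGISTRY[name](**args))


# Simulated tool call from the model.
print(run_tool_call("get_weather", '{"city": "Hangzhou"}'))
```

In a full exchange, the returned string would be appended to `messages` as a `{"role": "tool", "tool_call_id": ..., "content": ...}` message and the conversation sent back to the model so it can write its final answer.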