qwen/qwen2.5-vl-72b-instruct

Qwen2.5 VL 72B Instruct

Qwen2.5-VL, the latest vision-language model in the Qwen2.5 series, offers enhanced multimodal capabilities. It recognizes objects and text in images, analyzes charts and page layouts, and can act as an agent that dynamically orchestrates tools. It processes long videos (over 1 hour) and detects key events within them, and it can localize content precisely by outputting bounding boxes or coordinate points. The model specializes in extracting structured data from scanned documents such as invoices and tables, and achieves state-of-the-art results on multimodal benchmarks covering image understanding, temporal video analysis, and agent tasks.
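Bounding boxes and coordinates come back as text, so client code has to parse them. The sketch below assumes the model has been prompted to reply with a JSON array of objects like {"bbox_2d": [x1, y1, x2, y2], "label": "..."} in pixel coordinates, possibly wrapped in a json code fence. That output format, and the names Box and parse_boxes, are illustrative assumptions, not a documented contract of this API.

```python
import json
import re
from dataclasses import dataclass


@dataclass
class Box:
    """One labeled bounding box (hypothetical structure for illustration)."""
    label: str
    x1: int
    y1: int
    x2: int
    y2: int


# Matches an optional fenced json block; `{3} stands for three backticks.
FENCE = re.compile(r"`{3}(?:json)?\s*(.*?)`{3}", re.DOTALL)


def parse_boxes(reply: str) -> list[Box]:
    """Extract boxes from a reply, assuming the JSON shape described above."""
    match = FENCE.search(reply)
    # Use the fenced payload if present, otherwise treat the whole reply as JSON.
    payload = match.group(1) if match else reply
    boxes = []
    for item in json.loads(payload):
        x1, y1, x2, y2 = (int(v) for v in item["bbox_2d"])
        boxes.append(Box(item.get("label", ""), x1, y1, x2, y2))
    return boxes
```

If the model returns a different key or coordinate convention, only the loop body needs to change.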
Price
Input: $0.8 per million tokens
Output: $0.8 per million tokens
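Because input and output are priced the same, a request costs its total token count times $0.8 / 1,000,000. A minimal sketch, assuming billing follows exactly the listed per-token prices with no minimum charge or separate per-image fee (estimate_cost is a hypothetical helper):

```python
# Listed prices in USD per million tokens.
INPUT_PRICE_PER_M = 0.8
OUTPUT_PRICE_PER_M = 0.8


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request from its token counts."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000
```

For example, 10,000 input tokens and 2,000 output tokens come to 12,000 × 0.8 / 1,000,000 = $0.0096. With the OpenAI Python SDK, the counts for a finished request are available as response.usage.prompt_tokens and response.usage.completion_tokens.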

Use the following code example to integrate our API:

```python
from openai import OpenAI

# OpenAI-compatible client pointed at the jiekou.ai endpoint
client = OpenAI(
    api_key="<Your API Key>",
    base_url="https://api.jiekou.ai/openai"
)

response = client.chat.completions.create(
    model="qwen/qwen2.5-vl-72b-instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello, how are you?"}
    ],
    max_tokens=32768,  # upper limit on the number of generated tokens
    temperature=0.7    # sampling temperature; lower values are more deterministic
)

# Print the text of the first completion choice
print(response.choices[0].message.content)
```
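The model also accepts image input. The sketch below assumes the endpoint accepts the OpenAI-style multimodal message format, in which a user message's content is a list of image_url and text parts, and that base64 data URLs are allowed for local files. Neither detail is confirmed on this page, and the helper names are hypothetical.

```python
import base64
import mimetypes
from pathlib import Path

MODEL = "qwen/qwen2.5-vl-72b-instruct"


def bytes_to_data_url(data: bytes, mime: str) -> str:
    """Wrap raw bytes in a base64 data URL."""
    return f"data:{mime};base64,{base64.b64encode(data).decode('ascii')}"


def image_to_data_url(path: str) -> str:
    """Encode a local image file as a data URL (guessing the MIME type from its extension)."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"not a recognized image type: {path}")
    return bytes_to_data_url(Path(path).read_bytes(), mime)


def build_vision_request(prompt: str, image_url: str, max_tokens: int = 1024) -> dict:
    """Return keyword arguments for client.chat.completions.create()."""
    return {
        "model": MODEL,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": image_url}},
                    {"type": "text", "text": prompt},
                ],
            }
        ],
        "max_tokens": max_tokens,
        "temperature": 0.7,
    }
```

Usage, with the client from the previous example and invoice.png as a placeholder path: `response = client.chat.completions.create(**build_vision_request("Extract the line items as JSON.", image_to_data_url("invoice.png")))`.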

Information

Provider
Quantization
bf16

Supported Features

Context length
32768
Maximum output
32768
Serverless
Supported
Input Capabilities
text, image, video
Output Capabilities
text
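The context length and maximum output are both listed as 32768 tokens. If, as is typical, the context window must hold the prompt and the completion together (an assumption; this page does not say so), then max_tokens has to shrink as the prompt grows. A minimal sketch of that budget check, where max_tokens_for is a hypothetical helper:

```python
CONTEXT_LENGTH = 32_768  # listed context length
MAX_OUTPUT = 32_768      # listed maximum output


def max_tokens_for(prompt_tokens: int, desired: int = MAX_OUTPUT) -> int:
    """Largest max_tokens that keeps prompt + output inside the context window."""
    if prompt_tokens >= CONTEXT_LENGTH:
        raise ValueError("prompt alone fills the context window")
    return min(desired, MAX_OUTPUT, CONTEXT_LENGTH - prompt_tokens)
```

Under this assumption, a 1,000-token prompt leaves room for at most 31,768 output tokens, so a request asking for the full 32768 might be rejected or truncated depending on how the server enforces the limit.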