ZhipuAI
GLM 4.5V
Z.ai's GLM-4.5V sets a new standard in visual reasoning, achieving state-of-the-art performance among open-source models across 42 benchmarks. Beyond benchmarks, hybrid training enables it to excel in real-world applications, with comprehensive visual understanding spanning image and video analysis, GUI interaction, complex document processing, and precise visual grounding. In China's GeoGuessr challenge, GLM-4.5V outperformed 99% of 21,000 human players within 16 hours and reached 66th place within a week. Built on the GLM-4.5-Air foundation and adopting the approach of GLM-4.1V-Thinking, it uses a 106-billion-parameter MoE architecture for scalable, efficient performance. This model bridges advanced AI research with practical deployment, delivering unmatched visual intelligence.
GLM-4.5
The GLM-4.5 series models are foundation models specifically engineered for intelligent agents. The flagship GLM-4.5 integrates 355 billion total parameters (32 billion active), unifying reasoning, coding, and agent capabilities to address complex application demands. As a hybrid reasoning system, it offers two operational modes:

- Thinking Mode: enables complex reasoning, tool invocation, and strategic planning
- Non-Thinking Mode: delivers low-latency responses for real-time interactions

This architecture bridges high-performance AI with adaptive functionality for dynamic agent environments.
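The mode switch above can be sketched as a request-payload toggle. This is a minimal illustration only: the `thinking` parameter name, its values, and the `glm-4.5` model id are assumptions modeled on common hybrid-reasoning chat APIs, not confirmed API details.

```python
# Minimal sketch: toggling GLM-4.5 between Thinking and Non-Thinking
# Mode in a chat-completion payload. The "thinking" field and the
# model id are assumptions for illustration, not documented values.

def build_request(prompt: str, thinking: bool) -> dict:
    """Build a chat payload, selecting the reasoning mode."""
    return {
        "model": "glm-4.5",  # hypothetical model id
        "messages": [{"role": "user", "content": prompt}],
        # Thinking Mode for complex reasoning and planning;
        # Non-Thinking Mode for low-latency real-time replies.
        "thinking": {"type": "enabled" if thinking else "disabled"},
    }

deep = build_request("Plan a multi-step data migration.", thinking=True)
fast = build_request("What's the capital of France?", thinking=False)
print(deep["thinking"]["type"], fast["thinking"]["type"])
```

In practice an application would send the fast variant for interactive chat and the deep variant for agentic planning steps.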
GLM-5
GLM-5 is an open-source foundation model engineered for complex system engineering and long-horizon agent tasks, delivering reliable productivity for top-tier programmers. Transcending the boundary between "writing code" and "building systems," it goes beyond traditional snippet generation to offer senior-architect-level planning and execution. Rejecting the "frontend-heavy, logic-light" approach, GLM-5 demonstrates exceptional reasoning and self-healing in backend refactoring, complex algorithm implementation, and deep debugging, autonomously analyzing logs and iteratively fixing persistent bugs until the system runs correctly. As the first open-source model with Opus-class style and system-engineering depth, GLM-5 combines high logic density with the freedom of local deployment and strong cost-effectiveness, making it well suited to large-scale backend development and automated agent construction.
GLM-OCR
GLM-OCR is a lightweight professional OCR model with only 0.9 billion parameters, achieving state-of-the-art performance with a score of 94.62 on OmniDocBench V1.5. Optimized for real-world business scenarios, it delivers high-precision recognition for handwritten text, stamps, and code documentation. The model supports the direct rendering of complex tables into HTML code and the extraction of structured data from IDs and receipts into standard JSON format, enabling high-accuracy document parsing with minimal resource consumption.
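Downstream code typically validates the structured JSON described above before use. The sketch below shows one way to do that for a receipt; the field names (`merchant`, `items`, `total`, and so on) are hypothetical examples of the extracted schema, not a documented response format.

```python
import json

# Illustrative sketch of consuming GLM-OCR's structured JSON output
# for a receipt. All field names below are hypothetical examples of
# an extraction schema, not the model's documented format.

raw = json.dumps({
    "doc_type": "receipt",
    "merchant": "Example Cafe",
    "date": "2025-01-15",
    "items": [
        {"name": "Latte", "price": 4.50},
        {"name": "Croissant", "price": 3.25},
    ],
    "total": 7.75,
})

parsed = json.loads(raw)

# Sanity-check the extraction: line items should sum to the total.
line_sum = round(sum(item["price"] for item in parsed["items"]), 2)
consistent = line_sum == parsed["total"]
print(parsed["merchant"], parsed["total"], consistent)
```

A check like this lets a pipeline flag receipts where OCR misread a price before the data enters an accounting system.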
GLM-4.7-Flash
GLM-4.7-Flash, a state-of-the-art model in the 30B class, offers an impressive balance of high performance and efficiency. Designed specifically for Agentic Coding, it enhances coding proficiency, long-horizon planning, and tool synergy, achieving top-tier results on public benchmarks among open-source models of comparable size. It excels at complex agent tasks with superior instruction following for tool usage, while significantly improving the frontend aesthetics and completion efficiency of long-range workflows in Artifacts and Agentic Coding.
GLM-4.7
GLM-4.7 is Z.AI's latest flagship model, featuring major upgrades focused on advanced coding capabilities and more reliable multi-step reasoning and execution. It demonstrates significant improvements in complex agent workflows, while delivering a more natural conversational experience and stronger front-end design sensibility.
GLM Image Generation
The GLM series offers reliable generation capabilities designed for production environments, prioritizing stability and predictable output. It is suitable for general-purpose content generation and tool integration, making it easy to incorporate into production workflows. The real-time inference API delivers consistent performance with no waiting time at an affordable price.
GLM Voice Clone
The GLM series offers reliable synthesis capabilities designed for production environments, prioritizing stability and predictable output. Voice cloning and speech synthesis support multiple languages and emotional tone control, making the series suitable for voiceovers, announcements, customer service, and character dialogue. The real-time inference API delivers stable performance with no waiting time at an affordable price.
GLM Audio to Text
The GLM series offers reliable transcription capabilities designed for production environments, prioritizing stability and controllable output. Its speech-to-text capabilities are well suited to transcribing meeting and customer-service recordings, with stable recognition and timeline output even in noisy environments. The real-time inference API delivers consistent performance with no waiting time at an affordable price.
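The "timeline output" mentioned above pairs each transcribed phrase with timestamps. The sketch below turns such segments into readable, timestamped lines; the segment schema (`start`, `end`, `text`) is a hypothetical example, not the documented response format.

```python
# Sketch of rendering timestamped transcription segments (the
# "timeline output" described above) as readable lines. The segment
# schema is a hypothetical example, not a documented API response.

segments = [
    {"start": 0.0, "end": 2.4, "text": "Welcome to the meeting."},
    {"start": 2.4, "end": 5.1, "text": "First item: quarterly results."},
]

def fmt(t: float) -> str:
    """Format seconds as MM:SS.s for display."""
    m, s = divmod(t, 60)
    return f"{int(m):02d}:{s:04.1f}"

lines = [
    f"[{fmt(seg['start'])}-{fmt(seg['end'])}] {seg['text']}"
    for seg in segments
]
print("\n".join(lines))
```

The same segment data could just as easily be rendered as SRT subtitles or indexed for search.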
GLM Text-to-Speech
The GLM series offers reliable synthesis capabilities designed for production environments, prioritizing stability and predictable output. Speech synthesis supports multiple languages and emotional tone control, making it suitable for voiceovers, announcements, customer service, and character dialogue. The real-time inference API delivers stable performance with no waiting time at an affordable price.
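A request exercising the language and emotion controls described above might be built like this. Every parameter name here (`language`, `voice`, `emotion`, `response_format`) and the model id are assumptions for illustration, not documented API fields.

```python
# Sketch of a text-to-speech request payload using the multilingual
# and emotional-tone controls described above. All parameter names
# and the model id are hypothetical, chosen for illustration only.

def build_tts_request(text: str, language: str, emotion: str) -> dict:
    """Build a TTS payload for one utterance."""
    return {
        "model": "glm-tts",        # hypothetical model id
        "input": text,
        "language": language,      # e.g. "en", "zh"
        "voice": "narrator",       # hypothetical preset voice
        "emotion": emotion,        # e.g. "calm", "excited"
        "response_format": "mp3",
    }

req = build_tts_request("Your order has shipped.", "en", "calm")
print(req["language"], req["emotion"], req["response_format"])
```

A customer-service integration would vary `emotion` per message type, e.g. "calm" for notices and "excited" for promotions.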