ai/qwen3-vllm

Qwen3 is the latest Qwen LLM, built for top-tier coding, math, reasoning, and language tasks.

ai/qwen3-vllm repository overview

Qwen3 logo

Qwen3 is the latest generation in the Qwen LLM family, designed for top-tier performance in coding, math, reasoning, and language tasks. It includes both dense and Mixture-of-Experts (MoE) models, offering flexible deployment from lightweight apps to large-scale research.

Qwen3 introduces dual reasoning modes—"thinking" for complex tasks and "non-thinking" for fast responses—giving users dynamic control over performance. It outperforms prior models in reasoning, instruction following, and code generation, while excelling in creative writing and dialogue.

With strong agentic and tool-use capabilities and support for over 100 languages, Qwen3 is optimized for multilingual, multi-domain applications.


📌 Characteristics

| Attribute | Value |
|---|---|
| Provider | Alibaba Cloud |
| Architecture | qwen3 |
| Cutoff date | April 2025 (est.) |
| Languages | 119 languages from multiple families (Indo-European, Sino-Tibetan, Afro-Asiatic, Austronesian, Dravidian, Turkic, Tai-Kadai, Uralic, Austroasiatic), including others such as Japanese, Basque, Haitian, ... |
| Tool calling | Yes |
| Input modalities | Text |
| Output modalities | Text |
| License | Apache 2.0 |

🧠 Intended uses

Qwen3-8B is designed for a wide range of advanced natural language processing tasks:

  • Supports both Dense and Mixture-of-Experts (MoE) model architectures, available in sizes including 0.6B, 1.7B, 4B, 8B, 14B, 32B, and large MoE variants like 30B-A3B and 235B-A22B.
  • Enables seamless switching between thinking and non-thinking modes (a minimal sketch follows this list):
    • Thinking mode: optimized for complex logical reasoning, math, and code generation.
    • Non-thinking mode: tuned for efficient, general-purpose dialogue and chat.
  • Offers significant improvements in reasoning performance, outperforming previous QwQ (in thinking mode) and Qwen2.5-Instruct (in non-thinking mode) models on mathematics, code generation, and commonsense reasoning benchmarks.
  • Delivers superior human alignment and excels at creative writing, role-playing, multi-turn dialogue, and instruction following in immersive conversations.
  • Provides strong agent capabilities, including integration with external tools, with best-in-class performance in complex agent-based workflows in both thinking and non-thinking modes.
  • Offers support for 100+ languages and dialects, with robust multilingual instruction following and translation abilities.
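Per-request control over the thinking and non-thinking modes can be exercised through the OpenAI-compatible API that Docker Model Runner exposes. The sketch below is illustrative only: the endpoint URL and port, the model name passed to the API, and the chat_template_kwargs.enable_thinking field (a vLLM chat-template option for Qwen3) are assumptions to adjust for your deployment.

# Minimal sketch: toggling Qwen3's thinking mode per request.
# Assumptions: the Model Runner's OpenAI-compatible endpoint is reachable at
# localhost:12434 and the vLLM backend honors chat_template_kwargs.enable_thinking.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:12434/engines/v1", api_key="not-needed")

# Thinking mode: let the model reason step by step before answering.
reasoned = client.chat.completions.create(
    model="ai/qwen3-vllm",
    messages=[{"role": "user", "content": "Prove that the sum of two odd numbers is even."}],
    extra_body={"chat_template_kwargs": {"enable_thinking": True}},
)

# Non-thinking mode: fast, direct answers for everyday chat.
direct = client.chat.completions.create(
    model="ai/qwen3-vllm",
    messages=[{"role": "user", "content": "Summarize Qwen3 in one sentence."}],
    extra_body={"chat_template_kwargs": {"enable_thinking": False}},
)

print(reasoned.choices[0].message.content)
print(direct.choices[0].message.content)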

Considerations

  • Thinking Mode Switching
    Qwen3 supports a soft switch mechanism via /think and /no_think prompts (when enable_thinking=True). This allows dynamic control over the model's reasoning depth during multi-turn conversations.
  • Tool Calling with Qwen-Agent
    For agentic tasks, use Qwen-Agent, which simplifies integration of external tools through built-in templates and parsers, minimizing the need for manual tool-call handling.
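If you are not using Qwen-Agent, tool calling can also be driven directly through the standard OpenAI-style tools schema. The sketch below is a minimal illustration under the same assumptions as above (endpoint URL and model name); the get_weather function is purely hypothetical.

# Sketch: direct tool calling via the OpenAI-compatible API.
# The get_weather tool is hypothetical; in a real application you would execute
# the returned tool call and send its result back as a "tool" message.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:12434/engines/v1", api_key="not-needed")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="ai/qwen3-vllm",
    messages=[{"role": "user", "content": "What's the weather in Lisbon right now?"}],
    tools=tools,
)

# If the model chose to call the tool, the structured call appears here.
print(response.choices[0].message.tool_calls)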

Note: Qwen3 models use a new naming convention: post-trained models no longer include the -Instruct suffix (e.g., Qwen3-32B replaces Qwen2.5-32B-Instruct), and base models now end with -Base.


🐳 Using this model with Docker Model Runner

First, pull the model:

docker model pull ai/qwen3-vllm

Then run the model:

docker model run ai/qwen3-vllm

For more information, check out the Docker Model Runner docs.
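Once the model is running, applications can reach it through the Model Runner's OpenAI-compatible API. A minimal sketch follows, assuming the endpoint is exposed on the default host port (localhost:12434); check the Docker Model Runner docs for the exact URL in your setup.

# Minimal sketch: chatting with the pulled model from Python.
# The base_url below is an assumption; no real API key is required locally.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:12434/engines/v1", api_key="not-needed")

reply = client.chat.completions.create(
    model="ai/qwen3-vllm",
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Write a short haiku about containers."},
    ],
)

print(reply.choices[0].message.content)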


Benchmarks

| Category | Benchmark | Qwen3 |
|---|---|---|
| General Tasks | MMLU | 87.81 |
| General Tasks | MMLU-Redux | 87.40 |
| General Tasks | MMLU-Pro | 68.18 |
| General Tasks | SuperGPQA | 44.06 |
| General Tasks | BBH | 88.87 |
| Mathematics & Science Tasks | GPQA | 47.47 |
| Mathematics & Science Tasks | GSM8K | 94.39 |
| Mathematics & Science Tasks | MATH | 71.84 |
| Multilingual Tasks | MGSM | 83.53 |
| Multilingual Tasks | MMMLU | 86.70 |
| Multilingual Tasks | INCLUDE | 73.46 |
| Code Tasks | EvalPlus | 77.60 |
| Code Tasks | MultiPL-E | 65.94 |
| Code Tasks | MBPP | 81.40 |
| Code Tasks | CRUX-O | 79.00 |

Tag summary

Content type: Model
Digest: sha256:e57fe40c0
Size: 15.3 GB
Last updated: 4 months ago

docker model pull ai/qwen3-vllm:8B
