ai/qwen3-vllm

Qwen3 is the latest Qwen LLM, built for top-tier coding, math, reasoning, and language tasks.

ai/qwen3-vllm repository overview

Qwen3 logo

Qwen3 is the latest generation in the Qwen LLM family, designed for top-tier performance in coding, math, reasoning, and language tasks. It includes both dense and Mixture-of-Experts (MoE) models, offering flexible deployment from lightweight apps to large-scale research.

Qwen3 introduces dual reasoning modes—"thinking" for complex tasks and "non-thinking" for fast responses—giving users dynamic control over performance. It outperforms prior models in reasoning, instruction following, and code generation, while excelling in creative writing and dialogue.

With strong agentic and tool-use capabilities and support for over 100 languages, Qwen3 is optimized for multilingual, multi-domain applications.


📌 Characteristics

| Attribute | Value |
|---|---|
| Provider | Alibaba Cloud |
| Architecture | qwen3 |
| Cutoff date | April 2025 (est.) |
| Languages | 119 languages from multiple families (Indo-European, Sino-Tibetan, Afro-Asiatic, Austronesian, Dravidian, Turkic, Tai-Kadai, Uralic, Austroasiatic), including others such as Japanese, Basque, Haitian, ... |
| Tool calling | Yes |
| Input modalities | Text |
| Output modalities | Text |
| License | Apache 2.0 |

🧠 Intended uses

Qwen3-8B is designed for a wide range of advanced natural language processing tasks:

  • Supports both Dense and Mixture-of-Experts (MoE) model architectures, available in sizes including 0.6B, 1.7B, 4B, 8B, 14B, 32B, and large MoE variants like 30B-A3B and 235B-A22B.
  • Enables seamless switching between thinking and non-thinking modes (a minimal sketch follows this list):
    • Thinking mode: optimized for complex logical reasoning, math, and code generation.
    • Non-thinking mode: tuned for efficient, general-purpose dialogue and chat.
  • Offers significant improvements in reasoning performance, outperforming previous QwQ (in thinking mode) and Qwen2.5-Instruct (in non-thinking mode) models on mathematics, code generation, and commonsense reasoning benchmarks.
  • Delivers superior human alignment and excels at creative writing, role-playing, multi-turn dialogue, and instruction following in immersive conversations.
  • Provides strong agent capabilities, including integration with external tools, with best-in-class performance in complex agent-based workflows in both thinking and non-thinking modes.
  • Offers support for 100+ languages and dialects, with robust multilingual instruction following and translation abilities.
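Per-request control over the thinking and non-thinking modes can be exercised through the OpenAI-compatible API that Docker Model Runner exposes. The sketch below is illustrative only: the endpoint URL and port, the model name passed to the API, and the chat_template_kwargs.enable_thinking field (a vLLM chat-template option for Qwen3) are assumptions to adjust for your deployment.

# Minimal sketch: toggling Qwen3's thinking mode per request.
# Assumptions: the Model Runner's OpenAI-compatible endpoint is reachable at
# localhost:12434 and the vLLM backend honors chat_template_kwargs.enable_thinking.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:12434/engines/v1", api_key="not-needed")

# Thinking mode: let the model reason step by step before answering.
reasoned = client.chat.completions.create(
    model="ai/qwen3-vllm",
    messages=[{"role": "user", "content": "Prove that the sum of two odd numbers is even."}],
    extra_body={"chat_template_kwargs": {"enable_thinking": True}},
)

# Non-thinking mode: fast, direct answers for everyday chat.
direct = client.chat.completions.create(
    model="ai/qwen3-vllm",
    messages=[{"role": "user", "content": "Summarize Qwen3 in one sentence."}],
    extra_body={"chat_template_kwargs": {"enable_thinking": False}},
)

print(reasoned.choices[0].message.content)
print(direct.choices[0].message.content)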

Considerations

  • Thinking Mode Switching
    Qwen3 supports a soft switch mechanism via /think and /no_think prompts (when enable_thinking=True). This allows dynamic control over the model's reasoning depth during multi-turn conversations.
  • Tool Calling with Qwen-Agent
    For agentic tasks, use Qwen-Agent, which simplifies integration of external tools through built-in templates and parsers, minimizing the need for manual tool-call handling.
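If you are not using Qwen-Agent, tool calling can also be driven directly through the standard OpenAI-style tools schema. The sketch below is a minimal illustration under the same assumptions as above (endpoint URL and model name); the get_weather function is purely hypothetical.

# Sketch: direct tool calling via the OpenAI-compatible API.
# The get_weather tool is hypothetical; in a real application you would execute
# the returned tool call and send its result back as a "tool" message.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:12434/engines/v1", api_key="not-needed")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="ai/qwen3-vllm",
    messages=[{"role": "user", "content": "What's the weather in Lisbon right now?"}],
    tools=tools,
)

# If the model chose to call the tool, the structured call appears here.
print(response.choices[0].message.tool_calls)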

Note: Qwen3 models use a new naming convention: post-trained models no longer include the -Instruct suffix (e.g., Qwen3-32B replaces Qwen2.5-32B-Instruct), and base models now end with -Base.


🐳 Using this model with Docker Model Runner

First, pull the model:

docker model pull ai/qwen3-vllm

Then run the model:

docker model run ai/qwen3-vllm

For more information, check out the Docker Model Runner docs.
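Once the model is running, applications can reach it through the Model Runner's OpenAI-compatible API. A minimal sketch follows, assuming the endpoint is exposed on the default host port (localhost:12434); check the Docker Model Runner docs for the exact URL in your setup.

# Minimal sketch: chatting with the pulled model from Python.
# The base_url below is an assumption; no real API key is required locally.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:12434/engines/v1", api_key="not-needed")

reply = client.chat.completions.create(
    model="ai/qwen3-vllm",
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Write a short haiku about containers."},
    ],
)

print(reply.choices[0].message.content)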


Benchmarks

| Category | Benchmark | Qwen3 |
|---|---|---|
| General Tasks | MMLU | 87.81 |
| General Tasks | MMLU-Redux | 87.40 |
| General Tasks | MMLU-Pro | 68.18 |
| General Tasks | SuperGPQA | 44.06 |
| General Tasks | BBH | 88.87 |
| Mathematics & Science Tasks | GPQA | 47.47 |
| Mathematics & Science Tasks | GSM8K | 94.39 |
| Mathematics & Science Tasks | MATH | 71.84 |
| Multilingual Tasks | MGSM | 83.53 |
| Multilingual Tasks | MMMLU | 86.70 |
| Multilingual Tasks | INCLUDE | 73.46 |
| Code Tasks | EvalPlus | 77.60 |
| Code Tasks | MultiPL-E | 65.94 |
| Code Tasks | MBPP | 81.40 |
| Code Tasks | CRUX-O | 79.00 |

Tag summary

Content type: Model
Digest: sha256:e57fe40c0
Size: 15.3 GB
Last updated: 4 months ago

docker model pull ai/qwen3-vllm:8B
