Instructions to use Qwen/Qwen3-Next-80B-A3B-Thinking with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use Qwen/Qwen3-Next-80B-A3B-Thinking with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Qwen/Qwen3-Next-80B-A3B-Thinking")
messages = [
    {"role": "user", "content": "Who are you?"},
]
print(pipe(messages))
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-Next-80B-A3B-Thinking")
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-Next-80B-A3B-Thinking",
    dtype="auto",       # keep the checkpoint's native precision instead of float32
    device_map="auto",  # spread the 80B weights across available devices (requires accelerate)
)
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# A Thinking model spends many tokens on reasoning; 40 new tokens will likely stop mid-thought.
outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
- Inference
- HuggingChat
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use Qwen/Qwen3-Next-80B-A3B-Thinking with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "Qwen/Qwen3-Next-80B-A3B-Thinking"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Qwen/Qwen3-Next-80B-A3B-Thinking",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
Use Docker
```shell
docker model run hf.co/Qwen/Qwen3-Next-80B-A3B-Thinking
```
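The curl call above can also be made from Python. The sketch below uses only the standard library and assumes the server started by `vllm serve` is listening on vLLM's default port 8000, as in the curl example; the helper names `build_chat_request`, `extract_reply`, and `chat` are hypothetical, and the response parsing assumes the standard OpenAI chat-completions shape (`choices[0].message.content`).

```python
import json
import urllib.request

# Assumption: the `vllm serve` server from above, on vLLM's default port 8000.
BASE_URL = "http://localhost:8000/v1"
MODEL = "Qwen/Qwen3-Next-80B-A3B-Thinking"


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (without sending) a POST to the OpenAI-compatible /chat/completions route.

    Hypothetical helper; the JSON body mirrors the curl example.
    """
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_json: dict) -> str:
    """Return the assistant text from an OpenAI-style chat-completions response."""
    return response_json["choices"][0]["message"]["content"]


def chat(base_url: str, model: str, prompt: str, timeout: float = 600.0) -> str:
    """Send one user prompt and return the reply (requires a running server)."""
    request = build_chat_request(base_url, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return extract_reply(json.load(resp))
```

With the server running, `print(chat(BASE_URL, MODEL, "What is the capital of France?"))` sends the same request as the curl command. Building the request separately from sending it keeps the payload easy to inspect; the long default timeout is a guess to leave room for the model's reasoning phase.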
- SGLang
How to use Qwen/Qwen3-Next-80B-A3B-Thinking with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "Qwen/Qwen3-Next-80B-A3B-Thinking" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Qwen/Qwen3-Next-80B-A3B-Thinking",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "Qwen/Qwen3-Next-80B-A3B-Thinking" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Qwen/Qwen3-Next-80B-A3B-Thinking",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
- Docker Model Runner
How to use Qwen/Qwen3-Next-80B-A3B-Thinking with Docker Model Runner:
```shell
docker model run hf.co/Qwen/Qwen3-Next-80B-A3B-Thinking
```
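Whichever route is used, a Thinking checkpoint writes its reasoning before the final answer. The sketch below assumes the Qwen3 convention that the reasoning ends with a `</think>` tag and that the opening `<think>` may be missing from the output because the chat template can add it to the prompt instead. The function name `split_thinking` is hypothetical; it takes decoded text, such as the string printed by the Transformers example above.

```python
def split_thinking(decoded: str, end_tag: str = "</think>") -> tuple[str, str]:
    """Split decoded Thinking-model output into (reasoning, answer).

    Assumes the reasoning ends at the last `end_tag`. If no end tag is found
    (e.g. generation stopped early), the whole text is treated as unfinished
    reasoning and the answer is empty.
    """
    head, sep, tail = decoded.rpartition(end_tag)
    if not sep:
        reasoning, answer = decoded, ""
    else:
        reasoning, answer = head, tail
    # Drop an explicit opening tag if the model or template emitted one.
    reasoning = reasoning.strip()
    if reasoning.startswith("<think>"):
        reasoning = reasoning[len("<think>"):]
    return reasoning.strip(), answer.strip()
```

Returning a tag-less output as reasoning is a design choice: a truncated generation (such as the 40-token Transformers call) usually stops before `</think>`, and this avoids mistaking a half-finished thought for the answer. Since the Transformers example decodes without skipping special tokens, trailing markers such as `<|im_end|>` may remain in the answer. Servers started with a reasoning parser (vLLM and SGLang both provide a `--reasoning-parser` option) may already return the two parts separately.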
- #22 Add EvalEval community eval results (opened 4 days ago by EvalEvalBot)
- #21 Add LEXam evaluation results (opened 7 days ago by joelniklaus)
- #20 Add GPQA evaluation result (opened 5 months ago by merve)
- #19 Add GPQA evaluation result (opened 5 months ago by merve)
- #18 Request for SWE-bench-Verified Evaluation Metrics of Qwen3-Next-80B-A3B. (opened 6 months ago by ludybupt)
- #17 Hoping to be able to control the thinking level (opened 7 months ago by hqhxzly)
- #16 I hope qwen3 will open-source a large 120b model, like the GPT-120b. GPT's Chinese version isn't very good; we still need qwen. (opened 7 months ago by mimeng1990)
- #15 Fails to give the correct answer to a classic logic-reasoning puzzle (opened 8 months ago by pypry)
- #14 FP8 version cannot be loaded with vllm v0.10.2 (opened 9 months ago by iwaitu)
- #13 Good for Roleplay - Sucks at tool calling (opened 9 months ago by abiteddie)
- #12 4 bit with bitsandbytes not working (opened 9 months ago by TheBigBlockPC)
- #11 Fix broken qwen3-next blog link (opened 9 months ago by Smorty100)
- #10 Something off with knowledge cutoff date? (opened 9 months ago by Nirav-Madhani)
- #9 Failed to run VLLM batch inference (opened 9 months ago by Junxiao-Zhao)
- #8 🚀 Best Practices for Evaluating the Qwen3-Next Model (opened 9 months ago by Yunxz)
- #7 vllm: error: unrecognized arguments: --enable-reasoning (opened 9 months ago by evilll)
- #4 The Thinking model has a higher hallucination rate than the Instruct model and tends to overlook details in long contexts. (opened 9 months ago by xiaoxiao218)
- #3 Local Installation Video and Testing On CPU - Step by Step (opened 9 months ago by fahdmirzac)
- #2 Support of 1M context doubt (opened 9 months ago by clyang33)
- #1 GGUF when? (opened 9 months ago by ouchiewouchie)