Instructions to use OpenYourMind/Qwopus3.5-122B-A10B-Kimi-K2.6-destill-healed-abliterated-MTPLX-4bit with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- MLX
How to use OpenYourMind/Qwopus3.5-122B-A10B-Kimi-K2.6-destill-healed-abliterated-MTPLX-4bit with MLX:
# Make sure mlx-vlm is installed # pip install --upgrade mlx-vlm from mlx_vlm import load, generate from mlx_vlm.prompt_utils import apply_chat_template from mlx_vlm.utils import load_config # Load the model model, processor = load("OpenYourMind/Qwopus3.5-122B-A10B-Kimi-K2.6-destill-healed-abliterated-MTPLX-4bit") config = load_config("OpenYourMind/Qwopus3.5-122B-A10B-Kimi-K2.6-destill-healed-abliterated-MTPLX-4bit") # Prepare input image = ["http://images.cocodataset.org/val2017/000000039769.jpg"] prompt = "Describe this image." # Apply chat template formatted_prompt = apply_chat_template( processor, config, prompt, num_images=1 ) # Generate output output = generate(model, processor, formatted_prompt, image) print(output) - Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- LM Studio
- Pi
How to use OpenYourMind/Qwopus3.5-122B-A10B-Kimi-K2.6-destill-healed-abliterated-MTPLX-4bit with Pi:
Start the MLX server
# Install MLX LM: uv tool install mlx-lm # Start a local OpenAI-compatible server: mlx_lm.server --model "OpenYourMind/Qwopus3.5-122B-A10B-Kimi-K2.6-destill-healed-abliterated-MTPLX-4bit"
Configure the model in Pi
# Install Pi: npm install -g @mariozechner/pi-coding-agent # Add to ~/.pi/agent/models.json: { "providers": { "mlx-lm": { "baseUrl": "http://localhost:8080/v1", "api": "openai-completions", "apiKey": "none", "models": [ { "id": "OpenYourMind/Qwopus3.5-122B-A10B-Kimi-K2.6-destill-healed-abliterated-MTPLX-4bit" } ] } } }Run Pi
# Start Pi in your project directory: pi
- Hermes Agent new
How to use OpenYourMind/Qwopus3.5-122B-A10B-Kimi-K2.6-destill-healed-abliterated-MTPLX-4bit with Hermes Agent:
Start the MLX server
# Install MLX LM: uv tool install mlx-lm # Start a local OpenAI-compatible server: mlx_lm.server --model "OpenYourMind/Qwopus3.5-122B-A10B-Kimi-K2.6-destill-healed-abliterated-MTPLX-4bit"
Configure Hermes
# Install Hermes: curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash hermes setup # Point Hermes at the local server: hermes config set model.provider custom hermes config set model.base_url http://127.0.0.1:8080/v1 hermes config set model.default OpenYourMind/Qwopus3.5-122B-A10B-Kimi-K2.6-destill-healed-abliterated-MTPLX-4bit
Run Hermes
hermes
Qwopus3.5-122B-A10B-Kimi-K2.6-destill-healed-abliterated — MTPLX 4-bit (MoE MTP head)
Overview
This is the MLX 4-bit build of OpenYourMind/Qwopus3.5-122B-A10B-Kimi-K2.6-destill-healed-abliterated with the Qwen3.5 MoE Multi-Token-Prediction (MTP) head included, packaged for MTPLX native MTP speculative decoding on Apple Silicon.
The language/vision weights are byte-identical to the -MLX-4bit build. The only additions are the MTP head (mtp.safetensors, BF16, 4.7 GB) and a config.json pointer (mlx_lm_extra_tensors.mtp_file). That sidecar is ignored by plain mlx-lm/mlx-vlm, so this folder still loads as an ordinary MLX model — but with MTPLX it also drives speculative decoding.
- Language: 4-bit, group size 64 (MoE routing gates kept at higher precision by the model's quant predicate), ≈ 4.5 bits/weight.
- MTP head: 1 layer, MoE (router + 256 experts / 8 active + shared expert), full self-attention, BF16 (785 tensors). MTPLX stacks the experts into
switch_mlpat load and verifies every drafted token against the target model. - Vision: the BF16 vision tower from the base build is still present; MTPLX runs the text path only. For image input, use the
-MLX-4bitrepo withmlx-vlm.
⚠️ Requires MTPLX with Qwen3.5-MoE MTP support
The Qwen3.5 MTP head is an MoE block. MTPLX ≤ 0.3.7 only supported a dense Qwen MTP head and will reject this model with invalid-mtp-tensor-layout. Support is added in MTPLX PR #84.
Until that lands in a release, install from the branch:
pip install "git+https://github.com/janfeddersen-wq/MTPLX.git@qwen3-5-moe-mtp"
# after the PR is merged & released: pip install -U mtplx
Usage
MODEL=OpenYourMind/Qwopus3.5-122B-A10B-Kimi-K2.6-destill-healed-abliterated-MTPLX-4bit
# one-shot, with acceptance stats
mtplx ask --model "$MODEL" --prompt "Explain Rayleigh scattering simply." --mtp --stats --yes
# interactive terminal chat
mtplx start cli --model "$MODEL" --yes
# OpenAI-compatible server
mtplx quickstart --model "$MODEL" --port 8000 --yes
--yes accepts the "family-compatible-unverified" gate (no recorded exactness baseline is shipped). Add --no-mtp to compare against plain autoregressive decoding.
Measured (M5 Max, 128 GB)
120-token greedy run: depth-1 acceptance ≈ 70 %, accepted_by_depth = [40, 19, 3] of [57, 57, 56] drafted → 120 tokens in 57 target verify passes (≈ 2.1 tokens/verify), ~52 decode tok/s. (Contrary to the earlier note on the base card, the MoE MTP head does yield a real speedup once a runtime can consume it.)
Known limitation — MoE exactness
At temperature 0, MTP vs non-MTP greedy output is ~98 % identical and re-converges immediately, but occasionally flips a single token. This is the MoE router hitting a near-tie that resolves differently under batched verification vs single-token decode (an inherent MoE/FP effect), not a drafting error — the target model verifies every token. Strict bit-exactness for MoE heads is still being worked out (e.g. fp32 router logits during verify); see PR #84.
Files
| File | Description | Size |
|---|---|---|
model-*-of-00014.safetensors |
4-bit language weights + BF16 vision tower | ~65 GB |
mtp.safetensors |
MoE MTP head (BF16) | 4.7 GB |
config.json |
Qwen3_5MoeForConditionalGeneration + quantization + mlx_lm_extra_tensors.mtp_file |
— |
tokenizer*, chat_template.jinja, generation_config.json, processor configs |
Standard | — |
Total on disk: ~70 GB.
Hardware
Needs roughly ≥ 80 GB unified memory to load with usable context (65 GB base + ~5 GB BF16 MTP + KV cache). Runs comfortably on 96 GB+ M-series Macs.
Support & Community
☕ If these models are useful to you, consider supporting my work — it funds compute for more & larger abliterations.
- Discord: https://discord.gg/rhUZY5GEZr
- Bitcoin Donations:
bc1qsvfduzj9fjs9fugpc52yver3f2g8fp7xjxecdv
Notes
- License: Other (inherits from the Qwen3.5 base license)
- Parent (full weights): Qwopus3.5-122B-A10B-Kimi-K2.6-destill-healed-abliterated
- Plain 4-bit MLX (no MTP, for
mlx-vlm/LM Studio):-MLX-4bit - Architecture: Qwen3.5 MoE (~10B active / 122B total) + Qwen3-VL vision tower + MoE MTP head
Disclaimer
Use is the responsibility of the user. Ensure your usage complies with applicable laws, platform rules, and deployment requirements.
- Downloads last month
- 1,820
4-bit
Model tree for OpenYourMind/Qwopus3.5-122B-A10B-Kimi-K2.6-destill-healed-abliterated-MTPLX-4bit
Base model
Qwen/Qwen3.5-122B-A10B