Instructions to use OpenYourMind/Qwopus3.5-122B-A10B-Kimi-K2.6-destill-healed-abliterated-MTPLX-4bit with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use OpenYourMind/Qwopus3.5-122B-A10B-Kimi-K2.6-destill-healed-abliterated-MTPLX-4bit with MLX:

# Make sure mlx-vlm is installed
# pip install --upgrade mlx-vlm

from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

# Load the model
model, processor = load("OpenYourMind/Qwopus3.5-122B-A10B-Kimi-K2.6-destill-healed-abliterated-MTPLX-4bit")
config = load_config("OpenYourMind/Qwopus3.5-122B-A10B-Kimi-K2.6-destill-healed-abliterated-MTPLX-4bit")

# Prepare input
image = ["http://images.cocodataset.org/val2017/000000039769.jpg"]
prompt = "Describe this image."

# Apply chat template
formatted_prompt = apply_chat_template(
    processor, config, prompt, num_images=1
)

# Generate output
output = generate(model, processor, formatted_prompt, image)
print(output)

Notebooks
Google Colab
Kaggle
Local Apps Settings
LM Studio

How to use OpenYourMind/Qwopus3.5-122B-A10B-Kimi-K2.6-destill-healed-abliterated-MTPLX-4bit with Pi:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "OpenYourMind/Qwopus3.5-122B-A10B-Kimi-K2.6-destill-healed-abliterated-MTPLX-4bit"

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "OpenYourMind/Qwopus3.5-122B-A10B-Kimi-K2.6-destill-healed-abliterated-MTPLX-4bit"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent new

How to use OpenYourMind/Qwopus3.5-122B-A10B-Kimi-K2.6-destill-healed-abliterated-MTPLX-4bit with Hermes Agent:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "OpenYourMind/Qwopus3.5-122B-A10B-Kimi-K2.6-destill-healed-abliterated-MTPLX-4bit"

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default OpenYourMind/Qwopus3.5-122B-A10B-Kimi-K2.6-destill-healed-abliterated-MTPLX-4bit

Run Hermes

hermes

Qwopus3.5-122B-A10B-Kimi-K2.6-destill-healed-abliterated — MTPLX 4-bit (MoE MTP head)

Overview

This is the MLX 4-bit build of OpenYourMind/Qwopus3.5-122B-A10B-Kimi-K2.6-destill-healed-abliterated with the Qwen3.5 MoE Multi-Token-Prediction (MTP) head included, packaged for MTPLX native MTP speculative decoding on Apple Silicon.

The language/vision weights are byte-identical to the -MLX-4bit build. The only additions are the MTP head (mtp.safetensors, BF16, 4.7 GB) and a config.json pointer (mlx_lm_extra_tensors.mtp_file). That sidecar is ignored by plain mlx-lm/mlx-vlm, so this folder still loads as an ordinary MLX model — but with MTPLX it also drives speculative decoding.

Language: 4-bit, group size 64 (MoE routing gates kept at higher precision by the model's quant predicate), ≈ 4.5 bits/weight.
MTP head: 1 layer, MoE (router + 256 experts / 8 active + shared expert), full self-attention, BF16 (785 tensors). MTPLX stacks the experts into switch_mlp at load and verifies every drafted token against the target model.
Vision: the BF16 vision tower from the base build is still present; MTPLX runs the text path only. For image input, use the -MLX-4bit repo with mlx-vlm.

⚠️ Requires MTPLX with Qwen3.5-MoE MTP support

The Qwen3.5 MTP head is an MoE block. MTPLX ≤ 0.3.7 only supported a dense Qwen MTP head and will reject this model with invalid-mtp-tensor-layout. Support is added in MTPLX PR #84.

Until that lands in a release, install from the branch:

pip install "git+https://github.com/janfeddersen-wq/MTPLX.git@qwen3-5-moe-mtp"
# after the PR is merged & released:  pip install -U mtplx

Usage

MODEL=OpenYourMind/Qwopus3.5-122B-A10B-Kimi-K2.6-destill-healed-abliterated-MTPLX-4bit

# one-shot, with acceptance stats
mtplx ask --model "$MODEL" --prompt "Explain Rayleigh scattering simply." --mtp --stats --yes

# interactive terminal chat
mtplx start cli --model "$MODEL" --yes

# OpenAI-compatible server
mtplx quickstart --model "$MODEL" --port 8000 --yes

--yes accepts the "family-compatible-unverified" gate (no recorded exactness baseline is shipped). Add --no-mtp to compare against plain autoregressive decoding.

Measured (M5 Max, 128 GB)

120-token greedy run: depth-1 acceptance ≈ 70 %, accepted_by_depth = [40, 19, 3] of [57, 57, 56] drafted → 120 tokens in 57 target verify passes (≈ 2.1 tokens/verify), ~52 decode tok/s. (Contrary to the earlier note on the base card, the MoE MTP head does yield a real speedup once a runtime can consume it.)

Known limitation — MoE exactness

At temperature 0, MTP vs non-MTP greedy output is ~98 % identical and re-converges immediately, but occasionally flips a single token. This is the MoE router hitting a near-tie that resolves differently under batched verification vs single-token decode (an inherent MoE/FP effect), not a drafting error — the target model verifies every token. Strict bit-exactness for MoE heads is still being worked out (e.g. fp32 router logits during verify); see PR #84.

Files

File	Description	Size
`model-*-of-00014.safetensors`	4-bit language weights + BF16 vision tower	~65 GB
`mtp.safetensors`	MoE MTP head (BF16)	4.7 GB
`config.json`	`Qwen3_5MoeForConditionalGeneration` + `quantization` + `mlx_lm_extra_tensors.mtp_file`	—
`tokenizer*`, `chat_template.jinja`, `generation_config.json`, processor configs	Standard	—

Total on disk: ~70 GB.

Hardware

Needs roughly ≥ 80 GB unified memory to load with usable context (65 GB base + ~5 GB BF16 MTP + KV cache). Runs comfortably on 96 GB+ M-series Macs.

Support & Community

☕ If these models are useful to you, consider supporting my work — it funds compute for more & larger abliterations.

buymeacoffee.com/oym.kuato

Discord: https://discord.gg/rhUZY5GEZr
Bitcoin Donations: bc1qsvfduzj9fjs9fugpc52yver3f2g8fp7xjxecdv

Notes

License: Other (inherits from the Qwen3.5 base license)
Parent (full weights): Qwopus3.5-122B-A10B-Kimi-K2.6-destill-healed-abliterated
Plain 4-bit MLX (no MTP, for mlx-vlm/LM Studio): -MLX-4bit
Architecture: Qwen3.5 MoE (~10B active / 122B total) + Qwen3-VL vision tower + MoE MTP head

Disclaimer

Use is the responsibility of the user. Ensure your usage complies with applicable laws, platform rules, and deployment requirements.

Downloads last month: 1,820

Safetensors

Model size

20B params

Tensor type

BF16

U32

MLX

Hardware compatibility

4-bit

Model tree for OpenYourMind/Qwopus3.5-122B-A10B-Kimi-K2.6-destill-healed-abliterated-MTPLX-4bit

Base model

Qwen/Qwen3.5-122B-A10B

Finetuned

OpenYourMind/Qwopus3.5-122B-A10B-Kimi-K2.6-destill-healed-abliterated

Quantized

(7)

this model