# DualMinded-Qwen3-1.7B
A 1.7B parameter dual-cognition model trained on Opus 4.6 reasoning traces. The model implements a three-phase cognitive loop — explore, examine, respond — where it reasons freely, critiques its own reasoning, then synthesizes a clean answer.
Convergent Intelligence LLC: Research Division
## Architecture

- `<explore> … </explore>` — unconstrained reasoning, derivation, speculation
- `<examine> … </examine>` — adversarial self-critique, error detection, refinement
- `<response> … </response>` — clean synthesis from the internal dialogue
This is the multi-model collision array collapsed into a single architecture: the dialectical structure that produces novel insights from architectural diversity is recreated through role-conditioned generation on shared weights. No extra parameters, no routing; the same weights operate in different cognitive modes.
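Mode selection happens purely through the opening tag of the prompt. A minimal sketch of how such prompts can be built (the `##USER:` convention matches the Usage example below; the helper function itself is illustrative, not part of any released tooling):

```python
# Sketch: role-conditioned prompting on shared weights. The same model
# sees the same parameters; only the opening tag selects the mode.

PHASES = ("explore", "examine", "response")

def build_prompt(question: str, phase: str = "explore") -> str:
    """Build a DualMind-style prompt that opens the requested phase tag.

    Illustrative helper; the ##USER: convention is taken from the
    model card's Usage example.
    """
    if phase not in PHASES:
        raise ValueError(f"unknown phase: {phase!r}")
    return f"##USER:\n{question}\n\n<{phase}>\n"

prompt = build_prompt("Prove the mean value theorem.")
print(prompt.endswith("<explore>\n"))  # True
```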
## Training Pipeline
DualMinded-Qwen3-1.7B is the product of a four-stage pipeline:
Stage 1 — Multi-Teacher Distillation: Qwen3-30B-A3B in three variants (Instruct, Thinking, Coder) distilled into Qwen3-1.7B via proof-weighted KD with 2.25× loss amplification on reasoning tokens.
Stage 2 — DISC Refinement (Disctil-Qwen3-1.7B): the student refined through Discrepancy Calculus, which detects and preserves structural boundaries in the teacher's distribution.
Stage 3 — Topological Knowledge Distillation (TKD): continuous-stream distillation with topology-guided windowing from Qwen3-30B-A3B-Thinking. The teacher's output is decomposed as a bounded-variation signal (smooth + jumps + drift); jump positions are amplified at 3σ, windows are cut at low-discrepancy boundaries, and a 4-phase curriculum orders training from easy to hard.
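The jump-detection step can be sketched in isolation: given a 1-D stream of per-token teacher statistics, flag positions where the first difference deviates from its mean by more than 3σ, then cut windows at those positions. The toy signal, helper names, and exact statistic below are all illustrative; the pipeline's actual implementation is not published here:

```python
# Sketch: 3-sigma jump detection on a 1-D teacher signal
# (e.g. a per-token loss or logit stream), followed by windowing.

import statistics

def find_jumps(signal, k=3.0):
    """Indices where the first difference deviates from its mean by > k sigma."""
    diffs = [b - a for a, b in zip(signal, signal[1:])]
    mu = statistics.fmean(diffs)
    sigma = statistics.pstdev(diffs)
    return [i + 1 for i, d in enumerate(diffs) if abs(d - mu) > k * sigma]

def cut_windows(signal, jumps):
    """Split the stream into windows at the detected jump positions."""
    bounds = [0] + list(jumps) + [len(signal)]
    return [signal[a:b] for a, b in zip(bounds, bounds[1:])]

# Smooth ramp with a single large jump between index 6 and 7.
stream = [i * 0.1 for i in range(7)] + [6.0 + i * 0.1 for i in range(6)]
jumps = find_jumps(stream)
windows = cut_windows(stream, jumps)
print(jumps, [len(w) for w in windows])  # [7] [7, 6]
```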
Stage 4 — DualMind SFT on Opus 4.6: SFT using Opus-4.6-Reasoning-3000x-filtered. The thinking column maps directly to `<explore>` (no heuristic sentence splitting needed); the solution column is split into `<examine>` + `<response>`.
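The column mapping can be sketched as follows. The tag layout comes from the Architecture section; the rule used here to split the solution column (at the first blank line) and the field names are assumptions for illustration, not the documented heuristic:

```python
# Sketch: map one (question, thinking, solution) row into the three-phase
# training string. The thinking column becomes <explore> verbatim; the
# solution column is split into <examine> + <response>. The split rule
# below (first blank line) is illustrative only.

def to_dualmind_example(question: str, thinking: str, solution: str) -> str:
    head, _, tail = solution.partition("\n\n")
    examine, response = (head, tail) if tail else ("", solution)
    return (
        f"##USER:\n{question}\n\n"
        f"<explore>\n{thinking}\n</explore>\n"
        f"<examine>\n{examine}\n</examine>\n"
        f"<response>\n{response}\n</response>"
    )

ex = to_dualmind_example("2+2?", "Try small cases...", "Check: 2+2=4.\n\nAnswer: 4.")
print("<examine>\nCheck: 2+2=4.\n</examine>" in ex)  # True
```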
### Training Configuration
| Parameter | Value |
|---|---|
| Base checkpoint | TKD checkpoint-512 |
| Dataset | Opus-4.6-Reasoning-3000x-filtered (50% subset) |
| Max seq length | 2048 |
| Batch size | 2 × 8 accum = 16 effective |
| Learning rate | 5e-6 (cosine) |
| Warmup | 32 steps |
| Max steps | 1024 |
| Precision | BF16 |
| Hardware | NVIDIA H100 |
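The table translates into a plain config dict; the key names below loosely mirror `transformers.TrainingArguments` and are assumptions about the actual training script, which is not published here:

```python
# Sketch: the SFT configuration from the table above, as a plain dict.
# Key names loosely mirror transformers.TrainingArguments (assumption).
config = {
    "per_device_train_batch_size": 2,
    "gradient_accumulation_steps": 8,
    "learning_rate": 5e-6,
    "lr_scheduler_type": "cosine",
    "warmup_steps": 32,
    "max_steps": 1024,
    "bf16": True,
    "max_seq_length": 2048,
}

# Effective batch size = per-device batch x gradient accumulation.
effective_batch = (
    config["per_device_train_batch_size"] * config["gradient_accumulation_steps"]
)
print(effective_batch)  # 16
```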
### DualMind vs DualMinded

| | DualMind | DualMinded |
|---|---|---|
| SFT Data | LogicInference_OA | Opus-4.6-Reasoning |
| Explore Source | Heuristic CoT split | Direct Opus thinking column |
| Strength | Formal logic, structured proofs | Extended reasoning, creative derivation |
| Base Checkpoint | TKD final | TKD checkpoint-512 |
## Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model = AutoModelForCausalLM.from_pretrained(
    "reaperdoesntknow/DualMinded-Qwen3-1.7B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("reaperdoesntknow/DualMinded-Qwen3-1.7B")

# Open the <explore> tag so the model starts in free-reasoning mode.
prompt = "##USER:\nProve the mean value theorem.\n\n<explore>\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=True,
        temperature=0.6,
        top_p=0.9,
        repetition_penalty=1.15,
    )
print(tokenizer.decode(out[0], skip_special_tokens=True))
```
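Generated text can be split back into its three phases. A regex-based sketch (it assumes the model emitted well-formed tags, which a 1.7B model will not always do, so missing phases are simply omitted):

```python
import re

def parse_phases(text: str) -> dict:
    """Extract <explore>/<examine>/<response> bodies from generated text.

    Phases whose tags are missing or malformed are left out of the result.
    """
    phases = {}
    for tag in ("explore", "examine", "response"):
        m = re.search(rf"<{tag}>(.*?)</{tag}>", text, flags=re.DOTALL)
        if m:
            phases[tag] = m.group(1).strip()
    return phases

sample = (
    "<explore>\nrough work\n</explore>\n"
    "<examine>\ncheck signs\n</examine>\n"
    "<response>\nfinal proof\n</response>"
)
print(parse_phases(sample)["response"])  # final proof
```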
## Ghost Imprinting

Sequential distillation from multiple teachers (Instruct → Thinking → Coder → Opus) leaves residual fields in weight space. These residuals produce capabilities absent from any individual teacher; they correspond to the singular-continuous component of the bounded variation decomposition, applied to the parameter tensor. Models in the DualMind family exhibit emergent behaviors (e.g., literary content from physics-only training data) attributed to these ghost imprints.
## GGUF

Quantized versions are available at DualMinded-Qwen3-1.7B-GGUF: F16, Q8_0, Q5_K_M, Q4_K_M.

Ollama: `ollama run reaperdoesntrun/DualMinded-1.7B`
## Related
- DualMind — LogicInference-trained variant
- DualMind_Methodolgy — Paper: DOI 10.57967/hf/8184
- Structure Over Scale — Paper 1: CPU training methodology
- DualMind Collection
- DistilQwen Collection
## Mathematical Foundations: Discrepancy Calculus (DISC)
This model's training pipeline is grounded in Discrepancy Calculus — a measure-theoretic framework that treats singularities as primary structure rather than pathology. Full theory: "On the Formal Analysis of Discrepancy Calculus" (Colca, 2026; Convergent Intelligence LLC: Research Division).
The Core Operator:
For smooth $f$: $Df(x) = |f'(x)|$. For rough $f$: $D$ localizes irregularity to null sets while preserving integral structure.
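A worked instance (illustrative, not taken from the paper): let $f(x) = |x|$ on $[-1, 1]$. Then

$$Df(x) = |f'(x)| = 1 \quad \text{for } x \neq 0,$$

so the irregularity (the kink at the origin) is confined to the null set $\{0\}$, while the integral structure survives: $\int_{-1}^{1} Df(x)\,dx = 2$, which equals the total variation of $f$ on $[-1, 1]$.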
The Mesh Fundamental Identity — every BV function decomposes as:

$$f = f_{\mathrm{ac}} + f_{\mathrm{jump}} + f_{\mathrm{sc}},$$

where $f_{\mathrm{ac}}$ is the absolutely continuous (smooth) part, $f_{\mathrm{jump}}$ collects the jump discontinuities, and $f_{\mathrm{sc}}$ is the singular-continuous (drift) part. Standard knowledge distillation captures only the first term. Topological Knowledge Distillation (TKD) preserves all three by treating the teacher's output distribution as a BV function and computing discrepancy energy, jump sets, and gap energy density before training begins.
## Citation

```bibtex
@misc{colca2026dualmind,
  title={From Three Teachers to Dual Cognition: Topology-Aware Multi-Teacher Distillation and Role-Conditioned Self-Critique at 1.7B Scale},
  author={Colca, Roy S.},
  year={2026},
  publisher={HuggingFace},
  url={https://doi.org/10.57967/hf/8184}
}
```
Convergent Intelligence LLC: Research Division — Apache 2.0
## Convergent Intelligence Portfolio
Part of the DualMind Series by Convergent Intelligence LLC: Research Division
### DualMind Family
| Model | Format | Description |
|---|---|---|
| DualMind | BF16 | LogicInference-trained. Explore→Examine→Response loop. |
| DualMinded-Qwen3-1.7B | BF16 | Opus 4.6 reasoning traces. Higher quality splits. |
| Dualmind-Qwen-1.7B-Thinking | BF16 | Thinking-teacher variant with extended deliberation. |
| DualMind-GGUF | GGUF | Quantized LogicInference variant. CPU/6GB GPU. |
| DualMinded-Qwen3-1.7B-GGUF | GGUF | Quantized Opus variant. Ollama ready. |
### Papers
| Paper | DOI |
|---|---|
| Structure Over Scale | 10.57967/hf/8165 |
| Three Teachers to Dual Cognition | 10.57967/hf/8184 |
| Discrepancy Calculus | 10.57967/hf/8194 |
Last updated: 2026-03-31 by Convergent Intelligence LLC: Research Division
Base model: Qwen/Qwen3-1.7B