🧬 Darwin-35B-A3B-Opus – The Child That Surpassed Both Parents
What if a merged model could beat both its parents? We proved it can. Darwin-35B-A3B-Opus is a 35B MoE model (3B active) built with our Darwin V5 engine – the first evolution system that CT-scans parent models before merging them. 🤗 Model: FINAL-Bench/Darwin-35B-A3B-Opus
The result speaks for itself: GPQA Diamond 90.0%, versus Father (Qwen3.5-35B-A3B) at 84.2% and Mother (Claude 4.6 Opus Distilled) at 85.0% – a relative gain of 6.9% over Father and 5.9% over Mother. Not a tradeoff, a genuine leap. Meanwhile, MMMLU sits at 85.0% (Father: 85.2%), multimodal is fully intact, and all 201 languages are preserved.
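The relative gains follow directly from the benchmark scores above; a quick sanity check:

```python
father, mother, child = 84.2, 85.0, 90.0  # GPQA Diamond scores from the post

def relative_gain(child_score, parent_score):
    """Relative improvement of the child over a parent, in percent."""
    return round((child_score - parent_score) / parent_score * 100, 1)

print(relative_gain(child, father))  # 6.9
print(relative_gain(child, mother))  # 5.9
```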
How? Model MRI changed everything. Traditional merging is guesswork. Darwin V4 added evolution. Darwin V5 added X-ray vision. Model MRI scans each parent layer by layer and discovers: Mother's L34–L38 is the reasoning engine (peak cosine distance), 50–65% of Mother's experts are dead (killed by text-only distillation), and Father is a healthy generalist with every expert alive. The prescription: transplant Mother's reasoning brain at L38 (90% weight), replace her dead experts with Father's living ones, and let Father's router handle the output layer. Reasoning went up. Versatility stayed intact. No tradeoff, just evolution.
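Darwin V5's actual algorithm is unreleased (see the paper note below), but the three ingredients named here can be sketched in a few lines of NumPy. Everything in this snippet – function names, the dead-expert variance threshold, the 90/10 blend – is a hypothetical illustration, not the real implementation:

```python
import numpy as np

def layer_cosine_distance(a, b):
    """Cosine distance between two flattened layer weight tensors;
    a peak value flags where the parents diverge most (the 'reasoning engine')."""
    a, b = a.ravel(), b.ravel()
    cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return 1.0 - cos

def expert_is_dead(expert_weights, eps=1e-3):
    """Toy liveness check: an expert whose weights have near-zero variance
    contributes nothing. The real criterion is unknown; eps is invented."""
    return float(np.std(expert_weights)) < eps

def transplant(mother_layer, father_layer, mother_weight=0.9):
    """Weighted transplant, per the 90% Mother / 10% Father recipe above."""
    return mother_weight * mother_layer + (1 - mother_weight) * father_layer
```

The merge recipe then reduces to: scan layers with `layer_cosine_distance`, swap any expert failing `expert_is_dead` for Father's copy, and blend the peak-distance layers with `transplant`.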
35B total, 3B active (MoE) · GPQA Diamond 90.0% · MMMLU 85.0% (201 languages) · Multimodal Image & Video · 262K native context · 147.8 tok/s on H100 · Runs on a single RTX 4090 (Q4) · Apache 2.0. Darwin V5's full algorithm and technical details will be released alongside an upcoming paper.
🚀 Try it now: FINAL-Bench/Gemma-4-Multi
Two Models, One Space. Switch between both Gemma 4 variants in a single interface:
⚡ Gemma 4 26B-A4B – MoE with 128 experts, only 3.8B active params. 95% of the 31B's quality at ~8x faster inference. AIME 88.3%, GPQA 82.3%.
🚀 Gemma 4 31B – Dense 30.7B. Best quality in the Gemma 4 family. AIME 89.2%, GPQA 84.3%, Codeforces 2150. Arena open-model top 3.
Features
Vision – Upload images for analysis, OCR, chart reading, document parsing
Thinking Mode – Toggle chain-of-thought reasoning with Gemma 4's native <|channel> thinking tokens
System Prompts – 6 presets (General, Code, Math, Creative, Translate, Research) or write your own
Streaming – Real-time token-by-token responses via ZeroGPU
Apache 2.0 – Fully open, no restrictions
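As a rough sketch of how such a Space might assemble a request: the six preset names are from the feature list above, but the prompt texts, the `build_messages` helper, and the thinking-mode wording are all invented for illustration – the real app would pass the result to `tokenizer.apply_chat_template`:

```python
# Hypothetical preset texts; only the six names come from the Space.
PRESETS = {
    "General": "You are a helpful assistant.",
    "Code": "You are an expert programmer. Answer with working code.",
    "Math": "You are a careful mathematician. Show your steps.",
    "Creative": "You are an imaginative writer.",
    "Translate": "You are a professional translator.",
    "Research": "You are a meticulous research assistant. Cite sources.",
}

def build_messages(user_text, preset="General", thinking=False):
    """Build a chat-template message list; in the real app the `thinking`
    toggle would instead switch Gemma 4's native thinking tokens on."""
    system = PRESETS[preset]
    if thinking:
        system += " Think step by step before answering."
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_text},
    ]
```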
Technical Details Built with the dev build of transformers (5.5.0.dev0) for full Gemma 4 support, including multimodal apply_chat_template, variable-resolution image processing, and native thinking mode. Runs on HF ZeroGPU with @spaces.GPU – no dedicated GPU needed. Both models support a 256K context window and 140+ languages out of the box.
🔥 GRM2 - The small one that surpasses the big ones. What if a 3B-parameter model could beat a 32B-parameter model on every benchmark? We prove that it can. GRM2 is a 3B-param model based on the Llama architecture, trained for long reasoning and high performance on complex tasks - the first 3B-param model to outperform Qwen3-32B on ALL benchmarks, and to outperform o3-mini on almost all benchmarks. 🤗 Model: OrionLLM/GRM2-3b. It is also the first 3B-param model to generate over 1,000 lines of code and to score 39.0 on xBench-DeepSearch-2510.
Things our clients and open-source users actually said to us this year:
"Finally, someone built a synthetic PII training data for German."
"Does it cover have localised information? Not just the language, the actual format. That must have been a lot of work that we can save from our side."
"We operate in 12 EU countries. Your dataset is the only one that covers all of them which has helped us out a lot in compliance especially because it's synthetic."
Every language has strong PII localization: names, addresses, IDs, phone numbers, and dates in the real format of that country.
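To make "real format of that country" concrete, here is a small illustrative validator using Germany as the example. The patterns are simplified (real German phone numbers and dates allow more variants), and this helper is not part of the dataset's tooling:

```python
import re

# Simplified German formats, for illustration only.
DE_FORMATS = {
    "postal_code": r"\d{5}",               # e.g. 10115 (Berlin)
    "phone": r"\+49 \d{2,5} \d{4,8}",      # e.g. +49 30 123456
    "date": r"\d{2}\.\d{2}\.\d{4}",        # e.g. 24.12.2025 (DD.MM.YYYY)
}

def is_localized(field, value, formats=DE_FORMATS):
    """True if a synthetic PII value matches the country's expected format."""
    return re.fullmatch(formats[field], value) is not None
```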
Your Loss Function Has Singularities. Classical Calculus Can't See Them.
Introducing Discrepancy Calculus (DISC) – treating training singularities as structure, not noise.
Loss plateaus, mode collapse, catastrophic forgetting, distilled models that know things the teacher never taught: we engineer around these. But what if those singularities are the actual structure of the learning problem?
The core insight: Every BV function decomposes into smooth (what classical calculus handles), jump (capability emergence, loss plateaus breaking), and Cantor (ghost imprinting – knowledge transferring through weight-space topology, not gradient signal). Classical analysis sees only the first. DISC sees all three.
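The three-part split invoked here mirrors the classical decomposition of the derivative of a function of bounded variation into absolutely continuous, jump, and Cantor parts; in standard notation:

```latex
% Derivative measure of f \in BV(\Omega): smooth + jump + Cantor parts
Df = \underbrace{\nabla f \,\mathcal{L}^{n}}_{\text{smooth}}
   + \underbrace{(f^{+} - f^{-})\,\nu_f\,\mathcal{H}^{n-1}\!\mid_{J_f}}_{\text{jump}}
   + \underbrace{D^{c} f}_{\text{Cantor}}
```

Classical calculus operates only on the first term; DISC's claim is that the jump and Cantor terms carry the training phenomena listed above.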
The paper proves this isn't alternative notation; it's strictly larger. The Meta-Discrepancy Theorem: where singularities exist, the classical FTC/MVT/chain-rule package is provably impossible.
What it explains:
TopologicalQwen exhibited literary reasoning from physics-only data; the Cantor part explains how.
DualMind's Explore→Examine→Response loop operationalizes DISC as inference dynamics.
50 models, 35K+ downloads, all built on this framework.
Paper: Discrepancy Calculus: Foundations and Core Theory (DOI: 10.57967/hf/8194) – 8 axioms, proofs, computational recipes.
Series: Structure Over Scale (DOI: 10.57967/hf/8165) → Three Teachers to Dual Cognition (DOI: 10.57967/hf/8184) → DISC Foundations
– Roy S. Colca Jr., Convergent Intelligence LLC: Research Division
Hey Music Lovers! You have a track and you want the stems. Retro 6stems splits any audio file into 6 clean, separated tracks: Vocals, Drums, Bass, Guitar, Piano, and Other (strings, synths...). Drop an MP3, WAV, FLAC or M4A and Demucs gets to work immediately in a minimal retro design, with an instrumental mix included automatically. Preview each stem directly in the browser, download individually, or grab the full ZIP. No install. No GPU needed on your end. Just upload and wait.
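The Space's internals aren't published, but the same six stems can be produced locally with the open-source Demucs CLI: its `htdemucs_6s` model separates vocals, drums, bass, guitar, piano, and other. A minimal sketch (the helper function is ours, not the Space's):

```python
import subprocess

def demucs_command(audio_path, out_dir="separated"):
    """Build a Demucs CLI invocation for 6-stem separation.
    -n selects the model; -o sets the output directory."""
    return ["demucs", "-n", "htdemucs_6s", "-o", out_dir, audio_path]

cmd = demucs_command("my_track.mp3")
# subprocess.run(cmd, check=True)  # writes stems under separated/htdemucs_6s/my_track/
```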