ok i'm back. so, like, not a single word of this is meaningful in any way, and the few concrete claims it does make are riddled with factual errors.
first off, evolutionary merging as a concept isn't new; mergekit (via mergekit-evolve) can already do this.

There is no way you merged two models of different architectures and got a positive result. Even if the "mother" were only ever trained on text, it would definitionally still have to be multimodal architecturally; otherwise it isn't Qwen3.5 at all. And there is no other architecture that is even remotely compatible with Qwen3.5's; the DeltaNet attention heads see to that. If what you're saying is true, then you're splicing a cat's brain into a dog's (or, to be somewhat charitable, a cat's brain into an ocelot's). This alone is all I needed, but there's more, actually.
201 Languages | Potentially degraded | Inherited from Father
You presume. Any amount of finetuning is going to result in specialization toward the target domain. 201 languages are necessarily represented in a distributed way across the whole model, in a way that layer-wise merging can't preserve without other techniques that are already implemented and mathematically grounded (tl;dr: task arithmetic works; sketch below).
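For anyone unfamiliar, here's roughly what I mean by task arithmetic, as a minimal sketch of the standard delta-vector trick; it has nothing to do with Darwin's actual pipeline, and the checkpoint contents and scaling factor are toy placeholders:

```python
# Minimal task-arithmetic sketch (illustrative; checkpoints and lambda are placeholders).
# delta = finetune - base captures "what the finetune learned"; adding a scaled delta back
# onto a base with the same architecture preserves distributed capabilities far better
# than naively swapping whole layers.
import torch

def task_vector(base_sd: dict, tuned_sd: dict) -> dict:
    """Per-parameter difference between a finetune and its base."""
    return {k: tuned_sd[k] - base_sd[k] for k in base_sd}

def apply_task_vector(base_sd: dict, delta: dict, lam: float = 0.7) -> dict:
    """Add the scaled task vector back onto a base checkpoint."""
    return {k: base_sd[k] + lam * delta[k] for k in base_sd}

if __name__ == "__main__":
    # Toy tensors stand in for real state_dicts (load yours with torch.load / safetensors).
    base = {"layer.weight": torch.zeros(4, 4)}
    tuned = {"layer.weight": torch.ones(4, 4)}
    merged = apply_task_vector(base, task_vector(base, tuned), lam=0.5)
    print(merged["layer.weight"][0, 0].item())  # 0.5
```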
Benchmark Transparency | No scores published | Fully open
Your "mother" is a community finetune. This is vexatious language that degrades a hobbyist project as being "opaque." Not that a human wrote this model card and I don't even need to ask Pangram about that.
Model MRI Integration – CT-scans parent models layer by layer before merging, guiding evolution with structural insight
If conventional merging is "mixing recipes blindfolded," Darwin V5 is "precision surgery with X-ray guidance."
Extremely waffly use of medical terminology, with no technical definition whatsoever in this context: none exists and none is provided.
Traditional model merging relies on humans setting hyperparameters like ratio and density by intuition. Set ratio=0.5, density=0.9, run once, and hope for the best. The result depends on luck, and applying the same ratio uniformly across billions of parameters ignores each layer's unique role.
Laughably wrong. The entire point of merging is that iteration is fast. "By intuition" is not a meaningful critique, because intuition is the only way humans do this sort of thing. And "applying the same ratio uniformly across billions of parameters ignores each layer's unique role" presumes A) that layers have a "unique role" (if it were that simple, mechanistic interpretability would be solved), and B) that we have to use the same ratios at all - essentially "model merging hasn't evolved since 2023," which I can disprove by just looking at any current merge config. Per-layer ratios are a few lines of code; see the sketch below.
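Here's how little code a non-uniform, per-layer ratio actually takes; the layer count and ratio schedule below are made up purely to show the mechanism, and mergekit-style configs express the same thing declaratively:

```python
# Toy per-layer interpolation: each transformer block gets its own mixing ratio.
# The ratio schedule here is invented for illustration; real recipes set or search these.
import re
import torch

NUM_LAYERS = 4  # hypothetical; a real model would have 40+

def layer_index(param_name: str) -> int | None:
    """Extract the block index from names like 'model.layers.12.mlp.down_proj.weight'."""
    m = re.search(r"layers\.(\d+)\.", param_name)
    return int(m.group(1)) if m else None

def merge_per_layer(sd_a: dict, sd_b: dict, ratios: list[float]) -> dict:
    merged = {}
    for name, a in sd_a.items():
        b = sd_b[name]
        idx = layer_index(name)
        t = ratios[idx] if idx is not None else 0.5  # embeddings/head get a flat default here
        merged[name] = (1 - t) * a + t * b
    return merged

if __name__ == "__main__":
    sd_a = {f"model.layers.{i}.mlp.down_proj.weight": torch.zeros(2, 2) for i in range(NUM_LAYERS)}
    sd_b = {f"model.layers.{i}.mlp.down_proj.weight": torch.ones(2, 2) for i in range(NUM_LAYERS)}
    ratios = [0.2, 0.4, 0.6, 0.8]  # one ratio per block, not one ratio for everything
    out = merge_per_layer(sd_a, sd_b, ratios)
    print([out[f"model.layers.{i}.mlp.down_proj.weight"][0, 0].item() for i in range(NUM_LAYERS)])
```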
Darwin V4's Advance
Darwin V4 solved this with evolutionary algorithms – automatically searching hundreds of parameter combinations and selecting survivors by real benchmark scores.
You didn't invent that. See above.

You need more than this, my guy. You can't just expect us to take your word for it. Give some actual theory or get lost.
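Since the card won't give any theory, here is the entire conceptual content of "evolutionary merge search" in a screenful; the fitness function, population size, and mutation scale are placeholders you'd swap out for an actual merge plus actual evals:

```python
# Bare-bones evolutionary search over two merge ratios (attn, ffn).
# Everything here is a placeholder: a real setup would build the merged checkpoint
# and score it on real benchmarks instead of calling this toy fitness function.
import random

def fitness(attn_ratio: float, ffn_ratio: float) -> float:
    """Stand-in for 'merge with these ratios, then run your eval suite'."""
    return -((attn_ratio - 0.2) ** 2 + (ffn_ratio - 0.8) ** 2)  # toy optimum at (0.2, 0.8)

def evolve(generations: int = 30, population: int = 8, sigma: float = 0.1):
    pop = [(random.random(), random.random()) for _ in range(population)]
    for _ in range(generations):
        scored = sorted(pop, key=lambda p: fitness(*p), reverse=True)
        parents = scored[: population // 2]  # select survivors by score
        children = [
            (min(1.0, max(0.0, a + random.gauss(0, sigma))),
             min(1.0, max(0.0, f + random.gauss(0, sigma))))
            for a, f in parents  # mutate survivors
        ]
        pop = parents + children
    return max(pop, key=lambda p: fitness(*p))

if __name__ == "__main__":
    best_attn, best_ffn = evolve()
    print(f"best ratios found: attn={best_attn:.3f}, ffn={best_ffn:.3f}")
```

The expensive part has never been the search loop; it's the hundreds of merge-and-benchmark evaluations hiding behind that fitness function.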
Discovering attn=0.168 and ffn=0.841 – this extreme asymmetry – is virtually impossible by human intuition.
Perhaps not those precise numbers, but people already do layer-wise and module-wise merge ratios, and this is exactly what my friends in Allura have found in their finetuning experiments: changing attention vs. feed-forward layers produces drastically different results (a sketch of how trivially that's expressed follows). We're a bunch of gooner dorks in our bedrooms; you've rediscovered this as a government-funded AI lab. What's going on here, exactly?
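"Different ratios for attention vs. FFN" is literally a name-matching exercise. The parameter name patterns below assume a HF/Qwen-style layout, and the ratios are just the card's own numbers reused as an example:

```python
# Apply separate mixing ratios to attention vs. feed-forward parameters by matching
# parameter names. Name patterns assume a HF/Qwen-style layout ("self_attn", "mlp")
# and are illustrative only.
import torch

ATTN_RATIO = 0.168  # the card's own numbers, used purely as an example
FFN_RATIO = 0.841

def module_ratio(name: str) -> float:
    if "self_attn" in name:
        return ATTN_RATIO
    if "mlp" in name or "experts" in name:
        return FFN_RATIO
    return 0.5  # embeddings, norms, lm_head: pick whatever default you like

def merge_by_module(sd_a: dict, sd_b: dict) -> dict:
    return {name: (1 - module_ratio(name)) * a + module_ratio(name) * sd_b[name]
            for name, a in sd_a.items()}

if __name__ == "__main__":
    sd_a = {"model.layers.0.self_attn.q_proj.weight": torch.zeros(2, 2),
            "model.layers.0.mlp.gate_proj.weight": torch.zeros(2, 2)}
    sd_b = {k: torch.ones(2, 2) for k in sd_a}
    out = merge_by_module(sd_a, sd_b)
    print(out["model.layers.0.self_attn.q_proj.weight"][0, 0].item(),  # 0.168
          out["model.layers.0.mlp.gate_proj.weight"][0, 0].item())     # 0.841
```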

No rigorous definition of "dead" is provided anywhere in this model card. From what I can tell it just means "inactive to a higher degree."
MRI didn't apply uniform ratios. It split 40 layers into 3 blocks:
Thanks GPT-4o.

But again, these terms are meaningless. We don't know what "MRI" means, and we have no way to verify that your process actually produces the numbers you're reporting.
Dead Expert 50~65% is the fingerprint of Claude text-only distillation. The fine-tuning killed multimodal and multilingual experts that were no longer activated during text-only training.
Didn't you say at the top that the Claude distill is a text-only model?? Why would you expect experts wired to the multimodal tower to be activated during text-only training?? Are we for real??
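If you actually wanted "dead" to mean something, you'd define it as a routing-rate threshold and measure it. Here's my guess at a sane definition, not whatever "MRI" is supposed to be; the cutoff of 1% of uniform usage and the router tallying are assumptions you'd adapt to the actual MoE implementation:

```python
# One concrete way to define a "dead" expert: run a text corpus through the model,
# tally how often the router selects each expert, and flag experts whose selection
# rate falls below a fraction of the uniform-usage baseline.
import torch

def dead_expert_fraction(routing_counts: torch.Tensor, total_tokens: int,
                         top_k: int, threshold: float = 0.01) -> float:
    """routing_counts: [num_layers, num_experts] tally of router top-k selections."""
    expected_uniform = total_tokens * top_k / routing_counts.shape[1]
    rates = routing_counts / expected_uniform            # 1.0 == perfectly uniform usage
    return (rates < threshold).float().mean().item()     # fraction below the cutoff

if __name__ == "__main__":
    # Toy tally: 2 layers x 8 experts, a few experts never chosen (top_k=1 routing).
    counts = torch.tensor([[120., 0., 95., 0., 110., 130., 0., 105.],
                           [100., 90., 0., 115., 0., 125., 110., 20.]])
    frac = dead_expert_fraction(counts, total_tokens=560, top_k=1)
    print(f"dead expert fraction: {frac:.2%}")  # ~31% of experts never routed to
```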
Father MRI: Healthy Generalist (Organ Donor)
Yet another metaphor with no technical definition extant or provided.
The Father (Qwen3.5-35B-A3B) shows healthy, uniform expert activation across all 40 layers – a well-balanced generalist with all experts alive. This is the "organ donor" that revives the Mother's dead 50–65% experts.
Of course. It's the base model. You would expect that.
Why This Matters
Thanks GPT-4o.
I can't critique this section, but I don't think I have to, because it's unfalsifiable on account of the blatant and egregious lack of any kind of technical detail in this model card. There is nothing to critique. This is Ancient Aliens tier. This is a wall made of saltine crackers.
So what do we have here?
A layer-wise merge of a Claude 4.6 Opus distillation back onto the Qwen 3.5 base, which recovers some of the degradation caused by what might have been an underdeveloped finetuning methodology and therefore performs better, because model merging is a validated technique that works well. The layer-wise ratios were discovered with an evolutionary search, a thing that already exists but isn't done often because it's more expensive.
That alone is interesting enough to promote. It's good PR for evolutionary merging, which I think more people should be focusing on.
But what's stapled on top is a cheap facade of irrelevant medical jargon that communicates nothing of value about anything that might actually have changed in the process, along with false claims about merging that demonstrate nobody involved with this project respects it as a method, all in a model card shat out by a free-tier LLM that understands what it's saying perhaps even less than the humans who could conceivably have produced the graphs.
I am insulted to have spent my time reading this. There is so much more I could go into, but I'd just keep repeating myself over and over, and I only have so many hours in a day.
Come back with a paper with some actual math on it, and I'll change my tone. Until then, stay off of our HF feed, please. This crap makes us all look bad.
Get that government bag tho I guess.