All HF Hub posts

danielhanchen posted an update 2 days ago
A new way to use Unsloth.

Coming soon...
SeaWolf-AI posted an update 2 days ago
🧬 Darwin-35B-A3B-Opus – The Child That Surpassed Both Parents

What if a merged model could beat both its parents? We proved it can.
Darwin-35B-A3B-Opus is a 35B MoE model (3B active) built with our Darwin V5 engine – the first evolution system that CT-scans parent models before merging them.
🤗 Model: FINAL-Bench/Darwin-35B-A3B-Opus

The result speaks for itself: GPQA Diamond 90.0%, versus Father (Qwen3.5-35B-A3B) at 84.2% and Mother (Claude 4.6 Opus Distilled) at 85.0%. That's a relative gain of +6.9% over Father and +5.9% over Mother. Not a tradeoff, but a genuine leap. Meanwhile, MMMLU sits at 85.0% (Father: 85.2%), multimodal is fully intact, and all 201 languages are preserved.

How? Model MRI changed everything. Traditional merging is guesswork. Darwin V4 added evolution. Darwin V5 added X-ray vision. Model MRI scans each parent layer by layer and discovers: Mother's L34–L38 is the reasoning engine (peak cosine distance), 50–65% of Mother's experts are dead (killed by text-only distillation), and Father is a healthy generalist with every expert alive. The prescription: transplant Mother's reasoning brain at L38 (90% weight), replace her dead experts with Father's living ones, and let Father's router handle the output layer. Reasoning went up. Versatility stayed intact. No tradeoff, just evolution.

35B total, 3B active (MoE) · GPQA Diamond 90.0% · MMMLU 85.0% (201 languages) · Multimodal Image & Video · 262K native context · 147.8 tok/s on H100 · Runs on a single RTX 4090 (Q4) · Apache 2.0
Darwin V5's full algorithm and technical details will be released alongside an upcoming paper.

🚀 Live Demo: FINAL-Bench/Darwin-35B-A3B-Opus

🏆 FINAL Bench Leaderboard: FINAL-Bench/Leaderboard

📊 ALL Bench Leaderboard: FINAL-Bench/all-bench-leaderboard

Built by VIDRAFT · Supported by the Korean Government GPU Support Program
SeaWolf-AI posted an update about 5 hours ago
💎 Gemma 4 Playground – Dual Model Demo on ZeroGPU

We just launched a Gemma 4 Playground that lets you chat with Google DeepMind's latest open models – directly on Hugging Face Spaces with ZeroGPU.

FINAL-Bench/Gemma-4-Multi

👉 Try it now: FINAL-Bench/Gemma-4-Multi
Two Models, One Space
Switch between both Gemma 4 variants in a single interface:

⚡ Gemma 4 26B-A4B – MoE with 128 experts, only 3.8B active params. 95% of the 31B's quality at ~8x faster inference. AIME 88.3%, GPQA 82.3%.
🏆 Gemma 4 31B – Dense 30.7B. Best quality among the Gemma 4 family. AIME 89.2%, GPQA 84.3%, Codeforces 2150. Arena open-model top 3.

Features

Vision – Upload images for analysis, OCR, chart reading, document parsing
Thinking Mode – Toggle chain-of-thought reasoning with Gemma 4's native <|channel> thinking tokens
System Prompts – 6 presets (General, Code, Math, Creative, Translate, Research) or write your own
Streaming – Real-time token-by-token responses via ZeroGPU
Apache 2.0 – Fully open, no restrictions

Technical Details
Built with the dev build of transformers (5.5.0.dev0) for full Gemma 4 support, including multimodal apply_chat_template, variable-resolution image processing, and native thinking mode. Runs on HF ZeroGPU with @spaces.GPU – no dedicated GPU needed.
Both models support a 256K context window and 140+ languages out of the box.
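For context, the multimodal apply_chat_template call mentioned above conventionally takes messages shaped like the sketch below. This is a minimal illustration of the message convention used by recent transformers processors; `build_messages` is a hypothetical helper, and exact Gemma 4 processor behavior may differ.

```python
# Hedged sketch: the multimodal chat-message structure conventionally
# passed to a transformers processor's apply_chat_template.
def build_messages(image_url: str, question: str, thinking: bool = False):
    """Build a system + user turn; the user turn mixes an image and text."""
    system_text = "Reason step by step." if thinking else "Answer concisely."
    return [
        {"role": "system", "content": [{"type": "text", "text": system_text}]},
        {
            "role": "user",
            "content": [
                {"type": "image", "url": image_url},
                {"type": "text", "text": question},
            ],
        },
    ]

messages = build_messages("https://example.com/chart.png", "Summarize this chart.")
# With a loaded processor this would then be tokenized along the lines of:
#   inputs = processor.apply_chat_template(
#       messages, add_generation_prompt=True, tokenize=True,
#       return_dict=True, return_tensors="pt")
```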

Links

- 🤗 Space: [FINAL-Bench/Gemma-4-Multi](FINAL-Bench/Gemma-4-Multi)
- 📄 Gemma 4 26B-A4B: [google/gemma-4-26B-A4B-it](google/gemma-4-26B-A4B-it)
- 📄 Gemma 4 31B: [google/gemma-4-31B-it](google/gemma-4-31B-it)
- 🔬 DeepMind Blog: [Gemma 4 Launch](https://deepmind.google/blog/gemma-4-byte-for-byte-the-most-capable-open-models/)
DedeProGames posted an update 1 day ago
🔥 GRM2 – The small one that surpasses the big ones.
What if a 3B-parameter model could beat a 32B-parameter model in every benchmark? We prove that it can.
GRM2 is a 3B-parameter model based on the Llama architecture, trained for long reasoning and high performance on complex tasks – the first 3B model to outperform qwen3-32b in ALL benchmarks, and to outperform o3-mini in almost all benchmarks.
🤗 Model: OrionLLM/GRM2-3b
The first 3B model to generate over 1,000 lines of code and achieve a score of 39.0 on xBench-DeepSearch-2510.

🚀 Chat with GRM:
DedeProGames/GRM2-Chat

πŸ† Download official GGUFs: OrionLLM/GRM2-3b-GGUF
MikeDoes posted an update 2 days ago
Things our clients and open source actually said to us this year:

"Finally, someone built synthetic PII training data for German."

"Does it have localised information? Not just the language, the actual format. That must have been a lot of work that we can save on our side."

"We operate in 12 EU countries. Your dataset is the only one that covers all of them which has helped us out a lot in compliance especially because it's synthetic."

Every language has strong PII localization: names, addresses, IDs, phone numbers, and dates in the real format of that country.

23 languages. 29 regions. 3 scripts. 1,428,143 examples.

100% synthetic. Zero real personal data. Free on Hugging Face.
sergiopaniego posted an update 2 days ago
TRL is officially an adult 🥳

Excited to announce TRL v1.0 ❗

Head to the blog to see how we got here and what's next for this post-training library, designed to keep pace with the field:

https://huggingface.co/blog/trl-v1
reaperdoesntknow posted an update 2 days ago
Your Loss Function Has Singularities. Classical Calculus Can't See Them.

Introducing Discrepancy Calculus (DISC) β€” treating training singularities as structure, not noise.

Loss plateaus, mode collapse, catastrophic forgetting, distilled models that know things the teacher never taught β€” we engineer around these. But what if those singularities are the actual structure of the learning problem?

The core insight: Every BV function decomposes into smooth (what classical calculus handles), jump (capability emergence, loss plateaus breaking), and Cantor (ghost imprinting β€” knowledge transferring through weight-space topology, not gradient signal). Classical analysis sees only the first. DISC sees all three.
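For readers who want the underlying math: the smooth/jump/Cantor split mirrors the classical decomposition of a BV function's distributional derivative from geometric measure theory. A sketch of that standard identity (with $J_f$ the jump set, $\nu_f$ its normal, and $f^\pm$ the one-sided traces); how DISC builds on it is the paper's contribution, not shown here:

```latex
% De Giorgi decomposition of Df for f of bounded variation on \mathbb{R}^n:
Df \;=\; \underbrace{\nabla f \,\mathcal{L}^n}_{\text{absolutely continuous (smooth)}}
\;+\; \underbrace{(f^{+} - f^{-})\,\nu_f \,\mathcal{H}^{n-1}\llcorner J_f}_{\text{jump}}
\;+\; \underbrace{D^{c} f}_{\text{Cantor}}
```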

The paper proves this isn't alternative notation – it's strictly larger. The Meta-Discrepancy Theorem: where singularities exist, the classical FTC/MVT/chain-rule package is provably impossible.

What it explains:

TopologicalQwen exhibited literary reasoning from physics-only data; the Cantor part explains how. DualMind's Explore→Examine→Response loop operationalizes DISC as inference dynamics. 50 models, 35K+ downloads, all built on this framework.

Paper: Discrepancy Calculus: Foundations and Core Theory (DOI: 10.57967/hf/8194) – 8 axioms, proofs, computational recipes.

Series: Structure Over Scale (DOI: 10.57967/hf/8165) → Three Teachers to Dual Cognition (DOI: 10.57967/hf/8184) → DISC Foundations

– Roy S. Colca Jr., Convergent Intelligence LLC: Research Division
ArtelTaleb posted an update about 16 hours ago
Hey Music Lovers!

You have a track and you want the stems.

Retro 6stems splits any audio file into 6 clean, separated tracks.

Drop an MP3, WAV, FLAC or M4A – Demucs gets to work immediately in a minimal retro design.

Instrumental mix included automatically
Vocals
Drums
Bass
Guitar
Piano
Other (strings, synths...)

Preview each stem directly in the browser, download individually or grab the full ZIP.

No install. No GPU needed on your end. Just upload and wait.


👉 ArtelTaleb/retro-6stems
shriarul5273 posted an update 1 day ago
πŸ” One API. 12 model families. 28 variants. Why depth_estimation makes depth research easier

Switching between depth models usually means rewriting preprocessing, adapting outputs, and dealing with different codebases.

depth_estimation removes that friction.

With the same interface, you can work with:
🌊 Depth Anything
🍎 DepthPro
🧭 MiDaS
πŸ“ ZoeDepth
🧩 MoGe
πŸ›°οΈ VGGT / OmniVGGT
and more

Change one model string, keep the rest of your workflow the same.

That makes it much easier to:
βš–οΈ compare models fairly
πŸ§ͺ prototype quickly
πŸ“ˆ benchmark consistently
πŸ› οΈ build reusable depth pipelines

GitHub: https://github.com/shriarul5273/depth_estimation

#depthestimation #research #computervision #python #machinelearning #opensource #pytorch