🧬 Darwin-35B-A3B-Opus – The Child That Surpassed Both Parents
What if a merged model could beat both its parents? We proved it can. Darwin-35B-A3B-Opus is a 35B MoE model (3B active) built with our Darwin V5 engine – the first evolution system that CT-scans parent models before merging them. 🤗 Model: FINAL-Bench/Darwin-35B-A3B-Opus
The result speaks for itself: GPQA Diamond 90.0%, versus Father (Qwen3.5-35B-A3B) at 84.2% and Mother (Claude 4.6 Opus Distilled) at 85.0% – a relative gain of 6.9% over Father and 5.9% over Mother. Not a tradeoff, a genuine leap. Meanwhile, MMMLU sits at 85.0% (Father: 85.2%), multimodal is fully intact, and all 201 languages are preserved.
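The relative gains follow directly from the benchmark scores above; a quick sanity check:

```python
father, mother, child = 84.2, 85.0, 90.0  # GPQA Diamond scores from the post

def relative_gain(child_score, parent_score):
    """Relative improvement of the child over a parent, in percent."""
    return round((child_score - parent_score) / parent_score * 100, 1)

print(relative_gain(child, father))  # 6.9
print(relative_gain(child, mother))  # 5.9
```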
How? Model MRI changed everything. Traditional merging is guesswork. Darwin V4 added evolution. Darwin V5 added X-ray vision. Model MRI scans each parent layer by layer and discovers: Mother's L34–L38 is the reasoning engine (peak cosine distance), 50–65% of Mother's experts are dead (killed by text-only distillation), and Father is a healthy generalist with every expert alive. The prescription: transplant Mother's reasoning brain at L38 (90% weight), replace her dead experts with Father's living ones, and let Father's router handle the output layer. Reasoning went up. Versatility stayed intact. No tradeoff, just evolution.
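Darwin V5's actual algorithm is unreleased (see the paper note below), but the three ingredients named here can be sketched in a few lines of NumPy. Everything in this snippet – function names, the dead-expert variance threshold, the 90/10 blend – is a hypothetical illustration, not the real implementation:

```python
import numpy as np

def layer_cosine_distance(a, b):
    """Cosine distance between two flattened layer weight tensors;
    a peak value flags where the parents diverge most (the 'reasoning engine')."""
    a, b = a.ravel(), b.ravel()
    cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return 1.0 - cos

def expert_is_dead(expert_weights, eps=1e-3):
    """Toy liveness check: an expert whose weights have near-zero variance
    contributes nothing. The real criterion is unknown; eps is invented."""
    return float(np.std(expert_weights)) < eps

def transplant(mother_layer, father_layer, mother_weight=0.9):
    """Weighted transplant, per the 90% Mother / 10% Father recipe above."""
    return mother_weight * mother_layer + (1 - mother_weight) * father_layer
```

The merge recipe then reduces to: scan layers with `layer_cosine_distance`, swap any expert failing `expert_is_dead` for Father's copy, and blend the peak-distance layers with `transplant`.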
35B total, 3B active (MoE) · GPQA Diamond 90.0% · MMMLU 85.0% (201 languages) · Multimodal Image & Video · 262K native context · 147.8 tok/s on H100 · Runs on a single RTX 4090 (Q4) · Apache 2.0. Darwin V5's full algorithm and technical details will be released alongside an upcoming paper.
🚀 Try it now: FINAL-Bench/Gemma-4-Multi
Two Models, One Space. Switch between both Gemma 4 variants in a single interface:
⚡ Gemma 4 26B-A4B – MoE with 128 experts, only 3.8B active params. 95% of the 31B's quality at ~8x faster inference. AIME 88.3%, GPQA 82.3%.
🚀 Gemma 4 31B – Dense 30.7B. Best quality in the Gemma 4 family. AIME 89.2%, GPQA 84.3%, Codeforces 2150. Arena open-model top 3.
Features
Vision – Upload images for analysis, OCR, chart reading, document parsing
Thinking Mode – Toggle chain-of-thought reasoning with Gemma 4's native <|channel> thinking tokens
System Prompts – 6 presets (General, Code, Math, Creative, Translate, Research) or write your own
Streaming – Real-time token-by-token responses via ZeroGPU
Apache 2.0 – Fully open, no restrictions
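As a rough sketch of how such a Space might assemble a request: the six preset names are from the feature list above, but the prompt texts, the `build_messages` helper, and the thinking-mode wording are all invented for illustration – the real app would pass the result to `tokenizer.apply_chat_template`:

```python
# Hypothetical preset texts; only the six names come from the Space.
PRESETS = {
    "General": "You are a helpful assistant.",
    "Code": "You are an expert programmer. Answer with working code.",
    "Math": "You are a careful mathematician. Show your steps.",
    "Creative": "You are an imaginative writer.",
    "Translate": "You are a professional translator.",
    "Research": "You are a meticulous research assistant. Cite sources.",
}

def build_messages(user_text, preset="General", thinking=False):
    """Build a chat-template message list; in the real app the `thinking`
    toggle would instead switch Gemma 4's native thinking tokens on."""
    system = PRESETS[preset]
    if thinking:
        system += " Think step by step before answering."
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_text},
    ]
```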
Technical Details Built with the dev build of transformers (5.5.0.dev0) for full Gemma 4 support, including multimodal apply_chat_template, variable-resolution image processing, and native thinking mode. Runs on HF ZeroGPU with @spaces.GPU – no dedicated GPU needed. Both models support a 256K context window and 140+ languages out of the box.
🔥 GRM2 - The small one that surpasses the big ones. What if a 3B-parameter model could beat a 32B-parameter model on every benchmark? We prove that it can. GRM2 is a 3B-param model based on the Llama architecture, trained for long reasoning and high performance on complex tasks - the first 3B-param model to outperform Qwen3-32B on ALL benchmarks, and to outperform o3-mini on almost all benchmarks. 🤗 Model: OrionLLM/GRM2-3b. It is also the first 3B-param model to generate over 1,000 lines of code and to score 39.0 on xBench-DeepSearch-2510.
Things our clients and open-source users actually said to us this year:
"Finally, someone built a synthetic PII training data for German."
"Does it cover have localised information? Not just the language, the actual format. That must have been a lot of work that we can save from our side."
"We operate in 12 EU countries. Your dataset is the only one that covers all of them which has helped us out a lot in compliance especially because it's synthetic."
Every language has strong PII localization: names, addresses, IDs, phone numbers, and dates in the real format of that country.
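To make "real format of that country" concrete, here is a small illustrative validator using Germany as the example. The patterns are simplified (real German phone numbers and dates allow more variants), and this helper is not part of the dataset's tooling:

```python
import re

# Simplified German formats, for illustration only.
DE_FORMATS = {
    "postal_code": r"\d{5}",               # e.g. 10115 (Berlin)
    "phone": r"\+49 \d{2,5} \d{4,8}",      # e.g. +49 30 123456
    "date": r"\d{2}\.\d{2}\.\d{4}",        # e.g. 24.12.2025 (DD.MM.YYYY)
}

def is_localized(field, value, formats=DE_FORMATS):
    """True if a synthetic PII value matches the country's expected format."""
    return re.fullmatch(formats[field], value) is not None
```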
Your Loss Function Has Singularities. Classical Calculus Can't See Them.
Introducing Discrepancy Calculus (DISC) – treating training singularities as structure, not noise.
Loss plateaus, mode collapse, catastrophic forgetting, distilled models that know things the teacher never taught: we engineer around these. But what if those singularities are the actual structure of the learning problem?
The core insight: Every BV function decomposes into smooth (what classical calculus handles), jump (capability emergence, loss plateaus breaking), and Cantor (ghost imprinting – knowledge transferring through weight-space topology, not gradient signal). Classical analysis sees only the first. DISC sees all three.
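The three-part split invoked here mirrors the classical decomposition of the derivative of a function of bounded variation into absolutely continuous, jump, and Cantor parts; in standard notation:

```latex
% Derivative measure of f \in BV(\Omega): smooth + jump + Cantor parts
Df = \underbrace{\nabla f \,\mathcal{L}^{n}}_{\text{smooth}}
   + \underbrace{(f^{+} - f^{-})\,\nu_f\,\mathcal{H}^{n-1}\!\mid_{J_f}}_{\text{jump}}
   + \underbrace{D^{c} f}_{\text{Cantor}}
```

Classical calculus operates only on the first term; DISC's claim is that the jump and Cantor terms carry the training phenomena listed above.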
The paper proves this isn't alternative notation; it's strictly larger. The Meta-Discrepancy Theorem: where singularities exist, the classical FTC/MVT/chain-rule package is provably impossible.
What it explains:
TopologicalQwen exhibited literary reasoning from physics-only data; the Cantor part explains how.
DualMind's Explore→Examine→Response loop operationalizes DISC as inference dynamics.
50 models, 35K+ downloads, all built on this framework.
Paper: Discrepancy Calculus: Foundations and Core Theory (DOI: 10.57967/hf/8194) – 8 axioms, proofs, computational recipes.
Series: Structure Over Scale (DOI: 10.57967/hf/8165) → Three Teachers to Dual Cognition (DOI: 10.57967/hf/8184) → DISC Foundations
– Roy S. Colca Jr., Convergent Intelligence LLC: Research Division
Hey Music Lovers! You have a track and you want the stems. Retro 6stems splits any audio file into 6 clean, separated tracks: Vocals, Drums, Bass, Guitar, Piano, and Other (strings, synths...). Drop an MP3, WAV, FLAC or M4A and Demucs gets to work immediately in a minimal retro design, with an instrumental mix included automatically. Preview each stem directly in the browser, download individually, or grab the full ZIP. No install. No GPU needed on your end. Just upload and wait.
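The Space's internals aren't published, but the same six stems can be produced locally with the open-source Demucs CLI: its `htdemucs_6s` model separates vocals, drums, bass, guitar, piano, and other. A minimal sketch (the helper function is ours, not the Space's):

```python
import subprocess

def demucs_command(audio_path, out_dir="separated"):
    """Build a Demucs CLI invocation for 6-stem separation.
    -n selects the model; -o sets the output directory."""
    return ["demucs", "-n", "htdemucs_6s", "-o", out_dir, audio_path]

cmd = demucs_command("my_track.mp3")
# subprocess.run(cmd, check=True)  # writes stems under separated/htdemucs_6s/my_track/
```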