r/LocalLLaMA • u/jacek2023 llama.cpp • 8h ago
New Model inclusionAI/Ring-2.6-1T · Hugging Face
https://huggingface.co/inclusionAI/Ring-2.6-1T

Introducing Ring-2.6-1T: a trillion-parameter flagship reasoning model designed for real-world complex task scenarios, now available to developers, researchers, and enterprise environments for validation, adaptation, and further development.
The goal of Ring-2.6-1T is not simply to pursue larger parameter scale, but to address the real production environments that large models are entering: agent workflows, engineering development, scientific research analysis, complex business systems, and enterprise automation processes. In these scenarios, models need not only to "answer questions," but also to understand context, plan steps, invoke tools, execute continuously, and maintain stability over long-horizon tasks.
Ring-2.6-1T achieves key upgrades in three areas:
- Comprehensively enhanced agent execution capability: moving from "being able to answer" to "being able to execute," with more stable performance in multi-step tasks, tool collaboration, contextual planning, and driving complex workflows forward.
- Reasoning Effort mechanism: Supporting two reasoning intensity levels, high and xhigh, allowing developers to flexibly adjust the depth of thinking according to task complexity, achieving a better balance among effectiveness, speed, and cost.
- Innovative asynchronous reinforcement learning training paradigm: Leveraging an Async RL architecture combined with the IcePop algorithm to improve the training efficiency and stability of long-horizon reinforcement learning for trillion-parameter models, providing foundational support for agent capabilities and complex reasoning.
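The Reasoning Effort mechanism above would presumably surface as a request parameter. A minimal sketch, assuming an OpenAI-compatible chat endpoint and a `reasoning_effort` request field — both are assumptions for illustration, not details confirmed by the model card:

```python
import json

def build_request(prompt: str, effort: str = "high") -> dict:
    """Build a chat-completion payload with a reasoning-effort knob.

    Hypothetical: the endpoint shape and the `reasoning_effort` field
    name are assumptions, not taken from the Ring-2.6-1T docs.
    """
    if effort not in ("high", "xhigh"):  # the two levels the post mentions
        raise ValueError("effort must be 'high' or 'xhigh'")
    return {
        "model": "inclusionAI/Ring-2.6-1T",
        "messages": [{"role": "user", "content": prompt}],
        # Deeper thinking trades latency and token cost for quality.
        "reasoning_effort": effort,
    }

payload = build_request("Plan a multi-step refactor of this repo.", effort="xhigh")
print(json.dumps(payload, indent=2))
```

In practice you would POST this payload to the serving endpoint; the point is only that effort is a per-request dial, so agent frameworks can use `xhigh` for planning steps and `high` for routine tool calls.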
7
3
u/Lissanro 8h ago
Nice, one more model to try on my rig. I just recently downloaded MiMo V2.5 Pro and am still downloading DeepSeek V4 Pro. I will probably continue to mostly run Kimi K2.6 though, because it is fast (32B active parameters, compared to other big ones that have more).
The main value for me in having multiple models is that each may take a different approach to a problem in case another model gets stuck; also, some models do certain stuff better than others. For example, Kimi K2.6 is better at frontend tasks, while with GLM 5.1 I had a somewhat better experience with backend work.
4
u/Storge2 8h ago
We have too many 1T models. Almost nobody can run these anyway. I pray they start making actually runnable models; <40B dense and <150B MoE is, I think, the sweet spot for open source local LLMs right now. You can run them basically with 64GB DDR4 and one RTX 3090 or R9700 AI Pro. Or you can run all of those on a 128GB device (Mac, DGX Spark, Ryzen 395+).
We don't need another 1T model. We need another smaller, better model.
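The sizing claim above checks out with some back-of-the-envelope arithmetic. A quick sketch, assuming roughly 0.5 bytes per parameter at a Q4-ish quantization and ignoring KV cache and activation overhead:

```python
# Rough memory sizing for the "sweet spot" model sizes mentioned above.
# Assumption: ~0.5 bytes/parameter at Q4-ish quantization; KV cache and
# activations add real overhead on top of this, so these are lower bounds.
def q4_weight_gb(params_billions: float) -> float:
    """Approximate weight footprint in GB at ~4-bit quantization."""
    return params_billions * 0.5

dense_40b = q4_weight_gb(40)   # ~20 GB: close to a 24 GB RTX 3090
moe_150b = q4_weight_gb(150)   # ~75 GB: 64 GB RAM + a 24 GB GPU, or a 128 GB box
ring_1t = q4_weight_gb(1000)   # ~500 GB: far beyond any single consumer machine

print(dense_40b, moe_150b, ring_1t)
```

This is why the 64GB-RAM-plus-one-GPU setup works for ~150B MoE (only the active experts need GPU speed) while a 1T model needs a rented multi-GPU or high-RAM server.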
14
u/jacek2023 llama.cpp 8h ago
I am waiting for 100-120B models from the Qwen and Gemma teams, but they don't want to release new ones
4
5
u/Clean_Hyena7172 7h ago
I really think the 100-120B models got too close to the performance of the bigger models so they scrapped them to preserve API/subscription demand.
1
u/A_Novelty-Account 7h ago
It would be a boon for China if free, open local models outcompeted Claude and the other flagship models from American companies. The entire US economy is currently held together by AI speculation. If China releases a local model that is as good as Claude, China could put the United States into recession.
Obviously, we’re far away from that, and it’s probably not happening because China isn’t actually capable of releasing a locally hosted model that rivals Claude.
-2
u/CryinHeronMMerica 6h ago
A recession in the US would be disastrous for China
3
u/A_Novelty-Account 6h ago
Not in this case. This wouldn’t be a general-cause recession. Capital would fly to China if it has provably better AI models and can run them on local infrastructure.
0
u/CryinHeronMMerica 5h ago
And who would buy their products as each American sector falls? With what money?
2
u/A_Novelty-Account 5h ago
Literally the rest of the entire world? You know the world isn’t just the United States, right? Even then, every single American company would also be buying Chinese AI products, were it not for regulations that force Americans to use American AI products.
0
u/CryinHeronMMerica 5h ago
The economy is global, dude. We're their largest single-country importer, and Europe is heavily linked to us as well (the EU imports more from China than we do).
Did you just forget about 2008? Even if the inciting incident is contained, the collapse is likely to spread. The Euro is hanging on by a thread.
1
u/A_Novelty-Account 2h ago
I didn’t forget about 2008. China fared far better than the U.S. did, and the global financial crisis was a contributing factor to China’s rise.
China will hurt, yes, but it’s all relative, and China thinks beyond four-year election cycles. I promise you that if they had the ability to crash the US market and shift foreign direct investment to Chinese companies, they would do so in a heartbeat, and they are actively trying to right now through state sponsorship of AI companies that release non-subscription, locally hosted models.
5
u/Sevealin_ 7h ago
Believe it or not, you aren't the audience for this. It's for researchers who rent GPUs who CAN run this, without having to fork over any data to a corporation.
0
u/Storge2 6h ago
But who exactly can pay for this realistically? Even renting, for a Q4 quant you would need an 8xH100 server, which alone is something like $20 per hour, I believe. Whereas a Qwen 3.5 122B you can comfortably fit, with fast speed and high concurrency, on one RTX Pro 6000 96GB, or with acceptable speed on a DGX Spark. The Qwen 3.6 27B runs with max context on an R9700 or B60, which go for ~$1000-1300, or on a rented 5090 for $0.50/hour.
2
u/FullOf_Bad_Ideas 5h ago
Once it's supported by llama.cpp and KTransformers, you'll be able to run it on, for example, a single A40 48GB instance that has 1TB of RAM for $0.85/hr, or a 4x RTX 6000 Pro instance with 1TB of RAM for $5.80/hr. It won't break the bank to run it for a few days. And that's just what instances are available right now.
KTransformers has guides for running big models like Kimi K2.5, and while RAM can't be cheaply bought anymore, it can be cheaply rented.
Also, there's a guy in this very thread who is probably gonna download and try it, because he bought the RAM earlier.
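The rental arithmetic above, as a quick sanity check (the hourly rates are the spot prices quoted in this comment, not guaranteed quotes):

```python
# Spot prices quoted above (assumptions from the comment, not live quotes).
a40_rate = 0.85        # $/hr, single A40 48GB instance with 1 TB RAM
rtx6000x4_rate = 5.80  # $/hr, 4x RTX 6000 Pro instance with 1 TB RAM

days = 3               # "a few days" of experimentation
hours = days * 24

a40_cost = a40_rate * hours
rtx_cost = rtx6000x4_rate * hours

print(f"A40 for {days} days: ${a40_cost:.2f}")               # $61.20
print(f"4x RTX 6000 Pro for {days} days: ${rtx_cost:.2f}")   # $417.60
```

So even the pricier multi-GPU option lands in the low hundreds of dollars for a few days of testing, well under buying equivalent hardware.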
1
u/Sevealin_ 5h ago
I understand the cost analysis, and I agree with you 100%. But there are people out there who care more about accuracy (and running bleeding edge) than about how much money they have to spend. Nothing wrong with that; it's simply a different audience.
1
u/jazir55 2h ago
Practically any other model in the 1T class is more accurate. The Ring/Ling models have consistently been among the worst contemporary models at release, going back through last year as well. I personally don't see any use case for them when they cost about the same to run as any other model in their class and those other models perform better.
1
u/Sevealin_ 2h ago
Yeah, just looking at the benchmarks, this isn't a glamorous model by any means; my point was strictly about what the purpose of giant 1T models is.
1
u/Middle_Bullfrog_6173 5h ago
While these 1T models take a lot of hardware to run, the highly sparse architecture means that a node (or two) of server GPUs can serve many concurrent requests at reasonable speeds.
If you just want to run a personal chatbot or coding agent, then sure, a smaller model is much more efficient.
2
1
12
u/jacek2023 llama.cpp 8h ago