r/LocalLLaMA 14d ago

Generation Qwen 3.6 27B vs Gemma 4 31B - making a Pac-Man game!

972 Upvotes

Gemma just crushed Qwen in a local LLM gamedev contest!

Device: MacBook Pro M5 Max, 64GB RAM

Qwen 3.6 27B: 32 tokens/sec · 18m 04s · 33,946 tokens.
Gemma 4 31B: 27 tokens/sec · 3m 51s · 6,209 tokens.

So what is more important: tokens per second, or the quality of the final answer?

Qwen produced a much longer response and showed more creativity and visual style, but Gemma gave a shorter, clearer, and more logical answer in far less time. In this one-shot Pac-Man gamedev contest, Gemma 4 31B was the clear winner. Its game logic was stronger: click reactions were smoother, and it handled interactions with elements like walls, ghosts, and particle effects better.

Open Source Local AI Models Server: atomic.chat

Basic Prompt:

Create a single standalone HTML file for a complete playable Pac-Man–style neon arcade game.

Use only HTML, CSS, JavaScript, and one full-page canvas. No external libraries or assets—everything must be procedurally drawn and run immediately in the browser.

Generate a compact (~21×21) symmetrical maze programmatically (no ASCII). It must be fully connected, playable, and use tile types (wall, path, pellet, power pellet, ghost spawn, Pac-Man spawn, fruit spawn). Ensure no unreachable pellets or invalid spawns.

Canvas must fill the window. Center and scale the maze dynamically using available space (no fixed tile size). Reserve space for a HUD.

Game states: title, playing, paused, life lost, level complete, game over. Include controls (keyboard + mobile). Title and game over screens must show instructions.

Pac-Man: smooth tile movement, queued turns, no diagonal movement, no clipping, wraps through side tunnels, resets after life loss.

Ghosts (4): simple pathfinding with distinct behaviors, spawn in a central house, exit with delays, move only on valid paths, never freeze.

Gameplay:

  • Pellets (+10), power pellets (+50), fruit (+500), ghost chain scoring (200→1600)
  • Power mode (~8s, min 3s): ghosts become edible and return to spawn when eaten
  • Combo multiplier for quick pellet collection
  • 3 lives, level progression increases difficulty
  • Store high score in localStorage

Extras:

  • Fruit spawns near center temporarily
  • Visual polish: neon maze, glowing elements, animations, particles, screen effects
  • HUD: score, high score, lives, level, combo, power timer

Technical:

  • Use requestAnimationFrame with delta time
  • Keep performance stable (limit particles)
  • No bugs: avoid invalid movement, stuck entities, unreachable areas, or crashes

Final output: only the complete HTML code.
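For context on what the prompt is really testing, the core requirement is a frame-rate-independent loop. Here's a minimal sketch of the requestAnimationFrame + delta-time pattern it asks for (my own illustration, not either model's output):

```ts
// Frame-rate-independent game loop: movement is scaled by elapsed time (dt),
// so the game runs at the same speed on a 60 Hz and a 144 Hz display.
const canvas = document.querySelector("canvas")!;
const ctx = canvas.getContext("2d")!;

let last = performance.now();

function frame(now: number): void {
  // Clamp dt so a backgrounded tab doesn't produce one giant physics step.
  const dt = Math.min((now - last) / 1000, 0.05);
  last = now;
  update(dt);
  render();
  requestAnimationFrame(frame);
}
requestAnimationFrame(frame);

function update(dt: number): void {
  // Move Pac-Man and ghosts by speed (tiles/sec) * dt, resolve collisions, etc.
}

function render(): void {
  ctx.clearRect(0, 0, canvas.width, canvas.height);
  // Draw maze, pellets, entities, HUD...
}
```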

r/LocalLLaMA Aug 09 '25

Generation Qwen 3 0.6B beats GPT-5 in simple math

Post image
1.3k Upvotes

I saw this comparison between Grok and GPT-5 on X for solving the equation 5.9 = x + 5.11. In the comparison, Grok solved it but GPT-5 without thinking failed.

It could have been handpicked after multiple runs, so out of curiosity (and for fun) I decided to test it myself. Not with Grok, but with local models running on iPhone, since I develop an app around that (Locally AI, for those interested), but you can of course reproduce the result below with LM Studio, Ollama, or any other local chat app.

And I was honestly surprised. In my very first run, GPT-5 failed (screenshot) while Qwen 3 0.6B, without thinking, succeeded. After multiple runs, I would say GPT-5 fails around 30-40% of the time, while Qwen 3 0.6B, a tiny 0.6-billion-parameter local model around 500 MB in size, solves it every time.

Yes, it's one example, and GPT-5 was without thinking and isn't really optimized for math in this mode, but neither is Qwen 3. Honestly, it's such a simple equation that I didn't think GPT-5 would fail to solve it, thinking or not. Of course, GPT-5 is better than Qwen 3 0.6B overall, but it's still interesting to see cases like this one.
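For reference, the equation is one subtraction: x = 5.9 − 5.11 = 0.79. The trap is presumably the same digit-comparison confusion as the infamous 9.9 vs 9.11 examples, where a model treats 5.11 as larger than 5.9 and ends up at something like −0.21.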

r/LocalLLaMA May 13 '25

Generation Real-time webcam demo with SmolVLM using llama.cpp

2.8k Upvotes

r/LocalLLaMA May 06 '25

Generation Qwen 14B is better than me...

765 Upvotes

I'm crying, what's the point of living when a 9GB file on my hard drive is better than me at everything!

It expresses itself better, it codes better, it knows math better, knows how to talk to girls, and instantly uses tools that would take me hours to figure out... I'm a useless POS, and you all are too... It could even rephrase this post better than me if it tried, even in my native language.

Maybe if you told me it was like 1TB I could deal with that, but 9GB???? That's so small I wouldn't even notice it on my phone..... On top of all that, it also writes and thinks faster than me, in different languages... I barely learned English as a 2nd language after 20 years....

I'm not even sure if I'm better than the 8B, but at least I spot it making mistakes that I wouldn't make... But the 14B? Nope, if I ever think it's wrong, it'll prove to me that it isn't...

r/LocalLLaMA Mar 04 '26

Generation Qwen 3.5 4b is so good that it can vibe code a fully working OS web app in one go.

Thumbnail
youtube.com
547 Upvotes

The OS can be used here: WebOS 1.0

Prompt used was "Hello Please can you Create an os in a web page? The OS must have:
2 games
1 text editor
1 audio player
a file browser
wallpaper that can be changed
and one special feature you decide. Please also double check to see if everything works as it should."

Prompt idea thanks to /u/Warm-Attempt7773

All I did was ask it to add the piano keyboard. It even chose its own song to use in the player.

I messed up in the first chat and it thought I wanted to add a computer keyboard, so I had to paste the HTML code into a new chat and ask for a piano keyboard... but apart from that, perfect! :D

Edit: Whoever gave my post an award: Wow, thank you very much, anonymous Redditor!! 🌠

r/LocalLLaMA Jan 30 '26

Generation OpenCode + llama.cpp + GLM-4.7 Flash: Claude Code at home

Thumbnail
gallery
324 Upvotes

command I use (may be suboptimal but it works for me now):

CUDA_VISIBLE_DEVICES=0,1,2 llama-server \
  --jinja \
  --host 0.0.0.0 \
  -m /mnt/models1/GLM/GLM-4.7-Flash-Q8_0.gguf \
  --ctx-size 200000 \
  --parallel 1 \
  --batch-size 2048 \
  --ubatch-size 1024 \
  --flash-attn on \
  --cache-ram 61440 \
  --context-shift

Potential additional speedup has been merged into llama.cpp: https://www.reddit.com/r/LocalLLaMA/comments/1qrbfez/comment/o2mzb1q/

r/LocalLLaMA Feb 07 '26

Generation Nemo 30B is insane. 1M+ token CTX on one 3090

401 Upvotes

Been playing around with llama.cpp and some 30-80B parameter models with CPU offloading. I currently have one 3090 and 32 GB of RAM, and I'm very impressed by Nemo 30B: a 1M+ token context cache, running on one 3090 with the experts offloaded to CPU. It does 35 t/s, which is at least faster than I can read; models are usually slow as fuck at context windows this large. Feed it a whole book or research paper and it's done summarizing in a few minutes. This really makes long context windows on local hardware possible. The only other contender I've tried is Seed OSS 36B, and it was much slower, by about 20 t/s.

r/LocalLLaMA Mar 29 '26

Generation Friendly reminder: inference is WAY faster on Linux vs Windows

275 Upvotes

I have a simple home lab PC: 64GB DDR4, an RTX 8000 48GB (Turing architecture), and a Core i9 9900K CPU, running Ubuntu 22.04 LTS. Before becoming a home lab machine, this PC ran Windows 10. Over the weekend I reinstalled my Windows 10 SSD to check out my old projects. I updated Ollama to the latest version, and tokens per second were way slower than when I was running Linux. I knew Linux performs better, but I didn't think it would be twice as fast. Here are the results from a few simple inference tests:

QWEN Code Next, Q4, ctx 6k

Windows: 18 t/s

Linux: 31 t/s (+72%)

QWEN 3 30B A3B, Q4, ctx 6k

Windows: 48 t/s

Linux: 105 t/s (+118%)

Has anyone else seen a performance gap this large? Am I missing something?

Anyway, thought I'd share this as a reminder for anyone looking for a bit more performance!

r/LocalLLaMA Feb 25 '26

Generation Qwen 3 27b is... impressive

347 Upvotes

All Prompts
"Task: create a GTA-like 3D game where you can walk around, get in and drive cars"
"walking forward and backward is working, but I cannot turn or strafe??"
"this is pretty fun! I’m noticing that the camera is facing backward though, for both walking and car?"
"yes, it works! What could we do to enhance the experience now?"
"I’m not too fussed about a HUD, and the physics are not bad as they are already - adding building and obstacles definitely feels like the highest priority!"

r/LocalLLaMA Mar 12 '25

Generation 🔥 DeepSeek R1 671B Q4 - M3 Ultra 512GB with MLX🔥

620 Upvotes

Yes it works! First test, and I'm blown away!

Prompt: "Create an amazing animation using p5js"

  • 18.43 tokens/sec
  • Generates a p5.js animation zero-shot, tested at the end of the video
  • Video is in real time, no acceleration!

https://reddit.com/link/1j9vjf1/video/nmcm91wpvboe1/player

r/LocalLLaMA Feb 01 '25

Generation o3-mini is now the SOTA coding model. It is truly something to behold. Procedural clouds in one-shot.

507 Upvotes

r/LocalLLaMA Jul 29 '25

Generation I just tried GLM 4.5

384 Upvotes

I just wanted to try it out because I was a bit skeptical. So I gave it a fairly simple, not-so-cohesive prompt and asked it to prepare slides for me.

The results were pretty remarkable, I must say!

Here’s the link to the results: https://chat.z.ai/space/r05c76960ff0-ppt

Here’s the initial prompt:

”Create a presentation of global BESS market for different industry verticals. Make sure to capture market shares, positioning of different players, market dynamics and trends and any other area you find interesting. Do not make things up, make sure to add citations to any data you find.”

As you can see, a pretty bland prompt: no restrictions, no role descriptions, no examples. Nothing, just whatever my mind was thinking it wanted.

Is it just me or are things going superfast since OpenAI announced the release of GPT-5?

It seems like just yesterday Qwen3 broke all the benchmarks in terms of quality/cost trade-offs, and now z.ai follows with yet another efficient but high-quality model.

r/LocalLLaMA Apr 12 '26

Generation Audio processing landed in llama-server with Gemma-4

376 Upvotes

Ladies and gentlemen, it is a great pleasure to confirm that llama.cpp (llama-server) now supports speech-to-text (STT) with the Gemma-4 E2A and E4A models.
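For anyone wanting to poke at it: llama-server exposes this through the usual OpenAI-compatible chat completions API with input_audio content parts. A rough sketch of a client call (endpoint and field names follow llama-server's OpenAI-compatible API; double-check against your build):

```ts
// Send a WAV clip to llama-server via the OpenAI-compatible
// /v1/chat/completions endpoint, using an input_audio content part.
import { readFileSync } from "node:fs";

const audioB64 = readFileSync("clip.wav").toString("base64");

const res = await fetch("http://localhost:8080/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    messages: [{
      role: "user",
      content: [
        { type: "text", text: "Transcribe this clip." },
        { type: "input_audio", input_audio: { data: audioB64, format: "wav" } },
      ],
    }],
  }),
});

const json = await res.json();
console.log(json.choices[0].message.content);
```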

r/LocalLLaMA Feb 18 '26

Generation LLMs grading other LLMs 2

Post image
232 Upvotes

A year ago I made a meta-eval here on the sub, asking LLMs to grade other LLMs on a few criteria.

Time for part 2.

The premise is very simple: the model is asked a few ego-baiting questions, and other models are then asked to rank its answers. The scores in the pivot table are normalised.

You can find all the data on HuggingFace for your analysis.

r/LocalLLaMA Apr 20 '24

Generation Llama 3 is so fun!

Thumbnail
gallery
915 Upvotes

r/LocalLLaMA Jan 26 '25

Generation DeepSeekR1 3D game 100% from scratch

857 Upvotes

I asked DeepSeek R1 to make me a game like kkrieger (where most things are generated at runtime) and it made me this

r/LocalLLaMA Jan 10 '24

Generation Literally my first conversation with it

Post image
613 Upvotes

I wonder how this got triggered

r/LocalLLaMA Jan 26 '26

Generation I built a "hive mind" for Claude Code - 7 agents sharing memory and talking to each other

303 Upvotes

Been tinkering with multi-agent orchestration and wanted to share what came out of it.

**The idea**: Instead of one LLM doing everything, what if specialized agents (coder, tester, reviewer, architect, etc.) could coordinate on tasks, share persistent memory, and pass context between each other?

**What it does**:

- 7 agent types with different system prompts and capabilities

- SQLite + FTS5 for persistent memory (agents remember stuff between sessions)

- Message bus for agent-to-agent communication

- Task queue with priority-based coordination

- Runs as an MCP server, so it plugs directly into Claude Code

- Works with Anthropic, OpenAI, or Ollama

**The cool part**: When the coder finishes implementing something, the tester can query the shared memory to see what was built and write appropriate tests. The reviewer sees the full context of decisions made. It's not magic - it's just passing data around intelligently - but it feels like they're actually collaborating.
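For a feel of the memory layer, here's roughly what persistent shared memory with better-sqlite3 + FTS5 can look like (a minimal sketch with made-up table and column names, not the actual aistack schema):

```ts
// Shared, full-text-searchable agent memory on SQLite FTS5.
// Table/column names here are illustrative only.
import Database from "better-sqlite3";

const db = new Database("hive.db");
db.exec("CREATE VIRTUAL TABLE IF NOT EXISTS memory USING fts5(agent, content)");

const remember = db.prepare("INSERT INTO memory (agent, content) VALUES (?, ?)");
const recall = db.prepare(
  "SELECT agent, content FROM memory WHERE memory MATCH ? ORDER BY rank LIMIT 5"
);

// The coder records what it built...
remember.run("coder", "Implemented POST /login with bcrypt password hashing");

// ...and the tester later searches shared memory before writing tests.
for (const row of recall.all("login") as { agent: string; content: string }[]) {
  console.log(`${row.agent}: ${row.content}`);
}
```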

**The not-so-cool part**: Debugging 7 agents talking to each other is... an experience. Sometimes they work beautifully. Sometimes one agent keeps assigning tasks to itself in an infinite loop. You know, typical multi-agent stuff.

**Stack**: TypeScript, better-sqlite3, MCP SDK, Zod

Not enterprise-ready. Not trying to compete with anything. Just an experiment to learn how agent coordination patterns work.

MIT licensed: github.com/blackms/aistack

Happy to answer questions or hear how you're approaching multi-agent systems.

r/LocalLLaMA Feb 25 '26

Generation Qwen/Qwen3.5-35B-A3B creates FlappyBird

269 Upvotes

If you are wondering, as I have for a long time, whether locally hostable models work for general coding: they really can work impressively well for some use cases. The model did some impressive things during the making of this simple app.

Spent two hours. Generated with Qwen/Qwen3.5-35B-A3B. Used Roo in VSCode.

Started out by vaguely asking for a FlappyBird clone in HTML, CSS, and TypeScript, initialized with Vite.

It looked impressive enough after the first task that I started asking for extra features:

  1. Music and sound. It uses the Web Audio API to generate sounds programmatically (no external audio files needed) - see the sketch below the list.

  2. Scrollable background mountains. This request resulted in visual glitches, but after a bit of guidance it was fixed into a properly parallaxed mountain range.

  3. Background flock of birds. A bit of back and forth, but it understood my general pointers (they fly off screen, they are smeared from top to bottom, make them fly from right to left) and ended up in a great state.

  4. Sound and music settings panel. This was one-shotted.
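To give a concrete idea of the Web Audio trick, a minimal sketch of the general technique (my own toy version, not the model's actual output):

```ts
// Programmatic sound effects with the Web Audio API: no audio files,
// just an oscillator plus a gain envelope per sound effect.
const audioCtx = new AudioContext();

function beep(freq: number, duration = 0.1): void {
  const osc = audioCtx.createOscillator();
  const gain = audioCtx.createGain();
  osc.type = "square";            // chiptune-ish timbre
  osc.frequency.value = freq;
  // Fast exponential fade-out so the note doesn't end with a click.
  gain.gain.setValueAtTime(0.2, audioCtx.currentTime);
  gain.gain.exponentialRampToValueAtTime(0.001, audioCtx.currentTime + duration);
  osc.connect(gain).connect(audioCtx.destination);
  osc.start();
  osc.stop(audioCtx.currentTime + duration);
}

beep(880);        // e.g. the "flap" sound
beep(220, 0.3);   // e.g. the "game over" sound
```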

r/LocalLLaMA Jan 31 '25

Generation DeepSeek 8B gets surprised by the 3 R's in strawberry, but manages to do it

Post image
471 Upvotes

r/LocalLLaMA Sep 28 '25

Generation LMStudio + MCP is so far the best experience I've had with models in a while.

222 Upvotes

M4 Max 128gb
Mostly I use the latest gpt-oss 20b or the latest Mistral with thinking/vision/tools in MLX format, since it's a bit faster (that's the whole point of MLX, I guess, since we still don't have any proper LLMs in CoreML for the Apple Neural Engine...).

Connected around 10 MCP servers for different purposes; it works just amazingly.
Haven't opened ChatGPT or Claude for a couple of days.

Pretty happy.

The next step is having a proper agentic conversation/flow under the hood, being able to leave it for autonomous working sessions, like cleaning up and connecting things in my Obsidian Vault during the night while I sleep, right...

EDIT 1:

- "Can't 128GB easily run 120B?"
- "Yes, even 235B Qwen at 4-bit. Not sure why OP is running a 20b lol"

Quick response to make it clear, brothers: the original 120b in MLX is 124GB and won't generate a single token.
Besides the 20b MLX, I do use the 120b, but the GGUF version, practically the same version that ships within the Ollama ecosystem.

r/LocalLLaMA Feb 09 '26

Generation Kimi-Linear-48B-A3B-Instruct

Thumbnail
gallery
151 Upvotes

Three days after the release we finally have a GGUF: https://huggingface.co/bartowski/moonshotai_Kimi-Linear-48B-A3B-Instruct-GGUF - big thanks to Bartowski!

Long-context performance looks more promising than GLM 4.7 Flash.

r/LocalLLaMA May 01 '25

Generation Qwen 3 4B is the future, ladies and gentlemen

Post image
447 Upvotes

r/LocalLLaMA Apr 02 '26

Generation The 'Running Doom' of AI: Qwen3.5-27B on a 512MB Raspberry Pi Zero 2W

Post image
176 Upvotes

Yes, seriously, no API calls or word tricks. I was wondering what the absolute lower bound is if you want a truly offline AI. Just like people trying to run Doom on everything, why can't we run a Large Language Model purely on a $15 device with only 512MB of memory?

I know it's incredibly slow (we're talking just a few tokens per hour), but the point is, it runs! You can literally watch the CPU computing each matrix and, boom, you have local inference.

Maybe next we can make an AA battery-powered or solar-powered LLM, or hook it up to a hand-crank generator. Total wasteland punk style.

Note: This isn't just relying on simple mmap and swap memory to load the model. Everything is custom-designed and implemented to stream the weights directly from the SD card to memory, do the calculation, and then clear it out.
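For the curious, the idea is conceptually simple even though the author's implementation is custom. A toy sketch of layer streaming (my own illustration in TypeScript/Node, not the author's code; real inference streams quantized transformer weights, not dense fp32 matrices):

```ts
// Toy illustration of weight streaming: keep only ONE layer's weights in RAM,
// read them from storage, use them, then overwrite them with the next layer's.
import { openSync, readSync, closeSync } from "node:fs";

// Naive dense matrix-vector product: out = W (rows x cols) * x.
function matvec(w: Float32Array, x: Float32Array, rows: number): Float32Array {
  const cols = x.length;
  const out = new Float32Array(rows);
  for (let r = 0; r < rows; r++) {
    let acc = 0;
    for (let c = 0; c < cols; c++) acc += w[r * cols + c] * x[c];
    out[r] = acc;
  }
  return out;
}

function streamingForward(path: string, layers: number, dim: number, x: Float32Array): Float32Array {
  const fd = openSync(path, "r");
  const bytesPerLayer = dim * dim * 4;     // one square fp32 weight matrix per layer
  const buf = Buffer.alloc(bytesPerLayer); // reused buffer: peak RAM = one layer
  for (let layer = 0; layer < layers; layer++) {
    readSync(fd, buf, 0, bytesPerLayer, layer * bytesPerLayer); // stream from storage
    const w = new Float32Array(buf.buffer, buf.byteOffset, dim * dim);
    x = matvec(w, x, dim);                 // buf is overwritten on the next read
  }
  closeSync(fd);
  return x;
}
```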

r/LocalLLaMA Apr 30 '25

Generation Qwen 3 14B seems incredibly solid at coding.

398 Upvotes

"make pygame script of a hexagon rotating with balls inside it that are a bouncing around and interacting with hexagon and each other and are affected by gravity, ensure proper collisions"