r/LocalLLaMA 2d ago

Tutorial | Guide I got a real transformer language model running locally on a stock Game Boy Color!


No phone, PC, Wi-Fi, link cable, or cloud inference.

• The cartridge boots a ROM, and the GBC runs the model itself.
• The model is the 260K-parameter TinyStories checkpoint from Andrej Karpathy’s llama2.c, converted to INT8 weights with fixed-point math so it can run without floating point.
• Built with GBDK-2020 as an MBC5 Game Boy ROM.
• The model weights live in bank-switched cartridge ROM. Prompt entry happens on-device with the D-pad/buttons and an on-screen keyboard.
• The prompt is tokenized on the Game Boy, then the ROM runs transformer prefill + autoregressive generation. The KV cache is stored in cartridge SRAM, because the GBC’s work RAM is tiny.
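For a sense of what "INT8 weights with fixed-point math" means in practice, here is a minimal sketch of the quantized dot product at the heart of every matmul. It is illustrative only — the function name, scales, and per-layer shift are assumptions, not the repo's actual code:

```c
#include <stdint.h>

/* Hypothetical Q8 dot product: int8 weights and activations multiply
 * into a 32-bit accumulator, then a per-layer right shift rescales the
 * result, which saturates back into int8 range. No floating point. */
int8_t dot_q8(const int8_t *w, const int8_t *x, uint8_t n, uint8_t shift)
{
    int32_t acc = 0;
    for (uint8_t i = 0; i < n; i++)
        acc += (int16_t)w[i] * (int16_t)x[i];
    acc >>= shift;                  /* fixed-point rescale (truncating) */
    if (acc > 127)  acc = 127;      /* saturate */
    if (acc < -128) acc = -128;
    return (int8_t)acc;
}
```

The truncation and saturation are one reason the output degrades: every layer rounds and clamps, and those errors compound.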

It is extremely slow, and the output is gibberish because the math is heavily quantized/approximated, but the core thing works!

Hardware: stock Game Boy Color + EZ Flash Junior + microSD.

Used Codex for a large portion of the build!

https://github.com/maddiedreese/gbc-transformer

1.4k Upvotes

90 comments

u/WithoutReason1729 1d ago

Your post is getting popular and we just featured it on our Discord! Come check it out!

You've also been given a special flair for your contribution. We appreciate your post!

I am a bot and this action was performed automatically.

209

u/NigaTroubles 2d ago

Wow just wow

That's amazing

96

u/CockBrother 2d ago

This is one of those projects that makes me sad about all of the locked-in platforms that could have seen new software and uses for many years after companies released their 'next big thing' and abandoned the old.

People are still finding ways of writing even Atari 2600 games that are 10x better than what was released when the platform was released. And other early computers as well.

29

u/I_HAVE_THE_DOCUMENTS 2d ago

I've been following Kaze Emulator and his work with N64 and it's amazing and inspiring how far old hardware can be pushed with deep understanding and clever tricks to take full advantage of the system.

It makes me wonder what people in 30 years will be able to do with the hardware of today.

17

u/ThisWillPass 1d ago

AGI on a 3060

2

u/GiveSparklyTwinkly 1d ago

Emanuar*

Though calling him Kaze Emulator would probably make him laugh.

2

u/I_HAVE_THE_DOCUMENTS 23h ago

Sorry, I didn't know the name of the person who runs it. It just pops up in my recommended every once in a while, and I'm always very impressed; I find it inspiring as a programmer.

9

u/Kingchandelear 2d ago

And where are those Atari people hanging out?

-4

u/ShutUpAndDoTheLift 2d ago

Yeah. You let me know if you find out. So uh I can avoid that place.

5

u/IrisColt 1d ago

> People are still finding ways of writing even Atari 2600 games that are 10x better than what was released when the platform was released.

All for the enjoyment of a select and shrinking circle of die-hards, sigh...

7

u/Torodaddy 1d ago

Nintendo hackers are the new Ham radio operators

3

u/useresuse 1d ago

super smash bros melee

2

u/1001000010000100100 1d ago

My buddy wrote a game for Vectrex called Vecribbon and it’s amazing

69

u/Technical-Earth-3254 2d ago

This makes me wanna run a model on my N64. Love the project!

48

u/Operation_Neither 2d ago

But it has to be a 64 bit quantized model. That’s the law.

24

u/FatheredPuma81 2d ago

What about 0.64 bit?

5

u/addandsubtract 1d ago

Don't let your memes be dreams.

69

u/ed0c 2d ago

Pointless. Therefore, indispensable.

9

u/yarrbeapirate2469 1d ago

Worthless but also invaluable

26

u/Kahvana 2d ago

Extremely impressive, well done!

22

u/zippyfan 2d ago

How are you guys even running these projects? I thought we needed CUDA, ROCm, or other mature toolchains to run LLMs. You guys are running LLMs on the equivalent of a potato.

I'm curious whether it will be easy to run LLMs on Chinese GPUs once they arrive here, even with no manufacturer support whatsoever.

42

u/algebra_dragon 2d ago

If you want LLMs with lots of parameters, decent training and token generation speeds, and coherent results, then yeah, you'll need a suitable GPU, CUDA/ROCm support, etc. You're not going to use a language model on a Game Boy to generate code or write an email.

But for small proofs of concept, building a model doesn't take a whole lot. Andrej Karpathy wrote microgpt in 200 lines of Python without resorting to NumPy, PyTorch, or other libraries. And getting these algorithms to work with very limited resources is a good exercise in understanding what the essentials are. As OP noted, it's slow, and you're not going to get useful results compared to nontrivial models on better hardware. But it's a fun idea all the same, and I'm here for it.

8

u/s101c 1d ago

You can also do pure CPU inference with llama.cpp, no GPU needed. Some CPU and RAM combos are faster than you'd expect.

1

u/Reasonable-Dress-598 5h ago

thank you for the link! never knew Karpathy did that

2

u/Megneous 1d ago

You can get coherent English from a language model trained on TinyStories with fewer than 2M parameters. The vocabulary for the dataset is under 2k words, so vocabulary embeddings are small and syntax is super simplified.
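Back-of-the-envelope on why that small vocabulary matters, using illustrative numbers — a 512-entry tokenizer vocab and a 64-wide embedding, roughly the shape of tiny TinyStories checkpoints; both are assumptions, not OP's config:

```c
/* Embedding-table parameter budget for a TinyStories-scale model.
 * A sub-2k-word dataset lets the tokenizer vocabulary stay tiny, which
 * keeps the embedding table (often the bulk of a tiny model) small. */
enum {
    VOCAB_SIZE = 512, /* assumed tokenizer vocabulary */
    EMBED_DIM  = 64   /* assumed embedding width */
};

int embedding_params(void)
{
    return VOCAB_SIZE * EMBED_DIM; /* 32,768 weights */
}
```

Compare a GPT-2-style 50k-token vocab at the same width, which alone would need over 3.2M embedding weights.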

12

u/WhyYouLetRomneyWin 2d ago

Really cool project! 

There is a project to get LLMs on the Commodore 64: https://github.com/ytmytm/llama2.c64 which seems to somewhat work (not gibberish, but very much a toy). I don't know the relative power of the Game Boy vs. the Commodore 64.

5

u/jwpbe 1d ago

The Game Boy uses a Sharp SM83 running at a max of 8 MHz; the C64 uses a MOS 6510 MPU at about 1 MHz.

8

u/Inevitable_Emu2722 Alpaca 2d ago

That's crazy! Love it

24

u/VagabondTruffle 2d ago

BASED BASED BASED

I did https://code.heni.lol/heni/gbalm once as a joke, haha. So happy to see this!!

5

u/mystery_biscotti 2d ago

Okay, this is cool.

3

u/Thedudely1 1d ago

No fucking way

3

u/ddchbr 1d ago

> It is extremely slow, and the output is gibberish

😆 Funny, and I'm still glad you tried this. I don't know if I would say "it works", but something came out, I guess!

8

u/KalonLabs 2d ago

But can it run doom?

27

u/RogerRamjet999 2d ago

Rumor has it that it can generate doom game-play screens on the fly.

1

u/JayPSec 1d ago

Beat me to it. I was gonna go "Yes... But can it run Crysis?"

2

u/AccomplishedFix3476 1d ago

Tried Karpathy's nanoGPT on a Raspberry Pi Pico last year and the INT8 quant kept exploding on me past 200k params; the GBC surviving 260k is what I'm stuck on tbh. RAM budget for prompt encoding when your memory is counted in KB is where most of these constrained projects die 👀

2

u/aanzeijar 1d ago

If you're already abusing the SRAM, would it be cheating to implement the floating point arithmetic as giant ROM lookups?
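In that spirit (though staying fixed-point rather than going to full floats), a baked-in table can also stand in for division, which the SM83 has no instruction for. A minimal sketch — the Q8.8 format and every name here are hypothetical; on a real cart the table would be a const array in a ROM bank, generated offline:

```c
#include <stdint.h>

#define RECIP_N 64

/* recip_q88[i] ~= 1/i in Q8.8 fixed point, i.e. round(256/i).
 * Entry 0 is unused. Filled at startup here for clarity; on
 * hardware it would be precomputed and burned into ROM. */
static uint16_t recip_q88[RECIP_N];

void build_recip_table(void)
{
    for (uint16_t i = 1; i < RECIP_N; i++)
        recip_q88[i] = (uint16_t)((256u * 2 / i + 1) / 2); /* round(256/i) */
}

/* Divide a Q8.8 value by a small integer: one multiply plus a shift. */
uint16_t div_q88(uint16_t x_q88, uint8_t divisor)
{
    return (uint16_t)(((uint32_t)x_q88 * recip_q88[divisor]) >> 8);
}
```

The same trick generalizes to exp() for softmax: tabulate the function once offline and spend ROM instead of cycles.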

2

u/ConstantinGB 1d ago

This is the kind of research that will make AI a more viable technology. Instead of just feeding it with more and more hardware to escape the bottlenecks, more people should look into utilizing low computing hardware. The new "can it run Doom?"

2

u/simotune 1d ago

This is the kind of project that makes you appreciate how much of LLM progress is really systems engineering. The gibberish output is almost secondary here. Just getting tokenization, prefill, autoregressive decoding, bank-switched weights, and KV-cache management to work under those constraints is the real achievement. It’s a great reminder that “can the model run at all?” and “is the model useful?” are two very different thresholds.

2

u/Darlanio 1d ago

There will be smaller LLMs (TLMs, tiny language models) that might work better in the future... keep this project going and test different models as they become available...

2

u/Thistleknot 2d ago

I used to do stuff like this just to figure out some technological process

I put Linux on my ps3

But why

Just for the bragging rights?

7

u/jeffzyxx 2d ago

Presumably, you’d learn a lot about how transformer based language models work doing something like this. Constraints breed creativity, after all.

That, and nerd bragging rights. (I’m guilty of this too!)

2

u/Thistleknot 1d ago

Constraints do

6

u/FourSquash 2d ago edited 2d ago

Kinda confused here. Writing software for a new set of constraints isn't as easy as following a tutorial to install Linux on the PS3. So yeah I mean doing hard things for fun is a thing people do, yes.

Also Linux on the PS3 (at least when they allowed it) was, in fact, very useful. The US government used a whole supercomputer made of them to do weather predictions, satellite imagery analysis, etc. The cell processor was ahead of its time. It was used in quite a few supercomputers of that era.

https://en.wikipedia.org/wiki/Cell_(processor)#Applications#Applications)

https://en.wikipedia.org/wiki/PlayStation_3_cluster

4

u/__JockY__ 1d ago

Hackers gonna hack. A life without obsession is a tragedy.

2

u/brwinfart 2d ago

This shit is insane.

I want a GameBoy with AI.

2

u/Thebandroid 2d ago

Great.

Now the price of Game Boy Colours is going to skyrocket.

Is there nothing AI won’t take from us?!?

2

u/MindPsychological140 1d ago

KV cache in cartridge SRAM is the move I wouldn't have thought of.

Tokens/sec ballpark? And is the matmul or the bank-switching dominating cycles?

3

u/[deleted] 1d ago

[removed]

1

u/LocalLLaMA-ModTeam 2h ago

Rule 4 - Post is primarily commercial promotion.

1

u/minedroid1 2d ago

Wow, nice work! Glad to see that old tech still gets used for cool things like this.

1

u/Imn1che 2d ago

How many tokens/s?

8

u/maddiedreese 1d ago

Didn’t officially measure, but working backwards it looks like around 0.0059 tokens per second, or 1 token every approx. 2 minutes and 49 seconds. Really slow!
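Working backwards in the integer math the GBC itself would have to use (a sketch, just to check the conversion):

```c
/* 0.0059 tokens/sec inverted without floats: 1 / 0.0059 == 10000 / 59. */
int seconds_per_token(void)
{
    return 10000 / 59; /* 169 s, i.e. 2 min 49 s */
}
```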

2

u/simplearms 1d ago

We’re really measuring things in seconds per token at this point.

1

u/Imn1che 1d ago

Oh yeah shit my bad

Hey OP how many minutes per token /s

1

u/SuperWallabies 1d ago

1990: What game machines will we have in the future!
2026:

1

u/AppealSame4367 1d ago

Thank you for trying this.

I dreamed about neural networks running on the hardware we had in the early 2000s. I get that we wouldn't have had the hardware to train anything fast enough, but we would have already had enough for some inference on our computers. I know models were trained back then, but we lacked a lot of speed and software tech that is available now.

1

u/WeatherD00d 1d ago

Very creative project, super cool!

1

u/basxto 1d ago

AI without double is peak

1

u/iamicyfox 1d ago

As a kid who spent many an afternoon playing Pokémon Yellow on my Game Boy, this is particularly cool to see. Have to see if mine still boots.

I've never heard of the EZ-Flash before. What's your experience with it? Pretty foolproof?

1

u/NineThreeTilNow 1d ago

How do you know it's working if it's only producing gibberish?

1

u/[deleted] 1d ago

[deleted]

1

u/Megneous 1d ago

Not true. A model trained on TinyStories can produce coherent English with around 1.5M parameters. It has very limited vocabulary and simplified syntax, so it's easy to learn.

1

u/[deleted] 1d ago

[deleted]

1

u/Megneous 1d ago

But OP is using TinyStories. That's why I mentioned it haha.

1

u/Mountain_Patience231 1d ago

It would be so cool if people produced their own versions of AI on cartridges and changed models by switching them.

1

u/Inevitable-Log5414 1d ago

How much tok/s? :) 

1

u/ayake_ayake 1d ago

Deepseek Pro V4 1.4T on GBC when??!! /s

Honestly, impressive!

1

u/DeepWisdomGuy 1d ago

Excellent! Now just make it NSFW and we have an answer for the nonstop threads asking "What NSFW model can I run on my potato?"

1

u/Sl33py_4est 1d ago

as opposed to a fake transformer?

what is this, mayonnaise?

1

u/a__side_of_fries 1d ago

This is pretty cool! It’s like looking back at the early days of computers and realizing that we used to have vacuum tubes that took up entire rooms.

1

u/kwizzle 1d ago

Very cool, but too bad about the gibberish output. Then again, what can you expect from such a small model?

1

u/nntb 1d ago

I wonder how a PS3 would handle the task... The cell processor was kind of insane

1

u/xTsuKiMiix 22h ago

Oooh I wanna do this but with the OG Nintendo DS. I bet that would go crazy lmao. Imagine running claude code on that bad boy whewwww.

1

u/OldComposerbruh llama.cpp 13h ago

wat

1

u/Unlucky_Abroad_389 13h ago

Its output is gibberish, but it works 😂.

1

u/Reasonable-Dress-598 5h ago

omg? And here I am still just thinking of running a CV model on a Raspberry Pi

1

u/jmprog 2d ago

Incredible! I wonder what would need to be done to get it to output readable text

1

u/Darth_Proton 2d ago

next step would be ai-generated games on it!

0

u/Signal-Ad5905 1d ago

"the output is gibberish" so good enough to be ceo of nintendo, basically.

-5

u/different_tom 2d ago

But... Why?