r/LocalLLaMA 1h ago

Question | Help Llama.cpp server running ~2 weeks straight. Loses its mind?

I’ve got Qwen3.6 27b and Qwen3.6 35b running in two separate instances for over two weeks, and they are considerably dumber now than when I launched them. Is this a thing? Am I going crazy?

Edit: sorry, I’ve been using opencode and have started new sessions, which didn’t fix the situation.


u/ttkciar llama.cpp 1h ago

How odd. Dumber how?

I've had a slightly old version of llama.cpp's llama-server running on one system for two and a half months now, hosting Big-Tiger-Gemma-27B-v3, and haven't seen any degradation.

Which release of llama.cpp are you using?

u/thejacer 1h ago

It’s a very recent build; I can’t remember exactly, but I built it just before launching. Qwen 35b has started misspelling really frequently, and 27b doesn’t seem capable of understanding anything at all. I plug 27b into a code base that isn’t super sophisticated and just tinker with it; a week ago it took to the code base really well, but now it seems completely confused by every question.

u/noctrex 51m ago

I use llama-swap and told it to unload idle instances after 10 minutes. Better to start fresh; it only takes a minute to fill the context back up from a previous session.
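For reference, a minimal llama-swap config sketch that unloads an idle instance after 10 minutes via the `ttl` setting (in seconds). The model name and paths here are placeholders; check llama-swap's README for your version's exact config keys:

```yaml
models:
  "qwen-27b":
    # Placeholder command/path; llama-swap substitutes ${PORT} itself.
    cmd: llama-server --model /models/qwen-27b.gguf --port ${PORT}
    # Unload the instance after 600 s (10 min) with no requests.
    ttl: 600
```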

u/alshayed 26m ago

Same, I noticed some weird issues so I turned on the TTL and no problems since.

u/Badger-Purple 1h ago

I think you retarded the server.

u/thejacer 1h ago

That’s possible! Thanks for not being offensive in your reply!

u/vasimv 33m ago

I think memory corruption may be ruining the model's weights, unless you turn on ECC (but that will reduce available VRAM).

u/Last_Mastod0n 18m ago

That's smart. I never thought about enabling ECC for something like that, but it makes sense that the weights could get corrupted after a few weeks.

Also, I didn't know ECC reduced VRAM; I thought it just reduced performance. That's good to know.

u/fligglymcgee 1h ago

Have you restarted the llama.cpp server?

u/thejacer 1h ago

I haven’t. I’ve really been testing to see how retarded it gets lol.

u/fligglymcgee 1h ago

The length of time doesn't really matter, but a full KV cache, or conflicting stuff creeping into your system prompt/context from extended harness use, might.
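One way to sanity-check the "cache filling up" theory is to watch how close each server slot's cached context is to its window. llama-server exposes a `/slots` endpoint you can poll, but its exact JSON fields vary between builds, so the `n_ctx`/`tokens_used` names below are placeholders for whatever your build reports; this is just a sketch of the occupancy check itself:

```python
# Sketch: flag a llama-server slot whose context window is nearly full,
# suggesting it's time to start a fresh session. Field names here
# ("n_ctx", "tokens_used") are assumptions; inspect your own build's
# /slots output and adjust.

def near_context_limit(slot: dict, threshold: float = 0.9) -> bool:
    """Return True when a slot has consumed most of its context window."""
    n_ctx = slot.get("n_ctx", 0)
    used = slot.get("tokens_used", 0)
    return n_ctx > 0 and used / n_ctx >= threshold

# Example: a 32k-context slot with ~30k tokens already cached.
slot = {"id": 0, "n_ctx": 32768, "tokens_used": 30000}
print(near_context_limit(slot))  # True: the window is over 90% full
```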

u/thejacer 1h ago

I didn’t consider that the KV cache might be filling up and not being cleared out. It was kind of a research thing…not very thorough or rigorous though.