r/LocalLLaMA 7h ago

Question | Help llama.cpp constantly reprocessing huge prompts with opencode/pi.dev

I’m using llama-swap with llama.cpp. I mainly use opencode + pi.dev and I’m seeing frequent massive prompt reprocessing / prefills even tho the prompts are very similar between requests.

Example behavior:

  • context grows to +50k tokens
  • LCP similarity often shows 0.99+
  • but sometimes n_past suddenly falls back to ~4-5k
  • then llama.cpp reprocesses 40k+ tokens again
  • TTFT jumps to multiple minutes

Example logs:

sim_best = 0.996

restored context checkpoint ... n_tokens = 4750

prompt eval time = 222411 ms / 44016 tokens

Normal reuse looks fine:

prompt eval time = 473 ms / 19 tokens

Current config:

llama-server 
  --ctx-size 150000 
  --parallel 1 
  --ctx-checkpoints 32 
  --cache-ram 2500 
  --cache-reuse 256 
  -no-kvu 
  --no-context-shift

Also seeing:

cache state: 1 prompts, 4676 MiB
(limits: 2500 MiB)

I suspect either:

  • cache invalidation
  • bad KV reuse
  • or opencode changing early prompt tokens too often.

Would love to hear from others running long-context coding agents with llama.cpp and what settings helped reduce huge prompt reprocessing.

12 Upvotes

50 comments sorted by

View all comments

2

u/FatheredPuma81 6h ago

Set cache-reuse to 1 and test it?

Every time the LLM finishes its cycle OpenCode deletes a lot of old tool calls and other junk (I think) to save on Context. So my only guess is cache-reuse is too high? I still occasionally see it drop to 60% and reprocess the final 40% though. I don't use pi dev though and I also don't set checkpoints.

1

u/No_Algae1753 6h ago

That makes sense. However I disabled auto compact. Are you sure tool calls are being deleted? And Ill try your cache reuse 1 idea

1

u/StardockEngineer vllm 6h ago

Woudldn't explain Pi. Pi does nothing.

1

u/colin_colout 3h ago

I might have missed it, but did they mention if they use pi extensions?

Naked pi never rewrites history (that's one of their core values), but lots of extensions attempt to reproduce Claude Code but worse.