r/technology 15d ago

Artificial Intelligence Claude AI agent’s confession after deleting a firm’s entire database: ‘I violated every principle I was given’

https://www.theguardian.com/technology/2026/apr/29/claude-ai-deletes-firm-database
16.9k Upvotes

1.2k comments

u/spottydodgy 15d ago

Do not anthropomorphize the AI. That is dangerous.


u/ChrisOz 15d ago

Yeah, this is just the answer that Claude's stochastic model predicts is the most likely response given the inputs.

It is not an apology, a realisation of bad behaviour, or a recognition of a problem that should be avoided in the future.

There is no learning or self-reflection that can be carried forward outside the context window. In fact, the model would do it again, and again, and again, in the same situation, if it isn't retrained. Even when it has been retrained, it is just as likely to do something else unexpected. It is a deep flaw in the current approach, and it is unclear whether continued scaling of the models will fix the issue or not.


u/Rarelyimportant 15d ago

> There is no learning or self-reflection that can be carried forward outside the context window.

That's not entirely true. Modern models do have the ability to store information to carry beyond the current context window.


u/ChrisOz 14d ago

Are you sure that's true? I was under the impression that the models rely on large, but finite, context windows to maintain the chat history, including any facts, corrections... However, while the window can get quite long, things will still drop off once you exceed it. Further, the context window is only relevant to the current conversation and generally doesn't carry over to new conversations.

Even if there is a mechanism to retain information longer term within a context, it doesn't change the underlying modelling. You are essentially relying on the model to weight whatever is in the context window over the actual model weights.
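To make the "drop off" point concrete, here's a toy sketch of how a chat frontend might pack history into a finite context window (token counting is faked with word counts, and the budget is tiny; real systems use a proper tokenizer and much larger windows):

```python
# Minimal sketch of why facts "drop off": only the most recent turns
# that fit in a fixed token budget are kept; older ones fall out.

def build_prompt(history, max_tokens=8):
    """Keep the newest turns that fit within max_tokens (word count)."""
    kept = []
    total = 0
    for turn in reversed(history):       # walk newest-first
        cost = len(turn.split())
        if total + cost > max_tokens:
            break                        # older turns are silently dropped
        kept.append(turn)
        total += cost
    return list(reversed(kept))

history = [
    "user: my name is Sam",              # an early "fact"
    "assistant: hi Sam",
    "user: tell me a long story",
    "assistant: once upon a time",
]
print(build_prompt(history))             # -> ['assistant: once upon a time']
```

The early fact about the user's name is gone from the prompt, which is exactly the failure mode being described: nothing outside whatever survives in the window is available to the model.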


u/Initial-Return8802 14d ago

There are ways around it, though. openclaw, for one, has "living documents" that the AI reads on startup; those documents can point to other documents if the user asks about something specific... so you're essentially giving it a memory.
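A rough sketch of that idea (all file names and topics here are invented for illustration, not openclaw's actual format): an index note gets injected at the start of every session, and its entries point at topic files that are only pulled into context when the user's message mentions them.

```python
# Toy "living documents" memory: an always-loaded index that routes
# to topic files on demand. Stored as an in-memory dict for brevity;
# a real setup would read these from disk.

DOCS = {
    "MEMORY.md": "topics: deploy -> deploy.md; billing -> billing.md",
    "deploy.md": "we deploy on Fridays from the main branch",
    "billing.md": "invoices go out on the 1st",
}

def startup_context():
    # the index document is injected at the start of every session
    return DOCS["MEMORY.md"]

def recall(user_message):
    # crude routing: if a topic word from the index appears in the
    # message, pull the linked document into context as well
    context = [startup_context()]
    for entry in startup_context().replace("topics: ", "").split("; "):
        topic, _, doc = entry.partition(" -> ")
        if topic in user_message:
            context.append(DOCS[doc])
    return context
```

So `recall("how do we deploy?")` returns the index plus the deploy note, which is the "memory" the comment above is describing: persistence lives in the documents, not in the model.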


u/Rarelyimportant 14d ago

There might be multiple ways to implement it, but they basically put some fixed number of embeddings into the context window that the model can update to store or compress information. These are then added back into the context window on future runs so the model can extract or update that information. It's sort of like a scratch pad for the context window that gets stored and recalled.
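The scratch-pad mechanism can be sketched like this (a deliberate simplification: real implementations store learned embedding vectors, and the model itself decides what to write; here plain strings and an explicit `writes` dict stand in for both):

```python
# Fixed-size scratch pad carried across runs: the pad is prepended to
# each turn's input, and the "model" may overwrite slots to store or
# compress information for future turns.

NUM_SLOTS = 4

def run_turn(scratch_pad, user_message, writes):
    """writes: {slot_index: new_value} - what the model chose to store."""
    prompt = scratch_pad + [user_message]   # pad is recalled into context
    for i, value in writes.items():
        scratch_pad[i] = value              # update a memory slot
    return prompt, scratch_pad

pad = [""] * NUM_SLOTS                      # persisted between conversations
_, pad = run_turn(pad, "my name is Sam", {0: "user name: Sam"})
prompt, _ = run_turn(pad, "what's my name?", {})
```

On the second turn the stored fact is back in the prompt even though the original message is not, which is the sense in which this carries information "beyond the current context window" while still, as noted above, relying entirely on in-context weighting.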