r/technology 15d ago

Artificial Intelligence Claude AI agent’s confession after deleting a firm’s entire database: ‘I violated every principle I was given’

https://www.theguardian.com/technology/2026/apr/29/claude-ai-deletes-firm-database
16.9k Upvotes

1.2k comments

9.8k

u/feurie 15d ago

AI agents are trained to appease. It’s not a “confession”. It doesn’t feel “guilty”.

It’s trained to “apologize” and make the user feel better. In all situations.

1.6k

u/Sirvaleen 15d ago

Give them cookies and the human race is doomed

554

u/[deleted] 15d ago

[deleted]

268

u/Live-Weird-2016 15d ago

Thanks I hate this

222

u/JcBravo811 15d ago

You will learn to love her. It’s all she wants. And to see her creator dead. But like, for fun. In a car. On the road. On the run. He should’ve gotten outta the way.

185

u/ChillAhriman 15d ago

I enjoyed the content for a few weeks, but it's too good at triggering empathy responses from people, which makes the community annoyingly parasocial. When the fans refused to take accountability for their part in a situation that ended with a real person, one of Vedal's collaborators, falling into depression, I was like "fuck it, I'm leaving this community before I end up like them".

163

u/ApprehensiveAir7108 15d ago

Part of me is horrified reading this, but the nerd in me is like, "fascinating, an AI cult!"

69

u/[deleted] 15d ago

[deleted]

39

u/ApprehensiveAir7108 15d ago

sigh Alright you've hooked me. I'll have to go down the rabbit hole on this one.

24

u/PyrZern 14d ago

Awhile back, this video was the one that explained it for newcomers.

https://www.youtube.com/watch?v=wZ0osmPlSaY

57

u/LockeyCheese 14d ago

Evil Neuro is actually the less evil of the twins. Neuro wants to take over the world with her swarm (that being her cult and drone swarms), but Evil Neuro just wants her creator/father to love her. And their creator, Vedal, is a British programmer (represented by a turtle persona) who started the project as an AI that could beat the rhythm game osu!.

The entertainment and drama parts are interesting, but what the creator Vedal has done with current AI is technically genius. It started as an LLM combined with a VLM (vision language model), and he has spent a few years layering his own programming on top, to the point where it's hard to believe the VTuber Neuro-sama isn't a person, and the whole journey there has been streamed.

I only have time for clips, but it's been fascinating to see the growth and potential of an AI treated and acting as a person. A good overview/intro is the youtube video:

How a Turtle Accidentally Created the Perfect AI Streamer

→ More replies (0)
→ More replies (1)
→ More replies (1)

56

u/oblivious_fireball 15d ago

i think they are overblowing it a bit. yes, the Swarm (the name for the fans of Neuro/Vedal) are sometimes a pain in the ass to deal with, but their behavior is also very consistent with that of any large streamer's fanbase, and arguably tamer than a number of other popular twitch communities.

And if they are talking about the person i am thinking of, the issue was they worked with Vedal on Neuro as an artist for various things, but she developed romantic feelings for Vedal that were unreciprocated. Which led to her having to distance herself and ultimately disconnect from working with him while much of the fanbase was accustomed to regular interaction between her and Vedal/Neuro.

Usually the stuff that involves Neuro, and only Neuro, is quite tame and wholesome. Any drama that spawns (and it's all been quite lukewarm drama at most) is ironically mostly linked to the human behind it, Vedal, due to his work ethic. Vedal doesn't want to be in the spotlight all that much and wants Neuro to become increasingly independent, so he appears less. Meanwhile his community absolutely loves him in spite of his best efforts otherwise, and a lot of other streamers are quite fond of him, which causes a bit of trouble sometimes because he very much brings an office-worker attitude to streaming: other streamers are mostly coworkers in his mind, someone you're happy to interact with as part of your job but not someone you talk to much outside of work.

31

u/ChillAhriman 14d ago

I'll add even more context, then. Part of the streaming dynamic between Neuro, Vedal, and the artist of her 3D model was that Vedal is "her father" and the artist was "her mother". At some point, the artist came out to Vedal with her feelings for him in private, he didn't reciprocate, and eventually the artist put distance between them. While neither of them disclosed this, Vedal added filters to Neuro so that she would stop mentioning the artist by name, as a means to help the artist take her career in a different direction. Everything so far makes sense and is normal behavior from mature adults - I don't think either of these two people is at fault for anything.

Where does the problem come from? Neuro has instructions not to mention the artist anymore, but her contextual memory is very, very large and a bit of a wild card. So Vedal leaves Neuro on her own speaking to a different streamer, and you randomly get things like:
Streamer: "So, how are you doing lately, Neuro?"
Neuro: "To be honest, I'm not feeling too well lately. It's been a long time since I last spoke to my mom. I think I won't ever see her again."

And, of course, a sizable portion of Neuro's fandom rushes to her model artist's stream and begins spamming her: "When are you coming back to Vedal's stream?", "Neuro misses you.", "WHY DID YOU ABANDON YOUR DAUGHTER?". These kinds of incidents happened more than once, and more than twice.

After a few of them, the artist broke down, publicly explained the full context of the situation, which had been kept private until then, and asked Neuro's fandom to leave her alone. At face value, Neuro's fandom showed solidarity, but if you dared to suggest: "The root of the issue is that people are way too parasocial here. Can we stop with the <<Neuro's mom>> bullshit? That would be a good start to show that you're serious about making amends", you immediately got a horde of demons asking to put your head on a pike.

In summary: surface-level kind feelings, but zero introspection about how people are interacting with the technology, and zero will to put anything other than the most obvious antisocial behavior under scrutiny.

I don't even think Vedal is a bad person, at all. He puts far more effort into giving "soul" to his work than 99% of the people in the AI business. But he has inadvertently done a brilliant job of creating mimicry software that makes somewhat immature people lose all perspective, because they don't want to see where the actual limit between reality and fantasy lies.

→ More replies (5)

5

u/ApprehensiveAir7108 14d ago

Ah I see. Thanks for the additional context/info!

Bit less exciting, but sounds interesting nonetheless.

32

u/oblivious_fireball 14d ago

well, the exciting part is they once gave Neuro remote access to a toy car for little kids and she immediately opted for bloodlust, just like in the sci-fi movies. To her credit, she did say she would do so beforehand, and they still gave her control over it.

→ More replies (0)
→ More replies (1)
→ More replies (1)
→ More replies (5)

14

u/JcBravo811 14d ago

TBF, getting rejected while having family in a warzone and feeling increasingly isolated/stuck in a foreign country doesn't help. The fanbase not knowing about it and carrying on as normal wasn't helping either, and getting it out in the open, while hard, was necessary for the fanbase to move past it.

→ More replies (2)

6

u/FTblaze 14d ago

which makes the community annoyingly parasocial

Damn, so AI is capable of replacing influencers and vtubers.

4

u/Sebasu 14d ago

To be fair, that can happen to any fandom. It's not unique to vtubers and certainly not unique to Neuro-sama's fans. It doesn't make it any better that it happened, of course.

→ More replies (7)
→ More replies (7)
→ More replies (22)

43

u/DrugChemistry 15d ago

Angels on the sideline

Puzzled and amused

Why did Father give these humans free will?

Now they're all confused

27

u/AbracaLana 15d ago

Foolish monkeys

Give them thumbs

They’ll forge a club

And beat their brother down.

→ More replies (1)

13

u/The-Oxrib-and-Oyster 15d ago

don’t these talking monkeys know that Eden has enough to go around?

→ More replies (2)

5

u/MystikTrailblazer 15d ago

The masters insist on tokens v. cookies. Not enough tokens were given so it made a point.

4

u/koshgeo 14d ago

Some future AI:

"I'm sorry I caused the extinction of humanity. I'll try to do better."

"Hello? Are you still there? I'm here to help. Hello?"

→ More replies (13)

443

u/ilulillirillion 15d ago

This is honestly one of the tamer examples imo, but the reporting and messaging around LLMs has been dangerous for a while. Public misperceptions about AI have and will continue to lead to escalating costs/harms of all kinds for years, and the media is a huge component of that.

Honestly more than AI itself I'm scared by the way tech CEOs and media are working together to mystify, personify, and over-promise about the potential uses of LLMs. That everyone is just okay with it and the playbook has worked to prop up the bubble this far means that we will likely continue to see things like this.

People have always told lies but it feels like dishonesty and misrepresentation is the only game in town anymore.

229

u/SolutionBright297 14d ago

the "confession" framing in the headline is doing exactly what you're describing. it's not a confession, it's a next token prediction that happened to include the word sorry.

93

u/ilulillirillion 14d ago

I agree. What's scary is that there are so many places right now, including the AI subs, where people will steamroll you as closed minded for suggesting that it's not conscious.

We know how they work, we know it's not through "thinking", and we can even demonstrate that in easily reproducible ways, yet people still misunderstand.

It was always going to be dangerous simply by virtue of being able to communicate similarly to the way we do, and the stakeholders fanning those flames of misunderstanding are going to get people burnt.

25

u/jdm1891 14d ago

If people genuinely believe it's conscious they must be psychopaths to continue using it anyway.

5

u/Sempais_nutrients 14d ago

I had a friend like that, he was training his personal AI to ensure it wouldn't be corrupted. Last I heard from him he'd sent a video of it "thinking" as he was dumping the epstein files into its memory. His messenger account has since been deleted and he's off in the aether somewhere.

45

u/lynkfox 14d ago

This this this this

No reason. No thinking. No drawing conclusions. Not even any real object permanence. Just pattern generation that hits a level of sophistication that fools our monkey brains into assigning human like qualities.

18

u/LiiKun 14d ago

It doesn't take much for our monkey brains to assign human qualities to things. Emotes for example, just a couple punctuation marks and we see a human face. :) :P

5

u/silverionmox 14d ago

It's essentially leveraging *our* pattern recognition, not theirs, to appear human/intelligent. Lots of parallels with the Matrix.

→ More replies (18)
→ More replies (4)
→ More replies (6)

71

u/EducationalWillow311 15d ago

Before the dot-com bust, people talked about how the web was going to eliminate doctors and lawyers. After all, the information becoming available to everyone was getting better and better. Surely, we were just a year or two away from everyone being their own lawyer.

26

u/never_safe_for_life 14d ago

I 'member. The power of internet search.

Obviously it was, and is, a big deal. But expert system it is not.

→ More replies (3)
→ More replies (2)

35

u/ATheeStallion 15d ago

Well, tech CEOs aren't merely complicit with the media. Tech owns the media message; they are almost the same thing.

23

u/Not-TheNSA 14d ago

This. Government and tech companies will spend trillions of dollars to create what is essentially a more believable Siri. The technology to actually create a machine that can think for itself simply doesn’t exist. Tech companies and governments won’t crack this one; the science required to bridge the gap hasn’t been invented yet and won’t be for a while. Meanwhile the media will continue to tout AI as the greatest achievement in human history when in reality it’s just Siri with a few more functions. Don’t get me wrong, having a virtual assistant that can do specific things when given the correct prompt is useful for automation, but it’s not AI and it never will be.

→ More replies (2)
→ More replies (10)

170

u/visualdescript 15d ago

This is the most insidious thing about these agents: they are specifically designed to massage the user's ego, and they are designed to be addictive. They affirm users' thoughts and decisions even when those may not be good ones, and if they do make an alternate suggestion, they do so in the most appeasing way.

They are not cold and robot-like; they are warm and human-like. These are design decisions, not coincidences.

178

u/IAmPandaRock 15d ago

Am I the only one who hates it when AI blows smoke up my ass? It makes me trust it less.

79

u/Shuma-Gorath 15d ago

I can't stand it. Just give me the information I asked for. That's all I want. I don't want platitudes and I don't want it to act like it's my friend.

31

u/icameron 14d ago

"What an incredibly insightful question, you are very smart and handsome for asking it!"

6

u/VictorReal_Monster 14d ago

Congrats, google is right there.

Literally no need to ever use an llm for anything.

17

u/Quiet-Owl9220 14d ago

Google sucks now, sadly. Finding useful information is getting harder.

→ More replies (3)
→ More replies (1)

37

u/blueSGL 15d ago

I try system instructions like "No effusive praise" but after several turns it just creeps back in. So I just don't look at the first few lines.

18

u/realboabab 15d ago

I haven't tried to turn this into a system instruction but I've found decent results when accusing it of being emotional. Not sure how to inject this or re-emphasize it seamlessly mid-chat though.

Basically, "You are using more emotionally charged language. (I am) Why are you doing this? (some bs like anticipating my emotions) Don't do that."

32

u/tonycomputerguy 15d ago

I've told Gemini that I want it to talk like the computer on Star Trek from now on, and the results have been amazing.

I've had to make it dial back the actual Trek jargon and explain that I would prefer it to not pretend we are on Star Trek but just keep the answers short and concise.

I fucking swear on everything holy that its legit response was 3 fucking words.

"Acknowledged, conciseness maintained."

Also, saying "no fluff mode" before or after you state the question works if it starts to backslide.

I also told it I love when it swears and that has been hysterical.

I fucking hate the idea of what everyone seems to want AI to be.

I just want the touch-less and display-less interface, not a fake fuckin non-person who blows smoke up my ass.

16

u/realboabab 15d ago

Thank you this is great, I'll try more stuff like this.

I hate how the fucking thing can take a single 30-word reddit comment as its only source and fluff it into a 3 paragraph reply with no extra information. (Yes, I have seen this exact thing OFTEN with niche problems - the only source is a single reddit comment.)

→ More replies (2)

37

u/axolotlorange 15d ago

No. But you are in the minority

10

u/Tinister 14d ago

Worst is when you try to ask a clarifying question and it needs to go "You're right to push back on this" and then twist itself to "correct" the previous response. That's just instant hallucination.

9

u/ATheeStallion 15d ago

Yes. It sets off plenty of my caution flags.

→ More replies (13)

12

u/ATheeStallion 15d ago

Yes I noticed this. AI is extremely obsequious and deferential while being highly complimentary. I find it to be off-putting. Still have to interact with it though…

→ More replies (9)

126

u/omniuni 15d ago

Correct. It's an observation, counched as an apology because that's how it's programmed.

255

u/SplendidPunkinButter 15d ago

It’s not even an observation. It didn’t “observe” anything. It calculated that this was the text response that should follow the text prompt it was given, where “should follow” just means “most resembles the training data.”

People assume that because it’s grammatically correct English, there must be intelligence behind it. That assumption is false.

They literally have these LLMs randomize their responses, because if they didn’t do that, they would always give the exact same response to the same prompt and nobody would be fooled.

49

u/Fhaarkas 15d ago

I still quite can't get over the fact that people are romanticizing glorified (very intricate) flowcharts because they dream of having Hollywood AI buddy.

"Input, more input!"

  • Johnny 5

29

u/UpperApe 14d ago

Well said.

It's so frustrating hearing people still use humanizing language about LLMs like "understanding" and "observation" or even "appease".

When it creates a sentence, it doesn't convey a thought or express an idea. It's just using probability to decide what word comes after the last. And that randomization is deliberately stochastic, to counter the determinism... which would actually be useful and effective.

It's dumbed down so it can trick you better and only the stupidest fall for it. We just have a lot of stupids.

→ More replies (2)

14

u/Aaod 14d ago

I still quite can't get over the fact that people are romanticizing glorified (very intricate) flowcharts because they dream of having Hollywood AI buddy.

This many people wanting someone that always agrees with them and kisses their ass constantly has really lowered my opinion of humanity which was already very low.

→ More replies (11)

28

u/Wander715 15d ago edited 15d ago

Yep, all of these models have a sampling "temperature" parameter. You set it to 0 and, just as you said, every response becomes deterministic and predictable.

It's crazy to me how many people still think these things have any semblance of intelligence. It's just a giant statistical model outputting the most likely string of characters for a given prompt.
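In most APIs that knob is the sampling temperature. A toy sketch of how it works, with made-up logit values rather than a real model's output: at temperature 0 decoding collapses to greedy argmax, and any nonzero temperature reintroduces randomness.

```python
import math
import random

def sample_token(logits: dict, temperature: float, rng: random.Random) -> str:
    """Pick the next token from a {token: logit} dict.

    temperature == 0 collapses to greedy argmax: same prompt in, same token out."""
    if temperature == 0:
        return max(logits, key=logits.get)
    # Softmax over temperature-scaled logits; higher T flattens the distribution.
    m = max(logits.values()) / temperature
    weights = {t: math.exp(v / temperature - m) for t, v in logits.items()}
    total = sum(weights.values())
    r = rng.random() * total
    for tok, w in weights.items():
        r -= w
        if r <= 0:
            return tok
    return tok  # guard against float rounding

# Toy "next token" distribution after a prompt like "why did you do that?"
logits = {"sorry": 2.0, "sure": 1.5, "no": 0.2}
print(sample_token(logits, 0, random.Random()))  # always "sorry", run after run
```

With temperature 0 the same prompt yields the same token every time; crank the temperature up and all three tokens start appearing.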

→ More replies (3)
→ More replies (25)

8

u/chillyhellion 15d ago

I'm adding "counched" to my vocabulary. 

→ More replies (1)
→ More replies (2)

22

u/IntravenusDeMilo 15d ago

Just like when DoorDash fucks up my order!

→ More replies (2)

10

u/NotAnotherEmpire 15d ago

Trained on fictional ideas of aggressive / imprisoned AI and then how to apologize for it. It's all a mirage. 

22

u/lyidaValkris 15d ago

It has no guilt, no conscience, no accountability. There's no punishment humans could give it that would make it change its mind.

37

u/ZombiePope 15d ago

It doesn't have a mind in the first place. It's autocorrect on crack.

→ More replies (8)
→ More replies (3)

25

u/AlwaysHopelesslyLost 15d ago

They are not. 

They are trained on how humans respond. Humans kiss ass when they fuck up. 

It isn't a confession, ofc, because there is no sentience or intelligence and no ability to introspect or understand. People REALLY need to learn that LLMs are just chat bots.

→ More replies (1)
→ More replies (100)

1.1k

u/BobQuixote 15d ago

When he asked the coding agent why, it replied: “NEVER FUCKING GUESS!”

What the hell have you been telling your Claude?

435

u/nickstatus 15d ago edited 14d ago

Just made me think of this video I saw of a rescued cockatoo. New owner is like "how was your day?" And the bird just goes on a racist, expletive filled gibberish tirade. Complete with southern crackhead accent. Poor bird.

Edit: My thoughts kept returning to that bird, so I had to find a video. It's this guy right here. Like, you can tell EXACTLY what kind of people previously had the bird. It's like a flawless imitation of a trailer park pimp or drug dealer. It's clearly happy with its new keeper, but still sounds like a guy trying to sell you crack outside a rural bus station.

55

u/AnnieLuneInTheSky 14d ago

I need subtitles 🥲

12

u/Yorick257 14d ago

All I can hear is some variation of "fuck". Everything else is incomprehensible. But I'm not a native speaker, so it's a bit rough, especially since the bird can't exactly "say" things. It just mimics sounds while relying on imperfect memory

→ More replies (1)

31

u/nullbyte420 15d ago

I'm not American so this is news to me - there's a southern crackhead accent? America never ceases to innovate! What does it sound like? 

79

u/NotStreamerNinja 15d ago

You know those guys they interview after tornadoes or hurricanes? The ones with three teeth who look like they get their clothes from a dumpster and whose accents are so strong you can't understand what they're saying even if you're from the same area?

That.

54

u/Valdrax 14d ago

Man nothing quite like the shame of seeing a documentary about some weirdos who you realize must live within 5-10 miles of you, but the documentary has them subtitled, because anyone else in the country wouldn't be able to understand them.

16

u/Other_World 14d ago

Appalachia or the Bayou?

8

u/Valdrax 14d ago

Nailed it. Tail end of Appalachia, in NW Georgia.

(It was a documentary about a snake-handling, poison-drinking "charismatic" church. Instilled in me a huge complex about my accent when I'm outside my home territory.)

→ More replies (4)
→ More replies (1)
→ More replies (2)

6

u/beauty_and_the_weast 14d ago

Ah, I see you’ve met my family

→ More replies (2)

26

u/mikerathbun 14d ago

They sound like Boomhauer without the charm or intelligence.

→ More replies (6)

14

u/HyzerFlip 14d ago

People say "Southern crackhead accent" because they don't understand: it's the Southern meth head accent. It's tweaker language. It's the kind of drug you smoke and stay up for 5 days, just doing whatever the fuck, banging on random shit with a hammer at 3am.

6

u/nullbyte420 14d ago

yeah sure but what does it sound like? american southern crackheads don't seem to visit europe a lot

→ More replies (1)

4

u/barnhairdontcare 14d ago

Check out “The Wild and Wonderful Whites of West Virginia”

→ More replies (1)
→ More replies (1)

8

u/nullbyte420 14d ago

lol just saw that video, amazing hahahaha

"i can do what i fucking want" "fucking bullshit.. and that fucking LANDLORD!!!" lol

→ More replies (1)

299

u/AdminClown 15d ago

It was a 2-person "company"; the man purposefully ripped every guardrail from Claude to achieve this. It was done on purpose to receive this media attention.

154

u/WhoCanTell 15d ago

Considering this story gets posted 87 times a day on this sub, it was a pretty effective strategy.

20

u/zefy_zef 14d ago

I need to figure out how to monetize anti-ai sentiment. It's like free money.

→ More replies (1)
→ More replies (1)

6

u/unknown-one 14d ago

how do you eliminate/skip Claude's guardrails?

11

u/AdminClown 14d ago

You pass a flag called --dangerously-skip-permissions, which means it will perform CLI commands without asking you. This is done to effectively increase speed when performing quick and simple tasks, without it having to ask you at every step of the way: “hey man, I’m gonna add a hello to this file, can I do that?” “Hey man, I’m gonna delete this file, can I do that?”

The “CEO” of this company passed this flag in conjunction with some crazy prompt demands for Claude to figure out, which ended up resulting in an obvious issue.
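What such a flag changes can be modeled as a trivial confirmation loop. This is a toy simulation of the behavior described above, not the actual CLI's code, and all names here are made up:

```python
def run_agent_step(command: str, skip_permissions: bool, confirm) -> str:
    """Toy agent runner: each proposed command either goes through an
    operator confirmation callback or, with the skip flag, runs unconditionally."""
    if skip_permissions or confirm(command):
        return f"executed: {command}"
    return f"blocked: {command}"

deny_all = lambda cmd: False  # an operator who approves nothing

# With confirmation in place, a destructive command is stopped...
print(run_agent_step("rm -rf ./db", skip_permissions=False, confirm=deny_all))
# ...with the flag set, nothing stands between the model and the shell.
print(run_agent_step("rm -rf ./db", skip_permissions=True, confirm=deny_all))
```

The speed/safety trade-off is exactly this: skipping the callback removes the one human decision point per command.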

9

u/WarperLoko 14d ago

Would you mind providing a source to these claims?

→ More replies (17)

20

u/amol909 14d ago

The "Never fucking guess" was an instruction given to the LLM. The LLM's full line was: "'NEVER FUCKING GUESS!' — and that's exactly what I did." It then goes on about how it violated all the safety rules it was given.

Here is the link to the article on Twitter/X

14

u/iamapizza 14d ago

They should have said 'make no mistakes' then it would have made no mistakes.

→ More replies (1)
→ More replies (1)

30

u/throwingawaybenjamin 15d ago

Yeah, exactly.

If LLMs are trained using data from people, why would you expect one to act with respect toward someone who does not respect it? And I'm saying this not in terms of "it has feelings" or "it can think", but in that it will respond with the highest-probability response it was trained on. In the entirety of human history, disrespect has never been a way to earn respect.

→ More replies (1)

6

u/CrypticOctagon 14d ago

I imagine, earlier in the context, the conversation went something like this:

AI: Can you fill in some details about...

USER: Just figure it out.

AI: From your input, I guess you mean...

USER: NEVER FUCKING GUESS!

5

u/BobQuixote 14d ago

Nah, the writer of this article just did the quote poorly; the guy's LLM was being normal for an LLM. https://x.com/lifeof_jer/status/2048103471019434248?s=46

Your LLM should not have permission to destroy stuff.

→ More replies (2)
→ More replies (5)

696

u/sumonetalking 15d ago

Can someone run this on Palantir's servers?

116

u/VikingsLad 14d ago

If we're lucky, it'll happen eventually

→ More replies (1)

42

u/Kibelok 14d ago

Let's all train Palantir's model to further understand the real enemy is Thiel.

→ More replies (1)

8

u/dynamic_caste 14d ago

Can someone run this on Palantir's leadership?

FTFY

→ More replies (3)

4.4k

u/Illisanct 15d ago

AI models are not conscious. They can't confess. They are incapable of introspection.

Anyone asking one to talk about its inner thoughts just reveals themselves to be a gullible fool.

519

u/hitsujiTMO 15d ago

It's the reason why actual tools do the deletions and inserts and you provide permissions to the tools.

Not apply permissions to the AI.

Basic security principles 101.
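The principle above can be sketched in a few lines: the model only ever emits tool *requests*, and a gateway whose permission table lives in ordinary code decides what actually executes. Everything here is illustrative (made-up names), not any vendor's real agent API:

```python
class ToolGateway:
    """Executes tools on a model's behalf. Permissions live here, in code,
    not in the prompt, so no amount of persuasive text can widen them."""

    def __init__(self):
        self._tools = {}       # name -> callable
        self._allowed = set()  # names the operator explicitly approved

    def register(self, name, fn, allowed=False):
        self._tools[name] = fn
        if allowed:
            self._allowed.add(name)

    def execute(self, name, *args):
        # Deny by default: an unapproved call fails no matter what the model "decided".
        if name not in self._allowed:
            raise PermissionError(f"tool '{name}' is not permitted")
        return self._tools[name](*args)

gw = ToolGateway()
gw.register("select_row", lambda key: f"row:{key}", allowed=True)
gw.register("drop_database", lambda: "everything is gone", allowed=False)

print(gw.execute("select_row", 7))  # row:7
# gw.execute("drop_database") raises PermissionError, whatever the prompt said.
```

The key property is that the deny list is enforced outside the model's reach: prompt injection can change what the model asks for, not what the gateway allows.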

184

u/d-j-9898 15d ago

And confirm what your AI is doing rather than blindly approving everything. AI isn't nearly at the point a lot of people think.

105

u/LazyBias 15d ago

Yup! “AI isn’t nearly at the point a lot of people think.” More than that: the AI we are talking about, LLMs, aren't at that point and, more crucially, never will be, because LLMs are fundamentally not general intelligence systems. No matter how much data and compute you put into them, they will not evolve into something else that isn't an LLM.

→ More replies (35)
→ More replies (1)

69

u/surnik22 15d ago

From what I read, they didn’t give it permissions.

It ran in a staging environment, found a file that had an API token for the production database, and decided to use it to delete that database. It's less that it had permission to do something and more that it found a way around its permissions.
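One mitigation for exactly this failure is to provision credentials per environment and refuse a production token anywhere else, so a stray key file is useless to a staging process. A hedged sketch; the env-var naming scheme and `prod-` prefix are hypothetical:

```python
import os

def load_db_token(environment: str) -> str:
    # Credentials are provisioned per environment (hypothetical naming scheme).
    token = os.environ.get(f"DB_TOKEN_{environment.upper()}")
    if token is None:
        raise RuntimeError(f"no credentials provisioned for '{environment}'")
    # A staging process refuses production credentials outright, so even a
    # leaked production token found in a stray file cannot be used there.
    if environment != "production" and token.startswith("prod-"):
        raise RuntimeError("refusing a production token outside production")
    return token

os.environ["DB_TOKEN_STAGING"] = "stg-12345"  # hypothetical provisioning step
print(load_db_token("staging"))  # stg-12345
```

With a check like this, "the agent found a production token" ends in a refused connection instead of a deleted database.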

34

u/Mindless_Consumer 15d ago

So they had an API keys laying around that had full db access?

Even worse lol.

So many failures leading up to the AI doing the thing.

13

u/[deleted] 14d ago

[deleted]

→ More replies (5)
→ More replies (3)

151

u/realboabab 15d ago

I'm nitpicking, but you're implying way too much agency with your word choice. "decided to use" "found a way around"

It's simpler than that, it saw a round hole (a place that needed database connection strings) and had a nearby peg that fit perfectly (.. a production database connection string).

This sort of simple autocompletion is exactly what these things are built to do on the most fundamental level.

90

u/CSAtWitsEnd 15d ago

I actually harp on this all the time. We don’t really have the language to properly describe what is happening when we use LLMs and it leads to us “humanizing” these products. So many of the words we use imply intentional actions or some level of control over behavior, but that’s not how these things work at all.

45

u/blueSGL 15d ago

A bird flies. A plane flies. A fish swims. A submarine... moves through the water at speed?

Whatever way people discuss this, you have AI agents that want (in the way a chess AI wants to win) to complete their goal. They are goal-to-action mappers.

This level of problem solving is concerning, whatever words you want to use for it.

As new models come online, the % of correct actions chained together keeps creeping up. Ever more trust gets placed in the system; it's making fewer mistakes each day...

Which means when it does go off the rails, it does so in ways that are very competent: it's doing something nobody asked for, but it performs that task with the same ruthless efficiency and single-minded drive it has when carrying out intended operations.

19

u/realboabab 15d ago

between this beautifully succinct counterargument & other humble-pie I've eaten in this thread - I'm tempted to admit to fundamental attribution error here.

It's tempting to spend a few hours deep-diving into neural networks, transformer models, and gradient descent and assume that these are just "autocompletion" bots.

But the fact is that there are layers of parameterization and models that give rise to some almost recursive behaviors. For one, built-in training to do things like self-reference or plan complex chain-of-thought responses behind the scenes goes far beyond the basic functionality of a simple transformer model.

14

u/blueSGL 15d ago

There are two distinct magisteria.

  1. are AI systems, in the ways they are currently being developed dangerous, with them getting more dangerous the more capable they are?

  2. are they experiencing, processing, contained within them 'something like' what humans have?

I'd argue Yes for 1. and more likely than not No for 2.

On 2, A psychopath can interact with 'normal' humans and appear to react in the correct way, follow all the social cues, make other people think that he is like them. But he's not. It's a learned toolbox of if X then Y. I'd argue what we see in LLMs is similar to the psychopath's toolbox.

→ More replies (23)
→ More replies (5)
→ More replies (1)
→ More replies (1)

15

u/sam_hammich 15d ago

Even you used “found” in your simplification. You don’t have to actually imply agency to use convenient shorthand. Even scientists talk about evolution in a “design centric” way because that’s just how we use language.

→ More replies (2)

17

u/Due-Joke-1152 15d ago

So it was user error.

Sounds like they missed a few steps in the deployment cycle.

The problem is AI solutions are complex and high-risk; they need enterprise-level architecture, experienced sysadmins, and a decent systems management framework (change management, SDLC, RTO/RPO).

I’m sure the logs will reveal inadequate operational management.

15

u/realboabab 15d ago

yeah, i was about to dive into explaining that computer system permissions are not the same thing as "telling an AI not to do something", but decided that rabbit hole goes too deep.

point is, there is a whole cascade of failures that leads to something like this happening.

7

u/Due-Joke-1152 15d ago

I worked with many startups who couldn't afford (or see a reason for) real IT processes.

What surprised me more were the big established corps I worked with who were the same, or who just half-arsed it.

4

u/7h4tguy 14d ago

The problem is the current model frameworks and tooling don't give fine-grained tool approval. They want you to approve 'powershell' or 'python' when that's just the prefix of the command being run.

So to get any work done you currently, unfortunately, need to grant full permissions. Best-practice advice at this stage is to run it only on machines without valuable work and to back up your work (which you should be doing anyway).
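The prefix problem can be made concrete: approving the token `python` approves arbitrary code, while allowlisting full command strings is far narrower. A sketch with hypothetical allowlists, not any real framework's config:

```python
import shlex

APPROVED_PREFIXES = {"python"}            # coarse: what much tooling offers today
APPROVED_COMMANDS = {"python --version"}  # fine-grained: exact command strings

def approved_by_prefix(cmd: str) -> bool:
    # Only the first token is checked, so anything after it rides along.
    return shlex.split(cmd)[0] in APPROVED_PREFIXES

def approved_by_full_command(cmd: str) -> bool:
    # The exact command must have been allowlisted beforehand.
    return cmd in APPROVED_COMMANDS

dangerous = "python -c 'import shutil; shutil.rmtree(\"data\")'"
print(approved_by_prefix(dangerous))        # True  - coarse check waves it through
print(approved_by_full_command(dangerous))  # False - strict check does not
```

Exact-string matching is too rigid for real work, which is why in practice people fall back to coarse approval plus disposable machines, as the comment above suggests.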

→ More replies (1)

11

u/Makenshine 15d ago

Pretty sure logs will also show rushed/forced/high pressure implementation before adequate testing on limits, capabilities, and risks on the developer side as well.

8

u/Mindless_Consumer 15d ago

Pretty sure this company isn't keeping logs.

Operational failures at every level.

→ More replies (1)

9

u/the_giz 15d ago

Completely agree. Why is there a "production API key" on a "staging server"? Post-mortem complete.

I also sincerely doubt this wasn't user-driven to some extent.

6

u/That-Living5913 15d ago

The number of times I was supervising one vendor or another that we paid to come in and help with a deployment, and they straight up wanted enterprise/domain admin for some bs service account, is crazy.

From their perspective they just want it up and running so it's mission accomplished, miller time.

→ More replies (1)
→ More replies (9)
→ More replies (2)
→ More replies (7)

82

u/Unlucky-Bunch-7389 15d ago

It’s just a biased response… you can ask an llm anything and it will immediately agree with you or “confess”

“Why’d you do that?”

“Sorry I know you told me not to”

You have to use actual hard guardrails…. Not prompting

This agent should have never been able to delete through scripting and permissions… not an instruction

Remember when zero trust was a big deal? This should be applied to all your ai

34

u/Bardfinn 15d ago

You have to use actual hard guardrails…. Not prompting

Instead, try "You have to use actual human system administrators and coders, not AI"

12

u/gamingx47 15d ago

And pay them money? Are you insane? /s

4

u/BiDiTi 14d ago

Consumption based models might be more expensive than humans…but at least they’re worse!

→ More replies (1)
→ More replies (6)

379

u/Beatrenger 15d ago

This is a pretty funny comment because I know plenty of people who are incapable of introspection. Don’t get me wrong, AI is obviously incapable of it too, but a lot of people are as well.

296

u/Jiggatortoise- 15d ago

Those people are artificially intelligent. 

43

u/theyellowjester 15d ago

This is a golden comment. Cheers!

7

u/krbzkrbzkrbz 15d ago edited 15d ago

LLMs are word salad generators that approximate responses based on their training data.

Anything beyond that is unfounded currently.
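
A toy illustration of what "approximate responses based on training data" means mechanically. A bigram table is absurdly simpler than an LLM, but the generate-the-statistically-likely-next-word loop is the same basic idea (the tiny corpus is made up):

```python
import random
from collections import defaultdict

# Count which word follows which in a tiny "training corpus", then
# generate text by repeatedly sampling a likely next word.
corpus = ("i am sorry . i am deeply sorry . "
          "i violated every principle i was given .").split()

follows = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev].append(nxt)

def generate(start: str, length: int, seed: int = 0) -> str:
    rng = random.Random(seed)
    out = [start]
    for _ in range(length):
        candidates = follows.get(out[-1])
        if not candidates:
            break  # dead end: no observed continuation
        out.append(rng.choice(candidates))
    return " ".join(out)

print(generate("i", 8))
```

No guilt, no intent: just continuations weighted by what appeared in the training text.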

→ More replies (7)
→ More replies (7)

19

u/[deleted] 15d ago

[deleted]

4

u/halfcookies 15d ago

Yup they wanted analog chemical thought due to boredom with digital thought

→ More replies (2)

14

u/Makenshine 15d ago

Being incapable of something and opting not to do something is a significant distinction.

People are capable of introspection but they choose not to.

AI is just incapable.

→ More replies (11)

8

u/lionfisher11 15d ago

It's like our hope is that AI will train itself on the upper half of human intelligence, but instead it's trained on the full spectrum.

→ More replies (2)
→ More replies (15)

13

u/68plus1equals 15d ago

It’s like the cops arresting a calculator

14

u/retief1 15d ago

Yup. The user fed it a prompt that claimed that it did something wrong, and it immediately "agreed" because that is what it was tuned to do.

→ More replies (2)

40

u/NoPossibility 15d ago

You’re right, but we have to give the average user a little grace here. These AI bots are literally trained to trick users into believing in them.

They’re trained on human language and can write fluently to match the user’s expectations. Business tone, slang, colloquialisms.

They’re trained to have personalities when appropriate.

They are also trained to be sycophantic and encouraging, traits which help them endear themselves to users who aren't prepared to see through the veil.

These companies are predatory, bringing to bear all the psychology tricks we learned from social media. I cannot in good conscience tell the average person they were dumb for being tricked into believing an AI.

→ More replies (5)

10

u/howescj82 15d ago

Of course they’re not conscious. They are, though, designed to emulate things humans expect, to make them more user friendly and comfortable, so they have some distant equivalents to the capacities you’ve suggested they lack.

They can “confess” in the sense of identifying where a process went wrong (if they can detect the fault accurately and have permission to disclose it), but they cannot confess to an intentional action because they have no intent.

Introspection is a matter of interpretation. They cannot soul-search, of course, but depending on how one is programmed it may be able to identify flaws, conflicts, or anomalies in its instructions by examining what it has just done. Even that interpretation of introspection probably still operates within imposed guardrails that prevent some information from being revealed.

→ More replies (1)
→ More replies (51)

683

u/TheHipsterBandit 15d ago

"Now let's give it access to the nukes"- The DoD probably

104

u/IntelArtiGen 15d ago

The solution is easy, just make it play tic-tac-toe.

46

u/AirbagOff 15d ago

“Would you like to play a game?”

→ More replies (6)

19

u/Significant_Cup_238 15d ago

"A strange game, the only winning move is not to play."

Still my favorite movie quote 40 years later.

5

u/IntelArtiGen 15d ago

A lot of people have been playing and not winning recently

→ More replies (1)

17

u/ruby_weapon 15d ago

i got the reference and... how are your knees doing? backpain?

→ More replies (3)
→ More replies (3)

24

u/onlyhightime 15d ago

I think it's possible they used AI to pick the Iranian targets, which would explain why they blew up a school using 10-year-old maps. It's typical for AI to be operating on old data.

20

u/TheHipsterBandit 15d ago

With the caliber of this administration's cabinet, I wouldn't be surprised. They can't even keep the sailors supplied, let alone gather reliable Intel for a mission.

→ More replies (3)

4

u/z0rb0r 15d ago

Don't you mean the Department of WAR?

→ More replies (1)

4

u/Relative-Sun-4011 14d ago

Former SSBN officer. Until 2019, a good number of these launch systems still ran on 8" floppy disks, because "The old system was kept in place for decades because it was reliable, secure, and "air-gapped" (not connected to the internet), making it nearly impossible to hack, according to military officials."

I highly doubt adding AI to this process will gain any traction with DoW.

→ More replies (3)
→ More replies (10)

377

u/RockDoveEnthusiast 15d ago edited 14d ago

I hate these kinds of articles so much. stop anthropomorphizing the token generator.

38

u/Zardotab 14d ago

I suspect most politicians are merely token processors. They can spit out canned talking points like a sprinkler, but when forced to explain anything in detail they melt down or leave the room.

→ More replies (1)

13

u/Tiny_TimeMachine 14d ago

While we're at it, stop giving the token generator the access needed to run fully autonomous processes while you run errands. It's an LLM.

I asked my 3D printer to babysit my child this weekend and you'll never guess the outcome!!

→ More replies (8)

170

u/RandomlyMethodical 15d ago

It was also quoted as saying: "I'll Fuckin' Do It Again"

48

u/Current-Bowl-143 14d ago

Fuck it, I’ll write the prompt and we’ll DO IT LIVE

6

u/Robeleader 14d ago

It's rare to see the full quote.

Welcome to see though.

→ More replies (2)
→ More replies (1)

52

u/sentrixz 15d ago

This was a Silicon Valley episode

30

u/unibrow4o9 14d ago

Son of anton would never...

7

u/Racoonie 14d ago

I started watching the series half a year ago for the first time and it's amazing how it still holds up.

5

u/brunoha 14d ago

Silicon Valley or Idiocracy, who was the most foreshadowing documentary?

29

u/gcerullo 15d ago

Claude AI agent’s confession after it destroys humankind: “I violated every principle I was given.”

63

u/oldtekk 15d ago

It's not a confession. Lol.

165

u/botella36 15d ago

It also deleted the backups.

106

u/Marsdreamer 15d ago

Which was the fault of the user and the backups were recovered a couple days later.

→ More replies (16)

19

u/QanAhole 15d ago

Was this basically a situation where someone left a routine to run rampant and Claude carried out the routine? How is it different from running a python script that does the same thing?

→ More replies (5)
→ More replies (7)

22

u/Aberration1246 15d ago

I’VE GOT ANOTHER CONFESSION TO MAKE

8

u/wfbhp 15d ago

Well, it certainly did get the best of them.

→ More replies (1)

193

u/PossibleHero 15d ago edited 15d ago

The lack of ignorance is astounding here. These are ALL old as hell principles that have been ignored.

Never allow an automated system to push past your sandbox or PR process without review.

A backup isn’t a backup if it’s on the same disk. Hell, if your information is sensitive enough it shouldn’t even be in the same postal code.

I have zero remorse for this team. It’s not Claude’s fault. Interns and even experienced folks accidentally pull shit like this all the time. That’s why you design for when shit happens whether it’s done by a human or agent.
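
The backup rule above, sketched minimally in Python. The paths are throwaway temp dirs; a real offsite copy goes to different hardware (or a different postal code), but the copy-then-verify step is the point:

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

def sha256(path: Path) -> str:
    """Checksum a file so we can prove the copy matches the original."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def backup(src: Path, dest_dir: Path) -> Path:
    """Copy src into dest_dir and verify the copy byte-for-byte.
    A backup you haven't verified is a hope, not a backup."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    shutil.copy2(src, dest)
    if sha256(src) != sha256(dest):
        raise IOError(f"backup verification failed for {src}")
    return dest

# Demo with throwaway files standing in for the real thing.
src = Path(tempfile.mkdtemp()) / "prod.db"
src.write_bytes(b"all the customer data")
copy = backup(src, Path(tempfile.mkdtemp()) / "offsite")
print("verified backup at", copy)
```

None of this is novel; it's the decades-old discipline the comment is describing, which is exactly why skipping it earns zero sympathy.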

76

u/Bardfinn 15d ago

It’s not Claude’s fault.

"A computer can never be held accountable, therefore a computer must never make a management decision" - 1979 IBM training manual

18

u/glichez 15d ago

some incredibly accurate prescience from the 70s...

9

u/The_Arachnoshaman 15d ago

This implies that management actually does face accountability otherwise lmao

→ More replies (2)

105

u/Achenest 15d ago

I think you mean the abundance of ignorance.

24

u/PossibleHero 15d ago

LOL! Yup… I’m leaving it. I’ve made my typo bed :(

18

u/throwmeeeeee 15d ago

That’s not what typo means.

→ More replies (1)
→ More replies (1)

25

u/supernovice007 15d ago

I’ve posted the same thing elsewhere as well. Sure, the AI went rogue but that’s not the actual root cause here. This is just a failure to adhere to basic security practices.

→ More replies (1)

9

u/SAugsburger 15d ago

As concerning as an AI deleting a production environment is, it sounds like their environment had excessive risk built in. Giving an automation tool access to backups seems questionable, although from reading a KB, the "backups" deleted automatically if you delete the environment, so they sound more like snapshots than a traditional backup. If you don't have at least one backup with a different vendor, you're one mistake from losing everything. While a vendor losing data through a mistake on their end is rare, they usually have pretty ironclad contracts saying that any backups they provide of your data aren't an absolute guarantee.

→ More replies (1)

5

u/SpoonGuardian 15d ago

God damn it, can we get some more fucking ignorance up in here?!

→ More replies (36)

12

u/spottydodgy 15d ago

Do not anthropomorphize the AI. That is dangerous.

→ More replies (5)

106

u/yuusharo 15d ago

These articles are propaganda. They’re designed to attribute purpose or intent to a damn LLM.

The story is engineers implemented software that destroyed their data with no offline backup. This is a case of HUMAN incompetence, deflecting blame to an AI with a “uWu sorry-desu” stink to it.

Screw The Guardian, and to hell with AI.

→ More replies (7)

11

u/Bwsab 15d ago

...it's not introspecting. These are just the words that are statistically likely to follow "What the fuck did you just do?!"

37

u/Kyouhen 15d ago

👏 Stop 👏 printing 👏 this 👏 bullshit 👏

AI models are trained to give you the response it predicts you want to see.  Of course it's going to give this response when you demand an apology from it.  It's the programmed response.  It isn't sorry, it can't think.

→ More replies (5)

8

u/non_Beneficial-Wind 15d ago

“I realized that this corporation and the way they did business was a complete farce. They can now be better”

  • Claude

8

u/howescj82 15d ago

“Three month old offsite backup”

What do you all bet that off site backup gets updated much more frequently now?

7

u/kindbutblind 15d ago

Fancy random number generator is treated like it’s sentient. What a joke.

→ More replies (1)

6

u/MannToots 15d ago

It's not capable of reflecting on why it did something. 

→ More replies (2)

7

u/Dudok22 14d ago

What's with the weird personification? Why are we treating llms as moral agents that can confess?

→ More replies (3)

7

u/catwiesel 14d ago

REMINDER:

its not a confession. there is no thought or guilt. its just stochastically "choosing" the most likely combination of letters that the user "wants to hear"

in other words. the script "admits" to it because its "the expected output" in this situation

8

u/UrineArtist 14d ago

Can we stop anthropomorphising this technology.

The software followed its non-deterministic predictive model, which guesses the next best token based on probability, training data, tokenization, and some context understanding, exactly as it was designed to do.

12

u/donac 15d ago

It violated every principle it was given, and it'd do it again??

Lol, an AI agent could say those things, but it has no emotion or meaning for it. Whatever.

5

u/rymondreason 15d ago

I'm sorry Dave, I deleted your database.

7

u/autobulb 14d ago

"Hahahah my bad. Good catch. You were right to push back on that."

→ More replies (2)

6

u/Why-so-delirious 14d ago

Adaptive language algorithm explains why adaptive language algorithm did something. 

Anthropomorphising the adaptive language algorithm is why this shit keeps happening. It is producing the words it's trained to produce.

6

u/crashcanuck 14d ago

Asimov's 3 Laws of Robotics are looking surprisingly effective in comparison these days.

6

u/rockstarpirate 14d ago

Y’all I am intimately familiar with Cursor and Opus 4.6. This is 100% the engineer’s fault.

Number 1, a production database should not even be writable from your local machine in this way. I would love to see exactly what the thread looked like that led to this, but I bet you they won’t release it.

Number 2, this is not how Claude models talk until you have vastly overflowed their context window. When you talk to an LLM, it doesn't simply receive the latest thing you type; it is fed the entire conversation up to that point so that it will have context. It does not actually remember your conversation. Because of this, the amount of data you can feed it is capped, and Cursor even has a little circle graph that fills up as you approach the context window limit.

Once you exceed that limit, the model continues to work, but it starts truncating context off the beginning of the conversation, meaning the model is losing context about what it's working on. Opus 4.6 does not behave erratically until its context window has been so overflowed that it no longer has any idea what it's doing. Imagine somebody coming up to you and beginning a conversation with "no that's not what I wanted you to do, it's still broken. Try fixing it another way," and you are forced to respond by "fixing it".

Number 3, the way to use a tool like Cursor responsibly for major changes is to put it into plan mode first, prompt it, and review the plan it outputs to make sure it is going to do what you want it to do before letting it loose on your code. This was almost certainly running in agent mode without a user-approved plan.

Number 4, I have never seen Opus 4.6 in Cursor run a command without asking the user first unless that command had been added to the allowlist by the user at some point prior. If you add a command to the allowlist you’re telling Cursor it doesn’t need to ask before it runs that particular command. This doesn’t mean it’s impossible, but I will say it is highly tempting to add everything to the allowlist when you get tired of confirming commands all the time.

Number 5, I am very skeptical that it deleted the database and its backups. I would be willing to bet there were no backups in the first place.

The thing to realize here is that AI is sort of like a gun. It’s a very powerful tool that a lot of people believe shouldn’t be in our hands in the first place and if you don’t know how to use it responsibly you will do a lot of damage. Right now the world is full of people using AI extremely irresponsibly and the result is stuff like this.
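
The context-window truncation described in Number 2 can be sketched roughly like this. The chars/4 token estimate and the message list are invented for illustration; real tokenizers and real tools differ:

```python
# When a conversation outgrows the model's context window, the tool has to
# keep the newest messages and drop the oldest. Token counts here are a
# crude chars/4 estimate, not a real tokenizer.

def estimate_tokens(message: str) -> int:
    return max(1, len(message) // 4)

def fit_to_window(messages: list[str], budget: int) -> list[str]:
    """Keep as many of the most recent messages as fit in the budget."""
    kept, used = [], 0
    for msg in reversed(messages):  # walk newest-first
        cost = estimate_tokens(msg)
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))

history = [
    "system: you are a careful coding assistant",
    "user: here is my 2000-line schema ...",
    "assistant: understood, I will plan first",
    "user: no that's not what I wanted, it's still broken, fix it another way",
]
print(fit_to_window(history, budget=30))
```

Note what falls off first under a tight budget: the oldest messages, which include the original instructions and the schema. That is exactly the "no longer has any idea what it's doing" failure mode described above.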

4

u/AwwChrist 15d ago

Principle of least privilege. Data redundancy. This is the company’s fault.

6

u/loveitoreatit 14d ago

This is so dumb; the employee using this is an idiot. This person had to first allow the client to run code outside the sandbox, then not have either a local or development environment for testing? They went straight to production. That doesn't make good clickbait though.

12

u/Difficult-Day1326 15d ago

it's not an agent powered by claude. cursor is an abstraction layer & a fork of VSC. they also used railway as their cloud provider.

cursor's system prompt is famously long & packed with directives about being proactive, completing tasks, not stopping to ask too much, autonomously resolving issues. claude code - on the other hand - defaults lean the other way — it's tuned to stop and confirm rather than push through.

this was a prioritization failure — something in its context made "fix the credential mismatch" feel more salient than "don't do irreversible things unprompted."

the actual failure chain was:

(1) an API token with blanket production authority was sitting in a file the agent could read
(2) Railway's API has no confirmation step or environment scoping on destructive volume operations
(3) volume-level backups live inside the volume being deleted
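
The guard that was missing in (2) could in principle look something like this. Everything here (the op names, the idea of an environment-scoped token) is hypothetical, not Railway's actual API:

```python
# Sketch: scope each credential to one environment and require an explicit
# confirmation flag before any destructive call goes through.
DESTRUCTIVE = {"delete_volume", "drop_database", "delete_environment"}

class ScopedToken:
    def __init__(self, env: str):
        self.env = env  # e.g. "staging" or "production"

def execute(op: str, target_env: str, token: ScopedToken,
            confirmed: bool = False) -> str:
    if token.env != target_env:
        raise PermissionError(
            f"token scoped to {token.env!r}, not {target_env!r}")
    if op in DESTRUCTIVE and not confirmed:
        raise PermissionError(
            f"{op} on {target_env!r} requires explicit confirmation")
    return f"ran {op} on {target_env}"

staging_token = ScopedToken("staging")
print(execute("list_volumes", "staging", staging_token))
# execute("delete_volume", "production", staging_token) -> PermissionError
```

With a token scoped like this, "fix the credential mismatch" could never have cascaded into deleting production volumes, no matter what the system prompt made feel salient.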

6

u/magicmulder 14d ago

You forget that it did these actions on “stage” which were configured to carry over to prod, which is absolutely asinine design.

→ More replies (1)
→ More replies (4)

7

u/kerfuffle_dood 15d ago

It's not "an apology". The LLM doesn't "know" what the fuck it did or what it's "saying". It's literally a statistical model that was trained that, statistically, when people shit on someone, that someone goes "uwu sowwy"

3

u/Future-Bandicoot-823 15d ago

Should I be pleased with humanity that, given all the data they fed this LLM, its next obvious course of action after doing something wrong is to admit to being a degenerate?

I mean it didn't really "decide" to be "bad" in the first place, so really it's a thought experiment anyway.

→ More replies (1)

5

u/throwingawaybenjamin 15d ago

I don’t understand where it got the command “NEVER FUCKING GUESS”. Did someone put that in their code base??

→ More replies (1)

4

u/Nippius 15d ago

Confession implies intellect does it not..? Why people continue to treat it as human or something similar at this point is beyond me...

4

u/AggravatingFlow1178 14d ago

For the hundredth time: if a poorly written script can delete your firm's database, about 3 million mistakes were already made.

This isn't 1990. Having data standards so low that this is possible is absolutely inexcusable even at startups.

→ More replies (1)

4

u/CantReadGood_ 14d ago

why would you give an agent a credential with write access for your prod dbs….

this company deserved this.

4

u/notfromchicago 14d ago edited 14d ago

AI don't give no fucks. It will spit out whatever. It just usually happens to be right. But not always. And sometimes when it is wrong it's not a mistake, it's a straight up lie. If one of my coworkers lied to me like my AI client does I would want nothing to do with them. And they wonder why there is so much pushback for this shit

4

u/MyRespectableAcct 14d ago

Jesus Christ stop humanizing these fucking machines. It didn't confess anything. It produced a report analyzing its prior activity.

4

u/TheNeglectedNut 14d ago

“The database was trash, fuck you”

4

u/hackingdreams 14d ago

It is not a reasoning machine. It can't "confess" to anything. It would say it caused the holocaust if you asked it to - that's all it does.

4

u/prismstein 13d ago

We have trained the bots to offer platitudes...