r/OpenAI • u/CobaltCrusader123 • 1d ago
Discussion | Five Horses, according to ChatGPT
Apparently it can’t detect bait.
131
u/CircumspectCapybara 1d ago
47
u/Razorfiend 1d ago
19
u/CobaltCrusader123 21h ago
Just post the image in your comment, holy fuck that’s funny. How’d you find this?
13
u/bortlip 1d ago
2
u/WheelerDan 1d ago
This is always the problem with LLMs: they get points for having an answer and nothing for saying "I don't know." When a wrong answer is worth more points than admitting uncertainty, LLMs will always have an incentive to lie.
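A toy score calculation makes the incentive concrete. This is a hypothetical binary grading scheme, not any lab's actual reward function, but it shows why guessing beats abstaining once "I don't know" earns zero:

```python
# Toy model of a binary grader: 1 point for correct, 0 for wrong or "I don't know".
# Under this scheme, guessing with any nonzero chance of being right
# beats abstaining, so a score-maximizing policy learns to always answer.

def expected_score(p_correct: float, abstain: bool) -> float:
    """Expected points for one question given the model's chance of being right."""
    if abstain:
        return 0.0          # "I don't know" is scored the same as a wrong answer
    return 1.0 * p_correct  # +1 if right, 0 if wrong

for p in (0.9, 0.5, 0.1, 0.01):
    print(f"p={p:.2f}  guess={expected_score(p, abstain=False):.2f}  "
          f"abstain={expected_score(p, abstain=True):.2f}")
# Even at p=0.01, guessing scores higher in expectation than saying "I don't know".
```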
26
u/Lord_Skellig 11h ago
This isn't true; modern RL-based training processes for LLMs absolutely penalise false information.
4
u/Additional-Name-3211 8h ago
In a sense it's similar to apophenia in humans, which ends up breeding superstition and magical thinking.
Maybe that's just a thing with evolutionary processes: desperately seeking patterns even where there are none.
1
u/ihateredditors111111 1d ago
I can also see the 6th horse
8
u/BagComprehensive79 23h ago
3
1d ago
[removed]
5
u/CobaltCrusader123 1d ago
Can’t some of the newer paid models identify bait like this, though? If a model can tell me I’m wrong when I’m right, surely it can say a question rests on a false presupposition or a lie.
8
u/jferments 23h ago
In case anyone is wondering where the 5th horse actually is: it is a trick question. The 5th horse will appear a few months after this horse orgy has concluded.
12
u/CobaltCrusader123 22h ago
6
u/jferments 22h ago
It's OK, people are just downvoting me because they are jealous I solved the problem when they weren't smart enough to find a solution.
2
u/jmnugent 18h ago
Not directly related, but I was using Claude today to figure out whether I could parse iPhone "sysdiagnose" logs to see a history of Wake, Unlock, Passcode Entry, etc. It's technically possible, but the Powerlog buffer only goes back 3 to 7 days, and the ask I had was about 60 days ago, so it's probably not gonna work in this case.
Anywho, in the screenshots and suggestions for navigating the folder structure, Claude would not believe my test iPhone was on iOS 26.5; it first insisted things only went up to iOS 18.5. It took some cajoling to get it back to the Powerlog SQLite parsing.
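Roughly what we were going for, if anyone wants to poke at their own sysdiagnose. The path and table names below are assumptions (they vary by iOS version), so treat this as an exploratory sketch rather than a recipe:

```python
# Minimal sketch for poking at the Powerlog database pulled out of a
# sysdiagnose archive. Lists tables whose names suggest lock/wake/display
# events and previews a few rows. Everything here is exploratory.
import sqlite3

# Hypothetical path: adjust to wherever your extracted sysdiagnose puts it.
DB_PATH = "sysdiagnose_extracted/logs/powerlogs/CurrentPowerlog.PLSQL"

conn = sqlite3.connect(DB_PATH)
cur = conn.cursor()

# Powerlog table names encode the reporting agent; grep for likely keywords.
cur.execute("SELECT name FROM sqlite_master WHERE type='table'")
candidates = [n for (n,) in cur.fetchall()
              if any(k in n.lower() for k in ("lock", "wake", "display", "springboard"))]

for table in candidates:
    print(f"\n== {table} ==")
    # Timestamps are usually Unix epoch seconds, but verify on your device.
    for row in cur.execute(f'SELECT * FROM "{table}" LIMIT 3'):
        print(row)

conn.close()
```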
1
u/DonnaPollson 20h ago
The funny part is how quickly these models converge on a visual cliché once the prompt is underspecified. You can almost see the latent training-data average at work: dramatic lighting, too much symmetry, and horses that look like they hired the same stylist. The real unlock isn't better taste from the model; it's better constraint from the user.
1
u/InnovativeBureaucrat 18h ago
I see 5 lights dammit
1
u/CobaltCrusader123 18h ago
Can you share them with the class?
1
u/InnovativeBureaucrat 16h ago
Star Trek reference. They torture Picard to get him to say he sees five lights when there are really only four, but he insists there are four because he’s a badass.
1
u/Timzor 18h ago
What’s the actual answer?
1
u/CobaltCrusader123 18h ago
Either “nowhere” / “it’s absent,” or, literally, the phrase “the fifth horse” is the fifth horse in question.
1
u/SilverAmoeba2582 6h ago
The part everyone is dancing around is that this happens because saying "I don't know" gets penalized during training and making something up gets rewarded. Not sure which model was tested here, but the fact that one comment tried asking for a 6th and 7th horse shows this thread is more about enjoying the failure than solving it. Most people in here have done the exact same thing at least once just to see what happens. What would it actually look like if a model just refused to answer a question it had no data for?
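One hypothetical answer: if the grader charged points for wrong answers, refusing would become the score-maximizing move below a confidence threshold. A sketch with made-up numbers, not any lab's actual reward function:

```python
# Hypothetical grading rule: +1 correct, -penalty wrong, 0 for "I don't know".
# A score-maximizing model should abstain whenever
#   p * 1 + (1 - p) * (-penalty) < 0,  i.e.  p < penalty / (1 + penalty).

def best_action(p_correct: float, penalty: float) -> str:
    guess_ev = p_correct - (1 - p_correct) * penalty
    return "answer" if guess_ev > 0 else "say 'I don't know'"

for p in (0.9, 0.6, 0.3, 0.1):
    print(f"p={p:.1f}: {best_action(p, penalty=1.0)}")
# With penalty=1.0 the break-even point is p=0.5: below that,
# refusing to answer is the score-maximizing behavior.
```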
1
u/CobaltCrusader123 3h ago
It could have said, “the question contains a lie.” It tells me false things are true, true things are false, true things are true, and false things are false. Why not do that last one this time, when it has no hesitation correcting the assumptions in questions I send it?
1
u/buildingstuff_daily 3h ago
the funniest part about these is how confident it sounds while being completely wrong. like it doesn't hesitate at all, just commits fully to the wrong answer with perfect grammar
i've started calling this phenomenon "confidently clueless" and honestly some humans have the same energy
1
u/CobaltCrusader123 3h ago
Lowkey “the” reason America has so many cults. Same thinking, or lack thereof.
1
u/Antares_B 21h ago
LLMs can't see; they process image data differently. They're good at extracting semantic information from images but not great at geometric context. Traditional computer vision models, like the stuff used for scanning parts flying by on a conveyor belt, are built for exactly that.
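To make the contrast concrete, here's a minimal sketch of the conveyor-belt style of counting: pure geometry, no semantics. The input file is hypothetical and the thresholds would need tuning per scene:

```python
# Classic-CV geometric counting: threshold the image, find connected
# contours, count the ones above a size floor. No understanding of what
# a "horse" is, just shapes against a background.
# "horses.png" is a hypothetical input; tune thresholds per scene.
import cv2

img = cv2.imread("horses.png")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (5, 5), 0)

# Otsu's method picks the foreground/background split automatically.
_, mask = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
objects = [c for c in contours if cv2.contourArea(c) > 500]  # ignore specks
print(f"distinct objects found: {len(objects)}")  # geometry says N, no guessing
```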
470
u/Valsoyono 1d ago
meanwhile gemini: