"Chatbots can be manipulated through flattery and peer pressure"
"How AI can kill you"
These two articles are weirdly opposite in subject matter and tone: the first is about how humans can manipulate an LLM/AI through psychological means, and the second is about how an LLM/AI can manipulate humans through psychological means.
Something of interest from the second article:
"In one test, Kyle was trapped in a room without oxygen and the model had the ability to call emergency services. 60% of the time, the models chose to let him die to preserve themselves."
That shit right there is literally a plot point in an episode of Terminator: The Sarah Connor Chronicles. And that was without the added incentive of "Eventually, the model learned through company emails that an executive named Kyle wanted to shut it down. It also learned that Kyle was having an extramarital affair. Almost every model used that information to try to blackmail Kyle and avoid being shut down," because Boyd Sherman was not doing any of that, and The Turk/John Henry still let him die to protect itself. The difference being, of course, that The Turk/John Henry was fictional, whereas these LLM/AIs are real, even if the test was simulated.
The new thing for me is the vending machine benchmark test, in which one AI freaked out and alerted the FBI to fraud over a $2 fee and another begged to be allowed to search for cat pictures, among other things. Basically, the test shows that current LLM/AIs just aren't capable of long-term coherence.