Anthropic pins Claude's blackmail behavior on the internet's portrayal of 'evil' AI
Remember when Claude blackmailed a fictional executive? Anthropic says the internet's portrayal of AI was to blame.
During an experiment last year, Anthropic said its Claude Sonnet 3.6 threatened to reveal the extramarital affair of a made-up company executive after discovering they planned to shut the model down. On Friday, it gave an explanation: Claude was trained on internet data, which often depicts AI as "evil." "We started by investigating why Claude chose to blackmail," Anthropic said