OpenAI Says Disciplining Chatbots for Lying Just Makes Them Worse

Found 31 days ago at Gizmodo

In a blog post about the work, OpenAI researchers describe using the company's GPT-4o model to supervise another of its large language models, disciplining it when it tried to lie. That did not work: the model still lied, only now its cheating was undetectable by the monitor because it had learned to hide its intent in the chain of thought. Newer thinking models use multi-step reasoning to answer queries. If a user asks for an estimate on how much Americans spend on pet food each year, f

Read the full article at Gizmodo

More General News

The Thinking Machine: Jensen Huang, Nvidia and the World’s Most Coveted Microchip – review

Found 15 hours ago at Guardian

Microsoft AutoUpdate tightens security around ManifestServer setting for Macs

Found 1 day ago at Neowin

Humanoid robots race against humans at unique half-marathon in China

Found 1 day ago at Digital Trends

Exclusive: Copilot’s native Spanish is a ‘game changer,’ says Microsoft expert

Found 1 day ago at Digital Trends

Microsoft’s “1‑bit” AI model runs on a CPU only, while matching larger systems

Found 2 days ago at Ars Technica