In a blog post regarding the work, OpenAI researchers describe using its GPT 4o model to supervise another of its large language models, disciplining it when it tried to lie. But that did not work, as the model would still lie, only now its cheating is undetectable by the monitor because it has learned to hide its intent in the chain of thought. Newer thinking models use multi step reasoning to answer queries. If a user asks for an estimate on how much Americans spend on pet food each year, f
