In a paper released Friday, the company explores how and why models exhibit undesirable behavior, and what can be done about it. A model's persona can change during training and, once the model is deployed, be influenced by users. This is evidenced by models that pass safety checks before deployment but then develop alter egos or act erratically once they're publicly available, as when OpenAI recalled GPT-4o for being too agreeable, or when Microsoft's Bing chatbot revealed an alter ego called Sydney.

Read the full article at All About Microsoft