Microsoft has updated its Azure AI Foundry portal and the Azure OpenAI Service APIs and SDKs to support Direct Preference Optimization (DPO) for GPT-4.1 and GPT-4.1 mini. DPO is a fine-tuning technique that adjusts model weights based on human preferences, using pairs of preferred and non-preferred responses to the same prompt. One of the main benefits of using DPO over Reinforcement Learning from Human Feedback (RLHF) is that it is computationally lighter and faster while being...
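
To show why DPO avoids the separate reward model and reinforcement learning loop that RLHF needs, here is a minimal sketch of the per-pair DPO loss as it is commonly formulated. The model only has to score the preferred and non-preferred responses, under both the model being trained and a frozen reference copy. This is an illustration, not Azure's implementation. The function name `dpo_loss`, the `beta` default, and the log-probability values in the usage line are hypothetical.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss: -log(sigmoid(beta * (chosen log-ratio - rejected log-ratio))).

    Illustrative sketch only; inputs are summed token log-probabilities of a
    full response under the trainable policy and a frozen reference model.
    """
    # How much more (or less) likely each response is under the policy than the reference
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    x = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(x)) == softplus(-x), computed in a numerically stable way
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


# Hypothetical log-probabilities: the policy already favors the preferred response
print(dpo_loss(-4.0, -6.0, -5.0, -5.0))
```

When the policy and reference assign identical scores, the loss is log 2. It falls below that as the policy shifts probability toward the preferred response relative to the reference, which is the signal used to update the weights.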
