Mercor released a new benchmark called APEX Agents, and it is brutal. Unlike the usual tests that ask AI to write a poem or solve a math problem, this one uses actual queries from lawyers, consultants, and bankers. It asks the models to do complete, multi-step tasks that require jumping between different types of information. The results? Even the absolute best models on the market (we are talking about Gemini 3 Flash and GPT 5.2) couldn’t crack a 25% accuracy rate. Gemini led...

Read the full article at Digital Trends
