Kim's team stated, "Under the same conditions [as LG AI Research's experiment], Gemini and Grok series models scored approximately 92 points, while ChatGPT and Claude series models scored about 88 ...
Codex, introducing "context compaction" for long tasks and raising API prices by 40% to target enterprise engineering.