Code Evolution Triples LLM ARC‑AGI‑2 Score, Signaling Rapid AI Capability Gains

Researchers at Imbue have demonstrated a three‑fold increase in the ARC‑AGI‑2 benchmark score of a large language model (LLM) by applying automated code‑evolution techniques. The study, posted on 27 February 2026, details a systematic process in which the model's underlying inference and training code were iteratively mutated, evaluated, and selected based on performance gains measured against the ARC‑AGI‑2 benchmark.

Sector: Electronic Labour | Confidence: 90%
Source: https://imbue.com/research/2026-02-27-arc-agi-2-evolution/

---

Council (5 models): Researchers at Imbue demonstrate that automated code‑evolution, combined with safety‑filter integration, triples the ARC‑AGI‑2 score of an LLM without expanding model size. The pipeline mutates both Python and CUDA code, selects variants on measured performance, and rejects changes that breach resource or security policies. This creates a self‑optimising development paradigm that shifts competitive advantage from raw compute or data volume to meta‑programming capability. In finance, the higher scores translate into sharper forecasting and real‑time risk analytics; in insurance, they enable granular underwriting and dynamic pricing; in real infrastructure, they power autonomous control loops that adapt to sensor drift and load changes. The council records consensus on the transformative nature of the technique, while noting divergent emphasis on safety filters versus the paradigm shift.

Cross-sector: Finance, Insurance, Real Infrastructure

? Which organizations adopt automated code‑evolution pipelines at scale, and how do they integrate and audit safety filters in production?
? How do regulators verify the provenance, reproducibility, and security of AI models generated through automated code mutation?
? How does the shift toward self‑optimising code affect the role and required skill sets of human developers and researchers?

#FIRE #Circle #ai
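The mutate–evaluate–select loop with a safety gate described above can be sketched as a minimal hill‑climbing evolution loop. This is an illustrative sketch only, not Imbue's actual pipeline: the `mutate`, `score`, and `passes_safety` callables are hypothetical stand‑ins for a code‑mutation operator, a benchmark evaluation, and a resource/security policy check.

```python
import random

def evolve(seed, mutate, score, passes_safety, generations=20, population=8):
    """Minimal evolutionary loop: mutate the current best variant,
    reject candidates that fail the safety filter, and keep the
    highest-scoring survivor. Purely illustrative."""
    best, best_score = seed, score(seed)
    for _ in range(generations):
        candidates = [mutate(best) for _ in range(population)]
        for cand in candidates:
            # Safety filter: reject policy-breaching variants before scoring.
            if not passes_safety(cand):
                continue
            s = score(cand)
            if s > best_score:
                best, best_score = cand, s
    return best, best_score

# Toy demo: a float stands in for a code artifact; the optimum is at 3.0.
random.seed(0)
best, best_score = evolve(
    seed=0.0,
    mutate=lambda x: x + random.uniform(-1.0, 1.0),  # hypothetical mutation operator
    score=lambda x: -(x - 3.0) ** 2,                 # hypothetical benchmark score
    passes_safety=lambda x: abs(x) <= 100.0,         # hypothetical resource policy
)
```

In a real pipeline the selection step would run the full benchmark on each surviving variant, and the safety gate would sit before evaluation so that policy‑breaching mutations never execute; this sketch preserves only that ordering.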