Sakana AI’s Fugu System Reportedly Outperforms Claude Fable 5 on Key Benchmarks

A new artificial intelligence system named Fugu, developed by the Japanese company Sakana AI, has reportedly outperformed Anthropic’s Claude Fable 5 on several benchmarks. Rather than relying on one model, Fugu uses a distinctive architecture that coordinates multiple AI models through a single API to tackle complex challenges.

Benchmark charts released by Sakana AI show Fugu Ultra in the lead on two tests. On LiveCodeBench, an open-source test evaluating coding proficiency with regularly updated software problems, Fugu Ultra scored 93.2, with the standard Fugu model close behind at 92.9, both exceeding Fable 5’s 89.8. Fugu Ultra and Fugu also each scored 95.5 on GPQA-D (Diamond), a rigorous test comprising 198 graduate-level multiple-choice questions across biology, physics, and chemistry, surpassing the prior Claude Mythos Preview model’s 94.6.
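To put those margins in perspective, the short Python sketch below computes each Fugu model's lead over the Anthropic model it was compared against, using only the scores quoted above. The last step assumes GPQA-D scores are percent accuracy over the 198 questions; that reading is an assumption, not something the charts are reported to state.

```python
# Scores as quoted in this article from Sakana AI's benchmark charts.
scores = {
    "LiveCodeBench": {"Fugu Ultra": 93.2, "Fugu": 92.9, "Claude Fable 5": 89.8},
    "GPQA-D": {"Fugu Ultra": 95.5, "Fugu": 95.5, "Claude Mythos Preview": 94.6},
}


def margins(benchmark_scores, baseline):
    """Return each non-baseline system's lead over the baseline, in points."""
    base = benchmark_scores[baseline]
    return {
        name: round(score - base, 1)
        for name, score in benchmark_scores.items()
        if name != baseline
    }


lcb = margins(scores["LiveCodeBench"], "Claude Fable 5")
gpqa = margins(scores["GPQA-D"], "Claude Mythos Preview")

# Assumption: GPQA-D scores are percent accuracy over 198 questions.
gpqa_questions = round(gpqa["Fugu Ultra"] / 100 * 198, 1)

print(lcb)             # lead on LiveCodeBench, in points
print(gpqa)            # lead on GPQA-D, in points
print(gpqa_questions)  # the GPQA-D lead expressed as a number of questions
```

Under that assumption, the GPQA-D lead of under one point works out to fewer than two questions, while the LiveCodeBench lead is a little over three points.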

This reported outperformance is particularly noteworthy given the recent history of Anthropic’s Claude Fable 5 and its foundational model, Mythos. These models, considered among the most powerful and capable AI systems, were controversially rolled back just three days after their launch. The US government had reportedly requested that Anthropic revoke access for all foreign entities, citing grave national security concerns.

Anthropic had previously previewed Mythos in April but withheld its mass release due to fears that malicious actors could exploit its capabilities. Concerns included the potential for hacking critical infrastructure, such as banking systems, or even developing bioweapons. The company itself stated that Mythos could identify flaws in every major operating system and web browser it tested, uncovering vulnerabilities that had remained undetected for decades.

To manage these risks, Anthropic initiated Project Glasswing, a controlled program that shared Mythos with approximately 50 vetted organizations, including tech giants like Google, Apple, Amazon, Microsoft, and cybersecurity firm CrowdStrike, specifically for defensive cybersecurity applications. Even with Fable 5, Anthropic implemented guardrails designed to automatically revert the model to a less capable version, Claude Opus 4.8, if a user attempted to engage in high-risk activities like hacking critical systems or creating bioweapons.

Sakana AI’s Fugu system, in contrast to these single, monolithic models, works by orchestrating several specialized AI models. The idea is that coordinating several models lets the system approach intricate tasks more efficiently and flexibly than a single model might. The company launched two versions: Fugu, designed for everyday tasks like coding and chat, and Fugu Ultra, tailored for more demanding applications such as AI research, paper reproduction, cybersecurity analysis, and patent investigations.
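The general idea of putting several specialized models behind one entry point can be sketched in a few lines of Python. This is a toy illustration only: the `Orchestrator` class, the keyword-based router, and the three stand-in model functions are all hypothetical, and nothing here reflects how Sakana AI actually implements Fugu or what its API looks like.

```python
from dataclasses import dataclass
from typing import Callable, Dict


# Hypothetical stand-ins for specialized models; a real system would
# call remote models here instead of formatting a string.
def code_model(prompt: str) -> str:
    return f"[code-model] {prompt}"


def science_model(prompt: str) -> str:
    return f"[science-model] {prompt}"


def general_model(prompt: str) -> str:
    return f"[general-model] {prompt}"


@dataclass
class Orchestrator:
    """Single entry point that dispatches each request to one specialist."""

    specialists: Dict[str, Callable[[str], str]]
    fallback: Callable[[str], str]

    def classify(self, prompt: str) -> str:
        # Toy keyword router, assumed purely for illustration.
        text = prompt.lower()
        if any(word in text for word in ("function", "bug", "compile")):
            return "code"
        if any(word in text for word in ("physics", "chemistry", "biology")):
            return "science"
        return "general"

    def complete(self, prompt: str) -> str:
        # Use the matching specialist, or the fallback if none is registered.
        handler = self.specialists.get(self.classify(prompt), self.fallback)
        return handler(prompt)


router = Orchestrator(
    specialists={"code": code_model, "science": science_model},
    fallback=general_model,
)

print(router.complete("Fix the bug in this function"))
print(router.complete("Explain a chemistry titration"))
print(router.complete("Write a haiku"))
```

The caller only ever talks to `router.complete`, which is the point of the single-API design: which model handles a request is decided behind the interface. A production system would presumably use a learned router and could combine several models' outputs rather than picking just one.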

Founded in 2023 in Tokyo, Sakana AI brings together significant expertise. Its co-founders include Llion Jones, a co-author of Google’s seminal 2017 paper “Attention Is All You Need,” which laid the groundwork for modern transformer-based AI models, and David Ha, who previously served as the head of research at Stability AI. Their combined experience underscores the deep technical foundation behind Fugu’s development.

Beyond its direct comparison with Anthropic’s offerings, Sakana AI has also asserted that its Fugu models surpassed Google’s Gemini 3.1 Pro, OpenAI’s GPT-5.5, and Anthropic’s Opus 4.8 across a diverse array of tasks. These include automated research, mechanical design, Japanese handwriting analysis, one-shot chess, Rubik’s Cube solving, and financial time-series prediction, which the company presents as evidence of a broad competitive edge.

This emergence of a Japanese AI firm challenging established American leaders like Anthropic, Google, and OpenAI signals a broadening of the global AI innovation landscape. It suggests that alternative architectural approaches, such as Fugu’s multi-model coordination, could offer new pathways for developing powerful and versatile AI systems. As the AI industry continues its rapid evolution, the performance of systems like Fugu will likely intensify competition and drive further advancements in AI capabilities and safety protocols worldwide.

Moving forward, the AI community will closely watch how Sakana AI’s Fugu system integrates into real-world applications and whether its multi-model approach can sustain its competitive edge against the rapidly evolving single-model giants. Its success could inspire further diversification in AI development strategies, potentially leading to more specialized, efficient, and perhaps even more secure AI solutions.

IN SHORT

Japanese AI firm Sakana AI has launched its Fugu system, which reportedly surpasses Anthropic’s Claude Fable 5 on specific coding and science benchmarks. Unlike single-model AI, Fugu coordinates multiple models via one API for complex tasks. This development highlights global innovation in AI, challenging established players and offering new approaches to advanced problem-solving.

TL;DR

Japanese firm Sakana AI launched its Fugu system, reportedly outperforming Anthropic’s Claude Fable 5 on specific benchmarks.
Fugu Ultra scored 93.2 on LiveCodeBench (coding) and 95.5 on GPQA-D (graduate-level science), exceeding Fable 5 and Mythos Preview scores.
Anthropic’s Claude Fable 5 and Mythos models were recently rolled back due to US government national security concerns.
Mythos was deemed too powerful, capable of identifying critical system vulnerabilities, leading to a controlled program called Project Glasswing.
Fugu’s unique architecture coordinates multiple AI models via a single API, rather than relying on a single monolithic model.
Sakana AI was founded by Llion Jones (co-author of “Attention Is All You Need”) and David Ha (former head of research at Stability AI).
Sakana AI also claims Fugu models outperformed Google’s Gemini 3.1 Pro, OpenAI’s GPT-5.5, and Anthropic’s Opus 4.8 on various other tasks.