Maguro-003

| Property | Value |
|----------|-------|
| Format | JSONL, ShareGPT-style |
| Size | 3.2 GB compressed |
| Tokens | ~780M (Japanese: 92%, English: 7%, other: 1%) |
| Avg. response length | 128 tokens |
| Train/validation split | 95/5 |
| Toxicity filter threshold | 0.03 (Japanese hate-speech classifier) |
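As a rough illustration of how the reported toxicity threshold might be applied when cleaning ShareGPT-style JSONL data, here is a minimal sketch. The field names (`conversations`, `toxicity`) and the keep-below-threshold semantics are assumptions, not documented details of maguro-003.

```python
import json

# Threshold reported for maguro-003; whether records at or above it are
# dropped (as assumed here) is not documented.
TOXICITY_THRESHOLD = 0.03

def filter_records(jsonl_lines, threshold=TOXICITY_THRESHOLD):
    """Keep JSONL records whose toxicity score is below the threshold.

    Assumes each line is a JSON object with a float 'toxicity' field
    (hypothetical field name) produced by an upstream classifier.
    """
    kept = []
    for line in jsonl_lines:
        record = json.loads(line)
        if record.get("toxicity", 0.0) < threshold:
            kept.append(record)
    return kept

sample = [
    json.dumps({"conversations": [{"from": "human", "value": "こんにちは"}],
                "toxicity": 0.01}),
    json.dumps({"conversations": [{"from": "human", "value": "..."}],
                "toxicity": 0.41}),
]
print(len(filter_records(sample)))  # 1
```

In practice the scoring step would call an actual Japanese hate-speech classifier; this sketch only shows the thresholding stage.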

Tokyo — In the sprawling world of artificial intelligence, where datasets are often named after animals, colors, or celestial bodies, a new codename has begun circulating among researchers: maguro-003. For those unfamiliar, “maguro” is Japanese for bluefin tuna — a prized, high-fat, deep-red delicacy. And in AI circles, the name hints at quality: premium, carefully curated, and ready for rigorous consumption.

Maguro-003 is licensed under a custom non-commercial research license, though commercial licenses are reportedly available for enterprises based in Japan or partnering with local universities. The dataset's key properties, summarized in the table above, are based on metadata extracted from Hugging Face staging repositories.

Whether it becomes the gold standard or a footnote depends on adoption. But one thing is certain: in the race to build smaller, smarter, more respectful models, maguro-003 has set a new bar for what “premium” means. This article is based on available technical documentation, developer testimonials, and public code repositories as of April 14, 2026. The author has no affiliation with Wakaba Labs or any commercial AI entity.
