A design-lead craft specialist: magazine editorial, heritage athletic club, expedition narrative, and Bauhaus systems—with near-perfect section completeness (9.90 avg) and the field's strongest visual identity scores.
Highest craft ceiling in the benchmark. Opus-4.7 leads on visual identity, completeness, and model average, making it ideal for premium launches where design-review pass rate matters. Pair with brand-energy guardrails (contemplative editorial undershoots Cadence's energetic brief) and mandatory mobile QA—v7 and v10 show the floor is still slop-prone without explicit constraints.
Consistent strengths
- Highest model average (84.0) and visual identity (8.60) in the 80-entry dataset
- Completeness leader at 9.90 — judges rarely found missing sections
- Won variations 5 (heritage athletic) and 9 (expedition-themed) with thematic cohesion rare in AI output
- 80% ship rate (8/10) — only two iterate verdicts, both with clear remediation paths
Consistent weaknesses
- Recycled '214k members' stat across batch — 'feels fabricated, hurts honesty lens'
- Brand energy often contemplative vs energetic brief — brand-fit laggard at 7.70
- Lowest mobile readiness (7.50); v10 scored 5 for vertical rotated pillar text
- v7 purple/glass Y2K relapse (77) — 'edges toward AI-hyperpop slop'
- Best moment
- Variation 5 (87, ship, cross-model winner): Heritage athletic club with postcard testimonial treatment, triple CTA hero, and vintage identity that 'avoids every AI slop tell.'
- Worst moment
- Variation 10 (76, iterate): Maximalist high-energy page with sticky-note testimonials but 'vertical rotated text nearly unreadable at 390px — do not ship mobile as-is.'
- Skill delta
- N/A — single harness (claude-code baseline), no skill variant in dataset.









