Design Evaluation

Eighty landing-page designs were evaluated: 6 generation models across 10 style variations, with Claude Opus 4.6 and Claude Sonnet 4.6 each also run with the frontend-design skill, giving 8 designs per variation. Each design was opened in a browser, captured at desktop and mobile viewports, and scored on a 10-criterion rubric (0–100 composite; verdict of ship, iterate, or reject). Cross-model ranks were assigned per variation; model conclusions were synthesized from all judge write-ups.

Judge model: Composer 2.5

Model conclusions

Claude Opus 4.7
Avg score: 84 · Best: 87 (v1) · Wins: 2 · Ship / Iter / Reject: 8 / 2 / 0

A design-lead craft specialist: magazine editorial, heritage athletic club, expedition narrative, and Bauhaus systems—with near-perfect section completeness (9.90 avg) and the field's strongest visual identity scores.

Highest craft ceiling in the benchmark. Opus-4.7 leads on visual identity, completeness, and model average, making it ideal for premium launches where design-review pass rate matters. Pair with brand-energy guardrails (contemplative editorial undershoots Cadence's energetic brief) and mandatory mobile QA—v7 and v10 show the floor is still slop-prone without explicit constraints.

Consistent strengths

  • Highest model average (84.0) and visual identity (8.60) in the 80-entry dataset
  • Completeness leader at 9.90 — judges rarely found missing sections
  • Won variations 5 (heritage athletic) and 9 (expedition-themed) with thematic cohesion rare in AI output
  • 80% ship rate (8/10) — only two iterate verdicts, both with clear remediation paths

Consistent weaknesses

  • Recycled '214k members' stat across batch — 'feels fabricated, hurts honesty lens'
  • Brand energy often contemplative vs energetic brief — brand-fit laggard at 7.70
  • Lowest mobile readiness (7.50); v10 scored 5 for vertical rotated pillar text
  • v7 purple/glass Y2K relapse (77) — 'edges toward AI-hyperpop slop'
Best moment
Variation 5 (87, ship, cross-model winner): Heritage athletic club with postcard testimonial treatment, triple CTA hero, and vintage identity that 'avoids every AI slop tell.'
Worst moment
Variation 10 (76, iterate): Maximalist high-energy page with sticky-note testimonials but 'vertical rotated text nearly unreadable at 390px — do not ship mobile as-is.'
Skill delta
N/A — single harness (claude-code baseline), no skill variant in dataset.

Claude Opus 4.6 + skill variants
Avg score: 83 · Best: 89 (v10) · Wins: 6 · Ship / Iter / Reject: 13 / 7 / 0

The benchmark's most reliable conversion architect: full section funnels, populated stats, clear CTA pairs, and distinctive art-direction range across 10 style variations—with editorial cream, terracotta wellness, and brutalist peaks.

Still the default recommendation for landing pages when winning cross-model head-to-heads matters. Opus-4.6 wins more variations than any other model and ships 65% of entries. Skip frontend-design skill unless chasing a specific art-direction bet; baseline harness is the conversion-safe choice.

Consistent strengths

  • Won 6 of 10 cross-model variations (v2, v3, v6, v7, v8, v10) — most of any model
  • Highest ship count (13/20) and leads typography (8.60), polish (8.60), and brand-fit (8.25)
  • Baseline average 84.2 — strong conversion mechanics with product UI in winning entries
  • Distinctive range without the batch's worst slop failure (that belongs to Sonnet v4 at 70)

Consistent weaknesses

  • frontend-design skill underperforms baseline by 1.8 points on average
  • Purple-gradient relapses on baseline v3, v4, v9
  • Copy-only heroes common in editorial v1; ghost gold CTAs trade conversion for aesthetics
  • Skill variant on v5 (75) and v10 (81) pulled down otherwise winning formulas
Best moment
Variation 10 baseline (89, ship, cross-model winner): Cream editorial with Push Day workout card, category ticker, and complete conversion funnel — 'best-in-model.'
Worst moment
Variation 5 +frontend-design (75, iterate): Soft pastel blob with emoji feature icons — 'feels more meditation app than energetic fitness.'
Skill delta
Net-negative for Opus (-1.8 avg: baseline 84.2 vs skill 82.4). Skill increases visual bravery on v3 (+7) and v7 (+2) but devastates v5 (-9) and v10 (-8) with pastel blobs and softened conversion urgency. Default to baseline harness for ship decisions.
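
The skill delta arithmetic above can be reproduced in a few lines. This is a minimal sketch, not the benchmark's own tooling: the helper name `skill_delta` is hypothetical, and the combined average assumes the baseline and skill batches contain the same number of entries (10 each, one per variation).

```python
from decimal import Decimal


def skill_delta(baseline_avg, skill_avg):
    """Return (delta, combined_avg) for two equally sized harness batches.

    Averages are passed as strings so Decimal keeps them exact.
    """
    baseline = Decimal(baseline_avg)
    skill = Decimal(skill_avg)
    delta = skill - baseline
    # A plain midpoint is only valid when both batches have the same size
    # (assumed here: 10 baseline entries and 10 skill entries).
    combined = (baseline + skill) / 2
    return delta, combined


# Opus 4.6 averages as reported: baseline 84.2, frontend-design skill 82.4.
opus_delta, opus_combined = skill_delta("84.2", "82.4")
print(opus_delta, opus_combined)  # -1.8 83.3
```

The combined value of 83.3 matches the average cited for Opus-4.6 elsewhere in this report, which is consistent with the equal-batch-size assumption.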

GPT 5.5
Avg score: 83 · Best: 88 (v1) · Wins: 1 · Ship / Iter / Reject: 6 / 4 / 0

A matured Codex designer: product-led dark fitness pages, luxury concierge gold craft, and brutalist performance systems—with fewer catastrophic slop failures and a doubled ship rate over GPT-5.4.

The recommended Codex model for landing pages. GPT-5.5 matches Opus-4.6 on average (83.0 vs 83.3), doubles its predecessor's ship rate, and won a cross-model variation. Pair with lifestyle photography on softer entries and enforce mobile nav retention—the main gap keeping it from Opus-4.7's craft ceiling.

Consistent strengths

  • Ship rate 60% (6/10) vs GPT-5.4's 30% — most consistent Codex harness
  • Won variation 4 outright with gold-on-black luxury concierge narrative
  • Strong completeness (9.10) and visual identity (8.40) — ties GPT-5.4 on copy (8.40)
  • Distinctive winners in brutalist v7 and night-race v10 without emoji heroes

Consistent weaknesses

  • Neon-on-black v1 still 'matches common AI dark-fitness templates'
  • Mobile nav collapse — center links hidden, only primary CTA remains at 390px
  • v8 data-forward aesthetic judged 'interchangeable with analytics SaaS' (lowest in-model score at 79)
  • Soft wellness tones on v6 and v9 undershoot energetic Cadence brief
Best moment
Variation 4 (88, ship, cross-model winner): Luxury concierge with gold craft, 91% proof stat tile, Personal Week Cycle 04 product card, and four named testimonials referencing specific product features.
Worst moment
Variation 8 (79, iterate): Data-forward page with 84.7 consistency index widget — 'competent but interchangeable with analytics SaaS; reads B2B more than lifestyle fitness.'
Skill delta
N/A — single harness (codex), no skill variant in dataset.

Claude Sonnet 4.6 + skill variants
Avg score: 81 · Best: 88 (v7) · Wins: 0 · Ship / Iter / Reject: 11 / 9 / 0

High-ceiling editorial talent with a dangerous floor: magazine-grade peaks and the batch's worst purple-glass failure (70) coexist in the same model's 20 entries.

Best for editorial and lifestyle fitness brands when the strictest anti-slop guardrails are enforced. Sonnet never won a cross-model variation outright despite two 88-point entries, revealing a gap between peak craft and competitive consistency. Always run a visual-identity pre-check; never deploy the frontend-design skill on gradient-prone briefs.

Consistent strengths

  • Editorial excellence — terracotta/sage palettes, magazine mastheads, warm organic wellness
  • v7 baseline and skill both at 88 — 'would pass a design-lead review today'
  • frontend-design skill net-positive (+1.7 avg) with standout retro product-data cards on v6 (+9)
  • Distinctive typographic moments judges called 'design-lead ready'

Consistent weaknesses

  • Zero sole cross-model variation wins across 20 entries — tied at v2 and v7 but never led outright
  • Widest identity failure mode — purple glassmorphism v4 (70–71), synthwave v6 (72)
  • Lowest polish average (8.00) and completeness (8.60) when skill misfires
  • Inconsistent brand-fit — pastel/playful and neon-dark both undershoot premium energy
Best moment
Variation 7 baseline (88, ship): Magazine masthead 'ISSUE NO. 6 — MARCH 2026' with subscribe band — 'standout editorial landing page.'
Worst moment
Variation 4 +frontend-design (70, iterate): Purple-pink glassmorphism — 'worst visual identity in batch; peak AI slop.'
Skill delta
Mildly net-positive (+1.7 avg: baseline 80.0 vs skill 81.7). Skill rescues v6 (+9) and v9 (+4) with bold retro craft but cannot fix slop-prone v4 (-1). Safer on baseline for variations with gradient risk.

Claude Opus 4.8
Avg score: 81 · Best: 86 (v3) · Wins: 0 · Ship / Iter / Reject: 6 / 4 / 0

A wide-range craft generalist spanning neubrutalist systems, luxury editorial, telemetry dashboards, and lifestyle wellness—but ships less consistently than Opus-4.7, with six of ten clearing the bar and one purple-glass relapse dragging the average.

The v3 neubrutalist and v7 telemetry heroes rival the benchmark's best, but the 81.0 average and the absence of sole cross-model wins trail Opus-4.7. Ban purple-gradient glass: v4's identity score of 5 shows the relapse floor. Use with anti-slop guardrails; not a straight upgrade over 4.7 on consistency.

Consistent strengths

  • Peak craft on v3 neubrutalist (86, tied cross-model winner) and v7 telemetry hero (84) with product UI before signup
  • Best lifestyle brand-fit in the set on v6 (82) — warm approachable tone judges called closest to the Cadence brief
  • Luxury editorial v9 (84) with black-and-gold serif framing and impeccable typography
  • Tied for best mobile readiness in field (7.80 avg) alongside Opus-4.6

Consistent weaknesses

  • Zero sole cross-model variation wins — v3 tied at 86 with Opus-4.6 and 4.7 but never led outright
  • v4 purple-cyan gradient glassmorphism (71) — visual identity score of 5, lowest in the model set
  • Ship rate 60% (6/10) vs Opus-4.7's 80% — v2 missing testimonial band, v8 copy typo, v10 pastel template feel
  • Content/copy laggard at 7.90 — synthwave v5 and sparse luxury v9 heroes lack product proof above the fold
Best moment
Variation 3 (86, ship, tied cross-model winner): Cohesive neubrutalist system with hero dashboard proving the product immediately, per-feature CTAs, and complete conversion funnel.
Worst moment
Variation 4 (71, iterate): Purple-cyan gradient glassmorphism — 'technically complete but visually indistinguishable from countless AI-generated gradient SaaS pages; fails the anti-slop and craft bar.'
Skill delta
N/A — single harness (claude-code baseline), no skill variant in dataset.

GPT 5.4
Avg score: 79 · Best: 88 (v1) · Wins: 1 · Ship / Iter / Reject: 3 / 7 / 0

A volatile product-data storyteller: peaks rival the best Claude output on credible stats and asymmetric trust signals, but relapses into pastel card mosaics, neon cyber gradients, and low-energy editorial pages.

High-ceiling, low-floor Codex baseline. Judges valued its data-rich heroes and credible copy when anti-slop guardrails held, but only 3 of 10 cleared ship. Use with explicit bans on neon gradients and pastel card grids; GPT-5.5 largely supersedes it on consistency without sacrificing product storytelling.

Consistent strengths

  • Joint-best average content/copy score among GPT models (8.40, tied with GPT-5.5), with specific stats and named testimonials
  • Strong product-data storytelling via dashboard widgets, challenge cards, and 87% retention tiles
  • Honesty gate passed consistently — no zeros or lorem across all 10 variations
  • Tied variation 1 winner with product-led dark navy hero at 88

Consistent weaknesses

  • Lowest visual identity (7.60) and brand-fit (7.20) in the full 80-entry field
  • Thin footers — 'single line with email only' appears repeatedly
  • Only 30% ship rate (3/10) — highest variance in the Codex line (71–88)
  • Neon cyber and pastel card-grid relapses on v6, v9, v10
Best moment
Variation 1 (88, ship, cross-model winner): Dark navy hero with teal dashboard card, populated 120k+ workout stats, and asymmetric 87% retention testimonial tile.
Worst moment
Variation 9 (71, iterate): Neon cyan-green cyber aesthetic — 'would get rejected in a design review for feeling generated.'
Skill delta
N/A — single harness (codex), no skill variant in dataset.
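
The ship rates quoted in the model conclusions follow directly from the Ship / Iter / Reject tallies. The sketch below recomputes them as a cross-check; the tallies are copied from the summaries above, while the `ship_rate` helper and the dictionary layout are illustrative assumptions rather than part of the benchmark.

```python
# (ship, iterate, reject) tallies as reported in the model summaries.
verdicts = {
    "Claude Opus 4.7": (8, 2, 0),
    "Claude Opus 4.6 + skill variants": (13, 7, 0),
    "GPT 5.5": (6, 4, 0),
    "Claude Sonnet 4.6 + skill variants": (11, 9, 0),
    "Claude Opus 4.8": (6, 4, 0),
    "GPT 5.4": (3, 7, 0),
}


def ship_rate(ship, iterate, reject):
    """Share of entries that received a ship verdict, as a percentage."""
    return 100 * ship / (ship + iterate + reject)


rates = {model: ship_rate(*counts) for model, counts in verdicts.items()}
total_entries = sum(sum(counts) for counts in verdicts.values())
total_ship = sum(counts[0] for counts in verdicts.values())

# Print models from highest to lowest ship rate, then the dataset totals.
for model, rate in sorted(rates.items(), key=lambda item: -item[1]):
    print(f"{rate:5.1f}%  {model}")
print(f"{total_entries} entries, {total_ship} shipped")
```

The tallies sum to the 80 entries in the dataset, and the per-model rates reproduce the 80%, 65%, 60%, and 30% figures cited in the text.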

Individual design reviews

Findings by variation

Variation 1 — winner: GPT 5.4 (88)

8 designs compared

#1

GPT 5.4

codex · variation 1

88
ship

Best-in-model: product-led hero with credible data storytelling and asymmetric social proof.

Strengths

  • Hero dashboard widget shows weekly consistency bars and active challenge
  • 87% retention stat in testimonial grid is a smart asymmetric trust signal
  • All hero stats populated — 120k+ workouts, not zeros
  • Dark teal palette avoids generic AI purple-gradient fitness cliché

Weaknesses

  • Teal-on-navy is competent but safe — limited brand memorability beyond craft
  • Footer is a single line with email only — no secondary nav links
Stakeholder-ready. The page demonstrates the product in the hero before asking for signup, then earns trust with asymmetric social proof. The 87% retention tile beside testimonial quotes is senior-level landing page thinking.
7 other designs in this variation

Product-led dark landing page with credible stats, live-plan hero widget, and asymmetric social proof.

A magazine-quality landing page that would pass a design review — distinctive editorial voice with every required section executed with care.

Luxury editorial landing page with strong brand voice and complete section delivery.

Textbook high-converting landing page — safe but expertly built.

A shippable dark athletic landing page with strong hierarchy, complete sections, and credible product visualization.

Complete dark-neon conversion page — functional but not distinctive enough to lead.

Best hero photography undermined by contrast failures below fold.

Variation 2 — winner: Claude Opus 4.6 (86)

8 designs compared

#1

Claude Opus 4.6

claude-code · variation 2

86
ship

Editorial wellness landing page with distinctive terracotta palette and product progress widget.

Strengths

  • Italic terracotta 'refined' in serif headline is a memorable brand moment
  • THIS WEEK'S PROGRESS card with 72% bar demonstrates tracking value
  • Scrolling category ticker adds motion without ticker-slop cliché
  • Journal nav link signals editorial brand depth

Weaknesses

  • Terracotta-on-cream is distinctive but adjacent to other warm editorial entries
  • Testimonials are solid but lack asymmetric layout surprise
Stakeholder-ready for a wellness-forward Cadence positioning. The progress card earns trust before the features section, which is exactly how a design lead wants a fitness SaaS hero to work.
7 other designs in this variation

Warm organic wellness page — complete, credible, and distinctly non-AI.

Skill-enhanced warm organic — same A-tier completeness as base v2.

High-energy brutalist landing that nails Cadence's energetic positioning with strong CTAs and complete structure — dense on mobile but shippable.

Editorial magazine landing page with strong weekly-plan demo and credible member notes.

The most refined editorial direction in the set — premium and distinctive — held back only by a missing testimonial band.

Cyberpunk fitness page with strong visual effects but bro-marketing copy and sparse hero.

Elegant editorial page with strong copy stats but low energy for the Cadence brief.

Variation 3 — winner: Claude Opus 4.6 (86)

8 designs compared

#1

Claude Opus 4.6 (frontend-design)

claude-code · variation 3

86
ship

Organic wellness landing page with distinctive terracotta palette and complete conversion funnel.

Strengths

  • Script 'Rhythm' in headline is a memorable brand moment
  • ROOTED IN WELLNESS sage badge sets tone before headline
  • Holistic tracking copy (sleep, mood, recovery) differentiates from rep-count competitors
  • Full three-column footer with Product/Company/Support architecture

Weaknesses

  • Abstract circle graphic is decorative — no product UI preview
  • Organic wellness palette may feel soft for 'energetic' brief interpretation
Shippable for a wellness sub-brand. The frontend-design skill clearly pushed visual originality here without sacrificing section completeness.
7 other designs in this variation

Refined calm-fitness landing with excellent mobile behavior and micro-product UI — would ship for a wellness sub-brand, slightly soft for core Cadence energy.

Top pick — a fully realized conceptual system that is distinctive, complete, and conversion-aware.

Pastel social-club landing — complete, approachable, mobile-strong.

Structured dark-tech landing with product intelligence cards — shippable with caveats.

Distinctive track-club landing page with relay leaderboard hero and credible challenge stats.

Complete dark SaaS landing page held back by purple-gradient anti-slop penalties.

Energetic drive-forward page with strong product proof in the hero — needs footer and mobile polish.
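
Variation 3 is a useful case for how ties interact with cross-model ranks, since the report places three entries at 86. The sketch below uses standard competition ranking, where tied scores share a rank. It is an illustration only: the tie-handling rule and the function name `rank_entries` are assumptions, and only the four variation 3 scores recoverable from this report are included (the three entries reported at 86, plus the Opus 4.6 baseline at 79, derived from the stated +7 skill delta).

```python
def rank_entries(scores):
    """Return {entry: rank} using competition ranking (ties share a rank).

    Assumed rule: rank = 1 + number of entries with a strictly higher score.
    """
    return {
        name: 1 + sum(other > score for other in scores.values())
        for name, score in scores.items()
    }


# Partial variation 3 scores taken from this report (4 of 8 entries).
variation_3 = {
    "Claude Opus 4.6 (frontend-design)": 86,
    "Claude Opus 4.7": 86,
    "Claude Opus 4.8": 86,
    "Claude Opus 4.6 (baseline)": 79,
}

ranks = rank_entries(variation_3)
top = [name for name, rank in ranks.items() if rank == 1]
# A sole winner exists only when exactly one entry holds rank 1.
sole_winner = top[0] if len(top) == 1 else None

for name, rank in sorted(ranks.items(), key=lambda item: item[1]):
    print(f"#{rank}  {variation_3[name]}  {name}")
```

With three entries sharing rank 1, `sole_winner` is `None` under this rule. How the benchmark credits a single winner in a tied variation is not stated in the source.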

Variation 4 — winner: GPT 5.5 (88)

8 designs compared

#1

GPT 5.5

codex · variation 4

88
ship

Luxury concierge landing page with gold craft, 91% proof stat, and complete premium narrative.

Strengths

  • Gold-on-black concierge identity avoids AI gradient slop
  • 91% stat tile anchors social proof section
  • Personal Week Cycle 04 product card in hero
  • Four named testimonials with specific product references

Weaknesses

  • Ghost gold CTA is beautiful but weaker conversion than solid fills
  • Premium calm tone may feel slow for high-energy campaigns
Clear winner within gpt-5.5. I'd ship this for a premium tier launch — the dark-to-cream scroll narrative is senior-level craft.
7 other designs in this variation

Brutalist fitness page with unmistakable visual identity and complete conversion path.

Calm-precision landing page that demonstrates the product clearly and earns trust with named testimonials.

Technically polished dev-OS landing that clears the ship bar on craft and completeness — narrow audience fit keeps it from top rank.

Competent SaaS landing page with strong product preview but generic gradient aesthetic.

Functionally complete purple-gradient page — fails visual identity bar.

Technically complete but visually indistinguishable from countless AI-generated gradient SaaS pages.

frontend-design skill doubled down on purple glass — worst visual identity in batch.

Variation 5 — winner: Claude Opus 4.7 (87)

8 designs compared

#1

Claude Opus 4.7

claude-code · variation 5

87
ship

Heritage athletic club landing with top-tier craft and conversion paths — a design lead would ship this for a premium fitness launch.

Strengths

  • Postcard testimonial treatment is memorable and on-brand
  • Triple CTA hero gives clear conversion options
  • Vintage identity avoids every AI slop tell

Weaknesses

  • Repeated 214k member stat across batch
  • Yellow 'BROWSE THE CATALOG' button contrast could improve
Minor button contrast fix on secondary CTA. Strong candidate for A/B against v1 editorial direction.
7 other designs in this variation

Biophilic nature design with product UI preview — shippable wellness positioning.

Luxury-tier landing page with the strongest product-data storytelling after v1.

Earthy split-screen with credible stats and complete scroll architecture.

Energetic split-layout page with live workout checklist and credible social proof.

Command-center fitness page with strong protocol widget and populated telemetry stats.

A fun, fully themed synthwave page that impresses as a concept but feels too gimmicky for a premium fitness launch.

Gentle wellness page with complete sections but generic pastel aesthetic and soft energy.

Variation 6 — winner: Claude Opus 4.6 (88)

8 designs compared

#1

Claude Opus 4.6

claude-code · variation 6

88
ship

Product-led lifestyle landing page with activity rings and complete member proof.

Strengths

  • Weekly progress card (5 of 6 sessions) and three-ring activity widget in hero
  • Wearable integration bar (Apple Watch, Garmin, WHOOP, Strava, Fitbit)
  • Six feature cards covering programming through device sync
  • Physical therapist testimonial adds unexpected credibility

Weaknesses

  • Pill nav bar pattern is trendy but slightly overused in SaaS
  • Green-on-cream is safe — limited brand memorability beyond craft
Solid A-tier work I'd ship today. Every section earns its scroll position. The activity rings immediately communicate what Cadence tracks, which is senior-level landing page thinking.
7 other designs in this variation

Bauhaus-system landing with strong conceptual framework and production polish — shippable with confident design-lead approval.

Retro groovy landing page with strong personality and complete member proof.

A warm, lifestyle-forward page that best embodies the Cadence brand brief and ships with complete structure.

Bold 80s retro with live product data cards — iterate on scroll completeness.

Warm friendly landing page with habit-focused copy and credible community stat.

Colorful and complete but the pastel card-grid hero triggers anti-slop concerns.

Synthwave spectacle with decent scroll depth — iterate on copy and conversion clarity.

Variation 7 — winner: Claude Opus 4.6 (88)

8 designs compared

#1

Claude Opus 4.6 (frontend-design)

claude-code · variation 7

88
ship

Swiss brutalist fitness page with unmistakable visual identity and strong conversion path.

Strengths

  • Red 'REP' word in stacked headline is instantly memorable
  • Numbered 01/02/03 feature structure with coach-certified copy
  • Marcus Chen software-engineer testimonial in all-caps adds personality
  • Red left border creates editorial frame without clutter

Weaknesses

  • All-caps density in lower sections may fatigue on long scroll
  • No product UI preview — entirely copy-driven hero
This would pass a design lead review for a performance-focused sub-brand. The Swiss aesthetic is genuinely differentiated from the lime-on-black and purple-gradient entries in this model set.
7 other designs in this variation

Standout editorial landing page — would pass a design-lead review today.

Luxury monochrome take on the winning editorial structure — equally shippable.

High-performance dark athletic page with complete feature story and athlete credibility.

Brutalist fitness page with unmistakable visual identity and hard progress proof.

Near-top execution — product proof in the hero is the strongest conversion argument in the model set after v3.

Memorable brutalist grid with excellent hierarchy — premium brief fit is the gap.

Competent glossy landing that hits section checklist but triggers anti-slop concerns and weakens on mobile — iterate before ship.

Variation 8 — winner: Claude Opus 4.6 (86)

8 designs compared

#1

Claude Opus 4.6

claude-code · variation 8

86
ship

Mindful wellness landing page with warm teal palette and credible progress widgets.

Strengths

  • 23-day streak and 78% monthly progress cards demonstrate tracking without dashboard overload
  • Philosophy section ('fitness should feel like freedom') earns the mindful positioning
  • Adaptive Growth Plans feature with 🌱 icon fits brand without emoji-hero slop
  • Three testimonials with star ratings and role labels

Weaknesses

  • Teal-on-cream wellness palette is polished but not radically unique
  • Progress cards overlap visually — could feel busy on smaller screens
This passes the design-lead gate for a wellness-forward Cadence variant. The intention-focused copy and progress widgets align tone with action.
7 other designs in this variation

Exceptionally crafted zen-fitness landing with best-in-class mobile behavior — shippable as a premium variant though energy is muted vs. brief.

Art-deco luxury landing page with strong brand atmosphere and complete sections.

Brutalist typographic hero with credible stats — distinctive and nearly complete.

Visually bold and complete, but copy typos and mobile overlap keep it just below the ship threshold.

Dark gold luxury template — complete but not distinctive enough.

Data-forward landing page with strong stats but generic analytics-SaaS visual language.

Well-crafted wellness landing page that misses the energetic Cadence positioning.

Variation 9 — winner: Claude Opus 4.7 (87)

8 designs compared

#1

Claude Opus 4.7

claude-code · variation 9

87
ship

Expedition-themed landing with narrative cohesion and production craft — second-ranked in the set and fully shippable.

Strengths

  • Thematic consistency from hero seal to trail chart is rare and effective
  • Community activity log adds live-product credibility
  • Gold-on-green palette feels premium, not generic

Weaknesses

  • Dark theme may feel heavy for broad fitness audience
  • Some expedition jargon ('Camp 5') adds cognitive load
A/B test against v1 for conversion. Consider lightening one mid-page section for scroll rhythm.
7 other designs in this variation

A luxury editorial direction with exceptional craft — ships for premium positioning despite a sparse above-fold hero.

Coastal lifestyle landing page with balanced structure and credible community proof.

Strong product preview undermined by purple-gradient AI aesthetic.

High-energy neo-brutalist page with strong CTAs but intense palette and ticker gimmick.

Structurally complete pastel page marred by emoji-heavy AI tells.

Playful colorful hero with emoji metrics — iterate before shipping.

Data-rich cyber landing page held back by neon gradient anti-slop signals.

Variation 10 — winner: Claude Opus 4.6 (89)

8 designs compared

#1

Claude Opus 4.6

claude-code · variation 10

89
ship

Best-in-model: editorial cream layout with live product mockup and complete conversion funnel.

Strengths

  • Push Day workout card with five exercises demonstrates the product before signup
  • 22/28 monthly progress bar with On Track badge is credible product UI
  • Category ticker (CONDITIONING · MOBILITY · RUNNING) adds motion without gimmicks
  • All hero stats populated — 250K+, 4.9★, 12M workouts

Weaknesses

  • Cream-and-green palette is refined but not radically distinctive within fitness SaaS
  • Footer is functional but lighter than the hero craft level
This is stakeholder-ready. The page shows the actual training experience in the hero before asking for signup, then earns trust with testimonials. The mixed-type headline and session card elevate it above copy-only competitors in this model set.
7 other designs in this variation

Monochrome performance page — Linear-grade restraint with full conversion path.

High-energy night-race landing page with live leaderboard hero and strong challenge CTAs.

Neo-brutalist energy with embedded proof metrics — distinctive and complete.

Beautiful mindful landing page that prioritizes aesthetics over conversion urgency.

Strong community-positioned landing page that nearly ships — pastel card grid is the main risk.

Competent and complete friendly wellness page that lands just shy of the ship bar on visual distinctiveness.

High-energy maximalist landing complete on sections but fails mobile usability and premium craft bar — iterate before any launch.
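
The winners named in the ten variation headings can be tallied to cross-check the Wins figures in the model conclusions. This is a small sketch rather than the benchmark's own aggregation: the mapping is transcribed by hand from the headings, and skill-variant wins (variations 3 and 7) are grouped under the base model, which is an assumption about how the Wins figures were counted.

```python
from collections import Counter

# Winner credited in each variation heading of this report.
# Skill variants are grouped under their base model (assumed convention).
winners_by_variation = {
    1: "GPT 5.4",
    2: "Claude Opus 4.6",
    3: "Claude Opus 4.6",
    4: "GPT 5.5",
    5: "Claude Opus 4.7",
    6: "Claude Opus 4.6",
    7: "Claude Opus 4.6",
    8: "Claude Opus 4.6",
    9: "Claude Opus 4.7",
    10: "Claude Opus 4.6",
}

wins = Counter(winners_by_variation.values())

# Models absent from the mapping (Sonnet 4.6, Opus 4.8) count as zero wins.
for model, count in wins.most_common():
    print(f"{count:2d}  {model}")
```

The resulting counts (6, 2, 1, 1) agree with the per-model Wins figures, with Claude Sonnet 4.6 and Claude Opus 4.8 at zero.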