Quality evaluation

Baseline (as of 2026-04-14, updated): 1650+ Vitest tests · 170 Rust CLI tests · 10 Playwright specs across 70 acceptance checks · 359 BDD scenarios · security audit clean

Grades

| # | Rubric | Grade |
|---|---|---|
| 1 | Spec Fidelity | A |
| 2 | Architecture Quality | A+ |
| 3 | Test Quality | A+ |
| 4 | Security Posture | A+ |
| 5 | Accessibility | A+ |
| 6 | Performance | A |
| 7 | Developer Ergonomics | A+ |
| 8 | Browser / Web API Usage | A+ |
| 9 | Web Components | A+ |
| 10 | Spec Coherence (WC First-Class) | A+ |
| 11 | CI/CD Pipeline | A+ |
| 12 | Dependency Management | A |
| 13 | Documentation | A+ |
| 14 | Observability / Logging | A |
| 15 | API Design | A+ |
| 16 | Error Handling | A+ |
| 17 | TypeScript Quality | A+ |
| 18 | AI Drivability | A+ |
| 19 | Internationalization (i18n) | A+ |
| 20 | SEO Tooling | A |
| 21 | AEO Tooling | A+ |
| 22 | First-Party Data | A |
| 23 | Content Modeling Flexibility | A |
| 24 | Schema Migration Safety | A |
| 25 | Caching Strategy | A |
| 26 | Plugin / Extension API | A |
| 27 | Image Optimization | A |
| 28 | Real-Time Collaboration | A |
| 29 | Privacy by Design | A+ |
| 30 | Open Source Health | A+ |
| 31 | Data Portability | A |
| 32 | Upgrade Path / Migration DX | A |
| 33 | Import / Migration Tooling | A |
| 34 | Content Scheduling | A |
| 35 | E2E Hosted Provider Testing | B |
| 36 | CLI UX Quality | A+ |
| 37 | Email Delivery | A+ |
| 38 | Search / Discovery | A |
| 39 | Admin CRUD E2E | A+ |
| 40 | Disaster Recovery | A |
| 41 | Monitoring Integration | A |
| 42 | Upgrade Path E2E | A |
| 43 | System Honesty | A+ |
| 44 | Multi-site Gateway (astropress-nexus) | A+ |
| 45 | Scaffold Quality Carryover | A+ |
| 46 | Mobile-Firstness / Responsive Design | A |
| 47 | Admin Panel UX Quality | A |
| 48 | Nexus UX Quality | A+ |
| 49 | UX Writing & Microcopy | A+ |
| 50 | Information Architecture | A+ |
| 51 | Navigation Design | A+ |
| 52 | Interaction Design & Motion | A |
| 53 | Cross-Platform Support | A |
| 54 | Test Artifact Cleanup | A+ |
| 55 | Minimalism | A |
| 56 | Verified Providers / No Speculative Features | A+ |

Key gaps

  • Rubric 35 (B): Live hosted-provider coverage still depends on maintainer-owned accounts, seeded projects, and teardown automation; see HOSTED_E2E_SETUP.md
  • Rubric 56 (A+): The fictional “Runway” provider was removed on 2026-04-14; audit:providers now enforces that all provider IDs are verified against tooling/verified-providers.json
  • Rubric 46–52: UX rubrics added 2026-04-12; their grades are engineering-observed baselines, and no independent user research or usability testing has been conducted
  • Rubric 53: Windows, macOS, and Linux now have CI smoke coverage and shell parity, but BSD remains best-effort rather than verified support
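
The provider check mentioned for Rubric 56 amounts to a set difference between the provider IDs the codebase references and an allowlist. The following is a minimal sketch of that idea, not the actual audit:providers script. The function name findUnverifiedProviders, the assumption that tooling/verified-providers.json parses to a flat array of ID strings, and the sample IDs are all hypothetical.

```typescript
// Hypothetical sketch of an allowlist audit; not the real audit:providers script.
// Assumes the allowlist file parses to a flat array of provider ID strings.

/** Returns every referenced provider ID missing from the allowlist, de-duplicated and sorted. */
function findUnverifiedProviders(
  referenced: readonly string[],
  verified: readonly string[],
): string[] {
  const allow = new Set(verified);
  const missing = new Set<string>();
  for (const id of referenced) {
    if (!allow.has(id)) missing.add(id);
  }
  return Array.from(missing).sort();
}

// Illustrative data only; these IDs are placeholders, not the project's provider list.
const sampleVerifiedIds = ["provider-a", "provider-b"];
const sampleReferencedIds = ["provider-a", "runway", "provider-b", "runway"];

const sampleUnverified = findUnverifiedProviders(sampleReferencedIds, sampleVerifiedIds);
if (sampleUnverified.length > 0) {
  // A CI script built on this idea would exit non-zero here so the job fails.
  console.error(`Unverified provider IDs: ${sampleUnverified.join(", ")}`);
}
```

Returning the offending IDs rather than a boolean lets the CI log name exactly which provider needs to be verified or removed.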

Grade changes (2026-04-14 audit)

| Rubric | Old | New | Reason |
|---|---|---|---|
| 37 - Email Delivery | A | A+ | Runtime, CLI, and docs now share one canonical `mock |
| 43 - System Honesty | A | A+ | Public docs, BDD text, and user-facing crypto/readiness claims are checked against a canonical truth source in CI |
| 55 - Minimalism |  | A | First evaluation; no dead exports or speculative abstractions found by arch-lint |
| 56 - Verified Providers |  | A+ | New rubric; audit:providers added to CI; hallucinated Runway provider removed; verified-providers.json is now the source of truth |
| 44 - Multi-site Gateway | A | A+ | astropress-nexus now includes a tested operator dashboard, detail pages, and bulk refresh/redeploy actions |
| 48 - Nexus UX Quality | A | A+ | Nexus now has a real operator UI with search, per-site actions, degraded-state surfacing, and responsive cards |
| 49 - UX Writing & Microcopy | A | A+ | docs/UX_WRITING.md and bun run audit:microcopy now enforce higher-signal user-facing copy |
| 51 - Navigation Design | A | A+ | Command palette, keyboard shortcut help, recent-item nav, breadcrumbs coverage, and mobile nav behavior are now tested |
| 54 - Test Artifact Cleanup | B | A+ | Example/admin-harness verification now runs inside temp data roots and CI fails if the repo is left dirty |

Readiness verdict

| Area | Verdict | Why |
|---|---|---|
| GitHub readiness | Yes | The repo has the expected open-source hygiene, enforced docs/readiness audits, cross-platform smoke lanes, and clean-worktree verification in CI |
| Production readiness | Yes, with caveats | The core stack is production-capable for the verified Node 24 / Linux-macOS-Windows matrix, but hosted-provider live E2E remains incomplete and BSD is not a verified target |
| Cross-platform readiness | Yes for mainstream OSes | Linux, macOS, and Windows have install/release coverage and CI smoke lanes; BSD is documented as best-effort pending native runner verification |

Project handoff

Every new project created with astropress new receives an EVALUATION.md project evaluation card generated from the same 56-rubric framework. That card is intended to be carried forward and updated by downstream Astropress projects, especially when major PRs are substantially AI-generated.

Rubric 43 — System Honesty

Grade: A+

Measures whether the repo’s public claims, CLI output, and failure reporting match the implementation instead of presenting a cleaner story than the code can actually prove.

Evidence

  • README, docs, BDD text, and user-facing crypto/readiness wording are audited by bun run audit:honesty
  • User-facing fallback copy is audited by bun run audit:microcopy
  • HOSTED_E2E_SETUP.md explicitly states that hosted-provider E2E is not A+ yet and why
  • The compatibility matrix now states Linux/macOS/Windows as verified and BSD as best-effort
  • Security docs now describe the actual crypto stack: Argon2id password hashing, KMAC256 token/privacy digests, and ML-DSA-65 webhook signatures
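
One way to picture an audit of this kind is a scan of documentation text against a small set of rules derived from a canonical truth source. The sketch below is an illustration under stated assumptions: the real rules and truth-source format behind bun run audit:honesty are not shown here, and ClaimRule, Finding, auditText, and both sample rules are hypothetical names and values.

```typescript
// Hypothetical sketch; the real audit:honesty rules and truth-source format are not shown in this document.

interface ClaimRule {
  /** Flags wording the implementation cannot back up. Avoid the `g` flag: RegExp.test is stateful with it. */
  pattern: RegExp;
  /** Explains what the canonical truth source says instead. */
  message: string;
}

interface Finding {
  line: number;
  message: string;
}

/** Returns one finding per (line, rule) match, with 1-based line numbers. */
function auditText(text: string, rules: readonly ClaimRule[]): Finding[] {
  const findings: Finding[] = [];
  text.split("\n").forEach((content, index) => {
    for (const rule of rules) {
      if (rule.pattern.test(content)) {
        findings.push({ line: index + 1, message: rule.message });
      }
    }
  });
  return findings;
}

// Illustrative rules only, loosely based on the claims listed in the evidence above.
const illustrativeRules: ClaimRule[] = [
  { pattern: /\bbcrypt\b/i, message: "Password hashing is documented as Argon2id" },
  { pattern: /\bBSD is (fully )?verified\b/i, message: "BSD is documented as best-effort" },
];

for (const finding of auditText("Passwords are hashed with bcrypt.", illustrativeRules)) {
  console.error(`line ${finding.line}: ${finding.message}`);
}
```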

Rubric 45 — Scaffold Quality Carryover

Grade: A+

Measures how much of the framework’s built-in quality, security, accessibility, and sustainability posture automatically transfers to a new project created with astropress new.

What carries over automatically

| Dimension | What you get for free |
|---|---|
| Admin security | Zero-Trust Admin (ZTA) wrappers on every action handler; session hardening; rate-limited login; Argon2id password hashing; KMAC256 token digests; CSRF tokens on all admin forms |
| Input validation | All admin form inputs are validated and HTML-sanitized before persistence; SQL injection surface is contained to the adapter layer |
| Admin accessibility | WCAG 2.2 AA admin panel with keyboard navigation, ARIA live regions, focus traps, and screen-reader-tested components |
| Privacy defaults | No third-party analytics, no telemetry; structured GDPR right-of-erasure SQL included |
| Static-first carbon footprint | Default scaffold targets GitHub Pages (output: "static"), so there is no always-on server compute; CDN edge delivery |
| Image optimization | Automatic srcsets, WebP conversion, and lazy loading when the media library is used |
| Secrets hygiene | .env is gitignored; generated secrets use cryptographic randomness; bootstrap passwords are disabled once set |
| Public-side security headers | src/middleware.ts is generated with createAstropressSecurityMiddleware(), which sets CSP, X-Frame-Options, Permissions-Policy, Referrer-Policy, and X-Request-Id on every response |
| Git hooks | lefthook.yml is generated with biome auto-fix, a .env commit guard, and conventional commit format checks on pre-commit |
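
To make the public-side security headers row concrete, the sketch below applies the same five headers to a standard Response. It is not the implementation of createAstropressSecurityMiddleware(): the helper name withSecurityHeaders and every header value are assumptions chosen for illustration, and the generated middleware may use different policies.

```typescript
// Hypothetical sketch: applies the headers named in the table to a Web-standard Response.
// Not the implementation of createAstropressSecurityMiddleware(); all values are illustrative.

function withSecurityHeaders(response: Response, requestId: string): Response {
  // Copy the existing headers so the original Headers object is not mutated.
  const headers = new Headers(response.headers);
  headers.set("Content-Security-Policy", "default-src 'self'");
  headers.set("X-Frame-Options", "DENY");
  headers.set("Permissions-Policy", "camera=(), microphone=(), geolocation=()");
  headers.set("Referrer-Policy", "strict-origin-when-cross-origin");
  headers.set("X-Request-Id", requestId);

  // Build a new Response that carries over the body and status with the extended headers.
  return new Response(response.body, {
    status: response.status,
    statusText: response.statusText,
    headers,
  });
}

// Example usage with a placeholder request ID; a middleware would generate one per request.
const securedExample = withSecurityHeaders(new Response("ok"), "req-0001");
console.log(securedExample.headers.get("X-Frame-Options"));
```

Keeping the header logic in a pure function like this makes it straightforward to assert on in a unit test, independent of any framework middleware signature.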

What still requires project-specific input

| Dimension | What to do |
|---|---|
| Test setup | Add vitest.config.ts and test files as needed |
| Public-page accessibility | User-authored pages start from a blank <SiteLayout> shell; author with semantic HTML and use axe or Lighthouse in CI |
| registerCms() customisation | src/middleware.ts calls registerCms() with empty defaults; add your siteUrl, templateKeys, and archives |

Rubric 53 — Cross-Platform Support

Grade: A

Measures whether the developer workflow, CLI, release artifacts, and test matrix are genuinely portable across Windows, macOS, Linux, and BSD-family systems.

Evidence

  • tooling/scripts/install.sh supports macOS, Linux, FreeBSD, OpenBSD, and NetBSD
  • tooling/scripts/install.ps1 provides a native PowerShell bootstrap path for Windows
  • .github/workflows/cli-release.yml builds CLI binaries for Linux, macOS, and Windows
  • .github/workflows/ci.yml now runs a platform-smoke matrix on ubuntu-latest, macos-latest, and windows-latest
  • The root package.json test:cli script is shell-agnostic and works without Bash-specific source setup
  • Shell completions cover bash, zsh, fish, and powershell
  • docs/COMPATIBILITY.md publishes support tiers and the verified cross-platform command set
  • The Rust CLI has a --plain / --no-tui fallback, which reduces dependence on terminal-specific raw-mode support
  • BSD is explicitly called out as an upstream and CI gap in docs/UPSTREAM_CONTRIBUTIONS.md

What would improve this

  • Define a documented BSD support tier and verify at least one BSD target in CI or a self-hosted runner
  • Run browser or static-build smoke tests on macOS in addition to Linux

Rubric 54 — Test Artifact Cleanup

Grade: A+

Measures whether automated tests and local verification runs clean up their temporary directories, generated databases, and repo-local artifacts so reruns do not leave avoidable residue behind.

Evidence

  • tooling/scripts/run-with-temp-data.ts isolates example/admin-harness data output under tmpdir() and removes it after each run
  • tooling/scripts/run-playwright.ts now creates and deletes temporary data roots for example and admin-harness E2E sessions
  • Root scripts such as bun run test:example, bun run test:admin-harness, bun run test:accessibility, and bun run test:static-site use the temp-data wrapper
  • .github/workflows/ci.yml runs bun run repo:clean after verification jobs to fail on leftover artifacts
  • The Rust CLI test harness uses TestDir RAII cleanup and an orphan sweep for astropress-cli-* / astropress-* temp directories in crates/astropress-cli/src/tests/mod.rs
  • Many Vitest suites create temp workspaces under tmpdir() and remove them with rm, rmSync, or afterEach cleanup hooks
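
The temp-data pattern described above can be sketched as a wrapper that creates a directory under tmpdir(), hands it to a task, and removes it in a finally block. This is a minimal illustration using Node's standard fs APIs, not the contents of tooling/scripts/run-with-temp-data.ts; the names withTempDataRoot and demo, the directory prefix, and the site.db file are assumptions.

```typescript
import { existsSync } from "node:fs";
import { mkdtemp, rm, writeFile } from "node:fs/promises";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical sketch of a temp-data wrapper; not the project's actual script.
async function withTempDataRoot<T>(
  prefix: string,
  task: (dataRoot: string) => Promise<T>,
): Promise<T> {
  // mkdtemp appends random characters to the prefix, so parallel runs do not collide.
  const dataRoot = await mkdtemp(join(tmpdir(), prefix));
  try {
    return await task(dataRoot);
  } finally {
    // Runs on success and on failure, so a crashing test does not leave residue behind.
    await rm(dataRoot, { recursive: true, force: true });
  }
}

// Example usage: the task sees a fresh directory, and the directory is removed afterwards.
async function demo(): Promise<boolean> {
  let seen = "";
  await withTempDataRoot("astropress-example-", async (dataRoot) => {
    seen = dataRoot;
    await writeFile(join(dataRoot, "site.db"), "");
  });
  return existsSync(seen); // resolves to false because cleanup has already run
}
```

Putting the cleanup in finally rather than in an afterEach hook means the directory is removed even when the task throws, which is the property a clean-worktree check in CI relies on.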