Quality evaluation
Baseline (as of 2026-04-14, updated): 1650+ Vitest tests · 170 Rust CLI tests · 10 Playwright specs across 70 acceptance checks · 359 BDD scenarios · security audit clean
Grades
| # | Rubric | Grade |
|---|---|---|
| 1 | Spec Fidelity | A |
| 2 | Architecture Quality | A+ |
| 3 | Test Quality | A+ |
| 4 | Security Posture | A+ |
| 5 | Accessibility | A+ |
| 6 | Performance | A |
| 7 | Developer Ergonomics | A+ |
| 8 | Browser / Web API Usage | A+ |
| 9 | Web Components | A+ |
| 10 | Spec Coherence (WC First-Class) | A+ |
| 11 | CI/CD Pipeline | A+ |
| 12 | Dependency Management | A |
| 13 | Documentation | A+ |
| 14 | Observability / Logging | A |
| 15 | API Design | A+ |
| 16 | Error Handling | A+ |
| 17 | TypeScript Quality | A+ |
| 18 | AI Drivability | A+ |
| 19 | Internationalization (i18n) | A+ |
| 20 | SEO Tooling | A |
| 21 | AEO Tooling | A+ |
| 22 | First-Party Data | A |
| 23 | Content Modeling Flexibility | A |
| 24 | Schema Migration Safety | A |
| 25 | Caching Strategy | A |
| 26 | Plugin / Extension API | A |
| 27 | Image Optimization | A |
| 28 | Real-Time Collaboration | A |
| 29 | Privacy by Design | A+ |
| 30 | Open Source Health | A+ |
| 31 | Data Portability | A |
| 32 | Upgrade Path / Migration DX | A |
| 33 | Import / Migration Tooling | A |
| 34 | Content Scheduling | A |
| 35 | E2E Hosted Provider Testing | B |
| 36 | CLI UX Quality | A+ |
| 37 | Email Delivery | A+ |
| 38 | Search / Discovery | A |
| 39 | Admin CRUD E2E | A+ |
| 40 | Disaster Recovery | A |
| 41 | Monitoring Integration | A |
| 42 | Upgrade Path E2E | A |
| 43 | System Honesty | A+ |
| 44 | Multi-site Gateway (astropress-nexus) | A+ |
| 45 | Scaffold Quality Carryover | A+ |
| 46 | Mobile-Firstness / Responsive Design | A |
| 47 | Admin Panel UX Quality | A |
| 48 | Nexus UX Quality | A+ |
| 49 | UX Writing & Microcopy | A+ |
| 50 | Information Architecture | A+ |
| 51 | Navigation Design | A+ |
| 52 | Interaction Design & Motion | A |
| 53 | Cross-Platform Support | A |
| 54 | Test Artifact Cleanup | A+ |
| 55 | Minimalism | A |
| 56 | Verified Providers / No Speculative Features | A+ |
Key gaps
- Rubric 35 (B): Live hosted-provider coverage still depends on maintainer-owned accounts, seeded projects, and teardown automation; see `HOSTED_E2E_SETUP.md`
- Rubric 56 (A+): The fictional “Runway” provider was removed 2026-04-14; `audit:providers` now enforces that all provider IDs are verified against `tooling/verified-providers.json`
- Rubric 46–52: UX rubrics added 2026-04-12; A grades are engineering-observed baselines, and no independent user research or usability testing has been conducted
- Rubric 53: Windows, macOS, and Linux now have CI smoke coverage and shell parity, but BSD remains best-effort rather than verified support
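
The provider check described for Rubric 56 can be pictured roughly as follows. This is a minimal sketch, not the project's `audit:providers` script: the `VerifiedProviders` shape, the `findUnverifiedProviders` helper, and the provider IDs are all hypothetical, and the real schema of `tooling/verified-providers.json` may differ.

```typescript
// Hypothetical shape for tooling/verified-providers.json; the real schema may differ.
interface VerifiedProviders {
  providers: string[];
}

// Return every referenced provider ID that is missing from the verified list.
function findUnverifiedProviders(
  referenced: readonly string[],
  verified: VerifiedProviders,
): string[] {
  const allowed = new Set(verified.providers);
  // De-duplicate so each offending ID is reported once, in first-seen order.
  return Array.from(new Set(referenced)).filter((id) => !allowed.has(id));
}

// Placeholder IDs for illustration only; "runway" stands in for the removed fictional provider.
const verifiedList: VerifiedProviders = { providers: ["provider-a", "provider-b"] };
const unverifiedIds = findUnverifiedProviders(
  ["provider-a", "runway", "provider-b", "runway"],
  verifiedList,
);

if (unverifiedIds.length > 0) {
  // A real audit script would likely exit non-zero here so that CI fails.
  console.error(`Unverified provider IDs: ${unverifiedIds.join(", ")}`);
}
```

Keeping the comparison in a pure function makes the rule easy to unit-test separately from whatever code collects provider IDs from the repo.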
Grade changes (2026-04-14 audit)
| Rubric | Old | New | Reason |
|---|---|---|---|
| 37 — Email Delivery | A | A+ | Runtime, CLI, and docs now share one canonical `mock |
| 43 — System Honesty | A | A+ | Public docs, BDD text, and user-facing crypto/readiness claims are checked against a canonical truth source in CI |
| 55 — Minimalism | — | A | First evaluation; no dead exports or speculative abstractions found by arch-lint |
| 56 — Verified Providers | — | A+ | New rubric; audit:providers added to CI; hallucinated Runway provider removed; verified-providers.json is now the source of truth |
| 44 — Multi-site Gateway | A | A+ | astropress-nexus now includes a tested operator dashboard, detail pages, and bulk refresh/redeploy actions |
| 48 — Nexus UX Quality | A | A+ | Nexus now has a real operator UI with search, per-site actions, degraded-state surfacing, and responsive cards |
| 49 — UX Writing & Microcopy | A | A+ | docs/UX_WRITING.md and bun run audit:microcopy now enforce higher-signal user-facing copy |
| 51 — Navigation Design | A | A+ | Command palette, keyboard shortcut help, recent-item nav, breadcrumbs coverage, and mobile nav behavior are now tested |
| 54 — Test Artifact Cleanup | B | A+ | Example/admin-harness verification now runs inside temp data roots and CI fails if the repo is left dirty |
Readiness verdict
| Area | Verdict | Why |
|---|---|---|
| GitHub readiness | Yes | The repo has the expected open-source hygiene, enforced docs/readiness audits, cross-platform smoke lanes, and clean-worktree verification in CI |
| Production readiness | Yes, with caveats | The core stack is production-capable for the verified Node 24 / Linux-macOS-Windows matrix, but hosted-provider live E2E remains incomplete and BSD is not a verified target |
| Cross-platform readiness | Yes for mainstream OSes | Linux, macOS, and Windows have install/release coverage and CI smoke lanes; BSD is documented as best-effort pending native runner verification |
Project handoff
Every new project created with astropress new receives an EVALUATION.md project evaluation card generated from the same 56-rubric framework. That card is intended to be carried forward and updated by downstream Astropress projects, especially when major PRs are substantially AI-generated.
Rubric 43 — System Honesty
Grade: A+
Measures whether the repo’s public claims, CLI output, and failure reporting match the implementation instead of presenting a cleaner story than the code can actually prove.
Evidence
- README, docs, BDD text, and user-facing crypto/readiness wording are audited by `bun run audit:honesty`
- User-facing fallback copy is audited by `bun run audit:microcopy`
- `HOSTED_E2E_SETUP.md` explicitly states that hosted-provider E2E is not `A+` yet and why
- The compatibility matrix now states Linux/macOS/Windows as verified and BSD as best-effort
- Security docs now describe the actual crypto stack: Argon2id password hashing, KMAC256 token/privacy digests, and ML-DSA-65 webhook signatures
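
To make the idea of checking public claims against a canonical truth source more concrete, here is a minimal sketch. It is not the `audit:honesty` implementation: the `HonestyRule` and `Finding` types, the `auditClaims` helper, and the example rule (treating a mention of bcrypt as stale) are assumptions chosen for illustration.

```typescript
// Hypothetical rule: a term that should not appear in public docs,
// plus the wording the canonical truth source expects instead.
interface HonestyRule {
  forbidden: RegExp; // must not use the "g" flag, since test() is stateful with it
  expected: string;
}

interface Finding {
  file: string;
  line: number; // 1-based line number within the file
  message: string;
}

// Scan each document line by line and report every rule violation.
function auditClaims(
  files: Record<string, string>,
  rules: readonly HonestyRule[],
): Finding[] {
  const findings: Finding[] = [];
  for (const file of Object.keys(files)) {
    const lines = (files[file] ?? "").split("\n");
    lines.forEach((content, index) => {
      for (const rule of rules) {
        if (rule.forbidden.test(content)) {
          findings.push({
            file,
            line: index + 1,
            message: `stale claim; the truth source says "${rule.expected}"`,
          });
        }
      }
    });
  }
  return findings;
}

// Illustrative input only: the documents and the rule are made up for this sketch.
const honestyFindings = auditClaims(
  {
    "README.md": "Passwords are hashed.\nWe use bcrypt for passwords.",
    "SECURITY.md": "Argon2id password hashing.",
  },
  [{ forbidden: /\bbcrypt\b/i, expected: "Argon2id password hashing" }],
);
```

A CI job built on this pattern would fail when the findings list is non-empty, which is the behavior the evidence list attributes to the honesty audit.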
Rubric 45 — Scaffold Quality Carryover
Grade: A+
Measures how much of the framework’s built-in quality, security, accessibility, and sustainability posture automatically transfers to a new project created with astropress new.
What carries over automatically
| Dimension | What you get for free |
|---|---|
| Admin security | Zero-Trust Admin (ZTA) wrappers on every action handler; session hardening; rate-limited login; Argon2id password hashing; KMAC256 token digests; CSRF tokens on all admin forms |
| Input validation | All admin form inputs are validated and HTML-sanitized before persistence; SQL injection surface is contained to adapter layer |
| Admin accessibility | WCAG 2.2 AA admin panel with keyboard navigation, ARIA live regions, focus traps, and screen-reader-tested components |
| Privacy defaults | No third-party analytics, no telemetry; structured GDPR right-of-erasure SQL included |
| Static-first carbon footprint | Default scaffold targets GitHub Pages (output: "static") — no always-on server compute; CDN edge delivery |
| Image optimization | Automatic srcsets, WebP conversion, and lazy loading when media library is used |
| Secrets hygiene | .env is gitignored; generated secrets use cryptographic randomness; bootstrap passwords are disabled by a flag once set |
| Public-side security headers | src/middleware.ts is generated with createAstropressSecurityMiddleware() — CSP, X-Frame-Options, Permissions-Policy, Referrer-Policy, and X-Request-Id on every response |
| Git hooks | lefthook.yml is generated — biome auto-fix, .env commit guard, and conventional commit format on pre-commit |
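
As a rough illustration of the public-side security headers row, the sketch below sets the five named headers on any object with a `set` method. It does not reproduce `createAstropressSecurityMiddleware()`; the `HeaderSink` interface, the `applySecurityHeaders` helper, and every header value are assumptions, not the Astropress defaults.

```typescript
import { randomUUID } from "node:crypto";

// Minimal structural type so the sketch works with a Map or a fetch-style Headers object.
interface HeaderSink {
  set(name: string, value: string): unknown;
}

// Illustrative values only; a real policy would be tuned to the site.
const STATIC_SECURITY_HEADERS: ReadonlyArray<readonly [string, string]> = [
  ["Content-Security-Policy", "default-src 'self'"],
  ["X-Frame-Options", "DENY"],
  ["Permissions-Policy", "camera=(), microphone=(), geolocation=()"],
  ["Referrer-Policy", "strict-origin-when-cross-origin"],
];

// Apply the static headers plus a fresh per-response request ID, and return that ID.
function applySecurityHeaders(headers: HeaderSink): string {
  for (const [name, value] of STATIC_SECURITY_HEADERS) {
    headers.set(name, value);
  }
  const requestId = randomUUID();
  headers.set("X-Request-Id", requestId);
  return requestId;
}

// Demo with a plain Map standing in for response headers.
const demoHeaders = new Map<string, string>();
const demoRequestId = applySecurityHeaders(demoHeaders);
```

Returning the request ID lets the caller attach the same value to log lines, so a response can be correlated with its server-side trace.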
What still requires project-specific input
| Dimension | What to do |
|---|---|
| Test setup | Add `vitest.config.ts` and test files as needed |
| Public-page accessibility | User-authored pages start from a blank `<SiteLayout>` shell; author with semantic HTML and use axe or Lighthouse in CI |
| `registerCms()` customisation | `src/middleware.ts` calls `registerCms()` with empty defaults; add your `siteUrl`, `templateKeys`, and `archives` |
Rubric 53 — Cross-Platform Support
Grade: A
Measures whether the developer workflow, CLI, release artifacts, and test matrix are genuinely portable across Windows, macOS, Linux, and BSD-family systems.
Evidence
- `tooling/scripts/install.sh` supports macOS, Linux, FreeBSD, OpenBSD, and NetBSD
- `tooling/scripts/install.ps1` provides a native PowerShell bootstrap path for Windows
- `.github/workflows/cli-release.yml` builds CLI binaries for Linux, macOS, and Windows
- `.github/workflows/ci.yml` now runs a `platform-smoke` matrix on `ubuntu-latest`, `macos-latest`, and `windows-latest`
- The root `package.json` `test:cli` script is shell-agnostic and works without Bash-specific `source` setup
- Shell completions cover `bash`, `zsh`, `fish`, and `powershell`
- `docs/COMPATIBILITY.md` publishes support tiers and the verified cross-platform command set
- The Rust CLI has a `--plain` / `--no-tui` fallback, which reduces dependence on terminal-specific raw-mode support
- BSD is explicitly called out as an upstream and CI gap in `docs/UPSTREAM_CONTRIBUTIONS.md`
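
The support picture in the evidence above can be summarized as a small lookup from Node's `process.platform` value to an installer script and a support tier. This is only a sketch: the `describePlatform` helper, the `PlatformSupport` type, and the `"unsupported"` tier are hypothetical, and `docs/COMPATIBILITY.md` remains the authoritative statement of tiers.

```typescript
type SupportTier = "verified" | "best-effort" | "unsupported";

interface PlatformSupport {
  installer: string | null; // path of the bootstrap script, if one applies
  tier: SupportTier;
}

// Map a Node.js platform identifier to the installer and tier described in this section.
function describePlatform(platform: string): PlatformSupport {
  switch (platform) {
    case "linux":
    case "darwin": // macOS
      return { installer: "tooling/scripts/install.sh", tier: "verified" };
    case "win32":
      return { installer: "tooling/scripts/install.ps1", tier: "verified" };
    case "freebsd":
    case "openbsd":
    case "netbsd":
      // The shell installer covers the BSDs, but they are not verified in CI.
      return { installer: "tooling/scripts/install.sh", tier: "best-effort" };
    default:
      // Assumption of this sketch: anything else is treated as unsupported.
      return { installer: null, tier: "unsupported" };
  }
}

const currentPlatformSupport = describePlatform(process.platform);
console.log(`${process.platform}: ${currentPlatformSupport.tier}`);
```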
What would improve this
- Define a documented BSD support tier and verify at least one BSD target in CI or a self-hosted runner
- Run browser or static-build smoke tests on macOS in addition to Linux
Rubric 54 — Test Artifact Cleanup
Grade: A+
Measures whether automated tests and local verification runs clean up their temporary directories, generated databases, and repo-local artifacts so reruns do not leave avoidable residue behind.
Evidence
- `tooling/scripts/run-with-temp-data.ts` isolates example/admin-harness data output under `tmpdir()` and removes it after each run
- `tooling/scripts/run-playwright.ts` now creates and deletes temporary data roots for example and admin-harness E2E sessions
- Root scripts such as `bun run test:example`, `bun run test:admin-harness`, `bun run test:accessibility`, and `bun run test:static-site` use the temp-data wrapper
- `.github/workflows/ci.yml` runs `bun run repo:clean` after verification jobs to fail on leftover artifacts
- The Rust CLI test harness uses `TestDir` RAII cleanup and an orphan sweep for `astropress-cli-*` / `astropress-*` temp directories in `crates/astropress-cli/src/tests/mod.rs`
- Many Vitest suites create temp workspaces under `tmpdir()` and remove them with `rm`, `rmSync`, or `afterEach` cleanup hooks
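
The temp-data pattern described in this evidence list can be sketched as a wrapper that creates a directory under `tmpdir()`, runs a callback, and always removes the directory afterwards. This is an illustration under stated assumptions, not the contents of `tooling/scripts/run-with-temp-data.ts`: the `withTempDataRoot` helper, the directory prefix, and the `data.db` file name are hypothetical.

```typescript
import { existsSync, mkdtempSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Run a synchronous callback against a fresh temp directory and always clean up afterwards.
function withTempDataRoot<T>(prefix: string, run: (root: string) => T): T {
  // mkdtempSync appends random characters to the prefix, so parallel runs do not collide.
  const root = mkdtempSync(join(tmpdir(), prefix));
  try {
    return run(root);
  } finally {
    // The finally block runs even when the callback throws, so no residue is left behind.
    rmSync(root, { recursive: true, force: true });
  }
}

// Demo: write a placeholder file, then confirm the directory is gone after the run.
let observedRoot = "";
const existedDuringRun = withTempDataRoot("example-data-root-", (root) => {
  observedRoot = root;
  writeFileSync(join(root, "data.db"), "placeholder");
  return existsSync(join(root, "data.db"));
});
console.log(existedDuringRun, existsSync(observedRoot));
```

The sketch handles synchronous callbacks only; an asynchronous variant would need to await the callback inside the `try` block before cleanup runs.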