Quality evaluation

Baseline (as of 2026-04-14, updated): 1650+ Vitest tests · 170 Rust CLI tests · 10 Playwright specs across 70 acceptance checks · 359 BDD scenarios · security audit clean

Grades

#	Rubric	Grade
1	Spec Fidelity	A
2	Architecture Quality	A+
3	Test Quality	A+
4	Security Posture	A+
5	Accessibility	A+
6	Performance	A
7	Developer Ergonomics	A+
8	Browser / Web API Usage	A+
9	Web Components	A+
10	Spec Coherence (WC First-Class)	A+
11	CI/CD Pipeline	A+
12	Dependency Management	A
13	Documentation	A+
14	Observability / Logging	A
15	API Design	A+
16	Error Handling	A+
17	TypeScript Quality	A+
18	AI Drivability	A+
19	Internationalization (i18n)	A+
20	SEO Tooling	A
21	AEO Tooling	A+
22	First-Party Data	A
23	Content Modeling Flexibility	A
24	Schema Migration Safety	A
25	Caching Strategy	A
26	Plugin / Extension API	A
27	Image Optimization	A
28	Real-Time Collaboration	A
29	Privacy by Design	A+
30	Open Source Health	A+
31	Data Portability	A
32	Upgrade Path / Migration DX	A
33	Import / Migration Tooling	A
34	Content Scheduling	A
35	E2E Hosted Provider Testing	B
36	CLI UX Quality	A+
37	Email Delivery	A+
38	Search / Discovery	A
39	Admin CRUD E2E	A+
40	Disaster Recovery	A
41	Monitoring Integration	A
42	Upgrade Path E2E	A
43	System Honesty	A+
44	Multi-site Gateway (astropress-nexus)	A+
45	Scaffold Quality Carryover	A+
46	Mobile-Firstness / Responsive Design	A
47	Admin Panel UX Quality	A
48	Nexus UX Quality	A+
49	UX Writing & Microcopy	A+
50	Information Architecture	A+
51	Navigation Design	A+
52	Interaction Design & Motion	A
53	Cross-Platform Support	A
54	Test Artifact Cleanup	A+
55	Minimalism	A
56	Verified Providers / No Speculative Features	A+

Key gaps

Rubric 35 (B): Live hosted-provider coverage still depends on maintainer-owned accounts, seeded projects, and teardown automation; see HOSTED_E2E_SETUP.md
Rubric 56 (A+): The fictional “Runway” provider was removed 2026-04-14; audit:providers now enforces all provider IDs are verified against tooling/verified-providers.json
Rubric 46–52: UX rubrics added 2026-04-12 — A grades are engineering-observed baselines; no independent user research or usability testing has been conducted
Rubric 53: Windows, macOS, and Linux now have CI smoke coverage and shell parity, but BSD remains best-effort rather than verified support

Grade changes (2026-04-14 audit)

Rubric	Old	New	Reason
37 — Email Delivery	A	A+	Runtime, CLI, and docs now share one canonical `mock
43 — System Honesty	A	A+	Public docs, BDD text, and user-facing crypto/readiness claims are checked against a canonical truth source in CI
55 — Minimalism	—	A	First evaluation; no dead exports or speculative abstractions found by arch-lint
56 — Verified Providers	—	A+	New rubric; `audit:providers` added to CI; hallucinated Runway provider removed; `verified-providers.json` is now the source of truth
44 — Multi-site Gateway	A	A+	`astropress-nexus` now includes a tested operator dashboard, detail pages, and bulk refresh/redeploy actions
48 — Nexus UX Quality	A	A+	Nexus now has a real operator UI with search, per-site actions, degraded-state surfacing, and responsive cards
49 — UX Writing & Microcopy	A	A+	`docs/UX_WRITING.md` and `bun run audit:microcopy` now enforce higher-signal user-facing copy
51 — Navigation Design	A	A+	Command palette, keyboard shortcut help, recent-item nav, breadcrumbs coverage, and mobile nav behavior are now tested
54 — Test Artifact Cleanup	B	A+	Example/admin-harness verification now runs inside temp data roots and CI fails if the repo is left dirty

Readiness verdict

Area	Verdict	Why
GitHub readiness	Yes	The repo has the expected open-source hygiene, enforced docs/readiness audits, cross-platform smoke lanes, and clean-worktree verification in CI
Production readiness	Yes, with caveats	The core stack is production-capable for the verified Node 24 / Linux-macOS-Windows matrix, but hosted-provider live E2E remains incomplete and BSD is not a verified target
Cross-platform readiness	Yes for mainstream OSes	Linux, macOS, and Windows have install/release coverage and CI smoke lanes; BSD is documented as best-effort pending native runner verification

Project handoff

Every new project created with astropress new receives an EVALUATION.md project evaluation card generated from the same 56-rubric framework. That card is intended to be carried forward and updated by downstream Astropress projects, especially when major PRs are substantially AI-generated.

Rubric 43 — System Honesty

Grade: A+

Measures whether the repo’s public claims, CLI output, and failure reporting match the implementation instead of presenting a cleaner story than the code can actually prove.

Evidence

README, docs, BDD text, and user-facing crypto/readiness wording are audited by bun run audit:honesty
User-facing fallback copy is audited by bun run audit:microcopy
HOSTED_E2E_SETUP.md explicitly states that hosted-provider E2E is not A+ yet and why
The compatibility matrix now states Linux/macOS/Windows as verified and BSD as best-effort
Security docs now describe the actual crypto stack: Argon2id password hashing, KMAC256 token/privacy digests, and ML-DSA-65 webhook signatures

Rubric 45 — Scaffold Quality Carryover

Grade: A+

Measures how much of the framework’s built-in quality, security, accessibility, and sustainability posture automatically transfers to a new project created with astropress new.

What carries over automatically

Dimension	What you get for free
Admin security	Zero-Trust Admin (ZTA) wrappers on every action handler; session hardening; rate-limited login; Argon2id password hashing; KMAC256 token digests; CSRF tokens on all admin forms
Input validation	All admin form inputs are validated and HTML-sanitized before persistence; SQL injection surface is contained to adapter layer
Admin accessibility	WCAG 2.2 AA admin panel with keyboard navigation, ARIA live regions, focus traps, and screen-reader-tested components
Privacy defaults	No third-party analytics, no telemetry; structured GDPR right-of-erasure SQL included
Static-first carbon footprint	Default scaffold targets GitHub Pages (`output: "static"`) — no always-on server compute; CDN edge delivery
Image optimization	Automatic srcsets, WebP conversion, and lazy loading when media library is used
Secrets hygiene	`.env` is gitignored; generated secrets use cryptographic randomness; bootstrap passwords are disabled flag once set
Public-side security headers	`src/middleware.ts` is generated with `createAstropressSecurityMiddleware()` — CSP, X-Frame-Options, Permissions-Policy, Referrer-Policy, and X-Request-Id on every response
Git hooks	`lefthook.yml` is generated — biome auto-fix, `.env` commit guard, and conventional commit format on pre-commit

What still requires project-specific input

Dimension	What to do
Test setup	Add `vitest.config.ts` and test files as needed
Public-page accessibility	User-authored pages start from a blank `<SiteLayout>` shell — author with semantic HTML; use axe or Lighthouse in CI
`registerCms()` customisation	`src/middleware.ts` calls `registerCms()` with empty defaults — add your `siteUrl`, `templateKeys`, and `archives`

Rubric 53 — Cross-Platform Support

Grade: A

Measures whether the developer workflow, CLI, release artifacts, and test matrix are genuinely portable across Windows, macOS, Linux, and BSD-family systems.

Evidence

tooling/scripts/install.sh supports macOS, Linux, FreeBSD, OpenBSD, and NetBSD
tooling/scripts/install.ps1 provides a native PowerShell bootstrap path for Windows
.github/workflows/cli-release.yml builds CLI binaries for Linux, macOS, and Windows
.github/workflows/ci.yml now runs a platform-smoke matrix on ubuntu-latest, macos-latest, and windows-latest
The root package.json test:cli script is shell-agnostic and works without Bash-specific source setup
Shell completions cover bash, zsh, fish, and powershell
docs/COMPATIBILITY.md publishes support tiers and the verified cross-platform command set
The Rust CLI has a --plain / --no-tui fallback, which reduces dependence on terminal-specific raw-mode support
BSD is explicitly called out as an upstream and CI gap in docs/UPSTREAM_CONTRIBUTIONS.md

What would improve this

Define a documented BSD support tier and verify at least one BSD target in CI or a self-hosted runner
Run browser or static-build smoke tests on macOS in addition to Linux

Rubric 54 — Test Artifact Cleanup

Grade: A+

Measures whether automated tests and local verification runs clean up their temporary directories, generated databases, and repo-local artifacts so reruns do not leave avoidable residue behind.

Evidence

tooling/scripts/run-with-temp-data.ts isolates example/admin-harness data output under tmpdir() and removes it after each run
tooling/scripts/run-playwright.ts now creates and deletes temporary data roots for example and admin-harness E2E sessions
Root scripts such as bun run test:example, bun run test:admin-harness, bun run test:accessibility, and bun run test:static-site use the temp-data wrapper
.github/workflows/ci.yml runs bun run repo:clean after verification jobs to fail on leftover artifacts
The Rust CLI test harness uses TestDir RAII cleanup and an orphan sweep for astropress-cli-* / astropress-* temp directories in crates/astropress-cli/src/tests/mod.rs
Many Vitest suites create temp workspaces under tmpdir() and remove them with rm, rmSync, or afterEach cleanup hooks