What’s in it
Every source maps its items onto one source-neutral spine, so a Reddit post, a Hacker News story, and a web page all land in the same shape:
- CorpusRecord: a top-level item with the fields id, source, source_id, url, title, text, author_hash, engagement, created_at, and an open extra map for source-specific fields (subreddit, domain, rating, …).
- CorpusComment: a quote-bearing sub-item of a record (a reply, a thread comment), same spine plus a parent_id.
- Embeddings: one vector per record/comment id, used for triage and dedup, reused across runs.
Each quote carries the record_id it came from, so the chain runs from a line on
your landing page all the way back to the real comment.
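As a rough illustration, the spine described above could be modeled with two Python dataclasses. This is a minimal sketch, not the library's actual definitions: the field names come from the list above, while the types, the defaults, and the choice to model CorpusComment as a subclass are assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass
class CorpusRecord:
    """A top-level item. Field names follow the text; types are assumed."""
    id: str
    source: str          # illustrative values: "reddit", "hackernews", "web"
    source_id: str
    url: str
    title: str
    text: str
    author_hash: str     # salted hash, never the raw author name
    engagement: int      # assumed to be a single integer score
    created_at: datetime
    extra: dict[str, Any] = field(default_factory=dict)  # source-specific fields


@dataclass
class CorpusComment(CorpusRecord):
    """A quote-bearing sub-item: the same spine plus a parent_id."""
    parent_id: Optional[str] = None


# Hypothetical example values, for illustration only.
post = CorpusRecord(
    id="rec-1",
    source="reddit",
    source_id="abc123",
    url="https://example.com/post",
    title="Example post",
    text="Body of the post.",
    author_hash="9f2c",
    engagement=42,
    created_at=datetime(2024, 1, 1, tzinfo=timezone.utc),
    extra={"subreddit": "example"},
)
reply = CorpusComment(
    id="com-1",
    source="reddit",
    source_id="def456",
    url="https://example.com/post#def456",
    title="",
    text="A reply worth quoting.",
    author_hash="71ab",
    engagement=3,
    created_at=datetime(2024, 1, 2, tzinfo=timezone.utc),
    parent_id=post.id,
)
```

Keeping source-specific fields in the open extra map is what lets a Reddit post and a web page share one shape without the spine growing a column per source.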
Where it lives
In a project the corpus is .metalworks/corpus.db, a single
SQLite file that is the project’s whole memory. It is durable but gitignored:
authoritative and meant to survive across runs, but never committed, because it
holds verbatim user text, salted author hashes, and embedding vectors. Outside a
project, metalworks keeps an in-memory corpus that leaves no footprint.
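The difference between the two modes can be sketched with Python's standard sqlite3 module. This is an illustration of the idea only, not how metalworks is implemented: open_corpus, count_after_reopen, and the records table are hypothetical names, and only the .metalworks/corpus.db path comes from the text.

```python
import sqlite3
import tempfile
from pathlib import Path
from typing import Optional


def open_corpus(project_root: Optional[Path] = None) -> sqlite3.Connection:
    """Hypothetical helper: file-backed inside a project, in-memory outside one."""
    if project_root is None:
        # No project: the database lives only as long as this connection.
        return sqlite3.connect(":memory:")
    db_dir = project_root / ".metalworks"
    db_dir.mkdir(parents=True, exist_ok=True)
    return sqlite3.connect(str(db_dir / "corpus.db"))


def count_after_reopen(project_root: Optional[Path]) -> int:
    """Write one row, close, reopen, and report how many rows are still there."""
    schema = "CREATE TABLE IF NOT EXISTS records (id TEXT PRIMARY KEY, text TEXT)"
    conn = open_corpus(project_root)
    conn.execute(schema)
    conn.execute("INSERT OR IGNORE INTO records VALUES (?, ?)", ("rec-1", "hello"))
    conn.commit()
    conn.close()

    conn = open_corpus(project_root)
    try:
        conn.execute(schema)  # an in-memory reopen starts from an empty database
        return conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]
    finally:
        conn.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print("in a project:", count_after_reopen(Path(tmp)))
    print("outside a project:", count_after_reopen(None))
```

In the file-backed case the row survives the reopen; in the in-memory case it does not, which is the "leaves no footprint" behavior described above.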
Growing the corpus
A mw.research(...) run ingests its sources automatically, so you never have to
seed the corpus first. But you can also grow it directly, which is how the signal
compounds across sources and over time.
Live, versioned reports
Because a report is a view, you can refresh it against the now-larger corpus. Each refresh pins a new version in the same lineage and shows you a diff of what moved.
What the diff tells you
ReportDiff has two layers:
- Deterministic (ground truth) — thread, distinct-author, and cluster counts, and the source distribution, read straight off the two reports.
- Advisory (claim-matched) — themes added, faded, or shifted, matched across versions by claim-embedding nearest-neighbor. Synthesis is non-deterministic, so a theme’s wording can drift between runs; the counts are exact, the wording diff is a hint. Diffing a report against an identical re-synthesis yields an empty diff — the refresh determinism guarantee.
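To make the two layers concrete, here is a minimal sketch of how such a diff could be computed. It is not the ReportDiff implementation: Theme, diff_counts, diff_themes, the greedy matching, and the 0.8 similarity threshold are all assumptions chosen for illustration, and the "shifted" category is left out for brevity.

```python
import math
from dataclasses import dataclass


@dataclass
class Theme:
    claim: str
    embedding: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def diff_counts(old: dict[str, int], new: dict[str, int]) -> dict[str, int]:
    """Deterministic layer: exact deltas per counter, unchanged counters omitted."""
    keys = sorted(set(old) | set(new))
    return {k: new.get(k, 0) - old.get(k, 0)
            for k in keys if new.get(k, 0) != old.get(k, 0)}


def diff_themes(old: list[Theme], new: list[Theme],
                threshold: float = 0.8) -> dict[str, list[str]]:
    """Advisory layer: greedily match each new theme to its nearest unmatched
    old theme by cosine similarity; anything below the threshold is unmatched."""
    matched_old: set[int] = set()
    added: list[str] = []
    for theme in new:
        best_i, best_sim = None, threshold
        for i, candidate in enumerate(old):
            sim = cosine(theme.embedding, candidate.embedding)
            if i not in matched_old and sim >= best_sim:
                best_i, best_sim = i, sim
        if best_i is None:
            added.append(theme.claim)   # no close neighbor in the old report
        else:
            matched_old.add(best_i)     # same theme, possibly reworded
    faded = [t.claim for i, t in enumerate(old) if i not in matched_old]
    return {"added": added, "faded": faded}


# Toy two-dimensional embeddings, for illustration only.
old_themes = [Theme("Setup is slow", [1.0, 0.0]),
              Theme("Pricing is unclear", [0.0, 1.0])]
new_themes = [Theme("Setup takes too long", [0.9, 0.1]),
              Theme("Docs are missing", [-1.0, 0.0])]

count_delta = diff_counts({"threads": 40, "authors": 31},
                          {"threads": 52, "authors": 31, "clusters": 6})
theme_delta = diff_themes(old_themes, new_themes)
```

In this toy run the reworded setup theme is matched rather than reported as new, which is why the wording layer is only a hint, while the count deltas are exact. Diffing a report against itself yields nothing in either layer.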
Next
- Sources: the connectors that feed the corpus.
- Projects: the .metalworks/ directory the corpus lives in.
- Bring your own corpus: load records directly.
- Data model: the full shape of records, citations, and the report.