Contract-first
metalworks.contract is a set of Pydantic models — DemandReport, Landscape, Assessment,
ResearchBrief, and the rest. It is the stable spine: the one thing that doesn’t change casually
below 1.0, and the shape every other layer speaks. The TypeScript twin (ts/contract.ts) and the
JSON-schema snapshots (src/metalworks/contract/schema/) are generated from these models by
scripts/gen_ts_types.py — one source of truth, no hand-maintained copies. A contract change is a
deliberate act: regenerate, commit the generated files, and stay additive (new fields default so an
old payload still validates).
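As a rough illustration of what "additive" means here, the sketch below adds a defaulted field to a model and validates a payload written before the change. The field names (`topic`, `cluster_count`, `source_window_days`) and the V1/V2 class names are hypothetical; the real DemandReport is not shown in this section.

```python
from typing import Optional

from pydantic import BaseModel


class DemandReportV1(BaseModel):
    # Hypothetical fields, for illustration only.
    topic: str
    cluster_count: int


class DemandReportV2(BaseModel):
    topic: str
    cluster_count: int
    # New field added with a default, so payloads written before the
    # change still validate.
    source_window_days: Optional[int] = None


old_payload = {"topic": "invoice tooling", "cluster_count": 3}

# The old payload validates against the newer model and picks up the default.
report = DemandReportV2(**old_payload)
```

Adding a required field, or renaming an existing one, would reject `old_payload`, which is the kind of change the additive rule is meant to rule out.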
One capability, four surfaces
Every capability is exposed the same way on all four surfaces, so the Python SDK user, an agent talking over MCP, and someone in Claude Code all get the same thing: … _TOOL_WRAPPERS). CONTRIBUTING.md
has the file-by-file checklist; /pr-ready verifies it.
Deterministic core, LLM prose
The decisions are pure, testable functions; the model never makes the call. The assess()
verdict (GO / PIVOT / NO-GO) is a deterministic gap over demand strength and landscape saturation;
gap severity is service-assigned from distinct-author breadth; demand bands self-calibrate to the
run. The LLM writes only the human-facing rationale — the prose explaining a decision that was
already computed. This is what makes the output defensible and CI-testable: the same inputs always
produce the same verdict, and a unit test can pin the whole decision matrix without a model in the
loop.
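A minimal sketch of the idea, assuming demand strength and saturation are both scores between 0 and 1. The function name `decide` and the thresholds are illustrative assumptions, not the values assess() actually uses.

```python
from enum import Enum


class Verdict(str, Enum):
    GO = "GO"
    PIVOT = "PIVOT"
    NO_GO = "NO-GO"


def decide(demand: float, saturation: float) -> Verdict:
    """Map demand strength and landscape saturation (both 0..1) to a verdict.

    The thresholds below are assumptions chosen for the example.
    """
    gap = demand - saturation
    if gap >= 0.3:
        return Verdict.GO
    if gap >= 0.0:
        return Verdict.PIVOT
    return Verdict.NO_GO


# A unit test can pin the whole matrix with no model in the loop.
DECISION_MATRIX = [
    (0.9, 0.2, Verdict.GO),
    (0.6, 0.5, Verdict.PIVOT),
    (0.3, 0.8, Verdict.NO_GO),
]

for demand, saturation, expected in DECISION_MATRIX:
    assert decide(demand, saturation) is expected
```

Because `decide` has no hidden state and no model call, the table above doubles as both documentation and regression test.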
No-cite-no-claim
Every claim resolves to a real quote. A cluster carries verbatim ResolvedCitations; a competitor
gap or a marketing-site line carries an EvidenceRef that resolves against report.evidence. If a
claim can’t be backed by a real comment (or a grounded web finding), it is dropped, not shipped —
no hallucinated competitors, no invented quotes, no confident-looking output built on nothing. This
is the trust property the whole product rests on.
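The drop rule can be sketched as a filter over claims. The shapes below are simplified stand-ins: `Claim`, the `evidence_id` field, and the `keep_cited` helper are hypothetical, and `evidence` is modeled as a plain dict rather than the real report.evidence.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class EvidenceRef:
    evidence_id: str  # hypothetical field name


@dataclass(frozen=True)
class Claim:
    text: str
    ref: EvidenceRef


def keep_cited(claims: List[Claim], evidence: Dict[str, str]) -> List[Claim]:
    """Return only the claims whose ref resolves to a non-empty quote."""
    return [c for c in claims if evidence.get(c.ref.evidence_id, "").strip()]


evidence = {"c1": "I export invoices to CSV by hand every single week."}
claims = [
    Claim("Users export invoices manually", EvidenceRef("c1")),
    Claim("Competitor X has no API", EvidenceRef("missing")),
]

# The second claim has no backing quote, so it is dropped, not shipped.
kept = keep_cited(claims, evidence)
```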
Lean core, lazy extras
The core depends only on pydantic, httpx, typer, and rich. Everything heavier (the provider
SDKs, duckdb, supabase, mcp) lives behind an extra and is lazy-imported
inside the function that needs it, never at module top level. So import metalworks is free and
pulls in zero provider modules (CI asserts this on a bare install), and a missing extra raises a
MissingExtraError carrying the exact pip install command rather than a raw ModuleNotFoundError.
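One way the lazy-import pattern might look is sketched below. The `require` helper, the `open_warehouse` function, the extra name, and the exact wording of the pip command are assumptions made for the example; only the MissingExtraError name comes from the text above.

```python
import importlib
from types import ModuleType


class MissingExtraError(ImportError):
    """Raised when an optional dependency is not installed."""

    def __init__(self, module: str, extra: str) -> None:
        # Assumed command format; the real message may differ.
        self.install_command = f'pip install "metalworks[{extra}]"'
        super().__init__(
            f"{module!r} is required here; install it with: {self.install_command}"
        )


def require(module: str, extra: str) -> ModuleType:
    """Import `module` on demand, translating a miss into MissingExtraError."""
    try:
        return importlib.import_module(module)
    except ModuleNotFoundError as exc:
        raise MissingExtraError(module, extra) from exc


def open_warehouse(path: str):
    # The heavy import happens inside the function that needs it,
    # so importing this module stays cheap.
    duckdb = require("duckdb", extra="duckdb")
    return duckdb.connect(path)
```

Nothing here touches duckdb until `open_warehouse` is called, which is what keeps a bare install importable.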
Swappable protocols
Underneath the facade, each external dependency is a small versioned protocol with thin adapters: ChatModel / GroundedChatModel, SearchProvider, EmbeddingProvider, and the typed storage repos.
Bring your own and the rest of the pipeline doesn’t care — see Extending and
Protocols. Conformance suites (metalworks.testing.check_all_repos,
FakeChatModel) hold your adapter to the same behavior the built-ins demonstrate.
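To illustrate the "bring your own" idea, here is a sketch of a structural protocol and a toy adapter that satisfies it. The `embed` signature is an assumption (the protocols reference has the real one), and `LengthEmbedding` and `nearest` are hypothetical names used only for this example.

```python
from typing import List, Protocol, runtime_checkable


@runtime_checkable
class EmbeddingProvider(Protocol):
    # Assumed signature, for illustration only.
    def embed(self, texts: List[str]) -> List[List[float]]: ...


class LengthEmbedding:
    """A toy adapter: no inheritance, it just matches the protocol's shape."""

    def embed(self, texts: List[str]) -> List[List[float]]:
        return [[float(len(t)), float(t.count(" "))] for t in texts]


def nearest(provider: EmbeddingProvider, query: str, docs: List[str]) -> str:
    """Return the doc whose vector is closest to the query's (squared L2)."""
    query_vec, *doc_vecs = provider.embed([query, *docs])

    def distance(vec: List[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(query_vec, vec))

    return min(zip(docs, doc_vecs), key=lambda pair: distance(pair[1]))[0]


best = nearest(LengthEmbedding(), "slow export", ["export is slow", "ok"])
```

`nearest` only depends on the protocol, so swapping `LengthEmbedding` for a real provider would not change the calling code.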
Offline by default
The test suite runs with no network: pytest-socket blocks sockets, and tests needing the real
network are marked network and deselected by default. Synthesis is exercised for real against
fixtures using FakeChatModel (scripted per output model — it raises on an unscripted call so drift
is caught, never silently nulled), FakeEmbedding, and MemoryStores. A test that needs a live
service is the exception, not the rule.
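The "scripted per output model, raises on an unscripted call" behavior could be sketched roughly as follows. This is a stand-in written for illustration, not the library's FakeChatModel; the class name `ScriptedChatModel`, the `complete` method, and the `Rationale` and `Summary` types are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, Type


@dataclass
class Rationale:
    text: str


@dataclass
class Summary:
    text: str


class ScriptedChatModel:
    """Stand-in fake: responses are scripted per output type."""

    def __init__(self, script: Dict[Type[Any], Any]) -> None:
        self._script = script

    def complete(self, prompt: str, output_model: Type[Any]) -> Any:
        if output_model not in self._script:
            # Fail loudly rather than return None, so drift is caught.
            raise LookupError(f"no scripted response for {output_model.__name__}")
        return self._script[output_model]


fake = ScriptedChatModel({Rationale: Rationale("Demand outpaces supply.")})
```

If synthesis code starts requesting a `Summary` that no test scripted, the call raises instead of quietly producing an empty result.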
Next: the protocols reference for the exact seams, or CONTRIBUTING.md for the file-by-file workflow and the pre-PR gate.