Why open-core
Two reasons. First, distribution: the easiest way to get a tool in front of the developers and agent-builders who would use it is to make the core free and embeddable. Second, it is a forcing function for clean architecture: extracting the pipeline behind versioned protocols, with no Supabase or auth singletons, makes it better software than it was as an internal monolith.What is in metalworks
The pieces that are generally useful and carry no tenant-specific logic:- The data contract and the LLM / search / embedding / storage protocols.
- The research vertical: brief to corpus to triage to clustered demand report.
- The Reddit core: OAuth, search, metrics, inbox, subreddit intel, rate limiting, posting, and the deterministic compliance gate.
- The discovery loop: filter to generate to gate.
What stays in Clique
Deliberately not open-sourced:- Account-backstory generation. The source had tooling that fabricated
Reddit account histories to make accounts look aged and credible. That is
coordinated inauthentic behavior and a Reddit ToS violation. It was excluded
on purpose. The persona model here carries a
backgroundfield that must be authentic; there is no path that invents one. SeeUSAGE_POLICY.md. - The memory system. Clique’s episodic / structured / procedural memory is
product surface. metalworks exposes a plain
DiscoveryContextseam (voice guidelines, winning examples, pinned notes, personas) that a memory system renders into; the memory itself stays in Clique. - Multi-tenant auth, billing, plan caps, and spend caps. Tenant fields
default to
"local"so a library user never thinks about tenants.
Structural provenance
The research pipeline is built so that claims cannot be fabricated by construction, not by trust:- Cluster quotes are exact-matched against stored Reddit comments. A quote that
does not match a real comment is dropped (
no-quote-no-theme). - Web findings carry their
source_urlfrom the grounding tool’s citation metadata, never from model prose. Zero citations means the finding is dropped; URLs are never synthesized. - Counts (distinct authors, mention counts, must-address resolution) are computed from membership, never asserted by a model.
Internal vs external grounding
Web research follows the Inspect-AI split. When the chat model supports native grounding (Gemini’sgoogle_search, Anthropic’s web_search), the adapter
returns a GroundedResult with character-offset supports and the pipeline maps
findings to citations by span overlap. When it does not, the pipeline falls back
to an external SearchProvider (Exa, Tavily) plus a structured synthesis call
that cites results by index. Either way, the provenance contract above holds.
API stability
Below 1.0, the stable surface ismetalworks.contract and the MCP tool
contracts. Everything else may change in any 0.x release. This lets the four
form factors ship before the internal APIs are frozen.