Every source is an ItemSource connector that pulls source-neutral records into
one shared corpus, and synthesis runs over the whole corpus, so a report
can draw on Reddit threads, Hacker News discussions, and web pages at once.
Shipped connectors
| Source id | What it reads | Key needed |
|---|---|---|
| reddit | Live public Reddit submissions + comments | none |
| arctic | Reddit historical archive (Arctic Shift); see Load a Reddit corpus | none |
| hackernews | Hacker News stories + comments (Algolia API) | none |
| web | Web pages from a search provider (Exa / Tavily / parallel.ai / Firecrawl) | a search key |
Choosing your sources
Three ways, in increasing permanence:[sources] (and falls back to
Reddit). Whatever you pick, a single mw.research(...) call still works end to
end on an empty corpus: it ingests the chosen sources on demand, then
synthesizes. The corpus it builds is durable and reused next time; see
The corpus.
Flat priority and breadth
Adding web (or any source) promotes it to a peer; it does not weight it above the others. Every source ingests into one corpus and synthesis is source-agnostic. The one per-source difference is how a cluster’s breadth is measured, so sources stay comparable instead of one drowning out another:
- Authored sources (Reddit, Hacker News): breadth = distinct authors.
- Authorless web: breadth = distinct domains.
- Mixed clusters: the two are summed (breadth_unit becomes "voices").
Breadth feeds demand_score, so fifty
quiet voices outrank one viral post, and an authorless web hit never scores zero
just for lacking an author. A cluster carries both distinct_author_count
(authored voices only) and breadth_count / breadth_unit for the honest,
source-neutral count.
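The breadth rules above can be sketched in a few lines. This is a minimal illustration, not the metalworks implementation: the Record class and the cluster_breadth helper are hypothetical, and the unit names "authors" and "domains" are assumptions (only "voices" appears in the text above).

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse


@dataclass
class Record:
    """Hypothetical stand-in for a source-neutral record (not the metalworks type)."""
    source: str
    url: str
    author: Optional[str] = None  # None for authorless sources such as web pages


def cluster_breadth(records):
    """Return (distinct_author_count, breadth_count, breadth_unit) for one cluster."""
    # Authored records contribute distinct authors.
    authors = {r.author for r in records if r.author}
    # Authorless records contribute distinct domains instead.
    domains = {urlparse(r.url).netloc for r in records if not r.author}
    if authors and domains:
        unit = "voices"   # mixed cluster: the two counts are summed
    elif domains:
        unit = "domains"  # assumed unit name
    else:
        unit = "authors"  # assumed unit name
    return len(authors), len(authors) + len(domains), unit


cluster = [
    Record("reddit", "https://reddit.com/r/x/1", "alice"),
    Record("hackernews", "https://news.ycombinator.com/item?id=1", "bob"),
    Record("reddit", "https://reddit.com/r/x/2", "alice"),
    Record("web", "https://example.com/a"),
    Record("web", "https://example.org/c"),
]
print(cluster_breadth(cluster))  # (2, 4, 'voices')
```

Repeat posts by the same author, or several pages from the same domain, do not raise the count, which is what keeps one loud voice from dominating a cluster.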
Bring your own source
A connector is small. Copy research/sources/template.py and implement the
ItemSource protocol. You can then use it through --source mysource,
get_source("mysource"), or Metalworks(sources=[MySource()]).
No comment layer? Some sources (web pages, link-only feeds) have no comment
thread — the record’s own text is the signal. Return None from comments_for
and set the class attribute yields_units = True. metalworks then treats
each record as its own synthesis unit and ranks it on domain breadth. This is an
explicit opt-in: a comment-bearing source whose comment client merely isn’t wired
also returns None, but is not a unit source.
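The opt-in can be pictured with two toy classes. This sketch does not import metalworks and does not reproduce the real ItemSource protocol: source_id, items, the comments_for signature, and the is_unit_source helper are all hypothetical names chosen for illustration. Only comments_for returning None and the yields_units class attribute come from the text above.

```python
class LinkFeedSource:
    """Toy authorless source: each page is its own synthesis unit."""

    source_id = "linkfeed"  # hypothetical attribute name
    yields_units = True     # explicit opt-in described above

    def __init__(self, pages):
        self._pages = list(pages)

    def items(self, query):  # hypothetical method name
        q = query.lower()
        return [p for p in self._pages if q in p["text"].lower()]

    def comments_for(self, item):
        # No comment layer exists for this source.
        return None


class UnwiredForumSource:
    """Toy comment-bearing source whose comment client is not wired yet."""

    source_id = "forum"
    yields_units = False

    def items(self, query):
        return []

    def comments_for(self, item):
        # Also None, but only because the client is missing.
        return None


def is_unit_source(source):
    """Hypothetical check: the flag decides, not the None return value."""
    return bool(getattr(source, "yields_units", False))


feed = LinkFeedSource([{"url": "https://example.com/a", "text": "Battery life complaints"}])
forum = UnwiredForumSource()
print(is_unit_source(feed), is_unit_source(forum))  # True False
```

Both classes return None from comments_for, so the return value alone cannot tell them apart; the class attribute is what marks the first one as a unit source.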
A conformance check (metalworks.testing.check_item_source) verifies your
connector satisfies the protocol — wire it into your tests.
Next
- The corpus — the durable store your sources feed, and live, refreshable reports over it.
- Demand research — run a report over your sources.
- Bring your own corpus — load data directly without a connector.