Model configuration - metalworks

metalworks talks to LLMs through the ChatModel protocol. You rarely construct adapters by hand — you name a model and metalworks resolves it.

Model refs

A model ref takes the form provider/model-id or provider:model-id (the slash form matches the convention used by OpenRouter, LiteLLM, and many agent runtimes):

from metalworks import Metalworks

Metalworks(model="anthropic/claude-opus-4-6")   # slash form, native Anthropic SDK
Metalworks(model="openai:gpt-5")                # colon form, native OpenAI SDK
Metalworks(model="google/gemini-3-pro")         # slash form, native Google SDK

Ref	Routes to	Needs
`anthropic/<id>`	native Anthropic SDK	`ANTHROPIC_API_KEY`
`openai/<id>`	native OpenAI SDK	`OPENAI_API_KEY`
`google/<id>` (or `gemini/<id>`)	native Google SDK	`GOOGLE_API_KEY` / `GEMINI_API_KEY`, or Vertex AI (below)
`openrouter/<vendor/model>`	OpenRouter	`OPENROUTER_API_KEY`
`openai-compatible/<id>`	your `OPENAI_BASE_URL` endpoint	`OPENAI_API_KEY` + `OPENAI_BASE_URL`
`meta-llama/llama-3-70b` (any unknown vendor)	OpenRouter (the whole ref is the id)	`OPENROUTER_API_KEY`

A slash ref whose prefix is a known provider, such as anthropic/claude-opus, always routes to that provider’s native SDK; it never silently falls through to OpenRouter.
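The routing rules in the table can be sketched as a small parser. This is a minimal illustration, assuming the table lists every special prefix and that a ref is split at its first : or /. Route, split_ref, and route_ref are hypothetical names, not part of metalworks.

```python
# Hypothetical sketch of the routing table above; not metalworks' actual resolver.
from typing import NamedTuple

# Prefixes with a native SDK route ("gemini" is an alias for Google).
NATIVE_PREFIXES = {"anthropic": "anthropic", "openai": "openai",
                   "google": "google", "gemini": "google"}
# Prefixes with a dedicated non-native route.
ROUTED_PREFIXES = {"openrouter", "openai-compatible"}


class Route(NamedTuple):
    backend: str   # adapter family that would serve the ref
    model_id: str  # id handed to that backend


def split_ref(ref: str) -> tuple[str, str]:
    """Split at the first ':' or '/', so provider:id and provider/id both work."""
    cuts = [i for i in (ref.find(":"), ref.find("/")) if i > 0]
    if not cuts:
        raise ValueError(f"not a provider-qualified model ref: {ref!r}")
    cut = min(cuts)
    return ref[:cut], ref[cut + 1:]


def route_ref(ref: str) -> Route:
    prefix, rest = split_ref(ref)
    if prefix in NATIVE_PREFIXES:
        # A known provider always goes to its native SDK, never to OpenRouter.
        return Route(NATIVE_PREFIXES[prefix], rest)
    if prefix in ROUTED_PREFIXES:
        return Route(prefix, rest)
    # Unknown vendor: OpenRouter, with the whole ref as the model id.
    return Route("openrouter", ref)
```

Note that openrouter/meta-llama/llama-3-70b and the bare meta-llama/llama-3-70b end up at the same OpenRouter model id; only the first form states the route explicitly.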

No ref? Inferred from your keys

With no model ref, metalworks takes the provider from the first key it finds, checking in this order: Anthropic, OpenAI, Google. So Metalworks() with only OPENAI_API_KEY set uses OpenAI. If none of those keys is set, a lone OPENROUTER_API_KEY is the recognized single-key fallback: Metalworks() then talks to OpenRouter’s OpenAI-compatible endpoint, so one key reaches many models. A native key always takes priority over it.

You can also pin a default in ~/.config/metalworks/metalworks.toml:

provider = "anthropic"
model = "claude-opus-4-6"

Precedence: explicit model= ref > config file > first present key.
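The precedence chain can be sketched as a plain function over an environment mapping. This is a minimal illustration, not metalworks code: resolve_provider and its parameters are hypothetical, and it assumes that an empty variable counts as unset, that GEMINI_API_KEY counts as a Google key, and that Vertex mode (GOOGLE_GENAI_USE_VERTEXAI=true) fills the Google slot in the key order.

```python
# Hypothetical sketch of provider inference; names are illustrative, not metalworks API.
from typing import Mapping, Optional

# Native keys in the documented priority order.
NATIVE_KEYS = [
    ("anthropic", ("ANTHROPIC_API_KEY",)),
    ("openai", ("OPENAI_API_KEY",)),
    ("google", ("GOOGLE_API_KEY", "GEMINI_API_KEY")),  # assumed: either Google key counts
]


def _is_set(env: Mapping[str, str], name: str) -> bool:
    # Assumption: an empty value counts as unset.
    return bool(env.get(name, "").strip())


def _vertex_on(env: Mapping[str, str]) -> bool:
    # Assumption: "true" (any case) switches Vertex mode on.
    return env.get("GOOGLE_GENAI_USE_VERTEXAI", "").strip().lower() == "true"


def resolve_provider(env: Mapping[str, str],
                     ref_provider: Optional[str] = None,
                     config_provider: Optional[str] = None) -> str:
    """Explicit model= ref > config file > first present key."""
    if ref_provider:
        return ref_provider
    if config_provider:
        return config_provider
    for provider, names in NATIVE_KEYS:
        if any(_is_set(env, name) for name in names):
            return provider
        if provider == "google" and _vertex_on(env):
            return "google"  # Vertex mode routes to Google without an API key
    if _is_set(env, "OPENROUTER_API_KEY"):
        return "openrouter"  # single-key fallback, used only when no native key is set
    raise LookupError("no model ref, config default, or provider key found")
```

Here ref_provider stands for the provider parsed out of an explicit model= ref, and config_provider for the provider value in metalworks.toml.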

Google via Vertex AI

The Google chat and embedding adapters can authenticate through Vertex AI (Application Default Credentials, e.g. a service account) instead of an API key. Set GOOGLE_GENAI_USE_VERTEXAI=true and provide a project and location:

export GOOGLE_GENAI_USE_VERTEXAI=true
export VERTEX_PROJECT_ID=...        # or GOOGLE_CLOUD_PROJECT
export VERTEX_LOCATION=us-central1  # or GOOGLE_CLOUD_LOCATION (default us-central1)
# credentials: GOOGLE_APPLICATION_CREDENTIALS=/path/to/sa.json, or ambient gcloud ADC

With Vertex mode on, provider inference routes to Google even when no GOOGLE_API_KEY is set. The project is required (VERTEX_PROJECT_ID or GOOGLE_CLOUD_PROJECT); the location defaults to us-central1.
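The project and location fallbacks can be sketched as follows. This is a minimal illustration assuming the VERTEX_* variables take precedence over their GOOGLE_CLOUD_* equivalents; VertexSettings and vertex_settings are hypothetical names, and credentials are left to Application Default Credentials as described above.

```python
# Hypothetical sketch of the Vertex AI project/location fallbacks described above.
from typing import Mapping, NamedTuple


class VertexSettings(NamedTuple):
    project: str
    location: str


def vertex_settings(env: Mapping[str, str]) -> VertexSettings:
    # Assumption: VERTEX_* wins over the GOOGLE_CLOUD_* equivalent when both are set.
    project = env.get("VERTEX_PROJECT_ID") or env.get("GOOGLE_CLOUD_PROJECT")
    if not project:
        # The project has no default, so a missing one is an error.
        raise ValueError("Vertex AI mode needs VERTEX_PROJECT_ID or GOOGLE_CLOUD_PROJECT")
    location = (env.get("VERTEX_LOCATION")
                or env.get("GOOGLE_CLOUD_LOCATION")
                or "us-central1")  # documented default
    return VertexSettings(project, location)
```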

Any OpenAI-compatible endpoint

This is the “bring your own model” path. Any server that speaks the OpenAI chat-completions API — OpenRouter, vLLM, LM Studio, Together, Groq, a local runtime — works with no new adapter:

from metalworks import Metalworks
from metalworks.llm.adapters.openai import OpenAIChatModel

local = OpenAIChatModel(
    model_id="llama-3.1-70b",
    base_url="http://localhost:1234/v1",   # your endpoint
    api_key_env="LOCAL_LLM_KEY",           # the env var holding its key
    native_structured=False,               # use the schema-in-prompt ladder
)
Metalworks(chat=local).research("...", subreddits=["..."])

native_structured=False routes structured calls straight to the schema-in-prompt tier of the ladder, which is the safe choice for endpoints whose JSON-schema support varies. Leave it True if your endpoint enforces response_format reliably.
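For readers new to the schema-in-prompt approach, here is a generic sketch of what such a tier typically does: instead of sending response_format, it puts the JSON Schema in the prompt and parses JSON out of the free-text reply. This illustrates the general technique under stated assumptions, not metalworks' implementation; schema_prompt and parse_json_reply are hypothetical, and the required-key check is far weaker than full JSON Schema validation.

```python
# Hypothetical sketch of a schema-in-prompt tier; not metalworks' implementation.
import json


def schema_prompt(task: str, schema: dict) -> str:
    # Put the JSON Schema in the prompt instead of sending response_format.
    return (
        f"{task}\n\n"
        "Reply with one JSON object that conforms to this JSON Schema, and nothing else:\n"
        f"{json.dumps(schema, indent=2)}"
    )


def parse_json_reply(reply: str, schema: dict) -> dict:
    # Models may wrap the JSON in prose or code fences; take the outermost braces.
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object in reply")
    data = json.loads(reply[start:end + 1])  # JSONDecodeError is a ValueError
    # Only the top-level "required" keys are checked, not the full schema.
    missing = [key for key in schema.get("required", []) if key not in data]
    if missing:
        raise ValueError(f"reply is missing required keys: {missing}")
    return data
```

Taking the outermost braces is deliberately naive; a stray } in trailing prose would break it, which is one reason a tier like this is a fallback rather than the first choice.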

Fast vs main model

The research and discovery pipelines use a cheap “fast” model for triage and filtering and a capable model for synthesis and generation. Set both:

Metalworks(model="anthropic/claude-opus-4-6", fast_model="anthropic/claude-haiku-4-5")

If you set only model, the fast slot falls back to it. Resolve a pair directly with metalworks.config.resolve_models(model, fast_model).
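The fallback and the split between the two slots can be sketched as below. This does not call resolve_models, whose return type is not documented here; ModelPair, pair_models, model_for, and the stage names are hypothetical, taken loosely from the description above.

```python
# Hypothetical sketch of the fast/main fallback; not the actual resolve_models.
from typing import NamedTuple, Optional


class ModelPair(NamedTuple):
    main: str  # synthesis and generation
    fast: str  # triage and filtering


def pair_models(model: str, fast_model: Optional[str] = None) -> ModelPair:
    # With no fast model, the fast slot reuses the main model.
    return ModelPair(main=model, fast=fast_model or model)


FAST_STAGES = {"triage", "filtering"}  # assumed stage names, taken from the text


def model_for(stage: str, pair: ModelPair) -> str:
    # Cheap stages get the fast model; everything else gets the main model.
    return pair.fast if stage in FAST_STAGES else pair.main
```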

Embeddings

The pipeline embeds Reddit comments to cluster demand. Embeddings need no separate configuration: the backend is resolved from your environment and never requires a key of its own:

Credentials present	Embeddings used
`GOOGLE_API_KEY` / Vertex	Google embeddings
else `OPENAI_API_KEY`	OpenAI embeddings
neither	local model — `fastembed` (`BAAI/bge-small-en-v1.5`, 384-dim), no key

So a chat-only provider (Anthropic, OpenRouter, a local LLM) just works: embeddings fall back to the local model, which is downloaded once to the Hugging Face cache and runs fully offline after that. A Google or OpenAI key is used automatically when present (higher quality, no download).

metalworks models warm          # pre-download the local model before your first run
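The selection table reduces to a short chain of checks. This is a minimal sketch assuming Vertex mode is detected from the GOOGLE_GENAI_USE_VERTEXAI flag alone; pick_embeddings and its string results are hypothetical labels, not metalworks objects. Chat-only keys such as ANTHROPIC_API_KEY play no part in the choice.

```python
# Hypothetical sketch of the embedding fallback table; names are illustrative.
from typing import Mapping

LOCAL_MODEL = "BAAI/bge-small-en-v1.5"  # fastembed, 384-dim


def pick_embeddings(env: Mapping[str, str]) -> str:
    # Assumption: Vertex mode is detected from the GOOGLE_GENAI_USE_VERTEXAI flag alone.
    vertex = env.get("GOOGLE_GENAI_USE_VERTEXAI", "").strip().lower() == "true"
    if env.get("GOOGLE_API_KEY") or vertex:
        return "google"
    if env.get("OPENAI_API_KEY"):
        return "openai"
    # No Google or OpenAI credentials: use the local model, which needs no key.
    return f"fastembed:{LOCAL_MODEL}"
```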

Override explicitly by injecting a provider:

from metalworks import Metalworks
from metalworks.embeddings.adapters.openai import OpenAIEmbedding

Metalworks(embeddings=OpenAIEmbedding())     # force a specific embedding backend

Embedding vectors from different models live in incompatible spaces. metalworks stamps each cached index with the identity of the embedding model that built it and refuses to mix vectors from different models: switching the embedding backend on an existing .metalworks/ project triggers a clear EmbeddingModelMismatch instead of silently degrading retrieval. Re-run research to rebuild the index under the new model.
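The stamp-and-check idea can be sketched with a small JSON file beside the index. This is an illustration under stated assumptions: the file name, the identity fields, and the local EmbeddingModelMismatch class are stand-ins, not metalworks' on-disk format or its exception, and the OpenAI model name and dimension in the usage are placeholders.

```python
# Hypothetical sketch of stamping an index with its embedding identity.
# The file name and identity fields are assumptions, not metalworks' on-disk format.
import json
import tempfile
from pathlib import Path

STAMP = "embedding_identity.json"


class EmbeddingModelMismatch(Exception):
    """Local stand-in for the error metalworks reports."""


def identity(backend: str, model: str, dim: int) -> dict:
    return {"backend": backend, "model": model, "dim": dim}


def stamp_index(index_dir: Path, ident: dict) -> None:
    # Record which embedding model built the index.
    index_dir.mkdir(parents=True, exist_ok=True)
    (index_dir / STAMP).write_text(json.dumps(ident, sort_keys=True))


def check_index(index_dir: Path, ident: dict) -> None:
    # Refuse to reuse vectors produced by a different model.
    stamp = index_dir / STAMP
    if not stamp.exists():
        return  # no cached index yet, nothing to compare
    stored = json.loads(stamp.read_text())
    if stored != ident:
        raise EmbeddingModelMismatch(
            f"index built with {stored['model']} ({stored['dim']}-dim), "
            f"current model is {ident['model']} ({ident['dim']}-dim); rebuild the index"
        )


# Usage: build with the local model, then switch to another backend.
with tempfile.TemporaryDirectory() as tmp:
    index_dir = Path(tmp) / "index"
    local = identity("fastembed", "BAAI/bge-small-en-v1.5", 384)
    stamp_index(index_dir, local)
    check_index(index_dir, local)  # same identity: passes silently
    try:
        # Placeholder model name and dimension for the "new" backend.
        check_index(index_dir, identity("openai", "some-openai-model", 1536))
        mismatch_raised = False
    except EmbeddingModelMismatch:
        mismatch_raised = True
```

Comparing the whole identity, not just the dimension, matters: two different models can share a dimension while still producing incompatible vectors.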

Check the resolution

metalworks doctor        # resolved chat + embedding models, keys found, actionable hints
metalworks models list   # the same, plus a provider × key × extra reachability matrix
