What a Capability Is

When you right-click a file and the Transmute menu shows you what that file can become, each entry is reachable through one or more capabilities — typed operations a cartridge knows how to perform on a typed input to produce a typed output.

A capability is not a button in MachineFabric’s code. It’s a contract:

  • What it accepts: an input media URN (e.g. media:pdf, media:wav;bytes, media:llm-generation-request;json;record).
  • What it produces: an output media URN (e.g. media:textable;page, media:llm-text-stream;ndjson).
  • How it’s named: a Cap URN of the form cap:in="media:X";<tags>;out="media:Y" that identifies the operation precisely enough for the planner to route to it.
  • What arguments it takes: optional parameters extracted from stdin, CLI positions, or flags.

The naming, matching, and routing of capabilities are defined by CapDAG, the open protocol MachineFabric is built on. This page covers what capabilities mean from inside MachineFabric: what the Transmute menu shows, what the standard caps are, and what cartridges declare. For protocol-level details, see the CapDAG documentation.

How the Transmute Menu Reaches a Capability

You right-click a file. Here’s what happens before the menu opens:

  1. MachineFabric reads the file’s media type — a PDF becomes media:pdf, a WAV becomes media:wav;bytes, and so on.
  2. The planner walks the capability graph — every cap that accepts that input directly, plus every cap reachable through one or more chained cartridges, contributes a possible destination.
  3. The menu shows destinations, not steps — “render to image,” “transcribe,” “summarize.” You pick the destination; the planner picks the route.
  4. The route is shown before it runs — for multi-step pipelines the graph appears so you can see the journey before a single byte moves.
  5. Cartridges are invoked in order — each one in a sandboxed XPC service, each step’s progress streamed back live, results stored in your library.
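
The graph walk in step 2 can be pictured as a breadth-first search over caps. The sketch below is an illustration only: the cap list, the labels, and the media:summary;textable URN are hypothetical, and it matches media URNs by exact string equality, whereas real matching follows CapDAG's rules.

```python
from collections import deque

# Hypothetical capability edges: (input media URN, label, output media URN).
CAPS = [
    ("media:pdf", "disbind", "media:textable;page"),
    ("media:pdf", "render-page-image", "media:png;bytes"),
    ("media:textable;page", "summarize", "media:summary;textable"),
    ("media:wav;bytes", "transcribe", "media:textable"),
]


def reachable_destinations(start, caps):
    """Return {destination: [cap labels]} using the shortest route to each."""
    routes = {start: []}
    queue = deque([start])
    while queue:
        media = queue.popleft()
        for cap_in, label, cap_out in caps:
            # Assumption: exact string match stands in for real URN matching.
            if cap_in == media and cap_out not in routes:
                routes[cap_out] = routes[media] + [label]
                queue.append(cap_out)
    del routes[start]  # the starting type is not a destination
    return routes


routes = reachable_destinations("media:pdf", CAPS)
for destination, steps in routes.items():
    print(destination, "via", " -> ".join(steps))
```

Each key in routes corresponds to a menu destination, and the list of labels is the route the planner would show before running it.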

When the same destination is reachable by more than one route, ranking picks the most specific match. A PDF-specific extractor beats a generic document one; an LLM cap that names a specific model beats a generic one.

Standard Capabilities

CapDAG ships a small set of standard caps that every cartridge can rely on. The mandatory one is CAP_IDENTITY; the rest are common operations the registry has helpers for.

Mandatory

Cap URN | Purpose
cap: (CAP_IDENTITY) | Pass input through unchanged. Every cartridge must declare it. The runtime auto-registers a handler; you only need to include it in the manifest.

Standard (optional)

Cap | Purpose
CAP_DISCARD | Accept any input, produce media:void. Default implementation provided.
CAP_ADAPTER_SELECTION | Inspect content and return the media URNs that describe it. Used for file-type detection.

LLM, Embeddings, Models

The cartridge SDK exposes URN constants and builders for the LLM, embeddings, and model-management caps that MachineFabric routes through every day:

  • LLM inference: CAP_LLM_INFERENCE_GGUF, CAP_LLM_INFERENCE_MLX, CAP_LLM_INFERENCE_CANDLE, CAP_LLM_INFERENCE_CONSTRAINED
  • LLM introspection: CAP_LLM_VOCAB, CAP_LLM_MODEL_INFO
  • Embeddings: CAP_GENERATE_EMBEDDINGS, CAP_EMBEDDINGS_DIMENSIONS
  • Vision: CAP_DESCRIBE_IMAGE
  • Models: download, list, status, contents, availability, path (URN builders in the standard library)

The associated payload media URNs:

Media URN | Carries
media:llm-generation-request;json;record | An LlmGenerationRequest: prompt, model spec, sampling params, optional constraints
media:llm-text-stream;ndjson | An LlmStreamMessage stream: tokens, status, completion, tool requests, errors
media:llm-vocab-response;json;record | An LlmVocabResponse
media:llm-model-info;json;record | An LlmModelInfo

The full request/stream/vocab/info shapes live in the SDK Reference. The dispatch rule that routes a generation request to the right backend (GGUF, MLX, or Candle) follows the same ranking logic as everything else — the request’s model_spec plus tags determines the provider.
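
Because media:llm-text-stream;ndjson is newline-delimited JSON, a consumer can handle it one line at a time. The sketch below shows only that mechanic. The field names ("type", "text", "message") and the message kinds as spelled here are assumptions for illustration; the actual LlmStreamMessage shape is the one in the SDK Reference.

```python
import json


def parse_ndjson(lines):
    """Yield one decoded JSON object per non-empty line."""
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines between messages
        yield json.loads(raw)


def collect_text(messages):
    """Concatenate token text until a completion message arrives."""
    parts = []
    for msg in messages:
        kind = msg.get("type")  # hypothetical discriminator field
        if kind == "token":
            parts.append(msg.get("text", ""))
        elif kind == "error":
            raise RuntimeError(msg.get("message", "unknown error"))
        elif kind == "completion":
            break
        # other kinds (e.g. status) are ignored in this sketch
    return "".join(parts)


# Hypothetical stream content, for illustration only.
sample = [
    '{"type": "status", "message": "loading model"}',
    '{"type": "token", "text": "Hello"}',
    "",
    '{"type": "token", "text": ", world"}',
    '{"type": "completion"}',
]
text = collect_text(parse_ndjson(sample))
print(text)
```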

Document and content caps

Standard URN builders in the CapDAG library cover the common document operations:

  • disbind_urn(input_media) — decompose a document into pages with text content
  • render_page_image_urn(input_media) — render document pages as images
  • coercion_urn(source_type, target_type) — generic format coercion
  • format_conversion_urn(in_media, out_media) — JSON / YAML / CSV interconversion
  • generate_json_urn(lang_code), make_decision_urn(lang_code), make_multiple_decisions_urn(lang_code) — structured-output LLM caps

For everything else (custom extraction, sentiment tagging, OCR, transliteration), cartridges define their own caps. The only requirements are that the URN parses (per CapDAG URN syntax) and that the input and output media URNs it references either exist in the registry or are proposed alongside the cartridge (see Contributing).

# Examples of valid custom caps
cap:in="media:text;sentiment-input;textable";out="media:sentiment-tag;textable";tag-sentiment
cap:in="media:png;bytes";analyze;target=faces;out="media:json;record"
cap:in="media:wav;bytes";transcribe;model=whisper;out="media:textable"

What a Cartridge Declares

Cartridges declare their capabilities in a CapManifest returned at startup. The manifest is what the host reads at install time to register the cartridge with the router.

The minimal Python shape looks like this (the Getting Started tutorial walks through every line):

from capdag.bifaci.manifest import CapManifest, default_group
from capdag.cap.definition import Cap, CapArg, CapOutput, PositionSource, StdinSource
from capdag.standard.caps import CAP_IDENTITY
from capdag.urn.cap_urn import CapUrn, CapUrnBuilder

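def build_manifest() -> CapManifest:
    # The cartridge's own cap: text in, sentiment tag out.
    # Building the URN with CapUrnBuilder yields the canonical tag order.
    tag_sentiment = Cap(
        urn=(
            CapUrnBuilder()
            .marker("tag-sentiment")
            .in_spec("media:text;sentiment-input;textable")
            .out_spec("media:sentiment-tag;textable")
            .build()
        ),
        title="Tag sentiment",
        command="tag-sentiment",
    )
    # One required argument: the text to classify, taken from stdin
    # or from the first CLI position.
    tag_sentiment.args = [
        CapArg(
            media_urn="media:text;sentiment-input;textable",
            required=True,
            sources=[
                StdinSource("media:text;sentiment-input;textable"),
                PositionSource(0),
            ],
            arg_description="UTF-8 text to classify.",
        )
    ]
    # Declare the output media type and describe its contents.
    tag_sentiment.output = CapOutput(
        media_urn="media:sentiment-tag;textable",
        output_description="One of 'positive', 'neutral', 'negative'.",
    )

    # CAP_IDENTITY must be listed in every manifest; the runtime supplies the handler.
    identity = Cap(
        urn=CapUrn.from_string(CAP_IDENTITY),
        title="Identity",
        command="identity",
    )

    # channel and registry_url must agree with the engine's configuration
    # and with the on-disk install record.
    return CapManifest(
        name="sentiment-tagger",
        version="0.1.0",
        channel="nightly",
        registry_url=None,
        description="Classify text as positive, neutral, or negative.",
        cap_groups=[default_group([identity, tag_sentiment])],
    )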

A few rules the host enforces:

  • CAP_IDENTITY is mandatory. Every cartridge’s manifest must list it. The Python runtime auto-registers a handler; you just declare the cap.
  • channel and registry_url are baked in. They must agree with the engine’s compile-time configuration and with the on-disk install record (cartridge.json); a mismatch is rejected at discovery time.
  • cap_groups are atomic registration units. A group succeeds or fails as a whole; the host uses groups to keep related caps consistent.
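
The all-or-nothing behavior of cap groups can be sketched as validate-then-commit. This is not the host's implementation: the Registry class, the register_group method, and the startswith("cap:") check (a stand-in for real URN parsing) are all hypothetical.

```python
class Registry:
    """Toy registry illustrating atomic group registration (hypothetical)."""

    def __init__(self):
        self.caps = {}  # cap URN string -> cartridge name

    def register_group(self, cartridge, urns):
        """Register every URN in the group, or none of them."""
        # Assumption: a prefix check stands in for full Cap URN validation.
        invalid = [urn for urn in urns if not urn.startswith("cap:")]
        if invalid:
            return False  # nothing has been written yet, so nothing to undo
        staged = dict(self.caps)  # work on a copy, then swap it in
        for urn in urns:
            staged[urn] = cartridge
        self.caps = staged
        return True


registry = Registry()
good = registry.register_group("sentiment-tagger", ["cap:", "cap:tag-sentiment"])
bad = registry.register_group("broken", ["cap:ocr", "not-a-urn"])
print(good, bad, sorted(registry.caps))
```

The second group fails as a whole: its valid member cap:ocr is not registered, because one sibling did not validate.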

The Cap URN builder API (CapUrnBuilder().marker().in_spec().out_spec().build()) is the only safe way to construct a Cap URN — the runtime canonicalizes tags alphabetically, and a hand-written URN string almost certainly won’t match the canonical form. See the Getting Started tutorial for the build-once-and-keep-the-string pattern.
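
To see why hand-written strings are fragile, consider two spellings of the same cap that differ only in tag order. The sketch below normalizes both by sorting segments by key. It is an illustration of the idea, not the runtime's canonicalizer: the exact canonical form is defined by CapDAG and may differ, and in a real cartridge the URN should come from CapUrnBuilder.

```python
def split_segments(body):
    """Split on ';' while leaving ';' inside double quotes untouched."""
    segments, current, in_quotes = [], [], False
    for ch in body:
        if ch == '"':
            in_quotes = not in_quotes
            current.append(ch)
        elif ch == ";" and not in_quotes:
            segments.append("".join(current))
            current = []
        else:
            current.append(ch)
    if current:
        segments.append("".join(current))
    return segments


def normalize(urn):
    """Return the URN with its segments sorted by key (assumed ordering)."""
    if not urn.startswith("cap:"):
        raise ValueError("not a cap URN")
    segments = [s for s in split_segments(urn[len("cap:"):]) if s]
    segments.sort(key=lambda s: s.split("=", 1)[0])
    return "cap:" + ";".join(segments)


a = 'cap:in="media:wav;bytes";transcribe;model=whisper;out="media:textable"'
b = 'cap:model=whisper;out="media:textable";transcribe;in="media:wav;bytes"'
same = normalize(a) == normalize(b)
print(a == b, same)
```

The raw strings compare unequal even though they describe the same operation, which is the failure mode a hand-written URN invites.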

Routing in Practice

When MachineFabric needs a capability:

  1. It builds a request — a Cap URN describing what is needed.
  2. The router runs the dispatch predicate against every registered provider and collects the eligible ones.
  3. Among eligible providers, ranking picks the most specific.
  4. The selected cartridge is spawned (if not already running), the request is sent over the Bifaci protocol, and the response is streamed back.
  5. For multi-step requests, the planner finds the path through the capability graph and the executor runs each step in order, with parallel execution where the graph allows it.

Because ranking prefers the most specific match, a specialized cartridge for a specific file type reliably overrides a more general one for that type, while leaving the general cartridge available everywhere else. If two providers match with equal specificity, the first registered wins.
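
Steps 2 and 3, together with the tie-break, can be sketched as a filter followed by a ranking. Everything here is an assumption made for illustration: the Provider type, the rule that a provider is eligible when every tag it pins agrees with the request, and specificity measured as the number of pinned tags. The real dispatch predicate and ranking are defined by CapDAG.

```python
from dataclasses import dataclass, field


@dataclass
class Provider:
    """Hypothetical provider record; also reused to describe a request."""
    name: str
    in_media: str
    out_media: str
    tags: dict = field(default_factory=dict)


def is_eligible(provider, request):
    """Assumed predicate: same in/out, and every pinned tag matches."""
    if (provider.in_media, provider.out_media) != (request.in_media, request.out_media):
        return False
    return all(request.tags.get(k) == v for k, v in provider.tags.items())


def select(providers, request):
    """Pick the most specific eligible provider; ties go to the first registered."""
    eligible = [p for p in providers if is_eligible(p, request)]
    if not eligible:
        return None
    # max() returns the first maximal item, so registration order breaks ties.
    return max(eligible, key=lambda p: len(p.tags))


providers = [  # list order stands in for registration order
    Provider("generic-stt", "media:wav;bytes", "media:textable"),
    Provider("whisper-a", "media:wav;bytes", "media:textable", {"model": "whisper"}),
    Provider("whisper-b", "media:wav;bytes", "media:textable", {"model": "whisper"}),
]
pinned = Provider("request", "media:wav;bytes", "media:textable", {"model": "whisper"})
unpinned = Provider("request", "media:wav;bytes", "media:textable")
print(select(providers, pinned).name, select(providers, unpinned).name)
```

With a model named in the request, the specialized providers outrank the generic one and the earlier registration wins the tie; without it, only the generic provider is eligible.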