CapDAG: A Cross-Language Cartridge Protocol

The hard part isn’t the protocol. It’s making the protocol behave the same way across every language someone might want to write a cartridge in.

Why cartridges are processes

Most plugin systems pick a language and make everyone use it. VS Code extensions are TypeScript. Vim plugins are Vimscript. This works until someone wants to use a library that doesn’t exist in the chosen language — and document processing is full of those dependencies. The best PDF parser is in C++. The best EPUB library is in Rust. Whisper.cpp is C++; MLX is Swift/Metal; Candle is Rust; the long tail of model wrappers people actually publish on HuggingFace is Python.

Forcing one language means reimplementing the world.

So in MachineFabric, cartridges are just processes. Anything that can read stdin and write stdout can be a cartridge. The wire protocol — Bifaci — is a small framed binary protocol over those two pipes. It multiplexes streams, carries CBOR payloads, and supports peer invocation between cartridges. The spec is open; the Bifaci protocol document walks through frame layout, handshake, streaming, and routing internals.
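The real frame layout is in the Bifaci document and isn't reproduced here. To illustrate how one pair of pipes can carry several logical streams, here is a minimal Python sketch. It assumes a hypothetical 8-byte header (a big-endian stream id and payload length) in front of each payload. The names `write_frame`, `read_frames`, and `demux` are illustrative and are not part of any SDK. In a real cartridge the reader and writer would be `sys.stdin.buffer` and `sys.stdout.buffer`; here `io.BytesIO` stands in for the pipes.

```python
from __future__ import annotations

import io
import struct
from typing import BinaryIO, Iterator

# Hypothetical header: stream id and payload length as big-endian unsigned 32-bit ints.
HEADER = struct.Struct(">II")


def write_frame(out: BinaryIO, stream_id: int, payload: bytes) -> None:
    """Write one frame: fixed-size header, then the raw payload bytes."""
    out.write(HEADER.pack(stream_id, len(payload)))
    out.write(payload)


def read_frames(inp: BinaryIO) -> Iterator[tuple[int, bytes]]:
    """Yield (stream_id, payload) pairs until a clean end of input."""
    while True:
        header = inp.read(HEADER.size)
        if not header:
            return  # clean EOF between frames
        if len(header) < HEADER.size:
            raise EOFError("truncated frame header")
        stream_id, length = HEADER.unpack(header)
        payload = inp.read(length)
        if len(payload) < length:
            raise EOFError("truncated frame payload")
        yield stream_id, payload


def demux(inp: BinaryIO) -> dict[int, bytes]:
    """Concatenate payloads per logical stream, in arrival order."""
    streams: dict[int, bytes] = {}
    for stream_id, payload in read_frames(inp):
        streams[stream_id] = streams.get(stream_id, b"") + payload
    return streams


# io.BytesIO stands in for the cartridge's stdout/stdin pipes.
pipe = io.BytesIO()
write_frame(pipe, 1, b"hello ")
write_frame(pipe, 2, b"page 1")
write_frame(pipe, 1, b"world")
pipe.seek(0)
streams = demux(pipe)  # two interleaved logical streams recovered from one byte stream
```

A blocking buffered `read(n)` returns fewer than `n` bytes only at end of file. That is why the sketch treats a short read as a truncated frame rather than retrying.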

A cartridge in Python is fifty lines of code. The Getting Started tutorial builds one from scratch end-to-end. We ship SDKs in Rust, Go, Swift/Objective-C, and Python today; the canonical runtime surface (manifest construction, handler registration, peer invocation, progress reporting) is currently most complete in the Python capdag package on PyPI.

What CapDAG does

Bifaci is the wire. CapDAG is the addressing system on top of it. Every cartridge advertises what it can do using Cap URNs:

cap:in="media:pdf";disbind;out="media:textable;page"
cap:in="media:image;png";describe-image;out="media:image-description;textable"
cap:in="media:textable";llm;mlx;summarization;out="media:generated-text;textable;record"

A Cap URN is a triple: an input media type, an output media type, and a bag of non-directional tags (the operation, model family, language, format). When the host dispatches a request, a single rule decides which providers are eligible:

A provider $p$ matches a request $r$ if the request’s input is at least as specific as the provider accepts (contravariant), the provider’s output is at least as specific as the request requires (covariant), and the provider satisfies every explicit tag constraint while being free to refine the unspecified ones.
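Here is one way to write the rule down. Read $a \preceq b$ as "$a$ is at least as specific as $b$", and write $T_x$ for the tag set of $x$. As a simplification, tags are treated as a plain set and wildcards are set aside:

$$
\mathrm{match}(p, r) \iff \mathrm{in}(r) \preceq \mathrm{in}(p) \;\wedge\; \mathrm{out}(p) \preceq \mathrm{out}(r) \;\wedge\; T_r \subseteq T_p
$$

The first conjunct is the contravariant input check and the second is the covariant output check. $T_r \subseteq T_p$ says every tag the request names must be present, while $p$ may carry extra tags that refine what $r$ left open.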

This is standard function subtyping. The math is textbook; the point is that this is the only rule. Every cartridge in every language gets the same dispatch. When two cartridges overlap, ranking picks the most specific: a cartridge that handles media:pdf beats one that handles media: (the universal top type). You don’t configure this. It falls out of how the URNs are written.
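To make the rule concrete, here is a minimal Python sketch of parsing, matching, and ranking. It rests on three assumptions. First, "at least as specific" means "carries a superset of the media tags", so `media:` with no tags is the top type. Second, cap-level tags form a plain set with no wildcards. Third, specificity is a simple tag count. The real ranking in 08-RANKING may weigh things differently, and `parse_cap`, `matches`, and `best_provider` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


def split_unquoted(s: str, sep: str = ";") -> list[str]:
    """Split on sep, but not inside double-quoted media values."""
    parts, buf, in_quotes = [], [], False
    for ch in s:
        if ch == '"':
            in_quotes = not in_quotes
        if ch == sep and not in_quotes:
            parts.append("".join(buf))
            buf = []
        else:
            buf.append(ch)
    parts.append("".join(buf))
    return [p for p in parts if p]


def media_tags(urn: str) -> frozenset[str]:
    """'media:textable;page' -> {'textable', 'page'}; bare 'media:' is the top type."""
    if not urn.startswith("media:"):
        raise ValueError(f"not a media URN: {urn}")
    return frozenset(split_unquoted(urn[len("media:"):]))


@dataclass(frozen=True)
class Cap:
    inp: frozenset[str]
    out: frozenset[str]
    tags: frozenset[str]


def parse_cap(urn: str) -> Cap:
    if not urn.startswith("cap:"):
        raise ValueError(f"not a cap URN: {urn}")
    inp, out, tags = frozenset(), frozenset(), set()
    for tok in split_unquoted(urn[len("cap:"):]):
        key, _, val = tok.partition("=")
        if key == "in":
            inp = media_tags(val.strip('"'))
        elif key == "out":
            out = media_tags(val.strip('"'))
        else:
            tags.add(tok)  # operation, model family, format, ...
    return Cap(inp, out, frozenset(tags))


def at_least_as_specific(a: frozenset[str], b: frozenset[str]) -> bool:
    return a >= b  # assumption: more media tags means more specific


def matches(provider: Cap, request: Cap) -> bool:
    return (
        at_least_as_specific(request.inp, provider.inp)      # contravariant input
        and at_least_as_specific(provider.out, request.out)  # covariant output
        and request.tags <= provider.tags                    # named tags present; extras allowed
    )


def specificity(cap: Cap) -> int:
    return len(cap.inp) + len(cap.out) + len(cap.tags)  # assumed scoring


def best_provider(providers: list[Cap], request: Cap) -> Cap | None:
    eligible = [p for p in providers if matches(p, request)]
    return max(eligible, key=specificity, default=None)


pdf_only = parse_cap('cap:in="media:pdf";disbind;out="media:textable;page"')
anything = parse_cap('cap:in="media:";disbind;out="media:textable;page"')
request = parse_cap('cap:in="media:pdf";disbind;out="media:textable"')
chosen = best_provider([anything, pdf_only], request)  # both match; pdf_only is more specific
```

Both providers are eligible for the request. The PDF-specific one wins only because its input carries one more tag, which mirrors the media:pdf versus media: example above.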

The full formalism is in the CapDAG dispatch document; ranking is in 08-RANKING.

The planner is pathfinding

When a user asks for a destination that no single cap serves directly, such as “summarize this PDF” when the only PDF cartridge produces page text and the only summarizer accepts plain text, the planner searches the capability graph for a path: PDF $\to$ disbind $\to$ page text $\to$ summarize $\to$ summary. The search runs over a finite graph that is fixed at registration time, so it always terminates, and in practice it finishes in milliseconds. The path is shown to the user before execution begins: the planner produces a Machine Notation expression that names every step and every input/output type, and the UI renders it. There is no agent picking tools at random. There is no LLM in the planning loop.
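Because the graph is finite and fixed, a plain breadth-first search is enough to show why planning always terminates. This sketch is not the CapDAG planner. It uses a hypothetical three-cap graph, models media types as bare tag sets, and reuses the superset reading of "at least as specific". It also returns a list of cap names instead of a Machine Notation expression.

```python
from __future__ import annotations

from collections import deque

# Hypothetical capability graph: (cap name, input tags, output tags).
CAPS = [
    ("disbind", frozenset({"pdf"}), frozenset({"textable", "page"})),
    ("summarize", frozenset({"textable"}), frozenset({"generated-text", "textable", "record"})),
    ("describe-image", frozenset({"image", "png"}), frozenset({"image-description", "textable"})),
]


def plan(source: frozenset[str], target: frozenset[str], caps=CAPS) -> list[str] | None:
    """Breadth-first search for the shortest chain of caps from source to target."""
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        current, path = queue.popleft()
        if current >= target:  # current type is at least as specific as the target
            return path
        for name, cap_in, cap_out in caps:
            # A cap is usable if the current type is at least as specific as its input.
            if current >= cap_in and cap_out not in seen:
                seen.add(cap_out)
                queue.append((cap_out, path + [name]))
    return None  # finite graph exhausted: no path exists


steps = plan(frozenset({"pdf"}), frozenset({"generated-text"}))  # ["disbind", "summarize"]
```

The `seen` set means each distinct media type is expanded at most once, so the loop is bounded by the number of types in the graph. BFS also guarantees the returned chain is a shortest one.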

Some routing decisions can only be made after a value is available. A cap declares its model-spec argument as media:llm;model-spec;textable, but the actual model spec string — hf:meta-llama/Llama-3.2-1B-Instruct, hf:mlx-community/..., hf:mistralai/Mistral-7B-Instruct — determines which prompt-formatting adapter the runtime applies. CapDAG handles this through a parallel registry of value adapters: a base URN plus a string is refined into a more specific URN, and dispatch picks the family-specific adapter accordingly. Different LLM families speak different prompt formats; without family-aware refinement, a Mistral model receiving Llama-style turn markers generates plausible-looking but hallucinated multi-turn conversations. We learned that one the hard way.

The two refinement registries — file-based and value-based — are architecturally parallel: same trait pattern, same longest-prefix lookup, same refine-or-pass-through semantics. They differ only in what they inspect (bytes vs. strings) and what markers they add (structural vs. semantic).
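The shared shape of the two registries fits in one small generic class. In this sketch, which is an illustration rather than CapDAG's trait, a refiner maps prefixes to markers. The longest matching prefix wins, and an input with no match passes through unchanged. The value rules reuse the model-spec prefixes mentioned above with hypothetical marker names (`llama`, `mistral`, `huggingface`). The file rules use the real PDF and PNG file signatures as stand-ins for structural inspection.

```python
from __future__ import annotations

from typing import Generic, TypeVar

K = TypeVar("K", str, bytes)  # value-based refiners inspect str, file-based ones bytes


class PrefixRefiner(Generic[K]):
    """Longest-prefix refine-or-pass-through over either strings or bytes."""

    def __init__(self, rules: dict[K, tuple[str, ...]]) -> None:
        self.rules = rules

    def refine(self, base_urn: str, subject: K) -> str:
        hits = [prefix for prefix in self.rules if subject.startswith(prefix)]
        if not hits:
            return base_urn  # pass through: nothing more specific is known
        markers = self.rules[max(hits, key=len)]  # longest prefix wins
        tags = [t for t in base_urn.removeprefix("media:").split(";") if t]
        tags += [m for m in markers if m not in tags]  # append new markers, keep base order
        return "media:" + ";".join(tags)


# Hypothetical marker names; the prefixes come from the model-spec examples above.
values = PrefixRefiner({
    "hf:meta-llama/": ("llama",),
    "hf:mistralai/": ("mistral",),
    "hf:": ("huggingface",),
})
# Real file signatures; the markers mirror the media URNs used earlier.
files = PrefixRefiner({
    b"%PDF-": ("pdf",),
    b"\x89PNG\r\n\x1a\n": ("image", "png"),
})

base = "media:llm;model-spec;textable"
mistral = values.refine(base, "hf:mistralai/Mistral-7B-Instruct")  # adds "mistral"
pdf = files.refine("media:", b"%PDF-1.7\n")                        # "media:pdf"
```

The generic `hf:` rule shows why the lookup is longest-prefix: a Mistral spec also matches `hf:`, but the more specific family prefix must win.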

Cross-language reality

The URN parsing and matching logic has to be byte-identical everywhere. If the Rust implementation matches differently than the Swift implementation, cartridges break in ways that are impossible to debug from the outside.

We wrote the same logic four times: Rust, Swift, Go, and Python. Same test suite. Same edge cases. Same canonical normalization rules (alphabetical tag order, quoting around media URN values, predictable wildcard truth tables). Interop tests run every combination — Rust host with Python cartridge, Swift host with Go cartridge, all of them.
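Here is a minimal sketch of the normalization step, under stated assumptions. It keeps `in` first and `out` last, as in the examples above, and sorts only the cap-level tags. It re-quotes the media values but leaves their inner tag order alone, because the examples above don't alphabetize inside media URNs. Quoting is what lets a media value carry its own `;` without being split apart. The function name `canonicalize` is hypothetical.

```python
from __future__ import annotations


def split_unquoted(s: str, sep: str = ";") -> list[str]:
    """Split on sep, but not inside double-quoted media values."""
    parts, buf, in_quotes = [], [], False
    for ch in s:
        if ch == '"':
            in_quotes = not in_quotes
        if ch == sep and not in_quotes:
            parts.append("".join(buf))
            buf = []
        else:
            buf.append(ch)
    parts.append("".join(buf))
    return [p for p in parts if p]


def canonicalize(cap_urn: str) -> str:
    """in first, alphabetized cap-level tags, out last, media values quoted."""
    if not cap_urn.startswith("cap:"):
        raise ValueError(f"not a cap URN: {cap_urn}")
    media: dict[str, str] = {}
    tags: list[str] = []
    for tok in split_unquoted(cap_urn[len("cap:"):]):
        key, _, val = tok.partition("=")
        if key in ("in", "out"):
            media[key] = val.strip('"')
        else:
            tags.append(tok)
    parts = [f'in="{media["in"]}"'] if "in" in media else []
    parts += sorted(tags)
    if "out" in media:
        parts.append(f'out="{media["out"]}"')
    return "cap:" + ";".join(parts)


messy = 'cap:summarization;out=media:textable;mlx;in="media:textable";llm'
canonical = canonicalize(messy)
# 'cap:in="media:textable";llm;mlx;summarization;out="media:textable"'
```

Canonicalization should be idempotent: running it on an already-canonical URN, such as the summarization example above, returns the input unchanged.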

This took longer than writing the protocol. Most of the time went into finding where two languages disagreed about something subtle: string comparison under Unicode normalization, integer overflow at the boundaries of the specificity score, ordering of multi-value tags in the canonical form. We have a tagged-urn test suite that runs against every implementation; if the Rust output and the Swift output diverge on any input, the test fails and we go fix whichever is wrong.
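The divergence check can be sketched as a differential test: feed the same vectors to every implementation and fail on any disagreement. Here two hypothetical Python functions stand in for two language ports. One sorts tags by raw code point. The other first normalizes to Unicode NFC, an assumed rule used only for illustration. A precomposed `é` against a decomposed one is enough to make them disagree.

```python
from __future__ import annotations

import unicodedata
from typing import Callable

Impl = Callable[[list[str]], str]


def canon_raw(tags: list[str]) -> str:
    """Stand-in port #1: sort by raw code point."""
    return ";".join(sorted(tags))


def canon_nfc(tags: list[str]) -> str:
    """Stand-in port #2: NFC-normalize each tag, then sort."""
    return ";".join(sorted(unicodedata.normalize("NFC", t) for t in tags))


def divergences(vectors: list[list[str]], impls: dict[str, Impl]) -> list[tuple[list[str], dict[str, str]]]:
    """Return every vector on which the implementations do not all agree."""
    failures = []
    for vector in vectors:
        outputs = {name: impl(vector) for name, impl in impls.items()}
        if len(set(outputs.values())) > 1:
            failures.append((vector, outputs))
    return failures


VECTORS = [
    ["pdf", "page"],
    ["cafe\u0301", "caf\u00e9"],  # decomposed vs. precomposed "é"
]
failures = divergences(VECTORS, {"raw": canon_raw, "nfc": canon_nfc})
```

The ASCII vector passes and the Unicode vector is reported. That is the kind of subtle disagreement a shared suite only catches if someone puts such an input into it.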

Streaming-native, sandboxed

Cross-language IPC is usually slow. Bifaci is multiplexed (a single connection carries multiple logical streams simultaneously), and its frames carry CBOR with chunk indexing and checksums. Cartridge overhead doesn’t show up in profiles when processing real documents: parsing a 10 MB PDF dwarfs the IPC cost.
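Chunk indexing and checksums are what let a reader interleave several streams on one pipe and still detect dropped, reordered, or corrupted chunks. This sketch makes two substitutions: it uses CRC-32 from Python's `zlib` for the checksum, and plain bytes in place of CBOR. Both are assumptions, since the Bifaci document defines the actual encoding and checksum. `Chunk`, `split_into_chunks`, and `reassemble` are hypothetical names.

```python
from __future__ import annotations

import zlib
from dataclasses import dataclass
from itertools import zip_longest


@dataclass(frozen=True)
class Chunk:
    stream_id: int
    index: int      # position of this chunk within its stream
    data: bytes
    checksum: int   # CRC-32 of data (stand-in; the real algorithm may differ)


def split_into_chunks(stream_id: int, payload: bytes, size: int = 4) -> list[Chunk]:
    """Cut a payload into indexed, checksummed chunks (tiny size for the demo)."""
    pieces = [payload[off:off + size] for off in range(0, len(payload), size)]
    return [Chunk(stream_id, i, p, zlib.crc32(p)) for i, p in enumerate(pieces)]


def reassemble(chunks: list[Chunk]) -> dict[int, bytes]:
    """Rebuild each stream, rejecting corrupted or out-of-order chunks."""
    buffers: dict[int, bytearray] = {}
    next_index: dict[int, int] = {}
    for c in chunks:
        if zlib.crc32(c.data) != c.checksum:
            raise ValueError(f"stream {c.stream_id}: checksum mismatch at chunk {c.index}")
        expected = next_index.get(c.stream_id, 0)
        if c.index != expected:
            raise ValueError(f"stream {c.stream_id}: expected chunk {expected}, got {c.index}")
        next_index[c.stream_id] = expected + 1
        buffers.setdefault(c.stream_id, bytearray()).extend(c.data)
    return {sid: bytes(buf) for sid, buf in buffers.items()}


# Interleave two streams on one "connection", as multiplexing would.
a = split_into_chunks(1, b"first stream payload")
b = split_into_chunks(2, b"second")
wire = [c for pair in zip_longest(a, b) for c in pair if c is not None]
```

Indexes are tracked per stream, so chunks from different streams can arrive interleaved in any order. Within a single stream, however, a gap or swap is an error.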

Native cartridges (Rust, Swift, Go) move data at the speed of the underlying parser. Python cartridges are slower because the interpreter dominates. For bulk processing, native is the way; for prototyping or wrapping a Python-only library, Python cartridges are perfectly fine.

On macOS, every cartridge runs in a sandboxed XPC service, separate from both the engine and the user-facing app. MachineFabric never loads cartridge code into its own process. The XPC sandbox restricts cartridge capabilities by default: no network, no access to user files outside what’s explicitly handed in. A cartridge that needs network access (a cloud-proxy cartridge) declares the entitlement and is reviewed before being accepted into the canonical manifest.

Distributed cartridges must be code-signed with a Developer ID certificate, notarized by Apple, and packaged as a signed .pkg installer. The Cartridges page walks through the macOS-specific install layout and the host runtime’s failure modes (manifest_invalid, bad_installation, entry_point_missing, handshake_failed).

Why this matters

The cartridge system is how MachineFabric handles new file types and new model families without shipping app updates. Someone with domain expertise writes a cartridge for their format; they don’t need to coordinate with us. The Cap URN is the API contract.

CapDAG is open. The reference implementation is in Rust; matching implementations exist in Swift, Go, and Python. The capability registry at capdag.com is browsable; submission is a GitHub issue against capfab for new capabilities and media defs, and against cartridge-shelf for new cartridges. Every submission is reviewed by hand. Every accepted cartridge ends up in the canonical manifest at cartridges.machinefabric.com that ships with every install.

If you’re building something that needs a multi-language capability system with a formal dispatch rule, our specification might save you the months we spent getting the edge cases right. If you’re building a cartridge for MachineFabric specifically — start with the Getting Started tutorial and let us know on the cartridge-shelf issue tracker if anything’s unclear.