IDL Walkthrough

An inference service, from interface to deployment.

A streaming inference backend that works locally, over TLS, or inside an SGX enclave — without changing the interface, the implementation, or the calling code.

The scenario

You are building a C++ inference backend and you need callers in other processes, other machines, or a browser UI to be able to start an inference run, receive tokens as they are generated, and cancel if needed.

Without a framework you write serialisation on both sides, manage a callback channel, design a wire protocol for streaming output, handle object lifetimes across the connection, and repeat that work for every transport you need. Canopy replaces that with a single IDL file and generated glue on both sides.

Token streaming arrives via a callback interface — the server calls back into the client over the same connection, fire-and-forget.
A session object is created on the server and returned as an rpc::shared_ptr — the session stays alive exactly as long as the caller holds it.
The service factory is the top-level entry point — callers connect to it and ask it to create sessions.
The same interface compiles in both blocking and co_await modes; nothing changes.

Step 1 — write the IDL

The IDL describes the callable surface: interfaces, methods, argument direction, and one-way calls. It says nothing about transport, serialisation, or execution model.

// inference/inference.idl
// Copyright notice here

#import "rpc/rpc_types.idl"

namespace inference
{
    // [inline] also exposes v1's names in the enclosing namespace, so
    // inference::i_session and inference::v1::i_session name the same interface.
    // Bumping to v2 later is an additive change: v1 stays callable.
    [inline] namespace v1
    {
        // Sampling parameters. rpc::optional means the caller may omit the field;
        // the default value (= …) fills it in.
        [status=production]
        struct generation_options
        {
            [description="Maximum tokens to generate"]
            rpc::optional<uint32_t> max_tokens = 512;

            [description="Sampling temperature 0.0–2.0"]
            rpc::optional<float> temperature = 0.7;

            [description="Halt on any matching sequence"]
            rpc::optional<std::vector<std::string>> stop_sequences;
        };

        [status=production]
        enum class finish_reason { complete, max_tokens, cancelled, error };

        // Callback: the server delivers tokens into this interface on the client side.
        // [post] = fire-and-forget: no reply is awaited, so no round-trip is added
        // per token.  on_finish is a normal call, so the server gets confirmation
        // that the client received the end of the stream.
        [status=production]
        interface i_token_sink
        {
            [post] int on_token(const std::string& token);
            int on_finish(finish_reason reason, uint32_t tokens_generated);
        };

        // A live inference session. Returned as rpc::shared_ptr by the factory.
        // The server keeps the session alive for exactly as long as the caller holds
        // its shared_ptr — distributed reference counting, no manual cleanup.
        [status=production]
        interface i_session
        {
            int generate(
                const std::string& prompt,
                const generation_options& options,
                // optimistic_ptr: a callable non-owning reference to the client's
                // token sink. Does not prevent the sink from being released.
                const rpc::optimistic_ptr<i_token_sink>& sink);

            int cancel();
        };

        // The top-level service factory. Callers connect to this, list available
        // models, and request a session for the one they want.
        [status=production]
        interface i_inference_service
        {
            int list_models([out] std::vector<std::string>& model_ids);
            int create_session(
                const std::string& model_id,
                [out] rpc::shared_ptr<i_session>& session);
        };
    }
}
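The IDL comments say an omitted rpc::optional field is filled in from its default. One plausible reading, and the one the value_or(512) calls in Step 3 rely on, is that an omitted field arrives empty and the receiver applies the IDL default. The sketch below models that with std::optional as a stand-in for rpc::optional; the struct and the resolve helper are hypothetical, not generated code, and whether the generated type is instead pre-populated with its default is not stated in this walkthrough.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for the generated generation_options, using
// std::optional in place of rpc::optional. Empty means "caller omitted it".
struct generation_options {
    std::optional<std::uint32_t> max_tokens;
    std::optional<float> temperature;
    std::optional<std::vector<std::string>> stop_sequences;
};

// Concrete values after applying the IDL defaults (= 512, = 0.7).
struct resolved_options {
    std::uint32_t max_tokens;
    float temperature;
    std::vector<std::string> stop_sequences;
};

// Fill every omitted field from its IDL default; supplied fields win.
resolved_options resolve(const generation_options& o) {
    return {
        o.max_tokens.value_or(512),
        o.temperature.value_or(0.7f),
        o.stop_sequences.value_or(std::vector<std::string>{}),
    };
}
```

Keeping the defaults in one helper means the server never has to repeat the literal 512 or 0.7 at each use site.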

IDL features used here

[inline] namespace v1 — stable versioned name without nesting callers in two namespaces
rpc::optional<T> — omissible field with an explicit default
[post] — one-way fire-and-forget; no reply channel opened per token
rpc::optimistic_ptr<T> — callable non-owning reference (breaks the callback cycle)
rpc::shared_ptr<T> — returned session whose lifetime is controlled by the caller
[out] — output parameter direction annotation
[description=…] — description embedded in the generated JSON schema
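rpc::optimistic_ptr is not a standard type, but std::weak_ptr is a useful analogy for why a non-owning callback reference matters (with the difference that a weak_ptr must be locked before use, whereas optimistic_ptr is described as directly callable). In this hypothetical sketch the client's sink keeps its session alive; if the session also owned the sink, dropping the client's handle would leave both objects alive.

```cpp
#include <memory>

struct client_sink;

// Two variants of a server-side session that holds a callback to the sink.
struct session_owning     { std::shared_ptr<client_sink> sink; };  // owning: forms a cycle
struct session_optimistic { std::weak_ptr<client_sink> sink; };    // non-owning: no cycle

// The client's sink keeps its session alive (e.g. so it can call cancel()).
struct client_sink {
    std::shared_ptr<session_owning> owning;
    std::shared_ptr<session_optimistic> optimistic;
};

// Links sink and session, drops the client's handle, and reports whether
// the sink was actually destroyed.
bool sink_destroyed(bool session_holds_owning_ref) {
    auto sink = std::make_shared<client_sink>();
    std::weak_ptr<client_sink> probe = sink;
    if (session_holds_owning_ref) {
        auto s = std::make_shared<session_owning>();
        s->sink = sink;
        sink->owning = s;
    } else {
        auto s = std::make_shared<session_optimistic>();
        s->sink = sink;
        sink->optimistic = s;
    }
    sink.reset();                        // the client lets go
    bool destroyed = probe.expired();    // false => kept alive by the cycle
    if (auto leaked = probe.lock())      // break the cycle so this demo itself does not leak
        leaked->owning.reset();
    return destroyed;
}
```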

What you are not deciding here

No transport — TCP, IPC, local, DLL, or SGX is chosen at construction time
No serialisation format — YAS binary, JSON, or Protocol Buffers is a build or per-connection choice
No target language — C++, Rust (via Protocol Buffers), and JavaScript all consume the same IDL
No execution model — the same source compiles blocking or co_await
No version number — the namespace name is the version; v1 and v2 coexist without flags or negotiation
No wire protocol — framing, encoding, and version negotiation are handled by the runtime

The IDL is the type system

The IDL file is not just documentation — it is the single canonical definition from which every downstream artefact is generated. Change a type in the IDL and the C++ proxies, protobuf descriptors, JSON schemas, and JavaScript stubs all update in the next build. Nothing can drift out of sync because nothing is written twice.

// One IDL file. Many generated targets.
inference.idl  (single source of truth)
├── C++ proxy + stub: yas_binary, yas_json, protocol_buffers variants
├── Rust interop: Protocol Buffers wire format; same IDL, different runtime
├── JavaScript WebSocket client: reduced-trust browser layer, same contract
├── JSON schema (config profile): VS Code IntelliSense, validation, hover docs
└── JSON schema (MCP profile): minimal tool schema for AI agent use

Version management

[inline] namespace v1 is the versioning mechanism. The qualified name inference::v1::i_session is the version — there is no separate version field to keep in sync with the interface shape.

At build time, Canopy writes a fingerprint for every interface marked [status=production] into a file under check_sums/production/. If the interface definition changes, the fingerprint changes, and a CI check can reject the build, so a production contract is never silently modified.

To evolve a production interface, you rename it: bump v1 to v2. The old name stays callable by any code compiled against it; the new name starts a fresh contract. Both can coexist in the same IDL file and the same running service.
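Assuming the generated C++ mirrors IDL namespaces as C++ namespaces (an assumption, not confirmed here), the coexistence described above behaves like ordinary C++ inline namespaces. In this hypothetical sketch, inline moves to v2 when it is introduced, which is one common convention: unqualified inference::i_session then names the new contract, while code that spells inference::v1::i_session keeps compiling against the old one.

```cpp
#include <string>

namespace inference {
    // Old contract: no longer inline, but still present and callable.
    namespace v1 {
        struct i_session { std::string version() const { return "v1"; } };
    }
    // New contract: inline, so inference::i_session now refers to v2.
    inline namespace v2 {
        struct i_session { std::string version() const { return "v2"; } };
    }
}
```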

Serialisation format as a deployment detail

Each CanopyGenerate target (yas_binary, yas_json, protocol_buffers) is a different transformation of the same IDL types. The caller holds an rpc::shared_ptr<i_session> — it does not know or specify which wire format is in use.

Format is selected at transport construction time and can be negotiated per connection. A C++-to-C++ path might use YAS binary for throughput; a browser client uses JSON for readability; a Rust service uses Protocol Buffers for cross-language compatibility. The interface code does not change.
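The YAS and protobuf encoders are not reproduced here. The sketch below only illustrates the shape of the idea: the wire format is chosen once, when the encoder is constructed, and the calling code (send_options) is written against an interface that does not name the format. Both encoders, the wire_format enum, and options_msg are hypothetical, and the JSON encoder skips string escaping.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a generated struct.
struct options_msg {
    std::uint32_t max_tokens;
    std::vector<std::string> stop_sequences;
};

// One implementation per wire format.
struct encoder {
    virtual ~encoder() = default;
    virtual std::string encode(const options_msg& m) const = 0;
};

struct json_encoder : encoder {
    std::string encode(const options_msg& m) const override {
        std::string out = "{\"max_tokens\":" + std::to_string(m.max_tokens) +
                          ",\"stop_sequences\":[";
        for (std::size_t i = 0; i < m.stop_sequences.size(); ++i) {
            if (i) out += ',';
            out += '"' + m.stop_sequences[i] + '"';   // no escaping: sketch only
        }
        return out + "]}";
    }
};

// Compact binary: little-endian u32 fields, length-prefixed strings.
struct binary_encoder : encoder {
    std::string encode(const options_msg& m) const override {
        std::string out;
        put_u32(out, m.max_tokens);
        put_u32(out, static_cast<std::uint32_t>(m.stop_sequences.size()));
        for (const auto& s : m.stop_sequences) {
            put_u32(out, static_cast<std::uint32_t>(s.size()));
            out += s;
        }
        return out;
    }
    static void put_u32(std::string& out, std::uint32_t v) {
        for (int i = 0; i < 4; ++i)
            out.push_back(static_cast<char>((v >> (8 * i)) & 0xFF));
    }
};

enum class wire_format { binary, json };

// The only place that knows which format is in use.
std::unique_ptr<encoder> make_encoder(wire_format f) {
    if (f == wire_format::json) return std::make_unique<json_encoder>();
    return std::make_unique<binary_encoder>();
}

// Calling code: identical whichever format was chosen at construction.
std::string send_options(const encoder& enc, const options_msg& m) {
    return enc.encode(m);
}
```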

The same IDL attributes that annotate types for C++ also annotate the generated JSON schemas: [description=…] becomes hover documentation in VS Code and method descriptions in MCP tool definitions. Annotations written once appear everywhere they are needed.

A struct field added in the IDL appears in the C++ proxy, the protobuf descriptor, the JSON schema, and the JavaScript client after the next build, consistently and without touching any of those targets by hand. A field removed from the IDL causes a compile error in any statically typed code that still references it, such as C++ built against the generated headers; the JavaScript client, being dynamically typed, gets no compile-time check.

Step 2 — what Canopy generates

One CanopyGenerate CMake call produces C++ for each requested serialisation format. The generated headers expose the same virtual interface the IDL described — no hand-written serialisation, dispatch, or transport code on either side.

# CMakeLists.txt
CanopyGenerate(
    inference
    inference/inference.idl
    ${CMAKE_CURRENT_SOURCE_DIR}
    ${CMAKE_BINARY_DIR}/generated
    ""
    yas_binary       # high-performance C++ to C++ path
    yas_json         # human-readable debug path and browser/agent calls
    protocol_buffers # cross-language path (Rust, browser via protobuf)
    include_paths ${CMAKE_CURRENT_SOURCE_DIR}/.
    install_dir   ${GENERATED_INSTALL_DIR})

After generation the caller-side header exposes the pure virtual interface exactly as written in the IDL. The caller holds an rpc::shared_ptr<i_inference_service> and calls it like a local object:

// Caller — the generated interface looks exactly like a local C++ object.
// No transport code here.  No serialisation.  No error handling beyond the return code.
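int main()
{
    rpc::shared_ptr<inference::v1::i_inference_service> svc = connect(); // see Step 4

    std::vector<std::string> models;
    if (int err = svc->list_models(models); err != rpc::error::OK())
        return err;
    if (models.empty())
        return 1;   // the service offers no models

    rpc::shared_ptr<inference::v1::i_session> session;
    if (int err = svc->create_session(models[0], session); err != rpc::error::OK())
        return err;

    // Create a callback sink.  The server will call on_token() on this
    // object from the other side of the connection.  `service` is assumed
    // to be the client's own rpc::service, created with the transport.
    // generate() receives only a non-owning optimistic_ptr, so `sink`
    // must stay alive until generate() returns.
    auto sink = rpc::make_shared<my_token_sink>(service);

    inference::v1::generation_options opts;
    opts.max_tokens = 256;
    opts.temperature = 0.8f;

    if (int err = session->generate("Explain RAII in one paragraph.", opts, sink);
        err != rpc::error::OK())
        return err;

    // When session goes out of scope, the remote object is released automatically.
    return 0;
}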

The caller does not know or care whether the service is in the same process, on another machine, or inside an SGX enclave. That decision is made where the transport is constructed — not here.
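my_token_sink is used above but never defined. The following sketch shows what it might contain, written against a plain C++ stand-in for the generated i_token_sink so that it compiles on its own. In real code the class would presumably derive from the generated interface through rpc::base, as the server classes in Step 3 do, and return rpc::error::OK() rather than the placeholder 0 used here.

```cpp
#include <cstdint>
#include <string>

enum class finish_reason { complete, max_tokens, cancelled, error };

// Plain C++ stand-in for the generated inference::v1::i_token_sink.
struct i_token_sink {
    virtual ~i_token_sink() = default;
    virtual int on_token(const std::string& token) = 0;
    virtual int on_finish(finish_reason reason, std::uint32_t tokens_generated) = 0;
};

// Hypothetical client-side sink: collects streamed text and the final status.
class my_token_sink : public i_token_sink {
public:
    int on_token(const std::string& token) override {
        text_ += token;   // append in arrival order
        ++received_;
        return 0;         // placeholder for rpc::error::OK()
    }
    int on_finish(finish_reason reason, std::uint32_t tokens_generated) override {
        reason_ = reason;
        reported_ = tokens_generated;
        finished_ = true;
        return 0;
    }
    const std::string& text() const { return text_; }
    bool finished() const { return finished_; }
    finish_reason reason() const { return reason_; }
    // Tokens the server reports having generated vs tokens that arrived.
    bool counts_match() const { return received_ == reported_; }
private:
    std::string text_;
    std::uint32_t received_ = 0;
    std::uint32_t reported_ = 0;
    bool finished_ = false;
    finish_reason reason_ = finish_reason::error;
};
```

Comparing the count reported in on_finish with the number of tokens received gives the client a cheap consistency check. If the runtime delivers callbacks on a different thread from the one reading text(), the members would also need a mutex.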

Step 3 — write the implementation

The server-side implementation inherits from rpc::base and overrides the interface methods. There is no serialisation, no transport plumbing, and no callback channel to set up — that is all in generated code.

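// server/my_session.h
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include "generated/inference/inference.h"

class my_session
    : public rpc::base<my_session, inference::v1::i_session>
{
    std::shared_ptr<rpc::service> svc_;
    my_model_state model_;   // the application's own model wrapper (not generated)
public:
    // Matches the rpc::make_shared<my_session>(svc_) call in create_session().
    explicit my_session(std::shared_ptr<rpc::service> svc) : svc_(std::move(svc)) {}

    int generate(
        const std::string& prompt,
        const inference::v1::generation_options& opts,
        const rpc::optimistic_ptr<inference::v1::i_token_sink>& sink) override
    {
        uint32_t remaining = opts.max_tokens.value_or(512);
        model_.begin(prompt, opts.temperature.value_or(0.7f));

        while (model_.has_next() && remaining > 0)
        {
            sink->on_token(model_.next_token());   // [post]: fire-and-forget, no round-trip
            --remaining;
        }

        // Report why generation stopped instead of always claiming "complete".
        // cancelled() is assumed to exist on my_model_state alongside cancel().
        auto reason = model_.cancelled() ? inference::v1::finish_reason::cancelled
                    : model_.has_next()  ? inference::v1::finish_reason::max_tokens
                                         : inference::v1::finish_reason::complete;
        return sink->on_finish(reason, model_.tokens_generated());
    }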

    int cancel() override
    {
        model_.cancel();
        return rpc::error::OK();
    }
};

class my_inference_service
    : public rpc::base<my_inference_service, inference::v1::i_inference_service>
{
    std::shared_ptr<rpc::service> svc_;
public:
    explicit my_inference_service(std::shared_ptr<rpc::service> svc) : svc_(svc) {}

    int list_models(std::vector<std::string>& model_ids) override
    {
        model_ids = { "llama3-8b", "mistral-7b" };
        return rpc::error::OK();
    }

    int create_session(
        const std::string& /*model_id*/,
        rpc::shared_ptr<inference::v1::i_session>& session) override
    {
        // rpc::make_shared registers the object in this zone and returns a
        // distributed shared_ptr.  The session stays alive as long as the
        // caller — in any process or machine — holds its copy.
        session = rpc::make_shared<my_session>(svc_);
        return rpc::error::OK();
    }
};
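The stop logic in generate() can be exercised without Canopy or a real model. This standalone sketch replaces my_model_state with a hypothetical fake_model that replays a fixed token list and replaces the sink with a callback. The loop mirrors my_session::generate, and the finish_reason selection reports why generation stopped (complete, max_tokens, or cancelled). The third test case simulates a cancel() that arrives while tokens are streaming.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

enum class finish_reason { complete, max_tokens, cancelled, error };

// Hypothetical stand-in for my_model_state: replays a fixed token list.
class fake_model {
public:
    explicit fake_model(std::vector<std::string> tokens) : tokens_(std::move(tokens)) {}
    bool has_next() const { return !cancelled_ && pos_ < tokens_.size(); }
    std::string next_token() { return tokens_[pos_++]; }
    void cancel() { cancelled_ = true; }
    bool cancelled() const { return cancelled_; }
    std::uint32_t tokens_generated() const { return static_cast<std::uint32_t>(pos_); }
private:
    std::vector<std::string> tokens_;
    std::size_t pos_ = 0;
    bool cancelled_ = false;
};

struct run_result {
    finish_reason reason;
    std::uint32_t generated;
    std::string text;
};

// on_token stands in for sink->on_token().
run_result run_generation(fake_model& model, std::uint32_t max_tokens,
                          const std::function<void(const std::string&)>& on_token)
{
    run_result r{finish_reason::complete, 0, {}};
    std::uint32_t remaining = max_tokens;
    while (model.has_next() && remaining > 0) {
        std::string tok = model.next_token();
        r.text += tok;
        on_token(tok);
        --remaining;
    }
    // Check cancellation first: a cancelled model also reports !has_next().
    r.reason = model.cancelled() ? finish_reason::cancelled
             : model.has_next()  ? finish_reason::max_tokens
                                 : finish_reason::complete;
    r.generated = model.tokens_generated();
    return r;
}
```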

The [post] annotation on on_token means the server calls it and moves on immediately — there is no round-trip delay waiting for the client to acknowledge each token. Throughput is bounded by serialisation and network bandwidth, not latency.
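A rough bound shows why this matters. Assume, purely for illustration, a 40 ms round-trip time. If on_token were a normal call, the server could not send the next token until the previous reply arrived, so the stream could never exceed one token per round trip:

```latex
\text{tokens per second} \;\le\; \frac{1}{\mathrm{RTT}} \;=\; \frac{1}{0.040\ \mathrm{s}} \;=\; 25
```

With [post], that per-token wait disappears. The rate is then limited by token generation, serialisation, and bandwidth, and latency adds a roughly fixed delay to the stream instead of capping its rate.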

Step 4 — choose a deployment

The implementation above does not change between these three deployments. The only difference is which transport object you construct around it.

Local (same process)

Plugin or DLL boundary. Zero-copy serialisation path for the binary format. Useful during development and for in-process isolation.

TCP + TLS

Network deployment between processes or machines. The stream transformer wraps each accepted TCP connection in TLS before handing it to the transport — application code unchanged.

SGX enclave

Inference inside a trusted-execution environment. Remote attestation confirms the enclave identity before the session begins. Same IDL, same implementation — different transport constructor.

// ── Local deployment (in-process or DLL boundary) ────────────────────────────

auto zone = rpc::zone::create();
auto service = rpc::root_service::create("inference", zone, scheduler);

// Register the implementation — the generated RPC runtime routes calls to it.
service->register_object<inference::v1::i_inference_service>(
    rpc::make_shared<my_inference_service>(service));


// ── TCP + TLS deployment ─────────────────────────────────────────────────────

auto tls_ctx = std::make_shared<streaming::tls::context>(cert_path, key_path);

// stream_transformer: each raw TCP stream is wrapped in TLS before the
// transport sees it.  The application service object is unchanged.
auto tls_wrap = [tls_ctx, scheduler](std::shared_ptr<streaming::stream> tcp)
    -> CORO_TASK(std::optional<std::shared_ptr<streaming::stream>>)
{
    auto tls = std::make_shared<streaming::tls::stream>(tcp, tls_ctx);
    if (!CO_AWAIT tls->handshake()) CO_RETURN std::nullopt;
    CO_RETURN tls;
};

auto listener = std::make_shared<streaming::listener>(
    "inference",
    std::make_shared<streaming::tcp::acceptor>(endpoint),
    rpc::stream_transport::make_connection_callback<
        inference::v1::i_inference_service,
        inference::v1::i_inference_service>(
            [](   // no captures: nothing can dangle if the callback outlives this scope
                const rpc::shared_ptr<inference::v1::i_inference_service>&,
                const std::shared_ptr<rpc::service>& svc)
                -> CORO_TASK(rpc::service_connect_result<inference::v1::i_inference_service>)
            {
                CO_RETURN { rpc::error::OK(),
                    rpc::make_shared<my_inference_service>(svc) };
            }),
    std::move(tls_wrap));

listener->start_listening(service);


// ── SGX enclave deployment ───────────────────────────────────────────────────
// The IDL and implementation are compiled into the enclave unchanged.
// Only the transport construction is enclave-specific.

auto sgx_transport = rpc::sgx_coroutine_transport::create(
    enclave_path, token_path, service);
// Callers connect through sgx_transport; the enclave's identity is verified
// via DCAP remote attestation before any session is admitted.

The client-side connection follows the same pattern: build the corresponding transport, call connect_to_zone, and receive an rpc::shared_ptr<i_inference_service>. From that point the calling code above works identically for all three deployments.
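The same principle can be sketched in plain C++: the calling code is written against an abstract interface, and a single factory decides whether the caller gets the object itself or a proxy in front of it. Here a hypothetical loopback_proxy stands in for a real transport by encoding the reply as text and decoding it again. connect_to, deployment, and the stand-in interface are illustrative names, not Canopy API.

```cpp
#include <memory>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Plain C++ stand-in for the generated i_inference_service (hypothetical).
struct i_inference_service {
    virtual ~i_inference_service() = default;
    virtual int list_models(std::vector<std::string>& model_ids) = 0;
};

// In-process implementation, as my_inference_service provides.
struct local_service : i_inference_service {
    int list_models(std::vector<std::string>& ids) override {
        ids = {"llama3-8b", "mistral-7b"};
        return 0;
    }
};

// Hypothetical "remote" proxy: serialises the reply to newline-separated
// text and parses it back, standing in for a transport plus serialiser.
struct loopback_proxy : i_inference_service {
    explicit loopback_proxy(std::shared_ptr<i_inference_service> target)
        : target_(std::move(target)) {}
    int list_models(std::vector<std::string>& ids) override {
        std::vector<std::string> reply;
        int err = target_->list_models(reply);        // "server side"
        std::string wire;
        for (const auto& s : reply) wire += s + '\n';  // "serialise"
        ids.clear();
        std::istringstream in(wire);                   // "deserialise"
        for (std::string line; std::getline(in, line);) ids.push_back(line);
        return err;
    }
private:
    std::shared_ptr<i_inference_service> target_;
};

enum class deployment { local, remote };

// The only place that knows how the service is reached.
std::shared_ptr<i_inference_service> connect_to(deployment d) {
    auto impl = std::make_shared<local_service>();
    if (d == deployment::local) return impl;
    return std::make_shared<loopback_proxy>(impl);
}

// Calling code: identical for every deployment.
std::string first_model(i_inference_service& svc) {
    std::vector<std::string> models;
    if (svc.list_models(models) != 0 || models.empty()) return {};
    return models.front();
}
```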

Step 5 — schemas and agent discovery

The same IDL that drives generated C++ also drives generated JSON schemas. Two schema profiles serve different consumers from a single source:

Config profile

Full authoring schema: descriptions from [description=…], default values, string|integer enums, additionalProperties: false, and cross-file $ref with $id.

Used by VS Code and other JSON-schema-aware editors to provide completion and validation when writing Canopy configuration files.

MCP profile

Minimal tool schema: everything inlined, no $id, string-only enums, no defaults. Sized and shaped for LLM tool-use layers.

An AI agent receives this schema and can build a valid JSON call for any method in the interface — with no hand-written schema maintenance.
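To illustrate how one field definition can yield both profiles, the sketch below emits a property entry from a hypothetical parsed-field record: the config profile includes the IDL default and the MCP profile omits it. Key order matches the generated fragments shown later in this step. The generator's real internals are not described in this walkthrough, field_info and emit_property are invented for illustration, and descriptions are inserted without JSON escaping.

```cpp
#include <optional>
#include <string>

enum class schema_profile { config, mcp };

// Hypothetical in-memory form of one IDL field after parsing.
struct field_info {
    std::string name;
    std::string json_type;                     // "integer", "number", ...
    std::string description;                   // from [description=...]
    std::optional<std::string> default_json;   // IDL default, already JSON-encoded
};

// Emit one "name":{...} property; only the config profile carries defaults.
std::string emit_property(const field_info& f, schema_profile p) {
    std::string out = "\"" + f.name + "\":{\"type\":\"" + f.json_type + "\"";
    if (p == schema_profile::config && f.default_json)
        out += ",\"default\":" + *f.default_json;
    out += ",\"description\":\"" + f.description + "\"}";
    return out;
}
```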

Runtime self-description is planned (see [introspectable] below): a caller will be able to ask a Canopy service to describe itself, and the service will return method names, parameter schemas, and interface metadata, all derived from the generated code, with no hand-written MCP configuration needed.

// ── What the generator produces (build-time, no runtime cost) ────────────────

// Config profile schema for generation_options (fragment):
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "generation_options",
  "properties": {
    "max_tokens": {
      "type": "integer",
      "default": 512,
      "description": "Maximum tokens to generate"
    },
    "temperature": {
      "type": "number",
      "default": 0.7,
      "description": "Sampling temperature 0.0–2.0"
    }
  },
  "additionalProperties": false
}

// MCP profile for i_session.generate (fragment):
// — no defaults (fewer tokens to the LLM), string-only enums, self-contained
{
  "name": "i_session.generate",
  "inputSchema": {
    "type": "object",
    "properties": {
      "prompt":  { "type": "string", "description": "The prompt text" },
      "options": {
        "type": "object",
        "properties": {
          "max_tokens":  { "type": "integer", "description": "Maximum tokens to generate" },
          "temperature": { "type": "number",  "description": "Sampling temperature 0.0–2.0" }
        }
      }
    },
    "required": ["prompt"]
  }
}

Marking an interface [introspectable] (planned) will allow runtime discovery via i_marshaller::get_schema. A browser client or AI agent will be able to call get_schema on a connected object, receive the live MCP tool list, and then call methods by name using JSON-encoded parameters, with no generated client stubs and no hard-coded method identifiers.

// ── Future: runtime discovery over an existing connection ─────────────────────
// The browser JS client (or any untyped caller) can discover
// i_inference_service's methods without pre-generated stubs.

// JS (WebSocket demo, planned)
const tools = await canopy.getSchema(serviceRef, 'mcp');
// tools = [{ name: "i_session.generate", inputSchema: {...} }, ...]

const { interface_id, method_id } = canopy.resolve("i_session", "generate");
const reply = await canopy.send({
    encoding: 'yas_json',
    interface_id, method_id,
    data: { prompt: "Explain RAII.", options: { max_tokens: 128 } }
});
// The C++ stub decodes the JSON arguments and dispatches to my_session::generate.
// No protobuf, no hard-coded ids.
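The receiving side of this planned flow needs a lookup from (interface, method) names to numeric identifiers, and from identifiers to a handler. A minimal version of such a table is sketched below in C++; the ids, the handler signature taking raw JSON text, and the dispatch_table class are assumptions for illustration, not the Canopy runtime.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <utility>

struct method_ref {
    std::uint64_t interface_id;
    std::uint64_t method_id;
};

// Hypothetical registry: names -> ids for resolve(), ids -> handler for calls.
class dispatch_table {
public:
    using handler = std::function<std::string(const std::string& json_args)>;

    void add(const std::string& iface, const std::string& method,
             method_ref ids, handler h) {
        names_.insert_or_assign(std::make_pair(iface, method), ids);
        handlers_.insert_or_assign(std::make_pair(ids.interface_id, ids.method_id),
                                   std::move(h));
    }

    std::optional<method_ref> resolve(const std::string& iface,
                                      const std::string& method) const {
        auto it = names_.find(std::make_pair(iface, method));
        if (it == names_.end()) return std::nullopt;
        return it->second;
    }

    // Returns the handler's reply, or nullopt for an unknown method.
    std::optional<std::string> call(method_ref ids, const std::string& json_args) const {
        auto it = handlers_.find(std::make_pair(ids.interface_id, ids.method_id));
        if (it == handlers_.end()) return std::nullopt;
        return it->second(json_args);
    }

private:
    std::map<std::pair<std::string, std::string>, method_ref> names_;
    std::map<std::pair<std::uint64_t, std::uint64_t>, handler> handlers_;
};
```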

What was written vs what was generated

inference.idl: the complete service contract, about 50 lines describing three interfaces, one struct, one enum, and their attributes.

CMakeLists.txt change: one CanopyGenerate call specifying the IDL file and the desired serialisation formats.

my_session.h / .cpp: the actual inference logic (generate, cancel, and the session factory), with no transport or serialisation code.

Transport construction: roughly 10–20 lines per deployment; choose TCP+TLS, local, or SGX by constructing the matching transport objects.

Generated (not written): proxy and stub code for each serialisation format; JSON schemas for editor IntelliSense and AI tooling; version fingerprints; the JavaScript WebSocket client.

Try the calculator demo first

The WebSocket calculator uses the same IDL pattern — simpler interface, same transport and generation story. Run it to see the generated call path end to end before exploring a fuller service contract.
