Chapter 14 / 40
How the system is built
The system runs on two compute tiers, wired together by a single rule: code on Cloudflare reaches the harness only through queues that the harness polls, and Cloudflare never opens a connection into the harness.
Status: In Progress — the Cloudflare side is live. The harness side activates the moment the Framework Desktop is racked in Gennevilliers.
Cloudflare — data, public surface, the one MCP endpoint
Cloudflare hosts everything that benefits from global distribution, cheap serverless compute, and Cloudflare Access as the identity gate:
- D1 (the data warehouse): every number about Sodimo’s business, including the nightly Sodiwin mirror and the run_ledger table that records every AI invocation in the stack.
- R2 (object storage): off-site backups from the Gennevilliers NAS, generated artefacts, any static asset that outgrows git.
- The sodimo-core Worker: the single MCP endpoint for the entire stack. Tools for ERP reads, CRM reads and writes, run-ledger writes, email drafting, email send, and external API calls all live here. Claude.ai reaches it. Scheduled agents reach it. Nothing else claims to be an MCP server.
- The CRM site: a home-built TypeScript app inspired by Twenty and Pipedrive. Runs on Cloudflare Pages, reads and writes D1.
- Internal dashboards: small Cloudflare Pages apps that team members deploy directly from Claude Code. The savings-study dashboard is one of them, reading from run_ledger.
- WhatsApp webhook: a Worker that lands inbound messages, delegates AI work to sodimo-core, and queues outbound replies.
- The public sites: sodimo.eu and its three siblings.
Access to anything internal goes through Cloudflare Access, gated on a Google Workspace account under @sodimo.eu.
The harness — email, local models, archive, scheduled work
The harness is the Fedora bootc machine in the Gennevilliers server room. It hosts everything that belongs on hardware Sodimo physically controls:
- The mail stack — Postfix, Dovecot, rspamd, Piler.
- Local AI inference — llama.cpp fronted by llama-swap, exposed to humans via OpenWebUI.
- Paperclip — the scheduled-agent runner.
- The nightly ETL — reads Florian’s CSV dump from the NAS, loads D1.
- Caddy, Cockpit, Tailscale — the usual surrounding crew.
The chapter "The harness" details each quadlet. The point here is the boundary, not the inventory.
The wiring rule — pull-based, never inbound
Two directions of traffic cross the boundary. They are wired deliberately differently.
Harness → Cloudflare (outbound). The ETL writes to D1 over the D1 HTTP API. OpenWebUI posts run records to the run-ledger endpoint on the Worker. Paperclip’s internal Postgres mirrors to run_ledger via a cron. Backups stream to R2. Outbound connections from the harness to Cloudflare are plain HTTPS — nothing exotic.
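As an illustration of the outbound direction, the sketch below builds the kind of HTTPS request the ETL could send to the D1 HTTP API. It is a sketch under assumptions, not the actual ETL: the table name, the column names, the placeholder IDs, and the helper name build_d1_query are all hypothetical, and the request is built but not sent.

```python
import json
import urllib.request

API_BASE = "https://api.cloudflare.com/client/v4"


def build_d1_query(account_id: str, database_id: str, token: str,
                   sql: str, params: list) -> urllib.request.Request:
    """Build (but do not send) a parameterised query against the D1 HTTP API."""
    url = f"{API_BASE}/accounts/{account_id}/d1/database/{database_id}/query"
    body = json.dumps({"sql": sql, "params": params}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical table and columns, for illustration only.
req = build_d1_query(
    "ACCOUNT_ID", "DATABASE_ID", "API_TOKEN",
    "INSERT INTO erp_invoices (invoice_id, amount_eur) VALUES (?, ?)",
    ["INV-0001", "129.90"],
)
# urllib.request.urlopen(req) would send it; omitted so the sketch has no side effects.
```

The connection is opened by the harness, which is the property the wiring rule cares about.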
Cloudflare → Harness (inbound). Nothing directly. The harness exposes no inbound port. Instead, the Worker writes to Cloudflare Queues; a small systemd service on the harness polls those queues and acts on messages. The canonical case is email send: the Worker receives an email_send MCP call, writes {to, from, subject, body} to email_outbox, and returns. Seconds later a Python service on the harness drains the queue, validates the sender, hands the message to Postfix, and acknowledges.
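The per-batch logic of such a consumer might look like the sketch below. It assumes the batch has already been pulled from the queue and that each message carries a lease handle for acknowledgement. The names OutboxMessage, drain, and deliver are hypothetical, and the handling of rejected senders (reported separately rather than delivered) is an assumption, not documented behaviour.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class OutboxMessage:
    lease_id: str   # handle used to acknowledge or retry the message
    to: str
    sender: str     # the "from" field; `from` is a Python keyword
    subject: str
    body: str


def drain(messages: Iterable[OutboxMessage],
          allowed_senders: set[str],
          deliver: Callable[[OutboxMessage], None],
          ) -> tuple[list[str], list[str], list[str]]:
    """Process one pulled batch; return (acks, retries, rejected) lease ids."""
    acks, retries, rejected = [], [], []
    for msg in messages:
        if msg.sender.lower() not in allowed_senders:
            rejected.append(msg.lease_id)   # never handed to the mail server
            continue
        try:
            deliver(msg)                    # e.g. hand off to Postfix on localhost
            acks.append(msg.lease_id)
        except OSError:                     # covers smtplib and socket errors
            retries.append(msg.lease_id)    # leave it for a later poll
    return acks, retries, rejected


sent = []
batch = [
    OutboxMessage("l1", "client@example.com", "sales@sodimo.eu", "Quote", "Body text"),
    OutboxMessage("l2", "client@example.com", "intruder@example.org", "Hi", "Body text"),
]
acks, retries, rejected = drain(batch, {"sales@sodimo.eu"}, sent.append)
```

Passing deliver in as a callable keeps the policy (who may send, what gets retried) separate from the transport, so the same function can be exercised without a mail server.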
This rule applies symmetrically. Code running on the harness that needs to send email — a Paperclip agent, an auto-reply skill, anything — calls the Worker email_send tool exactly the way Claude.ai would. It does not shortcut to local sendmail. Fedora → Cloudflare → Fedora is intentional. Four properties depend on it: a unified run ledger, a single sender-authorization policy, one observability schema, and centralized rate-limiting with a dead-letter queue.
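For a harness-side caller, the request is an ordinary MCP tool call. The sketch below builds the JSON-RPC 2.0 payload for it. It assumes the tool's arguments mirror the {to, from, subject, body} fields written to email_outbox; the helper name email_send_call and the example addresses are hypothetical, and transport and authentication are left out.

```python
import json
from itertools import count

_request_ids = count(1)   # JSON-RPC request ids, unique within this process


def email_send_call(to: str, sender: str, subject: str, body: str) -> dict:
    """Build a JSON-RPC 2.0 `tools/call` request for the email_send tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {
            "name": "email_send",
            # Argument names are assumed to match the email_outbox fields.
            "arguments": {"to": to, "from": sender, "subject": subject, "body": body},
        },
    }


payload = email_send_call(
    "client@example.com", "agent@sodimo.eu", "Your order", "Shipped today."
)
wire = json.dumps(payload)   # what would be posted to the Worker's MCP endpoint
```

Nothing in the payload says where the caller runs, so a Paperclip agent and Claude.ai look the same to the Worker.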
The MCP boundary
Read left to right. Humans and agents both terminate at the Worker for anything that counts as a side-effect with an audit trail. Humans also have a separate path over Tailscale for the on-prem UIs — Piler search, OpenWebUI chat, Cockpit admin. Those paths are for humans only; agents never use them.
Why not the obvious alternatives
Why not one big on-prem box with everything on it? Two answers. D1 is already where the ERP mirror lives and where Claude.ai needs to read from. Duplicating it on-prem would double the data gravity and halve the audit surface. And the one-MCP-surface rule sidesteps a known Claude.ai OAuth bug with self-hosted MCP servers — a bug class that disappears if there is no self-hosted MCP server.
Why not push from Cloudflare to the harness over a tunnel? A tunnel means the harness exposes an inbound endpoint to the internet, which is an attack surface Sodimo would have to understand and defend. A pull queue means the harness opens zero inbound ports. If the harness is offline, messages queue for up to four days and drain when it comes back — the Worker does not care.
Why not let on-prem agents just call local services directly? Because local co-location is a physical fact, not an architectural permission. Every producer of a controlled side-effect routes through the Worker. A Paperclip agent that calls local sendmail produces an invisible send: the agent run exists, the email exists, the join between them does not. The savings study breaks, the sender-authorization policy drifts, and a runaway loop hits Postfix at full speed instead of a queue with backpressure.
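The missing join can be made concrete with a toy example. The schema below is hypothetical (the real run_ledger columns are not described here), and emails_sent stands in for whatever record of delivered mail exists. The query lists sends that cannot be tied back to any agent run.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE run_ledger (run_id TEXT PRIMARY KEY, agent TEXT);
    CREATE TABLE emails_sent (message_id TEXT PRIMARY KEY, run_id TEXT);
    INSERT INTO run_ledger VALUES ('run-1', 'auto-reply'), ('run-2', 'auto-reply');
    -- run-1 sent through the Worker, so the send carries its run_id.
    INSERT INTO emails_sent VALUES ('msg-1', 'run-1');
    -- run-2 called local sendmail: the mail left, but nothing links it to a run.
    INSERT INTO emails_sent VALUES ('msg-2', NULL);
""")

# Sends with no matching ledger row are the "invisible" ones.
orphans = db.execute("""
    SELECT e.message_id
    FROM emails_sent AS e
    LEFT JOIN run_ledger AS r ON r.run_id = e.run_id
    WHERE r.run_id IS NULL
""").fetchall()
print(orphans)  # [('msg-2',)]
```

Routing every send through the Worker is what keeps this result set empty.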
Backup
The harness backs up to the NAS nightly. The NAS replicates off-site to R2. Bare-metal restore runs on Sodimo-owned hardware in under two hours — because the harness is described by a repo of plain-text quadlets, not by the state of a running machine.
Credentials
Credentials are self-managed. Routine ones live in a personal Google password manager. The ones that matter — domain registrar, Cloudflare root, LUKS recovery keys — are written on paper. Jack and Paul both have emergency access.