Chapter 19 / 40

OpenWebUI — team chat with the local models

OpenWebUI is where anyone at Sodimo talks to the local AI models. Open the page, pick a model, type a question. It runs on the Framework Desktop on-site — nothing leaves the office. For scheduled or background work, the team uses Paperclip instead (chapter Paperclip).

Status: Planned — activates when the Framework Desktop is on-site and the local models are loaded. The configuration is committed in sodimo/harness; the page goes live the day the box boots.

How to get in

The page lives at chat.sodimo.eu. Requests travel Cloudflare Tunnel → Caddy → openwebui:8080. There is no login screen: anyone Cloudflare Access admits at the edge is in, and gets the same admin-level account as everyone else.

This is deliberate. Access control lives at the edge — Cloudflare Access decides who can see the URL at all, and OpenWebUI trusts the Cf-Access-Authenticated-User-Email header downstream of that. Once inside, the team shares one workspace. No password resets, no account provisioning, no forgotten logins.
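To make the trust model concrete, here is a minimal sketch of what "trusting the header downstream of the edge" means for a service behind the tunnel. It is not OpenWebUI's own code, and the helper names `edge_identity` and `admit` are hypothetical. It assumes the origin is reachable only through the Cloudflare Tunnel, so the header can only have been set by Cloudflare Access.

```python
from typing import Mapping, Optional

# Header Cloudflare Access sets on requests it has admitted.
EDGE_EMAIL_HEADER = "Cf-Access-Authenticated-User-Email"


def edge_identity(headers: Mapping[str, str]) -> Optional[str]:
    """Return the email Cloudflare Access vouched for, or None.

    Hypothetical helper. Assumes the service is reachable only through the
    Cloudflare Tunnel, so a direct caller cannot forge this header.
    """
    # HTTP header names are case-insensitive, so compare lowercased names.
    wanted = EDGE_EMAIL_HEADER.lower()
    for name, value in headers.items():
        if name.lower() == wanted:
            email = value.strip()
            return email or None
    return None


def admit(headers: Mapping[str, str]) -> bool:
    # No edge identity means the request did not come through Access: reject.
    return edge_identity(headers) is not None
```

The whole design rests on the tunnel being the only way in. A stricter service would also verify the signed Cf-Access-Jwt-Assertion header that Cloudflare Access sends; this sketch does not.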

Locked-down posture, set in openwebui.env:

WEBUI_AUTH=false
ENABLE_LOGIN_FORM=false
ENABLE_SIGNUP=false
DEFAULT_USER_ROLE=admin
ENABLE_COMMUNITY_SHARING=false
ENABLE_USER_WEBHOOKS=false
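A small check can catch drift in openwebui.env before the box boots. The sketch below is a hypothetical helper, not part of sodimo/harness. It parses simple KEY=value lines (skipping blanks and # comments) and reports any setting that is missing or differs from the posture listed above.

```python
from typing import Dict, List

# The locked-down posture this chapter specifies for openwebui.env.
EXPECTED_POSTURE = {
    "WEBUI_AUTH": "false",
    "ENABLE_LOGIN_FORM": "false",
    "ENABLE_SIGNUP": "false",
    "DEFAULT_USER_ROLE": "admin",
    "ENABLE_COMMUNITY_SHARING": "false",
    "ENABLE_USER_WEBHOOKS": "false",
}


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines; skip blanks and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes, e.g. WEBUI_AUTH="false".
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def posture_drift(text: str) -> List[str]:
    """List every expected setting that is missing or has another value."""
    actual = parse_env(text)
    problems = []
    for key, want in EXPECTED_POSTURE.items():
        got = actual.get(key)
        if got is None:
            problems.append(f"{key} is missing (expected {want})")
        elif got.lower() != want:  # case-insensitive: "False" counts as "false"
            problems.append(f"{key}={got} (expected {want})")
    return problems
```

An empty result means the file matches the posture; anything else names the offending line.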

RAG embeddings — routed through LiteLLM

On first boot, OpenWebUI’s default RAG pipeline downloads its embedding model from Hugging Face; on a locked-down box that download stalls indefinitely. Embeddings are instead routed through LiteLLM to the on-box model named local-embeddings:

RAG_EMBEDDING_ENGINE=openai
RAG_EMBEDDING_MODEL=local-embeddings
RAG_OPENAI_API_BASE_URL=http://litellm:4000
RAG_OPENAI_API_KEY=<LiteLLM master key>
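LiteLLM exposes an OpenAI-compatible API, so a call to local-embeddings has the shape of any OpenAI embeddings request. The sketch below uses only the Python standard library and assumes two things: that LiteLLM serves the /v1/embeddings route at the base URL above, and that the master key is sent as a Bearer token. The helper names are hypothetical.

```python
import json
import urllib.request
from typing import List, Sequence

# Values from the RAG_* settings above; the API key is passed in by the caller.
LITELLM_BASE = "http://litellm:4000"
EMBEDDING_MODEL = "local-embeddings"


def build_embedding_request(
    texts: Sequence[str], api_key: str, base_url: str = LITELLM_BASE
) -> urllib.request.Request:
    """Build an OpenAI-style embeddings request (assumes the /v1/embeddings route)."""
    body = json.dumps({"model": EMBEDDING_MODEL, "input": list(texts)}).encode()
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/embeddings",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def parse_embeddings(payload: bytes) -> List[List[float]]:
    """Extract vectors from an OpenAI-style response, ordered by 'index'."""
    data = json.loads(payload)["data"]
    return [item["embedding"] for item in sorted(data, key=lambda d: d["index"])]


def embed(texts: Sequence[str], api_key: str) -> List[List[float]]:
    # Only works on the box itself: litellm:4000 is assumed to resolve
    # inside the container network, not from a laptop.
    request = build_embedding_request(texts, api_key)
    with urllib.request.urlopen(request, timeout=30) as resp:
        return parse_embeddings(resp.read())
```

Keeping request building separate from sending means the request can be inspected without reaching the network.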

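The result: no Hugging Face download, no traffic off the box, and consistent token accounting, since embedding calls pass through LiteLLM like every other model call. Applied in the smoke test on 2026-04-22. Repo-side wiring is blocked on sodimo/dotfiles#15: .env files are gitignored, so each one needs a committed .env.tmpl companion.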
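One way a .env.tmpl companion could work, sketched under assumptions: the template is committed with ${NAME} placeholders for secrets, and a deploy step fills them in to produce the gitignored .env. The placeholder name LITELLM_MASTER_KEY and the helper `render_env` are hypothetical; sodimo/dotfiles#15 does not fix a format here.

```python
import os
from string import Template
from typing import Mapping, Optional

# Hypothetical openwebui.env.tmpl content: committed to git, secret left as ${...}.
EXAMPLE_TEMPLATE = """\
RAG_EMBEDDING_ENGINE=openai
RAG_EMBEDDING_MODEL=local-embeddings
RAG_OPENAI_API_BASE_URL=http://litellm:4000
RAG_OPENAI_API_KEY=${LITELLM_MASTER_KEY}
"""


def render_env(template_text: str, secrets: Optional[Mapping[str, str]] = None) -> str:
    """Fill ${NAME} placeholders from `secrets` (default: the process environment)."""
    source = os.environ if secrets is None else secrets
    # substitute() (unlike safe_substitute) raises KeyError on a missing secret,
    # so a half-rendered .env never reaches the box.
    return Template(template_text).substitute(source)
```

Only the template is committed; the rendered file stays gitignored, as it is today.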

What a session looks like

Open the page. The chat interface loads.
Pick a model from the dropdown — the locally-hosted models enabled for Sodimo.
Type a prompt. The answer streams back from the Framework Desktop. In the 2026-04-22 smoke test, the full chain delivered 53 tok/s end to end on local-task (qwen3-4b); see Local AI stack → Smoke benchmark.
Optionally, drop in a file (a PDF, a CSV, an email) and ask questions about its contents. The file stays on the box; embeddings run on local-embeddings via LiteLLM.
When a Sodimo tool is available (customer lookup, AR query, catalog match — see chapter What the AI can access), the model can call it during the answer.
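The 53 tok/s smoke-test figure translates directly into wait times. As back-of-envelope arithmetic, assuming that rate holds for the whole answer and ignoring time to first token (which this sketch does not model), a 500-token answer takes about 500 / 53 ≈ 9.4 seconds to stream. The helper `stream_seconds` is hypothetical.

```python
# Smoke-test throughput for local-task (qwen3-4b), end to end, 2026-04-22.
SMOKE_TOKENS_PER_SECOND = 53.0


def stream_seconds(answer_tokens: int, tokens_per_second: float = SMOKE_TOKENS_PER_SECOND) -> float:
    """Rough time to stream an answer; ignores time to first token."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return answer_tokens / tokens_per_second


# Print a few representative answer lengths.
for tokens in (100, 500, 2000):
    print(f"{tokens:>5} tokens ≈ {stream_seconds(tokens):.1f} s")
```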

Every turn is logged to the run ledger: which model, how many tokens, how long, whether it stayed local or escalated. Individual identities are not tracked inside OpenWebUI — the page is shared — but the aggregate usage appears on the cost dashboard.
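This chapter does not spell out the ledger's schema. Assuming each turn is one row carrying the four fields listed above (model, tokens, duration, local or escalated), the aggregate view the cost dashboard needs is a simple group-by. The field, class, and function names are hypothetical. Note that there is deliberately no user field, matching the shared page.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class LedgerTurn:
    """One chat turn as the run ledger might record it (hypothetical schema)."""
    model: str
    tokens: int
    seconds: float
    local: bool  # False if the turn escalated off-box
    # No user field: OpenWebUI is a shared page, so identity is not recorded.


@dataclass
class ModelUsage:
    turns: int = 0
    tokens: int = 0
    seconds: float = 0.0
    escalated: int = 0


def aggregate(turns: Iterable[LedgerTurn]) -> Dict[str, ModelUsage]:
    """Roll individual turns up into per-model totals for the cost dashboard."""
    totals: Dict[str, ModelUsage] = defaultdict(ModelUsage)
    for turn in turns:
        usage = totals[turn.model]
        usage.turns += 1
        usage.tokens += turn.tokens
        usage.seconds += turn.seconds
        if not turn.local:
            usage.escalated += 1
    return dict(totals)
```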

Shared prompts

OpenWebUI has a “Prompts” workspace the whole team sees. This is the team’s skill shelf: pre-authored prompts for common tasks — summarize this email, rewrite in English, translate to French, draft an order acknowledgement. Anyone can save a new prompt and the next person who opens the page sees it.

The heavier skills (morning briefs, collection-letter drafts, monthly board reports) are authored in the skills library (chapter Skills library) and run from the terminal, in Claude.ai, or on a schedule. OpenWebUI is for the ad-hoc requests that do not warrant a named skill.

When to use OpenWebUI versus Paperclip versus Claude.ai

Need	Use
Ad-hoc chat with a local model, no data leaves the building	OpenWebUI
A scheduled or background agent run (morning brief, overnight draft)	Paperclip
A harder one-off request where the local tier is not enough	Claude.ai seat
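The table reduces to two questions asked in order. The sketch below restates it as a decision; the names are hypothetical, the ordering (unattended work first) is an assumption, and it is not a router that exists in the stack.

```python
from enum import Enum


class Surface(Enum):
    OPENWEBUI = "OpenWebUI"
    PAPERCLIP = "Paperclip"
    CLAUDE_AI = "Claude.ai seat"


def pick_surface(runs_unattended: bool, local_tier_enough: bool) -> Surface:
    """Restate the table above as a decision (hypothetical helper)."""
    # Anything scheduled or running without a human watching goes to Paperclip.
    if runs_unattended:
        return Surface.PAPERCLIP
    # A one-off the local models cannot handle goes to a Claude.ai seat;
    # this is the only row where data leaves the building.
    if not local_tier_enough:
        return Surface.CLAUDE_AI
    # Everything else: ad-hoc chat with a local model, fully on-site.
    return Surface.OPENWEBUI
```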

What OpenWebUI is not

It is not the CRM, not the inbox, not a dashboard builder. Chat only, against the local models. Scheduled agents, background work, and anything that needs to run without a human watching go through Paperclip (chapter Paperclip). Cloud escalation to Claude goes through the existing Claude.ai seats, not through a model picker inside OpenWebUI.