Chapter 19 / 40
OpenWebUI — team chat with the local models
OpenWebUI is where anyone at Sodimo talks to the local AI models. Open the page, pick a model, type a question. It runs on the Framework Desktop on-site — nothing leaves the office. For scheduled or background work, the team uses Paperclip instead (chapter Paperclip).
Status: Planned — activates when the Framework Desktop is on-site and the local models are loaded. The configuration is committed in sodimo/harness; the page goes live the day the box boots.
How to get in
The page lives at chat.sodimo.eu, reached through the Cloudflare Tunnel → Caddy → openwebui:8080. There is no login screen. Anyone Cloudflare Access admits at the edge is in, and gets the same admin-level account as everyone else.
This is deliberate. Access control lives at the edge — Cloudflare Access decides who can see the URL at all, and OpenWebUI trusts the Cf-Access-Authenticated-User-Email header downstream of that. Once inside, the team shares one workspace. No password resets, no account provisioning, no forgotten logins.
Locked-down posture, set in openwebui.env:
WEBUI_AUTH=false
ENABLE_LOGIN_FORM=false
ENABLE_SIGNUP=false
DEFAULT_USER_ROLE=admin
ENABLE_COMMUNITY_SHARING=false
ENABLE_USER_WEBHOOKS=false
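The posture can be checked mechanically before the box boots. This is a minimal sketch. It assumes openwebui.env holds plain KEY=value lines. The helper names parse_env and check_posture are hypothetical and are not part of OpenWebUI or sodimo/harness.

```python
"""Sketch: verify that an openwebui.env file carries the locked-down posture.

Assumes plain KEY=value lines (no `export`, no multi-line values).
parse_env and check_posture are hypothetical helpers, not OpenWebUI code.
"""

# Expected values, copied from the posture listed in this chapter.
EXPECTED = {
    "WEBUI_AUTH": "false",
    "ENABLE_LOGIN_FORM": "false",
    "ENABLE_SIGNUP": "false",
    "DEFAULT_USER_ROLE": "admin",
    "ENABLE_COMMUNITY_SHARING": "false",
    "ENABLE_USER_WEBHOOKS": "false",
}


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=value lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes so KEY="false" and KEY=false compare equal.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def check_posture(text: str) -> list[str]:
    """Return one message per missing or mismatched key; empty list means OK."""
    values = parse_env(text)
    problems = []
    for key, want in EXPECTED.items():
        got = values.get(key)
        if got is None:
            problems.append(f"{key} is missing (want {want})")
        elif got.lower() != want:
            problems.append(f"{key}={got} (want {want})")
    return problems
```

Run it against the committed file with check_posture(Path("openwebui.env").read_text()). An empty list means every key matches. This sketch deliberately compares values case-insensitively.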
RAG embeddings — routed through LiteLLM
OpenWebUI’s default RAG pipeline fetches embedding models from HuggingFace on first boot, which stalls indefinitely on a locked-down box. Embeddings are instead routed through LiteLLM to the locally hosted local-embeddings model:
RAG_EMBEDDING_ENGINE=openai
RAG_EMBEDDING_MODEL=local-embeddings
RAG_OPENAI_API_BASE_URL=http://litellm:4000
RAG_OPENAI_API_KEY=<LiteLLM master key>
The result is no HuggingFace fetch, no traffic off-box, and consistent token accounting. This was applied during the 2026-04-22 smoke test. Repo-side wiring is blocked on sodimo/dotfiles#15: .env files are gitignored, so they need .env.tmpl companions.
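For reference, the request OpenWebUI sends to LiteLLM for each embedding batch looks roughly like the following. This is a sketch. It assumes LiteLLM serves its OpenAI-compatible POST /embeddings route and that OpenWebUI appends /embeddings to RAG_OPENAI_API_BASE_URL. build_embeddings_request is a hypothetical helper, and the key stays the placeholder from the config.

```python
"""Sketch: the OpenAI-style embeddings request aimed at LiteLLM.

Assumptions: LiteLLM exposes POST /embeddings (OpenAI-compatible) and
OpenWebUI appends /embeddings to RAG_OPENAI_API_BASE_URL.
build_embeddings_request is hypothetical. The request is built, not sent.
"""
import json
import urllib.request


def build_embeddings_request(
    texts: list[str],
    base_url: str = "http://litellm:4000",
    api_key: str = "<LiteLLM master key>",
    model: str = "local-embeddings",
) -> urllib.request.Request:
    """Build (but do not send) an embeddings request for LiteLLM."""
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/embeddings",
        data=body,
        headers={
            # LiteLLM accepts its master key as a standard bearer token.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_embeddings_request(["Sample sentence from an uploaded PDF."])
print(req.get_method(), req.full_url)
```

On the box, urllib.request.urlopen(req) would send it. The response should follow the OpenAI embeddings shape, with one vector per input under data.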
What a session looks like
- Open the page. The chat interface loads.
- Pick a model from the dropdown: the locally hosted models enabled for Sodimo.
- Type a prompt. The answer streams back from the Framework Desktop. Observed end-to-end in the 2026-04-22 smoke test: 53 tok/s on local-task (qwen3-4b) through the full chain (see Local AI stack → Smoke benchmark).
- Optionally, drop in a file (a PDF, a CSV, an email) and ask questions about its contents. The file stays on the box; embeddings run on local-embeddings via LiteLLM.
- When a Sodimo tool is available (customer lookup, AR query, catalog match; see chapter What the AI can access), the model can call it during the answer.
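The smoke figure can be translated into everyday terms with a quick worked estimate. This sketch assumes streaming time scales linearly with answer length at the measured 53 tok/s. It also ignores time-to-first-token (prompt processing), so real waits run somewhat longer. The answer lengths are illustrative, not measured.

```python
"""Sketch: turning the smoke-test throughput into rough wait times.

53 tok/s is the 2026-04-22 smoke figure for local-task (qwen3-4b).
Ignores time-to-first-token; answer lengths below are illustrative only.
"""

SMOKE_TOK_PER_S = 53.0


def tokens_per_second(tokens: int, seconds: float) -> float:
    """Observed decode throughput for one streamed answer."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return tokens / seconds


def streaming_seconds(answer_tokens: int, tok_per_s: float = SMOKE_TOK_PER_S) -> float:
    """Approximate time to stream an answer of the given length."""
    return answer_tokens / tok_per_s


for n in (100, 400, 1000):
    print(f"{n:>5} tokens ~ {streaming_seconds(n):.1f} s")
```

At this rate, a short email summary of a few hundred tokens finishes streaming in well under ten seconds.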
Every turn is logged to the run ledger: which model, how many tokens, how long, whether it stayed local or escalated. Individual identities are not tracked inside OpenWebUI — the page is shared — but the aggregate usage appears on the cost dashboard.
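The following is a minimal sketch of one ledger row and the per-model roll-up the cost dashboard could draw from. The field names and the LedgerEntry and aggregate helpers are hypothetical. The chapter only fixes what is recorded (model, tokens, duration, local or escalated). The row deliberately carries no user field, because the page is shared.

```python
"""Sketch: one run-ledger row per chat turn, rolled up per model.

LedgerEntry and aggregate are hypothetical; the chapter only says each turn
records model, token count, duration, and whether it stayed local.
No user field: OpenWebUI is a shared page, so identities are not tracked.
"""
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LedgerEntry:
    model: str
    tokens: int
    seconds: float
    local: bool  # False means the turn escalated off the box


def aggregate(entries: list[LedgerEntry]) -> dict[str, dict[str, float]]:
    """Total turns, tokens, seconds and local turns for each model."""
    totals: dict[str, dict[str, float]] = defaultdict(
        lambda: {"turns": 0, "tokens": 0, "seconds": 0.0, "local_turns": 0}
    )
    for e in entries:
        t = totals[e.model]
        t["turns"] += 1
        t["tokens"] += e.tokens
        t["seconds"] += e.seconds
        if e.local:
            t["local_turns"] += 1
    return dict(totals)
```

Keeping the roll-up keyed by model rather than by person matches the shared-workspace design: the dashboard can show cost and local share without attributing turns to anyone.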
Shared prompts
OpenWebUI has a “Prompts” workspace the whole team sees. This is the team’s skill shelf: pre-authored prompts for common tasks — summarize this email, rewrite in English, translate to French, draft an order acknowledgement. Anyone can save a new prompt and the next person who opens the page sees it.
The heavier skills — morning briefs, collection-letter drafts, monthly board reports — are authored in the skills library (chapter Skills library) and run through the terminal, Claude.ai, or on a schedule. OpenWebUI is for the ad-hoc requests that do not deserve a named skill.
When to use OpenWebUI versus Paperclip versus Claude.ai
| Need | Use |
|---|---|
| Ad-hoc chat with a local model, no data leaves the building | OpenWebUI |
| A scheduled or background agent run (morning brief, overnight draft) | Paperclip |
| A harder one-off request where the local tier is not enough | Claude.ai seat |
What OpenWebUI is not
It is not the CRM, not the inbox, not a dashboard builder. Chat only, against the local models. Scheduled agents, background work, and anything that needs to run without a human watching go through Paperclip (chapter Paperclip). Cloud escalation to Claude goes through the existing Claude.ai seats, not through a model picker inside OpenWebUI.