gpt-oss-120b on Strix Halo via kyuz0 flags; smoke-validated apps
End-to-end kyuz0-aligned local-AI stack on AMD Strix Halo, validated in a full 8-container smoke test (OpenWebUI → LiteLLM → llama-swap → llama-server), plus a companion 11-target apps-smoke pass that surfaced and fixed real quadlet bugs in-tree.
Added
- `docs/README.md`, `docs/kyuz0-toolbox.md`, `docs/resync-runbook.md`: new docs tree documenting the kyuz0 interpolation map, the pinned upstream SHA (`1421e8706020e8d7e797f71b9f28cd3072e7f868`), RADV vs AMDVLK vs ROCm trade-offs on gfx1151, and a step-by-step resync runbook for future kyuz0 upstream bumps
- Mandatory kyuz0 Strix-Halo flag set applied to every model in `llama-swap.yaml`: `--no-mmap -fa on -ngl 999 --batch-size 4096 --ubatch-size 512 --cache-type-k q8_0 --cache-type-v q8_0 --jinja --direct-io --cache-prompt --cache-reuse 256 --threads 12`
- `gpt-oss-120b` (unsloth UD-Q8_K_XL, 2-shard GGUF, 65k ctx, `reasoning_effort=high`) wired as the `local-heavy` alias
Changed
- `home/dot_config/containers/systemd/llama-swap.yaml`: full rewrite. Drives `/app/llama-server` directly instead of `ramalama --runtime llama.cpp run`. Discovered during integration that `ghcr.io/mostlygeek/llama-swap:vulkan` does NOT ship a `ramalama` binary, so the pre-existing config was latently broken at first model load
- `home/dot_config/containers/systemd/llama-swap.yaml`: `proxy:` URLs switched from `http://llama-swap:NNNN` (netavark DNS round-trip) to `http://127.0.0.1:NNNN` (co-located netns). A self-DNS bug under the adguard DNAT hijack produced 502s after the first request; verified fixed (5.4 s cold, 0.08 s warm)
- `home/dot_config/containers/systemd/llama-swap.container`: volume mount `~/.local/share/ramalama` → `~/.local/share/llama-models` (raw GGUF layout, no ramalama cache format). `/dev/kfd` and `HSA_OVERRIDE_GFX_VERSION=11.0.0` retained as vestiges, candidates for future removal (tracked in #14)
- `home/dot_config/containers/systemd/litellm.yaml`: `local-heavy` model field `openai/gpt-oss-20b` → `openai/gpt-oss-120b`
- `home/dot_config/containers/systemd/twenty.container` + `twenty-worker.container`: image pinned to the fully-qualified `docker.io/twentycrm/twenty:v2.0.0` (short-name policy fails non-TTY pulls under `podman auto-update` / bootc)
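The `proxy:` change above amounts to swapping the service hostname for loopback while keeping the port. A minimal Python sketch of that rewrite follows; the helper name `to_loopback` and the example port `9001` are hypothetical and do not come from the repo.

```python
from urllib.parse import urlsplit, urlunsplit


def to_loopback(url: str, service_host: str = "llama-swap") -> str:
    """Rewrite a URL that targets `service_host` so it targets 127.0.0.1.

    URLs pointing at any other host are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.hostname != service_host:
        return url
    # Keep the original port (if any) so each model keeps its own listener.
    netloc = "127.0.0.1" if parts.port is None else f"127.0.0.1:{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))


if __name__ == "__main__":
    # 9001 is an illustrative port, not one taken from llama-swap.yaml.
    print(to_loopback("http://llama-swap:9001"))
```

Because llama-swap and the spawned `llama-server` share a network namespace, the loopback form avoids any DNS lookup for the proxy hop.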
Fixed
- `home/dot_config/containers/systemd/llama-swap.container`: dropped `GroupAdd=keep-groups` and `PodmanArgs=--group-add=render`; kept `--group-add=video` only. The prior combination was incompatible with rootless podman 5.8.2 and blocked llama-swap from reaching `Active`
- `home/dot_config/containers/systemd/postfix.container`: commented out the `Volume=postfix-main.cf` and `Volume=postfix-master.cf` lines. The referenced files don't exist in the repo yet, so podman was failing to create the mount. Postfix now boots on env-only `POSTFIX_*` config (real config authoring tracked in #18)
- `home/dot_config/sodimo/paperclip.env.tmpl`: added `BETTER_AUTH_SECRET=CHANGEME-BETTER-AUTH-32-BYTE-HEX`. Paperclip refuses to boot without it
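The `BETTER_AUTH_SECRET` placeholder is meant to be replaced before real use. One way to produce a replacement is sketched below, assuming (from the placeholder's name only) that a random 32-byte value encoded as hex is what Paperclip expects; the function names are hypothetical.

```python
import secrets


def new_better_auth_secret(num_bytes: int = 32) -> str:
    """Return `num_bytes` of cryptographically secure randomness as hex.

    32 bytes yields a 64-character hex string. The 32-byte size is an
    assumption read off the placeholder name, not a documented requirement.
    """
    return secrets.token_hex(num_bytes)


def render_env_line(secret: str) -> str:
    """Format the value as the env-file line used in paperclip.env.tmpl."""
    return f"BETTER_AUTH_SECRET={secret}"


if __name__ == "__main__":
    print(render_env_line(new_better_auth_secret()))
```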
Other
- Full 8-container AI stack on the `172.30.0.0/24` test network: `qwen3-4b` end-to-end through OpenWebUI → LiteLLM → llama-swap → llama-server: PASS, 11.5 s cold / 0.08–0.20 s warm / 53 tok/s on RADV Vulkan. Every kyuz0 flag verified verbatim in `podman top` on the spawned `llama-server` process
- Full 11-target apps stack on `172.31.0.0/24`: 9/11 Tier 1 PASS, 3/3 Tier 2 HTTP PASS (vaultwarden `/alive`, paperclip `/api/health`, twenty `/healthz`). Twenty migrations completed in ~18 s (well under the 3–10 min budget)
- gpt-oss-120b runtime deferred to the prod harness: the 100 GB UD-Q8_K_XL download was infeasible in the session window, and the dev-box kernel cmdline is not kyuz0-tuned (tracked as `sodimo/harness#11`)
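The flag verification described above can be automated. The sketch below checks a process command line against the mandatory flag set from this release; it is not the check that was actually run, and the function names, model path, and port in the example are hypothetical.

```python
import shlex

# Mandatory flag set, copied from the "Added" section of these notes.
REQUIRED_FLAGS = (
    "--no-mmap -fa on -ngl 999 --batch-size 4096 --ubatch-size 512 "
    "--cache-type-k q8_0 --cache-type-v q8_0 --jinja --direct-io "
    "--cache-prompt --cache-reuse 256 --threads 12"
)


def flag_groups(flag_string: str) -> list[list[str]]:
    """Split a flag string into [flag, value...] groups.

    A token starting with '-' opens a new group; other tokens are
    treated as values of the preceding flag.
    """
    groups: list[list[str]] = []
    for token in shlex.split(flag_string):
        if token.startswith("-") or not groups:
            groups.append([token])
        else:
            groups[-1].append(token)
    return groups


def missing_flags(cmdline: str, required: str = REQUIRED_FLAGS) -> list[str]:
    """Return required flag groups that do not appear contiguously in cmdline."""
    argv = shlex.split(cmdline)
    missing = []
    for group in flag_groups(required):
        n = len(group)
        found = any(argv[i:i + n] == group for i in range(len(argv) - n + 1))
        if not found:
            missing.append(" ".join(group))
    return missing


if __name__ == "__main__":
    # Hypothetical command line; the model path and port are placeholders.
    cmd = "/app/llama-server -m /models/example.gguf --port 9001 " + REQUIRED_FLAGS
    print(missing_flags(cmd))
```

Matching each flag together with its value catches a changed value (for example a different `--threads` count) as well as a dropped flag.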