gpt-oss-120b on Strix Halo via kyuz0 flags; smoke-validated apps

2026-04-22-kyuz0-strix-halo | April 22, 2026 | dotfiles (GitHub)

End-to-end kyuz0-aligned local-AI stack on AMD Strix Halo, validated by a full 8-container smoke test (OpenWebUI → LiteLLM → llama-swap → llama-server), plus a companion 11-target apps smoke pass that surfaced real quadlet bugs, now fixed in-tree.

Added

docs/README.md, docs/kyuz0-toolbox.md, docs/resync-runbook.md — new docs tree documenting the kyuz0 interpolation map, the pinned upstream SHA (1421e8706020e8d7e797f71b9f28cd3072e7f868), RADV vs AMDVLK vs ROCm trade-offs on gfx1151, and a step-by-step runbook for resyncing on future kyuz0 upstream bumps
Mandatory kyuz0 Strix-Halo flag set applied to every model in llama-swap.yaml: --no-mmap -fa on -ngl 999 --batch-size 4096 --ubatch-size 512 --cache-type-k q8_0 --cache-type-v q8_0 --jinja --direct-io --cache-prompt --cache-reuse 256 --threads 12
gpt-oss-120b (unsloth UD-Q8_K_XL, 2-shard GGUF, 65k ctx, reasoning_effort=high) wired as the local-heavy alias
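A minimal Python sketch of how the shared flag set composes with per-model arguments into one llama-server command line. The flag list is copied from the entry above; build_llama_server_argv, the model path, context size and port are hypothetical and only illustrate the shape, not the actual contents of llama-swap.yaml.

```python
import shlex

# kyuz0 Strix-Halo flag set as listed in this release.
# Each entry is (flag, value); value is None for boolean switches.
KYUZ0_FLAGS = [
    ("--no-mmap", None),
    ("-fa", "on"),
    ("-ngl", "999"),
    ("--batch-size", "4096"),
    ("--ubatch-size", "512"),
    ("--cache-type-k", "q8_0"),
    ("--cache-type-v", "q8_0"),
    ("--jinja", None),
    ("--direct-io", None),
    ("--cache-prompt", None),
    ("--cache-reuse", "256"),
    ("--threads", "12"),
]


def build_llama_server_argv(model_path, ctx_size, port, binary="/app/llama-server"):
    """Return argv for one model: per-model args first, then the shared flag set.

    Hypothetical helper: model path, context size and port are illustrative;
    the shared flags are appended identically for every model.
    """
    argv = [binary, "--model", model_path, "--ctx-size", str(ctx_size), "--port", str(port)]
    for flag, value in KYUZ0_FLAGS:
        argv.append(flag)
        if value is not None:
            argv.append(value)
    return argv


if __name__ == "__main__":
    # Hypothetical model path, context size and port.
    argv = build_llama_server_argv("/models/qwen3-4b.gguf", 32768, 9001)
    # Shell-quoted single line, e.g. for comparing against a config entry.
    print(shlex.join(argv))
```

Keeping the flags in one list makes "applied to every model" a property of the code rather than of copy-paste: a model entry cannot silently drop one of them.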

Changed

home/dot_config/containers/systemd/llama-swap.yaml — full rewrite. Now drives /app/llama-server directly instead of ramalama --runtime llama.cpp run. Integration testing showed that ghcr.io/mostlygeek/llama-swap:vulkan does NOT ship a ramalama binary, so the pre-existing config was latently broken and would have failed on the first model load
home/dot_config/containers/systemd/llama-swap.yaml — proxy: URLs switched from http://llama-swap:NNNN (netavark DNS round-trip) to http://127.0.0.1:NNNN (llama-server runs inside the llama-swap container's network namespace). The self-DNS lookup broke under the AdGuard DNAT hijack and produced 502s after the first request; verified fixed (5.4 s cold, 0.08 s warm)
home/dot_config/containers/systemd/llama-swap.container — volume mount ~/.local/share/ramalama → ~/.local/share/llama-models (raw GGUF layout, not the ramalama cache format). /dev/kfd and HSA_OVERRIDE_GFX_VERSION=11.0.0 are retained for now; both are ROCm-specific and likely vestigial with the Vulkan image, so they are candidates for future removal (tracked in #14)
home/dot_config/containers/systemd/litellm.yaml — local-heavy model field openai/gpt-oss-20b → openai/gpt-oss-120b
home/dot_config/containers/systemd/twenty.container + twenty-worker.container — image pinned to the fully-qualified docker.io/twentycrm/twenty:v2.0.0 (under the short-name policy an ambiguous short name needs an interactive prompt, so non-TTY pulls under podman auto-update / bootc fail)
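The proxy and image fixes above reduce to mechanical checks. A hedged Python sketch, assuming the llama-swap service name from this release and the common image-reference rule (the first path component is a registry only if it contains "." or ":" or is "localhost"); to_loopback_proxy and is_fully_qualified are hypothetical helpers, not part of podman or llama-swap.

```python
import re

# Matches a proxy URL that targets the llama-swap container's own DNS name.
# The service name comes from this release; the pattern itself is illustrative.
_SELF_DNS_PROXY = re.compile(r"^http://llama-swap:(\d+)(/.*)?$")


def to_loopback_proxy(url):
    """Rewrite http://llama-swap:PORT[/path] to http://127.0.0.1:PORT[/path].

    llama-server is spawned inside the llama-swap container, so loopback
    reaches it without any DNS lookup. Non-matching URLs are returned as-is.
    """
    m = _SELF_DNS_PROXY.match(url)
    if m is None:
        return url
    return f"http://127.0.0.1:{m.group(1)}{m.group(2) or ''}"


def is_fully_qualified(image):
    """True if the image reference names its registry explicitly.

    The first path component counts as a registry host only if it
    contains "." or ":" or equals "localhost".
    """
    first, sep, _ = image.partition("/")
    if not sep:
        return False  # e.g. "twenty:v2.0.0" has no registry component
    return "." in first or ":" in first or first == "localhost"
```

Run over every proxy: value and Image= line, these would flag both regressions before a non-interactive pull or a first request hits them.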

Fixed

home/dot_config/containers/systemd/llama-swap.container — dropped GroupAdd=keep-groups and PodmanArgs=--group-add=render; kept only --group-add=video. The prior combination was incompatible with rootless podman 5.8.2 and kept the llama-swap unit from reaching the active state
home/dot_config/containers/systemd/postfix.container — commented out Volume=postfix-main.cf and Volume=postfix-master.cf lines. The referenced files don't exist in the repo yet; podman was failing to create the mount. Postfix now boots on env-only POSTFIX_* config (real config authoring tracked in #18)
home/dot_config/sodimo/paperclip.env.tmpl — added BETTER_AUTH_SECRET=CHANGEME-BETTER-AUTH-32-BYTE-HEX. Paperclip refuses to boot without it
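The BETTER_AUTH_SECRET value above is a placeholder. A minimal Python sketch for generating a replacement and catching unreplaced placeholders, assuming "32-BYTE-HEX" means 32 random bytes hex-encoded (64 characters); Paperclip's exact requirement is not stated here, and both helper names are hypothetical.

```python
import secrets

# Prefix used by the placeholder in paperclip.env.tmpl.
PLACEHOLDER_PREFIX = "CHANGEME"


def new_better_auth_secret():
    """Return 32 cryptographically random bytes as 64 lowercase hex chars."""
    return secrets.token_hex(32)


def unresolved_placeholders(env_text):
    """Return keys of KEY=VALUE lines whose value still starts with CHANGEME.

    Blank lines, comments and lines without "=" are skipped.
    """
    keys = []
    for line in env_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if value.strip().startswith(PLACEHOLDER_PREFIX):
            keys.append(key.strip())
    return keys
```

A check like unresolved_placeholders on the rendered env file turns "refuses to boot" into an explicit pre-deploy failure.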

Other

Full 8-container AI stack on the 172.30.0.0/24 test network: qwen3-4b end to end through OpenWebUI → LiteLLM → llama-swap → llama-server PASS (11.5 s cold, 0.08–0.20 s warm, 53 tok/s on RADV Vulkan). Every kyuz0 flag verified verbatim in podman top output for the spawned llama-server process
Full 11-target apps stack on 172.31.0.0/24: 9/11 Tier 1 PASS, 3/3 Tier 2 HTTP PASS (vaultwarden /alive, paperclip /api/health, twenty /healthz). Twenty migrations completed in ~18 s (well under the 3–10 min budget)
gpt-oss-120b runtime validation deferred to the prod harness: the 100 GB UD-Q8_K_XL download was infeasible within the session window, and the dev box's kernel cmdline is not kyuz0-tuned (tracked as sodimo/harness#11)
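The "every kyuz0 flag verified verbatim" check can be sketched in Python as below, assuming the spawned process's full command line is available as one string (for example, copied from podman top output). missing_kyuz0_flags is a hypothetical helper; a valued flag only counts when its value follows it directly, so -ngl 99 would not satisfy -ngl 999.

```python
import shlex

# kyuz0 Strix-Halo flag set as listed in this release: (flag, value or None).
KYUZ0_FLAGS = [
    ("--no-mmap", None),
    ("-fa", "on"),
    ("-ngl", "999"),
    ("--batch-size", "4096"),
    ("--ubatch-size", "512"),
    ("--cache-type-k", "q8_0"),
    ("--cache-type-v", "q8_0"),
    ("--jinja", None),
    ("--direct-io", None),
    ("--cache-prompt", None),
    ("--cache-reuse", "256"),
    ("--threads", "12"),
]


def missing_kyuz0_flags(cmdline):
    """Return the kyuz0 flags (with values) absent from a process command line."""
    argv = shlex.split(cmdline)
    missing = []
    for flag, value in KYUZ0_FLAGS:
        positions = [i for i, tok in enumerate(argv) if tok == flag]
        if value is None:
            ok = bool(positions)
        else:
            # The expected value must immediately follow some occurrence of the flag.
            ok = any(i + 1 < len(argv) and argv[i + 1] == value for i in positions)
        if not ok:
            missing.append(flag if value is None else f"{flag} {value}")
    return missing
```

An empty result means every flag and value is present; anything else names exactly what drifted.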