Pre
Apr 17
Gemma-4 26B MoE — production inference server
10.44 tok/s
Q5_K_M · MoE · 8 active experts
Serves on port :8081. mlock is required; without it the model pages out of memory. Key research finding: E-cores are load-bearing for MoE expert dispatch.
Restricting inference to P-cores dropped throughput from 10.44 to 7.73 tok/s. Optimal settings: threads=8, cpu_mask=3F5. The autoresearch loop ran 15+ experiments. This was the production server for all agents before the tower existed.
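
To make the numbers above easier to read, here is a minimal sketch that decodes the mask and quantifies the P-core-only slowdown. It assumes cpu_mask=3F5 is a hexadecimal affinity mask in which bit i stands for logical CPU i (the usual convention, but not confirmed by this entry). The function names are hypothetical, and the sketch does not say which of the selected CPUs are P-cores or E-cores, since that depends on the machine's topology.

```python
def cpus_in_mask(mask_hex: str) -> list[int]:
    """Return the logical CPU indices whose bits are set in a hex affinity mask.

    Assumption: bit i of the mask corresponds to logical CPU i.
    """
    mask = int(mask_hex, 16)
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


def relative_drop(baseline: float, restricted: float) -> float:
    """Fractional throughput loss when going from baseline to restricted."""
    return (baseline - restricted) / baseline


cpus = cpus_in_mask("3F5")
print(cpus)       # [0, 2, 4, 5, 6, 7, 8, 9]
print(len(cpus))  # 8, matching threads=8

# Throughput figures taken from the entry above (tok/s).
print(f"{relative_drop(10.44, 7.73):.1%}")  # 26.0%
```

Under that assumption the mask selects exactly eight logical CPUs, one per thread, and restricting to P-cores costs roughly a quarter of the throughput.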