This post is highly work-in-progress; gathering thoughts and ideas for now.
Goal#
- build a near-equivalent of Claude using locally served models using Ollama
- evaluate how close or far they are from, say, Sonnet 4.6
Mise-en-scène#
Pieces:
- Ollama post: you can now use Ollama as backend for Claude Code tool! Just need:
export ANTHROPIC_AUTH_TOKEN=ollama
export ANTHROPIC_BASE_URL=http://localhost:11434- recommendations from a
gemma4:4ebchat:- use GGUF (or GGML) format
- prefer
Q4_K_Mquantization - see if any community members have variants that they fine-tuned further on code
- increase context window (e.g., 32k+ tokens)
- recommendation (pre-Gemma4): highly quantized (Q4 or Q5) version of Mixtral or DeepSeek Coder in GGUF format
