Blog

Engineering notes

Short write-ups on design decisions, failure modes, and benchmarks from building an open-source AI browser agent.

Four vision models, one screenshot: which one is actually worth running locally for a browser agent?
We fed the same screenshot of a Google sign-in page to Gemma 4-E2B, Gemma 4-31B, Qwen3.5-27B, and Qwen3.6-35B-A3B, using the exact system prompt that WebBrain's vision sub-call ships with. The spread in OCR accuracy, latency, and token cost is wider than you'd expect, and one model quietly changed our mind about which architecture to reach for.