After 50+ AI council sessions and building a product ecosystem on AI infrastructure, I have a working routing table. This is what I actually use.
DeepSeek-chat for expert-role debate agents. Cheap enough to run five parallel instances without thinking about cost. Smart enough to maintain a consistent expert persona across multiple debate rounds.
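The parallel-agents pattern is simple enough to sketch. This is illustrative only: `run_expert` is a hypothetical stand-in for the actual DeepSeek-chat call (which would go through an OpenAI-compatible client pointed at DeepSeek's endpoint), and the persona list is made up.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a DeepSeek-chat API call. In practice this would
# send the persona as a system prompt and the question as the user message.
def run_expert(persona: str, question: str) -> str:
    return f"[{persona}] position on: {question}"

PERSONAS = ["economist", "engineer", "designer", "lawyer", "skeptic"]

def debate_round(question: str) -> list[str]:
    # All five personas run in parallel; at DeepSeek-chat prices a full
    # round costs fractions of a cent.
    with ThreadPoolExecutor(max_workers=len(PERSONAS)) as pool:
        return list(pool.map(lambda p: run_expert(p, question), PERSONAS))
```

Keeping persona state consistent across rounds is just a matter of appending each round's outputs back into the next round's prompts.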
Gemini 2.5 Pro with thinking enabled for synthesis. When you have five agents that have produced contradictory outputs, you need something that can hold the contradiction in mind and reason through it rather than just averaging. thinking_budget=8192 does this better than anything I have tested.
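As a sketch, the synthesis request reduces to one config knob. The helper below is hypothetical; the nested field names mirror the Gemini API's thinking settings, but verify them against the current SDK before relying on them.

```python
# Hypothetical request builder for the synthesis step. The structure mirrors
# the Gemini API's thinking settings (thinking_config / thinking_budget);
# check the current SDK docs for the exact field names.
def synthesis_request(prompt: str, thinking_budget: int = 8192) -> dict:
    return {
        "model": "gemini-2.5-pro",
        "contents": prompt,
        "config": {"thinking_config": {"thinking_budget": thinking_budget}},
    }
```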
OpenAI o3 for financial and process logic. When exactness matters — unit economics, multi-step financial projections — o3 produces fewer errors than anything else.
Groq (Llama 3.3 70B) for fast, cheap tasks. UI copy, simple Q&A. Latency is under one second. Cost is near zero.
Gemini 2.0 Flash for client-facing quick queries. Fast, capable enough, cost-effective at scale.
The routing decision tree:
- Financial or logic-critical? Use o3.
- Synthesis from complex multi-source inputs? Use Gemini 2.5 Pro with thinking.
- Expert-role reasoning in a loop? Use DeepSeek-chat.
- Fast, cheap, good-enough? Use Groq or Gemini Flash.
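The decision tree collapses to a few lines of code. The task labels here are illustrative, not a fixed taxonomy; the point is that routing is a cheap, deterministic function, not another model call.

```python
def route(task: str) -> str:
    """Map a task type to a model, mirroring the decision tree above.
    Task labels are illustrative."""
    if task in ("financial", "logic_critical"):
        return "o3"
    if task == "synthesis":
        return "gemini-2.5-pro"   # with thinking enabled
    if task == "expert_debate":
        return "deepseek-chat"
    # Fast/cheap tier: Groq (Llama 3.3 70B) or Gemini 2.0 Flash.
    return "groq-llama-3.3-70b"
```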