Learn to Build AI Tools Like This

Foundation

Three ideas behind every tool

Understand these patterns and you can build anything on this site yourself.

⚔️

Multi-Agent Debates

Multiple AI models debate a question simultaneously from different expert angles. The disagreement surfaces better answers than any single model.

🔀

Model Routing

Different models excel at different tasks. DeepSeek for parallel experts (cheap + smart). Gemini 2.5 Pro for synthesis (thinking mode). o3 for financial logic. Groq for speed.

🏛️

The Consilium Pattern

Experts debate → Synthesis model reads all views → Structured verdict. This pattern powers every tool here.

Quick Start

From zero to your first consilium

Four steps to run a working multi-agent debate engine on your machine.

1

Install dependencies

Four packages. That's all you need to get started.

pip install httpx python-dotenv groq openai

2

Set API keys

Groq and Google offer free tiers. DeepSeek is paid but very cheap, which makes it a good fit for expert agents. The snippet in step 3 only needs DEEPSEEK_API_KEY.

# .env
DEEPSEEK_API_KEY=your-key  # deepseek.com — very cheap
GROQ_API_KEY=your-key       # console.groq.com — free tier
GOOGLE_API_KEY=your-key     # ai.google.dev — free tier
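A missing key tends to surface later as a confusing HTTP 401 or a KeyError, so it can help to check the environment up front. The sketch below is one way to do that. The `missing_keys` helper is hypothetical and not part of any provider SDK; the key names are the ones from the .env example above.

```python
import os

# Key names taken from the .env example above.
REQUIRED_KEYS = ["DEEPSEEK_API_KEY", "GROQ_API_KEY", "GOOGLE_API_KEY"]

def missing_keys(env=None, required=REQUIRED_KEYS):
    """Return the required key names that are unset or empty in env."""
    env = os.environ if env is None else env
    return [k for k in required if not env.get(k)]
```

Call `missing_keys()` right after `load_dotenv()` and stop early if the returned list is not empty.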

3

Run your first consilium

Copy this snippet and set QUESTION to your business question. You get one opinion from each of 5 experts, queried in parallel.

import asyncio, httpx, os
from dotenv import load_dotenv
load_dotenv()

EXPERTS = ["CTO", "CFO", "CMO", "COO", "Risk Officer"]
QUESTION = "Should we expand to the LatAm market next quarter?"

async def ask_expert(role, question):
    # One request per expert; calls are independent, so main() can run them in parallel.
    async with httpx.AsyncClient(timeout=30) as c:
        r = await c.post("https://api.deepseek.com/chat/completions",
            headers={"Authorization": f"Bearer {os.environ['DEEPSEEK_API_KEY']}"},
            json={"model": "deepseek-chat", "messages": [
                {"role": "system", "content": f"You are an expert {role}. Be direct."},
                {"role": "user", "content": question}]})
        r.raise_for_status()  # surface HTTP errors (bad key, rate limit) instead of a KeyError below
        return role, r.json()["choices"][0]["message"]["content"]

async def main():
    results = await asyncio.gather(*[ask_expert(e, QUESTION) for e in EXPERTS])
    for role, opinion in results:
        print(f"\n[{role}]\n{opinion}")

asyncio.run(main())
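With a plain `asyncio.gather`, one failed request raises and the other four opinions are lost. A possible hardening, sketched below, passes `return_exceptions=True` and separates successes from failures. The `gather_opinions` name is hypothetical. It assumes the `ask` callable returns a `(role, text)` tuple, as `ask_expert` does above.

```python
import asyncio

async def gather_opinions(ask, roles, question):
    """Run ask(role, question) for every role; split successes from failures."""
    results = await asyncio.gather(
        *(ask(role, question) for role in roles),
        return_exceptions=True,  # exceptions come back as values instead of propagating
    )
    opinions, failures = {}, {}
    # gather() returns results in the same order as the awaitables it was given.
    for role, result in zip(roles, results):
        if isinstance(result, Exception):
            failures[role] = repr(result)
        else:
            opinions[role] = result[1]  # keep only the text from (role, text)
    return opinions, failures
```

In `main()`, this would replace the direct `asyncio.gather` call, for example `opinions, failures = await gather_opinions(ask_expert, EXPERTS, QUESTION)`.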

4

Customize expert roles for your domain

Replace EXPERTS with roles relevant to your business: Legal, Security, UX, Sales, DevOps — whatever your decision needs. Fork the full debate-engine on GitHub for synthesis, rounds, and structured reports.
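The synthesis step of the consilium pattern amounts to feeding every expert opinion back into one model. The sketch below shows only the prompt-building part, so it runs without any API call. It is not the debate-engine implementation: the `build_synthesis_prompt` name and the prompt wording are assumptions for illustration.

```python
def build_synthesis_prompt(question, opinions):
    """Combine every expert opinion into one prompt for the synthesis model.

    opinions maps a role name to that expert's answer text.
    """
    sections = [f"[{role}]\n{text.strip()}" for role, text in opinions.items()]
    return (
        f"Question: {question}\n\n"
        "Expert positions:\n\n"
        + "\n\n".join(sections)
        + "\n\nSummarize where the experts agree and disagree, "
        "then give a verdict and a confidence level."
    )
```

The returned string would be sent as the user message to whichever synthesis model you choose.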

Tutorials

Tool-by-Tool Guides

Each guide covers what the tool does, when to use it, the key code pattern, and an example output.

🏗️ Virtual CTO (Architecture)

What it does: Routes your tech question through 5 AI experts (CTO, Security, DevOps, Frontend, Backend), then synthesizes a verdict.

When to use: Architecture trade-offs, technology choices, scaling decisions, before hiring a CTO.

# Submit your tech question — 5 experts debate in parallel
EXPERTS = ["CTO", "Security Engineer", "DevOps", "Frontend", "Backend"]
question = "Should we use microservices or monolith for our MVP?"
# → 5 experts debate → Gemini 2.5 Pro synthesizes the verdict

Example Output:
[CTO] Start monolith. Ship faster, split later when you have real load data...
[Security] Monolith easier to secure at MVP stage. Microservices attack surface is large...
[DevOps] Microservices = 3x ops overhead. Only worth it above 50k req/day...

SYNTHESIS: Unanimous lean toward monolith for MVP. Revisit at 10k daily active users.

🧠 CEO Coach Bot (Strategy)

What it does: A Telegram bot powered by DeepSeek Reasoner that thinks through leadership challenges such as hiring, strategy, conflict, and culture.

When to use: Difficult people decisions, strategic pivots, weekly prioritization, morale issues.

# How to use
# 1. /start in Telegram → describe your leadership challenge
# 2. DeepSeek Reasoner (R1) thinks step-by-step
# 3. Receive structured coaching advice + action items
# Model: deepseek-reasoner — chain-of-thought reasoning

Example Output:
Challenge: "My best engineer keeps missing deadlines but produces great work eventually."

Coach: This is a deadline-setting problem, not a performance problem. Three levers: (1) involve them in estimation, (2) break milestones smaller, (3) have a direct conversation about what "deadline" means to them vs you.

📊 Financial Audit Bot (Finance)

What it does: Upload your P&L or KPI spreadsheet and get KPI analysis, anomaly detection, and a runway calculation via OpenAI o3.

When to use: Monthly reviews, investor prep, catching unexpected cost trends.

# Pattern: parse spreadsheet → o3 → structured verdict
# parse_spreadsheet() is a placeholder for your own loader (for example, pandas.read_excel)
data = parse_spreadsheet("p_and_l_q4.xlsx")
prompt = f"""Analyze this P&L. Return:
1. Top 3 anomalies vs prior period
2. Runway risk if trend continues
3. One actionable recommendation
Data: {data}"""
# model="o3" — best for structured financial reasoning

Example Output:
ANOMALY 1: COGS +18% while revenue grew only 6% — margin compression risk
ANOMALY 2: Marketing spend +31% with no corresponding lead volume increase
ANOMALY 3: Accounts receivable aging beyond 90d increased 2.3x
RUNWAY: ~8 months at current burn trajectory
RECOMMENDATION: Audit Q3 marketing contracts for vendor lock-in
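The percentage and runway figures in an output like the one above are simple arithmetic, so it is worth recomputing them locally rather than trusting the model's numbers. A minimal sketch follows. The function names are hypothetical, and the runway formula assumes a constant monthly net burn.

```python
def pct_change(previous, current):
    """Percentage change from the previous period to the current one."""
    if previous == 0:
        raise ValueError("previous period value must be non-zero")
    return (current - previous) / previous * 100

def runway_months(cash, monthly_burn):
    """Months of cash left, assuming a constant monthly net burn."""
    if monthly_burn <= 0:
        return float("inf")  # not burning cash, so runway is unbounded
    return cash / monthly_burn
```

For illustration, with made-up figures: 400,000 in cash and a burn of 50,000 per month gives a runway of 8 months, and COGS moving from 100 to 118 is an 18% increase.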

👥 Hiring Assistant (HR)

What it does: Paste a job description and CVs. Four expert evaluators (HR, Technical, Culture Fit, Risk) score each candidate and return a ranked shortlist.

When to use: Screening 10+ applicants, hiring for senior roles, reducing bias.

# 4 evaluators score each candidate independently
EVALUATORS = ["HR Specialist", "Technical Lead",
              "Culture Fit Evaluator", "Risk Assessor"]
# Each evaluator: score 1-10 on their dimension
# Final rank = weighted average across all 4

Example Output:
CANDIDATE: Alexei M. | TOTAL: 8.4/10
HR: 9/10 — Strong progression, clear motivation
Technical: 8/10 — 5y Python, no async experience yet
Culture: 8/10 — Startup background matches pace
Risk: 8/10 — Clean exit from current role
RECOMMENDATION: Strong hire. Address async gap in onboarding.
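The weighted average behind the total can be sketched in a few lines. The weights below are an assumption, since the real ones are not stated here. With HR at 0.4 and the other three at 0.2, the example scores (9, 8, 8, 8) come out to 8.4, whereas equal weights would give 8.25.

```python
# Hypothetical weights; the actual weighting used by the tool is not documented here.
WEIGHTS = {"HR": 0.4, "Technical": 0.2, "Culture": 0.2, "Risk": 0.2}

def weighted_total(scores, weights=WEIGHTS):
    """Weighted average of per-evaluator scores (each on a 1-10 scale)."""
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same evaluators")
    total_weight = sum(weights.values())
    return sum(scores[k] * weights[k] for k in weights) / total_weight

def rank(candidates):
    """Sort (name, scores) pairs by weighted total, best first."""
    return sorted(candidates, key=lambda c: weighted_total(c[1]), reverse=True)
```

Dividing by the sum of the weights keeps the result on the 1-10 scale even if the weights do not add up to exactly 1.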

🏛️ Consilium (Decision-Making)

What it does: The general-purpose debate engine. 6 specialist experts debate your business question in parallel, then Gemini 2.5 Pro reads all positions and produces a structured verdict with a confidence level.

When to use: Any high-stakes decision, such as pricing, market entry, partnerships, pivots, or org changes.

# Full consilium: 6 experts + Gemini 2.5 Pro synthesis
python3 consilium.py \
  --question "Should we enter the Brazilian iGaming market in H1 2026?" \
  --experts market_analyst,legal,cfo,sales,risk,ops \
  --synthesis gemini-2.5-pro \
  --thinking

Example Output:
DEBATE: 6 positions captured (Legal: regulatory risk; CFO: positive ROI; Market Analyst: bullish timing)

SYNTHESIS VERDICT: Conditional YES — enter H2 not H1. Regulatory prep requires 4-6 months. ROI positive at 12 months if existing LatAm clients used as anchor. Confidence: 74%.
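If you are writing your own entry point rather than using the repository's script, the four flags in the command above could be parsed with the standard library's `argparse`. This is a hypothetical re-creation, not the actual consilium.py. The defaults and help strings are assumptions.

```python
import argparse

def parse_args(argv=None):
    """Parse the flags shown in the command above (hypothetical re-creation)."""
    p = argparse.ArgumentParser(prog="consilium.py")
    p.add_argument("--question", required=True, help="business question to debate")
    p.add_argument(
        "--experts",
        # Split "legal,cfo,risk" into a list and drop empty entries.
        type=lambda s: [e.strip() for e in s.split(",") if e.strip()],
        default=[],
        help="comma-separated expert roles",
    )
    p.add_argument("--synthesis", default="gemini-2.5-pro", help="synthesis model name")
    p.add_argument("--thinking", action="store_true", help="enable thinking mode")
    return p.parse_args(argv)
```

Passing `argv=None` makes `argparse` read from the real command line, while passing a list makes the function easy to test.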

Model Routing

Right model for the right task

Using the wrong model wastes money and time. This routing table is extracted from production usage across 15+ tools.

Use Case	Model	Provider	Why
Parallel expert agents	deepseek-chat	DeepSeek	Smart + very cheap ($0.001/1K tokens)
Final synthesis	gemini-2.5-pro	Google	Best reasoning + thinking mode
Financial / process logic	o3	OpenAI	Best for structured reasoning
Fast / UX tasks	gemini-2.0-flash	Google	Free tier, fast
High volume / cheap	llama-3.3-70b	Groq	Free tier, very fast
❌ Avoid for agentic loops	claude-*	Anthropic	Too expensive at scale
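In code, a routing table like this can be a plain dictionary lookup. The sketch below mirrors the rows above. The task labels (`expert`, `synthesis`, and so on) are hypothetical, and the model names are copied from the table, so check each provider's current model IDs before using them in requests.

```python
# Mirrors the routing table above; the task keys are hypothetical labels.
ROUTES = {
    "expert": ("deepseek-chat", "DeepSeek"),
    "synthesis": ("gemini-2.5-pro", "Google"),
    "financial": ("o3", "OpenAI"),
    "fast": ("gemini-2.0-flash", "Google"),
    "bulk": ("llama-3.3-70b", "Groq"),
}

def route(task):
    """Return (model, provider) for a task label; raise on unknown labels."""
    try:
        return ROUTES[task]
    except KeyError:
        raise ValueError(f"no route for task {task!r}; known: {sorted(ROUTES)}") from None
```

Keeping the mapping in one place means a model swap is a one-line change instead of a search through every tool.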

Resources

Everything you need to start building

⚡

GitHub — debate-engine

Full source code. Clone, fork, and run your own multi-agent consilium in under an hour.

github.com/nealkhis/debate-engine

📐

Architecture Docs

Deep-dive into multi-model routing, agent orchestration, and the synthesis pipeline.

ac-architecture-docs-production.up.railway.app

🔑

Request Tool Access

Get hands-on access to the live tools: Virtual CTO, CEO Coach, Financial Audit, Hiring Assistant, Consilium.

nealkhis.com/access

💬

Questions? Telegram

Stuck on implementation or want to discuss a use case? Write to me directly. I respond to builders and operators.

@Neal_kh