# Cheap LLM for Dev/Test
Use free and budget LLMs during development to avoid burning expensive tokens.
All LLM calls in Binexia go through litellm, so swapping providers is a config change, not a code change. Providers are grouped by API protocol rather than by brand: most of them (DeepSeek, Groq, Together, Fireworks, Perplexity, GLM/Zhipu, Azure) speak the OpenAI protocol — same SDK, different endpoint. Use free tiers during development and switch to production models only for quality validation.
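To make the "config change, not a code change" point concrete, here is a minimal sketch of routed calls through litellm. The route table and `ask` helper are hypothetical stand-ins for illustration, not Binexia's actual routing schema; `completion` is injectable so the routing logic can be exercised without network access or API keys.

```python
# Illustrative sketch: agent names map to litellm model strings, and a
# provider swap means editing this table, not the calling code.
ROUTES = {
    # agent name -> litellm model string ("provider/model")
    "OrchestratorAgent": "gemini/gemini-2.0-flash",
    "AnalyticsAgent": "deepseek/deepseek-chat",
}

def ask(agent: str, prompt: str, completion=None):
    """Send a prompt through the agent's configured model.

    `completion` defaults to litellm.completion; passing a stub lets you
    test the routing without hitting any provider.
    """
    if completion is None:
        import litellm
        completion = litellm.completion
    model = ROUTES[agent]  # swapping providers = editing ROUTES above
    return completion(model=model,
                      messages=[{"role": "user", "content": prompt}])
```

Switching `OrchestratorAgent` to DeepSeek, for example, would be a one-line edit to `ROUTES` — the call sites never change.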
## Best Option: Google Gemini 2.0 Flash (Free)
Completely free tier with generous limits. One API key covers all 8 agents.
| Metric | Free Tier |
|---|---|
| Requests/min | 15 RPM |
| Tokens/min | 1,000,000 TPM |
| Requests/day | 1,500 RPD |
| Cost | $0 |
Setup:

```bash
# .env.testip.local — add this one line
GOOGLE_API_KEY=AIza...  # https://aistudio.google.com/apikey
```

litellm config (`services/litellm/config.yaml`):
```yaml
- model_name: gemini-2.0-flash
  litellm_params:
    model: gemini/gemini-2.0-flash
    api_key: os.environ/GOOGLE_API_KEY
```

Update the routing table:
```sql
UPDATE ubios_config.llm_routing
SET provider = 'google',
    model = 'gemini-2.0-flash',
    api_key_env = 'GOOGLE_API_KEY';
```

Or change per-agent in Settings → LLM Routing.
## Other Free/Budget Options
### Groq (Free Tier)
LPU-accelerated inference — very fast. Llama models only.
| Model | Free RPM | Free TPM |
|---|---|---|
| llama-3.3-70b-versatile | 30 | 12,000 |
| llama-3.1-8b-instant | 30 | 6,000 |
```bash
GROQ_API_KEY=gsk_...  # https://console.groq.com
```

Good for the Orchestrator and Context agents; less capable for complex SQL generation.
### DeepSeek V3.2 (~$0.28/1M input)
Extremely cheap, strong reasoning. Cache hits cost $0.028/1M.
```bash
DEEPSEEK_API_KEY=sk-...  # https://platform.deepseek.com
```

10,000 test queries at ~1,500 input tokens each ≈ $4.20 on cache misses, or ≈ $0.42 on cache hits.
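The per-token arithmetic behind figures like these is worth making explicit (note this counts input tokens only; output tokens are billed at a separate rate):

```python
def cost_usd(queries: int, tokens_per_query: int,
             rate_per_million_usd: float) -> float:
    """Total input-token cost for a batch of queries."""
    return queries * tokens_per_query * rate_per_million_usd / 1_000_000

# DeepSeek rates quoted above: $0.28/1M input, $0.028/1M on cache hits.
cache_miss = cost_usd(10_000, 1_500, 0.28)    # 15M tokens -> $4.20
cache_hit = cost_usd(10_000, 1_500, 0.028)    # same tokens -> $0.42
```

Plugging in other providers' rates gives a quick budget estimate before committing a test suite to a paid model.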
### Gemini 2.0 Flash-Lite (Free, Faster)
Same free tier but higher RPM (30). Use for high-volume automated tests.
## Recommended Dev Routes
| Agent | Dev Default | Cost |
|---|---|---|
| OrchestratorAgent | gemini-2.0-flash | Free |
| AnalyticsAgent | gemini-2.0-flash | Free |
| KnowledgeAgent | gemini-2.0-flash | Free |
| ContextAgent | gemini-2.0-flash-lite | Free |
| BehavioralScoringAgent | gemini-2.0-flash | Free |
| AnomalyDetectionAgent | gemini-2.0-flash | Free |
| ForecastAgent | gemini-2.0-flash | Free |
| DocumentExtractionAgent | gemini-2.0-flash | Free |
All 8 agents on one free key.
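A guard worth keeping in the test suite is an assertion that the dev routing never points at a paid model. The dict below mirrors the table above; its shape is illustrative, not Binexia's actual config format.

```python
FREE_MODELS = {"gemini-2.0-flash", "gemini-2.0-flash-lite"}

# Mirror of the recommended dev routes table:
DEV_ROUTES = {
    "OrchestratorAgent": "gemini-2.0-flash",
    "AnalyticsAgent": "gemini-2.0-flash",
    "KnowledgeAgent": "gemini-2.0-flash",
    "ContextAgent": "gemini-2.0-flash-lite",
    "BehavioralScoringAgent": "gemini-2.0-flash",
    "AnomalyDetectionAgent": "gemini-2.0-flash",
    "ForecastAgent": "gemini-2.0-flash",
    "DocumentExtractionAgent": "gemini-2.0-flash",
}

def non_free_routes(routes, free=FREE_MODELS):
    """Return agents whose routed model is not in the free set."""
    return sorted(a for a, m in routes.items() if m not in free)
```

Loading the real routes from the database instead of the literal dict turns this into a one-assert regression test against accidental paid-model routing in dev.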
## When to Switch to Paid
| Scenario | Switch to | Why |
|---|---|---|
| SQL accuracy validation | Anthropic Claude Sonnet | Production-grade text-to-SQL |
| RAG quality benchmarking | OpenAI GPT-4o | Baseline comparison |
| Load testing (>15 RPM) | DeepSeek V3.2 | Cheap, high rate limits |
| CI/CD pipeline (>1.5K req/day) | DeepSeek V3.2 | Near-zero cost |
## Cost Comparison: 1,000 Queries per Agent
| Provider | 8 agents × 1,000 queries |
|---|---|
| Google Gemini 2.0 Flash (free) | $0.00 |
| DeepSeek V3.2 | ~$3.36 |
| OpenAI GPT-4o-mini | ~$1.84 |
| Anthropic Claude Sonnet | ~$48.00 |
> **Info:** Without any LLM keys, Binexia runs in mock mode — dashboard widgets show hardcoded demo data and AI queries return canned responses. Everything else works: login, CRUD, file upload, dashboard layout, semantic model editing.
## Free Tier Gotchas
- Gemini's 15 RPM is shared across all agents — integration test suites can hit it. Add `time.sleep(4)` between agent calls if rate-limited
- Groq models change occasionally — pin a specific version if tests depend on output format
- DeepSeek can be slow during peak hours (China daytime) — use cache hits when possible
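The `time.sleep(4)` workaround generalizes into a small retry wrapper: pause and retry whenever a call is rate-limited. `RateLimited` below stands in for the client's rate-limit exception (litellm raises `litellm.RateLimitError`); `sleep` is injectable so the logic is testable without real waiting.

```python
import time

class RateLimited(Exception):
    """Stand-in for the provider client's rate-limit exception."""

def with_backoff(call, retries=3, pause=4.0, sleep=time.sleep):
    """Run `call`, pausing `pause` seconds after each rate-limit error."""
    for attempt in range(retries + 1):
        try:
            return call()
        except RateLimited:
            if attempt == retries:
                raise
            sleep(pause)  # ~4 s spacing keeps sequential calls under 15 RPM
```

Wrapping each agent call in `with_backoff` keeps integration suites running through the shared 15 RPM ceiling instead of failing on the first throttled request.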