# Cheap LLM for Dev/Test
Use free and budget LLMs during development to avoid burning expensive tokens.
All LLM calls in Binexia go through litellm, so swapping providers is a config change, not a code change. Providers are grouped by API protocol rather than by brand: most of them (DeepSeek, Groq, Together, Fireworks, Perplexity, GLM/Zhipu, Azure) speak the OpenAI protocol — same SDK, different endpoint. Use free tiers during development and switch to production models only for quality validation.
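To make the "config change, not a code change" point concrete, here is a minimal sketch of routed calls through litellm. The route table and `ask` helper are hypothetical stand-ins for illustration, not Binexia's actual routing schema; `completion` is injectable so the routing logic can be exercised without network access or API keys.

```python
# Illustrative sketch: agent names map to litellm model strings, and a
# provider swap means editing this table, not the calling code.
ROUTES = {
    # agent name -> litellm model string ("provider/model")
    "OrchestratorAgent": "gemini/gemini-2.0-flash",
    "AnalyticsAgent": "deepseek/deepseek-chat",
}

def ask(agent: str, prompt: str, completion=None):
    """Send a prompt through the agent's configured model.

    `completion` defaults to litellm.completion; passing a stub lets you
    test the routing without hitting any provider.
    """
    if completion is None:
        import litellm
        completion = litellm.completion
    model = ROUTES[agent]  # swapping providers = editing ROUTES above
    return completion(model=model,
                      messages=[{"role": "user", "content": prompt}])
```

Switching `OrchestratorAgent` to DeepSeek, for example, would be a one-line edit to `ROUTES` — the call sites never change.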
## Best Option: Google Gemini 2.0 Flash (Free)
Completely free tier with generous limits. One API key covers all 8 agents.
| Metric | Free Tier |
|---|---|
| Requests/min | 15 RPM |
| Tokens/min | 1,000,000 TPM |
| Requests/day | 1,500 RPD |
| Cost | $0 |
Setup:

```bash
# .env.testip.local — add this one line
GOOGLE_API_KEY=AIza...  # https://aistudio.google.com/apikey
```

litellm config (`services/litellm/config.yaml`):
```yaml
- model_name: gemini-2.0-flash
  litellm_params:
    model: gemini/gemini-2.0-flash
    api_key: os.environ/GOOGLE_API_KEY
```

Update the routing table:
```sql
UPDATE ubios_config.llm_routing
SET provider = 'google',
    model = 'gemini-2.0-flash',
    api_key_env = 'GOOGLE_API_KEY';
```

Or change per-agent in Settings → LLM Routing.
## Other Free/Budget Options
### Groq (Free Tier)
LPU-accelerated inference — very fast. Llama models only.
| Model | Free RPM | Free TPM |
|---|---|---|
| llama-3.3-70b-versatile | 30 | 12,000 |
| llama-3.1-8b-instant | 30 | 6,000 |
```bash
GROQ_API_KEY=gsk_...  # https://console.groq.com
```

Good for the Orchestrator and Context agents; less capable for complex SQL generation.
### DeepSeek V3.2 (~$0.28/1M input)
Extremely cheap, strong reasoning. Cache hits cost $0.028/1M.
```bash
DEEPSEEK_API_KEY=sk-...  # https://platform.deepseek.com
```

10,000 test queries at ~1,500 input tokens each ≈ $4.20 on cache misses, or ≈ $0.42 on cache hits.
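The per-token arithmetic behind figures like these is worth making explicit (note this counts input tokens only; output tokens are billed at a separate rate):

```python
def cost_usd(queries: int, tokens_per_query: int,
             rate_per_million_usd: float) -> float:
    """Total input-token cost for a batch of queries."""
    return queries * tokens_per_query * rate_per_million_usd / 1_000_000

# DeepSeek rates quoted above: $0.28/1M input, $0.028/1M on cache hits.
cache_miss = cost_usd(10_000, 1_500, 0.28)    # 15M tokens -> $4.20
cache_hit = cost_usd(10_000, 1_500, 0.028)    # same tokens -> $0.42
```

Plugging in other providers' rates gives a quick budget estimate before committing a test suite to a paid model.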
### Gemini 2.0 Flash-Lite (Free, Faster)
Same free tier but higher RPM (30). Use for high-volume automated tests.
## Recommended Dev Routes
| Agent | Dev Default | Cost |
|---|---|---|
| OrchestratorAgent | gemini-2.0-flash | Free |
| AnalyticsAgent | gemini-2.0-flash | Free |
| KnowledgeAgent | gemini-2.0-flash | Free |
| ContextAgent | gemini-2.0-flash-lite | Free |
| BehavioralScoringAgent | gemini-2.0-flash | Free |
| AnomalyDetectionAgent | gemini-2.0-flash | Free |
| ForecastAgent | gemini-2.0-flash | Free |
| DocumentExtractionAgent | gemini-2.0-flash | Free |
All 8 agents on one free key.
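A guard worth keeping in the test suite is an assertion that the dev routing never points at a paid model. The dict below mirrors the table above; its shape is illustrative, not Binexia's actual config format.

```python
FREE_MODELS = {"gemini-2.0-flash", "gemini-2.0-flash-lite"}

# Mirror of the recommended dev routes table:
DEV_ROUTES = {
    "OrchestratorAgent": "gemini-2.0-flash",
    "AnalyticsAgent": "gemini-2.0-flash",
    "KnowledgeAgent": "gemini-2.0-flash",
    "ContextAgent": "gemini-2.0-flash-lite",
    "BehavioralScoringAgent": "gemini-2.0-flash",
    "AnomalyDetectionAgent": "gemini-2.0-flash",
    "ForecastAgent": "gemini-2.0-flash",
    "DocumentExtractionAgent": "gemini-2.0-flash",
}

def non_free_routes(routes, free=FREE_MODELS):
    """Return agents whose routed model is not in the free set."""
    return sorted(a for a, m in routes.items() if m not in free)
```

Loading the real routes from the database instead of the literal dict turns this into a one-assert regression test against accidental paid-model routing in dev.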
## When to Switch to Paid
| Scenario | Switch to | Why |
|---|---|---|
| SQL accuracy validation | Anthropic Claude Sonnet | Production-grade text-to-SQL |
| RAG quality benchmarking | OpenAI GPT-4o | Baseline comparison |
| Load testing (>15 RPM) | DeepSeek V3.2 | Cheap, high rate limits |
| CI/CD pipeline (>1.5K req/day) | DeepSeek V3.2 | Near-zero cost |
## Cost Comparison: 1,000 Queries per Agent
| Provider | 8 agents × 1,000 queries |
|---|---|
| Google Gemini 2.0 Flash (free) | $0.00 |
| DeepSeek V3.2 | ~$3.36 |
| OpenAI GPT-4o-mini | ~$1.84 |
| Anthropic Claude Sonnet | ~$48.00 |
> **Info:** Without any LLM keys, Binexia runs in mock mode — dashboard widgets show hardcoded demo data and AI queries return canned responses. Everything else works: login, CRUD, file upload, dashboard layout, semantic model editing.
## Free Tier Gotchas
- Gemini's 15 RPM is shared across all agents — integration test suites can hit it. Add `time.sleep(4)` between agent calls if rate-limited
- Groq models change occasionally — pin a specific version if tests depend on output format
- DeepSeek can be slow during peak hours (China daytime) — use cache hits when possible
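The `time.sleep(4)` workaround generalizes into a small retry wrapper: pause and retry whenever a call is rate-limited. `RateLimited` below stands in for the client's rate-limit exception (litellm raises `litellm.RateLimitError`); `sleep` is injectable so the logic is testable without real waiting.

```python
import time

class RateLimited(Exception):
    """Stand-in for the provider client's rate-limit exception."""

def with_backoff(call, retries=3, pause=4.0, sleep=time.sleep):
    """Run `call`, pausing `pause` seconds after each rate-limit error."""
    for attempt in range(retries + 1):
        try:
            return call()
        except RateLimited:
            if attempt == retries:
                raise
            sleep(pause)  # ~4 s spacing keeps sequential calls under 15 RPM
```

Wrapping each agent call in `with_backoff` keeps integration suites running through the shared 15 RPM ceiling instead of failing on the first throttled request.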