Cheap LLM for Dev/Test

Use free and budget LLMs during development to avoid burning expensive tokens.

All LLM calls in Binexia go through litellm — swapping providers is a config change, not a code change. Providers are grouped by API protocol (not individual brands): most providers (DeepSeek, Groq, Together, Fireworks, Perplexity, GLM/Zhipu, Azure) speak the OpenAI protocol — same SDK, different endpoint. Use free tiers during development, switch to production models only for quality validation.
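A minimal sketch of what "config change, not code change" means in practice, using litellm's standard Python API. The model names and prompt are illustrative:

```python
# litellm routes on a "provider/model" prefix, so switching providers
# is just a different model string -- the call site never changes.
def litellm_model_string(provider: str, model: str) -> str:
    """Build the provider-prefixed model name that litellm expects."""
    return f"{provider}/{model}"

# The call itself is identical for every provider (it only needs the
# matching API key in the environment):
#
# import litellm
# resp = litellm.completion(
#     model=litellm_model_string("gemini", "gemini-2.0-flash"),
#     messages=[{"role": "user", "content": "ping"}],
# )

print(litellm_model_string("deepseek", "deepseek-chat"))  # deepseek/deepseek-chat
```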

Best Option: Google Gemini 2.0 Flash (Free)

Completely free tier with generous limits. One API key covers all 8 agents.

Metric          Free Tier
Requests/min    15 RPM
Tokens/min      1,000,000 TPM
Requests/day    1,500 RPD
Cost            $0

Setup:

# .env.testip.local — add this one line
GOOGLE_API_KEY=AIza...    # https://aistudio.google.com/apikey

litellm config (services/litellm/config.yaml):

- model_name: gemini-2.0-flash
  litellm_params:
    model: gemini/gemini-2.0-flash
    api_key: os.environ/GOOGLE_API_KEY
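A quick smoke test against the proxy, sketched with only the standard library. The URL assumes litellm's default port 4000; adjust for your deployment:

```python
# Send one OpenAI-style chat request through the litellm proxy.
import json
import urllib.request

def chat_payload(model: str, prompt: str) -> dict:
    """OpenAI-compatible chat body -- the same shape for every provider behind litellm."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

payload = chat_payload("gemini-2.0-flash", "Reply with the word pong.")
req = urllib.request.Request(
    "http://localhost:4000/v1/chat/completions",   # assumed proxy address
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# Uncomment with the proxy running:
# resp = urllib.request.urlopen(req)
# print(json.load(resp)["choices"][0]["message"]["content"])
```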

Update routing table:

UPDATE ubios_config.llm_routing
SET provider = 'google',
    model = 'gemini-2.0-flash',
    api_key_env = 'GOOGLE_API_KEY';

Or change per-agent in Settings → LLM Routing.
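If you only want to move one agent (say, ContextAgent to the lite model), a per-agent variant might look like the following. The `agent` column name is a guess, not confirmed by the schema above; check your table definition first:

```sql
UPDATE ubios_config.llm_routing
SET model = 'gemini-2.0-flash-lite'
WHERE agent = 'ContextAgent';    -- 'agent' column name is an assumption
```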

Other Free/Budget Options

Groq (Free Tier)

LPU-accelerated inference with very low latency. Hosts primarily Llama-family models.

Model                      Free RPM    Free TPM
llama-3.3-70b-versatile    30          12,000
llama-3.1-8b-instant       30          6,000

GROQ_API_KEY=gsk_...    # https://console.groq.com

Good for Orchestrator and Context agents. Less capable for complex SQL generation.
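To put Groq behind the same litellm proxy, a parallel config entry could look like this, using litellm's `groq/` prefix. Verify the model id against Groq's current catalog, since models rotate:

```yaml
- model_name: llama-3.3-70b-versatile
  litellm_params:
    model: groq/llama-3.3-70b-versatile
    api_key: os.environ/GROQ_API_KEY
```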

DeepSeek V3.2 (~$0.28/1M input)

Extremely cheap, strong reasoning. Cache hits cost $0.028/1M.

DEEPSEEK_API_KEY=sk-...    # https://platform.deepseek.com

10,000 test queries at ~1,500 tokens each ≈ $4.20 at cache-miss pricing, or ~$0.42 on cache hits.
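A back-of-envelope check of those numbers, using the per-million-token prices quoted above:

```python
# Cost of N queries at a given average token count and price per 1M tokens.
def token_cost_usd(n_queries: int, tokens_per_query: int, usd_per_million_tokens: float) -> float:
    return n_queries * tokens_per_query * usd_per_million_tokens / 1_000_000

print(token_cost_usd(10_000, 1_500, 0.28))    # cache miss:  ~4.20
print(token_cost_usd(10_000, 1_500, 0.028))   # cache hit:   ~0.42
```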

Gemini 2.0 Flash-Lite (Free, Faster)

Same free tier but higher RPM (30). Use for high-volume automated tests.

Agent                      Dev Default             Cost
OrchestratorAgent          gemini-2.0-flash        Free
AnalyticsAgent             gemini-2.0-flash        Free
KnowledgeAgent             gemini-2.0-flash        Free
ContextAgent               gemini-2.0-flash-lite   Free
BehavioralScoringAgent     gemini-2.0-flash        Free
AnomalyDetectionAgent      gemini-2.0-flash        Free
ForecastAgent              gemini-2.0-flash        Free
DocumentExtractionAgent    gemini-2.0-flash        Free

All 8 agents on one free key.

When to Switch to Paid

Scenario                         Switch to                  Why
SQL accuracy validation          Anthropic Claude Sonnet    Production-grade text-to-SQL
RAG quality benchmarking         OpenAI GPT-4o              Baseline comparison
Load testing (>15 RPM)           DeepSeek V3.2              Cheap, high rate limits
CI/CD pipeline (>1.5K req/day)   DeepSeek V3.2              Near-zero cost

Cost Comparison: 1,000 Queries per Agent

Provider                          8 agents × 1,000 queries
Google Gemini 2.0 Flash (free)    $0.00
DeepSeek V3.2                     ~$3.40 (~$0.34 with cache hits)
OpenAI GPT-4o-mini                ~$1.84
Anthropic Claude Sonnet           ~$48.00

Info

Without any LLM keys, Binexia runs in mock mode — dashboard widgets show hardcoded demo data, AI queries return canned responses. Everything else works: login, CRUD, file upload, dashboard layout, semantic model editing.

Free Tier Gotchas

  • Gemini 15 RPM is shared across all agents — integration test suites can hit this. Add time.sleep(4) between agent calls if rate-limited
  • Groq models change occasionally — pin a specific version if tests depend on output format
  • DeepSeek can be slow during peak hours (China daytime) — use cache hits when possible
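Instead of sprinkling fixed sleeps through a test suite, a small throttle keeps every agent call under the shared RPM budget. This is a generic sketch, not Binexia API; `call_agent` is a hypothetical placeholder for the real LLM call:

```python
import time

def rate_limited(calls_per_minute: int):
    """Decorator that sleeps just enough so the wrapped function is
    invoked at most calls_per_minute times per minute."""
    min_interval = 60.0 / calls_per_minute
    def wrap(fn):
        last_call = [0.0]  # monotonic timestamp of the previous call
        def inner(*args, **kwargs):
            wait = min_interval - (time.monotonic() - last_call[0])
            if wait > 0:
                time.sleep(wait)
            last_call[0] = time.monotonic()
            return fn(*args, **kwargs)
        return inner
    return wrap

@rate_limited(15)           # Gemini free-tier budget, shared by all agents
def call_agent(prompt: str) -> str:
    ...                     # the real LLM call goes here
    return "ok"
```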