# Setting Up LLM Providers

Step-by-step guide for configuring LLM providers in Binexia.
## How LLM Keys Work

All provider keys live in `.env.testip.local`. Two services read them:

- **Agno** reads keys directly via the `litellm` Python library (in-process, lowest latency)
- **LiteLLM Proxy** reads keys from `services/litellm/config.yaml` and exposes an OpenAI-compatible API for Dify and other services

Both use the same keys — no duplication.
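Since both services resolve keys from the same env file, a single edit updates every consumer. As a minimal sketch of that "one source of truth" idea (the parser below is illustrative, not Binexia code; the file name and key names follow this guide's examples):

```python
def parse_env(text: str) -> dict:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env

sample = """
# .env.testip.local
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
"""

keys = parse_env(sample)
# Agno reads the variable in-process; the LiteLLM Proxy resolves the
# same variable through config.yaml -- one file feeds both services.
print(keys["OPENAI_API_KEY"])
```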
## Quick Setup

Edit `.env.testip.local` and add at least one provider key:

```bash
# Cheapest option — works for all agents
OPENAI_API_KEY=sk-...

# Better reasoning — recommended for Analytics and Behavioral agents
ANTHROPIC_API_KEY=sk-ant-...

# LiteLLM proxy master key (for Dify access)
LITELLM_MASTER_KEY=sk-litellm-your-secret
```

Then restart containers:

```bash
docker compose -f docker-compose.ip-test.yml --env-file .env.testip.local up -d
```

No rebuild needed — env vars are read on container start.
## Verifying Setup

### Agno (direct)

```bash
curl http://$HOST_IP:8001/health
```

### LiteLLM Proxy

```bash
curl http://$HOST_IP:4000/health
```

Test a completion through the proxy:

```bash
curl http://$HOST_IP:4000/v1/chat/completions \
  -H "Authorization: Bearer $LITELLM_MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Hello"}], "max_tokens": 50}'
```
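The proxy answers in the standard OpenAI chat-completions format, so any OpenAI-compatible client can consume it. A sketch of pulling the reply and token count out of a response body (the JSON below is a hand-written example of the format, not captured output):

```python
import json

# Example response body in the OpenAI chat-completions format
# (field values are illustrative).
body = json.dumps({
    "model": "gpt-4o-mini",
    "choices": [
        {"index": 0,
         "message": {"role": "assistant", "content": "Hello!"},
         "finish_reason": "stop"}
    ],
    "usage": {"prompt_tokens": 8, "completion_tokens": 2, "total_tokens": 10},
})

resp = json.loads(body)
reply = resp["choices"][0]["message"]["content"]
tokens = resp["usage"]["total_tokens"]
print(reply, tokens)  # -> Hello! 10
```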
## Verifying a Provider Works

Test a simple query from the UI: log in, open a dashboard, and type a question in the Raw Query widget. If you get a chart, the LLM is working.
## LiteLLM Proxy Configuration

The proxy config is at `services/litellm/config.yaml`. It maps friendly model names to provider-specific parameters:

```yaml
model_list:
  - model_name: gpt-4o-mini
    litellm_params:
      model: openai/gpt-4o-mini
      api_key: os.environ/OPENAI_API_KEY
  - model_name: claude-sonnet
    litellm_params:
      model: anthropic/claude-sonnet-4-6
      api_key: os.environ/ANTHROPIC_API_KEY
  - model_name: glm-4-flash
    litellm_params:
      model: openai/glm-4-flash
      api_key: os.environ/GLM_API_KEY
      api_base: https://open.bigmodel.cn/api/paas/v4
```

To add a new model, add it to `config.yaml` and restart the litellm container:

```bash
docker compose -f docker-compose.ip-test.yml restart litellm
```
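The `api_key: os.environ/VAR` entries in the config are references resolved from the environment at runtime, which is how the proxy shares keys with `.env.testip.local` instead of duplicating them. A minimal sketch of that resolution convention (illustrative only, not litellm's actual implementation):

```python
import os

def resolve_secret(value: str) -> str:
    """Resolve a litellm-style 'os.environ/VAR' reference to its env value."""
    prefix = "os.environ/"
    if value.startswith(prefix):
        return os.environ.get(value[len(prefix):], "")
    return value  # literal values pass through unchanged

os.environ["OPENAI_API_KEY"] = "sk-demo"
print(resolve_secret("os.environ/OPENAI_API_KEY"))  # -> sk-demo
print(resolve_secret("sk-literal"))                 # -> sk-literal
```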
## Provider Details

Providers are grouped by API protocol. Most providers speak the OpenAI API format — same SDK, different endpoint. litellm handles the routing automatically.
### OpenAI or Compatible

These providers use the OpenAI `/v1/chat/completions` API protocol. litellm routes each to the correct endpoint.
#### OpenAI (Recommended Default)

```bash
OPENAI_API_KEY=sk-proj-...  # https://platform.openai.com/api-keys
```

- Cost: ~$0.15/1M input tokens (GPT-4o-mini), ~$2.50/1M (GPT-4o)
- Good for: Orchestrator, Knowledge, Context, Document Extraction
- Provider: `openai`
#### DeepSeek

```bash
DEEPSEEK_API_KEY=sk-...  # https://platform.deepseek.com
```

- Cost: Very cheap (~$0.14/1M tokens for DeepSeek-V3)
- Good for: Budget-conscious deployments
- Provider: `deepseek`
#### Groq

```bash
GROQ_API_KEY=gsk_...  # https://console.groq.com
```

- Very fast inference via LPU hardware
- Good for: Real-time responses, low latency
- Provider: `groq`
#### Together AI

```bash
TOGETHER_API_KEY=...  # https://api.together.ai
```

- Open-source models at scale — Llama, Mistral, FLUX
- Provider: `together`
#### Fireworks AI

```bash
FIREWORKS_API_KEY=...  # https://fireworks.ai
```

- Fast open-source model inference
- Provider: `fireworks`
#### Perplexity

```bash
PERPLEXITY_API_KEY=pplx-...  # https://perplexity.ai/settings/api
```

- Sonar models — search-augmented generation
- Provider: `perplexity`
#### GLM (Zhipu AI)

```bash
GLM_API_KEY=your-glm-key  # https://open.bigmodel.cn
```

Configure in the LLM Routing Editor:

- Provider: `glm`
- Model: `glm-4-flash` (or `glm-4-plus`, `glm-4-long`)
- API Base URL: `https://open.bigmodel.cn/api/paas/v4`
- API Key Env Var: `GLM_API_KEY`

GLM uses an OpenAI-compatible API — the `glm` provider setting automatically routes through the `openai/` litellm prefix with the correct `api_base`.
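That routing can be pictured as a small mapping from the UI's provider setting to the litellm model string and base URL (a sketch of the convention described above, not Binexia's actual router code):

```python
def to_litellm_model(provider: str, model: str):
    """Map a provider setting to a (litellm model string, api_base) pair."""
    if provider == "glm":
        # GLM is OpenAI-compatible: route via the openai/ prefix,
        # pointing api_base at Zhipu's endpoint.
        return f"openai/{model}", "https://open.bigmodel.cn/api/paas/v4"
    # Native litellm providers use their own prefix and default endpoint.
    return f"{provider}/{model}", None

print(to_litellm_model("glm", "glm-4-flash"))
# -> ('openai/glm-4-flash', 'https://open.bigmodel.cn/api/paas/v4')
print(to_litellm_model("anthropic", "claude-sonnet-4-6"))
# -> ('anthropic/claude-sonnet-4-6', None)
```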
#### Azure OpenAI

```bash
AZURE_API_KEY=...
AZURE_API_BASE=https://your-resource.openai.azure.com
AZURE_API_VERSION=2024-06-01
```

- Enterprise compliance — regional deployment, data residency
- Provider: `azure`
- Requires `api_base` set to your Azure endpoint
#### Custom (Any OpenAI-Compatible Endpoint)

For self-hosted models (Ollama, vLLM, LocalAI, etc.):

```bash
CUSTOM_LLM_API_KEY=optional-key
```

- Provider: `custom`
- Set `api_base` to your endpoint (e.g., `http://localhost:11434/v1` for Ollama)
### Anthropic

```bash
ANTHROPIC_API_KEY=sk-ant-api03-...  # https://console.anthropic.com/settings/keys
```

- Cost: ~$3/1M input tokens (Claude Sonnet)
- Good for: Analytics, Behavioral Scoring, Anomaly Detection, Forecast
- Stronger at: Complex SQL generation, reasoning, analysis
- Provider: `anthropic`
### Google Gemini

```bash
GOOGLE_API_KEY=AI...  # https://aistudio.google.com/apikey
```

- Multimodal — text, images, video
- Provider: `google`
- Models: `gemini-2.0-flash`, `gemini-2.5-pro`
- Free tier: 15 RPM, 1M TPM — great for development (see Cheap LLM for Dev)
### Mistral

```bash
MISTRAL_API_KEY=...  # https://console.mistral.ai
```

- European provider — data stays in the EU
- Provider: `mistral`
### OpenRouter (Gateway)

```bash
OPENROUTER_API_KEY=sk-or-...  # https://openrouter.ai/keys
```

- Cost: Varies by model (often cheaper than direct)
- Good for: Accessing 100+ models with one key
- Model format: `anthropic/claude-3.5-sonnet`, `meta-llama/llama-3-70b`, etc.
- Provider: `openrouter`
### Cohere

```bash
COHERE_API_KEY=...  # https://dashboard.cohere.com
```

- Enterprise NLP — search, summarization, classification
- Provider: `cohere`
## Mock Mode (No Keys)
Without any LLM keys, Binexia runs in mock mode:
- Dashboard widgets show hardcoded demo data
- AI queries return canned demo responses
- Document extraction skips LLM synthesis
- Scheduled agents don't run
Everything else works: login, CRUD, file upload, dashboard layout, semantic model editing.
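The switch between live and mock behavior hinges on whether any provider key is present. A sketch of how such a check can be made (hypothetical helper and key list; Binexia's actual detection logic may differ):

```python
import os

# Illustrative subset of the provider keys documented above.
PROVIDER_KEYS = [
    "OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GOOGLE_API_KEY",
    "DEEPSEEK_API_KEY", "GROQ_API_KEY", "MISTRAL_API_KEY",
]

def llm_mode(env=os.environ) -> str:
    """Return 'live' if any provider key is set, else 'mock'."""
    return "live" if any(env.get(k) for k in PROVIDER_KEYS) else "mock"

print(llm_mode({}))                          # -> mock
print(llm_mode({"OPENAI_API_KEY": "sk-x"}))  # -> live
```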
## Cost Estimates
Approximate cost per agent type with default models:
| Agent | Model | Avg tokens/query | Est. cost/1000 queries |
|---|---|---|---|
| Orchestrator | GPT-4o-mini | ~500 | $0.08 |
| Analytics | Claude Sonnet | ~2000 | $6.00 |
| Knowledge | GPT-4o-mini | ~1500 | $0.23 |
| Context | GPT-4o-mini | ~800 | $0.12 |
| Behavioral | Claude Sonnet | ~3000 | $9.00 |
| Anomaly | Claude Sonnet | ~2000 | $6.00 |
| Forecast | Claude Sonnet | ~2500 | $7.50 |
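The table's figures follow from flat per-token arithmetic. A sketch reproducing two rows (prices taken from the provider sections above; all figures are approximate and input-token only):

```python
def cost_per_1000_queries(tokens_per_query: int, usd_per_million: float) -> float:
    """Estimated USD for 1000 queries at a flat price per million tokens."""
    return 1000 * tokens_per_query * usd_per_million / 1_000_000

# Analytics row: Claude Sonnet at ~$3/1M input tokens, ~2000 tokens/query
print(cost_per_1000_queries(2000, 3.0))  # -> 6.0
# Orchestrator row: GPT-4o-mini at ~$0.15/1M, ~500 tokens/query (~$0.08)
print(cost_per_1000_queries(500, 0.15))
```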