# Cost Optimization
Reduce LLM costs by 30–70% without meaningful quality degradation.
The `agentlens.cost_optimizer` module analyzes agent event patterns and recommends cheaper model alternatives where task complexity doesn't require expensive models.
## Quick Start

```python
from agentlens.cost_optimizer import CostOptimizer
from agentlens.models import AgentEvent

optimizer = CostOptimizer()

events = [
    AgentEvent(model="gpt-4o", tokens_in=500, tokens_out=100,
               event_type="llm_call"),
    AgentEvent(model="gpt-4-turbo", tokens_in=50, tokens_out=10,
               event_type="classification"),
]

report = optimizer.analyze(events)
print(f"Current cost: ${report.current_cost_usd:.4f}")
print(f"Optimized: ${report.optimized_cost_usd:.4f}")
print(f"Savings: ${report.total_savings_usd:.4f} ({report.total_savings_pct}%)")
```
## How It Works
The optimizer follows a three-step pipeline for every analyzable event:
| Step | What Happens | Key Factors |
|---|---|---|
| 1. Complexity Assessment | Scores each event from 0.0 (trivial) to 1.0 (critical) | Output ratio, token volume, tool calls, decision traces, event type |
| 2. Model Matching | Finds the cheapest model in the recommended tier that fits | Same-provider preference, context window compatibility |
| 3. Savings Validation | Filters out recommendations below the minimum savings threshold | Savings percentage, confidence level, aggressive mode |
## Complexity Levels

The `ComplexityAnalyzer` maps each event to one of five levels, each associated with a recommended model tier:
| Level | Score Range | Recommended Tier | Typical Tasks |
|---|---|---|---|
| Trivial | < 0.15 | Economy | Formatting, simple extraction |
| Low | 0.15 – 0.30 | Economy | Classification, simple Q&A |
| Medium | 0.30 – 0.50 | Standard | Summarization, code review |
| High | 0.50 – 0.75 | Premium | Complex reasoning, code gen |
| Critical | ≥ 0.75 | Flagship | Deep research, multi-step planning |
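The thresholds above can be sketched as a small lookup function. This is an illustration of the documented boundaries only, not the actual `ComplexityAnalyzer` internals:

```python
# Illustrative sketch of the score→level thresholds from the table above.
# Only the documented boundaries are assumed; the library's real logic
# may differ in implementation detail.
def level_for_score(score: float) -> tuple[str, str]:
    """Map a 0.0–1.0 complexity score to (level, recommended tier)."""
    if score < 0.15:
        return ("trivial", "economy")
    if score < 0.30:
        return ("low", "economy")
    if score < 0.50:
        return ("medium", "standard")
    if score < 0.75:
        return ("high", "premium")
    return ("critical", "flagship")

print(level_for_score(0.22))  # a classification-style task lands in "low"
```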
### Complexity Factors

Five weighted factors contribute to the complexity score:

```python
FACTOR_WEIGHTS = {
    "output_ratio": 0.25,   # High output → more generation work
    "token_volume": 0.20,   # Large prompts → more context needed
    "has_tool_call": 0.15,  # Tool usage signals agentic behavior
    "has_decision": 0.20,   # Decision traces → reasoning required
    "event_type": 0.20,     # Some types are inherently complex
}
```
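A minimal sketch of how these weights could combine into a single score. The per-factor normalization is internal to `ComplexityAnalyzer` and is assumed here to yield 0.0–1.0 sub-scores; the weights dict is repeated so the snippet runs standalone:

```python
# Hypothetical weighted-sum combination; each factor sub-score is
# assumed normalized to 0.0–1.0 before weighting.
FACTOR_WEIGHTS = {
    "output_ratio": 0.25,
    "token_volume": 0.20,
    "has_tool_call": 0.15,
    "has_decision": 0.20,
    "event_type": 0.20,
}

def weighted_score(factors: dict[str, float]) -> float:
    """Combine normalized factor sub-scores into one 0.0–1.0 score."""
    return sum(FACTOR_WEIGHTS[name] * value for name, value in factors.items())

# A simple classification event: tiny output, small prompt, no tools or decisions.
score = weighted_score({
    "output_ratio": 0.1,
    "token_volume": 0.05,
    "has_tool_call": 0.0,
    "has_decision": 0.0,
    "event_type": 0.1,
})
print(round(score, 3))  # 0.055 — "trivial" per the level table
```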
## Model Registry
The built-in registry covers popular models across four tiers:
| Model | Tier | Input $/1M | Output $/1M | Context |
|---|---|---|---|---|
| gpt-4o-mini | Economy | $0.15 | $0.60 | 128K |
| gpt-3.5-turbo | Economy | $0.50 | $1.50 | 16K |
| claude-3-haiku | Economy | $0.25 | $1.25 | 200K |
| gpt-4o | Standard | $2.50 | $10.00 | 128K |
| claude-3-sonnet | Standard | $3.00 | $15.00 | 200K |
| claude-3.5-sonnet | Standard | $3.00 | $15.00 | 200K |
| gpt-4-turbo | Premium | $10.00 | $30.00 | 128K |
| gpt-4 | Premium | $30.00 | $60.00 | 8K |
| claude-3-opus | Flagship | $15.00 | $75.00 | 200K |
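The registry prices imply straightforward per-event cost arithmetic — the standard per-million-token formula, shown here as a standalone sketch rather than the library's exact code path:

```python
def event_cost_usd(tokens_in: int, tokens_out: int,
                   input_per_1m: float, output_per_1m: float) -> float:
    """Apply per-million-token pricing separately to input and output."""
    return (tokens_in / 1_000_000) * input_per_1m \
         + (tokens_out / 1_000_000) * output_per_1m

# The Quick Start's 500-in / 100-out call on gpt-4o ($2.50 / $10.00 per 1M):
gpt4o = event_cost_usd(500, 100, 2.50, 10.00)
# The same token volume on gpt-4o-mini ($0.15 / $0.60 per 1M):
mini = event_cost_usd(500, 100, 0.15, 0.60)
print(f"gpt-4o: ${gpt4o:.6f}, gpt-4o-mini: ${mini:.6f}")
```

For this event, dropping from gpt-4o to gpt-4o-mini cuts the cost from $0.002250 to $0.000135 — a 94% reduction, which is why tier downgrades on simple tasks dominate the savings.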
### Custom Models

Register your own models or override pricing:

```python
from agentlens.cost_optimizer import CostOptimizer, ModelInfo, ModelTier

# Via constructor
optimizer = CostOptimizer(custom_models={
    "my-fine-tuned": ModelInfo(
        name="my-fine-tuned",
        tier=ModelTier.ECONOMY,
        input_cost_per_1m=0.10,
        output_cost_per_1m=0.30,
        max_context=32_000,
        strengths=["classification", "extraction"],
    )
})

# Or after construction
optimizer.register_model("llama-3-70b", ModelInfo(
    name="llama-3-70b",
    tier=ModelTier.STANDARD,
    input_cost_per_1m=0.80,
    output_cost_per_1m=0.80,
    max_context=128_000,
))
```
## Optimization Report

The `analyze()` method returns an `OptimizationReport` with these key fields:

| Field | Type | Description |
|---|---|---|
| `total_events` | `int` | Total events analyzed |
| `optimizable_events` | `int` | Events where a cheaper model is recommended |
| `current_cost_usd` | `float` | Total cost at current model selection |
| `optimized_cost_usd` | `float` | Projected cost after optimization |
| `total_savings_usd` | `float` | Dollar savings |
| `total_savings_pct` | `float` | Percentage reduction |
| `recommendations` | `list` | Per-event model change suggestions |
| `model_usage` | `dict` | Count of events per model |
| `tier_distribution` | `dict` | Count of events per tier |
| `migration_plan` | `list` | Phased rollout steps |
| `summary` | `str` | Human-readable summary |
```python
report = optimizer.analyze(events)

# Check if optimizations were found
if report.has_savings:
    print(report.summary)

# Inspect individual recommendations
for rec in report.recommendations:
    print(f"  {rec.current_model} → {rec.recommended_model}")
    print(f"  Saves ${rec.estimated_savings_usd:.4f} ({rec.savings_pct}%)")
    print(f"  Confidence: {rec.confidence.value}")
    print(f"  Risk: {rec.risk}")
    print()
```
## Migration Plan
The optimizer generates a phased migration plan grouped by confidence level:
| Phase | Confidence | Risk | Approach |
|---|---|---|---|
| 1 | High | Low | Quick wins — switch immediately with minimal risk |
| 2 | Medium | Medium | A/B test before full rollout |
| 3 | Low | High | Experimental — requires quality monitoring and rollback |
```python
for step in report.migration_plan:
    print(f"Phase {step.phase}: {step.description}")
    print(f"  Models to change: {step.models_to_change}")
    print(f"  Target: {step.target_model}")
    print(f"  Est. savings: {step.estimated_savings_pct}%")
```
## Quick Estimate

For a fast overview without full recommendations, use `quick_estimate()`:

```python
estimate = optimizer.quick_estimate(events)
print(f"Current cost: ${estimate['current_cost']:.4f}")
print(f"Potential savings: ${estimate['potential_savings']:.4f}")
print(f"Savings %: {estimate['savings_pct']}%")
print(f"Overprovisioned: {estimate['overprovisioned_count']}/{estimate['total_events']}")
```
## Single-Event Suggestion

Get a model recommendation for a single event:

```python
event = AgentEvent(model="gpt-4-turbo", tokens_in=50, tokens_out=10,
                   event_type="classification")

suggestion = optimizer.suggest_model(event)
if suggestion:
    print(f"Consider using {suggestion} instead of {event.model}")
else:
    print("Current model is appropriate for this task")
```
## Configuration

| Parameter | Default | Description |
|---|---|---|
| `aggressive` | `False` | Include low-confidence recommendations (higher savings, higher risk) |
| `min_savings_pct` | `10.0` | Minimum savings percentage to include a recommendation |
| `custom_models` | `None` | Dict of additional or overridden model definitions |

```python
# Conservative (default) — only high/medium confidence
optimizer = CostOptimizer()

# Aggressive — include all recommendations
optimizer = CostOptimizer(aggressive=True, min_savings_pct=5.0)
```
## Session-Specific Analysis

Analyze events from a specific session:

```python
# Filter events by session ID
report = optimizer.analyze_session_events(all_events, session_id="sess-abc123")
print(f"Session cost: ${report.current_cost_usd:.4f}")
```
## Confidence Levels
Each recommendation carries a confidence assessment based on the complexity score and the tier gap between current and recommended models:
| Confidence | When Assigned | Action |
|---|---|---|
| High | Low complexity + small tier gap (≤ 1) | Safe to switch immediately |
| Medium | Low complexity + larger gap, or medium complexity + small gap | A/B test recommended |
| Low | Higher complexity or large tier gaps | Only included in aggressive mode; monitor carefully |
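These rules can be sketched as a small decision function. The complexity thresholds here are inferred from the level table earlier (low < 0.30, medium < 0.50) and are an assumption; the real assignment lives inside `CostOptimizer`:

```python
# Hypothetical restatement of the confidence rules in the table above.
# Thresholds are assumed from the complexity-level table, not taken
# from library source.
def confidence_for(complexity_score: float, tier_gap: int) -> str:
    """Map (complexity score, current→recommended tier gap) to confidence."""
    low = complexity_score < 0.30       # "trivial" / "low" levels
    medium = 0.30 <= complexity_score < 0.50
    if low and tier_gap <= 1:
        return "high"
    if low or (medium and tier_gap <= 1):
        return "medium"
    return "low"

print(confidence_for(0.2, 1))  # high — safe quick win
print(confidence_for(0.2, 3))  # medium — A/B test first
print(confidence_for(0.6, 2))  # low — aggressive mode only
```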
## Best Practices

- **Start conservative.** Use default settings first. Only enable `aggressive=True` after validating quality.
- **A/B test medium-confidence changes.** Run 10–20% of traffic through the recommended model and compare output quality.
- **Update the model registry.** Pricing changes frequently. Register updated costs to get accurate savings estimates.
- **Classify your event types.** The more specific the `event_type` (e.g., `"classification"` vs. `"generic"`), the better the complexity assessment.
- **Monitor after switching.** Use AgentLens's evaluation and drift modules to detect quality regressions after model changes.
- **Review periodically.** Run optimization analysis weekly or after major agent changes to catch new savings opportunities.
## API Reference

### CostOptimizer

| Method | Returns | Description |
|---|---|---|
| `analyze(events)` | `OptimizationReport` | Full analysis with recommendations and migration plan |
| `analyze_session_events(events, session_id)` | `OptimizationReport` | Analyze events filtered to a specific session |
| `quick_estimate(events)` | `dict` | Fast cost overview without per-event details |
| `suggest_model(event)` | `str` or `None` | Single-event model recommendation |
| `register_model(name, info)` | `None` | Add or update a model in the registry |
### ComplexityAnalyzer

| Method | Returns | Description |
|---|---|---|
| `assess(event)` | `ComplexityAssessment` | Score an event's complexity (0.0–1.0) with level and reasoning |
### Data Classes

| Class | Key Fields |
|---|---|
| `ModelInfo` | `name`, `tier`, `input_cost_per_1m`, `output_cost_per_1m`, `max_context`, `strengths` |
| `ComplexityAssessment` | `level`, `score`, `factors`, `recommended_tier`, `reasoning` |
| `Recommendation` | `current_model`, `recommended_model`, `estimated_savings_usd`, `confidence`, `risk` |
| `MigrationStep` | `phase`, `description`, `models_to_change`, `target_model`, `estimated_savings_pct` |
| `OptimizationReport` | `total_events`, `recommendations`, `current_cost_usd`, `total_savings_pct`, `migration_plan` |
### Enums

| Enum | Values |
|---|---|
| `ModelTier` | `ECONOMY`, `STANDARD`, `PREMIUM`, `FLAGSHIP` |
| `ComplexityLevel` | `TRIVIAL`, `LOW`, `MEDIUM`, `HIGH`, `CRITICAL` |
| `Confidence` | `HIGH`, `MEDIUM`, `LOW` |