# Cost Optimization
Reduce LLM costs by 30–70% without meaningful quality degradation.
The `agentlens.cost_optimizer` module analyzes agent event patterns and recommends cheaper model alternatives where task complexity doesn't require expensive models.
## Quick Start

```python
from agentlens.cost_optimizer import CostOptimizer
from agentlens.models import AgentEvent

optimizer = CostOptimizer()

events = [
    AgentEvent(model="gpt-4o", tokens_in=500, tokens_out=100,
               event_type="llm_call"),
    AgentEvent(model="gpt-4-turbo", tokens_in=50, tokens_out=10,
               event_type="classification"),
]

report = optimizer.analyze(events)
print(f"Current cost: ${report.current_cost_usd:.4f}")
print(f"Optimized: ${report.optimized_cost_usd:.4f}")
print(f"Savings: ${report.total_savings_usd:.4f} ({report.total_savings_pct}%)")
```
## How It Works
The optimizer follows a three-step pipeline for every analyzable event:
| Step | What Happens | Key Factors |
|---|---|---|
| 1. Complexity Assessment | Scores each event from 0.0 (trivial) to 1.0 (critical) | Output ratio, token volume, tool calls, decision traces, event type |
| 2. Model Matching | Finds the cheapest model in the recommended tier that fits | Same-provider preference, context window compatibility |
| 3. Savings Validation | Filters out recommendations below the minimum savings threshold | Savings percentage, confidence level, aggressive mode |
## Complexity Levels

The `ComplexityAnalyzer` maps each event to one of five levels, each associated with a recommended model tier:
| Level | Score Range | Recommended Tier | Typical Tasks |
|---|---|---|---|
| Trivial | < 0.15 | Economy | Formatting, simple extraction |
| Low | 0.15 – 0.30 | Economy | Classification, simple Q&A |
| Medium | 0.30 – 0.50 | Standard | Summarization, code review |
| High | 0.50 – 0.75 | Premium | Complex reasoning, code gen |
| Critical | ≥ 0.75 | Flagship | Deep research, multi-step planning |
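The thresholds above can be sketched as a small lookup function. This is an illustration of the documented boundaries only, not the actual `ComplexityAnalyzer` internals:

```python
# Illustrative sketch of the score→level thresholds from the table above.
# Only the documented boundaries are assumed; the library's real logic
# may differ in implementation detail.
def level_for_score(score: float) -> tuple[str, str]:
    """Map a 0.0–1.0 complexity score to (level, recommended tier)."""
    if score < 0.15:
        return ("trivial", "economy")
    if score < 0.30:
        return ("low", "economy")
    if score < 0.50:
        return ("medium", "standard")
    if score < 0.75:
        return ("high", "premium")
    return ("critical", "flagship")

print(level_for_score(0.22))  # a classification-style task lands in "low"
```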
### Complexity Factors

Five weighted factors contribute to the complexity score:

```python
FACTOR_WEIGHTS = {
    "output_ratio": 0.25,   # High output → more generation work
    "token_volume": 0.20,   # Large prompts → more context needed
    "has_tool_call": 0.15,  # Tool usage signals agentic behavior
    "has_decision": 0.20,   # Decision traces → reasoning required
    "event_type": 0.20,     # Some types are inherently complex
}
```
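A minimal sketch of how these weights could combine into a single score. The per-factor normalization is internal to `ComplexityAnalyzer` and is assumed here to yield 0.0–1.0 sub-scores; the weights dict is repeated so the snippet runs standalone:

```python
# Hypothetical weighted-sum combination; each factor sub-score is
# assumed normalized to 0.0–1.0 before weighting.
FACTOR_WEIGHTS = {
    "output_ratio": 0.25,
    "token_volume": 0.20,
    "has_tool_call": 0.15,
    "has_decision": 0.20,
    "event_type": 0.20,
}

def weighted_score(factors: dict[str, float]) -> float:
    """Combine normalized factor sub-scores into one 0.0–1.0 score."""
    return sum(FACTOR_WEIGHTS[name] * value for name, value in factors.items())

# A simple classification event: tiny output, small prompt, no tools or decisions.
score = weighted_score({
    "output_ratio": 0.1,
    "token_volume": 0.05,
    "has_tool_call": 0.0,
    "has_decision": 0.0,
    "event_type": 0.1,
})
print(round(score, 3))  # 0.055 — "trivial" per the level table
```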
## Model Registry
The built-in registry covers popular models across four tiers:
| Model | Tier | Input $/1M | Output $/1M | Context |
|---|---|---|---|---|
| gpt-4o-mini | Economy | $0.15 | $0.60 | 128K |
| gpt-3.5-turbo | Economy | $0.50 | $1.50 | 16K |
| claude-3-haiku | Economy | $0.25 | $1.25 | 200K |
| gpt-4o | Standard | $2.50 | $10.00 | 128K |
| claude-3-sonnet | Standard | $3.00 | $15.00 | 200K |
| claude-3.5-sonnet | Standard | $3.00 | $15.00 | 200K |
| gpt-4-turbo | Premium | $10.00 | $30.00 | 128K |
| gpt-4 | Premium | $30.00 | $60.00 | 8K |
| claude-3-opus | Flagship | $15.00 | $75.00 | 200K |
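The registry prices imply straightforward per-event cost arithmetic — the standard per-million-token formula, shown here as a standalone sketch rather than the library's exact code path:

```python
def event_cost_usd(tokens_in: int, tokens_out: int,
                   input_per_1m: float, output_per_1m: float) -> float:
    """Apply per-million-token pricing separately to input and output."""
    return (tokens_in / 1_000_000) * input_per_1m \
         + (tokens_out / 1_000_000) * output_per_1m

# The Quick Start's 500-in / 100-out call on gpt-4o ($2.50 / $10.00 per 1M):
gpt4o = event_cost_usd(500, 100, 2.50, 10.00)
# The same token volume on gpt-4o-mini ($0.15 / $0.60 per 1M):
mini = event_cost_usd(500, 100, 0.15, 0.60)
print(f"gpt-4o: ${gpt4o:.6f}, gpt-4o-mini: ${mini:.6f}")
```

For this event, dropping from gpt-4o to gpt-4o-mini cuts the cost from $0.002250 to $0.000135 — a 94% reduction, which is why tier downgrades on simple tasks dominate the savings.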
### Custom Models

Register your own models or override pricing:

```python
from agentlens.cost_optimizer import CostOptimizer, ModelInfo, ModelTier

# Via constructor
optimizer = CostOptimizer(custom_models={
    "my-fine-tuned": ModelInfo(
        name="my-fine-tuned",
        tier=ModelTier.ECONOMY,
        input_cost_per_1m=0.10,
        output_cost_per_1m=0.30,
        max_context=32_000,
        strengths=["classification", "extraction"],
    )
})

# Or after construction
optimizer.register_model("llama-3-70b", ModelInfo(
    name="llama-3-70b",
    tier=ModelTier.STANDARD,
    input_cost_per_1m=0.80,
    output_cost_per_1m=0.80,
    max_context=128_000,
))
```
## Optimization Report

The `analyze()` method returns an `OptimizationReport` with these key fields:

| Field | Type | Description |
|---|---|---|
| `total_events` | `int` | Total events analyzed |
| `optimizable_events` | `int` | Events where a cheaper model is recommended |
| `current_cost_usd` | `float` | Total cost at current model selection |
| `optimized_cost_usd` | `float` | Projected cost after optimization |
| `total_savings_usd` | `float` | Dollar savings |
| `total_savings_pct` | `float` | Percentage reduction |
| `recommendations` | `list` | Per-event model change suggestions |
| `model_usage` | `dict` | Count of events per model |
| `tier_distribution` | `dict` | Count of events per tier |
| `migration_plan` | `list` | Phased rollout steps |
| `summary` | `str` | Human-readable summary |
```python
report = optimizer.analyze(events)

# Check if optimizations were found
if report.has_savings:
    print(report.summary)

# Inspect individual recommendations
for rec in report.recommendations:
    print(f"  {rec.current_model} → {rec.recommended_model}")
    print(f"  Saves ${rec.estimated_savings_usd:.4f} ({rec.savings_pct}%)")
    print(f"  Confidence: {rec.confidence.value}")
    print(f"  Risk: {rec.risk}")
    print()
```
## Migration Plan
The optimizer generates a phased migration plan grouped by confidence level:
| Phase | Confidence | Risk | Approach |
|---|---|---|---|
| 1 | High | Low | Quick wins — switch immediately with minimal risk |
| 2 | Medium | Medium | A/B test before full rollout |
| 3 | Low | High | Experimental — requires quality monitoring and rollback |
```python
for step in report.migration_plan:
    print(f"Phase {step.phase}: {step.description}")
    print(f"  Models to change: {step.models_to_change}")
    print(f"  Target: {step.target_model}")
    print(f"  Est. savings: {step.estimated_savings_pct}%")
```
## Quick Estimate

For a fast overview without full recommendations, use `quick_estimate()`:

```python
estimate = optimizer.quick_estimate(events)
print(f"Current cost: ${estimate['current_cost']:.4f}")
print(f"Potential savings: ${estimate['potential_savings']:.4f}")
print(f"Savings %: {estimate['savings_pct']}%")
print(f"Overprovisioned: {estimate['overprovisioned_count']}/{estimate['total_events']}")
```
## Single-Event Suggestion

Get a model recommendation for a single event:

```python
event = AgentEvent(model="gpt-4-turbo", tokens_in=50, tokens_out=10,
                   event_type="classification")

suggestion = optimizer.suggest_model(event)
if suggestion:
    print(f"Consider using {suggestion} instead of {event.model}")
else:
    print("Current model is appropriate for this task")
```
## Configuration

| Parameter | Default | Description |
|---|---|---|
| `aggressive` | `False` | Include low-confidence recommendations (higher savings, higher risk) |
| `min_savings_pct` | `10.0` | Minimum savings percentage to include a recommendation |
| `custom_models` | `None` | Dict of additional or overridden model definitions |

```python
# Conservative (default) — only high/medium confidence
optimizer = CostOptimizer()

# Aggressive — include all recommendations
optimizer = CostOptimizer(aggressive=True, min_savings_pct=5.0)
```
## Session-Specific Analysis

Analyze events from a specific session:

```python
# Filter events by session ID
report = optimizer.analyze_session_events(all_events, session_id="sess-abc123")
print(f"Session cost: ${report.current_cost_usd:.4f}")
```
## Confidence Levels
Each recommendation carries a confidence assessment based on the complexity score and the tier gap between current and recommended models:
| Confidence | When Assigned | Action |
|---|---|---|
| High | Low complexity + small tier gap (≤ 1) | Safe to switch immediately |
| Medium | Low complexity + larger gap, or medium complexity + small gap | A/B test recommended |
| Low | Higher complexity or large tier gaps | Only included in aggressive mode; monitor carefully |
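These rules can be sketched as a small decision function. The complexity thresholds here are inferred from the level table earlier (low < 0.30, medium < 0.50) and are an assumption; the real assignment lives inside `CostOptimizer`:

```python
# Hypothetical restatement of the confidence rules in the table above.
# Thresholds are assumed from the complexity-level table, not taken
# from library source.
def confidence_for(complexity_score: float, tier_gap: int) -> str:
    """Map (complexity score, current→recommended tier gap) to confidence."""
    low = complexity_score < 0.30       # "trivial" / "low" levels
    medium = 0.30 <= complexity_score < 0.50
    if low and tier_gap <= 1:
        return "high"
    if low or (medium and tier_gap <= 1):
        return "medium"
    return "low"

print(confidence_for(0.2, 1))  # high — safe quick win
print(confidence_for(0.2, 3))  # medium — A/B test first
print(confidence_for(0.6, 2))  # low — aggressive mode only
```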
## Best Practices

- **Start conservative.** Use default settings first. Only enable `aggressive=True` after validating quality.
- **A/B test medium-confidence changes.** Run 10–20% of traffic through the recommended model and compare output quality.
- **Update the model registry.** Pricing changes frequently. Register updated costs to get accurate savings estimates.
- **Classify your event types.** The more specific the `event_type` (e.g., `"classification"` vs. `"generic"`), the better the complexity assessment.
- **Monitor after switching.** Use AgentLens's evaluation and drift modules to detect quality regressions after model changes.
- **Review periodically.** Run optimization analysis weekly or after major agent changes to catch new savings opportunities.
## API Reference

### CostOptimizer

| Method | Returns | Description |
|---|---|---|
| `analyze(events)` | `OptimizationReport` | Full analysis with recommendations and migration plan |
| `analyze_session_events(events, session_id)` | `OptimizationReport` | Analyze events filtered to a specific session |
| `quick_estimate(events)` | `dict` | Fast cost overview without per-event details |
| `suggest_model(event)` | `str` or `None` | Single-event model recommendation |
| `register_model(name, info)` | `None` | Add or update a model in the registry |
### ComplexityAnalyzer

| Method | Returns | Description |
|---|---|---|
| `assess(event)` | `ComplexityAssessment` | Score an event's complexity (0.0–1.0) with level and reasoning |
### Data Classes

| Class | Key Fields |
|---|---|
| `ModelInfo` | `name`, `tier`, `input_cost_per_1m`, `output_cost_per_1m`, `max_context`, `strengths` |
| `ComplexityAssessment` | `level`, `score`, `factors`, `recommended_tier`, `reasoning` |
| `Recommendation` | `current_model`, `recommended_model`, `estimated_savings_usd`, `confidence`, `risk` |
| `MigrationStep` | `phase`, `description`, `models_to_change`, `target_model`, `estimated_savings_pct` |
| `OptimizationReport` | `total_events`, `recommendations`, `current_cost_usd`, `total_savings_pct`, `migration_plan` |
### Enums

| Enum | Values |
|---|---|
| `ModelTier` | `ECONOMY`, `STANDARD`, `PREMIUM`, `FLAGSHIP` |
| `ComplexityLevel` | `TRIVIAL`, `LOW`, `MEDIUM`, `HIGH`, `CRITICAL` |
| `Confidence` | `HIGH`, `MEDIUM`, `LOW` |