Python SDK Reference

Complete reference for the agentlens Python package.

Installation

# From source (development mode)
cd sdk
pip install -e .

# From source (production)
pip install ./sdk

# With dev dependencies (for testing)
pip install -e ".[dev]"

agentlens.init()

Initialize the SDK and connect to the AgentLens backend. Must be called before any other SDK function.

agentlens.init(
    api_key: str = "default",
    endpoint: str = "http://localhost:3000"
) -> AgentTracker
| Parameter | Type | Default | Description |
|---|---|---|---|
| api_key | str | "default" | API key for authentication (sent as X-API-Key header) |
| endpoint | str | "http://localhost:3000" | URL of the AgentLens backend |

Returns: An AgentTracker instance (also stored globally for module-level functions).

import agentlens

tracker = agentlens.init(
    api_key="prod-key-abc123",
    endpoint="https://agentlens.example.com"
)
⚠️ Call init() first

All other SDK functions raise RuntimeError if init() hasn't been called.

agentlens.start_session()

Create a new tracking session. All subsequent track() calls are associated with this session.

agentlens.start_session(
    agent_name: str = "default-agent",
    metadata: dict | None = None
) -> Session
| Parameter | Type | Default | Description |
|---|---|---|---|
| agent_name | str | "default-agent" | Name identifying the agent |
| metadata | dict \| None | None | Arbitrary metadata (version, environment, etc.) |

Returns: A Session object with a unique session_id.

session = agentlens.start_session(
    agent_name="research-agent-v2",
    metadata={"version": "2.1.0", "environment": "production"}
)
print(f"Session: {session.session_id}")

agentlens.track()

Record a single event (LLM call, tool call, decision, error, etc.) in the current session.

agentlens.track(
    event_type: str = "generic",
    input_data: dict | None = None,
    output_data: dict | None = None,
    model: str | None = None,
    tokens_in: int = 0,
    tokens_out: int = 0,
    reasoning: str | None = None,
    tool_name: str | None = None,
    tool_input: dict | None = None,
    tool_output: dict | None = None,
    duration_ms: float | None = None,
) -> AgentEvent
| Parameter | Type | Description |
|---|---|---|
| event_type | str | Event category: "llm_call", "tool_call", "decision", "error", "generic" |
| input_data | dict | Input to the operation (prompt, query, etc.) |
| output_data | dict | Output from the operation (response, result, etc.) |
| model | str | LLM model name (e.g., "gpt-4", "claude-3.5-sonnet") |
| tokens_in | int | Input/prompt token count |
| tokens_out | int | Output/completion token count |
| reasoning | str | Why the agent made this decision (creates a DecisionTrace) |
| tool_name | str | Tool/function name (creates a ToolCall) |
| tool_input | dict | Tool input parameters |
| tool_output | dict | Tool return value |
| duration_ms | float | Execution time in milliseconds |

Example: Track an LLM Call

agentlens.track(
    event_type="llm_call",
    input_data={"prompt": "Summarize this article", "article_url": "https://..."},
    output_data={"response": "The article discusses..."},
    model="gpt-4-turbo",
    tokens_in=1200,
    tokens_out=350,
    reasoning="User asked for a summary. Using GPT-4 Turbo for long context.",
    duration_ms=2340.5,
)

Example: Track a Tool Call

agentlens.track(
    event_type="tool_call",
    tool_name="web_search",
    tool_input={"query": "latest AI safety papers 2026"},
    tool_output={"results": [{"title": "...", "url": "..."}]},
    duration_ms=890.2,
)

agentlens.explain()

Generate a human-readable explanation of all events in the current (or specified) session.

agentlens.explain(
    session_id: str | None = None
) -> str

Returns a Markdown-formatted string with a timeline of events, token counts, and reasoning traces.

explanation = agentlens.explain()
print(explanation)

# Output:
# ## Session Explanation: research-agent-v2
# **Session ID:** a1b2c3d4
# **Started:** 2026-02-14T10:30:00+00:00
# **Status:** active
# **Total tokens:** 1550 in / 353 out
#
# ### Event Timeline:
# 1. [10:30:01.234] **llm_call** (model: gpt-4-turbo)
#    💡 Reasoning: User asked for a summary...
#    📊 Tokens: 1200 in / 350 out
# 2. [10:30:02.124] **tool_call** → tool: web_search

agentlens.end_session()

End the current session, mark it as completed, and flush all pending events to the backend.

agentlens.end_session(
    session_id: str | None = None
) -> None
💡 Always end sessions

Calling end_session() ensures all buffered events are flushed to the backend. If you forget, some events may be lost if the process exits before the background flush thread runs.

AgentTracker (Advanced)

If you need more control, use the AgentTracker instance directly instead of the module-level functions:

from agentlens.tracker import AgentTracker
from agentlens.transport import Transport

transport = Transport(
    endpoint="http://localhost:3000",
    api_key="my-key",
    batch_size=20,         # Flush every 20 events
    flush_interval=10.0,   # Or every 10 seconds
    max_retries=5,         # Retry failed sends 5 times
)
tracker = AgentTracker(transport=transport)

session = tracker.start_session(agent_name="custom-agent")
tracker.track(event_type="llm_call", model="gpt-4", tokens_in=100, tokens_out=50)
tracker.end_session()
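
The batch_size and flush_interval settings above control when buffered events are shipped. As an illustration of the size-based trigger only (a sketch, not the real Transport implementation; the timer-based flush and retry logic are omitted, and `send` is a stand-in for the actual HTTP POST):

```python
# Sketch: size-triggered event buffering, analogous to Transport's
# batch_size behavior. `send` is any callable that ships a list of events.
class MiniBuffer:
    def __init__(self, send, batch_size=20):
        self.send = send
        self.batch_size = batch_size
        self.pending = []

    def track(self, event):
        self.pending.append(event)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        # Ship whatever is buffered, then clear the buffer.
        if self.pending:
            self.send(self.pending)
            self.pending = []

# Usage: collect batches as they are "sent"
batches = []
buf = MiniBuffer(batches.append, batch_size=3)
for i in range(7):
    buf.track({"event": i})
buf.flush()  # ship the remainder, as end_session() would
print([len(b) for b in batches])  # → [3, 3, 1]
```

This is why ending the session matters: the last partial batch only leaves the buffer on an explicit flush.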

tracker.track_tool()

Convenience method for tracking tool calls:

tracker.track_tool(
    tool_name="database_query",
    tool_input={"sql": "SELECT * FROM users WHERE active = 1"},
    tool_output={"rows": 42},
    duration_ms=15.3,
)

Multiple Sessions

The tracker supports multiple concurrent sessions. The most recently started session is the "current" one used by module-level functions:

# Session 1
s1 = agentlens.start_session(agent_name="agent-a")
agentlens.track(event_type="llm_call", model="gpt-4", tokens_in=100, tokens_out=50)
agentlens.end_session()

# Session 2
s2 = agentlens.start_session(agent_name="agent-b")
agentlens.track(event_type="tool_call", tool_name="search")
agentlens.end_session()

# Or end a specific session by ID
agentlens.end_session(session_id=s1.session_id)

Analysis Modules

AgentLens includes several analysis modules beyond basic tracking. These run client-side (pure Python, no external dependencies) and work with the same Session and event data you already collect.

CostForecaster

Predicts future AI costs from historical usage using linear regression, exponential moving average, or simple averaging. Useful for budget planning and overspend alerts.

from agentlens.forecast import CostForecaster, UsageRecord
from datetime import datetime

forecaster = CostForecaster()

# Feed historical data points
forecaster.add_record(UsageRecord(
    timestamp=datetime(2026, 3, 1, 10, 0),
    tokens_in=5000, tokens_out=2000,
    cost_usd=0.035, model="gpt-4o"
))
# ... add more records ...

# Forecast next 7 days
forecast = forecaster.forecast_daily(days=7)
print(f"Predicted 7-day cost: ${forecast.total_predicted_cost:.2f}")
print(f"Method used: {forecast.method}")
for day in forecast.daily_predictions:
    print(f"  {day.date}: ${day.predicted_cost:.4f}")

# Get a spending summary
summary = forecaster.spending_summary()
print(f"Daily average: ${summary.daily_average:.4f}")
print(f"Monthly projection: ${summary.monthly_projection:.2f}")
print(f"Most expensive model: {summary.top_model}")

Forecast methods (auto-selected based on data volume): simple averaging, exponential moving average, and linear regression.

Key classes: UsageRecord, DailyPrediction, ForecastResult, SpendingSummary
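
Of these methods, the exponential moving average is the easiest to sketch. The snippet below is illustrative only (not CostForecaster's actual code) and assumes you already have daily cost totals:

```python
# Sketch: exponential moving average over daily costs, then a naive
# 7-day projection that repeats the smoothed daily rate.
def ema(values, alpha=0.3):
    """Smooth a series; recent days weigh more as alpha grows."""
    smoothed = values[0]
    for v in values[1:]:
        smoothed = alpha * v + (1 - alpha) * smoothed
    return smoothed

daily_costs = [0.030, 0.035, 0.032, 0.050, 0.041]  # USD per day
rate = ema(daily_costs)
print(f"7-day projection: ${7 * rate:.2f}")  # → 7-day projection: $0.27
```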

ComplianceChecker

Policy-based session validation. Define organizational rules (token limits, allowed models, forbidden tools, etc.) and check sessions against them. Produces structured pass/fail reports.

from agentlens.compliance import CompliancePolicy, ComplianceChecker

policy = CompliancePolicy(name="production-policy", rules=[
    {"kind": "max_tokens", "limit": 50000},
    {"kind": "forbidden_tools", "tools": ["execute_code", "rm"]},
    {"kind": "allowed_models", "models": ["gpt-4o", "claude-3.5-sonnet"]},
    {"kind": "max_events", "limit": 200},
    {"kind": "max_duration_ms", "limit": 300000},
    {"kind": "required_tools", "tools": ["safety_check"]},
    {"kind": "max_error_rate", "limit": 0.05},
    {"kind": "require_reasoning"},
])

checker = ComplianceChecker()
report = checker.check(session, policy)

print(report.render())       # Human-readable table
print(f"Compliant: {report.compliant}")
print(f"Passed: {report.passed}/{report.total_rules}")

# Serialize for storage
json_str = report.to_json()

Supported rule kinds: "max_tokens", "forbidden_tools", "allowed_models", "max_events", "max_duration_ms", "required_tools", "max_error_rate", and "require_reasoning" (one for each rule in the example above).
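
The semantics of two of these rule kinds can be sketched in a few lines (illustrative only; the real checker consumes Session objects and returns a structured report, whereas here a session is modeled as a plain dict):

```python
# Sketch: checking max_tokens and forbidden_tools against a session,
# modeled as a dict rather than the SDK's Session object.
def check_rules(session, rules):
    results = []
    for rule in rules:
        if rule["kind"] == "max_tokens":
            total = session["tokens_in"] + session["tokens_out"]
            results.append((rule["kind"], total <= rule["limit"]))
        elif rule["kind"] == "forbidden_tools":
            used = set(session["tools_used"])
            results.append((rule["kind"], used.isdisjoint(rule["tools"])))
    return results

session = {"tokens_in": 40000, "tokens_out": 15000,
           "tools_used": ["web_search", "safety_check"]}
rules = [{"kind": "max_tokens", "limit": 50000},
         {"kind": "forbidden_tools", "tools": ["execute_code", "rm"]}]
print(check_rules(session, rules))
# → [('max_tokens', False), ('forbidden_tools', True)]
```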

DriftDetector

Detects behavioral changes by comparing metrics between a baseline window and a current window. Answers: "Is my agent behaving differently than before?"

from agentlens.drift import DriftDetector

detector = DriftDetector()

# Add sessions from two time periods
for s in historical_sessions:
    detector.add_baseline(s)
for s in recent_sessions:
    detector.add_current(s)

# Detect drift
report = detector.detect()
print(report.format_report())
print(f"Drift score: {report.drift_score}/100")
print(f"Status: {report.status.value}")

# Or compare two session lists directly
report = DriftDetector.compare(baseline_sessions, current_sessions)

Metrics compared: token usage, latency, error rate, model distribution, tool usage, event type distribution. Uses Cohen's d effect size for statistical significance.
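
Cohen's d expresses how far apart two metric distributions are in units of their pooled standard deviation; by convention, values around 0.8 or above count as a large effect. A minimal version (illustrative, not necessarily DriftDetector's exact formula):

```python
import statistics

def cohens_d(baseline, current):
    """Effect size: mean difference divided by pooled standard deviation."""
    m1, m2 = statistics.mean(baseline), statistics.mean(current)
    v1, v2 = statistics.variance(baseline), statistics.variance(current)
    n1, n2 = len(baseline), len(current)
    pooled = (((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)) ** 0.5
    return (m2 - m1) / pooled

# Latency (ms) per session in two windows
baseline = [900, 950, 1000, 1050, 1100]
current = [1400, 1450, 1500, 1550, 1600]
print(round(cohens_d(baseline, current), 2))  # → 6.32 (a very large shift)
```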

Drift statuses:

AlertRules

Flexible, pattern-based alerting engine that evaluates declarative rules against streams of agent events. Supports threshold, rate, consecutive-event, regex, and aggregate conditions with composite AND/OR logic.

from agentlens.alert_rules import (
    AlertRule, AlertEngine, AlertSeverity,
    ThresholdCondition, RateCondition, PatternCondition,
    CompositeCondition
)

# Define rules
rules = [
    AlertRule(
        name="High token usage",
        condition=ThresholdCondition(
            field="tokens_out", op=">", value=10000
        ),
        severity=AlertSeverity.WARNING,
    ),
    AlertRule(
        name="Error spike",
        condition=RateCondition(
            field="event_type", value="error",
            window_seconds=300, threshold=5
        ),
        severity=AlertSeverity.CRITICAL,
    ),
    AlertRule(
        name="Sensitive data in output",
        condition=PatternCondition(
            field="output_data",
            pattern=r"\b\d{3}-\d{2}-\d{4}\b"  # SSN pattern
        ),
        severity=AlertSeverity.CRITICAL,
    ),
]

# Create engine and evaluate
engine = AlertEngine(rules=rules)
alerts = engine.evaluate(events)
for alert in alerts:
    print(f"[{alert.severity.value}] {alert.rule_name}: {alert.message}")

Condition types: ThresholdCondition, RateCondition, PatternCondition, and CompositeCondition (imported above); per the overview, consecutive-event and aggregate conditions are also supported.
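
To make the condition semantics concrete, here is a hedged sketch of how a threshold and a regex pattern condition might evaluate a single event. The field names mirror track()'s parameters, but the functions below are stand-ins, not the real condition classes (which add windows, rates, and composite AND/OR logic):

```python
import re

# Sketch: evaluating two condition styles against one event dict.
def threshold(event, field, op, value):
    """Numeric comparison on a single event field."""
    x = event.get(field, 0)
    return {">": x > value, ">=": x >= value, "<": x < value}[op]

def pattern(event, field, regex):
    """Regex search over the stringified field value."""
    return re.search(regex, str(event.get(field, ""))) is not None

event = {"event_type": "llm_call", "tokens_out": 12000,
         "output_data": {"response": "SSN is 123-45-6789"}}

print(threshold(event, "tokens_out", ">", 10000))               # → True
print(pattern(event, "output_data", r"\b\d{3}-\d{2}-\d{4}\b"))  # → True
```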