Python SDK Reference
Complete reference for the agentlens Python package.
Installation
# From source (development mode)
cd sdk
pip install -e .
# From source (production)
pip install ./sdk
# With dev dependencies (for testing)
pip install -e ".[dev]"
agentlens.init()
Initialize the SDK and connect to the AgentLens backend. Must be called before any other SDK function.
agentlens.init(
api_key: str = "default",
endpoint: str = "http://localhost:3000"
) -> AgentTracker
| Parameter | Type | Default | Description |
|---|---|---|---|
| api_key | str | "default" | API key for authentication (sent as X-API-Key header) |
| endpoint | str | "http://localhost:3000" | URL of the AgentLens backend |
Returns: An AgentTracker instance (also stored globally for module-level functions).
import agentlens
tracker = agentlens.init(
api_key="prod-key-abc123",
endpoint="https://agentlens.example.com"
)
All other SDK functions raise RuntimeError if init() hasn't been called.
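A minimal sketch of the guard pattern this implies (illustrative only, not the SDK's actual internals): module-level functions consult a global tracker that init() sets, and raise RuntimeError while it is unset.

```python
# Illustrative sketch of the init() guard, NOT the SDK's actual source.
_tracker = None

def init():
    """Stand-in for agentlens.init(): sets the global tracker."""
    global _tracker
    _tracker = object()  # the real SDK stores an AgentTracker here
    return _tracker

def track(**kwargs):
    """Stand-in for agentlens.track(): fails fast before init()."""
    if _tracker is None:
        raise RuntimeError("agentlens.init() must be called before track()")
    # ... record the event ...

try:
    track(event_type="llm_call")
except RuntimeError as e:
    print(e)  # raised because init() was never called
```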
agentlens.start_session()
Create a new tracking session. All subsequent track() calls are associated with this session.
agentlens.start_session(
agent_name: str = "default-agent",
metadata: dict | None = None
) -> Session
| Parameter | Type | Default | Description |
|---|---|---|---|
| agent_name | str | "default-agent" | Name identifying the agent |
| metadata | dict \| None | None | Arbitrary metadata (version, environment, etc.) |
Returns: A Session object with a unique session_id.
session = agentlens.start_session(
agent_name="research-agent-v2",
metadata={"version": "2.1.0", "environment": "production"}
)
print(f"Session: {session.session_id}")
agentlens.track()
Record a single event (LLM call, tool call, decision, error, etc.) in the current session.
agentlens.track(
event_type: str = "generic",
input_data: dict | None = None,
output_data: dict | None = None,
model: str | None = None,
tokens_in: int = 0,
tokens_out: int = 0,
reasoning: str | None = None,
tool_name: str | None = None,
tool_input: dict | None = None,
tool_output: dict | None = None,
duration_ms: float | None = None,
) -> AgentEvent
| Parameter | Type | Description |
|---|---|---|
| event_type | str | Event category: "llm_call", "tool_call", "decision", "error", "generic" |
| input_data | dict | Input to the operation (prompt, query, etc.) |
| output_data | dict | Output from the operation (response, result, etc.) |
| model | str | LLM model name (e.g., "gpt-4", "claude-3.5-sonnet") |
| tokens_in | int | Input/prompt token count |
| tokens_out | int | Output/completion token count |
| reasoning | str | Why the agent made this decision (creates a DecisionTrace) |
| tool_name | str | Tool/function name (creates a ToolCall) |
| tool_input | dict | Tool input parameters |
| tool_output | dict | Tool return value |
| duration_ms | float | Execution time in milliseconds |
Example: Track an LLM Call
agentlens.track(
event_type="llm_call",
input_data={"prompt": "Summarize this article", "article_url": "https://..."},
output_data={"response": "The article discusses..."},
model="gpt-4-turbo",
tokens_in=1200,
tokens_out=350,
reasoning="User asked for a summary. Using GPT-4 Turbo for long context.",
duration_ms=2340.5,
)
Example: Track a Tool Call
agentlens.track(
event_type="tool_call",
tool_name="web_search",
tool_input={"query": "latest AI safety papers 2026"},
tool_output={"results": [{"title": "...", "url": "..."}]},
duration_ms=890.2,
)
agentlens.explain()
Generate a human-readable explanation of all events in the current (or specified) session.
agentlens.explain(
session_id: str | None = None
) -> str
Returns a Markdown-formatted string with a timeline of events, token counts, and reasoning traces.
explanation = agentlens.explain()
print(explanation)
# Output:
# ## Session Explanation: research-agent-v2
# **Session ID:** a1b2c3d4
# **Started:** 2026-02-14T10:30:00+00:00
# **Status:** active
# **Total tokens:** 1550 in / 353 out
#
# ### Event Timeline:
# 1. [10:30:01.234] **llm_call** (model: gpt-4-turbo)
# 💡 Reasoning: User asked for a summary...
# 📊 Tokens: 1200 in / 350 out
# 2. [10:30:02.124] **tool_call** → tool: web_search
agentlens.end_session()
End the current session, mark it as completed, and flush all pending events to the backend.
agentlens.end_session(
session_id: str | None = None
) -> None
Calling end_session() ensures all buffered events are flushed to the backend. If you forget, some events may be lost if the process exits before the background flush thread runs.
AgentTracker (Advanced)
If you need more control, use the AgentTracker instance directly instead of the module-level functions:
from agentlens.tracker import AgentTracker
from agentlens.transport import Transport
transport = Transport(
endpoint="http://localhost:3000",
api_key="my-key",
batch_size=20, # Flush every 20 events
flush_interval=10.0, # Or every 10 seconds
max_retries=5, # Retry failed sends 5 times
)
tracker = AgentTracker(transport=transport)
session = tracker.start_session(agent_name="custom-agent")
tracker.track(event_type="llm_call", model="gpt-4", tokens_in=100, tokens_out=50)
tracker.end_session()
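The batch_size and flush_interval settings above can be illustrated with a tiny buffer (a sketch of the general batching pattern, not the Transport class's actual implementation):

```python
import time

class BufferedSender:
    """Toy illustration of batch_size / flush_interval semantics.

    NOT the real Transport: events accumulate in a buffer, and a flush
    happens when the buffer reaches batch_size or flush_interval
    seconds have elapsed since the last flush.
    """

    def __init__(self, batch_size=20, flush_interval=10.0):
        self.batch_size = batch_size
        self.flush_interval = flush_interval
        self.buffer = []
        self.sent_batches = []  # stands in for HTTP POSTs to the backend
        self._last_flush = time.monotonic()

    def send(self, event):
        self.buffer.append(event)
        interval_due = time.monotonic() - self._last_flush >= self.flush_interval
        if len(self.buffer) >= self.batch_size or interval_due:
            self.flush()

    def flush(self):
        if self.buffer:
            self.sent_batches.append(list(self.buffer))  # one batch per send
            self.buffer.clear()
        self._last_flush = time.monotonic()

sender = BufferedSender(batch_size=3)
for i in range(7):
    sender.send({"event": i})
sender.flush()  # like end_session(): push anything still buffered
print(len(sender.sent_batches))  # batches of 3, 3, and 1
```

This is also why calling end_session() matters: without the final flush, the trailing partial batch would never be sent.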
tracker.track_tool()
Convenience method for tracking tool calls:
tracker.track_tool(
tool_name="database_query",
tool_input={"sql": "SELECT * FROM users WHERE active = 1"},
tool_output={"rows": 42},
duration_ms=15.3,
)
Multiple Sessions
The tracker supports multiple concurrent sessions. The most recently started session is the "current" one used by module-level functions:
# Session 1
s1 = agentlens.start_session(agent_name="agent-a")
agentlens.track(event_type="llm_call", model="gpt-4", tokens_in=100, tokens_out=50)
agentlens.end_session()
# Session 2
s2 = agentlens.start_session(agent_name="agent-b")
agentlens.track(event_type="tool_call", tool_name="search")
agentlens.end_session()
# Or end a specific session by ID
agentlens.end_session(session_id=s1.session_id)
Analysis Modules
AgentLens includes several analysis modules beyond basic tracking. These run client-side (pure Python, no external dependencies) and work with the same Session and event data you already collect.
CostForecaster
Predicts future AI costs from historical usage using linear regression, exponential moving average, or simple averaging. Useful for budget planning and overspend alerts.
from agentlens.forecast import CostForecaster, UsageRecord
from datetime import datetime
forecaster = CostForecaster()
# Feed historical data points
forecaster.add_record(UsageRecord(
timestamp=datetime(2026, 3, 1, 10, 0),
tokens_in=5000, tokens_out=2000,
cost_usd=0.035, model="gpt-4o"
))
# ... add more records ...
# Forecast next 7 days
forecast = forecaster.forecast_daily(days=7)
print(f"Predicted 7-day cost: ${forecast.total_predicted_cost:.2f}")
print(f"Method used: {forecast.method}")
for day in forecast.daily_predictions:
print(f" {day.date}: ${day.predicted_cost:.4f}")
# Get a spending summary
summary = forecaster.spending_summary()
print(f"Daily average: ${summary.daily_average:.4f}")
print(f"Monthly projection: ${summary.monthly_projection:.2f}")
print(f"Most expensive model: {summary.top_model}")
Forecast methods (auto-selected based on data volume):
- Linear regression — 7+ days of data; captures trends
- Exponential moving average (EMA) — 3–6 days; weights recent usage more
- Simple average — fewer than 3 days; straightforward extrapolation
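As an illustration of the EMA method, here is a generic exponential smoothing sketch (not CostForecaster's internal code): each new day's cost pulls the estimate toward it with weight alpha, so recent usage dominates older usage.

```python
def ema_forecast(daily_costs, alpha=0.5):
    """Exponentially weighted estimate of the next day's cost.

    Generic EMA sketch, not CostForecaster's actual implementation:
    higher alpha means recent days carry more weight.
    """
    estimate = daily_costs[0]
    for cost in daily_costs[1:]:
        estimate = alpha * cost + (1 - alpha) * estimate
    return estimate

# Four days of history trending upward: the EMA estimate (0.2275)
# sits above the plain average (0.18), reflecting the recent rise.
history = [0.10, 0.12, 0.20, 0.30]
print(round(ema_forecast(history), 4))
```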
Key classes: UsageRecord, DailyPrediction, ForecastResult, SpendingSummary
ComplianceChecker
Policy-based session validation. Define organizational rules (token limits, allowed models, forbidden tools, etc.) and check sessions against them. Produces structured pass/fail reports.
from agentlens.compliance import CompliancePolicy, ComplianceChecker
policy = CompliancePolicy(name="production-policy", rules=[
{"kind": "max_tokens", "limit": 50000},
{"kind": "forbidden_tools", "tools": ["execute_code", "rm"]},
{"kind": "allowed_models", "models": ["gpt-4o", "claude-3.5-sonnet"]},
{"kind": "max_events", "limit": 200},
{"kind": "max_duration_ms", "limit": 300000},
{"kind": "required_tools", "tools": ["safety_check"]},
{"kind": "max_error_rate", "limit": 0.05},
{"kind": "require_reasoning"},
])
checker = ComplianceChecker()
report = checker.check(session, policy)
print(report.render()) # Human-readable table
print(f"Compliant: {report.compliant}")
print(f"Passed: {report.passed}/{report.total_rules}")
# Serialize for storage
json_str = report.to_json()
Supported rule kinds:
- max_tokens / min_tokens — Total token bounds
- allowed_models / forbidden_models — Model allow/deny lists
- required_tools / forbidden_tools — Tool usage enforcement
- max_events / min_events — Event count bounds
- max_duration_ms — Session duration limit
- max_tool_calls — Tool call count limit
- require_reasoning — Require reasoning in decision traces
- max_error_rate — Maximum error event ratio
- custom — Custom rule with a Python callable
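For instance, the max_error_rate rule amounts to a ratio check over a session's events. A plain-Python sketch of that semantics (illustrative; the ComplianceChecker's real implementation may differ in details):

```python
def check_max_error_rate(events, limit):
    """Pass iff the fraction of 'error' events is at or below limit.

    Sketch of the max_error_rate rule's meaning, not the actual
    ComplianceChecker code. Events are dicts with an 'event_type' key.
    """
    if not events:
        return True  # no events, nothing to violate
    errors = sum(1 for e in events if e.get("event_type") == "error")
    return errors / len(events) <= limit

events = [{"event_type": "llm_call"}] * 18 + [{"event_type": "error"}] * 2
print(check_max_error_rate(events, limit=0.05))  # 2/20 = 0.10 exceeds 0.05
```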
DriftDetector
Detects behavioral changes by comparing metrics between a baseline window and a current window. Answers: "Is my agent behaving differently than before?"
from agentlens.drift import DriftDetector
detector = DriftDetector()
# Add sessions from two time periods
for s in historical_sessions:
detector.add_baseline(s)
for s in recent_sessions:
detector.add_current(s)
# Detect drift
report = detector.detect()
print(report.format_report())
print(f"Drift score: {report.drift_score}/100")
print(f"Status: {report.status.value}")
# Or compare two session lists directly
report = DriftDetector.compare(baseline_sessions, current_sessions)
Metrics compared: token usage, latency, error rate, model distribution, tool usage, event type distribution. Uses Cohen's d effect size for statistical significance.
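Cohen's d is the difference in means scaled by the pooled standard deviation. A standalone sketch of the computation (not the DriftDetector's internals):

```python
import statistics

def cohens_d(baseline, current):
    """Effect size: mean difference over pooled standard deviation."""
    n1, n2 = len(baseline), len(current)
    v1 = statistics.variance(baseline)   # sample variance (n-1)
    v2 = statistics.variance(current)
    pooled_sd = (((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)) ** 0.5
    return (statistics.mean(current) - statistics.mean(baseline)) / pooled_sd

# Per-session latency (ms) in two windows; |d| >= 0.8 is
# conventionally considered a "large" effect.
baseline = [100, 110, 95, 105, 90]
current = [150, 160, 140, 155, 145]
print(round(cohens_d(baseline, current), 2))
```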
Drift statuses:
- stable — No significant changes (score 0–15)
- minor_drift — Small behavioral changes (score 16–40)
- significant_drift — Notable behavioral shift (score 41–70)
- degraded — Major degradation detected (score 71–100)
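The score bands map to statuses like so (a sketch assuming the integer bands exactly as listed; not the DriftDetector's actual code):

```python
def drift_status(score):
    """Map a 0-100 drift score to a status name per the documented bands."""
    if score <= 15:
        return "stable"
    if score <= 40:
        return "minor_drift"
    if score <= 70:
        return "significant_drift"
    return "degraded"

for s in (0, 16, 55, 90):
    print(s, drift_status(s))
```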
AlertRules
Flexible, pattern-based alerting engine that evaluates declarative rules against streams of agent events. Supports threshold, rate, consecutive-event, regex, and aggregate conditions with composite AND/OR logic.
from agentlens.alert_rules import (
AlertRule, AlertEngine, AlertSeverity,
ThresholdCondition, RateCondition, PatternCondition,
CompositeCondition
)
# Define rules
rules = [
AlertRule(
name="High token usage",
condition=ThresholdCondition(
field="tokens_out", op=">", value=10000
),
severity=AlertSeverity.WARNING,
),
AlertRule(
name="Error spike",
condition=RateCondition(
field="event_type", value="error",
window_seconds=300, threshold=5
),
severity=AlertSeverity.CRITICAL,
),
AlertRule(
name="Sensitive data in output",
condition=PatternCondition(
field="output_data",
pattern=r"\b\d{3}-\d{2}-\d{4}\b" # SSN pattern
),
severity=AlertSeverity.CRITICAL,
),
]
# Create engine and evaluate
engine = AlertEngine(rules=rules)
alerts = engine.evaluate(events)
for alert in alerts:
print(f"[{alert.severity.value}] {alert.rule_name}: {alert.message}")
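The SSN regex in the "Sensitive data in output" rule can be sanity-checked with plain re; this demonstrates only the pattern itself, not the alert engine's field-matching logic:

```python
import re

# Same pattern as in the PatternCondition example above.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

samples = [
    "user ssn is 123-45-6789",  # matches: 3-2-4 digit groups
    "order id 123-456-789",     # wrong group widths, no match
    "phone 555-0142",           # no match
]
for text in samples:
    print(bool(SSN_PATTERN.search(text)), text)
```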
Condition types:
ThresholdCondition— Field value exceeds a thresholdRateCondition— Events matching a criteria exceed a rate within a time windowConsecutiveCondition— N consecutive events match a criteriaPatternCondition— Regex match on a string fieldAggregateCondition— Aggregate function (sum/avg/max/min) exceeds thresholdCompositeCondition— Combine conditions with AND/OR logic