# Documentation

Everything you need to install, configure, and run AeneasSoft.

## Quickstart

Get your first trace and activate Active Defense in under 2 minutes.

### Step 1: Install the SDK

```bash
pip install aeneas-agentwatch
```

### Step 2: Initialize in your code
```python
import agentwatch

agentwatch.init()
# That's it. Every LLM call is now monitored.
```

### Step 3: Make an LLM call
```python
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}]
)
# Trace captured automatically. No callbacks, no wrappers.
```

### Step 4: View your traces
Open http://localhost:3001 to see your dashboard, or use the API:

```bash
curl http://localhost:3001/api/traces
```

## Configuration Reference

All parameters for `agentwatch.init()`:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `api_key` | `str \| None` | `None` | API key for cloud mode. If not set, the SDK connects to localhost (development mode). |
| `proxy_url` | `str \| None` | auto | Backend ingest URL. Auto-detected: `localhost:3001` (no key) or `api.aeneassoft.com` (with key). |
| `zero_data_retention` | `bool` | `False` | Strip prompt/response text from spans. Only metadata (model, tokens, cost) is sent. |
| `budget_per_hour` | `float \| None` | `None` | Hourly cost limit in USD. Triggers an alert (or block) when exceeded. |
| `max_error_rate` | `float \| None` | `None` | Error rate threshold (0.0–1.0). Triggers when the error ratio exceeds this in a 5-minute window. |
| `max_calls_per_minute` | `int \| None` | `None` | Loop detection. Triggers when calls per minute exceed this threshold. |
| `block_on_threshold` | `bool` | `False` | If `True`, raises `CircuitBreakerException` and blocks the request. If `False`, alert only. |
| `on_alert` | `callable \| None` | `None` | Callback invoked when any threshold is exceeded. Receives an alert dict. |
| `on_block` | `callable \| None` | `None` | Pre-block hook (sync or async). Fires BEFORE `CircuitBreakerException`. Receives a `BlockEvent` with `.monitor` for recovery. |
### Smart URL Detection

The SDK automatically determines where to send traces:

- `api_key` not set → `http://localhost:3001/api/ingest` (local dev)
- `api_key` set → `https://api.aeneassoft.com/api/ingest` (cloud)
- `proxy_url` set → uses your custom URL
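A minimal sketch of that resolution order; the constants and function name are illustrative, not the SDK's internals:

```python
# Illustrative sketch of the documented resolution order (not the SDK's internals).
DEFAULT_LOCAL = "http://localhost:3001/api/ingest"
DEFAULT_CLOUD = "https://api.aeneassoft.com/api/ingest"

def resolve_ingest_url(api_key=None, proxy_url=None):
    if proxy_url:          # an explicit proxy_url always wins
        return proxy_url
    if api_key:            # an API key implies cloud mode
        return DEFAULT_CLOUD
    return DEFAULT_LOCAL   # no key: local development mode
```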
### Full Example

```python
import agentwatch

agentwatch.init(
    budget_per_hour=10.0,       # Alert if agent spends > $10/hour
    max_error_rate=0.5,         # Alert if > 50% of calls fail
    max_calls_per_minute=100,   # Detect infinite loops
    block_on_threshold=True,    # Block calls (not just alert)
    zero_data_retention=True,   # GDPR: don't store prompts
    on_alert=lambda alert: print(f"ALERT: {alert['reason']}")
)
```

## Active Defense (Safe Pause & Resume)
Block runaway AI agents before they drain your budget — then recover without losing state. Active Defense monitors cost, error rate, and call frequency with a full state machine (CLOSED → OPEN → PAUSED → HALF_OPEN).
### How It Works

1. The SDK intercepts every LLM call at the HTTP transport layer.
2. Before the request leaves your process, thresholds are checked.
3. If a threshold is exceeded and `block_on_threshold=True`, `CircuitBreakerException` is raised. The request is never sent. $0 wasted.
4. If `block_on_threshold=False`, the alert fires and the request proceeds.
### Thresholds

| Threshold | Window | Behavior |
|---|---|---|
| `budget_per_hour` | Rolling 1 hour | Fires when cumulative cost exceeds the limit |
| `max_error_rate` | Rolling 5 minutes | Fires when the error ratio exceeds the threshold (min. 10 calls required) |
| `max_calls_per_minute` | Rolling 1 minute | Fires when the call count exceeds the limit (loop detection) |
### Alert Cooldown
Duplicate alerts are suppressed for 60 seconds to prevent spam. Each unique threshold violation has its own cooldown timer.
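A minimal sketch of this suppression logic, assuming a per-(scope, threshold) key; the names here are illustrative, not the SDK's internals:

```python
import time

# Illustrative per-violation cooldown (not the SDK's internals).
COOLDOWN_SECONDS = 60
_last_fired = {}  # (scope_id, threshold_name) -> last alert timestamp

def should_fire(scope_id, threshold_name):
    key = (scope_id, threshold_name)
    now = time.time()
    if now - _last_fired.get(key, 0.0) < COOLDOWN_SECONDS:
        return False  # duplicate within the cooldown window: suppress
    _last_fired[key] = now
    return True
```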
### Catching Blocked Requests + Recovery
```python
from agentwatch import CircuitBreakerException

try:
    result = client.chat.completions.create(...)
except CircuitBreakerException as e:
    print(f"Blocked: {e.reason}")
    print(f"State: {e.state}")  # Full monitor snapshot

    # Recovery via e.monitor:
    e.monitor.pause(60)           # Half-Open after 60s (probe call)
    e.monitor.increase_budget(5)  # Or: raise budget, resume immediately
    e.monitor.reset_windows()     # Emergency: clear all windows

    result = client.chat.completions.create(...)  # Next call goes through
```

### State Machine
```
CLOSED    → threshold exceeded → OPEN (all calls blocked)
OPEN      → pause(N)           → PAUSED (waiting for probe window)
PAUSED    → timeout elapsed    → HALF_OPEN (one probe call allowed)
HALF_OPEN → probe OK → CLOSED | probe fails → OPEN
OPEN      → increase_budget(N) → CLOSED (immediate)
```
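The same transitions, written out as a sketch; the enum and function are illustrative, not the SDK's public API:

```python
from enum import Enum

# Illustrative names, not the SDK's public API.
class BreakerState(Enum):
    CLOSED = "closed"
    OPEN = "open"
    PAUSED = "paused"
    HALF_OPEN = "half_open"

def next_state(state, event):
    # Transition table mirroring the documented state machine.
    transitions = {
        (BreakerState.CLOSED, "threshold_exceeded"): BreakerState.OPEN,
        (BreakerState.OPEN, "pause"): BreakerState.PAUSED,
        (BreakerState.PAUSED, "timeout_elapsed"): BreakerState.HALF_OPEN,
        (BreakerState.HALF_OPEN, "probe_ok"): BreakerState.CLOSED,
        (BreakerState.HALF_OPEN, "probe_failed"): BreakerState.OPEN,
        (BreakerState.OPEN, "increase_budget"): BreakerState.CLOSED,
    }
    return transitions.get((state, event), state)
```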
### Pre-Block Hook (on_block)

Save state before the exception propagates. Sync and async callbacks are both supported.
```python
async def save_state(event):
    await db.save({"agent": event.scope_id, "spent": event.current})
    event.monitor.pause(30)  # Pause instead of hard kill

agentwatch.init(
    budget_per_hour=5.0,
    block_on_threshold=True,
    on_block=save_state,  # Fires BEFORE exception
)
```

### LangGraph Checkpoint Resume
```python
try:
    result = graph.invoke({"messages": [...]}, config=config)
except CircuitBreakerException as e:
    # LangGraph auto-checkpoints before the exception propagates
    e.monitor.increase_budget(10.0)
    result = graph.invoke(None, config=config)  # Resumes from checkpoint
```

### EU AI Act Compliance
Every state transition (`triggered`, `paused`, `budget_increased`, `reset`, `recovered`, `retrip`) emits a compliance-relevant span with the `eu_ai_act_art12_relevant` flag. No state change goes unlogged.
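As a rough illustration, such a span's payload might look like this; only the `eu_ai_act_art12_relevant` flag and the transition names above are documented, the remaining fields are hypothetical:

```python
# Hypothetical payload. Only the eu_ai_act_art12_relevant flag and the
# transition names are documented; the other field names are illustrative.
compliance_span = {
    "event": "budget_increased",
    "eu_ai_act_art12_relevant": True,
    "scope_id": "ResearchBot",
    "timestamp": "2026-02-19T12:00:00Z",
}
```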
## Per-Agent Scoping
Set individual budgets and thresholds for each agent. Global limits still apply — per-agent limits are checked in addition.
```python
import agentwatch

agentwatch.init(budget_per_hour=50.0, block_on_threshold=True)

# Each agent gets its own budget:
with agentwatch.agent("ResearchBot",
                      role="Researcher",
                      budget_per_hour=10.0,
                      block_on_threshold=True):
    result = client.chat.completions.create(...)
    # Blocked if ResearchBot exceeds $10/hr OR global exceeds $50/hr

with agentwatch.agent("WriterBot",
                      role="Writer",
                      budget_per_hour=20.0,
                      max_error_rate=0.3):
    result = client.chat.completions.create(...)
    # WriterBot has its own budget AND error rate threshold
```

### agent() Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `name` | `str` | required | Agent display name. Used in traces and the dashboard. |
| `role` | `str` | `"Agent"` | Agent role for classification (e.g., Researcher, Writer, Orchestrator). |
| `agent_id` | `str \| None` | auto | Unique ID. Auto-generated from the name if not set. |
| `budget_per_hour` | `float \| None` | `None` | Per-agent hourly budget limit in USD. |
| `max_error_rate` | `float \| None` | `None` | Per-agent error rate threshold (0.0–1.0). |
| `block_on_threshold` | `bool` | `False` | Block calls when this agent's thresholds are exceeded. |
| `on_alert` | `callable \| None` | `None` | Per-agent alert callback. |
| `on_block` | `callable \| None` | `None` | Pre-block hook. Receives a `BlockEvent` with `.monitor` for recovery decisions. |
### Nested Agents
Agent contexts can be nested. Inner agents have their own thresholds while outer agent and global thresholds still apply.
```python
with agentwatch.agent("Orchestrator", budget_per_hour=100.0):
    # Orchestrator scope
    with agentwatch.agent("SubAgent", budget_per_hour=10.0):
        # SubAgent scope — both SubAgent AND Orchestrator limits checked
        result = client.chat.completions.create(...)
```

## Cost Tracking
AeneasSoft calculates real USD cost for every LLM call using current list prices.
### Supported Models & Pricing
Prices in USD per 1M tokens (input / output):
| Model | Input | Output |
|---|---|---|
| gpt-4o | $5.00 | $15.00 |
| gpt-4o-mini | $0.15 | $0.60 |
| gpt-4-turbo | $10.00 | $30.00 |
| gpt-4 | $30.00 | $60.00 |
| gpt-3.5-turbo | $0.50 | $1.50 |
| claude-opus-4-6 | $15.00 | $75.00 |
| claude-sonnet-4-6 | $3.00 | $15.00 |
| claude-haiku-4-5 | $0.80 | $4.00 |
| gemini-1.5-pro | $3.50 | $10.50 |
| gemini-1.5-flash | $0.075 | $0.30 |
| mistral-large-latest | $4.00 | $12.00 |
| mistral-small-latest | $1.00 | $3.00 |
| command-r-plus | $3.00 | $15.00 |
| command-r | $0.50 | $1.50 |
| llama3-70b-8192 | $0.59 | $0.79 |
| llama3-8b-8192 | $0.05 | $0.08 |
| mixtral-8x7b-32768 | $0.24 | $0.24 |
Unknown models use a default of $1.00 / $2.00 per 1M tokens. Model name matching supports prefixes (e.g., "gpt-4o-2024-11-20" matches "gpt-4o").
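A minimal sketch of how such a prefix lookup can work, using two rows from the table above; the function names are illustrative:

```python
# Illustrative cost lookup with longest-prefix matching (not the SDK's internals).
# Prices are USD per 1M tokens (input, output), taken from the table above.
PRICES = {
    "gpt-4o": (5.00, 15.00),
    "gpt-4o-mini": (0.15, 0.60),
}
DEFAULT_PRICE = (1.00, 2.00)  # fallback for unknown models

def price_for(model):
    # Longest prefix wins, so "gpt-4o-mini" is not swallowed by "gpt-4o",
    # while "gpt-4o-2024-11-20" still resolves to "gpt-4o".
    match = max((p for p in PRICES if model.startswith(p)), key=len, default=None)
    return PRICES[match] if match else DEFAULT_PRICE

def cost_usd(model, input_tokens, output_tokens):
    price_in, price_out = price_for(model)
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000

# Example: 1,000 input + 500 output tokens on gpt-4o costs $0.0125.
```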
## Node.js SDK

```bash
npm install @aeneassoft/sdk-node
```

```typescript
import { init, trace, span, currentTraceId } from '@aeneassoft/sdk-node';

// Initialize
init({
  apiKey: 'aw_your_key_here',                    // Required
  ingestUrl: 'http://localhost:3001/api/ingest', // Optional
  zeroDataRetention: false                       // Optional
});

// Group calls into a trace
await trace('my-workflow', {}, async () => {
  const response = await openai.chat.completions.create({...});
  console.log('Trace ID:', currentTraceId());
});
```

### Node.js init() Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `apiKey` | string | required | Authentication key for the backend. |
| `ingestUrl` | string | `api.aeneassoft.com` | Backend ingest endpoint URL. |
| `zeroDataRetention` | boolean | `false` | Strip input/output from spans. |
## Framework Compatibility
AeneasSoft operates at the HTTP transport layer — below every framework. No plugins. No middleware. No wrappers. Just 2 lines.
Every framework that calls an AI provider over HTTP is automatically instrumented. This includes LangChain, CrewAI, AutoGen, LlamaIndex, Haystack, Semantic Kernel, and any custom code.
### LangChain

```python
import agentwatch                    # <-- line 1
agentwatch.init(api_key="your-key")  # <-- line 2. Done.

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

chain = ChatPromptTemplate.from_template("Explain {topic}") | ChatOpenAI(model="gpt-4o") | StrOutputParser()

with agentwatch.trace("research-chain"):
    result = chain.invoke({"topic": "EU AI Act"})
# Every LLM call traced. No LangSmith required.
```

### CrewAI
```python
import agentwatch                    # <-- line 1
agentwatch.init(api_key="your-key")  # <-- line 2. Done.

from crewai import Agent, Task, Crew

researcher = Agent(role="Researcher", goal="Find data", verbose=True)
writer = Agent(role="Writer", goal="Write content", verbose=True)
crew = Crew(agents=[researcher, writer], tasks=[...])

with agentwatch.trace("compliance-crew"):
    result = crew.kickoff()
# Both agents traced — tokens, cost, latency per step.
```

### AutoGen
```python
import agentwatch                    # <-- line 1
agentwatch.init(api_key="your-key")  # <-- line 2. Done.

from autogen import ConversableAgent

assistant = ConversableAgent(name="Assistant", llm_config={"model": "gpt-4o"})
reviewer = ConversableAgent(name="Reviewer", llm_config={"model": "gpt-4o"})

with agentwatch.trace("code-review"):
    assistant.initiate_chat(reviewer, message="Review this code", max_turns=3)
# Full conversation flow traced: Assistant -> Reviewer -> Assistant.
```

### Any Framework / Direct HTTP
```python
import agentwatch                    # <-- line 1
agentwatch.init(api_key="your-key")  # <-- line 2. Done.

# Works with LlamaIndex, Haystack, Semantic Kernel, raw httpx/requests —
# anything that calls an AI API over HTTP is captured automatically.
# Supported: OpenAI, Anthropic, Gemini, Mistral, Groq, Cohere,
# Together AI, Fireworks, Ollama, Azure OpenAI, and more.
```

## Self-Hosting
Run AeneasSoft locally with Docker. No cloud dependency. No account needed.
### Minimal (Development)

```bash
docker compose -f docker-compose.local.yml up -d
# Starts: ClickHouse + Backend
# Dashboard: http://localhost:3001
# No Kafka. No auth. LOCAL_MODE=true.
```

### Full Stack (Production)
```bash
docker compose up -d
# Starts: ClickHouse + Kafka + Backend + Proxy
# Configure via .env file
# Services: clickhouse (8123), kafka (9092), backend (3001), proxy (8080)
```

### Verify It Works
```bash
# Check server health:
curl http://localhost:3001/health

# Send a test trace:
python -c "import agentwatch; agentwatch.init(); agentwatch.verify()"
```

## Environment Variables
Configure the backend via a `.env` file or environment variables. A sample `.env` follows the tables in this section.
### Core
| Parameter | Type | Default | Description |
|---|---|---|---|
| PORT | number | 3001 | Backend server port. |
| CLICKHOUSE_URL | string | http://localhost:8123 | ClickHouse database endpoint. |
| CLICKHOUSE_DB | string | productname | ClickHouse database name. |
| KAFKA_BROKERS | string | — | Kafka connection string. Optional — if not set, direct ingest is used. |
| CORS_ORIGINS | string | http://localhost:3000 | Comma-separated allowed CORS origins. |
### Authentication
| Parameter | Type | Default | Description |
|---|---|---|---|
| API_KEY | string | — | Single-tenant API key for SDK authentication. |
| JWT_SECRET | string | — | Secret for JWT token signing. Generate: openssl rand -hex 32 |
| LOCAL_MODE | boolean | auto | Skip all auth. Auto-enabled if neither JWT_SECRET nor API_KEY is set. |
### Data & Privacy
| Parameter | Type | Default | Description |
|---|---|---|---|
| ZERO_DATA_RETENTION | boolean | false | Don't store prompts/responses. Metadata only. |
| DATA_RETENTION_DAYS | number | 30 | ClickHouse TTL — data auto-deleted after N days. |
### Email (Optional)
| Parameter | Type | Default | Description |
|---|---|---|---|
| RESEND_API_KEY | string | — | Resend API key for sending emails (alerts, welcome, password reset). |
| FROM_EMAIL | string | noreply@aeneassoft.com | Sender email address. |
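As a starting point, here is a sample `.env` combining the variables above; every value is a placeholder to replace with your own:

```bash
# Sample .env for the full stack. All values are placeholders.
PORT=3001
CLICKHOUSE_URL=http://clickhouse:8123
KAFKA_BROKERS=kafka:9092
CORS_ORIGINS=http://localhost:3000
API_KEY=aw_your_key_here
JWT_SECRET=replace-with-output-of-openssl-rand-hex-32
ZERO_DATA_RETENTION=false
DATA_RETENTION_DAYS=30
```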
## API Reference

All endpoints accept JSON. Authentication is via JWT token or the `X-API-Key` header; in `LOCAL_MODE`, no auth is required. An example request follows the table.
| Method | Endpoint | Description |
|---|---|---|
| GET | /health | Server status. No auth required. |
| POST | /api/ingest | Ingest ATP span. API key auth. |
| GET | /api/traces | List traces. Supports ?search, ?status, ?agent_id, ?model, ?from, ?to, ?sort, ?order, ?limit, ?offset. |
| GET | /api/traces/:id/spans | All spans for a trace. |
| GET | /api/traces/:id/graph | Causal execution graph for a trace. |
| GET | /api/traces/:id/compliance-score | EU AI Act Article 12 readiness score (0-100). |
| GET | /api/traces/:id/compliance-report | RSA-signed PDF compliance report (Enterprise). |
| GET | /api/metrics | Dashboard KPIs: traces, tokens, cost, latency, error rate. |
| GET | /api/cost/daily | Daily cost breakdown. |
| GET | /api/cost/by-agent | Cost breakdown per agent. |
| GET | /api/cost/by-model | Cost breakdown per model. |
| GET | /api/reports/monthly | Monthly report data (JSON). |
| GET | /api/reports/monthly/csv | Monthly traces export (CSV). |
| GET | /api/reports/monthly/pdf | Monthly report (RSA-signed PDF). |
| GET | /api/circuit-breaker/status | Real-time circuit breaker state. |
| GET | /api/alerts | List alert rules. |
| POST | /api/alerts | Create alert rule. |
| DELETE | /api/alerts/:id | Delete alert rule. |
| PATCH | /api/alerts/:id | Toggle alert enabled/disabled. |
| GET | /api/alerts/history | Alert event history. |
| POST | /api/alerts/sdk-alert | Receive circuit breaker alert from SDK. |
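For example, to list recent traces using API-key authentication (the key value is a placeholder):

```bash
# List the 10 most recent traces, authenticating with an API key:
curl -H "X-API-Key: aw_your_key_here" \
  "http://localhost:3001/api/traces?limit=10"
```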
## Alert System
AeneasSoft has two layers of alerting: SDK-level (in-process) and backend-level (server-side rules).
### SDK Alerts (In-Process)

Configured via `init()` parameters. Fires immediately when thresholds are exceeded. 60-second cooldown between duplicate alerts.
```python
def my_alert_handler(alert):
    print(f"Alert: {alert['reason']}")
    print(f"Scope: {alert['scope']} / {alert['scope_id']}")
    print(f"Threshold: {alert['threshold']} | Current: {alert['current']}")
    # Send to Slack, PagerDuty, email, etc.

agentwatch.init(
    budget_per_hour=10.0,
    on_alert=my_alert_handler
)
```

### Backend Alerts (Server-Side)
Create rules via the dashboard or API. Backend evaluates rules against incoming trace data and sends email notifications via Resend.
```bash
# Create an alert rule via the API:
curl -X POST http://localhost:3001/api/alerts \
  -H "Content-Type: application/json" \
  -d '{
    "name": "High Cost Alert",
    "condition": "cost_per_hour",
    "threshold": 50.0,
    "action_type": "email"
  }'
```

## Streaming Support
AeneasSoft wraps OpenAI and Anthropic stream objects transparently. Your code receives chunks exactly as before — the SDK captures metadata in the background.
- **Non-streaming calls:** 100% accuracy. Full request and response captured.
- **Streaming calls:** Request plus final usage summary (tokens, cost) captured. Individual chunks are not logged.

Full chunk-level streaming tracing ships Q3 2026.
```python
# Streaming works transparently:
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True
)
for chunk in stream:
    print(chunk.choices[0].delta.content, end="")
# → Trace captured with final token count and cost after stream completes
```

## Architecture
In-Process Telemetry Interception with Safe Pause & Resume (Patent Pending).
```
Your Code
    |
    v
[agentwatch.init()]
    |
    +--→ Intercepts every LLM call in-process (no proxy)
    |    Captures: model, tokens, cost, latency, input/output
    |    Active Defense: budget/error/loop check BEFORE request
    |
    +--→ Circuit Breaker State Machine
    |    CLOSED → OPEN → PAUSED → HALF_OPEN → CLOSED
    |    on_block hook: save state before exception
    |    Recovery: pause() / increase_budget() / reset()
    |
    v
AI Provider (OpenAI, Anthropic, Gemini, Mistral, Groq, etc.)
```

### In-Process Interception
AeneasSoft captures every LLM call inside your application process. No external proxy, no network hop, no single point of failure. Works with any AI provider accessible via HTTP.
### Safe Pause & Resume
When a threshold is exceeded, the circuit breaker transitions through a full state machine (CLOSED → OPEN → PAUSED → HALF_OPEN). The on_block hook fires before the exception, letting you save state. Recovery methods on the monitor let you pause, increase budget, or reset.
### EU AI Act Compliance

Every circuit breaker state change (triggered, paused, budget increased, recovered, re-tripped, reset) is automatically logged as a compliance-relevant span with the `eu_ai_act_art12_relevant` flag.
### Supported Providers
Auto-detected via URL: OpenAI, Anthropic, Gemini, Mistral, Groq, Cohere, Together AI, Fireworks, Azure OpenAI, Ollama — and any provider accessible via HTTP.
## Migration Guide

### From Langfuse
Langfuse requires decorators or callbacks on every function. AeneasSoft requires zero code changes beyond init().
```python
# Before (Langfuse):
from langfuse.decorators import observe
from langfuse.openai import openai

@observe()
def my_function():
    client = openai.OpenAI()
    return client.chat.completions.create(...)

# After (AeneasSoft):
import agentwatch
from openai import OpenAI

agentwatch.init()

def my_function():
    client = OpenAI()
    return client.chat.completions.create(...)
# That's it. Remove all decorators. Remove langfuse imports.
```

### From LangSmith
LangSmith is tightly coupled to LangChain. AeneasSoft works with any framework.
```python
# Before (LangSmith):
import os
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "ls_..."
# Only works with LangChain

# After (AeneasSoft):
import agentwatch
agentwatch.init()
# Works with LangChain, CrewAI, raw OpenAI, httpx, any framework
```

### From Helicone
Helicone requires routing all traffic through a proxy (single point of failure). AeneasSoft runs in-process.
```python
# Before (Helicone):
client = OpenAI(
    base_url="https://oai.helicone.ai/v1",  # Proxy SPOF
    default_headers={"Helicone-Auth": "Bearer sk-..."}
)

# After (AeneasSoft):
import agentwatch
agentwatch.init()
client = OpenAI()  # Direct connection. No proxy. No SPOF.
```

## Troubleshooting & FAQ
### No traces appearing in the dashboard?

1. Check that the backend is running: `curl http://localhost:3001/health`
2. Ensure `agentwatch.init()` is called before any LLM calls.
3. Check that your SDK version is up to date: `pip install --upgrade aeneas-agentwatch`
### CircuitBreakerException raised unexpectedly?

Check your threshold configuration. `budget_per_hour` is cumulative over a rolling 1-hour window. Use `agentwatch.get_state()` to see current values. Set `block_on_threshold=False` to switch to alert-only mode.
### Cost shows $0.00 for all traces?

The SDK needs token counts from the API response to calculate cost. Ensure you're using a supported model (20+ models tracked). Unknown models use a default rate of $1.00/$2.00 per 1M tokens.
### Does it work with async code?

Yes. Both sync and async clients are patched (OpenAI, AsyncOpenAI, httpx.AsyncClient, aiohttp). Context propagation uses `ContextVar` for async safety.
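A minimal sketch with the async OpenAI client; no extra wiring is needed beyond `init()`:

```python
import asyncio

import agentwatch
from openai import AsyncOpenAI

agentwatch.init()  # patches sync and async transports once, up front

async def main():
    client = AsyncOpenAI()
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Hello"}],
    )
    print(response.choices[0].message.content)
    # The async call is traced exactly like its sync counterpart.

asyncio.run(main())
```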
### Does it add latency to my LLM calls?

Negligible. The interceptor adds ~0.1 ms of overhead per call (threshold check + span recording). No network proxy, no extra HTTP hop.
### Can I use it in production?

Yes. It is thread-safe (Lock-protected), memory-capped (`deque` with `maxlen=10,000` per monitor, roughly 240 KB of RAM), and exercised by CI on every push.
### How do I disable it temporarily?

Don't call `agentwatch.init()`. The SDK only activates when `init()` is explicitly called. There are no environment variable side effects.
## Honest Boundaries
We believe transparency about limitations builds more trust than hiding them.
### What we do well
- Non-streaming LLM calls: 100% accurate capture
- In-process interception with zero network latency
- Circuit breaker with Safe Pause & Resume state machine
- Cost tracking for 20+ models with current list prices
### What we don't do (yet)
- Prompt management / versioning
- A/B testing / evaluation pipelines
- Streaming: individual chunks not captured (Q3 2026)
- Cost: batch API, cached tokens, fine-tuned model rates
- Multi-process central configuration
### Monkey-patching disclaimer
We modify HTTP library internals at runtime. We test against pinned library versions and ship SDK updates within 48 hours of breaking changes upstream.
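For context, transport-level patching generally has this shape (a hedged sketch with `httpx`, not AeneasSoft's actual code):

```python
# Illustrative only: the general shape of transport-level patching.
import httpx

_original_send = httpx.Client.send

def _instrumented_send(self, request, **kwargs):
    # Record request metadata and check thresholds here, then delegate
    # to the original send so application behavior is unchanged.
    response = _original_send(self, request, **kwargs)
    # Record response metadata (tokens, latency, cost) here.
    return response

httpx.Client.send = _instrumented_send
```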