# Documentation

Everything you need to install, configure, and run AeneasSoft.

## Quickstart

Get your first trace and activate Active Defense in under 2 minutes.

### Step 1: Install the SDK

```bash
pip install aeneas-agentwatch
```

### Step 2: Initialize in your code
```python
import agentwatch

agentwatch.init()
# That's it. Every LLM call is now monitored.
```

### Step 3: Make an LLM call
```python
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}]
)
# Trace captured automatically. No callbacks, no wrappers.
```

### Step 4: View your traces
Open http://localhost:3001 to see your dashboard, or use the API:

```bash
curl http://localhost:3001/api/traces
```

## Configuration Reference

All parameters for `agentwatch.init()`:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `api_key` | `str \| None` | `None` | API key for cloud mode. If not set, the SDK connects to localhost (development mode). |
| `proxy_url` | `str \| None` | auto | Backend ingest URL. Auto-detected: `localhost:3001` (no key) or `api.aeneassoft.com` (with key). |
| `zero_data_retention` | `bool` | `False` | Strip prompt/response text from spans. Only metadata (model, tokens, cost) is sent. |
| `budget_per_hour` | `float \| None` | `None` | Hourly cost limit in USD. Triggers an alert (or block) when exceeded. |
| `max_error_rate` | `float \| None` | `None` | Error rate threshold (0.0–1.0). Triggers when the error ratio exceeds this in a 5-minute window. |
| `max_calls_per_minute` | `int \| None` | `None` | Loop detection. Triggers when calls per minute exceed this threshold. |
| `block_on_threshold` | `bool` | `False` | If `True`, raises `CircuitBreakerException` and blocks the request. If `False`, alert only. |
| `on_alert` | `callable \| None` | `None` | Callback invoked when any threshold is exceeded. Receives an alert dict. |
| `on_block` | `callable \| None` | `None` | Pre-block hook (sync or async). Fires BEFORE `CircuitBreakerException`. Receives a `BlockEvent` with `.monitor` for recovery. |
### Smart URL Detection

The SDK automatically determines where to send traces:

- `api_key` not set → `http://localhost:3001/api/ingest` (local dev)
- `api_key` set → `https://api.aeneassoft.com/api/ingest` (cloud)
- `proxy_url` set → uses your custom URL
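A minimal sketch of that resolution order; the constants and function name are illustrative, not the SDK's internals:

```python
# Illustrative sketch of the documented resolution order (not the SDK's internals).
DEFAULT_LOCAL = "http://localhost:3001/api/ingest"
DEFAULT_CLOUD = "https://api.aeneassoft.com/api/ingest"

def resolve_ingest_url(api_key=None, proxy_url=None):
    if proxy_url:          # an explicit proxy_url always wins
        return proxy_url
    if api_key:            # an API key implies cloud mode
        return DEFAULT_CLOUD
    return DEFAULT_LOCAL   # no key: local development mode
```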
### Full Example

```python
import agentwatch

agentwatch.init(
    budget_per_hour=10.0,       # Alert if agent spends > $10/hour
    max_error_rate=0.5,         # Alert if > 50% of calls fail
    max_calls_per_minute=100,   # Detect infinite loops
    block_on_threshold=True,    # Block calls (not just alert)
    zero_data_retention=True,   # GDPR: don't store prompts
    on_alert=lambda alert: print(f"ALERT: {alert['reason']}")
)
```

## Active Defense (Safe Pause & Resume)
Block runaway AI agents before they drain your budget — then recover without losing state. Active Defense monitors cost, error rate, and call frequency with a full state machine (CLOSED → OPEN → PAUSED → HALF_OPEN).
### How It Works

1. The SDK intercepts every LLM call at the HTTP transport layer.
2. Before the request leaves your process, thresholds are checked.
3. If a threshold is exceeded and `block_on_threshold=True`, `CircuitBreakerException` is raised. The request is never sent. $0 wasted.
4. If `block_on_threshold=False`, the alert fires and the request proceeds.
### Thresholds

| Threshold | Window | Behavior |
|---|---|---|
| `budget_per_hour` | Rolling 1 hour | Fires when cumulative cost exceeds the limit |
| `max_error_rate` | Rolling 5 minutes | Fires when the error ratio exceeds the threshold (min. 10 calls required) |
| `max_calls_per_minute` | Rolling 1 minute | Fires when the call count exceeds the limit (loop detection) |
### Alert Cooldown
Duplicate alerts are suppressed for 60 seconds to prevent spam. Each unique threshold violation has its own cooldown timer.
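A minimal sketch of this suppression logic, assuming a per-(scope, threshold) key; the names here are illustrative, not the SDK's internals:

```python
import time

# Illustrative per-violation cooldown (not the SDK's internals).
COOLDOWN_SECONDS = 60
_last_fired = {}  # (scope_id, threshold_name) -> last alert timestamp

def should_fire(scope_id, threshold_name):
    key = (scope_id, threshold_name)
    now = time.time()
    if now - _last_fired.get(key, 0.0) < COOLDOWN_SECONDS:
        return False  # duplicate within the cooldown window: suppress
    _last_fired[key] = now
    return True
```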
### Catching Blocked Requests + Recovery
```python
from agentwatch import CircuitBreakerException

try:
    result = client.chat.completions.create(...)
except CircuitBreakerException as e:
    print(f"Blocked: {e.reason}")
    print(f"State: {e.state}")  # Full monitor snapshot

    # Recovery via e.monitor:
    e.monitor.pause(60)           # Half-Open after 60s (probe call)
    e.monitor.increase_budget(5)  # Or: raise budget, resume immediately
    e.monitor.reset_windows()     # Emergency: clear all windows

    result = client.chat.completions.create(...)  # Next call goes through
```

### State Machine
```
CLOSED    → threshold exceeded → OPEN (all calls blocked)
OPEN      → pause(N)           → PAUSED (waiting for probe window)
PAUSED    → timeout elapsed    → HALF_OPEN (one probe call allowed)
HALF_OPEN → probe OK → CLOSED | probe fails → OPEN
OPEN      → increase_budget(N) → CLOSED (immediate)
```
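The same transitions, written out as a sketch; the enum and function are illustrative, not the SDK's public API:

```python
from enum import Enum

# Illustrative names, not the SDK's public API.
class BreakerState(Enum):
    CLOSED = "closed"
    OPEN = "open"
    PAUSED = "paused"
    HALF_OPEN = "half_open"

def next_state(state, event):
    # Transition table mirroring the documented state machine.
    transitions = {
        (BreakerState.CLOSED, "threshold_exceeded"): BreakerState.OPEN,
        (BreakerState.OPEN, "pause"): BreakerState.PAUSED,
        (BreakerState.PAUSED, "timeout_elapsed"): BreakerState.HALF_OPEN,
        (BreakerState.HALF_OPEN, "probe_ok"): BreakerState.CLOSED,
        (BreakerState.HALF_OPEN, "probe_failed"): BreakerState.OPEN,
        (BreakerState.OPEN, "increase_budget"): BreakerState.CLOSED,
    }
    return transitions.get((state, event), state)
```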
### Pre-Block Hook (on_block)

Save state before the exception propagates. Sync and async callbacks are both supported.
```python
async def save_state(event):
    await db.save({"agent": event.scope_id, "spent": event.current})
    event.monitor.pause(30)  # Pause instead of hard kill

agentwatch.init(
    budget_per_hour=5.0,
    block_on_threshold=True,
    on_block=save_state,  # Fires BEFORE exception
)
```

### LangGraph Checkpoint Resume
```python
try:
    result = graph.invoke({"messages": [...]}, config=config)
except CircuitBreakerException as e:
    # LangGraph auto-checkpoints before the exception propagates
    e.monitor.increase_budget(10.0)
    result = graph.invoke(None, config=config)  # Resumes from checkpoint
```

### EU AI Act Compliance
Every state transition (`triggered`, `paused`, `budget_increased`, `reset`, `recovered`, `retrip`) emits a compliance-relevant span with the `eu_ai_act_art12_relevant` flag. No state change goes unlogged.
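As a rough illustration, such a span's payload might look like this; only the `eu_ai_act_art12_relevant` flag and the transition names above are documented, the remaining fields are hypothetical:

```python
# Hypothetical payload. Only the eu_ai_act_art12_relevant flag and the
# transition names are documented; the other field names are illustrative.
compliance_span = {
    "event": "budget_increased",
    "eu_ai_act_art12_relevant": True,
    "scope_id": "ResearchBot",
    "timestamp": "2026-02-19T12:00:00Z",
}
```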
## Per-Agent Scoping
Set individual budgets and thresholds for each agent. Global limits still apply — per-agent limits are checked in addition.
```python
import agentwatch

agentwatch.init(budget_per_hour=50.0, block_on_threshold=True)

# Each agent gets its own budget:
with agentwatch.agent("ResearchBot",
                      role="Researcher",
                      budget_per_hour=10.0,
                      block_on_threshold=True):
    result = client.chat.completions.create(...)
    # Blocked if ResearchBot exceeds $10/hr OR global exceeds $50/hr

with agentwatch.agent("WriterBot",
                      role="Writer",
                      budget_per_hour=20.0,
                      max_error_rate=0.3):
    result = client.chat.completions.create(...)
    # WriterBot has its own budget AND error rate threshold
```

### agent() Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `name` | `str` | required | Agent display name. Used in traces and the dashboard. |
| `role` | `str` | `"Agent"` | Agent role for classification (e.g., Researcher, Writer, Orchestrator). |
| `agent_id` | `str \| None` | auto | Unique ID. Auto-generated from the name if not set. |
| `budget_per_hour` | `float \| None` | `None` | Per-agent hourly budget limit in USD. |
| `max_error_rate` | `float \| None` | `None` | Per-agent error rate threshold (0.0–1.0). |
| `block_on_threshold` | `bool` | `False` | Block calls when this agent's thresholds are exceeded. |
| `on_alert` | `callable \| None` | `None` | Per-agent alert callback. |
| `on_block` | `callable \| None` | `None` | Pre-block hook. Receives a `BlockEvent` with `.monitor` for recovery decisions. |
### Nested Agents
Agent contexts can be nested. Inner agents have their own thresholds while outer agent and global thresholds still apply.
```python
with agentwatch.agent("Orchestrator", budget_per_hour=100.0):
    # Orchestrator scope
    with agentwatch.agent("SubAgent", budget_per_hour=10.0):
        # SubAgent scope — both SubAgent AND Orchestrator limits checked
        result = client.chat.completions.create(...)
```

## Cost Tracking
AeneasSoft calculates real USD cost for every LLM call using current list prices.
### Supported Models & Pricing
Prices in USD per 1M tokens (input / output):
| Model | Input | Output |
|---|---|---|
| gpt-4o | $5.00 | $15.00 |
| gpt-4o-mini | $0.15 | $0.60 |
| gpt-4-turbo | $10.00 | $30.00 |
| gpt-4 | $30.00 | $60.00 |
| gpt-3.5-turbo | $0.50 | $1.50 |
| claude-opus-4-6 | $15.00 | $75.00 |
| claude-sonnet-4-6 | $3.00 | $15.00 |
| claude-haiku-4-5 | $0.80 | $4.00 |
| gemini-1.5-pro | $3.50 | $10.50 |
| gemini-1.5-flash | $0.075 | $0.30 |
| mistral-large-latest | $4.00 | $12.00 |
| mistral-small-latest | $1.00 | $3.00 |
| command-r-plus | $3.00 | $15.00 |
| command-r | $0.50 | $1.50 |
| llama3-70b-8192 | $0.59 | $0.79 |
| llama3-8b-8192 | $0.05 | $0.08 |
| mixtral-8x7b-32768 | $0.24 | $0.24 |
Unknown models use a default of $1.00 / $2.00 per 1M tokens. Model name matching supports prefixes (e.g., "gpt-4o-2024-11-20" matches "gpt-4o").
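A minimal sketch of how such a prefix lookup can work, using two rows from the table above; the function names are illustrative:

```python
# Illustrative cost lookup with longest-prefix matching (not the SDK's internals).
# Prices are USD per 1M tokens (input, output), taken from the table above.
PRICES = {
    "gpt-4o": (5.00, 15.00),
    "gpt-4o-mini": (0.15, 0.60),
}
DEFAULT_PRICE = (1.00, 2.00)  # fallback for unknown models

def price_for(model):
    # Longest prefix wins, so "gpt-4o-mini" is not swallowed by "gpt-4o",
    # while "gpt-4o-2024-11-20" still resolves to "gpt-4o".
    match = max((p for p in PRICES if model.startswith(p)), key=len, default=None)
    return PRICES[match] if match else DEFAULT_PRICE

def cost_usd(model, input_tokens, output_tokens):
    price_in, price_out = price_for(model)
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000

# Example: 1,000 input + 500 output tokens on gpt-4o costs $0.0125.
```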
## Node.js SDK

```bash
npm install @aeneassoft/sdk-node
```

```typescript
import { init, trace, span, currentTraceId } from '@aeneassoft/sdk-node';

// Initialize
init({
  apiKey: 'aw_your_key_here',                    // Required
  ingestUrl: 'http://localhost:3001/api/ingest', // Optional
  zeroDataRetention: false                       // Optional
});

// Group calls into a trace
await trace('my-workflow', {}, async () => {
  const response = await openai.chat.completions.create({...});
  console.log('Trace ID:', currentTraceId());
});
```

### Node.js init() Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `apiKey` | string | required | Authentication key for the backend. |
| `ingestUrl` | string | `api.aeneassoft.com` | Backend ingest endpoint URL. |
| `zeroDataRetention` | boolean | `false` | Strip input/output from spans. |
## Framework Compatibility
AeneasSoft operates at the HTTP transport layer — below every framework. No plugins. No middleware. No wrappers. Just 2 lines.
Every framework that calls an AI provider over HTTP is automatically instrumented. This includes LangChain, CrewAI, AutoGen, LlamaIndex, Haystack, Semantic Kernel, and any custom code.
### LangChain

```python
import agentwatch                    # <-- line 1
agentwatch.init(api_key="your-key")  # <-- line 2. Done.

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

chain = ChatPromptTemplate.from_template("Explain {topic}") | ChatOpenAI(model="gpt-4o") | StrOutputParser()

with agentwatch.trace("research-chain"):
    result = chain.invoke({"topic": "EU AI Act"})
# Every LLM call traced. No LangSmith required.
```

### CrewAI
```python
import agentwatch                    # <-- line 1
agentwatch.init(api_key="your-key")  # <-- line 2. Done.

from crewai import Agent, Task, Crew

researcher = Agent(role="Researcher", goal="Find data", verbose=True)
writer = Agent(role="Writer", goal="Write content", verbose=True)
crew = Crew(agents=[researcher, writer], tasks=[...])

with agentwatch.trace("compliance-crew"):
    result = crew.kickoff()
# Both agents traced — tokens, cost, latency per step.
```

### AutoGen
```python
import agentwatch                    # <-- line 1
agentwatch.init(api_key="your-key")  # <-- line 2. Done.

from autogen import ConversableAgent

assistant = ConversableAgent(name="Assistant", llm_config={"model": "gpt-4o"})
reviewer = ConversableAgent(name="Reviewer", llm_config={"model": "gpt-4o"})

with agentwatch.trace("code-review"):
    assistant.initiate_chat(reviewer, message="Review this code", max_turns=3)
# Full conversation flow traced: Assistant -> Reviewer -> Assistant.
```

### Any Framework / Direct HTTP
```python
import agentwatch                    # <-- line 1
agentwatch.init(api_key="your-key")  # <-- line 2. Done.

# Works with LlamaIndex, Haystack, Semantic Kernel, raw httpx/requests —
# anything that calls an AI API over HTTP is captured automatically.
# Supported: OpenAI, Anthropic, Gemini, Mistral, Groq, Cohere,
# Together AI, Fireworks, Ollama, Azure OpenAI, and more.
```

## Self-Hosting
Run AeneasSoft locally with Docker. No cloud dependency. No account needed.
### Minimal (Development)

```bash
docker compose -f docker-compose.local.yml up -d
# Starts: ClickHouse + Backend
# Dashboard: http://localhost:3001
# No Kafka. No auth. LOCAL_MODE=true.
```

### Full Stack (Production)
```bash
docker compose up -d
# Starts: ClickHouse + Kafka + Backend + Proxy
# Configure via .env file
# Services: clickhouse (8123), kafka (9092), backend (3001), proxy (8080)
```

### Verify It Works
```bash
# Check server health:
curl http://localhost:3001/health

# Send a test trace:
python -c "import agentwatch; agentwatch.init(); agentwatch.verify()"
```

## Environment Variables
Configure the backend via a `.env` file or environment variables. A sample `.env` follows the tables in this section.
### Core
| Parameter | Type | Default | Description |
|---|---|---|---|
| PORT | number | 3001 | Backend server port. |
| CLICKHOUSE_URL | string | http://localhost:8123 | ClickHouse database endpoint. |
| CLICKHOUSE_DB | string | productname | ClickHouse database name. |
| KAFKA_BROKERS | string | — | Kafka connection string. Optional — if not set, direct ingest is used. |
| CORS_ORIGINS | string | http://localhost:3000 | Comma-separated allowed CORS origins. |
### Authentication
| Parameter | Type | Default | Description |
|---|---|---|---|
| API_KEY | string | — | Single-tenant API key for SDK authentication. |
| JWT_SECRET | string | — | Secret for JWT token signing. Generate: openssl rand -hex 32 |
| LOCAL_MODE | boolean | auto | Skip all auth. Auto-enabled if neither JWT_SECRET nor API_KEY is set. |
### Data & Privacy
| Parameter | Type | Default | Description |
|---|---|---|---|
| ZERO_DATA_RETENTION | boolean | false | Don't store prompts/responses. Metadata only. |
| DATA_RETENTION_DAYS | number | 30 | ClickHouse TTL — data auto-deleted after N days. |
### Email (Optional)
| Parameter | Type | Default | Description |
|---|---|---|---|
| RESEND_API_KEY | string | — | Resend API key for sending emails (alerts, welcome, password reset). |
| FROM_EMAIL | string | noreply@aeneassoft.com | Sender email address. |
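As a starting point, here is a sample `.env` combining the variables above; every value is a placeholder to replace with your own:

```bash
# Sample .env for the full stack. All values are placeholders.
PORT=3001
CLICKHOUSE_URL=http://clickhouse:8123
KAFKA_BROKERS=kafka:9092
CORS_ORIGINS=http://localhost:3000
API_KEY=aw_your_key_here
JWT_SECRET=replace-with-output-of-openssl-rand-hex-32
ZERO_DATA_RETENTION=false
DATA_RETENTION_DAYS=30
```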
## API Reference

All endpoints accept JSON. Authentication is via JWT token or the `X-API-Key` header; in `LOCAL_MODE`, no auth is required. An example request follows the table.
| Method | Endpoint | Description |
|---|---|---|
| GET | /health | Server status. No auth required. |
| POST | /api/ingest | Ingest ATP span. API key auth. |
| GET | /api/traces | List traces. Supports ?search, ?status, ?agent_id, ?model, ?from, ?to, ?sort, ?order, ?limit, ?offset. |
| GET | /api/traces/:id/spans | All spans for a trace. |
| GET | /api/traces/:id/graph | Causal execution graph for a trace. |
| GET | /api/traces/:id/compliance-score | EU AI Act Article 12 readiness score (0-100). |
| GET | /api/traces/:id/compliance-report | RSA-signed PDF compliance report (Enterprise). |
| GET | /api/metrics | Dashboard KPIs: traces, tokens, cost, latency, error rate. |
| GET | /api/cost/daily | Daily cost breakdown. |
| GET | /api/cost/by-agent | Cost breakdown per agent. |
| GET | /api/cost/by-model | Cost breakdown per model. |
| GET | /api/reports/monthly | Monthly report data (JSON). |
| GET | /api/reports/monthly/csv | Monthly traces export (CSV). |
| GET | /api/reports/monthly/pdf | Monthly report (RSA-signed PDF). |
| GET | /api/circuit-breaker/status | Real-time circuit breaker state. |
| GET | /api/alerts | List alert rules. |
| POST | /api/alerts | Create alert rule. |
| DELETE | /api/alerts/:id | Delete alert rule. |
| PATCH | /api/alerts/:id | Toggle alert enabled/disabled. |
| GET | /api/alerts/history | Alert event history. |
| POST | /api/alerts/sdk-alert | Receive circuit breaker alert from SDK. |
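For example, to list recent traces using API-key authentication (the key value is a placeholder):

```bash
# List the 10 most recent traces, authenticating with an API key:
curl -H "X-API-Key: aw_your_key_here" \
  "http://localhost:3001/api/traces?limit=10"
```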
## Alert System
AeneasSoft has two layers of alerting: SDK-level (in-process) and backend-level (server-side rules).
### SDK Alerts (In-Process)

Configured via `init()` parameters. Fires immediately when thresholds are exceeded. 60-second cooldown between duplicate alerts.
```python
def my_alert_handler(alert):
    print(f"Alert: {alert['reason']}")
    print(f"Scope: {alert['scope']} / {alert['scope_id']}")
    print(f"Threshold: {alert['threshold']} | Current: {alert['current']}")
    # Send to Slack, PagerDuty, email, etc.

agentwatch.init(
    budget_per_hour=10.0,
    on_alert=my_alert_handler
)
```

### Backend Alerts (Server-Side)
Create rules via the dashboard or API. Backend evaluates rules against incoming trace data and sends email notifications via Resend.
```bash
# Create an alert rule via the API:
curl -X POST http://localhost:3001/api/alerts \
  -H "Content-Type: application/json" \
  -d '{
    "name": "High Cost Alert",
    "condition": "cost_per_hour",
    "threshold": 50.0,
    "action_type": "email"
  }'
```

## Streaming Support
AeneasSoft wraps OpenAI and Anthropic stream objects transparently. Your code receives chunks exactly as before — the SDK captures metadata in the background.
- **Non-streaming calls:** 100% accuracy. Full request and response captured.
- **Streaming calls:** Request plus final usage summary (tokens, cost) captured. Individual chunks are not logged.

Full chunk-level streaming tracing ships Q3 2026.
```python
# Streaming works transparently:
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True
)
for chunk in stream:
    print(chunk.choices[0].delta.content, end="")
# → Trace captured with final token count and cost after stream completes
```

## Architecture
In-Process Telemetry Interception with Safe Pause & Resume (Patent Pending).
```
Your Code
    |
    v
[agentwatch.init()]
    |
    +--→ Intercepts every LLM call in-process (no proxy)
    |    Captures: model, tokens, cost, latency, input/output
    |    Active Defense: budget/error/loop check BEFORE request
    |
    +--→ Circuit Breaker State Machine
    |    CLOSED → OPEN → PAUSED → HALF_OPEN → CLOSED
    |    on_block hook: save state before exception
    |    Recovery: pause() / increase_budget() / reset()
    |
    v
AI Provider (OpenAI, Anthropic, Gemini, Mistral, Groq, etc.)
```

### In-Process Interception
AeneasSoft captures every LLM call inside your application process. No external proxy, no network hop, no single point of failure. Works with any AI provider accessible via HTTP.
### Safe Pause & Resume
When a threshold is exceeded, the circuit breaker transitions through a full state machine (CLOSED → OPEN → PAUSED → HALF_OPEN). The on_block hook fires before the exception, letting you save state. Recovery methods on the monitor let you pause, increase budget, or reset.
### EU AI Act Compliance

Every circuit breaker state change (triggered, paused, budget increased, recovered, re-tripped, reset) is automatically logged as a compliance-relevant span with the `eu_ai_act_art12_relevant` flag.
### Supported Providers
Auto-detected via URL: OpenAI, Anthropic, Gemini, Mistral, Groq, Cohere, Together AI, Fireworks, Azure OpenAI, Ollama — and any provider accessible via HTTP.
## Migration Guide

### From Langfuse
Langfuse requires decorators or callbacks on every function. AeneasSoft requires zero code changes beyond init().
```python
# Before (Langfuse):
from langfuse.decorators import observe
from langfuse.openai import openai

@observe()
def my_function():
    client = openai.OpenAI()
    return client.chat.completions.create(...)

# After (AeneasSoft):
import agentwatch
from openai import OpenAI

agentwatch.init()

def my_function():
    client = OpenAI()
    return client.chat.completions.create(...)
# That's it. Remove all decorators. Remove langfuse imports.
```

### From LangSmith
LangSmith is tightly coupled to LangChain. AeneasSoft works with any framework.
```python
# Before (LangSmith):
import os
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "ls_..."
# Only works with LangChain

# After (AeneasSoft):
import agentwatch
agentwatch.init()
# Works with LangChain, CrewAI, raw OpenAI, httpx, any framework
```

### From Helicone
Helicone requires routing all traffic through a proxy (single point of failure). AeneasSoft runs in-process.
```python
# Before (Helicone):
client = OpenAI(
    base_url="https://oai.helicone.ai/v1",  # Proxy SPOF
    default_headers={"Helicone-Auth": "Bearer sk-..."}
)

# After (AeneasSoft):
import agentwatch
agentwatch.init()
client = OpenAI()  # Direct connection. No proxy. No SPOF.
```

## Troubleshooting & FAQ
### No traces appearing in the dashboard?

1. Check that the backend is running: `curl http://localhost:3001/health`
2. Ensure `agentwatch.init()` is called before any LLM calls.
3. Check that your SDK version is up to date: `pip install --upgrade aeneas-agentwatch`
### CircuitBreakerException raised unexpectedly?

Check your threshold configuration. `budget_per_hour` is cumulative over a rolling 1-hour window. Use `agentwatch.get_state()` to see current values. Set `block_on_threshold=False` to switch to alert-only mode.
### Cost shows $0.00 for all traces?

The SDK needs token counts from the API response to calculate cost. Ensure you're using a supported model (20+ models tracked). Unknown models use a default rate of $1.00/$2.00 per 1M tokens.
### Does it work with async code?

Yes. Both sync and async clients are patched (OpenAI, AsyncOpenAI, httpx.AsyncClient, aiohttp). Context propagation uses `ContextVar` for async safety.
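A minimal sketch with the async OpenAI client; no extra wiring is needed beyond `init()`:

```python
import asyncio

import agentwatch
from openai import AsyncOpenAI

agentwatch.init()  # patches sync and async transports once, up front

async def main():
    client = AsyncOpenAI()
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Hello"}],
    )
    print(response.choices[0].message.content)
    # The async call is traced exactly like its sync counterpart.

asyncio.run(main())
```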
### Does it add latency to my LLM calls?

Negligible. The interceptor adds ~0.1 ms of overhead per call (threshold check + span recording). No network proxy, no extra HTTP hop.
### Can I use it in production?

Yes. It is thread-safe (Lock-protected), memory-capped (`deque` with `maxlen=10,000` per monitor, roughly 240 KB of RAM), and exercised by CI on every push.
### How do I disable it temporarily?

Don't call `agentwatch.init()`. The SDK only activates when `init()` is explicitly called. There are no environment variable side effects.
## Honest Boundaries
We believe transparency about limitations builds more trust than hiding them.
### What we do well
- Non-streaming LLM calls: 100% accurate capture
- In-process interception with zero network latency
- Circuit breaker with Safe Pause & Resume state machine
- Cost tracking for 20+ models with current list prices
### What we don't do (yet)
- Prompt management / versioning
- A/B testing / evaluation pipelines
- Streaming: individual chunks not captured (Q3 2026)
- Cost: batch API, cached tokens, fine-tuned model rates
- Multi-process central configuration
### Monkey-patching disclaimer
We modify HTTP library internals at runtime. We test against pinned library versions and ship SDK updates within 48 hours of breaking changes upstream.
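For context, transport-level patching generally has this shape (a hedged sketch with `httpx`, not AeneasSoft's actual code):

```python
# Illustrative only: the general shape of transport-level patching.
import httpx

_original_send = httpx.Client.send

def _instrumented_send(self, request, **kwargs):
    # Record request metadata and check thresholds here, then delegate
    # to the original send so application behavior is unchanged.
    response = _original_send(self, request, **kwargs)
    # Record response metadata (tokens, latency, cost) here.
    return response

httpx.Client.send = _instrumented_send
```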