Documentation

Everything you need to install, configure, and run AeneasSoft.

Quickstart

Get your first trace and activate Active Defense in under 2 minutes.

Step 1: Install the SDK

```bash
pip install aeneas-agentwatch
```

Step 2: Initialize in your code

```python
import agentwatch

agentwatch.init()
# That's it. Every LLM call is now monitored.
```

Step 3: Make an LLM call

```python
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}]
)
# Trace captured automatically. No callbacks, no wrappers.
```

Step 4: View your traces

Open http://localhost:3001 to see your dashboard, or use the API:

```bash
curl http://localhost:3001/api/traces
```

Configuration Reference

All parameters for agentwatch.init():

| Parameter | Type | Default | Description |
|---|---|---|---|
| api_key | str \| None | None | API key for cloud mode. If not set, the SDK connects to localhost (development mode). |
| proxy_url | str \| None | auto | Backend ingest URL. Auto-detected: localhost:3001 (no key) or api.aeneassoft.com (with key). |
| zero_data_retention | bool | False | Strip prompt/response text from spans. Only metadata (model, tokens, cost) is sent. |
| budget_per_hour | float \| None | None | Hourly cost limit in USD. Triggers an alert (or block) when exceeded. |
| max_error_rate | float \| None | None | Error rate threshold (0.0–1.0). Triggers when the error ratio exceeds this in a 5-minute window. |
| max_calls_per_minute | int \| None | None | Loop detection. Triggers when calls per minute exceed this threshold. |
| block_on_threshold | bool | False | If True, raises CircuitBreakerException and blocks the request. If False, alert only. |
| on_alert | callable \| None | None | Callback invoked when any threshold is exceeded. Receives an alert dict. |
| on_block | callable \| None | None | Pre-block hook (sync or async). Fires BEFORE CircuitBreakerException. Receives a BlockEvent with .monitor for recovery. |

Smart URL Detection

The SDK automatically determines where to send traces:

api_key not set → http://localhost:3001/api/ingest (local dev)

api_key set → https://api.aeneassoft.com/api/ingest (cloud)

proxy_url set → uses your custom URL
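
The detection order above can be sketched as a pure function. This is illustrative only, not the SDK's actual internals: an explicit `proxy_url` always wins, otherwise the presence of an `api_key` selects cloud versus local.

```python
# Illustrative sketch of the URL detection order — not the SDK's internals.
def resolve_ingest_url(api_key=None, proxy_url=None):
    if proxy_url:  # explicit override wins
        return proxy_url
    if api_key:    # key present -> cloud ingest
        return "https://api.aeneassoft.com/api/ingest"
    return "http://localhost:3001/api/ingest"  # no key -> local dev

print(resolve_ingest_url())                  # http://localhost:3001/api/ingest
print(resolve_ingest_url(api_key="aw_123"))  # https://api.aeneassoft.com/api/ingest
```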

Full Example

```python
import agentwatch

agentwatch.init(
    budget_per_hour=10.0,         # Alert if agent spends > $10/hour
    max_error_rate=0.5,           # Alert if > 50% of calls fail
    max_calls_per_minute=100,     # Detect infinite loops
    block_on_threshold=True,      # Block calls (not just alert)
    zero_data_retention=True,     # GDPR: don't store prompts
    on_alert=lambda alert: print(f"ALERT: {alert['reason']}")
)
```

Active Defense (Safe Pause & Resume)

Block runaway AI agents before they drain your budget — then recover without losing state. Active Defense monitors cost, error rate, and call frequency with a full state machine (CLOSED → OPEN → PAUSED → HALF_OPEN).

How It Works

1. SDK intercepts every LLM call at the HTTP transport layer

2. Before the request leaves your process, thresholds are checked

3. If a threshold is exceeded and block_on_threshold=True:

CircuitBreakerException is raised. Request never sent. $0 wasted.

4. If block_on_threshold=False: alert fires, request proceeds

Thresholds

| Threshold | Window | Behavior |
|---|---|---|
| budget_per_hour | Rolling 1 hour | Fires when cumulative cost exceeds the limit |
| max_error_rate | Rolling 5 minutes | Fires when the error ratio exceeds the threshold (min. 10 calls required) |
| max_calls_per_minute | Rolling 1 minute | Fires when the call count exceeds the limit (loop detection) |
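
A rolling-window check like `budget_per_hour` can be sketched in a few lines. This is a minimal illustration of the concept, not the SDK's implementation:

```python
# Minimal sketch of a rolling 1-hour budget window — illustrative only.
from collections import deque

class RollingBudget:
    def __init__(self, limit_usd, window_s=3600):
        self.limit, self.window = limit_usd, window_s
        self.events = deque()  # (timestamp, cost) pairs

    def record(self, cost, now):
        self.events.append((now, cost))
        # Drop entries that have aged out of the window
        while self.events and self.events[0][0] < now - self.window:
            self.events.popleft()

    def exceeded(self):
        return sum(c for _, c in self.events) > self.limit

b = RollingBudget(limit_usd=10.0)
b.record(6.0, now=0)
b.record(5.0, now=10)
print(b.exceeded())  # True: $11 spent inside the 1-hour window
```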

Alert Cooldown

Duplicate alerts are suppressed for 60 seconds to prevent spam. Each unique threshold violation has its own cooldown timer.
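
The per-violation cooldown behaves like this sketch (illustrative, not the SDK's code): duplicates of the same violation key are suppressed for 60 seconds, while distinct violations keep independent timers.

```python
# Illustrative per-key alert cooldown — not the SDK's internals.
COOLDOWN_S = 60
_last_fired = {}  # violation key -> time of last fired alert

def should_fire(key, now):
    last = _last_fired.get(key)
    if last is not None and now - last < COOLDOWN_S:
        return False  # still cooling down, suppress duplicate
    _last_fired[key] = now
    return True

print(should_fire("budget_per_hour", now=0))   # True  (first alert fires)
print(should_fire("budget_per_hour", now=30))  # False (suppressed)
print(should_fire("max_error_rate", now=30))   # True  (separate timer)
print(should_fire("budget_per_hour", now=61))  # True  (cooldown elapsed)
```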

Catching Blocked Requests + Recovery

```python
from agentwatch import CircuitBreakerException

try:
    result = client.chat.completions.create(...)
except CircuitBreakerException as e:
    print(f"Blocked: {e.reason}")
    print(f"State: {e.state}")  # Full monitor snapshot

    # Recovery via e.monitor:
    e.monitor.pause(60)           # Half-Open after 60s (probe call)
    e.monitor.increase_budget(5)  # Or: raise budget, resume immediately
    e.monitor.reset_windows()     # Emergency: clear all windows

    result = client.chat.completions.create(...)  # Next call goes through
```

State Machine

CLOSED → threshold exceeded → OPEN (all calls blocked)

OPEN → pause(N) → PAUSED (waiting for probe window)

PAUSED → timeout elapsed → HALF_OPEN (one probe call allowed)

HALF_OPEN → probe OK → CLOSED | probe fails → OPEN

OPEN → increase_budget(N) → CLOSED (immediate)
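
The transitions above reduce to a small state machine. This toy sketch is illustrative only, not the SDK's monitor implementation:

```python
# Toy sketch of the circuit breaker transitions — illustrative only.
class Breaker:
    def __init__(self):
        self.state = "CLOSED"

    def trip(self):
        self.state = "OPEN"       # threshold exceeded, all calls blocked

    def pause(self, seconds):
        self.state = "PAUSED"     # wait, then allow one probe call

    def timeout_elapsed(self):
        self.state = "HALF_OPEN"  # one probe call allowed

    def probe(self, ok):
        self.state = "CLOSED" if ok else "OPEN"

    def increase_budget(self, amount):
        self.state = "CLOSED"     # immediate recovery

b = Breaker()
b.trip(); b.pause(60); b.timeout_elapsed(); b.probe(ok=True)
print(b.state)  # CLOSED
```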

Pre-Block Hook (on_block)

Save state before the exception propagates. Supports sync and async callbacks.

```python
async def save_state(event):
    await db.save({"agent": event.scope_id, "spent": event.current})
    event.monitor.pause(30)  # Pause instead of hard kill

agentwatch.init(
    budget_per_hour=5.0,
    block_on_threshold=True,
    on_block=save_state,  # Fires BEFORE exception
)
```

LangGraph Checkpoint Resume

```python
from agentwatch import CircuitBreakerException

try:
    result = graph.invoke({"messages": [...]}, config=config)
except CircuitBreakerException as e:
    # LangGraph auto-checkpoints before the exception propagates
    e.monitor.increase_budget(10.0)
    result = graph.invoke(None, config=config)  # Resumes from checkpoint
```

EU AI Act Compliance

Every state transition (triggered, paused, budget_increased, reset, recovered, retrip) emits a compliance-relevant span with the eu_ai_act_art12_relevant flag. No state change goes unlogged.


Per-Agent Scoping

Set individual budgets and thresholds for each agent. Global limits still apply — per-agent limits are checked in addition.

```python
import agentwatch

agentwatch.init(budget_per_hour=50.0, block_on_threshold=True)

# Each agent gets its own budget:
with agentwatch.agent("ResearchBot",
                      role="Researcher",
                      budget_per_hour=10.0,
                      block_on_threshold=True):
    result = client.chat.completions.create(...)
    # Blocked if ResearchBot exceeds $10/hr OR global exceeds $50/hr

with agentwatch.agent("WriterBot",
                      role="Writer",
                      budget_per_hour=20.0,
                      max_error_rate=0.3):
    result = client.chat.completions.create(...)
    # WriterBot has its own budget AND error rate threshold
```

agent() Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| name | str | required | Agent display name. Used in traces and dashboard. |
| role | str | "Agent" | Agent role for classification (e.g., Researcher, Writer, Orchestrator). |
| agent_id | str \| None | auto | Unique ID. Auto-generated from name if not set. |
| budget_per_hour | float \| None | None | Per-agent hourly budget limit in USD. |
| max_error_rate | float \| None | None | Per-agent error rate threshold (0.0–1.0). |
| block_on_threshold | bool | False | Block calls when this agent's thresholds are exceeded. |
| on_alert | callable \| None | None | Per-agent alert callback. |
| on_block | callable \| None | None | Pre-block hook. Receives BlockEvent with .monitor for recovery decisions. |

Nested Agents

Agent contexts can be nested. Inner agents have their own thresholds while outer agent and global thresholds still apply.

```python
with agentwatch.agent("Orchestrator", budget_per_hour=100.0):
    # Orchestrator scope
    with agentwatch.agent("SubAgent", budget_per_hour=10.0):
        # SubAgent scope — both SubAgent AND Orchestrator limits checked
        result = client.chat.completions.create(...)
```

Cost Tracking

AeneasSoft calculates real USD cost for every LLM call using current list prices.

Supported Models & Pricing

Prices in USD per 1M tokens (input / output):

| Model | Input | Output |
|---|---|---|
| gpt-4o | $5.00 | $15.00 |
| gpt-4o-mini | $0.15 | $0.60 |
| gpt-4-turbo | $10.00 | $30.00 |
| gpt-4 | $30.00 | $60.00 |
| gpt-3.5-turbo | $0.50 | $1.50 |
| claude-opus-4-6 | $15.00 | $75.00 |
| claude-sonnet-4-6 | $3.00 | $15.00 |
| claude-haiku-4-5 | $0.80 | $4.00 |
| gemini-1.5-pro | $3.50 | $10.50 |
| gemini-1.5-flash | $0.075 | $0.30 |
| mistral-large-latest | $4.00 | $12.00 |
| mistral-small-latest | $1.00 | $3.00 |
| command-r-plus | $3.00 | $15.00 |
| command-r | $0.50 | $1.50 |
| llama3-70b-8192 | $0.59 | $0.79 |
| llama3-8b-8192 | $0.05 | $0.08 |
| mixtral-8x7b-32768 | $0.24 | $0.24 |

Unknown models use a default of $1.00 / $2.00 per 1M tokens. Model name matching supports prefixes (e.g., "gpt-4o-2024-11-20" matches "gpt-4o").
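
The prefix matching and default fallback work like this sketch. The rates come from the table above; the matching logic itself is illustrative, not the SDK's code:

```python
# Sketch of cost calculation with prefix matching — illustrative only.
PRICES = {  # USD per 1M tokens: (input, output), subset of the table above
    "gpt-4o-mini": (0.15, 0.60),
    "gpt-4o": (5.00, 15.00),
    "claude-sonnet-4-6": (3.00, 15.00),
}
DEFAULT = (1.00, 2.00)  # fallback for unknown models

def cost_usd(model, input_tokens, output_tokens):
    # Longest-prefix match so "gpt-4o-2024-11-20" resolves to "gpt-4o"
    # rather than a shorter or unrelated entry.
    match = max((m for m in PRICES if model.startswith(m)), key=len, default=None)
    p_in, p_out = PRICES.get(match, DEFAULT)
    return input_tokens / 1e6 * p_in + output_tokens / 1e6 * p_out

print(round(cost_usd("gpt-4o-2024-11-20", 1000, 500), 4))  # 0.0125
```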


Node.js SDK

```bash
npm install @aeneassoft/sdk-node
```

```typescript
import { init, trace, span, currentTraceId } from '@aeneassoft/sdk-node';

// Initialize
init({
  apiKey: 'aw_your_key_here',     // Required
  ingestUrl: 'http://localhost:3001/api/ingest',  // Optional
  zeroDataRetention: false         // Optional
});

// Group calls into a trace
await trace('my-workflow', {}, async () => {
  const response = await openai.chat.completions.create({...});
  console.log('Trace ID:', currentTraceId());
});
```

Node.js init() Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| apiKey | string | required | Authentication key for the backend. |
| ingestUrl | string | api.aeneassoft.com | Backend ingest endpoint URL. |
| zeroDataRetention | boolean | false | Strip input/output from spans. |

Framework Compatibility

AeneasSoft operates at the HTTP transport layer — below every framework. No plugins. No middleware. No wrappers. Just 2 lines.

Every framework that calls an AI provider over HTTP is automatically instrumented. This includes LangChain, CrewAI, AutoGen, LlamaIndex, Haystack, Semantic Kernel, and any custom code.

LangChain

```python
import agentwatch                          # <-- line 1
agentwatch.init(api_key="your-key")        # <-- line 2. Done.

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

chain = ChatPromptTemplate.from_template("Explain {topic}") | ChatOpenAI(model="gpt-4o") | StrOutputParser()

with agentwatch.trace("research-chain"):
    result = chain.invoke({"topic": "EU AI Act"})
# Every LLM call traced. No LangSmith required.
```

CrewAI

```python
import agentwatch                          # <-- line 1
agentwatch.init(api_key="your-key")        # <-- line 2. Done.

from crewai import Agent, Task, Crew

researcher = Agent(role="Researcher", goal="Find data", verbose=True)
writer = Agent(role="Writer", goal="Write content", verbose=True)
crew = Crew(agents=[researcher, writer], tasks=[...])

with agentwatch.trace("compliance-crew"):
    result = crew.kickoff()
# Both agents traced — tokens, cost, latency per step.
```

AutoGen

```python
import agentwatch                          # <-- line 1
agentwatch.init(api_key="your-key")        # <-- line 2. Done.

from autogen import ConversableAgent

assistant = ConversableAgent(name="Assistant", llm_config={"model": "gpt-4o"})
reviewer = ConversableAgent(name="Reviewer", llm_config={"model": "gpt-4o"})

with agentwatch.trace("code-review"):
    assistant.initiate_chat(reviewer, message="Review this code", max_turns=3)
# Full conversation flow traced: Assistant -> Reviewer -> Assistant.
```

Any Framework / Direct HTTP

```python
import agentwatch                          # <-- line 1
agentwatch.init(api_key="your-key")        # <-- line 2. Done.

# Works with LlamaIndex, Haystack, Semantic Kernel, raw httpx/requests —
# anything that calls an AI API over HTTP is captured automatically.
# Supported: OpenAI, Anthropic, Gemini, Mistral, Groq, Cohere,
# Together AI, Fireworks, Ollama, Azure OpenAI, and more.
```

Self-Hosting

Run AeneasSoft locally with Docker. No cloud dependency. No account needed.

Minimal (Development)

```bash
docker compose -f docker-compose.local.yml up -d
# Starts: ClickHouse + Backend
# Dashboard: http://localhost:3001
# No Kafka. No auth. LOCAL_MODE=true.
```

Full Stack (Production)

```bash
docker compose up -d
# Starts: ClickHouse + Kafka + Backend + Proxy
# Configure via .env file
# Services: clickhouse (8123), kafka (9092), backend (3001), proxy (8080)
```

Verify It Works

```bash
# Check server health:
curl http://localhost:3001/health

# Send a test trace:
python -c "import agentwatch; agentwatch.init(); agentwatch.verify()"
```

Environment Variables

Configure the backend via .env file or environment variables.

Core

| Parameter | Type | Default | Description |
|---|---|---|---|
| PORT | number | 3001 | Backend server port. |
| CLICKHOUSE_URL | string | http://localhost:8123 | ClickHouse database endpoint. |
| CLICKHOUSE_DB | string | productname | ClickHouse database name. |
| KAFKA_BROKERS | string | (none) | Kafka connection string. Optional — if not set, direct ingest is used. |
| CORS_ORIGINS | string | http://localhost:3000 | Comma-separated allowed CORS origins. |

Authentication

| Parameter | Type | Default | Description |
|---|---|---|---|
| API_KEY | string | (none) | Single-tenant API key for SDK authentication. |
| JWT_SECRET | string | (none) | Secret for JWT token signing. Generate: openssl rand -hex 32 |
| LOCAL_MODE | boolean | auto | Skip all auth. Auto-enabled if neither JWT_SECRET nor API_KEY is set. |

Data & Privacy

| Parameter | Type | Default | Description |
|---|---|---|---|
| ZERO_DATA_RETENTION | boolean | false | Don't store prompts/responses. Metadata only. |
| DATA_RETENTION_DAYS | number | 30 | ClickHouse TTL — data auto-deleted after N days. |

Email (Optional)

| Parameter | Type | Default | Description |
|---|---|---|---|
| RESEND_API_KEY | string | (none) | Resend API key for sending emails (alerts, welcome, password reset). |
| FROM_EMAIL | string | noreply@aeneassoft.com | Sender email address. |

API Reference

All endpoints accept JSON. Authentication via JWT token or X-API-Key header. In LOCAL_MODE, no auth required.

| Method | Endpoint | Description |
|---|---|---|
| GET | /health | Server status. No auth required. |
| POST | /api/ingest | Ingest ATP span. API key auth. |
| GET | /api/traces | List traces. Supports ?search, ?status, ?agent_id, ?model, ?from, ?to, ?sort, ?order, ?limit, ?offset. |
| GET | /api/traces/:id/spans | All spans for a trace. |
| GET | /api/traces/:id/graph | Causal execution graph for a trace. |
| GET | /api/traces/:id/compliance-score | EU AI Act Article 12 readiness score (0–100). |
| GET | /api/traces/:id/compliance-report | RSA-signed PDF compliance report (Enterprise). |
| GET | /api/metrics | Dashboard KPIs: traces, tokens, cost, latency, error rate. |
| GET | /api/cost/daily | Daily cost breakdown. |
| GET | /api/cost/by-agent | Cost breakdown per agent. |
| GET | /api/cost/by-model | Cost breakdown per model. |
| GET | /api/reports/monthly | Monthly report data (JSON). |
| GET | /api/reports/monthly/csv | Monthly traces export (CSV). |
| GET | /api/reports/monthly/pdf | Monthly report (RSA-signed PDF). |
| GET | /api/circuit-breaker/status | Real-time circuit breaker state. |
| GET | /api/alerts | List alert rules. |
| POST | /api/alerts | Create alert rule. |
| DELETE | /api/alerts/:id | Delete alert rule. |
| PATCH | /api/alerts/:id | Toggle alert enabled/disabled. |
| GET | /api/alerts/history | Alert event history. |
| POST | /api/alerts/sdk-alert | Receive circuit breaker alert from SDK. |
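
A filtered traces query can be built from the parameters listed above using only the standard library. This example assumes a local backend in LOCAL_MODE (no auth) on port 3001:

```python
# Build a filtered /api/traces query from the documented parameters.
from urllib.parse import urlencode

params = {"status": "error", "model": "gpt-4o", "sort": "cost", "order": "desc", "limit": 20}
url = "http://localhost:3001/api/traces?" + urlencode(params)
print(url)

# With a running backend, fetch it with e.g.:
#   import json, urllib.request
#   traces = json.load(urllib.request.urlopen(url))
```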

Alert System

AeneasSoft has two layers of alerting: SDK-level (in-process) and backend-level (server-side rules).

SDK Alerts (In-Process)

Configured via init() parameters. Fires immediately when thresholds are exceeded. 60-second cooldown between duplicate alerts.

```python
def my_alert_handler(alert):
    print(f"Alert: {alert['reason']}")
    print(f"Scope: {alert['scope']} / {alert['scope_id']}")
    print(f"Threshold: {alert['threshold']} | Current: {alert['current']}")
    # Send to Slack, PagerDuty, email, etc.

agentwatch.init(
    budget_per_hour=10.0,
    on_alert=my_alert_handler
)
```

Backend Alerts (Server-Side)

Create rules via the dashboard or API. Backend evaluates rules against incoming trace data and sends email notifications via Resend.

```bash
# Create an alert rule via API:
curl -X POST http://localhost:3001/api/alerts \
  -H "Content-Type: application/json" \
  -d '{
    "name": "High Cost Alert",
    "condition": "cost_per_hour",
    "threshold": 50.0,
    "action_type": "email"
  }'
```

Streaming Support

AeneasSoft wraps OpenAI and Anthropic stream objects transparently. Your code receives chunks exactly as before — the SDK captures metadata in the background.

Non-streaming calls: 100% accuracy. Full request + response captured.

Streaming calls: Request + final usage summary (tokens, cost) captured. Individual chunks are not logged.

Full chunk-level streaming tracing ships Q3 2026.

```python
# Streaming works transparently:
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True
)
for chunk in stream:
    print(chunk.choices[0].delta.content, end="")
# → Trace captured with final token count and cost after stream completes
```

Architecture

In-Process Telemetry Interception with Safe Pause & Resume (Patent Pending).

```
Your Code
    |
    v
[agentwatch.init()]
    |
    +--→ Intercepts every LLM call in-process (no proxy)
    |       Captures: model, tokens, cost, latency, input/output
    |       Active Defense: budget/error/loop check BEFORE request
    |
    +--→ Circuit Breaker State Machine
            CLOSED → OPEN → PAUSED → HALF_OPEN → CLOSED
            on_block hook: save state before exception
            Recovery: pause() / increase_budget() / reset()
            |
            v
        AI Provider (OpenAI, Anthropic, Gemini, Mistral, Groq, etc.)
```

In-Process Interception

AeneasSoft captures every LLM call inside your application process. No external proxy, no network hop, no single point of failure. Works with any AI provider accessible via HTTP.

Safe Pause & Resume

When a threshold is exceeded, the circuit breaker transitions through a full state machine (CLOSED → OPEN → PAUSED → HALF_OPEN). The on_block hook fires before the exception, letting you save state. Recovery methods on the monitor let you pause, increase budget, or reset.

EU AI Act Compliance

Every circuit breaker state change (triggered, paused, budget increased, recovered, re-tripped, reset) is automatically logged as a compliance-relevant span with the eu_ai_act_art12_relevant flag.

Supported Providers

Auto-detected via URL: OpenAI, Anthropic, Gemini, Mistral, Groq, Cohere, Together AI, Fireworks, Azure OpenAI, Ollama — and any provider accessible via HTTP.


Migration Guide

From Langfuse

Langfuse requires decorators or callbacks on every function. AeneasSoft requires zero code changes beyond init().

```python
# Before (Langfuse):
from langfuse.decorators import observe
from langfuse.openai import openai

@observe()
def my_function():
    client = openai.OpenAI()
    return client.chat.completions.create(...)

# After (AeneasSoft):
import agentwatch
from openai import OpenAI

agentwatch.init()

def my_function():
    client = OpenAI()
    return client.chat.completions.create(...)
# That's it. Remove all decorators. Remove langfuse imports.
```

From LangSmith

LangSmith is tightly coupled to LangChain. AeneasSoft works with any framework.

```python
# Before (LangSmith):
import os
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "ls_..."
# Only works with LangChain

# After (AeneasSoft):
import agentwatch
agentwatch.init()
# Works with LangChain, CrewAI, raw OpenAI, httpx, any framework
```

From Helicone

Helicone requires routing all traffic through a proxy (single point of failure). AeneasSoft runs in-process.

```python
from openai import OpenAI

# Before (Helicone):
client = OpenAI(
    base_url="https://oai.helicone.ai/v1",  # Proxy SPOF
    default_headers={"Helicone-Auth": "Bearer sk-..."}
)

# After (AeneasSoft):
import agentwatch
agentwatch.init()
client = OpenAI()  # Direct connection. No proxy. No SPOF.
```

Troubleshooting & FAQ

No traces appearing in the dashboard?

1. Check that the backend is running: curl http://localhost:3001/health
2. Ensure agentwatch.init() is called before any LLM calls.
3. Check that your SDK version is up to date: pip install --upgrade aeneas-agentwatch

CircuitBreakerException raised unexpectedly?

Check your threshold configuration. budget_per_hour is cumulative over a rolling 1-hour window. Use agentwatch.get_state() to see current values. Set block_on_threshold=False to switch to alert-only mode.

Cost shows $0.00 for all traces?

The SDK needs token counts from the API response to calculate cost. Ensure you're using a supported model (20+ models tracked). Unknown models use a default rate of $1.00/$2.00 per 1M tokens.

Does it work with async code?

Yes. Both sync and async clients are patched (OpenAI, AsyncOpenAI, httpx.AsyncClient, aiohttp). Context propagation uses ContextVar for async safety.
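
Why ContextVar matters for async safety can be shown in a few lines. This standalone demonstration (not the SDK's code) shows that each task keeps its own trace id even when tasks interleave:

```python
# Each asyncio task gets its own copy of the context, so a ContextVar
# set in one task never leaks into a concurrently running one.
import asyncio
from contextvars import ContextVar

current_trace: ContextVar[str] = ContextVar("current_trace")

async def worker(trace_id):
    current_trace.set(trace_id)
    await asyncio.sleep(0)      # yield to the other task
    return current_trace.get()  # still our own id, not the other task's

async def main():
    return await asyncio.gather(worker("trace-a"), worker("trace-b"))

print(asyncio.run(main()))  # ['trace-a', 'trace-b']
```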

Does it add latency to my LLM calls?

Negligible. The interceptor adds ~0.1ms overhead per call (threshold check + span recording). No network proxy, no extra HTTP hop.

Can I use it in production?

Yes. Thread-safe (Lock-protected), memory-capped (deque maxlen=10,000 per monitor ≈ 240KB RAM), and battle-tested with CI on every push.

How do I disable it temporarily?

Don't call agentwatch.init(). The SDK only activates when init() is explicitly called. No environment variable side effects.

Honest Boundaries

We believe transparency about limitations builds more trust than hiding them.

What we do well

  • Non-streaming LLM calls: 100% accurate capture
  • In-process interception with zero network latency
  • Circuit breaker with Safe Pause & Resume state machine
  • Cost tracking for 20+ models with current list prices

What we don't do (yet)

  • Prompt management / versioning
  • A/B testing / evaluation pipelines
  • Streaming: individual chunks not captured (Q3 2026)
  • Cost: batch API, cached tokens, fine-tuned model rates
  • Multi-process central configuration

Monkey-patching disclaimer

We modify HTTP library internals at runtime. We test against pinned library versions and ship SDK updates within 48 hours of breaking changes upstream.