Integration Documentation

Track AI agent runs and automatically detect cost waste. The AgentYield SDK captures LLM and tool calls, then analyzes them for inefficiencies.

Supported providers: OpenAI, Anthropic, Google, Meta / Llama (via Together AI), Mistral, Groq, DeepSeek, and OpenRouter (prefix upstream models with openrouter/). Using a model that isn't built in? Pass costUsd and contextWindowUsed yourself.
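For a model the SDK doesn't know, you can compute both fields by hand before passing them to trackLLMCall. A minimal sketch — the per-token prices and the 32k context window below are hypothetical placeholders, not real rates:

```typescript
// Hypothetical pricing for an unsupported model — substitute your provider's real rates.
const PRICE_USD_PER_1M = { input: 0.5, output: 1.5 }
const CONTEXT_WINDOW = 32_000 // assumed context window size for this model

function manualCostUsd(inputTokens: number, outputTokens: number): number {
  return (
    (inputTokens / 1_000_000) * PRICE_USD_PER_1M.input +
    (outputTokens / 1_000_000) * PRICE_USD_PER_1M.output
  )
}

function manualContextWindowUsed(inputTokens: number): number {
  return inputTokens / CONTEXT_WINDOW
}

// Pass the results to run.trackLLMCall({ ..., costUsd, contextWindowUsed })
```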

SDK Documentation

Installation

npm install @agentyield/sdk

Quick Start

Add ~10 lines to your agent. That's it.

defined-run.ts
import OpenAI from "openai"
import { AgentYield } from "@agentyield/sdk"

const openai = new OpenAI()
const ay = new AgentYield({
  apiKey: process.env.AGENTYIELD_API_KEY,
  agentId: "research-agent",
})

const run = ay.startRun({ metadata: { task: "Q2 competitive research" } })

// Your code — LLM call
const response = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Summarize the competitive landscape for Q2" }],
})

// [AgentYield] Track the LLM call — use the SDK helper to compute cost
run.trackLLMCall({
  model: response.model,
  inputTokens: response.usage.prompt_tokens,
  outputTokens: response.usage.completion_tokens,
  costUsd: ay.computeCost(response.model, response.usage),
  contextWindowUsed: response.usage.prompt_tokens / 128_000, // gpt-4o's 128k context window
  purpose: "competitive_summary",
})

// Your code — tool call
const searchResults = await webSearch("AI agent monitoring tools 2026")

// [AgentYield] Track tool usage — no cost for free tools, pass 0
run.trackToolCall({
  tool: "web_search",
  input: { query: "AI agent monitoring tools 2026" },
  costUsd: 0, // web_search cost varies by provider — pass actual cost or 0
  metadata: { resultCount: searchResults.length },
})

await run.end({ status: "success" })

API Reference

new AgentYield(config)

Initialize the SDK. Call once per application.

| Parameter | Type   | Required | Description |
| --------- | ------ | -------- | ----------- |
| apiKey    | string | Yes      | Your API key (ay_live_... or ay_test_...) |
| agentId   | string | Yes      | Identifies which agent is being tracked |
| baseURL   | string | No       | API endpoint override for self-hosted |

ay.startRun(options?)

Start a new run. Returns a Run instance.

| Parameter | Type   | Required | Description |
| --------- | ------ | -------- | ----------- |
| runId     | string | No       | Custom run ID (auto-generated if omitted) |
| metadata  | object | No       | Arbitrary metadata attached to the run |

run.trackLLMCall(data)

Record an LLM API call within the run.

| Parameter         | Type   | Required | Description |
| ----------------- | ------ | -------- | ----------- |
| model             | string | Yes      | Model name (e.g. "gpt-4o") |
| inputTokens       | number | Yes      | Input token count |
| outputTokens      | number | Yes      | Output token count |
| costUsd           | number | Yes      | Cost in USD |
| contextWindowUsed | number | No       | Proportion of context window used (0.0–1.0) |
| purpose           | string | No       | Label for this call (e.g. "initial_planning") |

run.trackToolCall(data)

Record a tool invocation. The input is hashed locally — raw data never leaves your app.

| Parameter | Type     | Required | Description |
| --------- | -------- | -------- | ----------- |
| tool      | string   | Yes      | Tool name |
| input     | object   | Yes      | Tool input (hashed locally, never sent raw) |
| costUsd   | number   | Yes      | Cost in USD |
| redact    | string[] | No       | Fields to remove before hashing |
| metadata  | object   | No       | Additional metadata |

run.checkpoint(options?)

Flush all buffered events to AgentYield as a named time window, then reset the buffer. The run stays open — use this for long-running or always-on agents that need to report metrics periodically. Returns a Promise<void>.

| Parameter | Type   | Required | Description |
| --------- | ------ | -------- | ----------- |
| label     | string | No       | Label for this window (defaults to window_1, window_2, …) |

long-running-agent.ts
const run = ay.startRun();

// flush metrics every 5 minutes — run stays open
setInterval(async () => {
  await run.checkpoint({ label: "5m-window" });
}, 5 * 60 * 1000);

// later, when the agent shuts down
await run.end({ status: "success" });

Each checkpoint sends the events collected since the last checkpoint (or since the run started). After a successful checkpoint the internal buffer is cleared, so end() only sends events recorded after the most recent checkpoint.
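This buffer-then-reset behavior can be pictured with a small sketch (the EventBuffer class is illustrative, not the SDK's actual internals):

```typescript
// Illustrative model of checkpoint buffering — not the SDK's real implementation.
class EventBuffer<T> {
  private events: T[] = []

  record(event: T) {
    this.events.push(event)
  }

  // checkpoint(): send everything buffered since the last flush, then clear.
  flush(): T[] {
    const batch = this.events
    this.events = [] // buffer resets, so the next flush only sends newer events
    return batch
  }
}

const buf = new EventBuffer<string>()
buf.record("llm_call_1")
buf.record("tool_call_1")
const first = buf.flush()  // ["llm_call_1", "tool_call_1"]
buf.record("llm_call_2")
const second = buf.flush() // ["llm_call_2"] — earlier events are not re-sent
```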

run.end(options)

Flush any remaining buffered events and close the run. Returns a Promise<void>.

| Parameter | Type | Description |
| --------- | ---- | ----------- |
| status    | "success" \| "error" \| "timeout" | Final run status |
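A common pattern is to map a thrown error to the final status. A sketch against the documented status values — the RunLike type here is a minimal stand-in for testing, not the SDK's actual Run type:

```typescript
// Minimal stand-in for the SDK's Run type — only what this sketch needs.
type RunLike = { end(opts: { status: "success" | "error" | "timeout" }): Promise<void> }

async function withRunStatus(run: RunLike, work: () => Promise<void>): Promise<void> {
  try {
    await work()
    await run.end({ status: "success" })
  } catch (err) {
    // Errors produced by AbortSignal.timeout() carry the name "TimeoutError".
    const timedOut = err instanceof Error && err.name === "TimeoutError"
    await run.end({ status: timedOut ? "timeout" : "error" })
    throw err // re-throw so callers still see the failure
  }
}
```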

Guide: Long-Running Agents

Not every agent starts a task and finishes minutes later. Monitoring bots, always-on assistants, and framework-based agents like OpenClaw can run for hours or indefinitely. The checkpoint() method is designed for these workloads.

How it works

  1. Call ay.startRun() once when the agent starts.
  2. Track LLM and tool calls as normal with trackLLMCall() and trackToolCall().
  3. Periodically call run.checkpoint() to flush buffered events. Each checkpoint sends only the events recorded since the last flush.
  4. When the agent shuts down, call run.end() to close the run and send any remaining events.

If the process exits unexpectedly, the SDK's built-in exit handler will attempt to flush remaining events automatically.
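The best-effort flush on exit can be approximated like this (registerExitFlush is an illustrative helper, not the SDK's real API):

```typescript
// Illustrative best-effort flush on process exit — not the SDK's actual handler.
function registerExitFlush(flush: () => Promise<void>): () => Promise<void> {
  const handler = async () => {
    try {
      await flush() // send whatever is still buffered
    } catch {
      // best-effort only: a failed flush must never crash the exiting process
    }
  }
  process.once("beforeExit", handler)
  return handler // returned so it can also be invoked manually
}
```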

Example: Timer-based checkpoints

The simplest pattern — flush every N minutes regardless of activity:

long-running-agent.ts
import OpenAI from "openai"
// [AgentYield] Import the SDK
import { AgentYield } from "@agentyield/sdk"

const openai = new OpenAI()
// [AgentYield] Initialize with your API key
const ay = new AgentYield({
  apiKey: process.env.AGENTYIELD_API_KEY,
  agentId: "support-monitor",
})

// [AgentYield] Start a run to track this worker
const run = ay.startRun({ metadata: { env: "production" } })

// [AgentYield] Flush to AgentYield every 5 minutes — run stays open
const flushInterval = setInterval(async () => {
  await run.checkpoint({ label: `window-${new Date().toISOString()}` })
}, 5 * 60 * 1000)

// Your code — process tasks from a queue indefinitely
for await (const ticket of supportTicketQueue) {
  try {
    const response = await openai.chat.completions.create({
      model: "gpt-4o",
      messages: [{ role: "user", content: ticket.body }],
    })

    run.trackLLMCall({
      model: response.model,
      inputTokens: response.usage.prompt_tokens,
      outputTokens: response.usage.completion_tokens,
      costUsd: ay.computeCost(response.model, response.usage),
      contextWindowUsed: response.usage.prompt_tokens / ay.contextWindowSize(response.model),
      purpose: "ticket_response",
    })

    await postReply(ticket.id, response.choices[0].message.content)
  } catch (err) {
    // Log the error but keep the agent running
    console.error(`[agent] Failed to process ticket ${ticket.id}:`, err)
  }
}

// Graceful shutdown — flush remaining events and close the run
process.on("SIGTERM", async () => {
  clearInterval(flushInterval)
  // [AgentYield] End the run on shutdown
  await run.end({ status: "success" })
  process.exit(0)
})

Example: Event-based checkpoints

Flush after completing a unit of work — useful when activity is bursty:

task-queue-agent.ts
import OpenAI from "openai"
import { AgentYield } from "@agentyield/sdk"

const openai = new OpenAI()
const ay = new AgentYield({
  apiKey: process.env.AGENTYIELD_API_KEY,
  agentId: "task-queue-agent",
})

const run = ay.startRun()

async function processTask(task: Task) {
  try {
    // Your code — LLM call
    const response = await openai.chat.completions.create({
      model: "gpt-4o",
      messages: [{ role: "user", content: task.prompt }],
    })

    // [AgentYield] Track the LLM call — SDK handles cost and context math
    run.trackLLMCall({
      model: response.model,
      inputTokens: response.usage.prompt_tokens,
      outputTokens: response.usage.completion_tokens,
      costUsd: ay.computeCost(response.model, response.usage),
      contextWindowUsed: response.usage.prompt_tokens / ay.contextWindowSize(response.model),
      purpose: "task_processing",
    })

    // Your code — write result to database
    await db.write({ id: task.id, result: response.choices[0].message.content })

    // [AgentYield] Track the tool call — no cost for internal db writes
    run.trackToolCall({
      tool: "db_write",
      input: { id: task.id },
      costUsd: 0,
    })

    // [AgentYield] Flush after each task completes
    await run.checkpoint({ label: `task-${task.id}` })

  } catch (err) {
    console.error(`[agent] Failed to process task ${task.id}:`, err)
  }
}

// Your code — process tasks from a queue indefinitely
for await (const task of taskQueue) {
  await processTask(task)
}

// Graceful shutdown
process.on("SIGTERM", async () => {
  await run.end({ status: "success" })
  process.exit(0)
})

Checkpoint labels

Labels help you identify windows in your dashboard. If you don't provide one, the SDK auto-generates window_1, window_2, etc. Use descriptive labels when the context matters:

await run.checkpoint({ label: "morning-batch" });
await run.checkpoint({ label: "afternoon-batch" });
await run.checkpoint(); // → "window_3"

Best practices

  • Choose a flush interval that matches your workload. Every 5 minutes is a good default. High-throughput agents may benefit from 1-minute windows; low-activity agents can use 15–30 minutes.
  • Use meaningful labels. Labels like task-1234 or hourly-2pm make it easier to correlate waste patterns with specific workloads.
  • Always call end() on shutdown. This ensures the final batch of events is captured. The SDK handles unexpected exits via process signal handlers, but an explicit end() is more reliable.
  • Checkpoints are lightweight. Each one is a single HTTP POST. The event buffer is cleared after each successful flush, keeping memory usage constant over time.

OpenClaw Integration

AgentYield is available as an OpenClaw skill. No SDK code required — install the skill and waste detection runs automatically inside every OpenClaw session.

Install the skill

In any OpenClaw session, run:

/skill install agentyield

Configure your API key

Add your AgentYield API key to your OpenClaw environment config (.env or openclaw.json):

.env
AGENTYIELD_API_KEY=ay_live_xxxxxxxxxxxxxxxxxxxx

How it works

  1. The skill hooks into the OpenClaw session event lifecycle, intercepting every LLM call and tool call the agent makes.
  2. Events are buffered in memory and flushed as checkpoints — every 50 events or every 30 minutes, whichever comes first.
  3. After each checkpoint, the skill logs a line to the session: [AgentYield] Checkpoint sent — Waste Score: 34. View: https://agentyield.co/runs/...
  4. All API calls are fire-and-forget with a 3-second timeout. The skill never blocks or slows down the host agent.
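The fire-and-forget behavior in step 4 can be sketched as follows — the endpoint URL and payload shape are illustrative, not the skill's actual wire format:

```typescript
// Fire-and-forget POST with a hard 3-second timeout. Errors are swallowed so
// reporting can never block or crash the host agent.
function sendCheckpoint(url: string, payload: unknown): Promise<void> {
  return fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
    signal: AbortSignal.timeout(3_000), // abort if the API takes longer than 3s
  })
    .then(() => undefined)
    .catch(() => undefined) // never rejects — the caller is free to ignore the promise
}
```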

Configuration

The skill accepts optional config fields in the OpenClaw skill frontmatter:

| Field              | Default | Description |
| ------------------ | ------- | ----------- |
| checkpointEvery    | 50      | Flush a checkpoint every N events |
| checkpointInterval | 30      | Flush a checkpoint every N minutes |

OpenClaw vs SDK

The OpenClaw skill is ideal when you're already using OpenClaw and want zero-code setup. For custom agents, direct integrations, or fine-grained control over event tracking, use the TypeScript SDK instead.

Test Mode

Use an ay_test_ prefixed key to validate your integration without sending data. Useful for CI/CD.

const ay = new AgentYield({
  apiKey: "ay_test_xxxxxxxxxxxxxxxxxxxx",
  agentId: "my-agent",
})
// Logs: [AgentYield] Test mode — run data not sent

PII Safety

Tool inputs are hashed locally using SHA-256. Only the hash is transmitted — raw inputs never leave your application. Use redact to exclude sensitive fields:

run.trackToolCall({
  tool: "db_query",
  input: {
    table: "users",
    filter: { email: "[email protected]", ssn: "123-45-6789" }
  },
  redact: ["filter.email", "filter.ssn"], // PII removed before hashing
  costUsd: 0, // internal query — no direct API cost
})
// → Hash is computed on { table: "users", filter: {} }
// → Raw values never leave your application
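The redact-then-hash flow can be sketched with Node's crypto module. The removePath helper and the JSON.stringify canonicalization below are illustrative — the SDK's actual hashing may canonicalize differently:

```typescript
import { createHash } from "node:crypto"

// Remove a dot-separated path like "filter.email" from a nested object.
function removePath(obj: Record<string, any>, path: string): void {
  const keys = path.split(".")
  const last = keys.pop()!
  let node: any = obj
  for (const k of keys) {
    if (node == null || typeof node !== "object") return
    node = node[k]
  }
  if (node && typeof node === "object") delete node[last]
}

function hashInput(input: object, redact: string[] = []): string {
  const copy = structuredClone(input) // never mutate the caller's object
  for (const path of redact) removePath(copy, path)
  // Simple canonical form for illustration — the real SDK may differ.
  return createHash("sha256").update(JSON.stringify(copy)).digest("hex")
}
```

With this sketch, hashing an input with its PII fields redacted produces the same digest as hashing the input with those fields absent.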

Connectors

Already logging your LLM traffic in another observability tool? Skip the SDK install. AgentYield Connectors pull your existing request history into AgentYield and run our waste detection on top — duplicate calls, oversized context, model mismatches, excessive retries, and redundant reads.

How they work

  1. Paste a read-only API key from your observability provider on the Connectors page. Keys are encrypted at rest and only decrypted server-side at sync time.
  2. Run a manual sync to pull the last 30 days of requests, or flip on Auto-sync daily to pull anything new each day automatically.
  3. We group requests into runs (by session ID where available, otherwise per day), score waste, and surface findings in your dashboard alongside SDK-tracked runs.
  4. If an auto-sync fails for 2+ consecutive days, you get an in-app notification and an email so you can fix it before data gaps grow.
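The grouping rule in step 3 can be sketched as follows (the LoggedRequest shape is an assumption about what a provider export might contain):

```typescript
// Assumed shape of a request pulled from an observability provider.
type LoggedRequest = { sessionId?: string; timestamp: string } // timestamp: ISO 8601

// Group by session ID where available, otherwise bucket by calendar day.
function groupIntoRuns(requests: LoggedRequest[]): Map<string, LoggedRequest[]> {
  const runs = new Map<string, LoggedRequest[]>()
  for (const req of requests) {
    const key = req.sessionId ?? `day:${req.timestamp.slice(0, 10)}`
    const bucket = runs.get(key) ?? []
    bucket.push(req)
    runs.set(key, bucket)
  }
  return runs
}
```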

Supported platforms

| Platform  | Status      | Notes |
| --------- | ----------- | ----- |
| Helicone  | Available   | US and EU regions. Uses the bulk ClickHouse query endpoint; buckets runs by Helicone-Session-Id. |
| LangSmith | Coming soon | Sync runs and traces from LangSmith projects. |
| Langfuse  | Coming soon | Sync traces and observations from Langfuse. |
Need a connector for a different platform? Let us know and we'll prioritize it.

Get Your API Key

Create a free account and generate an API key to start sending run data.

Package: @agentyield/sdk · npm