How It Works

The OpenBox plugin governs your OpenClaw agent through two paths: tool governance for agent tool calls and LLM guardrails for model inference requests. Both paths evaluate actions against your OpenBox Core policies in real time before execution.

Tool Governance

Every tool call your agent makes passes through the plugin's before_tool_call hook before execution.

Flow

1. Agent decides to call a tool (e.g. Bash)
2. before_tool_call hook fires
3. Plugin sends tool name + parameters to OpenBox Core
4. Core evaluates against your policies
   • allow → tool executes normally
   • block → tool is prevented, agent sees the block reason
5. after_tool_call hook fires
6. Plugin reports tool result + OTel spans to Core

When a tool is blocked, the agent receives a message like:

Blocked by governance policy: <reason from your policy>

The agent sees this as the tool result and can respond accordingly.
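
The decision path above can be sketched as follows. The hook and function names here are illustrative, not the plugin's actual API; only the block-message format comes from this page.

```typescript
// Sketch of the tool-governance decision path. Hook and client names are
// illustrative; the real plugin API may differ.
type Verdict = { decision: "allow" } | { decision: "block"; reason: string };

// Pure decision step: turn a Core verdict into the outcome the agent observes.
function toOutcome(verdict: Verdict): { proceed: boolean; result?: string } {
  if (verdict.decision === "block") {
    // The block reason is surfaced to the agent as the tool result.
    return { proceed: false, result: `Blocked by governance policy: ${verdict.reason}` };
  }
  return { proceed: true };
}

// Hypothetical hook wiring: `evaluate` stands in for the network call to
// OpenBox Core.
async function beforeToolCall(
  toolName: string,
  params: Record<string, unknown>,
  evaluate: (tool: string, p: Record<string, unknown>) => Promise<Verdict>,
): Promise<{ proceed: boolean; result?: string }> {
  return toOutcome(await evaluate(toolName, params));
}
```

Because the block message is returned as an ordinary tool result, the agent can reason about it and pick another action rather than crashing.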

What gets sent to Core

For each tool call, the plugin sends:

  • Tool name - which tool the agent is calling
  • Parameters - the full parameters passed to the tool
  • Session identity - workflow ID, run ID, agent ID
  • Timestamps - when the activity started

After execution, the plugin also sends:

  • Tool result - the output (truncated to 4000 characters for string results)
  • Error information - if the tool failed
  • Duration - how long the tool took
  • OTel spans - HTTP requests and filesystem operations that occurred during execution
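
The 4000-character limit on string results can be sketched as a simple truncation step. The limit comes from this page; the function name is illustrative.

```typescript
// Truncate string tool results to the documented 4000-character limit
// before reporting them to Core. Non-string results pass through unchanged.
const MAX_RESULT_CHARS = 4000;

function truncateResult(result: unknown): unknown {
  if (typeof result === "string" && result.length > MAX_RESULT_CHARS) {
    return result.slice(0, MAX_RESULT_CHARS);
  }
  return result;
}
```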

LLM Guardrails

When you route your LLM provider through the OpenBox gateway (see Configuration), every LLM request is evaluated for guardrails before reaching the model.

Flow

1. Agent sends LLM request
2. Request hits local gateway (http://127.0.0.1:18919/v1)
3. Gateway sends request content to OpenBox Core
4. Core runs guardrails checks
   • pass (no issues) → request forwarded to LLM provider as-is
   • pass (with redaction) → sensitive content redacted, then forwarded
   • fail (validation failed) → request blocked, agent receives block message
5. LLM provider response returned to agent
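
A request routed through the gateway looks like any OpenAI-compatible call, just aimed at the local address. Only the gateway URL comes from this page; the model name and body layout below follow the standard Chat Completions wire format and are assumptions.

```typescript
// Sketch: build a Chat Completions request aimed at the local OpenBox
// gateway instead of the provider directly.
function buildGatewayRequest(prompt: string): {
  url: string;
  method: string;
  headers: Record<string, string>;
  body: string;
} {
  return {
    url: "http://127.0.0.1:18919/v1/chat/completions",
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "gpt-4o-mini", // placeholder model name
      messages: [{ role: "user", content: prompt }],
    }),
  };
}
```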

Guardrails

Guardrails are configured in the OpenBox platform - no code or plugin configuration changes needed. Available guardrail types include:

| Type | What it does |
| --- | --- |
| PII Detection | Detects and redacts emails, phone numbers, SSNs, passport numbers, and other personally identifiable information |
| Content Filtering | Blocks requests matching specific content categories |
| Toxicity | Blocks abusive or hostile language from user interactions |
| Ban Words | Censors or blocks domain-specific banned terms (competitor names, internal codenames, regulated terms) |

When guardrails detect sensitive content, there are two possible outcomes:

  • Redaction - the sensitive content is replaced with sanitized values before the request reaches the LLM. For example, an email address in the prompt becomes a redacted placeholder. The agent continues normally with the sanitized input.
  • Block - the request is stopped entirely. The agent receives a message explaining why the request was blocked, formatted as a valid LLM response so the agent can process it normally.

Blocked responses are returned in the correct format for both Chat Completions and Responses APIs, including proper SSE streaming format when the agent requests streaming.
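
As an illustration of the redaction outcome, a toy PII pass might look like this. The patterns and placeholder format are invented for the example; they are not OpenBox's actual detection rules.

```typescript
// Toy redaction pass: replace emails and US SSNs with placeholders before
// the prompt reaches the LLM. Real guardrails are configured in the
// OpenBox platform, not in code.
function redactPII(text: string): string {
  return text
    .replace(/[\w.+-]+@[\w-]+\.[\w.]+/g, "<EMAIL_REDACTED>")
    .replace(/\b\d{3}-\d{2}-\d{4}\b/g, "<SSN_REDACTED>");
}
```

The agent never sees the original values; it continues with the sanitized prompt as if nothing happened.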

Governance Verdicts

OpenBox Core evaluates every tool call against your configured policies and returns a verdict:

| Verdict | Effect |
| --- | --- |
| allow | Tool executes normally. This is the default when no policy matches or when Core is unreachable (fail-open). |
| block | Current tool call is prevented. The agent receives the block reason as the tool result and can attempt other actions. |

A block verdict stops a single action. The agent can continue using other tools or try a different approach. The block reason from your policy is returned to the agent so it understands why the action was prevented.

Session Lifecycle

The plugin tracks the full lifecycle of each agent session and reports it to OpenBox Core. This is what populates session timelines in the OpenBox dashboard.

Lifecycle events

1. session_start
2. WorkflowStarted sent to Core (on first tool call)
3. Tool calls (repeats for each tool call):
   • ActivityStarted → Core evaluates
   • Tool executes
   • ActivityCompleted → Core receives
4. LLM inference (repeats for each LLM call):
   • ActivityStarted → Core records
   • LLM responds
   • ActivityCompleted → Core receives
5. agent_end (success or failure captured)
6. session_end
   • WorkflowCompleted or WorkflowFailed sent to Core
   • Gateway stopped, state reset

Identity and session tracking

The plugin accumulates session identity across multiple hooks because different hooks provide different context:

  • session_start provides the sessionId
  • before_tool_call provides the sessionKey
  • Both are needed to register the session with Core

The session is registered with Core on the first governance evaluation (not at session_start). All subsequent events in the session reference the same workflow_id and run_id.
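
This accumulation can be sketched as a small state holder. The hook and field names are assumptions based on the description above, not the plugin's actual internals.

```typescript
// Sketch: accumulate session identity across hooks and register with Core
// only once, on the first governance evaluation.
class SessionIdentity {
  sessionId?: string;  // provided by session_start
  sessionKey?: string; // provided by before_tool_call
  registered = false;

  onSessionStart(sessionId: string): void {
    this.sessionId = sessionId;
  }

  // Returns true exactly once, when both identifiers are known for the
  // first time; the caller would send WorkflowStarted to Core at that point.
  onBeforeToolCall(sessionKey: string): boolean {
    this.sessionKey = sessionKey;
    if (!this.registered && this.sessionId !== undefined) {
      this.registered = true;
      return true;
    }
    return false;
  }
}
```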

OTel Span Capture

The plugin captures OpenTelemetry spans during tool execution and LLM inference and attaches them to governance events sent to Core. This gives you visibility into what HTTP requests and filesystem operations occurred during each action.

What is captured

| Span type | Source | Examples |
| --- | --- | --- |
| HTTP requests | Undici instrumentation | API calls, web fetches made during tool execution |
| Filesystem operations | Filesystem instrumentation | File reads, writes, deletes |

Spans are mapped to semantic names for clarity:

| Raw operation | Mapped name |
| --- | --- |
| fs readFile, fs stat, fs readdir | file.read |
| fs writeFile, fs rename, fs mkdir | file.write |
| fs unlink, fs rm, fs rmdir | file.delete |
| fs open, fs close | file.open |
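
The mapping above amounts to a simple lookup table; a sketch (the plugin's actual implementation may differ):

```typescript
// Map raw fs span operations to semantic names, passing unknown
// operations through unchanged.
const FS_SPAN_MAP: Record<string, string> = {
  "fs readFile": "file.read", "fs stat": "file.read", "fs readdir": "file.read",
  "fs writeFile": "file.write", "fs rename": "file.write", "fs mkdir": "file.write",
  "fs unlink": "file.delete", "fs rm": "file.delete", "fs rmdir": "file.delete",
  "fs open": "file.open", "fs close": "file.open",
};

function mapSpanName(raw: string): string {
  return FS_SPAN_MAP[raw] ?? raw;
}
```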

Self-call filtering

HTTP requests to OpenBox Core itself are automatically excluded from span capture. This prevents governance API calls from appearing as tool activity in your telemetry data.
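
One way to implement this filter is a URL-origin comparison; a sketch, with a placeholder Core address:

```typescript
// Drop HTTP spans whose target is the governance service itself, so
// governance API calls don't show up as tool activity. The Core URL used
// in the test below is a placeholder, not a real endpoint.
function isSelfCall(spanUrl: string, coreBaseUrl: string): boolean {
  try {
    return new URL(spanUrl).origin === new URL(coreBaseUrl).origin;
  } catch {
    // Unparseable URLs are kept; only confirmed self-calls are excluded.
    return false;
  }
}
```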

Fail-Open Design

The plugin is designed to fail open. If OpenBox Core is unreachable or returns an error, tools and LLM calls execute normally. This ensures that infrastructure issues with the governance service never block your agent's work.

Specifically:

| Scenario | Behavior |
| --- | --- |
| Core is unreachable (network error, timeout) | Tool executes, error is logged |
| Core returns an unexpected response | Tool executes, error is logged |
| Core returns no verdict | Tool executes (defaults to allow) |
| Core returns allow | Tool executes |
| Core returns block | Tool is prevented |

Only an explicit block verdict from Core prevents execution. Everything else defaults to allowing the action.
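
The fail-open rule collapses to a single decision function; a sketch with illustrative names:

```typescript
// Fail-open verdict resolution: anything other than an explicit block
// (missing response, unknown verdict, error) defaults to allow.
type Verdict = "allow" | "block";

function resolveVerdict(coreResponse: { verdict?: string } | undefined): Verdict {
  return coreResponse?.verdict === "block" ? "block" : "allow";
}
```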

The LLM gateway follows the same pattern: if Core is unreachable during guardrails evaluation, the LLM request is forwarded to the provider as-is.