How It Works

The OpenBox plugin governs your OpenClaw agent through two paths: tool governance for agent tool calls and LLM guardrails for model inference requests. Both paths evaluate actions against your OpenBox Core policies in real time before execution.

Tool Governance

Every tool call your agent makes passes through the plugin's before_tool_call hook before execution.

Flow

1. Agent decides to call a tool (e.g. Bash)
2. before_tool_call hook fires
3. Plugin sends tool name + parameters to OpenBox Core
4. Core evaluates against your policies
   • allow → tool executes normally
   • block → tool is prevented, agent sees the block reason
5. after_tool_call hook fires
6. Plugin reports tool result + OTel spans to Core

When a tool is blocked, the agent receives a message like:

Blocked by governance policy: <reason from your policy>

The agent sees this as the tool result and can respond accordingly.
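
The decision path above can be sketched as follows. The hook and function names here are illustrative, not the plugin's actual API; only the block-message format comes from this page.

```typescript
// Sketch of the tool-governance decision path. Hook and client names are
// illustrative; the real plugin API may differ.
type Verdict = { decision: "allow" } | { decision: "block"; reason: string };

// Pure decision step: turn a Core verdict into the outcome the agent observes.
function toOutcome(verdict: Verdict): { proceed: boolean; result?: string } {
  if (verdict.decision === "block") {
    // The block reason is surfaced to the agent as the tool result.
    return { proceed: false, result: `Blocked by governance policy: ${verdict.reason}` };
  }
  return { proceed: true };
}

// Hypothetical hook wiring: `evaluate` stands in for the network call to
// OpenBox Core.
async function beforeToolCall(
  toolName: string,
  params: Record<string, unknown>,
  evaluate: (tool: string, p: Record<string, unknown>) => Promise<Verdict>,
): Promise<{ proceed: boolean; result?: string }> {
  return toOutcome(await evaluate(toolName, params));
}
```

Because the block message is returned as an ordinary tool result, the agent can reason about it and pick another action rather than crashing.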

What gets sent to Core

For each tool call, the plugin sends:

  • Tool name - which tool the agent is calling
  • Parameters - the full parameters passed to the tool
  • Session identity - workflow ID, run ID, agent ID
  • Timestamps - when the activity started

After execution, the plugin also sends:

  • Tool result - the output (truncated to 4000 characters for string results)
  • Error information - if the tool failed
  • Duration - how long the tool took
  • OTel spans - HTTP requests and filesystem operations that occurred during execution
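
The 4000-character limit on string results can be sketched as a simple truncation step. The limit comes from this page; the function name is illustrative.

```typescript
// Truncate string tool results to the documented 4000-character limit
// before reporting them to Core. Non-string results pass through unchanged.
const MAX_RESULT_CHARS = 4000;

function truncateResult(result: unknown): unknown {
  if (typeof result === "string" && result.length > MAX_RESULT_CHARS) {
    return result.slice(0, MAX_RESULT_CHARS);
  }
  return result;
}
```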

LLM Guardrails

When you route your LLM provider through the OpenBox gateway (see Configuration), every LLM request is evaluated for guardrails before reaching the model.

Flow

1. Agent sends LLM request
2. Request hits local gateway (http://127.0.0.1:18919/v1)
3. Gateway sends request content to OpenBox Core
4. Core runs guardrails checks
   • pass (no issues) → request forwarded to LLM provider as-is
   • pass (with redaction) → sensitive content redacted, then forwarded
   • fail (validation failed) → request blocked, agent receives block message
5. LLM provider response returned to agent
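
A request routed through the gateway looks like any OpenAI-compatible call, just aimed at the local address. Only the gateway URL comes from this page; the model name and body layout below follow the standard Chat Completions wire format and are assumptions.

```typescript
// Sketch: build a Chat Completions request aimed at the local OpenBox
// gateway instead of the provider directly.
function buildGatewayRequest(prompt: string): {
  url: string;
  method: string;
  headers: Record<string, string>;
  body: string;
} {
  return {
    url: "http://127.0.0.1:18919/v1/chat/completions",
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "gpt-4o-mini", // placeholder model name
      messages: [{ role: "user", content: prompt }],
    }),
  };
}
```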

Guardrails

Guardrails are configured in the OpenBox platform - no code or plugin configuration changes needed. Available guardrail types include:

| Type | What it does |
| --- | --- |
| PII Detection | Detects and redacts emails, phone numbers, SSNs, passport numbers, and other personally identifiable information |
| Content Filtering | Blocks requests matching specific content categories |
| Toxicity | Blocks abusive or hostile language from user interactions |
| Ban Words | Censors or blocks domain-specific banned terms (competitor names, internal codenames, regulated terms) |

When guardrails detect sensitive content, there are two possible outcomes:

  • Redaction - the sensitive content is replaced with sanitized values before the request reaches the LLM. For example, an email address in the prompt becomes a redacted placeholder. The agent continues normally with the sanitized input.
  • Block - the request is stopped entirely. The agent receives a message explaining why the request was blocked, formatted as a valid LLM response so the agent can process it normally.

Blocked responses are returned in the correct format for both Chat Completions and Responses APIs, including proper SSE streaming format when the agent requests streaming.
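
As an illustration of the redaction outcome, a toy PII pass might look like this. The patterns and placeholder format are invented for the example; they are not OpenBox's actual detection rules.

```typescript
// Toy redaction pass: replace emails and US SSNs with placeholders before
// the prompt reaches the LLM. Real guardrails are configured in the
// OpenBox platform, not in code.
function redactPII(text: string): string {
  return text
    .replace(/[\w.+-]+@[\w-]+\.[\w.]+/g, "<EMAIL_REDACTED>")
    .replace(/\b\d{3}-\d{2}-\d{4}\b/g, "<SSN_REDACTED>");
}
```

The agent never sees the original values; it continues with the sanitized prompt as if nothing happened.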

Governance Verdicts

OpenBox Core evaluates every tool call against your configured policies and returns a verdict:

| Verdict | Effect |
| --- | --- |
| allow | Tool executes normally. This is the default when no policy matches or when Core is unreachable (fail-open). |
| block | Current tool call is prevented. The agent receives the block reason as the tool result and can attempt other actions. |

A block verdict stops a single action. The agent can continue using other tools or try a different approach. The block reason from your policy is returned to the agent so it understands why the action was prevented.

Session Lifecycle

The plugin tracks the full lifecycle of each agent session and reports it to OpenBox Core. This is what populates session timelines in the OpenBox dashboard.

Lifecycle events

1. session_start
2. WorkflowStarted sent to Core (on first tool call)
3. Tool calls (repeats for each tool call):
   • ActivityStarted → Core evaluates
   • Tool executes
   • ActivityCompleted → Core receives
4. LLM inference (repeats for each LLM call):
   • ActivityStarted → Core records
   • LLM responds
   • ActivityCompleted → Core receives
5. agent_end (success or failure captured)
6. session_end
   • WorkflowCompleted or WorkflowFailed sent to Core
   • Gateway stopped, state reset

Identity and session tracking

The plugin accumulates session identity across multiple hooks because different hooks provide different context:

  • session_start provides the sessionId
  • before_tool_call provides the sessionKey
  • Both are needed to register the session with Core

The session is registered with Core on the first governance evaluation (not at session_start). All subsequent events in the session reference the same workflow_id and run_id.
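
This accumulation can be sketched as a small state holder. The hook and field names are assumptions based on the description above, not the plugin's actual internals.

```typescript
// Sketch: accumulate session identity across hooks and register with Core
// only once, on the first governance evaluation.
class SessionIdentity {
  sessionId?: string;  // provided by session_start
  sessionKey?: string; // provided by before_tool_call
  registered = false;

  onSessionStart(sessionId: string): void {
    this.sessionId = sessionId;
  }

  // Returns true exactly once, when both identifiers are known for the
  // first time; the caller would send WorkflowStarted to Core at that point.
  onBeforeToolCall(sessionKey: string): boolean {
    this.sessionKey = sessionKey;
    if (!this.registered && this.sessionId !== undefined) {
      this.registered = true;
      return true;
    }
    return false;
  }
}
```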

OTel Span Capture

The plugin captures OpenTelemetry spans during tool execution and LLM inference and attaches them to governance events sent to Core. This gives you visibility into what HTTP requests and filesystem operations occurred during each action.

What is captured

| Span type | Source | Examples |
| --- | --- | --- |
| HTTP requests | Undici instrumentation | API calls, web fetches made during tool execution |
| Filesystem operations | Filesystem instrumentation | File reads, writes, deletes |

Spans are mapped to semantic names for clarity:

| Raw operation | Mapped name |
| --- | --- |
| fs readFile, fs stat, fs readdir | file.read |
| fs writeFile, fs rename, fs mkdir | file.write |
| fs unlink, fs rm, fs rmdir | file.delete |
| fs open, fs close | file.open |
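
The mapping above amounts to a simple lookup table; a sketch (the plugin's actual implementation may differ):

```typescript
// Map raw fs span operations to semantic names, passing unknown
// operations through unchanged.
const FS_SPAN_MAP: Record<string, string> = {
  "fs readFile": "file.read", "fs stat": "file.read", "fs readdir": "file.read",
  "fs writeFile": "file.write", "fs rename": "file.write", "fs mkdir": "file.write",
  "fs unlink": "file.delete", "fs rm": "file.delete", "fs rmdir": "file.delete",
  "fs open": "file.open", "fs close": "file.open",
};

function mapSpanName(raw: string): string {
  return FS_SPAN_MAP[raw] ?? raw;
}
```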

Self-call filtering

HTTP requests to OpenBox Core itself are automatically excluded from span capture. This prevents governance API calls from appearing as tool activity in your telemetry data.
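
One way to implement this filter is a URL-origin comparison; a sketch, with a placeholder Core address:

```typescript
// Drop HTTP spans whose target is the governance service itself, so
// governance API calls don't show up as tool activity. The Core URL used
// in the test below is a placeholder, not a real endpoint.
function isSelfCall(spanUrl: string, coreBaseUrl: string): boolean {
  try {
    return new URL(spanUrl).origin === new URL(coreBaseUrl).origin;
  } catch {
    // Unparseable URLs are kept; only confirmed self-calls are excluded.
    return false;
  }
}
```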

Fail-Open Design

The plugin is designed to fail open. If OpenBox Core is unreachable or returns an error, tools and LLM calls execute normally. This ensures that infrastructure issues with the governance service never block your agent's work.

Specifically:

| Scenario | Behavior |
| --- | --- |
| Core is unreachable (network error, timeout) | Tool executes, error is logged |
| Core returns an unexpected response | Tool executes, error is logged |
| Core returns no verdict | Tool executes (defaults to allow) |
| Core returns allow | Tool executes |
| Core returns block | Tool is prevented |

Only an explicit block verdict from Core prevents execution. Everything else defaults to allowing the action.
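
The fail-open rule collapses to a single decision function; a sketch with illustrative names:

```typescript
// Fail-open verdict resolution: anything other than an explicit block
// (missing response, unknown verdict, error) defaults to allow.
type Verdict = "allow" | "block";

function resolveVerdict(coreResponse: { verdict?: string } | undefined): Verdict {
  return coreResponse?.verdict === "block" ? "block" : "allow";
}
```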

The LLM gateway follows the same pattern: if Core is unreachable during guardrails evaluation, the LLM request is forwarded to the provider as-is.