Last updated on Mar 2, 2026

Trust Scores

The Trust Score is a 0-100 metric representing an agent's trustworthiness based on its configuration and behavior.

Calculation

Trust Score = (Risk Profile Score × 40%) + (Behavioral × 35%) + (Alignment × 25%)

Component	Weight	Source	Range
Risk Profile Score	40%	Risk scoring (Assess phase)	0-100
Behavioral	35%	Policy compliance (Authorize + Monitor)	0-100
Alignment	25%	Goal consistency (Verify phase)	0-100
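
The weighted sum above can be sketched in Python as follows. This is a minimal illustration: the function name `trust_score` and the range check are assumptions made for the example, not part of any documented API.

```python
# Weights taken from the Calculation table above.
WEIGHTS = {"risk_profile": 0.40, "behavioral": 0.35, "alignment": 0.25}


def trust_score(risk_profile: float, behavioral: float, alignment: float) -> float:
    """Combine the three 0-100 component scores into a 0-100 Trust Score."""
    components = {
        "risk_profile": risk_profile,
        "behavioral": behavioral,
        "alignment": alignment,
    }
    for name, value in components.items():
        # Assumption: out-of-range inputs are rejected rather than clamped.
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be between 0 and 100, got {value}")
    return sum(WEIGHTS[name] * value for name, value in components.items())
```

Because the weights sum to 1, three components in the 0-100 range always yield a Trust Score in the same range.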

Components

Risk Profile Score (40%)

Based on the agent's inherent risk profile:

Configured at agent creation
14 parameters across three weighted categories: Base Security (25%), AI-Specific (45%), Impact (30%)
Produces a Risk Profile Score (0–100) and a Risk Tier (1–4)
Static unless re-assessed
Higher score = lower inherent risk
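
This page does not describe how the 14 parameters roll up into a single number. The sketch below assumes that each category has already been reduced to a 0-100 sub-score and that the three sub-scores are combined as a weighted sum using the category percentages listed above. The function name and dictionary keys are hypothetical.

```python
# Category weights from the list above.
CATEGORY_WEIGHTS = {"base_security": 0.25, "ai_specific": 0.45, "impact": 0.30}


def risk_profile_score(category_scores: dict[str, float]) -> float:
    """Weighted sum of per-category sub-scores (assumed to be 0-100 each)."""
    missing = CATEGORY_WEIGHTS.keys() - category_scores.keys()
    if missing:
        raise KeyError(f"missing category scores: {sorted(missing)}")
    return sum(
        weight * category_scores[name] for name, weight in CATEGORY_WEIGHTS.items()
    )
```

Under these assumptions, the AI-Specific category moves the result the most, since it carries nearly half of the weight.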

Behavioral Score (35%)

Based on runtime compliance:

Behavioral Compliance component starts at 100 for new agents
Violations reduce the Behavioral Compliance component (35% weight), not the Trust Score directly
Increases with compliant behavior
Updated continuously

Penalties applied to the Behavioral Compliance component per violation, with the resulting Trust Score change (penalty × 35%) in parentheses:

Minor violation: -5 pts (→ -1.75 pts Trust Score)
Major violation: -15 pts (→ -5.25 pts Trust Score)
Critical violation: -25 pts (→ -8.75 pts Trust Score)
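
The relationship between a component penalty and its effect on the Trust Score can be checked with a short sketch. The function names are illustrative, and flooring the component at 0 is an assumption that this page does not confirm.

```python
BEHAVIORAL_WEIGHT = 0.35  # Behavioral component weight in the Trust Score

# Penalty points per violation severity, from the list above.
PENALTIES = {"minor": 5, "major": 15, "critical": 25}


def apply_violation(behavioral: float, severity: str) -> float:
    """Return the Behavioral component after one violation.

    Assumption: the component is floored at 0.
    """
    return max(0.0, behavioral - PENALTIES[severity])


def trust_score_impact(severity: str) -> float:
    """Trust Score points lost for one violation (ignoring the floor at 0)."""
    return PENALTIES[severity] * BEHAVIORAL_WEIGHT
```

For instance, one major violation takes a new agent's Behavioral component from 100 to 85, which lowers the Trust Score by 15 × 0.35 = 5.25 points.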

Alignment Score (25%)

Based on goal consistency:

Starts at 100 for new agents
Updated per session based on goal alignment checks
Uses LLM evaluation (configurable)

Calculation per session:

Session Alignment = avg(operation_alignment_scores)
Overall Alignment = weighted_avg(recent_sessions, decay=0.95)
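
The two formulas above can be sketched as follows. The exact weighting scheme behind `weighted_avg` is not specified here, so this example assumes the newest session has weight 1 and each older session is discounted by a further factor of `decay`. The function names are hypothetical.

```python
def session_alignment(operation_scores: list[float]) -> float:
    """Mean of the per-operation alignment scores for one session."""
    if not operation_scores:
        raise ValueError("a session needs at least one operation score")
    return sum(operation_scores) / len(operation_scores)


def overall_alignment(recent_sessions: list[float], decay: float = 0.95) -> float:
    """Decay-weighted mean of session scores, ordered oldest to newest.

    Assumption: the newest session has weight 1 and each older session
    is multiplied by a further factor of `decay`.
    """
    if not recent_sessions:
        raise ValueError("need at least one session score")
    n = len(recent_sessions)
    weights = [decay ** (n - 1 - i) for i in range(n)]
    weighted_total = sum(w * s for w, s in zip(weights, recent_sessions))
    return weighted_total / sum(weights)
```

With a decay of 0.95, older sessions fade slowly: a session ten sessions back still carries roughly 60% of the weight of the newest one.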

Score Ranges

Risk Profile Score	Risk Tier	Risk Level	Description
0% – 24%	Tier 1	Low	Read-only, public data access
25% – 49%	Tier 2	Medium	Internal data, non-critical actions
50% – 74%	Tier 3	High	PII, financial data, critical actions
75% – 100%	Tier 4	Critical	System admin, destructive actions
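
The table above amounts to a threshold lookup, sketched below. The function name is illustrative, and treating fractional scores between bands (for example 24.5) as belonging to the lower band is an assumption.

```python
# (lower bound, tier, risk level) rows from the Score Ranges table, highest first.
RISK_TIERS = [
    (75, 4, "Critical"),
    (50, 3, "High"),
    (25, 2, "Medium"),
    (0, 1, "Low"),
]


def risk_tier(risk_profile_score: float) -> tuple[int, str]:
    """Return (tier, risk level) for a 0-100 Risk Profile Score."""
    if not 0 <= risk_profile_score <= 100:
        raise ValueError("score must be between 0 and 100")
    for lower_bound, tier, level in RISK_TIERS:
        if risk_profile_score >= lower_bound:
            return tier, level
    raise AssertionError("unreachable: 0 is the lowest bound")
```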

Score Display

The Trust Score card on the Assess tab shows the score, tier badge, and component breakdown.

Color coding:

Tier	Color
Tier 1 (0% – 24%)	Green
Tier 2 (25% – 49%)	Blue
Tier 3 (50% – 74%)	Yellow
Tier 4 (75% – 100%)	Red

Score Evolution

New Agents

Initial Trust Score:
├── Risk Profile: (from risk profile) × 40%
├── Behavioral: 100 × 35% = 35
├── Alignment: 100 × 25% = 25
└── Total: varies by risk profile

The Behavioral and Alignment components start at 100 for new agents, so the initial Trust Score depends only on the Risk Profile Score.

Example: with Risk Profile Score = 98, Behavioral = 100, and Alignment = 100, Trust Score = (98 × 0.40) + (100 × 0.35) + (100 × 0.25) = 99.2 → Tier 1

Over Time

Day 1:  92 ━━━━━━━━━━━━━━━━━━ Tier 1
Day 7:  88 ━━━━━━━━━━━━━━━━━━ Tier 2 (minor violations)
Day 14: 84 ━━━━━━━━━━━━━━━━━━ Tier 2 (stable)
Day 21: 86 ━━━━━━━━━━━━━━━━━━ Tier 2 (recovering)
Day 30: 89 ━━━━━━━━━━━━━━━━━━ Tier 2 (approaching Tier 1)

Recovery

To improve a degraded score:

Consecutive compliance - No violations for 7+ days
High operation volume - More compliant operations
HITL success - Approved requests
Goal alignment - Consistent alignment scores

Recovery rate:

Tier 1-3: +1 pt/day
Tier 4: +0.5 pt/day
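
As a rough planning aid, the recovery rates above can be turned into an estimate of how long a given loss takes to regain. This sketch assumes recovery is linear, that no new violations occur, and that the agent stays in the same tier for the whole period. None of these are stated guarantees, and the function name is hypothetical.

```python
import math

# Recovery rates in points per day, from the list above.
RECOVERY_RATE = {1: 1.0, 2: 1.0, 3: 1.0, 4: 0.5}


def days_to_recover(points: float, tier: int) -> int:
    """Whole days needed to regain `points` at the tier's recovery rate.

    Assumptions: recovery is linear, no new violations occur, and the
    agent stays in the same tier for the whole period.
    """
    if points < 0:
        raise ValueError("points must be non-negative")
    return math.ceil(points / RECOVERY_RATE[tier])
```

Under these assumptions, regaining 5 points takes 5 days in Tiers 1-3 and 10 days in Tier 4.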

Related

Trust Tiers - How scores map to trust controls
Assess Phase - Configure the Risk Profile component
Adapt Phase - Watch trust evolve over time
