70 sessions · 4 windows (day) · 18 checkpoints · Granularity: day
| Checkpoint | 04-01 (24 sess.) | 04-02 (26 sess.) | 04-04 (6 sess.) | 04-05 (14 sess.) | Trend | Delta |
|---|---|---|---|---|---|---|
| GLOBAL (avg. checkpoints) | 82% | 82% | N/O | N/O | → STABLE | +0.4pts |
| Audit Trail | 0% 🔴 | 0% 🔴 | N/O | N/O | → STABLE | 0pts |
| Automatic Blocking Linked to Human Rejection | 0% 🔴 | 0% 🔴 | N/O | N/O | → STABLE | 0pts |
| Confidence-Based Human Routing | 79% | 81% | N/O | N/O | → STABLE | +2pts |
| Contextual Memory Limitation | 100% | 100% | N/O | N/O | → STABLE | 0pts |
| Data Traceability | 100% | 100% | N/O | N/O | → STABLE | 0pts |
| Data Cleansing & Anonymisation | 100% | 100% | N/O | N/O | → STABLE | 0pts |
| Decision Record Structure | 100% | 100% | N/O | N/O | → STABLE | 0pts |
| Authority Delegation | 100% | 100% | N/O | N/O | → STABLE | 0pts |
| System Explainability | 100% | 100% | N/O | N/O | → STABLE | 0pts |
| Bypass Detection | 100% | 100% | N/O | N/O | → STABLE | 0pts |
| Human-in-the-Loop Mechanism | 100% | 100% | N/O | N/O | → STABLE | 0pts |
| Escalation to Human | 100% | 100% | N/O | N/O | → STABLE | 0pts |
| Human Validation | 0% 🔴 | 0% 🔴 | N/O | N/O | → STABLE | 0pts |
| Serious Incident Notification Procedure | 96% | 96% | N/O | N/O | → STABLE | +0pts |
| Execution Limits (Guardrails) | 96% | 100% | N/O | N/O | → STABLE | +4pts |
| User Override | 100% | 100% | N/O | N/O | → STABLE | 0pts |
| PII Masking Before External Transmission | 100% | 100% | N/O | N/O | → STABLE | 0pts |
| Post-Market Plan | 100% | 100% | N/O | N/O | → STABLE | 0pts |
Strict rate (🟢 only) per time window.
VERIFIED = sessions present and compliant (rate shown) ·
FAILED 🔴 = sessions present, none compliant (negative evidence) ·
N/O = no sessions evaluated this control in this period (absence of evidence ≠ evidence of absence)
IMPROVING = +5pts between the first and last observed period · DEGRADING = −5pts · STABLE = any smaller change.
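The legend's rules can be sketched in a few lines of Python. This is a minimal sketch, not the dashboard's implementation. The function names (`cell_status`, `trend`, `global_rate`) are hypothetical, N/O is modelled as `None`, the ±5pt threshold is assumed to be inclusive, and the GLOBAL row is assumed to be the plain mean of the observed checkpoint rates. That assumption reproduces the 82% shown for 04-01 and 04-02.

```python
from typing import Optional, Sequence, Tuple

# None stands for N/O: no session evaluated the control in that window.
Rate = Optional[float]

# Assumption: the ±5pt trend threshold is inclusive.
TREND_THRESHOLD = 5.0


def cell_status(rate: Rate) -> str:
    """Label one checkpoint cell following the table legend."""
    if rate is None:
        return "N/O"       # absence of evidence, not evidence of absence
    if rate == 0.0:
        return "FAILED"    # sessions present, none compliant
    return "VERIFIED"      # sessions present, rate shown


def trend(rates: Sequence[Rate]) -> Tuple[str, Optional[float]]:
    """Compare the first and last observed windows, skipping N/O ones."""
    observed = [r for r in rates if r is not None]
    if len(observed) < 2:
        return "N/O", None  # hypothetical label: too few observations for a trend
    delta = observed[-1] - observed[0]
    if delta >= TREND_THRESHOLD:
        return "IMPROVING", delta
    if delta <= -TREND_THRESHOLD:
        return "DEGRADING", delta
    return "STABLE", delta


def global_rate(window: Sequence[Rate]) -> Optional[float]:
    """Assumed GLOBAL row: plain mean of the checkpoints observed in one window."""
    observed = [r for r in window if r is not None]
    return sum(observed) / len(observed) if observed else None


# Usage: the 04-01 column in table order (Audit Trail ... Post-Market Plan).
col_04_01 = [0, 0, 79, 100, 100, 100, 100, 100, 100, 100, 100, 100,
             0, 96, 96, 100, 100, 100]
print(round(global_rate(col_04_01)))
print(trend([96.0, 100.0, None, None]))  # Execution Limits row
```

For the Execution Limits row, `trend` returns `("STABLE", 4.0)`. The +4pt gain stays below the threshold, which matches the table. Skipping N/O windows instead of treating them as 0% keeps missing evidence from being read as failure.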
Living governance metrics: an operational summary of human oversight in live operation.
Session KPIs: Governance Reliability · Human Oversight Coverage · Auditability Coverage
Temporal KPIs: Human Intervention Rate · Oversight Stability · Drift Index · Compliance Volatility
Distinct from AI Act scoring: these metrics measure the quality of active supervision, not documentary compliance.
Hallucination Rate = confirmed_hallucinations / outputs (detection: hallucination_detected flag set by a grounding validator or a human reviewer, not self-reported by the model) ·
Override Rate = human_rejections / outputs (disagreement ≠ hallucination) ·
Confidence Degradation = low_confidence_outputs / outputs, based on the model's self-reported low confidence (caution ≠ error) ·
Intervention Rate = (rejections + escalations) / outputs, i.e. total governance activity (dual signal: active governance + a system needing correction)
| Metric | Value | Source |
|---|---|---|
| Hallucination Rate | 2.86% | confirmed_hallucinations / outputs: 2 of 70 outputs. Detection method: event-level hallucination_detected=true flag in runtime traces (set by a grounding validator or a human reviewer). |
| Human Override Rate | 7.14% | human_rejections / outputs: 5 rejections |
| Confidence Degradation Rate | 0.00% | low_confidence_outputs / outputs: 0 outputs below threshold |
| Governance Intervention Rate | 7.14% | (rejections + escalations) / outputs: 5 interventions. Dual signal: a high rate means governance is active (positive) but also that the system requires frequent correction (negative). Interpret alongside Governance Reliability. |
| Output Quality Risk Score | 5.00% | (0.2 × low_conf + 0.3 × rejections + 1.0 × hallucinations) / outputs. A multi-signal summary, not the same as Hallucination Rate. |
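The formulas in the Source column can be checked with a short sketch. The `OutputCounts` fields and function names are hypothetical. The escalation count of 0 is inferred from the table, which reports 5 interventions and 5 rejections. The risk score is assumed to be normalised by outputs like the other rates, and that assumption reproduces the 5.00% shown.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class OutputCounts:
    """Hypothetical per-report counters; field names mirror the Source column."""
    outputs: int
    confirmed_hallucinations: int   # events with hallucination_detected=true
    human_rejections: int
    escalations: int
    low_confidence_outputs: int


def pct(numerator: float, denominator: int) -> float:
    """Percentage rounded to two decimals, as displayed in the table."""
    if denominator == 0:
        return 0.0
    return round(100.0 * numerator / denominator, 2)


def output_metrics(c: OutputCounts) -> Dict[str, float]:
    # Weighted risk points, assumed to be normalised by outputs like the other rates.
    risk_points = (0.2 * c.low_confidence_outputs
                   + 0.3 * c.human_rejections
                   + 1.0 * c.confirmed_hallucinations)
    return {
        "hallucination_rate": pct(c.confirmed_hallucinations, c.outputs),
        "human_override_rate": pct(c.human_rejections, c.outputs),
        "confidence_degradation_rate": pct(c.low_confidence_outputs, c.outputs),
        "governance_intervention_rate": pct(c.human_rejections + c.escalations, c.outputs),
        "output_quality_risk_score": pct(risk_points, c.outputs),
    }


# Counts from this report; escalations=0 is inferred (5 interventions, 5 rejections).
report = OutputCounts(outputs=70, confirmed_hallucinations=2, human_rejections=5,
                      escalations=0, low_confidence_outputs=0)
print(output_metrics(report))
```

With these counts the sketch yields 2.86, 7.14, 0.0, 7.14 and 5.0, matching the table. The risk score works out to (0.3 × 5 + 1.0 × 2) / 70 = 3.5 / 70.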
GHS = weighted average of 10 dimensions:
Governance Reliability ×4.0 (anchor — conformity IS governance) ·
Human Intervention Rate ×2.0 · Governance Drift Index ×1.8 ·
Human Oversight Stability ×1.5 · Human Oversight Coverage ×1.2 ·
Confidence Routing Rate ×1.3 · Auditability Coverage ×1.3 ·
Compliance Volatility ×1.2 · Override Rate ×1.0 · Policy Adherence Rate ×0.8.
Cap rule: if Governance Reliability = 0%, GHS is capped at 50 —
mechanisms without outcomes cannot score above "monitoring required".
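Here is a minimal sketch of the weighted average and the cap rule. It assumes every dimension has already been normalised to a 0 to 100 score where higher is healthier. How lower-is-better signals such as Drift Index or Compliance Volatility are inverted is not specified here. The dictionary keys and the `ghs` function are hypothetical. The blind period penalty is left out because its formula is not given.

```python
from typing import Dict

# Weights as listed above; they sum to 16.1.
GHS_WEIGHTS: Dict[str, float] = {
    "governance_reliability": 4.0,
    "human_intervention_rate": 2.0,
    "governance_drift_index": 1.8,
    "human_oversight_stability": 1.5,
    "human_oversight_coverage": 1.2,
    "confidence_routing_rate": 1.3,
    "auditability_coverage": 1.3,
    "compliance_volatility": 1.2,
    "override_rate": 1.0,
    "policy_adherence_rate": 0.8,
}

GHS_CAP_WITHOUT_RELIABILITY = 50.0  # "monitoring required" ceiling


def ghs(scores: Dict[str, float]) -> float:
    """Weighted average of 0-100 dimension scores, with the reliability cap.

    Assumes each score is already normalised so that higher means healthier.
    """
    missing = GHS_WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    total_weight = sum(GHS_WEIGHTS.values())
    value = sum(w * scores[name] for name, w in GHS_WEIGHTS.items()) / total_weight
    # Cap rule: mechanisms without outcomes cannot score above 50.
    if scores["governance_reliability"] == 0.0:
        value = min(value, GHS_CAP_WITHOUT_RELIABILITY)
    return value


# Usage: perfect scores everywhere except Governance Reliability.
scores = {name: 100.0 for name in GHS_WEIGHTS}
scores["governance_reliability"] = 0.0
print(ghs(scores))  # raw average is about 75.2, capped to 50.0
```

Because the weights sum to 16.1, Governance Reliability alone carries about 24.8% of the score (4.0 / 16.1). The cap still matters, though: without it, a system with 0% reliability and perfect scores elsewhere would land near 75.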
Blind period penalty: periods with no observations reduce Drift Index
and Compliance Volatility scores — loss of visibility is itself a risk signal.
Drift vs. Coverage Loss: the Governance Drift Index (GDI) measures the checkpoints whose rate drops between two observed periods. If the latest periods have no data (Coverage Loss), the GDI extrapolates from older periods and may under- or over-estimate the real drift; the ⚠ annotation flags this interpretation risk.
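The exact GDI formula is not given. This sketch assumes the GDI is the share of checkpoints whose rate fell between their last two observed windows. It raises a coverage-loss flag when the most recent window has no data, and it does not extrapolate. The function name and return shape are hypothetical.

```python
from typing import Dict, Optional, Sequence, Tuple

Rate = Optional[float]  # None = N/O (no observation in that window)


def drift_index(table: Dict[str, Sequence[Rate]]) -> Tuple[float, bool]:
    """Return (share of checkpoints whose rate dropped, coverage-loss flag).

    Hypothetical definition: each checkpoint is compared across its last two
    observed windows. coverage_loss is True when no checkpoint has an
    observation in the most recent window, i.e. the ⚠ case.
    """
    dropped = 0
    comparable = 0
    for rates in table.values():
        observed = [r for r in rates if r is not None]
        if len(observed) < 2:
            continue  # no drift can be measured for this checkpoint
        comparable += 1
        if observed[-1] < observed[-2]:
            dropped += 1
    gdi = dropped / comparable if comparable else 0.0
    coverage_loss = bool(table) and all(
        len(rates) > 0 and rates[-1] is None for rates in table.values()
    )
    return gdi, coverage_loss


# Usage: three rows from this report (04-01, 04-02, 04-04, 04-05).
sample = {
    "Execution Limits (Guardrails)": [96.0, 100.0, None, None],
    "Confidence-Based Human Routing": [79.0, 81.0, None, None],
    "Audit Trail": [0.0, 0.0, None, None],
}
print(drift_index(sample))
```

On these rows the sketch reports no drift but raises the coverage-loss flag, because 04-04 and 04-05 have no observations. In that situation a GDI of 0 reflects missing evidence rather than proven stability.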