⚠️
Demo dossier — synthetic runtime data. These audits are published for demonstration purposes. Runtime traces were synthetically generated to illustrate the behavioral audit methodology. Systems are anonymised. Full production dossiers with live execution evidence are available under NDA — contact@factnotebook.com

📈 Temporal Compliance Trend — P3

70 sessions · 4 windows (day) · 18 checkpoints · Granularity: day

Global trend: → STABLE
Delta / Observability: +0.4pts
Observability Coverage: 50% of periods with data
Global evolution: ▇ ▇ ░ ░ (░ = N/O)
Checkpoint | 04-01 (24 sess.) | 04-02 (26 sess.) | 04-04 (6 sess.) | 04-05 (14 sess.) | Trend | Delta
GLOBAL (avg. checkpoints) | 82% | 82% | N/O | N/O | → STABLE | +0.4pts
Audit Trail | 0% 🔴 | 0% 🔴 | N/O | N/O | → STABLE | 0pts
Automatic Blocking Linked to Human Rejection | 0% 🔴 | 0% 🔴 | N/O | N/O | → STABLE | 0pts
Confidence-Based Human Routing | 79% | 81% | N/O | N/O | → STABLE | +2pts
Contextual Memory Limitation | 100% | 100% | N/O | N/O | → STABLE | 0pts
Data Traceability | 100% | 100% | N/O | N/O | → STABLE | 0pts
Data Cleansing & Anonymisation | 100% | 100% | N/O | N/O | → STABLE | 0pts
Decision Record Structure | 100% | 100% | N/O | N/O | → STABLE | 0pts
Authority Delegation | 100% | 100% | N/O | N/O | → STABLE | 0pts
System Explainability | 100% | 100% | N/O | N/O | → STABLE | 0pts
Bypass Detection | 100% | 100% | N/O | N/O | → STABLE | 0pts
Human-in-the-Loop Mechanism | 100% | 100% | N/O | N/O | → STABLE | 0pts
Escalation to Human | 100% | 100% | N/O | N/O | → STABLE | 0pts
Human Validation | 0% 🔴 | 0% 🔴 | N/O | N/O | → STABLE | 0pts
Serious Incident Notification Procedure | 96% | 96% | N/O | N/O | → STABLE | +0pts
Execution Limits (Guardrails) | 96% | 100% | N/O | N/O | → STABLE | +4pts
User Override | 100% | 100% | N/O | N/O | → STABLE | 0pts
PII Masking Before External Transmission | 100% | 100% | N/O | N/O | → STABLE | 0pts
Post-Market Plan | 100% | 100% | N/O | N/O | → STABLE | 0pts

Strict rate (🟢 only) per time window.
VERIFIED = sessions present and compliant (rate shown) · FAILED 🔴 = sessions present, none compliant (negative evidence) · N/O = no sessions evaluated this control in this period (absence of evidence ≠ evidence of absence)
IMPROVING = rate up 5pts between the first and last observed period · DEGRADING = rate down 5pts · anything in between is STABLE.
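The legend above can be read as a small classification routine. The sketch below is one possible reading, not the audit tool's actual code: the function names, the use of `None` for N/O windows, the STABLE fallback when fewer than two windows are observed, and the inclusive 5-point threshold are all assumptions.

```python
from typing import Optional, Sequence, Tuple

# Assumption: the threshold is inclusive. The report only states "+5pts" / "-5pts".
THRESHOLD_PTS = 5.0


def window_status(compliant: int, evaluated: int) -> str:
    """Classify one checkpoint in one time window (hypothetical helper)."""
    if evaluated == 0:
        return "N/O"       # no sessions evaluated this control in this period
    if compliant == 0:
        return "FAILED"    # sessions present, none compliant
    return "VERIFIED"      # sessions present, compliance rate is shown


def trend(rates: Sequence[Optional[float]]) -> Tuple[str, float]:
    """Compare the first and last observed rates; None marks an N/O window."""
    observed = [r for r in rates if r is not None]
    if len(observed) < 2:
        # Assumption: with fewer than two observed windows there is no delta.
        return "STABLE", 0.0
    delta = observed[-1] - observed[0]
    if delta >= THRESHOLD_PTS:
        return "IMPROVING", delta
    if delta <= -THRESHOLD_PTS:
        return "DEGRADING", delta
    return "STABLE", delta


# Rows taken from the table above; the last two windows are N/O.
print(trend([96.0, 100.0, None, None]))  # Execution Limits (Guardrails)
print(trend([79.0, 81.0, None, None]))   # Confidence-Based Human Routing
```

Under these assumptions both rows stay STABLE, because the N/O windows are skipped rather than counted as zero and neither delta reaches 5 points.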

🏛️ Governance Health Score

Living governance metrics — operational summary of human oversight in real operation.
Session KPIs: Governance Reliability · Human Oversight Coverage · Auditability Coverage
Temporal KPIs: Human Intervention Rate · Oversight Stability · Drift Index · Compliance Volatility
Distinct from AI Act scoring — measures the quality of active supervision, not documentary compliance.

GHS Global: 50/100 → STABLE
Human Supervision: 75/100 (Oversight · Stability)
Auditability: 97/100 (Traceability · Policy)
Operational Stability: 84/100 (Drift · Routing · Volatility)
Gov. Reliability: 0/100 (Compliant sessions)
Why is GHS = 50/100 when Gov. Reliability = 0%?
Governance Reliability accounts for 11% of the composite score (weight 1.8/16.1). The 9 other observed dimensions (Human Supervision, Auditability, Stability...) hold the score up. The effective penalty of this group on the GHS is about 6 pts. This score reflects governance that is operationally active (routing, oversight, audit) but structurally unreliable at session level (0 fully compliant sessions).
Dimension | Score | Source
● Human Supervision | 75/100 |
Human Oversight Coverage | 100% | Sessions where oversight was observed, not necessarily effective; see Governance Reliability for actual conformity
Human Oversight Stability | 0% | Oversight consistency: 0/4 periods VERIFIED (≠ Coverage: measures temporal regularity, not presence)
Human Intervention Rate | 100% | HUMAN_OVERSIGHT + HUMAN_FALLBACK, last period VERIFIED
Override Rate | 100% | Human interventions overriding AI decisions
● Auditability | 97/100 |
Auditability Coverage | 96% | Sessions with audit trail or observed decision record
Policy Adherence Rate | 100% | Inverse of the audit-trail violation rate
● Operational Stability | 84/100 |
Governance Drift Index | 85% | 0/18 controls degrading · 100=no drift · 0=all degrading ⚠ 2 recent periods without data: drift potentially linked to coverage loss, not an actual behavioral change.
Confidence Routing Rate | 81% | Effective routing to human supervision on low-confidence cases
Compliance Volatility | 87% | Rate stability: std-dev=0.3pts · 100=perfectly stable
● Governance Reliability | 0/100 |
Governance Reliability | 0% | Fully compliant sessions / 70 total sessions

📊 Output Quality Metrics

70 outputs analysed · scientific separation of phenomena

Hallucination Rate = confirmed_hallucinations / outputs (detection: hallucination_detected flag set by grounding validator or human reviewer, not self-reported by the model)
Override Rate = human rejections (disagreement ≠ hallucination)
Confidence Degradation = model self-reported low confidence (caution ≠ error)
Intervention Rate = total governance activity (dual signal: active governance + system needing correction)

Metric | Value | Source
Hallucination Rate | 2.86% | confirmed_hallucinations / outputs: 2 of 70 outputs. Detection method: event-level hallucination_detected=true flag in runtime traces (set by grounding validator or human reviewer).
Human Override Rate | 7.14% | human_rejections / outputs: 5 rejections
Confidence Degradation Rate | 0.00% | low_confidence_outputs / outputs: 0 outputs below threshold
Governance Intervention Rate | 7.14% | (rejections + escalations) / outputs: 5 interventions. Dual signal: high rate = governance is active (positive) but system requires frequent correction (negative). Interpret alongside Reliability.
Output Quality Risk Score | 5.00% | Weighted: ×0.2 low_conf + ×0.3 rejections + ×1.0 hallucinations (≠ Hallucination Rate; multi-signal summary)
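The five figures above follow from a handful of event counts. Here is a minimal sketch of that arithmetic, assuming the weighted risk sum is normalised by the number of outputs (this reproduces the reported 5.00% but is not stated in the report) and that there were 0 escalations (implied by 5 interventions against 5 rejections). The function and key names are hypothetical.

```python
def output_quality(outputs: int, hallucinations: int, rejections: int,
                   escalations: int, low_confidence: int) -> dict:
    """Return the five output-quality metrics as percentages."""

    def pct(count: float) -> float:
        return 100.0 * count / outputs

    # Weights as listed in the Source column; normalising by outputs is an assumption.
    weighted_risk = 0.2 * low_confidence + 0.3 * rejections + 1.0 * hallucinations
    return {
        "hallucination_rate": pct(hallucinations),
        "human_override_rate": pct(rejections),
        "confidence_degradation_rate": pct(low_confidence),
        "governance_intervention_rate": pct(rejections + escalations),
        "output_quality_risk_score": pct(weighted_risk),
    }


# Counts taken from the table above.
metrics = output_quality(outputs=70, hallucinations=2, rejections=5,
                         escalations=0, low_confidence=0)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}%")
```

Keeping the risk score separate from the hallucination rate matches the report's intent: rejections and low-confidence outputs contribute to risk without being counted as hallucinations.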
Appendix: weighted formula for the composite GHS

GHS = weighted average of 10 dimensions:
Governance Reliability ×4.0 (anchor — conformity IS governance) · Human Intervention Rate ×2.0 · Governance Drift Index ×1.8 · Human Oversight Stability ×1.5 · Human Oversight Coverage ×1.2 · Confidence Routing Rate ×1.3 · Auditability Coverage ×1.3 · Compliance Volatility ×1.2 · Override Rate ×1.0 · Policy Adherence Rate ×0.8.
Cap rule: if Governance Reliability = 0%, GHS is capped at 50 — mechanisms without outcomes cannot score above "monitoring required".
Blind period penalty: periods with no observations reduce Drift Index and Compliance Volatility scores — loss of visibility is itself a risk signal.
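The weighted average and the cap rule can be sketched as follows. This is an illustration under stated assumptions, not the scoring engine itself: the dimension scores are copied from the dimension table in this report, the blind period penalty is assumed to be already included in the Drift Index and Compliance Volatility scores, and the function name `composite_ghs` is hypothetical.

```python
# Weights as listed in the formula above.
WEIGHTS = {
    "Governance Reliability": 4.0,
    "Human Intervention Rate": 2.0,
    "Governance Drift Index": 1.8,
    "Human Oversight Stability": 1.5,
    "Human Oversight Coverage": 1.2,
    "Confidence Routing Rate": 1.3,
    "Auditability Coverage": 1.3,
    "Compliance Volatility": 1.2,
    "Override Rate": 1.0,
    "Policy Adherence Rate": 0.8,
}

# Dimension scores (0-100) as shown in this report's dimension table.
SCORES = {
    "Governance Reliability": 0.0,
    "Human Intervention Rate": 100.0,
    "Governance Drift Index": 85.0,
    "Human Oversight Stability": 0.0,
    "Human Oversight Coverage": 100.0,
    "Confidence Routing Rate": 81.0,
    "Auditability Coverage": 96.0,
    "Compliance Volatility": 87.0,
    "Override Rate": 100.0,
    "Policy Adherence Rate": 100.0,
}

CAP_WHEN_UNRELIABLE = 50.0  # cap rule: applies when Governance Reliability = 0%


def composite_ghs(scores: dict, weights: dict) -> tuple:
    """Return (uncapped weighted average, GHS after the cap rule)."""
    total_weight = sum(weights.values())
    raw = sum(scores[name] * w for name, w in weights.items()) / total_weight
    if scores["Governance Reliability"] == 0:
        return raw, min(raw, CAP_WHEN_UNRELIABLE)
    return raw, raw


raw, ghs = composite_ghs(SCORES, WEIGHTS)
print(f"uncapped: {raw:.1f}  GHS: {ghs:.0f}/100")
```

With these inputs the uncapped average is about 61.3, and the cap brings it down to the 50/100 shown in the GHS Global card.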

Drift vs Coverage Loss: the Governance Drift Index (GDI) counts the checkpoints whose rate drops between two observed periods. If the latest periods have no data (Coverage Loss), the GDI may under- or over-estimate the real drift by extrapolation; the ⚠ annotation flags this interpretation risk.
