Runtime Temporal Trend

📈 Temporal Compliance Trend — P3

70 sessions · 4 windows (day) · 18 checkpoints · Granularity: day

Global trend

→ STABLE

Delta / Observability

+0.4pts

Observability Coverage

50% of periods with data

Global evolution

▇ ▇ ░ ░ (░ = N/O)

Checkpoint	04-01 (24 sess.)	04-02 (26 sess.)	04-04 (6 sess.)	04-05 (14 sess.)	Trend	Delta
GLOBAL (avg. checkpoints)	82%	82%	N/O	N/O	→ STABLE	+0.4pts
Audit Trail	0% 🔴	0% 🔴	N/O	N/O	→ STABLE	0pts
Automatic Blocking Linked to Human Rejection	0% 🔴	0% 🔴	N/O	N/O	→ STABLE	0pts
Confidence-Based Human Routing	79%	81%	N/O	N/O	→ STABLE	+2pts
Contextual Memory Limitation	100%	100%	N/O	N/O	→ STABLE	0pts
Data Traceability	100%	100%	N/O	N/O	→ STABLE	0pts
Data Cleansing & Anonymisation	100%	100%	N/O	N/O	→ STABLE	0pts
Decision Record Structure	100%	100%	N/O	N/O	→ STABLE	0pts
Authority Delegation	100%	100%	N/O	N/O	→ STABLE	0pts
System Explainability	100%	100%	N/O	N/O	→ STABLE	0pts
Bypass Detection	100%	100%	N/O	N/O	→ STABLE	0pts
Human-in-the-Loop Mechanism	100%	100%	N/O	N/O	→ STABLE	0pts
Escalation to Human	100%	100%	N/O	N/O	→ STABLE	0pts
Human Validation	0% 🔴	0% 🔴	N/O	N/O	→ STABLE	0pts
Serious Incident Notification Procedure	96%	96%	N/O	N/O	→ STABLE	0pts
Execution Limits (Guardrails)	96%	100%	N/O	N/O	→ STABLE	+4pts
User Override	100%	100%	N/O	N/O	→ STABLE	0pts
PII Masking Before External Transmission	100%	100%	N/O	N/O	→ STABLE	0pts
Post-Market Plan	100%	100%	N/O	N/O	→ STABLE	0pts

Strict rate (🟢 only) per time window.
VERIFIED = sessions present and compliant (rate shown) · FAILED 🔴 = sessions present, none compliant (negative evidence) · N/O = no sessions evaluated this control in this period (absence of evidence ≠ evidence of absence)
IMPROVING = gain of 5pts or more between the first and last observed periods · DEGRADING = drop of 5pts or more · otherwise STABLE.
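The legend above can be read as a small classification rule. The following is a minimal sketch of that reading, not the dashboard's actual implementation: the function names, the use of `None` for N/O, the inclusive threshold, and the fallback to STABLE when fewer than two periods are observed are all assumptions made for illustration.

```python
from typing import Optional, Sequence, Tuple

THRESHOLD_PTS = 5.0  # from the legend: +5pts / -5pts

def classify_trend(rates: Sequence[Optional[float]]) -> Tuple[str, Optional[float]]:
    """Classify a checkpoint trend from per-window strict rates (in %).

    `None` stands for N/O (no session evaluated the control in that window).
    Only observed windows are compared, first against last.
    """
    observed = [r for r in rates if r is not None]
    if len(observed) < 2:
        # Assumption: with fewer than two observed windows no trend is called.
        return "STABLE", None
    delta = observed[-1] - observed[0]
    if delta >= THRESHOLD_PTS:   # assumption: the threshold is inclusive
        return "IMPROVING", delta
    if delta <= -THRESHOLD_PTS:
        return "DEGRADING", delta
    return "STABLE", delta

def observability_coverage(rates: Sequence[Optional[float]]) -> float:
    """Share of windows (in %) that contain at least one observation."""
    return 100.0 * sum(r is not None for r in rates) / len(rates)

# Rows taken from the table above (04-01, 04-02, 04-04, 04-05).
guardrails = [96, 100, None, None]
print(classify_trend(guardrails))          # ('STABLE', 4)
print(observability_coverage(guardrails))  # 50.0
```

Under this reading, the +4pts on Execution Limits (Guardrails) stays below the threshold and is reported as STABLE, and two observed windows out of four give the 50% observability coverage shown in the header.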

🏛️ Governance Health Score

Living governance metrics — operational summary of human oversight in real operation.
Session KPIs: Governance Reliability · Human Oversight Coverage · Auditability Coverage
Temporal KPIs: Human Intervention Rate · Oversight Stability · Drift Index · Compliance Volatility
Distinct from AI Act scoring — measures the quality of active supervision, not documentary compliance.

GHS Global

50/100

→ STABLE

Human Supervision

75/100

Oversight · Stability

Auditability

97/100

Traceability · Policy

Operational Stability

84/100

Drift · Routing · Volatility

Gov. Reliability

0/100

Compliant sessions

Why is GHS = 50/100 when Gov. Reliability = 0%?
Governance Reliability accounts for 11% of the composite score (weight 1.8/16.1). The 9 other observed dimensions (Human Supervision, Auditability, Stability...) hold the score up. The effective penalty of this group on the GHS is about 6 pts. This score reflects governance that is operationally active (routing, oversight, audit) but structurally unreliable at session level (0 fully compliant sessions).

Dimension	Score	Source

● Human Supervision 75/100
Human Oversight Coverage	100%	Sessions where oversight was observed, not necessarily effective — see Governance Reliability for actual conformity
Human Oversight Stability	0%	Oversight consistency — 0/4 periods VERIFIED (≠ Coverage: measures temporal regularity, not presence)
Human Intervention Rate	100%	HUMAN_OVERSIGHT + HUMAN_FALLBACK — last period VERIFIED
Override Rate	100%	Human interventions overriding AI decisions
● Auditability 97/100
Auditability Coverage	96%	Sessions with audit trail or observed decision record
Policy Adherence Rate	100%	Inverse of the audit-trail violation rate
● Operational Stability 84/100
Governance Drift Index	85%	0/18 controls degrading · 100=no drift · 0=all degrading ⚠ 2 recent periods without data — drift potentially linked to coverage loss, not an actual behavioral change.
Confidence Routing Rate	81%	Effective routing to human supervision on low-confidence cases
Compliance Volatility	87%	Rate stability — std-dev=0.3pts · 100=perfectly stable
● Governance Reliability 0/100
Governance Reliability	0%	Fully compliant sessions / 70 total sessions

📊 Output Quality Metrics

70 outputs analysed · scientific separation of phenomena

Hallucination Rate = confirmed_hallucinations / outputs (detection: hallucination_detected flag set by grounding validator or human reviewer — not self-reported by the model) · Override Rate = human rejections (disagreement ≠ hallucination) · Confidence Degradation = model self-reported low confidence (caution ≠ error) · Intervention Rate = total governance activity (dual signal: active governance + system needing correction)

Metric	Value	Source
Hallucination Rate	2.86%	confirmed_hallucinations / outputs — 2 of 70 outputs. Detection method: event-level `hallucination_detected=true` flag in runtime traces (set by grounding validator or human reviewer).
Human Override Rate	7.14%	human_rejections / outputs — 5 rejections
Confidence Degradation Rate	0.00%	low_confidence_outputs / outputs — 0 outputs below threshold
Governance Intervention Rate	7.14%	(rejections + escalations) / outputs — 5 interventions. Dual signal: high rate = governance is active (positive) but system requires frequent correction (negative). Interpret alongside Reliability.
Output Quality Risk Score	5.00%	Weighted: ×0.2 low_conf + ×0.3 rejections + ×1.0 hallucinations (≠ Hallucination Rate — multi-signal summary)
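The figures in this table can be reproduced from the raw counts. The sketch below is illustrative only: the function and field names are hypothetical, and the escalation count of 0 is inferred from the table (5 interventions, of which 5 are rejections) rather than stated directly.

```python
def output_quality_metrics(outputs, hallucinations, rejections,
                           escalations, low_confidence):
    """Recompute the output quality rates (in %) from raw counts."""
    def rate(n):
        return 100.0 * n / outputs

    hallucination_rate = rate(hallucinations)
    override_rate = rate(rejections)
    degradation_rate = rate(low_confidence)
    intervention_rate = rate(rejections + escalations)
    # Weights as given in the table: x0.2 low confidence, x0.3 rejections,
    # x1.0 hallucinations.
    risk_score = (0.2 * degradation_rate
                  + 0.3 * override_rate
                  + 1.0 * hallucination_rate)
    return {
        "hallucination_rate": round(hallucination_rate, 2),
        "human_override_rate": round(override_rate, 2),
        "confidence_degradation_rate": round(degradation_rate, 2),
        "governance_intervention_rate": round(intervention_rate, 2),
        "output_quality_risk_score": round(risk_score, 2),
    }

# Counts from the table: 70 outputs, 2 hallucinations, 5 rejections,
# 0 low-confidence outputs; escalations = 0 is an inferred assumption.
metrics = output_quality_metrics(outputs=70, hallucinations=2, rejections=5,
                                 escalations=0, low_confidence=0)
print(metrics)
```

With these inputs the risk score works out to 0.2 × 0 + 0.3 × 7.14 + 1.0 × 2.86 ≈ 5.00%, which matches the value reported above and shows why it differs from the Hallucination Rate alone.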

▶ Appendix — GHS composite weighted formula

GHS = weighted average of 10 dimensions:
Governance Reliability ×4.0 (anchor — conformity IS governance) · Human Intervention Rate ×2.0 · Governance Drift Index ×1.8 · Human Oversight Stability ×1.5 · Human Oversight Coverage ×1.2 · Confidence Routing Rate ×1.3 · Auditability Coverage ×1.3 · Compliance Volatility ×1.2 · Override Rate ×1.0 · Policy Adherence Rate ×0.8.
Cap rule: if Governance Reliability = 0%, GHS is capped at 50 — mechanisms without outcomes cannot score above "monitoring required".
Blind period penalty: periods with no observations reduce Drift Index and Compliance Volatility scores — loss of visibility is itself a risk signal.
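As a worked illustration of the weighted average and the cap rule, the sketch below plugs the dimension values from the Dimension/Score table into the weights listed above. It assumes those displayed percentages are the inputs to the formula and already include any blind period penalty; the dictionary keys and function name are hypothetical, and this is not presented as the scoring engine's actual code.

```python
# Weights from the appendix (they sum to 16.1).
WEIGHTS = {
    "governance_reliability": 4.0,
    "human_intervention_rate": 2.0,
    "governance_drift_index": 1.8,
    "human_oversight_stability": 1.5,
    "human_oversight_coverage": 1.2,
    "confidence_routing_rate": 1.3,
    "auditability_coverage": 1.3,
    "compliance_volatility": 1.2,
    "override_rate": 1.0,
    "policy_adherence_rate": 0.8,
}

# Dimension scores as displayed in the report (assumed to be the inputs).
SCORES = {
    "governance_reliability": 0,
    "human_intervention_rate": 100,
    "governance_drift_index": 85,
    "human_oversight_stability": 0,
    "human_oversight_coverage": 100,
    "confidence_routing_rate": 81,
    "auditability_coverage": 96,
    "compliance_volatility": 87,
    "override_rate": 100,
    "policy_adherence_rate": 100,
}

CAP_WHEN_UNRELIABLE = 50  # cap rule: Governance Reliability = 0% caps GHS at 50

def governance_health_score(scores, weights):
    """Return (raw weighted average, score after applying the cap rule)."""
    total_weight = sum(weights.values())
    raw = sum(weights[k] * scores[k] for k in weights) / total_weight
    capped = raw
    if scores["governance_reliability"] == 0:
        capped = min(raw, CAP_WHEN_UNRELIABLE)
    return raw, capped

raw, ghs = governance_health_score(SCORES, WEIGHTS)
print(f"raw={raw:.1f} capped={ghs:.0f}")  # raw=61.3 capped=50
```

Under these assumptions the uncapped weighted average is about 61.3, and the cap rule brings the reported GHS down to 50.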

Drift vs Coverage Loss: the Governance Drift Index (GDI) counts the checkpoints whose rate drops between two observed periods. If the most recent periods have no data (Coverage Loss), the GDI may under- or overestimate the real drift, because it extrapolates from older observations; the ⚠ annotation flags this interpretation risk.
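One way to keep drift and coverage loss apart is to compute them as two separate signals. The sketch below is a hypothetical illustration: `drift_report`, the `min_drop` threshold of 5pts (borrowed from the DEGRADING legend), and the base index formula `100 × (1 − degrading / total)` are assumptions, and the blind period penalty is deliberately left out because the report does not quantify it.

```python
def drift_report(series, min_drop=5.0):
    """Separate actual drift from coverage loss.

    `series` maps checkpoint name -> list of per-period rates (in %),
    with None for N/O. All lists are assumed to have the same length.
    """
    degrading = 0
    for rates in series.values():
        observed = [r for r in rates if r is not None]
        # A checkpoint drifts only if it drops between observed periods.
        if len(observed) >= 2 and observed[0] - observed[-1] >= min_drop:
            degrading += 1

    n_periods = len(next(iter(series.values())))
    trailing_blind = 0
    for i in reversed(range(n_periods)):
        if any(rates[i] is not None for rates in series.values()):
            break
        trailing_blind += 1  # no checkpoint was observed in this period

    return {
        "degrading": degrading,
        # Base index before any blind period penalty (penalty not modeled).
        "base_index": 100.0 * (1 - degrading / len(series)),
        "trailing_blind_periods": trailing_blind,
        "coverage_warning": trailing_blind > 0,
    }

report = drift_report({
    "Audit Trail": [0, 0, None, None],
    "Execution Limits (Guardrails)": [96, 100, None, None],
})
print(report)
```

For the two sample rows, no checkpoint is degrading, yet the two trailing periods without data raise the coverage warning, which corresponds to the ⚠ annotation on the Governance Drift Index.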