Chapter 12 — Confidence and Completeness

“Confidence without evidence is arrogance. Evidence without confidence scoring is noise.” — Dibyendu De

This chapter defines the canonical confidence scoring standard used across every module in RAPID AI. It is the bridge between Part 2 (domain engineering — signal intelligence, stability intelligence, engineering intelligence) and Part 3 (the product — architecture, data, UI, deployment). No downstream product decision — whether a dashboard color, an automated alert, an RCM task recommendation, or a CDE design-out proposal — can be made without first answering: How confident are we in this assessment, and is the evidence complete enough to act on it?

12.1 Why Confidence Is a First-Class Concept

Most IIoT platforms treat confidence as an afterthought — a footnote attached to an alert. RAPID AI treats it as structural. Confidence is not a label appended to a diagnosis; it is a numeric signal that flows through the pipeline alongside the diagnostic evidence itself. Every module produces a confidence score. Every downstream consumer checks that score before acting.

This design choice has three consequences:

Honest propagation. Poor input data reduces confidence in every downstream result, automatically. A noisy vibration signal does not produce a false alarm; it produces a low-confidence assessment that the dashboard suppresses or the RCM engine defers for human review.
Auditable decisions. Every maintenance action can be traced back through the pipeline to the specific confidence values that justified it. Regulators, insurers, and plant managers can see exactly why the system recommended a bearing replacement and how confident it was in that recommendation.
Graceful degradation. When sensors fail, data quality drops, or evidence is ambiguous, RAPID AI does not crash or produce garbage. It produces lower-confidence results and clearly communicates the uncertainty, allowing engineers to apply judgment where the system cannot.

12.2 The Confidence Scale

All confidence scores in RAPID AI use a single canonical representation:

Data type: IEEE 754 float
Range: 0.0 to 1.0 (clamped; values outside this range indicate a bug)
Field name: confidence_score (or confidence in compact payloads)
Precision: Two decimal places for display; full float precision for computation

Qualitative Labels

Label	Range	Interpretation	Evidence Profile
High	>= 0.85	Strong evidence from multiple independent sources	Multiple sensors confirming, trend and entropy aligned, rule match score > 0.90
Medium-High	0.75 to < 0.85	Good evidence with minor gaps	Strong single-source evidence, or two sources with partial alignment
Medium	0.60 to < 0.75	Moderate evidence, actionable with caveats	Ambiguous or single-source; pattern match present but incomplete
Low	0.40 to < 0.60	Limited evidence, needs corroboration	Weak signal, noisy data, or early-stage trend not yet confirmed
Insufficient	< 0.40	Not enough data for reliable conclusion	Contradictory evidence, missing sensors, or data quality gate failure

The boundaries are not arbitrary. They align with the decision thresholds defined in Section 12.3. A “Medium” score of 0.60 is the minimum at which Module D will accept a diagnosis. A “Medium-High” score of 0.75 is the minimum for automated alerts. These boundaries were chosen to balance two risks: acting too early on weak evidence (false positive) vs. waiting too long for perfect evidence (missed failure).
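The label bands can be expressed as a small lookup. The following is a minimal sketch, not the production implementation: the function name `confidence_label` is hypothetical, and the choice to raise on out-of-range input is an assumption based on the statement that such values indicate a bug. Labels produced this way are for display only; computation stays numeric.

```python
def confidence_label(score: float) -> str:
    """Return the qualitative display label for a canonical confidence score."""
    if not 0.0 <= score <= 1.0:
        # Out-of-range values indicate an upstream bug, so fail loudly (assumption).
        raise ValueError(f"confidence_score out of range: {score!r}")
    # Lower bounds from the Qualitative Labels table, checked highest first.
    bands = [(0.85, "High"), (0.75, "Medium-High"), (0.60, "Medium"), (0.40, "Low")]
    for lower_bound, label in bands:
        if score >= lower_bound:
            return label
    return "Insufficient"
```

Checking the bands from highest to lowest means each score receives exactly one label, with the boundary value belonging to the higher band.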

Text-to-Numeric Mapping

Source data from the domain expert’s original documents uses qualitative labels (“high”, “medium”, “low”). RAPID AI maps these to canonical numeric values at the ingestion boundary:

Qualitative Input	Canonical Value	Context
High	0.85	Direct sensor confirmation with valid calibration
High with calibration	0.85	Requires `calibration_valid == true` flag
Medium-high	0.75	Strong single-source or moderate multi-source
Medium	0.60	Ambiguous, single-source, or mixed evidence
Low	0.40	Weak signal, noisy data, insufficient history
Insufficient	0.00	Contradictory evidence or hard block from Module A

No module internal to the pipeline should ever store or transmit confidence as a text label. The conversion happens once, at the boundary, and all downstream computation uses the numeric value.
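A boundary conversion along these lines might look like the sketch below. The names `CANONICAL_CONFIDENCE` and `to_canonical` are hypothetical, and rejecting "High with calibration" when the calibration flag is false is an assumption; the source only states that the flag is required.

```python
# Canonical values from the Text-to-Numeric Mapping table, keyed by lowercase label.
CANONICAL_CONFIDENCE = {
    "high": 0.85,
    "high with calibration": 0.85,
    "medium-high": 0.75,
    "medium": 0.60,
    "low": 0.40,
    "insufficient": 0.00,
}


def to_canonical(label: str, calibration_valid: bool = False) -> float:
    """Convert a qualitative label to its canonical numeric confidence."""
    key = label.strip().lower()
    if key not in CANONICAL_CONFIDENCE:
        raise ValueError(f"unknown confidence label: {label!r}")
    if key == "high with calibration" and not calibration_valid:
        # Assumption: an unmet calibration requirement is treated as an input error.
        raise ValueError("'High with calibration' requires calibration_valid == true")
    return CANONICAL_CONFIDENCE[key]
```

Calling `to_canonical` once at ingestion keeps text labels out of every internal payload.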

12.3 Decision Thresholds

Different decisions require different confidence levels. A dashboard can display a “watch” indicator at lower confidence than required to trigger a bearing replacement work order. The thresholds encode this risk hierarchy:

Decision Context	Minimum Confidence	Rationale
Safety-critical escalation	>= 0.80	High confidence required before overriding normal operations or triggering emergency protocols
Automated alert dispatch	>= 0.75	Alerts sent to maintenance teams must be credible; false alarms erode trust
RCM task selection	>= 0.70	Prevents false maintenance triggers; below this, strategy = “Inspect / Validate”
Module D diagnosis acceptance	>= 0.60	Minimum to enter health staging; below this, mechanism remains “unconfirmed”
Dashboard display	>= 0.50	Suppresses low-confidence noise from the UI; below this, the asset shows as “data insufficient”
CDE contradiction trigger	>= 0.65	Contradiction analysis requires reliable failure mode identification
Imperfection rule activation	>= 0.60	Structural weakness inference needs credible diagnostic evidence
Copilot response inclusion	>= 0.50	Natural language summaries include lower-confidence hypotheses with explicit caveats
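
One way to make the table machine-checkable is a single threshold map that every consumer queries. This is an illustrative sketch only: the dictionary keys and the function `permitted_decisions` are hypothetical names, while the numeric values are copied from the table above.

```python
# Minimum confidence per decision context, ordered from strictest to most lenient.
DECISION_THRESHOLDS = {
    "safety_critical_escalation": 0.80,
    "automated_alert_dispatch": 0.75,
    "rcm_task_selection": 0.70,
    "cde_contradiction_trigger": 0.65,
    "module_d_diagnosis_acceptance": 0.60,
    "imperfection_rule_activation": 0.60,
    "dashboard_display": 0.50,
    "copilot_response_inclusion": 0.50,
}


def permitted_decisions(confidence_score: float) -> list[str]:
    """List every decision context whose minimum confidence is met."""
    return [
        name
        for name, minimum in DECISION_THRESHOLDS.items()
        if confidence_score >= minimum
    ]
```

For a score of 0.72, for example, the result includes RCM task selection but excludes automated alert dispatch and safety-critical escalation.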

The RCM Confidence Gate

The RCM decision algorithm (Chapter 9) embeds confidence directly in its logic:

IF confidence_score < 0.70:
    strategy = "Inspect / Validate"
    task = "Collect more data and verify failure mode"

ELIF confidence_score >= 0.70 AND detectable_online:
    strategy = "Condition Based Maintenance"
    task = "Monitor trend and act on trigger"

This means a bearing with a BPFO frequency match scoring 0.55 will not trigger a CBM work order. Instead, it triggers a validation task — an engineer verifies the reading, checks the sensor, and confirms or denies. This is the system expressing appropriate humility about its own uncertainty.
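
The gate can be sketched in Python as follows. This is an illustration under stated assumptions: the function name `rcm_confidence_gate` is hypothetical, the 0.70 boundary is taken from the RCM task selection row of the Decision Thresholds table, and the remaining Chapter 9 branches (for example, failure modes that are not detectable online) are not reproduced here, so the function returns `None` for them.

```python
from typing import Optional, Tuple


def rcm_confidence_gate(
    confidence_score: float, detectable_online: bool
) -> Optional[Tuple[str, str]]:
    """Return (strategy, task) for the two branches shown in this chapter."""
    if confidence_score < 0.70:
        # Below the RCM task selection threshold: defer to human validation.
        return ("Inspect / Validate", "Collect more data and verify failure mode")
    if detectable_online:
        return ("Condition Based Maintenance", "Monitor trend and act on trigger")
    # Other RCM branches are defined in Chapter 9 and intentionally omitted.
    return None
```

With `confidence_score = 0.55`, the gate returns the "Inspect / Validate" strategy regardless of whether the failure mode is detectable online.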

12.4 How Confidence Flows Through the Pipeline

Confidence is not computed once and forgotten. It enters the pipeline at Module A and accumulates, compounds, and transforms as evidence passes through each module. The flow is:

Module A (Data Quality)
    -> Module B (Fault Detection Confidence)
        -> Module B.2 (Trend Confidence)
            -> Module B.3 (Entropy / Stability Index)
                -> Module C (SSI Fusion - Weighted Confidence)
                    -> AESF (Stability State Confidence)
                        -> Module D (Diagnostic Confidence)
                            -> Module E (RCM Task Confidence)
                                -> Module F (RUL / Weibull Confidence)
                                    -> Module G (CDE Confidence)

Stage 1: Data Quality (Module A)

Module A produces Q_data, a quality score between 0.0 and 1.0 that reflects the trustworthiness of the raw sensor input. It is computed as the product of the penalty factors of all triggered soft-penalty rules:

Q_data = Product(penalty_i)     for all triggered DG rules

Each penalty_i is a multiplicative factor less than 1.0. Hard-block rules (DG001, DG002, DG005, DG006, DG011) set Q_data = 0, halting the pipeline entirely for that signal. Soft-penalty rules (DG003, DG007, DG008, DG010, DG015, DG016) progressively degrade quality.

Q_data is the multiplicative ceiling on all downstream confidence. No matter how strong the diagnostic evidence, a sensor with Q_data = 0.6 caps the final confidence at 60% of what it would otherwise be.
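
A minimal sketch of this stage is shown below. The function name `compute_q_data` and the input shape (a mapping from triggered rule ID to penalty factor) are assumptions, and any penalty values used with it are illustrative because the actual factors are defined per rule elsewhere. With no rules triggered, the empty product yields 1.0, which is assumed here to mean "no degradation".

```python
import math
from typing import Mapping

# Hard-block rule IDs listed in this section.
HARD_BLOCK_RULES = frozenset({"DG001", "DG002", "DG005", "DG006", "DG011"})


def compute_q_data(triggered: Mapping[str, float]) -> float:
    """Compute Q_data from triggered DG rules (rule ID -> penalty factor)."""
    if any(rule_id in HARD_BLOCK_RULES for rule_id in triggered):
        # A single hard block halts the pipeline for this signal.
        return 0.0
    for rule_id, penalty in triggered.items():
        if not 0.0 <= penalty <= 1.0:
            raise ValueError(f"penalty for {rule_id} out of range: {penalty!r}")
    # Multiply the soft penalties; an empty mapping gives 1.0.
    return math.prod(triggered.values(), start=1.0)
```

Because the factors multiply, two moderate penalties such as 0.90 and 0.80 combine to 0.72, which is lower than either penalty alone.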

Stage 2: Fault Detection (Module B)

Module B evaluates 119 physics-based initiator rules across 12 component types. Each matched rule produces a B_match_score (0.0 to 1.0) reflecting how closely the sensor evidence matches the expected fault pattern. The rule confidence c_m is derived from the match strength:

c_m = max(B_match_score_i)     across all matched rules for a given failure mode

Stage 3: Trend Analysis (Module B.2)

Module B.2 classifies the trend (Stable, Drift, Accelerating, Chaotic, Step) and produces a trend_confidence score. The trend term c_t used in evidence compounding combines the severity score of the trend class with the R-squared of the regression fit:

c_t = trend_severity * regression_r_squared

Trends classified as “Chaotic” with low R-squared receive near-zero trend confidence, preventing noise-driven false alarms.

Stage 4: Entropy / Stability (Module B.3)

Module B.3 computes the Stability Index using the SEDL entropy decomposition:

SI = 1 - (0.5 * SE + 0.3 * TE + 0.2 * DE)

Where:

SE = Spectral Entropy: -Sum(p_i * ln(p_i)) / ln(N) (FFT magnitude distribution)
TE = Temporal Entropy: -Sum(q_i * ln(q_i)) / ln(N) (amplitude bin distribution)
DE = Directional Entropy: -Sum(r_j * ln(r_j)) / ln(3) (H/V/A energy ratio)

Higher SI means more stable. The entropy-derived confidence contribution is:

c_s = 1 - SI     (stability gap: lower stability = higher concern)
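
The entropy terms and the Stability Index can be sketched as below. The helper names `normalized_entropy` and `stability_index` are hypothetical, and returning 0.0 for an empty or all-zero distribution is an assumption the source does not specify.

```python
import math
from typing import Sequence


def normalized_entropy(weights: Sequence[float]) -> float:
    """Shannon entropy of a non-negative weight vector, divided by ln(len(weights))."""
    n = len(weights)
    total = sum(weights)
    if n < 2 or total <= 0:
        return 0.0  # Assumption: degenerate input is treated as zero entropy.
    # Zero-weight bins contribute nothing, since p * ln(p) tends to 0 as p tends to 0.
    probs = [w / total for w in weights if w > 0]
    return -sum(p * math.log(p) for p in probs) / math.log(n)


def stability_index(se: float, te: float, de: float) -> float:
    """SI = 1 - (0.5*SE + 0.3*TE + 0.2*DE), clamped to [0, 1]."""
    si = 1.0 - (0.5 * se + 0.3 * te + 0.2 * de)
    return min(max(si, 0.0), 1.0)
```

Passing the three-axis H/V/A energies to `normalized_entropy` gives DE, since the divisor becomes ln(3). The concern term is then `c_s = 1 - stability_index(se, te, de)`.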

Stage 5: Evidence Compounding

The three independent evidence streams (rule match, trend, entropy) compound using independent-evidence fusion:

confidence_compound = 1 - (1 - c_m) * (1 - c_t) * (1 - c_s)

This formula has an important property: adding an evidence source can never lower the compound confidence. A source scoring 0 leaves it unchanged, and any positive score moves it toward 1.0. If c_m = 0.70, c_t = 0.50, and c_s = 0.40, the compound confidence is:

1 - (1 - 0.70)(1 - 0.50)(1 - 0.40) = 1 - (0.30)(0.50)(0.60) = 1 - 0.09 = 0.91

Three moderate-confidence evidence sources combine to produce high confidence. This is exactly the behavior desired: multiple independent indicators of the same failure mode should reinforce each other.
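
The compounding formula generalizes to any number of sources. The sketch below uses a hypothetical function name, `compound_confidence`, and assumes the inputs are independent and already normalized to [0, 1].

```python
import math
from typing import Iterable


def compound_confidence(scores: Iterable[float]) -> float:
    """Independent-evidence fusion: 1 - Product(1 - c_i)."""
    # Each factor (1 - c_i) is the chance that source i alone is not convincing.
    return 1.0 - math.prod(1.0 - c for c in scores)
```

Calling `compound_confidence([0.70, 0.50, 0.40])` reproduces the 0.91 result above, and an empty list yields 0.0, meaning no evidence gives no confidence.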

Stage 6: Data Quality Ceiling

The final pipeline confidence applies the Module A quality gate:

C_final = Q_data * confidence_compound

Canonical reference: See Chapter 6 for the authoritative confidence propagation formula.

If Q_data = 0.80 (one soft penalty triggered) and confidence_compound = 0.91:

C_final = 0.80 * 0.91 = 0.728

This crosses the RCM threshold (0.70) but falls short of both the automated alert threshold (0.75) and the safety escalation threshold (0.80). The system recommends CBM but neither dispatches an automated alert nor triggers emergency protocols. This is the confidence pipeline working as intended.

Stage 7: Downstream Module Confidence

Modules D through G inherit and further refine the upstream confidence:

Module	Confidence Source	Adjustment
D (Prognostics)	`C_final` from fusion	Multiplied by HSR multiplier; compressed by health stage severity
E (Maintenance)	`C_final` from Module D	Weighted into priority score via `0.25 * C` term
F (RUL/Weibull)	`C_final` + reliability history	Weibull parameters adjusted by `S_eff` and `SSI`
G (CDE)	`C_final` from Module D	Triggers only when `C_final >= 0.65` and recurrence >= 2

12.5 Normalization Rules

Different modules produce scores on different scales. Before fusion or comparison, all scores must be normalized to the canonical 0.0-1.0 range.

Module B.2 (Trend Severity)

Trend classes map directly to severity scores:

Trend Class	Severity Score	Confidence Multiplier
Stable	0.10	R-squared of regression
Drift	0.40	R-squared of regression
Accelerating	0.80	R-squared of regression
Chaotic	0.60	Capped at 0.50 (unpredictable)
Step	0.85	1.0 if change exceeds 3-sigma
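
The table can be read as a severity lookup plus a class-specific multiplier. The sketch below is one interpretation, not a confirmed specification: it assumes "Capped at 0.50" means the R-squared multiplier for Chaotic trends is limited to 0.50, and that a Step change below 3-sigma falls back to R-squared. The function name and the `exceeds_3_sigma` parameter are hypothetical.

```python
# Severity scores from the trend class table.
TREND_SEVERITY = {
    "Stable": 0.10,
    "Drift": 0.40,
    "Accelerating": 0.80,
    "Chaotic": 0.60,
    "Step": 0.85,
}


def trend_term(trend_class: str, r_squared: float, exceeds_3_sigma: bool = False) -> float:
    """Compute c_t = severity * multiplier for a classified trend."""
    severity = TREND_SEVERITY[trend_class]
    if trend_class == "Chaotic":
        multiplier = min(r_squared, 0.50)  # Assumed reading of "Capped at 0.50".
    elif trend_class == "Step":
        # Assumption: fall back to R-squared when the step is below 3-sigma.
        multiplier = 1.0 if exceeds_3_sigma else r_squared
    else:
        multiplier = r_squared
    return severity * multiplier
```

Under this reading, an Accelerating trend with R-squared 0.88 gives c_t = 0.704, while a Chaotic trend can never contribute more than 0.30.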

Module B.3 (SEDL to SI)

Already normalized by construction. Each entropy component is divided by its maximum possible value, ln(N) or ln(3), which bounds it to [0, 1]. The weighted combination is also clamped:

SI = clamp(1 - (0.5*SE + 0.3*TE + 0.2*DE), 0.0, 1.0)

Module C (SSI)

Canonical reference: See Chapter 5 for the authoritative SSI formula.

SSI is computed as a weighted mean of block scores, each already on [0, 1]. The result is clamped:

SSI = clamp(Sum(w_i * bs_i) / Sum(w_i), 0.0, 1.0)

Module C (SEI)

The System Entropy Index normalizes B.3 signals for system-level fusion:

SEI = clamp(0.7 * (1 - SI) + 0.3 * dSE_dt_norm, 0.0, 1.0)

Combined System Confidence

When both SSI and SEI are available, the system-level confidence is:

system_confidence = 0.6 * SSI + 0.4 * SEI

The final system state is determined by max(SSI_state, SEI_state) plus override rules (e.g., Critical_Instability forces SSI >= 0.70).
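
The Module C formulas above can be sketched together. All function names here are hypothetical, and the input validation in `ssi` is an assumption; the override rules and state mapping are not included.

```python
from typing import Sequence


def clamp(value: float, low: float = 0.0, high: float = 1.0) -> float:
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def ssi(block_scores: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted mean of block scores, clamped to [0, 1]."""
    if len(block_scores) != len(weights) or sum(weights) <= 0:
        raise ValueError("block_scores and weights must align, with positive total weight")
    weighted_sum = sum(w * b for w, b in zip(weights, block_scores))
    return clamp(weighted_sum / sum(weights))


def sei(si: float, dse_dt_norm: float) -> float:
    """SEI = clamp(0.7 * (1 - SI) + 0.3 * dSE_dt_norm)."""
    return clamp(0.7 * (1.0 - si) + 0.3 * dse_dt_norm)


def system_confidence(ssi_value: float, sei_value: float) -> float:
    """Combined score: 0.6 * SSI + 0.4 * SEI."""
    return 0.6 * ssi_value + 0.4 * sei_value
```

Because both inputs to `system_confidence` are already clamped and the weights sum to 1.0, the combined value stays within the canonical range without a further clamp.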

Effective Severity (Cross-Module)

When multiple rules or modules produce severity assessments for the same failure mode, they are combined using confidence-weighted averaging:

S_eff = Sum(severity_i * confidence_i) / Sum(confidence_i)

This ensures that a high-confidence severity-8 finding dominates over a low-confidence severity-9 finding.
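
A short sketch of the S_eff calculation follows. The function name `effective_severity` and the `(severity, confidence)` pair representation are assumptions, as are the example confidence values in the note below.

```python
from typing import Sequence, Tuple


def effective_severity(findings: Sequence[Tuple[float, float]]) -> float:
    """Confidence-weighted mean severity over (severity, confidence) pairs."""
    total_confidence = sum(confidence for _, confidence in findings)
    if total_confidence <= 0:
        raise ValueError("at least one finding must have positive confidence")
    weighted = sum(severity * confidence for severity, confidence in findings)
    return weighted / total_confidence
```

For a severity-8 finding at confidence 0.90 and a severity-9 finding at confidence 0.30, the result is 8.25, which sits closer to the high-confidence finding.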

12.6 Conflict Resolution

When modules disagree — and they will — RAPID AI uses a three-tier resolution strategy.

Tier 1: Max-Wins (Default)

For most conflicts, the highest-confidence assessment prevails. If Module B says “bearing outer race spalling” at confidence 0.78 and also says “gear mesh misalignment” at confidence 0.71, both are reported, ranked by confidence. The highest-confidence finding drives the primary recommendation.

Tier 2: Confidence-Weighted Average

For continuous scores (SSI, SEI, severity), conflicting assessments are merged using the S_eff formula. This prevents a single outlier from dominating the fused result.

Tier 3: Escalation to Human Review

Certain conflict patterns trigger mandatory human review:

Conflict Pattern	Trigger	Action
Entropy contradicts trend	SI > 0.70 (stable) but trend = “Accelerating”	Flag as “Ambiguous stability” — engineer review required
High confidence, opposite conclusions	Two rules with confidence > 0.75 pointing to mutually exclusive failure modes	Flag as “Diagnostic conflict” — present both hypotheses with evidence
Safety-critical below threshold	Consequence = Safety but confidence < 0.80	Escalate to Level 3 review; do not suppress the finding
CDE vs. RCM disagreement	Module G recommends design-out but Module E recommends time-based replacement	Present both; require engineering review for resolution

Override Rules

Three hard overrides exist that bypass normal conflict resolution:

Critical Instability Override: If Module B.3 reports stability_state == "Critical_Instability", then SSI = max(SSI, 0.70) regardless of component-level scores. System-level entropy collapse trumps component-level health.
SEI Alarm Override: If SEI_state == "alarm", the final system state is raised to at least “warning”, even if SSI_state == "healthy". Entropy disorder demands attention even when traditional indicators appear normal.
Safety Consequence Override: If consequence_category == "Safety" and severity_rank >= 4, the RCM strategy is forced to “Immediate action / fail-safe / shutdown review” regardless of confidence score.
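
The three overrides can be applied as a final pass over an assessment. This sketch makes several assumptions: the `Assessment` dataclass and `apply_overrides` function are hypothetical, and the state ordering healthy < warning < alarm is inferred from the wording above rather than taken from a specification.

```python
from dataclasses import dataclass, replace

# Assumed ordering of system states, lowest to highest concern.
STATE_ORDER = ["healthy", "warning", "alarm"]


@dataclass(frozen=True)
class Assessment:
    ssi: float
    system_state: str
    strategy: str


def apply_overrides(
    a: Assessment,
    *,
    stability_state: str,
    sei_state: str,
    consequence_category: str,
    severity_rank: int,
) -> Assessment:
    """Apply the three hard overrides and return the adjusted assessment."""
    if stability_state == "Critical_Instability":
        a = replace(a, ssi=max(a.ssi, 0.70))
    if sei_state == "alarm" and STATE_ORDER.index(a.system_state) < STATE_ORDER.index("warning"):
        a = replace(a, system_state="warning")
    if consequence_category == "Safety" and severity_rank >= 4:
        a = replace(a, strategy="Immediate action / fail-safe / shutdown review")
    return a
```

Each override only raises the level of concern, so applying all three in sequence never weakens an assessment produced by normal conflict resolution.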

12.7 Completeness Matrix

RAPID AI’s knowledge base is extensive but not complete. This section maps what is implemented, what is designed, and what remains planned. Honesty about completeness is itself a form of confidence scoring — applied to the system rather than to a diagnosis.

Rule and Knowledge Coverage

Knowledge Domain	Specified	Populated	With Real Logic	Status
Module B fault detection rules	119	119	119	Complete
Failure mode master library	320	320	320	Complete (target: 500)
FRETTLSM factors	88	88	0 (weights missing)	Schema only
Imperfection rules	300	300	~50 (rest use placeholder logic)	Partially populated
IMS rows (ground truth)	100	100	100	Complete
RCM decision rules	7	7	7	Complete
RCM workbook templates	10 CSVs	Headers only	0	Requires plant data
SEDL entropy thresholds	8	8	8	Complete
Module C system profiles	3	3	3	Complete
AESF fault routing	39	39	39	Complete
Module G contradiction types	8	8	7 (CT08 missing)	87.5% (7 of 8)

Module Implementation Status

Module	Docs	Schema	Code	Data	Overall
M0 Config (GUARD)	100%	100%	100%	N/A	100%
MA Signal	90%	80%	0%	N/A	55%
MB Fault Detection	100%	100%	30%	100%	80%
MC Fusion (SSI/SEI)	100%	100%	0%	N/A	65%
AESF	100%	100%	0%	N/A	65%
MD Prognostics	100%	100%	0%	N/A	65%
ME Maintenance	100%	70%	30%	100%	70%
MF RUL Engine	100%	100%	0%	N/A	65%
MG CDE	100%	100%	0%	N/A	65%
FRETTLSM	100%	70%	20%	10%	50%
Imperfection	90%	90%	10%	40%	55%
Reliability	100%	80%	0%	0%	45%

Critical Gaps

Twelve items cannot be resolved from existing source material and require domain expert input or plant-specific data:

Numeric sensor thresholds — sensor_evidence_rules.csv contains 100 rules with "IF X > threshold" where threshold is never a number. This is the single most blocking gap.
Weibull fitting algorithm — Referenced in reliability docs but no implementation provided.
Dependency graph traversal logic — Schema defined but no algorithm for propagating failure impact across connected assets.
FRETTLSM factor catalog with activation weights — Schema exists but no factor data rows.
FRETTLSM asset class templates — No pump/motor/gearbox factor configurations.
FRETTLSM seed data — DDL exists, no INSERT scripts.
RCM workbook data — 10 CSVs with headers only; requires plant-specific field data.
CT08 (Cost vs. Redundancy) resolution families — Missing from Module G.
Dashboard widget JSON structures — 4 of 5 undocumented.
Imperfection rule physics logic — 250 of 300 rules use "threshold_or_ratio_violation" placeholder.
Module F window threshold cutoffs — Recommended_Window boundaries undefined.
Operating-state-dependent IAR classification — FRETTLSM I/A/R roles change by operating state but schema does not support this.

Cross-Module Integration Issues

Issue	Impact	Mitigation
No unified confidence field naming	B.2 uses `trend_confidence`, B.3 uses `SI`, C uses `SSI`/`SEI`, sensor evidence uses text	This chapter defines the canonical standard; refactoring required
No schema bridge tables	3 disconnected stacks (Normalized, Reliability, FRETTLSM) with incompatible PKs	Bridge tables designed in Chapter 15
No error propagation spec	If Module A blocks a signal, downstream modules receive no notification	Pipeline orchestrator must propagate `Q_data = 0` as explicit “no data” signal
AESF positional ambiguity	Documented as “Module E+” but functionally sits between B.3 and C	Treat as stability co-processor feeding into Module C fusion
FM ID collisions	FM0001 means different things in different files	IMS is the canonical authority; all other references must use IMS schema_id

12.8 Confidence in Practice: Worked Example

Consider Cooling Water Pump P-101A showing a vibration anomaly.

Module A (GUARD):

Raw signal passes all hard blocks
One soft penalty triggered (DG008: aliasing indicator = 0.12)
Q_data = 0.90

Module B (Fault Detection):

AFB06 (outer race spalling) matched with B_match_score = 0.82
c_m = 0.82

Module B.2 (Trend Analysis):

Trend classified as “Accelerating” with R-squared = 0.88
c_t = 0.80 * 0.88 = 0.704

Module B.3 (Entropy):

SE = 0.71, TE = 0.48, DE = 0.33
SI = 1 - (0.5*0.71 + 0.3*0.48 + 0.2*0.33) = 1 - (0.355 + 0.144 + 0.066) = 0.435
c_s = 1 - 0.435 = 0.565

Evidence Compounding:

confidence_compound = 1 - (1 - 0.82)(1 - 0.704)(1 - 0.565)
                    = 1 - (0.18)(0.296)(0.435)
                    = 1 - 0.0232
                    = 0.977

Data Quality Ceiling:

C_final = 0.90 * 0.977 = 0.879

Decision Outcomes:

Exceeds dashboard display threshold (0.50): Displayed
Exceeds Module D acceptance threshold (0.60): Diagnosis accepted
Exceeds RCM activation threshold (0.70): CBM strategy selected
Exceeds automated alert threshold (0.75): Alert dispatched
Exceeds safety escalation threshold (0.80): Safety review triggered (the Safety Consequence Override does not apply because the consequence category is “Operational”)

Module D Assessment:

Health Stage: HSR004 (Unstable, SSI 0.60-0.80)
RUL band: 1-4 weeks
Action: ACT005 — Bearing replacement (planned). Priority: 87.

The complete evidence trail — from the raw vibration signal through every confidence transformation to the final work order — is auditable, explainable, and traceable.
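
The arithmetic in this worked example can be reproduced end to end with a few lines of Python. This is a sketch for checking the numbers only: the function name `pipeline_confidence` and the returned dictionary keys are hypothetical, and Modules D through G are out of scope.

```python
from typing import Dict, Sequence


def pipeline_confidence(
    q_data: float,
    b_match_scores: Sequence[float],
    trend_severity: float,
    r_squared: float,
    se: float,
    te: float,
    de: float,
) -> Dict[str, float]:
    """Recompute Stages 1 to 6 for a single failure mode."""
    c_m = max(b_match_scores)                          # Stage 2: best rule match
    c_t = trend_severity * r_squared                   # Stage 3: trend term
    si = 1.0 - (0.5 * se + 0.3 * te + 0.2 * de)        # Stage 4: Stability Index
    c_s = 1.0 - si                                     # Stage 4: stability gap
    compound = 1.0 - (1.0 - c_m) * (1.0 - c_t) * (1.0 - c_s)  # Stage 5
    return {
        "c_m": c_m,
        "c_t": c_t,
        "SI": si,
        "c_s": c_s,
        "confidence_compound": compound,
        "C_final": q_data * compound,                  # Stage 6: quality ceiling
    }


# Inputs for Cooling Water Pump P-101A, taken from the example above.
result = pipeline_confidence(0.90, [0.82], 0.80, 0.88, 0.71, 0.48, 0.33)
```

Rounded to three decimals, `result["confidence_compound"]` and `result["C_final"]` match the 0.977 and 0.879 figures derived by hand.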

12.9 Summary of Canonical Scores Across Modules

Module	Score Name	Range	What It Measures
A (GUARD)	`Q_data`	0.0 - 1.0	Raw signal trustworthiness
B (SENSE)	`B_match_score`	0.0 - 1.0	Fault pattern match strength
B.2 (Trend)	`trend_confidence`	0.0 - 1.0	Trend classification certainty
B.3 (SEDL)	`SI`	0.0 - 1.0	System stability (higher = more stable)
C (FUSE)	`SSI`	0.0 - 1.0	Fused system health (higher = worse)
C (FUSE)	`SEI`	0.0 - 1.0	System entropy index (higher = more disordered)
AESF	`SI`, `EI`, `CSS`, `JII`	0 - 100	Four stability dimensions (note: 0-100 scale)
D (Prognostics)	`C_final`	0.0 - 1.0	Diagnostic confidence after fusion
E (Maintenance)	Priority `P`	0 - 100	Action urgency (higher = more urgent)
F (RUL)	`RUL_days`	0 - 3650	Estimated remaining useful life
F (Weibull)	`P_30`	0.0 - 1.0	30-day failure probability
G (CDE)	Trigger confidence	0.0 - 1.0	Contradiction detection certainty

Note the AESF anomaly: its indices use a 0-100 scale while the rest of the pipeline uses 0.0-1.0. Normalization to the canonical range requires dividing by 100 before feeding into downstream fusion. This is a known integration point that must be handled at the AESF-to-Module-C interface.
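
A conversion at that interface might look like the sketch below. The function name `normalize_aesf` is hypothetical, and clamping out-of-range inputs instead of rejecting them is an assumption.

```python
from typing import Dict, Mapping


def normalize_aesf(indices: Mapping[str, float]) -> Dict[str, float]:
    """Rescale AESF indices from 0-100 to the canonical 0.0-1.0 range."""
    # Assumption: values outside 0-100 are clamped after scaling.
    return {name: min(max(value / 100.0, 0.0), 1.0) for name, value in indices.items()}
```

Note that the AESF `SI` shares its name with the Module B.3 Stability Index, so keeping the AESF values in their own mapping helps avoid mixing the two.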

Appendix 12-A: Key Terms

Term	Definition
Confidence Score	A 0.0-1.0 numeric value expressing the system’s certainty in a given assessment
Q_data	Data quality score from Module A; multiplicative ceiling on all downstream confidence
B_match_score	Rule match strength from Module B fault detection
SI (Stability Index)	SEDL entropy-derived stability measure; higher = more stable
SSI (System Stability Index)	Module C fused health score; higher = worse condition
SEI (System Entropy Index)	Module C entropy overlay; higher = more disordered
S_eff (Effective Severity)	Confidence-weighted average severity across multiple assessments
C_final	Pipeline output confidence after evidence compounding and quality ceiling
RPN (Risk Priority Number)	Severity x Probability x Detectability; range 1-125
Evidence Compounding	`1 - Product(1 - C_i)` formula for combining independent evidence sources

Appendix 12-B: Acronym Index

Acronym	Expansion
AESF	Acceleration-Entropy Stability Framework
BPFO	Ball Pass Frequency Outer
CBM	Condition-Based Maintenance
CDE	Contradiction Driven Engineering
DE	Directional Entropy
FRETTLSM	Force-Reactive-Environment-Time-Temperature-Lubrication-Surface-Material
HSR	Health Staging Rules
IAR	Initiator-Accelerator-Retarder
IMS	Integrated Master Schema
NLI	Non-Linearity Index
RCM	Reliability Centered Maintenance
RPN	Risk Priority Number
RUL	Remaining Useful Life
SE	Spectral Entropy
SEDL	Spectral-Temporal-Directional Entropy Layer
SEI	System Entropy Index
SI	Stability Index
SSI	System Stability Index
TE	Temporal Entropy

Standards Alignment

Standard	Relevance to This Chapter
ISO 13374 — Condition monitoring and diagnostics of machines	The confidence scoring standard ensures that every ISO 13374 processing level output carries a quantified uncertainty measure, enabling honest propagation of data quality through the entire pipeline.
ISO 17359 — General guidelines for condition monitoring	The confidence scale and decision thresholds implement ISO 17359’s requirement that condition monitoring systems provide reliable, repeatable assessments with documented uncertainty.
ISO 13381-1 — Prognostics	The completeness scoring (evidence breadth across signal, thermal, process, and correlation domains) aligns with ISO 13381-1’s requirement for multi-source evidence fusion in prognostic assessments.

Changelog

Version	Date	Author	Changes
2.1.0	2026-03-17	Rick D	Added standards alignment, living doc metadata, changelog
2.0.0	2026-03-17	Rick D	Enriched with production codebase content
1.0.0	2026-03-17	Rick D	Initial chapter creation