Confidence and Completeness

Chapter 12 — Confidence and Completeness

“Confidence without evidence is arrogance. Evidence without confidence scoring is noise.” — Dibyendu De

This chapter defines the canonical confidence scoring standard used across every module in RAPID AI. It is the bridge between Part 2 (domain engineering — signal intelligence, stability intelligence, engineering intelligence) and Part 3 (the product — architecture, data, UI, deployment). No downstream product decision — whether a dashboard color, an automated alert, an RCM task recommendation, or a CDE design-out proposal — can be made without first answering: How confident are we in this assessment, and is the evidence complete enough to act on it?


12.1 Why Confidence Is a First-Class Concept

Most IIoT platforms treat confidence as an afterthought — a footnote attached to an alert. RAPID AI treats it as structural. Confidence is not a label appended to a diagnosis; it is a numeric signal that flows through the pipeline alongside the diagnostic evidence itself. Every module produces a confidence score. Every downstream consumer checks that score before acting.

This design choice has three consequences:

  1. Honest propagation. Poor input data reduces confidence in every downstream result, automatically. A noisy vibration signal does not produce a false alarm; it produces a low-confidence assessment that the dashboard suppresses or the RCM engine defers for human review.

  2. Auditable decisions. Every maintenance action can be traced back through the pipeline to the specific confidence values that justified it. Regulators, insurers, and plant managers can see exactly why the system recommended a bearing replacement and how confident it was in that recommendation.

  3. Graceful degradation. When sensors fail, data quality drops, or evidence is ambiguous, RAPID AI does not crash or produce garbage. It produces lower-confidence results and clearly communicates the uncertainty, allowing engineers to apply judgment where the system cannot.


All confidence scores in RAPID AI use a single canonical representation:

  • Data type: IEEE 754 float
  • Range: 0.0 to 1.0 (clamped; values outside this range indicate a bug)
  • Field name: confidence_score (or confidence in compact payloads)
  • Precision: Two decimal places for display; full float precision for computation

| Label | Range | Interpretation | Evidence Profile |
|---|---|---|---|
| High | >= 0.85 | Strong evidence from multiple independent sources | Multiple sensors confirming, trend and entropy aligned, rule match score > 0.90 |
| Medium-High | >= 0.75 | Good evidence with minor gaps | Strong single-source evidence, or two sources with partial alignment |
| Medium | >= 0.60 | Moderate evidence, actionable with caveats | Ambiguous or single-source; pattern match present but incomplete |
| Low | >= 0.40 | Limited evidence, needs corroboration | Weak signal, noisy data, or early-stage trend not yet confirmed |
| Insufficient | < 0.40 | Not enough data for reliable conclusion | Contradictory evidence, missing sensors, or data quality gate failure |

The boundaries are not arbitrary. They align with the decision thresholds defined in Section 12.3. A “Medium” score of 0.60 is the minimum at which Module D will accept a diagnosis. A “Medium-High” score of 0.75 is the minimum for automated alerts. These boundaries were chosen to balance two risks: acting too early on weak evidence (false positive) vs. waiting too long for perfect evidence (missed failure).
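The band boundaries can be expressed as a small lookup. The following is a minimal sketch, not the production implementation; the function name `confidence_label` is hypothetical, and raising on out-of-range input is an assumption based on the statement that such values indicate a bug.

```python
def confidence_label(score: float) -> str:
    """Map a canonical 0.0-1.0 confidence score to its display label."""
    if not 0.0 <= score <= 1.0:
        # Out-of-range values (including NaN) are treated as a bug, so fail loudly.
        raise ValueError(f"confidence out of range: {score!r}")
    # Lower bounds taken from the label table, checked from highest to lowest.
    bands = [
        (0.85, "High"),
        (0.75, "Medium-High"),
        (0.60, "Medium"),
        (0.40, "Low"),
    ]
    for lower_bound, label in bands:
        if score >= lower_bound:
            return label
    return "Insufficient"
```

Labels produced this way are for display only; computation continues to use the numeric score.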

Source data from the domain expert’s original documents uses qualitative labels (“high”, “medium”, “low”). RAPID AI maps these to canonical numeric values at the ingestion boundary:

| Qualitative Input | Canonical Value | Context |
|---|---|---|
| High | 0.85 | Direct sensor confirmation with valid calibration |
| High with calibration | 0.85 | Requires calibration_valid == true flag |
| Medium-high | 0.75 | Strong single-source or moderate multi-source |
| Medium | 0.60 | Ambiguous, single-source, or mixed evidence |
| Low | 0.40 | Weak signal, noisy data, insufficient history |
| Insufficient | 0.00 | Contradictory evidence or hard block from Module A |

No module internal to the pipeline should ever store or transmit confidence as a text label. The conversion happens once, at the boundary, and all downstream computation uses the numeric value.
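A boundary conversion of this kind might look like the sketch below. The names `QUALITATIVE_TO_CANONICAL` and `ingest_confidence` are hypothetical. The source does not say what should happen when “High with calibration” arrives without a valid calibration flag, so rejecting the input here is an assumption.

```python
# Canonical values copied from the qualitative mapping table.
QUALITATIVE_TO_CANONICAL = {
    "high": 0.85,
    "high with calibration": 0.85,
    "medium-high": 0.75,
    "medium": 0.60,
    "low": 0.40,
    "insufficient": 0.00,
}


def ingest_confidence(label: str, calibration_valid: bool = False) -> float:
    """Convert a qualitative label to the canonical numeric score, once, at ingestion."""
    key = label.strip().lower()
    if key not in QUALITATIVE_TO_CANONICAL:
        raise ValueError(f"unknown confidence label: {label!r}")
    if key == "high with calibration" and not calibration_valid:
        # Assumption: an unmet calibration requirement is an input error.
        raise ValueError("'High with calibration' requires calibration_valid == true")
    return QUALITATIVE_TO_CANONICAL[key]
```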


Different decisions require different confidence levels. A dashboard can display a “watch” indicator at lower confidence than required to trigger a bearing replacement work order. The thresholds encode this risk hierarchy:

| Decision Context | Minimum Confidence | Rationale |
|---|---|---|
| Safety-critical escalation | >= 0.80 | High confidence required before overriding normal operations or triggering emergency protocols |
| Automated alert dispatch | >= 0.75 | Alerts sent to maintenance teams must be credible; false alarms erode trust |
| RCM task selection | >= 0.70 | Prevents false maintenance triggers; below this, strategy = “Inspect / Validate” |
| Module D diagnosis acceptance | >= 0.60 | Minimum to enter health staging; below this, mechanism remains “unconfirmed” |
| Dashboard display | >= 0.50 | Suppresses low-confidence noise from the UI; below this, the asset shows as “data insufficient” |
| CDE contradiction trigger | >= 0.65 | Contradiction analysis requires reliable failure mode identification |
| Imperfection rule activation | >= 0.60 | Structural weakness inference needs credible diagnostic evidence |
| Copilot response inclusion | >= 0.50 | Natural language summaries include lower-confidence hypotheses with explicit caveats |
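To make the risk hierarchy concrete, the thresholds can be held in one lookup that every consumer queries. This is an illustrative sketch only; the dictionary keys and the function `permitted_decisions` are hypothetical names, and the values are copied from the table above.

```python
# Minimum confidence per decision context, from the threshold table.
DECISION_THRESHOLDS = {
    "safety_critical_escalation": 0.80,
    "automated_alert_dispatch": 0.75,
    "rcm_task_selection": 0.70,
    "cde_contradiction_trigger": 0.65,
    "module_d_diagnosis_acceptance": 0.60,
    "imperfection_rule_activation": 0.60,
    "dashboard_display": 0.50,
    "copilot_response_inclusion": 0.50,
}


def permitted_decisions(confidence: float) -> list[str]:
    """Return every decision context whose minimum confidence is met."""
    return [
        name
        for name, minimum in DECISION_THRESHOLDS.items()
        if confidence >= minimum
    ]
```

Keeping the thresholds in a single structure means a change to one boundary does not have to be repeated across consumers.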

The RCM decision algorithm (Chapter 9) embeds confidence directly in its logic:

IF confidence_score < 0.60:
    strategy = "Inspect / Validate"
    task = "Collect more data and verify failure mode"
ELIF confidence_score >= 0.70 AND detectable_online:
    strategy = "Condition Based Maintenance"
    task = "Monitor trend and act on trigger"

This means a bearing with a BPFO frequency match scoring 0.55 will not trigger a CBM work order. Instead, it triggers a validation task — an engineer verifies the reading, checks the sensor, and confirms or denies. This is the system expressing appropriate humility about its own uncertainty.
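The two branches shown in the excerpt translate directly into Python. This sketch covers only those two branches; the function name `rcm_strategy` is hypothetical, and returning `None` for any case the excerpt does not show is an assumption, since the full rule set lives in Chapter 9.

```python
from typing import Optional, Tuple


def rcm_strategy(
    confidence_score: float, detectable_online: bool
) -> Optional[Tuple[str, str]]:
    """Return (strategy, task) for the two branches shown in the excerpt."""
    if confidence_score < 0.60:
        return ("Inspect / Validate", "Collect more data and verify failure mode")
    if confidence_score >= 0.70 and detectable_online:
        return ("Condition Based Maintenance", "Monitor trend and act on trigger")
    # Not covered by the excerpt; defer to the complete rules in Chapter 9.
    return None
```

With the BPFO example above, `rcm_strategy(0.55, True)` selects the validation task rather than a CBM work order.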


12.4 How Confidence Flows Through the Pipeline

Confidence is not computed once and forgotten. It enters the pipeline at Module A and accumulates, compounds, and transforms as evidence passes through each module. The flow is:

Module A (Data Quality)
-> Module B (Fault Detection Confidence)
-> Module B.2 (Trend Confidence)
-> Module B.3 (Entropy / Stability Index)
-> Module C (SSI Fusion - Weighted Confidence)
-> AESF (Stability State Confidence)
-> Module D (Diagnostic Confidence)
-> Module E (RCM Task Confidence)
-> Module F (RUL / Weibull Confidence)
-> Module G (CDE Confidence)

Module A produces Q_data, a quality score between 0.0 and 1.0 that reflects the trustworthiness of the raw sensor input. It is computed as the product of the penalty factors contributed by all triggered soft-penalty rules:

Q_data = Product(penalty_i) for all triggered DG rules

Each penalty_i is a multiplicative factor less than 1.0. Hard-block rules (DG001, DG002, DG005, DG006, DG011) set Q_data = 0, halting the pipeline entirely for that signal. Soft-penalty rules (DG003, DG007, DG008, DG010, DG015, DG016) progressively degrade quality.

Q_data is the multiplicative ceiling on all downstream confidence. No matter how strong the diagnostic evidence, a sensor with Q_data = 0.6 caps the final confidence at 60% of what it would otherwise be.
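The quality gate can be sketched as follows. The function name `q_data` and the input shape (a mapping from triggered rule ID to penalty factor) are assumptions for illustration; the actual penalty values per rule are not given in this chapter, so any numbers passed in are hypothetical.

```python
from math import prod
from typing import Mapping

# Hard-block rule IDs as listed above; any one of them zeroes the score.
HARD_BLOCK_RULES = {"DG001", "DG002", "DG005", "DG006", "DG011"}


def q_data(triggered: Mapping[str, float]) -> float:
    """Compute Q_data from the triggered DG rules and their penalty factors."""
    if HARD_BLOCK_RULES.intersection(triggered):
        return 0.0  # hard block: the pipeline halts for this signal
    for rule, penalty in triggered.items():
        if not 0.0 < penalty <= 1.0:
            raise ValueError(f"{rule}: penalty must be in (0, 1], got {penalty!r}")
    # With no rules triggered the product is empty and quality stays at 1.0.
    return prod(triggered.values(), start=1.0)
```

Because the penalties multiply, two soft penalties of 0.9 and 0.8 leave a quality of roughly 0.72, lower than either penalty alone.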

Module B evaluates 119 physics-based initiator rules across 12 component types. Each matched rule produces a B_match_score (0.0 to 1.0) reflecting how closely the sensor evidence matches the expected fault pattern. The rule confidence c_m is derived from the match strength:

c_m = max(B_match_score_i) across all matched rules for a given failure mode

Module B.2 classifies the trend (Stable, Drift, Accelerating, Chaotic, Step) and produces a trend_confidence score, written c_t in the formulas below. It combines the severity of the trend class with the goodness of fit of the regression:

c_t = trend_severity * regression_r_squared

Trends classified as “Chaotic” with low R-squared receive near-zero trend confidence, preventing noise-driven false alarms.

Module B.3 computes the Stability Index using the SEDL entropy decomposition:

SI = 1 - (0.5 * SE + 0.3 * TE + 0.2 * DE)

Where:

  • SE = Spectral Entropy: -Sum(p_i * ln(p_i)) / ln(N) (FFT magnitude distribution)
  • TE = Temporal Entropy: -Sum(q_i * ln(q_i)) / ln(N) (amplitude bin distribution)
  • DE = Directional Entropy: -Sum(r_j * ln(r_j)) / ln(3) (H/V/A energy ratio)

Higher SI means more stable. The entropy-derived confidence contribution is:

c_s = 1 - SI (stability gap: lower stability = higher concern)
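The entropy terms and the Stability Index can be sketched in a few lines. The function names are hypothetical, and two details are assumptions rather than documented behaviour: degenerate inputs (fewer than two bins, or zero total energy) return an entropy of 0.0, and SI is clamped defensively to [0, 1].

```python
import math
from typing import Sequence


def normalized_entropy(weights: Sequence[float]) -> float:
    """Shannon entropy of a non-negative distribution, divided by ln(N)."""
    n = len(weights)
    total = sum(weights)
    if n < 2 or total <= 0:
        return 0.0  # assumption: degenerate input carries no measurable disorder
    probs = [w / total for w in weights if w > 0]  # 0 * ln(0) is taken as 0
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(n)


def stability_index(se: float, te: float, de: float) -> float:
    """SI = 1 - (0.5*SE + 0.3*TE + 0.2*DE), clamped to [0, 1]."""
    si = 1.0 - (0.5 * se + 0.3 * te + 0.2 * de)
    return min(1.0, max(0.0, si))


def entropy_confidence(si: float) -> float:
    """c_s = 1 - SI: lower stability means higher concern."""
    return 1.0 - si
```

For DE, `normalized_entropy` would be called with the three H/V/A energy values, so N = 3 and the divisor is ln(3).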

The three independent evidence streams (rule match, trend, entropy) compound using independent-evidence fusion:

confidence_compound = 1 - (1 - c_m) * (1 - c_t) * (1 - c_s)

This formula has an important property: an additional evidence source can never decrease confidence, and any source with a nonzero score increases it. If c_m = 0.70, c_t = 0.50, and c_s = 0.40, the compound confidence is:

1 - (1 - 0.70)(1 - 0.50)(1 - 0.40) = 1 - (0.30)(0.50)(0.60) = 1 - 0.09 = 0.91

Three moderate-confidence evidence sources combine to produce high confidence. This is exactly the behavior desired: multiple independent indicators of the same failure mode should reinforce each other.
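A minimal sketch of the fusion rule, written for any number of sources. The function name `compound_confidence` is hypothetical, and the formula is only meaningful if the sources really are independent, as the text assumes.

```python
from math import prod


def compound_confidence(*scores: float) -> float:
    """Independent-evidence fusion: 1 - Product(1 - c_i)."""
    for c in scores:
        if not 0.0 <= c <= 1.0:
            raise ValueError(f"score out of range: {c!r}")
    # Each factor (1 - c_i) is the chance that source i alone is not convincing.
    return 1.0 - prod((1.0 - c for c in scores), start=1.0)
```

Calling `compound_confidence(0.70, 0.50, 0.40)` reproduces the 0.91 worked out above, up to floating-point rounding, and with no sources at all the result is 0.0.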

The final pipeline confidence applies the Module A quality gate:

C_final = Q_data * confidence_compound

Canonical reference: See Chapter 6 for the authoritative confidence propagation formula.

If Q_data = 0.80 (one soft penalty triggered) and confidence_compound = 0.91:

C_final = 0.80 * 0.91 = 0.728

This crosses the RCM threshold (0.70) but falls short of both the automated alert threshold (0.75) and the safety escalation threshold (0.80). The system recommends CBM but neither dispatches an automated alert nor triggers emergency protocols. This is the confidence pipeline working as intended.

Modules D through G inherit and further refine the upstream confidence:

| Module | Confidence Source | Adjustment |
|---|---|---|
| D (Prognostics) | C_final from fusion | Multiplied by HSR multiplier; compressed by health stage severity |
| E (Maintenance) | C_final from Module D | Weighted into priority score via 0.25 * C term |
| F (RUL/Weibull) | C_final + reliability history | Weibull parameters adjusted by S_eff and SSI |
| G (CDE) | C_final from Module D | Triggers only when C_final >= 0.65 and recurrence >= 2 |

Different modules produce scores on different scales. Before fusion or comparison, all scores must be normalized to the canonical 0.0-1.0 range.

Trend classes map directly to severity scores:

| Trend Class | Severity Score | Confidence Multiplier |
|---|---|---|
| Stable | 0.10 | R-squared of regression |
| Drift | 0.40 | R-squared of regression |
| Accelerating | 0.80 | R-squared of regression |
| Chaotic | 0.60 | Capped at 0.50 (unpredictable) |
| Step | 0.85 | 1.0 if change exceeds 3-sigma |
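One possible reading of this table, combined with c_t = trend_severity * multiplier, is sketched below. Two points are assumptions because the table does not spell them out: the Chaotic multiplier is taken to be R-squared capped at 0.50, and a Step change that does not exceed 3-sigma falls back to R-squared. The function and parameter names are hypothetical.

```python
# Severity scores per trend class, from the table above.
TREND_SEVERITY = {
    "Stable": 0.10,
    "Drift": 0.40,
    "Accelerating": 0.80,
    "Chaotic": 0.60,
    "Step": 0.85,
}


def trend_confidence(
    trend_class: str, r_squared: float, step_exceeds_3_sigma: bool = False
) -> float:
    """c_t = severity score * confidence multiplier for the given trend class."""
    severity = TREND_SEVERITY[trend_class]  # KeyError on an unknown class
    if trend_class == "Chaotic":
        multiplier = min(r_squared, 0.50)  # assumption: cap applied to R-squared
    elif trend_class == "Step" and step_exceeds_3_sigma:
        multiplier = 1.0
    else:
        multiplier = r_squared  # assumption for Step below 3-sigma
    return severity * multiplier
```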

The Stability Index is already normalized by construction. Each entropy component is divided by its maximum possible value, ln(N) or ln(3), which scales it to [0, 1]. The weighted combination is also clamped:

SI = clamp(1 - (0.5*SE + 0.3*TE + 0.2*DE), 0.0, 1.0)

Canonical reference: See Chapter 5 for the authoritative SSI formula.

SSI is computed as a weighted mean of block scores, each already on [0, 1]. The result is clamped:

SSI = clamp(Sum(w_i * bs_i) / Sum(w_i), 0.0, 1.0)

The System Entropy Index normalizes B.3 signals for system-level fusion:

SEI = clamp(0.7 * (1 - SI) + 0.3 * dSE_dt_norm, 0.0, 1.0)

When both SSI and SEI are available, the system-level confidence is:

system_confidence = 0.6 * SSI + 0.4 * SEI

The final system state is determined by max(SSI_state, SEI_state) plus override rules (e.g., Critical_Instability forces SSI >= 0.70).
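The SSI, SEI, and system-level confidence formulas above can be sketched together. All function names are hypothetical, and rejecting a zero weight sum is an assumption; the authoritative SSI definition is in Chapter 5.

```python
from typing import Sequence


def clamp(x: float, lo: float = 0.0, hi: float = 1.0) -> float:
    """Restrict x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def ssi(block_scores: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted mean of block scores, clamped to [0, 1]."""
    if len(block_scores) != len(weights):
        raise ValueError("block_scores and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(w * b for w, b in zip(weights, block_scores))
    return clamp(weighted / total_weight)


def sei(si: float, dse_dt_norm: float) -> float:
    """SEI = clamp(0.7 * (1 - SI) + 0.3 * dSE_dt_norm, 0, 1)."""
    return clamp(0.7 * (1.0 - si) + 0.3 * dse_dt_norm)


def system_confidence(ssi_value: float, sei_value: float) -> float:
    """system_confidence = 0.6 * SSI + 0.4 * SEI."""
    return 0.6 * ssi_value + 0.4 * sei_value
```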

When multiple rules or modules produce severity assessments for the same failure mode, they are combined using confidence-weighted averaging:

S_eff = Sum(severity_i * confidence_i) / Sum(confidence_i)

This ensures that a high-confidence severity-8 finding dominates over a low-confidence severity-9 finding.
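A short sketch of the weighted average makes the claim checkable. The function name `effective_severity` and the (severity, confidence) tuple shape are hypothetical; raising when all confidences are zero is an assumption, since the formula is undefined in that case.

```python
from typing import Sequence, Tuple


def effective_severity(findings: Sequence[Tuple[float, float]]) -> float:
    """S_eff = Sum(severity_i * confidence_i) / Sum(confidence_i)."""
    total_confidence = sum(conf for _, conf in findings)
    if total_confidence <= 0:
        raise ValueError("at least one finding must have nonzero confidence")
    return sum(sev * conf for sev, conf in findings) / total_confidence
```

For a severity-8 finding at confidence 0.9 and a severity-9 finding at confidence 0.3, the result is (7.2 + 2.7) / 1.2 = 8.25, which sits closer to the high-confidence finding.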


When modules disagree — and they will — RAPID AI uses a three-tier resolution strategy.

For most conflicts, the highest-confidence assessment prevails. If Module B says “bearing outer race spalling” at confidence 0.78 and also says “gear mesh misalignment” at confidence 0.71, both are reported, ranked by confidence. The highest-confidence finding drives the primary recommendation.

For continuous scores (SSI, SEI, severity), conflicting assessments are merged using the S_eff formula. This prevents a single outlier from dominating the fused result.

Certain conflict patterns trigger mandatory human review:

| Conflict Pattern | Trigger | Action |
|---|---|---|
| Entropy contradicts trend | SI > 0.70 (stable) but trend = “Accelerating” | Flag as “Ambiguous stability”; engineer review required |
| High confidence, opposite conclusions | Two rules with confidence > 0.75 pointing to mutually exclusive failure modes | Flag as “Diagnostic conflict”; present both hypotheses with evidence |
| Safety-critical below threshold | Consequence = Safety but confidence < 0.80 | Escalate to Level 3 review; do not suppress the finding |
| CDE vs. RCM disagreement | Module G recommends design-out but Module E recommends time-based replacement | Present both; require engineering review for resolution |

Three hard overrides exist that bypass normal conflict resolution:

  1. Critical Instability Override: If Module B.3 reports stability_state == "Critical_Instability", then SSI = max(SSI, 0.70) regardless of component-level scores. System-level entropy collapse trumps component-level health.

  2. SEI Alarm Override: If SEI_state == "alarm", the final system state is raised to at least “warning”, even if SSI_state == "healthy". Entropy disorder demands attention even when traditional indicators appear normal.

  3. Safety Consequence Override: If consequence_category == "Safety" and severity_rank >= 4, the RCM strategy is forced to “Immediate action / fail-safe / shutdown review” regardless of confidence score.
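The three overrides can be sketched as a single pass applied after normal conflict resolution. The `Assessment` dataclass, the function name, and the state ordering healthy < warning < alarm are all assumptions made for illustration; only the conditions and forced values come from the list above.

```python
from dataclasses import dataclass, replace

# Assumed ordering of system states, lowest concern first.
STATE_RANK = {"healthy": 0, "warning": 1, "alarm": 2}


@dataclass(frozen=True)
class Assessment:
    ssi: float
    system_state: str
    rcm_strategy: str


def apply_hard_overrides(
    a: Assessment,
    *,
    stability_state: str,
    sei_state: str,
    consequence_category: str,
    severity_rank: int,
) -> Assessment:
    """Apply the three hard overrides; each one can only escalate."""
    if stability_state == "Critical_Instability":
        a = replace(a, ssi=max(a.ssi, 0.70))
    if sei_state == "alarm" and STATE_RANK[a.system_state] < STATE_RANK["warning"]:
        a = replace(a, system_state="warning")
    if consequence_category == "Safety" and severity_rank >= 4:
        a = replace(a, rcm_strategy="Immediate action / fail-safe / shutdown review")
    return a
```

None of the three checks reads the confidence score, which mirrors the point that these overrides bypass confidence-based resolution.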


RAPID AI’s knowledge base is extensive but not complete. This section maps what is implemented, what is designed, and what remains planned. Honesty about completeness is itself a form of confidence scoring — applied to the system rather than to a diagnosis.

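| Knowledge Domain | Specified | Populated | With Real Logic | Status |
|---|---|---|---|---|
| Module B fault detection rules | 119 | 119 | 119 | Complete |
| Failure mode master library | 320 | 320 | 320 | Complete (target: 500) |
| FRETTLSM factors | 88 | 88 | 0 (weights missing) | Schema only |
| Imperfection rules | 300 | 300 | ~50 (rest use placeholder logic) | Partially populated |
| IMS rows (ground truth) | 100 | 100 | 100 | Complete |
| RCM decision rules | 7 | 7 | 7 | Complete |
| RCM workbook templates | 10 CSVs | Headers only | 0 | Requires plant data |
| SEDL entropy thresholds | 8 | 8 | 8 | Complete |
| Module C system profiles | 3 | 3 | 3 | Complete |
| AESF fault routing | 39 | 39 | 39 | Complete |
| Module G contradiction types | 8 | 8 | 7 (CT08 missing) | 95% |

| Module | Docs | Schema | Code | Data | Overall |
|---|---|---|---|---|---|
| M0 Config (GUARD) | 100% | 100% | 100% | N/A | 100% |
| MA Signal | 90% | 80% | 0% | N/A | 55% |
| MB Fault Detection | 100% | 100% | 30% | 100% | 80% |
| MC Fusion (SSI/SEI) | 100% | 100% | 0% | N/A | 65% |
| AESF | 100% | 100% | 0% | N/A | 65% |
| MD Prognostics | 100% | 100% | 0% | N/A | 65% |
| ME Maintenance | 100% | 70% | 30% | 100% | 70% |
| MF RUL Engine | 100% | 100% | 0% | N/A | 65% |
| MG CDE | 100% | 100% | 0% | N/A | 65% |
| FRETTLSM | 100% | 70% | 20% | 10% | 50% |
| Imperfection | 90% | 90% | 10% | 40% | 55% |
| Reliability | 100% | 80% | 0% | 0% | 45% |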

The twelve items that cannot be resolved from existing source material and require domain expert input or plant-specific data:

  1. Numeric sensor thresholds: sensor_evidence_rules.csv contains 100 rules with "IF X > threshold" where threshold is never a number. This is the single most blocking gap.
  2. Weibull fitting algorithm: Referenced in reliability docs but no implementation provided.
  3. Dependency graph traversal logic: Schema defined but no algorithm for propagating failure impact across connected assets.
  4. FRETTLSM factor catalog with activation weights: Schema exists but no factor data rows.
  5. FRETTLSM asset class templates: No pump/motor/gearbox factor configurations.
  6. FRETTLSM seed data: DDL exists, no INSERT scripts.
  7. RCM workbook data: 10 CSVs with headers only; requires plant-specific field data.
  8. CT08 (Cost vs. Redundancy) resolution families: Missing from Module G.
  9. Dashboard widget JSON structures: 4 of 5 undocumented.
  10. Imperfection rule physics logic: 250 of 300 rules use "threshold_or_ratio_violation" placeholder.
  11. Module F window threshold cutoffs: Recommended_Window boundaries undefined.
  12. Operating-state-dependent IAR classification: FRETTLSM I/A/R roles change by operating state but schema does not support this.

| Issue | Impact | Mitigation |
|---|---|---|
| No unified confidence field naming | B.2 uses trend_confidence, B.3 uses SI, C uses SSI/SEI, sensor evidence uses text | This chapter defines the canonical standard; refactoring required |
| No schema bridge tables | 3 disconnected stacks (Normalized, Reliability, FRETTLSM) with incompatible PKs | Bridge tables designed in Chapter 15 |
| No error propagation spec | If Module A blocks a signal, downstream modules receive no notification | Pipeline orchestrator must propagate Q_data = 0 as explicit “no data” signal |
| AESF positional ambiguity | Documented as “Module E+” but functionally sits between B.3 and C | Treat as stability co-processor feeding into Module C fusion |
| FM ID collisions | FM0001 means different things in different files | IMS is the canonical authority; all other references must use IMS schema_id |

12.8 Confidence in Practice: Worked Example

Consider Cooling Water Pump P-101A showing a vibration anomaly.

Module A (GUARD):

  • Raw signal passes all hard blocks
  • One soft penalty triggered (DG008: aliasing indicator = 0.12)
  • Q_data = 0.90

Module B (Fault Detection):

  • AFB06 (outer race spalling) matched with B_match_score = 0.82
  • c_m = 0.82

Module B.2 (Trend Analysis):

  • Trend classified as “Accelerating” with R-squared = 0.88
  • c_t = 0.80 * 0.88 = 0.704

Module B.3 (Entropy):

  • SE = 0.71, TE = 0.48, DE = 0.33
  • SI = 1 - (0.5*0.71 + 0.3*0.48 + 0.2*0.33) = 1 - (0.355 + 0.144 + 0.066) = 0.435
  • c_s = 1 - 0.435 = 0.565

Evidence Compounding:

confidence_compound = 1 - (1 - 0.82)(1 - 0.704)(1 - 0.565)
= 1 - (0.18)(0.296)(0.435)
= 1 - 0.0232
= 0.977

Data Quality Ceiling:

C_final = 0.90 * 0.977 = 0.879

Decision Outcomes:

  • Exceeds dashboard display threshold (0.50): Displayed
  • Exceeds Module D acceptance threshold (0.60): Diagnosis accepted
  • Exceeds RCM activation threshold (0.70): CBM strategy selected
  • Exceeds automated alert threshold (0.75): Alert dispatched
  • Exceeds safety escalation threshold (0.80): Safety review triggered. The Safety Consequence Override does not apply, because the consequence category is “Operational”

Module D Assessment:

  • Health Stage: HSR004 (Unstable, SSI 0.60-0.80)
  • RUL band: 1-4 weeks
  • Action: ACT005 — Bearing replacement (planned). Priority: 87.

The complete evidence trail — from the raw vibration signal through every confidence transformation to the final work order — is auditable, explainable, and traceable.
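The arithmetic of this worked example can be replayed in a few lines, which is a convenient regression check for any implementation. The function name `p101a_confidence` and the threshold keys are hypothetical; every input value is taken from the example above.

```python
def p101a_confidence() -> dict:
    """Recompute the P-101A confidence chain from the stated inputs."""
    q_data = 0.90                      # Module A: one soft penalty (DG008)
    c_m = 0.82                         # Module B: AFB06 match score
    c_t = 0.80 * 0.88                  # Module B.2: Accelerating severity * R-squared
    si = 1.0 - (0.5 * 0.71 + 0.3 * 0.48 + 0.2 * 0.33)  # Module B.3
    c_s = 1.0 - si
    compound = 1.0 - (1.0 - c_m) * (1.0 - c_t) * (1.0 - c_s)
    c_final = q_data * compound        # data quality ceiling

    thresholds = {
        "dashboard_display": 0.50,
        "module_d_acceptance": 0.60,
        "rcm_activation": 0.70,
        "automated_alert": 0.75,
        "safety_escalation": 0.80,
    }
    passed = [name for name, minimum in thresholds.items() if c_final >= minimum]
    return {
        "c_t": c_t,
        "si": si,
        "c_s": c_s,
        "compound": compound,
        "c_final": c_final,
        "passed": passed,
    }


if __name__ == "__main__":
    result = p101a_confidence()
    print(round(result["compound"], 3), round(result["c_final"], 3))
```

Carrying full precision through the chain gives the same three-decimal results as the hand calculation, 0.977 and 0.879, and all five thresholds are met.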


12.9 Summary of Canonical Scores Across Modules

| Module | Score Name | Range | What It Measures |
|---|---|---|---|
| A (GUARD) | Q_data | 0.0 - 1.0 | Raw signal trustworthiness |
| B (SENSE) | B_match_score | 0.0 - 1.0 | Fault pattern match strength |
| B.2 (Trend) | trend_confidence | 0.0 - 1.0 | Trend classification certainty |
| B.3 (SEDL) | SI | 0.0 - 1.0 | System stability (higher = more stable) |
| C (FUSE) | SSI | 0.0 - 1.0 | Fused system health (higher = worse) |
| C (FUSE) | SEI | 0.0 - 1.0 | System entropy index (higher = more disordered) |
| AESF | SI, EI, CSS, JII | 0 - 100 | Four stability dimensions (note: 0-100 scale) |
| D (Prognostics) | C_final | 0.0 - 1.0 | Diagnostic confidence after fusion |
| E (Maintenance) | Priority P | 0 - 100 | Action urgency (higher = more urgent) |
| F (RUL) | RUL_days | 0 - 3650 | Estimated remaining useful life |
| F (Weibull) | P_30 | 0.0 - 1.0 | 30-day failure probability |
| G (CDE) | Trigger confidence | 0.0 - 1.0 | Contradiction detection certainty |

Note the AESF anomaly: its indices use a 0-100 scale while the rest of the pipeline uses 0.0-1.0. Normalization to the canonical range requires dividing by 100 before feeding into downstream fusion. This is a known integration point that must be handled at the AESF-to-Module-C interface.


| Term | Definition |
|---|---|
| Confidence Score | A 0.0-1.0 numeric value expressing the system’s certainty in a given assessment |
| Q_data | Data quality score from Module A; multiplicative ceiling on all downstream confidence |
| B_match_score | Rule match strength from Module B fault detection |
| SI (Stability Index) | SEDL entropy-derived stability measure; higher = more stable |
| SSI (System Stability Index) | Module C fused health score; higher = worse condition |
| SEI (System Entropy Index) | Module C entropy overlay; higher = more disordered |
| S_eff (Effective Severity) | Confidence-weighted average severity across multiple assessments |
| C_final | Pipeline output confidence after evidence compounding and quality ceiling |
| RPN (Risk Priority Number) | Severity x Probability x Detectability; range 1-125 |
| Evidence Compounding | 1 - Product(1 - C_i) formula for combining independent evidence sources |

| Acronym | Expansion |
|---|---|
| AESF | Acceleration-Entropy Stability Framework |
| BPFO | Ball Pass Frequency Outer |
| CBM | Condition-Based Maintenance |
| CDE | Contradiction Driven Engineering |
| DE | Directional Entropy |
| FRETTLSM | Force-Reactive-Environment-Time-Temperature-Lubrication-Surface-Material |
| HSR | Health Staging Rules |
| IAR | Initiator-Accelerator-Retarder |
| IMS | Integrated Master Schema |
| NLI | Non-Linearity Index |
| RCM | Reliability Centered Maintenance |
| RPN | Risk Priority Number |
| RUL | Remaining Useful Life |
| SE | Spectral Entropy |
| SEDL | Spectral-Temporal-Differential Entropy Layer |
| SEI | System Entropy Index |
| SI | Stability Index |
| SSI | System Stability Index |
| TE | Temporal Entropy |


| Standard | Relevance to This Chapter |
|---|---|
| ISO 13374: Condition monitoring and diagnostics of machines | The confidence scoring standard ensures that every ISO 13374 processing level output carries a quantified uncertainty measure, enabling honest propagation of data quality through the entire pipeline. |
| ISO 17359: General guidelines for condition monitoring | The confidence scale and decision thresholds implement ISO 17359’s requirement that condition monitoring systems provide reliable, repeatable assessments with documented uncertainty. |
| ISO 13381-1: Prognostics | The completeness scoring (evidence breadth across signal, thermal, process, and correlation domains) aligns with ISO 13381-1’s requirement for multi-source evidence fusion in prognostic assessments. |

| Version | Date | Author | Changes |
|---|---|---|---|
| 2.1.0 | 2026-03-17 | Rick D | Added standards alignment, living doc metadata, changelog |
| 2.0.0 | 2026-03-17 | Rick D | Enriched with production codebase content |
| 1.0.0 | 2026-03-17 | Rick D | Initial chapter creation |