Engineering Intelligence

Engineering Intelligence is the third and deepest layer in RAPID AI. It answers: What mechanism explains this, and what should we do? Four modules work together: Module D (Health Staging and Prognostics), Module E (Maintenance Actions), Module F (Weibull Reliability), and Module G (Contradiction Driven Engineering).


Module D — Health Staging and Prognostics

Module D maps the fused evidence from Module C into specific failure mechanisms and estimates how much useful life remains. It translates system-level health scores into actionable engineering assessments.

Six rules classify the machine into health stages based on SSI and trend slope:

| Rule | Condition | Health Stage | RUL Band | Multiplier |
| --- | --- | --- | --- | --- |
| HSR001 | SSI < 0.30 | Healthy | > 6 months | 1.0 |
| HSR002 | SSI 0.30-0.50 | Degrading (early) | 3-6 months | 0.85 |
| HSR003 | SSI 0.50-0.60 AND slope > 0.02 | Degrading (late) | 1-3 months | 0.60 |
| HSR004 | SSI 0.60-0.80 | Unstable | 1-4 weeks | 0.35 |
| HSR005 | SSI >= 0.80 | Critical | < 7 days | 0.20 |
| HSR006 | SSI >= 0.80 AND slope > 0.05 | Critical (imminent) | < 48 hours | 0.05 |

The RUL multiplier applies to the raw RUL estimate from trend extrapolation, compressing it based on the health stage assessment. A machine at HSR004 (Unstable) has its trend-based RUL multiplied by 0.35 because experience shows that unstable machines degrade faster than linear extrapolation predicts.

Slope-based overrides capture rapidly deteriorating conditions that the static SSI thresholds might miss:

  • Degrading + SSI_slope > 0.05 escalates to Unstable
  • Healthy + SSI_slope > 0.02 escalates to Degrading

This means a machine with SSI = 0.25 (nominally healthy) whose SSI_slope exceeds 0.02 is reclassified as Degrading, triggering earlier investigation.
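The staging rules and slope overrides above can be sketched as a single function. This is a minimal illustration, not the production rule engine: the names `HealthStage` and `classify_health_stage` and the evaluation order are assumptions. Two behaviors are not specified in the text and are labeled in the code: an SSI of 0.50-0.60 with a slope at or below 0.02 falls back to HSR002, and an override takes the rule ID and multiplier of the stage it escalates to.

```python
from typing import NamedTuple


class HealthStage(NamedTuple):
    rule: str
    stage: str
    multiplier: float


def classify_health_stage(ssi: float, slope: float) -> HealthStage:
    """Apply HSR001-HSR006, then the slope-based overrides."""
    # Static rules, most severe first.
    if ssi >= 0.80:
        if slope > 0.05:
            return HealthStage("HSR006", "Critical (imminent)", 0.05)
        return HealthStage("HSR005", "Critical", 0.20)
    if ssi >= 0.60:
        return HealthStage("HSR004", "Unstable", 0.35)
    if ssi >= 0.50 and slope > 0.02:
        result = HealthStage("HSR003", "Degrading (late)", 0.60)
    elif ssi >= 0.30:
        # ASSUMPTION: SSI 0.50-0.60 with slope <= 0.02 matches no rule in the
        # table, so this sketch treats it as early degradation.
        result = HealthStage("HSR002", "Degrading (early)", 0.85)
    else:
        result = HealthStage("HSR001", "Healthy", 1.0)

    # Slope-based overrides. ASSUMPTION: the escalated machine takes the
    # rule ID and multiplier of the stage it is promoted to.
    if result.stage.startswith("Degrading") and slope > 0.05:
        return HealthStage("HSR004", "Unstable", 0.35)
    if result.stage == "Healthy" and slope > 0.02:
        return HealthStage("HSR002", "Degrading (early)", 0.85)
    return result
```

For the example in the text, `classify_health_stage(0.25, 0.03)` returns the Degrading (early) stage even though the SSI alone would be classified as Healthy.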

Module D computes remaining useful life using trend-based extrapolation:

RUL_linear = ln(failure_threshold / current_value) / slope_log
RUL_accel = ln(failure_threshold / current_value) / (slope_log + slope_change)
RUL_adj = RUL * (1 - NLI) [instability adjustment when NLI >= 0.6]
rul_days = raw_rul * rul_multiplier [from the matched HSR rule]

The accelerating model is used when slope_change >= 0.01, which detects feedback-driven degradation. The NLI adjustment from Module B.2 penalizes unpredictable trends. The final rul_days value is capped at 3,650 days (10 years) and floored at 180 days for degrading machines with flat slopes.
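The chain of formulas above can be read as one pipeline: choose the linear or accelerating model, apply the NLI adjustment, then apply the stage multiplier and the cap. The sketch below follows that reading. The function name `trend_rul_days` and the constant `MAX_RUL_DAYS` are hypothetical, `slope_log` is assumed to be expressed per day, and the 180-day floor for degrading machines with flat slopes is left out because the text does not define "flat" precisely.

```python
import math

MAX_RUL_DAYS = 3650.0  # 10-year cap stated in the text


def trend_rul_days(current_value: float, failure_threshold: float,
                   slope_log: float, slope_change: float = 0.0,
                   nli: float = 0.0, rul_multiplier: float = 1.0) -> float:
    """Trend-based RUL in days (assumes current_value > 0)."""
    if current_value >= failure_threshold:
        return 0.0  # already at or past the failure threshold
    log_gap = math.log(failure_threshold / current_value)

    # Accelerating model when the slope itself is growing.
    rate = slope_log + slope_change if slope_change >= 0.01 else slope_log
    if rate <= 0:
        # ASSUMPTION: a flat or improving trend is reported at the cap.
        raw_rul = MAX_RUL_DAYS
    else:
        raw_rul = log_gap / rate

    # Instability adjustment for unpredictable trends.
    if nli >= 0.6:
        raw_rul *= (1.0 - nli)

    # Stage multiplier from the matched HSR rule, then the 10-year cap.
    return min(MAX_RUL_DAYS, raw_rul * rul_multiplier)
```

With a current value of 2.0, a threshold of 8.0, and a log-slope of 0.01 per day, the linear estimate is ln(4) / 0.01, about 139 days. At the HSR004 multiplier of 0.35 this compresses to roughly 49 days.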

Module D matches the evidence pattern against known failure mechanisms:

| Route | Evidence Required |
| --- | --- |
| Bearing fault | BPFO/BPFI harmonics + kurtosis rise + envelope peaks |
| Looseness | Sub-harmonics + many harmonics + V/H ratio anomaly |
| Rub | Fractional sub-harmonics + truncated waveform |
| Cavitation | Broadband noise + random peaks + flow correlation |
| Gear mesh deterioration | Mesh frequency sidebands + amplitude modulation |
| Structural resonance | Fixed frequency independent of speed + amplification |

Each candidate receives a confidence score based on how many evidence sources align. Multiple candidates can be active simultaneously, ranked by confidence. The IMS provides the authoritative mapping from mechanism to maintenance action.
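One simple way to picture this ranking is to score each route by the fraction of its required evidence that has been observed. The sketch below does exactly that. The scoring rule, the evidence keys, the `min_confidence` cutoff, and the function name `rank_mechanisms` are all assumptions made for illustration; the text does not specify how the production engine weights evidence.

```python
# Evidence keys are hypothetical labels for the rows of the route table.
REQUIRED_EVIDENCE = {
    "Bearing fault": {"bpfo_bpfi_harmonics", "kurtosis_rise", "envelope_peaks"},
    "Looseness": {"sub_harmonics", "many_harmonics", "vh_ratio_anomaly"},
    "Rub": {"fractional_sub_harmonics", "truncated_waveform"},
    "Cavitation": {"broadband_noise", "random_peaks", "flow_correlation"},
    "Gear mesh deterioration": {"mesh_sidebands", "amplitude_modulation"},
    "Structural resonance": {"speed_independent_frequency", "amplification"},
}


def rank_mechanisms(observed: set[str],
                    min_confidence: float = 0.5) -> list[tuple[str, float]]:
    """Score each route by the share of its required evidence that is present."""
    scored = []
    for route, required in REQUIRED_EVIDENCE.items():
        confidence = len(required & observed) / len(required)
        if confidence >= min_confidence:
            scored.append((route, confidence))
    # Several candidates may be active at once; the most confident comes first.
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

A machine showing all three bearing indicators plus two of the three looseness indicators would yield a bearing fault candidate at full confidence and a looseness candidate at two thirds.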


Module E converts diagnostic intelligence into maintenance decisions. It answers: What should we do, and how urgently?

P = 100 * (0.45 * S + 0.25 * C + 0.20 * K + 0.10 * U) * M_safe * R_sp * R_mp

| Component | Weight | Description |
| --- | --- | --- |
| S | 0.45 | Signal severity score (0-1) |
| C | 0.25 | Fault confidence (0-1) |
| K | 0.20 | Asset criticality from Module 0 (0-1) |
| U | 0.10 | Trend urgency from Module B.2 (0-1) |
| M_safe | 1.5 or 1.0 | Safety multiplier |
| R_sp | 0.7 or 1.0 | Spares availability |
| R_mp | 0.7 or 1.0 | Manpower availability |

The safety multiplier (1.5x) can push the priority score above 100, reflecting that safety-critical assets deserve highest urgency regardless of other factors. The spares and manpower modifiers reduce priority when the intervention cannot proceed — not because the fault is less serious, but because issuing an urgent work order for a repair that cannot be performed wastes operational attention.
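As a worked illustration of the priority formula, the sketch below computes P from the four weighted components and the three modifiers. The function name `priority_score` and the boolean flags are assumptions. Following the paragraph above, the 0.7 factors are taken to apply when spares or manpower are unavailable.

```python
def priority_score(s: float, c: float, k: float, u: float,
                   safety_critical: bool = False,
                   spares_available: bool = True,
                   manpower_available: bool = True) -> float:
    """P = 100 * (0.45*S + 0.25*C + 0.20*K + 0.10*U) * M_safe * R_sp * R_mp."""
    base = 0.45 * s + 0.25 * c + 0.20 * k + 0.10 * u
    m_safe = 1.5 if safety_critical else 1.0      # safety multiplier
    r_sp = 1.0 if spares_available else 0.7       # spares availability
    r_mp = 1.0 if manpower_available else 0.7     # manpower availability
    return 100.0 * base * m_safe * r_sp * r_mp
```

With S = 0.8, C = 0.9, K = 1.0, and U = 0.5 the unmodified score is 83.5. Marking the asset safety-critical raises it to 125.25, while missing spares alone lower it to 58.45.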

| Score | Response |
| --- | --- |
| >= 85 | Immediate: emergency shutdown if needed; immediate repair |
| >= 70 | 24 hours: plan urgent intervention |
| >= 50 | 7 days: schedule in next available window |
| < 50 | Next shutdown: add to planned outage scope |
| (process_corr >= 0.75) | Process-driven: adjust operations, not maintenance |

The process-driven category is critical for avoiding unnecessary maintenance interventions. When Module C identifies that the dominant contributor has high process correlation (BSR004), Module E redirects the response from maintenance to operations. This prevents the costly error of replacing a bearing that is vibrating due to upstream process instability.
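The response bands and the process-driven redirect can be sketched as a small lookup. The function name `response_window` is hypothetical, and checking process correlation before the score bands is an assumption drawn from the description of the redirect above.

```python
def response_window(score: float, process_corr: float = 0.0) -> str:
    """Map a priority score to a response category."""
    # ASSUMPTION: the process-driven check takes precedence over the bands,
    # so a process-correlated fault never produces a maintenance work order.
    if process_corr >= 0.75:
        return "Process-driven"
    if score >= 85:
        return "Immediate"
    if score >= 70:
        return "24 hours"
    if score >= 50:
        return "7 days"
    return "Next shutdown"
```

Under this ordering, a score of 90 with a process correlation of 0.80 is routed to operations rather than triggering an immediate repair.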

| Action ID | Title | Trigger |
| --- | --- | --- |
| ACT001 | Confirmation measurement run | Post-maintenance verification |
| ACT002 | Lubrication service | Bearing stress, temperature rise |
| ACT003 | Alignment correction | Misalignment detection |
| ACT004 | Balance correction | Imbalance detection |
| ACT005 | Bearing replacement (planned) | Progressive bearing damage |
| ACT006 | Foundation inspection/correction | Looseness, structural issues |
| ACT007 | Investigate electrical | Motor electrical faults |
| ACT008 | Emergency shutdown | P >= 85, imminent failure |
| ACT009 | Gear inspection | Gear mesh deterioration |
| ACT010 | Flow correction | Cavitation, recirculation |
| ACT011 | Coupling replacement | Coupling wear/backlash |
| ACT012 | Seal replacement | Seal leakage/wear |

Every action traces back through the justification chain to the physics that triggered it. The dashboard displays not just the action but the evidence: “ACT003: Alignment correction. Triggered by AFB07: A/H > 1.3 with 2x peak at 1.5x baseline. Misalignment is transferring axial force through the coupling, overloading the DE bearing.”

Module E selects from six strategy tiers based on the failure pattern, consequence class, and detection capability:

| Strategy | When Applied |
| --- | --- |
| Run-to-failure | Hidden function, no safety consequence, cheap replacement |
| Time-based replacement | Known wear-out pattern (Nowlan A/B), predictable life |
| Condition-based monitoring | Random/complex failure (D/E/F patterns, 82% of cases) |
| Predictive with AI | High-value asset, sufficient historical data |
| Redesign | Chronic failure, contradiction detected |
| Operational change | Process-driven, environment-driven |

The full RCM decision framework is detailed in Chapter 9.


Module F provides statistically grounded remaining useful life estimates by combining population Weibull statistics with real-time condition data. This is the bridge between statistical reliability engineering (how long do these components typically last?) and condition-based diagnostics (how is this specific machine doing right now?).

The key innovation is adjusting population-level Weibull parameters based on real-time condition evidence:

beta_adj = beta_base * (1 + 0.8 * S_eff) [alpha_severity = 0.8]
eta_adj = eta_base * (1 - 0.6 * SSI) [gamma_degradation = 0.6]

  • beta_adj (shape): As severity increases, beta increases. A higher beta steepens the hazard curve, meaning the machine transitions from random failure (beta ~ 1.0) to predictable wear-out (beta > 2.0). This reflects the physics: a machine with active degradation is no longer failing randomly; it is on a deterministic degradation path.

  • eta_adj (scale): As SSI increases (more degradation), eta decreases. A lower eta compresses the characteristic life, meaning the machine’s effective remaining population life is shorter than the population average. A machine with SSI = 0.80 has its characteristic life reduced to 52% of the population value.
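A short numeric sketch makes the two adjustments concrete. The function name `adjust_weibull` is hypothetical; the coefficients 0.8 and 0.6 are the alpha_severity and gamma_degradation values given above.

```python
def adjust_weibull(beta_base: float, eta_base: float,
                   s_eff: float, ssi: float) -> tuple[float, float]:
    """Return (beta_adj, eta_adj) for the condition-adjusted Weibull model."""
    alpha_severity = 0.8
    gamma_degradation = 0.6
    beta_adj = beta_base * (1.0 + alpha_severity * s_eff)   # steeper hazard
    eta_adj = eta_base * (1.0 - gamma_degradation * ssi)    # shorter life
    return beta_adj, eta_adj
```

For a population with beta = 1.5 and eta = 50,000 hours, an effective severity of 0.5 and an SSI of 0.80 give beta_adj = 2.1 and eta_adj = 26,000 hours, which is the 52% figure quoted above.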

| Component | beta (shape) | eta (scale, hours) | Applies To |
| --- | --- | --- | --- |
| Bearing | 1.5 | 50,000 | AFB, journal, TPJB |
| Gear | 2.5 | 80,000 | Gears |
| Motor | 1.3 | 70,000 | AC motor, DC motor |
| Generic | 1.5 | 50,000 | Coupling, foundation, fluid_flow, belts, chains, shafts |

Module F provides three complementary outputs:

F001 — Weibull RUL:

RUL_weibull = eta_adj * (-ln(R_target))^(1/beta_adj) - t_current
R_target = 0.90 (90% reliability target, configurable)

F002 — 30-Day Failure Probability:

P_30 = [R(t) - R(t + 720h)] / R(t)

This answers the maintenance planner’s question: “What is the chance this machine fails in the next month?”
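Both outputs follow from the two-parameter Weibull survival function R(t) = exp(-(t/eta)^beta). The sketch below evaluates F001 and F002 for given adjusted parameters. The function names are hypothetical, and flooring the RUL at zero and returning 1.0 when R(t) underflows are assumptions added so the functions behave sensibly at the extremes.

```python
import math


def reliability(t: float, beta: float, eta: float) -> float:
    """Weibull survival function R(t) = exp(-(t/eta)^beta)."""
    return math.exp(-((t / eta) ** beta))


def weibull_rul(t_current: float, beta: float, eta: float,
                r_target: float = 0.90) -> float:
    """F001: hours until reliability falls to r_target."""
    t_target = eta * (-math.log(r_target)) ** (1.0 / beta)
    return max(0.0, t_target - t_current)  # ASSUMPTION: floor at zero


def failure_probability_30d(t_current: float, beta: float, eta: float,
                            horizon_hours: float = 720.0) -> float:
    """F002: probability of failing within the horizon, given survival so far."""
    r_now = reliability(t_current, beta, eta)
    if r_now == 0.0:
        return 1.0  # ASSUMPTION: survival probability has underflowed
    r_later = reliability(t_current + horizon_hours, beta, eta)
    return (r_now - r_later) / r_now
```

A useful sanity check is the beta = 1 case: the hazard is constant, so the 30-day probability reduces to 1 - exp(-720 / eta) regardless of operating hours.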

F003 — Risk Index:

Risk_Index = 100 * severity * criticality, clamped to [0, 100]

Risk_Index > 70 warrants urgent action. Risk_Index combines the diagnostic assessment (severity) with the business context (criticality) into a single prioritization metric.

PF_position = (current_value - baseline) / (failure_threshold - baseline)

PF_position maps the machine’s current state onto the P-F interval, the window between the point where a failure becomes detectable (P) and the point of functional failure (F). A PF_position above 0.80 means the machine is more than 80% of the way from detection to failure, leaving less than 20% of the intervention window.
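The two metrics reduce to a few lines each. The function names below are hypothetical, and the sketch assumes the failure threshold is strictly greater than the baseline.

```python
def risk_index(severity: float, criticality: float) -> float:
    """F003: 100 * severity * criticality, clamped to [0, 100]."""
    return max(0.0, min(100.0, 100.0 * severity * criticality))


def pf_position(current_value: float, baseline: float,
                failure_threshold: float) -> float:
    """Share of the P-F interval consumed: 0 at baseline, 1 at failure."""
    return (current_value - baseline) / (failure_threshold - baseline)
```

A severity of 0.9 on an asset with criticality 0.8 gives a Risk_Index of 72, just past the urgent-action line of 70. A reading of 6.0 against a baseline of 2.0 and a failure threshold of 7.0 gives a PF_position of 0.80.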


Module G — Contradiction Driven Engineering (CDE)

When a machine keeps failing despite correct maintenance, the problem is in the design, not the execution. Module G identifies engineering contradictions — cases where two valid design requirements oppose each other, making failure inevitable — and proposes design-level resolutions.

Module G is the Track 3 capability described in Chapter 2. It is the highest-leverage intervention: instead of managing failure, it eliminates the conditions that make failure possible.

Module G runs only when at least one condition is met:

  • Recurrence flag: Same failure mode repeating despite corrective maintenance
  • Sustained alarm: Warning or alarm state sustained across multiple evaluation cycles
  • High economic impact: Failure cost threshold exceeded
  • Contradiction signal: Opposing trends in related parameters (e.g., bearing temperature falling while vibration rises — suggesting that a lubricant change solved the thermal problem but created a mechanical problem)
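The activation gate can be pictured as a logical OR over the four conditions. Everything in the sketch below is illustrative: the `AssetHistory` fields, the function name, and the default thresholds are assumptions, since the text does not give numeric values for "multiple evaluation cycles" or the failure cost threshold.

```python
from dataclasses import dataclass


@dataclass
class AssetHistory:
    repeat_failures_after_repair: int  # same failure mode recurring after corrective work
    cycles_in_alarm: int               # consecutive evaluation cycles in warning or alarm
    failure_cost: float                # accumulated cost of failures
    temperature_slope: float           # trend of bearing temperature
    vibration_slope: float             # trend of vibration amplitude


def should_run_module_g(h: AssetHistory,
                        min_alarm_cycles: int = 3,          # hypothetical value
                        cost_threshold: float = 50_000.0    # hypothetical value
                        ) -> bool:
    """Run Module G when at least one activation condition holds."""
    recurrence = h.repeat_failures_after_repair >= 1
    sustained_alarm = h.cycles_in_alarm >= min_alarm_cycles
    high_cost = h.failure_cost > cost_threshold
    # Contradiction signal: two related parameters trending in opposite
    # directions, e.g. temperature falling while vibration rises.
    contradiction = h.temperature_slope * h.vibration_slope < 0
    return recurrence or sustained_alarm or high_cost or contradiction
```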

| Type | Contradiction | Resolution Pattern |
| --- | --- | --- |
| CT01 | High load capacity vs. low vibration | Stiffness optimization, damping design |
| CT02 | Tight clearance vs. thermal growth accommodation | Thermal compensation design |
| CT03 | Precise alignment vs. thermal growth shifts | Hot alignment, growth compensation |
| CT04 | High speed vs. dynamic stability | Critical speed management |
| CT05 | Flow efficiency vs. cavitation avoidance | NPSH margin design |
| CT06 | Rigid foundation vs. vibration isolation | Tuned isolation mounts |
| CT07 | High stiffness vs. flexibility for thermal growth | Compliant coupling design |

| Tier | Strategy | Target | Example |
| --- | --- | --- | --- |
| 1 (Strategic) | Eliminate root initiators | Capital investment, design change | Upgrade bearing type, change material |
| 2 (Tactical) | Suppress accelerators | Procedural or parameter change | Reduce load, improve cooling path |
| 3 (Defensive) | Strengthen retarders | Enhance protection | Upgrade lubrication, add redundancy |

Module G proposes resolutions from seven proven engineering patterns:

  1. Material substitution — change the bearing material, coating, or surface treatment to resist the failure mechanism
  2. Geometry modification — change shaft diameter, overhang length, or clearances to alter the load distribution
  3. Operating point adjustment — change the speed, load, or flow to move away from the design contradiction zone
  4. Thermal management — add cooling, modify thermal growth allowances, or implement hot-alignment procedures
  5. Damping addition — add dampers, squeeze film bearings, or compliant mounts to absorb energy
  6. Redundancy — add backup systems, standby equipment, or parallel flow paths
  7. Process redesign — change the upstream or downstream process to remove the operating condition that creates the contradiction

Most predictive maintenance systems stop at “replace the bearing.” Module G asks why the bearing keeps failing and proposes changes that make the failure mode physically impossible.

Consider a cooling tower fan bearing that fails every 18 months due to thermal cycling (CT02). The clearance is set tight for low vibration at operating temperature, but during cold starts it is too tight and overloads the bearing with excessive preload. No amount of monitoring or maintenance can solve this, because it is a design contradiction between thermal growth and clearance specification.

Module G identifies the contradiction (tight clearance vs. thermal growth), classifies it as CT02, and proposes a resolution: specify a bearing with temperature-compensated clearance, or implement a slow warm-up protocol that limits start-up thermal gradients. The guiding equation: Root Cause + Design-Out = Zero Chronic Failures.


Reference Implementation: Diagnostic Engine

"""RAPID AI, Modules D-G: Engineering Intelligence
Diagnostic engine, RUL estimation, and design-out recommendation.
"""
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class DiagnosisResult:
    fault_code: str
    fault_name: str
    confidence: float
    severity: float
    contributing_rules: list[str]


@dataclass(frozen=True)
class RULEstimate:
    model: str  # "linear", "accelerating", "instability", or "weibull"
    rul_hours: float
    confidence: float
    hazard_probability: float


def propagate_confidence(rule_confidences: list[float], data_quality: float) -> float:
    """RAPID AI confidence propagation formula.

    C_final = Q_data × (1 − ∏(1 − C_i))
    """
    if not rule_confidences:
        return 0.0
    product = math.prod(1 - c for c in rule_confidences)
    return data_quality * (1 - product)


def effective_severity(fusion_severity: float, data_quality: float) -> float:
    """S_eff = S_fusion × Q_data"""
    return fusion_severity * data_quality


# --- Module F: RUL Estimation ---

def rul_linear(current_severity: float, slope: float, threshold: float = 0.80) -> RULEstimate:
    """F001: Linear degradation model.

    RUL = (threshold - current) / slope
    """
    # A flat or improving trend never reaches the threshold.
    if slope <= 0:
        return RULEstimate("linear", float('inf'), 0.3, 0.0)
    rul = (threshold - current_severity) / slope
    return RULEstimate("linear", max(0, rul), 0.7, 0.0)


def rul_accelerating(current_severity: float, slope: float,
                     acceleration: float, threshold: float = 0.80) -> RULEstimate:
    """F002: Accelerating degradation (quadratic).

    Uses quadratic formula to solve: current + slope*t + 0.5*accel*t^2 = threshold
    """
    a = 0.5 * acceleration
    b = slope
    c = current_severity - threshold
    discriminant = b**2 - 4*a*c
    # No real root, or no acceleration term: fall back to the linear model.
    if discriminant < 0 or a == 0:
        return rul_linear(current_severity, slope, threshold)
    # This root is the first threshold crossing for a rising trend.
    t = (-b + math.sqrt(discriminant)) / (2 * a)
    return RULEstimate("accelerating", max(0, t), 0.6, 0.0)


def rul_weibull_adjusted(beta_base: float, eta_base: float,
                         severity: float, ssi: float,
                         operating_hours: float,
                         r_target: float = 0.5) -> RULEstimate:
    """Condition-adjusted Weibull RUL.

    β_adj = β_base × (1 + 0.8 × S_eff)
    η_adj = η_base × (1 − 0.6 × SSI)
    Hazard: h(t) = (β/η)(t/η)^(β-1)

    r_target is the reliability level that defines end of life: the default
    0.5 gives the median life (50% failure probability), and 0.90 matches
    the R_target used in the RUL_weibull formula.
    """
    beta = beta_base * (1 + 0.8 * severity)
    eta = max(1, eta_base * (1 - 0.6 * ssi))
    # Approximate failure probability over the next hour from the hazard rate
    t = operating_hours
    hazard = (beta / eta) * (t / eta) ** (beta - 1) if t > 0 else 0
    prob = 1 - math.exp(-hazard)
    # RUL = time at which reliability drops to r_target, minus hours already run
    rul = eta * ((-math.log(r_target)) ** (1 / beta)) - t
    return RULEstimate("weibull", max(0, rul), 0.65, min(1, prob))


# Weibull coefficients per component type (from production thresholds)
WEIBULL_COEFFICIENTS = {
    "bearing": {"beta": 1.5, "eta": 50000},
    "gear": {"beta": 2.5, "eta": 80000},
    "motor": {"beta": 1.3, "eta": 70000},
    "coupling": {"beta": 2.0, "eta": 60000},
    "pump": {"beta": 1.8, "eta": 55000},
}

The Engineering Intelligence layer transforms diagnostic evidence into engineering decisions. The next chapter examines the Integrated Master Schema — the structure that connects every diagnostic decision to a specific sensor, rule, action, and dashboard output.


| Standard | Relevance to This Chapter |
| --- | --- |
| ISO 13374: Condition monitoring and diagnostics of machines | Module D implements ISO 13374 Level 5 (Prognostics) with health staging and RUL estimation. Module E implements Level 6 (Advisory Generation) with structured maintenance actions. |
| ISO 13381-1: Prognostics | Module D’s RUL estimation (linear, accelerating, NLI-adjusted models) and Module F’s Weibull reliability modeling directly implement ISO 13381-1’s prognostic methodology for remaining useful life estimation. |
| IEC 61649: Weibull analysis | Module F’s condition-adjusted Weibull parameters (beta_adj, eta_adj) implement IEC 61649’s two-parameter Weibull model, extended with real-time condition adjustment factors that bridge statistical reliability with sensor-based diagnostics. |
| ISO 17359: General guidelines for condition monitoring | Module E’s action catalog (ACT001-ACT015) and priority scoring implement ISO 17359’s guidance on translating condition monitoring results into maintenance decisions. |
| SAE JA1011/JA1012: RCM evaluation criteria | Module E’s six-tier strategy selection implements SAE JA1011’s consequence-driven maintenance task selection logic, from run-to-failure through redesign. |
| EN 13306: Maintenance terminology | The action catalog uses EN 13306-compliant terminology for maintenance types: corrective, condition-based, predetermined, and improvement maintenance. |

| Version | Date | Author | Changes |
| --- | --- | --- | --- |
| 2.1.0 | 2026-03-17 | Rick D | Added standards alignment, living doc metadata, changelog |
| 2.0.0 | 2026-03-17 | Rick D | Enriched with production codebase content |
| 1.0.0 | 2026-03-17 | Rick D | Initial chapter creation |
