Engineering Intelligence

Chapter 6 — Engineering Intelligence

Engineering Intelligence is the third and deepest layer in RAPID AI. It answers: What mechanism explains this, and what should we do? Four modules work together: Module D (Health Staging and Prognostics), Module E (Maintenance Actions), Module F (Weibull Reliability), and Module G (Contradiction Driven Engineering).

Module D — Health Staging and Prognostics

Purpose

Module D maps the fused evidence from Module C into specific failure mechanisms and estimates how much useful life remains. It translates system-level health scores into actionable engineering assessments.

Health Stage Rules (HSR001-HSR006)

Six rules classify the machine into health stages based on SSI and trend slope:

Rule	Condition	Health Stage	RUL Band	Multiplier
HSR001	SSI < 0.30	Healthy	> 6 months	1.0
HSR002	SSI 0.30-0.50	Degrading (early)	3-6 months	0.85
HSR003	SSI 0.50-0.60 AND slope > 0.02	Degrading (late)	1-3 months	0.60
HSR004	SSI 0.60-0.80	Unstable	1-4 weeks	0.35
HSR005	SSI >= 0.80	Critical	< 7 days	0.20
HSR006	SSI >= 0.80 AND slope > 0.05	Critical (imminent)	< 48 hours	0.05

The RUL multiplier applies to the raw RUL estimate from trend extrapolation, compressing it based on the health stage assessment. A machine at HSR004 (Unstable) has its trend-based RUL multiplied by 0.35 because experience shows that unstable machines degrade faster than linear extrapolation predicts.
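The table can be read as a first-match rule chain. The sketch below is one possible encoding, not the production rule engine. The function and type names are hypothetical, ranges are treated as half-open, HSR006 is checked before HSR005 because both match SSI >= 0.80, and the combination SSI 0.50-0.60 with slope <= 0.02 (which the table does not cover) is assumed to fall back to HSR002.

```python
from typing import NamedTuple


class HealthStage(NamedTuple):
    rule_id: str
    stage: str
    rul_band: str
    multiplier: float


def classify_health_stage(ssi: float, slope: float) -> HealthStage:
    """Match SSI and trend slope against the HSR001-HSR006 table."""
    if ssi >= 0.80:
        # Assumption: the more specific HSR006 wins over HSR005.
        if slope > 0.05:
            return HealthStage("HSR006", "Critical (imminent)", "< 48 hours", 0.05)
        return HealthStage("HSR005", "Critical", "< 7 days", 0.20)
    if ssi >= 0.60:
        return HealthStage("HSR004", "Unstable", "1-4 weeks", 0.35)
    if ssi >= 0.50 and slope > 0.02:
        return HealthStage("HSR003", "Degrading (late)", "1-3 months", 0.60)
    if ssi >= 0.30:
        # Assumption: SSI 0.50-0.60 with slope <= 0.02 is treated as early degrading.
        return HealthStage("HSR002", "Degrading (early)", "3-6 months", 0.85)
    return HealthStage("HSR001", "Healthy", "> 6 months", 1.0)


def staged_rul_days(raw_rul_days: float, ssi: float, slope: float) -> float:
    """Compress a raw trend-based RUL by the matched rule's multiplier."""
    return raw_rul_days * classify_health_stage(ssi, slope).multiplier
```

Under these assumptions, a raw trend estimate of 100 days at SSI = 0.70 matches HSR004 and is compressed to about 35 days.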

Slope Escalation

Slope-based overrides capture rapidly deteriorating conditions that the static SSI thresholds might miss:

Degrading + SSI_slope > 0.05 escalates to Unstable
Healthy + SSI_slope > 0.02 escalates to Degrading

This means a machine with SSI = 0.25 (nominally Healthy) whose SSI_slope exceeds 0.02 is reclassified as Degrading, triggering earlier investigation.
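A minimal sketch of the two override rules follows. The function name is hypothetical, "Degrading" is assumed to cover both the early and late stages, and at most one escalation step is applied per evaluation, since the text does not say whether the rules chain.

```python
def escalate_stage(stage: str, ssi_slope: float) -> str:
    """Apply the slope-based overrides to a statically assigned stage."""
    if stage == "Healthy" and ssi_slope > 0.02:
        return "Degrading"
    if stage == "Degrading" and ssi_slope > 0.05:
        return "Unstable"
    # Unstable and Critical stages are left unchanged by these two rules.
    return stage
```

With this encoding, `escalate_stage("Healthy", 0.03)` yields `"Degrading"`, matching the SSI = 0.25 example above.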

RUL Estimation

Module D computes remaining useful life using trend-based extrapolation:

RUL_linear = ln(failure_threshold / current_value) / slope_log
RUL_accel  = ln(failure_threshold / current_value) / (slope_log + slope_change)
RUL_adj    = RUL * (1 - NLI)     [instability adjustment when NLI >= 0.6]
rul_days   = raw_rul * rul_multiplier     [from the matched HSR rule]

The accelerating model is used when slope_change >= 0.01, which detects feedback-driven degradation. The NLI adjustment from Module B.2 penalizes unpredictable trends. The final rul_days value is capped at 3,650 days (10 years) and floored at 180 days for degrading machines with flat slopes.
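The formulas above can be chained as in the following sketch. It is an illustration under stated assumptions rather than the production code: the function name is hypothetical, slope_log is assumed to be expressed per day, a non-positive slope is assumed to return the 10-year cap, and the 180-day floor for degrading machines with flat slopes is omitted.

```python
import math

MAX_RUL_DAYS = 3650.0  # 10-year cap stated in the text


def estimate_rul_days(current_value: float, failure_threshold: float,
                      slope_log: float, slope_change: float = 0.0,
                      nli: float = 0.0, rul_multiplier: float = 1.0) -> float:
    """Trend-based RUL: log extrapolation, NLI adjustment, HSR multiplier, cap."""
    if current_value >= failure_threshold:
        return 0.0  # already at or past the failure threshold
    log_gap = math.log(failure_threshold / current_value)
    # Accelerating model applies when slope_change >= 0.01.
    slope = slope_log + slope_change if slope_change >= 0.01 else slope_log
    if slope <= 0:
        return MAX_RUL_DAYS  # assumption: no upward trend, report the cap
    raw_rul = log_gap / slope
    # Instability adjustment applies when NLI >= 0.6.
    if nli >= 0.6:
        raw_rul *= (1.0 - nli)
    return min(raw_rul * rul_multiplier, MAX_RUL_DAYS)
```

For a reading of 2.0 against a threshold of 8.0 with slope_log = 0.02, the linear estimate is ln(4) / 0.02, or roughly 69 days, and a slope_change of 0.02 halves it.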

Mechanism Candidate Matching

Module D matches the evidence pattern against known failure mechanisms:

Route	Evidence Required
Bearing fault	BPFO/BPFI harmonics + kurtosis rise + envelope peaks
Looseness	Sub-harmonics + many harmonics + V/H ratio anomaly
Rub	Fractional sub-harmonics + truncated waveform
Cavitation	Broadband noise + random peaks + flow correlation
Gear mesh deterioration	Mesh frequency sidebands + amplitude modulation
Structural resonance	Fixed frequency independent of speed + amplification

Each candidate receives a confidence score based on how many evidence sources align. Multiple candidates can be active simultaneously, ranked by confidence. The Integrated Master Schema (IMS, Chapter 7) provides the authoritative mapping from mechanism to maintenance action.
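One simple way to picture the matching step is to score each mechanism by the fraction of its required evidence that is present. The sketch below assumes exactly that scoring rule, which the text does not specify, and the evidence keys and function name are hypothetical labels for the table entries.

```python
# Hypothetical evidence keys derived from the mechanism table.
MECHANISM_EVIDENCE = {
    "Bearing fault": {"bpfo_bpfi_harmonics", "kurtosis_rise", "envelope_peaks"},
    "Looseness": {"sub_harmonics", "many_harmonics", "vh_ratio_anomaly"},
    "Rub": {"fractional_sub_harmonics", "truncated_waveform"},
    "Cavitation": {"broadband_noise", "random_peaks", "flow_correlation"},
    "Gear mesh deterioration": {"mesh_frequency_sidebands", "amplitude_modulation"},
    "Structural resonance": {"fixed_frequency", "amplification"},
}


def rank_mechanisms(observed: set[str]) -> list[tuple[str, float]]:
    """Return active candidates sorted by confidence, highest first."""
    ranked = []
    for name, required in MECHANISM_EVIDENCE.items():
        # Assumption: confidence is the share of required evidence observed.
        confidence = len(required & observed) / len(required)
        if confidence > 0:
            ranked.append((name, confidence))
    return sorted(ranked, key=lambda item: item[1], reverse=True)
```

A machine showing all three bearing indicators plus broadband noise would rank Bearing fault first at full confidence, with Cavitation as a weaker second candidate.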

Module E — Maintenance Actions

Purpose

Module E converts diagnostic intelligence into maintenance decisions. It answers: What should we do, and how urgently?

Priority Score

P = 100 * (0.45 * S + 0.25 * C + 0.20 * K + 0.10 * U) * M_safe * R_sp * R_mp

Component	Weight	Description
S	0.45	Signal severity score (0-1)
C	0.25	Fault confidence (0-1)
K	0.20	Asset criticality from Module 0 (0-1)
U	0.10	Trend urgency from Module B.2 (0-1)
M_safe	1.5 or 1.0	Safety multiplier
R_sp	0.7 or 1.0	Spares availability
R_mp	0.7 or 1.0	Manpower availability

The safety multiplier (1.5x) can push the priority score above 100 (up to 150), reflecting that safety-critical assets deserve the highest urgency regardless of other factors. The spares and manpower modifiers reduce priority when the intervention cannot proceed. The fault is no less serious, but issuing an urgent work order for a repair that cannot be performed wastes operational attention.
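The priority formula is easy to check by hand. The sketch below is a direct transcription with hypothetical parameter names. It assumes the 1.5 multiplier applies to safety-critical assets and the 0.7 modifiers apply when spares or manpower are unavailable, as the paragraph above describes.

```python
def priority_score(s: float, c: float, k: float, u: float,
                   safety_critical: bool = False,
                   spares_available: bool = True,
                   manpower_available: bool = True) -> float:
    """P = 100 * (0.45*S + 0.25*C + 0.20*K + 0.10*U) * M_safe * R_sp * R_mp."""
    base = 0.45 * s + 0.25 * c + 0.20 * k + 0.10 * u
    m_safe = 1.5 if safety_critical else 1.0
    r_sp = 1.0 if spares_available else 0.7
    r_mp = 1.0 if manpower_available else 0.7
    return 100.0 * base * m_safe * r_sp * r_mp
```

With S = 0.8, C = 0.9, K = 0.7, and U = 0.6 the weighted sum is 0.785, so P is 78.5. The same fault on a safety-critical asset scores 117.75, and with spares unavailable it drops to 54.95.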

Priority Windows

Score	Response
>= 85	Immediate — Emergency shutdown if needed; immediate repair
>= 70	24 hours — Plan urgent intervention
>= 50	7 days — Schedule in next available window
< 50	Next shutdown — Add to planned outage scope
(process_corr >= 0.75)	Process-driven — Adjust operations, not maintenance

The process-driven category is critical for avoiding unnecessary maintenance interventions. When Module C identifies that the dominant contributor has high process correlation (BSR004), Module E redirects the response from maintenance to operations. This prevents the costly error of replacing a bearing that is vibrating due to upstream process instability.
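The window table can be sketched as a threshold ladder. The function name is hypothetical, and the process-correlation check is assumed to take precedence over the score, which is how the redirect described above would behave.

```python
def response_window(score: float, process_corr: float = 0.0) -> str:
    """Map a priority score to a response window from the table."""
    # Assumption: a process-driven finding overrides the score-based windows.
    if process_corr >= 0.75:
        return "Process-driven"
    if score >= 85:
        return "Immediate"
    if score >= 70:
        return "24 hours"
    if score >= 50:
        return "7 days"
    return "Next shutdown"
```

A score of 78.5 lands in the 24-hour window, while the same score with process_corr = 0.80 is routed to operations instead.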

Action Catalog (ACT001-ACT015)

Action ID	Title	Trigger
ACT001	Confirmation measurement run	Post-maintenance verification
ACT002	Lubrication service	Bearing stress, temperature rise
ACT003	Alignment correction	Misalignment detection
ACT004	Balance correction	Imbalance detection
ACT005	Bearing replacement (planned)	Progressive bearing damage
ACT006	Foundation inspection/correction	Looseness, structural issues
ACT007	Investigate electrical	Motor electrical faults
ACT008	Emergency shutdown	P >= 85, imminent failure
ACT009	Gear inspection	Gear mesh deterioration
ACT010	Flow correction	Cavitation, recirculation
ACT011	Coupling replacement	Coupling wear/backlash
ACT012	Seal replacement	Seal leakage/wear

Every action traces back through the justification chain to the physics that triggered it. The dashboard displays not just the action but the evidence: “ACT003 — Alignment correction. Triggered by AFB07: A/H > 1.3 with 2x peak at 1.5x baseline. Misalignment is transferring axial force through the coupling, overloading the DE bearing.”

RCM Strategy Selection

Module E selects from six strategy tiers based on the failure pattern, consequence class, and detection capability:

Strategy	When Applied
Run-to-failure	Hidden function, no safety consequence, cheap replacement
Time-based replacement	Known wear-out pattern (Nowlan A/B), predictable life
Condition-based monitoring	Random/complex failure (D/E/F patterns, 89% of cases in the Nowlan and Heap study)
Predictive with AI	High-value asset, sufficient historical data
Redesign	Chronic failure, contradiction detected
Operational change	Process-driven, environment-driven

The full RCM decision framework is detailed in Chapter 9.

Module F — Weibull Reliability

Purpose

Module F provides statistically grounded remaining useful life estimates by combining population Weibull statistics with real-time condition data. This is the bridge between statistical reliability engineering (how long do these components typically last?) and condition-based diagnostics (how is this specific machine doing right now?).

Condition-Adjusted Weibull Parameters

The key innovation is adjusting population-level Weibull parameters based on real-time condition evidence:

beta_adj = beta_base * (1 + 0.8 * S_eff)      [alpha_severity = 0.8]
eta_adj  = eta_base  * (1 - 0.6 * SSI)         [gamma_degradation = 0.6]

beta_adj (shape): As severity increases, beta increases. A higher beta steepens the hazard curve, meaning the machine transitions from random failure (beta ~ 1.0) to predictable wear-out (beta > 2.0). This reflects the physics: a machine with active degradation is no longer failing randomly — it is on a deterministic degradation path.
eta_adj (scale): As SSI increases (more degradation), eta decreases. A lower eta compresses the characteristic life, meaning the machine’s effective remaining population life is shorter than the population average. A machine with SSI = 0.80 has its characteristic life reduced to 52% of the population value.
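As a worked illustration of the two adjustment formulas, the sketch below applies them to the bearing base parameters. The function name is hypothetical, and the S_eff value in the usage note is chosen only for the example.

```python
def adjust_weibull(beta_base: float, eta_base: float,
                   s_eff: float, ssi: float,
                   alpha_severity: float = 0.8,
                   gamma_degradation: float = 0.6) -> tuple[float, float]:
    """Return (beta_adj, eta_adj) from population parameters and condition data."""
    beta_adj = beta_base * (1 + alpha_severity * s_eff)
    eta_adj = eta_base * (1 - gamma_degradation * ssi)
    return beta_adj, eta_adj
```

For a bearing (beta = 1.5, eta = 50,000 h) with S_eff = 0.5 and SSI = 0.80, this gives beta_adj of about 2.1 and eta_adj of about 26,000 h, which is the 52% reduction described above.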

Component Base Parameters

Component	beta (shape)	eta (scale, hours)	Applies To
Bearing	1.5	50,000	AFB, journal, TPJB
Gear	2.5	80,000	Gears
Motor	1.3	70,000	AC motor, DC motor
Generic	1.5	50,000	Coupling, foundation, fluid_flow, belts, chains, shafts

Three Reliability Models (F001-F003)

Module F provides three complementary outputs:

F001 — Weibull RUL:

RUL_weibull = eta_adj * (-ln(R_target))^(1/beta_adj) - t_current
R_target = 0.90 (90% reliability target, configurable)
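A minimal sketch of the F001 calculation follows. The function name is hypothetical, and the result is clamped at zero as an added assumption, because the formula itself goes negative once t_current exceeds the age at which reliability reaches R_target.

```python
import math


def weibull_rul_hours(beta_adj: float, eta_adj: float,
                      t_current: float, r_target: float = 0.90) -> float:
    """Hours until reliability falls to r_target, minus hours already run."""
    # Age at which R(t) = r_target for a two-parameter Weibull.
    t_target = eta_adj * (-math.log(r_target)) ** (1.0 / beta_adj)
    # Assumption: report zero rather than a negative RUL.
    return max(0.0, t_target - t_current)
```

Lowering r_target (accepting more risk) lengthens the estimate, so the 0.90 default is the conservative end of the configurable range.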

F002 — 30-Day Failure Probability:

P_30 = [R(t) - R(t + 720h)] / R(t)

This answers the maintenance planner’s question: “What is the chance this machine fails in the next month?”
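The F002 expression is a conditional probability: the chance of failing in the next 720 hours given survival to the current age. The sketch below evaluates it with the two-parameter Weibull reliability function. The function names are hypothetical, and the guard for a vanishing R(t) is an added assumption.

```python
import math


def reliability(t: float, beta: float, eta: float) -> float:
    """Two-parameter Weibull reliability R(t) = exp(-(t/eta)^beta)."""
    return math.exp(-((t / eta) ** beta))


def failure_probability_30d(t: float, beta: float, eta: float,
                            horizon_hours: float = 720.0) -> float:
    """P_30 = [R(t) - R(t + 720h)] / R(t)."""
    r_now = reliability(t, beta, eta)
    if r_now <= 0.0:
        return 1.0  # assumption: survival probability has underflowed to zero
    return (r_now - reliability(t + horizon_hours, beta, eta)) / r_now
```

With beta = 1 the result does not depend on age, as expected for random failure. With beta > 1 the same 30-day window becomes riskier as operating hours accumulate.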

F003 — Risk Index:

Risk_Index = 100 * severity * criticality     clamped to [0, 100]

Risk_Index > 70 warrants urgent action. Risk_Index combines the diagnostic assessment (severity) with the business context (criticality) into a single prioritization metric.
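For completeness, the risk index can be sketched in a few lines. The function names are hypothetical, and severity and criticality are assumed to be on a 0-1 scale.

```python
def risk_index(severity: float, criticality: float) -> float:
    """Risk_Index = 100 * severity * criticality, clamped to [0, 100]."""
    return max(0.0, min(100.0, 100.0 * severity * criticality))


def is_urgent(severity: float, criticality: float) -> bool:
    """Risk_Index above 70 warrants urgent action."""
    return risk_index(severity, criticality) > 70.0
```

A severity of 0.9 on an asset with criticality 0.85 gives an index of 76.5, which crosses the urgent threshold.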

P-F Interval Position

PF_position = (current_value - baseline) / (failure_threshold - baseline)

PF_position maps the machine’s current state onto the P-F interval, the window between the point where a failure becomes detectable (P) and the point of functional failure (F). PF_position > 0.80 means the machine is more than 80% of the way from detection to failure, leaving less than 20% of the intervention window.
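A short worked example of the position formula is sketched below. The function name is hypothetical, and the values in the usage note are illustrative rather than taken from any specific standard or asset.

```python
def pf_position(current_value: float, baseline: float,
                failure_threshold: float) -> float:
    """Fraction of the P-F interval already consumed."""
    span = failure_threshold - baseline
    if span <= 0:
        raise ValueError("failure_threshold must exceed baseline")
    return (current_value - baseline) / span
```

With a baseline of 2.0, a failure threshold of 10.0, and a current reading of 8.8, the position is 6.8 / 8.0 = 0.85, so about 15% of the intervention window remains.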

Module G — Contradiction Driven Engineering (CDE)

Purpose

When a machine keeps failing despite correct maintenance, the problem is in the design, not the execution. Module G identifies engineering contradictions — cases where two valid design requirements oppose each other, making failure inevitable — and proposes design-level resolutions.

Module G is the Track 3 capability described in Chapter 2. It is the highest-leverage intervention: instead of managing failure, it eliminates the conditions that make failure possible.

Trigger Conditions

Module G runs only when at least one condition is met:

Recurrence flag: Same failure mode repeating despite corrective maintenance
Sustained alarm: Warning or alarm state sustained across multiple evaluation cycles
High economic impact: Failure cost threshold exceeded
Contradiction signal: Opposing trends in related parameters (e.g., bearing temperature falling while vibration rises — suggesting that a lubricant change solved the thermal problem but created a mechanical problem)
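The gating logic can be sketched as a simple any-of check. All names below are hypothetical, and how each flag is derived (recurrence counting, alarm duration, cost threshold) is left outside the sketch because the text does not specify it. The opposing-trend helper assumes a sign comparison of two trend slopes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TriggerState:
    recurrence_flag: bool = False
    sustained_alarm: bool = False
    high_economic_impact: bool = False
    contradiction_signal: bool = False


def opposing_trends(slope_a: float, slope_b: float) -> bool:
    """True when two related parameters trend in opposite directions."""
    return (slope_a > 0 and slope_b < 0) or (slope_a < 0 and slope_b > 0)


def should_run_module_g(state: TriggerState) -> bool:
    """Module G runs only when at least one trigger condition is met."""
    return any((state.recurrence_flag, state.sustained_alarm,
                state.high_economic_impact, state.contradiction_signal))
```

In the lubricant example, a falling bearing temperature slope and a rising vibration slope would set contradiction_signal through opposing_trends, which is enough on its own to start Module G.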

Eight Contradiction Types

Type	Contradiction	Resolution Pattern
CT01	High load capacity vs. low vibration	Stiffness optimization, damping design
CT02	Tight clearance vs. thermal growth accommodation	Thermal compensation design
CT03	Precise alignment vs. thermal growth shifts	Hot alignment, growth compensation
CT04	High speed vs. dynamic stability	Critical speed management
CT05	Flow efficiency vs. cavitation avoidance	NPSH margin design
CT06	Rigid foundation vs. vibration isolation	Tuned isolation mounts
CT07	High stiffness vs. flexibility for thermal growth	Compliant coupling design

Resolution Hierarchy (from IAR Model)

Tier	Strategy	Target	Example
1 — Strategic	Eliminate root initiators	Capital investment, design change	Upgrade bearing type, change material
2 — Tactical	Suppress accelerators	Procedural or parameter change	Reduce load, improve cooling path
3 — Defensive	Strengthen retarders	Enhance protection	Upgrade lubrication, add redundancy

Seven Resolution Patterns

Module G proposes resolutions from seven proven engineering patterns:

Material substitution — change the bearing material, coating, or surface treatment to resist the failure mechanism
Geometry modification — change shaft diameter, overhang length, or clearances to alter the load distribution
Operating point adjustment — change the speed, load, or flow to move away from the design contradiction zone
Thermal management — add cooling, modify thermal growth allowances, or implement hot-alignment procedures
Damping addition — add dampers, squeeze film bearings, or compliant mounts to absorb energy
Redundancy — add backup systems, standby equipment, or parallel flow paths
Process redesign — change the upstream or downstream process to remove the operating condition that creates the contradiction

The Track 3 Value Proposition

Most predictive maintenance systems stop at “replace the bearing.” Module G asks why the bearing keeps failing and proposes changes that make the failure mode physically impossible.

Consider a cooling tower fan bearing that fails every 18 months due to thermal cycling (CT02). The clearance is set tight for low vibration at operating temperature, but during cold starts it becomes too tight and overloads the bearing. No amount of monitoring or maintenance can solve this, because it is a design contradiction between thermal growth and clearance specification.

Module G identifies the contradiction (tight clearance vs. thermal growth), classifies it as CT02, and proposes resolution: specify a bearing with temperature-compensated clearance, or implement a slow warm-up protocol that limits start-up thermal gradients. The equation: Root Cause + Design-Out = Zero Chronic Failures.

Reference Implementation: Diagnostic Engine

"""RAPID AI — Modules D-G: Engineering Intelligence
Diagnostic engine, RUL estimation, and design-out recommendation.
"""
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class DiagnosisResult:
    fault_code: str
    fault_name: str
    confidence: float
    severity: float
    contributing_rules: list[str]

@dataclass(frozen=True)
class RULEstimate:
    model: str        # "linear", "accelerating", "instability", or "weibull"
    rul_hours: float
    confidence: float
    hazard_probability: float

def propagate_confidence(rule_confidences: list[float], data_quality: float) -> float:
    """RAPID AI confidence propagation formula.
    C_final = Q_data × (1 − ∏(1 − C_i))
    """
    if not rule_confidences:
        return 0.0
    product = math.prod(1 - c for c in rule_confidences)
    return data_quality * (1 - product)

def effective_severity(fusion_severity: float, data_quality: float) -> float:
    """S_eff = S_fusion × Q_data"""
    return fusion_severity * data_quality

# --- RUL Estimation (trend models from Module D, Weibull model from Module F) ---
def rul_linear(current_severity: float, slope: float, threshold: float = 0.80) -> RULEstimate:
    """Trend model: linear degradation.
    RUL = (threshold - current) / slope
    """
    if slope <= 0:
        return RULEstimate("linear", float('inf'), 0.3, 0.0)
    rul = (threshold - current_severity) / slope
    return RULEstimate("linear", max(0, rul), 0.7, 0.0)

def rul_accelerating(current_severity: float, slope: float,
                     acceleration: float, threshold: float = 0.80) -> RULEstimate:
    """Trend model: accelerating degradation (quadratic).
    Uses quadratic formula to solve: current + slope*t + 0.5*accel*t^2 = threshold
    """
    a = 0.5 * acceleration
    b = slope
    c = current_severity - threshold
    discriminant = b**2 - 4*a*c
    if discriminant < 0 or a == 0:
        return rul_linear(current_severity, slope, threshold)
    t = (-b + math.sqrt(discriminant)) / (2 * a)
    return RULEstimate("accelerating", max(0, t), 0.6, 0.0)

def rul_weibull_adjusted(beta_base: float, eta_base: float,
                         severity: float, ssi: float,
                         operating_hours: float,
                         r_target: float = 0.90,
                         horizon_hours: float = 720.0) -> RULEstimate:
    """Condition-adjusted Weibull RUL and horizon failure probability.
    β_adj = β_base × (1 + 0.8 × S_eff)
    η_adj = η_base × (1 − 0.6 × SSI)
    R(t)  = exp(−(t/η_adj)^β_adj)
    """
    beta = beta_base * (1 + 0.8 * severity)
    eta = max(1, eta_base * (1 - 0.6 * ssi))
    t = max(0.0, operating_hours)
    # Conditional probability of failure within the next horizon_hours:
    # [R(t) - R(t + h)] / R(t), with h = 720 h for the 30-day case
    r_now = math.exp(-((t / eta) ** beta))
    r_next = math.exp(-(((t + horizon_hours) / eta) ** beta))
    prob = (r_now - r_next) / r_now if r_now > 0 else 1.0
    # Age at which reliability falls to r_target (0.90 by default), minus current age
    rul = eta * (-math.log(r_target)) ** (1 / beta) - t
    return RULEstimate("weibull", max(0, rul), 0.65, min(1, prob))

# Weibull coefficients per component type (from production thresholds)
WEIBULL_COEFFICIENTS = {
    "bearing":  {"beta": 1.5, "eta": 50000},
    "gear":     {"beta": 2.5, "eta": 80000},
    "motor":    {"beta": 1.3, "eta": 70000},
    "coupling": {"beta": 2.0, "eta": 60000},
    "pump":     {"beta": 1.8, "eta": 55000},
}

The Engineering Intelligence layer transforms diagnostic evidence into engineering decisions. The next chapter examines the Integrated Master Schema — the structure that connects every diagnostic decision to a specific sensor, rule, action, and dashboard output.

Standards Alignment

Standard	Relevance to This Chapter
ISO 13374 — Condition monitoring and diagnostics of machines	Module D implements ISO 13374 Level 5 (Prognostics) with health staging and RUL estimation. Module E implements Level 6 (Advisory Generation) with structured maintenance actions.
ISO 13381-1 — Prognostics	Module D’s RUL estimation (linear, accelerating, NLI-adjusted models) and Module F’s Weibull reliability modeling directly implement ISO 13381-1’s prognostic methodology for remaining useful life estimation.
IEC 61649 — Weibull analysis	Module F’s condition-adjusted Weibull parameters (beta_adj, eta_adj) implement IEC 61649’s two-parameter Weibull model, extended with real-time condition adjustment factors that bridge statistical reliability with sensor-based diagnostics.
ISO 17359 — General guidelines for condition monitoring	Module E’s action catalog (ACT001-ACT015) and priority scoring implement ISO 17359’s guidance on translating condition monitoring results into maintenance decisions.
SAE JA1011/JA1012 — RCM evaluation criteria	Module E’s six-tier strategy selection implements SAE JA1011’s consequence-driven maintenance task selection logic, from run-to-failure through redesign.
EN 13306 — Maintenance terminology	The action catalog uses EN 13306-compliant terminology for maintenance types: corrective, condition-based, predetermined, and improvement maintenance.

Changelog

Version	Date	Author	Changes
2.1.0	2026-03-17	Rick D	Added standards alignment, living doc metadata, changelog
2.0.0	2026-03-17	Rick D	Enriched with production codebase content
1.0.0	2026-03-17	Rick D	Initial chapter creation
