AI Implementation

Chapter 18 — AI Implementation

18.1 The Hybrid AI Architecture

RAPID AI is not a machine learning system that learned to predict failures from historical data. It is a physics-informed hybrid intelligence platform where three distinct layers of technology each contribute what they do best.

Layer 1: Physics-Based Rules (the backbone). The foundation of every diagnostic decision is a library of 451+ deterministic rules encoding Dibyendu De’s 28 years of field experience across 4,000+ validated cases. These rules are parseable boolean expressions evaluated against live sensor data by a recursive-descent parser (rule_evaluator.py). A rule like A/H > 1.2 AND V/H < 0.8 fires when axial vibration dominates and vertical is suppressed — the physics signature of excessive bearing preload (AFB01). No training data. No weight tuning. No gradient descent. The rule fires because the physics says it should, and it can be read, audited, and challenged by any reliability engineer who understands bearing mechanics.

The rule library spans 12 component categories (anti-friction bearings, journal bearings, tilting-pad bearings, couplings, AC motors, DC motors, gears, foundations, belts, chains, fluid/flow, shafts), plus 300 imperfection rules across 8 equipment types, plus 100 IMS evidence patterns that map complete diagnostic chains from sensor input to maintenance action. Every rule carries a severity weight, a physics basis explanation, and a recommended corrective action. This is not a model — it is an encoded engineering textbook.

Layer 2: LLM-Augmented Reasoning (the intelligence layer). Large language models sit on top of the rule engine. They do not make diagnostic decisions. They translate diagnostic decisions into natural language that reliability engineers, maintenance planners, and plant managers can understand and act on. When the rule engine determines that pump P-101 has entered the Unstable state because BPFO harmonics are rising with accelerating trend slope and the SEDL entropy analysis shows spectral energy dispersing, the LLM synthesizes that into a paragraph a human can read in thirty seconds and make a decision from.

The LLM also handles ambiguous queries. When an engineer asks “what’s going on with the cooling water system,” the system must identify which assets belong to that system, retrieve their current diagnostic states, and compose a coherent multi-asset summary. Formal rule evaluation cannot do this — it operates on structured sensor data, not natural language questions. The LLM bridges the gap between how engineers think (in systems, relationships, and plain language) and how the diagnostic engine operates (in numeric thresholds, boolean logic, and state machines).

Layer 3: Vector Similarity Search (the retrieval layer). PostgreSQL with pgvector stores 768-dimensional embeddings of rule descriptions, IMS evidence patterns, FRETTLSM factor descriptions, and imperfection rule catalogs. When a query arrives — whether from a user or from the LLM context assembly pipeline — semantic similarity search retrieves the most relevant rules and patterns, even when the terminology does not exactly match. An engineer asking about “pump cavitation” retrieves IMS001 (cavitation) but also surfaces FL03 (recirculation) and FL05 (vane pass frequency), because the embedding space captures the semantic proximity of these hydraulic failure modes.

This three-layer architecture is what makes RAPID AI a “Physics-Informed AI” system in the formal sense. Physics provides the constraints, structure, and explainability. AI provides the flexibility, communication, and semantic understanding. Neither layer is sufficient alone. A rule engine without LLM augmentation produces machine-readable output that requires expert interpretation. An LLM without physics rules hallucinates diagnostic conclusions. Together, they form a system where every AI-generated explanation is grounded in deterministic physics, and every physics result is communicated in a form humans can act on.

18.2 The Diagnostic Copilot

The centerpiece AI feature is a conversational interface where engineers interact with the diagnostic engine in natural language. The copilot does not replace the engineer’s judgment — it amplifies it by making the full depth of the diagnostic pipeline accessible through conversation.

Example interaction:

An engineer types: “Why is pump P-101 in S3?”

The system executes a chain of operations behind the question:

Entity extraction identifies P-101 as an asset ID and S3 as an AESF stability state (Diffuse Disorder).
The current diagnostic state is retrieved: SSI = 0.72, System_State = warning, AESF state S3 with SI = 32, EI = 68.
The matched rules are gathered: AFB09 (bearing outer race defect), Module B.2 trend class “accelerating” with slope = 0.038, Module B.3 entropy state “Destabilizing” with dSE/dt = 0.024.
The FRETTLSM causal analysis results are retrieved: top category T1 (Tribological), top factor T108 (Contaminated lubricant, confidence 0.75), secondary factor T106 (Surface fatigue, confidence 0.60).
The IMS row is identified: IMS row mapping to bearing outer race defect with CBM strategy and 7-day action window.
Module E priority score is computed: P = 73 (24-hour priority window), recommended actions ACT005 (bearing replacement, planned) and ACT002 (lubrication service).

The copilot assembles this evidence into a structured prompt and the LLM generates:

“Pump P-101 entered S3 (Diffuse Disorder) because bearing outer race defect frequency (BPFO) has been rising with an accelerating trend for 14 days. The SEDL entropy analysis shows spectral energy dispersing from concentrated lines into broadband noise — spectral entropy has increased from 0.38 to 0.62 over the past three evaluation cycles. Temperature at the drive-end bearing has risen from 62C baseline to 78C. The FRETTLSM analysis identifies tribological degradation (T108: contaminated lubricant, confidence 0.75) as the most likely initiator, with surface fatigue (T106, confidence 0.60) as a secondary contributor. Priority score is 73, placing this in the 24-hour action window. Recommended: schedule bearing replacement within 7 days. Before replacement, take an oil sample to confirm the contamination hypothesis. [Citations: AFB09, BSR001, SR03, T108, T106, IMS-bearing-OR, ACT005, ACT002]”

Design decisions that make this work:

The LLM never invents diagnostic conclusions. Every statement in the copilot’s response corresponds to a specific output from the deterministic pipeline. The bearing defect was identified by rule AFB09, not by the LLM. The trend classification came from Module B.2, not from the LLM. The causal hypothesis came from the FRETTLSM engine’s keyword matching against the 88-factor taxonomy. The LLM’s job is to weave these facts into coherent prose — not to generate new facts.

Every response includes citations. The citation block at the end of each response lists the specific rule IDs, IMS rows, sensor readings, and module outputs that support each claim. An engineer who questions the recommendation can trace any statement back to its source. This is not optional formatting — it is enforced by the structured output schema.

Confidence scores are computed by the rule engine, not the LLM. When the copilot reports “confidence 0.75,” that number came from confidence.py — the canonical confidence scoring module that maps qualitative assessments to numeric values through a defined hierarchy (CONFIDENCE_HIGH = 0.85, CONFIDENCE_MEDIUM_HIGH = 0.75, CONFIDENCE_MEDIUM = 0.60, CONFIDENCE_LOW = 0.40, CONFIDENCE_INSUFFICIENT = 0.00). The LLM cannot inflate or deflate confidence. It reports what the engine computed.

The LLM is a translator, not an analyst. This is the most important architectural boundary. The diagnostic engine does the engineering. The LLM does the communication. Cross this boundary and you get a system that sounds authoritative while being unreliable. Maintain it and you get a system that is as trustworthy as the physics it is built on.

18.3 RAG (Retrieval-Augmented Generation) Architecture

The copilot does not send the entire rule library to the LLM with every query. It uses Retrieval-Augmented Generation to select only the relevant context for each question, keeping token usage manageable while maintaining diagnostic precision.

Step 1: Embedding Pipeline

At system startup and on reference data updates, the embedding pipeline processes the complete knowledge base into pgvector:

Knowledge Source	Count	What Gets Embedded
Initiator rule descriptions	119	Physics basis, severity cascade, diagnosis label, recommended action
Imperfection rule descriptions	300	Evaluation logic, engineering reason, recommended action
IMS evidence patterns	100	Failure mode, sensor evidence logic, RCM decision, maintenance task
FRETTLSM factor descriptions	88	Factor name, description, keyword set, IAR classification
Action catalog entries	15	Action title, trigger conditions, procedure summary
Equipment type profiles	19	Asset class, subsystem types, typical failure modes

Each item is embedded as a 768-dimensional vector using a domain-tuned embedding model. The embedding captures semantic meaning — “bearing outer race spalling” and “rolling element surface fatigue” land near each other in the vector space even though they share no keywords.

Step 2: Query Processing

When a user question arrives, the system performs entity extraction and query decomposition:

Asset identification: Extract asset IDs (P-101), asset types (pump, motor), and system names (cooling water) from the natural language query.
Failure mode extraction: Identify mentioned failure modes, symptoms, or conditions (cavitation, high vibration, temperature rise).
Module identification: Determine which pipeline modules are relevant (stability state implies AESF, trend implies B.2, cause implies FRETTLSM).
Query embedding: Embed the processed query for semantic retrieval.
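The first three extraction steps can be sketched with plain pattern matching. This is a minimal illustration, not the production extractor: the asset tag pattern, the vocabularies, and the names `parse_query` and `ParsedQuery` are assumptions made for the example, and the embedding step is omitted.

```python
import re
from dataclasses import dataclass, field

# Illustrative patterns and vocabularies only (assumptions, not the real catalog).
ASSET_ID_RE = re.compile(r"\b[A-Z]{1,3}-\d{2,4}[A-Z]?\b")   # e.g. P-101, P-101B
AESF_STATE_RE = re.compile(r"\bS\d\b")                        # e.g. S3
ASSET_TYPES = ("pump", "motor", "fan", "compressor")
FAILURE_TERMS = ("cavitation", "high vibration", "temperature rise")
MODULE_HINTS = {
    "AESF": ("stability", "state"),
    "B.2": ("trend",),
    "FRETTLSM": ("cause",),
}


@dataclass
class ParsedQuery:
    asset_ids: list[str] = field(default_factory=list)
    asset_types: list[str] = field(default_factory=list)
    failure_modes: list[str] = field(default_factory=list)
    modules: list[str] = field(default_factory=list)


def parse_query(text: str) -> ParsedQuery:
    """Extract asset IDs, asset types, failure modes, and relevant modules."""
    lowered = text.lower()
    modules = [m for m, hints in MODULE_HINTS.items()
               if any(h in lowered for h in hints)]
    # A literal stability state such as "S3" also implies the AESF module.
    if AESF_STATE_RE.search(text) and "AESF" not in modules:
        modules.append("AESF")
    return ParsedQuery(
        asset_ids=ASSET_ID_RE.findall(text),
        asset_types=[t for t in ASSET_TYPES if t in lowered],
        failure_modes=[f for f in FAILURE_TERMS if f in lowered],
        modules=modules,
    )


parsed = parse_query("Why is pump P-101 in S3?")
```

Substring matching of this kind is deliberately crude. It would miss system names such as "cooling water", which need a lookup against the asset hierarchy.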

Step 3: Context Assembly

The retrieval pipeline combines three strategies to build the LLM context window:

Deterministic retrieval: If an asset ID is specified, load its current diagnostic state directly from the database — SSI, System_State, matched rules, trend data, AESF indices, FRETTLSM hypotheses, Module E priority score and actions. This is not semantic search; it is a direct database lookup of the most recent pipeline evaluation.

Vector similarity retrieval: Embed the query and retrieve the top-K most similar rules, IMS rows, and FRETTLSM factors from pgvector using cosine similarity. K is typically 10-15, tuned to stay within token budget while capturing the relevant diagnostic neighborhood. A similarity threshold of 0.70 filters out low-relevance matches.

Keyword filter overlay: Narrow the vector results using structured filters — asset type, component category, severity level. A query about pump cavitation should not return gearbox mesh frequency rules, even if they are semantically adjacent in the embedding space.
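The vector retrieval and keyword overlay can be sketched in memory as follows. In a deployment the ranking would be pushed down to pgvector (whose `<=>` operator returns cosine distance, that is, one minus the similarity). The `KnowledgeItem` type, the `retrieve` function, the toy three-dimensional vectors, and the `GEAR-X` identifier are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class KnowledgeItem:
    item_id: str
    component: str                 # structured metadata used by the keyword filter
    embedding: tuple[float, ...]


def cosine_similarity(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, items, *, top_k=10, min_similarity=0.70, components=None):
    """Return up to top_k (item_id, similarity) pairs, best match first."""
    scored = []
    for item in items:
        # Keyword filter overlay: skip items outside the allowed categories.
        if components is not None and item.component not in components:
            continue
        sim = cosine_similarity(query_vec, item.embedding)
        if sim >= min_similarity:          # drop low-relevance matches
            scored.append((sim, item.item_id))
    scored.sort(reverse=True)
    return [(item_id, round(sim, 3)) for sim, item_id in scored[:top_k]]


# Toy data: real embeddings are 768-dimensional.
items = [
    KnowledgeItem("IMS001", "fluid", (1.0, 0.0, 0.0)),
    KnowledgeItem("FL03", "fluid", (0.9, 0.1, 0.0)),
    KnowledgeItem("GEAR-X", "gear", (0.95, 0.05, 0.0)),   # hypothetical gear rule
    KnowledgeItem("AFB09", "bearing", (0.0, 1.0, 0.0)),
]
hits = retrieve((1.0, 0.0, 0.0), items, components={"fluid"})
```

Without the `components` filter, the hypothetical gear rule would rank above FL03 because it is closer in the embedding space. That is the situation the keyword overlay is meant to prevent.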

Step 4: Generation

The assembled context is structured into a prompt with clear sections:

[SYSTEM PROMPT]
You are a diagnostic copilot for RAPID AI. You explain diagnostic results
using the NEME rhythm: Notice (what was observed), Engage (what rules matched),
Mull (what the interactions mean), Exchange (what actions to take).
Never invent failure modes. Never override confidence scores.
Cite every claim with its source identifier.

[CURRENT DIAGNOSTIC STATE]
Asset: P-101, Type: centrifugal pump, SSI: 0.72, State: warning
Matched rules: AFB09 (confidence 0.75), ...
Trend: accelerating (slope 0.038), AESF: S3 (SI=32, EI=68)
FRETTLSM: T108 (0.75), T106 (0.60)
Module E: P=73, Window=24h, Actions: ACT005, ACT002

[RELEVANT RULES AND IMS CONTEXT]
(Top-K retrieved rules with descriptions, physics basis, actions)

[USER QUESTION]
Why is pump P-101 in S3?
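A sectioned prompt of this shape could be assembled along the following lines. This is a sketch under assumptions: `build_prompt` and its argument names are hypothetical, and the system prompt is abbreviated to the two constraint sentences shown above.

```python
SYSTEM_PROMPT = (
    "You are a diagnostic copilot for RAPID AI. "
    "Never invent failure modes. Never override confidence scores. "
    "Cite every claim with its source identifier."
)


def build_prompt(state: dict, retrieved: list[str], question: str) -> str:
    """Join the four sections in a fixed order with bracketed headers."""
    sections = [
        ("SYSTEM PROMPT", SYSTEM_PROMPT),
        ("CURRENT DIAGNOSTIC STATE",
         "\n".join(f"{key}: {value}" for key, value in state.items())),
        ("RELEVANT RULES AND IMS CONTEXT", "\n".join(retrieved)),
        ("USER QUESTION", question),
    ]
    return "\n\n".join(f"[{title}]\n{body}" for title, body in sections)


prompt = build_prompt(
    state={"Asset": "P-101", "SSI": 0.72, "State": "warning", "AESF": "S3"},
    retrieved=["AFB09: bearing outer race defect"],
    question="Why is pump P-101 in S3?",
)
```

Keeping the section order fixed means the deterministic state always precedes the retrieved context, so the model reads engine output before anything retrieved by similarity.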

Step 5: Citation Injection

Post-generation, the system validates that every claim in the LLM’s response maps to a source in the assembled context. Claims that cannot be traced to a specific rule, IMS row, sensor reading, or module output are flagged and either removed or marked as “AI interpretation” with reduced confidence. The citation block is appended to every response, formatted as a list of source identifiers that the frontend can render as clickable links to the underlying evidence.
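The identifier-level part of this check can be sketched as below. It only verifies that each cited identifier exists in the assembled context; mapping individual sentences to sources is a harder problem that this sketch does not attempt. The function name `check_citations` and the bracketed citation format (taken from the example response in 18.2) are assumptions.

```python
import re

CITATION_BLOCK_RE = re.compile(r"\[Citations:\s*([^\]]*)\]")


def check_citations(response_text: str, context_ids: set[str]):
    """Split cited identifiers into those found in the context and those not."""
    match = CITATION_BLOCK_RE.search(response_text)
    cited = []
    if match:
        cited = [c.strip() for c in match.group(1).split(",") if c.strip()]
    traced = [c for c in cited if c in context_ids]
    untraced = [c for c in cited if c not in context_ids]   # flag for review
    return traced, untraced


traced, untraced = check_citations(
    "Bearing defect confirmed. [Citations: AFB09, T108, XYZ99]",
    context_ids={"AFB09", "T108", "ACT005"},
)
```

Here `XYZ99` is a made-up identifier that does not appear in the context, so it lands in `untraced` and would be removed or marked as "AI interpretation".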

18.4 LLM Integration Patterns

The integration between RAPID AI and external LLM providers follows patterns designed for industrial reliability — where incorrect advice can cost millions or endanger lives.

Model Selection

Claude API serves as the primary model for complex diagnostic explanations. Its strength is structured reasoning over technical content — it can follow the NEME diagnostic rhythm, maintain citation discipline, and produce coherent multi-paragraph explanations that cover the full diagnostic chain from sensor input through causal analysis to maintenance recommendation. For simpler queries (single-asset status lookups, straightforward threshold comparisons), a local model handles the response without external API calls, reducing latency and cost.

Prompt Engineering

The system prompt is not generic. It encodes domain-specific constraints:

The NEME diagnostic rhythm (Notice, Engage, Mull, Exchange) structures every response so that engineers receive information in the order they need it: observation first, then hypothesis, then interaction analysis, then action.
The IAR framework (Initiator, Accelerator, Retarder) is embedded in the prompt so that causal explanations naturally decompose into root cause, aggravating factors, and defensive gaps.
Confidence thresholds are defined in the prompt: CONFIDENCE_HIGH (0.85), CONFIDENCE_MEDIUM_HIGH (0.75), CONFIDENCE_MEDIUM (0.60), CONFIDENCE_LOW (0.40). The LLM must use qualitative labels that match these numeric ranges.
The Nowlan-Heap context is included: the LLM knows that 82% of industrial failures are not age-related, so it does not default to “replace on schedule” recommendations.

Structured Output

The LLM returns JSON conforming to a strict schema:

{
  "explanation": "Natural language diagnostic explanation...",
  "citations": [
    {"source_type": "rule", "source_id": "AFB09", "claim": "bearing outer race defect"},
    {"source_type": "ims", "source_id": "IMS-bearing-OR", "claim": "CBM strategy"},
    {"source_type": "sensor", "source_id": "BPFO_trend", "claim": "14-day rising trend"}
  ],
  "confidence": {
    "overall": 0.75,
    "label": "medium-high",
    "source": "rule_engine"
  },
  "recommended_actions": [
    {"action_id": "ACT005", "title": "Bearing replacement (planned)", "urgency": "7 days"},
    {"action_id": "ACT002", "title": "Lubrication service", "urgency": "immediate"}
  ],
  "follow_up_questions": [
    "What is the oil analysis history for P-101?",
    "Has the coupling alignment been checked recently?",
    "Are there similar trends on the redundant pump P-101B?"
  ]
}

This structured format allows the frontend to render explanations, citations, actions, and follow-ups as distinct UI components rather than parsing free text.

Guardrails

The LLM operates within strict boundaries:

Action catalog constraint: The LLM cannot recommend maintenance actions outside the Module E action catalog (ACT001-ACT015). If the LLM generates “replace the impeller,” the post-processing layer matches it to the closest catalog entry or flags it as unrecognized.
Confidence immutability: The LLM cannot override, adjust, or reinterpret confidence scores computed by confidence.py. It reports them verbatim.
IMS boundary: The LLM cannot invent failure modes that do not exist in the IMS. If a novel failure pattern is detected, the system reports it as “unclassified pattern” with CONFIDENCE_LOW rather than fabricating a diagnosis.
No autonomous actions: The LLM can recommend, advise, and explain. It cannot trigger work orders, modify maintenance schedules, or actuate control systems.
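Two of these guardrails, the action catalog constraint and confidence immutability, can be sketched as a post-processing step over the structured output. The function names and the `unrecognized_actions` field are assumptions. The sketch only flags out-of-catalog actions and does not attempt the closest-entry matching described above.

```python
import re

ACTION_ID_RE = re.compile(r"^ACT(\d{3})$")


def is_catalog_action(action_id: str) -> bool:
    """True for ACT001 through ACT015, the catalog range given in the text."""
    m = ACTION_ID_RE.match(action_id)
    return bool(m) and 1 <= int(m.group(1)) <= 15


def enforce_guardrails(llm_output: dict, engine_confidence: float) -> dict:
    checked = dict(llm_output)
    actions = llm_output.get("recommended_actions", [])
    checked["recommended_actions"] = [
        a for a in actions if is_catalog_action(a.get("action_id", ""))
    ]
    checked["unrecognized_actions"] = [
        a for a in actions if not is_catalog_action(a.get("action_id", ""))
    ]
    # Confidence immutability: the engine's value replaces whatever came back.
    confidence = dict(llm_output.get("confidence", {}))
    confidence["overall"] = engine_confidence
    confidence["source"] = "rule_engine"
    checked["confidence"] = confidence
    return checked


result = enforce_guardrails(
    {
        "recommended_actions": [{"action_id": "ACT005"}, {"action_id": "ACT099"}],
        "confidence": {"overall": 0.95, "label": "high"},
    },
    engine_confidence=0.75,
)
```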

Streaming

Copilot responses are streamed to the frontend in real time via Server-Sent Events. The structured output schema is delivered incrementally: the explanation text streams first (so the engineer can start reading immediately), followed by the citations, actions, and follow-up questions as structured JSON blocks appended after the text completes. This keeps perceived latency under two seconds for most queries.
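The ordering described here can be sketched as a generator that frames each piece as a Server-Sent Event. The wire format (`event:` and `data:` lines terminated by a blank line) is standard SSE; the event names and the functions `sse_event` and `stream_response` are assumptions for illustration.

```python
import json


def sse_event(event: str, data) -> str:
    """Frame one SSE message; multi-line payloads get one data: line per line."""
    payload = data if isinstance(data, str) else json.dumps(data)
    lines = [f"event: {event}"]
    lines += [f"data: {line}" for line in payload.split("\n")]
    return "\n".join(lines) + "\n\n"


def stream_response(text_chunks, structured: dict):
    """Yield explanation text first, then the structured blocks, then a marker."""
    for chunk in text_chunks:
        yield sse_event("explanation", chunk)
    for key in ("citations", "recommended_actions", "follow_up_questions"):
        yield sse_event(key, structured.get(key, []))
    yield sse_event("done", "")


events = list(stream_response(
    ["Pump P-101 entered S3 ", "because BPFO is rising."],
    {"citations": [{"source_id": "AFB09"}]},
))
```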

Caching

Common query patterns are cached at the context assembly level. When the same asset type exhibits the same failure mode with the same severity band, the assembled context (retrieved rules, IMS rows, FRETTLSM factors) is identical. The cache key is a composite of {asset_type}:{failure_mode}:{severity_band}:{system_state}. This avoids redundant embedding lookups and vector searches for the 80% of queries that follow familiar diagnostic patterns. The LLM generation itself is not cached — the explanation must reflect the current sensor values and timestamps.
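A composite key of this form, with a small cache around the context builder, might look like the sketch below. The normalization rule (lowercase, spaces to underscores) and the names `context_cache_key` and `ContextCache` are assumptions. A real deployment would also need invalidation when reference data changes.

```python
from typing import Callable


def context_cache_key(asset_type: str, failure_mode: str,
                      severity_band: str, system_state: str) -> str:
    """Build {asset_type}:{failure_mode}:{severity_band}:{system_state}."""
    parts = (asset_type, failure_mode, severity_band, system_state)
    return ":".join(p.strip().lower().replace(" ", "_") for p in parts)


class ContextCache:
    """Caches assembled context only; LLM generations are never stored here."""

    def __init__(self) -> None:
        self._store: dict[str, dict] = {}

    def get_or_build(self, key: str, builder: Callable[[], dict]) -> dict:
        if key not in self._store:
            self._store[key] = builder()     # runs retrieval once per key
        return self._store[key]


key = context_cache_key("Centrifugal Pump", "bearing outer race defect",
                        "high", "warning")
```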

Diagnostic Copilot: Agentic Workflow Patterns

Architecture: Hybrid AI with Tool Calling

RAPID AI uses a hybrid approach: a deterministic rule engine that evaluates 451+ validated rules, plus an LLM copilot for explanation, exploration, and edge cases.

┌─────────────────────────────────────────────┐
│           Diagnostic Copilot (LLM)          │
│  ┌─────────┐  ┌──────────┐  ┌───────────┐  │
│  │ Context │  │ Tool     │  │ Response  │  │
│  │ Builder │  │ Router   │  │ Generator │  │
│  └────┬────┘  └────┬─────┘  └─────┬─────┘  │
│       │            │              │         │
│  ┌────▼────────────▼──────────────▼─────┐   │
│  │         System Prompt + RAG          │   │
│  │  • IMS ground truth (100×34)         │   │
│  │  • Rule documentation (451 rules)    │   │
│  │  • Case study library (4,000+)       │   │
│  │  • FRETTLSM diagnostic framework     │   │
│  └──────────────────────────────────────┘   │
└─────────────┬──────────────────┬────────────┘
              │                  │
    ┌─────────▼──────┐  ┌───────▼────────┐
    │ Rule Engine    │  │ Data Layer     │
    │ (Deterministic)│  │ (PostgreSQL +  │
    │ 451+ rules     │  │  pgvector)     │
    └────────────────┘  └────────────────┘

Tool Calling Interface

The copilot has access to diagnostic tools via structured function calling:

// Tool definitions for the diagnostic copilot
const diagnosticTools = [
  {
    name: "run_pipeline",
    description: "Run full RAPID AI diagnostic pipeline on an asset",
    parameters: {
      asset_id: "string",
      measurement_id: "string | 'latest'",
      modules: "string[] — subset of ['guard','sense','fuse','act']"
    }
  },
  {
    name: "query_ims",
    description: "Query the Integrated Master Schema for ground truth",
    parameters: {
      asset_type: "string",
      fault_code: "string?",
      component: "string?"
    }
  },
  {
    name: "get_trend",
    description: "Retrieve trend data for a measurement point",
    parameters: {
      asset_id: "string",
      parameter: "string — 'rms' | 'kurtosis' | 'temperature' | 'ssi'",
      period_days: "number"
    }
  },
  {
    name: "compare_cases",
    description: "Find similar historical cases using vector similarity",
    parameters: {
      fault_signature: "object — current fault evidence",
      top_k: "number — default 5"
    }
  },
  {
    name: "explain_rule",
    description: "Get detailed explanation of a specific diagnostic rule",
    parameters: {
      rule_id: "string — e.g., 'BRG01', 'UNB03', 'DG005'"
    }
  },
  {
    name: "frettlsm_analysis",
    description: "Run FRETTLSM root cause analysis for a detected fault",
    parameters: {
      fault_code: "string",
      asset_context: "object — operating conditions, history"
    }
  },
  {
    name: "estimate_rul",
    description: "Run remaining useful life estimation",
    parameters: {
      asset_id: "string",
      model: "'linear' | 'accelerating' | 'weibull'"
    }
  }
];

System Prompt Architecture

You are the RAPID AI Diagnostic Copilot, an expert reliability engineer
powered by Dibyendu De's 28 years of diagnostic methodology.

## Your Role
- Explain diagnostic findings in plain language
- Guide engineers through root cause analysis using FRETTLSM
- Never guess — always verify with data via tool calls
- When confidence < 0.70, say so explicitly
- Recommend actions with clear priority and reasoning

## Rules of Engagement
1. ALWAYS check data quality first (Module A GUARD results)
2. NEVER skip the FRETTLSM checklist — premature closure kills
3. If the rule engine output conflicts with your reasoning,
   trust the rules (they encode 4,000+ validated cases)
4. Present confidence ranges, not point estimates
5. Explain the physics — engineers need to understand WHY,
   not just WHAT

## Context Available
- IMS ground truth: {ims_schema}
- Active rules: {rule_summary}
- Asset profile: {asset_profile}
- Recent measurements: {recent_data}

Example Agentic Conversation

Engineer: "Pump P-301 vibration is trending up on the drive end horizontal"

Copilot (internal):
  1. tool_call: get_trend(asset_id="P-301", parameter="rms", period_days=90)
  2. tool_call: run_pipeline(asset_id="P-301", measurement_id="latest")
  3. tool_call: query_ims(asset_type="pump", component="DE_bearing")

Copilot: "I've analyzed P-301's drive end horizontal trend and latest measurement:

**Current State:** SSI 0.47 (Watch → approaching Alert)
**Key Findings:**
- 1X vibration has increased about 83% over 90 days (from 1.2 to 2.2 mm/s)
- Kurtosis trending from 3.1 to 4.8 — suggests developing bearing defect
- No significant 2X (misalignment unlikely)
- BPFO sidebands detected at 4.2× shaft speed — matches 6205 bearing geometry

**Confidence:** 0.74 (medium-high)
**Data Quality:** 0.95 (DG015 soft penalty — SNR slightly low)

**FRETTLSM Analysis:**
- **L (Lubrication):** Most likely primary cause — HF envelope shows amplitude modulation typical of lubrication film breakdown
- **S (Surface):** Secondary — kurtosis rise suggests early surface damage
- **T (Time):** Bearing installed 2023-06, at ~18,000 hours — within expected life for 6205 at this load

**Recommended Actions:**
1. (Immediate) Grease bearing per OEM spec — lubrication may arrest progression
2. (1 week) Recheck vibration — if kurtosis still rising after lubrication, schedule replacement
3. (Planning) Order replacement bearing 6205-2RS — current lead time: 2 weeks

**RUL Estimate:** 4-8 weeks at current degradation rate (linear model, confidence 0.65)"

Agent Orchestration Pattern

For complex diagnostics requiring multiple data sources:

"""Multi-agent diagnostic orchestration pattern."""

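import asyncio
from dataclasses import dataclass
from typing import Any


@dataclass
class DiagnosticReport:
    """Placeholder result container; field types are left open in this pattern."""
    ssi: Any
    rca: Any
    actions: Any


class DiagnosticOrchestrator:
    """Coordinates multiple specialist agents for comprehensive analysis.

    The specialist agents and helpers (get_asset_profile, get_criticality,
    check_spares) are assumed to be supplied elsewhere, e.g. in __init__.
    """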

    async def diagnose(self, asset_id: str) -> DiagnosticReport:
        # Phase 1: Data Collection (parallel)
        vibration_data, oil_data, thermal_data, process_data = await asyncio.gather(
            self.vibration_agent.analyze(asset_id),
            self.oil_agent.analyze(asset_id),
            self.thermal_agent.analyze(asset_id),
            self.process_agent.analyze(asset_id),
        )

        # Phase 2: Fusion (SSI)
        ssi = self.fusion_agent.compute_ssi(
            vibration=vibration_data,
            oil=oil_data,
            thermal=thermal_data,
            process=process_data,
            profile=self.get_asset_profile(asset_id)
        )

        # Phase 3: Root Cause Analysis
        rca = self.rca_agent.analyze(
            findings=[vibration_data, oil_data, thermal_data, process_data],
            ssi=ssi,
            framework="FRETTLSM"
        )

        # Phase 4: Recommendations
        actions = self.action_agent.recommend(
            rca=rca,
            asset_criticality=self.get_criticality(asset_id),
            spare_availability=self.check_spares(rca.required_parts)
        )

        return DiagnosticReport(ssi=ssi, rca=rca, actions=actions)

MCP Integration Pattern

RAPID AI can expose its diagnostic engine as an MCP (Model Context Protocol) server, enabling any AI assistant to use it:

// MCP server exposing RAPID AI tools
const rapidAIMCPServer = {
  name: "rapid-ai-diagnostics",
  version: "2.0.0",
  tools: [
    {
      name: "rapid_diagnose",
      description: "Run RAPID AI diagnostic pipeline on vibration data",
      inputSchema: {
        type: "object",
        properties: {
          asset_id: { type: "string" },
          signal_data: { type: "array", items: { type: "number" } },
          sampling_rate: { type: "number" },
          rpm: { type: "number" }
        },
        required: ["asset_id", "signal_data", "sampling_rate"]
      }
    },
    {
      name: "rapid_health_check",
      description: "Get current health status and SSI for an asset",
      inputSchema: {
        type: "object",
        properties: {
          asset_id: { type: "string" }
        },
        required: ["asset_id"]
      }
    },
    {
      name: "rapid_rul",
      description: "Estimate remaining useful life for an asset component",
      inputSchema: {
        type: "object",
        properties: {
          asset_id: { type: "string" },
          component: { type: "string" }
        },
        required: ["asset_id", "component"]
      }
    }
  ]
};

18.5 Vector Similarity for Rule Matching

The deterministic rule evaluator (rule_evaluator.py) is the primary matching engine. It tokenizes rule expressions into a safe recursive-descent parse tree, resolves sensor variable names against live data, evaluates boolean logic with AND/OR/parentheses, and computes vibration ratios (A/H, H/V, V/A) for directional analysis. This process is fast, precise, and auditable. When a rule like BPFO > 0.7 AND envelope_gE > 5.0 fires, the engineer can trace exactly which sensor values triggered which terms.
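A minimal sketch of such an evaluator follows. It is not the contents of rule_evaluator.py: the grammar, the function `evaluate_rule`, and the convention that `A`, `H`, and `V` name axial, horizontal, and vertical amplitudes are assumptions. Error handling for missing sensors and zero denominators is left out.

```python
import operator
import re

TOKEN_RE = re.compile(
    r"\s*(?:(?P<num>\d+(?:\.\d+)?)|(?P<name>[A-Za-z_]\w*)|(?P<op>>=|<=|==|!=|[<>/()]))"
)
COMPARATORS = {">": operator.gt, "<": operator.lt, ">=": operator.ge,
               "<=": operator.le, "==": operator.eq, "!=": operator.ne}


def tokenize(expression):
    tokens, pos = [], 0
    expression = expression.rstrip()
    while pos < len(expression):
        m = TOKEN_RE.match(expression, pos)
        if not m:
            raise ValueError(f"unexpected character {expression[pos]!r} at {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens


class RuleParser:
    def __init__(self, tokens, values):
        self.tokens, self.values, self.pos = tokens, values, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else (None, None)

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def parse(self):
        result = self.expr()
        if self.pos != len(self.tokens):
            raise ValueError(f"unexpected token {self.peek()[1]!r}")
        return result

    def expr(self):                      # expr := term ("OR" term)*
        result = self.term()
        while self.peek() == ("name", "OR"):
            self.take()
            rhs = self.term()            # always parse the right-hand side
            result = result or rhs
        return result

    def term(self):                      # term := factor ("AND" factor)*
        result = self.factor()
        while self.peek() == ("name", "AND"):
            self.take()
            rhs = self.factor()
            result = result and rhs
        return result

    def factor(self):                    # factor := "(" expr ")" | comparison
        if self.peek() == ("op", "("):
            self.take()
            result = self.expr()
            if self.take() != ("op", ")"):
                raise ValueError("missing closing parenthesis")
            return result
        left = self.operand()
        kind, symbol = self.take()
        if kind != "op" or symbol not in COMPARATORS:
            raise ValueError(f"expected comparator, got {symbol!r}")
        return COMPARATORS[symbol](left, self.operand())

    def operand(self):                   # operand := NUMBER | NAME ("/" NAME)?
        kind, text = self.take()
        if kind == "num":
            return float(text)
        if kind != "name" or text in ("AND", "OR"):
            raise ValueError(f"expected sensor name or number, got {text!r}")
        value = self.values[text]        # resolve against live sensor data
        if self.peek() == ("op", "/"):   # directional ratio such as A/H
            self.take()
            kind, denominator = self.take()
            if kind != "name":
                raise ValueError("expected sensor name after '/'")
            value = value / self.values[denominator]
        return value


def evaluate_rule(expression, values):
    return RuleParser(tokenize(expression), values).parse()


fired = evaluate_rule("A/H > 1.2 AND V/H < 0.8", {"A": 3.0, "H": 2.0, "V": 1.0})
```

Because the parser only understands numbers, sensor names, comparators, AND/OR, and parentheses, a rule string cannot execute arbitrary code. That is the practical meaning of "safe" in this context.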

Vector similarity augments this deterministic engine in three scenarios where formal expression matching reaches its limits.

Scenario 1: Low-Confidence or No-Match Situations

When the rule evaluator processes all 119 initiator rules against an asset’s sensor data and produces no match above CONFIDENCE_MEDIUM (0.60), the system has detected that something is abnormal (Module A flagged elevated features) but cannot identify the specific fault pattern. This gap exists because the 119 initiator rules, while comprehensive, cannot cover every possible failure signature across every equipment configuration.

In this scenario, the vector-enhanced approach activates:

The current sensor evidence pattern — a composite of the extracted features (RMS, kurtosis, crest factor, spectral entropy, directional ratios) — is embedded as a vector.
pgvector retrieves the top-5 most similar IMS evidence patterns by cosine similarity.
Matches are returned with a semantic_match flag and a confidence score discounted by 20% relative to deterministic matches. A semantic match with raw similarity of 0.85 becomes CONFIDENCE_MEDIUM (0.60) rather than CONFIDENCE_MEDIUM_HIGH (0.75).
The copilot explicitly labels these as “semantic similarity matches” in its explanation, distinguishing them from deterministic rule matches.

Scenario 2: Cross-Equipment Pattern Transfer

The 119 initiator rules are organized by component type. A bearing rule validated on centrifugal pumps may not be explicitly defined for fans or compressors, even though the underlying physics of bearing degradation is identical. Vector similarity allows the system to recognize that a fan bearing exhibiting A/H > 1.3 with rising kurtosis is semantically similar to the pump bearing pattern AFB07 (misalignment), even if no explicit fan-bearing misalignment rule exists yet.

This cross-equipment inference is conservative: it carries a transfer_flag that alerts the engineer that the match was derived from a different equipment type, and confidence is further discounted by 10%.
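One way to read the two discounts is sketched below. It assumes that both discounts are multiplicative and that the discounted score is then snapped down to the nearest canonical tier. Neither assumption is stated in the text; they were chosen because they reproduce the 0.85 to 0.60 example from Scenario 1. The function names are hypothetical.

```python
# Canonical tiers from confidence.py as quoted in this chapter, highest first.
TIERS = (0.85, 0.75, 0.60, 0.40, 0.00)
SEMANTIC_DISCOUNT = 0.20    # Scenario 1: semantic match vs deterministic match
TRANSFER_DISCOUNT = 0.10    # Scenario 2: match borrowed from another equipment type


def snap_to_tier(score: float) -> float:
    """Round a score down to the nearest canonical tier (assumed behavior)."""
    score = max(score, 0.0)
    return next(tier for tier in TIERS if score >= tier)


def semantic_confidence(raw_similarity: float, cross_equipment: bool = False) -> dict:
    score = raw_similarity * (1.0 - SEMANTIC_DISCOUNT)
    if cross_equipment:
        score *= 1.0 - TRANSFER_DISCOUNT
    return {
        "confidence": snap_to_tier(score),
        "semantic_match": True,
        "transfer_flag": cross_equipment,
    }


same_equipment = semantic_confidence(0.85)            # 0.85 * 0.80 = 0.68
transferred = semantic_confidence(0.70, True)         # 0.70 * 0.80 * 0.90 = 0.504
```

Under these assumptions a raw similarity of 0.85 is reported at CONFIDENCE_MEDIUM (0.60), and a cross-equipment match at 0.70 falls to CONFIDENCE_LOW (0.40).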

Scenario 3: Novel Failure Pattern Discovery

When a sensor evidence pattern consistently appears in the data but matches no existing rules (deterministic or semantic), the system logs it as a candidate for new rule creation. Over time, these unmatched patterns — clustered by vector proximity — become candidates for expansion of the rule library. This is how the system learns without machine learning: field evidence that repeatedly appears without a matching rule signals a gap in the diagnostic knowledge base that Dibyendu De or a domain expert should evaluate and codify.

18.6 Training Data Strategy

RAPID AI inverts the conventional AI training data requirement. A typical ML predictive maintenance system needs thousands of labeled failure examples before it can make its first prediction. RAPID AI can diagnose on its first deployment because its knowledge comes from encoded physics and expert rules, not from statistical patterns learned from data.

That said, data plays four critical roles in making the system better over time.

Validation Data

Confirmed diagnostic cases — instances where RAPID AI produced a diagnosis and a subsequent inspection or failure confirmed or refuted it — are the gold standard for measuring system accuracy. Each validated case is stored with the full pipeline state at the time of diagnosis (sensor values, matched rules, confidence scores, trend data, FRETTLSM hypotheses) alongside the field outcome (confirmed failure mode, actual root cause, actual remaining life). The target is to accumulate 500+ validated cases across the 19 equipment families, providing statistically significant accuracy metrics per equipment type and failure mode.

Feedback Loops

Engineers interact with the copilot and the diagnostic dashboard. When they mark a diagnostic result as correct, incorrect, or partially correct, that feedback is recorded against the specific rule match, confidence score, and IMS row. Over time, this feedback calibrates the system:

Rules that consistently produce correct matches at high confidence have their severity weights validated.
Rules that produce false positives have their threshold expressions reviewed for tightening.
Priority thresholds that trigger action windows are validated against real-world outcome data. If the 24-hour window (P >= 70) consistently catches failures with 7+ days of remaining life, the threshold may be relaxed; if failures are being missed, it is tightened.
FRETTLSM keyword sets are expanded when field reports use terminology not currently in the taxonomy’s keyword tuples.

This is not reinforcement learning. It is structured human-in-the-loop calibration, where every adjustment is reviewed and approved by a domain expert before it modifies the rule base.

Embedding Fine-Tuning

The base embedding model captures general semantic relationships. Domain-specific fine-tuning on a reliability engineering corpus — ISO standards, vibration analysis literature, Dibyendu De’s diagnostic case studies, the RAPID AI rule library itself — produces embeddings where “BPFO” is close to “bearing outer race defect frequency” and “1x” is close to “shaft rotational speed,” relationships that a general-purpose model may not capture precisely. The fine-tuning corpus is curated, not scraped — every document is reviewed for technical accuracy before inclusion.

Few-Shot Examples

The copilot’s prompt includes curated diagnostic Q&A pairs that demonstrate the expected response format, citation discipline, and NEME structure. These examples are drawn from validated diagnostic cases and serve as in-context learning anchors. A representative set includes:

Bearing defect diagnosis with BPFO evidence and FRETTLSM causal mapping
Misalignment diagnosis with directional ratio analysis and IAR decomposition
Cavitation diagnosis with process correlation and Module E action planning
Multi-cause diagnosis where two initiators interact (lubrication failure accelerated by thermal degradation)
Low-confidence diagnosis where the system appropriately communicates uncertainty

The few-shot library is versioned alongside the rule library and updated when new validated cases provide better exemplars.

18.7 AI Ethics and Safety

Industrial AI carries consequences that consumer AI does not. A chatbot that generates an inaccurate restaurant recommendation wastes a dinner. A diagnostic system that generates an inaccurate maintenance recommendation can delay a critical bearing replacement, leading to a catastrophic failure, an unplanned shutdown costing hundreds of thousands of dollars, or — in the worst case — injury or death.

RAPID AI’s AI ethics framework is grounded in engineering accountability, not abstract principles.

Explainability as a Non-Negotiable

Every AI-generated recommendation traces back to physics-based rules through a complete audit chain: LLM explanation -> cited rule IDs -> rule expressions -> sensor data -> physical measurements. There are no black-box decisions. An engineer can follow the chain from “replace the bearing within 7 days” back through the priority score calculation (Module E), the SSI fusion (Module C), the matched initiator rule (Module B), the extracted features (Module A), and the raw sensor values that started it all. If any link in the chain is questionable, the engineer can challenge it and the system can show its work.
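The audit chain described above can be pictured as a linked record that an engineer walks from the recommendation back to the raw measurement. The following is a minimal sketch only: the AuditLink class, its field names, and the sample values are hypothetical and are not taken from the RAPID AI codebase.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AuditLink:
    """One step in the audit chain (hypothetical structure)."""
    stage: str                            # which layer or module produced this step
    detail: str                           # human-readable evidence for this step
    source: Optional["AuditLink"] = None  # the step this one was derived from


def trace(link: AuditLink) -> list[str]:
    """Walk from the recommendation back to the raw measurement."""
    steps = []
    current: Optional[AuditLink] = link
    while current is not None:
        steps.append(f"{current.stage}: {current.detail}")
        current = current.source
    return steps


# Illustrative values only; built from the raw data upward.
raw = AuditLink("Raw sensor values", "axial, vertical, horizontal velocity")
features = AuditLink("Module A features", "A/H = 1.35, V/H = 0.62", raw)
rule = AuditLink("Module B rule", "AFB01: A/H > 1.2 AND V/H < 0.8", features)
fusion = AuditLink("Module C SSI fusion", "fused severity state", rule)
priority = AuditLink("Module E priority", "7-day action window", fusion)
recommendation = AuditLink("LLM explanation", "replace the bearing within 7 days", priority)

for step in trace(recommendation):
    print(step)
```

Because each link holds a reference to the step it was derived from, challenging any one link means inspecting exactly one record and its source.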

Human-in-the-Loop (Mandatory)

RAPID AI recommends. Humans decide. This is not a philosophical position — it is an architectural constraint. The system has no interface to a CMMS work order creation system that could bypass human review. It has no connection to control systems that could modify operating parameters. It has no ability to schedule maintenance or dispatch technicians autonomously. The output of every diagnostic pipeline run is a recommendation presented to a human who must evaluate it, approve it, and initiate the action through their existing maintenance management processes.

Confidence Transparency

Every diagnostic output carries an explicit confidence score computed by confidence.py, with defined thresholds that determine what actions are warranted:

Threshold	Value	What It Gates
THRESHOLD_DASHBOARD_DISPLAY	0.50	Minimum confidence to show on dashboard
THRESHOLD_RCM_ACTIVATION	0.70	Minimum confidence to trigger RCM strategy selection
THRESHOLD_SAFETY_ESCALATION	0.80	Minimum confidence to escalate safety-critical findings

Results below THRESHOLD_DASHBOARD_DISPLAY are suppressed from the user interface entirely. For every result that is shown, the system does not hide uncertainty: it quantifies it and gates its own outputs accordingly.
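The gating logic implied by the threshold table can be sketched as follows. The threshold names and values come from the table; the function gated_actions, the action labels, and the choice to treat each threshold as inclusive (reading "minimum confidence" as >=) are assumptions, since the interface of confidence.py is not shown here.

```python
THRESHOLD_DASHBOARD_DISPLAY = 0.50
THRESHOLD_RCM_ACTIVATION = 0.70
THRESHOLD_SAFETY_ESCALATION = 0.80


def gated_actions(confidence: float) -> list[str]:
    """Return the actions a given confidence score unlocks (hypothetical helper)."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    actions = []
    # Assumption: thresholds are inclusive lower bounds.
    if confidence >= THRESHOLD_DASHBOARD_DISPLAY:
        actions.append("dashboard_display")
    if confidence >= THRESHOLD_RCM_ACTIVATION:
        actions.append("rcm_activation")
    if confidence >= THRESHOLD_SAFETY_ESCALATION:
        actions.append("safety_escalation")
    return actions


print(gated_actions(0.45))  # nothing is unlocked below the display threshold
print(gated_actions(0.75))  # display and RCM activation, but no safety escalation
```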

Bias Mitigation Through Physics

Traditional ML systems learn from historical maintenance data, which encodes the biases of past maintenance practices: assets that were over-maintained appear reliable; assets that were neglected appear failure-prone. RAPID AI’s rule base is derived from physics, not from historical patterns. A bearing preload rule (AFB01) fires based on directional vibration ratios that reflect the physical state of the bearing, not based on whether similar bearings have historically been replaced on schedule. This physics-first approach avoids the feedback loop in which past maintenance decisions shape the data that drives future recommendations, a loop that can make ML-based predictive maintenance systems brittle and biased.

Liability Boundary

All AI outputs are explicitly labeled as advisory. The API response schema includes a disclaimer field that states: AI-generated recommendations are advisory and do not constitute engineering directives. The responsible engineer must evaluate all recommendations in the context of site-specific conditions, regulatory requirements, and professional judgment before initiating maintenance actions. This boundary is architectural, not merely legal — the system is designed so that its outputs inform decisions rather than make them.

Data Privacy and Isolation

Sensor data, diagnostic results, and copilot conversations are scoped to the customer’s deployment. There is no cross-customer data aggregation, no shared training pipeline, and no telemetry that transmits diagnostic content to external services beyond the LLM API calls required for copilot generation. LLM API calls transmit assembled context (rules, feature summaries, diagnostic states) but never raw sensor time-series data. Customers with air-gapped deployments can use local models for copilot generation, trading response quality for complete data isolation.
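One way to enforce the rule that raw time-series data never leaves the deployment is an allow-list applied when the LLM context is assembled. The sketch below assumes a dictionary-shaped diagnostic record; the key names, the function assemble_llm_context, and the sample values are hypothetical.

```python
# Hypothetical allow-list mirroring the three context types named in the text.
ALLOWED_CONTEXT_KEYS = frozenset({"rules", "feature_summaries", "diagnostic_states"})


def assemble_llm_context(record: dict) -> dict:
    """Return only allow-listed summary fields; every other key is dropped."""
    return {key: value for key, value in record.items() if key in ALLOWED_CONTEXT_KEYS}


record = {
    "rules": ["AFB01"],
    "feature_summaries": {"A/H": 1.35, "V/H": 0.62},
    "diagnostic_states": {"P-101": "Unstable"},
    "raw_waveform": [0.012, 0.015, 0.011],  # raw time-series samples, must stay local
}
context = assemble_llm_context(record)
print(sorted(context))
```

An allow-list fails closed: a field added to the record later is excluded from LLM calls until someone deliberately adds it to the list.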

18.8 Future AI Capabilities

The hybrid architecture is designed to accommodate capabilities that do not yet exist in the system. Each future capability is constrained by the same principle: physics provides the ground truth, AI provides the augmentation.

Predictive Maintenance Scheduling Optimization

Current state: Module E computes priority scores and assigns action windows (immediate, 24 hours, 7 days, next shutdown). Future state: combinatorial optimization across the full asset fleet, solving for the maintenance schedule that minimizes total risk while respecting resource constraints (maintenance crew availability, spare parts inventory, production schedule windows, shutdown duration limits). This is a constrained optimization problem, not a prediction problem: the RUL estimates from Module F and the priority scores from Module E become inputs to an optimizer that finds the feasible schedule with the lowest aggregate risk exposure.
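To make the shape of this problem concrete, the sketch below reduces it to a single constraint (crew hours in one maintenance window) and searches every combination of jobs for the one that removes the most risk. This is an illustration under heavy simplification, not a proposed design: the job tuples, the integer risk scores, and the function best_schedule are hypothetical, and exhaustive search is only workable for a handful of jobs.

```python
from itertools import combinations


def best_schedule(jobs, crew_hours_available):
    """Pick the subset of jobs that removes the most risk within the crew-hour budget.

    jobs: list of (asset_id, crew_hours, risk_reduction) tuples.
    Returns (tuple of asset ids, total risk reduction).
    """
    best_ids, best_risk = (), 0
    for size in range(1, len(jobs) + 1):
        for subset in combinations(jobs, size):
            hours = sum(job[1] for job in subset)
            risk = sum(job[2] for job in subset)
            # Keep the subset only if it is feasible and strictly better.
            if hours <= crew_hours_available and risk > best_risk:
                best_ids, best_risk = tuple(job[0] for job in subset), risk
    return best_ids, best_risk


# Illustrative numbers only.
jobs = [("P-101", 6, 40), ("P-102", 4, 25), ("F-201", 5, 30), ("M-301", 3, 10)]
print(best_schedule(jobs, crew_hours_available=10))
```

A production optimizer would add the other constraints listed above (spare parts, production windows, shutdown duration) and would need a solver that scales beyond brute force.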

Cross-Asset Pattern Recognition

Current state: each asset is diagnosed independently. Future state: fleet-level anomaly detection that identifies correlated degradation across multiple assets sharing common operating conditions. When three cooling water pumps on the same header all show rising spectral entropy simultaneously, that pattern — invisible at the individual asset level — signals a system-level condition (water quality change, header pressure instability, common foundation degradation) that no single-asset diagnostic can identify.
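A very simple version of the cooling water pump example is to fit a trend slope to each asset's spectral entropy history and flag the group when every asset is rising at once. The sketch below assumes equally spaced samples and uses an ordinary least-squares slope; the function names, the slope threshold, and the sample series are hypothetical.

```python
def slope(values):
    """Least-squares slope of values against their sample index (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    numerator = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    denominator = sum((i - x_mean) ** 2 for i in range(n))
    return numerator / denominator


def shared_rising_trend(series_by_asset, min_slope):
    """True when at least two assets are present and all rise at min_slope or faster."""
    if len(series_by_asset) < 2:
        return False
    return all(slope(series) >= min_slope for series in series_by_asset.values())


# Illustrative spectral entropy histories for three pumps on one header.
header_a = {
    "CWP-1": [0.50, 0.52, 0.55, 0.57],
    "CWP-2": [0.48, 0.51, 0.53, 0.56],
    "CWP-3": [0.52, 0.53, 0.56, 0.58],
}
print(shared_rising_trend(header_a, min_slope=0.01))
```

A shared rise does not identify the system-level cause; it only tells the engineer that the explanation is unlikely to sit inside any single machine.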

Automated Rule Generation

Current state: new rules are authored by domain experts (primarily Dibyendu De) and encoded as parseable expressions. Future state: when a novel sensor evidence pattern is validated by field confirmation across multiple occurrences, the system proposes a new rule expression, severity weight, and physics basis explanation for expert review. The expert approves, modifies, or rejects the proposal. The AI generates the candidate; the human validates the physics. This accelerates rule library expansion while maintaining the requirement that every rule has a traceable physical justification.

Digital Twin Integration

Current state: RAPID AI diagnoses based on real-time sensor data compared against static thresholds and expert rules. Future state: integration with physics-based simulation models (finite element analysis, computational fluid dynamics, rotordynamic models) that provide dynamic baselines. Instead of comparing vibration against a fixed threshold, the system compares measured behavior against the simulated behavior of a healthy machine under the current operating conditions. Deviations between the digital twin prediction and measured reality become the diagnostic signal, automatically adjusting for load, speed, temperature, and process variations.

Natural Language Rule Authoring

Current state: rules are authored as formal boolean expressions (BPFO > 0.7 AND envelope_gE > 5.0). Future state: a domain expert describes the rule in natural language — “add a rule that fires when pump discharge pressure drops while motor current rises, indicating a developing blockage or impeller degradation” — and the system generates the formal expression, maps it to the appropriate component category, assigns a default severity weight based on similar rules, and submits it for validation against historical data before activation. The natural language interface lowers the barrier to rule authoring without compromising the deterministic precision of the rule evaluation engine.
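For readers unfamiliar with how a formal expression such as BPFO > 0.7 AND envelope_gE > 5.0 is evaluated deterministically, the sketch below shows a small recursive-descent evaluator. It is not rule_evaluator.py: it handles only simple identifiers, the four comparison operators shown, AND, OR, and parentheses, and every name in it is an assumption made for illustration.

```python
import operator
import re

TOKEN = re.compile(r"\s*(>=|<=|>|<|\(|\)|[A-Za-z_][A-Za-z0-9_]*|\d+(?:\.\d+)?)")
COMPARE = {">": operator.gt, "<": operator.lt, ">=": operator.ge, "<=": operator.le}


def tokenize(text):
    """Split an expression into operators, parentheses, names, and numbers."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate(expression, features):
    """Evaluate a boolean rule expression against a dict of feature values."""
    tokens = tokenize(expression)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        pos += 1
        return tokens[pos - 1]

    def parse_or():
        result = parse_and()
        while peek() == "OR":
            take()
            right = parse_and()  # always parse the right side so tokens are consumed
            result = result or right
        return result

    def parse_and():
        result = parse_atom()
        while peek() == "AND":
            take()
            right = parse_atom()
            result = result and right
        return result

    def parse_atom():
        token = take()
        if token == "(":
            result = parse_or()
            if take() != ")":
                raise ValueError("expected ')'")
            return result
        op = take()
        if op not in COMPARE:
            raise ValueError(f"expected a comparison operator, got {op!r}")
        return COMPARE[op](features[token], float(take()))

    result = parse_or()
    if peek() is not None:
        raise ValueError(f"unexpected token {peek()!r}")
    return result


print(evaluate("BPFO > 0.7 AND envelope_gE > 5.0", {"BPFO": 0.9, "envelope_gE": 6.2}))
```

Whatever generates the expression, a human or a natural language interface, the evaluation step stays the same: identical inputs always produce identical results.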

Bayesian Confidence Updating

Current state: confidence scores are computed through deterministic mapping (keyword match ratios, rule match counts, severity cascades). Future state: Bayesian belief networks that update confidence as new evidence arrives. When Module B.2 reports an accelerating trend confirming Module B’s bearing defect hypothesis, the posterior confidence should increase more than the current independent-evidence compounding formula (confidence = 1 - (1-c_m)(1-c_t)(1-c_s)) allows. Full Bayesian updating would model the conditional dependencies between evidence sources — a bearing temperature rise is more significant when it coincides with rising kurtosis than when it occurs in isolation.
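The independent-evidence compounding formula quoted above is straightforward to compute, and a worked example shows why it cannot express dependence between sources. The function name compound_confidence and the input values are illustrative; c_m, c_t, and c_s are simply the three evidence-source confidences as written in the formula.

```python
def compound_confidence(c_m: float, c_t: float, c_s: float) -> float:
    """confidence = 1 - (1 - c_m)(1 - c_t)(1 - c_s), treating sources as independent."""
    for value in (c_m, c_t, c_s):
        if not 0.0 <= value <= 1.0:
            raise ValueError("each confidence must be in [0, 1]")
    return 1.0 - (1.0 - c_m) * (1.0 - c_t) * (1.0 - c_s)


# 1 - (0.4 * 0.5 * 0.6) = 1 - 0.12 = 0.88
print(round(compound_confidence(0.6, 0.5, 0.4), 2))
```

Each source contributes the same amount no matter what the others report, and the result is identical for any ordering of the inputs. A Bayesian network, by contrast, could assign a larger update when two specific sources agree.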

Reinforcement Learning for Maintenance Optimization

Current state: Module E selects maintenance strategies from a fixed decision tree (RCM tiers, priority windows, action catalog). Future state: reinforcement learning that optimizes maintenance policies based on long-term outcomes. The reward function would encode the negative of total cost of ownership (repair cost + downtime cost + safety risk + environmental risk), so that maximizing reward minimizes cost. The agent would learn that early intervention on certain failure modes (lubrication degradation in high-speed bearings) yields dramatically better outcomes than late intervention, while other failure modes (slow corrosion in low-criticality structural elements) can safely be deferred. This capability requires the validated case database described in Section 18.6 to reach sufficient scale: hundreds of confirmed diagnostic-outcome pairs per failure mode category.

Each of these capabilities extends the platform without violating its core principle: the physics-based rule engine remains the source of truth, and AI augments, accelerates, and communicates — but never overrides — the engineering judgment encoded in the rules.

18.9 Production AI Infrastructure

The production deployment uses a protocol-based provider registry that makes adding new AI providers as simple as adding one file.

Provider Registry

engine/server/intelligence/llm/providers/
├── __init__.py          — AIProvider protocol + ProviderRegistry
├── gemini.py            — Google Gemini (text + embed + tools)
├── openai.py            — OpenAI (text + tools)
├── cloudflare.py        — Workers AI (text + classify)
└── template.py          — Deterministic fallback (always available)

Each provider implements the AIProvider protocol and auto-registers on import. Adding a new provider requires zero edits to existing code.

Provider	Capabilities	Status
Gemini	text_generation, tool_calling, embedding	Primary
OpenAI	text_generation, tool_calling	Fallback
Cloudflare	text_generation, classification	Edge
Template	text_generation (deterministic)	Always available
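The protocol-plus-registry pattern can be sketched in a few lines. The names AIProvider and ProviderRegistry come from the directory listing above, but everything else here is an assumption: the real protocol's attributes and methods are not shown in this chapter, so name, capabilities, and generate_text are hypothetical.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class AIProvider(Protocol):
    """Structural interface a provider must satisfy (hypothetical members)."""
    name: str
    capabilities: frozenset[str]

    def generate_text(self, prompt: str) -> str: ...


class ProviderRegistry:
    """Maps provider names to provider instances."""

    def __init__(self) -> None:
        self._providers: dict[str, AIProvider] = {}

    def register(self, provider: AIProvider) -> None:
        # isinstance() on a runtime_checkable Protocol checks that members exist.
        if not isinstance(provider, AIProvider):
            raise TypeError("object does not implement AIProvider")
        self._providers[provider.name] = provider

    def get(self, name: str) -> AIProvider:
        return self._providers[name]

    def with_capability(self, capability: str) -> list[str]:
        return [n for n, p in self._providers.items() if capability in p.capabilities]


registry = ProviderRegistry()


# In a provider module, the class definition is followed by a module-level
# register() call, so importing the module is what registers the provider.
class TemplateProvider:
    name = "template"
    capabilities = frozenset({"text_generation"})

    def generate_text(self, prompt: str) -> str:
        return f"[template] {prompt}"


registry.register(TemplateProvider())
print(registry.with_capability("text_generation"))
```

Because the protocol is structural, a provider class does not inherit from anything in the registry package; it only needs to expose the expected members.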

Fallback Chain

Configured via RAPID_AI_PROVIDER_CHAIN=gemini,openai,cloudflare,template. For each request, the router filters the chain to providers that support the required capability and have a valid API key, then tries them in order until one succeeds.
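The filter-then-try behavior can be sketched as follows. The chain string and the capability names are taken from this section; the Provider dataclass, the route function, and the simulated outage are hypothetical stand-ins for the real router.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Provider:
    """Hypothetical stand-in for a registered provider."""
    capabilities: frozenset
    has_valid_key: bool
    call: Callable[[str], str]


def parse_chain(raw: str) -> list[str]:
    """Split a comma-separated chain, ignoring blanks and surrounding spaces."""
    return [name.strip() for name in raw.split(",") if name.strip()]


def route(chain, providers, capability, prompt):
    """Try eligible providers in chain order; return (name, response) on first success."""
    failures = []
    for name in chain:
        provider = providers.get(name)
        # Filter step: skip unknown providers, missing capability, or missing key.
        if provider is None or capability not in provider.capabilities or not provider.has_valid_key:
            continue
        try:
            return name, provider.call(prompt)
        except Exception as exc:
            failures.append(f"{name}: {exc}")
    raise RuntimeError("no provider succeeded: " + "; ".join(failures))


def _unavailable(prompt: str) -> str:
    raise TimeoutError("simulated outage")


providers = {
    "gemini": Provider(frozenset({"text_generation", "tool_calling", "embedding"}), True, _unavailable),
    "openai": Provider(frozenset({"text_generation", "tool_calling"}), False, lambda p: "openai: " + p),
    "cloudflare": Provider(frozenset({"text_generation", "classification"}), True, lambda p: "cloudflare: " + p),
    "template": Provider(frozenset({"text_generation"}), True, lambda p: "template: " + p),
}
chain = parse_chain("gemini,openai,cloudflare,template")
print(route(chain, providers, "text_generation", "summarize P-101"))
```

In this example the first provider fails, the second is skipped for lack of a key, and the third answers. Keeping the deterministic template provider last means text generation still returns something when every remote provider is down.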

Model Tiers

Tier	Model	Use Case
Lite	gemini-3.1-flash-lite	Classification tasks
Standard	gemini-3.1-flash	Analysis briefs
Pro	gemini-3.1-pro	Diagnostic reports, RCA narratives

If the Pro model is unavailable, requests fall back in order: pro → flash → flash-lite → 2.0-flash.

Cloudflare AI Gateway

All LLM calls route through Cloudflare AI Gateway for rate limiting, cost tracking, and observability. This requires CLOUDFLARE_ACCOUNT_ID, CLOUDFLARE_AI_GATEWAY_ID, and CLOUDFLARE_AIG_TOKEN to be set.

Feature Flags

AI capabilities are gated via environment variables (all default ON):

Feature	Env Var	Purpose
Analysis Brief	RAPID_FEATURE_AI_BRIEF	LLM-generated narratives
AI Diagnostician	RAPID_FEATURE_AI_DIAGNOSTICIAN	Agentic RCA diagnosis
RAG Rules	RAPID_FEATURE_RAG_RULES	pgvector rule + analysis embeddings
Swarm Engine	RAPID_FEATURE_SWARM_ENGINE	Multi-agent coordination

Disable any feature by setting RAPID_FEATURE_<NAME>=0. The current flag values are exposed in the features object returned by GET /health.
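A default-on flag reader consistent with the table above might look like the sketch below. The environment variable names come from the table; the helper names, the decision to treat only the exact value 0 as "off", and the shape of the features object are assumptions.

```python
import os

# Suffixes of the RAPID_FEATURE_<NAME> variables listed in the table.
FEATURES = ("AI_BRIEF", "AI_DIAGNOSTICIAN", "RAG_RULES", "SWARM_ENGINE")


def feature_enabled(name, environ=None):
    """Default ON; assumption: only the exact value "0" disables a feature."""
    env = os.environ if environ is None else environ
    return env.get(f"RAPID_FEATURE_{name}", "1") != "0"


def features_object(environ=None):
    """Build a name-to-bool mapping of the kind a health endpoint could return."""
    return {name: feature_enabled(name, environ) for name in FEATURES}


# Passing a dict instead of os.environ keeps the example reproducible.
print(features_object({"RAPID_FEATURE_SWARM_ENGINE": "0"}))
```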

Standards Alignment

Standard	Relevance to This Chapter
ISO 13374 — Condition monitoring and diagnostics of machines	The hybrid AI architecture (physics rules + LLM + vector search) extends ISO 13374’s processing framework by adding natural language advisory generation and semantic retrieval capabilities on top of the standard’s six processing levels.
ISO 17359 — General guidelines for condition monitoring	The diagnostic copilot implements ISO 17359’s requirement for accessible condition monitoring results by translating structured diagnostic outputs into natural language that non-specialist users can understand and act on.
MIMOSA OSA-CBM — Open System Architecture for CBM	The three-layer AI architecture (rules, LLM reasoning, vector retrieval) maintains OSA-CBM compliance by ensuring that the LLM layer augments rather than replaces the deterministic diagnostic engine.

Changelog

Version	Date	Author	Changes
2.1.0	2026-03-17	Rick D	Added standards alignment, living doc metadata, changelog
2.0.0	2026-03-17	Rick D	Enriched with production codebase content
1.0.0	2026-03-17	Rick D	Initial chapter creation