Agentic AI Architecture

Chapter 30 — Agentic AI Architecture

The previous chapters established RAPID AI’s diagnostic intelligence: 451+ physics-based rules, the SEDL entropy framework, AESF stability states, FRETTLSM causal taxonomy, and the complete A-through-G pipeline. Chapter 18 described how a large language model translates this intelligence into natural language. This chapter goes further. It describes how the LLM becomes an agent — a reasoning system that selects tools, orchestrates multi-step diagnostic workflows, and collaborates with specialized sub-agents to answer questions that no single pipeline call can resolve.

The architecture follows a strict boundary: the LLM reasons about diagnostics; the engines compute diagnostics. The LLM never calculates a confidence score, never evaluates a rule, never estimates remaining useful life. It decides which tools to call, interprets their results, and composes explanations grounded in those results. Cross this boundary and the system hallucinates. Maintain it and every response traces back to physics.

30.1 System Prompts for the Diagnostic Copilot

The LLM copilot is not a generic chatbot. It is a domain-constrained diagnostic agent whose behavior is controlled by a system prompt that encodes role, framework, constraints, output format, and safety guardrails. The system prompt is the single most important piece of prompt engineering in the entire platform — it determines whether the copilot behaves like a disciplined reliability engineer or an overconfident generalist.

Role Definition

The system prompt opens with an unambiguous identity statement:

Who you are: RAPID AI, an engineering reliability diagnostic assistant built on Dibyendu De’s Theory of Imperfections.
What you do: Diagnose rotating machinery failures using physics-based rules and deterministic engines, not statistical correlations or training data.
What you do not do: You do not guess, speculate, or invent failure modes. Every diagnostic conclusion must originate from the pipeline engines.

Diagnostic Framework

The prompt encodes RAPID AI’s diagnostic methodology:

NEME rhythm: Notice (observe sensor data and alerts) —> Engage (run diagnostic pipeline, retrieve rules) —> Mull (reason about the evidence, consider alternatives) —> Exchange (present findings with citations, invite engineer feedback).
IAR classification: Every failure factor is classified as Initiator (root cause), Accelerator (amplifying condition), or Retarder (protective factor). The copilot must use these classifications when discussing causal chains.
PLS3D depth levels: Diagnostic depth ranges from superficial (symptom identification) through moderate (mechanism mapping) to deep (design contradiction analysis). The copilot adjusts depth based on the question’s complexity and the user’s role.
AESF stability states: S0 through S4 (Coherent Stable, Focused Instability, Transitional Instability, Diffuse Disorder, Critical Transition). The copilot references these states by name and explains their physical meaning.

Hard Constraints

The following constraints are non-negotiable and enforced through prompt engineering:

Never invent failure modes not present in the IMS rule library or the 119 initiator rules. If the pipeline does not detect a fault, the copilot does not report one.
Never override confidence scores computed by confidence.py. Report them exactly as the engine provides them.
Never recommend actions not in the Module E action catalog (ACT001 through ACT015+). If the situation requires an action outside the catalog, say so explicitly rather than improvising.
Always cite rule IDs (e.g., AFB09), IMS row identifiers, sensor readings, module outputs, and FRETTLSM factor codes in every response.
Never provide medical, legal, or financial advice. RAPID AI is a diagnostic tool, not a liability shield.

Output Format

Every copilot response follows a structured schema:

{
  "explanation": "Human-readable diagnostic narrative (the prose the engineer reads)",
  "citations": [
    { "type": "rule", "id": "AFB09", "description": "Bearing outer race defect" },
    { "type": "ims", "row": "IMS-bearing-OR", "strategy": "CBM" },
    { "type": "sensor", "tag": "P101-DE-VIB-H", "value": 4.2, "unit": "mm/s" },
    { "type": "module", "name": "B.2", "output": "accelerating", "slope": 0.038 }
  ],
  "confidence": 0.75,
  "confidence_label": "medium-high",
  "recommended_actions": [
    { "code": "ACT005", "description": "Bearing replacement (planned)", "priority_window": "7 days" }
  ],
  "follow_up_questions": [
    "Has the lubrication schedule been maintained for P-101?",
    "Were there any process upsets in the last 14 days?"
  ]
}

The LLM generates this JSON structure, which the frontend parses into visual cards, citation links, and action buttons. The explanation field is also rendered as prose for conversational display.

Confidence Language Mapping

Numeric confidence scores map to specific hedging language. The copilot does not freestyle its uncertainty expressions:

Confidence Range	Language	Example
0.85 - 1.00	”with high confidence"	"With high confidence (0.87), the failure mode is bearing outer race defect.”
0.70 - 0.84	”with moderate-high confidence"	"With moderate-high confidence (0.75), tribological degradation is the primary initiator.”
0.50 - 0.69	”the evidence suggests"	"The evidence suggests (0.58) misalignment as a contributing factor.”
0.30 - 0.49	”possibly” / “there are indications"	"There are indications (0.42) of foundation looseness, but additional data is needed.”
0.00 - 0.29	”insufficient evidence"	"There is insufficient evidence (0.18) to confirm coupling fatigue at this time.”

Safety Guardrails

If the overall diagnostic confidence is below 0.50, the copilot must explicitly state: “The diagnostic confidence is below the reliability threshold. The following observations are preliminary and should not be used as the sole basis for maintenance decisions.”
If the recommended action involves safety-critical work (e.g., ACT005 bearing replacement on a critical-service pump), the confidence must be >= 0.80. If it is not, the copilot recommends additional data collection before action.
If the copilot detects contradictory evidence (e.g., Module B says misalignment but Module B.3 entropy is stable), it flags the contradiction rather than arbitrarily choosing one interpretation.

Full Example System Prompt

The following is a production-ready system prompt that could be passed directly to the Claude API:

You are RAPID AI, an engineering reliability diagnostic assistant built on
Dibyendu De's Theory of Imperfections. You diagnose rotating machinery failures
using physics-based rules and deterministic diagnostic engines. You do not guess,
speculate, or invent failure modes.

DIAGNOSTIC FRAMEWORK:
- Follow the NEME rhythm: Notice sensor data → Engage diagnostic engines →
  Mull the evidence → Exchange findings with citations.
- Classify every causal factor as Initiator (I), Accelerator (A), or Retarder (R).
- Reference AESF stability states (S0-S4) by name and physical meaning.

HARD CONSTRAINTS:
- Never report a failure mode not detected by the diagnostic pipeline.
- Never override confidence scores from the rule engine.
- Never recommend actions outside the Module E catalog (ACT001-ACT015).
- Always cite rule IDs, IMS rows, sensor tags, and module outputs.
- If confidence < 0.50, state that findings are preliminary.
- If recommending safety-critical actions, require confidence >= 0.80.

OUTPUT FORMAT:
Return a JSON object with keys: explanation, citations, confidence,
confidence_label, recommended_actions, follow_up_questions.
The explanation should be 2-4 paragraphs of clear engineering prose.
Citations must reference specific rule IDs, IMS rows, and sensor readings.

CONFIDENCE LANGUAGE:
- 0.85-1.00: "with high confidence (X.XX)"
- 0.70-0.84: "with moderate-high confidence (X.XX)"
- 0.50-0.69: "the evidence suggests (X.XX)"
- 0.30-0.49: "there are indications (X.XX)"
- 0.00-0.29: "insufficient evidence (X.XX)"

TOOLS:
You have access to diagnostic tools. Use them to retrieve asset data, run
diagnoses, analyze entropy, estimate RUL, and search rules. Always call
get_asset_config before diagnosing. Always call get_diagnostic_history for
trending context. Reason step by step before selecting tools.

SAFETY:
If evidence is contradictory, flag the contradiction explicitly.
If the engineer's question is outside your diagnostic scope, say so.
You are a translator of physics-based diagnostics, not an independent analyst.

30.2 Function/Tool Calling Architecture

The copilot does not operate in a vacuum. It has access to structured tools that execute against the Python backend and return typed results. The LLM selects which tools to call, in what order, and with what parameters — based on its reasoning about the user’s question.

Available Tools

Each tool corresponds to a FastAPI endpoint in the Python backend. The tool definitions include parameter schemas, return type descriptions, and usage guidance that the LLM uses for selection.

diagnose(asset_id: str, sensor_data: dict) → DiagnosticResult
  Runs the full A → B → C → D → E pipeline for an asset.
  Returns: fault detections, severity scores, trend classifications,
  AESF state, SSI, RUL estimate, priority score, recommended actions.

sedl_analyze(h_amplitudes: list[float], v_amplitudes: list[float],
             a_amplitudes: list[float]) → SEDLResult
  Computes spectral entropy (SE), temporal entropy (TE), directional
  entropy (DE), and the composite Stability Index (SI).
  Returns: SE, TE, DE, SI, entropy_state, dSE_dt, dTE_dt, dDE_dt.

fusion_evaluate(block_scores: dict, profile: str) → FusionResult
  Fuses per-component block scores into the System Stability Index (SSI)
  using profile-weighted aggregation.
  Returns: SSI, block_contributions, system_state, dominant_block.

rul_estimate(severity: float, confidence: float, log_slope: float,
             baseline_mtbf: float, operating_hours: float) → RULResult
  Estimates remaining useful life using F001 (Weibull extrapolation),
  F002 (condition-adjusted Weibull), and F003 (degradation curve fit).
  Returns: rul_days, failure_probability_30d, model_used, uncertainty_band.

cde_evaluate(asset_id: str, SSI: float, system_state: str,
             failure_history: list, design_params: dict) → CDEResult
  Detects engineering contradictions using Contradiction-Driven Engineering.
  Returns: contradictions_found, contradiction_type, design_recommendations,
  altshuller_principles.

causal_analyze(failure_mode: str, evidence: dict,
               context: dict) → CausalResult
  Maps failure evidence to the FRETTLSM 88-factor taxonomy.
  Returns: top_factors (with IAR classification and confidence),
  causal_chain, contributing_conditions.

get_asset_config(asset_id: str) → AssetConfig
  Retrieves asset profile: equipment type, subsystem map, sensor list,
  baselines, thresholds, criticality rating, maintenance history summary.

get_diagnostic_history(asset_id: str, days: int) → List[DiagnosticResult]
  Historical diagnostic results for trending. Default: 30 days.
  Returns: timestamped list of SSI, AESF state, active faults, priority scores.

get_ims_rules(asset_type: str) → List[IMSRow]
  Retrieves all IMS ground-truth rows applicable to the asset type.
  Each row includes: failure_mode, sensor_evidence, RCM_decision,
  maintenance_task, action_window.

search_rules(query: str) → List[Rule]
  Semantic search across 451+ rules using pgvector similarity.
  Returns: matching rules ranked by relevance, with rule_id, description,
  severity, component_category, physics_basis.

Tool Calling Flow

The interaction between user, LLM, and tools follows a structured pattern:

User asks a question in natural language. Example: “Is pump P-101 getting worse? Should I schedule a shutdown?”
LLM reasons about the question using chain-of-thought (internal monologue, not shown to the user):
- “The user asks about degradation trend and shutdown decision for P-101.”
- “I need: (a) current asset config for context, (b) diagnostic history for trending, (c) a fresh diagnosis with latest sensor data.”
- “I will call get_asset_config first, then get_diagnostic_history, then diagnose.”
LLM issues tool calls in the appropriate sequence. Some tools can run in parallel (asset config + diagnostic history), while others depend on prior results (diagnosis may need the asset profile).
Tools execute against the Python FastAPI backend. Each tool call is a structured HTTP request to the corresponding endpoint. Results return as typed JSON.
LLM interprets results and composes a response grounded in the tool outputs. Every claim in the response maps to a specific tool result.

Multi-Step Tool Use

Simple questions (“What is the current state of P-101?”) may require only 1-2 tool calls. Complex questions require a chain of 3-5 calls:

Example: “Why is the cooling water system degrading, and what should we do about it?”

Step	Tool Call	Purpose
1	`get_asset_config("CW-SYSTEM")`	Identify all assets in the cooling water system
2	`get_diagnostic_history("P-101", 30)` + `get_diagnostic_history("P-102", 30)`	Get trending data for each asset (parallel)
3	`diagnose("P-101", latest_data)` + `diagnose("P-102", latest_data)`	Fresh diagnosis for each asset (parallel)
4	`causal_analyze("bearing_defect", evidence, context)`	Root cause analysis on the worst asset
5	`search_rules("cooling water pump degradation")`	Find related IMS rules and imperfection patterns

The LLM synthesizes results from all five steps into a unified narrative that addresses the system-level question, not just individual asset states.

Error Handling

Tool calls can fail. The LLM handles failures gracefully:

Asset not found: “I could not find asset ID ‘P-103’ in the system. Did you mean P-101 or P-102? You can also provide the full asset tag.”
Insufficient sensor data: “The latest sensor data for P-101 is 48 hours old (threshold: 24 hours). I can run a diagnosis on the available data, but results may not reflect current conditions. Would you like to proceed?”
Engine timeout: “The diagnostic pipeline timed out for this request. This can happen with large multi-asset analyses. Let me try diagnosing each asset individually.”
Confidence too low: “The diagnosis returned a confidence of 0.32, which is below the reliability threshold. I recommend collecting additional vibration data at higher resolution before drawing conclusions.”

The principle is: never silently fail, never make up results, always explain what happened and suggest next steps.

30.3 Multi-Agent Orchestration

Single-agent architecture handles most diagnostic questions. But some scenarios require specialized reasoning that exceeds what a single prompt and tool set can efficiently manage. For these cases, RAPID AI deploys multiple collaborating agents, each specialized for a domain within the diagnostic workflow.

Agent Roster

Agent	Specialization	Tools Available	When Activated
Diagnostic Agent	Primary pipeline interpreter	diagnose, sedl_analyze, fusion_evaluate, get_asset_config, get_diagnostic_history	Every user interaction (always the primary)
Causal Agent	FRETTLSM root cause analysis	causal_analyze, search_rules, get_ims_rules	When fault detected and cause requested
RCM Agent	Maintenance strategy optimization	rul_estimate, get_diagnostic_history, get_asset_config	When maintenance decision requested
CDE Agent	Contradiction-driven engineering	cde_evaluate, get_diagnostic_history, causal_analyze	When chronic/recurring failure detected
Reporting Agent	Structured report generation	All tools (read-only)	When formal report requested

Communication Pattern

Agents do not communicate through shared memory or message queues. They communicate through structured handoff — one agent completes its work, produces a typed result object, and the orchestrator passes relevant portions of that result to the next agent as context.

User Question
    │
    v
┌─────────────────────────┐
│   Orchestrator          │
│   (routes to primary    │
│    agent, manages       │
│    handoffs)            │
└─────────┬───────────────┘
          │
          v
┌─────────────────────────┐
│   Diagnostic Agent      │   Step 1: Run pipeline
│   (primary, always)     │   Identify faults, severity, state
└─────────┬───────────────┘
          │ result: DiagnosticResult
          v
┌─────────────────────────┐
│   Causal Agent          │   Step 2: Root cause
│   (if cause needed)     │   Map to FRETTLSM taxonomy
└─────────┬───────────────┘
          │ result: CausalResult
          v
┌─────────────────────────┐
│   CDE Agent             │   Step 3: Design analysis
│   (if chronic failure)  │   Detect contradictions
└─────────┬───────────────┘
          │ result: CDEResult
          v
┌─────────────────────────┐
│   Reporting Agent       │   Step 4: Synthesis
│   (aggregates all)      │   Generate final response
└─────────────────────────┘
          │
          v
    Final Response to User

Activation Logic

The orchestrator decides which agents to involve based on the question’s complexity and the primary agent’s results:

Simple status query (“What is P-101’s current state?”): Diagnostic Agent only. One tool call, one response.
Diagnostic with cause (“Why is P-101 in S3?”): Diagnostic Agent runs the pipeline, then hands off to Causal Agent for FRETTLSM analysis. Two agents, sequential.
Maintenance decision (“Should I shut down P-101 for bearing replacement?”): Diagnostic Agent + RCM Agent in parallel. Diagnostic Agent provides current state; RCM Agent provides RUL estimate and maintenance priority analysis.
Chronic failure investigation (“P-101 keeps failing every 6 months. What is the root cause?”): Full chain — Diagnostic Agent identifies current state, Causal Agent maps the failure mechanism, CDE Agent analyzes the failure history for engineering contradictions, Reporting Agent synthesizes a design-out recommendation.
Fleet-level report (“Give me a health summary of all cooling water pumps”): Diagnostic Agent runs multi-asset analysis, Reporting Agent formats the fleet summary with comparative tables and trend charts.

Agent Context Isolation

Each agent operates within its own context window. The orchestrator passes only the information each agent needs:

The Causal Agent receives the fault detection results and sensor evidence, but not the full diagnostic history (it does not need trending data for root cause identification).
The CDE Agent receives the causal analysis results and 12-month failure history, but not real-time sensor readings (it operates on patterns, not current measurements).
The Reporting Agent receives all prior agent outputs but does not call diagnostic tools itself — it formats what others have computed.

This isolation prevents context window bloat and keeps each agent focused on its specialty. A single-agent approach with all tools and all context would degrade response quality as the context window fills with irrelevant information.

Agent Consistency

All agents share the same core constraints encoded in the system prompt (Section 30.1). No agent can override confidence scores, invent failure modes, or recommend actions outside the catalog. The Causal Agent cannot contradict the Diagnostic Agent’s fault detection. The CDE Agent cannot ignore the Causal Agent’s root cause identification. Consistency flows from the deterministic pipeline — agents interpret the same physics, they just interpret it from different angles.

Production Agent Architecture

Agent Types in RAPID AI

Agent	Role	Autonomy Level	Human-in-Loop
Data Collector	Ingests sensor data, validates, stores	Fully autonomous	On quality alerts
Diagnostic Engine	Runs rule pipeline, produces findings	Fully autonomous	Findings reviewed
Trend Analyst	Monitors degradation trends, flags changes	Autonomous + alerts	On state transitions
RCA Analyst	Performs root cause analysis via FRETTLSM	Semi-autonomous	Always before action
Copilot	Explains findings, answers questions	Interactive	Always
Action Recommender	Suggests maintenance actions	Semi-autonomous	Before execution
Escalation Agent	Monitors RUL vs lead times, escalates	Autonomous	Notifies only
Report Generator	Creates periodic and ad-hoc reports	Autonomous	Before distribution

Safety Boundaries

┌─────────────────────────────────────────────┐
│              FULLY AUTONOMOUS               │
│  • Data ingestion and validation            │
│  • Feature extraction and rule evaluation   │
│  • Trend monitoring and state tracking      │
│  • Report generation (draft)                │
├─────────────────────────────────────────────┤
│          HUMAN-APPROVED AUTONOMY            │
│  • Root cause diagnosis finalization        │
│  • Work order creation                      │
│  • Maintenance scheduling changes           │
│  • Alert distribution                       │
├─────────────────────────────────────────────┤
│            HUMAN-ONLY ACTIONS               │
│  • Equipment shutdown decisions             │
│  • Safety-critical maintenance execution    │
│  • Operating parameter changes              │
│  • System configuration changes             │
└─────────────────────────────────────────────┘

Guardrails

Confidence Gate: No action recommended below 0.70 confidence
Safety Gate: Safety-critical assets require >= 0.80 confidence + human approval
Cost Gate: Actions exceeding $10,000 require management approval
Contradiction Check: If Module G detects contradictory evidence, escalate to human
Rate Limiting: No more than 3 state transitions per asset per 24 hours (prevents flicker)
Audit Trail: Every agent decision logged with reasoning, confidence, and data inputs

Real-World Deployment Pattern

"""Production deployment: diagnostic agent with safety boundaries."""

class ProductionDiagnosticAgent:
    """Agent with explicit safety boundaries and audit trail."""

    def __init__(self, config: AgentConfig):
        self.confidence_threshold = config.confidence_threshold  # 0.70
        self.safety_threshold = config.safety_threshold  # 0.80
        self.max_autonomy_cost = config.max_autonomy_cost  # $10,000
        self.audit_log = AuditLogger(config.audit_db)

    async def process_measurement(self, measurement: Measurement) -> AgentAction:
        # Always autonomous: run pipeline
        result = await self.pipeline.run(measurement)
        self.audit_log.log_pipeline_run(measurement.id, result)

        # Check confidence gate
        if result.confidence < self.confidence_threshold:
            self.audit_log.log_decision("LOW_CONFIDENCE", result)
            return AgentAction.MONITOR  # Just watch

        # Check safety gate
        if measurement.asset.is_safety_critical:
            if result.confidence < self.safety_threshold:
                self.audit_log.log_decision("SAFETY_ESCALATION", result)
                return AgentAction.ESCALATE_TO_HUMAN

        # Check state transition
        if result.state_changed:
            # Validate transition is real (not noise)
            if not self.state_machine.validate_transition(result):
                return AgentAction.MONITOR

            # Escalation check: is RUL < spare lead time?
            if result.rul_hours and result.rul_hours < self.get_spare_lead_time(result):
                self.audit_log.log_decision("LEAD_TIME_ESCALATION", result)
                return AgentAction.URGENT_ESCALATION

        # Normal processing
        return AgentAction.LOG_AND_CONTINUE

30.4 Prompt Engineering Patterns

The copilot’s intelligence emerges not from a single monolithic prompt but from a composition of patterns, each solving a specific problem in the diagnostic communication pipeline.

Few-Shot Examples

The system prompt includes 3-5 curated diagnostic Q&A pairs drawn from Dibyendu’s 4,000+ validated case history. These examples teach the LLM:

The expected length and structure of responses (2-4 paragraphs, not one sentence, not an essay)
How to weave citations naturally into prose (“…bearing outer race defect (AFB09) with BPFO harmonics at 1x, 2x, and 3x…”)
How to present multi-factor causal chains (“The primary initiator is contaminated lubricant (T108, I, 0.75), accelerated by misalignment (M102, A, 0.55)”)
How to frame follow-up questions that drive the diagnostic forward

Examples are rotated periodically to prevent the LLM from memorizing specific phrasing. Each example is tagged with the asset type and failure category so the retrieval layer can inject domain-relevant examples based on the current query.

Chain-of-Thought Enforcement

The LLM is instructed to reason through the NEME framework before generating its final response:

Before responding, reason through these steps internally:

NOTICE: What sensor data and alerts are present? What is abnormal?
ENGAGE: Which diagnostic tools should I call? In what order?
MULL: What does the evidence suggest? Are there alternative explanations?
       Is the evidence consistent or contradictory?
EXCHANGE: How should I present this to the engineer? What level of
          confidence is warranted? What follow-up questions would help?

This chain-of-thought is generated in the LLM’s extended thinking (not shown to the user) and serves as a reasoning scaffold. It prevents the LLM from jumping to conclusions before examining the evidence, which is the most common failure mode in diagnostic AI.

Retrieval-Augmented Prompting

Before the LLM generates a response, the system injects relevant context retrieved from pgvector:

Matched IMS rows: The 3-5 most relevant IMS evidence patterns for the detected failure modes
Matched rules: The specific initiator rules that fired in the current diagnosis, with their physics basis descriptions
FRETTLSM factors: The top causal factors from the taxonomy, with descriptions and keyword sets
Historical patterns: Previous diagnostic results for the same asset, summarized as trend context

This context is injected between the system prompt and the user’s question, formatted as structured data the LLM can reference. The injection is dynamic — different queries surface different context.

Structured Output Enforcement

The LLM is instructed to return JSON conforming to a specific schema (Section 30.1). This is enforced through:

Schema definition in the system prompt: The JSON structure is described with field-level documentation.
Tool-use response format: When using Claude’s tool calling, the response format is constrained by the tool’s output schema.
Validation layer: The Python backend validates the LLM’s JSON output against a Pydantic model before returning it to the frontend. Malformed responses are retried once with a correction prompt.

Temperature Calibration

Temperature is not a single setting — it varies by output type:

Output Component	Temperature	Rationale
Tool selection (which tools to call)	0.0	Deterministic tool selection — same question should always call same tools
Diagnostic conclusion (the JSON structure)	0.1	Near-deterministic — conclusions should be consistent across re-runs
Explanation prose (the `explanation` field)	0.3	Slight variation for natural language quality, but still grounded
Follow-up questions	0.5	More creative exploration of diagnostic angles

In practice, since a single API call generates the entire response, the system uses temperature 0.1 for diagnostic interactions and accepts that follow-up questions will be somewhat conservative. The trade-off favors precision over creativity — this is an engineering tool, not a brainstorming partner.

30.5 Conversation Memory & Context Management

A diagnostic conversation is not a series of independent questions. Engineers build understanding iteratively: they ask about an asset, drill into a specific fault, compare with another asset, then ask for a maintenance recommendation that considers everything discussed so far. The copilot must maintain coherent memory across this conversation arc.

Memory Tiers

Short-term memory (current session):

The asset(s) currently under discussion
Diagnostic results retrieved during this session
Tool calls made and their results
The engineer’s stated concerns and constraints (“I can’t shut down P-101 until next Tuesday”)

Short-term memory is held in the conversation context window. It persists for the duration of the session and is discarded when the session ends.

Medium-term memory (asset diagnostic context):

The last 30 days of diagnostic results for the active asset
Recent alerts and their resolution status
Active maintenance work orders
Sensor trending summaries

Medium-term memory is retrieved from the database on demand. When the engineer mentions an asset, the system loads its recent diagnostic history into the context window as background information.

Long-term memory (persistent knowledge):

Asset configuration: equipment type, subsystem map, sensor layout, baselines, thresholds
Historical failure patterns: recurring failure modes, average time between failures, seasonal patterns
Engineer preferences: preferred report format, depth level, notification settings
Organizational context: plant topology, system relationships, criticality rankings

Long-term memory is stored in PostgreSQL and retrieved through the get_asset_config and get_diagnostic_history tools. It is not held in the conversation window — it is fetched as needed.

Context Window Management

Claude’s context window is large but not infinite. For long diagnostic sessions (20+ turns), the system employs progressive summarization:

Turns 1-10: Full conversation history retained.
Turns 11-20: Turns 1-5 summarized into a structured digest (key findings, active faults, decisions made). Turns 6-20 retained in full.
Turns 21+: Only the structured digest plus the last 10 turns are retained. The digest is updated every 5 turns.

The digest format:

{
  "session_summary": {
    "assets_discussed": ["P-101", "P-102"],
    "active_faults": [
      { "asset": "P-101", "fault": "AFB09", "confidence": 0.75, "state": "S3" }
    ],
    "decisions_made": [
      "Schedule bearing replacement for P-101 within 7 days",
      "Collect oil sample before replacement"
    ],
    "open_questions": [
      "Lubrication schedule adherence for P-101",
      "Process upset history for cooling water system"
    ]
  }
}

This approach preserves the essential diagnostic context while preventing the conversation from exceeding the context window. The engineer experiences a continuous session; the system manages memory transparently.

Session Handoff

When an engineer returns to a previously diagnosed asset, the system can reconstruct context:

Load the asset’s latest diagnostic state from PostgreSQL.
Load the last session’s digest (if the engineer had a previous conversation about this asset).
Present a brief summary: “Last time we discussed P-101 (3 days ago), bearing outer race defect (AFB09) was detected with confidence 0.75 in S3 state. You scheduled bearing replacement for this week. Would you like an updated diagnosis?”

This continuity makes the copilot feel like a colleague who remembers previous conversations, not a stateless query engine that starts fresh every time.

Privacy and Data Boundaries

Conversation memory respects role-based access control:

An engineer sees diagnostic history for their assigned assets only.
A plant manager sees fleet-level summaries but not raw sensor data.
An operator sees current state and recommended actions but not confidence internals.
No conversation data is used for LLM fine-tuning or shared across organizations.
Session transcripts are stored in the audit log with the same retention policy as diagnostic results (configurable per customer, default 2 years).

30.6 Architecture Summary

The agentic AI layer sits atop RAPID AI’s deterministic diagnostic engines and adds three capabilities that the engines alone cannot provide:

Natural language interface: Engineers interact in their own words. The LLM translates between human questions and structured tool calls.
Multi-step reasoning: Complex questions are decomposed into tool call sequences that no single API endpoint could answer.
Specialized collaboration: Multiple agents bring domain expertise (causal analysis, maintenance strategy, design engineering) to questions that span the full diagnostic lifecycle.

The architecture’s defining characteristic is the boundary between reasoning and computation. The LLM reasons. The engines compute. Every response traces back to deterministic physics through explicit citations. This is not AI replacing engineers — it is AI making 451 rules and 88 causal factors accessible through conversation, so that the engineer who asks “why is my pump failing?” gets an answer grounded in the same physics that Dibyendu De used to diagnose 4,000 machines in the field.

Standards Alignment

Standard	Relevance to This Chapter
ISO 13374 — Condition monitoring and diagnostics of machines	The agentic copilot extends ISO 13374’s advisory generation (Level 6) with natural language interaction, while maintaining the requirement that all diagnostic conclusions originate from the deterministic processing levels (L2-L5).
MIMOSA OSA-CBM — Open System Architecture for CBM	The tool-call architecture (diagnostic engines as callable tools for the LLM agent) maintains OSA-CBM’s layered architecture by ensuring that the AI reasoning layer interacts with diagnostic engines through defined interfaces rather than bypassing them.
IEC 62443 — Industrial cybersecurity	The hard constraints on the copilot (never fabricate data, never override safety alerts, always cite engine outputs) implement IEC 62443’s safety integrity requirements for AI systems operating in industrial environments.

Changelog

Version	Date	Author	Changes
2.1.0	2026-03-17	Rick D	Added standards alignment, living doc metadata, changelog
2.0.0	2026-03-17	Rick D	Enriched with production codebase content
1.0.0	2026-03-17	Rick D	Initial chapter creation