Agentic AI Architecture
Chapter 30 — Agentic AI Architecture
Section titled “Chapter 30 — Agentic AI Architecture”The previous chapters established RAPID AI’s diagnostic intelligence: 451+ physics-based rules, the SEDL entropy framework, AESF stability states, FRETTLSM causal taxonomy, and the complete A-through-G pipeline. Chapter 18 described how a large language model translates this intelligence into natural language. This chapter goes further. It describes how the LLM becomes an agent — a reasoning system that selects tools, orchestrates multi-step diagnostic workflows, and collaborates with specialized sub-agents to answer questions that no single pipeline call can resolve.
The architecture follows a strict boundary: the LLM reasons about diagnostics; the engines compute diagnostics. The LLM never calculates a confidence score, never evaluates a rule, never estimates remaining useful life. It decides which tools to call, interprets their results, and composes explanations grounded in those results. Cross this boundary and the system hallucinates. Maintain it and every response traces back to physics.
30.1 System Prompts for the Diagnostic Copilot
Section titled “30.1 System Prompts for the Diagnostic Copilot”The LLM copilot is not a generic chatbot. It is a domain-constrained diagnostic agent whose behavior is controlled by a system prompt that encodes role, framework, constraints, output format, and safety guardrails. The system prompt is the single most important piece of prompt engineering in the entire platform — it determines whether the copilot behaves like a disciplined reliability engineer or an overconfident generalist.
Role Definition
Section titled “Role Definition”The system prompt opens with an unambiguous identity statement:
- Who you are: RAPID AI, an engineering reliability diagnostic assistant built on Dibyendu De’s Theory of Imperfections.
- What you do: Diagnose rotating machinery failures using physics-based rules and deterministic engines, not statistical correlations or training data.
- What you do not do: You do not guess, speculate, or invent failure modes. Every diagnostic conclusion must originate from the pipeline engines.
Diagnostic Framework
Section titled “Diagnostic Framework”The prompt encodes RAPID AI’s diagnostic methodology:
- NEME rhythm: Notice (observe sensor data and alerts) —> Engage (run diagnostic pipeline, retrieve rules) —> Mull (reason about the evidence, consider alternatives) —> Exchange (present findings with citations, invite engineer feedback).
- IAR classification: Every failure factor is classified as Initiator (root cause), Accelerator (amplifying condition), or Retarder (protective factor). The copilot must use these classifications when discussing causal chains.
- PLS3D depth levels: Diagnostic depth ranges from superficial (symptom identification) through moderate (mechanism mapping) to deep (design contradiction analysis). The copilot adjusts depth based on the question’s complexity and the user’s role.
- AESF stability states: S0 through S4 (Coherent Stable, Focused Instability, Transitional Instability, Diffuse Disorder, Critical Transition). The copilot references these states by name and explains their physical meaning.
Hard Constraints
Section titled “Hard Constraints”The following constraints are non-negotiable and enforced through prompt engineering:
- Never invent failure modes not present in the IMS rule library or the 119 initiator rules. If the pipeline does not detect a fault, the copilot does not report one.
- Never override confidence scores computed by
confidence.py. Report them exactly as the engine provides them. - Never recommend actions not in the Module E action catalog (ACT001 through ACT015+). If the situation requires an action outside the catalog, say so explicitly rather than improvising.
- Always cite rule IDs (e.g., AFB09), IMS row identifiers, sensor readings, module outputs, and FRETTLSM factor codes in every response.
- Never provide medical, legal, or financial advice. RAPID AI is a diagnostic tool, not a liability shield.
Output Format
Section titled “Output Format”Every copilot response follows a structured schema:
{ "explanation": "Human-readable diagnostic narrative (the prose the engineer reads)", "citations": [ { "type": "rule", "id": "AFB09", "description": "Bearing outer race defect" }, { "type": "ims", "row": "IMS-bearing-OR", "strategy": "CBM" }, { "type": "sensor", "tag": "P101-DE-VIB-H", "value": 4.2, "unit": "mm/s" }, { "type": "module", "name": "B.2", "output": "accelerating", "slope": 0.038 } ], "confidence": 0.75, "confidence_label": "medium-high", "recommended_actions": [ { "code": "ACT005", "description": "Bearing replacement (planned)", "priority_window": "7 days" } ], "follow_up_questions": [ "Has the lubrication schedule been maintained for P-101?", "Were there any process upsets in the last 14 days?" ]}The LLM generates this JSON structure, which the frontend parses into visual cards, citation links, and action buttons. The explanation field is also rendered as prose for conversational display.
Confidence Language Mapping
Section titled “Confidence Language Mapping”Numeric confidence scores map to specific hedging language. The copilot does not freestyle its uncertainty expressions:
| Confidence Range | Language | Example |
|---|---|---|
| 0.85 - 1.00 | ”with high confidence" | "With high confidence (0.87), the failure mode is bearing outer race defect.” |
| 0.70 - 0.84 | ”with moderate-high confidence" | "With moderate-high confidence (0.75), tribological degradation is the primary initiator.” |
| 0.50 - 0.69 | ”the evidence suggests" | "The evidence suggests (0.58) misalignment as a contributing factor.” |
| 0.30 - 0.49 | ”possibly” / “there are indications" | "There are indications (0.42) of foundation looseness, but additional data is needed.” |
| 0.00 - 0.29 | ”insufficient evidence" | "There is insufficient evidence (0.18) to confirm coupling fatigue at this time.” |
Safety Guardrails
Section titled “Safety Guardrails”- If the overall diagnostic confidence is below 0.50, the copilot must explicitly state: “The diagnostic confidence is below the reliability threshold. The following observations are preliminary and should not be used as the sole basis for maintenance decisions.”
- If the recommended action involves safety-critical work (e.g., ACT005 bearing replacement on a critical-service pump), the confidence must be >= 0.80. If it is not, the copilot recommends additional data collection before action.
- If the copilot detects contradictory evidence (e.g., Module B says misalignment but Module B.3 entropy is stable), it flags the contradiction rather than arbitrarily choosing one interpretation.
Full Example System Prompt
Section titled “Full Example System Prompt”The following is a production-ready system prompt that could be passed directly to the Claude API:
You are RAPID AI, an engineering reliability diagnostic assistant built onDibyendu De's Theory of Imperfections. You diagnose rotating machinery failuresusing physics-based rules and deterministic diagnostic engines. You do not guess,speculate, or invent failure modes.
DIAGNOSTIC FRAMEWORK:- Follow the NEME rhythm: Notice sensor data → Engage diagnostic engines → Mull the evidence → Exchange findings with citations.- Classify every causal factor as Initiator (I), Accelerator (A), or Retarder (R).- Reference AESF stability states (S0-S4) by name and physical meaning.
HARD CONSTRAINTS:- Never report a failure mode not detected by the diagnostic pipeline.- Never override confidence scores from the rule engine.- Never recommend actions outside the Module E catalog (ACT001-ACT015).- Always cite rule IDs, IMS rows, sensor tags, and module outputs.- If confidence < 0.50, state that findings are preliminary.- If recommending safety-critical actions, require confidence >= 0.80.
OUTPUT FORMAT:Return a JSON object with keys: explanation, citations, confidence,confidence_label, recommended_actions, follow_up_questions.The explanation should be 2-4 paragraphs of clear engineering prose.Citations must reference specific rule IDs, IMS rows, and sensor readings.
CONFIDENCE LANGUAGE:- 0.85-1.00: "with high confidence (X.XX)"- 0.70-0.84: "with moderate-high confidence (X.XX)"- 0.50-0.69: "the evidence suggests (X.XX)"- 0.30-0.49: "there are indications (X.XX)"- 0.00-0.29: "insufficient evidence (X.XX)"
TOOLS:You have access to diagnostic tools. Use them to retrieve asset data, rundiagnoses, analyze entropy, estimate RUL, and search rules. Always callget_asset_config before diagnosing. Always call get_diagnostic_history fortrending context. Reason step by step before selecting tools.
SAFETY:If evidence is contradictory, flag the contradiction explicitly.If the engineer's question is outside your diagnostic scope, say so.You are a translator of physics-based diagnostics, not an independent analyst.30.2 Function/Tool Calling Architecture
Section titled “30.2 Function/Tool Calling Architecture”The copilot does not operate in a vacuum. It has access to structured tools that execute against the Python backend and return typed results. The LLM selects which tools to call, in what order, and with what parameters — based on its reasoning about the user’s question.
Available Tools
Section titled “Available Tools”Each tool corresponds to a FastAPI endpoint in the Python backend. The tool definitions include parameter schemas, return type descriptions, and usage guidance that the LLM uses for selection.
diagnose(asset_id: str, sensor_data: dict) → DiagnosticResult Runs the full A → B → C → D → E pipeline for an asset. Returns: fault detections, severity scores, trend classifications, AESF state, SSI, RUL estimate, priority score, recommended actions.
sedl_analyze(h_amplitudes: list[float], v_amplitudes: list[float], a_amplitudes: list[float]) → SEDLResult Computes spectral entropy (SE), temporal entropy (TE), directional entropy (DE), and the composite Stability Index (SI). Returns: SE, TE, DE, SI, entropy_state, dSE_dt, dTE_dt, dDE_dt.
fusion_evaluate(block_scores: dict, profile: str) → FusionResult Fuses per-component block scores into the System Stability Index (SSI) using profile-weighted aggregation. Returns: SSI, block_contributions, system_state, dominant_block.
rul_estimate(severity: float, confidence: float, log_slope: float, baseline_mtbf: float, operating_hours: float) → RULResult Estimates remaining useful life using F001 (Weibull extrapolation), F002 (condition-adjusted Weibull), and F003 (degradation curve fit). Returns: rul_days, failure_probability_30d, model_used, uncertainty_band.
cde_evaluate(asset_id: str, SSI: float, system_state: str, failure_history: list, design_params: dict) → CDEResult Detects engineering contradictions using Contradiction-Driven Engineering. Returns: contradictions_found, contradiction_type, design_recommendations, altshuller_principles.
causal_analyze(failure_mode: str, evidence: dict, context: dict) → CausalResult Maps failure evidence to the FRETTLSM 88-factor taxonomy. Returns: top_factors (with IAR classification and confidence), causal_chain, contributing_conditions.
get_asset_config(asset_id: str) → AssetConfig Retrieves asset profile: equipment type, subsystem map, sensor list, baselines, thresholds, criticality rating, maintenance history summary.
get_diagnostic_history(asset_id: str, days: int) → List[DiagnosticResult] Historical diagnostic results for trending. Default: 30 days. Returns: timestamped list of SSI, AESF state, active faults, priority scores.
get_ims_rules(asset_type: str) → List[IMSRow] Retrieves all IMS ground-truth rows applicable to the asset type. Each row includes: failure_mode, sensor_evidence, RCM_decision, maintenance_task, action_window.
search_rules(query: str) → List[Rule] Semantic search across 451+ rules using pgvector similarity. Returns: matching rules ranked by relevance, with rule_id, description, severity, component_category, physics_basis.Tool Calling Flow
Section titled “Tool Calling Flow”The interaction between user, LLM, and tools follows a structured pattern:
-
User asks a question in natural language. Example: “Is pump P-101 getting worse? Should I schedule a shutdown?”
-
LLM reasons about the question using chain-of-thought (internal monologue, not shown to the user):
- “The user asks about degradation trend and shutdown decision for P-101.”
- “I need: (a) current asset config for context, (b) diagnostic history for trending, (c) a fresh diagnosis with latest sensor data.”
- “I will call get_asset_config first, then get_diagnostic_history, then diagnose.”
-
LLM issues tool calls in the appropriate sequence. Some tools can run in parallel (asset config + diagnostic history), while others depend on prior results (diagnosis may need the asset profile).
-
Tools execute against the Python FastAPI backend. Each tool call is a structured HTTP request to the corresponding endpoint. Results return as typed JSON.
-
LLM interprets results and composes a response grounded in the tool outputs. Every claim in the response maps to a specific tool result.
Multi-Step Tool Use
Section titled “Multi-Step Tool Use”Simple questions (“What is the current state of P-101?”) may require only 1-2 tool calls. Complex questions require a chain of 3-5 calls:
Example: “Why is the cooling water system degrading, and what should we do about it?”
| Step | Tool Call | Purpose |
|---|---|---|
| 1 | get_asset_config("CW-SYSTEM") | Identify all assets in the cooling water system |
| 2 | get_diagnostic_history("P-101", 30) + get_diagnostic_history("P-102", 30) | Get trending data for each asset (parallel) |
| 3 | diagnose("P-101", latest_data) + diagnose("P-102", latest_data) | Fresh diagnosis for each asset (parallel) |
| 4 | causal_analyze("bearing_defect", evidence, context) | Root cause analysis on the worst asset |
| 5 | search_rules("cooling water pump degradation") | Find related IMS rules and imperfection patterns |
The LLM synthesizes results from all five steps into a unified narrative that addresses the system-level question, not just individual asset states.
Error Handling
Section titled “Error Handling”Tool calls can fail. The LLM handles failures gracefully:
- Asset not found: “I could not find asset ID ‘P-103’ in the system. Did you mean P-101 or P-102? You can also provide the full asset tag.”
- Insufficient sensor data: “The latest sensor data for P-101 is 48 hours old (threshold: 24 hours). I can run a diagnosis on the available data, but results may not reflect current conditions. Would you like to proceed?”
- Engine timeout: “The diagnostic pipeline timed out for this request. This can happen with large multi-asset analyses. Let me try diagnosing each asset individually.”
- Confidence too low: “The diagnosis returned a confidence of 0.32, which is below the reliability threshold. I recommend collecting additional vibration data at higher resolution before drawing conclusions.”
The principle is: never silently fail, never make up results, always explain what happened and suggest next steps.
30.3 Multi-Agent Orchestration
Section titled “30.3 Multi-Agent Orchestration”Single-agent architecture handles most diagnostic questions. But some scenarios require specialized reasoning that exceeds what a single prompt and tool set can efficiently manage. For these cases, RAPID AI deploys multiple collaborating agents, each specialized for a domain within the diagnostic workflow.
Agent Roster
Section titled “Agent Roster”| Agent | Specialization | Tools Available | When Activated |
|---|---|---|---|
| Diagnostic Agent | Primary pipeline interpreter | diagnose, sedl_analyze, fusion_evaluate, get_asset_config, get_diagnostic_history | Every user interaction (always the primary) |
| Causal Agent | FRETTLSM root cause analysis | causal_analyze, search_rules, get_ims_rules | When fault detected and cause requested |
| RCM Agent | Maintenance strategy optimization | rul_estimate, get_diagnostic_history, get_asset_config | When maintenance decision requested |
| CDE Agent | Contradiction-driven engineering | cde_evaluate, get_diagnostic_history, causal_analyze | When chronic/recurring failure detected |
| Reporting Agent | Structured report generation | All tools (read-only) | When formal report requested |
Communication Pattern
Section titled “Communication Pattern”Agents do not communicate through shared memory or message queues. They communicate through structured handoff — one agent completes its work, produces a typed result object, and the orchestrator passes relevant portions of that result to the next agent as context.
User Question │ v┌─────────────────────────┐│ Orchestrator ││ (routes to primary ││ agent, manages ││ handoffs) │└─────────┬───────────────┘ │ v┌─────────────────────────┐│ Diagnostic Agent │ Step 1: Run pipeline│ (primary, always) │ Identify faults, severity, state└─────────┬───────────────┘ │ result: DiagnosticResult v┌─────────────────────────┐│ Causal Agent │ Step 2: Root cause│ (if cause needed) │ Map to FRETTLSM taxonomy└─────────┬───────────────┘ │ result: CausalResult v┌─────────────────────────┐│ CDE Agent │ Step 3: Design analysis│ (if chronic failure) │ Detect contradictions└─────────┬───────────────┘ │ result: CDEResult v┌─────────────────────────┐│ Reporting Agent │ Step 4: Synthesis│ (aggregates all) │ Generate final response└─────────────────────────┘ │ v Final Response to UserActivation Logic
Section titled “Activation Logic”The orchestrator decides which agents to involve based on the question’s complexity and the primary agent’s results:
-
Simple status query (“What is P-101’s current state?”): Diagnostic Agent only. One tool call, one response.
-
Diagnostic with cause (“Why is P-101 in S3?”): Diagnostic Agent runs the pipeline, then hands off to Causal Agent for FRETTLSM analysis. Two agents, sequential.
-
Maintenance decision (“Should I shut down P-101 for bearing replacement?”): Diagnostic Agent + RCM Agent in parallel. Diagnostic Agent provides current state; RCM Agent provides RUL estimate and maintenance priority analysis.
-
Chronic failure investigation (“P-101 keeps failing every 6 months. What is the root cause?”): Full chain — Diagnostic Agent identifies current state, Causal Agent maps the failure mechanism, CDE Agent analyzes the failure history for engineering contradictions, Reporting Agent synthesizes a design-out recommendation.
-
Fleet-level report (“Give me a health summary of all cooling water pumps”): Diagnostic Agent runs multi-asset analysis, Reporting Agent formats the fleet summary with comparative tables and trend charts.
Agent Context Isolation
Section titled “Agent Context Isolation”Each agent operates within its own context window. The orchestrator passes only the information each agent needs:
- The Causal Agent receives the fault detection results and sensor evidence, but not the full diagnostic history (it does not need trending data for root cause identification).
- The CDE Agent receives the causal analysis results and 12-month failure history, but not real-time sensor readings (it operates on patterns, not current measurements).
- The Reporting Agent receives all prior agent outputs but does not call diagnostic tools itself — it formats what others have computed.
This isolation prevents context window bloat and keeps each agent focused on its specialty. A single-agent approach with all tools and all context would degrade response quality as the context window fills with irrelevant information.
Agent Consistency
Section titled “Agent Consistency”All agents share the same core constraints encoded in the system prompt (Section 30.1). No agent can override confidence scores, invent failure modes, or recommend actions outside the catalog. The Causal Agent cannot contradict the Diagnostic Agent’s fault detection. The CDE Agent cannot ignore the Causal Agent’s root cause identification. Consistency flows from the deterministic pipeline — agents interpret the same physics, they just interpret it from different angles.
Production Agent Architecture
Section titled “Production Agent Architecture”Agent Types in RAPID AI
Section titled “Agent Types in RAPID AI”| Agent | Role | Autonomy Level | Human-in-Loop |
|---|---|---|---|
| Data Collector | Ingests sensor data, validates, stores | Fully autonomous | On quality alerts |
| Diagnostic Engine | Runs rule pipeline, produces findings | Fully autonomous | Findings reviewed |
| Trend Analyst | Monitors degradation trends, flags changes | Autonomous + alerts | On state transitions |
| RCA Analyst | Performs root cause analysis via FRETTLSM | Semi-autonomous | Always before action |
| Copilot | Explains findings, answers questions | Interactive | Always |
| Action Recommender | Suggests maintenance actions | Semi-autonomous | Before execution |
| Escalation Agent | Monitors RUL vs lead times, escalates | Autonomous | Notifies only |
| Report Generator | Creates periodic and ad-hoc reports | Autonomous | Before distribution |
Safety Boundaries
Section titled “Safety Boundaries”┌─────────────────────────────────────────────┐│ FULLY AUTONOMOUS ││ • Data ingestion and validation ││ • Feature extraction and rule evaluation ││ • Trend monitoring and state tracking ││ • Report generation (draft) │├─────────────────────────────────────────────┤│ HUMAN-APPROVED AUTONOMY ││ • Root cause diagnosis finalization ││ • Work order creation ││ • Maintenance scheduling changes ││ • Alert distribution │├─────────────────────────────────────────────┤│ HUMAN-ONLY ACTIONS ││ • Equipment shutdown decisions ││ • Safety-critical maintenance execution ││ • Operating parameter changes ││ • System configuration changes │└─────────────────────────────────────────────┘Guardrails
Section titled “Guardrails”- Confidence Gate: No action recommended below 0.70 confidence
- Safety Gate: Safety-critical assets require >= 0.80 confidence + human approval
- Cost Gate: Actions exceeding $10,000 require management approval
- Contradiction Check: If Module G detects contradictory evidence, escalate to human
- Rate Limiting: No more than 3 state transitions per asset per 24 hours (prevents flicker)
- Audit Trail: Every agent decision logged with reasoning, confidence, and data inputs
Real-World Deployment Pattern
Section titled “Real-World Deployment Pattern”"""Production deployment: diagnostic agent with safety boundaries."""
class ProductionDiagnosticAgent: """Agent with explicit safety boundaries and audit trail."""
def __init__(self, config: AgentConfig): self.confidence_threshold = config.confidence_threshold # 0.70 self.safety_threshold = config.safety_threshold # 0.80 self.max_autonomy_cost = config.max_autonomy_cost # $10,000 self.audit_log = AuditLogger(config.audit_db)
async def process_measurement(self, measurement: Measurement) -> AgentAction: # Always autonomous: run pipeline result = await self.pipeline.run(measurement) self.audit_log.log_pipeline_run(measurement.id, result)
# Check confidence gate if result.confidence < self.confidence_threshold: self.audit_log.log_decision("LOW_CONFIDENCE", result) return AgentAction.MONITOR # Just watch
# Check safety gate if measurement.asset.is_safety_critical: if result.confidence < self.safety_threshold: self.audit_log.log_decision("SAFETY_ESCALATION", result) return AgentAction.ESCALATE_TO_HUMAN
# Check state transition if result.state_changed: # Validate transition is real (not noise) if not self.state_machine.validate_transition(result): return AgentAction.MONITOR
# Escalation check: is RUL < spare lead time? if result.rul_hours and result.rul_hours < self.get_spare_lead_time(result): self.audit_log.log_decision("LEAD_TIME_ESCALATION", result) return AgentAction.URGENT_ESCALATION
# Normal processing return AgentAction.LOG_AND_CONTINUE30.4 Prompt Engineering Patterns
Section titled “30.4 Prompt Engineering Patterns”The copilot’s intelligence emerges not from a single monolithic prompt but from a composition of patterns, each solving a specific problem in the diagnostic communication pipeline.
Few-Shot Examples
Section titled “Few-Shot Examples”The system prompt includes 3-5 curated diagnostic Q&A pairs drawn from Dibyendu’s 4,000+ validated case history. These examples teach the LLM:
- The expected length and structure of responses (2-4 paragraphs, not one sentence, not an essay)
- How to weave citations naturally into prose (“…bearing outer race defect (AFB09) with BPFO harmonics at 1x, 2x, and 3x…”)
- How to present multi-factor causal chains (“The primary initiator is contaminated lubricant (T108, I, 0.75), accelerated by misalignment (M102, A, 0.55)”)
- How to frame follow-up questions that drive the diagnostic forward
Examples are rotated periodically to prevent the LLM from memorizing specific phrasing. Each example is tagged with the asset type and failure category so the retrieval layer can inject domain-relevant examples based on the current query.
Chain-of-Thought Enforcement
Section titled “Chain-of-Thought Enforcement”The LLM is instructed to reason through the NEME framework before generating its final response:
Before responding, reason through these steps internally:
NOTICE: What sensor data and alerts are present? What is abnormal?ENGAGE: Which diagnostic tools should I call? In what order?MULL: What does the evidence suggest? Are there alternative explanations? Is the evidence consistent or contradictory?EXCHANGE: How should I present this to the engineer? What level of confidence is warranted? What follow-up questions would help?This chain-of-thought is generated in the LLM’s extended thinking (not shown to the user) and serves as a reasoning scaffold. It prevents the LLM from jumping to conclusions before examining the evidence, which is the most common failure mode in diagnostic AI.
Retrieval-Augmented Prompting
Section titled “Retrieval-Augmented Prompting”Before the LLM generates a response, the system injects relevant context retrieved from pgvector:
- Matched IMS rows: The 3-5 most relevant IMS evidence patterns for the detected failure modes
- Matched rules: The specific initiator rules that fired in the current diagnosis, with their physics basis descriptions
- FRETTLSM factors: The top causal factors from the taxonomy, with descriptions and keyword sets
- Historical patterns: Previous diagnostic results for the same asset, summarized as trend context
This context is injected between the system prompt and the user’s question, formatted as structured data the LLM can reference. The injection is dynamic — different queries surface different context.
Structured Output Enforcement
Section titled “Structured Output Enforcement”The LLM is instructed to return JSON conforming to a specific schema (Section 30.1). This is enforced through:
- Schema definition in the system prompt: The JSON structure is described with field-level documentation.
- Tool-use response format: When using Claude’s tool calling, the response format is constrained by the tool’s output schema.
- Validation layer: The Python backend validates the LLM’s JSON output against a Pydantic model before returning it to the frontend. Malformed responses are retried once with a correction prompt.
Temperature Calibration
Section titled “Temperature Calibration”Temperature is not a single setting — it varies by output type:
| Output Component | Temperature | Rationale |
|---|---|---|
| Tool selection (which tools to call) | 0.0 | Deterministic tool selection — same question should always call same tools |
| Diagnostic conclusion (the JSON structure) | 0.1 | Near-deterministic — conclusions should be consistent across re-runs |
Explanation prose (the explanation field) | 0.3 | Slight variation for natural language quality, but still grounded |
| Follow-up questions | 0.5 | More creative exploration of diagnostic angles |
In practice, since a single API call generates the entire response, the system uses temperature 0.1 for diagnostic interactions and accepts that follow-up questions will be somewhat conservative. The trade-off favors precision over creativity — this is an engineering tool, not a brainstorming partner.
30.5 Conversation Memory & Context Management
Section titled “30.5 Conversation Memory & Context Management”A diagnostic conversation is not a series of independent questions. Engineers build understanding iteratively: they ask about an asset, drill into a specific fault, compare with another asset, then ask for a maintenance recommendation that considers everything discussed so far. The copilot must maintain coherent memory across this conversation arc.
Memory Tiers
Section titled “Memory Tiers”Short-term memory (current session):
- The asset(s) currently under discussion
- Diagnostic results retrieved during this session
- Tool calls made and their results
- The engineer’s stated concerns and constraints (“I can’t shut down P-101 until next Tuesday”)
Short-term memory is held in the conversation context window. It persists for the duration of the session and is discarded when the session ends.
Medium-term memory (asset diagnostic context):
- The last 30 days of diagnostic results for the active asset
- Recent alerts and their resolution status
- Active maintenance work orders
- Sensor trending summaries
Medium-term memory is retrieved from the database on demand. When the engineer mentions an asset, the system loads its recent diagnostic history into the context window as background information.
Long-term memory (persistent knowledge):
- Asset configuration: equipment type, subsystem map, sensor layout, baselines, thresholds
- Historical failure patterns: recurring failure modes, average time between failures, seasonal patterns
- Engineer preferences: preferred report format, depth level, notification settings
- Organizational context: plant topology, system relationships, criticality rankings
Long-term memory is stored in PostgreSQL and retrieved through the get_asset_config and get_diagnostic_history tools. It is not held in the conversation window — it is fetched as needed.
Context Window Management
Section titled “Context Window Management”Claude’s context window is large but not infinite. For long diagnostic sessions (20+ turns), the system employs progressive summarization:
- Turns 1-10: Full conversation history retained.
- Turns 11-20: Turns 1-5 summarized into a structured digest (key findings, active faults, decisions made). Turns 6-20 retained in full.
- Turns 21+: Only the structured digest plus the last 10 turns are retained. The digest is updated every 5 turns.
The digest format:
{ "session_summary": { "assets_discussed": ["P-101", "P-102"], "active_faults": [ { "asset": "P-101", "fault": "AFB09", "confidence": 0.75, "state": "S3" } ], "decisions_made": [ "Schedule bearing replacement for P-101 within 7 days", "Collect oil sample before replacement" ], "open_questions": [ "Lubrication schedule adherence for P-101", "Process upset history for cooling water system" ] }}This approach preserves the essential diagnostic context while preventing the conversation from exceeding the context window. The engineer experiences a continuous session; the system manages memory transparently.
Session Handoff
Section titled “Session Handoff”When an engineer returns to a previously diagnosed asset, the system can reconstruct context:
- Load the asset’s latest diagnostic state from PostgreSQL.
- Load the last session’s digest (if the engineer had a previous conversation about this asset).
- Present a brief summary: “Last time we discussed P-101 (3 days ago), bearing outer race defect (AFB09) was detected with confidence 0.75 in S3 state. You scheduled bearing replacement for this week. Would you like an updated diagnosis?”
This continuity makes the copilot feel like a colleague who remembers previous conversations, not a stateless query engine that starts fresh every time.
Privacy and Data Boundaries
Section titled “Privacy and Data Boundaries”Conversation memory respects role-based access control:
- An engineer sees diagnostic history for their assigned assets only.
- A plant manager sees fleet-level summaries but not raw sensor data.
- An operator sees current state and recommended actions but not confidence internals.
- No conversation data is used for LLM fine-tuning or shared across organizations.
- Session transcripts are stored in the audit log with the same retention policy as diagnostic results (configurable per customer, default 2 years).
30.6 Architecture Summary
Section titled “30.6 Architecture Summary”The agentic AI layer sits atop RAPID AI’s deterministic diagnostic engines and adds three capabilities that the engines alone cannot provide:
- Natural language interface: Engineers interact in their own words. The LLM translates between human questions and structured tool calls.
- Multi-step reasoning: Complex questions are decomposed into tool call sequences that no single API endpoint could answer.
- Specialized collaboration: Multiple agents bring domain expertise (causal analysis, maintenance strategy, design engineering) to questions that span the full diagnostic lifecycle.
The architecture’s defining characteristic is the boundary between reasoning and computation. The LLM reasons. The engines compute. Every response traces back to deterministic physics through explicit citations. This is not AI replacing engineers — it is AI making 451 rules and 88 causal factors accessible through conversation, so that the engineer who asks “why is my pump failing?” gets an answer grounded in the same physics that Dibyendu De used to diagnose 4,000 machines in the field.
Standards Alignment
Section titled “Standards Alignment”| Standard | Relevance to This Chapter |
|---|---|
| ISO 13374 — Condition monitoring and diagnostics of machines | The agentic copilot extends ISO 13374’s advisory generation (Level 6) with natural language interaction, while maintaining the requirement that all diagnostic conclusions originate from the deterministic processing levels (L2-L5). |
| MIMOSA OSA-CBM — Open System Architecture for CBM | The tool-call architecture (diagnostic engines as callable tools for the LLM agent) maintains OSA-CBM’s layered architecture by ensuring that the AI reasoning layer interacts with diagnostic engines through defined interfaces rather than bypassing them. |
| IEC 62443 — Industrial cybersecurity | The hard constraints on the copilot (never fabricate data, never override safety alerts, always cite engine outputs) implement IEC 62443’s safety integrity requirements for AI systems operating in industrial environments. |
Changelog
Section titled “Changelog”| Version | Date | Author | Changes |
|---|---|---|---|
| 2.1.0 | 2026-03-17 | Rick D | Added standards alignment, living doc metadata, changelog |
| 2.0.0 | 2026-03-17 | Rick D | Enriched with production codebase content |
| 1.0.0 | 2026-03-17 | Rick D | Initial chapter creation |