Data Architecture

Chapter 15 — Data Architecture

RAPID AI’s data architecture serves a system that must store raw sensor signals, evaluate hundreds of engineering rules against them, track stability over time, and produce explainable diagnostic outputs. This chapter documents how the data is organized, seeded, queried, and governed.

The Data Model

RAPID AI operates on three interconnected schema stacks. Each was designed for a different analytical concern, but together they form the complete data foundation for the platform.

Normalized Schema: 8 Tables x 100 Rows

The normalized schema is the operational backbone. It contains the Integrated Master Schema (IMS) — the domain knowledge that RAPID AI uses to evaluate equipment health. Eight tables, each seeded with exactly 100 rows, provide the core reference data.

Table	Purpose	Key Columns
`asset_master`	Equipment templates — one row per asset configuration	`asset_template_id`, `asset_type`, `subsystem`, `component`, `criticality`
`functional_failures`	What can go wrong at a functional level	`functional_failure_id`, `asset_type`, `function_description`, `failure_description`
`failure_modes`	Specific failure mechanisms	`failure_mode_id`, `asset_type`, `failure_mode`, `failure_effect`
`sensor_evidence_rules`	What sensor patterns indicate which failures	`sensor_rule_id`, `asset_type`, `sensor_type`, `evidence_description`, `confidence_rule`
`rcm_rules`	Reliability-centered maintenance decision logic	`rcm_rule_id`, `asset_type`, `failure_mode`, `recommended_strategy`
`maintenance_tasks`	Specific maintenance actions and procedures	`maintenance_task_id`, `asset_type`, `task_description`, `skill_required`
`dashboard_output_mappings`	How diagnostic results map to dashboard KPIs	`dashboard_output_id`, `asset_type`, `kpi_widget`, `chart_type`
`schema_relation_map`	Foreign-key bridge connecting all seven tables above	`ims_id`, references to all other table IDs

The schema_relation_map table is the linchpin. Each of its 100 rows (IMS001 through IMS100) connects one asset template to its corresponding functional failure, failure mode, sensor rule, RCM rule, maintenance task, and dashboard mapping. This is how the system answers the chain: “for this equipment type, what can fail, how would we detect it, and what should we do?”

A ninth table, table_dictionary, stores metadata about the eight tables themselves — their names, column counts, and descriptions. It contains 8 rows, one per table.

Reliability Schema: 18 Tables

The reliability schema, defined in Module D (Prognostics), extends the data model into statistical and temporal analysis territory. It is organized into four layers:

Master / Reference (7 tables):

area_master — logical plant sections (areas, units)
asset_master — asset registry for reliability analysis
asset_tag — normalized tag lists per asset
dependency_graph — graph definitions for connected-equipment analysis
dependency_node — one node per asset in each graph
dependency_edge — directed, weighted dependency edges between nodes
metric_catalog — catalog of supported reliability metrics (27 metrics across 7 groups: reliability, maintainability, availability, Weibull, growth, dependency risk, data quality)

Control (2 tables):

analysis_request — submitted reliability analysis runs
analysis_request_target — targets included in each request (asset, area, or portfolio scope)

Event / History (5 tables):

failure_event — failures, degradations, trips, shutdowns, restorations
runtime_interval — operating and downtime exposure intervals (basis for MTBF)
maintenance_event — work history, optionally linked to triggering failure events
sensor_evidence_summary — condition evidence summaries over time windows
sensor_inferred_failure_mode — inferred failure modes from sensor evidence

Result (4 tables):

reliability_result_asset — asset-level result envelopes
reliability_result_area — area-level result envelopes
metric_result — normalized metric values (one row per result-metric pair)
quality_flag_result — data sufficiency and confidence flags

The schema follows 3NF normalization principles. Result metrics are stored in a normalized metric_result table rather than as fixed columns, making the model extensible for future metrics — additional Weibull outputs, maintenance burden variants, or customer-specific KPIs can be added without schema migration.

Relationship Between Normalized and Reliability Schemas

The normalized schema answers what can fail and what to do about it — it is the knowledge layer. The reliability schema answers when and how likely — it is the analytical layer.

They connect through two shared concepts:

Asset identity: Both schemas have asset_master tables, though with different column sets. The normalized schema’s asset_master stores equipment templates (19 asset types), while the reliability schema’s asset_master stores specific installed assets with area references and parent hierarchies.
Failure modes: The normalized schema’s failure_modes table maps failure mechanisms to equipment types and sensor evidence. The reliability schema’s failure_event table records actual failure occurrences with timestamps. The sensor_inferred_failure_mode table bridges the two — it references sensor evidence summaries and maps them to failure mode candidates.

Currently, these stacks use incompatible primary key strategies (string IDs like FM0001 in the normalized schema, varchar(64) IDs like AST-PUMP-001 in the reliability schema). A bridge table or unified PK convention is needed before they can be joined in production queries.

Entity-Relationship Overview

The full production schema (documented in the Database Schema Pack) defines 34 tables organized into seven entity groups:

Tenant and user — tenants, users (multi-tenant boundary)
Asset and dependency — plants, asset_types, assets, asset_dependencies, sensors
Telemetry and observation — telemetry_events, waveform_files, manual_observations, maintenance_events, feature_snapshots, anomaly_events
Diagnostic and reasoning — diagnostic_runs, diagnostic_hypotheses, evidence_items
Rule and knowledge — engineering_rules, rule_execution_logs, failure_modes, imperfections, imperfection_findings, contradictions, contradiction_findings
Reliability and planning — reliability_assessments, root_cause_chains, design_recommendations, maintenance_plans, maintenance_plan_tasks, spare_recommendations, manpower_recommendations
Governance — copilot_sessions, copilot_messages, rule_approvals, knowledge_audit_logs

The diagnostic run is the central transaction. A single diagnostic_runs row spawns hypotheses, evidence items, imperfection findings, contradiction findings, root cause chains, reliability assessments, and maintenance plans. This star structure makes it possible to trace any recommendation back through the full reasoning chain to the original sensor data.

The IMS as Data

100 Rows x 34 Columns

The Integrated Master Schema (IMS) is the flattened, denormalized view of the eight normalized tables. Each of its 100 rows represents a complete diagnostic pathway — from an equipment template through its failure mode, sensor evidence, RCM decision, and maintenance action, all the way to its dashboard output mapping.

The 34 columns break into logical groups:

Group	Columns	Purpose
Asset identity	`asset_template_id`, `asset_type`, `subsystem`, `component`, `criticality`	What equipment are we talking about?
Functional context	`function_description`, `failure_description`	What function does it serve, and what does failure look like?
Failure mechanism	`failure_mode`, `failure_effect`, `severity_code`	How specifically does it fail?
Sensor evidence	`sensor_type`, `evidence_description`, `confidence_rule`, `detection_method`	What sensor readings indicate this failure?
RCM decision	`recommended_strategy`, `task_interval`, `decision_basis`	What maintenance strategy does RCM prescribe?
Maintenance action	`task_description`, `skill_required`, `estimated_duration`	What specific work needs to be done?
Dashboard output	`kpi_widget`, `chart_type`, `alert_threshold`, `display_priority`	How does this appear on screen?
Cross-references	`ims_id`, all foreign key IDs	How do the normalized tables link?

Loading the IMS: The ReferenceLoader Pattern

The IMS is loaded once at application startup and cached for the lifetime of the process. The ReferenceLoader pattern works as follows:

On first access: Read all normalized CSV files (or query the database), join them via schema_relation_map, and build an in-memory lookup structure keyed by asset_type and failure_mode_id.
Lazy caching: The loaded data is stored as an immutable dictionary. Subsequent lookups hit the cache, not the database.
Reload on demand: The /admin/schema/reload endpoint invalidates the cache and forces a fresh load — useful after adding new IMS rows.

This pattern keeps diagnostic rule evaluation fast (sub-millisecond lookups) while avoiding the overhead of per-request database queries for reference data that changes infrequently.

Extending the IMS

The IMS is designed to grow by adding rows, not code. To add support for a new failure mode:

Add one row to asset_master (if the equipment type is new)
Add one row to functional_failures (what function fails)
Add one row to failure_modes (the specific mechanism)
Add one row to sensor_evidence_rules (how to detect it)
Add one row to rcm_rules (what maintenance strategy to apply)
Add one row to maintenance_tasks (what work to perform)
Add one row to dashboard_output_mappings (how to display it)
Add one row to schema_relation_map (link them all together)

No Python code changes. No deployment. The next /admin/schema/reload call picks up the new row, and the diagnostic engine can immediately evaluate it. This is the meaning of “rules are data, not code.”

Seed Data Strategy

320 Failure Modes Across 8 Equipment Types

The seed data provides a working diagnostic knowledge base from day one. The 320 failure modes are distributed across 8 equipment types:

Equipment Type	Modes	FM ID Range
Bearing	60	FM0001-FM0020, FM0141-FM0160, FM0281-FM0300
Pump	60	FM0021-FM0040, FM0161-FM0180, FM0301-FM0320
Electric Motor	40	FM0041-FM0060, FM0181-FM0200
Gearbox	40	FM0061-FM0080, FM0201-FM0220
Fan	30	FM0081-FM0095, FM0221-FM0235
Compressor	30	FM0096-FM0110, FM0236-FM0250
Conveyor	30	FM0111-FM0125, FM0251-FM0265
Hydraulic System	30	FM0126-FM0140, FM0266-FM0280

Bearings and pumps receive the heaviest coverage (60 modes each) because they are the most common rotating equipment in industrial plants and the most frequent targets for vibration-based diagnostics.

SQL Insertion Order (Referential Integrity)

The seed data must be loaded in a specific order to satisfy foreign key constraints. The master runner script (00_run_all_seed_inserts.sql) enforces this:

BEGIN;

1. asset_master             -- equipment templates (no dependencies)
2. functional_failures      -- references asset types
3. failure_modes             -- references asset types
4. sensor_evidence_rules     -- references asset types and failure modes
5. rcm_rules                 -- references failure modes and strategies
6. maintenance_tasks         -- references asset types and strategies
7. dashboard_output_mappings -- references asset types and KPIs
8. schema_relation_map       -- references all seven tables above
9. table_dictionary          -- metadata about the tables themselves

COMMIT;

The entire seed load is wrapped in a single transaction. If any INSERT fails (column-shift defect, duplicate key, FK violation), the entire batch rolls back cleanly. This prevents partial loads that would leave the IMS in an inconsistent state.

The Seed Manifest

The seed_insert_manifest.csv tracks what each SQL file contains:

Table	Source CSV	Output SQL	Row Count
asset_master	asset_master.csv	asset_master_seed_inserts.sql	100
functional_failures	functional_failures.csv	functional_failures_seed_inserts.sql	100
failure_modes	failure_modes.csv	failure_modes_seed_inserts.sql	100
sensor_evidence_rules	sensor_evidence_rules.csv	sensor_evidence_rules_seed_inserts.sql	100
rcm_rules	rcm_rules.csv	rcm_rules_seed_inserts.sql	100
maintenance_tasks	maintenance_tasks.csv	maintenance_tasks_seed_inserts.sql	100
dashboard_output_mappings	dashboard_output_mappings.csv	dashboard_output_mappings_seed_inserts.sql	100
schema_relation_map	schema_relation_map.csv	schema_relation_map_seed_inserts.sql	100
table_dictionary	table_dictionary.csv	table_dictionary_seed_inserts.sql	8

Note the discrepancy: the normalized schema seeds 100 failure modes, while the full master library provides 320. The additional 220 modes (FM0101-FM0320) exist in failure_modes_full_320_seed.sql but do not yet have corresponding sensor evidence rules, RCM rules, maintenance tasks, or dashboard mappings. Extending the normalized schema to cover all 320 modes is a documented gap.

pgvector for Semantic Search

Why Vector Similarity

Engineers do not query diagnostic systems with SQL. They say things like “my pump is vibrating and running hot” or “bearing noise getting worse after last overhaul.” These natural language descriptions need to be matched against the IMS’s 100 sensor evidence descriptions and 320 failure mode names.

Traditional keyword search fails here because the vocabulary is inconsistent — an engineer might say “shaking” when the IMS says “elevated RMS velocity,” or “overheating” when the evidence rule says “bearing temperature rise above baseline.”

Vector similarity search solves this by converting both the query and the IMS descriptions into high-dimensional embeddings, then finding the closest matches by cosine distance. The query does not need to use the exact words in the evidence description — it needs to be semantically close.

Embedding Strategy

The embedding pipeline works on rule and evidence descriptions:

Source text: Each sensor_evidence_rules.evidence_description and failure_modes.failure_mode value is concatenated with its asset_type to form a composite text.
Embedding model: Text is converted to vectors using an embedding model (e.g., OpenAI text-embedding-3-small at 1536 dimensions, or a local model for air-gapped deployments).
Storage: Vectors are stored in a pgvector column alongside the source row, enabling similarity queries without a separate vector database.
Indexing: An IVFFlat or HNSW index on the vector column enables sub-second similarity search across the full IMS.

Embeddings are generated once when seed data is loaded and regenerated on /admin/schema/reload. Since the IMS changes infrequently (adding rows, not updating existing ones), embedding freshness is not a concern.

Query Pattern

A typical semantic search flow:

User input: "my pump is vibrating and running hot"
    |
    v
Embed the query text -> [0.012, -0.034, 0.089, ...]
    |
    v
SELECT failure_mode_id, failure_mode, asset_type,
       1 - (embedding <=> query_embedding) AS similarity
FROM failure_modes
WHERE asset_type = 'Pump'
ORDER BY embedding <=> query_embedding
LIMIT 5;
    |
    v
Results:
  1. FM0023 "Bearing overheating" (similarity: 0.87)
  2. FM0025 "Seal failure with temperature rise" (similarity: 0.82)
  3. FM0021 "Cavitation" (similarity: 0.71)

The results feed into the diagnostic engine as candidate failure modes, bypassing the need for the engineer to know the exact FM ID or navigate the IMS manually. This is the foundation for the Engineering Copilot — a natural language interface to RAPID AI’s rule-based reasoning.

Time-Series Data

Sensor Readings Table Design

The telemetry_events table captures all sensor data flowing into RAPID AI:

Column	Type	Purpose
`telemetry_id`	UUID	Primary key
`tenant_id`	UUID	Multi-tenant isolation
`asset_id`	UUID	Which equipment
`sensor_id`	UUID	Which sensor (nullable for manual readings)
`event_time`	timestamptz	When the reading was taken (indexed)
`metric_code`	text	What was measured (rms, temp, rpm, pressure)
`metric_value`	numeric	The measurement value
`unit`	text	Engineering unit (mm/s, degrees C, RPM)
`quality_code`	text	Data quality flag (valid, suspect, missing)
`ingestion_source`	text	How it arrived (stream, file, manual)

This design uses a narrow-table pattern: each reading is one row with a metric_code discriminator, rather than one wide row with a column per metric. This accommodates assets with different sensor configurations without schema changes — a pump with 3 sensors and a gearbox with 7 sensors both write to the same table.

Partitioning Strategy

At production scale, telemetry_events will be the largest table by orders of magnitude. Two partitioning strategies apply:

By time (primary): Monthly range partitions on event_time. This aligns with the most common query pattern — “show me the last 30 days of readings for this asset” — and enables efficient partition pruning. Old partitions can be detached and moved to cold storage without affecting current queries.

By asset (secondary): Within each time partition, a hash partition on asset_id distributes data evenly across storage. This prevents hot spots when a high-frequency sensor floods the system with readings.

Retention Policy

Tier	Retention	Granularity	Storage
Hot	90 days	Raw readings (every sample)	Primary PostgreSQL
Warm	1 year	Hourly aggregates (min, max, avg, stddev)	Partitioned PostgreSQL
Cold	5+ years	Daily aggregates	Object storage (Parquet files)

Raw waveform data (time-series arrays from accelerometers) is never stored in the relational database. It goes to object storage via the waveform_files table, which stores only the metadata (storage URI, sampling rate, duration) alongside a reference to the asset and sensor.

Data Quality

Module A as the Data Quality Gate

Module A (Signal Validation) is the first module in the pipeline, and its primary purpose is to prevent bad data from entering the system. It applies 19 GUARD rules (DG001-DG019) that check for:

NaN values in sensor readings
Flatline signals (sensor stuck at one value)
Signal clipping (amplitude exceeding sensor range)
Sampling rate violations
Unit mismatches
Excessive noise floors

Each GUARD rule produces either a hard block (signal rejected, downstream modules receive nothing) or a soft penalty (signal accepted with a reduced quality score). The quality score is a multiplicative product of all triggered penalties:

quality_score = product(penalty_i for each triggered rule)

A signal with no issues scores 1.0. A signal with two soft penalties of 0.8 and 0.9 scores 0.72. This score propagates downstream through every subsequent module’s confidence calculation.

How Bad Data Degrades Confidence Downstream

Confidence propagation is the mechanism by which Module A’s quality assessment affects every downstream result. The formula:

Canonical reference: See Chapter 6 for the authoritative confidence propagation formula.

C_final = Q_data x (1 - product(1 - C_i))

Where Q_data is Module A’s quality score and C_i are the individual rule confidence scores from Modules B through E. If Q_data = 0.72 (marginal signal quality), even a perfectly matched failure mode rule with C_i = 0.95 produces a final confidence of only 0.72 x 0.95 = 0.68. The system is honest about uncertainty — it will not claim high confidence from low-quality data.

This design means that RAPID AI degrades gracefully. A noisy sensor does not produce false alarms — it produces lower-confidence diagnoses that explicitly flag the data quality concern.

Data Validation Rules at the Database Level

Beyond Module A’s runtime checks, the database schema enforces structural integrity:

Temporal constraints:

analysis_request.end_ts >= analysis_request.start_ts
runtime_interval.end_ts > runtime_interval.start_ts
failure_event.failure_end_ts >= failure_event.failure_start_ts
sensor_evidence_summary.window_end_ts > sensor_evidence_summary.window_start_ts

Referential constraints:

Every metric_result row must have exactly one parent — either asset_result_id or area_result_id, never both, never neither.
Every dependency_edge must reference source and target nodes within the same graph_id.
Every metric_result.metric_code must exist in metric_catalog.

Quality tracking:

The quality_code column on telemetry_events (valid/suspect/missing) enables downstream queries to filter or weight readings by data quality.
The quality_flag_result table stores data sufficiency and confidence outputs separately from metric results, making it possible to audit why a particular analysis produced low-confidence results.

These constraints ensure that even if application-level validation fails, the database will reject structurally invalid data. Defense in depth — the same principle that makes RAPID AI’s diagnostic pipeline reliable also governs its data layer.

Next: Chapter 16 — Roadmap

Standards Alignment

Standard	Relevance to This Chapter
ISO 14224 — Reliability and maintenance data	The normalized schema (8 tables x 100 rows) and reliability schema (18 tables) implement ISO 14224’s standardized data structure for equipment taxonomy, failure modes, maintenance events, and reliability metrics.
ISO 13374 — Condition monitoring and diagnostics of machines	The data architecture supports ISO 13374’s processing chain by providing structured storage for each processing level’s inputs and outputs, from raw sensor data through diagnostic results to advisory outputs.
MIMOSA OSA-CBM — Open System Architecture for CBM	The schema_relation_map table implements OSA-CBM’s data interoperability requirements by maintaining traceable links between asset hierarchy, failure modes, detection rules, and maintenance actions.

Changelog

Version	Date	Author	Changes
2.1.0	2026-03-17	Rick D	Added standards alignment, living doc metadata, changelog
2.0.0	2026-03-17	Rick D	Enriched with production codebase content
1.0.0	2026-03-17	Rick D	Initial chapter creation