February 22, 2026 · ~2,000 words · For AI/ML developers

Why RDF/OWL is the Right Foundation for Health AI Agents

AI agents working with health data typically receive raw JSON with no schema context, proprietary API formats, or flat CSVs. None of these are self-describing. RDF/OWL addresses this semantic grounding problem, which is why the Cascade Protocol is built on it.

Tags: RDF, AI Agents, Ontology, Health Data, OWL

The Problem

Imagine you are building an AI agent to help patients manage their medications. Your agent needs to understand what medications a patient takes, what the dosages are, when they started, and what clinical codes map to those medications. You have data. You need the agent to reason about it.

In practice, you are probably handing that agent one of three things:

Ad-hoc JSON: A bespoke schema your team invented. No external documentation. The agent has to guess what "med" means, whether "dose" is a string or a number, and what unit system is in use.
Proprietary API responses: A format specific to one EHR vendor or health data platform. Semantics are locked in that vendor's documentation, which the agent cannot access at inference time.
Flat CSV exports: Column headers like MedicationName, StartDate, Dose. No type information, no coding system, no provenance. Is "25" in the Dose column milligrams or micrograms? The agent cannot tell.

None of these formats are self-describing. The agent must rely on external documentation, system prompts, or hand-crafted schema definitions to understand the data it is working with. This creates fragility: add a new field, rename a column, change a coding system, and the agent's understanding breaks.

This is the semantic grounding problem for health AI. It is not a small inconvenience. It is the reason most health AI applications are brittle, difficult to audit, and nearly impossible to reason about for compliance purposes.

Why this matters for compliance

If your agent cannot reliably identify what data it is working with, you cannot build a reliable audit trail. You cannot demonstrate to a regulator that the agent used a lab value from a certified device rather than a self-reported estimate. Semantic grounding is not just a developer ergonomics problem — it is a compliance foundation.

The RDF Difference

RDF (Resource Description Framework) takes a fundamentally different approach to data. Instead of key-value pairs in a document, RDF models everything as a graph of triples: subject, predicate, object. Every resource is identified by a URI. Every predicate is a globally defined term with formal semantics. The graph structure enables reasoning.

Consider what happens when an agent reads a Cascade Protocol Turtle file:

URIs are globally identified. The subject <urn:uuid:a1b2c3d4> is a universally unique identifier. It refers to exactly one thing, anywhere in the world.
Predicates are typed. clinical:rxNormCode is defined in the Cascade clinical vocabulary. Its domain and range are specified in the OWL ontology. An agent that loads the ontology knows this is an RxNorm drug code, not a free-text field.
The graph structure enables traversal. An agent can follow a triple from a medication record to its provenance record, to the originating clinical system, to the patient's WebID, without any additional schema documentation.
The data is self-describing. The Turtle prefixes at the top of every file declare the vocabularies in use. An agent reading the file knows exactly what ontology namespace each property comes from. No external documentation required at inference time.

When an agent reads a Cascade Turtle file, it knows exactly what every triple means. Not because you told it in a system prompt. Because the data itself carries the semantic context as global URIs that resolve to formal ontology definitions.
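A minimal sketch of the triple model and the traversal idea, in dependency-free Python. Triples are plain tuples here; a real application would use an RDF library instead. The `ex:` predicates and the `urn:example:` and `https://example.org/` identifiers are hypothetical placeholders, not Cascade vocabulary terms.

```python
# Each triple is (subject, predicate, object). Strings stand in for URIs;
# the ex: predicates and urn:example: nodes are made up for illustration.
TRIPLES = {
    ("urn:uuid:a1b2c3d4", "rdf:type", "clinical:MedicationRecord"),
    ("urn:uuid:a1b2c3d4", "clinical:medicationName", "Metoprolol Tartrate"),
    ("urn:uuid:a1b2c3d4", "prov:wasGeneratedBy", "urn:example:import-activity"),
    ("urn:example:import-activity", "ex:sourceSystem", "urn:example:ehr-system"),
    ("urn:example:import-activity", "ex:patient", "https://example.org/patient#me"),
}


def objects(subject, predicate):
    """Return every object o such that (subject, predicate, o) is in the graph."""
    return sorted(o for s, p, o in TRIPLES if s == subject and p == predicate)


def follow(start, path):
    """Walk a chain of predicates from `start` and return the nodes reached."""
    frontier = [start]
    for predicate in path:
        frontier = [o for node in frontier for o in objects(node, predicate)]
    return frontier


# Medication record -> generating activity -> originating system.
print(follow("urn:uuid:a1b2c3d4", ["prov:wasGeneratedBy", "ex:sourceSystem"]))
# ['urn:example:ehr-system']
```

The traversal needs no schema lookup: it only follows predicates that are already named in the data.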

Three-Way Comparison

Let's make this concrete. Here is the same medication record — Metoprolol 25mg, started January 15, 2024 — represented three different ways.

Option A: Ad-hoc JSON

{
  "med": "Metoprolol",
  "dose": "25mg",
  "start": "2024-01-15"
}

— No types, no coding system, no unit disambiguation, no provenance. The field name "med" is ambiguous. Is "25mg" a string or two fields? The date format is undeclared. An agent working with this data is making assumptions at every step.
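To illustrate the guesswork, here is a rough sketch of what a consumer of Option A ends up writing. The regular expression and the `guess_dose` helper are hypothetical; nothing in the payload says this is the right way to read the field.

```python
import json
import re

record = json.loads('{"med": "Metoprolol", "dose": "25mg", "start": "2024-01-15"}')


def guess_dose(text):
    """Heuristically split a free-text dose like '25mg' into (amount, unit).

    Returns None when the text does not fit the guessed pattern, which is
    the failure mode an undeclared format invites.
    """
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([A-Za-zµ]+)\s*", text)
    if match is None:
        return None
    return float(match.group(1)), match.group(2)


print(guess_dose(record["dose"]))      # (25.0, 'mg')
print(guess_dose("one tablet daily"))  # None
```

Every branch of that helper encodes an assumption the data never stated.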

Option B: FHIR R4 MedicationStatement JSON

{
  "resourceType": "MedicationStatement",
  "status": "active",
  "medicationCodeableConcept": {
    "coding": [{
      "system": "http://www.nlm.nih.gov/research/umls/rxnorm",
      "code": "866511",
      "display": "Metoprolol Tartrate 25 MG Oral Tablet"
    }]
  },
  "effectivePeriod": {
    "start": "2024-01-15"
  },
  "dosage": [{
    "doseAndRate": [{
      "doseQuantity": {
        "value": 25,
        "unit": "mg",
        "system": "http://unitsofmeasure.org",
        "code": "mg"
      }
    }]
  }]
}

— Typed and coded, which is a significant improvement. The RxNorm code and UCUM unit are present. But the FHIR schema is enormous and requires external documentation to understand. Provenance is not carried on the record itself: it lives in a separate FHIR Provenance resource that points back at the statement through its target reference, so a consumer has to fetch and join the two. And an agent receiving this JSON still needs to know the FHIR schema to parse it correctly.
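The schema knowledge involved shows up in even a short sketch that pulls the RxNorm code and dose out of the MedicationStatement above. The paths are the ones used in that example; the `extract_medication` helper is hypothetical and ignores most of what FHIR allows (multiple codings, `medicationReference`, and so on).

```python
import json

RXNORM = "http://www.nlm.nih.gov/research/umls/rxnorm"

statement = json.loads("""
{
  "resourceType": "MedicationStatement",
  "status": "active",
  "medicationCodeableConcept": {
    "coding": [{"system": "http://www.nlm.nih.gov/research/umls/rxnorm",
                "code": "866511",
                "display": "Metoprolol Tartrate 25 MG Oral Tablet"}]
  },
  "effectivePeriod": {"start": "2024-01-15"},
  "dosage": [{"doseAndRate": [{"doseQuantity": {
      "value": 25, "unit": "mg",
      "system": "http://unitsofmeasure.org", "code": "mg"}}]}]
}
""")


def extract_medication(resource):
    """Return (rxnorm_code, dose_value, dose_unit) from a MedicationStatement."""
    codings = resource.get("medicationCodeableConcept", {}).get("coding", [])
    # Pick the coding whose system is RxNorm; the caller has to know that URI.
    rxnorm = next((c["code"] for c in codings if c.get("system") == RXNORM), None)
    dose = None
    for dosage in resource.get("dosage", []):
        for entry in dosage.get("doseAndRate", []):
            dose = entry.get("doseQuantity", dose)
    if dose is None:
        return rxnorm, None, None
    return rxnorm, dose.get("value"), dose.get("code")


print(extract_medication(statement))  # ('866511', 25, 'mg')
```

The code works, but only because its author already knew where FHIR puts each value.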

Option C: Cascade Turtle

@prefix cascade: <https://ns.cascadeprotocol.org/core/v1#> .
@prefix clinical: <https://ns.cascadeprotocol.org/clinical/v1#> .
@prefix health:   <https://ns.cascadeprotocol.org/health/v1#> .
@prefix prov:     <http://www.w3.org/ns/prov#> .
@prefix sct:      <http://snomed.info/id/> .
@prefix xsd:      <http://www.w3.org/2001/XMLSchema#> .

<urn:uuid:a1b2c3d4-e5f6-7890-abcd-ef1234567890>
    a clinical:MedicationRecord ;
    clinical:medicationName "Metoprolol Tartrate" ;
    clinical:dose "25"^^xsd:decimal ;
    clinical:doseUnit "mg" ;
    clinical:rxNormCode "866511" ;
    clinical:snomedCode sct:372826007 ;
    clinical:startDate "2024-01-15"^^xsd:date ;
    cascade:dataProvenance cascade:ClinicalGenerated ;
    cascade:schemaVersion "1.6" ;
    prov:wasGeneratedBy cascade:ClinicalGenerated ;
    prov:generatedAtTime "2024-01-15T10:00:00Z"^^xsd:dateTime .

— Typed, coded, provenance-attached, and self-describing. The dose is a decimal, the unit is explicit. The SNOMED CT concept is a globally resolvable URI, and the RxNorm code is carried by a dedicated, formally defined property rather than a generic field. Provenance is attached directly to the record using W3C PROV-O. An agent reading this file can load the vocabulary ontology at https://ns.cascadeprotocol.org/clinical/v1# and know the formal semantics of every property without any additional documentation.
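The typed literals are what let a consumer stop guessing. A small sketch, assuming literals arrive as (lexical form, datatype URI) pairs, roughly as an RDF parser would expose them. The `to_python` helper and its converter table are illustrative and cover only the three XSD datatypes used in the record above.

```python
from datetime import date, datetime
from decimal import Decimal

XSD = "http://www.w3.org/2001/XMLSchema#"

# Map each XSD datatype URI to a function that parses its lexical form.
CONVERTERS = {
    XSD + "decimal": Decimal,
    XSD + "date": date.fromisoformat,
    # Rewrite a trailing "Z" so this also works on Python versions before 3.11.
    XSD + "dateTime": lambda s: datetime.fromisoformat(s.replace("Z", "+00:00")),
}


def to_python(lexical, datatype=None):
    """Convert a literal to a Python value; untyped literals stay strings."""
    if datatype is None:
        return lexical
    converter = CONVERTERS.get(datatype)
    if converter is None:
        raise ValueError(f"unsupported datatype: {datatype}")
    return converter(lexical)


dose = to_python("25", XSD + "decimal")
unit = to_python("mg")
start = to_python("2024-01-15", XSD + "date")
generated = to_python("2024-01-15T10:00:00Z", XSD + "dateTime")
print(dose * 2, unit, start.year, generated.isoformat())
```

Unlike the "25mg" string in Option A, the datatype tells the consumer which parser to use, so no pattern matching is involved.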

The OWL Ontology Angle

RDF gives you a graph model. OWL (Web Ontology Language) gives you formal semantics for that graph. The Cascade Protocol vocabulary files are OWL ontologies, which means they define classes, properties, domain and range constraints, and class hierarchies that an agent or reasoner can use to make inferences.

For example, the clinical vocabulary defines:

clinical:MedicationRecord is an OWL class. An agent that loads the ontology knows that any resource of this type represents a medication, not a vital sign or a lab result.
clinical:snomedCode has a formal domain (clinical:MedicationRecord) and a range (a SNOMED CT URI). The agent knows this property is always a SNOMED CT concept URI, not a free-text description.
Class hierarchies let you reason about type subsumption. A clinical:LabResult is a subclass of clinical:ClinicalRecord, so a query for all clinical records, run with RDFS/OWL inference enabled, will return lab results, which is what you want.
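A minimal sketch of the subsumption reasoning in the last item, in plain Python rather than a real reasoner. The subclass axiom for clinical:LabResult is the one stated above; the axiom for clinical:MedicationRecord, the `ex:Note` class, and the `urn:example:` instances are assumptions added for illustration.

```python
# rdfs:subClassOf axioms as child -> set of direct parents.
SUBCLASS_OF = {
    "clinical:LabResult": {"clinical:ClinicalRecord"},
    "clinical:MedicationRecord": {"clinical:ClinicalRecord"},  # assumed axiom
}

# Asserted rdf:type statements as instance -> direct class.
TYPES = {
    "urn:example:lab-1": "clinical:LabResult",
    "urn:example:med-1": "clinical:MedicationRecord",
    "urn:example:note-1": "ex:Note",
}


def superclasses(cls):
    """Return cls plus every class reachable through rdfs:subClassOf."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(SUBCLASS_OF.get(current, ()))
    return seen


def instances_of(cls):
    """Instances whose asserted type is cls or any subclass of it."""
    return sorted(i for i, t in TYPES.items() if cls in superclasses(t))


print(instances_of("clinical:ClinicalRecord"))
# ['urn:example:lab-1', 'urn:example:med-1']
```

Neither instance is typed as clinical:ClinicalRecord directly; both are returned because the hierarchy is followed.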

This matters for AI agents because it means an agent can reason about the data without being told explicitly. If an agent knows that clinical:snomedCode produces SNOMED CT URIs, it can follow SNOMED CT's concept hierarchy to learn, for example, that a medication is an ACE inhibitor or that a supplement contains potassium, and then check those classes against a drug-interaction source or its own training knowledge. The ontology is the reasoning backbone, not hard-coded logic in the system prompt.

Formal vs. informal semantics

JSON schemas can describe structure, but they cannot describe semantics. You can say that a field is a string, but you cannot say that the string is a SNOMED CT code that participates in subsumption relationships with other clinical concepts. OWL ontologies can. That gap is the entire difference between a JSON API and a knowledge graph.

The Three-Layer Architecture

The Cascade Protocol organizes health semantics into three layers that build on each other:

Layer 1: Established Standards

These are the universal languages of healthcare: SNOMED CT for clinical concepts, LOINC for lab and observation identifiers, RxNorm for medications, ICD-10 for diagnoses, UCUM for units, FHIR for resource structure. The Cascade Protocol does not replace these — it references them as external URIs in every record.

A SNOMED CT code is always a resolvable URI: <http://snomed.info/id/372756006> identifies Lisinopril, globally and unambiguously. A LOINC code like http://loinc.org/rdf#2823-3 identifies serum potassium. These are not strings in a field; they are globally identifiable resources in the linked data graph.
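The link between a short code and its global URI is just namespace concatenation, which a sketch makes visible. The two namespaces are the `sct:` and `loinc:` prefixes declared in the article's Turtle listings; the `expand` and `compact` helpers are hypothetical names.

```python
PREFIXES = {
    "sct": "http://snomed.info/id/",
    "loinc": "http://loinc.org/rdf#",
}


def expand(curie):
    """Expand a prefixed name such as 'sct:372756006' into a full URI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unknown prefix in {curie!r}")
    return PREFIXES[prefix] + local


def compact(uri):
    """Inverse of expand: shorten a URI using the longest matching namespace."""
    for prefix, namespace in sorted(PREFIXES.items(), key=lambda kv: -len(kv[1])):
        if uri.startswith(namespace):
            return f"{prefix}:{uri[len(namespace):]}"
    return uri


print(expand("sct:372756006"))                      # http://snomed.info/id/372756006
print(compact("http://snomed.info/id/372756006"))   # sct:372756006
```

Two records that use the same expanded URI are talking about the same concept, whichever file or system they came from.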

Layer 2: Cascade Domain Vocabulary

The domain vocabularies — health: for device-generated wellness data, clinical: for EHR-imported records — model health data at the application level. They provide the properties that link to Layer 1 codes and add domain-specific context: provenance type, schema version, data quality annotations.

This is where Cascade creates vocabulary that does not yet exist in any standard. Properties like cascade:dataProvenance link to the provenance classification system. Properties like clinical:rxNormCode make the RxNorm linkage explicit as a typed property, not a generic string.

Layer 3: Application Vocabulary

The application-specific vocabularies — pots: for POTS screening data, checkup: for patient intake summaries — model the user-facing artifacts that aggregating applications need. These are patient-friendly representations that combine data from multiple Layer 2 sources into actionable summaries.

The three layers ensure that Cascade data always traces back to established standards. An AI agent can start with a patient-facing checkup:MedicationSummary, traverse to the underlying clinical:MedicationRecord records, and from there follow the SNOMED CT URIs to the global clinical ontology.

Agent Workflow Example

Here is a concrete scenario that shows how an AI agent uses the Cascade vocabulary to reason about a potential drug-supplement interaction — without any hard-coded logic.

Scenario: A patient takes Lisinopril (prescribed, EHR-imported) and a potassium supplement (self-reported). Their latest serum potassium from a lab panel is 5.1 mEq/L, flagged slightly above the reference range.

Step 1: Agent receives the Cascade Pod

The agent connects to the local MCP server (cascade serve --mcp) and queries for medications and lab results. The MCP server returns Turtle data for both. The agent does not need to parse a proprietary schema — it reads the vocabulary prefixes and knows it is working with clinical: data.

Step 2: Agent identifies the K+ supplement

The supplement record carries clinical:snomedCode sct:88526004 (the SNOMED code for dietary potassium supplement). Because this is an RDF URI, the agent can recognize it as a potassium compound without any keyword matching. The provenance is cascade:SelfReported — the agent notes this data came from the patient, not from a prescriber.

Step 3: Agent cross-references with Lisinopril

The Lisinopril record carries clinical:snomedCode sct:372756006. Both SNOMED codes are in the RDF graph. The agent, using the SNOMED ontology or its training knowledge, recognizes that Lisinopril is an ACE inhibitor that can cause potassium retention — creating a known additive risk when combined with potassium supplementation.

Step 4: Agent checks the lab value

The lab result record carries clinical:loincCode loinc:2823-3 (serum potassium), health:value "5.1"^^xsd:decimal, and health:unit "mEq/L". The provenance is cascade:ClinicalGenerated, so this is a certified lab value, not a consumer device reading.
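The range check in this step can be sketched in a few lines. The reference range (3.5 to 5.0) is taken from the full Turtle listing later in this article; the `classify` helper and the dictionary layout are assumptions for illustration, not part of the Cascade SDK.

```python
from decimal import Decimal


def classify(value, low, high):
    """Return 'low', 'high', or 'normal' for a value against an inclusive range."""
    if value < low:
        return "low"
    if value > high:
        return "high"
    return "normal"


# xsd:decimal literals map naturally onto Decimal, which avoids the rounding
# noise binary floats would add to a comparison this close to the boundary.
lab = {
    "value": Decimal("5.1"),
    "unit": "mEq/L",
    "referenceRangeLow": Decimal("3.5"),
    "referenceRangeHigh": Decimal("5.0"),
    "dataProvenance": "cascade:ClinicalGenerated",
}

flag = classify(lab["value"], lab["referenceRangeLow"], lab["referenceRangeHigh"])
excess = lab["value"] - lab["referenceRangeHigh"]
print(flag, excess, lab["unit"])  # high 0.1 mEq/L
```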

Step 5: Agent reasons about trust and generates an observation

Because all three data points carry explicit provenance, the agent can reason about trust differentially: the Lisinopril prescription is ClinicalGenerated (highest trust), the supplement is SelfReported (valuable but unverified), and the lab value is ClinicalGenerated (certified). The agent writes a structured observation back to the Pod with cascade:AIGenerated provenance, noting the interaction risk and the evidence chain. The MCP server automatically attaches this provenance tag — the agent cannot claim its output is clinical data.
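A rough sketch of the two ideas in this step: ranking evidence by provenance class, and a server-side guard that stamps agent output. The numeric weights, the position of AIGenerated in the ordering, and the `weakest_link` and `stamp_ai_output` helpers are assumptions; the article only states that ClinicalGenerated ranks highest and that the server attaches the AIGenerated tag.

```python
# Assumed ordering: higher number = more trusted.
TRUST = {
    "cascade:ClinicalGenerated": 3,
    "cascade:SelfReported": 2,
    "cascade:AIGenerated": 1,
}

evidence = [
    ("urn:uuid:med-lisinopril-001", "cascade:ClinicalGenerated"),
    ("urn:uuid:med-k-supplement-001", "cascade:SelfReported"),
    ("urn:uuid:lab-k-serum-001", "cascade:ClinicalGenerated"),
]


def weakest_link(records):
    """Return the (id, provenance) pair with the lowest trust rank."""
    return min(records, key=lambda record: TRUST[record[1]])


def stamp_ai_output(observation):
    """Server-side guard: overwrite whatever provenance the agent supplied."""
    return {**observation, "cascade:dataProvenance": "cascade:AIGenerated"}


print(weakest_link(evidence))
# ('urn:uuid:med-k-supplement-001', 'cascade:SelfReported')

attempt = {
    "cascade:observationText": "Potential interaction; recommend clinical review.",
    "cascade:dataProvenance": "cascade:ClinicalGenerated",  # agent overclaims
}
print(stamp_ai_output(attempt)["cascade:dataProvenance"])
# cascade:AIGenerated
```

Putting the stamp on the server side means the guarantee does not depend on the agent behaving well.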

No hard-coded drug interaction logic. No keyword matching on medication names. The reasoning is grounded in globally identified clinical concepts, and the provenance chain is preserved end-to-end.

Full Turtle Example: Medication Interaction Scenario

Here is the complete Turtle representation of the scenario above, showing all three records and the AI-generated interaction observation:

@prefix cascade:  <https://ns.cascadeprotocol.org/core/v1#> .
@prefix clinical: <https://ns.cascadeprotocol.org/clinical/v1#> .
@prefix health:   <https://ns.cascadeprotocol.org/health/v1#> .
@prefix prov:     <http://www.w3.org/ns/prov#> .
@prefix sct:      <http://snomed.info/id/> .
@prefix loinc:    <http://loinc.org/rdf#> .
@prefix xsd:      <http://www.w3.org/2001/XMLSchema#> .

# ── Lisinopril (EHR-imported, ClinicalGenerated) ─────────────────────────
<urn:uuid:med-lisinopril-001>
    a clinical:MedicationRecord ;
    clinical:medicationName "Lisinopril" ;
    clinical:dose "10"^^xsd:decimal ;
    clinical:doseUnit "mg" ;
    clinical:frequency "once daily" ;
    clinical:snomedCode sct:372756006 ;
    clinical:rxNormCode "29046" ;
    clinical:startDate "2023-06-01"^^xsd:date ;
    cascade:dataProvenance cascade:ClinicalGenerated ;
    cascade:schemaVersion "1.6" ;
    prov:wasGeneratedBy cascade:ClinicalGenerated ;
    prov:generatedAtTime "2023-06-01T09:00:00Z"^^xsd:dateTime .

# ── Potassium Supplement (patient self-reported) ──────────────────────────
<urn:uuid:med-k-supplement-001>
    a clinical:SupplementRecord ;
    clinical:medicationName "Potassium Gluconate" ;
    clinical:dose "500"^^xsd:decimal ;
    clinical:doseUnit "mg" ;
    clinical:frequency "twice daily" ;
    clinical:snomedCode sct:88526004 ;
    cascade:dataProvenance cascade:SelfReported ;
    cascade:schemaVersion "1.6" ;
    prov:wasGeneratedBy cascade:SelfReported ;
    prov:generatedAtTime "2026-01-10T18:00:00Z"^^xsd:dateTime .

# ── Serum Potassium Lab Result (certified lab, ClinicalGenerated) ─────────
<urn:uuid:lab-k-serum-001>
    a clinical:LabResult ;
    clinical:loincCode loinc:2823-3 ;
    clinical:labTestName "Potassium, Serum" ;
    health:value "5.1"^^xsd:decimal ;
    health:unit "mEq/L" ;
    clinical:referenceRangeLow "3.5"^^xsd:decimal ;
    clinical:referenceRangeHigh "5.0"^^xsd:decimal ;
    clinical:resultDate "2026-02-15"^^xsd:date ;
    cascade:dataProvenance cascade:ClinicalGenerated ;
    cascade:schemaVersion "1.6" ;
    prov:wasGeneratedBy cascade:ClinicalGenerated ;
    prov:generatedAtTime "2026-02-15T14:00:00Z"^^xsd:dateTime .

# ── AI-Generated Interaction Observation ─────────────────────────────────
<urn:uuid:obs-drug-interaction-001>
    a cascade:Observation ;
    cascade:observationText """Potential drug-supplement interaction identified.
Lisinopril (ACE inhibitor, SNOMED 372756006) combined with potassium
gluconate supplementation (SNOMED 88526004) may cause additive
hyperkalemia. Current serum K+ is 5.1 mEq/L (LOINC 2823-3),
slightly above reference range. Recommend clinical review.""" ;
    cascade:dataProvenance cascade:AIGenerated ;
    cascade:agentId "claude-desktop-v1" ;
    cascade:confidence "0.87"^^xsd:decimal ;
    prov:wasGeneratedBy cascade:AIGenerated ;
    prov:wasDerivedFrom
        <urn:uuid:med-lisinopril-001> ,
        <urn:uuid:med-k-supplement-001> ,
        <urn:uuid:lab-k-serum-001> ;
    prov:generatedAtTime "2026-02-22T10:30:00Z"^^xsd:dateTime .

Every element in this file is traceable. Every data point has a provenance class. Every clinical concept has a SNOMED CT or LOINC URI. Every agent-generated output is tagged as AIGenerated with a confidence score and explicit links to the source records it derived from. This is what it looks like when AI reasoning is built on a proper semantic foundation.
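The traceability claim can be checked mechanically. Below is a minimal sketch of such an audit over a dictionary that mirrors the provenance-related triples of the listing above. The `audit` helper and the dictionary layout are hypothetical; a real implementation would query the parsed graph instead.

```python
# Subject -> {predicate: value}, keeping only the provenance-related triples.
GRAPH = {
    "urn:uuid:med-lisinopril-001": {"cascade:dataProvenance": "cascade:ClinicalGenerated"},
    "urn:uuid:med-k-supplement-001": {"cascade:dataProvenance": "cascade:SelfReported"},
    "urn:uuid:lab-k-serum-001": {"cascade:dataProvenance": "cascade:ClinicalGenerated"},
    "urn:uuid:obs-drug-interaction-001": {
        "cascade:dataProvenance": "cascade:AIGenerated",
        "prov:wasDerivedFrom": [
            "urn:uuid:med-lisinopril-001",
            "urn:uuid:med-k-supplement-001",
            "urn:uuid:lab-k-serum-001",
        ],
    },
}


def audit(graph):
    """Return a list of problems found in AI-generated records."""
    problems = []
    for subject, props in graph.items():
        if props.get("cascade:dataProvenance") != "cascade:AIGenerated":
            continue
        sources = props.get("prov:wasDerivedFrom", [])
        if not sources:
            problems.append(f"{subject}: no prov:wasDerivedFrom links")
        for source in sources:
            if source not in graph:
                problems.append(f"{subject}: unknown source {source}")
            elif "cascade:dataProvenance" not in graph[source]:
                problems.append(f"{subject}: source {source} lacks provenance")
    return problems


print(audit(GRAPH))  # []
```

An empty result means every AI-generated record cites sources that exist and carry their own provenance class.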

Conclusion

The semantic grounding problem in health AI is not going to be solved by better prompt engineering or larger context windows. It is a structural problem: if the data you feed an agent has no self-describing semantics, the agent will always be working from incomplete information and making assumptions that are invisible to auditors.

RDF/OWL provides the foundation that health AI needs: globally identified resources, formally typed predicates, class hierarchies that support reasoning, and native provenance attachment via W3C PROV-O. The Cascade Protocol builds this foundation into every record, at every layer, from device-generated wellness data through EHR-imported clinical records to AI-generated observations.

The cost of learning Turtle syntax and the RDF data model is real. But the alternative — building health AI agents on JSON with informal schemas and hope — has a much higher cost: brittle systems, unauditable reasoning, and compliance risk that compounds with every new data source you add.