Provenance as a Compliance Primitive
When an AI agent writes "your blood pressure trend is improving," how do you know which readings it used? Were they from a medical-grade device or a consumer app? Were they from three days ago or three months ago? Without provenance, you cannot answer these questions. This post explains how the Cascade Protocol turns provenance into a first-class compliance primitive.
The Trust Problem
Health data is not homogeneous. A blood pressure reading from a certified medical-grade device that has been calibrated, validated, and imported from an EHR is fundamentally different from a blood pressure reading entered manually by a patient who may have used a consumer cuff, may have misread the display, or may have estimated from memory.
Yet when AI agents process health data, they routinely work with both types of data without any systematic way to distinguish them. The data arrives as fields in a JSON object or rows in a table. There is no attached quality classification, no source attribution, no chain of custody.
This creates a category of problems that cannot be solved at the application layer:
- Clinical decision support: An agent should weight a certified lab value differently from a patient-estimated value. If the data does not carry this distinction, the agent must either treat all data equally (unsafe) or require the operator to maintain external source-quality tables (fragile and unauditable).
- Regulatory audit: HIPAA requires that organizations maintain audit trails documenting what data was accessed and by whom. But "what data" means nothing without knowing the provenance of that data. A regulator reviewing an AI-generated clinical summary needs to know whether the underlying values came from certified sources.
- Liability: If an AI-generated recommendation is acted upon clinically and the outcome is negative, the legal question is: what data did the AI use, and was that data trustworthy? Without provenance, this question is unanswerable.
These are not theoretical risks. They are the baseline compliance requirements for any application that uses AI to process health data and surfaces the results to patients or clinicians.
The W3C PROV-O Model
PROV-O, the W3C Provenance Ontology, is a W3C Recommendation for representing provenance information in RDF, expressed as an OWL 2 ontology. It defines three core concepts:
- `prov:Entity` — A thing that exists or has existed. In health data, an entity is a specific lab result, medication record, or device reading.
- `prov:Activity` — Something that occurred. In health data, an activity might be a clinical import, a device sync, a manual data entry session, or an AI analysis run.
- `prov:Agent` — Something that bears responsibility for an activity. In health data, an agent might be a clinical system, a device manufacturer, the patient themselves, or an AI model.
PROV-O connects these concepts with relations:
- `prov:wasGeneratedBy` — Links an entity to the activity that produced it
- `prov:wasAttributedTo` — Links an entity to the agent responsible for it
- `prov:wasDerivedFrom` — Links an entity to the entities it was derived from
- `prov:used` — Links an activity to the entities it consumed
A simple example: a blood pressure reading (prov:Entity) was generated by a device sync activity (prov:Activity) performed by an Apple Watch (prov:Agent). Those three facts, expressed as RDF triples, give you a machine-readable chain of custody.

    @prefix prov: <http://www.w3.org/ns/prov#> .
    @prefix cascade: <https://ns.cascadeprotocol.org/core/v1#> .
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

    # The blood pressure reading (entity)
    <urn:uuid:bp-reading-001>
      a prov:Entity ;
      prov:wasGeneratedBy <urn:uuid:device-sync-activity-001> ;
      prov:wasAttributedTo <urn:uuid:apple-watch-agent> ;
      prov:generatedAtTime "2026-02-15T09:15:00Z"^^xsd:dateTime .

    # The device sync activity
    <urn:uuid:device-sync-activity-001>
      a prov:Activity ;
      cascade:dataProvenance cascade:DeviceGenerated .

    # The device agent
    <urn:uuid:apple-watch-agent>
      a prov:Agent ;
      cascade:agentDescription "Apple Watch Series 10, FDA 510(k) cleared" .
Cascade's Five Provenance Classes
The Cascade Protocol defines five provenance classes in the core vocabulary, each representing a distinct source of data with different trust implications:
cascade:ClinicalGenerated
Data originating from an EHR or clinical system. Imported via Apple Health clinical records, HL7 FHIR export, or direct EHR integration. This represents the highest-trust data source for clinical decision-making: the record was created by a certified clinical system, potentially reviewed by a clinician, and follows formal data quality standards.
Examples: Lab results from a hospital lab panel; medication records imported from an EHR; diagnoses from a physician's chart note.
cascade:DeviceGenerated
Data generated by a wearable or medical device. Trust depends heavily on the device: an FDA 510(k)-cleared blood pressure monitor is considerably more trustworthy than a consumer fitness tracker. Consuming applications can attach device metadata to add specificity. Device-generated data is typically objective (not self-reported) but may have measurement error or calibration drift.
Examples: Apple Watch heart rate and SpO2 readings; CGM glucose values; validated blood pressure cuff readings synced via Bluetooth.
cascade:SelfReported
Data entered directly by the patient or care recipient. This data is genuinely valuable — patients often know things about their health that do not appear in clinical records, such as supplement use, symptom patterns, and lifestyle factors. But it is unverified: the patient may misremember, estimate, or enter data that differs from clinical measurements.
Examples: Manually entered medication doses; symptom diary entries; self-reported supplement use; diet and exercise logs.
cascade:AIExtracted
Structured data that an AI model extracted from an existing document. For example, parsing a scanned lab report PDF into structured fields, or extracting medication names from a discharge summary. The source document may be authoritative, but the extraction process introduces AI error. Data tagged as AIExtracted should be treated as requiring human review before clinical use.
Examples: Medications parsed from a scanned prescription; lab values extracted from a PDF report; diagnoses structured from a clinical note.
cascade:AIGenerated
Data created entirely by AI reasoning. Observations, summaries, risk assessments, recommendations, and trend analyses written by an AI agent. This is not "extracted" from any specific source — it is the AI's synthetic output. It must always be clearly distinguished from clinical data, which is why the Cascade MCP server automatically tags all agent writes with this provenance class. Agents cannot override this.
Examples: AI-written health summaries; drug interaction observations; trend analysis; visit preparation notes generated by an AI agent.
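In application code, the five classes lend themselves to an ordered trust policy. Here is a minimal sketch in Python; the tier ordering and review flags are this post's illustrative assumptions, not normative parts of the Cascade Protocol:

```python
# Illustrative trust policy over Cascade's five provenance classes.
# Tier ordering and review flags are assumptions for this sketch,
# not part of the protocol specification.
PROVENANCE_POLICY = {
    "cascade:ClinicalGenerated": {"trust_tier": 1, "needs_human_review": False},
    "cascade:DeviceGenerated":   {"trust_tier": 2, "needs_human_review": False},
    "cascade:SelfReported":      {"trust_tier": 3, "needs_human_review": True},
    "cascade:AIExtracted":       {"trust_tier": 3, "needs_human_review": True},
    "cascade:AIGenerated":       {"trust_tier": 4, "needs_human_review": True},
}

def review_required(provenance_classes):
    """True if any input record's provenance class requires human review."""
    return any(PROVENANCE_POLICY[p]["needs_human_review"] for p in provenance_classes)
```

The point is that the policy keys off metadata carried by the data itself, not an external source-quality table maintained by the operator.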
The K+ / Lisinopril Scenario
Let's walk through a concrete clinical scenario to see how the five provenance classes enable differential trust reasoning.
A patient takes Lisinopril for hypertension and has recently started taking a potassium gluconate supplement they read about online. Their most recent lab panel shows serum potassium at 5.1 mEq/L — slightly above the reference range.
Serum K+ = 5.1 mEq/L — cascade:ClinicalGenerated
This lab result was imported from the patient's EHR via Apple Health clinical records. It carries the LOINC code for serum potassium (loinc:2823-3) and a reference range of 3.5–5.0 mEq/L. The agent treats this as the highest-trust data point: a certified lab measurement from a clinical system. The fact that it is above the reference range is clinically significant.
Potassium supplement 500mg BID — cascade:SelfReported
The patient entered this supplement in the app manually. The agent can see it carries cascade:SelfReported provenance. This means the supplement is clinically plausible and worth attending to — but it is unverified. The agent notes that this could explain the elevated K+ value, but flags that the supplement data should be verified with the patient before clinical action.
Lisinopril 10mg daily — cascade:ClinicalGenerated
The Lisinopril prescription was imported from the patient's pharmacy records via the EHR. It carries a SNOMED CT code for Lisinopril (sct:372756006). This is high-trust data: an active prescription from a clinical system. The agent knows Lisinopril is an ACE inhibitor and can cause potassium retention as a side effect.
Interaction observation — cascade:AIGenerated
The agent writes a structured observation noting the potential interaction: ACE inhibitor + potassium supplementation + elevated serum K+. The MCP server automatically tags this observation as AIGenerated. The observation explicitly links to the three source records via prov:wasDerivedFrom. Anyone reading the audit trail can see exactly what the agent used, and that the output is AI reasoning, not a clinical measurement.
The provenance chain here is not just a nice-to-have. It is what makes this scenario auditable. A clinician reviewing the AI-generated observation can see that it is based on one certified lab value, one certified prescription record, and one self-reported supplement — and can calibrate their response accordingly.
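The differential trust reasoning above can be sketched as a small program. The record dicts and field names below are illustrative stand-ins for parsed Pod records, not an actual Cascade API:

```python
# Sketch: differential trust reasoning over the K+/Lisinopril records.
# Record shapes and field names are illustrative assumptions.
records = [
    {"id": "lab-k-serum-001",      "provenance": "ClinicalGenerated", "fact": "Serum K+ 5.1 mEq/L (ref 3.5-5.0)"},
    {"id": "med-lisinopril-001",   "provenance": "ClinicalGenerated", "fact": "Lisinopril 10mg daily"},
    {"id": "med-k-supplement-001", "provenance": "SelfReported",      "fact": "Potassium gluconate 500mg BID"},
]

verified   = [r["fact"] for r in records if r["provenance"] == "ClinicalGenerated"]
unverified = [r["fact"] for r in records if r["provenance"] == "SelfReported"]

observation = {
    "provenance": "AIGenerated",                 # set by the MCP server, not the agent
    "derived_from": [r["id"] for r in records],  # becomes prov:wasDerivedFrom
    "grounded_in": verified,                     # certified clinical inputs
    "verify_with_patient": unverified,           # self-reported inputs need confirmation
}
```

The same split a clinician makes when reading the observation ("two certified facts, one unverified") falls straight out of the provenance tags.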
The Compliance Angle
HIPAA Audit Trails
HIPAA's Security Rule (45 CFR § 164.312(b)) requires covered entities to implement hardware, software, and procedural mechanisms that record and examine activity in information systems that contain or use electronic protected health information. This requirement is typically interpreted as a technical audit trail.
Cascade's provenance model and audit log satisfy the spirit of this requirement in two ways. First, every record carries data provenance metadata: you can query the Pod for all records created by AI agents, all records derived from specific source data, or all records of a given trust level. Second, the MCP server writes a structured audit entry to provenance/audit-log.ttl for every operation, including the agent ID, timestamp, operation type, and data categories accessed.
21st Century Cures Act
The 21st Century Cures Act's information blocking provisions require that health data be accessible to patients in forms they can actually use. Provenance supports this goal: when a patient receives a summary of their health data, they should be able to see where that data came from and whether any part of it was AI-generated.
The Cascade provenance model makes this possible at the data level. Every record is tagged. Every AI-generated observation carries a clear label and links back to its source records. Patients and their authorized representatives can inspect the Pod directly and see exactly what an AI agent read and wrote.
Data Lineage for Regulatory Submissions
In research and clinical trial contexts, data lineage — the complete chain from raw measurement to analyzed result — is a regulatory requirement for submissions to the FDA and other agencies. The Cascade provenance model provides this chain in machine-readable form. A SPARQL query over the Pod can reconstruct the complete derivation graph for any AI-generated output.
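As a sketch of what such a query computes: reconstructing the derivation graph is a transitive closure over prov:wasDerivedFrom edges, roughly what a SPARQL property path like `?out prov:wasDerivedFrom+ ?src` expresses. The edge data below, including the hypothetical upstream EHR panel, is illustrative:

```python
# Sketch: transitive closure over prov:wasDerivedFrom edges.
# Edge data is illustrative; "ehr-lab-panel-2026-02-15" is a
# hypothetical upstream source invented for this example.
derived_from = {
    "obs-k-interaction-001": ["lab-k-serum-001", "med-lisinopril-001", "med-k-supplement-001"],
    "lab-k-serum-001": ["ehr-lab-panel-2026-02-15"],
}

def lineage(entity, edges):
    """All entities a given output transitively derives from."""
    seen = []
    stack = list(edges.get(entity, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.append(node)
            stack.extend(edges.get(node, []))
    return seen
```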
The Audit Log
Beyond the provenance metadata on individual records, the Cascade MCP server maintains an append-only audit log at provenance/audit-log.ttl within the Pod. This log records every operation performed by an AI agent through the MCP server.
Each cascade:AuditEntry in the log records:
- Timestamp: ISO 8601 datetime of the operation
- Operation type: `read`, `write`, `query`, `validate`, or `convert`
- Data types accessed: An RDF list of the categories of data touched (e.g., `"medications"`, `"lab_results"`)
- Agent ID: The identifier of the AI agent or tool that performed the operation
- Record count: The number of records read or written
Crucially, the audit log is stored in the same RDF/Turtle format as the rest of the Pod data. It is not a separate database or log file in a proprietary format. It is a first-class citizen of the data model, queryable with any RDF tool.
The audit log as a compliance artifact
Because the audit log is machine-readable RDF, it can be used directly in regulatory submissions, legal proceedings, and compliance audits. It does not require any translation or export step. A regulator can query the log directly using standard SPARQL tools.
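Because every cascade:AuditEntry has the same shape, common compliance questions reduce to simple filters over the log. A Python sketch over already-parsed entries (in practice these would come from a SPARQL query against provenance/audit-log.ttl; the dict field names are this sketch's assumptions):

```python
from datetime import datetime

# Stand-ins for parsed cascade:AuditEntry records; in practice these
# would be produced by a SPARQL query over the audit log.
entries = [
    {"timestamp": "2026-02-22T10:29:45.102Z", "operation": "read",  "data_types": ["medications"],  "agent_id": "claude-desktop-v1", "records": 3},
    {"timestamp": "2026-02-22T10:29:52.781Z", "operation": "read",  "data_types": ["lab_results"],  "agent_id": "claude-desktop-v1", "records": 8},
    {"timestamp": "2026-02-22T10:30:01.449Z", "operation": "write", "data_types": ["observations"], "agent_id": "claude-desktop-v1", "records": 1},
]

def writes_by_agent(entries, agent_id):
    """Every write a given agent performed, oldest first."""
    hits = [e for e in entries if e["operation"] == "write" and e["agent_id"] == agent_id]
    return sorted(hits, key=lambda e: datetime.fromisoformat(e["timestamp"].replace("Z", "+00:00")))
```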
Full TTL Example: Provenance Chain + Audit Log
Here is a complete Turtle example showing the provenance chain for the K+ scenario above, including the AI-generated observation and the corresponding audit log entries:

    @prefix cascade: <https://ns.cascadeprotocol.org/core/v1#> .
    @prefix clinical: <https://ns.cascadeprotocol.org/clinical/v1#> .
    @prefix prov: <http://www.w3.org/ns/prov#> .
    @prefix loinc: <http://loinc.org/rdf#> .
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

    # ── Serum K+ Lab Result ───────────────────────────────────────────────────
    <urn:uuid:lab-k-serum-001>
      a clinical:LabResult ;
      clinical:loincCode loinc:2823-3 ;
      clinical:labTestName "Potassium, Serum" ;
      clinical:value "5.1"^^xsd:decimal ;
      clinical:unit "mEq/L" ;
      clinical:referenceRangeLow "3.5"^^xsd:decimal ;
      clinical:referenceRangeHigh "5.0"^^xsd:decimal ;
      clinical:resultDate "2026-02-15"^^xsd:date ;
      cascade:dataProvenance cascade:ClinicalGenerated ;
      prov:wasGeneratedBy <urn:uuid:clinical-import-001> ;
      prov:generatedAtTime "2026-02-15T14:00:00Z"^^xsd:dateTime .

    # The clinical import activity that produced the lab result
    <urn:uuid:clinical-import-001>
      a prov:Activity ;
      cascade:dataProvenance cascade:ClinicalGenerated .

    # ── AI-Generated Interaction Observation ─────────────────────────────────
    <urn:uuid:obs-k-interaction-001>
      a cascade:Observation ;
      cascade:observationText """Potential ACE inhibitor + potassium supplementation interaction.
      Serum K+ is 5.1 mEq/L (above reference), with concurrent Lisinopril
      (ClinicalGenerated) and self-reported potassium gluconate supplementation.
      Recommend clinical review before adjusting supplementation.""" ;
      cascade:dataProvenance cascade:AIGenerated ;
      cascade:agentId "claude-desktop-v1" ;
      cascade:confidence "0.85"^^xsd:decimal ;
      prov:wasGeneratedBy <urn:uuid:ai-analysis-001> ;
      prov:wasDerivedFrom
        <urn:uuid:lab-k-serum-001> ,
        <urn:uuid:med-lisinopril-001> ,
        <urn:uuid:med-k-supplement-001> ;
      prov:generatedAtTime "2026-02-22T10:30:00Z"^^xsd:dateTime .

    # The AI analysis activity that produced the observation
    <urn:uuid:ai-analysis-001>
      a prov:Activity ;
      cascade:dataProvenance cascade:AIGenerated .

    # ── Audit Log Entries ─────────────────────────────────────────────────────
    # (stored in provenance/audit-log.ttl)
    <#audit-read-medications> a cascade:AuditEntry ;
      cascade:timestamp "2026-02-22T10:29:45.102Z"^^xsd:dateTime ;
      cascade:operation "read" ;
      cascade:dataTypes ("medications") ;
      cascade:agentId "claude-desktop-v1" ;
      cascade:recordsAccessed 3 .

    <#audit-read-labs> a cascade:AuditEntry ;
      cascade:timestamp "2026-02-22T10:29:52.781Z"^^xsd:dateTime ;
      cascade:operation "read" ;
      cascade:dataTypes ("lab_results") ;
      cascade:agentId "claude-desktop-v1" ;
      cascade:recordsAccessed 8 .

    <#audit-write-observation> a cascade:AuditEntry ;
      cascade:timestamp "2026-02-22T10:30:01.449Z"^^xsd:dateTime ;
      cascade:operation "write" ;
      cascade:dataTypes ("observations") ;
      cascade:agentId "claude-desktop-v1" ;
      cascade:recordsAccessed 1 .
The complete trace: what the agent read, what it wrote, the exact time of each operation, and the full provenance chain linking the AI-generated output back to its source data. All in standard RDF/Turtle, readable by any person or machine.
Conclusion
Provenance is not a logging afterthought. It is a first-class semantic property of every health data record. When you build on the Cascade Protocol, every piece of data your AI agent reads carries a machine-readable classification of how trustworthy it is. Every piece of data your AI agent writes carries an explicit label identifying it as AI output. And the audit log keeps a complete record of every operation.
This is what it means to treat provenance as a compliance primitive: it is not added on top of the data model, it is woven into it. The cost of getting this right is substantially lower than the cost of retrofitting provenance tracking onto a system that was not designed with it in mind.
For organizations building AI-powered health applications, the Cascade Protocol's provenance model provides a structural answer to the compliance questions that regulators, legal teams, and clinical partners will inevitably ask: what data did the AI use, where did that data come from, and how do we know the AI's output is clearly labeled as such?
Further reading
- Security & Compliance Guide — detailed provenance model and audit log specification
- Agent examples — working code for MCP server integration
- Why RDF/OWL is the Right Foundation for Health AI Agents — the semantic grounding context for this post
- W3C PROV-O Specification