Local-First is the Only Responsible Architecture for AI + Health Data

The common pattern: you build a health app, you add AI features, you send user health data to an AI API. You have just created a massive HIPAA compliance risk. This post makes the case that local-first architecture — AI agents that run on-device with zero bytes of health data leaving the machine — is the only genuinely responsible approach.

The Common Pattern

Here is how most health apps add AI features in 2026:

  1. User opens the app.
  2. App pulls health data from local storage or an EHR API.
  3. App assembles a prompt with the health data and sends it to an AI API (OpenAI, Anthropic, Google).
  4. AI API returns a response.
  5. App displays the AI-generated summary or recommendation to the user.

This pattern is fast, easy to implement, and produces impressive demos. It is also, in most configurations, a significant compliance problem that is easy to underestimate until something goes wrong.

The question is not whether this pattern produces a bad user experience — it often produces a great one. The question is whether you have thought through the full compliance, liability, and user trust implications of transmitting health data to a third-party AI service. Most teams have not, because the default developer experience of AI APIs makes it trivially easy to skip that thinking.

Before you continue building

If you are sending health data to any third-party AI API today, ask yourself: Does your company have a signed Business Associate Agreement (BAA) with that AI provider? Do you know where the AI provider stores inference data? Do you know whether the AI provider uses API inputs for model training? If you cannot answer all three with certainty, you have compliance exposure.

The Risks of Cloud AI + Health Data

No BAA with most LLM providers

HIPAA requires a signed Business Associate Agreement with any vendor that handles PHI on your behalf. Most major AI providers (OpenAI, Anthropic, Google) do not offer BAAs to individual developers or small companies by default. Enterprise-tier agreements are available, but they require formal negotiation and typically prohibit consumer health data use cases. If you are sending PHI to an AI API without a BAA, you are in violation of HIPAA, regardless of how compelling the product is.

Unclear data retention policies

AI providers retain inference data for varying lengths of time. Even if a provider does not use API inputs for training (OpenAI's API terms state this), data is typically retained for safety monitoring and abuse detection. Your users' health data sits in a third party's infrastructure, subject to that provider's retention schedule, security posture, and potential legal process.

Vendor risk: terms of service can change

You build your product on an AI API's current terms of service. The provider changes those terms, or gets acquired, or changes its data handling policies. Your users' health data is now subject to a different data handling regime — one they never consented to and one you did not design for. GDPR's data minimization and purpose limitation principles are particularly relevant here.

Users have no control over their own data

Once health data is transmitted to a third-party AI provider, the user has lost meaningful control over it. They cannot delete it from the provider's systems. They cannot audit what the provider did with it. They cannot revoke access. This is in direct tension with the 21st Century Cures Act's information blocking provisions and GDPR's right to erasure.

Cross-contamination risk at scale

When multiple users' health data passes through the same AI inference infrastructure, there is a small but non-zero risk of cross-user contamination in model outputs, caching layers, or logging systems. That is true of any shared infrastructure; for health data, "small but non-zero" is not an acceptable risk profile.

The Local-First Alternative

The Cascade Protocol is designed from the ground up around a different architecture: the AI agent runs on the user's device or local server, and health data never leaves the machine.

[ Health Data Pod ]  ——>  [ Local MCP Server ]  ——>  [ AI Agent (local) ]
ZERO bytes of health data transmitted to external services

This is not a limitation. It is the correct design for health AI. The AI agent needs access to the user's health data to do useful work, but the data does not need to leave the device for the agent to access it. The Model Context Protocol (MCP) provides a standardized interface that runs entirely over local stdio or socket connections.

The key insight is that there are two separable concerns:

  • The AI model: The language model that does the reasoning. This can be a locally-running model (Ollama, LM Studio, etc.) or a cloud model accessed via a network connection. When using a cloud model, you are sending the model's context to the cloud, which includes whatever data you put in the prompt.
  • The health data: The structured health records in the user's Pod. These should never go in the prompt. Instead, the agent uses MCP tools to retrieve specific, relevant data in structured form, do its reasoning, and write results back — all locally.
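To make the separation concrete, here is a minimal Python sketch of the structured-retrieval idea. The record shape and field names are invented for illustration — this is not Cascade's actual data model — but it shows why a tool call moves far less data than a bulk prompt dump: the tool returns only the records the agent asked for.

```python
# Illustrative sketch: structured retrieval keeps the model's context minimal.
# Record shape and field names are hypothetical, not Cascade's data model.
from datetime import date

POD = [  # stand-in for parsed RDF records in a local Pod
    {"type": "Medication", "name": "metformin", "date": date(2025, 3, 1)},
    {"type": "Medication", "name": "lisinopril", "date": date(2025, 6, 10)},
    {"type": "Observation", "name": "weight", "value": 81.2, "date": date(2025, 6, 11)},
]

def pod_read(record_type: str, since: date):
    """Return only records of the requested type, no older than `since`."""
    return [r for r in POD if r["type"] == record_type and r["date"] >= since]

meds = pod_read("Medication", since=date(2025, 1, 1))
print([m["name"] for m in meds])  # ['metformin', 'lisinopril'] — nothing else
```

The observation record never leaves local storage for a medication query; with prompt stuffing, it would have been in the context regardless.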

Wait — if the agent uses a cloud model and the MCP tools return health data, doesn't that data still end up in the model's context? Yes. This is the honest complexity of local-first health AI. We address it in the limits section below.

How It Works Technically

The Cascade local-first architecture has four components:

1. The Local Pod

Health data is stored in a local Pod — a directory of RDF/Turtle files on the user's device. When using the CascadeSDK (Swift), this data is encrypted at rest with AES-256-GCM, with encryption keys stored in the device keychain. The Pod directory contains all of the user's health records, organized by data type.

2. The Local MCP Server

Running cascade serve --mcp starts a local Model Context Protocol server that listens on a stdio or socket interface. This server exposes a set of tools that allow an AI agent to read from and write to the Pod in a structured, consent-gated way. All file operations are validated against the Pod boundary to prevent path traversal.
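The boundary validation mentioned above can be sketched in a few lines. This is an illustrative Python version of the idea — not Cascade's implementation — resolving the requested path and refusing any operation that would escape the Pod root:

```python
# Sketch of a Pod-boundary check (illustrative, not Cascade's actual code):
# resolve the requested path and require that it stay inside the Pod root.
from pathlib import Path

def resolve_in_pod(pod_root: str, requested: str) -> Path:
    root = Path(pod_root).resolve()
    target = (root / requested).resolve()
    if root not in target.parents and target != root:
        raise PermissionError(f"path escapes Pod boundary: {requested}")
    return target

print(resolve_in_pod("/tmp/pod", "observations/2025.ttl"))  # allowed
try:
    resolve_in_pod("/tmp/pod", "../etc/passwd")             # traversal attempt
except PermissionError as e:
    print("blocked:", e)
```

Because `resolve()` normalizes `..` segments and symlink-free prefixes before the containment check, a relative traversal cannot slip past the comparison.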

Critically, the MCP server runs entirely locally. It makes no outbound network connections. Its only communication is with the AI agent framework on the same machine via the MCP protocol.

3. The AI Agent Framework

Claude Desktop, and increasingly other AI clients, support MCP natively. You configure the Cascade MCP server in Claude Desktop's MCP configuration file, and Claude can then use the Cascade tools to read and analyze the user's health data in a conversational interface.

When Claude uses an MCP tool to read health data, that data flows: Pod files → MCP server (local) → Claude Desktop client (local) → Claude API (cloud, for the model inference). The model receives the data in its context window and generates a response. The response flows back: Claude API → Claude Desktop client → displayed to user.

4. Audit Logging

Every MCP tool invocation is logged to the Pod's audit trail at provenance/audit-log.ttl. The user can see exactly what data the agent accessed and when. Every agent write is tagged as AIGenerated provenance, clearly distinguishing AI-generated content from clinical data.
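For a sense of what such an entry looks like on disk, here is a sketch in Turtle. The `cascade:` terms below are hypothetical stand-ins (hence the example.org namespace), not the protocol's actual vocabulary; the PROV-O terms are the standard W3C provenance ontology.

```turtle
# Hypothetical sketch of an audit-log entry — the cascade: terms are
# illustrative stand-ins, not the protocol's actual vocabulary.
@prefix prov:    <http://www.w3.org/ns/prov#> .
@prefix xsd:     <http://www.w3.org/2001/XMLSchema#> .
@prefix cascade: <https://example.org/cascade#> .

<#access-2025-06-11-001>
    a prov:Activity ;
    cascade:tool       "cascade_pod_read" ;
    cascade:dataType   "Medication" ;
    prov:startedAtTime "2025-06-11T14:02:07Z"^^xsd:dateTime ;
    prov:wasAssociatedWith <#ai-agent> .
```

Because the log is plain Turtle, a user can read it in a text editor, and any RDF tool can query it.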

Your device (nothing leaves this boundary):

  [ Health Pod ]  ——>  [ cascade serve --mcp ]  ——>  [ Claude Desktop ]
  RDF/Turtle files,    stdio/socket                  local process
  AES-256-GCM                                             |
                                                          ↓  model inference only
                                                             (no raw PHI if using a local model)
Cloud: [ Claude API ]  (for model inference, if not using a local model)

The Compliance Argument

No BAA required when data doesn't leave the device

If health data never leaves the user's device, there is no business associate relationship to create with the AI provider. The AI provider is processing model inputs and outputs, not receiving or storing health data on behalf of your business. This is a bright-line compliance position, not a grey area. (Note: if you are using a cloud AI API and health data does appear in the model's context, you should still consult legal counsel about BAA requirements.)

The audit log is local and user-controlled

The Pod's audit log is stored on the user's device, not in your cloud infrastructure. Users can read it, export it, and delete it. Regulators can inspect it. There is no data silo in your infrastructure that a regulator would need subpoena power to access. Everything the AI agent did is recorded in a standard RDF/Turtle file that any person or machine can read.

Data can be deleted trivially

GDPR's right to erasure requires that you be able to delete a user's data on request. With a local-first architecture, deletion is straightforward: remove the Pod directory (or, with CascadeSDK, delete the keychain key, which renders the AES-256-GCM-encrypted data cryptographically irrecoverable). There is no data spread across your servers, your AI provider's servers, and your logging infrastructure that needs to be coordinated and deleted.
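A minimal sketch of why local-first erasure is trivial — the whole dataset has exactly one location (the paths here are temporary stand-ins, not a real Pod):

```python
# Illustrative only: local-first erasure is a single directory removal.
# With CascadeSDK, deleting the keychain key achieves the same result
# for the AES-256-GCM-encrypted files without touching them.
import pathlib, shutil, tempfile

pod = pathlib.Path(tempfile.mkdtemp(prefix="health-pod-"))
(pod / "observations.ttl").write_text("# health records live here\n")

shutil.rmtree(pod)   # erasure: the data had one location, now it has none
print(pod.exists())  # False — nothing left to coordinate across vendors
```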

Users see exactly what the agent accessed

Because the audit log is a user-accessible file in the Pod, users can read it directly. They can see every read and write operation the agent performed, what data categories it touched, and what it wrote back. This is the 21st Century Cures Act's spirit of patient data transparency made concrete.

Practical Example: Claude Desktop MCP Configuration

Setting up Claude Desktop to access a Cascade Pod locally takes about two minutes. Add the following to your Claude Desktop MCP configuration file (claude_desktop_config.json — on macOS, ~/Library/Application Support/Claude/claude_desktop_config.json):

{
  "mcpServers": {
    "cascade-health-pod": {
      "command": "cascade",
      "args": [
        "serve",
        "--mcp",
        "--pod-path", "/Users/yourname/health-pod"
      ]
    }
  }
}

With this configuration in place, Claude Desktop will start the Cascade MCP server as a subprocess when it launches. The MCP server provides Claude with the following tools:

  • cascade_pod_read — Read structured health data from the Pod by data type
  • cascade_pod_write — Write AI-generated observations and summaries back to the Pod with automatic AIGenerated provenance tagging
  • cascade_pod_query — Query the Pod for specific records or data ranges
  • cascade_validate — Validate Pod data against SHACL shapes
  • cascade_convert — Convert Pod data to FHIR R4 JSON for clinical system interoperability
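Under the hood, each tool invocation is a JSON-RPC 2.0 message exchanged with the local server over stdio, per the MCP specification. A `tools/call` request for the read tool might look like the following — the argument names are illustrative; the authoritative schema is whatever the server advertises via `tools/list`:

```json
{
  "jsonrpc": "2.0",
  "id": 7,
  "method": "tools/call",
  "params": {
    "name": "cascade_pod_read",
    "arguments": { "dataType": "Medication" }
  }
}
```

This message travels over a local pipe between Claude Desktop and the cascade subprocess; it is never a network request.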

You can now ask Claude questions like "Review my medication list and check for any potential interactions," and Claude will access the relevant records directly through the MCP interface — without any health data being sent to the AI provider outside of the model inference context.

No code required

The above configuration is the entire integration. There is no application layer to build, no data pipeline to maintain, no API keys to manage for the health data access layer. The Cascade CLI handles all the structured data access, and the MCP protocol handles the agent-tool interface.

The Offline Assurance

There is a simple test you can perform to verify the local-first architecture is working as expected: disconnect from the internet, then use Claude Desktop to query your health Pod.

If you are using a local AI model (via Ollama or LM Studio), the entire workflow continues to work offline. The model runs locally, the MCP server runs locally, and the Pod is local. Zero network dependencies.

If you are using Claude Desktop with Anthropic's cloud API, the model inference will fail offline (Claude needs to reach Anthropic's servers for inference). But the MCP server and Pod access remain local — you can confirm this by watching network traffic: the Cascade MCP server makes no outbound connections whatsoever.

This offline test is a useful sanity check during development. If your AI + health data workflow does not degrade gracefully offline, it means health data is flowing out of your local environment during normal operation — which may not be what you intended.

For extra assurance

You can verify the absence of outbound connections from the Cascade MCP server at any time using a network monitor (Little Snitch on macOS, or lsof -i / Wireshark). The server process should show zero network connections during normal operation.

What Local-First Cannot Do

Honesty matters here. Local-first architecture has real limitations that are worth acknowledging explicitly.

Using a cloud AI model means health data enters the model's context

When Claude Desktop uses a Cascade MCP tool to read health data and includes it in a message to Claude (Anthropic's cloud API), that health data is being sent to Anthropic's servers for model inference. The MCP server itself is local and makes no outbound connections, but the data Claude retrieves via MCP tools does appear in the model context that goes to the API.

This is a fundamental architectural tension: you cannot use a cloud-hosted AI model and simultaneously guarantee that health data never leaves the device. The honest position is that local-first architecture with a cloud model significantly reduces the scope of data transmission (you only send what the agent explicitly requests, not a bulk dump), but does not eliminate it for the model inference step.

If this is unacceptable for your use case, use a locally-running model (Ollama, LM Studio) instead of a cloud API. The Cascade MCP server works identically with either. The tradeoff is model quality: current local models are capable but generally less capable than frontier cloud models for complex clinical reasoning tasks.

Local-first does not guarantee security of the device

Local-first means data does not leave the device through the Cascade stack. It does not protect against a compromised operating system, malware with filesystem access, or physical access to an unlocked device. These are device-level security concerns outside the protocol's scope. Platform-level protections (iOS Secure Enclave, macOS FileVault, keychain access control) are what address these threats.

Collaborative health data sharing requires a different architecture

If your use case requires sharing health data between multiple parties — a patient and their cardiologist, a research participant and a study coordinator — the local-first Pod model needs to be extended with a sync and access control layer. The Cascade Protocol's Solid-compatible data format supports this via WebID-based access control, but the sync infrastructure is beyond the scope of the local CLI tools.

Conclusion

The default path for adding AI to a health app is the easy one: send data to an API, get an answer, display it. It is fast, demo-friendly, and creates compliance problems that are invisible until they are not.

The local-first alternative requires a small amount of additional thinking: you need to understand MCP, you need to install the Cascade CLI, you need to think about what data the agent should have access to. But the result is an architecture where you can answer the compliance questions that matter: where is the data? who can access it? what did the agent do with it? can the user delete it?

For most health AI use cases — medication review, symptom tracking, visit preparation, trend analysis — local-first with a Cascade Pod and Claude Desktop MCP integration is sufficient, simple to set up, and compliant by default. For use cases that genuinely require cloud data sharing, build that deliberately, with appropriate consent, BAAs, and access controls — not because an API call was convenient.

Health data is among the most sensitive personal information that exists. It deserves an architecture that treats it that way.
