A team building a clinical decision support product had done the integration work correctly. They had FHIR access to a health system’s patient data. Structured fields were coming through cleanly: diagnoses, medications, lab values, vitals. Their model was trained. The pipeline was running.
But the predictions kept underperforming in ways that were hard to explain.
When they dug in, the answer wasn’t in the model. It was in the data. The structured fields they were receiving told them what a clinician had coded. They didn’t tell them what the clinician had actually observed, reasoned through, or been concerned about. That information lived in the progress notes, the assessment and plan sections, the free-text observations that a physician had written at the point of care. It lived in the narrative.
And the narrative wasn’t in the FHIR feed.
Getting access to it required a separate data use agreement, a de-identification process the health system’s privacy team needed to approve, an IRB determination in some cases, and in one instance, a custom extraction from the EHR’s document storage that the vendor quoted as a separate professional services engagement.
The structured data had moved. The context that made it meaningful hadn’t. And the product, which was fundamentally a reasoning tool, was trying to reason without the part of the record where the reasoning actually lived.
This is not a rare edge case. It is one of the most consistent gaps I see teams hit when building AI and analytics products on healthcare data. And it points to something deeper than a missing API endpoint.
The Standard Evolved. The Problem Didn’t.
To understand why narrative data sits outside most interoperability frameworks, it helps to understand what those frameworks were originally built to do.
HL7 v2, which became the dominant healthcare messaging standard through the 1990s and 2000s, was designed to move discrete clinical events between systems inside a hospital. An admission. A lab result. A medication order. A discharge. These are transactional events with defined fields, defined values, and defined recipients. HL7 v2 was built to carry them reliably from one system to another, and it does that reasonably well, even today.
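To make the "discrete clinical events" point concrete, here is a minimal sketch of what an HL7 v2 lab result (an ORU^R01 message) actually carries. The segment and field positions follow the v2 pipe-delimited layout; the patient and result values are invented for illustration.

```python
# A minimal sketch of what HL7 v2 carries: a pipe-delimited lab result
# (ORU^R01). Segment and field positions follow the v2 layout; the
# patient and result values are invented for illustration.
message = "\r".join([
    "MSH|^~\\&|LAB|HOSP|EHR|HOSP|202401150830||ORU^R01|MSG0001|P|2.5",
    "PID|1||123456^^^HOSP^MR||DOE^JANE",
    "OBR|1||LAB789|2345-7^GLUCOSE^LN",
    "OBX|1|NM|2345-7^GLUCOSE^LN||182|mg/dL|70-110|H|||F",
])

def parse_segments(msg):
    """Split a v2 message into {segment_id: fields}, keeping the first of each."""
    segments = {}
    for line in msg.split("\r"):
        fields = line.split("|")
        segments.setdefault(fields[0], fields)
    return segments

seg = parse_segments(message)
# The result is fully discrete: a coded test, a numeric value, a unit,
# a reference range, an abnormal flag. There is no field here for the
# clinician's reasoning about why the glucose is high.
print(seg["OBX"][5], seg["OBX"][6], seg["OBX"][8])  # 182 mg/dL H
```

Every field is a coded or numeric value with a defined position. That design is exactly why the format moves transactional events reliably, and exactly why it has nowhere to put a narrative.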
What it was never designed to carry was clinical reasoning. The assessment and plan a physician writes after examining a patient. The nursing note that flags a subtle change in behavior. The care coordinator’s summary of a patient’s social situation that explains why the medication isn’t being taken. These are narrative artifacts. They don’t fit into discrete fields. They can’t be reliably coded or structured without losing something in the translation.
HL7 v3 tried to address this with a more expressive document model, the Clinical Document Architecture (CDA), which could carry structured and unstructured content together. Adoption was poor. The standard was complex, the implementation burden was high, and health systems that had already invested in HL7 v2 infrastructure had little incentive to migrate.
FHIR brought a more modern approach. The DocumentReference and DiagnosticReport resources can carry clinical notes. The standard technically supports narrative content. But technical support in a specification and actual availability through a production API are two different things. Most FHIR implementations expose the structured resources that regulators specifically mandate. Clinical notes, care summaries, and free-text observations remain inconsistently available, inconsistently formatted, and frequently gated behind additional governance processes that structured data doesn’t require.
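The gap between "the specification supports notes" and "the note is actually retrievable" shows up concretely in how DocumentReference resources come back. The field names below follow the FHIR R4 DocumentReference structure; the sample resource itself is invented, and real servers vary in what they return.

```python
import base64

# A sketch of checking whether a DocumentReference actually carries the
# note inline, or merely points at a Binary that may require a separate,
# possibly gated, fetch. Field names follow FHIR R4 DocumentReference;
# the sample resource is invented.
doc_ref = {
    "resourceType": "DocumentReference",
    "status": "current",
    "type": {"coding": [{"system": "http://loinc.org", "code": "11506-3",
                         "display": "Progress note"}]},
    "content": [{
        "attachment": {
            # Many servers return only a reference here, not inline data.
            "contentType": "text/plain",
            "url": "Binary/note-4711",
        }
    }],
}

def inline_note_text(resource):
    """Return decoded note text if any attachment carries inline base64
    data; return None if the narrative lives behind a separate fetch."""
    for content in resource.get("content", []):
        att = content.get("attachment", {})
        if "data" in att:
            return base64.b64decode(att["data"]).decode("utf-8")
    return None

print(inline_note_text(doc_ref))  # None: the narrative is a second hop away
```

In practice that second hop, the Binary fetch, is often where the additional governance and access controls sit, which is why a resource that "supports" notes can still leave a product without them.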
The standard evolved. The gap between what the standard supports and what implementations actually expose did not close at the same pace. And for AI and analytics products that depend on narrative data, that gap is often the difference between a product that works and one that doesn’t.
Privacy as a Genuine Concern and a Convenient Friction
Clinical notes are sensitive in ways that structured data is not. A diagnosis code tells you a patient has Type 2 diabetes. A progress note might tell you the patient is struggling with depression, is in an abusive relationship, or has a substance use history they haven’t disclosed to their family. The narrative carries context that is clinically valuable precisely because it is specific, personal, and unfiltered.
HIPAA governs both. But in practice, health systems apply considerably more caution to unstructured narrative data than to structured fields. Privacy review processes are longer. De-identification is harder to validate. The risk of inadvertent re-identification is higher when the data includes free text written in natural language rather than coded values.
This caution is legitimate. It is also, in some cases, applied in ways that go beyond what the regulatory framework strictly requires, and that create friction that slows down access to data that could be shared safely with the right governance in place.
The pattern I have seen repeatedly is this: a technically and legally feasible data sharing arrangement stalls not because there is a genuine regulatory barrier, but because the privacy and compliance review process at the health system is under-resourced, risk-averse by institutional culture, or simply unfamiliar with the specific use case being proposed. Nobody says no. Nobody says yes either. The request sits in a queue, gets escalated, gets sent back for clarification, and eventually the team trying to build the product runs out of runway.
Red flag: When a data access request has been in review for more than 60 days without a clear decision path, the blocker is almost never purely legal. It is organizational. Find the person who can make the decision and get in the room with them directly.
The privacy layer is real and necessary. But founders need to distinguish between genuine regulatory constraints and institutional friction that looks like compliance but is really just slowness, risk aversion, or in some cases, a subtle form of the same proprietary resistance that shows up in commercial negotiations.
What Proprietary Resistance Looks Like Here
Blog 2 covered the commercial economics of interoperability broadly. This layer is narrower: the proprietary resistance that shapes which data gets exposed, at what fidelity, and under what terms.
EHR vendors store clinical notes and narrative content in their own document management systems, with their own data models, their own metadata structures, and their own access controls. When they build FHIR APIs, they make choices about what to surface. Structured data that maps cleanly to mandated resources gets exposed. Narrative content that requires custom extraction, format conversion, and additional governance overhead gets deprioritized, or put behind a separate commercial arrangement.
This is partly legitimate technical complexity. Normalizing free-text clinical notes across a health system’s document repository is genuinely hard. But it is also partly a reflection of where the regulatory pressure is strongest. Mandates have been specific about structured data resources. They have been less prescriptive about clinical note availability. Where the mandate is clear, compliance follows. Where it is ambiguous, the default is to do less.
Decision rule: If your product’s core value depends on narrative clinical data, treat access to that data as a first-order business risk, not a technical integration task. Validate that you can get it, in a usable form, under a workable governance arrangement, before you build the product around it.
The Structured Data Trap
There is a subtler version of this problem that catches teams who do have good structured data access.
Structured fields in a clinical record reflect what a clinician coded, not necessarily what they observed or decided. Diagnosis codes are added for billing purposes, often by coders working from documentation after the fact. Medication lists reflect what was prescribed, not what was taken. Problem lists drift out of date. Vitals get recorded, but the clinical interpretation of those vitals lives in the note.
For an AI or analytics product, building on structured data alone means building on a representation of care that has already been filtered, simplified, and sometimes distorted by the coding and documentation process. The model learns from what was recorded in discrete fields. The clinical reality that drove the decision often lived somewhere else.
This is not an argument against using structured data. It is an argument for being precise about what your model is actually learning from, and what it is missing. Teams that understand this gap build products that account for it. Teams that don’t tend to discover it the hard way, usually after deployment, when the model behaves unexpectedly in cases where the structured fields looked fine but the clinical picture was more complicated.
If you only remember one thing: structured data tells you what was coded. Narrative data tells you what was thought. For most clinical AI use cases, you need both. Plan your data access strategy accordingly, before you finalize your architecture.
What to Do Before You Build on Clinical Data
If your product depends on healthcare data, structured or narrative, these are the questions worth answering before you commit to an architecture or a timeline.
- Have you identified specifically which data elements your product needs, and confirmed whether they are available through standard FHIR endpoints or require separate access agreements?
- If your product depends on clinical notes or narrative content, have you validated access with a real health system, not just a developer sandbox or a public dataset?
- Do you understand the de-identification or data use agreement requirements for the narrative data you need, and have you scoped the time and cost of that process realistically?
- Have you mapped the privacy and compliance review process at your target health system, including who owns the decision and what the typical timeline looks like?
- Is your model or analytics logic validated against the data you will actually have in production, or against a richer dataset that won’t be available at the health systems you’re selling to?
- Have you distinguished between genuine regulatory constraints and institutional friction, and do you have a path to navigate the latter?
Closing
The evolution of healthcare data standards is a real story of progress. HL7 v2 gave the industry a common language for clinical events. FHIR brought that language into the modern web. Mandates have pushed adoption further than voluntary incentives ever would have.
But standards define what is possible. They do not determine what gets implemented, what gets exposed, or what gets shared. Those decisions are made by organizations with their own incentives, their own risk tolerances, and their own commercial interests. The result is a landscape where the structured data layer has improved substantially, while the narrative layer, the part of the clinical record where reasoning, context, and clinical judgment actually live, remains inconsistently accessible, inconsistently governed, and consistently underestimated by teams building products that depend on it.
The gap is not primarily technical. It is structural. And the founders who navigate it well are the ones who understood that early, validated their data access assumptions before building, and treated narrative data not as a nice-to-have enrichment layer but as a core dependency that needed its own access strategy from day one.
The standard evolved. The incentives didn’t always follow. In healthcare data, they rarely do.