Standard Data Models: Why They Matter, At What Cost & What Other Sectors Can Learn When Meaning Breaks Down

A deep dive into standard data models across healthcare, finance, and higher education, exploring why translation doesn't scale and how shared data meaning protects people from fatigue.

If you have ever been asked to explain why two reports show different numbers for what everyone thought was “the same thing”, you already understand the problem that standard data models exist to address.

It is almost never a calculation issue. It is almost always a meaning issue.

This is not unique to any one sector. At Dataqubed Ltd, we see it repeatedly across healthcare, life sciences, insurance, banking, wealth and asset management, biomedical sciences (including bioinformatics), and higher education. Different domains, different pressures, but the same underlying friction: data is plentiful, but shared meaning is fragile. In some of these sectors, that friction became costly enough that something had to give.

Standard data models, common data models, and reference standards did not emerge because they were intellectually neat. They emerged because translation, reconciliation, and reinterpretation simply did not scale.

This article looks at what standard data models really are, why they appear, what they enable, and what they cost. We then reflect on the situation in higher education - not as a sector that has failed to produce a common data model, but as one that is grappling with the same problem under a very different set of constraints.

What Is a Standard Data Model, Really?

A standard data model is not a single database, a single system, or a vendor product. It is an agreement about how a domain is represented in data so that information created in one place can be understood and reused elsewhere.

Where standard models have endured, they tend to separate three things deliberately.

First, meaning: what a concept actually is and is not.
Second, structure: how that concept is represented in data.
Third, use: how that data is exchanged, analysed, or reported.

This separation is not theoretical. Many initiatives fail because they rush to structure without agreeing meaning, or because they publish definitions without giving practitioners a practical way to implement them. A model only becomes real when it is usable by people who were not involved in designing it.

Crucially, standard data models do not eliminate variation. They make variation explicit, mappable, and explainable.
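
To make that concrete, here is a minimal Python sketch. It assumes two hypothetical institutions that record a student's study status with different local codes; the institution names, codes and shared concepts are illustrative inventions, not drawn from HERM, Data Futures or any other real model. Each local value is mapped to a shared concept while the original is kept for traceability, and anything without an agreed mapping is surfaced rather than silently dropped.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical local-to-standard mappings, one table per source system.
# Variation is not removed; it is written down where everyone can see it.
STATUS_MAP = {
    "institution_a": {"ENR": "ACTIVE", "INT": "SUSPENDED", "WD": "WITHDRAWN", "GRAD": "COMPLETED"},
    "institution_b": {"Current": "ACTIVE", "Break": "SUSPENDED", "Left": "WITHDRAWN"},
}


@dataclass
class MappedRecord:
    source: str
    local_value: str               # original value, kept for traceability
    standard_value: Optional[str]  # None when no agreed mapping exists yet


def map_status(source: str, local_value: str) -> MappedRecord:
    """Map one local status code to the shared concept, keeping the original value."""
    standard = STATUS_MAP.get(source, {}).get(local_value)
    return MappedRecord(source, local_value, standard)


def unmapped(records: list[MappedRecord]) -> list[MappedRecord]:
    """Return the records whose local value has no agreed shared meaning yet."""
    return [r for r in records if r.standard_value is None]


records = [
    map_status("institution_a", "INT"),
    map_status("institution_b", "Break"),
    map_status("institution_b", "Repeat"),  # no agreed mapping yet
]
for r in unmapped(records):
    print(f"Needs a decision on meaning: {r.source} / {r.local_value!r}")
```

Here `INT` and `Break` resolve to the same shared concept, while `Repeat` is flagged as a gap that needs a decision about meaning. The local variation is still there; it has simply become visible and explainable.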

Why Do Sectors Adopt Standard Data Models At All?

Standard data models rarely appear because a sector decides it wants to be “more mature”. They appear when the cost of everyone doing things their own way becomes unacceptable.

In regulated clinical research, that cost is traceability and reviewability. Standards such as CDISC SDTM exist so that trial data can be aggregated, analysed, reused, and reviewed consistently across studies and sponsors. Without this discipline, regulatory review would collapse under inconsistency.
In payments and banking, the move to ISO 20022 was driven by the limits of unstructured messaging. Richer, structured data was needed to support processing, reconciliation, compliance, and resilience across the payments ecosystem.
In healthcare analytics, the OMOP Common Data Model exists because observational health data is recorded differently across electronic health records, claims systems, and registries. OMOP transforms those disparate datasets into a common representation, with standardised vocabularies, so that the same analyses can be applied consistently across many sources.
In genomics and bioinformatics, collaboration at scale only became possible once shared representations emerged. Formats such as the Variant Call Format succeeded because they provided a generic, scalable way to represent variants that tool builders and researchers could rally around.

Different sectors, different pressures - but the same underlying truth: translation does not scale.

Benefits of Standard Data Models That Show Up in Everyday Work

“Interoperability” is often framed as an architectural concern. In practice, its benefits are felt most strongly by the people doing the work.

When meaning is shared, analysts spend less time reconciling definitions and more time answering questions.

In healthcare analytics environments that use OMOP, once data is mapped into the common model, standard analytical methods can be reused rather than rebuilt for each dataset. That is a direct reduction in effort, not an abstract gain.

Operational decisions become faster and safer because numbers behave consistently. In healthcare, this is discussed in terms of evidence quality and safety. In other sectors, it shows up as fewer escalations, fewer reconciliations, and fewer meetings to explain why yesterday’s number changed today.

System change becomes less traumatic. A standard model acts as a buffer between systems. When one system is replaced, the mapping is done once. Downstream users do not have to re-engineer everything they rely on, and organisations do not have to pay for that work again and again.
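
The buffer effect can be shown with simple counting. This is an illustrative model only, assuming that every system exchanges data with every other system and that each integration is one translation; real estates are rarely this uniform. With direct point-to-point translation, n systems need n(n-1)/2 translations and replacing one system disturbs n-1 of them. With a shared model, each system needs one mapping, and replacing a system means redoing just that one.

```python
def point_to_point(n: int) -> tuple[int, int]:
    """Translations needed, and translations to redo when one system is replaced,
    if every pair of systems is integrated directly (illustrative assumption)."""
    total = n * (n - 1) // 2
    redo_on_replacement = n - 1
    return total, redo_on_replacement


def via_shared_model(n: int) -> tuple[int, int]:
    """Mappings needed, and mappings to redo when one system is replaced,
    if every system maps once to a shared standard model."""
    total = n
    redo_on_replacement = 1
    return total, redo_on_replacement


for n in (3, 5, 10, 20):
    p2p_total, p2p_redo = point_to_point(n)
    hub_total, hub_redo = via_shared_model(n)
    print(f"{n:>2} systems: point-to-point {p2p_total:>3} (redo {p2p_redo:>2}) | "
          f"shared model {hub_total:>2} (redo {hub_redo})")
```

At three systems both approaches need three integrations; at twenty, point-to-point needs 190 against 20. That quadratic growth is one way to read the claim that translation does not scale.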

Perhaps most importantly, shared models enable shared tools and shared practice. In healthcare, the real value of OMOP lies not just in the model itself, but in the ecosystem of validation checks, standard queries, and community knowledge that surrounds it. In finance, richer structured data reduces manual intervention across organisations. In life sciences, standard models support reuse and review without repeated reinterpretation.

These benefits are not glamorous. They are mundane… and that is precisely why they matter.

The Costs and Trade-Offs of Standard Data Models

Standard data models are not free… read that again.

Mapping local data into a standard model is real, skilled work.

Healthcare literature around OMOP is explicit about the effort required for transformation, vocabulary mapping, and validation. Local reality is messy, and no model makes that disappear.

Standards can also drift or calcify. Without active stewardship, they lag behind practice. People still “comply” but quietly build workarounds, leading to an all-too-familiar outcome: the model survives while trust in it slowly erodes.

One model rarely serves every purpose well. OMOP, for instance, is explicitly designed for observational analytics. It is not trying to be an operational messaging standard or a transactional system model. Attempts to build a single model that perfectly supports operations, reporting, exchange, and analytics often collapse under their own ambition.

There is also the risk of standardising the wrong thing. If workarounds are codified instead of meaning, yesterday’s compromises become tomorrow’s constraints.

What Needs to Be True for Standard Models to Succeed?

Across healthcare, life sciences, finance, insurance, bioinformatics and higher education, the same conditions recur.

There must be a clear, practical purpose that people can recognise in their own work. CDISC exists for regulatory review and reuse. ISO 20022 exists to enable richer payment data and ecosystem efficiency. OMOP exists to support consistent observational analysis.

The scope must be bounded. Successful models are explicit about what they cover and what they leave local, at least initially.
There must be trusted, continuous governance. Governance here does not mean bureaucracy. It means stewardship: versioning, change control, and a way to resolve disagreements about meaning over time.
Tooling and validation matter. People adopt standards when they can implement them without heroic effort. In OMOP implementations, validation and repeatable queries are essential to making the model usable in practice.
Finally, incentives must align with the work required. In sectors where standard models endure, adoption is reinforced by regulatory, commercial, or ecosystem-level pressures - not just goodwill or well-intentioned ambition.
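
As an illustration of what "repeatable" validation can mean, here is a minimal Python sketch. It assumes records arrive as dictionaries and that the shared model specifies some required fields plus an agreed vocabulary for one field; the field names and rules are hypothetical. Mature ecosystems such as OMOP have their own dedicated data quality tooling, which this sketch does not reproduce.

```python
from typing import Any

# Hypothetical rules for one entity in a shared model.
REQUIRED_FIELDS = ("person_id", "start_date", "status")
ALLOWED_STATUS = {"ACTIVE", "SUSPENDED", "WITHDRAWN", "COMPLETED"}


def check_record(record: dict[str, Any]) -> list[str]:
    """Return human-readable problems for one record (an empty list if it conforms)."""
    problems = [f"missing {f}" for f in REQUIRED_FIELDS if record.get(f) in (None, "")]
    status = record.get("status")
    if status not in (None, "") and status not in ALLOWED_STATUS:
        problems.append(f"status {status!r} is not in the agreed vocabulary")
    return problems


def conformance_report(records: list[dict[str, Any]]) -> dict[str, Any]:
    """Run the same checks over a batch so results are comparable between runs."""
    failures = {}
    for i, rec in enumerate(records):
        problems = check_record(rec)
        if problems:
            failures[i] = problems
    passed = len(records) - len(failures)
    rate = passed / len(records) if records else 1.0
    return {"checked": len(records), "passed": passed, "rate": rate, "failures": failures}


batch = [
    {"person_id": 1, "start_date": "2024-09-23", "status": "ACTIVE"},
    {"person_id": 2, "start_date": "", "status": "ACTIVE"},
    {"person_id": 3, "start_date": "2024-09-23", "status": "Break"},
]
report = conformance_report(batch)
print(f"{report['passed']}/{report['checked']} records conform")
for index, problems in report["failures"].items():
    print(index, problems)
```

Because the rules live in one place and run the same way every time, a drop in the pass rate after a system upgrade is a hint that the data has changed, rather than the way it is being measured.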

And Where Does Higher Education Sit?

At Dataqubed, drawing on our experience of big data and analytics across multiple sectors, we have observed that higher education is not unique in the challenge of agreeing and implementing a sector-wide standard data model - but it does operate under a distinct set of constraints.

The UK HE sector has not stood still. Programmes such as HEDIIP invested significant effort in analysing existing student data collections and developing a logical data model to underpin future collections. HESA’s Data Futures programme introduced a new student data model for national returns from 2022/23 onwards, revealing both the ambition and the complexity of sector-wide change.

Alongside this, UCISA’s Higher Education Reference Models (HERM) provide a shared enterprise architecture view for the sector, including a Data Reference Model intended to establish common language - the “business nouns” of higher education. HERM has evolved over multiple versions and is actively maintained and reviewed by a practitioner community.

This matters. It shows that the sector understands the problem and has invested in foundational work.

It also highlights something important. These efforts have focused on shared understanding and reference, not on defining an operational or analytical common data model that creates unavoidable implementation momentum for systems and suppliers.

That is not a failure. In other sectors, reference models and shared language usually come first. What follows - if it follows at all - is a deliberate decision to move into bounded, implementable models tied to specific purposes.

So Why Has Higher Education Not Crossed the Threshold to Mandated Implementation?

As with most aspects of higher education, institutional autonomy is likely to be a factor. Because that autonomy is fiercely guarded, academic rules are deeply local, and variation in policy, process and data is the norm, not the exception.

Legacy systems and local workarounds make mapping a significant effort on top of business-as-usual pressures. And unlike healthcare, with its patient-safety imperative, or payments, with their resilience requirements, higher education does not yet have a single forcing function that makes interoperability the cheapest option.

A common data model is not a document. It is a long-term social and technical agreement. Those take time, trust, and sustained stewardship.

A Dataqubed Reflection: Adoption, Adaptation, and the Missing Data Foundations

Across every sector we work in, one pattern repeats with remarkable consistency. Organisations are encouraged, and often rightly, to adopt modern software platforms rather than adapt them. The argument is familiar: adoption lowers cost, reduces complexity, simplifies upgrades, and aligns organisations to product roadmaps that are designed to evolve.

In principle, this is sound advice. But adoption only works when the conditions for adoption already exist.

In healthcare, life sciences, finance, and insurance, the shift towards adoption was not simply a technology decision. It was preceded, sometimes quietly, sometimes painfully, by years of work on shared data meaning. Common vocabularies, reference models, canonical representations, and agreed reporting constructs created a layer of stability beneath the systems themselves.

Those investments were rarely glamorous. They were slow, contested, and often frustrating. But they meant that when organisations were asked to adopt new platforms, they were not also being asked to resolve decades of unresolved data ambiguity at the same time.

Higher education is increasingly being asked to make a similar leap, often without that stabilising layer fully in place.

Universities feel the pressure to adopt standard student system processes while still carrying deeply local academic rules, regulatory requirements, historical policy decisions, and complex legacy estates. When shared data meaning does not exist, adaptation does not disappear - it simply moves, often into exactly the places we least want it.

It moves into reporting layers, into downstream analytics, into shadow systems and spreadsheets, into manual reconciliations that sit quietly outside programme plans. It moves into the emotional labour of professional services teams who are told the system is “standard” while knowing, from experience, that the data rarely behaves that way.

This is not resistance to change. It is not a failure of capability. It is a structural mismatch between aspiration and foundation.

What we see repeatedly is that without shared data meaning, adoption becomes something that has to be continually propped up - and this is where the real cost lies. Every system upgrade risks breaking reports. Every integration becomes bespoke. Every new insight requires explanation before it can be trusted.

The hidden cost is not just financial. It is cumulative fatigue. Teams relearn the same lessons on every programme. Knowledge fails to transfer cleanly between institutions, suppliers, and projects. People move roles, but the understanding of “what the data really means” stays trapped locally, often undocumented, often held by a shrinking number of individuals.

Other sectors learned, sometimes the hard way, that this is not sustainable. In healthcare, common data models such as OMOP did not replace operational systems. They created a shared analytical language that allowed insight, comparison, and learning to scale without forcing uniformity. In finance, structured standards such as ISO 20022 did not eliminate complexity, but they moved it into places where it could be managed consistently rather than rediscovered repeatedly.

Higher education has already taken important first steps. Initiatives such as HEDIIP and UCISA’s HERM show a sector willing to invest in shared language and conceptual clarity. That work should not be underestimated. In other sectors, it was a necessary precursor to anything more concrete. What has not yet happened is a crucial sector-level decision about whether shared data meaning should move beyond reference and into selective, purposeful implementation.

Not a grand, all-encompassing common data model. Not a mandated vendor schema - HE needs to take the reins rather than be led by vendor agendas. But a bounded, realistic step that tackles a problem everyone recognises as painful: progression, outcomes, regulatory reporting, or cross-system integration.

At Dataqubed, we are not arguing that higher education must do this. Nor are we suggesting that a common data model is a silver bullet. We are arguing something more modest - and more important.

Without some shared data meaning, the promise of “adopt not adapt” will continue to place disproportionate strain on the people delivering change, rather than on the structures that make change easier. Other sectors eventually recognised that shared data foundations are not an optional extra; they are what make adoption humane, sustainable, and repeatable.

Higher education has the opportunity to learn that lesson deliberately rather than through repeated exhaustion. The question is not whether a common data model is the answer.

It is whether the sector is willing to ask:

What problem are we trying to make easier and for whom?

What This Means in Practice

For HE leaders, this is a reminder that adoption is not just a procurement or governance decision. When shared data meaning is weak, the cost of “adopt not adapt” is paid downstream - in reporting effort, workarounds, and fatigue. Investing time in shared data foundations is not about control or standardisation for its own sake; it is about making change survivable and repeatable. The most important leadership question is not “are we adopting the system?” but “where is adaptation really happening now - and at what cost?”

For delivery teams, this is a recognition of reality. The late nights reconciling data, the quiet spreadsheets that “just make it work”, the sense that every programme starts from scratch - these are not personal failures or poor planning. They are symptoms of missing shared meaning.

A move towards common data foundations would not remove complexity, but it would stop the same complexity being rediscovered, re-explained, and reabsorbed by different teams every time.

For university system suppliers, particularly those relating to student systems, this is an invitation rather than a criticism. “Adopt not adapt” is a reasonable aspiration, but it only holds when customers share enough data meaning to meet you halfway. Reference models like HERM show that the sector is willing to do that groundwork. The opportunity now is to engage with those foundations - to support mapping, expose meaning clearly, and help shift adaptation out of fragile reporting layers and into more sustainable patterns.

Why Dataqubed Cares About This

At Dataqubed Ltd, we care about data because we’ve seen what happens when it’s allowed to scale properly and when it isn’t. We’ve worked with data at the level of the human and bacterial genomes (Human Genome Mapping Project, Cambridge; UBC, Canada), with financial data in developing economies (M-PESA in Kenya, Afghanistan, Egypt, India), and with large-scale national healthcare datasets. We know our big data, but more importantly, we understand how fragile insight becomes when meaning isn’t shared. We firmly believe data challenges in higher education are surmountable with the perseverance, grit and determination this sector shows in spades already.

Across sectors, we’ve seen how the strong data foundations we have delivered reduce friction, lower risk, and make learning repeatable rather than exhausting. We’ve also seen how the absence of those foundations, and a refusal to invest in them, quietly pushes effort onto people: into rework, manual fixes, and constant explanation.

We enjoy talking about data because these are not abstract problems. They affect how organisations make decisions, deliver change, and support the people doing the work. If you want to talk to someone who truly understands data - how it behaves in the real world, how to scale it, and how to extract insight that lasts - we’d love to have that conversation. 📩 hello@dataqubed.com

Ready to build strong data foundations for your organisation? Get in touch with our expert strategy and engineering team today.

References

OHDSI. Standardized Data: The OMOP Common Data Model. https://www.ohdsi.org/data-standardization/
Voss, E.A. et al. (2015). Feasibility and utility of applications of the OMOP Common Data Model. Journal of the American Medical Informatics Association. https://academic.oup.com/jamia/article/22/3/574/716183
CDISC. Study Data Tabulation Model (SDTM). https://www.cdisc.org/standards/foundational/sdtm
ISO. ISO 20022 – Financial Services Messaging Standard. https://www.iso20022.org/
Bank of England. ISO 20022 migration for UK payments and RTGS. https://www.bankofengland.co.uk/payment-and-settlement/rtgs-renewal-programme/iso-20022
Danecek, P. et al. (2011). The Variant Call Format and VCFtools. Bioinformatics. https://doi.org/10.1093/bioinformatics/btr330
UCISA. Higher Education Reference Models (HERM). https://www.ucisa.ac.uk/herm
UCISA. HERM Version 3.1 release announcement.
EDUCAUSE. The Higher Education Reference Models. https://www.educause.edu/focus-areas-and-initiatives/enterprise-it/higher-education-reference-models
HESA. Student Record Data Model (Data Futures). https://www.hesa.ac.uk/innovation/data-futures
HESA. HEDIIP logical data model (archived).
Office for Students. Independent review of HESA Data Futures. https://www.officeforstudents.org.uk/publications/independent-review-of-hesa-data-futures/