An ANSI-based Resource-Oriented Clinical Information Repository (CIR) in the Cloud

Resource-Oriented is different from Inmon and Kimball

Since the turn of this century, companies that handle large volumes of data (e.g., Google and Amazon) have been moving away from the early-1990s RDBMS-centric design patterns advocated by Inmon and Kimball, and toward contemporary Web-based Resource-Oriented architectures.

Building contemporary Web-based Resource-Oriented data services is no more complex than building a typical 1990s-style Inmon or Kimball solution. However, how you design, implement, test and deploy a Web-based Resource-Oriented solution is very different from how you accomplish those same tasks in a 1990s-style RDBMS-centric Inmon or Kimball solution. For example, in a Resource-Oriented solution you design ways to move the computation to the data, while in the 1990s style you design ways to move the data to the computation. Moreover, whereas the components of an Inmon or Kimball solution are tightly coupled to one another, and therefore brittle, the components of a contemporary Resource-Oriented solution are loosely coupled, and therefore highly malleable, scalable, flexible, robust and extensible. In addition, unlike the RDBMS-centric design pattern, a robust and viable Web-based solution does not require consensus, and establishing a central authority (with powers beyond matters of information security) is optional, not mandatory.

Before moving on, the notion of large volumes of data needs to be revisited. The challenge of large volumes of data is not only a matter of sheer storage space. Very often it is a matter of bringing data sets together within very short periods of time. Combining (distributed) data within small time windows creates its own type of big data challenge: too much data for too small a window of time. So, a big data challenge is not only a matter of space; it can also be a matter of time.

 

Clinical Information Repository Benefits

At present, throughout the Healthcare marketplace, clinical information is siloed across transactional and analytic systems. This has a profound negative impact on the quality of medical services, and it contributes to significant increases in the cost of healthcare services. The purpose of the Resource-Oriented Clinical Information Repository (CIR) is to improve the quality of medical services by provisioning high-quality clinical information quickly and cost-effectively. The benefits of a Resource-Oriented CIR are:

  • Common scheme for addressing all the different things of interest to the business
    • Business names the things of interest
    • Support Healthcare Core Domain + related business domains (HR, Finance, etc.)
    • Broker controlled access to resources
  • Data is encrypted at rest and while in transit
    • Move the computation to the data, not the data to the computation
    • The burden of multiple copies of data is kept to a minimum
  • Services and concepts are easy to use as needed
    • Work with all the different things of interest to the business (in all their forms and formats) in basically the same way
    • Long-term stability
  • Data can be read quickly
    • Highly cacheable named resources
    • Supports not only SQL but other Domain Specific Languages (DSLs) as well, with Named Queries/Commands as reusable resources
  • Data is in a consistent form and quality
    • Form is based upon the business context of the request
      • Free to change the shape and format of information
    • ANSI-based entities + Business Domain of Excellence properties
    • Data-governance-compliant Master Data Types
    • Relevant, well cataloged and classified
    • Easy to navigate and CRUD
  • Efficient, secure, reliable, and sophisticated data interchange
    • Loosely coupled and uniform interfaces
      • Standard compression/decompression mechanism(s)
      • Standard encoding and decoding mechanism
    • Self-describing messages that contain all of the resource state needed to handle the interchange
    • Alert consumer to resource updates or progress in workflows
    • Enforce authentication and authorization at transport level
    • Consumers digitally sign the body of their request
  • Logging for audit purposes
    • Record CIR state transitions caused by consumers visiting and manipulating resources
  • Assimilates complex bursty data streams, and batches, from internal information sources
    • Capture, manage and share metadata (e.g., resource usage)
      • Cost does not outweigh the business value of metadata
    • Integrates CIR resources, services and concepts with external information sources
    • Extend CIR resources into new uses
    • CIR consumer does not have to do the integration between resources
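The benefits list includes consumers digitally signing the body of their request, but the source does not specify the mechanism. The minimal Python sketch below uses an HMAC-SHA256 tag over the request body with a shared secret. An HMAC is a message authentication code, not a true digital signature. A design that needs non-repudiation would use an asymmetric scheme such as RSA or ECDSA instead. The secret and the function names are hypothetical.

```python
import hashlib
import hmac

# Hypothetical shared secret, provisioned out-of-band to one CIR consumer.
SHARED_SECRET = b"example-consumer-secret"


def sign_body(body: bytes, secret: bytes = SHARED_SECRET) -> str:
    """Consumer side: compute a hex HMAC-SHA256 tag to send alongside the body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_body(body: bytes, tag: str, secret: bytes = SHARED_SECRET) -> bool:
    """CIR side: recompute the tag and compare it in constant time."""
    return hmac.compare_digest(sign_body(body, secret), tag)


if __name__ == "__main__":
    body = b'{"patient": "p1", "status": "final"}'
    tag = sign_body(body)
    print(verify_body(body, tag))          # True
    print(verify_body(body + b" ", tag))   # False: the body was altered in transit
```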

 

CIR Data Model

The CIR Data Model contains the objects, concepts and other entities of shared interest in the Healthcare marketplace, as well as the relationships between them. It is important to point out that RDBMS-centric solutions are not capable of containing these relationships, let alone the properties that define and describe the relationships of interest to the business.

Many people imagine that the word ‘relational’ denotes relationships. It does not; it is simply the formal mathematical name for how the data is structured: the data structures are relations (commonly called tables and views). The notion of ‘relationships’ is not part of the relational data model, and the concept of a relationship is an afterthought in the design of a relational data service. The foreign key construct is used to enforce that values redundantly stored in one relation match the literal primary key values of another relation. A primary key is typically an artificial value used internally by the data service, and it has no business meaning or value outside the internals of that data service. In addition, if you use a relation to implement a relationship, that solution is actually an anti-SQL pattern, and, like all anti-SQL patterns, queries against such relations are impossible for the RDBMS to optimize.

Consequently, if you recognize the high market value of the relationships between the things of interest to your business, you do not hardwire those things to an RDBMS. Your Resource-Oriented solution can feed an RDBMS with data, but that solution must not be constrained by the inherent, inescapable limitations of the relational data service (or of a relational data model). You lose on all fronts when you elevate a relational data model above the market value of the relationships among the things of interest to your business.

The entities contained in the CIR data model are not represented as relations, nor are CIR entities normalized or de-normalized. You can always cast a CIR entity into a relation of whatever normal form you need. However, the CIR entity itself is not a relation. An entity has one or more properties. Each property has a name, and a value of one of several primitive value types (string, integer, image, etc.). Unlike a column of a record in a table, the property of an entity can be assigned a list of values. And whereas a relation can only be manipulated using relational algebra (SQL), an entity can be manipulated using numerous declarative as well as imperative DSL statements. For example, entity query filters can be implemented as Regex statements embedded within cURL commands.
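The Python sketch below makes the contrast with a table row concrete. It assumes an entity is represented as a dictionary whose property values may be either a single value or a list. The property names and the helper `filter_entities` are hypothetical. The CIR's actual filter syntax (for example, a Regex passed in a cURL query string) is not shown in the source. The sketch only illustrates matching a regular expression against any value of a possibly multi-valued property.

```python
import re
from typing import Any, Iterable


def values_of(entity: dict[str, Any], prop: str) -> list[Any]:
    """Return a property's values as a list, whether it holds one value or many."""
    value = entity.get(prop)
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def filter_entities(entities: Iterable[dict[str, Any]], prop: str, pattern: str) -> list[dict[str, Any]]:
    """Keep entities where at least one value of `prop` matches the regex."""
    regex = re.compile(pattern)
    return [e for e in entities if any(regex.search(str(v)) for v in values_of(e, prop))]


# Hypothetical patient entities: "phone" is multi-valued, unlike a table column.
patients = [
    {"id": "p1", "family_name": "Smith", "phone": ["555-0100", "555-0199"]},
    {"id": "p2", "family_name": "Smythe", "phone": "555-0142"},
    {"id": "p3", "family_name": "Jones"},
]

smiths = filter_entities(patients, "family_name", r"^Sm(i|y)th")  # p1 and p2
```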

As with all types of modern data models, the CIR supports Data Governance as well as Master Data Type (MDT) Management. What makes these disciplines particularly challenging are the complex interchanges of information resources between Healthcare vendors and regulatory bodies. To address these challenges, the industry promotes the adoption of a variety of standards for defining the entities of interest to it. In the Healthcare marketplace, these entities are routinely defined by ANSI and other industry standards, such as:

  • HL7 v3.x; ASC X12; SNOMED CT; RxNorm; LOINC
  • ICD-9/10; OASIS; CCD; CCR; CDA

Though these standards are helpful, the construction of the CIR data model requires merging entities, objects and concepts from disparate information vocabularies and schemas. Because ANSI Healthcare standards committees focus on different ontologies, each committee adopts a different schema. In light of these disparities, the ANSI entity, object or concept is of greater interest than the schema that contains it. A great strength of a Resource-Oriented architecture is its ability to provide a common scheme for addressing all the different things of interest to the business. This common scheme drives ambiguity out of the data model, and thereby makes the information resources contained in the CIR easy to use as needed.

However, to innovate, create new value and derive competitive advantage in the marketplace, those ANSI entities must be blended with the objects, concepts and entities found in the Healthcare company’s core domain(s) of excellence. Domain Driven Design (DDD) is the methodology used to identify the entities in that core domain, along with their properties, relationships and behaviors.

An MDT entity in the CIR data model is an ANSI entity blended with properties discovered in your domain of excellence. As this discovery process unfolds, the ANSI definition of the entity serves to minimize volatility in the CIR data model. It is important to realize that an MDT exists solely to serve the consumer, and it may or may not be needed in a given business context.
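The Python sketch below shows that blending. It assumes both the ANSI-derived definition and the domain-of-excellence properties are held as simple property-name-to-type maps. The entity, the property names, and the rule that a domain property may not redefine an ANSI property are all illustrative assumptions. That rule is one way to reflect the point that the ANSI definition anchors the model and limits volatility.

```python
from types import MappingProxyType
from typing import Mapping


def blend_mdt(ansi_entity: Mapping[str, str], domain_props: Mapping[str, str]) -> Mapping[str, str]:
    """Blend domain properties onto an ANSI-based entity definition.

    Assumption: the ANSI properties are the stable core, so a domain property
    that tries to redefine one of them is rejected instead of overriding it.
    """
    clashes = set(ansi_entity) & set(domain_props)
    if clashes:
        raise ValueError(f"domain properties redefine ANSI properties: {sorted(clashes)}")
    return MappingProxyType({**ansi_entity, **domain_props})  # read-only blended definition


# Hypothetical ANSI-derived core of a patient entity.
ansi_patient = {"id": "string", "birth_date": "date", "gender": "code"}

# Hypothetical properties discovered via DDD in the core domain of excellence.
domain_patient = {"care_program": "string", "risk_score": "decimal"}

patient_mdt = blend_mdt(ansi_patient, domain_patient)
```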

The golden rule to keep in mind when designing CIR entities is: write the entity the way the entity will be read. This rule is not a paradox; it just requires forethought, and it is only a concern if you don’t know where you’re going. The rule requires the Business Analyst to know ahead of time which queries/commands the CIR is going to perform. For a given query, the Business Analyst just needs to know the identity of the entity to query, the properties being filtered or sorted, the operators of the filters, and the orders of the sorts. The Business Analyst records these facts in the user story/use case.
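The four facts the Business Analyst records are the entity, the filtered or sorted properties, the filter operators, and the sort orders. Together they are enough to describe a read. The Python below is a hypothetical sketch that captures those facts as data and applies them to in-memory entities. The class name, field names and operator symbols are assumptions, not the CIR's actual specification format.

```python
import operator
from dataclasses import dataclass
from typing import Any

# Filter operators, keyed by the symbol recorded in the user story (assumed symbols).
OPERATORS = {"=": operator.eq, "!=": operator.ne, "<": operator.lt,
             "<=": operator.le, ">": operator.gt, ">=": operator.ge}


@dataclass(frozen=True)
class QuerySpec:
    entity: str                                      # identity of the entity to query
    filters: tuple[tuple[str, str, Any], ...] = ()   # (property, operator, value)
    sorts: tuple[tuple[str, str], ...] = ()          # (property, "asc" or "desc")


def run(spec: QuerySpec, store: dict[str, list[dict[str, Any]]]) -> list[dict[str, Any]]:
    rows = [r for r in store.get(spec.entity, [])
            if all(p in r and OPERATORS[op](r[p], v) for p, op, v in spec.filters)]
    # Python's sort is stable, so sorting from the least to the most significant
    # key yields a multi-key sort where the first listed sort wins.
    for prop, order in reversed(spec.sorts):
        rows.sort(key=lambda r, prop=prop: r[prop], reverse=(order == "desc"))
    return rows


# Hypothetical encounter entities.
store = {"Encounter": [
    {"id": "e1", "patient": "p1", "los_days": 3, "admitted": "2023-01-05"},
    {"id": "e2", "patient": "p2", "los_days": 7, "admitted": "2023-01-02"},
    {"id": "e3", "patient": "p1", "los_days": 5, "admitted": "2023-01-09"},
]}
spec = QuerySpec("Encounter", filters=(("los_days", ">=", 4),), sorts=(("admitted", "desc"),))
```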

User stories produced by Behavior Driven Development (BDD) drive the weekly construction of the CIR data model and resources, according to current business priorities. Test Driven Development (TDD) verifies and validates each user story, as well as the forms and formats the resource supports.

A CIR entity is documented in JSON format. Business Analysts and Software Engineers work with the same specifications of CIR resources, and each user story references the JSON specifications of the CIR resources it involves.
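The source does not show the layout of these JSON specifications. The sketch below assumes a spec that names the entity and lists each property with a primitive type and a `multi` flag for list-valued properties. It uses Python's standard `json` module to load the spec and check an instance against it. Every field name in the spec, and the `violations` helper, is hypothetical.

```python
import json
from typing import Any

# Hypothetical JSON specification shared by Business Analysts and Software Engineers.
SPEC_JSON = """
{
  "entity": "Patient",
  "properties": {
    "id":          {"type": "string",  "multi": false},
    "family_name": {"type": "string",  "multi": false},
    "phone":       {"type": "string",  "multi": true},
    "age":         {"type": "integer", "multi": false}
  }
}
"""

PY_TYPES = {"string": str, "integer": int}


def violations(spec: dict[str, Any], instance: dict[str, Any]) -> list[str]:
    """Return problems with `instance`; an empty list means it conforms to `spec`."""
    problems = []
    props = spec["properties"]
    for name, value in instance.items():
        if name not in props:
            problems.append(f"unknown property: {name}")
            continue
        expected = PY_TYPES[props[name]["type"]]
        values = value if props[name]["multi"] and isinstance(value, list) else [value]
        for v in values:
            # bool is a subclass of int in Python, so reject it explicitly.
            if not isinstance(v, expected) or isinstance(v, bool):
                problems.append(f"{name}: expected {props[name]['type']}, got {type(v).__name__}")
    return problems


spec = json.loads(SPEC_JSON)
```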

The CIR schema has the same form as the Business Domain Model, and it is therefore easy to understand. The CIR data model is visualized as a property graph that is isomorphic with (has the same form as) the Business Model. In addition, the language used to express the objects, concepts, things and entities contained in the CIR schema is a blend of ANSI vocabularies and the business vocabulary.
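A property graph differs from a set of relations because relationships are first-class and carry their own properties. This echoes the earlier point about relationships having business value. The minimal Python sketch below stores properties on both nodes and edges and answers a one-hop traversal. Its node labels, relationship types and property names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class PropertyGraph:
    nodes: dict[str, dict[str, Any]] = field(default_factory=dict)                   # node id -> properties
    edges: list[tuple[str, str, str, dict[str, Any]]] = field(default_factory=list)  # (from, type, to, properties)

    def add_node(self, node_id: str, label: str, **props: Any) -> None:
        self.nodes[node_id] = {"label": label, **props}

    def relate(self, src: str, rel_type: str, dst: str, **props: Any) -> None:
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must exist before they can be related")
        self.edges.append((src, rel_type, dst, props))

    def neighbors(self, src: str, rel_type: str) -> list[tuple[str, dict[str, Any]]]:
        """Follow outgoing edges of one type, returning (target id, edge properties)."""
        return [(d, p) for s, t, d, p in self.edges if s == src and t == rel_type]


g = PropertyGraph()
g.add_node("p1", "Patient", family_name="Smith")
g.add_node("dr7", "Practitioner", family_name="Lee")
# The relationship itself carries business properties (role, start date).
g.relate("p1", "TREATED_BY", "dr7", role="primary care", since="2021-03-01")
```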

Each instance of an entity available from the CIR:

  • Has been cleaned
  • Has a primary key (PK) that uniquely identifies each entity instance
  • Is immutable
  • Is Data Governance compliant
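One way to honor the "has a PK" and "is immutable" properties in code is a frozen Python dataclass. This is offered as a sketch, not as the CIR's implementation. Its instances cannot be modified after creation, so a change produces a new instance via `dataclasses.replace` rather than an update in place. The `version` field and the other fields are assumptions made for illustration.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class PatientInstance:
    pk: str                        # primary key: uniquely identifies the entity instance
    version: int                   # hypothetical: each change yields a new version
    family_name: str
    phones: tuple[str, ...] = ()   # a tuple, not a list, so the value itself is immutable


v1 = PatientInstance(pk="p1", version=1, family_name="Smith", phones=("555-0100",))

# A correction does not mutate v1; it produces a new instance with the same PK.
v2 = replace(v1, version=v1.version + 1, family_name="Smyth")
```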

Identifying & Resolving Meaningful Business Concepts

For each domain of the Healthcare business which will be supported by the CIR, it’s all about:

  • Defining meaningful business concepts of interest in that domain
  • Giving each concept a logical name, which serves two purposes:
    • Identification (we give names to things in order to distinguish the things we are interested in)
    • Manipulation (the name is a handle by which you interact with the resource)
  • Being able to (securely) resolve a concept, i.e., being able to find, request, and use the concept as needed.
    • You do not need to write a query or call a separate service to use the concept

Although this design approach is also present in Inmon/Kimball solutions, how you implement your identifiers, and how you resolve them, is entirely different in a Resource-Oriented solution. That said, because SQL services are available in the Resource-Oriented CIR, SQL can also be used to resolve CIR resources. Remember, the marketing term NoSQL does not mean ‘no’ SQL; it means ‘not only’ SQL.

Each business channel that uses the CIR has its own preferred forms and shapes of the information resources it consumes. In a Resource-Oriented solution, the details of how a concept has been implemented are immaterial to the consumer. You separate the name of the thing from the process that produces the thing by implementing a REST endpoint.

In a Resource-Oriented solution, the identifier (name) is implemented as a URL plus an optional Named Query/Command. This means that each thing resolves to a unique REST service endpoint, which conveys your request to a query/command event handler that negotiates the responses available to disparate resource consumers. The major implication is that we work with all the different things of interest to us (in all their forms and formats) in basically the same way. Different endpoints provide different capabilities, but they all function in generally the same way.
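The Python sketch below shows that resolution step using only the standard library. It assumes, hypothetically, that the Named Query travels as a `q` query-string parameter and that each (path, named query) pair is registered to a handler function. The host `cir.example.com`, the paths, the query names and the data are all illustrative, not the CIR's actual routing.

```python
from typing import Any, Callable, Optional
from urllib.parse import parse_qs, urlsplit

# Hypothetical handler signature: receives the remaining query-string parameters.
Handler = Callable[[dict[str, list[str]]], Any]

# Registry: (resource path, optional named query) -> request handler.
ROUTES: dict[tuple[str, Optional[str]], Handler] = {}


def route(path: str, named_query: Optional[str] = None) -> Callable[[Handler], Handler]:
    def register(fn: Handler) -> Handler:
        ROUTES[(path, named_query)] = fn
        return fn
    return register


@route("/cir/patients")
def all_patients(params: dict[str, list[str]]) -> list[str]:
    return ["p1", "p2", "p3"]


@route("/cir/patients", "recently-admitted")
def recently_admitted(params: dict[str, list[str]]) -> list[str]:
    return ["p2"]


def resolve(url: str) -> Any:
    """Map a name (URL plus optional named query) to its handler and invoke it."""
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    named = params.pop("q", [None])[0]
    handler = ROUTES.get((parts.path, named))
    if handler is None:
        raise LookupError(f"no handler for {parts.path!r} with named query {named!r}")
    return handler(params)
```

The consumer sees only names such as `https://cir.example.com/cir/patients?q=recently-admitted`. Which handler runs, and how it produces its result, stays behind the endpoint.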

Not only does the business context of the request dictate the shape and form of the information; it also determines the format (XML, JSON, CSV) of the information. The REST pattern has a powerful influence on the discipline of Master Data Type (MDT) management: when the REST pattern is in place, the MDT does not limit the information to a prescribed format. The format of the information resource follows the business context of use: resource format follows use.
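At the HTTP level, "resource format follows use" corresponds to content negotiation on the `Accept` header. The sketch below renders the same list of entities as JSON, CSV or XML using Python's standard library. It is deliberately simplified: it takes media types in the order listed and ignores quality (`q=`) weights and wildcards other than `*/*`. The serializers, the `negotiate` helper and the fallback to JSON are assumptions.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET
from typing import Any


def to_json(rows: list[dict[str, Any]]) -> str:
    return json.dumps(rows)


def to_csv(rows: list[dict[str, Any]]) -> str:
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]) if rows else [])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_xml(rows: list[dict[str, Any]]) -> str:
    root = ET.Element("entities")
    for row in rows:
        item = ET.SubElement(root, "entity")
        for key, value in row.items():
            ET.SubElement(item, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


SERIALIZERS = {"application/json": to_json, "text/csv": to_csv, "application/xml": to_xml}


def negotiate(accept: str, rows: list[dict[str, Any]]) -> tuple[str, str]:
    """Return (media type, body) for the first listed type this endpoint supports."""
    for media_type in (part.split(";")[0].strip() for part in accept.split(",")):
        if media_type in SERIALIZERS:
            return media_type, SERIALIZERS[media_type](rows)
        if media_type == "*/*":
            return "application/json", to_json(rows)
    raise ValueError("406 Not Acceptable")
```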

The CIR is implemented incrementally, and each increment takes one of two forms: releasing a new REST service endpoint, or altering an existing endpoint’s ability to negotiate a response.

CIR Resources via HTTP/HTTPS

A Resource is simply a Meaningful-Business-Concept to which the business has assigned an identifier, and which has been implemented as a REST endpoint. As such, consuming Resources to meet your needs involves using HTTP/HTTPS methods (verbs) to manipulate them.

An HTTP GET request to an identifier returns a representation of the resource or resource collection from the CIR. PUT and DELETE requests are idempotent, so they can be safely retried when using the CIR. POST and PATCH are not idempotent under standard HTTP semantics, so retrying them is only safe if the endpoint guards against duplicate requests. We can use the HTTP HEAD method to discover metadata about a resource contained in the CIR. Lastly, we can use the HTTP OPTIONS method to discover what can be done to a resource.
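Under standard HTTP semantics (RFC 9110), GET, HEAD, OPTIONS, PUT and DELETE are idempotent, while POST and PATCH are not. A client-side retry policy can encode that distinction. In this sketch, the `send` callable stands in for whatever HTTP client is used. `TransientError`, `call_with_retry` and the default of three attempts are assumptions.

```python
from typing import Callable

IDEMPOTENT_METHODS = frozenset({"GET", "HEAD", "OPTIONS", "PUT", "DELETE"})


class TransientError(Exception):
    """Raised by `send` for failures worth retrying (e.g., a dropped connection)."""


def call_with_retry(send: Callable[[str, str], int], method: str, url: str, attempts: int = 3) -> int:
    """Invoke send(method, url) and return its status code.

    Transient failures are retried only for idempotent methods; POST and PATCH
    get a single attempt because repeating them could duplicate their effect.
    """
    method = method.upper()
    max_tries = max(1, attempts) if method in IDEMPOTENT_METHODS else 1
    for attempt in range(1, max_tries + 1):
        try:
            return send(method, url)
        except TransientError:
            if attempt == max_tries:
                raise
    raise AssertionError("unreachable")
```

A production client would also wait between attempts, for example with exponential backoff. That is omitted here to keep the idempotency rule in focus.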

This also means that the consuming client does not know in advance exactly what it will get back from the endpoint/identifier/request handler. Nevertheless, the consumer has to know how to parse the general-purpose, standards-based response formats (e.g., HTML, JSON, XML).
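Because the consumer cannot assume a fixed representation, it can dispatch on the response's `Content-Type` to a general-purpose parser. The minimal standard-library sketch below assumes JSON, XML and CSV are the formats the consumer cares about. HTML parsing is left out, and `parse_body` is a hypothetical helper.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET
from typing import Any


def parse_body(content_type: str, body: str) -> Any:
    """Parse a response body by media type, ignoring parameters such as charset."""
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "application/json" or media_type.endswith("+json"):
        return json.loads(body)
    if media_type in ("application/xml", "text/xml") or media_type.endswith("+xml"):
        return ET.fromstring(body)
    if media_type == "text/csv":
        return list(csv.DictReader(io.StringIO(body)))
    raise ValueError(f"unsupported media type: {media_type}")
```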

CIR as a Multi-level DaaS

The danger of describing the CIR Data-as-a-Service (DaaS) as something composed of levels, or layers, is that people frequently forget the important fact that you work with every CIR entity, object, concept and resource in the same way: you manipulate a REST endpoint. As such, the CIR consumer does not experience moving through layers.

The sole purpose of the idea of layers is to support the hierarchical positioning of slogans that describe the capabilities supported by a CIR resource. It is important to understand that, regardless of the layer at which a resource capability is present in the CIR DaaS, the resource is only ever manipulated at a REST endpoint.

Every CIR resource REST endpoint is initially, and perpetually, available at layer 0. The layer 0 slogan is ‘Collecting, Cleaning & Displaying.’ In layer 0, data from internal and external sources is collected, cleaned and assimilated into the CIR data model. Every CIR resource available at layer 0 is compliant with data governance and master data type management. Every layer 0 CIR resource can be displayed using 3rd-party data visualization tools.

The slogan for layer 1 is ‘Easy to Explore, Interact.’ Work to enhance and optimize the structure of the endpoints, and the links between them, is accomplished in layer 1. This work ensures that the consumer does not need to integrate CIR resources themselves, and can easily resolve and manipulate the resources they need.

The slogan for layer 2 is ‘Question, Share.’ Ad-hoc and advanced analytic processing of CIR resources occurs in layer 2. Aggregation of CIR resources is supported, and interactive charts of CIR resources are available. Lastly, reports and dashboards are consolidated at layer 2.

The slogan for layer 3 is ‘Make inferences, predictions and recommendations.’ After discovering and capturing the things of interest, and defining their relationships, the business begins to make inferences, predictions and recommendations about those things.

 

Building CIR on a PaaS in the Cloud

When choosing a Cloud Platform-as-a-Service (PaaS) vendor, these are the minimum selection criteria:

  • Hands-free OPS (CIR is hosted in the cloud)
    • Software engineers manage + monitor URI Request Handlers they create
  • Business Continuity + High Availability
  • Automatically scale
    • The number of users
    • The size of the data
  • Secure access to Clinical Information Repository
    • HTTPS verbs + Proxies + VPNs + Firewalls + ACLs + Encryption

In the next blog post, we will explore how to build the CIR on Google Cloud Platform.