The CDISC Unified Study Definition Model (USDM), Biomedical Concepts (BCs), and the implementation of end-to-end study data automation are hard to envision from model diagrams and conceptual drawings.
To make the USDM vision tangible, we show how the USDM can be used to automate the study data flow, using our ‘technology demonstrator’.
We show:
- The creation of a digital study protocol including the study design and the schedule of assessments (SoA)
- Driving data capture artefacts from the SoA
- Loading data, including bulk loads and manually entered data
- The automated generation of SDTM
The USDM, BCs, and SDTM can be brought together seamlessly, allowing us to move away from a siloed, process-focused way of working to a data-centric world of integrated standards.
Background
We have reused work done as part of the CDISC DDF (Digital Data Flow):
- The CDISC Pilot Study protocol and its public data
- The study metadata Excel example of the CDISC Pilot Study (developed as part of the CDISC DDF)
- The DDF prototype tooling (developed as part of the CDISC DDF), on which our technology demonstrator, shown below, is built
The technology demonstrator is a prototype, and its purpose is to demonstrate what is possible in terms of study automation. The model used is the USDM, extended with links to SDTM via relationships to the BCs.
The model and data are stored as a Neo4j property graph.
From Protocol to raw SDTM in 3 mins
Since we don’t have a fully fleshed-out UI for entering the study design and SoA, we make use of the CDISC DDF tooling, which can load an Excel representation of the study metadata into the USDM.

We have enriched the original CDISC Pilot data with data contract URIs and data point URIs.

If you are in a hurry and don’t care about the principles behind the scenes, you can just watch the video below. The files above are the ones uploaded in the demonstration.
If you want a deeper dive into the principles being used, read on 👇
The conceptual view of the USDM in action
We represent the protocol and its SoA electronically.
Schedule of Assessments (SoA)
The protocol describes a study design that has epochs (periods in which the patient undergoes various treatments), and in each epoch the patient comes to the clinic at a number of pre-defined times, the timepoints (TP). At each visit (timepoint) to the clinic, a set of pre-specified activities (A) will occur to measure the patient’s well-being and physical condition. An activity can be vital signs, registration of demographics, biochemistry tests, etc. All details about the activities are described in the (electronic) protocol. One activity may be measured only at the first visit, while other activities might be measured repeatedly during the study. This is modelled via the scheduled activity instance (SI), i.e. which activities (A) are measured at which timepoints (TP): A+TP=SI. This is essentially the Schedule of Activities (SoA) of the protocol.
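The relationship A+TP=SI can be sketched in a few lines of Python. This is only an illustration: the class name, the timepoint names, and the activities in the grid are assumptions made for the example, not the USDM classes or the CDISC Pilot Study schedule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScheduledInstance:
    """One cell of the SoA grid: an activity (A) at a timepoint (TP)."""
    activity: str
    timepoint: str


# Hypothetical SoA grid: timepoint -> activities performed at that visit.
SOA = {
    "Screening": ["Informed Consent", "Demographics", "Vital Signs"],
    "Week 2": ["Vital Signs", "Biochemistry"],
    "Week 4": ["Vital Signs", "Biochemistry"],
}


def build_instances(soa):
    """Expand the grid into scheduled activity instances (A + TP = SI)."""
    return [
        ScheduledInstance(activity, timepoint)
        for timepoint, activities in soa.items()
        for activity in activities
    ]


def timepoints_for(activity, instances):
    """List the timepoints at which an activity is scheduled."""
    return [si.timepoint for si in instances if si.activity == activity]


instances = build_instances(SOA)
print(len(instances))
print(timepoints_for("Demographics", instances))
print(timepoints_for("Vital Signs", instances))
```

Demographics yields a single instance (first visit only), while Vital Signs yields one instance per visit, which is the distinction the paragraph above describes.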

An activity is a high-level description of a measurement being performed on a patient. For example, the activity ‘Vital Signs’ can be the measurement of heart rate and of systolic and diastolic blood pressure. A biochemistry activity can be a series of tests, e.g. total cholesterol, albumin, sodium, triglycerides.
Biomedical Concepts
Each of these tests is a biomedical concept (BC).

The biomedical concepts have a set of properties, for example:
- The date and time at which the measurement was taken
- The value of the result
- The unit of the result
- The specimen in which a test is measured
- The position of the patient when the measurement was taken, if relevant
The set of properties will vary depending on the type of biomedical concept.
To describe the biomedical concepts and their properties we use terminology. This ensures that we represent the data in a consistent way.
Data Contract – goodbye mapping
From the scheduled activity instance (SI) and the BC properties, we know exactly which data points we are collecting in a study.

The data contract is a unique identifier (similar to a barcode) for a single data point to be collected: for example, the unit of a systolic blood pressure measurement (VSORRESU) at a given timepoint, or an albumin result (LBORRES) at a given timepoint.

These data contract URIs are generated automatically once the user has specified which BCs and which of their properties are relevant to collect, and at which timepoints, i.e. has specified the detailed SoA.
As we haven’t spent time on making a UI for this, we have just assumed that all properties of the BCs are to be collected in our demo.
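A minimal sketch of how such URIs could be generated: one URI per combination of timepoint, BC, and BC property, with all properties collected as in the demo. The URI scheme, the `example.org` namespace, and the property lists are assumptions for illustration; the demonstrator's actual URI format is not shown in this article.

```python
from urllib.parse import quote

BASE_URI = "https://example.org/data-contract"  # hypothetical namespace

# Illustrative BC properties, named by the SDTM variable they feed.
BC_PROPERTIES = {
    "Systolic Blood Pressure": ["VSORRES", "VSORRESU", "VSDTC", "VSPOS"],
    "Albumin": ["LBORRES", "LBORRESU", "LBDTC", "LBSPEC"],
}

# Hypothetical detailed SoA: which BC is collected at which timepoint.
SCHEDULED_BCS = [
    ("Week 2", "Systolic Blood Pressure"),
    ("Week 2", "Albumin"),
    ("Week 4", "Albumin"),
]


def data_contract_uri(timepoint, bc, prop):
    """Build one URI that identifies a BC property at a timepoint."""
    return "/".join([BASE_URI, quote(timepoint), quote(bc), prop])


def build_contracts(scheduled_bcs, bc_properties):
    """Generate a data contract URI for every property of every scheduled BC."""
    return [
        data_contract_uri(timepoint, bc, prop)
        for timepoint, bc in scheduled_bcs
        for prop in bc_properties[bc]
    ]


contracts = build_contracts(SCHEDULED_BCS, BC_PROPERTIES)
for uri in contracts:
    print(uri)
```

The only property that matters here is that each URI is unique and stable, so that it can travel with the data and come back unchanged.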
Getting data
Imagine you need to collect this data for your study. If you ask the data providers (EDC, lab, etc.) to retain the data contract URI and return it when they deliver the data, then you don’t have to care whether the data providers used different names for the tests and their properties. For example, the lab could call the albumin result ALB_SERUM_RES and the unit ALB_SERUM_U.
Without the data contract URI, you would need to map ALB_SERUM_RES to LBORRES (where LBTEST=ALB and LBSPEC=SERUM), and similarly for the unit. With the data contract URI, you don’t need any mapping.
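The idea can be sketched as a plain dictionary lookup. Everything in this example is made up for illustration (the URIs, the subject ID, the values, and the record layout); the point is only that the provider's own names are never consulted.

```python
# Hypothetical data contracts, keyed by URI.
CONTRACTS = {
    "https://example.org/data-contract/Week%202/Albumin/LBORRES": {
        "bc": "Albumin", "property": "LBORRES", "timepoint": "Week 2",
    },
    "https://example.org/data-contract/Week%202/Albumin/LBORRESU": {
        "bc": "Albumin", "property": "LBORRESU", "timepoint": "Week 2",
    },
}

# Made-up lab delivery: the provider keeps its own names but echoes our URI.
DELIVERY = [
    {
        "name": "ALB_SERUM_RES",
        "dc_uri": "https://example.org/data-contract/Week%202/Albumin/LBORRES",
        "subject_id": "SUBJ-001",
        "value": "3.8",
    },
    {
        "name": "ALB_SERUM_U",
        "dc_uri": "https://example.org/data-contract/Week%202/Albumin/LBORRESU",
        "subject_id": "SUBJ-001",
        "value": "g/dL",
    },
]


def resolve(record, contracts):
    """Identify a delivered value by its data contract URI alone."""
    contract = contracts[record["dc_uri"]]
    return {**contract, "subject_id": record["subject_id"], "value": record["value"]}


resolved = [resolve(record, CONTRACTS) for record in DELIVERY]
for item in resolved:
    print(item["property"], item["value"])
```

No table translating ALB_SERUM_RES to LBORRES appears anywhere in the sketch; the URI carries that information.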

How do we then add that data to the study?
We create the subject node/data point (SU) containing the subject ID, and we relate it to the study (SU->S) to indicate that the subject is enrolled in the study. This data point is typically created when the subject signs informed consent (completes the Activity:Informed consent).

When the data provider delivers the data, we get the value, the subject ID, and the data contract URI. We can then query the database for the subject (SU) whose identifier equals the subject ID and the data contract (DC) whose URI equals the data contract URI, and create a new data point node (DP) on which we set the properties value = value and uri = data contract URI/subject ID. Finally, we relate the new data point (DP) to the subject (SU) and the data contract (DC).
Our simple Cypher query looks like this:
    LOAD CSV WITH HEADERS FROM 'file:///{filename}' AS data_row
    MATCH (dc:DataContract {{uri: data_row['DC_URI']}})
    MATCH (design:StudyDesign {{name: 'Study Design 1'}})
    MERGE (d:DataPoint {{uri: data_row['DATAPOINT_URI'], value: data_row['VALUE']}})
    MERGE (s:Subject {{identifier: data_row['USUBJID']}})
    MERGE (dc)<-[:FOR_DC_REL]-(d)
    MERGE (d)-[:FOR_SUBJECT_REL]->(s)
    RETURN count(*)
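The doubled braces and the `{filename}` placeholder suggest that the query is kept as a Python format string. Assuming that is the case, the sketch below shows how a delivery could be written to a CSV with the column names the query expects, and how the template is rendered. The file name, subject ID, URI, and value are made up, and running the query against Neo4j is deliberately left out.

```python
import csv
import io

# The query from the text, as a str.format template ({{ }} render as { }).
QUERY_TEMPLATE = """
LOAD CSV WITH HEADERS FROM 'file:///{filename}' AS data_row
MATCH (dc:DataContract {{uri: data_row['DC_URI']}})
MATCH (design:StudyDesign {{name: 'Study Design 1'}})
MERGE (d:DataPoint {{uri: data_row['DATAPOINT_URI'], value: data_row['VALUE']}})
MERGE (s:Subject {{identifier: data_row['USUBJID']}})
MERGE (dc)<-[:FOR_DC_REL]-(d)
MERGE (d)-[:FOR_SUBJECT_REL]->(s)
RETURN count(*)
"""

COLUMNS = ["USUBJID", "DC_URI", "DATAPOINT_URI", "VALUE"]


def datapoint_uri(dc_uri, subject_id):
    """Data point URI as described in the text: data contract URI/subject ID."""
    return f"{dc_uri}/{subject_id}"


def to_csv(records):
    """Serialise delivered records into the CSV layout the query reads."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow({
            "USUBJID": record["subject_id"],
            "DC_URI": record["dc_uri"],
            "DATAPOINT_URI": datapoint_uri(record["dc_uri"], record["subject_id"]),
            "VALUE": record["value"],
        })
    return buffer.getvalue()


# Made-up delivery for illustration.
records = [{"subject_id": "SUBJ-001", "dc_uri": "https://example.org/dc/1", "value": "3.8"}]
csv_text = to_csv(records)
query = QUERY_TEMPLATE.format(filename="lab_data.csv")
print(csv_text)
print(query)
```

The rendered `query` string could then be passed to a Neo4j session; that step depends on the driver and deployment and is not shown here.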
Extracting data
Getting data out of the database is done using a Cypher query. We can either get the raw data or choose to present it in SDTM format.
Here our query shows the data in its raw format.

To display the data in SDTM format, we extend the USDM with the Canonical Reference Model (CRM). The CRM is a generic model of observations (see Biomedical Concepts Treatise).
We have linked the BC properties (blue LBORRES) to SDTM (class) variables (green LBORRES) via the CRM node (beige). Using the CRM node allows us to link other models if required.

In our prototype demonstrator, the only thing we need to do is link the BCs to domains, that is, create the relationship ‘Albumin Presence in Urine’ (blue) <–USING_BC_REL– ‘LB’ (green). Below we have linked other BCs to the VS domain.

Using the linkage through the CRM, we can then query the database to get the data in SDTM format.
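In the demonstrator this is a graph query that follows the links from data point to BC property to SDTM variable. The in-memory Python sketch below only illustrates the reshaping step, under the assumption that each data point has already been resolved to its domain, BC, timepoint, and SDTM variable. The subject ID, values, and the reduced set of columns are made up for the example.

```python
from collections import defaultdict

# Made-up data points, already resolved via their data contracts.
DATAPOINTS = [
    {"subject": "SUBJ-001", "domain": "LB", "bc": "Albumin",
     "timepoint": "Week 2", "variable": "LBORRES", "value": "3.8"},
    {"subject": "SUBJ-001", "domain": "LB", "bc": "Albumin",
     "timepoint": "Week 2", "variable": "LBORRESU", "value": "g/dL"},
    {"subject": "SUBJ-001", "domain": "VS", "bc": "Systolic Blood Pressure",
     "timepoint": "Week 2", "variable": "VSORRES", "value": "120"},
    {"subject": "SUBJ-001", "domain": "VS", "bc": "Systolic Blood Pressure",
     "timepoint": "Week 2", "variable": "VSORRESU", "value": "mmHg"},
]


def to_sdtm(datapoints, domain):
    """Pivot single data points into one SDTM-like row per subject, BC, and visit."""
    rows = defaultdict(dict)
    for dp in datapoints:
        if dp["domain"] != domain:
            continue
        row = rows[(dp["subject"], dp["bc"], dp["timepoint"])]
        row["USUBJID"] = dp["subject"]
        row["DOMAIN"] = domain
        row[f"{domain}TEST"] = dp["bc"]
        row["VISIT"] = dp["timepoint"]
        # The linked SDTM variable name becomes the column for this value.
        row[dp["variable"]] = dp["value"]
    return list(rows.values())


lb_rows = to_sdtm(DATAPOINTS, "LB")
vs_rows = to_sdtm(DATAPOINTS, "VS")
print(lb_rows)
print(vs_rows)
```

Because the same data points could be pivoted against a different set of variable links, the same stored data can be presented in other formats without being reloaded.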

Summary
Linking the protocol to BCs to generate data contracts provides the capability that has been discussed in the industry for at least the last decade: end-to-end study data automation. It will be the end of mapping between different formats, as we can provide the address of a specific data point that we have specified in the protocol, as well as describe the context in which the data point exists. It provides control and flexibility, as we can display the same data in various formats. We believe that with this approach the industry will finally get specifications that are able to break down the silos in which different departments look at a clinical study from different perspectives.