All you need to know about SDTM
The SDTM standard is a CDISC standard, and SDTM stands for Study Data Tabulation Model. If you want to know more about CDISC standards, you can read our introduction to CDISC standards, or read about how CDISC standards fit into the drug development process.
What is SDTM in clinical trials?
CDISC SDTM is the name of the model (or framework) used for organizing data collected in human and animal clinical trials. The model was developed by CDISC – the Clinical Data Interchange Standards Consortium – a standards development organization for dealing with medical research data.
Once you’ve gathered all the necessary data for your trial, it must be converted into a specific tabular format, optimized for review, before the FDA will accept it. The Study Data Tabulation Model (SDTM) is the name of that structure.
Why is SDTM required?
SDTM is there to give regulatory reviewers (namely the FDA) a clear description of the structure, attributes and contents of each dataset, and of the variables submitted as part of your clinical trial. Before CDISC SDTM was enforced, sponsors used their own domain names, variables and variable names. Nothing was standard. As a result, reviewers spent huge amounts of time working out the domain names and variable names in each dataset and getting the data into a common format, rather than reviewing the data itself. This ultimately prolonged the clinical trial process.
You can read more about what other CDISC standards are required for regulatory submissions.
The introduction of CDISC SDTM standardized this required data structure end to end. Now we have standard domain names and a standard structure for each domain, as well as standard variables and standard names for SDTM datasets. This means that each piece of collected data can be easily identified. Regulators can review the data much more quickly, making the process far more efficient. Plus, it makes all of your studies consistent, because they’re all in the same standard format.
Formalising the structure of the domains has also led to the development of conformance rules, by both the CDISC SDTM team and regulatory bodies such as the FDA and PMDA. You can find out about the FDA’s clinical trials guidance documents. These rules are programmed into software validation tools that automatically check SDTM clinical datasets against the conformance rules.
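To give a feel for what such automated checks do, here is a minimal sketch of two simple consistency checks: every record's DOMAIN value matches the dataset, and each --SEQ value is unique within a subject. These are illustrative checks written for this example, not official rule implementations, and the records are made up.

```python
from collections import Counter

def check_domain_value(records, domain):
    """Return indexes of records whose DOMAIN value does not match the dataset."""
    return [i for i, rec in enumerate(records) if rec.get("DOMAIN") != domain]

def check_seq_unique(records, domain):
    """Return (USUBJID, --SEQ) pairs that occur more than once."""
    seq_var = f"{domain}SEQ"  # e.g. AESEQ for the AE domain
    counts = Counter((rec["USUBJID"], rec[seq_var]) for rec in records)
    return sorted(key for key, n in counts.items() if n > 1)

# Made-up AE records with two deliberate problems.
ae = [
    {"DOMAIN": "AE", "USUBJID": "S1-001", "AESEQ": 1},
    {"DOMAIN": "AE", "USUBJID": "S1-001", "AESEQ": 1},  # duplicate sequence number
    {"DOMAIN": "CM", "USUBJID": "S1-002", "AESEQ": 1},  # wrong DOMAIN value
]
print(check_domain_value(ae, "AE"))  # [2]
print(check_seq_unique(ae, "AE"))    # [('S1-001', 1)]
```

Real validation tools run hundreds of such rules; the point here is only that a formal structure makes each rule a small, mechanical check.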
What is the latest version of SDTM?
CDISC build the SDTM standards from two important models: the SDTM core model and the SDTM Implementation Guides (SDTM-IGs).
The core model provides a standardized set of variables, grouped into “classes”, which are refined and built into variable collections for specific use cases (SDTM-IG domains), e.g. Vital Signs observations, Medical History reporting etc. The SDTM core model also supports the non-human trial standard SEND-IG.
At the time of writing, the latest versions are SDTM v1.7 (used with SDTM-IG v3.3) and SDTM-IG Medical Devices (SDTMIG-MD) v1.1. CDISC are always developing new domains, so it’s important to check the CDISC website regularly for the latest updates.
There is currently a Public Review for SDTMIG v3.4, SDTM 2.0, and Conformance Rules v1.2. Comments are due by 1 June 2020. Read more about planned updates to standards.
What are SDTM domains?
In order to be able to correctly implement the SDTM, it’s important to have a good understanding of its domains and how they’re structured.
SDTM is based on the observations that are collected from subjects taking part in a clinical trial. An observation is a piece of data collected during a study. For example “Subject 12 had a mild headache starting on study day 5”.
Most observations collected should be classified into one of the general observation classes (also known as data classes). These are Interventions, Events and Findings, described in more detail below.
Each of the general observation classes has associated domains. A domain is simply a group of observations that share a common topic, such as Medical History or Vital Signs. In addition to the general observation classes, there are 3 other classes of domain. But we’ll get into the various SDTM domains after a bit more on variables.
The example below shows the observations “Nausea”, “Headache” and “Dizziness”. These are part of the Adverse Events domain. And the Adverse Events domain is in the Events observation class.
Each domain has a two-character domain code, which is used as a prefix to map a variable to its domain. For example, the Medical History domain has the domain code MH. In the generic variable name --SEQ, the two hyphens are a placeholder indicating that a domain code is required, so in Medical History it becomes MHSEQ. Similarly, the variable --TESTCD becomes VSTESTCD in the Vital Signs domain. Each domain has a dataset, which is a collection of related data. SDTM datasets are described by a set of named variables, and each of these named variables is categorized by its role.
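The prefix rule is mechanical, so it can be sketched in a few lines. The function name `expand_variable` is hypothetical; it simply replaces the leading `--` placeholder with the domain code and leaves unprefixed variables such as USUBJID alone.

```python
def expand_variable(generic_name: str, domain_code: str) -> str:
    """Replace the leading '--' placeholder with a two-letter domain code."""
    if len(domain_code) != 2 or not domain_code.isalpha():
        raise ValueError(f"domain code must be two letters, got {domain_code!r}")
    if not generic_name.startswith("--"):
        # Variables such as STUDYID or USUBJID carry no domain prefix.
        return generic_name
    return domain_code.upper() + generic_name[2:]

print(expand_variable("--SEQ", "MH"))     # MHSEQ
print(expand_variable("--TESTCD", "VS"))  # VSTESTCD
print(expand_variable("USUBJID", "VS"))   # USUBJID
```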
What are SDTM variable roles?
A role category conveys a particular type of information about a variable. And variables can have just one role.
Variable roles have 5 categories:
- Identifier variables allow the study, subject, domain and sequence number of a record to be identified.
- Topic variables describe the focus of an observation.
- Timing variables describe the date, time and duration of an observation.
- Qualifier variables describe the results of an observation with text or numeric values.
- Rule variables describe algorithms or methods for calculations or looping conditions and are mainly used for the Trial Design domain.
In the example below, variable roles are shown in the top row of the table. The color coded areas on the second row show the variables that correspond to the variable roles.
Qualifier variables are further categorized as follows:
- Grouping qualifiers group observations together
- Result qualifiers describe the result for a finding
- Synonym qualifiers contain another name for the observation
- Record qualifiers define the supplementary attributes of an observation
- Variable qualifiers describe the value of an observation
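As an illustration of how roles apply to a real domain, the sketch below tags a small, illustrative subset of Adverse Events (AE) variables with their roles and groups them, folding the qualifier subtypes into the parent Qualifier role. Always take the authoritative role of each variable from the SDTM-IG domain specification.

```python
from collections import defaultdict

# Illustrative subset of AE variables and their roles (not the full domain).
AE_ROLES = {
    "STUDYID": "Identifier",
    "DOMAIN": "Identifier",
    "USUBJID": "Identifier",
    "AESEQ": "Identifier",
    "AETERM": "Topic",
    "AEDECOD": "Synonym Qualifier",
    "AESEV": "Record Qualifier",
    "AESTDTC": "Timing",
    "AEENDTC": "Timing",
}

def group_by_role(roles):
    """Group variable names by top-level role, keeping their original order."""
    groups = defaultdict(list)
    for variable, role in roles.items():
        # Qualifier subtypes (Synonym, Record, ...) roll up into "Qualifier".
        top_level = "Qualifier" if role.endswith("Qualifier") else role
        groups[top_level].append(variable)
    return dict(groups)

for role, variables in group_by_role(AE_ROLES).items():
    print(role, variables)
```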
What are SDTM core variables?
The Core attribute of a variable is a measure of compliance with the specific SDTM-IG domain model. A variable’s Core value shows how important that variable is to the overall domain structure.
Variables are divided into 3 categories:
- Required variables are needed to identify a data record, for example STUDYID and USUBJID. Or, they are needed to make a record easily understood, for example --TERM and --TEST. They must always be included in the dataset and cannot be null.
- Expected variables are needed to make a record useful within a specific domain. They must always be included in the dataset but they can be null for some records. If no data is collected, a comment must be included to explain why.
- Permissible variables must be included in the dataset if results are collected or derived, but they can be left null or blank.
Variables from the parent class can also be inserted into the domain if required.
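The three Core categories translate naturally into a compliance check. The sketch below assumes a dataset is a list of column names plus a list of record dictionaries, and uses an illustrative table of Core values for a few AE variables (check the SDTM-IG for the real designations). Required variables must be present and non-null, Expected variables must be present, and Permissible variables are not checked.

```python
# Illustrative Core designations for a few AE variables.
CORE = {
    "STUDYID": "Req", "DOMAIN": "Req", "USUBJID": "Req", "AESEQ": "Req",
    "AETERM": "Req", "AEDECOD": "Req", "AESTDTC": "Exp", "AESEV": "Perm",
}

def check_core(columns, records, core):
    """Return human-readable compliance problems for Req and Exp variables."""
    problems = []
    for variable, level in core.items():
        if level in ("Req", "Exp") and variable not in columns:
            problems.append(f"{variable} ({level}) is missing from the dataset")
        elif level == "Req":
            # Required variables may never be null.
            for i, record in enumerate(records):
                if record.get(variable) in (None, ""):
                    problems.append(f"{variable} (Req) is null in record {i}")
    return problems

columns = ["STUDYID", "DOMAIN", "USUBJID", "AESEQ", "AETERM", "AEDECOD"]
records = [{"STUDYID": "S1", "DOMAIN": "AE", "USUBJID": "S1-001", "AESEQ": 1,
            "AETERM": "", "AEDECOD": "Headache"}]
for problem in check_core(columns, records, CORE):
    print(problem)
```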
What SDTM domains are there?
Currently there’s a large collection of domains, and CDISC are constantly developing more. Each domain has a name and a two-letter abbreviation, for example Demographics (DM), Subject Visits (SV), Adverse Events (AE), Lab Results (LB) and Vital Signs (VS), to name a few. Each SDTM domain is usually submitted as a single file named after the domain (e.g. AE.xpt).
Most observations that are collected fit into one of the general observation classes:
- Interventions datasets capture treatments and procedures that are given to a subject as specified by the protocol. Examples are Exposure (EX), Concomitant Medications (CM) and Substance Use (SU) e.g. tobacco, caffeine, alcohol.
- Events datasets capture planned protocol milestones such as randomization and study completion. Unplanned incidents that occur before, or during a study are also captured. Examples are Adverse Events (AE), Disposition (DS) and Medical History (MH).
- Findings datasets capture observations that address specific questions, such as observations made during physical examinations, laboratory tests, ECG testing etc. Findings About is also part of this class and captures findings related to the Interventions and Events classes. Examples are Vital Signs (VS), Physical Exam (PE), Labs (LB) and Subject Characteristics (SC).
In addition to the general observation classes, there are 3 special case classes:
- Special Purpose datasets can be Demographics (DM), Comments (CO), Subject Elements (SE) and Subject Visits (SV).
- Trial Design has datasets that describe the design of a trial. Examples are Trial Summary (TS), Trial Arms (TA) and Trial Visits (TV).
- Relationship datasets represent the relationships between datasets and records, for example Related Records (RELREC) and Supplemental Qualifiers (SUPP--).
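A simple lookup table makes the class structure concrete. The sketch below uses the example domains listed above, plus RELREC and SUPP-- datasets (e.g. SUPPAE) for the Relationship class, which are assumptions of this sketch. It is not a complete catalogue, so unknown codes raise an error rather than guessing.

```python
# Example domains per class; not the complete published list.
DOMAIN_CLASSES = {
    "Interventions": {"EX", "CM", "SU"},
    "Events": {"AE", "DS", "MH"},
    "Findings": {"VS", "PE", "LB", "SC"},
    "Special Purpose": {"DM", "CO", "SE", "SV"},
    "Trial Design": {"TS", "TA", "TV"},
    "Relationship": {"RELREC"},
}

def observation_class(domain_code):
    """Return the class of a domain code from the illustrative table above."""
    code = domain_code.upper()
    for cls, codes in DOMAIN_CLASSES.items():
        if code in codes:
            return cls
    # SUPP-- datasets (e.g. SUPPAE) hold supplemental qualifiers for a parent domain.
    if code.startswith("SUPP"):
        return "Relationship"
    raise KeyError(f"{domain_code} is not in this illustrative table")

print(observation_class("ae"))      # Events
print(observation_class("SUPPAE"))  # Relationship
```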
What if my data doesn’t fit into published domains?
Then you create a custom domain! That’s easier said than done, but don’t worry: there are some steps you can follow to help keep you on the right track. But first…
What’s a custom domain?
It’s a domain that’s created to include non-standard SDTM data. In other words, if there are no published domains that are a fit for your data, then you need to create a new, customized domain.
Always make sure you check the latest version of the SDTM-IG though. CDISC is continually updating it with new domains so you might just find one that fits after all.
3 things to consider before creating a custom domain
Ask yourself these questions before you set about creating a new custom domain.
- Is there a common topic in one of the published domains?
- Are you thinking of creating a custom domain because of a separation in time? There’s no need to create a custom domain in a situation where time separation is a factor. You can represent both prior and current observations in one domain.
- Does the data relate to an event or intervention in another domain, so that it could be described in a linked “findings about” domain?
If you answered no to all three questions, you need a custom domain!
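The checklist can be written down as a tiny decision function. This sketch assumes the reading that a custom domain is needed only when every answer is no; the function and parameter names are hypothetical.

```python
def needs_custom_domain(fits_published_topic: bool,
                        only_time_separation: bool,
                        fits_findings_about: bool) -> bool:
    """Return True only when none of the alternatives to a custom domain apply."""
    # Q1: a published domain already covers this topic, so use it.
    # Q2: the only difference is prior vs current data; one domain can hold both.
    # Q3: the data describes an event or intervention; consider Findings About.
    return not (fits_published_topic or only_time_separation or fits_findings_about)

print(needs_custom_domain(False, False, False))  # True
print(needs_custom_domain(False, True, False))   # False
```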
Steps to create a custom domain
So firstly, decide which of the general observation classes your data topic fits into best. Then follow the steps below.
- Come up with a 2 letter domain code that fits your topic but that’s not already used by any other domain. CDISC suggest you prefix with X, Y, Z to indicate a custom domain.
- Add identifier, topic, qualifier and timing variables. Make sure to set the variables in the same order within the domain as they are written here. Variables must then be ordered within their roles to match the order in the SDTM document.
- Add any code lists that are needed.
- Add ValueListRefs or ValueListDefs if required. For example, a Findings domain will usually have a ValueList attached to the “Result” variable, FTORRES. Additional ValueLists may be used to describe other columns, such as “Standard Result (Character)”, FTSTRESC.
- Add ValueList variables. For example, one for each FTTESTCD.
- Attach any relevant Supplemental Qualifiers to the domain.
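The first two steps (choosing a code and ordering variables by role) lend themselves to a quick automated sanity check. The sketch below assumes an illustrative set of codes already in use and a hypothetical custom domain XA; it flags malformed or clashing codes, nudges towards an X, Y or Z prefix, and orders (name, role) pairs as identifier, topic, qualifier, timing.

```python
import re

# Illustrative subset of codes already in use; check the current SDTM-IG for the full list.
USED_CODES = {"AE", "CM", "DM", "DS", "EX", "LB", "MH", "PE", "SC", "SU", "SV", "TA", "TS", "TV", "VS"}
ROLE_ORDER = ["Identifier", "Topic", "Qualifier", "Timing"]

def check_custom_code(code, used=USED_CODES):
    """Return problems with a proposed custom domain code (empty list if none)."""
    problems = []
    if not re.fullmatch(r"[A-Z]{2}", code):
        problems.append("code must be exactly two upper-case letters")
    elif code in used:
        problems.append(f"{code} is already used by another domain")
    elif code[0] not in "XYZ":
        problems.append("consider an X, Y or Z prefix to mark a custom domain")
    return problems

def order_variables(variables):
    """Sort (name, role) pairs by role; sorted() is stable, so order within a role is kept."""
    return [name for name, role in sorted(variables, key=lambda v: ROLE_ORDER.index(v[1]))]

print(check_custom_code("XA"))  # []
print(order_variables([("XASTDTC", "Timing"), ("XATESTCD", "Topic"),
                       ("STUDYID", "Identifier"), ("XAORRES", "Qualifier"),
                       ("USUBJID", "Identifier")]))
```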
How to implement SDTM
The following sections explain how to map source datasets to SDTM domains, considerations and other necessary deliverables.
How to do an SDTM mapping
The SDTM-IG extends and refines the SDTM core model with specific domain implementations, business rules, assumptions and examples. It should be used along with the relevant version of the SDTM. So make sure you have the correct versions of both of these documents.
Here are some basic steps to help keep you on the right track:
- Determine which SDTM domains to create
- Compare the source metadata to the SDTM metadata and map directly where possible
- Map the rest of the source datasets to SDTM domains
- Map variables in the source datasets to the variables in the SDTM domains
- Decide whether custom domains and SUPPQUAL domains need to be created
- Perform the data conversion – there are various mapping tools you can use to do this
- Validate the SDTM datasets
- Generate and validate Define.xml
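When a collected variable has no place in a standard domain, it typically goes into a Supplemental Qualifiers (SUPP--) dataset keyed back to the parent record. The sketch below, using a hypothetical helper `to_suppqual`, moves non-standard AE variables into SUPPAE-style rows linked via AESEQ. It covers only the core SUPPQUAL columns and omits QORIG and QEVAL for brevity; the input records are made up.

```python
def to_suppqual(records, rdomain, standard_vars, qlabels):
    """Split records into parent-domain rows and SUPP-- rows for non-standard variables."""
    parent, supp = [], []
    seq_var = f"{rdomain}SEQ"  # parent records are linked by their --SEQ value
    for rec in records:
        parent.append({k: v for k, v in rec.items() if k in standard_vars})
        for qnam, qval in rec.items():
            if qnam in standard_vars or qval in (None, ""):
                continue  # standard variables stay in the parent; empty values are dropped
            supp.append({
                "STUDYID": rec["STUDYID"],
                "RDOMAIN": rdomain,
                "USUBJID": rec["USUBJID"],
                "IDVAR": seq_var,
                "IDVARVAL": str(rec[seq_var]),
                "QNAM": qnam,
                "QLABEL": qlabels.get(qnam, qnam),
                "QVAL": str(qval),
            })
    return parent, supp

records = [{"STUDYID": "S1", "USUBJID": "S1-001", "AESEQ": 1,
            "AETERM": "Headache", "AETRTEM": "Y"}]
parent, supp = to_suppqual(records, "AE", {"STUDYID", "USUBJID", "AESEQ", "AETERM"},
                           {"AETRTEM": "Treatment Emergent Flag"})
print(supp)
```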
There are a number of different types of SDTM mappings you can do for steps 2, 3 and 4 above.
- Directly map to a domain variable without making any changes
- Rename the source variable name and label without the need to make any other changes
- Map values to standard units or terminology
- Change the format of a source variable
- Combine two or more source variables to make a single domain variable
- Split a single source variable into two or more domain variables
- Derive a domain variable from one or more source variables using logic, computation, algorithm or decoding
And remember, you might need to use more than one type of mapping to create an SDTM variable.
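Several of the mapping types above can be sketched as small functions. The source formats here (DD/MM/YYYY dates, weights in pounds, "120/80" blood pressure strings) are assumptions for illustration. The study-day derivation follows the usual SDTM --DY convention of counting the reference start date as day 1 with no day 0, and it assumes complete dates.

```python
from datetime import date, datetime

def to_iso8601(raw, source_format="%d/%m/%Y"):
    """Change format: re-express a source date (assumed DD/MM/YYYY) as ISO 8601."""
    return datetime.strptime(raw, source_format).date().isoformat()

def pounds_to_kg(weight_lb):
    """Map to standard units: convert pounds to kilograms (1 lb = 0.45359237 kg)."""
    return round(weight_lb * 0.45359237, 1)

def combine_dtc(date_part, time_part):
    """Combine: join separate ISO 8601 date and time fields into one --DTC value."""
    return f"{date_part}T{time_part}" if time_part else date_part

def split_bp(reading):
    """Split: turn a '120/80' reading into systolic and diastolic values."""
    systolic, diastolic = reading.split("/")
    return int(systolic), int(diastolic)

def study_day(dtc, rfstdtc):
    """Derive --DY: days relative to the reference start date, with no day 0."""
    delta = (date.fromisoformat(dtc[:10]) - date.fromisoformat(rfstdtc[:10])).days
    return delta + 1 if delta >= 0 else delta

print(to_iso8601("05/03/2020"))                   # 2020-03-05
print(combine_dtc("2020-03-05", "14:30"))         # 2020-03-05T14:30
print(study_day("2020-03-04", "2020-03-05"))      # -1
```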
SDTM mapping can be a complicated task, so it’s important to plan everything out in advance. By creating a mapping specification, you’ll know where the data came from, how it was transformed and where it needs to go. There are various mapping scenarios you can use. It’s important to use the SDTM model and Implementation Guide during this process. And by using standard processes and tools, you’ll maximize your chances of success.
SDTM mapping specifications should be developed at the same time as annotating CRFs. The mapping specification tells the user how to do a mapping. An annotated CRF is a visual representation of a mapping showing how the source data relates to the SDTM data.
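One way to keep a mapping specification executable is to store each row as data: the target SDTM variable, the source columns it reads, the transformation, and a human-readable note. This is a minimal sketch; the source column names and the 1/2/3 severity coding are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class MappingRule:
    target: str                      # SDTM variable being populated
    sources: tuple                   # source columns the rule reads
    transform: Callable[..., object]  # how the value is produced
    note: str                        # the rule as written in the specification

# Hypothetical source columns and severity codes, for illustration only.
SPEC = [
    MappingRule("USUBJID", ("study", "subject"), lambda s, p: f"{s}-{p}",
                "Combine study and subject identifiers"),
    MappingRule("AETERM", ("ae_text",), lambda t: t.strip(), "Direct map, whitespace trimmed"),
    MappingRule("AESEV", ("severity",), lambda v: {"1": "MILD", "2": "MODERATE", "3": "SEVERE"}[v],
                "Decode 1/2/3 to controlled terminology"),
]

def apply_spec(source_row, spec):
    """Build one SDTM record by applying every rule in the specification."""
    return {rule.target: rule.transform(*(source_row[c] for c in rule.sources)) for rule in spec}

row = {"study": "ABC", "subject": "001", "ae_text": " Headache ", "severity": "2"}
print(apply_spec(row, SPEC))
```

Because each rule carries its own note, the same list can be printed as the human-readable specification and used to run the conversion.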
If this sounds like a lot to take on, there’s some neat technology that can help to automate this process. See how SDTM conversion can be much quicker and easier with our SDTM mapping tools and SDTM automation.
SDTM annotated CRFs
An annotated CRF is a mandatory deliverable for submission to the FDA. It is the blank CRF, a collection of form pages, submitted in a file called blankcrf.pdf, in which each question is annotated to show the origin of its variables. The annotations link the fields on the form (the source of the data) with the variables in the dataset, helping the reviewer find where variables in the submitted SDTM datasets come from. Find out more about the benefits of automating annotated CRFs.
What is SDTM controlled terminology?
SDTM has standard codelists that define the allowable values for particular variables. These values are required for submission to the FDA and PMDA in CDISC compliant SDTM datasets. You should always use the most up-to-date version of controlled terminology when you start to map your SDTM datasets. Find out more about using NCI controlled terminology for standardizing data.
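Checking values against codelists is one of the simplest automated checks. The sketch below uses a small illustrative subset of terms (severity values for AESEV, and Y/N for AESER); in practice you would load the current CDISC/NCI controlled terminology release rather than hard-coding values.

```python
# Illustrative subset of allowed values; load the current CT release in practice.
CODELISTS = {
    "AESEV": {"MILD", "MODERATE", "SEVERE"},
    "AESER": {"Y", "N"},
}

def check_terminology(records, codelists):
    """Return (record index, variable, value) for values outside their codelist."""
    issues = []
    for i, rec in enumerate(records):
        for var, allowed in codelists.items():
            value = rec.get(var)
            # Null values are left to the Core checks; only populated values are tested.
            if value not in (None, "") and value not in allowed:
                issues.append((i, var, value))
    return issues

records = [{"AESEV": "Mild", "AESER": "N"}, {"AESEV": "SEVERE", "AESER": ""}]
print(check_terminology(records, CODELISTS))  # [(0, 'AESEV', 'Mild')]
```

Note that the check is case-sensitive: "Mild" is flagged because submission values must match the codelist exactly.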
CDISC and NCI Enterprise Vocabulary Services partnered up to develop standard controlled terminology. However, the CDISC/NCI controlled terms for lab tests are not unique on their own. They require additional information, such as the specimen type, for differentiation.
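To see why a test code alone is not enough, the sketch below finds LBTESTCD values that occur with more than one specimen (LBSPEC) or method (LBMETHOD) combination in a dataset. The records are made up for illustration.

```python
from collections import defaultdict

def ambiguous_tests(records):
    """Return LBTESTCD values that appear with more than one specimen/method context."""
    contexts = defaultdict(set)
    for rec in records:
        contexts[rec["LBTESTCD"]].add((rec.get("LBSPEC"), rec.get("LBMETHOD")))
    return sorted(code for code, ctx in contexts.items() if len(ctx) > 1)

# Made-up lab records: glucose measured in two different specimens.
lab = [
    {"LBTESTCD": "GLUC", "LBSPEC": "SERUM"},
    {"LBTESTCD": "GLUC", "LBSPEC": "URINE"},
    {"LBTESTCD": "HGB", "LBSPEC": "BLOOD"},
]
print(ambiguous_tests(lab))  # ['GLUC']
```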
SDTM datasets and LOINC codes
Over the last 25 years, the LOINC project has provided a standard classification for health measurements. Most SDTM programmers will encounter “LOINC Code” information in Lab data. But the classification system has been extended to cover other measurements such as ECG. So what is LOINC? LOINC is an internationally recognized classification system, and is often requested in regulatory data submissions to provide context to clinical measurement data, for example Labs and ECG. Read more here about LOINC codes and SDTM.
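In SDTM Lab data, the LOINC code is typically carried in the LBLOINC variable. This sketch fills it from a lookup keyed on test code, specimen and original units. The codes in the table are placeholder strings, not real LOINC codes; a real lookup would come from the LOINC database or your lab vendor.

```python
# Hypothetical lookup: the values are placeholders, NOT real LOINC codes.
LOINC_LOOKUP = {
    ("GLUC", "SERUM", "mg/dL"): "LOINC-PLACEHOLDER-1",
    ("GLUC", "URINE", "mg/dL"): "LOINC-PLACEHOLDER-2",
}

def add_lbloinc(records, lookup):
    """Return copies of the records with LBLOINC populated ('' when no match)."""
    out = []
    for rec in records:
        key = (rec["LBTESTCD"], rec["LBSPEC"], rec["LBORRESU"])
        out.append({**rec, "LBLOINC": lookup.get(key, "")})
    return out

recs = [
    {"LBTESTCD": "GLUC", "LBSPEC": "SERUM", "LBORRESU": "mg/dL"},
    {"LBTESTCD": "GLUC", "LBSPEC": "URINE", "LBORRESU": "mg/dL"},
    {"LBTESTCD": "HGB", "LBSPEC": "BLOOD", "LBORRESU": "g/dL"},
]
for rec in add_lbloinc(recs, LOINC_LOOKUP):
    print(rec["LBTESTCD"], rec["LBSPEC"], rec["LBLOINC"] or "(no match)")
```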
SDTM Define XML
The FDA requires a Define.xml file to be included for all drug submissions. It describes the content and structure of data collected during the study. The Define.xml file makes the review of study data quicker and easier for the FDA. You can read our blog about using the Define XML standard for dataset design.
The latest version of the standard is Define-XML 2.0. It describes the content and structure of data collected during the study, namely domains, variables, methods, controlled terminology and supporting documents. One of the things that crops up often is how to handle data coming in from multiple sources. You can read our blog on how to describe multiple origins for a value in Define-XML 2.0.
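To show what a variable-level entry looks like, this sketch builds a single Define-XML 2.0 style ItemDef (a variable definition with its label and origin) using Python's standard xml.etree.ElementTree. It is a fragment only, not a complete or valid Define.xml file; the OID naming scheme is an assumption, and "CRF" is used as the Define-XML 2.0 origin type for collected data.

```python
import xml.etree.ElementTree as ET

ODM_NS = "http://www.cdisc.org/ns/odm/v1.3"
DEF_NS = "http://www.cdisc.org/ns/def/v2.0"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
ET.register_namespace("", ODM_NS)
ET.register_namespace("def", DEF_NS)

def item_def(oid, name, data_type, length, label, origin_type):
    """Build one ItemDef element describing a dataset variable."""
    item = ET.Element(f"{{{ODM_NS}}}ItemDef", {
        "OID": oid, "Name": name, "DataType": data_type,
        "Length": str(length), "SASFieldName": name,
    })
    desc = ET.SubElement(item, f"{{{ODM_NS}}}Description")
    text = ET.SubElement(desc, f"{{{ODM_NS}}}TranslatedText", {XML_LANG: "en"})
    text.text = label
    # def:Origin records where the value came from (e.g. CRF, Derived, Assigned).
    ET.SubElement(item, f"{{{DEF_NS}}}Origin", {"Type": origin_type})
    return item

elem = item_def("IT.AE.AETERM", "AETERM", "text", 200,
                "Reported Term for the Adverse Event", "CRF")
print(ET.tostring(elem, encoding="unicode"))
```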
Creating a define.xml requires a lot of programming expertise, and it takes a lot of time. That’s why it’s so important to make the process as quick and easy as possible. See how our visual define.xml editor can help you create your Define.xml faster.
How Formedix can help…
There’s a lot to get your head round! But, did you know we’re on the CDISC XML technical team? We were involved in creating the CDISC ODM and Define models. We’ve been in the business for over 20 years. So our CDISC knowledge isn’t too bad! Learn more about how we help you with CDISC Compliance. And, we’re well placed to give real-world, practical CDISC training.
Our clinical metadata repository and clinical trial automation software supports all versions of CDISC standards and SDTM automation. We keep our platform updated in line with CDISC and NCI standards. That way your study designs and datasets are always regulatory compliant.
And, you have the option of a free trial. It comes with 6 hours of training to get you started!