Using NCI Controlled Terminology for Standardizing Data
Use of consistent terminology within and between studies is critical to enabling an efficient trial process. CDISC standards such as CDASH, SDTM, SEND and ADaM standardize the structures to use when collecting and submitting data, but what about the data values themselves?
This post gives an overview of CDISC’s standardized controlled terminology and how it should be used to collect and submit data in a way that speeds up trials and enables cross-study analysis.
CDISC partner with NCI to define standardized terminology
CDISC have partnered with the National Cancer Institute (NCI) to help bring consistency to terminology in clinical trials. Through this collaboration a standard set of terminology has been defined and is updated quarterly, bringing semantic consistency across studies. Using this terminology, the same piece of information will be represented in the same way across studies.
Let’s take a simple example – Sex:
- Should Male be represented by “Male”, “MALE”, “M” or “1”?
- How should someone who is intersex be recorded?
In days gone by this information may have been collected in different ways for each study. The CDISC NCI Terminology standard now defines that the only allowable options are “F”, “M”, “U” and “UNDIFFERENTIATED”, and that a person who is intersex should be recorded as “UNDIFFERENTIATED”. You can see how this very simple example already makes it much easier to understand the underlying data and easily segment for different populations across studies.
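To make the Sex example concrete, here is a minimal Python sketch that checks a list of collected values against the four submission values quoted above. The function name and the sample data are made up for illustration; only the four allowed values come from the text.

```python
# Submission values for the Sex codelist, as listed in the text above.
SEX_SUBMISSION_VALUES = {"F", "M", "U", "UNDIFFERENTIATED"}


def invalid_sex_values(values):
    """Return the values that are not allowed Sex submission values, in input order."""
    return [v for v in values if v not in SEX_SUBMISSION_VALUES]


# Made-up sample data mixing valid and invalid representations.
collected = ["M", "Male", "F", "1", "UNDIFFERENTIATED"]
print(invalid_sex_values(collected))  # ['Male', '1']
```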
FDA’s data standards catalog now mandates that the standardized NCI terminology be used for all new studies. This makes a reviewer’s life much easier, as they know how to interpret the data.
How is CDISC NCI Terminology defined?
The table below shows how the standard Sex terminology is defined.
Identifying standardized terminology
The blue line represents the code list that contains all the individual codes. The code list has a standard “C-code” of “C66731”. Each of the individual codes has its own C-code, e.g. “C16576”. These C-codes are used in ODM and Define-XML metadata to identify code lists as being NCI standard terminology.
This codelist is defined as being non-extensible, which means that wherever the codelist is used it must not contain any terms not listed here. Code lists that are extensible can have additional terms added. These additional terms must be described in Define-XML as being extended terms and cannot be duplicates or synonyms of existing terms.
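The extensibility rule can be sketched in a few lines of Python. This is an illustration under assumptions, not a real validator: the CodeList class, the check_terms function and the "C-EXAMPLE" codelist are all hypothetical, and the check only looks for terms missing from the standard list. It does not detect extended terms that are synonyms of existing terms, which would need the synonym data as well.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodeList:
    """Hypothetical in-memory view of one standard codelist."""
    c_code: str
    name: str
    extensible: bool
    terms: frozenset


def check_terms(codelist, used_terms):
    """Return (extended_terms, errors) for the terms a study uses.

    Extensible codelist: unknown terms are reported as extended terms,
    which then need to be described as such in Define-XML.
    Non-extensible codelist: unknown terms are reported as errors.
    """
    extra = sorted(set(used_terms) - codelist.terms)
    if codelist.extensible:
        return extra, []
    errors = [
        f"{term!r} is not allowed in non-extensible codelist {codelist.c_code}"
        for term in extra
    ]
    return [], errors


# The Sex codelist from the text: C66731, non-extensible.
SEX = CodeList("C66731", "Sex", False, frozenset({"F", "M", "U", "UNDIFFERENTIATED"}))
# A made-up extensible codelist, for contrast only.
EXAMPLE = CodeList("C-EXAMPLE", "Example", True, frozenset({"A", "B"}))

print(check_terms(SEX, ["M", "OTHER"]))
print(check_terms(EXAMPLE, ["A", "C"]))  # (['C'], [])
```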
Submission Value, Synonyms and Preferred Term
The submission value is what must be submitted to a regulator. The value actually displayed on a form may be different, and may be one of the Synonyms or the Preferred Term.
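If a form shows a friendlier label than the submission value, the collected text has to be mapped back before submission. The sketch below shows one way that lookup might work. The display labels are illustrative assumptions rather than a copy of the published Synonyms or Preferred Terms, and to_submission_value is a hypothetical helper.

```python
# Display text -> submission value.
# The display labels are illustrative assumptions, not the published synonyms.
DISPLAY_TO_SUBMISSION = {
    "Female": "F",
    "Male": "M",
    "Unknown": "U",
    "Intersex": "UNDIFFERENTIATED",
}
SUBMISSION_VALUES = set(DISPLAY_TO_SUBMISSION.values())


def to_submission_value(collected):
    """Map a collected value to its submission value, or raise ValueError."""
    text = collected.strip()
    # Values that are already submission values pass through unchanged.
    if text.upper() in SUBMISSION_VALUES:
        return text.upper()
    # Otherwise look for a case-insensitive match on the display label.
    for display, submission in DISPLAY_TO_SUBMISSION.items():
        if display.casefold() == text.casefold():
            return submission
    raise ValueError(f"No submission value for {collected!r}")


print(to_submission_value("Male"))  # M
```

Raising an error for unmapped values, rather than passing them through, keeps unexpected data from reaching the submission datasets unnoticed.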
Available in many colours
The NCI Terminology standards are available in a number of formats to suit different needs, including Excel, ODM, HTML and RDF. The example above is taken from the Excel version.
Grouped into subsets for CDASH, SDTM, ADaM and more
For convenience, the NCI Terminology is grouped into a number of subsets.
- SDTM terminology contains terms for use in an SDTM-based submission
- CDASH terminology contains terms for use in CDASH-based data collection
- ADaM terminology contains terms for use in ADaM datasets
- SEND terminology contains terms for use in a SEND-based submission
- Protocol terminology contains terms for use with the Protocol Representation Model (PRM) and Clinical Trial Registry XML (CTR-XML)
Using CDISC NCI Terminology in a study
When defining submission metadata in Define-XML, code lists are used to define fixed lists of values for certain variables. As per current FDA regulation, these code lists must conform to the standardized CDISC NCI terminology, where such terminology exists. They are linked to the standardized terminology by adding Alias elements that reference the C-codes, e.g.:
<Alias Name="C66731" Context="nci:ExtCodeID"/>
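As a rough illustration of how a tool might read these links, the Python sketch below pulls the C-codes out of a heavily simplified code list fragment using the standard library's ElementTree. The fragment is an assumption made for the example: it omits the decode text and other details a real Define-XML file carries, the OID "CL.SEX" and the "XX" item are made up, and the pairing of C16576 with "F" should be checked against the published codelist.

```python
import xml.etree.ElementTree as ET

ODM_NS = "http://www.cdisc.org/ns/odm/v1.3"

# Simplified, hypothetical fragment; a real Define-XML code list carries more detail.
FRAGMENT = f"""
<CodeList xmlns="{ODM_NS}" OID="CL.SEX" Name="Sex" DataType="text">
  <CodeListItem CodedValue="F">
    <Alias Name="C16576" Context="nci:ExtCodeID"/>
  </CodeListItem>
  <CodeListItem CodedValue="XX"/>
  <Alias Name="C66731" Context="nci:ExtCodeID"/>
</CodeList>
"""


def nci_code(element):
    """Return the C-code from a direct child Alias, or None if there is none."""
    for alias in element.findall(f"{{{ODM_NS}}}Alias"):
        if alias.get("Context") == "nci:ExtCodeID":
            return alias.get("Name")
    return None


def describe_codelist(xml_text):
    """Return (codelist C-code, {coded value: item C-code or None})."""
    codelist = ET.fromstring(xml_text.strip())
    items = {
        item.get("CodedValue"): nci_code(item)
        for item in codelist.findall(f"{{{ODM_NS}}}CodeListItem")
    }
    return nci_code(codelist), items


print(describe_codelist(FRAGMENT))
# ('C66731', {'F': 'C16576', 'XX': None})
```

Items that come back with None have no link to the standard terminology, so they are either extended terms or mistakes worth reviewing.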
When defining data collection metadata in ODM there is no requirement to use the standardized terminology. However, doing so removes the need to map terms from your collected data to your submitted data. This greatly simplifies your end-to-end study process and increases consistency between studies.
Management of updates
Managing terminology updates can be a significant amount of work, particularly with the standards being updated so regularly. How do you know what studies or organizational standards might be affected by new terminology? How do you then make the required updates in an efficient, reliable manner?
The easiest way to manage this is by using a CDISC-aware metadata repository that allows you to define organizational standards built upon the standardized terminology, and build your studies from those standards. As the repository understands the relationships between all the standards and your studies, it can simplify the process of updating to new versions of the CDISC NCI Terminology standards.
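To give a feel for the impact analysis involved, here is a small Python sketch that compares two terminology versions and lists the studies still using terms that were removed. It is not how any particular repository works. The data structures, the function names and the "C-EXAMPLE" data are all hypothetical.

```python
def diff_codelists(old, new):
    """Compare two versions, each a dict of codelist C-code -> set of terms."""
    changes = {}
    for c_code in old.keys() | new.keys():
        before = old.get(c_code, set())
        after = new.get(c_code, set())
        added, removed = after - before, before - after
        if added or removed:
            changes[c_code] = {"added": sorted(added), "removed": sorted(removed)}
    return changes


def affected_studies(changes, study_usage):
    """Return {study: {c_code: [removed terms the study still uses]}}.

    study_usage maps a study id to {c_code: set of terms used}.
    """
    hits = {}
    for study, usage in study_usage.items():
        for c_code, terms in usage.items():
            removed = set(changes.get(c_code, {}).get("removed", []))
            retired = sorted(terms & removed)
            if retired:
                hits.setdefault(study, {})[c_code] = retired
    return hits


# Made-up versions and studies, for illustration only.
old_version = {"C-EXAMPLE": {"A", "B"}}
new_version = {"C-EXAMPLE": {"A", "C"}}
usage = {
    "STUDY-1": {"C-EXAMPLE": {"A", "B"}},
    "STUDY-2": {"C-EXAMPLE": {"A"}},
}

changes = diff_codelists(old_version, new_version)
print(changes)  # {'C-EXAMPLE': {'added': ['C'], 'removed': ['B']}}
print(affected_studies(changes, usage))  # {'STUDY-1': {'C-EXAMPLE': ['B']}}
```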
Check compliance with CDISC NCI Terminology
Once your standardized datasets have been defined, you should generate a Define-XML that describes the datasets including references to the standardized terminologies. You can then run a compliance check to verify your dataset designs against the NCI terminology.