The Formedix Clinic



Using Define-XML for Dataset Design

22 January 2018
by Kevin Burges

This latest post in our "Introduction to CDISC" series is all about Define-XML. Many people know it because it's required in regulatory submissions, but why does it exist, and what benefits can it bring you?

In the past, sponsors submitting to the FDA were required to submit a PDF describing their submission datasets. As we all know, PDF is great for viewing on screen or printing, but a computer can't interpret the information inside it in any meaningful way. Enter CDISC's Define-XML model...

Define-XML standardizes how to describe datasets in a machine-readable way. It can be used to define any tabular dataset structure, though it's primarily used to describe SDTM, ADaM and SEND datasets for regulatory submission. The FDA now requires that submissions of standardized study data use Define-XML to describe their datasets, so it's something we all need to be familiar with.
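To make "machine-readable" concrete, here is a minimal Python sketch that reads a heavily trimmed Define-XML 2.0 fragment with the standard library and lists each dataset's variables. The fragment is illustrative only: a real define.xml carries many more required attributes and elements (codelists, origins, value-level metadata), and the OIDs and values below are made up.

```python
import xml.etree.ElementTree as ET

# Namespaces used by Define-XML 2.0: the ODM 1.3 core plus the "def" extension.
NS = {
    "odm": "http://www.cdisc.org/ns/odm/v1.3",
    "def": "http://www.cdisc.org/ns/def/v2.0",
}

# Illustrative, heavily trimmed fragment (not schema-valid Define-XML).
DEFINE_XML = """<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3"
     xmlns:def="http://www.cdisc.org/ns/def/v2.0">
  <Study OID="STUDY1">
    <MetaDataVersion OID="MDV.1" Name="Example">
      <ItemGroupDef OID="IG.DM" Name="DM" Repeating="No" Purpose="Tabulation"
                    def:Structure="One record per subject">
        <ItemRef ItemOID="IT.DM.STUDYID" Mandatory="Yes" OrderNumber="1"/>
        <ItemRef ItemOID="IT.DM.USUBJID" Mandatory="Yes" OrderNumber="2"/>
        <ItemRef ItemOID="IT.DM.AGE" Mandatory="No" OrderNumber="3"/>
      </ItemGroupDef>
      <ItemDef OID="IT.DM.STUDYID" Name="STUDYID" DataType="text" Length="12"/>
      <ItemDef OID="IT.DM.USUBJID" Name="USUBJID" DataType="text" Length="20"/>
      <ItemDef OID="IT.DM.AGE" Name="AGE" DataType="integer" Length="3"/>
    </MetaDataVersion>
  </Study>
</ODM>"""


def describe_datasets(xml_text: str) -> dict[str, list[tuple[str, str, str]]]:
    """Return {dataset name: [(variable, DataType, Length), ...]} in OrderNumber order."""
    mdv = ET.fromstring(xml_text).find("odm:Study/odm:MetaDataVersion", NS)
    # Index ItemDefs (variable definitions) by OID so ItemRefs can be resolved.
    items = {d.get("OID"): d for d in mdv.findall("odm:ItemDef", NS)}
    result = {}
    for group in mdv.findall("odm:ItemGroupDef", NS):  # one ItemGroupDef per dataset
        refs = sorted(group.findall("odm:ItemRef", NS),
                      key=lambda r: int(r.get("OrderNumber", "0")))
        result[group.get("Name")] = [
            (items[r.get("ItemOID")].get("Name"),
             items[r.get("ItemOID")].get("DataType"),
             items[r.get("ItemOID")].get("Length"))
            for r in refs
        ]
    return result


for name, variables in describe_datasets(DEFINE_XML).items():
    print(name, variables)
```

The same structure that a PDF would only show visually is here something a program can query, which is what the rest of this post builds on.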


If you're currently creating your Define-XML at the end of your study, you're missing many of the benefits it can bring to your end-to-end study process. This post gives an overview of how it can be used throughout that process to drive efficiencies; we won't be diving into the details of how it works.

Define SDTM, ADaM and SEND datasets up front

If you don't know where you're going, how do you know if you've arrived at the right place? The answer is, of course, you don't. However, this is how many organizations still approach clinical trials: they define their CRFs, collect the data, and only think about converting it to SDTM datasets at the end. There are two problems with this process.

  • When designing the CRFs, they don't know whether all relevant SDTM data is being collected
  • They don't have a definition of what they want to submit, so they can't verify whether the submission data is what they intended

This can lead to incomplete data, protocol amendments, complex mapping to standardized CDISC datasets, increased QA, and ultimately a longer study process.

The solution is to define your study, end to end, right at the start. This gives you confidence, before you even start collecting your data, that your CRFs are correct and that their data can easily be converted into submission datasets that will satisfy the regulator.

The first step in this is to define your submission datasets up front, using Define-XML. The right dataset design software will help you rapidly define SDTM, SEND and ADaM datasets and export their definition as Define-XML.

Check compliance of Define-XML dataset definitions to SDTM, SEND and ADaM

You can verify the compliance of your submission dataset designs to CDISC standards before collecting any data, by running standard validation tools. Once you have the datasets defined, the next step is to define the mappings to them.
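Standard validation tools apply a large set of CDISC and regulatory rules. Purely to illustrate the idea, the sketch below runs three simple design-time checks against a trimmed, hypothetical Define-XML fragment: dataset and variable names no longer than 8 characters and labels no longer than 40 (the SAS transport v5 limits), and every ItemRef resolving to an ItemDef. It is not a substitute for those tools.

```python
import xml.etree.ElementTree as ET

NS = {"odm": "http://www.cdisc.org/ns/odm/v1.3"}

# Hypothetical, trimmed design with two deliberate problems: a variable name
# longer than 8 characters and an ItemRef pointing at an undefined ItemDef.
DESIGN = """<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3">
  <Study OID="STUDY1">
    <MetaDataVersion OID="MDV.1" Name="Example">
      <ItemGroupDef OID="IG.VS" Name="VS" Repeating="Yes">
        <ItemRef ItemOID="IT.VS.VSTESTCD" Mandatory="Yes"/>
        <ItemRef ItemOID="IT.VS.VSORRESUNIT" Mandatory="No"/>
        <ItemRef ItemOID="IT.VS.VSSTRESN" Mandatory="No"/>
      </ItemGroupDef>
      <ItemDef OID="IT.VS.VSTESTCD" Name="VSTESTCD" DataType="text" Length="8">
        <Description><TranslatedText>Vital Signs Test Short Name</TranslatedText></Description>
      </ItemDef>
      <ItemDef OID="IT.VS.VSORRESUNIT" Name="VSORRESUNIT" DataType="text" Length="10">
        <Description><TranslatedText>Original Units</TranslatedText></Description>
      </ItemDef>
    </MetaDataVersion>
  </Study>
</ODM>"""

MAX_NAME = 8    # SAS transport v5 limit for dataset and variable names
MAX_LABEL = 40  # SAS transport v5 limit for labels


def check_design(xml_text: str) -> list[str]:
    """Return findings for a few basic design-time rules (illustrative only)."""
    findings = []
    mdv = ET.fromstring(xml_text).find("odm:Study/odm:MetaDataVersion", NS)
    item_defs = {d.get("OID"): d for d in mdv.findall("odm:ItemDef", NS)}
    for group in mdv.findall("odm:ItemGroupDef", NS):
        ds = group.get("Name")
        if len(ds) > MAX_NAME:
            findings.append(f"{ds}: dataset name longer than {MAX_NAME} characters")
        for ref in group.findall("odm:ItemRef", NS):
            oid = ref.get("ItemOID")
            item = item_defs.get(oid)
            if item is None:  # dangling reference: variable used but never defined
                findings.append(f"{ds}: ItemRef {oid} has no matching ItemDef")
                continue
            name = item.get("Name")
            if len(name) > MAX_NAME:
                findings.append(f"{ds}.{name}: variable name longer than {MAX_NAME} characters")
            label = item.findtext("odm:Description/odm:TranslatedText",
                                  default="", namespaces=NS)
            if len(label) > MAX_LABEL:
                findings.append(f"{ds}.{name}: label longer than {MAX_LABEL} characters")
    return findings


for finding in check_design(DESIGN):
    print(finding)
```

Because no data is involved, problems like these surface while the design is still cheap to change.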

Mapping from EDC to submission datasets

Some EDC systems can export data in ODM format that matches your study design; however, most people use tools aimed at working with tabular datasets. Datasets are still by far the most popular type of data export from an EDC system. If you have created your study from an ODM study specification, the datasets will bear some resemblance to the ODM, but they're not the same. When mapping your collected data to SDTM or SEND, you need to know what the datasets coming from your EDC system will look like.

A clever study design environment will be able to predict these EDC export datasets and generate a Define-XML describing them. This enables you to define the mappings to your submission datasets before collecting any data. If your CRFs are designed using CDASH, mapping should be a breeze.
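As an illustration of why CDASH helps, the sketch below applies a hand-written mapping specification to one row of a hypothetical EDC export from a CDASH vital signs form. Most fields carry straight across to SDTM VS; only the collected date VSDAT needs converting to ISO 8601 for VSDTC. The raw column names, the USUBJID convention and the DD-MON-YYYY date format are all assumptions, and partial dates are not handled.

```python
# Hypothetical row from a predicted EDC export of a CDASH-based vital signs form.
# Real export layouts vary by EDC system.
raw_row = {
    "STUDYID": "ABC-123",
    "SUBJID": "1001",
    "VSTESTCD": "SYSBP",
    "VSORRES": "120",
    "VSORRESU": "mmHg",
    "VSDAT": "05-MAR-2018",
}

MONTHS = {m: i for i, m in enumerate(
    ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
     "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"], start=1)}


def to_iso8601(dd_mon_yyyy: str) -> str:
    """Convert a complete DD-MON-YYYY date to ISO 8601 (no partial-date handling)."""
    day, mon, year = dd_mon_yyyy.split("-")
    return f"{int(year):04d}-{MONTHS[mon.upper()]:02d}-{int(day):02d}"


# Mapping specification: SDTM target variable -> derivation from the raw row.
# Most CDASH fields carry straight across; only a few need deriving.
VS_MAPPING = {
    "STUDYID": lambda r: r["STUDYID"],
    "DOMAIN": lambda r: "VS",
    "USUBJID": lambda r: f'{r["STUDYID"]}-{r["SUBJID"]}',  # assumed sponsor convention
    "VSTESTCD": lambda r: r["VSTESTCD"],
    "VSORRES": lambda r: r["VSORRES"],
    "VSORRESU": lambda r: r["VSORRESU"],
    "VSDTC": lambda r: to_iso8601(r["VSDAT"]),
}


def map_row(row: dict[str, str], mapping: dict) -> dict[str, str]:
    """Apply every derivation in the mapping to one source row."""
    return {target: derive(row) for target, derive in mapping.items()}


print(map_row(raw_row, VS_MAPPING))
```

A full mapping would also derive variables such as VSSEQ and the standardized results, but the shape stays the same: known source columns on one side, known target variables on the other.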

Benefits of Standardization and Re-use

Take a look at our previous blog post on using ODM and CDASH for CRF design for details of how you can dramatically decrease your study setup time by standardizing and re-using your ODM and Define-XML study designs in a CDISC metadata repository.

Verify CRO datasets against your Define-XML specification

If you're using a CRO to generate your submission datasets, how do you know what they've delivered is correct? If you have defined your datasets up front using Define-XML, you can automatically verify whether the delivered data conforms to your original specification. This greatly reduces the QA effort required and surfaces any problems much faster. No more manually checking data against an Excel or PDF specification.
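A minimal sketch of such a check, assuming the specification has already been pulled out of the Define-XML into a dictionary of variable name to DataType and Length, and that the delivered data arrives as CSV (real deliveries are usually XPT). All names and values are hypothetical.

```python
import csv
import io

# Hypothetical specification extracted from a Define-XML design:
# variable name -> (Define-XML DataType, maximum Length).
SPEC = {
    "STUDYID": ("text", 12),
    "USUBJID": ("text", 20),
    "AGE": ("integer", 3),
    "SEX": ("text", 1),
}

# Hypothetical delivery: SEX is missing, RACEOTH was never specified,
# and one AGE value contains the letter O instead of a zero.
DELIVERED = """STUDYID,USUBJID,AGE,RACEOTH
ABC-123,ABC-123-1001,34,
ABC-123,ABC-123-1002,4O,
"""


def verify(csv_text: str, spec: dict[str, tuple[str, int]]) -> list[str]:
    """Compare delivered CSV data against the specification."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    problems = [f"missing variable {v}" for v in spec if v not in columns]
    problems += [f"unexpected variable {c}" for c in columns if c not in spec]
    # Line numbers assume no embedded newlines in quoted fields; line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        for var, (data_type, length) in spec.items():
            value = row.get(var)
            if not value:  # absent column or empty cell: nothing to check
                continue
            if len(value) > length:
                problems.append(f"line {line_no}: {var}={value!r} exceeds length {length}")
            if data_type == "integer" and not value.lstrip("-").isdigit():
                problems.append(f"line {line_no}: {var}={value!r} is not an integer")
    return problems


for problem in verify(DELIVERED, SPEC):
    print(problem)
```

Because the specification is data rather than a document, the same check can be re-run on every delivery.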

View as PDF or HTML

Define-XML is great for computers, but it's not something most people want to look at. Thankfully, it can easily be converted into PDF or HTML, making it simple for anyone to read.
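In practice the HTML view is usually produced by applying an XSLT stylesheet to the define.xml; CDISC distributes one with the Define-XML standard. Python's standard library has no XSLT processor, so as a simplified stand-in the sketch below renders a bare HTML table per dataset from a trimmed, hypothetical fragment using ElementTree.

```python
import html
import xml.etree.ElementTree as ET

NS = {"odm": "http://www.cdisc.org/ns/odm/v1.3"}

# Hypothetical, trimmed fragment (not schema-valid Define-XML).
DEFINE_XML = """<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3">
  <Study OID="STUDY1">
    <MetaDataVersion OID="MDV.1" Name="Example">
      <ItemGroupDef OID="IG.DM" Name="DM" Repeating="No">
        <Description><TranslatedText>Demographics</TranslatedText></Description>
        <ItemRef ItemOID="IT.DM.USUBJID" Mandatory="Yes"/>
        <ItemRef ItemOID="IT.DM.AGE" Mandatory="No"/>
      </ItemGroupDef>
      <ItemDef OID="IT.DM.USUBJID" Name="USUBJID" DataType="text" Length="20">
        <Description><TranslatedText>Unique Subject Identifier</TranslatedText></Description>
      </ItemDef>
      <ItemDef OID="IT.DM.AGE" Name="AGE" DataType="integer" Length="3">
        <Description><TranslatedText>Age</TranslatedText></Description>
      </ItemDef>
    </MetaDataVersion>
  </Study>
</ODM>"""


def label(el: ET.Element) -> str:
    """Text of the element's Description/TranslatedText, or "" if absent."""
    return el.findtext("odm:Description/odm:TranslatedText", default="", namespaces=NS)


def to_html(xml_text: str) -> str:
    """Render one heading and one variable table per dataset (ItemGroupDef)."""
    mdv = ET.fromstring(xml_text).find("odm:Study/odm:MetaDataVersion", NS)
    items = {d.get("OID"): d for d in mdv.findall("odm:ItemDef", NS)}
    parts = ["<html><body>"]
    for group in mdv.findall("odm:ItemGroupDef", NS):
        parts.append(f"<h2>{html.escape(group.get('Name'))} ({html.escape(label(group))})</h2>")
        parts.append("<table><tr><th>Variable</th><th>Label</th><th>Type</th><th>Length</th></tr>")
        for ref in group.findall("odm:ItemRef", NS):
            item = items[ref.get("ItemOID")]
            cells = [item.get("Name", ""), label(item),
                     item.get("DataType", ""), item.get("Length", "")]
            # Escape every cell so metadata text cannot break the HTML.
            parts.append("<tr>" + "".join(f"<td>{html.escape(c)}</td>" for c in cells) + "</tr>")
        parts.append("</table>")
    parts.append("</body></html>")
    return "\n".join(parts)


print(to_html(DEFINE_XML))
```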

Working with legacy data

Organizations often have lots of legacy data in XPT datasets for which they have no machine-readable metadata. There may not even be an Excel description of the data, or if there is, it may be incomplete or incorrect. To help make better use of this data, it's possible to generate Define-XML metadata directly from the XPT datasets. This makes it easy to understand the content of the datasets and make appropriate use of them.
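A minimal sketch of the idea, assuming each column's name, label and values have already been read out of the XPT file as strings (for example with a SAS transport reader, not shown). It guesses DataType and Length from the values and writes a bare ODM/Define-XML skeleton; the OIDs, the Repeating="Yes" default and the inference rules are assumptions, and the result would still need human review and enrichment.

```python
import re
import xml.etree.ElementTree as ET

ODM_NS = "http://www.cdisc.org/ns/odm/v1.3"
ET.register_namespace("", ODM_NS)  # serialize ODM elements without a prefix

INT_RE = re.compile(r"[+-]?\d+")
FLOAT_RE = re.compile(r"[+-]?(\d+\.\d*|\.\d+|\d+)")


def q(tag: str) -> str:
    """Qualify a tag name with the ODM namespace."""
    return f"{{{ODM_NS}}}{tag}"


def infer_item(values: list[str]) -> tuple[str, int]:
    """Guess a Define-XML DataType and Length from string values ("" = missing)."""
    present = [v for v in values if v != ""]
    if present and all(INT_RE.fullmatch(v) for v in present):
        data_type = "integer"
    elif present and all(FLOAT_RE.fullmatch(v) for v in present):
        data_type = "float"
    else:
        data_type = "text"
    # Crude: character width of the longest value, at least 1.
    return data_type, max((len(v) for v in present), default=1)


def define_from_columns(dataset: str, columns: dict[str, tuple[str, list[str]]]) -> str:
    """Build a minimal Define-XML skeleton; columns maps name -> (label, values)."""
    root = ET.Element(q("ODM"))
    study = ET.SubElement(root, q("Study"), OID="STUDY.LEGACY")
    mdv = ET.SubElement(study, q("MetaDataVersion"), OID="MDV.1", Name="Generated")
    # Repeating="Yes" is an assumption; it cannot be known from the data alone.
    group = ET.SubElement(mdv, q("ItemGroupDef"), OID=f"IG.{dataset}",
                          Name=dataset, Repeating="Yes")
    for order, (name, (label, values)) in enumerate(columns.items(), start=1):
        oid = f"IT.{dataset}.{name}"
        ET.SubElement(group, q("ItemRef"), ItemOID=oid, Mandatory="No", OrderNumber=str(order))
        data_type, length = infer_item(values)
        item = ET.SubElement(mdv, q("ItemDef"), OID=oid, Name=name,
                             DataType=data_type, Length=str(length))
        desc = ET.SubElement(item, q("Description"))
        ET.SubElement(desc, q("TranslatedText")).text = label
    return ET.tostring(root, encoding="unicode")


xml_text = define_from_columns("LB", {
    "USUBJID": ("Unique Subject Identifier", ["ABC-1001", "ABC-1002"]),
    "LBSTRESN": ("Numeric Result/Finding in Standard Units", ["5.4", "12.25", ""]),
    "VISITNUM": ("Visit Number", ["1", "2", "2"]),
})
print(xml_text)
```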


As you can see, Define-XML isn't just something you should be using because you have to. It can bring real benefits to your study process.

Download our Define-XML example, including HTML and PDF views.


Topics: CDISC, SDTM, Define-XML, ADaM, SEND