Machine learning predicts if a COVID-19 clinical trial will be successful

By Samantha Black, PhD, ScienceBoard editor in chief

July 23, 2021 -- Using machine learning, researchers have teased out the factors underlying the success of COVID-19 clinical trials, according to an article published in PLOS One on July 12. The analysis showed that drug features and study keywords are the most informative predictors of whether a COVID-19 trial will be completed.

Randomized clinical trials have provided bodies of evidence for the approval of emergency use authorizations for several COVID-19 drugs and vaccines. As of July 15, more than 6,180 COVID-19 clinical trials have been registered through ClinicalTrials.gov, the U.S. registry and database for privately and publicly funded clinical studies conducted worldwide.

Not all studies are expected to be completed or achieve the intended goals, according to researchers from Florida Atlantic University. Around 10% to 12% of trials are terminated due to various reasons, including the following:

Insufficient enrollment
Scientific data from trial
Safety/efficacy concerns
Administrative reasons
External information from a different study
Lack of funding/resources
Business/sponsor decisions

It is imperative to understand which types of trials are likely to succeed in the face of a global pandemic, as they provide critical safety and efficacy data leading to regulatory approvals. In the current study, Xingquan Zhu, PhD, senior author and a professor in the department of computer and electrical engineering and computer science at Florida Atlantic University, and Magdalyn Elkin, a second-year doctoral student in the department, used machine learning to predict COVID-19 trial completion or cessation (terminated, withdrawn, or suspended).

"The main purpose of our research was to predict whether a COVID-19 clinical trial will be completed or terminated, withdrawn, or suspended," said Zhu, in a statement. "Clinical trials involve a great deal of resources and time, including planning and recruiting human subjects."

Because COVID-19 is still a relatively novel disease, most clinical trials are still actively recruiting, and only around 14% have been completed. From the dataset, the researchers selected 772 COVID-19 clinical trials whose status was marked as "completed," "terminated," "withdrawn," or "suspended"; 81.3% of these were completed and 18.7% ended in cessation.
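The labeling described above can be illustrated with a minimal sketch. The trial IDs and the function name below are hypothetical and are not taken from the study; only the four status values and the completed-versus-cessation grouping come from the article.

```python
# Statuses the article groups together as "cessation".
CESSATION = {"terminated", "withdrawn", "suspended"}


def label_trial(status):
    """Return 1 for completed, 0 for cessation, None for any other status."""
    normalized = status.strip().lower()
    if normalized == "completed":
        return 1
    if normalized in CESSATION:
        return 0
    return None  # e.g. still recruiting: excluded from the analysis


# Hypothetical records for illustration only (not real registry entries).
trials = [
    ("T1", "Completed"),
    ("T2", "Recruiting"),
    ("T3", "Withdrawn"),
    ("T4", "Terminated"),
    ("T5", "Suspended"),
    ("T6", "Completed"),
]

# Keep only trials whose status maps to one of the two classes.
labeled = {
    trial_id: label_trial(status)
    for trial_id, status in trials
    if label_trial(status) is not None
}
```

In this sketch, trials that are still recruiting are simply dropped, which mirrors why only a subset of the registered trials could be used.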

Each trial was characterized by a selected set of clinical trial features. Statistics features, which describe a trial's administrative information, study information, study design, and eligibility criteria, were considered first. Next, the researchers considered drug features, which are derived from the Intervention Medical Subject Heading field in the clinical trial report. Of the trials in the study, 48.83% were interventional.

Keyword features, which capture important terms from a clinical trial's summary, were the third category included in the analysis. Finally, the researchers considered embedding features, which represent textual descriptions of the trial, such as the study objective, expected participants, and process and procedures. In total, each of the 772 trials included in the analysis was represented by 693 features.

As of July 15, more than 6,180 COVID-19 clinical trials have been registered through ClinicalTrials.gov. Image courtesy of Florida Atlantic University/College of Engineering and Computer Science.

The researchers used the ReliefF similarity-based feature selection method to study each feature's impact on completion or cessation. After dividing the trials into training and testing samples, the pair compared the performance of four predictive models: neural network, random forest, XGBoost, and logistic regression.
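To give a sense of how similarity-based feature selection works, here is a minimal sketch of basic Relief, a simpler relative of ReliefF. It is not the researchers' implementation: it uses a single nearest neighbor per class, assumes binary labels and features already scaled to the range 0 to 1, and runs on hypothetical toy data. A feature gains weight when it differs between a sample and its nearest neighbor from the other class, and loses weight when it differs from its nearest neighbor in the same class.

```python
def relief_weights(X, y):
    """Basic Relief for binary labels; features are assumed scaled to [0, 1]."""
    n_samples, n_features = len(X), len(X[0])
    weights = [0.0] * n_features

    def distance(a, b):
        # Manhattan distance between two feature vectors.
        return sum(abs(p - q) for p, q in zip(a, b))

    for i in range(n_samples):
        same = [j for j in range(n_samples) if j != i and y[j] == y[i]]
        other = [j for j in range(n_samples) if y[j] != y[i]]
        hit = min(same, key=lambda j: distance(X[i], X[j]))    # nearest same-class sample
        miss = min(other, key=lambda j: distance(X[i], X[j]))  # nearest other-class sample
        for f in range(n_features):
            # Reward separation from the miss, penalize separation from the hit.
            weights[f] += (abs(X[i][f] - X[miss][f]) - abs(X[i][f] - X[hit][f])) / n_samples
    return weights


# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 0.2], [0.1, 0.9], [0.2, 0.5], [0.8, 0.1], [0.9, 0.6], [1.0, 0.8]]
y = [1, 1, 1, 0, 0, 0]

weights = relief_weights(X, y)
# Feature indices ordered from most to least informative.
ranking = sorted(range(len(weights)), key=lambda f: weights[f], reverse=True)
```

ReliefF extends this idea by averaging over several nearest neighbors, which makes the weights more robust to noisy data.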

"If we can predict the likelihood of whether a trial might be terminated or not down the road, it will help stakeholders better plan their resources and procedures," Zhu stated. "Eventually, such computational approaches may help our society save time and sources to combat the global COVID-19 pandemic."

The authors noted that the current study focused on a particular disease (COVID-19), so the trials compared shared more features in common, which provides a large improvement in the predictive power of modeling clinical trials. Segregating clinical trial data by research area or disease resulted in balanced accuracy as high as 70% and F1 score (harmonic mean of precision and recall) of 50%. The study achieved area under the curve (AUC) scores of over 0.87 and balanced accuracy scores of over 0.81, indicating that computational methods can be highly effective for COVID-19 clinical trial prediction.
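For readers unfamiliar with these metrics, the following sketch shows how balanced accuracy and F1 score are computed, and why balanced accuracy is a useful measure when the classes are uneven. The counts are hypothetical: they scale the reported 81.3%/18.7% split to 1,000 trials for easy arithmetic, and the "always predict completed" baseline is an illustration, not a model from the study.

```python
def balanced_accuracy(tp, fn, tn, fp):
    """Mean of the recall on the positive class and the recall on the negative class."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 813 completed and 187 ceased trials out of 1,000.
# Treat "cessation" as the positive class and always predict "completed".
tp, fn = 0, 187   # every ceased trial is missed
tn, fp = 813, 0   # every completed trial is labeled correctly

baseline_accuracy = (tp + tn) / (tp + fn + tn + fp)
baseline_balanced = balanced_accuracy(tp, fn, tn, fp)
```

Under these assumptions, the baseline reaches a plain accuracy of 0.813 while its balanced accuracy is only 0.5, the same as guessing. That gap is why a balanced accuracy above 0.81 is more meaningful than the same number reported as plain accuracy.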

Feature selection and ranking showed that keyword features were the most informative for COVID-19 trial prediction, followed by drug features, statistics features, and embedding features. Because most of the trials were interventional, it is logical that drug intervention (a drug feature) is a key component of a trial's success.

"Clinical trials that have stopped for various reasons are costly and often represent a tremendous loss of resources," said Stella Batalama, PhD, dean of the College of Engineering and Computer Science at Florida Atlantic University. "As future outbreaks of COVID-19 are likely even after the current pandemic has declined, it is critical to optimize efficient research efforts.

"The new approach ... will be helpful to design computational approaches to predict whether or not a COVID-19 clinical trial will be completed so that stakeholders can leverage the predictions to plan resources, reduce costs, and minimize the time of the clinical study," Batalama concluded.
