Next-generation statistical simulator provides benchmarking tool

By Elissa Wolfson, The Science Advisory Board assistant editor

May 11, 2023 -- UCLA researchers have developed an all-in-one, next-generation statistical simulator capable of assimilating a wide range of information to generate realistic synthetic data and provide a benchmarking tool for researchers who use advanced technologies to study diseases and potential therapies. The study, published Thursday in Nature Biotechnology, indicates that the new computer modeling, or in silico, system can help researchers evaluate and validate computational methods.

Single-cell RNA sequencing, also called single-cell transcriptomics, is the foundation for analyzing the genome-wide gene expression levels of cells. Additional information generated by fields with names ending in omics -- including genomics, proteomics, phenomics, transcriptomics, metabolomics, lipidomics, and epigenomics -- offers detail on a range of molecular features. In recent years, spatial transcriptomic technologies have made it possible to profile gene expression levels -- a readout of which genes are active in each cell -- together with spatial location information of cell neighborhoods, showing the precise locations and movements of cells within tissue.

Thousands of computational methods have been developed to analyze single-cell and spatial omics data for a variety of tasks, making method benchmarking a pressing challenge for method developers and users. Although simulators have evolved and become more powerful, researchers say numerous limitations remain. For example, few simulators can generate realistic single-cell RNA sequencing data from continuous cell trajectories by mimicking real data, and most cannot simulate multi-omics or spatial transcriptomics data.

The researchers developed a new statistical simulator, scDesign3, that generates realistic single-cell and spatial omics data, covering various cell states, experimental designs, and feature modalities, by learning interpretable parameters from real data. The researchers believe scDesign3 offers the first probabilistic model that unifies generation and inference for single-cell and spatial omics data. Because it is equipped with interpretable parameters and a model likelihood, they say scDesign3 is more than a versatile simulator. It has unique advantages for generating customized in silico data, which can serve as negative and positive controls for computational analysis. It can also assess inferred cell clusters, trajectories, and spatial locations in an unsupervised way for goodness-of-fit, a measure of how well a statistical model fits a set of observations.

In summary, they believe scDesign3 accommodates various cell states, diverse omics modalities, and complex experimental designs, making it a superior multifunctional suite for benchmarking computational methods and interpreting single-cell and spatial omics data. They would like to see the system's transparent modeling and interpretable parameters help users explore, alter, and simulate data.

"We introduced the scDesign3, which we believe is the most realistic and versatile simulator to date, to fill the gap between researchers' benchmarking needs and the limitations of existing tools," Jingyi Jessica Li, senior author and UCLA researcher and professor in statistics, biostatistics, computational medicine, and human genetics, said in a statement.


Copyright © 2023 scienceboard.net
 

