Next-generation statistical simulator provides benchmarking tool

By Elissa Wolfson, The Science Advisory Board assistant editor

May 11, 2023 -- UCLA researchers have developed an all-in-one, next-generation statistical simulator capable of assimilating a wide range of information to generate realistic synthetic data and provide a benchmarking tool for researchers who use advanced technologies to study diseases and potential therapies. The study, published Thursday in Nature Biotechnology, indicates that the new computer modeling (in silico) system can help researchers evaluate and validate computational methods.

Single-cell RNA sequencing, also called single-cell transcriptomics, is the foundation for analyzing genome-wide gene expression levels in individual cells. Additional information generated by fields with names ending in "-omics" -- including genomics, proteomics, phenomics, transcriptomics, metabolomics, lipidomics, and epigenomics -- offers detail on a range of molecular features. In recent years, spatial transcriptomic technologies have made it possible to profile gene expression levels together with spatial location information, showing where cells and their neighborhoods sit within tissue.

Thousands of computational methods have been developed to analyze single-cell and spatial omics data for a variety of tasks, making method benchmarking a pressing challenge for method developers and users. Although simulators have evolved and become more powerful, researchers say they still have numerous limitations. For example, few simulators can generate realistic single-cell RNA sequencing data from continuous cell trajectories by mimicking real data, and most cannot simulate multi-omics or spatial transcriptomics data.

The researchers developed a new statistical simulator, scDesign3, that generates realistic single-cell and spatial omics data, including various cell states, experimental designs, and feature modalities, by learning interpretable parameters from real data. They believe scDesign3 offers the first probabilistic model that unifies generation and inference for single-cell and spatial omics data. Because it is equipped with interpretable parameters and a model likelihood, they say, scDesign3 is more than a versatile simulator. It can generate customized in silico data to serve as negative and positive controls for computational analysis, and it can assess inferred cell clusters, trajectories, and spatial locations in an unsupervised way for goodness-of-fit, a measure of how well a statistical model fits a set of observations.
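To give a rough sense of what "learning interpretable parameters from real data" and then generating synthetic data can look like, here is a toy sketch in Python. It is not scDesign3 and does not reproduce its model. It assumes, purely for illustration, that each gene's counts follow a negative binomial distribution whose mean and dispersion are estimated by the method of moments. The function names (`fit_gene`, `sample_poisson`, `simulate_gene`) and the example counts are hypothetical.

```python
import math
import random
from statistics import mean, variance


def fit_gene(counts):
    """Estimate (mean, size) of a negative binomial by the method of moments."""
    mu = mean(counts)
    var = variance(counts)
    # NB variance is mu + mu^2 / size. When the data show no overdispersion
    # (var <= mu), return size=None to signal a plain Poisson model.
    size = mu * mu / (var - mu) if var > mu else None
    return mu, size


def sample_poisson(rng, lam):
    """Knuth's multiplication method; adequate for the small rates used here."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def simulate_gene(rng, mu, size, n_cells):
    """Draw synthetic counts from the fitted model (gamma-Poisson mixture)."""
    draws = []
    for _ in range(n_cells):
        # A gamma-distributed rate with mean mu, fed into a Poisson draw,
        # yields a negative binomial count with mean mu and dispersion size.
        rate = mu if size is None else rng.gammavariate(size, mu / size)
        draws.append(sample_poisson(rng, rate))
    return draws


if __name__ == "__main__":
    # Made-up "real" counts for two genes across ten cells (illustrative only).
    real = {
        "geneA": [0, 1, 0, 3, 2, 0, 7, 1, 0, 4],
        "geneB": [12, 9, 15, 30, 8, 11, 22, 10, 14, 19],
    }
    rng = random.Random(0)
    for gene, counts in real.items():
        mu, size = fit_gene(counts)
        synthetic = simulate_gene(rng, mu, size, n_cells=10)
        print(gene, "fitted mean:", mu, "dispersion:", size, "synthetic:", synthetic)
```

The two fitted numbers per gene are interpretable in the sense the researchers describe: a user could change a gene's mean before simulating to create a known difference, which is the kind of positive control a benchmarking study needs.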

In summary, they believe scDesign3 accommodates various cell states, diverse omics modalities, and complex experimental designs, making it a superior multifunctional suite for benchmarking computational methods and interpreting single-cell and spatial omics data. They would like to see the system's transparent modeling and interpretable parameters help users explore, alter, and simulate data.

"We introduced the scDesign3, which we believe is the most realistic and versatile simulator to date, to fill the gap between researchers' benchmarking needs and the limitations of existing tools," Jingyi Jessica Li, the study's senior author and a UCLA professor of statistics, biostatistics, computational medicine, and human genetics, said in a statement.

