ortho_seqs: A Python tool for sequence analysis and higher order sequence–phenotype mapping (preprint)
September 17, 2022

An important goal in sequence analysis is to understand how parts of DNA, RNA, or protein sequences interact with each other and to predict how these interactions result in given phenotypes. Mapping phenotypes onto the underlying sequence space at first- and higher-order levels, so that the impact of given nucleotides or residues along a sequence can be quantified independently, is critical to understanding sequence–phenotype relationships. We developed a Python software tool, ortho_seqs, that quantifies higher-order sequence–phenotype interactions based on our previously published method of applying multivariate tensor-based orthogonal polynomials to biological sequences.

AIRRscape: an interactive tool for exploring B-cell receptor repertoires and antibody responses
May 27, 2022

Technological advances in next-generation sequencing have allowed for broad experimental sampling of immune repertoires, providing insight into how our immune system responds to infection, vaccination, autoimmunity, and cancer. The scale of these “big data”, however, makes it difficult to bioinformatically extract the key sequence features that are shared across multiple repertoires. With AIRRscape, we enable large-scale immune repertoire visualization and analysis that requires no knowledge of the command line or advanced programming. By providing the community with an open-source, interactive, and user-friendly interface, we reduce the barriers to exploring immune repertoires at scale.

The Tabula Sapiens: A multiple-organ, single-cell transcriptomic atlas of humans
May 13, 2022

Molecular characterization of cell types using single-cell transcriptome sequencing is revolutionizing cell biology and enabling new insights into the physiology of human organs. We created a human reference atlas comprising nearly 500,000 cells from 24 different tissues and organs, many from the same donor. This atlas enabled molecular characterization of more than 400 cell types, their distribution across tissues, and tissue-specific variation in gene expression.

Leveraging the Cell Ontology to classify unseen cell types
September 21, 2021

Here, we present OnClass, an algorithm and accompanying software for automatically classifying cells into cell types that are part of the controlled vocabulary that forms the Cell Ontology.

Analyzing genomic data using tensor-based orthogonal polynomials with application to synthetic RNAs
December 11, 2020

An important goal in molecular biology is to quantify both the patterns across a genomic sequence and the relationship between phenotype and underlying sequence. We propose a multivariate tensor-based orthogonal polynomial approach to characterize nucleotides or amino acids in a given sequence and map corresponding phenotypes onto the sequence space.

MARS: discovering novel cell types across heterogeneous single-cell experiments
October 19, 2020

Although tremendous effort has been put into cell-type annotation, identification of previously uncharacterized cell types in heterogeneous single-cell RNA-seq data remains a challenge. Here we present MARS, a meta-learning approach for identifying and annotating both known and new cell types.

A single-cell transcriptomic atlas characterizes ageing tissues in the mouse
July 15, 2020

Despite rapid advances in recent years, many of the molecular and cellular processes that underlie the progressive loss of healthy physiology are poorly understood. To gain better insight into these processes, here we generate a single-cell transcriptomic atlas across the lifespan of Mus musculus that includes data from 23 tissues and organs.

Single-cell transcriptomics of 20 mouse organs creates a Tabula Muris
October 18, 2018

Here we present a compendium of single-cell transcriptomic data from the model organism Mus musculus that comprises more than 100,000 cells from 20 organs and tissues.