scater

Single-cell analysis toolkit for expression data from RNA-seq in R

What is scater?

scater is an R package to help with the analysis of single-cell RNA-seq data, with a focus on pre-processing, quality control, data normalisation and visualisation.

Overview

Let's be honest. We'd rather not do quality control. We'd all rather spend our time answering exciting scientific questions and fitting sexy statistical models to nice clean datasets. Sadly, getting those nice clean datasets takes a lot of work. Work vital for the success of projects in genomics, but boring work.

This package, scater, aims to ease this burden by making rigorous QC easier, so that we can get to the exciting stuff more quickly.

This package contains useful tools for the analysis of single-cell gene expression data using the statistical software R. The package places an emphasis on tools for quality control, visualisation and pre-processing of data before further downstream analysis.

By providing wrapper functions for kallisto, a tool for extremely fast quantification of transcript abundance from RNA-seq reads, we can go from a set of FASTQ files to an object in an R session with summarised abundance values in a couple of lines of code, in minutes.
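As a rough illustration of the last step of that workflow, the sketch below gathers kallisto's per-sample output into a transcripts x cells count matrix using base R only. It is not scater's wrapper: the helper `read_kallisto_counts` and the toy directories are hypothetical. It assumes each cell was quantified with `kallisto quant` against the same index, so that every output directory contains an `abundance.tsv` file with `target_id` and `est_counts` columns.

```r
# Hypothetical helper (NOT scater's API): combine kallisto's per-sample
# abundance.tsv files into one matrix of estimated counts.
# Assumes each directory was produced by `kallisto quant -i <index> -o <dir> ...`.
read_kallisto_counts <- function(dirs) {
  tabs <- lapply(dirs, function(d) {
    read.delim(file.path(d, "abundance.tsv"), stringsAsFactors = FALSE)
  })
  # Samples quantified against the same index should list the same
  # transcripts in the same order; check rather than assume.
  ids <- tabs[[1]]$target_id
  same_ids <- vapply(tabs, function(tab) identical(tab$target_id, ids), logical(1))
  stopifnot(all(same_ids))
  counts <- sapply(tabs, function(tab) tab$est_counts)
  # Force a matrix so the shape is preserved even for a single sample
  counts <- matrix(counts, nrow = length(ids))
  dimnames(counts) <- list(ids, basename(dirs))
  counts
}

# Toy example: write two fake kallisto output directories (toy values only)
make_toy <- function(dir, est) {
  dir.create(dir, showWarnings = FALSE)
  toy <- data.frame(target_id = c("tx1", "tx2"), length = c(1000, 2000),
                    eff_length = c(800, 1800), est_counts = est, tpm = c(0, 0))
  write.table(toy, file.path(dir, "abundance.tsv"),
              sep = "\t", quote = FALSE, row.names = FALSE)
}
root <- tempfile("kallisto_demo")
dir.create(root)
d1 <- file.path(root, "cell1")
d2 <- file.path(root, "cell2")
make_toy(d1, c(10, 0))
make_toy(d2, c(3, 7))

counts <- read_kallisto_counts(c(d1, d2))
print(counts)
```

Keeping transcripts as rows and cells as columns matches the layout of an ExpressionSet-style assay matrix, so the result can be handed on to downstream containers without transposing.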

The scater package provides a sane data structure for single-cell RNA-seq data, the SCESet class, based on Bioconductor's ExpressionSet, to organise complicated datasets and allow convenient subsetting.

We provide many plotting functions, along with functions to compute QC metrics, making it easy to work through the pre-processing, QC and normalisation steps.
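To make "QC metrics" concrete, here is a minimal base-R sketch of the kind of per-cell metrics such functions compute: total counts (library size), number of genes detected, and the percentage of counts coming from a set of control genes (for example mitochondrial genes or spike-ins). It is not scater's implementation; `cell_qc_metrics`, `is_low_outlier` and the toy matrix are hypothetical, and the 3-MAD cut-off is one common heuristic, not a scater default.

```r
# Hypothetical sketch (NOT scater's implementation): per-cell QC metrics
# from a genes x cells count matrix, using base R only.
cell_qc_metrics <- function(counts, control_genes = character(0)) {
  stopifnot(is.matrix(counts))
  total <- colSums(counts)          # library size per cell
  detected <- colSums(counts > 0)   # number of genes with non-zero counts
  ctrl <- rownames(counts) %in% control_genes
  # Percentage of each cell's counts from control genes (NaN if a cell has 0 counts)
  pct_ctrl <- if (any(ctrl)) {
    100 * colSums(counts[ctrl, , drop = FALSE]) / total
  } else {
    rep(0, ncol(counts))
  }
  data.frame(total_counts = total, n_detected = detected,
             pct_counts_control = pct_ctrl, row.names = colnames(counts))
}

# Flag values more than `nmads` median absolute deviations below the median
is_low_outlier <- function(x, nmads = 3) {
  x < median(x) - nmads * mad(x)
}

# Toy matrix: 3 genes (one control gene, "mt1") x 3 cells
counts <- matrix(c(20, 5, 1,   # cell A
                   15, 0, 2,   # cell B
                    1, 0, 9),  # cell C
                 nrow = 3,
                 dimnames = list(c("g1", "g2", "mt1"), c("A", "B", "C")))
qc <- cell_qc_metrics(counts, control_genes = "mt1")
print(qc)
```

In this toy matrix cell C stands out: it has the smallest library and 90% of its counts come from the control gene, the pattern usually associated with damaged or low-quality cells.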

We hope that scater fills a useful niche between raw RNA-sequencing data and more focused downstream modelling tools such as monocle, scLVM, SCDE, edgeR and limma.

Installation

This package currently lives on GitHub, so I recommend using Hadley Wickham's devtools package to install scater directly from GitHub. If you don't already have devtools, install it from CRAN first (as shown below), then run the call to install scater:

install.packages("devtools")                  # only needed once
devtools::install_github("davismcc/scater")    # install scater from GitHub

We plan to contribute scater to Bioconductor in the near future.

Getting started

The best place to start is the vignette. The vignette provides explanations of the structure and use of the package, with example analyses demonstrating how to step through pre-processing and QC.

Acknowledgements and disclaimer

The package leans heavily on previously published work and packages, namely edgeR and limma. The SCESet class is heavily inspired by the CellDataSet class from monocle.

The package is currently under active development, so it should be used with appropriate caution, as some features may be unstable or change without warning.

Please do try it, though, and contact me with bug reports, feedback, questions and suggestions to improve the package.

Davis McCarthy, June 2015