scater
What is scater?
scater is an R package to help with the analysis of single-cell RNA-seq data, with a focus on pre-processing, quality control, data normalisation and visualisation.
Overview
Let's be honest. We'd rather not do quality control. We'd all rather spend our time answering exciting scientific questions and fitting sexy statistical models to nice clean datasets. Sadly, getting those nice clean datasets takes a lot of work. Work vital for the success of projects in genomics, but boring work.
This package, scater, aims to ease this burden, making rigorous QC easier so that we can get to the exciting stuff quicker.
This package contains useful tools for the analysis of single-cell gene expression data using the statistical software R. The package places an emphasis on tools for quality control, visualisation and pre-processing of data before further downstream analysis.
By providing wrapper functions to kallisto, a tool for extremely fast quantification of transcript abundance from RNA-seq reads, we can go from a set of FASTQ files to an object in an R session with summarised abundance values in a couple of lines of code, in minutes.
The scater package provides a sane data structure for single-cell RNA-seq data based on Bioconductor's ExpressionSet, to organise complicated datasets with convenient subsetting.
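To give a feel for the kind of subsetting this enables, here is a minimal sketch that uses Bioconductor's ExpressionSet from the Biobase package directly, not scater's own class. The toy counts, the cell names and the `batch` column are made up for illustration, and Biobase is assumed to be installed.

```r
library(Biobase)

# Toy expression matrix: rows are genes, columns are cells (made-up values).
counts <- matrix(
  1:12, nrow = 4,
  dimnames = list(paste0("gene", 1:4), paste0("cell", 1:3))
)

# Per-cell metadata; row names must match the column names of the matrix.
# The 'batch' column is a hypothetical annotation.
pheno <- data.frame(batch = c("A", "B", "A"), row.names = colnames(counts))

# Bundle expression values and cell metadata into one object.
eset <- ExpressionSet(
  assayData = counts,
  phenoData = AnnotatedDataFrame(pheno)
)

# Subsetting by cell keeps expression values and metadata in sync.
batch_a <- eset[, eset$batch == "A"]

dim(exprs(batch_a))  # genes x remaining cells
pData(batch_a)       # metadata for the remaining cells only
```

Because the expression matrix and the cell annotations travel together, a single subsetting call is enough to drop cells from both.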
We provide many plotting functions and functions to compute QC metrics and make it easy to work through pre-processing, QC and normalisation steps.
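As a rough illustration of what per-cell QC metrics look like, the sketch below computes two common ones with base R only. It does not use scater's own functions. The count matrix and the `min_total` threshold are hypothetical.

```r
# Toy count matrix: rows are genes, columns are cells (made-up values).
counts <- matrix(
  c(10, 0, 3,
     0, 0, 5,
     7, 1, 0,
     2, 0, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("cell", 1:3))
)

# Two common per-cell QC metrics.
qc <- data.frame(
  total_counts   = colSums(counts),     # library size per cell
  detected_genes = colSums(counts > 0)  # genes with at least one count
)

# Flag cells whose library size falls below a hypothetical threshold.
min_total <- 5
qc$keep <- qc$total_counts >= min_total

# Keep only the cells that pass the filter.
filtered <- counts[, qc$keep, drop = FALSE]

print(qc)
```

In a real analysis the threshold would be chosen by inspecting the distribution of these metrics across cells rather than fixed in advance.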
We hope that scater fills a useful niche between raw RNA-sequencing data and more focused downstream modelling tools such as monocle, scLVM, SCDE, edgeR, limma and so on.
Installation
This package currently lives on GitHub, so I recommend using Hadley Wickham's devtools package to install scater directly from GitHub. If you don't have devtools installed, then install that from CRAN (as shown below) and then run the call to install scater:
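# Install devtools from CRAN (skip this line if it is already installed)
install.packages("devtools")
# Install scater from its GitHub repository
devtools::install_github("davismcc/scater")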
We plan to contribute scater to Bioconductor in the near future.
Getting started
The best place to start is the vignette. The vignette provides explanations of the structure and use of the package, with example analyses demonstrating how to step through pre-processing and QC.
Acknowledgements and disclaimer
The package leans heavily on previously published work and packages, namely edgeR and limma. The SCESet class is heavily inspired by the CellDataSet class from monocle.
The package is currently under active development, so please use it with appropriate caution: some features may be unstable or change without warning.
Please do try it, though, and contact me with bug reports, feedback, questions and suggestions to improve the package.
Davis McCarthy, June 2015