Last updated: 2019-10-30
Knit directory: fibroblast-clonality/
This reproducible R Markdown analysis was created with workflowr (version 1.4.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility. The version displayed above was the version of the Git repository at the time these results were generated.
Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .DS_Store
Ignored: .Rhistory
Ignored: .Rproj.user/
Ignored: .vscode/
Ignored: code/.DS_Store
Ignored: code/selection/.DS_Store
Ignored: code/selection/.Rhistory
Ignored: code/selection/figures/
Ignored: data/.DS_Store
Ignored: logs/
Ignored: src/.DS_Store
Ignored: src/Rmd/.Rhistory
Untracked files:
Untracked: .dockerignore
Untracked: .dropbox
Untracked: .snakemake/
Untracked: Rplots.pdf
Untracked: Snakefile_clonality
Untracked: Snakefile_somatic_calling
Untracked: analysis/.ipynb_checkpoints/
Untracked: analysis/assess_mutect2_fibro-ipsc_variant_calls.ipynb
Untracked: analysis/cardelino_fig1b.R
Untracked: analysis/cardelino_fig2b.R
Untracked: code/analysis_for_garx.Rmd
Untracked: code/selection/data/
Untracked: code/selection/fit-dist.nb
Untracked: code/selection/result-figure.R
Untracked: code/yuanhua/
Untracked: data/Melanoma-RegevGarraway-DFCI-scRNA-Seq/
Untracked: data/PRJNA485423/
Untracked: data/canopy/
Untracked: data/cell_assignment/
Untracked: data/cnv/
Untracked: data/de_analysis_FTv62/
Untracked: data/donor_info_070818.txt
Untracked: data/donor_info_core.csv
Untracked: data/donor_neutrality.tsv
Untracked: data/exome-point-mutations/
Untracked: data/fdr10.annot.txt.gz
Untracked: data/human_H_v5p2.rdata
Untracked: data/human_c2_v5p2.rdata
Untracked: data/human_c6_v5p2.rdata
Untracked: data/neg-bin-rsquared-petr.csv
Untracked: data/neutralitytestr-petr.tsv
Untracked: data/raw/
Untracked: data/sce_merged_donors_cardelino_donorid_all_qc_filt.rds
Untracked: data/sce_merged_donors_cardelino_donorid_all_with_qc_labels.rds
Untracked: data/sce_merged_donors_cardelino_donorid_unstim_qc_filt.rds
Untracked: data/sces/
Untracked: data/selection/
Untracked: data/simulations/
Untracked: data/variance_components/
Untracked: figures/
Untracked: output/differential_expression/
Untracked: output/differential_expression_cardelino-relax/
Untracked: output/donor_specific/
Untracked: output/nvars_by_category_by_donor.tsv
Untracked: output/nvars_by_category_by_line.tsv
Untracked: output/variance_components/
Untracked: qolg_BIC.pdf
Untracked: references/
Untracked: reports/
Untracked: src/Rmd/DE_pathways_FTv62_callset_clones_pairwise_vs_base.unst_cells.carderelax.Rmd
Untracked: src/Rmd/Rplots.pdf
Untracked: src/Rmd/cell_assignment_cardelino-relax_template.Rmd
Untracked: tree.txt
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
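A status listing like the one above can be regenerated from the command line. The following is a minimal sketch, assuming Git is installed and the script is run inside the cloned repository; the function names `parse_porcelain` and `repo_status` are hypothetical helpers, not part of workflowr. It groups the output of `git status --porcelain --ignored` into ignored and untracked paths.

```python
import subprocess


def parse_porcelain(text):
    """Group `git status --porcelain --ignored` output by status code.

    In the porcelain format each line is a two-character code, a space,
    then the path. "!!" marks ignored paths and "??" untracked paths.
    """
    status = {"ignored": [], "untracked": []}
    for line in text.splitlines():
        code, path = line[:2], line[3:]
        if code == "!!":
            status["ignored"].append(path)
        elif code == "??":
            status["untracked"].append(path)
    return status


def repo_status(repo_dir="."):
    """Run Git in repo_dir (assumed to be a Git work tree) and parse the result."""
    out = subprocess.run(
        ["git", "status", "--porcelain", "--ignored"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    ).stdout
    return parse_porcelain(out)
```

Calling `repo_status()` before building the site gives a quick way to spot untracked scripts or data files that the analysis may depend on.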
These are the previous versions of the R Markdown and HTML files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view them.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | 35c3269 | Davis McCarthy | 2019-10-30 | Updating index to get accurate update time |
html | 35c3269 | Davis McCarthy | 2019-10-30 | Updating index to get accurate update time |
Rmd | 550176f | Davis McCarthy | 2019-10-30 | Updating analysis to reflect accepted ms |
html | 8729e02 | davismcc | 2018-11-09 | Build site. |
Rmd | 218d792 | John Blischak | 2018-09-11 | Fix some links on homepage. |
html | 0540cdb | davismcc | 2018-09-02 | Build site. |
html | f0ed980 | davismcc | 2018-08-31 | Build site. |
html | ca3438f | davismcc | 2018-08-29 | Build site. |
html | e573f2f | davismcc | 2018-08-27 | Build site. |
html | 9ec2a59 | davismcc | 2018-08-26 | Build site. |
Rmd | cae617f | davismcc | 2018-08-26 | Updating simulation analyses |
html | 36acf15 | davismcc | 2018-08-25 | Build site. |
Rmd | 56d90a6 | davismcc | 2018-08-25 | Completing index with descriptions of data availability and new analyses. |
Rmd | d618fe5 | davismcc | 2018-08-25 | Updating analyses |
html | 090c1b9 | davismcc | 2018-08-24 | Build site. |
html | 02a8343 | davismcc | 2018-08-24 | Build site. |
Rmd | 97e062e | davismcc | 2018-08-24 | Updating Rmd’s |
Rmd | 43f15d6 | davismcc | 2018-08-24 | Adding data pre-processing workflow and updating analyses. |
html | d2e8b31 | davismcc | 2018-08-19 | Build site. |
html | 1489d32 | davismcc | 2018-08-17 | Add html files |
Rmd | 6b5f8c7 | davismcc | 2018-08-17 | Updating organisational pages. |
Rmd | 1cbadbd | davismcc | 2018-08-10 | Updating analyses. |
html | 2531565 | davismcc | 2018-08-08 | Tweaking clone prevalences |
Rmd | 7397e00 | davismcc | 2018-08-08 | Updating stylez and tweaking Rmds |
html | 9856275 | davismcc | 2018-08-07 | Build site. |
Rmd | 5fc189d | davismcc | 2018-08-07 | Start workflowr project. |
This project investigates clonality in human dermal fibroblast cell populations in 32 cell lines from distinct donors, using bulk whole-exome sequencing and single-cell RNA-sequencing data.
Key findings:
For a richer overview, see the About page.
A pre-print describing the work in detail is available:
The data pre-processing for this project, starting from the raw data described above, is complicated and computationally expensive, so this repository does not reproduce it in an automated way. However, we provide the source code for the Snakemake pre-processing workflow in this repository. Docker images providing the computing environment and software used are publicly available: one image for command-line bioinformatics tools and one for an R installation with the necessary packages.
If you would like to pre-process the data from raw reads to results as we have, please consult our description of how to run the workflow.
Here we present the reproducible results of our analyses. They were generated by rendering the R Markdown documents into webpages available at the links below.
The results presented in the paper were produced with these analyses.
This is a complicated project, and reproducing all of the results presented, especially from raw data, is highly non-trivial. Nevertheless, we have made all data available so that everything is entirely reproducible.
Single-cell RNA-seq data have been deposited in the ArrayExpress database at EMBL-EBI under accession number E-MTAB-7167. Whole-exome sequencing data are available through the HipSci portal. Processed data and large results files are available from Zenodo with DOI 10.5281/zenodo.1403510.
To set up the project to reproduce our analyses, first clone the source code repository from GitHub. Next, download all of the reference, metadata and results files and add them to the (cloned) project folder with the following structure:
.
├── data
│   ├── canopy
│   │   ├── canopy_results.*.rds
│   ├── cell_assignment
│   │   ├── cardelino_results.*.rds
│   ├── de_analysis_FTv62
│   │   ├── cellcycle_analyses
│   │   │   ├── filt_lenient.all_filt_sites.de_results_unstimulated_cells.cc.rds
│   │   │   ├── filt_lenient.cell_coverage_sites.de_results_unstimulated_cells.cc.rds
│   │   │   ├── filt_strict.all_filt_sites.de_results_unstimulated_cells.cc.rds
│   │   │   └── filt_strict.cell_coverage_sites.de_results_unstimulated_cells.cc.rds
│   │   ├── filt_lenient.all_filt_sites.de_results_unstimulated_cells.rds
│   │   ├── filt_lenient.cell_coverage_sites.de_results_unstimulated_cells.rds
│   │   ├── filt_strict.all_filt_sites.de_results_unstimulated_cells.rds
│   │   └── filt_strict.cell_coverage_sites.de_results_unstimulated_cells.rds
│   ├── donor_info_070818.txt
│   ├── donor_info_core.csv
│   ├── donor_neutrality.tsv
│   ├── exome-point-mutations
│   │   ├── high-vs-low-exomes.v62.ft.alldonors-filt_lenient.all_filt_sites.vep_most_severe_csq.txt
│   │   └── high-vs-low-exomes.v62.ft.filt_lenient-alldonors.txt.gz
│   ├── human_H_v5p2.rdata
│   ├── human_c2_v5p2.rdata
│   ├── human_c6_v5p2.rdata
│   ├── neg-bin-rsquared-petr.csv
│   ├── neutralitytestr-petr.tsv
│   ├── sces
│   │   ├── sce_*.rds
│   ├── selection
│   │   ├── neg-bin-params-fit.csv
│   │   ├── neg-bin-rsquared-fit.csv
│   ├── simulations
│   │   ├── *.filt_lenient.cell_coverage_sites.mult.rds
│   │   ├── *.simulate.rds
│   └── variance_components
│       ├── covar_all.csv
│       ├── donorVar
│       │   ├── *.var_part.var1.csv
│       ├── fit_all_gene_highVar.csv
│       ├── fit_per_gene_highVar.csv
│       ├── gene_info_all.csv
│       └── logcnt_all.csv
├── metadata
│   ├── cell_metadata.csv
│   └── data_processing_metadata.tsv
├── references
│   ├── 1000G_phase1.indels.hg19.sites.vcf.gz
│   ├── GRCh37.p13.genome.ERCC92.fa
│   ├── Homo_sapiens.GRCh37.rel75.cdna.all.ERCC92.fa.gz
│   ├── Mills_and_1000G_gold_standard.indels.hg19.sites.vcf.gz
│   ├── dbsnp_138.hg19.biallelicSNPs.HumanCoreExome12.Top1000ExpressedIpsGenes.Maf0.01.HWE0.0001.HipSci.vcf.gz
│   ├── dbsnp_138.hg19.vcf.gz
│   ├── gencode.v19.annotation_ERCC.gtf
│   ├── hipsci.wec.gtarray.HumanCoreExome.imputed_phased.20170327.genotypes.allchr.fibro_samples_v2_filt_vars_sorted_oa.vcf.gz
│   ├── hipsci.wec.gtarray.HumanCoreExome.imputed_phased.20170327.genotypes.allchr.fibro_samples_v2_filt_vars_sorted_oa.vcf.gz.csi
│   └── knownIndels.intervals
For simplicity, we ignore all the directories and files present in the source code repository (which you should have cloned) to focus just on where you should add the files downloaded from Zenodo. Yes, it’s still complicated, but such is life.
There is a large number of canopy_results.*.rds files: these should be stored in the data/canopy directory. Similarly, all of the cardelino_results.*.rds files should be stored in data/cell_assignment. All of the SingleCellExperiment object files (sce_*.rds) should be stored in data/sces. Simulation results files (*.mult.rds; *.simulate.rds) should be stored in data/simulations. Variance components results should be stored in data/variance_components as shown above.
Differential expression results belong in data/de_analysis_FTv62.
Metadata files belong in metadata. Reference files belong in references.
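The pattern-based placement rules above can be automated. The following is a minimal sketch, assuming the Zenodo files have been downloaded into a single flat directory; that layout, the `RULES` table and the function names are illustrative assumptions, not something the project provides. Files matching no pattern (for example the differential expression, metadata and reference files) are left untouched for manual placement.

```python
import fnmatch
import shutil
from pathlib import Path

# Filename pattern -> destination directory, taken from the rules in the text.
RULES = [
    ("canopy_results.*.rds", "data/canopy"),
    ("cardelino_results.*.rds", "data/cell_assignment"),
    ("sce_*.rds", "data/sces"),
    ("*.mult.rds", "data/simulations"),
    ("*.simulate.rds", "data/simulations"),
]


def destination_for(filename):
    """Return the project-relative directory for a file, or None if no rule matches."""
    for pattern, dest in RULES:
        if fnmatch.fnmatchcase(filename, pattern):
            return dest
    return None


def sort_downloads(download_dir, project_dir, dry_run=True):
    """List (and, unless dry_run, perform) the moves from download_dir into project_dir."""
    moves = []
    for f in sorted(Path(download_dir).iterdir()):
        if not f.is_file():
            continue
        dest = destination_for(f.name)
        if dest is None:
            continue  # no rule for this file: leave it for manual placement
        target = Path(project_dir) / dest / f.name
        moves.append((f, target))
        if not dry_run:
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(f), str(target))
    return moves
```

Running `sort_downloads(download_dir, project_dir)` with the default `dry_run=True` only reports the planned moves, which makes it easy to review them before anything is relocated.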
With the data downloaded and organised as above, you will be able to reproduce the analyses presented in the R Markdown files linked above and, if desired, even run the whole analysis pipeline from raw reads to results following these instructions.
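Before rendering any analysis, it can help to confirm that the layout is in place. The following is a minimal sketch, assuming it is pointed at the root of the cloned project; the `EXPECTED` table is an illustrative subset of the structure shown earlier, and `missing_inputs` is a hypothetical helper rather than part of the project.

```python
from pathlib import Path

# Directory -> glob pattern that should match at least one file there.
# An illustrative subset of the expected layout, not an exhaustive list.
EXPECTED = {
    "data/canopy": "canopy_results.*.rds",
    "data/cell_assignment": "cardelino_results.*.rds",
    "data/sces": "sce_*.rds",
    "data/simulations": "*.simulate.rds",
    "data/de_analysis_FTv62": "*.rds",
    "metadata": "cell_metadata.csv",
    "references": "knownIndels.intervals",
}


def missing_inputs(project_dir):
    """Return "directory/pattern" entries for which no matching file was found."""
    root = Path(project_dir)
    missing = []
    for rel, pattern in EXPECTED.items():
        directory = root / rel
        if not directory.is_dir() or not any(directory.glob(pattern)):
            missing.append(f"{rel}/{pattern}")
    return missing
```

An empty list from `missing_inputs(".")` suggests that the checked inputs are where the analyses expect them; any entries returned point to directories that still need files.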
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.