MappingQC: quality metrics for ribosome profiling

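MappingQC is a tool that easily generates several figures giving an overview of the mapping quality of ribosome profiling data. More specifically, it gives an overview of the P-site offset calculation, the gene distribution, and the metagenic classification. In addition, MappingQC performs a thorough analysis of the triplet periodicity and the linked triplet phase (typical of ribosome profiling) in the canonical transcripts of your data. In particular, it takes into account the link between the phase distribution and the RPF length, the relative sequence position, and the triplet identity.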
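To make the idea of triplet phase concrete, here is a minimal Python sketch. It is not mQC's own code: it assumes each read has already been reduced to a P-site position on a transcript and that the CDS start coordinate is known. The function name `phase_distribution` and the toy coordinates are hypothetical.

```python
from collections import Counter


def phase_distribution(p_site_positions, cds_start):
    """Return the fraction of P-sites in each reading-frame phase (0, 1, 2).

    p_site_positions: transcript coordinates of P-sites (hypothetical input).
    cds_start: transcript coordinate of the first base of the start codon.
    """
    # Phase 0 = first base of a codon, phase 1 = second, phase 2 = third.
    counts = Counter((pos - cds_start) % 3 for pos in p_site_positions)
    total = sum(counts.values())
    if total == 0:
        return {0: 0.0, 1: 0.0, 2: 0.0}
    return {phase: counts.get(phase, 0) / float(total) for phase in (0, 1, 2)}


# Toy data: most P-sites fall on the first base of a codon (phase 0).
toy_positions = [100, 103, 106, 109, 112, 113, 117, 118]
print(phase_distribution(toy_positions, cds_start=100))
```

A strong enrichment of one phase is what triplet periodicity looks like in ribosome profiling data. According to the description above, mQC relates this phase distribution to the RPF length, the relative sequence position, and the triplet identity.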

There is a Galaxy version (link) and a local (stand-alone) version. This post briefly summarizes how to use the local version.

Installation

Dependencies (from GitHub)

MappingQC relies on following Perl modules which have to be installed on your system:

DBI
Getopt::Long
Parallel::ForkManager
Cwd
Data::Dumper (for debugging purposes)

Furthermore, mappingQC relies on following Python2 modules which have to be installed on your system:

getopt
defaultdict (collections)
sqlite3
pandas
numpy
matplotlib (including pyplot, colors, cm, gridspec, ticker and mplot3d)
seaborn
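As a quick sanity check after installation, the Python modules listed above can be tested for importability. This is a minimal sketch rather than part of mQC; the helper name `check_modules` is hypothetical, and the script only reports whether each top-level module can be imported in the active environment.

```python
import importlib

# Top-level module names taken from the dependency list above.
REQUIRED_MODULES = [
    "getopt",
    "collections",
    "sqlite3",
    "pandas",
    "numpy",
    "matplotlib",
    "seaborn",
]


def check_modules(names):
    """Try to import each module and return a dict {name: True/False}."""
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
            status[name] = True
        except ImportError:
            status[name] = False
    return status


if __name__ == "__main__":
    for name, ok in sorted(check_modules(REQUIRED_MODULES).items()):
        print("%-12s %s" % (name, "OK" if ok else "MISSING"))
```

Run it with the interpreter of the environment in which mQC is installed, so that the result reflects what mQC.pl will actually see.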

GitHub

#bioconda (link); here it is installed into a virtual environment named mappingqc
conda create -n mappingqc -c bioconda -y mqc python=2.7
conda activate mappingqc

> map2bed.pl

$ map2bed.pl

map2bed converts ART's map files to a BED file

USAGE: /usr/local/bin/map2bed.pl out_bed_file.bed in_map_file_1 [ in_map_file_2 ...]

> mQC.pl

$ mQC.pl

Working directory : /Users/kazuma/Downloads

The following tmpfolder is used : /Users/kazuma/Downloads/tmp

MappingQC (Stand-alone version)

MappingQC is a tool to easily generate some figures which give a nice overview of the quality of the mapping of ribosome profiling data. More specific, it gives an overview of the P site offset calculation, the gene distribution and the metagenic classification. Furthermore, MappingQC does a thorough analysis of the triplet periodicity and the linked triplet phase (typical for ribosome profiling) in the canonical transcript of your data. Especially, the link between the phase distribution and the RPF length, the relative sequence position and the triplet identity are taken into account.

Input parameters:

--help this helpful screen

--work_dir working directory to run the scripts in (default: current working directory)

--experiment_name customly chosen experiment name for the mappingQC run (mandatory)

--samfile path to the SAM/BAM file that comes out of the mapping script of PROTEOFORMER (mandatory)

--cores the amount of cores to run the script on (integer, default: 5)

--species the studied species (mandatory)

--ens_v the version of the Ensembl database you want to use

--tmp temporary folder for storing temporary files of mappingQC (default: work_dir/tmp)

--unique whether to use only the unique alignments.

Possible options: Y, N (default Y)

--mapper the mapper you used to generate the SAM file (STAR, TopHat2, HiSat2) (default: STAR)

--maxmultimap the maximum amount of multimapped positions used for filtering the reads (default: 16)

--ens_db path to the Ensembl SQLite database with annotation info. If you want mappingQC to download the right Ensembl database automatically for you, put in 'get' for this parameter (mandatory)

--offset the offset determination method.

Possible options:

- plastid: calculate the offsets with Plastid (Dunn et al. 2016)

- standard: use the standard offsets from the paper of Ingolia et al. (2012) (default option)

- from_file: use offsets from an input file

--plastid_bam the mapping bam file for Plastid offset generation (default: convert)

--min_length_plastid the minimum RPF length for Plastid offset generation (default 22)

--max_length_plastid the maximum RPF length for Plastid offset generation (default 34)

--offset_file the offsets input file

--min_length_gd minimum RPF length used for gene distributions and metagenic classification (default: 26).

--max_length_gd maximum RPF length used for gene distributions and metagenic classification (default: 34).

--outfolder the folder to store the output files (default: work_dir/mQC_output)

--tool_dir folder with necessary additional mappingQC tools. More information below in the dependencies section. (default: search for the default tool directory location in the active conda environment)

--plotrpftool the module that will be used for plotting the RPF-phase figure

Possible options:

- grouped2D: use Seaborn to plot a grouped 2D bar chart (default)

- pyplot3D: use mplot3d to plot a 3D bar chart. This tool can suffer sometimes from Escher effects, as it tries to plot a 3D plot with the 2D software of pyplot and matplotlib.

- mayavi: use the mayavi package to plot a 3D bar chart. This tool only works on local systems with graphical cards.

--outhtml custom name for the output HTML file (default: work_dir/mQC_experiment_name.html)

--outzip custom name for output ZIP file (default: work_dir/mQC_experiment_name.zip)

ERROR: do not forget the experiment name!

Usage

Specify the SAM file and the Ensembl database.

mQC.pl --experiment_name yourexperimentname --samfile yoursamfile.sam --cores 20 --species human --ens_v 86 --ens_db ENS_hsa_86.db --unique N --offset plastid --plastid_bam yourbamfile.bam --tool_dir mqc_tools
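When running several samples, the command above can also be assembled programmatically. The following is a minimal sketch, not an official wrapper: `build_mqc_command` and `run_mqc` are hypothetical helper names, and only flags that appear in the help text are used. The four required arguments correspond to the options marked "mandatory" in the help output.

```python
import subprocess


def build_mqc_command(experiment_name, samfile, species, ens_db,
                      ens_v=None, cores=None, unique=None, offset=None,
                      plastid_bam=None, tool_dir=None):
    """Assemble an mQC.pl argument list from flags listed in the help text."""
    cmd = ["mQC.pl",
           "--experiment_name", experiment_name,
           "--samfile", samfile,
           "--species", species,
           "--ens_db", ens_db]
    optional = [("--ens_v", ens_v), ("--cores", cores), ("--unique", unique),
                ("--offset", offset), ("--plastid_bam", plastid_bam),
                ("--tool_dir", tool_dir)]
    # Only pass options that were set; mQC.pl falls back to its own defaults.
    for flag, value in optional:
        if value is not None:
            cmd.extend([flag, str(value)])
    return cmd


def run_mqc(cmd):
    """Run the assembled command; raises CalledProcessError on a non-zero exit."""
    subprocess.check_call(cmd)


cmd = build_mqc_command(
    experiment_name="yourexperimentname",
    samfile="yoursamfile.sam",
    species="human",
    ens_db="ENS_hsa_86.db",
    ens_v=86,
    cores=20,
    unique="N",
    offset="plastid",
    plastid_bam="yourbamfile.bam",
    tool_dir="mqc_tools",
)
print(" ".join(cmd))
# run_mqc(cmd)  # uncomment once mQC.pl is on PATH and the input files exist
```

Passing the arguments as a list avoids shell quoting problems with file paths that contain spaces.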

Reference

https://github.com/Biobix/mQC