2022/01/23 Added installation instructions
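RNA-seq provides characterization of a cell's entire transcriptome. Assessing sequencing performance and library quality is essential for interpreting RNA-seq data, yet few tools addressed this problem (as of when the paper was written). The paper introduces RNA-SeQC, a program that provides key measures of data quality. These metrics include yield, alignment and duplication rates, GC bias, rRNA content, regions of alignment (exon, intron, intragenic), continuity of coverage, 3'/5' bias, and the number of detectable transcripts, among others. The software provides multi-sample evaluation of library construction protocols, input materials, and other experimental parameters. Its modularity enables pipeline integration and routine monitoring of key data-quality measures such as the number of alignable reads, the duplication rate, and rRNA contamination. RNA-SeQC lets researchers make informed decisions about whether to include a sample in downstream analysis. In summary, RNA-SeQC provides quality-control measures that are essential for experiment design, process optimization, and downstream computational analysis.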
Homepage
https://software.broadinstitute.org/cancer/cga/rna-seqc
Installation
The latest stable build of RNA-SeQC is available on the GitHub Releases page, and contains static binaries for Linux and OSX.
Dependencies
- openjdk 7
Main repository: GitHub (https://github.com/broadinstitute/rnaseqc)
#bioconda
mamba create -n RNA-SeQC -y
conda activate RNA-SeQC
mamba install -c bioconda -y rna-seqc
$ rnaseqc
rnaseqc [gtf] [bam] [output] {OPTIONS}
RNASeQC 2.3.4
OPTIONS:
-h, --help Display this message and quit
--version Display the version and quit
gtf The input GTF file containing features
to check the bam against
bam The input SAM/BAM file containing reads
to process
output Output directory
-s[sample], --sample=[sample] The name of the current sample. Default:
The bam's filename
--bed=[BEDFILE] Optional input BED file containing
non-overlapping exons used for fragment
size calculations
--fasta=[fasta] Optional input FASTA/FASTQ file
containing the reference sequence used
for parsing CRAM files
--chimeric-distance=[DISTANCE] Set the maximum accepted distance
between read mates. Mates beyond this
distance will be counted as chimeric
pairs. Default: 2000000 [bp]
--fragment-samples=[SAMPLES] Set the number of samples to take when
computing fragment sizes. Requires the
--bed argument. Default: 1000000
-q[QUALITY],
--mapping-quality=[QUALITY] Set the lower bound on read quality for
exon coverage counting. Reads below this
number are excluded from coverage
metrics. Default: 255
--base-mismatch=[MISMATCHES] Set the maximum number of allowed
mismatches between a read and the
reference sequence. Reads with more than
this number of mismatches are excluded
from coverage metrics. Default: 6
--offset=[OFFSET] Set the offset into the gene for the 3'
and 5' windows in bias calculation. A
positive value shifts the 3' and 5'
windows towards eachother, while a
negative value shifts them apart.
Default: 150 [bp]
--window-size=[SIZE] Set the size of the 3' and 5' windows in
bias calculation. Default: 100 [bp]
--gene-length=[LENGTH] Set the minimum size of a gene for bias
calculation. Genes below this size are
ignored in the calculation. Default: 600
[bp]
--legacy Use legacy counting rules. Gene and exon
counts match output of RNA-SeQC 1.1.9
--stranded=[stranded] Use strand-specific metrics. Only
features on the same strand of a read
will be considered. Allowed values are
'RF', 'rf', 'FR', and 'fr'
-v, --verbose Give some feedback about what's going
on. Supply this argument twice for
progress updates while parsing the bam
-t[TAG...], --tag=[TAG...] Filter out reads with the specified tag.
--chimeric-tag=[TAG] Reads maked with the specified tag will
be labeled as Chimeric. Defaults to 'mC'
for STAR
--exclude-chimeric Exclude chimeric reads from the read
counts
-u, --unpaired Treat all reads as unpaired, ignoring
filters which require properly paired
reads
--rpkm Output gene RPKM values instead of TPMs
--coverage If this flag is provided, coverage
statistics for each transcript will be
written to a table. Otherwise, only
summary coverage statistics are
generated and added to the metrics table
--coverage-mask=[SIZE] Sets how many bases at both ends of a
transcript are masked out when computing
per-base exon coverage. Default: 500bp
-d[threshold],
--detection-threshold=[threshold] Number of counts on a gene to consider
the gene 'detected'. Additionally, genes
below this limit are excluded from 3'
bias computation. Default: 5 reads
"--" can be used to terminate flag options and force all following
arguments to be treated as positional options
Argument validation error: No GTF file provided
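The help text above fixes the positional order (gtf, bam, output) and lists the main switches. As a minimal sketch, the hypothetical wrapper below (not part of RNA-SeQC) assembles a call in that order with `--sample`, `--coverage` and, optionally, `--stranded`. The strandedness code (RF, rf, FR or fr) is an assumption you must match to your own library protocol. Setting `RNASEQC=echo` prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical wrapper (not shipped with RNA-SeQC) that builds an rnaseqc call
# in the documented order: rnaseqc [gtf] [bam] [output] {OPTIONS}
# Set RNASEQC=echo to print the command instead of running it.
RNASEQC="${RNASEQC:-rnaseqc}"

run_rnaseqc() {
  local gtf="$1" bam="$2" outdir="$3" strand="${4:-}"
  local sample
  # Sample name = BAM file name without its directory and .bam suffix
  sample="$(basename "$bam" .bam)"
  mkdir -p "$outdir"
  if [ -n "$strand" ]; then
    # Strandedness must be one of RF, rf, FR, fr (see --stranded in the help)
    set -- "$gtf" "$bam" "$outdir" "--sample=$sample" --coverage "--stranded=$strand"
  else
    set -- "$gtf" "$bam" "$outdir" "--sample=$sample" --coverage
  fi
  "$RNASEQC" "$@"
}
```

Example with hypothetical file names: `run_rnaseqc genes.gtf sample1.bam qc_out rf`. Building the argument list with `set --` keeps file names containing spaces intact without relying on bash arrays.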
Test run
The arguments are given in the order rnaseqc [gtf] [bam] [output] {OPTIONS}.
git clone --recursive https://github.com/broadinstitute/rnaseqc.git
cd rnaseqc/
rnaseqc test_data/downsampled.gtf test_data/downsampled.bam --bed test_data/downsampled.bed --coverage .
- <gtf> The input GTF file containing features.
- <bam> The input SAM/BAM file containing reads to process.
- <output> Output directory.
The larger test files are not downloaded by a plain git clone. Either download them directly from GitHub, or install Git LFS before cloning and running the test.
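When Git LFS has not fetched a file, the clone contains a small text pointer whose first line is `version https://git-lfs.github.com/spec/v1` instead of the real data. A minimal sketch for spotting such files, assuming the large test files live under `test_data/` (both function names are hypothetical helpers, not part of RNA-SeQC or Git LFS):

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the file is a Git LFS pointer, i.e. its
# first line is the LFS spec version line rather than real file content.
is_lfs_pointer() {
  head -n 1 "$1" 2>/dev/null | grep -q '^version https://git-lfs.github.com/spec/v1'
}

# Hypothetical helper: print every regular file in a directory (default:
# test_data, run inside the rnaseqc clone) that is still an LFS pointer.
list_lfs_pointers() {
  for f in "${1:-test_data}"/*; do
    [ -f "$f" ] && is_lfs_pointer "$f" && echo "$f"
  done
  return 0
}
```

If `list_lfs_pointers` prints anything, running `git lfs install` and then `git lfs pull` inside the repository should replace the pointers with the actual files.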
Output
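As a hedged sketch of the routine monitoring described in the introduction, the helpers below assume the summary metrics are written as a tab-separated `<sample>.metrics.tsv` of metric-name/value pairs; check the actual file names and metric labels in your output directory. The metric names used in the example (`Mapping Rate`, `rRNA Rate`) and the 0.1 cutoff are illustrative assumptions, and both function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one metric from a two-column
# (name<TAB>value) metrics file; exits non-zero if the metric is absent.
get_metric() {
  awk -F '\t' -v key="$2" '$1 == key { print $2; found = 1 } END { exit !found }' "$1"
}

# Hypothetical QC gate: returns 0 if metric <= max, 1 if above, 2 if missing.
metric_at_most() {
  local v
  v="$(get_metric "$1" "$2")" || return 2
  awk -v v="$v" -v max="$3" 'BEGIN { exit !(v + 0 <= max + 0) }'
}
```

Example with a hypothetical sample `S1`: `metric_at_most qc_out/S1.metrics.tsv "rRNA Rate" 0.1 || echo "S1: check rRNA contamination"`. Keeping the gate as an exit status makes it easy to drop into a pipeline that decides which samples go downstream.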
Citation
RNA-SeQC: RNA-seq metrics for quality control and process optimization
David S. DeLuca,* Joshua Z. Levin, Andrey Sivachenko, Timothy Fennell, Marc-Danie Nazaire, Chris Williams, Michael Reich, Wendy Winckler, Gad Getz
Bioinformatics. 2012 Jun 1; 28(11): 1530–1532
Related