2018-02-27

RNA seqのクオリティコントロールツール RSeQC

2019 12/2 インストール追記

2020 12/9 誤字修正, help追加

2022 インストール手順追記

RSeQCはクオリティ、GCバイアス、PCRバイアス、ヌクレオチド組成バイアス、シーケンスのデプス、strandの特異性、カバレッジ均一性およびゲノムのfeature上のカバレッジ分布など、RNA-seq実験を総合的に評価するパッケージ。SAMとBAMを入力として利用できる。可視化にはRを利用している。

マニュアル

http://rseqc.sourceforge.net

インストール

依存

gcc
python2.7
numpy
R

#bioconda(link)
mamba create -n RSeQC -y
conda activate RSeQC
mamba install -c bioconda -y rseqc -y

#pip
pip install RSeQC

> bam2fq.py -h

$ bam2fq.py -h

Usage: bam2fq.py [options]

Description: Convert alignments in BAM or SAM format into fastq format.

Options:

--version show program's version number and exit

-h, --help show this help message and exit

-i INPUT_FILE, --input-file=INPUT_FILE

Alignment file in BAM or SAM format.

-o OUTPUT_PREFIX, --out-prefix=OUTPUT_PREFIX

Prefix of output fastq files(s).

-s, --single-end Specificy '-s' or '--single-end' for single-end

sequencing.

-c, --compress Specificy '-c' or '--compress' to compress output

fastq file(s) using 'gzip' command.

> bam2wig.py -h

$ bam2wig.py -h

Usage: bam2wig.py [options]

Convert BAM file into wig file. BAM file must be sorted and indexed using SAMtools.

Note: SAM format file is not supported.

Options:

--version show program's version number and exit

-h, --help show this help message and exit

-i INPUT_FILE, --input-file=INPUT_FILE

Alignment file in BAM format. BAM file must be sorted

and indexed using samTools. .bam and .bai files should

be placed in the same directory.

-s CHROMSIZE, --chromSize=CHROMSIZE

Chromosome size file. Tab or space separated text file

with 2 columns: first column is chromosome name/ID,

second column is chromosome size. Chromosome name

(such as "chr1") should be consistent between this

file and the BAM file.

-o OUTPUT_PREFIX, --out-prefix=OUTPUT_PREFIX

Prefix of output wiggle files(s). One wiggle file will

be generated for non strand-specific data, two wiggle

files ("Prefix_Forward.wig" and "Prefix_Reverse.wig")

will be generated for strand-specific RNA-seq data.

-t TOTAL_WIGSUM, --wigsum=TOTAL_WIGSUM

Specified wigsum. Eg: 1,000,000,000 equals to coverage

of 10 million 100nt reads. Ignore this option to

disable normalization

-u, --skip-multi-hits

Skip non-unique hit reads.

-d STRAND_RULE, --strand=STRAND_RULE

How read(s) were stranded during sequencing. For

example: --strand='1++,1--,2+-,2-+' means that this is

a pair-end, strand-specific RNA-seq data, and the

strand rule is: read1 mapped to '+' => parental gene

on '+'; read1 mapped to '-' => parental gene on '-';

read2 mapped to '+' => parental gene on '-'; read2

mapped to '-' => parental gene on '+'. If you are not

sure about the strand rule, run 'infer_experiment.py'

default=none (Not a strand specific RNA-seq data).

-q MAP_QUAL, --mapq=MAP_QUAL

Minimum mapping quality to determine "uniquely

mapped". default=30

> bam_stat.py -h

$ bam_stat.py -h

Usage: bam_stat.py [options]

Summarizing mapping statistics of a BAM or SAM file.

Options:

--version show program's version number and exit

-h, --help show this help message and exit

-i INPUT_FILE, --input-file=INPUT_FILE

Alignment file in BAM or SAM format.

-q MAP_QUAL, --mapq=MAP_QUAL

Minimum mapping quality (phred scaled) to determine

"uniquely mapped" reads. default=30

> mismatch_profile.py -h

$ mismatch_profile.py -h

Usage: mismatch_profile.py [options]

Calculate the distribution of mismatches across reads. Note that the "MD" tag must exist in BAM file.

Options:

--version show program's version number and exit

-h, --help show this help message and exit

-i INPUT_BAM, --input=INPUT_BAM

Input BAM file. [required]

-l READ_ALIGNMENT_LENGTH, --read-align-length=READ_ALIGNMENT_LENGTH

Alignment length of read. It is usually set to the

orignial read length. For example, all these cigar

strings ("101M", "68M140N33M", "53M1D48M") suggest the

read alignment length is 101. [required]

-o OUTPUT_PREFIX, --out-prefix=OUTPUT_PREFIX

Prefix of output files(s). [required]

-n READ_NUMBER, --read-num=READ_NUMBER

Number of aligned reads with mismatches used to

calculate the mismatch profile. default=1000000

-q MAP_QUAL, --mapq=MAP_QUAL

Minimum mapping quality. default=30

> clipping_profile.py -h

$ clipping_profile.py -h

Usage: clipping_profile.py [options]

This program is used estimate clipping profile of RNA-seq reads from BAM or SAM file

Note that to use this funciton, CIGAR strings within SAM/BAM file should have 'S' operation

(This means your reads mapper should support clipped mapping).

Options:

--version show program's version number and exit

-h, --help show this help message and exit

-i INPUT_FILE, --input-file=INPUT_FILE

Alignment file in BAM or SAM format.

-o OUTPUT_PREFIX, --out-prefix=OUTPUT_PREFIX

Prefix of output files(s).

-q MAP_QUAL, --mapq=MAP_QUAL

Minimum mapping quality (phred scaled) for an

alignment to be considered as "uniquely mapped".

default=30

-s LAYOUT, --sequencing=LAYOUT

Sequencing layout. "SE"(single-end) or "PE"(pair-end).

> deletion_profile.py -h

$ deletion_profile.py -h

Usage: deletion_profile.py [options]

Calculate the distributions of deleted nucleotides across reads.

Options:

--version show program's version number and exit

-h, --help show this help message and exit

-i INPUT_BAM, --input=INPUT_BAM

Input BAM file. [required]

-l READ_ALIGNMENT_LENGTH, --read-align-length=READ_ALIGNMENT_LENGTH

Alignment length of read. It is usually set to the

orignial read length. For example, all these cigar

strings ("101M", "68M140N33M", "53M1D48M") suggest the

read alignment length is 101. [required]

-o OUTPUT_PREFIX, --out-prefix=OUTPUT_PREFIX

Prefix of output files(s). [required]

-n READ_NUMBER, --read-num=READ_NUMBER

Number of aligned reads with deletions used to

calculate the deletion profile. default=1000000

-q MAP_QUAL, --mapq=MAP_QUAL

Minimum mapping quality. default=30

> insertion_profile.py -h

$ insertion_profile.py -h

Usage: insertion_profile.py [options]

Calculate the distributions of inserted nucleotides across reads

Note CIGAR strings within SAM/BAM file should have 'I' operation

Options:

--version show program's version number and exit

-h, --help show this help message and exit

-i INPUT_FILE, --input-file=INPUT_FILE

Alignment file in BAM or SAM format.

-o OUTPUT_PREFIX, --out-prefix=OUTPUT_PREFIX

Prefix of output files(s).

-q MAP_QUAL, --mapq=MAP_QUAL

Minimum mapping quality (phred scaled) for an

alignment to be considered as "uniquely mapped".

default=30

-s LAYOUT, --sequencing=LAYOUT

Sequencing layout. "SE"(single-end) or "PE"(pair-end).

> divide_bam.py -h

$ divide_bam.py -h

Usage: divide_bam.py [options]

Equally divide BAM file (m alignments) into n parts. Each part contains roughly m/n alignments

that are randomly sampled from total alignments.

Options:

--version show program's version number and exit

-h, --help show this help message and exit

-i INPUT_FILE, --input-file=INPUT_FILE

Alignment file in BAM format. BAM file should be

sorted and indexed.

-n SUBSET_NUM, --subset-num=SUBSET_NUM

Number of small BAM files

-o OUTPUT_PREFIX, --out-prefix=OUTPUT_PREFIX

Prefix of output BAM files. Output "Prefix_num.bam".

-s, --skip-unmap Skip unmapped reads.

> geneBody_coverage.py -h

$ geneBody_coverage.py -h

Usage: geneBody_coverage.py [options]

Calculate the RNA-seq reads coverage over gene body.

Note:

1) Only input sorted and indexed BAM file(s). SAM format is not supported.

2) Genes/transcripts with mRNA length < 100 will be skipped (Number specified to "-l" cannot be < 100).

Options:

--version show program's version number and exit

-h, --help show this help message and exit

-i INPUT_FILES, --input=INPUT_FILES

Input file(s) in BAM format. "-i" takes these input:

1) a single BAM file. 2) "," separated BAM files. 3)

directory containing one or more bam files. 4) plain

text file containing the path of one or more bam file

(Each row is a BAM file path). All BAM files should be

sorted and indexed using samtools.

-r REF_GENE_MODEL, --refgene=REF_GENE_MODEL

Reference gene model in bed format. [required]

-l MIN_MRNA_LENGTH, --minimum_length=MIN_MRNA_LENGTH

Minimum mRNA length (bp). mRNA smaller than

"min_mRNA_length" will be skipped. default=100

-f OUTPUT_FORMAT, --format=OUTPUT_FORMAT

Output file format, 'pdf', 'png' or 'jpeg'.

default=pdf

-o OUTPUT_PREFIX, --out-prefix=OUTPUT_PREFIX

Prefix of output files(s). [required]

> inner_distance.py -h

$ inner_distance.py -h

Usage: inner_distance.py [options]

Calculate the inner distance (insert size) of RNA-seq fragments.

RNA fragment

_________________||_________________

| |

||||||||||------------------||||||||||

read_1 insert_size read_2

fragment size = read_1 + insert_size + read_2

Options:

--version show program's version number and exit

-h, --help show this help message and exit

-i INPUT_FILE, --input-file=INPUT_FILE

Alignment file in BAM or SAM format.

-o OUTPUT_PREFIX, --out-prefix=OUTPUT_PREFIX

Prefix of output files(s)

-r REF_GENE, --refgene=REF_GENE

Reference gene model in BED format.

-k SAMPLESIZE, --sample-size=SAMPLESIZE

Number of read-pairs used to estimate inner distance.

default=1000000

-l LOWER_BOUND_SIZE, --lower-bound=LOWER_BOUND_SIZE

Lower bound of inner distance (bp). This option is

used for ploting histograme. default=-250

-u UPPER_BOUND_SIZE, --upper-bound=UPPER_BOUND_SIZE

Upper bound of inner distance (bp). This option is

used for plotting histogram. default=250

-s STEP_SIZE, --step=STEP_SIZE

Step size (bp) of histograme. This option is used for

plotting histogram. default=5

-q MAP_QUAL, --mapq=MAP_QUAL

Minimum mapping quality (phred scaled) for an

alignment to be called "uniquely mapped". default=30

> junction_annotation.py -h

$ junction_annotation.py -h

Usage: junction_annotation.py [options]

Annotate splicing reads against gene model in two levels: reads level and juncion level.

Note:

1) A read, especially long read, can be spliced 2 or more times

2) Multiple splicing reads spanning the same intron can be consolidated into one splicing junction.

Options:

--version show program's version number and exit

-h, --help show this help message and exit

-i INPUT_FILE, --input-file=INPUT_FILE

Alignment file in BAM or SAM format.

-r REF_GENE_MODEL, --refgene=REF_GENE_MODEL

Reference gene model in bed format. This file is

better to be a pooled gene model as it will be used to

annotate splicing junctions [required]

-o OUTPUT_PREFIX, --out-prefix=OUTPUT_PREFIX

Prefix of output files(s). [required]

-m MIN_INTRON, --min-intron=MIN_INTRON

Minimum intron length (bp). default=50 [optional]

-q MAP_QUAL, --mapq=MAP_QUAL

Minimum mapping quality (phred scaled) for an

alignment to be considered as "uniquely mapped".

default=30

> junction_saturation.py -h

$ junction_saturation.py -h

Usage: junction_saturation.py [options]

Check if sequencing depth is saturated or not, based on the idea that when sequencing depth is

approaching saturation, less NEW junctions will be detected.

See http://rseqc.sourceforge.net/ for details.

Options:

--version show program's version number and exit

-h, --help show this help message and exit

-i INPUT_FILE, --input-file=INPUT_FILE

Alignment file in BAM or SAM format.[required]

-o OUTPUT_PREFIX, --out-prefix=OUTPUT_PREFIX

Prefix of output files(s). [required]

-r REFGENE_BED, --refgene=REFGENE_BED

Reference gene model in bed fomat. This gene model is

used to determine known splicing junctions. [required]

-l PERCENTILE_LOW_BOUND, --percentile-floor=PERCENTILE_LOW_BOUND

Sampling starts from this percentile. A integer

between 0 and 100. default=5

-u PERCENTILE_UP_BOUND, --percentile-ceiling=PERCENTILE_UP_BOUND

Sampling ends at this percentile. A integer between 0

and 100. default=100

-s PERCENTILE_STEP, --percentile-step=PERCENTILE_STEP

Sampling frequency. Smaller value means more sampling

times. A integer between 0 and 100. default=5

-m MINIMUM_INTRON_SIZE, --min-intron=MINIMUM_INTRON_SIZE

Minimum intron size (bp). default=50

-v MINIMUM_SPLICE_READ, --min-coverage=MINIMUM_SPLICE_READ

Minimum number of supportting reads to call a

junction. default=1

-q MAP_QUAL, --mapq=MAP_QUAL

Minimum mapping quality (phred scaled) for an

alignment to be called "uniquely mapped". default=30

> read_distribution.py -h

$ read_distribution.py -h

Usage: read_distribution.py [options]

Check reads distribution over exon, intron, UTR, intergenic ... etc

The following reads will be skipped:

qc_failed

PCR duplicate

Unmapped

Non-primary (or secondary)

Options:

--version show program's version number and exit

-h, --help show this help message and exit

-i INPUT_FILE, --input-file=INPUT_FILE

Alignment file in BAM or SAM format.

-r REF_GENE_MODEL, --refgene=REF_GENE_MODEL

Reference gene model in bed format.

> read_duplication.py -h

$ read_duplication.py -h

Usage: read_duplication.py [options]

Calculte reads' duplication rate.

Sequence-based: Reads with identical sequence are considered as "duplicate reads".

Mapping-based: Reads mapped to the exact same location are considered as "duplicate reads".

Options:

--version show program's version number and exit

-h, --help show this help message and exit

-i INPUT_FILE, --input-file=INPUT_FILE

Alignment file in BAM or SAM format.

-o OUTPUT_PREFIX, --out-prefix=OUTPUT_PREFIX

Prefix of output files(s).

-u UPPER_LIMIT, --up-limit=UPPER_LIMIT

Upper limit of reads' occurrence. Only used for

plotting, default=500 (times)

-q MAP_QUAL, --mapq=MAP_QUAL

Minimum mapping quality (phred scaled) for an

alignment to be considered as "uniquely mapped".

default=30

> read_GC.py -h

$ read_GC.py -h

Usage: read_GC.py [options]

-------------------------------------------------------------------------------------------------

Calculate distribution of reads' GC content

-------------------------------------------------------------------------------------------------

Options:

--version show program's version number and exit

-h, --help show this help message and exit

-i INPUT_FILE, --input-file=INPUT_FILE

Alignment file in BAM or SAM format.

-o OUTPUT_PREFIX, --out-prefix=OUTPUT_PREFIX

Prefix of output files(s).

-q MAP_QUAL, --mapq=MAP_QUAL

Minimum mapping quality (phred scaled) for an

alignment to be called "uniquely mapped". default=30

> read_NVC.py -h

$ read_NVC.py -h

Usage: read_NVC.py [options]

-------------------------------------------------------------------------------------------------

For each nucleotide position of read (5'->3'), check the nucleotide frequency. The generated R script will

gives NVC (Nucleotide Versus Cycle) plot.

-------------------------------------------------------------------------------------------------

Options:

--version show program's version number and exit

-h, --help show this help message and exit

-i INPUT_FILE, --input-file=INPUT_FILE

Input file in BAM or SAM format.[required]

-o OUTPUT_PREFIX, --out-prefix=OUTPUT_PREFIX

Prefix of output files(s). [required]

-x, --nx Flag option. Presense of this flag tells program to

include N,X in output NVC plot [required]

-q MAP_QUAL, --mapq=MAP_QUAL

Minimum mapping quality (phred scaled) for an

alignment to be called "uniquely mapped". default=30

> read_quality.py -h

$ read_quality.py -h

Usage: read_quality.py [options]

-------------------------------------------------------------------------------------------------

Calculating Phred Quality Score for each position on read. Note that each read should have

the fixed (same) length

-------------------------------------------------------------------------------------------------

Options:

--version show program's version number and exit

-h, --help show this help message and exit

-i INPUT_FILE, --input-file=INPUT_FILE

Alignment file in BAM or SAM format. [required]

-o OUTPUT_PREFIX, --out-prefix=OUTPUT_PREFIX

Prefix of output files(s). [required]

-r REDUCE_FOLD, --reduce=REDUCE_FOLD

To avoid making huge vector in R, nucleotide with

particular phred score less frequent than this number

will be ignored. Increase this number save more memory

while reduce precision. Set to 1 achieves maximum

precision (i.e. every nucleotide will be considered).

This option only applies to the 'boxplot'. default=1

-q MAP_QUAL, --mapq=MAP_QUAL

Minimum mapping quality (phred scaled) for an

alignment to be called "uniquely mapped". default=30

> RNA_fragment_size.py -h

$ RNA_fragment_size.py -h

Usage: RNA_fragment_size.py [options]

calculate fragment size for each gene/transcript. For each transcript/gene, it Will report:

1) # of fragment that was used.

2) mean of fragment size

3) median of fragment size

4) stdev of fragment size

Options:

--version show program's version number and exit

-h, --help show this help message and exit

-i INPUT_FILE, --input=INPUT_FILE

Input BAM file

-r REFGENE_BED, --refgene=REFGENE_BED

Reference gene model in BED format. Must be strandard

12-column BED file. [required]

-q MAP_QUAL, --mapq=MAP_QUAL

Minimum mapping quality (phred scaled) for an

alignment to be called "uniquely mapped". default=30

-n FRAGMENT_NUM, --frag-num=FRAGMENT_NUM

Minimum number of fragment. default=3

>split_paired_bam.py -h

$ split_paired_bam.py -h

Usage: split_paired_bam.py [options]

-------------------------------------------------------------------------------------------------

Split bam file (pair-end) into 2 single-end bam file

-------------------------------------------------------------------------------------------------

Options:

--version show program's version number and exit

-h, --help show this help message and exit

-i INPUT_FILE, --input-file=INPUT_FILE

Alignment file in BAM or SAM format. BAM file should

be sorted and indexed

-o OUTPUT_PREFIX, --out-prefix=OUTPUT_PREFIX

Prefix of output BAM files. "prefix.R1.bam" file

contains the 1st read, "prefix.R2.bam" file contains

the 2nd read

>tin.py

$ tin.py

Usage: tin.py [options]

-------------------------------------------------------------------------------------------------

This program calculates transcript integrity number (TIN) for each transcript (or gene) in

BED file. TIN is conceptually similar to RIN (RNA integrity number) but provides transcript

level measurement of RNA quality and is more sensitive to measure low quality RNA samples:

1) TIN score of a transcript is used to measure the RNA integrity of the transcript.

2) Median TIN score across all transcripts can be used to measure RNA integrity of that

"RNA sample".

3) TIN ranges from 0 (the worst) to 100 (the best). TIN = 60 means: 60% of the transcript

has been covered if the reads coverage were uniform.

4) TIN will be assigned to 0 if the transcript has no coverage or covered reads is fewer than

cutoff.

-------------------------------------------------------------------------------------------------

Options:

--version show program's version number and exit

-h, --help show this help message and exit

-i INPUT_FILES, --input=INPUT_FILES

Input BAM file(s). "-i" takes these input: 1) a single

BAM file. 2) "," separated BAM files (no spaces

allowed). 3) directory containing one or more bam

files. 4) plain text file containing the path of one

or more bam files (Each row is a BAM file path). All

BAM files should be sorted and indexed using samtools.

[required]

-r REF_GENE_MODEL, --refgene=REF_GENE_MODEL

Reference gene model in BED format. Must be strandard

12-column BED file. [required]

-c MINIMUM_COVERAGE, --minCov=MINIMUM_COVERAGE

Minimum number of read mapped to a transcript.

default=10

-n SAMPLE_SIZE, --sample-size=SAMPLE_SIZE

Number of equal-spaced nucleotide positions picked

from mRNA. Note: if this number is larger than the

length of mRNA (L), it will be halved until it's

smaller than L. default=100

-s, --subtract-background

Subtract background noise (estimated from intronic

reads). Only use this option if there are substantial

intronic reads.

ラン

マニュアルからテストのbamやbed,gtfなどをダウンロードできる。ここではこのデータを使って動作を確認する。

ペアエンドのbamからfastqに変換。

bam2fq.py -i test_PairedEnd_StrandSpecific_hg19.sam -o bam2fq_out1

-i INPUT_FILE
-o OUTPUT_PREFIX
-s single-end for single-end sequencing.

シングルエンドは-sをつける。

bamからwiggleフォーマット（UCSC解説）に変換。

UCSCのwigToBigWigコマンドが見つかれば、出力は自動でbigwigフォーマット（UCSC解説）に変換される。オーサーが準備したscript (fetchChromSizes) をマニュアルに従い準備しておく。

#bamはsamtools sortでcoordinate sortされている必要がある。
samtools sort -m 1000000000 input.bam input.sorted.bam
samtools index input.sorted.bam

#fetchChromSizesでスクリプトでサイズを計算
fetchChromSizes hg19 >hg19.chrom.sizes
bam2wig.py -s hg19.chrom.sizes -i sample.bam -o out -u

-s Chromosome size file. Tab or space separated text file with 2 columns: first column is chromosome name/ID, second column is chromosome size.
-u Skip non-unique hit reads.
-t Specified wigsum. Eg: 1,000,000,000 equals to coverage of 10 million 100nt reads. Ignore this option to disable normalization
-q Minimum mapping quality for an alignment to be called “uniquely mapped”. default=30
-d How read(s) were stranded during sequencing. For example: –strand=‘1++,1–,2+-,2-+’ means that this is a pair-end, strand-specific RNA-seq data, and the strand rule is: read1 mapped to ‘+’ => parental gene on ‘+’; read1 mapped to ‘-‘ => parental gene on ‘-‘; read2 mapped to ‘+’ => parental gene on ‘-‘; read2 mapped to ‘-‘ => parental gene on ‘+’. If you are not sure about the strand rule, run ‘infer_experiment.py’ default=none (Not a strand specific RNA-seq data).

bam/samのstatistics出力。

bam_stat.py -i Pairend_nonStrandSpecific_36mer_Human_hg19.bam

-q Minimum mapping quality to determine uniquely mapped read.

出力。

$ bam_stat.py -i Pairend_StrandSpecific_51mer_Human_hg19.bam

Load BAM file ... Done

#==================================================

#All numbers are READ count

#==================================================

Total records: 44826454

QC failed: 0

Optical/PCR duplicate: 0

Non primary hits 0

Unmapped reads: 0

mapq < mapq_cut (non-unique): 0

mapq >= mapq_cut (unique): 44826454

Read-1: 28674562

Read-2: 16151892

Reads map to '+': 23884437

Reads map to '-': 20942017

Non-splice reads: 39959172

Splice reads: 4867282

Reads mapped in proper pairs: 31803590

Proper-paired reads map to different chrom:0

リードのクリップアライメントの分布。

（clipped mappingが可能なアライナーを使っている必要がある）。

clipping_profile.py -i Pairend_StrandSpecific_51mer_Human_hg19.bam -s "PE" -o out

-s Sequencing layout. “SE”(single-end) or “PE”(pair-end).
-q Minimum mapping quality (phred scaled) for an alignment to be considered as “uniquely mapped”. default=30

リード1の出力。

f:id:kazumaxneo:20180227225932j:plain

リード２も出力される。

リードのmismatch分布。

mismatch_profile.py -l 101 -i sample.bam -o out

-l Alignment length of read. It is usually set to the orignial read length. For example, all these cigar strings (“101M”, “68M140N33M”, “53M1D48M”) suggest the read alignment length is 101. [required]

リードのdeletion分布。

deletion_profile.py -i sample.bam -l 101 -o out

-l Alignment length of read. It is usually set to the orignial read length.

リードのinsertion分布。

insertion_profile.py -s "PE" -i test.bam -o out

-s Sequencing layout. “SE”(single-end) or “PE”(pair-end).

f:id:kazumaxneo:20180227234729j:plain

ランダムにリードをサンプリングしてbamをn個に分割

（ランダムだが均一なリード数になる）。

divide_bam.py -n 3 -i SingleEnd_StrandSpecific_50mer_Human_hg19.bam -o output

-n Number of small BAM files

遺伝子上のカバレッジ分布

geneBody_coverage.py -r hg19.housekeeping.bed -i test.bam -o output

-r Reference gene model in bed format. [required]

複数bamなら-i test1.bam,test2.bam,test3.bam 、またはbamのディレクトリを-i /data/ という風に指定する。bigwigから計算するgeneBody_coverage2.pyもある。

f:id:kazumaxneo:20180227233520j:plain

ペアエンドの内側のインサートサイズを推測。

f:id:kazumaxneo:20180227234231j:plain

inner_distance.py -i Pairend_nonStrandSpecific_36mer_Human_hg19.bam -o output -r hg19.refseq.bed12

f:id:kazumaxneo:20180301124426j:plain

スプライシングジャンクションをリファレンスと比較し、比率を返す。

junction_annotation.py -i Pairend_nonStrandSpecific_36mer_Human_hg19.bam -o output -r hg19.refseq.bed12

f:id:kazumaxneo:20180227235016j:plain

レアなalternative splicingを探すためにデプスが十分か推測。

junction_saturation.py -i Pairend_nonStrandSpecific_36mer_Human_hg19.bam -r hg19.refseq.bed12 -o output

-r Reference gene model in BED format.

f:id:kazumaxneo:20180227235353j:plain

feature上のリードの分布

read_distribution.py -i Pairend_StrandSpecific_51mer_Human_hg19.bam -r hg19.refseq.bed12

出力の定義はマニュアルで確認してください。

f:id:kazumaxneo:20180301124530j:plain

duplicateの分布。

read_duplication.py -i Pairend_nonStrandSpecific_36mer_Human_hg19.bam -o output

GC含量の分布

read_GC.py -i Pairend_nonStrandSpecific_36mer_Human_hg19.bam -o output

f:id:kazumaxneo:20180228202855j:plain

リードの各部位の塩基の分布

read_NVC.py -i Pairend_nonStrandSpecific_36mer_Human_hg19.bam -o output

クオリティの分布

read_quality.py -i Pairend_StrandSpecific_51mer_Human_hg19.bam -o output_quality

f:id:kazumaxneo:20180301001942j:plain

f:id:kazumaxneo:20180301001944j:plain

各transcriptsごとのリードサイズ

RNA_fragment_size.py -r hg19.RefSeq.union.bed -i SRR873822_RIN10.bam > SRR873822_RIN10.fragSize

bamの分割（rRNA除去など）

split_bam.py -i Pairend_StrandSpecific_51mer_Human_hg19.bam -r hg19.rRNA.bed -o output

以下の３種類に分割される。rRNAをbedで与えれば、rRNAにマッピングされなかったリードをbamのまま抽出できる。

XXX.in.bam: reads that are mapped to exon regions of the gene list (or reads consumed by gene list).
XXX.ex.bam: reads that cannot be mapped the exon regions of the original gene list.
XXX.junk.bam: qcfailed reads or unmapped reads.

ペアエンドbamからシングルエンドのbam２つに分割。

split_paired_bam.py -i Pairend_StrandSpecific_51mer_Human_hg19.bam -o output

TIN値からサンプルRNAの完全さを評価。

tin.py -r hg19.RefSeq.union.bed -i Pairend_StrandSpecific_51mer_Human_hg19.bam -o output

RIN値（1≦RIN≦10）は18Sおよび28Sリボソーム RNAの量に大きく依存したリボソーム RNA完全性の尺度で、mRNA品質の直接測定ではない。実験ではリボゼロなどでrRNAを除いているため、下流のフローでRNAの完全さを分析できない。実際の状況は、「AU-リッチ配列」、「転写長」、「GC含量」、「二次構造」および「RNA-タンパク質複合体」などの因子に応じて、分解速度は転写物間で著しく異なることがある（ref：http://www.illumina.com/documents/products/technotes/technote-truseq-rna-access.pdf）。これらの限界を克服するために、転写レベルでRNAの完全性を測定できるアルゴリズムTIN（リンク）が考案された。 TINは、発現された各転写物のスコア（0≦TIN≦100）を計算する。また全transcriptsの平均のmedTINはサンプル間のRNA完全性を比較するために使用できる。

ほかにも定量してFPKMを出すFPKM_count.pyや、RNA seqの種類を推測するinfer_experiment.py （strand specificかなどをマッピングから推測）、bigwigをノーマライズするnormalize_bigwig.py、２つのbigwigファイルを操作するoverlay_bigwig.py、ヘキサマーの頻度をリファレンスとリードで調べるread_hexamer.pyがある。

引用

RSeQC: quality control of RNA-seq experiments.

Wang L, Wang S, Li W.

Bioinformatics. 2012 Aug 15;28(16):2184-5.