Githubより
YAPCはATAC-seq、DNase-seq、ChIP-seqなどのゲノムハイスループットシーケンシングデータ用のピークコーラーである。1つのタイムポイントにつき2つの生物学的複製を持つ時系列データセット(または複数の条件を持つその他のデータ)において、特徴的な幅の代表的なピークを捕捉する目的で特に書かれているが、1つのレプリケートにも使用できる。簡単に説明すると、ピーク候補の位置は、すべてのサンプルで平均化されたシグナルの凹んだ領域(負の平滑化された二次導関数を持つ領域)を用いて定義される。候補となったピークは、IDR(Li et al 2011)を用いて条件ごとの統計的有意性が検証される。
インストール
macos10.14でmambaを使ってテストした。
依存
#bioconda(link)
mamba create -n yapc -y
conda activate yapc
mamba install -c bioconda yapc -y
> yapc
u$ yapc -h
yapc (yet another peak caller) 0.1
usage: yapc [-h] [--smoothing-window-width SMOOTHING_WINDOW_WIDTH] [--smoothing-times SMOOTHING_TIMES] [--min-concave-region-width MIN_CONCAVE_REGION_WIDTH] [--truncate-idr-input TRUNCATE_IDR_INPUT]
[--fixed-peak-halfwidth FIXED_PEAK_HALFWIDTH] [--pseudoreplicates] [--recycle]
OUTPUT_PREFIX [CONDITION_REP1_REP2 [CONDITION_REP1_REP2 ...]]
An adhoc peak caller for genomic high-throughput sequencing data such as ATAC-seq, DNase-seq or ChIP-seq. Specifically written for the purpose of capturing representative peaks of characteristic width in a time series
data set with two biological replicates per time point. Briefly, candidate peak locations are defined using concave regions (regions with negative smoothed second derivative) from signal averaged across all samples. The
candidate peaks are then tested for condition-specific statistical significance using IDR.
positional arguments:
OUTPUT_PREFIX Prefix to use for all output files
CONDITION_REP1_REP2 Name of the condition, BigWig files of first and second replicates; all separated by spaces. (default: None)
optional arguments:
-h, --help show this help message and exit
--smoothing-window-width SMOOTHING_WINDOW_WIDTH
Width of the smoothing window used for the second derivative track. If the peak calls aren't capturing the peak shape well, try setting this to different values ranging from 75 to 200. (default:
150)
--smoothing-times SMOOTHING_TIMES
Number of times smoothing is applied to the second derivative. (default: 3)
--min-concave-region-width MIN_CONCAVE_REGION_WIDTH
Discard concave regions smaller than the threshold specified. (default: 75)
--truncate-idr-input TRUNCATE_IDR_INPUT
Truncate IDR input to the number of peaks specified. (default: 100000)
--fixed-peak-halfwidth FIXED_PEAK_HALFWIDTH
Set final peak coordinates to the specified number of base pairs on either side of the concave region mode. (default: None)
--pseudoreplicates Use pseudoreplicates as implemented in modENCODE (Landt et al 2012; around Fig 7): for each condition, assess peak reproducibility in replicates and pseudoreplicates; report globalIDRs for the
set with a larger number of peak calls (at IDR=0.001). Pseudoreplicates are specified as the 3rd and 4th file name after every condition. (default: False)
--recycle Do not recompute (intermediate) output files if a file with the expected name is already present. Enabling this can lead to funky behaviour e.g. in the case of a previously interrupted run.
(default: False)
実行方法
biwigファイルを指定する。
yapc OUTPUT_PREFIX wt_emb atac_wt_emb_rep1.bw atac_wt_emb_rep2.bw
引用
https://github.com/jurgjn/yapc
Chromatin accessibility dynamics across C. elegans development and ageing
Jürgen Jänes, Yan Dong, Michael Schoof, Jacques Serizay, Alex Appert, Chiara Cerrato, Carson Woodbury, Ron Chen, Carolina Gemma, Ni Huang, Djem Kissiov, Przemyslaw Stempor, Annette Steward, Eva Zeiser, Sascha Sauer, Julie Ahringer
Elife. 2018 Oct 26;7:e37344
関連