2021 2/10: Added link to the v5 release
2021 5/14, 9/20: Added help output
2021 9/20: Added paper citation
2021 12/22: Fixed commands
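elPrep 5 updates the elPrep framework for processing sequence alignment/map files with variant calling. elPrep 5 can now execute the full pipeline described in the GATK Best Practices for variant calling, which consists of PCR and optical duplicate marking, sorting by coordinate order, base quality score recalibration, and variant calling using the haplotype caller algorithm. Benchmarks show that elPrep 5 speeds up the runtime of the variant calling pipeline by a factor of 8 to 16 on both whole-exome and whole-genome data while using the same hardware resources as GATK 4. This makes elPrep 5 a suitable drop-in replacement for GATK 4 when faster execution times are needed.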
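The four stages listed above correspond to options that appear in the `elprep sfm` help output reproduced later in this post. The following is a minimal sketch rather than a verified recipe. The file names (`input.bam`, `output.bam`, `output.metrics`, `output.recal`, `output.vcf.gz`) are placeholders, `hg38.elfasta` and `dbsnp_138.hg38.elsites` are assumed to have been created beforehand with `fasta-to-elfasta` and `vcf-to-elsites`, and the `run` helper is hypothetical: it only prints the command unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Hypothetical helper: print the command by default, execute it when DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Stage to option mapping (option names taken from the help output):
#   duplicate marking (PCR + optical) -> --mark-duplicates, --mark-optical-duplicates
#   coordinate sorting                -> --sorting-order coordinate
#   base quality score recalibration  -> --bqsr, --known-sites, --reference
#   variant calling                   -> --haplotypecaller
# All file names are placeholders (assumptions), not files from this post.
run elprep sfm input.bam output.bam \
  --mark-duplicates \
  --mark-optical-duplicates output.metrics \
  --sorting-order coordinate \
  --bqsr output.recal \
  --known-sites dbsnp_138.hg38.elsites \
  --reference hg38.elfasta \
  --haplotypecaller output.vcf.gz
```

Running the script as is prints the assembled command so it can be inspected first; `DRY_RUN=0 sh script.sh` would execute it, assuming `elprep` and the input files are available.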
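From the project homepage
Compared with GATK 4, elPrep 5 runs the pipeline 8.5 to 16 times faster while using about 0.70 times the RAM and about 0.70 times the disk space that GATK 4 uses. The output of elPrep is identical to that of GATK. (*50x NA12878 Illumina Platinum genome, hg38, run on AWS m5.24xlarge, Intel Xeon, 96 vCPU, 384 GiB RAM)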
Installation
2021 5/14
After registering, you can download a prebuilt executable.
https://www.imec-int.com/en/expertise/lifesciences/genomics/dna-sequence-analysis-software
> ./elprep
# ./elprep
elprep version 5.0.1 compiled with go1.15.7 - see http://github.com/exascience/elprep for more information.
2021/05/14 12:14:56 Incorrect number of parameters.
Print command details:
[--help]
[--help-extended]
Available commands: filter, sfm, vcf-to-elsites, bed-to-elsites, fasta-to-elfasta
filter/sfm parameters:
elprep [filter | sfm] sam-file sam-output-file
[--output-type [sam | bam]]
[--replace-reference-sequences sam-file]
[--filter-unmapped-reads]
[--filter-unmapped-reads-strict]
[--filter-mapping-quality mapping-quality]
[--filter-non-exact-mapping-reads]
[--filter-non-exact-mapping-reads-strict]
[--filter-non-overlapping-reads bed-file]
[--replace-read-group read-group-string]
[--mark-duplicates]
[--mark-optical-duplicates file]
[--optical-duplicates-pixel-distance nr]
[--remove-duplicates]
[--remove-optional-fields [all | list]]
[--keep-optional-fields [none | list]]
[--sorting-order [keep | unknown | unsorted | queryname | coordinate]]
[--clean-sam]
[--bqsr recal-file]
[--reference elfasta]
[--quantize-levels nr]
[--sqq list]
[--max-cycle nr]
[--known-sites list]
[--haplotypecaller vcf-file]
[--reference-confidence [GVCF | BP_RESOLUTION | NONE]
[--sample-name sample-name]
[--activity-profile igv-file]
[--assembly-regions igv-file]
[--assembly-region-padding nr]
[--target-regions bed-file]
[--nr-of-threads nr]
[--timed]
[--log-path path]
[--intermediate-files-output-prefix name] (sfm only)
[--intermediate-files-output-type [sam | bam]] (sfm only)
[--tmp-path path]
[--single-end] (sfm only)
[--contig-group-size nr] (sfm only)
vcf-to-elsites parameters:
elprep vcf-to-elsites vcf-file elsites-file
[--log-path path]
bed-to-elsites parameters:
elprep bed-to-elsites bed-file elsites-file
[--log-path path]
fasta-to-elfasta parameters:
elprep fasta-to-elfasta fasta-file elfasta-file
[--log-path path]
> elprep sfm
# elprep sfm
elprep version 5.0.2 compiled with go1.16.4 - see http://github.com/exascience/elprep for more information.
Incorrect number of parameters.
sfm parameters:
elprep sfm sam-file sam-output-file
[--output-type [sam | bam]]
[--replace-reference-sequences sam-file]
[--filter-unmapped-reads]
[--filter-unmapped-reads-strict]
[--filter-mapping-quality mapping-quality]
[--filter-non-exact-mapping-reads]
[--filter-non-exact-mapping-reads-strict]
[--filter-non-overlapping-reads bed-file]
[--replace-read-group read-group-string]
[--mark-duplicates]
[--mark-optical-duplicates file]
[--optical-duplicates-pixel-distance nr]
[--remove-duplicates]
[--remove-optional-fields [all | list]]
[--keep-optional-fields [none | list]]
[--sorting-order [keep | unknown | unsorted | queryname | coordinate]]
[--clean-sam]
[--bqsr]
[--reference elfasta]
[--quantize-levels nr]
[--sqq list]
[--max-cycle nr]
[--known-sites list]
[--haplotypecaller vcf-file]
[--reference-confidence [GVCF | BP_RESOLUTION | NONE]
[--sample-name sample-name]
[--activity-profile igv-file]
[--assembly-regions igv-file]
[--assembly-region-padding nr]
[--target-regions bed-file]
[--nr-of-threads nr]
[--timed]
[--log-path path]
[--intermediate-files-output-prefix name]
[--intermediate-files-output-type [sam | bam]]
[--tmp-path path]
[--single-end]
[--contig-group-size nr]
> elprep filter
# elprep filter
elprep version 5.0.2 compiled with go1.16.4 - see http://github.com/exascience/elprep for more information.
Incorrect number of parameters.
filter parameters:
elprep filter sam-file sam-output-file
[--output-type [sam | bam]]
[--replace-reference-sequences sam-file]
[--filter-unmapped-reads]
[--filter-unmapped-reads-strict]
[--filter-mapping-quality mapping-quality]
[--filter-non-exact-mapping-reads]
[--filter-non-exact-mapping-reads-strict]
[--filter-non-overlapping-reads bed-file]
[--replace-read-group read-group-string]
[--mark-duplicates]
[--mark-optical-duplicates file]
[--optical-duplicates-pixel-distance nr]
[--remove-duplicates]
[--remove-optional-fields [all | list]]
[--keep-optional-fields [none | list]]
[--sorting-order [keep | unknown | unsorted | queryname | coordinate]]
[--clean-sam]
[--reference elfasta]
[--bqsr recal-file]
[--quantize-levels nr]
[--sqq list]
[--max-cycle nr]
[--known-sites list]
[--haplotypecaller vcf-file]
[--reference-confidence [GVCF | BP_RESOLUTION | NONE]
[--sample-name sample-name]
[--activity-profile igv-file]
[--assembly-regions igv-file]
[--assembly-region-padding nr]
[--target-regions]
[--nr-of-threads nr]
[--timed]
[--log-path path]
> elprep vcf-to-elsites
# elprep vcf-to-elsites
elprep version 5.0.2 compiled with go1.16.4 - see http://github.com/exascience/elprep for more information.
Incorrect number of parameters.
vcf-to-elsites parameters:
elprep vcf-to-elsites vcf-file elsites-file
[--log-path path]
> elprep bed-to-elsites
# elprep bed-to-elsites
elprep version 5.0.2 compiled with go1.16.4 - see http://github.com/exascience/elprep for more information.
Incorrect number of parameters.
bed-to-elsites parameters:
elprep bed-to-elsites bed-file elsites-file
[--log-path path]
> elprep fasta-to-elfasta
# elprep fasta-to-elfasta
elprep version 5.0.2 compiled with go1.16.4 - see http://github.com/exascience/elprep for more information.
Incorrect number of parameters.
fasta-to-elfasta parameters:
elprep fasta-to-elfasta fasta-file elfasta-file
[--log-path path]
The version that can be installed with conda is now v5 as well.
# Anaconda (Bioconda channel)
mamba install -c bioconda elprep -y
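To confirm that the installed binary really is v5, the version banner shown in the help output above (for example `elprep version 5.0.1 compiled with go1.15.7 - ...`) can be parsed. This is a small sketch under assumptions: the helper name `elprep_major_version` is made up for illustration, and because it is not clear from the output above whether the banner goes to stdout or stderr, both streams are captured.

```shell
#!/bin/sh
# Hypothetical helper: read elprep output on stdin and print the major
# version found in a line of the form "elprep version X.Y.Z ...".
elprep_major_version() {
  sed -n 's/^elprep version \([0-9][0-9]*\)\..*/\1/p' | head -n 1
}

# Only query the binary if it is on PATH. Calling elprep without arguments
# prints the banner followed by a usage message, as shown above.
if command -v elprep >/dev/null 2>&1; then
  major=$(elprep 2>&1 | elprep_major_version)
  echo "elprep major version: ${major:-unknown}"
else
  echo "elprep not found on PATH"
fi
```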
Usage
The build failed. I will wait until the prebuilt package, which is due to be published shortly on the Releases page, becomes available.
2021 5/25
# Preparation: convert the reference FASTA and known-sites VCF to elPrep formats
elprep fasta-to-elfasta hg38.fasta hg38.elfasta
elprep vcf-to-elsites dbsnp_138.hg38.vcf dbsnp_138.hg38.elsites
# Map the reads and pipe the alignments to elprep
minimap2 -ax sr -R "@RG\tID:X\tLB:Y\tSM:sample1\tPL:ILLUMINA" \
-t 12 assembly.fasta pair_*.fq.gz |\
elprep filter /dev/stdin map.bam \
--mark-duplicates --remove-duplicates \
--filter-mapping-quality 0 \
--clean-sam \
--nr-of-threads 12 \
--sorting-order coordinate \
--bqsr output.recal \
--known-sites dbSNP_common_all.elsites \
--reference GRCh38.elfasta \
--target-regions targetedregions.bed
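Because the command above is a pipe, minimap2 starts working even if one of the files that elprep needs is missing. A small pre-flight check can catch that earlier. This is only a sketch: the function name `check_inputs` is hypothetical, and the file names are simply the ones used in the command above.

```shell
#!/bin/sh
# Hypothetical helper: report every argument that is not an existing regular
# file on stderr, and return non-zero if at least one is missing.
check_inputs() {
  missing=0
  for f in "$@"; do
    if [ ! -f "$f" ]; then
      echo "missing: $f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# File names taken from the minimap2 | elprep command above.
if check_inputs assembly.fasta dbSNP_common_all.elsites GRCh38.elfasta targetedregions.bed; then
  echo "all inputs present"
else
  echo "fix the missing inputs before starting the pipeline"
fi
```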
Citation
Multithreaded variant calling in elPrep 5
Charlotte Herzeel, Pascal Costanza, Dries Decap, Jan Fostier, Roel Wuyts, Wilfried Verachtert
bioRxiv, Posted December 11, 2020
Multithreaded variant calling in elPrep 5
Charlotte Herzeel, Pascal Costanza, Dries Decap, Jan Fostier, Roel Wuyts, Wilfried Verachtert
PLOS ONE, Published: February 4, 2021