2020-04-17: fixed a display bug and corrected the installation steps
vcf2maf converts a VCF into Mutation Annotation Format (MAF). To do so, each variant must be mapped to exactly one of all the gene transcripts/isoforms it might affect. Choosing a single effect per variant is often subjective. The vcf2maf and maf2maf scripts leave most of that responsibility to Ensembl's VEP, while letting you override its "canonical" isoforms or use a custom ExAC VCF for annotation. They also support parsing a wide range of MAF-like and VCF-like formats.
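To make the "one effect per variant" idea concrete, here is a minimal sketch on an invented four-row table. The columns (variant, transcript, canonical flag, severity rank), the transcript names, and the rule "prefer the canonical transcript, then the most severe effect" are all assumptions for illustration. vcf2maf's actual ranking is more involved and is not reproduced here.

```shell
# Toy input: variant, transcript, is_canonical (1/0), severity rank (1 = most severe).
# Every value here is invented for illustration.
toy_effects() {
  printf 'chr1:100\tENST_A\t0\t1\n'
  printf 'chr1:100\tENST_B\t1\t2\n'
  printf 'chr2:200\tENST_C\t0\t3\n'
  printf 'chr2:200\tENST_D\t0\t2\n'
}

# Hypothetical rule (NOT vcf2maf's real logic): per variant, prefer the
# canonical transcript, then the most severe effect, and keep one row only.
pick_one_effect() {
  sort -t "$(printf '\t')" -k1,1 -k3,3nr -k4,4n | awk -F'\t' '!seen[$1]++'
}

# Prints the ENST_B row for chr1:100 and the ENST_D row for chr2:200.
toy_effects | pick_one_effect
```

With this toy rule the canonical ENST_B wins for chr1:100 even though ENST_A has the more severe effect, which is exactly the kind of subjective trade-off described above.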
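Installation
vcf2maf has many dependencies, so I tested it in a Docker environment (I also installed it with conda in an anaconda3.7 environment; the host OS was macOS 10.14).
#bioconda (link)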
conda install -c bioconda -y vcf2maf
#dockerhub (link) The image is about 15 GB because it bundles annotation data and more. If anything, it is closer to a VEP image...
docker pull vanallenlab/vcf2maf:v1.6.17-5a45760
docker run --rm -itv $PWD:/data/ vanallenlab/vcf2maf:v1.6.17-5a45760
> perl /opt/vcf2maf/vcf2maf.pl -h
> maf2maf.pl --man
# maf2maf.pl --man
NAME
maf2maf.pl - Reannotate the effects of variants in a MAF by running maf2vcf followed by vcf2maf
SYNOPSIS
perl maf2maf.pl --help
perl maf2maf.pl --input-maf test.maf --output-maf test.vep.maf
OPTIONS
--input-maf Path to input file in MAF format
--output-maf Path to output MAF file [Default: STDOUT]
--tmp-dir Folder to retain intermediate VCFs/MAFs after runtime [Default: usually /tmp]
--tum-depth-col Name of MAF column for read depth in tumor BAM [t_depth]
--tum-rad-col Name of MAF column for reference allele depth in tumor BAM [t_ref_count]
--tum-vad-col Name of MAF column for variant allele depth in tumor BAM [t_alt_count]
--nrm-depth-col Name of MAF column for read depth in normal BAM [n_depth]
--nrm-rad-col Name of MAF column for reference allele depth in normal BAM [n_ref_count]
--nrm-vad-col Name of MAF column for variant allele depth in normal BAM [n_alt_count]
--retain-cols Comma-delimited list of columns to retain from the input MAF [Center,Verification_Status,Validation_Status,
Mutation_Status,Sequencing_Phase,Sequence_Source,Validation_Method,Score,BAM_file,Sequencer,Tumor_Sample_UUID,Matched_Norm_Sample_UUID]
--custom-enst List of custom ENST IDs that override canonical selection
--vep-path Folder containing the vep script [~/vep]
--vep-data VEP's base cache/plugin directory [~/.vep]
--vep-forks Number of forked processes to use when running VEP [4]
--buffer-size Number of variants VEP loads at a time; Reduce this for low memory systems [5000]
--any-allele When reporting co-located variants, allow mismatched variant alleles too
--filter-vcf A VCF for FILTER tag common_variant. Set to 0 to disable [~/.vep/ExAC_nonTCGA.r0.3.1.sites.vep.vcf.gz]
--max-filter-ac Use tag common_variant if the filter-vcf reports a subpopulation AC higher than this [10]
--species Ensembl-friendly name of species (e.g. mus_musculus for mouse) [homo_sapiens]
--ncbi-build NCBI reference assembly of variants in MAF (e.g. GRCm38 for mouse) [GRCh37]
--cache-version Version of offline cache to use with VEP (e.g. 75, 84, 91) [Default: Installed version]
--ref-fasta Reference FASTA file [~/.vep/homo_sapiens/91_GRCh37/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz]
--help Print a brief help message and quit
--man Print the detailed manual
DESCRIPTION
This script runs a given MAF through maf2vcf to generate per-TN-pair
VCFs in a temporary folder, and then runs vcf2maf on each VCF to
reannotate variant effects and create a new combined MAF
Relevant links:
Homepage: https://github.com/ckandoth/vcf2maf
VCF format: http://samtools.github.io/hts-specs/
MAF format: https://wiki.nci.nih.gov/x/eJaPAQ
VEP: http://ensembl.org/info/docs/tools/vep/index.html
VEP annotated VCF format: http://ensembl.org/info/docs/tools/vep/vep_formats.html#vcfout
AUTHORS
Cyriac Kandoth (ckandoth@gmail.com)
Qingguo Wang (josephw10000@gmail.com)
LICENSE
Apache-2.0 | Apache License, Version 2.0 | https://www.apache.org/licenses/LICENSE-2.0
Ensembl's VEP is installed as well.
help
> vep
Test run
Specify the VCF and the reference. If the reference flag "--ref-fasta" is omitted, Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz is used, but only if that file exists.
git clone https://github.com/mskcc/vcf2maf.git
cd vcf2maf/
perl vcf2maf.pl --input-vcf tests/test.vcf --output-maf tests/test.vep.maf --ref-fasta <path>/<to>/GRCh37.primary_assembly.fa.gz
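Because the default reference is only used when that file is present, it can be worth checking the FASTA path before starting a run. The wrapper below is a sketch under assumptions: the function name run_vcf2maf_checked is hypothetical, and it assumes you are inside the cloned vcf2maf/ directory with VEP already set up. It only uses the flags shown above.

```shell
# Hypothetical helper: refuse to start vcf2maf if the reference FASTA is unreadable.
run_vcf2maf_checked() {
  ref_fasta=$1
  if [ ! -r "$ref_fasta" ]; then
    echo "reference FASTA not readable: $ref_fasta" >&2
    return 1
  fi
  # Same test-run command as above, with the checked path passed through.
  perl vcf2maf.pl --input-vcf tests/test.vcf \
                  --output-maf tests/test.vep.maf \
                  --ref-fasta "$ref_fasta"
}

# Example call (placeholder path; substitute your own FASTA):
# run_vcf2maf_checked <path>/<to>/GRCh37.primary_assembly.fa.gz
```

The function returns a non-zero status without running perl when the file is missing, so it can be chained safely in a larger script.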
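To fill columns 16 and 17 of the output MAF with the tumor/normal sample IDs, and to parse genotypes and allele counts from the matching genotype columns of the VCF, pass the --tumor-id and --normal-id flags.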
vcf2maf.pl --input-vcf tests/test.vcf --output-maf tests/test.vep.maf --tumor-id WD1309 --normal-id NB1308
If there is no matched normal sample, simply omit the --normal-id option.
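A quick way to confirm that the sample IDs landed in columns 16 and 17 is to print just those two columns. The sketch below builds a one-variant toy MAF so it can run without vcf2maf installed. The helper names and all row values are invented for illustration, and it assumes the MAF may start with "#" comment lines such as a version header.

```shell
# Print the sample-ID columns (16 and 17) of a MAF, skipping "#" comment lines.
maf_sample_ids() {
  grep -v '^#' "$1" | cut -f16,17
}

# Write a toy MAF (invented values) containing the first 17 standard columns.
write_toy_maf() {
  header='Hugo_Symbol Entrez_Gene_Id Center NCBI_Build Chromosome Start_Position End_Position Strand Variant_Classification Variant_Type Reference_Allele Tumor_Seq_Allele1 Tumor_Seq_Allele2 dbSNP_RS dbSNP_Val_Status Tumor_Sample_Barcode Matched_Norm_Sample_Barcode'
  row='GENE1 0 . GRCh37 1 100 100 + Missense_Mutation SNP A A T . . WD1309 NB1308'
  {
    echo '#version 2.4'
    echo "$header" | tr ' ' '\t'
    echo "$row" | tr ' ' '\t'
  } > "$1"
}

toy_maf=$(mktemp)
write_toy_maf "$toy_maf"
# Prints the two column names, then "WD1309" and "NB1308" separated by a tab.
maf_sample_ids "$toy_maf"
rm -f "$toy_maf"
```

On a real run the same function can be pointed at the output file, for example maf_sample_ids tests/test.vep.maf.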
Citation
https://github.com/mskcc/vcf2maf
Related
Related scripts
no longer supported (deprecated)
References
https://www.researchgate.net/post/Vcf2maf_how_does_it_select_only_one_gene_from_overlapping_genes