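2019/6/11: added the database details
HmtNote is a Python package for annotating human mitochondrial variants from VCF files. Variants are annotated with a broad range of information grouped into basic, cross-reference, variability and predictions subsets, so users can pick the specific annotations they are interested in or use all of them. Annotation is performed with data from HmtVar, a recently published database of human mitochondrial variants that collects information from several online resources and provides pathogenicity predictions. HmtNote can also download a local annotation database, which lets you annotate variants offline without relying on an internet connection. HmtNote is a free, open-source package and can be downloaded and installed from PyPI (https://pypi.org/project/hmtnote) or GitHub (https://github.com/robertopreste/HmtNote).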
People of #bioinformatics, if you need to annotate human mitochondrial variants from VCF files please have a look at my new HmtNote #python python package! 😊 https://t.co/4GI9W2HYev https://t.co/NoArqXGX1r
— Roberto Preste (@robertopreste) April 11, 2019
Manual
Installation
Tested on Ubuntu 16.04 with miniconda3-4.0.5.
Dependencies
- HmtNote only supports Python 3.
Main repository: GitHub
pip install hmtnote
export LC_ALL=C.UTF-8 && export LANG=C.UTF-8
> hmtnote --help
Usage: hmtnote [OPTIONS] COMMAND [ARGS]...

Options:
  --version  Show the version and exit.
  --help     Show this message and exit.

Commands:
  annotate  Annotate a VCF file using data from HmtVar.
  dump      Download databases from HmtVar for offline annotation.
> hmtnote annotate --help
Usage: hmtnote annotate [OPTIONS] INPUT_VCF OUTPUT_VCF

  Annotate a VCF file using data from HmtVar.

  If neither --basic, --crossref, --variab nor --predict are provided, they
  will all default to True, and the VCF will be annotated using all the
  available information. If no internet connection is available, use the
  --offline option to use the local database for annotation (you must have
  previously downloaded it using the hmtnote dump command).

Options:
  -b, --basic     Annotate VCF using basic information (locus, pathogenicity,
                  etc.) (default: False)
  -c, --crossref  Annotate VCF using cross-reference information (Clinvar and
                  dbSNP IDs, etc.) (default: False)
  -v, --variab    Annotate VCF using variability information (nucleotide and
                  aminoacid variability, allele frequencies) (default: False)
  -p, --predict   Annotate VCF using predictions information (from MutPred,
                  Panther, Polyphen and other resources) (default: False)
  -o, --offline   Annotate VCF using previously downloaded databases (offline
                  mode) (default: False)
  --help          Show this message and exit.
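The default rule in the annotate help text (no group flag given means all four groups are used) can be sketched in plain Python. This is a hypothetical helper that only mirrors the documented behaviour, not HmtNote's own code; `AnnotationSelection` and `resolve_selection` are names made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class AnnotationSelection:
    """Which HmtVar annotation groups to apply (hypothetical helper, not HmtNote's API)."""
    basic: bool = False
    crossref: bool = False
    variab: bool = False
    predict: bool = False


def resolve_selection(sel: AnnotationSelection) -> AnnotationSelection:
    """Mirror the help text: if no group flag is set, enable all four groups."""
    if not (sel.basic or sel.crossref or sel.variab or sel.predict):
        return AnnotationSelection(basic=True, crossref=True, variab=True, predict=True)
    # Otherwise only the explicitly requested groups are kept.
    return sel


if __name__ == "__main__":
    print(resolve_selection(AnnotationSelection()))             # all four enabled
    print(resolve_selection(AnnotationSelection(variab=True)))  # variability only
```

So `hmtnote annotate in.vcf out.vcf` behaves like passing all four flags, while `--variab` alone restricts the output to variability fields.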
> hmtnote dump --help
Usage: hmtnote dump [OPTIONS]

  Download databases from HmtVar for offline annotation.

Options:
  --help  Show this message and exit.
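Usage
Annotate mitochondrial variants on the fly using the HmtVar database (website, PubMed). Specify the input VCF and the output VCF. When none of --basic, --crossref, --variab or --predict is given, the annotate command applies all four annotation groups: basic, cross-reference, variability, and predictions.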
hmtnote annotate input.vcf output_annotated.vcf
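If the production environment is offline, download the databases in advance with dump, then run annotate with --offline. The last command restricts annotation to the variability group.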
hmtnote dump
hmtnote annotate input.vcf annotated.vcf --offline
hmtnote annotate input.vcf annotated_variability.vcf --variab --offline
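To run the same commands from a Python script, a minimal sketch is to build the argument list and pass it to `subprocess`. It only uses options shown in `hmtnote annotate --help`; the helper names `build_hmtnote_cmd` and `run_hmtnote` are hypothetical, and setting `LC_ALL`/`LANG` to `C.UTF-8` simply reproduces the `export` line from the installation step.

```python
import os
import shutil
import subprocess
from typing import List


def build_hmtnote_cmd(input_vcf: str, output_vcf: str, *, basic: bool = False,
                      crossref: bool = False, variab: bool = False,
                      predict: bool = False, offline: bool = False) -> List[str]:
    """Assemble an `hmtnote annotate` command line from the documented flags (hypothetical helper)."""
    cmd = ["hmtnote", "annotate", input_vcf, output_vcf]
    # Each boolean maps to one option listed in `hmtnote annotate --help`.
    for enabled, flag in ((basic, "--basic"), (crossref, "--crossref"),
                          (variab, "--variab"), (predict, "--predict"),
                          (offline, "--offline")):
        if enabled:
            cmd.append(flag)
    return cmd


def run_hmtnote(cmd: List[str]) -> None:
    """Run the command with a UTF-8 locale; raises CalledProcessError on a non-zero exit."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("hmtnote is not on PATH (pip install hmtnote)")
    env = dict(os.environ, LC_ALL="C.UTF-8", LANG="C.UTF-8")
    subprocess.run(cmd, check=True, env=env)


if __name__ == "__main__":
    # Same as: hmtnote annotate input.vcf annotated_variability.vcf --variab --offline
    print(" ".join(build_hmtnote_cmd("input.vcf", "annotated_variability.vcf",
                                     variab=True, offline=True)))
```

Keeping command construction separate from execution makes it easy to log or dry-run the exact command before calling `run_hmtnote`.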
The annotation fields are also reproduced below, but check the manual for which databases and filtering criteria are used.
https://hmtnote.readthedocs.io/en/latest/usage.html
Reproduced from the manual
Basic
- Locus: Locus to which the variant belongs
- AaChange: Aminoacidic change determined
- Pathogenicity: Pathogenicity predicted by HmtVar
- DiseaseScore: Disease score calculated by HmtVar
- HmtVar: HmtVar ID of the variant (can be used to view the related VariantCard on https://www.hmtvar.uniba.it/varCard/ )
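A minimal sketch for reading these fields back from an annotated VCF, assuming HmtNote stores each field above as a `KEY=VALUE` entry in the INFO column. `parse_info` and `variant_card_url` are hypothetical helpers, and building the VariantCard link by appending the HmtVar ID to the base URL is an assumption worth checking in a browser.

```python
from typing import Dict, Optional, Union

# Base URL taken from the manual text above; appending the ID is an assumption.
HMTVAR_CARD_BASE = "https://www.hmtvar.uniba.it/varCard/"


def parse_info(info: str) -> Dict[str, Union[str, bool]]:
    """Split a VCF INFO column into a dict; keys without '=' are flags (True)."""
    fields: Dict[str, Union[str, bool]] = {}
    if info in ("", "."):
        return fields
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def variant_card_url(info: str) -> Optional[str]:
    """Return a VariantCard link built from the HmtVar INFO field, or None if absent."""
    hmtvar_id = parse_info(info).get("HmtVar")
    if not isinstance(hmtvar_id, str) or hmtvar_id in ("", "."):
        return None
    return HMTVAR_CARD_BASE + hmtvar_id
```

With a hypothetical INFO value such as `Locus=MT-TL1;HmtVar=1234`, `parse_info` gives `{"Locus": "MT-TL1", "HmtVar": "1234"}` and `variant_card_url` gives `https://www.hmtvar.uniba.it/varCard/1234`.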
Cross-reference
- Clinvar: Clinvar ID of the variant
- dbSNP: dbSNP ID of the variant
- OMIM: OMIM ID of the variant
- MitomapAssociatedDiseases: Diseases associated to the variant according to Mitomap
- MitomapSomaticMutations: Diseases associated to the variant according to Mitomap Somatic Mutations
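Assuming the dbSNP field holds rsIDs (with or without the `rs` prefix, possibly comma-separated as multi-valued VCF INFO entries are), a hypothetical helper can turn them into links to NCBI dbSNP pages. The accepted input forms are assumptions; check them against your own annotated VCF.

```python
from typing import List, Optional

DBSNP_BASE = "https://www.ncbi.nlm.nih.gov/snp/"


def dbsnp_urls(dbsnp_field: Optional[str]) -> List[str]:
    """Turn a dbSNP INFO value into NCBI dbSNP links; '.' or empty gives no links."""
    if not dbsnp_field or dbsnp_field == ".":
        return []
    urls = []
    for raw in dbsnp_field.split(","):
        raw = raw.strip()
        if not raw or raw == ".":
            continue
        # Accept both "rs123" and a bare "123" (assumed input forms).
        rsid = raw if raw.startswith("rs") else "rs" + raw
        urls.append(DBSNP_BASE + rsid)
    return urls
```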
Variability
- NtVarH: Nucleotide variability of the position in healthy individuals
- NtVarP: Nucleotide variability of the position in patient individuals
- AaVarH: Aminoacid variability of the position in healthy individuals
- AaVarP: Aminoacid variability of the position in patient individuals
- AlleleFreqH: Allele frequency of the variant in healthy individuals overall
- AlleleFreqP: Allele frequency of the variant in patient individuals overall
- AlleleFreqH_AF: Allele frequency of the variant in healthy individuals from Africa
- AlleleFreqP_AF: Allele frequency of the variant in patient individuals from Africa
- AlleleFreqH_AM: Allele frequency of the variant in healthy individuals from America
- AlleleFreqP_AM: Allele frequency of the variant in patient individuals from America
- AlleleFreqH_AS: Allele frequency of the variant in healthy individuals from Asia
- AlleleFreqP_AS: Allele frequency of the variant in patient individuals from Asia
- AlleleFreqH_EU: Allele frequency of the variant in healthy individuals from Europe
- AlleleFreqP_EU: Allele frequency of the variant in patient individuals from Europe
- AlleleFreqH_OC: Allele frequency of the variant in healthy individuals from Oceania
- AlleleFreqP_OC: Allele frequency of the variant in patient individuals from Oceania
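The variability keys follow a regular naming pattern (`AlleleFreqH_<region>` / `AlleleFreqP_<region>`), so healthy and patient frequencies can be paired per region. A sketch, assuming the INFO column has already been parsed into a `dict` of strings and that missing values appear as `.`; `freq_by_region` and `higher_in_patients` are hypothetical names, and the comparison is purely descriptive.

```python
from typing import Dict, List, Mapping, Optional, Tuple

# Region suffixes used in the field names: Africa, America, Asia, Europe, Oceania.
REGIONS = ("AF", "AM", "AS", "EU", "OC")


def _to_float(value: Optional[str]) -> Optional[float]:
    """Convert an INFO value to float; missing, empty, '.' or non-numeric becomes None."""
    if value is None or value in ("", "."):
        return None
    try:
        return float(value)
    except ValueError:
        return None


def freq_by_region(info: Mapping[str, str]) -> Dict[str, Tuple[Optional[float], Optional[float]]]:
    """Pair healthy (H) and patient (P) allele frequencies, overall and per region."""
    pairs = {"overall": (_to_float(info.get("AlleleFreqH")),
                         _to_float(info.get("AlleleFreqP")))}
    for region in REGIONS:
        pairs[region] = (_to_float(info.get(f"AlleleFreqH_{region}")),
                         _to_float(info.get(f"AlleleFreqP_{region}")))
    return pairs


def higher_in_patients(info: Mapping[str, str]) -> List[str]:
    """List groups where both values exist and the patient frequency is larger."""
    return [group for group, (h, p) in freq_by_region(info).items()
            if h is not None and p is not None and p > h]
```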
Predictions
- MutPred_Prediction: Pathogenicity prediction offered by MutPred
- MutPred_Probability: Confidence of the pathogenicity prediction offered by MutPred
- Panther_Prediction: Pathogenicity prediction offered by Panther
- Panther_Probability: Confidence of the pathogenicity prediction offered by Panther
- PhDSNP_Prediction: Pathogenicity prediction offered by PhD SNP
- PhDSNP_Probability: Confidence of the pathogenicity prediction offered by PhD SNP
- SNPsGO_Prediction: Pathogenicity prediction offered by SNPs & GO
- SNPsGO_Probability: Confidence of the pathogenicity prediction offered by SNPs & GO
- Polyphen2HumDiv_Prediction: Pathogenicity prediction offered by Polyphen2 HumDiv
- Polyphen2HumDiv_Probability: Confidence of the pathogenicity prediction offered by Polyphen2 HumDiv
- Polyphen2HumVar_Prediction: Pathogenicity prediction offered by Polyphen2 HumVar
- Polyphen2HumVar_Probability: Confidence of the pathogenicity prediction offered by Polyphen2 HumVar
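The six predictors share the `<name>_Prediction` / `<name>_Probability` pattern, so their calls can be gathered per variant. A sketch under the same assumption (INFO already parsed into a `dict` of strings); labels are counted exactly as they appear in the VCF because this post does not list the possible values, and `collect_predictions` / `count_calls` are hypothetical names.

```python
from collections import Counter
from typing import Dict, Mapping

# Predictor prefixes as they appear in the field names listed above.
PREDICTORS = ("MutPred", "Panther", "PhDSNP", "SNPsGO",
              "Polyphen2HumDiv", "Polyphen2HumVar")
MISSING = (None, "", ".")


def collect_predictions(info: Mapping[str, str]) -> Dict[str, Dict[str, str]]:
    """Gather each predictor's call and confidence, skipping predictors with no call."""
    out: Dict[str, Dict[str, str]] = {}
    for name in PREDICTORS:
        call = info.get(f"{name}_Prediction")
        if call in MISSING:
            continue
        prob = info.get(f"{name}_Probability")
        out[name] = {"prediction": call, "probability": "." if prob in MISSING else prob}
    return out


def count_calls(info: Mapping[str, str]) -> Dict[str, int]:
    """Count how many predictors gave each label; labels are used exactly as written."""
    return dict(Counter(entry["prediction"] for entry in collect_predictions(info).values()))
```

For example, with hypothetical labels where two tools say `deleterious` and one says `neutral`, `count_calls` returns `{"deleterious": 2, "neutral": 1}`, a quick way to see how much the predictors agree.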
HmtNote can also be used as a Python module.
Citation
Human mitochondrial variant annotation with HmtNote
R. Preste, R. Clima, M. Attimonelli
bioRxiv preprint first posted online Apr. 10, 2019