macでインフォマティクス

A collection of informatics notes related to HTS (NGS).

mlst: traditional MLST typing

2020-06-04: commands corrected

 

mlst takes a genome sequence as input and types it against the traditional seven loci, following the PubMLST typing schemes.

 

Installation

Dependencies

  • Perl >= 5.26
  • NCBI BLAST+ blastn >= 2.9.0
      • You probably have blastn installed already.
      • If you use Brew or Conda, this will install the blast package for you.
  • Perl modules: Moo, List::MoreUtils, JSON
      • Debian: sudo apt-get install libmoo-perl liblist-moreutils-perl libjson-perl
      • Redhat: sudo yum install perl-Moo perl-List-MoreUtils perl-JSON
      • Most Unix: sudo cpan Moo List::MoreUtils JSON
  • any2fasta
      • Converts sequence files to FASTA, even compressed ones

GitHub

#bioconda
conda create -n mlst -y
conda activate mlst
conda install -c conda-forge -c bioconda -c defaults mlst -y

#homebrew
brew install brewsci/bio/mlst
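
After installing, it may be worth confirming that mlst and the dependencies listed above are actually on the PATH. The following is a minimal sketch: `tool_status` is a hypothetical helper name introduced here for illustration, and `--check` is the flag shown in the mlst help text ("Just check dependencies and exit").

```shell
# tool_status is a hypothetical helper (not part of mlst):
# it reports whether a command can be found on the PATH.
tool_status() {
  if path=$(command -v "$1" 2>/dev/null); then
    printf 'found: %s\n' "$path"
  else
    printf 'missing: %s\n' "$1"
  fi
}

# Check mlst itself and the two external tools it depends on.
for tool in mlst blastn any2fasta; do
  tool_status "$tool"
done

# If mlst is available, let it run its own dependency check.
if command -v mlst >/dev/null 2>&1; then
  mlst --check
fi
```

If any tool is reported as missing, activating the conda environment (conda activate mlst) is usually the first thing to try.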

$ mlst -h
SYNOPSIS
  Automatic MLST calling from assembled contigs
USAGE
  % mlst --list                                            # list known schemes
  % mlst [options] <contigs.{fasta,gbk,embl}[.gz]          # auto-detect scheme
  % mlst --scheme <scheme> <contigs.{fasta,gbk,embl}[.gz]> # force a scheme
GENERAL
  --help            This help
  --version         Print version and exit(default ON)
  --check           Just check dependencies and exit (default OFF)
  --quiet           Quiet - no stderr output (default OFF)
  --threads [N]     Number of BLAST threads (suggest GNU Parallel instead) (default '1')
  --debug           Verbose debug output to stderr (default OFF)
SCHEME
  --scheme [X]      Don't autodetect, force this scheme on all inputs (default '')
  --list            List available MLST scheme names (default OFF)
  --longlist        List allelles for all MLST schemes (default OFF)
  --exclude [X]     Ignore these schemes (comma sep. list) (default 'ecoli_2,abaumannii')
OUTPUT
  --csv             Output CSV instead of TSV (default OFF)
  --json [X]        Also write results to this file in JSON format (default '')
  --label [X]       Replace FILE with this name instead (default '')
  --nopath          Strip filename paths from FILE column (default OFF)
  --novel [X]       Save novel alleles to this FASTA file (default '')
  --legacy          Use old legacy output with allele header row (requires --scheme) (default OFF)
SCORING
  --minid [n.n]     DNA %identity of full allelle to consider 'similar' [~] (default '95')
  --mincov [n.n]    DNA %cov to report partial allele at all [?] (default '10')
  --minscore [n.n]  Minumum score out of 100 to match a scheme (when auto --scheme) (default '50')
PATHS
  --blastdb [X]     BLAST database (default '/Users/kazu/anaconda3/envs/mlst/db/blast/mlst.fa')
  --datadir [X]     PubMLST data (default '/Users/kazu/anaconda3/envs/mlst/db/pubmlst')
HOMEPAGE
  https://github.com/tseemann/mlst - Torsten Seemann

 

Usage

Specify a FASTA or GenBank file containing a genome sequence or contig sequences.

mlst input.fa > out

#multiple files
mlst *fasta --threads 8 > out

Files compressed with gzip, zip, or bzip2 are also supported.

 

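I tried it on a complete genome. The command below uses Streptomyces atratus strain SCSIO ZH16; the log that follows it is from a run on Salmonella enterica subsp. enterica serovar Enteritidis str. RM2968.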

mlst Streptomyces_atratus_strain_SCSIO_ZH16.fasta > out

$ mlst Salmonella_enterica_subsp._enterica_serovar_Enteritidis_str._RM2968.fasta
[13:34:55] This is mlst 2.19.0 running on darwin with Perl 5.026002
[13:34:55] Checking mlst dependencies:
[13:34:55] Found 'blastn' => /Users/kazu/anaconda3/envs/mlst/bin/blastn
[13:34:55] Found 'any2fasta' => /Users/kazu/anaconda3/envs/mlst/bin/any2fasta
[13:34:55] Found blastn: 2.9.0+ (002009)
[13:34:55] Excluding 2 schemes: abaumannii ecoli_2
[13:34:57] Found exact allele match ecoli.icd-40
[13:34:57] Found exact allele match senterica.thrA-11
[13:34:57] Found exact allele match senterica.sucA-6
[13:34:57] Found exact allele match senterica.hisD-7
[13:34:57] Found exact allele match senterica.dnaN-2
[13:34:57] Found exact allele match senterica.aroC-5
[13:34:57] Found exact allele match senterica.hemD-3
[13:34:57] Found exact allele match senterica.purE-6
Salmonella_enterica_subsp._enterica_serovar_Enteritidis_str._RM2968.fasta senterica 11 aroC(5) dnaN(2) hemD(3) hisD(7) purE(6) sucA(6) thrA(11)
[13:34:57] If you like MLST, you're absolutely going to love wgMLST!
[13:34:57] Done.

It is hard to spot among the log messages, but the following line is the result.

Salmonella_enterica_subsp._enterica_serovar_Enteritidis_str._RM2968.fasta senterica 11 aroC(5) dnaN(2) hemD(3) hisD(7) purE(6) sucA(6) thrA(11)

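Although this was run in auto-detect mode, the scheme was detected as senterica, and the seven loci and the typing result are shown. The columns are the input file name, the detected scheme, the sequence type (ST 11 here), and the allele ID for each of the seven loci.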
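
Because the default output is TSV (see --csv in the help text), the result line can be picked apart with awk. The sketch below is only an illustration: `summarize_mlst` is a hypothetical helper name, and the sample line is the result from the run above, rebuilt with tab separators so that the example runs without mlst installed.

```shell
# summarize_mlst is a hypothetical helper (not part of mlst):
# it reads mlst TSV lines on stdin and prints a one-line summary per genome.
summarize_mlst() {
  awk -F'\t' '{
    # Columns: 1 = file, 2 = scheme, 3 = ST, 4.. = one allele call per locus
    printf "file=%s scheme=%s ST=%s loci=%d\n", $1, $2, $3, NF - 3
  }'
}

# Result line from the run above, with tabs between the columns.
sample=$(printf 'Salmonella_enterica_subsp._enterica_serovar_Enteritidis_str._RM2968.fasta\tsenterica\t11\taroC(5)\tdnaN(2)\themD(3)\thisD(7)\tpurE(6)\tsucA(6)\tthrA(11)')

printf '%s\n' "$sample" | summarize_mlst
```

With a real run, the same helper could be fed from the saved results, for example summarize_mlst < out.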

 

Instead of letting mlst auto-detect the scheme, the user can also specify one. To list the available schemes:

 mlst --list

$ mlst --list 

cronobacter otsutsugamushi wolbachia sinorhizobium sthermophilus_2 ssuis csputorum cconcisus tpallidum neisseria ecloacae bcereus chyointestinalis senterica borrelia spyogenes mmassiliense vvulnificus saureus shaemolyticus pacnes pmultocida_multihost pfluorescens ganatis bpseudomallei sbsec bsubtilis kaerogenes vibrio aeromonas kkingae slugdunensis cmaltaromaticum bhampsonii bpilosicoli ypseudotuberculosis sdysgalactiae kpneumoniae vtapetis cbotulinum spneumoniae sthermophilus sepidermidis vparahaemolyticus clari yersinia mpneumoniae smaltophilia psalmonis leptospira mhaemolytica pgingivalis ecoli_2 hinfluenzae mhyorhinis bbacilliformis efaecalis cdiphtheriae abaumannii cdifficile fpsychrophilum arcobacter pputida cupsaliensis mcatarrhalis taylorella chelveticus mbovis pdamselae blicheniformis scanis edwardsiella abaumannii_2 plarvae mycobacteria leptospira_3 streptomyces koxytoca ecoli lsalivarius soralis hcinaedi mhyopneumoniae ppentosaceus mabscessus bhyodysenteriae xfastidiosa leptospira_2 bcc pmultocida_rirdc vcholerae2 bintermedia yruckeri suberis tenacibaculum bhenselae magalactiae brucella clanienae msynoviae vcholerae cinsulaenigrae dnodosus chlamydiales lmonocytogenes mflocculare miowae shominis szooepidemicus sgallolyticus hparasuis mcanis ranatipestifer campylobacter mcaseolyticus efaecium orhinotracheale brachyspira csepticum bwashoensis cfreundii paeruginosa sagalactiae rhodococcus achromobacter mplutonius liberibacter cfetus hsuis aphagocytophilum hpylori ureaplasma bordetella spseudintermedius
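
The list is long, so before forcing a scheme it can help to check that the exact name exists (for example, ecoli and ecoli_2 are different schemes). A minimal sketch, assuming the list is whitespace-separated as in the output above; `has_scheme` is a hypothetical helper name, and the short sample list is an excerpt of the real one so the example runs without mlst.

```shell
# has_scheme is a hypothetical helper (not part of mlst):
# it reads a whitespace-separated scheme list on stdin and
# succeeds only if the name given as $1 matches a scheme exactly.
has_scheme() {
  tr -s ' \t' '\n\n' | grep -Fxq -- "$1"
}

# Excerpt of the scheme list shown above.
schemes='cronobacter senterica vibrio ecoli_2 abaumannii streptomyces ecoli'

if printf '%s\n' "$schemes" | has_scheme vibrio; then
  echo "vibrio: available"
fi

# With mlst installed, the real list can be piped in instead:
#   mlst --list | has_scheme vibrio && echo "vibrio: available"
```

Matching whole names with grep -x avoids false positives from substrings such as ecoli inside ecoli_2.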

 

Run with a specified scheme.

mlst --scheme vibrio Vibrio*.gbk.gz > out

 

Run with a lower --minid.

mlst --scheme vibrio --minid 80 Vibrio*.gbk.gz > out

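In the output, an integer indicates an exact match to a known allele, while a value such as ~1 indicates a sequence similar to that allele (identity below 100% but at least --minid). For details, see the "Missing data" section of the GitHub README.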
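
When many genomes are typed at once, it can be useful to count how many loci in each row are not exact matches. The sketch below assumes the TSV layout shown earlier, with allele calls written as locus(id) from the fourth column onward, and treats anything other than a plain integer id (for example ~1, or 5? for a partial match as I understand the README's "Missing data" table) as inexact. `flag_inexact` is a hypothetical helper name, and the sample row with locusA, locusB, and locusC is invented for illustration, not real mlst output.

```shell
# flag_inexact is a hypothetical helper (not part of mlst):
# for each TSV row it prints the file name and the number of
# allele calls that are not a plain integer id.
flag_inexact() {
  awk -F'\t' '{
    n = 0
    for (i = 4; i <= NF; i++) {
      # An exact call looks like locus(12); anything else is counted.
      if ($i !~ /^[^(]+\([0-9]+\)$/) n++
    }
    print $1 "\t" n
  }'
}

# Invented example row: one similar allele (~1) and one partial allele (5?).
printf 'a.fa\tvibrio\t-\tlocusA(~1)\tlocusB(3)\tlocusC(5?)\n' | flag_inexact
```

Rows with a count above zero are the ones worth inspecting by hand, or rerunning with --novel to save the novel allele sequences.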

References

Seemann T, mlst, GitHub https://github.com/tseemann/mlst

 

"This publication made use of the PubMLST website (https://pubmlst.org/) developed by Keith Jolley (Jolley & Maiden 2010, BMC Bioinformatics, 11:595) and sited at the University of Oxford. The development of that website was funded by the Wellcome Trust".

 

Related