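Mapsembler2 is targeted assembly software. It takes any set of raw NGS reads and a starter sequence as input and, depending on the user's choice, outputs the neighborhood of that starter as either a linear sequence or a graph.
The following uses are suggested:
1. Validating an assembled sequence (checking for SNVs and SVs).
2. Checking whether a nucleotide sequence encoding a specific enzyme is present in a metagenomic read set.
3. Extending a sequence.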
Introduction
https://www.biostars.org/p/109189/
Explanation
https://www.biostars.org/p/109189/
Homepage
https://colibread.inria.fr/software/mapsembler2/
Note: audio will play.
Installation
# bioconda
conda create -n mapsembler2 -y
conda activate mapsembler2
conda install -c bioconda -y mapsembler2
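After installation, it can help to confirm that the executables are visible in the activated environment. The following is a minimal sketch, not part of the Mapsembler2 documentation: `check_tool` is a hypothetical helper name, and the two command names are simply the ones used in this post.

```shell
# Report whether a command is on PATH (hypothetical helper, not part of Mapsembler2).
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Check the two commands used in this post.
missing=0
for tool in mapsembler2_extend mapsembler2_extremities; do
    check_tool "$tool" || missing=1
done
```

If either line reports "missing", the most likely cause is that the `mapsembler2` environment is not activated in the current shell.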
> mapsembler2_extend
$ mapsembler2_extend
NAME
mapsembler_extend, version 1.0.0 - Copyright INRIA - CeCILL License
SYNOPSIS
mapsembler2_extend <extrem_kmers.fasta> <readsC1.fasta> [<readsC2.fasta> [<readsC3.fasta] ...] [-t extension_type] [-k value] [-c value] [-g value] [-i index_name] [-o name] [-h]
DESCRIPTION
TODO
OPTIONS
-t extension_type. Default: 1
1: a strict sequence: any branching stops the extension
2: a consensus sequence: contiging approach
3: a strict graph: any branching is conserved in the graph
4: a consensus graph: "small" polymorphism is merged, but "large" structures are represented
-k size_kmers: Size of the k-mers used duriung the extension phase Default: 31. Accepted range, depends on the compilation (make k=42 for instance)
-c min_coverage: a sequence is covered by at least min_coverage coherent reads. Default: 2
-g estimated_genome_size: estimation of the size of the genome whose reads come from.
It is in bp, does not need to be accurate, only controls memory usage. Default: 3 billion
-x node_len: limit max of nodes length. Default: 40
-y graph_max_depth: limit max of graph depth.Default: 10000
-i index_name: stores the index files in files starting with this prefix name. Can be re-used latter. Default: "index"
IF THE FILE "index_name.bloom" EXISTS: the index is not re-created
-o file_name_prefix: where to write outputs. Default: "res_mapsembler"
-p search_mod: kind of prosses Breadth or Depth. Default: Breadth
-h prints this message and exit
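The help text above says that the index is not re-created when `index_name.bloom` already exists. That seems to imply that an index left over from a different read set could be reused if the same prefix is kept, so checking for the file before a run may be worthwhile. A small sketch under that assumption; `index_status` and the prefix `sampleA_index` are hypothetical names chosen for illustration.

```shell
# Print "reuse" if <prefix>.bloom exists, "build" otherwise.
# index_status is a hypothetical helper, not part of Mapsembler2.
index_status() {
    prefix=${1:-index}   # "index" is the default prefix for -i per the help text
    if [ -e "${prefix}.bloom" ]; then
        echo "reuse"
    else
        echo "build"
    fi
}

# Example: check a per-sample prefix before passing it to -i.
echo "index for sampleA: $(index_status sampleA_index)"
```

Using one prefix per read set (for example `-i sampleA_index`) is one way to avoid picking up an unrelated index by accident.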
> mapsembler2_extremities
$ mapsembler2_extremities
USAGE for 'mapsembler2_extremities'
--k (1 arg) : kmer size that will be used for mapsembler2 [default '']
--starters (1 arg) : starters fasta file [default '']
--reads (1 arg) : reads dataset file name. Several reads sets can be provided, surrounded by "'s and separated by a space (e.g. --reads "reads1.fa reads2.fa") [default '']
--output (1 arg) : output substarters file name [default '']
--min-solid-subkmer (1 arg) : minimim abundance to keep a subkmer [default '3']
-debug (0 arg) : debugging
-nb-cores (1 arg) : number of cores [default '0']
-verbose (1 arg) : verbosity level [default '1']
-help (0 arg) : display help about possible options
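Based only on the options listed above, a call to `mapsembler2_extremities` might look like the command printed by the sketch below. The file names are placeholders, `extremities_cmd` is a hypothetical helper, and the command is printed rather than executed so it can be reviewed first. Since the help describes `--k` as the k-mer size "that will be used for mapsembler2", it is probably sensible to use the same value later for `-k` in `mapsembler2_extend`, though this post does not confirm that.

```shell
# Placeholder values; replace with your own files.
k=31
starters=starter.fasta
reads="pair_1.fq pair_2.fq"   # several read sets go in one quoted, space-separated string
out=substarters.fasta

# Print the command instead of running it (hypothetical helper).
extremities_cmd() {
    printf 'mapsembler2_extremities --k %s --starters %s --reads "%s" --output %s\n' \
        "$1" "$2" "$3" "$4"
}

extremities_cmd "$k" "$starters" "$reads" "$out"
```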
Usage
Specify the starter sequence and the paired-end FASTQ files (gzip-compressed files are also supported).
mapsembler2_extend starter.fasta pair_1.fq pair_2.fq -t 1 -k 31 -c 2
- -t extension_type. Default: 1
1: a strict sequence: any branching stops the extension
2: a consensus sequence: contiging approach
3: a strict graph: any branching is conserved in the graph
4: a consensus graph: "small" polymorphism is merged, but "large" structures are represented
- -k Size of the k-mers used during the extension phase. Default: 31. Accepted range depends on the compilation (make k=42 for instance)
- -c min_coverage: a sequence is covered by at least min_coverage coherent reads. Default: 2
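To compare the four extension types on the same data, the run can be repeated with a different `-t` value and a separate `-o` prefix each time. The sketch below only prints the commands; the input file names are the ones from the example above, while `extend_cmd` and the `res_t<N>` prefixes are hypothetical names chosen for illustration.

```shell
starter=starter.fasta
reads="pair_1.fq pair_2.fq"

# Print one mapsembler2_extend command for extension type $1 (1-4).
# The output prefix is derived from the type so results are not overwritten.
extend_cmd() {
    printf 'mapsembler2_extend %s %s -t %s -k 31 -c 2 -o res_t%s\n' \
        "$starter" "$reads" "$1" "$1"
}

for t in 1 2 3 4; do
    extend_cmd "$t"
done
```

Piping the printed lines to `sh` would execute them; reviewing them first is the safer habit.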
Citation
Related