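mapsembler2: assembling a target sequence and its neighboring regions

Mapsembler2 is a targeted assembly tool. It takes any set of raw NGS reads and a starter sequence as input and, depending on the user's choice, outputs the neighborhood of that starter sequence as either a linear sequence or a graph.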

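The following uses have been proposed.

1. Validating an assembled sequence (checking for SNVs and SVs).

2. Checking whether a nucleotide sequence encoding a particular enzyme is present in a metagenomic read set.

3. Extending a sequence.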

Introduction

https://www.biostars.org/p/109189/

Explanation

https://www.biostars.org/p/109189/

https://colibread.inria.fr/software/mapsembler2/

Note: this plays audio.

Installation

#bioconda(link)
conda create -n mapsembler2 -y
conda activate mapsembler2
conda install -c bioconda -y mapsembler2

> mapsembler2_extend

$ mapsembler2_extend

NAME

mapsembler_extend, version 1.0.0 - Copyright INRIA - CeCILL License

SYNOPSIS

mapsembler2_extend <extrem_kmers.fasta> <readsC1.fasta> [<readsC2.fasta> [<readsC3.fasta] ...] [-t extension_type] [-k value] [-c value] [-g value] [-i index_name] [-o name] [-h]

DESCRIPTION

TODO

OPTIONS

-t extension_type. Default: 1

1: a strict sequence: any branching stops the extension

2: a consensus sequence: contiging approach

3: a strict graph: any branching is conserved in the graph

4: a consensus graph: "small" polymorphism is merged, but "large" structures are represented

-k size_kmers: Size of the k-mers used duriung the extension phase Default: 31. Accepted range, depends on the compilation (make k=42 for instance)

-c min_coverage: a sequence is covered by at least min_coverage coherent reads. Default: 2

-g estimated_genome_size: estimation of the size of the genome whose reads come from.

It is in bp, does not need to be accurate, only controls memory usage. Default: 3 billion

-x node_len: limit max of nodes length. Default: 40

-y graph_max_depth: limit max of graph depth.Default: 10000

-i index_name: stores the index files in files starting with this prefix name. Can be re-used latter. Default: "index"

IF THE FILE "index_name.bloom" EXISTS: the index is not re-created

-o file_name_prefix: where to write outputs. Default: "res_mapsembler"

-p search_mod: kind of prosses Breadth or Depth. Default: Breadth

-h prints this message and exit

> mapsembler2_extremities

$ mapsembler2_extremities

USAGE for 'mapsembler2_extremities'

--k (1 arg) : kmer size that will be used for mapsembler2 [default '']

--starters (1 arg) : starters fasta file [default '']

--reads (1 arg) : reads dataset file name. Several reads sets can be provided, surrounded by "'s and separated by a space (e.g. --reads "reads1.fa reads2.fa") [default '']

--output (1 arg) : output substarters file name [default '']

--min-solid-subkmer (1 arg) : minimim abundance to keep a subkmer [default '3']

-debug (0 arg) : debugging

-nb-cores (1 arg) : number of cores [default '0']

-verbose (1 arg) : verbosity level [default '1']

-help (0 arg) : display help about possible options
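
The two help messages above suggest a two-step workflow: mapsembler2_extremities writes a "substarters" file, and mapsembler2_extend takes an <extrem_kmers.fasta> file as its first argument. The following is a minimal sketch under that assumption. Only the flags come from the help text; the file names (starter.fasta, pair_1.fq, pair_2.fq, substarters.fasta) are hypothetical, and whether the substarters file is the intended input of the extension step is my reading of the help output rather than something confirmed here. The script prints both commands and runs them only if the tool is on PATH and the starter file exists.

```shell
#!/bin/sh
# Hypothetical file names; replace with your own data.
STARTERS="starter.fasta"
READS="pair_1.fq pair_2.fq"
K=31
SUBSTARTERS="substarters.fasta"   # assumed to be usable as <extrem_kmers.fasta>

# Step 1: compute starter extremities (flags as listed in the help text).
# Several read sets are passed as one quoted, space-separated string.
EXTREMITIES_CMD="mapsembler2_extremities --k $K --starters $STARTERS --reads \"$READS\" --output $SUBSTARTERS --min-solid-subkmer 3"

# Step 2: extend from the extremities; the same k is used in both steps.
EXTEND_CMD="mapsembler2_extend $SUBSTARTERS $READS -t 1 -k $K -c 2 -o res_mapsembler"

# Always show what would be run.
echo "$EXTREMITIES_CMD"
echo "$EXTEND_CMD"

# Execute only when the tool is installed and the starter file is present.
if command -v mapsembler2_extremities >/dev/null 2>&1 && [ -f "$STARTERS" ]; then
    eval "$EXTREMITIES_CMD" && eval "$EXTEND_CMD"
fi
```

Keeping k in a single variable avoids a mismatch between the k-mer size used for the extremities and the one used for the extension.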

Usage

Specify the starter sequence and the paired-end FASTQ files (gzip-compressed files are also supported).

mapsembler2_extend starter.fasta pair_1.fq pair_2.fq -t 1 -k 31 -c 2

-t extension_type. Default: 1
1: a strict sequence: any branching stops the extension
2: a consensus sequence: contiging approach
3: a strict graph: any branching is conserved in the graph
4: a consensus graph: "small" polymorphism is merged, but "large" structures are represented
-k Size of the k-mers used duriung the extension phase Default: 31. Accepted range, depends on the compilation (make k=42 for instance)
-c min_coverage: a sequence is covered by at least min_coverage coherent reads. Default: 2
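
To compare the four extension types on the same data, the command above can be repeated with a different -t value and a separate output prefix (-o) for each run. The sketch below is illustrative only: the input file names and the res_t1 ... res_t4 prefixes are hypothetical. It passes the same -i index prefix to every run, relying on the help text's note that an existing index_name.bloom is not re-created; I assume, without having verified it, that the index stays valid when only -t changes.

```shell
#!/bin/sh
# Build one mapsembler2_extend command for a given extension type.
# $1 = extension type (1-4, as listed in the help text).
build_cmd() {
    echo "mapsembler2_extend starter.fasta pair_1.fq pair_2.fq -t $1 -k 31 -c 2 -i index -o res_t$1"
}

for t in 1 2 3 4; do
    cmd=$(build_cmd "$t")
    echo "$cmd"
    # Run only when the tool is installed and the starter file exists.
    if command -v mapsembler2_extend >/dev/null 2>&1 && [ -f starter.fasta ]; then
        $cmd
    fi
done
```

Types 1 and 2 produce sequences and types 3 and 4 produce graphs, so distinct prefixes keep the results from overwriting each other.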