Kmasker - macでインフォマティクス

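Many plant genomes contain a high proportion of repetitive sequences. Assembling these complex genomes from high-throughput sequencing reads remains a challenging task. Underestimating or ignoring the repeat complexity of these data sets can easily mislead downstream analyses. Detecting repetitive regions by k-mer counting has proven reliable, and easy-to-use applications that exploit k-mer counts are in high demand, especially in the plant sciences.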
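To make the idea of repeat detection by k-mer counting concrete, here is a minimal Python sketch. It is a toy illustration under stated assumptions, not Kmasker's implementation (the help output shown later suggests Kmasker delegates counting to jellyfish). The function names `count_kmers` and `mask_repeats`, the k-mer length of 4, and the count threshold of 3 are all hypothetical choices made for this example.

```python
from collections import Counter


def count_kmers(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def mask_repeats(seq: str, counts: Counter, k: int, threshold: int) -> str:
    """Replace with 'X' every base covered by a k-mer seen >= threshold times."""
    masked = list(seq)
    for i in range(len(seq) - k + 1):
        if counts[seq[i:i + k]] >= threshold:
            masked[i:i + k] = "X" * k
    return "".join(masked)


# Toy "genome": a unique flank, a tandem repeat, another unique flank.
genome = "ACGTTGCA" + "GAGAGAGAGAGA" + "TCCATGGC"
counts = count_kmers(genome, k=4)
masked = mask_repeats(genome, counts, k=4, threshold=3)
print(masked)
```

The tandem repeat is the only region whose 4-mers occur often enough to cross the threshold, so it is replaced by X while the flanking sequence is left untouched. Real tools count k-mers in sequencing reads and use much longer k-mers; the principle of thresholding on k-mer frequency is the same.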

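The authors present Kmasker plants, a tool that uses k-mer count information as an assistant throughout the analysis workflow of genome data, offered as both a command-line and a web-based solution. In addition to its core competence of screening and masking repetitive sequences, it integrates features that enable comparative studies between different cultivars or closely related species, and a method to estimate the target specificity of guide RNAs for site-directed mutagenesis with the Cas9 endonuclease. Furthermore, a web service for Kmasker plants has been set up that maintains pre-computed indices for 10 of the economically most important cultivated plant species.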

Kmasker plants ‐ a tool for assessing complex sequence space in plant species https://t.co/fcfWCDhQhk
Uwe Scholz (@UweScholz271), December 11, 2019

Web service FAQ

https://kmasker.ipk-gatersleben.de//?id=faq

tutorial

https://doi.ipk-gatersleben.de/DOI/10fdd0bb-825f-459a-9d08-7c04066208f0/77a28b20-87c1-402b-ae36-47af029956e0/2

Installation

Tested on Ubuntu 18.04 LTS.

GitHub

# Bioconda
conda create -n Kmasker Kmasker
conda activate Kmasker

Check Kmasker installation

> Kmasker --check_config --verbose

help

> Kmasker

Usage of program Kmasker:

(version: 1.1.1 rc231015) (session id: 3shXCd251r)

Description:

Kmasker is a tool for the automatic detection of repetitive sequence regions.

There are three modules and you should select one for your analysis.

Modules:

--build construction of new index (requires --seq)

--run perform analysis and masking (requires --fasta)

--explore perform downstream analysis with constructed index and detected repeats

General options:

--show_repository show complete list of private and external k-mer indices

--show_details show details for a requested kindex

--show_path show path Kmaskers looks for constructed kindex

--remove_kindex remove kindex from repository

--set_private_path change path to private repository

--set_external_path change path to external repository [readonly]

--expert_setting_kmasker submit individual parameter to Kmasker eg. pctgap, minseed, mingff (see documentation!)

--expert_setting_jelly submit individual parameter to jellyfish (e.g. on memory usage for index construction)

--expert_setting_blast submit individual parameter to blast (e.g. '-evalue')

--threads set number of threads [4]

--bed force additional BED output [off]

--user_conf set specific user configuration file [/Users/kazu/.kmasker_user.config]

--global_conf set specific global configuration file [/Users/kazu/anaconda3/envs/Kmasker/etc/kmasker.config]

--check_install shows the detected/configured path for all used applications

--setid set a user specified process id

--long_id create a process id that is unique for this host (e.g. for use in cluster environments)

--temp sets the location of temporary files [./temp/]

--verbose enables verbose output and keeps log files

--make_model For use with krispr: Build a new krispr model. You have to specifiy a .csv after this paramter. Details at https://git.io/JecYI. You can use -m to specify the coverage threshold.

> Kmasker --build

Usage

1. build - set the repository path (first time only)

Kmasker --build --set_private_path path/to/directory

2. indexing - build the k-mer index structure

Kmasker --build --seq input.fq --gs 135 --in At1 --cn arabidopsis
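For scripted use, the two build commands above can be assembled programmatically. The following Python sketch only reproduces those two invocations. The helper names `build_commands` and `run_all` are hypothetical, and the flag meanings in the comments (genome size in Mbp for `--gs`, index name for `--in`, common name for `--cn`) are assumptions inferred from the example values, not taken from the Kmasker documentation. By default nothing is executed; the commands are only printed.

```python
import shlex
import subprocess


def build_commands(private_path, reads, genome_size, index_name, common_name):
    """Return the two --build invocations from steps 1 and 2 as argv lists."""
    set_path = ["Kmasker", "--build", "--set_private_path", private_path]
    make_index = [
        "Kmasker", "--build",
        "--seq", reads,
        "--gs", str(genome_size),  # assumed: genome size in Mbp
        "--in", index_name,        # assumed: short name of the new index
        "--cn", common_name,       # assumed: common name of the species
    ]
    return [set_path, make_index]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


commands = build_commands("path/to/directory", "input.fq", 135, "At1", "arabidopsis")
run_all(commands)  # dry run: prints the two commands without executing them
```

Passing `dry_run=False` would run the commands in the active environment, which requires Kmasker to be installed and on the `PATH`.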

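3. run - the core Kmasker process

There are four general options: 1) basic k-mer analysis using SINGLE or MULTIPLE index structures, 2) screening for candidate sequences suitable for fluorescence in situ hybridization (FISH), 3) comparative analysis that searches for differences between the applied k-mer index structures, and 4) analysis of short sequence probes to check their genome-wide specificity.

1) Basic k-mer analysis - specify the index you built and the genome FASTA file.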

Kmasker --run --fasta query.fasta --kindex At1

KMASKER_masked_KDX_At1_1Cwrcpk6RY.fasta

Repetitive regions are masked with X. For other uses of the run command, see the GitHub repository and the paper.
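A quick way to check how much of each sequence was masked is to count the X characters in the output FASTA. The sketch below is a hypothetical helper (`masked_fraction` is not part of Kmasker) and is demonstrated on made-up records rather than real Kmasker output.

```python
import io


def masked_fraction(handle):
    """Return {record_id: fraction of bases that are 'X'} for a FASTA stream."""
    totals = {}  # record id -> [masked bases, all bases]
    current = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            current = header[0] if header else ""
            totals[current] = [0, 0]
        elif current is not None:
            totals[current][0] += line.upper().count("X")
            totals[current][1] += len(line)
    return {name: (x / n if n else 0.0) for name, (x, n) in totals.items()}


# Made-up example records, not real Kmasker output.
example = io.StringIO(">chr_demo_1\nACGTXXXXACGT\nXXXX\n>chr_demo_2\nACGTACGT\n")
fractions = masked_fraction(example)
print(fractions)
```

To apply it to a real result, open the masked file (for example `KMASKER_masked_KDX_At1_1Cwrcpk6RY.fasta` from the run above) with `open(...)` and pass the file handle to `masked_fraction`.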

Web service

Go to https://kmasker.ipk-gatersleben.de.

Enter your e-mail address and select the plant species (indices for the listed k-mer lengths are pre-built and maintained).

Upload the genome sequence you want to repeat-mask.

See the FAQ for details on the parameters.

Citation

Kmasker plants ‐ a tool for assessing complex sequence space in plant species
Sebastian Beier, Chris Ulpinnis, Markus Schwalbe, Thomas Münch, Robert Hoffie, Iris Koeppel, Christian Hertig, Nagaveni Budhagatapalli, Stefan Hiekel, Krishna Mohan Pathi, Goetz Hensel, Martin Grosse, Sindy Chamas, Sophia Gerasimova, Jochen Kumlehn, Uwe Scholz, Thomas Schmutzer

Plant J. 2020 May;102(3):631-642