2019-08-10

Shasta: a high-performance long-read assembler that can handle the human genome

2020/03/14 Added video

2020/09/30 Added paper citation

2022/02/04 v0.9

2022/06/08 Fixed commands and updated the help output following the update

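Current workflows for producing human genome assemblies from long-read sequencing technologies have cost and production-time bottlenecks that prevent efficient scaling to large cohorts. The authors demonstrate an optimized PromethION nanopore sequencing method for 11 human genomes. Sequencing run on a single machine over nine days, using just three flow cells per sample, achieved an average of 63x coverage, a read N50 of 42 Kb, 90% average read identity, and 6.5x coverage in reads of 100 Kb or longer. To assemble these data, the authors introduce new computational tools: Shasta, a de novo long-read assembler, and MarginPolish & HELEN, a suite of nanopore assembly polishing algorithms. On a single commercial compute node, Shasta can produce a complete human genome assembly in under six hours, and MarginPolish & HELEN can polish the result in roughly a day, achieving 99.9% identity (QV30) for haploid samples from nanopore reads alone. The authors evaluate assembly performance for diploid, haploid, and trio-binned human samples in terms of accuracy, cost, and time, and demonstrate improvements over current state-of-the-art methods in all areas. They further show that adding Hi-C sequencing yields chromosome-level scaffolds for all 11 genomes.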

user documentation

https://chanzuckerberg.github.io/shasta/QuickStart.html

2022/02/04

🧬Shasta 0.9.0 is now available!🧬
*Improved de novo long read phased assembly
*Higher quality, fewer artifacts
We are actively developing phased assemblies and love feedback. Please give it a try and reach out to us on GitHub.@BenedictPaten @cziscience https://t.co/tFcYF2KxHy
— Dr. Sara Simmonds 🧬🐚🐠🌱🌸 (@SeSimmonds) February 3, 2022

2020/10/06

New Shasta v0.6 release: https://t.co/3F3MbCoV0I
Huge progress on almost every front; first hints of haplotype resolved ONT only assembly. @nanopore @cziscience
— Benedict Paten (@BenedictPaten) October 6, 2020

London Calling 2019


Shasta is also mentioned (around 18:50).

Quickstart

https://chanzuckerberg.github.io/shasta/QuickStart.html#QuickStartLinux

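Installation

I tested using the Docker image provided by the authors.

Main repository: GitHub

Development-stage binaries are distributed for each platform (they could not be downloaded at the time of testing).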

#conda
mamba create -n shasta -y
conda activate shasta
mamba install -c bioconda shasta -y

#Pull the Docker image (latest tag)
docker pull tpesout/shasta:latest

#Older versions
#linux (Ubuntu 16.04 and 18.04, Linux Mint 18.3, CentOS 7.6, Debian 9, Fedora 29)
wget https://github.com/chanzuckerberg/shasta/releases/download/X.Y.Z/shasta-Linux-X.Y.Z
chmod ugo+x shasta-Linux-X.Y.Z

#macos
curl -O -L https://github.com/chanzuckerberg/shasta/releases/download/X.Y.Z/shasta-macOS-X.Y.Z
#Rename the downloaded file to shasta-macOS-X.Y.Z
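
The download steps above can be wrapped in a small helper. This is a minimal sketch, not part of Shasta: the function names `shasta_release_url` and `fetch_shasta` are hypothetical, `X.Y.Z` stays a placeholder for a real release number, and the URL pattern is simply the one shown above. Asset names for a particular release may differ, so check the releases page before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: build the release URL from platform and version,
# following the pattern .../releases/download/X.Y.Z/shasta-<platform>-X.Y.Z
shasta_release_url() {
    # $1: platform ("Linux" or "macOS"), $2: version (X.Y.Z)
    printf 'https://github.com/chanzuckerberg/shasta/releases/download/%s/shasta-%s-%s\n' \
        "$2" "$1" "$2"
}

# Hypothetical helper: download the binary and make it executable.
# Nothing is downloaded unless this function is called explicitly.
fetch_shasta() {
    # -f: fail on HTTP errors, -L: follow redirects, -O: keep the remote file name
    curl -fL -O "$(shasta_release_url "$1" "$2")" && chmod ugo+x "shasta-$1-$2"
}
```

For example, `fetch_shasta Linux X.Y.Z` (with a real version substituted) would download the file into the current directory and set the executable bit.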

> shasta

Shasta Release 0.8.0
2022-Jun-07 23:51:27.173048 Assembly begins with the following command line:
shasta
Option "--config" is missing and is now required to run an assembly.
It must specify either a configuration file
or one of the following built-in configurations:
    HiFi-Oct2021
    Nanopore-Dec2019
    Nanopore-Jun2020
    Nanopore-Oct2021
    Nanopore-OldGuppy-Sep2020
    Nanopore-Phased-Aug2021
    Nanopore-Plants-Apr2021
    Nanopore-Sep2020
    Nanopore-UL-Dec2019
    Nanopore-UL-Jun2020
    Nanopore-UL-Oct2021
    Nanopore-UL-Phased-Oct2021
    Nanopore-UL-Sep2020
    Nanopore-UL-iterative-Sep2020
2022-Jun-07 23:51:27.177074 Option "--config" is missing and is now required to run an assembly.
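
As the message above shows, an assembly cannot start without `--config`. The following is a minimal sketch of how a run might be put together. It only prints the command line rather than executing it. The function names `is_builtin_config` and `assemble_cmd` and the file name `reads.fasta` are hypothetical, and `Nanopore-Oct2021` is picked arbitrarily for illustration; which configuration suits a given dataset is not covered here. The options `--input`, `--config`, and `--assemblyDirectory` are the ones listed by `shasta -h`.

```shell
#!/bin/sh
# Returns 0 if $1 is one of the built-in configuration names
# printed by Shasta 0.8.0 in the output above.
is_builtin_config() {
    case "$1" in
        HiFi-Oct2021) return 0 ;;
        Nanopore-Dec2019|Nanopore-Jun2020|Nanopore-Sep2020|Nanopore-Oct2021) return 0 ;;
        Nanopore-OldGuppy-Sep2020|Nanopore-Plants-Apr2021) return 0 ;;
        Nanopore-Phased-Aug2021) return 0 ;;
        Nanopore-UL-Dec2019|Nanopore-UL-Jun2020|Nanopore-UL-Sep2020|Nanopore-UL-Oct2021) return 0 ;;
        Nanopore-UL-Phased-Oct2021|Nanopore-UL-iterative-Sep2020) return 0 ;;
        *) return 1 ;;
    esac
}

# Prints the assemble command line instead of running it.
assemble_cmd() {
    # $1: reads file, $2: configuration, $3: output directory (optional)
    printf 'shasta --input %s --config %s --assemblyDirectory %s\n' \
        "$1" "$2" "${3:-ShastaRun}"
}

CONFIG="Nanopore-Oct2021"   # arbitrary choice for illustration
if is_builtin_config "$CONFIG"; then
    assemble_cmd reads.fasta "$CONFIG"
else
    echo "$CONFIG is not a built-in name (it may still be a configuration file)" >&2
fi
```

Note that, according to the help text, the output directory must not already exist when the command is `assemble`.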

> shasta -h

Options allowed only on the command line:

-h [ --help ]
    Write a help message.
-v [ --version ]
    Identify the Shasta version.
--config arg
    Configuration name. Can be the name of a built-in configuration or the name of a configuration file.
--input arg
    Names of input files containing reads. Specify at least one.
--assemblyDirectory arg (=ShastaRun)
    Name of the output directory. If command is assemble, this directory must not exist.
--command arg (=assemble)
    Command to run. Must be one of: assemble, saveBinaryData, cleanupBinaryData, explore, createBashCompletionScript
--memoryMode arg (=anonymous)
    Specify whether allocated memory is anonymous or backed by a filesystem. Allowed values: anonymous, filesystem.
--memoryBacking arg (=4K)
    Specify the type of pages used to back memory. Allowed values: disk, 4K, 2M (for best performance). All combinations (memoryMode, memoryBacking) are allowed except for (anonymous, disk). Some combinations require root privilege, which is obtained using sudo and may result in a password prompting depending on your sudo set up.
--threads arg (=0)
    Number of threads, or 0 to use one thread per virtual processor.
--exploreAccess arg (=user)
    Specify allowed access for --command explore. Allowed values: user, local, unrestricted. DO NOT CHANGE FROM DEFAULT VALUE WITHOUT UNDERSTANDING THE SECURITY IMPLICATIONS.
--port arg (=17100)
    Port to be used by the http server (command --explore).
--alignmentsPafFile arg
    The name of a PAF file containing alignments of reads to a reference. Only used for --command explore, for display of the alignment candidate graph. Experimental.
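
The help text for `--memoryMode` and `--memoryBacking` states that every combination is allowed except (anonymous, disk). A minimal sketch of that rule as a pre-flight check is shown here. The function name `valid_memory_combo` is hypothetical and this is not how Shasta itself validates the options. It encodes only what the help text says and does not try to guess which combinations need root privilege.

```shell
#!/bin/sh
# Returns 0 if the (memoryMode, memoryBacking) pair is allowed
# according to the help text, 1 otherwise.
valid_memory_combo() {
    # $1: --memoryMode value, $2: --memoryBacking value
    case "$1" in
        anonymous|filesystem) ;;
        *) return 1 ;;
    esac
    case "$2" in
        disk|4K|2M) ;;
        *) return 1 ;;
    esac
    # The only pair the help text rules out.
    if [ "$1" = "anonymous" ] && [ "$2" = "disk" ]; then
        return 1
    fi
    return 0
}

# List every pair with its status.
for mode in anonymous filesystem; do
    for backing in disk 4K 2M; do
        if valid_memory_combo "$mode" "$backing"; then
            echo "$mode $backing: allowed"
        else
            echo "$mode $backing: not allowed"
        fi
    done
done
```

A check like this can run before building a long command line, so that an invalid pair is caught before any reads are loaded.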

Options allowed on the command line and in the config file:

--Reads.minReadLength arg (=10000)
    Read length cutoff. Shorter reads are discarded.
--Reads.desiredCoverage arg (=0)
    Reduce coverage to desired value. If not zero, specifies desired coverage (number of bases). The read length cutoff specified via --Reads.minReadLength is increased to reduce coverage to the specified value. Power of 10 multipliers can be used, for example 120Gb to request 120 Gb of coverage.
--Reads.noCache
    If set, skip the Linux cache when loading reads. This is done by specifying the O_DIRECT flag when opening input files containing reads.
--Reads.palindromicReads.skipFlagging
    Skip flagging palindromic reads. Oxford Nanopore reads should be flagged for better results.

--Reads.palindromicReads.maxSkip arg (=100)