Snipit: generating publication-quality single nucleotide polymorphism visualization plots

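Snipit is an analysis and visualization tool designed to summarize single nucleotide polymorphisms in sequences by comparison with a reference sequence. The tool efficiently catalogs nucleotide and amino acid differences and enables clear comparisons through customizable, publication-ready figures. With configurable colour palettes, customizable record sorting, and the ability to output figures in multiple formats, snipit offers an easy-to-use interface for researchers across disciplines. In addition, snipit includes a dedicated recombi-mode for illustrating recombination patterns, which can highlight relationships between sequences that are otherwise difficult to detect.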

Snipit is an open-source, Python-based tool hosted on GitHub (https://github.com/aineniamh/snipit) under the GNU GPL 3.0 license. It can be installed from PyPI with pip. The source code and additional documentation are in the GitHub repository.

Installation

GitHub

#pip (pypi)
pip install snipit

> snipit

usage: snipit <alignment> [options]

snipit

optional arguments:
  -h, --help            show this help message and exit

Input options:
  alignment             Input alignment fasta file
  -t {nt,aa}, --sequence-type {nt,aa}
                        Input sequence type: aa or nt
  -r REFERENCE, --reference REFERENCE
                        Indicates which sequence in the alignment is the
                        reference (by sequence ID). Default: first sequence in
                        alignment
  -l LABELS, --labels LABELS
                        Optional csv file of labels to show in output snipit
                        plot. Default: sequence names
  --l-header LABEL_HEADERS
                        Comma separated string of column headers in label csv.
                        First field indicates sequence name column, second the
                        label column. Default: 'name,label'

Mode options:
  --recombi-mode        Allow colouring of query seqeunces by mutations
                        present in two 'recombi-references' from the input
                        alignment fasta file
  --recombi-references RECOMBI_REFERENCES
                        Specify two comma separated sequence IDs in the input
                        alignment to use as 'recombi-references'. Ex.
                        Sequence_ID_A,Sequence_ID_B
  --cds-mode            Assumes sequence supplied is a coding sequence

Output options:
  -d OUTPUT_DIR, --output-dir OUTPUT_DIR
                        Output directory. Default: current working directory
  -o OUTFILE, --output-file OUTFILE
                        Output file name stem. Default: snp_plot
  -s, --write-snps      Write out the SNPs in a csv file.
  -f FORMAT, --format FORMAT
                        Format options (png, jpg, pdf, svg, tiff) Default: png

Figure options:
  --height HEIGHT       Overwrite the default figure height
  --width WIDTH         Overwrite the default figure width
  --size-option SIZE_OPTION
                        Specify options for sizing. Options: expand, scale
  --solid-background    Force the plot to have a solid background, rather than
                        a transparent one.
  -c , --colour-palette
                        Specify colour palette. Options: [classic,
                        classic_extended, primary, purine-pyrimidine,
                        greyscale, wes, verity, ugene]. Use ugene for protein
                        alignments.
  --flip-vertical       Flip the orientation of the plot so sequences are
                        below the reference rather than above it.
  --sort-by-mutation-number
                        Render the graph with sequences sorted by the number
                        of SNPs relative to the reference (fewest to most).
                        Default: False
  --sort-by-id          Sort sequences alphabetically by sequence id. Default:
                        False
  --sort-by-mutations SORT_BY_MUTATIONS
                        Sort sequences by bases at specified positions.
                        Positions are comma separated integers. Ex. '1,2,3'
  --high-to-low         If sorted by mutation number is selected, show the
                        sequences with the fewest SNPs closest to the
                        reference. Default: False
  --remove-site-text    Do not annotate text on the individual columns in the
                        figure.

SNP options:
  --show-indels         Include insertion and deletion mutations in snipit
                        plot.
  --include-positions INCLUDED_POSITIONS [INCLUDED_POSITIONS ...]
                        One or more range (closed, inclusive; one-indexed) or
                        specific position only included in the output. Ex.
                        '100-150' or Ex. '100 101' Considered before
                        '--exclude-positions'.
  --exclude-positions EXCLUDED_POSITIONS [EXCLUDED_POSITIONS ...]
                        One or more range (closed, inclusive; one-indexed) or
                        specific position to exclude in the output. Ex.
                        '100-150' or Ex. '100 101' Considered after
                        '--include-positions'.
  --ambig-mode {all,snps,exclude}
                        Controls how ambiguous bases are handled - [all]
                        include all ambig such as N,Y,B in all positions;
                        [snps] only include ambig if a snp is present at the
                        same position; [exclude] remove all ambig, same as
                        depreciated --exclude-ambig-pos

Misc options:
  -v, --version         show program's version number and exit

Usage

Specify a multi-FASTA alignment file and an output prefix. The test.fasta file in the repository contains six genomes (all are 29,903 bp long, so they are presumably SARS-CoV-2).

git clone https://github.com/aineniamh/snipit.git
cd snipit/docs/
snipit test.fasta --output-file test

Only a PNG file, test.png, is written.

(It is a transparent PNG, so the figure has no background; --solid-background forces a solid one.)

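The supplied alignment is assumed to be nucleotide sequence ("-t nt"), and a plot is produced in which SNPs are coloured by base change. Ambiguous changes are shown in grey.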
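To make the idea of "SNPs relative to a reference" concrete, here is a minimal Python sketch. It is not snipit's own code: the helper names `parse_fasta` and `find_snps` are hypothetical, the first record is taken as the reference (mirroring the documented default), and ambiguous bases and gaps are simply skipped, which is a simplification of what snipit draws.

```python
# Illustrative sketch only; this is not snipit's internal implementation.

def parse_fasta(text):
    """Parse FASTA text into an ordered list of (sequence_id, sequence) pairs."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks).upper()))
            # Use the first whitespace-delimited token of the header as the ID.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks).upper()))
    return records


def find_snps(records, reference_id=None):
    """Return {query_id: [(position, ref_base, query_base), ...]}, one-indexed."""
    sequences = dict(records)
    # Assumption: default to the first record, as the help text describes.
    ref_id = reference_id if reference_id is not None else records[0][0]
    ref_seq = sequences[ref_id]
    snps = {}
    for name, seq in records:
        if name == ref_id:
            continue
        if len(seq) != len(ref_seq):
            raise ValueError(f"{name} is not the same length as the reference")
        # Simplification: only unambiguous A/C/G/T differences are reported.
        snps[name] = [
            (i + 1, r, q)
            for i, (r, q) in enumerate(zip(ref_seq, seq))
            if r != q and r in "ACGT" and q in "ACGT"
        ]
    return snps


example = """>ref
ACGTACGT
>query_1
ACGTACGA
>query_2
TCGTNCGT
"""
snps = find_snps(parse_fasta(example))
print(snps)
```

In this toy alignment, query_1 differs from the reference at position 8 and query_2 at position 1; the N at position 5 of query_2 is ignored by the sketch.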

Switch to PDF output

snipit test.fasta --output-file prefix -f pdf

-f, --format Format options (png, jpg, pdf, svg, tiff) Default: png

Change the colour palette to primary

snipit test.fasta --output-file prefix -c primary

-c , --colour-palette Specify colour palette. Options: [classic, classic_extended, primary, purine-pyrimidine, greyscale, wes, verity, ugene]. Use ugene for protein alignments.

Amino acid alignments can also be visualized. The ugene palette is recommended for them.

snipit aa_alignment.fasta --output-file prefix -c ugene -t aa

-t {nt, aa}, --sequence-type {nt, aa} Input sequence type: aa or nt

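In recombi-mode, specifying references that represent the parental populations highlights the mutations in recombinant sequences. The reference and the parents are specified by their sequence names in the multi-FASTA file.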

snipit test.fasta --reference USA_3 --recombi-mode --recombi-references "USA_1,USA_2"

-r , --reference Indicates which sequence in the alignment is the reference (by sequence ID). Default: first sequence in alignment
--recombi-mode Allow colouring of query seqeunces by mutations present in two 'recombi-references' from the input alignment fasta file
--recombi-references Specify two comma separated sequence IDs in the input alignment to use as 'recombi-references'. Ex. Sequence_ID_A,Sequence_ID_B

Mutations are coloured according to whether they are specific to one parent (and which one) or are unique mutations present in neither parental population.
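The colouring rule described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not snipit's implementation: the function name `classify_query_sites` and the label strings are hypothetical, and the "both" category for sites shared by the two parents is an assumption added for completeness.

```python
# Illustrative sketch only; names and labels are hypothetical, not snipit's.

def classify_query_sites(reference, parent_a, parent_b, query):
    """Label each query site that differs from the reference.

    Returns {one_indexed_position: label}, where label is one of
    'parent_a', 'parent_b', 'both' (assumed category) or 'private'.
    """
    labels = {}
    columns = zip(reference, parent_a, parent_b, query)
    for pos, (ref, a, b, q) in enumerate(columns, start=1):
        if q == ref:
            continue  # not a mutation relative to the reference
        in_a, in_b = q == a, q == b
        if in_a and in_b:
            labels[pos] = "both"
        elif in_a:
            labels[pos] = "parent_a"
        elif in_b:
            labels[pos] = "parent_b"
        else:
            labels[pos] = "private"  # present in neither parent
    return labels


reference = "AAAAAAAA"
parent_a = "CAAACAAA"
parent_b = "AAGACAAT"
query = "CAGACTAA"
labels = classify_query_sites(reference, parent_a, parent_b, query)
print(labels)
```

Here position 1 matches only the first parent, position 3 only the second, position 5 both, and position 6 neither, which is the kind of pattern that makes a breakpoint between parental blocks visible in the plot.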

From the paper and the repository

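The motivation for developing snipit was the need to compare subtly different SARS-CoV-2 genome sequences in support of outbreak investigations with the outbreak investigation software civet (O'Toole et al. 2022). Its functionality has since been extended to include various modes and customizable options. snipit can potentially be used in any investigation that requires comparing SNPs, including viral outbreak investigation, detection and analysis of recombinants, and AMR detection in bacteria.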
Regions to include in or exclude from the visualization are specified with "--include-positions" and "--exclude-positions". Individual positions can also be specified.
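According to the help text, position arguments are one-indexed, ranges are closed (inclusive), and inclusion is considered before exclusion. The following Python sketch shows one way such a filter could behave; the helpers `parse_positions` and `filter_positions` are hypothetical and are not taken from snipit's source.

```python
# Illustrative sketch only; helper names are hypothetical, not snipit's.

def parse_positions(specs):
    """Expand specs such as ['100-150', '200'] into a set of one-indexed positions."""
    positions = set()
    for spec in specs:
        if "-" in spec:
            start, end = (int(part) for part in spec.split("-", 1))
            if start > end:
                raise ValueError(f"invalid range: {spec}")
            # Closed, inclusive range: both endpoints are kept.
            positions.update(range(start, end + 1))
        else:
            positions.add(int(spec))
    return positions


def filter_positions(snp_positions, include=None, exclude=None):
    """Apply the include filter first, then the exclude filter."""
    kept = set(snp_positions)
    if include:
        kept &= parse_positions(include)
    if exclude:
        kept -= parse_positions(exclude)
    return sorted(kept)


kept = filter_positions(
    [5, 100, 120, 150, 151, 300],
    include=["100-150", "300"],
    exclude=["120"],
)
print(kept)
```

With these inputs, positions 5 and 151 fall outside the included regions and 120 is then excluded, leaving 100, 150 and 300.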

Citation

Publication-ready single nucleotide polymorphism visualization with snipit

Áine O’Toole, Ammar Aziz, Daniel Maloney

Bioinformatics, Volume 40, Issue 8, August