A collection of notes on informatics for HTS (NGS).

Splitting a multi-FASTA (split)

2018-10-26: added notes

2019-10-28: added installation instructions

2020-04-29: added notes





conda install -c bioconda -y bbmap

$ -h


Written by Brian Bushnell

Last modified April 17, 2018


Description:  Splits a sequence file evenly into multiple files.


Usage: in=<file> in2=<file2> out=<outfile> out2=<outfile2> ways=<number>


in2 and out2 are for paired reads and are optional.

If input is paired and out2 is not specified, data will be written interleaved.

Output filenames MUST contain a '%' symbol.  This will be replaced by a number.


Parameters and their defaults:


in=<file>       Input file.

out=<file>      Output file pattern.

ways=-1         The number of output files to create; must be positive.


ow=f            (overwrite) Overwrites files that already exist.

app=f           (append) Append to files that already exist.

zl=4            (ziplevel) Set compression level, 1 (low) to 9 (max).

int=f           (interleaved) Determines whether INPUT file is considered interleaved.


Java Parameters:

-Xmx            This will set Java's memory usage, overriding autodetection.

                -Xmx20g will specify 20 gigs of RAM, and -Xmx200m will specify 200 megs.

                    The max is typically 85% of physical memory.

-eoom           This flag will cause the process to exit if an out-of-memory

                exception occurs.  Requires Java 8u92+.

-da             Disable assertions.


Please contact Brian Bushnell at if you encounter any problems.




To split a FASTA containing five chromosomes, run the command with the following arguments:
in=input.fasta out=chromosome%.fasta ways=5
  • ways=-1 The number of output files to create; must be positive.


For large genomes such as the human genome, add an option such as -Xmx20G (this lets Java use 20 GB of memory):
-Xmx20G in=hs37d5.fa out=chromosome%.fasta ways=86
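Both examples above set ways to the number of sequences in the input (5 chromosomes, 86 sequences in hs37d5.fa). The following is a minimal sketch of deriving that number instead of hard-coding it, assuming every record starts with a '>' header line. The toy file and the variable names are made up for illustration, and the argument string is only printed, not executed.

```shell
#!/bin/sh
# Toy multi-FASTA in a temporary directory (hypothetical data).
workdir=$(mktemp -d)
fasta="$workdir/input.fasta"
printf '>chr1\nACGT\n>chr2\nGGCC\n>chr3\nTTAA\n' > "$fasta"

# Count header lines: one header per sequence.
n_seqs=$(grep -c '^>' "$fasta")

# Build the same kind of argument string as in the examples above.
args="in=$fasta out=chromosome%.fasta ways=$n_seqs"
echo "$args"
```

Anchoring the pattern with '^' counts only lines that begin with '>', so a '>' appearing later in a header line is not counted twice.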




EMBOSS: seqretsplit

seqretsplit input_multi.fasta out

seqretsplit input* out




grep -c "^>" input.fasta



csplit -z input.fasta '/>/' '{*}'
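The two commands above can be combined into a small sanity check: count the headers, split, then confirm that the number of output files matches. This is a sketch that assumes GNU csplit (-z and '{*}' are GNU extensions). The toy input and the part_ prefix are illustrative choices, not part of the original commands.

```shell
#!/bin/sh
set -eu
# Work in a scratch directory so the output pieces do not clutter anything.
workdir=$(mktemp -d)
cd "$workdir"
printf '>seq1\nACGT\nACGT\n>seq2\nGGCC\n>seq3\nTTAA\n' > input.fasta

# Number of records = number of header lines.
n_seqs=$(grep -c '^>' input.fasta)

# -s: do not print piece sizes
# -z: drop empty pieces (the empty chunk before the first header)
# -f / -b: name the pieces part_NN.fasta instead of the default xxNN
csplit -s -z -f part_ -b '%02d.fasta' input.fasta '/^>/' '{*}'

n_files=$(ls part_*.fasta | wc -l)
echo "sequences: $n_seqs, files: $n_files"
```

If the two numbers differ, the input probably contains '>' lines that are not record headers, or leftover part_ files from an earlier run.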