2018 8/20 Fixed an error in the mpileup command
2019 2/26 Added conda installation
2021 6/2 Updated help output
- samtools
- bcftools
conda install -c bioconda -y sambamba
brew install sambamba
sambamba view # Check that it works, for example by displaying the help for view.
> sambamba view -h
# sambamba view -h
Usage: sambamba-view [options] <input.bam | input.sam> [region1 [...]]
Options: -F, --filter=FILTER
set custom filter for alignments
--num-filter=NUMFILTER
filter flag bits; 'i1/i2' corresponds to -f i1 -F i2 samtools arguments;
either of the numbers can be omitted
-f, --format=sam|bam|cram|json
specify which format to use for output (default is SAM)
-h, --with-header
print header before reads (always done for BAM output)
-H, --header
output only header to stdout (if format=bam, the header is printed as SAM)
-I, --reference-info
output to stdout only reference names and lengths in JSON
-L, --regions=FILENAME
output only reads overlapping one of regions from the BED file
-c, --count
output to stdout only count of matching records, hHI are ignored
-v, --valid
output only valid alignments
-S, --sam-input
specify that input is in SAM format
-C, --cram-input
specify that input is in CRAM format
-T, --ref-filename=FASTA
specify reference for writing CRAM
-p, --show-progress
show progressbar in STDERR (works only for BAM files with no regions specified)
-l, --compression-level
specify compression level (from 0 to 9, works only for BAM output)
-o, --output-filename
specify output filename
-t, --nthreads=NTHREADS
maximum number of threads to use
-s, --subsample=FRACTION
subsample reads (read pairs)
--subsampling-seed=SEED
set seed for subsampling
> sambamba sort
# sambamba sort
Usage: sambamba-sort [options] <input.bam>
Options: -m, --memory-limit=LIMIT
approximate total memory limit for all threads (by default 2GB)
--tmpdir=TMPDIR
directory for storing intermediate files; default is system directory for temporary files
-o, --out=OUTPUTFILE
output file name; if not provided, the result is written to a file with .sorted.bam extension
-n, --sort-by-name
sort by read name instead of coordinate (lexicographical order)
-N, --natural-sort
sort by read name instead of coordinate (so-called 'natural' sort as in samtools)
-l, --compression-level=COMPRESSION_LEVEL
level of compression for sorted BAM, from 0 to 9
-u, --uncompressed-chunks
write sorted chunks as uncompressed BAM (default is writing with compression level 1), that might be faster in some cases but uses more disk space
-p, --show-progress
show progressbar in STDERR
-t, --nthreads=NTHREADS
use specified number of threads
-F, --filter=FILTER
keep only reads that satisfy FILTER
> sambamba mpileup
# sambamba mpileup
usage: sambamba-pileup [options] input.bam [input2.bam [...]]
[--samtools <samtools mpileup args>]
[--bcftools <bcftools call args>]
This subcommand relies on external tools and acts as a multi-core
implementation of samtools and bcftools.
Therefore, the following tools should be present in $PATH:
* samtools
* bcftools (when used)
If --samtools is skipped, samtools mpileup is called with default arguments
If --bcftools is used without parameters, samtools is called with
switch '-gu' and bcftools is called as 'bcftools view -'
If --bcftools is skipped, bcftools is not called
Sambamba splits input BAM files into chunks and feeds them
to samtools mpileup and, optionally, bcftools in parallel.
The chunks are slightly overlapping so that variant calling
should not be impacted by these manipulations. The obtained results
from the multiple processes are combined as ordered output.
Sambamba options:
-L, --regions=FILENAME
provide BED file with regions
(no need to duplicate it in samtools args);
all input files must be indexed
-o, --output-filename=<STDOUT>
specify output filename
--tmpdir=TMPDIR
directory for temporary files
-t, --nthreads=NTHREADS
maximum number of threads to use
-b, --buffer-size=64_000_000
chunk size (in bytes)
-B, --output-buffer-size=512_000_000
output buffer size (in bytes)
Sambamba paths:
sambamba-pileup: failed to locate samtools executable in PATH
> sambamba merge
# sambamba merge
Usage: sambamba-merge [options] <output.bam> <input1.bam> <input2.bam> [...]
Options: -t, --nthreads=NTHREADS
number of threads to use for compression/decompression
-l, --compression-level=COMPRESSION_LEVEL
level of compression for merged BAM file, number from 0 to 9
-H, --header
output merged header to stdout in SAM format, other options are ignored; mainly for debug purposes
-p, --show-progress
show progress bar in STDERR
-F, --filter=FILTER
keep only reads that satisfy FILTER
> sambamba index
# sambamba index
Usage: sambamba-index [OPTIONS] <input.bam|input.cram> [output_file]
Creates index for a BAM or CRAM file
Options: -t, --nthreads=NTHREADS
number of threads to use for decompression
-p, --show-progress
show progress bar in STDERR
-c, --check-bins
check that bins are set correctly
-C, --cram-input
specify that input is in CRAM format
> sambamba markup
# sambamba markup
sambamba 0.6.6
Usage: sambamba [command] [args...]
Available commands: 'view', 'index', 'merge', 'sort',
'flagstat', 'slice', 'markdup', 'depth', 'mpileup'
To get help on a particular command, just call it without args.
Leave bug reports and feature requests at
> sambamba depth -h
# sambamba depth -h
Usage: sambamba-depth region|window|base [options] input.bam [input2.bam [...]]
All BAM files must be coordinate-sorted and indexed.
The tool has three modes: base, region, and window,
each name means per which unit to print the statistics.
Common options:
-F, --filter=FILTER
set custom filter for alignments; the default value is
'mapping_quality > 0 and not duplicate and not failed_quality_control'
-o, --output-file=FILENAME
output filename (by default /dev/stdout)
-t, --nthreads=NTHREADS
maximum number of threads to use
-c, --min-coverage=MINCOVERAGE
minimum mean coverage for output (default: 0 for region/window, 1 for base)
-C, --max-coverage=MAXCOVERAGE
maximum mean coverage for output
-q, --min-base-quality=QUAL
don't count bases with lower base quality
--combined
output combined statistics for all samples
-a, --annotate
add additional column of y/n instead of
skipping records not satisfying the criteria
-m, --fix-mate-overlaps
detect overlaps of mate reads and handle them on per-base basis
base subcommand options:
-L, --regions=FILENAME|REGION
list or regions of interest or a single region in form chr:beg-end (optional)
-z, --report-zero-coverage (DEPRECATED, use --min-coverage=0 instead)
don't skip zero coverage bases
region subcommand options:
-L, --regions=FILENAME|REGION
list or regions of interest or a single region in form chr:beg-end (required)
-T, --cov-threshold=COVTHRESHOLD
multiple thresholds can be provided,
for each one an extra column will be added,
the percentage of bases in the region
where coverage is more than this value
window subcommand options:
-w, --window-size=WINDOWSIZE
breadth of the window, in bp (required)
--overlap=OVERLAP
overlap of successive windows, in bp (default is 0)
-T, --cov-threshold=COVTHRESHOLD
same meaning as in 'region' subcommand
view (link): extract only the reads aligned to a specific chromosome or region, or count them.
sambamba view --reference-info input.bam
- -I, --reference-info output to stdout only reference names and lengths in JSON
sambamba view -S --reference-info input.bam
- -S specify that input is in SAM format
sambamba view -c -F "mapping_quality >= 50" input.bam
- -c output to stdout only count of matching records, hHI are ignored
- -F set custom filter for alignments
sambamba view -F "proper_pair" input.bam chr19 -t 8 -f sam -o output.sam
- -o specify output filename
- -t maximum number of threads to use
- -f specify which format to use for output (default is SAM)
sambamba view -S -F "proper_pair" input.sam chr1 -t 8 -f bam -h -p -o output.bam
- -h print header before reads (always done for BAM output)
- -p show progressbar in STDERR
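If you want the proportion of "proper_pair" reads, flagstat (described below) is convenient. For counting reads that overlap a specific region, the documentation says slice is faster (slice link).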
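The proportion mentioned above can also be estimated from two `-c` counts. The following is a minimal sketch, assuming a hypothetical `input.bam`; the helper name `percent` is made up for illustration and is not part of sambamba. The sambamba calls run only when the tool and the file are actually present.

```shell
#!/bin/sh
# percent PART TOTAL: print PART/TOTAL as a percentage with two decimals.
# (Illustrative helper, not a sambamba feature.)
percent() {
    awk -v part="$1" -v total="$2" 'BEGIN {
        if (total == 0) { print "N/A"; exit }
        printf "%.2f\n", 100 * part / total
    }'
}

bam="input.bam"   # hypothetical input file

if command -v sambamba >/dev/null 2>&1 && [ -f "$bam" ]; then
    # -c prints only the number of records that pass the filter.
    all=$(sambamba view -c "$bam")
    proper=$(sambamba view -c -F "proper_pair" "$bam")
    echo "proper_pair: $(percent "$proper" "$all")% of all records"
fi
```

Because this divides by all records in the file, the value may differ slightly from the "properly paired" percentage reported by flagstat.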
sort (link): sort a BAM file.
sambamba sort input.bam -o sorted.bam -t 20 -p -l 6
- -o output file name; if not provided, the result is written to a file with .sorted.bam extension
- -t use specified number of threads
- -p show progressbar in STDERR
- -l level of compression for sorted BAM, from 0 to 9 (if not specified, the output is about the size you get with level 6)
sambamba sort input.bam -o sorted.bam -t 20 -p -l 5
- -o output file name; if not provided, the result is written to a file with .sorted.bam extension
flagstat (link): display summary statistics for a BAM file.
sambamba flagstat input.bam -t 8 -p
- -t use NTHREADS for decompression
- -p show progressbar in STDERR
sambamba flagstat input.bam
509280 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
538 + 0 supplementary
0 + 0 duplicates
456058 + 0 mapped (89.55%:N/A)
508742 + 0 paired in sequencing
254371 + 0 read1
254371 + 0 read2
451458 + 0 properly paired (88.74%:N/A)
454670 + 0 with itself and mate mapped
850 + 0 singletons (0.17%:N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
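When the flagstat numbers are needed in a script, the first column of each line can be picked out with awk. The sketch below works on three lines copied from the output above rather than on a live run; `flagstat_field` is a hypothetical helper name, and the patterns assume the line layout shown here.

```shell
#!/bin/sh
# Sample flagstat lines copied from the output above.
flagstat_output='509280 + 0 in total (QC-passed reads + QC-failed reads)
456058 + 0 mapped (89.55%:N/A)
451458 + 0 properly paired (88.74%:N/A)'

# flagstat_field PATTERN: print the QC-passed count (column 1) of the
# first input line that matches PATTERN.
flagstat_field() {
    awk -v pat="$1" '$0 ~ pat { print $1; exit }'
}

total=$(printf '%s\n' "$flagstat_output" | flagstat_field 'in total')
# "[(]" keeps the match on the "mapped (" line only.
mapped=$(printf '%s\n' "$flagstat_output" | flagstat_field ' mapped [(]')
pct=$(awk -v m="$mapped" -v t="$total" 'BEGIN { printf "%.2f", 100 * m / t }')

echo "mapped: $mapped / $total ($pct%)"
```

With real data, the here-string would be replaced by something like `sambamba flagstat input.bam | flagstat_field 'in total'`.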
index (link): create an index file for a BAM.
sambamba index input.bam -t 8 -p
- -t use NTHREADS for decompression
- -p show progressbar in STDERR
merge (link): merge BAM files.
sambamba merge merge.bam input1.bam input2.bam input3.bam -t 12 -p -l 9
- -t Specify number of threads to use for compression/decompression tasks.
- -l Specify compression level of output file as a number from 0 to 9
- -p Show progressbar in STDERR.
mpileup (link): a parallel version of samtools mpileup.
sambamba mpileup input.bam -t 12 --samtools -f ref.fa > output
- -t Specify number of threads to use.
- -p Show progressbar in STDERR.
- -o specify output filename
If --bcftools is used without parameters, bcftools is called as 'bcftools view -Ov'
- -O, --output-type <b|u|z|v> b: compressed BCF, u: uncompressed BCF, z: compressed VCF, v: uncompressed VCF [v]
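Since the mpileup help states that samtools (and bcftools, when used) must be in $PATH, it may be worth checking for them before starting a long run. This is a minimal sketch; `missing_tools` is a hypothetical helper name, not something sambamba provides.

```shell
#!/bin/sh
# missing_tools NAME...: print each NAME that is not found in PATH,
# one per line. Prints nothing when every tool is available.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

missing=$(missing_tools sambamba samtools bcftools)
if [ -n "$missing" ]; then
    # Word splitting is intended here so the names print on one line.
    echo "not found in PATH:" $missing >&2
else
    echo "all tools found"
fi
```

If bcftools is not going to be used, it can simply be left out of the argument list.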
Sambamba: fast processing of NGS alignment formats.
Tarasov A, Vilella AJ, Cuppen E, Nijman IJ, Prins P.
Bioinformatics. 2015 Jun 15;31(12):2032-4. doi: 10.1093/bioinformatics/btv098. Epub 2015 Feb 19.