bamからbigWigとWiggle Formatに変換するツール

2019 3/20　誤字修正

2021 12/23 コマンド修正

2023/10/01追記

bamからwiggleファイルに変換してviewerに取り込むと、カバレッジtrackとして表示できる。ただしそれにはsamtoolsのpileupを使いbamからwiggleファイルを作る必要があり、作り方がやや面倒だった。現在では、コマンド一発でwiggleファイルを作るツールが公開されている。このツールを使ってbigwig/wiggleに変換し、IGVにカバレッジtrackを追加する流れを書いておく。

インストール

wiggleとbigwigに変換するツール

#bioconda 
mamba install -c bioconda -y ucsc-wigtobigwig
mamba install -c bioconda -y bam2wig

#2021 2/13 追記こちらを入れておく
mamba install -c bioconda jvarkit-bam2wig

https://github.com/MikeAxtell/bam2wig

バイナリなので、condaを使わない場合もpathを通して実行権を与えるだけでランできる。ただしbigwigに変換するには、下のリンクからwigtobigwigツールをダウンロードしてパスを通しておく必要がある。wigtobigwigがないと、wiggleファイルだけ作られる。

wigtobigwigのダウンロード - unix

Index of /admin/exe/linux.x86_64

wigtobigwigのダウンロード - mac OSX

bam2wig (linux)

> wigToBigWig

$ wigToBigWig

wigToBigWig v 4 - Convert ascii format wig file (in fixedStep, variableStep

or bedGraph format) to binary big wig format.

usage:

wigToBigWig in.wig chrom.sizes out.bw

Where in.wig is in one of the ascii wiggle formats, but not including track lines

and chrom.sizes is a two-column file/URL: <chromosome name> <size in bases>

and out.bw is the output indexed big wig file.

If the assembly <db> is hosted by UCSC, chrom.sizes can be a URL like

http://hgdownload.cse.ucsc.edu/goldenPath//bigZips/.chrom.sizes

or you may use the script fetchChromSizes to download the chrom.sizes file.

If not hosted by UCSC, a chrom.sizes file can be generated by running

twoBitInfo on the assembly .2bit file.

options:

-blockSize=N - Number of items to bundle in r-tree. Default 256

-itemsPerSlot=N - Number of data points bundled at lowest level. Default 1024

-clip - If set just issue warning messages rather than dying if wig

file contains items off end of chromosome.

-unc - If set, do not use compression.

-fixedSummaries - If set, use a predefined sequence of summary levels.

-keepAllChromosomes - If set, store all chromosomes in b-tree.

> bam2wig.sh -h

$ bam2wig.sh -h

Usage: bam2wig [options] Files

Options:

-bg, --bedgraph

Produce a BED GRAPH instead of a WIGGLE file.

Default: false

--display

What kind of data should we display ?

Default: COVERAGE

Possible Values: [COVERAGE, CLIPPING, INSERTION, DELETION, READ_GROUPS, CASE_CTRL]

--filter

A filter expression. Reads matching the expression will be filtered-out.

Empty String means 'filter out nothing/Accept all'. See https://github.com/lindenb/jvarkit/blob/master/src/main/resources/javacc/com/github/lindenb/jvarkit/util/bio/samfilter/SamFilterParser.jj

for a complete syntax.

Default: mapqlt(1) || MapQUnavailable() || Duplicate() || FailsVendorQuality() || NotPrimaryAlignment() || SupplementaryAlignment()

-f, --format

`Printf` Format for the values. see

https://docs.oracle.com/javase/tutorial/java/data/numberformat.html .

Use "%.01f" to print an integer. "%e" for scientific notation.

Default: %.3f

-t, --header

print a UCSC custom track header: something lile track type=track_type

name="__REPLACE_WIG_NAME__" description="__REPLACE_WIG_DESC__". Use

`sed` to replace the tokens. e.g: `sed

'/^track/s/__REPLACE_WIG_NAME__/My data/'`

Default: false

-h, --help

print help and exit

--helpFormat

What kind of help. One of [usage,markdown,xml].

--region, --interval

Limit analysis to this interval. An interval as the following syntax :

"chrom:start-end" or "chrom:middle+extend" or "chrom:start-end+extend"

or "chrom:start-end+extend-percent%".A program might use a Reference

sequence to fix the chromosome name (e.g: 1->chr1)

--mindepth, --mindp

When using display READ_GROUPS, What is the minimal read depth that

should be considered ?

Default: 0

-o, --output

Output file. Optional . Default: stdout

--partition

When using display READ_GROUPS, how should we partition the ReadGroup ?

Data partitioning using the SAM Read Group (see

https://gatkforums.broadinstitute.org/gatk/discussion/6472/ ) . It can

be any combination of sample, library....

Default: sample

Possible Values: [readgroup, sample, library, platform, center, sample_by_platform, sample_by_center, sample_by_platform_by_center, any]

--pedigree, -ped

Pedigree file for CASE_CTRL. A pedigree is a text file delimited with

tabs. No header. Columns are (1) Family (2) Individual-ID (3) Father Id

or '0' (4) Mother Id or '0' (5) Sex : 1 male/2 female / 0 unknown (6)

Status : 0 unaffected, 1 affected,-9 unknown

--percentile

How to group data in the sliding window ?

Default: AVERAGE

Possible Values: [MIN, MAX, MEDIAN, AVERAGE, RANDOM, SUM]

--version

print version and exit

-s, --windowShift

window shift

Default: 25

-w, --windowSize

window size

Default: 100

ラン

bamからwiggleファイルを出力。

bam2wig.sh input.bam -o out.wig

#追記 bam => bigwig （deeptools）; biwgwig形式で書き出すには-of bigwigを指定
bamCoverage --bam input.bam -o output.bw -of bigwig --binSize=10 --minMappingQuality 10 --numberOfProcessors=max

bam2wig.shで使いそうなオプション

-r : Limit analysis to specified read group

-l : Minimum size of read to report. Default: no minimum.

-L : Maximum size of read to report. Default: no maximum.

-c : Limit analysis to the indicated locus.

-m : Count each as (1/(n mapped reads)) * 1E6. In other words, normalize to reads per million.

計算が終わると、bamのあるディレクトリにbamファイル名のディレクトリが追加される。その中にbigwigとwig形式のファイルが作られる。

IGVで表示する。まずbamを取り込む。