TieBrush is distributed as a C++ package under the MIT license. Precompiled binaries, source code, and sample data are available on GitHub (https://github.com/alevar/tiebrush).
Tested on Ubuntu 18 (in Docker; cmake v3.10.2, gcc v7.4.0).
git clone https://github.com/alevar/tiebrush.git --recursive
cd tiebrush/
cmake -DCMAKE_BUILD_TYPE=Release .
make -j8
make install
> ./tiebrush
TieBrush v0.0.6
Summarize and filter read alignments from multiple sequencing samples (taken as sorted SAM/BAM/CRAM files). This utility aims to merge/collapse "duplicate" read alignments across multiple sequencing samples (inputs), adding custom SAM tags in order to keep track of the "alignment multiplicity" count (how many times the same alignment is seen across all input data) and "sample count" (how many samples show that same alignment).
usage: tiebrush [-h] -o OUTPUT [-L|-P|-E] [-S] [-M] [-N max_NH_value] [-Q min_mapping_quality] [-F FLAGS] ...
Input arguments:
... input alignment files can be provided as a space-delimited
list of filenames or as a text file containing a list of
filenames, one per line
Required arguments:
-o File for BAM output
Optional arguments:
-h,--help Show this help message and exit
--version Show the program version and exit
-L,--full If enabled, only reads with the same CIGAR
and MD strings will be grouped and collapsed.
By default, TieBrush will consider the CIGAR
string only when grouping reads
Only one of -L, -P or -E options can be enabled
-P,--clip If enabled, reads will be grouped by clipped
CIGAR string. In this mode 5S10M5S and 3S10M3S
CIGAR strings will be grouped if the coordinates
of the matching substring (10M) are the same
between reads
-E,--exon If enabled, reads will be grouped if their exon
boundaries are the same. This option discards
any structural variants contained in mapped
substrings of the read and only considers start
and end coordinates of each non-splicing segment
of the CIGAR string
-S,--keep-supp If enabled, supplementary alignments will be
included in the collapsed groups of reads.
By default, TieBrush removes any mappings
not listed as primary (0x100). Note, that if enabled,
each supplementary mapping will count as a separate read
-M,--keep-unmap If enabled, unmapped reads will be retained (uncollapsed)
in the output. By default, TieBrush removes any
unmapped reads
-N Maximum NH score of the reads to retain
-Q Minimum mapping quality of the reads to retain
-F Bits in SAM flag to use in read comparison. Only reads that
have specified flags will be merged together (default: 0)
Error: no input provided!
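The grouping modes described in the help text (CIGAR only by default, clipped CIGAR with -P, exon boundaries with -E) can be illustrated with a small Python sketch. This is a toy model for intuition, not TieBrush's actual implementation: the function names are hypothetical, positions are treated as 0-based reference starts, and the -L mode (which also compares MD strings) is left out.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def parse_cigar(cigar):
    """Split a CIGAR string such as '5S10M5S' into [(5, 'S'), (10, 'M'), (5, 'S')]."""
    return [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]

def cigar_key(pos, cigar):
    """Default-style key: start position plus the full CIGAR string."""
    return (pos, cigar)

def clip_key(pos, cigar):
    """-P style key: ignore soft/hard clipping at either end of the CIGAR."""
    ops = parse_cigar(cigar)
    while ops and ops[0][1] in "SH":
        ops.pop(0)
    while ops and ops[-1][1] in "SH":
        ops.pop()
    return (pos, tuple(ops))

def exon_key(pos, cigar):
    """-E style key: reference intervals delimited by N (splice) operations."""
    blocks = []
    start = end = pos
    for n, op in parse_cigar(cigar):
        if op == "N":
            # A splice closes the current block and skips the intron.
            blocks.append((start, end))
            start = end = end + n
        elif op in "MD=X":
            # Only reference-consuming operations extend the block.
            end += n
    blocks.append((start, end))
    return tuple(blocks)

# The two CIGARs from the help text differ as full strings but match once clipped.
print(cigar_key(100, "5S10M5S") == cigar_key(100, "3S10M3S"))  # False
print(clip_key(100, "5S10M5S") == clip_key(100, "3S10M3S"))    # True
# An insertion inside an exon does not change the exon boundaries.
print(exon_key(100, "5M1I5M") == exon_key(100, "10M"))          # True
```

Under this model, two alignments are collapsed when their keys are equal, which is why -P and -E merge more reads than the default mode.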
> ./tiecov -h
TieCov v0.0.6
The TieCov utility can take the output file produced by TieBrush and generate the following auxiliary files:
1. BedGraph file with the coverage data
2. Junction BED file
3. a heatmap BED that uses color intensity to represent the number of samples that contain each position
usage: tiecov [-s out.sample] [-c out.coverage] [-j out.junctions] [-W] input
Input arguments (required):
input alignment file in SAM/BAM/CRAM format
Optional arguments (at least one of -s/-c/-j must be specified):
-h,--help Show this help message and exit
--version Show program version and exit
-s BedGraph file with an estimate of the number of samples
which contain alignments for each interval.
-c BedGraph (or BedWig with '-W') file with coverage
for all mapped bases.
-j BED file with coverage of all splice-junctions
in the input file.
-W save coverage in BigWig format. Default output
is in Bed format
Tiewrap is a utility script provided to make running TieBrush on large datasets a bit easier.
> ./tiewrap.py -h
usage: tiewrap.py [-h] -o OUTPUT [-L] [-P] [-E] [-S] [-M] [-N MAX_NH]
Help Page
positional arguments:
input Input can be provided as a space-delimited list of
filenames or as a textfile containing a list of
filenames one per each line.
optional arguments:
-h, --help show this help message and exit
-o OUTPUT, --output OUTPUT
File for BAM output.
-L, --full If enabled, only reads with the same CIGAR and MD
strings will be grouped and collapsed. By default,
TieBrush will consider the CIGAR string only when
grouping reads.
-P, --clip If enabled, reads will be grouped by clipped CIGAR
string. In this mode 5S10M5S and 3S10M3S cigar strings
will be grouped if the coordinates of the matching
substring (10M) are the same between reads.
-E, --exon If enabled, reads will be grouped if their exon
boundaries are the same. This option discards any
structural variants contained in mapped substrings of
the read and only considers start and end coordinates
of each non-splicing segment of the CIGAR string.
-S, --keep-supp If enabled, supplementary alignments will be included
in the collapsed groups of reads. By default, TieBrush
removes any mappings not listed as primary (0x100).
Note, that if enabled, each supplementary mapping will
count as a separate read.
-M, --keep-unmap If enabled, unmapped reads will be retained
(uncollapsed) in the output. By default, TieBrush
removes any unmapped reads.
-N MAX_NH, --max-nh MAX_NH
Maximum NH score of the reads to retain.
-Q MIN_MAP_QUAL, --min-map-qual MIN_MAP_QUAL
Minimum mapping quality of the reads to retain.
-F FLAGS, --flags FLAGS
Bits in SAM flag to use in read comparison. Only reads
that have specified flags will be merged together
(default: 0)
-t THREADS, --threads THREADS
Number of threads to use.
-b BATCH_SIZE, --batch-size BATCH_SIZE
Number of input files to process in a batch on each thread.
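To get a feel for the -b option, the sketch below splits a list of input files into batches of a given size. It only shows the partitioning step; how tiewrap.py actually schedules batches across threads and merges the intermediate results is not shown in the help text, so `make_batches` is a hypothetical helper and not part of the tool.

```python
def make_batches(files, batch_size):
    """Split a list of filenames into consecutive batches of at most batch_size."""
    return [files[i:i + batch_size] for i in range(0, len(files), batch_size)]

# Ten sample BAM files, named like the ones in the example/ directory.
files = [f"t1/t1s{i}.bam" for i in range(10)]
batches = make_batches(files, 4)

for n, batch in enumerate(batches):
    print(n, batch)
```

With ten inputs and a batch size of 4, this yields three batches of 4, 4, and 2 files.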
1. Run tiebrush
cd example/
../tiebrush -o t1/t1.bam t1/t1s0.bam t1/t1s1.bam t1/t1s2.bam t1/t1s3.bam t1/t1s4.bam t1/t1s5.bam t1/t1s6.bam t1/t1s7.bam t1/t1s8.bam t1/t1s9.bam
../tiebrush -o t2/t2.bam t2/t2s0.bam t2/t2s1.bam t2/t2s2.bam t2/t2s3.bam t2/t2s4.bam t2/t2s5.bam t2/t2s6.bam t2/t2s7.bam t2/t2s8.bam t2/t2s9.bam
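Typing out ten filenames per run is error-prone, so the same two commands can be generated programmatically. The following Python sketch rebuilds the command lines above; `tiebrush_command` is a hypothetical helper, and it assumes the `tN/tNsI.bam` naming used in the example directory.

```python
import shlex

def tiebrush_command(sample_dir, n_samples, exe="../tiebrush"):
    """Build the tiebrush argument list for one example directory (e.g. 't1')."""
    inputs = [f"{sample_dir}/{sample_dir}s{i}.bam" for i in range(n_samples)]
    return [exe, "-o", f"{sample_dir}/{sample_dir}.bam", *inputs]

for d in ("t1", "t2"):
    # shlex.join quotes arguments so the line can be pasted into a shell.
    print(shlex.join(tiebrush_command(d, 10)))
```

To execute a command rather than print it, the list can be passed to `subprocess.run(cmd, check=True)` from inside the example/ directory.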
2. Run tiecov
tiecov -s t1/t1.sample -c t1/tb.coverage -j t1/tb.junctions t1/t1.bam
tiecov -s t2/t2.sample -c t2/tb.coverage -j t2/tb.junctions t2/t2.bam
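The coverage file written with -c can be inspected with a few lines of Python. The sketch below assumes the file is a plain-text, four-column BedGraph (chrom, start, end, value), that is, written without -W. The helper names are hypothetical and the demo records are made up for illustration; they are not real tiecov output.

```python
import io

def read_bedgraph(handle):
    """Yield (chrom, start, end, value) from a four-column BedGraph stream."""
    for line in handle:
        line = line.strip()
        # Skip blank lines and optional track/browser/comment lines.
        if not line or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, end, value = line.split()[:4]
        yield chrom, int(start), int(end), float(value)

def mean_coverage(records):
    """Length-weighted mean of the value column over all intervals."""
    total = bases = 0
    for _, start, end, value in records:
        total += value * (end - start)
        bases += end - start
    return total / bases if bases else 0.0

# Made-up example data, not real tiecov output.
demo = io.StringIO("chr1\t0\t10\t2\nchr1\t10\t30\t5\n")
print(mean_coverage(read_bedgraph(demo)))
```

For a real run, replace the in-memory demo with `open("t1/tb.coverage")`.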
TieBrush: an efficient method for aggregating and summarizing mapped reads across large datasets
Ales Varabyou, Geo Pertea, Christopher Pockrandt, Mihaela Pertea
Bioinformatics, Volume 37, Issue 20, 15 October 2021, Pages 3650–3651