macでインフォマティクス

macでインフォマティクス

HTS (NGS) 関連のインフォマティクス情報についてまとめています。

並列化に対応し、高速にバーコードやアダプターをトリミングする FLEXBAR

2021/10/21, 24 インストール追記

 

FLEXBARはMultiplexで読んだシーケンスのdemultiplexやアダプタートリミングに使われるツール。柔軟な条件でランできる。よく使われているらしく、現在Flexbar3まで発表されている。解析時間は短く、100Mのリードなら数秒〜10秒程度の時間でアダプターをトリミングできる。ペアリードの順番は破壊しないので安心して使える。

 

マニュアル

f:id:kazumaxneo:20180120123327j:plain

 

インストール

github

Binaryをダウンロードして(リンクのmac版)、環境変数にパスを通す。

export DYLD_LIBRARY_PATH=/path/FlexbarDir:$DYLD_LIBRARY_PATH #カレントがダウンロードしたFlexbarDir/なら後半は ='pwd':$DYLD_LIBRARY_PATH

#conda (20211021追記)
mamba install -c bioconda -y flexbar

> flexbar --help

$ flexbar -h

 

flexbar - flexible barcode and adapter removal

==============================================

 

SYNOPSIS

    flexbar -r reads [-b barcodes] [-a adapters] [options]

 

DESCRIPTION

    The program Flexbar preprocesses high-throughput sequencing data

efficiently. It demultiplexes barcoded runs and

    removes adapter sequences. Several adapter removal presets for

Illumina libraries are included. Flexbar computes

    exact overlap alignments using SIMD and multicore parallelism.

Moreover, trimming and filtering features are

    provided, e.g. trimming of homopolymers at read ends. Flexbar

increases read mapping rates and improves genome as

    well as transcriptome assemblies. Unique molecular identifiers can

be extracted in a flexible way. The software

    supports data in fasta and fastq format from multiple sequencing

platforms. Refer to the manual on

    github.com/seqan/flexbar/wiki or contact Johannes Roehr on

github.com/jtroehr for support with this application.

 

OPTIONS

    -h, --help

          Display the help message.

    -hh, --full-help

          Display the help message with advanced options.

    -v, --versions

          Print Flexbar and SeqAn version numbers.

    -c, --cite

          Show program references for citation.

 

  Basic options:

    -n, --threads INTEGER

          Number of threads to employ. Default: 1.

    -t, --target OUTPUT_PREFIX

          Prefix for output file names or paths. Default: flexbarOut.

    -r, --reads INPUT_FILE

          Fasta/q file or stdin (-) with reads that may contain barcodes.

    -p, --reads2 INPUT_FILE

          Second input file of paired reads, gz and bz2 files supported.

 

  Barcode detection:

    -b, --barcodes INPUT_FILE

          Fasta file with barcodes for demultiplexing, may contain N.

    -br, --barcode-reads INPUT_FILE

          Fasta/q file containing separate barcode reads for detection.

    -bo, --barcode-min-overlap INTEGER

          Minimum overlap of barcode and read. Default: barcode length.

    -be, --barcode-error-rate DOUBLE

          Error rate threshold for mismatches and gaps. Default: 0.0.

    -bt, --barcode-trim-end STRING

          Type of detection, see section trim-end modes. Default: LTAIL.

 

  Adapter removal:

    -a, --adapters INPUT_FILE

          Fasta file with adapters for removal that may contain N.

    -a2, --adapters2 INPUT_FILE

          File with extra adapters for second read set in paired mode.

    -aa, --adapter-preset STRING

          One of TruSeq, SmallRNA, Methyl, Ribo, Nextera, and NexteraMP.

    -ao, --adapter-min-overlap INTEGER

          Minimum overlap for removal without pair overlap. Default: 3.

    -ae, --adapter-error-rate DOUBLE

          Error rate threshold for mismatches and gaps. Default: 0.1.

    -at, --adapter-trim-end STRING

          Type of removal, see section trim-end modes. Default: RIGHT.

    -ap, --adapter-pair-overlap STRING

          Overlap detection of paired reads. One of ON, SHORT, and ONLY.

 

  Filtering and trimming:

    -u, --max-uncalled INTEGER

          Allowed uncalled bases N for each read. Default: 0.

    -x, --pre-trim-left INTEGER

          Trim given number of bases on 5' read end before detection.

    -y, --pre-trim-right INTEGER

          Trim specified number of bases on 3' end prior to detection.

    -m, --min-read-length INTEGER

          Minimum read length to remain after removal. Default: 18.

 

  Quality-based trimming:

    -q, --qtrim STRING

          Quality-based trimming mode. One of TAIL, WIN, and BWA.

    -qf, --qtrim-format STRING

          Quality format. One of sanger, solexa, i1.3, i1.5, and i1.8.

    -qt, --qtrim-threshold INTEGER

          Minimum quality as threshold for trimming. Default: 20.

 

  Trimming of homopolymers:

    -hr, --htrim-right STRING

          Trim certain homopolymers on right read end after removal.

    -hi, --htrim-min-length INTEGER

          Minimum length of homopolymers at read ends. Default: 3.

    -he, --htrim-error-rate DOUBLE

          Error rate threshold for mismatches. Default: 0.1.

 

  Output selection:

    -f, --fasta-output

          Prefer non-quality format fasta for output.

    -z, --zip-output STRING

          Direct compression of output files. One of GZ and BZ2.

    -1, --stdout-reads

          Write reads to stdout, tagged and interleaved if needed.

 

  Logging and tagging:

    -l, --align-log STRING

          Print chosen read alignments. One of ALL, MOD, and TAB.

    -o, --stdout-log

          Write statistics to stdout instead of target log file.

    -g, --removal-tags

          Tag reads that are subject to adapter or barcode removal.

 

TRIM-END MODES

    ANY: longer side of read remains after removal of overlap

    LEFT: right side remains after removal, align <= read end

    RIGHT: left part remains after removal, align >= read start

    LTAIL: consider first n bases of reads in alignment

    RTAIL: use only last n bases, see tail-length options

 

EXAMPLES

    flexbar -r reads.fq -t target -q TAIL -qf i1.8

    flexbar -r reads.fq -b barcodes.fa -bt LTAIL

    flexbar -r reads.fq -a adapters.fa -ao 3 -ae 0.1

    flexbar -r r1.fq -p r2.fq -a a1.fa -a2 a2.fa -ap ON

    flexbar -r r1.fq -p r2.fq -aa TruSeq -ap ON

 

VERSION

    Last update: May 2019

    flexbar version: 3.5.0

    SeqAn version: 2.4.0

 

Available on github.com/seqan/flexbar

 

 

ラン

アダプターを除く。

flexbar -r input.fastq -a adaptor.fa -be LTAIL -t target -n 4
  • -t   Prefix for output file names or paths. Default: flexbarOut.
  • -r   Fasta/q file or stdin (-) with reads that may contain barcodes.
  • -p   Second input file of paired reads, gz and bz2 files supported.
  • -a   Fasta file with adapters for removal that may contain N.
  • -n   Number of threads to employ. Default: 1.
  • -be   Type of detection, see section trim-end modes. Default: LTAIL.

TRIM-END MODES

  1. ANY: longer side of read remains after removal of overlap
  2. LEFT: right side remains after removal, align <= read end
  3. RIGHT: left part remains after removal, align >= read start
  4. LTAIL: consider first n bases of reads in alignment
  5. RTAIL: use only last n bases, see tail-length options

 

 

引用

Flexbar 3.0 - SIMD and multicore parallelization.

Roehr JT, Dieterich C, Reinert K.

Bioinformatics. 2017 Sep 15;33(18):2941-2942.

 

FLEXBAR-Flexible Barcode and Adapter Processing for Next-Generation Sequencing Platforms.

Dodt M, Roehr JT, Ahmed R, Dieterich C.

Biology (Basel). 2012 Dec 14;1(3):895-905.