Informatics on Mac

A collection of notes on NGS-related informatics.

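FLEXBAR: fast, parallelized trimming of barcodes and adapters

FLEXBAR is a tool for demultiplexing reads from multiplexed sequencing runs and for trimming adapters. It can be run under flexible conditions. It appears to be widely used, and releases up to Flexbar 3 have been published. Run time is short: for about 100M of reads, adapter trimming finishes in a few seconds to around 10 seconds. It does not disturb the order of paired reads, so it is safe to use on paired-end data.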
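The claim that paired reads stay in order can be checked on your own output. The following is a minimal sketch, assuming uncompressed four-line FASTQ records and mate names that differ only by a trailing /1 or /2; the helper names read_ids and pairs_in_sync are hypothetical and not part of Flexbar.

```shell
# Print the ID of each FASTQ record (line 1 of every 4 lines), dropping the
# leading "@", anything after the first whitespace, and a trailing /1 or /2.
read_ids() {
    awk 'NR % 4 == 1 { id = substr($1, 2); sub(/\/[12]$/, "", id); print id }' "$1"
}

# Return 0 only if both files list the same read IDs in the same order.
pairs_in_sync() {
    ids_a="${TMPDIR:-/tmp}/ids_a.$$"
    ids_b="${TMPDIR:-/tmp}/ids_b.$$"
    read_ids "$1" > "$ids_a"
    read_ids "$2" > "$ids_b"
    if cmp -s "$ids_a" "$ids_b"; then
        status=0
    else
        status=1
    fi
    rm -f "$ids_a" "$ids_b"
    return "$status"
}
```

Calling `pairs_in_sync mate1.fastq mate2.fastq` on the two trimmed output files (file names here are placeholders) succeeds only when both files contain the same reads in the same order.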

 

Manual

[Image not available]

 

Installation

GitHub

https://github.com/seqan/flexbar

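Download the binary (the macOS build linked from the repository) and set the environment variable so that its directory is on the search path.

export DYLD_LIBRARY_PATH=/path/FlexbarDir:$DYLD_LIBRARY_PATH # if the current directory is the downloaded FlexbarDir/, the assignment can be written as =$(pwd):$DYLD_LIBRARY_PATH

Reference: "OSXでDYLD_LIBRARY_PATHを設定する意味がわからない" (I don't understand the point of setting DYLD_LIBRARY_PATH on OS X), non vorrei lavorare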
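The setup above can be sketched as a small shell snippet. It assumes the download was unpacked into the placeholder directory /path/FlexbarDir (override it by setting FLEXBAR_DIR, a variable name chosen here for illustration) and that the flexbar executable sits in that same directory.

```shell
# Placeholder install location; replace with the directory you unpacked.
FLEXBAR_DIR="${FLEXBAR_DIR:-/path/FlexbarDir}"

# Put the executable on PATH and the bundled library on the macOS loader path.
# The ${VAR:+...} form avoids a trailing colon when the variable was unset.
export PATH="$FLEXBAR_DIR:$PATH"
export DYLD_LIBRARY_PATH="$FLEXBAR_DIR${DYLD_LIBRARY_PATH:+:$DYLD_LIBRARY_PATH}"

# Confirm that the shell can find the executable before running anything.
if command -v flexbar >/dev/null 2>&1; then
    flexbar --versions
else
    echo "flexbar not found in $FLEXBAR_DIR" >&2
fi
```

Adding the two export lines to your shell startup file makes the setting persist across sessions.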

$ flexbar --help

 

flexbar - flexible barcode and adapter removal
==============================================

SYNOPSIS
    flexbar -r reads [-b barcodes] [-a adapters] [options]

DESCRIPTION
    -h, --help
          Display the help message.
    -hh, --full-help
          Display the help message with advanced options.
    -v, --versions
          Print Flexbar and SeqAn version numbers.
    -c, --cite
          Show program reference for citation.

  Basic options:
    -n, --threads NUM
          Number of threads to employ. Default: 1.
    -t, --target STR
          Prefix for output file names or paths. Default: flexbarOut.
    -r, --reads FILE
          Fasta/q file or stdin (-) with reads that may contain barcodes.
    -p, --reads2 FILE
          Second input file of paired reads, gz and bz2 files supported.

  Barcode detection:
    -b, --barcodes FILE
          Fasta file with barcodes for demultiplexing, may contain N.
    -br, --barcode-reads FILE
          Fasta/q file containing separate barcode reads for detection.
    -be, --barcode-trim-end STR
          Type of detection, see section trim-end modes. Default: LTAIL.
    -bo, --barcode-min-overlap NUM
          Minimum overlap of barcode and read. Default: barcode length.
    -bt, --barcode-error-rate NUM
          Error rate threshold for mismatches and gaps. Default: 0.1.

  Adapter removal:
    -a, --adapters FILE
          Fasta file with adapters for removal that may contain N.
    -as, --adapter-seq STR
          Single adapter sequence as alternative to adapters option.
    -ae, --adapter-trim-end STR
          Type of removal, see section trim-end modes. Default: RIGHT.
    -ao, --adapter-min-overlap NUM
          Minimum overlap of adapter and read for removal. Default: 3.
    -at, --adapter-error-rate NUM
          Error rate threshold for mismatches and gaps. Default: 0.1.

  Filtering and trimming:
    -u, --max-uncalled NUM
          Allowed uncalled bases N for each read. Default: 0.
    -x, --pre-trim-left NUM
          Trim given number of bases on 5' read end before detection.
    -y, --pre-trim-right NUM
          Trim specified number of bases on 3' end prior to detection.
    -m, --min-read-length NUM
          Minimum read length to remain after removal. Default: 18.

  Quality-based trimming:
    -q, --qtrim STR
          Quality-based trimming mode. One of TAIL, WIN, and BWA.
    -qf, --qtrim-format STR
          Quality format. One of sanger, solexa, i1.3, i1.5, and i1.8.
    -qt, --qtrim-threshold NUM
          Minimum quality as threshold for trimming. Default: 20.

  Output selection:
    -f, --fasta-output
          Prefer non-quality format fasta for output.
    -z, --zip-output STR
          Direct compression of output files. One of GZ and BZ2.
    -s, --single-reads
          Write single reads for too short counterparts in pairs.

  Logging and tagging:
    -l, --align-log STR
          Print chosen read alignments. One of ALL, MOD, and TAB.
    -o, --stdout-log
          Write statistics to console instead of target log file.
    -g, --removal-tags
          Tag reads that are subject to adapter or barcode removal.

TRIM-END MODES
    ANY: longer side of read remains after removal of overlap
    LEFT: right side remains after removal, align <= read end
    RIGHT: left part remains after removal, align >= read start
    LTAIL: consider first n bases of reads in alignment
    RTAIL: use only last n bases, see tail-length options

EXAMPLES
    flexbar -r reads.fq -t target -b brc.fa -be LTAIL -a adp.fa
    flexbar -r reads.fq.gz -q TAIL -qf i1.8 -a adp.fa -ao 5 -at 0.4

VERSION
    Last update: March 2017
    flexbar version: 3.0
    SeqAn version: 2.2.0

Available on github.com/seqan/flexbar

Show advanced options: flexbar -hh
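The help output lists quality-based trimming alongside adapter removal. As a sketch of how the two can be combined in one call, the wrapper below uses only flags shown in the help; the function name and the file names are hypothetical, and the values passed to -qt and -m simply restate the documented defaults.

```shell
# Remove adapters and apply quality-based trimming (TAIL mode) in one pass.
# $1: reads (FASTQ, optionally gzipped), $2: adapter FASTA, $3: output prefix
trim_adapters_and_quality() {
    flexbar -r "$1" -a "$2" -t "$3" \
        -q TAIL -qf i1.8 -qt 20 \
        -m 18 -n 4
}

# Hypothetical file names; run only if flexbar and the inputs are available.
if command -v flexbar >/dev/null 2>&1 && [ -f reads.fq.gz ] && [ -f adp.fa ]; then
    trim_adapters_and_quality reads.fq.gz adp.fa qtrimmed
fi
```

The quality format given to -qf has to match the encoding of the input; i1.8 here follows the example in the help text.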

 

 

Run

Remove adapters.

flexbar -r input.fastq -a adaptor.fa -ae RIGHT -t target -n 4
  • -t   Prefix for output file names or paths. Default: flexbarOut.
  • -r   Fasta/q file or stdin (-) with reads that may contain barcodes.
  • -p   Second input file of paired reads, gz and bz2 files supported.
  • -a   Fasta file with adapters for removal that may contain N.
  • -n   Number of threads to employ. Default: 1.
  • -ae   Type of removal, see section trim-end modes. Default: RIGHT.
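For paired-end data, the second mate file is passed with -p. The wrapper below is a minimal sketch that uses only flags from the help output; the function name and the file names are hypothetical.

```shell
# Trim adapters from paired reads and write gzip-compressed output.
# $1: mate 1 reads, $2: mate 2 reads, $3: adapter FASTA, $4: output prefix
trim_pairs() {
    flexbar -r "$1" -p "$2" -a "$3" -t "$4" -n 4 -z GZ
}

# Hypothetical file names; run only if flexbar and all inputs are available.
if command -v flexbar >/dev/null 2>&1 \
    && [ -f reads_1.fastq.gz ] && [ -f reads_2.fastq.gz ] && [ -f adaptor.fa ]; then
    trim_pairs reads_1.fastq.gz reads_2.fastq.gz adaptor.fa paired
fi
```

According to the help text, adding -s additionally writes out single reads whose mate became too short.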

TRIM-END MODES

  1. ANY: longer side of read remains after removal of overlap
  2. LEFT: right side remains after removal, align <= read end
  3. RIGHT: left part remains after removal, align >= read start
  4. LTAIL: consider first n bases of reads in alignment
  5. RTAIL: use only last n bases, see tail-length options
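The trim-end modes also apply to barcode detection when demultiplexing. The sketch below writes a small barcode FASTA and demultiplexes with LTAIL, which considers only the first bases of each read. The sample names, barcode sequences, and file names are made up for illustration and must be replaced with those of your own library.

```shell
# Hypothetical barcodes; one FASTA record per sample.
cat > barcodes.fa <<'EOF'
>sample1
ACGTAC
>sample2
TGCATG
EOF

# Demultiplex by barcodes at the start of the reads (LTAIL is also the default for -be).
# Run only if flexbar and the hypothetical input reads.fq are available.
if command -v flexbar >/dev/null 2>&1 && [ -f reads.fq ]; then
    flexbar -r reads.fq -b barcodes.fa -be LTAIL -t demux -n 4
fi
```

If the barcodes were sequenced as a separate read, the help output lists -br for supplying that file instead.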

 

 

References

Flexbar 3.0 - SIMD and multicore parallelization.

Roehr JT, Dieterich C, Reinert K.

Bioinformatics. 2017 Sep 15;33(18):2941-2942.

 

FLEXBAR-Flexible Barcode and Adapter Processing for Next-Generation Sequencing Platforms.

Dodt M, Roehr JT, Ahmed R, Dieterich C.

Biology (Basel). 2012 Dec 14;1(3):895-905.