Installation
Tested on Ubuntu 16.04.
Source: GitHub
git clone https://github.com/yangao07/fxtools.git --recursive
cd fxtools; make
$ ./fxtools
Program: fxtools (light-weight processing tool for FASTA, FASTQ and BAM format data)
Usage: fxtools <command> [options]
Command:
filter (fl) filter fa/fq sequences with specified length boundary.
filter-name (fn) filter fa/fq sequences with specified name.
filter-bam (fb) filter bam/sam records with specified read length boundary.
filter-bam-name (fbn) filter bam/sam records with specified read name.
split-fx (sx) split fa/fq file into multipule files.
fq2fa (qa) convert FASTQ format data to FASTA format data.
fa2fq (aq) convert FASTA format data to FASTQ format data.
bam2bed (bb) convert BAM file to BED file. seperated exon regions for spliced BAM
re-co (rc) convert DNA sequence(fa/fq) to its reverse-complementary sequence.
seq-display (sd) display a specified region of FASTA/FASTQ file.
cigar-parse (cp) parse the given cigar(stdout).
length-parse (lp) parse the length of sequences in fa/fq file.
merge-fa (mf) merge the reads with same read name in fasta/fastq file.
merge-filter-fa (mff) merge and filter the reads with same read name in fasta file.
duplicate-fa (df) duplicate the read sequence with specific copy number.
error-parse (ep) parse indel and mismatch error based on CIGAR and NM in bam file.
dna2rna (dr) convert DNA fa/fq to RNA fa/fq.
rna2dna (rd) convert RNA fa/fq to DNA fa/fq.
trim (tr) trim poly A tail(poly T head).
trimF (tf) trim and filter with poly A tail(poly T head). Only poly A contained reads will be kept.
Usage
1. fxtools filter
Usage: fxtools filter <in.fa/fq> <lower-bound> <upper-bound>(-1 for NO bound) > <out.fa/fq>
Output only the sequences (FASTA/FASTQ) that are 100-1000 bp long.
fxtools filter input.fasta 100 1000 > output.fasta
For BAM files, use fxtools filter-bam.
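To make the length filter concrete, here is a minimal Python sketch of the same idea. It is not fxtools' implementation: the function names are hypothetical, and treating both bounds as inclusive is an assumption (the help text only says that -1 means no bound).

```python
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, str]  # (header without ">", sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records: Iterable[Record], lower: int, upper: int) -> Iterator[Record]:
    """Keep records with lower <= length <= upper; -1 disables a bound.

    Inclusive bounds are an assumption, not confirmed against fxtools.
    """
    for header, seq in records:
        n = len(seq)
        if lower != -1 and n < lower:
            continue
        if upper != -1 and n > upper:
            continue
        yield header, seq
```

With `open("input.fasta")` passed to `read_fasta`, `filter_by_length(records, 100, 1000)` would mirror the command above under those assumptions.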
2. fxtools filter-name
Usage: fxtools filter-name [-n name] [-m sub-name] [-l] <in.fa/fq> > <out.fa/fq>
-n [STR] only output read with specified name.
-m [STR] only output read whose name or comment contain specified string.
-l input a list of names or sub-names with a list file, each line is a name or sub-name. [False]
Output only the records with the specified header name.
# header name is chr1
fxtools filter-name -n chr1 input.fa > output.fa
# partial match: the header contains "plasmid"
fxtools filter-name -m plasmid input.fa > output.fa
For BAM files, use fxtools filter-bam-name.
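The difference between -n (exact name) and -m (substring of name or comment) can be sketched in Python as follows. This only illustrates the two matching modes described in the help text; the function names are hypothetical, and taking the read name to be the header up to the first whitespace is an assumption.

```python
from typing import Optional


def name_of(header: str) -> str:
    """Read name = header text up to the first whitespace (assumption)."""
    parts = header.split(None, 1)
    return parts[0] if parts else ""


def matches(header: str, name: Optional[str] = None, sub: Optional[str] = None) -> bool:
    """Return True if the header passes an exact-name or a substring test."""
    if name is not None and name_of(header) == name:
        return True
    # The substring test looks at the whole header line (name + comment).
    if sub is not None and sub in header:
        return True
    return False
```

Under this sketch, `-n chr1` keeps `chr1` but not `chr10`, while `-m plasmid` keeps any header line that contains the word.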
3. fxtools split-fx
Usage: fxtools split-fx <in.fa/q> <N> <out_dir>
Split a FASTA/FASTQ file into the specified number of files.
# split a multi-FASTA of 8 sequences into 8 files
mkdir outdir
fxtools split-fx input.fa 8 outdir
# => 8 FASTA files are created in the output directory.
# split a multi-FASTA of 8 sequences into 3 files
mkdir outdir
fxtools split-fx input.fa 3 outdir
# => 3 FASTA files are created in the output directory: two multi-FASTA files with 3 sequences each, and the remaining one with 2 sequences.
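The 8-into-3 result is what an even split gives. A small Python sketch of the arithmetic follows; the function name is hypothetical, and whether fxtools assigns records to files in exactly this way has not been checked.

```python
from typing import List


def chunk_sizes(total: int, parts: int) -> List[int]:
    """Spread `total` records over `parts` files as evenly as possible."""
    if parts < 1:
        raise ValueError("parts must be >= 1")
    base, extra = divmod(total, parts)
    # The first `extra` files each receive one additional record.
    return [base + 1 if i < extra else base for i in range(parts)]
```

`chunk_sizes(8, 3)` gives two files of 3 records and one of 2, and `chunk_sizes(8, 8)` gives one record per file, matching the two runs above.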
4. fxtools bam2bed
Usage: fxtools bam2bed in.bam > out.bed
Convert BAM to BED.
fxtools bam2bed input.bam > out.bed
5. fxtools re-co
Usage: fxtools re-co in.fa/fq > out.fa
Convert DNA sequences to their reverse complement.
# FASTQ
fxtools re-co input.fastq > out.fastq
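For reference, reverse-complementing is a complement lookup followed by a reversal. The Python sketch below shows the operation itself, not fxtools' code; reversing the quality string for FASTQ input is how the operation is normally defined and is assumed here rather than verified against the tool.

```python
from typing import Tuple

# Complement table for the unambiguous bases plus N, in both cases.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


def reverse_complement_fastq(seq: str, qual: str) -> Tuple[str, str]:
    """Reverse-complement the bases and reverse the qualities to match."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    return reverse_complement(seq), qual[::-1]
```

Applying the function twice returns the original sequence, which is a quick sanity check on any implementation.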
6. fxtools seq-display
Usage: fxtools seq-display <in.fa/fq> <chr/read_name> <start_pos(1-based)> <end_pos>
use negative coordinate to indicate later part of sequence. (e.g., -1 for last bp)
Output the specified region of the specified sequence.
# output positions 10000-11000 of chr1
fxtools seq-display input.fasta chr1 10000 11000 > out.fa
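The coordinate handling can be sketched in Python as below. The help text states that positions are 1-based and that -1 means the last bp; treating the end position as inclusive and mapping other negative values by counting back from the end are assumptions, and the function names are hypothetical.

```python
def to_one_based(pos: int, length: int) -> int:
    """Map a negative coordinate to a 1-based one (-1 -> last base)."""
    return length + pos + 1 if pos < 0 else pos


def region(seq: str, start: int, end: int) -> str:
    """Return seq[start..end], 1-based and inclusive on both ends (assumption)."""
    s = to_one_based(start, len(seq))
    e = to_one_based(end, len(seq))
    if s < 1 or e > len(seq) or s > e:
        raise ValueError("region out of range")
    # Python slices are 0-based and half-open, hence s - 1.
    return seq[s - 1:e]
```

Under these assumptions `region(seq, 10000, 11000)` returns 1001 bases, and `region(seq, -100, -1)` returns the last 100.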
8. fxtools cigar-parse
Usage: fxtools cigar-parse <input-cigar>
Parse a CIGAR string.
fxtools cigar-parse 144S157M
Output:
Cigar length:
157 M
144 S
seq-len: 301
ref-len: 157
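The two lengths follow from the SAM rules about which CIGAR operations consume the query and which consume the reference. A minimal Python sketch of that bookkeeping (not fxtools' code; the function name is hypothetical):

```python
import re
from collections import Counter
from typing import Tuple

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
QUERY_OPS = set("MIS=X")  # operations that consume read bases
REF_OPS = set("MDN=X")    # operations that consume reference bases


def parse_cigar(cigar: str) -> Tuple[Counter, int, int]:
    """Return (total length per operation, seq_len, ref_len)."""
    ops = CIGAR_RE.findall(cigar)
    # Reject strings with characters the pattern did not cover.
    if "".join(n + op for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    totals = Counter()
    for n, op in ops:
        totals[op] += int(n)
    seq_len = sum(v for op, v in totals.items() if op in QUERY_OPS)
    ref_len = sum(v for op, v in totals.items() if op in REF_OPS)
    return totals, seq_len, ref_len
```

For 144S157M this gives seq-len 301 (144 + 157, since S and M both consume the read) and ref-len 157 (only M consumes the reference), in line with the output above.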
9. fxtools length-parse
Usage: fxtools length-parse <in.fa/fq/len>
Analyze the read lengths in a FASTA/FASTQ file.
fxtools length-parse reference.fasta
Read_9997_length=16254bp_startpos=2897484_number_of_errors=2626_total_error_prob=0.15_passes=1.359881649744054_passes_left=1_passes_right=2_cut_position=10404 16254
Read_9998_length=14882bp_startpos=1299896_number_of_errors=1313_total_error_prob=0.09_passes=1.5013880205881063_passes_left=1_passes_right=2_cut_position=7420 14882
Read_9999_length=3576bp_startpos=603342_number_of_errors=13_total_error_prob=0.00_passes=6.341188385708311_passes_left=7_passes_right=6_cut_position=1220 3576
== '/Users/kazuma/Documents/pacbio-GT-S_simulation.fastq' read length stats ==
Total reads 10,000
Total bases 81,524,014
Mean length 8,152
Min. length 50
Max. length 30,592
N-50 length 9,430
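Of these statistics, N50 is the least obvious one. Below is a Python sketch of the common definition (the length L such that reads of length >= L hold at least half of all bases); whether fxtools uses exactly this tie-breaking is an assumption, and the function names are hypothetical.

```python
from typing import Dict, Iterable


def n50(lengths: Iterable[int]) -> int:
    """Length at which the running total, longest first, reaches half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input


def length_stats(lengths: Iterable[int]) -> Dict[str, int]:
    """Summary in the spirit of the table above (mean truncated to an integer)."""
    values = list(lengths)
    if not values:
        raise ValueError("no reads")
    return {
        "reads": len(values),
        "bases": sum(values),
        "mean": sum(values) // len(values),
        "min": min(values),
        "max": max(values),
        "n50": n50(values),
    }
```

As a check on the table above, 81,524,014 bases over 10,000 reads gives a truncated mean of 8,152, the value reported.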
10. fxtools merge-fa
Usage: fxtools merge-fa <in.fa/fq> [N] > <out.fa/fq>
optional: use N to separate merged sequences
Merge FASTA/FASTQ records that share the same header (the longer one is kept).
fxtools merge-fa input.fa > output.fa
11. fxtools duplicate-fa
Usage: fxtools duplicate-fa <in.fa/fq> <copy_number> > out.fa/fq
Duplicate FASTA/FASTQ sequences.
# duplicate 3 times
fxtools duplicate-fa input.fa 3 > out.fa
> seqkit stats input.fa out.fa
file      format  type  num_seqs  sum_len  min_len  avg_len  max_len
input.fa  FASTA   DNA          1   15,360   15,360   15,360   15,360
out.fa    FASTA   DNA          1   46,080   46,080   46,080   46,080
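The seqkit numbers show that the copies are concatenated into a single record: num_seqs stays at 1 while the length triples. In Python terms the operation amounts to the following (the function name is hypothetical):

```python
def duplicate(seq: str, copies: int) -> str:
    """Concatenate `copies` copies of the sequence into one record."""
    if copies < 1:
        raise ValueError("copies must be >= 1")
    return seq * copies
```

A 15,360 bp sequence duplicated 3 times is 46,080 bp, matching the table.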
12. fxtools error-parse
Usage: fxtools error-parse <input.bam> [-s] > error.out
-s include non-primary records in the output.
Parse indels and mismatches based on the CIGAR string and NM tag of each record in a BAM file.
fxtools error-parse input.bam > output
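The reason CIGAR and NM together are enough: NM is the edit distance to the reference, so it counts mismatched bases plus inserted and deleted bases, and the CIGAR gives the inserted and deleted bases directly. A Python sketch of that arithmetic follows; the function name and the returned layout are hypothetical, and the actual columns that fxtools writes are not reproduced here.

```python
import re
from typing import Dict

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def error_counts(cigar: str, nm: int) -> Dict[str, int]:
    """Split the NM edit distance into mismatch, insertion and deletion bases."""
    inserted = deleted = 0
    for n, op in CIGAR_RE.findall(cigar):
        if op == "I":
            inserted += int(n)
        elif op == "D":
            deleted += int(n)
    # Whatever remains of NM after removing indel bases is mismatches.
    mismatches = nm - inserted - deleted
    if mismatches < 0:
        raise ValueError("NM is smaller than the indel bases in the CIGAR")
    return {"mismatch": mismatches, "insertion": inserted, "deletion": deleted}
```

For example, CIGAR 10M2I5M1D20M with NM:i:5 breaks down into 2 inserted bases, 1 deleted base and 2 mismatches.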
13. fxtools trim
Usage: fxtools trim in.fa/fq min_trim_length min_fraction window_size > out.fa
Trim the poly A tail (poly T head).
fxtools trim input.fq 10 0.05 4 > output.fq
To output only the reads that have a poly A tail (poly T head), use fxtools trimF.
There are also commands for FASTQ <=> FASTA conversion and DNA <=> RNA conversion.
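Both conversions are simple transformations. A minimal Python sketch of what fq2fa and dna2rna amount to, assuming 4-line FASTQ records with no wrapped sequence lines (the function names are hypothetical and this is not fxtools' code):

```python
from typing import Iterable, List


def fastq_to_fasta(lines: Iterable[str]) -> List[str]:
    """Keep the header and sequence of each 4-line FASTQ record."""
    rows = [line.rstrip("\n") for line in lines]
    out = []
    for i in range(0, len(rows) - 3, 4):
        header, seq = rows[i], rows[i + 1]
        if not header.startswith("@"):
            raise ValueError(f"record at line {i + 1} does not start with '@'")
        out.append(">" + header[1:])
        out.append(seq)
    return out


def dna_to_rna(seq: str) -> str:
    """Replace T with U, preserving case."""
    return seq.translate(str.maketrans("Tt", "Uu"))
```

The reverse direction (fa2fq) has to invent quality values, since FASTA carries none.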
Reference
https://github.com/yangao07/fxtools
Related