Updated 2023/07/04
With plenty of colleagues, we are happy to introduce bioconvert, a software to convert bioinformatics files from one format to another. bioconvert currently includes 50 formats and 100 converters. pic.twitter.com/jXI68WyZ6w
— Yoann Dufresne (@yoann_dufresne) March 23, 2023
#conda (takes about 10 minutes because there are many dependencies)
mamba create -n bioconvert -y
conda activate bioconvert
mamba install -c bioconda bioconvert -y
#pip (PyPI; some conversions also require samtools)
pip install bioconvert
docker pull bioconvert/bioconvert:0.6.1
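Because the pip route leaves external tools such as samtools to the user, it can help to check what is on PATH before converting. The following is a minimal sketch using only the standard library; `missing_tools` is a hypothetical helper, not part of bioconvert, and the tool list is an example rather than bioconvert's full dependency list.

```python
import shutil


def missing_tools(tools):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # Example list only; adjust it to the conversions you plan to run.
    missing = missing_tools(["bioconvert", "samtools"])
    print("missing:", ", ".join(missing) if missing else "none")
```

For a complete picture, bioconvert's own `--dependency-report` option (listed in the help output) is the authoritative source.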
> bioconvert --help
$ bioconvert -h
usage: bioconvert [-h] [-v {DEBUG,INFO,WARNING,ERROR,CRITICAL}] [-l {DEBUG,INFO,WARNING,ERROR,CRITICAL}] [--dependency-report] [-a] [--version] [--conversion-graph [{cytoscape,cytoscape-all}]]
positional arguments:
sub-command help
abi2fasta abi to-> fasta (1 methods)
abi2fastq abi to-> fastq (1 methods)
abi2qual abi to-> qual (1 methods)
bam2bedgraph bam to-> bedgraph (2 methods)
bam2bigwig bam to-> bigwig (2 methods)
bam2cov bam to-> cov (2 methods)
bam2cram bam to-> cram (1 methods)
bam2fasta bam to-> fasta (1 methods)
bam2fastq bam to-> fastq (2 methods)
bam2json bam to-> json (1 methods)
bam2sam bam to-> sam (4 methods)
bam2tsv bam to-> tsv (2 methods)
bam2wiggle bam to-> wiggle (1 methods)
bcf2vcf bcf to-> vcf (1 methods)
bcf2wiggle bcf to-> wiggle (1 methods)
bed2wiggle bed to-> wiggle (1 methods)
bedgraph2bigwig bedgraph to-> bigwig (1 methods)
bedgraph2cov bedgraph to-> cov (1 methods)
bedgraph2wiggle bedgraph to-> wiggle (1 methods)
bigbed2bed bigbed to-> bed (1 methods)
bigbed2wiggle bigbed to-> wiggle (1 methods)
bigwig2bedgraph bigwig to-> bedgraph (2 methods)
bigwig2wiggle bigwig to-> wiggle (1 methods)
bplink2plink bplink to-> plink (1 methods)
bplink2vcf bplink to-> vcf (1 methods)
bz22gz bz2 to-> gz (2 methods)
clustal2fasta clustal to-> fasta (3 methods)
clustal2nexus clustal to-> nexus (1 methods)
clustal2phylip clustal to-> phylip (2 methods)
clustal2stockholm clustal to-> stockholm (2 methods)
cram2bam cram to-> bam (1 methods)
cram2fasta cram to-> fasta (1 methods)
cram2fastq cram to-> fastq (1 methods)
cram2sam cram to-> sam (1 methods)
csv2tsv csv to-> tsv (3 methods)
csv2xls csv to-> xls (2 methods)
dsrc2gz dsrc to-> gz (1 methods)
embl2fasta embl to-> fasta (2 methods)
embl2genbank embl to-> genbank (2 methods)
fasta2clustal fasta to-> clustal (3 methods)
fasta2faa fasta to-> faa (1 methods)
fasta2fasta_agp fasta to-> fasta_agp (1 methods)
fasta2fastq fasta to-> fastq (1 methods)
fasta2genbank fasta to-> genbank (3 methods)
fasta2nexus fasta to-> nexus (1 methods)
fasta2phylip fasta to-> phylip (3 methods)
fasta2twobit fasta to-> twobit (1 methods)
fasta_qual2fastq fasta_qual to-> fastq (1 methods)
fastq2fasta fastq to-> fasta (9 methods)
fastq2fasta_qual fastq to-> fasta_qual (1 methods)
fastq2qual fastq to-> qual (1 methods)
genbank2embl genbank to-> embl (2 methods)
genbank2fasta genbank to-> fasta (3 methods)
genbank2gff3 genbank to-> gff3 (1 methods)
gfa2fasta gfa to-> fasta (2 methods)
gff22gff3 gff2 to-> gff3 (1 methods)
gff32gff2 gff3 to-> gff2 (1 methods)
gff32gtf gff3 to-> gtf (1 methods)
gz2bz2 gz -to-> bz2 (3 methods)
gz2dsrc gz -to-> dsrc (1 methods)
json2yaml json to-> yaml (1 methods)
maf2sam maf to-> sam (1 methods)
newick2nexus newick to-> nexus (1 methods)
newick2phyloxml newick to-> phyloxml (1 methods)
nexus2clustal nexus to-> clustal (3 methods)
nexus2fasta nexus to-> fasta (3 methods)
nexus2newick nexus to-> newick (2 methods)
nexus2phylip nexus to-> phylip (1 methods)
nexus2phyloxml nexus to-> phyloxml (1 methods)
ods2csv ods to-> csv (1 methods)
pdb2faa pdb to-> faa (1 methods)
phylip2clustal phylip to-> clustal (2 methods)
phylip2fasta phylip to-> fasta (3 methods)
phylip2nexus phylip to-> nexus (1 methods)
phylip2stockholm phylip to-> stockholm (2 methods)
phylip2xmfa phylip to-> xmfa (1 methods)
phyloxml2newick phyloxml to-> newick (1 methods)
phyloxml2nexus phyloxml to-> nexus (1 methods)
plink2bplink plink to-> bplink (1 methods)
plink2vcf plink to-> vcf (1 methods)
sam2bam sam to-> bam (1 methods)
sam2cram sam to-> cram (1 methods)
sam2paf sam to-> paf (1 methods)
scf2fasta scf to-> fasta (1 methods)
scf2fastq scf to-> fastq (1 methods)
sra2fastq sra to-> fastq (1 methods)
stockholm2clustal stockholm to-> clustal (2 methods)
stockholm2phylip stockholm to-> phylip (2 methods)
tsv2csv tsv to-> csv (3 methods)
twobit2fasta twobit to-> fasta (2 methods)
vcf2bcf vcf to-> bcf (1 methods)
vcf2bed vcf to-> bed (1 methods)
vcf2bplink vcf to-> bplink (1 methods)
vcf2plink vcf to-> plink (1 methods)
vcf2wiggle vcf to-> wiggle (1 methods)
wig2bed wig to-> bed (1 methods)
xls2csv xls to-> csv (2 methods)
xlsx2csv xlsx to-> csv (2 methods)
xmfa2phylip xmfa to-> phylip (1 methods)
yaml2json yaml to-> json (1 methods)
-h, --help show this help message and exit
Set the outpout verbosity. Same as --level
Set the outpout verbosity. Same as --verbosity
--dependency-report Output all bioconvert dependencies in json and exit
-a, --allow-indirect-conversion
Show all possible indirect conversions (labelled as intermediate)
--version Show version
--conversion-graph [{cytoscape,cytoscape-all}]
Bioconvert contains tens of converters whose list is available as follows:
bioconvert --help
Each conversion has its own sub-command and dedicated help. For instance:
bioconvert fastq2fasta --help
Because the subcommand contains the format, extensions are not important
for the conversion itself. This would convert the test.txt file (fastq
format) into a fasta file:
bioconvert fastq2fasta test.txt test.fasta
If you use known extensions, the converter may be omitted::
bioconvert test.fastq test.fasta
Users must ensure that their input format files are properly formatted.
If there is a conversion from A to B and another for B to C, you can also
perform indirect conversion using -a argument (experimental). This command
shows all possible indirect conversions:
bioconvert --help -a
Please visit http://bioconvert.readthedocs.org for more information about the
project or formats available. Would you wish to help, please join our open
source collaborative project at https://github/bioconvert/bioconvert
> bioconvert fastq2fasta --help
$ bioconvert fastq2fasta --help
usage: bioconvert fastq2fasta [-h] [-f] [-v {DEBUG,INFO,WARNING,ERROR,CRITICAL}] [--raise-exception] [-X] [-b] [-N BENCHMARK_N] [-T BENCHMARK_TAG] [-I] [--benchmark-mode BENCHMARK_MODE] [-M BENCHMARK_METHODS [BENCHMARK_METHODS ...]] [-a] [-e EXTRA_ARGUMENTS]
[-m [{awk,bioconvert,mappy,mawk,perl,readfq,sed,seqkit,seqtk}]] [-s] [-t THREADS]
[input_file] [output_file]
Convert file from '('FASTQ',)' to '('FASTA',)' format. See bioconvert.readthedocs.io for details
positional arguments:
input_file The path to the file to convert. (default: None)
output_file The path where the result will be stored. (default: None)
-h, --help show this help message and exit
-f, --force if outfile exists, it is overwritten with this option (default: False)
Set the outpout verbosity. (default: ERROR)
--raise-exception Let exception ending the execution be raised and displayed (default: False)
-X, --batch Allow conversion of a set of files using wildcards. You must use quotes to escape the wildcards. For instance: --batch 'test*fastq' (default: False)
-b, --benchmark Running all available methods (default: False)
Number of trials for each methods (default: 5)
Save results (json and image) named after this tag. You may include sub directories (default: bioconvert)
-I, --benchmark-save-image
Save results as an image (using the same tag as from --benchmark-tag) (default: False)
--benchmark-mode BENCHMARK_MODE
Set the mode of the benchmark, which can be time, CPU or memory. Defaults to time) (default: time)
Methods to include. Provide list as space-separated method names. Use -s to get the full list. (default: all)
-a, --allow-indirect-conversion
Allow to chain converter when direct conversion is absent (default: False)
Any arguments accepted by the method's tool (default: )
-m [{awk,bioconvert,mappy,mawk,perl,readfq,sed,seqkit,seqtk}], --method [{awk,bioconvert,mappy,mawk,perl,readfq,sed,seqkit,seqtk}]
The method to use to do the conversion. (default: bioconvert)
-s, --show-methods A converter may have several methods (default: False)
-t THREADS, --threads THREADS
threads to be used (default: 4)
Bioconvert is an open source collaborative project. Please feel free to join us at https://github/biokit/bioconvert
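For orientation, what a FASTQ to FASTA conversion involves can be sketched in a few lines of plain Python. This is an illustrative sketch, not bioconvert's implementation or any of the methods listed under `-m`; it assumes strict 4-line FASTQ records (no wrapped sequence lines), and `fastq_to_fasta` is a hypothetical name.

```python
def fastq_to_fasta(lines):
    """Yield FASTA lines from an iterable of strict 4-line FASTQ records."""
    # Drop line endings and ignore blank lines (for example a trailing newline).
    cleaned = [line.rstrip("\n") for line in lines if line.strip()]
    if len(cleaned) % 4:
        raise ValueError("FASTQ input must contain a multiple of 4 lines")
    for i in range(0, len(cleaned), 4):
        header, sequence, plus, _quality = cleaned[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record starting at line {i + 1}")
        # FASTA keeps the read name and sequence; the quality line is dropped.
        yield ">" + header[1:]
        yield sequence


if __name__ == "__main__":
    record = ["@read1 sample\n", "ACGT\n", "+\n", "IIII\n"]
    print("\n".join(fastq_to_fasta(record)))
```

The quality string is discarded, which is why the reverse direction needs a separate converter.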
1. Basic usage. In most cases, Bioconvert needs only the subcommand name and the input and output file names.
bioconvert fastq2fasta input.fastq output.fasta
bioconvert fastq2fasta input.fastq
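As shown above, the subcommand specifies which formats to convert between: fastq2fasta converts fastq to fasta (from fastq to (2) fasta). There are a great many subcommands, but before going through them, let's look at the more flexible ways bioconvert can be used.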
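The `<source>2<target>` naming can be decoded mechanically, with one wrinkle: some format names themselves end in a digit 2 (bz22gz, gff22gff3). A minimal sketch, assuming a known set of format names (here a small subset copied from the help listing); `split_subcommand` is a hypothetical helper, not a bioconvert function.

```python
# A few formats copied from the help listing; not the full list.
KNOWN_FORMATS = {"fastq", "fasta", "fasta_qual", "bz2", "gz", "gff2", "gff3", "sam", "bam"}


def split_subcommand(name, formats=KNOWN_FORMATS):
    """Return (source, target) for a subcommand named <source>2<target>."""
    for i, char in enumerate(name):
        if char != "2":
            continue
        source, target = name[:i], name[i + 1:]
        # Accept the split only when both sides are known formats,
        # which resolves names such as bz22gz or gff32gff2.
        if source in formats and target in formats:
            return source, target
    raise ValueError(f"cannot split {name!r} into two known formats")


if __name__ == "__main__":
    for name in ("fastq2fasta", "bz22gz", "gff32gff2"):
        print(name, "->", split_subcommand(name))
```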
bioconvert input.fastq output.fasta
bioconvert test.fastq.gz test.fasta
bioconvert test.fastq.gz test.fasta.gz
bioconvert test.fastq.gz test.fasta.bz2
bioconvert test.fastq.gz test.fastq.dsrc
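The commands above omit the subcommand and rely on known file extensions, including compression suffixes. How such a lookup could work is sketched below; this is an assumption for illustration, not bioconvert's actual logic, and `split_extensions` and `guess_subcommand` are hypothetical names. The sketch does not check that the resulting subcommand really exists.

```python
from pathlib import Path

# Compression extensions that appear in the examples above.
COMPRESSION = {"gz", "bz2", "dsrc"}


def split_extensions(filename):
    """Return (data_format, compression); either may be None if absent."""
    parts = Path(filename).name.split(".")[1:]
    compression = parts.pop() if parts and parts[-1] in COMPRESSION else None
    data_format = parts[-1] if parts else None
    return data_format, compression


def guess_subcommand(infile, outfile):
    """Guess a <source>2<target> subcommand name from two file names."""
    in_fmt, in_comp = split_extensions(infile)
    out_fmt, out_comp = split_extensions(outfile)
    if in_fmt is None or out_fmt is None:
        raise ValueError("no recognizable extension; name the subcommand explicitly")
    if in_fmt != out_fmt:
        return f"{in_fmt}2{out_fmt}"
    # Same data format: only the compression differs (for example gz to dsrc).
    if in_comp and out_comp and in_comp != out_comp:
        return f"{in_comp}2{out_comp}"
    raise ValueError("input and output formats are identical")


if __name__ == "__main__":
    print(guess_subcommand("test.fastq.gz", "test.fasta.bz2"))
    print(guess_subcommand("test.fastq.gz", "test.fastq.dsrc"))
```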
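5. Wildcards. Wildcards such as * and ? in the input are also recognized, and the matching files are converted one after another. Because no output is specified, implicit conversion is not possible here; specify the subcommand to state explicitly which conversion to perform.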
#fastq => fasta
bioconvert fastq2fasta "*.fastq"
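To preview which files a quoted pattern would pick up before running a batch, the matching can be reproduced in Python. This is a sketch under assumptions: `plan_batch` is a hypothetical helper, and the output naming (same stem, new extension) is a guess at the convention, not something confirmed from bioconvert's source.

```python
import fnmatch
from pathlib import Path


def plan_batch(pattern, filenames, out_ext):
    """Pair each file name matching `pattern` with an assumed output name."""
    matched = sorted(fnmatch.filter(filenames, pattern))
    # Assumption: the output keeps the stem and swaps the last extension.
    return [(name, str(Path(name).with_suffix("." + out_ext))) for name in matched]


if __name__ == "__main__":
    files = ["b.fastq", "a.fastq", "reads.sam"]
    for source, target in plan_batch("*.fastq", files, "fasta"):
        print(source, "->", target)
```

In a real directory, `glob.glob(pattern)` would supply the `filenames` list.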
bioconvert sra2fastq ERR043367
- -m [{fastq_dump}] The method to use to do the conversion. (default: fastq_dump)
7. sam <=> bam conversion. The sam => bam conversion usually done with samtools can be performed here as well (coordinate sort).
#implicit conversion sam => bam
bioconvert ERR043367.sam ERR043367.bam
#bam => sam
bioconvert ERR043367.bam ERR043367.sam
#conversion with an explicit subcommand sam => bam
bioconvert sam2bam ERR043367.sam
- -m [{samtools}] The method to use to do the conversion. (default: samtools)
- -t threads to be used (default: 4)
- -X Allow conversion of a set of files using wildcards. You must use quotes to escape the wildcards. For instance: --batch 'test*fastq' (default: False)
minimap2 -a ref.fasta reads.fq | bioconvert sam2bam - out.bam
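In addition, to convert A => B and then B => C, an indirect conversion can be performed with the -a argument (currently experimental): "-a Allow to chain converter when direct conversion is absent (default: False)"
In the help shown by bioconvert --help -a, the commands labelled "intermediate" are the ones that perform an indirect conversion.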
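Chaining converters amounts to finding a path in a graph whose nodes are formats and whose edges are direct converters. A minimal breadth-first-search sketch follows; it is an illustration of the idea, not bioconvert's implementation, `find_chain` is a hypothetical helper, and `EDGES` holds only a handful of converters taken from the help listing.

```python
from collections import deque

# A few direct converters from the help listing, as (source, target) pairs.
EDGES = [
    ("fastq", "fasta"),
    ("fasta", "clustal"),
    ("clustal", "stockholm"),
    ("sam", "bam"),
    ("bam", "bigwig"),
]


def find_chain(source, target, edges=EDGES):
    """Return the shortest list of subcommand names from source to target, or None."""
    graph = {}
    for start, end in edges:
        graph.setdefault(start, []).append(end)
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        fmt, chain = queue.popleft()
        if fmt == target:
            return chain
        for nxt in graph.get(fmt, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, chain + [f"{fmt}2{nxt}"]))
    return None  # no chain of converters connects the two formats


if __name__ == "__main__":
    print(find_chain("sam", "bigwig"))
    print(find_chain("fastq", "stockholm"))
```

Breadth-first search returns a chain with the fewest intermediate steps, which keeps the number of temporary files low.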
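- When there are many files to convert, the Sequana project provides a snakemake pipeline, which can be installed with pip install sequana_bioconvert.
- It can also be used from the Python console.
- Even when a subcommand exists, the conversion is not run if it cannot be done. For example, the fasta => clustal conversion expects a FASTA-format MSA, so it does not run if the input file contains only a single sequence.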
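A quick pre-check can tell whether a FASTA file plausibly holds an alignment before attempting fasta => clustal. The sketch below is an assumption for illustration: `looks_like_msa` is a hypothetical helper, and the rule it applies (at least two sequences, all of the same length including gaps) is a general property of alignments, not necessarily the check bioconvert performs.

```python
def looks_like_msa(fasta_text):
    """Return True if the FASTA text has >= 2 sequences of equal length."""
    lengths = []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            lengths.append(0)  # a new record starts
        elif lengths:
            lengths[-1] += len(line)  # sequence may span several lines
    return len(lengths) >= 2 and len(set(lengths)) == 1


if __name__ == "__main__":
    aligned = ">a\nAC-GT\n>b\nACTGT\n"
    single = ">a\nACGT\n"
    print(looks_like_msa(aligned), looks_like_msa(single))
```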
BioConvert: a comprehensive format converter for life sciences
Hugo Caro, Sulyvan Dollin, Anne Biton, Bryan Brancotte, Dimitri Desvillechabrol, Yoann Dufresne, Blaise Li, Etienne Kornobis, Frédéric Lemoine, Nicolas Maillet, Amandine Perrin, Nicolas Traut, Bertrand Néron, Thomas Cokelaer
bioRxiv, Posted March 15, 2023