genome_updater is a Bash script that downloads and updates NCBI genomes (RefSeq / GenBank). It supports updating existing data, keeping detailed logs, file integrity checks (MD5), and parallel [2] downloads.




# Install from bioconda
conda create -n genome_updater -y
conda activate genome_updater
conda install -c bioconda -y genome_updater


 -g Organism group (one or more comma-separated entries) [archaea, bacteria, fungi, human (also contained in vertebrate_mammalian), invertebrate, metagenomes (genbank), other (synthetic genomes - only genbank), plant, protozoa, vertebrate_mammalian, vertebrate_other, viral (only refseq)]. Example: archaea,bacteria

    or Species level taxids (one or more comma-separated entries). Example: species:622,562

    or Any level taxids - lineage will be generated (one or more comma-separated entries). Example: taxids:620,649776


 -d Database [genbank, refseq]

Default: refseq

 -c RefSeq Category [all, reference genome, representative genome, na]

Default: all

 -l Assembly level [all, Complete Genome, Chromosome, Scaffold, Contig]

Default: all

 -f File formats [genomic.fna.gz,assembly_report.txt, ... - check for all file formats]

Default: assembly_report.txt


 -k Dry-run, no data is downloaded or updated - just checks for available sequences and changes

 -i Fix failed downloads or any incomplete data from a previous run, keep current version

 -x Allow the deletion of extra files if some are found in the repository folder


 -u Report of updated assembly accessions (Added/Removed, assembly accession, url)

 -r Report of updated sequence accessions (Added/Removed, assembly accession, genbank accession, refseq accession, sequence length, taxid). Only available when file assembly_report.txt selected and successfully downloaded

 -p Output list of URLs for downloaded and failed files

 -a Download the current version of the Taxonomy database (taxdump.tar.gz)


 -o Working output directory 

Default: ./tmp.XXXXXXXXXX

 -b Version label

Default: current timestamp (YYYY-MM-DD_HH-MM-SS)

 -e External "assembly_summary.txt" file to recover data from 

Default: ""

 -t Threads

Default: 1


 -m Check MD5 for downloaded files

 -s Silent output

 -w Silent output with download progress (%) and download version at the end

 -n Conditional exit status. Exit Code = 1 if more than N files failed to download (integer for file number, float for percentage, 0 -> off)

Default: 0
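
The -n description above mixes two kinds of threshold (a file count and a percentage), which is easy to misread. The sketch below shows one way such a rule could be evaluated. It is not taken from genome_updater's source: the function name `exceeds_threshold` is hypothetical, and it assumes that a value containing a decimal point is a percentage on a 0 to 100 scale.

```shell
# Hypothetical helper, not part of genome_updater.
# Usage: exceeds_threshold FAILED TOTAL THRESHOLD
# Returns 0 (true) when the run should end with exit code 1.
exceeds_threshold() {
    local failed=$1 total=$2 threshold=$3
    awk -v f="$failed" -v t="$total" -v n="$threshold" 'BEGIN {
        if (n + 0 == 0) exit 1                 # 0 -> check is switched off
        if (n ~ /^[0-9]+$/) exit !(f > n)      # integer: number of failed files
        pct = (t > 0) ? 100 * f / t : 0        # float: percentage (assumed 0-100)
        exit !(pct > n)
    }'
}
```

For example, `exceeds_threshold 5 100 3` succeeds because 5 failed files is more than 3, while `exceeds_threshold 5 100 5.5` does not because 5% is below 5.5%.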





# Download the complete genomes of archaea and bacteria from RefSeq, using 8 threads and MD5 checks.
genome_updater.sh -d "refseq" -g "archaea,bacteria" -c "all" -l "Complete Genome" -f "genomic.fna.gz" -o "arc_bac_refseq_cg" -t 8 -u -m

# Download gbff (GenBank format) files in addition to the FASTA files.
genome_updater.sh -d "refseq" -g "archaea,bacteria" -c "all" -l "Complete Genome" -f "genomic.fna.gz,genomic.gbff.gz" -o "arc_bac_refseq_cg" -t 8 -u -m -i

# Some time later, check for updates (dry run).
genome_updater.sh -d "refseq" -g "archaea,bacteria" -c "all" -l "Complete Genome" -f "genomic.fna.gz,genomic.gbff.gz" -o "arc_bac_refseq_cg" -k

# If there are updates, apply them.
genome_updater.sh -d "refseq" -g "archaea,bacteria" -c "all" -l "Complete Genome" -f "genomic.fna.gz,genomic.gbff.gz" -o "arc_bac_refseq_cg" -t 8 -u -m
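
The check and the update share almost all of their options, so when this is repeated regularly it may help to wrap them in small functions. The following is only a sketch: the function names are hypothetical, and the executable name `genome_updater.sh` is an assumption that can be overridden through the `GENOME_UPDATER` variable if your installation differs.

```shell
# Executable name is an assumption; set GENOME_UPDATER to override it.
GENOME_UPDATER=${GENOME_UPDATER:-genome_updater.sh}

# Options shared by the dry run and the real update (same values as the
# archaea/bacteria example); extra flags are appended via "$@".
run_updater() {
    "$GENOME_UPDATER" -d refseq -g "archaea,bacteria" -c all \
        -l "Complete Genome" -f "genomic.fna.gz,genomic.gbff.gz" \
        -o arc_bac_refseq_cg "$@"
}

# Dry run: only reports available sequences and changes.
check_updates() { run_updater -k; }

# Real update: 8 threads, updated assembly accession report, MD5 check.
apply_updates() { run_updater -t 8 -u -m; }
```

A typical session would call `check_updates` first, read its output, and run `apply_updates` only if changes were reported.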


# Download all RNA viruses (under the taxon Riboviria) from RefSeq.
genome_updater.sh -d "refseq" -g "taxids:2559587" -f "genomic.fna.gz" -o "all_rna_virus" -t 12
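
When several taxa are needed, the value for -g has to be a single comma-separated string such as `taxids:620,649776`. A minimal sketch of building that string from a list of taxids, one per line, is shown here. The helper name `taxids_arg` and the input file name in the usage note are hypothetical.

```shell
# Hypothetical helper: read taxids (one per line) from stdin and print
# a value suitable for -g, e.g. "taxids:620,649776".
# Lines that are not plain integers are ignored; duplicates are dropped
# while keeping the original order. Fails if no valid taxid is found.
taxids_arg() {
    awk '/^[0-9]+$/ && !seen[$0]++ {
             out = out (out != "" ? "," : "") $0
         }
         END {
             if (out == "") exit 1
             print "taxids:" out
         }'
}
```

It could then be used as `-g "$(taxids_arg < my_taxids.txt)"` in place of a hand-typed list.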


# Check the number of viral genomes available in GenBank (dry run).
genome_updater.sh -d "genbank" -g "viral" -c "all" -l "all" -k
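
The dry run reports how many assemblies are available. If you also keep a local copy of an assembly_summary.txt file (the kind of file -e accepts), a breakdown by assembly level can be sketched with awk. This is an illustration rather than something genome_updater does itself: it assumes NCBI's tab-separated layout with header lines starting with `#` and the assembly level in column 12, and the function name is hypothetical.

```shell
# Hypothetical helper: count assemblies per assembly level in a local
# assembly_summary.txt. Assumes tab-separated columns, "#" header lines,
# and assembly_level in column 12.
count_by_level() {
    awk -F '\t' '!/^#/ && NF >= 12 { n[$12]++ }
                 END { for (level in n) print n[level] "\t" level }' "$1" |
        LC_ALL=C sort -k1,1nr -k2
}
```

Calling `count_by_level assembly_summary.txt` prints one line per level, with the count first and the most frequent level at the top.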



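# Download the assembly reports for all RefSeq fungi, and output the updated sequence accession report (-r) and the list of URLs (-p).
genome_updater.sh -d "refseq" -g "fungi" -c "all" -l "all" -f "assembly_report.txt" -o "fungi" -t 12 -r -p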




