MetamORF: A repository of unique short Open Reading Frames identified by both experimental and computational approaches for gene-level and meta-analysis
Sebastien A. Choteau, Audrey Wagner, Philippe Pierre, Lionel Spinelli, Christine Brun
Full-length de novo viral quasispecies assembly through variation graph construction
Jasmijn A Baaijens, Bastiaan Van der Roest, Johannes Köster, Leen Stougie, Alexander Schönhuth
Bioinformatics, Volume 35, Issue 24, 15 December 2019, Pages 5086–5094
~~ separated list of VIGOR parameters to override default values. Use --list-config-parameters to see settable parameters.
-v, --verbose verbose logging (default=terse)
--list-config-parameters [{all,current}]
list available configuration parameters and exit. By default only lists description, use the verbose option before this option to list more information
--list-databases list the names and other information about the found vigor compatible databases. Requires reference database path to be set either by passing the --reference-database-path command line parameter or setting reference_database_path in the configuration file
--version print version information
--config-file CONFIG_FILE
config file to use
--reference-database-path REFERENCE_DATABASE_PATH
reference database path
--virus-config VIRUSSPECIFICCONFIG
Path to virus specific configuration
--virus-config-path VIRUSSPECIFICCONFIGPATH
Path to directory containing virus specific config files.
--overwrite-output overwrite existing output files if they exist
--temporary-directory TEMPORARYDIRECTORY
Root directory to use for temporary directories
--list-output-formats list acceptable output formats and exit
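Given a collection of protein sequences, cblaster searches a sequence database either remotely (via the NCBI BLAST API) or locally (via DIAMOND). The search results are parsed and filtered against user-defined thresholds for identity, coverage, and e-value. The genomic coordinates of the remaining hits are then retrieved from NCBI's Identical Protein Groups (IPG) database (or, for local searches, from the local database). Finally, cblaster scans for instances of collocation and visualizes them.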
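The identity/coverage/e-value filtering step can be sketched in Python as follows. This is a minimal illustration, not cblaster's code: `Hit` and `filter_hits` are hypothetical names, the default thresholds are taken from the documented options (-me 0.01, -mi 30, -mc 50), treating each threshold as inclusive is an assumption, and the accessions in the example are made up.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """A parsed search hit (hypothetical structure, not cblaster's own class)."""
    query: str
    subject: str
    identity: float  # percent identity, 0-100
    coverage: float  # percent query coverage, 0-100
    evalue: float


def filter_hits(hits, max_evalue=0.01, min_identity=30.0, min_coverage=50.0):
    """Keep only hits that pass all three thresholds (inclusive bounds assumed)."""
    return [
        h for h in hits
        if h.evalue <= max_evalue
        and h.identity >= min_identity
        and h.coverage >= min_coverage
    ]


# Illustrative hits with made-up accessions.
hits = [
    Hit("q1", "WP_0001", identity=45.0, coverage=80.0, evalue=1e-20),  # passes
    Hit("q1", "WP_0002", identity=25.0, coverage=90.0, evalue=1e-10),  # identity too low
    Hit("q2", "WP_0003", identity=60.0, coverage=40.0, evalue=1e-30),  # coverage too low
    Hit("q3", "WP_0004", identity=70.0, coverage=95.0, evalue=0.5),    # e-value too high
]
kept = filter_hits(hits)
print([h.subject for h in kept])  # ['WP_0001']
```

A hit is kept only when it clears every threshold at once, so raising any single threshold can only shrink the result.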
cblaster is tested on Python 3.6, and its only external Python dependency is the requests module (used for interaction with NCBI APIs). If you want to perform local searches, you should have diamond installed and available on your system $PATH.
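Both prerequisites can be checked quickly from Python. The function name `check_cblaster_environment` is hypothetical; it only checks that `requests` is importable and that an executable named `diamond` is on `$PATH`, not that the installed versions are compatible with cblaster.

```python
import importlib.util
import shutil


def check_cblaster_environment():
    """Report whether cblaster's documented prerequisites appear to be available.

    - requests: the only external Python dependency (used for NCBI API calls)
    - diamond: needed on $PATH only for local searches
    """
    return {
        "requests": importlib.util.find_spec("requests") is not None,
        "diamond": shutil.which("diamond") is not None,
    }


status = check_cblaster_environment()
for name, ok in status.items():
    print(f"{name}: {'found' if ok else 'missing'}")
```

If only `diamond` is missing, remote searches should still work; local searches need it.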
-qf, --query_file Path to FASTA file containing protein sequences to be searched
-me, --max_evalue Maximum e-value for a BLAST hit to be saved (def. 0.01)
-mi, --min_identity Minimum percent identity for a BLAST hit to be saved (def. 30)
-mc, --min_coverage Minimum percent query coverage for a BLAST hit to be saved (def. 50)
-u, --unique Minimum number of unique query sequences that must be conserved in a hit cluster (def. 3)
-mh, --min_hits Minimum number of hits in a cluster (def. 3)
-g, --gap Maximum allowed intergenic distance (bp) between conserved hits to be considered in the same block (def. 20000)
-ode, --output_delimiter Delimiter character to use when printing result output.
-odc, --output_decimals Total decimal places to use when printing score values
-s, --session_file Load session from JSON. If the specified file does not exist, the results of the new search will be saved to this file.
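To make the interplay of -g, -u and -mh concrete, here is a simplified sketch of collocation detection under stated assumptions. Hits on the same scaffold are sorted by start coordinate and merged into one block while the distance from the end of the current block to the next hit is at most `gap`. A block is reported only if it contains at least `unique` distinct queries and at least `min_hits` hits. `LocatedHit` and `find_clusters` are hypothetical names, and cblaster's own implementation may handle details such as strand or overlapping genes differently.

```python
from collections import defaultdict
from typing import NamedTuple


class LocatedHit(NamedTuple):
    """A hit with genomic coordinates (hypothetical structure)."""
    query: str
    scaffold: str
    start: int
    end: int


def find_clusters(hits, gap=20000, unique=3, min_hits=3):
    """Group hits into blocks by intergenic distance, then filter the blocks."""
    by_scaffold = defaultdict(list)
    for h in hits:
        by_scaffold[h.scaffold].append(h)

    blocks = []
    for scaffold_hits in by_scaffold.values():
        scaffold_hits.sort(key=lambda h: h.start)
        block = [scaffold_hits[0]]
        block_end = scaffold_hits[0].end
        for h in scaffold_hits[1:]:
            if h.start - block_end <= gap:
                # Close enough to the current block: extend it.
                block.append(h)
                block_end = max(block_end, h.end)
            else:
                # Too far away: close the current block and start a new one.
                blocks.append(block)
                block = [h]
                block_end = h.end
        blocks.append(block)

    return [
        b for b in blocks
        if len({h.query for h in b}) >= unique and len(b) >= min_hits
    ]


hits = [
    LocatedHit("q1", "scaf1", 1000, 2000),
    LocatedHit("q2", "scaf1", 5000, 6000),
    LocatedHit("q3", "scaf1", 9000, 10000),
    LocatedHit("q1", "scaf1", 80000, 81000),  # >20 kb from the previous hit
    LocatedHit("q1", "scaf2", 100, 900),
    LocatedHit("q2", "scaf2", 1500, 2500),    # only 2 distinct queries
]
clusters = find_clusters(hits)
```

With the default settings, only the first three hits on `scaf1` form a reported cluster: the isolated hit at 80 kb and the two-query block on `scaf2` fall below the -u and -mh limits.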
A remote sequence search is performed.
[12:26:10] INFO - Starting cblaster in remote mode
[12:26:10] INFO - Launching new search
[12:26:12] INFO - Request Identifier (RID): UPPEKFF9016
[12:26:12] INFO - Request Time Of Execution (RTOE): 27s
[12:26:39] INFO - Polling NCBI for completion status
[12:26:39] INFO - Checking search status...
[12:27:39] INFO - Checking search status...
[12:27:40] INFO - Search has completed successfully!
[12:27:40] INFO - Retrieving results for search UPPEKFF9016
[12:27:43] INFO - Parsing results...
[12:27:43] INFO - Found 6835 hits meeting score thresholds
[12:27:43] INFO - Fetching genomic context of hits
[12:28:18] WARNING - Found no hits for IPG 346263175
[12:28:18] WARNING - Found no hits for IPG 346272747
[12:28:18] WARNING - Found no hits for IPG 336900197
[12:28:18] WARNING - Found no hits for IPG 341670659
[12:28:18] WARNING - Found no hits for IPG 334707585
[12:28:18] WARNING - Found no hits for IPG 341666946
[12:28:18] WARNING - Found no hits for IPG 341669695
[12:28:18] WARNING - Found no hits for IPG 341667385
(remainder omitted)
Output
A heatmap is displayed showing identity from 0% (white) to 100% (blue).
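The log shows the remote workflow. A search is submitted, and NCBI returns a Request Identifier (RID) and a Request Time Of Execution (RTOE, here 27 s). The first status check happens once that time has elapsed (12:26:12 + 27 s = 12:26:39), and further checks follow (the next at 12:27:39) until the search completes. The sketch below parses the RID, RTOE and status from response text. It assumes the `QBlastInfo` block format of the NCBI BLAST URL API (`RID = ...`, `RTOE = ...`, `Status=WAITING`/`READY`/...). The sample responses are hand-written for illustration, and `parse_submission` and `parse_status` are hypothetical helpers, not cblaster's functions.

```python
import re


def parse_submission(text):
    """Extract the RID and the RTOE (seconds) from a search-submission response."""
    rid = re.search(r"^\s*RID = (\S+)", text, re.MULTILINE)
    rtoe = re.search(r"^\s*RTOE = (\d+)", text, re.MULTILINE)
    if rid is None or rtoe is None:
        raise ValueError("no RID/RTOE found in response")
    return rid.group(1), int(rtoe.group(1))


def parse_status(text):
    """Return the status word (e.g. WAITING, READY, FAILED), or UNKNOWN if absent."""
    match = re.search(r"Status=(\w+)", text)
    return match.group(1) if match else "UNKNOWN"


# Hand-written sample responses in the assumed QBlastInfo format.
put_response = """<!--QBlastInfoBegin
    RID = UPPEKFF9016
    RTOE = 27
QBlastInfoEnd
-->"""
status_response = """QBlastInfoBegin
\tStatus=READY
QBlastInfoEnd"""

rid, rtoe = parse_submission(put_response)
print(rid, rtoe)                      # UPPEKFF9016 27
print(parse_status(status_response))  # READY
```

A polling loop would wait `rtoe` seconds, then call the status endpoint repeatedly until `parse_status` returns something other than `WAITING`.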
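The colour scale can be expressed as a simple mapping from percent identity to an RGB colour. This sketch assumes linear interpolation between white and pure blue. `identity_to_rgb` and `to_hex` are hypothetical helpers, and cblaster's actual colour scale may use different endpoints or a non-linear ramp.

```python
def identity_to_rgb(identity):
    """Map percent identity (0-100) to a colour between white (0%) and blue (100%).

    Linear interpolation is an assumption; values outside 0-100 are clamped.
    """
    t = min(max(identity, 0.0), 100.0) / 100.0
    # White is (255, 255, 255) and pure blue is (0, 0, 255): only red and green fade.
    level = round(255 * (1 - t))
    return (level, level, 255)


def to_hex(rgb):
    """Format an (r, g, b) tuple as a #rrggbb colour string."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)


for pct in (0, 50, 100):
    print(pct, to_hex(identity_to_rgb(pct)))
```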
The upper right shows the following summary:
"Your search of 3 queries returned 12241 hits from 12240 unique sequences. cblaster detected 14 clusters across 10515 genomic scaffolds from 1237 organisms."
cblaster: a remote search tool for rapid identification and visualisation of homologous gene clusters
Cameron Laurence Mathison Gilchrist, Thomas J Booth, Yit-Heng Heng Chooi
bioRxiv, Posted November 09, 2020
Addendum
cblaster: a remote search tool for rapid identification and visualization of homologous gene clusters
Cameron L M Gilchrist, Thomas J Booth, Bram van Wersch, Liana van Grieken, Marnix H Medema, Yit-Heng Chooi
Bioinformatics Advances, Volume 1, Issue 1, 2021