macでインフォマティクス (Informatics on Mac)

A collection of notes on informatics for HTS (NGS).

CAMSA: analyzing and merging assembly results

2019 6/11: added installation instructions, revised title

 

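 There are various ways to reconstruct a genome from a draft genome, using both dry (computational) and wet-lab techniques, but each of them produces only part of the assembly. It is therefore important to compare and integrate the assemblies produced by different methods, and doing this by hand takes considerable effort. CAMSA is a tool for the comparative analysis and merging of two or more assemblies. It builds merged scaffolds and also generates an extensive report that includes several assembly metrics.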

 

Official page

https://cblab.org/camsa/

wiki

https://github.com/compbiol/CAMSA/wiki/Usage

Input files

https://github.com/compbiol/CAMSA/wiki/Input

 

 

Installation

Github

sudo pip install CAMSA
run_camsa.py --help # test run

$ run_camsa.py -h

usage: run_camsa.py [-h] [--seqi SEQI] [--seqi-delimiter SEQI_DELIMITER]

                    [--seqi-ensure-all] [-c CONFIG] [--c-cw-exact C_CW_EXACT]

                    [--c-cw-candidate C_CW_CANDIDATE]

                    [--c-subgroups-cntlim C_SUBGROUPS_CNTLIM]

                    [--c-subgroups-uo-cntlim C_SUBGROUPS_UO_CNTLIM]

                    [--ref-disable] [--ref REFERENCE_NAME]

                    [--c-merging-cw-min C_MERGING_CW_MIN]

                    [--c-merging-strategy {greedy,maximal-matching}]

                    [--c-merging-cycles] [--version]

                    [--i-delimiter I_DELIMITER] [-o O_DIR]

                    [--o-merged-format O_MERGED_FORMAT]

                    [--o-subgroups-format O_SUBGROUPS_FORMAT]

                    [--o-subgroups-uo-format O_SUBGROUPS_UO_FORMAT]

                    [--o-collapsed-format O_COLLAPSED_FORMAT]

                    [--o-original-format O_ORIGINAL_FORMAT]

                    [--c-logging-level {0,10,20,30,40,50}]

                    [--c-logging-formatter-entry C_LOGGING_FORMATTER_ENTRY]

                    points [points ...]

 

================================================================================

| Sergey Aganezov & Max A. Alekseyev (c)                                       |

| Computational Biology Institute, The George Washington University            |

|                                                                              |

| CAMSA is a tool for Comparative Analysis and Merging of Scaffold Assemblies  |

|                                                                              |

| For more information refer to github.com/compbiol/camsa/wiki                 |

| With any questions, please, contact Sergey Aganezov [aganezov(at)cs.jhu.edu] |

================================================================================

 Args that start with '--' (eg. --seqi) can also be set in a config file (/Users/user/.pyenv/versions/miniconda2-4.0.5/lib/python2.7/site-packages/camsa/run_camsa.ini or /Users/user/.pyenv/versions/miniconda2-4.0.5/lib/python2.7/site-packages/camsa/logging.ini or specified via -c). Config file syntax allows: key=value, flag=true, stuff=[a,b,c] (for details, see syntax at https://goo.gl/R74nmi). If an arg is specified in more than one place, then commandline values override config file values which override defaults.

 

positional arguments:

  points                A list of input files, representing a standard CAMSA format for assembly points.

 

optional arguments:

  -h, --help            show this help message and exit

  --seqi SEQI           A file with sequences' information about fragments, involved in input assembly points

  --seqi-delimiter SEQI_DELIMITER

                        A delimiter character for the file containing sequences' information.

                        DEFAULT: \t

  --seqi-ensure-all     Ensures, that either all, or non of the sequences participating in assembly points have information provided for them.

                        DEFAULT: False

  -c CONFIG, --config CONFIG

                        Config file path with settings for CAMSA to run with.

                        Overwrites the default CAMSA configuration file.

                        Values in config file can be overwritten by command line arguments.

  --c-cw-exact C_CW_EXACT

                        A confidence weight value assigned to oriented assembly points and respective exact assembly edges,

                        in case "?" is specified as the respective assembly point confidence weight.

                        DEFAULT: 1.0

  --c-cw-candidate C_CW_CANDIDATE

                        A confidence weight value assigned to semi/un-oriented assembly points and respective candidate assembly edges,

                        in case "?" is specified as the respective assembly point confidence weight.

                        DEFAULT: 0.75

  --c-subgroups-cntlim C_SUBGROUPS_CNTLIM

                        A maximum number of assemblies subgroups (sorted in descending order) to be output. -1 for no limit.

  --c-subgroups-uo-cntlim C_SUBGROUPS_UO_CNTLIM

                        A maximum number of unoriented versions of assemblies subgroups (sorted in descending order) to be output. -1 for no limit.

  --ref-disable

  --ref REFERENCE_NAME

  --c-merging-cw-min C_MERGING_CW_MIN

                        A threshold for the minimum cumulative confidence weight for merged assembly edges in MSAG.

                        Edges with confidence weight below are not considered in the "merged" assembly construction.

                        DEFAULT: 0.0

  --c-merging-strategy {greedy,maximal-matching}

                        A strategy to produced a merged assembly from the given ones.

                        DEFAULT: maximal-matching

  --c-merging-cycles    Whether to allow cycles in the produced merged assembly.

                        DEFAULT: False

  --version             show program's version number and exit

  --i-delimiter I_DELIMITER

                        String used as a delimiter in the input files with CAMSA assembly points

  -o O_DIR, --o-dir O_DIR

                        A directory, where CAMSA will store all of the produced output (report, assets, etc).

                        DEFAULT: camsa_{date}

  --o-merged-format O_MERGED_FORMAT

                        The CAMSA-out formatting for the merged scaffold assemblies in a form of CAMSA points.

  --o-subgroups-format O_SUBGROUPS_FORMAT

                        The CAMSA-out formatting for the subgrouped assembly points form of CAMSA points.

  --o-subgroups-uo-format O_SUBGROUPS_UO_FORMAT

                        The CAMSA-out formatting for the subgrouped unoriented assembly points in a form of CAMSA points.

  --o-collapsed-format O_COLLAPSED_FORMAT

                        The CAMSA-out formatting for the collapsed assembly points and their computed conflicts.

  --o-original-format O_ORIGINAL_FORMAT

                        The CAMSA-out formatting for the non-collapsed assembly points and their computed conflicts.

  --c-logging-level {0,10,20,30,40,50}

                        Logging level for CAMSA.

                        DEFAULT: 20

  --c-logging-formatter-entry C_LOGGING_FORMATTER_ENTRY

                        Format string for python logger.
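
The help text above notes that any option starting with "--" can also be set in a config file passed with -c. Below is a minimal sketch of that, assuming the config keys are the long option names without the leading "--", as the key=value syntax in the help text suggests. The file name camsa_example.ini and the chosen values are hypothetical. Whether a partial file like this is enough, or a full copy of the default run_camsa.ini is needed, has not been verified here.

```shell
# Write a hypothetical config file. Keys are assumed to be the long option
# names from the help text without the leading "--".
cat > camsa_example.ini <<'EOF'
c-merging-strategy=greedy
c-merging-cw-min=0.5
c-logging-level=10
EOF

# Run only if CAMSA and the input file are actually present.
# Values given on the command line would override those in the file.
if command -v run_camsa.py >/dev/null 2>&1 && [ -f f1.camsa.points ]; then
    run_camsa.py f1.camsa.points -c camsa_example.ini -o output_dir
fi
```

According to the help text, command-line values override config file values, which in turn override the defaults, so a file like this is mainly useful for settings shared across several runs.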

 

Usage

First, convert the contig FASTA file into CAMSA's input format.

fasta2camsa_points.py contigs.fasta scaffolds.fasta -o OUTDIR 
  • -o OUTPUT_DIR Output directory to store temporary and final files.

 This produces scaffolds.camsa.points.
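
Since CAMSA compares two or more assemblies, one way to prepare several inputs is to repeat the conversion above once per scaffold set. Below is a minimal sketch that reuses the same command form in a loop. The FASTA file names and the points_<name> output directory naming are hypothetical, and the converter is only called when fasta2camsa_points.py is installed and the files exist.

```shell
# Derive a per-assembly output directory name from a scaffold FASTA path,
# e.g. path/to/spades_scaffolds.fasta -> points_spades_scaffolds
points_outdir_for() {
    printf 'points_%s\n' "$(basename "$1" .fasta)"
}

contigs="contigs.fasta"   # shared contig set (hypothetical file name)

# Hypothetical scaffold files from two different scaffolding methods.
for scaffolds in scaffolds_methodA.fasta scaffolds_methodB.fasta; do
    outdir=$(points_outdir_for "$scaffolds")
    if command -v fasta2camsa_points.py >/dev/null 2>&1 \
        && [ -f "$contigs" ] && [ -f "$scaffolds" ]; then
        fasta2camsa_points.py "$contigs" "$scaffolds" -o "$outdir"
    else
        echo "skipping $scaffolds (tool or input missing); would write to $outdir"
    fi
done
```

Keeping one output directory per assembly avoids temporary files from different conversions overwriting each other.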

 

run_camsa.py f1.camsa.points -o output_dir

An interactive HTML analysis report is also generated.
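
The usage line in the help text accepts several points files (points [points ...]), which is how two or more assemblies are compared and merged in a single run. Below is a minimal sketch that uses only options listed in the help output. The input file names and the output directory name are hypothetical.

```shell
# Hypothetical points files, one per assembly to compare.
points_a="methodA.camsa.points"
points_b="methodB.camsa.points"
out_dir="camsa_compare"

# Build the command as the positional parameters so it can be shown first.
# --c-merging-strategy accepts greedy or maximal-matching (the default).
set -- run_camsa.py "$points_a" "$points_b" \
    --c-merging-strategy greedy \
    --c-merging-cw-min 0.0 \
    -o "$out_dir"
camsa_cmd="$*"
printf '%s\n' "$camsa_cmd"

# Run only when CAMSA and both inputs are available.
if command -v run_camsa.py >/dev/null 2>&1 \
    && [ -f "$points_a" ] && [ -f "$points_b" ]; then
    "$@"
fi
```

Per the help text, everything CAMSA produces (report, assets, etc.) is stored under the directory given with -o.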

 

Reference

CAMSA: a Tool for Comparative Analysis and Merging of Scaffold Assemblies

Sergey S. Aganezov, Max A. Alekseyev

BMC Bioinformatics. 2017; 18(Suppl 15): 496.