macでインフォマティクス

A collection of notes on NGS-related informatics.

MetaMeta

 

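A growing number of metagenome analysis tools are becoming available with the aim of characterizing environmental samples [from the paper, ref.1,2,3,4]. Motivated by the large amounts of data generated by whole metagenome shotgun (WMS) sequencing, metagenome profiling has become more accessible, faster, and more applicable in real-world scenarios, and is becoming the standard method for metagenomic analysis [ref.5,6,7]. Tools that classify sequences from WMS sequencing data come in several varieties. One basic approach is de novo sequence assembly [ref.8,9,10], which aims to reconstruct complete or near-complete genomes from short reads without references or prior knowledge. It provides the highest resolution for assessing community composition. However, short read lengths, insufficient coverage, similar DNA sequences, and low-abundance strains make it very difficult to generate meaningful assemblies from metagenomics data [ref.11].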

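More commonly, reference-based methods use the WMS reads directly without assembly; that is, the analysis relies on previously obtained genome sequences. Two standard kinds of application are used in this category: taxonomic profiling tools and binning tools. Profilers analyze a WMS sequence set as a whole and aim to predict the organisms present and their relative abundances based on a given set of reference sequences. Binning tools classify each sequence in a sample individually, aiming to link each one to the most probable organism in the reference set. Despite these conceptual differences, both groups of tools can be used to characterize microbial communities. However, binning tools produce an individual classification for each sequence, and this output must be converted and normalized before it can be used as a taxonomic profile.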
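The conversion from binning output to a profile can be illustrated with a minimal sketch. It assumes a hypothetical input format (a mapping from read ID to assigned taxon name) and normalizes by read count only. Real tools, MetaMeta included, may normalize differently, for example by accounting for genome length, so treat this as an illustration of the idea, not as the pipeline's method.

```python
from collections import Counter


def binning_to_profile(assignments, unclassified="unclassified"):
    """Convert per-read taxon assignments into relative abundances.

    `assignments` maps a read ID to a taxon name (hypothetical format).
    Reads labelled `unclassified` are ignored. The returned fractions
    sum to 1, or the dict is empty if no read was classified.
    """
    counts = Counter(t for t in assignments.values() if t != unclassified)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {taxon: n / total for taxon, n in counts.items()}


# Toy binning result: one classification per read.
reads = {
    "read1": "Escherichia coli",
    "read2": "Escherichia coli",
    "read3": "Bacillus subtilis",
    "read4": "unclassified",
}
profile = binning_to_profile(reads)
for taxon, fraction in sorted(profile.items()):
    print(f"{taxon}\t{fraction:.3f}")
```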

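The methods available in these two categories involve several techniques, for example read mapping, k-mer alignment, and composition analysis. Variations in how the database is built, for example from complete genome sequences, marker genes, or protein sequences, are also common. Many of these techniques were developed to overcome the computational cost of handling the high throughput of modern sequencing technologies and the large number of available reference genome sequences.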

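The availability of several options for tools, parameters, databases, and techniques complicates the decision of which method to use. Different tools give good results in different scenarios and are more or less precise or sensitive depending on the configuration. It is difficult to rely on their output for every study or every change of sample. Furthermore, when multiple methods are used, it is hard to integrate conflicting results from tools that use different reference sets. In addition, installation, parameterization, database creation, and the lack of a standard output format are challenges that are not easily solved.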

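The authors propose MetaMeta, a new pipeline for the joint execution and integration of metagenome sequence classification tools. MetaMeta has several advantages: easy installation and setup; support for multiple tools, samples, and databases; an improved final profile that combines multiple results; out-of-the-box parallelization and high-performance computing (HPC) integration; automatic database download and configuration; custom database creation; and integrated preprocessing steps (read trimming, error correction, and subsampling). By merging correct identifications and properly filtering false ones, MetaMeta achieves more cautious profiling results. MetaMeta is built with Snakemake [ref.12] and is open source. (The rest is omitted.)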
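The idea of merging agreeing identifications and filtering unsupported ones can be sketched as a simple voting scheme. This is not MetaMeta's actual integration algorithm, which is described in the paper. The function name, the `min_tools` threshold, and the averaging rule are all assumptions made for illustration.

```python
def merge_profiles(profiles, min_tools=2):
    """Merge per-tool profiles with a simple voting rule (illustrative only).

    `profiles` is a list of dicts mapping taxon -> relative abundance,
    one dict per tool. A taxon is kept only if at least `min_tools`
    tools report it. Its abundance is averaged over all tools (a tool
    that missed it contributes 0), then the result is renormalized.
    """
    support = {}
    for profile in profiles:
        for taxon, abundance in profile.items():
            support.setdefault(taxon, []).append(abundance)

    kept = {
        taxon: sum(values) / len(profiles)
        for taxon, values in support.items()
        if len(values) >= min_tools
    }
    total = sum(kept.values())
    if total == 0:
        return {}
    return {taxon: abundance / total for taxon, abundance in kept.items()}


# Three hypothetical tool outputs; taxon "C" is reported by one tool only.
tool_a = {"A": 0.6, "B": 0.4}
tool_b = {"A": 0.5, "C": 0.5}
tool_c = {"A": 0.7, "B": 0.3}
merged = merge_profiles([tool_a, tool_b, tool_c], min_tools=2)
print(merged)
```

Raising `min_tools` makes the merged profile more conservative at the cost of sensitivity, which is the trade-off the paragraph above describes.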

 

From the "Tool selection" section

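MetaMeta was evaluated with a set of six tools: CLARK [ref.19], DUDes [ref.20], GOTTCHA [ref.21], Kraken [ref.22], Kaiju [ref.23], and mOTUs [ref.24]. The choice was motivated in part by recent publications comparing the performance of such tools [ref.3,4,25]. According to [ref.4], CLARK, GOTTCHA, Kraken, and mOTUs achieved very low numbers of false positives. DUDes is an in-house tool that, according to [ref.25], achieves a good trade-off between precision and sensitivity. Kaiju uses a translated database, bringing diversity to the otherwise whole-genome-based methods.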

 

Figure: MetaMeta pipeline. Reproduced from the paper (PubMed).

 



Installation

Tested on CentOS 6 in an anaconda3-4.0.0 environment.

Main repository: GitHub

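# In an Anaconda environment it can be installed with conda, but on Linux only.
conda install -c bioconda metameta
# It appeared to conflict with other packages, so I reinstalled it in a virtual environment.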


# Test in a virtual environment. Assumes Python versions are managed with pyenv.
conda config --add channels defaults
conda config --add channels conda-forge
conda config --add channels bioconda

conda create -n metametaenv metameta=1.2.0
source activate metametaenv

 >metameta --help

$ metameta --help

usage: snakemake [-h] [--profile PROFILE] [--snakefile FILE] [--gui [PORT]]

                 [--cores [N]] [--local-cores N]

                 [--resources [NAME=INT [NAME=INT ...]]]

                 [--config [KEY=VALUE [KEY=VALUE ...]]] [--configfile FILE]

                 [--list] [--list-target-rules] [--directory DIR] [--dryrun]

                 [--printshellcmds] [--debug-dag] [--dag]

                 [--force-use-threads] [--rulegraph] [--d3dag] [--summary]

                 [--detailed-summary] [--archive FILE] [--touch]

                 [--keep-going] [--force] [--forceall]

                 [--forcerun [TARGET [TARGET ...]]]

                 [--prioritize TARGET [TARGET ...]]

                 [--until TARGET [TARGET ...]]

                 [--omit-from TARGET [TARGET ...]] [--allow-ambiguity]

                 [--cluster CMD | --cluster-sync CMD | --drmaa [ARGS]]

                 [--drmaa-log-dir DIR] [--cluster-config FILE]

                 [--immediate-submit] [--jobscript SCRIPT] [--jobname NAME]

                 [--cluster-status CLUSTER_STATUS] [--kubernetes [NAMESPACE]]

                 [--kubernetes-env ENVVAR [ENVVAR ...]]

                 [--container-image IMAGE] [--reason] [--stats FILE]

                 [--nocolor] [--quiet] [--nolock] [--unlock]

                 [--cleanup-metadata FILE [FILE ...]] [--rerun-incomplete]

                 [--ignore-incomplete] [--list-version-changes]

                 [--list-code-changes] [--list-input-changes]

                 [--list-params-changes] [--latency-wait SECONDS]

                 [--wait-for-files [FILE [FILE ...]]] [--benchmark-repeats N]

                 [--notemp] [--keep-remote] [--keep-target-files]

                 [--keep-shadow]

                 [--allowed-rules ALLOWED_RULES [ALLOWED_RULES ...]]

                 [--max-jobs-per-second MAX_JOBS_PER_SECOND]

                 [--max-status-checks-per-second MAX_STATUS_CHECKS_PER_SECOND]

                 [--restart-times RESTART_TIMES] [--attempt ATTEMPT]

                 [--timestamp] [--greediness GREEDINESS] [--no-hooks]

                 [--print-compilation]

                 [--overwrite-shellcmd OVERWRITE_SHELLCMD] [--verbose]

                 [--debug] [--runtime-profile FILE] [--mode {0,1,2}]

                 [--bash-completion] [--use-conda] [--conda-prefix DIR]

                 [--create-envs-only] [--use-singularity]

                 [--singularity-prefix DIR] [--singularity-args ARGS]

                 [--wrapper-prefix WRAPPER_PREFIX]

                 [--default-remote-provider {S3,GS,FTP,SFTP,S3Mocked,gfal,gridftp}]

                 [--default-remote-prefix DEFAULT_REMOTE_PREFIX]

                 [--no-shared-fs] [--version]

                 [target [target ...]]

 

Snakemake is a Python based language and execution environment for GNU Make-

like workflows.

 

positional arguments:

  target                Targets to build. May be rules or files.

 

optional arguments:

  -h, --help            show this help message and exit

  --profile PROFILE     Name of profile to use for configuring Snakemake.

                        Snakemake will search for a corresponding folder in

                        /etc/xdg/xdg-ubuntu/snakemake and

                        /home/kazu/.config/snakemake. Alternatively, this can

                        be an absolute or relative path. The profile folder

                        has to contain a file 'config.yaml'. This file can be

                        used to set default values for command line options in

                        YAML format. For example, '--cluster qsub' becomes

                        'cluster: qsub' in the YAML file. Profiles can be

                        obtained from https://github.com/snakemake-profiles.

  --snakefile FILE, -s FILE

                        The workflow definition in a snakefile.

  --gui [PORT]          Serve an HTML based user interface to the given

                        network and port e.g. 168.129.10.15:8000. By default

                        Snakemake is only available in the local network

                        (default port: 8000). To make Snakemake listen to all

                        ip addresses add the special host address 0.0.0.0 to

                        the url (0.0.0.0:8000). This is important if Snakemake

                        is used in a virtualised environment like Docker. If

                        possible, a browser window is opened.

  --cores [N], --jobs [N], -j [N]

                        Use at most N cores in parallel (default: 1). If N is

                        omitted, the limit is set to the number of available

                        cores.

  --local-cores N       In cluster mode, use at most N cores of the host

                        machine in parallel (default: number of CPU cores of

                        the host). The cores are used to execute local rules.

                        This option is ignored when not in cluster mode.

  --resources [NAME=INT [NAME=INT ...]], --res [NAME=INT [NAME=INT ...]]

                        Define additional resources that shall constrain the

                        scheduling analogously to threads (see above). A

                        resource is defined as a name and an integer value.

                        E.g. --resources gpu=1. Rules can use resources by

                        defining the resource keyword, e.g. resources: gpu=1.

                        If now two rules require 1 of the resource 'gpu' they

                        won't be run in parallel by the scheduler.

  --config [KEY=VALUE [KEY=VALUE ...]], -C [KEY=VALUE [KEY=VALUE ...]]

                        Set or overwrite values in the workflow config object.

                        The workflow config object is accessible as variable

                        config inside the workflow. Default values can be set

                        by providing a JSON file (see Documentation).

  --configfile FILE     Specify or overwrite the config file of the workflow

                        (see the docs). Values specified in JSON or YAML

                        format are available in the global config dictionary

                        inside the workflow.

  --list, -l            Show availiable rules in given Snakefile.

  --list-target-rules, --lt

                        Show available target rules in given Snakefile.

  --directory DIR, -d DIR

                        Specify working directory (relative paths in the

                        snakefile will use this as their origin).

  --dryrun, -n          Do not execute anything.

  --printshellcmds, -p  Print out the shell commands that will be executed.

  --debug-dag           Print candidate and selected jobs (including their

                        wildcards) while inferring DAG. This can help to debug

                        unexpected DAG topology or errors.

  --dag                 Do not execute anything and print the directed acyclic

                        graph of jobs in the dot language. Recommended use on

                        Unix systems: snakemake --dag | dot | display

  --force-use-threads   Force threads rather than processes. Helpful if shared

                        memory (/dev/shm) is full or unavailable.

  --rulegraph           Do not execute anything and print the dependency graph

                        of rules in the dot language. This will be less

                        crowded than above DAG of jobs, but also show less

                        information. Note that each rule is displayed once,

                        hence the displayed graph will be cyclic if a rule

                        appears in several steps of the workflow. Use this if

                        above option leads to a DAG that is too large.

                        Recommended use on Unix systems: snakemake --rulegraph

                        | dot | display

  --d3dag               Print the DAG in D3.js compatible JSON format.

  --summary, -S         Print a summary of all files created by the workflow.

                        The has the following columns: filename, modification

                        time, rule version, status, plan. Thereby rule version

                        contains the versionthe file was created with (see the

                        version keyword of rules), and status denotes whether

                        the file is missing, its input files are newer or if

                        version or implementation of the rule changed since

                        file creation. Finally the last column denotes whether

                        the file will be updated or created during the next

                        workflow execution.

  --detailed-summary, -D

                        Print a summary of all files created by the workflow.

                        The has the following columns: filename, modification

                        time, rule version, input file(s), shell command,

                        status, plan. Thereby rule version contains the

                        versionthe file was created with (see the version

                        keyword of rules), and status denotes whether the file

                        is missing, its input files are newer or if version or

                        implementation of the rule changed since file

                        creation. The input file and shell command columns are

                        selfexplanatory. Finally the last column denotes

                        whether the file will be updated or created during the

                        next workflow execution.

  --archive FILE        Archive the workflow into the given tar archive FILE.

                        The archive will be created such that the workflow can

                        be re-executed on a vanilla system. The function needs

                        conda and git to be installed. It will archive every

                        file that is under git version control. Note that it

                        is best practice to have the Snakefile, config files,

                        and scripts under version control. Hence, they will be

                        included in the archive. Further, it will add input

                        files that are not generated by by the workflow itself

                        and conda environments. Note that symlinks are

                        dereferenced. Supported formats are .tar, .tar.gz,

                        .tar.bz2 and .tar.xz.

  --touch, -t           Touch output files (mark them up to date without

                        really changing them) instead of running their

                        commands. This is used to pretend that the rules were

                        executed, in order to fool future invocations of

                        snakemake. Fails if a file does not yet exist.

  --keep-going, -k      Go on with independent jobs if a job fails.

  --force, -f           Force the execution of the selected target or the

                        first rule regardless of already created output.

  --forceall, -F        Force the execution of the selected (or the first)

                        rule and all rules it is dependent on regardless of

                        already created output.

  --forcerun [TARGET [TARGET ...]], -R [TARGET [TARGET ...]]

                        Force the re-execution or creation of the given rules

                        or files. Use this option if you changed a rule and

                        want to have all its output in your workflow updated.

  --prioritize TARGET [TARGET ...], -P TARGET [TARGET ...]

                        Tell the scheduler to assign creation of given targets

                        (and all their dependencies) highest priority.

                        (EXPERIMENTAL)

  --until TARGET [TARGET ...], -U TARGET [TARGET ...]

                        Runs the pipeline until it reaches the specified rules

                        or files. Only runs jobs that are dependencies of the

                        specified rule or files, does not run sibling DAGs.

  --omit-from TARGET [TARGET ...], -O TARGET [TARGET ...]

                        Prevent the execution or creation of the given rules

                        or files as well as any rules or files that are

                        downstream of these targets in the DAG. Also runs jobs

                        in sibling DAGs that are independent of the rules or

                        files specified here.

  --allow-ambiguity, -a

                        Don't check for ambiguous rules and simply use the

                        first if several can produce the same file. This

                        allows the user to prioritize rules by their order in

                        the snakefile.

  --cluster CMD, -c CMD

                        Execute snakemake rules with the given submit command,

                        e.g. qsub. Snakemake compiles jobs into scripts that

                        are submitted to the cluster with the given command,

                        once all input files for a particular job are present.

                        The submit command can be decorated to make it aware

                        of certain job properties (input, output, params,

                        wildcards, log, threads and dependencies (see the

                        argument below)), e.g.: $ snakemake --cluster 'qsub

                        -pe threaded {threads}'.

  --cluster-sync CMD    cluster submission command will block, returning the

                        remote exitstatus upon remote termination (for

                        example, this should be usedif the cluster command is

                        'qsub -sync y' (SGE)

  --drmaa [ARGS]        Execute snakemake on a cluster accessed via DRMAA,

                        Snakemake compiles jobs into scripts that are

                        submitted to the cluster with the given command, once

                        all input files for a particular job are present. ARGS

                        can be used to specify options of the underlying

                        cluster system, thereby using the job properties

                        input, output, params, wildcards, log, threads and

                        dependencies, e.g.: --drmaa ' -pe threaded {threads}'.

                        Note that ARGS must be given in quotes and with a

                        leading whitespace.

  --drmaa-log-dir DIR   Specify a directory in which stdout and stderr files

                        of DRMAA jobs will be written. The value may be given

                        as a relative path, in which case Snakemake will use

                        the current invocation directory as the origin. If

                        given, this will override any given '-o' and/or '-e'

                        native specification. If not given, all DRMAA stdout

                        and stderr files are written to the current working

                        directory.

  --cluster-config FILE, -u FILE

                        A JSON or YAML file that defines the wildcards used in

                        'cluster'for specific rules, instead of having them

                        specified in the Snakefile. For example, for rule

                        'job' you may define: { 'job' : { 'time' : '24:00:00'

                        } } to specify the time for rule 'job'. You can

                        specify more than one file. The configuration files

                        are merged with later values overriding earlier ones.

  --immediate-submit, --is

                        Immediately submit all jobs to the cluster instead of

                        waiting for present input files. This will fail,

                        unless you make the cluster aware of job dependencies,

                        e.g. via: $ snakemake --cluster 'sbatch --dependency

                        {dependencies}. Assuming that your submit script (here

                        sbatch) outputs the generated job id to the first

                        stdout line, {dependencies} will be filled with space

                        separated job ids this job depends on.

  --jobscript SCRIPT, --js SCRIPT

                        Provide a custom job script for submission to the

                        cluster. The default script resides as 'jobscript.sh'

                        in the installation directory.

  --jobname NAME, --jn NAME

                        Provide a custom name for the jobscript that is

                        submitted to the cluster (see --cluster). NAME is

                        "snakejob.{rulename}.{jobid}.sh" per default. The

                        wildcard {jobid} has to be present in the name.

  --cluster-status CLUSTER_STATUS

                        Status command for cluster execution. This is only

                        considered in combination with the --cluster flag. If

                        provided, Snakemake will use the status command to

                        determine if a job has finished successfully or

                        failed. For this it is necessary that the submit

                        command provided to --cluster returns the cluster job

                        id. Then, the status command will be invoked with the

                        job id. Snakemake expects it to return 'success' if

                        the job was successfull, 'failed' if the job failed

                        and 'running' if the job still runs.

  --kubernetes [NAMESPACE]

                        Execute workflow in a kubernetes cluster (in the

                        cloud). NAMESPACE is the namespace you want to use for

                        your job (if nothing specified: 'default'). Usually,

                        this requires --default-remote-provider and --default-

                        remote-prefix to be set to a S3 or GS bucket where

                        your . data shall be stored. It is further advisable

                        to activate conda integration via --use-conda.

  --kubernetes-env ENVVAR [ENVVAR ...]

                        Specify environment variables to pass to the

                        kubernetes job.

  --container-image IMAGE

                        Docker image to use, e.g., when submitting jobs to

                        kubernetes. By default, this is

                        'quay.io/snakemake/snakemake', tagged with the same

                        version as the currently running Snakemake instance.

                        Note that overwriting this value is up to your

                        responsibility. Any used image has to contain a

                        working snakemake installation that is compatible with

                        (or ideally the same as) the currently running

                        version.

  --reason, -r          Print the reason for each executed rule.

  --stats FILE          Write stats about Snakefile execution in JSON format

                        to the given file.

  --nocolor             Do not use a colored output.

  --quiet, -q           Do not output any progress or rule information.

  --nolock              Do not lock the working directory

  --unlock              Remove a lock on the working directory.

  --cleanup-metadata FILE [FILE ...], --cm FILE [FILE ...]

                        Cleanup the metadata of given files. That means that

                        snakemake removes any tracked version info, and any

                        marks that files are incomplete.

  --rerun-incomplete, --ri

                        Re-run all jobs the output of which is recognized as

                        incomplete.

  --ignore-incomplete, --ii

                        Do not check for incomplete output files.

  --list-version-changes, --lv

                        List all output files that have been created with a

                        different version (as determined by the version

                        keyword).

  --list-code-changes, --lc

                        List all output files for which the rule body (run or

                        shell) have changed in the Snakefile.

  --list-input-changes, --li

                        List all output files for which the defined input

                        files have changed in the Snakefile (e.g. new input

                        files were added in the rule definition or files were

                        renamed). For listing input file modification in the

                        filesystem, use --summary.

  --list-params-changes, --lp

                        List all output files for which the defined params

                        have changed in the Snakefile.

  --latency-wait SECONDS, --output-wait SECONDS, -w SECONDS

                        Wait given seconds if an output file of a job is not

                        present after the job finished. This helps if your

                        filesystem suffers from latency (default 5).

  --wait-for-files [FILE [FILE ...]]

                        Wait --latency-wait seconds for these files to be

                        present before executing the workflow. This option is

                        used internally to handle filesystem latency in

                        cluster environments.

  --benchmark-repeats N

                        Repeat a job N times if marked for benchmarking

                        (default 1).

  --notemp, --nt        Ignore temp() declarations. This is useful when

                        running only a part of the workflow, since temp()

                        would lead to deletion of probably needed files by

                        other parts of the workflow.

  --keep-remote         Keep local copies of remote input files.

  --keep-target-files   Do not adjust the paths of given target files relative

                        to the working directory.

  --keep-shadow         Do not delete the shadow directory on snakemake

                        startup.

  --allowed-rules ALLOWED_RULES [ALLOWED_RULES ...]

                        Only consider given rules. If omitted, all rules in

                        Snakefile are used. Note that this is intended

                        primarily for internal use and may lead to unexpected

                        results otherwise.

  --max-jobs-per-second MAX_JOBS_PER_SECOND

                        Maximal number of cluster/drmaa jobs per second,

                        default is 10, fractions allowed.

  --max-status-checks-per-second MAX_STATUS_CHECKS_PER_SECOND

                        Maximal number of job status checks per second,

                        default is 10, fractions allowed.

  --restart-times RESTART_TIMES

                        Number of times to restart failing jobs (defaults to

                        0).

  --attempt ATTEMPT     Internal use only: define the initial value of the

                        attempt parameter (default: 1).

  --timestamp, -T       Add a timestamp to all logging output

  --greediness GREEDINESS

                        Set the greediness of scheduling. This value between 0

                        and 1 determines how careful jobs are selected for

                        execution. The default value (1.0) provides the best

                        speed and still acceptable scheduling quality.

  --no-hooks            Do not invoke onstart, onsuccess or onerror hooks

                        after execution.

  --print-compilation   Print the python representation of the workflow.

  --overwrite-shellcmd OVERWRITE_SHELLCMD

                        Provide a shell command that shall be executed instead

                        of those given in the workflow. This is for debugging

                        purposes only.

  --verbose             Print debugging output.

  --debug               Allow to debug rules with e.g. PDB. This flag allows

                        to set breakpoints in run blocks.

  --runtime-profile FILE

                        Profile Snakemake and write the output to FILE. This

                        requires yappi to be installed.

  --mode {0,1,2}        Set execution mode of Snakemake (internal use only).

  --bash-completion     Output code to register bash completion for snakemake.

                        Put the following in your .bashrc (including the

                        accents): `snakemake --bash-completion` or issue it in

                        an open terminal session.

  --use-conda           If defined in the rule, run job in a conda

                        environment. If this flag is not set, the conda

                        directive is ignored.

  --conda-prefix DIR    Specify a directory in which the 'conda' and 'conda-

                        archive' directories are created. These are used to

                        store conda environments and their archives,

                        respectively. If not supplied, the value is set to the

                        '.snakemake' directory relative to the invocation

                        directory. If supplied, the `--use-conda` flag must

                        also be set. The value may be given as a relative

                        path, which will be extrapolated to the invocation

                        directory, or as an absolute path.

  --create-envs-only    If specified, only creates the job-specific conda

                        environments then exits. The `--use-conda` flag must

                        also be set.

  --use-singularity     If defined in the rule, run job within a singularity

                        container. If this flag is not set, the singularity

                        directive is ignored.

  --singularity-prefix DIR

                        Specify a directory in which singularity images will

                        be stored.If not supplied, the value is set to the

                        '.snakemake' directory relative to the invocation

                        directory. If supplied, the `--use-singularity` flag

                        must also be set. The value may be given as a relative

                        path, which will be extrapolated to the invocation

                        directory, or as an absolute path.

  --singularity-args ARGS

                        Pass additional args to singularity.

  --wrapper-prefix WRAPPER_PREFIX

                        Prefix for URL created from wrapper directive

                        (default: https://bitbucket.org/snakemake/snakemake-

                        wrappers/raw/). Set this to a different URL to use

                        your fork or a local clone of the repository.

  --default-remote-provider {S3,GS,FTP,SFTP,S3Mocked,gfal,gridftp}

                        Specify default remote provider to be used for all

                        input and output files that don't yet specify one.

  --default-remote-prefix DEFAULT_REMOTE_PREFIX

                        Specify prefix for default remote provider. E.g. a

                        bucket name.

  --no-shared-fs        Do not assume that jobs share a common file system.

                        When this flag is activated, Snakemake will assume

                        that the filesystem on a cluster node is not shared

                        with other nodes. For example, this will lead to

                        downloading remote files on each cluster node

                        separately. Further, it won't take special measures to

                        deal with filesystem latency issues. This option will

                        in most cases only make sense in combination with

                        --default-remote-provider. Further, when using

                        --cluster you will have to also provide --cluster-

                        status. Only activate this if you know what you are

                        doing.

  --version, -v         show program's version number and exit

 

Usage

A run requires a config file like the one below, listing paths such as the database directory and the sequencing data.

workdir: "/home/user/folder/results/"
dbdir: "/home/user/folder/databases/"
samples:
  sample_name_1:
     fq1: "/home/user/folder/reads/file.1.fq"
     fq2: "/home/user/folder/reads/file.2.fq"
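When there are many samples, a config of this shape can be generated from a script. The sketch below is a minimal example that emits the keys shown above (`workdir`, `dbdir`, `samples`, `fq1`, `fq2`). The helper name `build_config` and the second sample's paths are hypothetical, and the text is written by hand instead of through a YAML library so that the example stays dependency-free.

```python
def build_config(workdir, dbdir, samples):
    """Return MetaMeta-style config text for paired-end samples.

    `samples` maps a sample name to a (fq1, fq2) tuple of read file paths.
    Only the keys shown in the example config above are emitted.
    """
    lines = [f'workdir: "{workdir}"', f'dbdir: "{dbdir}"', "samples:"]
    for name, (fq1, fq2) in samples.items():
        lines.append(f"  {name}:")
        lines.append(f'    fq1: "{fq1}"')
        lines.append(f'    fq2: "{fq2}"')
    return "\n".join(lines) + "\n"


# Hypothetical sample sheet; replace the paths with real ones.
samples = {
    "sample_name_1": ("/home/user/folder/reads/file.1.fq",
                      "/home/user/folder/reads/file.2.fq"),
    "sample_name_2": ("/home/user/folder/reads/other.1.fq",
                      "/home/user/folder/reads/other.2.fq"),
}
config_text = build_config("/home/user/folder/results/",
                           "/home/user/folder/databases/",
                           samples)
print(config_text)
```

The printed text can be redirected to a file such as yourconfig.yaml and passed with --configfile.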

The databases can be downloaded from the links in the metameta GitHub repository.


 

Once the config file is completely filled in, run the pipeline.

metameta --configfile yourconfig.yaml --use-conda --keep-going --cores 24

 

Test run

# Move to the directory where metameta is installed. Anaconda was managed under pyenv here, so I moved to the following location.
cd /home/kazu/.pyenv/versions/anaconda3-4.0.0/opt/metameta/
metameta --configfile sampledata/sample_data_custom_viral.yaml --use-conda --keep-going --cores 6

A large number of errors appear. There seem to be conflicts between tools, so I create a conda virtual environment and try again.

 

 

 

 

 

MetaMeta also supports merging multiple results and adding further tools.

Citation

MetaMeta: integrating metagenome analysis tools to improve taxonomic profiling
Piro VC, Matschkowski M, Renard BY

Microbiome. 2017 Aug 14;5(1):101