Updated 2021 4/2
Updated 2021 8/30: added checkm tetra
CheckM wiki - plots
> checkm gc_plot -h
$ checkm gc_plot -h
usage: checkm gc_plot [-h] [--image_type {eps,pdf,png,ps,svg}] [--dpi DPI]
[--font_size FONT_SIZE] [-x EXTENSION] [--width WIDTH]
[--height HEIGHT] [-w GC_WINDOW_SIZE] [-b GC_BIN_WIDTH] [-q]
bin_dir output_dir dist_value [dist_value ...]
Create GC histogram and delta-GC plot.
positional arguments:
bin_dir directory containing bins to plot (fasta format)
output_dir directory to hold plots
dist_value reference distribution(s) to plot; integer between 0 and 100
optional arguments:
-h, --help show this help message and exit
--image_type {eps,pdf,png,ps,svg}
desired image type (default: png)
--dpi DPI desired DPI of output image (default: 600)
--font_size FONT_SIZE
Desired font size (default: 8)
-x, --extension EXTENSION
extension of bins (other files in directory are ignored) (default: fna)
--width WIDTH width of output image (default: 6.5)
--height HEIGHT height of output image (default: 3.5)
-w, --gc_window_size GC_WINDOW_SIZE
window size used to calculate GC histogram (default: 5000)
-b, --gc_bin_width GC_BIN_WIDTH
width of GC bars in histogram (default: 0.01)
-q, --quiet suppress console output
> checkm tetra_plot -h
$ checkm tetra_plot -h
usage: checkm tetra_plot [-h] [--image_type {eps,pdf,png,ps,svg}] [--dpi DPI]
[--font_size FONT_SIZE] [-x EXTENSION]
[--width WIDTH] [--height HEIGHT] [-w TD_WINDOW_SIZE]
[-b TD_BIN_WIDTH] [-q]
results_dir bin_dir output_dir tetra_profile dist_value [dist_value ...]
Create tetranucleotide distance (TD) histogram and delta-TD plot.
positional arguments:
results_dir directory specified during qa command
bin_dir directory containing bins to plot (fasta format)
output_dir directory to hold plots
tetra_profile tetranucleotide profiles for each bin (see tetra command)
dist_value reference distribution(s) to plot; integer between 0 and 100
optional arguments:
-h, --help show this help message and exit
--image_type {eps,pdf,png,ps,svg}
desired image type (default: png)
--dpi DPI desired DPI of output image (default: 600)
--font_size FONT_SIZE
Desired font size (default: 8)
-x, --extension EXTENSION
extension of bins (other files in directory are ignored) (default: fna)
--width WIDTH width of output image (default: 6.5)
--height HEIGHT height of output image (default: 3.5)
-w, --td_window_size TD_WINDOW_SIZE
window size used to calculate TD histogram (default: 5000)
-b, --td_bin_width TD_BIN_WIDTH
width of TD bars in histogram (default: 0.01)
-q, --quiet suppress console output
Example: checkm tetra_plot ./output ./bins ./plots tetra.tsv 95
> checkm len_plot -h
$ checkm len_plot -h
usage: checkm len_plot [-h] [--image_type {eps,pdf,png,ps,svg}] [--dpi DPI]
[--font_size FONT_SIZE] [-x EXTENSION] [--width WIDTH]
[--height HEIGHT] [-q]
bin_dir output_dir
Cumulative sequence length plot.
positional arguments:
bin_dir directory containing bins to plot (fasta format)
output_dir directory to hold plots
optional arguments:
-h, --help show this help message and exit
--image_type {eps,pdf,png,ps,svg}
desired image type (default: png)
--dpi DPI desired DPI of output image (default: 600)
--font_size FONT_SIZE
Desired font size (default: 8)
-x, --extension EXTENSION
extension of bins (other files in directory are ignored) (default: fna)
--width WIDTH width of output image (default: 6.5)
--height HEIGHT height of output image (default: 6.5)
-q, --quiet suppress console output
Example: checkm len_plot ./bins ./plots
> checkm marker_plot -h
$ checkm marker_plot -h
usage: checkm marker_plot [-h] [--image_type {eps,pdf,png,ps,svg}] [--dpi DPI]
[--font_size FONT_SIZE] [-x EXTENSION]
[--width WIDTH] [--height HEIGHT]
[--fig_padding FIG_PADDING] [-q]
results_dir bin_dir output_dir
Plot position of marker genes on sequences.
positional arguments:
results_dir directory specified during qa command
bin_dir directory containing bins to plot (fasta format)
output_dir directory to hold plots
optional arguments:
-h, --help show this help message and exit
--image_type {eps,pdf,png,ps,svg}
desired image type (default: png)
--dpi DPI desired DPI of output image (default: 600)
--font_size FONT_SIZE
Desired font size (default: 8)
-x, --extension EXTENSION
extension of bins (other files in directory are ignored) (default: fna)
--width WIDTH width of output image (default: 6.5)
--height HEIGHT height of output image (default: 6.5)
--fig_padding FIG_PADDING
white space to place around figure (in inches) (default: 0.2)
-q, --quiet suppress console output
Example: checkm marker_plot ./output ./bins ./plots
> checkm par_plot -h
$ checkm par_plot -h
usage: checkm par_plot [-h] [--image_type {eps,pdf,png,ps,svg}] [--dpi DPI]
[--font_size FONT_SIZE] [-x EXTENSION] [--width WIDTH]
[--height HEIGHT] [-q]
results_dir bin_dir output_dir coverage_file
Parallel coordinate plot of GC and coverage.
positional arguments:
results_dir directory specified during qa command
bin_dir directory containing bins to plot (fasta format)
output_dir directory to hold plots
coverage_file file indicating coverage of each sequence (see coverage command)
optional arguments:
-h, --help show this help message and exit
--image_type {eps,pdf,png,ps,svg}
desired image type (default: png)
--dpi DPI desired DPI of output image (default: 600)
--font_size FONT_SIZE
Desired font size (default: 8)
-x, --extension EXTENSION
extension of bins (other files in directory are ignored) (default: fna)
--width WIDTH width of output image (default: 6.5)
--height HEIGHT height of output image (default: 6.5)
-q, --quiet suppress console output
Example: checkm par_plot ./output ./bins ./plots coverage.tsv
> checkm cov_pca -h
$ checkm cov_pca -h
usage: checkm cov_pca [-h] [--image_type {eps,pdf,png,ps,svg}] [--dpi DPI]
[--font_size FONT_SIZE] [-x EXTENSION] [--width WIDTH]
[--height HEIGHT] [-q]
bin_dir output_dir coverage_file
PCA plot of coverage profiles.
positional arguments:
bin_dir directory containing bins to plot (fasta format)
output_dir directory to hold plots
coverage_file file indicating coverage of each sequence (see coverage command)
optional arguments:
-h, --help show this help message and exit
--image_type {eps,pdf,png,ps,svg}
desired image type (default: png)
--dpi DPI desired DPI of output image (default: 600)
--font_size FONT_SIZE
Desired font size (default: 8)
-x, --extension EXTENSION
extension of bins (other files in directory are ignored) (default: fna)
--width WIDTH width of output image (default: 6.5)
--height HEIGHT height of output image (default: 6.5)
-q, --quiet suppress console output
Example: checkm cov_pca ./bins ./plots coverage.tsv
> checkm coverage -h
$ checkm coverage -h
usage: checkm coverage [-h] [-x EXTENSION] [-r] [-a MIN_ALIGN]
[-e MAX_EDIT_DIST] [-m MIN_QC] [-t THREADS] [-q]
bin_dir output_file bam_files [bam_files ...]
Calculate coverage of sequences.
positional arguments:
bin_dir directory containing bins (fasta format)
output_file print results to file
bam_files BAM files to parse
optional arguments:
-h, --help show this help message and exit
-x, --extension EXTENSION
extension of bins (other files in directory are ignored) (default: fna)
-r, --all_reads use all reads to estimate coverage instead of just those in proper pairs
-a, --min_align MIN_ALIGN
minimum alignment length as percentage of read length (default: 0.98)
-e, --max_edit_dist MAX_EDIT_DIST
maximum edit distance as percentage of read length (default: 0.02)
-m, --min_qc MIN_QC minimum quality score (in phred) (default: 15)
-t, --threads THREADS
number of threads (default: 1)
-q, --quiet suppress console output
Example: checkm coverage ./bins coverage.tsv example_1.bam example_2.bam
checkm lineage_wf -t 20 -x fa metagenome/ output 1> log
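Output plots suited to assessing the GC distribution of sequences within a genome bin. binned_dir is the directory containing the metagenome contigs (here, metagenome/), and output_dir is the output directory.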
checkm gc_plot -x fa binned_dir output_dir 95
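The left plot is a GC histogram. A typical pure genome produces a single-peaked distribution. The right plot shows each sequence in the genome bin as a function of its deviation from the mean GC of the whole genome (x-axis) and its sequence length (y-axis). The dashed red lines represent the expected deviation from the mean GC as a function of length.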
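To make the two panels concrete, here is a minimal sketch of the quantities they display: the GC fraction of fixed-size windows (the histogram) and each sequence's deviation from the bin's mean GC paired with its length (the delta-GC plot). This is not CheckM's actual implementation. The function names and toy contigs are hypothetical, and the mean GC is taken over the concatenated bin as a simplifying assumption. The 5000 bp default mirrors the `-w` default in the help output.

```python
def gc_fraction(seq):
    """Fraction of G/C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return gc / (gc + at) if gc + at else 0.0


def window_gc(seq, window=5000):
    """GC fraction of each full, non-overlapping window (histogram input)."""
    return [gc_fraction(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, window)]


def delta_gc(bin_seqs):
    """Map each sequence name to (deviation from bin mean GC, length)."""
    mean_gc = gc_fraction("".join(bin_seqs.values()))
    return {name: (gc_fraction(s) - mean_gc, len(s))
            for name, s in bin_seqs.items()}


# Toy bin with two hypothetical contigs.
toy_bin = {"contig_1": "GGCC" * 50, "contig_2": "ATGC" * 50}
points = delta_gc(toy_bin)
```

In this toy bin the mean GC is 0.75, so `contig_1` sits at +0.25 and `contig_2` at -0.25. In a real bin, sequences far outside the dashed lines are the ones worth inspecting.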
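Output plots suited to assessing the coding density of sequences within a genome bin. checkm_result_dir is the output directory of the checkm lineage_wf command. binned_dir is the directory containing the metagenome contigs (here, metagenome/), and output_dir is the output directory.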
checkm coding_plot -x fa checkm_result_dir binned_dir output_dir 95
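Coding density is the fraction of a sequence's bases that fall within predicted genes. Below is a minimal sketch of that calculation, assuming gene coordinates are already available as 1-based inclusive (start, end) pairs. The function name and example coordinates are hypothetical, and CheckM's own gene calling is not reproduced here.

```python
def coding_density(seq_len, genes):
    """Fraction of bases covered by at least one gene.

    genes: iterable of 1-based inclusive (start, end) coordinates.
    Overlapping genes are merged so shared bases are counted once.
    """
    covered = 0
    last_end = 0
    for start, end in sorted(genes):
        start = max(start, last_end + 1)  # skip bases already counted
        if end >= start:
            covered += end - start + 1
            last_end = end
    return covered / seq_len if seq_len else 0.0


# Hypothetical 1000 bp contig with three genes, two of which overlap.
density = coding_density(1000, [(1, 300), (250, 600), (701, 900)])
```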
checkm len_plot -x fa binned_dir output_dir
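len_plot draws a cumulative sequence length plot. One common way to build such a curve, sketched here as an assumption rather than as CheckM's exact procedure, is to sort the sequences from longest to shortest and keep a running total. The function name and lengths are hypothetical.

```python
from itertools import accumulate


def cumulative_lengths(lengths):
    """Running total of sequence lengths, longest sequence first."""
    return list(accumulate(sorted(lengths, reverse=True)))


# Hypothetical contig lengths (bp) for one bin.
curve = cumulative_lengths([1200, 50000, 8000, 300])
```

A curve that rises steeply and then flattens indicates that a few long sequences account for most of the bin.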
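Plot the positions of marker genes on the sequences of a genome bin. This provides information on the extent to which marker genes are collocated. Sequences without marker genes are not shown.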
checkm marker_plot -x fa checkm_result_dir binned_dir output_dir
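Generate a parallel coordinate plot showing the GC and coverage of each sequence in a genome bin. In a typical genome, all sequences trace similar paths across the plot. Sequences whose paths diverge may be contamination. This plot requires a coverage file covering all sequences in the genome bin, so run the coverage command beforehand.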
checkm coverage -x fa binned_dir output_coverage input.bam
checkm profile output_coverage > coverage_profile
checkm par_plot -x fa checkm_result binned_dir output_dir output_coverage
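The coverage and par_plot steps above can be chained from a script. This is a minimal sketch using Python's `subprocess`, assuming `checkm` is on the PATH. The helper names and the `dry_run` switch are hypothetical; the arguments are the ones used in the commands above.

```python
import subprocess


def par_plot_commands(results_dir, bin_dir, plot_dir, coverage_tsv, bams,
                      ext="fa"):
    """Build the argument lists for coverage followed by par_plot."""
    coverage = ["checkm", "coverage", "-x", ext, bin_dir, coverage_tsv, *bams]
    par_plot = ["checkm", "par_plot", "-x", ext,
                results_dir, bin_dir, plot_dir, coverage_tsv]
    return [coverage, par_plot]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failure


cmds = par_plot_commands("checkm_result", "binned_dir", "output_dir",
                         "output_coverage", ["input.bam"])
run_all(cmds)  # dry run: prints the two commands without executing them
```

Keeping the dry run as the default makes it easy to check the argument order, which differs between CheckM subcommands, before any long-running job starts.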
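Generate a principal component analysis (PCA) plot of the coverage profile distances between sequences. This plot requires a file giving the coverage profile of every sequence in the genome bin (at least 3 samples are required).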
checkm coverage -x fa binned_dir output_coverage \
sample1.bam sample2.bam sample3.bam
checkm cov_pca -x fa binned_dir output_dir output_coverage
2021 8/30
checkm tetra -t 30 assembly.fasta tetranucleotide.tsv
checkm tetra_plot -x fasta checkM_outdir/ bin_dir/ output_dir tetranucleotide.tsv 95
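checkm tetra writes a tetranucleotide profile for each sequence. To illustrate what such a profile contains, here is a minimal sketch that counts overlapping 4-mers on one strand and normalizes them to frequencies. This is a simplification under stated assumptions: CheckM's own signature calculation (for example, how it handles reverse complements) may differ, and the function name is hypothetical.

```python
from collections import Counter
from itertools import product


def tetra_frequencies(seq):
    """Relative frequency of each of the 256 tetranucleotides in seq.

    Windows containing anything other than A, C, G, T are ignored.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=4)]
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(counts[k] for k in kmers)
    return {k: (counts[k] / total if total else 0.0) for k in kmers}


# Hypothetical 10 bp sequence: 7 overlapping windows, 4 distinct 4-mers.
profile = tetra_frequencies("ACGTACGTAC")
```

A tetranucleotide distance plot then compares profiles like this one between each sequence and its bin; sequences with unusual profiles are candidates for contamination.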
CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes.
Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW.
Genome Res. 2015 Jul;25(7):1043-55.