macでインフォマティクス (Informatics on a Mac)

A collection of notes on informatics for HTS (NGS) data.

zpca: principal component analysis from TPM counts

 

zpca is a tool released by Foivos Gypas that runs principal component analysis (PCA) on TPM counts. Let's try it out.

 

Installation

Installed with mamba in a miniconda3.8 environment on Ubuntu 18.04.

#bioconda
mamba install -c bioconda -y zpca

#docker
docker pull zavolab/zpca

#Singularity
singularity pull docker://zavolab/zpca

> zpca-tpm -h

usage: zpca-tpm [-h] --tpm FILE [--tpm-filter TPM_FILTER] [--tpm-pseudocount TPM_PSEUDOCOUNT] --out FILE [-v]

 

Perform PCA based on a TPM expression matrix (rows are genes/transcripts, columns are samples).

 

optional arguments:

  -h, --help            show this help message and exit
  --tpm FILE            TPM table (tsv).
  --tpm-filter TPM_FILTER
                        Filter genes/transcripts with mean expression less than the provided filter. Default: 0
  --tpm-pseudocount TPM_PSEUDOCOUNT
                        Pseudocount to add in the tpm table. Default: 1
  --out FILE            Output directory
  -v, --verbose         Verbose
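To get a feel for what the --tpm-filter and --tpm-pseudocount options control, here is a minimal sketch of PCA on a TPM matrix written with NumPy. It is not zpca's actual code: the log2 transformation, the centering, and the function name pca_from_tpm are my own assumptions for illustration.

```python
import numpy as np

def pca_from_tpm(tpm, tpm_filter=0.0, pseudocount=1.0, n_components=2):
    """PCA sketch on a genes x samples TPM matrix (hypothetical helper).

    Returns a samples x n_components array of PC scores.
    """
    tpm = np.asarray(tpm, dtype=float)
    # Drop genes whose mean TPM across samples is below the filter.
    kept = tpm[tpm.mean(axis=1) >= tpm_filter]
    # Assumption: log2 transform after adding the pseudocount.
    logged = np.log2(kept + pseudocount)
    # Samples are the observations, so work on samples x genes
    # and center every gene at zero.
    x = logged.T
    x = x - x.mean(axis=0)
    # SVD of the centered matrix; U * S gives the sample coordinates.
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    return (u * s)[:, :n_components]

# Toy data: rows are genes, columns are samples (two groups of two).
tpm = [
    [10.0, 12.0, 100.0, 110.0],
    [50.0, 48.0, 5.0, 6.0],
    [0.0, 0.1, 0.0, 0.2],    # removed by tpm_filter=1.0
    [30.0, 31.0, 29.0, 30.0],
]
scores = pca_from_tpm(tpm, tpm_filter=1.0)
print(scores.shape)
```

In this toy matrix the first two samples and the last two samples end up on opposite sides of PC1, which is the kind of grouping one would look for in a PCA plot.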

> zpca-counts -h

usage: zpca-counts [-h] --counts FILE --lengths FILE [--pseudocount PSEUDOCOUNT] [--filter-not-expressed] --out DIRECTORY [-v]

 

Perform PCA based on an expression matrix (rows are genes/transcripts, columns are samples).

 

optional arguments:

  -h, --help            show this help message and exit
  --counts FILE         Counts table (tsv). The first column should contain the gene/transcript id. The other columns should contain the counts for each sample.
  --lengths FILE        Table of feature lengths (tsv).
                        The file can have two types of formats.
                        First option: The first column should contain the gene/transcript id. The second column should contain the corresponding lengths
                        Second option: The first column should contain the gene/transcript id. The rest of the columns should contain the gene/transcript lengths for each of the samples
                        Note that the sample names should be the same the sample names of the counts.
  --pseudocount PSEUDOCOUNT
                        Pseudocount to add in the count table. Default: 1
  --filter-not-expressed
                        Filter not expressed genes/transcripts (0 counts for all samples).
  --out DIRECTORY       Output directory
  -v, --verbose         Verbose
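zpca-counts asks for feature lengths alongside the counts, which is what a counts-to-TPM conversion needs. Below is a small sketch of the standard TPM formula for a single sample. Whether zpca computes it exactly this way internally is an assumption on my part, and counts_to_tpm is a hypothetical name.

```python
def counts_to_tpm(counts, lengths):
    """Convert one sample's counts to TPM (hypothetical helper).

    counts and lengths are sequences in the same feature order.
    """
    # Length-normalized rate for each feature (reads per base).
    rates = [c / l for c, l in zip(counts, lengths)]
    total = sum(rates)
    if total == 0:
        # No reads at all: return zeros instead of dividing by zero.
        return [0.0 for _ in rates]
    # Scale so that the values of one sample sum to one million.
    return [r / total * 1e6 for r in rates]

counts = [100, 200, 300]
lengths = [1000, 2000, 1000]
tpm = counts_to_tpm(counts, lengths)
print(tpm)
```

Because the function works on one sample at a time, the second --lengths format (one length column per sample) would simply mean passing a different lengths list for each sample.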

 

 

Usage

Specify the TPM count matrix file.
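For reference, here is a sketch that writes a tiny TPM table in the layout described in the help text (rows are genes/transcripts, columns are samples). The header layout with a leading id column, the gene ids, and the sample names are assumptions made up for illustration; only the file name gene_TPM.txt comes from the command used in this post.

```python
import csv

def write_tpm_table(path, samples, rows):
    """Write a tab-separated TPM table (hypothetical helper).

    rows maps a gene/transcript id to one TPM value per sample.
    """
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        # Assumed header: an id column followed by the sample names.
        writer.writerow(["gene_id"] + list(samples))
        for gene_id, values in rows.items():
            writer.writerow([gene_id] + list(values))

# Made-up sample names and gene ids.
samples = ["ctrl_1", "ctrl_2", "treat_1", "treat_2"]
rows = {
    "geneA": [10.0, 12.0, 100.0, 110.0],
    "geneB": [50.0, 48.0, 5.0, 6.0],
    "geneC": [30.0, 31.0, 29.0, 30.0],
}
write_tpm_table("gene_TPM.txt", samples, rows)
```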

zpca-tpm --tpm gene_TPM.txt --out outdir

./outdir


 

scree_plot.png

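A scree plot generally shows how much of the total variance each principal component explains. As a rough sketch of how such values can be computed with NumPy (I have not checked how zpca produces its plot, and explained_variance_ratio is a hypothetical name):

```python
import numpy as np

def explained_variance_ratio(x):
    """Fraction of total variance per PC for a samples x features matrix.

    Assumes the data is not constant (total variance > 0).
    """
    x = np.asarray(x, dtype=float)
    centered = x - x.mean(axis=0)
    # Squared singular values are proportional to the variance on each PC.
    s = np.linalg.svd(centered, compute_uv=False)
    var = s ** 2
    return var / var.sum()

# Toy data: two features that are almost perfectly correlated.
x = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.0]]
ratios = explained_variance_ratio(x)
print(ratios)
```

The ratios come out in descending order and sum to 1, so plotting them against the component index gives the usual scree curve.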

 

Reference

GitHub - zavolanlab/zpca: PCA analysis