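zpca is a tool published by Foivos Gypas that performs principal component analysis (PCA) on TPM counts. Let's try it out.
Installation
Installed with mamba in a miniconda3.8 environment on Ubuntu 18.04.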
#bioconda
mamba install -c bioconda -y zpca
#docker
docker pull zavolab/zpca
#Singularity
singularity pull docker://zavolab/zpca
> zpca-tpm -h
usage: zpca-tpm [-h] --tpm FILE [--tpm-filter TPM_FILTER] [--tpm-pseudocount TPM_PSEUDOCOUNT] --out FILE [-v]
Perform PCA based on a TPM expression matrix (rows are genes/transcripts, columns are samples).
optional arguments:
-h, --help show this help message and exit
--tpm-filter TPM_FILTER
Filter genes/transcripts with mean expression less than the provided filter. Default: 0
--tpm-pseudocount TPM_PSEUDOCOUNT
Pseudocount to add in the tpm table. Default: 1
--out FILE Output directory
-v, --verbose Verbose
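To make the --tpm-filter and --tpm-pseudocount options more concrete, here is a minimal Python sketch of that kind of preprocessing. It is not zpca's actual code: the function name preprocess_tpm is hypothetical, and the log2 transform after adding the pseudocount is an assumption (a common step before PCA), not something the help text confirms.

```python
import math


def preprocess_tpm(rows, tpm_filter=0.0, pseudocount=1.0):
    """Filter and transform a TPM table.

    rows: dict mapping a gene/transcript id to a list of TPM values,
          one value per sample.
    """
    kept = {}
    for feature_id, values in rows.items():
        mean_tpm = sum(values) / len(values)
        # Drop features whose mean expression is less than the filter,
        # mirroring the wording of --tpm-filter.
        if mean_tpm < tpm_filter:
            continue
        # Assumption: add the pseudocount, then log2-transform.
        kept[feature_id] = [math.log2(v + pseudocount) for v in values]
    return kept


example = {
    "geneA": [0.0, 0.0, 0.0],
    "geneB": [3.0, 7.0, 15.0],
    "geneC": [0.5, 0.2, 0.2],
}
result = preprocess_tpm(example, tpm_filter=1.0, pseudocount=1.0)
print(result)
```

With the default filter of 0 nothing is removed, since no mean TPM can be below 0; the pseudocount keeps zero values finite under the log transform.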
> zpca-counts -h
usage: zpca-counts [-h] --counts FILE --lengths FILE [--pseudocount PSEUDOCOUNT] [--filter-not-expressed] --out DIRECTORY [-v]
Perform PCA based on an expression matrix (rows are genes/transcripts, columns are samples).
optional arguments:
-h, --help show this help message and exit
--counts FILE Counts table (tsv). The first column should contain the gene/transcript id. The other columns should contain the counts for each sample.
--lengths FILE Table of feature lengths (tsv).
The file can have two types of formats.
First option: The first column should contain the gene/transcript id.
The second column should contain the corresponding lengths
Second option: The first column should contain the gene/transcript id.
The rest of the columns should contain the gene/transcript lengths for each of the samples
Note that the sample names should be the same the sample names of the counts.
--pseudocount PSEUDOCOUNT
Pseudocount to add in the count table. Default: 1
--filter-not-expressed
Filter not expressed genes/transcripts (0 counts for all samples).
--out DIRECTORY Output directory
-v, --verbose Verbose
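zpca-counts takes both a counts table and feature lengths, which is what is needed to length-normalize counts. As an illustration, the sketch below applies the standard TPM definition using the first lengths format (one length per gene/transcript). Whether zpca-counts performs exactly this computation internally is an assumption, and the function name counts_to_tpm is hypothetical.

```python
def counts_to_tpm(counts, lengths):
    """Convert raw counts to TPM.

    counts:  dict mapping feature id -> list of counts (one per sample)
    lengths: dict mapping feature id -> feature length in bases
    """
    n_samples = len(next(iter(counts.values())))
    # Reads per kilobase: counts divided by feature length in kb.
    rpk = {
        f: [c / (lengths[f] / 1000.0) for c in cs]
        for f, cs in counts.items()
    }
    tpm = {f: [] for f in counts}
    for j in range(n_samples):
        total = sum(rpk[f][j] for f in rpk)
        for f in rpk:
            # Scale each sample so that its values sum to one million.
            tpm[f].append(rpk[f][j] / total * 1e6 if total > 0 else 0.0)
    return tpm


toy_counts = {"geneA": [10, 0], "geneB": [30, 20]}
toy_lengths = {"geneA": 1000, "geneB": 3000}
toy_tpm = counts_to_tpm(toy_counts, toy_lengths)
print(toy_tpm)
```

In the first sample geneB has three times the counts of geneA but is also three times longer, so both end up with the same TPM.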
Usage
Specify a matrix file of TPM counts.
zpca-tpm --tpm gene_TPM.txt --out outdir
./outdir
scree_plot.png
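For a quick test of the zpca-tpm command above, a small input matrix can be generated by hand. The sketch below writes a toy table with genes as rows and samples as columns, as the help text describes. The tab delimiter, the "gene_id" header label, the sample names, and the helper write_tpm_matrix are all assumptions for illustration; only the rows/columns orientation comes from the help text.

```python
import csv
import os
import tempfile


def write_tpm_matrix(path, sample_names, tpm_by_gene):
    """Write a tab-separated TPM matrix: one row per gene, one column per sample."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        # Header row: id column label (assumed) followed by sample names.
        writer.writerow(["gene_id"] + list(sample_names))
        for gene_id, values in tpm_by_gene.items():
            writer.writerow([gene_id] + list(values))


samples = ["ctrl_1", "ctrl_2", "treat_1"]
toy = {
    "geneA": [12.5, 10.1, 48.0],
    "geneB": [0.0, 0.3, 0.1],
}
# Written to a temporary directory here; use any path you like.
out_path = os.path.join(tempfile.mkdtemp(), "gene_TPM.txt")
write_tpm_matrix(out_path, samples, toy)
print(out_path)
```

The resulting file could then be passed to zpca-tpm via --tpm.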
Reference
GitHub - zavolanlab/zpca: PCA analysis