Informatics on Mac


This blog collects informatics notes related to HTS (NGS).

GTF2CSV: converting GTF/GFF2 files to CSV

GTF2CSV converts GTF (GFF2) annotation files to CSV, so the annotation can be inserted into a database or loaded into a pandas DataFrame and sliced.
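To illustrate what such a conversion involves, here is a minimal standard-library sketch. It is not gtf2csv's actual implementation. Each GTF line is split into its nine tab-separated fields, the first eight become fixed columns, and each key "value" pair in the attribute column becomes a column of its own. The column names (seqname, source, feature, ...) and the helper names parse_gtf_line and gtf_to_csv are assumptions for illustration. Unquoted attribute values and repeated tags are not handled the way gtf2csv handles them.

```python
import csv
import io
import re

# The eight fixed GTF columns; the ninth (attribute) column is expanded below.
# These column names are assumptions, not necessarily the header gtf2csv writes.
GTF_COLUMNS = ["seqname", "source", "feature", "start", "end",
               "score", "strand", "frame"]

# One `key "value"` pair in the attribute column (only quoted values are captured).
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')


def parse_gtf_line(line):
    """Split one GTF line into a dict of 8 fixed fields plus one entry per attribute tag."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated fields, got {len(fields)}")
    row = dict(zip(GTF_COLUMNS, fields[:8]))
    # Simplification: a repeated tag keeps only its last value (gtf2csv uses -c instead).
    for key, value in ATTR_RE.findall(fields[8]):
        row[key] = value
    return row


def gtf_to_csv(gtf_text):
    """Convert GTF text to CSV text, skipping blank and '#' comment lines."""
    rows = [parse_gtf_line(line) for line in gtf_text.splitlines()
            if line.strip() and not line.startswith("#")]
    # Header = union of all keys in first-seen order, so the fixed columns come first.
    header = list(dict.fromkeys(key for row in rows for key in row))
    out = io.StringIO()
    # Rows lacking a tag (e.g. a gene row has no transcript_id) get an empty cell.
    writer = csv.DictWriter(out, fieldnames=header, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

Because the attribute tags differ between feature types, the header is the union of all tags seen, which is why a wide, sparse table is the natural result.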

 

Installation

I installed it with pip, directly from the GitHub repository:

pip install git+https://github.com/zyxue/gtf2csv.git#egg=gtf2csv

> gtf2csv -h

usage: gtf2csv [-h] -f GTF [-c CARDINALITY_CUTOFF] [-o OUTPUT] [-m {csv,pkl}] [-t NUM_CPUS]

Convert GTF file to plain csv

optional arguments:
  -h, --help            show this help message and exit
  -f GTF, --gtf GTF     the GTF file to convert
  -c CARDINALITY_CUTOFF, --cardinality-cutoff CARDINALITY_CUTOFF
                        for a tag that may appear multiple times in the attribute column (so-called multiplicity tag in this program), if its cardinality, i.e. the number of possibles values across all row, is lower than this cutoff, then it's a low-caridnaltiy tag, and each of its possible value would be transformed into a separate binary column. Otherwise, it is a high-cardinality tag and all of its values in one row would be simply concatenated to avoid making too many columns
  -o OUTPUT, --output OUTPUT
                        the output filename, if not specified, would just set it to be the same as the input but with extension replaced (gtf => csv)
  -m {csv,pkl}, --output-format {csv,pkl}
                        default to csv, but pkl (python pickle format) is much faster in IO, thus recommended
  -t NUM_CPUS, --num-cpus NUM_CPUS
                        number of cpus for parallel processing, default to 1
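The -c option is the least obvious part of the help text. The sketch below re-implements the rule as the help text describes it: a tag that repeats within one row's attribute column is a multiplicity tag; if it takes fewer distinct values across all rows than the cutoff, each value becomes its own 0/1 column, otherwise the row's values are concatenated into a single column. The function name expand_multiplicity_tags, the tag:value column naming and the comma separator are hypothetical choices; gtf2csv's real column names and separator may differ.

```python
from collections import defaultdict


def expand_multiplicity_tags(rows, cutoff, sep=","):
    """Apply the cardinality-cutoff rule described in the -c help text.

    rows: list of dicts mapping tag -> list of values, because one tag may
    appear several times in a single attribute column.
    """
    # Multiplicity tags: tags that repeat within at least one row.
    multi = {tag for row in rows for tag, vals in row.items() if len(vals) > 1}
    # Cardinality: number of distinct values each multiplicity tag takes across all rows.
    distinct = defaultdict(set)
    for row in rows:
        for tag in multi:
            distinct[tag].update(row.get(tag, []))
    low = {tag for tag in multi if len(distinct[tag]) < cutoff}

    out = []
    for row in rows:
        new = {}
        for tag, vals in row.items():
            if tag not in multi:
                new[tag] = vals[0]          # ordinary single-valued tag
            elif tag not in low:
                new[tag] = sep.join(vals)   # high cardinality: concatenate values
        for tag in sorted(low):
            # Low cardinality: one 0/1 column per possible value (hypothetical naming).
            for value in sorted(distinct[tag]):
                new[f"{tag}:{value}"] = int(value in row.get(tag, []))
        out.append(new)
    return out
```

With cutoff=3, a tag that only ever takes the values basic and CCDS becomes two 0/1 columns, while a tag with three or more distinct values stays one concatenated column. Raising the cutoff therefore trades fewer concatenated strings for more columns.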

 

 

Usage

gtf2csv -f input.gtf -o out.csv -t 3
  • -f   the GTF file to convert
  • -o   the output filename; if omitted, the input filename is reused with its extension replaced (gtf => csv)
  • -t   number of CPUs for parallel processing (default: 1)
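Once out.csv has been written, it can be read back and sliced. Here is a minimal sketch with the standard csv module, assuming the output has a header row that includes a feature column holding the GTF feature type (check the header your gtf2csv version actually writes). select_feature is a hypothetical helper name.

```python
import csv


def select_feature(lines, feature):
    """Keep only rows whose (assumed) 'feature' column equals `feature`.

    `lines` is anything csv.DictReader accepts: an open file or a list of strings.
    """
    reader = csv.DictReader(lines)
    # Reading fieldnames consumes the header row; fail early if the column is missing.
    if "feature" not in (reader.fieldnames or []):
        raise KeyError("no 'feature' column; check the header gtf2csv wrote")
    return [row for row in reader if row["feature"] == feature]
```

On a real file this would be called as: with open("out.csv", newline="") as fh: genes = select_feature(fh, "gene"). With pandas installed, the equivalent is pandas.read_csv("out.csv") followed by boolean indexing such as df[df["feature"] == "gene"]; output written with -m pkl is read with pandas.read_pickle instead.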

 

Reference

GitHub - zyxue/gtf2csv: Convert genome annotation GTF file into plain CSV format