GTF2CSV converts GTF (GFF2) annotation files to CSV, so that the data can be inserted into a database or loaded into a pandas DataFrame and sliced.
Installation
I installed it with pip.
pip install git+https://github.com/zyxue/gtf2csv.git#egg=gtf2csv
> gtf2csv -h
usage: gtf2csv [-h] -f GTF [-c CARDINALITY_CUTOFF] [-o OUTPUT] [-m {csv,pkl}] [-t NUM_CPUS]

Convert GTF file to plain csv

optional arguments:
  -h, --help            show this help message and exit
  -f GTF, --gtf GTF     the GTF file to convert
  -c CARDINALITY_CUTOFF, --cardinality-cutoff CARDINALITY_CUTOFF
                        for a tag that may appear multiple times in the attribute column
                        (so-called multiplicity tag in this program), if its cardinality,
                        i.e. the number of possibles values across all row, is lower than
                        this cutoff, then it's a low-caridnaltiy tag, and each of its
                        possible value would be transformed into a separate binary column.
                        Otherwise, it is a high-cardinality tag and all of its values in
                        one row would be simply concatenated to avoid making too many
                        columns
  -o OUTPUT, --output OUTPUT
                        the output filename, if not specified, would just set it to be the
                        same as the input but with extension replaced (gtf => csv)
  -m {csv,pkl}, --output-format {csv,pkl}
                        default to csv, but pkl (python pickle format) is much faster in
                        IO, thus recommended
  -t NUM_CPUS, --num-cpus NUM_CPUS
                        number of cpus for parallel processing, default to 1
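The description of -c is easier to follow with a small example. The following is a minimal Python sketch of the idea only, not gtf2csv's actual implementation: the function names, the "tag:value" column naming, and the comma separator are all assumptions made for illustration.

```python
def parse_attributes(attr):
    """Split a GTF attribute string into a list of (tag, value) pairs."""
    pairs = []
    for field in attr.strip().split(";"):
        field = field.strip()
        if not field:
            continue
        tag, _, value = field.partition(" ")
        pairs.append((tag, value.strip().strip('"')))
    return pairs


def expand_multiplicity_tag(rows, tag, cutoff):
    """Turn one repeated tag into columns, one dict of columns per row.

    rows is a list with one list of (tag, value) pairs per GTF row.
    Column names and the separator are hypothetical, chosen for this sketch.
    """
    per_row = [[v for t, v in pairs if t == tag] for pairs in rows]
    values = sorted({v for vals in per_row for v in vals})
    if len(values) < cutoff:
        # Low cardinality: one binary (0/1) column per possible value.
        return [{f"{tag}:{v}": int(v in vals) for v in values} for vals in per_row]
    # High cardinality: a single column with the row's values concatenated.
    return [{tag: ",".join(vals)} for vals in per_row]


# Made-up attribute strings in which "tag" appears more than once per row.
attrs = [
    'gene_id "G1"; tag "basic"; tag "CCDS";',
    'gene_id "G2"; tag "basic";',
]
rows = [parse_attributes(a) for a in attrs]

low = expand_multiplicity_tag(rows, "tag", cutoff=3)   # 2 distinct values < 3
high = expand_multiplicity_tag(rows, "tag", cutoff=2)  # 2 distinct values, not < 2
print(low)
print(high)
```

With a cutoff of 3 the two distinct values become two binary columns; with a cutoff of 2 they are kept in one concatenated column.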
Usage
gtf2csv -f input.gtf -o out.csv -t 3
- -o output filename; if omitted, the input filename is used with the extension replaced (gtf => csv)
- -f the GTF file to convert (required)
- -t number of CPUs for parallel processing (default: 1)
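Once the CSV exists, inserting it into a database takes only a few lines. The following is a rough sketch using Python's standard csv and sqlite3 modules. The column names (seqname, feature, start, end, gene_id), the table name, and the sample rows are assumptions for illustration; check the header of your own out.csv for the actual columns.

```python
import csv
import io
import sqlite3

# Made-up stand-in for the converted file; real output will have more columns.
SAMPLE_CSV = """seqname,source,feature,start,end,strand,gene_id
chr1,demo,gene,100,900,+,G1
chr1,demo,exon,100,300,+,G1
chr1,demo,gene,1500,2400,-,G2
"""


def load_csv_into_sqlite(csv_file, conn, table="annotation"):
    """Create a table from the CSV header and insert every row as TEXT."""
    reader = csv.DictReader(csv_file)
    columns = reader.fieldnames
    col_defs = ", ".join(f'"{c}" TEXT' for c in columns)
    conn.execute(f'CREATE TABLE "{table}" ({col_defs})')
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f'INSERT INTO "{table}" VALUES ({placeholders})',
        ([row[c] for c in columns] for row in reader),
    )
    conn.commit()
    return columns


conn = sqlite3.connect(":memory:")
# For a real file: with open("out.csv", newline="") as fh: load_csv_into_sqlite(fh, conn)
load_csv_into_sqlite(io.StringIO(SAMPLE_CSV), conn)

# "end" is an SQL keyword, so column names are quoted in the query.
genes = conn.execute(
    'SELECT "gene_id", "start", "end" FROM annotation '
    'WHERE "feature" = ? ORDER BY "gene_id"',
    ("gene",),
).fetchall()
print(genes)
```

Every column is stored as TEXT here to keep the sketch short; for range queries on coordinates, declaring start and end as INTEGER would be the better choice.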
Reference
GitHub - zyxue/gtf2csv: Convert genome annotation GTF file into plain CSV format