2019/6/17 Addendum added
2020/2/21 Title revised
2020/3/30 Help output added
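Converting VCF to BED is easy with a short awk, Perl, or Python script, but vcf2bed from BEDOPS is convenient because it can also filter and classify variants by type (SNVs, insertions, deletions) during the conversion.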
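As a rough illustration of the script approach mentioned above, here is a minimal awk sketch. It assumes a tab-delimited VCF. The function name `vcf_to_bed_min`, the four-column output, and the choice to let the end coordinate span the REF allele are all assumptions made for this example; it does not reproduce vcf2bed's full output or its sorting.

```shell
# Minimal sketch: VCF -> 4-column BED (chrom, 0-based start, end, ID).
# vcf_to_bed_min is a hypothetical helper name, not part of BEDOPS.
vcf_to_bed_min() {
  awk 'BEGIN { FS = OFS = "\t" }
       /^#/ { next }                 # skip VCF header lines
       {
         start = $2 - 1              # VCF POS is 1-based; BED start is 0-based
         # end spans the REF allele (an assumption of this sketch)
         print $1, start, start + length($4), $3
       }' "$@"
}

# Usage: vcf_to_bed_min gatk.vcf > gatk_min.bed
```

The function reads from standard input when no file is given, so it can also be used in a pipeline.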
Installation
#homebrew
brew install bedops
#bioconda
conda install -c bioconda -y bedops
$ bedops
bedops
citation: http://bioinformatics.oxfordjournals.org/content/28/14/1919.abstract
https://doi.org/10.1093/bioinformatics/bts277
version: 2.4.37 (typical)
authors: Shane Neph & Scott Kuehn
USAGE: bedops [process-flags] <operation> <File(s)>*
Every input file must be sorted per the sort-bed utility.
Each operation requires a minimum number of files as shown below.
There is no fixed maximum number of files that may be used.
Input files must have at least the first 3 columns of the BED specification.
The program accepts BED and Starch file formats.
May use '-' for a file to indicate reading from standard input (BED format only).
Process Flags:
--chrom <chromosome> Jump to and process data for given <chromosome> only.
--ec Error check input files (slower).
--header Accept headers (VCF, GFF, SAM, BED, WIG) in any input file.
--help Print this message and exit successfully.
--help-<operation> Detailed help on <operation>.
An example is --help-c or --help-complement
--range L:R Add 'L' bp to all start coordinates and 'R' bp to end
coordinates. Either value may be + or - to grow or
shrink regions. With the -e/-n operations, the first
(reference) file is not padded, unlike all other files.
--range S Pad or shrink input file(s) coordinates symmetrically by S.
This is shorthand for: --range -S:S.
--version Print program information.
Operations: (choose one of)
-c, --complement [-L] File1 [File]*
-d, --difference ReferenceFile File2 [File]*
-e, --element-of [bp | percentage] ReferenceFile File2 [File]*
by default, -e 100% is used. 'bedops -e 1' is also popular.
-i, --intersect File1 File2 [File]*
-m, --merge File1 [File]*
-n, --not-element-of [bp | percentage] ReferenceFile File2 [File]*
by default, -n 100% is used. 'bedops -n 1' is also popular.
-p, --partition File1 [File]*
-s, --symmdiff File1 File2 [File]*
-u, --everything File1 [File]*
-w, --chop [bp] [--stagger <nt>] [-x] File1 [File]*
by default, -w 1 is used with no staggering.
Example: bedops --range 10 -u file1.bed
NOTE: Only operations -e|n|u preserve all columns (no flattening)
Official manual
http://bedops.readthedocs.io/en/latest/content/reference/file-management/conversion/vcf2bed.html
Run
Convert VCF to BED.
vcf2bed < gatk.vcf > gatk.bed
- --do-not-sort (-d) Do not sort BED output with sort-bed
- --snvs (-v) Report only single nucleotide variants
- --insertions (-t) Report only insertion variants
- --deletions (-n) Report only deletion variants
- --keep-header (-k) Preserve header section as pseudo-BED elements
Convert only SNVs to BED.
vcf2bed --snvs < gatk.vcf > gatk_snps.bed
Count the single nucleotide variants, insertions, and deletions.
vcf2bed --snvs < gatk.vcf | wc -l #SNV
vcf2bed --insertions < gatk.vcf | wc -l #Insertion
vcf2bed --deletions < gatk.vcf | wc -l #Deletion
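For a quick cross-check without BEDOPS, the three counts can be approximated in one pass with awk. This sketch assumes a variant is classified by comparing REF and ALT lengths, that multi-allelic records are counted once per ALT allele, and that symbolic or missing alleles (such as `<DEL>` or `.`) are skipped. These rules are assumptions of the example, so the numbers may differ from what vcf2bed reports. `count_variant_types` is a hypothetical helper name.

```shell
# Sketch: count SNVs, insertions and deletions by comparing the lengths of
# REF (column 4) and each ALT allele (column 5, comma-separated).
# count_variant_types is a hypothetical helper, not part of BEDOPS.
count_variant_types() {
  awk 'BEGIN { FS = "\t"; n_snv = 0; n_ins = 0; n_del = 0 }
       /^#/ { next }                              # skip header lines
       {
         n = split($5, alt, ",")
         for (i = 1; i <= n; i++) {
           if (alt[i] !~ /^[ACGTN]+$/) continue   # skip symbolic/missing alleles
           r = length($4); a = length(alt[i])
           if (r == 1 && a == 1) n_snv++
           else if (a > r)       n_ins++
           else if (a < r)       n_del++
         }
       }
       END { printf "SNV\t%d\nInsertion\t%d\nDeletion\t%d\n", n_snv, n_ins, n_del }' "$@"
}

# Usage: count_variant_types gatk.vcf
```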
convert2bed, the tool that vcf2bed wraps, can convert many formats to BED, including BAM, GFF, GTF, GVF, PSL, RepeatMasker (OUT), SAM, VCF, and WIG.
Convert GFF (GFF3) to BED.
convert2bed --input=gff < input.gff3 > output.bed
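Alternatively, use awk. The following converts GTF to 6-column BED.
awk 'BEGIN {FS = OFS = "\t"} !/^#/ {print $1, $4-1, $5, $3, $6, $7}' input.gtf > output.bed
awk separates output fields with a space by default, but bedtools expects tab-delimited input, so OFS is set to a tab. GTF coordinates are 1-based while BED start coordinates are 0-based, so 1 is subtracted from the start.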
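BED written by awk keeps the order of the input file, whereas the bedops help text above states that every input must be sorted per sort-bed. The sketch below sorts with POSIX `sort` instead. It assumes that byte-order chromosome names followed by numeric start and end are close enough to sort-bed's ordering for the data at hand. `sort_bed_posix` is a hypothetical helper name, and sort-bed itself remains the safer choice when BEDOPS is installed.

```shell
# Sketch: sort a BED stream by chromosome (byte order), then start, then end.
# sort_bed_posix is a hypothetical helper; BEDOPS ships its own sort-bed.
sort_bed_posix() {
  # LC_ALL=C forces byte-wise comparison of chromosome names
  LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k2,2n -k3,3n "$@"
}

# Usage: sort_bed_posix output.bed > output.sorted.bed
```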
Addendum
BED to GTF (tested on CentOS)
awk '{print $1"\t"$7"\t"$8"\t"($2+1)"\t"$3"\t"$5"\t"$6"\t"$9"\t"(substr($0, index($0,$10)))}' input.bed > output.gtf
To work with the resulting BED files, use bedtools.
References
BEDOPS: high-performance genomic feature operations
Neph S, Kuehn MS, Reynolds AP, Haugen E, Thurman RE, Johnson AK, Rynes E, Maurano MT, Vierstra J, Thomas S, Sandstrom R, Humbert R, Stamatoyannopoulos JA.
Bioinformatics. 2012 Jul 15;28(14):1919-20
How To Convert Bed Format To Gtf?