RNA-seq analysis of barley (Hordeum vulgare) - macでインフォマティクス

Material for a study group.

Time-series data

Published paper

http://www.sciencedirect.com/science/article/pii/S1631069115000888?via=ihub

Sequencing data used

https://www.ncbi.nlm.nih.gov/Traces/study/?acc=SRP032854

FASTA and GTF

http://plants.ensembl.org/Hordeum_vulgare/Info/Index

Downloading the sequencing data (download the Accession List from the link above and pass it to prefetch)

prefetch --option-file SRR_Acc_List.txt --max-size 1000000000

# To download a single SRA run directly by its ID, e.g. SRR6364639
prefetch SRR6364639 --max-size 1000000000

--max-size 1000000000  Needed to download files of 20 GB or larger.
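
When the accession list is long, it can help to fetch the runs one at a time so that a failed download can be retried on its own. The following is a minimal sketch and not part of the original workflow: `print_prefetch_cmds` is a hypothetical helper that only prints the commands (a dry run), and it assumes the list file holds one accession per line.

```shell
# Dry-run helper (hypothetical name): print one prefetch command per accession.
# $1: accession list (assumed: one accession per line)
# $2: optional value for --max-size (defaults to the value used above)
print_prefetch_cmds() {
  max_size="${2:-1000000000}"
  # Strip carriage returns in case the list was saved with Windows line endings.
  tr -d '\r' < "$1" | while IFS= read -r acc || [ -n "$acc" ]; do
    # Skip blank lines.
    if [ -n "$acc" ]; then
      printf 'prefetch %s --max-size %s\n' "$acc" "$max_size"
    fi
  done
}
```

Running `print_prefetch_cmds SRR_Acc_List.txt` shows the commands for inspection; piping that output to `sh` would execute them.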

Convert to FASTQ

fastq-dump <input.sra> -O <directory>

-O  Write the output to the specified directory.
--split-files  Add this option for paired-end data.
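
To convert several runs in one go, the same command can be generated for each .sra file. This is a rough sketch under stated assumptions: `print_fastq_dump_cmds` is a hypothetical helper, the "paired"/"single" argument is a convention invented here, and the function only prints commands rather than running them. File names containing spaces are not handled.

```shell
# Dry-run helper (hypothetical name): print a fastq-dump command for each .sra file.
# $1: output directory
# $2: "paired" adds --split-files; any other value is treated as single-end
# remaining arguments: .sra files
print_fastq_dump_cmds() {
  out_dir="$1"
  layout="$2"
  shift 2
  for sra in "$@"; do
    if [ "$layout" = "paired" ]; then
      printf 'fastq-dump --split-files %s -O %s\n' "$sra" "$out_dir"
    else
      printf 'fastq-dump %s -O %s\n' "$sra" "$out_dir"
    fi
  done
}
```

For example, `print_fastq_dump_cmds fastq single *.sra` lists the commands for every .sra file in the current directory, and piping the output to `sh` would run them.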

index

bowtie2-build -f Hordeum_vulgare.fa chr

mapping

tophat2 -I 500000 -i 20 -p 15 --read-mismatches 5 --read-gap-length 4 --max-multihits 100 --read-edit-dist 6 -G Hordeum_vulgare.Hv_IBSC_PGSB_v2.36.gtf -o tophat_Bs1_2 chr Bs3_24/SRR1049655.fastq
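
Because the same mapping is repeated for every sample, the command can be generated from a small sample table instead of being typed by hand each time. The sketch below is only an illustration: `print_tophat_cmds` and the two-column table (sample name, tab, FASTQ path) are hypothetical, the output directory is assumed to follow the `tophat_<sample>` pattern, and the function prints the commands without running them. The tophat2 options are copied from the command above.

```shell
# Dry-run helper (hypothetical name): print one tophat2 command per sample.
# $1: tab-separated table, one "sample<TAB>fastq" pair per line (hypothetical file)
# $2: GTF annotation file
# $3: Bowtie2 index prefix
print_tophat_cmds() {
  gtf="$2"
  index="$3"
  # Options copied unchanged from the tophat2 command above.
  opts='-I 500000 -i 20 -p 15 --read-mismatches 5 --read-gap-length 4 --max-multihits 100 --read-edit-dist 6'
  tab=$(printf '\t')
  while IFS="$tab" read -r sample fastq || [ -n "$sample" ]; do
    # Skip blank or incomplete lines.
    if [ -n "$sample" ] && [ -n "$fastq" ]; then
      printf 'tophat2 %s -G %s -o tophat_%s %s %s\n' \
        "$opts" "$gtf" "$sample" "$index" "$fastq"
    fi
  done < "$1"
}
```

Which run belongs to which sample has to be filled into the table by hand from the SRA run metadata; nothing here infers it.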

featureCounts

featureCounts -T 8 -t exon -g gene_id -a Hordeum_vulgare.Hv_IBSC_PGSB_v2.36.gtf -o featurecounts_all.txt tophat_Bs1_2/accepted_hits.bam tophat_Bs2_2/accepted_hits.bam tophat_Bs3_2/accepted_hits.bam tophat_Bs1_24/accepted_hits.bam tophat_Bs2_24/accepted_hits.bam tophat_Bs3_24/accepted_hits.bam
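
The R code in the next section reads a file named featureCounts_counts_trimmed.txt, but how that file was produced is not described. One plausible way, shown here purely as an assumption, is to drop the leading "#" comment line that featureCounts writes and keep only the Geneid column plus the per-sample count columns (column 7 onward, after Chr, Start, End, Strand and Length). `trim_featurecounts` is a hypothetical helper name.

```shell
# Hypothetical helper: reduce a featureCounts table to "Geneid + counts".
# $1: featureCounts output table (tab-separated)
trim_featurecounts() {
  # Remove the "# Program:..." comment line, then keep column 1 (Geneid)
  # and columns 7 onward (one count column per BAM file).
  grep -v '^#' "$1" | cut -f 1,7-
}
```

Redirecting the output, as in `trim_featurecounts <featureCounts output> > featureCounts_counts_trimmed.txt`, would give a table that `read.table(..., header = TRUE, row.names = 1)` can load directly.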

Raw read counts

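library("ggplot2")
library("reshape2")
rawCount = read.table("featureCounts_counts_trimmed.txt", header = TRUE, row.names = 1)
head(rawCount) # always check the data after loading
artificialCount = log2(rawCount + 1) # log2 transform; the +1 avoids log2(0)
head(artificialCount)
df <- melt(artificialCount) # reshape to a long-format data.frame (columns: variable, value)
head(df) # always check the data after reshaping
g <- ggplot(df, aes(x = variable, y = value)) + geom_boxplot() # one box per sample
plot(g)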

[Figure: box plot produced by the code above]