macでインフォマティクス

A collection of notes on informatics for HTS (NGS).

genomepy: downloading genomes from UCSC, NCBI, and Ensembl

2021 10/9 Commands revised (version update)

 

The tool does exactly what the title says. Here is a brief introduction.

 

Installation

Dependencies

  • tabix
  • genePredToBed
  • genePredToGtf
  • bedToGenePred
  • gtfToGenePred
  • gff3ToGenePred
conda install -c bioconda -y ucsc-genepredtobed
conda install -c bioconda -y ucsc-genepredtogtf
conda install -c bioconda -y ucsc-gtftogenepred
conda install -c bioconda -y ucsc-bedtogenepred
conda install -c bioconda -y ucsc-gff3togenepred

genomepy itself (GitHub)

#bioconda (link)
conda install -c bioconda -y genomepy

#pip
pip install genomepy

genomepy -h

$ genomepy -h

Usage: genomepy [OPTIONS] COMMAND [ARGS]...

 

Options:

  --version   Show the version and exit.

  -h, --help  Show this message and exit.

 

Commands:

  config     manage configuration

  genomes    list available genomes

  install    install genome

  plugin     manage plugins

  providers  list available providers

  search     search for genomes

 

 

Usage

1. Check which plugins can be enabled.

genomepy plugin list

$ genomepy plugin list

plugin              enabled

blacklist           

bowtie2             

bwa                 

gaps                *

gmap                

hisat2              

minimap2            

sizes               *

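The aligners that can be specified are bowtie2, bwa, gmap, hisat2, minimap2, and star. When an aligner plugin is enabled, the index for that aligner is built after the genome is downloaded. Naturally, no index is built if the aligner's command is not available in the environment. For details on the blacklist, see the following paper (link).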
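Since an index is only built when the aligner can actually be run, it may help to check the PATH before enabling a plugin. The following is a minimal sketch. `check_aligners` is a hypothetical helper, not part of genomepy, and the executable names in the example are my assumptions about what each plugin calls.

```shell
# Report whether each named executable can be found on PATH.
# Prints one line per tool: "<name><TAB>found" or "<name><TAB>missing".
check_aligners() {
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      printf '%s\tfound\n' "$tool"
    else
      printf '%s\tmissing\n' "$tool"
    fi
  done
}

# Example (executable names are assumptions, adjust to your setup):
# check_aligners bowtie2-build bwa gmap_build hisat2-build minimap2 STAR
```

Any tool reported as missing would need to be installed, or added to the PATH, before its plugin can produce an index.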

 

2 (optional). Enable the bwa plugin.

genomepy plugin enable bwa

Check that it is enabled.

genomepy plugin list

$ genomepy plugin list

plugin              enabled

blacklist           

bowtie2             

bwa                 *

gaps                *

gmap                

hisat2              

minimap2            

sizes               *

It is enabled.
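To check the enabled plugins in a script rather than by eye, the listing can be filtered with awk. This is only a sketch that assumes the two-column layout shown above, with `*` marking enabled plugins. `enabled_plugins` is a hypothetical helper name.

```shell
# Print the names of plugins marked with "*" in the listing read from stdin.
# The header line and disabled plugins (no second column) are skipped.
enabled_plugins() {
  awk 'NF == 2 && $2 == "*" { print $1 }'
}

# Example:
# genomepy plugin list | enabled_plugins
```

If the output layout differs in another genomepy version, the awk condition would need to be adjusted.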

 

3. Check the available databases.

genomepy providers

$ genomepy providers

Ensembl

UCSC

NCBI

Genomes from UCSC, NCBI, and Ensembl are available.

 

For example, list the genomes that can be downloaded from UCSC.

genomepy genomes -p UCSC

$ genomepy genomes -p UCSC

UCSC hg38 Human Dec. 2013 (GRCh38/hg38) Genome at UCSC

UCSC hg19 Human Feb. 2009 (GRCh37/hg19) Genome at UCSC

UCSC hg18 Human Mar. 2006 (NCBI36/hg18) Genome at UCSC

UCSC hg17 Human May 2004 (NCBI35/hg17) Genome at UCSC

UCSC hg16 Human July 2003 (NCBI34/hg16) Genome at UCSC

UCSC mm10 Mouse Dec. 2011 (GRCm38/mm10) Genome at UCSC

UCSC mm9 Mouse July 2007 (NCBI37/mm9) Genome at UCSC

UCSC mm8 Mouse Feb. 2006 (NCBI36/mm8) Genome at UCSC

UCSC mm7 Mouse Aug. 2005 (NCBI35/mm7) Genome at UCSC

UCSC anoGam3 A. gambiae Oct. 2006 (AgamP3/anoGam3) Genome at UCSC

UCSC anoGam1 A. gambiae Feb. 2003 (IAGEC MOZ2/anoGam1) Genome at UCSC

UCSC apiMel2 A. mellifera Jan. 2005 (Baylor 2.0/apiMel2) Genome at UCSC

UCSC apiMel1 A. mellifera July 2004 (Baylor 1.2/apiMel1) Genome at UCSC

UCSC xenLae2 African clawed frog Aug. 2016 (Xenopus_laevis_v2/xenLae2) Genome at UCSC

UCSC vicPac2 Alpaca Mar. 2013 (Vicugna_pacos-2.0.1/vicPac2) Genome at UCSC

UCSC vicPac1 Alpaca Jul. 2008 (Broad/vicPac1) Genome at UCSC

UCSC allMis1 American alligator Aug. 2012 (allMis0.2/allMis1) Genome at UCSC

UCSC dasNov3 Armadillo Dec. 2011 (Baylor/dasNov3) Genome at UCSC

UCSC gadMor1 Atlantic cod May 2010 (Genofisk GadMor_May2010/gadMor1) Genome at UCSC

UCSC papAnu4 Baboon Apr. 2017 (Panu_3.0/papAnu4) Genome at UCSC

UCSC papAnu2 Baboon Mar. 2012 (Baylor Panu_2.0/papAnu2) Genome at UCSC

UCSC papHam1 Baboon Nov. 2008 (Baylor Pham_1.0/papHam1) Genome at UCSC

UCSC bisBis1 Bison Oct. 2014 (Bison_UMD1.0/bisBis1) Genome at UCSC

UCSC panPan2 Bonobo Aug. 2015 (MPI-EVA panpan1.1/panPan2) Genome at UCSC

UCSC panPan1 Bonobo May 2012 (Max-Planck/panPan1) Genome at UCSC

UCSC aptMan1 Brown kiwi Jun. 2015 (MPI-EVA AptMant0/aptMan1) Genome at UCSC

UCSC melUnd1 Budgerigar Sep. 2011 (WUSTL v6.3/melUnd1) Genome at UCSC

UCSC otoGar3 Bushbaby Mar. 2011 (Broad/otoGar3) Genome at UCSC

UCSC caePb2 C. brenneri Feb. 2008 (WUGSC 6.0.1/caePb2) Genome at UCSC

UCSC caePb1 C. brenneri Jan. 2007 (WUGSC 4.0/caePb1) Genome at UCSC

UCSC cb3 C. briggsae Jan. 2007 (WUGSC 1.0/cb3) Genome at UCSC

UCSC cb1 C. briggsae July 2002 (WormBase cb25.agp8/cb1) Genome at UCSC

UCSC ce11 C. elegans Feb. 2013 (WBcel235/ce11) Genome at UCSC

UCSC ce10 C. elegans Oct. 2010 (WS220/ce10) Genome at UCSC

UCSC ce6 C. elegans May 2008 (WS190/ce6) Genome at UCSC

UCSC ce4 C. elegans Jan. 2007 (WS170/ce4) Genome at UCSC

UCSC ce2 C. elegans Mar. 2004 (WS120/ce2) Genome at UCSC

UCSC ci3 C. intestinalis Apr. 2011 (Kyoto KH/ci3) Genome at UCSC

UCSC ci2 C. intestinalis Mar. 2005 (JGI 2.1/ci2) Genome at UCSC

UCSC ci1 C. intestinalis Dec. 2002 (JGI 1.0/ci1) Genome at UCSC

UCSC caeJap1 C. japonica Mar. 2008 (WUGSC 3.0.2/caeJap1) Genome at UCSC

UCSC caeRem3 C. remanei May 2007 (WUGSC 15.0.1/caeRem3) Genome at UCSC

UCSC caeRem2 C. remanei Mar. 2006 (WUGSC 1.0/caeRem2) Genome at UCSC

UCSC felCat9 Cat Nov. 2017 (Felis_catus_9.0/felCat9) Genome at UCSC

UCSC felCat8 Cat Nov. 2014 (ICGSC Felis_catus_8.0/felCat8) Genome at UCSC

UCSC felCat5 Cat Sep. 2011 (ICGSC Felis_catus 6.2/felCat5) Genome at UCSC

UCSC felCat4 Cat Dec. 2008 (NHGRI/GTB V17e/felCat4) Genome at UCSC

UCSC felCat3 Cat Mar. 2006 (Broad/felCat3) Genome at UCSC

UCSC galGal6 Chicken Mar. 2018 (GRCg6a/galGal6) Genome at UCSC

UCSC galGal5 Chicken Dec 2015 (Gallus_gallus-5.0/galGal5) Genome at UCSC

UCSC galGal4 Chicken Nov. 2011 (ICGSC Gallus_gallus-4.0/galGal4) Genome at UCSC

UCSC galGal3 Chicken May 2006 (WUGSC 2.1/galGal3) Genome at UCSC

UCSC galGal2 Chicken Feb. 2004 (WUGSC 1.0/galGal2) Genome at UCSC

UCSC panTro6 Chimp Jan. 2018 (Clint_PTRv2/panTro6) Genome at UCSC

UCSC panTro5 Chimp May 2016 (Pan_tro 3.0/panTro5) Genome at UCSC

UCSC panTro4 Chimp Feb. 2011 (CSAC 2.1.4/panTro4) Genome at UCSC

UCSC panTro3 Chimp Oct. 2010 (CGSC 2.1.3/panTro3) Genome at UCSC

UCSC panTro2 Chimp Mar. 2006 (CGSC 2.1/panTro2) Genome at UCSC

UCSC panTro1 Chimp Nov. 2003 (CGSC 1.1/panTro1) Genome at UCSC

UCSC criGriChoV2 Chinese hamster Jun. 2017 (CHOK1S_HZDv1/criGriChoV2) Genome at UCSC

UCSC criGriChoV1 Chinese hamster Aug. 2011 (CHO K1 cell line/criGriChoV1) Genome at UCSC

UCSC criGri1 Chinese hamster Jul. 2013 (C_griseus_v1.0/criGri1) Genome at UCSC

UCSC manPen1 Chinese pangolin Aug 2014 (M_pentadactyla-1.1.1/manPen1) Genome at UCSC

UCSC latCha1 Coelacanth Aug. 2011 (Broad/latCha1) Genome at UCSC

UCSC bosTau9 Cow Apr. 2018 (ARS-UCD1.2/bosTau9) Genome at UCSC

UCSC bosTau8 Cow Jun. 2014 (Bos_taurus_UMD_3.1.1/bosTau8) Genome at UCSC

UCSC bosTau7 Cow Oct. 2011 (Baylor Btau_4.6.1/bosTau7) Genome at UCSC

UCSC bosTau6 Cow Nov. 2009 (Bos_taurus_UMD_3.1/bosTau6) Genome at UCSC

UCSC bosTau4 Cow Oct. 2007 (Baylor 4.0/bosTau4) Genome at UCSC

UCSC bosTau3 Cow Aug. 2006 (Baylor 3.1/bosTau3) Genome at UCSC

UCSC bosTau2 Cow Mar. 2005 (Baylor 2.0/bosTau2) Genome at UCSC

UCSC macFas5 Crab-eating macaque Jun. 2013 (Macaca_fascicularis_5.0/macFas5) Genome at UCSC

UCSC droAna2 D. ananassae Aug. 2005 (Agencourt prelim/droAna2) Genome at UCSC

UCSC droAna1 D. ananassae July 2004 (TIGR/droAna1) Genome at UCSC

UCSC droEre1 D. erecta Aug. 2005 (Agencourt prelim/droEre1) Genome at UCSC

UCSC droGri1 D. grimshawi Aug. 2005 (Agencourt prelim/droGri1) Genome at UCSC

UCSC dm6 D. melanogaster Aug. 2014 (BDGP Release 6 + ISO1 MT/dm6) Genome at UCSC

UCSC dm3 D. melanogaster Apr. 2006 (BDGP R5/dm3) Genome at UCSC

UCSC dm2 D. melanogaster Apr. 2004 (BDGP R4/dm2) Genome at UCSC

UCSC dm1 D. melanogaster Jan. 2003 (BDGP R3/dm1) Genome at UCSC

UCSC droMoj2 D. mojavensis Aug. 2005 (Agencourt prelim/droMoj2) Genome at UCSC

UCSC droMoj1 D. mojavensis Aug. 2004 (Agencourt prelim/droMoj1) Genome at UCSC

UCSC droPer1 D. persimilis Oct. 2005 (Broad/droPer1) Genome at UCSC

UCSC dp3 D. pseudoobscura Nov. 2004 (FlyBase 1.03/dp3) Genome at UCSC

UCSC dp2 D. pseudoobscura Aug. 2003 (Baylor freeze1/dp2) Genome at UCSC

UCSC droSec1 D. sechellia Oct. 2005 (Broad/droSec1) Genome at UCSC

UCSC droSim1 D. simulans Apr. 2005 (WUGSC mosaic 1.0/droSim1) Genome at UCSC

UCSC droVir2 D. virilis Aug. 2005 (Agencourt prelim/droVir2) Genome at UCSC

UCSC droVir1 D. virilis July 2004 (Agencourt prelim/droVir1) Genome at UCSC

UCSC droYak2 D. yakuba Nov. 2005 (WUGSC 7.1/droYak2) Genome at UCSC

UCSC droYak1 D. yakuba Apr. 2004 (WUGSC 1.0/droYak1) Genome at UCSC

UCSC canFam3 Dog Sep. 2011 (Broad CanFam3.1/canFam3) Genome at UCSC

UCSC canFam2 Dog May 2005 (Broad/canFam2) Genome at UCSC

UCSC canFam1 Dog July 2004 (Broad/canFam1) Genome at UCSC

UCSC turTru2 Dolphin Oct. 2011 (Baylor Ttru_1.4/turTru2) Genome at UCSC

UCSC eboVir3 Ebola virus Sierra Leone 2014 (G3683/KM034562.1/eboVir3) Genome at UCSC

UCSC loxAfr3 Elephant Jul. 2009 (Broad/loxAfr3) Genome at UCSC

UCSC calMil1 Elephant shark Dec. 2013 (Callorhinchus_milii-6.1.3/calMil1) Genome at UCSC

UCSC musFur1 Ferret  Apr. 2011 (MusPutFur1.0/musFur1) Genome at UCSC

UCSC fr3 Fugu Oct. 2011 (FUGU5/fr3) Genome at UCSC

UCSC fr2 Fugu Oct. 2004 (JGI 4.0/fr2) Genome at UCSC

UCSC fr1 Fugu Aug. 2002 (JGI 3.0/fr1) Genome at UCSC

UCSC thaSir1 Garter snake Jun. 2015 (Thamnophis_sirtalis-6.0/thaSir1) Genome at UCSC

UCSC nomLeu3 Gibbon Oct. 2012 (GGSC Nleu3.0/nomLeu3) Genome at UCSC

UCSC nomLeu2 Gibbon Jun. 2011 (GGSC Nleu1.1/nomLeu2) Genome at UCSC

UCSC nomLeu1 Gibbon Jan. 2010 (GGSC Nleu1.0/nomLeu1) Genome at UCSC

UCSC aquChr2 Golden eagle Oct. 2014 (aquChr-1.0.2/aquChr2) Genome at UCSC

UCSC rhiRox1 Golden snub-nosed monkey Oct. 2014 (Rrox_v1/rhiRox1) Genome at UCSC

UCSC gorGor5 Gorilla Mar. 2016 (GSMRT3/gorGor5) Genome at UCSC

UCSC gorGor4 Gorilla Dec 2014 (gorGor4.1/gorGor4) Genome at UCSC

UCSC gorGor3 Gorilla May 2011 (gorGor3.1/gorGor3) Genome at UCSC

UCSC chlSab2 Green monkey Mar. 2014 (Chlorocebus_sabeus 1.1/chlSab2) Genome at UCSC

UCSC cavPor3 Guinea pig Feb. 2008 (Broad/cavPor3) Genome at UCSC

UCSC eriEur2 Hedgehog May 2012 (EriEur2.0/eriEur2) Genome at UCSC

UCSC eriEur1 Hedgehog June 2006 (Broad/eriEur1) Genome at UCSC

UCSC equCab3 Horse Jan. 2018 (EquCab3.0/equCab3) Genome at UCSC

UCSC equCab2 Horse Sep. 2007 (Broad/equCab2) Genome at UCSC

UCSC equCab1 Horse Jan. 2007 (Broad/equCab1) Genome at UCSC

UCSC dipOrd1 Kangaroo rat Jul. 2008 (Broad/dipOrd1) Genome at UCSC

UCSC petMar3 Lamprey Dec. 2017 (Pmar_germline 1.0/petMar3) Genome at UCSC

UCSC petMar2 Lamprey Sep. 2010 (WUGSC 7.0/petMar2) Genome at UCSC

UCSC petMar1 Lamprey Mar. 2007 (WUGSC 3.0/petMar1) Genome at UCSC

UCSC braFlo1 Lancelet Mar. 2006 (JGI 1.0/braFlo1) Genome at UCSC

UCSC anoCar2 Lizard May 2010 (Broad AnoCar2.0/anoCar2) Genome at UCSC

UCSC anoCar1 Lizard Feb. 2007 (Broad/anoCar1) Genome at UCSC

UCSC galVar1 Malayan flying lemur Jun. 2014 (G_variegatus-3.0.2/galVar1) Genome at UCSC

UCSC triMan1 Manatee Oct. 2011 (Broad v1.0/triMan1) Genome at UCSC

UCSC calJac3 Marmoset March 2009 (WUGSC 3.2/calJac3) Genome at UCSC

UCSC calJac1 Marmoset June 2007 (WUGSC 2.0.2/calJac1) Genome at UCSC

UCSC oryLat2 Medaka Oct. 2005 (NIG/UT MEDAKA1/oryLat2) Genome at UCSC

UCSC geoFor1 Medium ground finch Apr. 2012 (GeoFor_1.0/geoFor1) Genome at UCSC

UCSC pteVam1 Megabat Jul. 2008 (Broad/pteVam1) Genome at UCSC

UCSC myoLuc2 Microbat Jul. 2010 (Broad Institute Myoluc2.0/myoLuc2) Genome at UCSC

UCSC balAcu1 Minke whale Oct. 2013 (BalAcu1.0/balAcu1) Genome at UCSC

UCSC micMur2 Mouse lemur May 2015 (Mouse lemur/micMur2) Genome at UCSC

UCSC micMur1 Mouse lemur Jul. 2007 (Broad/micMur1) Genome at UCSC

UCSC hetGla2 Naked mole-rat Jan. 2012 (Broad HetGla_female_1.0/hetGla2) Genome at UCSC

UCSC hetGla1 Naked mole-rat Jul. 2011 (BGI HetGla_1.0/hetGla1) Genome at UCSC

UCSC oreNil2 Nile tilapia Jan. 2011 (Broad oreNil1.1/oreNil2) Genome at UCSC

UCSC monDom5 Opossum Oct. 2006 (Broad/monDom5) Genome at UCSC

UCSC monDom4 Opossum Jan. 2006 (Broad/monDom4) Genome at UCSC

UCSC monDom1 Opossum Oct. 2004 (Broad prelim/monDom1) Genome at UCSC

UCSC ponAbe3 Orangutan Jan. 2018 (Susie_PABv2/ponAbe3) Genome at UCSC

UCSC ponAbe2 Orangutan July 2007 (WUGSC 2.0.2/ponAbe2) Genome at UCSC

UCSC priPac1 P. pacificus Feb. 2007 (WUGSC 5.0/priPac1) Genome at UCSC

UCSC chrPic1 Painted turtle Dec. 2011 (v3.0.1/chrPic1) Genome at UCSC

UCSC ailMel1 Panda Dec. 2009 (BGI-Shenzhen 1.0/ailMel1) Genome at UCSC

UCSC susScr11 Pig Feb. 2017 (Sscrofa11.1/susScr11) Genome at UCSC

UCSC susScr3 Pig Aug. 2011 (SGSC Sscrofa10.2/susScr3) Genome at UCSC

UCSC susScr2 Pig Nov. 2009 (SGSC Sscrofa9.2/susScr2) Genome at UCSC

UCSC ochPri3 Pika May 2012 (OchPri3.0/ochPri3) Genome at UCSC

UCSC ochPri2 Pika Jul. 2008 (Broad/ochPri2) Genome at UCSC

UCSC ornAna2 Platypus Feb. 2007 (ASM227v2/ornAna2) Genome at UCSC

UCSC ornAna1 Platypus Mar. 2007 (WUGSC 5.0.1/ornAna1) Genome at UCSC

UCSC nasLar1 Proboscis monkey Nov. 2014 (Charlie1.0/nasLar1) Genome at UCSC

UCSC oryCun2 Rabbit Apr. 2009 (Broad/oryCun2) Genome at UCSC

UCSC rn6 Rat Jul. 2014 (RGSC 6.0/rn6) Genome at UCSC

UCSC rn5 Rat Mar. 2012 (RGSC 5.0/rn5) Genome at UCSC

UCSC rn4 Rat Nov. 2004 (Baylor 3.4/rn4) Genome at UCSC

UCSC rn3 Rat June 2003 (Baylor 3.1/rn3) Genome at UCSC

UCSC rheMac10 Rhesus Feb. 2019 (Mmul_10/rheMac10) Genome at UCSC

UCSC rheMac8 Rhesus Nov. 2015 (BCM Mmul_8.0.1/rheMac8) Genome at UCSC

UCSC rheMac3 Rhesus Oct. 2010 (BGI CR_1.0/rheMac3) Genome at UCSC

UCSC rheMac2 Rhesus Jan. 2006 (MGSC Merged 1.0/rheMac2) Genome at UCSC

UCSC proCap1 Rock hyrax Jul. 2008 (Broad/proCap1) Genome at UCSC

UCSC sacCer3 S. cerevisiae Apr. 2011 (SacCer_Apr2011/sacCer3) Genome at UCSC

UCSC sacCer2 S. cerevisiae June 2008 (SGD/sacCer2) Genome at UCSC

UCSC sacCer1 S. cerevisiae Oct. 2003 (SGD/sacCer1) Genome at UCSC

UCSC strPur2 S. purpuratus Sep. 2006 (Baylor 2.1/strPur2) Genome at UCSC

UCSC strPur1 S. purpuratus Apr. 2005 (Baylor 1.1/strPur1) Genome at UCSC

UCSC aplCal1 Sea hare Sept. 2008 (Broad 2.0/aplCal1) Genome at UCSC

UCSC oviAri4 Sheep Nov. 2015 (Oar_v4.0/oviAri4) Genome at UCSC

UCSC oviAri3 Sheep Aug. 2012 (ISGC Oar_v3.1/oviAri3) Genome at UCSC

UCSC oviAri1 Sheep Feb. 2010 (ISGC Ovis_aries_1.0/oviAri1) Genome at UCSC

UCSC sorAra2 Shrew Aug. 2008 (Broad/sorAra2) Genome at UCSC

UCSC sorAra1 Shrew June 2006 (Broad/sorAra1) Genome at UCSC

UCSC choHof1 Sloth Jul. 2008 (Broad/choHof1) Genome at UCSC

UCSC speTri2 Squirrel Nov. 2011 (Broad/speTri2) Genome at UCSC

UCSC saiBol1 Squirrel monkey Oct. 2011 (Broad/saiBol1) Genome at UCSC

UCSC gasAcu1 Stickleback Feb. 2006 (Broad/gasAcu1) Genome at UCSC

UCSC tarSyr2 Tarsier Sep. 2013 (Tarsius_syrichta-2.0.1/tarSyr2) Genome at UCSC

UCSC tarSyr1 Tarsier Aug. 2008 (Broad/tarSyr1) Genome at UCSC

UCSC sarHar1 Tasmanian devil Feb. 2011 (WTSI Devil_ref v7.0/sarHar1) Genome at UCSC

UCSC echTel2 Tenrec Nov. 2012 (Broad/echTel2) Genome at UCSC

UCSC echTel1 Tenrec July 2005 (Broad/echTel1) Genome at UCSC

UCSC tetNig2 Tetraodon Mar. 2007 (Genoscope 8.0/tetNig2) Genome at UCSC

UCSC tetNig1 Tetraodon Feb. 2004 (Genoscope 7/tetNig1) Genome at UCSC

UCSC nanPar1 Tibetan frog Mar. 2015 (BGI_ZX_2015/nanPar1) Genome at UCSC

UCSC tupBel1 Tree shrew Dec. 2006 (Broad/tupBel1) Genome at UCSC

UCSC melGal5 Turkey Nov. 2014 (Turkey_5.0/melGal5) Genome at UCSC

UCSC melGal1 Turkey Dec. 2009 (TGC Turkey_2.01/melGal1) Genome at UCSC

UCSC macEug2 Wallaby Sep. 2009 (TWGS Meug_1.1/macEug2) Genome at UCSC

UCSC cerSim1 White rhinoceros May 2012 (CerSimSim1.0/cerSim1) Genome at UCSC

UCSC xenTro9 X. tropicalis Jul. 2016 (Xenopus_tropicalis_v9.1/xenTro9) Genome at UCSC

UCSC xenTro7 X. tropicalis Sep. 2012 (JGI 7.0/xenTro7) Genome at UCSC

UCSC xenTro3 X. tropicalis Nov. 2009 (JGI 4.2/xenTro3) Genome at UCSC

UCSC xenTro2 X. tropicalis Aug. 2005 (JGI 4.1/xenTro2) Genome at UCSC

UCSC xenTro1 X. tropicalis Oct. 2004 (JGI 3.0/xenTro1) Genome at UCSC

UCSC taeGut2 Zebra finch Feb. 2013 (WashU taeGut324/taeGut2) Genome at UCSC

UCSC taeGut1 Zebra finch Jul. 2008 (WUGSC 3.2.4/taeGut1) Genome at UCSC

UCSC danRer11 Zebrafish May 2017 (GRCz11/danRer11) Genome at UCSC

UCSC danRer10 Zebrafish Sep. 2014 (GRCz10/danRer10) Genome at UCSC

UCSC danRer7 Zebrafish Jul. 2010 (Zv9/danRer7) Genome at UCSC

UCSC danRer6 Zebrafish Dec. 2008 (Zv8/danRer6) Genome at UCSC

UCSC danRer5 Zebrafish July 2007 (Zv7/danRer5) Genome at UCSC

UCSC danRer4 Zebrafish Mar. 2006 (Zv6/danRer4) Genome at UCSC

UCSC danRer3 Zebrafish May 2005 (Zv5/danRer3) Genome at UCSC

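NCBI and Ensembl list a large number of genomes, so it is better to save the output to a file. Because the lists are so long, retrieving them can take several minutes.

#Ensembl
genomepy genomes -p Ensembl > Ensembl_list

#NCBI
genomepy genomes -p NCBI > NCBI_list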
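Once a list is saved, it can be searched locally without querying the provider again. A minimal sketch with grep follows. `search_list` is a hypothetical helper, and the file argument is whatever file the list was redirected to.

```shell
# Case-insensitive search of a saved genome list.
# usage: search_list <pattern> <saved_list_file>
search_list() {
  grep -i -- "$1" "$2"
}

# Example (replace the file name with the one you saved):
# search_list merolae saved_list_file
```

The `--` stops a pattern that begins with `-` from being read as a grep option.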

 

4. Download

For example, from a list like the following,

$ genomepy genomes -p Ensembl

 

Ensembl Red5_PS1_1.69.0 actinidia_chinensis

Ensembl AMTR1.0 amborella_trichopoda

Ensembl ASM9120v1 cyanidioschyzon_merolae

Ensembl AUK_PRJEB4211_v1 coffea_canephora

Ensembl CcrdV1 cynara_cardunculus

(rest omitted)

let's download ASM9120v1, the genome assembly of Cyanidioschyzon merolae.

Run the following.

genomepy install ASM9120v1 -p Ensembl

Add -g to specify the directory where the genome is saved.

 

Addendum

Mouse mm10. The default provider was Ensembl, so I switched to UCSC with -p.

genomepy install mm10 -p UCSC

 

By default, genomes are saved in ~/.local/share/genomes/. Open it.

open ~/.local/share/genomes/

[Screenshot: contents of ~/.local/share/genomes/]

In addition to the genome FASTA, bwa index files were created because the bwa plugin was enabled. Indexes for the other aligners were not created because those aligners are not on the PATH.
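The same check can be done from the terminal. The sketch below lists FASTA and index-like files under the genome directory. `list_genome_files` is a hypothetical helper, and the file name patterns (`*.fa`, `*.fa.fai`, `*.bwt`) are assumptions about what genomepy and bwa write, so adjust them if your files are named differently.

```shell
# List FASTA files, FASTA indexes, and bwa .bwt files under a genome directory.
# usage: list_genome_files [genomes_dir]   (default: ~/.local/share/genomes)
list_genome_files() {
  find "${1:-$HOME/.local/share/genomes}" -type f \
    \( -name '*.fa' -o -name '*.fa.fai' -o -name '*.bwt' \) | sort
}

# Example:
# list_genome_files
# list_genome_files /path/to/other/genomes_dir
```

If no `.bwt` file appears, the bwa index was probably not built, for example because bwa was not on the PATH.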

 

optional

An example of searching by organism name and then downloading.

Search for the human genome versions available from UCSC.

genomepy search -p UCSC human

#without specifying a database (takes a little longer)
genomepy search human

$ genomepy search -p UCSC human

UCSC hg38 Human Dec. 2013 (GRCh38/hg38) Genome at UCSC

UCSC hg19 Human Feb. 2009 (GRCh37/hg19) Genome at UCSC

UCSC hg18 Human Mar. 2006 (NCBI36/hg18) Genome at UCSC

UCSC hg17 Human May 2004 (NCBI35/hg17) Genome at UCSC

UCSC hg16 Human July 2003 (NCBI34/hg16) Genome at UCSC

Besides the latest hg38, four other versions are available.
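To pull out only the assembly names for use with `genomepy install`, the second column of the search output can be extracted. This sketch assumes the whitespace-separated layout shown above (provider first, assembly name second). `assembly_names` is a hypothetical helper.

```shell
# Print the second whitespace-separated field (the assembly name)
# of each non-empty line read from stdin.
assembly_names() {
  awk 'NF >= 2 { print $2 }'
}

# Example:
# genomepy search -p UCSC human | assembly_names
```

Other genomepy versions may print the search results in a different layout, in which case the column number would need to change.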

 

This time, download the annotation as well, with soft masking enabled.

genomepy install -g UCSC_human_genome -m soft --annotation -p UCSC hg38
  • -g <TEXT>      genome directory
  • -m <TEXT>     mask (hard or soft)
  • --annotation   download annotation

The download completed.

[Screenshot: the downloaded files]

Open the README.

[Screenshot: the README]

Related