For background complete the introductory tutorials on rgbif and on rotl to familiarize yourself.
https://ropensci.org/tutorials/rotl_tutorial/
https://ropensci.org/tutorials/rgbif_tutorial/

Combining evolutionary and geographic information

In order to examine geographic distributions of species, it is useful to tie together information about phylogenetic relationships, which we can get from Open Tree of Life, with information about where individuals of that species are found.

install.packages(c("httr", "jsonlite", "rgbif", "rotl", "phytools", "viridis"))
## Installing packages into '/home/ejmctavish/R/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
library(httr)
library(jsonlite)
library(rgbif)
library(rotl)
library(phytools)
## Loading required package: ape
## Loading required package: maps
library(viridis)
## Loading required package: viridisLite
## 
## Attaching package: 'viridis'
## The following object is masked from 'package:maps':
## 
##     unemp

Choose some species

Lets look for some fun species, A polar bear, “Ursus maritimus”, a hippo “Hippopotamus amphibius”, a gorilla “Gorilla gorilla” and a wild boar “Sus scrofa”.

polarbear <-"Ursus maritimus"
hippo <-"Hippopotamus amphibius"
boar <-"Sus scrofa"
gorilla <- "Gorilla gorilla"
dolphin <- "Tursiops truncatus"

spp <- c(polarbear, boar, gorilla, hippo,dolphin)

We can use Open Tree of Life to generate a tree for our species!

taxa <- rotl::tnrs_match_names(spp, context="Animals")
tr <-  rotl::tol_induced_subtree(ott_id(taxa), label="name")
## 
Progress [----------------------------------] 0/62 (  0%) ?s
Progress [================================] 62/62 (100%)  0s
                                                            
## Warning in collapse_singles(tr, show_progress): Dropping singleton
## nodes with labels: Euarchontoglires, mrcaott786ott112387, Primates,
## mrcaott786ott3428, Haplorrhini, Simiiformes, Catarrhini, mrcaott786ott3607729,
## mrcaott786ott83926, mrcaott83926ott3607702, mrcaott83926ott96938,
## mrcaott83926ott6145147, mrcaott83926ott3607728, mrcaott83926ott770295,
## mrcaott83926ott3607876, mrcaott83926ott3607873, Homininae,
## Gorilla, Laurasiatheria, mrcaott1548ott6790, mrcaott1548ott3607484,
## mrcaott1548ott4942380, mrcaott1548ott4942547, mrcaott1548ott3021, Artiodactyla,
## mrcaott1548ott5256, mrcaott5256ott4944931, Cetacea, mrcaott5256ott3615450,
## mrcaott5256ott44568, Odontoceti, mrcaott5256ott5269, mrcaott5269ott6470,
## mrcaott5269ott47843, mrcaott47843ott194312, Delphinidae, mrcaott47843ott111103,
## mrcaott111103ott825969, mrcaott111103ott124220, mrcaott124220ott250045,
## mrcaott124220ott155492, mrcaott124220ott124224, mrcaott124220ott246130,
## Tursiops, Ancodonta, Hippopotamidae, Hippopotamus, Suina, Suidae,
## Sus, mrcaott4697ott263949, Carnivora, mrcaott4697ott6940, Caniformia,
## mrcaott4697ott10732, mrcaott10732ott22064, Ursidae, mrcaott10732ott679667,
## mrcaott10732ott297463, mrcaott10732ott153282, mrcaott10732ott7067610,
## mrcaott10732ott350099
plot(tr)

Don’t forget to cite the studies that went into building your tree!

Pulling supporting studies isn’t included in rotl yet (or I couldn’t find it). So we will use an Open Tree API call directly.

url_ot <- 'https://api.opentreeoflife.org/v3/tree_of_life/induced_subtree'
body <- list(ott_ids=taxa$ott_id)
r <- httr::POST(url_ot, body = body, encode = "json")#gets which studies support the subtree from  the open tree API
for(studytree in httr::content(r)$supporting_studies){
study <-  strsplit(studytree, '@')[[1]][1]
meta <- get_study_meta(study) #pulls the metadata for each study
pub <- get_publication(meta)  #grabs teh publication information
cat(pub,"\n\n")
}
## Lartillot, Nicolas, Frédéric Delsuc. 2012. Joint reconstruction of divergence times and life-history evolution in placental mammals using a phylogenetic covariance model. Evolution 66 (6): 1773-1787. 
## 
## O'Leary, M. A., J. I. Bloch, J. J. Flynn, T. J. Gaudin, A. Giallombardo, N. P. Giannini, S. L. Goldberg, B. P. Kraatz, Z.-X. Luo, J. Meng, X. Ni, M. J. Novacek, F. A. Perini, Z. S. Randall, G. W. Rougier, E. J. Sargis, M. T. Silcox, N. B. Simmons, M. Spaulding, P. M. Velazco, M. Weksler, J. R. Wible, A. L. Cirranello. 2013. The placental mammal ancestor and the post-K-Pg radiation of placentals. Science 339 (6120): 662-667. 
## 
## Springer, Mark S., Robert W. Meredith, John Gatesy, Christopher A. Emerling, Jong Park, Daniel L. Rabosky, Tanja Stadler, Cynthia Steiner, Oliver A. Ryder, Jan E. Janečka, Colleen A. Fisher, William J. Murphy. 2012. Macroevolutionary dynamics and historical biogeography of primate diversification inferred from a species supermatrix. PLoS ONE 7 (11): e49521. 
## 
## Mark S. Springer, Christopher A. Emerling, Robert W. Meredith, Jan E. Janečka, Eduardo Eizirik, William J. Murphy, 2017, 'Waking the undead: Implications of a soft explosive model for the timing of placental mammal diversification', Molecular Phylogenetics and Evolution, vol. 106, pp. 86-102 
## 
## Meredith, R.W., Janecka J., Gatesy J., Ryder O.A., Fisher C., Teeling E., Goodbla A., Eizirik E., Simao T., Stadler T., Rabosky D., Honeycutt R., Flynn J., Ingram C., Steiner C., Williams T., Robinson T., Herrick A., Westerman M., Ayoub N., Springer M., & Murphy W. 2011. Impacts of the Cretaceous Terrestrial Revolution and KPg Extinction on Mammal Diversification. Science 334 (6055): 521-524. 
## 
## Tarver, J. E., P. C. J. Donoghue, M. J. Benton. 2011. Is evolutionary history repeatedly rewritten in light of new fossil discoveries? Proceedings of the Royal Society B: Biological Sciences 278 (1705): 599-604. 
## 
## Graham J. Slater, Jeremy A. Goldbogen, Nicholas D. Pyenson, 2017, 'Independent evolution of baleen whale gigantism linked to Plio-Pleistocene ocean dynamics', Proceedings of the Royal Society B: Biological Sciences, vol. 284, no. 1855, p. 20170546 
## 
## Perelman, P., Johnson W.E., Roos C., Seuánez H.N., Horvath J.E., Moreira M.A., Kessing B.D., Pontius J., Roelke M., Rumpler Y., Schneider M.C., Silva A., O'brien S., & Pecon-slattery J. 2011. A Molecular Phylogeny of Living Primates. PLoS Genetics 7(3): e1001342. 
## 
## Nyakatura, Katrin, Olaf RP Bininda-Emonds. 2012. Updating the evolutionary history of Carnivora (Mammalia): a new species-level supertree complete with divergence time estimates. BMC Biology 10 (1): 12 
## 
## Krause, Johannes, Tina Unger, Aline Noçon, Anna-Sapfo Malaspinas, Sergios-Orestis Kolokotronis, Mathias Stiller, Leopoldo Soibelzon, Helen Spriggs, Paul H Dear, Adrian W Briggs, Sarah CE Bray, Stephen J O'Brien, Gernot Rabeder, Paul Matheus, Alan Cooper, Montgomery Slatkin, Svante Pääbo, Michael Hofreiter. 2008. Mitochondrial genomes reveal an explosive radiation of extinct and extant bears near the Miocene-Pliocene boundary. BMC Evolutionary Biology 8 (1): 220. 
## 
## Toljagi&#263; O., Voje K.L., Matschiner M., Liow L., & Hansen T.F. 2017. Millions of Years Behind: Slow Adaptation of Ruminants to Grasslands. Systematic Biology, . 
## 
## Steeman, M., Hebsgaard M., Fordyce R., Ho S., Rabosky D., Nielsen R., Rahbek C., Glenner H., Sørensen M., & Willerslev E. 2009. Radiation of Extant Cetaceans Driven by Restructuring of the Oceans. Systematic Biology 58 (6): 573-585. 
## 
## McGowen, M., Spaulding M., and Gatesy J. 2009. Divergence date estimation and a comprehensive molecular tree of extant cetaceans. Molecular Phylogenetics and Evolution 53 (3): 891-906.

Lets combine the phylogenetic information from Open Tree with geographic information from GBIF

We will use the Global Biodiversity Information Facility (GBIF) to search for records of these species, and arbitrarily chose the first record returned as the location for the species. (sometimes the first record has no lat-long info, and we need to find a better one)

dat1 <- rgbif::occ_search(scientificName = polarbear, fields=c('name','decimalLatitude', 'decimalLongitude'), limit = 10)
locsdat1 <- cbind(polarbear, dat1$data) #labels the lat long columns with the species name
colnames(locsdat1)<-c("species","Long","Lat")

dat2 <- rgbif::occ_search(scientificName = hippo, fields=c('name','decimalLatitude', 'decimalLongitude'), limit = 10)
locsdat2 <- cbind(hippo, dat2$data)
colnames(locsdat2)<-c("species","Long","Lat")

dat3 <- rgbif::occ_search(scientificName = boar,  fields=c('name','decimalLatitude', 'decimalLongitude'), limit = 10)
locsdat3 <- cbind(boar, dat3$data)
colnames(locsdat3)<-c("species","Long","Lat")

dat4 <- rgbif::occ_search(scientificName = gorilla, fields=c('name','decimalLatitude', 'decimalLongitude'), limit = 10)
locsdat4 <- cbind(gorilla, dat4$data)
colnames(locsdat4)<-c("species","Long","Lat")

dat5 <- rgbif::occ_search(scientificName = dolphin, fields=c('name','decimalLatitude', 'decimalLongitude'), limit = 10)
locsdat5 <- cbind(dolphin, dat5$data)
colnames(locsdat5)<-c("species","Long","Lat")

locs <- as.data.frame(rbind(locsdat1,locsdat2, locsdat3, locsdat4, locsdat5)) #Combine the data from each species
locs
##                   species        Long        Lat
## 1         Ursus maritimus  -93.816720  58.733208
## 2         Ursus maritimus  -94.152214  58.775311
## 3         Ursus maritimus -156.858589  71.226550
## 4         Ursus maritimus   12.389055  78.930327
## 5         Ursus maritimus  -93.959003  58.766739
## 6         Ursus maritimus -156.723045  71.350298
## 7         Ursus maritimus  -93.192528  58.649208
## 8         Ursus maritimus  -20.507641  74.428475
## 9         Ursus maritimus -156.670606  71.272698
## 10        Ursus maritimus  -94.104818  58.765814
## 11 Hippopotamus amphibius   37.315509  -2.705606
## 12 Hippopotamus amphibius   27.055363 -25.321550
## 13 Hippopotamus amphibius   32.461800 -28.284100
## 14 Hippopotamus amphibius   19.834524 -34.665625
## 15 Hippopotamus amphibius   32.529800 -28.133800
## 16 Hippopotamus amphibius   38.021624  -2.822808
## 17 Hippopotamus amphibius   32.408200 -28.236500
## 18 Hippopotamus amphibius   39.191352  -3.399534
## 19 Hippopotamus amphibius   32.528700 -28.133000
## 20 Hippopotamus amphibius   32.530200 -28.134200
## 21             Sus scrofa    6.425629  50.869137
## 22             Sus scrofa    8.907829  49.988293
## 23             Sus scrofa    8.509169  49.979488
## 24             Sus scrofa    8.210533  50.410290
## 25             Sus scrofa    8.952374  50.132347
## 26             Sus scrofa    8.567899  49.976768
## 27             Sus scrofa    9.147406  52.178852
## 28             Sus scrofa   10.269532  48.920700
## 29             Sus scrofa    9.723029  49.976425
## 30             Sus scrofa    7.982176  50.326050
## 31        Gorilla gorilla   12.972422  -1.600545
## 32        Gorilla gorilla   11.667805  -0.317679
## 33        Gorilla gorilla   10.053514  -2.998760
## 34        Gorilla gorilla   12.868204  -1.605592
## 35        Gorilla gorilla   13.163600  -1.624496
## 36        Gorilla gorilla   12.849542  -1.651610
## 37        Gorilla gorilla -106.663485  35.078382
## 38        Gorilla gorilla   12.894200  -1.721151
## 39        Gorilla gorilla   12.884102  -1.738712
## 40        Gorilla gorilla   12.930430  -1.711025
## 41     Tursiops truncatus -110.689321  25.008885
## 42     Tursiops truncatus -118.392640  33.811742
## 43     Tursiops truncatus -119.614499  34.057997
## 44     Tursiops truncatus  -97.164837  27.940834
## 45     Tursiops truncatus  168.418270 -46.565697
## 46     Tursiops truncatus -118.255547  33.715199
## 47     Tursiops truncatus  -85.314767  29.812679
## 48     Tursiops truncatus  -97.392456  27.751821
## 49     Tursiops truncatus  -97.066288  27.982494
## 50     Tursiops truncatus    4.045470  43.452470
locs[,1] <- gsub(' ','_',locs[,1])#On the tree, names have underscores instead of spaces.
#Sometimes the lat longs have spaces in the records. 
#We use trimws to get rid of the whitespace, and as.numeric to record the the latitudes and longitudes as numbers instead of characters.
locs[,2]<-as.numeric(trimws(as.character(locs[,2])))
locs[,3]<-as.numeric(trimws(as.character(locs[,3])))
latlong <- as.matrix(locs[,c(3,2)]) #flip the columns to lat, long instead of long,lat
rownames(latlong) <- locs[,1]

Check that your species are all included and have the correct names (taxonomic name recognition sometimes updates them)

latlong
##                               Lat        Long
## Ursus_maritimus         58.733208  -93.816720
## Ursus_maritimus         58.775311  -94.152214
## Ursus_maritimus         71.226550 -156.858589
## Ursus_maritimus         78.930327   12.389055
## Ursus_maritimus         58.766739  -93.959003
## Ursus_maritimus         71.350298 -156.723045
## Ursus_maritimus         58.649208  -93.192528
## Ursus_maritimus         74.428475  -20.507641
## Ursus_maritimus         71.272698 -156.670606
## Ursus_maritimus         58.765814  -94.104818
## Hippopotamus_amphibius  -2.705606   37.315509
## Hippopotamus_amphibius -25.321550   27.055363
## Hippopotamus_amphibius -28.284100   32.461800
## Hippopotamus_amphibius -34.665625   19.834524
## Hippopotamus_amphibius -28.133800   32.529800
## Hippopotamus_amphibius  -2.822808   38.021624
## Hippopotamus_amphibius -28.236500   32.408200
## Hippopotamus_amphibius  -3.399534   39.191352
## Hippopotamus_amphibius -28.133000   32.528700
## Hippopotamus_amphibius -28.134200   32.530200
## Sus_scrofa              50.869137    6.425629
## Sus_scrofa              49.988293    8.907829
## Sus_scrofa              49.979488    8.509169
## Sus_scrofa              50.410290    8.210533
## Sus_scrofa              50.132347    8.952374
## Sus_scrofa              49.976768    8.567899
## Sus_scrofa              52.178852    9.147406
## Sus_scrofa              48.920700   10.269532
## Sus_scrofa              49.976425    9.723029
## Sus_scrofa              50.326050    7.982176
## Gorilla_gorilla         -1.600545   12.972422
## Gorilla_gorilla         -0.317679   11.667805
## Gorilla_gorilla         -2.998760   10.053514
## Gorilla_gorilla         -1.605592   12.868204
## Gorilla_gorilla         -1.624496   13.163600
## Gorilla_gorilla         -1.651610   12.849542
## Gorilla_gorilla         35.078382 -106.663485
## Gorilla_gorilla         -1.721151   12.894200
## Gorilla_gorilla         -1.738712   12.884102
## Gorilla_gorilla         -1.711025   12.930430
## Tursiops_truncatus      25.008885 -110.689321
## Tursiops_truncatus      33.811742 -118.392640
## Tursiops_truncatus      34.057997 -119.614499
## Tursiops_truncatus      27.940834  -97.164837
## Tursiops_truncatus     -46.565697  168.418270
## Tursiops_truncatus      33.715199 -118.255547
## Tursiops_truncatus      29.812679  -85.314767
## Tursiops_truncatus      27.751821  -97.392456
## Tursiops_truncatus      27.982494  -97.066288
## Tursiops_truncatus      43.452470    4.045470

Now to join together the geographic information with the phylogenetic information!

Trees from OpenTree don’t automatically come with branch lengths - we need to infer some branch lengths for this tree in order to plot it. (in future we can use R-Datelife! datelife.org)

tr_bl<-ape::compute.brlen(tr)

Combine the phylogeny and the mapped locations

cols<-setNames(sample(viridis(n=ape::Ntip(tr_bl))),
    tr_bl$tip.label)# This sets the colors to pretty colors
tdobj<-phytools::phylo.to.map(tr_bl,latlong,plot = FALSE, direction="rightwards")
## objective: 4
## objective: 4
## objective: 4
## objective: 2
plot(tdobj, colors=cols, direction="rightwards")

#If there is an error with 'cols' try this instead:
#plot(tdobj, direction="rightwards")

We can do a better job of name matching by actually using the cross taxonomy mappings already encoded in the Open Tree Taxonomy!

Lets try some plants

spp2 <- c("Mimulous guttatus","Arabidopsis thaliana", "Musa gracilis", "Carex capitata")
taxa2 <- rotl::tnrs_match_names(spp2)
tr2 <- rotl::tol_induced_subtree(ott_id(taxa2), label="name")
## 
Progress [---------------------------------] 0/100 (  0%) ?s
Progress [==============================] 100/100 (100%)  0s
                                                            
## Warning in collapse_singles(tr, show_progress): Dropping singleton nodes with
## labels: mrcaott2ott10930, mrcaott2ott2441, mrcaott2ott969, mrcaott2ott62529,
## mrcaott2ott8379, eudicotyledons, Gunneridae, mrcaott2ott2464, mrcaott2ott8384,
## rosids, malvids, mrcaott96ott14140, mrcaott96ott50744, mrcaott96ott378,
## mrcaott378ott29446, mrcaott378ott1697, Brassicales, mrcaott378ott307071,
## mrcaott378ott32461, mrcaott378ott509555, mrcaott378ott318175,
## mrcaott378ott9635, mrcaott378ott125843, mrcaott378ott509568, mrcaott378ott28763,
## mrcaott378ott83547, Brassicaceae, mrcaott378ott299734, mrcaott378ott4673,
## mrcaott378ott4671, mrcaott4671ott58909, mrcaott4671ott6278, mrcaott6278ott15318,
## mrcaott6278ott158438, mrcaott6278ott10585, mrcaott6278ott34460,
## mrcaott6278ott9083, mrcaott9083ott19798, mrcaott19798ott31487,
## mrcaott31487ott88883, mrcaott31487ott152275, mrcaott248ott10053,
## mrcaott248ott20991, mrcaott248ott557, asterids, mrcaott248ott650,
## mrcaott248ott320, lamiids, mrcaott248ott979274, mrcaott248ott26035,
## mrcaott248ott1191, mrcaott248ott2108, Lamiales, mrcaott248ott11341,
## mrcaott248ott55259, Core_Lamiales, mrcaott248ott1016, mrcaott1016ott25224,
## mrcaott1016ott1452, mrcaott1452ott5046, mrcaott1452ott2133, mrcaott1452ott1607,
## mrcaott1452ott108490, mrcaott1452ott305916, mrcaott1452ott39175, Phrymaceae,
## Erythranthe, Liliopsida, mrcaott121ott290, Petrosaviidae, mrcaott121ott4474,
## mrcaott121ott1439, mrcaott121ott334, mrcaott121ott4575, mrcaott121ott3449,
## Zingiberales, mrcaott8470ott44226, Musaceae, Musa, mrcaott252ott213153,
## Poales, mrcaott252ott128594, mrcaott252ott1477, mrcaott1477ott591692,
## mrcaott1477ott2066, Cyperaceae, Cyperoideae, mrcaott2066ott957772,
## mrcaott2066ott297890, mrcaott2066ott57510, mrcaott2066ott262874, Cariceae,
## Carex, mrcaott2078ott4161, mrcaott4161ott193247, mrcaott4161ott980217,
## mrcaott4161ott5264517, mrcaott4161ott265965, mrcaott4161ott8152,
## mrcaott4161ott164308
tr_bl2<-ape::compute.brlen(tr2)
taxa2[,"gbif"] <- NA
locs = data.frame(matrix(vector(), nrow(taxa2), 2))
i=1
for(id in taxa2$ott_id){
   tax_info <- rotl::taxonomy_taxon_info(id)
   for(source in tax_info[[1]]$tax_sources){
     if (grepl('gbif', source, fixed=TRUE)){
            gbif_id <- strsplit(source,":")[[1]][2]
            taxa2[i,]$gbif <-gbif_id
            result <- rgbif::occ_search(taxonKey = gbif_id, fields=c('name','decimalLatitude', 'decimalLongitude'), limit = 5)
            if (dim(result$data)[1] > 0) {
                try(locs[i,1] <- as.data.frame(result$data)[2])
                try(locs[i,2] <- as.data.frame(result$data)[1])
            }
     }}
   i = i + 1}
## Warning in `[<-.data.frame`(`*tmp*`, i, 1, value =
## structure(list(decimalLatitude = c(-44.170553, : replacement element 1 has 5
## rows to replace 1 rows
## Warning in `[<-.data.frame`(`*tmp*`, i, 2, value =
## structure(list(decimalLongitude = c(170.995575, : replacement element 1 has 5
## rows to replace 1 rows
## Warning in `[<-.data.frame`(`*tmp*`, i, 1, value =
## structure(list(decimalLatitude = c(51.493676, : replacement element 1 has 5 rows
## to replace 1 rows
## Warning in `[<-.data.frame`(`*tmp*`, i, 2, value =
## structure(list(decimalLongitude = c(-0.015267, : replacement element 1 has 5
## rows to replace 1 rows
## Warning in `[<-.data.frame`(`*tmp*`, i, 1, value =
## structure(list(decimalLatitude = c(-2.361221, : replacement element 1 has 4 rows
## to replace 1 rows
## Warning in `[<-.data.frame`(`*tmp*`, i, 2, value =
## structure(list(decimalLongitude = c(101.599041, : replacement element 1 has 4
## rows to replace 1 rows
## Warning in `[<-.data.frame`(`*tmp*`, i, 1, value =
## structure(list(decimalLatitude = c(64.611437, : replacement element 1 has 5 rows
## to replace 1 rows
## Warning in `[<-.data.frame`(`*tmp*`, i, 2, value =
## structure(list(decimalLongitude = c(-149.292663, : replacement element 1 has 5
## rows to replace 1 rows
rownames(locs) <- gsub(' ','_',taxa2$unique_name) 
colnames(locs) <- c("latitude","longitude")

One of our species got reconciled to a synonym!

plot(tr2)

Mimulous guttatus is a synonym of Erythranthe lutea https://tree.opentreeoflife.org/taxonomy/browse?id=662909

taxa2
##          search_string          unique_name approximate_match ott_id is_synonym
## 1    mimulous guttatus  Erythranthe guttata              TRUE 504496      FALSE
## 2 arabidopsis thaliana Arabidopsis thaliana             FALSE 309263      FALSE
## 3        musa gracilis        Musa gracilis             FALSE   8478      FALSE
## 4       carex capitata       Carex capitata             FALSE   4161      FALSE
##            flags number_matches    gbif
## 1                             5 7346102
## 2                             1 3052436
## 3 sibling_higher              1 2762973
## 4 sibling_higher              1 2728838

Because we are mapping tips based on keys instead of names, the name replacement doesn’t matter affect our mapping

tdobj2<-phytools::phylo.to.map(tr_bl2,locs, plot = FALSE, direction="rightwards")
## objective: 4
## objective: 2
## objective: 2
plot(tdobj2, direction="rightwards")