For background complete the introductory tutorials on rgbif and on rotl to familiarize yourself.
https://ropensci.org/tutorials/rotl_tutorial/
https://ropensci.org/tutorials/rgbif_tutorial/

Combining evolutionary and geographic information

In order to examine geographic distributions of species, it is useful to tie together information about phylogenetic relationships, which we can get from Open Tree of Life, with information about where individuals of that species are found.

library(httr)
library(jsonlite)
library(rgbif)
library(rotl)
library(phytools)
## Loading required package: ape
## Loading required package: maps
library(viridis)
## Loading required package: viridisLite

Choose some species

Lets look for some fun species, A polar bear, “Ursus maritimus”, a hippo “Hippopotamus amphibius”, a gorilla “Gorilla gorilla” and a wild boar “Sus scrofa”.

polarbear <-"Ursus maritimus"
hippo <-"Hippopotamus amphibius"
boar <-"Sus scrofa"
gorilla <- "Gorilla gorilla"
dolphin <- "Tursiops truncatus"

spp <- c(polarbear, boar, gorilla, hippo,dolphin)

We can use Open Tree of Life to generate a tree for our species!

taxa <- tnrs_match_names(spp, context="Animals")
tr <- tol_induced_subtree(ott_id(taxa), label="name")
## 
Progress [----------------------------------] 0/46 (  0%) ?s
Progress [================================] 46/46 (100%)  0s
                                                            
## Warning in collapse_singles(tr, show_progress): Dropping singleton nodes
## with labels: Euarchontoglires, mrcaott786ott112387, Primates, Haplorrhini,
## Simiiformes, Catarrhini, Hominoidea, Hominidae, Homininae, Gorilla,
## Laurasiatheria, mrcaott1548ott6790, mrcaott1548ott3021, Cetartiodactyla,
## mrcaott1548ott5269, Cetacea, Odontoceti, mrcaott5269ott234629,
## mrcaott5269ott6470, mrcaott5269ott47843, mrcaott47843ott194312,
## Delphinidae, mrcaott47843ott111103, mrcaott111103ott825969,
## mrcaott111103ott124224, mrcaott124224ott250045, mrcaott124224ott155492,
## mrcaott124224ott246130, mrcaott246130ott5846404, Tursiops, Hippopotamidae,
## Hippopotamus, Suina, Suidae, Sus, mrcaott4697ott222594, Carnivora,
## Caniformia, mrcaott4697ott10732, mrcaott10732ott22064, Ursidae,
## mrcaott10732ott679667, mrcaott10732ott297463, mrcaott10732ott153282,
## mrcaott10732ott89827, mrcaott10732ott350099
plot(tr)

Don’t forget to cite the studies that went into building your tree!

Pulling supporting studies isn’t included in rotl yet (or I couldn’t find it). So we will use an Open Tree API call directly.

url_ot <- 'https://api.opentreeoflife.org/v3/tree_of_life/induced_subtree'
body <- list(ott_ids=taxa$ott_id)
r <- POST(url_ot, body = body, encode = "json")#gets which studies support the subtree from  the open tree API
for(studytree in content(r)$supporting_studies){
study <-  strsplit(studytree, '@')[[1]][1]
meta <- get_study_meta(study) #pulls the metadata for each study
pub <- get_publication(meta)  #grabs teh publication information
cat(pub,"\n\n")
}
## Meredith, R.W., Janecka J., Gatesy J., Ryder O.A., Fisher C., Teeling E., Goodbla A., Eizirik E., Simao T., Stadler T., Rabosky D., Honeycutt R., Flynn J., Ingram C., Steiner C., Williams T., Robinson T., Herrick A., Westerman M., Ayoub N., Springer M., & Murphy W. 2011. Impacts of the Cretaceous Terrestrial Revolution and KPg Extinction on Mammal Diversification. Science 334 (6055): 521-524. 
## 
## Nyakatura, Katrin, Olaf RP Bininda-Emonds. 2012. Updating the evolutionary history of Carnivora (Mammalia): a new species-level supertree complete with divergence time estimates. BMC Biology 10 (1): 12 
## 
## Tarver, J. E., P. C. J. Donoghue, M. J. Benton. 2011. Is evolutionary history repeatedly rewritten in light of new fossil discoveries? Proceedings of the Royal Society B: Biological Sciences 278 (1705): 599-604. 
## 
## Springer, Mark S., Robert W. Meredith, John Gatesy, Christopher A. Emerling, Jong Park, Daniel L. Rabosky, Tanja Stadler, Cynthia Steiner, Oliver A. Ryder, Jan E. Janečka, Colleen A. Fisher, William J. Murphy. 2012. Macroevolutionary dynamics and historical biogeography of primate diversification inferred from a species supermatrix. PLoS ONE 7 (11): e49521. 
## 
## Perelman, P., Johnson W.E., Roos C., Seuánez H.N., Horvath J.E., Moreira M.A., Kessing B.D., Pontius J., Roelke M., Rumpler Y., Schneider M.C., Silva A., O'brien S., & Pecon-slattery J. 2011. A Molecular Phylogeny of Living Primates. PLoS Genetics 7(3): e1001342. 
## 
## Pozzi, Luca, Jason A. Hodgson, Andrew S. Burrell, Kirstin N. Sterner, Ryan L. Raaum, Todd R. Disotell. 2014. Primate phylogenetic relationships and divergence dates inferred from complete mitochondrial genomes. Molecular Phylogenetics and Evolution 75: 165-183. 
## 
## Lartillot, Nicolas, Frédéric Delsuc. 2012. Joint reconstruction of divergence times and life-history evolution in placental mammals using a phylogenetic covariance model. Evolution 66 (6): 1773-1787. 
## 
## O'Leary, M. A., J. I. Bloch, J. J. Flynn, T. J. Gaudin, A. Giallombardo, N. P. Giannini, S. L. Goldberg, B. P. Kraatz, Z.-X. Luo, J. Meng, X. Ni, M. J. Novacek, F. A. Perini, Z. S. Randall, G. W. Rougier, E. J. Sargis, M. T. Silcox, N. B. Simmons, M. Spaulding, P. M. Velazco, M. Weksler, J. R. Wible, A. L. Cirranello. 2013. The placental mammal ancestor and the post-K-Pg radiation of placentals. Science 339 (6120): 662-667. 
## 
## Krause, Johannes, Tina Unger, Aline Noçon, Anna-Sapfo Malaspinas, Sergios-Orestis Kolokotronis, Mathias Stiller, Leopoldo Soibelzon, Helen Spriggs, Paul H Dear, Adrian W Briggs, Sarah CE Bray, Stephen J O'Brien, Gernot Rabeder, Paul Matheus, Alan Cooper, Montgomery Slatkin, Svante Pääbo, Michael Hofreiter. 2008. Mitochondrial genomes reveal an explosive radiation of extinct and extant bears near the Miocene-Pliocene boundary. BMC Evolutionary Biology 8 (1): 220. 
## 
## McGowen, M., Spaulding M., and Gatesy J. 2009. Divergence date estimation and a comprehensive molecular tree of extant cetaceans. Molecular Phylogenetics and Evolution 53 (3): 891-906. 
## 
## Steeman, M., Hebsgaard M., Fordyce R., Ho S., Rabosky D., Nielsen R., Rahbek C., Glenner H., Sørensen M., & Willerslev E. 2009. Radiation of Extant Cetaceans Driven by Restructuring of the Oceans. Systematic Biology 58 (6): 573-585.

Lets combine the phylogenetic information from Open Tree with geographic information from GBIF

We will use the Global Biodiversity Information Facility (GBIF) to search for records of these species, and arbitrarily chose the first record returned as the location for the species. (sometimes the first record has no lat-long info, and we need to find a better one)

dat1 <- occ_search(scientificName = polarbear, fields=c('name','decimalLatitude', 'decimalLongitude'), limit = 10)
## Registered S3 method overwritten by 'crul':
##   method                 from
##   as.character.form_file httr
locsdat1 <- cbind(polarbear, dat1$data) #labels the lat long columns with the species name
colnames(locsdat1)<-c("species","Long","Lat")

dat2 <- occ_search(scientificName = hippo, fields=c('name','decimalLatitude', 'decimalLongitude'), limit = 10)
locsdat2 <- cbind(hippo, dat2$data)
colnames(locsdat2)<-c("species","Long","Lat")

dat3 <- occ_search(scientificName = boar,  fields=c('name','decimalLatitude', 'decimalLongitude'), limit = 10)
locsdat3 <- cbind(boar, dat3$data)
colnames(locsdat3)<-c("species","Long","Lat")

dat4 <- occ_search(scientificName = gorilla, fields=c('name','decimalLatitude', 'decimalLongitude'), limit = 10)
locsdat4 <- cbind(gorilla, dat4$data)
colnames(locsdat4)<-c("species","Long","Lat")

dat5 <- occ_search(scientificName = dolphin, fields=c('name','decimalLatitude', 'decimalLongitude'), limit = 10)
locsdat5 <- cbind(dolphin, dat5$data)
colnames(locsdat5)<-c("species","Long","Lat")

locs <- as.data.frame(rbind(locsdat1,locsdat2, locsdat3, locsdat4, locsdat5)) #Combine the data from each species
locs
##                   species        Long        Lat
## 1         Ursus maritimus  -95.834763  74.811591
## 2         Ursus maritimus   16.880762  76.693540
## 3         Ursus maritimus   15.420295  76.973100
## 4         Ursus maritimus   17.834438  79.594483
## 5         Ursus maritimus   21.153876  79.036866
## 6         Ursus maritimus  -24.185662  69.497186
## 7         Ursus maritimus   18.044720  80.024230
## 8         Ursus maritimus   12.737274  79.590858
## 9         Ursus maritimus  -42.754841  71.603621
## 10        Ursus maritimus -156.823540  71.341687
## 11 Hippopotamus amphibius   30.773700  -1.724287
## 12 Hippopotamus amphibius   32.264790 -11.984929
## 13 Hippopotamus amphibius    9.760014   8.512886
## 14 Hippopotamus amphibius   38.125000   0.125000
## 15 Hippopotamus amphibius   38.125000   0.125000
## 16 Hippopotamus amphibius   36.335128  -0.836471
## 17 Hippopotamus amphibius   36.125000  -0.375000
## 18 Hippopotamus amphibius   36.727085  -1.298866
## 19 Hippopotamus amphibius   35.076993  -1.492121
## 20 Hippopotamus amphibius    1.986739  11.295472
## 21             Sus scrofa -121.134327  36.652202
## 22             Sus scrofa  -97.781719  30.413012
## 23             Sus scrofa  -97.199268  27.949949
## 24             Sus scrofa   12.114698  58.542243
## 25             Sus scrofa -121.941247  37.851017
## 26             Sus scrofa   13.550520  52.579220
## 27             Sus scrofa   17.316222  59.723542
## 28             Sus scrofa -121.392652  37.159589
## 29             Sus scrofa  128.251913  26.859226
## 30             Sus scrofa  128.244232  26.832315
## 31        Gorilla gorilla   16.297543   2.912555
## 32        Gorilla gorilla   16.208158   2.967106
## 33        Gorilla gorilla   16.261598   2.894185
## 34        Gorilla gorilla    9.546863  -1.981317
## 35        Gorilla gorilla   16.511686   2.279494
## 36        Gorilla gorilla   16.572800   2.365717
## 37        Gorilla gorilla   16.203756   2.222516
## 38        Gorilla gorilla   15.530528  -3.078171
## 39        Gorilla gorilla   15.432799  -3.112646
## 40        Gorilla gorilla   15.563743  -3.131757
## 41     Tursiops truncatus -118.396942  33.737042
## 42     Tursiops truncatus  -80.171216  25.778251
## 43     Tursiops truncatus  -97.188325  27.962853
## 44     Tursiops truncatus -119.883349  34.411222
## 45     Tursiops truncatus  -80.841218  32.022872
## 46     Tursiops truncatus  -94.930123  29.340706
## 47     Tursiops truncatus  -89.502327  21.204404
## 48     Tursiops truncatus  -79.889818  32.690279
## 49     Tursiops truncatus  -75.060396  38.608856
## 50     Tursiops truncatus -113.336993  26.632157
locs[,1] <- gsub(' ','_',locs[,1])#On the tree, names have underscores instead of spaces.
#Sometimes the lat longs have spaces in the records. 
#We use trimws to get rid of the whitespace, and as.numeric to record the the latitudes and longitudes as numbers instead of characters.
locs[,2]<-as.numeric(trimws(as.character(locs[,2])))
locs[,3]<-as.numeric(trimws(as.character(locs[,3])))
latlong <- as.matrix(locs[,c(3,2)]) #flip the columns to lat, long instead of long,lat
rownames(latlong) <- locs[,1]

Check that your species are all included and have the correct names (taxonomic name recognition sometimes updates them)

latlong
##                               Lat        Long
## Ursus_maritimus         74.811591  -95.834763
## Ursus_maritimus         76.693540   16.880762
## Ursus_maritimus         76.973100   15.420295
## Ursus_maritimus         79.594483   17.834438
## Ursus_maritimus         79.036866   21.153876
## Ursus_maritimus         69.497186  -24.185662
## Ursus_maritimus         80.024230   18.044720
## Ursus_maritimus         79.590858   12.737274
## Ursus_maritimus         71.603621  -42.754841
## Ursus_maritimus         71.341687 -156.823540
## Hippopotamus_amphibius  -1.724287   30.773700
## Hippopotamus_amphibius -11.984929   32.264790
## Hippopotamus_amphibius   8.512886    9.760014
## Hippopotamus_amphibius   0.125000   38.125000
## Hippopotamus_amphibius   0.125000   38.125000
## Hippopotamus_amphibius  -0.836471   36.335128
## Hippopotamus_amphibius  -0.375000   36.125000
## Hippopotamus_amphibius  -1.298866   36.727085
## Hippopotamus_amphibius  -1.492121   35.076993
## Hippopotamus_amphibius  11.295472    1.986739
## Sus_scrofa              36.652202 -121.134327
## Sus_scrofa              30.413012  -97.781719
## Sus_scrofa              27.949949  -97.199268
## Sus_scrofa              58.542243   12.114698
## Sus_scrofa              37.851017 -121.941247
## Sus_scrofa              52.579220   13.550520
## Sus_scrofa              59.723542   17.316222
## Sus_scrofa              37.159589 -121.392652
## Sus_scrofa              26.859226  128.251913
## Sus_scrofa              26.832315  128.244232
## Gorilla_gorilla          2.912555   16.297543
## Gorilla_gorilla          2.967106   16.208158
## Gorilla_gorilla          2.894185   16.261598
## Gorilla_gorilla         -1.981317    9.546863
## Gorilla_gorilla          2.279494   16.511686
## Gorilla_gorilla          2.365717   16.572800
## Gorilla_gorilla          2.222516   16.203756
## Gorilla_gorilla         -3.078171   15.530528
## Gorilla_gorilla         -3.112646   15.432799
## Gorilla_gorilla         -3.131757   15.563743
## Tursiops_truncatus      33.737042 -118.396942
## Tursiops_truncatus      25.778251  -80.171216
## Tursiops_truncatus      27.962853  -97.188325
## Tursiops_truncatus      34.411222 -119.883349
## Tursiops_truncatus      32.022872  -80.841218
## Tursiops_truncatus      29.340706  -94.930123
## Tursiops_truncatus      21.204404  -89.502327
## Tursiops_truncatus      32.690279  -79.889818
## Tursiops_truncatus      38.608856  -75.060396
## Tursiops_truncatus      26.632157 -113.336993

Now to join together the geographic information with the phylogenetic information!

Trees from OpenTree don’t automatically come with branch lengths - we need to infer some branch lengths for this tree in order to plot it. (in fruture we can use R-Datelife! datelife.org)

tr_bl<-compute.brlen(tr)

Combine the phylogeny and the mapped locations

cols<-setNames(sample(viridis(n=Ntip(tr_bl))),
    tr_bl$tip.label)# This sets the colors to pretty colors
tdobj<-phylo.to.map(tr_bl,latlong,plot = FALSE, direction="rightwards")
## objective: 4
## objective: 4
## objective: 4
## objective: 2
plot(tdobj, colors=cols, direction="rightwards")

#If there is an error with 'cols' try this instead:
#plot(tdobj, direction="rightwards")

We can do a better job of name matching by actually using the cross taxonomy mappings already encoded in the Open Tree Taxonomy!

Lets try some plants

spp2 <- c("Mimulous guttatus","Arabidopsis thaliana", "Musa gracilis", "Carex capitata")
taxa2 <- tnrs_match_names(spp2)
tr2 <- tol_induced_subtree(ott_id(taxa2), label="name")
## 
Progress [----------------------------------] 0/82 (  0%) ?s
Progress [================================] 82/82 (100%)  0s
                                                            
## Warning in collapse_singles(tr, show_progress): Dropping singleton
## nodes with labels: mrcaott2ott10930, eudicotyledons, mrcaott2ott969,
## mrcaott2ott62529, mrcaott2ott8379, Gunneridae, mrcaott2ott2464,
## mrcaott2ott8384, rosids, malvids, mrcaott96ott14140, mrcaott96ott50744,
## mrcaott96ott378, mrcaott378ott29446, mrcaott378ott1697, Brassicales,
## mrcaott378ott32461, mrcaott378ott532276, mrcaott378ott9635,
## mrcaott378ott125843, mrcaott378ott509568, mrcaott378ott28763,
## Brassicaceae, mrcaott378ott4073, mrcaott4073ott58909, mrcaott4073ott4671,
## mrcaott4073ott16478, mrcaott4073ott10585, mrcaott4073ott19434,
## mrcaott4073ott6278, mrcaott4073ott19798, mrcaott19798ott31487,
## mrcaott31487ott88883, Arabidopsis, mrcaott248ott10053, mrcaott248ott20991,
## mrcaott248ott557, mrcaott248ott27233, mrcaott248ott650, mrcaott248ott320,
## lamiids, mrcaott248ott979274, mrcaott248ott26035, mrcaott248ott1191,
## mrcaott248ott2108, Lamiales, mrcaott248ott11341, mrcaott248ott55259,
## Core_Lamiales, mrcaott248ott1016, mrcaott1016ott25224, mrcaott1016ott1452,
## mrcaott1452ott5046, mrcaott1452ott2133, mrcaott1452ott1605,
## mrcaott1452ott12526, mrcaott1452ott305916, mrcaott1452ott39175,
## Phrymaceae, Erythranthe, Liliopsida, mrcaott121ott290, Petrosaviidae,
## mrcaott121ott2256, mrcaott121ott1439, mrcaott121ott334, mrcaott121ott4575,
## mrcaott121ott3449, Zingiberales, mrcaott8470ott44226, Musaceae, Musa,
## mrcaott252ott213153, Poales, mrcaott252ott128594, mrcaott252ott1477,
## mrcaott1477ott591692, mrcaott1477ott2066, Cyperaceae, Cyperoideae,
## Cariceae, Carex
tr_bl2<-compute.brlen(tr2)
taxa2[,"gbif"] <- NA
locs = data.frame(matrix(vector(), nrow(taxa2), 2))
i=1
for(id in taxa2$ott_id){
   tax_info <- taxonomy_taxon_info(id)
   for(source in tax_info[[1]]$tax_sources){
     if (grepl('gbif', source, fixed=TRUE)){
            gbif_id <- strsplit(source,":")[[1]][2]
            taxa2[i,]$gbif <-gbif_id
            result <- occ_search(taxonKey = gbif_id, fields=c('name','decimalLatitude', 'decimalLongitude'), limit = 5)
            if (dim(result$data)[1] > 0) {
                try(locs[i,1] <- as.data.frame(result$data)[2])
                try(locs[i,2] <- as.data.frame(result$data)[1])
            }
     }}
   i = i + 1}
## Warning in `[<-.data.frame`(`*tmp*`, i, 1, value =
## structure(list(decimalLatitude = c(-35.998035, : replacement element 1 has
## 5 rows to replace 1 rows
## Warning in `[<-.data.frame`(`*tmp*`, i, 2, value =
## structure(list(decimalLongitude = c(-70.564162, : replacement element 1 has
## 5 rows to replace 1 rows
## Warning in `[<-.data.frame`(`*tmp*`, i, 1, value =
## structure(list(decimalLatitude = c(58.250051, : replacement element 1 has 5
## rows to replace 1 rows
## Warning in `[<-.data.frame`(`*tmp*`, i, 2, value =
## structure(list(decimalLongitude = c(11.45741, : replacement element 1 has 5
## rows to replace 1 rows
## Warning in `[<-.data.frame`(`*tmp*`, i, 1, value =
## structure(list(decimalLatitude = c(5.02, : replacement element 1 has 3 rows
## to replace 1 rows
## Warning in `[<-.data.frame`(`*tmp*`, i, 2, value =
## structure(list(decimalLongitude = c(102.52, : replacement element 1 has 3
## rows to replace 1 rows
## Warning in `[<-.data.frame`(`*tmp*`, i, 1, value =
## structure(list(decimalLatitude = c(66.280224, : replacement element 1 has 4
## rows to replace 1 rows
## Warning in `[<-.data.frame`(`*tmp*`, i, 2, value =
## structure(list(decimalLongitude = c(26.116533, : replacement element 1 has
## 4 rows to replace 1 rows
rownames(locs) <- gsub(' ','_',taxa2$unique_name) 
colnames(locs) <- c("latitude","longitude")

One of our species got reconciled to a synonym!

plot(tr2)

Mimulous guttatus is a synonym of Erythranthe lutea https://tree.opentreeoflife.org/taxonomy/browse?id=662909

taxa2
##          search_string          unique_name approximate_match ott_id
## 1    mimulous guttatus    Erythranthe lutea              TRUE 662909
## 2 arabidopsis thaliana Arabidopsis thaliana             FALSE 309263
## 3        musa gracilis        Musa gracilis             FALSE   8478
## 4       carex capitata       Carex capitata             FALSE   4161
##   is_synonym          flags number_matches    gbif
## 1       TRUE                             5 7730307
## 2      FALSE                             1 3052436
## 3      FALSE SIBLING_HIGHER              1 2762973
## 4      FALSE                             2 2728838

Because we are mapping tips based on keys instead of names, the name replacement doesn’t matter affect our mapping

tdobj2<-phylo.to.map(tr_bl2,locs, plot = FALSE, direction="rightwards")
## objective: 4
## objective: 2
## objective: 2
plot(tdobj2, direction="rightwards")