For background complete the introductory tutorials on rgbif and on rotl to familiarize yourself.
https://ropensci.org/tutorials/rotl_tutorial/
https://ropensci.org/tutorials/rgbif_tutorial/

Combining evolutionary and geographic information

In order to examine geographic distributions of species, it is useful to tie together information about phylogenetic relationships, which we can get from Open Tree of Life, with information about where individuals of that species are found.

Install and load necessary packages

install.packages("devtools")
library(devtools)
install_github("liamrevell/phytools")
#We are using the newest github version of phytools because it has some cool mapping updates that are not yet avaiable in the cran version (as of March 12, 2019).
library(phytools)
library(httr)
library(jsonlite)
library(rgbif)
library(rotl)
library(viridis)

Choose some species

Lets look for some fun species, A polar bear, “Ursus maritimus”, a hippo “Hippopotamus amphibius”, a gorilla “Gorilla gorilla” and a wild boar “Sus scrofa”.

polarbear <-"Ursus maritimus"
hippo <-"Hippopotamus amphibius"
boar <-"Sus scrofa"
gorilla <- "Gorilla gorilla"
spp <- c(polarbear, boar, gorilla, hippo)

We can use Open Tree of Life to generate a tree for our species!

First we match the names to species in the OpenTree database. Specifying the context of the names makes the search faster.

taxa <- tnrs_match_names(spp, context="Animals")
taxa
##            search_string            unique_name approximate_match ott_id
## 1        ursus maritimus        Ursus maritimus             FALSE  10732
## 2             sus scrofa             Sus scrofa             FALSE 730013
## 3        gorilla gorilla        Gorilla gorilla             FALSE 417965
## 4 hippopotamus amphibius Hippopotamus amphibius             FALSE 510762
##   is_synonym flags number_matches
## 1      FALSE                    1
## 2      FALSE                    1
## 3      FALSE                    1
## 4      FALSE                    1

Then we get the relationships between these species from the OpenTree synthetic tree. (See the whole tree at tree.opentreeoflife.org!)

tr <- tol_induced_subtree(ott_id(taxa), label="name")
## 
Progress [----------------------------------] 0/32 (  0%) ?s
Progress [================================] 32/32 (100%)  0s
                                                            
plot(tr)

Don’t forget to cite the studies that went into building your tree!

Pulling supporting studies isn’t included in rotl yet (or I couldn’t find it). So we will use an Open Tree API call directly.

url_ot <- 'https://api.opentreeoflife.org/v3/tree_of_life/induced_subtree'
body <- list(ott_ids=taxa$ott_id)
r <- POST(url_ot, body = body, encode = "json")#gets which studies support the subtree from  the open tree API
for(tree in content(r)$supporting_studies){
study <- strsplit(tree,'@')[[1]][1]
meta <- get_study_meta(study) #pulls the metadata for each study
pub <- get_publication(meta)  #grabs teh publication information
cat(pub,"\n\n")
}
## Meredith, R.W., Janecka J., Gatesy J., Ryder O.A., Fisher C., Teeling E., Goodbla A., Eizirik E., Simao T., Stadler T., Rabosky D., Honeycutt R., Flynn J., Ingram C., Steiner C., Williams T., Robinson T., Herrick A., Westerman M., Ayoub N., Springer M., & Murphy W. 2011. Impacts of the Cretaceous Terrestrial Revolution and KPg Extinction on Mammal Diversification. Science 334 (6055): 521-524. 
## 
## Nyakatura, Katrin, Olaf RP Bininda-Emonds. 2012. Updating the evolutionary history of Carnivora (Mammalia): a new species-level supertree complete with divergence time estimates. BMC Biology 10 (1): 12 
## 
## Tarver, J. E., P. C. J. Donoghue, M. J. Benton. 2011. Is evolutionary history repeatedly rewritten in light of new fossil discoveries? Proceedings of the Royal Society B: Biological Sciences 278 (1705): 599-604. 
## 
## Springer, Mark S., Robert W. Meredith, John Gatesy, Christopher A. Emerling, Jong Park, Daniel L. Rabosky, Tanja Stadler, Cynthia Steiner, Oliver A. Ryder, Jan E. Janečka, Colleen A. Fisher, William J. Murphy. 2012. Macroevolutionary dynamics and historical biogeography of primate diversification inferred from a species supermatrix. PLoS ONE 7 (11): e49521. 
## 
## Perelman, P., Johnson W.E., Roos C., Seuánez H.N., Horvath J.E., Moreira M.A., Kessing B.D., Pontius J., Roelke M., Rumpler Y., Schneider M.C., Silva A., O'brien S., & Pecon-slattery J. 2011. A Molecular Phylogeny of Living Primates. PLoS Genetics 7(3): e1001342. 
## 
## Pozzi, Luca, Jason A. Hodgson, Andrew S. Burrell, Kirstin N. Sterner, Ryan L. Raaum, Todd R. Disotell. 2014. Primate phylogenetic relationships and divergence dates inferred from complete mitochondrial genomes. Molecular Phylogenetics and Evolution 75: 165-183. 
## 
## Lartillot, Nicolas, Frédéric Delsuc. 2012. Joint reconstruction of divergence times and life-history evolution in placental mammals using a phylogenetic covariance model. Evolution 66 (6): 1773-1787. 
## 
## O'Leary, M. A., J. I. Bloch, J. J. Flynn, T. J. Gaudin, A. Giallombardo, N. P. Giannini, S. L. Goldberg, B. P. Kraatz, Z.-X. Luo, J. Meng, X. Ni, M. J. Novacek, F. A. Perini, Z. S. Randall, G. W. Rougier, E. J. Sargis, M. T. Silcox, N. B. Simmons, M. Spaulding, P. M. Velazco, M. Weksler, J. R. Wible, A. L. Cirranello. 2013. The placental mammal ancestor and the post-K-Pg radiation of placentals. Science 339 (6120): 662-667. 
## 
## Krause, Johannes, Tina Unger, Aline Noçon, Anna-Sapfo Malaspinas, Sergios-Orestis Kolokotronis, Mathias Stiller, Leopoldo Soibelzon, Helen Spriggs, Paul H Dear, Adrian W Briggs, Sarah CE Bray, Stephen J O'Brien, Gernot Rabeder, Paul Matheus, Alan Cooper, Montgomery Slatkin, Svante Pääbo, Michael Hofreiter. 2008. Mitochondrial genomes reveal an explosive radiation of extinct and extant bears near the Miocene-Pliocene boundary. BMC Evolutionary Biology 8 (1): 220.

Lets combine the phylogenetic information from Open Tree with geographic information from GBIF

We will use the Global Biodiversity Information Facility (GBIF) to search for locations of records of these species. Check out https://www.gbif.org/ to see tons more cool information!

dat1 <- occ_search(scientificName = polarbear, fields=c('name','decimalLatitude', 'decimalLongitude'), limit = 5)
dat2 <- occ_search(scientificName =hippo, fields=c('name','decimalLatitude', 'decimalLongitude'), limit = 5)
dat3 <- occ_search(scientificName = boar,  fields=c('name','decimalLatitude', 'decimalLongitude'), limit = 5)
dat4 <- occ_search(scientificName = gorilla, fields=c('name','decimalLatitude', 'decimalLongitude'), limit = 5)
locs <- as.data.frame(rbind(dat1$data,dat2$data, dat3$data, dat4$data)) #Combine the data from each species
locs[,1] <- gsub(' ','_',locs[,1])# enforce newick style tip names, by replacing spaces with underscores
#Sometimes the lat longs have spaces in the records. 
#We use trimws to get rid of the whitespace, and as.numeric to encode them as numbers instead of characters.
locs[,2]<-as.numeric(trimws(as.character(locs[,2])))
locs[,3]<-as.numeric(trimws(as.character(locs[,3])))
latlong <- as.matrix(locs[,c(3,2)]) #flip the columns to lat, long instead of long,lat
rownames(latlong) <- locs[,1]

Check that your species are all included and have the correct names (taxonomic name recognition sometimes updates them)

latlong
##                        decimalLatitude decimalLongitude
## Ursus_maritimus              79.594483        17.834438
## Ursus_maritimus              76.693540        16.880762
## Ursus_maritimus              76.973100        15.420295
## Ursus_maritimus              79.036866        21.153876
## Ursus_maritimus              69.497186       -24.185662
## Hippopotamus_amphibius      -11.984929        32.264790
## Hippopotamus_amphibius        8.512886         9.760014
## Hippopotamus_amphibius       -1.298866        36.727085
## Hippopotamus_amphibius       -0.836471        36.335128
## Hippopotamus_amphibius      -11.807329        32.255476
## Sus_scrofa                   50.418335         8.203289
## Sus_scrofa                   28.937054       -96.243454
## Sus_scrofa                   50.105846         8.867303
## Sus_scrofa                   49.408302         7.472525
## Sus_scrofa                   51.477722         8.182111
## Gorilla_gorilla               2.912555        16.297543
## Gorilla_gorilla               2.894185        16.261598
## Gorilla_gorilla               2.967106        16.208158
## Gorilla_gorilla              -1.981317         9.546863
## Gorilla_gorilla               2.279494        16.511686

Now to join together the geographic information with the phylogenetic information!

Trees from OpenTree don’t automatically come with branch lengths - we need to infer some branch lengths for this tree in order to plot it. (in future we can use R-Datelife to get the ages! datelife.org)

tr_bl<-compute.brlen(tr)

Combine the phylogeny and the mapped locations using phytools. Thank you to Liam Revell for help getting this working!

cols<-setNames(sample(viridis(n=Ntip(tr_bl))),
    tr_bl$tip.label)
tdobj<-phylo.to.map(tr_bl,latlong,plot = FALSE, direction="rightwards")
## objective: 2
## objective: 2
## objective: 2
plot(tdobj, colors=cols, direction="rightwards")

We can do a better job of name matching by actually using the cross taxonomy mappings already encoded in the Open Tree Taxonomy!

Lets try it some plants

plants <- c("Mimulous guttatus","Arabidopsis thaliana", "Musa gracilis", "Carex capitata")
plant_taxa <- tnrs_match_names(plants)
plant_tree <- tol_induced_subtree(ott_id(plant_taxa), label="name")
## 
Progress [----------------------------------] 0/82 (  0%) ?s
Progress [================================] 82/82 (100%)  0s
                                                            
plant_tree_bl<-compute.brlen(plant_tree)

One of these species is not what we searched for!

plot(plant_tree)

Mimulus guttatus is a synonym of Erythranthe lutea https://tree.opentreeoflife.org/taxonomy/browse?id=662909

plant_taxa
##          search_string          unique_name approximate_match ott_id
## 1    mimulous guttatus    Erythranthe lutea              TRUE 662909
## 2 arabidopsis thaliana Arabidopsis thaliana             FALSE 309263
## 3        musa gracilis        Musa gracilis             FALSE   8478
## 4       carex capitata       Carex capitata             FALSE   4161
##   is_synonym          flags number_matches
## 1       TRUE                             5
## 2      FALSE                             1
## 3      FALSE SIBLING_HIGHER              1
## 4      FALSE                             2

Because we will mapping tips based on unique identifiers instead of names, the name replacement doesn’t affect our mapping

We can write a function to search for a GBIF identifier using the OpenTree ID for each taxon.

get_gbif_id<-function(ottid){
   tax_info <- taxonomy_taxon_info(as.integer(ottid))
   gbif_id = NA
   for(source in tax_info[[1]]$tax_sources){
     if (grepl('gbif', source, fixed=TRUE)){
            gbif_id <- strsplit(source,":")[[1]][2]
     }}
     return(gbif_id)
   }

Once we have the gbif ID, we can seach for location records using that ID

plant_locs<- NULL
for(ottid in plant_taxa$ott_id){
   gbif_id <- get_gbif_id(ottid)
   result <- occ_search(taxonKey = gbif_id, fields=c('name','decimalLatitude', 'decimalLongitude'), limit = 2)
   if (dim(result$data)[2]==3){#drops records with no data at all
      plant_locs<-as.data.frame(rbind(plant_locs, result$data))
   }
}

#some tidying to make the input in the correct format for phytools
plant_locs<-na.omit(plant_locs) #remove records with NA
plant_locs[,1] <- gsub(' ','_',plant_locs[,1])# enforce newick style tip names, by replacing spaces with underscores
#Sometimes the lat longs have spaces in the records. 
#We use trimws to get rid of the whitespace, and as.numeric to encode them as numbers instead of characters.
plant_locs[,2]<-as.numeric(trimws(as.character(plant_locs[,2])))
plant_locs[,3]<-as.numeric(trimws(as.character(plant_locs[,3])))
plant_latlong <- as.matrix(plant_locs[,c(3,2)]) #flip the columns to lat, long instead of long,lat
rownames(plant_latlong) <- plant_locs[,1]
cols<-setNames(sample(viridis(n=Ntip(plant_tree_bl))),plant_tree_bl$tip.label)
plant_tdobj<-phylo.to.map(plant_tree_bl,plant_latlong, plot = FALSE, direction = "rightwards")
## objective: 2
## objective: 0
## objective: 0
plot(plant_tdobj, colors=cols, direction="rightwards")