Finding your taxa in the Open Tree of Life Taxonomy

Overview

Teaching: 5 min
Exercises: 5 min
Questions
  • What is the Open Tree of Life Taxonomy?

  • What are OTT ids?

  • What does TNRS stand for?

Objectives
  • Getting OTT ids for some taxa.

  • Understanding TNRS and approximate matching.



The Open Tree of Life Taxonomy (OTT from now on) synthesizes taxonomic information from different sources and assigns each taxon a unique numeric identifier, which we refer to as the OTT id. To interact with the OTT (and with other Open Tree of Life services) from R, we will use the functions in the rotl package. If you are not sure whether you have the package installed, go to the setup page and follow the instructions there.

To handle synonyms and misspelled scientific names, the Open Tree Taxonomy uses the Taxonomic Name Resolution Service (TNRS from now on). TNRS links scientific names to a unique OTT id while accounting for misspellings, synonyms, and variants of scientific names. The rotl functions that interact with OTT’s TNRS all start with “tnrs_”.


Getting OTT ids for a taxon

To get OTT ids for a taxon or set of taxa we will use the function tnrs_match_names(). This function takes a character vector of one or more scientific names as its main argument.

Hands on! Running TNRS

Do a tnrs_match_names() run for the amphibians (Amphibia). Save the output to an object named resolved_name.

You can try different misspellings and synonyms of your taxon to see TNRS in action.

resolved_name <- rotl::tnrs_match_names(names = "amphibians")
resolved_name
  search_string unique_name approximate_match ott_id is_synonym flags
1    amphibians    Amphibia              TRUE 544595      FALSE      
  number_matches
1              6

Ok, we were able to run the function tnrs_match_names successfully. Now, let’s explore the structure of the output.


The ‘match_names’ object

As we can tell from the data printed to the screen, the output of tnrs_match_names() is some sort of data table. In R, as in other object-oriented programming languages, objects are assigned classes: defined data structures that tell functions what kind of object they are dealing with. This makes it much easier to manipulate objects and to use them across different functions. Put simply, all objects that belong to the same class share the same data structure. To get the name of the class of the tnrs_match_names() output, we will use the function class():

class(resolved_name)
[1] "match_names" "data.frame" 


As you can see, an object can belong to one or more classes.

Indeed, R is telling us that the output of tnrs_match_names() is a data frame (a type of table) and a ‘match_names’ object, which is in turn a data frame with exactly 7 named columns: search_string, unique_name, approximate_match, ott_id, is_synonym, flags, and number_matches.
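To see what a class vector like c("match_names", "data.frame") means in practice, here is a minimal base R sketch that builds a small stand-in object by hand. The values are copied from the output above, but mock_match is hypothetical: it is not created by rotl and it lacks the hidden attributes a real ‘match_names’ object carries.

```r
# Hypothetical, hand-built stand-in for the output of tnrs_match_names(),
# with the same 7 columns (values copied from the output above)
mock_match <- data.frame(
  search_string = "amphibians",
  unique_name = "Amphibia",
  approximate_match = TRUE,
  ott_id = 544595L,
  is_synonym = FALSE,
  flags = "",
  number_matches = 6L,
  stringsAsFactors = FALSE
)

# Give it the same class vector as a real 'match_names' object
class(mock_match) <- c("match_names", "data.frame")

class(mock_match)                    # "match_names" "data.frame"
inherits(mock_match, "data.frame")   # TRUE: it is still a data frame
inherits(mock_match, "match_names")  # TRUE
ncol(mock_match)                     # 7
```

Because ‘data.frame’ is listed second, functions that know nothing about ‘match_names’ fall back to their data frame behaviour, which is why ncol() and printing still work.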

Next we will explore the kinds of data that are stored in each of the columns of a ‘match_names’ object.


Kinds of data stored in a ‘match_names’ object

You should have a good idea by now of what type of data is stored in the ott_id column.

Can you guess what type of data is stored in the columns search_string and unique_name?

How about is_synonym?

The column approximate_match tells us whether the unique name was inferred from the search string using approximate matching (TRUE) or not (FALSE).
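TNRS has its own scoring system (you will see score values among the hidden attributes later in this episode), so the following is only an analogy. It is a sketch using base R's adist(), which counts the character edits needed to turn one string into another, to show why a misspelling such as "avess" can still point to Aves. The candidate names are chosen for illustration and this is not how TNRS computes its matches.

```r
# Candidate names to compare against (chosen for illustration)
candidates <- c("Aves", "Canis", "Felis", "Delphinidae")

# Edit (Levenshtein) distance between the misspelled query and each
# candidate, ignoring differences in upper/lower case
d <- adist("avess", candidates, ignore.case = TRUE)
colnames(d) <- candidates
d

# The closest candidate is the most plausible intended name
candidates[which.min(d[1, ])]   # "Aves" (one extra "s" away)
```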

Finally, the flags column lists any flags associated with the taxon in the OTT; it is empty if the taxon has no flags. Flags are markers indicating that the taxon in question may be problematic, so you should consider whether it should be included in further analyses of the Open Tree workflow. You can read more about flags in the Open Tree wiki.
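Because approximate matches can go wrong (the hidden attributes explored in Pro tip 1.1 show that "amphibians" also resembles several unrelated genera), it is good practice to review them, along with any flagged taxa, before moving on. A minimal sketch, using a hypothetical data frame with only the columns we need and the values from the multi-taxon example later in this episode:

```r
# Hypothetical stand-in for a 'match_names' result (subset of columns,
# values from the multi-taxon example in this episode)
matches <- data.frame(
  search_string = c("amphibians", "canis", "felis", "delphinidae", "avess"),
  unique_name = c("Amphibia", "Canis", "Felis", "Delphinidae", "Aves"),
  approximate_match = c(TRUE, FALSE, FALSE, FALSE, TRUE),
  ott_id = c(544595L, 372706L, 563165L, 698406L, 81461L),
  flags = c("", "", "", "", ""),
  stringsAsFactors = FALSE
)

# Rows whose unique name was inferred by approximate matching
to_check <- matches[matches$approximate_match, c("search_string", "unique_name")]
to_check

# Rows carrying at least one flag (an empty string means no flags)
flagged <- matches[matches$flags != "", ]
nrow(flagged)
```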

Now we know what kind of data is retrieved by the tnrs_match_names() function. Pretty cool!


Pro tip 1.1: Looking at “hidden” elements of a data object

The ‘match_names’ object holds more data than is printed to the screen, and this data is not part of the main table. This “hidden” data is stored in the attributes of the object. Every object has a class, but for basic objects such as vectors the class is implicit and is not stored as an attribute. When an object does have attributes, they can be accessed with the function attributes().

Let’s explore the attributes and class of a basic object, such as a character vector. It certainly has a class:

class(c("Hello!", "my", "name", "is", "Luna!"))
[1] "character"

But what about other attributes:

attributes(c("Hello!", "my", "name", "is", "Luna!"))
NULL

As you can see, some objects have no hidden attributes.
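A data frame, on the other hand, does store attributes: its column names, row names, and class. A quick base R check on a small, made-up data frame shows the same three attributes that appear at the top of the ‘match_names’ attribute list below:

```r
# A character vector has no stored attributes
is.null(attributes(c("Hello!", "my", "name")))

# A small, made-up data frame stores its structure as attributes
df <- data.frame(taxon = c("Canis", "Felis"), ott_id = c(372706L, 563165L))
names(attributes(df))
```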

Let’s look for hidden attributes on our ‘match_names’ object:

attributes(resolved_name)

The structure of the “attributes” data is complicated and extracting it requires some exploring.

class(attributes(resolved_name))
[1] "list"
names(attributes(resolved_name))
[1] "names"              "row.names"          "class"             
[4] "original_order"     "original_response"  "match_id"          
[7] "has_original_match" "json_coords"       
str(attributes(resolved_name))
List of 8
 $ names             : chr [1:7] "search_string" "unique_name" "approximate_match" "ott_id" ...
 $ row.names         : int 1
 $ class             : chr [1:2] "match_names" "data.frame"
 $ original_order    : num 1
 $ original_response :List of 10
  ..$ context                     : chr "All life"
  ..$ governing_code              : chr "undefined"
  ..$ includes_approximate_matches: logi TRUE
  ..$ includes_deprecated_taxa    : logi FALSE
  ..$ includes_suppressed_names   : logi FALSE
  ..$ matched_names               :List of 1
  .. ..$ : chr "amphibians"
  ..$ results                     :List of 1
  .. ..$ :List of 2
  .. .. ..$ matches:List of 6
  .. .. .. ..$ :List of 7
  .. .. .. .. ..$ is_approximate_match: logi TRUE
  .. .. .. .. ..$ is_synonym          : logi FALSE
  .. .. .. .. ..$ matched_name        : chr "Amphibina"
  .. .. .. .. ..$ nomenclature_code   : chr "ICZN"
  .. .. .. .. ..$ score               : num 0.778
  .. .. .. .. ..$ search_string       : chr "amphibians"
  .. .. .. .. ..$ taxon               :List of 10
  .. .. .. .. .. ..$ flags                   : list()
  .. .. .. .. .. ..$ is_suppressed           : logi FALSE
  .. .. .. .. .. ..$ is_suppressed_from_synth: logi FALSE
  .. .. .. .. .. ..$ name                    : chr "Succinea"
  .. .. .. .. .. ..$ ott_id                  : int 978937
  .. .. .. .. .. ..$ rank                    : chr "genus"
  .. .. .. .. .. ..$ source                  : chr "ott3.3draft1"
  .. .. .. .. .. ..$ synonyms                :List of 12
  .. .. .. .. .. .. ..$ : chr "Amphibia"
  .. .. .. .. .. .. ..$ : chr "Amphibina"
  .. .. .. .. .. .. ..$ : chr "Arborcinea"
  .. .. .. .. .. .. ..$ : chr "Brachyspira"
  .. .. .. .. .. .. ..$ : chr "Cerinasota"
  .. .. .. .. .. .. ..$ : chr "Cochlohydra"
  .. .. .. .. .. .. ..$ : chr "Luccinea"
  .. .. .. .. .. .. ..$ : chr "Lucena"
  .. .. .. .. .. .. ..$ : chr "Succinaea"
  .. .. .. .. .. .. ..$ : chr "Succinastrum"
  .. .. .. .. .. .. ..$ : chr "Tapada"
  .. .. .. .. .. .. ..$ : chr "Truella"
  .. .. .. .. .. ..$ tax_sources             :List of 7
  .. .. .. .. .. .. ..$ : chr "worms:181586"
  .. .. .. .. .. .. ..$ : chr "ncbi:145426"
  .. .. .. .. .. .. ..$ : chr "gbif:2297197"
  .. .. .. .. .. .. ..$ : chr "irmng:1393632"
  .. .. .. .. .. .. ..$ : chr "irmng:1348813"
  .. .. .. .. .. .. ..$ : chr "irmng:1133222"
  .. .. .. .. .. .. ..$ : chr "irmng:1202351"
  .. .. .. .. .. ..$ unique_name             : chr "Succinea"
  .. .. .. ..$ :List of 7
  .. .. .. .. ..$ is_approximate_match: logi TRUE
  .. .. .. .. ..$ is_synonym          : logi FALSE
  .. .. .. .. ..$ matched_name        : chr "Amphibia"
  .. .. .. .. ..$ nomenclature_code   : chr "ICZN"
  .. .. .. .. ..$ score               : num 0.75
  .. .. .. .. ..$ search_string       : chr "amphibians"
  .. .. .. .. ..$ taxon               :List of 10
  .. .. .. .. .. ..$ flags                   : list()
  .. .. .. .. .. ..$ is_suppressed           : logi FALSE
  .. .. .. .. .. ..$ is_suppressed_from_synth: logi FALSE
  .. .. .. .. .. ..$ name                    : chr "Amphibia"
  .. .. .. .. .. ..$ ott_id                  : int 544595
  .. .. .. .. .. ..$ rank                    : chr "class"
  .. .. .. .. .. ..$ source                  : chr "ott3.3draft1"
  .. .. .. .. .. ..$ synonyms                :List of 1
  .. .. .. .. .. .. ..$ : chr "Lissamphibia"
  .. .. .. .. .. ..$ tax_sources             :List of 4
  .. .. .. .. .. .. ..$ : chr "ncbi:8292"
  .. .. .. .. .. .. ..$ : chr "worms:178701"
  .. .. .. .. .. .. ..$ : chr "gbif:131"
  .. .. .. .. .. .. ..$ : chr "irmng:1131"
  .. .. .. .. .. ..$ unique_name             : chr "Amphibia"
  .. .. .. ..$ :List of 7
  .. .. .. .. ..$ is_approximate_match: logi TRUE
  .. .. .. .. ..$ is_synonym          : logi FALSE
  .. .. .. .. ..$ matched_name        : chr "Amphibia"
  .. .. .. .. ..$ nomenclature_code   : chr "ICN"
  .. .. .. .. ..$ score               : num 0.75
  .. .. .. .. ..$ search_string       : chr "amphibians"
  .. .. .. .. ..$ taxon               :List of 10
  .. .. .. .. .. ..$ flags                   :List of 1
  .. .. .. .. .. .. ..$ : chr "sibling_higher"
  .. .. .. .. .. ..$ is_suppressed           : logi FALSE
  .. .. .. .. .. ..$ is_suppressed_from_synth: logi FALSE
  .. .. .. .. .. ..$ name                    : chr "Bostrychia"
  .. .. .. .. .. ..$ ott_id                  : int 782484
  .. .. .. .. .. ..$ rank                    : chr "genus"
  .. .. .. .. .. ..$ source                  : chr "ott3.3draft1"
  .. .. .. .. .. ..$ synonyms                :List of 1
  .. .. .. .. .. .. ..$ : chr "Amphibia"
  .. .. .. .. .. ..$ tax_sources             :List of 5
  .. .. .. .. .. .. ..$ : chr "silva:AF203893/#6"
  .. .. .. .. .. .. ..$ : chr "ncbi:103711"
  .. .. .. .. .. .. ..$ : chr "worms:143904"
  .. .. .. .. .. .. ..$ : chr "gbif:2661216"
  .. .. .. .. .. .. ..$ : chr "irmng:1282403"
  .. .. .. .. .. ..$ unique_name             : chr "Bostrychia (genus in kingdom Archaeplastida)"
  .. .. .. ..$ :List of 7
  .. .. .. .. ..$ is_approximate_match: logi TRUE
  .. .. .. .. ..$ is_synonym          : logi FALSE
  .. .. .. .. ..$ matched_name        : chr "Amphibia"
  .. .. .. .. ..$ nomenclature_code   : chr "ICZN"
  .. .. .. .. ..$ score               : num 0.75
  .. .. .. .. ..$ search_string       : chr "amphibians"
  .. .. .. .. ..$ taxon               :List of 10
  .. .. .. .. .. ..$ flags                   : list()
  .. .. .. .. .. ..$ is_suppressed           : logi FALSE
  .. .. .. .. .. ..$ is_suppressed_from_synth: logi FALSE
  .. .. .. .. .. ..$ name                    : chr "Egadroma"
  .. .. .. .. .. ..$ ott_id                  : int 732965
  .. .. .. .. .. ..$ rank                    : chr "genus"
  .. .. .. .. .. ..$ source                  : chr "ott3.3draft1"
  .. .. .. .. .. ..$ synonyms                :List of 1
  .. .. .. .. .. .. ..$ : chr "Amphibia"
  .. .. .. .. .. ..$ tax_sources             :List of 2
  .. .. .. .. .. .. ..$ : chr "ncbi:247376"
  .. .. .. .. .. .. ..$ : chr "irmng:1307131"
  .. .. .. .. .. ..$ unique_name             : chr "Egadroma"
  .. .. .. ..$ :List of 7
  .. .. .. .. ..$ is_approximate_match: logi TRUE
  .. .. .. .. ..$ is_synonym          : logi FALSE
  .. .. .. .. ..$ matched_name        : chr "Amphibia"
  .. .. .. .. ..$ nomenclature_code   : chr "ICZN"
  .. .. .. .. ..$ score               : num 0.75
  .. .. .. .. ..$ search_string       : chr "amphibians"
  .. .. .. .. ..$ taxon               :List of 10
  .. .. .. .. .. ..$ flags                   : list()
  .. .. .. .. .. ..$ is_suppressed           : logi FALSE
  .. .. .. .. .. ..$ is_suppressed_from_synth: logi FALSE
  .. .. .. .. .. ..$ name                    : chr "Stenolophus"
  .. .. .. .. .. ..$ ott_id                  : int 561664
  .. .. .. .. .. ..$ rank                    : chr "genus"
  .. .. .. .. .. ..$ source                  : chr "ott3.3draft1"
  .. .. .. .. .. ..$ synonyms                :List of 6
  .. .. .. .. .. .. ..$ : chr "Agonoderos"
  .. .. .. .. .. .. ..$ : chr "Agonoderus"
  .. .. .. .. .. .. ..$ : chr "Amphibia"
  .. .. .. .. .. .. ..$ : chr "Astenolophus"
  .. .. .. .. .. .. ..$ : chr "Egadroma"
  .. .. .. .. .. .. ..$ : chr "Stenelophus"
  .. .. .. .. .. ..$ tax_sources             :List of 3
  .. .. .. .. .. .. ..$ : chr "ncbi:177549"
  .. .. .. .. .. .. ..$ : chr "gbif:8401238"
  .. .. .. .. .. .. ..$ : chr "irmng:1330562"
  .. .. .. .. .. ..$ unique_name             : chr "Stenolophus"
  .. .. .. ..$ :List of 7
  .. .. .. .. ..$ is_approximate_match: logi TRUE
  .. .. .. .. ..$ is_synonym          : logi FALSE
  .. .. .. .. ..$ matched_name        : chr "Amphibia"
  .. .. .. .. ..$ nomenclature_code   : chr "ICZN"
  .. .. .. .. ..$ score               : num 0.75
  .. .. .. .. ..$ search_string       : chr "amphibians"
  .. .. .. .. ..$ taxon               :List of 10
  .. .. .. .. .. ..$ flags                   : list()
  .. .. .. .. .. ..$ is_suppressed           : logi FALSE
  .. .. .. .. .. ..$ is_suppressed_from_synth: logi FALSE
  .. .. .. .. .. ..$ name                    : chr "Succinea"
  .. .. .. .. .. ..$ ott_id                  : int 978937
  .. .. .. .. .. ..$ rank                    : chr "genus"
  .. .. .. .. .. ..$ source                  : chr "ott3.3draft1"
  .. .. .. .. .. ..$ synonyms                :List of 12
  .. .. .. .. .. .. ..$ : chr "Amphibia"
  .. .. .. .. .. .. ..$ : chr "Amphibina"
  .. .. .. .. .. .. ..$ : chr "Arborcinea"
  .. .. .. .. .. .. ..$ : chr "Brachyspira"
  .. .. .. .. .. .. ..$ : chr "Cerinasota"
  .. .. .. .. .. .. ..$ : chr "Cochlohydra"
  .. .. .. .. .. .. ..$ : chr "Luccinea"
  .. .. .. .. .. .. ..$ : chr "Lucena"
  .. .. .. .. .. .. ..$ : chr "Succinaea"
  .. .. .. .. .. .. ..$ : chr "Succinastrum"
  .. .. .. .. .. .. ..$ : chr "Tapada"
  .. .. .. .. .. .. ..$ : chr "Truella"
  .. .. .. .. .. ..$ tax_sources             :List of 7
  .. .. .. .. .. .. ..$ : chr "worms:181586"
  .. .. .. .. .. .. ..$ : chr "ncbi:145426"
  .. .. .. .. .. .. ..$ : chr "gbif:2297197"
  .. .. .. .. .. .. ..$ : chr "irmng:1393632"
  .. .. .. .. .. .. ..$ : chr "irmng:1348813"
  .. .. .. .. .. .. ..$ : chr "irmng:1133222"
  .. .. .. .. .. .. ..$ : chr "irmng:1202351"
  .. .. .. .. .. ..$ unique_name             : chr "Succinea"
  .. .. ..$ name   : chr "amphibians"
  ..$ taxonomy                    :List of 5
  .. ..$ author : chr "open tree of life project"
  .. ..$ name   : chr "ott"
  .. ..$ source : chr "ott3.3draft1"
  .. ..$ version: chr "3.3"
  .. ..$ weburl : chr "https://tree.opentreeoflife.org/about/taxonomy-version/ott3.3"
  ..$ unambiguous_names           : list()
  ..$ unmatched_names             : list()
 $ match_id          : int 2
 $ has_original_match: logi TRUE
 $ json_coords       :'data.frame':	1 obs. of  4 variables:
  ..$ search_string     : chr "amphibians"
  ..$ original_order    : num 1
  ..$ match_id          : int 2
  ..$ has_original_match: logi TRUE

There are many hidden attributes on our ‘match_names’ object. The function synonyms() in the package rotl can extract the synonyms from the attributes of a ‘match_names’ object.

rotl::synonyms(resolved_name)
$Amphibia
[1] "Lissamphibia"

attr(,"class")
[1] "otl_synonyms" "list"        

That’s neat!
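If you only need one of the hidden attributes, base R's attr() extracts it by name instead of printing the whole list. On our real object you could call, for example, attr(resolved_name, "original_response")$taxonomy$version, which according to the output above holds "3.3". The sketch below runs offline on a small, made-up data frame with a hypothetical attribute named "note":

```r
# A small, made-up object
x <- data.frame(taxon = "Aves", ott_id = 81461L)

# Attach a hypothetical custom attribute, similar in spirit to how
# rotl keeps extra data alongside the main table
attr(x, "note") <- list(source = "example", version = "0.1")

# Extract a single attribute by name, then one element of it
attr(x, "note")$version

# The attribute is "hidden": it does not add a column
ncol(x)
```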


Getting OTT ids for multiple taxon names at a time

Now that we know about classes and the data structure of the tnrs_match_names() output, we will learn how to use tnrs_match_names() for multiple taxa. To do this, create a character vector with your taxon names and use it as the input for tnrs_match_names().


Hands on! Running TNRS for multiple taxa

Do a tnrs_match_names() run for the amphibians (Amphibia), the genus of the dog (Canis), the genus of the cat (Felis), the family of dolphins (Delphinidae), and the class of birds (Aves). Save the output to an object named resolved_names.

Again, you can try different misspellings and synonyms of your taxa to see TNRS in action.

my_taxa <- c("amphibians", "canis", "felis", "delphinidae", "avess")
resolved_names <- rotl::tnrs_match_names(names = my_taxa, context_name = "All life")
resolved_names
  search_string unique_name approximate_match ott_id is_synonym flags
1    amphibians    Amphibia              TRUE 544595      FALSE      
2         canis       Canis             FALSE 372706      FALSE      
3         felis       Felis             FALSE 563165      FALSE      
4   delphinidae Delphinidae             FALSE 698406      FALSE      
5         avess        Aves              TRUE  81461      FALSE      
  number_matches
1              6
2              2
3              1
4              1
5              1

You should get a matched name for all the taxa in this example. If you do not get a match for all your taxa and you see an unexpected warning message, the tnrs_match_names() function might not be working as expected. Please refer to Pro tip 1.2 below for alternative ways to get OTT ids for multiple taxa at a time using tnrs_match_names().
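A quick way to see which of your names did not come back is to compare your input vector with the output. The sketch below uses my_taxa from above and a hypothetical result in which two names were not resolved; since we are not sure how an unmatched name appears in the real output, it checks both for missing rows and for rows with no unique name:

```r
my_taxa <- c("amphibians", "canis", "felis", "delphinidae", "avess")

# Hypothetical result for illustration: one name has no row at all,
# another has a row but no unique name
result <- data.frame(
  search_string = c("canis", "felis", "delphinidae", "avess"),
  unique_name = c("Canis", "Felis", "Delphinidae", NA),
  stringsAsFactors = FALSE
)

# Names absent from the result altogether
missing_rows <- setdiff(my_taxa, result$search_string)

# Names present in the result but without a unique name
no_match <- result$search_string[is.na(result$unique_name)]

unmatched <- c(missing_rows, no_match)
unmatched
```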

Finally, we are going to learn how to extract specific pieces of data from a match_names object to use in other functions and workflows.


Pro tip 1.2: Getting OTT ids for multiple taxa, the hacker way

If you get a warning message saying that some of your taxon names “are not matched”, it means that the tnrs_match_names() function is not applying TNRS correctly to inputs with more than one name. This is unexpected behaviour; see this GitHub issue for updates.

As you already know, running tnrs_match_names() using one name at a time works well:

rotl::tnrs_match_names(names = "amphibians")
rotl::tnrs_match_names(names = "avess")
  search_string unique_name approximate_match ott_id is_synonym flags
1    amphibians    Amphibia              TRUE 544595      FALSE      
  number_matches
1              6
  search_string unique_name approximate_match ott_id is_synonym flags
1         avess        Aves              TRUE  81461      FALSE      
  number_matches
1              1

Running it with multiple names without explicitly specifying a taxonomic context (the context_name argument we used above), however, does not:

resolved_names <- rotl::tnrs_match_names(names = my_taxa)
Warning: amphibians, avess are not matched

If we want to run the function on a character vector with multiple elements, we can use a loop or sapply(), which will run the function separately for each taxon in my_taxa, avoiding the unexpected behaviour observed above.
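Besides sapply(), another common base R pattern is to call the function once per name with lapply() and stack the one-row results with do.call(rbind, ...), which gives a data frame with ordinary (non-list) columns directly. In this sketch match_one() is a hypothetical offline stand-in so the code runs without an internet connection; in a real session you would replace it with function(x) rotl::tnrs_match_names(names = x). We have not checked whether rotl's hidden attributes survive rbind(), so treat the stacked result as a plain data frame.

```r
my_taxa <- c("amphibians", "canis", "felis", "delphinidae", "avess")

# Hypothetical stand-in for rotl::tnrs_match_names(): returns a one-row
# data frame for a single name (here it only capitalises the name)
match_one <- function(name) {
  data.frame(
    search_string = name,
    unique_name = paste0(toupper(substring(name, 1, 1)), substring(name, 2)),
    stringsAsFactors = FALSE
  )
}

# Run the function once per name, then stack the one-row data frames
per_name <- lapply(my_taxa, match_one)
stacked <- do.call(rbind, per_name)
stacked

# The columns are ordinary character vectors, not lists
sapply(stacked, class)
```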

Let’s try it using sapply:

resolved_names <- sapply(my_taxa, rotl::tnrs_match_names)
class(resolved_names)
[1] "matrix" "array" 
resolved_names
                  amphibians   canis   felis   delphinidae   avess  
search_string     "amphibians" "canis" "felis" "delphinidae" "avess"
unique_name       "Amphibia"   "Canis" "Felis" "Delphinidae" "Aves" 
approximate_match TRUE         FALSE   FALSE   FALSE         TRUE   
ott_id            544595       372706  563165  698406        81461  
is_synonym        FALSE        FALSE   FALSE   FALSE         FALSE  
flags             ""           ""      ""      ""            ""     
number_matches    6            2       1       1             1      

The data structure is not the same as the one we obtained using a single taxon name. To get that same data frame structure, we can transpose resolved_names with the function t() and convert it to a data frame with the function as.data.frame():

resolved_names <- t(resolved_names)
resolved_names <- as.data.frame(resolved_names)
resolved_names
            search_string unique_name approximate_match ott_id is_synonym flags
amphibians     amphibians    Amphibia              TRUE 544595      FALSE      
canis               canis       Canis             FALSE 372706      FALSE      
felis               felis       Felis             FALSE 563165      FALSE      
delphinidae   delphinidae Delphinidae             FALSE 698406      FALSE      
avess               avess        Aves              TRUE  81461      FALSE      
            number_matches
amphibians               6
canis                    2
felis                    1
delphinidae              1
avess                    1
class(resolved_names)
[1] "data.frame"
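One side effect of building the table with sapply() and t() is worth knowing about: each cell of the resulting matrix is a list element, so the columns of resolved_names are lists rather than ordinary vectors. This is why the column extractions below print with $amphibians, $canis, and so on. A base R sketch that reproduces this with a small, made-up list matrix and flattens the columns back into plain vectors with unlist():

```r
# Each call returns a list with two fields, so sapply() builds a list matrix
m <- sapply(c("canis", "felis"), function(x) list(name = x, n = nchar(x)))
df <- as.data.frame(t(m))

before <- is.list(df$name)   # TRUE: the column is a list
before

# Flatten every column to an ordinary (atomic) vector
df[] <- lapply(df, unlist)
is.list(df$name)             # FALSE
df$n
```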

Our object is now a data frame, but it is not a ‘match_names’ object. As we mentioned above, functions use classes to recognise the data structure of objects. To use this object with other functions from the rotl package, we will have to add ‘match_names’ to the class of our object. Keep in mind that this only changes the class label; it does not restore the hidden attributes of a ‘match_names’ object returned by tnrs_match_names():

class(resolved_names) <- c("match_names", "data.frame")
class(resolved_names)
[1] "match_names" "data.frame" 

Changing the class attribute does not change the actual structure of the object:

resolved_names
            search_string unique_name approximate_match ott_id is_synonym flags
amphibians     amphibians    Amphibia              TRUE 544595      FALSE      
canis               canis       Canis             FALSE 372706      FALSE      
felis               felis       Felis             FALSE 563165      FALSE      
delphinidae   delphinidae Delphinidae             FALSE 698406      FALSE      
avess               avess        Aves              TRUE  81461      FALSE      
            number_matches
amphibians               6
canis                    2
felis                    1
delphinidae              1
avess                    1


Extracting data from a ‘match_names’ object

It is easy to access elements from a ‘match_names’ object using regular indexing. For example, using the column number, we can extract all elements from a certain column. Let’s extract all data from the second column:

resolved_names[,2]
$amphibians
[1] "Amphibia"

$canis
[1] "Canis"

$felis
[1] "Felis"

$delphinidae
[1] "Delphinidae"

$avess
[1] "Aves"

We can also use the name of the column so we do not have to remember its position:

resolved_names[,"unique_name"]
$amphibians
[1] "Amphibia"

$canis
[1] "Canis"

$felis
[1] "Felis"

$delphinidae
[1] "Delphinidae"

$avess
[1] "Aves"

Because it is a ‘data.frame’, we can also access the values of any column by using the “$” and the column name to index it, like this:

resolved_names$unique_name
$amphibians
[1] "Amphibia"

$canis
[1] "Canis"

$felis
[1] "Felis"

$delphinidae
[1] "Delphinidae"

$avess
[1] "Aves"
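Two more base R indexing forms are handy here. Double square brackets return a single column, like “$”, but also accept a column name stored in a variable; and drop = FALSE keeps the result as a data frame instead of simplifying it. A sketch on a small, made-up data frame:

```r
taxa <- data.frame(
  search_string = c("canis", "felis"),
  unique_name = c("Canis", "Felis"),
  ott_id = c(372706L, 563165L),
  stringsAsFactors = FALSE
)

col <- "unique_name"        # column name held in a variable
taxa[[col]]                 # same as taxa$unique_name

taxa[, col, drop = FALSE]   # still a one-column data frame
```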

The ‘match_names’ object has a relatively simple structure that is easy to explore and mine. We will see later that the outputs of other rotl functions are more complicated and accessing their elements requires a lot of hacking. Fortunately, the rotl creators have added some functions that allow interacting with these complicated outputs. The functions unique_name(), ott_id(), and flags() extract values from the respective columns of a ‘match_names’ object, in the form of a list instead of a vector. To extract data from the other columns there are no specialized functions, so you will have to index.


Hands on! Extract the OTT ids from a ‘match_names’ object

You now have a ‘match_names’ object called resolved_names. There are at least two ways to extract the OTT ids from it. Can you figure them out? Store them in an object called my_ott_id.

Hint: You can find one solution by browsing the rotl package documentation to find a function that will do this for a ‘match_names’ object.

You will find a second solution by using your knowledge of data frames and tables to extract the data from the ott_id column.

Look at some solutions

Get the OTT ids as a list, with the function ott_id():

my_ott_id <- rotl::ott_id(resolved_names) # rotl:::ott_id.match_names(resolved_names) is the same.
my_ott_id
named list()
attr(,"class")
[1] "otl_ott_id" "list"      

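Note that here ott_id() returns an empty list, most likely because our hand-built resolved_names lacks the hidden attributes that rotl relies on; on an object returned directly by tnrs_match_names(), it returns the OTT ids.

Or, get the OTT ids directly from the ott_id column. Because we built resolved_names with sapply(), this column is a list rather than a plain vector: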

my_ott_id <- resolved_names$ott_id # or resolved_names[, "ott_id"]
my_ott_id
$amphibians
[1] 544595

$canis
[1] 372706

$felis
[1] 563165

$delphinidae
[1] 698406

$avess
[1] 81461
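Since my_ott_id is a named list here, you may want a plain numeric vector for functions that expect one. unlist() does this and keeps the names. A sketch with the values printed above, written out by hand so it runs offline:

```r
# Hand-written copy of the list printed above
my_ott_id <- list(amphibians = 544595, canis = 372706, felis = 563165,
                  delphinidae = 698406, avess = 81461)

# Flatten to a named numeric vector
ott_vec <- unlist(my_ott_id)
ott_vec["felis"]
is.list(ott_vec)
```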


There are no specialized functions to extract values from a row of a ‘match_names’ object, so we have to do some indexing. You can get values from all columns of one row:

resolved_names[1,]
           search_string unique_name approximate_match ott_id is_synonym flags
amphibians    amphibians    Amphibia              TRUE 544595      FALSE      
           number_matches
amphibians              6

Or get just one specific value from a certain column, using the column name:

resolved_names[1,"unique_name"]
$amphibians
[1] "Amphibia"

Or using the column position:

resolved_names[1,2]
$amphibians
[1] "Amphibia"


There we go! Now we know how to get OTT ids from a bunch of taxa of interest. Let’s see what we can do with these in the next section.


Pro tip 1.3: Name the rows of your ‘match_names’ object

To facilitate the use of OTT ids later, you can name the rows of your ‘match_names’ object using the function rownames().

You can name them whatever you want. For example, you can use the unique_name identifier:

rownames(resolved_names) <- resolved_names$unique_name
resolved_names

Or simply call them something short that makes sense to you and is easy to remember:

rownames(resolved_names) <- c("amphs", "dogs", "cats", "flippers", "birds")
resolved_names
         search_string unique_name approximate_match ott_id is_synonym flags
amphs       amphibians    Amphibia              TRUE 544595      FALSE      
dogs             canis       Canis             FALSE 372706      FALSE      
cats             felis       Felis             FALSE 563165      FALSE      
flippers   delphinidae Delphinidae             FALSE 698406      FALSE      
birds            avess        Aves              TRUE  81461      FALSE      
         number_matches
amphs                 6
dogs                  2
cats                  1
flippers              1
birds                 1

This makes it easier to access elements of the ‘match_names’ object, because you can use the row name, instead of a number, as the row index.

There are at least two ways to do this.

You can use “$” to access a named column of the data frame:

resolved_names["flippers",]$ott_id
$delphinidae
[1] 698406

Or, you can use the column name as column index:

resolved_names["flippers","ott_id"]
$delphinidae
[1] 698406

In both cases, you will get the OTT id of the Delphinidae. Cool!
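Row names also let you pull out several taxa at once, and R requires them to be unique. A sketch on a hypothetical data frame using the short names from above; new_names is a made-up example of a naming mistake:

```r
taxa <- data.frame(
  ott_id = c(544595L, 372706L, 563165L, 698406L, 81461L),
  row.names = c("amphs", "dogs", "cats", "flippers", "birds")
)

# Several rows by name at once
taxa[c("dogs", "cats"), "ott_id"]

# Check for duplicates before renaming: R refuses duplicated row names
new_names <- c("amphs", "dogs", "dogs", "flippers", "birds")
anyDuplicated(new_names) > 0   # TRUE, so do not use these as row names
```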


Key Points

  • Open Tree of Life Taxonomy ids, or OTT ids, are unique numeric identifiers for individual taxa that the Open Tree of Life project uses to handle taxonomy.

  • You can go from a scientific name to an OTT id using TNRS matching.

  • You cannot go from a common name to an OTT id using the Open Tree of Life tools.