Dealing with "broken" and "invalid" taxa
Overview
Teaching: 5 min
Exercises: 5 min
Questions
What is a broken taxon?
How do I detect it?
Objectives
Get to know the functions that interact with nodes in the synthetic OpenTree.
Understand outputs from those functions.
We say that a taxon is “broken” when its OTT id is not assigned to any node in the OpenTree synthetic tree. As mentioned before, this happens when the OTT id belongs to a taxon that is not monophyletic in the current version of the synthetic OpenTree. This is why we get an error when we request a synthetic subtree using the OTT id of the genus Canis: the genus is not monophyletic in the tree.
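There is a way to find out whether a taxon is “broken” before requesting the subtree and running into that error. The function is_in_tree() reports whether an OTT id is assigned to a node in the synthetic tree.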
rotl::is_in_tree(resolved_names["Canis",]$ott_id)
[1] FALSE
Indeed, our Canis is not in the synthetic OpenTree. There are several ways to extract a subtree for a “broken” taxon; we will focus on one.
Getting the MRCA of a taxon
The function tol_node_info() returns all the relevant information about the node that is the ancestor or MRCA of a taxon, including the actual node id.
canis_node_info <- rotl::tol_node_info(resolved_names["Canis",]$ott_id)
canis_node_info
OpenTree node.
Node id: mrcaott47497ott110766
Number of terminal descendants: 85
Is taxon: FALSE
Let’s explore the class of the output.
class(canis_node_info)
[1] "tol_node" "list"
So we have an object with classes ‘tol_node’ and ‘list’. Printing it showed some information, but we do not know how much of its content was left out of the printed summary.
Let’s use the functions str() or ls() to check the data structure of our ‘tol_node’ object.
str(canis_node_info)
List of 8
$ node_id : chr "mrcaott47497ott110766"
$ num_tips : int 85
$ query : chr "ott372706"
$ resolves :List of 1
..$ pg_2812@tree6545: chr "node1135827"
$ source_id_map:List of 5
..$ ot_278@tree1 :List of 3
.. ..$ git_sha : chr ""
.. ..$ study_id: chr "ot_278"
.. ..$ tree_id : chr "tree1"
..$ ot_328@tree1 :List of 3
.. ..$ git_sha : chr ""
.. ..$ study_id: chr "ot_328"
.. ..$ tree_id : chr "tree1"
..$ pg_1428@tree2855:List of 3
.. ..$ git_sha : chr ""
.. ..$ study_id: chr "pg_1428"
.. ..$ tree_id : chr "tree2855"
..$ pg_2647@tree6169:List of 3
.. ..$ git_sha : chr ""
.. ..$ study_id: chr "pg_2647"
.. ..$ tree_id : chr "tree6169"
..$ pg_2812@tree6545:List of 3
.. ..$ git_sha : chr ""
.. ..$ study_id: chr "pg_2812"
.. ..$ tree_id : chr "tree6545"
$ supported_by :List of 2
..$ ot_278@tree1: chr "node233"
..$ ot_328@tree1: chr "node495"
$ synth_id : chr "opentree13.4"
$ terminal :List of 2
..$ pg_1428@tree2855: chr "node610132"
..$ pg_2647@tree6169: chr "ott247333"
- attr(*, "class")= chr [1:2] "tol_node" "list"
This tells us that tol_node_info() extracted 8 different pieces of information about our node.
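Because the object is a list, its elements can be reached with the usual list tools. The sketch below rebuilds a small part of the str() output shown above as a plain list, so it runs without a network connection; the object name node_info and the choice of elements are ours, and the class attribute is left out on purpose. With rotl loaded, the same calls should work on the real canis_node_info object.

```r
# Plain-list mock of part of the str() output above (illustrative only).
node_info <- list(
  num_tips = 85L,
  query = "ott372706",
  source_id_map = list(
    "ot_278@tree1" = list(git_sha = "", study_id = "ot_278", tree_id = "tree1"),
    "ot_328@tree1" = list(git_sha = "", study_id = "ot_328", tree_id = "tree1")
  ),
  supported_by = list("ot_278@tree1" = "node233", "ot_328@tree1" = "node495"),
  synth_id = "opentree13.4"
)

# Names of the top-level elements
element_names <- names(node_info)
element_names

# Pull out a single value with $
node_info$num_tips

# Look up the study id of every source tree that supports this node
supporting_studies <- vapply(
  names(node_info$supported_by),
  function(key) node_info$source_id_map[[key]]$study_id,
  character(1),
  USE.NAMES = FALSE
)
supporting_studies
```

Nested elements such as source_id_map are lists themselves, so `$` and `[[` can be chained to reach any level.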
Right now we are only interested in the node id. Where do you think we can find it?
Hands on! Get the node id of Canis MRCA
Extract it from your canis_node_info object and call it canis_node_id.
canis_node_id <- canis_node_info$node_id
Pro tip 3.1: Get the node id of the MRCA of a group of OTT ids
Sometimes you want the MRCA of a bunch of lineages. The function tol_mrca() gets the node of the MRCA of a group of OTT ids.
Can you use it to get the MRCA of Canis?
The node that contains Canis is mrcaott47497ott110766.
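Node ids of the form mrcaott…ott… appear to be built from two OTT ids whose MRCA is the node in question. Assuming that naming convention holds, a base R sketch for recovering the two ids from the node id string could look like this; the variable names are ours.

```r
node_id <- "mrcaott47497ott110766"

# Capture the run of digits that follows each "ott"
pattern <- "^mrcaott([0-9]+)ott([0-9]+)$"
parts <- regmatches(node_id, regexec(pattern, node_id))[[1]]

# parts[1] is the full match; parts[2] and parts[3] are the captured ids
mrca_ott_ids <- as.integer(parts[2:3])
mrca_ott_ids
```

The first id, 47497, matches the tip Canis_lupus_pallipes_ott47497 that appears in the Canis subtree. With a network connection, passing these ids to rotl::tol_mrca(ott_ids = mrca_ott_ids) would be one way to check whether they lead back to the same node.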
Getting a subtree using a node id instead of the taxon OTT id
Now that we have a node id, we can use it to get a subtree with tol_subtree(), using the argument node_id.
canis_node_subtree <- rotl::tol_subtree(node_id = canis_node_id)
canis_node_subtree
Phylogenetic tree with 85 tips and 28 internal nodes.
Tip labels:
Canis_lupus_pallipes_ott47497, Canis_lupus_chanco_ott47500, Canis_lupus_baileyi_ott67371, Canis_lupus_laniger_ott80830, Canis_lupus_hattai_ott83897, Canis_lupus_desertorum_ott234374, ...
Node labels:
, , , , , , ...
Unrooted; no branch lengths.
ape::plot.phylo(canis_node_subtree, cex = 1.2)
Nice! We got a subtree of 85 tips, containing all descendants from the node that also contains Canis.
If you explore the taxon names at the tip, you will notice that this includes species assigned to genera other than Canis.
Now, what if I want a subtree of certain taxonomic ranks within my group? Go to the next episode and find out how you can do this!
Pro Tip 3.2: Get an induced subtree of taxonomic children
What if I really, really need a tree containing species within the genus Canis only, excluding everything that does not belong to the genus taxonomically, even if it does phylogenetically?
We can get the OTT ids of the taxonomic children of our taxon of interest and use the function tol_induced_subtree().
First, we will get the taxonomic children.
canis_taxonomy <- rotl::taxonomy_subtree(resolved_names["Canis",]$ott_id)
canis_taxonomy
$tip_label
 [1] "Canis_dirus_ott3612500"
 [2] "Canis_anthus_ott5835572"
 [3] "Canis_rufus_ott113383"
 [4] "Canis_simensis_ott752755"
 [5] "Canis_aureus_ott621168"
 [6] "Canis_mesomelas_elongae_ott576165"
 [7] "Canis_adustus_ott621176"
 [8] "unclassified_Canis_ott7655955"
 [9] "Canis_latrans_ott247331"
[10] "Canis_lupus_baileyi_ott67371"
[11] "Canis_lupus_laniger_ott80830"
[12] "Canis_lupus_orion_ott7067596"
[13] "Canis_lupus_hodophilax_ott318630"
[14] "Canis_lupus_signatus_ott545727"
[15] "Canis_lupus_arctos_ott5340002"
[16] "Canis_lupus_mogollonensis_ott263524"
[17] "Canis_lupus_variabilis_ott5839539"
[18] "Canis_lupus_lupus_ott883675"
[19] "Canis_lupus_campestris_ott4941916"
[20] "Canis_lupus_lycaon_ott948004"
[21] "Canis_lupus_pallipes_ott47497"
[22] "Canis_lupus_chanco_ott47500"
[23] "Canis_lupus_x_Canis_lupus_familiaris_ott4941915"
[24] "Canis_lupus_desertorum_ott234374"
[25] "Canis_lupus_familiaris_ott247333"
[26] "Canis_lupus_dingo_ott380529"
[27] "Canis_lupus_labradorius_ott531973"
[28] "Canis_lupus_hattai_ott83897"
[29] "Canis_lupus_lupaster_ott987895"
[30] "Canis_himalayensis_ott346723"
[31] "Canis_indica_ott346728"
[32] "Canis_environmental_samples_ott4941917"
[33] "Canissp.KEB-2016ott5925604"
[34] "Canis_sp._CANInt1_ott470950"
[35] "'Canissp.Russia/33"
[36] "500ott5338950'"
[37] "Canis_sp._ott247325"
[38] "'Canissp.Belgium/36"
[39] "000ott5338951'"
[40] "Canis_environmental_sample_ott4941918"
[41] "Canis_morenis_ott6145387"
[42] "Canis_niger_ott6145388"
[43] "Canis_palaeoplatensis_ott6145390"
[44] "Canis_osorum_ott6145389"
[45] "Canis_thooides_ott6145392"
[46] "Canis_antarcticus_ott6145381"
[47] "Canis_proplatensis_ott6145391"
[48] "Canis_feneus_ott6145384"
[49] "Canis_geismarianus_ott6145385"
[50] "Canis_ameghinoi_ott7655930"
[51] "Canis_nehringi_ott7655947"
[52] "Canis_palustris_ott7655949"
[53] "Canis_lanka_ott7655942"
[54] "Canis_pallipes_ott7655948"
[55] "Canis_gezi_ott7655939"
[56] "Canis_montanus_ott7655945"
[57] "Canis_primaevus_ott7655951"
[58] "Canis_chrysurus_ott7655935"
[59] "Canis_dukhunensis_ott7655937"
[60] "Canis_kokree_ott7655941"
[61] "Canis_sladeni_ott7655952"
[62] "Canis_himalaicus_ott7655940"
[63] "Canis_chanco_ott7655934"
[64] "Canis_curvipalatus_ott7655936"
[65] "Canis_lateralis_ott7655943"
[66] "Canis_argentinus_ott7655931"
[67] "Canis_tarijensis_ott7655953"
[68] "Canis_naria_ott7655946"
[69] "Canis_peruanus_ott7655950"
[70] "Canis_cautleyi_ott7655933"
[71] "Canis_ursinus_ott7655954"
[72] "Canis_armbrusteri_ott3612502"
[73] "Canis_ferox_ott3612501"
[74] "Canis_lepophagus_ott3612503"
[75] "Canis_edwardii_ott3612509"
[76] "Canis_apolloniensis_ott3612508"
[77] "Canis_cedazoensis_ott3612507"
[78] "Canis_primigenius_ott3612506"
[79] "Canis_lydekkeri_ott7655944"
[80] "Canis_arnensis_ott7655932"
[81] "Canis_antarticus_ott6145382"
[82] "Canis_dingo_ott6145383"
[83] "Canis_etruscus_ott7655938"
[84] "Canis_spelaeus_ott3612504"

$edge_label
[1] "Canis_mesomelas_ott666235" "Canis_lupus_ott247341"
[3] "Canis_ott372706"
Now, extract the OTT ids.
canis_taxonomy_ott_ids <- datelife::extract_ott_ids(x = canis_taxonomy$tip_label)
After extracting ott ids, there are some non numeric elements:
Canissp.KEB-2016ott5925604 'Canissp.Russia/33 500ott5338950' 'Canissp.Belgium/36 000ott5338951'
NAs removed.
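To see why some labels yield no usable id, it can help to try the extraction by hand. The following is a base R sketch, not the implementation used by datelife::extract_ott_ids(); it assumes that a well-formed label ends in "_ott" followed only by digits, and it uses a handful of labels copied from the output above.

```r
tip_labels <- c(
  "Canis_dirus_ott3612500",
  "Canis_lupus_baileyi_ott67371",
  "Canissp.KEB-2016ott5925604",
  "'Canissp.Russia/33",
  "500ott5338950'"
)

# Assumption: a well-formed label ends in "_ott" followed by digits only
well_formed <- grepl("_ott[0-9]+$", tip_labels)

# Keep the digits after the final "_ott" of each well-formed label
ott_ids <- as.numeric(sub("^.*_ott([0-9]+)$", "\\1", tip_labels[well_formed]))
names(ott_ids) <- tip_labels[well_formed]
ott_ids

# Labels that do not fit the pattern and would need manual attention
tip_labels[!well_formed]
```

The last two labels look like a single taxon name that was split at a comma, which is why neither half carries a clean id under this pattern.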
Try to get an induced subtree of Canis taxonomic children.
canis_taxonomy_subtree <- rotl::tol_induced_subtree(canis_taxonomy_ott_ids)
Error: HTTP failure: 400 [/v3/tree_of_life/induced_subtree] Error: node_id 'ott3612504' was not found!list(ott247325 = "pruned_ott_id", ott3612504 = "pruned_ott_id", ott3612506 = "pruned_ott_id", ott3612508 = "pruned_ott_id", ott470950 = "pruned_ott_id", ott4941915 = "pruned_ott_id", ott4941917 = "pruned_ott_id", ott6145381 = "pruned_ott_id", ott6145384 = "pruned_ott_id", ott6145385 = "pruned_ott_id", ott6145387 = "pruned_ott_id", ott6145388 = "pruned_ott_id", ott6145389 = "pruned_ott_id", ott6145390 = "pruned_ott_id", ott6145391 = "pruned_ott_id", ott6145392 = "pruned_ott_id", ott7655932 = "pruned_ott_id", ott7655944 = "pruned_ott_id", ott7655945 = "pruned_ott_id", ott7655955 = "pruned_ott_id")
It is often not possible to get an induced subtree of all taxonomic children from a taxon, because some of them will not make it to the synthetic tree.
To verify which ones are giving us trouble, we can use the function is_in_tree() again.
canis_in_tree <- sapply(canis_taxonomy_ott_ids, rotl::is_in_tree) # logical vector
canis_taxonomy_ott_ids_intree <- canis_taxonomy_ott_ids[canis_in_tree] # extract ott ids in tree
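The filtering step is ordinary logical indexing: build a TRUE/FALSE vector, then use it to subset the ids. The sketch below wraps that in a small helper and runs it offline with a stand-in checker; keep_in_tree() and offline_check() are hypothetical names introduced here, and the stand-in only knows about a few ids reported as "pruned_ott_id" in the error message above.

```r
# Keep only the ids for which `check` returns TRUE.
# `check` is any function that takes one OTT id and returns TRUE or FALSE.
keep_in_tree <- function(ott_ids, check) {
  in_tree <- vapply(ott_ids, check, logical(1))
  ott_ids[in_tree]
}

# Offline stand-in for rotl::is_in_tree() (assumption, for illustration):
# a few ids listed as "pruned_ott_id" in the error are treated as absent.
pruned <- c(3612504, 3612506, 3612508, 247325)
offline_check <- function(id) !(id %in% pruned)

# Three ids that appear as tips in the Canis subtree, plus two pruned ones
candidate_ids <- c(47497, 3612504, 47500, 247325, 67371)
kept <- keep_in_tree(candidate_ids, offline_check)
kept
```

Using vapply() with logical(1) instead of sapply() makes the helper fail loudly if the checker ever returns something other than a single logical value. With a network connection, the same helper could be called as keep_in_tree(canis_taxonomy_ott_ids, rotl::is_in_tree).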
Now get the tree.
canis_taxonomy_subtree <- rotl::tol_induced_subtree(canis_taxonomy_ott_ids_intree)
Plot it.
ape::plot.phylo(canis_taxonomy_subtree, cex = 1.2)
There! We have a synthetic subtree (derived from phylogenetic information) containing only the taxonomic children of Canis.
Key Points
It is not possible to get a subtree from an OTT id that is not in the synthetic tree.
OTT ids and node ids allow us to interact with the synthetic OpenTree.