By the end of this video, you should know the shared terminology that we use in genomic epidemiology for describing trees. This terminology is helpful for describing specific parts of the tree and patterns you see in it, and for orienting yourself to where something is in the tree.
Take-home messages
- Trees have tips (leaves), branches, and internal nodes.
- Tips, which can also be called leaves, represent genome sequences from pathogens that you have actually sampled.
- Internal nodes represent inferred ancestral pathogens from which our sampled pathogens descend. We haven’t sampled them, but we think that these ancestral pathogens likely circulated at some point given the pattern of genetic diversity that we actually do observe in the tips of the tree.
- We can infer the probable sequence of ancestral pathogens represented by internal nodes.
- Branches connect internal nodes to other internal nodes and to tips in the tree. The length of a branch is proportional to the number of nucleotide changes that occur on the branch. That is, longer branches indicate more mutations and shorter branches indicate fewer mutations.
- You can calculate the number of nucleotide differences between two sequences by tracing the path along the branches that connects them and summing the branch lengths along that path.
- If two tips have identical genome sequences, then they should not be separated by any branch length. This means that they will appear to be stacked vertically within a tree.
- The root of the tree is a special internal node. It represents the earliest ancestral node in your entire tree, from which all other internal nodes and tips in your tree are descended.
- The term clade, or lineage, refers to groups of related tips and internal nodes.
- We define a clade by the mutation or set of mutations that all sequences within the clade share, but that sequences outside of the clade/lineage do not have.
- Because the tree is inherently hierarchical, clades/lineages within the tree can be nested.
- We use the term basal to describe the directionality of being “earlier” in the tree. E.g., because the root is the earliest internal node in the tree, we describe the root as being the most basal internal node in the tree.
- We use the term topology to describe the pattern of grouping into clades that is present in a particular tree.
- You can rotate different parts of the tree around internal nodes without changing the topology. The topology stays the same as long as sequences continue to group together in the same clades.
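The ideas above can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions, not how any particular phylogenetics tool represents trees: the `Node` class, the helper functions, the tip names (`tip_A` to `tip_E`), and the mutations (such as `A100G`) are all hypothetical. It assumes that branch length is simply the count of mutations on a branch and that each mutation arises only once in the tree.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # compare nodes by identity, not by field values
class Node:
    """A tip (no children) or an internal node (one or more children)."""
    name: str
    # Mutations on the branch leading to this node (hypothetical labels).
    mutations: list = field(default_factory=list)
    children: list = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, child):
        """Attach a child node and return it."""
        child.parent = self
        self.children.append(child)
        return child

    @property
    def branch_length(self):
        # Assumption: branch length equals the number of mutations on it.
        return len(self.mutations)

    def tips(self):
        """Names of all tips descending from this node."""
        if not self.children:
            return {self.name}
        return set().union(*(c.tips() for c in self.children))


def path_to_root(node):
    """The node itself followed by each ancestor, ending at the root."""
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    return path


def mutations_between(a, b):
    """Sum the branch lengths on the path connecting nodes a and b."""
    ancestors_b = set(path_to_root(b))
    # Most recent common ancestor: first ancestor of a shared with b.
    mrca = next(n for n in path_to_root(a) if n in ancestors_b)
    total = 0
    for node in (a, b):
        while node is not mrca:
            total += node.branch_length
            node = node.parent
    return total


def clade_defined_by(root, mutation):
    """Tips descending from the branch on which `mutation` occurs."""
    stack = [root]
    while stack:
        node = stack.pop()
        if mutation in node.mutations:
            return node.tips()
        stack.extend(node.children)
    return set()


def topology(node):
    """All clades, as sets of tip names; child order is ignored."""
    clades = {frozenset(node.tips())}
    for child in node.children:
        clades |= topology(child)
    return clades


# A small hypothetical tree.
root = Node("root")
tip_a = root.add(Node("tip_A", ["G50A"]))
node_1 = root.add(Node("node_1", ["A100G"]))
tip_b = node_1.add(Node("tip_B"))  # no mutations on this branch
tip_c = node_1.add(Node("tip_C"))  # identical to tip_B
node_2 = node_1.add(Node("node_2", ["C200T", "T300C"]))
tip_d = node_2.add(Node("tip_D", ["G400A"]))
tip_e = node_2.add(Node("tip_E"))

print(mutations_between(tip_b, tip_c))          # 0: identical sequences
print(mutations_between(tip_a, tip_d))          # 5: 1 + (1 + 2 + 1)
print(sorted(clade_defined_by(root, "A100G")))  # tip_B, tip_C, tip_D, tip_E
print(sorted(clade_defined_by(root, "C200T")))  # tip_D, tip_E (nested clade)

# Rotating the tree around an internal node leaves the topology unchanged.
before = topology(root)
node_1.children.reverse()
print(topology(root) == before)                 # True
```

In this sketch, the clade defined by `C200T` sits inside the clade defined by `A100G`, so `A100G` is the more basal of the two mutations. Comparing topologies as sets of clades, rather than by the order in which children are drawn, is what makes the rotation check come out equal.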
Questions
- Take a look at the tree below. Which samples are part of a clade defined by a C8950T mutation?
- Using the same tree as in the previous question, which mutation is more basal, C8950T or T23031C?
- Again using this same tree, how many mutations are unique to hCoV-19/Israel/CVL-4630/2021 in this specific tree?
- How many mutations separate hCoV-19/Israel/CVL-4630/2021 from hCoV-19/Palestine/AAS57/2021?
- Take a look at the three trees below. Which one(s) have the same topology? Which one(s) have a different topology? Can you explain why?