Reconstructing Biological Diversity

Evolution
- All the plants, animals and other organisms alive today descended from earlier organisms through genetic modifications that accumulated over successive generations. These earlier organisms descended from even more ancient forms, and these from yet older forms of life. This chain of descent continues back in time to the beginning of life.

Natural Selection is the evolutionary force that usually receives the most attention, because it leads to adaptation.

1. organisms with successful adaptations leave more offspring.
2. these offspring inherit the successful adaptations and thus also leave more offspring.
3. eventually, after several generations, the adaptation becomes a feature all of the members of the species possess.

Thus, the natural world can be explained:

Similarities among organisms are due to common descent; differences are due to adaptations to the environment or (less commonly) chance events. As the environment never ceases changing, animals and plants accumulate modifications and are continuously molded into entirely different forms.




II. Systematics - Determining the Pattern of Evolution

Perhaps Darwin's best argument that life evolved was the observed hierarchy of life - all organisms are united in a pattern of increasing similarity. The pattern of relatedness is called phylogeny and systematics is the field of biology that studies and seeks to determine phylogenies.

A. Evidence Used to Determine Phylogeny

The data systematists use to reconstruct phylogenies are the attributes or characters that organisms have.

These characters come from

1. Morphology
2. Developmental Biology
3. Molecular Biology (genetics, biochemistry)

1. Morphology

Systematists study external characteristics, examine bones and teeth, dissect organ systems, prepare histological slides for light microscopy, and examine cells and tissues with the electron microscope, all to gather comparative evidence for evolution. Along the way, much is added to our knowledge about the basic biology of different organisms.

2. Developmental Pathways (Ontogeny)

Basic embryonic development is constrained and rarely changes, because any mutation that affects early development is almost always lethal and is selected against. Therefore, developmental evidence can be particularly useful in uncovering evolutionary relationships between diverse adult forms.

3. Molecular and Biochemical Data

Because the characteristics of an organism are determined by its genetic content, changes in organisms over the course of evolution should be reflected in their genetic information. CAUTION is needed though! Similarities may not always be a result of common ancestry. Molecules and genes could have similar construction as a result of convergent evolution.

B. Analyzing the Data to Reconstruct Evolution

There are three major ways in which classifications are constructed:

1. phenetic or numerical taxonomy;
2. synthetic or traditional taxonomy;
3. cladistic or phylogenetic taxonomy.

1. Phenetics or Numerical Taxonomy
The method that uses overall similarity is phenetics, or numerical taxonomy.

In measuring similarity, pheneticists use a variety of techniques and methods but all use the following steps:

1. A data matrix is constructed listing the character states (in coded numerical form) for all the characters among all of the organisms or taxa under study.

2. This descriptive information is converted into a measure (called a coefficient) of similarity between every pair of taxa.

3. Taxa are clustered based on their similarity coefficient so that those with higher similarity are placed together in a tree (called a phenogram).

For example: consider five species of bugs (A-E):



Ten characteristics (with the character states in parentheses) are noted:
1. Pointed nose (present or absent)
2. Four legs (present or absent)
3. Toes on legs (present or absent)
4. Large eyes (present or absent)
5. Triangular shield on back (present or absent)
6. Front legs striped (present or absent)
7. Five spots on the back (present or absent)
8. Palps on posterior end (present or absent)
9. Antennae thick (present or absent)
10. Antennae covered with hairs so that they look striped (present or absent)

The information can be recorded in a table:


It isn't necessary to use +/-; any coding will do. In fact, it is common in numerical taxonomy to convert observations to a numerical code:


Simple matching similarity coefficient - the ratio of the number of characters whose states match between two taxa to the total number of characters. The coefficient is often multiplied by 100, in which case it gives the percentage of characters the two taxa share.

The simple matching similarity for taxon A and taxon B is:



7/10 or 0.70.
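The coding and the coefficient can be sketched in a few lines of Python. The +/- strings below are hypothetical scores, chosen only so that 7 of the 10 characters match; they are not the values from the original character table, and the function names are illustrative.

```python
def encode(scores):
    """Convert a string of +/- scores into numerical codes (1 = present, 0 = absent)."""
    return [1 if s == "+" else 0 for s in scores]


def simple_matching(a, b):
    """Ratio of matching character states to the total number of characters."""
    if len(a) != len(b):
        raise ValueError("both taxa must be scored for the same characters")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


# Hypothetical scores for two taxa over ten characters (not the original table).
taxon_a = encode("++-+--+-++")
taxon_b = encode("++---++-+-")

print(simple_matching(taxon_a, taxon_b))        # coefficient as a ratio
print(100 * simple_matching(taxon_a, taxon_b))  # the same value as a percentage
```

Note that a shared absence counts as a match in exactly the same way as a shared presence.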

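All of the matching coefficients can be represented in a matrix (the values below are the pairwise coefficients quoted in the clustering steps that follow):

      A     B     C     D
B    0.7
C    0.7   0.8
D    0.4   0.7   0.5
E    0.4   0.7   0.5   0.8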


Clustering

i. Find the two most similar animals and link them.
From the similarity table, we can see that two pairs share 80% of their characters: B and C, and D and E. These taxa are linked first at 80%:



ii. Link the next most similar taxa.
From the similarity table, we can see that 70% is the next highest similarity. But now we have a tie: should we join A to the B-C cluster, or D-E to B? If we join A to B and C, it will cluster with the B-C group at 70%:
A-B = 0.7
A-C = 0.7
average = (0.7 + 0.7) ÷ 2 = 0.7

But to join B to the D-E cluster we must also include C because it is already joined to B:
D-B = 0.7
D-C = 0.5
E-B = 0.7
E-C = 0.5
average = (0.7 + 0.5 + 0.7 + 0.5) ÷ 4 = 0.6

So we can see that joining A next is best, because A joins the B-C cluster at a higher level (0.7 rather than 0.6), that is, with more overall similarity:



iii. Link D-E to the A-B-C cluster by averaging
The D-E cluster must be linked to the A-B-C cluster because of the 0.7 similarity between B and D-E. This is done by finding the average similarity between each of D and E and each of the three taxa in the A-B-C cluster:
A-D = 0.4
B-D = 0.7
C-D = 0.5
A-E = 0.4
B-E = 0.7
C-E = 0.5
total = 3.20
average = 3.2 ÷ 6 = 0.53

Then the A-B-C cluster and D-E are linked at 0.53 and the PHENOGRAM is complete:
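The clustering steps above can be sketched as a short average-linkage procedure in Python. The pairwise coefficients are the ones quoted in steps i to iii; the function and variable names are illustrative, and this is a minimal sketch rather than a full numerical taxonomy package.

```python
from itertools import combinations

# Pairwise simple matching coefficients quoted in the worked example.
SIM = {
    ("A", "B"): 0.7, ("A", "C"): 0.7, ("A", "D"): 0.4, ("A", "E"): 0.4,
    ("B", "C"): 0.8, ("B", "D"): 0.7, ("B", "E"): 0.7,
    ("C", "D"): 0.5, ("C", "E"): 0.5,
    ("D", "E"): 0.8,
}


def pair_similarity(x, y):
    """Look up the coefficient for two taxa regardless of their order."""
    return SIM[(x, y)] if (x, y) in SIM else SIM[(y, x)]


def average_similarity(c1, c2):
    """Mean coefficient over every pair with one taxon in each cluster."""
    values = [pair_similarity(x, y) for x in c1 for y in c2]
    return sum(values) / len(values)


def cluster(taxa):
    """Repeatedly join the two clusters with the highest average similarity."""
    clusters = [(t,) for t in taxa]
    joins = []
    while len(clusters) > 1:
        best = max(combinations(clusters, 2),
                   key=lambda pair: average_similarity(*pair))
        level = average_similarity(*best)
        clusters = [c for c in clusters if c not in best]
        merged = tuple(sorted(best[0] + best[1]))
        clusters.append(merged)
        joins.append((merged, round(level, 2)))
    return joins


for members, level in cluster(["A", "B", "C", "D", "E"]):
    print("-".join(members), "joined at", level)
```

Each join records the cluster formed and the level at which it appears in the phenogram: B-C and D-E at 0.8, A-B-C at 0.7, and all five taxa at 0.53. Ties are broken here simply by taking the first best pair found, which is one possible convention.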


There is a problem with using overall similarity to uncover relatedness.

Overall similarity may be misleading because there are actually two reasons why organisms have similar characteristics and only one of them is due to evolutionary relatedness.

When two species have a similar characteristic because it was inherited by both from a common ancestor, it is called a homologous feature (or homology).

However, when unrelated species adopt a similar way of life, their body parts may take on similar functions and end up resembling one another due to convergent evolution. When two species have a similar characteristic because of convergent evolution, the feature is called an analogous feature (or homoplasy).

How do biologists tell whether a similarity is homologous or homoplasious? Homologies are identified using a set of criteria that has resulted from years of experimentation and observation. These criteria include:

1) resemblance in detail;
2) similar position in relation to neighboring structures or organs;
3) similarity in embryological development; and
4) similar genetic control.


A final criterion (or test) of homology is agreement with other characters. This test tests hypotheses of homology using the deduction:

If a feature in two different animals is a homology, other features will indicate they are closely related (related animals usually share more than one homology).

If all other characters indicate that the animals are not closely related, the hypothesis has been falsified.


These homology criteria can be illustrated by examining the different mammalian forelimbs. At first glance, the wing of a bat, leg of a cat, flipper of a whale, arm of a human and leg of a horse may not seem very similar, but they are homologous. The limbs all contain the same type of bones (similar in detail). The forelimb attaches to the shoulder girdle in all of the animals (similar position in relation to neighboring structures). During embryological development, the forelimb develops from the same tissues in all of the mammals. In addition to the forelimbs, all mammals have hair and mammary glands (other characters indicate the animals are related and will share homologies).


The school that evaluates homology for phylogenetic reconstruction is:

2. Traditional or Synthetic Systematics

The basic assumption of the synthetic taxonomist is that the fossil record and diversity of living organisms can be explained by and interpreted in light of evolutionary mechanisms and scenarios as generally described in the modern synthesis.

Consider our example --
Five species of bugs:



A careful examination of the bugs in their natural environment reveals that the bugs are preyed upon by birds and small mammals. A biologist who observes this might reasonably conclude that species may develop adaptations to avoid predators (e.g., body or leg markings that camouflage the bug; or, if they hide from predators in crevices, large eyes to see in dim light). The biologist might go further and conclude that these characters could be convergent adaptations in response to predation. Therefore, characters 4, 5, 6, and 7 would not reflect evolutionary relatedness. The tree should be built based on characters 1, 2, 3, 8, 9, and 10:


The major criticism of traditional classification is its heavy reliance on models of evolution that may or may not be true. If the tree is then used to develop a story about how organisms evolved, the biologist risks circular reasoning, because an evolutionary story was used to build the tree.



3. Cladistics or Phylogenetic Systematics

Can we just group by overall homologous similarity and thus avoid models of evolution?

In other words, if two animals share the highest number of homologies, can we reasonably assume they are closest relatives? The answer is no - evolutionary relationships cannot be reconstructed by just grouping together species by their number of shared homologies.

This is because there are two kinds of homology: apomorphy (a derived character state) and plesiomorphy (a primitive character state).

The method that groups organisms that share derived characters is called Cladistics. Taxa that share many derived characters are grouped more closely together than those that do not. The relationships are shown in a branching hierarchical tree called a cladogram.

The cladogram is constructed such that the number of changes from one character state to the next are minimized. The principle behind this is the rule of parsimony - any hypothesis that requires fewer assumptions is a more defensible hypothesis.
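Counting the changes a tree requires can be sketched in Python. The example below uses Fitch's counting procedure, a standard way to find the minimum number of state changes for a character on a given tree; it is swapped in here for illustration and is not the Hennig argumentation described later in these notes. The taxa (Out, P, Q, R), the three-character matrix, and both trees are hypothetical.

```python
def fitch_changes(tree, states):
    """Return (possible ancestral states, minimum changes) for one character.

    tree is a taxon name (a leaf) or a pair of subtrees;
    states maps each taxon name to its state for this character.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_changes = fitch_changes(tree[0], states)
    right_set, right_changes = fitch_changes(tree[1], states)
    shared = left_set & right_set
    if shared:
        # The two branches can agree on a state: no extra change is needed.
        return shared, left_changes + right_changes
    # No state in common: at least one change must have occurred here.
    return left_set | right_set, left_changes + right_changes + 1


def tree_length(tree, matrix):
    """Total number of changes the tree requires across all characters."""
    n_chars = len(next(iter(matrix.values())))
    total = 0
    for i in range(n_chars):
        states = {taxon: row[i] for taxon, row in matrix.items()}
        total += fitch_changes(tree, states)[1]
    return total


# Hypothetical data: 0 = primitive state, 1 = derived state.
matrix = {"Out": (0, 0, 0), "P": (1, 0, 0), "Q": (1, 1, 0), "R": (1, 1, 1)}
tree_1 = ("Out", ("P", ("Q", "R")))
tree_2 = ("Out", ("Q", ("P", "R")))

print(tree_length(tree_1, matrix), tree_length(tree_2, matrix))
```

Under these assumptions the first tree needs one change per character, while the second forces the second character to change twice, so the first tree is the more parsimonious hypothesis.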

Determining Primitive (Plesiomorphic) and Derived (Apomorphic) Characters

The first step in basic cladistic analysis is to determine which character states are primitive and which are derived.

Outgroup comparison: if a taxon that is not a member of the group of organisms being classified has a character state that is the same as that of some of the organisms in the group, then that character state can be considered plesiomorphic. The outside taxon is called the outgroup and the organisms being classified are the ingroup.
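Polarizing characters by outgroup comparison can be sketched as a simple recoding step in Python: a state shared with the outgroup is scored 0 (plesiomorphic) and any other state is scored 1 (apomorphic). The taxon names and +/- scores below are hypothetical and are not the bug data from the original chart.

```python
def polarize(ingroup, outgroup):
    """Recode each state as 0 if it matches the outgroup's state, else 1."""
    return {
        taxon: [0 if state == out_state else 1
                for state, out_state in zip(row, outgroup)]
        for taxon, row in ingroup.items()
    }


# Hypothetical presence/absence scores for three characters.
outgroup = ["+", "-", "-"]
ingroup = {
    "X": ["+", "+", "-"],
    "Y": ["-", "+", "-"],
}

print(polarize(ingroup, outgroup))
```

In this sketch, presence is the primitive state for the first character because the outgroup has it, so taxon Y's absence is scored as derived. A derived state is therefore not always the "present" state.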


1. b is plesiomorphic and a is apomorphic
b --> a
2. a is plesiomorphic and b is apomorphic
a --> b
If state a is also found in a taxon outside the group being studied, the first hypothesis will force us to make more assumptions than the second. That is, it is less parsimonious:


Return to our example bugs and now compare them to a bug from another group:

This comparison lets us add a line to the data chart for the outgroup:



The rescored character chart is shown below, with the derived character states (apomorphies) coded as one (1) and the primitive states coded as zero (0):



Constructing a Cladogram
The method was described by Hennig and is called Hennig argumentation. It works by considering the information provided by each character one at a time.

What is the cladogram for the bug data?


This actually is a simple example. What do we do if characters conflict? For example, suppose an eleventh character is observed that is present in both the outgroup and Taxon A:


This new character suggests that A and E are related. This evidence does not conflict with the information conveyed by characters 1, 2, 5, 7, 9, or 10:


But when the other characters (characters 3, 4, 6, and 8) are included, several conflicts arise.

Character 3 (or 8) must be seen to evolve twice or to evolve and then secondarily be lost:

Or

Character 4 must be seen to evolve twice:

Character 6 must be seen to have been secondarily lost in E:


Making a tree that conforms to character 11, therefore, requires us to assume homoplasy in four other characters. On the other hand if we make a tree that conforms to those characters, character 11 is homoplasious:



We have two different hypotheses of the relationships: one in which A and E are closely related, and another in which A is closely related to C. Which one is best? The one circled below, because it requires the fewest assumptions (i.e., is the most parsimonious):



The Preferred Method of Systematics

Criteria:

1. Repeatability
2. Information Content
3. Reconstruction of Evolution

Repeatability - Preference Of Cladistics Over Synthetics

Synthetic and cladistic taxonomy differ from each other primarily in the repeatability of the method. Cladistics, since it follows a more precise analytical procedure for handling data, is more repeatable and therefore more scientific. Synthetics is often not science but learned opinion.

Informativeness - Preference Of Cladistics Over Phenetics

It has sometimes been argued that the purpose of biological classification is not to reflect evolution, which is unknowable, but to create an indexing device through which storage and retrieval of information about organic diversity are facilitated. To fulfill this purpose, proponents of this view maintain that a classification should have high information content and be natural. It has been asserted that grouping by overall similarity (phenetics) will best achieve such a classification. However, in a series of papers, Farris (1977, 1979, 1980, 1981a, 1982, 1983a, 1982b) shows that cladistic methods will produce classifications that have more information content and naturalness than those resulting from the phenetic approach.

Simply put, information content of a classification is how much information it conveys about character state distribution. Information is contained in a classification when the branching pattern of the tree describes the distribution of the characters. For example, consider the following two trees for three taxa A, B, and C, their common ancestor Anc, and the data for two characters:

The branching pattern of tree a describes the changes in the character data perfectly. Tree b branches in a pattern that is opposite to the changes in the characters. Thus, tree a has higher information content than tree b.

Grouping by synapomorphy (shared derived character states) rather than raw similarity (phenetics) will always give a more informative classification because it uses the principle of parsimony and does not average together the information.



What information does the branching diagram convey about the characters?


In the phenogram, character 4 shows a convergence in taxa C and A. Furthermore, two of the branches, the B-C group and the D-E group, are not defined by any characters (arrows in above phenogram).

The resulting cladogram is:



In the cladogram, none of the characters show convergence or reversals. Furthermore, all of the branches are defined by one or more character states.

Classifications

One of the major products of systematics is the formal classification system for species. These names are handles by which information and communication about organisms and diversity are conveyed.

The system of classification that we use today was originally described by Carolus Linnaeus.

The groups into which organisms are placed are referred to as taxa (singular, taxon). The taxa are arranged in a hierarchy. The broadest taxa contain a large number of organisms that share very fundamental characteristics. Each broad taxon includes many smaller, less inclusive taxa (each of which contains organisms that share increasingly more specific characteristics). The levels in the hierarchy are:



A biological classification should

1. summarize the characteristics of organisms efficiently (in other words, have high information content), and
2. reflect real groups (in other words, reflect evolutionary relatedness or phylogeny).

Therefore, we can use our phylogenetic trees to not only tell us about relatedness, but also to help us create a classification.


What a tree actually says about relationships

The trees that result from phylogenetic analysis should be viewed as relative statements of relationship. For example, in the following tree, the otter and the weasel are hypothesized to share a more recent common ancestor with each other than with the seal; but the members of the dog lineage (dogs + seals + bears + otters + weasels) all share a more recent common ancestor with one another than with the cats:


The tree does not explicitly hypothesize ancestor-descendant relationships. In other words, the tree hypothesizes that otters and weasels are related, but not that weasels evolved from otters or that otters evolved from weasels.

Monophyly
One of the tasks of a systematist is to convert the tree into the formal hierarchical Linnean classification by giving groups that share a common ancestor a formal taxonomic name. Such groups are called monophyletic taxa, and they are recognized because they share unique derived characters. The tree shows several sets of most closely related taxa that are nested within larger sets. Because Linnean categories are also internested, these sets of taxa can be converted into Linnean categories. The monophyletic groups are:


Why Do Classification Schemes Change?

New Data -
New technologies constantly give rise to new sources of character information. New information reveals new similarities and differences among taxa that cause us to revise the placement of a taxon in a tree or to choose to lump or split a taxon within an existing classification.

New Taxa -
As previously unknown species are discovered, classifications will also need to be revised to reflect their placement. This will undoubtedly have a large impact on existing classification schemes because, at this time, we cannot say how many more species exist on earth waiting to be discovered.

Misinterpreted data
Finally, new studies occasionally lead to the discovery that features used to group species into a taxon are actually convergent or nonunique characters. When this happens, the old taxon is abandoned and a new monophyletic taxon is created in its place. There are two instances where this occurs:

Polyphyly - Occasionally, new studies lead to the discovery that features used to group species into a taxon are actually convergent characters. The taxon is then known to be polyphyletic (taxa that do not share a recent common ancestor and were grouped on the basis of homoplasy).

Paraphyly - Inevitably some plesiomorphic characters are incorrectly interpreted to be synapomorphies and a paraphyletic taxon is created.
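Whether a proposed taxon is monophyletic can be checked mechanically against a tree, as in the Python sketch below. A tree is written as nested pairs, and a group counts as monophyletic here when it is exactly the set of tips below some branch point. The topology is an assumption made for illustration: only the otter + weasel pairing and the split between cats and the dog lineage come from the example earlier in these notes.

```python
def leaves(tree):
    """Return the set of taxa at the tips of a tree or subtree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree):
    """Yield the tip set of every subtree, i.e. every monophyletic group."""
    yield leaves(tree)
    if not isinstance(tree, str):
        for child in tree:
            yield from clades(child)


def is_monophyletic(tree, group):
    """True if the group is exactly the set of tips of some subtree."""
    return frozenset(group) in set(clades(tree))


# Assumed resolution of the carnivore example; the placement of dog, bear
# and seal relative to one another is hypothetical.
TREE = ("cat", ("dog", ("bear", ("seal", ("otter", "weasel")))))

print(is_monophyletic(TREE, {"otter", "weasel"}))
print(is_monophyletic(TREE, {"seal", "weasel"}))
```

A group that fails this check, such as seal + weasel without the otter, leaves out some descendants of the common ancestor or combines separate lineages, and would need to be redefined before it could be named as a taxon.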




How Trees Summarize Information (Mapping Characters Onto Trees)