5 The Early Branches in the Tree of Life

Norman R. Pace

The development of DNA sequencing technology in the last

decades of the 20th century revolutionized biology, including

the ways in which we can study the history of life. Before

the availability of gene sequences, relationships among fossils were

the main hope for charting the evolution of life. The character

traits used to relate organisms in evolution were primarily

morphological and could not be applied to microbial organisms.

With gene sequences, contemporary organisms are

related quantitatively in terms of nucleotide differences.

Variation in sequences among modern organisms is a measure

of the extent of biodiversity. Gene and now whole-genome

sequences also allow the inference of maps of the

history of evolution, in the form of phylogenetic trees. The

results are illuminating and provide grist for conjecture and

controversy on the evolutionary process. The purpose of this

article is to tour the large-scale structure of the phylogenetic

Tree of Life and to provide some interpretation of this emerging

view of life’s history. I emphasize how our understanding

of the extent of the tree has expanded because of recent

molecular studies of microbial diversity in the environment.

Molecular Phylogeny: Inference of Phylogenetic Trees

Ancestral relationships of modern organisms are derived

using the techniques of “molecular phylogeny.” The basic

notion of molecular phylogeny is simple. Sequences of homologous

(more properly, orthologous) genes, genes with

common ancestry and function, from different organisms are

aligned so that corresponding DNA bases can be compared.

The number of differences between pairs of sequences is

counted, and this count is taken as a measure of the

evolutionary distance that has separated the pairs of organisms.

Just as geographical maps can be constructed from

distances between land features, evolutionary maps—“phylogenetic

trees”—can be inferred from evolutionary distances

(sequence changes) between homologous genes. Calculations

of the path of evolution are fraught with statistical uncertainties,

however.
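The counting step described above can be sketched in a few lines. This is a minimal illustration, assuming the sequences are already aligned and that gapped positions are simply skipped; the function names and the toy alignment are hypothetical and not taken from any published analysis.

```python
from itertools import combinations


def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned, ungapped positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # assumption: positions with an alignment gap are ignored
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return differences / compared


def distance_matrix(alignment: dict[str, str]) -> dict[tuple[str, str], float]:
    """Observed pairwise distances for every pair of named sequences."""
    return {
        (x, y): p_distance(alignment[x], alignment[y])
        for x, y in combinations(sorted(alignment), 2)
    }


# Hypothetical toy alignment, for illustration only.
toy_alignment = {
    "org_A": "ACGTACGTAC",
    "org_B": "ACGTACGTTC",
    "org_C": "ACGAAC-TTC",
}

for pair, distance in distance_matrix(toy_alignment).items():
    print(pair, round(distance, 3))
```

These are observed differences only; they are the raw material for a tree, not yet estimates of the actual amount of evolutionary change.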

The process of inferring the best relatedness trees from

pairwise sequence counts is complex and dependent on

models of evolution used to calculate such trees (Swofford

et al. 1996). One complexity that vexes attempts to infer the

deeper relationships in the universal phylogenetic tree with

certainty is that the actual number of sequence changes was

greater than the observed number. This is because of the

probabilities of back mutations, where no change is counted,

and multiple past mutations, which are counted as only single

changes. Numbers of mutational events per observed mutation

can be estimated statistically, but a significant amount

of the information used to build trees then becomes inferential,

not directly observed. The mathematics of estimating

actual changes from observed change are such that deeper

branch points in phylogenetic trees are accompanied by

greater statistical uncertainty as to their position. Still another

complexity is that different lines of descent have evolved at

different rates, which confuses tree-building algorithms.

Current advanced methods for inference of phylogenetic

relationships are well developed to cope with the problems

mentioned and others, but statistical vagaries are inescapable.

The methods in common use are dependent on different

models for reconstructing relationships, and this can influence

the topological outcome of phylogenetic calculations.
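The gap between observed and actual change can be illustrated with one of the simplest substitution models, the Jukes-Cantor correction. The choice of model is an assumption made here for illustration; the text does not say which correction any particular tree used, and the methods in common use are more elaborate. The sketch shows how the estimated number of changes per site outruns the observed fraction of differences as sequences diverge.

```python
import math


def jukes_cantor_distance(p: float) -> float:
    """Estimated substitutions per site from the observed fraction of
    differing sites p, under the Jukes-Cantor model (equal base frequencies
    and equal rates for all substitution types)."""
    if not 0.0 <= p < 0.75:
        # At p >= 0.75 the sequences are as different as random sequences
        # and the estimate is undefined under this model.
        raise ValueError("observed difference must be in [0, 0.75)")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


for observed in (0.05, 0.20, 0.40, 0.60):
    estimated = jukes_cantor_distance(observed)
    print(
        f"observed {observed:.2f} -> estimated {estimated:.3f} "
        f"(ratio {estimated / observed:.2f})"
    )
```

The ratio of estimated to observed change grows with divergence, which is one way to see why the deeper branch points carry the greater uncertainty: more of the distance is inferred rather than counted.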

Popular methods for inferring phylogenetic trees from sequence

relationships include evolutionary distance (ED),

maximum parsimony (MP), and maximum likelihood (ML).

ED uses corrected sequence differences directly as distances

to calculate the pattern of ancestral connections. MP presumes

that the fewest changes make the best trees, so optimal

relatedness patterns are estimated by the minimum

number of changes required to generate the topology. ML is

a statistical method that calculates the likelihood of a particular

topology given the sequence differences. In each case,

statistical uncertainties in the calculations render any particular

result questionable. Consequently, nodes in trees are

tested many times using the same method and with resampled

subsets of the aligned sequence positions, so-called “bootstrap analysis.” The

reliability of a particular result, for instance, a branch point

in a tree or the composition of a relatedness group, is tested

by the frequency with which the result occurs in the set of

bootstrap trees. At the current state of their development, the

different methods for calculating phylogenetic trees usually

give generally comparable results. Nonetheless, intrinsic uncertainties

in any tree must be acknowledged, particularly in

the placement of deeper branches.
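The idea behind bootstrap support can be sketched as follows. This is a deliberately simplified illustration, not a tree-building program: alignment columns are resampled with replacement, and instead of inferring a full tree for each replicate, the sketch only asks whether one sequence remains the closest relative of another. The function names, the nearest-neighbor stand-in for tree inference, and the toy alignment are all assumptions made for the example.

```python
import random


def p_distance(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def resample_columns(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """Draw alignment columns with replacement, keeping the original length."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


def bootstrap_support(
    alignment: dict[str, str],
    first: str,
    second: str,
    replicates: int = 200,
    seed: int = 0,
) -> float:
    """Fraction of replicates in which `second` is strictly the closest
    sequence to `first` (a stand-in for 'the grouping appears in the tree')."""
    rng = random.Random(seed)
    others = [name for name in alignment if name not in (first, second)]
    hits = 0
    for _ in range(replicates):
        sample = resample_columns(alignment, rng)
        d_pair = p_distance(sample[first], sample[second])
        if all(d_pair < p_distance(sample[first], sample[o]) for o in others):
            hits += 1
    return hits / replicates


# Hypothetical toy alignment, for illustration only.
toy_alignment = {
    "A": "ACGTACGTACGT",
    "B": "ACGTACGTACGA",
    "C": "ACTTACCTATGA",
    "D": "GCTTAACTATGA",
}

print(bootstrap_support(toy_alignment, "A", "B"))
```

A grouping that survives in nearly all replicates is well supported by the data; one that appears in only a fraction of them depends on a few positions and should be read with caution.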

What Gene for Deep Phylogeny?

Any collection of homologous gene sequences can be used

to infer phylogenetic relationships among those genes, but

genes used to infer the overall structure of evolution, a universal

phylogenetic tree, have special constraints on their

properties (Woese 1987). One is that the gene must occur

in all forms of life, so all can be related to one another. The

hemoglobin gene, for instance, would not be useful for large-scale

phylogeny because many groups do not contain the

gene. A second constraint is that the gene must have resisted,

over the ages, lateral transfer between genetic lines of descent.

Genomic studies have shown clearly that many kinds

of genes, for example, metabolic genes, have experienced extensive

lateral transfer during the course of their evolution

(Koonin et al. 2001, Woese 2000). Use of such genes for

phylogenetic reconstructions produces conflicting results. A

third constraint on genes that can be used to infer global

phylogenetic trees is that they contain sufficient information,

numbers of homologous nucleotides, so that relationships

can be established with the best statistical reliability. There

are not many genes that meet all these requirements. Most

genes occur in only a limited diversity of organisms, and

many have undergone lateral transfer. The most generally

accepted large-scale phylogenetic results are based on the use

of ribosomal RNA (rRNA) gene sequences, those of the large

subunits and small subunits (SSUs) of rRNAs. Ribosomes are

present in all cells and major organelles, and phylogenetic

trees inferred with these gene sequences are congruent with

trees constructed using other elements of the cellular nucleic

acid–based, information-processing machinery. Therefore,

changes in the rRNA sequences seem to reflect the evolutionary

path of the genetic machinery.

SSU rRNA sequences were first used for phylogenetic studies

by Carl Woese, even before it was possible to determine

gene sequences rapidly. Woese painstakingly prepared radioactive

rRNAs from many diverse organisms, mostly microbes,

and compared their content of short patches of sequences,

fragments called oligonucleotides. The prevailing notion of

life’s evolutionary diversity at the time was framed in the context

of two kinds of organisms, procaryote or eucaryote. Consequently,

it was unexpected when the rRNA sequences from

diverse organisms fell into three, not two, fundamentally distinct

groups (Woese and Fox 1977). There had to be three

primary lines of evolutionary descent, phylogenetic “domains,”

now termed Archaea (formerly archaebacteria), (eu)Bacteria,

and Eucarya (eucaryotes; Woese et al. 1990). Woese’s 1977

paper reporting the discovery of Archaea sparked publicity and

controversy (Woese and Fox 1977). The concept of three primary

relatedness groups of life touched off a flurry of refutations

defending the procaryote–eucaryote or the five-kingdoms

notions to account for biological organization. These familiar

notions had never previously been tested, however, and the

analysis of rRNA sequences proved them fundamentally incorrect.

The shift in public and textbook treatment of the large-scale

organization of life is ongoing.

The Three Phylogenetic Domains of Life

Figure 5.1 is derived from a tree calculated using a particular

set of rRNA sequences (Barns et al. 1996). The figure is a

rough map of the course of evolution of the genetic core of

cells, the collection of genes that propagates replication and

gene expression. The dimension along the lines is sequence

change, not time. Estimated evolutionary change that separates

contemporary sequences (organisms) is read along line segments.
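Reading change "along line segments" amounts to summing branch lengths on the path that connects two tips of the tree. The sketch below assumes a small rooted tree stored as a child-to-parent table; the node names follow the three domains, but the branch lengths are invented for illustration and are not measured from figure 5.1.

```python
# Hypothetical toy tree: each node maps to (parent, branch length in
# changes per site). The lengths are invented for illustration.
TREE = {
    "Bacteria": ("root", 0.30),
    "AE_ancestor": ("root", 0.10),
    "Archaea": ("AE_ancestor", 0.15),
    "Eucarya": ("AE_ancestor", 0.45),
}


def path_to_root(tree: dict, node: str) -> dict:
    """Map each ancestor of `node` (and the node itself) to the summed
    branch length separating it from `node`."""
    lengths = {node: 0.0}
    total = 0.0
    while node in tree:
        parent, length = tree[node]
        total += length
        lengths[parent] = total
        node = parent
    return lengths


def patristic_distance(tree: dict, leaf_a: str, leaf_b: str) -> float:
    """Sum of branch lengths along the path joining two leaves."""
    up_a = path_to_root(tree, leaf_a)
    up_b = path_to_root(tree, leaf_b)
    # With non-negative branch lengths, the most recent common ancestor is
    # the shared ancestor that minimizes the summed path length.
    return min(up_a[n] + up_b[n] for n in up_a if n in up_b)


for pair in (("Archaea", "Eucarya"), ("Archaea", "Bacteria"), ("Eucarya", "Bacteria")):
    print(pair, round(patristic_distance(TREE, *pair), 2))
```

With these invented lengths, the two sister lineages are separated by more sequence change than one of them is from the outgroup, which is a reminder that the lines measure change, not time or closeness of ancestry.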

The “root” of the universal tree, the point of origin for

modern lineages, cannot be established using sequences of only

one type of molecule. However, phylogenetic studies of gene

families that originated before the last common ancestor of the

three domains have placed the root on the bacterial line (Gogarten

et al. 1989, Iwabe et al. 1989). This means that Eucarya

and Archaea had a common history that excluded the descendants

of the bacterial line. This period of evolutionary history

shared by Eucarya and Archaea was an important time in the

evolution of cells, during which the refinement of the primordial

information-processing mechanisms occurred. Thus,

78 The Origin and Radiation of Life on Earth

modern representatives of Eucarya and Archaea share many

properties that differ from bacterial cells in fundamental ways.

One example of similarities and differences is in the nature of

the transcription machinery. The RNA polymerases of Eucarya

and Archaea resemble each other far more than either resembles

the bacterial type of polymerase. Moreover, whereas

all bacterial cells use sigma factors to regulate the initiation of

transcription, eucaryal and archaeal cells use TATA-binding

proteins (Marsh et al. 1994, Rowlands et al. 1994). The shared

evolutionary history of Eucarya and Archaea suggests that we

may be able to recognize fundamental elements of our own

cells through study of the far simpler archaeal version.

The rRNA sequence information, along with other molecular

data, solidly confirms the century-old notion that mitochondria

and chloroplasts are derived from bacterial symbionts.

The sequence comparisons establish that mitochondria

are representatives of the Proteobacteria, the group

indicated by Escherichia and Agrobacterium in figure 5.1.

Chloroplasts derived from cyanobacteria, represented by

Synechococcus and Gloeobacter in figure 5.1. Thus, all of the

respiratory and photosynthetic capacity of eucaryotic cells was

obtained from bacterial symbionts. The nuclear component

of the modern eucaryotic cell did not derive from an ancient

bacterial or archaeal symbiosis, however. Molecular trees based

on rRNA and other reliable genes show unequivocally that the

Eucarya are as old as the Archaea.

Figure 5.1. Universal tree based on SSU rRNA sequences. Sixty-four rRNA sequences representative of all known phylogenetic domains were aligned, and a tree was produced with an ML method (Barns et al. 1996). That tree was modified, resulting in the composite one shown, by trimming and adjusting branch points to incorporate the results of other analyses. The scale bar corresponds approximately to 0.1 changes per nucleotide (Pace 1997).

The mitochondrion and chloroplast came in relatively late in the sense of sequence

change in rRNA, but early in the chronological history of life

(described below). This later evolution of the major organelles

is evidenced by the fact that mitochondria and chloroplasts

diverged from peripheral branches in the molecular trees (fig.

5.1). Moreover, the most deeply divergent eucaryotes in phylogenetic

trees even lack mitochondria. These latter kinds of

organisms, little-studied but sometimes troublesome anaerobic

creatures such as Giardia, Trichomonas, and Vairimorpha,

nonetheless contain at least a few bacteria-type genes (Sogin

and Silberman 1998). These genes may be evidence of an earlier

symbiosis that was lost, or perhaps a gene transfer event

between the evolutionary domains.

A Microbial World

A sobering aspect of large-scale phylogenetic trees such as that

shown in figure 5.1 is the graphical realization that most of

our knowledge in biological sciences has focused on but a

small slice of biological diversity. The organisms most represented

in our textbooks of biology, animals (Homo in fig.

5.1), plants (Zea), and fungi (Coprinus), constitute only peripheral

branches even of eucaryotic cellular diversity. Life’s

genetic diversity is mainly microbial in nature. Although the

biosphere is absolutely dependent on the activities of microorganisms,

our understanding of the makeup and natural

history of microbial ecosystems is, at best, rudimentary. One

reason for the paucity of information is that microbiologists

traditionally have relied on laboratory cultures for the detection

and identification of microbes. Yet, more than 99%

of natural microbes are not cultured using standard techniques.

Consequently, most environmental microbes have

remained largely unknown.

The development of cloning and sequencing technology,

coupled with the relational perspective afforded by phylogenetic

trees, made it possible to identify environmental microbes

without the requirement for culture (Pace 1997). The

occurrence of phylogenetic types of organisms, “phylotypes,”

and their distribution in natural communities can be surveyed

by sequencing rRNA genes obtained directly from

environmental DNA by cloning. This sidesteps the need to

culture organisms in order to learn something about them.

A sequence-based phylogenetic assessment of an uncultivated

organism can provide insight into many of the properties of

the organism through comparison with its studied relatives.

On the other hand, many of the phylotypes detected in the

environment have no close relatives in the culture collections,

so little can be inferred about the properties of the organisms

that correspond to the sequences. The sequences, however,

can be used to devise experimental tools, for instance,

molecular hybridization probes, that can be used to identify and

study the inhabitants of microbial ecosystems. Regardless of

the properties of the organisms they represent, the novel

rRNA sequences have provided additional perspective on the

topology of the universal tree. The following sections discuss

the evolutionary structures of the three domains.
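The comparison of an environmental sequence with its studied relatives can be caricatured as a search for the most similar reference sequence. The sketch below is a rough stand-in for a proper phylogenetic placement: it assumes pre-aligned sequences, uses simple percent identity, and applies an arbitrary cutoff (`min_identity`) below which the phylotype is reported as having no close cultured relative. The cutoff, the function names, and the sequences are hypothetical.

```python
def identity(a: str, b: str) -> float:
    """Fraction of aligned positions that match (sequences assumed pre-aligned)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def closest_relative(query: str, references: dict[str, str], min_identity: float = 0.90):
    """Return (name, identity) for the best-matching reference, or
    (None, identity) when even the best match falls below the cutoff."""
    best_name, best_score = max(
        ((name, identity(query, seq)) for name, seq in references.items()),
        key=lambda item: item[1],
    )
    if best_score < min_identity:
        return None, best_score
    return best_name, best_score


# Hypothetical reference sequences from cultured organisms.
cultured = {
    "cultured_1": "ACGTACGTAC",
    "cultured_2": "TTGTACCTAC",
}

print(closest_relative("ACGTACGTAA", cultured))  # close to a cultured organism
print(closest_relative("GGGGGGGGGG", cultured))  # no close cultured relative
```

When a close relative exists, its known properties suggest hypotheses about the uncultured organism; when none does, the sequence still marks a lineage that can be tracked with probes.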

Bacteria

Most knowledge of microorganisms has derived from the

study of only a few kinds of bacteria, mainly cultured organisms

and in the context of disease or industrial products. Any

general census of bacteria that make up naturally occurring

microbial communities was not possible until the development

of the molecular methods that identify rRNA sequence-based

phylotypes without culture. As rRNA sequences have

accumulated in the databases, now numbering more than

80,000, it is apparent that the heavily studied species represent

only a fraction of bacterial diversity.

The phylogenetic tree shown in figure 5.1 is based on a

calculated result with the sequences included. Trees inferred

with such a diversity of sequences can accurately

portray relationships between the domains, but the order

of branches within the domains is likely to be inaccurate

because of the small number of taxa selected for the analysis.

A summary of the results of tree calculations with different

methods and different suites of bacterial rRNA

sequences is diagrammed in figure 5.2 (Hugenholtz et al.

1998a). The wedges indicate the radiations of the major

clades, relatedness groups. These are termed “phylogenetic

divisions,” or “phyla.” The number of known bacterial divisions

has expanded substantially in recent years. The first

compilation, by Woese in 1987 (fig. 5.2 inset), could include

only about 12 divisions. About 40 such deeply related

groups of bacteria have now been identified by rRNA

sequences. Only about two-thirds of the bacterial divisions

have cultured representatives (filled wedges in fig. 5.2). The

remaining (open wedges) have been detected only in molecular

surveys of environmental rRNA genes. Organisms

that belong to these bacterial divisions without cultured

members sometimes are abundant in their respective environments,

and therefore, their activities are likely significant

in the local biogeochemistry. Sequences that identify

members of the WS6 division, for instance, are conspicuous

in hydrocarbon bioremediation sites and so likely are

important for that process (Dojka et al. 1998). OP11 sequences,

first detected in a hot spring in Yellowstone National

Park (Hugenholtz et al. 1998b), commonly are

abundant in anaerobic environments (J. K. Harris, S. T.

Kelley, and N. R. Pace, unpubl. obs.). The rRNA sequences

thus point to areas for investigation by microbiologists.

Phylogenetic analyses of available molecular sequences,

rRNA and protein, have failed to resolve convincingly any

specific branching orders of the bacterial divisions. Trees

produced using rRNA sequences (e.g., figs. 5.1 and 5.2) often

indicate that a few of the division lineages (e.g., Aquificales,

Thermotogales) branch more deeply than the main radiation,

but this is possibly an artifact of the high-temperature nature

of those organisms and their rRNAs. The base of the

bacterial tree is best seen as a polytomy, an expansive radiation

that is not resolved with the current data. It is possible

that future studies will draw together some of the groups that

now seem to constitute division-level diversity. An important

direction in this regard is the accumulation of additional

sequences, particularly those that represent the entire diversity

of the bacterial divisions. Broad taxon representation of

sequences is required to produce the most accurate phylogenetic

trees (Hillis 1998). Currently, however, as illustrated

in figure 5.3, most rRNA sequences are from only a few of

the bacterial divisions. Further environmental surveys with

molecular methods will be the most efficient way, possibly

the only way, to gather a broader information base on bacterial

diversity. It is also likely that genomic studies will contribute

to the resolution of the bacterial tree. For instance,

the common occurrence of gene families could be evidence

for a specific relationship between divisions that are not convincingly

relatives within the accuracy of the rRNA trees.

Although the understanding of the fine structure of the bacterial

tree will improve, the current picture of the base of the

tree as an expansive radiation of independent lines of genetic

descent is unlikely to change.

This overall structure of the bacterial phylogenetic tree

(fig. 5.2), a line of descent with no (surviving) branches and

then a burst of diversifying genetic lineages, is intriguing.

This evolutionary radiation surely was one of the great landmarks

in biology, and the consequences of that diversification

included profound modification of this planet, through

the metabolic activities of the resulting organisms. What

could have sparked such a spectacular radiation in the bacterial

tree? One possibility is that the expansive genetic differentiation

resulted when early life developed sufficient

sophistication that stable, independent lines of descent

could be established. Before that, the rudimentary nature

of biochemical processes may have precluded the establishment

of independent genetic lines of descent. Genes would

have been shared by communities of replicating entities.

Woese has discussed the transition between early biochemistry

and the establishment of the cellular lines of descent

as analogous to an annealing process (Woese 1998, 2000).

Initially, mutation rates and lateral transfer would have been

high. As increasingly complex and specific structures accumulated,

both mutation rates and lateral transfer would

have tapered off, and discrete genetic lines of descent could

be established.

Figure 5.2. Diagrammatic representation of the phylogenetic divisions of Bacteria. Phylogenetic trees containing sequences from the indicated organisms or groups of organisms, chosen to represent the broad diversity of Bacteria, were used as the basis of the figure. Wedges indicate that several representative sequences fall within the indicated depth of branching. Solid wedges are represented by cultured organisms. Open wedges are represented only by environmental sequences and are named after rRNA gene clone libraries (OP, WS, TM, OS). The smaller or larger areas of the sectors correspond to smaller or larger numbers of sequences available. The scale corresponds approximately to 0.1 changes per nucleotide (Hugenholtz et al. 1998a). The inset shows the bacterial tree of the 12 phylogenetic divisions known in 1987 (Woese 1987).

Archaea

In 1977, at the time of the recognition that archaeans are

fundamentally distinct from both bacteria and eucaryotes,

only a few species of those organisms had been cultured and

studied. The properties of these organisms seemed unusual.

Some of the cultured species were highly anaerobic methanogens,

using molecular hydrogen as an energy source and

respiring with carbon dioxide, to make methane. Others

thrived in saturated brine, for instance, Israel’s Dead Sea, and

produced a rhodopsin-like pigment akin to that in our own

eyes. A third type of what became known as members of

Archaea were acidophilic thermophiles, found in acidic geothermal

springs. Most examples of Archaea that have been

cultured since their recognition also have been obtained from

those environments. Consequently, archaeans popularly have

been considered restricted to environments that are “extreme”

by human standards. Molecular studies have shown,

however, that this perception is seriously distorted. Archaeal

rRNA genes belonging to uncultured organisms are widely

distributed in environments that are not necessarily extreme.

Our understanding of the structure of the archaeal phylogenetic

tree rests on only about 1000 rRNA sequences, about

half from cultured organisms and the others from environmental

surveys of rRNA genes. Relatively few environments

have been analyzed for Archaea, however, so the extent of

diversity that makes up that phylogenetic domain surely is

far broader than we know.

Figure 5.4 is a diagram of the known phylogenetic

makeup of the domain Archaea. There are two main relatedness

groups, Euryarchaeota and Crenarchaeota. A potential

third deeply divergent lineage of Archaea, Korarchaeota,

is represented only by environmental rRNA gene sequences,

so the status of this group needs to be tested and consolidated

by further studies of gene sequences and descriptions

of organismal properties (Barns et al. 1996). The branches

between these main evolutionary clades of Archaea are the

deepest within any of the three domains. The depth of separation

of Euryarchaeota and Crenarchaeota also is indicated

by many biochemical properties and genomic features. For

instance, even DNA is packaged differently in these two kinds

of organisms: euryarchaeotes use histones to package chromatin,

much as do eucaryotes, whereas crenarchaeal genomes

evidently lack histone genes (Pereira et al. 1997). The mode

of packaging DNA by the latter organisms is not known.

There are cultured representatives of most of the main

lineages of Euryarchaeota. Molecular analyses of environmental

sequences have revealed no new groups that diverge

deeply in the euryarchaeal tree. In contrast, most of the

known rRNA diversity of Crenarchaeota is known only from

environmental sequences. All cultured crenarchaea are thermophilic

and often are obtained from geothermal environments.

Figure 5.3. Phylogenetic distribution of SSU rRNA sequences > 500 nucleotides in length in the RDP-ARB database (http://rdp.cme.msu.edu/html/). Figure compiled by Kirk Harris.

The properties of these organisms did much to popularize

the notion of archaeans as exclusively “extremophiles.”

It came as a surprise, then, when abundant, phylogenetically

diverse crenarchaeal rRNA gene sequences were discovered

in more moderate habitats ranging from shallow and deep

marine waters, soils, sediments, and rice paddies, to symbionts

in some invertebrates (DeLong and Pace 2001). As

shown in figure 5.4, only one of the main relatedness groups

in Crenarchaeota is composed of named organisms. The

other groups consist of environmental organisms represented

only by sequences. These otherwise largely unknown organisms

are some of the most abundant creatures on Earth. In

the oceans, for instance, low-temperature crenarchaea occur

at concentrations of 10^7 to 10^8 cells per liter throughout the

water column at all latitudes, and typically constitute 20–50%

of the cells present. The niche in the global ecosystem that

these organisms fill is not known. Cultured crenarchaea commonly

use hydrogen as an energy source, and molecular

hydrogen is pervasive in the environment at very low levels

(Morita 2000). Perhaps the low-temperature crenarchaea tap

this ubiquitous fuel. Although low-temperature crenarchaea

have so far eluded pure culture for laboratory studies, recent

developments in genome science are being exploited to learn

more about them. Environmental DNA is cloned as large

pieces that can be linked together and sequenced to gain

further information on the organisms identified by the rRNA

sequences (DeLong et al. 1999).

Eucarya

Molecular evolutionary studies of eucaryotes have relied

generally on a sparse collection of gene sequences that do

not represent the full range of eucaryotic diversity in nature.

As shown in figure 5.1, the most diverse eucaryotic rRNA

sequences are derived from microbes. Yet, such organisms

are the least known of eucaryotes and have received the least

attention from molecular phylogenetic studies. More than

100,000 microbial eucaryotes, “protists,” have been described

(Patterson and Sogin 1993), but only a few thousand have

been investigated for rRNA sequence (Sogin and Silberman

1998). Moreover, as with the collection of bacterial rRNA

sequences, the collection of eucaryal sequences is heavily

biased toward only a few relatedness groups. The recent

addition of environmental rRNA gene sequences to phylogenetic

calculations has improved the resolution of the eucaryotic

tree by providing additional diversity (Dawson and

Pace 2002). A diagram that summarizes the phylogeny of the

eucaryotic taxonomic kingdoms from the rRNA perspective

is shown in figure 5.5. There is no convention for the taxonomic

organization of sequence-based relatedness groups of

eucaryotes. Based on various traditional or molecular classification

schemes, eucaryotes have been categorized into anywhere

from three to more than 70 major kingdoms. Eucaryal

sequences available in the databases fall into about 30 independent

relatedness clusters, the known kingdom-level relatedness

groups (Dawson 2000; not all shown in fig. 5.5).

From the perspective of rRNA sequences, the overall

topology of the eucaryal tree is seen as a basal radiation of

independent lines of descent, one of which gave rise to other

main lines, one of which culminated in the “crown radiation”

of the familiar taxonomic kingdoms such as animals, plants,

stramenopiles, and so forth (fig. 5.5). The specific positions

of intermediate branches in the rRNA tree are only approximate,

but the successive branching order is indicated by several

kinds of analyses (Dawson and Pace 2002, Sogin et al.

1989). The accuracy with which the kingdom-level lines can

be resolved will improve as the sequence collection available

for analysis grows.

Figure 5.4. Diagrammatic representation of the phylogeny of Archaea. Wedges indicate that several representative sequences fall within the indicated depth of branching. Names correspond to organisms or groups of organisms, or environmental clones (Dawson 2000).

This view of successive branching in the

eucaryotic tree contrasts with the results of some comparisons

of protein-encoding genes, with limited phylogenetic

representation (Philippe et al. 2000). Those results have been

interpreted to indicate that there is no particular branching

order, that the contemporary kingdom-level lines derive from

a single expansive radiation analogous to the bacterial radiation

(fig. 5.2). Proponents of this view have argued that

extensive sequence differences between basal-derived and

crown-group rRNA genes do not reflect great evolutionary

distances, but rather are a consequence of relatively rapid

evolution in the basal lines. Some of the environmental rRNA

gene sequences branch more deeply in the tree than the

crown radiation, however, and are not rapidly evolving lines.

These environmental sequences punctuate the long lines

between the crown and the previously identified basal divergences.

The occurrence of deeply divergent eucaryotic lines

with slow substitution rates (short lines) indicates that the high

rates (long lines) previously ascribed to the basal divergences

in rRNA trees are not the norm. Phylogenetic trees based on a

single gene, SSU rRNA in this case, of course cannot reflect

the genealogies of all the genes that specify organisms because

of the potential influence of lateral transfer. Genes with phylogenies

that are not congruent with the rRNA tree possibly

have undergone lateral transfer in their evolution.

The successive radiations of the main lines of descent are

significant landmarks in eucaryotic history. Correlation of

cellular properties or genomic sequences with rRNA trees

may provide clues regarding the biological innovations that

sparked these deep radiations. One noteworthy correlation

may be the phylogenetic distribution of the major organelles,

chloroplasts and mitochondria. All characterized representatives

of the basal lineages of eucaryotes lack mitochondria

and chloroplasts, whereas organisms of more peripherally

branching groups have those organelles. As diagrammed in

figure 5.6, the distribution of these organelles indicates that

much of the modern diversity of eucaryotes may have been

made possible by the metabolic power and light-harvesting

capacity of bacteria.

Time and the Tree of Life

Because sequences of genes change with time, it seems natural

to try to infer the times of branch points in evolutionary history

by the extents of sequence divergence between modern

genes. Indeed, molecular phylogenetic trees often are interpreted

in the context of time since the divergence of particular

branches. This simple correlation between time and sequence

change is not well founded, however, because different lines

of descent can change at different rates. This is seen in the

lengths of line segments (extents of sequence change) in

the three-domain tree in figure 5.1. Thus, lines leading to

modern-day members of Archaea are systematically short

compared with the lines leading to their sister group, modern

eucaryotes.

Figure 5.5. Schematic diagram of the evolution of Eucarya. The branch points of these kingdom-level groups are based on trees inferred with ED, MP, and ML and representative sequences. The areas of the wedges reflect nonlinearly the relative numbers of SSU rRNA sequences of these groups in GenBank. Groups named LEM, BOL, and BAQ are represented only by environmental rRNA gene clones (Dawson and Pace 2002).

Moreover, the rate of change in sequences is

not constant with time. This is seen in the mitochondria,

which have undergone many more sequence (and other)

changes than has their sister line in this tree, the line leading

to the proteobacterium Agrobacterium tumefaciens (fig. 5.1).

Thus, a sequence-based phylogenetic tree cannot be used to

date events unless the tree can be calibrated by correlating a

historical occurrence with some feature in the tree.
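The effect of unequal rates on dating can be illustrated with a small numeric sketch. The rates and the split time below are invented for illustration and are not measured from figure 5.1; the helper names `branch_length` and `naive_clock_age` are hypothetical. The sketch assumes two sister lineages that diverged at the same moment, then shows what a single "universal" rate would infer from each branch.

```python
# Toy illustration: two sister lineages split at the same time but
# accumulate sequence change at different rates.
# All numbers are invented for illustration, not taken from figure 5.1.

def branch_length(rate_per_byr: float, elapsed_byr: float) -> float:
    """Expected sequence change (substitutions per site) along one lineage."""
    return rate_per_byr * elapsed_byr


def naive_clock_age(length: float, assumed_rate: float) -> float:
    """Age implied by a branch length if one universal rate is assumed."""
    return length / assumed_rate


TRUE_SPLIT_BYR = 2.0  # hypothetical true age of the split
SLOW_RATE = 0.05      # hypothetical rate of the slowly changing lineage
FAST_RATE = 0.20      # hypothetical rate of the rapidly changing lineage

slow_len = branch_length(SLOW_RATE, TRUE_SPLIT_BYR)
fast_len = branch_length(FAST_RATE, TRUE_SPLIT_BYR)

# Calibrate the clock on the slow lineage, then apply it to both branches.
age_from_slow = naive_clock_age(slow_len, SLOW_RATE)
age_from_fast = naive_clock_age(fast_len, SLOW_RATE)

print(f"slow branch: length {slow_len:.2f}, inferred age {age_from_slow:.1f} Byr")
print(f"fast branch: length {fast_len:.2f}, inferred age {age_from_fast:.1f} Byr")
```

Under these assumed numbers the fast branch appears four times older than the slow one even though both date from the same split, which is why a branch length alone does not yield a date without an external calibration.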

The deep evolutionary branches that gave rise to the

phylogenetic domains blur into the origin of life, and their

subbranches probably arose early as well. A geological

and biological correlation that may estimate one time point

in the Tree of Life is the occurrence of molecular oxygen and

the phylogenetic radiation of the only organisms that produce

oxygen, the cyanobacteria. Although oxygen did not

become abundant until 2–2.5 billion years (Byr) ago, there

is evidence for oxidized iron in 3.5-Byr-old rocks (Sleep

2002). The occurrence of stromatolites in those rocks indicates

that complex microbial communities had developed by

that time. Moreover, the shapes of ostensible microfossils in

cherts of the same age are proposed to resemble morphologically

conspicuous, modern-day cyanobacteria (Schopf

1994). This evidence of early oxygen, bolstered by the fossil record,

suggests that the cyanobacterial radiation (indicated by

Gloeobacter, Synechococcus, and chloroplast in fig. 5.1) had

already occurred by 3.5 Byr ago. The main bacterial divergences

must have occurred even before the time of the

cyanobacterial radiation. Because the phylogenetic line that

led to chloroplasts originated at the base of the cyanobacterial

radiation, it seems likely that chloroplasts, as well, were derived

early. The branch point of a mitochondrial lineage from

proteobacteria is consistent with the early appearance of that

organelle, too. Therefore, the modern kind of eucaryotic cell,

with organelles, probably also arose early, more than 3.5 Byr

ago. The eucaryotic nuclear line of descent is even more

ancient, as old as the archaeal line.
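One step of this argument, that branch points ancestral to a dated radiation must be at least as old as that radiation, can be sketched as a propagation of minimum ages up a tree. The topology below is a deliberately simplified, hypothetical stand-in for the bacterial part of figure 5.1, and the node labels and the function `propagate_min_ages` are illustrative assumptions rather than anything defined in the text.

```python
# Minimal sketch: a calibrated node sets a minimum age for all its ancestors.
# The topology and node labels are hypothetical simplifications.

PARENT = {
    "chloroplast_line": "cyanobacterial_radiation",
    "Gloeobacter": "cyanobacterial_radiation",
    "Synechococcus": "cyanobacterial_radiation",
    "cyanobacterial_radiation": "bacterial_divergences",
    "proteobacteria": "bacterial_divergences",
    "bacterial_divergences": "root",
}


def propagate_min_ages(parent: dict, calibrations: dict) -> dict:
    """Return minimum ages (Byr) for calibrated nodes and their ancestors.

    An ancestor cannot be younger than any of its descendants, so each
    calibration is pushed upward toward the root.
    """
    min_age = dict(calibrations)
    for node, age in calibrations.items():
        current = parent.get(node)
        while current is not None:
            if min_age.get(current, 0.0) < age:
                min_age[current] = age
            current = parent.get(current)
    return min_age


# Single calibration taken from the text: cyanobacterial radiation by 3.5 Byr ago.
ages = propagate_min_ages(PARENT, {"cyanobacterial_radiation": 3.5})
for node, age in ages.items():
    print(f"{node}: at least {age} Byr old")
```

The rule constrains only ancestors of the calibrated node. It says nothing by itself about descendants such as the chloroplast line; the text's inference that chloroplasts were derived early rests on the additional observation that their line originates at the base of the cyanobacterial radiation.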

Conclusion and Prospects

The general outlines of a universal phylogenetic tree are now

in place. It is clear, however, that it incompletely portrays

the breadth of biological diversity. A main reason is that

our understanding of microbial diversity

is rudimentary. Molecular studies of environmental organisms

continue to reveal major relatedness groups that

were not suspected. Are there still other primary domains to

be discovered? Perhaps. The methods used to hunt organisms

in the environment are heavily dependent on the microbial

diversity that we already know about. Are there other

new bacterial divisions and eukaryotic kingdoms to be discovered?

Almost certainly. Even the limited studies of microbial

ecosystems so far have turned up remarkable novelty,

and the complexity of those ecosystems indicates that much

broader diversity will be encountered.

The complexity of the microbial world does not fit well

into the call of many biologists to enumerate all of Earth’s

species. Microbial diversity is too broad, far too complex to

be accommodated by species counts. On the other hand, a

sampling and an articulation of the extent of cellular diversity

can be accomplished by sequence surveys of environmental

rRNA genes. The sequences reflect the kinds of organisms that

they represent, and the frequencies of the phylotypes are a

rough census of the microbial world. An expanded sequence

representation of life’s diversity also will afford more accurate

molecular phylogenetic reconstructions and bring us to a closer

understanding of our earliest beginnings.
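The idea of phylotype frequencies as a rough census can be sketched in a few lines. The clone assignments below are made-up placeholder data, and `phylotype_census` is a hypothetical helper; the sketch assumes each environmental rRNA gene clone has already been assigned to a phylotype by some upstream phylogenetic analysis.

```python
from collections import Counter


def phylotype_census(assignments: list) -> dict:
    """Return the relative frequency of each phylotype in a clone library."""
    counts = Counter(assignments)
    total = sum(counts.values())
    # most_common() orders phylotypes from most to least abundant.
    return {phylotype: n / total for phylotype, n in counts.most_common()}


# Made-up clone library: one phylotype label per sequenced rRNA gene clone.
clones = ["Proteobacteria"] * 5 + ["Cyanobacteria"] * 3 + ["OP11"] * 2

census = phylotype_census(clones)
for phylotype, freq in census.items():
    print(f"{phylotype}: {freq:.0%} of clones")
```

Such frequencies describe the clone library rather than the community directly, which is consistent with the text calling the result a rough census.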

Dedication

This article is dedicated to Roy Chapman Andrews, who knew

that there were things to be discovered; and to the American

Museum of Natural History, which gave him the opportunity to

go find them.

Acknowledgments

I thank colleagues in my lab for comments that improved this

article. My research activities are supported by the National

Institutes of Health, the National Science Foundation, and the

NASA Astrobiology Institute.

Figure 5.6. Possible pattern of eukaryotic rRNA diversification. The diagram shows the pattern of eukaryotic evolution and the incorporation of the major organelles, chloroplasts and mitochondria. As described in the text, the organelles would have been in place more than 3.5 Byr ago. [Diagram labels: "Crown"; Mitochondria and Chloroplasts ?; Archaea; Bacteria]

Literature Cited

Barns, S. M., C. F. Delwiche, J. D. Palmer, and N. R. Pace. 1996.

Perspectives on archaeal diversity, thermophily and

monophyly from environmental rRNA sequences. Proc.

Natl. Acad. Sci. USA 93:9188–9193.

Dawson, S. C. 2000. Evolution of the Eucarya and Archaea:

perspectives from natural microbial assemblages. Thesis,

University of California, Berkeley.

Dawson, S. C., and N. R. Pace. 2002. Novel kingdom-level

eukaryotic diversity in anoxic environments. Proc. Natl.

Acad. Sci. USA 99:8324–8329.

DeLong, E. F., and N. R. Pace. 2001. Environmental diversity

of Bacteria and Archaea. Syst. Biol. 50:470–478.

DeLong, E. F., C. Schleper, R. Feldman, and R. V. Swanson.

1999. Application of genomics for understanding the

evolution of hyperthermophilic and nonthermophilic

Crenarchaeota. Biol. Bull. 196:363–366.

Dojka, M. A., P. Hugenholtz, S. K. Haack, and N. R. Pace. 1998.

Microbial diversity in a hydrocarbon- and chlorinated-solvent-contaminated

aquifer undergoing intrinsic bioremediation.

Appl. Environ. Microbiol. 64:3869–3877.

Gogarten, J. P., H. Kibak, P. Dittrich, L. Taiz, E. J. Bowman,

B. J. Bowman, M. F. Manolson, R. J. Poole, T. Date,

T. Oshima, J. Konishi, K. Denda, and M. Yoshida. 1989.

Evolution of the vacuolar H+-ATPase: implications for the

origin of eukaryotes. Proc. Natl. Acad. Sci. USA 86:6661–6665.

Hillis, D. M. 1998. Taxonomic sampling, phylogenetic accuracy,

and investigator bias. Syst. Biol. 47:3–8.

Hugenholtz, P., B. M. Goebel, and N. R. Pace. 1998a. Impact of

culture-independent studies on the emerging phylogenetic

view of bacterial diversity. J. Bacteriol. 180:4765–4774.

Hugenholtz, P., C. Pitulle, K. L. Hershberger, and N. R. Pace.

1998b. Novel division level bacterial diversity in a Yellowstone

hot spring. J. Bacteriol. 180:366–376.

Iwabe, N., K. Kuma, M. Hasegawa, S. Osawa, and T. Miyata.

1989. Evolutionary relationship of archaebacteria,

eubacteria, and eukaryotes inferred from phylogenetic

trees of duplicated genes. Proc. Natl. Acad. Sci. USA

86:9355–9359.

Koonin, E. V., K. S. Makarova, and L. Aravind. 2001. Horizontal

gene transfer in prokaryotes: quantification and classification.

Annu. Rev. Microbiol. 55:709–742.

Marsh, T. L., C. I. Reich, R. B. Whitelock, and G. J. Olsen.

1994. Transcription factor IID in the Archaea: sequences in

the Thermococcus celer genome would encode a product

closely related to the TATA-binding protein of eukaryotes.

Proc. Natl. Acad. Sci. USA 91:4180–4185.

Morita, R. Y. 2000. Is H2 the universal energy source for long-term

survival? Microb. Ecol. 38:307–320.

Pace, N. R. 1997. A molecular view of microbial diversity and

the biosphere. Science 276:734–740.

Patterson, D. J., and M. L. Sogin. 1993. Eukaryote origins and

protistan diversity. Pp. 13–46 in The origin and evolution

of prokaryotic and eukaryotic cells (H. Hartman and

K. Matsuno, eds.). World Scientific, River Edge, NJ.

Pereira, S. L., R. A. Grayling, R. Lurz, and J. N. Reeve. 1997.

Archaeal nucleosomes. Proc. Natl. Acad. Sci. USA

94:12633–12637.

Philippe, H., P. Lopez, H. Brinkmann, K. Budin, A. Germot,

J. Laurent, D. Moreira, M. Muller, and H. Le Guyader. 2000.

Early-branching or fast-evolving eukaryotes? An answer

based on slowly evolving positions. Proc. R. Soc. Lond. B

267:1213–1221.

Rowlands, T., P. Baumann, and S. P. Jackson. 1994. The TATA-binding

protein: a general transcription factor in eukaryotes

and archaebacteria. Science 264:1326–1329.

Schopf, J. W. 1994. The oldest known records of life: early

archaean stromatolites, microfossils, and organic matter.

Pp. 193–207 in Early life on Earth (S. Bengtson, ed.).

Columbia University Press, New York.

Sleep, N. 2002. Oxygenating the atmosphere. Nature 410:317–

319.

Sogin, M. L., J. H. Gunderson, H. J. Elwood, R. A. Alonso, and

D. A. Peattie. 1989. Phylogenetic meaning of the kingdom

concept: an unusual ribosomal RNA from Giardia lamblia.

Science 243:75–77.

Sogin, M. L., and J. D. Silberman. 1998. Evolution of the

protists and protistan parasites from the perspective of

molecular systematics. Int. J. Parasitol. 28:11–20.

Swofford, D. L., G. J. Olsen, P. J. Waddell, and D. M. Hillis.

1996. Phylogenetic inference. Pp. 407–514 in Molecular

systematics (D. M. Hillis, C. Moritz, and B. K. Mable, eds.).

Sinauer Associates, Sunderland, MA.

Woese, C. R. 1987. Bacterial evolution. Microbiol. Rev. 51:221–

271.

Woese, C. R. 1998. The universal ancestor. Proc. Natl. Acad.

Sci. USA 95:6854–6859.

Woese, C. R. 2000. Interpreting the universal phylogenetic tree.

Proc. Natl. Acad. Sci. USA 97:8392–8396.

Woese, C. R., and G. E. Fox. 1977. Phylogenetic structure of the

prokaryotic domain: the primary kingdoms. Proc. Natl.

Acad. Sci. USA 74:5088–5090.

Woese, C. R., O. Kandler, and M. L. Wheelis. 1990. Towards a

natural system of organisms: proposal for the domains

Archaea, Bacteria, and Eucarya. Proc. Natl. Acad. Sci. USA

87:4576–4579.