6 Bacteria and Archaea
W. Ford Doolittle
The Triumph of Molecular Phylogeny
The collection of chapters in this volume and the symposium
for which they were assembled celebrate one of the signal
achievements of 20th century biology: the integration of
molecular sequence analyses with more traditional comparative
and paleontological approaches in the construction of a
universal Tree of Life. Integration is one of the key words here.
Without molecular data, we would still find it easy to tell
birds from bees or to distinguish any bird or bee from broccoli,
brewer’s yeast, or bacteria. But we would have no strong
basis for deciding, as we have (see Baldauf et al., ch. 4 in this
vol.), that all birds and bees are closer kin to yeast than to
broccoli. Nor would we have much reason to be as confident
as we are that, despite the manifest differences in size, shape,
and lifestyle, organisms in the first four groups—all eukaryotes,
with nucleated cells—share a common ancestor with
the nonnucleated prokaryotes (Bacteria and Archaea). For
all the very deep branchings, only molecular data—in the
form of DNA or protein sequence, or sometimes three-dimensional
protein structure—can provide unarguable evidence
for common ancestry and define lines of descent.
Unarguable is another key word. Of course, biologists
have never been at a loss for theories about how one type of
living thing might be evolutionarily related to another, and
what features might be important for deciding this. I remember
being taught in high school that brewer’s yeast and other
fungi were really a complex kind of bacterium, because of
their shared absorptive mode of taking nutrients, cell walls,
and general cellular simplicity, for instance. By the time I
started college, this view had been replaced by the synthesis
known as Whittaker’s Five Kingdoms (Animals, Plants,
Fungi, Protozoa, and Bacteria each as separate assemblages).
Such theories were always fluid and arguable, because there
were few commonly agreed upon grounds for formulating
or proving them. One difficulty was in knowing which shared
features are truly homologous (similar because they derive
from such a feature in a common ancestor) and which are
analogous (independently evolved for similar purposes, e.g.,
the wings of birds and bats, or the aquatic habits of fishes
and whales). Claims for evolutionary relatedness can only
be made on the basis of homologous traits. Another difficulty
was in converting data about shared features (if homologous)
into quantifiable measures of overall organismal
similarity. How do we combine data about biochemical
pathways, cellular ultrastructure, and behavior, which are
so profoundly different in quality, into a single quantity
measuring relatedness?
Molecular sequence data, at least at first blush, obviate
both problems. There are 20^100 possible proteins 100 amino
acids long. Anything more than about 15% sequence identity
between two proteins cannot be mere coincidence and
is unlikely to be the result of evolution independently rediscovering
the same solution twice (convergence), because one
of evolutionary biology’s best-learned lessons is that there are
many different ways to solve the same challenge. So significant
sequence similarity can be taken as significant evidence
of homology. It is also eminently quantifiable: we have only
to line two sequences up so as to optimize the match, and
count the identical amino acids (or nucleotides, for an RNA
or DNA sequence). These advantages of molecular sequence
data were first recognized by Emile Zuckerkandl and Linus
Pauling, whose 1965 papers founded the now flourishing
discipline of molecular phylogeny (Zuckerkandl and Pauling
1965). Further, Zuckerkandl and Pauling argued that gene
sequence data (or its direct read-out in RNA or protein sequence)
deserve our attention more than features of organismal
form and function, because they are more fundamental. DNA
sequence determines organismal form and function, and not
the other way round. Indeed, the latter contain no evolutionary
information that is not encoded in the former.
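The counting step described above can be sketched in a few lines of Python. This is only an illustration: it assumes the two sequences have already been aligned (the match-optimizing step is not shown), the function name `percent_identity` and the toy fragments are invented here, and gaps are assumed to be written as "-".

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identical positions between two pre-aligned sequences.

    Columns in which either sequence has a gap ('-') are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            continue  # gap column: nothing to compare
        compared += 1
        if x == y:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Number of distinct 100-residue proteins over a 20-letter amino acid alphabet.
SEQUENCE_SPACE = 20 ** 100

# Toy aligned fragments, made up for illustration (not real proteins).
seq_a = "MKT-AYIAKQR"
seq_b = "MKTLAYLAKQ-"
identity = percent_identity(seq_a, seq_b)
print(f"{identity:.1f}% identity over ungapped columns")
print(f"20^100 has {len(str(SEQUENCE_SPACE))} decimal digits")
```

The size of that sequence space is the reason chance agreement well above the 15% mark is so implausible.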
Implicit in Zuckerkandl and Pauling’s arguments, and
embodied in the molecular phylogenetic work they inspired,
was the assumption that, in picking a gene to do phylogeny
with, all we needed to worry about was the ease with which
it (or its RNA or protein product) can be isolated and sequenced,
and the breadth of its distribution. (Hemoglobins
are marvelous for doing vertebrate phylogeny, but plants and
bacteria don’t have them.) What we didn’t have to concern
ourselves with was the possibility that different genes in a
genome might have different phylogenetic histories. This assumption
is depicted in figure 6.1A and could be summarized
as individual gene trees = genome tree = organism tree.
Carl Woese made something like this assumption near the
end of the 1960s, when he chose small subunit (SSU) ribosomal
RNA (rRNA) as a “universal molecular chronometer,”
a stand-in for all genes. SSU rRNA was one of the few ubiquitously
distributed gene products that could be easily isolated
and (at least partially) sequenced at that time (Woese
1987). It still would be one of the best all around choices
(see Pace, ch. 5 in this vol.).
Ironically, a strong violation of the principle illustrated
by figure 6.1A was proposed by Lynn Margulis, at very nearly
the same time, and provided one of the first hypotheses about
deep phylogeny that the infant discipline of molecular phylogeny
could cut its teeth on (Margulis 1970). She dusted
off and made modern the endosymbiont hypothesis for the
origin of chloroplasts and mitochondria, first proposed by
Mereschkowsky in the late 19th century. According to this
notion, these energy-generating organelles (the first responsible
for photosynthesis in all plants and algae, and the second
for respiration in almost all eukaryotes) were once
free-living bacteria that had become trapped in the cytoplasm
of ancient eukaryotic cells, as permanent endosymbionts
(fig. 6.2). In this sheltered and nutrient-rich environment,
many genes useful only for independent life were lost,
whereas many producing proteins still needed for photosynthesis
or respiration were transferred to the nucleus (so that
their products would thenceforth have to be transported back
into the organelle). A few genes were retained on the tiny
residual genomes found in mitochondria and plastids, however,
and these could unequivocally be used to trace the
evolutionary origins of these organelles.
Figure 6.1. Three models for the relationships between organismal, genome, and gene phylogenies, for four imaginary species
(labeled A, B, C, and D). (A) shows the “standard model”: no genes are exchanged between genomes, so the gene complements of any
genome can change only through loss of genes or duplication of genes, followed by divergence in sequence and function. (B) shows
the “stable core”: some, possibly even most, genes can be exchanged between genomes over evolutionary time, but a core of genes is
immune to this process, and the (congruent) phylogenies of these genes can be used to trace organismal phylogeny, and construct the
true Tree of Life. In (C), the “shifting core” model, no two genes need have the same phylogeny throughout all of life’s history.
Nevertheless, within restricted regions of the tree, most genes might evolve in a coherent fashion, showing congruent phylogenies.

Among such retained genes were those for organellar
versions of SSU rRNA. By the mid 1970s, several groups had
shown that chloroplast and mitochondrial SSU rRNA genes
were indeed of independent bacterial origin (cyanobacteria
and α-proteobacteria, respectively), exhibiting phylogenies
clearly different from each other (Gray and Doolittle 1982).
More to the point, their phylogenies also differed from that
of the nuclear-gene-encoded SSU rRNA of cytoplasmic ribosomes—
a marker for the evolutionary history of the protoeukaryotic
host that first harbored the symbionts (fig. 6.2).
So this very important idea about cellular evolution was also
the first serious counterexample to the assumption that all
of an organism’s genes should have the same phylogeny.
Indeed, it was the fact that they don’t that proved the endosymbiont
hypothesis.
In the rest of this chapter I show that there are very
many other genes like this, genes that show different phylogenies
from SSU rRNA and from each other (and have
nothing to do with the endosymbiont hypothesis). Within
the prokaryotic domains (Bacteria and Archaea), in particular,
much coding DNA can be and demonstrably has been
exchanged across species, genus, phylum, or even domain
boundaries—so many genes, indeed, that the pattern of
relationships defined by SSU rRNA genes may not be exhibited
by the majority of the genes in any genome. For
prokaryotes, the appropriate model for typical relationships
between gene phylogenies might look more like B or C than
A in figure 6.1. This is probably not so much a problem for
eukaryotes, especially complex multicellular ones, and I will
confine myself to the topic assigned me, Bacteria and
Archaea. But because there seems to be so much gene sharing
between the two, my title might more appropriately have
been Bacteriandarchaea.
None of this necessarily means that Darwin was fundamentally
wrong, or that the concept of a unique and universal
organismal Tree of Life is passé, or that (if certain
assumptions hold) rRNA does not track this tree best. But
there is not a unique universal genomic tree, and we need to
develop more sophisticated (but also much more interesting
and exciting) ways of thinking about what we mean by
the Tree of Life.
Superbugs, Drugs, and Lateral Gene Transfer
The mid 1960s also saw the discovery of lateral gene transfer
(LGT), the process (or rather, collection of processes)
underlying microbial gene sharing. Infectious disease microbiologists,
mostly in the United States and Japan, found that
the rapid rise of resistance to commonly (and often excessively)
used antibiotics among human pathogens (especially
in hospitals) was not due to the expected Darwinian mechanism
of random mutation followed by natural selection
(Falkow 1975). Instead, genes determining resistance to
antibiotics (by a variety of mechanisms) had been recruited
from preexisting natural reservoirs and were being passed
around among pathogens on small circular DNA molecules
(plasmids), themselves well adapted to spreading infectiously
between bacterial species (fig. 6.3). Selection is still involved—
pathogens receiving the resistance-conferring plasmids produce
more progeny because they have them. So the process
is Darwinian. But it was not mutations occurring within genes
within species, but whole genes (or suites of genes) transferred
across species boundaries, on which selection was
acting. Indeed, we now know that plasmids can carry several
different genes for resistance to several different kinds
of antibiotics simultaneously, and that special mechanisms
and genetic devices (insertion sequences, transposons, and
integrons, to give some names) have evolved to facilitate the
assembly and transmission of such genes (Bushman 2002).
We were also soon to learn that antibiotic resistance determinants
were not the only kinds of coding sequences that
plasmids could carry. Clusters of several genes involved in
the synthesis of unusual and inessential metabolites or the
degradation of unusual and rarely available substrates were
also exchanged in this way. Two Canadians (Sorin Sonea and
Maurice Panniset) and an Australian (Darryl Reanney) soon
constructed a bold if inchoate theory on this foundation
(Sonea and Panniset 1976, Reanney 1976). They asserted that
because of between-species gene transfer—mediated not only
by plasmids but also by bacterial viruses (phages) and
through cell-to-cell contact (conjugation) or DNA uptake
(transformation)—all bacteria might be viewed as one species,
responding to environmental challenges (over evolutionary
time) as a single “global superorganism.” As I recall it,
these claims were widely dismissed during the 1970s and
1980s: they were so hopelessly radical! Most of the genes
then known to be transferred by plasmids could be viewed
as somehow “specialized” and, under most circumstances,
dispensable. Genes for core informational functions (replication,
transcription, and translation) were not known to be
subject to LGT, nor were genes of basic and widely conserved
metabolic pathways. So LGT was seen as a genetic add-on,
not a fundamental evolutionary force. It might even have
appeared on the scene recently, as the microbes’ way of coping
with human activity, namely, antibiotic use and the flooding
of microbial environments with many unusual pollutants,
some highly toxic but some of novel nutritional value (for
bacteria).

Figure 6.2. The endosymbiont hypothesis for the origin of
mitochondria. A respiring α-proteobacterium was acquired by a
nonrespiring host (the protoeukaryote) as an endosymbiont,
conferring the benefits of respiration (efficient metabolism). The
endosymbiont lost genes needed for independent growth and
transferred many other genes to the nucleus. A small mitochondrial
genome (sometimes only a dozen genes) remains in the
organelle. A similar hypothesis would have chloroplasts derive
from cyanobacteria (blue-green algae). Both hypotheses are
considered proven (Gray and Doolittle 1982).

Pathogenicity (and Other) Islands
As we acquired the ability to characterize and especially to
sequence longer and longer stretches of DNA, however, we
could begin to see that still much more complex genetic
packages could be delivered across species boundaries by
LGT. And chromosomes as well as plasmids could harbor the
transferred genes. In particular, pathogenic bacteria often
differ from harmless relatives by the possession of large functionally
specialized clusters, called pathogenicity islands,
some containing more than 100 genes (Hacker and Kaper
2000). These include virulence factors of many sorts, facilitating
survival within, protection from, or attack on the host,
as well as genes promoting the islands’ transfer as units. Often,
pathogenicity islands are inserted within a particular type
of chromosomal sequence (a gene for transfer RNA) and have
different compositional characteristics (relative composition
of G, C, A, and T) than the surrounding genes (fig. 6.3). Most
cogently, the genes of which they are composed may be found
in very similar form in very distantly related bacterial (or even
archaeal) genomes, but not in the pathogen’s closest relatives.
Clearly, they have been transferred into the genomes in which
we find them, although we don’t generally know the transfer
mechanism. So, very complex and important (for bacteria
and for us) suites of biochemical/physiological/behavioral
characteristics can be acquired in “one fell swoop” by LGT.
And recently, we’ve come to realize that there are also “symbiosis
islands” (promoting cooperation with hosts), “saprophytic
islands” (facilitating decay), and “ecological islands”
(metabolism in unusual circumstances).
Genomic Diversity: The Iceberg of Which
Phylotypic Diversity Is but the Tip
Still, resistance factors and complex multigene determinants
of interactions (benign or malign) with hosts and environments
might be seen as “specialized.” Surely, they constitute
no serious threat to our understandings of the evolutionary
histories of the everyday genes comprising the bulk of most
genomes, or to our ability to reconstruct the universal tree
using a nontransferable marker, like SSU rRNA.
Genomics and, in particular, the appearance of complete
bacterial and archaeal genomic sequences now call even this
view into question. More than 100 such sequences will soon
be publicly available, and these will demolish the notion that
genomes in general contain just a few genes (or gene clusters)
of foreign origin, and these only for specialized functions.
Particularly striking are the comparisons that can be
drawn between different isolates of the very same bacterial
species. Consider for instance Escherichia coli, the laboratory
workhorse of molecular biologists and biotechnologists for
the last five decades. The complete genome sequence of K12,
their favorite strain, was reported in 1997 (Blattner et al.
1997). Many of its 4405 genes were already familiar from
genetic experiments or piecemeal gene sequencing studies.
The community therefore thought that it had this species
under wraps, genomically—until four years later, when the
genome of another E. coli isolate, O157:H7, was completed
(Perna et al. 2001). This is the strain that first attracted popular
attention in 1993 through the death of three young customers
of a fast-food restaurant in California, and two years
ago killed seven people who drank from contaminated wells in
Ontario. The sequencing showed that it has 1387 genes that
K12 doesn’t have, whereas K12 itself has 528 genes not found
in O157:H7—numbers corresponding to 26% of the genome
of O157 and 12% of K12’s. Many of these differences can
only be explained by LGT, verifiable through similarity to
homologous genes in evolutionarily distant bacteria (or even
archaea) and, most persuasively, through the construction
of phylogenetic trees for each gene. These many differences
are also clearly the consequence of many different LGT
events, not just the acquisition of a few large pathogenicity
islands. In fact there are 177 physically separated “O islands”
(genes or gene clusters present in O157 but not K12) and about
234 “K islands.” Although many of the strain-specific genes of
O157:H7 are likely to be specialized determinants of virulence,
many are not. They encode seemingly pedestrian microbial
functions (e.g., carbohydrate transfer, glutamate fermentation,
or aromatic compound degradation).

Figure 6.3. Bacterial antibiotic resistance genes found on
plasmids have been the major cause of the rise in drug-resistant
“superbugs.” Their spread is one form of LGT. Also, genes for
many functions related to pathogenicity are clustered in
transferable regions of bacterial chromosomes.

Preliminary data for other E. coli strains show the O157:H7
versus K12 difference to be typical, not aberrant. Similar studies
based on similar information on other pathogens produce
similar results. Strains of the same “species” often differ from
each other by up to 25% in gene content. Simple logic (with
the assumption that, on average, bacterial genomes are getting
neither larger nor smaller) dictates that about half of this
difference can be attributed to acquisition of new genes by one
or the other strain, after their joint separation from a common
ancestor. (The other half could be explained by loss, from one
or the other strain, of genes present in that ancestor.)
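The percentages quoted for the two E. coli strains, and the "about half" argument, can be checked with a little arithmetic. The sketch below uses only the gene counts given in the text. The O157:H7 total is not stated there, so it is derived here under the assumption that every gene not strain-specific has exactly one counterpart in the other strain; the variable names are illustrative.

```python
# Gene counts quoted in the text.
k12_total = 4405   # genes in the K12 genome
o157_only = 1387   # genes in O157:H7 with no counterpart in K12
k12_only = 528     # genes in K12 with no counterpart in O157:H7

# Assumption: shared genes correspond one-to-one between the strains.
shared = k12_total - k12_only
o157_total = shared + o157_only  # derived, not quoted in the text

frac_o157 = o157_only / o157_total
frac_k12 = k12_only / k12_total

# If genomes are, on average, neither growing nor shrinking, roughly half
# of the strain-specific genes reflect gain and the other half loss.
acquired_estimate = (o157_only + k12_only) / 2

print(f"strain-specific share of O157:H7: {frac_o157:.0%}")
print(f"strain-specific share of K12:     {frac_k12:.0%}")
print(f"genes attributable to acquisition: about {acquired_estimate:.0f}")
```

The derived shares round to the 26% and 12% given above, which suggests the one-to-one assumption is close to what the original comparison used.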
We know about genomic variability in pathogens because
it is easy to obtain funding to study the biology of pathogens.
Data on nonpathogens are scant. Recently Camilla Nesbø in
my lab, with Karen Nelson at The Institute for Genomic
Research, has been looking at genomic diversity within
Thermotoga maritima, a nonpathogen par excellence. This
hyperthermophilic bacterium grows best at 80°C and was
isolated from the seafloor in a geothermal area near Vulcano,
Italy. Preliminary data suggest that here, too, there will be
something like 20% variability in gene content, between otherwise
very similar isolates. If this turns out to be generally
true for “environmental microbes” (including Archaea), then
we cannot explain away within-species genomic variation as
a by-product of intense host–parasite warfare: we must accept
it as a fact of prokaryotic life. We must also accept, then,
that the microbial world is even more wildly diverse than those
who use “phylotyping” (amplification and sequencing of SSU
rRNA genes from environmental DNA samples; see Pace,
ch. 5 in this vol.) have already told us. Such studies have revealed,
through a plethora of new twigs on the branches of
the SSU rRNA tree, a hitherto unimaginable diversity of relatives
of known groups. They have also led to the discovery of
completely new groups, without previously known relatives.
For each isolate identified by a single SSU rRNA sequence
(“phylotype”), however, there may now be many more genomic
variants, differing in their content of truly different
(nonhomologous) genes by more than, say, the genomes of all
the animals. (Animals do, of course, vary in gene content, but
through duplication and functional divergence of genes they
already have, or through gene loss—scarcely ever through
the introduction of genuinely novel genes by LGT.)
How Much Exchange over Life’s Whole History?
There is no easy way to know how old any bacterial species
is, or (which is almost the same question) how long strains
within a species have been diverging—and surely there is no
uniform age. Howard Ochman and Isaac Jones estimate that
various E. coli strains began to diverge about 25–40 Myr
(million years) ago, based on an often quoted but largely
unverified estimate of the divergence of Escherichia from
Salmonella at 100–150 Myr ago (Ochman and Jones 2000).
In contrast, Yersinia pestis, the cause of plague, may be only
a few thousand years old (Achtman et al. 1999)! But however
ancient bacterial species in general may be, their ages
will be dwarfed by that of life itself. So, if 10–20% of a genome
can “turn over” because of LGT and gene loss within
(generously) 100 Myr, what fraction would we expect to have
been affected by LGT over 3.8 billion years? No one thinks
that all genes are equally exchangeable, but still it is reasonable
to ask what fraction of any contemporary genomes’ genes
has been affected by LGT. There are several ways one might
try to do this.
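To get a feel for the scale of the question, one can run a deliberately naive extrapolation. The sketch below assumes that every gene is equally exchangeable and that turnover in each 100-Myr interval is independent of the others. The text itself notes that nobody believes the first assumption, so the output is an illustration of compounding, not an estimate.

```python
def untouched_fraction(turnover_per_interval: float, intervals: float) -> float:
    """Fraction of genes never replaced, if each interval independently
    turns over the same share of the genome (a naive assumption)."""
    return (1.0 - turnover_per_interval) ** intervals


# 3.8 billion years expressed in 100-Myr intervals.
intervals = 3800 / 100

for rate in (0.10, 0.20):
    remaining = untouched_fraction(rate, intervals)
    print(f"turnover {rate:.0%} per 100 Myr -> {remaining:.2%} of genes untouched")
```

Under these assumptions less than 2% of genes would escape replacement even at the lower rate, which is why the real question is how unequal the exchangeability of genes is.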
Ochman and Jeff Lawrence look at basic compositional
features of genes, in particular, the relative frequencies of A,
T, G, and C and the choice among alternative codings for the
same amino acids (Ochman et al. 2000). Prokaryotic species
differ significantly in these parameters, which tend to be similar
within a genome. Thus, a recently transferred gene might
“stick out like a sore thumb” from the surrounding long-term
residents. (With time—perhaps a few hundred million
years—genome-specific mutational and selectional pressures
will attenuate and ultimately erase the differences.) With
analyses based on these premises, Ochman and collaborators
find foreign gene contents from 0.0% (for Mycoplasma
genitalium or Rickettsia prowazekii, intracellular human parasites)
to 16.6% for the cyanobacterium Synechocystis, with
E. coli boasting 12.8% transfers.
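The logic of the compositional approach can be illustrated with a toy screen on G+C content alone. This is a minimal sketch, not the published procedure: the real analyses also use codon choice, and the function names, the z-score cutoff, and the made-up sequences below are all assumptions for illustration.

```python
from statistics import mean, pstdev


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def flag_atypical(genes: dict[str, str], z_cutoff: float = 2.0) -> list[str]:
    """Names of genes whose G+C content deviates from the genome-wide mean
    by more than z_cutoff standard deviations."""
    values = {name: gc_content(seq) for name, seq in genes.items()}
    numbers = list(values.values())
    mu = mean(numbers)
    sigma = pstdev(numbers)
    if sigma == 0:
        return []  # perfectly uniform composition: nothing stands out
    return [name for name, v in values.items() if abs(v - mu) / sigma > z_cutoff]


# Toy genome: nine "resident" genes at 50% G+C and one A+T-rich newcomer.
genes = {f"gene{i}": "ATGC" * 5 for i in range(9)}
genes["foreign"] = "ATATATATGC"

print(flag_atypical(genes))
```

A screen of this kind only sees recent arrivals, since a transferred gene gradually takes on the composition of its new genome.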
Eugene Koonin and his colleagues employ a completely
different method (called BLAST) that makes all possible
pairwise comparisons between each of a genome’s genes and
all homologous genes in other genomes (or the larger databases),
and calculates sequence similarity (Koonin et al.
2001). Genes that have greatest sequence similarity to genes
in species that are distant on the rRNA tree (rather than to
genes in species that are close) are likely transfers. The most
easily detected transfers would be those involving the greatest
distances: genes in an archaeal genome that are most
similar to homologs in the bacterial domain, and vice versa.
Koonin finds up to 15.6% interdomain transfer (for an
archaean, Halobacterium salinarum). Rumor in the field now
has it that similar analyses will show that one-third of the
genes in the yet-to-be-published genome sequence of the
methane-producing archaean Methanosarcina mazei are of
bacterial provenance—an astonishing result!
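The best-hit reasoning can be sketched as follows. The code does not run BLAST. It assumes a table of similarity hits has already been produced, and the `Hit` record, its field names, and the toy scores are hypothetical. A gene is flagged as a candidate interdomain transfer when its highest-scoring hit lies in the other domain.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    query_gene: str      # gene in the genome under study
    subject_domain: str  # domain of the organism carrying the matched gene
    bit_score: float     # similarity score; higher means more similar


def interdomain_candidates(hits: list[Hit], home_domain: str) -> list[str]:
    """Genes whose single best hit falls outside the genome's own domain."""
    best: dict[str, Hit] = {}
    for hit in hits:
        current = best.get(hit.query_gene)
        if current is None or hit.bit_score > current.bit_score:
            best[hit.query_gene] = hit
    return sorted(g for g, h in best.items() if h.subject_domain != home_domain)


# Toy hit table for an imaginary archaeal genome (scores are made up).
hits = [
    Hit("geneA", "Archaea", 310.0),
    Hit("geneA", "Bacteria", 120.0),
    Hit("geneB", "Archaea", 95.0),
    Hit("geneB", "Bacteria", 280.0),
    Hit("geneC", "Archaea", 200.0),
]

candidates = interdomain_candidates(hits, home_domain="Archaea")
print(candidates, f"{len(candidates) / 3:.0%} of the three toy genes")
```

A best hit in the wrong domain is only a hint of transfer, which is why the gene-by-gene trees discussed next are the stronger test.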
The third and best way to assess a genome’s origins is to
construct phylogenetic trees for each of its genes, by state-of-the-art
methods. For many individual genes, compelling
cases can be developed. My favorite example is the gene for
HMGCoA reductase (3-hydroxy-3-methylglutaryl coenzyme
A reductase), a key enzyme in the synthesis of isoprenoid
compounds (sterols, e.g.) in all three domains (and the target
of the statins that many people take to reduce endogenous
cholesterol synthesis). Our attention was first drawn to
HMGCoA reductase because BLAST analyses showed that the
version of this gene in Archaeoglobus fulgidus (a hyperthermophilic
archaean sometimes found in undersea oil wells) was
very like homologous genes in bacteria and unlike the versions
found in other Archaea. In fact, most Archaea have an
HMGCoA reductase very similar to that of eukaryotes, so for
them statins are antibiotics! A tree prepared by Yan Boucher
for HMGCoA reductases (fig. 6.4) not only confirmed this
result but identified other transfers—Bacteria to Giardia
intestinalis (a single-celled pathogenic eukaryote), Archaea to
Vibrio cholerae (a bacterial pathogen), and Archaea to Streptomyces
species (bacteria that produce antibacterial antibiotics).
Gene-by-gene analyses are time consuming, because
human judgment is still often required. Less reliable but very
rapid programs for preparing, by simple automatic methods,
all the trees for all the genes in a genome are being developed.
That of Thomas Sicheritz-Ponten and Siv Andersson
shows, not unlike Koonin’s BLAST studies, interdomain
(Bacteria to Archaea or Archaea to Bacteria) transfers
amounting to up to about 20% of a genome (Sicheritz-
Ponten and Andersson 2001).
Is this about the limit? Are 70–80% of most genomes well
behaved in the long-term evolutionary sense, as well as the
short? Probably not. Foreign gene estimates are all likely
to be underestimates. Ochman’s analyses, for instance, can
only look back a few hundred million years. Koonin’s and
Sicheritz-Ponten’s results described interdomain transfers
(Bacteria to Archaea or vice versa). Because Bacteria and
Archaea have dissimilar gene expression machinery and control
signals, genes transferred between them should often be
poorly read. Harder to detect, intradomain transfers should
be much more frequent.
Hunting Down the Core
There is another way to skin this cat. Instead of asking what
fraction of genes in a given genome have clearly different histories
than the majority (or than SSU rRNA), we can ask if
we can find, by comparing all genomes, a stable core of shared
genes (fig. 6.1B) that have the same history. There is a general
belief that such a core should exist, based on a hypothesis
and an observation.
The hypothesis, first articulated by Woese when he decided
to settle on SSU rRNA as a “universal molecular chronometer,”
has come to be called the “complexity hypothesis”
(Jain et al. 1999). The idea is simple: genes whose protein
(or RNA) products must interact in the cell will coevolve.
Mutations that affect the structure of one gene product (call
it A) will be compensated by mutations that affect another,
interacting, gene product (B) in a compensatory way, so that
the essential interactions between A and B are preserved
throughout the evolutionary history of a species or lineage.
Meanwhile, in another, related lineage, the homologous gene
products A' and B' will also be coevolving, but likely along a
somewhat different path. If the B gene of the first lineage were
replaced by the B' of the second lineage, there might be
problems: the A gene product might not interact as effectively
with the B' product (and similarly, A' might not be effective
with B). This seems a very reasonable conjecture, and the
corollary—that genes involved in even more complex interactions
(A + B + C + D + E . . .) should be very hard to exchange
for homologous genes in different lineages, without
detriment to growth—seems inescapable.
SSU rRNA is the central part of an enormously complex
structure, the ribosome. This factory for translation (the RNA
→ protein part of DNA → RNA → protein) also requires two
other RNAs and more than 50 proteins, in order to do its
vital and always essential job. The complexity hypothesis
would predict that the genes encoding these RNAs and proteins
could not be transferred across even very short evolutionary
distances. Similarly, the various genes encoding the
machineries of transcription (DNA → RNA) and replication
(copying of DNA) should be hard to transfer. Certainly, it is
the case that the genes identified as foreign in individual
sequenced bacterial or archaeal genomes are seldom genes
of these informational classes. But there are now several reliable
reports of transfer of “informational genes,” especially
those involved in translation and (in a few cases) SSU rRNA
itself (Yap et al. 1999)!
The observation on which confidence in a stable core rests
is what some of us call “coherence.” Many individual genes,
when known from a sufficient number of species, do re-create
the same major groups—Archaea (and within them euryarchaeotes
and crenarchaeotes) or Bacteria (and within them
the known bacterial phyla, such as cyanobacteria, α-, β-, and
γ-proteobacteria and so forth). There is no published systematic
survey that says how many “many” is, however, or
that compares a large number of well-resolved trees for congruence
of topology. And few genes agree on branching order
of bacterial phyla (even though they do distinguish Bacteria
and Archaea). Pace (ch. 5 in this vol.) suggests that the poor
resolution at the base of the bacteria bespeaks a rapid radiation
some 3.5 billion years ago, perhaps caused by a key innovation.
This is one explanation, but not the only one.
Surely the most rigorous test of the stable core idea would
be to compare all bacterial and archaeal genomes, distill out
the set of genes of which all genomes have a copy, make trees,
and tally up how many subscribe to which topology. Efforts
to do this have failed: there are very few genes shared by all
genomes (even all bacterial or all archaeal genomes)—perhaps
50 or fewer (Teichman and Mitchison 1999). Few of
these genes give statistically robust trees, so we simply cannot
say whether their topologies are congruent or not. The
assumption that there might be a stable core of genes for all
prokaryotes is not disproved by this, but neither is it proven:
it remains a hypothesis. In an effort to test the stable core
idea on a more limited basis, we looked at the core of genes
shared by four sequenced euryarchaeotes, asking if these all
produced the same tree (Nesbø et al. 2001). Several hundred
genes could be looked at and, because there are only three
unrooted phylogenetic trees for four taxa, easily scored for
agreement or disagreement. It turns out that each of the three
possible trees is significantly represented among the 263
shared genes we looked at. In other words, although there is
a core of genes shared by the four genomes, it does not seem
to be a stable core. The shared genes often appear to have
different phylogenetic histories. This could mean that genes
are not infrequently replaced by homologous but possibly
quite different versions of themselves, transferred in across
species lines.
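The statement that four taxa admit only three unrooted trees is easy to verify by enumeration, and the scoring then amounts to a tally. In the sketch below the taxon labels, the function names, and the gene-to-topology assignments are invented for illustration. They are not the counts from the study, which the text does not give.

```python
from collections import Counter


def quartet_topologies(taxa: list[str]) -> list[frozenset]:
    """All unrooted binary trees for four taxa, each written as the pair of
    two-taxon groups separated by the tree's single internal branch."""
    if len(taxa) != 4:
        raise ValueError("exactly four taxa are required")
    first, rest = taxa[0], taxa[1:]
    topologies = []
    for partner in rest:
        pair = frozenset((first, partner))
        others = frozenset(t for t in rest if t != partner)
        topologies.append(frozenset((pair, others)))
    return topologies


def label(topology: frozenset) -> str:
    """Readable form of a quartet topology, e.g. 'AB | CD'."""
    return " | ".join(sorted("".join(sorted(side)) for side in topology))


trees = quartet_topologies(["A", "B", "C", "D"])
print([label(t) for t in trees])

# Hypothetical assignment of genes to the topology each one supports.
gene_trees = {"gene1": trees[0], "gene2": trees[1], "gene3": trees[0], "gene4": trees[2]}
tally = Counter(label(t) for t in gene_trees.values())
print(tally)
```

With a stable core, one topology would dominate such a tally; the finding reported above is that all three were well represented.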
So it is not possible to prove that there is any sizable stable
core, even within a relatively restricted group such as the
euryarchaeotes. Hervé Philippe and collaborators have tried
another approach (Brochier et al. 2002). Individual trees
constructed for 57 translational proteins shared by 45 bacterial
species mostly disagree, as expected: there is too much
noise and too little phylogenetic signal. But if they strung
all gene sequences together to obtain one concatenated sequence,
then a statistically robust tree could be obtained, and
44 of the 57 genes did not significantly contradict this result.
(The 13 others showed significant evidence for transfer.)
So perhaps these comprise a true core for all of Bacteria.
But 44 is but a few percent of the number of genes in a typical
bacterial genome. And when Brown and collaborators
(2001) included members of Archaea in a similar study, they
were obliged to reduce the apparent stable core even further,
to only 14 genes. Woese may be correct in asserting, “An
organismal genealogical trace of some kind does seem to exist
. . . but that trace is carried clearly almost exclusively in the
componentry of the cellular information processing systems”
(Woese 2000:8393). However, when it comes to prokaryotes,
and the deepest branches of the universal tree, proving
even this modest claim is surprisingly difficult!
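The idea of stringing aligned gene sequences together, mentioned above, can be sketched in a few lines. This is only an illustration of concatenation itself, not the procedure of Brochier et al.; the function name, species labels, and sequences are hypothetical, and padding a missing gene with gap characters is an assumption added here so that alignment columns stay in register.

```python
def concatenate(alignments, taxa):
    """Join per-gene aligned sequences end to end for each taxon."""
    combined = {taxon: "" for taxon in taxa}
    for gene in alignments:
        lengths = {len(seq) for seq in gene.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one gene must be aligned to equal length")
        width = lengths.pop()
        for taxon in taxa:
            # Assumption: a taxon lacking this gene is padded with gaps.
            combined[taxon] += gene.get(taxon, "-" * width)
    return combined

# Invented toy alignments for three hypothetical species.
gene1 = {"sp1": "MKV", "sp2": "MKI", "sp3": "MRV"}
gene2 = {"sp1": "LLA", "sp2": "LIA"}  # sp3 lacks this gene
supermatrix = concatenate([gene1, gene2], ["sp1", "sp2", "sp3"])
```

Each short gene contributes only a few informative columns; the concatenated sequence pools them, which is why a tree built from it can be statistically robust when the single-gene trees are not.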
Figure 6.4. Phylogeny of genes encoding HMGCoA reductase, a
key enzyme in the synthesis of sterols and related lipids. The
predominant bacterial form (class 2) and predominant
eukaryotic/archaeal form (class 1) are unquestionably
homologous but have different functional characteristics. Four
LGT events are very strongly supported by the phylogenetic
analysis. The boxed numbers are bootstrap values, measures of
statistical robustness, for a tree obtained by maximum
likelihood, maximum parsimony, and distance methods. Archaeal
names are italicized, eukaryotic names are underlined, and
bacterial names are in regular letters.
Other Models
Absence of evidence is not evidence of absence. A conservative
summary of what I’ve said so far is that the existence of
a stable core is hard to prove. The signal-to-noise ratio in the
data we need to decide about events occurring three and
more billion years ago is too low, and our methods are still
too crude. “Hard to prove” is not “disproven.” But all parties
to the debate now accept that the core of genes that has been
stably associated in all prokaryotic genomes since the first
genome is far smaller than we used to think. And, just maybe,
there might be no such core.
What if there weren’t? Could there be some other model
than those depicted in figure 6.1, A and B, to explain the
undeniable fact that we can classify bacteria and archaea into
groups that have many shared defining features—that the
entire edifice of Linnaean hierarchical classification has been
more or less successfully imposed on microbial systematics?
Jeff Lawrence, Peter Gogarten, and I have been working on
such a model, which is still in the verbal stages (no formal
mathematics) and has as yet no fixed name (Gogarten et al.
2002). Here I call it the model of the “shifting core” or, alternatively,
the model of “nested gene pools.” In fact, it’s
not much different from what Woese himself now believes
(Woese 2000), although we would probably disagree on the
values of its parameters.
Imagine that all genes are potentially exchangeable but
that the frequency or likelihood of exchange varies tremendously.
Many factors would affect this. Complexity of interactions
of the gene’s product, and whether or not it was
genetically linked (and so could be co-transferred) with other
interacting genes would be important factors, related to the
genes themselves. So would essentiality: genes that must always
be present can only be replaced through an intermediate
stage in which both the originally resident and the incoming
foreign gene are found in the same genome. (Such intermediates
are well known.) Biochemistry of the donor and recipient
organism would be a key determinant. Transferred genes
for various components of the photosynthetic apparatus are
only likely to be of any use to species that already do photosynthesis.
If of no use, transferred genes will soon be lost and
we will never know that a transfer occurred. Similarly, the
differences in gene expression systems between Bacteria and
Archaea must reduce the frequency of successfully fixed
transfers between them. Environmental niche matters, too:
genes from thermophiles make proteins that work best in
other thermophiles. Finally, donors and recipients must be
found in close proximity in nature, and physical and genetic
mechanisms to pass DNA between them (including “accidental”
mechanisms) must exist.
Imagine that we ourselves create hundreds of different
bacterial species, with genes and genomes made from
scratch by machine, and then set them up in various niches
and allow them to transfer genes according to such rules.
Although there would initially have been no deep “phylogenetic”
relationships between these human-made species
or their genes, patterns of shared genes and similarities in
sequences would eventually emerge, because of recurring
transfers at different frequencies. In other words, LGT itself
can create and maintain the patterns we seek to explain
by the model depicted in figure 6.1A, but the underlying
process would be as shown in figure 6.1C. According to
this model, organisms that exchange genes most frequently
would comprise “species.” Different species whose organisms
share genes somewhat less frequently would comprise
genera, and so on up the Linnaean ladder. Bacteria are coherent
as a domain because they more frequently exchange
genes with other bacteria than with members of Archaea
(and vice versa), but still, interdomain transfer does occasionally
happen.
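The thought experiment above can be tried as a toy simulation. The sketch below is a minimal one under stated assumptions, not the verbal model itself: genomes start with entirely unique gene variants (no shared ancestry), transfers replace a gene in the recipient with the donor's version, and donors are drawn from the recipient's own group with probability p_within. One ingredient is added that the text does not mention: an occasional new variant (p_mutation), without which unlimited copying would eventually make every genome identical. All names and parameter values are hypothetical.

```python
import random
from itertools import combinations

def simulate(n_groups=3, per_group=4, n_genes=40, steps=200_000,
             p_within=0.95, p_mutation=0.05, seed=1):
    """Toy model: genes are copied between genomes, mostly within groups."""
    rng = random.Random(seed)
    n = n_groups * per_group
    group = [i // per_group for i in range(n)]
    # Every genome starts with its own unique variants: no shared ancestry.
    genomes = [[(i, g) for g in range(n_genes)] for i in range(n)]
    fresh = 0  # counter used to label newly arisen variants
    for _ in range(steps):
        recipient = rng.randrange(n)
        gene = rng.randrange(n_genes)
        if rng.random() < p_mutation:
            # Assumption added here: occasional new variants arise.
            fresh += 1
            genomes[recipient][gene] = ("new", fresh)
            continue
        same = rng.random() < p_within
        donors = [d for d in range(n)
                  if d != recipient and (group[d] == group[recipient]) == same]
        # The donor's version replaces the recipient's version of this gene.
        genomes[recipient][gene] = genomes[rng.choice(donors)][gene]
    return genomes, group

def mean_identity(genomes, group):
    """Average fraction of identical genes, within and between groups."""
    within, between = [], []
    for a, b in combinations(range(len(genomes)), 2):
        shared = sum(x == y for x, y in zip(genomes[a], genomes[b]))
        fraction = shared / len(genomes[a])
        (within if group[a] == group[b] else between).append(fraction)
    return sum(within) / len(within), sum(between) / len(between)

genomes, group = simulate()
within, between = mean_identity(genomes, group)
```

With these settings, genomes in the same group end up sharing more identical genes than genomes in different groups, even though no group began with any common ancestry. That is the sense in which transfer frequency alone can produce nested, classification-like patterns.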
This model may not be correct in its extreme form (no
stable core at all), but something like it must apply in the
long run to most of the genes that make up prokaryotic genomes.
In the short run (corresponding to the divergence of
strains in a species or species in a genus, perhaps), it may
most accurately describe only the 20% of a genome’s worth
of genes that are found in some genomes but not others.
[However, recombination within genes—which I have not
discussed—may have a similarly confounding effect, at this
level (see Maynard Smith et al. 2000).]
The One True Tree
Darwin did describe the relationships of all organisms as a
tree and thought that the patterns of similarities and differences
between all contemporary species could be explained
as the result of successive bifurcative speciation events, going
back to one, or just a few first living things. If we had a videotape
of all that (and 3.8 billion years to sit down and watch
it!), we could trace all the bifurcations, and that tracing would
be the universal Tree of Life. But there is no video, so we have
been trying to reconstruct these bifurcations by comparing
the sequences of genes, initially on the assumption that any
gene would in principle do, but more recently with the belief
that only some genes will tell the true story. But even if
none do, and figure 6.1C shows how genomes truly evolve,
the situation need not be seen as hopeless. Some kind of
consensus of the phylogenies of all genes of all genomes,
weighted perhaps in favor of those least frequently transferred,
might still have a good chance of recreating the pattern
of speciation events recorded on our imaginary videotape.
We don’t yet know how best to make such consensus phylogenies.
Some investigators want to call them “genome phylogenies,”
a misleading term, I believe. Frequent LGT does
not mean there is no single true universal Tree of Life for
organisms, only that reconstructing this tree has become
more problematic. But frequent LGT does mean that there
is no single true universal tree of genomes, because these are
made up of parts that have different phylogenies!
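One way to picture a consensus "weighted perhaps in favor of those least frequently transferred" is a weighted majority-rule vote over splits. The sketch below is only one possible reading, offered under assumptions: each gene tree is reduced to its set of nontrivial splits, each gene gets a weight of 1/(1 + t) where t is a hypothetical transfer count, and a split is kept if it carries more than half of the total weight. The weighting formula, taxon labels, trees, and counts are all invented for illustration.

```python
from collections import defaultdict

def weighted_majority_splits(gene_trees, weights):
    """Return the splits backed by more than half of the total weight."""
    support = defaultdict(float)
    for splits, weight in zip(gene_trees, weights):
        for split in splits:
            support[split] += weight
    total = sum(weights)
    return {split for split, w in support.items() if w > total / 2}

# Four invented gene trees on taxa A-E, each given by its two nontrivial
# splits (named by the smaller side of the split).
trees = [
    {frozenset("AB"), frozenset("DE")},
    {frozenset("AB"), frozenset("DE")},
    {frozenset("AC"), frozenset("DE")},
    {frozenset("AC"), frozenset("BD")},
]
# Hypothetical transfer counts per gene; rarely transferred genes weigh more.
transfers = [0, 1, 4, 9]
weights = [1 / (1 + t) for t in transfers]

weighted = weighted_majority_splits(trees, weights)
unweighted = weighted_majority_splits(trees, [1.0] * len(trees))
```

In this toy input the unweighted vote recovers only the D+E grouping, because A+B and A+C tie at two genes each. Down-weighting the frequently transferred genes lets A+B through as well. Splits that each hold more than half of the weight must occur together in at least one gene tree, so they are always mutually compatible.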
Cold Comfort to Creationism
Advocates of Biblical interpretations of life’s history and proponents
of “intelligent design” like to cite disagreement
within the evolutionary community and, in particular, claims
to have “overthrown Darwin” as support for their views.
Therefore, early publications asserting that evidence for extensive
LGT was “uprooting the Tree of Life” have found
popularity with them. Perhaps some of us (especially me)
were not careful enough in stating that what was being uprooted
was the tree of genomes. Our acceptance of the video
version of the organismal tree remains steadfast, regardless
of problems in constructing it.
Even so, there is a challenge to Darwinism, as it has itself
evolved over the last century. Darwinists (more properly,
neo-Darwinists) see adaptation happening as the result of
selection among mutations that have arisen in genes within
populations of species, and speciation as most commonly the
result of divergent (and ultimately incompatible) adaptations
being fixed in different populations. Explicitly or implicitly,
figure 6.1A is the model of genome evolution most compatible
with this neo-Darwinian view. This, I assert, is what
Darwin himself would have expected, had he lived to see the
centenary of the publication of The Origin of Species. If adaptations
are instead often due to acquisition of genes from
different species, then figure 6.1C might be the more relevant
model. I’d hope that Darwin, had he hung on for still another
half century, would have found this at least amusing
and recognized the profound difference.
In any case, what does it matter what Darwin would think?
Evolutionary biologists are committed to materialistic,
nonsupernatural explanations of the patterns of similarity and
difference we see in the living world, not to the correctness of
Darwin’s own particular explanations. If we substitute one
materialistic, nonsupernatural explanation for another, this is
a sign of paradigmatic health, not weakness. Sometimes I think
we ourselves forget this, and defend Darwin and neo-Darwinism
(and, indeed, the gene-based Tree of Life) as if they were
received truth, not provisional interpretations of a fascinatingly
complex world. We should stop doing that!
Literature Cited
Achtman, M., K. Zurth, G. Morelli, G. Torrea, A. Guiyoule, and
E. Carniel. 1999. Yersinia pestis, the cause of plague, is a
recently emerged clone of Yersinia pseudotuberculosis. Proc. Natl.
Acad. Sci. USA 96:14043–14048.
Blattner, F. R., G. Plunkett, III, C. A. Bloch, N. T. Perna,
V. Burland, M. Riley, J. Collado-Vides, J. D. Glasner, C. K.
Rode, G. F. Mayhew, et al. 1997. The complete genome
sequence of Escherichia coli K-12. Science 277:1453–1474.
Brochier, C., E. Bapteste, D. Moreira, and H. Philippe. 2002.
Eubacterial phylogeny based on translational apparatus
proteins. Trends Genet. 18:1–5.
Brown, J. R., C. J. Douady, M. J. Italia, W. E. Marshall, and M. J.
Stanhope. 2001. Universal trees based on large combined
protein sequence datasets. Nat. Genet. 28:281–285.
Bushman, F. 2002. Lateral DNA transfer. Cold Spring Harbor
Laboratory Press, Cold Spring Harbor, NY.
Falkow, S. 1975. Infectious multiple drug resistance. Pion Ltd.,
London.
Gogarten, J. P., W. F. Doolittle, and J. G. Lawrence. 2002.
Prokaryotic evolution in the light of gene transfer. Mol. Biol.
Evol. 19:2226–2238.
Gray, M. W., and W. F. Doolittle. 1982. Has the endosymbiont
hypothesis been proven? Microbiol. Rev. 46:1–42.
Hacker, J., and J. B. Kaper. 2000. Pathogenicity islands and the
evolution of microbes. Annu. Rev. Microbiol. 54:641–679.
Jain, R. C., M. C. Rivera, and J. A. Lake. 1999. Horizontal gene
transfer among genomes: the complexity hypothesis. Proc.
Natl. Acad. Sci. USA 96:3801–3806.
Koonin, E. V., K. S. Makarova, and L. Aravind. 2001. Horizontal
gene transfer in prokaryotes: quantification and classification.
Annu. Rev. Microbiol. 55:709–742.
Margulis, L. 1970. Origin of eukaryotic cells. Yale University
Press, New Haven, CT.
Maynard Smith, J., E. J. Feil, and N. H. Smith. 2000. Population
structure and evolutionary dynamics of pathogenic bacteria.
Bioessays 22:1115–1122.
Nesbø, C. L., Y. Boucher, and W. F. Doolittle. 2001. Defining
the core of nontransferable prokaryotic genes: the euryarchaeal
core. J. Mol. Evol. 53:340–350.
Ochman, H., and I. B. Jones. 2000. Evolutionary dynamics of
full genome content in Escherichia coli. EMBO J. 19:6637–
6643.
Ochman, H., J. G. Lawrence, and E. A. Groisman. 2000. Lateral
gene transfer and the nature of bacterial innovation. Nature
405:299–304.
Perna, N. T., G. Plunkett, III, V. Burland, B. Mau, J. D. Glasner,
D. J. Rose, G. F. Mayhew, P. S. Evans, J. Gregor, et al. 2001.
Genome sequence of enterohaemorrhagic Escherichia coli
O157:H7. Nature 409:529–532.
Reanney, D. C. 1976. Extrachromosomal elements as possible
agents of adaptation and development. Bacteriol. Rev.
40:552–590.
Sicheritz-Ponten, T., and S. G. Andersson. 2001. A phylogenomic
approach to microbial evolution. Nucleic Acids
Res. 29:545–552.
Sonea, S., and M. Panisset. 1976. Manifesto for a new bacteriology.
Rev. Can. Biol. 35:103–167.
Teichmann, S. A., and G. Mitchison. 1999. Is there phylogenetic
signal in prokaryotic proteins? J. Mol. Evol. 49:98–107.
Woese, C. R. 1987. Bacterial evolution. Microbiol. Rev. 51:221–
271.
Woese, C. R. 2000. Interpreting the universal phylogenetic tree.
Proc. Natl. Acad. Sci. USA 97:8392–8396.
Yap, W. H., Z. Zhang, and Y. Wang. 1999. Distinct types of
rRNA operons exist in the genomes of the actinomycete
Thermomonospora chromogena and evidence for horizontal
transfer of an entire rRNA operon. J. Bacteriol. 181:5201–
5209.
Zuckerkandl, E., and L. Pauling. 1965. Evolutionary divergence
and convergence in proteins. Pp. 97–166 in Evolving genes
and proteins (V. Bryson and H. J. Vogel, eds.). Academic
Press, New York.