The Living World
Fourth Edition
GEORGE B. JOHNSON
11
Genomics
PowerPoint® Lectures prepared by Johnny El-Rady
Copyright ©The McGraw-Hill Companies, Inc. Permission required for reproduction or display
11.1 Genomics
The full complement of genetic information of an
organism is its genome
Genomics is a new field of biology concerned with
the sequencing and study of genomes
The first genome to be sequenced was that of the
virus ΦX174
Frederick Sanger in 1977 obtained the sequence
of this 5,375-nucleotide genome
The advent of automatic DNA sequencing machines
has facilitated the sequencing of larger genomes
Sequencing DNA
DNA is first amplified
The DNA fragments are then mixed with
Primers + DNA polymerase
A supply of the four nucleotides
A smaller supply of chemically-tagged nucleotides
that terminate replication
The DNA is denatured into single-strands allowing
DNA replication to proceed
The addition of a chemically-tagged nucleotide to the
growing chain halts DNA replication
Thus the mixture will contain double-stranded
DNA of various lengths
Sequencing DNA
The DNA mixture is separated by fragment size
using gel electrophoresis
Examination of the fragments from shortest to
longest reveals the nucleotide sequence of the DNA
Linking the sequence of the various DNA fragments
will yield the sequence of the entire genome
The scanning and analysis of the gel is greatly
facilitated by the use of computers
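The chain-termination steps above can be sketched as a toy simulation. This is a simplified illustration, not a model of real sequencing chemistry: the template string, the `terminator_fraction` and `copies` parameters, and both function names are assumptions made for the example, and strand orientation is ignored.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def chain_termination_fragments(template, terminator_fraction=0.2,
                                copies=5000, seed=0):
    """Copy the template many times; each added base has a small chance of
    being a tagged terminator that halts replication (assumed toy model).
    Returns {fragment length: tagged base at the end of that fragment}."""
    rng = random.Random(seed)
    # Bases the polymerase would add, in the order it adds them.
    new_strand = [COMPLEMENT[base] for base in template]
    fragments = {}
    for _ in range(copies):
        for length, base in enumerate(new_strand, start=1):
            if rng.random() < terminator_fraction:
                fragments[length] = base  # chain stops at a tagged base
                break
    return fragments

def read_gel(fragments):
    """Read the tagged end bases from the shortest fragment to the longest,
    as is done after separation by gel electrophoresis."""
    return "".join(fragments[length] for length in sorted(fragments))

template = "GATTACAGGC"  # hypothetical template
print(read_gel(chain_termination_fragments(template)))
```

With enough copies, every fragment length is represented, so reading the end labels in size order spells out the strand complementary to the template.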
Fig. 11.1 How to sequence DNA (one color corresponds to each nucleotide)
11.2 Comparing Genomes
A comparison of the genomes of different species
can reveal relationships between the species
Genomes contain vast amounts of information on the
history of life
To date, genomes of more than 100 prokaryotes and
18 eukaryotes have been or are being sequenced
This growing number has revolutionized the study
of comparative evolutionary biology
The Tiger Pufferfish (Fugu rubripes)
Genome consists of 365 million base pairs
This is only about 1/9th the size of the human genome
But the gene number is about the same
Much of the extra human DNA appears in introns
Fig. 11.2
Some human and Fugu genes have been conserved
Other genes are unique to each species
A considerable scrambling of gene order exists
The Mouse (Mus musculus)
Genome has about 700 million fewer base pairs than
the human genome
Again, the gene number is about the same
Indeed, the human genome shares about 99% of its
genes with mice
Most of the genes that are unique to mice are
linked to the sense of smell and reproduction
The difference between the two species may be due
to gene expression rather than gene number
The level, timing and location of gene transcription
The Mouse (Mus musculus)
Mice and humans last shared a common
ancestor about 75 million years ago
Since then, mouse DNA has mutated twice as
fast as human DNA
Moreover, chromosomal rearrangements appear
to have occurred twice as fast in mice
However, the common ancestral sequence of
genes has been preserved in both species
The “junk” DNA (not gene-coding) in both species
ended up in comparable regions of the genome
Chimpanzee (Pan troglodytes)
Humans and chimps diverged from a common
ancestor only about 6 million years ago
The genomes of the two species are only 1.4%
different at the level of the DNA
So what explains the big difference in body and
behavior?
Chimpanzee (Pan troglodytes)
A detailed comparison of corresponding
chromosomes in the two species reveals large gaps
These gaps are due to either DNA deletions or
insertions
The insertions typically involve the transposable
element Alu
These gaps often result in significant changes in
protein structure
They also affect gene expression
Which genes are transcribed, and where and when?
Chimpanzee (Pan troglodytes)
Therefore, the major phenotypic differences between
humans and chimps result from accumulation of
many small alterations
On the other hand, a few genes may have crucial
contributions
Gene FOXP2, for example
Encodes a protein that plays an important role
in language development
The protein differs by two amino acids
between humans and other apes
Fig. 11.3 Humans have one fewer chromosome pair than other apes
Two mid-sized ape chromosomes fused to form the large chromosome #2
Insects: Drosophila and Anopheles
Insects are the most species-rich and
morphologically diverse animal group on earth
The genomes of two insects have been sequenced
Drosophila melanogaster, the fruit fly
Anopheles gambiae, the malaria mosquito
The two insects diverged ~ 250 million years ago
The organization of genes has undergone
significant shuffling
Drosophila exhibits less noncoding DNA than
Anopheles
The Protist Plasmodium
Plasmodium falciparum is the cause of malaria
It has a genome of only 23 million base pairs
This corresponds to about 5,300 genes
Plasmodium contains a unique subcellular
component termed the apicoplast
The only site for fatty acid synthesis
About 12% of nuclear genes encode proteins that
head to the apicoplast
This suggests a drug target to combat malaria
Flowering Plants: Arabidopsis and Oryza
Arabidopsis thaliana (the wall cress) is a tiny
member of the mustard family
Provides a model for studying plant genetics and
development
Has ~ 26,000 genes
Oryza sativa (rice) is a member of the grass family
Has enormous economic significance
Has a relatively small genome of ~ 430 million base pairs
However, the number of genes is surprisingly large
Between 33,000 and 63,000, depending on the study
Flowering Plants: Arabidopsis and Oryza
Oryza and Arabidopsis diverged from a common
ancestor about 150 to 200 million years ago
More than 80% of the genes found in rice are also
found in Arabidopsis
~ 1/3rd of the common genes are “plant” genes
These include the thousands of genes involved
in photosynthesis
Both plant genomes have larger gene families than
are seen in animals or fungi
11.3 The Human Genome
A draft sequence of the human genome was
announced on June 26, 2000
It consists of 3.2 billion base pairs
If the human genome were a book
It would be 500,000 pages long
It would take about 60 years to read at the rate of
8 hours a day, every day, at five bases a second
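The book analogy can be checked with a few lines of arithmetic. This is a minimal sketch using only the figures quoted above; the constant and function names are chosen for illustration, and a 365-day year is assumed.

```python
GENOME_BASES = 3_200_000_000   # 3.2 billion base pairs
BOOK_PAGES = 500_000
BASES_PER_SECOND = 5
HOURS_PER_DAY = 8

def years_to_read(bases=GENOME_BASES, rate=BASES_PER_SECOND,
                  hours_per_day=HOURS_PER_DAY):
    """Years needed to read the genome at the stated pace, every day."""
    seconds = bases / rate
    days = seconds / (hours_per_day * 3600)
    return days / 365  # assumes 365-day years

print(GENOME_BASES // BOOK_PAGES)   # bases per page of the "book"
print(round(years_to_read(), 1))    # roughly 60 years, as stated above
```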
Geography of the Genome
The number of genes in humans is only about
25,000-30,000
However, there are about 4 times as many different mRNA
molecules
The genes are divided into exons and introns
Thus alternative mRNA splicing can generate
many more mRNAs than there are genes
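How one gene can yield several mRNAs can be illustrated with a small sketch. It assumes a deliberately simple model in which the first and last exons are always kept and any internal exon may be skipped; real splicing is regulated, and not every combination occurs. The exon names and the function name are hypothetical.

```python
from itertools import combinations

def splice_variants(exons):
    """List the mRNAs obtainable by keeping the first and last exon and
    any subset of the internal exons, in their original order (toy model)."""
    first, *middle, last = exons
    variants = []
    for keep_count in range(len(middle) + 1):
        for kept in combinations(middle, keep_count):
            variants.append("-".join((first, *kept, last)))
    return variants

# One hypothetical gene with four exons (introns already removed).
for mrna in splice_variants(["E1", "E2", "E3", "E4"]):
    print(mrna)
```

Under this model a gene with n internal exons can give up to 2^n distinct mRNAs, which is why the mRNA count can exceed the gene count.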
Fig. 11.4 What the human genome is like
Geography of the Genome
Genes are not distributed evenly throughout the
human genome
Chromosome 19 is small yet packed with genes
Chromosomes 4 and 8 are large yet have few
genes
On most chromosomes, clusters rich in genes are
scattered between vast stretches of “barren” DNA
DNA That Codes for Proteins
The human genome contains four different classes
of protein-encoding genes
1. Single-copy genes
Most genes fit in this class
Silent copies, inactivated by mutation, are
called pseudogenes
2. Segmental duplications
Blocks of similar genes in the same order are
found throughout the human genome
3. Multigene families
Groups of related but distinctly different genes
that often occur together in clusters
Arose from a single ancestral sequence
4. Tandem clusters
DNA sequences repeated thousands of times
in tandem arrays
The rRNA genes, for example
Noncoding DNA
Only 1-1.5% of the human genome is coding DNA
There are four major types of noncoding DNA
1. Noncoding DNA within genes
Together introns make up about 24% of the
human genome
2. Structural DNA
~ 20% of the genome is constitutive
heterochromatin
Located near centromeres and telomeres
3. Repeated sequences
Simple sequence repeats (SSRs)
Two- or three-nucleotide sequences repeated
thousands of times
Constitute ~3% of the human genome
Duplicated Sequences
Repeated sequences, other than SSRs
Constitute ~7% of the human genome
4. Transposable elements
Make up ~45% of the human genome
They include
LINEs
Long interspersed elements
~6,000 DNA bases long
Active transposons
Alu sequences
~300 DNA bases long
Have no transposition machinery
Reside within, and transpose with, LINEs
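As a quick consistency check, the approximate percentages quoted for coding DNA and the four types of noncoding DNA can be tallied. This is only a sketch using the rounded figures from the slides; the dictionary layout and names are choices made for the example, and the 3.2 billion base pair total is the figure given earlier for the human genome.

```python
GENOME_BASE_PAIRS = 3_200_000_000

# (low, high) percent of the human genome, as quoted in the slides.
COMPOSITION_PERCENT = {
    "coding DNA": (1.0, 1.5),
    "introns": (24, 24),
    "structural DNA": (20, 20),
    "simple sequence repeats": (3, 3),
    "duplicated sequences": (7, 7),
    "transposable elements": (45, 45),
}

low_total = sum(low for low, _ in COMPOSITION_PERCENT.values())
high_total = sum(high for _, high in COMPOSITION_PERCENT.values())
print(low_total, high_total)  # the rounded figures add up to about 100%

def approx_base_pairs(percent):
    """Convert a whole-number percentage of the genome into base pairs."""
    return GENOME_BASE_PAIRS * percent // 100

print(approx_base_pairs(45))  # base pairs of transposable elements
```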
11.4 Gene Microarrays
A gene microarray is a glass square smaller than a
postage stamp, covered with millions of DNA strands
Microarray chips, or biochips, can be used to delve
into a person’s genes
The DNA is denatured then washed over the
microarray
Bound complementary sequences are detected by
a computer
11.4 Gene Microarrays
Microarrays are also used to detect the level of gene
expression
Fluorescently-tagged cDNA is made from mRNA
Complementary binding results in a dot-patterned
microarray
Similarly, two different sources of DNA can be
compared
For example, the genetic similarity between two
different organisms/individuals can be determined
Fig. 11.5 A gene microarray
cDNA from sample 1 is labeled with a green dye
cDNA from sample 2 is labeled with a red dye
Yellow spots indicate the binding of both samples
The more yellow spots, the more similar the DNA sources
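The two-color readout (green for sample 1, red for sample 2, yellow where both bind) can be sketched as a simple scoring routine. This is an illustrative model only: real arrays report continuous fluorescence intensities, whereas here each spot is reduced to bound or not bound. The "dark" label for unbound spots, the similarity score, and the function names are assumptions for the example.

```python
def spot_color(sample1_bound, sample2_bound):
    """Color seen at one spot of a two-color microarray."""
    if sample1_bound and sample2_bound:
        return "yellow"   # green + red signals overlap
    if sample1_bound:
        return "green"
    if sample2_bound:
        return "red"
    return "dark"         # assumed label: neither sample bound

def similarity(spots):
    """Fraction of lit spots that are yellow (hypothetical score)."""
    colors = [spot_color(s1, s2) for s1, s2 in spots]
    lit = [color for color in colors if color != "dark"]
    return colors.count("yellow") / len(lit) if lit else 0.0

# Hypothetical array of five spots: (sample 1 bound?, sample 2 bound?)
spots = [(True, True), (True, False), (False, True),
         (True, True), (False, False)]
print(similarity(spots))
```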
11.4 Gene Microarrays
Single nucleotide polymorphisms (SNPs)
Spot differences between “reference sequences”
and the DNA of a particular individual
Some SNPs are associated with cancers and
other genetic disorders
Others may give red hair or high cholesterol
Each of us differs from the standard “type
sequence” in some 25,000 nucleotide SNPs
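Spotting SNPs amounts to comparing an individual's sequence with the reference, position by position. A minimal sketch, assuming two already-aligned sequences of equal length; both sequences and the function name are made up for illustration.

```python
def find_snps(reference, individual):
    """Return (position, reference base, individual's base) for every
    single-nucleotide difference; positions are counted from 1."""
    if len(reference) != len(individual):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        (position, ref_base, ind_base)
        for position, (ref_base, ind_base)
        in enumerate(zip(reference, individual), start=1)
        if ref_base != ind_base
    ]

reference = "ATGGCCTA"    # hypothetical reference sequence
individual = "ATGACCTT"   # hypothetical individual
print(find_snps(reference, individual))  # [(4, 'G', 'A'), (8, 'A', 'T')]
```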
Fig. 11.6
11.4 Gene Microarrays
Researchers have identified over a million different
SNPs, all of which can reside on a few biochips
So, the SNP, and thus DNA, profile of an individual
can be easily obtained
This raises critical issues of personal privacy
Protecting medical information, for example
11.5 Proteomics: The Next Frontier
Bioinformatics is an area of genomics that combines
molecular genetics and computational analysis
It attempts to predict the type of protein encoded
by a particular sequence
Example
The structure of the Pax6 protein has been
deduced from the gene’s DNA sequence
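The first step in predicting a protein from a DNA sequence is translating codons into amino acids with the standard genetic code. The sketch below shows only that step; predicting protein type or structure, as done for Pax6, takes far more analysis. The input sequence is a made-up example, not part of the Pax6 gene, and the function name is an assumption.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, listed in TCAG order; "*" marks a stop codon.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(coding_dna):
    """Translate from the first ATG to the first stop codon.
    Returns one-letter amino acid codes; "" if no ATG is found."""
    sequence = coding_dna.upper()
    start = sequence.find("ATG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(sequence) - 2, 3):
        amino_acid = CODON_TABLE[sequence[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate("CCATGGCCAAGTGA"))  # MAK
```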
11.5 Proteomics: The Next Frontier
All the proteins an organism possesses make up its
proteome
The study and analysis of the proteome is an
emerging field termed proteomics
Protein arrays are being developed for this purpose
These utilize fluorescently-labeled antibodies
While there may be a million different proteins, the
number of distinct motifs is thought to be < 5,000
1,000 of these have already been cataloged