Computational Biology
& Bioinformatics:
Molecular Biology Primer
What is Bioinformatics?
Bioinformatics is an interdisciplinary field concerned with:
• biology: the study of information content, structure, and processes in biological systems
• the use of computer science (informatics) to address problems in the life sciences, which includes the creation of databases on genomes, proteins, and metabolic pathways and mining them for knowledge
• computer science: the development of efficient algorithms that solve biological problems

[Diagram: bioinformatics in relation to biology, informatics, information, and mathematics]
Bioinformatics = Biological Data + Computer Calculations
Where does Bioinformatics come from?
A little Biology: Evolutionary Tree of Life
Njsas.org
Animal Cell (Eukaryotic)
Faculty.southwest.tn.edu
Bacterial Cell (Prokaryote)
Uccs.edu
Cells & DNA
Ogm-info.com
DNA and Chromosome
wikipedia
Four types of nucleotide bases in DNA (A, T, G, C)
Note that A pairs with T, and G pairs with C.
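The pairing rule can be written down directly in code. The following is a minimal Python sketch and not part of the original slides; the function names `complement` and `reverse_complement` are illustrative choices.

```python
# Watson-Crick pairing: A<->T and G<->C (upper and lower case both handled).
_PAIRS = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(seq: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return seq.translate(_PAIRS)


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read in its own 5'->3' direction."""
    return complement(seq)[::-1]


print(complement("atgaatcgta"))
print(reverse_complement("ATGC"))
```

Because the two strands run in opposite directions, the reverse complement is usually the form needed when comparing a sequence against the other strand.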
Primary Structure of DNA
• Unbranched polymer
• Sequence of nucleotide bases
• Double stranded
atgaatcgta ggggtttgaa cgctggcaat
acgatgactt ctcaagcgaa cattgacgac
ggcagctgga aggcggtctc cgagggcgga ……
Building Blocks of Biological Systems:
nucleotides and amino acids
DNA (nucleotides, 4 types): information
carrier/encoder.
RNA: bridge from DNA to protein.
Protein (amino acids, 20 types): action molecules.
Processes
• Replication of DNA
• Transcription of gene (DNA) to messenger
RNA (mRNA)
• Translation of mRNA into proteins
• Folding of proteins into 3D form
• Biochemical or structural functions of
proteins
DNA → (transcription) → RNA → (translation) → Protein
(access excellence resource center)
© 1999 The International Herpes Management Forum, all rights reserved.
Translation: Universal Genetic Code
• Translation from nucleotide code to amino acid code.
atgaatcgta ggggtttgaa cgctggcaat
acgatgactt ctcaagcgaa cattgacgac
ggcagctgga aggcggtctc cgagggcgga ……
MNRRGLNAGNTMTSQANIDDGSWKAVSEGG …
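The mapping from codons to amino acids can be sketched in a few lines of Python. This is an illustrative sketch, not code from the slides: it assumes the standard genetic code, reads the DNA in the first reading frame, and uses names (`CODON_TABLE`, `translate`) chosen here for the example.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate DNA (frame 0) to one-letter amino acids; '*' marks a stop."""
    seq = "".join(dna.split()).upper()      # drop spaces and line breaks
    usable = len(seq) - len(seq) % 3        # ignore a trailing partial codon
    # Codons with unexpected letters (e.g. N) are reported as 'X'.
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3))


dna = """atgaatcgta ggggtttgaa cgctggcaat
acgatgactt ctcaagcgaa cattgacgac
ggcagctgga aggcggtctc cgagggcgga"""
print(translate(dna))  # MNRRGLNAGNTMTSQANIDDGSWKAVSEGG
```

Applied to the 90 nucleotides shown on the slide, the sketch reproduces the 30 amino acids listed beneath them.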
Genetic Code
uccs.edu
Sequence of Amino Acids: Protein
• Unbranched polymer
• Peptide backbone
• Twenty side chain types
• 3D structure the key
Amino Acid
Polypeptide Chain
From genes to proteins and their function
Gene
> DNA sequence
AATTCATGAAAATCGTATACTGGTCTGGTACCGGCAACAC
TGAGAAAATGGCAGAGCTCATCGCTAAAGGTATCATCGAA
TCTGGTAAAGACGTCAACACCATCAACGTGTCTGACGTTA
ACATCGATGAACTGCTGAACGAAGATATCCTGATCCTGGG
TTGCTCTGCCATGGGCGATGAAGTTCTCGAGGAAAGCGAA
TTTGAACCGTTCATCGAAGAGATCTCTACCAAAATCTCTG
GTAAGAAGGTTGCGCTGTTCGGTTCTTACGGTTGGGGCGA
CGGTAAGTGGATGCGTGACTTCGAAGAACGTATGAACGGC
TACGGTTGCGTTGTTGTTGAGACCCCGCTGATCGTTCAGA
ACGAGCCGGACGAAGCTGAGCAGGACTGCATCGAATTTGG
TAAGAAGATCGCGAACATCTAGTAGA
Function
> Protein sequence
MKIVYWSGTGNTEKMAELIAKGIIESGKDVNTINVS
DVNIDELLNEDILILGCSAMGDEVLEESEFEPFIEEIS
TKISGKKVALFGSYGWGDGKWMRDFEERMNGYG
CVVVETPLIVQNEPDEAEQDCIEFGKKIANI
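Going from a gene sequence like the one above to its protein involves locating the start codon (ATG) and reading codons until a stop codon. The sketch below is a deliberately simplified illustration, not the slides' method: it assumes the standard genetic code, looks only at the forward strand, and takes the first ATG it finds. The name `first_orf_protein` is hypothetical.

```python
from itertools import product

# Standard genetic code in T, C, A, G codon order ('*' marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def first_orf_protein(dna: str) -> str:
    """Translate from the first ATG up to the first in-frame stop codon."""
    seq = "".join(dna.split()).upper()
    start = seq.find("ATG")
    if start == -1:
        return ""                            # no start codon present
    protein = []
    for i in range(start, len(seq) - 2, 3):  # step through complete codons
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        if aa == "*":                        # stop codon ends the protein
            break
        protein.append(aa)
    return "".join(protein)


# First line of the gene shown above: the ATG sits after the leading AATTC.
print(first_orf_protein("AATTCATGAAAATCGTATACTGGTCTGGTACCGGCAACAC"))  # MKIVYWSGTGN
```

Real gene finding also considers the reverse strand, all reading frames, and minimum length thresholds; this sketch only shows the start-to-stop reading step.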
Languages of Protein and DNA
In real life
What do bioinformaticians study?
Example: Comparison and Similarity
What is the function of these structures?
What is the function of this sequence?
What is the function of this motif?
– The fold provides a scaffold, which can be decorated in different ways by different sequences to confer different functions.
– Knowing the fold and function allows us to rationalise how the structure affects its function at the molecular level.
• Compare proteins with similar sequences and understand what
the similarities and differences mean.
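One common way to quantify sequence similarity is a global alignment score computed by dynamic programming (the Needleman-Wunsch recurrence). The sketch below is a minimal illustration under stated assumptions: the match, mismatch, and gap values are arbitrary placeholders, whereas real protein comparisons typically use a substitution matrix such as BLOSUM62.

```python
def global_alignment_score(a: str, b: str,
                           match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Best global alignment score of a and b (Needleman-Wunsch, score only)."""
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]                       # a[:i] aligned against nothing
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap                 # gap in b
            left = curr[j - 1] + gap           # gap in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


print(global_alignment_score("MKIV", "MKIV"))  # 4
print(global_alignment_score("MKIV", "MKV"))   # 1
```

Only two rows of the table are kept, so memory grows with the shorter dimension rather than with the product of both lengths; recovering the alignment itself would require keeping the full table.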
Genomes sequenced (timeline, 1995 to 2005)
1995: First bacterial genomes sequenced (H. influenzae and M. genitalium)
1996: The yeast genome
1997: E. coli K12
1998: C. elegans
1999: Full sequence of chr. 22
2000: D. melanogaster genome & chr. 21; A. thaliana
2001: Human draft
2002: Mouse, Ciona, Rice, Fugu, Anopheles
2003 to 2005: Human finished, Rat, Chicken, Chimpanzee, Xenopus, Zebrafish
Growth in Data and Databases
[Figure: number of sequences (millions, axis 0 to 700) and estimated number of databases, plotted by year from 1980 to 2007]
Whole genome comparisons: Gene order in genomes
The ~1000 genes on Mouse Chromosome 16 map to
Human Chromosomes 3, 8, 12, 16, 21, and 22
Mouse Chromosome 16
Comparative Genomics
Helicobacter pylori J99
Helicobacter pylori 26695
Phylogenetic tree
[Figure: phylogenetic tree of human rhinovirus (HRV) serotypes; leaf labels range from HRV1A and HRV2 to HRV100, include HRVHanks, and several (HRV1b, HRV2, HRV16, HRV89) are marked as GenBank entries]
Reconstructing phylogenetic trees
Graph-based and optimization methods
Protein structure prediction and modeling
• Predict the 3-dimensional structure of a protein from its
primary sequence
MNIFEMLRID EGLRLKIYKD TEGYYTIGIG
HLLTKSPSLN AAKSELDKAI GRNCNGVITK
DEAEKLFNQD VDAAVRGILR NAKLKPVYDS
LDAVRRCALI NMVFQMGETG VAGFTNSLRM
LQQKRWDEAA VNLAKSRWYN QTPNRAKRVI
TTFRTGTWDA YKNL
?
Computer Aided Drug Design
• Understanding how structures bind other molecules
(function)
• Designing inhibitors
• Docking, structure modeling
Protein docking
Given two biological molecules (one of them a protein), determine whether they interact.
Protein-protein docking
Protein-ligand docking
• Efficiently represent the docking surface and identify regions of interest.
• Match corresponding surfaces to optimize binding sites.
Optimization methods for docking problems
Genetic Algorithm
Linear optimization
Nonlinear optimization
Direct search methods
..
Drug Lead Screening & Docking
?
Complementarity
- Shape
- Chemical
- Electrostatic
Molecular Graphs and Graph similarity
• A molecular structure can be interpreted as a mathematical
graph where each bond is an edge.
• Such a representation allows for the mathematical processing
of molecular structures using graph theory.
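As a concrete illustration of the graph view, a molecule can be stored as a set of atoms (nodes) and bonds (edges). The sketch below is an assumption-laden example, not taken from the slides: it uses heavy atoms only, ignores bond order, and the similarity measure (Jaccard index over bond types) is a deliberately crude stand-in for real graph similarity methods. All names are hypothetical.

```python
from collections import defaultdict


def build_graph(atoms: dict, bonds: list) -> dict:
    """Adjacency list: atom index -> sorted list of bonded atom indices."""
    adjacency = defaultdict(set)
    for a, b in bonds:
        adjacency[a].add(b)   # bonds are undirected, so record both directions
        adjacency[b].add(a)
    return {atom: sorted(adjacency[atom]) for atom in atoms}


def bond_types(atoms: dict, bonds: list) -> set:
    """Set of element pairs joined by a bond, e.g. {'C-C', 'C-O'}."""
    return {"-".join(sorted((atoms[a], atoms[b]))) for a, b in bonds}


def jaccard(x: set, y: set) -> float:
    """Shared items divided by all items; 1.0 means identical sets."""
    union = x | y
    return len(x & y) / len(union) if union else 1.0


# Heavy-atom skeletons: ethanol (C-C-O) and dimethyl ether (C-O-C).
ethanol_atoms, ethanol_bonds = {0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2)]
ether_atoms, ether_bonds = {0: "C", 1: "O", 2: "C"}, [(0, 1), (1, 2)]

print(build_graph(ethanol_atoms, ethanol_bonds))   # {0: [1], 1: [0, 2], 2: [1]}
print(jaccard(bond_types(ethanol_atoms, ethanol_bonds),
              bond_types(ether_atoms, ether_bonds)))  # 0.5
```

The two molecules have the same formula but different connectivity, which the bond-type comparison picks up even though it ignores most of the graph structure.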
Microarray: Measuring Gene Expression
Idea: measure the amount of mRNA to see which
genes are being expressed in (used by) the cell.
Measuring protein would be more direct, but is
currently harder.
Hybridization, RNA, cDNA
Microarray
The Process
Chemistry Basics:
Surface chemistry is used to attach the probe molecules to the glass substrate.
Chemical reactions are used to attach the fluorescent dyes to the target molecules.
Probe and target hybridise to form a double helix.
[Figure: labelled targets in solution hybridise with the probes on the array to form heteroduplexes]
[Figure: RNA sample 1 (+ green label) and RNA sample 2 (+ red label) are hybridised to the array, which is then read by a scanner]
Tumors and Microarray
Tumor gene profiles and Microarray data



Image portrays gene expression
profiles showing differences
between different tumors
Tumors:
MD (medulloblastoma)
Mglio (malignant glioma)
Rhab (rhabdoid)
PNET (primitive neuroectodermal tumor)
Ncer (normal cerebella)
Image Processing for Microarray
Resolution
standard 10 µm [currently, max 5 µm]
100 µm spot on chip = 10 pixels in diameter
Image format
TIFF (tagged image file format), 16 bit (65,536 levels of grey)
1 cm x 1 cm image at 16 bit = 2 MB
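The figures on this slide follow from simple arithmetic, which the short Python sketch below reproduces. It assumes the lengths are in micrometres (µm), that the image is uncompressed, and that file headers are ignored; the function names are illustrative.

```python
UM_PER_CM = 10_000  # micrometres per centimetre


def pixels_across(length_um: float, pixel_um: float) -> int:
    """Number of pixels needed to span a length at a given pixel size."""
    return round(length_um / pixel_um)


def image_size_bytes(side_um: float, pixel_um: float, bits_per_pixel: int) -> int:
    """Uncompressed size of a square image, ignoring file headers."""
    side_px = pixels_across(side_um, pixel_um)
    return side_px * side_px * bits_per_pixel // 8


print(pixels_across(100, 10))                     # 10 pixels across a 100 µm spot
print(2 ** 16)                                    # 65536 grey levels at 16 bit
print(image_size_bytes(1 * UM_PER_CM, 10, 16))    # 2000000 bytes, i.e. 2 MB
print(image_size_bytes(1 * UM_PER_CM, 5, 16))     # 8000000 bytes at 5 µm
```

Halving the pixel size quadruples the image size, which is why scanning at 5 µm rather than 10 µm has a noticeable storage cost.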
Data
What is a genetic network?
Gene networks are usually represented as directed graphs in which the nodes are the genes and the edges represent regulation. A network summarizes a limited set of relationships among a subset of genes, including both positive and negative feedback loops.
Jenssen et al. 2001
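A directed graph with signed edges is enough to capture this description in code. The sketch below is illustrative only: the gene names are hypothetical, +1 stands for activation and -1 for repression (an assumed encoding), and a feedback loop is classified by the product of its edge signs.

```python
# Hypothetical network: regulator -> {target: sign}, +1 activates, -1 represses.
network = {
    "geneA": {"geneB": +1},
    "geneB": {"geneC": +1},
    "geneC": {"geneA": -1},
}


def regulators_of(network: dict, gene: str) -> list:
    """Genes with an edge pointing at the given gene."""
    return sorted(src for src, targets in network.items() if gene in targets)


def loop_sign(network: dict, cycle: list) -> int:
    """Product of edge signs around a cycle: +1 positive, -1 negative feedback."""
    sign = 1
    # Pair each gene with the next one, wrapping around to close the loop.
    for src, dst in zip(cycle, cycle[1:] + cycle[:1]):
        sign *= network[src][dst]
    return sign


print(regulators_of(network, "geneA"))                   # ['geneC']
print(loop_sign(network, ["geneA", "geneB", "geneC"]))   # -1
```

A loop with an odd number of repressing edges gives -1 (negative feedback); an even number gives +1 (positive feedback).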
Construction of a Simple Network
Clustering
Brazhnik et al.