Information System for Comparative Analysis of Legume Genomes
Anita Dalwani
Advisors: Dr. Roger Innes, Dr. Haixu Tang
LAYOUT
• Motivation
• Participants
• Background
• Design
• Results/Demo
• Future Work
Motivation
• Goal of the legume genome project
- Investigate the process of genome restructuring
following polyploidization in plants (soybean and its
relatives in the genus Glycine)
- Try to answer questions about:
- Genome evolution on both short (<100,000 yrs)
and long (>50 million yrs) time scales
- Evolution of disease resistance (R) genes
Motivation
• To answer these questions:
- 1 Mbp syntenic genomic regions from six taxa,
as well as their duplicated regions in the
polyploid members (12 such regions in total),
will be sequenced and analyzed.
- These regions contain several important
disease resistance (R) genes.
Motivation
Plant species and accession     | No. of regions to be analysed | Whole genome size (megabases)
G. max cultivar Williams 82     | 2                             | 1103
G. max PI 96983                 | 2                             | 1103
G. tomentella G1188 (2n=80)     | 4                             | 2083
G. tomentella race D3 (2n=40)   | 2                             | 1103
Teramnus labialis               | 1                             | < 700
Medicago truncatula             | 1                             | 466
Motivation
• Information System
- central repository for the data
- stores and retrieves updated information
- bioinformatics and visualization tools
Participants
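Participant          | Lab and university                           | Role
Roger Innes          | Innes Lab, Indiana University, Bloomington   | Principal Investigator
Tom Ashfield         | Innes Lab, Indiana University, Bloomington   | R gene evolution
Anita Dalwani        | Innes Lab, Indiana University, Bloomington   | Database development, Web application
Murali Mohan         | Innes Lab, Indiana University, Bloomington   | Database development
Nevin Young          | Young Lab, University of Minnesota           | Co-PI
Steve Cannon         | Young Lab, University of Minnesota           | Phylogenetics; R genes; comparative genomics
Roxanne Denny        | Young Lab, University of Minnesota           | Lab Manager
Jeff Doyle           | Doyle Lab, Cornell University                | Co-PI
Bernard Pfeil        | Doyle Lab, Cornell University                | Phylogenetics and polyploidy
Bruce Roe            | Roe Lab, Oklahoma University                 | Co-PI
Majesta Siegfried    | Roe Lab, Oklahoma University                 | BAC sequencing
Saghai Maroof        | Maroof Lab, Virginia Tech                    | Co-PI
Milind Ratnaparkhe   | Maroof Lab, Virginia Tech                    | R genes; comparative genomics
Jafar Mammado        | Maroof Lab, Virginia Tech                    | R genes; comparative genomics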
Background
• Procedure
1. Create and make available Bacterial
Artificial Chromosome (BAC) libraries of
each species.
- Indexing of available BACs, BAC end sequences,
libraries, probes, vectors and gel images
Background
2. Assemble syntenic BAC contigs from
each library
i. Strategically chosen soybean clones are
used as probes
[Figure: probes Probe 53, Probe 21, Probe 9, Probe 26, Probe 1 and Probe 3 shown in order along the region, with sequence fragments]
ii. Individual probes are hybridized to high-density BAC
filters representing all the target genomes
Background
iii. Integrity of contigs is confirmed by
fingerprinting
iv. The set of clones that hybridize to two or more
probes is selected
v. BACs representing the tentative minimum
tiling path will be end sequenced
[Figure: BAC clones Bac1 to Bac8 placed against the ordered probes Probe53, Probe21, Probe9, Probe26, Probe1 and Probe3; a second panel retains only Bac2, Bac3, Bac4 and Bac8]
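Steps iv and v can be sketched in Python as follows. This is only an illustration under stated assumptions, not the project's actual procedure: the hit list is invented, each clone is assumed to cover a contiguous run of probes, and the function names `clones_with_multiple_hits` and `tiling_path` are hypothetical. The path is built greedily: starting at the left-most probe, each step picks the clone that shares a probe with the region covered so far and reaches furthest to the right.

```python
from collections import defaultdict

# Hypothetical hybridization hits as (BAC clone, probe) pairs; not project data.
HITS = [
    ("Bac1", "Probe53"),
    ("Bac2", "Probe53"), ("Bac2", "Probe21"),
    ("Bac3", "Probe21"), ("Bac3", "Probe9"), ("Bac3", "Probe26"),
    ("Bac4", "Probe26"), ("Bac4", "Probe1"),
    ("Bac5", "Probe9"),
    ("Bac8", "Probe1"), ("Bac8", "Probe3"),
]
PROBE_ORDER = ["Probe53", "Probe21", "Probe9", "Probe26", "Probe1", "Probe3"]


def clones_with_multiple_hits(hits, min_probes=2):
    """Step iv: keep clones that hybridize to at least min_probes probes."""
    probes_by_clone = defaultdict(set)
    for clone, probe in hits:
        probes_by_clone[clone].add(probe)
    return {c: p for c, p in probes_by_clone.items() if len(p) >= min_probes}


def tiling_path(candidates, probe_order):
    """Step v: greedy interval cover; consecutive clones must share a probe.

    Assumes every clone spans a contiguous run of probes in probe_order.
    Returns (path, complete) where complete is False if a gap was found.
    """
    position = {probe: i for i, probe in enumerate(probe_order)}
    spans = {
        clone: (min(position[p] for p in probes), max(position[p] for p in probes))
        for clone, probes in candidates.items()
    }
    path = []
    reach = -1  # index of the right-most probe covered so far
    last = len(probe_order) - 1
    while reach < last:
        limit = max(reach, 0)  # the first clone must contain the first probe
        usable = [c for c, (start, end) in spans.items()
                  if start <= limit and end > reach]
        if not usable:
            return path, False
        best = max(sorted(usable), key=lambda c: spans[c][1])
        path.append(best)
        reach = spans[best][1]
    return path, True


candidates = clones_with_multiple_hits(HITS)
path, complete = tiling_path(candidates, PROBE_ORDER)
print(path, complete)
```

With this made-up hit list, single-probe clones such as Bac1 and Bac5 are dropped in step iv, and the remaining clones are chained from Probe53 to Probe3.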
Background
3. DNA sequencing, assembly and
annotation
4. Compare the content, order and
sequence of genes
5. Make the results available to the public
Importance
• Information System
- Centrally available data
- User-friendly interface for retrieving the information
- Updated progress information
- Tools for interpreting the results
- Works as a Laboratory Management Information System
Design
• Steps for designing the Information System
1. Design the database
- Data: BACs, BAC end sequences (BES), probes, libraries,
vectors, library screen hits, etc.
Design
- Visualize the relationships among this
large amount of data.
For example, the Library table stores detailed information about
each library used, so that each BAC record does not have to
store the library information itself.
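The Library/BAC example can be sketched as two related tables. This is a minimal sketch, not the project's schema: it uses Python's built-in `sqlite3` module in place of Oracle, and every column name, the library name `LibA` and the vector name `VectorX` are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Library details live in one place; each BAC row only points to its library.
conn.executescript("""
CREATE TABLE library (
    library_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL UNIQUE,
    genotype   TEXT,
    vector     TEXT
);
CREATE TABLE bac (
    bac_id     INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    library_id INTEGER NOT NULL REFERENCES library(library_id)
);
""")

# Hypothetical rows: one library, two BAC clones drawn from it.
conn.execute(
    "INSERT INTO library (library_id, name, genotype, vector) VALUES (?, ?, ?, ?)",
    (1, "LibA", "G. max cultivar Williams 82", "VectorX"),
)
conn.executemany(
    "INSERT INTO bac (name, library_id) VALUES (?, ?)",
    [("Bac1", 1), ("Bac2", 1)],
)
conn.commit()

# A join recovers the library details for every BAC without duplicating them.
rows = conn.execute("""
    SELECT bac.name, library.name, library.genotype
    FROM bac JOIN library USING (library_id)
    ORDER BY bac.name
""").fetchall()
for row in rows:
    print(row)
```

Because the library record is stored once, correcting a library's genotype or vector is a single-row update rather than a change to every BAC record.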
Design
- Tables were created based on these relationships
The main tables in the database are:
BAC
GEL IMAGES
GENOTYPE
LIBRARY SCREENS
PRIMER
PROBE WITHIN BACS
BES
GENOMIC SOUTHERNS
LIBRARY
LIBRARY SCREEN HITS
PROBE
VECTOR
Design
[Figure: entity-relationship diagram of PROBE, LIBRARY, BAC, PRIMER and BES, connected by "has", "derived from" and "is a" relationships]
Design
[Figure: Library Screen has Library Screen Hits]
Design
2. Populate the database with an initial set of
data
- The initial data set was stored as MS Excel spreadsheets.
- A Perl script parses this information.
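The parsing step might look like the sketch below. It is written in Python rather than Perl, and it assumes a worksheet exported to CSV with the hypothetical columns `BAC name`, `Library` and `Probe`; the real spreadsheet layout is not described here. The function `parse_hits` is likewise an illustrative name.

```python
import csv
import io

# Hypothetical CSV export of one worksheet; the rows are invented.
SHEET = """BAC name,Library,Probe
Bac1 ,LibA,Probe53
Bac2,LibA,Probe21
,LibA,Probe9
Bac2,LibA,Probe21
"""

FIELDS = ("BAC name", "Library", "Probe")


def parse_hits(text):
    """Return unique (bac, library, probe) tuples in file order.

    Values are stripped of surrounding whitespace; rows with any empty
    field and exact duplicates are skipped.
    """
    records, seen = [], set()
    for row in csv.DictReader(io.StringIO(text)):
        values = tuple((row.get(field) or "").strip() for field in FIELDS)
        if not all(values) or values in seen:
            continue
        seen.add(values)
        records.append(values)
    return records


records = parse_hits(SHEET)
print(records)
```

The cleaned tuples could then be passed to a database driver's bulk insert call to load the corresponding tables.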
Design
• Web Database Application
- understanding the needs for the project
- Web database interface
- displays information about the
project
- add and update interface
- tools for analyses
Design
• For determining the tiling path
- Designing a Visualization tool
- displays the locations of the clones with
respect to probes
- Probes are strategically chosen from the soybean
genome
Design
- Input: library name
- The subset of probes with at least one hit in the library
is selected
- BAC clones from the library that have
hits with these probes are retrieved
- Probes are arranged in order of their position
- BACs are mapped to these probes
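The steps listed above can be sketched as a small text-based map. This is an illustration only, not the project's visualization tool: the probe positions, the screen hits, the library names `LibA` and `LibB`, and the function name `tiling_map` are all assumptions.

```python
# Hypothetical probe positions along the soybean region (1 = left-most).
PROBE_POSITION = {
    "Probe53": 1, "Probe21": 2, "Probe9": 3,
    "Probe26": 4, "Probe1": 5, "Probe3": 6,
}

# Hypothetical library screen hits as (library, BAC, probe) triples.
SCREEN_HITS = [
    ("LibA", "Bac2", "Probe53"), ("LibA", "Bac2", "Probe21"),
    ("LibA", "Bac3", "Probe21"), ("LibA", "Bac3", "Probe9"),
    ("LibA", "Bac4", "Probe26"), ("LibA", "Bac4", "Probe1"),
    ("LibB", "Bac9", "Probe3"),
]


def tiling_map(library, hits, probe_position):
    """Return (ordered probes, text lines) for one library.

    Only probes with at least one hit in the library are shown; '#' marks
    a hit between the BAC on that row and the probe in that column.
    """
    lib_hits = [(bac, probe) for lib, bac, probe in hits if lib == library]
    if not lib_hits:
        return [], []
    probes = sorted({probe for _, probe in lib_hits}, key=probe_position.get)
    probes_by_bac = {}
    for bac, probe in lib_hits:
        probes_by_bac.setdefault(bac, set()).add(probe)
    # Order rows by the left-most probe each BAC hits, then by name.
    bacs = sorted(
        probes_by_bac,
        key=lambda b: (min(probe_position[p] for p in probes_by_bac[b]), b),
    )
    width = max(len(name) for name in probes + bacs) + 2
    lines = [(" " * width + "".join(p.ljust(width) for p in probes)).rstrip()]
    for bac in bacs:
        cells = "".join(
            ("#" if p in probes_by_bac[bac] else ".").ljust(width) for p in probes
        )
        lines.append((bac.ljust(width) + cells).rstrip())
    return probes, lines


probes, lines = tiling_map("LibA", SCREEN_HITS, PROBE_POSITION)
print("\n".join(lines))
```

Rows whose `#` marks overlap in a column share a probe, which is the visual cue for picking overlapping clones along the tiling path.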
Design
• System Specifications
- Database: Oracle 9i
- Languages: PHP, Perl, HTML, JavaScript
- Web Server: Apache 1.3.29
- Platform: Unix (SunOS 5.9)
Results
Future Work
• Comparative physical Mapping
• Bioinformatics tools
• Public interface
Acknowledgements
• Dr. Roger Innes
• Dr. Haixu Tang
• Dr. Sun Kim
• Legume genome project team