Selection of Resources for the Development of an Information Service Program in Molecular Biology and Genetics
Ansuman Chattopadhyay, PhD
Information Specialist in Molecular Biology and Genetics
Health Sciences Library System
University of Pittsburgh
Topics
• Multi-Step Life Sciences Research
• Literature Retrieval
• Sequence Analysis
• Laboratory Resources
• University of Pittsburgh HSLS Molecular Biology Information Service Program

Life Sciences Research: A Multi-Step Process
[Diagram: Hypothesis Generation; Knowledge Mining; Sequence Analysis; Laboratory Bench Work; Mol Biol Information Service]
Literature Retrieval Resources
PubMed
--CellSpace Knowledge Miner
--PubGene
--Genomatix BiblioSphere
Too much information: 83,130 and 31,596 results
Literature Retrieval Resources
• CellSpace Knowledge Miner: www.cellomics.cellspace.com
• PubGene: http://www.pubgene.org
• Genomatix BiblioSphere: http://www.genomatix.de/
What is CellSpace?
CellSpace is a bioinformatics tool: a knowledge-mining system that automatically detects, analyzes, and reports logical relationships among four types of terms found in the research literature:
1. Molecules: proteins, genes, drugs
2. Functions: biological processes and disease states
3. Cell types
4. Organisms
CellSpace Knowledge Miner
What is CellSpace?
Literature Association
                  Molecules  Functions  Cells & Systems  Organisms
Molecules             +          +             +             +
Functions             +          +             +             +
Cells & Systems       +          +             -             -
Organisms             +          +             -             -
Cells & Systems: Cells, Subcellular Components, Tissues and Organs
Molecules:
• Molecules
• Drugs
• Genes
• Proteins
Functions:
• Biological Functions
• Disease States
What can you do with CellSpace?
• Start with a single protein (or other molecule) and find its functions, the diseases in which it is implicated, and related molecules.
• Start with a disease or biological function and find related molecules or related functions.
• Start with two or more functions and find the molecules they have in common.
• Start with results from a high-throughput experiment (such as a cluster of co-regulated genes from microarray analysis) and find the functions those genes share.
• Start with the results of proteomics experiments and quickly screen the data to distinguish published interactions from novel ones.
• View the literature that supports the connections found in CellSpace.
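The two workflows above that start from experimental results reduce to simple set operations once gene-to-function and protein-interaction associations are available. The following is a minimal sketch, assuming a hypothetical in-memory association table (`GENE_FUNCTIONS`) and hypothetical helper names; it does not reproduce CellSpace's database or interface, and the gene and function names are placeholders:

```python
from collections import Counter

# Hypothetical gene -> function associations (placeholder data, not CellSpace output)
GENE_FUNCTIONS = {
    "geneA": {"apoptosis", "cell cycle"},
    "geneB": {"apoptosis", "DNA repair"},
    "geneC": {"apoptosis", "cell cycle", "angiogenesis"},
}


def shared_functions(cluster, gene_functions, min_genes=2):
    """Count how many genes in a cluster are linked to each function.

    Keeps only functions shared by at least `min_genes` genes.
    """
    counts = Counter(
        function
        for gene in cluster
        for function in gene_functions.get(gene, set())
    )
    return {function: n for function, n in counts.items() if n >= min_genes}


def split_interactions(observed, published):
    """Separate observed protein pairs into published and novel ones.

    Pairs are treated as unordered, so ("A", "B") matches ("B", "A").
    """
    known = {frozenset(pair) for pair in published}
    already_published, novel = [], []
    for pair in observed:
        if frozenset(pair) in known:
            already_published.append(pair)
        else:
            novel.append(pair)
    return already_published, novel
```

For the placeholder cluster geneA, geneB, and geneC, `shared_functions` keeps apoptosis (three genes) and cell cycle (two genes) and drops the functions linked to only one gene.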
CellSpace Knowledge Miner
Start with a disease or biological function and find related molecules or related functions
• Example: find molecules related to apoptosis
[Screenshot with numbered callouts 1-5, including: drag and drop; click to select; find molecules associated with "apoptosis"; get references]
Results are presented with a statistical likelihood value.
How Does CellSpace Work?
CellSpace computers analyze the National Library of Medicine's MEDLINE database, performing proprietary statistical correlation analyses of the organisms, cell types, biological processes, and molecules reported in 655 selected life science research journals. The molecular relationships extracted from the literature are stored in the CellSpace database, which can be queried through the CellSpace user interface.
The information is updated every two weeks.
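CellSpace's statistical correlation analysis is proprietary, so the following is not its method. As a minimal sketch of one widely used way to score a literature association, the hypergeometric test below asks how surprising it is that two terms co-occur in as many abstracts as they do, if their mentions were independent. The function name and all counts are illustrative assumptions:

```python
from math import comb


def cooccurrence_pvalue(n_total, n_a, n_b, n_ab):
    """One-sided hypergeometric p-value for term co-occurrence.

    n_total: abstracts in the corpus
    n_a:     abstracts mentioning term A (e.g. a gene)
    n_b:     abstracts mentioning term B (e.g. a biological function)
    n_ab:    abstracts mentioning both terms
    Returns P(X >= n_ab) under independence of A and B.
    """
    if not 0 <= n_ab <= min(n_a, n_b) or max(n_a, n_b) > n_total:
        raise ValueError("inconsistent counts")
    denominator = comb(n_total, n_b)
    # Sum the hypergeometric probabilities of n_ab or more shared abstracts
    tail = sum(
        comb(n_a, k) * comb(n_total - n_a, n_b - k)
        for k in range(n_ab, min(n_a, n_b) + 1)
    )
    return tail / denominator
```

With 1,000 abstracts, 50 mentioning a gene, 40 mentioning apoptosis, and 10 mentioning both, the expected overlap under independence is only 2, so `cooccurrence_pvalue(1000, 50, 40, 10)` is very small. A lower p-value corresponds to a stronger literature association.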
PubGene
The Network Browser tool displays literature association networks for a gene.
The Set Cover Article Search tool lets you search the literature using a set covering algorithm, which is particularly useful for finding literature references for large sets of terms.
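The set covering idea can be sketched as follows. This assumes each article has already been reduced to the set of query terms it mentions (hypothetical data and function name). It uses the standard greedy approximation, which repeatedly picks the article covering the most still-uncovered terms. PubGene's actual implementation may differ:

```python
def greedy_set_cover(terms, articles):
    """Pick a small set of articles that together mention every term.

    terms:    iterable of query terms (e.g. gene names)
    articles: dict mapping article ID -> set of terms it mentions
    Returns (chosen article IDs in pick order, terms no article covers).
    """
    uncovered = set(terms)
    chosen = []
    while uncovered:
        # Greedy step: the article that covers the most uncovered terms
        best = max(
            articles,
            key=lambda article: len(articles[article] & uncovered),
            default=None,
        )
        if best is None or not articles[best] & uncovered:
            break  # remaining terms appear in no article
        chosen.append(best)
        uncovered -= articles[best]
    return chosen, uncovered
```

Greedy selection does not always find the smallest possible set, but it is simple and fast for large term lists, which is where set covering helps most.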
PubGene
The query gene is shown in bright red font in the graph, its direct neighbors in darker red font, and neighbors of neighbors in black font.
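The three font colors correspond to breadth-first distance from the query gene. The following is a minimal sketch, assuming the literature network is available as a hypothetical adjacency dictionary with placeholder gene names. It illustrates the display rule, not PubGene's code:

```python
from collections import deque

# Font colors described for the PubGene graph, keyed by distance from the query gene
COLOR_BY_DISTANCE = {0: "bright red", 1: "dark red", 2: "black"}
MAX_DISTANCE = max(COLOR_BY_DISTANCE)


def color_neighborhood(graph, query):
    """Breadth-first search from the query gene, labelling every gene
    within MAX_DISTANCE steps with the font color for its distance.

    graph: dict mapping gene -> iterable of co-cited genes
           (list each edge in both directions for an undirected network)
    """
    distance = {query: 0}
    queue = deque([query])
    while queue:
        gene = queue.popleft()
        if distance[gene] == MAX_DISTANCE:
            continue  # do not expand beyond neighbors of neighbors
        for neighbor in graph.get(gene, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[gene] + 1
                queue.append(neighbor)
    return {gene: COLOR_BY_DISTANCE[d] for gene, d in distance.items()}
```

Breadth-first search guarantees each gene gets its shortest distance, so a gene linked both directly and indirectly to the query is colored as a direct neighbor.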
BiblioSphere
Resources Comparison
Resource | Availability | Coverage | Update Frequency
CellSpace | Commercial (2-week free trial) | 655 MEDLINE journals | Every 2 weeks
PubGene | V2.1 free; V2.3 commercial | All MEDLINE journals; SP: H, M, R | V2.1: once a year; V2.3: every 2 weeks
BiblioSphere | Free, 20 uses/month | All MEDLINE journals, abstracts only; SP: H, M, R | Continuous
(SP = species: H = human, M = mouse, R = rat)
Resources Comparison: Search Terms
CellSpace | Molecules: genes, proteins, drugs; Functions: biological functions, disease states; cell and tissue types
PubGene | Gene names
BiblioSphere | Gene names
Information Hubs
Information hubs are molecular biology and genetics resources that serve as access points for retrieving a broad range of information through a small number of selected web-based public databases.
Information Hubs
• UCSC Genome Bioinformatics Resources
--Gene's Detail Page
--Genome Browser
--Family Browser
--Proteome Browser
• SwissProt
• LocusLink / Entrez Gene
• GeneCards
• GeneLynx
• Incyte Proteome BioKnowledge Library
• Human Protein Reference Database
• Organism Genome Consortium sites
Information Hubs
[Diagram linking resources (UCSC Family Browser, UCSC Gene's Detail Page, UCSC Genome Browser, UCSC Proteome Browser, LocusLink, SwissProt, GeneCards, GeneLynx, OMIM, CGAP, PubMed, AceView, Mouse Genome Informatics) to the data they provide: gene expression data, RNA structure, sequence (genomic, mRNA, protein), protein structure, GO annotations (molecular function, biological pathways, cellular component), and other species]
Information Hubs: Example Pages
• UCSC Gene's Detail Page: http://genome.ucsc.edu/cgi-bin/hgGene?hgsid=31408663&db=hg16&hgg_gene=U14680&hgg_chrom=chr17&hgg_start=41570859&hgg_end=41650551
• LocusLink: http://www.ncbi.nlm.nih.gov/LocusLink/LocRpt.cgi?l=672
• GeneCards: http://bioinfo.weizmann.ac.il/cards-bin/carddisp?BRCA1&search=BRCA1&suff=txt
• HPRD: http://www.hprd.org/protein/00218
Information Hubs
Proteome BioKnowledge Library
[Diagram of information types: sequence; expression in organ/tissue, cell type, and tumor type; protein interactions; literature excerpts; disease; Gene Ontology terms; gene regulation; protein modifications]
Resources Comparison
Resource | Availability | Type | SP Coverage | Noteworthy Features
UCSC | Free | | H, M, R, etc. | Expression; Proteome/Family Browser
SwissProt/UniProt | Free | Curated | All | Protein information
LocusLink | Free | | H, M, R, N, P, etc. | Links to NCBI resources
GeneCards | Free | | H | Expression
GeneLynx | Free | | H, M, R |
Proteome BKL | Commercial | Curated | H, M, R, Y, N, pathogenic fungi | Literature excerpts
HPRD | Free | Curated | H | Protein interactions
(SP = species)
Genome Browsers
Molecular Database Catalog
• Nucleic Acids Research Database Issue: http://nar.oupjournals.org/
Growth of Molecular Databases
[Chart: number of databases and number of articles, 1996-2004]
Database Catalog
http://www.infobiogen.fr/services/dbcat/
Sequence Analysis
• Sequence Search
• Sequence Alignment
• MolBiol Tools: restriction mapping, PCR primer design, sequence manipulation
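Two of the simplest MolBiol tools, restriction mapping and sequence manipulation, can be sketched in a few lines. This minimal example assumes recognition sites without ambiguity codes. The three enzymes listed (EcoRI GAATTC, BamHI GGATCC, HindIII AAGCTT) are palindromic, so scanning the top strand finds every site. Full-featured tools also handle non-palindromic and degenerate sites, cut offsets, and circular sequences; the function names here are illustrative:

```python
# Recognition sequences of three common palindromic restriction enzymes
ENZYMES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}

# Translation table for the DNA complement; other characters pass through unchanged
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_sites(seq, site):
    """Return 0-based start positions of every (possibly overlapping) match."""
    seq, site = seq.upper(), site.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def restriction_map(seq, enzymes=ENZYMES):
    """Map each enzyme name to the start positions of its recognition site."""
    return {name: find_sites(seq, site) for name, site in enzymes.items()}
```

For example, `restriction_map("AAGAATTCGGGGATCCAAGAATTC")` reports EcoRI sites starting at positions 2 and 18, a BamHI site at position 10, and no HindIII site.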
Web Server Catalog
• Nucleic Acids Research Web Server Issue: http://nar.oupjournals.org/
Sequence Analysis
http://www.bioinformatics.vg/
http://healthlinks.washington.edu/index.cfm?id=210BCCB7-511A-4C6B-8B40-DFC47AABEA7F
http://www.hsls.pitt.edu/guides/genetics
Sequence Analysis
DNAStar LaserGene (PC/Mac)
Sequence Analysis
Vector NTI
Database: DNA/RNA, Protein, Oligo, Enzyme, Gel Marker, BLAST Result, Analysis Result
Software: Vector NTI core, AlignX, ContigExpress, GenomBench, BioAnnotator
Sequence Analysis
The Vector NTI Advanced software suite consists of five independent yet interconnected components:
• Vector NTI core: the cornerstone application of the suite, providing tools for sequence analysis and molecule manipulation
• AlignX: a multiple sequence alignment tool
• ContigExpress: a DNA sequence assembly and sequencing project management tool
• GenomBench: a tool for genomic DNA sequence analysis and annotation
• BioAnnotator: a tool for functional annotation of DNA and protein sequences
Sequence Analysis
Using Vector NTI, molecular biologists can:
• Perform routine sequence analysis tasks such as restriction mapping, identifying protein-coding regions, finding sequence motifs, and running sequence similarity searches
• Generate recombinant cloning strategies and protocols
• Design and analyze PCR primers
• Catalog a growing collection of plasmids and PCR primers to track the origin and lineage of recombinant molecules
• Run in silico gel electrophoresis
• Perform and edit multiple sequence alignments of proteins and nucleic acids
• Create publication-quality graphics
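As a rough illustration of PCR primer analysis, the sketch below computes GC content and a melting temperature with the Wallace rule, Tm = 2(A+T) + 4(G+C) °C, a rule of thumb intended for short oligonucleotides. This is not Vector NTI's method; dedicated primer-design software generally uses more detailed thermodynamic models. The function names and the thresholds in `passes_basic_checks` (18-25 nt, 40-60% GC, a G or C at the 3' end) are assumptions based on common rules of thumb:

```python
def gc_content(primer):
    """Fraction of G and C bases in the primer."""
    p = primer.upper()
    return (p.count("G") + p.count("C")) / len(p)


def wallace_tm(primer):
    """Approximate melting temperature (deg C) by the Wallace rule:
    2 deg C per A/T base plus 4 deg C per G/C base."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


def passes_basic_checks(primer, length=(18, 25), gc_range=(0.40, 0.60)):
    """Hypothetical screening rule: length and GC content within common
    rule-of-thumb ranges, and a G or C at the 3' end (a 'GC clamp')."""
    p = primer.upper()
    return (
        length[0] <= len(p) <= length[1]
        and gc_range[0] <= gc_content(p) <= gc_range[1]
        and p[-1] in "GC"
    )
```

For the 12-mer ATGCATGCATGC (six A/T and six G/C bases), the Wallace rule gives 2 × 6 + 4 × 6 = 36 °C, and the GC content is 0.5.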
Laboratory Resources
• Protocols
• Useful Laboratory Resources
Laboratory Resources
Current Protocols: http://www.interscience.wiley.com/c_p/index.htm
Protocol sections: Basic Protocol, Alternate Protocol, Commentary, Critical Parameters, Troubleshooting, Time Considerations, Key References, Internet Resources
Laboratory Resources
http://researchlink.labvelocity.com/
HSLS Mol Biol Information Service
http://www.hsls.pitt.edu/guides/genetics
Website Usage Report
Workshops, May 2003-April 2004
[Chart: number of times each workshop was offered and number of workshop attendees]
Workshop 1: Information Hubs
Workshop 2: Sequence Similarity Searching
Workshop 3: DNA Protein Analysis Tools
Workshop 4: CellSpace Knowledge Miner
Workshop 5: VectorNTI
One-on-one Consultation, May 2003-May 2004
[Chart: number of consultations per month; total: 70]
“…only half of biomedical researchers using genome databases are familiar with the tools that can be used to actually access the data.”
“…all scientists on the planet must be empowered to use these powerful databases to unravel longstanding scientific mysteries.”
Andreas D. Baxevanis & Francis S. Collins, Nature Genetics, September 2002, Vol 32