BIOINFORMATICS CENTRE (BIC), UNIVERSITY OF PUNE

Introduction & History

The Bioinformatics Centre at the University of Pune was established in 1987 as one of the nine Distributed Information Centres (DICs) under the Biotechnology Information System (BTIS) of the Department of Biotechnology, Government of India. The Centre provides accurate and up-to-date information in the area of biotechnology, with an emphasis on virology, protein and nucleic acid sequences and structures, and microbial strain data. It also provides access to other related areas through networks. Apart from answering bibliographic queries and supplying sequence data, the Centre offers unique facilities for data analysis. The hardware, software and expertise available at the Centre support users in every possible way. The Centre is equipped with state-of-the-art services and facilities for hardware, software, and information storage and retrieval.

The Main Objectives of This Centre Are:

- To function as an information base with database resources in the areas of biotechnology and life sciences.
- To generate information relevant to specific areas, structure it, and make it available to interested parties.
- To act as an active network node through which scientists and entrepreneurs can access the information resources available in their areas of interest.
- To identify and implement necessary information systems and software.
- To conduct seminars, workshops and training courses (Advanced Diploma in Bioinformatics).

Research Activities

The Centre has a second wing consisting of doctoral and postdoctoral students, research assistants and scientists who carry out research in areas closely related to bioinformatics. Some of the research projects in progress are listed below.

Computer-Aided Identification and Classification of Viruses

The Bioinformatics Centre has prepared the world's largest computerised 'Animal Virus Information System' (AVIS). Using the information in this database, a hybrid method has been developed to identify unknown animal viruses on the computer. The method has two main components:

- assignment of the virus to a family using a deterministic approach in a monothetic fashion, and
- use of a probabilistic method to identify the virus, given the assigned family.

The probabilistic method is based on Willcox's implementation of Bayes' theorem. The software developed in the Centre allows rapid virus identification and also suggests additional tests that the user can carry out to improve the identification score.
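The two-stage scheme can be sketched as follows. This is a minimal illustration, not the AVIS software. The key, the family and species names, the test names and all probabilities are hypothetical placeholders. Stage one walks a monothetic key, testing one character at a time. Stage two computes Willcox-style identification scores, that is, likelihoods normalised over the species of the assigned family, assuming equal priors and independent tests. The `suggest_test` helper is likewise an assumption about how additional tests might be proposed.

```python
from typing import Dict, Optional, Union

# A key node is either a family name (leaf) or (character, {value: subtree}).
# Each node tests a single character, which is what makes the key monothetic.
Key = Union[str, tuple]

# Hypothetical key, for illustration only.
HYPOTHETICAL_KEY: Key = (
    "dna_genome",
    {
        True: ("enveloped", {True: "Family_A", False: "Family_B"}),
        False: "Family_C",
    },
)

# Hypothetical probability that a member of each species is positive for each test.
HYPOTHETICAL_MATRIX: Dict[str, Dict[str, Dict[str, float]]] = {
    "Family_A": {
        "species_1": {"haemagglutination": 0.95, "ether_sensitive": 0.90, "cell_culture": 0.20},
        "species_2": {"haemagglutination": 0.10, "ether_sensitive": 0.85, "cell_culture": 0.80},
    },
}


def assign_family(key: Key, observed: Dict[str, bool]) -> str:
    """Deterministic stage: follow the key one character at a time."""
    while not isinstance(key, str):
        character, branches = key
        key = branches[observed[character]]
    return key


def willcox_scores(profiles: Dict[str, Dict[str, float]],
                   results: Dict[str, bool]) -> Dict[str, float]:
    """Probabilistic stage: normalised likelihood of each species.

    Assumes equal prior probabilities and independent test results.
    """
    likelihoods = {}
    for species, probs in profiles.items():
        likelihood = 1.0
        for test, positive in results.items():
            p = probs[test]
            likelihood *= p if positive else 1.0 - p
        likelihoods[species] = likelihood
    total = sum(likelihoods.values())
    return {species: value / total for species, value in likelihoods.items()}


def suggest_test(profiles: Dict[str, Dict[str, float]],
                 results: Dict[str, bool],
                 scores: Dict[str, float]) -> Optional[str]:
    """Hypothetical helper: the untested character that best separates the
    two highest-scoring species (largest difference in probability)."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    if len(ranked) < 2:
        return None
    first, second = ranked[0], ranked[1]
    untested = [t for t in profiles[first] if t not in results]
    if not untested:
        return None
    return max(untested, key=lambda t: abs(profiles[first][t] - profiles[second][t]))


if __name__ == "__main__":
    family = assign_family(HYPOTHETICAL_KEY, {"dna_genome": True, "enveloped": True})
    results = {"haemagglutination": True}
    scores = willcox_scores(HYPOTHETICAL_MATRIX[family], results)
    print(family, scores, suggest_test(HYPOTHETICAL_MATRIX[family], results, scores))
```

With these placeholder numbers, `Family_A` is assigned. After a positive haemagglutination result, `species_1` scores about 0.90, and `cell_culture` is suggested next because the two candidates differ most on it.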
The Centre's virus identification software is the only software in the world that allows interactive, online identification of virus species over the Internet.

Studies on the Proteins of Viruses

Protein sequence databases were searched to extract the sequences of viral proteins. These sequences were grouped, and conserved oligopeptides were obtained for each protein from each family or group. These unique conserved oligopeptides serve as signatures of that family or group. In addition, amino acid composition data of the proteins have been used to create identification matrices, which are used in computer-aided virus identification. Analyses of Flavivirus envelope glycoprotein sequences have revealed a periodic occurrence of glycine. The role of glycine and other periodically repeating amino acids is being investigated through modelling studies at the peptide level and through analysis of the modelled 3-D structure of the envelope glycoprotein (EgP) of Japanese encephalitis virus.

Analysis of Human DNA Sequences

Human DNA sequence data are increasing almost exponentially owing to the Human Genome Project. Several groups are analysing these data to gain insight into the function of the genome. At our Centre, the available Homo sapiens DNA sequence data were extracted and analysed.
One of the interesting findings of the study is the higher usage of trinucleotides ending with T in the F3 frame. A similar pattern of trinucleotide usage was reported earlier from the analysis of cDNA sequences in prokaryotes (Kolaskar & Reddy, 1985).
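A frame-wise trinucleotide count of the kind behind this finding can be sketched as below. The frame convention is an assumption: F1 starts at the first base, F2 at the second and F3 at the third. Triplets containing ambiguous bases are skipped. The function names are illustrative only.

```python
from collections import Counter

VALID = set("ACGT")


def trinucleotide_usage(seq: str, frame: int) -> Counter:
    """Count non-overlapping trinucleotides read in frame 1, 2 or 3.

    Assumed convention: F1 starts at the first base, F2 at the second and
    F3 at the third. Triplets containing ambiguous bases (e.g. N) are skipped.
    """
    if frame not in (1, 2, 3):
        raise ValueError("frame must be 1, 2 or 3")
    seq = seq.upper()
    counts = Counter()
    for i in range(frame - 1, len(seq) - 2, 3):
        triplet = seq[i:i + 3]
        if set(triplet) <= VALID:
            counts[triplet] += 1
    return counts


def fraction_ending_with(seq: str, frame: int, base: str = "T") -> float:
    """Fraction of trinucleotides in the given frame whose last base is `base`."""
    counts = trinucleotide_usage(seq, frame)
    total = sum(counts.values())
    ending = sum(n for triplet, n in counts.items() if triplet.endswith(base))
    return ending / total if total else 0.0


if __name__ == "__main__":
    demo = "AAATTTCCCGGT"  # toy sequence, not real data
    for f in (1, 2, 3):
        print(f"F{f}", fraction_ending_with(demo, f))
```

Applied to large sets of human sequences, comparing `fraction_ending_with(seq, 3)` against the other two frames gives the kind of frame bias described above.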
Protein-coding regions and F3 DNA sequences with large open reading frames were therefore translated into protein, and protein databases such as PIR and SWISS-PROT were searched. Our analysis further indicated that these hypothetical Homo sapiens proteins from the F3 frame have sequence similarity with proteins from bacteria, protozoa, etc. Such DNA sequences are therefore characterised as ancient DNA. Additional studies to support our hypothesis are in progress.

Analysis of Inter-Residue Cα Distances and Its Implications for Protein Structure

Cα(i) to Cα(i+n) distance distributions were analysed in 159 well-resolved X-ray crystal structures of non-homologous proteins. The results show that when n < 5, the distributions are clearly bimodal. For 9 < n < 15, they have a single sharp peak and are skewed to the right. At higher values of n (n > 20), the distributions resemble a truncated Gaussian. A strong correlation between secondary structure content and the Cα(i) to Cα(i+n) distance was found for n < 5, and it gradually disappeared with increasing n. The results indicate that:

- short-range interactions play a major role in determining the conformation of residues separated by up to 5 positions in the sequence,
- medium-range interactions seem to dominate when the sequence separation is between 9 and 15 residues, and
- long-range interactions are predominant at sequence separations of 20 residues or greater.

Difference (φ, ψ) probability distributions of peptides corresponding to particular Cα(i) to Cα(i+n) distance ranges were calculated. The results show that:

- peptides of length up to 5 residues have (φ, ψ) distributions very different from that calculated for all the proteins studied, whereas
- for peptides with n greater than 20 residues, the (φ, ψ) distributions are indistinguishable from that calculated for entire proteins.

The results from the difference (φ, ψ) distributions and from the Cα(i) to Cα(i+n) distance distributions indicate that a peptide of at least 20 residues is required to adequately reflect the balance of short-, medium- and long-range interactions found in proteins. Finally, the conformations of peptides with uncommon Cα(i) to Cα(i+n) distances were analysed. For n ≈ 30, peptides with very small Cα(i) to Cα(i+n) distances (< 5 Å) adopted certain select topologies. A particularly common motif was a helix with bend(s) at one or both ends and an extended region closely following the contours of the helix. When the Cα(i) to Cα(i+n) distance was unusually large (> 50 Å), the peptides had very open structures. In most cases these peptides occurred as linkers between domains in multidomain proteins.

Flexibility and Order in Nucleic Acids: A Study of the Backbone Conformation of Oligonucleotides

Crystal structure data of 56 oligonucleotide structures have been analysed by studying the variation of the main-chain dihedral angles. In nucleotides that conform to A-DNA, it was observed that when the torsion angle around the C5'-C4' bond (γ) was nearly 180°, the torsion angle around the P-O5' bond (α) was also ~180°. All the nucleotides with this conformation were purines. Further, in B-form nucleotides, when α was in the range 0° to 90°, β and γ were trans with respect to each other and δ ≈ 120°, showing coupling between the α, β, γ and δ torsion angles. The six main-chain torsion angles were observed to prefer specific ranges of values in both A- and B-form DNA.
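Each backbone torsion is the dihedral angle defined by four consecutive atoms. A minimal sketch of computing one from Cartesian coordinates, and of testing it against an angular range that may wrap through 180°, is given below. The atom quadruples follow the standard nucleic acid backbone definitions. The function names are illustrative. Any range limits you pass to `in_range` are your own, since the ranges used in the study are not given here.

```python
import math
from typing import Sequence

Vec = Sequence[float]

# Atoms defining the six DNA backbone torsions;
# "-1" and "+1" mark atoms of the preceding and following nucleotide.
BACKBONE_TORSIONS = {
    "alpha":   ("O3'-1", "P", "O5'", "C5'"),
    "beta":    ("P", "O5'", "C5'", "C4'"),
    "gamma":   ("O5'", "C5'", "C4'", "C3'"),
    "delta":   ("C5'", "C4'", "C3'", "O3'"),
    "epsilon": ("C4'", "C3'", "O3'", "P+1"),
    "zeta":    ("C3'", "O3'", "P+1", "O5'+1"),
}


def _sub(a: Vec, b: Vec) -> list:
    return [a[i] - b[i] for i in range(3)]


def _dot(a: Vec, b: Vec) -> float:
    return sum(a[i] * b[i] for i in range(3))


def _cross(a: Vec, b: Vec) -> list:
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def torsion(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Torsion angle (degrees, between -180 and 180) about the p1-p2 bond.

    Positive when, viewed along p1->p2, the p1->p0 bond must turn clockwise
    to eclipse the p2->p3 bond.
    """
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = [x / norm for x in b1]
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = [b0[i] - _dot(b0, b1) * b1[i] for i in range(3)]
    w = [b2[i] - _dot(b2, b1) * b1[i] for i in range(3)]
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def in_range(angle: float, low: float, high: float) -> bool:
    """True if `angle` lies on the arc from `low` to `high` (degrees, positive
    direction), so a range such as 150 to -150 wraps through 180."""
    a, lo, hi = angle % 360, low % 360, high % 360
    return lo <= a <= hi if lo <= hi else (a >= lo or a <= hi)
```

Counting the nucleotides for which `in_range` holds for all six angles gives percentages of the kind quoted below.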
However, these ranges are not exclusive: only 47% of A-DNA nucleotides and 32% of B-DNA nucleotides have all six main-chain torsion angles within the ranges characteristic of A- and B-form DNA, respectively. The analysis of the global helical parameters (twist, rise and radius) and of the helix axis has shown that the angles between the ideal A- or B-DNA helix axis and the helix axis of the experimental structures (R0-A and R0-B, respectively) are very useful for studying structural variation within each family. The analyses have shown that main-chain flexibility is quite high at the mono- or dinucleotide level. The structural variation among the nucleotides of a molecule is coupled in such a way as to maintain an overall helical path of the backbone. Studies of helical flexibility should therefore take into consideration both the main-chain and the base-pair conformations.

Use of Genomic and Metabolic Pathway Data to Model the de novo Purine Biosynthesis Pathway in Helicobacter pylori

In the last fifteen months, the complete genome sequences of sixteen microorganisms have become available. Similarly, the metabolic pathway database WIT has become available on the Internet, along with analysis tools such as those of KEGG. These databases have been used to gain insight into one of the fundamental pathways of Helicobacter pylori, namely purine biosynthesis. In this organism, the purine biosynthesis pathway is very different from that of Escherichia coli, Bacillus subtilis and even its closest relative, Campylobacter jejuni. A study of this pathway that combines Helicobacter pylori sequence data with metabolic pathway data is therefore a very good example of the power and usefulness of these databases when used together. The pathway postulated from enzymes with similar sequences and functions, and from the reactions they catalyse, shows interesting variations.

Molecular Dynamics Simulations Using Parallel AMBER

MD Simulations of PvuII Substrate

The parallel version of AMBER, obtained from the University of California, San Francisco, USA, was ported to the Indian-made parallel supercomputer PARAM OpenFrame. This version of AMBER was used to study the oligomer 5'-TGACCAGCTGGTC-3', which contains the cleavage site of PvuII, a type II restriction endonuclease. The exact mechanism of action of the enzymes PvuII and BamHI is still not completely understood, and MD studies can be very useful in addressing such problems. Unconstrained molecular dynamics (MD) simulations of the duplex of the oligomer 5'-TGACCAGCTGGTC-3' in an explicit water box with counterions (Na+) were carried out. A molecular dynamics simulation was also carried out for 1.3 ns at 283 K with Watson-Crick constraints. This is one of the largest simulations reported in the world. The offset value was maximal at bases A6, G7 and C8 at both 300 K and 283 K.

MD Simulations of Promoter Sequences

The process of transcription initiation has attracted the attention of scientists over the last few decades. One reason is the general presence of a TATA sequence around position -10, on the 5' side of the transcription start site. However, TATA sequences are also present in many other coding and non-coding regions, and why only certain TATA sequences act as signals is a major question. One way to gain insight is to carry out computer simulations of TATA-containing sequences that fall into the categories of promoters and non-promoters. It has been hypothesized that the TATA box at the -10 region acts as a signal for transcription initiation because of the high flexibility of this region. Such flexible DNA bends with a specific curvature, with little distortion of the double-helical conformation, and RNA polymerase recognizes such bent conformations in prokaryotes. The validity of this hypothesis was checked by carrying out molecular dynamics studies on the region around TATA in:

- the P22 ant promoter (5'-AGCACTCTACTATATTCTCAATAG-3'),
- its point mutant that is a non-promoter (T->C at -7, 5'-AGCACTCTACTATATCCTCAATAG-3'), and
- its point mutant that is a weak promoter (T->G at -8, 5'-AGCACTCTACTATAGTCTCAATAG-3').

These unconstrained molecular dynamics studies indicated higher flexibility in the region around TATA in promoters than in non-promoters.
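The text does not say which flexibility measure was used. One common choice for MD trajectories is the root-mean-square fluctuation (RMSF) of each atom about its average position. The sketch below computes it from a list of snapshots. It assumes the snapshots have already been superposed on a common reference. The helper names and the choice of RMSF are assumptions for illustration.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsf(frames: Sequence[Sequence[Coord]]) -> List[float]:
    """Root-mean-square fluctuation of each atom about its average position.

    `frames` is a list of snapshots, each a list of (x, y, z) per atom.
    Assumes the snapshots were already superposed on a common reference, so
    overall tumbling of the duplex does not inflate the values.
    """
    n_frames = len(frames)
    n_atoms = len(frames[0])
    values = []
    for atom in range(n_atoms):
        mean = [sum(frame[atom][k] for frame in frames) / n_frames for k in range(3)]
        msd = sum(
            sum((frame[atom][k] - mean[k]) ** 2 for k in range(3)) for frame in frames
        ) / n_frames
        values.append(math.sqrt(msd))
    return values


def region_flexibility(values: Sequence[float], atom_indices: Sequence[int]) -> float:
    """Mean RMSF over a chosen set of atoms, e.g. those of the bases around TATA."""
    return sum(values[i] for i in atom_indices) / len(atom_indices)
```

Computing `region_flexibility` for the same TATA-region atoms in the promoter and in the non-promoter mutant would give one numerical view of the contrast described above.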
The bending trend seen in the promoter sequences was also observed in preliminary molecular dynamics studies of other known prokaryotic promoter regions around TATA. Bending was minimal for non-promoter sequences with TATA at the centre. Replacing TATA with other tetranucleotides, such as GAGA, also seems to reduce the flexibility of the promoter region, supporting the validity of the hypothesis.

Modelling of the Peptides for Rheumatoid Factor and β2-Microglobulin

Polyclonal or monoclonal human IgM rheumatoid factors (RF) react with:

- eight antigenic sites on the CH3 domain of IgG,
- four sites on CH2, and
- two sites on human β2-microglobulin (β2m).

All 14 of these RF-reactive epitopes are linear peptides of 7-11 amino acids with different primary sequences. We asked whether RF reactivity with such a variety of epitopes, which show no obvious sequence homology, might result from conformational similarities shared by the RF-reactive regions. Strong support for this concept was obtained using rabbit antisera as well as mouse mAbs raised against individual CH3, CH2 or β2m RF-reactive peptides. Major cross-reactivity was demonstrated between most of the 14 different CH3, CH2 or β2m RF-reactive peptides using individual anti-epitope antibodies. Molecular modelling studies of these peptides showed striking similarities in three-dimensional shape among many RF-reactive peptides. Main-chain atoms, rather than side chains, seemed to contribute most directly to the conformational similarity. Molecular simulation studies on control peptides showed no conformational similarities with the RF-reactive peptides.
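The text does not state how conformational similarity was quantified. A common way to compare main-chain shapes is the RMSD between matched backbone atoms after optimal superposition, computed with the Kabsch algorithm. The sketch below assumes the two peptides have already been paired atom by atom (equal numbers of N, CA, C and O atoms in the same order). That pairing, and the function names, are assumptions for illustration.

```python
import numpy as np

MAIN_CHAIN = ("N", "CA", "C", "O")


def main_chain_coords(atoms):
    """Keep main-chain atoms from a list of (atom_name, (x, y, z)) records,
    in their original order."""
    return np.array([xyz for name, xyz in atoms if name in MAIN_CHAIN], dtype=float)


def superposed_rmsd(p, q) -> float:
    """RMSD between matched (N, 3) coordinate sets after optimal rigid-body
    superposition (Kabsch algorithm)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if p.shape != q.shape:
        raise ValueError("coordinate sets must have the same shape")
    p = p - p.mean(axis=0)                   # remove translation
    q = q - q.mean(axis=0)
    h = p.T @ q                              # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))   # avoid an improper rotation (reflection)
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T  # rotation taking p onto q
    diff = p @ r.T - q
    return float(np.sqrt((diff ** 2).sum() / len(p)))
```

Restricting the comparison to `main_chain_coords` is what makes the score reflect backbone shape rather than side-chain differences, in line with the observation above.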
Our studies indicate that autoantibodies such as RF recognize the main-chain conformations of reactive epitopes and react with a number of antigenic determinants of quite different primary sequence but similar main-chain conformation.

Homology Modelling of the Variable Region of Heavy Chains of:

A. Rheumatoid Factor (IgM)

Rheumatoid factor sequences of the IgM type were extracted from a data bank and aligned with the IgM sequences of structurally known immunoglobulins. The three-dimensional structure of rheumatoid factor was predicted using a homology modelling approach. The predicted structure of rheumatoid factor will help in better understanding the pathology of the disease and in drug design.

B. Wegener's Disease

Wegener's granulomatosis (WG) is a systemic disease of unknown etiology characterized by necrotizing granulomas and vasculitis affecting the upper and lower respiratory tract and the kidneys. In most WG patients, the presence of anti-neutrophil cytoplasmic antibodies (cANCA) provides a useful diagnostic serologic marker, often paralleling disease activity and tissue inflammatory reaction. Anti-neutrophil cytoplasmic antibodies react with a limited spectrum of neutrophil cytoplasmic antigenic materials, including proteinase-3 (PR-3) and, in some instances, myeloperoxidase or lactoferrin. Several previous studies have focussed on attempts to define the predominant antigenic epitopes on PR-3, or the structural features of the V-regions of antibodies reacting with PR-3 or other related neutrophil cytoplasmic antigens.

This work is carried out in collaboration with Prof. Ralph C. Williams, Eminent Scholar and Marcia Whitney Schott Chair in Rheumatoid Arthritis Research, Division of Rheumatology and Clinical Immunology, Department of Medicine, University of Florida. He has the primary V-region sequence of a monoclonal human IgM anti-PR3 antibody derived from a cell line (WGH1) produced from a patient with WG. We have carried out molecular modelling of the VH of WGH1, which shows a very unusual conformation within the heavy-chain V-region CDR3 of this human monoclonal antibody. The manuscript describing the methodology and the results has been communicated.

Japanese encephalitis (JE) is caused by an RNA virus that is endemic in India and South-East Asia. The only measure to prevent the disease is prophylactic vaccination. Therefore, antigenic determinants on the envelope glycoprotein of Japanese encephalitis virus were predicted using the algorithm developed in-house (Kolaskar & Tongaonkar, 1990). The predicted antigenic determinant 155YSAQVGASQ163 was confirmed experimentally to be a neutralizing B-cell epitope. Molecular modelling studies on the peptide YSAQVGASQ were used to develop the stable epitope. It has been shown that the monoclonal antibody Hx-2, raised against purified virus, binds strongly to this peptide. The anti-peptide antibody is also neutralising.
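Propensity-scale predictors of antigenic determinants generally assign a value to each amino acid, average it over a sliding window, and report runs of residues above a threshold. The sketch below shows only that general shape, assuming the in-house method works this way. The propensity scale is an input, and the toy values in the test are placeholders, not the published table. The 7-residue window, the threshold (the protein's mean propensity, capped at 1.0) and the 8-residue minimum length are assumptions to be checked against Kolaskar & Tongaonkar (1990).

```python
from typing import Dict, List, Optional, Tuple


def antigenic_regions(seq: str, scale: Dict[str, float],
                      window: int = 7, min_len: int = 8) -> List[Tuple[int, int, str]]:
    """Sliding-window propensity scan returning (start, end, peptide) tuples.

    Positions are 1-based and inclusive. The window average is assigned to
    the central residue; residues too close to either end get no value.
    Threshold: the protein's mean propensity, capped at 1.0 (assumed rule).
    """
    half = window // 2
    values = [scale[aa] for aa in seq]
    smoothed: List[Optional[float]] = [None] * len(seq)
    for i in range(half, len(seq) - half):
        smoothed[i] = sum(values[i - half:i + half + 1]) / window
    threshold = min(sum(values) / len(values), 1.0)

    regions, start = [], None
    for i, s in enumerate(smoothed + [None]):   # trailing None closes an open run
        above = s is not None and s >= threshold
        if above and start is None:
            start = i
        elif not above and start is not None:
            if i - start >= min_len:
                regions.append((start + 1, i, seq[start:i]))
            start = None
    return regions
```

Run over an envelope glycoprotein sequence with a real propensity table, the returned tuples would be candidate determinants comparable in form to 155YSAQVGASQ163.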
A spacer region, AAKFT, is postulated for attaching the Th-cell-binding peptide that had been identified earlier. The conformation of the Th epitope 436SIGKAVHQVF445 was predicted using the Biosym software package. These two epitopes, the B-cell and the Th-cell epitope, were chosen as candidates for a chimeric synthetic peptide vaccine. This work was presented as a poster at the Xth International Congress of Virology, Jerusalem, Israel, in August 1996.

Homology Modelling of the Envelope Glycoprotein of Japanese Encephalitis Virus

In continuation of the peptide vaccine development project, we have carried out homology modelling of the envelope glycoprotein (EgP) of the JEV Nakayama strain. The sheer size of the protein, together with the lack of 3-D structural information on closely related proteins, makes predicting the 3-D structure of JEV EgP difficult. Recently, the 3-D structure of the tick-borne encephalitis (TBE) virus EgP (partial) was solved by X-ray crystallography (Rey et al., 1995). We have predicted the 3-D structure of JEV EgP (residues 1-399) by homology modelling, using TBE virus EgP as the template. The initial conformations of the loop regions were chosen following Hobohm & Sander (1994). However, the final conformation of each loop was fixed by carrying out 500 ps MD studies at room temperature, followed by minimisation using the steepest descent and conjugate gradient methods. These loops were then attached to the structurally conserved regions (SCRs), and minimisation was carried out for the whole molecule after removing any short contacts due to side chain-side chain and side chain-main chain interactions. The unconstrained energy-minimised structure was analysed. Residues with bad bond angles or dihedral angles (φ, ψ, ω) were fixed to equilibrium values, and minimisation was carried out further. Residues found to have (φ, ψ) values outside the allowed regions of the Ramachandran plot were corrected. After each such modification, unconstrained energy minimisation was carried out until the acceptable RMS derivative criterion was reached. The whole molecule was then soaked in water, and these studies are in progress. Preliminary analysis shows that the protein has three distinct domains:

- a central β-barrel (domain I),
- an elongated dimerization region (domain II), and
- the C-terminal immunoglobulin-like module (domain III).

The predominant secondary structure is β-strand, with 2 helices. The overall secondary structure in the hinge regions is quite different in JEV compared with TBE virus. The conformation of the peptide YSAQVGASQ in the protein is very similar to the conformation of the free peptide, which we had modelled earlier. To our knowledge, this is the largest single-chain protein modelled on the computer. This structure will be highly useful in designing a peptide vaccine. This work was presented at the Bioinformatics <--> Structure conference, Jerusalem, Israel, 17-21 November 1996.

Prediction of the 3D Structure of Asparagine Synthetase

Asparagine synthetase is an important enzyme because the level of asparagine is very high in leukemia patients. The present treatment, asparaginase, creates ammonia toxicity when given in higher doses and can kill the patients. Therefore, the design of a specific inhibitor of this enzyme could help in controlling the disease state. Amino acid sequence comparison of asparagine synthetases from various biological sources, together with the structure of asparaginase with bound aspartate, indicated the conserved region 316IETYDVTTIRASTPMYL332 with Thr332. This peptide is predicted to be the active site. Molecular modelling studies of this peptide in vacuo and in water showed that the peptide has a loop conformation. Site-directed mutagenesis studies confirmed the hypothesis that this region is involved in aspartate binding. Further, the glutamine-binding domain, which lies in the N-terminal region of asparagine synthetase and belongs to the purF family of amidotransferases, is being modelled using a homology modelling approach. The reference proteins used for this study are glutamine PRPP amidotransferase (1GPH) and glucosamine 6-phosphate synthase (1GMS).
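Finding a conserved stretch such as the one above, starting from a multiple alignment, can be sketched as a scan for runs of columns that are identical in every sequence. The alignment in the test is a toy example, not real asparagine synthetase data. The minimum run length and the function name are assumptions for illustration.

```python
from typing import List, Tuple


def conserved_runs(alignment: List[str], min_len: int = 5) -> List[Tuple[int, int, str]]:
    """Return (start, end, motif) for runs of alignment columns that are
    identical in every sequence. Positions are 1-based alignment columns,
    inclusive; all-gap columns never count as conserved."""
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("sequences must be aligned to the same length")
    conserved = [
        len({s[i] for s in alignment}) == 1 and alignment[0][i] != "-"
        for i in range(length)
    ]
    runs, start = [], None
    for i, ok in enumerate(conserved + [False]):   # trailing False closes an open run
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            if i - start >= min_len:
                runs.append((start + 1, i, alignment[0][start:i]))
            start = None
    return runs
```

Alignment columns still have to be mapped back to residue numbers in a chosen reference sequence, skipping its gaps, to obtain a numbered segment like 316IETYDVTTIRASTPMYL332.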
Bioinformatics Center National Chemical Laboratory