Bioinformatics Center  : 

BIOINFORMATICS CENTRE (BIC) UNIVERSITY OF PUNE  

Introduction & History  : 

The Bioinformatics Center at the University of Pune was established in 1987 as one of the nine Distributed Information Centers (DICs) under the Biotechnology Information System (BTIS) of the Department of Biotechnology, Govt. of India. The Center provides accurate and up-to-date information in the area of Biotechnology, with a stress on Virology, protein and nucleic acid sequences and structures, and microbial strain data, and also gives access to other related areas through networks. Apart from answering bibliographic queries and supplying sequence data, the Center also offers unique facilities for data analysis. The hardware, software and expertise available at this Center assist users in every possible way. The Center is equipped with state-of-the-art services and facilities in terms of hardware, software, information storage and retrieval.  

The Main Objectives Of This Centre Are :

Ø      To function as an information base with resources of databases in the area of Biotechnology and Life Sciences.

Ø      To generate information relevant to specific areas, structure the information and make it available to interested parties.

Ø      To act as an active network node, in which scientists and entrepreneurs get access to the information resources available in the area of their interest.

Ø      To identify and implement necessary information systems and software.

Ø      To conduct seminars, workshops and training courses (Advanced Diploma in Bioinformatics).

Research Activities:  

The Centre has a second wing consisting of doctoral and post-doctoral students, research assistants and scientists who carry out research in areas closely related to Bioinformatics. Some of the research projects in progress are listed below:  

Computer Aided Identification and Classification of Viruses

The Bioinformatics Centre has prepared the world's largest computerised 'Animal Virus Information System' (AVIS). Using the information available in this database, a hybrid method to identify unknown animal viruses on the computer has been developed. This method has two main components -

§         Assignment of a family to the virus using a deterministic approach in monothetic fashion, and

§         Use of a probabilistic method to identify the virus, given the assigned family.

The probabilistic method is based on Willcox's implementation of Bayes' theorem. The software developed in the Centre allows rapid virus identification and also suggests additional tests that the user can carry out to improve the identification score. This is the only software in the world that allows on-line identification of species, through the Internet, in interactive mode.
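The probabilistic step can be illustrated with a minimal sketch of a Willcox-style normalized likelihood. This is not the Centre's software: the identification matrix, the taxon names and the test names below are hypothetical, and the score is simply the likelihood of each candidate divided by the sum of likelihoods over all candidates in the assigned family.

```python
from math import prod


def identification_scores(matrix, observed):
    """Return the normalized likelihood of each taxon.

    matrix   : {taxon: {test: probability of a positive result}}
    observed : {test: True or False} results for the unknown isolate
    """
    likelihoods = {}
    for taxon, probs in matrix.items():
        # Likelihood = product over tests of P(observed result | taxon).
        likelihoods[taxon] = prod(
            probs[test] if positive else 1.0 - probs[test]
            for test, positive in observed.items()
        )
    total = sum(likelihoods.values())
    return {taxon: value / total for taxon, value in likelihoods.items()}


# Hypothetical identification matrix (illustrative values only).
MATRIX = {
    "virus_A": {"test1": 0.99, "test2": 0.01, "test3": 0.90},
    "virus_B": {"test1": 0.99, "test2": 0.99, "test3": 0.10},
    "virus_C": {"test1": 0.01, "test2": 0.50, "test3": 0.50},
}

scores = identification_scores(
    MATRIX, {"test1": True, "test2": False, "test3": True}
)
best = max(scores, key=scores.get)
print(best, round(scores[best], 4))
```

Probabilities of exactly 0 or 1 are avoided in the matrix (0.01 and 0.99 are used instead) so that a single atypical result cannot drive a likelihood to zero.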

Studies on the proteins of viruses

The protein sequence database was searched to extract sequences of viral proteins. These protein sequences were grouped, and conserved oligopeptides were obtained for a particular protein from each family/group. These unique conserved oligopeptides serve as a signature of that family/group. In addition, amino acid composition data of the proteins have been used in the creation of identification matrices, which are used in computer-aided virus identification. The analyses of Flavivirus envelope glycoprotein sequences have pointed out the periodic occurrence of the amino acid Gly. The role of Gly and other periodically repeating amino acids is being investigated by carrying out modelling studies at the peptide level as well as by analysing the modelled 3-D structure of the EgP of Japanese Encephalitis Virus.
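The idea of conserved oligopeptide signatures can be sketched as follows. This is an illustrative simplification, not the Centre's procedure: it assumes exact matches of fixed-length peptides (k-mers), and the family names and sequences are made-up toy data.

```python
def kmers(sequence, k):
    """Set of all overlapping peptides of length k in a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def conserved_oligopeptides(group, k):
    """k-mers present in every sequence of a group."""
    common = kmers(group[0], k)
    for seq in group[1:]:
        common &= kmers(seq, k)
    return common


def family_signatures(families, k):
    """Conserved k-mers of each family that occur in no other family."""
    signatures = {}
    for name, group in families.items():
        others = set()
        for other_name, other_group in families.items():
            if other_name != name:
                for seq in other_group:
                    others |= kmers(seq, k)
        signatures[name] = conserved_oligopeptides(group, k) - others
    return signatures


# Hypothetical toy sequences, for illustration only.
FAMILIES = {
    "family_1": ["MKTGYSAQVLL", "AAGYSAQVKK"],
    "family_2": ["MKTPLDWERLL", "GGPLDWERAA"],
}

signatures = family_signatures(FAMILIES, k=5)
print(signatures)
```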

Analysis of Human DNA Sequences

Human DNA sequence data is increasing almost exponentially due to the Human Genome Project. Several groups are analysing this data to gain insight into the function of the genome. At our Centre, available Homo sapiens DNA sequence data were extracted and analysed. One of the interesting findings of the study is the higher usage of trinucleotides ending with T in the F3 frame. A similar pattern in the usage of trinucleotides was also reported earlier from analysis of cDNA sequences in prokaryotes (Kolaskar & Reddy, 1985). Therefore protein coding regions and F3 DNA sequences having a large open reading frame were translated into protein, and protein databases such as PIR and SWISS-PROT were searched. Our analysis further pointed out that these hypothetical Homo sapiens proteins from the F3 frame have sequence similarity with proteins of bacteria, protozoa, etc. Such DNA sequences are therefore characterised as ancient DNA. Additional studies to support our hypothesis are in progress.  
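A trinucleotide usage count of the kind described above can be sketched in a few lines. The sketch assumes that "F3" means the third reading frame of the given strand (reading starts at the third nucleotide); that interpretation and the toy sequence are assumptions, not taken from the study.

```python
from collections import Counter


def trinucleotide_usage(sequence, frame):
    """Count non-overlapping trinucleotides in a 1-based frame (1, 2 or 3)."""
    seq = sequence.upper()
    start = frame - 1
    return Counter(seq[i:i + 3] for i in range(start, len(seq) - 2, 3))


def fraction_ending_with(counts, base):
    """Fraction of counted trinucleotides whose last base is `base`."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(n for tri, n in counts.items() if tri.endswith(base)) / total


# Hypothetical toy sequence, for illustration only.
SEQ = "CCATTGCTAAT"

f1 = trinucleotide_usage(SEQ, frame=1)   # CCA, TTG, CTA
f3 = trinucleotide_usage(SEQ, frame=3)   # ATT, GCT, AAT
print(fraction_ending_with(f1, "T"), fraction_ending_with(f3, "T"))
```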

Analysis of inter-residue Cα distances and its implications in protein structure:

Analysis of Cα(i) to Cα(i+n) distance distributions in 159 well resolved X-ray crystal structures of non-homologous proteins has been carried out. The results show that when n < 5, the distributions are clearly bimodal. For 9 < n < 15 they have a single sharp peak and are skewed towards the right. At higher values of n (n > 20) the distributions appear like a truncated Gaussian. A strong correlation between secondary structure content and Cα(i) to Cα(i+n) distance was found for n < 5, which gradually disappeared with increasing values of n.  

The results indicate that:

§         Short-range interactions play a major role in determining the conformation of residues separated by up to 5 positions in the sequence.

§         Medium range interactions seem to dominate when the sequence separation is between 9 and 15 residues and

§         Long range interactions are predominant at sequence separations of 20 residues or greater.  
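The distance distributions discussed above are built from distances between the Cα atom of residue i and that of residue i+n. A minimal sketch of that calculation follows; it assumes the Cα coordinates have already been read from a structure file as (x, y, z) tuples in Å, and the toy chain used here is an artificial straight line with 3.8 Å spacing, not real protein data.

```python
from math import dist


def ca_distances(ca_coords, n):
    """Distances between the C-alpha atoms of residues i and i+n."""
    return [dist(ca_coords[i], ca_coords[i + n])
            for i in range(len(ca_coords) - n)]


def histogram(values, bin_width):
    """Counts per bin, keyed by the lower edge of each bin."""
    counts = {}
    for value in values:
        lower = int(value // bin_width) * bin_width
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))


# Artificial, fully extended toy chain (illustration only).
TOY_CHAIN = [(3.8 * i, 0.0, 0.0) for i in range(6)]

d2 = ca_distances(TOY_CHAIN, n=2)
print(d2, histogram(d2, bin_width=1.0))
```

Pooling the values returned by `ca_distances` over many structures and plotting the histogram for each n gives distributions of the kind described in the text.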

Difference (φ, ψ) probability distributions of peptides corresponding to a particular Cα(i) to Cα(i+n) distance range were calculated. The results show that:

§         Peptides of length up to 5 residues have very different (φ, ψ) distributions compared to those calculated for all proteins studied.

§         However, for peptides having n greater than 20 residues the (φ, ψ) distributions are indistinguishable from those calculated for entire proteins.  

The results obtained from difference (φ, ψ) distributions as well as Cα(i) to Cα(i+n) distance distributions indicate that at least a 20 residue peptide is required to adequately reflect the balance of short, medium and long range interactions as found in proteins. Finally, analysis of the conformations of those peptides which had uncommon values of Cα(i) to Cα(i+n) distance was carried out. It was found that when n was ~30, peptides having a very small Cα(i) to Cα(i+n) distance (< 5 Å) had certain select topologies. A particularly common motif was a helix with bend(s) at one or both ends and an extended region closely following the contours of the helix. When the Cα(i) to Cα(i+n) distance was unusually large (> 50 Å), the peptides were found to have very open structures. In most cases they were found to occur as linkers between domains in multidomain proteins.  

Flexibility and order in nucleic acids - a study of the backbone conformation of oligonucleotides:  

Crystal structure data of 56 oligonucleotide structures have been analysed by studying the variation of main chain dihedral angles. In nucleotides which conform to A-DNA it was observed that when the torsion angle around the C5'-C4' bond (γ) was nearly equal to 180°, the torsion angle around the P-O5' bond (α) was also ~180°. All the nucleotides having this conformation were found to be purines. Further, in the B-form nucleotides, when α was in the range 0° to 90°, β and γ were found to be trans with respect to each other and δ ~ 120°, showing coupling between the α, β, γ and δ torsion angles. It was observed that the six main chain torsion angles preferred certain specific ranges of values in both A and B form DNAs. However, the ranges are not exclusive, as it was also observed that only 47% of A-DNA nucleotides and 32% of B-DNA nucleotides have all six main chain torsion angles within the ranges characteristic of A and B form DNAs respectively. The analysis of the global helical parameters twist, rise and radius, and of the helix axis, has shown that the angles between the ideal A or B-DNA helix axis and the helix axis of the experimental structures (R0-A and R0-B respectively) are very useful for studying structural variation within each family. The analyses of structures have shown that main chain flexibility is quite high at mono- or dinucleotide levels. The structural variation among various nucleotides in a molecule is coupled in such a way as to maintain an overall helical path of the backbone. Helical flexibility studies should therefore take into consideration both the main chain and the base-pair conformations.  
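Each main chain torsion angle is the dihedral angle defined by four consecutive backbone atoms. A minimal sketch of that calculation is given below, using the usual atan2 formulation and plain tuples for coordinates. The function name and the test coordinates are illustrative; reading atoms from a crystal structure file is left out.

```python
from math import atan2, degrees


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def torsion_angle(p1, p2, p3, p4):
    """Dihedral angle in degrees (-180 to 180) around the p2-p3 bond."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)   # normals of the two planes
    b2_length = _dot(b2, b2) ** 0.5
    return degrees(atan2(b2_length * _dot(b1, n2), _dot(n1, n2)))


# Idealized test geometries around a bond lying on the x axis.
trans = torsion_angle((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0))
cis = torsion_angle((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 0))
gauche = torsion_angle((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 0, 1))
print(trans, cis, gauche)
```

For the backbone angle γ, for example, the four atoms would be O5', C5', C4' and C3' of one nucleotide.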

Use Of Genomic And Metabolic Pathway Data To Model de novo Purine Biosynthesis Pathway In Helicobacter pylori:

In the last fifteen months, complete genome sequences of sixteen microorganisms have become available. Similarly, the metabolic pathway database WIT has also become available on the Internet, along with analysis tools such as those provided with KEGG. These different databases have been used to gain an insight into one of the fundamental pathways of Helicobacter pylori, namely purine biosynthesis. In this organism, the purine biosynthesis pathway is very different from that of Escherichia coli, Bacillus subtilis, and even its closest relative Campylobacter jejuni. Thus the study of this pathway by analysing the Helicobacter pylori sequence data and the metabolic pathway data provides a very good example of the power and usefulness of these databases when combined. The pathway postulated using enzymes with similar sequences and functions, and the reactions which they catalyse, shows interesting variations.  

Molecular Dynamics Simulations using Parallel AMBER:  

 MD Simulations of PvuII Substrate

The parallel version of AMBER, obtained from the University of California, San Francisco, USA, was ported to the Indian-made parallel supercomputer PARAM OpenFrame. This version of AMBER was used to study the oligomer 5'-TGACCAGCTGGTC-3', which contains the cleavage site for PvuII, a type II restriction endonuclease. The exact mechanism of enzyme action in PvuII and BamHI is still not completely understood, and MD studies can be very useful in understanding such problems. Unconstrained Molecular Dynamics (MD) simulations of the duplex of the oligomer 5'-TGACCAGCTGGTC-3' in an explicit water box with counterions (Na+) were carried out. Molecular dynamics simulation was carried out for 1.3 nanoseconds at 283 K with Watson-Crick constraints. This is one of the largest simulations reported in the world. The offset value was seen to be maximum at bases A6, G7 and C8 at both 300 K and 283 K. It was also noticed that the rise value (h) between the G7:C7' and C8:G6' base pairs increases. These studies also point out that the molecule shows helix shortening of up to 50%. The increased rise between G7:C7' and C8:G6' allows the bulky basic amino acid side chains to interact with the DNA and carry out standard acid-base catalysis, and thus cleavage of the phosphodiester bond. Docking studies of the MD snapshots with the PvuII enzyme are in progress.
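As a small illustration of where the cleavage site sits in this oligomer, the sketch below locates the PvuII recognition sequence CAGCTG on the given strand and on its reverse complement. The helper names are illustrative, and positions are 1-based from the 5' end of each strand.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Sequence of the complementary strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(seq, site):
    """1-based start positions of every occurrence of `site` in `seq`."""
    width = len(site)
    return [i + 1 for i in range(len(seq) - width + 1)
            if seq[i:i + width] == site]


OLIGOMER = "TGACCAGCTGGTC"   # 5'-TGACCAGCTGGTC-3' from the text
PVUII_SITE = "CAGCTG"        # PvuII recognition sequence

top_sites = find_sites(OLIGOMER, PVUII_SITE)
bottom_strand = reverse_complement(OLIGOMER)
bottom_sites = find_sites(bottom_strand, PVUII_SITE)
print(top_sites, bottom_strand, bottom_sites)
```

The site is its own reverse complement, so it appears once on each strand; on the given strand it spans residues 5 to 10, which includes the bases A6, G7 and C8 discussed above.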

MD Simulations of Promoter Sequences

The process of transcription initiation has attracted the attention of scientists in the last few decades. One of the reasons is the general existence of a TATA sequence around -10, at the 5' end of the transcription initiator. TATA sequences are also present in many other coding and non-coding regions. Why only certain TATA sequences act as a signal is a major question. One way to gain insight is to carry out computer simulations of TATA-containing sequences which fall under the categories of promoters and non-promoters. It has been hypothesized that the TATA box at the -10 region acts as a signal for the transcription initiator due to high flexibility in this region. Such a flexible DNA bends with specific curvature and with little distortion of the double helical conformation. RNA polymerase recognizes such bent conformations in prokaryotes.

The essential validity of this hypothesis was checked by carrying out molecular dynamics studies on the region around TATA in:

§         P22 ant promoter (5'-AGCACTCTACTATATTCTCAATAG-3'),

§         its point mutants- non-promoter (T->C at -7, 5'-AGCACTCTACTATATCCTCAATAG-3') and

§         weak promoter (T->G at -8, 5'-AGCACTCTACTATAGTCTCAATAG-3').
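The three sequences listed above differ from one another by single substitutions, which can be checked with a short comparison. The sketch reports positions counted from the 5' end of the 24-mer (1-based), not promoter coordinates such as -7 or -8; the function name is illustrative.

```python
def point_differences(reference, variant):
    """List of (position, reference base, variant base), 1-based from 5' end."""
    if len(reference) != len(variant):
        raise ValueError("sequences must have the same length")
    return [(i + 1, r, v)
            for i, (r, v) in enumerate(zip(reference, variant))
            if r != v]


P22_ANT = "AGCACTCTACTATATTCTCAATAG"        # promoter
NON_PROMOTER = "AGCACTCTACTATATCCTCAATAG"   # T->C mutant
WEAK_PROMOTER = "AGCACTCTACTATAGTCTCAATAG"  # T->G mutant

non_promoter_diff = point_differences(P22_ANT, NON_PROMOTER)
weak_promoter_diff = point_differences(P22_ANT, WEAK_PROMOTER)
print(non_promoter_diff, weak_promoter_diff)
```

The two substitutions fall on adjacent residues just downstream of the TATA tetranucleotide, consistent with the -7 and -8 labels given in the list.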

These unconstrained molecular dynamics studies pointed out higher flexibility of the region around TATA in promoters as against non-promoters. The bending trend in the promoter sequences was also observed in preliminary molecular dynamics studies of the regions around TATA in other known prokaryotic promoters. The bending was minimal for non-promoter sequences with TATA at the centre. Replacement of TATA by other tetranucleotides such as GAGA also seems to reduce the flexibility of the promoter region, thus supporting the hypothesis.  

Modelling Of The Peptides For Rheumatoid Arthritis Factor And β2-Microglobulin:

Polyclonal or monoclonal human IgM rheumatoid factors (RF) react with:

§         Eight antigenic sites on the CH3 IgG domain

§         Four sites on CH2 and

§         Two on human β2-microglobulin.

All 14 of these RF-reactive epitopes are linear 7-11 amino acid peptides with different primary sequences. We questioned whether RF reactivity with such a variety of epitopes showing no obvious sequence homology might result from conformational similarities shared by various RF-reactive regions. Strong support for this concept was obtained using rabbit antisera as well as mouse mAbs to individual CH3, CH2 or β2m RF-reactive peptides. Major cross-reactivity was demonstrated between most of the 14 different CH3, CH2, or β2m RF-reactive peptides using individual anti-epitope antibodies. Molecular modelling studies of these peptides showed striking similarities in three-dimensional shape among many RF-reactive peptides. Main-chain atoms rather than side chains seemed to contribute most directly to conformational similarity. Molecular simulation studies on control peptides showed no conformational similarities with RF-reactive peptides. Our studies indicate that autoantibodies such as RF recognize main-chain conformations of reactive epitopes and react with a number of antigenic determinants of quite different primary sequence but similar main-chain conformations.  

Homology Modelling Of Variable Region Of Heavy Chains Of:  

A. Rh IgM  

Rheumatoid factor sequences of IgM type were extracted from the data bank and aligned with the IgM sequences of immunoglobulins of known structure. The three-dimensional structure of the rheumatoid factor was predicted using a homology modelling approach. The predicted structure of the rheumatoid factor will help in better understanding the pathology of the disease and in drug design.

B. Wegener Disease

Wegener's Granulomatosis (WG) is a systemic disease of unknown etiology characterized by necrotizing granulomas and vasculitis affecting the upper and lower respiratory tract and kidneys. In most WG patients, the presence of anti-neutrophil cytoplasmic antibodies (cANCA) provides a useful diagnostic serologic marker, often paralleling disease activity and tissue inflammatory reaction. Anti-neutrophil cytoplasmic antibodies react with a limited spectrum of neutrophil cytoplasmic antigenic materials including proteinase-3 (PR3) and in some instances myeloperoxidase or lactoferrin. Several previous studies have focussed on attempts to define the predominant antigenic epitopes on PR3 or the structural features of the V-regions of antibodies reacting with PR3 or other related neutrophil cytoplasmic antigens.

This work is carried out in collaboration with Prof. Ralph C. Williams, Eminent Scholar, Marcia Whitney Schott Chair in Rheumatoid Arthritis Research, Div. of Rheumatology and Clinical Immunology, Dept. of Medicine, University of Florida. He has the primary V-region sequence of a monoclonal human IgM anti-PR3 antibody derived from a cell line (WGH1) produced from a patient with WG. We have carried out molecular modelling of the VH of WGH1, which shows a very unusual conformation within the heavy chain V-region CDR3 of this human monoclonal antibody. The manuscript describing the methodology and the results has been communicated.

Peptide Vaccine Modelling For JE

Japanese encephalitis (JE) is caused by an RNA virus and is endemic in India and South-East Asia. The only measure to prevent the disease is prophylactic vaccination. Therefore, antigenic determinants on the envelope glycoprotein of Japanese encephalitis virus were predicted using the algorithm developed in-house (Kolaskar & Tongaonkar, 1990). The predicted antigenic determinant 155YSAQVGASQ163 was confirmed experimentally to be a neutralizing B-cell epitope.

A stable conformation of the epitope was obtained by carrying out molecular modelling studies on the peptide YSAQVGASQ. It is shown that the monoclonal antibody (Hx-2) raised against purified virus binds strongly to this peptide. The anti-peptide antibody is also neutralising. The spacer region is postulated from AAKFT for attaching the Th cell binding peptide that was identified earlier. The conformation of the Th epitope 436SIGKAVHQVF445 was predicted using the Biosym software package. These two B-cell and Th-cell epitopes were chosen as the candidates for a chimeric synthetic peptide vaccine.

This work was presented as a poster at the Xth International Congress of Virology, Jerusalem, Israel, in August 1996.

Homology Modeling of Envelope Glycoprotein of Japanese Encephalitis Virus

In continuation of the peptide vaccine development project we have carried out homology modelling of the envelope glycoprotein (EgP) of the JEV Nakayama strain. The sheer size of the protein as well as the non-availability of 3-D structure information on closely related proteins make the task of predicting the 3-D structure of the JEV EgP difficult.

Recently, the 3-D structure of the Tick-Borne encephalitis (TBE) virus EgP (partial) has been solved by X-ray crystallography (Rey et al., 1995). We have predicted the 3-D structure of JEV EgP (1-399) using a homology modelling approach with TBE virus EgP as a template. The initial conformation of the loop regions was chosen following Hobohm & Sander (1994). However, the final conformation of each loop was fixed by carrying out MD studies of 500 ps at room temperature and minimisation using steepest descents and conjugate gradient methods. These loops were then attached to the structurally conserved regions (SCRs), and minimisation was carried out for the whole molecule after removing any short contacts due to side chain-side chain and side chain-main chain interactions. The unconstrained energy-minimised structure was analysed, and the residues having bad bond angles or dihedral angles (φ, ψ, ω) were fixed to equilibrium values and minimisation was carried out further. The residues which were found to have (φ, ψ) values outside the allowed region of the Ramachandran plot were corrected. After each such modification, unconstrained energy minimisation was carried out to reach the acceptable rms derivative criterion.

This whole molecule was then soaked in water and the studies are in progress. It has been noticed from the preliminary analysis that it has three distinct domains:

§         A central β-barrel (domain I)

§         An elongated dimerization region (domain II) and

§         The C-terminal immunoglobulin-like module (domain III).

The predominant secondary structure is β-strand, with 2 helices. The overall secondary structure in the hinge regions is quite different in JEV as compared to TBE. The conformation of the peptide YSAQVGASQ in the protein is very similar to the conformation of the free peptide, which we had modelled earlier.  

To our knowledge this is the largest single chain protein modelled on the computer. This structure will be highly useful in designing a peptide vaccine. This work was presented at the Bioinformatics <--> Structure conference, Jerusalem, Israel, during Nov 17-21, 1996.  

Prediction of 3D structure of Asparagine Synthetase

Asparagine synthetase is an important enzyme, as the level of asparagine is very high in leukemia patients. The present treatment with asparaginase, if given in higher doses, creates ammonia toxicity and can kill the patient. Therefore the design of a specific inhibitor of this enzyme could help in controlling the disease.

Amino acid sequence comparison of asparagine synthetases from various biological sources, together with the structure of asparaginase with bound aspartate, indicated the conserved region 316IETYDVTTIRASTPMYL332 with Thr332. This peptide is predicted to be the active site. Molecular modelling studies of this peptide in vacuo and in water showed that the peptide has a loop conformation. Site-directed mutagenesis studies showed that the hypothesis, namely the involvement of this region in aspartate binding, is correct. Further, the glutamine-binding domain, which is in the N-terminal region of asparagine synthetase and belongs to the purF family of amidotransferases, is being modelled using a homology modelling approach. The reference proteins used for this study are glutamine PRPP amidotransferase (1GPH) and glucosamine 6-phosphate synthase (1GMS).  
