Journal of Computational Biology and Bioinformatics Research Vol. 5(1), pp. 6-14, April 2013
Available online at http://www.academicjournals.org/JCBBR
DOI: 10.5897/JCBBR12.013
ISSN 2141-2227 ©2013 Academic Journals

Full Length Research Paper

Annotation of virulence factors in schistosomes for the development of a SchistoVir database

Adewale S. Adebayo1 and Chiaka I. Anumudu2*

1 Cell Biology and Genetics Unit, Department of Zoology, University of Ibadan, Oyo State, Nigeria.
2 Cellular Parasitology Programme, Department of Zoology, University of Ibadan, Oyo State, Nigeria.

Accepted 8 March, 2013

Scientific efforts in the eradication of neglected tropical diseases, such as those caused by the parasitic helminths, can be improved if a database of key virulence factors directly implicated in pathogenesis is available. As a first step towards creating SchistoVir, a database of virulence protein factors in schistosomes, we curated, annotated and aligned sequences of twenty virulence factors identified from the literature, using several bioinformatics tools including UniProtKB, SchistoDB, VirulentPred, InterProScan, ProtScale, MotifScan, TDRtargets, SignalP, MODBASE, PDB and MUSCLE. Among the protein entries, the most frequently occurring amino acid residues were lysine, serine, leucine, glutamine, glycine and cysteine, in order of magnitude. Sequence repeat regions (SRRs) of significant value were identified manually in fifty percent of the proteins, whereas dipeptide repeats (DiPs) and single amino acid repeats (SAARs) were not found; nevertheless, seventy-two percent of the protein entries were classified as virulent by the prediction model, VirulentPred. Most of the entries (eighty percent) did not have target compounds based on the database of available chemical compounds at TDRtargets.
Fourteen of the twenty entries (seventy percent) had more than 30 consecutively negative amino acid residues based on ProtScale's Kyte and Doolittle hydrophobicity plot. Hence, they would be hydrophobic enough to be transmembrane in location or secretory in nature. Only 7 (tyrosinase, serine protease 1, Tspan-1, VAL4, cathepsin B, cathepsin L and calreticulin) had cleavage sites and signal peptides, while none had a significant signal anchor probability. The annotations and characterization provided by this work and the development of a SchistoVir database will aid in further research of schistosome pathogenesis and control.

Key words: Protein database, bioinformatics tools, virulence proteins/factors, annotation, schistosomes.

*Corresponding author. E-mail: chiaka.anumudu@mail.ui.edu.ng or walsaks002@yahoo.com.

INTRODUCTION

Schistosomes are pathogenic helminths, a group of parasites which constitute important sources of morbidity and mortality in several parts of the world, with 2 billion persons affected (Fumagalli, 2010). Of the several species of schistosomes known, Schistosoma mansoni, Schistosoma haematobium and Schistosoma japonicum are important in the spread of morbidity. Among the tropical diseases caused by parasites, schistosomiasis ranks second only to malaria as a cause of catastrophic worldwide morbidity and mortality. Besides, it is believed to infect 200 million persons (Dvorak et al., 2008; Chalmers et al., 2008). The control of schistosomiasis involves population-based chemotherapy with the use of praziquantel (PZQ) and metrifonate drugs. PZQ increases antigen exposure, induces adenosine receptor blockade and calcium influx, causes paralysis and distorts the parasite's morphology. Snail control (through habitat modification, environmental planning and molluscicides) and immunological control (vaccination) are also components of the disease control strategy. The identification and development of vaccine candidates (usually protein antigens) will be useful in reducing morbidity drastically. Yet, a single actual potent vaccine is not within reach (WHO, 2010).

Generally, parasites (including helminths) use several mechanisms to attack hosts, while deriving nutrients and scheming for their own continued survival, and the success of parasites can be highly correlated with their ability to evolve a sophisticated immune evasion strategy (Matisz et al., 2011). Therefore, the key parasite proteins involved in these actions are critical to survival and they have been classified as virulence proteins (Fankhauser et al., 2007; Ramana and Gupta, 2009). These virulence proteins may be involved in attachment or adhesion to host membrane receptor cells, establishment and penetration within host cells, cleaving of host proteins and invasion (Ramana and Gupta, 2009). These proteins are strictly regulated and have also been identified as therapeutic agents; some of them are already at clinical trial stages (Gomez et al., 2010). A function-based classification of these virulence proteins was done by Ramana and Gupta (2009). The major classes identified were invasion, establishment, adhesion, proteases and others with unknown or putative function, although there is sometimes no sharp delineation amongst the different categories.

Schistosome virulence proteins have been evaluated for use as vaccine or drug targets. Chalmers et al. (2008), MacDonald et al. (2002) and Lopez-Quezada and McKerrow (2011) affirmed that SmVAL and serpins (serine protease inhibitors) are involved in immune system modulation. The SmVAL (S. mansoni venom allergen-like) proteins have a conserved SCP/TAPS domain that possesses envenomation and larval penetration activity. The serpins are able to distort host proteases. Furthermore, proteomic studies have also shown that Sm29, glutathione-S-transferase, thioredoxin, tetraspanins, triose phosphate isomerase and Sm32, among others in Schistosoma, are critical for host haemoglobin degradation, reduction of the Th2 immune response and tissue invasion (Braschi et al., 2006; Hansell et al., 2008; Cardoso et al., 2008; Verjovski-Almeida and DeMarco, 2008; Sharma et al., 2009). There are also studies that indicate the importance of some of these proteins in treatment strategies. A WHO/USAID partnership led to the establishment of the Schistosomiasis Vaccine Development Programme (SVDP), which identified epitopes of the triose phosphate isomerase (TPI), Sm14 and GST as virulence proteins with key vaccine candidates (WHO, 2010). Braschi et al. (2006), Reis et al. (2008) and Aslam et al. (2008) also highlighted the roles of tetraspanins, TPI and serpins in the preparation of efficacious vaccines.

In silico approaches have enhanced research in parasite pathogenesis and efforts in this direction continue till today (Devor, 2005; Hogeweg, 2011). This falls under the categorization of functional genomics (Garg and Gupta, 2008). Available databases of bacterial virulent proteins and toxins include VFDB (Virulence Factors Database), PRINTS (Protein Family fingerprints) and MVirDB (Microbial database of protein toxins and virulence factors) (Zhou et al., 2007; Tsai et al., 2009). These are available at http://predictioncentre.llnl.gov, provided by the Lawrence Livermore National Laboratory, US. ProtVirDB (http://bioinfo.icgeb.res.in/protvirdb) is a database for virulence factors in protozoans. The databases mentioned provide unified information portals for researchers interested in a panoramic or in-depth view of the virulent proteins in a parasite of interest or in comparison with other parasites.

In fact, many available works have indicated the roles of several virulence proteins in schistosome pathogenesis, and the full genome database of S. mansoni (SchistoDB, www.schistodb.net/schistodb) has been made publicly available. Also, there are a number of public protein databases which offer information on the proteome of S. mansoni, S. haematobium or S. japonicum, for example, UniProt (http://uniprot.org). Nevertheless, there is no specialized and simplified public information portal for the virulence factors identified so far in the schistosomes, either in S. mansoni, S. haematobium or S. japonicum. To the best of our knowledge, no database or classification to specifically annotate virulence proteins in parasitic worms exists, although substantial research has been done in characterizing proteins in different helminth species (Caprona et al., 2005; Braschi et al., 2006; Curwen et al., 2006; Cardoso et al., 2008; Aslam et al., 2008; Bos et al., 2009; Boumis et al., 2011). Such a database will give a simplified information portal for researchers interested in a panoramic or in-depth view of the virulent proteins in a parasite or in comparison with other parasites. It will also facilitate sequence retrieval and analysis and will be a useful tool for the research community in the study of schistosome pathogenesis.

METHODOLOGY

In order to construct and develop a preliminary secondary database of schistosome virulence proteins (SchistoVir), annotation of selected proteins was done using several tools.

Catalogue of protein entries

Protein entries were curated from nucleotide sequences, genomic sequences and literature available in NCBI's RefSeq and PubMed [1] (http://ncbi.nlm.nih); GeneDB [2] (http://genedb.org/genedb/smansoni); SchistoDB [3] Release 2.0 (http://schistodb.net) and UniProtKB [4] (http://www.uniprot.org/search). The criteria for choice of entry were essentiality, decisiveness or the crucial nature of the protein for survival in the host. Multiple databases were used to ensure that all possible entries were obtained. The databases provided literature sources, genomic sequences, contigs and protein sequences of schistosome entries.

1. National Centre for Biotechnology Information. It has a large depository of biomedical literature and genomic information. RefSeq provides updated, universally confirmed genomic sequences and PubMed provides literature.
2. GeneDB provides genomic and proteomic data on species which have been completely sequenced.
3. SchistoDB provides protein and genomic information on the Schistosoma mansoni genome.
4. UniProt Knowledgebase is a mega database on proteomic data from hundreds of species.

Coding and protein sequences, status and amino acid length were obtained from SchistoDB or UniProt queries with the use of keywords or gene names. Ontogenic expression, which gives a measure of the protein's expression at different stages of the life cycle of schistosomes, was obtained from SchistoDB using the number of ESTs (expressed sequence tags) in adults per total number of ESTs for all stages. The results were saved as complete html pages. Blastp searches were conducted using the UniProt Blast tool with default parameters.

Phylogenetic and secretory/transmembrane analysis

Homologous protein sequences to each of the virulence factors from selected species were obtained from NCBI and alignments were made using the MUSCLE multiple alignment tool (at http://ebi.ac.uk/tools/muscle) (Edgar, 2004) with default parameters; results were displayed in ClustalW format. MUSCLE is hosted by the European Bioinformatics Institute-EMBL and provides for automated sequence analysis with multiple alignment.

Signal sequences in the proteins were recognized with SignalP 3.0 (http://www.cbs.dtu.dk/services/SignalP), hosted by the Technical University of Denmark, using both the neural network and Hidden Markov Model (HMM) methods (Nielsen et al., 1997; Bendtsen et al., 2004). In order to increase accuracy, presence or absence of signal peptides was defined by the default Neural Network D score thresholds. This is in accordance with the method of Bos et al. (2009). SignalP results present D, S and Y scores. The D score is the average of the maximal Y-score (the most likely location of the cleavage site of the signal sequence) and the mean S-score, and is the best way to discriminate true signal sequences in proteins (Emanuelsson et al., 2007). In responding to query sequences, proteins with a D score greater than 55 and HMM greater than 90% were scored as having an N-terminal signal sequence (hence, transmembrane) and presented in our results. The default eukaryote setting of SignalP, with each sequence truncated after 70 residues, was used to avoid false positive detection of signal sequence outside the N-terminus. SignalP results were read off the query results directly from both the HMM and neural networks.

Domain, model and transmembrane predictions

Model search or prediction was done using the Protein Data Bank (PDB) or MODBASE (http://rscb.pdb.org and http://salilab.org/modbase respectively), with the protein sequence as a query with default settings. The model predictions are presented as 3D structures obtained from MODBASE [1] query searches. Helices, beta sheets, turns or coils and loops are easily visible this way. PDB 3D structures are displayed when available.

ProtScale [2] from ExPASy [3] (http://web.expasy.org/cgi-bin/protscale) was used to generate a hydropathy plot based on the calculated hydrophobicity of constituent amino acids. Interpretation is based on the fact that twenty consecutive hydrophobic amino acids are needed for a peptide/protein to be transmembrane. The Kyte-Doolittle hydrophobicity plot method was adopted while using ProtScale. ProtScale results from query sequences were generated in a numerical verbose format so as to be able to obtain numbers of consecutive hydrophobic residues.

Prediction of motifs and domains was done using Motif Scan (http://myhits.isb-sib.ch/cgi-bin/motif_scan) (Sigrist et al., 2010) of the Swiss Institute of Bioinformatics, which makes use of the PROSITE database. Also, InterProScan version 33.0 (Mulder et al., 2007) (http://www.ebi.ac.uk/Tools/services/web_iprscan) from the European Bioinformatics Institute, an arm of the European Molecular Biology Laboratory (EMBL), was used to discover conserved protein signatures and domains of individual entries. Query sequences in plain format used in MotifScan and InterProScan returned results in html and SVG formats. Signatures, profiles and domains recognized were derived from these results and summarized. The relative hydrophobicity of a protein and the absence or presence of signal peptides with cleavage sites are important in determining if it is transmembrane or not.

1. MODBASE predicts a protein's 3D structure when it is not available in its data bank (Pieper et al., 2011). It is a tool for comparative protein structure modeling.
2. ProtScale analyses the profile of a query sequence using amino acid residues present.
3. Protein analysis software from Swiss Bioinformatics.

Other tools in the annotation and characterization of entries

Protein entries were also characterized using the following tools:

1. VirulentPred (http://bioinfo.icgeb.res.in/virpred): a bacterial virulence factor prediction server, which relies upon amino acid composition, dipeptide composition, similarity search of known virulence factors in bacteria, and cascade support vector machine algorithms to predict likelihood of virulence.
2. TDRtargets version 4.0 (Crowther et al., 2010): using homology to druggable proteins with specified modifiable criteria, the server generates possible drug targets. In the results, associated drugs or compounds represent the results of searches conducted on the TDRtargets database for each entry.

The GO terms display generally accepted terms of the function of a protein, the cellular component of which the protein is part and the interaction of the protein with other substances, which is its molecular function.

RESULTS AND DISCUSSION

An initial twenty proteins derived from the literature were annotated in a simplified format using bioinformatics tools. Due to the large volume of results and documentation generated, summaries of the protein documentation and analysis are presented in Tables 1 and 2, respectively. The proposed schema is shown in Figure 1. Simply put, a database schema is a graphical depiction of the structure of the database, and a similar pattern has been adopted by Ramana and Gupta (2009). A sample of the annotation pages for one of the protein entries is shown in Figure 2. The proteins included are discussed under the following headings.

Protein entries

Inclusion of the protein entries and the data generated for the amino acid sequence, molecular weight, domains and signatures for the entries agree with the works of Chacon et al. (2003), Herve et al. (2003) for 28GST, Ramos et al. (2003, 2009) and Rabia et al. (2010) for Sm14, Fitzpatrick et al. (2007) for tyrosinase, Chalmers et al. (2008) for venom allergen like VAL, Kane et al. (2004) for major egg antigen p40, Berriman et al. (2009), Boumis et al. (2011) for thioredoxin, Lopez-Quezada and McKerrow (2011) for serpin and Wu et al. (2011).

Proteases, proteinase inhibitors and binding proteins

Table 1. The database proteins with their basic features and classification.
Systematic name | Name | Feature | Functional classification
Smp_155560 | Serpin | 50% (germball), 33% (cercariae), 43.62 kD, 4 paralogs (Smp_062080, 062120, 155530, 155550) | Establishment (protease inhibitor)
Smp_000022 | Tyrosinase | Female adults only, 56.3 kD | Invasion (egg migration/formation)
Smp_075800.1 | Sm32 | 2 paralogs (Smp_075790, 179170) | Protease
Smp_054470 | Thioredoxin | 85.7% adult, 7.1% schistosomula, 1 paralog (Smp_008070), 11.2 kD | Establishment (redox homeostasis)
Smp_155310 | Tetraspanin/Tspan-1 | Adults only, 24.05 kD, 8 paralogs | Establishment (protective)
Smp_157090 | Cathepsin L | 2.8% (adult), 1.9% (schistosomula), 168.16 kD, 3 paralogs (Smp_034410.1-3) | Protease
Smp_158420 | Cathepsin B | 36.6 kD, 4 paralogs (Smp_067060, 103610, 141610, 179980) | Protease
Smp_072190 | Sm29 | 58% (adult), 21.2 kD, no paralogs | Establishment
Smp_095360 | Fatty acid binding protein or Sm14 (fabp) | 85% (adult), 9.8% (schistosomula), 14.9 kD, 3 paralogs (Smp_174440.4, 095360.1-2) | Establishment (uptake of host fatty acid)
Smp_030350 | Serine protease 1 (Sp1) | 54.28 kD, no paralogs | Protease
Smp_003990 | Triose phosphate isomerase | 81.8% adult, 12.7% expression (cercariae), 28.09 kD | Establishment
Smp_054160 | 28GST | 85% (adult), 23.82 kD, no paralogs | Invasion
Smp_030370 | Calreticulin | 50.3% (adult), 45.4 kD, no paralogs | Establishment (protein folding)
Smp_183000 | Major egg antigen p40 | 21.07 kD, 17 paralogs | Establishment (immune evasion)
Smp_018890 | Phosphoglycerate kinase (Pgk) | 75% (adult), 18.47 kD, 1 paralog (Smp_187370) | Establishment
Smp_042160.2 | Fructose 1,6 biphosphate aldolase | 80.8% adult, 39.7 kD, 1 paralog (Smp_042160.1) | Establishment
Smp_002070 | Venom allergen like (VAL) | 28 paralogs, confirmed only as transcripts | Establishment

*All systematic names are derived from SchistoDB (Berriman et al., 2009); *Paralogs are homologous or similar sequences found in the same species; *All percentages are calculated from number of EST per total EST.

Table 2. Classification of database proteins based on different bioinformatics tools.

Parameter | Number of entries | Entry protein
Virulence using bacterial parameters and score (VirulentPred) | 13 | Tyrosinase (0.6586); fabp1-97aa (0.6656); Sm14 (0.7938); Sp1 (0.6586); thioredoxin 2 (0.6586); Tspan-1 (1.2771); egg antigen p40 (1.1121); serpin (1.0287); cathepsin B (0.2498); cathepsin L (1.3629); calreticulin (0.2028); Sm29 (0.9471); Sm32
Non virulence according to VirulentPred | 5 | Triose p. isomerase (-1.742); thioredoxin!; Pgk (-0.816); 28GST (-0.572); fructose bp aldolase (-1.367)
Having known targets/compounds | 4 | Tyrosinase, t.p. isomerase, fabp3, 28GST
No known target compounds [human ortholog target available*] | 16 | Fabp1, SP1*, calreticulin, thioredoxin 1 and 2*, Tspan-1, p40, cathepsin B*, cathepsin L*, serpin*, PGK*, Sm29, Sm23, fructose biphosphate aldolase*, Sm32*
MODBASE/PDB data available [both available*, MODBASE only^] | 13 | Fabp3*, 28GST*, Sm14*, aldolase^, Sm29^, pgk^, thioredoxin 1 & 2^, sp1, cathepsin L and B^, tpi^, p40^
No PDB data/chain; no MODBASE prediction model | 3 | Tspan-1, Sm23, Sm32
Hydrophobicity: entries with highest number of consecutive negative residues | | Calreticulin (130), cathepsin L (79), VAL 4 (65), cathepsin B (49), p40 (42), Sm32 (40)
Largest number of epitopes | | Cathepsin L (55), Sm32 (40), Sm23 (35), 28GST (30), aldolase (26), SP1 (20), Sm32 (20), serpin1 (17), Pgk (23)

!: Predicted virulent based on dipeptide composition but non virulent based on other parameters; ^: no PDB structural data available.

Figure 1. Schema (graphical structure) for the database. [Diagram labels: user query; keywords for query; returned results; individual protein; annotated protein entry; additional links for each entry.]

SAMPLE ANNOTATION PAGE

Sm14 Fatty acid binding protein (FaBP)
Status: putative, conserved in few eukaryotes

Protein sequence (133 aa):
MSSFLGKWKLSESHNFDAVMSKLGVSWATRQIGNTVTPTVTFTMDG
DKMTMLTESTFKNLSCTFKFGEEFDEKTSDGRNVKSVVEKNSESKLT
QTQVDPKNTTVIVREVDGDTMKTTVTVGDVTAIRNYKRLS

Coding sequence (402 bp):
ATGTCTAGTTTCTTGGGAAAGTGGAAACTTAGCGAGTCACACAAC
TTCGATGCTGTCATGTCAAAGCTAGGTGTCTCATGGGCAACTCGAC
AGATTGGGAACACAGTGACCCCAACTGTAACCTTCACAATGGATG
GGGATAAAATGACTATGTTAACAGAGTCAACTTTCAAAAATCTTTC
TTGTACGTTCAAGTTCGGCGAGGAATTCGATGAAAAAACAAGTGA
CGGCAGAAATGTCAAGTCAGTTGTTGAAAAAAATTCCGAGTCGAA
GTTAACGCAAACTCAAGTAGATCCCAAAAACACAACTGTAATCGT
TCGTGAAGTGGATGGTGATACTATGAAAACGACTGTGACTGTTGG
TGACGTTACTGCCATTCGCAATTATAAACGACTATCCTAA

Suggested virulence category: Establishment
Associated compound (TDR): none; human ortholog cmpd: 3-carbazol-9-ylpropanoic acid, 2 hexyldecanoic acid
Ontogenic/stage specific exp. (SchistoDB) (no. of EST/total ESTs): Adult (139/163), schistosomula (16/163)
Model prediction (A: MODBASE; B: PDB structure bound to oleic acid)
Homology: 3 paralogs
Best blastp hits (% identity): Homo sapiens fabp, brain (48%); Bos taurus heart fabp (44%); Mus musculus adipocyte binding protein (43%)

Figure 2. Sample annotation page for one of the protein entries.

These include Sm32, serine protease and cathepsins. The 58 kD serine protease has been documented to be capable of mitigating the effects of the IgE immune response through cleavage (Aslam et al., 2008). Involvement of protease inhibitors or proteinase inhibitors such as serpin 1 in direct virulence or pathogenesis has also been recognised by Dalton et al. (1997), Lin and He (2006) and Lopez-Quezada and McKerrow (2011).

Triose phosphate isomerase, aldolase, calreticulin and glutathione S-transferase (GST) are entries which have been identified as excretory-secretory (ES) products in Schistosoma by previous studies, and these ES proteins have been implicated in invasion of host tissues by sporocysts, haemocyte encapsulation and eventual immunosuppression and immune evasion (Guillou et al., 2007; Reis et al., 2008).

None of the entries had a confirmed myristoylation site, an amino acid side chain to which a lipid group, myristic acid (a 14C saturated fatty acid), can be added to aid the localization of a cytosolic protein to a membrane, although one entry, the major egg antigen p40, had probable sites. Nevertheless, this did not connote the absence of transmembrane proteins among the entries.

It is also noteworthy that some of the entries (Sm14, Sm23, 28GST, triose phosphate isomerase and Sm32) have already been identified as vaccine candidates (Cardoso et al., 2008; WHO, 2010). This is due to their high level of expression and antigenicity. Also, each protein entry had at least two GO IDs and terms/names to represent the function term (F) and the process terms (P). The GO terms aptly describe entries based on their status as a cellular component, in biological process and in molecular function.

Amino acid sequence

The amino acid length of the protein entries and coding sequence varied from 97 to 1471 aa and 231 to 4413 bp. Protein length is apparently diverse, although long-chained proteins are not in abundance among the entries. The more frequently occurring residues among the protein sequences were lysine, serine, leucine, glycine, cysteine, glutamine and aspartic acid. The first 5 residues have also been found to occur frequently among eukaryotic virulence proteins as reported by Garg and Gupta (2008). Quezada et al. (2009) also showed that conserved cysteine residues and leucine rich domains are key to virulence protein function.

None of the entries appeared to have single amino acid repeats (SAARs) or dipeptide repeats (DiPs). On the other hand, sequence repeat regions (SRRs) of two amino acids in length were found in triose phosphate isomerase, tyrosinase, egg antigen p40, Sp1, Tspan-1, serpin, cathepsin L, calreticulin, 28GST and Sm14. These 2 aa repeats were both heteropeptide and homopeptide repeats. The numbers of repeats found in the rest of the entries were not considered to be significant enough. The occurrence of repeats in many of the virulence proteins of microbial and protozoan pathogens has been reported by Gravekamp et al. (1998), Karlin et al. (2002), Fankhauser et al. (2007) and Ramana and Gupta (2009). Neither of the public repeat searching tools, ProtrepeatsDB and RepSeq (Kalita et al., 2006; Depledge et al., 2007), was available online for use. Hence, entries were examined manually for repeats, though note was taken of the natural probability of an amino acid repeat occurring in any given sequence. Availability of repeat searching tools/servers would have provided a means of detecting the more complicated mismatch repeats that may be present.

Similarly, there were no notable conserved sequences observed in the alignment of the protein entries generated from MUSCLE. This might have been expected due to the wide range of protein families.

Virulence

The twenty protein entries were used as queries on VirulentPred, which identifies partial sequences of a protein that could aid virulence according to bacterial models, with default settings. The results of these VirulentPred predictions (Table 2) may serve as a form of validation for some of the entries. 13 of the 20 entries queried on the server were declared virulent by all parameters (amino acid composition, dipeptide composition, similarity search and cascade support vector machine algorithms). According to Garg and Gupta (2008), VirulentPred is highly sensitive even for eukaryotic sequences, but may produce false positives in eukaryotic sequences due to compositional differences.

Although schistosomes are eukaryotic and more highly evolved than bacterial pathogens, certain features may be conserved in virulence proteins. In fact, such conservation is seen in the occurrence of repeats and residues (discussed above) in these pathogens. Lysine and serine, two of the most frequent residues in the protein entries (and in eukaryotic virulence proteins), also frequently occur in bacterial virulence proteins. Hence, the VirulentPred results cannot be totally discarded.

Other functional predictions

In the Kyte-Doolittle hydropathy plot, a large number of consecutive negative amino acid residues (20 to 30) is highly indicative of the hydrophobicity of a protein, its ability to be inserted into the internal hydrophobic environment of the membrane and its likelihood of being transmembrane. Most of the protein entries possess a significant number of hydrophobic amino acids, with calreticulin having 130 consecutive residues (the highest) and thioredoxin 2 having 14 residues (the lowest). Fatty acid binding protein, thioredoxin 1 and 2, and Sm29 were the only entries with less than 20 consecutive residues. These are all evident from ProtScale results generated numerically (verbose format). The algorithm used by ProtScale takes no cognizance of the first 4 amino acids of the query protein sequence, which are usually involved in signal peptides and cleavage. Hence, a high number of entries (16 of 20) had strong hydrophobic portions with at least 20 to 30 consecutive negative residues. It has been postulated from previous studies that hydrophobic residues of virulence proteins would aid their integration into the membrane, adhesion to host cells or stimulate binding of the proteins to targets in the hosts (Katsir et al., 2008; Quezada et al., 2009; Blanco et al., 2010).

Sites for cleavage of proteins involved in the conventional secretory pathway in cells were identified in 7 of the entries (35%). Such entries have signal sequences and a high D score, which predicts that the entries are integral to the membrane and non-cytoplasmic. The usage of eukaryotic parameters and truncation of each query sequence to 70 terminal residues (in the SignalP queries) increased the reliability of the result generated (Bos et al., 2009). Virulence-related outer membrane, integral membrane or transmembrane proteins have been identified in prokaryotic and eukaryotic pathogens (Schulz and Vogt, 1999; Kim et al., 2009).

Associated compounds and drugs

Diverse compounds were found to be associated with different protein entries, even among paralogs. As our queries of TDRtargets showed, only 4 of the protein entries, including triose phosphate isomerase, 28GST and tyrosinase, had already named or known compounds that could be used to target them. There are still no known compounds to directly target 16 of the protein entries, although 9 of the 16 have known targets for their human orthologs. Hence, some key protein factors involved in virulence of schistosomes cannot yet be targeted with any of the synthetic/natural compounds known to humans, at least within the TDRtargets database, a huge database formed by a collaboration of several universities and the WHO TDR drug target network (Crowther et al., 2010).

Ontogenic/stage specific expression

It is pertinent to note that most of the protein entries, including tyrosinase, fabp, Sm14, thioredoxin, Tspan-1, cathepsin L, phosphoglycerate kinase, Sm29, calreticulin, 28GST, aldolase and Sm32, have high expression levels in adult schistosomes, as results from SchistoDB showed. Serpin is one exception among the entries (high expression in germball and cercariae). Adult pathogens are highly evolved and would be capable of producing proteins directly involved in pathogenesis.
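The stage-specific percentages reported for each entry are EST fractions: the number of ESTs observed at one life-cycle stage divided by the total number of ESTs for that protein. The snippet below is a minimal sketch of that arithmetic, not the authors' own code; the function name `stage_expression` and the dictionary layout are assumptions made for illustration. The input counts are the Sm14 values shown on the sample annotation page (139 and 16 ESTs out of 163).

```python
def stage_expression(est_counts, total_ests):
    """Return the percentage of ESTs per life-cycle stage, to one decimal.

    est_counts: mapping of stage name -> number of ESTs seen at that stage
    total_ests: total number of ESTs for the protein across all stages
    (Hypothetical helper; the names are illustrative, not from SchistoDB.)
    """
    if total_ests <= 0:
        raise ValueError("total_ests must be a positive integer")
    return {
        stage: round(100.0 * count / total_ests, 1)
        for stage, count in est_counts.items()
    }


# Sm14 counts as listed on the sample annotation page (Figure 2).
sm14 = stage_expression({"adult": 139, "schistosomula": 16}, 163)
print(sm14)
```

For Sm14 this gives roughly 85% for adults and 9.8% for schistosomula, in line with the Table 1 entry. Stages absent from the mapping are simply not reported, so the percentages need not sum to 100.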
Model search and prediction

Surprisingly, a significant number of the protein entries had no structural data available in the Protein Data Bank (PDB), and MODBASE was relied on for prediction of the structures of most of the entries. 28GST and Sm14 had 3D model data from both databases, while 13 entries had model predictions generated from MODBASE. Three entries (Sm23, Sm32 and Tspan-1) had no models from either database. In terms of online availability and accessibility of 3D structural data, research on many proteins directly involved in schistosome pathogenesis may be slow.

Suggested virulence category

The protein entries had specific functions such as evasion of host immune responses, penetration of host barriers, degradation of host protective proteins and establishment in the host, all of which contribute to virulence, the relative ability of parasites to induce pathogenesis. Each of these functions was used to categorize the entries into a suggested virulence category. Hence 4 categories were identified: proteases, establishment (uptake of host nutrients, redox homeostasis, etc.), proteases and invasion (proteinase inhibitors, host penetration) (Table 1).

Antigenicity

It is worthy of note that all protein entries had at least 5 putative antigenic epitopes, with the highest being that of cathepsin L, SmSP1 with 55 putative epitopes. Epitopes, or better still, antigenic determinants, are discrete sites on a protein or antigen which B and T lymphocytes recognize. Epitopes are the immunologically active regions in a complex antigen that actually bind to B-cell or T-cell receptors. Such information on epitopes is useful when considering the adaptive immune response to parasites and possible therapeutic targets. Recognition of these epitopes depended on evidence provided by TDRtargets v4.0.

Algorithm [Structured English Text]

If a protein name is entered into the query search,
1. [Start] Copy the letters of the query
2. Compare the letters of the query with the stored template protein keywords 1 to 20
3. Search for a match
a. There is a match if all the letters are the same
b. There is also a match if 3 to 5 letters are the same
5. Display the matched protein keyword. If there is no match, display 'no result'
6. Search the annotated protein entry for the matched keyword
7. Display the annotation page [End].
8. If an additional link search is performed, go to http://schistodb.net/schistodb [End].

Summary, conclusion and recommendation

Studies on molecular aspects of parasitic helminths can be improved, especially in the tropics, if a database of key virulence factors which are directly implicated in pathogenesis is developed. This work has laid a foundation for us to develop such a database. It would be useful to the research community at a time when the search for vaccines for several helminth diseases is on the increase. It also provides grounds for further studies related to the significance of proteins involved in virulence of the parasitic helminths. The database will be made public (depending on funds availability), updated regularly and additional tools for virulence prediction incorporated. It is proposed to expand the database to include other pathogenic helminths so that users are provided with more possibilities in terms of species coverage.

REFERENCES

Aslam A, Quinn P, McIntosh RS, Shi J, Ghumra A, McKerrow JH, Bunting KA, Dunne DW, Doenhoff MJ, Sherie LM, Ke Z, Richard JP (2008). Proteases from Schistosoma mansoni cercariae cleave IgE at solvent exposed interdomain region. Mol. Immunol. 45(2):567-574.
Bendtsen JD, Nielsen H, von Heijne G, Brunak S (2004). Improved prediction of signal peptides: SignalP 3.0. J. Mol. Biol. 340:783-795.
Berriman M, Haas BJ, LoVerdo PT, Wilson RA, Dillon GP, Cerquiera GC, El Sayed NM (2009). The genome of the blood fluke Schistosoma mansoni. Nature 460:352-358.
Bos DH, Mayfield C, Minchella DJ (2009). Analysis of regulatory protease sequences identified through bioinformatic data mining of the Schistosoma mansoni genome. BMC Genomics 10:488-492.
Boumis G, Angelucci F, Bellelli A, Brunori M, Dimastrogiovanni D, Miele AE (2011). Structural and functional characterization of Schistosoma mansoni Thioredoxin. Protein Sci. 20(6):1069-1076.
Blanco MT, Sacristán B, Lucio L, Blanco J, Pérez-Giraldo C, Gómez-García AC (2010). Cell surface hydrophobicity as an indicator of other virulence factors in Candida albicans. Rev. Iberoam Micol. 27(4):195-199.
Braschi S, Borges WC, Wilson RA (2006). Proteomic analysis of the schistosome tegument and its surface membranes. Mem Inst Oswaldo Cruz. 101(I):205-212.
Caprona A, Riveaua G, Caprona M, Trottein F (2005). Schistosomes: the road from host–parasite interactions to vaccines in clinical trials. Trends Parasitol. 21(3):143-149.
Cardoso FC, Macedo GC, Gava E, Kitten GT, Mati VL (2008). Schistosoma mansoni Tegument Protein Sm29 Is Able to Induce a Th1-Type of Immune Response and Protection against Parasite Infection. PLoS Negl Trop Dis. 2(10):e308.
Chalmers IW, McArdle AJ, Coulson RM, Wagner MA, Schmid R, Hirai H, Hoffmann KF (2008). Developmentally regulated expression, alternative splicing and distinct sub-groupings in members of the Schistosoma mansoni venom allergen-like (SmVAL) gene family. BMC Genomics 9:89.
Crowther GJ, Shanmugam D, Carmona SJ, Doyle MA, Hertz-Fowler C, Berriman M, Nwaka S, Ralph SA, Roos DS, Van Voorhis WC, Agüero F (2010). Identification of Attractive Drug Targets in Neglected-Disease Pathogens Using an In Silico Approach. PLoS Negl Trop Dis. 4(8):e804.
Curwen RS, Ashton PD, Sundaralingam S, Wilson RA (2006). Identification of Novel Proteases and Immunomodulators in the Secretions of Schistosome Cercariae That Facilitate Host Entry. Mol. Cell. Proteomics 5(5):835-844.
MacDonald AS, Araujo MI, Pearce EJ (2002). Immunology of Parasitic
Helminth Infections. Infect. Immun. 70(2):427–433.

Dalton JP, Clough FA, Jones MK, Brindley PJ (1997). The cysteine proteinases of Schistosoma mansoni cercariae. Parasitology 114:105-112.

Depledge DP, Lower RP, Smith DF (2007). RepSeq – A database of amino acid repeats present in lower eukaryotic pathogens. BMC Bioinformatics 8:122.

Dvorak J, Mashiyama ST, Braschi S, Sajid M, Knudsen GM, Hansell E, Lim KC, Hsieh I, Bahgat M, Mackenzie B, Medzihradszky KF, Babbitt PC, Caffrey CF, McKerrow JH (2008). Differential use of protease families for invasion by schistosome cercariae. Biochimie 90:345-358.

Edgar RC (2004). MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics 5:113.

Emanuelsson O, Brunak S, von Heijne G, Nielsen H (2007). Locating proteins in the cell using TargetP, SignalP and related tools. Nat. Protoc. 2:953-971.

Fankhauser N, Nguyen-Ha T, Adler J, Mäse P (2007). Surface antigens and potential virulence factors from parasites detected by comparative genomics of perfect amino acid repeats. Proteome Sci. 5:20.

Fitzpatrick JM, Hirai YHH, Hoffmann KF (2007). Schistosome egg production is dependent upon the activities of two developmentally regulated tyrosinases. FASEB J. 21:823-835.

Fumagalli M, Pozzoli U, Cagliani R, Comi GP, Bresolin N, Clerici M, Sironi M (2010). The landscape of human genes involved in the immune response to parasitic worms. BMC Evol. Biol. 10:264.

Garg A, Gupta D (2008). VirulentPred: a SVM based prediction method for virulent proteins in bacterial pathogens. BMC Bioinformatics 9:62.

Gomez C, Ramirez ME, Calixto-Galvez M, Medel O, Rodríguez MA (2010). Regulation of Gene Expression in Protozoa Parasites. J Biomed. Biotechnol. 2010:726045.

Gravekamp C, Rosner B, Madoff LC (1998). Deletion of repeats in the alpha C protein enhances the pathogenicity of group B streptococci in immune mice. Infect. Immun. 66:4347-4354.

Guillou F, Roger E, Moné Y, Rognon A, Grunau C, Théron A, Mitta G, Coustau C, Gourbal BE (2007). Excretory–secretory proteome of larval Schistosoma mansoni and Echinostoma caproni, two parasites of Biomphalaria glabrata. Mol. Biochem. Parasitol. 155(1):45-56.

Hansell E, Braschi S, Medzhiradszsky KF, Sajid M, Debnath M (2008). Proteomic Analysis of Skin invasion by blood fluke larvae. PLoS Negl. Trop. Dis. 2(7):e262.

Herve M, Angeli V, Pinzar E, Wintjens R, Faveeuw C, Narumiya S, Capron A (2003). Pivotal roles of the parasite PGD2 synthase and of the host D prostanoid receptor 1 in schistosome immune evasion. Eur. J. Immunol. 33:2764–2772.

Hogeweg P (2011). The Roots of Bioinformatics in Theoretical Biology. PLoS Comput. Biol. 7(3):e1002021.

Kalita MK, Ramasamy G, Duraisamy S, Chauhan VS, Gupta D (2006). ProtRepeatsDB: a database of amino acid repeats in genomes. BMC Bioinformatics (database) 7:336.

Kane CM, Cervi L, Sun J, McKee AS, Katherine SM, Sagi S, Christopher AH, Edward JP (2004). Helminth Antigens Modulate TLR-Initiated Dendritic Cell Activation. J. Immunol. 173(12):7454-61.

Karlin S, Brocchieri L, Bergman A, Mrazek J, Gentles AJ (2002). Amino acid runs in eukaryotic proteomes, disease associations. Proc. Natl. Acad. Sci. USA 99:333-338.

Katsir LE, Schilmiller AL, Staswick PE, He SY, Howe GA (2008). COI1 is a critical component of a receptor for jasmonate and the bacterial virulence factor coronatine. Proc. Natl. Acad. Sci. 105(19):7100-7105.

Kim KH, Willger SD, Park SW, Puttikamonkul S, Grahl N (2009). TmpL, a Transmembrane Protein Required for Intracellular Redox Homeostasis and Virulence in a Plant and an Animal Fungal Pathogen. PLoS Pathog 5(11):e1000653.

Lin YL, He S (2006). Sm22.6 antigen is an inhibitor to human thrombin. Mol. Biochem. Parasitol. 147(1):95-100.

Lopez Quezada LA, McKerrow JH (2011). Schistosome serine protease inhibitors: parasite defense or homeostasis. Anais da Academia Brasileira de Ciências (Annals of the Brazilian Academy of Sciences) 83(2):663-672.

Matisz CE, McDougall JJ, Sharkey KA, McKay DM (2011). Helminth Parasites and the Modulation of Joint Inflammation. J. Parasitol. Res. 2011:942616.

Mulder NJ, Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D, Bork P, Buillard V, Cerutti L (2007). New developments in the InterPro database. Nucleic Acids Res. 35:D224-D228.

Nielsen H, Engelbrecht J, Brunak S, von Heijne G (1997). Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. Protein Eng. 10:1-6.

Pieper U, Webb BM, Barkan DT, Schneidman-Duhovny D, Schlessinger A, Braberg H et al (2011). MODBASE, a database of annotated comparative protein structure models and associated resources. Nucleic Acids Res. 39:465-474.

Quezada CM, Hicks SW, Galán JE, Stebbins CE (2009). A family of Salmonella virulence factors functions as a distinct class of autoregulated E3 ubiquitin ligases. Proc. Natl. Acad. Sci. 106(12):4864-4869.

Rabia I, El-Ahwany E, El-Komy W, Nagy F (2010). Immunomodulation of Hepatic Morbidity in Murine Schistosoma mansoni Using Fatty Acid Binding Protein. J. Am. Sci. 6(7):170-176.

Ramana J, Gupta D (2009). ProtVirDB: a database of protozoan virulent proteins. Bioinformatics 25(12):1568-1569.

Ramos CR, Figueredo RC, Pertinhez TA, Vilar MM, Nascimento AL et al (2003). Gene structure and M20T polymorphism of the Schistosoma mansoni Sm14 fatty acid-binding protein: structural, functional and immunoprotection analysis. J. Biol. Chem. 278:12745-12751.

Ramos CR, Spisni A, Oyama S Jr, Sforca ML, Ramos HR, Vilar MM et al (2009). Stability Improvement of the fatty acid binding protein Sm14 from S mansoni by Cys rep: Structural and functional characterization of a vaccine candidate. J. Biochim. Biophys. Acta 1794(4):655-662.

Reis EAG, Mauadi Carmo TA, Athanazio R, Reis MG, Harn DA Jr (2008). Schistosoma mansoni triose phosphate isomerase peptide MAP4 is able to trigger naïve donor immune response towards a type-1 cytokine profile. Scand. J. Immunol. (Clinical Immunology) 68:169–176.

Sharma M, Khanna S, Bulusu G, Mitra A (2009). Comparative modeling of thioredoxin reductase from Schistosoma mansoni: a multifunctional target for antischistosomal therapy. J. Mol. Graph Model 27(6):665-675.

Schulz GE, Vogt J (1999). The structure of the outer membrane protein OmpX from Escherichia coli reveals possible mechanisms of virulence. Structure 7(10):1301–1309.

Sigrist CJA, Cerutti L, De Castro E, Langendijk-Genevaux PS, Bulliard V, Bairoch A, Hulo N (2010). PROSITE, a protein domain database for functional characterization and annotation. Nucleic Acids Res. (Database) 38:161–166.

Tsai CT, Huang WL, Ho SJ, Shu LS, Ho SY (2009). Virulent-GO: Prediction of Virulent Proteins in Bacterial Pathogens Utilizing Gene Ontology Terms. Int. J. Biol. Life Sci. 5(4):2009

Verjovski-Almeida S, DeMarco R (2008). Current developments on Schistosoma proteomics. Acta Tropica 108:183-185.

World Health Organisation (WHO) Document (2010). Parasitic Diseases-Schistosomiasis. Available at http://who.int/vaccine_research/diseases/soa_parasitic/en/index5.html. Accessed 13 July 2011.

Zhou CE, Smith J, Lam M, Zemla A, Dyer MD, Slezak T (2007). MvirDB – a microbial database of protein toxins, virulence factors and antibiotic resistance genes for bio-defence applications. Nucleic Acids Res. (database) 35:391–394.
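
Illustrative sketch of the query algorithm

The query-matching steps given under "Algorithm [Structured English Text]" can be illustrated with a short Python sketch. This is only a minimal illustration under stated assumptions and not the SchistoVir implementation: the keyword list is a small subset of protein names mentioned in the text, the names KEYWORDS, MIN_SHARED, shares_run and match_keyword are hypothetical, and step 3b ("3 to 5 letters are the same") is interpreted here as the query sharing a run of at least 3 consecutive letters with a stored keyword.

```python
# Hypothetical subset of the stored template protein keywords
# (names taken from the text; the full list of 20 is not reproduced here).
KEYWORDS = ["tyrosinase", "thioredoxin", "calreticulin",
            "aldolase", "serpin", "cathepsin L"]

# Assumption: step 3b is read as "a shared run of at least 3 letters".
MIN_SHARED = 3


def shares_run(query: str, keyword: str, length: int = MIN_SHARED) -> bool:
    """Return True if any `length`-letter run of the query occurs in the keyword."""
    return any(query[i:i + length] in keyword
               for i in range(len(query) - length + 1))


def match_keyword(query: str, keywords=KEYWORDS):
    """Steps 1 to 5: copy the query, compare it with the keywords, report a match."""
    q = query.strip().lower()                 # step 1: copy the letters of the query
    for kw in keywords:                       # step 3a: all letters are the same
        if q == kw.lower():
            return kw
    for kw in keywords:                       # step 3b: a short run of letters is shared
        if shares_run(q, kw.lower()):
            return kw
    return None                               # the caller displays 'no result'


if __name__ == "__main__":
    for name in ("Serpin", "calret", "xyz"):
        print(name, "->", match_keyword(name) or "no result")
```

Exact matches are tried before partial ones so that a complete protein name is never shadowed by an earlier keyword that merely shares a few letters with it. Steps 6 to 8 (looking up and displaying the annotation page) depend on how the entries are stored and are not sketched here.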