Browsing by Author "Hemalatha, N."
Item Computational approach for the prediction of ERF and DREB proteins in indica rice using support vector machine(2012) Hemalatha, N.; Rajesh, M.K.; Narayanan, N.K.Drought and salt stress are considered to be major impediments in rice production systems. To understand the genetics of tolerance to these abiotic stresses and develop drought/salt tolerant cultivars, genomic regions influencing yield and its response to water deficit have to be identified. A method for predicting two drought tolerant proteins viz. dehydration-responsive element binding proteins (DREB) and ethylene responsive factor (ERF) in the genome of indica rice has been described. The proposed method, ERFDREBSVMPRED, was developed using support vector machine and a prediction accuracy of 89% for DREB and 81% for ERF was achieved. The developed tool could predict DREB protein with 100% specificity at a 71% sensitivity rate and ERF protein with 100% specificity at a 60% sensitivity rate.Item Development of a tool for computational prediction of σ70 promoters in Pseudomonas spp using SVM and HMM approaches(2014) Merin K. Eldo; Rajesh, M.K.; Jamshinath, T.P.; Hemalatha, N.; Murali Gopal; George V. ThomasPromoters are regions in DNA that play important role in the regulation of gene expression. The ability to locate promoters within a section of DNA is known to be a very difficult and important task in DNA analysis. Since experimental techniques to identify promoters are costly and time consuming, in silico methods offer an alternative. In this study, we have developed a tool for identification of σ70 promoters in the –10 and –35 regions of sequences from Pseudomonas spp. Promoters were predicted using both Support Vector Machine (SVM) and Hidden Markov Model (HMM) based approaches. SVM performed better when trained using RBF kernel with a cross-validation of 5 and a value of 0.03 for the gamma parameter. The module developed using SVM showed a sensitivity of 78% and a specificity of 80%. 
The programmes required to process the user input were written using Perl and HTML codes were used to create a user interface. The user interface accepts a query sequence and the processed result will be displayed in a new window. The tool named PROMIT (PROMoter Identification Tool), was developed in the Windows platform, has a user friendly interface and works well for sequences from Pseudomonas spp.Item Genome-Wide Analysis of Putative ERF and DREB GENE Families in Indica Rice (O. sativa L. subsp. Indica)(2012-10) Hemalatha, N.; Rajesh, M.K.; Narayanan, N.K.Drought is a major constraint to rice production and its stability in rain-fed and poorly irrigated environments. Identifying genomic regions influencing the yield and its response to water deficits will aid in our understanding of the genetics of drought tolerance and development of more drought tolerant cultivars. Besides drought, the other major impediment to increased crop production is salt stress. In this context, identification of drought and salt-responsive genes assumes significance. In this paper we carried out genome-wide analyses to explore putative genes encoding ethylene responsive factor (ERF) and dehydration-responsive element binding proteins (DREB) in the genome of indica rice. Reference nucleotides of well established molecular function, representing each of the protein families investigated, were chosen as query sequences for searches in the indica rice genome database. Clones having genomic sequences similar to the related genes were taken and converted to amino acid sequences. Putative sequences were subjected to PROSITE and Pfam databases and 31 signature sequences related to ERF family and 30 sequences related to DREB were obtained. Proteins showing more than 30% identity were taken and phylogenetic trees were generated for each family. 
The results of this study provide basic genomic information about new ERF and DREB gene families in indica rice.Item Genome-wide Analysis of Putative Erf and Dreb Gene Families in Indica Rice (o. Sativa l. Subsp. Indica)(2011) Hemalatha, N.; Rajesh, M.K.; Narayanan, N.K.Drought is a major constraint to rice production and its stability in rain-fed and poorly irrigated environments. Identifying genomic regions influencing the yield and its response to water deficits will aid in our understanding of the genetics of drought tolerance and development of more drought tolerant cultivars. Besides drought, the other major impediment to increased crop production is salt stress. In this context, identification of drought and salt-responsive genes assumes significance. In this paper we carried out genome-wide analyses to explore putative genes encoding ethylene responsive factor (ERF) and dehydration-responsive element binding proteins (DREB) in the genome of indica rice. Reference nucleotides of well established molecular function, representing each of the protein families investigated, were chosen as query sequences for searches in the indica rice genome database. Clones having genomic sequences similar to the related genes were taken and converted to amino acid sequences. Putative sequences were subjected to PROSITE and Pfam databases and 31 signature sequences related to ERF family and 30 sequences related to DREB were obtained. Proteins showing more than 30% identity were taken and phylogenetic trees were generated for each family. The results of this study provide basic genomic information about new ERF and DREB gene families in indica rice.Item An Integrative system for prediction of NAC proteins in rice using different feature extraction methods(2013-02) Hemalatha, N.; Rajesh, M.K.; Narayanan, N.K.The NAC gene family encodes a large family of plant-specific transcription factors with diverse roles in various developmental processes and stress responses in plants. 
Creation of genome wide prediction tools for NAC proteins will have a significant impact on gene annotation in rice. In the present study, NACSVM, a tool for computational genome-scale prediction of NAC proteins in rice, was developed integrating compositional and evolutionary information of NAC proteins. Initially, support vector machine (SVM)-based modules were developed using combinatorial presence of diverse protein features such as traditional amino acid, dipeptide (i+1), tripeptide (i+2), four-parts composition and PSSM and an overall accuracy of 79%, 93%, 93%, 79% and 100% respectively was achieved. Later, two hybrid modules were developed based on amino acid, dipeptide and tripeptide composition, through which an overall accuracy of 83% and 79% was achieved. NACSVM was also evaluated using the position-specific iterated basic local alignment search tool (PSI-BLAST), which resulted in a lower accuracy of 50%. In order to benchmark NACSVM, the tool was evaluated using independent data test and cross validation methods. The different statistical analyses carried out revealed that the proposed algorithm is a useful tool for annotating NAC proteins in the genome of rice.Item LTTRPred: A tool for prediction of LysR-type transcriptional regulator of pyoluteorin pathway in plant growth promoting Pseudomonas spp(2014-12) Anil Paul; Hemalatha, N.; Rajesh, M.K.Plant growth promoting Pseudomonas spp. produce an antifungal compound called pyoluteorin (Plt) that suppresses diseases caused by phytopathogenic fungi. The pathway specific regulator PltR, a typical LysR-type transcriptional regulator (LTTR), is responsible for the transcriptional activation of the Plt biosynthetic operon. The LTTR family represents one of the largest classes of bacterial transcriptional regulatory proteins. A large number of LTTRs function as global transcriptional activators or repressors of unlinked genes or operons involved in metabolism, quinoline signal, virulence etc. 
The proposed method, LTTRPred, is a useful tool developed for identifying and predicting the LTTR, which is responsible for the activation of Plt transcription regulators, from whole genomes of various Pseudomonas spp. LTTRPred was developed using support vector machine (SVM) and Waikato Environment for Knowledge Analysis (WEKA) based on the composition of amino acid and amino acid pairs. Modules in SVM were developed using traditional amino acid, dipeptide (n+1) and hybrid amino acid composition modules and an overall accuracy of 100, 100 and 98 per cent respectively, was achieved. Modules in WEKA were also developed using the same modules and an overall accuracy of 100 per cent was achieved for all. The performance of the tool was tested using various datasets of LTTR genes from different Pseudomonas spp. The best performing SVM and WEKA modules from the present investigation were implemented as a dynamic web server ‘LTTRPred’, which is freely available and can be accessed online (http://210.212.229.56/lttrpred/). This tool can be used for the functional annotation of the Pseudomonas spp. possessing LTTR geItem LTTRPred: A tool for prediction of transcriptional regulator of pyoluteorin pathway in Pseudomonas species using SVM-based approach(2012-11) Anil Paul; Rajesh, M.K.; Hemalatha, N.; Jamshinath, T.P.; Murali Gopal; George V. ThomasItem A machine learning approach for detecting MAP kinase in the genome of Oryza sativa L. ssp. indica(2014) Hemalatha, N.; Rajesh, M.K.; Narayanan, N.K.Plant development and crop yield are highly influenced by temperature. High temperature negatively affects different stages of plant development in rice, mainly booting and flowering. Identifying candidate genes associated with high temperature stress response may provide knowledge for the improvement of heat tolerance in rice. As the rice genome sequencing has already been undertaken, a major challenge is annotating proteins and decoding their functionalities. 
MAP kinase (MAPK) proteins are involved in signaling various abiotic and biotic stresses, like temperature stress or drought, wounding and pathogen infection. Moreover, MAPKs have also been implicated in cell cycle and developmental processes. In this study, an attempt has been made in developing a MAP kinase prediction tool for rice, MapPred. The computational approach has been developed using the Sequential Minimal Optimization (SMO) algorithm in the Weka workbench, and a sensitivity of 100% was obtained using the dipeptide method. MapPred was also tested with three plants, namely Arabidopsis, maize and tomato, to show that the developed tool has higher accuracy with rice than with other plants, which further supports the higher prediction accuracy of species-specific tools. Prediction performance of MapPred was evaluated using cross validation, independent data test and leave one out validation. Our experimental results demonstrated that the proposed algorithm based on the dipeptide method could be very effective in the computational approach for predicting MAPK proteins in Oryza sativa subsp. indica.Item A Machine Learning Approach for Discovery of Novel Non-Ribosomal Peptide Synthetases (NRPS) in genomes of Plant Growth Promoting Pseudomonas Spp(2014-10) Philip Job, N.; Jamshinath, T.P.; Hemalatha, N.; Rajesh, M.K.Non-ribosomal peptide synthetases (NRPSs) are multi-modular megasynthases possessing the ability to catalyze biosynthesis of small bioactive peptides through a thiotemplate mechanism which is independent of ribosomes. These enzymes are involved in production of a wide range of chemical products of broad structural and biological activity. The present study was performed with an aim to develop a gene prediction tool using a machine learning work bench called WEKA (Waikato Environment for Knowledge Analysis) for NRPS in plant growth promoting Pseudomonas spp. First, a model was developed using the training data which was generated using many classifiers. 
The trained model was then used for the prediction of NRPS in a given set of unknown sequences. Cross-validation results showed that the ‘Logistic of Functions’ was the best classifier when compared to others, showing high accuracy and performance in classifying the instances. We hope that the tool will aid in discovering of novel NRPS by predicting them from sequence data obtained by whole genome sequencing of bacteria or metagenomics.Item A Machine Learning Approach for Prediction of Domains of DELLA Proteins, a Key Component of Gibberellic Acid Signaling in Plants(2016-03) Akhil, V.; Amal, V.; Hemalatha, N.; Rajesh, M.K.Item A Machine Learning Approach for Prediction of Gibberellic Acid Metabolic Enzymes in Monocotyledonous Plants(2014-08) Sreepriya, P.; Naganeeswaran, S.; Hemalatha, N.; Sreejisha, P.; Rajesh, M.K.Gibberellins (GA) are one of the most important phytohormones that control different aspects of plant growth and influence various developments such as seed germination, stem elongation and floral induction. More than 130 GAs have been identified; however, only a small number of them are biologically active. In this study, five enzymes in GA metabolic pathway in monocots have been thoroughly researched namely, ent-copalyl-diphosphate synthase (CPS), ent-kaurene synthase (KS), ent-kaurene oxidase (KO), GA 20-oxidase (GA20ox), and GA 2-oxidase (GA2ox). We have designed and implemented a high performance prediction tool for these enzymes using machine learning algorithms. ‘GAPred’ is a web-based system to provide a comprehensive collection of enzymes in GA metabolic pathway and a systematic framework for the analysis of these enzymes for monocots. WEKA-based classifiers (Naïve-Bayes) and Support Vector Machine (SVM) based-modules were developed using dipeptide composition and high accuracies were obtained. 
In addition, BLAST and Hidden Markov Model (HMMER-based model) were also developed for searching sequence databases for homologs of enzymes of GA metabolic pathway, and for making protein sequence alignments.Item Machine Learning Approaches for Prediction of Expansin Gene Family in Indica Rice(2013-12) Hemalatha, N.; Rajesh, M.K.; Narayanan, N.K.Expansin refers to a family of proteins present in the plant cell wall which has important roles in plant cell growth, emergence of root hairs, meristem function and other developmental processes. A major constraint to rice production is submergence of rice by flash flooding. In our earlier study, we had identified 21 novel sequences related to expansin gene families in the genome of indica rice using genome-wide analysis. Development of a tool for the prediction of these expansin genes using computational approaches might significantly enhance rice gene annotation. ExpansinPred, a novel computational method based on radial basis function (RBF) and support vector machines (SVMs) for prediction of α-expansins (EXPA) and β-expansins (EXPB), is presented in this work. Two large families of expansin genes have been discovered in plants, namely EXPA and EXPB. The experimental data are curated from NCBI and include 24 EXPA and 20 EXPB, of indica rice, after redundancy elimination. The proper window length for a potential expansin was optimized as 4 for EXPA and EXPB, with prediction accuracies of 100 % each for the RBF classifier. For SVM, the window length was optimized as 3 for EXPA and 4 for EXPB with prediction accuracies of 90 and 100 %, respectively. To evaluate the prediction performance of ExpansinPred, cross-validation, independent dataset validation and jackknife validation were carried out. ExpansinPred was also compared with four more algorithms namely Naive Bayes, sequential minimal optimization, J48 and random forest. 
To further demonstrate that a species-specific predictor performs better than a general tool, ExpansinPred was compared with an All-plant tool and also with plants other than rice as test set. The different statistical analyses carried out demonstrated that the proposed algorithm is a useful computational tool for rice genome annotation, specifically for predicting expansin gene family, and can benefit rice research community.Item Nacpred: Computational Prediction of Nac Proteins in Rice Implemented Using Smo Algorithm(2013) Hemalatha, N.; Rajesh, M.K.; Narayanan, N.K.The impact of abiotic stresses, such as drought, on plant growth and development severely hampers crop production worldwide. The development of stress-tolerant crops will greatly benefit agricultural systems in areas prone to abiotic stresses. Recent advances in molecular and genomic technologies have resulted in a greater understanding of the mechanisms underlying the genetic control of the abiotic stress response in plants. NAC (NAM, ATAF1/2 and CUC2) domain proteins are plant-specific transcriptional factors which have diversified roles in various plant developmental processes and stress responses. More than 100 NAC genes have been identified in rice. In the proposed method, NACPred, an attempt has been made in the direction of computational prediction of NAC proteins. The well-known sequential minimal optimization (SMO) algorithm, which is the most commonly used algorithm for numerical solutions of the support vector learning problems, has been used for the development of various modules in this tool. Modules were first developed using amino acid, traditional dipeptide (i+1), tripeptide (i+2) and an overall accuracy of 76%, 90%, and 97% respectively was achieved. To gain further insight, two hybrid modules (hybrid1 and hybrid2) were also developed based on amino acid composition and dipeptide composition, which achieved an overall accuracy of 90% and 97%. 
To evaluate the prediction performance of NACPred, cross validation, leave one out validation and independent data test validation were carried out. It was also compared with algorithms namely RBF and Random Forest. The different statistical analyses worked out revealed that the proposed algorithm is useful for rice genome annotation, specifically predicting NAC proteins.Item NACSVMPred: A Machine Learning Approach for Prediction of NAC Proteins in Rice Using Support Vector Machines(2012) Hemalatha, N.; Rajesh, M.K.; Narayanan, N.K.NAC proteins are plant-specific transcriptional factors with diversified roles in various developmental processes and stress responses. Development of genome wide prediction tools for NAC proteins will substantially have an impact on rice gene annotation. NACSVMPred is an effort in this direction for computational genome-scale prediction of NAC proteins in rice by integrating compositional and evolutionary information of proteins. Support vector machine (SVM)-based modules were first developed using traditional amino acid, dipeptide (i+1), tripeptide (i+2), four-parts composition and PSSM and an overall accuracy of 79%, 93%, 93%, 79% and 100% respectively was achieved. Further, two hybrid modules were developed based on amino acid, dipeptide and tripeptide composition, which achieved an overall accuracy of 83% and 79%. NACSVMPred was also evaluated with PSI-BLAST, which resulted in a lower accuracy of 50%. The different statistical analyses carried out revealed that the proposed algorithm is useful for rice genome annotation, specifically predicting NAC proteins.Item PhzPred – A Tool for Prediction of Phenazine Synthesizing Genes in Plant Growth Promoting Pseudomonas spp(2014-10) Shilpa, S.; Anil Paul; Naganeeswaran, S.; Hemalatha, N.; Rajesh, M.K.Phenazines are natural products produced by the bacterial strain of Pseudomonas spp. 
which possess anti-microbial activities and include more than 50 pigmented heterocyclic nitrogen containing secondary metabolites. Seven core phenazine biosynthetic genes have been identified in nearly all identified bacterial strains that produce phenazine compounds. In this study, a model has been developed to predict the phenazine biosynthetic genes from a set of protein sequences using machine learning algorithms from whole genomes of Pseudomonas spp. Initially, protein sequences from the Pseudomonas spp. were retrieved from public databases and used to train the WEKA models. To train the different classifiers in WEKA, three amino acid compositions were used: monomer amino acids, dipeptide amino acids, and a hybrid method. The trained models were then used for the prediction of phenazine synthesizing genes in a user-submitted sequence. The best WEKA modules were selected based on the performance of different classifiers in training and testing. The performances of the classifiers were then evaluated based on 10-fold cross validation and independent data set validation techniques. In the proposed methodology, better performance was observed for the hybrid feature extraction method. The development of a genome wide prediction tool for phenazine synthesizing genes will substantially have an impact on bacterial genome annotation and devising crop protection strategies using plant growth promoting rhizobacteria.
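
Several of the abstracts listed here describe classifiers trained on monomer (amino acid) composition, dipeptide composition, and a hybrid of the two. The abstracts do not give the exact feature definitions, so the following is only a minimal sketch of how such features are commonly computed. The function names, the restriction to the 20 standard residues, and the reading of "hybrid" as a simple concatenation of the two vectors are assumptions made for illustration, not the authors' implementation.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
# All 400 ordered residue pairs, in a fixed order (AA, AC, ..., YY).
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def monomer_composition(seq):
    """Return 20 values: the fraction of each standard amino acid in seq."""
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    if not residues:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(a) / len(residues) for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return 400 values: the fraction of each adjacent residue pair in seq."""
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for first, second in zip(seq, seq[1:]):
        pair = first + second
        if pair in counts:  # skip pairs containing non-standard symbols
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


def hybrid_features(seq):
    """Assumed hybrid: monomer and dipeptide vectors concatenated (420 values)."""
    return monomer_composition(seq) + dipeptide_composition(seq)
```

Each protein sequence then becomes a fixed-length numeric vector, which is the form of input that SVM or WEKA classifiers such as those named in the abstracts expect.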