An integrative system for prediction of NAC proteins in rice using different feature extraction methods

Abstract

The NAC gene family encodes a large family of plant-specific transcription factors with diverse roles in developmental processes and stress responses. Genome-wide prediction tools for NAC proteins would have a significant impact on gene annotation in rice. In the present study, NACSVM, a tool for computational genome-scale prediction of NAC proteins in rice, was developed by integrating compositional and evolutionary information of NAC proteins. Initially, support vector machine (SVM)-based modules were developed using diverse protein features: traditional amino acid composition, dipeptide (i+1) composition, tripeptide (i+2) composition, four-parts composition, and position-specific scoring matrix (PSSM) profiles. These modules achieved overall accuracies of 79%, 93%, 93%, 79% and 100%, respectively. Later, two hybrid modules were developed based on amino acid, dipeptide and tripeptide composition, which achieved overall accuracies of 83% and 79%. NACSVM was also evaluated using the position-specific iterated basic local alignment search tool (PSI-BLAST), which resulted in a lower accuracy of 50%. To benchmark NACSVM, the tool was evaluated using independent dataset testing and cross-validation. The statistical analyses carried out indicate that the proposed algorithm is a useful tool for annotating NAC proteins in the rice genome.
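
As an illustration of the compositional features named in the abstract, the following is a minimal sketch of how amino acid and dipeptide composition vectors are commonly computed from a protein sequence. It is not the authors' code: the function names, the percentage scaling, and the handling of non-standard residues are assumptions made for this example, and the abstract does not specify how NACSVM computes these features.

```python
from itertools import product

# The 20 standard amino acids, in a fixed order so feature vectors are comparable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq):
    """Return a 20-element vector: percentage of each standard residue.

    Assumption: non-standard symbols (e.g. X, B, Z) are ignored.
    """
    seq = seq.upper()
    counts = dict.fromkeys(AMINO_ACIDS, 0)
    for residue in seq:
        if residue in counts:
            counts[residue] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [100.0 * counts[aa] / total for aa in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return a 400-element vector: percentage of each adjacent residue pair.

    Assumption: pairs containing a non-standard symbol are skipped.
    """
    seq = seq.upper()
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(pairs)
    return [100.0 * counts[p] / total for p in pairs]


if __name__ == "__main__":
    # Hypothetical toy sequence, not a real NAC protein.
    example = "MKAACDE"
    print(len(amino_acid_composition(example)))   # 20 features
    print(len(dipeptide_composition(example)))    # 400 features
```

Vectors of this kind (20 and 400 dimensions) can be passed to any SVM implementation as fixed-length input; a hybrid module in the sense of the abstract would concatenate them.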

Keywords

SVM, NAC, RBF, PSSM, ROC, AUC

Citation

International Journal on Soft Computing (IJSC), Vol. 4, No. 1, pp. 9-21, February 2013