PhosPred-RF: a novel sequence-based predictor for phosphorylation sites using sequential information only

Transactions on NanoBioscience (TNB)

Many recent efforts have been devoted to developing machine-learning-based methods for fast and accurate phosphorylation site prediction. Currently, most well-performing methods build their prediction models from hybrid information, such as evolutionary information and disorder information. Unfortunately, such methods suffer from two major limitations: first, they offer little help for protein phosphorylation site prediction when no obvious homology can be detected; second, computing such complex information is time-consuming, which may limit the use of these predictors in practical applications. In this study, we present a simple, fast and powerful feature representation algorithm that thoroughly explores sequential information from multiple perspectives using only primary sequences, and that successfully captures the differences between true phosphorylation sites and non-phosphorylation sites. Using the proposed features, we build a random-forest-based predictor, named PhosPred-RF, for predicting protein phosphorylation sites. We evaluate the proposed predictor and compare it with state-of-the-art predictors on benchmark datasets. The experimental results show that PhosPred-RF outperforms other existing predictors, demonstrating its potential as a useful tool for protein phosphorylation site prediction. PhosPred-RF is freely accessible to the public through a user-friendly web server at http://server.malab.cn/PhosPred-RF
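The abstract does not spell out the exact feature set, so the following is only a hypothetical illustration of what "sequence-only" features can look like, not the PhosPred-RF algorithm. It assumes that each candidate site is a serine, threonine or tyrosine (S/T/Y) residue, that it is described by a window of `2 * half_width + 1` residues centered on it (padded with an assumed `X` symbol at the sequence ends), and that two simple views of that window are concatenated: amino acid composition and a position-specific binary (one-hot) encoding. All function names and the default `half_width` are illustrative assumptions.

```python
from typing import List, Tuple

# The 20 standard amino acids; any other symbol (e.g. padding) is ignored by the encoders.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "X"  # hypothetical padding symbol for windows that run past a sequence end


def candidate_windows(seq: str, half_width: int = 7,
                      residues: str = "STY") -> List[Tuple[int, str]]:
    """Return (0-based position, window) for every S/T/Y residue in seq.

    Each window has length 2 * half_width + 1 with the candidate residue at its center.
    """
    seq = seq.upper()
    padded = PAD * half_width + seq + PAD * half_width
    windows = []
    for i, aa in enumerate(seq):
        if aa in residues:
            # seq[i] sits at padded[i + half_width], so this slice is centered on it.
            windows.append((i, padded[i:i + 2 * half_width + 1]))
    return windows


def composition(window: str) -> List[float]:
    """Fraction of each standard amino acid in the window (padding ignored)."""
    real = [aa for aa in window if aa in AMINO_ACIDS]
    n = len(real) or 1  # avoid division by zero for an all-padding window
    return [real.count(aa) / n for aa in AMINO_ACIDS]


def one_hot(window: str) -> List[int]:
    """Position-specific binary encoding: 20 bits per position, all zero for padding."""
    bits: List[int] = []
    for aa in window:
        bits.extend(1 if aa == ref else 0 for ref in AMINO_ACIDS)
    return bits


def feature_vector(window: str) -> List[float]:
    """Concatenate the two sequence-only views into a single feature vector."""
    return composition(window) + [float(b) for b in one_hot(window)]


if __name__ == "__main__":
    for pos, win in candidate_windows("MKSPTRYAL", half_width=3):
        print(pos, win, len(feature_vector(win)))
```

Vectors like these could then be fed to a random forest classifier (for example scikit-learn's `RandomForestClassifier`) trained on labeled phosphorylation and non-phosphorylation sites; none of the steps require homology searches or disorder prediction, which is the practical advantage the abstract emphasizes.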