Automated Literature Mining for Validation of High-Throughput Function Prediction


Funded by the National Institutes of Health, National Library of Medicine, under ARRA award R01-LM010120, 2009-2011

Project summary: We will develop methods for automatically mining the full-text literature to validate computational predictions of functional sites in proteins. Our overall approach integrates predictions of protein functional sites, derived from structural modeling, with information extraction from text. This integration will make it possible to identify statements in the literature that support or refute a given prediction.
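To make the text side of this integration concrete, here is a minimal sketch, assuming that a prediction is a set of residues given as a three-letter amino acid code plus a sequence position, and that the first step is simply to find sentences mentioning those residues. The regular expression, the naive sentence splitter, and the names `residue_mentions` and `candidate_evidence` are hypothetical illustrations, not the project's software. The sketch also assumes that residue numbering in the paper matches the numbering used in the prediction, which is not guaranteed in practice.

```python
import re
from typing import Dict, List, Set, Tuple

Residue = Tuple[str, int]  # (three-letter amino acid code, sequence position)

# Hypothetical mention pattern: three-letter code followed by a position,
# optionally separated by a hyphen or space ("His57", "His-57", "His 57").
AA3 = "Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val"
RESIDUE_RE = re.compile(rf"\b({AA3})[- ]?(\d+)\b")
SENTENCE_SPLIT_RE = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter: breaks after ., ! or ? followed by whitespace."""
    return [s for s in SENTENCE_SPLIT_RE.split(text.strip()) if s]


def residue_mentions(sentence: str) -> Set[Residue]:
    """Return every (code, position) residue mention found in a sentence."""
    return {(aa, int(pos)) for aa, pos in RESIDUE_RE.findall(sentence)}


def candidate_evidence(predicted: Set[Residue], text: str) -> Dict[Residue, List[str]]:
    """Map each predicted functional-site residue to the sentences that mention it.

    A mention is only a candidate: deciding whether the sentence supports or
    refutes the prediction would require further text analysis.
    """
    evidence: Dict[Residue, List[str]] = {res: [] for res in predicted}
    for sentence in split_sentences(text):
        for res in residue_mentions(sentence) & predicted:
            evidence[res].append(sentence)
    return evidence


if __name__ == "__main__":
    # Illustrative inputs only; not project data.
    predicted_sites = {("His", 57), ("Ser", 195), ("Asp", 102)}
    article_text = (
        "Mutation of His57 abolished catalytic activity. "
        "Ser195 acts as the nucleophile in the catalytic triad. "
        "Residue Gly193 contributes to the oxyanion hole."
    )
    for residue, sentences in sorted(candidate_evidence(predicted_sites, article_text).items()):
        print(residue, sentences)
```

Here a predicted residue with no matching sentence (Asp102 in the toy input) simply maps to an empty list. A real system would still need to classify each retrieved sentence as supporting, refuting, or neutral.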

This project is a collaboration with Michael Wall and Judith Cohn at Los Alamos National Laboratory. Structural predictions are made with the Dynamic Perturbation Analysis (DPA) approach, which analyzes protein dynamics to predict protein functional sites (Ming, Cohn & Wall, BMC Struct Biol 2008; Ming & Wall, JMB 2006).

Please check back regularly for updates on data and software releases.

Relevant publications:

K. Verspoor, C. Roeder, H. Johnson, K. B. Cohen, W. A. Baumgartner, L. Hunter. (2010). Exploring species-based strategies for gene normalization. IEEE/ACM Transactions on Computational Biology and Bioinformatics.

C. Ramakrishnan, W. A. Baumgartner Jr., J. A. Blake, G. A. P. C. Burns, K. B. Cohen, H. Drabkin, J. Eppig, E. Hovy, C. Hsu, L. E. Hunter, T. Ingulfsen, H. Onda, S. Pokkunuri, E. Riloff, C. Roeder and K. Verspoor. (2010). Building the Scientific Knowledge Mine (SciKnowMine): a community-driven framework for text mining tools in direct service to biocuration. In Proceedings of the NLP Frameworks Workshop at the Language Resources and Evaluation Conference, pp. 9-14.

K. Verspoor and C. Mejia-Muñoz. (2010). Text Mining for Protein Function Prediction: Detection of Active Residues in Full-text Publications. Poster presentation at the Intelligent Systems for Molecular Biology Conference.