Corpora for biomedical natural language processing
A project of the Biomedical Text Mining Group
at the Center for Computational Pharmacology
Lab: RC-1 S. Room L18-6400A
Phone: 303-916-2417


Empirical data on corpus design and usage in biomedical natural language processing

This page provides a link to, and supplemental material for, our 2005 AMIA paper. The paper presents data on the usage of various biomedical corpora. It discusses the formats of a couple of corpora that have rarely been used outside the labs that built them, and it suggests that the format in which annotations are recorded and distributed is an important factor in determining whether a corpus will be widely adopted. At the suggestion of a reviewer, we also conducted a survey designed to elicit user preferences regarding various aspects of corpus design and content. To see the survey and its results, follow this link.

This document last modified 08/09/10 13:08.