Projects in Bioinformatics: Natural Language Processing
BIOI 7791 (1-3 credits)
Fitzsimons RC-1 North Tower, Room P18-1402
Tuesdays and Thursdays, 9:30-10:50 am
Office hours by appointment
Instructor: K. B. Cohen
Office: RC-1 S. Room L18-6400A
Phone: 303-916-2417


I taught this course in the spring of 2005. Students in the course did a joint project on ontology alignment and wrote a paper that has since been published as Johnson et al. (2005), "Evaluation of lexical methods for detecting relationships between concepts from multiple ontologies."

Course description

Students in this course will participate in one project in biomedical language processing (BLP). This participation will usually take the form of significant programming work, and the project will be part of a competitive evaluation of language processing in the biomedical domain. Participation as a team is strongly encouraged. Each week, students will read and discuss selections from the BLP literature. Over the course of the quarter, each student will be responsible for presenting one paper and for establishing the context for one paper.

The main project this quarter will be the TREC Genomics track. The goal is for participants in the course to build a system that carries out at least one of this year's tasks. Participation as a team, and even as one giant team, is not just permitted but encouraged.

Other projects will be considered as well. In general, a project should be substantial enough that, when completed, a paper based on the work could be submitted to PSB, ACL, or ISMB. Any proposal for another project should make use of data that is already publicly available.

Goals for the course: Students will gain practical experience with natural language processing tools and techniques.

Course Requirements:

  • Prerequisites:

    • BIOI 7713
    • High level of comfort with computers and the internet

  • Textbooks:

    Natural Language Processing for Online Applications: Text Retrieval, Extraction, and Categorization, by Peter Jackson and Isabelle Moulinier.

  • Online readings which will be made available before each lecture on this site.
  • Access to a computer with an internet connection. Some readings can be accessed only from a UCHSC IP address; to reach them from off campus, visit UCHSC Information Services to get a remote access account.

Supplementary texts:


Each of the following topics will be covered in a class lecture. This list is tentative until the lecture is posted; I may rearrange topics depending on recent results, class interests, etc. Lecture notes, readings, and links to external web sites will be provided before each lecture. NB: Some PDF versions of the lecture notes differ slightly from the PPT originals; when there is a conflict, use the PPT.
  1. Tokenization (PDF lecture slides) (PPT lecture slides) (readings) (links)
  2. Shallow parsing and part-of-speech tagging (PPT lecture slides) (readings) (links)
  3. Information extraction I (PPT lecture slides) (readings) (links)
  4. Information extraction II
  5. Information extraction III (PPT lecture slides)
  6. Anaphora and coreference I: resolution in text (PPT lecture slides) (readings) (links)
  7. Anaphora and coreference II: grounding in knowledge models (PPT lecture slides) (readings) (links)


    Paper notes: Every week we will read at least one paper. For each paper, you will write up and turn in either a one-page set of notes, in a format that I will explain in class, or a completed review form modeled on the one used by some representative conference. I will distribute the conference review forms as well, and I will announce each week which format to use. By the end of the quarter, you will have the beginnings of a well-annotated collection of the literature on biomedical language processing.

    Project participation: Students taking the course for two or more hours of credit are expected to do roughly four hours or more of programming or other project-related work per week. This work will be graded on timeliness, documentation, and testing. Students taking the course for one hour of credit may choose between programming work and paper presentations.

    Honor Code

    The Graduate School requires that this honor code be included in all course syllabi.

    Education at the Health Sciences Center is conducted under the honor system. All students who have entered health professional programs should have developed the qualities of honesty and integrity, and each student should apply these principles to his or her academic and subsequent professional career. All students are also expected to have achieved a level of maturity, which is reflected by appropriate conduct at all times.

This document last modified 08/09/10 13:08.