Announcement of Data Release and Call for Participation

2011 i2b2/VA/Cincinnati Shared-Tasks and Workshop
i2b2/VA Track on Challenges in Natural Language Processing for Clinical Data

Tentative Timeline
Training Data Release: 1 June, 2011
Test Data Release: 1 August, 2011
System Outputs Due: 3 August, 2011
Paper Submission: 1 September, 2011
Workshop: October, 2011 in Washington, DC

The fifth i2b2/VA challenge is on coreference resolution. The data for this track are provided by Partners HealthCare, Beth Israel Deaconess Medical Center (MIMIC II Database), University of Pittsburgh, and the Mayo Clinic. All records have been fully de-identified and manually annotated for coreference. Part of the data was produced under the ODIE grant (R01 CA127979, NCI/NCBI, PI Crowley) and contributed to the i2b2/VA 2011 challenge under SHARP 4 (U01 SHARP 4, ONC, PI Chute).

Challenge registration opens on March 14, 2011. Training data will be released starting June 1, 2011, and test data are scheduled for release in August 2011. The results of the challenge will be presented at the workshop organized by i2b2, VA, and the University of Cincinnati.

Data for the 2011 i2b2/VA track will be released under a Data Use Agreement and are to be used for research only. Obtaining the data requires completing a registration and signing the Data Use Agreement. Unlike in previous years, the Data Use Agreements must be submitted to i2b2/VA at least one month before the data release. Timely submission is essential so that the data donating institutions can complete their paperwork before the data are released to individual teams. At release time, data will be made available only to teams whose paperwork has been approved by the data donating institutions; other teams will gain access once their paperwork is completed.

To allow teams to begin preparing for the challenge, we have posted sample input and output files. The input for the coreference task consists of .txt and .con files. Although the output format for the coreference task is not yet final, the .pair and .chain files can serve as a starting point: .pair files list antecedent and anaphor pairs, while .chain files list complete coreference chains. Sample files for the task are available in the downloads section.
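Because a .chain file groups together what a .pair file lists link by link, a set of chains can be recovered from pairs by taking the transitive closure of the links. The following is a minimal sketch using union-find, assuming each mention is represented by a hashable identifier such as a string; the function name pairs_to_chains is hypothetical, and the actual file layouts are those given in the organizers' sample files.

```python
from collections import defaultdict


def pairs_to_chains(pairs):
    """Group (antecedent, anaphor) pairs into coreference chains.

    Hypothetical helper: mentions are plain hashable ids (e.g. strings),
    not the organizers' actual .pair/.chain line format.
    """
    parent = {}

    def find(m):
        # Union-find lookup with path halving; unseen mentions become roots.
        parent.setdefault(m, m)
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Every antecedent/anaphor link merges the two mentions' chains.
    for antecedent, anaphor in pairs:
        union(antecedent, anaphor)

    # Collect mentions by root; sort for a deterministic result.
    groups = defaultdict(set)
    for m in list(parent):
        groups[find(m)].add(m)
    return sorted(sorted(g) for g in groups.values())


pairs = [("the patient", "he"), ("he", "his"), ("pneumonia", "it")]
print(pairs_to_chains(pairs))
```

Union-find is used here because links can arrive in any order and the closure must still merge, for example, "the patient" and "his" through the shared mention "he".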

Evaluation Dates, File Formats, and Evaluation Metrics

The i2b2/VA track evaluation will be conducted using withheld test data. Participating teams are asked to stop development as soon as they download the test data. Each team is allowed to upload (through this website) up to three system runs for each of the tiers of the challenge. System output is expected in the form of standoff annotations, following the exact format of the ground truth annotations provided by the organizers. MUC, B-Cubed, and CEAF will be used as evaluation metrics.
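The official scorer, including how it treats mentions that appear in only the gold or only the system output, is defined by the organizers. The sketch below is only an illustration of how the three named metrics compare system chains against gold chains, assuming each chain is a list of hashable mention ids. The function names muc, b_cubed and ceaf_e are hypothetical, and CEAF is shown in its entity-based variant with the phi4 similarity, which is an assumption about which CEAF variant is used.

```python
from itertools import permutations


def _f1(recall, precision):
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def muc(key, response):
    """Link-based MUC score over chains of hashable mention ids."""
    def score(gold, pred):
        num = den = 0
        for chain in gold:
            # Partitions of `chain` induced by `pred`; mentions missing
            # from `pred` each form their own singleton partition.
            parts = set()
            for m in chain:
                owner = next((i for i, c in enumerate(pred) if m in c), None)
                parts.add(owner if owner is not None else ("singleton", m))
            num += len(chain) - len(parts)
            den += len(chain) - 1
        return num / den if den else 0.0

    r, p = score(key, response), score(response, key)
    return r, p, _f1(r, p)


def b_cubed(key, response):
    """Mention-based B-Cubed score: average per-mention chain overlap."""
    def score(gold, pred):
        total, count = 0.0, 0
        for chain in gold:
            for m in chain:
                # A mention absent from `pred` is treated as a singleton.
                other = next((c for c in pred if m in c), [m])
                total += len(set(chain) & set(other)) / len(chain)
                count += 1
        return total / count if count else 0.0

    r, p = score(key, response), score(response, key)
    return r, p, _f1(r, p)


def ceaf_e(key, response):
    """Entity-based CEAF (phi4) with a brute-force one-to-one alignment.

    Exhaustive search only suits a handful of chains; practical scorers
    solve the same assignment problem with the Hungarian algorithm.
    """
    def phi4(a, b):
        return 2 * len(set(a) & set(b)) / (len(a) + len(b))

    small, large = (key, response) if len(key) <= len(response) else (response, key)
    best = max(
        sum(phi4(s, large[j]) for s, j in zip(small, perm))
        for perm in permutations(range(len(large)), len(small))
    )
    r = best / len(key) if key else 0.0
    p = best / len(response) if response else 0.0
    return r, p, _f1(r, p)


key = [["a", "b", "c"], ["d", "e"]]
response = [["a", "b"], ["c", "d", "e"]]
for name, metric in [("MUC", muc), ("B-Cubed", b_cubed), ("CEAF-e", ceaf_e)]:
    print(name, "R=%.3f P=%.3f F=%.3f" % metric(key, response))
# MUC R=0.667 P=0.667 F=0.667
# B-Cubed R=0.733 P=0.733 F=0.733
# CEAF-e R=0.800 P=0.800 F=0.800
```

The three metrics disagree on the same output because they count different things: MUC counts links, B-Cubed averages over mentions, and CEAF scores the best one-to-one alignment of entities.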

Participants are asked to submit a short paper describing their system and analyzing their performance. Papers should be in AMIA style and should not exceed five pages. Authors of top performing systems and of particularly novel approaches will be invited to present or demo their systems at the workshop.

Tentative Schedule
March 14, 2011 Registration Opens
June 1, 2011 Training Data Release
August 1, 2011 Test Data Release
August 3, 2011 System Outputs on Test Data Due at 11:59pm Eastern Time
September 1, 2011 Short Papers Due
September 21, 2011 Invitations to Present at the Workshop
October, 2011 Workshop

Organizing Committee:
Ozlem Uzuner, co-chair, SUNY at Albany
Brett R South, co-chair, VA Salt Lake City Health Care System, University of Utah
Wendy Chapman, University of California San Diego
Shuying Shen, VA Salt Lake City Health Care System
Guergana Savova, Children's Hospital Boston
Jiaping Zheng, Children's Hospital Boston
Lynette Hirschman, MITRE
Cheryl Clark, MITRE
John Aberdeen, MITRE
Martha Palmer, University of Colorado Boulder
Arrick Lanfranchi, University of Colorado Boulder

Please see the FAQs and announcements for more information. Questions about the challenge can be addressed to Ozlem Uzuner, n2c2.challenges@gmail.com.