The majority of these Clinical Natural Language Processing (NLP) data sets were originally created at a former NIH-funded National Center for Biomedical Computing (NCBC) known as i2b2: Informatics for Integrating Biology and the Bedside. Links to key citations are provided below.
Based at Partners HealthCare System in Boston from 2004 to 2014, under the leadership of Principal Investigator Isaac Kohane, MD, PhD, and Executive Director Susanne Churchill, PhD, the i2b2 Center was a passionate advocate for the potential of existing clinical records to yield insights that directly improve healthcare. Recognizing the value locked in unstructured text, i2b2 provided sets of fully deidentified notes from the Research Patient Data Registry at Partners for a series of NLP Shared Task challenges and workshops, which were designed and led by Co-Investigator Özlem Uzuner, MEng, PhD, originally at MIT CSAIL and subsequently at SUNY Albany. Those notes were then made available to the community for general research purposes and have since supported hundreds of journal and conference articles.
These data sets remain under the stewardship of the Department of Biomedical Informatics at Harvard Medical School, where Drs. Kohane and Churchill are Chair and Executive Director, respectively.
The NLP Shared Task challenges and workshops continue to be directed by Dr. Uzuner, now Department Chair and Associate Professor of Information Sciences and Technology in the Volgenau School of Engineering at George Mason University. Since 2018, they have been officially known as n2c2 (National NLP Clinical Challenges), a name that pays tribute to their i2b2 origins.
The software development component of the former i2b2 Center is now under the direction of the i2b2 tranSMART Foundation, a member-driven non-profit foundation developing an open-source / open-data community around the i2b2, tranSMART and OpenBEL translational research platforms.