The research at the Children's Hospital Informatics Program spans a wide range of problems in bioinformatics and clinical informatics. Our goal is to make significant contributions to biomedical research and patient care by understanding and utilizing various types of genomic and proteomic data and by developing innovative hardware and software technologies.

Instrumenting the Healthcare Enterprise for Discovery Research

Since its inception in 2005, i2b2 has been designed to provide the instrumentation for using the informational byproducts of health care, and the biological materials accumulated through its delivery, to conduct discovery research and to study the healthcare system in vivo, as a complement to prospective cohort studies and trials. The utility of this approach is demonstrated by the grass-roots adoption of i2b2 by over 84 academic health centers (AHCs) internationally, each implementation of which is a major, local institutional commitment.

SMART Platforms -- the "App Store" for health

Substitutable Medical Apps, Reusable Technologies. A platform with substitutable apps constructed around core services is a promising approach to driving down healthcare costs, supporting standards evolution, accommodating differences in care workflow, fostering competition in the market, and accelerating innovation.


Automated Epidemiologic Geotemporal Integrated Surveillance System (AEGIS)

The AEGIS System performs automated, real-time surveillance for bioterrorism and naturally occurring outbreaks. It is the syndromic surveillance system for the Massachusetts Department of Public Health, enabling real-time population health monitoring.

Gene Partnership

The Gene Partnership (TGP) is the only research knowledge base with an “informed cohort” of research subjects who can actively participate in discovering solutions to disease. By allowing participants to be partners in their own research, we can capture more data and create more powerful studies.

Growth Calculator

Growth Calculator is an online anthropometric calculator, helpful for calculating a variety of standard deviation scores and velocities, as well as for predicted height calculations using heights and bone age values.
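As a rough illustration of the two kinds of quantities mentioned above, the sketch below computes a standard deviation score as (measurement - reference mean) / reference SD, and a velocity as the change per year between two measurements. The function names and the reference values are hypothetical and are not taken from Growth Calculator, which may rely on different reference data and methods.

```python
def sd_score(measurement, ref_mean, ref_sd):
    """Return how many reference SDs a measurement lies from the reference mean."""
    if ref_sd <= 0:
        raise ValueError("ref_sd must be positive")
    return (measurement - ref_mean) / ref_sd


def velocity(first_value, second_value, years_between):
    """Return the average change per year between two measurements."""
    if years_between <= 0:
        raise ValueError("years_between must be positive")
    return (second_value - first_value) / years_between


# Hypothetical values for illustration only (not clinical reference data).
example_sds = sd_score(measurement=130.0, ref_mean=125.0, ref_sd=5.0)
example_velocity = velocity(first_value=120.0, second_value=126.0, years_between=1.0)
```

A positive score means the measurement is above the reference mean for the chosen reference group, and a negative score means it is below.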

HealthMap

HealthMap brings together disparate data sources to achieve a unified and comprehensive view of the current global state of infectious diseases and their effect on human and animal health. This freely available Web site integrates outbreak data of varying reliability, ranging from news sources (such as Google News) to curated personal accounts (such as ProMED) to validated official alerts (such as those of the World Health Organization). Through an automated text processing system, the data is aggregated by disease and displayed by location, giving users easy access to the original alert. HealthMap provides a jumping-off point for real-time information on emerging infectious diseases and is of particular interest to public health officials and international travelers.
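The aggregation step described above can be pictured with a minimal sketch that groups alert records by disease and location while keeping a link back to each original alert. The record fields (`disease`, `location`, `url`), the function name, and the sample records are assumptions made for illustration; they do not describe HealthMap's actual data model or text processing.

```python
from collections import defaultdict


def group_alerts(alerts):
    """Group alert records by (disease, location), keeping links to the originals."""
    grouped = defaultdict(list)
    for alert in alerts:
        key = (alert["disease"], alert["location"])
        grouped[key].append(alert["url"])
    return dict(grouped)


# Hypothetical records; field names and URLs are placeholders.
sample_alerts = [
    {"disease": "measles", "location": "Boston", "url": "https://example.org/alert/1"},
    {"disease": "cholera", "location": "Dhaka", "url": "https://example.org/alert/2"},
    {"disease": "measles", "location": "Boston", "url": "https://example.org/alert/3"},
]
alerts_by_disease_and_place = group_alerts(sample_alerts)
```

Each key in the result corresponds to one disease at one location, and its value lists the original alerts a user could follow.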

Health Information Technology for Health Care Transitions

This initiative is focused on advancing the care of chronically ill youth using patient- and family-centered health information technologies. We are conducting research to refine the personally controlled health record (PCHR) approach to capture information about self-care, social roles, transition readiness, health risk behaviors, and psychosocial problems. Our goal is to improve health and the quality and safety of care systems as youth move from pediatric to adult internal medicine. In the CHB Diabetes Program, we are developing longitudinal systems for monitoring the health of diabetes-affected adolescents over transitional periods. We are also evaluating youth engagement with the PCHR platform and with enriched PCHR-supported care systems that involve case management and social network-based surveillance and reporting. This project is supported by grants from the Program on Patient Safety and Quality at Boston Children's Hospital and by the Harvard Clinical and Translational Sciences Center (CTSC).


Indivo

Indivo is the original personally controlled health record (PCHR) system. A PCHR enables an individual to own and manage a complete, secure, digital copy of her health and wellness information. Indivo integrates health information across sites of care and over time.


Macrobiology

There is an enormous trove of prior knowledge gleaned in the biological sciences. Macrobiology leverages this prior knowledge in the interpretation of whole physiologies (and pathologies) using empirical grounding offered by high-throughput, comprehensive measurements.

Self-Scaling Registries

The Self-Scaling Registry project is an open-source software platform that aims to simplify multi-institutional patient registry collaboration. Built upon the existing, successful open-source projects i2b2 and SHRINE, the Self-Scaling Registry project empowers researchers to form their own data sharing networks, manage data use, and build on top of existing datasets. Our initial deployment of this platform supports the CARRA network, a group of pediatric rheumatologists from 60 medical institutions, in forming a patient registry that will be the basis for future research, comparative effectiveness studies, and post-marketing surveillance.


TuAnalyze

TuAnalyze supports consented collection, sharing, and display of biomedical and behavioral diabetes data. The application stores user-entered data in an Indivo PCHR, allowing for strict user control of data sharing and access. TuAnalyze provides a biosurveillance-derived display of live, aggregate, geo-referenced data back to the community for benchmarking and to encourage ongoing engagement.

Multi-source Integrated Platform for Answering Clinical Questions (MiPACQ)

Clinical question answering (cQA) systems focus on the needs of physicians, usually at the point of care, or of investigators in the lab. The questions asked usually require one of two kinds of information: information highly specific to a patient (e.g., the patient’s lab results or previous history), which is answered from the patient’s health record, or more general information, which is answered from generally available information sources.

Pharmaco-Genomics Research Network (PGRN)

We are developing a rheumatoid arthritis (RA) disease activity level classifier that works directly on clinical notes from Electronic Health Records, using chart review and Natural Language Processing techniques.

Shared Annotation Resources (ShARe)

We are developing standards and infrastructure that enable technology to extract scientific information from textual medical records. We are annotating a 500K-word clinical narrative corpus for syntactic information following the Penn Treebank guidelines and for semantic information following the UMLS definitions. The corpus will be made available to the research community in 2014.

Strategic Health IT Advanced Research Projects (SHARP), SHARPn

We are creating several open-source NLP modules for semantic analysis of clinical narratives, including modules for coreference, relation extraction, and the predicate-argument structure of the sentence. We are also involved in several annotation tasks that aim to create a richly annotated corpus of clinical texts. This corpus will include multiple layers of syntactic and semantic annotation, such as treebank, propbank, and UMLS annotations. In addition, we are involved in projects that focus on using active learning to reduce the cost of annotation.

Temporal History of Your Medical Events (THYME)

Temporal relations are of prime importance in biomedicine, as they are intrinsically linked to diseases, signs and symptoms, and treatments. Understanding the timeline of clinically relevant events is key to the next generation of translational research, where generalizing over large amounts of data holds the promise of deciphering biomedical puzzles. The goal of our current proposal is to automatically discover temporal relations in clinical free text and create a timeline from them.
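To make the idea of turning temporal relations into a timeline concrete, the sketch below orders events from a list of pairwise "before" relations using a topological sort from Python's standard library (`graphlib`, Python 3.9 or later). It assumes the relations have already been extracted from text. The function name and the example events are hypothetical, and this is not a description of the method THYME uses.

```python
from graphlib import TopologicalSorter


def build_timeline(before_relations):
    """Order events so that every (earlier, later) pair is respected."""
    predecessors = {}
    for earlier, later in before_relations:
        predecessors.setdefault(earlier, set())
        predecessors.setdefault(later, set()).add(earlier)
    # static_order() raises graphlib.CycleError if the relations contradict each other.
    return list(TopologicalSorter(predecessors).static_order())


# Hypothetical relations, each read as "first event happened before second event".
relations = [("fever onset", "admission"), ("admission", "surgery")]
timeline = build_timeline(relations)
```

When the relations leave some events unordered relative to each other, more than one valid timeline exists, and the sort returns just one of them.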