Artificial intelligence and machine learning are reshaping healthcare delivery and discovery. Researchers at CHIP have been engaged in AI from its early stages, pioneering methods and systems across a wide range of areas, from predictive medicine and digital disease detection to advanced image processing. We have also led innovations in natural language processing, using computerized systems to read, understand, and extract meaning from human language datasets. Building on this foundation, CHIP is actively developing methods and integrating large language models across clinical care, research, and public health, advancing how information is processed and applied in these fields.
Working Groups
CHIP, in partnership with other programs, led the launch of the Boston Children’s Hospital Artificial Intelligence and Machine Learning Working Group, which gives Boston Children’s clinicians and investigators a forum for sharing knowledge and collaborating across the many facets of artificial intelligence and machine learning.
Projects
CHIP is leading a team to instrument multiple care delivery systems at scale to provide rapid and accurate detection of adverse drug events in electronic health record data, using AI/NLP and the newly regulated "bulk FHIR" APIs. This is part of the ARPA-H "BDF Toolbox" effort, which aims to improve patients’ health outcomes by democratizing access to biomedical data and creating an ecosystem of open-source tools.
Cumulus is a federally funded platform, supported by ONC/ASTP, that enables secure, interoperable data federation across all sites of care through cloud-hosted EHR “side cars” and regulated, open standards-based APIs. Under the 21st Century Cures Act, all sites of care must support the SMART Bulk FHIR Access API, which Cumulus utilizes to seamlessly acquire both structured and unstructured clinical data at scale. Implemented with health systems and public health departments as part of the CDC Data Modernization Initiative, Cumulus integrates with a high-throughput large language model (LLM) pipeline to deliver comprehensive analytics and insights. In addition, it now connects the healthcare delivery system to the research enterprise via the ARPA-H Biomedical Data Fabric, fostering innovation, improving patient outcomes, and accelerating data-driven healthcare discovery.
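As a rough illustration of the Bulk FHIR access pattern mentioned above, the sketch below builds the "kick-off" request defined by the FHIR Bulk Data Access specification: an asynchronous GET against a `$export` endpoint. It is not Cumulus code. The function name, the example server URL, and the group ID are hypothetical, and a real client would also need SMART Backend Services authorization plus polling of the status URL the server returns.

```python
from urllib.parse import urlencode


def build_bulk_export_request(base_url, group_id=None, resource_types=None, since=None):
    """Return (url, headers) for a Bulk FHIR export kick-off request.

    Hypothetical helper for illustration only; it builds the request
    but does not send it.
    """
    base = base_url.rstrip("/")
    # Group-level export covers a defined cohort; system-level export covers everything.
    if group_id:
        path = f"{base}/Group/{group_id}/$export"
    else:
        path = f"{base}/$export"

    params = {}
    if resource_types:
        # _type restricts the export to the listed FHIR resource types.
        params["_type"] = ",".join(resource_types)
    if since:
        # _since limits the export to resources updated after this instant.
        params["_since"] = since

    url = f"{path}?{urlencode(params)}" if params else path
    headers = {
        "Accept": "application/fhir+json",
        # The specification requires the kick-off request to be asynchronous.
        "Prefer": "respond-async",
    }
    return url, headers


if __name__ == "__main__":
    # Example values are placeholders, not a real endpoint or cohort.
    url, headers = build_bulk_export_request(
        "https://ehr.example.org/fhir",
        group_id="cohort-1",
        resource_types=["Patient", "DocumentReference"],
    )
    print(url)
    print(headers)
```

Requesting `DocumentReference` alongside structured resources is one way a client could reach unstructured clinical notes, which is the kind of input an NLP or LLM pipeline would consume.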
The Intelligent Histories project develops new ways of using commonly available electronic medical information to predict people's future medical risks, helping doctors choose preventive interventions and improve medical care.
Suicide is one of the ten leading causes of death in the United States. Although most individuals who die by suicide have contact with a healthcare professional in the month before their death, suicide risk is rarely detected in such cases. CHIP researchers have developed advanced predictive models able to identify between one third and one half of all suicide attempts, on average three years before they occur, enabling life-saving interventions and care.
Psychosis often first appears during adolescence or young adulthood and is difficult to detect. If left undetected and untreated, psychosis can quickly progress to even more severe mental illness. CHIP researchers are developing advanced predictive models to identify cases of first-episode psychosis years before they would otherwise be detected by the health system.
Family histories are an essential predictor of disease risk, yet they are often incomplete, inaccurate, and underutilized in today's clinical settings. CHIP researchers are developing improved approaches to providing more complete, accurate and detailed family histories based on electronic health records of patients and their consenting family members. These improved histories enable better clinical risk prediction and decision-making.
CHIP researchers have developed novel network-based models to predict unknown adverse drug events and drug-drug interactions. Instead of waiting for sufficient post-marketing evidence to accumulate, this predictive approach can identify drug safety issues years in advance.
CHIP researchers have developed new ways of using commonly available electronic medical information to predict people's future medical risks, helping doctors choose preventive interventions and improve medical care.
We apply advanced modeling techniques to novel data sources in order to predict and detect outbreaks and other public health trends, especially during times of great uncertainty such as epidemics or large public events.
The Prediction of Patient Placement (POPP) system aims to improve the flow of patients through the Emergency Department (ED) and hospital by providing decision makers with real-time predictions of future patient disposition. POPP brings predictive analytics to the point of care. We apply computer models to live data extracted from electronic health records to forecast not only what patients currently need, but also what they will need in the near future, facilitating smarter and more efficient use of resources. Building on these predictive models, we developed a dashboard. POPP was awarded a 2019 Microsoft for Healthcare Innovation Award in the category Optimize Clinical Operational Effectiveness & Improve Outcomes.
Apache clinical Text Analysis and Knowledge Extraction System (cTAKES) is a widely used, free, open-source tool for clinical natural language processing (NLP). Unlike general-purpose NLP tools, cTAKES is specialized for clinical texts: it incorporates Unified Medical Language System (UMLS) resources for finding medical concepts and is packaged with machine learning models trained on gold-standard clinical texts. Its NLP applications extend beyond clinical care. In 2013, Apache cTAKES became the first and only top-level Apache Software Foundation project in biomedical informatics. In 2019, Apache cTAKES was named one of the 20 most influential Apache projects.
CHIP researchers develop novel information extraction methods for automatic, unsupervised, or minimally supervised extraction of specific, discrete cancer-related data from various types of unstructured electronic medical records. Our two main use cases are cancer deep phenotyping for translational science (DeepPhe) and a platform for cancer surveillance by cancer registries (DeepPhe*CR).
Temporal Histories of Your Medical Events (THYME) uses temporal relations in processing free text. Understanding the timeline of clinically relevant events is key to the next generation of translational research, where generalizing over large amounts of data holds the promise of deciphering biomedical puzzles.
The Health Natural Language Processing (hNLP) Center targets a key challenge to current hNLP research and health-related human language technology development: the lack of health-related language data. The Center’s primary activities are to: provide a repository and a data curation, distribution and management point for health-related language resources, support sponsored research programs and health-related language-based technology evaluations, and engage in collaborations with US and foreign researchers, institutions and data centers.
Our goals are to apply the best-performing NLP methods to impactful biomedical use cases in order to advance the science of biomedicine and clinical care. These use cases include pediatric pulmonary hypertension, rheumatoid arthritis, inflammatory bowel disease, artery aneurysms, early childhood obesity, autism spectrum disorder, polycystic ovary syndrome, and methotrexate-induced liver toxicity.
The Fairness in Machine Learning project addresses the challenge of building models that are both accurate and fair by developing flexible search methods that explicitly optimize multiple criteria, in this case the trade-off between model accuracy and fairness. Our faculty are interested in understanding the downstream impacts on healthcare that will arise as more and more models are deployed in the health system. Providing a set of models that vary in fairness and accuracy is one way to help a decision maker understand how an algorithm will affect the people it interacts with once it is deployed.
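One simple way to picture "a set of models varying in fairness and accuracy" is a Pareto front: the candidates for which no other candidate is at least as good on both criteria and strictly better on one. The sketch below is a generic illustration of that idea, not the project's actual search method. The `Candidate` type, the `unfairness` score (imagined here as something like a gap in error rates between patient groups), and all numbers are invented for the example.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    name: str
    accuracy: float    # higher is better
    unfairness: float  # lower is better (hypothetical metric)


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both criteria and strictly better on one."""
    no_worse = a.accuracy >= b.accuracy and a.unfairness <= b.unfairness
    strictly_better = a.accuracy > b.accuracy or a.unfairness < b.unfairness
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


if __name__ == "__main__":
    # Made-up scores for four hypothetical models.
    models = [
        Candidate("A", accuracy=0.90, unfairness=0.20),
        Candidate("B", accuracy=0.85, unfairness=0.10),
        Candidate("C", accuracy=0.80, unfairness=0.15),  # dominated by B
        Candidate("D", accuracy=0.70, unfairness=0.02),
    ]
    for m in pareto_front(models):
        print(m)
```

Presenting only the non-dominated models leaves the choice among genuine trade-offs to the decision maker, rather than collapsing accuracy and fairness into a single score in advance.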
Intelligible Predictive Health Models seeks to analyze and improve the intelligibility and/or explainability of Machine Learning (ML) methods deployed in the health system. We empirically analyze how intelligible current ML tools are and develop novel approaches to interpretable ML.