Some data in the electronic health record (EHR) and elsewhere exist only within free text. For example, PET-CT scan reports can contain critical data about the progression of a cancer patient's disease; however, these quantitative data are not available in the EHR in structured form.
To address this, Research Informatics has established a natural language processing (NLP) program that applies computational techniques to clinical free text to derive structured data usable by clinical researchers.
Examples of NLP pipelines currently in production at Weill Cornell Medicine (WCM) include:
- Surgical pathology – TNM staging data, Gleason scores, and ICD-9/10 codes from surgical pathology reports (Read more: https://pubmed.ncbi.nlm.nih.gov/34694896/)
- PHQ-9 – depression screening scores from progress notes (Read more: https://pubmed.ncbi.nlm.nih.gov/30815052/)
- LVEF – left ventricular ejection fraction data from free-text echocardiogram reports (Read more: https://pubmed.ncbi.nlm.nih.gov/29888051/)
- Bone marrow biopsy – blast counts, cellularity, and fibrosis from pathology reports
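To illustrate what "deriving structured data from free text" can look like, here is a minimal rule-based sketch that pulls an ejection fraction value out of echocardiogram report text. It is not the WCM LVEF pipeline (its methods are described in the linked paper); the pattern `LVEF_PATTERN` and the function `extract_lvef` are hypothetical, and the phrasings they cover are assumptions about typical report wording.

```python
import re
from typing import Optional

# Hypothetical pattern (illustration only): matches phrases such as "LVEF 55%",
# "ejection fraction of 60-65%", or "EF is estimated at 35 %".
# Real report wording varies far more than this.
LVEF_PATTERN = re.compile(
    r"\b(?:LVEF|LV\s+ejection\s+fraction|ejection\s+fraction|EF)\b"
    r"[^0-9%]{0,30}?"                              # short gap of filler words ("is", "of", "estimated at")
    r"(\d{1,2})(?:\s*(?:-|to)\s*(\d{1,2}))?\s*%",  # a value or a range, followed by a percent sign
    re.IGNORECASE,
)


def extract_lvef(report_text: str) -> Optional[float]:
    """Return the first ejection fraction found, using the midpoint for ranges like 60-65%."""
    match = LVEF_PATTERN.search(report_text)
    if match is None:
        return None  # no recognizable ejection fraction mention
    low = int(match.group(1))
    high = int(match.group(2)) if match.group(2) else low
    return (low + high) / 2


print(extract_lvef("The left ventricular ejection fraction is estimated at 60-65 %."))
```

Hand-written patterns like this are easy to audit but brittle; production pipelines typically need broader pattern sets or trained models, plus validation against manually reviewed reports, before the extracted values are used for research.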
To learn more about NLP at WCM, contact arch-support@med.cornell.edu.