Some data points in the electronic health record (EHR) and elsewhere are buried within free text. For example, PET-CT scans can contain critical data about the progression of a cancer patient’s disease, but this quantitative data is not available in the EHR in a structured form.
To address gaps like these, Research Informatics runs a natural language processing (NLP) program that applies computational techniques to clinical free text in order to derive structured data that clinical researchers can use.
Research Informatics can also support the use of large language models (LLMs) or other generative AI (GenAI) techniques to analyze free text and/or unstructured patient data. Please note that funding and/or IRB approval may be required. An environment where you can experiment with these tools is available on request.
To learn more about these tools, watch the Tech Tuesday: Vertex AI/LLM for Clinical Research video or contact arch-support@med.cornell.edu.
Examples of NLP pipelines in use at WCM today include:
- Surgical pathology – TNM staging data, Gleason scores, and ICD-9/10 codes from surgical pathology reports (Read more: https://pubmed.ncbi.nlm.nih.gov/34694896/)
- PHQ-9 – depression screening scores from progress notes (Read more: https://pubmed.ncbi.nlm.nih.gov/30815052/)
- LVEF – ejection fraction data from free text echocardiogram reports (Read more: https://pubmed.ncbi.nlm.nih.gov/29888051/)
- Bone marrow biopsy – blast counts, cellularity, and fibrosis from pathology reports
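To give a sense of what deriving structured data from free text can look like, here is a minimal Python sketch that pulls an ejection fraction value out of an echocardiogram report sentence. It is an illustration only and does not represent the pipelines listed above: the pattern `LVEF_PATTERN`, the function `extract_lvef`, and the sample sentences are all hypothetical, and real reports vary far more than a single regular expression can handle.

```python
import re
from typing import Optional

# Hypothetical pattern (an assumption for illustration, not a WCM pipeline).
# It matches phrases such as "LVEF 55%", "EF: 35 %", or
# "ejection fraction is estimated at 60-65%".
LVEF_PATTERN = re.compile(
    r"\b(?:LVEF|EF|ejection\s+fraction)\b"    # name of the measurement
    r"[^0-9%]{0,40}?"                           # short filler such as ": " or " is estimated at "
    r"(\d{1,2})(?:\s*(?:-|to)\s*(\d{1,2}))?"    # a single value or a low-high range
    r"\s*%",
    re.IGNORECASE,
)


def extract_lvef(report_text: str) -> Optional[float]:
    """Return the first ejection fraction found, as a percentage.

    A range such as "60-65%" is reduced to its midpoint. Returns None
    when no ejection fraction is mentioned in a recognized form.
    """
    match = LVEF_PATTERN.search(report_text)
    if match is None:
        return None
    low = float(match.group(1))
    high = float(match.group(2)) if match.group(2) else low
    return (low + high) / 2


if __name__ == "__main__":
    sample = "Left ventricular ejection fraction is estimated at 60-65%."
    print(extract_lvef(sample))
```

Reducing a range to its midpoint is a simplification chosen for this sketch; a study might instead need both bounds kept as separate fields, along with handling for negation, multiple measurements, and qualitative descriptions.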