An Unsupervised Approach for Identifying Patient Trial Eligibility
Tech Stack: Python, SQL, Pandas, Sentence-Transformers, PyTorch, SciSpaCy, Scikit-learn
GitHub: Link
- Developed a general, unsupervised framework that identifies patients eligible for a given clinical trial.
- Leveraged multiple string similarity functions, including deep learning models, to efficiently capture domain, contextual, and lexical similarity between patient notes and trial inclusion/exclusion criteria, even when the two texts differ greatly in length.
- Combined fast string similarity functions with accurate deep learning models to screen 100k patient records and return the top 100 trial candidates in only 15 minutes without a GPU.
- Implemented a visualization utility to quickly validate the output and help monitor the performance of the string similarity models.
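A minimal sketch of the fast-then-accurate screening idea described above, not the project's actual code. It assumes a two-stage design: a cheap lexical score ranks every note and keeps a shortlist, and only that shortlist is rescored by the expensive scorer. The names `screen`, `containment`, and `sequence_ratio`, the shortlist size, and the "inclusion minus exclusion" scoring are all hypothetical. `difflib.SequenceMatcher` is only a stand-in for the deep learning model (in practice this slot could hold something like a Sentence-Transformers embedding similarity). The lexical stage normalizes by the criteria's token count rather than the union, so long notes are not penalized merely for being long.

```python
import heapq
import re
from difflib import SequenceMatcher
from typing import Callable, Sequence


def tokenize(text: str) -> set[str]:
    """Lowercase alphanumeric tokens; a crude stand-in for a clinical tokenizer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def containment(criteria: str, note: str) -> float:
    """Fraction of criteria tokens that also appear in the note.

    Dividing by the (short) criteria size instead of the union keeps a long
    note from being penalized just because it contains many extra tokens.
    """
    c = tokenize(criteria)
    return len(c & tokenize(note)) / len(c) if c else 0.0


def sequence_ratio(criteria: str, note: str) -> float:
    """Hypothetical placeholder for an expensive model score (e.g. embedding cosine)."""
    return SequenceMatcher(None, criteria.lower(), note.lower()).ratio()


def screen(
    notes: Sequence[str],
    inclusion: str,
    exclusion: str,
    shortlist_size: int = 1000,
    top_n: int = 100,
    accurate_score: Callable[[str, str], float] = sequence_ratio,
) -> list[int]:
    """Return indices of the top_n notes, best first (assumed two-stage design)."""

    # Stage 1: cheap lexical score over every note; inclusion match minus
    # exclusion match (an assumed weighting, not taken from the project).
    def fast(i: int) -> float:
        return containment(inclusion, notes[i]) - containment(exclusion, notes[i])

    shortlist = heapq.nlargest(shortlist_size, range(len(notes)), key=fast)

    # Stage 2: the expensive scorer runs only on the shortlist.
    def slow(i: int) -> float:
        return accurate_score(inclusion, notes[i]) - accurate_score(exclusion, notes[i])

    return heapq.nlargest(top_n, shortlist, key=slow)


if __name__ == "__main__":
    notes = [
        "type 2 diabetes managed with metformin, normal cardiac function",
        "healthy adult, routine checkup",
        "type 2 diabetes, insulin dependent, prior heart failure admission",
    ]
    print(screen(notes, "type 2 diabetes metformin", "heart failure", shortlist_size=2, top_n=1))
```

The design choice is cost-driven: the lexical pass is cheap enough to run over every record on a CPU, so the slower model only sees `shortlist_size` notes instead of the full corpus. Note that a bag-of-tokens score cannot handle negation ("no history of heart failure"), which is one reason a more accurate second stage is worth its cost.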