An Unsupervised Approach for Identifying Patient Trial Eligibility


Tech Stack: Python, SQL, Pandas, Sentence-Transformers, PyTorch, SciSpaCy, Scikit-learn

GitHub: Link


  • Developed a general unsupervised framework for identifying eligible patients for a clinical trial.
  • Applied multiple string similarity functions, including deep learning models, to capture domain, contextual, and lexical similarity between patient notes and trial inclusion/exclusion criteria, even when the two texts differ greatly in length.
  • Paired fast string similarity functions with accurate deep learning models to screen 100k patient records and identify the top 100 trial candidates in only 15 minutes without a GPU.
  • Implemented a visualization utility to quickly validate the output and monitor the performance of each string similarity model.
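
One way a combination of fast and accurate similarity functions can be arranged is as a two-stage ranking: a cheap lexical score shortlists patients, and a slower scorer reranks only the shortlist. The sketch below is illustrative and is not taken from the project's code. All function names and parameters (`cheap_score`, `expensive_score`, `rank_patients`, `shortlist_size`, `top_k`) are hypothetical, and the stage-2 scorer uses the standard library's `difflib.SequenceMatcher` purely as a stand-in for a deep learning model such as a sentence-embedding encoder. Scoring the note sentence by sentence and measuring how much of the criterion is covered are two simple ways to cope with notes that are much longer than the criteria.

```python
import re
from difflib import SequenceMatcher
from heapq import nlargest


def tokens(text):
    """Lowercased alphanumeric tokens as a set, for the cheap lexical filter."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def cheap_score(note, criterion):
    """Stage 1 (hypothetical): fraction of criterion tokens present in the note.

    Coverage of the criterion, rather than symmetric overlap, keeps a long
    note from being penalized for its extra content.
    """
    crit = tokens(criterion)
    if not crit:
        return 0.0
    return len(crit & tokens(note)) / len(crit)


def expensive_score(note, criterion):
    """Stage 2 stand-in: best sentence-level match against the criterion.

    SequenceMatcher is only a placeholder for a slower, more accurate model.
    """
    sentences = [s.strip() for s in re.split(r"[.!?\n]+", note) if s.strip()]
    if not sentences:
        return 0.0
    target = criterion.lower()
    return max(SequenceMatcher(None, s.lower(), target).ratio() for s in sentences)


def rank_patients(notes, criterion, shortlist_size=1000, top_k=100,
                  rerank=expensive_score):
    """Return the top_k (patient_id, score) pairs, best first.

    notes maps patient_id -> free-text note. Only shortlist_size notes
    ever reach the expensive scorer.
    """
    shortlist = nlargest(
        shortlist_size, notes, key=lambda pid: cheap_score(notes[pid], criterion)
    )
    scored = [(pid, rerank(notes[pid], criterion)) for pid in shortlist]
    return nlargest(top_k, scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    demo_notes = {
        "p1": "Patient has type 2 diabetes. Takes metformin daily.",
        "p2": "Healthy adult. No chronic conditions reported.",
        "p3": "History of asthma. Uses an inhaler.",
    }
    print(rank_patients(demo_notes, "type 2 diabetes", shortlist_size=2, top_k=1))
```

Because `rerank` is a parameter, the placeholder scorer can be swapped for an embedding-based similarity without changing the shortlisting logic, and `shortlist_size` controls the trade-off between runtime and recall.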