The Challenge
Avalon manages over 60 policy documents that determine whether patients qualify for diagnostic tests. These documents are complex, frequently updated, and critical to prior authorization, a process that today is manual and time-consuming. Healthcare providers and Avalon staff must sift through extensive documentation to determine eligibility, creating inefficiencies and increasing the risk of delays or errors in patient care.
The Solution
To address this bottleneck, Avalon partnered with Tribe to develop a customized proof of concept (PoC) built on large language models (LLMs). The objective was to test whether generative AI could accurately and efficiently extract key information from policy documents and generate medically accurate prior authorization questions.
Using Claude 3 Opus, the Tribe AI team demonstrated that LLMs could parse complex healthcare policy language and output structured questions—helping Avalon validate patient eligibility criteria with minimal human intervention.
Key Features
The proof of concept enabled users to:
- Upload a policy document (PDF format)
- Select from multiple LLMs for processing
- Auto-extract a list of covered diagnostic tests
- Manually adjust test lists as needed
- Generate qualifying assessment questions for each test
- Incorporate feedback into a loop for refining question output
This flexible setup allowed both automation and control, giving Avalon stakeholders confidence in the system’s usability and extensibility.
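The feedback loop mentioned above, where reviewer input refines the generated questions, could look roughly like the following sketch. The function names (`refine_questions`, `get_feedback`, `revise`), the bounded-rounds protocol, and the round limit are all illustrative assumptions, not Avalon's actual implementation.

```python
def refine_questions(questions, get_feedback, revise, max_rounds=3):
    # Repeatedly collect reviewer feedback and regenerate flagged
    # questions until none are flagged or the round budget runs out.
    # (All names and the round limit here are illustrative assumptions.)
    for _ in range(max_rounds):
        flagged = {q: fb for q in questions if (fb := get_feedback(q))}
        if not flagged:
            break
        questions = [revise(q, flagged[q]) if q in flagged else q
                     for q in questions]
    return questions
```

In practice the reviewer would be a clinician or Avalon staff member using the UI; bounding the number of rounds keeps a noisy reviewer signal from looping forever.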
How It Works
- Text Ingestion: PDFs are parsed and text is extracted.
- Relevant Section Identification: A combination of heuristics and LLM prompting surfaces the policy sections that describe indications and limitations of coverage.
- Procedure List Generation: The LLM generates a list of covered procedures or diagnostic tests.
- Chunk-Based QA Generation:
  - For each identified test, the relevant policy section is divided into chunks.
  - The LLM generates questions based on these chunks, ensuring alignment with clinical requirements.
- Output: A finalized list of qualifying questions is returned, ready for integration into Avalon’s workflows.
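The steps above can be sketched as a simple pipeline. Here `call_llm` is a stub standing in for whichever model API is used (Claude 3 Opus in the PoC), and the prompts, function names, and chunking parameters are illustrative assumptions rather than Avalon's actual code.

```python
# Sketch of the policy-to-questions pipeline. All prompts, names, and
# parameters are illustrative assumptions; call_llm stands in for a real
# model API such as Claude 3 Opus.

def call_llm(prompt: str) -> str:
    # Placeholder for a real LLM call; returns one item per line.
    return ""

def chunk_text(text: str, max_chars: int = 2000) -> list[str]:
    # Naive fixed-size chunking; a production system would more likely
    # split on section boundaries to keep clinical context intact.
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

def extract_procedures(policy_text: str) -> list[str]:
    # Step: generate the list of covered procedures or diagnostic tests.
    raw = call_llm(
        "List each covered diagnostic test in this policy, one per line:\n"
        + policy_text
    )
    return [line.strip() for line in raw.splitlines() if line.strip()]

def generate_questions(policy_text: str,
                       procedures: list[str]) -> dict[str, list[str]]:
    # Step: for each test, generate qualifying questions chunk by chunk.
    questions: dict[str, list[str]] = {}
    for proc in procedures:
        questions[proc] = []
        for chunk in chunk_text(policy_text):
            raw = call_llm(
                f"Based on this policy excerpt, write yes/no questions that "
                f"determine whether a patient qualifies for {proc}:\n{chunk}"
            )
            questions[proc].extend(
                q.strip() for q in raw.splitlines() if q.strip()
            )
    return questions
```

The human-in-the-loop step sits between `extract_procedures` and `generate_questions`: reviewers can edit the procedure list before questions are generated.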
This pipeline combined automation with human-in-the-loop oversight, using UX design and expert feedback to manage risk.
Impact & The Future
The pilot achieved 100% precision and 83% recall across the policy documents tested, exceeding Avalon's performance benchmarks. This means:
- Fewer errors in assessing patient eligibility
- Faster review times
- A shift from annual to monthly review cycles for policy documentation
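To make the reported figures concrete: precision is the fraction of generated questions that are correct, and recall is the fraction of required questions that were actually generated. The toy numbers below are invented for illustration and are not Avalon's evaluation data; they simply show how 100% precision can coexist with 83% recall.

```python
def precision_recall(generated: set, gold: set) -> tuple[float, float]:
    # Standard set-based precision and recall.
    true_positives = len(generated & gold)
    precision = true_positives / len(generated)
    recall = true_positives / len(gold)
    return precision, recall

# Toy example (invented data): 5 of 6 required questions generated,
# with no spurious ones, yields 100% precision and ~83% recall.
gold = {"q1", "q2", "q3", "q4", "q5", "q6"}
generated = {"q1", "q2", "q3", "q4", "q5"}
p, r = precision_recall(generated, gold)
```

High precision with slightly lower recall is the safer trade-off here: the system rarely asks a wrong question, though a reviewer may occasionally need to add a missing one.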
Looking ahead, Avalon plans to expand this system beyond the initial four test policies, working toward broader generalization while refining the question generation process for more complex documents. Continued collaboration with domain experts and UX enhancements will further reduce implicit knowledge gaps and streamline adoption.