The Challenge
Avalon manages over 60 policy documents that determine whether patients qualify for diagnostic tests. These documents are complex, frequently updated, and critical to prior authorization, a process that today is manual and time-consuming. Healthcare providers and Avalon staff must sift through extensive documentation to determine eligibility, creating inefficiencies and increasing the risk of delays or errors in patient care.
The Solution
To address this bottleneck, Avalon partnered with Tribe to develop a customized proof of concept (PoC) built on large language models (LLMs). The objective was to test whether generative AI could accurately and efficiently extract key information from policy documents and generate medically accurate prior authorization questions.
Using Claude 3 Opus, the Tribe AI team demonstrated that LLMs could parse complex healthcare policy language and output structured questions—helping Avalon validate patient eligibility criteria with minimal human intervention.
Key Features
The proof of concept enabled users to:
- Upload a policy document (PDF format)
- Select from multiple LLMs for processing
- Auto-extract a list of covered diagnostic tests
- Manually adjust test lists as needed
- Generate qualifying assessment questions for each test
- Incorporate feedback into a loop for refining question output
This flexible setup allowed both automation and control, giving Avalon stakeholders confidence in the system’s usability and extensibility.
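The feedback loop mentioned above, where reviewer input refines the generated questions, could look roughly like the following sketch. The function names (`refine_questions`, `get_feedback`, `revise`), the bounded-rounds protocol, and the round limit are all illustrative assumptions, not Avalon's actual implementation.

```python
def refine_questions(questions, get_feedback, revise, max_rounds=3):
    # Repeatedly collect reviewer feedback and regenerate flagged
    # questions until none are flagged or the round budget runs out.
    # (All names and the round limit here are illustrative assumptions.)
    for _ in range(max_rounds):
        flagged = {q: fb for q in questions if (fb := get_feedback(q))}
        if not flagged:
            break
        questions = [revise(q, flagged[q]) if q in flagged else q
                     for q in questions]
    return questions
```

In practice the reviewer would be a clinician or Avalon staff member using the UI; bounding the number of rounds keeps a noisy reviewer signal from looping forever.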
How It Works
- Text Ingestion: PDFs are parsed and text is extracted.
- Relevant Section Identification: A combination of heuristics and LLM prompting surfaces the policy sections that describe indications and limitations of coverage.
- Procedure List Generation: The LLM generates a list of covered procedures or diagnostic tests.
- Chunk-Based QA Generation:
  - For each identified test, the relevant policy section is divided into chunks.
  - The LLM generates questions based on these chunks, ensuring alignment with clinical requirements.
- Output: A finalized list of qualifying questions is returned, ready for integration into Avalon’s workflows.
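The steps above can be sketched as a simple pipeline. Here `call_llm` is a stub standing in for whichever model API is used (Claude 3 Opus in the PoC), and the prompts, function names, and chunking parameters are illustrative assumptions rather than Avalon's actual code.

```python
# Sketch of the policy-to-questions pipeline. All prompts, names, and
# parameters are illustrative assumptions; call_llm stands in for a real
# model API such as Claude 3 Opus.

def call_llm(prompt: str) -> str:
    # Placeholder for a real LLM call; returns one item per line.
    return ""

def chunk_text(text: str, max_chars: int = 2000) -> list[str]:
    # Naive fixed-size chunking; a production system would more likely
    # split on section boundaries to keep clinical context intact.
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

def extract_procedures(policy_text: str) -> list[str]:
    # Step: generate the list of covered procedures or diagnostic tests.
    raw = call_llm(
        "List each covered diagnostic test in this policy, one per line:\n"
        + policy_text
    )
    return [line.strip() for line in raw.splitlines() if line.strip()]

def generate_questions(policy_text: str,
                       procedures: list[str]) -> dict[str, list[str]]:
    # Step: for each test, generate qualifying questions chunk by chunk.
    questions: dict[str, list[str]] = {}
    for proc in procedures:
        questions[proc] = []
        for chunk in chunk_text(policy_text):
            raw = call_llm(
                f"Based on this policy excerpt, write yes/no questions that "
                f"determine whether a patient qualifies for {proc}:\n{chunk}"
            )
            questions[proc].extend(
                q.strip() for q in raw.splitlines() if q.strip()
            )
    return questions
```

The human-in-the-loop step sits between `extract_procedures` and `generate_questions`: reviewers can edit the procedure list before questions are generated.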
This pipeline combined automation with human-in-the-loop oversight, using UX design and expert feedback to manage risk.
Impact & The Future
The pilot achieved 100% precision and 83% recall across the policy documents tested, exceeding Avalon's performance benchmarks. This means:
- Fewer errors in assessing patient eligibility
- Faster review times
- A shift from annual to monthly review cycles for policy documentation
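To make the reported figures concrete: precision is the fraction of generated questions that are correct, and recall is the fraction of required questions that were actually generated. The toy numbers below are invented for illustration and are not Avalon's evaluation data; they simply show how 100% precision can coexist with 83% recall.

```python
def precision_recall(generated: set, gold: set) -> tuple[float, float]:
    # Standard set-based precision and recall.
    true_positives = len(generated & gold)
    precision = true_positives / len(generated)
    recall = true_positives / len(gold)
    return precision, recall

# Toy example (invented data): 5 of 6 required questions generated,
# with no spurious ones, yields 100% precision and ~83% recall.
gold = {"q1", "q2", "q3", "q4", "q5", "q6"}
generated = {"q1", "q2", "q3", "q4", "q5"}
p, r = precision_recall(generated, gold)
```

High precision with slightly lower recall is the safer trade-off here: the system rarely asks a wrong question, though a reviewer may occasionally need to add a missing one.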
Looking ahead, Avalon plans to expand this system beyond the initial four test policies, working toward broader generalization while refining the question generation process for more complex documents. Continued collaboration with domain experts and UX enhancements will further reduce implicit knowledge gaps and streamline adoption.