Automate utilization-review of health insurance claims using generative AI

This document describes a reference architecture for health insurance companies that want to automate prior authorization (PA) request processing and improve their utilization review (UR) processes by using Google Cloud. It's intended for software developers and program administrators in these organizations. This architecture helps health plan providers reduce administrative overhead, increase efficiency, and enhance decision-making by automating data ingestion and the extraction of insights from clinical forms. It also lets them use AI models for prompt generation and recommendations.

Architecture

The following diagram describes an architecture and an approach for automating the data ingestion workflow and optimizing the utilization management (UM) review process. This approach uses data and AI services in Google Cloud.

Data ingestion and UM review process high-level overview.

The preceding architecture contains two flows of data, which are supported by the following subsystems:

  • Claims data activator (CDA), which extracts data from unstructured sources, such as forms and documents, and ingests it into a database in a structured, machine-readable format. CDA implements the flow of data to ingest PA request forms.
  • Utilization review service (UR service), which integrates PA request data, policy documents, and other care guidelines to generate recommendations. The UR service implements the flow of data to review PA requests by using generative AI.

The following sections describe these flows of data.

CDA flow of data

The following diagram shows the flow of data for using CDA to ingest PA request forms.

PA case managers flow of data.

As shown in the preceding diagram, the PA case manager interacts with the system components to ingest, validate, and process the PA requests. The PA case managers are the individuals from the business operations team who are responsible for the intake of the PA requests. The flow of events is as follows:

  1. The PA case managers receive the PA request forms (pa_forms) from the healthcare provider and upload them to the pa_forms_bkt Cloud Storage bucket.
  2. The ingestion_service service listens to the pa_forms_bkt bucket for changes and picks up new pa_forms forms from the bucket. The service identifies the preconfigured Document AI processors, called form_processors, that are defined to process the pa_forms forms, and uses those processors to extract information from the forms. The extracted data is in JSON format.
  3. The ingestion_service service writes the extracted information with field-level confidence scores into the Firestore database collection, which is called pa_form_collection.
  4. The hitl_app application fetches the information (JSON) with confidence scores from the pa_form_collection database. The application calculates the document-level confidence score from the field-level confidence scores made available in the output by the form_processors machine learning (ML) models.
  5. The hitl_app application displays the extracted information with the field-level and document-level confidence scores to the PA case managers so that they can review and correct the information if the extracted values are inaccurate. PA case managers can update the incorrect values and save the document in the pa_form_collection database.
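
Step 4 aggregates field-level confidence scores into a document-level score. The architecture doesn't prescribe a formula, so the following is only a minimal sketch of one way hitl_app might do this: a simple mean, plus a per-field floor check to flag fields that need human review (both the mean and the 0.5 floor are assumptions for illustration).

```python
def document_confidence(fields, min_field_confidence=0.5):
    """Aggregate field-level confidence scores into a document-level score.

    fields: mapping of field name -> confidence score in [0, 1], as
    reported in the form_processors output.
    Returns the mean confidence and a list of fields that fall below
    the floor and therefore need human review.
    """
    if not fields:
        return 0.0, []
    needs_review = [name for name, score in fields.items()
                    if score < min_field_confidence]
    doc_score = sum(fields.values()) / len(fields)
    return doc_score, needs_review

# Example: hypothetical scores extracted from a pa_form by a form processor.
extracted = {"patient_name": 0.98, "diagnosis": 0.91, "medication": 0.42}
score, review = document_confidence(extracted)
print(round(score, 2), review)  # 0.77 ['medication']
```

In practice, the per-field floor lets the PA case manager's attention go straight to the low-confidence fields instead of re-checking the whole form.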

UR service flow of data

The following diagram shows the flow of data for the UR service.

UR specialist flow of data.

As shown in the preceding diagram, the UR specialists interact with the system components to conduct a clinical review of the PA requests. The UR specialists are typically nurses or physicians with experience in a specific clinical area who are employed by healthcare insurance companies. The case management and routing workflow for PA requests is out of scope for the workflow that this section describes.

The flow of events is as follows:

  1. The ur_app application displays a list of PA requests and their review status to the UR specialists. The status shows as in_queue, in_progress, or completed.
  2. The list is created by fetching the pa_form data from the pa_form_collection database. The UR specialist opens a request by clicking an item in the list displayed in the ur_app application.
  3. The ur_app application submits the pa_form data to the prompt_model model. It uses the Vertex AI Gemini API to generate a prompt that's similar to the following:

    Review a PA request for {medication|device|medical service} for our member, {Patient Name}, who is {age} old, {gender} with {medical condition}. The patient is on {current medication|treatment list}, has {symptoms}, and has been diagnosed with {diagnosis}.

  4. The ur_app application displays the generated prompt to the UR specialists for review and feedback. UR specialists can update the prompt in the UI and send it to the application.

  5. The ur_app application sends the prompt to the ur_model model with a request to generate a recommendation. The model generates a response and returns it to the application. The application displays the recommended outcome to the UR specialists.

  6. The UR specialists can use the ur_search_app application to search for clinical documents, care guidelines, and plan policy documents. The clinical documents, care guidelines, and plan policy documents are pre-indexed and accessible to the ur_search_app application.
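
The prompt template shown in step 3 can be filled programmatically from the extracted pa_form fields. The following sketch assumes a simplified set of field names (the real pa_form schema is defined by the form_processors, and in the architecture prompt generation is delegated to the tuned prompt_model rather than a static template):

```python
# Template mirroring the example prompt, with hypothetical field names.
PROMPT_TEMPLATE = (
    "Review a PA request for {request_type} for our member, {patient_name}, "
    "who is {age} old, {gender} with {medical_condition}. The patient is on "
    "{current_treatment}, has {symptoms}, and has been diagnosed with {diagnosis}."
)

def build_prompt(pa_form: dict) -> str:
    """Fill the UR prompt template from fields extracted out of a pa_form."""
    return PROMPT_TEMPLATE.format(**pa_form)

pa_form = {
    "request_type": "medication",
    "patient_name": "Jane Doe",
    "age": "54 years",
    "gender": "female",
    "medical_condition": "type 2 diabetes",
    "current_treatment": "metformin",
    "symptoms": "elevated HbA1c",
    "diagnosis": "E11.9",
}
print(build_prompt(pa_form))
```

This only illustrates the shape of the data that ur_app passes along; the tuned model can adapt the wording to the specifics of each case in ways a fixed template cannot.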

Components

The architecture contains the following components:

  • Cloud Storage buckets. UM application services require the following Cloud Storage buckets in your Google Cloud project:

    • pa_forms_bkt: A bucket to ingest the PA forms that need approval.
    • training_forms: A bucket to hold historical PA forms for training the DocAI form processors.
    • eval_forms: A bucket to hold PA forms for evaluating the accuracy of the DocAI form processors.
    • tuning_dataset: A bucket to hold the data required for tuning the large language model (LLM).
    • eval_dataset: A bucket to hold the data required for evaluation of the LLM.
    • clinical_docs: A bucket to hold the clinical documents that the providers submit as attachments to the PA forms, or afterward, to support the PA case. These documents get indexed by the search application in the Vertex AI Search service.
    • um_policies: A bucket to hold medical necessity and care guidelines, health plan policy documents, and coverage guidelines. These documents get indexed by the search application in the Vertex AI Search service.
  • form_processors: These processors are trained to extract information from the pa_forms forms.

  • pa_form_collection: A Firestore datastore to store the extracted information as JSON documents in the NoSQL database collection.

  • ingestion_service: A microservice that reads the documents from the bucket, passes them to the DocAI endpoints for parsing, and stores the extracted data in a Firestore database collection.

  • hitl_app: A microservice (web application) that fetches and displays data values extracted from the pa_forms. It also renders the confidence scores reported by the form processors (ML models) to the PA case managers so that they can review, correct, and save the information in the datastore.

  • ur_app: A microservice (web application) that UR specialists can use to review the PA requests by using generative AI. The microservice passes the data extracted from the pa_forms forms to the prompt_model model to generate a prompt. It then passes the generated prompt to the ur_model model to get the recommendation for a case.

  • Vertex AI medically-tuned LLMs: Vertex AI has a variety of generative AI foundation models that can be tuned to reduce cost and latency. The models used in this architecture are as follows:

    • prompt_model: An adapter on the LLM tuned to generate prompts based on the data extracted from the pa_forms.
    • ur_model: An adapter on the LLM tuned to generate a draft recommendation based on the input prompt.
  • ur_search_app: A search application built with Vertex AI Search to find personalized and relevant information for UR specialists from clinical documents, UM policies, and coverage guidelines.
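
To make the ingestion_service and pa_form_collection components concrete, here's a sketch of mapping form-processor output to a document ready for Firestore. The entity shape shown (type_, mention_text, confidence) mirrors the fields on Document AI entities, but treat the exact shape and field names here as an assumption for illustration:

```python
def entities_to_document(entities):
    """Convert form-processor entities into a JSON-style document that
    ingestion_service could write to the pa_form_collection collection.

    Each entity is a dict with "type_" (field name), "mention_text"
    (extracted value), and "confidence" (field-level score).
    """
    doc = {}
    for entity in entities:
        doc[entity["type_"]] = {
            "value": entity["mention_text"],
            "confidence": entity["confidence"],
        }
    return doc

# Example: two entities as a form processor might report them.
entities = [
    {"type_": "patient_name", "mention_text": "Jane Doe", "confidence": 0.98},
    {"type_": "diagnosis", "mention_text": "E11.9", "confidence": 0.91},
]
doc = entities_to_document(entities)
print(doc["patient_name"]["value"])  # Jane Doe
```

Storing the confidence score next to each value is what later lets hitl_app surface field-level and document-level scores to the PA case managers.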

Products used

This reference architecture uses the following Google Cloud products:

  • Vertex AI: An ML platform that lets you train and deploy ML models and AI applications, and customize LLMs for use in AI-powered applications.
  • Vertex AI Search: A platform that lets developers create and deploy enterprise-grade AI-powered agents and applications.
  • Document AI: A document processing platform that takes unstructured data from documents and transforms it into structured data.
  • Firestore: A NoSQL document database built for automatic scaling, high performance, and ease of application development.
  • Cloud Run: A serverless compute platform that lets you run containers directly on top of Google's scalable infrastructure.
  • Cloud Storage: A low-cost, no-limit object store for diverse data types. Data can be accessed from within and outside Google Cloud, and it's replicated across locations for redundancy.
  • Cloud Logging: A real-time log management system with storage, search, analysis, and alerting.
  • Cloud Monitoring: A service that provides visibility into the performance, availability, and health of your applications and infrastructure.

Use case

UM is a process used by health insurance companies primarily in the United States, but similar processes (with a few modifications) are used globally in the healthcare insurance market. The goal of UM is to help ensure that patients receive the appropriate care in the correct setting, at the optimum time, and at the lowest possible cost. UM also helps to ensure that medical care is effective, efficient, and in line with evidence-based standards of care. PA is a UM tool that requires approval from the insurance company before a patient receives medical care.

The UM process that many companies use is a barrier to providing and receiving timely care. It's costly, time-consuming, overly administrative, and slow because it relies on complex, manual steps. This process significantly impacts the ability of the health plan to effectively manage the quality of care and improve the provider and member experience. However, if these companies were to modify their UM process, they could help ensure that patients receive high-quality, cost-effective treatment. By optimizing their UR process, health plans can reduce costs and denials through expedited processing of PA requests, which in turn can improve the patient and provider experience. This approach also helps to reduce the administrative burden on healthcare providers.

When health plans receive requests for PA, the PA case managers create cases in the case management system to track, manage, and process the requests. A significant number of these requests are received by fax and mail, with attached clinical documents. However, the information in these forms and documents is not easily accessible to health insurance companies for data analytics and business intelligence. The current process of manually entering information from these documents into the case management systems is inefficient and time-consuming, and it can lead to errors.

By automating the data ingestion process, health plans can reduce costs, data entry errors, and the administrative burden on staff. Extracting valuable information from the clinical forms and documents enables health insurance companies to expedite the UR process.

Design considerations

This section provides guidance to help you use this reference architecture to develop one or more architectures that help you to meet your specific requirements for security, reliability, operational efficiency, cost, and performance.

Security, privacy, and compliance

This section describes the factors that you should consider when you use this reference architecture to design and build an architecture in Google Cloud that helps you meet your security, privacy, and compliance requirements.

In the United States, the Health Insurance Portability and Accountability Act (known as HIPAA, as amended, including by the Health Information Technology for Economic and Clinical Health (HITECH) Act) demands compliance with HIPAA's Security Rule, Privacy Rule, and Breach Notification Rule. Google Cloud supports HIPAA compliance, but ultimately, you are responsible for evaluating your own HIPAA compliance. Complying with HIPAA is a shared responsibility between you and Google. If your organization is subject to HIPAA and you want to use any Google Cloud products in connection with Protected Health Information (PHI), you must review and accept Google's Business Associate Agreement (BAA). The Google products covered under the BAA meet the requirements under HIPAA and align with our ISO/IEC 27001, 27017, and 27018 certifications and SOC 2 report.

Not all LLMs hosted in the Vertex AI Model Garden support HIPAA. Evaluate and use only the LLMs that support HIPAA.

To assess how Google's products can meet your HIPAA compliance needs, you can reference the third-party audit reports in the Compliance resource center.

We recommend that you consider the following when selecting AI use cases, and design with these considerations in mind:

  • Google's products follow Responsible AI principles.

For security principles and recommendations that are specific to AI and ML workloads, see AI and ML perspective: Security in the Well-Architected Framework.

Reliability

This section describes design factors that you should consider to build and operate reliable infrastructure to automate PA request processing.

The Document AI form_processors service is a regional service. Data is stored synchronously across multiple zones within a region. Traffic is automatically load-balanced across the zones. If a zone outage occurs, data isn't lost.¹ If a region outage occurs, the service is unavailable until Google resolves the outage.

You can create the pa_forms_bkt, training_forms, eval_forms, tuning_dataset, eval_dataset, clinical_docs, and um_policies Cloud Storage buckets in one of three location types: regional, dual-region, or multi-region. Data stored in regional buckets is replicated synchronously across multiple zones within a region. For higher availability, you can use dual-region or multi-region buckets, where data is replicated asynchronously across regions.

In Firestore, the extracted information in the pa_form_collection database can be stored across multiple data centers to help ensure global scalability and reliability.

The Cloud Run services, ingestion_service, hitl_app, and ur_app, are regional services. Data is stored synchronously across multiple zones within a region. Traffic is automatically load-balanced across the zones. If a zone outage occurs, Cloud Run jobs continue to run and data isn't lost. If a region outage occurs, the Cloud Run jobs stop running until Google resolves the outage. Individual Cloud Run jobs or tasks might fail. To handle such failures, you can use task retries and checkpointing. For more information, see Jobs retries and checkpoints best practices. Cloud Run general development tips describes some best practices for using Cloud Run.
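
The task-retry-with-checkpointing pattern mentioned above can be sketched generically as follows. This is not a Cloud Run API; the checkpoint store here is an in-memory dict standing in for durable storage such as Firestore or Cloud Storage, so that a retried task resumes after the last successfully processed item:

```python
def process_with_checkpoint(items, process, checkpoint, max_retries=3):
    """Process items in order, recording progress so that a retried task
    resumes after the last successfully processed item."""
    start = checkpoint.get("next_index", 0)
    for i in range(start, len(items)):
        attempts = 0
        while True:
            try:
                process(items[i])
                break
            except Exception:
                attempts += 1
                if attempts >= max_retries:
                    raise  # give up; checkpoint still marks prior progress
        checkpoint["next_index"] = i + 1  # durable progress marker

# Example: a flaky step that fails once (on its second call), then succeeds.
calls = {"n": 0}
def flaky(item):
    calls["n"] += 1
    if calls["n"] == 2:
        raise RuntimeError("transient error")

checkpoint = {}
process_with_checkpoint(["a", "b", "c"], flaky, checkpoint)
print(checkpoint["next_index"])  # 3
```

Because progress is recorded per item, a region outage or task failure mid-batch doesn't force reprocessing of already ingested pa_forms.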

Vertex AI is a comprehensive and user-friendly machine learningplatform that provides a unified environment for the machine learning lifecycle,from data preparation to model deployment and monitoring.

For reliability principles and recommendations that are specific to AI and ML workloads, see AI and ML perspective: Reliability in the Well-Architected Framework.

Cost optimization

This section provides guidance to optimize the cost of creating and running an architecture to automate PA request processing and improve your UR processes. Carefully managing resource usage and selecting appropriate service tiers can significantly impact the overall cost.

Cloud Storage storage classes: Use the different storage classes (Standard, Nearline, Coldline, or Archive) based on data access frequency. Nearline, Coldline, and Archive are more cost-effective for less frequently accessed data.

Cloud Storage lifecycle policies: Implement lifecycle policies to automatically transition objects to lower-cost storage classes or delete them based on age and access patterns.
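
A lifecycle policy of this kind can be expressed as rules in the JSON structure that Cloud Storage accepts. A sketch for a bucket such as pa_forms_bkt (the 30-day and 365-day thresholds are placeholder values, not recommendations):

```python
# Lifecycle rules in the JSON structure that Cloud Storage accepts:
# move objects to Coldline after 30 days, delete them after 365 days.
lifecycle_rules = [
    {
        "action": {"type": "SetStorageClass", "storageClass": "COLDLINE"},
        "condition": {"age": 30},
    },
    {
        "action": {"type": "Delete"},
        "condition": {"age": 365},
    },
]
```

Rules like these can be applied to a bucket through the Google Cloud console, the gcloud CLI, or the Cloud Storage client libraries; processed forms age out of Standard storage without any application code.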

Document AI is priced based on the number of processors deployed and on the number of pages processed by the Document AI processors. Consider the following:

  • Processor optimization: Analyze workload patterns to determine the optimal number of Document AI processors to deploy. Avoid overprovisioning resources.
  • Page volume management: Pre-processing documents to remove unnecessary pages or to optimize resolution can help reduce processing costs.

Firestore is priced based on activity related to documents, index entries, storage that the database uses, and the amount of network bandwidth. Consider the following:

  • Data modeling: Design your data model to minimize the number of index entries and optimize query patterns for efficiency.
  • Network bandwidth: Monitor and optimize network usage to avoid excess charges. Consider caching frequently accessed data.

Cloud Run charges are calculated based on on-demand CPU usage, memory, and the number of requests. Think carefully about resource allocation. Allocate CPU and memory resources based on workload characteristics. Use autoscaling to adjust resources dynamically based on demand.

Vertex AI LLMs are typically charged based on the input and output of the text or media. Input and output token counts directly affect LLM costs. Optimize prompts and response generation for efficiency.
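
Because token counts drive LLM cost, a rough per-call estimate can be computed up front. A sketch with hypothetical per-1,000-token prices (check the current Vertex AI pricing page for real rates):

```python
def estimate_llm_cost(input_tokens, output_tokens,
                      price_per_1k_input, price_per_1k_output):
    """Rough cost estimate for one LLM call, given token counts and
    per-1,000-token prices (the prices used below are placeholders)."""
    return ((input_tokens / 1000) * price_per_1k_input
            + (output_tokens / 1000) * price_per_1k_output)

# Example: a 1,500-token prompt and a 500-token recommendation at
# hypothetical rates of $0.50 and $1.50 per 1,000 tokens.
cost = estimate_llm_cost(1500, 500, 0.50, 1.50)
print(f"${cost:.2f}")  # $1.50
```

Multiplying such an estimate by the expected daily PA request volume gives a quick sense of whether prompt trimming or a smaller tuned model is worth the effort.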

Vertex AI Search search engine charges depend on the features that you use. To help manage your costs, you can choose from the following three options:

  • Search Standard Edition, which offers unstructured search capabilities.
  • Search Enterprise Edition, which offers unstructured search and website search capabilities.
  • Search LLM Add-On, which offers summarization and multi-turn search capabilities.

Consider the following additional ways to help optimize costs:

  • Monitoring and alerts: Set up Cloud Monitoring and billing alerts to track costs and receive notifications when usage exceeds the thresholds.
  • Cost reports: Regularly review cost reports in the Google Cloud console to identify trends and optimize resource usage.
  • Committed use discounts: If you have predictable workloads, consider committing to using those resources for a specified period to get discounted pricing.

Carefully considering these factors and implementing the recommended strategies can help you to effectively manage and optimize the cost of running your PA and UR automation architecture on Google Cloud.

For cost optimization principles and recommendations that are specific to AI and ML workloads, see AI and ML perspective: Cost optimization in the Well-Architected Framework.

Deployment

The reference implementation code for this architecture is available under open-source licensing. The architecture that this code implements is a prototype, and it might not include all the features and hardening that you need for a production deployment. To implement and expand this reference architecture to more closely meet your requirements, we recommend that you contact Google Cloud Consulting.

The starter code for this reference architecture is available in the following git repositories:

  • CDA git repository: This repository contains Terraform deployment scripts for infrastructure provisioning and deployment of application code.
  • UR service git repository: This repository contains code samples for the UR service.

You can choose one of the following two options to implement support and services for this reference architecture:

What's next

Contributors

Author:Dharmesh Patel | Industry Solutions Architect, Healthcare

Other contributors:


  1. For more information about region-specific considerations, see Geography and regions.

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2024-08-19 UTC.