
Transforming Patient Notes into Structured Data with Generative AI

Introducing MediStruct AI Transcription Tool

Introduction

Transcribing and organizing patient visit notes is a time-consuming process that can lead to errors and inefficiencies. Enter MediStruct, an AI-powered solution designed to automate the extraction of structured data from patient notes.

By leveraging cutting-edge generative AI, MediStruct simplifies the transformation of free-form text into organized, structured records. Whether it’s converting transcribed voice notes into readable charts or extracting essential details like diagnoses and recommendations, MediStruct streamlines the documentation process. This workshop will guide you through how MediStruct works, demonstrating the ease of integrating AI into healthcare workflows for improved accuracy and efficiency.

Understanding MediStruct: Your AI-Integrated Patient Notes Extractor

In healthcare, patient documentation is often unstructured, making it difficult for medical professionals to quickly extract key information. MediStruct addresses this challenge by transforming free-form patient notes into a structured, organized format, ready for analysis or storage.

At its core, MediStruct uses a generative AI model to extract essential details like patient name, diagnosis, visit date, and more, from raw transcribed notes. This process reduces manual data entry, helping healthcare providers focus on what really matters: patient care.

MediStruct works by taking text input – whether manually entered or transcribed from voice recordings – and processing it through a natural language processing (NLP) pipeline. This pipeline, powered by a large language model (LLM), structures the information into a predefined chart format. For instance, transcriptions of patient visits are automatically parsed into sections like “Patient Name,” “Current Concerns,” and “Recommendations,” making the data easy to read, store, and query.

This powerful combination of transcription and structured data extraction gives MediStruct the flexibility to work with various input types, making it a versatile tool for healthcare professionals.

Activity 1: Converting Notes into Structured Data

In this first activity, we’ll explore how MediStruct converts unstructured patient notes into structured, readable data. You’ll see how a simple text input can be transformed into a clear, organized patient chart using a large language model (LLM).

The goal of this activity is to take free-form notes, such as those from a doctor’s visit, and extract relevant information like patient name, date of birth, and diagnosis into a predefined format. Let’s walk through the process step by step.

Step 1: Run the Code to Extract Structured Data

Below is the code that sets up the AI to take patient notes as input and return the structured information. In this case, we use Anthropic Claude via Amazon Bedrock (LangChain’s BedrockLLM wrapper) to perform the extraction.

from langchain_core.prompts import ChatPromptTemplate
from langchain_aws import BedrockLLM

# Define the extraction prompt
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are an extraction algorithm that will be doing charting for medical personnel. "
            "Only extract relevant information from the text and return it in the chart format given as Markdown. "
            "Do NOT remove the '\\n' character at the end of each line.\n\n"
            "**Patient Name:** [FirstName] [LastName]\n"
            "**Date of Birth:** [DateOfBirth]\n"
            "**MRN:** [MRN]\n"
            "**Visit Date:** [VisitDate]\n"
            "**Visit Time:** [VisitTime]\n"
            "**Current Diagnosis:** [CurrentDiagnosis]\n"
            "**Reason for Referral:** [ReasonForReferral]\n"
            "**Current Concerns:** [CurrentConcerns]\n"
            "**Observations:** [Observations]\n"
            "**Recommendations:** [Recommendations]\n"
            "If you do not know the value of an attribute asked to extract, return null for the attribute's value.",
        ),
        ("human", "{text}"),
    ]
)

# Initialize the Bedrock LLM model (assumes AWS credentials and Bedrock model access are configured)
llm = BedrockLLM(model_id="anthropic.claude-v2:1")

# Chain the prompt and LLM
chain = prompt | llm

# Example transcribed patient notes
transcribed_voice_notes = """
Emily Johnson, born on March 15, 1994, came into my office for a follow-up visit on April 17, 2024, at 10:00 AM. 
She has recently undergone carpal tunnel release surgery and was referred to me by her orthopedic surgeon 
for post-surgical rehabilitation. Emily works as a data entry specialist and is particularly concerned about numbness 
and discomfort in her right wrist, which seems to be more pronounced in the mornings or after prolonged periods of typing.

During our session, I noted that Emily's wrist and hand strength were noticeably reduced, which is not uncommon 
post-surgery but needs to be addressed to avoid long-term issues. She's anxious about her ability to return to 
work without experiencing pain or risking further injury. Our focus in the coming weeks will be on therapeutic 
exercises to improve her strength and flexibility, along with ergonomic adjustments at her workstation
to better support her recovery.
"""

# Process the notes and extract structured data
print("Processing...")
result = chain.invoke({"text": transcribed_voice_notes})
print(result)

Step 2: Modify the Notes and Re-run the Code

Try editing the transcribed_voice_notes string in the code above and re-run it to see how the model extracts different information. For example, you can change the patient’s name, diagnosis, or visit date, and the model will adjust the structured output accordingly.

Output:

When you run the code above with the provided sample note, you will see structured output like this:

**Patient Name:** Emily Johnson
**Date of Birth:** March 15, 1994
**MRN:** null
**Visit Date:** April 17, 2024
**Visit Time:** 10:00 AM
**Current Diagnosis:** carpal tunnel release surgery
**Reason for Referral:** post-surgical rehabilitation
**Current Concerns:** numbness and discomfort in her right wrist, which seems to be more pronounced in the mornings or after prolonged periods of typing
**Observations:** Emily's wrist and hand strength were noticeably reduced, which is not uncommon post-surgery but needs to be addressed to avoid long-term issues. She's anxious about her ability to return to work without experiencing pain or risking further injury.
**Recommendations:** Our focus in the coming weeks will be on therapeutic exercises to improve her strength and flexibility, along with ergonomic adjustments at her workstation to better support her recovery.

Step 3: Experiment and See the Flexibility

This activity shows how simple it is to convert unstructured text into a highly structured format using an LLM. It’s easy to change the input and adapt the prompt to other kinds of medical notes, as the hypothetical example below illustrates.
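For instance, you might re-run the chain on a note like the following. This is a minimal variation of the earlier sample; the patient and every detail are invented purely for illustration:

# A hypothetical alternative note -- every detail is invented for illustration
transcribed_voice_notes = """
Mark Rivera, born on July 2, 1988, came in for an initial evaluation on May 3, 2024, at 2:30 PM.
He was referred by his primary care physician for persistent lower back pain following a lifting
injury at work. Mark is concerned about morning stiffness and difficulty sitting for long periods.
I observed a reduced range of motion in the lumbar spine. We will begin core-strengthening
exercises and review his posture and lifting technique over the coming weeks.
"""

result = chain.invoke({"text": transcribed_voice_notes})
print(result)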

By automating this process, MediStruct saves time and ensures that important information is consistently captured in a structured format, ready for use in further analysis or documentation.
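Because the chart comes back as Markdown key-value lines, it is also straightforward to load the result into a Python dictionary for storage or querying. Here is a minimal sketch; the parse_chart helper is illustrative and not part of MediStruct itself:

import re

def parse_chart(chart_text):
    """Parse '**Key:** value' Markdown lines into a dictionary."""
    fields = {}
    for line in chart_text.splitlines():
        match = re.match(r"\*\*(.+?):\*\*\s*(.*)", line.strip())
        if match:
            key, value = match.groups()
            fields[key] = None if value == "null" else value
    return fields

chart = parse_chart(result)
print(chart["Patient Name"])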

Activity 2: Record and Transcribe Your Own Patient Notes

In the second activity, we’ll take it a step further by allowing you to record your own patient notes, transcribe them, and convert them into structured data. This process demonstrates how MediStruct can work hands-free, taking voice input and turning it into a structured chart using Amazon Transcribe.

Step 1: Record Your Verbal Notes

Start by recording your patient notes. You can use a free online service like Vocaroo to record a short description of a patient visit. Once you’re done, click the save and share button, and copy the link to your recording.

Step 2: Retrieve and Save Your Recording

Now, we’ll use a Python script to retrieve your recorded notes from the Vocaroo link and save the audio file locally.

import requests
import os
import subprocess

# Paste your Vocaroo link here
url = "https://voca.ro/1lqz0Ttpf6J8"  # Replace with your own recording URL
clipid = url.split('/')[-1]

# Format the URL for direct access to the audio file
url = "https://media1.vocaroo.com/mp3/{}".format(clipid)
headers = {
    "Referer": "https://vocaroo.com",
}

# Paths for saving the audio files
AUDIO_PATH = 'transcripts/{}.mp3'.format(clipid)
WAV_PATH = 'transcripts/{}.wav'.format(clipid)

# Create a directory to store the audio files if it doesn't exist
if not os.path.exists("transcripts"):
    os.mkdir("transcripts")

# Fetch the audio from Vocaroo
response = requests.get(url, headers=headers, timeout=30)

if response.status_code != 200:
    print("Something may have gone wrong retrieving your recording. Double-check your link.")
else:
    # Save the MP3 file locally
    with open(AUDIO_PATH, 'wb') as f:
        f.write(response.content)

    # Convert the MP3 to a 48 kHz mono WAV with ffmpeg, matching the transcription settings below
    subprocess.run(["ffmpeg", "-y", "-loglevel", "error", "-i", AUDIO_PATH,
                    "-ar", "48000", "-ac", "1", WAV_PATH])

    print("Recording retrieved and saved.")

Once the code runs, you should see the message “Recording retrieved and saved.” If not, double-check your recording URL.

Step 3: Play Back Your Recording

Before we transcribe the audio, you can play it back in your notebook to ensure it’s correct.

from IPython.display import Audio
Audio(WAV_PATH)

This will allow you to listen to your recording directly within the notebook.

Step 4: Transcribe the Audio

Next, we’ll transcribe the recording using Amazon Transcribe, which converts spoken language into text. Here’s the code to set up transcription:

import asyncio
import aiofile
from amazon_transcribe.client import TranscribeStreamingClient
from amazon_transcribe.handlers import TranscriptResultStreamHandler
from amazon_transcribe.model import TranscriptEvent
from amazon_transcribe.utils import apply_realtime_delay

import nest_asyncio
nest_asyncio.apply()

SAMPLE_RATE = 48000
BYTES_PER_SAMPLE = 2
CHANNEL_NUMS = 1
CHUNK_SIZE = 1024 * 8
REGION = "us-west-2"  # Set to your AWS region
TRANSCRIBED_AUDIO = ""

# Event handler to process transcription results
class MyEventHandler(TranscriptResultStreamHandler):
    async def handle_transcript_event(self, transcript_event: TranscriptEvent):
        global TRANSCRIBED_AUDIO
        results = transcript_event.transcript.results
        for result in results:
            # Append only finalized (non-partial) segments, separated by spaces
            if not result.is_partial and result.alternatives:
                TRANSCRIBED_AUDIO += result.alternatives[0].transcript + " "

# Asynchronous function for transcription
async def basic_transcribe():
    client = TranscribeStreamingClient(region=REGION)
    stream = await client.start_stream_transcription(
        language_code="en-US",
        media_sample_rate_hz=SAMPLE_RATE,
        media_encoding="pcm",
    )

    async def write_chunks():
        async with aiofile.AIOFile(WAV_PATH, "rb") as afp:
            reader = aiofile.Reader(afp, chunk_size=CHUNK_SIZE)
            await apply_realtime_delay(
                stream, reader, BYTES_PER_SAMPLE, SAMPLE_RATE, CHANNEL_NUMS
            )
        await stream.input_stream.end_stream()

    handler = MyEventHandler(stream.output_stream)
    await asyncio.gather(write_chunks(), handler.handle_events())

# Run the transcription
loop = asyncio.get_event_loop()
loop.run_until_complete(basic_transcribe())
print(TRANSCRIBED_AUDIO)

After running this code, your recording will be transcribed and the text stored in the TRANSCRIBED_AUDIO variable. Adjust REGION to match your AWS setup; the apply_realtime_delay helper paces the upload so Amazon Transcribe receives the audio as if it were being spoken in real time.
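If you want to keep a copy of the transcript for later steps, you can write it next to the audio files. This small addition is optional, and the .txt path is just a suggestion:

# Save the transcript alongside the audio files (the .txt path is illustrative)
TRANSCRIPT_PATH = 'transcripts/{}.txt'.format(clipid)
with open(TRANSCRIPT_PATH, 'w') as f:
    f.write(TRANSCRIBED_AUDIO)
print("Transcript saved to", TRANSCRIPT_PATH)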

Step 5: Convert Transcribed Audio to Structured Data

Finally, use MediStruct’s LLM to convert the transcribed audio into structured patient notes.

from langchain_core.prompts import ChatPromptTemplate
from langchain_aws import BedrockLLM

# Reuse the prompt template from earlier
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are an extraction algorithm that will be doing charting for medical personnel. "
            "Only extract relevant information from the text and return it in the chart format given as Markdown. "
            "Do NOT remove the '\\n' character at the end of each line.\n\n"
            "**Patient Name:** [FirstName] [LastName]\n"
            "**Date of Birth:** [DateOfBirth]\n"
            "**MRN:** [MRN]\n"
            "**Visit Date:** [VisitDate]\n"
            "**Visit Time:** [VisitTime]\n"
            "**Current Diagnosis:** [CurrentDiagnosis]\n"
            "**Reason for Referral:** [ReasonForReferral]\n"
            "**Current Concerns:** [CurrentConcerns]\n"
            "**Observations:** [Observations]\n"
            "**Recommendations:** [Recommendations]\n"
            "If you do not know the value of an attribute, return null for that attribute.",
        ),
        ("human", "{text}"),
    ]
)

# Initialize the LLM and convert the transcribed text
llm = BedrockLLM(model_id="anthropic.claude-v2:1")
chain = prompt | llm

# Convert transcribed audio to structured data
result = chain.invoke({"text": TRANSCRIBED_AUDIO})
print(result)

Output:

Once the transcription and structuring are complete, you’ll see an output similar to this:

**Patient Name:** Emily Johnson
**Date of Birth:** March 15, 1994
**MRN:** null
**Visit Date:** April 17, 2024
**Visit Time:** 10:00 AM
**Current Diagnosis:** carpal tunnel release surgery
**Reason for Referral:** post-surgical rehabilitation
**Current Concerns:** numbness and discomfort in her right wrist
**Observations:** Emily's wrist and hand strength were noticeably reduced
**Recommendations:** Therapeutic exercises and ergonomic adjustments

This fully automated workflow demonstrates the power of AI in turning raw voice input into actionable, structured patient notes, ready for use in documentation or analysis.

Conclusion

MediStruct offers a powerful, AI-driven solution for transforming unstructured patient notes into organized, structured data. By automating the extraction of key details from text or audio inputs, MediStruct streamlines the documentation process for healthcare providers, enabling them to focus more on patient care and less on manual data entry.

Through the activities in this workshop, you’ve seen how easy it is to convert transcribed voice notes into structured charts using generative AI. From recording your own notes and transcribing them with Amazon Transcribe to structuring the extracted information with a large language model, MediStruct showcases the potential of AI in healthcare.

As medical professionals continue to face growing documentation demands, solutions like MediStruct pave the way for more efficient, accurate, and hands-free workflows. By integrating AI into daily tasks, we can significantly reduce the administrative burden on healthcare providers and ultimately improve patient outcomes.