Sandy Author at Health Tech Bytes. Cloud Architect, data science and AI practitioner, health informatics strategist

Interact with critical care data using natural language

ChatGPT has had an illustrious year. Not only has it managed to be an imaginative assistant and pass various licensing exams, but it has also sparked a significant cultural debate around how this technology should be used in everyday life.

Earlier this year, ChatGPT showed promise in healthcare by passing the US Medical Licensing Exam (USMLE), but what does this actually mean for doctors and patients? The paper suggests there could be potential for clinical decision-making; in contrast, the model also produced concerning results in real-world scenarios. There will be plenty of opinions on how and where generative AI should be used in healthcare, some of which may not come to fruition for years.

However, there is one area where I am fairly certain ChatGPT will see rapid adoption: how users interact with databases. As much of healthcare data is now stored in EHRs, navigating that data has become more cumbersome. Thanks to LLMs, any user can now converse with a database using natural language. This is a huge productivity shift and a powerful capability for anyone who works with healthcare data.

This tutorial continues from the previous tutorial on using LLMs with LangChain. We will use LLMs and LangChain to interact with a SQL database containing critical care data.

Pre-requisites:

  • Basic knowledge of OpenAI
  • Basic knowledge of LangChain
  • Intermediate knowledge of Azure SQL Database

Environment:

  • Database: Azure SQL 12.0.2000.8
  • Database Driver: ODBC driver 18 for SQL Server
  • Python: 3.11.6
  • OS: MacOS 12.7.1
  • VS Code: 1.84.1

All resources can be found here.

Load data into Azure SQL Database

MIMIC-III is a publicly available dataset comprising de-identified data for more than 40,000 critical care patients who stayed at the Beth Israel Deaconess Medical Center between 2001 and 2012.

Download the MIMIC-III* files (open access), and load the ADMISSIONS table into Azure SQL Database.

*Note: If you plan to use credentialed MIMIC data, it is recommended to enroll in the Azure OpenAI Service.
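There are several ways to load the ADMISSIONS CSV into a database table. Below is a minimal sketch using only the Python standard library and a generic DB-API connection (pyodbc for Azure SQL, or sqlite3 locally). The helper name `load_admissions`, the trimmed column list, and the lowercase CSV headers are my assumptions; adjust the columns and header case to match your MIMIC-III file (the distributed CSVs use uppercase headers).

```python
import csv

def load_admissions(csv_path, conn, table="ADMISSIONS"):
    """Load selected ADMISSIONS columns from a CSV into a DB-API connection.

    Assumes the target table already exists and the CSV headers are
    lowercase; both are illustrative choices, not MIMIC-III requirements.
    """
    cols = ["row_id", "subject_id", "hadm_id", "admittime", "dischtime", "ethnicity"]
    with open(csv_path, newline="") as f:
        reader = csv.DictReader(f)
        rows = [tuple(r[c] for c in cols) for r in reader]
    # "?" placeholders work for both pyodbc and sqlite3
    placeholders = ", ".join("?" for _ in cols)
    cur = conn.cursor()
    cur.executemany(
        f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({placeholders})", rows
    )
    conn.commit()
    return len(rows)
```

For the full ADMISSIONS table you would list all of its columns; batching with `executemany` keeps the load reasonably fast over an Azure connection.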

Connect to Azure SQL Database

Next, we connect Python to Azure SQL Database. This tutorial provides more details on the setup. As there are known issues with the Microsoft ODBC driver on macOS, be sure to follow this guide if you encounter errors.

In my own experience, I had to make the following changes via Terminal:

$ brew install openssl@1.1
$ rm -rf $(brew --prefix)/opt/openssl
$ version=$(ls $(brew --prefix)/Cellar/openssl@1.1 | grep "1.1")
$ ln -s $(brew --prefix)/Cellar/openssl@1.1/$version $(brew --prefix)/opt/openssl

Use the ODBC connection string to create an instance of the SQL database:

# connect python to azure sql db
from langchain.utilities import SQLDatabase

connectionString = f"mssql+pyodbc://{username}:{password}@{server}/{database}?driver={driver}"

try:
    db = SQLDatabase.from_uri(connectionString)
except Exception as e:
    print(f"Failed to connect to the database: {e}")
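One gotcha with interpolating credentials directly into the URI: characters such as `@`, `/`, or spaces in the password (or in the driver name) will break the URL. A small sketch of URL-escaping the parts with the standard library; the helper name `build_connection_string` and the default driver value are my own choices:

```python
from urllib.parse import quote_plus

def build_connection_string(username, password, server, database,
                            driver="ODBC Driver 18 for SQL Server"):
    """Build a SQLAlchemy-style mssql+pyodbc URL, escaping special characters
    so passwords containing '@', spaces, etc. do not corrupt the URL."""
    return (
        f"mssql+pyodbc://{quote_plus(username)}:{quote_plus(password)}"
        f"@{server}/{database}?driver={quote_plus(driver)}"
    )
```

The resulting string can be passed to `SQLDatabase.from_uri` exactly like the hand-built one above.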

Set the OpenAI API Key

Navigate to OpenAI to obtain an API key. This will be needed to authenticate requests to the API.

# set API key to authenticate requests to the API
OPENAI_API_KEY = "<API key>"
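Hardcoding the key works for a tutorial, but it is safer to read it from an environment variable so it never lands in version control. A minimal sketch; the helper name `get_openai_api_key` is my own:

```python
import os

def get_openai_api_key():
    """Read the OpenAI API key from the environment instead of hardcoding it."""
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("Set the OPENAI_API_KEY environment variable first.")
    return key
```

Export the variable in your shell (`export OPENAI_API_KEY=...`) before running the notebook.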

Set up SQL Database Agent

LangChain provides an agent that allows the user to interact with SQL databases. Below are the steps to initialize it:

# imports for the model, toolkit, and agent
from langchain.chat_models import ChatOpenAI
from langchain.agents import create_sql_agent
from langchain.agents.agent_types import AgentType
from langchain.agents.agent_toolkits import SQLDatabaseToolkit

# create new llm model
llm = ChatOpenAI(
    temperature=0, 
    openai_api_key=OPENAI_API_KEY, 
    model_name='gpt-3.5-turbo')

# toolkit for interacting with sql databases
toolkit = SQLDatabaseToolkit(
    db=db,
    llm=llm)

# initialize agent
agent_executor = create_sql_agent(
    llm=llm,
    toolkit=toolkit,
    verbose=True,
    agent_type=AgentType.OPENAI_FUNCTIONS
    )

Ask questions in natural language

Let’s run a query to aggregate some data:

# run a query to aggregate data
agent_executor.run("What is the frequency of ethnicities?")

Response:

> Entering new AgentExecutor chain...

Invoking: `sql_db_list_tables` with `{}`


ADMISSIONS, BuildVersion, ErrorLog
Invoking: `sql_db_schema` with `ADMISSIONS`



CREATE TABLE [ADMISSIONS] (
	row_id INTEGER NOT NULL, 
	subject_id INTEGER NOT NULL, 
	hadm_id INTEGER NOT NULL, 
	admittime DATETIME2 NOT NULL, 
	dischtime DATETIME2 NOT NULL, 
	deathtime DATETIME2 NULL,
...

SELECT ethnicity, COUNT(*) AS frequency 
FROM ADMISSIONS 
GROUP BY ethnicity 
ORDER BY frequency DESC 
OFFSET 0 ROWS 
FETCH NEXT 10 ROWS ONLY
Invoking: `sql_db_query` with `SELECT ethnicity, COUNT(*) AS frequency FROM ADMISSIONS GROUP BY ethnicity ORDER BY frequency DESC OFFSET 0 ROWS FETCH NEXT 10 ROWS ONLY`


[('WHITE', 86), ('HISPANIC/LATINO - PUERTO RICAN', 15), ('UNKNOWN/NOT SPECIFIED', 11), ('BLACK/AFRICAN AMERICAN', 7), ('OTHER', 3), ('HISPANIC OR LATINO', 2), ('ASIAN', 2), ('AMERICAN INDIAN/ALASKA NATIVE FEDERALLY RECOGNIZED TRIBE', 2), ('UNABLE TO OBTAIN', 1)]The frequency of ethnicities in the database is as follows:

- WHITE: 86
- HISPANIC/LATINO - PUERTO RICAN: 15
- UNKNOWN/NOT SPECIFIED: 11
- BLACK/AFRICAN AMERICAN: 7
- OTHER: 3
- HISPANIC OR LATINO: 2
- ASIAN: 2
- AMERICAN INDIAN/ALASKA NATIVE FEDERALLY RECOGNIZED TRIBE: 2
- UNABLE TO OBTAIN: 1

> Finished chain.

What’s interesting here is that the chain reveals the LLM’s reasoning process. For example, the LLM recognizes that it has to inspect the schema of the ADMISSIONS table before it can aggregate the results. Without the user writing any SQL statements, the LLM generates the SQL dynamically and returns the result in natural language.

We can also compare the LLM’s response to the result of running the equivalent SQL query directly.
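To sanity-check the agent, the same GROUP BY it generated can be run through a plain DB-API cursor and compared with the natural-language answer. A minimal sketch; the function name `ethnicity_frequency` is mine, and the `OFFSET ... FETCH` from the agent's T-SQL is replaced with a Python slice so the query also runs on engines without that clause:

```python
def ethnicity_frequency(conn, limit=10):
    """Run the same aggregate the agent generated, via a plain DB-API cursor.

    Returns (ethnicity, frequency) tuples, most frequent first.
    """
    cur = conn.cursor()
    cur.execute(
        "SELECT ethnicity, COUNT(*) AS frequency "
        "FROM ADMISSIONS GROUP BY ethnicity "
        "ORDER BY frequency DESC"
    )
    # slice in Python instead of OFFSET/FETCH for portability
    return cur.fetchall()[:limit]
```

Against the same ADMISSIONS table, the tuples returned here should match the counts the agent reported (WHITE: 86, HISPANIC/LATINO - PUERTO RICAN: 15, and so on).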

Conclusion

In this tutorial, we learned how to use LLMs to interact with the MIMIC-III database. Using the LangChain framework, we connected an OpenAI model to an Azure SQL Database, which allowed us to ask questions about MIMIC-III admissions data in natural language. This is a simple and effective way to retrieve information without having to write SQL.
