Your first MIMIC query
Table of contents
- Prerequisites
- Understanding the basic structure
- Your first query
- Exploring patient demographics
- Adding hospital admissions
- Next steps
- Common issues
Learn how to write your first query against the MIMIC-IV database. This tutorial assumes you have already gained access to MIMIC data.
Prerequisites
- Access to MIMIC data (see Getting Started)
- Basic SQL knowledge
- Access to a query environment (BigQuery, PostgreSQL, etc.)
Understanding the basic structure
MIMIC-IV is organized into modules. Let’s start with the most fundamental table: patients in the hosp module.
The patients table
The patients table contains basic demographic information. Each row corresponds to a single patient, identified by their subject_id.
subject_id is a unique identifier which specifies an individual patient. Any rows associated with a single subject_id pertain to the same individual.
Your first query
Let’s count how many patients are in the database:
SELECT COUNT(*) AS total_patients
FROM `physionet-data.mimiciv_hosp.patients`;
This example uses BigQuery syntax. Adjust the table name format for your platform.
Expected result
You should see roughly 300,000+ patients in MIMIC-IV.
Exploring patient demographics
Let’s look at the gender distribution:
SELECT
gender,
COUNT(*) AS count,
ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER(), 2) AS percentage
FROM `physionet-data.mimiciv_hosp.patients`
GROUP BY gender
ORDER BY count DESC;
Adding hospital admissions
Now let’s join with the admissions table to see admission patterns:
SELECT
p.gender,
COUNT(DISTINCT a.hadm_id) AS total_admissions,
COUNT(DISTINCT p.subject_id) AS unique_patients,
ROUND(COUNT(DISTINCT a.hadm_id) / COUNT(DISTINCT p.subject_id), 2) AS avg_admissions_per_patient
FROM `physionet-data.mimiciv_hosp.patients` p
JOIN `physionet-data.mimiciv_hosp.admissions` a
ON p.subject_id = a.subject_id
GROUP BY p.gender;
Key concepts
-
subject_idis a unique identifier which specifies an individual patient. Any rows associated with a singlesubject_idpertain to the same individual. -
hadm_idis an integer identifier which is unique for each patient hospitalization.
Next steps
Now that you’ve run your first queries:
- Explore other tables - Try querying ICU stays
- Learn the schema - Review the schema overview
- Try BigQuery - Read the longer BigQuery tutorial
Common issues
Query timeout
If your query times out, try adding LIMIT 1000 to test on a smaller dataset first.
Permission errors
Ensure you’ve properly signed the data use agreement for the modules you’re querying.
Different platforms
- PostgreSQL: Remove backticks and use
schema.tableformat - AWS: Use appropriate S3 bucket references
- Local: Adjust paths to your local database
Well done! You’ve successfully run your first MIMIC queries. Understanding these basic patterns will help you tackle more complex analyses.