Your first MIMIC query

Table of contents

  1. Prerequisites
  2. Understanding the basic structure
    1. The patients table
  3. Your first query
    1. Expected result
  4. Exploring patient demographics
  5. Adding hospital admissions
    1. Key concepts
  6. Next steps
  7. Common issues
    1. Query timeout
    2. Permission errors
    3. Different platforms

Learn how to write your first query against the MIMIC-IV database. This tutorial assumes you have already gained access to MIMIC data.

Prerequisites

  • Access to MIMIC data (see Getting Started)
  • Basic SQL knowledge
  • Access to a query environment (BigQuery, PostgreSQL, etc.)

Understanding the basic structure

MIMIC-IV is organized into modules. Let’s start with the most fundamental table: patients in the hosp module.

The patients table

The patients table contains basic demographic information. Each row corresponds to a single patient, identified by their subject_id.

subject_id is a unique identifier which specifies an individual patient. Any rows associated with a single subject_id pertain to the same individual.

Your first query

Let’s count how many patients are in the database:

SELECT COUNT(*) AS total_patients
FROM `physionet-data.mimiciv_hosp.patients`;

This example uses BigQuery syntax. Adjust the table name format for your platform.

Expected result

You should see roughly 300,000+ patients in MIMIC-IV.

Exploring patient demographics

Let’s look at the gender distribution:

SELECT
    gender,
    COUNT(*) AS count,
    ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER(), 2) AS percentage
FROM `physionet-data.mimiciv_hosp.patients`
GROUP BY gender
ORDER BY count DESC;

Adding hospital admissions

Now let’s join with the admissions table to see admission patterns:

SELECT
    p.gender,
    COUNT(DISTINCT a.hadm_id) AS total_admissions,
    COUNT(DISTINCT p.subject_id) AS unique_patients,
    ROUND(COUNT(DISTINCT a.hadm_id) / COUNT(DISTINCT p.subject_id), 2) AS avg_admissions_per_patient
FROM `physionet-data.mimiciv_hosp.patients` p
JOIN `physionet-data.mimiciv_hosp.admissions` a
    ON p.subject_id = a.subject_id
GROUP BY p.gender;

Key concepts

  • subject_id is a unique identifier which specifies an individual patient. Any rows associated with a single subject_id pertain to the same individual.

  • hadm_id is an integer identifier which is unique for each patient hospitalization.

Next steps

Now that you’ve run your first queries:

  1. Explore other tables - Try querying ICU stays
  2. Learn the schema - Review the schema overview
  3. Try BigQuery - Read the longer BigQuery tutorial

Common issues

Query timeout

If your query times out, try adding LIMIT 1000 to test on a smaller dataset first.

Permission errors

Ensure you’ve properly signed the data use agreement for the modules you’re querying.

Different platforms

  • PostgreSQL: Remove backticks and use schema.table format
  • AWS: Use appropriate S3 bucket references
  • Local: Adjust paths to your local database

Well done! You’ve successfully run your first MIMIC queries. Understanding these basic patterns will help you tackle more complex analyses.