Querying MIMIC-III in Postgres

Prerequisites: This tutorial assumes that you have an active connection to an instance of MIMIC-III running on PostgreSQL. Note also that all the queries are written assuming that the MIMIC-III database is on your default search path. To change this in PostgreSQL, run the following command:

set search_path to mimiciii;

Note in the above we have assumed MIMIC-III is installed under the schema mimiciii - it may be different on your system.

1. Overview

This tutorial provides an introduction to the structure and content of the MIMIC-III database. By the end of this tutorial you will be able to:

  • Obtain metadata from the various database objects (tables, views, etc);
  • Perform basic queries on a single table;
  • Perform basic ‘joins’ to combine tables and extract useful information;
  • Use database ‘views’ to extract high-level information.

2. Database metadata

Metadata for a particular table can be obtained in Postgres with \d+ <schema>.<tablename>. For example, the following command prints metadata for the admissions table:

\d+ MIMICIII.ADMISSIONS;

-- returns the following metadata

        Column        |              Type              | Modifiers | ...
----------------------+--------------------------------+-----------+ ...
 row_id               | integer                        | not null  | ...
 subject_id           | integer                        | not null  | ...
 hadm_id              | integer                        | not null  | ...
 admittime            | timestamp(0) without time zone | not null  | ...
 dischtime            | timestamp(0) without time zone | not null  | ...
 ...

Try looking at the metadata for other tables such as patients and chartevents.

3. Patient numbers

Ensure that the ‘Query…’ tab at the top of the screen is selected. SQL queries can be entered in the top panel and the results will be displayed at the bottom when the ‘Execute query’ button is pressed. Enter the following SQL in the box and press the ‘Execute query’ button.

SELECT *
FROM patients;

At the bottom of the screen you will see three columns: subject_id, gender, and date of birth. 50 records are retrieved at a time and you can page through the results using the controls at the bottom of the screen.

Obtain the number of patients with the following query:

SELECT COUNT(*)
FROM patients;

The ‘gender’ column identifies the gender of the patient. We can obtain the distinct values used to indicate gender using the following query:

SELECT DISTINCT(gender)
FROM patients;

We can see that ‘M’ and ‘F’ are the two characters used to indicate patient gender. We can use this information to obtain the number of female patients by adding a condition to select rows where the gender is ‘F’:

SELECT COUNT(*)
FROM patients
WHERE gender = 'F';

And the numbers of male and female patients can be obtained using the following query:

SELECT gender, COUNT(*)
FROM patients
GROUP BY gender;

4. Mortality and admissions

A flag which records whether or not a patient died in the hospital is stored in the patients table. Count the number of patients who died using the following query:

SELECT expire_flag, COUNT(*)
FROM patients
GROUP BY expire_flag;

The database also contains date of death for patients who died inside the hospital in the column ‘dod_hosp’ and the date of death found in social security death records in ‘dod_ssn’. This information from both columns is merged in the ‘dod’ column with priority given to ‘dod_hosp’. Note that this database contains adult and neonatal patients which will affect the mortality statistics. Categorizing patients into different age groups is carried out in the next section.

5. Patient age and mortality

To determine the adult mortality rate we must first select the adult patients. We define adults as those patients who are 15 years old or above at the date of their first admission. To perform this query, we must combine the patients and admissions tables to find patient admission dates and dates of birth. We have denoted ‘admissions’ with the alias ‘a’ and ‘patients’ with alias ‘p’:

SELECT p.subject_id, p.dob, a.hadm_id,
    a.admittime, p.expire_flag
FROM admissions a
INNER JOIN patients p
ON p.subject_id = a.subject_id;

Next we find the earliest admission date for each patient. This requires the use of two functions: the ‘MIN’ function, which obtains the minimum value, and the ‘PARTITION BY’ function, which determines the groups over which the minimum value is obtained. To determine the earliest admission time for each patient:

SELECT p.subject_id, p.dob, a.hadm_id,
    a.admittime, p.expire_flag,
    MIN (a.admittime) OVER (PARTITION BY p.subject_id) AS first_admittime
FROM admissions a
INNER JOIN patients p
ON p.subject_id = a.subject_id
ORDER BY a.hadm_id, p.subject_id;

A patient’s age is given by the difference between their date of birth and the date of their first admission. We can obtain this by combining the above query with another query to provide the ages. Furthermore, we assign categories to different ages: >= 15 years old are adults and the rest are assigned to other categories. Note the use of the WITH clause, which allows us to make a temporary view which we can query against in subsequent lines.


WITH first_admission_time AS
(
  SELECT
      p.subject_id, p.dob, p.gender
      , MIN (a.admittime) AS first_admittime
      , MIN( ROUND( (cast(admittime as date) - cast(dob as date)) / 365.242,2) )
          AS first_admit_age
  FROM patients p
  INNER JOIN admissions a
  ON p.subject_id = a.subject_id
  GROUP BY p.subject_id, p.dob, p.gender
  ORDER BY p.subject_id
)
SELECT
    subject_id, dob, gender
    , first_admittime, first_admit_age
    , CASE
        -- all ages > 89 in the database were replaced with 300
        WHEN first_admit_age > 89
            then '>89'
        WHEN first_admit_age >= 14
            THEN 'adult'
        WHEN first_admit_age <= 1
            THEN 'neonate'
        ELSE 'middle'
        END AS age_group
FROM first_admission_time
ORDER BY subject_id

The above query can now be combined with the WHERE and COUNT functions described earlier to determine the number of adult patients, whether or not they died, and therefore, their mortality rate.

WITH first_admission_time AS
(
  SELECT
      p.subject_id, p.dob, p.gender
      , MIN (a.admittime) AS first_admittime
      , MIN( ROUND( (cast(admittime as date) - cast(dob as date)) / 365.242,2) )
          AS first_admit_age
  FROM patients p
  INNER JOIN admissions a
  ON p.subject_id = a.subject_id
  GROUP BY p.subject_id, p.dob, p.gender
  ORDER BY p.subject_id
)
, age as
(
  SELECT
      subject_id, dob, gender
      , first_admittime, first_admit_age
      , CASE
          -- all ages > 89 in the database were replaced with 300
          -- we check using > 100 as a conservative threshold to ensure we capture all these patients
          WHEN first_admit_age > 100
              then '>89'
          WHEN first_admit_age >= 14
              THEN 'adult'
          WHEN first_admit_age <= 1
              THEN 'neonate'
          ELSE 'middle'
          END AS age_group
  FROM first_admission_time
)
select age_group, gender
  , count(subject_id) as NumberOfPatients
from age
group by age_group, gender

Note that no ‘middle’ patients show up - this reflects the fact that MIMIC-III does not contain data from paediatric patients.

6. ICU stays

In the MIMIC-III database, we define an ICU stay to be continuous if a patient is returned to an ICU room within 24 hours of being moved to a ward. Patient ICU movements are recorded in the transfers table:

SELECT *
FROM transfers;

The columns should be fairly self explanatory. Click on the transfers table on the left hand side if you need more information about the columns and the data they contain. The ‘prev_careunit’ and ‘curr_careunit’ contain the names of the previous and current careunit respectively. The transfers table also includes ‘prev_wardid’ and ‘curr_wardid’ columns, which contain the IDs of the previous and current careunit respectively. Ward IDs, which specify the room within a ward, have no corresponding key in order to protect patient health information.

The transfers table may have multiple entries per patient to provide detail of all movement between various careunits of the hospital. The first entry in the transfers table for a patient who comes into the ICU will have nothing in the ‘prev_careunit’ column. Similarly, the last entry for a patient will have nothing in the ‘curr_careunit’. Entries that have nothing in both previous and current careunit columns indicate that the patient has been transfered between non intensive care units. An example query for one patient and result from the transfers table is shown below. Note that columns ‘intime’, ‘outtime’, and ‘los’ have been truncated.

SELECT *
FROM transfers
WHERE HADM_ID = 112213;
row_id subject_id hadm_id icustay_id
54 12 112213
55 12 112213
56 12 112213 232669
57 12 112213
58 12 112213 232669
59 12 112213
60 12 112213

7. Services

Services is a newly added table in MIMIC-III which contains information about the transfers from being under one service to another during a patient’s stay. The services table contains columns including ‘prev_service’ and ‘curr_service’ which contain the names of previous and current services respectively. ‘transfertime’ is the time at which the patient was moved from ‘prev_service’ to ‘curr_service’.

8. Tutorial problem

How would you gather useful information about patients admitted to the ICU? The problem can be broken down into several parts:

Step 1

First start with retrieving ‘subject_id’, ‘hadm_id’, ‘icustay_id’, ‘intime’, and ‘outtime’ from the ‘icustays’ table.

Step 2

Using the patients table retrieve the calculated age of patients.

Step 3

Separate neonates from adult patients.

Step 4

By incorporating the admissions table, find how long each stay was BEFORE the patients were admitted to the ICU

Step 5

Next find the date of the patient’s death if applicable.

Step 6

Then find those deaths that occurred while the patients were in the hospital.

Step 7

Find how many of those deaths occurred within the ICU.

Solutions to the problems in section 8

Solution to step 1

SELECT ie.subject_id, ie.hadm_id, ie.icustay_id,
    ie.intime, ie.outtime
FROM icustays ie;

Solution to step 2

SELECT ie.subject_id, ie.hadm_id, ie.icustay_id,
    ie.intime, ie.outtime,
    ROUND((cast(ie.intime as date) - cast(pat.dob as date))/365.242, 2) AS age
FROM icustays ie
INNER JOIN patients pat
ON ie.subject_id = pat.subject_id;

Solution to step 3

SELECT ie.subject_id, ie.hadm_id, ie.icustay_id,
    ie.intime, ie.outtime,
    ROUND((cast(ie.intime as date) - cast(pat.dob as date))/365.242, 2) AS age,
    CASE
        WHEN ROUND((cast(ie.intime as date) - cast(pat.dob as date))/365.242, 2) <= 1
            THEN 'neonate'
        WHEN ROUND((cast(ie.intime as date) - cast(pat.dob as date))/365.242, 2) <= 14
            THEN 'middle'
        -- all ages > 89 in the database were replaced with 300
        WHEN ROUND((cast(ie.intime as date) - cast(pat.dob as date))/365.242, 2) > 100
            then '>89'
        ELSE 'adult'
        END AS ICUSTAY_AGE_GROUP
FROM icustays ie
INNER JOIN patients pat
ON ie.subject_id = pat.subject_id;

Solution to step 4

SELECT ie.subject_id, ie.hadm_id, ie.icustay_id,
    ie.intime, ie.outtime,
    ROUND((cast(ie.intime as date) - cast(pat.dob as date))/365.242, 2) as age,
    ROUND((cast(ie.intime as date) - cast(adm.admittime as date))/365.242, 2) as preiculos,
    CASE
        WHEN ROUND((cast(ie.intime as date) - cast(pat.dob as date))/365.242, 2) <= 1
            THEN 'neonate'
        WHEN ROUND((cast(ie.intime as date) - cast(pat.dob as date))/365.242, 2) <= 14
            THEN 'middle'
        -- all ages > 89 in the database were replaced with 300
        WHEN ROUND((cast(ie.intime as date) - cast(pat.dob as date))/365.242, 2) > 100
            THEN '>89'
        ELSE 'adult'
        END AS ICUSTAY_AGE_GROUP
FROM icustays ie
INNER JOIN patients pat
ON ie.subject_id = pat.subject_id
INNER JOIN admissions adm
ON ie.hadm_id = adm.hadm_id;

Solution to step 5

SELECT ie.subject_id, ie.hadm_id, ie.icustay_id,
    ie.intime, ie.outtime, adm.deathtime,
    ROUND((cast(ie.intime as date) - cast(pat.dob as date))/365.242, 2) as age,
    ROUND((cast(ie.intime as date) - cast(adm.admittime as date))/365.242, 2) AS preiculos,
    CASE
        WHEN ROUND((cast(ie.intime as date) - cast(pat.dob as date))/365.242, 2) <= 1
            THEN 'neonate'
        WHEN ROUND((cast(ie.intime as date) - cast(pat.dob as date))/365.242, 2) <= 14
            THEN 'middle'
        -- all ages > 89 in the database were replaced with 300
        WHEN ROUND((cast(ie.intime as date) - cast(pat.dob as date))/365.242, 2) > 100
            THEN '>89'
        ELSE 'adult'
        END AS ICUSTAY_AGE_GROUP
FROM icustays ie
INNER JOIN patients pat
ON ie.subject_id = pat.subject_id
INNER JOIN admissions adm
ON ie.hadm_id = adm.hadm_id;

Solution to step 6

SELECT ie.subject_id, ie.hadm_id, ie.icustay_id,
    ie.intime, ie.outtime, adm.deathtime,
    ROUND((cast(ie.intime as date) - cast(pat.dob as date))/365.242, 2) AS age,
    ROUND((cast(ie.intime as date) - cast(adm.admittime as date))/365.242, 2) AS preiculos,
    CASE
        WHEN ROUND((cast(ie.intime as date) - cast(pat.dob as date))/365.242, 2) <= 1
            THEN 'neonate'
        WHEN ROUND((cast(ie.intime as date) - cast(pat.dob as date))/365.242, 2) <= 14
            THEN 'middle'
        -- all ages > 89 in the database were replaced with 300
        WHEN ROUND((cast(ie.intime as date) - cast(pat.dob as date))/365.242, 2) > 100
            THEN '>89'
        ELSE 'adult'
        END AS ICUSTAY_AGE_GROUP,
    -- note that there is already a "hospital_expire_flag" field in the admissions table which you could use
    CASE
        WHEN adm.hospital_expire_flag = 1 then 'Y'
    ELSE 'N'
    END AS hospital_expire_flag
FROM icustays ie
INNER JOIN patients pat
ON ie.subject_id = pat.subject_id
INNER JOIN admissions adm
ON ie.hadm_id = adm.hadm_id;

Solution to step 7

SELECT ie.subject_id, ie.hadm_id, ie.icustay_id,
    ie.intime, ie.outtime, adm.deathtime,
    ROUND((cast(ie.intime as date) - cast(pat.dob as date))/365.242, 2) AS age,
    ROUND((cast(ie.intime as date) - cast(adm.admittime as date))/365.242, 2) AS preiculos,
    CASE
        WHEN ROUND((cast(ie.intime as date) - cast(pat.dob as date))/365.242, 2) <= 1
            THEN 'neonate'
        WHEN ROUND((cast(ie.intime as date) - cast(pat.dob as date))/365.242, 2) <= 14
            THEN 'middle'
        -- all ages > 89 in the database were replaced with 300
        WHEN ROUND((cast(ie.intime as date) - cast(pat.dob as date))/365.242, 2) > 100
            THEN '>89'
        ELSE 'adult'
        END AS ICUSTAY_AGE_GROUP,
    -- note that there is already a "hospital_expire_flag" field in the admissions table which you could use
    CASE
        WHEN adm.hospital_expire_flag = 1 then 'Y'           
    ELSE 'N'
    END AS hospital_expire_flag,
    -- note also that hospital_expire_flag is equivalent to "Is adm.deathtime not null?"
    CASE
        WHEN adm.deathtime BETWEEN ie.intime and ie.outtime
            THEN 'Y'
        -- sometimes there are typographical errors in the death date, so check before intime
        WHEN adm.deathtime <= ie.intime
            THEN 'Y'
        WHEN adm.dischtime <= ie.outtime
            AND adm.discharge_location = 'DEAD/EXPIRED'
            THEN 'Y'
        ELSE 'N'
        END AS ICUSTAY_EXPIRE_FLAG
FROM icustays ie
INNER JOIN patients pat
ON ie.subject_id = pat.subject_id
INNER JOIN admissions adm
ON ie.hadm_id = adm.hadm_id;