MIMIC-IV schema overview
Table of contents
- Modular design
- Core identifier system
- Data flow through the hospital
- Module characteristics
- Data relationships
- Temporal considerations
- Common analysis patterns
- Best practices
Understanding the overall structure and organization of the MIMIC-IV database is crucial for effective analysis.
Modular design
MIMIC-IV uses a modular design. The hospital (hosp) module contains data acquired from the hospital-wide electronic health record (EHR). The ICU (icu) module contains data from the clinical information system used within the ICU. Additional modules (ED, CXR, ECG, Note) extend MIMIC-IV with data from other systems.
Core identifier system
The database uses a hierarchical identifier system:
Patient level
subject_id is a unique identifier which specifies an individual patient. Any rows associated with a single subject_id pertain to the same individual.
Hospital admission level
hadm_id is an integer identifier which is unique for each patient hospitalization.
Unit stay level
stay_id is an integer which uniquely identifies a stay in a specific department within the hospital. It is used to delineate contiguous ICU or ED stays. Because stay_id is generated from the earliest transfer_id for the department (e.g. the first transfer_id for an ICU stay), stay_id and transfer_id are often identical for the same ward stay.
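To see the hierarchy in the data, the query below is a minimal sketch that counts hospital admissions and ICU stays per patient. It assumes BigQuery access to `physionet-data` with the same `mimiciv_hosp` and `mimiciv_icu` dataset names used in the cross-module example on this page; the output column aliases are illustrative.

```sql
-- One row per patient: how many admissions (hadm_id) and ICU stays (stay_id)
-- sit under each subject_id. LEFT JOIN keeps admissions with no ICU stay.
SELECT
  a.subject_id,
  COUNT(DISTINCT a.hadm_id) AS n_admissions,
  COUNT(DISTINCT i.stay_id) AS n_icu_stays  -- NULLs (no ICU stay) are not counted
FROM `physionet-data.mimiciv_hosp.admissions` a
LEFT JOIN `physionet-data.mimiciv_icu.icustays` i
  ON a.hadm_id = i.hadm_id
GROUP BY a.subject_id
ORDER BY n_admissions DESC
LIMIT 10;
```

Because both counts use DISTINCT, a patient with three admissions of which only one included an ICU stay shows three admissions but only the ICU stays belonging to that one admission.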
Data flow through the hospital
Following how patients move through the hospital helps explain how the data are structured:
- Patient arrives → subject_id assigned
- Hospital admission → hadm_id assigned
- Unit transfer → stay_id assigned (ICU, ED, etc.)
- Data collection → events recorded with the appropriate IDs
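The flow above can be traced for a single patient in the hosp `transfers` table, which records each ward movement with its `transfer_id`. A minimal sketch, assuming the same BigQuery dataset names as the cross-module example on this page; the patient is picked arbitrarily with `LIMIT 1`.

```sql
-- Pick one arbitrary patient, then list every transfer event in time order.
WITH one_patient AS (
  SELECT subject_id
  FROM `physionet-data.mimiciv_hosp.transfers`
  LIMIT 1  -- arbitrary choice, for illustration only
)
SELECT
  t.subject_id,
  t.hadm_id,      -- may be NULL, e.g. for an ED visit without a hospital admission
  t.transfer_id,
  t.eventtype,    -- e.g. ED, admit, transfer, discharge
  t.careunit,
  t.intime,
  t.outtime
FROM `physionet-data.mimiciv_hosp.transfers` t
JOIN one_patient p
  ON t.subject_id = p.subject_id
ORDER BY t.intime;
```

Not every patient passes through every step: an ED visit that does not lead to admission has an ED stay_id but no hadm_id.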
Module characteristics
Hospital (hosp) module
- Purpose: Hospital-wide EHR data
- Key tables: patients, admissions, transfers, labevents, prescriptions, diagnoses_icd
- Coverage: All patients in MIMIC-IV, including hospitalizations without an ICU stay
- Granularity: Order/event-level
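Because the hosp module is not limited to ICU patients, many admissions have no matching ICU stay. A minimal sketch that quantifies this, under the same dataset-name assumptions as the other examples on this page:

```sql
-- Share of hospital admissions that include at least one ICU stay.
SELECT
  COUNT(*) AS n_admissions,
  COUNTIF(icu.hadm_id IS NOT NULL) AS n_with_icu_stay,
  SAFE_DIVIDE(COUNTIF(icu.hadm_id IS NOT NULL), COUNT(*)) AS frac_with_icu_stay
FROM `physionet-data.mimiciv_hosp.admissions` a
LEFT JOIN (
  -- Deduplicate first so admissions with several ICU stays are counted once
  SELECT DISTINCT hadm_id
  FROM `physionet-data.mimiciv_icu.icustays`
) icu
  ON a.hadm_id = icu.hadm_id;
```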
ICU module
- Purpose: Intensive care monitoring
- Key tables: chartevents, inputevents, outputevents, procedureevents
- Coverage: ICU patients only
- Granularity: Hourly or more frequent
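To check the granularity of ICU charting, the sketch below estimates how many heart rate values are charted per hour of each ICU stay. It reuses itemid 220045 (heart rate) from the cross-module example on this page; the dataset names and the choice of heart rate as the test signal are assumptions for illustration.

```sql
-- Heart rate measurements per hour of ICU stay, for a sample of stays.
SELECT
  i.stay_id,
  COUNT(*) AS n_hr_values,
  DATETIME_DIFF(i.outtime, i.intime, HOUR) AS los_hours,
  -- SAFE_DIVIDE returns NULL instead of failing when los_hours is 0
  SAFE_DIVIDE(COUNT(*), DATETIME_DIFF(i.outtime, i.intime, HOUR)) AS hr_values_per_hour
FROM `physionet-data.mimiciv_icu.icustays` i
JOIN `physionet-data.mimiciv_icu.chartevents` c
  ON i.stay_id = c.stay_id
WHERE c.itemid = 220045  -- Heart rate
GROUP BY i.stay_id, i.intime, i.outtime
LIMIT 20;
```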
Emergency department (ED) module
- Purpose: Emergency department care
- Key tables: edstays, triage, vitalsign, medrecon, pyxis, diagnosis
- Coverage: ED patients only
- Granularity: Visit-level and event-level
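ED stays link to the hosp module through `hadm_id`, which is only populated when the visit led to a hospital admission. A minimal sketch, assuming the ED module is available in BigQuery as `physionet-data.mimiciv_ed` (adjust to your copy):

```sql
-- ED visits by disposition, and how many of them carry a hadm_id.
SELECT
  e.disposition,
  COUNT(*) AS n_ed_stays,
  COUNTIF(e.hadm_id IS NOT NULL) AS n_linked_to_admission
FROM `physionet-data.mimiciv_ed.edstays` e
GROUP BY e.disposition
ORDER BY n_ed_stays DESC;
```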
Note module
- Purpose: De-identified free-text clinical notes
- Key tables: discharge, radiology (and their detail tables)
- Coverage: Subset of hospitalized patients
- Granularity: Note-level
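Notes carry `subject_id` and `hadm_id`, so they join to the hosp module like any other table. A minimal sketch, assuming the Note module is available in BigQuery as `physionet-data.mimiciv_note`; the `text` column is left out to keep the result small.

```sql
-- Discharge notes with the admission they belong to (text column omitted).
SELECT
  n.note_id,
  n.subject_id,
  n.hadm_id,
  a.admittime,
  a.dischtime,
  n.charttime
FROM `physionet-data.mimiciv_note.discharge` n
JOIN `physionet-data.mimiciv_hosp.admissions` a
  ON n.hadm_id = a.hadm_id
LIMIT 10;
```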
CXR module
- Purpose: Chest x-ray images and reports linked to MIMIC-IV
- Key tables: lookup tables linking subject_id to study_id and dicom_id
- Coverage: ED patients with chest radiographs
- Granularity: Study- and image-level
ECG module
- Purpose: Diagnostic 12-lead ECG waveforms and machine measurements
- Key tables: record_list, machine_measurements, waveform_note_links
- Coverage: Subset of patients with ECG recordings
- Granularity: Study-level
Data relationships
One-to-many relationships
- One patient → Many admissions
- One admission → Many diagnoses
- One admission → Many lab results
- One ICU stay → Many vital sign measurements
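A practical consequence of these one-to-many relationships is row fan-out: joining an admission to both its diagnoses and its lab results returns one row per diagnosis-lab pair. A minimal sketch of the usual remedy, aggregating each child table to one row per `hadm_id` before joining (dataset names as in the cross-module example on this page):

```sql
-- Aggregate each one-to-many table separately, then join on hadm_id.
WITH dx AS (
  SELECT hadm_id, COUNT(*) AS n_diagnoses
  FROM `physionet-data.mimiciv_hosp.diagnoses_icd`
  GROUP BY hadm_id
),
labs AS (
  SELECT hadm_id, COUNT(*) AS n_lab_results
  FROM `physionet-data.mimiciv_hosp.labevents`
  WHERE hadm_id IS NOT NULL  -- some lab results are not tied to an admission
  GROUP BY hadm_id
)
SELECT
  a.subject_id,
  a.hadm_id,
  COALESCE(dx.n_diagnoses, 0) AS n_diagnoses,
  COALESCE(labs.n_lab_results, 0) AS n_lab_results
FROM `physionet-data.mimiciv_hosp.admissions` a
LEFT JOIN dx ON a.hadm_id = dx.hadm_id
LEFT JOIN labs ON a.hadm_id = labs.hadm_id
ORDER BY n_diagnoses DESC
LIMIT 10;
```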
Cross-module linking
Patients can be followed across modules using identifiers:
```sql
-- Link patient demographics to ICU heart rate measurements
SELECT p.subject_id, i.stay_id, p.gender, c.charttime, c.valuenum AS heart_rate
FROM `physionet-data.mimiciv_hosp.patients` p
JOIN `physionet-data.mimiciv_hosp.admissions` a ON p.subject_id = a.subject_id
JOIN `physionet-data.mimiciv_icu.icustays` i ON a.hadm_id = i.hadm_id
JOIN `physionet-data.mimiciv_icu.chartevents` c ON i.stay_id = c.stay_id
WHERE c.itemid = 220045 -- Heart rate
LIMIT 100; -- chartevents is very large; limit rows while exploring
```
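The same pattern extends to the ED module. Because each `stay_id` identifies a stay in one specific department, an ED `stay_id` and an ICU `stay_id` refer to different stays, so ED visits are linked to ICU stays through the shared `hadm_id` instead. A minimal sketch, assuming the ED module is available as `physionet-data.mimiciv_ed`:

```sql
-- ED visits that led to an admission with at least one ICU stay.
SELECT
  e.subject_id,
  e.hadm_id,
  e.stay_id AS ed_stay_id,
  i.stay_id AS icu_stay_id,
  e.outtime AS ed_outtime,
  i.intime AS icu_intime
FROM `physionet-data.mimiciv_ed.edstays` e
JOIN `physionet-data.mimiciv_icu.icustays` i
  ON e.hadm_id = i.hadm_id  -- join on hadm_id, not stay_id
ORDER BY e.subject_id, i.intime
LIMIT 10;
```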
Temporal considerations
Time precision
- Hosp: Usually day or hour precision
- ICU: Minute-level precision common
- ED: Varies by event type
For more on how time is represented in MIMIC-IV, including charttime vs. storetime and date shifting, see the Core concepts page.
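Interval arithmetic between timestamps of the same patient is the usual way to align events. A minimal sketch computing the delay from hospital admission to ICU admission, assuming the time columns are BigQuery `DATETIME` values as in the `physionet-data` copy used on this page:

```sql
-- Hours from hospital admission to ICU admission, alongside ICU length of stay.
SELECT
  i.subject_id,
  i.stay_id,
  a.admittime,
  i.intime,
  DATETIME_DIFF(i.intime, a.admittime, HOUR) AS hours_admit_to_icu,
  i.los AS icu_los_days  -- length of stay in fractional days
FROM `physionet-data.mimiciv_icu.icustays` i
JOIN `physionet-data.mimiciv_hosp.admissions` a
  ON i.hadm_id = a.hadm_id
LIMIT 10;
```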
Common analysis patterns
Patient cohort selection
- Start with the patients table for demographics
- Join to admissions for admission criteria
- Add module-specific criteria as needed
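The cohort steps above might look like this for adult patients at their first admission that included an ICU stay. A minimal sketch under stated assumptions: age at admission is approximated from `anchor_age` and `anchor_year` by shifting by calendar years, the 18-year threshold is illustrative, and `first_icu_admission` is a hypothetical CTE name.

```sql
-- Adults (approximate age >= 18) at their first admission that included an ICU stay.
WITH first_icu_admission AS (  -- hypothetical CTE name
  SELECT
    p.subject_id,
    a.hadm_id,
    p.gender,
    -- anchor_age is the age in anchor_year; shift it to the admission year
    p.anchor_age + EXTRACT(YEAR FROM a.admittime) - p.anchor_year AS approx_age,
    a.admission_type,
    ROW_NUMBER() OVER (PARTITION BY p.subject_id ORDER BY a.admittime) AS admission_seq
  FROM `physionet-data.mimiciv_hosp.patients` p
  JOIN `physionet-data.mimiciv_hosp.admissions` a
    ON p.subject_id = a.subject_id
  WHERE a.hadm_id IN (
    SELECT hadm_id FROM `physionet-data.mimiciv_icu.icustays`
  )
)
SELECT subject_id, hadm_id, gender, approx_age, admission_type
FROM first_icu_admission
WHERE admission_seq = 1
  AND approx_age >= 18;
```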
Longitudinal analysis
- Identify patient population
- Extract events from relevant modules
- Align timestamps for temporal analysis
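Lab results live in the hosp module and have no `stay_id`, so aligning them with an ICU stay means joining on `hadm_id` and restricting by time. A minimal sketch pulling creatinine during the first 24 hours of each ICU stay; itemid 50912 is assumed to be creatinine, so confirm it in `d_labitems` for your version.

```sql
-- Creatinine results in the first 24 hours of each ICU stay.
SELECT
  i.stay_id,
  l.charttime,
  l.valuenum AS creatinine,
  l.valueuom,
  DATETIME_DIFF(l.charttime, i.intime, MINUTE) / 60.0 AS hours_since_icu_intime
FROM `physionet-data.mimiciv_icu.icustays` i
JOIN `physionet-data.mimiciv_hosp.labevents` l
  ON i.hadm_id = l.hadm_id
  AND l.charttime >= i.intime
  AND l.charttime < DATETIME_ADD(i.intime, INTERVAL 24 HOUR)
WHERE l.itemid = 50912  -- assumed: creatinine; verify in d_labitems
ORDER BY i.stay_id, l.charttime
LIMIT 100;
```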
Outcome assessment
- Define outcome from appropriate module
- Link back to patient characteristics
- Account for censoring and follow-up
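As a simple outcome example, `admissions.hospital_expire_flag` marks admissions that ended in death in hospital. A minimal sketch that links it back to a patient characteristic (gender); deaths after discharge are only partially observed through `patients.dod`, which is where censoring and follow-up need care.

```sql
-- In-hospital mortality by gender.
SELECT
  p.gender,
  COUNT(*) AS n_admissions,
  AVG(a.hospital_expire_flag) AS in_hospital_mortality  -- flag is 0 or 1
FROM `physionet-data.mimiciv_hosp.admissions` a
JOIN `physionet-data.mimiciv_hosp.patients` p
  ON a.subject_id = p.subject_id
GROUP BY p.gender;
```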
Best practices
Query design
- Always include appropriate time filters
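- Be mindful of data volume in the ICU module, especially chartevents, its largest table
- Join on the identifier columns (subject_id, hadm_id, stay_id); in local database builds, make sure these columns are indexed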
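The query-design points above combine naturally: filter `chartevents` to the items and time window you need before joining, and select only the columns you use. A minimal sketch, with the 6-hour window chosen arbitrarily for illustration:

```sql
-- Minimum and maximum heart rate in the first 6 hours of each ICU stay.
WITH hr AS (
  SELECT stay_id, charttime, valuenum
  FROM `physionet-data.mimiciv_icu.chartevents`
  WHERE itemid = 220045  -- Heart rate
    AND valuenum IS NOT NULL
)
SELECT
  i.stay_id,
  MIN(hr.valuenum) AS hr_min_6h,
  MAX(hr.valuenum) AS hr_max_6h
FROM `physionet-data.mimiciv_icu.icustays` i
JOIN hr
  ON i.stay_id = hr.stay_id
  AND hr.charttime >= i.intime
  AND hr.charttime < DATETIME_ADD(i.intime, INTERVAL 6 HOUR)
GROUP BY i.stay_id
LIMIT 100;
```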
Data validation
- Check for reasonable value ranges
- Validate identifier linkages
- Account for missing data patterns
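Two of these checks can be scripted directly. A minimal sketch: the first query counts heart rate values outside a plausibility range (the 0 to 300 bpm bounds are an illustrative assumption, not a MIMIC-IV convention), and the second looks for ICU stays whose `hadm_id` has no matching admission.

```sql
-- 1) Value ranges: heart rate values outside an assumed plausible range.
SELECT
  COUNT(*) AS n_hr_values,
  COUNTIF(valuenum < 0 OR valuenum > 300) AS n_out_of_range  -- bounds are illustrative
FROM `physionet-data.mimiciv_icu.chartevents`
WHERE itemid = 220045;

-- 2) Identifier linkage: ICU stays with no matching hospital admission.
SELECT COUNT(*) AS n_orphan_icu_stays
FROM `physionet-data.mimiciv_icu.icustays` i
LEFT JOIN `physionet-data.mimiciv_hosp.admissions` a
  ON i.hadm_id = a.hadm_id
WHERE a.hadm_id IS NULL;
```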
This schema overview provides the foundation for understanding MIMIC-IV. Each module has its own detailed documentation with table-specific information.