MIMIC is available for use via two cloud platforms: Google Cloud Platform (GCP) and Amazon Web Services (AWS). Access to these services is directly controlled via your PhysioNet account.
In order to use MIMIC on the cloud, you must:
- Be an approved user on PhysioNet. Read this page for instructions on gaining access to MIMIC-III.
- Add cloud credentials to your PhysioNet profile
- Request access on the MIMIC-III PhysioNet project page
Adding cloud credentials
Go to your PhysioNet profile page.
For GCP access, ensure that one of your e-mails is a Google account. This can be either a Gmail account or a G Suite account, if your organization uses G Suite. You can add an e-mail at the bottom of the page.
You will need to verify your e-mail address before continuing (note: e-mail addresses are only used for GCP access, and not for AWS access).
Once you have a verified e-mail address ready, navigate to the “Cloud” page on PhysioNet.
You should see two options on this page: one for GCP, and one for AWS.
For GCP, click the drop down menu and set your GCP e-mail to the Google account you provided in the earlier step.
For AWS, add your AWS account ID. This is not your e-mail. It is a numeric identifier that can be found in your AWS cloud profile. Click here to go to your AWS profile page. Then look for your “Account Id”:
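Before pasting the value into PhysioNet, it can help to confirm that it really is the numeric “Account Id” and not an e-mail address or some other identifier. The sketch below is illustrative only: it assumes the identifier is a standard 12-digit AWS account ID, and the function names are hypothetical helpers, not part of PhysioNet or AWS tooling. The optional lookup uses the `boto3` STS client and assumes AWS credentials are already configured locally.

```python
import re


def looks_like_aws_account_id(value: str) -> bool:
    """Return True if value is exactly 12 ASCII digits (the AWS account ID format)."""
    return re.fullmatch(r"[0-9]{12}", value.strip()) is not None


def current_aws_account_id() -> str:
    """Ask AWS STS which account the locally configured credentials belong to.

    Assumption: boto3 is installed and credentials are configured.
    Nothing here runs until the function is called.
    """
    import boto3  # imported lazily so the checker above works without boto3

    return boto3.client("sts").get_caller_identity()["Account"]
```

An e-mail address or a truncated number fails the check, which catches the most common copy-and-paste mistakes before the ID is saved to the profile.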
Accessing a project on the cloud
Now that your cloud credentials are available in PhysioNet, you can request access to databases within those cloud systems. Cloud access to each PhysioNet project, such as MIMIC-III and eICU-CRD, is managed independently. You must request access to the cloud systems via the individual project pages (access is provisioned instantly for credentialed, approved users).
Accessing MIMIC-III on the cloud
For MIMIC-III, go to the MIMIC-III PhysioNet project page.
Once there, scroll to the bottom to the “Files” section. If the page shows a restricted-access warning, you need to get access to MIMIC-III or sign the data use agreement for this project. Otherwise, you should see the following:
The following describes the access options listed above in the order they are listed:
- Downloading the data as one large zip file
- This downloads the data directly from the PhysioNet servers.
- Cloud: Adds your GCP e-mail to the access list for GCP BigQuery.
- This option adds the GCP e-mail in your PhysioNet account to a BigQuery access list; it’s required in order to use the data in BigQuery.
- Cloud: Adds your GCP e-mail to the access list for downloading the data from a GCP Storage Bucket.
- This option adds the GCP e-mail in your PhysioNet account to a GCP access list; it’s required in order to download the data from a storage bucket on GCP.
- Cloud: A public page for viewing the data description in the AWS Open Data Repository.
- This forwards you to the AWS Open Data Repository listing of the data. For information on how to use AWS, we recommend reading this tutorial.
- Cloud: Adds your AWS account ID to the access list for AWS.
- This is necessary in order to access the data via AWS services. For information on how to use AWS, we recommend reading this tutorial.
- Provides a command for downloading the data from PhysioNet as individual CSV files using wget (when compared to the image above, your command will have a distinct username).
- This downloads the data directly from PhysioNet servers, but in their raw (usually uncompressed) form.
Options #1, #3, #4, and #6 all provide the ability to download the data locally. For the remainder of this guide, we will focus on the two options which provide access to the data in a cloud-based relational database (#2 and #5 in the above).
GCP - BigQuery
BigQuery is a columnar, distributed relational database management system. BigQuery accesses only the columns specified in the query, making it ideal for data analysis workflows. Read more about BigQuery on Google.
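Because only the referenced columns are read, naming the columns you need instead of using `SELECT *` generally reduces the amount of data scanned. The following is a minimal sketch of how that could be checked from Python with the `google-cloud-bigquery` client and a dry-run query, which reports the bytes a query would process without executing it. The helper names and the `billing_project` argument are assumptions for illustration; the table name is the one used later in this guide.

```python
def select_columns_query(table: str, columns: list) -> str:
    """Build a query that names its columns explicitly instead of using SELECT *."""
    if not columns:
        raise ValueError("name at least one column")
    return f"SELECT {', '.join(columns)} FROM `{table}`"


def estimate_bytes_scanned(sql: str, billing_project: str) -> int:
    """Dry-run the query and return the number of bytes BigQuery would process.

    Assumptions: google-cloud-bigquery is installed, you are authenticated with
    the Google account registered on PhysioNet, and billing_project is the ID
    of your own GCP project (hypothetical here).
    """
    from google.cloud import bigquery  # lazy import: only needed for the dry run

    client = bigquery.Client(project=billing_project)
    config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    job = client.query(sql, job_config=config)
    return job.total_bytes_processed


# Example (needs credentials, so it is left commented out):
# sql = select_columns_query("physionet-data.mimiciii_clinical.icustays", ["icustay_id"])
# print(estimate_bytes_scanned(sql, "my-billing-project"))
```

Comparing the estimate for a single column against the estimate for `SELECT *` on the same table gives a quick sense of how much a narrower query saves.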
Once you have requested access to MIMIC-III on BigQuery, we recommend “pinning” the project so that it appears in the sidebar of the BigQuery web console. Pinning is not required, but it makes the data easier to navigate.
- Go to the BigQuery console: http://console.cloud.google.com/bigquery
- On the left sidebar, next to “Resources”, click “+ ADD DATA”, followed by “Pin a project”
- In the pop up window, type physionet-data, and click “PIN”.
- In the sidebar on the left, you should now see the physionet-data project. Click the arrow to the left of physionet-data to expand the project.
- You should now see the datasets in this project, including mimiciii_derived. You are ready to query the data! Try a simple query in the main dialogue box:
SELECT * FROM `physionet-data.mimiciii_clinical.icustays` WHERE icustay_id < 200100 ORDER BY icustay_id
The query should return some data, and your browser window should be similar to the below:
At this point you are ready to use MIMIC on BigQuery!
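The same query can also be run outside the browser. The sketch below shows one possible way to do this with the `google-cloud-bigquery` Python client; it is not taken from the MIMIC documentation. It assumes the library is installed, that you are authenticated as the Google account registered on PhysioNet, and that `billing_project` is the ID of your own GCP project (a hypothetical value in the usage comment).

```python
ICUSTAYS_TABLE = "physionet-data.mimiciii_clinical.icustays"


def icustays_query(max_icustay_id: int = 200100) -> str:
    """Rebuild the example query from this guide for a given upper bound."""
    return (
        f"SELECT * FROM `{ICUSTAYS_TABLE}` "
        f"WHERE icustay_id < {int(max_icustay_id)} "
        "ORDER BY icustay_id"
    )


def run_query(sql: str, billing_project: str) -> list:
    """Run a query, billing it to your own project, and return rows as dicts."""
    from google.cloud import bigquery  # lazy import: needs google-cloud-bigquery

    client = bigquery.Client(project=billing_project)
    rows = client.query(sql).result()  # blocks until the query finishes
    return [dict(row.items()) for row in rows]


# Example (needs credentials and access, so it is left commented out):
# for row in run_query(icustays_query(), "my-billing-project"):
#     print(row["icustay_id"])
```

Note that the data lives in the physionet-data project, while the query is billed to the project passed to the client.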
A tutorial on using BigQuery to query MIMIC-III is available here.
Note that we have a number of pre-generated “views” of the data. These are available in the mimiciii_derived dataset, which you are free to query. All code used to generate these views has been made openly available in the google-cloud-views branch of the MIMIC code repository.
If you are having issues, see the Troubleshooting section.
AWS
Recently, the MIT Laboratory of Computational Physiology (LCP) started hosting the MIMIC-III dataset on the AWS cloud through the AWS Public Dataset program. You can now use the MIMIC-III dataset via S3 without having to download, copy, or pay to store it. Instead, you can analyze the MIMIC-III dataset in the AWS Cloud using AWS services like Amazon EC2, Athena, AWS Lambda, or Amazon EMR. AWS Cloud availability enables quicker and cheaper research into the dataset.
Services like Athena also offer you new analytical approaches to the MIMIC-III dataset. Using Athena, you can execute standard SQL queries against MIMIC-III without first loading the data into a database. Because you can reference the MIMIC-III dataset hosted by MIT LCP in Amazon S3, your analyses always reference the most recent version of the MIMIC-III dataset. Live hosting reduces upfront time and effort, eliminates data synchronization issues, improves data analysis, and reduces overall study costs.
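As a rough illustration of what “executing standard SQL with Athena” looks like in practice, the sketch below submits a query through the `boto3` Athena client. Everything specific in it is an assumption: the database name, the table, the S3 location for results, and the region are hypothetical placeholders that depend on how Athena has been set up in your own AWS account, and the helper names are not part of any MIMIC tooling.

```python
def athena_query_request(sql: str, database: str, output_location: str) -> dict:
    """Build the keyword arguments for Athena's start_query_execution call."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        # Athena writes query results to an S3 location that you control.
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def start_athena_query(request: dict, region: str = "us-east-1") -> str:
    """Submit the query and return its execution ID.

    Assumptions: boto3 is installed, AWS credentials are configured, and the
    region (hypothetical default) matches where your Athena tables are defined.
    """
    import boto3  # lazy import: only needed when actually submitting

    athena = boto3.client("athena", region_name=region)
    return athena.start_query_execution(**request)["QueryExecutionId"]


# Example with placeholder names (left commented out; needs AWS access):
# req = athena_query_request(
#     "SELECT * FROM icustays LIMIT 10",       # hypothetical table name
#     "mimiciii",                              # hypothetical Athena database
#     "s3://example-bucket/athena-results/",   # hypothetical results bucket
# )
# print(start_athena_query(req))
```

The query runs asynchronously, so the returned execution ID is what you would use to poll for completion and fetch results.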
Once you have successfully requested access to MIMIC-III on AWS, you can follow the instructions linked below. These instructions initialize and execute an entire study performed on MIMIC-III using a hosted Jupyter notebook service on AWS.
Troubleshooting
I get a pop-up about Terms of Service
You will need to agree to all GCP Terms of Service and adhere to their terms in order to use the data on BigQuery.
When I go to BigQuery, it asks me to create a project
Almost all of your interactions with GCP are associated with a project. Importantly, all billing for your usage must be allotted to a single project. In order to use BigQuery you must have an active project associated with your account. BigQuery offers a $300 free trial for first time users.
Create a project and select it as your active project. If you’ve done this correctly, then the top bar of the Google console page should stop saying “Select a project”, and instead show your project name. For example, in the below, I have selected the project alistairewj, which is now the active project:
I can only see
These datasets are fully public, so the implication is that you have not been granted access to the full versions of the databases. Please double check that you have (1) entered your cloud information into your PhysioNet profile, verifying any e-mails as needed, and (2) requested access to the specific cloud project on its respective PhysioNet project page.
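One way to check what your account can actually see is to list the datasets visible in the physionet-data project and compare them with the ones you expect. The sketch below is a hypothetical diagnostic, not an official tool: the expected names are the two datasets mentioned in this guide, and `billing_project` stands for the ID of your own GCP project.

```python
EXPECTED_DATASETS = ("mimiciii_clinical", "mimiciii_derived")


def missing_datasets(visible, expected=EXPECTED_DATASETS) -> list:
    """Return the expected dataset names that are absent from the visible ones."""
    return sorted(set(expected) - set(visible))


def visible_dataset_ids(billing_project: str, data_project: str = "physionet-data") -> list:
    """List the dataset IDs your credentials can see in the data project.

    Assumptions: google-cloud-bigquery is installed and you are authenticated
    with the Google account registered on PhysioNet.
    """
    from google.cloud import bigquery  # lazy import: only needed for the listing

    client = bigquery.Client(project=billing_project)
    return [ds.dataset_id for ds in client.list_datasets(project=data_project)]


# Example (needs credentials, so it is left commented out):
# print(missing_datasets(visible_dataset_ids("my-billing-project")))
```

If the result is not empty, access has most likely not been granted yet, and the two checks described above are the place to start.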