Working with EMAP star

A template JupyterNotebook for working with EMAP. The following features of this notebook, and associated files are documented here to minimise the risk of data leaks or other incidents.

  • Usernames and passwords are stored in a .env file that is excluded from version control. An example file is found at ./.env.example (which is tracked and shared via version control).
  • .gitattributes are set to strip JupyterNotebook cells when pushing to GitHub

NOTE

This notebook will be be run manually.
That should happen from the project root directory where the readme.md and the .env file are stored.

The following changes to the project root assuming that the notebook kernel is normally starting from the same directory as the notebook itself.

%cd ../..

Basic set-up

Load libraries

import os
from dotenv import load_dotenv
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sqlalchemy import create_engine

Load environment variables

Load environment variables and set-up SQLAlchemy connection engine for the EMAP Star

load_dotenv(dotenv_path=".env")
try:
    assert os.getenv("DOTENV_FILE_EXISTS") == 'TRUE'
except AssertionError:
    print("!!! ERROR: check that the .env file exists at the top level of the project")
    print("!!! ERROR: check that the relative path is correct")
# Construct the PostgreSQL connection
uds_host = os.getenv('EMAP_DB_HOST')
uds_name = os.getenv('EMAP_DB_NAME')
uds_port = os.getenv('EMAP_DB_PORT')
uds_user = os.getenv('EMAP_DB_USER')
uds_passwd = os.getenv('EMAP_DB_PASSWORD')

emapdb_engine = create_engine(f'postgresql://{uds_user}:{uds_passwd}@{uds_host}:{uds_port}/{uds_name}')

A first example script

Now use the connection to work with EMAP.

For example, let’s inspect patients currently in ED or Resus.

Here’s the SQL:

-- Example script 
-- to pick out patients currently in A&E resus or majors

SELECT
   vd.location_visit_id
  ,vd.hospital_visit_id
  ,vd.location_id
  -- ugly HL7 location string 
  ,lo.location_string
  -- time admitted to that bed/theatre/scan etc.
  ,vd.admission_datetime
  -- time discharged from that bed
  ,vd.discharge_datetime

FROM star.location_visit vd
-- location label
INNER JOIN star.location lo ON vd.location_id = lo.location_id
WHERE 
-- last few hours
vd.admission_datetime > NOW() - '12 HOURS'::INTERVAL    
-- just CURRENT patients
AND
vd.discharge_datetime IS NULL
-- filter out just ED and Resus or Majors
AND
-- unpacking the HL7 string formatted as 
-- Department^Ward^Bed string
SPLIT_PART(lo.location_string,'^',1) = 'ED'
AND
SPLIT_PART(lo.location_string,'^',2) ~ '(RESUS|MAJORS)'
-- sort
ORDER BY lo.location_string
;

The SQL script is stored at ./utils/queries/current_bed.sql.
We can load the script, and read the results into a Pandas dataframe.

# Read the sql file into a query 'q' and the query into a dataframe
q = Path('utils/queries/current_bed.sql').read_text()
df = pd.read_sql_query(q, emapdb_engine)
df.head()

Working with hospital visits

A series of three scripts

  1. Simply pull hospital visits
  2. Add in hospital numbers (MRN) and handle patient merges
  3. Add in patient demographics

Simply pull hospital visits

SELECT
   vo.hospital_visit_id
  ,vo.encounter
  -- admission to hospital
  ,vo.admission_datetime
  ,vo.arrival_method
  ,vo.presentation_datetime
  -- discharge from hospital
  -- NB: Outpatients have admission events but not discharge events
  ,vo.discharge_datetime
  ,vo.discharge_disposition

-- start from hospital visits
FROM star.hospital_visit vo
WHERE 
      -- hospital visits within the last 12 hours
      vo.presentation_datetime > NOW() - '12 HOURS'::INTERVAL   
      -- emergencies
  AND vo.patient_class = 'EMERGENCY'
      -- attending via ambulance
  AND vo.arrival_method = 'Ambulance'
      -- sort descending
ORDER BY vo.presentation_datetime DESC
; 
# Read the sql file into a query 'q' and the query into a dataframe
q = Path('./utils/queries/hospital_visit_1.sql').read_text()
df = pd.read_sql_query(q, emapdb_engine)

df.head()

Add in hospital numbers (MRN) and handle patient merges

See the series of joins in the middle of the script that retrieve the live MRN. That is we recognise that patients may have had an episode of care with one MRN, and then that episode was merged with another historical MRN. One of those two MRNs will then become the ‘live’ MRN and can be used to trace the patient across what otherwise would be different identities.

SELECT
   vo.hospital_visit_id
  ,vo.encounter
  ,vo.admission_datetime
  ,vo.arrival_method
  ,vo.presentation_datetime
  ,vo.discharge_datetime
  ,vo.discharge_disposition
  -- original MRN
  ,original_mrn.mrn AS original_mrn
  -- live MRN
  ,live_mrn.mrn AS live_mrn

-- start from hospital visits
FROM star.hospital_visit vo
-- get original mrn
INNER JOIN star.mrn original_mrn ON vo.mrn_id = original_mrn.mrn_id
-- get mrn to live mapping 
INNER JOIN star.mrn_to_live mtl ON vo.mrn_id = mtl.mrn_id 
-- get live mrn 
INNER JOIN star.mrn live_mrn ON mtl.live_mrn_id = live_mrn.mrn_id 

WHERE 
      -- hospital visits within the last 12 hours
      vo.presentation_datetime > NOW() - '12 HOURS'::INTERVAL   
      -- emergencies
  AND vo.patient_class = 'EMERGENCY'
      -- attending via ambulance
  AND vo.arrival_method = 'Ambulance'
      -- sort descending
ORDER BY vo.presentation_datetime DESC
; 
# Read the sql file into a query 'q' and the query into a dataframe
q = Path('./utils/queries/hospital_visit_2.sql').read_text()
df = pd.read_sql_query(q, emapdb_engine)

df.head()

Add in patient demographics

SELECT
   vo.hospital_visit_id
  ,vo.encounter
  ,vo.admission_datetime
  ,vo.arrival_method
  ,vo.presentation_datetime
  ,vo.discharge_datetime
  ,vo.discharge_disposition
  -- original MRN
  ,original_mrn.mrn AS original_mrn
  -- live MRN
  ,live_mrn.mrn AS live_mrn

  -- core demographics
  ,cd.date_of_birth
  -- convert dob to age in years
  ,date_part('year', AGE(cd.date_of_birth)) AS age
  ,cd.sex
  ,cd.home_postcode
  -- grab initials from first and last name
  ,CONCAT(LEFT(cd.firstname, 1), LEFT(cd.lastname, 1)) AS initials

-- start from hospital visits
FROM star.hospital_visit vo
INNER JOIN star.core_demographic cd ON vo.mrn_id = cd.mrn_id

-- get original mrn
INNER JOIN star.mrn original_mrn ON vo.mrn_id = original_mrn.mrn_id
-- get mrn to live mapping 
INNER JOIN star.mrn_to_live mtl ON vo.mrn_id = mtl.mrn_id 
-- get live mrn 
INNER JOIN star.mrn live_mrn ON mtl.live_mrn_id = live_mrn.mrn_id 

WHERE 
      -- hospital visits within the last 12 hours
      vo.presentation_datetime > NOW() - '12 HOURS'::INTERVAL   
      -- emergencies
  AND vo.patient_class = 'EMERGENCY'
      -- attending via ambulance
  AND vo.arrival_method = 'Ambulance'
      -- sort descending
ORDER BY vo.presentation_datetime DESC
; 
# Read the sql file into a query 'q' and the query into a dataframe
q = Path('./utils/queries/hospital_visit_3.sql').read_text()
df = pd.read_sql_query(q, emapdb_engine)

df.head()