Visualizing Algorithmic Cause of Death Predictions

Team Members: Aditya Anerao, Maggie Dorr, Chethan Jujjavarapu, Andrew Teng

Abstract

Determining an individual’s cause of death is an important concern, particularly in areas where death commonly occurs outside of hospitals or healthcare facilities. As a result, various algorithms have been developed to predict these causes based on medical information, including symptoms the patient exhibited. Verbal autopsy (VA), a survey conducted with a relative or close contact of the deceased, is used to identify the leading causes of death in populations without adequate vital registration systems. VA algorithms leverage symptom-cause information (SCI) to associate symptoms with causes of death (CoD). However, these algorithms vary in accuracy, which can be improved by grouping CoD. In collaboration with Tyler McCormick (UW Statistics and Sociology Departments), we created interactive and dynamic visualizations that depict associations between SCI and CoD based on these algorithms to help policymakers and stakeholders in low-resource areas visualize uncertainty in predictive models. Furthermore, we designed these visualizations to assist in understanding cause-symptom relationships and algorithm performance.

Keywords: Verbal Autopsy; Death Prediction; Cause of Death; Algorithms

Data and Summary Figures

We obtained data from the Population Health Metrics Research Consortium (PHMRC), which contains 7,841 adult deaths across six distinct locations: (1) Andhra Pradesh, India; (2) Bohol, Philippines; (3) Dar es Salaam, Tanzania; (4) Mexico City, Mexico; (5) Pemba Island, Tanzania; and (6) Uttar Pradesh, India. All recorded deaths have VA data and an expert-confirmed CoD. The confirmed CoD are grouped at three levels, consisting of 34, 46, and 55 causes.

Resources

Paper (.pdf) || Poster (.pdf) || GitHub README.md


Visualizations

Note: There are four visualizations. The first three utilize the dropdown menu in the yellow box below, while the last visualization has its own dropdown.

Select a location and grouping:

(1) High-Level Exploration: Force Directed Network Graph

To understand the relationships between causes of death and symptoms, we start with a network. This graph loads the data and presents symptoms and causes as nodes, with edges for the connections between them. The network approach gives a quick, high-level view of how symptoms and causes relate.

To assist new users, we will follow a specific cause of death, acute myocardial infarction (AMI), across our graphs. Please select "Andhra Pradesh, India (34 Causes)" from the above dropdown menu and then "Acute Myocardial Infarction" from the yellow dropdown menu. This cause is linked to numerous symptoms. We can identify each symptom by hovering over its node (red circles), and see the strength of its link to the cause by hovering over the accompanying edge. However, with this approach, it is difficult to identify all the symptoms' names, since you need to hover over each one to get more information.
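
As a rough illustration of the data behind such a network, the sketch below turns a list of cause-symptom associations into the node and link lists that force-directed layouts typically consume. The record layout, the function names `build_network` and `linked_symptoms`, and the placeholder values are assumptions for illustration, not the project's actual code or data.

```python
def build_network(records):
    """Turn (cause, symptom, strength) records into node and link lists."""
    nodes = {}
    links = []
    for cause, symptom, strength in records:
        # Register each node once, remembering whether it is a cause or a symptom.
        nodes.setdefault(cause, {"id": cause, "kind": "cause"})
        nodes.setdefault(symptom, {"id": symptom, "kind": "symptom"})
        links.append({"source": cause, "target": symptom, "strength": strength})
    return list(nodes.values()), links


def linked_symptoms(links, cause):
    """Return {symptom: strength} for every edge attached to one cause."""
    return {l["target"]: l["strength"] for l in links if l["source"] == cause}


# Placeholder records (hypothetical, not PHMRC values).
records = [
    ("Cause A", "Symptom 1", 0.40),
    ("Cause A", "Symptom 2", 0.10),
    ("Cause B", "Symptom 2", 0.25),
]
nodes, links = build_network(records)
print(len(nodes), "nodes,", len(links), "links")
print(linked_symptoms(links, "Cause A"))
```

Selecting a cause in the network corresponds to the `linked_symptoms` lookup: it collects every symptom one edge away, along with the strength value shown on hover.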


Find a specific 'true cause of death' node:

Exploration Tips:

- Click on a node to highlight and see what causes and symptoms are related.

- Double click the node to unhighlight.

- Hover over the links and nodes to get more details.

- Drag nodes into the whitespace to make it easier to click on links.

- Zoom in and out to explore further.


Legend:
  • True Causes of Death
  • Symptoms


(2) Deeper Exploration: Parallel Coordinates

While the network is informative, it lacks the structure needed to understand the data easily. To address this issue, we developed a parallel coordinates graph. This visual adds structure: causes are on the left y-axis, symptoms are on the right y-axis, and relationships are drawn as lines between them. We can easily explore and compare different causes and symptoms to get a better understanding of which are most strongly associated.

Continuing our exploration of AMI, please click on "Acute Myocardial Infarction" under the "Cause" y-axis. You should see the association lines for the symptoms linked to AMI highlighted. With the parallel coordinates graph, you are now able to identify all the symptoms at once. As expected, we see that AMI is associated with symptoms related to breathing problems and coughing. Surprisingly, we see that AMI is also associated with both light and heavy drinkers in this region. If we want to quantify the association for each of these symptoms, we can hover over each line. However, this is inefficient, and it makes it difficult to observe any kind of association pattern across our symptoms.


Exploration Tips:

- Click either the Cause or Symptom names to highlight all associated relationships; double click to unselect.
- Hover over lines to observe a single relationship and the accompanying strength value (frequency of this relationship/total # of relationships) as a tooltip.
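
The strength value in the tooltip can be sketched as a simple ratio. The example below assumes one reading of the definition above: each observed (cause, symptom) pair is counted, and its count is divided by the total number of observed pairs. The function name `strength_values` and the placeholder pairs are hypothetical.

```python
from collections import Counter


def strength_values(pairs):
    """Map each (cause, symptom) pair to count(pair) / total number of pairs."""
    counts = Counter(pairs)
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


# Placeholder observations (hypothetical, not PHMRC values).
observed = [
    ("Cause A", "Symptom 1"),
    ("Cause A", "Symptom 1"),
    ("Cause A", "Symptom 2"),
    ("Cause B", "Symptom 2"),
]
strengths = strength_values(observed)
for pair, value in sorted(strengths.items()):
    print(pair, value)
```

Under this reading, the strength values across all relationships sum to 1, so each value is a share of all observed relationships rather than a per-cause conditional probability.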


(3) Low-Level Exploration: Heatmap

Although the parallel coordinates graph above shows the relationships between causes of death and symptoms more clearly, it does not display the uncertainty in some relationships as clearly. To address this issue, we built a heatmap that lists causes along the y-axis and symptoms along the x-axis; at each intersection, the brightness (i.e., lightness) of the red hue indicates the level of association.

In the parallel coordinates graph, we were able to identify all the symptoms associated with AMI at once; however, we were unable to compare the strengths of those associations side by side to observe any interesting patterns. With the heatmap, we can, because the shade of red indicates the strength of a symptom-cause association. Please identify "Acute Myocardial Infarction" on the y-axis; unsurprisingly, we observe that not all symptoms have the same association strength with AMI. Interestingly, for this region, the symptom 'No Injuries' is highly associated with AMI. However, if you switch to a different location or grouping (using the yellow bar), you'll find that this is not always the case.


Exploration Tips:

- Hover over the squares for the precise association value between the cause and symptom of interest.
- Click a square to highlight corresponding row and column; double click a square to unhighlight.
- Scroll left and right to view more.
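
The heatmap encoding can be sketched in two steps: arrange the associations in a cause-by-symptom grid, then map each value in [0, 1] to a lightness of red. The sketch below assumes that stronger associations are drawn darker; the lightness range, the function names, and the placeholder values are illustrative assumptions, not the page's actual color scale.

```python
import colorsys


def build_matrix(records, causes, symptoms):
    """Rows are causes, columns are symptoms; missing pairs default to 0.0."""
    lookup = {(c, s): v for c, s, v in records}
    return [[lookup.get((c, s), 0.0) for s in symptoms] for c in causes]


def association_to_hex(value, lightest=0.95, darkest=0.50):
    """Map a value in [0, 1] to a red hex color (assumed: stronger = darker)."""
    v = min(max(value, 0.0), 1.0)  # clamp out-of-range input
    lightness = lightest + (darkest - lightest) * v
    r, g, b = colorsys.hls_to_rgb(0.0, lightness, 1.0)  # hue 0.0 is red
    return "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))


# Placeholder records (hypothetical, not PHMRC values).
records = [("Cause A", "Symptom 1", 0.9), ("Cause B", "Symptom 2", 0.2)]
causes = ["Cause A", "Cause B"]
symptoms = ["Symptom 1", "Symptom 2"]
matrix = build_matrix(records, causes, symptoms)
colors = [[association_to_hex(v) for v in row] for row in matrix]
for cause, row in zip(causes, colors):
    print(cause, row)
```

Keeping the hue fixed and varying only lightness means a row can be scanned for its darkest cells, which is the side-by-side comparison the heatmap is meant to support.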

Legend: association strength, from 0 to 1

(4) Outcomes: Average Probability of Predicting Cause of Death by Algorithm

This graph compares how different algorithms predict cause of death; the triple bar chart lets all three algorithms be compared side by side. For each cause of death, it shows each algorithm's average probability of predicting the correct cause, with the three algorithms color coded. Because the bars for a given cause sit together, their heights can be compared directly, which gives an indirect comparison of algorithm performance.
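
One way to read "average probability of predicting the correct cause" is as a grouped mean: for every death, take the probability an algorithm assigned to the confirmed cause, then average within each (cause, algorithm) pair. The sketch below follows that reading; the record layout, the function name, and the placeholder algorithm names and values are assumptions, not the project's pipeline.

```python
from collections import defaultdict
from statistics import mean


def average_true_cause_probability(predictions):
    """Average, per (true cause, algorithm), the probability given to the true cause."""
    grouped = defaultdict(list)
    for true_cause, algorithm, prob_of_true_cause in predictions:
        grouped[(true_cause, algorithm)].append(prob_of_true_cause)
    return {key: mean(values) for key, values in grouped.items()}


# Placeholder per-death predictions (hypothetical, not real results).
predictions = [
    ("Cause A", "Algorithm X", 0.25),
    ("Cause A", "Algorithm X", 0.75),
    ("Cause A", "Algorithm Y", 0.50),
    ("Cause B", "Algorithm X", 1.00),
]
averages = average_true_cause_probability(predictions)
for key, value in sorted(averages.items()):
    print(key, value)
```

Each entry of `averages` corresponds to one bar: the cause picks the group on the x-axis and the algorithm picks the color.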

Please select "Outcome Grouping: 34". We can observe that the average probability of predicting AMI based on symptoms is low and differs considerably among the three algorithms: NBC (0.13), InterVA (0.32), and InSilicoVA (0.42). One could theorize that because AMI is associated with 71 symptoms, it shares enough symptoms with other serious causes to make it difficult to predict with confidence. For example, AMI and AIDS share 69 symptoms. In addition, symptoms that are highly associated with AMI are also associated with other causes; for example, Continuous Trouble Breathing is also highly associated with the cancer-related CoDs. This may make it difficult for the algorithms to find features that are strongly associated with AMI specifically.
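
The symptom overlap behind this argument amounts to a set intersection. The sketch below counts the symptoms two causes share and expresses the overlap as a fraction of the first cause's symptoms; the function names and the placeholder symptom sets are hypothetical and stand in for the real symptom lists.

```python
def shared_symptoms(symptoms_by_cause, cause_a, cause_b):
    """Symptoms associated with both causes."""
    return symptoms_by_cause[cause_a] & symptoms_by_cause[cause_b]


def overlap_fraction(symptoms_by_cause, cause_a, cause_b):
    """Share of cause_a's symptoms that also appear under cause_b."""
    own = symptoms_by_cause[cause_a]
    if not own:
        return 0.0
    return len(shared_symptoms(symptoms_by_cause, cause_a, cause_b)) / len(own)


# Placeholder symptom sets (hypothetical, not PHMRC values).
symptoms_by_cause = {
    "Cause A": {"Symptom 1", "Symptom 2", "Symptom 3", "Symptom 4"},
    "Cause B": {"Symptom 2", "Symptom 3", "Symptom 4", "Symptom 5"},
}
common = shared_symptoms(symptoms_by_cause, "Cause A", "Cause B")
print(sorted(common))
print(overlap_fraction(symptoms_by_cause, "Cause A", "Cause B"))
```

With the figures quoted above, the same calculation for AMI and AIDS would be 69 shared symptoms out of AMI's 71, or roughly 97%, which illustrates how little symptom information is unique to AMI.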


Select a grouping:

Exploration Tips:

- Hover over a bar in the graph to see the numerical average probability of a predicted cause of death for a given algorithm.
- Algorithms are represented with different colors as per the legend.