
Project Team 07: Visualizing Chemical and Genetic Data of Boston Tap Water, DS 4200 F20

Emily Chen, Meera Ravichandran, Fangyi Zhao

Service-Learning Course Project as part of DS 4200 F20: Information Visualization, taught by Prof. Cody Dunne, Data Visualization @ Khoury, Northeastern University.

Abstract

The Massachusetts Water Resources Authority supplies water to over 2.5 million people in the Boston area. The Pinto lab investigates the microbiome of this tap water to ensure that the drinking water being distributed is safe, focusing on complete ammonia-oxidizing bacterial communities. This project visualizes characteristics of tap water, creating a visual “story” that combines the chemical and genomic measurements taken of tap water over a time series. We produced a webpage containing three visualizations (a stacked bar chart, a table, and a heatmap) with the goal of revealing patterns that help scientists and researchers model the behavior of bacterial communities in order to improve water filtration and distribution systems.

Visualization

* NOTE: Membrane Non-Intact Cells is the number of cells with non-intact membranes, and Membrane Intact Cells is the number of cells with intact membranes. Together, these two values equal the total cell count in the water.

Demo Video

Visualization Explanation

Presentation Slides

We created three visualizations: a stacked bar chart, a heatmap, and a table. Our main view is the stacked bar chart, which shows the total cell concentration over time. The membrane-intact cell count is encoded in a darker purple, and the membrane non-intact cell count in a lighter purple. We used colors of the same hue but different saturations because both counts are parts of the same total cell count. Users can hover over each bar to see the exact cell counts. The stacks allow comparison between membrane-intact and membrane non-intact cells within each run or date.
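The stacking described above can be sketched as follows. This is only an illustration, not the project's actual code: the `Sample` type, the `stack_segments` helper, the choice to place the intact segment at the bottom of each bar, and the sample values are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One sampling date (hypothetical structure; values below are made up)."""
    date: str
    intact: float      # membrane-intact cell count
    non_intact: float  # membrane non-intact cell count


def stack_segments(samples):
    """Return the (start, end) extent of each stacked segment per sample.

    Assumption: the intact segment sits at the bottom of the bar and the
    non-intact segment is stacked on top, so the bar height is the total.
    """
    stacked = []
    for s in samples:
        total = s.intact + s.non_intact
        stacked.append({
            "date": s.date,
            "intact": (0.0, s.intact),
            "non_intact": (s.intact, total),
            "total": total,
        })
    return stacked


# Made-up illustrative counts, not values from the project's data set.
samples = [Sample("2020-01-01", 120.0, 30.0), Sample("2020-02-01", 80.0, 20.0)]
bars = stack_segments(samples)
```

Because the top of the upper segment always equals the sum of the two counts, the full bar height can be read as the total cell count while the two shades show how it splits.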

For our details-on-demand feature, users can click on any bar, brush over a series of bars in the bar chart, or brush over rows in the table to view the corresponding heatmap. The heatmap uses a green-white sequential colormap to encode the percentages of the ten most prevalent bacterial phyla overall: Betaproteobacteria, Nitrospira, Gammaproteobacteria, Planctomycetia, Actinobacteria, Oligoflexia, Gemmatimonadetes, Chlamydiia, Flavobacteriia, and Deltaproteobacteria. A legend to the left of the heatmap shows the range of percentages. Hovering over a cell in the heatmap displays the exact percentage for that phylum.
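As a rough illustration of the two ideas in this paragraph, the sketch below ranks taxa by prevalence and maps a percentage onto a white-to-green ramp. It rests on assumptions: "most prevalent overall" is interpreted here as the largest summed percentage across all dates, the ramp is a plain linear interpolation in RGB, and the names `top_taxa`, `white_to_green`, and the sample data are hypothetical.

```python
def top_taxa(abundance, n=10):
    """Return the n taxa with the largest summed percentage over all dates.

    abundance maps date -> {taxon: percent}. Ties are broken alphabetically
    so the result is deterministic.
    """
    totals = {}
    for row in abundance.values():
        for taxon, pct in row.items():
            totals[taxon] = totals.get(taxon, 0.0) + pct
    return sorted(totals, key=lambda t: (-totals[t], t))[:n]


def white_to_green(pct, max_pct):
    """Linearly interpolate from white (low) to green (high) as an RGB tuple."""
    t = 0.0 if max_pct <= 0 else min(max(pct / max_pct, 0.0), 1.0)
    white, green = (255, 255, 255), (0, 128, 0)
    return tuple(round(w + (g - w) * t) for w, g in zip(white, green))


# Made-up percentages for three placeholder taxa on two dates.
abundance = {
    "2020-01-01": {"A": 50.0, "B": 30.0, "C": 20.0},
    "2020-02-01": {"A": 10.0, "B": 60.0, "C": 30.0},
}
ranked = top_taxa(abundance, n=2)
lowest_color = white_to_green(0.0, 60.0)
highest_color = white_to_green(60.0, 60.0)
```

A sequential ramp like this suits percentages because the data run in one direction from low to high, so a single hue that deepens with the value is enough.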

The table gives an overview of the chemical data: Date, Season, Temperature, pH, Chlorine, Ammonium, Nitrate, and Nitrite. Users can brush over rows in the table, which highlights them and outlines the corresponding bars in the main bar chart.

Finally, our brushing and linking feature connects the bar chart to the table. When a user brushes over rows in the table, the selected rows are highlighted and the corresponding bars are outlined in red. When a user brushes over bars in the bar chart or clicks on individual bars, the selected bars are outlined in red and the corresponding rows are highlighted. In both cases, the heatmap for the selected dates is displayed. When nothing is selected in the bar chart or the table, the full heatmap with all dates and phyla is shown.
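The linking behavior described above amounts to one shared selection that all three views read from. The sketch below is a minimal model of that idea under stated assumptions: the `LinkedSelection` class and its method names are hypothetical, dates are used as the shared key between views, and no rendering is included.

```python
class LinkedSelection:
    """Shared selection state for the bar chart, table, and heatmap (hypothetical)."""

    def __init__(self, all_dates):
        self.all_dates = list(all_dates)
        self.selected = set()

    def brush(self, dates):
        """Select dates from either the bar chart or the table; unknown dates are ignored."""
        self.selected = set(dates) & set(self.all_dates)

    def clear(self):
        """Drop the current selection."""
        self.selected = set()

    def outlined_bars(self):
        """Dates whose bars would be outlined in red, in chart order."""
        return [d for d in self.all_dates if d in self.selected]

    def highlighted_rows(self):
        """Dates whose table rows would be highlighted; always mirrors the bars."""
        return self.outlined_bars()

    def heatmap_dates(self):
        """Dates shown in the heatmap: the selection, or every date if none is selected."""
        return self.outlined_bars() or list(self.all_dates)


# Placeholder dates for illustration only.
selection = LinkedSelection(["2020-01-01", "2020-02-01", "2020-03-01"])
before = selection.heatmap_dates()
selection.brush(["2020-02-01", "2020-03-01"])
during = selection.heatmap_dates()
```

Keeping a single selection and deriving each view from it means the bar outlines, row highlights, and heatmap cannot drift out of sync, whichever view the brush started in.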

Acknowledgments