As part of the Data Visualization module we had a group assignment to pick a topic of our choice and develop a visualization in Tableau.
We picked Zika because (a) it was something we wanted to know more about, (b) the outbreak had occurred recently but was over and (c) we had found some data to get us started. We thought a tool to visualize the geographical spread of a disease with time would be useful for researchers and the general public alike.
In creating a data-based visualization there are two major considerations:
(1) what is the story?
(2) who am I telling it to?
The second question was easily answered. We wanted to create visualizations to allow the general public to explore and learn about Zika in their own way and at their own pace.
But first we needed to educate ourselves on the topic and figure out the story!
Finding the story and data collation
We googled, followed up on references within references within references and manually collated data from newspaper articles and peer-reviewed journals dating back to 1947. We then wrote a Python script to join circa 2000 csv files into one and joined all this with other data sets we found online (WHO, PAHO, and another GitHub repository) into one master csv file.
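For anyone curious what the csv-joining step might look like, here is a minimal sketch using only the Python standard library. It is not our original script: the function name merge_csv_files, the added source_file column and the choice to take the union of all column names are assumptions made for illustration.

```python
import csv
from pathlib import Path


def merge_csv_files(input_dir, output_path):
    """Concatenate every .csv file in input_dir into one csv file.

    Column names are unioned across files, and a value missing from a
    given file is written as an empty string. Returns the number of data
    rows written.
    """
    input_dir = Path(input_dir)
    output_path = Path(output_path)
    fieldnames = []  # union of column names, in first-seen order
    rows = []

    for path in sorted(input_dir.glob("*.csv")):
        # Skip the output file if it lives in the input folder.
        if path.resolve() == output_path.resolve():
            continue
        with path.open(newline="", encoding="utf-8") as handle:
            reader = csv.DictReader(handle)
            for name in reader.fieldnames or []:
                if name not in fieldnames:
                    fieldnames.append(name)
            for row in reader:
                # Hypothetical extra column recording where each row came from.
                row["source_file"] = path.name
                rows.append(row)

    if "source_file" not in fieldnames:
        fieldnames.append("source_file")

    with output_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=fieldnames, restval="", extrasaction="ignore"
        )
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Keeping the source file name on every row makes it easier to trace a suspicious number back to where it came from once everything is in one master file.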
The aim of the data collation exercise was to get as complete a picture as we could, within the allotted time frame, using what was freely available to us online. While the resulting data set does not necessarily contain every single last case of Zika ever reported, it is much more comprehensive than anything we came across while researching this project.
In this way we collated the data and gained a deep subject matter knowledge in one go.
It turns out there are many, many stories to tell about Zika and so from this point, we split up and each took on a story and set about creating visualizations.
I took on the History of the Spread of Zika by Mosquitos. Basically, when and where it has been established to occur and the number of cases reported.
Note: Zika can also be transmitted in other ways but mosquito is the primary vector and when a case occurs in an area where the offending mosquito is present, it is assumed that this is the mode of transmission.
To tell this story I created two dashboards
(1) The History of the Spread of Zika – a high level, mostly qualitative history of where and when Zika has been reported
(2) A quantitative dashboard focused on the outbreak in the Americas from 2015-2017.
You need access to Tableau Online to use the link above so I have included a screenshot for anyone without access. Refer to Figure 1.
Figure 1 Screenshot of the History of the Spread of Zika dashboard. In the screenshot, all years up to and including 2018 are shown. The user can control what years to see, highlight the parent strain to see where the African and Asian strains have been found to occur or filter by occurrence type (study, isolated case or outbreak). The #Cases and #Countries scorecards are linked to the year filter. There is a tooltip included (not shown) that shows the median % population affected for scientific studies and the # cases per country for isolated confirmed cases and outbreaks.
So that you can critically compare the visualization to the story I was trying to tell here is a synopsis (it’s a bit long, but that’s the whole point of the visualization – to replace this text with an interactive learning tool):
- Zika was first detected in Uganda in 1947.
- It silently circulated through Africa and Asia until 2006 with only 14 reported cases in 58 years. It was mostly detected retrospectively as the antibody in patient serum samples.
- Over 40 different strains of Zika have been identified but all are thought to belong to just two parent strains, referred to as the ‘African’ and ‘Asian’ strains. Between 1948 and 2006, the African strain was only detected in Africa and the Asian strain was only detected in Asia.
- The first recorded outbreak occurred on the island of Yap just off the coast of Indonesia in 2007. There were 83,000 cases, equating to 73% of the population.
- Zika then made its way across the Pacific, hopping from island to island and causing similar outbreaks in French Polynesia and Easter Island, before being reported in Brazil in early 2015, where it was first associated with microcephaly.
- There was an almost simultaneous outbreak on a small island group called Capo Verde off the west coast of Senegal.
- Although not reported at the time, studies on samples collected in Haiti in 2014 were subsequently found to be positive for Zika.
- Zika was then reported all over the low-lying areas of South and Central America, the Caribbean and even reached North America (Texas and Florida) with almost 800,000 cases reported in 115 countries by the end of the outbreak (circa end 2017).
- There were 3 isolated cases reported in Guinea-Bissau on the west coast of Africa in 2016.
- Where the parent strain has been identified it has been found that the Asian strain of Zika caused all cases outside of mainland Africa, including the outbreak on Capo Verde.
So in reaching Capo Verde, Zika completed its first circumnavigation of the planet, almost 70 years after it was first detected.
A few points on the technical aspects of the dashboard
Showing the build up over time
An important feature of the dashboard was to allow the user to see the build up of the geographical spread of Zika over time. When ‘Year’ is dropped onto the Pages shelf, Tableau Desktop creates a slide show of the view for each Year with playback functionality. There is a ‘Show History’ option, which is supposed to continue to show all previous observations as the slide show progresses. But with choropleths (filled maps), the previous values are not shown filled but with a circle symbol instead. Refer to Figure 2 (left).
Figure 2 The History of Zika (1947-2006). When using the ‘Show History’ option with a choropleth in Tableau, previous observations are displayed as a filled circle symbol instead of a filled country (left). A workaround involving parameters and calculated fields enables the history to be shown as filled countries (right).
I did some googling and found a suggested workaround. A parameter called ‘Parameter Year’ was set up with integers spanning the range of years in question (1947 – 2017). A calculated field ‘Show Year’ was then created as [Year] <= [Parameter Year] and dropped on the filter shelf, set to only show values when this expression is TRUE (i.e. when the Year is less than or equal to the Parameter Year). The parameter control was then turned on and can be used to slide between Years, showing all countries with a recorded occurrence of Zika up to and including the Year in question. Refer to Figure 2 (right).
You do lose the playback functionality though, so hopefully this is an issue Tableau will deal with sooner rather than later.
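Outside Tableau, the logic of the ‘Show Year’ filter can be sketched in a few lines of Python. This only illustrates the cumulative filter and is not how Tableau evaluates it internally. The function names are made up for the example, and the three records are just the first-report years mentioned in the synopsis above.

```python
def show_year(year, parameter_year):
    """Mirror of the calculated field [Year] <= [Parameter Year]."""
    return year <= parameter_year


def countries_shown(records, parameter_year):
    """Return the places that stay filled on the map for a slider value.

    records is an iterable of (place, year) pairs; a place is kept if it
    has at least one record up to and including parameter_year.
    """
    return sorted(
        {place for place, year in records if show_year(year, parameter_year)}
    )


# Illustrative records only, taken from the synopsis earlier in the post.
records = [("Uganda", 1947), ("Yap", 2007), ("Brazil", 2015)]

for slider_value in (1947, 2007, 2017):
    print(slider_value, countries_shown(records, slider_value))
```

Because the comparison is "less than or equal to", moving the slider forward only ever adds places to the map, which is exactly the build up over time that ‘Show History’ was supposed to give.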
Dealing with Alaska and Hawaii
Zika made its way via mosquito as far as Texas and Florida in the United States. But using a choropleth on a country level means Alaska and Hawaii are also filled in, which is misleading. To deal with this I found two options: (i) create my own custom geo-coding for the United States and import it into Tableau or (ii) be a bit creative with blank objects.
Given this was a one-off visualization, I decided not to invest time in the custom geo-coding. Since the countries Zika made it to are all clustered about the equator, I just created blank objects, coloured them white, gave them a white border and used them to hide the northern part of the world and Hawaii.
2. The Americas Outbreak 2015-2017
For this outbreak, I had detailed weekly case counts on a country level, which enabled me to put together a mini movie of the rise and fall of Zika in the Americas between 2015 and 2017 using Tableau Desktop. Unfortunately, again, the ‘play’ feature is only available in Tableau Desktop and not Tableau Online or Server. I originally replaced the playback function with a slider on Tableau Online but it is nowhere near as effective. Instead, I have made a homemade movie of the dashboard in action and posted it to YouTube.
The dashboard includes a map (left) showing the geographical spread, with the size of the filled blue circle giving a qualitative representation of the number of cases. It also includes graphs showing the weekly and cumulative case counts for (i) all of the countries (top right) and (ii) a particular country selected by the user (bottom right). Tooltips are included to allow a user to explore the numbers of cases in more detail at any time point. In the video I’ve set the country to Brazil as it is of most interest.
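The weekly and cumulative graphs are built from the same country-level weekly counts. As a rough sketch of that relationship (in the dashboard Tableau does this for you), the Python below sums weekly counts across countries and then takes a running total. The function names and the (country, week, cases) row layout are assumptions for the example, and the numbers in the usage lines are made up, not real case counts.

```python
from collections import defaultdict
from itertools import accumulate


def weekly_totals(rows):
    """Sum case counts across countries for each week.

    rows is an iterable of (country, week, cases), where week is an ISO
    date string such as "2016-01-04" so that sorting is chronological.
    """
    totals = defaultdict(int)
    for _country, week, cases in rows:
        totals[week] += cases
    return dict(sorted(totals.items()))


def cumulative(weekly):
    """Turn a {week: cases} mapping into a running total per week."""
    weeks = sorted(weekly)
    running = accumulate(weekly[week] for week in weeks)
    return dict(zip(weeks, running))


# Made-up numbers purely to show the shape of the calculation.
rows = [
    ("Brazil", "2016-01-04", 5),
    ("Colombia", "2016-01-04", 2),
    ("Brazil", "2016-01-11", 7),
]
all_countries = weekly_totals(rows)
print(all_countries)
print(cumulative(all_countries))
```

Filtering rows down to a single country before calling weekly_totals gives the equivalent of the bottom right graph.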
You should get the following information from watching the mini movie a couple of times:
- There is a first wave of Zika reported only in Brazil. This wave peaks in July 2015 and there are approximately 30,000 cases.
- The second country to detect Zika is Capo Verde, off the west coast of Senegal.
- The second wave is much larger and peaks in February 2016. This wave of Zika is reported to occur all over low-lying areas of South America, Central America and the Caribbean and by the end of this wave the cumulative count of cases is almost 800,000.
- A third wave occurs in the first half of 2017, but it is much smaller than the first and second waves.
Finally, I’d like to point out that the explosion of cases in January 2016 is likely a result of Zika becoming a reportable disease as opposed to Zika expanding its territory to such a large extent overnight.
More questions than Answers
Learning about the history of Zika left me with more questions than answers…
Why has the Asian and not the African strain made it around the world?
Why has the African strain not at least made it to Capo Verde?
Why has Zika not been associated with microcephaly before Brazil in 2015?
These questions are the subject of current scientific investigation. You can follow the links above if you are interested in finding out more.