Plotting the COVID-19 Pandemic

covid
Author

Ashwin Malshe

Published

March 8, 2020

In this post, we will visualize the worldwide spread of COVID-19 cases over time. I obtained the data from Rami Krispin’s coronavirus package (https://ramikrispin.github.io/coronavirus/). I also decided to experiment with John Coene’s fantastic echarts4r package, which gives us access to the echarts API.

Load the libraries and get the data in the R session.

library(dplyr)
library(echarts4r)
library(coronavirus)

# Get the data
data("coronavirus")

Data Preparation

Print out the first 6 observations.

head(coronavirus)
        date province country     lat      long      type cases   uid iso2 iso3
1 2020-01-22  Alberta  Canada 53.9333 -116.5765 confirmed     0 12401   CA  CAN
2 2020-01-23  Alberta  Canada 53.9333 -116.5765 confirmed     0 12401   CA  CAN
3 2020-01-24  Alberta  Canada 53.9333 -116.5765 confirmed     0 12401   CA  CAN
4 2020-01-25  Alberta  Canada 53.9333 -116.5765 confirmed     0 12401   CA  CAN
5 2020-01-26  Alberta  Canada 53.9333 -116.5765 confirmed     0 12401   CA  CAN
6 2020-01-27  Alberta  Canada 53.9333 -116.5765 confirmed     0 12401   CA  CAN
  code3    combined_key population continent_name continent_code
1   124 Alberta, Canada    4413146  North America             NA
2   124 Alberta, Canada    4413146  North America             NA
3   124 Alberta, Canada    4413146  North America             NA
4   124 Alberta, Canada    4413146  North America             NA
5   124 Alberta, Canada    4413146  North America             NA
6   124 Alberta, Canada    4413146  North America             NA

We are interested in date and type. Let’s take a look at the distinct values for type.

coronavirus %>% count(type)
       type      n
1 confirmed 330327
2     death 330327
3  recovery 313182

There are only 3 values: confirmed, death, and recovery. Next, we will sum the cases for each type by date and store the results in separate data sets.

dt1 <- coronavirus %>% 
  filter(type == "confirmed") %>% 
  group_by(date) %>% 
  summarize(Confirmed = sum(cases, na.rm = TRUE), .groups = "drop")

dt2 <- coronavirus %>% 
  filter(type == "death") %>% 
  group_by(date) %>% 
  summarize(Death = sum(cases, na.rm = TRUE), .groups = "drop")


dt3 <- coronavirus %>% 
  filter(type == "recovery") %>% 
  group_by(date) %>% 
  summarize(Recovered = sum(cases, na.rm = TRUE), .groups = "drop")

There is an error in the recovery figures for December 14, 2020, so I plot only confirmed cases and deaths.
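One way to spot this kind of problem is to look for daily totals that cannot be right, such as negative counts. The sketch below is only an illustration: it uses a small made-up data frame (toy_recovered, with invented numbers) in place of dt3, and it assumes the error shows up as a negative value, which may not match what is actually in the data.

```r
library(dplyr)

# Hypothetical daily recovery totals; the negative value is invented
# to mimic a reporting correction and is NOT taken from the real data.
toy_recovered <- tibble(
  date      = as.Date("2020-12-12") + 0:3,
  Recovered = c(1200, 1350, -90000, 1400)
)

# Flag dates whose daily total is negative
suspect <- toy_recovered %>%
  filter(Recovered < 0)

suspect
```

Running the same filter on dt3 would list any dates worth a closer look before plotting.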

Next, we will merge the two data sets so that the counts for each type sit in separate columns.

dt <- dt1 %>% 
  inner_join(dt2, by = "date") 
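The filter, summarize, and join steps above can also be collapsed into a single pipeline. The following is a minimal sketch of that alternative, not the approach used in this post: it relies on tidyr's pivot_wider(), which is an extra dependency, and runs on a tiny made-up data frame (toy) so that it is self-contained.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for the `coronavirus` data (invented numbers)
toy <- tibble(
  date  = as.Date(c("2020-01-22", "2020-01-22", "2020-01-22",
                    "2020-01-23", "2020-01-23", "2020-01-23")),
  type  = c("confirmed", "confirmed", "death",
            "confirmed", "death", "recovery"),
  cases = c(5, 3, 1, 10, 2, 4)
)

dt_wide <- toy %>%
  # Keep only the two types we plot
  filter(type %in% c("confirmed", "death")) %>%
  # One total per date and type
  group_by(date, type) %>%
  summarize(cases = sum(cases, na.rm = TRUE), .groups = "drop") %>%
  # Spread the types into their own columns
  pivot_wider(names_from = type, values_from = cases, values_fill = 0) %>%
  rename(Confirmed = confirmed, Death = death)

dt_wide
```

One difference to keep in mind: inner_join() keeps only the dates present in both data sets, while this version keeps every date and fills a missing type with 0.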

Plot

Finally, time to make the plot! Note how we can build it up from separate elements, one piped call at a time.

dt %>% 
  e_charts(x = date) %>% 
  e_line(serie = Confirmed) %>% 
  e_line(serie = Death) %>% 
  e_tooltip(trigger = "axis") %>% 
  e_datazoom(type = "slider") %>% 
  e_title("Worldwide COVID-19 cases") %>% 
  e_theme("bee-inspired") 


This plot is interactive, so you can hover over it to get the exact readings. You can also toggle each time series on or off by clicking its entry in the legend at the top.
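Because every echarts4r call returns the chart object, the plot can also be built in stages and stored along the way. Here is a small sketch of that idea using a made-up data frame (toy_dt, with invented numbers) in place of dt. The names base_chart and full_chart are only for illustration.

```r
library(dplyr)
library(echarts4r)

# Hypothetical daily counts standing in for `dt`
toy_dt <- tibble(
  date      = as.Date("2020-03-01") + 0:4,
  Confirmed = c(100, 150, 230, 310, 420),
  Death     = c(2, 3, 5, 8, 11)
)

# Start with a single series
base_chart <- toy_dt %>%
  e_charts(x = date) %>%
  e_line(serie = Confirmed)

# Add a second series and a tooltip to the stored chart
full_chart <- base_chart %>%
  e_line(serie = Death) %>%
  e_tooltip(trigger = "axis")

full_chart
```

Storing the intermediate chart makes it easy to try different themes or extra elements without repeating the whole pipeline.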