Data Visualization during outbreaks like COVID-19

Introduction

Hundreds of initiatives have been launched to display key indicators related to the spread of COVID-19; individuals, non-profit organizations, for-profit companies and public administrations all over the world have published dashboards.

The dashboards were built with software such as Qlik, or relied on proprietary or open-source code.

Centers for disease control became widely known: the CDC in the US and the ECDC in Europe were overloaded with requests.

Data collected by public administrations were made publicly available, and the first dashboards drew their share of criticism.

REST APIs

Access data on COVID19 through an easy API for free. Build dashboards, mobile apps or integrate into other applications.

https://covid19api.com/ with 34,642,916 requests served by the API

https://rapidapi.com/collection/coronavirus-covid-19
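As a minimal sketch of how a dashboard might consume such an API, the Python below turns a JSON response into per-country totals. The payload shape (`Countries`, `Country`, `TotalConfirmed`, `TotalDeaths`) and all sample values are assumptions made for illustration, not the documented schema of any service listed above; check the provider's documentation before relying on field names.

```python
import json
from typing import Dict

# Hypothetical payload: field names and numbers are illustrative only.
SAMPLE_PAYLOAD = """
{"Countries": [
  {"Country": "Exampleland", "TotalConfirmed": 120, "TotalDeaths": 3},
  {"Country": "Samplestan", "TotalConfirmed": 45, "TotalDeaths": 1}
]}
"""


def totals_by_country(payload: str, field: str = "TotalConfirmed") -> Dict[str, int]:
    """Map each country name to the requested numeric field."""
    data = json.loads(payload)
    return {row["Country"]: int(row[field]) for row in data.get("Countries", [])}


if __name__ == "__main__":
    print(totals_by_country(SAMPLE_PAYLOAD))
```

Keeping the parsing step separate from the network call makes it possible to test the dashboard logic against a saved response while the live service is unavailable or rate limited.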

Blogs

Developers Respond to COVID-19 https://rapidapi.com/blog/developers-respond-to-covid-19

Articles and Blogs

Articles and blog posts describing how to improve the quality of your dashboards

Open collaboration on COVID-19
https://github.blog/2020-03-23-open-collaboration-on-covid-19 By Martin Woodward
Live tracker: How many coronavirus cases have been reported in each U.S. state?
https://www.politico.com/interactives/2020/coronavirus-testing-by-state-chart-of-new-cases/ By Beatrice Jin | 03/16/2020 8:10 PM EDT | Updated 04/22/2020 7:10 PM EDT
Closing the data divide: the need for open data, Apr 21, 2020 | Jennifer Yokoyama - Chief IP Counsel
https://blogs.microsoft.com/on-the-issues/2020/04/21/open-data-campaign-divide/
Bing delivers new COVID-19 experiences including partnership with GoFundMe to help affected businesses
https://blogs.bing.com/search/2020_04/Bing-delivers-new-COVID-19-experiences-including-partnership-with-GoFundMe-to-help-affected-business
17 (or so) responsible live visualizations about the coronavirus, for you to use
https://blog.datawrapper.de/coronaviruscharts
Epidemic Modeling 101: Or why your CoVID19 exponential fits are wrong
https://github.com/DataForScience/Epidemiology101
Charting new territory; How The Economist designs charts for Instagram
https://medium.economist.com/charting-new-territory-7f5afb293270

Dashboards

Dashboards
Description | Dashboard URL | Source code
WHO Coronavirus (COVID-19) | https://covid19.who.int | none
WHO - Explore the Data | https://covid19.who.int/explorer | none
Healthmap (made by schools and hospitals) | https://healthmap.org/covid-19 | none
COVID-19 India | https://www.covid19india.org | https://github.com/covid19india/covid19india-react
COVID-19 Scenarios | https://covid19-scenarios.org | https://github.com/neherlab/covid19_scenarios
List of dashboards worldwide | https://covid19dashboards.com | https://github.com/github/covid19-dashboard
How many days each country’s outbreak is behind or ahead of the United States | https://predictcovid.com | https://github.com/lachlanjc/covid19
COVID-19 Italia - Monitoraggio situazione (Desktop app) | https://github.com/pcm-dpc/COVID-19 | http://arcg.is/C1unv
COVID-19 Italia - Monitoraggio situazione (Mobile app) | https://github.com/pcm-dpc/COVID-19 | http://arcg.is/081a51
Latest updates on COVID-19 in Tokyo | https://stopcovid19.metro.tokyo.lg.jp | https://github.com/tokyo-metropolitan-gov/covid19
Official World Health Organization COVID-19 App | https://github.com/WorldHealthOrganization/app |
Real-time tracking of pathogen evolution | https://nextstrain.org | https://github.com/nextstrain
16 000 viral genomic sequences of hCoV-19 shared with unprecedented speed via GISAID | https://www.gisaid.org | https://www.gisaid.org/epiflu-applications/next-hcov-19-app
Dashboard of the COVID-19 Virus Outbreak | https://co.vid19.sg/singapore | none

Datasources

A 72-hour nonprofit online hackathon to develop open-source prototypes that contribute to solving the most pressing challenges in the current crisis.

Tens of thousands of volunteers build solutions for the Corona pandemic. To avoid reinventing the wheel, we created a central place to learn about existing projects and add new ideas.

Location for summaries and analysis of data related to n-CoV 2019, first reported in Wuhan, China

The Institute for Health Metrics and Evaluation (IHME) is an independent population health research center at UW Medicine, part of the University of Washington, that provides rigorous and comparable measurement of the world’s most important health problems and evaluates the strategies used to address them

COVID-19 Projections worldwide

National Institutes of Health (NIH): Open-Access Data and Computational Resources to Address COVID-19

The COVID Tracking Project collects and publishes the most complete testing data available for US states and territories

Bing COVID-19 data sources

A repo for coronavirus related case count data from around the world. The repo will be regularly updated

We are building an open database of COVID-19 cases with chest X-ray or CT images

Data in time of COVID-19

Hack for Wuhan

#WirVsVirus Hackathon

Data Repository by Johns Hopkins

Open Source Know-How

White House Dataset

COVID-19 Open Research Dataset (CORD-19) by Microsoft Research

COVID-19 Data

COVID-19 Open Research Dataset Challenge

Crowdbreaks Data

COVID-19 Cases Switzerland

COVID-19 case numbers communicated by official sources of the Swiss cantons and the Principality of Liechtenstein (FL)

Swiss Hospital Data

Swiss Federal Railways Data

Open Data City of Zurich

Zurich Tourism Open Data

Reddit Resources

Nth Opinion

World Health Organization App
https://github.com/WorldHealthOrganization/app

European CDC Data

ACAPS Resources

Real-time tracking of pathogen evolution

nextstrain.org

COVID Tracker

COVID-19 Hospital Impact Model for Epidemics

NVIDIA Parabricks Genomics Analysis Toolkit

Electricity Generation, Transportation and Consumption Data

COVID-19 API Initiative

Facebook API & SDK

Proximity Tech to fight COVID-19 by Uepaa

Scandit SDK

For barcode, text and ID document scanning capabilities

https://www.scandit.com/developers/

Swiss Radio and Television API

MongoDB’s flexible data model

Go and try it for free at https://cloud.mongodb.com and if you require more credits, please fill in this form: https://forms.gle/R4zLtWurWkNFMozq9

IBM Cloud and Watson APIs

Postman COVID-19 API Resource Center

Google Maps, Cloud and G Suite

Fossilo.com - Daily Archives of COVID-19 relevant pages

Lockdowns by country

COVID-19 Lockdown dates by country; A list of countries and the dates that each country went into lockdown.

Tests conducted by country

Covid19 Tests Conducted by Country; Captures the number of tests conducted in any country/region

Clinical trials

Vivli: a global clinical research data sharing platform. The Vivli team is dedicated to helping researchers share and access data from clinical trials to advance science.

Todo

Continue from data visualization.

Widgets anyone can embed into an HTML page

COVID-19 on Bing

Data is collected from multiple sources (CDC, WHO, ECDC, Wikipedia, 24/7 Wall St., BNO News) that update at different times and may not always align. Some regions may not provide complete breakdown of COVID-19 related stats.

https://bing.com/covid

<div class="bingwidget" data-type="covid19" data-market="en-us" data-language="en-us" data-app="bingwidget"></div>

<script src="//www.bing.com/widget/bootstrap.answer.js" async=""></script>

Data formats

How should you format the data you are exposing: as text, as numeric values, or as complex objects using XML or JSON?
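One way to weigh these options is to serialize the same record in each format. The sketch below uses only the Python standard library; the record, its field names and its values are made up for illustration.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical record; field names and values are illustrative only.
record = {"country": "Exampleland", "date": "2020-04-30", "confirmed": 120}


def to_json(rec: dict) -> str:
    """JSON keeps the distinction between numbers and strings."""
    return json.dumps(rec, sort_keys=True)


def to_xml(rec: dict) -> str:
    """XML stores every value as text inside a named element."""
    root = ET.Element("record")
    for key, value in rec.items():
        ET.SubElement(root, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


def to_csv(rec: dict) -> str:
    """CSV is plain text: one header line, one line per record."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rec), lineterminator="\n")
    writer.writeheader()
    writer.writerow(rec)
    return buffer.getvalue()


if __name__ == "__main__":
    for render in (to_json, to_xml, to_csv):
        print(render(record))
```

In this sketch only the JSON output preserves `confirmed` as a number; consumers of the XML and CSV variants have to convert it back from text, which is one reason to document the expected types alongside the data.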

List of locations in a TXT file

Source : https://github.com/microsoft/COVID-19-Widget/blob/master/AllLocations.txt

All Locations.txt
Location: Country/State
/
/United States
/United States/Alabama
/United States/Alaska
/Afghanistan
/Albania
/Algeria
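A file in this layout can be read with a few lines of Python. The sketch below parses the excerpt above into (country, state) pairs; treating the bare "/" entry as the worldwide level is an assumption, and the function name is hypothetical.

```python
from typing import List, Optional, Tuple

# The path lines from the excerpt above.
SAMPLE = """/
/United States
/United States/Alabama
/United States/Alaska
/Afghanistan
/Albania
/Algeria
"""


def parse_location(line: str) -> Tuple[Optional[str], Optional[str]]:
    """Split '/Country/State' into (country, state).

    A bare '/' yields (None, None), assumed here to mean 'worldwide'.
    """
    parts = [part for part in line.strip().split("/") if part]
    country = parts[0] if parts else None
    state = parts[1] if len(parts) > 1 else None
    return country, state


locations: List[Tuple[Optional[str], Optional[str]]] = [
    parse_location(line) for line in SAMPLE.splitlines() if line.strip()
]

if __name__ == "__main__":
    for country, state in locations:
        print(country, state)
```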

Data models

How should you design your data model? What level of detail do you plan to provide?

Data model

Data models
Record name
Total world statistics
Cases by country
Cases by country by day
Data for all countries
Mask usage instructions
Infection histories by country
India-specific figures
North American figures
Data by ISO code
Testing statistics
Transportation infection data
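One possible design, sketched below in Python, stores only the most detailed record ("Cases by country by day") and derives the coarser ones ("Cases by country", "Total world statistics") from it. The class, field names, placeholder country codes and numbers are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable


@dataclass(frozen=True)
class DailyCountryCases:
    """One row of 'Cases by country by day' (hypothetical fields)."""
    iso_code: str      # country code; 'AA' and 'BB' below are placeholders
    day: date
    new_cases: int
    new_deaths: int


def cases_by_country(rows: Iterable[DailyCountryCases]) -> Dict[str, int]:
    """Aggregate daily rows into cumulative cases per country."""
    totals: Dict[str, int] = {}
    for row in rows:
        totals[row.iso_code] = totals.get(row.iso_code, 0) + row.new_cases
    return totals


def world_total(rows: Iterable[DailyCountryCases]) -> int:
    """Aggregate daily rows into a single worldwide figure."""
    return sum(cases_by_country(rows).values())


# Made-up sample data.
rows = [
    DailyCountryCases("AA", date(2020, 4, 1), 10, 0),
    DailyCountryCases("AA", date(2020, 4, 2), 15, 1),
    DailyCountryCases("BB", date(2020, 4, 1), 7, 0),
]

if __name__ == "__main__":
    print(cases_by_country(rows), world_total(rows))
```

Deriving the aggregates from a single detailed table avoids totals that disagree with their own breakdown, at the cost of recomputing them on each request or caching them.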

Multilingual applications

Only about twenty percent of the world's population understands English, so a dashboard published only in English is of little help to the vast majority of the population.

Note

Build a multilanguage application from day 1.
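A minimal sketch of what this can mean in practice: every label shown to the user goes through a lookup keyed by language, with a fallback so that a missing translation never breaks the page. The catalog, keys and function name below are hypothetical; a real application would more likely rely on an established localization library.

```python
from typing import Dict

# Hypothetical message catalog; keys and translations are illustrative.
CATALOG: Dict[str, Dict[str, str]] = {
    "en": {"confirmed_cases": "Confirmed cases", "recovered": "Recovered"},
    "fr": {"confirmed_cases": "Cas confirmés"},
    "es": {"confirmed_cases": "Casos confirmados", "recovered": "Recuperados"},
}
DEFAULT_LANGUAGE = "en"


def translate(key: str, language: str) -> str:
    """Return the label for `key`, falling back to the default language,
    then to the key itself when no translation exists."""
    for lang in (language, DEFAULT_LANGUAGE):
        text = CATALOG.get(lang, {}).get(key)
        if text is not None:
            return text
    return key


if __name__ == "__main__":
    print(translate("confirmed_cases", "fr"))
    print(translate("recovered", "fr"))
```

Starting with this indirection, even with a single language, means that adding a translation later is a data change rather than a code change.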

[WikiLangSpok20] indicates an approximate list of languages by the total number of speakers.

List of languages by total number of speakers
Rank | Language | Speakers (millions)
1 | English | 1,268
2 | Mandarin Chinese | 1,120
3 | Hindi | 637.2
4 | Spanish | 537.9
5 | French | 276.6
6 | Standard Arabic | 274.0
7 | Bengali | 265.2
8 | Russian | 258.0
9 | Portuguese | 252.2
10 | Indonesian | 199.0
11 | Urdu | 170.6
12 | German | 131.6
13 | Japanese | 126.4
14 | Swahili | 98.5
15 | Marathi | 95.3

List of locations

A list of locations combines several pieces of information, e.g., country name, state, county, province, commune, longitude, latitude, post box.

Bing lists country and state names in a text file; each US state follows its country, separated by a slash ‘/’:
https://github.com/microsoft/COVID-19-Widget/blob/master/AllLocations.txt

Metadata

  • Hospital

    • All beds needed is the total number of beds needed exclusively for COVID patients, and includes ICU beds needed for COVID patients. covid19.healthdata.org
    • Bed shortage
    • ICU beds needed is the total number of ICU beds needed exclusively for COVID patients. covid19.healthdata.org
    • ICU bed shortage
    • Invasive ventilators needed
  • Uncertainty is the range of values that is likely to include the correct projected estimate for a given data category. Larger uncertainty intervals can result from limited data availability, small studies, and conflicting data, while smaller uncertainty intervals can result from extensive data availability, large studies, and data that are consistent across sources. The model presented in this tool has a 95% uncertainty interval and is represented by the shaded area(s) on each chart. covid19.healthdata.org

  • Correlation between metadata

    • All beds needed -> Bed shortage
    • ICU beds needed -> ICU bed shortage
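The dependencies listed above can be sketched as derived fields. The definitions quoted above do not give a formula for a shortage, so the Python below assumes that a shortage is the number of beds needed minus the number of beds available, never below zero; the capacity fields and all figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HospitalDemand:
    all_beds_needed: float     # includes ICU beds needed, per the definition above
    icu_beds_needed: float
    beds_available: float      # assumed input, not defined in the text above
    icu_beds_available: float  # assumed input, not defined in the text above

    @property
    def bed_shortage(self) -> float:
        """Assumed formula: beds needed beyond capacity, never negative."""
        return max(0.0, self.all_beds_needed - self.beds_available)

    @property
    def icu_bed_shortage(self) -> float:
        """Assumed formula: ICU beds needed beyond ICU capacity, never negative."""
        return max(0.0, self.icu_beds_needed - self.icu_beds_available)


# Made-up figures.
demand = HospitalDemand(
    all_beds_needed=1200.0,
    icu_beds_needed=300.0,
    beds_available=1000.0,
    icu_beds_available=350.0,
)

if __name__ == "__main__":
    print(demand.bed_shortage, demand.icu_bed_shortage)
```

Computing the shortages from the stored values, rather than storing them separately, keeps the two figures from drifting apart when projections are updated.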

Projects

Comparison of COVID-19 case reporting from different sources

Daily cumulative case numbers (starting Jan 22, 2020) reported by the Johns Hopkins University Center for Systems Science and Engineering (CSSE), WHO situation reports, and the Chinese Center for Disease Control and Prevention (Chinese CDC) for within (A) and outside (B) mainland China.

Source : An interactive web-based dashboard to track COVID-19 in real time: https://www.thelancet.com/journals/laninf/article/PIIS1473-3099%2820%2930120-1/fulltext
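A comparison of this kind can be sketched in a few lines of Python: align the cumulative series from each source by date and report the difference from a chosen reference source. The source names and counts below are made up and do not reproduce the figures of the cited article.

```python
from typing import Dict

# Made-up cumulative case counts per source, keyed by ISO date.
reports: Dict[str, Dict[str, int]] = {
    "Source A": {"2020-01-22": 100, "2020-01-23": 150, "2020-01-24": 210},
    "Source B": {"2020-01-22": 95, "2020-01-23": 150, "2020-01-24": 230},
}


def discrepancies(
    reports: Dict[str, Dict[str, int]], reference: str
) -> Dict[str, Dict[str, int]]:
    """For every other source, return (source count - reference count) per date.

    Only dates present in both series are compared.
    """
    ref = reports[reference]
    result: Dict[str, Dict[str, int]] = {}
    for source, series in reports.items():
        if source == reference:
            continue
        result[source] = {
            day: series[day] - ref[day] for day in sorted(ref) if day in series
        }
    return result


if __name__ == "__main__":
    print(discrepancies(reports, "Source A"))
```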

Software Architecture

Gatsby is a free and open source framework based on React that helps developers build blazing fast websites and apps:

urlwatch configuration to monitor state COVID-19 data:

urlwatch monitors webpages for you:

The COVID Tracking Project collects and publishes the most complete testing data available for US states and territories.

Scan/Trim/Extract Pipeline for State Coronavirus Site

The crawler and parsers cover the 50 US states and DC. The focus is on collecting officially published COVID-19 statistics.
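The general idea behind monitoring pages for changes can be sketched as follows: keep a fingerprint of the last content seen for each URL and flag the page when the fingerprint differs. This is an illustration of the principle only, not the implementation or configuration format of urlwatch or of the COVID Tracking Project pipeline; the function names and the example URL are hypothetical, and fetching the page is left out.

```python
import hashlib
from typing import Dict


def fingerprint(content: str) -> str:
    """Return a stable SHA-256 digest of the page content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def has_changed(url: str, content: str, seen: Dict[str, str]) -> bool:
    """Compare `content` with the last snapshot stored for `url`.

    The first time a URL is seen counts as a change. The snapshot in
    `seen` is updated on every call.
    """
    digest = fingerprint(content)
    changed = seen.get(url) != digest
    seen[url] = digest
    return changed


if __name__ == "__main__":
    snapshots: Dict[str, str] = {}
    page = "https://example.org/state-dashboard"  # placeholder URL
    print(has_changed(page, "cases: 10", snapshots))  # first sighting
    print(has_changed(page, "cases: 10", snapshots))  # same content
    print(has_changed(page, "cases: 12", snapshots))  # updated content
```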

Provide embed code to integrate your dashboards easily
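As a sketch, embed code can be generated on the server so that every dashboard page offers a ready-to-copy snippet. The Python helper below is hypothetical, as are the URL and the default dimensions; it escapes the URL and title so that they are safe inside HTML attributes.

```python
from html import escape


def embed_code(url: str, title: str, width: int = 800, height: int = 600) -> str:
    """Return an <iframe> snippet that embeds the dashboard at `url`."""
    return (
        f'<iframe src="{escape(url, quote=True)}" '
        f'title="{escape(title, quote=True)}" '
        f'width="{width}" height="{height}" loading="lazy"></iframe>'
    )


if __name__ == "__main__":
    print(embed_code("https://example.org/dashboard?lang=en&view=map",
                     "COVID-19 dashboard"))
```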

Vocabulary

Use fatal instead of death

Use Lives Lost instead of death

RDA COVID-19 Recommendations and guidelines

The objectives of the RDA COVID-19 Working Group (CWG) are:

  • to clearly define detailed guidelines on data sharing under the present COVID-19 circumstances, to help stakeholders follow best practices to maximize the efficiency of their work, and to act as a blueprint for future emergencies;
  • to develop guidelines for policymakers to maximise timely data sharing and appropriate responses in such health emergencies;
  • to address the interests of researchers, policy makers, funders, publishers, and providers of data sharing infrastructures.

Source : RDA COVID-19 Recommendations and guidelines 1st release - open for comments https://www.eoscsecretariat.eu/eosc-liaison-platform/post/rda-covid-19-recommendations-and-guidelines-1st-release-open-comments

FAIR and timely

“FAIR principles [means] that data, software, models and other outputs should be Findable, Accessible, Interoperable and Reusable”

“A balance between achieving ‘perfectly’ FAIR outputs and timely sharing is necessary with the key goal of immediate and open sharing as a driver.”

Metadata

“While rich metadata is desirable, even a minimum set of key fields/descriptors is valuable” [RDA20]

“The use of common metadata standards, as adopted by one’s relevant discipline, as well as vocabularies, are highly recommended”

“metadata should describe the data as well as the terms under which it can be accessed and reused.”

“Ideally, data and metadata should be exposed via machine readable endpoints (e.g. RDF, APIs)”

“Where there are restrictions on accessing or using datasets, metadata should be shared openly to enable discovery (e.g. CC0 or CC-BY licenses).”
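To make these recommendations concrete, the Python sketch below checks a metadata record against a small set of required descriptors and serializes it as JSON so that it can be served from a machine-readable endpoint. The choice of required fields is an assumption made for illustration, not a list prescribed by the guidelines, and the record itself is made up.

```python
import json
from typing import Dict, List

# Assumed minimum set of descriptors; the guidelines do not prescribe this list.
REQUIRED_FIELDS = ("title", "description", "license", "access_terms", "source_url")


def missing_fields(metadata: Dict[str, str]) -> List[str]:
    """Return the required fields that are absent or empty."""
    return [field for field in REQUIRED_FIELDS if not metadata.get(field)]


def to_machine_readable(metadata: Dict[str, str]) -> str:
    """Serialize the metadata as JSON, refusing incomplete records."""
    missing = missing_fields(metadata)
    if missing:
        raise ValueError("missing metadata fields: " + ", ".join(missing))
    return json.dumps(metadata, indent=2, sort_keys=True)


# Made-up example record.
metadata = {
    "title": "Example daily case counts",
    "description": "Hypothetical dataset used for illustration.",
    "license": "CC-BY-4.0",
    "access_terms": "Open access",
    "source_url": "https://example.org/data/cases.csv",
}

if __name__ == "__main__":
    print(to_machine_readable(metadata))
```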

Documentation

“Research outputs need to be documented, which includes documentation of methodologies used to define and construct data, data cleaning, data imputation, data provenance and so on”

“Software should provide documentation that describes at least the libraries, algorithms, assumptions and parameters used”

“Equally, research context, methods used to collect data, and quality-assurance steps taken are important”

“When sharing datasets, other relevant outputs (or documents) should also be made available, such as codebooks, lab journals, or informed consent form templates,”

Use of Trustworthy Repositories

“To facilitate data quality control, timely sharing and sustained access, data should be deposited in trustworthy data repositories (TDRs)”

“Whenever possible, these should be trustworthy data repositories (TDRs) that have been certified, subject to rigorous governance, and committed to longer-term preservation of their data holdings.”

“As the first choice, widely used disciplinary repositories are recommended for maximum accessibility and assessability of the data, followed by general or institutional repositories”

“Using existing open repositories is better than starting new resources.”

“By providing persistent identifiers, demanding preferred formats, rich metadata, etc., certified trustworthy repositories already guarantee a baseline FAIRness of and sustained access to the data, as well as citation.”

Ethics and Privacy

“Access to individual participant data and trial documents should be as open as possible and as closed as necessary, to protect participant privacy and reduce the risk of data misuse”

What are the blocking factors for sharing data?

  • non-machine-readable data (e.g., PDF)
  • heterogeneous measurement standards
  • divergent metadata formats
  • lack of version control
  • fragmented datasets
  • delays in releasing data
  • non-standard definitions and reporting parameters
  • lack of metadata
  • unavailable or undocumented computer code
  • frequently changing web addresses
  • copyright and usage conditions
  • translation requirements
  • consents
  • approvals
  • legal restrictions
  • little or no integration of clinical, eHealth, surveillance, and research systems within and across jurisdictions or providers

Major difficulty

Lack of contextual data needed to study the evolution of the disease in sub-populations, e.g.:

  • healthy sub-populations that are vulnerable to serious long term effects following recovery that we don’t know about yet because we don’t have the data and because we are focusing on deaths
  • age-specific vulnerabilities
  • disadvantaged sub-populations with limited health care
  • vulnerabilities evident in severe disease associated with comorbidities
  • vulnerabilities due to environmental conditions
  • vulnerabilities due to social and cultural norms
  • following sequelae and immunity

Glossary

FAIR
FAIR principles

Data, software, models and other outputs should be Findable, Accessible, Interoperable and Reusable

[WI16]

FAIRER
FAIRER principles
Findable, Accessible, Interoperable, Reusable, Ethical, and Reproducible [RDA20]
OMICS
Omics are defined as data from cell and molecular biology [RDA20]
RDA
Research Data Alliance https://www.rd-alliance.org
SPIRIT
Standard Protocol Items: Recommendations for Interventional Trials
TDR
Trustworthy Data Repositories
Trustworthy Data Repositories (TDRs) are repositories that have been certified, subject to rigorous governance, and committed to longer-term preservation of their data holdings. [RDA20]

Citations

[RDA20] RDA COVID-19 recommendations and guidelines, 1st release, 24 April 2020. https://www.rd-alliance.org/system/files/RDA%20COVID-19%3B%20recommendations%20and%20guidelines%2C%201st%20release%2024%20April%202020.pdf
[WI16] Wilkinson, M., Dumontier, M., Aalbersberg, I. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018 (2016). https://doi.org/10.1038/sdata.2016.18

Glossary for Data Visualization

CDC

Centers for Disease Control and Prevention

https://www.cdc.gov

ECDC

European Centre for Disease Prevention and Control

https://ecdc.europa.eu

Citations

[WikiLangSpok20] Wikipedia contributors. (2020, April 28). List of languages by total number of speakers. In Wikipedia, The Free Encyclopedia. Retrieved 17:52, April 30, 2020, from https://en.wikipedia.org/w/index.php?title=List_of_languages_by_total_number_of_speakers&oldid=953636952
