Data Visualization during outbreaks like COVID-19¶
Introduction¶
Hundreds of initiatives were launched to display key indicators on the spread of COVID-19: individuals, non-profit organizations, for-profit companies and public administrations all over the world published dashboards.
The dashboards were built with software such as Qlik, or relied on proprietary or open-source code.
Centers for disease control gained prominence: the CDC in the US and the ECDC in Europe were overloaded with requests.
Data collected by public administrations were made publicly available, and the first dashboards drew their share of criticism.
REST APIs¶
Access data on COVID-19 through an easy API for free. Build dashboards and mobile apps, or integrate the data into other applications.
https://covid19api.com/ with 34,642,916 requests served by the API
https://rapidapi.com/collection/coronavirus-covid-19
Blogs¶
Developers Respond to COVID-19 https://rapidapi.com/blog/developers-respond-to-covid-19
News¶
- The Latest News On The API Economy
- https://www.programmableweb.com/category/coronavirus%2Bcovid-19/news?category=29999%2C30105&api_videos=0
Articles and Blogs¶
Articles and Blog posts describing how to increase the quality of your dashboards
- Open collaboration on COVID-19
- https://github.blog/2020-03-23-open-collaboration-on-covid-19 By Martin Woodward
- Live tracker: How many coronavirus cases have been reported in each U.S. state?
- https://www.politico.com/interactives/2020/coronavirus-testing-by-state-chart-of-new-cases/ By Beatrice Jin | 03/16/2020 8:10 PM EDT | Updated 04/22/2020 7:10 PM EDT
- Closing the data divide: the need for open data, Apr 21, 2020 | Jennifer Yokoyama - Chief IP Counsel
- https://blogs.microsoft.com/on-the-issues/2020/04/21/open-data-campaign-divide/
- Bing delivers new COVID-19 experiences including partnership with GoFundMe to help affected businesses
- https://blogs.bing.com/search/2020_04/Bing-delivers-new-COVID-19-experiences-including-partnership-with-GoFundMe-to-help-affected-business
- 17 (or so) responsible live visualizations about the coronavirus, for you to use
- https://blog.datawrapper.de/coronaviruscharts
- Epidemic Modeling 101: Or why your CoVID19 exponential fits are wrong
- https://github.com/DataForScience/Epidemiology101
- Charting new territory; How The Economist designs charts for Instagram
- https://medium.economist.com/charting-new-territory-7f5afb293270
Dashboards¶
Datasources¶
A 72-hour nonprofit online hackathon to develop open-source prototypes that contribute to solving the most pressing challenges in the current crisis.
Tens of thousands of volunteers build solutions for the Corona pandemic. To avoid reinventing the wheel, we created a central place to learn about existing projects and add new ideas.
Location for summaries and analysis of data related to n-CoV 2019, first reported in Wuhan, China
The Institute for Health Metrics and Evaluation (IHME) is an independent population health research center at UW Medicine, part of the University of Washington, that provides rigorous and comparable measurement of the world’s most important health problems and evaluates the strategies used to address them
COVID-19 Projections worldwide
NIH (National Institutes of Health), Open-Access Data and Computational Resources to Address COVID-19:
The COVID Tracking Project collects and publishes the most complete testing data available for US states and territories
Bing COVID-19 data sources
A repo for coronavirus related case count data from around the world. The repo will be regularly updated
We are building an open database of COVID-19 cases with chest X-ray or CT images
Data in time of COVID-19
Hack for Wuhan
#WirVsVirus Hackathon
https://wirvsvirushackathon.org/
Overview of topics and challenges: https://airtable.com/shrPm5L5I76Djdu9B
Overview of submissions: https://wirvsvirushackathon.devpost.com/submissions
Data Repository by Johns Hopkins
Open Source Know-How
White House Dataset
COVID-19 Open Research Dataset (CORD-19) by Microsoft Research
COVID-19 Data
COVID-19 Open Research Dataset Challenge
Crowdbreaks Data
COVID-19 Cases Switzerland
https://github.com/daenuprobst/covid19-cases-switzerland
Project of Daniel Probst: https://www.corona-data.ch/
COVID-19 case numbers communicated by official Swiss Canton’s and FL’s sources
Swiss Hospital Data
Swiss Federal Railways Data
Open Data: https://opentransportdata.swiss/ and https://data.sbb.ch/
Open Journey Planner (OJP): https://opentransportdata.swiss/de/cookbook/open-journey-planner-ojp/
Open Data City of Zurich
Air Quality: https://data.stadt-zuerich.ch/dataset/luftqualitaet-tages-aktuelle-messungen
Traffic count data for motorized private transport (hourly values): https://data.stadt-zuerich.ch/dataset/sid_dav_verkehrszaehlung_miv_od2031
Parking guidance system: https://data.stadt-zuerich.ch/dataset/parkleitsystem
Zurich Tourism Open Data
Reddit Resources
Nth Opinion
- World Health Organization App
- https://github.com/WorldHealthOrganization/app
European CDC Data
ACAPS Resources
Real-time tracking of pathogen evolution
nextstrain.org
COVID Tracker
COVID-19 Hospital Impact Model for Epidemics
NVIDIA Parabricks Genomics Analysis Toolkit
Electricity Generation, Transportation and Consumption Data
COVID-19 API Initiative
Facebook API & SDK
Messenger: https://developers.facebook.com/docs/messenger-platform/
Instagram & Facebook Stories: https://developers.facebook.com/docs/instagram-api/reference/user/stories/
Proximity Tech to fight COVID-19 by Uepaa
Scandit SDK
For barcode, text and ID document scanning capabilities
Swiss Radio and Television API
MongoDB’s flexible data model
Go and try it for free at https://cloud.mongodb.com and if you require more credits, please fill in this form: https://forms.gle/R4zLtWurWkNFMozq9
IBM Cloud and Watson APIs
https://ibm.biz/cloud-vs-covid19 Please ping IBM on slack to get a unique Feature Code
https://developer.ibm.com/blogs/the-2020-call-for-code-global-challenge-takes-on-covid-19/
Postman COVID-19 API Resource Center
Google Maps, Cloud and G Suite
Fossilo.com - Daily Archives of COVID-19 relevant pages
Lockdowns by country¶
COVID-19 Lockdown dates by country; A list of countries and the dates that each country went into lockdown.
Tests conducted by country¶
Covid19 Tests Conducted by Country; Captures the number of tests conducted in any country/region
Clinical trials¶
Vivli: a global clinical research data sharing platform. The Vivli team is dedicated to helping researchers share and access data from clinical trials to advance science.
Todo
Continue from "data visualization".
Widgets anyone can embed into a html page¶
COVID-19 on Bing¶
- Data is collected from multiple sources (CDC, WHO, ECDC, Wikipedia, 24/7 Wall St., BNO News) that update at different times and may not always align. Some regions may not provide complete breakdown of COVID-19 related stats.
-
<div class="bingwidget" data-type="covid19" data-market="en-us" data-language="en-us" data-app="bingwidget"></div> <script src="//www.bing.com/widget/bootstrap.answer.js" async=""></script>
Data formats¶
How should you format the data you are exposing: as text, as numeric values, or as complex objects using XML or JSON?
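To make the choice concrete, here is a small sketch that serializes the same record as JSON and as XML with the Python standard library. The record, its field names and the helper names `to_json` and `to_xml` are hypothetical and chosen only for illustration.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record; the values are made up for illustration.
record = {"country": "Exampleland", "date": "2020-04-30", "confirmed": 120, "deaths": 4}


def to_json(rec):
    """Serialize the record as a JSON object (numbers keep their type)."""
    return json.dumps(rec, sort_keys=True)


def to_xml(rec, root_tag="record"):
    """Serialize the record as flat XML (every value becomes text)."""
    root = ET.Element(root_tag)
    for key, value in rec.items():
        child = ET.SubElement(root, key)
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")


json_text = to_json(record)
xml_text = to_xml(record)
print(json_text)
print(xml_text)
```

One practical difference shows up immediately: JSON round-trips `120` as a number, while the XML consumer receives the text `"120"` and needs a schema or convention to recover the type.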
List of locations in a TXT file¶
Source : https://github.com/microsoft/COVID-19-Widget/blob/master/AllLocations.txt
Location: Country/State |
---|
/ |
/United States |
/United States/Alabama |
/United States/Alaska |
/Afghanistan |
/Albania |
/Algeria |
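The slash-separated entries above can be split into their levels with a few lines of Python. This is a sketch under two assumptions that the source file does not state explicitly: that the first level is always a country and the second a state, and that the bare `/` entry stands for the whole world. The function name `parse_location` is hypothetical.

```python
def parse_location(path):
    """Split a '/Country/State' path into its levels.

    Assumption: level 1 is the country, level 2 the state; a bare '/'
    yields no country and no state.
    """
    parts = [part for part in path.strip().split("/") if part]
    country = parts[0] if len(parts) > 0 else None
    state = parts[1] if len(parts) > 1 else None
    return {"country": country, "state": state}


for line in ["/", "/United States", "/United States/Alabama", "/Albania"]:
    print(line, "->", parse_location(line))
```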
Data models¶
How should you design your data model? What level of detail do you plan to expose?
Data model¶
Record name |
---|
Total world statistics |
Cases by country |
Cases by country by day |
Data for all countries |
Mask usage instructions |
Infection histories by country |
India-specific figures |
North American figures |
Data by ISO code |
Testing statistics |
Transportation infection data |
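One way to think about the level of detail is to store only the finest-grained record and derive the coarser ones from it. The sketch below assumes that "Cases by country by day" holds cumulative counts keyed by ISO code, and derives "Cases by country" and "Total world statistics" from it. The class, field and function names are hypothetical, and the sample figures are made up.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DailyCountryCases:
    """One 'Cases by country by day' record (cumulative counts assumed)."""
    iso_code: str
    day: date
    confirmed: int
    deaths: int


def latest_by_country(records):
    """Derive 'Cases by country': keep the most recent record per ISO code."""
    latest = {}
    for rec in records:
        current = latest.get(rec.iso_code)
        if current is None or rec.day > current.day:
            latest[rec.iso_code] = rec
    return latest


def world_total(records):
    """Derive 'Total world statistics' by summing the latest record per country."""
    return sum(rec.confirmed for rec in latest_by_country(records).values())


# Made-up sample data with placeholder ISO codes.
records = [
    DailyCountryCases("AA", date(2020, 4, 1), 10, 0),
    DailyCountryCases("AA", date(2020, 4, 2), 15, 1),
    DailyCountryCases("BB", date(2020, 4, 1), 7, 0),
]
print(world_total(records))
```

Storing the daily record and computing the aggregates avoids keeping several tables in sync, at the cost of recomputation on each request.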
Multilingual applications¶
Only about twenty percent of the world's population understands English, so a dashboard that describes an outbreak in English only is of little help to the vast majority of the population.
Note
Build a multilanguage application from day 1.
[WikiLangSpok20] gives an approximate ranking of languages by total number of speakers.
Rank | Language | Speakers (millions) |
---|---|---|
1 | English | 1268 |
2 | Mandarin Chinese | 1120 |
3 | Hindi | 637.2 |
4 | Spanish | 537.9 |
5 | French | 276.6 |
6 | Standard Arabic | 274.0 |
7 | Bengali | 265.2 |
8 | Russian | 258.0 |
9 | Portuguese | 252.2 |
10 | Indonesian | 199.0 |
11 | Urdu | 170.6 |
12 | German | 131.6 |
13 | Japanese | 126.4 |
14 | Swahili | 98.5 |
15 | Marathi | 95.3 |
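Building for several languages from day 1 mostly means never hard-coding a label. The sketch below keeps labels in a per-language catalogue and falls back to a default language, then to the key itself, when a translation is missing. The catalogue content and the names `TRANSLATIONS` and `translate` are hypothetical; a production application would more likely rely on an established mechanism such as Python's `gettext` module.

```python
# Hypothetical label catalogue; the Spanish entry is deliberately incomplete.
TRANSLATIONS = {
    "en": {"confirmed_cases": "Confirmed cases", "deaths": "Deaths"},
    "fr": {"confirmed_cases": "Cas confirmés", "deaths": "Décès"},
    "es": {"confirmed_cases": "Casos confirmados"},
}


def translate(key, language, default_language="en"):
    """Return the label for `key`, falling back to the default language,
    then to the key itself so that a missing entry stays visible."""
    for lang in (language, default_language):
        text = TRANSLATIONS.get(lang, {}).get(key)
        if text is not None:
            return text
    return key


print(translate("deaths", "fr"))
print(translate("deaths", "es"))
```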
List of locations¶
A list of locations combines several pieces of information, e.g., country name, state, county, province, commune, longitude, latitude, post box.
- Bing lists country and state names in a text file; each entry is a path whose levels (country, state) are separated by a slash ‘/’
- https://github.com/microsoft/COVID-19-Widget/blob/master/AllLocations.txt
Metadata¶
Hospital
- All beds needed is the total number of beds needed exclusively for COVID patients, and includes ICU beds needed for COVID patients. covid19.healthdata.org
- Bed shortage
- ICU beds needed is the total number of ICU beds needed exclusively for COVID patients. covid19.healthdata.org
- ICU bed shortage
- Invasive ventilators needed
Uncertainty is the range of values that is likely to include the correct projected estimate for a given data category. Larger uncertainty intervals can result from limited data availability, small studies, and conflicting data, while smaller uncertainty intervals can result from extensive data availability, large studies, and data that are consistent across sources. The model presented in this tool has a 95% uncertainty interval and is represented by the shaded area(s) on each chart. covid19.healthdata.org
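When a model produces many simulated draws of a quantity, one generic way to obtain such an interval is to take empirical percentiles of the draws. The sketch below illustrates that idea only; it is not the method used by covid19.healthdata.org, and the function names and the sample draws are assumptions.

```python
def percentile(sorted_values, q):
    """Linearly interpolated percentile of an already sorted list, 0 <= q <= 1."""
    if not sorted_values:
        raise ValueError("at least one value is required")
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


def uncertainty_interval(draws, level=0.95):
    """Return the (low, high) bounds that contain `level` of the draws."""
    ordered = sorted(draws)
    tail = (1 - level) / 2
    return percentile(ordered, tail), percentile(ordered, 1 - tail)


# Made-up draws: the integers 0..100 stand in for simulated projections.
low, high = uncertainty_interval(list(range(101)))
print(low, high)
```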
Relationships between metadata
- All beds needed -> Bed shortage
- ICU beds needed -> ICU bed shortage
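These dependencies can be written down as a derivation. The sketch below assumes that a shortage is the number of beds needed minus the number of beds available, floored at zero; that definition, the function names and the capacity parameters are assumptions for illustration, since the text above only states that one field depends on the other.

```python
def shortage(needed, available):
    """Beds missing once all available beds are used (never negative)."""
    if needed < 0 or available < 0:
        raise ValueError("bed counts must be non-negative")
    return max(0, needed - available)


def hospital_shortages(all_beds_needed, icu_beds_needed, beds_available, icu_beds_available):
    """Derive both shortage fields from the corresponding 'needed' fields."""
    return {
        "bed_shortage": shortage(all_beds_needed, beds_available),
        "icu_bed_shortage": shortage(icu_beds_needed, icu_beds_available),
    }


# Made-up figures for illustration.
print(hospital_shortages(1200, 300, 1000, 350))
```

Publishing the derived fields together with their inputs lets readers verify the dashboard rather than trust it.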
Projects¶
Comparison of COVID-19 case reporting from different sources¶
Daily cumulative case numbers (starting Jan 22, 2020) reported by the Johns Hopkins University Center for Systems Science and Engineering (CSSE), WHO situation reports, and the Chinese Center for Disease Control and Prevention (Chinese CDC) for within (A) and outside (B) mainland China.
Source : An interactive web-based dashboard to track COVID-19 in real time: https://www.thelancet.com/journals/laninf/article/PIIS1473-3099%2820%2930120-1/fulltext
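A comparison of this kind can be approximated by computing, for each day, the gap between the highest and lowest cumulative count reported by the available sources. The sketch below uses placeholder source names and made-up numbers; it does not reproduce the figures from the article cited above.

```python
def daily_discrepancy(series_by_source):
    """For each day, return max minus min of the counts reported that day.

    `series_by_source` maps a source name to {ISO date string: cumulative cases}.
    Sources that did not report on a given day are ignored for that day.
    """
    days = set().union(*(series.keys() for series in series_by_source.values()))
    result = {}
    for day in sorted(days):
        values = [series[day] for series in series_by_source.values() if day in series]
        result[day] = max(values) - min(values)
    return result


# Made-up data: three hypothetical sources, one of which starts a day later.
series = {
    "source_a": {"2020-01-22": 100, "2020-01-23": 150},
    "source_b": {"2020-01-22": 95, "2020-01-23": 160},
    "source_c": {"2020-01-23": 155},
}
gaps = daily_discrepancy(series)
print(gaps)
```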
Software Architecture¶
Gatsby is a free and open source framework based on React that helps developers build blazing fast websites and apps:
urlwatch configuration to monitor state COVID-19 data:
urlwatch monitors webpages for you:
The COVID Tracking Project collects and publishes the most complete testing data available for US states and territories.
Scan/Trim/Extract Pipeline for State Coronavirus Site
The crawler and parsers cover the 50 US states and DC. The focus is on collecting officially published COVID-19 statistics.
Provide embed code to integrate your dashboards easily
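The core idea behind monitoring official pages, detecting that a page changed since the last visit, can be sketched by comparing content fingerprints. This is a generic illustration using SHA-256 hashes; it is not how urlwatch or the pipeline mentioned above are implemented or configured, and the function names and `example.invalid` URL are hypothetical.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Stable digest of a page body."""
    return hashlib.sha256(content).hexdigest()


def detect_changes(previous, pages):
    """Compare freshly downloaded pages with the stored fingerprints.

    `previous` maps URL -> fingerprint from the last run.
    `pages` maps URL -> raw bytes downloaded in this run.
    Returns (sorted list of new or changed URLs, fingerprints to store).
    """
    current = {url: fingerprint(body) for url, body in pages.items()}
    changed = sorted(url for url, digest in current.items() if previous.get(url) != digest)
    return changed, current


url = "https://example.invalid/state-dashboard"
first_changed, state = detect_changes({}, {url: b"cases: 10"})
second_changed, state = detect_changes(state, {url: b"cases: 10"})
third_changed, state = detect_changes(state, {url: b"cases: 12"})
print(first_changed, second_changed, third_changed)
```

Hashing the raw body flags every byte-level change, including timestamps and markup tweaks, which is why real pipelines usually trim a page to the relevant region before comparing.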
RDA COVID-19 Recommendations and guidelines¶
The objectives of the RDA COVID-19 Working Group (CWG) are:
- to clearly define detailed guidelines on data sharing under the present COVID-19 circumstances to help stakeholders follow best practices to maximize the efficiency of their work, and to act as a blueprint for future emergencies;
- to develop guidelines for policymakers to maximise timely data sharing and appropriate responses in such health emergencies;
- to address the interests of researchers, policy makers, funders, publishers, and providers of data sharing infrastructures.
Source : RDA COVID-19 Recommendations and guidelines 1st release - open for comments https://www.eoscsecretariat.eu/eosc-liaison-platform/post/rda-covid-19-recommendations-and-guidelines-1st-release-open-comments
FAIR and timely¶
“FAIR principles [means] that data, software, models and other outputs should be Findable, Accessible, Interoperable and Reusable”
“A balance between achieving ‘perfectly’ FAIR outputs and timely sharing is necessary with the key goal of immediate and open sharing as a driver.”
Metadata¶
“While rich metadata is desirable, even a minimum set of key fields/descriptors is valuable” [RDA20]
“The use of common metadata standards, as adopted by one’s relevant discipline, as well as vocabularies, are highly recommended”
“metadata should describe the data as well as the terms under which it can be accessed and reused.”
“Ideally, data and metadata should be exposed via machine readable endpoints (e.g. RDF, APIs)”
“Where there are restrictions on accessing or using datasets, metadata should be shared openly to enable discovery (e.g. CC0 or CC-BY licenses).”
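As an illustration of a "minimum set of key fields" exposed in machine-readable form, the sketch below builds a small JSON metadata record and reports which fields are missing. The chosen fields (`title`, `description`, `license`, `access_url`) and the function name are assumptions for this example; they are not a field list defined by the RDA guidelines, and a real project should follow the metadata standard of its discipline.

```python
import json

# Hypothetical minimum field set, chosen for illustration only.
REQUIRED_FIELDS = ("title", "description", "license", "access_url")


def missing_fields(metadata):
    """Return the required fields that are absent or empty."""
    return [field for field in REQUIRED_FIELDS if not metadata.get(field)]


# Made-up record describing a fictional dataset.
metadata = {
    "title": "Example daily case counts",
    "description": "Illustrative record, not a real dataset.",
    "license": "CC-BY-4.0",
    "access_url": "https://example.invalid/data.csv",
}

metadata_json = json.dumps(metadata, indent=2, sort_keys=True)
print(metadata_json)
print(missing_fields({"title": "Untitled"}))
```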
Documentation¶
“Research outputs need to be documented, which includes documentation of methodologies used to define and construct data, data cleaning, data imputation, data provenance and so on”
“Software should provide documentation that describes at least the libraries, algorithms, assumptions and parameters used”
“Equally, research context, methods used to collect data, and quality-assurance steps taken are important”
“When sharing datasets, other relevant outputs (or documents) should also be made available, such as codebooks, lab journals, or informed consent form templates,”
Use of Trustworthy Repositories¶
“To facilitate data quality control, timely sharing and sustained access, data should be deposited in trustworthy data repositories (TDRs)”
“Whenever possible, these should be trustworthy data repositories (TDRs) that have been certified, subject to rigorous governance, and committed to longer-term preservation of their data holdings.”
“As the first choice, widely used disciplinary repositories are recommended for maximum accessibility and assessability of the data, followed by general or institutional repositories”
“Using existing open repositories is better than starting new resources.”
“By providing persistent identifiers, demanding preferred formats, rich metadata, etc., certified trustworthy repositories already guarantee a baseline FAIRness of and sustained access to the data, as well as citation.”
Ethics and Privacy¶
“Access to individual participant data and trial documents should be as open as possible and as closed as necessary, to protect participant privacy and reduce the risk of data misuse”
Legal¶
“Technical solutions that ensure anonymisation, encryption, privacy protection, and data de-identification will increase trust in data sharing.”
“Emergency data legislation activated during a pandemic needs to clearly outline data custodianship/ownership, publication rights and arrangements, consent models, and permissions around sharing data and exemptions.”
What are the blocking factors for sharing data?¶
- non-machine-readable data (e.g., PDF)
- heterogeneous measurement standards
- divergent metadata formats
- lack of version control
- fragmented datasets
- delays in releasing data
- non-standard definitions and reporting parameters
- lack of metadata
- unavailable or undocumented computer code
- frequently changing web addresses
- copyright and usage conditions
- translation requirements
- consents
- approvals
- legal restrictions
- little or no integration of clinical, eHealth, surveillance, and research systems within and across jurisdictions or providers
Major difficulty¶
Lack of the contextual data needed to study the evolution of the disease in sub-populations, e.g.:
- healthy sub-populations that may be vulnerable to serious long-term effects after recovery, effects that remain unknown because the data are missing and the focus is on deaths
- age-specific vulnerabilities
- disadvantaged sub-populations with limited health care
- vulnerabilities evident in severe disease associated with comorbidities
- vulnerabilities due to environmental conditions
- vulnerabilities due to social and cultural norms
- following sequelae and immunity
Glossary¶
- FAIR
- FAIR principles
Data, software, models and other outputs should be Findable, Accessible, Interoperable and Reusable
- FAIRER
- FAIRER principles
- Findable, Accessible, Interoperable, Reusable, Ethical, and Reproducible [RDA20]
- OMICS
- Omics are defined as data from cell and molecular biology [RDA20]
- RDA
- Research Data Alliance https://www.rd-alliance.org
- SPIRIT
- Standard Protocol Items: Recommendations for Interventional Trials
- TDR
- Trustworthy Data Repositories
- Trustworthy Data Repositories (TDRs) are repositories that have been certified, subject to rigorous governance, and committed to longer-term preservation of their data holdings. [RDA20]
Citations¶
[RDA20] | (1, 2, 3, 4) https://www.rd-alliance.org/system/files/RDA%20COVID-19%3B%20recommendations%20and%20guidelines%2C%201st%20release%2024%20April%202020.pdf |
[WI16] | Wilkinson, M., Dumontier, M., Aalbersberg, I. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018 (2016). https://doi.org/10.1038/sdata.2016.18 |
Glossary for Data Visualization¶
- CDC
Centers for Disease Control and Prevention
- ECDC
European Centre for Disease Prevention and Control
Citations¶
[WikiLangSpok20] | Wikipedia contributors. (2020, April 28). List of languages by total number of speakers. In Wikipedia, The Free Encyclopedia. Retrieved 17:52, April 30, 2020, from https://en.wikipedia.org/w/index.php?title=List_of_languages_by_total_number_of_speakers&oldid=953636952 |