How Can We Trust COVID-19 Data?

admin | April 19, 2020 | 11:11 am

Can you imagine being the President of the United States, the Governor of a state, or any government leader these days? Trying to make decisions on how to handle the coronavirus: do I invoke stay-at-home orders and force large numbers of people to lose their jobs and businesses, or do I open the economy and risk loss of life? How do you make these decisions? Theoretically, these decisions should be made without regard to political party or ulterior political motives, but that concept may remain just that: theoretical. As people impacted by these decisions, we hope our leaders are making them based on factual data. As they stand behind their podiums bombarding us with daily numbers of cases, numbers of deaths, hospitalization rates, and so on, do you ever wonder how they are getting this data? How old is it? Is it accurate? As someone who works with data and understands the details of how data is gathered, transformed, and reported, I question whether all these disparate government agencies are capable of gathering, verifying, and accurately reporting such important metrics.

As confirmed in recent news reports, I am not alone in my skepticism. However, data is crucial right now: it is the most valuable weapon we have in the battle against COVID-19, and it must form the basis of every action we take to combat the virus. Massive amounts of data are being gathered in an attempt to understand and control the pandemic. Hospitals are collecting data on every case, laboratories are studying as many samples as possible, academic institutions are contributing research and testing myriad hypotheses, and vast government agencies are investigating each isolated outbreak. The reports generated from these diverse sources form a complex web of information that is difficult to interpret. Even with small sample sizes, I have seen that the data can be misleading.

Take, for example, the Diamond Princess cruise ship. Early in this crisis, its 3,711 passengers and crew members were quarantined onboard the ship for nearly a month and tested regularly for COVID-19. Many epidemiologists believed this case study would be a good representation of the virus’s effects on a population. The case fatality rate onboard the ship was around 1%, which scientists hoped they could accurately project onto larger populations. However, it quickly became apparent that certain factors had not been taken into consideration, particularly the older age demographic of the passengers. Because age so heavily influences the virus’s outcomes, the case study could not be projected onto larger, more diverse populations, where “the real death rate could stretch from five times lower (0.025%) to five times higher (0.625%).”1 The data from the ship did not provide enough information to form broader conclusions. In fact, it indicated to me that better ways to understand coronavirus data are still necessary.
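The arithmetic behind those numbers is worth making explicit. Below is a minimal sketch of the case fatality rate (CFR) calculation and the fivefold uncertainty band; the onboard counts are illustrative round figures chosen to match the ~1% rate, not an official tally, and the 0.125% central estimate is simply the midpoint implied by the quoted “five times lower / five times higher” bounds.

```python
# Illustrative CFR arithmetic for the Diamond Princess discussion.
# These counts are stand-ins that reproduce the ~1% rate cited,
# not an official tally.
confirmed_cases = 700
deaths = 7

onboard_cfr = deaths / confirmed_cases
print(f"Onboard CFR: {onboard_cfr:.1%}")  # -> Onboard CFR: 1.0%

# The quoted projection implies a central adjusted estimate whose
# fivefold band matches the article's bounds of 0.025% and 0.625%.
central_estimate = 0.00125  # 0.125%
low, high = central_estimate / 5, central_estimate * 5
print(f"Projected range: {low:.3%} to {high:.3%}")
# -> Projected range: 0.025% to 0.625%
```

The point of the sketch is how wide that band is: a single fivefold uncertainty factor turns one shipboard ratio into a 25-fold spread of possible population-level death rates, which is why the case study could not simply be scaled up.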

How can we make the data more useful? Perhaps the data we are basing our assumptions on is not granular enough. We are leaning on high-level models with many sources to predict the virus’s trajectory based on the data collected. These models are only effective if they consider multiple factors such as regional, demographic, social, and economic disparities when analyzing the numbers, giving the data proper context. They must also be rigorously vetted by experts across many fields. As noted by Harvard Business Review, “These massive, decentralized, and crowd-sourced data can reliably be converted to life-saving knowledge if tempered by expertise, transparency, rigor, and collaboration.”2 These models can give us a more complete understanding of the virus and the resulting pandemic, but with so much data available, they are numerous and varied. They can also be manipulated.

Recently in Florida, the designer of the state’s COVID-19 dashboard was removed from her position. Rebekah Jones claims her removal was due to her unwillingness to “manually change data to drum up support for the plan to reopen.”3 She refused an order to censor data that did not support a political agenda. How do we know this is not happening more often?

It is a matter of trust I find difficult to reconcile. The agencies tasked with formulating a response to the crisis, such as the Centers for Disease Control and Prevention, are relying on the data in these models as an integral part of their “surveillance systems”, created “in collaboration with state, local, territorial, and academic partners to monitor COVID-19 disease in the United States.”4 How do they know the data is free from outside influence? How do they know they are using the correct information to form their recommendations? They are the experts, but I can only hope they are right.

Although we have gained a great deal of knowledge about this virus and its effects in a short time, it is certainly not enough, and questions will remain. We must continue to amass as much information as possible with a specific, granular approach, then apply our combined talents to interpret it and our common goal to use it correctly. Our response to this crisis must be founded on accurate data. How we combat the pandemic depends on it.