<![CDATA[Michael Remington]]>http://localhost:2368/http://localhost:2368/favicon.pngMichael Remingtonhttp://localhost:2368/Ghost 4.1Wed, 29 Dec 2021 20:34:16 GMT60<![CDATA[Using Machine Learning to Group US States by Covid-19 Hospitalization Trends]]>http://localhost:2368/using-machine-learning-to-compare-states-covid-trajectories/605e41f775c89200014a0456Tue, 01 Jun 2021 19:20:20 GMT

Overview

Machine learning algorithms can provide unique insights from Covid-19 data. In this article I'll use a clustering algorithm to group US states based on trends in Covid-19 hospitalizations and other metrics. Some groupings are surprising and invite further investigation. Why would a state have a different trend than its neighbors? Why would some distant states have similar trends? These results may hint at patterns of interstate travel during the pandemic.

First we'll explore the hospitalization trend groupings, then we'll add more metrics involving cases and deaths. Lastly, we'll dive into how the project works.

Data Preprocessing

All data was scaled to the range 0-1 to account for population differences. We are looking for similar hospitalization trends and timing, not similar numbers of people hospitalized. Without scaling, the groups are predictable and uninteresting: states with similar populations end up grouped together.
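
Per-state scaling can be sketched as below. This is a minimal illustration, assuming the hospitalization data sits in a NumPy array with one row per state and one column per day; the function name `scale_each_state` and the sample numbers are hypothetical, not taken from the project's source code.

```python
import numpy as np

def scale_each_state(matrix: np.ndarray) -> np.ndarray:
    """Min-max scale each row (one state's time series) to the range [0, 1].

    `matrix` has shape (n_states, n_days). Scaling each row on its own removes
    the effect of population size, leaving only the shape and timing of the curve.
    """
    row_min = matrix.min(axis=1, keepdims=True)
    row_max = matrix.max(axis=1, keepdims=True)
    span = row_max - row_min
    # Guard against a flat series (max == min) to avoid dividing by zero.
    span[span == 0] = 1.0
    return (matrix - row_min) / span

# Two hypothetical states with the same curve shape but a 10x population difference.
raw = np.array([
    [10.0, 50.0, 100.0, 40.0],
    [100.0, 500.0, 1000.0, 400.0],
])
scaled = scale_each_state(raw)
print(scaled)  # both rows become identical after scaling
```

scikit-learn's `MinMaxScaler` would give the same result if applied to the transposed array, since it scales each column independently.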

All states before scaling. States with higher populations have taller peaks. Note: values dip below 0 because they have been centered by Scikit-Learn's PCA implementation.
After scaling, all state trends are equal height.

Results

The OPTICS clustering algorithm produced these groupings. The inputs were Covid-19 hospitalizations from March 2020 to March 2021 for each state.
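
The input to the clustering step is one hospitalization time series per state. A minimal sketch of building that state-by-day matrix with pandas, assuming long-format daily records with covidtracking.com-style columns `state`, `date`, and `hospitalizedCurrently`; the sample rows, the exact end date of the window, and filling missing days with 0 are assumptions for illustration.

```python
import pandas as pd

# Hypothetical long-format records: one row per (state, date).
records = pd.DataFrame({
    "state": ["AZ", "AZ", "AZ", "TX", "TX", "TX"],
    "date": pd.to_datetime(["2020-03-01", "2020-03-02", "2020-03-03"] * 2),
    "hospitalizedCurrently": [5, 8, 12, 20, 35, 50],
})

# Keep the study window (March 2020 to March 2021; the end date is an assumption).
in_window = (records["date"] >= "2020-03-01") & (records["date"] <= "2021-03-31")
window = records[in_window]

# Pivot to one row per state and one column per day.
matrix = window.pivot(index="state", columns="date", values="hospitalizedCurrently")

# Assumption: days a state did not report are filled with 0 before clustering.
matrix = matrix.fillna(0.0)
print(matrix.shape)  # (2, 3)
```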

Group 1

Hospitalizations in these states followed nearly identical trends and peaked within 15 days of each other. This group has 3 bordering states.
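
Statements like "peaked within 15 days of each other" can be checked directly from the state-by-day data. A minimal sketch, assuming a pandas DataFrame with one row per state and one date column per day; `peak_spread_days` and the toy states A, B, and C are hypothetical.

```python
import pandas as pd

def peak_spread_days(matrix: pd.DataFrame) -> int:
    """Days between the earliest and latest peak among the states in `matrix`.

    Rows are states; columns are consecutive dates.
    """
    peak_dates = matrix.idxmax(axis=1)  # date of each state's maximum value
    return int((peak_dates.max() - peak_dates.min()).days)

# Hypothetical group of three states whose curves peak on different days.
dates = pd.date_range("2020-11-01", periods=30, freq="D")
group = pd.DataFrame(0.0, index=["A", "B", "C"], columns=dates)
group.loc["A", dates[5]] = 1.0
group.loc["B", dates[12]] = 1.0
group.loc["C", dates[19]] = 1.0
print(peak_spread_days(group))  # 14
```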


Group 2

Washington State is the geographical outlier in this group. Despite the distance, its hospitalization trajectory shows the same plateau at around the same time as the other states.


Group 3

Despite sharing borders with the previous group, Delaware, Massachusetts, and New Hampshire are placed in a separate group by the clustering algorithm. These states show a narrower and later peak in hospitalizations than their neighbors in group 2. Delaware is particularly interesting: it is completely surrounded by states from group 2 but does not share their hospitalization curve.


Group 4

These states are geographically distant, yet their hospitalization trends are similar. Their curves have similar shapes and peaked within ten days of each other. There may be a non-obvious commonality behind these similar trends.


Additional Hospitalization Groups

Lowering the minimum number of states per group from 3 to 2 produces additional groups. Note that AZ, TX, LA, and MS experienced two large peaks in hospitalizations, while DC, NY, NC, and VA each experienced a single large peak, all at about the same time.
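
The one-peak versus two-peak distinction was read from the charts, but it could also be checked automatically. A minimal sketch using SciPy's `find_peaks`, assuming each state's curve is already scaled to 0-1; the prominence threshold of 0.3, the helper `count_large_peaks`, and the synthetic curves are assumptions, not values from the project. The groups themselves come from OPTICS, not from this peak count.

```python
import numpy as np
from scipy.signal import find_peaks

def count_large_peaks(series: np.ndarray, min_prominence: float = 0.3) -> int:
    """Count peaks in a 0-1 scaled series that rise at least `min_prominence` above their surroundings."""
    peaks, _ = find_peaks(series, prominence=min_prominence)
    return len(peaks)

days = np.arange(365)
# Hypothetical scaled curves: a summer-plus-winter pattern and a winter-only pattern.
two_waves = np.exp(-((days - 130) / 20.0) ** 2) + np.exp(-((days - 300) / 25.0) ** 2)
two_waves /= two_waves.max()
one_wave = np.exp(-((days - 300) / 25.0) ** 2)

print(count_large_peaks(two_waves), count_large_peaks(one_wave))  # 2 1
```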

Two Peaks


One Peak


All Hospitalization Trend Groups


Adding Cases, Deaths, and Other Metrics

Using my web interface, we'll now add four more metrics to cluster states by. Precise definitions of these metrics are given in the Methods section.


Taking all five metrics into account, the clustering algorithm returns new groups.

The line charts now attempt to represent all five metrics in each line. This is done with dimensionality reduction (more details later).
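
One plausible way to turn five metrics into a single line per state is to treat every (state, day) pair as a sample with five features and keep the first principal component. This is a sketch under that assumption, not necessarily how the web app arranges its data; `combine_metrics` and the random input are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA

def combine_metrics(data: np.ndarray) -> np.ndarray:
    """Reduce (n_states, n_days, n_metrics) to one line per state, shape (n_states, n_days).

    Each (state, day) pair is one sample with n_metrics features; PCA keeps
    the single direction that explains the most variance across all samples.
    """
    n_states, n_days, n_metrics = data.shape
    samples = data.reshape(n_states * n_days, n_metrics)
    component = PCA(n_components=1).fit_transform(samples)  # shape (n_samples, 1)
    return component.reshape(n_states, n_days)

rng = np.random.default_rng(0)
# Hypothetical scaled data: 4 states, 120 days, 5 metrics.
data = rng.random((4, 120, 5))
lines = combine_metrics(data)
print(lines.shape)  # (4, 120)
```

Because PCA centers the data, the combined lines can dip below 0, and since the sign of a principal component is arbitrary, a combined curve may appear flipped.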

Group 1

The absence of the Dakotas is notable.


Group 2


Group 3

Washington is again a geographical outlier, even though its chart shows trends in the five metrics similar to those of the other states in the group.


Group 4


Group 5

These southern states show similar trajectories in the five metrics. They have two major peaks while the previous groups had just one.


All Trend Groups

Additional groups 6 and 7 were created by lowering the minimum states per group from 3 to 2.


Discussion

I'm a data scientist, not an infectious disease expert, but I'll speculate that interstate travel, population density, and related factors may be key drivers of these groups.

For example, it seems plausible that there was more interstate travel between North and South Dakota than between South Dakota and any state in group 1 (red in the above map). Other groups, like group 3 (green), are harder to explain because of their geographical outliers.

Methods

All data is from covidtracking.com. Each series was scaled to the range 0-1, as explained earlier.

Data Definitions:

hospitalizedCurrently: Number of people hospitalized with Covid-19 on a given day, including existing patients, not just new admissions.
deathIncrease: New Covid-19 deaths per day, smoothed with a one-week average as is common in prominent publications such as the New York Times.
inIcuCurrently: Number of people in an ICU with Covid-19 on a given day, including existing patients, not just new admissions.
percentPositive: The fraction of tests that are positive. I believe this is a better comparison than cases per million because case definitions and testing rates vary between states.
Case Fatality Rate: Deaths as a fraction of known cases in each state. This is not the lethality of Covid-19, which is instead estimated by the infection fatality rate (IFR).
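
The derived metrics above can be sketched with pandas for a single state's daily records. The raw column names (`deathIncrease`, `positiveIncrease`, `totalTestResultsIncrease`, `death`, `positive`) follow covidtracking.com's style, but which fields the project actually used, and computing percent positive from daily increments, are assumptions; `derive_metrics` and the sample data are hypothetical.

```python
import pandas as pd

def derive_metrics(daily: pd.DataFrame) -> pd.DataFrame:
    """Add smoothed deaths, percent positive, and case fatality rate to one state's daily records.

    `daily` must be sorted by date. Column choices are assumptions, not the project's code.
    """
    out = daily.copy()
    # One-week (7-day) rolling mean of new deaths; the first six days average fewer points.
    out["deathIncrease7d"] = out["deathIncrease"].rolling(7, min_periods=1).mean()
    # Share of the day's tests that were positive; days with no tests become NaN.
    tests = out["totalTestResultsIncrease"].where(out["totalTestResultsIncrease"] > 0)
    out["percentPositive"] = out["positiveIncrease"] / tests
    # Case fatality rate: cumulative deaths as a fraction of cumulative known cases.
    out["caseFatalityRate"] = out["death"] / out["positive"]
    return out

# Hypothetical eight days of data for one state.
daily = pd.DataFrame({
    "deathIncrease": [0, 7, 14, 7, 0, 7, 14, 7],
    "positiveIncrease": [10, 20, 30, 40, 50, 60, 70, 80],
    "totalTestResultsIncrease": [100, 200, 300, 400, 500, 600, 700, 0],
    "death": [0, 7, 21, 28, 28, 35, 49, 56],
    "positive": [10, 30, 60, 100, 150, 210, 280, 360],
})
result = derive_metrics(daily)
print(result[["deathIncrease7d", "percentPositive", "caseFatalityRate"]])
```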

Reducing Dimensions

If more than one metric is selected, dimensionality reduction is done with PCA. PCA projects the metrics onto a lower-dimensional space, keeping the directions that explain the most variance. A potential improvement would be to cluster in all five dimensions and reduce dimensions only for visualization. I implemented a t-SNE function for this purpose.
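
The suggested improvement, clustering on all dimensions and reducing only for the plot, might look like the sketch below. It assumes each state's five metrics over time are flattened into one feature vector; the synthetic data, the array sizes, and the `perplexity` value are assumptions, and the project's own t-SNE function may differ.

```python
import numpy as np
from sklearn.cluster import OPTICS
from sklearn.manifold import TSNE

rng = np.random.default_rng(1)
# Hypothetical data: 12 states x 10 days x 5 metrics, in two well-separated blobs.
data = np.concatenate([
    rng.normal(0.0, 0.05, size=(6, 10, 5)),
    rng.normal(1.0, 0.05, size=(6, 10, 5)),
])
# Flatten each state's (days x metrics) block into one feature vector.
features = data.reshape(data.shape[0], -1)

# Cluster using every dimension, so nothing is discarded before clustering.
labels = OPTICS(min_samples=3).fit_predict(features)

# Reduce to 2-D only for plotting; perplexity must be below the number of samples.
embedding = TSNE(n_components=2, perplexity=5, random_state=0).fit_transform(features)
print(labels)
print(embedding.shape)
```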

The OPTICS clustering algorithm was used to generate clusters. The minimum number of data points per cluster was 3 unless otherwise stated.
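
A minimal sketch of the clustering step with scikit-learn's `OPTICS`, assuming the minimum number of data points per cluster corresponds to `min_samples` (when `min_cluster_size` is left unset, scikit-learn uses the `min_samples` value for it as well). The helper `cluster_states` and the synthetic curves are hypothetical.

```python
import numpy as np
from sklearn.cluster import OPTICS

def cluster_states(matrix: np.ndarray, min_states: int = 3) -> np.ndarray:
    """Cluster rows of `matrix` (one scaled series per state) with OPTICS.

    Returns one label per state; -1 marks states that OPTICS left unclustered.
    """
    return OPTICS(min_samples=min_states).fit_predict(matrix)

rng = np.random.default_rng(2)
days = np.arange(100)
# Hypothetical scaled curves: three groups of three states peaking on different days.
curves = np.vstack([
    np.exp(-((days - peak) / 10.0) ** 2) + rng.normal(0.0, 0.01, size=days.size)
    for peak in (30, 30, 30, 50, 50, 50, 70, 70, 70)
])

labels_min3 = cluster_states(curves, min_states=3)
labels_min2 = cluster_states(curves, min_states=2)  # the "minimum of 2" runs
print(labels_min3)
print(labels_min2)
```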

https://twitter.com/mremingtn

Source code

github.com/remingm/covid19-clustering-states

]]>
<![CDATA[Interactive Covid-19 population immunity estimates]]>
]]>
http://localhost:2368/project-interactive-covid-19-webapp/605d0ed675c89200014a0183Sun, 07 Mar 2021 02:02:00 GMT

I made a web app that estimates population immunity to SARS-CoV-2. Try it out at covid.mremington.co.

covid.mremington.co

What follows is an overview of how the estimate works. Feel free to skip this post and just explore the site if you're not interested in the behind-the-scenes.

First, a crash course in population immunity:

Why is population immunity important?

  • Normal life may safely return when enough of the population has immunity to Covid-19, limiting further spread. This is known as "herd immunity". [1]
  • Herd immunity may be achieved either through infection and recovery or by vaccination. [2]
  • Besides protecting the individual, the goal of vaccination is for a population to reach herd immunity safely. [2]
  • Herd immunity also protects those who are unable to be vaccinated, such as newborns and immunocompromised people, because spread of the disease within the population is very limited. [2]

How much of the population needs immunity?

  • The herd immunity threshold (HIT) is debated among scientists. The commonly accepted herd immunity threshold for SARS-CoV-2 is 60-80%. [3] [1]
  • A research group at the University of Oxford estimates the threshold at 10% to 60% when accounting for T cell immunity studies. [3]
  • Another research group calculates the threshold at 10% to 20% when accounting for diversity in population mixing. Some consider this controversial. [3]
  • Infections, hospitalizations, and related metrics may decline as population immunity rises, even if the HIT is not reached. [1]
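
The 60-80% range can be reproduced with the textbook herd immunity formula, which assumes everyone in the population mixes uniformly. The basic reproduction numbers R_0 below are illustrative values chosen to show the arithmetic, not estimates from the cited sources. Heterogeneous mixing, behind the 10% to 20% estimate above, is exactly what this simple formula leaves out.

```latex
\text{HIT} = 1 - \frac{1}{R_0}, \qquad
R_0 = 2.5 \;\Rightarrow\; \text{HIT} = 1 - 0.4 = 0.6, \qquad
R_0 = 5 \;\Rightarrow\; \text{HIT} = 1 - 0.2 = 0.8
```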

References

  1. COVID-19 Vaccines and Herd Immunity (Harvard Center for Communicable Disease Dynamics)
  2. What Is Herd Immunity? (JAMA)
  3. Covid-19: Do many people have pre-existing immunity? (The BMJ)

To estimate population immunity, we need to know:
1. The number of vaccinated people
2. The number of people who were infected and recovered
3. The overlap between these groups

For vaccinations, my code simply pulls the latest data daily. You can choose whether the first or the second dose counts as immunity.


Determining recovered infections is trickier. Cases represent only a fraction of true infections, since not all infected people get tested. Luckily, Youyang Gu, one of the top Covid-19 modelers, has presented a simple equation for estimating true infections. To use this equation, we need the daily percentage of positive tests and the daily number of new cases.
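
A minimal sketch of how such an equation is applied: reported cases are multiplied by a "prevalence ratio" that grows as test positivity rises, since high positivity suggests many infections go untested. This sketch assumes a ratio of the form a * sqrt(positivity) + b; the exact form and coefficients should be taken from Gu's "Estimating True Infections - Revisited" post, and the `a` and `b` values below are placeholders, not his published numbers.

```python
import numpy as np

def estimate_true_infections(new_cases: np.ndarray, positivity: np.ndarray,
                             a: float, b: float) -> np.ndarray:
    """Scale reported daily cases up to estimated daily infections.

    Assumes prevalence_ratio = a * sqrt(positivity) + b, where positivity is
    the daily fraction of tests that are positive. `a` and `b` are placeholders.
    """
    prevalence_ratio = a * np.sqrt(positivity) + b
    return new_cases * prevalence_ratio

cases = np.array([1000.0, 2000.0])
positivity = np.array([0.04, 0.16])  # 4% and 16% of tests positive
infections = estimate_true_infections(cases, positivity, a=10.0, b=2.0)  # hypothetical a, b
# Running total of estimated infections, i.e. the pool of recovered people over time.
cumulative = infections.cumsum()
print(infections, cumulative)
```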

Many people who get vaccinated have already been infected. It's impossible to know the exact overlap, but if we don't estimate it, our immunity estimate will be too high. One of the top modelers assumes that 50% of vaccinations go to people who have already been infected. In my web app this fraction is customizable, with a default of 20%. An estimated 28% of the US population had been infected as of March 7, 2021.
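
Putting the three ingredients together, a minimal sketch of the immunity estimate: people who were both vaccinated and previously infected are counted once by subtracting the assumed overlap. The function name and example numbers are hypothetical; `vaccinated` would be the first-dose or second-dose count depending on the setting described earlier, and `infected` the cumulative estimated infections.

```python
def population_immunity(vaccinated: float, infected: float, population: float,
                        overlap: float = 0.20) -> float:
    """Estimated fraction of the population with immunity.

    `overlap` is the assumed share of vaccinated people who were already
    infected (20% by default, as in the web app). They are counted only once.
    """
    if not 0.0 <= overlap <= 1.0:
        raise ValueError("overlap must be between 0 and 1")
    immune = vaccinated + infected - overlap * vaccinated
    return min(immune / population, 1.0)

# Hypothetical population of 1,000: 100 vaccinated, 280 estimated past infections.
print(population_immunity(100, 280, 1000))               # 0.36
print(population_immunity(100, 280, 1000, overlap=0.5)) # 0.33
```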


I also used SciPy to find the last peak in hospitalizations and report the estimated population immunity at that time. Note that this is not necessarily the HIT, as seasonality and other factors also affect transmission. My model found that US hospitalizations began to decline from their all-time peak after January 12, 2021 when the estimated population immunity was 22.7%.
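
A minimal sketch of that step with SciPy's `find_peaks`, assuming daily hospitalizations and daily immunity estimates are pandas Series sharing a date index. The prominence threshold, which keeps small bumps from counting as peaks, the helper `immunity_at_last_peak`, and the synthetic data are assumptions; the project's exact peak criteria may differ.

```python
import numpy as np
import pandas as pd
from scipy.signal import find_peaks

def immunity_at_last_peak(hospitalized: pd.Series, immunity: pd.Series,
                          min_prominence: float) -> tuple:
    """Return the date of the last prominent hospitalization peak and the immunity estimate that day."""
    peaks, _ = find_peaks(hospitalized.to_numpy(), prominence=min_prominence)
    if len(peaks) == 0:
        raise ValueError("no peak found; try a lower min_prominence")
    peak_date = hospitalized.index[peaks[-1]]  # the most recent qualifying peak
    return peak_date, float(immunity.loc[peak_date])

# Hypothetical data: 150 days with one hospitalization wave peaking on day 50.
dates = pd.date_range("2020-10-01", periods=150, freq="D")
t = np.arange(150)
hospitalized = pd.Series(100_000 * np.exp(-((t - 50) / 20.0) ** 2), index=dates)
immunity = pd.Series(np.linspace(0.10, 0.30, 150), index=dates)

peak_date, immunity_then = immunity_at_last_peak(hospitalized, immunity, min_prominence=1_000)
print(peak_date.date(), round(immunity_then, 3))
```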

Putting it all together, here is the complete web app, covid.mremington.co, as of March 7th, 2021:

covid.mremington.co

The source code is available here.

Lastly, I was greatly inspired by Youyang Gu of covid19-projections.com, who built a similar model to estimate immunity.

References

  1. Marc Lipsitch, COVID-19 Vaccines and Herd Immunity (Harvard Center for Communicable Disease Dynamics)
  2. Patient Information: What Is Herd Immunity? (JAMA)
  3. Peter Doshi, Covid-19: Do many people have pre-existing immunity? (The BMJ)
  4. Estimating True Infections - Revisited (covid19-projections.com)
  5. Path to Herd Immunity - COVID-19 Vaccine Projections (covid19-projections.com)
]]>