Hospital Admissions
| Attribute | Details |
|---|---|
| Source Name | hospital-admissions |
| Data Source | Health system partners |
| Geographic Levels | National, State, County, Hospital Referral Region (HRR), Metropolitan Statistical Area (MSA) (see geography coding docs) |
| Temporal Granularity | Daily (see date format docs) |
| Reporting Cadence | Daily |
| Date of Last Data Revision | October 20, 2020 (see data revision docs) |
| Temporal Scope Start | 2020-02-01 |
| License | CC BY |
Changelog
See COVIDcast Signal Changes for general information about how we track changes to signals.
October 20, 2020
We replaced the custom geo mapping file (used to aggregate from ZIP code to county, MSA, HRR, and state) with a central geo file based on rigorously sourced US Census data.
Overview
This data source is based on electronic medical records and claims data about hospital admissions, provided to us by health system partners. We use this inpatient data to estimate the percentage of new hospital admissions with a COVID-associated diagnosis code in a given location, on a given day.
See also our Health & Human Services data source for official COVID hospitalization reporting from the Department of Health & Human Services.
Active Signals. These signals are currently updated.
| Signal | Description |
|---|---|
| smoothed_covid19_from_claims | Estimated percentage of new hospital admissions with COVID-associated diagnoses, based on claims data from health system partners, smoothed in time using a Gaussian linear smoother. Earliest date available: 2020-02-01 |
| smoothed_adj_covid19_from_claims | Same as smoothed_covid19_from_claims, but with systematic day-of-week effects removed using the same mechanism as in doctor-visits. Earliest date available: 2020-02-01 |
Deprecated Signals. These signals are frozen and no longer updated as of Oct 1, 2020.
| Signal | Description |
|---|---|
| smoothed_covid19 | Estimated percentage of new hospital admissions with COVID-associated diagnoses, based on electronic medical record and claims data from health system partners, smoothed in time using a Gaussian linear smoother. Earliest date available: 2020-02-01 |
| smoothed_adj_covid19 | Same as smoothed_covid19, but with systematic day-of-week effects removed using the same mechanism as in doctor-visits. Earliest date available: 2020-02-01 |
Lag and Backfill
Hospitalizations are reported and processed by the health system partners several days after they occur, so estimates for a specific day typically become available only several days later.
The amount of lag in reporting can vary, depending in particular on whether the data comes from electronic medical records or from processed claims. After we first report estimates for a specific date, further hospitalization data may arrive for that date, or diagnoses for admissions from that date may change. When this occurs, we issue new estimates. For example, an estimate for June 10th may first be available in the API on June 14th and subsequently be revised on June 16th.
As claims and records are available at a significant and variable latency, the signal experiences heavy backfill with data delayed for a couple of weeks. We expect estimates available for the most recent 7-13 days to change substantially in later data revisions (having a median delta of 10% or more). Estimates for dates more than 57 days in the past are expected to remain fairly static (having a median delta of 1% or less), as most major revisions have already occurred.
See our blog post for more information on backfill.
Limitations
This data source is based on electronic medical records and claims data provided to us by health system partners. The partners can report on a portion of hospitalizations, but not all of them, and so this source only represents those hospitalizations known to them. Their coverage may vary across the United States.
Standard errors and sample sizes are not available for this data source.
Qualifying Admissions
We receive two daily data streams of new hospital admissions recorded by the health system partners at each location. One stream is based on electronic medical records, and the other comes from claims records.
In the electronic medical records stream, admissions are considered COVID-associated if they meet the following criteria:
- If the admission has any ICD-10 code matching {U071, U072, B9729}, or
- If the primary ICD-10 code is one of {R05, R060, R509, Z9911, R0902, R0603, R0609, R062, R069, R0602, R0600, J9691, J9692, J9621, J9690, J9601, J9600, J189, J22, J1289, J129, J1281, B9721, B9732, B342, B349, A419, R531, R6889} and there is a secondary ICD-10 code of Z20828, or
- If the primary ICD-10 code is Z20828.
For the claims stream, admissions are considered COVID-associated if the patient has a primary ICD-10 code matching {U071, U072, B9729, J1281, Z03818, B342, J1289}.
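The criteria above can be read as two small classification rules, one per stream. The following is a minimal sketch, not the production pipeline: the function and constant names are hypothetical, and it assumes each admission arrives as one primary code plus a list of secondary codes, written without periods as in the lists above.

```python
# ICD-10 code sets copied from the criteria above (written without periods).
EMR_ANY = {"U071", "U072", "B9729"}
EMR_PRIMARY = {
    "R05", "R060", "R509", "Z9911", "R0902", "R0603", "R0609", "R062",
    "R069", "R0602", "R0600", "J9691", "J9692", "J9621", "J9690", "J9601",
    "J9600", "J189", "J22", "J1289", "J129", "J1281", "B9721", "B9732",
    "B342", "B349", "A419", "R531", "R6889",
}
Z20828_CODE = "Z20828"
CLAIMS_PRIMARY = {"U071", "U072", "B9729", "J1281", "Z03818", "B342", "J1289"}


def is_covid_emr(primary, secondary=()):
    """Apply the three EMR criteria to a single admission (hypothetical helper)."""
    # Criterion 1: any code on the admission matches the direct set.
    if {primary, *secondary} & EMR_ANY:
        return True
    # Criterion 2: listed primary code together with a secondary Z20828.
    if primary in EMR_PRIMARY and Z20828_CODE in secondary:
        return True
    # Criterion 3: the primary code itself is Z20828.
    return primary == Z20828_CODE


def is_covid_claims(primary):
    """Apply the claims criterion, which looks at the primary code only."""
    return primary in CLAIMS_PRIMARY
```

Note the asymmetry between the streams: the EMR rule inspects secondary codes, while the claims rule as stated depends on the primary code alone.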
Estimation
For a fixed location \(i\) and time \(t\), let \(Y_{it}\) denote the number of hospital admissions meeting the qualifying conditions. Let \(N_{it}\) denote the total number of hospital admissions.
Our estimate of the COVID-19 percentage is given by
\[\hat p_{it} = 100 \cdot \frac{Y_{it} + 0.5}{N_{it} + 1}.\]
The additional pseudo-observation of 0.5 means this estimate can be interpreted as the posterior mean of a binomial proportion with a \(\text{Beta}(1/2, 1/2)\) Jeffreys prior. The practical effect is to prevent \(\hat p_{it}\) from being exactly 0 or 100, which would result in estimated standard errors of 0. The estimated standard error is:
\[\widehat{\text{se}}(\hat{p}_{it}) = 100 \sqrt{\frac{\frac{\hat{p}_{it}}{100}(1-\frac{\hat{p}_{it}}{100})}{N_{it}}}.\]
Data Streams
smoothed_covid19 and smoothed_adj_covid19: These signals combine data from electronic medical records (emr) and claims records. Here, \(Y_{it} = Y_{it}^{\text{emr}} + Y_{it}^{\text{claims}}\) is the number of hospital admissions meeting the qualifying conditions, where the superscript denotes the respective data stream. Similarly, \(N_{it} = N_{it}^{\text{emr}} + N_{it}^{\text{claims}}\) is taken as the total number of hospital admissions. Thus, each data stream contributes to the estimate of the COVID-19 percentage in proportion to its total admissions.
smoothed_covid19_from_claims and smoothed_adj_covid19_from_claims: These signals use data solely from the claims stream: \(Y_{it} = Y_{it}^{\text{claims}}\) and \(N_{it} = N_{it}^{\text{claims}}\).
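As an illustration of the estimate, its standard error, and the pooling of streams, here is a minimal sketch for a single location and day. The function names are hypothetical, and the sketch leaves out the smoothing, padding, and day-of-week adjustments that the published signals also receive.

```python
import math


def covid_percentage(y, n):
    """Return (estimate, standard error), both in percent.

    y: number of qualifying admissions; n: total admissions (must be > 0).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    # Pseudo-count form of the estimate: 100 * (Y + 0.5) / (N + 1).
    p_hat = 100 * (y + 0.5) / (n + 1)
    frac = p_hat / 100
    se = 100 * math.sqrt(frac * (1 - frac) / n)
    return p_hat, se


def combined_percentage(y_emr, n_emr, y_claims, n_claims):
    """Pool EMR and claims counts first, then estimate (combined signals)."""
    return covid_percentage(y_emr + y_claims, n_emr + n_claims)


# Claims-only signals pass the claims counts straight to covid_percentage.
p_hat, se = covid_percentage(0, 9)
```

Because the counts are summed before the ratio is taken, a stream with more total admissions pulls the combined estimate further toward its own rate. The final call shows that a day with no qualifying admissions still gets a nonzero estimate and a nonzero standard error.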
Backwards Padding
This source undergoes the same backwards padding adjustment as the doctor-visits
source; see its documentation.
Smoothing
This source undergoes the same smoothing adjustment as the doctor-visits
source (see its documentation), with the exception
that the smoothing is performed on the raw counts, rather than the raw rate.
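To make the distinction concrete, the sketch below smooths the count series \(Y_{it}\) and \(N_{it}\) separately and only then forms the percentage. The doctor-visits smoother itself is not described on this page, so a plain one-sided Gaussian kernel average stands in for it. The bandwidth, the function names, and the choice to apply the 0.5 pseudo-count after smoothing are all assumptions made for illustration.

```python
import math


def gaussian_smooth(values, bandwidth=7.0):
    """One-sided Gaussian-kernel average (stand-in smoother, an assumption)."""
    out = []
    for t in range(len(values)):
        # Weight day s <= t by a Gaussian in the distance t - s.
        weights = [math.exp(-0.5 * ((t - s) / bandwidth) ** 2) for s in range(t + 1)]
        total = sum(weights)
        out.append(sum(w * v for w, v in zip(weights, values)) / total)
    return out


def percentage_from_smoothed_counts(y, n, bandwidth=7.0):
    """Smooth the counts first, then compute the percentage for each day."""
    y_s = gaussian_smooth(y, bandwidth)
    n_s = gaussian_smooth(n, bandwidth)
    return [100 * (ys + 0.5) / (ns + 1) for ys, ns in zip(y_s, n_s)]
```

One consequence of working on counts is that a day with very few admissions carries little weight in the smoothed ratio, whereas smoothing raw daily rates would treat a noisy low-volume day the same as a high-volume one.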