Overview
Each data-providing entity supplies historical data at the beginning of its
participation in the syndromic surveillance system. We fit a model to these data and
use the results to estimate distributional parameters for each day of the year.
Using these distributional parameters, we calculate the probability of seeing as many
or more cases than were seen on each day, and we then correct for multiple testing.
The same parameters are also used to generate expected counts that serve as
input to SaTScan.
Model
The model is a generalized linear or generalized linear mixed model.
The part of the model pertaining to the day of the observation includes:
- A secular linear trend over time
- Sine and cosine effects for seasonality
- Month indicators (11) for non-trigonometric effects of season
- Day-of-week indicators (6) for day-to-day variability
- Holiday and day-after-holiday indicators
No assertion is made that this is an exhaustive or parsimonious list of
useful covariates.
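As an illustration, the day-level covariates listed above could be assembled for a single date as follows. This is a minimal sketch under stated assumptions: the helper name `day_covariates`, the reference levels (January, Monday), the single annual harmonic with a 365.25-day period, and the caller-supplied holiday set are all hypothetical choices, not the project's actual specification.

```python
import math
from datetime import date, timedelta
from typing import List, Set

def day_covariates(d: date, start: date, holidays: Set[date]) -> List[float]:
    """Hypothetical helper: day-level covariates for date d.

    Assumed layout (22 values): trend, sin, cos, 11 month indicators
    (January is the reference), 6 day-of-week indicators (Monday is the
    reference), holiday, day after holiday.
    """
    # Secular linear trend: days elapsed since the start of the data
    row = [float((d - start).days)]
    # One annual harmonic; the 365.25-day period is an assumption
    angle = 2 * math.pi * d.timetuple().tm_yday / 365.25
    row += [math.sin(angle), math.cos(angle)]
    # Month indicators for February..December
    row += [1.0 if d.month == m else 0.0 for m in range(2, 13)]
    # Day-of-week indicators for Tuesday..Sunday (date.weekday(): Monday == 0)
    row += [1.0 if d.weekday() == w else 0.0 for w in range(1, 7)]
    # Holiday and day-after-holiday indicators
    row.append(1.0 if d in holidays else 0.0)
    row.append(1.0 if d - timedelta(days=1) in holidays else 0.0)
    return row
```

The source does not say how many harmonics are used; more sine/cosine pairs would simply add columns.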
Currently, this is a Poisson generalized linear model with a separate intercept
for each region (5-digit or 3-digit ZIP code). Model assessment is ongoing.
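Fitting the Poisson model amounts to maximizing its log-likelihood, in which each region-day's log mean is that region's intercept plus the shared day-level effects. The sketch below only evaluates that objective; the helper name `poisson_loglik` and its argument layout are hypothetical, and in practice a statistical package's GLM routine would perform the maximization.

```python
import math

def poisson_loglik(counts, regions, X, intercepts, beta):
    """Hypothetical helper: log-likelihood of a log-link Poisson GLM with
    one intercept per region and shared day-level coefficients.

    counts[i]:   observed cases for row i (one region-day)
    regions[i]:  region key (e.g. a ZIP code) for row i
    X[i]:        day-level covariates for row i
    intercepts:  dict mapping region key -> intercept
    beta:        coefficients for the day-level covariates
    """
    ll = 0.0
    for y, region, x in zip(counts, regions, X):
        # Linear predictor = log of the Poisson mean
        eta = intercepts[region] + sum(b * xi for b, xi in zip(beta, x))
        # log P(Y = y) = y * log(mu) - mu - log(y!)
        ll += y * eta - math.exp(eta) - math.lgamma(y + 1)
    return ll
```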
Distributional parameter estimation
To find the distributional parameters, we calculate the value of the linear
predictor from the model for each area and each day. We then invert the link
function (for the Poisson model's log link, we exponentiate) to obtain the estimated
distributional parameter (the mean, for the Poisson). This approach treats the
estimate as fixed; it does not account for the uncertainty in the estimate.
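The two steps above (linear predictor, then inverse link) reduce to a few lines for the Poisson case. A minimal sketch, assuming a log link and a hypothetical helper `expected_mean` that receives one region's intercept and the shared coefficients:

```python
import math
from typing import List

def expected_mean(intercept: float, beta: List[float], x: List[float]) -> float:
    """Hypothetical helper: estimated Poisson mean for one region-day.

    Point estimate only; like the method described, it ignores the
    sampling variability of the fitted coefficients.
    """
    eta = intercept + sum(b * xi for b, xi in zip(beta, x))  # linear predictor
    return math.exp(eta)  # inverse of the log link
```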
Probability assessment
Given the estimated distribution, we calculate the probability of each possible
number of cases. The probability of seeing k or more cases, where k is the observed
count, is one minus the sum of the probabilities of all smaller counts:
$P(X \ge k) = 1 - \sum_{j=0}^{k-1} P(X = j)$.
Correction for multiple testing
Because many areas are tested, the nominal probability from the previous step is
misleading on its own. If $N$ tests are performed each day, a result with probability
$p$ or smaller is expected about $pN$ times per day. The reciprocal, $(pN)^{-1}$, is
the number of days of testing at that rate required to expect exactly one result with
a probability as small as the one observed.
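The correction is a single division. A sketch with a hypothetical helper name; with illustrative numbers, $p = 0.0001$ and 500 areas tested daily give $1/(0.0001 \times 500) = 20$ days.

```python
import math

def recurrence_interval_days(p: float, n_tests: int) -> float:
    """Hypothetical helper: days of testing needed to expect one result
    with a probability as small as p, when n_tests tests are run per day."""
    expected_per_day = p * n_tests  # results this extreme expected each day
    if expected_per_day <= 0:
        return math.inf
    return 1.0 / expected_per_day
```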
SaTScan analysis
The space-time scan statistic uses a cylindrical window with a circular
geographic base and with height corresponding to time. Both the base and the height
vary in size, up to a maximum geographic size of 50% of the population at risk and a
maximum height of three days. The cylindrical window is then moved in space and time,
so that for each possible geographic location and size, it also visits each possible
time period. In effect, we obtain a very large number of overlapping cylinders of
different sizes and shapes, jointly covering the entire study region, where each
cylinder reflects a possible outbreak.
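To make the enumeration concrete, here is a simplified sketch of how candidate cylinders could be generated: circles are centered on each region's centroid and grown by adding the next-nearest region until the 50% population limit would be exceeded, and each circle is paired with every time window of one to three days. SaTScan performs this internally; the helper `candidate_cylinders`, the planar coordinates, and the deduplication of identical zones are assumptions of this sketch.

```python
import math

def candidate_cylinders(centroids, populations, n_days, max_pop_frac=0.5, max_len=3):
    """Hypothetical helper: list (zone, first_day, last_day) cylinders.

    centroids:   dict region -> (x, y) planar coordinates (an assumption)
    populations: dict region -> population at risk
    n_days:      length of the study period; days are numbered 0..n_days-1
    """
    total = sum(populations.values())
    seen = set()
    cylinders = []
    for cx, cy in centroids.values():
        # Grow a circle around this centroid, nearest regions first
        order = sorted(centroids,
                       key=lambda r: math.hypot(centroids[r][0] - cx, centroids[r][1] - cy))
        zone, pop = [], 0
        for region in order:
            pop += populations[region]
            if pop > max_pop_frac * total:  # circle would exceed the population cap
                break
            zone.append(region)
            key = frozenset(zone)
            if key in seen:  # same zone already reached from another center
                continue
            seen.add(key)
            # Pair the circle with every window of 1..max_len consecutive days
            for length in range(1, max_len + 1):
                for first in range(n_days - length + 1):
                    cylinders.append((tuple(zone), first, first + length - 1))
    return cylinders
```

If SaTScan is run as a prospective analysis, only windows ending on the most recent day are evaluated; the text above does not say which mode is used.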
The estimated Poisson mean from the modeling described above is used as the expected
count for each region and day. This adjusts the SaTScan analysis for the seasonal,
day-of-week, and other patterns observed in the historical period.
For each cylinder, the observed and
expected numbers of cases are noted, and these are used to calculate a Poisson-based
log likelihood ratio reflecting how 'unusual' it is to observe what was observed.
The p-values are adjusted for the multiple testing inherent in the many cylinders
considered.
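The per-cylinder statistic and the multiple-testing adjustment can be sketched as follows. The likelihood ratio is Kulldorff's Poisson formula, assuming, as in that model, that expected counts are scaled so their total equals the total observed. The adjustment shown is Monte Carlo hypothesis testing, SaTScan's standard approach, in which the largest likelihood ratio over all cylinders in the real data is ranked against the largest ratios from data sets simulated under the null hypothesis; generating those simulated data sets is not shown, and the function names are hypothetical.

```python
import math
from typing import List

def poisson_llr(c: float, e: float, total: float) -> float:
    """Hypothetical helper: Kulldorff's Poisson log likelihood ratio.

    c:     observed cases inside the cylinder
    e:     expected cases inside the cylinder (assumed > 0), with all expected
           counts scaled so that they sum to the total observed
    total: total observed cases over the whole region and period
    Only excesses are scored; cylinders with c <= e get 0.
    """
    if e <= 0:
        raise ValueError("expected count must be positive")
    if c <= e:
        return 0.0
    llr = c * math.log(c / e)
    if total > c:  # the outside term vanishes when every case is inside
        llr += (total - c) * math.log((total - c) / (total - e))
    return llr

def monte_carlo_p(observed_max: float, simulated_max: List[float]) -> float:
    """Hypothetical helper: adjusted p-value from ranking the real data's
    largest LLR among the largest LLRs of the simulated data sets."""
    at_least = sum(1 for s in simulated_max if s >= observed_max)
    return (1 + at_least) / (len(simulated_max) + 1)
```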
For a full technical discussion and details, please
contact the head statistician for the project.