Editorial Type: Spill Surveillance Principles
Online Publication Date: 23 Jul 2024

False Confidence and Time-Series Statistics in Oil Spill Injury Assessments

Article Category: Research Article
DOI: 10.7901/2169-3358-2024.1.295

ABSTRACT

Time-series statistics, specifically time-series regression approaches, have been used by Natural Resource Damage Assessment (NRDA) practitioners to determine whether an injury to a natural resource has occurred and, when an injury is thought to have occurred, to quantify the magnitude of the injury. This study evaluates one of these models using case studies from two recent oil spill injury assessments that used time-series regression methods to evaluate injuries to marine mammals and outdoor recreation services.

First, we summarize the general structure and assumptions underlying the time-series models typically used in oil spill injury assessments. Second, we investigate uncertainty using prediction errors derived by applying the modeling procedure to baseline periods in the data (i.e., times when there are no effects of an oil spill). Our proposed approach quantifies the uncertainty arising from the combination of natural variation in the data and uncertainty associated with the modeling process. Compared to other statistical methods commonly used to quantify uncertainty associated with regression models, our approach provides a more robust estimate that better matches the informational needs of an oil spill injury assessment. Applying our approach to the case studies, we find that time-series regression models can have relatively large prediction errors that limit the ability of analysts to reach firm conclusions regarding injury. Failure to appropriately quantify uncertainty can therefore lead to false confidence. Finally, we discuss how incorporating additional information in a causal analysis can reduce uncertainty associated with the use of time-series modeling in this context, leading to higher confidence associated with oil spill injury assessments.

INTRODUCTION

Natural Resource Damage Assessment (NRDA) under the Oil Pollution Act (OPA) is a process through which the environment and public are made “whole” for injuries to natural resources resulting from unpermitted releases of petroleum products. Making the environment whole involves returning natural resources and their services to their baseline (i.e., without spill) conditions through cleanup and primary restoration, and is not a focus of this paper.

Making the public whole primarily occurs through compensatory restoration, which is the implementation of one or more restoration projects designed to compensate the public for the loss in natural resource services between the time of the spill and the time when services return to baseline (what the authors call the spill period). Finding appropriate compensatory restoration involves three main assessment phases: i) injury determination; ii) injury quantification; and iii) damage determination.

Injury determination seeks to identify what natural resources and services were injured, where injury is defined in OPA regulations as “an observable or measurable adverse change in a natural resource or impairment of a natural resource service” resulting from the incident. OPA NRDA guidance states that Natural Resource Trustees (Trustees), acting on behalf of the public, are required to demonstrate that the incident directly or indirectly caused an adverse change in the natural resource or services (Huguenin et al. 1996).

For natural resources and/or services that are determined to have been injured, the injury is quantified during the injury quantification phase as the change in natural resource services relative to baseline conditions. Services are defined as “the functions performed by a natural resource for the benefit of another natural resource and/or the public.” Services include human uses of natural resources such as outdoor recreation. Baseline is defined as “the condition of the natural resources and services that would have existed had the incident not occurred.”

Damage determination is the process of estimating the amount of money sufficient to fund restoration projects such that, when those projects are implemented, the public is compensated for the quantified injury.

This paper considers the use of time-series regression modeling to inform the injury assessment, which consists of the injury determination and injury quantification phases. The Background section presents relevant background information, including why uncertainty needs to be better considered during injury assessments. The Methods section presents an overview of the methods developed by the authors to quantify the uncertainty in baseline predictions. The Case Study Examples section illustrates the methods, using two recent OPA NRDA assessments as case studies. Finally, the Discussion section summarizes the main conclusions.

BACKGROUND

Time series regression models can be used in NRDA injury assessments when consistent data have been collected over time for a resource, service, or human use. A common example is daily visitation or revenue data collected by agencies that manage outdoor recreation access, such as state park entrance fees and paid parking at beach parking lots. Recent outdoor recreation injury assessments that relied on time series data include the 2014 Texas City “Y” oil spill (TCY),1 the 2015 Refugio Beach oil spill (Refugio Beach Oil Spill Trustees, 2021), and the 2019 Second 80s Incident.2 An example for a natural resource is the TCY dolphin injury assessment, which relied on strandings data collected by the National Oceanic and Atmospheric Administration's (NOAA's) Marine Mammal Strandings Network.

The term time series regression can refer to a variety of analytical approaches; here, it simply refers to a regression model that: 1) is applied to data that are collected over time; and 2) includes some measure of time as a predictor in the regression.

The service metric of interest (e.g., recreational visitation) is modeled as a function of variables that are expected to be related to the service metric in predictable ways (e.g., recreational visitation tends to be higher during the summer months and on weekends compared to weekdays). Time-series models as defined in this paper must include at least one time variable and may include other variables. Because the variables do not necessarily have a causal relationship with the service metric, these models should be thought of as predictive rather than explanatory. In other words, the model can be used to make predictions of the service metric but cannot generally be used to explain the underlying causal relationships between the variables and the service metric.

A generic structure for a time series regression model is:

where

  • Countn is the service metric of interest for observation n;

  • Timen is one or more variables that represent the time of observation n (e.g., year, month, etc.) including the model constant;

  • Othern represents one or more optional variables that are expected to be useful predictors of observation n (e.g., for recreation, variables describing weather conditions);

  • Spilln is an optional variable that indicates whether observation n falls within the defined spill period;3

  • α, βT, βO, and βS are coefficients to be estimated; and

  • εn is an error term that represents the difference between the model's prediction and the actual count for observation n.
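A linear additive form consistent with the symbol definitions above is sketched here. The additive specification is an assumption; a count outcome is often modeled with a log link instead, in which case the right-hand side would describe the log of the expected count.

```latex
\mathrm{Count}_n = \alpha + \beta_T\,\mathrm{Time}_n + \beta_O\,\mathrm{Other}_n + \beta_S\,\mathrm{Spill}_n + \varepsilon_n
```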

When a Spill variable is included, it can be interpreted directly as follows: 1) the estimated coefficient for the Spill variable indicates the deviation in the service metric from the predicted baseline level during the defined spill period; and 2) the standard error of the estimated coefficient can be used to test whether the coefficient is statistically significant. Baseline predictions can be made using all variables except the Spill variable; leaving out the Spill variable in this way makes predictions during the post-spill period as if the spill had not occurred.
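The interpretation above can be illustrated with a minimal sketch that fits an ordinary least squares model containing a Spill indicator and then predicts baseline by setting that indicator to zero. The data, the single weekend indicator standing in for the Time variables, and the helper names (`solve`, `fit_ols`, `predict`) are all hypothetical; the standard error needed for the significance test is omitted for brevity.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ols(rows: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(coefs: Sequence[float], row: Sequence[float]) -> float:
    """Linear prediction for one observation."""
    return sum(c * v for c, v in zip(coefs, row))


# Hypothetical data. Columns: constant, weekend indicator, Spill indicator.
X = [[1, 0, 0], [1, 1, 0], [1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1]]
counts = [100, 150, 102, 148, 81, 129]

coefs = fit_ols(X, counts)
alpha, beta_weekend, beta_spill = coefs

# Baseline predictions for the spill-period rows: same predictors, Spill set to 0.
baseline = [predict(coefs, [const, weekend, 0]) for const, weekend, spill in X if spill == 1]
```

In this toy data set, `beta_spill` is the estimated deviation from predicted baseline during the spill period, and `baseline` holds the without-spill predictions for the same observations.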

Whether or not a Spill variable is included, if baseline is modeled perfectly (i.e., without error), then any deviation in the actual data from the predicted baseline should be interpreted as a spill effect because, by definition, baseline includes everything except the spill effect. Real models, of course, do not predict baseline perfectly; therefore, deviations from the predicted baseline cannot generally be attributed to the spill without additional information. As noted in OPA NRDA guidance, a causal analysis is essential to injury determination, as “OPA emphasizes the need for trustees to establish that the identified injuries resulted from the incident” (Huguenin et al. 1996, p. 2-4; emphasis added). With respect to injury determination, causal analysis seeks to determine if the incident was likely the cause of, or a contributing factor to, an adverse change in a natural resource or services relative to baseline conditions. With respect to injury quantification, causal analysis attempts to disentangle the influences of different causes so that the effect of the spill can be isolated and quantified.

As noted in Pearl (2010), regression analysis deals with associations among variables but not causation. The time-series regression models used in NRDAs cannot determine causation without outside information. However, regression modeling is well-suited to answer two particular questions that aid causal analysis.

Question 1: How likely is it that the observed deviation from predicted baseline is the result of sampling variation? If certain assumptions are valid, then this question can be answered based on the statistical significance of the estimated effect of the Spill variable. The OPA NRDA guidance stresses the well-known finding that statistical significance does not indicate biological or social significance, using the following quote from the National Research Council (1990) for emphasis: “Whether changes in the environment are statistically significant has no bearing on the extent to which the changes may be either meaningful or important” (Huguenin et al. 1996, p. 3-11).

Question 2: How likely is it that the observed deviation from predicted baseline is the result of non-spill factors? This question is answered by the method for quantifying the uncertainty in baseline predictions that is presented in this paper.

METHODS

The authors refer to the uncertainty around the combined effects of non-spill factors on the predicted baseline as uncertainty bounds. Uncertainty bounds are calculated from individual prediction errors, which are defined as the differences between a model's baseline predictions and the actual values of the baseline data.4 The uncertainty bounds provide information on a model's ability (or lack thereof) to make baseline predictions and quantify the uncertainty of non-spill effects in the modeling process.

Calculating uncertainty bounds involves four basic steps:

  1. Use a time-series regression model to make baseline predictions for all baseline data points (i.e., all points outside of the suspected spill period). The model, as used here, should be interpreted broadly to include any decisions made by the analyst regarding the treatment of data and methods of interpretation, such as grouping the results by weeks or months.

  2. Calculate prediction errors as the differences (or percent differences, as appropriate) between baseline predictions and actual baseline data.

  3. Calculate the upper and lower limits of the prediction errors based on the mean and standard deviation.5

  4. Calculate uncertainty bounds by applying the upper and lower limits to the baseline predictions.
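The four steps can be sketched as follows, assuming baseline predictions have already been produced by whatever model is in use (step 1). Several details are assumptions rather than part of the method as stated: the function names, the sign convention (actual minus predicted), the use of the sample standard deviation, and the multiplier `z = 1.96`, which is borrowed from the case studies. Percent differences would follow the same pattern.

```python
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def error_limits(predicted: Sequence[float], actual: Sequence[float],
                 z: float = 1.96) -> Tuple[float, float]:
    """Steps 2 and 3: prediction errors on baseline data, then their limits."""
    errors = [a - p for p, a in zip(predicted, actual)]  # assumed sign convention
    center, spread = mean(errors), stdev(errors)  # sample standard deviation
    return center - z * spread, center + z * spread


def uncertainty_bounds(spill_period_predictions: Sequence[float],
                       limits: Tuple[float, float]) -> List[Tuple[float, float]]:
    """Step 4: apply the error limits to the baseline predictions."""
    lower, upper = limits
    return [(p + lower, p + upper) for p in spill_period_predictions]


def within_bounds(observed: float, bounds: Tuple[float, float]) -> bool:
    """True when non-spill factors cannot be ruled out for this observation."""
    low, high = bounds
    return low <= observed <= high


# Hypothetical baseline predictions and baseline data.
limits = error_limits([100, 100, 100, 100], [90, 110, 95, 105])
bounds = uncertainty_bounds([100.0], limits)
```

With these illustrative numbers, an observed value of 95 falls inside the bounds, while a value of 70 falls outside them.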

The results are easily interpreted. If the actual with-spill data falls within the uncertainty bounds, then non-spill factors cannot be ruled out as causes or contributing factors. For example, imagine that there is a 5% decline in recreational visitation relative to a model's baseline predictions during the month following a spill. If the uncertainty bounds are ±10% of the predicted baseline, then one cannot have high confidence that the 5% deviation is a spill effect because non-spill factors could have caused the entire 5% deviation.6 Conversely, if the uncertainty bounds are ±2% of the predicted baseline, then one can have higher confidence that the spill is the cause or a contributing factor to the deviation because it is unlikely that non-spill factors could have caused the deviation.

Note that one should not interpret the timing of the deviation (for example, that it occurs shortly after the spill) as evidence in favor of the spill effect over non-spill causes without additional causal information. To do so would be to commit the post hoc, ergo propter hoc (after this, because of this) logical fallacy, which assumes that one thing causes another simply because the first thing preceded the other. While logically a cause must precede its effect, proper time ordering is a necessary but not sufficient condition for concluding causality.

These steps for calculating uncertainty bounds are designed to be flexible so that they can be applied to a variety of regression models or algorithms for investigating a spill effect. The basic premise is simple – whatever one does to try to estimate the spill effect, do the same thing for the non-spill baseline data. The resulting errors then indicate the uncertainty in baseline predictions, which is essentially a measure of the model's accuracy when predicting the data on which it was trained.

CASE STUDY EXAMPLES

The method for calculating uncertainty bounds is illustrated using examples from two recent OPA NRDAs. Note that the models are not exactly the same as those used in the actual assessments; they are simplified and should be considered illustrative examples.

Refugio Beach Oil Spill – Outdoor Recreation

On May 19, 2015, a pipeline owned and operated by Plains All American Pipeline ruptured near Refugio State Beach. Over 100,000 gallons of crude oil were spilled, much of which ran down a storm drain and into a ravine under the freeway, entering the ocean near Refugio Beach in California (NOAA, 2024). The resulting NRDA used time-series regression modeling to assess potential spill effects on outdoor recreation. Some of the Trustees’ modeling is documented in Horsch et al. (2018). The Responsible Party team's modeling was led by one of the authors and is not publicly available.

Daily entrance fee data at various locations throughout the study area were obtained from California State Parks and other sources. The fees are used as an index of visitation patterns. This paper uses two state beaches/parks representative of the wide range of potential spill impacts throughout the study area. Refugio State Beach is close to the initial spill location, was heavily oiled, and had an approximately two-month closure to recreators because of the spill. Point Mugu State Park is located 63 miles (straight-line distance) from the initial spill location, had no documented oiling, and had no recreational closures.

The procedure to investigate spill effects includes the following steps:

  1. Estimate a time series regression model using historical baseline data (March through September for 2010 to 2014). No data from the spill year (2015) are included, and no Spill variable is included. Each site is modeled separately. The model for each site takes the form:

    where

    • Parameters to be estimated are omitted from the equation for ease of exposition;

    • Month, DayOfWeek, and Holiday are sets of indicator variables representing time for the observation on day d; and

    • Temperature, Precipitation, and WindSpeed are all continuous variables representing weather conditions on day d.

  2. The fitted model is used to predict daily baseline in the post-spill period during the spill year (May 19, 2015, through the end of September).

  3. Predicted baseline and actual data are aggregated into half-month time blocks. There are two half-months per month, where the first half-month is defined as the 1st through 15th of the month and the second half-month is defined as the 16th through end of the month.

  4. Actual (with spill) data are compared to predicted baseline in each half-month to investigate the potential spill effect.
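With parameters omitted as described in step 1, the site-level model can be sketched in functional form using the variables defined there. The dependent-variable name (Fees, standing for daily entrance fees) and the generic function f are placeholders chosen for illustration, not the notation of the original equation.

```latex
\mathrm{Fees}_d = f\left(\mathrm{Month}_d,\ \mathrm{DayOfWeek}_d,\ \mathrm{Holiday}_d,\ \mathrm{Temperature}_d,\ \mathrm{Precipitation}_d,\ \mathrm{WindSpeed}_d\right)
```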

Figure 1 illustrates the modeling results without uncertainty bounds. The blue lines in Figure 1 illustrate baseline recreation in the spill year, as predicted by the fitted model. The orange lines illustrate the actual data. The blue and orange lines are compared to investigate the potential spill effect.

Figure 1: Example of Time-Series Modeling Applied to Two Outdoor Recreation Sites in the Refugio Beach Oil Spill Injury Assessment.

Citation: International Oil Spill Conference 2024, 1; 10.7901/2169-3358-2024.1.295

For both sites, the actual data are close to the predicted baseline in the first half of May (before the spill occurred). Both sites drop below predicted baseline following the spill. For Refugio Beach, the actual with-spill data are below predicted baseline from the second half of May through the first half of September. For Point Mugu, the actual with-spill data are below predicted baseline in the second half of May, above predicted baseline in June and July, and below in August (note that there are no data for September for Point Mugu). Based just on these modeling results, one might conclude that Refugio Beach was negatively affected by the spill through the first half of September and Point Mugu was affected for the second half of May.

The picture changes when the authors’ uncertainty bounds are included, especially for Point Mugu (see Figure 2). The shaded blue areas in Figure 2 indicate the uncertainty bounds in baseline predictions, calculated as uncertainty bounds = mean prediction error ±1.96 standard deviations, where the mean and standard deviation are calculated separately for each half-month time block.
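A minimal sketch of the per-block calculation follows. It assumes daily baseline predictions and actual values are summed into half-month blocks within each baseline year, and that the mean and standard deviation are then taken across years for each block. The function names, the use of simple differences rather than percent differences, and the data are hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import mean, stdev
from typing import Dict, Iterable, Tuple


def half_month_block(day: date) -> Tuple[int, int]:
    """Return (month, half): half 1 is the 1st-15th, half 2 is the 16th onward."""
    return day.month, 1 if day.day <= 15 else 2


def block_errors(daily: Iterable[Tuple[date, float, float]]) -> Dict[Tuple[int, int, int], float]:
    """Sum (predicted, actual) by year and half-month, then take actual minus predicted."""
    totals = defaultdict(lambda: [0.0, 0.0])
    for day, predicted, actual in daily:
        key = (day.year, *half_month_block(day))
        totals[key][0] += predicted
        totals[key][1] += actual
    return {key: act - pred for key, (pred, act) in totals.items()}


def limits_by_block(errors: Dict[Tuple[int, int, int], float],
                    z: float = 1.96) -> Dict[Tuple[int, int], Tuple[float, float]]:
    """Mean prediction error +/- z standard deviations, separately per half-month block."""
    grouped = defaultdict(list)
    for (_, month, half), err in errors.items():
        grouped[(month, half)].append(err)
    limits = {}
    for block, errs in grouped.items():
        center, spread = mean(errs), stdev(errs)  # needs at least two baseline years
        limits[block] = (center - z * spread, center + z * spread)
    return limits


# Hypothetical records: (day, predicted baseline, actual) for three baseline years.
daily = [
    (date(2010, 6, 1), 100, 90), (date(2010, 6, 2), 100, 100),
    (date(2011, 6, 1), 100, 105), (date(2011, 6, 2), 100, 105),
    (date(2012, 6, 1), 100, 100), (date(2012, 6, 2), 100, 100),
]
limits = limits_by_block(block_errors(daily))
```

Applying `limits[(6, 1)]` to the predicted baseline for the first half of June in the spill year would give the shaded band for that block.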

Figure 2: Adding Uncertainty Bounds for the Two Outdoor Recreation Sites in the Refugio Beach Oil Spill Injury Assessment.


Using 1.96 standard deviations means that 5% of observations are expected to fall outside of the uncertainty bounds due to non-spill causes. In other words, if an orange point falls outside of the uncertainty bounds, it means that there is approximately a 5% chance that non-spill factors could have caused the entire deviation. Any orange point within the uncertainty bounds could have been caused entirely by non-spill factors. However, one cannot conclude that the spill was not the cause or a contributing factor based just on the uncertainty bounds; a full causal analysis is needed to determine this.7
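The 5% figure corresponds to the share of a normal distribution lying more than 1.96 standard deviations from its mean. The short check below makes that link explicit; it assumes the prediction errors are approximately normally distributed, which should be verified for any real data set. The function name `coverage` is illustrative.

```python
from statistics import NormalDist


def coverage(z: float) -> float:
    """Probability that a standard normal value lies within +/- z of its mean."""
    std_normal = NormalDist()
    return std_normal.cdf(z) - std_normal.cdf(-z)


inside = coverage(1.96)   # share expected within the bounds
outside = 1.0 - inside    # share expected outside due to non-spill variation
```

If the errors are skewed or heavy-tailed, the share falling outside the bounds can differ from 5%, so empirical percentiles of the prediction errors may be a reasonable alternative.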

For Refugio State Beach, the actual with-spill data are well outside the uncertainty bounds through the closure period, which is an obvious spill effect consistent with the known spill-related closure. Once the site was re-opened to the public in the second half of July, recreation rebounded but not all the way to the predicted baseline, and the second half of July is still outside of the uncertainty bounds. However, the August and September data are within the uncertainty bounds, indicating that the difference between actual with-spill data and predicted baseline in those months could have been caused entirely by non-spill factors. Quantifying uncertainty reduces one's potential confidence that the spill effect extended into August and September.

For Point Mugu, all of the actual with-spill data are well within the uncertainty bounds. This suggests that it is highly possible that non-spill factors caused the entire difference between the predicted baseline and the actual data. Based on the modeling results, one can have little or no confidence that the deviation in the second half of May is, in fact, a spill effect. A causal analysis would consider additional information when interpreting these results, such as the lack of documented oiling and no recreational closures at this site.

These examples show that failing to consider uncertainty in the modeling results can lead to false confidence in the modeling results. In terms of injury determination, one can have high confidence that recreation at Refugio Beach was negatively affected by the spill. However, one can have little or no confidence that recreation at Point Mugu was affected by the spill. Based on the modeling information, recreation at Point Mugu probably should not be considered injured unless other information included in a causal analysis suggests otherwise.

Texas City “Y” Oil Spill – Dolphin Strandings

On March 22, 2014, the 585-foot bulk carrier M/V Summer Wind collided with the oil tank-barge Kirby 27706 in Galveston Bay near Texas City, Texas. The barge spilled approximately 168,000 gallons of intermediate fuel oil into lower Galveston Bay and eventually into the Gulf of Mexico. The majority of the discharged oil stranded on shorelines between Galveston and Matagorda Islands (NOAA, 2024). Following the spill, stranded dolphin carcasses were reported by the public and individuals conducting shoreline oiling assessment.

Data on recorded dolphin strandings for 2000 through 2017 were obtained from NOAA's Marine Mammals Strandings Network and used to investigate the potential effect of the TCY oil spill on dolphin mortality. None of the modeling performed for the NRDA is publicly available at this time. For this paper, the authors apply a simplified model to the data for illustrative purposes that is similar to one of the several models considered in the actual assessment.

The daily counts were aggregated by week of the year.8 Although there is a general seasonal pattern that is reflected in the mean, the data exhibit relatively high variation within weeks across years, especially in Weeks 5 through 14 (see Figure 3). In other words, there is a seasonal pattern in the mean but not necessarily a consistent seasonal pattern across years.

Figure 3: Recorded strandings data used to predict baseline (the spill year is not included).


The data for the spill year were included in the regression along with a Spill variable, defined as the two weeks following the spill. The following regression model was fit to the data:

where

  • WeekOfYearn is a set of indicator variables representing time for the observation in week n;

  • UnusualMortalityEventn is an indicator variable representing whether an unusual mortality event was designated on observation n; and

  • Spilln is an indicator variable representing the two weeks following the spill.
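Using the variables defined above, the strandings model can be sketched in functional form. The dependent-variable name (Strandings, standing for the weekly count of recorded strandings) and the generic function f are placeholders for illustration; the functional form and estimator used in the assessment are not stated here.

```latex
\mathrm{Strandings}_n = f\left(\mathrm{WeekOfYear}_n,\ \mathrm{UnusualMortalityEvent}_n,\ \mathrm{Spill}_n\right)
```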

Figure 4 illustrates the modeling results.

Figure 4: Results of model. The grey shaded area indicates the defined spill period.


The model predicts that baseline strandings (blue line) increase from Week 1 until Week 11, then decrease from Week 11 to nearly zero by Week 17, a pattern that closely follows the mean of the underlying data shown in Figure 3. There is a large difference between the actual strandings and the predicted baseline during the spill period, which (without consideration of uncertainty) appears consistent with a spill effect. The actual number of strandings during the spill period is 19. With an estimated baseline of 5, this implies a spill effect of 14 strandings if the model perfectly predicts baseline.

The Spill variable is highly statistically significant (at a greater than 99.999% confidence level), which indicates that the deviation during the spill period is very unlikely to have been the result of sampling variation. Does this mean that one can have confidence that the spill caused strandings to increase by 14? As noted above, statistical significance cannot answer this question. However, the authors’ uncertainty bounds can help one get closer to an answer.

Figure 5 shows the results of the model including uncertainty bounds. The upper graph shows the prediction errors for baseline data and their upper and lower limits. The spill year's data are not included in the upper graph. The lower graph shows the uncertainty bounds around baseline predictions and the spill year's data.

Figure 5: TCY Dolphins Stranding Model Results with Uncertainty Bounds.


The width of the uncertainty bounds indicates that there is considerable uncertainty in baseline predictions; in some weeks the uncertainty is more than 100% of the prediction. In the spill period, the uncertainty bounds indicate that non-spill factors could be responsible for almost all of the deviation from baseline. These findings suggest that non-spill factors were not likely the cause of the entire deviation from predicted baseline, which suggests that the spill may have been a contributing factor. However, because non-spill factors could have caused most of the deviation, one can have little or no confidence that the deviation of 14 strandings was likely caused solely by the spill.

Considering the inaccuracy of the model as shown by the uncertainty bounds, baseline strandings could range from 0 to 18, which is 0% to 95% of the observed strandings. Obviously 0% to 95% is a very wide range; this indicates that the model provides little useful information about the number of baseline strandings, and therefore, little insight into the magnitude of a potential spill effect. The large underprediction in Week 12 of the spill year further calls the results of the model into question (12 strandings were recorded, but the model predicted only 4.4). If the model cannot reasonably predict baseline in the time leading up to the spill, how can one have confidence that it can predict baseline after the spill?
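The arithmetic behind these ranges can be reproduced directly from the figures reported above (19 observed strandings, a point baseline of 5, and a baseline range of 0 to 18). The implied range of the spill effect is a simple subtraction derived here for illustration; the variable names are hypothetical.

```python
observed = 19                        # actual strandings during the spill period
point_baseline = 5                   # model's baseline estimate
baseline_low, baseline_high = 0, 18  # baseline range from the uncertainty bounds

# Spill effect implied if the model predicted baseline perfectly.
point_effect = observed - point_baseline

# Spill effect implied at each end of the baseline range.
effect_range = (observed - baseline_high, observed - baseline_low)

# Baseline as a share of observed strandings, in percent.
baseline_share_pct = (100 * baseline_low / observed, 100 * baseline_high / observed)
```

Under these numbers the implied spill effect runs from 1 to 19 strandings, which illustrates how little the model constrains the magnitude of the effect.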

As has been mentioned throughout this paper, a causal analysis is needed to better understand potential causes and effects. The causal analysis for the dolphin injury assessment would need to consider the pathology and necropsy analyses performed. It might also be prudent to refine the model, if possible, to reduce uncertainty. For example, additional variables expected to predict dolphin mortality, such as water temperature and the direction of winds and tides, could be included in the model (as well as in the causal analysis). Alternatively, one could consider whether the pattern of strandings before the spill period (Weeks 1 through 12) could help to predict the pattern of strandings after Week 12, using autoregressive models, a common type of time-series model that uses past data to predict future data.

DISCUSSION

Conventional methods of interpreting time-series regression model results for NRDA injury assessments do not consider uncertainty in the baseline predictions. At worst, they don't consider uncertainty at all. At best, they typically only consider the uncertainty associated with sampling variation (through testing for statistical significance or other statistical techniques such as bootstrapping).

Regression models are necessarily abstractions and simplifications from the actual causal processes that generate data. They tend to predict means well but often underpredict or overpredict actual values. Comparing data from a spill period to a model's baseline predictions without considering the uncertainty in the predictions can lead to false confidence in the results and incorrect conclusions about the likelihood that the spill caused or contributed to an adverse impact observed following a spill.

The authors developed a method to quantify the uncertainty of the baseline predictions made with time-series regression models in NRDA injury assessments which generally considers all non-spill factors that might be affecting the results (or more accurately, all non-spill effects that are reflected in the variation of the actual baseline data). The proposed method uses a model's prediction errors to construct uncertainty bounds around the model's baseline predictions during the spill period. Applying this method can contribute to an analyst's understanding of the potential causes of deviations during injury determination and can be used to form ranges of potential service reductions during injury quantification.

As noted throughout this paper, time-series regression analysis cannot determine or quantify injury on its own; the results need to be combined with other information in causal analysis. The authors also note that the time-series regression models used in NRDAs tend to be relatively (perhaps even overly) simplistic. A reduction in predictive uncertainty can likely be gained through more advanced models. The usefulness of autoregressive and moving average time-series models as well as models that include reference sites should be evaluated. Injury assessments should include a comprehensive model selection and evaluation process. Similarly, it is generally appropriate to perform sensitivity analyses as part of model selection and evaluation. Multiple-lines-of-evidence and weight-of-evidence approaches should be considered. A formal evaluation of the data and decision process such as that described in EPA (2023) could also be useful for integrating interpretation of the modeling results into a causal analysis.

Finally, we recommend that statistical analyses be implemented in a way that can help inform the causal analysis. A better understanding of what statistical analysis can and can't do in terms of establishing causality is an important first step. While this paper touched on this topic, the reader should refer to other sources such as Pearl (2010) for more information on causal analysis.

  1. General information related to the TCY spill and NRDA can be found at NOAA (2024). Detailed NRDA methods are not yet publicly available.

  2. Although the NRDA for the Second 80s incident was conducted under the Comprehensive Environmental Response, Compensation, and Liability Act of 1980 (CERCLA), it is included here because most of the recreational impacts assessed were associated with a release of hazardous substances into the water, which was substantially similar to an oil spill for the purposes of this paper. General information related to the incident and assessment can be found at TCEQ (2024). Detailed NRDA methods are not yet publicly available.

  3. One must include post-spill data in order to include a Spill variable in the model. When the optional Spill variable is not included, data following the spill should be excluded when estimating the model.

  4. Prediction error includes more sources of variation than statistical significance (which only includes sampling variation); therefore, prediction error is a more comprehensive representation of uncertainty in the estimates than statistical significance.

  5. Readers with a statistical background should note that this step does not involve finding the confidence interval of the mean prediction error, but rather the variation of prediction errors around the mean.

  6. The spill could still be a contributing factor, which is why a full causal analysis is warranted.

  7. Indeed, this lines up with the Trustees’ conclusions for these sites in the actual assessment.

  8. Additionally, some strandings were removed from the dataset in an attempt to control for the increased search effort associated with the spill response.

REFERENCES

Copyright: Copyright 2024 – International Oil Spill Conference 2024