The Development of Forecast Confidence Measures Using NCEP Ensembles and their Real-time Implementation Within NWS Web-based Graphical Forecasts

Summary of goals:

Scientific

Analysis and modeling of the functional relationship between various measures of ensemble spread and forecast confidence

Examination of the seasonal and synoptic-pattern evolution of ensemble forecast spread

Diagnosing how various initial condition perturbations influence model bias and forecast skill

Determination of the forecast value of an ensemble spread normalized by the climatological spread

Examination of the most effective methods for graphically representing forecast confidence

Operational Forecasting

Improved recognition of the level of guidance agreement or disagreement within the forecast process

Graphical display of the level of forecast confidence within the IFPS system and GFE-based web-output

Graphical display of the level of forecast confidence as it compares to climatology within the IFPS & GFE

Improved communication of the uncertainties associated with a given weather pattern to both forecasters and the public

Educational

Further training of forecasters on how to use and interpret ensemble output

Education of the public on why forecast confidence necessarily decreases as forecast length increases

Education of the public that certain types of forecast patterns will necessarily imply lower or higher forecast confidence

Education of the public on how the graphically displayed forecast confidence measures should be interpreted

Improved perception of the forecast process and its communication to the public

Short term goals

Over the next 3 years, about 3 years (2004-2006) of individual model ensemble data will be analyzed by month and forecast length for each grid point. The result will be a climatology of each ensemble member, which will not match the observed climatology based on the NCEP reanalysis. The model ensemble climatology will therefore be normalized so that there is eventually a mapping between the model ensemble value and the real-world value. Because the amount of data is limited at this point, the climatology will be based on a 45-day running climatology. Variables will include MSLP, geopotential heights, temperature, wind speed, and precipitation, among others. Once the normalized climatological distributions have been calculated for each grid point within each ensemble member, confidence measures will be developed per month or season by comparing the normalized spread of the ensemble members. This normalized spread will also be compared to the typical spread for that time of year and location to arrive at a relative measure of forecast uncertainty.
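As a rough illustration of the last step, the sketch below computes a relative uncertainty measure at a single grid point and forecast length. It assumes that spread is measured as the standard deviation across ensemble members, which the plan above does not specify. The function names and sample values are hypothetical.

```python
from statistics import mean, pstdev


def ensemble_spread(member_values):
    """Spread across ensemble members (assumed here: population standard deviation)."""
    return pstdev(member_values)


def relative_spread(current_members, past_spreads):
    """Ratio of the current spread to the typical spread for this time of year.

    Values below 1 mean the members agree more than usual (higher confidence);
    values above 1 mean they agree less than usual (lower confidence).
    """
    typical = mean(past_spreads)
    return ensemble_spread(current_members) / typical


# Hypothetical member temperatures (F) and past spreads for the same lead time.
ratio = relative_spread([50.0, 52.0, 54.0], [2.0, 4.0])
print(f"relative spread: {ratio:.2f}")
```

A ratio is used here only because it is easy to read; a difference between the two spreads would serve the same purpose.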

Frequently Asked Questions

What are the time series plots actually showing?

Below is an image of the Binghamton (BGM) confidence time series from the 12Z 06 May GFS run.

The blue line:

The blue line in the image shows the average ensemble spread for this time of year. It is based on a 45-day running mean centered on the current date. You can see that the average ensemble spread for this time of year increases with forecast length, from about 1°F at initialization to around 6 to 7°F at the 180-hour forecast. This is expected because forecast error, when averaged over long periods of time, generally increases with forecast length.
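The centered 45-day running mean can be sketched as follows for one forecast length. The sketch assumes one spread value per calendar day in a 365-day list and wraps the window around the year boundary. Both are assumptions made for illustration, and the function name is hypothetical.

```python
def centered_running_mean(daily_spread, day_index, window=45):
    """Mean of daily_spread over a window centered on day_index.

    daily_spread: one spread value per calendar day (0-based index).
    The window wraps around the end of the year (an assumption of this sketch).
    """
    half = window // 2  # 22 days on either side for a 45-day window
    n = len(daily_spread)
    values = [daily_spread[(day_index + k) % n] for k in range(-half, half + 1)]
    return sum(values) / len(values)


# Hypothetical input: spread that equals the day index, to make the result easy to check.
daily = [float(d) for d in range(365)]
print(centered_running_mean(daily, 100))  # mean of days 78..122
```

Repeating this for every forecast length gives one point of the blue line per forecast hour.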

The black line:

The black line shows the spread of the current GFS ensemble. In this particular image, the forecast spread is generally near the normal line until around May 8th. From about May 8th until late on May 11th, the current forecast spread is LESS than the normal forecast spread for this time of year. The area between the normal spread and the current forecast spread is shaded in green. Thus, there is MORE agreement among the ensemble members than is normal for the time of year. All else being equal, this means higher confidence in the forecast and less sensitivity to the uncertainty in the initial conditions.

However, from May 12th out to the 180-hour forecast, the current ensemble spread grows FASTER than the GFS climatological spread. The envelope of solutions is expanding more rapidly than is normal for a 6-7 day forecast initialized on May 6th at 12Z, so our forecast confidence for days 6-7 is LESS than normal. This is shown with the red shading.
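The green and red shading amounts to a point-by-point comparison of the two lines. A minimal sketch, with a hypothetical function name and made-up spread values:

```python
def classify_confidence(current_spread, normal_spread):
    """Label each forecast time by comparing current spread to normal spread."""
    labels = []
    for current, normal in zip(current_spread, normal_spread):
        if current < normal:
            labels.append("higher")  # green shading: more agreement than normal
        elif current > normal:
            labels.append("lower")   # red shading: less agreement than normal
        else:
            labels.append("normal")
    return labels


# Hypothetical spreads (F) at three forecast times.
print(classify_confidence([1.0, 2.0, 7.5], [1.0, 3.0, 6.5]))
# ['normal', 'higher', 'lower']
```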

The yellow line:

The yellow line shows one standard deviation (1 sigma) of the 25-year reanalysis 2-meter temperature climatology for BGM. Before May 12, the GFS ensemble envelope is smaller than the climatological envelope, which is good and expected. On May 12 and beyond, however, the spread of the GFS ensemble forecasts is actually larger than that of the observed climatology. For that period we should place little faith in the ensembles, since their envelope of solutions is broader than what climatology alone gives us. In this case, it would probably be better to go with a climatology forecast on days 6 and 7. Note also that this line will be higher in the winter months than in the summer months, since the potential range of temperature is generally larger in winter than in summer.
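The point at which the ensemble stops adding value over climatology can be located with a simple scan. The sketch below treats the yellow line as a single value for the whole forecast period, which is a simplification, and uses a hypothetical function name and made-up numbers.

```python
def first_lead_beyond_climatology(lead_hours, ensemble_spread, climo_sigma):
    """Return the first forecast hour at which ensemble spread exceeds climatological 1 sigma.

    Returns None if the ensemble envelope stays within climatology at every lead time.
    """
    for hour, spread in zip(lead_hours, ensemble_spread):
        if spread > climo_sigma:
            return hour
    return None


# Hypothetical forecast hours and spreads (F), with a climatological sigma of 6.0 F.
print(first_lead_beyond_climatology([0, 72, 144, 180], [1.0, 3.0, 6.5, 7.0], 6.0))
# 144
```

From the returned hour onward, a climatology forecast is the likely fallback, as described above.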

Since 2-meter temp is prone to more errors than MOS, why not just use MOS?

The fact that this was done for 2-meter temp rather than MOS is not a big concern. Even though 2-meter temp is prone to more errors than MOS, we are not concerned explicitly with the 2-meter temperature itself. We are more concerned with how the 2-meter temp varies from ensemble member to ensemble member. Because the blue line is a model climatology, the comparison is mostly calibrated. Even if ensemble MOS were used, the plot would be very similar. The main reason we do not use MOS is that the ensemble grids are more readily available.

Will there be maps other than 2-meter temp?

Since the ensemble grids are more readily available, we will be able to produce more confidence maps/timeseries for variables such as precipitation, wind, thickness, and even precipitation type. Additional maps/timeseries will be available within the next month.

How are the actual forecasts performing when there is high/low confidence?

One would hope that when there is higher than normal confidence, the corresponding forecast issued at the same time is quite accurate. Conversely, one would expect that when there is lower than normal confidence, the corresponding forecast is less accurate. Since early March, forecasts issued during a number of high and low confidence periods have been verified. So far, the results indeed show that forecasts made under high confidence regimes are much more accurate than forecasts made under low confidence regimes. However, there have been a few instances where forecasts under high confidence regimes have busted by as much as 8 degrees and where forecasts under low confidence regimes have busted by only a degree or two. In the coming months, a number of case studies will be posted. Of course, your feedback is welcome.
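One simple way to summarize such a verification is to average the absolute forecast error separately for each confidence regime. The sketch below assumes mean absolute error as the accuracy measure, which the text does not specify. The function name is hypothetical and the sample cases are invented for illustration only; they are not actual verification results.

```python
from statistics import mean


def mae_by_confidence(cases):
    """Mean absolute error per confidence regime.

    cases: iterable of (regime_label, forecast_temp, observed_temp) tuples.
    """
    errors = {}
    for label, forecast, observed in cases:
        errors.setdefault(label, []).append(abs(forecast - observed))
    return {label: mean(values) for label, values in errors.items()}


# Invented example cases (F), not real verification data.
cases = [
    ("high", 60.0, 61.0),
    ("high", 55.0, 52.0),
    ("low", 70.0, 62.0),
    ("low", 48.0, 50.0),
]
print(mae_by_confidence(cases))
```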