Weatherman Rich's Chatmosphere: June 2018

Introduction

When checking the weather forecast in the morning to see whether you’ll need an umbrella that day, you’ll probably look at the forecast for your current point location, which gives data specific to your neighbourhood. If you’re planning a weekend away, the forecast display can easily be switched to another point location. But what if your business needs to know what is happening across an area much larger than a single point location can represent? For large areas, forecasts from individual point locations are unlikely to suffice and may lead to strategic errors for the business and, ultimately, financial losses.

One example of where aggregated weather forecasts are required is in the energy industry. Electricity demand (load) is strongly dependent on the weather conditions, and so weather data is a vital component of electricity load forecasting. Aggregate-level load forecasts must consider the weather conditions across the entire area, as opposed to using just one location to represent the weather conditions everywhere. This article aims to address the challenges of weather location selection and proposes the best methodologies for determining how many, and which, weather locations should be used when forecasting load over large areas.

The Indian state of Uttar Pradesh, India’s most populous state, is taken as an example case study. Uttar Pradesh covers 243,290 km², roughly the size of the United Kingdom. Weather therefore varies significantly across the state: there are higher-altitude areas in the north, and more rain falls in the northern and eastern areas. As a result, the impact of weather on the state’s load also varies significantly across the area.

Uttar Pradesh, India’s most populous state.

In particular, localised extreme weather events can be hugely problematic. For example, there could be heavy rainfall in Agra (west Uttar Pradesh), but sunny skies in Lucknow (central Uttar Pradesh). If the weather data is only received for Lucknow, the forecasting system would have no knowledge of the rainfall elsewhere.


The geography and climatology vary across Uttar Pradesh.

Methodology

To produce an aggregated state load forecast, we need to form a virtual weather station that best represents the weather across the entire state.

First, the weather forecast locations chosen must offer available, reliable and accurate data; any weather station that does not should be ignored. Let’s assume that we begin with 20 weather station locations across Uttar Pradesh that have reliable and accurate data and provide a good geographical spread across the state.

The virtual weather station can then be built from multiple point locations. But how many point locations should be used? Too few will not represent the area well enough, but too many may actually increase the forecast error, as well as increasing the volume of data to handle, which brings its own issues.

Once you’ve decided how many point locations to use, how do you decide which point locations should be used? Do you choose a roughly even spread of locations across the entire area? Or do you choose more in the west because population is higher there? Ultimately, you want to represent the impact of weather on load, not just the average weather of the state. This will vary with population density, economic and demographic diversity, as well as the end-use diversity of the residential consumers.

Another challenge is that weather forecasts are rarely perfect. Errors generally shrink as the time remaining until the forecast target gets shorter, but even short-range forecasts often contain significant errors. To mitigate this, real-time observational data can be used to compare the forecast values with the actual observed values and adjust the forecasts accordingly. This is particularly effective on days of extreme weather, because it lets the forecasting system react quickly to sudden weather changes.
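A minimal sketch of one such adjustment, assuming the forecast bias is additive and roughly constant over a short recent window. The function `bias_correct` and its arguments are hypothetical names, and real systems may use more elaborate correction methods.

```python
from statistics import mean


def bias_correct(upcoming, recent_forecasts, recent_observations):
    """Shift upcoming forecast values by the mean recent error (observed minus forecast).

    Assumes a constant additive bias over the recent window (an illustrative
    simplification, not a specific operational scheme).
    """
    if not recent_forecasts or len(recent_forecasts) != len(recent_observations):
        raise ValueError("need matching, non-empty recent forecast/observation series")
    bias = mean(obs - fc for fc, obs in zip(recent_forecasts, recent_observations))
    return [value + bias for value in upcoming]


# Made-up example: over the last three hours the forecast ran 2 degrees too cool.
print(bias_correct([30.0, 31.0], [28.0, 29.0, 30.0], [30.0, 31.0, 32.0]))  # [32.0, 33.0]
```

A longer window smooths out noise in the observations but reacts more slowly to a sudden change in conditions.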

Methods for aggregated weather forecasts

a) Simple Averaging

The first method simply takes the average of each forecast variable across all 20 weather locations. Although simple, this can be very effective at capturing the impact of weather on load across a large area, as shown in the literature (e.g. Lloyd, 2014). However, simple averaging has its drawbacks. First, averaging may actually reduce accuracy: studies have found that using too many weather locations can smooth the data excessively, masking the local conditions that drive load. Additionally, taking the average fails to deal with:

· the effect of geographical diversity – differing weather across the state;

· demographic diversity – differing economic conditions and social classes across the state;

· end-use diversity – differing appliance efficiencies, different uses of electricity, etc.
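The simple averaging step can be sketched as follows. The data layout (station name mapped to hourly series per variable) and all values are made up for illustration, loosely echoing the Agra/Lucknow example. Note how the heavy rain at one station is halved in the average, which is the smoothing problem described above.

```python
from statistics import mean


def simple_average(stations):
    """Average every forecast variable across all stations, hour by hour.

    `stations` maps station name -> {variable name: list of hourly values};
    this layout is an assumption for illustration.
    """
    names = list(stations)
    variables = stations[names[0]].keys()
    return {
        var: [mean(hour) for hour in zip(*(stations[name][var] for name in names))]
        for var in variables
    }


# Illustrative values only: rain in Agra, dry skies in Lucknow.
forecasts = {
    "Agra": {"temp_c": [34.0, 36.0], "rain_mm": [10.0, 12.0]},
    "Lucknow": {"temp_c": [32.0, 33.0], "rain_mm": [0.0, 0.0]},
}
print(simple_average(forecasts))
# {'temp_c': [33.0, 34.5], 'rain_mm': [5.0, 6.0]}
```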

Also, why 20 weather locations? Why not 5, 10 or 100? This number was picked heuristically (essentially arbitrarily) and is unlikely to be the optimal number of weather locations to use.

b) Best-fit to Load

In this second method, we build on the simple averaging technique by first building a load model for every weather location (20 in this case). Each model’s accuracy is tested against actual load, and the models are ranked in order of accuracy. The weather data from the five best-fitting models are then combined, for example by simple averaging. Because only weather locations with a strong impact on load are used, this reduces the chance of errors creeping in from unimportant locations (Charlton and Singleton, 2014). But how do we know whether these five models capture geographical, economic or end-use diversity? And why the top 5 models, rather than the top 3, 7 or 15? Again, these numbers have simply been selected heuristically.
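A minimal sketch of the best-fit-to-load selection. A one-variable least-squares fit of load on temperature stands in for a real load model, errors are measured in-sample for brevity (a holdout period would be better practice), and all station names and numbers are hypothetical.

```python
from math import sqrt
from statistics import mean


def station_rmse(temps, load):
    """RMSE of a one-variable least-squares load model (placeholder for a real model)."""
    mt, ml = mean(temps), mean(load)
    slope = (sum((t - mt) * (y - ml) for t, y in zip(temps, load))
             / sum((t - mt) ** 2 for t in temps))
    intercept = ml - slope * mt
    return sqrt(mean((y - (intercept + slope * t)) ** 2 for t, y in zip(temps, load)))


def best_fit_average(station_temps, load, k):
    """Rank stations by single-station model error, then average the top k."""
    ranked = sorted(station_temps, key=lambda s: station_rmse(station_temps[s], load))
    top = ranked[:k]
    virtual = [mean(hour) for hour in zip(*(station_temps[s] for s in top))]
    return top, virtual


# Hypothetical hourly temperatures (deg C) and load (MW)
temps = {
    "A": [20.0, 22.0, 24.0, 26.0],
    "B": [20.0, 21.0, 24.0, 25.0],
    "C": [25.0, 20.0, 26.0, 21.0],
}
load = [100.0, 110.0, 120.0, 130.0]
print(best_fit_average(temps, load, k=2))  # (['A', 'B'], [20.0, 21.5, 24.0, 25.5])
```

The cut-off `k` is a fixed input here, which is exactly the heuristic choice the paragraph above questions.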

c) Zonal best-fit to Load

The third method is very similar to the one above, but this time we select the best model for each ‘zone’ (Nedellec et al., 2014). Zones are defined by the system operators, each covering a smaller, defined territory within the wider load area. As before, a separate load model is built for each weather station location, and its accuracy is tested against the actual load. Then, within each zone, only the weather station whose model achieved the highest accuracy is kept; all other weather stations in that zone are ignored. Finally, the weather data from these zonal stations are averaged.
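The zonal variant changes only the selection step: the best-fitting station is chosen within each zone rather than across the whole state. This sketch uses the same placeholder one-variable load model as a stand-in for a real one; the zone assignments and data are hypothetical. In the example, the east zone contributes station D even though it fits load poorly, because every zone must be represented.

```python
from math import sqrt
from statistics import mean


def station_rmse(temps, load):
    """RMSE of a one-variable least-squares load model (placeholder for a real model)."""
    mt, ml = mean(temps), mean(load)
    slope = (sum((t - mt) * (y - ml) for t, y in zip(temps, load))
             / sum((t - mt) ** 2 for t in temps))
    intercept = ml - slope * mt
    return sqrt(mean((y - (intercept + slope * t)) ** 2 for t, y in zip(temps, load)))


def zonal_best_fit(zones, station_temps, load):
    """Keep the best-fitting station in each zone, then average the chosen stations."""
    chosen = [min(members, key=lambda s: station_rmse(station_temps[s], load))
              for members in zones.values()]
    virtual = [mean(hour) for hour in zip(*(station_temps[s] for s in chosen))]
    return chosen, virtual


# Hypothetical zones, hourly temperatures (deg C) and load (MW)
zones = {"west": ["A", "B"], "east": ["C", "D"]}
temps = {
    "A": [20.0, 22.0, 24.0, 26.0],
    "B": [20.0, 21.0, 24.0, 25.0],
    "C": [25.0, 20.0, 26.0, 21.0],
    "D": [26.0, 21.0, 25.0, 20.0],
}
load = [100.0, 110.0, 120.0, 130.0]
print(zonal_best_fit(zones, temps, load))  # (['A', 'D'], [23.0, 21.5, 24.5, 23.0])
```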

This deals with the geographical diversity problem, since every zone is represented. However, it may still neglect population, economic or end-use diversity. And again, the number of zones and the initial number of weather locations are chosen heuristically.

The state can be split into zones.

d) Ranking and Rating

All three of the above methods select the number of weather stations heuristically; none of them answers the question “How many weather locations should be used?” The Ranking and Rating method attempts to answer this question (Hong et al., 2015).

A load model is built for each individual weather location. The resulting load forecasts are compared against the actual aggregate load, and the models are ranked by their error, measured as the mean absolute percentage error (MAPE) or root mean square error (RMSE). The top-ranked model therefore uses the weather station whose data best explains the load. The next step combines the weather data from the x best-performing models, where x ranges from 2 up to the total number of locations (x = 2, 3, 4, 5, and so on). For each value of x, the resulting virtual weather station is used to fit a load model, and these scenarios are ranked in ascending order of error (MAPE or RMSE). The combination of weather station locations whose virtual weather station produces the lowest error is the one to use.
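The two stages of the Ranking and Rating procedure can be sketched as below, using MAPE as the error measure. This is an illustrative sketch, not the implementation from Hong et al. (2015): a one-variable least-squares fit on temperature stands in for a real load model, errors are computed in-sample for brevity (the selection should really be made on a holdout period), and the station names and data are invented so that averaging the top two stations fits the load exactly.

```python
from statistics import mean


def fit_predict(temps, load):
    """Fit load = a + b * temp by least squares and return the fitted values.

    A placeholder for a real load model, which would use many more inputs.
    """
    mt, ml = mean(temps), mean(load)
    b = (sum((t - mt) * (y - ml) for t, y in zip(temps, load))
         / sum((t - mt) ** 2 for t in temps))
    a = ml - b * mt
    return [a + b * t for t in temps]


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100 * mean(abs((a - p) / a) for a, p in zip(actual, predicted))


def rank_and_rate(station_temps, load):
    # Ranking: order stations by the MAPE of their single-station load models.
    ranking = sorted(station_temps,
                     key=lambda s: mape(load, fit_predict(station_temps[s], load)))
    # Rating: average the top x stations for x = 2..N, refit, and score each scenario.
    scores = {}
    for x in range(2, len(ranking) + 1):
        virtual = [mean(hour) for hour in zip(*(station_temps[s] for s in ranking[:x]))]
        scores[x] = mape(load, fit_predict(virtual, load))
    best_x = min(scores, key=scores.get)
    return ranking[:best_x], scores[best_x]


# Invented data: load is driven by the average of stations A and B.
temps = {
    "A": [20.0, 24.0, 22.0, 26.0],
    "B": [22.0, 22.0, 28.0, 28.0],
    "C": [25.0, 20.0, 26.0, 21.0],
}
load = [105.0, 115.0, 125.0, 135.0]
stations, error = rank_and_rate(temps, load)
print(stations, round(error, 6))  # ['B', 'A'] 0.0
```

Swapping `mape` for an RMSE function changes only the error measure; the ranking and rating loop stays the same.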

Combining weather stations – weighted means

How should the weather locations be combined? A simple average may be effective, but a weighted mean may work better. The weights could be based on population density, giving greater influence to weather in more populated areas. This is common in developed countries, where population correlates well with local electricity demand, but it may not hold in developing countries. In India, for example, densely populated areas tend to be poorer, with lower energy use per capita, so population density does not correlate well with local electricity demand. A different weighting, such as the economic density of each area, may therefore be more appropriate. This may correlate better with demand in developing countries, because economically powerful areas tend to be co-located with areas of high residential demand.
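A weighted mean can be sketched as below. The weights stand for whichever measure is chosen, such as the population or economic density of each station’s surrounding area; the station names and weight values are hypothetical, and the weights are normalised so they need not sum to one.

```python
def weighted_virtual_station(station_values, weights):
    """Combine stations hour by hour with a weighted mean.

    `weights` maps station name -> non-negative weight (e.g. population or
    economic density of the area the station represents; values are assumptions).
    """
    names = list(station_values)
    total = sum(weights[name] for name in names)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return [
        sum(weights[name] * value for name, value in zip(names, hour)) / total
        for hour in zip(*(station_values[name] for name in names))
    ]


# Hypothetical: station Y's area carries three times the economic weight of X's.
temps = {"X": [36.0, 38.0], "Y": [32.0, 34.0]}
economic_weight = {"X": 1.0, "Y": 3.0}
print(weighted_virtual_station(temps, economic_weight))  # [33.0, 35.0]
```

With equal weights this reduces to the simple average.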

The population density of Uttar Pradesh, India.

An important point here is that weather-insensitive loads should be excluded from the analysis. Weather-insensitive loads are sources of demand that are not affected by the weather; these are mostly industrial.
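One simple way to act on this, assuming the weather-insensitive (e.g. industrial) component can be metered or estimated separately, is to subtract it before fitting the station-selection models. The function name and figures below are hypothetical.

```python
def weather_sensitive_load(total_load, insensitive_load):
    """Remove the weather-insensitive component (e.g. large industrial sites) hour by hour.

    Assumes the insensitive load is known or estimated separately.
    """
    if len(total_load) != len(insensitive_load):
        raise ValueError("load series must have the same length")
    return [total - fixed for total, fixed in zip(total_load, insensitive_load)]


# Hypothetical hourly figures in MW: a flat 300 MW industrial block.
print(weather_sensitive_load([900.0, 950.0, 1010.0], [300.0, 300.0, 300.0]))
# [600.0, 650.0, 710.0]
```

The weather-sensitive remainder is then what the single-station and combined load models are fitted against.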

Weather-insensitive load centres, like this car factory, should be excluded when applying the Ranking and Rating method.

Conclusions

There are multiple methods for producing an aggregated weather forecast for electricity demand forecasting and management over large load areas. The best of these is the Ranking and Rating method, which first ranks the weather station locations by their load error when each is used as the sole weather input to a load model. This answers the question “Which weather station locations should be used?” Next, the question “How many weather station locations should be used?” is answered by combining the weather data from the x best-performing load models for increasing values of x. The virtual weather station that produces the lowest error identifies the combination of weather station locations to use.

These weather stations can be combined using a simple average or a weighted mean. It is suggested that a weighted mean by population density is suitable in developed countries, but a weighted mean by economic density is more suitable in developing countries.

References

Charlton, N., & Singleton, C. (2014). A refined parametric model for short term load forecasting. International Journal of Forecasting, 30(2), 364–368.

Hong, T., Wang, P., & White, L. (2015). Weather station selection for electric load forecasting. International Journal of Forecasting, 31(2), 286–295.

Lloyd, J. R. (2014). GEFCom2012 hierarchical load forecasting: Gradient boosting machines and Gaussian processes. International Journal of Forecasting, 30(2), 369–374.

Nedellec, R., Cugliari, J., & Goude, Y. (2014). GEFCom2012: Electric load forecasting and backcasting with semi-parametric models. International Journal of Forecasting, 30(2), 440–446.


Thursday, June 28, 2018

Aggregated Weather Forecasts for Large Load Areas