Introduction
When checking the weather forecast in the mornings to see
whether you’ll need an umbrella that day, you’ll probably be looking at the forecast
for your current point-location, giving data specific to your neighbourhood. If
you’re planning a weekend away, the forecast display can easily be changed to
another specific point-location. But what if your business needs to know what is
happening across an area much greater than what can be represented by a single
point location? For large areas, obtaining weather forecasts from point-locations
is unlikely to suffice and may lead to strategic errors for the business, and
ultimately, financial losses.
One example of where aggregated weather forecasts are
required is in the energy industry. Electricity demand (load) is strongly
dependent on the weather conditions, and so weather data is a vital component
of electricity load forecasting. Aggregate-level load forecasts must consider
the weather conditions across the entire area, as opposed to using just one location
to represent the weather conditions everywhere. This article aims to address
the challenges of weather location selection and proposes the best methodologies
for determining how many, and which, weather locations should be used when
forecasting load over large areas.
Uttar Pradesh, India’s most populous
state, is taken as an example case study. Uttar Pradesh covers 243,290 km²,
roughly the size of the United Kingdom. Weather therefore varies
significantly across the state – there are higher altitude areas in the north,
and more rainfall occurs in the north and eastern areas. As a result, the
impact of weather on the state’s load will also vary significantly across the area.
Uttar Pradesh, India’s most populous state.
In particular, localised extreme weather events can be
hugely problematic. For example, there could be heavy rainfall in Agra (west
Uttar Pradesh), but sunny skies in Lucknow (central Uttar Pradesh). If the
weather data is only received for Lucknow, the forecasting system would have no
knowledge of the rainfall elsewhere.
The geography and climatology vary across Uttar Pradesh.
Methodology
In order to give an aggregated state load forecast, a virtual weather station must be formed,
which best represents the weather across the entire state.
First, every weather forecast location you choose must
have available, reliable and accurate data; any weather station that does
not should be excluded. Let’s assume that we begin with 20 weather station
locations across Uttar Pradesh that have reliable and accurate data, and provide
a good geographical spread across the state.
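The availability part of this screening step can be sketched as a simple filter over station records. This is a minimal sketch under stated assumptions: each station's history is a list of readings with `None` marking a missing value, and the 95% completeness threshold, the function names and the station labels A to C are all hypothetical. Checking reliability and accuracy would additionally require comparing each station against trusted observations, which is not shown here.

```python
from typing import Dict, List, Optional

# Hypothetical threshold: a station is kept only if at least 95% of its
# readings are present. The value is illustrative, not a recommendation.
MIN_COMPLETENESS = 0.95


def completeness(readings: List[Optional[float]]) -> float:
    """Return the fraction of readings that are not missing (None)."""
    if not readings:
        return 0.0
    return sum(r is not None for r in readings) / len(readings)


def screen_stations(records: Dict[str, List[Optional[float]]],
                    min_completeness: float = MIN_COMPLETENESS) -> List[str]:
    """Return the stations whose records are complete enough to use."""
    return [name for name, readings in records.items()
            if completeness(readings) >= min_completeness]


# Made-up temperature records (deg C) for three hypothetical stations.
records = {
    "A": [31.0, 32.5, 30.8, 33.1],
    "B": [34.2, None, None, 35.0],  # half the readings missing: excluded
    "C": [32.0, 32.4, 31.9, 32.7],
}
print(screen_stations(records))  # ['A', 'C']
```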
The virtual weather station can then be built by using
multiple point-locations. But how many point-locations should be used? Too
few will not represent the area well enough, but too many
may actually increase the forecast error, as well as increasing the amount of data,
which brings its own issues.
Once you’ve decided how many point locations to use, how do
you decide which point locations should be used? Do you choose a roughly even
spread of locations across the entire area? Or do you choose more in the west
because population is higher there? Ultimately, you want to represent the
impact of weather on load, not just the average weather of the state. This will
vary with population density, economic and demographic diversity, as well as the
end-use diversity of the residential consumers.
Another challenge is that weather forecasts are rarely
perfect. Forecast errors generally shrink as the forecast lead time shortens,
but even short-range forecasts often contain significant errors. To reduce them, real-time
observational data can be used to compare the forecasted values with the actual
observed values and adjust the weather forecasts accordingly. This is
particularly effective during days of extreme weather, because the forecasting
system is able to quickly react to sudden weather changes.
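The article does not specify how the forecasts are adjusted. One simple approach, assumed here purely for illustration, is a bias correction: measure how far recent forecasts were from the observations and shift the upcoming forecast by that amount. The function names and the three-hour window are hypothetical, and more elaborate correction schemes are possible.

```python
from typing import List


def recent_bias(forecast: List[float], observed: List[float]) -> float:
    """Mean forecast error (forecast minus observation) over a recent window."""
    if not forecast or len(forecast) != len(observed):
        raise ValueError("windows must be non-empty and of equal length")
    return sum(f - o for f, o in zip(forecast, observed)) / len(forecast)


def correct_forecast(upcoming: List[float], bias: float) -> List[float]:
    """Remove the recently observed bias from the upcoming forecast values."""
    return [value - bias for value in upcoming]


# Made-up temperatures (deg C): the last three hours of forecasts and observations.
past_forecast = [30.0, 31.0, 32.0]
past_observed = [28.0, 29.0, 30.0]  # the forecast ran 2 deg C too warm

bias = recent_bias(past_forecast, past_observed)
print(bias)                                        # 2.0
print(correct_forecast([33.0, 34.0, 33.5], bias))  # [31.0, 32.0, 31.5]
```

Because the correction is recomputed as each new observation arrives, a sudden change in the weather shows up in the bias within a few time steps.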
Methods for aggregated weather forecasts
a) Simple Averaging
The first method simply takes an average of every forecast
variable, from all 20 weather locations. Although this is a simple method, it
can be very effective at capturing the impact of weather on load across a large
area, as shown in several studies (e.g. Lloyd, 2014). However, this simple
average method does have its issues. Firstly, averaging may actually
reduce accuracy: studies have shown that using too many
weather locations can over-smooth the data. Additionally,
taking the average fails to account for:
· the effect of geographical diversity – differing weather across the state;
· demographic diversity – differing economic and social classes across the state;
· end-use diversity – differing efficiencies of appliances, different uses of electricity, etc.
Also, why 20 weather locations? Why not 5, 10 or 100?
This number has been chosen heuristically (essentially arbitrarily) and is unlikely to be the
optimal number of weather locations to use.
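The simple averaging step can be sketched as follows, assuming each station's forecast is stored as a dictionary mapping a variable name (here the hypothetical `temp` and `humidity`) to an hourly series of the same length at every station. Three made-up stations stand in for the 20 in the text, and the function name is illustrative.

```python
from typing import Dict, List

# Each station maps a forecast variable name to an hourly series.
StationForecast = Dict[str, List[float]]


def simple_average(stations: Dict[str, StationForecast]) -> Dict[str, List[float]]:
    """Average every forecast variable, time step by time step, over all stations."""
    first = next(iter(stations.values()))
    averaged = {}
    for variable in first:
        series = [forecast[variable] for forecast in stations.values()]
        averaged[variable] = [sum(values) / len(values) for values in zip(*series)]
    return averaged


# Three made-up stations standing in for the 20 in the text.
stations = {
    "A": {"temp": [30.0, 32.0], "humidity": [60.0, 55.0]},
    "B": {"temp": [34.0, 36.0], "humidity": [40.0, 35.0]},
    "C": {"temp": [32.0, 34.0], "humidity": [50.0, 45.0]},
}
print(simple_average(stations))
# {'temp': [32.0, 34.0], 'humidity': [50.0, 45.0]}
```

Every station gets the same weight, which is exactly why this method cannot reflect the geographical, demographic or end-use diversity listed above.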
b) Best-fit to Load
In this second method, we build on the simple averaging
technique by first building a load model for every weather location (20 in this case).
The accuracy of each model is then tested against actual load and the
individual models are ranked in order of their accuracies. Next, the weather
data from the top 5 best-fit models are combined, perhaps through simple
averaging. This method only uses weather locations that have a strong impact on
load, and so reduces the chance of errors creeping in from unimportant weather
locations (Charlton and Singleton, 2014). But how do we know if these 5 models
have captured the effect of geographical diversity, economic diversity or
end-use diversity? And why the 5 best models and not the top 3, top 7 or top 15
models? Again, these numbers have just been heuristically selected.
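The best-fit procedure can be sketched as below. As a loudly stated assumption, the per-station "load model" is a one-variable linear regression of load on temperature, scored by in-sample RMSE; a real load model would include more weather variables and calendar effects, and would be scored on held-out data. The station labels, data and function names are hypothetical, and `top_k` defaults to the 5 used in the text.

```python
import math
from typing import Dict, List, Tuple


def station_rmse(temps: List[float], load: List[float]) -> float:
    """In-sample RMSE of a hypothetical one-variable model: load = a + b * temp."""
    n = len(temps)
    mean_t, mean_y = sum(temps) / n, sum(load) / n
    slope = (sum((t - mean_t) * (y - mean_y) for t, y in zip(temps, load))
             / sum((t - mean_t) ** 2 for t in temps))
    intercept = mean_y - slope * mean_t
    sse = sum((y - (intercept + slope * t)) ** 2 for t, y in zip(temps, load))
    return math.sqrt(sse / n)


def best_fit_to_load(temps_by_station: Dict[str, List[float]], load: List[float],
                     top_k: int = 5) -> Tuple[List[str], List[float]]:
    """Rank stations by single-station model error and average the top_k of them."""
    ranked = sorted(temps_by_station,
                    key=lambda s: station_rmse(temps_by_station[s], load))
    chosen = ranked[:top_k]
    virtual = [sum(values) / len(values)
               for values in zip(*(temps_by_station[s] for s in chosen))]
    return chosen, virtual


# Made-up daily peak temperatures (deg C) and load (MW) for three stations.
load = [100.0, 120.0, 140.0, 160.0, 180.0]
temps = {
    "A": [20.0, 22.0, 24.0, 26.0, 28.0],
    "B": [20.0, 23.0, 23.0, 27.0, 27.0],
    "C": [25.0, 21.0, 27.0, 22.0, 26.0],
}
chosen, virtual = best_fit_to_load(temps, load, top_k=2)
print(chosen)   # ['A', 'B']
print(virtual)  # [20.0, 22.5, 23.5, 26.5, 27.5]
```

Note that `top_k` is still a fixed, hand-picked input here, which is precisely the weakness the text points out.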
c) Zonal best-fit to Load
The third method is very similar to the one above, but this time
we select the best model for each ‘zone’ (Nedellec et al., 2014). Zones are
defined by the operators, covering a small, defined territory within the
greater load area. Again, a separate load model is built, one per weather
station location. The accuracy of each model is then tested against the actual
load. Next, within each zone, the weather station that achieved the highest
load accuracy is used. All other weather stations within that zone are ignored.
Finally, the average is taken of all the zonal weather stations.
This deals with the geographical diversity problem – all
zones are represented. However, it may still neglect
the population, economic and end-use diversities. And again, the number of zones
and the initial number of weather locations are chosen arbitrarily.
The state can be split into zones.
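The zonal variant changes only the selection step: instead of a global top-k, the best-fitting station is kept within each zone and the zone winners are averaged. The sketch below assumes the same hypothetical one-variable linear load model scored by in-sample RMSE; the zone names, station labels, data and function names are all made up for illustration.

```python
import math
from typing import Dict, List, Tuple


def station_rmse(temps: List[float], load: List[float]) -> float:
    """In-sample RMSE of a hypothetical one-variable model: load = a + b * temp."""
    n = len(temps)
    mean_t, mean_y = sum(temps) / n, sum(load) / n
    slope = (sum((t - mean_t) * (y - mean_y) for t, y in zip(temps, load))
             / sum((t - mean_t) ** 2 for t in temps))
    intercept = mean_y - slope * mean_t
    sse = sum((y - (intercept + slope * t)) ** 2 for t, y in zip(temps, load))
    return math.sqrt(sse / n)


def zonal_best_fit(temps_by_station: Dict[str, List[float]],
                   zones: Dict[str, List[str]],
                   load: List[float]) -> Tuple[Dict[str, str], List[float]]:
    """Keep the best-fitting station in each zone, then average the zone winners."""
    chosen = {
        zone: min(members, key=lambda s: station_rmse(temps_by_station[s], load))
        for zone, members in zones.items()
    }
    winners = [temps_by_station[s] for s in chosen.values()]
    virtual = [sum(values) / len(values) for values in zip(*winners)]
    return chosen, virtual


# Made-up data: two hypothetical zones with two stations each.
load = [100.0, 120.0, 140.0, 160.0, 180.0]
temps = {
    "W1": [20.0, 22.0, 24.0, 26.0, 28.0],
    "W2": [26.0, 22.0, 25.0, 21.0, 24.0],
    "E1": [20.0, 23.0, 23.0, 27.0, 27.0],
    "E2": [25.0, 21.0, 27.0, 22.0, 26.0],
}
zones = {"west": ["W1", "W2"], "east": ["E1", "E2"]}
chosen, virtual = zonal_best_fit(temps, zones, load)
print(chosen)   # {'west': 'W1', 'east': 'E1'}
print(virtual)  # [20.0, 22.5, 23.5, 26.5, 27.5]
```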
d) Ranking and Rating
All three of the above methods have heuristically selected
the number of weather stations to use; they do not answer the question “How
many weather locations should be used?” The Ranking and Rating method attempts
to answer this question (Hong, Wang and White, 2015).
A load model is built for each individual weather location.
The load predicted by each model is then compared against the actual aggregate load, and the
models are ranked in order of their load error, in terms of the mean absolute percentage error (MAPE)
or root mean squared error (RMSE). The highest-ranked model therefore uses
the weather station with the strongest relationship to the load. The next step
combines the weather data from the x highest-performing load models,
where x ranges from 2 to the total number of locations (x
= 2, 3, 4, 5, etc.). Again, we fit a load model to the virtual weather station of each combination
scenario and rank the scenarios in ascending order of error (MAPE or RMSE).
The virtual weather station that produces the lowest error gives the
combination of weather station locations to use.
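The Ranking and Rating procedure described above can be sketched as follows. This is a simplified illustration, not Hong, Wang and White's implementation: it assumes a one-variable linear load model on temperature, scores everything by in-sample RMSE (MAPE could be substituted), and combines stations by simple averaging. In practice the errors should be measured on a held-out validation period. All names and data are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def fit_rmse(temps: List[float], load: List[float]) -> float:
    """In-sample RMSE of a hypothetical one-variable model: load = a + b * temp."""
    n = len(temps)
    mean_t, mean_y = sum(temps) / n, sum(load) / n
    slope = (sum((t - mean_t) * (y - mean_y) for t, y in zip(temps, load))
             / sum((t - mean_t) ** 2 for t in temps))
    intercept = mean_y - slope * mean_t
    sse = sum((y - (intercept + slope * t)) ** 2 for t, y in zip(temps, load))
    return math.sqrt(sse / n)


def average_series(series_list: List[List[float]]) -> List[float]:
    """Build a virtual station by averaging the stations at each time step."""
    return [sum(values) / len(values) for values in zip(*series_list)]


def rank_and_rate(temps_by_station: Dict[str, List[float]],
                  load: List[float]) -> Tuple[List[str], int, List[str]]:
    if len(temps_by_station) < 2:
        raise ValueError("need at least two stations")
    # Ranking: score each station with its own single-station load model.
    ranked = sorted(temps_by_station,
                    key=lambda s: fit_rmse(temps_by_station[s], load))
    # Rating: build a virtual station from the top x stations for
    # x = 2 .. N (as in the text) and keep the x with the lowest error.
    best_x, best_err = 2, math.inf
    for x in range(2, len(ranked) + 1):
        virtual = average_series([temps_by_station[s] for s in ranked[:x]])
        err = fit_rmse(virtual, load)
        if err < best_err:
            best_x, best_err = x, err
    return ranked, best_x, ranked[:best_x]


# Made-up temperatures (deg C) and load (MW) for three hypothetical stations.
load = [100.0, 120.0, 140.0, 160.0, 180.0]
temps = {
    "A": [21.0, 21.0, 25.0, 25.0, 28.0],
    "B": [19.0, 23.0, 23.0, 27.0, 28.0],
    "C": [25.0, 21.0, 27.0, 22.0, 26.0],
}
ranked, best_x, chosen = rank_and_rate(temps, load)
print(ranked, best_x, chosen)  # ['B', 'A', 'C'] 2 ['B', 'A']
```

In this toy example neither B nor A fits the load perfectly on its own, but their average does, so the search settles on x = 2; adding the poorly correlated station C makes the fit worse. The number of stations is thus an output of the search rather than a hand-picked input.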
Combining weather stations – weighted means
How should the weather locations be combined? A simple average
may be effective, but it may be more effective to apply a weighted mean. This
weighting could be by population density, giving greater influence to weather
in areas with higher populations. This is often used in developed countries,
where population correlates well with local electricity demand, but this may
not be the case in developing countries. For example in India, highly populated
areas tend to be poorer areas with less energy-use per capita, and so
population density does not correlate well with local electricity demand.
Therefore, a different type of weighting should be applied, such as the
economic density of the area. This may correlate better than population density
for developing countries, because economically powerful areas tend to be co-located
with areas of high residential demand.
The population density of Uttar Pradesh, India.
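A weighted virtual station can be sketched as a weighted mean at each time step. The weights below are hypothetical placeholders for whichever density measure is chosen (population density in developed countries, economic density in developing ones, as argued above); the station labels, values and function name are made up.

```python
from typing import Dict, List


def weighted_virtual_station(temps_by_station: Dict[str, List[float]],
                             weights: Dict[str, float]) -> List[float]:
    """Weighted mean of the station series at each time step."""
    names = list(temps_by_station)
    total = sum(weights[name] for name in names)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    n_steps = len(temps_by_station[names[0]])
    return [sum(weights[name] * temps_by_station[name][i] for name in names) / total
            for i in range(n_steps)]


# Made-up temperatures (deg C) and hypothetical density weights.
temps = {"A": [30.0, 32.0], "B": [34.0, 36.0]}
weights = {"A": 3.0, "B": 1.0}  # e.g. station A's area has 3x the economic density

print(weighted_virtual_station(temps, weights))               # [31.0, 33.0]
print(weighted_virtual_station(temps, {"A": 1.0, "B": 1.0}))  # [32.0, 34.0]
```

With equal weights the result reduces to the simple average, so the weighted mean can be seen as a generalisation of method (a).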
An important point to make here is that weather-insensitive
loads should be taken out of the equation. Weather-insensitive loads are
sources of load that are not affected by the weather; these are mostly
industrial.
Weather-insensitive load centres, like this
car factory, must be ignored when applying the Ranking and Rating method.
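Removing weather-insensitive load before fitting can be sketched as a subtraction, assuming the weather-insensitive load centres are separately metered so that their consumption is known at each time step. The function name, the `car_factory` label and the numbers are hypothetical.

```python
from typing import Dict, List


def weather_sensitive_load(total_load: List[float],
                           insensitive_loads: Dict[str, List[float]]) -> List[float]:
    """Subtract each metered weather-insensitive load from the total, step by step."""
    remaining = list(total_load)
    for name, series in insensitive_loads.items():
        if len(series) != len(remaining):
            raise ValueError(f"{name}: series length does not match total load")
        remaining = [r - s for r, s in zip(remaining, series)]
    return remaining


# Made-up hourly loads (MW) for the whole area and one large industrial site.
total = [500.0, 540.0, 610.0]
insensitive = {"car_factory": [120.0, 120.0, 125.0]}
print(weather_sensitive_load(total, insensitive))  # [380.0, 420.0, 485.0]
```

The remaining, weather-sensitive series is what the single-station load models would then be fitted to.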
Conclusions
There are multiple methods for producing an aggregated
weather forecast, applicable to electricity demand forecasting and management
for large load areas. The best of these is the Ranking and Rating method, which
ranks the weather station locations by their load error when used as the input to a
single-station load model. This answers the question “which weather station
locations should be used?” Next, the question of “how many weather station
locations should be used?” is answered by combining the weather data from the x
highest-performing load models. The virtual weather station that
produces the lowest error gives the combination of weather station
locations to use.
These weather stations can be combined using a simple
average or a weighted mean. It is suggested that a weighted mean by population
density is suitable in developed countries, but a weighted mean by economic
density is more suitable in developing countries.
References
Charlton, N., & Singleton, C. (2014). A refined
parametric model for short term load forecasting. International Journal of
Forecasting, 30(2), 364–368.
Hong, T., Wang, P., & White, L. (2015). Weather station selection for electric load
forecasting. International Journal of Forecasting, 31(2), 286–295.
Lloyd, J. R. (2014). GEFCom2012 hierarchical load forecasting:
Gradient boosting machines and Gaussian processes. International Journal of
Forecasting, 30(2), 369–374.
Nedellec, R., Cugliari, J., & Goude, Y. (2014).
GEFCom2012: Electric load forecasting and backcasting with semi-parametric
models. International Journal of Forecasting, 30(2), 440–446.