Proceeding Paper

Optimization of Weather Forecast Data Using Machine Learning Algorithms †

by Dimitrios Soumelidis, Georgios Karoutsos *, Nikolaos Skepastianos and Nicolas Tzonichakis
General Aviation Applications 3D S.A., 54646 Thessaloniki, Greece
* Author to whom correspondence should be addressed.
† Presented at the 16th International Conference on Meteorology, Climatology and Atmospheric Physics (COMECAP 2023), Athens, Greece, 25–29 September 2023.
Published: 24 August 2023

Abstract

Numerical weather prediction models exhibit errors when simulating atmospheric processes. Early warning systems, which issue alerts for weather hazards, are fed with forecast data from these models, so their success depends on minimizing the errors that the forecast models introduce. Machine learning techniques have been proposed as an alternative approach for nonlinear and dynamic systems because they include effective structure and parameter estimation methodologies and are powerful for problems whose solution requires knowledge that is hard to specify. The goal of this study is to apply machine learning methods as post-processing algorithms on model output. The algorithm discovers the patterns that produce the errors and thereby provides improved information to the system, enabling better planning and more efficient decision making. High-resolution forecast data are available from Weather Research and Forecasting (WRF) model simulations that use initial and boundary conditions from the Global Forecasting System (GFS). The desired downscaling is achieved using nested domains. Observations are available from the automatic weather station network of General Aviation Applications 3D S.A., which has been operational for over 5 years. The network covers the region of Central Macedonia and has more than twenty stations. Ten of them were selected based on data availability and data quality control checks. Two datasets are established: the first is used to train the algorithm and the second to validate the performance of the new forecast.

1. Introduction

Weather forecast models produce numerical solutions to simulate physical processes in the atmosphere. However, these solutions do not describe the physical processes exactly, which is one reason why errors are observed in forecasts. An effective way to reduce these errors and improve the performance of a numerical weather prediction model is to apply post-processing algorithms to the model output. The goal of such algorithms is to predict the forecast error and remove it from the final forecast.
Machine learning is a subfield of artificial intelligence that focuses on developing algorithms and statistical models that enable computers to make predictions or decisions without being explicitly programmed. The core idea is to learn patterns or relationships from data and to generalize that knowledge to new, unseen data. Rather than following a predetermined set of rules, machine learning algorithms learn iteratively from examples or experiences, improving their performance over time.
Machine learning has a wide range of applications across various domains, including image [1] and speech recognition, recommendation systems, autonomous vehicles [2], finance, healthcare, weather forecasting [3,4], and many others. It has revolutionized many industries and continues to advance rapidly with the availability of large datasets, increased computational power, and advancements in algorithms and techniques.
Artificial neural networks (ANNs) are a subset of machine learning algorithms that are modeled after the structure and function of the human brain. They consist of interconnected nodes, or neurons, that are organized into layers. The input layer receives the raw data, which are then processed through one or more hidden layers before being output as a prediction or classification.
The training of an ANN is a form of machine learning that involves adjusting the weights and biases of the neurons in the network based on a training dataset [5]. The goal is to minimize the difference between the predicted output of the network and the actual output for a given input. Once trained, the ANN can be used to make forecasts by feeding in new input data and computing the output using the trained weights. The accuracy of the forecasts depends on the quality and quantity of the input data, the network architecture, and the training algorithm used.
ANNs have been widely used in weather forecasting due to their ability to learn complex patterns and relationships in data. Hanoon et al. [6] showed that an ANN architecture has good potential to predict daily temperature and relative humidity, with an acceptable range of accuracy. Machine learning and deep neural network models were tested on weather station data by Talsma et al. [7], showing results with promising accuracy (6 h prediction RMSE = 1.53–1.72 °C) for use in frost and minimum-temperature prediction applications. Moosavi et al. [8] performed experiments using WRF-ARW model forecast data that demonstrate the strong potential of machine learning approaches to aid the study of model errors. While their experiments were focused on forecasting precipitation, the methodology developed was general and can be applied to the study of errors in other models, for other quantities of interest, and for learning additional relationships between model physics and model errors.
In this study, an ANN is used with inputs from a weather forecast model to accurately predict the air temperature at 2 m. Both hourly and daily data are used. The output is compared to air temperature measurements from weather stations inside the region of Central Macedonia, Greece. The training period of the ANN is from 1 September 2021 to 31 December 2021. The test period is from 1 January 2022 to 31 March 2023.
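The chronological separation of training and test data described above can be sketched as follows. This is a minimal illustration only: the record layout `(timestamp, features, observed)` and the sample values are hypothetical, and treating 23:00 of the last day as the end of each period is an assumption. Only the period boundaries come from the text.

```python
from datetime import datetime

# Period boundaries taken from the text; the hour-level end points are an assumption.
TRAIN_START = datetime(2021, 9, 1, 0)
TRAIN_END = datetime(2021, 12, 31, 23)
TEST_START = datetime(2022, 1, 1, 0)
TEST_END = datetime(2023, 3, 31, 23)


def split_by_period(records):
    """Split (timestamp, features, observed) records into train and test lists."""
    train = [r for r in records if TRAIN_START <= r[0] <= TRAIN_END]
    test = [r for r in records if TEST_START <= r[0] <= TEST_END]
    return train, test


# Hypothetical records: model predictors and the observed 2 m temperature (°C).
records = [
    (datetime(2021, 10, 15, 6), [12.3, 10.1], 9.8),
    (datetime(2021, 12, 31, 23), [4.0, 2.5], 1.9),
    (datetime(2022, 1, 1, 0), [3.8, 2.2], 1.5),
    (datetime(2023, 3, 31, 12), [18.4, 17.0], 19.1),
]
train, test = split_by_period(records)
```

Splitting by date rather than at random keeps every test forecast strictly later than the training data, which mirrors how the post-processing would be used operationally.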

2. Materials and Methods

2.1. Artificial Neural Network Structure

ANNs used for temperature forecasts typically consist of an input layer, one or more hidden layers, and an output layer. The number of nodes in each layer and the connections between them are determined based on the problem requirements and the available data.
In this specific case, the ANN has between 5 and 9 input nodes, each representing one input variable drawn from the following: temperature at 2 m, skin temperature, temperature at the first and second vertical layers of the model, temperature at 850 hPa, dew point at 2 m, wind speed, cloud fraction, and soil temperature. The input layer feeds into two hidden layers, each consisting of several nodes, with an activation function determining the output of each node. Different numbers of nodes were tested for the hidden layers, whereas the activation function was always the hyperbolic tangent. The output layer consists of a single node that provides the forecast value of the target variable: air temperature at 2 m for hourly data, or minimum or maximum temperature for daily data.
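To make the layer structure concrete, the following is a minimal forward-pass sketch in plain Python. It is not the authors' implementation: the weight initialization, the linear (identity) output node, and the example input vector are assumptions for illustration. The choice of eight inputs and 150 nodes per hidden layer follows the configuration reported in Section 3.2.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Return (weights, biases); the uniform initialization is an assumption."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, weights, biases, activation=None):
    """Compute activation(W x + b) for one input vector x."""
    out = []
    for w_row, b in zip(weights, biases):
        z = sum(w * xi for w, xi in zip(w_row, x)) + b
        out.append(activation(z) if activation else z)
    return out


def forward(x, params):
    """Two tanh hidden layers followed by a single linear output node."""
    (w1, b1), (w2, b2), (w3, b3) = params
    h1 = dense(x, w1, b1, math.tanh)
    h2 = dense(h1, w2, b2, math.tanh)
    return dense(h2, w3, b3)[0]


rng = random.Random(0)
n_inputs, n_hidden = 8, 150
params = [
    init_layer(n_inputs, n_hidden, rng),
    init_layer(n_hidden, n_hidden, rng),
    init_layer(n_hidden, 1, rng),
]

# Hypothetical, already-scaled input vector (one value per input variable).
x = [0.2, 0.1, 0.3, 0.25, 0.15, 0.18, -0.1, 0.05]
y_hat = forward(x, params)  # untrained network, so the value itself is meaningless
```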
During the training process, the network learns to map the input variables to the target variable by adjusting the weights of the connections between the nodes. This is achieved using a backpropagation algorithm that minimizes the mean squared error (MSE) between the predicted and actual values of the target variable. The MSE is calculated by taking the average of the squared differences between the predicted and actual values over all training examples. The backpropagation algorithm involves computing the error at the output layer and then propagating it backwards through the network to adjust the weights of the connections. This is carried out using the chain rule of calculus to calculate the gradient of the error with respect to each weight.
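The training procedure just described can be sketched as full-batch gradient descent with hand-written backpropagation. The network sizes, learning rate, number of steps, and synthetic data below are hypothetical and deliberately tiny; the optimizer and hyperparameters actually used in the study are not stated in the text.

```python
import math
import random


def forward(x, p):
    """Return both hidden activations and the scalar output for one input."""
    h1 = [math.tanh(sum(w * v for w, v in zip(row, x)) + b) for row, b in zip(p["W1"], p["b1"])]
    h2 = [math.tanh(sum(w * v for w, v in zip(row, h1)) + b) for row, b in zip(p["W2"], p["b2"])]
    y = sum(w * v for w, v in zip(p["w3"], h2)) + p["b3"]
    return h1, h2, y


def mse(data, p):
    """Mean squared error over (x, target) pairs."""
    return sum((forward(x, p)[2] - t) ** 2 for x, t in data) / len(data)


def train_step(data, p, lr):
    """One gradient-descent step; returns the gradient that was applied."""
    n = len(data)
    g = {"W1": [[0.0] * len(r) for r in p["W1"]], "b1": [0.0] * len(p["b1"]),
         "W2": [[0.0] * len(r) for r in p["W2"]], "b2": [0.0] * len(p["b2"]),
         "w3": [0.0] * len(p["w3"]), "b3": 0.0}
    for x, t in data:
        h1, h2, y = forward(x, p)
        d_y = 2.0 * (y - t) / n  # derivative of the MSE w.r.t. this output
        for j, h in enumerate(h2):  # output layer: y = w3 . h2 + b3
            g["w3"][j] += d_y * h
        g["b3"] += d_y
        # Chain rule through tanh: d tanh(z)/dz = 1 - tanh(z)^2
        d_z2 = [d_y * p["w3"][j] * (1.0 - h2[j] ** 2) for j in range(len(h2))]
        for j, d in enumerate(d_z2):
            for i, h in enumerate(h1):
                g["W2"][j][i] += d * h
            g["b2"][j] += d
        # Propagate the error back through W2 to the first hidden layer
        d_z1 = [sum(d_z2[j] * p["W2"][j][i] for j in range(len(h2))) * (1.0 - h1[i] ** 2)
                for i in range(len(h1))]
        for i, d in enumerate(d_z1):
            for k, v in enumerate(x):
                g["W1"][i][k] += d * v
            g["b1"][i] += d
    for name in ("W1", "W2"):  # weight matrices
        for row, g_row in zip(p[name], g[name]):
            for k in range(len(row)):
                row[k] -= lr * g_row[k]
    for name in ("b1", "b2", "w3"):  # vectors
        for k in range(len(p[name])):
            p[name][k] -= lr * g[name][k]
    p["b3"] -= lr * g["b3"]
    return g


rng = random.Random(1)
n_in, n_hid = 3, 4  # hypothetical sizes, far smaller than in the study


def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


params = {"W1": rand_matrix(n_hid, n_in), "b1": [0.0] * n_hid,
          "W2": rand_matrix(n_hid, n_hid), "b2": [0.0] * n_hid,
          "w3": [rng.uniform(-0.5, 0.5) for _ in range(n_hid)], "b3": 0.0}

# Synthetic data: the target is a smooth function of the inputs.
data = []
for _ in range(20):
    x = [rng.uniform(-1, 1) for _ in range(n_in)]
    data.append((x, 0.5 * x[0] - 0.3 * x[1] + 0.2 * math.tanh(x[2])))

loss_before = mse(data, params)
for _ in range(200):
    train_step(data, params, lr=0.1)
loss_after = mse(data, params)
```

Calling `train_step` with `lr=0.0` leaves the weights unchanged and returns the gradient, which makes it easy to compare the backpropagated values against a finite-difference estimate.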

2.2. Forecast Model

The Weather Research and Forecasting model with the Advanced Research WRF (WRF-ARW) core is used to produce hourly temperature forecasts for Greece for the next 5 days. The WRF-ARW model is a next-generation mesoscale numerical weather prediction system designed for both atmospheric research and operational forecasting applications [9]. It was developed at the National Center for Atmospheric Research (NCAR), which is operated by the University Corporation for Atmospheric Research (UCAR).
The necessary initial and boundary conditions for the daily 5-day forecast are provided by the US National Centers for Environmental Prediction (NCEP) through the 1200 UTC forecast cycle of the Global Forecasting System (GFS).
Using nested domains, the forecast is downscaled to the desired 2 × 2 km resolution on the final grid, which covers a large part of Greece (Figure 1). A complete set of forecasts for the period from 1 September 2021 to 31 March 2023 is available, all produced with exactly the same model set-up. The model data are available both on an hourly basis and as daily values.

2.3. Observations

To verify the accuracy of the forecast, the automated weather station network of General Aviation Applications 3D S.A. (3DSA) was used. The network has twenty weather stations operating inside the region of Central Macedonia, Greece. Hourly and daily data from ten of them, which have a complete dataset for the period from September 2021 to March 2023 without a single missing hour, were used for the verification. In addition, daily data from another seven automated weather stations, which are in the same region but belong to the network of the National Observatory of Athens (NOA) [10], were used. Figure 2 shows the location of each weather station. For each weather station, the nearest grid point of the model was selected for the verification process.
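The nearest-grid-point matching can be sketched as a brute-force search over the grid-cell centres. The great-circle (haversine) distance, the grid patch, and the station coordinates below are assumptions chosen for illustration; the text does not say which distance measure was used.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def nearest_grid_point(station, grid):
    """Return ((j, i), distance_km) of the grid-cell centre closest to the station."""
    best, best_d = None, float("inf")
    for j, row in enumerate(grid):
        for i, (lat, lon) in enumerate(row):
            d = haversine_km(station[0], station[1], lat, lon)
            if d < best_d:
                best, best_d = (j, i), d
    return best, best_d


# Hypothetical 3 x 3 patch of grid-cell centres, roughly 2 km apart.
grid = [[(40.60 + 0.018 * j, 22.90 + 0.024 * i) for i in range(3)] for j in range(3)]
station = (40.62, 22.95)  # hypothetical station latitude and longitude
index, distance_km = nearest_grid_point(station, grid)
```

On a 2 × 2 km grid the selected point should lie within about 1.4 km of the station (half the cell diagonal), so a larger distance would suggest that the station falls outside the domain.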

3. Results and Discussion

3.1. Error Analysis

Comparing the hourly air temperature forecasts with the weather station measurements shows that the errors are, on average, larger during nighttime and much smaller during daytime, and this holds for all five forecast days. The average mean absolute error (MAE) for noon and afternoon hours does not exceed 2.0 °C, except on the fourth and fifth forecast days. In contrast, during night and early morning hours, the MAE is greater than 2.5 °C even on the first forecast day and reaches 3.5 °C on the last forecast day. Similarly, the daily maximum temperature forecast is much more accurate than the daily minimum temperature forecast. As shown in Table 1, the bias, the MAE, and the root mean squared error (RMSE) are roughly 1–2 °C lower for maximum temperature than for minimum temperature. The errors for maximum temperature are almost unbiased and similar across all stations. The errors for minimum temperature are larger at some stations (e.g., Agras, Elani, Serres) and much smaller at others (e.g., Epanomi, Naoussa).
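For reference, the verification scores used in this section can be computed as in the sketch below. The sign convention (forecast minus observation) is an assumption, since the text does not state it, and the sample values are hypothetical. The 2 °C threshold corresponds to the share of large errors reported in Section 3.2.

```python
import math


def verification_scores(forecast, observed, threshold=2.0):
    """Bias, MAE, RMSE (°C) and the percentage of absolute errors above threshold."""
    errors = [f - o for f, o in zip(forecast, observed)]  # assumed sign convention
    n = len(errors)
    bias = sum(errors) / n
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    large = sum(1 for e in errors if abs(e) > threshold)
    return {"bias": bias, "mae": mae, "rmse": rmse, "pct_large": 100.0 * large / n}


# Hypothetical minimum-temperature forecasts and observations (°C).
forecast = [5.0, 7.5, 3.0, 10.0]
observed = [3.0, 6.5, 4.0, 7.0]
scores = verification_scores(forecast, observed)
# bias = 1.25, mae = 1.75, rmse = sqrt(3.75) ≈ 1.94, pct_large = 25.0
```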

3.2. Application of the Artificial Neural Network

Testing the optimal number and choice of input variables showed that cloud fraction introduces noise and that the results are better without it. Using fewer than eight variables, on the other hand, gave a smaller reduction in the forecast errors. The final list of inputs for the hourly forecast is air temperature at 2 m, dew point at 2 m, skin temperature, soil temperature (closest level to the surface), model temperature (first and second vertical layers), temperature at 850 hPa, and wind speed. For the maximum and minimum temperature forecasts, the first input is the forecast maximum or minimum daily temperature, respectively, and the other seven variables are taken at the time at which the maximum or minimum temperature occurs.
Another crucial parameter is the number of nodes in the hidden layers, and many different values were tested. With five nodes in each hidden layer, the forecast accuracy is lowest. Accuracy improves as the number of nodes increases up to 150, whereas with 300 or 500 nodes in each hidden layer it is slightly worse. For this reason, the results presented below are for 150 nodes in each hidden layer.
The first major impact of the ANN is that it minimizes bias. The bias of the hourly temperature for all forecast hours, of the maximum temperature, and of the minimum temperature is less than 0.3 °C, and in most cases it is 0 °C. This means that any systematic error the algorithm could detect in the available data has been removed from the new forecasts. As shown in Table 2, the MAE and RMSE for the minimum temperature are significantly reduced, by about 1.0–1.2 °C. The MAE is about 2.6–3.0 °C before the application of the ANN and decreases to 1.6–1.9 °C afterwards. The RMSE decreases from about 3.2–3.6 °C to 2.0–2.4 °C. The percentage of forecast errors greater than 2 °C is between 51 and 57% before and about 30% afterwards. For the weather station in Agras, the MAE for the third forecast day decreased from 3.8 °C to 2.0 °C, and the number of forecasts with errors greater than 2 °C decreased from 312 to 173. The MAE for the weather station in Naoussa decreased only from 1.3 °C to 1.1 °C.
The MAE and RMSE for the maximum temperature remain essentially unchanged. For the second forecast day, the MAE even increased by 0.1 °C, and for the first forecast day, the RMSE increased from 1.7 °C to 1.9 °C. The main reason for this result is that the bias of the maximum temperature was already at or very close to 0 °C before the ANN was applied (0.2 °C for the fifth forecast day).
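The link between bias and the achievable error reduction can be made explicit with the standard decomposition of the mean squared error into a squared-bias term and an error-variance term. The symbols below (e_i for the individual forecast errors, N for their number) are introduced here for illustration and do not appear in the original text.

```latex
\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N} e_i^{2} = \bar{e}^{\,2} + \sigma_e^{2},
\qquad
\bar{e} = \frac{1}{N}\sum_{i=1}^{N} e_i,
\qquad
\sigma_e^{2} = \frac{1}{N}\sum_{i=1}^{N}\left(e_i - \bar{e}\right)^{2}
```

As a rough check with the first-forecast-day values in Table 1: for minimum temperature, an RMSE of 3.3 °C and a bias of 2.1 °C leave sqrt(3.3² − 2.1²) = sqrt(6.48) ≈ 2.5 °C once the bias is removed, whereas for maximum temperature sqrt(1.7² − 0.1²) = sqrt(2.88) ≈ 1.7 °C, so removing the bias changes almost nothing. Because the table values are averages over 17 stations, the identity holds only approximately here, and these figures should be read as indicative.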
For the hourly temperature forecasts, the MAE and RMSE are reduced for the nighttime and early morning hours when the initial errors are bigger. For the noon and afternoon forecast hours, the reduction is small or there is no error reduction at all.

4. Conclusions

The application of an artificial neural network to the forecast data of the WRF-ARW model showed that it can reduce forecast errors. When the error was high, the reduction was also high. When the error was small, or the bias was already very close to zero, the algorithm failed to improve the forecasts. The best results are obtained when the ANN is applied to the minimum temperature forecasts and to hourly data with high errors, such as those in the nighttime and early morning hours. For maximum temperature forecasts and for hourly forecasts of the temperature at noon and in the afternoon, it is better not to apply the algorithm, as the improvement is insignificant.
A third hidden layer and additional input variables, from the same model or from another model, could possibly improve the forecasts in cases where the bias is already zero.

Author Contributions

D.S. and G.K. designed and applied the research; N.S. acquired the data; D.S., G.K. and N.S. wrote the original draft; N.T. reviewed and edited the manuscript and provided technical help. All authors have read and agreed to the published version of the manuscript.

Funding

This research was co-financed by the European Regional Development Fund of the European Union and Greek national funds through the Operational Program Competitiveness, Entrepreneurship and Innovation, under the call RESEARCH–CREATE–INNOVATE (project code: T2EDK-05354, “EXTREMES”).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data that support the findings of this study are openly available at https://extremesweather.gr/data/extremes/Dataset_3DSA_WRF-ANN.zip (accessed on 1 August 2023).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
  2. Bachute, M.R.; Subhedar, J.M. Autonomous Driving Architectures: Insights of Machine Learning and Deep Learning Algorithms. Mach. Learn. Appl. 2021, 6, 100164.
  3. Abhishek, K.; Singh, M.P.; Ghosh, S.; Anand, A. Weather Forecasting Model using Artificial Neural Network. Proc. Technol. 2012, 4, 311–318.
  4. Bochenek, B.; Ustrnul, Z. Machine Learning in Weather Prediction and Climate Analyses–Applications and Perspectives. Atmosphere 2022, 13, 180.
  5. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
  6. Hanoon, M.S.; Ahmed, A.N.; Zaini, N.; Razzaq, A.; Kumar, P.; Sherif, M.; Sefelnasr, A. Developing machine learning algorithms for meteorological temperature and humidity forecasting at Terengganu state in Malaysia. Sci. Rep. 2021, 11, 18935.
  7. Talsma, C.J.; Solander, K.C.; Mudunuru, M.K.; Crawford, B.; Powell, M.R. Frost prediction using machine learning and deep neural network models. Front. Artif. Intell. 2023, 5, 963781.
  8. Moosavi, A.; Rao, V.; Sandu, A. Machine learning based algorithms for uncertainty quantification in numerical weather prediction models. J. Comput. Sci. 2021, 50, 101295.
  9. Skamarock, W.C.; Klemp, J.B.; Dudhia, J.; Gill, D.O.; Liu, Z.; Berner, J.; Wang, W.; Powers, J.G.; Duda, M.G.; Barker, D.M.; et al. A Description of the Advanced Research WRF Version 4. In NCAR Tech. Note NCAR/TN-556+STR, 1st ed.; NCAR: Boulder, Colorado, USA, 2019; p. 145.
  10. Lagouvardos, K.; Kotroni, V.; Bezes, A.; Koletsis, I.; Kopania, T.; Lykoudis, S.; Mazarakis, N.; Papagiannaki, K.; Vougioukas, S. The automatic weather stations NOANN network of the National Observatory of Athens: Operation and database. Geosci. Data J. 2017, 4, 4–16.
Figure 1. WRF-ARW forecast domains. Most of Greece is covered by a 2 × 2 km grid with 280 × 271 grid points and 40 vertical levels. Yellow color: the region of Central Macedonia, Greece.
Figure 2. Location of the automated weather stations. Blue dots show the weather stations of 3DSA company. Red dots show the weather stations of the NOA network.
Table 1. Verification scores (°C) before the application of the ANN. (a) Bias, MAE, and RMSE averaged over all 17 weather stations for the entire test period (January 2022–March 2023) for every forecast day. (b) MAE for the third forecast day for six weather stations.

(a)
Forecast day | Min. temp. Bias | Min. temp. MAE | Min. temp. RMSE | Max. temp. Bias | Max. temp. MAE | Max. temp. RMSE
1            | 2.1             | 2.7            | 3.3             | −0.1            | 1.4            | 1.7
2            | 2.2             | 2.8            | 3.3             | 0.1             | 1.5            | 1.9
3            | 2.2             | 2.8            | 3.4             | 0.1             | 1.6            | 2.0
4            | 2.2             | 2.9            | 3.5             | 0.1             | 1.7            | 2.2
5            | 2.2             | 3.0            | 3.6             | 0.2             | 2.0            | 2.5

(b)
Weather station | Min. temp. 2022 | Min. temp. 2023 | Min. temp. Total | Max. temp. 2022 | Max. temp. 2023 | Max. temp. Total
Agras           | 3.9             | 3.5             | 3.8              | 1.5             | 1.5             | 1.5
Flamouria       | 3.1             | 2.7             | 3.0              | 1.7             | 1.6             | 1.6
Epanomi         | 1.1             | 1.4             | 1.2              | 1.6             | 1.2             | 1.5
Elani           | 3.7             | 3.9             | 3.7              | 1.2             | 1.4             | 1.3
Naoussa         | 1.3             | 1.3             | 1.3              | 1.6             | 1.4             | 1.6
Serres          | 3.1             | 3.2             | 3.1              | 1.7             | 1.8             | 1.7
Table 2. Verification scores (°C) after the application of the ANN. (a) Bias, MAE, and RMSE averaged over all 17 weather stations for the entire test period (January 2022–March 2023) for every forecast day. (b) MAE for the third forecast day for six weather stations.

(a)
Forecast day | Min. temp. Bias | Min. temp. MAE | Min. temp. RMSE | Max. temp. Bias | Max. temp. MAE | Max. temp. RMSE
1            | 0.1             | 1.6            | 2.1             | −0.1            | 1.4            | 1.9
2            | 0.2             | 1.7            | 2.2             | −0.1            | 1.6            | 1.9
3            | 0.1             | 1.8            | 2.3             | 0.0             | 1.6            | 2.0
4            | 0.1             | 1.8            | 2.3             | 0.0             | 1.7            | 2.2
5            | 0.0             | 1.9            | 2.4             | 0.0             | 2.0            | 2.5

(b)
Weather station | Min. temp. 2022 | Min. temp. 2023 | Min. temp. Total | Max. temp. 2022 | Max. temp. 2023 | Max. temp. Total
Agras           | 1.9             | 2.4             | 2.0              | 1.6             | 1.5             | 1.6
Flamouria       | 1.4             | 1.7             | 1.5              | 1.7             | 1.4             | 1.5
Epanomi         | 1.1             | 1.3             | 1.1              | 1.5             | 1.1             | 1.4
Elani           | 1.8             | 2.5             | 2.0              | 1.4             | 1.2             | 1.3
Naoussa         | 1.1             | 1.3             | 1.1              | 1.6             | 1.3             | 1.5
Serres          | 1.8             | 2.4             | 1.9              | 1.8             | 1.8             | 1.8
Soumelidis, D.; Karoutsos, G.; Skepastianos, N.; Tzonichakis, N. Optimization of Weather Forecast Data Using Machine Learning Algorithms. Environ. Sci. Proc. 2023, 26, 49. https://0-doi-org.brum.beds.ac.uk/10.3390/environsciproc2023026049
