Seasonal streamflow drought forecasting based on pattern recognition concepts using statistical and machine learning approaches
Abstract
Understanding and forecasting drought events is crucial for effective water resource management
and mitigation planning. Forecasting droughts is challenging due to their inherently complex
patterns and dependencies. However, there is a tendency for droughts to occur during specific
seasons or times of the year and exhibit distinct seasonal variability. This research focuses on
analyzing seasonal drought patterns using a grouped data concept, where similar data points are
aggregated into groups to represent distinct hydrological drought conditions.
The objective is to develop a methodology that can effectively recognize and predict droughts
based on these grouped streamflow data sets. Exploratory data analysis techniques are used to
recognize seasonal patterns and extract meaningful drought patterns from the streamflow data.
The study employs a combination of statistical methods and machine learning techniques,
including Markov models and Long Short-Term Memory (LSTM) networks, to forecast the grouped
seasonal streamflow data. A Markov model is used to represent
the transition probabilities among hydrological drought states, capturing the temporal
dependencies in streamflow behaviour. Subsequently, a Hidden Markov model (HMM) is used
to infer the underlying states (or underlying drought levels) from observed streamflow data. To
further enhance forecasting capabilities, monthly and weekly LSTM networks are utilized to learn
long-term sequential dependencies and forecast future streamflow drought patterns.
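
As a minimal sketch of the Markov step described above, the transition probabilities among drought states can be estimated by counting consecutive state pairs. The three-state coding, the sample sequence, and the function names below are illustrative assumptions, not the study's data or implementation.

```python
def transition_matrix(states, n_states):
    """Estimate first-order Markov transition probabilities by counting."""
    counts = [[0] * n_states for _ in range(n_states)]
    for current, nxt in zip(states, states[1:]):
        counts[current][nxt] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # States never left in the sample fall back to a uniform row (an assumption).
        matrix.append([c / total if total else 1.0 / n_states for c in row])
    return matrix


def forecast_next(matrix, current_state):
    """Return the most probable next state given the current state."""
    row = matrix[current_state]
    return max(range(len(row)), key=row.__getitem__)


# Hypothetical monthly drought states: 0 = no drought, 1 = moderate, 2 = severe.
states = [0, 0, 1, 1, 2, 1, 0, 0, 1, 2, 2, 1, 0]
P = transition_matrix(states, n_states=3)
next_state = forecast_next(P, current_state=2)
```

Each row of `P` is the conditional distribution of next month's state given the current state, so a one-step forecast simply picks the largest entry in the current state's row. A Hidden Markov model would add an emission distribution on top of this chain, which this sketch does not attempt.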
The study area is the Palliser Triangle, the driest region in Canada. A total of 25 river
stations (catchment areas ranging from 319 to 47,800 km²) were chosen, representing a range of
river capacities: low flow (annual runoff from 0 to 50 mm), medium flow (annual runoff
from 50 to 175 mm), and high flow (annual runoff above 175 mm). The monthly flow
sequences of these rivers displayed coefficients of variation ranging from 0.61 to 3.84, skewness
from 0.57 to 8.39, and lag-1 autocorrelation from 0.2 to 0.63. In view of the highly skewed nature
of monthly flows, the Box-Cox transformation was applied to normalize the data sequences;
the normalization parameter λ ranged from -0.96 to 0.16. The Box-Cox transformation proved
effective for normalizing the flow data sets, which provided a sound basis for the analysis
and forecasting of hydrologic droughts. The model results revealed that the discrete Markov model
performed best for medium-flow rivers, achieving an average forecast accuracy of 65%, and the
Hidden Markov model demonstrated superior performance for both low-flow and high-flow rivers,
with an average forecast accuracy of 74%. The LSTM model showed consistent performance
across all river types, providing monthly forecasts with approximately 80% accuracy and weekly
forecasts with an average accuracy of 90%. [...]
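
The Box-Cox normalization mentioned above can be illustrated with a short sketch. It assumes strictly positive flows and selects λ by a simple grid search over the profile log-likelihood; the flow values, the grid bounds, and the function names are hypothetical and are not taken from the study.

```python
import math


def box_cox(x, lam):
    """Box-Cox transform of strictly positive values for a given lambda."""
    if abs(lam) < 1e-12:
        return [math.log(v) for v in x]
    return [(v ** lam - 1.0) / lam for v in x]


def skewness(x):
    """Moment-based sample skewness."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    return m3 / m2 ** 1.5


def log_likelihood(x, lam):
    """Profile log-likelihood of lambda, assuming normal transformed data."""
    y = box_cox(x, lam)
    n = len(y)
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / n
    return -0.5 * n * math.log(var) + (lam - 1.0) * sum(math.log(v) for v in x)


def fit_lambda(x, lo=-2.0, hi=2.0, steps=401):
    """Pick the lambda on an evenly spaced grid that maximizes the likelihood."""
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda lam: log_likelihood(x, lam))


# Hypothetical, strongly right-skewed monthly flows (m^3/s); not study data.
flows = [0.4, 0.6, 0.9, 1.2, 1.5, 2.1, 3.0, 4.8, 9.5, 32.0, 6.2, 1.1]
lam = fit_lambda(flows)
transformed = box_cox(flows, lam)
```

Comparing `skewness(flows)` with `skewness(transformed)` shows how much the transformation reduces asymmetry for a given series. A grid search is used here only to keep the sketch dependency-free; a numerical optimizer would serve the same purpose.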