Deep Learning and Machine Learning Models to Predict Energy Consumption in Steel Industry

Abstract: This paper presents the results of a study on predicting energy consumption in the steel industry using modeling methods based on machine learning and deep learning techniques. The machine learning algorithms used in this work are artificial neural network (ANN), k-nearest neighbors (kNN), random forest (RF), and gradient boosting (GB). The deep learning technique is long short-term memory (LSTM). Linear regression, a statistical-based learning algorithm, is applied as the baseline of this comparative study. The modeling results reveal that, among the statistical-based and machine learning-based techniques, GB and RF are the two best models for predicting energy consumption, whereas ANN shows predictive performance comparable to the linear regression model. LSTM, however, outperforms both the statistical-based and the machine learning-based algorithms in predicting industrial energy consumption.


I. INTRODUCTION
Efficient energy usage planning and management is a primary concern in smart buildings and manufacturing. Researchers and practitioners in both sectors have long been searching for accurate methods to estimate energy usage in support of proper design and development of smart buildings [1]. In manufacturing, efficient energy usage is also important to numerous industries such as petrochemical plants, iron and steel mills, and many others.
In recent years, the data-driven approach of using machine learning algorithms to build a model for predicting energy use has gained much popularity among researchers. In the past decade, non-linear algorithms such as support vector machine (SVM) and support vector regression (SVR) have been applied to the problem [2][3][4][5]. The SVM and SVR algorithms produce a single model to predict the future value of energy use. A more sophisticated method deploys a group of models that cooperatively predict energy usage. Such a method is called ensemble machine learning. Many types of learning algorithms can be applied in the ensemble scheme, for example, swarm intelligence [6], a group of regression trees [7], and decision trees [8]. The two ensemble learning algorithms that have proven accurate and effective for predicting energy consumption are random forest [9][10][11] and gradient boosting [12][13][14].
A recent advancement in the field of machine learning is the introduction of deep learning techniques. Deep learning has been widely used with promising results in many application areas, including energy consumption estimation. The deep learning techniques applied range from auto-encoders [15,16] and Boltzmann machines [17] to deep recurrent neural networks [18][19][20][21] and long short-term memory [22,23].

A. Data
The energy consumption data used in this research were obtained from the UCI repository [24]. These data were made publicly available by the Daewoo Steel Company in South Korea. The original data comprise 11 attributes, but this research extracts only two of them (i.e., date-time and energy usage) for use in the modeling process. Energy usage was recorded continuously at 15-minute intervals, starting at 00:15 on January 1, 2018 and ending at 00:00 on December 31, 2018, resulting in 35,041 data records. The unit of energy usage is kilowatt-hour (kWh). Within the year 2018, the maximum energy usage was 157.18 kWh, the minimum was 0 kWh, and the standard deviation was 33.44 kWh. The dataset used in this work is therefore a time series, and the characteristics of the series are illustrated in Fig. 1.

B. Predictive Modeling Method
The modeling process of this research comprises three main phases: data pre-processing, model creation, and model evaluation (as displayed in Fig. 2).
The first phase, data pre-processing, consists of two steps: feature extraction and data normalization. Feature extraction selects two variables (or attributes/features) from the original dataset. The selected attributes are date-time and energy consumption in steel manufacturing. Energy consumption data were recorded continuously every 15 minutes for the whole year of 2018.
The pre-processed dataset thus consists of two features and 35,041 instances (or records). The values of energy consumption fluctuate in the range of 0 to almost 160 kWh. Data normalization is therefore applied to reduce this range, with the aim of obtaining an accurate model in the subsequent phase. We adopt z-score normalization as shown in (1):

$$E' = \frac{E - \mu}{\sigma} \qquad (1)$$

where $E'$ is the normalized energy consumption value, $E$ is the original value of energy consumption, $\mu$ is the mean value of energy consumption, and $\sigma$ is the standard deviation.
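The z-score normalization described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in this study: the function names, the sample readings, and the use of the population standard deviation are assumptions made for the example.

```python
import numpy as np


def zscore_normalize(energy):
    """Return (normalized values, mean, std) using E' = (E - mu) / sigma."""
    energy = np.asarray(energy, dtype=float)
    mu = energy.mean()
    sigma = energy.std()  # population standard deviation (ddof=0); an assumption
    if sigma == 0:
        raise ValueError("standard deviation is zero; z-score is undefined")
    return (energy - mu) / sigma, mu, sigma


def zscore_denormalize(normalized, mu, sigma):
    """Map normalized values back to the original kWh scale."""
    return np.asarray(normalized, dtype=float) * sigma + mu


# Hypothetical readings in kWh, for illustration only.
readings = [3.2, 4.0, 57.5, 120.3, 0.0, 88.1]
scaled, mu, sigma = zscore_normalize(readings)
restored = zscore_denormalize(scaled, mu, sigma)
```

Keeping the mean and standard deviation makes it possible to convert predictions back to kWh. When a separate test set is used, a common precaution is to compute both statistics on the training portion only.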
The second phase of this research is model creation. We utilize three types of modeling algorithms: statistical-based, machine learning, and deep learning. The statistical-based learning algorithm is linear regression, which serves as the baseline for comparing the performance of the predictive models.
The machine learning-based algorithms fall into two categories: algorithms that build a single model and algorithms that build a group of models (an ensemble). The learning algorithms that generate a single model for predicting future values are k-nearest neighbors (kNN) and artificial neural network (ANN). The ensemble algorithms, which use a group of models working cooperatively to forecast future values of energy consumption, are random forest (RF) and gradient boosting (GB). In the first step of the machine learning-based modeling process, ten more features (namely, lag1 up to lag10) are added to the dataset. These augmented features are lagged energy consumption values, i.e., the values recorded in the previous ten time periods. This feature augmentation step is intended to produce a more accurate model.
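The lag-feature augmentation can be sketched with pandas as follows. The column names (`date_time`, `energy`), the helper `add_lag_features`, and the sample values are hypothetical. Dropping the first ten rows, which lack a full history, is also an assumption, since the treatment of those rows is not stated here.

```python
import pandas as pd


def add_lag_features(frame, column="energy", n_lags=10):
    """Append lag1..lag{n_lags} columns and drop rows lacking a full history."""
    out = frame.copy()
    for k in range(1, n_lags + 1):
        # shift(k) moves the series down k rows, so row t holds the value at t - k
        out[f"lag{k}"] = out[column].shift(k)
    return out.dropna().reset_index(drop=True)


# Hypothetical 15-minute readings, for illustration only.
index = pd.date_range("2018-01-01 00:15", periods=15, freq="15min")
frame = pd.DataFrame({"date_time": index, "energy": [float(v) for v in range(15)]})
lagged = add_lag_features(frame)
```

Each remaining row then pairs the current energy value (the prediction target) with the ten preceding values (the predictors), which lets regressors such as kNN, RF, and GB be applied to a time series.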
The deep learning algorithm is long short-term memory (LSTM). LSTM is a kind of recurrent neural network, used here for regression, that utilizes a historical data sequence to estimate future values. LSTM works in an iterative manner such that, at each round (or epoch), a set of historical time series is used for training the network to predict the future series one value at a time; the series is then shifted by one step and the network state is updated to predict the next value. We configure the LSTM network with 200 hidden units and train it for 250 epochs. The initial learning rate is 0.005, and the rate is decreased by a factor of 0.2 after 125 epochs. An improved LSTM model is also generated: the performance of LSTM can be improved by using observed values, instead of predicted values, to update the network state.
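The learning-rate schedule stated above can be written as a small function. This sketch assumes that "decrease by a factor of 0.2" means multiplying the rate by 0.2 (giving 0.001 for the second half of training) and that epochs are counted from zero. The function name and parameter names are illustrative.

```python
def learning_rate(epoch, initial_rate=0.005, drop_factor=0.2, drop_period=125):
    """Piecewise-constant schedule: multiply the rate by drop_factor every drop_period epochs.

    `epoch` is zero-based, so epochs 0..124 use the initial rate.
    """
    if epoch < 0:
        raise ValueError("epoch must be non-negative")
    return initial_rate * drop_factor ** (epoch // drop_period)


# Learning rate for each of the 250 training epochs.
schedule = [learning_rate(e) for e in range(250)]
```

Under these assumptions the first 125 epochs train at 0.005 and the remaining 125 at one fifth of that value.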
The last phase of this research framework is model evaluation. For the statistical-based and machine learning-based modeling schemes, two types of model assessment are deployed: cross validation and the hold-out method. Cross validation is repeated ten times using ten data subsets. In each of the ten iterations, nine data subsets are used as the training set and the remaining subset is used for testing, with a different test subset in each round. The hold-out method separates the data into two subsets: 90% of the data are for training and the remaining 10% are for testing. In deep learning, the task is to forecast a time series in which the order of the observations matters, so cross validation is not appropriate. We thus perform only the hold-out evaluation for the LSTM model.
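The two assessment schemes can be sketched with NumPy index arrays. Several details are assumptions made for illustration: the hold-out test set is taken as the final 10% of the records in time order, the cross-validation folds are formed after shuffling with a fixed seed, and the function names are hypothetical.

```python
import numpy as np


def holdout_split(n_samples, test_fraction=0.10):
    """Chronological hold-out: the last test_fraction of the records form the test set."""
    n_test = int(round(n_samples * test_fraction))
    indices = np.arange(n_samples)
    return indices[: n_samples - n_test], indices[n_samples - n_test:]


def kfold_splits(n_samples, n_folds=10, seed=0):
    """Yield (train_idx, test_idx) pairs; each fold serves as the test set exactly once."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_samples)  # shuffling is an assumption
    folds = np.array_split(shuffled, n_folds)
    for i in range(n_folds):
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        yield train_idx, folds[i]


n_records = 35041  # number of records reported for the dataset
train_idx, test_idx = holdout_split(n_records)
splits = list(kfold_splits(n_records))
```

Every record appears in exactly one cross-validation test fold, while the hold-out split evaluates the model only on the most recent records.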

A. Performance of Machine Learning Models
To compare the predictive performance of models built from different types of learning algorithms, we adopt three evaluation metrics: root mean squared error (RMSE), mean absolute error (MAE), and R-squared (R²). RMSE and MAE measure the difference between the actual energy consumption values and the values predicted by the model; thus, lower is better. The metric R² (or coefficient of determination) represents the goodness of fit, showing how well the regression model fits the data. A higher R² is better because it indicates that the model captures more of the variability in the dataset.
The results of model performance evaluation using the 10-fold cross validation (CV) method are shown in Table I. Gradient boosting is the best algorithm, with the lowest error and the highest R². Random forest is the second best. The k-nearest neighbors algorithm comes in third place, showing better performance than the artificial neural network. Linear regression is the worst model.
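The three metrics follow their standard definitions and can be computed as in the sketch below. The function names and the sample values are illustrative only and are not taken from the experiments.

```python
import numpy as np


def rmse(actual, predicted):
    """Root mean squared error: sqrt(mean((y - y_hat)^2))."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


def mae(actual, predicted):
    """Mean absolute error: mean(|y - y_hat|)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(actual - predicted)))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((actual - predicted) ** 2)
    ss_tot = np.sum((actual - actual.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Hypothetical actual and predicted values in kWh.
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
scores = (rmse(actual, predicted), mae(actual, predicted), r_squared(actual, predicted))
```

RMSE and MAE are expressed in the unit of the target, so values computed on z-score normalized data are not directly comparable with values computed in kWh.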

B. Performance of Deep Learning Model
The performance of the LSTM model in forecasting future values of energy consumption is shown in Fig. 3. When the network is updated with the predicted values, the RMSE is quite high at 29.0043. When the model is improved by using observed values to update the network, the RMSE drops to 7.957 (Fig. 4).
The predictive performances of the three groups of learning algorithms (i.e., statistical-based, machine learning, and deep learning) are summarized in Table II. The improved long short-term memory model shows the best performance when assessed with the hold-out method. Random forest and gradient boosting are quite comparable and come in second and third place, respectively. Linear regression and artificial neural network are at almost the same performance level, whereas k-nearest neighbors shows the worst performance. The experimental results indicate that k-nearest neighbors is sensitive to the test set, in the sense that its predictive performance can change significantly when different test data are used.
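The gap between the two RMSE values reflects how the forecaster is fed during the test period. The sketch below illustrates the mechanism with a deliberately simple stand-in predictor, a first-order autoregressive model fitted by least squares, rather than an LSTM. The synthetic series, the function names, and the model choice are all assumptions for illustration and do not reproduce the reported numbers.

```python
import numpy as np


def fit_ar1(series):
    """Least-squares fit of x[t] = a * x[t-1] + b (a stand-in for the trained network)."""
    x_prev, x_next = series[:-1], series[1:]
    design = np.column_stack([x_prev, np.ones_like(x_prev)])
    (a, b), *_ = np.linalg.lstsq(design, x_next, rcond=None)
    return a, b


def forecast(train, test, use_observed):
    """One-step-ahead forecasts over the test period.

    use_observed=False: each prediction is fed back as the next input.
    use_observed=True:  the observed test value is fed back instead.
    """
    a, b = fit_ar1(train)
    last = train[-1]
    predictions = []
    for observed in test:
        predicted = a * last + b
        predictions.append(predicted)
        last = observed if use_observed else predicted
    return np.array(predictions)


# Synthetic daily-cycle series (96 samples per cycle) with noise, for illustration only.
rng = np.random.default_rng(0)
t = np.arange(960)
series = 50 + 40 * np.sin(2 * np.pi * t / 96) + rng.normal(0, 3, t.size)
train, test = series[:864], series[864:]

closed = forecast(train, test, use_observed=False)
opened = forecast(train, test, use_observed=True)
rmse_closed = float(np.sqrt(np.mean((test - closed) ** 2)))
rmse_open = float(np.sqrt(np.mean((test - opened) ** 2)))
```

When predictions are fed back, errors accumulate over the forecast horizon; when observed values are fed back, each forecast is only one step ahead of known data. The second setting therefore assumes that the true value becomes available before the next prediction is made.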

IV. CONCLUSION
This research presents the experimental results of applying data-driven methods to predict energy consumption, which is an area of interest in many domains including manufacturing and smart buildings. Modeling methods to forecast energy usage can follow different schemes, such as physical modeling based on accurate equation formulation, intelligent modeling based on historical data, and hybrid methods. This research focuses on the intelligent modeling method because it is less time consuming than the physical method. The intelligent modeling method is based on the application of various machine learning algorithms.
The performance of intelligent modeling using three different groups of learning schemes has been explored: the statistical-based scheme using the linear regression algorithm; the machine learning-based scheme using the k-nearest neighbors, artificial neural network, random forest, and gradient boosting algorithms; and the deep learning scheme using the long short-term memory algorithm. The experimental results reveal that deep learning is the best method for energy consumption prediction, whereas the ensemble schemes using the random forest and gradient boosting algorithms are the second best.

Fig. 2. The data-driven modeling process for predicting energy consumption in the steel industry.

TABLE I:
PERFORMANCE OF FIVE MACHINE LEARNING MODELS ON PREDICTING ENERGY CONSUMPTION (ASSESSED BY 10-FOLD CROSS VALIDATION)

TABLE II:
PREDICTIVE PERFORMANCE OF DEEP LEARNING MODEL COMPARED WITH MACHINE LEARNING MODELS (ASSESSED BY HOLD-OUT METHOD USING 10% OF DATA AS A TEST SET)