Abstract—Subscriber cancellation is a constant concern for service providers in general, and for VNPT An Giang in particular, because customers are the source of revenue and value. To grow and remain profitable, a service provider must acquire new subscribers while retaining its existing ones. It is therefore important to identify and forecast subscribers who are likely to leave the network, so that a customer care strategy can be put in place to reduce the number of departures. In this paper, we present an approach that exploits broadband internet subscriber data from the existing data warehouse at VNPT An Giang to build models that forecast broadband internet subscribers leaving the network 1 month and 3 months in advance. First, we collect the 12-month usage history of broadband internet subscribers, comprising 102,920 active subscribers and 24,376 disconnected subscribers, with 12 related attributes per subscriber. The data is then preprocessed to remove null values, negative numeric values, and duplicate records. To reduce the number of input attributes and keep those considered most useful for the model, we use the SelectKBest method of the Sklearn library to score the attributes and select the 8 with the highest scores. Based on 6 months and 12 months of historical data, we divide the data into 12 datasets: 6 for building and evaluating models that predict subscribers leaving the network 1 month in advance, and 6 for building and evaluating models that predict subscribers leaving the network 3 months in advance (see Table 6 and Table 7).
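The preprocessing and attribute-selection steps described above can be illustrated with a minimal sketch. It is not the authors' code: the data is synthetic, the helper `clean_rows` is hypothetical, and the scoring function `f_classif` (the SelectKBest default) is an assumption, since the abstract does not state which scoring function was used.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif


def clean_rows(X, y):
    """Hypothetical helper: drop rows with null or negative values, then duplicates."""
    keep = ~np.isnan(X).any(axis=1) & ~(X < 0).any(axis=1)
    X, y = X[keep], y[keep]
    # np.unique with return_index gives the first occurrence of each distinct row
    _, first_idx = np.unique(np.column_stack([X, y]), axis=0, return_index=True)
    first_idx = np.sort(first_idx)  # keep the original row order
    return X[first_idx], y[first_idx]


# Synthetic stand-in for the subscriber table: 12 attributes per subscriber.
rng = np.random.default_rng(0)
X = rng.random((500, 12))
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)  # 1 = left the network (made-up rule)

# Inject one null value, one negative value, and one duplicated record.
X[0, 3] = np.nan
X[1, 5] = -1.0
X = np.vstack([X, X[2]])
y = np.append(y, y[2])

X_clean, y_clean = clean_rows(X, y)

# Score all 12 attributes and keep the 8 with the highest scores.
selector = SelectKBest(score_func=f_classif, k=8)
X_selected = selector.fit_transform(X_clean, y_clean)
selected_idx = selector.get_support(indices=True)

print(X_clean.shape, X_selected.shape)
print("selected attribute indices:", selected_idx)
```

On real data, the scoring function would need to suit the attribute types; for example, `chi2` requires non-negative features and `mutual_info_classif` can capture non-linear dependence.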
To select the most suitable set of attributes and model for forecasting broadband internet subscribers leaving the network, we use four machine learning methods: Decision Tree (DT), Support Vector Machine (SVM), Multilayer Perceptron (MLP), and Long Short-Term Memory (LSTM). We evaluate the predictive performance of each method using the Area Under the Curve (AUC) measure.
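As an illustration of the AUC measure mentioned above, the following sketch computes it directly from its pairwise definition, assuming that AUC here means the area under the ROC curve: the probability that a randomly chosen subscriber who left receives a higher score than a randomly chosen subscriber who stayed, with ties counted as one half. The function `auc_score` and the labels and scores for the two models are hypothetical and are not results from the paper.

```python
def auc_score(labels, scores):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical ground truth (1 = left the network) and scores from two models.
labels = [1, 1, 0, 0, 1, 0]
scores_model_a = [0.9, 0.8, 0.3, 0.4, 0.35, 0.1]
scores_model_b = [0.6, 0.4, 0.5, 0.4, 0.7, 0.2]

auc_a = auc_score(labels, scores_model_a)
auc_b = auc_score(labels, scores_model_b)
print(f"model A AUC = {auc_a:.3f}, model B AUC = {auc_b:.3f}")
```

The double loop is quadratic in the number of subscribers and is meant only to make the definition concrete; for datasets of the size described here, a rank-based implementation such as `sklearn.metrics.roc_auc_score` would be the practical choice.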
Index Terms—Forecasting broadband internet, decision tree, support vector machine, multilayer perceptron, long short-term memory.
Dong-Ho Le is with VNPT An Giang, Vietnam.
Van-Dung Hoang is with HCMC University of Technology and Education, Vietnam.
Cite: Dong-Ho Le and Van-Dung Hoang, "Application of Classification Methods in Forecasting Broadband Internet Subscribers Leaving the Network," International Journal of Machine Learning, vol. 13, no. 1, pp. 13-30, 2023. Copyright © 2023 by the authors. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).