key: cord-0070384-50s2mrpm authors: Shukla, Vishal; Prashar, Sanjeev; Pandiya, Bhartrihari title: Is price a significant predictor of the churn behavior during the global pandemic? A predictive modeling on the telecom industry date: 2021-11-22 journal: J Revenue Pricing Manag DOI: 10.1057/s41272-021-00367-2 sha: deed599538efb9ae637c4e8278a2ebe58d8bdc59 doc_id: 70384 cord_uid: 50s2mrpm

The recent pandemic has affected the world in many respects, including communication. Telecom and internet-based communication have witnessed a drastic upsurge due to the lockdown and the consequent work-from-home situation, termed the new normal. Due to low switching costs and stiff competition, telecom service providers are struggling to attract new customers and to prevent existing ones from switching to rival providers. Hence, this study aims to discover the key factors, in the order of their relative worth, that telecom companies could focus on to prevent their customers from churning. For the study, linear discriminant analysis was applied to the collected dataset to predict customers' churn behavior. It was found that tariff rates for domestic calls and the number of calls made to customer service were the significant predictors of customer churn behavior.

The global pandemic caused by the novel coronavirus led to a nationwide lockdown in India starting mid-March 2020. People were confined to their homes, and offline activities were largely replaced by online activities. This became a new normal, leading to an increase in internet and telephony usage. The COVID-19 pandemic has changed telecom consumption habits not only for internet usage but also for telephonic conversation. Due to the nationwide lockdown, the norms related to social distancing, and the subsequent confinement to homes, an upsurge in the usage of telecom services was witnessed. 
Even when this pandemic ends, the habit of staying and working remotely is expected to persist, as people and organizations globally have gradually adjusted to this new normal (De et al. 2020). This creates an increased demand for internet and telephony services, both in volume and in quality of service, leading to stringent competition in the telecom industry. Being locked down and immobile in their homes increased people's internet usage and calling, as they also had more leisure time. This allowed them to engage more with online content and social media. There has been a drastic increase in the usage of video conferencing services like Zoom and content delivery services like Akamai (Branscombe 2020). This has led to a race between telecom operators to provide cheap data and calling plans so that they can gain a competitive edge and increase their revenues. Around 5.74 million subscribers requested mobile number portability (MNP) in March 2020, but this number increased by 7.12% to 6.18 million in November 2020 (TRAI Report 2020). This unabated increase in requests for number portability within eight months provides firm evidence of dissatisfied customers, which is a matter of serious concern for telecom service providers. Though revenue gains might be short-term, telecom service providers need to work on customer loyalty to achieve long-term sustainability, as the market is gradually saturating (Hwang and Kim 2018). Customers are exhibiting churn behavior and are switching from one telecom operator to another for reasons that have not been identified. To sustain their profitability, telecom operators must make a serious effort to identify the factors responsible for customer churning. Therefore, this research endeavors to identify the variables of churning intention in the telecom market and their relative worth. These variables, once identified, can specify the reasons for churning. 
In addition, this research also attempts to find solutions that can decrease the probability of customers churning from their telecom service provider. The service providers can draft the requisite strategy, using these variables, to reduce the churning of their customers. To fulfill these objectives, the research questions framed were:
RQ1. What are the variables that lead to churning behavior in the telecom sector?
RQ2. What is the relative worth of each of these variables that lead to churning behavior in the telecom sector?
RQ3. Which of the variables of churning behavior lead to a decrease in the probability of customer churn?
The paper is organized as follows: it opens with the introduction of the theme of the research, i.e., how the pandemic has added to the churning in the telecom industry. The subsequent part gives an overview of the research available in the extant literature and identifies the literature gap, followed by an outline of the material and methods employed in this research. The description of the data is followed by information about the discriminant analysis, problem formulation, and data pre-processing. Thereafter, the exploratory data analysis is described. The next section details the development of the prediction model and its testing, divided into eight steps, which illustrate the step-by-step analysis, the respective outputs, and their interpretations. Finally, the findings of the research and concluding remarks are presented, followed by the managerial implications and suggestions for telecom service providers. The paper closes with a discussion of the limitations and the future scope of the study.

The literature review was undertaken to comprehend the various aspects of churning behavior. The methods and techniques used by previous researchers to predict churn are discussed in the context of the present research work. 
Besides this, the changing consumption behavior of customers during the current pandemic situation was also studied. Churning refers to the loss of customers who switch from their present service provider to a different one during a specific period of time (Mock 2011). Companies aim to maximize customer retention, as it is a major concern across all industries, including telecommunication (Rajamohamed and Manokaran 2018; Choudhari and Potey 2018). The timing of the prediction of churning behavior is equally critical: the earlier the prediction, the better it is for the organization. As per Alboukaey et al. (2020), monthly churn prediction is partly inefficient; hence, daily churn prediction models are required. Thus, churn management includes identifying valuable customers who might exhibit churning behavior and being proactive in preventing the churn. It is an accepted concept that retaining present customers is more economical than acquiring new ones (Rust et al. 1995; Heskett et al. 1994). Van den Poel and Lariviere (2004) observed that the customer retention rate has a noteworthy effect on businesses. Customers switch from one option to another to gain the advantage of the best deals possible (Capizzi and Ferguson 2005; Bellizzi and Bristol 2004) or under the influence of their friends' churning behavior (Ferreira et al. 2019). Geetha and Kumari (2012) did a thorough study of the usage pattern of non-revenue earning customers causing churn in revenue. Customers with changes in usage revenue along with more use of value-added services are expected to show churn behavior. The details of some other research work related to churning behavior in the Indian telecom sector are delineated in Table 1. Various studies have shown that customer buying intention has been affected during this pandemic period (Islam et al. 2021; Wang and Na 2020; Laato et al. 2020; Kabadayi et al. 2020). Laato et al. 
(2020) studied the purchasing behavior of customers in the unusual times of the pandemic with the stimulus-organism-response approach. A strong connection was posited between the intention to self-isolate and unusual purchase intent. This indicates that customer behavior is directly linked to self-isolation time. Many organizations shifted their digital infrastructure to facilitate work-from-home (WFH) for the time being (Khetarpal 2020; Akala 2020). Meetings and interactions are held online due to the global pandemic situation. In conclusion, despite the large number of studies undertaken to decipher churning behavior as well as customer behavior during the pandemic, there exists a research gap. Evidently, there is limited research on customer churn during the pandemic period in the telecom sector, and no research has been attempted that categorizes the variables of churning in order of their relative worth. The use of linear discriminant analysis leads to a predictive examination in which the indicators of churning can be forecast. The significance of the proposed study of the churning variables is that telecom service providers can focus on the more significant ones on a priority basis and pay less heed to the less critical ones. The present study focuses on switching behavior during the global pandemic, and that is the novelty of the approach, as the telecommunication industry is in a position of concern in these changing circumstances. Many studies on predicting customer churn behavior have been conducted using various classification techniques, but none of these have used the discriminant analysis approach, which makes this study unique in terms of methodology.

The dataset used in this study was acquired from the three private sector telecom companies operating in the Delhi-NCR region of India. 
The dataset consisted of the details of 2500 customers and contained information about the number of months they were subscribed to the services of the company (subscription_time), the number of voicemails they received (num_voicemail), the total amount they spent on domestic calls (dom_call_charges), the total amount they spent on international calls (intl_call_charge), the total amount they spent on internet services (total_internet_charge), the total number of calls they made to the customer service of their telecom service provider (num_ccs_calls), network quality (network_quality) measured in terms of the availability of the network on a scale from 1 to 5, where '5' and '1' represent the best and the worst respectively, and internet speed (int_speed) categorized as 1 = 4G, 2 = 3G, and 3 = 2G. Average call drops per day (call_drops_ave_day) were also measured. All these variables were the independent variables X. The dependent variable Y, named churn, consisted of two categories, 0 and 1, where 0 indicates customers who remained with the service provider (did not churn), while 1 indicates that the existing customer churned and switched to another service provider. The data covered a period of nine months, from March 2020, when the first phase of lockdown was enforced in India, to November 2020, when maximum unlock was announced.

According to Hair et al. (2014), discriminant analysis is an appropriate statistical technique for situations where the outcome is categorical in nature and the predictor variables are metric or continuous. This method uses the information offered by the independent variables X to classify the dependent variable Y into two groups. In this study, Y is a categorical variable with two groups: customers who do not churn (denoted by 0) and customers who churn (denoted by 1). 
This statistical technique entails obtaining a variate that is the linear combination of independent variables, which optimally discriminates between objects in the prior defined groups. The equation for discriminant analysis is similar to that of multiple regression and is expressed as follows:

$$Z_{jk} = a + W_1 X_{1k} + W_2 X_{2k} + \dots + W_n X_{nk}$$

where $Z_{jk}$ is the discriminant Z score of discriminant function j for object k, $a$ is the intercept, $W_i$ is the discriminant weight for independent variable i, and $X_{ik}$ is independent variable i for object k (i = 1, ..., n). Though there are many other classification techniques like logit regression, ANN, cluster analysis, etc., but Hair

At the initial step, the data file was processed using R programming to convert the target variables into categorical variables. To run a discriminant analysis, the target or outcome variable Y must be categorical. The output is shown in Table 2. After the conversion of the target variables to categorical variables, we determined the baseline churn rate for the dataset. The output is given in Table 3. From this table, it is clear that the dataset is imbalanced, as the baseline churn rate is only 14.32%, meaning there are far more zeros than ones. This considerably affects the interpretation of the model performance measures.

In the next step, we explored our dataset to gain a preliminary insight into how our outcome variable Y, labeled as churn, is impacted by the independent variables taken for the study. The boxplots (Fig. 1) reflect that the subscription time (number of months the customer stayed with the service provider) has no impact on customer churn intention. The second boxplot (Fig. 2) indicates that the number of voicemail messages is lower for customers who churn than for those who don't. Tariffs for domestic calls (Fig. 3) and international calls (Fig. 4), and the internet usage charge (Fig. 5), are generally higher for subscribers who churn than for those who don't. 
This indicates that the customers who churn realize, at some point in time, that they are paying more money for the services and are not happy with the value for money that they are getting. The availability of the network (Fig. 6) is lower for those who churn than for those who don't. Internet speed (Fig. 7) and average call drops in a day (Fig. 8) do not seem to be significant predictors of churn intention. With respect to the number of calls made to customer service, we found that the churn rate is relatively high for customers who called more often (Fig. 9). This is a clear indication that the customers who churned had attempted to contact customer service many times but failed to get a satisfactory resolution to their problems.

[Table 2, partial R str() output: … int 3 5 2 4 3 4 2 1 3 2 …; $ int_speed: int 1 1 2 2 1 1 1 3 2 1 …; $ call_drops_ave_day: int 2 2 1 2 1 0 3 2 0 1 …; $ churn: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 …]

Based on the assumptions derived from the exploratory data analysis, we further explored our variables using discriminant analysis. Our variables were measured on different scales, so we scaled them to avoid the influence of the range of each variable on the discriminant coefficients. Thereafter, we divided the dataset into two parts, a training set and a testing set, in a 70:30 ratio. Seventy percent of the dataset (1750 observations) was kept for training, and the remaining 750 observations were kept for testing. The baseline churn rates for the training data and the test data are given in Tables 4 and 5. The outputs in these tables indicate that the data were partitioned correctly. We then developed our prediction model using the training dataset and tested the performance of the developed model using the test dataset. This was done in eight steps. 
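The preprocessing described above (scaling, the 70:30 partition, and the baseline churn rate check) can be sketched as follows. This is a minimal illustration in Python rather than the R used in the study, and it rests on assumptions that the text does not confirm: the toy records are invented, scaling is taken to mean z-scores with the sample standard deviation, and the split is a simple random shuffle with a hypothetical seed rather than a stratified partition.

```python
import random
from statistics import mean, stdev


def standardize(column):
    """Convert a numeric column to z-scores (mean 0, sample sd 1)."""
    mu, sigma = mean(column), stdev(column)
    return [(x - mu) / sigma for x in column]


def train_test_split(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of the rows and cut it into training and testing sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def churn_rate(rows):
    """Share of rows whose churn label (the last field) equals 1."""
    return sum(r[-1] for r in rows) / len(rows)


# Hypothetical toy records: (dom_call_charges, num_ccs_calls, churn)
rows = [(10.0 + i % 7, i % 5, 1 if i % 7 == 0 else 0) for i in range(100)]

charges = standardize([r[0] for r in rows])
train, test = train_test_split(rows)

print(len(train), len(test))
print(round(churn_rate(rows), 3), round(churn_rate(train), 3), round(churn_rate(test), 3))
```

Comparing the churn rate of the full data with that of each partition mirrors the check reported in Tables 4 and 5: if the partition rates drift far from the baseline, a stratified split would be the safer choice.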
Step 1: testing the significance of discriminant function using MANOVA
In this first step, we used MANOVA to test the significance of the discriminant function. For this purpose, we formulated a null and an alternate hypothesis as given below:
H0: $\mu_1 = \mu_2 = \dots = \mu_k$ (The independent variables taken in the study are not significant discriminators of the churning and non-churning groups).
H1: $\mu_1 \neq \mu_2 \neq \dots \neq \mu_k$ (At least one independent variable is a significant discriminator of the churning and non-churning groups).
The output of the MANOVA test is shown in Table 6. From this, we can see that Wilks' λ for MANOVA is 0.8883, which is very close to 1. This indicates that there is a relatively low extent of discrimination in the model. However, we also see that the p-value is highly significant, and thus the null hypothesis stands rejected. From these results, we can infer that the discriminant model is statistically significant.

Here, we identified a set of attributes that discriminate the customers who churn from those who don't. For this, we developed the Fisher discriminant function (FDF) using the R packages DiscriMiner and MASS. Since our outcome variable Y has only two values, there is only a single discriminant function, represented by DF1. The coefficients of the discriminant function are presented in Table 7. Then, the independent X variables were sorted by their coefficients in descending order. This provides an understanding of how much influence each X variable has on differentiating the Y variable (Table 8). From the above results, we can infer that the domestic call charges had the maximum impact, followed by the number of calls made to customer service. The output given in Table 7 indicates that the p-value is insignificant for the subscription time, network quality, internet speed, and average call drops in a day, but is highly significant for the other X variables. 
From this, it can be interpreted that all the X variables, except the subscription time, network quality, internet speed, and average call drops in a day, are excellent predictors in terms of differentiating our customer groups. The correlation coefficients of each of the variables help to ascertain the comparative importance of each of the X variables. From Table 7, it is clear that the domestic call charge is the most noteworthy variable in its ability to discriminate between the churning and non-churning groups.

Step 5: records classification based on discriminant analysis of X variables and predicting Y variables for the test set
From the above results, it is observed that the discriminant model that we developed is significant. This information was used to classify records as belonging to either the group of churning customers or the group of non-churning customers, depending on the X variables. The lda() function in R was used to categorize records based on the values of the X variables and to predict the class and probability for the test set. The output is presented in Table 9. The difference in the group means from Table 9 gives us a better idea of the factors that contribute most, and most significantly, to discriminating between the two groups. The output reflects that the difference in group means is highest in the case of domestic call charges and lowest for average call drops in a day. The linear discriminant coefficients presented in Table 9 also reflect a similar pattern and indicate the independent variable that contributes most to the group partition.

Step 6: visualization of the groups
Further, to gain better insight into what our probable churning customers look like compared to their non-churning counterparts, we visualized the data in the form of graphs, as shown in Fig. 10. Figure 10 gives a graphical visualization of the groups that we created through the discriminant analysis. 
These visualizations are consistent with the Wilks' λ value of 0.8883, which was obtained in Step 1 (MANOVA output). They clearly indicate that although our discriminant model is significant, the two groups are not entirely exclusive of each other. Rather, they have some overlap between them.

Step 7: making predictions on the test set
Having built and trained the model, we used our test dataset to predict the churning or non-churning behavior of the customers based on the independent variables of the testing dataset. To make predictions on the testing dataset, we applied the discriminant model that was constructed with the training dataset. The aim of performing this step is to measure the improvement in the performance

Step 8: evaluation of the model performance measures
Table 10 indicates the model performance on the test data. The error rate of the model comes out to be 0.143. Hence, the accuracy of the model can be calculated as 1 − error rate, which comes out to be 1 − 0.143 = 0.857 = 85.7%. This is fairly good. However, during the initial data exploration, we found that our data is not balanced; therefore, accuracy cannot be the standalone indicator of the fitness of the model. So, to ascertain whether this model is robust, we considered some other measures of model performance. Sensitivity is another indicator of model performance. Also known as the true positive rate, it is the percentage of truly positive samples that give a positive result using the test in question (Steward 2019). Here, sensitivity is the ability to correctly identify the customers who are likely to churn. In our model, the sensitivity comes out to be 11.2%, which is extremely low. For better prediction of churn, it is necessary that the model classifies the actual positives as positives. 
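The performance measures used in this step can all be derived from the four counts of a confusion matrix. The sketch below, written in Python rather than the R used in the study, shows the arithmetic for accuracy, sensitivity (true positive rate), and its counterpart for the negative class, specificity (true negative rate). The labels and predictions are invented toy data, not the study's test set, and the function names are hypothetical.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives, with churn (1) as the positive class."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            tp += 1
        elif a == 0 and p == 1:
            fp += 1
        elif a == 0 and p == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def performance(actual, predicted):
    """Return accuracy, sensitivity, and specificity as fractions."""
    tp, fp, tn, fn = confusion_counts(actual, predicted)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)  # share of actual churners that were flagged
    specificity = tn / (tn + fp)  # share of actual non-churners left unflagged
    return accuracy, sensitivity, specificity


# Toy example: 4 churners and 16 non-churners; the model flags only one
# churner and raises one false alarm.
actual = [1, 1, 1, 1] + [0] * 16
predicted = [1, 0, 0, 0] + [1] + [0] * 15

accuracy, sensitivity, specificity = performance(actual, predicted)
print(accuracy, sensitivity, specificity)
```

As in the study, the toy model looks accurate overall because non-churners dominate, while its sensitivity shows that most churners are missed.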
Another measure is specificity, which is the percentage of genuinely negative samples that test negative using the test in question (Steward 2019). This is also termed the true negative rate. In our case, the specificity came out to be 98.1%, which reflects a lack of balance between the two performance measures. To get an optimum balance between sensitivity and specificity, we decided to vary the threshold of the model from its default value of 50%. The respective outputs are given in Table 11. Further, we compared the values of sensitivity and specificity at various threshold values. The respective outputs are given in Table 12. We now have the accuracy, sensitivity, and specificity at various threshold levels, as indicated in Table 12, and need to identify the optimal cut-off or threshold value. For this, we plot these three values for different thresholds; the point of intersection is the optimal threshold value. The output from the plot is given in Fig. 11. In Fig. 11, we can see that the three curves intersect around the threshold values of 15%, 16%, and 17%. If we compare the accuracy, sensitivity, and specificity at these three thresholds, we find that the values at the threshold of 15% are more balanced than at the other two. Thus, the optimum threshold for the model is 15%. Upon drawing predictions at this threshold, the accuracy and specificity both come to 74%, and the sensitivity is very close to these values at 68%.

Based on the discriminant coefficients and the correlation ratios provided by the model, we conclude that domestic call charges, the number of calls made to customer service, internet usage charges, and international call charges are the strong predictors of customer churn intention, and that an increase in any of them increases the probability of customer churn. 
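The threshold selection described for Table 12 and Fig. 11 can be sketched as a simple sweep: classify the predicted churn probabilities at each candidate threshold, recompute the three measures, and keep the threshold where they lie closest together. The Python sketch below is illustrative only. The probabilities are invented, and "most balanced" is implemented as the smallest gap between the largest and smallest of the three measures, which is an assumed stand-in for reading the intersection off the plot.

```python
def classify(probabilities, threshold):
    """Label a customer as churn (1) when the predicted probability reaches the threshold."""
    return [1 if p >= threshold else 0 for p in probabilities]


def rates(actual, predicted):
    """Return (accuracy, sensitivity, specificity) for binary labels."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    positives = sum(actual)
    negatives = len(actual) - positives
    return (tp + tn) / len(actual), tp / positives, tn / negatives


def sweep(actual, probabilities, thresholds):
    """Evaluate the three measures at every candidate threshold."""
    return [(t, *rates(actual, classify(probabilities, t))) for t in thresholds]


def most_balanced(results):
    """Pick the row whose three measures are closest together (assumed criterion)."""
    return min(results, key=lambda r: max(r[1:]) - min(r[1:]))


# Hypothetical churn probabilities: 4 churners followed by 12 non-churners.
actual = [1] * 4 + [0] * 12
probabilities = [0.60, 0.30, 0.20, 0.10,
                 0.40, 0.25, 0.18, 0.12, 0.08, 0.07,
                 0.06, 0.05, 0.05, 0.04, 0.03, 0.02]

results = sweep(actual, probabilities, [0.50, 0.15, 0.05])
best = most_balanced(results)
for row in results:
    print(row)
print("selected threshold:", best[0])
```

Lowering the threshold trades specificity for sensitivity, which is the pattern the study reports when moving from the default 50% to 15%.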
Thus, the relative worth of the variables that lead to churning behavior in the telecom sector is the key contribution of this research. The higher the worth of the factor, the greater is its role in the churn intention of the customers. Domestic call charges are the best indicator of churning behavior because a slight increase in the price of domestic calls disrupts existing tariffs, creating confusion in the mind of the customer that may lead to switching from the telecom service provider. Our finding confirms the findings of Chadha and Bhandari (2014), according to which tariff is one of the major antecedents of customers switching toward MNP. Price sensitivity has increased in the pandemic due to a reduction in people's income levels. The number of calls made to customer service is a strong indication of frustration or uncertainty in customers' minds. Apart from some queries, most of the calls to customer service are made to register complaints and grievances and to ask for solutions. The aim should be to process the calls and settle the query as soon as possible. Internet charges and speed are issues of concern for customers because usage has increased manifold since the digital upsurge. Customers expect speedy internet at affordable rates. This finding is in congruence with the findings of Dey et al. (2020), who conducted a study in the UK telecom market and concluded that speed is an important factor that influences customer satisfaction and churning intention. The scale of international calls is increasing day by day due to globalization, so the charges also need to be reasonable and competitive so that customers can make international calls at affordable prices. The model also gave the meaningful insight that an increase in the number of voicemail messages, network quality, and duration of the subscription may decrease the probability of customer churn. 
In these situations, they are more likely to continue with the existing telecom service provider. Thus, this study adds more dimensions to the churning factors identified by the study conducted by Geetha and Kumari (2011) such as the proportion of local and STD calls made to other networks along with higher usage of value-added services. The Indian telecom sector is undergoing intense competition and switching cost in this sector is low. Thus, it becomes imperative to understand customer expectations and their viewpoint regarding their switching and retention behavior. The facility of MNP has empowered customers to a great level and in such situations, it becomes pertinent for the telecom service providers to get into the nuances of customer's expectations with respect to internet and telephony services (Chadha and Bhandari 2014) . The meaningful insights given by our discriminant model can help the telecom service providers and the telecom regulators formulate strategies to reduce the customer churn and cases of MNP, which is huge as indicated in a 2020-2021 Report by the Telecom Regulatory Authority of India. The key lessons that these organizations can take from this study are to find an amicable solution to resolve the customer issues within the first or the second call. Dignity and respect are the essential components of customer loyalty (Bahri-Ammari and Bilgihan 2019), and these are reflected in how quickly a customer's problem is addressed. If a customer has to make multiple calls to customer service pertaining to an issue, then it may result in churn intention. Also, there should be a robust, smooth, and organized escalation procedure for attending to those issues which are not resolved within two calls. A good network availability for calling and smooth internet surfing experience matters a lot to customers and a poor network leads to churning. Thus, telecom service providers should focus on providing the best network quality to the customers. 
Finally, to address the most important factor, i.e., the price, telecom service providers should offer more lucrative and affordable plans. In the present market situation, customers are informed and can compare several value offerings of similar products. To maximize the customer base, companies can cross-sell products like additional data bundles, tariff discount vouchers, etc., which is the need of the hour in the present scenario, where online classes and work-from-home have become the new normal. Ultimately, to retain customers, telecom service providers should show some empathy toward their customers and should provide them with better internet and telephony services at affordable prices rather than cashing in on the situation. This would help develop better customer relations and loyalty, which would ultimately reduce the churn intention of the customers. Although these findings are derived from a study conducted on the telecom industry, they can be generalized to different service sectors and industries.

This study has used only one classification technique to predict churn behavior, and only in the context of telecom services. Similar studies could be extended to other services such as healthcare, entertainment, education, etc. Also, many other machine learning models like logistic regression, decision tree, random forest, clustering, and neural network models could be applied, and the accuracy of these models could be compared. A collection of data from other countries could provide meaningful insight for comparing the churn intention of telecom customers across different countries. The churn intention may also be affected by cultural and societal norms, and these variables were not a part of this study. A future study could take into account the cultural and societal differences among different countries and their effect on customer satisfaction and loyalty. 
More big employers are talking about permanent work-from-home positions
Dynamic behavior based churn prediction in mobile telecom
Customer retention to mobile telecommunication service providers: The roles of perceived justice and customer loyalty program
An assessment of supermarket loyalty cards in one major US market
Leveraging service recovery strategies to reduce customer churn in an emerging market
The Network Impact of the Global COVID-19 Pandemic. The New Stack
Loyalty trends for the twenty-first century
Determinants of customer switching towards mobile number portability
Predictive to prescriptive analysis for customer churn in telecom industry using hybrid data mining techniques
Impact of digital surge during COVID-19 pandemic: A viewpoint on research and practice
Syed Sardar Mahammad, and Ben Binsardi. 2020. The role of speed on customer satisfaction and switching intention: A study of the UK mobile telecom market
Effect of friends' churn on consumer behavior in mobile networks
Analysis of churn behavior of consumers in Indian telecom sector
Multivariate data analysis: Pearson new international edition
Putting the service-profit chain to work
Does mIM experience affect satisfaction with and loyalty toward O2O services?
Shujaat Mubarik, and Liang Xiaobei. 2021. Panic buying in the COVID-19 pandemic: A multi-country examination
The impact of coronavirus on service ecosystems as service megadisruptions
Post-COVID, 75% of 4.5 lakh TCS employees to permanently work from home by '25; from 20%. Business Today
Unusual purchasing behavior during the early stages of the COVID-19 pandemic: The stimulus-organism-response approach
An Approach to Mitigate the Risk of Customer Churn Using Machine Learning Algorithms
Comparing carrier churn
Analysis of customer churn prediction in telecom sector using cart algorithm
Improved credit card churn prediction based on rough clustering and supervised learning techniques
Return on quality (ROQ): Making service quality financially accountable
Sensitivity vs Specificity. Technology Networks
Customer attrition analysis for financial services using proportional hazard models
Panic buying? Food hoarding during the pandemic period with city lockdown