key: cord-0633008-xrqm8es6 authors: Cao, Hongliu; Thomas, Eoin title: Destination similarity based on implicit user interest date: 2021-02-12 journal: nan DOI: nan sha: fa8166112e878c98ae84a25c612b257dc9b9541a doc_id: 633008 cord_uid: xrqm8es6 With the digitization of travel industry, it is more and more important to understand users from their online behaviors. However, online travel industry data are more challenging to analyze due to extra sparseness, dispersed user history actions, fast change of user interest and lack of direct or indirect feedbacks. In this work, a new similarity method is proposed to measure the destination similarity in terms of implicit user interest. By comparing the proposed method to several other widely used similarity measures in recommender systems, the proposed method achieves a significant improvement on travel data. Key words: Destination similarity, Travel industry, Recommender System, Implicit user interest The travel industry is more and more relying on e-commerce nowadays as online solutions have made life more convenient and comfortable for everyone. However, unlike the online shopping industry, the travel industry is more complicated to analyse in four ways: 1. user data are much more sparse. A user on amazon may have a lot of search and purchasing history on Amazon during a short period like one month. But in the travel industry, a passenger may only reserve fight tickets once a year. 2. For platforms such as Amazon, each good/item has a clear hierarchical category (e.g, diapers belong to the category of Baby care, which belongs to a higher level category of Baby), which may be used as the definition of user interest. In this application for the travel industry, we consider the destination as the item such as Paris, London or Shanghai. But it is hard to define a category for Paris in terms of explicit user interest. 3. Account and user information is necessary for online purchasing on Amazon, which makes it easier to group the history of searches and purchases by user. For online flight bookings for example, the user does not have to create an account during the booking phase. Travelers often book flight tickets, train tickets, hotel, activities on different platforms, which makes it harder to group the purchases by user. 4 . Most online products such as movies or Amazon products have ratings as user can give feedback easily. However, given that a traveler has been to Paris for example, it is hard to get a rating from the traveler on the Paris trip. Because a trip includes tickets booking, hotel booking, Point Of Interests (POIs) visits, food, etc. On the other hand, it is difficult for travelers to rate a trip to a certain destination as there are many variables. All these differences make it a great challenge to collect, analyse and understand user data in the travel industry. One important way to help understand traveler trends is destination similarity. Destination similarity is very important for the travel industry: 1) For travelers: we want to help travelers to find similar destinations that they may be interested in. For example, with the search or booking history of a traveler, similar destinations can be recommended to the traveler for the next trip. Another example is that the unprecedented COVID-19 crisis has introduced a number of travel restrictions which will prevent leisure travelers from reaching to their dream destination. With the destination similarity, we can recommend them alternative nonrestricted destinations they might also be interested in. 2) For tourism offices: tourism offices can better identify their competitors for each origin market. This can allow them to better distinguish themselves and target travelers considering trips to similar locations. 3) For online advertising companies, destination similarity can be used to identify if the current user who is searching for destination A would be interested in their impression of destination B, to improve the click through rate or conversion rate. 4) For a sustainable travel industry, the destination similarity can be used to suggest destinations that travelers might be willing to visit (so with the potential to convert a search into a booking), but closer to their home or simply better served by direct flights, thus reducing the CO2 emissions linked to transport. It can also be a solution to fight over-tourism problems by recommending similar destinations with fewer tourists or offering a more local an authentic experience, making travel even more rewarding. In this work, we propose to measure destination similarity from the search logs based on anonymized cookie ID. Various similarity measures (among users or items) have been proposed in the literature for collaborative filtering (CF) based recommender systems. However, most of these measures are proposed based on the users' ratings, while there's no rating information for a destination (city) in the travel industry. This makes many similarity measures not suitable for our problem. To fill this gap in the literature, we investigate different possible solutions and propose a new similarity measure, which has superior performance than the state of the art methods. The remainder of the paper is organized as follows: the background and related works are introduced in Section II. In Section III, the proposed similarity measures are introduced. We describe the data sets chosen in this study and provide the protocol of our experiments in Section IV and the experimental results in Section V. The final conclusion and future works are given in Section VI. Generally speaking, before a traveler makes a booking for a holiday, there are two preliminary steps: inspiration and search. There are a lot of information sources that can inspire travelers such as a movie scene or recommendations from relatives. During the inspiration step, the user interest is broad and general, thus we can only estimate the implicit user interest. With enough accumulated motivation, travelers will then search for more detailed information from travel websites, blogs or reviews to enrich their general picture of the destination. Then, the traveler will start to search flight and hotel information. If the prices agree with the budget, the traveler will pass to the next step: booking. Otherwise, the traveler may search for another similar destination and compare the price. The general motivation here is that: when a user searches for travel to a destination, this action shows that the user is interested in this destination. But we don't know what exactly the user is interested in (e.g. the museum, beach, mountains or other activities), which can be called 'implicit user interest'. The explicit user interest is difficult and expensive to get. Many companies ask the customers directly, while others try to infer from customer shopping/spending behavior. However, apart from the costs, the explicit user interest may not be clear for travelers themselves either. Because tourist attractions or POIs are not the only reason travelers get interested in a destination, it may also due to the culture (local people, food, etc.), weather, events and so on. Capturing implicit user interest seems easier and more direct. When a user searches both destination A and destination B, there must be some similarities between these two destinations for this user. However, the user interest changes overtime. If the time difference between two different searches is 10 months for example, these two destinations may not be similar as the user interest may shift due to the season or travel context. Hence, limiting the time period between two different searches is important. The research question is: given that a user has searched for several destinations, how can we determine the destination similarity? In the literature, there are many item similarity measures that can be applied. For example, in recommender systems, the CF has become the most widely used method to recommend items for users [1] , [2] . The core of CF is to calculate similarities among users or items [3] . Destination similarity can be seen as recommending destinations for users from the point of view of CF. The classic CF problem has a rating matrix. Let R = [r u,i ] m×n be a rating matrix with m users and n items. Each entry r ui is a rating value given by user u to item i. There are different ranges of r ui in real world datasets. Among which, the range 1,2,3,4,5 is adopted in many datasets such as movie reviews and restaurant reviews. Many predefined similarity measures can be used for CF. The most commonly used similarity measure are the Pearson correlation coefficient (PCC) [4] and the Cosine similarity (Cos) [5] : where I represents the set of common rating items by users u and v. r u and r v are the average rating value of user u and v respectively. Many variants of PCC and Cos have also been proposed. In [6] , Constrained Pearson Correlation Coefficient (CPCC) has been proposed to take the negative rating values into account. Other studies suggest that the number of co-rated items can impact the performance of the similarity measure. Hence, the Weighted Pearson Correlation Coefficient (WPCC) [7] and the Sigmoid Function based Pearson Correlation Coefficient (SPCC) [8] have also been proposed. Another widely used similarity measure is the Jaccard. Jaccard similarity coefficient is one of the popular methods for calculating the similarity between users/items [9] , [10] : where I u and I v are the sets of items rated by users u and v respectively. Unlike the previous two similarity measures, the Jaccard similarity considers only the number of co-rated items between two users regardless the rating values, which seems to be suitable for our problem. Apart from previously introduced widely used similarity measures, there are some recent advances in the literature too. In [2] , the authors list the limitations of popular similarity measures used in CF and propose a new similarity measure by combining the percentage of non common ratings with the absolute difference of ratings. However, all these similarity measures are proposed for measuring user similarities with item ratings [10] , [11] . However, this is not adapted to recommend destinations to travelers for two reasons. Firstly, there are no ratings for destinations from travelers. From search logs, we can only get binary response that if a traveler searched this destination or not. Secondly, due to the user interest change, only recent searches should be used to recommend destinations to travelers, which means that there are very few searched cities. It is difficult to measure the user similarity with little information. Hence, we need to measure the destination similarity instead of user similarity. Here, we propose a new similarity measure for items without any user ratings in the next section, and apply it to destination similarity in our experiments. To recommend a destination to a traveler who has made one or more searches recently, we can directly measure the destination similarity and recommend destinations similar to travelers' recent searches. Let R = [r u,i ] m×n be a binary matrix of with m users and n destinations. r u,i = 1 means user u recently searched destination i, while r u,i = 0 means user u didn't search destination i. The matrix R is very sparse due to two reasons: 1. there are many destinations while each traveler only knows few of them; 2. people don't plan travel frequently. As the matrix is binary, many commonly used CF similarity measures based on ratings are less meaningful. In this work, a simple and easy to understand similarity measure is proposed inspired from Random Forest Similarity (RFS) [12] , [13] . Random Forest (RF) classifier [14] is one of the most successful and widely used classifiers. A RF H is an ensemble made up with M decision trees , denoted as in Equation (4): . RFS is a similarity measure inferred from RF, which is also widely used [15] - [17] . For each tree h k in the forest H, if two different instances X i and X j fall in the same terminal node, they are considered as similar: where l k (X) is a function that returns the leaf node of tree k given input X. The final RFS measure RF S (H) consists in calculating the RF S (k) value for each tree h k in the forest, and to average the resulting similarity values over the M trees as in Equation (6): Cluster Consensus Similarity (CCS): RFS is mostly designed for classification problems, which is not suitable for the destination similarity problem with only user-destination interaction matrix. However, inspired by the idea of RFS, we propose a simple method named CCS. For RFS, each tree is trained to give a different data partition: each leaf node groups one or several instances together. In this work, the destinations searched by each user can be seen as a cluster. The destinations in this cluster share some similarity in terms of this user's interest. With this intuition, the similarity between destination i and j for user u can be defined as: Similar to RFS, the final similarity between destination i and j is then averaged over all users: With the proposed CCS measure, a n × n destination similarity matrix can be provided ( (9)). In the similarity matrix, each row is a similarity vector, represents its similarity to all other destinations. To avoid recommend the searched destination, the diagonal values (the similarity value to itself) are set to 0. Normed Cluster Consensus Similarity (CCS norm ): The proposed CCS method is simple and easy to understand. The destination similarity for each user is calculated at first to reflect this user's travel interest. Then, the similarity values of all users are averaged, which can be seen as a majority voting process. However, there are two requirements for a similarity measure. The first one is that each similarity vector can provide a good ranking, so that it can answer which are the most similar destinations given a known destination. The second one is to provide a solution when given multiple inputs. This requires that different similarity vectors (e.g. S CCS i, and S CCS j, ) should be comparable so that simple operations such as summing make sense. CCS can meet the first requirement, but not the second one. Because less popular destinations have very small value, especially when the user number m is very large, while popular destinations have larger value. This means that if we average two similarity vectors to find destinations similar to both given searched cities, the popular one dominates the averaged results. This means that we only focus on the destinations to the popular one and ignore the less popular one. To mitigate this effect, the CCS norm is proposed to re-scale each similarity vector: The proposed CCS norm method re-scales the similarity vectors for all destinations to the same range [0,1] so that they are comparable between destinations. CCS norm can help to avoid the situation that popular destination dominates the final similarity values when merging with less popular destinations. However, this brings another problem. Popular destinations are searched by most people, which means there are more data and we have more confidence on the similarity vector for popular destinations. For example, if unpopular destination i has been searched by 2 users among 1 million users and another unpopular destination j has been searched by another 2 users, the similarity between i and j is 0. However, the fact can be that they are quite similar, we just don't have enough data to support their similarity. Hence, we have more confidence for the similarity vector of popular destinations. With this intuition, the P CCS is proposed based on the CCS norm : where p i is the popularity of destination i, defined on b i , the rank of destination i based on the number of searches: Here, w ∈ (0, 1) is a parameter to control the difference between the similarity values for popular and unpopular destinations. This allows a trade-off between putting more confidence on popular destinations (w > 0) and putting same confidence for all destinations (w = 0). When w is bigger, the popular destination vectors are more weighted. However, too big w may lead to over focus on popular destinations and decrease the recommendation diversity. In this work, we use the destination search data. In this dataset, we use anonymized user cookie id to group the searches of the same user together. Users from 5 countries are selected due to business interest. The data contain activity related to search sessions, from which only 3 columns are preserved: the anonymized user cookie id, the searched destination location and the country from which the search was made. This last field is only used to create multiple market specific datasets to preserve cultural differences in destination preferences. The main objective of our experiments is to find the most suited similarity measure for search logs (binary user-items interaction). Apart from comparing different measures, the cultural difference and user interest change should also be taken into consideration. Culture difference The travel preferences of a French person, for example, may be different from those of a Chinese person. To deal with this challenge, we took country/culture into account when measuring destination similarity. The destination similarity is measured by the country from which the search was made. User interest change For most countries, user interest can changes over time. For example, summer destinations are usually different from winter destinations. To cope with this challenge, we regularly update the destination similarity to adapt to this shift in user interest. In the previous solution, recent two months data are used to calculate the similarity matrix and the results are updated weekly. Training For the training procedure, we have data from 5 countries. For each country, 10 time periods are selected across 2019 and 2020. The three proposed methods CCS, CCS norm and P CCS are compared to 4 widely used similarity measures in the literature: Cosine similarity, Pearson similarity, Jaccard similarity and Kulsinski similarity. Like Jaccard similarity, Kulsinski similarity is also a measure for binary vectors. It is less popular, but tested in many different fields [18] - [20] and achieving some good results [21] . For P CCS method, different w values are tested and the best w is selected. Testing The similarity matrix is calculated from the 8 weeks of data, and tested on the data of the following week. During the test phase, we randomly mask one searched destination for each user, and use the rest of the searched destinations to predict the masked destination. To realize this, the average of the rest searched destinations' similarity vectors is calculated. Then, we check if the masked destination is in the list of top 5 most similar destinations. To compare seven similarity measures, the experiments on 5 countries data with 10 time periods for each country have been done. The means and standard deviations of the top 5 accuracy (relative) over 10 periods for each method are shown in Table I . The results illustrate that among all five countries, P CCS always has the best performance, followed by CCS norm . P earson is always the worst performing method among all similarity measures. Table I gives more details on comparison. On average, the proposed P CCS improves the performance of P earson by 7.01%. Compared to the previous selected method Cos, P CCS gains an improvement of 4.44% on average. From the results of average ranking in Table I , it can be observed that two best performing methods are P CCS and CCS norm , while the widely used similarity measures Cos and P earson are the worst. One reason is that these two measures are not designed for the comparison for binary vectors. The Wilcoxon-Holm post hoc test with Critical Differences (CD) is also done to have an overall statistical comparison. The statistical test result is shown in Figure 1 . Generally speaking, all three CCS based methods are significantly better than P earson and Cos. The proposed P CCS is significantly better than all other 6 similarity measures. In the previous section, all the experiments use the previous eight weeks of data as training data for the following week of test data. However, using eight weeks of training data for each market and updating the results weekly can be very time consuming and computationally expensive. In this section, we try to reduce the training data to four weeks only to see to which extend the prediction performance is affected. The experimental results are presented in Table II . Similar to the analysis in the previous section using eight week training data, the results trained on only four weeks data also show that the proposed P CCS is the best performing method while Cos and P earson are the worst. On average, P CCS increases the performance of P earson by 7.06% and increases the performance of Cos by 4.67%, which is also similar to the conclusion in the previous section. However, compared to the results in Table I , the average performance of each similarity measure is not strongly impacted by the reduction of training data size. The method that has the biggest difference is Cos, with a reduction of 0.65% of accuracy. P CCS has a performance reduction of 0.42% on average. But when we look into each country, it can be found that the differences on Country2, Country3, Country4 and Country5 are negligible, but the performance on Country1 has a drop around 1.76% (this analysis is based on the comparison of the absolute performances, which is not disclosed). One possible reason can be that, the data coverage in Country1 is not very good and the data size is much smaller than other countries, which can also explain this big performance drop on Country1 than other countries. The objective of this experiment is to answer the research question: how many data are enough to have a good prediction performance but at the same time keep the computational efficiency. The experimental results show that for countries with good data coverage, four weeks training data are enough compared to eight weeks training data. However, for countries without a proper data coverage, it's better to use eight weeks training data. Due to the crisis of covid-19, the data volume of 2020 is much smaller than the volume of 2019 data. The question in this section is: should we use data from 2019 or 2020 as training data for the prediction of 2020? To answer this question, we choose the first week of June 2020 as the test data and use the data from April and May 2019 and 2020 respectively as training data to compare their prediction performance. We choose this time period is because there were confinement in Europe during April and May 2020, and the data volume is lower compared to the same period in 2019. The experimental results are shown in Figure 2 . Globally speaking, 2020 data have better perdition performance than 2019 data even though the 2020 data volume is much smaller. The smallest difference happens in Country3 with 1.05% accuracy gap, while the biggest difference happens in Country4 with 4.75% performance gap. One possible reason may be the change of user interest. Another possible reason may be that the covid-19 crisis has changed user's behavior (e.g. more local travel than international travel). Fig. 2 . The comparison of the prediction performance between 2019 data and 2020 data. X-axis is the country, Y-axis is the accuracy. In this work, we have presented the challenges of recommending destinations in the travel industry and the differences of recommending items other e-commerce sectors. The challenges of extra sparseness, dispersed user history, change of user interest, few direct or indirect feedbacks make it much harder to understand travelers and make recommendations more difficult than for other online consumers. To tackle these challenges and to understand travelers, we decide to measure the destination similarity in terms of traveler's implicit user interest. There are many similarity measures proposed in the field of collaborative filtering. However, most of these measures are designed for user-item interaction with ratings. Hence we propose a new similarity measure for user-item interaction without ratings to deal with the challenges in travel industry. The proposed P CCS is inspired from Random Forest Similarity to take user interest into account. The destination popularity is added to adjust the magnitude of each similarity vector so that the similarity vectors of different destinations can be fused correctly. After comparing seven different similarity measures on real world data, the proposed P CCS is proved to be the best solution. However, there are some improvements can be done. Firstly, we can expand single-source to multi-sources. Users can be limited by the knowledge of destinations, which means that there are destinations users never search because they don't know them not because they are not interested in them. Search logs can only reflect the similarity among destinations known to users. In this way, apart from the implicit user interest from search logs, other sources can be added such as destination images or descriptions. By using multi-source data, the implicit user interest can be fused with the destination information to provide a more meaningful similarity measure. Secondly, more user information and session information can be collected to provide a more personalized similarity measure. But more input information may also limit its use in real world use cases. Introduction to recommender systems handbook A new similarity measure for collaborative filtering based recommender systems A new user similarity model to improve the accuracy of collaborative filtering A survey of collaborative filtering techniques Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions Social information filtering: algorithms for automating "word of mouth An algorithmic framework for performing collaborative filtering Trustwalker: a random walk model for combining trust-based and item-based recommendation Comparison of collaborative filtering algorithms with various similarity measures for movie recommendation A survey of similarity measures for collaborative filtering-based recommender system Similarity measures for collaborative filtering recommender systems Random forest dissimilarity based multi-view learning for radiomics application Random forest for dissimilarity based multi-view learning: application to radiomics Random forests Unsupervised learning with random forest predictors Improve the performance of transfer learning without fine-tuning using dissimilarity-based multi-view learning for breast cancer histology images Gene expression data clustering with random forest dissimilarity Data patterns discovery using unsupervised learning Medal: Deep active learning sampling method for medical image analysis Acquiring banking networks Amritanlp at semeval-2018 task 10: Capturing discriminative attributes using convolution neural network over global vector representation