key: cord-0258623-q6tvey2b
authors: Fu, Chun; Miller, Clayton
title: Using Google Trends as a proxy for occupant behavior to predict building energy consumption
date: 2021-10-31
journal: nan
DOI: 10.1016/j.apenergy.2021.118343
sha: 79561e849596b4514162ea889b2f5f595308a094
doc_id: 258623
cord_uid: q6tvey2b

In recent years, the availability of larger amounts of energy data and advanced machine learning algorithms has created a surge in building energy prediction research. However, one of the variables in energy prediction models, occupant behavior, is crucial for prediction performance but hard to measure or time-consuming to collect from each building. This study proposes an approach that uses the search volume of topics (e.g., education or Microsoft Excel) on the Google Trends platform as a proxy for occupant behavior and use of buildings. Linear correlations were first examined to explore the relationship between energy meter data and Google Trends search terms to infer building occupancy. Prediction errors before and after the inclusion of the trends of these terms were compared and analyzed based on the ASHRAE Great Energy Predictor III (GEPIII) competition dataset. The results show that highly correlated Google Trends data can effectively reduce the overall RMSLE for a subset of the buildings to the level of the GEPIII competition's top five winning teams. In particular, the RMSLE during public holidays and days with site-specific schedules is reduced by 20-30% and 2-5%, respectively. These results show the potential of using Google Trends to improve energy prediction for a portion of the building stock by automatically identifying site-specific and holiday schedules.

Building energy prediction has been an important research topic in recent decades due to its extensive use in thermal load prediction [1], systems optimization [2, 3], measurement and verification [4, 5, 6], calibration [7, 8], and retrofit analysis [9].
A recent text-mining study of over 30,000 publications related to building energy efficiency and data science showed the growth of prediction techniques over the last decade [10]. There are many types of building energy prediction methods, including traditional statistical time-series models [11, 12, 13], building simulation models (e.g., EnergyPlus) [14], and popular machine learning models such as neural networks [15, 16, 3], deep learning [1, 17, 18, 19], or tree-based models [20]. Among them, machine learning models are currently receiving the most attention because of their prediction accuracy and multi-factor flexibility [21]. For example, the overview paper of the Great Energy Predictor III (GEPIII) machine learning competition (https://www.kaggle.com/c/ashrae-energy-prediction), hosted by ASHRAE on the Kaggle platform [22], highlighted that most of the winning teams used gradient boosting tree-based models with model-ensembling techniques to predict building energy from meta, weather, and temporal data. This competition was the latest generation of the series of ASHRAE-led competitions [23, 24, 25, 26] towards creating a machine learning body of knowledge, an open data set [27], and a means of benchmarking energy prediction methods [28]. This momentum demonstrated considerable research potential in the field of building energy forecasting.

1.1. Occupant behavior and patterns data could improve prediction

Although there has been considerable research in building energy prediction, a common challenge in building energy modeling is the lack of occupant behavior data in the dataset, which may negatively affect prediction performance. An error analysis of the GEPIII competition showed that the average prediction of the top 50 teams in the competition had a higher than acceptable range of error in over 20% of the test data set [29].
The error regions were especially high for hot water meters (60% of test data) and primary use types like science/technology (45.2% of test data). One of the key reasons for these errors is the challenge of creating occupancy schedules during periods of localized holidays, break periods, time off, or other effects [30]. University campuses, for example, have periods such as spring break in which students are not in class, which results in energy usage that differs from regular days and is usually lower. Further, the site-specific calendars of each building and site can be different, making it difficult to scale the use of such data. These schedules from different locations require additional manual work to collect and organize.

Figure 1: Correlation between data sources like weekday, weekend, and holiday schedules and building energy consumption is well understood. The use of Google Trends as a means of detecting occupant behaviour could automate the process of characterizing the localized effects of unique site-specific schedules.

In terms of modeling, one strategy is to train separate models by day type, such as regular-day and holiday energy models, but this carries a higher risk of error due to insufficient training data. Some studies combine fuzzy models and similar-day methods for training [31, 32, 33]. Some studies use holidays as an input through binary encoding (i.e., 0 for a non-holiday and 1 for a holiday), but the effort of manually collecting calendar data and confirming the building type is unavoidable [34, 35]. Because collecting calendar data is tedious, most past research focused on buildings in a single region, and no research has yet proposed a general method that can be widely applied across building types or countries. Given the popularity of mobile devices and networks, access to such mobility data combined with data science is a promising research direction.
Google Trends (https://trends.google.com/trends/), a popular search volume query platform based on the most widely used search engine worldwide, has a large enough search volume to provide data for understanding what search terms people use to find information on the Internet. Examples of work using Google Trends as a means of improving the prediction of human-related behavior include papers in health care [36, 37, 38], financial markets [39, 40, 41, 42], and tourism [43, 44, 45]. In the building performance field, Google's Popular Times was used to create weekly data-driven schedules for each building type, which were compared with standard schedules from ASHRAE [46]. In other studies, crowd positioning data were used to extract occupancy patterns in buildings [47, 48]. However, in building energy prediction research, the state of the art does not leverage any online data beyond manually collected and organized data on weekdays/weekends and national holidays. Therefore, this work uses the search volume provided by Google Trends in an innovative way to improve the energy model.

Figure 1 shows the goal of this paper: using data collected from Google Trends to find correlating signals that can help predict energy consumption for various buildings. Google Trends provides the search volume of specific topics for selected time periods and regions, representing the trend of people's behavior and providing calendars for the building energy model. The purpose of this research is to provide a framework that starts from selecting and evaluating topics, verifies whether topics can improve the accuracy of the energy model across buildings and countries, and finally suggests how to use Google Trends data to improve the prediction model. This study raises the following four questions:

1. How can we evaluate whether the search volume of topics matches the energy use behaviors of the building?
2. By how much do the calendars provided by Google Trends improve or reduce the accuracy of the energy model?
3. Which topics can improve the performance of model prediction?
4. Could this method be scaled across multiple sites from a diversity of locations?

Figure 2: Overview of the methodology to process data from the BDG2 and Google Trends data sources to test prediction improvement over the GEPIII competition-inspired baseline.

The rest of the paper is organized as follows. Section 2 outlines the process of collecting and cleaning the data, performing a correlation analysis to filter the best topics for prediction, and testing the ability of Google Trends to influence prediction accuracy. Section 3 shows the effect of implementing the proposed method on a large set of data from energy meters in the GEPIII. Finally, Sections 4 and 5 interpret the results in the context of improving the field of energy prediction in buildings and indicate the limitations of this paper.

For this study, the hourly time-series data from 2,380 meters in the Building Data Genome 2 (BDG2) project dataset were used for modeling. These data were also used in the GEPIII Kaggle competition [22]. Figure 2 illustrates the flow of the framework, starting from preprocessing energy and Google Trends data, selecting and evaluating topics, and finally comparing the proposed method with the baseline model.

Two datasets were used in this study to evaluate the impact of search volumes on the performance of building energy models: (1) building energy data and (2) Google Trends data. Both datasets contain time-series data from 2016 to 2017. The data from 2016 was used for training, and the data from 2017 was retained for validation.

The Building Data Genome 2.0 (BDG2) is an open dataset containing hourly meter readings and metadata of 3,053 power meters over two years. Each of the buildings has metadata such as floor area, weather, and primary use type.
This dataset can serve as a benchmark for comparing different machine learning algorithms and data science techniques. This study compared the prediction results before and after adding Google Trends to the baseline model from GEPIII, so only 2,380 of the 3,053 meters were used, consistent with the competition settings. Table 1 outlines the metadata variables available in the data set.

Google Trends is a service that analyzes the popularity of search queries in Google Search across various regions and languages. It provides users with normalized time series of search volumes (between 0 and 100) for any keywords of interest. On the Google Trends platform, there are two sources of search volumes: topics and terms. In this research, only search volumes of topics were collected and applied to the energy model. The reason for using topics instead of terms is that a topic in Google Trends is defined as a set of terms that share the same concept across languages, while a term only matches the user's query in the given language. For instance, in the official example, the search volume of the topic London includes results such as British Capital and Londres (Spanish for London), whereas the term London only covers queries containing the word London, such as London or London Bridge, and does not cover words in other languages or related concepts. Therefore, using topics instead of terms better captures a set of keywords and applies well across different countries.

In previous studies, the search terms and topics used for modeling were manually collected based on the authors' knowledge and judgment [39, 38, 36, 40, 44]. For simplicity and intuition, this research instead uses the primary use column of the dataset directly as the initial search keywords (e.g., Education, Office, Retail, etc.) and collects appropriate topics from the suggested ones (see Figure 3).
A similar selection process was also seen in a study predicting trading behaviour [39], which used terms suggested by the Google Sets service in response to keywords related to the stock market. For example, entering education as the search term on the platform yields suggested topics that include Education, Higher education, Sex Education (a comedy-drama series), etc. Out of these suggested topics, only reasonable ones were selected, and irrelevant topics like Sex Education were removed. Furthermore, additional keywords for productivity tools were also considered: Microsoft Office, Google Docs, and Mails. Because these tools are often used in the workplace, the derived topics are expected to be related to human behavior in buildings (e.g., looking up "how do I do vlookup on Microsoft excel" on Google Search). Table 2 lists the extracted topics and the categories to which they belong.

The daily search volume of the selected topics in each country was downloaded in batches. These daily search volume data cover 39 topics from 2016 to 2017 in 4 countries. This study used an open-source Python package called pytrends to extract the data set for use in this analysis. To normalize the search volume of different topics in different years in the model, the search volume of all topics was standardized by year (i.e., the yearly average was subtracted and the result was divided by the yearly standard deviation). Figure 4 visualizes processed time series for two example topics to show the typical patterns and seasonality of such data, which are similar to those of energy consumption measurements.

Before developing the energy prediction model, the data were preprocessed to prepare for the analysis. First, due to the daily granularity of Google Trends data, the energy data was converted from hourly to daily values.
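As a rough illustration of the yearly standardization described above, the following Python sketch z-scores each topic within each calendar year. The helper name `standardize_by_year` and the synthetic `trends` table are assumptions made for illustration; they are not taken from the authors' repository, and real search volumes would come from pytrends rather than from a random generator.

```python
import numpy as np
import pandas as pd


def standardize_by_year(trends: pd.DataFrame) -> pd.DataFrame:
    """Z-score every topic column separately within each calendar year."""
    grouped = trends.groupby(trends.index.year)
    return (trends - grouped.transform("mean")) / grouped.transform("std")


# Synthetic stand-in for daily search volumes (0-100) of two topics.
rng = np.random.default_rng(0)
days = pd.date_range("2016-01-01", "2017-12-31", freq="D")
trends = pd.DataFrame(
    {
        "Education": rng.integers(0, 101, len(days)),
        "Microsoft Excel": rng.integers(0, 101, len(days)),
    },
    index=days,
).astype(float)

standardized = standardize_by_year(trends)
```

After this step, every topic has zero mean and unit (sample) standard deviation within 2016 and within 2017, so the training and validation years are on a comparable scale.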
Next, the best-fit topics for each meter's energy behavior were found by calculating the correlation between meter readings and Google Trends, and the signals of these topics were then used as input features for the prediction model in the next phase.

To be consistent with the daily search volume provided by Google Trends, the hourly meter data in this study was resampled to a daily resolution, which represents the calendar data of the meter. Among the various dimension reduction techniques, principal component analysis (PCA) is a computationally efficient way to compress and represent data in lower dimensions by finding the principal components with the highest variance [49]. This study used PCA to reduce the dimensionality of the annual data from 8,784 dimensions (366 days x 24 hours) to 366 dimensions (366 days). For simplicity and consistency, only the first principal component was retained as the derived calendar data. Figure 5 shows an example of applying PCA to an electricity meter in an educational building, with a heatmap visualizing the intensity of energy use in daily values.

Finding the best-fit topic for each power meter and building type

Each power meter has its unique calendar pattern, so one focus of this study is to find appropriate topics that would assist in modeling. The best-fit topic for each meter was found by calculating the linear correlation to evaluate the similarity between the energy and Google Trends data. A past study also used linear correlation to select topics for improving the predictability of COVID-19 cases [38]. Figure 6 illustrates an example of two buildings. Based on the correlation coefficients calculated between the electricity meter and each topic, the search topic with the highest correlation is taken as the best-fit topic for that meter.

The baseline model in this study was chosen for its balance between simplicity and accuracy, and it is a publicly shared model from GEPIII on the Kaggle platform.
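The calendar extraction and topic selection steps described above (before the baseline model is introduced) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, a NumPy singular value decomposition stands in for a PCA library call, the hourly series is assumed to be complete (24 readings per day), and topics are ranked by squared Pearson correlation because the sign of a principal component is arbitrary. The meter and the two topics are synthetic.

```python
import numpy as np
import pandas as pd


def first_pc_daily_profile(hourly: pd.Series) -> pd.Series:
    """Compress the 24 hourly readings of each day into one first-PC score."""
    frame = hourly.to_frame("value")
    frame["date"] = frame.index.normalize()
    frame["hour"] = frame.index.hour
    matrix = frame.pivot(index="date", columns="hour", values="value")
    centered = (matrix - matrix.mean(axis=0)).to_numpy()
    # Rows of vt are the principal axes; project each day onto the first one.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return pd.Series(centered @ vt[0], index=matrix.index, name="pc1")


def best_fit_topic(profile: pd.Series, trends: pd.DataFrame):
    """Return (topic, r_squared) for the topic most correlated with the profile."""
    r_squared = trends.corrwith(profile) ** 2
    return r_squared.idxmax(), float(r_squared.max())


# Synthetic meter for 2016: higher load on weekdays during working hours.
rng = np.random.default_rng(1)
hours = pd.date_range("2016-01-01", periods=366 * 24, freq=pd.Timedelta(hours=1))
occupied = (hours.dayofweek < 5).astype(float)
shape = np.where((hours.hour >= 8) & (hours.hour < 18), 1.0, 0.2)
load = 10 + 5 * occupied * shape + rng.normal(0, 0.1, len(hours))
meter = pd.Series(load, index=hours)

profile = first_pc_daily_profile(meter)

# Synthetic topics: one follows the weekday pattern, one is pure noise.
days = profile.index
trends = pd.DataFrame(
    {
        "Education": (days.dayofweek < 5) + rng.normal(0, 0.1, len(days)),
        "Retail": rng.normal(0, 1, len(days)),
    },
    index=days,
)

topic, r2 = best_fit_topic(profile, trends)
```

In this toy setup the weekday-driven topic is selected because its daily pattern matches the first principal component of the meter, which mirrors the selection logic described in the text.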
This LightGBM-based model was developed using a 3-fold cross-validation method with basic data cleaning (e.g., abnormally extreme values and days-long constants were removed). The only difference between the proposed method and the baseline model is the additional Google Trends feature; the model settings and parameters are otherwise identical. Table 3 shows an overview of these two models.

In addition to evaluating the overall performance of the proposed method, the respective impacts on different day types are also considered to further analyze the benefits of Google Trends data in energy modeling. In this study, the majority of the power meters in the dataset are from university campuses. The academic calendars of the universities were collected and manually labeled as (1) regular day, (2) public holiday, and (3) site-specific schedule. The definitions of the regular day and public holiday day types were taken from online sources. Site-specific schedules are breaks and vacations of universities, which are unique to each site based on its operational and academic calendars. By analyzing the error reduction aggregated by day type, the potential for Google Trends data to help automate the process of accounting for calendar data could be evaluated. This process might be especially beneficial during site-specific schedules, which are generally the most challenging periods to predict.

The evaluation metric for this research is the Root Mean Squared Logarithmic Error (RMSLE), which is consistent with ASHRAE's evaluation method in the Kaggle competition. If the data is skewed with extreme outliers, the error is significantly amplified when using the Root Mean Squared Error (RMSE). Therefore, RMSLE was selected because it measures the relative error between predicted and actual values while remaining robust to outliers.
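To make the metric and the day-type comparison concrete, the sketch below computes RMSLE in the log(x + 1) form used for the Kaggle competition and the percent change in RMSLE per day type. The function names, the column names (`actual`, `baseline`, `proposed`, `day_type`), and the four example rows are assumptions chosen for illustration; they are not results from the study.

```python
import numpy as np
import pandas as pd


def rmsle(predicted, actual) -> float:
    """Root mean squared logarithmic error with the log(x + 1) transform."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.sqrt(np.mean((np.log1p(predicted) - np.log1p(actual)) ** 2)))


def rmsle_change_by_day_type(df: pd.DataFrame) -> pd.Series:
    """Percent RMSLE change of the proposed model relative to the baseline."""
    changes = {}
    for day_type, group in df.groupby("day_type"):
        base = rmsle(group["baseline"], group["actual"])
        new = rmsle(group["proposed"], group["actual"])
        changes[day_type] = 100.0 * (new - base) / base
    return pd.Series(changes, name="rmsle_change_pct")


# Made-up daily readings: the baseline overshoots on holidays.
example = pd.DataFrame(
    {
        "day_type": ["regular day", "regular day", "public holiday", "public holiday"],
        "actual": [100.0, 100.0, 40.0, 40.0],
        "baseline": [110.0, 90.0, 100.0, 100.0],
        "proposed": [110.0, 90.0, 50.0, 50.0],
    }
)

changes = rmsle_change_by_day_type(example)
```

A negative value means the proposed model has a lower RMSLE than the baseline for that day type. In this toy example only the holiday rows change, which is the kind of pattern the day-type analysis is designed to reveal.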
RMSLE is calculated according to Equation 1:

$$\mathrm{RMSLE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\log(p_i + 1) - \log(a_i + 1)\right)^2} \tag{1}$$

with the following definitions:
• $n$ is the total number of observations
• $p_i$ is the prediction of the target for observation $i$
• $a_i$ is the actual target for observation $i$
• $\log(x)$ is the natural logarithm of $x$

In this section, the correlations between the Google Trends data of topics and the energy data, and how they affect the prediction results, are studied. Furthermore, to give a more intuitive understanding of the RMSLE reduction, a benchmark of RMSLE values collected from different levels of competitors in GEPIII is provided for comparison.

Table 4 shows the results of the correlation analysis. Out of 2,380 meters, 293 were found to be highly correlated with Google Trends topics, and almost all of them are electricity meters. In contrast, most meters of the other three types (chilled water, steam, and hot water) only have poorly correlated topics. The reason is that these three types are mainly related to environmental controls such as cooling and heating, so their readings depend more on weather variability, especially the outdoor temperature. Figure 7 shows the distribution of correlation values by building type, and it can be seen that education and office buildings have the highest correlations. This result means that the energy use behaviors of these building types are more likely to resemble the search trends of specific topics. For example, the two example buildings mentioned earlier are an education building and an office building, which have high correlations with the topics Education and Microsoft Excel, respectively (R² > 0.8).

Figure 8a shows that education- and office-related topics have the highest percentage of high correlations with electricity meters. This result shows that the Google Trends data of these topics are more similar to the time series of power meters and may therefore better explain building energy behavior.
For example, office-related topics, such as Office 365, Microsoft Excel, Enterprise, and Employment, are highly correlated with 30-60% of the power meters. Education-related topics, such as Education and Primary school, are highly correlated with roughly 20-50% of the power meters. This pattern is reinforced in Figure 8b, where Education and Office buildings have the highest proportion of highly correlated meters among all building types.

Table 5 compares the prediction error of the proposed method (after adding Google Trends) with that of the baseline model. Topics with higher correlation tend to yield larger error reductions. For electricity meters, applying the method to the high, fair, and poor correlation categories of buildings reduced RMSLE by 1.9%, 1.2%, and 1.0%, respectively. When different day types are considered, the errors on public holidays and site-specific schedules are reduced by 24% and 3.5% for highly correlated topics. In contrast, no significant error reduction is found on regular days. The other three types of meters (chilled water, hot water, and steam) are highly dependent on weather conditions due to their role in heating and cooling. In the absence of fairly or highly correlated topics, the error of non-electricity meters even increased after adding poorly correlated topics, harming the accuracy of the model prediction.

Table 6 shows the prediction errors of meters that are further analyzed with highly correlated topics. Most of these topics are related to office and education, and these highly correlated topics contribute to the reduction of errors. For example, Education reduced the RMSLE of meters in the US by 5.7% on average, and by 30% and 8.3% for public holidays and site-specific schedules, respectively.
Likewise, Primary school also benefits the prediction during holidays and site-specific schedules, but the signal error in the test dataset causes the overall RMSLE to increase slightly. Office-related topics, including Enterprise, Microsoft Excel, and Microsoft Word, are also effective in reducing RMSLE by about 1.0-5.0%. These cases show that Google Trends data with high correlation can serve as a good calendar feature for energy models.

Table 7 compares the prediction results with the benchmark from the GEPIII competition. Topics with higher correlation lift the prediction performance to a higher level. After removing leaked data from the test dataset, the average RMSLE was calculated for each medal level as a benchmark. Power meters with highly correlated topics have a 1.9% reduction in RMSLE, which is equivalent to a shift to the Top 5 prize-winning level of the GEPIII competition. For fairly and poorly correlated topics, the error increased by 0.2% and 0.9%, respectively, causing no change to their medal level.

To further analyze how the error changes over time after adding Google Trends data, Figure 9 shows aggregated prediction results and the temporal error in weekly values. Since office- and education-related topics have the most significant positive impact on prediction accuracy, Microsoft Excel and Education in the US were selected as the example topics. For Microsoft Excel, an office-related topic, the baseline model could not sufficiently predict the electricity consumption during holidays or vacations. In contrast, the proposed method with Google Trends can better predict energy use during holidays or vacations by using the search volume of Microsoft Excel as an additional feature. The topic Education had similar benefits, with significant error reduction during site-specific schedules (e.g., winter and summer vacations).
These visualization results show the potential of Google Trends as an occupant-driven feature that helps the baseline model predict more accurately.

Figure 10 provides examples from two buildings, comparing the predictions before and after adding Google Trends. In the case of Building 642 (an education-type building), the baseline model maintains a similar trend throughout the year, with significant errors during summer vacation and special holidays (e.g., Thanksgiving and Christmas). After adding Google Trends, the proposed method predicts energy values more accurately on these special days, and the RMSLE is reduced from 0.317 to 0.280 (-11.8%). Similarly, in offices such as Building 1142, the error of the baseline model mainly comes from national holidays, such as Thanksgiving and Christmas. With Google Trends as a new feature, the proposed method reduces the RMSLE from 0.234 to 0.180 (-23.3%).

Table 7: Error comparison between the baseline and the proposed method with the benchmark from GEPIII. (*Some meters were removed because their data were leaked and became public to competitors in the GEPIII competition.)

This analysis illustrates how an innovative data source that has correlation with, and predictive power for, energy consumption can be added to a modeling process. Previous work showed that about 20% of the time in the GEPIII test data was classified as error [29]. Errors were classified along two main aspects: single- or multi-building (whether more than 33% of buildings in a single site have simultaneous errors) and in-range or out-of-range (whether the scaled RMSLE is greater than 0.3). In this study, the proposed method of combining Google Trends with an energy model specifically reduces the in-range errors of multiple buildings. These non-extreme, cross-building errors mostly come from scheduling scenarios, such as breaks or holidays on university campuses.
In most current energy models (including those in the GEPIII competition), the lack of schedules or calendars causes simultaneous errors in a specific group of buildings. Incorporating Google Trends data into energy modeling after a correlation examination could reduce such errors through these alternative signals representing occupancy in buildings.

This framework opens doors for the screening and use of other data sets. Previous studies have mainly represented calendar data as categorical variables or binary values that define day types as model inputs. This study provides a calendar feature with continuous values, the search volume of topics on the Google Trends platform, as a model input that closely matches energy usage behavior. Other data sources could be used in the same way to inform the prediction of building energy. WiFi connection data that can estimate a building's occupancy is beginning to emerge [50, 51, 52]. Bluetooth Low Energy (BLE) systems can collect information about occupants' patterns of use of the building [53, 54]. Even text data from maintenance systems or emails could be mined to understand the operations of the building [55, 56].

Improving building energy prediction for anomalous days can positively influence the implementation of various analytics and control approaches. For example, large-scale building energy benchmarking could use alternative data sources to more accurately determine which buildings perform better or worse than their peers [57]. The method in this paper could improve the accuracy of the short- and medium-term prediction used for occupant-based building controls [58, 59]. The processing of data for calibration and its use in the simulation process could be improved [60, 61, 62]. Better prediction of localized behavior could also improve the classification of building use types [63].
Out of the 2,380 power meters in the study, only 293 meters have high correlations with Google Trends data and significantly improved prediction performance. One of the main reasons is that around 50% of the meters in the dataset are non-electricity meters, which depend more on weather variables due to their role in heating and cooling the indoor environment. However, more than 70% of the electricity meters still have no correlated trends among the Google Trends topics. Besides, out of 39 building-related topics, fewer than ten were highly correlated with energy use (R² > 0.8). These highly correlated keywords can mainly be classified into office-related keywords (e.g., Enterprise, Microsoft Excel, etc.) and education-related keywords (e.g., Education, Primary school, etc.). This result indicates that Google Trends might only have a positive influence on office and education buildings in the energy model. For other non-primary building types, like Parking and Retail, no highly correlated topics were found, resulting in no significant improvement for these cases. To address these open issues, more Google Trends topics could be tested to find more variation, or alternative datasets from other sources (e.g., WiFi and mobility data) could be used to represent occupancy in buildings. There are numerous topics and terms that future researchers can explore to understand whether improvement is possible. It is even possible to try other online data sources, such as Twitter or LinkedIn, that could contain signals acting as proxies for behavior that influences building energy performance.

Another limitation to consider is how Google Trends is affected by the size of the selected region, with a trade-off between region specificity and data quantity. Larger regions (e.g., countries) usually contain larger search volumes and have more stable, higher-quality data, but at the expense of region specificity.
In contrast, smaller regions (e.g., cities) have less stable data quality due to smaller data volumes, but the data could be much closer to the occupant behavior in the local area. This study uses country-wide Google Trends data as the calendar feature to ensure robustness through sufficient data quantity. However, city- or state-specific events are not captured in country-wide Google Trends data, making it difficult to accurately predict energy use during local events.

This study shows a process for using temporal data collected from Google Trends to predict energy consumption in buildings. The search volume of some building-related topics was found to be highly correlated with building energy use for a subset of meters. The similarity between search volume and building energy use enables Google Trends data to help improve the predictions of energy models. In particular, during non-working periods (e.g., consecutive holidays and school breaks), Google Trends can effectively provide a signal of the occupancy of different building types as a new feature. This insight reduces the high cost incurred in previous studies, which required developing a separate model for each day type or manually collecting calendar data.

This study also provides a simple but effective method for evaluating calendar features for energy prediction models. The best-fit topics can be found efficiently by calculating the linear correlation between Google Trends and energy data. In addition, according to the comparison of prediction results before and after adding Google Trends data, topics with high correlation (R² > 0.8) can significantly improve prediction results, to a level comparable with the top-5 winning solutions. On the other hand, Google Trends data with lower correlation risks increasing the prediction error.
This framework could also be applied to similar data sets from other sources, such as transportation mobility data or WiFi connection data, and is expected to have similar effects in providing an occupant-driven feature, thereby improving energy prediction. The Google Trends topics used in this analysis include considerations for crosslanguage characteristics making the methodology applicable to energy data from building across sites or countries, giving the potential of incorporating occupant-driven data into energy modeling. This analysis can be reproduced using the data and code from the following GitHub repository: https://github. com/buds-lab/google-trends-for-buildings. Building thermal load prediction through shallow machine learning and deep learning Advanced data analytics for enhancing building performances: From data-driven to big data-driven approaches Development of an ANN-based building energy model for information-poor buildings using transfer learning Automated measurement and verification: Performance of public domain wholebuilding electric baseline models Fernandes, Accuracy of automated measurement and verification (M&V) techniques for energy savings in commercial buildings Application of automated measurement and verification to utility energy efficiency program data Bayesian calibration of building energy models with large datasets Continuous-time bayesian calibration of energy models using BIM and energy data Review of data-driven energy modelling techniques for building retrofit Data science for building energy efficiency: A comprehensive text-mining driven review of scientific literature Hourly thermal load prediction for the next 24 hours by ARIMA, EWMA, LR and an artificial neural network A Change-Point principal component analysis (CP/PCA) method for predicting energy usage in commercial buildings: The PCA model Inverse blackbox modeling of the heating and cooling load in office buildings Comparison between detailed model simulation and 
The authors would like to thank the team who developed and released the BDG2 data set and the technical aspects of the GEPIII competition, including (alphabetical order) Anjukan Kathirgamanathan, Bianca Bicchetti, Brodie Hobson, Forrest Meggers, June Young Park, Pandarasamy Arjunan, Paul Raftery, Zixiao Shi, and Zoltan Nagy. The authors would also like to thank the GEPIII planning committee members, including Anthony Fontanini, Chris Balbach, Jeff Haberl, and Krishnan Gowri. The ASHRAE organization is acknowledged for supporting the competition prize money and the Kaggle platform for hosting GEPIII as a non-profit competition.