key: cord-0670616-cthh6pfy authors: Smerdov, Anton; Burnaev, Evgeny; Somov, Andrey; Stepanov, Anton title: AI-enabled Prediction of eSports Player Performance Using the Data from Heterogeneous Sensors date: 2020-12-07 journal: nan DOI: nan sha: 25ce7f2800c0d4fe96efdbe0f38fddd40a9ef2d0 doc_id: 670616 cord_uid: cthh6pfy

The rapid growth of eSports has not been matched by tools for high-quality analytics and training in professional (Pro) and amateur eSports teams. We report on an Artificial Intelligence (AI) enabled solution for predicting eSports player in-game performance using exclusively the data from sensors. For this purpose, we collected physiological, environmental, and game chair data from Pro and amateur players. Player performance is assessed from the game logs of a multiplayer game and predicted for each moment of time using a recurrent neural network. We found that an attention mechanism improves the generalization of the network and also provides straightforward feature importance. The best model achieves a ROC AUC score of 0.73. The performance of a particular player can be predicted even though that player's data are not included in the training set. The proposed solution has a number of promising applications for Pro eSports teams and amateur players, such as a learning tool or a performance monitoring system.

eSports is organized video gaming in which single players or teams compete against each other to achieve a specific goal by the end of the game. The eSports industry has progressed considerably within the last decade [1]: a huge number of professional and amateur teams take part in numerous competitions where the prize pools reach tens of millions of US dollars. Its global audience had already reached 380 million in 2018 and is expected to exceed 550 million in 2021 [2]. The eSports industry so far includes a number of promising directions, e.g., streaming, hardware, game development, connectivity, analytics, and training.
Apart from the growing audience, the number of eSports players and Pro players (or athletes), i.e., players with a contract, has grown tremendously over the last few years. This has made the competition among players and teams even harder and has attracted extra funding for training and analytics. The opportunity to win a prize pool by playing a favorite game is very tempting for amateur players, and most of them consider a professional eSports career in the future. At the same time, analytics and training is recognized as the most promising direction, as it includes innovative research and business in artificial intelligence, data/video processing, and sensing.

Although eSports is recognized as a sport in many countries, it is still in its infancy: there is a lack of training methodologies and widely accepted data analytics tools. As a result, it is unclear how to improve a particular game skill other than by spending the lion's share of time in the game, watching how popular streamers perform, and participating in training sessions. Currently, there is a lack of tools that provide feedback about player performance and advise how to perform better. This creates huge potential for eSports research aimed at understanding the factors essential to winning a game. The abundance of data available through game replays, so-called 'demo' files, allows for replicating the game and performing fundamental analysis. This kind of analytics is available to both amateur players and professional eSports athletes. In terms of prediction and analytics, which are relevant to the research reported in this work, most current works in eSports rely exclusively on in-game data analysis. However, using only in-game data to estimate players' performance limits the feedback that can be provided to the team and players.
While in-game data can provide the primary information about the gamer's traits and behavior, the huge amount of data from the physical world captured by sensors [3] is omitted. Moreover, sensor data may be more suitable for the eSports domain, since models trained only on in-game data quickly become obsolete when a new patch is released. Information about the player's physiological condition, e.g., heart rate, muscle activity, and movements, can supplement logs obtained from in-game data to provide additional information for predictive models and potentially improve their performance. Multimodal systems utilizing this information have already been explored for audio, photo, and video stimuli [4].

In this article, we report on predicting eSports player performance using data collected from different sensors and a recurrent neural network for data analysis. While there are a number of relevant research papers dealing with the prediction of player skill in general, to the best of our knowledge there is no research on estimating the current player performance at a particular moment of time using various sensors and data collected from Pro players. This immediate prediction can provide instantaneous feedback and can serve as a useful tool for eSports team analysts and managers to monitor the current condition of players. Another practical application is a real-time performance monitoring tool for eSports enthusiasts who want to progress towards the professional level and sign a contract with a professional team.

arXiv:2012.03491v2 [cs.HC] 24 Aug 2021

Since playing a game involves high mental load and stress, we propose to use a multimodal system to record the players' physiological activity (heart rate, muscle activity, eye movement, skin resistance, mouse movement), gaming chair movement, and environmental conditions (temperature, humidity, and CO2 level).
These data may help explain variations in gaming performance during the game and identify which factors affect the performance the most.

The contribution of this work is threefold. (i) An experimental testbed and heterogeneous data collection from various sensors. The dataset was collected in collaboration with a professional eSports team. (ii) Investigation of the optimal neural network architecture for predicting player performance and interpreting the obtained results. (iii) In terms of data analysis, we place special emphasis on the current performance status of the player instead of considering the overall player skill.

This paper is organized as follows: in Section II we introduce the reader to the relevant research in the area. Afterwards, we present the methods used in this research in Section III. Experimental results are demonstrated in Section IV. Finally, we provide concluding remarks in Section V.

Wearable sensors and body sensor networks have been widely applied for assessing human behavior and activity recognition in many areas [5]. However, this approach has not been extensively used for assessing eSports players: so far, typically in-game data or data collected from the computer keyboard and mouse have been analyzed. Because of this, the performance evaluation methods are limited as well. This section is therefore divided into two parts: first, we overview relevant research in terms of data collection and activity recognition and, second, we discuss recent research on performance evaluation methods.

There is a lack of prior research utilizing sensor data to predict eSports player behavior, which is explained by the young age of eSports as a research domain. Recent research on using sensor data in eSports is limited to predicting the overall player skill or finding simple dependencies in the data.
In [6], the authors investigate the correlations between psychophysiological arousal (heart rate, electrodermal activity) and self-reported player experience in a first-person shooter game. The relation between player stress and game experience has been investigated in the MOBA genre [7]. The connection between gaze and player skill is investigated in [8]. Mouse and keyboard data are a natural source of information about the players. Their relation to player performance in first-person shooters is covered in [9] for Red Eclipse and in [10] for Counter-Strike: Global Offensive (CS:GO). Player performance can also be predicted from activity on a chair during the game [11] or in reaction to key game events [12].

However, extensive research has been carried out on data collection and activity recognition in other applications, including sports, medicine, and daily activity monitoring. In sports, wearable sensing systems are used for detection and classification of training exercises for goalkeepers [13] and for assessing header skills in soccer [14]. Wearable systems have also been designed to classify tricks in skateboarding [15], popular swimming styles [16], and other activities in sports [17]. Daily activity monitoring and medical applications have been studied for nearly three decades with the use of wearable sensors. Many medical studies deal with the investigation of human gait, for example, for patients with Parkinson's disease [18].

Most of the current research in eSports analytics relies exclusively on in-game data collection and further analysis. It has been shown that information about kills, deaths, and other game events can help predict a game outcome in the Multiplayer Online Battle Arena (MOBA) discipline for Dota 2 [19], League of Legends [20], and Rocket League [21].
Another way to predict the game outcome in MOBA-related disciplines is based on features extracted from players' match history, as well as in-game statistics [22]. Players' match history can also be used to create a rating system for predicting match outcomes in the First-Person Shooter (FPS) genre [23].

As noted earlier, in-game data in eSports are widely used for analytic studies in the area. Drachen et al. consider clustering player behavior to learn the optimal team compositions for the League of Legends discipline and to develop a set of descriptive play style groupings [24]. Research by Gao et al. [25] targets the identification of the heroes that players are controlling and the roles they take. The authors used classical machine learning algorithms trained on game data to predict a hero ID and one of three roles, achieving an accuracy ranging from 73% to 89% depending on the features and targets used. Eggert et al. [26] continued the work by Gao et al. [25] and applied supervised machine learning to classify the behavior of DOTA players in terms of hero roles and playstyles. Martens et al. [27] proposed predicting the winning team by analyzing the toxicity of in-game chat. In [28], the authors used pre-match features to predict the outcome and analyze blowout matches (when one team outscores another by a large margin). The research reported in [29] describes cluster evaluation, description, and interpretation for player profiles in Minecraft. The authors state that automated clustering methods based on game interaction features help identify the real player communities in Minecraft.

In many domains, skills and performance can be assessed and/or predicted from sensor data [30]. In sports, data from an Inertial Measurement Unit (IMU) can be helpful for estimating volleyball athlete skill [31].
Tennis player performance can be assessed from IMU data on the hand and chest [32], or on the waist, leg, and hand [33]. Similar techniques have been investigated for skill estimation in soccer [13], climbing [34], golf [35], gym exercising [36], and alpine skiing [37]. Another popular domain for skill assessment based on sensor data is surgery. In [38], the authors use IMU data to create a skill quantification framework for surgeons. Ershad et al. [39] have shown the connection between surgeon skill and behavior information collected from an IMU. Ahmidi et al. [40] developed a system using motion and eye-tracking data for surgical task and skill prediction. The authors used hidden Markov models [41] to represent the surgeon state, which is similar to the method proposed in this work in Section III. The connection between surgeon actions and pressure sensor data has been investigated in [42]. Physiological data have also been used for predicting skill level and skill acquisition in working activities, such as mold polishing [43] and clay kneading [44], as well as dancing [45].

In this research, we use a number of wearable and local unobtrusive sensors for data collection during the game session and further data analysis. The data collection was carried out with respect to the players' needs. In this section, we describe the sensors used in this research, the data collection procedure, data pre-processing, and the data analysis that helps predict the players' performance. In Figure 1 we present an overview of the prediction system.

In our work, we use three groups of sensors: physiological sensors, sensors integrated into a game chair, and environmental sensors. The sensor network architecture is shown in Figure 2. The list of sensors used, their locations, and sampling rates are presented in Table I. Further issues associated with sampling rates are discussed in Section III-D1.

Physiological data recorded:
• Electromyography (EMG) data as an indicator of muscle activity.
EMG data are related to the physiological tension affecting the player's current state [53].
• Heart rate data received by a heart rate monitor on the chest. High values correspond to mental stress and arousal [54], which might affect the rationality of player decisions.
• Electrodermal activity (GSR), or skin resistance data, as a measure of a person's arousal [55]. This value is also connected with the stress level.
• Eye tracker data of the player's gaze position on a monitor in pixel values. The player must check the minimap and other indicators on the screen to have relevant information about the game and, thus, make effective decisions [8].
• Mouse movements captured by a custom Python script as a measure of the intensity of player input. These data are an indirect indicator of the hand movement activity as well as the player skill [9].

The sensors integrated into a game chair are a 3-axis accelerometer and a 3-axis gyroscope. We illustrate the axes orientation for the chair in Figure 3. Recorded data include:
• Linear acceleration of the chair. It captures the player's movements towards the game table and parallel to it, chair height changes, and small oscillations possible in stress conditions. Behavior on a chair is connected with the player skill [11], [12].
• Angular velocity of the chair. These data provide information about the person's wiggling and spinning on the chair.

Environmental data recorded:
• CO2 level. A high CO2 level results in a reduction of cognitive abilities [56] and thus directly affects the gaming process.
• Relative humidity. A high level of relative humidity results in a reduction of neurobehavioral performance [57].
• Environmental temperature. Overly warm conditions may affect human performance [58].

Apart from a number of heterogeneous sensors, the sensor network has a dedicated storage server (based on Intel NUC). The sensors have a wireless connection to the network (WLAN), as they are placed near the eSports athletes.
The router has a low-latency connection to the Internet (WAN). Proper synchronization of the sensors and the gamer PC is essential for further data collection and analysis.

1) NTP Server: At present, there are many options for building time-synchronous systems for industrial applications, e.g., TSN by NI 1. At the same time, there must be a reasonable balance between the cost of a synchronization solution and its accuracy. The cost of most industrial solutions is high, which prevents them from being integrated into the player's PC: desktop computers are primarily selected according to 3D game performance criteria and do not have specialized hardware on board. That is why we decided to implement the synchronization with a single NTP server. A reliable and always available server that is located close enough and characterized by the minimum delay in transmitting packets over the network is vital for our sensor network. A Raspberry Pi 3B single-board computer was selected as the server, and a GPS signal was used as a source of reasonably accurate time. The satellite signal was received by a separate module based on the MTK MT3333 chipset, which has a UART interface and supports the PPS signal. On Raspbian Stretch OS, the GPS support packages (gpsd, gpsd-clients, pps-tools) and the Chrony time server were installed. The Raspberry Pi was located near the window for better satellite signal reception and connected to the local area network via a wired interface. The dedicated PPS signal, acquired through a separate GPIO pin of the Raspberry Pi, made it possible to ensure time accuracy in the range of $10^{-6}$ to $10^{-5}$ s (1 to 10 µs).

2) Sensors: The sensors in our network are deployed on Raspberry Pi (RPi) boards. A broadcast network "sync" command was sent to the sensors prior to measurements. After receiving the command, a custom-made script on each RPi synchronized the local time to the time of the local NTP server (Stratum 1).
Feedback status with the current time difference was also reported from every RPi to the local data storage PC. In this way, all RPis were synchronized before the measurement procedure started. We measured the drift of the local RPi time: it is in the range of 10 to 20 ms per hour. For this reason, the sync command was repeated every 10 minutes, which bounds the drift accumulated between two consecutive synchronizations to roughly 1.7 to 3.3 ms and keeps the sensors synchronized at all times.

1 http://www.ni.com/white-paper/54730/en/

3) Gamer PC Synchronization: Time synchronization on the gamer PC was another significant issue. The players used MS Windows OS on their PCs, which does not provide accurate time to the user by default (you can check how accurate the clock on your PC is at www.time.is). The default settings in Windows 7/8/10 allow users to synchronize time with an NTP server only once a week. At the same time, according to our measurements, the average clock drift of an ordinary PC is 50 ms per hour or even more. In MS Windows 10 build 1607 and newer, the synchronization period can be reduced and significantly higher time accuracy obtained by changing registry settings. The Windows Time Service should then be switched to the Auto start mode (always loaded after the PC starts). Clock accuracy within 1 ms requires a number of conditions to be met 2. In our experiment (taking into account the local Stratum 1 time server based on an RPi), all the requirements were met with the exception of the ping value (it was < 1 ms instead of the required < 0.1 ms). Nevertheless, this setup allowed us to achieve the necessary synchronization accuracy. With proper registry settings, after some time the drift is compensated by the internal Windows algorithms, and the clocks become synchronous with the time server (within 2 to 3 ms accuracy).

Upon synchronizing the hardware in the network, we start the data collection procedure. We invited 21 participants to play the FPS Counter-Strike: Global Offensive (CS:GO) for 30-35 minutes.
Six Pro players took part in this experiment. All the participants were informed about the project and the experimental details. Every participant signed a written consent form that allowed recording physiological and in-game data. The players were then equipped with the sensors for data collection. We did not receive any complaints about an uncomfortable gaming experience caused by the sensors. The experimental testbed is shown in Figure 4.

The players played the Deathmatch mode of CS:GO. An example of the game screen interface is shown in Figure 5. In this mode, the goal of each player is to achieve as many kills of other players as possible and to minimize the number of their own deaths. When a player is killed, they immediately respawn in a random location in the game. This mode is often used by eSports players in their training routine. After the game had finished, we saved the replays for subsequent extraction of game events.

Collected data samples for 5-minute intervals for two players are shown in Figure 6. For the reader's convenience, we color the intervals in the 1-second vicinity of kill and death events. For the data from some physiological indicators, e.g., skin resistance or heart rate, there are clear global and local trends, which might align with changes in the player's efficiency in the game. Another observation is that the players usually do not move a lot on a gaming chair. However, they change their posture from time to time; this event is captured by the IMU on the chair and might also be connected with the game events and player performance.

To get rid of the noise and occasional outliers, we clipped all the data at the 0.5 and 99.5 percentiles and smoothed the data with a 100 ms moving window. We also reparametrized the gaze, mouse, and muscle activity signals.
The mouse signal was converted from x and y increments to the Euclidean distance traveled, which reflects the mouse speed; gaze data were transformed from x and y coordinates to the Euclidean distance traveled; the muscle activity signal was changed to the L1 distance to the player's reference level in order to represent the intensity of muscle tension. For the 3.7% of the data that were missing, we used linear interpolation to fill in the unknown values since it provides a stable and accurate approximation.

In order to predict the player performance at each moment of time, it is convenient to resample the data from all the sensors to a common sampling rate. This makes it possible to apply data analysis methods for discrete time series prediction, such as hidden Markov models [41] or recurrent neural networks [59]. However, data from different sensors have a different underlying nature and should be resampled accordingly. While it is reasonable to average the data within a time step for heart rate, skin resistance, muscle activity, environmental data, and chair acceleration and rotation, averaging is not applicable to the gaze and mouse movement data. The reason is that we are interested in the total distance traveled within the time step rather than the average distance traveled per measurement. The total distance is determined only by the sum of the samples, not by their number.

Resampling introduces an important hyperparameter: the time step. Throughout the manuscript we refer to it as $\Delta t \in \mathbb{R}$. Large time step values, e.g., 5 minutes, are not meaningful for our problem since we need to extract relevant information about the player's current state. On the other hand, a time step that is too small, e.g., 0.1 s, may lead to an excessive number of observations and noisier data. In any case, the resampling time step should not be smaller than the time between measurements for the majority of sensors.
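The two aggregation rules described above (averaging within a time step versus summing within it) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `resample`, the NumPy-based binning, and the toy values are assumptions made for the example, and timestamps are assumed to be non-negative seconds from the start of the recording.

```python
import numpy as np


def resample(timestamps, values, dt, how):
    """Aggregate irregularly sampled data into bins [k*dt, (k+1)*dt).

    how="mean" suits level-like signals (heart rate, skin resistance, ...);
    how="sum" suits distance-like signals (mouse or gaze distance traveled).
    Hypothetical helper: name and signature are illustrative only.
    """
    timestamps = np.asarray(timestamps, dtype=float)
    values = np.asarray(values, dtype=float)
    # Index of the time step each sample falls into (timestamps must be >= 0).
    bins = np.floor(timestamps / dt).astype(int)
    n_bins = bins.max() + 1
    sums = np.bincount(bins, weights=values, minlength=n_bins)
    counts = np.bincount(bins, minlength=n_bins)
    if how == "sum":
        return sums
    if how == "mean":
        out = np.full(n_bins, np.nan)  # empty time steps stay NaN
        nonempty = counts > 0
        out[nonempty] = sums[nonempty] / counts[nonempty]
        return out
    raise ValueError("how must be 'mean' or 'sum'")


# Toy data: four samples, two per one-second time step.
t = [0.0, 0.5, 1.0, 1.5]
heart_rate = resample(t, [60, 62, 64, 66], dt=1.0, how="mean")
mouse_distance = resample(t, [1, 2, 3, 4], dt=1.0, how="sum")
```

With a higher sampling rate the per-step mean of the heart rate stays comparable, while the per-step sum of the mouse distance keeps measuring the total path, which is the reason the two signal types are treated differently.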
After converting the sampling rates to the common value, we obtained a 15-dimensional feature vector for each moment of time. Further in the paper we refer to this feature vector as $x(t) \in \mathbb{R}^n$. Its components are described in Table II.

There is no generic player effectiveness metric for the majority of eSports disciplines. The most popular evaluation metric for FPS and MOBA games is the Kill Death Ratio (KDR) [60]. It equals the number of kills divided by the number of deaths for a time interval. If KDR > 1, the player performs well or, at least, better than some players on the game server. Otherwise, the player most likely performs poorly compared to other players. KDR takes values from 0 to $+\infty$, which is not a convenient range for prediction. When the player performs very well and has many kills and few deaths, KDR fluctuates drastically because of division by a small number. Conversely, if there are many deaths and few kills, KDR is close to 0 and changes slowly. This inconsistency in the target creates difficulties for training machine learning algorithms. One possible solution could be to apply a logarithm to KDR, but this does not solve the issue with the scale, because the logarithm takes values from $-\infty$ to $+\infty$.

We propose a more numerically stable target value, which equals the proportion of kills for the player. More precisely,

$$p_\tau(t) = \frac{k_\tau(t)}{k_\tau(t) + d_\tau(t)} = \frac{K(t+\tau) - K(t)}{K(t+\tau) - K(t) + D(t+\tau) - D(t)},$$

where $p_\tau(t) \in \mathbb{R}$ is the performance, i.e., the proportion of kills for a player at the moment $t \in \mathbb{R}$ considering the kills and deaths in the next $\tau \in \mathbb{R}$ seconds. In other words, it is the share of kills among all kill and death events in the next $\tau$ seconds. $K(t) \in \mathbb{R}$ and $D(t) \in \mathbb{R}$ are the total numbers of kills and deaths at the moment $t$; therefore, $k_\tau(t) \in \mathbb{R}$ and $d_\tau(t) \in \mathbb{R}$ equal the numbers of kills and deaths within the interval $[t, t + \tau]$. $p_\tau(t)$ varies from 0 to 1 and has higher values for well-performing players. Bounding by 0 and 1 helps efficiently train machine learning algorithms that are sensitive to the target scale.
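As a small worked illustration of the proportion-of-kills target defined above, the sketch below counts kill and death events inside the window $[t, t + \tau]$. The function name `kill_proportion`, the representation of the game log as sorted lists of event times, and the choice to return `None` for a window without events are assumptions made for this example; the source does not specify how empty windows are handled.

```python
from bisect import bisect_left, bisect_right


def kill_proportion(kill_times, death_times, t, tau):
    """p_tau(t): share of kills among kill/death events within [t, t + tau].

    kill_times and death_times are sorted lists of event times in seconds
    (hypothetical log format). Returns None when the window has no events,
    since the ratio is undefined there (handling is an assumption).
    """
    def count_in_window(times):
        # Number of events with t <= time <= t + tau.
        return bisect_right(times, t + tau) - bisect_left(times, t)

    k = count_in_window(kill_times)
    d = count_in_window(death_times)
    if k + d == 0:
        return None
    return k / (k + d)


# Toy log: kills at 5 s, 12 s, 40 s; deaths at 8 s and 30 s.
kills = [5, 12, 40]
deaths = [8, 30]
p_start = kill_proportion(kills, deaths, t=0, tau=20)   # 2 kills, 1 death
p_later = kill_proportion(kills, deaths, t=25, tau=10)  # 0 kills, 1 death
```

For the same first window, KDR would be 2/1 = 2, while a window with two kills and no deaths would make KDR undefined; the proportion stays within [0, 1] in both cases.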
The important hyperparameter introduced above is $\tau$. Essentially, it is the window size over which the information about the future player performance is aggregated. Small values of this hyperparameter, such as 1 s, lead to noisy target values, while large values, such as 10 minutes, neglect subtle yet important changes in the player performance. $\tau$ is commonly referred to as the forecasting horizon.

$p_\tau(t)$ is a well-defined metric for evaluating player effectiveness, and it is possible to predict this value directly, thus treating the problem as a regression problem. However, it is unclear how to present the quality of regression results in an interpretable way. Formulating the problem in classification terms allows measuring the quality of prediction with more comprehensible classification metrics such as accuracy and ROC AUC. These metrics are much easier to compare with results obtained on other data or by other models. The natural way to decide whether a person plays well or badly at the moment is to compare the current performance with his or her average performance in the past. It is important to consider only past events to avoid overfitting. Formally:

$$y_\tau(t) = \begin{cases} 1, & p_\tau(t) > \bar{p}_\tau(t), \\ 0, & \text{otherwise}, \end{cases}$$

where $y_\tau(t) \in \{0, 1\}$ equals 1 in case of good game performance and 0 otherwise, and $\bar{p}_\tau(t)$ is the average player performance in the past. Figure 7 demonstrates how the kill ratio $p_\tau(t)$ and the corresponding binary target $y_\tau(t)$ change over time for three forecasting horizons.

The substantial advantage of using $y_\tau(t)$ instead of $p_\tau(t)$ is target unification between the players. The values 0 and 1 of $y_\tau(t)$ have the same meaning for all players and imply bad and good current performance, respectively. That is not the case for raw values of $p_\tau(t)$, because the same value of $p_\tau(t)$ may be good for one player but bad for another. For example, a kill ratio of 0.5 may be an achievement for a novice player but a failure for a professional player.
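The binary target described above can be sketched as a comparison between the kill proportion in the coming window and the player's own past average. How the past average is computed is not spelled out in the text, so this example assumes it is the cumulative kill proportion over all events strictly before $t$; that choice, the function name `binary_target`, and the `None` return for undefined cases are all assumptions of the sketch.

```python
from bisect import bisect_left, bisect_right


def binary_target(kill_times, death_times, t, tau):
    """y_tau(t): 1 if the next tau seconds are better than the player's past.

    ASSUMPTION: the past average is taken as kills / (kills + deaths) over
    all events strictly before t, so only past events enter the reference.
    Returns None if either ratio is undefined (no events).
    """
    # Events in the future window [t, t + tau].
    k_future = bisect_right(kill_times, t + tau) - bisect_left(kill_times, t)
    d_future = bisect_right(death_times, t + tau) - bisect_left(death_times, t)
    # Events strictly before t.
    k_past = bisect_left(kill_times, t)
    d_past = bisect_left(death_times, t)
    if k_future + d_future == 0 or k_past + d_past == 0:
        return None
    p_future = k_future / (k_future + d_future)
    p_past = k_past / (k_past + d_past)
    return 1 if p_future > p_past else 0


# Toy log: kills at 5, 12, 40, 45 s; deaths at 8 and 30 s.
kills = [5, 12, 40, 45]
deaths = [8, 30]
y_good = binary_target(kills, deaths, t=35, tau=15)  # future 2/2 vs past 2/4
y_bad = binary_target(kills, deaths, t=20, tau=15)   # future 0/1 vs past 2/3
```

Because the reference is the player's own history, the same future kill proportion can map to 1 for one player and 0 for another, which is the unification property the text describes.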
Another justification for using $y_\tau(t)$ as a target is robustness to the skill of other players on the server. A player's score $p_\tau(t)$ may be low in absolute value because of strong opponents, but the target $y_\tau(t)$ is robust because it evaluates the performance within one game. The motivation for this target is to unify the target variable across all players and to warn a player or a manager in advance that something is going wrong. Predicting the future player performance $y_\tau(t)$ is essential for coaches and progressing players as it provides quick feedback on players' actions. It might help identify impending failures in player performance, e.g., burnout or fatigue, in advance and take measures to help the player recover or even to substitute the player during an eSports competition. This target is also helpful for learning purposes: even if a person plays very well or very poorly, it helps find the moments when the player performs a bit better or worse than average.

Although we formulate the performance metric in terms of kills and deaths, the metric is directly applicable to the majority of First-Person Shooters, as well as other games that include kills and deaths. The performance metrics for these and other games can be calculated in other terms (such as gold, scores, progress, etc.), while the data processing and algorithms may remain the same.

We trained four models for predicting player performance using the data from sensors: a baseline model, logistic regression, a recurrent neural network, and a recurrent neural network with attention. In this section, we describe these methods in detail. The output of all models is the probability that the person will play better in some fixed period in the future. All the models are evaluated by the ROC AUC score discussed in Section III-G2.

1) Baseline: Before training a complex model, it is crucial to set up a simple baseline to compare it with.
A common practice in time series analysis is to establish a baseline model that uses the current target value as the future prediction. For our problem, the baseline uses the average player performance in the last $\tau$ seconds as a prediction. In other words, the baseline prediction is $p_\tau(t - \tau)$. This prediction is valid because $p_\tau$ takes values from 0 to 1, which are then treated as probabilities by the algorithm used to calculate the ROC AUC metric.

2) Logistic Regression: Logistic regression [61] is a simple and robust linear classification algorithm. It takes a feature vector $x(t) \in \mathbb{R}^n$ as input at the moment of time $t$ and outputs the probability that $y(t)$ equals 1:

$$P\big(y(t) = 1 \mid x(t)\big) = \frac{1}{1 + \exp\big(-(w^\top x(t) + b)\big)},$$

where $w \in \mathbb{R}^n$ is the learnable weight vector and $b \in \mathbb{R}$ is the bias term. In our study, the dimensionality is $n = 15$ since we used 15 values from sensors. Logistic regression can capture only linear dependencies in the data because the feature vector $x(t)$ enters the model only through the dot product with the vector $w$.

3) Recurrent Neural Network: A neural network can be considered a nonlinear generalization of logistic regression. In this subsection, we first describe the essential components used in the network and then describe the entire architecture.

Recurrent Neural Network Background

A Recurrent Neural Network (RNN) is a network that maintains an internal state inherent to some sequence of events. It has proven to be efficient for discrete time series prediction [59]. One of the simplest examples of an RNN is a neural network with one hidden layer. Denote the sequences of input features and targets as $x(0), x(\Delta t), \dots, x((N-1)\Delta t)$ and $y(0), y(\Delta t), \dots, y((N-1)\Delta t)$, respectively, where $x(t) \in \mathbb{R}^n$ is the $n$-dimensional feature vector for the moment $t$, $y(t) \in \mathbb{R}$ is the corresponding target, $\Delta t$ is the time step used for discretization, and $N \in \mathbb{N}$ is the total number of steps.
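For reference, the logistic regression model described above reduces to a dot product followed by a sigmoid. The sketch below shows only this forward computation; the function name `predict_proba`, the plain-list inputs, and the toy weights are illustrative assumptions, and no training procedure from the paper is reproduced.

```python
import math


def predict_proba(x, w, b):
    """Probability that y(t) = 1 given features x: sigmoid(w . x + b).

    Hypothetical helper for illustration; x and w are equal-length sequences.
    The two branches avoid overflow in exp() for large |w . x + b|.
    """
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Toy example with two features: 0.5 * 1.0 + (-0.25) * 2.0 + 0.0 = 0.0.
p_neutral = predict_proba([1.0, 2.0], [0.5, -0.25], 0.0)
# A larger weighted sum gives a probability above 0.5.
p_high = predict_proba([1.0, 2.0], [0.5, 0.25], 0.0)
```

Since the features enter only through the weighted sum, swapping two inputs with equal weights leaves the prediction unchanged, which illustrates why the model captures only linear dependencies.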
At each moment $t$, the recurrent network has an $m$-dimensional hidden state vector $h(t) \in \mathbb{R}^m$, which is calculated using the current input $x(t)$ and the previous hidden state $h(t - \Delta t)$:

$$h(t) = \phi\big(W x(t) + U h(t - \Delta t) + b\big),$$

where $W \in \mathbb{R}^{m \times n}$, $U \in \mathbb{R}^{m \times m}$, and $b \in \mathbb{R}^m$ are the learnable matrices and the bias vector, and $\phi$ is an element-wise nonlinearity. The intuition behind using the RNN architecture is to consider the hidden state of the network as the current state of the player represented by the data from sensors. The state is a vector with many components, and some of them can indicate how well the person will play. The final prediction $\hat{y}_\tau(t)$ at the moment $t$ is calculated by a feed-forward network consisting of one or more linear layers:

$$\hat{y}_\tau(t) = f\big(h(t)\big),$$

where $f: \mathbb{R}^m \to \mathbb{R}$ is the function corresponding to the feed-forward network. We used a sigmoid function as the final activation of the network to ensure $\hat{y}_\tau(t) \in [0, 1]$, so $\hat{y}_\tau(t)$ can be interpreted as a probability.

More advanced modifications of the recurrent layer include the Gated Recurrent Unit (GRU) [62] and Long Short-Term Memory (LSTM) [63]. Both of them utilize a gating mechanism to better control the flow of information. The LSTM architecture incorporates input, output, and forget gates and a memory cell, while the simpler GRU architecture uses only update and reset gates. We found that GRU performs better in our task, so we formally define it as follows:

$$z(t) = \sigma\big(W_z x(t) + U_z h(t - \Delta t) + b_z\big),$$
$$r(t) = \sigma\big(W_r x(t) + U_r h(t - \Delta t) + b_r\big),$$
$$\tilde{h}(t) = \tanh\big(W_h x(t) + U_h (r(t) \odot h(t - \Delta t)) + b_h\big),$$
$$h(t) = (1 - z(t)) \odot h(t - \Delta t) + z(t) \odot \tilde{h}(t),$$

where $z(t)$ and $r(t)$ are the update and reset gates, $\tilde{h}(t)$ is the candidate hidden state, $\sigma$ is the sigmoid function, and $\odot$ denotes element-wise multiplication.

A popular technique for improving the network quality and interpretability is an attention mechanism. Temporal attention can help emphasize the relevant hidden states from the past. Input attention helps to select the essential input features. It is also possible to combine both of them [64]. Since the proposed GRU model uses only one previous hidden state for prediction, it makes no sense to use temporal attention. However, input attention can be used.
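A single GRU update of the kind discussed above can be sketched in NumPy as follows. The sketch follows one common convention for the GRU equations (new state = (1 - z) * previous state + z * candidate); the parameter names, the random initialization, and the helper functions `gru_step` and `init_params` are assumptions made for illustration and do not reproduce the authors' implementation or trained weights. The sizes n = 15 and m = 8 are taken from the text.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def gru_step(x, h_prev, params):
    """One GRU update: returns h(t) from input x(t) and h(t - dt)."""
    # Update gate: how much of the candidate state replaces the old state.
    z = sigmoid(params["W_z"] @ x + params["U_z"] @ h_prev + params["b_z"])
    # Reset gate: how much of the old state feeds the candidate.
    r = sigmoid(params["W_r"] @ x + params["U_r"] @ h_prev + params["b_r"])
    h_cand = np.tanh(
        params["W_h"] @ x + params["U_h"] @ (r * h_prev) + params["b_h"]
    )
    return (1.0 - z) * h_prev + z * h_cand


def init_params(n, m, seed=0):
    """Small random weights; a hypothetical initialization for the demo."""
    rng = np.random.default_rng(seed)
    params = {}
    for gate in ("z", "r", "h"):
        params[f"W_{gate}"] = rng.normal(scale=0.1, size=(m, n))
        params[f"U_{gate}"] = rng.normal(scale=0.1, size=(m, m))
        params[f"b_{gate}"] = np.zeros(m)
    return params


n, m = 15, 8  # feature and hidden sizes mentioned in the text
params = init_params(n, m)
h = np.zeros(m)
# Run the cell over five synthetic feature vectors.
for x in np.random.default_rng(1).normal(size=(5, n)):
    h = gru_step(x, h, params)
```

The hidden state `h` plays the role of the player state carried from one time step to the next; a feed-forward head with a sigmoid output would then map it to the predicted probability.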
The attention layer provides the weight vector $\alpha(t) \in \mathbb{R}^n$, which is applied to a vector $x(t) \in \mathbb{R}^n$ by element-wise multiplication:

$$\tilde{x}(t) = \alpha(t) \odot x(t).$$

This operation emphasizes the important components of the vector $x(t)$ while decreasing the contribution of its non-relevant components. Typically, the components of $\alpha(t)$ are bounded by 0 and 1 and are produced by another linear layer integrated into the network. In order to consider both the current and the previous data from sensors, the attention layer takes as input the current measurements $x(t)$ and the hidden state $h(t)$ produced by the GRU; its trainable parameters are $W_a \in \mathbb{R}^{(m+n) \times m}$ and $b_a \in \mathbb{R}^m$.

The intuition behind the input attention mechanism is feature selection. Based on the current input and hidden state, it can help ignore the uninformative features at each moment of time and keep the relevant features unaltered. The ability to provide time-dependent feature importance is a significant advantage compared to other methods, which can provide feature importance either for one particular moment or only for the whole time series.

The neural network architecture is shown in Figure 8. First, the sensor data are processed by the dense layers for each feature group. Then the attention block is applied to amplify the signal from the features relevant at the moment. The resulting vector goes through the GRU cell to update the hidden state $h(t)$. This hidden state is saved for further iterations and goes through a feed-forward network to form the final prediction $\hat{y}_\tau(t)$. The total inference time is about 5 ms on a CPU.

The network was trained with the truncated backpropagation through time technique [65] designed for RNNs and the Adam optimizer [66] with the learning rate warmup technique [67] to improve the convergence. For the attention and feed-forward networks, we used 1 and 2 linear layers, respectively, with the ReLU nonlinearity. This activation function can improve the convergence and numerical stability [68].
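The input attention described above can be sketched as a single gated linear layer. This is a hedged illustration, not the authors' implementation: it assumes a sigmoid gate (one way to keep the weights between 0 and 1), concatenates the input with a hidden state, and uses a weight matrix of shape `(n, n + m)` so that the output has one weight per input component. These shapes are chosen for this example and differ from the parameter dimensions quoted in the text. The function name `input_attention` is hypothetical.

```python
import numpy as np


def input_attention(x, h, W_a, b_a):
    """Return (alpha, x_tilde) with x_tilde = alpha * x element-wise.

    x   : input vector, shape (n,)
    h   : hidden state available to the attention layer, shape (m,)
    W_a : assumed shape (n, n + m); b_a : assumed shape (n,)
    """
    concat = np.concatenate([x, h])
    # Sigmoid keeps every attention weight strictly between 0 and 1.
    alpha = 1.0 / (1.0 + np.exp(-(W_a @ concat + b_a)))
    return alpha, alpha * x


rng = np.random.default_rng(0)
n, m = 3, 8  # three inputs (one per feature group in this sketch), hidden size 8
x = rng.normal(size=n)
h = rng.normal(size=m)
W_a = rng.normal(scale=0.1, size=(n, n + m))
b_a = np.zeros(n)
alpha, x_tilde = input_attention(x, h, W_a, b_a)
```

Recording `alpha` at every time step is what yields the time-dependent feature importance: components with weights near 0 are effectively ignored at that moment.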
To validate whether the attention mechanism helps improve network performance, we also trained another network without the attention block. The GRU cell is a crucial part of the network architecture. It helps accumulate information about previous player states, so the network can use the retrospective context for prediction. This information is stored in the hidden layer of the GRU cell. According to the experiments, 8 neurons in the hidden layer work best for our problem. Too few neurons caused low predictive power, while too many led to overfitting.

The motivation for using separate linear layers for the three feature groups is to combine more complex features from the sensor data and to preserve the disentangled feature representation for the attention layer. This is an analog of grouped convolutions [69] in convolutional networks. For the same reasons, we applied the attention to each feature group separately, thus having a 3-dimensional attention vector at each moment of time.

G. Validation

1) Training and Evaluation Process: In order to correctly estimate the generalization capabilities of the classical machine learning algorithms, we used repeated cross-validation [70]. In particular, we randomly split the 21 players into train and test groups of 16 and 5 players, respectively. We then trained the algorithms, and repeated this procedure 100 times, which lowers the variance of the evaluation. For the neural networks, we also used a validation set, so we randomly split the players into train, validation, and test sets with 11, 5, and 5 players, respectively.
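The player-level splitting described above can be sketched as follows. The key property is that each player belongs to exactly one group in a given split, so test players are never seen during training. The function names and the integer player identifiers are hypothetical; the group sizes and repetition counts follow the text.

```python
import random


def split_players(player_ids, sizes, rng):
    """Shuffle the players and cut them into consecutive disjoint groups."""
    if sum(sizes) != len(player_ids):
        raise ValueError("group sizes must add up to the number of players")
    shuffled = list(player_ids)
    rng.shuffle(shuffled)
    groups, start = [], 0
    for size in sizes:
        groups.append(shuffled[start:start + size])
        start += size
    return groups


def repeated_splits(player_ids, sizes, repeats, seed=0):
    """Generate `repeats` independent random player-level splits."""
    rng = random.Random(seed)
    return [split_players(player_ids, sizes, rng) for _ in range(repeats)]


players = list(range(21))  # placeholder identifiers for the 21 players
# Classical algorithms: 16 train / 5 test players, repeated 100 times.
classical = repeated_splits(players, (16, 5), repeats=100)
# Neural networks: 11 train / 5 validation / 5 test players.
neural = repeated_splits(players, (11, 5, 5), repeats=15)
```

Splitting by player rather than by time step is what allows the evaluation to measure generalization to a person whose data were not used in training.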
The neural network was trained until the error on the validation set had not improved for 5 epochs (early stopping procedure). One training epoch comprises 20 batches. Each batch consists of the input features and targets for all the time steps of a randomly selected player from the train set. To minimize the randomness in the evaluation results, each network was trained 15 times with random weight initializations and random train/validation/test splits. In both cases, the input features for the train, validation, and test sets are normalized using the mean values and standard deviations calculated on the train set.

2) Evaluation Metric: By construction, the target is balanced: 50.1% of the samples belong to the positive class and 49.9% to the negative class. A common metric for classification evaluation is the area under the receiver operating characteristic curve (ROC AUC) [71]. It ranges from 0 to 1, with a score of 0.5 corresponding to random guessing; higher values are better. We first calculated the ROC AUC score for each individual participant in the train, validation, or test set, and then averaged the results. This procedure is appropriate because it estimates how well the model separates the high- and low-performance conditions within an individual participant. A metric calculated on the pooled predictions for all the participants would also benefit from separating the participants from one another.

We have trained the predictive models for several combinations of time steps $\Delta t$ (see Section III-D1) and forecasting horizons $\tau$ (see Section III-E). According to our experiments, reasonable ranges are from 5 to 30 s for the time step and from 60 to 300 s for the forecasting horizon. The average results for the neural network with attention are shown in Table III. According to Table III, the best time step value is about 20 s, and the best forecasting horizon for the model is from 3 to 5 minutes.
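The difference between the per-participant ROC AUC and a pooled ROC AUC can be shown with a small sketch. ROC AUC is computed here from its pairwise definition (the fraction of positive/negative pairs ranked correctly, with ties counted as one half), which is practical only for small inputs. The function names and the toy data are invented for this illustration and are not results from the study.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a positive outranks a negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_per_participant_auc(data):
    """Average the ROC AUC computed separately for each participant."""
    aucs = [roc_auc(labels, scores) for labels, scores in data.values()]
    return sum(aucs) / len(aucs)


# Toy data: participant "A" mostly performs poorly and receives low scores,
# participant "B" mostly performs well and receives high scores.
toy = {
    "A": ([0, 0, 0, 1], [0.10, 0.20, 0.30, 0.15]),
    "B": ([1, 1, 1, 0], [0.80, 0.90, 0.70, 0.85]),
}
per_participant = mean_per_participant_auc(toy)
pooled = roc_auc(
    [y for labels, _ in toy.values() for y in labels],
    [s for _, scores in toy.values() for s in scores],
)
```

In this toy case the per-participant average is 1/3 while the pooled score is 0.6875: the pooled metric is inflated simply because the model tells the two participants apart, even though it ranks moments within each participant poorly.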
In other words, the optimal way to predict the player behavior is to aggregate the sensor data every 20 s and make a prediction for the next 3-5 minutes. We have compared the performance of the algorithms described in Section III-F with respect to the time step values, with the forecasting horizon $\tau$ fixed at 180 s. The results are shown in Figure 9. There is a clear peak near 20-25 s for all the methods. The neural network consistently outperforms the logistic regression and the baseline model, and the attention block increases the model score. Figure 10 demonstrates the relation between the forecasting horizon and the performance of the algorithms with the time step $\Delta t$ equal to 20 s. The neural network outperforms the other methods and achieves its maximum performance for forecasting horizons in the range from 3 to 5 minutes. Again, the attention block improves the model.

In order to interpret the neural network predictions, we calculated the feature importances and visualized the predictions of a pretrained network and its internal state for the discretization step equal to 20 s and the forecasting horizon equal to 3 minutes. Figure 11 shows how the attention, the network hidden state, the target, and the network prediction change over time. The importance of different features clearly varies over time, and periodically the data from some sensors are not relevant. The network hidden state, which can be treated as the player state, also varies during the game.

To calculate the feature importance, we trained 100 instances of the neural network with random weight initializations and random train/validation/test splits, and averaged the attention weights on the test set for the best epoch of each neural network. Afterwards, we averaged the results across all networks. The results are shown in Table IV and Figure 12. Information about physiological activity, such as heart rate, muscle activity, and hand movement, is the most relevant for the network.
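The two-stage averaging used for feature importance (first over the test time steps of each trained network, then across networks) can be sketched as below. The function name and the numbers are invented for illustration; each attention vector is assumed to have one component per feature group.

```python
def mean_attention(runs):
    """Average attention vectors within each run, then across runs.

    runs: list of runs; each run is a non-empty list of attention
    vectors (one per test time step), all of the same length.
    """
    per_run = []
    for run in runs:
        k = len(run[0])
        # Stage 1: average over the time steps of this run.
        per_run.append([sum(step[i] for step in run) / len(run) for i in range(k)])
    k = len(per_run[0])
    # Stage 2: average the per-run means across all trained networks.
    return [sum(r[i] for r in per_run) / len(per_run) for i in range(k)]


# Two hypothetical runs with different numbers of test time steps.
runs = [
    [[0.2, 0.4, 0.6], [0.4, 0.6, 0.8]],
    [[0.6, 0.2, 0.4]],
]
importance = mean_attention(runs)
```

Averaging within each run first gives every trained network the same weight in the final importance, regardless of how many test time steps its split happened to contain.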
It is worth noting that all the feature groups have considerable importance and thus contribute to the overall prediction. Since the features in each feature group are mixed, we cannot estimate the feature importance for every raw feature.

Experimental results have shown the practical feasibility of predicting eSports players' behavior using only data from sensors. The system updates the prediction several times per minute, which is frequent enough for potential users such as eSports managers or professional players to notice that something is going wrong or to get quick feedback. Feedback from the algorithm about the performance prediction and feature importance may prompt users to change their gaming behavior. For example, an eSports manager may recognize in time that poor team results are connected with stuffy environmental conditions and adjust the air conditioning. Users may also find information from the system useful for making more radical decisions, e.g., changing a player or the gaming equipment (computer mouse, display, game chair, etc.). We have found that a 20 s time step and a 3-5 minute forecasting horizon are the most natural parameters for CS:GO, but potential users can set up other hyperparameters depending on their scenario. A significant advantage of the diversified training dataset is that the model applies across player skill levels, which opens it to a wide range of potential users. The negligible model inference time and data collection time on a PC demonstrate the feasibility in principle of deploying the model on edge devices, some of which might be designed specifically for neural network operations. However, retraining the model still requires high computational capability. A larger and more diversified dataset could improve the prediction performance.
The limitations of the study are the small number of participants involved, the fuzziness of the definition of a player's performance, and the overrepresentation of data from males and young people in the dataset. Future work includes more diverse and extensive data collection with more subjects recorded, and the investigation of better metrics of players' performance. These would allow researchers to use more complex machine learning methods and to develop more reliable and robust models.

In this article, we have reported on an AI-enabled system for predicting the performance of eSports players using only the data from heterogeneous sensors. The system consists of a number of sensors capable of recording players' physiological data, movements on the game chair, and environmental conditions. After data collection, we processed the data into time series with meaningful sensor features and a target extracted from the game events. The Recurrent Neural Network (RNN) demonstrated the best performance compared to the baseline and logistic regression. Applying the attention mechanism to the RNN helped to interpret the network predictions as well as to extract the feature importances. Our work showed the connection between the player performance and the data from sensors, as well as the possibility of building a real-time system for training and forecasting in eSports. We have also investigated potential applications of the proposed AI system in the eSports domain.

Given the growth of eSports activity due to the coronavirus pandemic in 2019-2021 and the rapid development of consumer wearable devices in recent years, this work shows the prospects of full-fledged research at the intersection of these two fields. Moreover, the model trained on the eSports domain can be transferred to other domains using domain adaptation methods to estimate a user's performance in a way similar to estimating in-game performance.
Considering the hundreds of millions of active gamers in the world and the wide adoption of wearables, crowdsourced data collection is a promising way to collect data on a global scale. We also see potential improvements to our system with computer vision methods. Emotion recognition and pose estimation techniques applied to data collected from a web camera can provide more information about the current state of a player.

Understanding esports as a stem career ready curriculum in the wild Global esports market report Recognizing emotional states with wearables while playing a serious game A survey on psycho-physiological analysis & measurement methods in multimodal systems An automated daily sports activities and gender recognition method based on novel multikernel local diamond pattern using sensor signals Correlation between heart rate, electrodermal activity and player experience in first-person shooter games Towards multimodal stress response modelling in competitive league of legends Visual fixations duration as an indicator of skill level in esports Predicting skill from gameplay input to a first-person shooter Esports athletes and players: a comparative study Understanding cyber athletes behaviour through a smart chair: Cs:go and monolith team scenario esports pro-players behavior during the game events: Statistical analysis of data obtained using the smart chair Sensor-based detection and classification of soccer goalkeeper training exercises On smart soccer ball as a head impact sensor Classification and visualization of skateboard tricks using wearable sensors Swimming stroke phase segmentation based on wearable motion capture technique Inertial sensor-based analysis of equestrian sports between beginner and professional riders under different horse gaits A measurement system to monitor postural behavior: Strategy assessment and classification rating Win prediction in esports: Mixed-rank match prediction in multi-player online battle arena games League of legends
match outcome prediction A random forest approach to identify metrics that best predict match outcome and player ranking in the esport rocket league Real-time esports match result prediction Predicting winning team and probabilistic ratings in "dota 2" and "counter-strike: Global offensive" video games Guns, swords and data: Clustering of player behavior in computer games in the wild (preprint) Classifying dota 2 hero characters based on play style and performance Classification of player roles in the team-based multi-player game dota 2 Toxicity detection in multiplayer online games Trouncing in dota 2: An investigation of blowout matches Cluster evaluation, description, and interpretation for serious games Wearable biometric performance measurement system for combat sports Volleyball skill assessment using a single wearable micro inertial measurement unit at wrist Towards a wearable device for skill assessment and skill acquisition of a tennis player during the first serve Investigating the translational and rotational motion of the swing using accelerometers for athlete skill assessment Climbax: skill assessment for climbing enthusiasts Mems sensor application for the motion analysis in sports science The mobile fitness coach: Towards individualized skill assessment using personalized mobile devices Comfortable and convenient turning skill assessment for alpine skiers using imu and plantar pressure distribution sensors Beyond activity recognition: skill assessment from accelerometer data Automatic surgical skill rating using stylistic behavior components Surgical task and skill classification from eye tracking and tool motion in minimally invasive surgery Hidden Markov Models: Theory and Applications. 
BoD-Books on Demand Measurement of the physical properties during laparoscopic surgery performed on pigs by using forceps with pressure sensors Development of a new skill acquisition tool and evaluation of mold-polishing skills Hierarchical organization of the coordinative structure of the skill of clay kneading Analysis of the movement variability in dance activities using wearable sensors Grove-emg detector Grove -gsr Heart rate monitor Mpu-9250 product specification Winsen Electronics Technology Co Pmod hygro reference manual The effect of repeated tense-release sequences on emg and self-report of muscle tension: An evaluation of jacobsonian and post-jacobsonian assumptions about progressive relaxation Influence of mental stress on heart rate and heart rate variability The three arousal model: Implications of gray's twofactor learning theory for heart rate, electrodermal activity, and psychopathy Associations of cognitive function scores with carbon dioxide, ventilation, and volatile organic compound exposures in office workers: a controlled exposure study of green and conventional office environments Changes in eeg signals during the cognitive activity at varying air temperature and relative humidity Effects of thermal discomfort in an office on perceived air quality, sbs symptoms, physiological responses, and human performance Discrete time recurrent neural network architectures: A unifying review An exploratory study of player and team performance in multiplayer firstperson-shooter games Logistic regression Gate-variants of gated recurrent unit (gru) neural networks Long short-term memory A dual-stage attention-based recurrent neural network for time series prediction On training recurrent networks with truncated backpropagation through time in speech recognition Adam: A method for stochastic optimization On the variance of the adaptive learning rate and beyond Analysis of function of rectified linear unit used in deep learning Grouped convolutional neural 
networks for multivariate time series Estimating classification error rate: Repeated crossvalidation, repeated hold-out and bootstrap Understanding receiver operating characteristic (roc) curves

The reported study was funded by RFBR according to the research project No. 18-29-22077. The authors would like to thank the professional eSports team DreamEaters for fruitful discussions and data collection.