key: cord-0071723-sjhm7n4v
authors: Ren, Jinyan
title: Pop Music Trend and Image Analysis Based on Big Data Technology
date: 2021-12-09
journal: Comput Intell Neurosci
DOI: 10.1155/2021/4700630
sha: 90d8ce7d546b0bd7e18af4bcbd897bc7f395b949
doc_id: 71723
cord_uid: sjhm7n4v

With people's growing pursuit of musical art, many singers have begun to analyze future music trends when creating musical works. Firstly, this study introduces the theory of pop music trend analysis, big data mining technology, and related algorithms. Then, the autoregressive integrated moving average (ARIMA), random forest, and long short-term memory (LSTM) algorithms are used to establish image analysis and prediction models, analyze the music data, and predict the music trend. The test results of the three models show that when a singer's songs are analyzed from three aspects, namely collection, download, and playback times, the LSTM model predicts the playback times well. However, the LSTM model also has some defects. For example, it cannot accurately predict some songs with large data fluctuations. At the same time, there is no large gap between the playback times predicted by the ARIMA model and the actual playback times, and the error fluctuation stays within the allowable range. A comprehensive analysis shows that, compared with the ARIMA and random forest algorithms, the LSTM algorithm predicts the music trend more accurately. The research results will help singers create songs according to current and future music trends and will also make traditional music creation more information-based and modern.

As an entertainment product, pop music has attracted increasing attention. According to relevant research, China's mobile music market developed rapidly from 2013 to 2018. In addition, the development of many types of popular music determines, to a certain extent, the main development direction of music in the future [1]. It reflects the influence of many social behaviors on pop music and the audience's preference for related music [2, 3]. By using image analysis to predict the development trend of pop music, collecting the resources in the music library, and integrating user behavior on different platforms, we can analyze user data and preferences, provide various pop music data sets, accurately analyze the specific attributes of music works, and accurately track dynamic pop music. The trend of user preferences determines the form of pop music [4]. There is little research worldwide on the image prediction of pop music trends. Alibaba Group launched the Alibaba music trend forecast in 2016. After approximately 7 years of development, Alibaba music has nearly 1 million analysis records and historical user behavior data. Later, the number of artists or songs to be played in the next step and the future dark horse will be excluded from the mainstream data. Multiple music platforms will mainly control the trend of pop music [5].

Data mining is a new discipline born in the 1980s. It is mainly oriented to artificial intelligence research for business applications. From a technical point of view, data mining is the process of obtaining implicit, previously undetected, and potentially valuable information and knowledge from large amounts of complex, irregular, random, and fuzzy data. It extracts, transforms, and analyzes potential laws and values from a huge database to obtain key information and useful knowledge that assist business decision-making. Today, more than 90% of the data on the internet has been generated within the past two years, and the amount of data generated every day is still rising rapidly. Against this background, the ability to receive and store massive data is not enough on its own. It is also necessary to process these data effectively to obtain the laws and patterns that can guide future behavior and improve the efficiency and effectiveness of enterprises, society, organizations, and institutions. Computers process data very quickly; however, mining laws from massive data is not a simple operation. Therefore, a good data mining algorithm is needed to complete the process of finding "gold in the sand," and various data mining algorithms have come into being. The current research aims to predict music trends using the LSTM (long short-term memory) algorithm and big data technology images and thereby help singers create songs according to the current and future trends of pop music. The innovation of this study is to select the most appropriate and accurate algorithm model through a comparative analysis of the ARIMA algorithm, random forest algorithm, and LSTM algorithm.

The results show that, compared with the traditional image prediction model, the LSTM algorithm has better prediction accuracy.

In the current era of big data, various industries try to move beyond traditional development modes by utilizing data. The application of big data in the music industry has gradually aroused the interest of numerous scholars.

Wang et al. proposed a CL-LDA (Latent Dirichlet Allocation) topic model that adapts well to the topic mining task of short texts with sparse semantics and a lack of co-occurrence information in OHCs (online health communities) [6]. Hervé et al. found that the retrieval of semantic memory and episodic memory was carried out by different neural networks. However, these results were basically obtained using verbal and visuospatial materials. They used common or uncommon melodies to explore the neural substrates underlying the semantic and episodic elements of music [7]. Jin et al. designed a smart neural network for music composition to automatically create specific genres of music. The model had an innovative structure that acquired the music sequence using an actor based on long short-term memory. Then, it determined the probability of the sequence through a procedure that used reward as feedback to improve the performance of music creation. Besides, rules of music theory were introduced to constrain the genre of the generated music [8]. Pelchat and Gelowitz fed spectrogram images generated from time slices of songs into a neural network to classify the songs into their respective musical styles [9]. Yan trained the network weights using a T-S-based cognitive neural network and an improved genetic algorithm, combining the momentum method and adaptive learning-rate adjustment with the membership function parameter adjustment strategy. Besides, a compensation factor correlated with the input dimension was introduced into the membership degree to address the rule explosion caused by very high input dimensions. The results indicated that the method was suitable for music recognition systems [10]. Meng and Chen selected two methods different from previous ones, Mel cepstral coefficients and the constant Q transform, to extract the features of music, and they used convolutional neural networks for training and recognition. They adopted the Mel cepstral coefficients to determine timbre and the constant Q transform to determine pitch. They finally found that the recognition success rate reached 95% after inputting the corresponding features into the neural network for training and learning [11]. Zhang et al. proposed an improved music separation method based on a discriminatively trained deep neural network (DNN) and presented an improved objective function for the discriminative training. Moreover, they added an additional layer to the DNN model and introduced time-frequency masking to optimize the estimated accompaniment of the song. They obtained the corresponding time-domain signal using the inverse Fourier transform. Finally, they verified the influence of different parameters on the separation performance and compared it with existing music separation methods. Their experimental results showed that the improved objective function and the introduction of time-frequency masking significantly improved the separation performance of the DNN, which was approximately 4 dB better than that of other existing music separation methods [12]. Dawson et al. stated that networks could be trained to solve musical problems and then studied to see how they encode musical properties.

They also reported very high correlations between the network connection weights and the discrete Fourier phase spaces used to represent the musical sets [13]. Dorfler et al. posed the question of whether replacing the mel-spectrogram with adaptive or learned filters applied directly to the raw data can improve learning success. The theoretical results showed that approximately reproducing the mel-spectrogram coefficients by applying adaptive filters and subsequently time-averaging the squared amplitudes was in principle possible. They also conducted extensive experimental work on the task of singing voice detection in music. The results showed that, for classification based on convolutional neural networks, the features obtained from adaptive filter banks followed by time-averaging of the squared modulus of the filters' output performed better than the canonical Fourier transform-based mel-spectrogram coefficients. They believed that the alternative adaptive approaches with center frequencies or time-averaging lengths learned from the training data performed equally well [14]. Liu et al. exploited the low-level information from audio spectrograms and developed a novel CNN architecture that took multiscale time-frequency information into consideration. The CNN architecture transferred more suitable semantic features to the decision-making layer to discriminate the genre of an unknown music clip. They conducted experiments on benchmark datasets, including GTZAN, Ballroom, and Extended Ballroom, which demonstrated the excellent performance of the architecture [15]. Zhou established a database of regional culture and music characteristic resources using data mining technology and classified the regional characteristic music and cultural resource data with an improved BP (Back Propagation) neural network model. They also constructed a set of databases including classification, search, audition, and storage to protect and spread the regional music characteristic cultural resources. At the same time, it also provides new ideas for cultural heritage [16]. The above research on big data and different neural networks in the field of music has promoted the maturity of these technologies. There are certain differences in the accuracy of music prediction models established by different types of neural networks and algorithms applied in the field of music.

Therefore, different algorithms are selected here to establish models that predict the music trend more accurately.

A group of people in society, driven by certain psychological needs, carry out certain music behavior in a certain period, causing a certain music genre to spread in a certain social background. This social phenomenon may lead to different degrees of social popularization and social fanaticism, which can be called music popularization. The music trend here is limited to a singer's specific distribution point in the future. Current analysis and research on music are mainly based on users' advice.

There are few studies in the field of pop music teaching, including neural networks, the Gaussian mixture model, and the support vector machine. The commonly used methods for music ending prediction include the SVM (support vector machine) and ANN (artificial neural network). These models generally have a limited learning ability, and high-dimensional kernel functions have low explanatory power. In particular, the radial basis function is sensitive to missing data [17], which makes it a poor choice for processing massive online music data. Using artificial neural networks to build a music trend prediction model requires a large experimental environment and a long time, with only a moderate prediction effect. Additionally, the artificial neural network needs various parameters that greatly influence the experimental results and add a complex workload [18]. Network music data are diverse, complex, high dimensional, and voluminous.

The existing music video model and the traditional statistical model often struggle to analyze online song data (downloads, play count, and collections) efficiently. Moreover, musicians have many difficulties in connecting with audiences. The results of the deep mining of music data are not ideal either. In recent years, many prediction models have been studied from the perspective of regression prediction and time series. Models based on the Random Forest algorithm have achieved prediction results with relatively high accuracy in many areas.

Therefore, following the regression prediction method, the ARIMA algorithm, Random Forest algorithm, and LSTM algorithm are selected to establish prediction models of the music trend from the perspective of time series prediction. Figure 1 shows the CRISP-DM (cross-industry standard process for data mining) reference model. In the process of data mining, there are two vital links, namely, model establishment and model evaluation, both of which belong to machine learning.

Machine learning is a process of computers' simulation of human learning behavior. As two foundations of data mining, machine learning provides technical methods for analyzing data, and the database technology realizes the data management of data mining. Figure 2 represents the nexus of database technology, data mining, and machine learning [19] .

There are three main types of machine learning: supervised learning, unsupervised learning, and semi-supervised learning.

Supervised learning involves a set of labeled data, enabling computers to use specific patterns to identify new samples of each labeled type. The two main types of supervised learning are classification and regression. The representative methods of supervised learning include the decision tree, Naive Bayes model, and support vector machine. In unsupervised learning, the data are unlabeled, like most data in the real world. Hence, unsupervised learning algorithms are particularly useful [20]. Unsupervised learning methods are divided into two categories: (1) direct methods based on probability density function estimation, which try to find the distribution parameters of each category in the feature space and then classify the samples; and (2) clustering methods based on a similarity measure between samples, which try to determine the cores or initial cores of the different categories and then aggregate the samples into categories according to the similarity between the samples and the cores. Using the clustering results, we can extract the hidden information in the data set and classify and predict future data. Clustering is applied to data mining, pattern recognition, image processing, etc. Clustering is mainly used to group the data according to their behavior or attributes, while dimensionality reduction reduces the number of variables in a data set. The most representative unsupervised learning method is K-Means. The role of semi-supervised learning is to classify data with or without labels using the classification function.

The most representative algorithm of semi-supervised learning is the expectation maximization algorithm [21].
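The similarity-based clustering principle described above (choose initial cores, assign each sample to its nearest core, then update the cores) can be sketched in a few lines. This is a minimal one-dimensional K-Means illustration rather than code from the study; the function name `kmeans_1d`, the toy play counts, and the initial centers are assumptions chosen for the example.

```python
def kmeans_1d(values, centers, iterations=10):
    """Cluster 1-D values around the given initial centers (illustrative only)."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: each sample joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda k: abs(v - centers[k]))
            clusters[nearest].append(v)
        # Update step: each center moves to the mean of its members;
        # an empty cluster keeps its previous center.
        centers = [
            sum(members) / len(members) if members else centers[k]
            for k, members in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical daily play counts (in thousands) for six songs.
plays = [1, 2, 3, 10, 11, 12]
centers, clusters = kmeans_1d(plays, centers=[0, 5])
print(centers)   # [2.0, 11.0]
print(clusters)  # [[1, 2, 3], [10, 11, 12]]
```

In this toy run the low-play and high-play songs separate after two iterations, which mirrors how clustering could group singers by behavior.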

Random forest refers to a classifier that uses multiple trees to train and predict samples. It integrates multiple trees using the idea of ensemble learning. Its basic unit is the decision tree, and in essence it belongs to a major branch of machine learning, the ensemble learning method.

There are two keywords in the name of random forest: one is "random" and the other is "forest." Machine learning tasks can usually be divided into the following categories: dimension reduction, clustering, regression, and classification. Regression is not only a kind of supervised learning technology but also a relatively comprehensive one. The main role of regression is to predict targets, such as future weather, future stock markets, or commodity prices. Compared with other methods, regression can achieve high prediction accuracy. Therefore, many studies use regression models to predict and analyze related issues. When establishing a regression model, the most appropriate method should be selected first. For example, the least squares method is suitable when the dimension of the surveyed data is relatively small [22]. The regression model is widely applied in a vast range of prediction fields, such as stock trends, economic trends, future product sales, and event risk prediction. Besides, the Random Forest algorithm also has good prediction performance and broad applicability. The Random Forest algorithm is essentially a supervised learning algorithm, namely an ensemble learning algorithm built from decision trees. It can be applied not only to classification problems but also to regression [23].
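To make the ensemble idea concrete, the sketch below averages many one-split regression trees ("stumps"), each fitted to a random resample of the data. It is a simplified stand-in for a random forest regressor (a full implementation would grow deeper trees and also sample features at random) and is not the model used in this study; all function names and the toy data are assumptions.

```python
import random
from statistics import fmean


def fit_stump(xs, ys):
    """Fit a one-split regression tree: (threshold, left_mean, right_mean)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # the largest value cannot split the data
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        left_mean, right_mean = fmean(left), fmean(right)
        sse = sum((y - left_mean) ** 2 for y in left)
        sse += sum((y - right_mean) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # every x in the resample is identical: predict the mean
        overall = fmean(ys)
        return (float("inf"), overall, overall)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_forest(xs, ys, n_trees=25, seed=0):
    """Train each stump on a bootstrap resample (the "random" part)."""
    rng = random.Random(seed)
    n = len(xs)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return forest


def predict_forest(forest, x):
    """Regression output: the average of all tree predictions (the "forest" part)."""
    return fmean([predict_stump(stump, x) for stump in forest])


# Hypothetical data: play counts jump from about 1 to about 9 after day 4.
days = [1, 2, 3, 4, 10, 11, 12, 13]
plays = [1, 1, 1, 1, 9, 9, 9, 9]
forest = fit_forest(days, plays)
print(predict_forest(forest, 1), predict_forest(forest, 13))
```

The two printed predictions should fall near the low and high play-count levels, respectively; the exact values depend on the random resamples.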

The decision tree is a tree structure in which each internal node represents a test on an attribute, each branch represents a test outcome, and each leaf node represents a category. The classification tree (decision tree) is a very common classification method and a kind of supervised learning. In supervised learning, a set of samples is given, each with a set of attributes and a category that is determined in advance; a classifier is then obtained by learning, which can assign the correct category to new objects. Figure 3 shows the decision tree, where the circles represent the root node and internal nodes, and the squares represent the leaf nodes. Three steps must be followed in establishing a decision tree. Firstly, select the features of the target object. Secondly, grow the tree. Thirdly, prune the tree. So far, the decision tree algorithm has been widely used in stock prediction, commodity price prediction, housing price forecasting, and the prediction of future economic trends, and it has achieved excellent results in many fields.

(1) Entropy: in essence, entropy measures the uncertainty of a random variable. Let X be a random variable that takes values in $\{x_1, x_2, \ldots, x_n\}$, and let $p_i$ be the probability that X equals $x_i$. Then, the entropy of the random variable X, denoted H(X), can be written as

$$H(X) = -\sum_{i=1}^{n} p_i \log p_i. \qquad (1)$$

Let D be the sample set, and let the random variable X represent the category of a sample in D. There are a total of K categories in D. Meanwhile, $|C_k|$ denotes the number of samples of category k, and $|D|$ is the total number of samples. Then, the probability of each category is $|C_k| / |D|$, and the entropy of the sample set D is expressed as

$$H(D) = -\sum_{k=1}^{K} \frac{|C_k|}{|D|} \log \frac{|C_k|}{|D|}. \qquad (2)$$

(2) Information gain: the information gain is the difference in entropy before and after the data set is divided by a feature. Entropy represents the uncertainty of a random variable: when the entropy increases, the uncertainty of the variable in the sample also increases. Therefore, the difference in entropy before and after the division can be used to judge how well the current feature divides the sample set D. The entropy of D before the division is fixed. The information gain $g(D, A)$ is equal to the difference between the entropy $H(D)$ of data set D before the division and the conditional entropy $H(D \mid A)$ of D after it is divided by feature A. The calculation of $g(D, A)$ is shown as

$$g(D, A) = H(D) - H(D \mid A). \qquad (3)$$

(3) Information gain ratio: the information gain ratio is, in essence, the product of a penalty parameter P and the information gain. The penalty parameter is the reciprocal of the entropy $H_A(D)$ of the data set D with feature A as the random variable; that is, the samples with the same value of feature A are divided into the same subset. In general, the more distinct values feature A takes, the smaller the penalty parameter. The penalty parameter can be calculated as follows:

$$P = \frac{1}{H_A(D)}, \qquad H_A(D) = -\sum_{i=1}^{n} \frac{|D_i|}{|D|} \log \frac{|D_i|}{|D|}, \qquad (4)$$

where $D_i$ is the subset of D whose samples take the i-th value of feature A and n is the number of distinct values of A.

Equation (5) is the calculation method of information gain ratio:

$$g_R(D, A) = \frac{g(D, A)}{H_A(D)}. \qquad (5)$$

In the above equation, $H_A(D)$ is the entropy obtained by taking the current feature A as the random variable for the sample set D, and $g(D, A)$ represents the information gain.
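The three quantities above can be computed directly from label counts. The following is a minimal sketch, not code from the study: the base-2 logarithm is an assumption (the text does not fix the base), and the function names and the tiny "hit/miss" data set are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy of a list of labels, in bits (base-2 logarithm assumed)."""
    total = len(labels)
    return sum(
        -(count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(feature_values, labels):
    """g(D, A): entropy of the labels minus their entropy after splitting on A."""
    total = len(labels)
    subsets = defaultdict(list)
    for value, label in zip(feature_values, labels):
        subsets[value].append(label)
    conditional = sum(len(s) / total * entropy(s) for s in subsets.values())
    return entropy(labels) - conditional


def gain_ratio(feature_values, labels):
    """Information gain scaled by the penalty 1 / H_A(D)."""
    split_entropy = entropy(feature_values)  # H_A(D): entropy of feature A itself
    if split_entropy == 0:  # feature takes a single value, so it cannot split D
        return 0.0
    return information_gain(feature_values, labels) / split_entropy


labels = ["hit", "hit", "miss", "miss"]
tempo = ["fast", "fast", "slow", "slow"]   # two values, separates the classes
song_id = ["s1", "s2", "s3", "s4"]         # four values, also separates them

print(entropy(labels))                    # 1.0
print(information_gain(tempo, labels))    # 1.0
print(information_gain(song_id, labels))  # 1.0
print(gain_ratio(tempo, labels))          # 1.0
print(gain_ratio(song_id, labels))        # 0.5
```

Both features have the same information gain, but the gain ratio halves the score of the identifier-like feature, which is the purpose of the penalty parameter.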

Suppose there is a data set in which some features $x_1, x_2, \ldots, x_p$ are used to predict the target variable y. In a relatively simple model, it is assumed that the target variable y is a linear combination of these features:

$$y \approx a_0 + a_1 x_1 + \cdots + a_p x_p. \qquad (6)$$
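For a single feature (p = 1), the coefficients of this linear model can be estimated with the least squares method in closed form. The sketch below is illustrative only; `fit_line` and the toy play counts are assumptions, not part of the study.

```python
def fit_line(xs, ys):
    """Least-squares estimates (a0, a1) for the model y ≈ a0 + a1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a1 = sxy / sxx
    a0 = mean_y - a1 * mean_x
    return a0, a1


weeks = [1, 2, 3, 4]
plays = [3, 5, 7, 9]          # hypothetical play counts, exactly 1 + 2 * week
a0, a1 = fit_line(weeks, plays)
print(a0, a1)                 # 1.0 2.0
print(a0 + a1 * 5)            # 11.0 (forecast for week 5)
```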

LSTM is an RNN (Recurrent Neural Network) structure that improves on the standard RNN [24]. LSTM itself is a component of a larger neural network rather than an independent network structure; it replaces the hidden layer units in the original network. LSTM can deal with data that have "sequence" properties, such as time series data like daily stock price trends and the time-domain waveforms of mechanical vibration signals. It can also process natural language data, whose sequences consist of ordered words [25]. All RNNs have the form of a chain of repeating neural network modules. In the standard RNN, this repeating module has only one very simple structure, such as a single tanh layer. LSTM has the same chain structure, but its repeating module is different: instead of a single neural network layer, there are four interacting layers, which interact in a very special way. LSTM is an artificial RNN architecture for deep learning [26].

LSTM can process not only a single data point (such as an image) but also a whole data sequence. The LSTM unit consists of a memory unit, an input gate, an output gate, and a forget gate. The memory unit can remember values over arbitrary time intervals, and the three gates control the flow of information entering or exiting the unit. LSTM is especially suitable for the classification, processing, and prediction of time series data because there may be lags of unknown duration between the important events in a time series. LSTM was developed to deal with the exploding and vanishing gradient problems that may occur when training traditional RNNs. Its relative insensitivity to gap length is the advantage of LSTM over the RNN, hidden Markov model, and other sequence learning methods in many applications. The first step in LSTM is to decide what information to discard from the cell state, which is done by the forget gate. The role of the input gate is to add new data to the cell state. The role of the output gate is to determine the value of the output. In the forget gate, $x_t$ represents the input, $s_{t-1}$ denotes the state memory unit of the previous step, and $h_{t-1}$ signifies the previous intermediate output. The vector retained in the updated state memory unit is determined by $x_t$ under the joint action of the sigmoid and tanh functions. Moreover, $s_t$ represents the state of the memory unit after the update, $o_t$ stands for the state of the output gate, and $h_t$ refers to the intermediate output.

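Then, equations (7)-(12) are obtained:

$$f_t = \sigma(W_{fx} x_t + W_{fh} h_{t-1} + b_f), \qquad (7)$$

$$i_t = \sigma(W_{ix} x_t + W_{ih} h_{t-1} + b_i), \qquad (8)$$

$$g_t = \Phi(W_{gx} x_t + W_{gh} h_{t-1} + b_g), \qquad (9)$$

$$o_t = \sigma(W_{ox} x_t + W_{oh} h_{t-1} + b_o), \qquad (10)$$

$$s_t = g_t \odot i_t + s_{t-1} \odot f_t, \qquad (11)$$

$$h_t = \Phi(s_t) \odot o_t. \qquad (12)$$

Among the above equations, $f_t$ is the state of the forget gate, $i_t$ represents the state of the input gate, and $g_t$ denotes the input node. Besides, $o_t$ refers to the state of the output gate, $s_t$ is the state memory unit, and $h_t$ represents the intermediate output. $W_{fx}$, $W_{ix}$, $W_{gx}$, and $W_{ox}$ are the weight matrices applied to the input $x_t$, and $W_{fh}$, $W_{ih}$, $W_{gh}$, and $W_{oh}$ are the weight matrices applied to the previous intermediate output $h_{t-1}$. Furthermore, $b_f$, $b_i$, $b_g$, and $b_o$ are the bias terms of the forget gate, input gate, input node, and output gate, respectively. Meanwhile, $\odot$ represents element-wise multiplication of vectors, $\sigma$ denotes the sigmoid function, and $\Phi$ represents the tanh function.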
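A single LSTM step built from the gates described above can be sketched for the scalar case (one input value and one hidden unit). This is an illustration in the standard LSTM form, not the network trained in this study; the function `lstm_step`, the parameter layout, and the zero-valued weights are all assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, s_prev, params):
    """One scalar LSTM step; params maps a gate name to (w_x, w_h, b)."""
    def pre_activation(name):
        w_x, w_h, b = params[name]
        return w_x * x_t + w_h * h_prev + b

    f_t = sigmoid(pre_activation("f"))    # forget gate: how much old state to keep
    i_t = sigmoid(pre_activation("i"))    # input gate: how much new content to admit
    g_t = math.tanh(pre_activation("g"))  # input node: candidate new content
    o_t = sigmoid(pre_activation("o"))    # output gate: how much state to expose
    s_t = g_t * i_t + s_prev * f_t        # updated state memory unit
    h_t = math.tanh(s_t) * o_t            # intermediate output
    return h_t, s_t


# Hypothetical parameters: all weights and biases set to zero.
params = {gate: (0.0, 0.0, 0.0) for gate in ("f", "i", "g", "o")}
h_t, s_t = lstm_step(x_t=1.0, h_prev=0.0, s_prev=2.0, params=params)
print(s_t)  # 1.0 (a forget gate of 0.5 keeps half of the old state)
print(h_t)
```

With zero weights every gate sits at 0.5, so the example isolates the role of the forget gate; a trained model would learn weights that open or close the gates depending on the input.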

ARIMA Model. In time series research, the ARIMA model is an improvement on the ARMA model [27-29]. Both models are suitable for processing time series data. The ARIMA model can achieve accurate prediction of data. When the obtained data are not stationary, the ARIMA model can stabilize them using first-order differencing [30-32].
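The differencing step can be illustrated as follows. The sketch shows only the first-order difference and its inverse, not a full ARIMA fit; the helper names and the play counts are assumptions made for the example.

```python
def difference(series):
    """First-order difference: d[t] = y[t+1] - y[t]."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


def undifference(first_value, diffs):
    """Invert the differencing to map differenced values back to the original scale."""
    restored = [first_value]
    for d in diffs:
        restored.append(restored[-1] + d)
    return restored


plays = [100, 120, 140, 160, 180]      # hypothetical trending play counts
diffs = difference(plays)
print(diffs)                           # [20, 20, 20, 20]
print(undifference(plays[0], diffs))   # [100, 120, 140, 160, 180]
```

The trending series becomes constant after one difference, which is the kind of stable series an ARMA-type model can then be fitted to; forecasts on the differenced scale are converted back by cumulative summation.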

(1) Distribution analysis: firstly, with data quality ensured, tools such as drawing, programming, modeling, and analysis software are used to visualize the song data and observe its distribution. The specific types and characteristics of the data can be obtained by analyzing the data distribution. If the obtained data are quantitative, a distribution histogram and frequency distribution table are established to present the data visually. Otherwise, bar charts, pie charts, and line charts are drawn to visualize the qualitative data.

(2) Comparative analysis: comparing the relevant indicators in terms of data amount can evaluate the prediction accuracy of the algorithm. Comparative analysis can also be conducted on data distribution, which is usually used for comparing time series and for horizontal or vertical comparisons between different indicators. For example, using horizontal and vertical comparative analysis of the play count, collection, and downloads of a song, singers can be divided into two types: exploding singers and stable singers.

(3) Statistical analysis: this method mainly analyzes data distribution, such as distribution shape analysis, dispersion degree detection, and concentration analysis. The basic statistics describing data are accordingly divided into three categories: distribution shape statistics, dispersion statistics, and central tendency statistics. The statistics and analysis of the data of every song are presented in the form of graphs to further study the distribution rules and data trends.
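The three categories of descriptive statistics named above can be computed with the Python standard library. This is a minimal sketch; the `describe` helper, the choice of population skewness as the shape statistic, and the sample values are assumptions, not the study's procedure.

```python
from statistics import mean, median, pstdev


def describe(values):
    """Central tendency, dispersion, and shape statistics for one series."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        skew = 0.0  # a constant series has no asymmetry
    else:
        # Population skewness: the average cubed deviation in units of sigma^3.
        skew = sum((v - mu) ** 3 for v in values) / len(values) / sigma ** 3
    return {"mean": mu, "median": median(values), "std": sigma, "skewness": skew}


plays = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical daily play counts
print(describe(plays))
```

A positive skewness, as in this toy series, indicates a few days with unusually high play counts, which is the pattern expected for "exploding" singers.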

The following three statistics are used to identify and analyze the data: collection, downloads, and play count. Figure 4 illustrates the statistics of collection, play count, and downloads of different types of singers. The LSTM model is used to compare the real and predicted values for three singers from mainland China, Hong Kong, and another country, as shown in Figure 5.

Comparing the real and predicted values for the three singers from different places shows that the prediction error for the singer from mainland China is relatively large, whereas the predictions for the other two singers are basically close to the actual values. Meanwhile, the predicted data fluctuate violently overall but eventually stabilize and come close to the real values. Figure 6 presents the statistics of collection, downloads, and play count of nine songs, including three Chinese songs, three Cantonese songs, and three English songs. Figure 7 shows the predicted and actual values of the play count of the nine songs.

When the nine songs are predicted from the collection, downloads, and play count, the model achieves a good prediction of the play count of the nine songs on the whole. However, the model cannot accurately predict the play count of songs with wide fluctuations. The prediction results of other aspects are basically close to the actual situation. Therefore, the time series prediction of the music trend by the LSTM algorithm shows a good effect.

Results of the ARIMA Model. Figure 8 represents the comparison of the play count of two songs of different singers.

According to Figure 8, the daily play count of the songs of the two singers is close to a stationary time series. The first-order difference operation is therefore performed on the play count data so that the original series, which still shows large fluctuations, becomes a relatively stable time series. Figure 9 illustrates the prediction of the play count of a song in the next two months from the perspectives of users and singers. Figure 9 shows that there is no obvious gap between the predicted and actual values of the play count of the song, and the error fluctuation of the predicted value is within the allowable range. Therefore, the ARIMA model can accurately predict the play count of songs.

In the experiment, the model predictions of the ARIMA algorithm, random forest algorithm, and LSTM algorithm are compared. Figure 10 illustrates the MAE (mean absolute error) and RMSE (root mean square error) of the prediction results. From October 1, 2020, to October 31, 2020, the test predicts the playback times of the nine songs using the ARIMA algorithm, random forest algorithm, and LSTM algorithm. According to Figure 10, the LSTM algorithm has better prediction performance for the play count of the songs of the nine artists than the ARIMA algorithm and Random Forest algorithm. Besides, compared with the other two algorithms, the LSTM algorithm reduces the RMSE and MAE from 0.072 and 0.045 to 0.045 and 0.032, respectively. Meanwhile, the error rate is reduced by 36.5% and 28.1% compared with the ARIMA algorithm and the Random Forest algorithm. Therefore, the LSTM algorithm can predict the music trend more accurately.
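The two error measures used in the comparison can be computed as follows. This is a minimal sketch with made-up numbers; the function names and the sample series are assumptions and do not reproduce the values in Figure 10.

```python
import math


def mae(actual, predicted):
    """Mean absolute error: the average size of the prediction misses."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error; penalizes large misses more heavily than MAE."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


actual = [1.0, 2.0, 3.0, 4.0]       # hypothetical normalized play counts
predicted = [1.0, 2.0, 3.0, 8.0]    # one large miss on the last day
print(mae(actual, predicted))   # 1.0
print(rmse(actual, predicted))  # 2.0
```

Because a single large miss doubles the RMSE relative to the MAE here, reporting both measures shows whether a model's errors are spread evenly or concentrated in a few badly predicted days, such as songs with large fluctuations.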

The prediction of the music trend is mainly achieved by LSTM and big data mining. The prediction models are established by the ARIMA algorithm, Random Forest algorithm, and LSTM algorithm to predict the data of songs and calculate the fluctuation of the time series. The results show that after analyzing the nine singers' collection, downloads, and play count, the LSTM model makes a good prediction of the play count of the nine singers' songs. However, there are also some defects in the LSTM model. For example, the model cannot predict accurately for some songs with large data fluctuations. The prediction results of other aspects are close to the actual situation.

Meanwhile, there is no large error between the play counts predicted by the ARIMA algorithm and the actual values, and the error fluctuation of the prediction is also within the allowable range. This indicates that the ARIMA algorithm can also meet the needs of prediction analysis. After comprehensive analysis, the LSTM algorithm gives the most accurate prediction of the music trend among the three algorithms. However, there are still some defects in this study. As songs usually have a relatively high periodicity and randomness and are greatly affected by external factors, the emergence of some film and television works can lead to a dramatic increase in the play count of music. Therefore, the feasibility error analysis of these aspects should be carried out in future research.

The data used to support the findings of this study are available upon request to the author.

The author declares that there are no conflicts of interest.

[1] Information and communication (computer) technologies in the musical art of pop: pedagogical aspect

[2] Mozart or pop music? Effects of background music on wine consumers

[3] A consideration of the code of computer music as writing, and some thinking on analytical theories

[4] Jordanian musiqa sha'abie: an expression of ethnical authenticity in the stream of global pop music

[5] Music computer in teaching the "listening to music" course

[6] Information needs mining of COVID-19 in Chinese online health communities

[7] Semantic and episodic memory of music are subserved by distinct neural networks

[8] A style-specific music composition neural network

[9] Neural network music genre classification

[10] Music recognition algorithm based on T-S cognitive neural network

[11] Automatic music transcription based on convolutional neural network, constant Q transform and MFCC

[12] Improving mispronunciation detection of Arabic words for non-native learners using deep convolutional neural network features

[13] Artificial neural networks solve musical problems with Fourier phase spaces

[14] Detection of Anolis carolinensis using drone images and a deep neural network: an effective tool for controlling invasive species

[15] Bottom-up broadcast neural network for music genre classification

[16] Database design of regional music characteristic culture resources based on improved neural network in data mining

[17] Music autotagging using scattering transform and convolutional neural network with self-attention

[18] Using principal paths to walk through music and visual art style spaces induced by convolutional neural networks

[19] Research on sports competition information management based on computer database technology

[20] Embedding the pillars of quality in health information technology solutions using "integrated patient journey mapping" (IPJM): case study

[21] Research on the integration of ideological and political elements in the course "database technology and application"

[22] Detection and classification of immature leukocytes for diagnosis of acute myeloid leukemia using random forest algorithm

[23] Fault diagnosis of spur gear system through decision tree algorithm using vibration signal

[24] Unsupervised monocular depth estimation of driving scenes using siamese convolutional LSTM networks

[25] Hourly day-ahead wind power forecasting with the EEMD-CSO-LSTM-EFG deep learning technique

[26] Effects of different feature parameters of sEMG on human motion pattern recognition using multilayer perceptrons and LSTM neural networks

[27] Clustered hybrid wind power prediction model based on ARMA, PSO-SVM and clustering methods

[28] Graph neural networks with convolutional ARMA filters

[29] A new fuzzy time series method based on an ARMA-type recurrent pi-sigma artificial neural network

[30] Recurrent neural network-adapted nonlinear ARMA-GARCH model with application to S & P 500 index data

[31] Estimation of a digitised Gaussian ARMA model by Monte Carlo expectation maximisation

[32] Estimating an individual's oxygen uptake during cycling exercise with a recurrent neural network trained from easy-to-obtain inputs: a pilot study

Acknowledgments. This work was supported by the Conservatory of Music Shanxi University.