key: cord-0031678-59t1uqru
authors: Li, JiaoLong
title: E-Commerce Fraud Detection Model by Computer Artificial Intelligence Data Mining
date: 2022-05-09
journal: Comput Intell Neurosci
DOI: 10.1155/2022/8783783
sha: e7249c36808fc1b1d06c02183d055351899cc69b
doc_id: 31678
cord_uid: 59t1uqru

This study aims to identify e-commerce fraud, address the financial risks of e-commerce enterprises through big data mining (BDM), explore more effective solutions through information fusion technology (IFT), and create an e-commerce fraud detection model (FDM) based on IFT (namely, computer technology (CT), artificial intelligence (AI), and data mining (DM)). Meanwhile, BDM technology, the support vector machine (SVM), the logistic regression model (LRM), and the proposed IFT-based FDM are compared to study e-commerce fraud risks in depth. Specifically, the LRM can effectively solve data classification problems, while the proposed IFT-based FDM fuses different information sources. The experimental findings corroborate that the proposed IFT-based FDM, oriented to Business-to-Business (B2B) e-commerce enterprises, presents significantly higher fraud identification accuracy than SVM and LRM. Therefore, the IFT-based FDM is superior to SVM and LRM; it can process and calculate e-commerce enterprises' financial risk data from different sources and obtain higher accuracy. BDM technology provides an important research method for e-commerce fraud identification. The proposed e-commerce enterprise-oriented FDM based on IFT can correctly analyze enterprises' financial status and credit status, obtaining the probability of fraudulent behaviors. The results are of great significance to B2B e-commerce fraud identification and provide good technical support for promoting the healthy development of e-commerce.

Thanks to increasingly mature computer technology (CT), fully fledged search engines, and the openness of government information, people's ability to obtain and interpret data has greatly increased. Various visual communication (VISCOM) media use visualization technology to spread information, thus enhancing their influence [1, 2]. With the introduction of artificial intelligence (AI) technology, e-commerce is booming. In particular, the e-commerce industry is taking customers to a new level of experience in a new form. AI technologies have shown great potential and brought recent changes to the e-commerce industry [3] [4] [5]. Millions of people's identification (ID) cards are stolen every year, but so far, there is no simple way to track down the thieves. A research team of foreign scholars has proposed a new fraud detection model (FDM) to trace fraudsters online within a few clicks of the mouse. Traditional lie detection includes face-to-face conversation and lie detectors that measure heart rate and skin electrical conduction. However, these methods cannot be applied remotely or to multiple people simultaneously. The new invention proposed by Italian researchers is a computer-based remote test method, which can identify fraud by measuring subjects' response time to true and false personal information. However, this method is limited: it requires the experimenters to know the truth before the test can be carried out [6] [7] [8] [9].

AI has three key elements: data mining (DM), natural language processing (NLP), and machine learning (ML), which together promote the rapid development of e-commerce companies [10]. AI enables machines to perform tasks that previously required manual operation, allowing decision-makers more time for business strategy [11]. The field of e-commerce has long been a key battlefield of black market (BM) fraud. According to iResearch consulting data, the transaction scale of China's online shopping market reached about 6 trillion RMB in 2017 alone and was expected to reach 7.5 trillion RMB in 2018. The huge transaction amount is accompanied by huge marketing and promotion expenses, and the BM is rampant in marketing and promotion [12] [13] [14]. Big data mining (BDM) and intelligent data mining (IDM) technology can help establish a large amount of data information [15, 16]. For example, the telemarketing robot has shown high efficiency, such as accurately counting and recording the data in the telemarketing process to classify customers more clearly. Meanwhile, accurate speech recognition (SR) and the intelligent call system find the potential customers of the enterprise one by one and record and save the call content during the outbound call. Then, they can apply customized customer classification rules, such as A/B/C/D, classify and export the intended customers (A/B customers), and follow up accurately, which is more conducive to the handover of results [17] [18] [19] [20]. Existing studies have not provided effective solutions for resolving the financial risk in e-commerce. This study focuses on e-commerce-oriented fraud risk assessment (FRA) and aims to address business financial risk through BDM, explore effective solutions through information fusion technology (IFT), and create an e-commerce-oriented FDM based on IFT (namely, CT, AI, and DM). The research content is of great significance for Business-to-Business (B2B) e-commerce FRA and provides good technical support for promoting the healthy development of e-commerce.

AI is a new technical science that studies and develops theories, methods, technologies, and application systems used to simulate, extend, and expand human intelligence. Moreover, AI is a branch of computer science that seeks to understand the essence of intelligence and produce a new kind of intelligent machine that can respond similarly to human intelligence. The realm of AI includes robotic technology (RT), language recognition (LR), image recognition (IR), natural language processing (NLP), and expert systems (ESs) [10, 21]. Since the dawn of AI, relevant theories and technologies have become increasingly mature, and the application field has also been expanding. AI-based products are envisioned as the "container" of human wisdom in the future. Additionally, AI can simulate the information processing of human consciousness and thinking; although it is not human intelligence, it can think like people and may exceed human intelligence. Figure 1 presents the applications of AI technology, unfolding its subdivided application development based on basic theories and data. Midstream enterprises (MEs) have three barriers (technology ecosystem, capital, and talents) and are becoming the core of the AI industry. Compared with the vast majority of upstream and downstream enterprises, MEs are more likely to focus on a specific domain and technology layer and expand to the upstream and downstream of the industrial chain.

This level includes machine learning (ML), platforms, and application technologies (computer vision (CV), speech recognition (SR), and natural language processing (NLP)). Also, recent years have witnessed China's extensive research and development efforts in vertical technologies, resulting in mature technologies and obvious competitive advantages in CV and SR. On the other hand, IFT can collect and integrate information from various sources, media, and formats to generate complete, accurate, timely, effective, and comprehensive information [22, 23]. Figure 2 gives the working principle of a multisource information fusion system (IFS).

In Figure 2, the light, humidity, temperature, and monitoring devices send the data collected within their monitoring range to the upper computer through the communication module, where the data are stored. The system embeds the video data into the environmental monitoring device (EMD) in real time and sends the data to the EMD. Usually, the IFS deploys multiple EMDs and sensor groups. If no clear target can be captured due to environmental factors, the nearest EMD that meets the requirement for a clear shot is called. Then, the captured target image is synthesized. In particular, target image synthesis extracts the target track and splices the images along that track to obtain a complete image.

In Figure 3, the data layer fuses the collected original data and integrates and analyzes them before sensor measurement preprocessing. It can carry out multisource image composition, image analysis and understanding, and direct synthesis of similar radar waveforms. In particular, AI text classification belongs to supervised learning and needs training with algorithms such as Bayesian classifiers, the support vector machine (SVM), and neural network algorithms (NNA). Figure 4 depicts the solution process of SVM [24, 25]. Accordingly, the SVM model can be trained and verified on the training set. Equation (1) gives the hyperplane classification equation:

$$ w^{T} x + b = 0 \qquad (1) $$

In equation (1), x, w, and b are the input vector, the weight vector, and the negative offset threshold, respectively. The optimal hyperplane equation can be assumed as the following:

$$ w_0^{T} x + b_0 = 0 \qquad (2) $$

In (2), $w_0$ and $b_0$ are the weight vector and offset of the optimal hyperplane, which is unique. The distance from any point x in the sample space to the optimal hyperplane is expressed by the following:

$$ r = \frac{w_0^{T} x + b_0}{\lVert w_0 \rVert} \qquad (3) $$

In (3), $w_0^{T} x + b_0$ is the projection of the data point x in the w direction, but the inner product of x and w contains the length of w. Thus, w can be transformed into a unit vector, and the relative distance from the point x to the decision surface can be obtained by dividing by the norm of w.
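The distance computation described above (the value of $w^{T} x + b$ divided by the norm of w) can be sketched in a few lines of Python. This is only an illustration: the function name `signed_distance` and the example weights are assumptions chosen for the sketch and do not come from the study.

```python
import math


def signed_distance(w, b, x):
    """Signed distance from point x to the hyperplane w.x + b = 0.

    Dividing by the norm of w removes the length of w from the inner
    product, which is the normalisation step described in the text.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    if norm == 0:
        raise ValueError("w must be a non-zero vector")
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


# Illustrative (made-up) hyperplane 3*x1 + 4*x2 - 5 = 0
w0, b0 = [3.0, 4.0], -5.0
print(signed_distance(w0, b0, [3.0, 4.0]))  # 4.0, on the positive side
print(signed_distance(w0, b0, [0.0, 0.0]))  # -1.0, on the negative side
```

The sign of the result indicates on which side of the decision surface the point lies, and its magnitude is the geometric distance.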

The hyperplane is obtained by solving the Lagrange "dual problem", in which a Lagrange multiplier (LM) is attached to each constraint condition, as manifested in the following:

Then, the partial derivatives with respect to w and b in (4) are calculated and set to 0 to get the final dual problem. $\alpha_i$ represents the LM variable. The final model can be obtained by calculating w and b, as displayed in the following:
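The displayed forms of (4) and (5) are not reproduced in this text. For orientation, the standard hard-margin SVM derivation, which matches the steps described above, is sketched below. The class label $y_i \in \{-1, +1\}$ and the sample count $N$ are notational assumptions that the text does not define, so these may differ in detail from the author's equations.

```latex
% (4) Lagrangian: one multiplier \alpha_i \ge 0 per constraint y_i (w^T x_i + b) \ge 1
L(w, b, \alpha) = \frac{1}{2}\lVert w \rVert^{2}
  - \sum_{i=1}^{N} \alpha_i \left[ y_i \left( w^{T} x_i + b \right) - 1 \right]

% Setting the partial derivatives with respect to w and b to zero
\frac{\partial L}{\partial w} = 0 \;\Rightarrow\; w = \sum_{i=1}^{N} \alpha_i y_i x_i ,
\qquad
\frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{N} \alpha_i y_i = 0

% Resulting dual problem
\max_{\alpha} \; \sum_{i=1}^{N} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j \, x_i^{T} x_j
\quad \text{s.t.} \quad \sum_{i=1}^{N} \alpha_i y_i = 0 , \;\; \alpha_i \ge 0

% (5) Final model after substituting w
f(x) = \sum_{i=1}^{N} \alpha_i y_i \, x_i^{T} x + b
```

Only samples with $\alpha_i > 0$ (the support vectors) contribute to the final model.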

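Logistic regression (LR) is a generalized linear regression analysis (LRA) model, often used to find risk factors for particular situations, predict the probability of certain conditions under different independent variables, and support classification judgments. Equation (6) demonstrates the logistic regression model (LRM):

$$ \operatorname{logit}(p) = \ln\frac{p}{1 - p} = \beta_0 + \beta_1 x_1 + \cdots + \beta_n x_n \qquad (6) $$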

In (6), $\beta_i$ represents the change of logit(p) corresponding to a unit change of the independent variable $x_i$, and p denotes the probability of the event.
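The interpretation of the coefficients can be checked numerically with a minimal sketch. The coefficient values and the function names `logit` and `predict_probability` are hypothetical and chosen only for illustration. The sketch assumes the usual linear form of the logit with an intercept.

```python
import math


def logit(p):
    """Log-odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))


def predict_probability(beta0, betas, xs):
    """Event probability under a logistic regression model.

    beta0 is the intercept and betas are the coefficients of the
    independent variables xs (all values here are illustrative).
    """
    z = beta0 + sum(b * x for b, x in zip(betas, xs))
    return 1.0 / (1.0 + math.exp(-z))


# Raising x_1 by one unit shifts logit(p) by beta_1 (0.5 in this example).
p_before = predict_probability(-1.0, [0.5, 2.0], [1.0, 0.0])
p_after = predict_probability(-1.0, [0.5, 2.0], [2.0, 0.0])
print(logit(p_after) - logit(p_before))
```

The printed difference equals the coefficient of the variable that was changed, up to floating-point rounding.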

DM is the process of extracting hidden, unknown, but potentially useful information and knowledge from large amounts of incomplete, noisy, fuzzy, and random data. Many terms similar to DM exist, such as knowledge discovery in databases (KDD), data analysis, data fusion, and decision support (DS). The original data can be structured, such as data in a relational database (DB), or semistructured, such as text, graphics, and image data, or even heterogeneous data distributed on the network. The method of discovering knowledge can be mathematical or nonmathematical, deductive or inductive [26] [27] [28].

The discovered knowledge can be used for information management, query optimization, decision support, and process control, among others. It can also be used to maintain the data itself. Therefore, DM is a broad interdisciplinary subject that brings together researchers in different fields, especially scholars and engineers in DB, AI, mathematical statistics, visualization, parallel computing, etc. Notably, ML is a crucial but dependent DM approach, and the two complement each other. Figure 5 describes the relationship between DM and ML.

As illustrated in Figure 5 , DM mainly uses ML technologies to analyze massive amounts of data and uses DB technology to manage these data. Table 1 lists the technical features of DM.

As tabulated in Table 1, DM technology runs on a large amount of data and obtains useful results. Manual analysis can summarize small amounts of data, but these mostly cannot reflect the general characteristics of the real world. Thus, DM technology is used for complex data. Concretely, implicit DM discovers the deeper knowledge under the surface. The results of mining must bring direct or indirect benefits to the enterprise. Yet, in some DM projects for enterprises, DM might have little or no effect due to the lack of clear business objectives, insufficient data quality, managers' resistance to changing business processes, or inexperienced DM personnel. The architecture of DM is shown in Figure 6. The process in Figure 6 first defines the goal according to the actual situation of the problem and the real needs of users and then determines what data need to be collected and collects them. Secondly, it tests the data quality, draws charts, and calculates data features to understand the sample as fully as possible. Afterward, it preprocesses the data to obtain structured data that meet the model requirements. Finally, the data are mined and analyzed.

It has been believed that future e-commerce will improve data volume and automation in FRA. Table 2 enumerates the causes of e-commerce fraud.

Most businesses in Table 1 lack targeted treatment for mobile transactions and do not assess the fraud risk of these transactions differently. Merchants do not effectively share data with their FRA teams, although those with more information can make better decisions. Social media is widely used in manual audits, and it is also an area with great potential [29]. Table 3 lists the features and manifestations of e-commerce fraud.

transaction result data will be stored rather than the transaction processes. Thus, the accuracy of FRA is not high. On the other hand, logs contain no important network messages that FRA can reference, and the business system needs to be modified to unify the log format and content. Lastly, data acquisition in the buried point mode incurs additional network bandwidth and application performance costs, and it is not easy to ensure information security.

Aggravating the fraud of dishonest users
A priori nature of online products: consumers cannot test empirical products in a good way
Diversity of network products: network product quality asymmetry
The subjectivity of product utility evaluation: evaluation information asymmetry
Variability of online product content: the online trading platform is challenging to manage and aggravates the difficulty of consumers comparing product information

In Figure 7, the proposed e-commerce FDM first collects the original network data, which are sent to the data analysis and processing step to obtain the user transaction information and user behavior information.

Then, it sends the user transaction information and behavior information to the FDM matching step. The rule matching engine (RME) is combined with the IFT-based FDM, and the matching result is output to the fraud behavior judgment step. Finally, the output matching results are judged to identify specific fraud behaviors.
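The flow described above (transaction and behavior information, then rule matching combined with a model score, then a fraud judgment) can be illustrated with a toy sketch. Everything in it is hypothetical: the `Event` fields, the three rules, the placeholder scoring weights, and the way rule hits are added to the model score are assumptions made for illustration and are not the study's actual RME or IFT-based FDM.

```python
import math
from dataclasses import dataclass


@dataclass
class Event:
    amount: float          # user transaction information (hypothetical field)
    orders_last_hour: int  # user behavior information (hypothetical field)
    new_device: bool       # user behavior information (hypothetical field)


# Hypothetical rule set for the rule matching step: (name, predicate)
RULES = [
    ("large_amount", lambda e: e.amount > 10_000),
    ("order_burst", lambda e: e.orders_last_hour > 20),
    ("new_device", lambda e: e.new_device),
]


def match_rules(event):
    """Return the names of all rules that the event triggers."""
    return [name for name, predicate in RULES if predicate(event)]


def model_score(event):
    """Stand-in for a trained model's fraud probability (made-up weights)."""
    z = (-3.0 + 0.0002 * event.amount + 0.1 * event.orders_last_hour
         + (1.0 if event.new_device else 0.0))
    return 1.0 / (1.0 + math.exp(-z))


def judge(event, threshold=0.5, rule_weight=0.2):
    """Combine rule hits with the model score and make the fraud judgment."""
    hits = match_rules(event)
    score = min(1.0, model_score(event) + rule_weight * len(hits))
    return {"fraud": score >= threshold, "score": score, "rules": hits}


print(judge(Event(amount=50.0, orders_last_hour=1, new_device=False)))
print(judge(Event(amount=20_000.0, orders_last_hour=30, new_device=True)))
```

Adding a fixed weight per rule hit is only one simple way to combine the two signals; a real system would calibrate this combination on labelled data.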

This section collects 30,000 e-commerce behavior samples, divided into nonfraud and fraud samples according to their fraud attributes. The proposed e-commerce FDM is trained with different algorithms on the training samples, and DM test samples are used to test the model performance. Figure 8 analyzes the sample accuracy of the e-commerce FDM.

As revealed in Figure 8, the accuracy of the proposed e-commerce FDM increases with the sample size. The accuracy on the test samples (84.10%) is significantly higher than that on the training samples (75.20%). The study takes the 75.20% accuracy obtained on the training data as indicating that the proposed e-commerce model has high accuracy.

According to the test samples, Figure 9 exhibits the classification effect of the proposed IFT-based FDM on 1,000 e-commerce fraud samples.

As can be seen from the classification of e-commerce fraud samples in Figure 9, across the increasing sample sizes, the average fraud identification accuracy and fraud coverage are 89.41% and 86.95%, respectively. When the sample size is 900, the coverage of e-commerce fraud and the accuracy of fraud identification are 100% and 94.90%, respectively. Figure 10 analyzes the classification effect of e-commerce fraud samples under different model methods. It corroborates that the classification accuracy of the proposed IFT-based e-commerce FDM is significantly higher than that of the SVM model and LRM. Hence, the proposed IFT-based e-commerce FDM is better than the SVM model and LRM; it can process and calculate the enterprises' possible e-commerce fraud risk data from different sources and obtain higher accuracy. Importantly, BDM technology provides an important research method for enterprise e-commerce fraud.
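The text does not define how fraud identification accuracy and fraud coverage are computed. A common reading, assumed here and not confirmed by the study, is that accuracy corresponds to precision on the fraud class and coverage corresponds to recall. Under that assumption, the two quantities can be sketched as follows; the labels are made up, and `fraud_metrics` is a hypothetical helper name.

```python
def fraud_metrics(y_true, y_pred):
    """Precision and recall on the fraud class (label 1).

    Assumption: "fraud identification accuracy" is read as precision and
    "fraud coverage" as recall; the study may define them differently.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # fraud, flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # nonfraud, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # fraud, missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up labels: 1 = fraud, 0 = nonfraud
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 1, 0, 1, 0, 1]
print(fraud_metrics(y_true, y_pred))  # (0.75, 0.75)
```

A coverage of 100% under this reading would mean that no fraud sample was missed, while the accuracy figure reflects how many flagged samples were truly fraudulent.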

This paper aims to study e-commerce fraud identification, solve the B2B e-commerce enterprises' financial risk through BDM, explore more effective solutions through IFT, and create an e-commerce-oriented FDM based on IFT (CT, AI, and DM). Firstly, according to the fraud attributes, samples are divided into nonfraud and fraud samples.

Then, different algorithms are used to train samples, and DM is used to test the accuracy of samples. The experiment finds that the accuracy of the proposed e-commerce FDM increases with the sample size. The accuracy on the test samples (84.10%) is significantly higher than that on the training samples (75.20%). Meanwhile, the average fraud identification accuracy and fraud coverage are 89.41% and 86.95%, respectively. The classification accuracy of the proposed e-commerce enterprises-oriented IFT-based FDM is significantly higher than that of the SVM and LRM. Thus, the proposed e-commerce FDM based on IFT can correctly analyze the financial situation of businesses, reflect the credit status of companies, and obtain the probability of business fraud. Such research findings are of great significance for B2B e-commerce fraud identification and provide good technical support for promoting the healthy development of e-commerce. However, given the diversity of e-commerce fraud, effective verification of the model and accurate identification of fraudulent users still need to be improved in all aspects of technology.

The data used to support the findings of this study are available from the author upon request.

The author declares no conflicts of interest.

Imagining big data: illustrations of "big data

Artifact-based rendering: harnessing natural and traditional visual media for more expressive and engaging 3D visualizations

Consumer perception towards artificial intelligence in E-commerce with reference to Chennai city, India

How to make e-commerce more successful by use of Kano's model to assess customer satisfaction in terms of sustainable development

A B2B flexible pricing decision support system for managing the request for quotation process under e-commerce business environment

A helpful method to extract features using analyzing social network for fraud detection

Nontargeted analytical methods as a powerful tool for the authentication of spices and herbs: a review

Online worker fraud and evolving threats to the integrity of MTurk data: a discussion of virtual private servers and the limitations of IP-based screening procedures

Comparison and analysis of logistic regression, Naïve Bayes and KNN machine learning algorithms for credit card fraud detection

Investigating the impacting factors for the healthcare professionals to adopt artificial intelligence-based medical diagnosis support system (AIMDSS)

Unsupervised by any other name: hidden layers of knowledge production in artificial intelligence on social media

The analysis on Chinese e-commerce tax losses based on the perspective of information asymmetry

Sustainable online shopping logistics for customer satisfaction and repeat purchasing behavior: evidence from China

The relationship among product risk, perceived satisfaction and purchase intentions for online shopping

The perception difference analysis of the influence of coastal residents of big data mining technology on marine tourism development

Transforming big data into smart data: an insight on the use of the k-nearest neighbors algorithm to obtain quality data

Development of AI-based real time agent advisor system on call center-focused on N bank call center

Chatbot design method using hybrid word vector expression model based on real telemarketing data

A call center system based on Expert systems for the acquisition of agricultural knowledge transferred from text-to-speech in China

Gender classification based on the non-lexical cues of emergency calls with recurrent neural networks (RNN)

Algorithmic decision-making? The user interface and its role for human involvement in decisions supported by artificial intelligence

Distributed Kalman filtering and control through embedded average consensus information fusion

Weighted assignment fusion algorithm of evidence conflict based on Euclidean distance and weighting strategy, and application in the wind turbine system

Wavelet packet energy-based damage identification of wood utility poles using support vector machine multi-classifier and evidence theory

GIS-based landslide susceptibility mapping using hybrid integration approaches of fractal dimension with index of entropy and support vector machine

Diversity of acupuncture point selections according to the acupuncture styles and their relations to theoretical elements in traditional asian medicine: a data-mining-based literature study

Recent developments in the Inorganic Crystal Structure Database: theoretical crystal structure data and related features

COVID-19 pandemic in the new era of big data analytics: methodological innovations and future research directions

An individual-group-merchant relation model for identifying fake online reviews: an empirical study on a Chinese e-commerce platform