key: cord-0030153-fvswfrus
title: An intelligent cybersecurity system for detecting fake news in social media websites
authors: Mughaid, Ala; Al-Zu’bi, Shadi; AL Arjan, Ahmed; AL-Amrat, Rula; Alajmi, Rathaa; Zitar, Raed Abu; Abualigah, Laith
date: 2022-04-21
journal: Soft Comput
DOI: 10.1007/s00500-022-07080-1
sha: 54d7ea2926226672459d2d4e073a9ed6c19f9ffb
doc_id: 30153
cord_uid: fvswfrus

People worldwide suffer from fake news in many aspects of life: healthcare, transportation, education, economics, and many others. Therefore, many researchers have sought techniques for automatically detecting fake news over the last decade. The most popular news agencies use e-publishing on their websites, yet any website can publish whatever news it wants. Therefore, before quoting news from a website, its ranking should be checked with a trusted website classifier, such as the website's world rank, which reflects its reputation. This paper uses the world rank of news websites as the main factor of news accuracy, relying on two widespread and trusted website ranking services. Moreover, a secondary factor is proposed to compute the news accuracy similarity by comparing the current news with known fake news and estimating the possible news accuracy. Experiments are conducted on several benchmark datasets, and the results show that the proposed method achieves promising results compared with other methods in determining news accuracy.

working in cybercrimes, such as in Al-Masalha et al. (2020). Fake news has found its way onto the Internet, and it has become difficult for people to find truthful information. Fake news is spread online through social media and fake news websites. Current social media is fertile ground for the spread of fake news Sadiku et al. (2018); Wu et al. (2022), since content can circulate among users with no third party to arbitrate Otair et al. (2022). Misinformation, amplified by new means in the Internet age, poses a threat to society worldwide. Fake news is fabricated content deceptively presented as real news Spradling et al. (2021); Goldani et al. (2021); Şahin et al. (2021). It consists of stories designed to increase readership, online sharing, and Internet click revenue Freire et al. (2021). Fake news is published on the Internet to mislead readers and to damage an agency, person, or rival, and it spreads faster and deeper than the truth. There are several types of fake news. Clickbait: exaggerated or false stories created to generate clicks and increase ad revenue, e.g., claims that drinking two gallons of water a day is good for you or that chocolate helps with weight loss. Propaganda: a deceptive story designed to promote the author's agenda; it may be politically motivated, and politicians and governments use propaganda to promote their agendas. Opinion: commentary invented by the author to influence the reader. Humor: stories used for entertainment and satire to discuss public affairs; the authors present themselves as delivering entertainment and call themselves comedians rather than journalists Choudhary and Arora (2021); Kaliyar et al. (2021). Several works and approaches have been proposed to uncover fake news Gandomi et al. (2022); Paka et al. (2021); Sivasankari and Vadivu (2021); Fayaz et al. (2022). Saikh et al. (2020) presented two models that rely on deep learning to solve the problem of detecting fake news in content from several domains across the Internet.
The proposed models are binary classifiers that aim to separate fake from verified content Abualigah and Diabat (2022). They evaluated these models on two datasets for fake news detection, FakeNewsAMT and Celebrity, and reached accuracies of 77.08% and 83.3% on FakeNewsAMT and 76.53% and 79% on the Celebrity dataset. In another work, Rubin et al. in Saikh et al. (2020); Pérez-Rosas et al. (2017a); Rubin et al. (2016) used a dataset containing satirical news (The Onion) and fact-checking sites such as PolitiFact and Snopes, but these sites are restricted to particular news domains, such as politics. Pérez-Rosas et al. (2017a) created two datasets for a fake news detection task covering seven different news domains. They also used exploratory analytics to identify language differences between legitimate and fake news content, using numeric features followed by N-grams, the complete LIWC lexicon, and syntax features. Ahmed et al. (2018) presented an approach to detecting fake content that begins by preprocessing the dataset: removing unnecessary characters and words such as stop words, removing punctuation, tokenizing the text, stemming, extracting n-gram features, and building the required document representation. The final step is training the classifier; six different machine learning algorithms were tested, namely Stochastic Gradient Descent (SGD), Support Vector Machine (SVM), Linear SVM (LSVM), K-Nearest Neighbor (KNN), Logistic Regression (LR), and Decision Tree (DT). Pérez-Rosas et al. (2017b) searched for new ways of classifying fake news published over the Internet, for instance on social media. Their proposed mechanism for online fake news detection is twofold: first, two novel datasets for fake news detection covering seven different news domains; second, a set of learning experiments to build accurate fake news detectors. The first dataset was built through manual and crowdsourced annotation efforts, and the second was collected directly from the web. Using these datasets, they conducted several exploratory analyses to identify linguistic properties that are predominant in fake content and built fake news detectors based on linguistic features, such as a combination of lexical, syntactic, and semantic information as well as features representing text readability, achieving accuracies of up to 78%. Monteiro et al. (2018) studied fake news detection in Brazilian Portuguese, as a representative of languages other than English, by analyzing linguistic characteristics and applying machine learning. They created a corpus by collecting and labeling samples of fake news, which they called the "Fake.Br corpus," and then used it to build an automatic fake news classifier with two main procedures: first, reducing text size by truncating the text; second, applying a Linear SVC. After applying all features, the classifier trained on the Fake.Br corpus reached a final accuracy of 89%. Hanselowski et al.
(2018) conducted a retrospective analysis of the three best-performing systems in the Fake News Challenge in order to enhance artificial intelligence techniques for combating fake news. In their paper, they provided a deep analysis of these three systems, a critical evaluation of the experimental setup, a new F1-based measure, and a detailed analysis of the features used. Reis et al. (2019) reviewed many previous studies, focusing on identifying the features proposed in this body of work and on applying these features to a recently released dataset of news articles sourced from BuzzFeed related to the US elections, where they explained how such features can distinguish true news from fake news. They first grouped all stories classified as unreal or false, or as a mixture of right and wrong, into a single class (fake news), treating the rest as accurate. They then classified the features into those extracted from news content, news sources, and the publishing environment, and evaluated these features with several machine learning algorithms. Thorne et al. (2018) introduced a new publicly available dataset for checking and verifying textual sources (FEVER: Fact Extraction and VERification), generated from Wikipedia. Vlachos and Riedel (2014) presented the task of verifying information and facts by creating a dataset validated by fact-checking websites such as PolitiFact, and then explained how fact-checking relates to natural language processing tasks. Alkhair et al. (2019) presented an Arabic corpus of false news about three Arab celebrities, built from information published on YouTube, specifically rumors about their deaths. They performed a statistical analysis of the collected data and then classified the news using three methods: Support Vector Machine, Multinomial Naïve Bayes, and Decision Tree, to distinguish rumors from other comments; they noted that the achieved performance varies depending on the subject of the rumor. A method for analyzing and evaluating the credibility of tweets on Twitter was proposed by Castillo et al. (2011), who labeled tweets as credible or unreliable based on features extracted from them. The most important of these features are the posting behavior of users and the number of retweets; they showed that reliable news tends to come from authors who have previously written a large number of topics and have been widely re-shared. Qazvinian et al. (2011) studied several topics spreading on Twitter and investigated the tweets about this news, proposing a classification into support or denial of a story. They relied on content-based and network-based features and some Twitter-specific elements, and found that classification of a topic improves in accuracy as the number of tweets increases. Other optimization techniques can also be used for fake news detection Agushaka et al. (2022); Oyelade et al. (2022). It is essential to be aware of how fake news can play out in the real world; one example took place shortly before the most recent US presidential election, in a series of events known as "Pizzagate."
Fake news publishers in Macedonia circulated a false political conspiracy theory claiming that former First Lady, Secretary of State, and presidential candidate Hillary Clinton and other prominent Democratic political figures coordinated a child-trafficking ring out of a Washington, DC pizzeria named Comet Ping Pong. This false publication was widely shared via Facebook and directed readers to websites built to generate advertising revenue. In a bizarre turn of events in December 2016, a man who had read the fake publication drove from North Carolina to Washington, DC, and shot open a locked door at the actual Comet Ping Pong pizzeria Klein and Wueller (2018). In the era of AI, several intelligent applications have been employed with huge amounts of cloud data, such as in AlZu'bi et al. (2018, 2019); Elbes et al. (2020); Al-Zu'bi et al. (2021); AlZu'bi and Jararweh (2020); Guo et al. (2021).

The most popular news agencies use e-publishing on their websites, yet any website can publish whatever news it wants. Therefore, before quoting news from a website, its ranking should be checked with a trusted website classifier, such as the website's world rank, which reflects its reputation. In this paper, we propose a new method to detect fake news through a novel news-accuracy detection algorithm that works on aggregated live news. The proposed method collects the website ranking of news sources from RankAPI and the Alexa website and collects related news from the Google search engine using the live news titles. The news accuracy algorithm then computes the rank of the source news website, worth 50% of the total news accuracy, applies stop-word removal, tokenization, and stemming to the text and title of both the fake news dataset and the live news, and finally computes the cosine similarity of the live news text with the related news texts for the remaining 50% of the news accuracy score. This paper uses the world rank of news websites as the main factor of news accuracy, relying on two widespread and trusted website ranking services. Moreover, a secondary factor is proposed to compute the news accuracy similarity by comparing the current news with known fake news and estimating the possible news accuracy. Experiments are conducted on several benchmark datasets, and the results show that the proposed method achieves promising results compared with other methods in determining news accuracy.

The main sections of this paper are organized as follows: Sect. 2 describes the proposed method for determining news accuracy. Section 3 presents the experiments and results. Section 4 gives the discussion and comparisons with state-of-the-art methods. The conclusion and future work directions are given in Sect. 5.

The proposed system checks the accuracy of news published by news websites, in a different way from previous techniques that used only machine learning to classify news as either fake or real. Machine learning alone is not enough to classify the news, because it depends on word similarity between documents, and a real document and a fake document can still have high word similarity. The methodology of this work therefore depends on two factors: first, the similarity of the news text against fake news datasets such as Kaggle; second, the ranking of the website hosting the source text.
The ranking system employs two different sources of website ranking, RankAPI and Alexa, which specialize in website ranking Aldwairi and Alwahedi (2018a); Aphiwongsophon and Chongstitvatana (2018). The text similarity is given 50%, and the other 50% is given to the news website ranking. Finally, we compute the news accuracy by summing these percentages out of 100%. The actual work is done using a fake news dataset published on the Kaggle website, which is used to train the system to check the similarity of the current news with the fake news dataset; training uses the attributes (id, title, author, text, label) kaggle (2021b). The programmed fake news detection tool depends on two scenarios: 1. Detect news accuracy by website ranking: in this stage, we gather the rank of the news publisher website from two rank factors, the first from the Alexa website and the second from the RankAPI website, as illustrated in the news website ranking ratio section. 2. Detect news accuracy by getting the top-1 cosine similarity from the fake news dataset and computing the true news percentage: the fake news dataset considered in this paper contains a list of articles labeled as "fake" news from the Kaggle website (Fig. 1 illustrates a sample), with four columns: title, text, subject, and date kaggle (2021a). This paper relies on the lexical similarity of the current news text with previous fake news saved in the fake news dataset. We apply cosine similarity between the tokenized text and all fake news texts in the dataset and take the highest similarity among the fake news texts as a similarity factor between 0 and 1. We exclude the id, title, author, and label features and use only the text feature as the similarity factor.

In the proposed system, we built a non-traditional algorithm for fake news detection that depends on two main factors. The first factor relies on features collected from global website-ranking services, and the second uses the similarity of the news text against a dataset of previous fake news. These factors are then combined to compute the final news accuracy. The proposed system works in two stages, as follows. Stage 1: the system automatically collects the latest news from the news API and records some features for each news item, such as the news title, weblink, and source, and obtains the news text by scraping the webpage that published it. The system then passes these features to the next stage. Stage 2: this stage uses the features obtained from the previous stage to compute the following formulas.

- Compute the website news ranking (R): the system computes the R-value by summing R1 and R2, the two ranking sources of news websites described later. The formula in Equation 1 shows how to compute the website ranking ratio: R = R1 + R2. Where:
- R: the final ratio of the news website ranking.
- R1: the RankAPI ratio of the news website ranking.
- R2: the Alexa ratio of the news website ranking.
- Compute the top-one similarity of the news text (TP): the system computes the highest similarity between the current news text and the fake news texts in the fake news dataset. The formula in Equation 2 shows how to compute the cosine similarity ratio of the current news text (CN) with the N fake news texts in the dataset: TP = Top(cos-sim(CN, FN_i)), i = 1, ..., N. Where:
- TP: the True Positive rate with respect to fake news.
- Top: returns the top-1 similarity rate of the fake news texts (FN_i) with the current news text.
- CN: the tested news text.
- FN_i: fake news text i in the dataset, i = 1, ..., N.

- Compute the False Positive rate of the current news (FP): the system computes FP after obtaining the similarity ratio with fake news. The formula in Equation 3 shows how to compute the False Positive ratio based on the True Positive ratio with fake news: FP = (1 - TP) * 0.5. Where:
- FP: the False Positive ratio with respect to fake news.
- TP: the True Positive ratio from Equation 2.
- Note: we multiply by 0.5 because the news text similarity contributes 50% of the total news accuracy.

- Compute the final news accuracy: the system computes the final accuracy of the current news. The formula in Equation 4 shows how to compute the final ratio of news accuracy: Accuracy = R + FP. Where:
- Accuracy: the news accuracy ratio out of 100%.
- R: the news website ranking ratio out of 50%.
- FP: the False Positive ratio of the tested news out of 50%.

Tables 1 and 2 present the ranking ratios of some news websites. The ranking ratio is given 50% of the total news accuracy; two websites, RankAPI and Alexa, are used to obtain it, and each is given 25%. Their ranking ratios correspond to R1 and R2, respectively, in Equation 1, and their summation gives R, the final website rank ratio. Tables 1 and 2 show the ranking ratios according to the RankAPI source and Alexa, which are taken as R1 and R2, respectively, in Equation 1. [Fragment of the ranking-ratio table recovered from the text: rank ranges in the hundreds of thousands map to scores of 17, 15, 13, 9, 7, and 5; ranks above 1 M map to 1; no rank maps to 0.]

- RankAPI factor: this website uses an analysis algorithm based on Google PageRank, which measures how important a web page or website is and evaluates the quality of its content and backlinks. The rank score on this website is between 0 and 10, where 10 is the highest page rank and 0 the lowest. A higher page rank brings the website closer to a top position in Google search results, and vice versa Rankapi (2011). Figure 2 illustrates how to get the CNN website rank from the RankAPI website.
- Alexa rank factor: the Alexa ranking system, owned by Amazon, classifies websites by popularity using dynamic data about each website. More popular sites receive a lower (better) Alexa rank number. It also shows how a website is doing relative to other sites, making it suitable for benchmarking or competitive analysis. It ranks millions of websites, and the resulting number indicates the website's position among those millions Duo (2021). Figure 3 illustrates how to get the CNN website rank from the Alexa website. Table 3 shows that RankAPI and Alexa rank websites using different techniques.
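To make the ranking stage concrete, the following Python sketch shows one way the Equation 1 score R = R1 + R2 could be computed from bucket look-ups in the spirit of Tables 1 and 2. The function names, bucket thresholds, and scores here are illustrative assumptions rather than the authors' exact tables; each source contributes at most 25 points, so R is out of 50.

```python
# A minimal sketch of Equation 1 (R = R1 + R2) using hypothetical ranking-ratio
# buckets; the real cut-offs are those of Table 1 (RankAPI) and Table 2 (Alexa).

def ratio_from_rank(rank, buckets, fallback=0.0):
    """Return the score of the first (threshold, score) bucket the rank falls under."""
    if rank is None:          # unranked site ("None" row) -> 0
        return 0.0
    for threshold, score in buckets:
        if rank <= threshold:
            return score
    return fallback           # beyond the last threshold (e.g., Alexa rank above 1M)

# Hypothetical stand-ins for Table 1 and Table 2; each source is worth up to 25 points.
RANKAPI_BUCKETS = [(2, 5.0), (4, 10.0), (6, 15.0), (8, 20.0), (10, 25.0)]   # PageRank-style 0-10
ALEXA_BUCKETS = [(1_000, 25.0), (100_000, 20.0), (300_000, 17.0),
                 (500_000, 15.0), (1_000_000, 5.0)]                          # global Alexa rank

def website_ranking_score(rankapi_pagerank, alexa_rank):
    r1 = ratio_from_rank(rankapi_pagerank, RANKAPI_BUCKETS)        # R1: RankAPI ratio (<= 25)
    r2 = ratio_from_rank(alexa_rank, ALEXA_BUCKETS, fallback=1.0)  # R2: Alexa ratio (<= 25)
    return r1 + r2                                                 # R = R1 + R2 (Equation 1)

print(website_ranking_score(9, 80))   # a well-ranked site -> 50.0
```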
In this section, we obtain the first part of the news accuracy ratio by computing R using the formula in Equation 1: we get the first website ranking, R1, from the RankAPI website, as shown in Table 1, and then the second ranking, R2, from the Alexa website, as shown in Table 2.

In this section, we implement the similarity of the current news with the N fake news texts from the fake news dataset, which requires a set of steps to obtain the second part of the news accuracy ratio. To attain that, both the current news text and each fake news text must pass through text preprocessing before the similarity is computed. The following explains the text preprocessing and text similarity:

- Text pre-processing: before implementing text similarity, we must prepare the two texts so that they are ready for the similarity process. The following steps describe the text processing:
- Step 1: Uniform letter case: convert all letters in the text to lowercase.
- Step 2: Remove punctuation and non-ASCII characters: remove all punctuation and all non-ASCII characters from the text.
- Step 3: Remove stop-words: remove all stop-words from the text, such as 'the,' 'is,' 'are,' 'a,' 'an,' and so on.
- Step 4: Text tokenization: split the text into words, using whitespace as the splitting pattern.
- Step 5: Word stemming: reduce all words in the text to their base form, i.e., remove suffix letters at the end of a word; for example, "playing," "played," and "plays" all reduce to the base word "play." After these steps, the text is ready for the next stage, in which similarities between texts are computed.
- Create the TF matrix for the texts: the term frequency (TF) matrix for the text terms is a two-dimensional matrix with two rows and N columns; the rows correspond to text-1 and text-2, the columns to all text terms, and N is the number of distinct terms. In this stage, we aggregate all words in text-1 and text-2 as a set without repeated words. These terms form the column headers, the rows are labeled text-1 and text-2, and each cell holds the frequency of the term in the corresponding text. Table 4 shows the TF matrix built according to these details.
- Text-1: the current news under test.
- Text-2: the current fake news under test.
- Term 1...N: all distinct terms appearing in the two texts.
- Cosine similarity of text-1 and text-2: after converting the terms of text-1 and text-2 to vectors (the rows of counts A and B in the TF matrix) in the previous stage, we can compute the cosine similarity score between text-1 and text-2 using the formula in Equation 6 Nguyen and Bai (2010): cos(A, B) = (A_1*B_1 + ... + A_n*B_n) / (sqrt(A_1^2 + ... + A_n^2) * sqrt(B_1^2 + ... + B_n^2)). Where:
- n: the number of terms in texts 1 and 2.
- A_i: the frequency of term i in text-1.
- B_i: the frequency of term i in text-2.
- Compute the current news accuracy:
- Step 1: from Equation 2, we obtain the top-1 cosine similarity score of the current news with the N fake news texts, where N is the number of fake news items in the dataset; this gives TP (the True Positive rate of the current news).
- Step 2: from Equation 3, we obtain FP (the False Positive rate of the current news) by subtracting TP from one and multiplying the result by 0.5 to get the second part of the news accuracy (the second half of the score, out of 50%); the first half, also out of 50%, was obtained in the website ranking stage.
- Step 3: from Equation 4, we obtain the final news accuracy score by adding the R-value to the FP-value to get NA (the news accuracy score). A sketch of these steps is given below.
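The following Python sketch walks through the pre-processing steps, the Equation 6 cosine similarity, and Steps 1-3 above. It is a simplified stand-in for the authors' implementation, assuming NLTK's Porter stemmer, a small illustrative stop-word list, and FP expressed in percentage points so that it adds directly to the ranking score R (out of 50).

```python
# A minimal sketch of pre-processing, the TF-matrix cosine similarity (Equation 6),
# and the accuracy computation of Equations 2-4; names and constants are illustrative.
import math
import re
import string
from collections import Counter

from nltk.stem import PorterStemmer  # pip install nltk

STOP_WORDS = {"the", "is", "are", "a", "an", "and", "of", "to", "in"}  # illustrative subset
stemmer = PorterStemmer()

def preprocess(text):
    """Steps 1-5: lowercase, strip punctuation/non-ASCII, drop stop-words, tokenize, stem."""
    text = text.lower()
    text = text.encode("ascii", "ignore").decode()                  # remove non-ASCII characters
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = re.split(r"\s+", text.strip())                         # tokenize on whitespace
    return [stemmer.stem(t) for t in tokens if t and t not in STOP_WORDS]

def cosine_similarity(text1, text2):
    """Build the two-row TF matrix and apply Equation 6 to its rows."""
    tf1, tf2 = Counter(preprocess(text1)), Counter(preprocess(text2))
    terms = set(tf1) | set(tf2)                                     # the TF-matrix column headers
    dot = sum(tf1[t] * tf2[t] for t in terms)
    norm1 = math.sqrt(sum(v * v for v in tf1.values()))
    norm2 = math.sqrt(sum(v * v for v in tf2.values()))
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0

def top1_fake_similarity(current_text, fake_texts):
    """Equation 2: TP is the top-1 cosine similarity of the current news with the N fake texts."""
    return max(cosine_similarity(current_text, fake) for fake in fake_texts)

def false_positive_points(tp):
    """Equation 3: FP = (1 - TP) * 0.5 of the total accuracy, i.e., up to 50 points."""
    return (1.0 - tp) * 50.0

def news_accuracy(r, fp):
    """Equation 4: Accuracy = R + FP, out of 100% (R and FP each out of 50)."""
    return r + fp

# Example: a source with ranking score R = 40/50, plus the similarity-based half.
tp = top1_fake_similarity("Chocolate cures all known diseases, experts say",
                          ["chocolate cures every disease", "the moon landing was staged"])
print(round(tp, 2), round(news_accuracy(40.0, false_positive_points(tp)), 1))
```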
This section shows the results for ten news items from various sources, where the first five come from high-popularity sources and the second five from low-popularity sources. The proposed system was run on these ten sample news items to obtain results, which are then analyzed. Sections 4.1 and 4.2 show the system results in detail. In the initial stage, the system aggregates up-to-date news from the NewsAPI server and then checks the aggregated news using the methodology of the proposed technique; Table 5 shows some information about the aggregated news. The Python program collects live news from the live news API server, since in this paper we are detecting fake news directly; we then run the system to check the aggregated news using the proposed technique. Table 6 shows some information about the live aggregated news (news title, news source, and news URL).

Table 6 shows the world ranking of the domain names of the news websites, which we used to compare the two sources of website ranking (Alexa and RankAPI) explained previously in this research. For Table 6, we chose ten news sites that published news: the first five are famous, accredited news sites known worldwide, and the other five are unknown or unaccredited sites. We then examined the ranking of these news sites using the two ranking sources (Alexa and RankAPI) to obtain the results in Table 6. After getting the news website ranking values, we use the RankAPI ratio table (Table 1) and the Alexa ratio table (Table 2) to compute the website ranking score R using Equation 1; Table 7 shows the website ranking ratios R. According to the methodology of the proposed technique, after obtaining the news website ranking result, we take 25% for each ranking source (Alexa and RankAPI) and add them together to get the first 50% of the news accuracy, as shown in Table 7.

The final stage of checking the news accuracy is to gather the current news text and all the fake news texts from the Kaggle dataset, and then pass the texts through preprocessing in preparation for the similarity process. The proposed system uses cosine similarity, processing the current news text against 5000 fake news items from the dataset; it takes the top-1 similarity score across the fake news, named TP, and then computes FP from Equation 3. Table 8 shows the FP and TP ratios and the final similarity score out of 50% based on this procedure. Figure 4 shows the distribution of TP and FP, where a subset of 5000 fake news items from the dataset has been used. It can be noticed from Fig. 4 that the ratio of FP is much higher than the ratio of TP for all ten news items. Table 9 shows the duration time for each news item individually, where a subset of 5000 fake news items was extracted from the dataset for the similarity computation.
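For illustration, the following sketch ties the earlier helper sketches together into a live-news checking loop. The NewsAPI endpoint and response fields, the tiny fake_texts list, and the hard-coded example ranks are assumptions for demonstration only, not the authors' exact tooling.

```python
# A minimal end-to-end sketch reusing website_ranking_score, top1_fake_similarity,
# false_positive_points, news_accuracy, and cosine_similarity defined above.
import requests

fake_texts = [
    "example fake story text about a celebrity death hoax",
    "another fabricated story about a miracle weight loss cure",
]  # stands in for the 5,000 texts loaded from the Kaggle fake-news dataset

def check_live_news(api_key, rankapi_pagerank, alexa_rank):
    resp = requests.get("https://newsapi.org/v2/top-headlines",
                        params={"language": "en", "apiKey": api_key}, timeout=10)
    for article in resp.json().get("articles", []):
        text = article.get("content") or article.get("description") or ""
        r = website_ranking_score(rankapi_pagerank, alexa_rank)   # first 50% (Equation 1)
        tp = top1_fake_similarity(text, fake_texts)               # Equation 2
        fp = false_positive_points(tp)                            # Equation 3
        score = news_accuracy(r, fp)                              # Equation 4
        print(article.get("source", {}).get("name"), "-",
              article.get("title"), "->", round(score, 1))
```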
As Table 9 shows, the time taken for each news item was less than a minute, which indicates the speed of the system and helps detect breaking news in record time. We noticed that the duration of the similarity computation increases when the text is longer, and vice versa. Figure 5 shows the distribution of the duration time of each news item for the similarity computation; as Fig. 5 illustrates, the duration increases for longer texts and is short when there is less text to compare.

In this section, we reach the final stage of computing the news accuracy score out of 100%, based on Equation 4. Table 10 shows the final news accuracy results for the 10 news items listed in Table 5. The results in Table 10 are considered an acceptable achievement, since the final accuracy ratio depends both on the news website ranking classification and on the similarity of the news text with a set of fake news. If we consider websites that repeat fake news, the similarity ratio against the set of fake news is high, so even if the website rank ratio is high, the result is a low news accuracy. Furthermore, if the similarity against the set of fake news is low but the news website ranking ratio is also low, the news accuracy again comes out low, and vice versa. This explains the large differences in the news accuracy results in Table 10. In another case, if the news accuracy ratio is between 50% and 80%, it means the news similarity against the set of fake news was high while the news website has a high ranking classification. Hence, website popularity affects the news accuracy in the developed system. Figure 6 illustrates the final score for the ten news items out of 100%, based on ranking and cosine similarity.

In this research, we used two techniques for obtaining news accuracy: a combination of news text processing using cosine similarity against a fake news dataset, and a search for the news website ranking via the RankAPI and Alexa websites. Comparing the proposed technique with machine learning, we took ten real news items and trained a machine learning model using linear regression. The train and test accuracy scores were over 95%; however, after feeding these news items to the machine learning model for prediction, the model classified six news items as real and four as fake, whereas the results were better after inputting the same news into the proposed system. Since the FP and TP of the checked news pointed to real more than to fake news, and the final accuracy of the checked news is over 90%, this is closer to reality. Table 11 shows the results of the comparison between the proposed technique and the machine learning techniques.

Researchers have adopted different methods for detecting fake news; Table 12 presents some of them and their results compared with the method adopted in this research. In Ahmed et al. (2017), the researchers proposed a fake news detection model using two different feature-extraction methods: Term Frequency (TF) and Term Frequency-Inverse Document Frequency (TF-IDF). For text analysis, they used n-gram features and six machine learning classification techniques, namely K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Logistic Regression (LR), Linear Support Vector Machine (LSVM), Decision Tree (DT), and Stochastic Gradient Descent (SGD).
The dataset was gathered from real news sites (Reuters.com) and from the fake news dataset on kaggle.com; they used 12,600 truthful news articles and 12,600 fake news articles from kaggle.com, with a focus on political news. Preprocessing of the dataset was performed by removing stop words, tokenization, lowercasing, sentence segmentation, stemming (using the Porter stemmer), and removing punctuation marks, which helps reduce the size of the data. The final step was to train the classifier by running the six machine learning algorithms and studying the effect of the size n in n-gram on performance. The dataset was split into 80% for training and 20% for testing. The highest accuracy, 92%, was achieved with unigrams and a linear SVM classifier, and increasing the size of n in n-gram reduced the accuracy of the algorithms. Finally, they ran an additional experiment by applying the proposed model to a public dataset (Adali and Horne) and got 87% accuracy using n-grams and the Linear SVM algorithm. In the proposed model, we compared the results using only the textual-content similarity criterion (the part similar to what these researchers did); the highest result we reached on the tested sample is 46.5 out of 50%, which we multiplied by 2 to express it out of 100%, giving an accuracy of 93%. We note that we surpassed them on one criterion without taking the second criterion into account.

In Thota et al. (2018a), the researchers presented a model based on a neural network architecture to accurately predict the stance between a headline-article pair, depending on the extent of similarity between the article body and the headline (classifying the stance as agree, disagree, discuss, or unrelated). Using the Fake News Challenge dataset (FNC-1), they split it into a 67% training set and a 33% validation/test set. They then preprocessed the data, first removing stop words using the Natural Language Toolkit (NLTK) library, then removing punctuation and stemming. They prepared the text, converted it to raw text, extracted features using two methods, bag-of-words and TF-IDF, and trained a dense neural network model using TF-IDF and bag-of-words to find the similarity between headline and article pairs. Their proposed model obtained 94.31% with TF-IDF on unigrams and bigrams with cosine similarity, and 89.23% with bag-of-words with cosine similarity. In this research, we aggregated live news and related news and compared the live news against the most similar related news (two datasets were adopted); what we have done is more accurate, as the news was compared with different sources that resemble the live news.

In Bali et al. (2019), the researchers proposed a model for detecting fake news from the perspective of natural language processing and machine learning. A cross-validation assessment was performed on three datasets, comparing the performance of seven machine learning algorithms in terms of accuracy and F1 scores. The model was tested on three datasets: an open-source dataset, the Kaggle fake news dataset, and a GitHub repository of fake and real news. The datasets were then preprocessed, removing symbols, links, and information unimportant for analyzing text features.
To extract the features, they used n-grams (unigram, bigram, and trigram occurrence counts) and TF-IDF (term frequency-inverse document frequency): the term frequency is the number of times a word appears in a given document (Equation 7), and the inverse document frequency is computed with Equation 8. They then calculated the cosine similarity of the standard TF-IDF vectors of the headlines and body-text contents. Additional features were: 1. Word embeddings: replace each word with a real-valued vector. 2. Sentiment score: used to analyze the sentiment of the tone of different articles using the open-source Natural Language Toolkit (NLTK) library; sentiment intensity was analyzed for positive, negative, neutral, and compound emotions. 3. Linguistic: readability criteria defined for news articles; the lexical diversity of articles is calculated and used as a feature. In total, 163 features were used in that paper. To evaluate the model, cross-validation was used on the datasets, and the model was evaluated with seven machine learning algorithms: Random Forest (RF), Support Vector Classifier (SVC), Gaussian Naïve Bayes (GNB), AdaBoost (AB), K-Nearest Neighbor (KNN), Multilayer Perceptron (MLP), and Gradient Boosting (XGB). Comparing the accuracy with cross-validation on the three datasets, the XGB algorithm was observed to outperform the other algorithms, with an accuracy of 87.2% on the open-source dataset, 92.0% on Kaggle, and 87.3% on the George McIntire dataset. In this research, if we consider the similarity of textual content only (which is similar to what those researchers did), the highest similarity we obtained was 93% (46.5% * 2); what further distinguishes our work is the second criterion, the global ranking of news website sources, which gives additional accuracy in revealing the accuracy of the news.

In Al Asaad and Erascu (2018), the researcher uses a machine learning technique: based on the article content, an algorithm uses data mining to detect fake news. The algorithm extracts text features (characteristics of the content and the publisher) and performs linguistic and visual studies on the extracted features (source, headline, body text, image, video). The machine learning model is built using external sources (e.g., Facebook readers) with a static dataset rather than feedback to train the model, taking articles from news websites rather than social media sites. Fake news is detected based on clickbait titles; the system studies the relationship between the article title and the body. They used the fake and real news dataset from http://www.fakenewschallenge.org and the scikit-learn library in Python, extracted features from the dataset (using text representation models), and tested two classification approaches on the title and content. Combining these tools, the researcher obtained a result greater than 0.8 for content and title classification. He observed that the linear classification model works best with TF-IDF, with a result of 0.94, while probabilistic classification gives a low accuracy score when combined with TF-IDF; both classifiers give the same result, 0.95, for the title. The bi-gram frequency model gives low accuracy for title classification compared with bag-of-words and TF-IDF.

In Aldwairi and Alwahedi (2018b), the researchers locate a credible clickbait database, then compute the attributes and produce the data file for WEKA.
They collect URLs for clickbait from the web, focusing on the social media sites that carry more fake news or clickbait ads and articles, such as Facebook, Forex, and Reddit. After gathering the URLs in a file, the researchers use a Python script to compute the attributes from the title and content of the web pages and then extract the features from the pages. They use 10-fold cross-validation in all experiments. After reading the website attribute file into WEKA, they rank the attributes using several algorithms to choose the most relevant ones, increase accuracy, and decrease time. They obtain classification results per metric and classifier, evaluating the classification with precision, recall, F-measure, and ROC. The logistic classifier has the highest precision, at 99.4%; the logistic and random tree classifiers have the best recall (sensitivity), at 99.3%; and BayesNet and Naive Bayes give the best areas under the ROC curve. In Thota et al. (2018b), the researchers use the FNC-1 dataset with two models to detect fake news. The first model, Riedel et al., achieves 88% accuracy using lexical and similarity features passed through a multi-layer perceptron (MLP) with one hidden layer. The second model, Davis et al., achieves 93% accuracy using GloVe word representations and the Word2Vec predictive model. These techniques are applied to both the headlines and the website articles. They allocate 67% of the FNC data to the training set and the remaining 33% to the test set; the training data are further divided into an 80/20 training/validation split, and all experiments are conducted on this training and validation setup. The models are trained as dense neural networks (DNN). Compared with the previous research, we detect fake news with the proposed fake news accuracy detection algorithm, which follows this sequence: aggregate live news; collect the website ranking of news sources (from the RankAPI and Alexa websites); collect related news from the Google search engine using the live news titles; execute the news accuracy algorithm; compute the rank of the source news website as 50% of the total news accuracy; apply stop-word removal, tokenization, and stemming to both the text and title of the fake news dataset and the text and title of the live news; and finally compute the cosine similarity of the live news text with the related news texts for the remaining 50% of the news accuracy score.

With advanced technology and communication methods, information spreads among people without verification. The problem is that fake news plays a major role in our lives, which is why researchers started looking for a solution to stop fake news and disinformation from spreading widely; it is hard to control the flow of information online. We attempted to verify news by implementing an algorithm that combines several methods to get satisfactory results. In this paper, we took a different approach to detecting fake news. As the first step, we collect live news from the Google search engine using the live news titles; then we rank the news websites according to the results generated from the RankAPI and Alexa websites. We then execute the algorithm, programmed in Python; the programmed tool in this paper depends on two primary parts: detecting news accuracy by ranking the news website, and rating news accuracy by getting the top-1 cosine similarity from the fake news dataset.
After that, we apply the news text similarity with the fake news dataset. For the text similarity implementation, we apply text processing (uniform letter case, removal of punctuation and non-ASCII characters, stop-word removal, text tokenization, and word stemming), then create the TF matrix for the texts and compute the current news accuracy (the rank of the source news website, out of 50% of the total news accuracy, plus the cosine similarity of the live news, out of the remaining 50%).

Funding: This research received no external funding.

Data availability: Data are available from the authors upon reasonable request.

Conflict of interest: The authors declare that there is no conflict of interest regarding the publication of this paper.

Ethical approval: This article does not contain any studies with human participants or animals performed by any of the authors.

References
Chaotic binary group search optimizer for feature selection
The arithmetic optimization algorithm
Applications, deployments, and integration of internet of drones (iod): a review
Aquila optimizer: a novel meta-heuristic optimization algorithm
Reptile search algorithm (rsa): a nature-inspired meta-heuristic optimizer
Dwarf mongoose optimization algorithm
Detection of online fake news using n-gram analysis and machine learning techniques
Detecting opinion spams and fake news using text classification
A tool for fake news detection
Cyber-crime effect on jordanian society
Efficient 3d medical image segmentation algorithm over a secured multimedia network
Detecting fake news in social media networks
Detecting fake news in social media networks
An arabic corpus of fake news: Collection, analysis and classification
Data fusion in autonomous vehicles research, literature tracing from imaginary idea to smart surrounding community
Multi-orientation geometric medical volumes segmentation using 3d multiresolution analysis
A multi-levels geolocation based crawling method for social media platforms
Sixth international conference on social networks analysis, management and security (SNAMS)
Detecting fake news with machine learning method
An improved hybrid swarm intelligence for scheduling iot application tasks in the cloud
A novel deep learning-based feature selection model for improving the static analysis of vulnerability detection
Information credibility on twitter
Linguistic feature based learning model for fake news detection and classification
Alexa rank: everything you need to know about it
A platform for power management based on indoor localization in smart buildings using long short-term neural networks
Machine learning for fake news classification with optimal feature selection
Fake news detection based on explicit and implicit signals of a hybrid crowd: an approach inspired in meta-learning
Machine learning technologies for big data analytics
Convolutional neural network with margin loss for fake news detection
Probabilistic inference-based modeling for sustainable environmental systems under hybrid cloud infrastructure
A retrospective analysis of the fake news challenge stance detection task
Fake news, build a system to identify unreliable news articles
Fakebert: fake news detection in social media with a bert-based deep learning approach
Fake news: A legal perspective
Contributions to the study of fake news in portuguese: New corpus and automatic detection results. In: international conference on computational processing of the portuguese language
Cosine similarity metric learning for face verification
An enhanced grey wolf optimizer based particle swarm optimizer for intrusion detection system in wireless sensor networks
Cross-sean: a cross-stitch semi-supervised neural attention model for covid-19 fake news detection
Automatic detection of fake news
Automatic detection of fake news
Rumor has it: identifying misinformation in microblogs
Supervised learning for fake news detection
Fake news or truth? using satirical cues to detect potentially misleading news
Fake news and misinformation
Prediction of software vulnerability based deep symbiotic genetic algorithms: phenotyping of dominant-features
A deep learning approach for automatic detection of fake news
Tracing the fake news propagation path using social network analysis
Protection from 'fake news': the need for descriptive factual labeling for online content
Fever: a large-scale dataset for fact extraction and verification
Fake news detection: a deep learning approach
Fake news detection: a deep learning approach
Fact checking: task definition and dataset construction
Enhance teaching-learning-based optimization for tsallis-entropy-based feature selection classification approach