key: cord-0068083-tpm4mwin
authors: Ben Abdessalem Karaa, Wahiba; Alkhammash, Eman; Slimani, Thabet; Hadjouni, Myriam
title: Intelligent Recommendations of Startup Projects in Smart Cities and Smart Health Using Social Media Mining
date: 2021-09-24
journal: J Healthc Eng
DOI: 10.1155/2021/3400943
sha: 6c1ee8224f3a2bb9542b8be82880f88e58bf67e0
doc_id: 68083
cord_uid: tpm4mwin

The paper presents a recommendation model for developing new smart city and smart health projects. The objective is to provide citizens with recommendations about smart city and smart health startups to improve entrepreneurship and leadership. These recommendations may contribute to the country's advancement, increase national income, and reduce unemployment. This work focuses on designing and implementing an approach for processing and analyzing tweets containing data related to smart city and smart health startups, and on recommending projects together with their required skills and competencies. The approach is based on tweet mining through a machine learning method, the Word2Vec algorithm, combined with an ontology-based recommendation technique. It discovers the relevant startup projects in the context of smart cities and links them to the skills and competencies needed by users. A system was implemented to validate this approach. The attained precision, recall, and F-measure are 95%, 66%, and 79%, respectively, showing that the results are very encouraging.

1.1. Background. Sustainability aims to improve quality of life without compromising environmental safety [1]. Various studies have shown that smart city projects are always concerned with sustainability objectives [2, 3]. Smartness and sustainability are interrelated in city strategies [4, 5]. Today, smart city projects give particular attention to sustainability when producing technology-based services [6]. Many studies have focused on the difficulties of the startup phase of smart sustainable city projects, especially for young people in developing countries [7].

Smart city services integrate many highly promising technologies, such as databases, data warehouses, advanced computer networks, content management, big data, and social mining [8]. These technological solutions have been applied in various areas of smart cities, such as smart homes [9], smart grids [10], smart communities [11], smart governance [12], smart buildings [13], smart manufacturing [14], smart agriculture [15], and smart healthcare applications, which play an important role nowadays [16] [17] [18]. Smart healthcare monitoring in smart cities is important to provide better services and care to residents [19] [20] [21] [22] [23].

Social media is one of the most recent and rapidly growing phenomena. Social media platforms have become popular in users' daily lives; they allow people to communicate and share opinions and experiences [24]. The data collected from social media are particularly useful, since they reflect significant human experiences and behaviors that can help solve various problems. Many researchers have proposed social media mining for data collection in smart city applications, as a necessary complement to sensor data to ensure efficient services [25].

Social media can support healthcare, since anyone with access to social media can post information about how to deal with certain health problems [26, 27]. Data from Twitter, Instagram, and other platforms have been exploited to investigate, track, and predict a range of health incidents and diseases [28] [29] [30].

The number of approaches and applications using social networks in the context of smart cities is constantly increasing: they are used to measure resident engagement [31], to guarantee citizens' security and detect violence [32], to manage traffic and emergency services [33], and for transport planning [34]. Numerous studies and applications have focused on social media and their impact on smart cities, such as Twitter emotion analysis [35], detection and classification of critical events [36, 37], and education [38, 39]. The information collected from social networks can be mined to understand the links between the social and technical aspects of smart cities. Indeed, social media exploitation provides the necessary tools to analyze information diffusion, analyze social behavior regarding smart cities, study its influence, and provide effective recommendations.

In this research work, we focus on recommending startup projects in the context of smart cities to Saudi youth, based on what users have posted in tweets. These tweets can be seen as raw resources that carry rich information exchanged between individuals. Thus, this work focuses on designing and implementing an approach for processing and analyzing tweets containing data related to smart city startups. The objective is to provide users with recommendations about smart city and smart health startups, in line with their skills and competencies, to improve entrepreneurship and leadership.

For this purpose, the extracted tweets are first preprocessed (cleaning, stop-word removal, tokenization, and lemmatization). Then, the Word2Vec algorithm is applied; it uses a neural network model to learn word associations from the preprocessed tweets. From the model, synonymous words are detected and represented in a vector based on cosine similarity. The vector indicates the level of semantic similarity between the words; in this study, it conveys similar words related to the names of smart city startups.

Finally, from the Word2Vec vector, recommendations are given to encourage Saudi youth to focus on startup projects related to smart cities, in order to enhance the quality of life of KSA citizens in line with Vision 2030. The recommendations are based on an OWL ontology created for this purpose. In addition, for each recommended project, a list of needed skills is suggested to support users' choices according to their skills and competencies.

This paper is organized as follows. After this introductory section, the second section reviews the literature on the KSA's position regarding innovation and smart city startups and on recommendation techniques for startup projects. The third section presents the research methodology and the proposed approach. The fourth section discusses the research findings, based on the experiments, tests, and evaluations of the implemented system. The final section, Conclusion, outlines future work to enhance the attained results.

To encourage entrepreneurship and improve technological innovation among youth in the Kingdom of Saudi Arabia (KSA), many strategies have been formulated, following social development, to join efforts and achieve the goals and programs of Vision 2030. The government constantly promotes innovation partnership programs, allowing ambitious young entrepreneurs to work on innovative startup projects. Through international strategic engagement, the government encourages emerging projects that intersect with sustainable development objectives. It also offers financial support for implementing these projects by providing the necessary assistance and contributions, especially to young people [40].

This support leads to the country's advancement, increases national income, improves quality of life, and reduces unemployment.

There are many classes of organizations specialized in funding small projects in the KSA. The first actor is the banking sector [41].

However, even though youth project funding organizations are abundant, the wide range of businesses on the scene and growing competitiveness make it difficult to choose the best project or investment. The decisions taken may affect the development and success of the project. Young people generally lack experience, so it is hard for them to find suitable startups. Numerous recommendation algorithms can be used to recommend suitable items to users based on diverse information, such as context and preferences. A recommendation system can help Saudi youth find suitable smart city and sustainable startups.

There are commonly known recommendation algorithms that can be applied or adapted to different fields [42, 43]. The most widespread recommender approaches are based on Collaborative Filtering (CF) [44, 45] and Content-Based (CB) filtering [46]. A business's prosperity and investment returns depend on the choice of startup. Hence, recommender approaches for startup projects have received increasing attention; they can benefit youth entrepreneurship and increase the country's economic benefits [47]. Most proposed recommendation approaches in the startup field use CF, CB, or hybrid methods [48]. To rank startups and recommend them to investment companies, Xu et al. [49] used collected data about investment companies, startup projects, and investment events. Kim et al. [47] suggested a framework that recommends prospective startups to enterprises based on technological similarity scores between patent abstracts and startup profile texts, collecting patent applications from the WISDOMAIN website and startup information from the "Crunchbase" database. Zhong et al. [50] proposed an integrated approach to recommend startup investments to investors by studying the investors' preferences and the expected returns and potential risks of the startups; the approach analyzes investment events collected from the "ITjuzi" platform.

Social network exploration has been employed in many fields with diverse goals. Many researchers have demonstrated that information from social networks can be exploited to improve the accuracy of recommendation systems [51] [52] [53]. However, social networks have never been exploited to recommend startup projects, even though users exchange a large quantity of information about startup projects on social networks. This information can be exploited in many domains, particularly in startup recommendation systems in the context of smart cities.

The methodology adopted in this study derives from studies and considerations of sustainable smart city services, social networks, startups, and recommendation systems. The following objectives guide the proposed approach:

(i) Provide opportunities to young people and encourage them to launch their own startup projects in the context of smart cities. (ii) Establish a startup recommendation system in the context of smart cities to guide young people in their choices. (iii) Take advantage of the wealth of information about startup projects on social networks to build such a recommendation system.

This research work focuses on the design and implementation of an approach for processing and analyzing tweets containing data about smart city startup projects, in order to provide recommendations to Saudi youth towards enhanced entrepreneurship and leadership. The proposed approach connects different research fields and suggests a recommendation system for smart city startups by mining tweets about smart cities. It combines a Twitter mining approach with an ontology-based recommendation technique to extract information from tweets, give recommendations, and provide useful information to Saudi youth, connecting them to smart city growth and engaging them in innovation projects. Figure 1 illustrates the main components of the proposed approach. The proposed approach, named "RecSPSC", comprises four main components: tweet extraction, tweet preprocessing, tweet representation, and the recommendation module. The first component collects public tweets through an application programming interface (API). In the second module, the collected tweets are preprocessed and passed to the tweet representation module. In this third module, the data are fed into a neural network (Word2Vec) to discover potential correlations between the different concepts encountered. The output is a vector of words representing the semantic similarity between these words.

Finally, the output of the previous step is used as input to the fourth module, which makes recommendations based on a startup-ontology. The ontology represents the concepts related to innovation and smart city service startups, as well as the skills associated with these projects. The following subsections detail each component of the proposed approach RecSPSC.

According to statistics [54, 55], as of April 2020, Twitter, the online social networking service, ranked among the most important social networks based on active users, with 386 million active users. Tweets are short messages exchanged between Twitter users. An author's tweets are distributed to their followers or subscribers, that is, individuals who have chosen to follow the author's messages. Consider the following example:

RT @BlueprintStats receives $20K from the Community Ideation Fund in the Velocities region. Read more about the sports technology #startup, CEO @hunterhawley5, and the company's plans looking forward.

Twitter messages have a maximum of 280 characters and can include the following: (i) The text, or message, to transmit in a given language. (ii) Hashtags: the symbol # followed by a set of relevant words or characters. Hashtags usually boost the audience of a tweet and help Twitter users search for tweets. (iii) The username, such as @username, indicating the author of the tweet. (iv) Images or videos; such multimedia tweets usually spread widely. (v) Optionally, a URL to an interesting link providing more details about the subject, since the tweet length is limited. Adding a link can also increase the audience of a tweet.

Apart from general tweets, there are mentions, replies, and retweets, which are tweets with particular features. A mention is a tweet containing the username of another Twitter account preceded by the @ symbol, for example, "Hello @TwitterSupport!". A reply is a reaction to another person's tweet. Mentions and replies are displayed to the recipient in their notifications tab. Only people who follow both the sender and the mentioned account will see these tweets in their home timeline (the timeline is the main page showing the tweets of the accounts to which the user has subscribed). For the sender, they are displayed on their profile page containing their public tweets. A retweet is a reposting of a tweet; retweets help share tweets with followers. Anyone can retweet someone else's tweets or their own. Users type "RT" at the beginning of a tweet to indicate that they are retweeting someone else's tweet.
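The tweet elements described above can be located with simple text patterns. The following is a minimal sketch, not part of RecSPSC: the regular expressions and the function name parse_tweet are illustrative assumptions and are simpler than Twitter's own entity rules (replies, for instance, cannot be recognized from the text alone and are not handled).

```python
import re

# Simplified, illustrative patterns (assumptions, not Twitter's official entity rules).
HASHTAG_RE = re.compile(r"#(\w+)")
MENTION_RE = re.compile(r"@(\w+)")
URL_RE = re.compile(r"https?://\S+")

def parse_tweet(text: str) -> dict:
    """Split a raw tweet string into the elements described in the text."""
    return {
        # A retweet is conventionally marked by a leading "RT".
        "is_retweet": text.startswith("RT "),
        "hashtags": HASHTAG_RE.findall(text),
        "mentions": MENTION_RE.findall(text),
        "urls": URL_RE.findall(text),
    }

# The example tweet quoted earlier in this section.
example = ("RT @BlueprintStats receives $20K from the Community Ideation Fund "
           "in the Velocities region. Read more about the sports technology "
           "#startup, CEO @hunterhawley5, and the company's plans looking forward.")
print(parse_tweet(example))
```

On the example tweet, the sketch flags a retweet, finds the hashtag startup, and finds the two mentioned accounts.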

In practice, Twitter has its own vocabulary: Twitter users create many abbreviated hashtags. The most commonly used are as follows:

#TT, Trending Topics; #FF, Follow Friday (used on Fridays to recommend Twitter accounts to the subscribers of your feed); #PP, Profile Picture; #NP, Now Playing (used to talk about what one is listening to: music, radio, etc.); #NW, Now Watching (used to talk about what one is watching: television, films, videos, etc.); #LT, Last Tweet (used when users refer to their previously posted tweet); #NSFW, Not Safe For Work (used to flag content that is inappropriate in a professional or public setting: indecent, vulgar, violent, or sexual).

In addition to the text message itself, a tweet can have more than 150 hidden attributes related to it, including a unique identifier for the tweet, the time when this tweet was created, the geographic location of the tweet, the number of times the tweet has been replied to, and the number of times the tweet has been retweeted.

To collect tweets, we used TAGS (Twitter Archiving Google Sheet) to fetch Twitter data covering three months (from 16 June 2020 to 16 June 2021), with hashtags in English, such as #smart city, #Sustainability, #investor, #startup, and #entrepreneur, and in Arabic, such as the hashtag for "pilot project". We also searched for tweets using terms without #, such as smart city, support project, investor, and ناشئة ("startup"). We collected 1,529,775 nonduplicated tweets in an Excel sheet (Table 1).
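The paper does not state how duplicate tweets were identified. As a minimal sketch, assuming two tweets count as duplicates when their text matches after lowercasing and collapsing whitespace (the function name deduplicate and this criterion are hypothetical), deduplication can be done in one pass while preserving the original order:

```python
def deduplicate(tweets):
    """Keep the first occurrence of each tweet, comparing normalized text.

    The normalization (lowercase, single spaces) is an assumed criterion.
    """
    seen = set()
    unique = []
    for text in tweets:
        key = " ".join(text.lower().split())  # case- and whitespace-insensitive key
        if key not in seen:
            seen.add(key)
            unique.append(text)
    return unique

tweets = ["Smart city  news", "smart city news", "Startup funding", "Startup funding"]
print(deduplicate(tweets))
```

A set of normalized keys keeps the check constant-time per tweet, which matters at the scale of over a million tweets.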

Although the words and hashtags used to collect the tweets were in English and Arabic, we obtained tweets in other languages, such as French, Italian, and Japanese. Table 2 gives a sample of extracted tweets with the hashtag "startup".

As explained in Algorithm 1, the preprocessing tasks on the extracted tweets are translation, cleaning, tokenization, and lemmatization. 

Although the aforementioned hashtags and terms used to collect the tweets were in English and Arabic, we got tweets in many other languages. For this reason, a translation of the tweets to English is required.

As explained in the previous section, apart from the tweet's text, many other elements are found, such as usernames, images, videos, URLs, mentions, replies, retweets, and abbreviations. These elements are not relevant to this study and need to be removed. Tweet cleaning consists of the following steps:

(i) Remove abbreviations, such as #TT, #FF, and #PP, as well as the abbreviation "RT" referring to retweets. (ii) Remove extra whitespace (more than one space between words). (iii) Remove usernames, which are portions of text starting with the symbol "@" and containing no spaces. (iv) Remove stop words (a, and, or, etc.).
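The four cleaning steps can be sketched as a single pass over the whitespace-separated tokens of a tweet. This is an illustration under assumptions: STOP_WORDS is a small hypothetical list standing in for NLTK's English stop-word corpus (which Algorithm 1 imports), and the abbreviation set is taken from the hashtags listed earlier.

```python
# Hypothetical, abbreviated stop-word list; Algorithm 1 imports NLTK's list instead.
STOP_WORDS = {"a", "an", "and", "or", "the", "to", "of", "in", "is", "for"}
# Abbreviation hashtags listed earlier in this section.
ABBREVIATIONS = {"#tt", "#ff", "#pp", "#np", "#nw", "#lt", "#nsfw"}

def clean_tweet(text: str) -> str:
    """Apply cleaning steps (i) to (iv) to one tweet."""
    kept = []
    for token in text.split():  # str.split() also absorbs repeated whitespace
        lower = token.lower()
        if lower == "rt" or lower in ABBREVIATIONS:
            continue  # step (i): retweet marker and abbreviation hashtags
        if token.startswith("@"):
            continue  # step (iii): usernames
        if lower in STOP_WORDS:
            continue  # step (iv): stop words
        kept.append(token)
    return " ".join(kept)  # step (ii): exactly one space between words

print(clean_tweet("RT  @founding   The #startup and the #FF smart city"))
```

Working token by token keeps each rule independent, so further rules (for example, URL removal) could be added as another check in the loop.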

Tokenization is an essential step in natural language processing. It identifies the basic units (words) to be processed in a given language [56]. Generally, tokenization relies on special delimiters or marks, such as spaces.
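The system itself uses NLTK's word_tokenize. As a dependency-free illustration of delimiter-based tokenization only, the hypothetical tokenize function below treats spaces and punctuation as delimiters and keeps a leading # so that hashtags stay recognizable:

```python
import re

# Runs of word characters, optionally preceded by '#'; everything else is a delimiter.
TOKEN_RE = re.compile(r"#?\w+")

def tokenize(text):
    """Split text on spaces and punctuation, keeping hashtags intact."""
    return TOKEN_RE.findall(text)

print(tokenize("Smart cities, #startup support!"))
```

Unlike word_tokenize, this sketch simply discards punctuation instead of returning it as separate tokens.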

Lemmatization is the process of converting a word into a normalized form; here, it consists of removing the suffix of a word [57, 58]. For instance, removing the suffixes of ranked and ranks yields the lemma rank. This step is very useful in many natural language processing tasks to reduce the vocabulary size. Table 3 shows samples of tweets after the preprocessing steps described above.
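The suffix-removal idea behind the rank example can be shown with a toy rule set. This is a simplified stand-in, strictly closer to stemming than to dictionary-based lemmatization: the suffix list and min_stem threshold are hypothetical, whereas the implemented system imports NLTK's WordNetLemmatizer.

```python
# Hypothetical suffix rules, checked in order; a real lemmatizer uses a dictionary.
SUFFIXES = ("ing", "ed", "es", "s")

def strip_suffix(word, min_stem=3):
    """Remove the first matching suffix, keeping at least min_stem characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word

for w in ["ranked", "ranks", "ranking", "rank", "cities"]:
    print(w, "->", strip_suffix(w))
```

The last case (cities becomes citi) shows why pure suffix stripping is often replaced by a dictionary-based lemmatizer, which maps cities to city.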

Word2vec [59, 60] is a neural network with an input layer, one hidden layer, and an output layer. It is an unsupervised machine learning method that learns automatically from neighboring words (context) in the input corpus and represents words as vectors according to their similarities. Word2Vec [61, 62] takes a text corpus as input and produces, for each word in the corpus, a vector capturing semantic and syntactic similarities; words sharing the same context are grouped together (Figure 2). Word2vec [47, 50] can be implemented with two approaches, the Continuous Bag-of-Words (CBOW) approach or the Continuous Skip-Gram (Skip-Gram) approach [63]. CBOW uses the context to predict a target word: it predicts the output word from the nearby words. Skip-Gram uses a word to predict its context: it predicts the words that appear around a given word (Figure 3).

Table 2 sample (Italian, translated): "See you on Tuesday, 7 July, at 18:30 at the online event organized by @founding, where we will try to give our small support to the #startup companies in the pre-seed phase taking part in the acceleration program. Registration to participate is free."

Algorithm 1 (preprocessing). Input: Excel file of tweets; Output: Excel file of lemmatized tweets.
    # import Python libraries
    import re
    import string
    from nltk.corpus import stopwords
    from nltk.tokenize import sent_tokenize, word_tokenize
    from nltk.stem import WordNetLemmatizer
    from googletrans import Translator
    import xlrd
    import xlsxwriter
    # translation of the Excel file to English
    column = 0
    for i in range(sheet.nrows):

The two approaches need a large corpus to capture relationships between words in their contexts, and they share the same neural network architecture and parameters. CBOW is generally preferred when the corpus is very large, since in that situation it trains faster and performs better than Skip-Gram.
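The difference between the two approaches is easiest to see in the training examples each one builds from the same sentence. The sketch below is illustrative (the function names and the four-word sentence are hypothetical): with a window of c words on each side, CBOW pairs a whole context with its target word, while Skip-Gram splits the same window into (input word, context word) pairs.

```python
def cbow_pairs(tokens, c):
    """CBOW: (context words, target word) pairs, window of c words per side."""
    pairs = []
    for t, target in enumerate(tokens):
        context = tokens[max(0, t - c):t] + tokens[t + 1:t + 1 + c]
        if context:
            pairs.append((context, target))
    return pairs

def skipgram_pairs(tokens, c):
    """Skip-Gram: (input word, one context word) pairs from the same windows."""
    pairs = []
    for context, target in cbow_pairs(tokens, c):
        pairs.extend((target, word) for word in context)
    return pairs

tokens = ["smart", "city", "startup", "funding"]
print(cbow_pairs(tokens, 1))
print(skipgram_pairs(tokens, 1))
```

CBOW produces one training example per position, while Skip-Gram produces one per context word; this is one reason CBOW is cheaper on very large corpora.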

In this paper, a CBOW-based approach is used. In CBOW, the iterative training process maximizes the average log probability of each word given its context, using the following equation [59, 60]:

$$\frac{1}{T}\sum_{t=1}^{T}\log p\left(w_t \mid w_{t-c}^{t+c}\right) \qquad (1)$$

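where $T$ is the corpus size; $w_t$ is the $t$-th word in the corpus; $c$ is the window size, that is, the number of words taken on each side of $w_t$; and $w_{t-c}^{t+c}$ denotes the words in the window of size $c$ surrounding $w_t$. The conditional probability $p(w_t \mid w_{t-c}^{t+c})$ is a softmax function, computed as follows:

$$p\left(w_t \mid w_{t-c}^{t+c}\right)=\frac{\exp\left(e'^{\top}_{w_t}\,\bar{e}_t\right)}{\sum_{w\in V}\exp\left(e'^{\top}_{w}\,\bar{e}_t\right)},\qquad \bar{e}_t=\frac{1}{2c}\sum_{\substack{-c\le j\le c\\ j\neq 0}} e_{w_{t+j}} \qquad (2)$$

In equation (2), $V$ denotes the vocabulary and $\bar{e}_t$ the average of the input embeddings of the $2c$ context words.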

Here, $e_w$ and $e'_w$ denote, respectively, the input and output embeddings of word $w$ in CBOW. Since tweets are collected using specific hashtags and terms (smart city, startup, sustainability, etc.), these words tend to appear together in the tweets, and the sought names of startup projects will have similar contexts. Word2Vec relies on the assumption that words sharing similar contexts also share semantic meaning. Consequently, Word2vec with the CBOW approach can be useful to represent the tweets' frequent words and their relationships. These relationships are expressed as similarities: the similarity of two words is the cosine similarity of the two vectors that represent them. Given vectors $v_1$ and $v_2$ related to two words, the cosine similarity is the scalar product of the two vectors divided by the product of their norms:

$$\cos(v_1, v_2)=\frac{v_1\cdot v_2}{\lVert v_1\rVert\,\lVert v_2\rVert} \qquad (3)$$
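The cosine similarity described above can be computed directly from two word vectors. In this minimal sketch the two-dimensional vectors are made-up values chosen to match the vector_size = 2 setting used in the experiments, not vectors produced by the system.

```python
import math

def cosine_similarity(v1, v2):
    """Scalar product of v1 and v2 divided by the product of their norms."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    return dot / (norm1 * norm2)

# Hypothetical 2-dimensional word vectors.
print(cosine_similarity([1.0, 0.0], [1.0, 1.0]))  # vectors at 45 degrees
print(cosine_similarity([2.0, 0.0], [5.0, 0.0]))  # same direction
```

Because the norms are divided out, only the direction of the vectors matters: vectors pointing the same way score 1, orthogonal ones score 0.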

In this study, the preprocessed unlabelled tweets are used as the training corpus. Each word is trained together with the words close to it in the corpus to create a training model. Afterward, in the prediction process, the trained model is used to calculate the distribution of words in the corpus. The output is a vector in which each element has a distribution of weights towards the other elements.
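The prediction side of CBOW follows the softmax of equation (2): average the input embeddings of the context words, score every vocabulary word with its output embedding, and normalize. This is a minimal sketch with tiny made-up embedding matrices and a full softmax; production Word2Vec implementations usually approximate it (for example, with hierarchical softmax), and the function name is hypothetical.

```python
import math

def cbow_probabilities(context_ids, input_emb, output_emb):
    """Softmax over the vocabulary given the averaged context embedding."""
    dim = len(input_emb[0])
    # Average the input embeddings e_w of the context words.
    h = [sum(input_emb[i][d] for i in context_ids) / len(context_ids) for d in range(dim)]
    # Score every vocabulary word with its output embedding e'_w.
    scores = [sum(o[d] * h[d] for d in range(dim)) for o in output_emb]
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 3-word vocabulary with 2-dimensional embeddings.
input_emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
output_emb = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
print(cbow_probabilities([0, 2], input_emb, output_emb))
```

Training adjusts both embedding matrices so that the true target word receives a high probability, which maximizes the objective in equation (1).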

The pseudocode of this step is sketched in Algorithm 2.

The algorithm represents words as vectors based on several parameters, namely vector size, word frequency, window size, and number of epochs (Algorithm 2, lines 22 to 25):

(i) vector_size: This parameter determines the number of neurons in the hidden layer of the network, that is, the dimension of the word vectors. With vector_size = 2, the terms can be represented in the plane; much larger values (hundreds) can be used. (ii) min_word_frequency: This parameter indicates the minimum frequency a term must have to be included in the calculation; it therefore keeps the most frequent words of the corpus and determines the vocabulary size. (iii) window_size: This parameter specifies the size of the neighborhood taken into account, that is, the number of words surrounding a given word that form its context. In the CBOW model, these context words are the inputs used to predict the target word; in the Skip-Gram model, they are the outputs predicted from the input word. (iv) epochs: The number of training iterations.
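The effect of min_word_frequency on the vocabulary can be illustrated by counting words and applying the threshold before training. This is a sketch of the general mechanism, not of H2O's internals (H2O's estimator exposes a comparable min_word_freq setting); the function name and toy tweets are hypothetical.

```python
from collections import Counter

def build_vocabulary(tokenized_tweets, min_word_frequency):
    """Keep only words occurring at least min_word_frequency times."""
    counts = Counter(word for tweet in tokenized_tweets for word in tweet)
    return {word: n for word, n in counts.items() if n >= min_word_frequency}

tweets = [["smart", "city", "startup"], ["startup", "investor"],
          ["smart", "startup", "city"], ["smart", "grid"]]
for threshold in (1, 2, 3):
    print(threshold, sorted(build_vocabulary(tweets, threshold)))
```

Raising the threshold from 1 to 3 shrinks the toy vocabulary from five words to two, mirroring the observation in the experiments that the minimum frequency drives the size of the generated vector.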

Unlike conventional recommender systems, which generally capture useful information such as the preferences of similar users on a social network, the recommender system in this study is based on an OWL ontology. The ontology allows the different concepts encountered in the tweets to be correlated and then gives suitable recommendations about smart city and sustainable startup projects for Saudi youth. The proposed approach RecSPSC relies on the Word2Vec vector from the previous module and on a startup-ontology created for this purpose.

Ontologies can be considered valuable tools for Semantic Web data representation, to organize knowledge and explore relationships. Ontologies have also been used in the context of smart cities: there are domain ontologies closely related to smart cities, as well as cross-domain ontologies that can support this field. For example, the FIPA (Foundation for Intelligent Physical Agents) device ontology specification is among the early ontologies dealing with devices [64], and OWL-Time [65] is an ontology providing a model for formalizing temporal entities. The SSN ontology can describe sensors in terms of measurement processes, capabilities, deployments, and observations [66]. The Stream Annotation Ontology (SAO) [67], an extension of the SSN ontology, allows the publication of derived data about IoT streams and can represent aggregated data.

Figure 3: Word2Vec CBOW and Skip-Gram models [53].

Algorithm 2 (Word2Vec vector generation). Output: word vectors of the tweets.
(1) # import Python libraries
(2) import h2o
(3) from h2o.estimators.word2vec import H2OWord2vecEstimator
(4) from nltk.tokenize import word_tokenize
(5) import xlrd
(6) column = 0
(7) # load the Excel lines into a list; each line is an element
(8) for i in range(sheet.nrows):
(9)     line = sheet.cell_value(i, column)
(10)    liste.append(line)
(11) # transform the list into a tokenized list
(12) liste = [word_tokenize(msg) for msg in liste]
(13) # initialization of H2O
(14) h2o.init()
(15) # creation of H2O Frames; each element of the list will be a frame

In this work, a new ontology, the startup-ontology, is proposed. It represents the concepts related to innovation, smart city, and sustainable startups, as well as the skills of future young entrepreneurs (Figure 4). In this study, the effort is concentrated on exploiting social media, namely text tweets. The recommendation technique combines an ontology-based method with the Word2Vec algorithm, which helps discover the relevant concepts related to startup projects in the context of smart cities. It also specifies the skills needed for the recommended projects (Algorithm 3).
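The combination of Word2Vec output and ontology lookup can be pictured as a filter: keep the words most similar to "startup" that the ontology knows as project concepts, and attach the skills the ontology associates with them. The sketch below is hypothetical throughout: a Python dictionary stands in for the OWL startup-ontology, and the project names, skills, and similarity scores are placeholders, not the contents of Table 6 or the authors' Algorithm 3.

```python
# Hypothetical stand-in for the OWL startup-ontology: project concept -> required skills.
# Skills and scores are placeholders, not results from the paper.
STARTUP_ONTOLOGY = {
    "driverless_cars": ["machine learning", "computer vision", "embedded systems"],
    "renewables": ["energy engineering", "project management"],
    "digitalmarketing": ["data analytics", "social media management"],
}

def recommend(similar_words, ontology, top_k=3):
    """Keep Word2Vec neighbours that name a project concept, best score first."""
    ranked = sorted(similar_words, key=lambda pair: pair[1], reverse=True)
    recommendations = []
    for word, score in ranked:
        if word in ontology:
            recommendations.append((word, score, ontology[word]))
            if len(recommendations) == top_k:
                break
    return recommendations

# Hypothetical (word, similarity-to-"startup") pairs from the Word2Vec module.
neighbours = [("renewables", 0.81), ("investor", 0.90), ("driverless_cars", 0.87)]
for project, score, skills in recommend(neighbours, STARTUP_ONTOLOGY):
    print(project, score, skills)
```

Words that are similar to "startup" but are not project concepts (here, investor) are filtered out by the ontology, which is the role the ontology plays in the approach.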

To validate the proposed RecSPSC approach, a system was implemented using Python v3.6 in the PyCharm Integrated Development Environment (IDE). In the first algorithm (Algorithm 1), the preprocessing step includes translation, cleaning of tweets, tokenization, and lemmatization. The Google translation service was used to translate the tweets (Algorithm 1 imports the googletrans library for this purpose). The remaining tasks (cleaning, tokenization, and lemmatization) were carried out with the Natural Language Toolkit (NLTK) libraries [68]. The second algorithm (Algorithm 2) generates the Word2Vec vector. The Word2vec algorithm is implemented with CBOW because of the large number of tweets, using the H2O library. H2O (https://www.h2o.ai/products/h2o/) is a Java platform that implements several machine learning algorithms; its functionalities can be accessed through an API, in particular from Python. The third algorithm (Algorithm 3) recommends smart city and sustainable startups that Saudi youth can choose, according to the skills and competencies needed for each project.

This module first required building and populating the ontology. The ontology was created with Protégé, the well-known free and open-source framework for building ontologies. The implementation of the recommendation task relied on the Owlready2 library for manipulating the ontology. The experiments were carried out by changing the Word2Vec parameter settings: vector_size, min_word_frequency, window_size, and epochs.

To evaluate the RecSPSC approach, the experiments were conducted with the 1,529,775 collected tweets, as mentioned in Section 5.1. The Word2Vec vector was generated with vector_size = 2, min_word_frequency = 3, window_size = 1, and epochs = 1000. Table 4 presents an extract of the resulting Word2Vec vector.

For each word in the table, V1 and V2 represent the similarity measure of the word with "startup" in two dimensions (vector_size = 2).

Many compound words appear concatenated, such as digitalmarketing and artificialintelligence, because in the tweets they are usually written as hashtags: #digitalmarketing and #artificialintelligence. The result can be represented graphically; Figure 5 plots the similarity between the data.

Additional experiments were performed by varying the parameters (vector_size, min_word_frequency, window_size, and epochs). Table 5 gives the size of the generated vector (Word2Vec size, in number of words). The Word2Vec size is influenced more by the minimum frequency than by the vector size, the window size, or the number of epochs. For example, with vector_size, min_word_frequency, window_size, and epochs set to 2, 3, 1, and 1000, respectively, the Word2Vec size is 3,065,876 words. Changing only the minimum frequency (min_word_frequency) from 3 to 5 reduces the Word2Vec size to 28,278. The generated Word2Vec vector is the input of the recommendation module, so the recommendations vary with the stated parameters: vector size, minimum word frequency, window size, and number of epochs. Figure 6 uses a word cloud to represent the recommended smart city service startups. The most frequently recommended projects relate to marketing strategy, driverless cars, renewables, lifestyle, and so forth.

In addition, for each recommended project, a list of needed skills is suggested to support users' choices according to their skills and competencies. Table 6 gives an extract of the recommended projects and the required skills.

To assess the performance of the proposed approach, precision, recall, and F-measure are computed. Precision is the proportion of correct smart city service startups (SCS) among all those recommended by the system; it measures the system's ability to find only relevant SCS. It is calculated as follows:

$$\text{Precision}=\frac{\text{number of correct SCS recommended}}{\text{total number of SCS recommended}} \qquad (4)$$

Recall relates the SCS recommended by the system to all those available in the corpus (collection of tweets). It is computed as follows:

$$\text{Recall}=\frac{\text{total recommended SCS}}{\text{total SCS in the corpus}} \qquad (5)$$

The F-measure combines precision and recall. It is calculated as follows:

$$F=\frac{2\times\text{Precision}\times\text{Recall}}{\text{Precision}+\text{Recall}} \qquad (6)$$

Table 7 summarizes the results of these performance measures. In row 2, only the minimum frequency is changed; in row 3, the number of epochs is modified; in rows 4, 5, and 6, the window size, vector size, and minimum frequency are changed, respectively.
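The three measures can be computed from the set of recommended SCS and a reference set of SCS present in the corpus. This sketch assumes the usual definition of recall (correctly recommended items over all relevant items), uses the harmonic-mean F-measure, and runs on hypothetical item labels rather than the paper's data.

```python
def evaluate(recommended, relevant_in_corpus):
    """Precision, recall, and F-measure of recommended SCS against a reference set."""
    recommended, relevant = set(recommended), set(relevant_in_corpus)
    correct = recommended & relevant
    precision = len(correct) / len(recommended) if recommended else 0.0
    recall = len(correct) / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# Hypothetical labels: 4 recommended, 6 relevant in the corpus, 3 in common.
print(evaluate(["a", "b", "c", "d"], ["a", "b", "c", "e", "f", "g"]))
```

Here precision is 0.75, recall 0.5, and the F-measure 0.6; a system that recommends few but correct items, as observed in Table 7, shows exactly this pattern of high precision and lower recall.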

Table 7 shows that performance (precision, recall, and F-measure) depends on the minimum word frequency: the lower the frequency, the better the performance. We attained the best precision of 0.95, recall of 0.66, and F-measure of 0.79 with a minimum word frequency of 3. This can be explained by the size of the generated Word2Vec vector, which is larger when the minimum frequency is low, so that the system generates plentiful candidate smart city service startups to feed the ontology-based recommendation module. We also observe that precision is higher than recall.

This is expected, since precision and recall usually trade off against each other: here, precision is gained at the expense of recall. In fact, the system missed many smart city service startups present in the corpus: many words could be genuine smart city service startups but are not detected because they occur fewer than 3 times in the corpus. The frequency parameter is commonly chosen to start from 3. With a frequency of 1 or 2, the system would generate an enormous vector in which many irrelevant words are taken into account, which would reduce precision. Table 7 also shows that when the frequency is fixed (at five, for example), changing the other parameters (window size or vector size) yields the same vector size and the same list of words. We conclude that the minimum word frequency is the most influential parameter in the generation of the Word2Vec vector.

We evaluated our approach, RecSPSC, against two widely deployed recommendation approaches: Collaborative Filtering (CF) [44, 45] and Content-Based (CB) filtering [46]. The comparison is based on the average precision, recall, and F-measure (see Table 8 and Figure 7). The results show that RecSPSC outperforms both approaches.

Table 7.

Row   Vector size   Min. frequency   Window size   Epochs   Generated vector size
1     2             3                1             1000     3065876
2     2             5                1             1000     28278
3     2             5                1             500      28278
4     2             5                2             500      28278
5     5             5                2             500      28278
6     5             6                1             500      24814

Figure 6: Word cloud representation of the recommended smart city service startups.
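The comparison with CF and CB relies on averaged precision, recall, and F-measure. The text does not specify the averaging procedure. The minimal Python sketch below assumes one common convention, macro-averaging: the three measures are computed per user and their means are then taken. The helper names and the per-user data are hypothetical.

```python
from statistics import mean


def prf(recommended, relevant):
    """Set-based precision, recall and F-measure for one user."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)
    p = hits / len(recommended) if recommended else 0.0
    r = hits / len(relevant) if relevant else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def macro_average(evaluations):
    """Mean of per-user precision, recall and F-measure."""
    scores = [prf(recommended, relevant) for recommended, relevant in evaluations]
    return tuple(mean(score[i] for score in scores) for i in range(3))


# Hypothetical per-user evaluations: (recommended items, relevant items).
evaluations = [
    (["a", "b"], ["a", "b", "c", "d"]),  # P = 1.0,  R = 0.5
    (["a", "b", "c", "d"], ["a"]),       # P = 0.25, R = 1.0
]
p, r, f = macro_average(evaluations)
print(f"P={p:.3f} R={r:.3f} F={f:.3f}")
```

For these two hypothetical users, the mean per-user F-measure is about 0.53. Applying the F-measure formula to the averaged precision (0.625) and recall (0.75) would give about 0.68 instead, so any reported average depends on the convention used.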

Startups represent innovation as well as new business opportunities. Therefore, great attention is paid to startups in smart cities. They contribute to the deployment of new products, services, and businesses, which in turn improve quality of life and a country's overall economy. This study proposes a novel computational approach for recommending smart city startups by identifying innovation projects from the smart city perspective.

In this approach, the tweets are first preprocessed. The Word2Vec algorithm is then applied; it uses a neural network model to learn word associations from the preprocessed tweets. From this model, semantically similar words are identified on the basis of cosine similarity and collected in a vector. The vector indicates the level of semantic similarity between the words. In this study, it contains words similar to the names of smart city startups.
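The similarity step can be sketched in plain Python: compute the cosine similarity between word vectors, then rank the other words by that score. The 3-dimensional embeddings are hypothetical toy values; learned Word2Vec vectors usually have far more dimensions. The `most_similar` helper is only a stand-in, similar in spirit to gensim's `KeyedVectors.most_similar`, and is not the authors' code.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(word, embeddings, topn=3):
    """Rank every other word by cosine similarity to `word`, highest first."""
    target = embeddings[word]
    scored = [
        (other, cosine_similarity(target, vector))
        for other, vector in embeddings.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


# Hypothetical toy embeddings; real vectors are learned from the tweet corpus.
embeddings = {
    "startup":      [0.9, 0.1, 0.3],
    "entrepreneur": [0.8, 0.2, 0.35],
    "telemedicine": [0.3, 0.9, 0.1],
    "parking":      [0.1, 0.2, 0.95],
}
print(most_similar("startup", embeddings, topn=2))
```

Cosine similarity ignores vector length and compares only direction. For that reason it is the usual choice for comparing word embeddings.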

Finally, recommendations are produced from the Word2Vec vector to encourage users to focus on startup projects related to smart cities. In this study, the recommendations are based on an OWL ontology created for this purpose. In addition, a list of the required skills and competencies is suggested for each recommended project to support the user's choice. The attained precision, recall, and F-measure are 95%, 66%, and 79%, respectively, which is very encouraging.

Future research will focus on further enhancements to the proposed approach. The first main direction concerns the Word2Vec algorithm. Word2Vec works with tokens (single words) and generates vectors of tokens with similar characteristics. However, a group of words, usually called a "phrase," can carry a special meaning. For instance, "innovation projects," "support projects," "digital marketing," "machine learning," and "web development" can be relevant smart city startup projects in this research. Upcoming work will develop a method that pays special attention to text segmentation, detecting sentence boundaries and phrase boundaries (text chunking). It will then propose a new algorithm, a Phrase2Vec version, and evaluate its influence on the approach.
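For the planned phrase-level extension, one established starting point is the count-based bigram scoring of Mikolov et al. (2013). This is not necessarily the authors' intended Phrase2Vec method. Under this scheme, two adjacent words are merged into one token when (count(a b) - delta) / (count(a) * count(b)) exceeds a threshold. gensim's `Phrases` model offers a scorer of this kind. The sketch below assumes hypothetical tokenized tweets, delta = 1, and a threshold of 0.1. The merged tokens (for example `machine_learning`) could then be passed to Word2Vec like ordinary words.

```python
from collections import Counter


def detect_phrases(tokenized_texts, delta=1, threshold=0.1):
    """Score adjacent word pairs with (count(a b) - delta) / (count(a) * count(b)).

    `delta` discounts pairs that occur only rarely; pairs scoring above
    `threshold` are returned as phrases with their scores.
    """
    unigrams = Counter(token for text in tokenized_texts for token in text)
    bigrams = Counter(
        (text[i], text[i + 1])
        for text in tokenized_texts
        for i in range(len(text) - 1)
    )
    phrases = {}
    for (a, b), count in bigrams.items():
        score = (count - delta) / (unigrams[a] * unigrams[b])
        if score > threshold:
            phrases[(a, b)] = score
    return phrases


def merge_phrases(text, phrases):
    """Replace each detected pair in a token list with one underscore-joined token."""
    merged, i = [], 0
    while i < len(text):
        if i + 1 < len(text) and (text[i], text[i + 1]) in phrases:
            merged.append(text[i] + "_" + text[i + 1])
            i += 2
        else:
            merged.append(text[i])
            i += 1
    return merged


# Hypothetical preprocessed tweets.
tweets = [
    ["machine", "learning", "startup"],
    ["digital", "marketing", "startup"],
    ["machine", "learning", "in", "health"],
    ["digital", "marketing", "agency"],
    ["health", "startup", "ideas"],
]
phrases = detect_phrases(tweets)
print(phrases)
print(merge_phrases(["new", "machine", "learning", "startup"], phrases))
```

Only pairs that recur score above zero when delta = 1. This is how the discount keeps accidental one-off word pairs from being treated as phrases.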

In this research, the ontology population is limited. Automatic or semiautomatic ontology population is the second direction for future work, since the richness of the ontology's instances can influence the results.
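A minimal sketch of the matching step shows why ontology population matters. A candidate term from the Word2Vec vector yields a recommendation only if it matches an instance in the ontology. The paper's ontology is an OWL model. Here, as a simplifying assumption, it is reduced to a Python dictionary of hypothetical `StartupProject` instances with their required skills. The `recommend` function is likewise hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class StartupProject:
    """Hypothetical stand-in for an ontology instance of a startup project."""
    name: str
    domain: str
    skills: list = field(default_factory=list)


# Hypothetical, deliberately small ontology population.
ONTOLOGY = {
    "telemedicine": StartupProject("telemedicine", "smart health",
                                   ["health informatics", "web development"]),
    "e-parking": StartupProject("e-parking", "smart mobility",
                                ["IoT", "mobile development"]),
}


def recommend(candidate_terms, ontology):
    """Return (project name, required skills) for each candidate that matches
    an ontology instance; unmatched candidates are dropped."""
    return [
        (ontology[term].name, ontology[term].skills)
        for term in candidate_terms
        if term in ontology
    ]


# Candidate terms, e.g. taken from the generated Word2Vec vector.
candidates = ["telemedicine", "e-parking", "smart-lighting"]
print(recommend(candidates, ONTOLOGY))
```

Here "smart-lighting" is dropped because the ontology has no matching instance. Adding that instance would allow it to be recommended, which is the motivation for automatic or semiautomatic population.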

No data were used to support this study.

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Smartainability: a methodology for assessing the sustainability of the smart city

Are smart city projects catalyzing urban energy sustainability?

Using multivariate statistical methods to assess the urban smartness on the example of selected European cities

Smart city pilot projects: exploring the dimensions and conditions of scaling up

How do cities promote urban sustainability and smartness? An evaluation of the city strategies of six largest Finnish cities

Smart cities in the making

The importance of internal alignment in smart city initiatives: an ecosystem approach

Clustering smart city services: perceptions, expectations, responses

Social impacts and control in the smart home

Demand side management for smart grid based on smart home appliances with renewable energy sources and an energy storage system

A sidechain-based decentralized authentication scheme via optimized two-way peg protocol for smart community

Smart governance in the context of smart cities: a literature review

IoT-based smart building environment service for occupants' thermal comfort

Smart manufacturing

Climate-smart agriculture: what is it good for?

Applicability of WSN and biometric models in the field of healthcare

An internet of health things-driven deep learning framework for detection and classification of skin cancer using transfer learning

IoT enabled technology in secured healthcare: applications, challenges and future directions

A novel smart healthcare design, simulation, and implementation using healthcare 4.0 processes

Smart healthcare: making medical care more intelligent

Development of smart healthcare monitoring system in IoT environment

Measuring and preventing COVID-19 using the SIR model and machine learning in smart health care

Research on the application of blockchain in smart healthcare: constructing a hierarchical framework

The missing parts from social media-enabled smart cities: who, where, when, and what?

Sensing service architecture for smart cities using social network platforms

Social media use in healthcare: a systematic review of effects on patients and on their relationship with healthcare professionals

Social media-enabled healthcare: a conceptual model of social media affordances, online social support, and health behaviors and outcomes

Social media based surveillance systems for healthcare using machine learning: a systematic review

Discovering thematic change and evolution of utilizing social media for healthcare research

Social medicine: twitter in healthcare

Smart city communication via social media: analysing residents' and visitors' engagement

A soft computing approach to violence detection in social media for smart cities

Real-time traffic event detection from social media

Social network sustainability for transport planning with complex interconnections

Emotion identification in twitter messages for smart city applications

An Arabic social media based framework for incidents and events monitoring in smart cities

TwitterSensing: an event-based approach for wireless sensor networks optimization exploiting social media in smart city applications

Social networks research for sustainable smart education

Tweeting and mining OECD-related microcontent in the post-truth era: a cloud-based app

The Social Development Bank

Almaal.org. Small business support

Recommender systems challenges and solutions survey

Knowledge-based recommendation: a review of ontology-based recommender systems for e-learning

Recommender systems

A survey of collaborative filtering based social recommender systems

A review of content-based and context-based recommendation systems

Recommendation of startups as technology cooperation candidates from the perspectives of similarity and potential: a deep learning approach

A venture capital recommendation algorithm based on heterogeneous information network

Recommending investors for new startups by integrating network diffusion and investors' domain preference

Which startup to invest in: a personalized portfolio strategy

RsRS: ridesharing recommendation system based on social networks to improve the user's QoE

A novel recommendation system in location-based social networks using distributed ELM

Cross-platform dynamic goods recommendation system based on reinforcement learning and social networks

Text mining: open source tokenization tools-an analysis

Highly language-independent word lemmatization using a machine-learning classifier

A gradient boosting-seq2seq system for latin pos tagging and lemmatization

Word2Vec

Word2vec, pro machine learning algorithms

Weighted word2vec based on the distance of words

The enhancement of TextRank algorithm by using word2vec and its application on topic extraction

Using part of speech tagging for improving Word2vec model

FIPA Nomadic Application Support Specification, Foundation for Intelligent Physical Agents

The modular SSN ontology: a joint W3C and OGC standard specifying the semantics of sensors, observations, sampling, and actuation

A knowledge-based approach for real-time iot data stream annotation and processing

Natural Language Processing with Python: Natural Language Processing Using NLTK
