key: cord-0560627-47kge10i
authors: Pougué-Biyong, John; Gupta, Akshay; Haghighi, Aria; El-Kishky, Ahmed
title: Learning Stance Embeddings from Signed Social Graphs
date: 2022-01-27
journal: nan
DOI: nan
sha: f3137d1fb2aa261d596095f7f4e8b574c944e29a
doc_id: 560627
cord_uid: 47kge10i

A key challenge in social network analysis is understanding the position, or stance, of people in the graph on a large set of topics. While past work has modeled (dis)agreement in social networks using signed graphs, these approaches have not modeled agreement patterns across a range of correlated topics. For instance, disagreement on one topic may make disagreement (or agreement) more likely for related topics. We propose the Stance Embeddings Model (SEM), which jointly learns embeddings for each user and topic in signed social graphs with distinct edge types for each topic. By jointly learning user and topic embeddings, SEM is able to perform cold-start topic stance detection, predicting the stance of a user on topics for which we have not observed their engagement. We demonstrate the effectiveness of SEM using two large-scale Twitter signed graph datasets that we open-source. One dataset, TwitterSG, labels (dis)agreements using engagements between users via tweets to derive topic-informed, signed edges. The other, BirdwatchSG, leverages community reports on misinformation and misleading content. On TwitterSG and BirdwatchSG, SEM shows a 39% and 26% error reduction respectively against strong baselines.

Signed graphs (or networks) have been used to model support and opposition between members of a group of people, or community, in settings ranging from understanding political discourse in Congress [22] to identifying polarization in social networks [11]. In such graphs, each node represents an individual in the community, a positive (+) edge indicates agreement between two community members and a negative (−) one denotes disagreement. For instance, Epinions [11] is a who-trusts-whom graph extracted from the now-defunct online review site, where each edge represents whether one member has rated another as trustworthy (+) or not (−). The 108th US Senate signed graph [16] represents political alliances (+) or oppositions (−) between congressional members across 7,804 bills in the 108th U.S. Congress. Past work has leveraged signed graphs and insights from social psychology [5] in order to better understand and predict patterns of community interaction [11, 16].

Recent research in text-based stance detection has shown the benefits of capturing implicit relationships between topics, especially in cases where there are many topics at stake, most of them with little training data [2, 3, 12]. One shortcoming of traditional signed graph analysis is that it reduces the interaction between any two individuals to a binary value of agreement (+) or disagreement (−). Interactions in communities may be much more complex and change depending on the underlying context. In the U.S. Senate, two senators may agree on bills related to climate change, but differ on taxation policy bills. In a sports community, two French football fans may support rival clubs, but will generally both support the national team at the World Cup. Most communities will have several different aspects, or topics, of discourse that have rich structure and dynamics within a community. For instance, in the French football fan example, it is very likely for someone to support the national team if we have observed support for a local club. This example and others highlight the value of modeling community stance across a range of topics [15].

In this work, we use signed topic graphs to represent (dis)agreement across topics of discourse within a community. Each edge represents a binary agreement value ({+, −}) with respect to a single topic; the inventory of topics is assumed to be fixed and finite, but varies across applications.

Our proposed method, the Stance Embeddings Model (SEM), detailed in Section 3, leverages an extension of the node2vec algorithm [6] to signed topic graphs to learn embeddings for nodes as well as for topics. Learning member (node) and topic embeddings jointly enables us to represent topic-informed stance embeddings for each member, which can accurately predict member agreement across community topics (Section 5.4). This allows us to do zero-shot topic-stance prediction for a member, even when we have not observed past engagement from the member on a topic (Section 5.5). Just as importantly, it allows us to capture implicit relationships between topics (Section 5.7).

We apply and evaluate our approach on two Twitter-based signed social graphs that we open-source alongside this work (see Section 4). For both of these datasets, we represent online interactions as a signed topic graph, where each node is a Twitter user and each edge represents an interaction between users on a given topic. The TwitterSG dataset (Section 4.1) consists of ∼13M interactions (edges) between ∼750k Twitter users (nodes), spanning 200 sports-related topics; each edge represents one user replying to another user's Tweet or explicitly using the 'favorite' UI action (AKA, a like). This graph is ∼6x larger than the Epinions graph, which, to the best of the authors' knowledge, is the largest publicly available signed social graph. The BirdwatchSG dataset instead leverages Birdwatch annotations that indicate whether a user finds information on a Tweet to be misinformation or misleading, or whether notes are rated helpful in clarifying facts (see Section 4.2 for details).

The core contributions of this paper are: 

Our work falls at the intersection of the literature on shallow graph (or network) embeddings and signed graph embeddings.

Shallow graph embeddings. Shallow graph embedding methods learn node embeddings when node features are unavailable [1, 6, 17, 18, 21]. They leverage only the structure of the graph, i.e. its adjacency matrix. The two most popular variants are node2vec [6] and its special case DeepWalk [18]. Node2vec and DeepWalk build on top of word2vec [13, 14], a word embedding technique in natural language processing. Node2vec generates second-order random walks on unsigned graphs, and learns node embeddings by training a skip-gram with negative sampling (SGNS) [14] to predict the surroundings of the input node. The learnt embeddings are such that nodes close in the graph are close in the embedding space. However, node2vec is not adapted to signed graphs because it is based on the homophily assumption (connected nodes should lie close in the embedding space), whereas in signed graphs agreeing nodes should be closer while disagreeing nodes should be farther apart.

Signed graph embeddings. To overcome the homophily limitation, SNE [27] generates uniform random walks on signed graphs (by ignoring the weights on the edges) and replaces the skip-gram model by a log-bilinear model. The model predicts the representation of a target node given its predecessors along a path. To capture the signed relationships between nodes, two signed-type vectors are incorporated into the log-bilinear model. SIDE [8] generates first-order random walks and defines a likelihood function composed of a signed proximity term to model the social balance theory, and two bias terms to mimic the preferential attachment theory. SiNE [24] is a deep neural network-based model guided by social theories. SiNE maximises the margin between the embedding similarity of friends and the embedding similarity of foes. StEM [20] is a deep learning method aiming at learning not only representations of nodes of different classes (e.g. friends and foes) but also decision boundaries between opposing groups. Hence, unlike other methods, e.g. SiNE, which are distance-based (and thus use only local information), StEM attempts to incorporate global information.

Overall, to learn node embeddings, SNE and SIDE generate random walks but do not use the skip-gram in node2vec to learn parameters, while SiNE and StEM use a margin loss function and decision boundary method respectively.

Our model. Our approach extends the traditional skip-gram objective of word2vec and node2vec to signed graphs. We facilitate this by ensuring that each training example is constructed via a sign-informed random walk. Leveraging the skip-gram architecture not only provides scalability advantages, but also the flexibility to extend the embedding process to effectively utilize edge attributes. While all mentioned related works are edge-attribute agnostic, our proposed method can leverage edge attributes such as topics in the form of contextual topic embeddings. We argue that incorporating edge attributes, such as topics, in the embedding process can benefit understanding stance in signed social-network interactions.

In many real-world social systems, relations between two nodes can be represented as signed graphs with positive and negative links. Early signed graphs are derived from observations in the physical world such as the relationships among Allied and Axis powers during World War II [4] .

The development of social media has enabled the mining of larger, signed social graphs. Typically, signed graphs in social media represent relations among online users where positive links indicate friendships, trust, and like, whereas negative links indicate foes, distrust, and dislike. Signed graphs in social media often have thousands of users and millions of links, and they are usually sparser than physical world signed graphs.

Existing datasets. Epinions, Slashdot [11], Wiki-Rfa [25], BitcoinOtc and BitcoinAlpha [9, 10] are the largest and most widely used signed graphs for benchmarking signed graph embedding methods. Epinions.com was a product review site where users could write reviews for various products with rating scores from 1 to 5. Other users could rate the helpfulness of reviews. Slashdot is a technology news platform on which users can create friend and foe links with other users. For a Wikipedia editor to become an administrator, a request for adminship (RfA) must be submitted, and any Wikipedia member may cast a supporting, neutral, or opposing vote. This induces a directed, signed graph, Wiki-Rfa [25], in which nodes represent Wikipedia members and edges represent votes. BitcoinOtc and BitcoinAlpha [9, 10] are who-trusts-whom graphs of users who trade using Bitcoin on online platforms. Since Bitcoin users are anonymous, there is a need to maintain a record of users' reputation to prevent transactions with fraudulent and risky users. Platform members can rate other members positively or negatively.

We curate and open-source two real-world signed social graphs with attributed (topics) edges. TwitterSG is a signed, edge-attributed, multi-edge, directed graph with 12,848,093 edges between 753,944 Twitter users (nodes), spanning 200 sports-related topics: teams, sports, players, managers, and events (e.g. Los Angeles Lakers, Basketball, Cristiano Ronaldo, Zinedine Zidane, Olympics). TwitterSG contains ∼6x more nodes than Epinions, the largest publicly available signed graph. BirdwatchSG is a signed, edge-attributed, multi-edge, directed graph with 441,896 edges between 2,987 Birdwatch participants (nodes) based in the USA, and spanning 1,020 diverse topics prone to misleading content and/or partisanship (e.g. COVID-19, US Presidential Elections). Table 1 provides statistics on the datasets.

Let $G = (V, E)$ be a signed (un)directed topic graph: each edge $e \in E$ has a topic $t_e$, and a sign $w_e$ of − or +. We use $T$ to denote the finite set of topics $t$. Note that there can be multiple edges between users corresponding to different topic interactions. We define $G_t = (V_t, E_t)$ as the subgraph of $G$ which contains all the edges with topic $t$. We aim to learn a node mapping function $f_V : V \to \mathbb{R}^d$, and a topic embedding function $f_T : T \to \mathbb{R}^d$. Our approach will implicitly define embeddings for each edge using learned node and topic embeddings. For an edge $e = (u, v)$ with topic $t$, we combine the source embedding and topic embedding using $\sigma(f_V(u), f_T(t))$; see Section 5.1 for the choices of $\sigma$ considered. This transformed source node embedding is combined with the target node embedding using an operator $\Phi(\cdot, \cdot)$ from Table 2. We evaluate these edge embeddings against other signed graph edge embeddings in Section 5, but for the remainder of this section, we detail how we learn the node and topic embedding functions $f_V$ and $f_T$.

As we apply the skip-gram objective to graph data via random walks, our work can be considered an extension to node2vec [6] . However, while node2vec only operates on unsigned homogeneous graphs, our embedding approach naturally incorporates signed edges as well as edge attributes such as topics.

Given an input signed topic graph, we outline how we create training examples to learn node and topic embeddings using the skip-gram objective.

Random walks on edge-attributed graphs. We first iterate through each topic-specific subgraph $G_t$, and mask the edge weights, yielding a topic graph $G'_t = (V_t, E'_t)$ where all edges are unsigned and unweighted. We follow the sampling procedure of [6], and define a second-order random walk with two parameters $p$ and $q$ that guide the walker on $G'_t$. Let us consider a walker that just traversed edge $(s, v)$ and now resides at node $v$. The walker next decides to walk to edge $(v, x)$ with the unnormalised transition probability $\pi_{vx}$:

$$\pi_{vx} = \begin{cases} 1/p & \text{if } d_{sx} = 0 \\ 1 & \text{if } d_{sx} = 1 \\ 1/q & \text{if } d_{sx} = 2 \end{cases} \qquad (1)$$

where $d_{sx}$ is the shortest path distance between nodes $s$ and $x$. $p$ and $q$ are the return and in-out parameters respectively, and control how fast the walk explores and leaves the neighborhood of the starting node $u$. For example, $q < 1$ means the walker is more inclined to visit nodes which are further away from node $s$.

For each node $u$ in $G'_t$, we simulate $r$ random walks of fixed length $l$ starting at $u$. At every step of the walk, sampling is done based on the transition probabilities defined in Eq. 1.
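The walk procedure above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the graph is a hypothetical undirected adjacency dictionary `adj`, the function names are ours, and the first step of a walk (which has no previous node) is taken uniformly at random.

```python
import random

def transition_weights(adj, prev, cur, p, q):
    """Unnormalised node2vec weights for moving from `cur` to each neighbour."""
    weights = {}
    for nxt in adj[cur]:
        if nxt == prev:          # distance 0 from the previous node: return step
            weights[nxt] = 1.0 / p
        elif nxt in adj[prev]:   # distance 1: stays close to the previous node
            weights[nxt] = 1.0
        else:                    # distance 2: moves away from the previous node
            weights[nxt] = 1.0 / q
    return weights

def random_walk(adj, start, length, p, q, rng):
    """Simulate one second-order walk with `length` nodes starting at `start`."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbours = sorted(adj[cur])
        if not neighbours:
            break
        if len(walk) == 1:       # no previous node yet: uniform first step (assumption)
            walk.append(rng.choice(neighbours))
            continue
        w = transition_weights(adj, walk[-2], cur, p, q)
        walk.append(rng.choices(neighbours, weights=[w[n] for n in neighbours])[0])
    return walk

# Toy unsigned, unweighted topic graph (signs already masked).
adj = {"a": {"b", "c"}, "b": {"a", "c", "d"}, "c": {"a", "b"}, "d": {"b"}}
weights = transition_weights(adj, prev="a", cur="b", p=1.5, q=0.5)
walk = random_walk(adj, "a", length=5, p=1.5, q=0.5, rng=random.Random(0))
```

With $q < 1$, the neighbour `"d"`, which is two hops from the previous node, receives the largest weight, matching the outward-exploring behaviour described above.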

Creating signed contexts. In node2vec, the contexts of a source node are the nodes surrounding it in the walks. The context vocabulary is thus identical to the set of nodes $V$. This effectively embeds connected nodes close to each other in the embedding space. However, in signed graphs, agreeing nodes (linked with positive edges) should be embedded in close proximity while disagreeing nodes (linked with negative edges) should be farther away. We incorporate these insights into our skip-gram objective.

Unlike node2vec, in which a source node predicts a context node, we propose to predict both sign and node as contexts. In other words, we predict not only the context node, but also whether the source node agrees or disagrees with it on a given topic. While the context node is determined by the random walk, there may not be a signed edge between a source node and a context node for that topic. To infer whether or not a source and context node agree on some topic, we apply Heider's social balance theory [5].

Let $t$ be an arbitrary topic, and consider the graph $G_t$ depicted in Figure 1. Assuming a random walk sampled via the procedure described above, we have a sequence of nodes. Using a window of size $k$ around a source node $v_0$, $2k$ context nodes are produced from the walk, $k$ before $v_0$ and $k$ after: $(v_{-k} \ldots v_0 \ldots v_{+k})$. In addition we compute the inferred sign, $\mathrm{sign}(v_0, v_i)$, between our source node and the $i$-th context node as follows:

$$\mathrm{sign}(v_0, v_i) = \prod_{j=1}^{i} w_{v_{j-1} v_j} \ \text{ if } i > 0, \qquad \mathrm{sign}(v_0, v_i) = \prod_{j=i}^{-1} w_{v_j v_{j+1}} \ \text{ if } i < 0 \qquad (2)$$

where $w_{uv}$ is the weight, +1 or −1, between nodes $u$ and $v$. As seen in Equation 2, we can leverage Heider's social balance theory to assign each context node a sign with respect to the source node. In simple terms we have three rules: (i) the friend of my friend is my friend, (ii) the friend of my enemy (or the enemy of my friend) is my enemy, and (iii) the enemy of my enemy is my friend. Our approach applies these rules to (dis)agreements over topics and, as such, we can compute the (dis)agreement sign between the source node and a context node simply by multiplying the edge signs between the source and context along the random walk between them. By incorporating these signed (dis)agreements with the source node alongside each context node, our skip-gram objective must predict not only the context node, but also whether or not the context node agrees with the source node on a topic. As such, node proximity and stance both influence a node's embedding.
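The sign inference described above (multiplying edge signs along the walk) can be sketched as follows. This is an illustrative sketch only: the function name `signed_contexts`, the `sign` dictionary keyed by unordered node pairs, and the toy walk are assumptions made for the example, not part of the original method description.

```python
def signed_contexts(walk, sign, center, k):
    """Return (context_node, inferred_sign) pairs for walk[center], window size k.

    `sign` maps frozenset({u, v}) to +1 or -1 for each edge of the topic graph
    (an undirected lookup is assumed here for simplicity).
    """
    contexts = []
    # Nodes after the source: accumulate the product of signs walking forward.
    s = 1
    for i in range(center + 1, min(center + k, len(walk) - 1) + 1):
        s *= sign[frozenset((walk[i - 1], walk[i]))]
        contexts.append((walk[i], s))
    # Nodes before the source: accumulate the product of signs walking backward.
    s = 1
    for i in range(center - 1, max(center - k, 0) - 1, -1):
        s *= sign[frozenset((walk[i], walk[i + 1]))]
        contexts.append((walk[i], s))
    return contexts

# Toy walk a - b - c - d with signs (+, -, -).
sign = {
    frozenset(("a", "b")): +1,
    frozenset(("b", "c")): -1,
    frozenset(("c", "d")): -1,
}
walk = ["a", "b", "c", "d"]
from_a = signed_contexts(walk, sign, center=0, k=3)
from_c = signed_contexts(walk, sign, center=2, k=1)
```

In the toy walk, `a` agrees with `b`, disagrees with `c`, and is inferred to agree with `d`, since `d` is the opponent of an opponent.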

The training examples are composed of a source node $u$, a topic $t$, and a set of contexts $C_t(u)$ where contexts consist of (node, sign) pairs. We associate embedding vectors $W_u$, $W'_c$, and $W_t$ with the source, context (node-sign pair), and topic respectively; these vectors are parameters to be learned. In Fig. 2, we visualize this topic-aware skip-gram architecture as a generalisation of the original skip-gram neural network architecture.

To learn these vectors, we generalise the skip-gram objective to incorporate topic information as follows:

$$\max_{W, W'} \sum_{t \in T} \sum_{u \in V_t} \sum_{c \in C_t(u)} \left[ W'_c \cdot \sigma(W_u, W_t) - \log Z_{u,t} \right] \qquad (3)$$

where $Z_{u,t} = \sum_{c' \in \mathcal{C}} \exp(W'_{c'} \cdot \sigma(W_u, W_t))$ is a sum over the context vocabulary $\mathcal{C}$ of (node, sign) pairs, with $\sigma(\cdot, \cdot)$ an operation over topic and node embedding vectors (e.g. addition of both vectors). As the partition function $Z_{u,t}$ is expensive to compute, we approximate it using negative sampling [14]. Moreover, the sign in any context of Equation 3 is derived from Equation 2.
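To make the negative-sampling approximation concrete, the sketch below computes the standard skip-gram negative-sampling loss [14] for a single (source, topic, context) example. It is a sketch under assumptions: the combination function is passed in as `combine` (element-wise addition is used in the toy call), contexts are (node, sign) pairs whose vectors are looked up elsewhere, and all names are hypothetical rather than taken from the authors' code.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def log_sigmoid(x):
    # Numerically stable log(1 / (1 + exp(-x))).
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))

def sgns_loss(node_vec, topic_vec, pos_ctx_vec, neg_ctx_vecs, combine):
    """Negative-sampling loss for one topic-aware training example.

    The input representation is combine(node_vec, topic_vec); the positive
    context vector belongs to an observed (node, sign) pair and the negative
    ones to sampled (node, sign) pairs.
    """
    h = combine(node_vec, topic_vec)
    loss = -log_sigmoid(dot(pos_ctx_vec, h))
    for neg in neg_ctx_vecs:
        loss -= log_sigmoid(-dot(neg, h))
    return loss

def add(a, b):
    return [x + y for x, y in zip(a, b)]

# Untrained (all-zero) parameters: every term contributes log(2).
baseline = sgns_loss([0.0, 0.0], [0.0, 0.0], [0.0, 0.0],
                     [[0.0, 0.0], [0.0, 0.0]], add)
# Positive context aligned with the input, negative context opposed to it.
fitted = sgns_loss([1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [[-2.0, -2.0]], add)
```

The loss falls as the positive context vector aligns with the combined node-topic vector and the sampled negatives point away from it.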

In this section, we describe two new social-network signed topic graphs that we curate and open-source alongside our work. Both datasets are fully anonymized without personally identifiable information.

Twitter Signed Graph, or TwitterSG, is a signed, directed, edge-attributed graph of users, drawn from Twitter interactions. A positive-signed edge exists from user $A$ to user $B$ if user $A$ liked a tweet posted by user $B$. A negative-signed edge exists from user $A$ to user $B$ if user $A$ expressed opposition towards user $B$'s tweet, e.g., by replying "I disagree with you". The topic of an edge from user $A$ to user $B$ is determined by the topic of user $B$'s tweet, also called the target tweet. Tweet topics were inferred with a topic classifier provided and used in production by Twitter; we restrict interactions in TwitterSG to sports-related topics (e.g., sports teams, players, managers, or events). The tweets related to these interactions were published between 20th May (Ice Hockey World Championships) and 8th August 2021 (closing date of the 2020 Tokyo Olympic Games), and collected via the Twitter API.

Several challenges arise when attempting to build a large signed graph with interactions on Twitter. First, the graph may be extremely sparse due to the number of active users and the skewed distribution of tweets per user. Second, opposition mostly goes silent (the user may keep scrolling if they do not agree with a statement) or is expressed via a reply to a tweet, which requires more effort than clicking a like button to express support. For this reason, there is a substantial imbalance between the amount of support and opposition signals. Lastly, opposition in a tweet may be implicit. To overcome these challenges, we adopt a multi-step strategy to create a user-tweet graph (Fig. 3), which we project onto a user-user graph:

(1) We curated a list of high-precision English and French expressions which express clear opposition (e.g. "I disagree" and "you're wrong"). We retained all sports-related tweets containing at least one of these expressions, and the tweets they replied to. For the sake of clarity, we refer to the author of the opposition tweet as user 1 and to the author of the tweet it replies to as user 2.

(2) To control the graph sparsity, we retained all users (denoted user 3) who both (i) wrote a tweet liked by user 1, and (ii) liked the opposition tweet written by user 1.

[...] (2) and (3)) so that the share of negative edges in our graph is close to 10%. We ranked the topics by decreasing frequency and filtered out all the tweets not related to the top 200 topics.

(4) We project the resulting user-tweet graph onto a user-user graph. We anonymise all the nodes (users) and edges (tweets). Eventually, the edge data of the final graph is provided under the format depicted in Fig. 4.

Figure 4: TwitterSG. An edge represents that the source node (user) has a positive (+) or negative (−) stance towards the target node (user) for the given topic.

TwitterSG contains 753,944 nodes (users), 200 topics and 12,848,093 edges. Among these edges, 9.6% are negative (opposition) and 90.4% are positive. The most frequent topics are depicted in Figure 6. There may be several edges between two nodes (several interactions, several topics).

Birdwatch Signed Graph, or BirdwatchSG, is a signed, directed, edge-attributed graph of users, drawn from note ratings on Birdwatch. Birdwatch is a pilot launched by Twitter in January 2021 in the USA to address misleading information on the platform in a community-driven fashion: the Birdwatch participants can identify information in tweets they believe is misleading and write notes that provide informative context. They can also rate the helpfulness (either helpful, somewhat helpful, or not helpful) of notes added by other contributors. All Birdwatch contributions are publicly available on the Download Data page of the Birdwatch site so that anyone in the USA has free access to analyse the data.

Starting with Birdwatch data from January to July 2021, we create a positive (negative) edge from participant 1 to participant 2 if participant 1 rated a note written by participant 2 as helpful (not helpful). We filter out the somewhat helpful ratings. The topic associated with an edge is the topic of the tweet the note refers to. We anonymise all the nodes and edges. Eventually, the edge data of the final graph is provided under the format depicted in Fig. 5. As stated above, BirdwatchSG contains 441,896 edges between 2,987 nodes (users), spanning 1,020 topics (see also Figure 6). There may be several edges between two nodes (several interactions, several topics).
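The edge-construction rule above can be sketched as a small filter-and-map step. The tuple layout `(rater, note_author, topic, rating)` and the label strings are assumptions made for this illustration; they do not reflect the actual schema of the Birdwatch data files.

```python
def ratings_to_edges(ratings):
    """Turn note ratings into signed, topic-attributed edges.

    Each rating is a hypothetical tuple (rater, note_author, topic, label).
    'helpful' yields +1, 'not helpful' yields -1, anything else is dropped.
    """
    sign_of = {"helpful": +1, "not helpful": -1}
    edges = []
    for rater, author, topic, label in ratings:
        if label not in sign_of:  # e.g. 'somewhat helpful' is filtered out
            continue
        edges.append((rater, author, topic, sign_of[label]))
    return edges

ratings = [
    ("p1", "p2", "COVID-19", "helpful"),
    ("p3", "p2", "COVID-19", "somewhat helpful"),
    ("p3", "p1", "US Presidential Elections", "not helpful"),
]
edges = ratings_to_edges(ratings)
```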

In this section, we evaluate the embeddings produced by our SEM method (Section 3) and compare its performance to three state-of-the-art signed graph embedding models on our TwitterSG and BirdwatchSG datasets (Section 4).

SEM variants. We evaluate three variants of SEM, each of which corresponds to a different choice of the function $\sigma$ used to combine node and topic embeddings (Section 3):

• SEM-mask: The topic information is ignored. This corresponds to $\sigma(W_u, W_t) = W_u$ in the first layer of the topic-aware skip-gram architecture (Fig. 2).

• SEM-addition: The topic and node embeddings are added in the first layer of the topic-aware skip-gram architecture (Fig. 2), i.e., $\sigma(W_u, W_t) = W_u + W_t$.

• SEM-hadamard: The topic and node embeddings are combined via element-wise multiplication (hadamard) in the first layer of the topic-aware skip-gram architecture, i.e., $\sigma(W_u, W_t) = W_u \times W_t$.
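The three combination functions listed above are simple element-wise operations. The sketch below writes them out on plain Python lists; the function names are illustrative and not taken from the authors' code.

```python
def sigma_mask(node_vec, topic_vec):
    """SEM-mask: ignore the topic and return the node embedding unchanged."""
    return list(node_vec)

def sigma_addition(node_vec, topic_vec):
    """SEM-addition: element-wise sum of node and topic embeddings."""
    return [n + t for n, t in zip(node_vec, topic_vec)]

def sigma_hadamard(node_vec, topic_vec):
    """SEM-hadamard: element-wise product of node and topic embeddings."""
    return [n * t for n, t in zip(node_vec, topic_vec)]

node = [1.0, 2.0]
topic = [0.5, -1.0]
masked = sigma_mask(node, topic)
added = sigma_addition(node, topic)
multiplied = sigma_hadamard(node, topic)
```

All three keep the output in the same dimension as the node embedding, so the rest of the skip-gram architecture is unchanged across variants.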

Note that the SEM variants only change how the user and topic embeddings are combined during node2vec training (Section 3.3).

Baselines. We compare SEM to three state-of-the-art signed graph embedding methods described in Section 2.1: StEM [20], SIDE [8], and SiNE [24]. Like SEM-mask, these three methods are topic-agnostic and were only tested on signed graphs lacking topics, or other attributes, on edges.

We set the node embedding dimension ($d$) to 64 for all methods and experiments. For SEM variants, we set walks per node $r \in \{5, 10, 20, 80\}$, walk length $l = 40$, context size $k = 5$, return parameter $p = 1.5$, in-out parameter $q = 0.5$, negative sample size to 20, subsampling threshold to 1e-5, and the optimisation is run for 1 to 5 epochs. For two given users and a given topic, edge weights are summed and the overall topical edge weight is set to +1 if the sum is positive, and −1 otherwise. For baseline methods, we use the same parameter settings as those suggested in their respective papers. The edge topic information is masked for baselines and SEM-mask.
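The edge-weight aggregation rule in the paragraph above (sum the signs per user pair and topic, then threshold) can be sketched as follows. Treating the user pair as an ordered (source, target) pair is an assumption of this example, as is the `(src, dst, topic, sign)` tuple layout.

```python
from collections import defaultdict

def aggregate_signs(edges):
    """Collapse multi-edges into one sign per (src, dst, topic).

    The signs are summed; the result is +1 if the sum is positive and -1
    otherwise (so ties resolve to -1).
    """
    totals = defaultdict(int)
    for src, dst, topic, w in edges:
        totals[(src, dst, topic)] += w
    return {key: (1 if total > 0 else -1) for key, total in totals.items()}

edges = [
    ("u", "v", "nba", +1),
    ("u", "v", "nba", +1),
    ("u", "v", "nba", -1),
    ("u", "v", "nfl", +1),
    ("u", "v", "nfl", -1),
]
collapsed = aggregate_signs(edges)
```

Note that a tie (sum of zero) resolves to −1 under the rule as stated.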

We follow previous work by evaluating our method, SEM, and baselines on a signed link prediction task [8, 20, 24]. In signed link prediction, we are given a signed graph where the sign, or agreement value, on several edges is missing and we predict each edge's sign value using the observed edges. In particular, we formulate link sign prediction as a binary classification task using the embeddings learned from each method as follows. For each dataset, we perform 5-fold cross-validation (80/20% training/test set) and evaluate with mean AUC over the 5 folds. For all approaches, we create edge embeddings by combining node embeddings using $\Phi(W_{u_1}, W_{u_2})$ with operations from Table 2. Note that this means that for topic-aware SEM variants we do not explicitly use the topic embedding for evaluation.

Table 2: Operations ($\Phi$) to produce edge embeddings from node embeddings for evaluation (Section 5.3).
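As an illustration of how edge embeddings can be formed from two node embeddings, the sketch below implements three element-wise operators: the hadamard product and the $\ell_1$ and $\ell_2$ operators. Their exact definitions here follow the binary operators commonly used with node2vec [6] and are an assumption, since the contents of Table 2 are not reproduced in this text.

```python
def hadamard(a, b):
    """Element-wise product of two node embeddings."""
    return [x * y for x, y in zip(a, b)]

def l1(a, b):
    """Element-wise absolute difference (assumed definition of the l1 operator)."""
    return [abs(x - y) for x, y in zip(a, b)]

def l2(a, b):
    """Element-wise squared difference (assumed definition of the l2 operator)."""
    return [(x - y) ** 2 for x, y in zip(a, b)]

source = [1.0, -2.0]
target = [3.0, 0.5]
edge_hadamard = hadamard(source, target)
edge_l1 = l1(source, target)
edge_l2 = l2(source, target)
```

Each operator returns a vector of the same dimension as the node embeddings, which then serves as the feature vector of the edge for the sign classifier.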

Using the edge representations in the training set, we fit a binary classifier to predict edge signs on the test set. Due to the sign imbalance in the edge data, we downsample the positive signs when fitting the classifier.
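A minimal sketch of the positive-class downsampling step is given below. The target ratio is not stated in the text, so balancing to a 1:1 ratio is an assumption of this example, as are the function name and the `(features, sign)` layout of each training example.

```python
import random

def downsample_positives(examples, rng):
    """Keep all negative examples and an equally sized random subset of positives.

    Each example is a hypothetical (features, sign) pair with sign in {+1, -1}.
    The 1:1 target ratio is an assumption, not taken from the paper.
    """
    positives = [e for e in examples if e[1] == 1]
    negatives = [e for e in examples if e[1] == -1]
    kept = rng.sample(positives, min(len(positives), len(negatives)))
    balanced = kept + negatives
    rng.shuffle(balanced)
    return balanced

# Nine positive edges and one negative edge, mimicking a ~90/10 imbalance.
examples = [([float(i)], 1) for i in range(9)] + [([9.0], -1)]
balanced = downsample_positives(examples, random.Random(0))
```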

In Table 3 , we report results for SEM variants and baselines using both nearest neighbors (kNN) and logistic regression (LR) classification on edge embeddings. For each approach, we report the best value over choices of translation operator Φ(·, ·) from Table 2 .

On TwitterSG and BirdwatchSG, SEM-mask, the topic-agnostic version of SEM, shows better or competitive performance compared with the three baselines. The topic-aware SEM variants significantly outperform topic-agnostic baselines on both datasets and across both edge embedding classifiers. On TwitterSG, SEM-addition improves the AUC by 2.9% and 3.0% for the $k = 5$ and $k = 10$ kNN classifiers respectively, compared to the best performing topic-agnostic method. On BirdwatchSG, SEM-addition improves the AUC by 2.0% and 2.5% at $k = 5$ and $k = 10$ respectively, compared to the best performing topic-agnostic method. These results demonstrate that SEM learns improved node embeddings for signed edge prediction.

One important advantage of learning user and topic embeddings jointly is the potential for predicting the stance of a user on topics for which we have not observed their engagement. We investigate the performance of methods on this 'cold start' subset of test samples $(u_1, u_2, w, t)$ such that the engagement of user $u_1$ or $u_2$ on topic $t$ was not observed during training. In other words, there is no training sample $(u_1, \cdot, \cdot, t)$ or $(\cdot, u_2, \cdot, t)$. This represents 28% and 17% of the test data for TwitterSG and BirdwatchSG respectively (average over 5 folds).
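The cold-start subset can be extracted with a simple membership check. The sketch below follows one literal reading of the definition above: a test edge is kept when its source user never appears as a source on that topic in training, and its target user never appears as a target on that topic. The tuple layout `(u1, u2, sign, topic)` and the function name are assumptions for illustration.

```python
def cold_start_subset(train, test):
    """Select test edges whose endpoints were not observed on the edge's topic.

    Edges are hypothetical tuples (u1, u2, sign, topic).
    """
    source_seen = {(u1, topic) for u1, _, _, topic in train}
    target_seen = {(u2, topic) for _, u2, _, topic in train}
    return [
        edge for edge in test
        if (edge[0], edge[3]) not in source_seen
        and (edge[1], edge[3]) not in target_seen
    ]

train = [("a", "b", +1, "nba"), ("c", "d", -1, "nfl")]
test = [
    ("a", "d", +1, "nba"),  # 'a' already seen as a source on 'nba'
    ("a", "b", -1, "nfl"),  # neither endpoint seen on 'nfl' in these roles
    ("e", "d", +1, "nfl"),  # 'd' already seen as a target on 'nfl'
]
cold = cold_start_subset(train, test)
```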

In Table 4, we present signed edge prediction AUC results limited to the 'cold start' subset, using only nearest neighbors classification since this had better performance overall. Only SEM-addition is able to maintain performance across both datasets and edge embedding classifiers (compare to Table 3). This hints that, during training, SEM-addition learns topic relationships such that an observed disagreement on one topic affects the likelihood of disagreements (or agreements) on other topics.

We investigate learning topic embeddings separately from user node embeddings in order to understand the value of jointly learning these as we propose. To explore this, we alter how we train a link prediction classifier for topic-agnostic approaches so that it also learns a topic embedding table. For topic-aware SEM methods, we instead freeze this topic embedding table to what was learned during graph embedding. As depicted in Figure 7, for a given edge $e = (u_1, u_2, w, t)$, this classifier takes as input the pre-trained user embeddings $W_{u_1}$ and $W_{u_2}$ combined with a topic embedding $W_t$ learned as part of this classifier training process for topic-agnostic approaches. We combine these embeddings similarly to how we propose in Section 3.1 for training SEM: the user embedding $W_{u_2}$ and topic embedding $W_t$ are combined via functions $\sigma(\cdot, \cdot)$ matching the choices of $\sigma$ defined in Section 5.1 for combining the user and topic embeddings learned during graph embedding. The resulting vector is combined with the user embedding $W_{u_1}$ via the functions $\Phi(\cdot, \cdot)$ defined in Table 2. The resulting edge embedding is the input to the LR classifier. Note that we deliberately combine the topic embedding with the user embedding $W_{u_2}$ only. Indeed, edge operations $\ell_1$ and $\ell_2$ in Table 2 involve the difference between source and target node embeddings, so combining the topic embedding into both source and target embeddings would cancel it out. Note also that when we set $\sigma$ = mask, we effectively ignore this learned (or frozen) topic embedding, reducing to the same setting as for LR in Table 3. For other values of $\sigma$, the topic embedding (learned or frozen from graph embedding) is used for edge prediction.

Figure 7: Logistic regression classifier for stance detection to investigate learning topic embeddings separately from user node embeddings (Section 5.6).
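The cancellation argument above can be checked numerically for the additive combination. The sketch below assumes the $\ell_1$ operator is an element-wise absolute difference (following node2vec conventions, not confirmed by this text) and uses made-up toy vectors.

```python
def add(a, b):
    return [x + y for x, y in zip(a, b)]

def l1(a, b):
    """Element-wise absolute difference (assumed form of the l1 edge operator)."""
    return [abs(x - y) for x, y in zip(a, b)]

u1 = [1.0, 2.0]      # source user embedding (toy values)
u2 = [0.5, -1.0]     # target user embedding (toy values)
topic = [2.0, 4.0]   # topic embedding (toy values)

no_topic = l1(u1, u2)
# Adding the topic to both endpoints: it cancels in the difference.
topic_on_both = l1(add(u1, topic), add(u2, topic))
# Adding the topic to the target only: it changes the edge embedding.
topic_on_target = l1(u1, add(u2, topic))
```

With addition, the topic vector appears in both terms of the difference and drops out, so the edge embedding carries no topic information unless the topic is combined with one endpoint only.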

In Table 5, for each $\sigma$ and graph embedding approach, we report the best AUC found over functions $\Phi$. The performance of SEM-addition remains unmatched by the topic-agnostic methods even when topic-agnostic approaches learn topic embeddings during classifier training. Performance is still significantly degraded compared to our best results in Table 3, demonstrating that training topic and node embeddings in tandem remains the most beneficial way to incorporate context (topic) into stance detection on signed graphs. We do note, however, that for SEM variants, performance decreases if we use the learned topic embedding at test time.

In Figure 8, we depict the topic embeddings obtained with SEM-addition trained on TwitterSG, and projected with tSNE [23]. We can discern clear clusters of topics associated with a specific sport (e.g. NFL, hockey, baseball) or group of sports (e.g. fighting sports: WWE, Wrestling). Among these clusters, we observe finer-resolution groups. For instance, English football clubs lie close to the Premier League topic. Karim Benzema, Antoine Griezmann and Paul Pogba are the closest neighbours to France, while Zinedine Zidane and Raphael Varane are close to Real Madrid CF. Michael Jordan and Kobe Bryant are closest neighbours. We observe similar patterns on BirdwatchSG topics, and with SEM-hadamard (not depicted due to space constraints). The presence of meaningful topical clusters demonstrates the ability of our method to capture topic similarities when a diverse range of topics is discussed.

The US public debate is known to be politically polarised, and so are Birdwatch reports according to recent research [7, 19, 26]. Consequently, we expect to observe two major clusters of user embeddings. Figure 9 displays the user embeddings obtained with SEM-addition trained on BirdwatchSG, and projected with tSNE. The presence of two distinct opinion clusters indicates the ability of our sign-informed context generation strategy to capture oppositions, and separate opposing views in the graph.

Further, we visually inspect the ability of the model to distinguish positive and negative edges. Let $e = (u_1, u_2, w, t)$ be an edge of topic $t$ going from user $u_1$ to $u_2$ with weight $w \in \{-1, 1\}$. For the sake of visualisation, we define the embedding of edge $e$ as the hadamard product of the two user embeddings, $W_{u_1} \times W_{u_2}$. Figure 10 displays the projected BirdwatchSG edge embeddings obtained with SEM-addition and tSNE. The positive (negative) edges are colored in blue (red). We observe distinct clusters of positive or negative edges, which supports the capability of the model to discriminate positive and negative edges.

In this work, we introduce SEM, a framework for learning stance embeddings in signed, edge-attributed social networks. Utilizing sign-informed random walks to generate training examples, we demonstrate how the scalable skip-gram objective can be successfully applied to learn signed-graph embeddings. Our approach is flexible and can incorporate arbitrary edge attributes, such as topics, to provide context embeddings in edge interactions. Experimental results show that SEM embeddings outperform state-of-the-art signed-graph embedding techniques on two new datasets: TwitterSG and BirdwatchSG. We open-source these two datasets to the network mining community to spur further research in social network analysis.

[1] Distributed large-scale natural graph factorization

[2] Zero-Shot Stance Detection: A Dataset and Model using Generalized Topic Representations

[3] Adversarial Learning for Zero-Shot Stance Detection on Social Media

[4] A landscape theory of aggregation

[5] Structural balance: a generalization of Heider's theory

[6] node2vec: Scalable feature learning for networks

[7] POLE: Polarized Embedding for Signed Networks

[8] SIDE: representation learning in signed directed networks

[9] Rev2: Fraudulent user prediction in rating platforms

[10] Edge weight prediction in weighted signed networks

[11] Signed networks in social media

[12] Enhancing Zero-shot and Few-shot Stance Detection with Commonsense Knowledge Graph

[13] Efficient estimation of word representations in vector space

[14] Distributed representations of words and phrases and their compositionality

[15] Learning Ideological Embeddings from Information Cascades

[16] The backbone of bipartite projections: Inferring relationships from co-authorship, co-sponsorship, co-attendance and other co-behaviors

[17] Asymmetric transitivity preserving graph embedding

[18] DeepWalk: Online learning of social representations

[19] Nicolas Pröllochs. 2021. Community-Based Fact-Checking on Twitter's Birdwatch Platform

[20] A method for learning representations of signed networks

[21] LINE: Large-scale information network embedding

[22] Get out the vote: Determining support or opposition from Congressional floor-debate transcripts

[23] Visualizing data using t-SNE

[24] Signed network embedding in social media

[25] Exploiting social network structure for person-to-person sentiment analysis

[26] Can the Wikipedia moderation model rescue the social marketplace of ideas?

[27] SNE: signed network embedding

To build TwitterSG, we mined all sports-related tweets containing at least one of the following expressions which express opposition: tais toi(.) - abruti(.) - ratio(.) - tu dis nimp(.)