MiDAS: Multi-integrated Domain Adaptive Supervision for Fake News Detection
Abhijit Suprem, Calton Pu
2022-05-19

COVID-19 related misinformation and fake news, coined an 'infodemic', have increased dramatically over the past few years. This misinformation exhibits concept drift, where the distribution of fake news changes over time, reducing the effectiveness of previously trained models for fake news detection. Given a set of fake news models trained on multiple domains, we propose an adaptive decision module to select the best-fit model for a new sample. We propose MiDAS, a multi-domain adaptive approach for fake news detection that ranks the relevancy of existing models to new samples. MiDAS contains 2 components: a domain-invariant encoder and an adaptive model selector. MiDAS integrates multiple pre-trained and fine-tuned models with their training data to create a domain-invariant representation. Then, MiDAS uses local Lipschitz smoothness of the invariant embedding space to estimate each model's relevance to a new sample. Higher ranked models provide predictions, and lower ranked models abstain. We evaluate MiDAS on generalization to drifted data with 9 fake news datasets, each obtained from different domains and modalities. MiDAS achieves new state-of-the-art performance on multi-domain adaptation for out-of-distribution fake news classification.

The misinformation and fake news associated with the COVID-19 pandemic, called an 'infodemic' by the WHO (Enders et al., 2020), have grown dramatically and evolved with the pandemic. Fake news has eroded institutional trust (Ognyanova et al., 2020) and has increasingly negative impacts outside social communities (Quinn et al., 2021). The challenge is to filter active fake news campaigns while they are raging, just like today's online email spam filters, instead of offline, retrospective detection long after the campaigns have ended. We divide this challenge of detecting fake news online into two parts: (1) the variety of data (both real and fake), and (2) the timeliness of data collection and processing (both real and fake). In this paper, we focus on the first (variety) part of the challenge, leaving timeliness (which depends on solutions to handle variety) to future work (Pu et al., 2020). The infodemic, and fake news more generally, evolves with a growing variety of ephemeral topics and content, a phenomenon called real concept drift (Gama et al., 2014). However, models with excellent results on single-domain classification have generalization difficulties when applied to cross-domain experiments (Suprem and Pu, 2022; Wahle et al., 2022). A benchmark study over 15 language models shows reduced cross-domain fake news detection accuracy (Wahle et al., 2022). A generalization study in (Suprem and Pu, 2022) finds significant performance deterioration when models are used on unseen, non-overlapping datasets. Intuitively, it is entirely reasonable that state-of-the-art models trained on one dataset or time period will have reduced accuracy on future time periods. Real concept drift is introduced into fake news through content changes (Gama et al., 2014), camouflage (Shrestha and Spezzano, 2021), linguistic drift (Eisenstein et al., 2014), and adversarial adaptation by fake news producers when faced with debunking efforts such as the CDC's on the pandemic (Weinzierl et al., 2021).
To catch up with concept drift, classification models need to be expanded to cover a wide variety of datasets (Kaliyar et al., 2021; Li et al., 2021; Suprem and Pu, 2022), or augmented with new knowledge on true novelty such as the appearance of the Omicron variant (Pu et al., 2020). In this paper, we assume the availability of domain-specific authoritative sources such as the CDC and WHO that provide trusted, up-to-date information on the pandemic. A key challenge for such multi-domain classifiers is a decision module to select the best-fit model amongst a set of existing models to classify new samples. This degree of knowledge is defined by the overlap between an unlabeled sample and the existing models' training datasets (Suprem and Pu, 2022). Intuitively, a best-fit model better captures a sample point's neighborhood in its own training data (Chen et al., 2022; Urner and Ben-David, 2013).

MiDAS. We propose MiDAS, a multi-domain adaptive approach for early fake news detection, with potential for online filtering. MiDAS integrates multiple pre-trained and fine-tuned models along with their training data to create a domain-invariant representation. On this representation, MiDAS uses a notion of local Lipschitz smoothness to estimate the overlap, and therefore the relevancy, between a new sample and the models' training datasets. This overlap estimate is used to rank models by relevancy to the new sample. Then, MiDAS selects the highest ranked model to perform classification. We evaluate MiDAS on 9 fake news datasets obtained from different domains and modalities. We show new state-of-the-art performance on multi-domain adaptation for early fake news classification.

Contributions. Our contributions are as follows: 1. MiDAS: a framework for adaptive model selection that uses sample-to-data overlap to measure model relevancy. 2. Experimental results of MiDAS on 9 fake news datasets with state-of-the-art results using unsupervised domain adaptation.

2 Related Work

Domain adaptation maps a target domain into a source domain. This allows a classifier learned from the source domain to predict the target domain samples (Farahani et al., 2021). Some approaches focus on a domain-invariant representation between source and target. Then, a new classifier can be trained on this invariant representation for both source and target samples. Domain invariance is scalable to multiple source domains by fusing their latent representations with an adversarial encoder-discriminator framework (Li et al., 2021). For multi-source domain adaptation (MDA), classifiers for each source have different weights: static weights using distance (Li et al., 2021) or per-sample weights on the l2 norm (Suprem et al., 2020). Alongside domain adaptation, weak supervision (WS) is also common for propagating labels from source domains to a target domain (Ratner et al., 2017). Both approaches estimate labels closest to the true label of the target domain sample. This works under the assumption that the source domains or labeling functions, respectively, are correlated with the true labels due to expertise and domain knowledge. In each case, whether MDA or WS, domains or labeling functions need to be weighted to ensure reliance on the best-fit source. Snorkel (Ratner et al., 2017) uses expert labeling functions and weighs them on conditional independence. Similarly, other approaches use the coverage of expert foundation models and weigh them by distance to the embedded sample.
EEWS (Rühling Cachay et al., 2021) directly combines source data and labeling functions in the estimator parametrization to generate dynamic weights for each sample. MDA approaches weigh sources with weak supervision (Li et al., 2021), distance (Suprem and Pu, 2022), or as a team of experts (Pu et al., 2020).

Figure 1: MiDAS architecture (legend: source training data, unlabeled data, domain-invariant encoder, common embedding space, decoders to reconstruct, discriminator loss, reconstruction loss, normalized L-scores, label matrix). The encoder generates a domain-invariant representation. We add perturbations to this representation. Then, the fine-tuned models {M_i}_{i=1}^k process the sample and its perturbations. After computing the local Lipschitz constant for each model, we can rank their L-scores and select the best-fit model's label from the label matrix.

Let there be k source data domains {X_i}_{i=1}^k with labels. Each of these sources has an associated source model (SM), for a total of k SMs {f_i}_{i=1}^k, where we have access to the training data X_i and weights w_i. Each SM yields hidden embeddings through a feature extractor backbone, or foundation model (Bommasani et al., 2021). Embeddings are projected to class probabilities with any type of classification layer/module. MiDAS adaptively weights the k SM predictions for some unlabeled, potentially drifted target data domain X'. We accomplish this by using the local embedding smoothness of the SMs as a proxy for model relevance to a sample x' ∈ X'. SMs are typically smooth in the embedding space (Urner and Ben-David, 2013); further, smoothness is correlated with local accuracy (Chen et al., 2022). With MiDAS, we rank each f_k on a smoothness measure around the embedding for x', i.e. f_k(x'). Then, MiDAS can directly use the top-ranked f_k, or the smoothest f_k under a smoothness threshold, as the best-fit relevant model for x', with the remaining models abstaining. Because we are directly measuring smoothness in the embedding space, MiDAS can use already fine-tuned, state-of-the-art classifiers for each task, allowing off-the-shelf, plug-and-play usage. These classifiers are foundation models (Bommasani et al., 2021) that have been fine-tuned with architectural changes, learned weights, and hyperparameter tuning for their specific dataset. There are two key challenges in MiDAS: 1. How do we compare the smoothness of SMs that have been trained on different domains? 2. How can we measure the smoothness itself for unlabeled samples in the embedding space of SMs? We address (1) with an encoder E that generates a domain-invariant representation on {X_i}_{i=1}^k. This unifies the data domains, allowing comparisons for different SMs to start from the same source domain. For (2), we extend the idea of local Lipschitz smoothness from (Chen et al., 2022) and (Urner and Ben-David, 2013) to randomized Lipschitz smoothness. In randomized Lipschitz smoothness, we randomly perturb E(x'), the domain-invariant representation of x'. Then, we compute the local Lipschitz constant L on these perturbations E(x') + ε, with respect to E(x'), to measure smoothness. This allows us to calculate an L_k for each f_k and use the local Lipschitz constant, a measure of local smoothness, as a measure of relevancy.
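The randomized smoothness estimate itself is compact. The following is a minimal, illustrative sketch (not the released implementation); it assumes PyTorch tensors and a callable model head operating on the invariant embedding, and the names local_lipschitz, n_samples, and eps are placeholders we introduce here.

    import torch

    def local_lipschitz(f_k, z, eps=0.1, n_samples=32):
        """Estimate the local Lipschitz constant of a source model f_k around
        the domain-invariant embedding z = E(x')."""
        # Random directions, scaled to random radii <= eps (points in the eps-ball).
        delta = torch.randn(n_samples, z.numel())
        delta = delta / delta.norm(dim=1, keepdim=True) * eps * torch.rand(n_samples, 1)
        z_pert = z.unsqueeze(0) + delta
        with torch.no_grad():
            out_ref = f_k(z.unsqueeze(0))      # f_k(E(x'))
            out_pert = f_k(z_pert)             # f_k(E(x') + delta)
        # Largest ratio of output change to input change approximates the local L_k.
        num = (out_pert - out_ref).flatten(1).norm(dim=1)
        return (num / delta.norm(dim=1)).max().item()

Ranking models then amounts to comparing these per-model constants: a smaller value indicates a smoother, and therefore more relevant, model.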
We now describe MiDAS components and implementation details. The MiDAS architecture is shown in Figure 1. First, we cover the encoder-decoder framework that generates the domain-invariant representations of the source and target datasets. Then we present the randomized Lipschitz smoothness measure used to generate SM relevancy rankings.

To compare the relevancy of different f_k, we require a common source domain (Li et al., 2021). We achieve this with a single-encoder, multiple-decoder design, where a single encoder generates a domain-invariant representation from all source domains. Then we use k decoders to reconstruct the invariant representation for each f_k. To train E, we use an adversarial discriminator D to enforce invariance with a min-max game, where the discriminator tries to identify the source domain of the invariant representation, and the encoder tries to fool it:

min_E max_D  − Σ_{i=1}^{k} E_{x∼X_i} [ ℓ(D(E(x)), i) ]    (1)

where ℓ is the cross-entropy between the discriminator's predicted domain and the true domain index i: the discriminator minimizes ℓ to identify the source domain, while the encoder maximizes it to remain domain-invariant. We use a gradient reversal layer R to convert the min-max game into a single-step minimization. R is the identity during the forward pass, and scales the discriminator's gradient by λ = −1 during the backward pass. Then, the adversarial optimization becomes:

min_{E,D}  Σ_{i=1}^{k} E_{x∼X_i} [ ℓ(D(R(E(x))), i) ]    (2)

To train the decoders D_k, we use the masked language modeling loss from BERT and AlBERT pretraining. Parameters are initialized with AlBERT's albert-base-uncased weights (Lan et al., 2019). In summary, we train a single encoder and k decoders. The encoder projects our training data, in the form of SentencePiece tokens (Kudo and Richardson, 2018), into a domain-invariant representation, trained with a domain discriminator. Each decoder then reconstructs the original input training tokens from the invariant representation. We use decoders because MiDAS is designed to work with our existing BERT and AlBERT classifiers, which expect SentencePiece token input. Decoders are trained with masked language modeling, where we randomly mask up to 15% of words or tokens in the input. Then, during prediction, an unlabeled, potentially drifted sample x' from an unseen distribution X' is converted to a domain-invariant representation E(x'). Each decoder D_k reconstructs from this invariant representation the input to its corresponding SM f_k.

With a common embedding space, we can now compare the relevancy of each model to an unlabeled, potentially drifted sample x'. To present our approach, we need to introduce Lipschitz continuity, which we extend to Lipschitz smoothness with respect to SM predictions using the Lipschitzness notion from (Chen et al., 2022).

Definition (Lipschitzness). An SM f_k is L_k-Lipschitz smooth if, for some class label C, the difference between f_k's predicted probabilities of C at any two points x_1 and x_2 is bounded by L_k · d(x_1, x_2) for all x_1, x_2 ∈ X. That is, with L_k-smoothness, the difference in f_k's predictions on x_1 and x_2 is bounded in proportion to L_k. However, the local value of L_k can vary across the embedding space. Consequently, f_k is smoother wherever L_k is smaller. As such, we want a small L_k for samples in the same class, and a large L_k for samples from different classes. With these definitions, we can present our approach for finding the best-fit relevant f_k for x', defined as the f_k with the smoothest embedding space around x'.

Theorem 1. Let the best-fit f_k for a sample x' be the SM that is smoothest around x'. We can find the best-fit f_k for a particular sample x', given a distance threshold ε, by solving:

argmin_k  max_{x'_r ∈ B_ε(x'), r = 1,…,N}  ||f_k(x'_r) − f_k(x')|| / ||x'_r − x'||    (3)

The max term estimates the L_k value for each f_k by sampling N random points in an ε-Ball around x'. Then, we find the f_k that has the smallest L_k.
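For reference, the gradient reversal layer R used above to train the domain-invariant encoder admits a very small implementation. The sketch below is ours and assumes a PyTorch training loop; class and function names are illustrative, not from the released code.

    import torch

    class GradReverse(torch.autograd.Function):
        """Identity on the forward pass; multiplies the incoming gradient by
        lam = -1 on the backward pass, so a single minimization step trains the
        discriminator to identify domains while the encoder learns to fool it."""
        @staticmethod
        def forward(ctx, x, lam=1.0):
            ctx.lam = lam
            return x.view_as(x)

        @staticmethod
        def backward(ctx, grad_output):
            return -ctx.lam * grad_output, None   # reversed gradient flows into the encoder

    def grad_reverse(x, lam=1.0):
        return GradReverse.apply(x, lam)

    # Sketch of one adversarial step:
    #   z = encoder(tokens)                               # invariant representation
    #   domain_logits = discriminator(grad_reverse(z))
    #   loss = torch.nn.functional.cross_entropy(domain_logits, domain_labels)
    #   loss.backward()                                   # single-step form of Equation (2)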
A key insight is that adversarial attacks exploit the non-smoothness of a model's embedding space to fool classifiers, by generating a noise ε such that f_k(x + ε) ≠ f_k(x). This non-smoothness occurs when f_k does not properly capture enough training data in the region around x; in GANs, this causes 'holes' in the latent space (Suprem et al., 2020) during image synthesis. Conversely, adversarial defenses either enforce smoothness around the embedding space or on potentially perturbed inputs themselves (Das et al., 2018). Similarly, GANs can enforce 1-Lipschitzness to improve coverage of sample generation (Qin et al., 2020). So, given several f_k with different local L_k around the embedding f_k(x + ε), a lower L_k indicates a smoother embedding space, because that SM has captured more training data in the region surrounding x relative to the other SMs, similar to the overlap metric calculated in (Suprem and Pu, 2022).

However, even with a low L_k, the classification labels y_k, obtained from the classification module of the k-th SM applied to the embedding f_k(x), can change under perturbations around x. We can use probabilistic Lipschitzness to bound the probability of y_k changing in the neighborhood x + ε as a function of the perturbation ε: a function that is Lipschitz by Definition 1 (L-Lipschitz) satisfies Definition 3 (φ-Lipschitz, the probabilistic Lipschitzness of (Urner and Ben-David, 2013)) with φ(ε) = 1 if ε ≥ 1/L and φ(ε) = 0 if ε < 1/L. From this, it follows that if f_k satisfies the φ-Lipschitz condition, then the number of samples within an ε-Ball of x that have a different label from x is bounded by φ(ε), per (Urner and Ben-David, 2013). As we move further from x, the probability of a label change increases. Let a be the accuracy of f_k at x. If we know the accuracy drop α at the edge of the ε-Ball where labels change values, we can bound the accuracy of predictions between x and some perturbed point x_r in an ε-Ball around x. If we calculate α for an SM using training examples of different labels within the margins allowed by probabilistic Lipschitzness, we find that this accuracy bound depends only on the choice of ε. Consequently, we can lower-bound f_k's accuracy on x' whenever x_r is within a small ε-Ball around x'.

Recall that the encoder projects all samples to a common invariant domain, E : {X_i}_{i=1}^k → G. Each SM-specific decoder D_k then converts from the invariant domain to the k-th source domain, D_k : G → X_k. With this encoder-decoders framework, we can use the invariant domain G as the common source domain for all SMs. Per Definition 3, accuracy is best estimated in an ε-Ball with ε < 1/L. Once E is trained, we can use the class cluster centers (obtained from K-Means clustering on the embeddings) of the training data to compute a local, cluster-specific L for each cluster in each f_k. This partitioning allows us to compute local smoothness characteristics of the embedding space, similar to the concurrent work in (Chen et al., 2022). We use the cluster centers because we want the strongest measure of smoothness; (Suprem and Pu, 2022) and (Chen et al., 2022) both use the centroid to set up accuracy thresholds. Then, we can estimate a local L_k for each f_k at the cluster centers with the m nearest points to the center using Equation (3). We take the maximum L_k among all f_k to obtain an upper bound on the local smoothness constant among the SMs. From this maximum, we can calculate ε = 1 / max_k L_k. So, the only hyperparameter here is m, the number of neighbors used to compute the local L for each label cluster in each f_k. We explore the impact of m in Section 5.2.
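This calibration step can be sketched as follows. The sketch is illustrative only: it assumes the invariant training embeddings fit in a single tensor, uses scikit-learn's KMeans as one possible clustering choice, and the names calibrate_eps and m are placeholders we introduce, not part of the released implementation.

    import torch
    from sklearn.cluster import KMeans

    def calibrate_eps(models, train_emb, n_clusters, m=50):
        """Estimate eps = 1 / max_k L_k from label-cluster centers, using the m
        nearest training embeddings to each center to measure local smoothness."""
        km = KMeans(n_clusters=n_clusters, n_init=10).fit(train_emb.numpy())
        centers = torch.from_numpy(km.cluster_centers_).float()
        worst_L = 1e-8
        for c in centers:
            dists = (train_emb - c).norm(dim=1)
            idx = dists.argsort()[:m]                    # m nearest points to the center
            neighbors = train_emb[idx]
            for f_k in models:                           # each fine-tuned source model
                with torch.no_grad():
                    out_c = f_k(c.unsqueeze(0))
                    out_n = f_k(neighbors)
                L = ((out_n - out_c).flatten(1).norm(dim=1) /
                     (neighbors - c).norm(dim=1).clamp(min=1e-8)).max().item()
                worst_L = max(worst_L, L)                # upper bound over clusters and models
        return 1.0 / worst_L                             # eps = 1 / max_k L_k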
Now, for an unlabeled sample x', we first generate the domain-invariant representation x_G = E(x'). Then we perturb x_G to generate r points {x_G^i}_{i=1}^r in an ε-Ball around x_G. Using x_G and {x_G^i}_{i=1}^r, we compute the local Lipschitz constant L_k for each f_k using Equation (3). There are 2 possibilities: 1. L_k ≥ 1/ε: f_k that satisfy this condition abstain from providing predictions, since the accuracy drop is unbounded. 2. L_k < 1/ε: f_k that satisfy this condition can provide labels, because their accuracy is bounded per Equation (6).

We implemented and evaluated MiDAS with PyTorch 1.11 on a server running NVIDIA T100 GPUs. We have released our implementation code.

MiDAS Datasets. We use 9 fake news datasets, shown in Table 1. Where available, we used the provided train-test splits; otherwise, we performed class-balanced 70:30 splits. We performed a preliminary motivating evaluation, also shown in Table 1, comparing an oracle case to the concept drift case. In the oracle case, we train 9 models, one on each dataset, and then evaluate each model on its corresponding dataset's test set. This is the case where the prediction data matches the training distribution. In the concept drift case, we trained a model on 8 datasets, then evaluated on the held-out dataset. In this case, the prediction dataset, even though on the same topic of COVID-19 fake news detection, does not match the training distribution. We see significant accuracy drops, between 20% and 50%. This matches the observations on generalizability in (Suprem and Pu, 2022) and (Wahle et al., 2022).

MiDAS Evaluation. We evaluate MiDAS with held-one-out testing at the dataset level, similar to the generalization studies approach in (Suprem and Pu, 2022). Figure 2a shows a tSNE projection of all 9 datasets' embeddings from a pretrained BERT encoder. Figure 2b shows the dataset embeddings when generated with MiDAS' encoder. The label overlaps between true news and fake news have been separated; in this case, enforcing domain invariance forces the true and fake labels of all datasets to cluster. Figure 2c and Figure 2d examine the choice of the neighborhood size m used to calibrate ε, discussed in Section 5.2.

Ablation Study. To further evaluate MiDAS' efficacy, we also conducted an ablation study in Section 5.3 by varying the number of training datasets, types of loss functions, masked language model training, and loss weights.

Adjustment of m. For each experiment in Section 5.2, we sampled points in an ε-Ball around x, where ε is calculated using the steps in Section 4.3. We explore the effects of adjusting the radius of this sampling ball by changing m, the number of nearest neighbors, in Section 5.4. Specifically, we show that as the sampling/perturbation radius increases beyond ε, MiDAS' accuracy decreases. Conversely, as the sampling radius is reduced, MiDAS increases accuracy while sacrificing coverage.

We now present evaluation results for MiDAS. In each experiment, we designate a single dataset as the target dataset without labels, and the remaining datasets act as source domains. In these cases, the held-out dataset acts as the drifted dataset, similar to the generalization experiments in (Suprem and Pu, 2022). We follow the steps in Section 4.3 to test MiDAS, and compare classification accuracy to 5 approaches: (i) an ensemble of the training models with equal weights, (ii) a Snorkel labeler (Ratner et al., 2017) that treats each model as a labeling function, (iii) an EEWS labeler (Rühling Cachay et al., 2021) that treats each model as a labeling function, (iv) a KMP model (Suprem and Pu, 2022) that uses K-Means clustering with proxies to compute overlap, and (v) an 'oracle' AlBERT model fine-tuned on the held-out dataset.
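For concreteness, the per-sample procedure from Section 4.3 that these experiments follow can be sketched as below. It reuses the same perturbation-based smoothness estimate as the earlier sketch; the function name, the decoder interface, and the tie-breaking by smallest L_k are our own illustrative choices (accepted predictions could equally be averaged), not the released implementation.

    import torch

    def predict_with_abstention(x, encoder, decoders, models, eps, r=32):
        """Let only source models that are smooth around E(x) (L_k < 1/eps) vote,
        and return the label from the smoothest one; None if all abstain."""
        z = encoder(x)                                   # x_G = E(x'), a 1-D embedding
        delta = torch.randn(r, z.numel())
        delta = delta / delta.norm(dim=1, keepdim=True) * eps * torch.rand(r, 1)
        z_pert = z.unsqueeze(0) + delta                  # r points in the eps-ball
        accepted = []
        for D_k, f_k in zip(decoders, models):
            with torch.no_grad():
                out_ref = f_k(D_k(z.unsqueeze(0)))       # model input reconstructed by its decoder
                out_pert = f_k(D_k(z_pert))
            L_k = ((out_pert - out_ref).flatten(1).norm(dim=1)
                   / delta.norm(dim=1)).max().item()
            if L_k < 1.0 / eps:                          # accuracy bounded; model may predict
                accepted.append((L_k, out_ref.argmax(dim=1).item()))
            # otherwise this model abstains
        return min(accepted)[1] if accepted else None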
Figure 2 shows several characteristics of the MiDAS encoder. Figure 2b shows domain invariance in the labels. Each point is a sample from one of the 9 datasets; before applying MiDAS, there is significant label overlap because each dataset exists in a separate domain. After applying MiDAS, datasets are projected to a domain-invariant embedding. This forces samples with the same label, irrespective of source domain, to cluster together, reducing the label overlap observed in (Suprem and Pu, 2022).

After training MiDAS' encoder, we need to compute the radius ε using the m nearest neighbors to the label cluster centers. We examine the impact of different m values for computing ε in Figure 2c and Figure 2d. As we increase the number of neighbors used in estimating the local L, the estimate for L increases, indicating reduced smoothness the further we deviate from the cluster center. In turn, this reduces the acceptable ε-Ball radius that bounds the probability of a label change, per (Urner and Ben-David, 2013), since ε = 1/L. A large m would significantly reduce the size of the sampling ε-Ball, and perturbations would be negligible. A small m would yield a poor estimate of the local L and a large sampling ball. We further explore the impact of changing m directly on accuracy in Section 5.4. Here, we fix m = 50 for the remaining experiments, since we observed the ε-Ball radius generally stabilized around this value.

Table 3: MiDAS Ablation Study. We examined the impact of different design choices here. Of note is that using masked language modeling is significantly useful in improving end-to-end accuracy. Further, adding a center loss term ensures the domain-invariant representations have separable clusters, as we see in Figure 2b. Finally, with the weighted loss, we give the discriminator 2x the importance of the encoder masking loss to focus on domain invariance.

In Table 2, we compare MiDAS with the baseline approaches on held-out target samples. On average, MiDAS sees a 30% increase in accuracy compared to an ensemble. Further, by using the training data itself to adaptively guide SM selection, MiDAS improves by 21% on Snorkel, and by 10% on EEWS and KMP.

We evaluated the impact of several design and training choices for MiDAS in an ablation study in Table 3. We use a version of MiDAS trained with the half of the sources with the most data points for each experiment (MiDAS-Half). This yields near-random accuracy, since this is effectively a modified ensemble over different source datasets. Adding the remaining sources improves MiDAS' coverage and improves accuracy by 15%. We add a center loss term (He et al., 2018) to the encoder output to encourage clustering on the labels across multiple sources, increasing accuracy by 5%. Next, we added language masking to the input during the encoder-decoder training to further fine-tune the encoder for the fake news tasks, yielding a 4% improvement. Finally, we increased the weight of the discriminator loss relative to the encoder loss to emphasize domain invariance, yielding a 5% improvement in MiDAS' accuracy on fake news detection for unseen, drifted data. We compare convergence for the different experiments in Figure 3: the encoder converges faster in each case. Further, adding the center and weighted losses contributes to discriminator fooling and stabilizes the discriminator loss.

Finally, we investigate the choice of m, which was fixed at m = 50. For these experiments, we investigated both increasing and decreasing m. Increasing the number of neighbors increases the computed L, since we are using points further from the smooth cluster center.
Table 4: Impact of m values. As we increase the number of nearest neighbors, we get a higher estimate for L, per Figure 2c, which reduces the ε-Ball sampling radius and relaxes the threshold for model predictions. This leads to lower accuracy with higher coverage. Decreasing m, in turn, increases accuracy at the cost of lower coverage.

Increasing m reduces the sampling ε-Ball, so the perturbations we apply will be smaller and, in some cases, negligible. Furthermore, the threshold value of L needed to accept an SM's prediction is higher (since it is 1/ε), so MiDAS tolerates lower smoothness for each model and accepts predictions from more models, resulting in higher coverage and lower overall accuracy. On the other hand, using fewer neighbors means a larger sampling ball and a smaller threshold for acceptance. It is more likely for perturbed samples to be further away, yielding a higher value of L unless the corresponding model is especially smooth around that point. This reduces coverage, but increases accuracy. We show this in Table 4 for several values of m across 4 of our datasets. Using only the nearest neighbor yields minimal coverage. As we increase m, coverage increases significantly, and accuracy approaches ensemble accuracy. Conversely, as we reduce m, we also reduce L and, consequently, the smoothness threshold to accept a prediction. This increases accuracy, since MiDAS rejects predictions that do not satisfy the threshold. However, coverage decreases as well: we show in Table 4 that fewer unseen samples from the target domain can be labeled as we decrease m. We also see that m can have an outsized impact on accuracy: 'coaid' F1 scores drop from 0.73 to 0.56, even though coverage increases only slightly, from 0.98 to 1.0, when we increase m from 100 to 150. This occurs because at m = 150, MiDAS' relaxed thresholds allow poorer models to provide predictions as well, reducing accuracy in the final averaged result.

We tested MiDAS in the scenario where fine-tuned models already exist. This constrains the MiDAS encoder, which must also train a decoder to match the inputs of the fine-tuned models that expect tokenized input. A more flexible approach would deploy the models and MiDAS together, with each fine-tuned model directly trained on the MiDAS encoder's output. This would improve training time, convergence, and accuracy, since each model would directly use the MiDAS-generated encodings instead of domain-specific reconstructions. Furthermore, we selected m using empirical observations. However, there are also technically grounded approaches, such as using the high-density bands from (Jiang et al., 2018; Suprem et al., 2020); in these cases, the α-high-density region of each cluster can be used to estimate a good m. We leave further exploration of m, as well as the integration of fine-tuned models into the encoder training framework, to future work.

We have presented MiDAS, a system for adaptively selecting the best-fit model for samples from drifting distributions. MiDAS uses a domain-invariant embedding to estimate the local smoothness of fine-tuned models around drifting samples. By using local smoothness as a proxy for accuracy and training data relevancy, MiDAS improves generalization accuracy across 9 fake news datasets. With MiDAS, we can detect COVID-19 related fake news with over 10% accuracy improvement over weak labeling approaches.
We hope MiDAS will lead to further exploration of the tradeoff between generalizability and fine-tuning, as well as research into mitigating the generalization difficulties of pre-trained models.

References

Maxwell Weinzierl, Suellen Hopfer, and Sanda M Harabagiu. Misinformation adoption or rejection in the era of covid-19. In Proceedings of the International AAAI Conference on Web and Social Media (ICWSM), AAAI Press, 2021.
Covid 19 Fake News Dataset.
On the opportunities and risks of foundation models.
Transformer-based language model fine-tuning methods for covid-19 fake news detection.
Train and you'll miss it: Interactive model iteration with weak supervision and pre-trained embeddings.
Shoring up the foundations: Fusing model embeddings and weak supervision.
A covid-19 rumor dataset.
Coaid: Covid-19 healthcare misinformation dataset.
Shield: Fast, practical defense and vaccination for deep learning using jpeg compression.
A heuristic-driven uncertainty based ensemble framework for fake news detection in tweets and news articles.
Diffusion of lexical change in social media.
The different forms of covid-19 misinformation and their consequences. The Harvard Kennedy School Misinformation Review.
A brief review of domain adaptation.
Fast and three-rious: Speeding up weak supervision with triplet methods.
A survey on concept drift adaptation.
Triplet-center loss for multi-view 3d object retrieval.
Dafd: Domain adaptation framework for fake news detection.
To trust or not to trust a classifier.
Mcnnet: generalizing fake news detection with a multichannel convolutional neural network using a novel covid-19 dataset.
Sentencepiece: A simple and language independent subword tokenizer and detokenizer for neural text processing.
Albert: A lite bert for self-supervised learning of language representations.
Multi-source domain adaptation with weak supervision for early fake news detection.
Characterizing covid-19 misinformation communities using a novel twitter dataset.
A stance data set on polarized conversations on twitter about the efficacy of hydroxychloroquine as a treatment for covid-19.
Misinformation in action: Fake news exposure is linked to lower trust in media, higher trust in government when your side is in power.
Covid-19 Fake News Dataset (Kaggle).
Beyond artificial reality: Finding and monitoring live events from social sensors.
How does lipschitz regularization influence gan training.
The instagram infodemic: cobranding of conspiracy theories, coronavirus disease 2019 and authority-questioning beliefs.
Snorkel: Rapid training data creation with weak supervision.
End-to-end weak supervision.
Textual characteristics of news title and body to detect fake news: a reproducibility study.
Exploring generalizability of fine-tuned models for fake news detection.
Odin: Automated drift detection and recovery in video analytics.
Probabilistic lipschitzness a niceness assumption for deterministic labels.
Testing the generalization of neural language models for covid-19 misinformation detection.