key: cord-0156990-p83nhiz9 authors: Setlur, Vidya; Kumar, Arathi title: Sentifiers: Interpreting Vague Intent Modifiers in Visual Analysis using Word Co-occurrence and Sentiment Analysis date: 2020-09-26 journal: nan DOI: nan sha: 76dc2ca05577cb8e44e7461464b87f0a5dd29fa1 doc_id: 156990 cord_uid: p83nhiz9

Natural language interaction with data visualization tools often involves the use of vague subjective modifiers in utterances such as "show me the sectors that are performing" and "where is a good neighborhood to buy a house?" Interpreting these modifiers is often difficult for these tools because their meanings lack clear semantics and are in part defined by context and personal user preferences. This paper presents a system called Sentifiers that makes a first step in better understanding these vague predicates. The algorithm employs word co-occurrence and sentiment analysis to determine which data attributes and filter ranges to associate with the vague predicates. The provenance results from the algorithm are exposed to the user as interactive text that can be repaired and refined. We conduct a qualitative evaluation of the Sentifiers system that indicates the usefulness of the interface as well as opportunities for better supporting subjective utterances in visual analysis tasks through natural language.

Understanding user intent in a query has been recognized as an important aspect of any natural language (NL) interaction system [9, 25]. Search queries typically consist of keywords and terms called modifiers that imply a diverse set of search intents [23]. While basic keyword matches from users' search queries might elicit a reasonable set of results, interpreting modifiers provides a better understanding of the semantics in the queries [26]. Recently, NL interfaces for visual analysis tools have garnered interest in supporting expressive ways for users to interact with their data and see results expressed as visualizations [1-3, 15, 19, 22, 30, 35, 36].
Users often employ vague language while formulating natural language queries when exploring data, such as "which country has a high number of gold medals?" or "what time of the day do more bird strikes occur?" [21]. There is some precedent in research on how these simple vague modifiers, comprising superlatives and numerical graded adjectives, should be appropriately interpreted [21, 31]. However, users also employ less concrete and often subjective modifiers such as 'best', 'safe', and 'worse' in utterances [21]. Such modifiers make it challenging for natural language interfaces to precisely determine the extensions of these concepts and to map intent to the analytical functions provided in visual analysis systems.

Contribution. This paper introduces Sentifiers, a system to explore reasonable interpretations and defaults for such subjective vague modifiers in natural language interfaces for visual analysis. The algorithm identifies numerical attributes that can be associated with a modifier using word co-occurrence. Sentiment analysis determines the filter ranges applied to the attributes. Similar polarities result in associating the Top N of data values for an attribute with the modifier, while diverging polarities are mapped to the Bottom N. Figure 1a indicates that 'unsafe' and the attribute magnitude have similar negative sentiment polarities, defaulting to a higher earthquake magnitude range as seen in the map. The system can use domain-specific information when it is available, for example from WolframAlpha [4]. Figure 1b shows diverging polarities for the modifier 'struggling' paired with attributes incomePerCapita and lifeExpectancy. Lower numerical filter ranges based on the statistical properties of the data are applied to generate the scatterplot.
Interactive text is displayed to show the provenance of the system's interpretation, with clickable portions exposed as widgets that can be refined by the user. An evaluation of the system provides useful insights for the future design of NL input systems that support vague concepts in visual analysis.

Research exploring the semantics of vague concepts for understanding intent spans three main categories: (1) Computational Linguistics, (2) Intent and Modifiers in Search Systems, and (3) Natural Language Interaction for Visual Analysis.

The notion of vagueness in language has been studied in the computational linguistics community [32]. Research has focused on the conceptualization and representation of vague knowledge [8]. The Vague system introduces a technique for generating referring expressions that include gradable adjectives [39]. De Melo et al. infer adjective grade ordering from large corpora [14] and Vegnaduzzo automatically detects subjective adjectives [40]. Computational linguists have developed approaches for subjectivity and polarity labeling of word senses [6, 42]. In our work, we draw inspiration from linguistic literature, specifically polarity identification for computing the semantics around vague subjective concepts.

Search systems have explored techniques to deduce intent in queries during exploratory search. Several techniques exist to extract entity-oriented search intent to improve query suggestions and recommendations [16]. Detecting intent in search systems is also based on query topic classification [33]. Bendersky et al. assign weights to terms in a search query based on concept importance [7]. Recent work has focused on deriving query intent by fitting queries into templates [5, 25]. Li et al. employ semantic and syntactic features to decompose queries into keywords and intent modifiers [25]. Researchers have predicted search intent and intentional task types from search behavior [12, 28].
While the goal of our work to interpret intent in queries is similar to that of search tasks, we focus on resolving vague modifiers to generate relevant visualization responses.

Similar to search systems, natural language interfaces for visual analysis need to understand intent and handle modifiers in the utterances. DataTone provides ambiguity widgets to allow a user to update the system's default interpretation [19]. Eviza and Analyza support simple pragmatics in analytical interaction through contextual inferencing [15, 30]. Evizeon [22] and Orko [35] extend pragmatics in analytical conversation. None of these systems consider how imprecise modifiers can be interpreted. The Ask Data system describes the handling of numerical vague concepts such as 'cheap' and 'high' by inferring a range based on the underlying statistical properties of the data [31]. Hearst et al. explore appropriate visualization responses to singular and plural superlatives and numerical graded adjectives based on the shape of the data distributions [21]. We extend this work to more vague, subjective modifiers.

We introduce a system, Sentifiers, that interprets vague modifiers such as 'safe' and 'struggling' in an NL interface for visual analysis. The system employs a web-based architecture with the input query processed by an ANTLR parser with a context-free grammar, similar to parsers described in [22, 30]. A data manager provides information about the data attributes and executes queries to retrieve data. Upon execution, the query generates a D3 visualization result [10]. The process for resolving a set of data attributes and their values to a modifier found in the NL input to Sentifiers is outlined as:

Input: Natural language utterance α
Output: Generate visualization response

α is the NL input utterance.
m is the vague modifier in the utterance α.
Part-of-Speech tagger POS identifies m in α.
attrs_num is the set of numerical attributes in the dataset D.
attrs_cnum is the set of co-occurring numerical attributes in D with attrs_cnum ⊆ attrs_num.
PMI computes co-occurrence scores w_c for m and attrs_num.
polarity computes sentiment polarities p for m and attrs_cnum.

1 Invoke POS(α) returning m.
2 Compute PMI(m, attrs_num) → w_c for each attr_i ∈ attrs_cnum.
3 Compute polarity(m, attrs_cnum) → p.
4 Update interface based on w_c and p.

Vague modifiers are gradable adjectives that modify nouns and are associated with an abstract scale ordered by their semantic intensity [24]. Gradable adjectives can be classified into two categories based on their interpretation as measure functions [24]. Numerical graded adjectives such as 'large' and 'cheap' are viewed as measurements that are associated with a numerical quantity for size and cost respectively. Complex graded adjectives like 'good' and 'healthy' tend to be underspecified for the exact feature being measured. While the interpretation of numerical gradable adjectives has been explored in NL interfaces for visual analysis [21, 30, 31], this paper specifically focuses on the handling of complex gradable adjectives.

Sentifiers first applies a commonly used performant part-of-speech (POS) tagger during the parsing process to identify these complex gradable adjectives and their referring attributes in the NL utterances [37]. The system can distinguish complex gradable adjectives by checking for the absence of superlative or comparative tags that are used to annotate numerical graded adjectives.

The next step maps the vague modifier to a scale based on its semantic intensity so that the modifier can be interpreted as a set of numerical filters for generating a visualization response. We base our approach on linguistic models that represent the subjectivity of complex modifiers as a generalized measure mapping the modifier to numerical attributes in a multidimensional space [18].
For example, the subjectivity of the modifier 'healthy' can be interpreted based on 'weight', 'amount of exercise', and 'hospital visits.' Sentifiers computes the semantic relatedness between the modifier and the numerical data attributes using a co-occurrence measure. To have sufficient coverage for co-occurrence, we use an extensive Google n-grams corpus [27]. To maximize the chances of co-occurrence, Sentifiers considers co-occurrence between all n-gram combinations of the modifier and the attribute names. For example, some of the n-grams for the attribute income per capita are 'income per capita,' 'income per,' 'per capita,' and 'income.'

We employ Pointwise Mutual Information (PMI), an information-theoretic measure that compares the probability of a modifier m and a numerical attribute attr_num occurring together with the probability of observing the terms independently [13]. We found this measure to work well and to be performant with terse word co-occurrence pairings without requiring sentence embeddings. We consider any numerical attribute attr_cnum that has a non-zero PMI score, indicating the presence of a co-occurrence with m. The PMI of modifier n-gram t_m with one of the attribute n-grams t_attr is:

$$\mathrm{PMI}(t_m, t_{attr}) = \log \frac{p(t_m, t_{attr})}{p(t_m)\, p(t_{attr})}$$

where p(t_m, t_attr) is the probability of the two n-grams occurring together, and p(t_m) and p(t_attr) are the probabilities of observing each n-gram independently.

Once the modifier is semantically associated with co-occurring numerical attributes, we need to determine a reasonable numerical range to associate with the modifier. Sentiment polarity analysis is a linguistic technique that uses positive and negative lexicons to determine the polarity of a phrase [43]. The technique provides the ability to dynamically compute the sentiment of the phrase based on the context in which its terms co-occur rather than pre-tagging the phrase with absolute polarities, which is often not scalable. We determine the individual sentiment scores with a sentiment classification based on a recursive neural tensor network [34].
We choose this technique as its models handle negations and reasonably predict sentiments of terser phrases, characteristic of queries to Sentifiers. The sentiments are returned as a 5-class classification: very negative, negative, neutral, positive, and very positive. The values are normalized to [−1, +1], ranging from negative to positive, to provide an overall sentiment. We then determine the sentiment polarities of the modifier m and co-occurring attribute attr_cnum pair based on their individual sentiments (ignoring the strength of the sentiments) using the following combinatorial logic. We treat neutral sentiment similarly to positive sentiment, as neutral text tends to lie near the positive boundary of a positive-negative binary classifier [43].

if (sentiment_m == positive or sentiment_m == neutral) and (sentiment_attr_cnum == positive or sentiment_attr_cnum == neutral) then
    Compute TopN(attr_cnum).
else if (sentiment_m == positive or sentiment_m == neutral) and sentiment_attr_cnum == negative then
    Compute BottomN(attr_cnum).
else if sentiment_m == negative and (sentiment_attr_cnum == positive or sentiment_attr_cnum == neutral) then
    Compute BottomN(attr_cnum).
else if sentiment_m == negative and sentiment_attr_cnum == negative then
    Compute TopN(attr_cnum).
end if

Sentifiers uses sentiment polarities to compute the ranges in two ways. If domain knowledge exists, the system uses the information to determine a default (Figure 3a uses the Richter scale [4]). Otherwise, the system computes Top N as the range [med + MAD, max] and Bottom N as the range [min, abs(med − MAD)], where med, MAD, min, and max are the median, median absolute deviation, minimum, and maximum values for attr_cnum respectively (see Figure 3b). We choose MAD as it tends to be less affected by non-normality [11].

Figure 1 shows the Sentifiers interface with an input field that accepts text queries.
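The interpretation steps described above (n-gram generation, PMI scoring, the polarity rule, and the median/MAD default ranges) can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the Sentifiers implementation: the function names, the use of raw corpus counts as PMI inputs, and the convention that a normalized sentiment score below 0 counts as negative are all hypothetical choices made for this example.

```python
import math
from statistics import median


def ngrams(phrase):
    """Return all contiguous word n-grams of a phrase (hypothetical helper)."""
    words = phrase.lower().split()
    return [
        " ".join(words[i:j])
        for i in range(len(words))
        for j in range(i + 1, len(words) + 1)
    ]


def pmi(pair_count, count_m, count_attr, total):
    """Pointwise mutual information from raw counts.

    Assumption: probabilities are estimated as count / total. A zero count
    yields 0.0 here, which this sketch treats as "no co-occurrence".
    """
    if pair_count == 0 or count_m == 0 or count_attr == 0:
        return 0.0
    p_joint = pair_count / total
    p_m = count_m / total
    p_attr = count_attr / total
    return math.log(p_joint / (p_m * p_attr))


def is_negative(score):
    """Assumption: scores lie in [-1, +1]; neutral (0) groups with positive."""
    return score < 0


def default_range(values, sentiment_m, sentiment_attr):
    """Apply the polarity rule, then the median/MAD default range.

    Matching polarities map to Top N = [med + MAD, max];
    diverging polarities map to Bottom N = [min, abs(med - MAD)].
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if is_negative(sentiment_m) == is_negative(sentiment_attr):
        return ("top", med + mad, max(values))
    return ("bottom", min(values), abs(med - mad))


if __name__ == "__main__":
    print(ngrams("income per capita"))
    # Both sentiments negative (e.g. 'unsafe' with magnitude): Top N range.
    print(default_range([1, 2, 3, 4, 5, 6, 7], -1.0, -0.5))
```

In this sketch the strength of each sentiment is ignored, as in the rule above; only the sign matters. A domain-specific default, where one exists, would replace the value returned by `default_range`.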
Upon execution of the query, range filters for the co-occurring numerical attributes are applied, showing a visualization response. The system interpretation is expressed in the form of interactive text [41] above the visualization (Figure 4a) to help the user understand the provenance of how the modifier was interpreted. Positive, negative, and neutral sentiments are shown in blue, red, and yellow respectively (Figure 4b). The text contains widgets that show ranges, starting from the attribute with the highest co-occurrence. Similar to other NL systems [19, 30, 31], we expose system presumptions as widgets (Figure 4c). If domain-specific semantics are used, a link to the source is provided (Figure 1a). To provide easier readability, Sentifiers displays up to two widgets. Word co-occurrence and sentiment analysis techniques can produce incorrect results. The user can repair the system decisions (Figures 4d and f) and the interface updates to reflect the changes (Figure 4e). These refinements are persistent for the duration of the user session.

We conducted a user study of Sentifiers with the following goals: (1) collect qualitative feedback on the handling of the modifiers for various visual analysis tasks and (2) identify system limitations. This information would help provide insights as to how the handling of complex vague modifiers could integrate into a more comprehensive NL visual analysis interface. The study was exploratory in nature: we observed the types of vague modifiers people asked and how they responded to the system behavior. Because the main goal of our study was to gain qualitative insight into the system behavior, we encouraged participants to think aloud with the experimenter.

Figure 4: Interactive text response to a query "which countries are booming?". Sentifiers provides the ability to refine the system defaults.

We recruited ten volunteers (five males, five females, age 24-65).
All were fluent in English and all regularly used some type of NL interface such as Google. Eight used a visualization tool on a regular basis and the rest considered themselves beginners. Each participant was randomly assigned a dataset of earthquakes in the US [38] or the health and wealth of nations [20], with an equal number of participants for each. We began with a short introduction of how to use the system. Participants were instructed to phrase their queries in whatever way felt most natural and to tell us whenever the system did something unexpected. We discussed reactions to system behavior throughout the session and concluded with an interview. The study trials were done remotely over a shared videoconference to conform with social distancing protocol due to COVID-19. All sessions took approximately 30 minutes.

We employed a mixed-methods approach involving qualitative and quantitative analysis, but considered the quantitative analysis mainly as a complement to our qualitative findings. The quantitative analysis consisted of the number of times participants used vague subjective modifiers and interacted with the text response.

Overall, participants were positive about the system and identified many benefits. Several participants were impressed with the ability of the system to understand their queries ("I typed scary to see what it would do, and it understood." [P2]). Sentifiers' text feedback was found to be helpful ("I wasn't sure how the system would handle this, but it was pretty clear when I saw the response" [P4]). The participants appreciated the functionality to be able to correct the system's response ("I wanted to tweak the range a bit and it was useful to be able to change the slider and see the result update" [P9]). The number of unique vague modifiers per participant ranged from 3 to 12 (µ = 6.7) with a total of 24 unique complex modifiers overall.
The three most common modifiers were 'good', 'bad', 'severe' for the earthquakes dataset and 'prosperous', 'flourishing', 'poor' for the health and wealth of nations dataset. All participants interacted with the text response to understand the system behavior. The most common interaction was updating the data ranges for the attributes (69% of the interactions), followed by adding new attributes (23%), and deleting attributes from the interpreted result (8%). Comments relevant to this behavior included, "The range seemed high for me and I changed it. It was nice to see the system remember that" [P10], "I wanted population to be added to the mix and it was easy to just click and do that" [P3], and "I wasn't interested in life expectancy so I just got rid of it" [P1].

The study also revealed several shortcomings and provides opportunities for future NL systems supporting visual analysis tasks:

Support for more complex interpretations: The current implementation does not support combinations of vague modifiers in the same query. For example, the system was unable to interpret "show me countries that are doing very well and poorly." [P4]. P2 expressed that they wanted flexibility in defining analytical functions such as associating 'unsafe' with the frequency of recently occurring earthquakes with magnitude 5 or greater. Sentifiers failed to correctly interpret queries such as "which countries are reasonably doing well," where P7 expected some middle range, though they were able to adjust the ranges after. A comprehensive evaluation with additional datasets would be necessary to ascertain how effective this system would be alongside standard visual analysis tools.

Handling customization and in-situ curation: The topic of customization of the interpretation behavior came up during the study. For example, P1 said "I typed -show me which countries are affordable and it showed me an income range.
I was expecting a response that considered inflation, GDP or have a way for me to define that." The algorithms employed in Sentifiers assume that the data attributes are curated with human-readable words and phrases. However, data is often messy with domain-specific terminology. Future work should explore mechanisms for users to customize semantics of attributes and interpretations in the flow of their analysis.

Handling system expectations, biases, and failures: NL algorithms have been shown to exhibit socio-economic biases, including gender and racial assumptions, often due to the nature of the training data [17]. Their use can perpetuate and even amplify cultural stereotypes in NL systems. For example, P7 commented, "I asked for good places to live and the system responded with a high income per capita. To me, that opens up bigger issues such as gentrification and economic segregation." This suggests that there is a responsibility for improved transparency in system behavior; determining appropriate de-biasing methods remains an open research problem.

This paper presents a technique to explore how a system can interpret subjective modifiers prevalent in natural language queries during visual analysis. Using word co-occurrence and sentiment polarities, we implement Sentifiers to map these modifiers to more concrete functions. We expose the provenance of the system's behavior as an interactive text response. An evaluation of the system indicates that participants found the system to be intuitive and appreciated the ability to refine the system choices. Feedback from interacting with Sentifiers identifies opportunities for handling vagueness in language in the future design of such natural language tools to support data exploration. As Bertrand Russell stated [29], "Everything is vague to a degree you do not realize till you have tried to make it precise."
Towards rich query interpretation: Walking back and forth for mining query templates
Improving the impact of subjectivity word sense disambiguation on contextual opinion analysis
Learning concept importance using a weighted dependence model
Selected Papers - Uncertain Reasoning Track - FLAIRS
Computational semantics in discourse: Underspecification, resolution, and inference
D3: Data-driven documents
Doing Better Statistics in Human-Computer Interaction
Actively predicting diverse search intent from user browsing behaviors
Word association norms, mutual information, and lexicography
Good, great, excellent: Global inference of semantic intensities
Analyza: Exploring data with conversation
Mining coordinated intent representation for entity search and recommendation
Bias in computer systems
Multidimensionality in the grammar of gradability
DataTone: Managing ambiguity in natural language interfaces for data visualization
Toward interface defaults for vague modifiers in natural language interfaces for visual analysis
Applying pragmatics principles for interaction with visual analytics
Determining the user intent of web search engine queries
Projecting the Adjective: The Syntax and Semantics of Gradability and Comparison. Outstanding dissertations in linguistics. Garland
Understanding the semantic structure of noun phrase queries
Introduction to Information Retrieval
Quantitative analysis of culture using millions of digitized books
Extracting information seeking intentions for web search sessions
The Analysis of Mind
Eviza: A natural language interface for visual analysis
Inferencing underspecified natural language utterances in visual analysis
Vagueness in Context
Building bridges for web query classification
Recursive deep models for semantic compositionality over a sentiment treebank
Orko: Facilitating multimodal interaction for visual exploration and analysis of networks
Articulate: A semi-automated model for translating natural language queries into meaningful visualizations
Feature-rich part-of-speech tagging with a cyclic dependency network
USGS. Earthquake facts and statistics
Generating vague descriptions
Acquisition of subjective adjectives with limited resources
Explorable explanations
Creating subjective and objective sentence classifiers from unannotated texts
Articles: Recognizing contextual polarity: An exploration of features for phrase-level sentiment analysis