Open data visualizations and analytics as tools for policy-making

Loni Hagen a,⁎, Thomas E. Keller a, Xiaoyi Yerden b, Luis Felipe Luna-Reyes b

a University of South Florida, 4202 E. Fowler Avenue, Tampa, FL 33620-7800, USA
b University at Albany, 135 Western Ave., Albany, NY 12222, USA

Government Information Quarterly (contents lists available at ScienceDirect; journal homepage: www.elsevier.com/locate/govinf)
From the SelectedWorks of Loni Hagen, University of South Florida (http://www.usf.edu), 2019. Available at: https://works.bepress.com/loni-hagen/14/

ARTICLE INFO

Keywords: Policy informatics; Policy analytics; Open data; Topic modeling; Visual analytics; Usability testing

ABSTRACT

Government agencies collect large amounts of structured and unstructured data. Although these data can be used to improve services as well as policy processes, it is not always clear how to analyze the data and how to glean insights for policy making, especially when the data includes large volumes of unstructured text data. This article reports opinions found in “We the People” petition data using topic modeling and visual analytics. It provides an assessment of the usability of the visual analytics results for policy making based on interviews with data professionals and policy makers. We found that visual analytics have potentially positive impacts on policy making practices. Experts also articulated potential barriers regarding the adoption of visual analytics tools, and made suggestions. Potential barriers included insufficient resources in government agencies and difficulty integrating analytics with current work practices.
The main suggestions involved providing training and interpretation guidelines along with the visual analytics tools. Major contributions of this study include: (1) suggesting viable visualization tools for analyzing textual data for policy making, and (2) suggesting how to lower barriers to adoption by increasing usability.

1. Introduction

Technology developments have been recognized as catalysts of organizational change and transformation (Bannister & Connolly, 2014; Treacy & O'Sullivan, 2010). The Internet has not been an exception, and it has triggered the development of new business models and service delivery mechanisms in both the public and private sectors (Bergh & Benghiat, 2017; Luna-Reyes & Gil-Garcia, 2014). In the private sector, for example, businesses like Amazon have applied information technologies and data analysis techniques to transform the retail industry (Bergh & Benghiat, 2017). In the public sector, technologies have also promoted change, although research suggests that change has been incremental rather than transformational (Norris & Reddick, 2013). However, recent trends in open data, big data, and data analytics have renewed both the possibility of and interest in transforming government activity, particularly in the development of policy (Janssen & Helbig, 2019; Puron-Cid, Gil-Garcia, & Luna-Reyes, 2016). In particular, some researchers have identified the potential impact of social media data and petitioning systems in the early stages of policy making, contributing to the improvement of problem definition and agenda setting activities (Hagen, Harrison, & Dumas, 2018; Janssen & Helbig, 2019; Luna-Reyes, 2017). More specifically, Janssen and Helbig (In Press) pointed to the need for developing methods to analyze content developed with such platforms as sources of inspiration for policy makers. However, data collected through these platforms pose at least two challenges for their effective use.
First, these datasets include large amounts of unstructured textual data, which makes manual reading too burdensome a way to understand the content. Although recent efforts to develop advanced text mining tools have helped to address the first challenge, the use of such tools poses a second challenge: there is still much to learn about their application and interpretation by policy makers. As a result, it is rare to find empirical examples of textual data being successfully adopted for policy making. However, one of the motivations behind opening data by government is to promote innovations that facilitate the exploitation of these data (Mergel, Kleibrink, & Sörvik, 2018).

⁎ Corresponding author at: School of Information, University of South Florida, 4202 E. Fowler Avenue, CIS2031, Tampa, FL 33620-7800, USA. E-mail address: lonihagen@usf.edu (L. Hagen).
https://doi.org/10.1016/j.giq.2019.06.004
Received 17 August 2018; Received in revised form 5 June 2019; Accepted 9 June 2019
Government Information Quarterly xxx (xxxx) xxx–xxx. 0740-624X/© 2019 Elsevier Inc. All rights reserved.

Motivated by these challenges, we explore data from the We The People petitioning platform to answer two research questions: (1) what is a potential solution to efficiently extract and effectively present topics expressed in large volumes of textual data? and (2) to what extent do policy makers consider visual analytics solutions to be usable and useful for policy making? To answer the first question, we extend previous work on topic modeling (Hagen, 2018) by applying topic modeling for topic extraction and visualization tools such as LDAvis for presenting the extracted topics. Then, to answer the second research question, we test the usability of these possible solutions with policy makers, data
analysts, and communication specialists to empirically show their perspectives on adopting such visual analytics tools for everyday practices. In this way, this research contributes to the data-driven policy making literature by proposing a framework to facilitate the analysis and visualization of large volumes of text data, and by diagnosing government practitioners' responses and feedback on such visual analytics tools for policy making.

The structure of the paper is as follows. The second section presents background information about “We the People” data. The third section discusses the theoretical foundation of value creation through open data and introduces topic modeling and visual analytics research conducted in the open data context. The fourth section describes the data and methods, including a potential solution to distill and present representative themes expressed in large volumes of text data. The fifth section presents our key findings from the usability evaluation. The sixth section discusses the main findings in terms of barriers and limitations, and the final section includes the conclusion and future research.

2. Background: We the People open data

The US e-petitioning platform “We the People” (WtP) was launched in 2011 as the flagship initiative of the Obama administration to increase public participation in government (The White House, 2015). The data created through e-petitioning include petition titles, petition texts, signatures and their accumulation, some characteristics of petitioners and signers, issue categories, and metadata (The White House, 2017).
According to the platform rules, petitions that accumulate more than 100,000 signatures in less than 30 days get an official update from the White House. Although not all petitions reach this threshold, data from past petitions are made available to the public for free use, re-use, and distribution as Open Data (Ubaldi, 2013). Datasets are updated about every 6 months to include new data. Following general principles of open data as a source of innovation, the WtP platform provides an API to facilitate data access and manipulation (see https://petitions.whitehouse.gov/developers/get-code). Moreover, the platform provides some analytical tools developed by civic programmers (https://obamawhitehouse.archives.gov/blog/2014/06/03/hackathon-here-white-house). Previous research has suggested that open petitioning data are potential sources of policy topics of public interest (Hagen et al., 2018). For this reason, open petitioning data are the focal point of this research.

WtP open data is unique in three aspects. First, the dataset includes direct expressions of citizen opinion to governments, which are rarely available in traditional information sources such as major news outlets, survey results, or administrative data. Therefore, the petition data can be used to inform policy makers about public opinion and sentiment regarding policy matters. Second, the WtP dataset is a good example of a technically advanced open dataset; it is a quality dataset with defined metadata, arranged in a machine-readable format, and made available through an open API. Third, WtP data is a by-product of a petitioning platform, and governments are flooded with similar types of datasets as the use of social media platforms increases.

The major challenge in using WtP data to create value for the policy-making process is the volume of data and the unstructured nature of petitions.
Use of unstructured data such as abundant text data has been recognized as one of the biggest challenges of big data analytics (Siegel, 2016). While open government and open data initiatives create and share unprecedented amounts of text data including citizen expressions, the process of going through them is too time consuming and complicated to be practical, especially if policy makers need to go through large volumes of text (Walters, Aydelotte, & Miller, 2000). These types of big textual data are growing exponentially as the number of government-led platforms and the adoption of commercial social media increase. Topic modeling and the recent development of visualization tools may help to reduce the cost and time related to the analysis of large volumes of text data.

3. Literature review

3.1. Analytics to create value through open data

Governments around the world have exerted efforts to “create and institutionalize a culture of Open Government” (Nam, 2012, p. 348) by embracing the ideas of transparency, civic engagement in governance, and policy making (Aitamurto, 2012; The White House, 2011). Opening data not only brings changes in government's culture towards “openness, transparency and accountability,” but can also increase public engagement by cultivating a culture of sharing and collaborating through open data (Ubaldi, 2013). These cultural changes and active citizen engagement can create economic innovations (Mergel et al., 2018; Zuiderwijk, Helbig, Gil-Garcia, & Janssen, 2014), improved government performance (Ubaldi, 2013), and increased accountability of elected officials (Sivarajah et al., 2016). Unfortunately, actual creation of value through innovative use of open data has proven to be a difficult task. Despite the increasing number of open data platform initiatives, reported use cases and created value have been lacking (Najafabadi & Luna-Reyes, 2017).
For example, of the 183,000 datasets published on data.gov (the United States' open data portal), only 78 apps were available on the platform as of November 2017.

Data and technology barriers are among the major obstacles to achieving innovation through open data initiatives (Magalhaes & Roseira, 2017; Toots, McBride, Kalvet, & Krimmer, 2017; Zuiderwijk et al., 2014; Zuiderwijk, Janssen, Choenni, Meijer, & Alibaks, 2012). Early on, scholars stressed the importance of open data technologies, in terms of uniformity and integration of information sources, as well as the importance of creating metadata (Dawes, Pardo, & Cresswell, 2004). Later, studies recommended interactivity and usability as crucial elements for making open platforms available for meaningful citizen engagement (Toots et al., 2017). More recently, open data scholars identified that certain technical requirements (such as machine-readable formats, use of APIs, tools for data wrangling, and technical competence of users) are lacking for achieving innovation using open data (Magalhaes & Roseira, 2017; Zuiderwijk et al., 2014).

As scholars have commonly recognized, publishing data is not enough to attain innovation using open data (Janssen, Charalabidis, & Zuiderwijk, 2012). The success of open data depends on active external participation in using the published data (Attard, Orlandi, Scerri, & Auer, 2015). However, for non-technical users, the fundamental lack of the expertise and knowledge required for the collection, manipulation, analysis, and interpretation of the data hinders meaningful engagement with open data, and it is a critical problem (Graves & Hendler, 2013). An important portion of open data users may be non-technical users who want to analyze trends over time to understand longitudinal changes but cannot perform the required tasks due to a lack of expertise.
Recent studies have rightly pointed out the limited capabilities of supply-side open data platforms for supporting non-technical users (Chatfield & Reddick, 2017) as well as the lack of best practices for using the data (Bertot, Butler, & Travis, 2014).

3.2. Visualization of topic modeling

For understanding topics and themes expressed in large volumes of text data, topic modeling has been frequently adopted to automatically discover latent themes in a document collection based on the co-occurrence of words (Blei, 2012). The outcome of topic modeling includes topics (each a keyword list sorted by relevance ranking to the topic) and topic proportions in each document. In general, five to thirty highly ranked keywords are presented as a topic.

Topic modeling is an unsupervised machine learning method that extracts topics without relying on prior human knowledge. Consequently, there are two notable issues when applying topic modeling results to policy making. The first issue is doing it right: producing human-interpretable topics requires careful decisions throughout the modeling process (Boyd-Graber, Mimno, & Newman, 2014). Hagen (2018) extracted topics using petitioning data, although that study's focus is limited to showing “how to train and evaluate” topic modeling; it does not show how topic modeling results can be presented and utilized for policy making or implemented for everyday practices. Our work extends these efforts to produce interpretable topics that are, therefore, amenable to policy making.
The second issue of topic modeling for policy making resides in how to interpret the meaning of topics and the relationships among them (Hagen, 2016; Sievert & Shirley, 2014). Given that topics are extracted based solely on the statistical traits of term co-occurrence, there is no theoretical reason to believe they are easily interpretable by a human (Boyd-Graber et al., 2014). However, some digital government studies have adopted topic modeling to identify and understand public opinions expressed in text data. Reddick, Chatfield, and Ojo (2017), for example, extracted topics appearing in Facebook posts as an effort to create a social media text analytics framework. Hagen, Uzuner, Kotfila, Harrison, and Lamanna (2015) extracted emerging topics from WtP data using a small set of petitions created in the early years of WtP (initiation to mid-2014). Although both examples are steps in the right direction, these studies only displayed topic words with limited interpretations, and it is still hard for non-technical readers to make sense of topic modeling results based solely on the presented topic words.

In order to improve the interpretability of topic modeling results, more recent studies have adopted visual analytics to present them. Cassi, Lahatte, Rafols, Sautier, and de Turckheim (2017) explained the relationships between the ways in which the academic literature and social needs, as expressed in discussions among members of the European Parliament, approach the topic of obesity. Visual analytics tools were effective in presenting the clear misalignment between academic studies and social needs in terms of the obesity issue.

In addition to improved interpretability, visual analytics tools enable meaningful engagement of non-technical users. Graves and Hendler (2013) proposed the use of visualization methods to provide simple mechanisms for non-technical users to explore open data.
Using over 160 public datasets, Keshif, a visualization tool, “let the user define what is being visualized and explored, not how” (Yalçın, Elmqvist, & Bederson, 2016). Poucke et al. (2016) demonstrated that researchers can build complex and automated processes with multiple mouse clicks instead of programming code. Using RapidMiner (rapidminer, 2017), a big data analytics tool, non-coding scientists can prepare data, train and validate models, and embed analytic results. As such, open data experts have stressed the importance of data analysis and visualization tools for achieving innovation using open data (Toots et al., 2017).

Consumers and end users of open data are diverse (e.g., government employees, innovators, citizens, and journalists/researchers/activists) (Gascó-Hernández, Martin, Reggi, Pyo, & Luna-Reyes, 2018). One of the most popular user groups of open data has been technicians who use open data to develop new tools. Developers and data suppliers (most often using open data) get together through hackathons in order to create new services and products using open data. However, we do not know to what extent these products and services have been used by governments to create value, nor do we have information regarding their influence on actual policy making. Perhaps we can achieve innovation from open data when we make visual analytics tools available on open data platforms alongside open datasets. Moreover, innovative use cases, if provided on open data platforms, can stimulate users' creativity. Further, users' perception of the usefulness of a new technology also influences their intention to actually use the technology.

4. Methods

4.1. Data

We used data collected through the publicly available White House application program interface (API), which contains all petition-related data that appeared on the WtP website between September 22, 2011 (the initiation date), and July 12, 2016. This corpus contained 4985 petition documents.
We combined each petition title and its corresponding rationale into one document, which forms the basic unit for this analysis. Fig. 1 is an example of a WtP petition. Available datasets include metadata (including signature counts, user tagging information, petition creation dates, signature dates, and initials of signers).

4.2. Tools for assessing and visualizing data¹

We collected the WtP open government data (OGD) from the WtP API and stored them in a MySQL database (an open source Structured Query Language (SQL) database) (Oracle, 2017). We queried relevant data fields (petition creation date, title, petition body, and signature counts) from the SQL data for the analysis.

After selecting petitions written in English, we converted all texts to lower case, normalized white spaces, eliminated punctuation and non-alphanumeric characters, and removed short words of only one or two characters using the R tm package. We used an English stopwords dictionary included in the “mallet” package to eliminate less informative words such as “a,” “the,” and “of,” which appear in almost every English document; “amp” was added to the stopwords dictionary to eliminate “amp,” which is a processed version of the ampersand (&).

We used the R mallet package to train Latent Dirichlet Allocation (LDA) topic models (Mimno, 2013). Statistical topic modeling such as LDA (Blei, 2012) extracts coherent themes, each of which is a probability distribution over a vocabulary, under the assumption that documents are composed of multiple themes. Each theme (or topic) is generally represented by the words (we call these topic words) that appear most frequently in the relevant documents, and also by the documents that are most representative of the theme.
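The text-cleaning steps described above can be sketched in a few lines. This is a minimal illustration in Python, not the authors' R script (which used the tm and mallet packages); the function name, the tiny stopword list, and the sample petition are all hypothetical, and the English-language filter is omitted.

```python
import re

# Tiny illustrative stopword list (an assumption). The study used the English
# list shipped with the mallet package, extended with "amp".
STOPWORDS = {"the", "and", "for", "that", "amp"}

def preprocess(title, body, stopwords=STOPWORDS):
    """Combine a petition title and rationale and return cleaned tokens."""
    text = f"{title} {body}".lower()
    # Replace punctuation and other non-alphanumeric characters with spaces.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    # split() with no arguments also normalizes runs of white space.
    tokens = text.split()
    # Drop words of one or two characters, then drop stopwords.
    return [t for t in tokens if len(t) > 2 and t not in stopwords]

# Hypothetical petition text; "&amp;" is how an ampersand can appear in raw data.
tokens = preprocess("Require Labeling of GMO Food",
                    "We the people ask the FDA &amp; Congress to act.")
print(tokens)
```

In the sample, the encoded ampersand becomes the token "amp" once punctuation is stripped, which is why that string has to be added to the stopword list.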
In deciding the number of topics to produce, we followed two sources: Hagen (2018), who produced 30 topics from 3344 petitions, of which 26 were of good quality for direct human interpretation, and a manual content analysis by PEW (2016), which reported 25 issue categories after manual analysis of 4799 WtP petitions. Based on the two studies, it is apparent that about 25 policy issue-dimensions can reasonably reflect the WtP corpus. We decided to produce 30 topics, expecting that about 25 would be “human interpretable” because a small portion of the final topics are likely to be of low quality for human interpretation (Boyd-Graber et al., 2014; Hagen, 2016). Using random initialization, we produced ten sets of 30 topics to confirm that random initialization does not influence the stability of the topics. We found that most of the topics (26 out of 30) make sense for human interpretation (Appendix I reports the 30 topics, labels, and quality).

We then developed visualizations for these LDA topics using LDAvis, an open source topic modeling visualization tool (Sievert & Shirley, 2014). We also aggregated available information from the dataset (i.e., signature counts and dates of petition creation) as well as Google Trends for topic interpretation. Fig. 2 shows the framework of the visual analytics using topic modeling.

To help the interpretation and further analyses, we labeled each topic based on the LDAvis visualization results. The topic words were sorted in descending order based on the estimated term frequency within the selected topic (red bars in Fig. 3), which indicates the topic words that are highly relevant to the specific topic. The relevance of a term to a topic is controlled by a weight parameter λ. Topic words displayed in Fig. 3(a) are acquired using λ = 1. Topic words displayed in Fig. 3(b) are results from using λ = 0.6, an optimal value suggested in the literature (Sievert & Shirley, 2014).
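To make the role of λ concrete, the relevance measure defined by Sievert and Shirley (2014) is r(w, k | λ) = λ log(φ_kw) + (1 − λ) log(φ_kw / p_w), where φ_kw is the probability of term w in topic k and p_w is the term's overall probability in the corpus. The Python sketch below ranks three terms under λ = 1 and λ = 0.6; the probabilities are made up for illustration and are not values from the WtP model, and the function names are hypothetical.

```python
import math

def relevance(phi_kw, p_w, lam):
    """Relevance of a term to a topic (Sievert & Shirley, 2014).

    phi_kw: probability of the term within the topic
    p_w:    marginal probability of the term in the whole corpus
    lam:    weight parameter between 0 and 1
    """
    return lam * math.log(phi_kw) + (1 - lam) * math.log(phi_kw / p_w)

def rank_terms(topic_probs, corpus_probs, lam):
    """Return terms sorted from most to least relevant for one topic."""
    return sorted(topic_probs,
                  key=lambda w: relevance(topic_probs[w], corpus_probs[w], lam),
                  reverse=True)

# Made-up probabilities, for illustration only (not from the WtP corpus).
topic_probs = {"president": 0.05, "election": 0.04, "clinton": 0.03}
corpus_probs = {"president": 0.05, "election": 0.004, "clinton": 0.003}

print(rank_terms(topic_probs, corpus_probs, lam=1.0))
print(rank_terms(topic_probs, corpus_probs, lam=0.6))
```

With λ = 1 the ranking depends only on within-topic probability, so a common word such as "president" leads; lowering λ rewards terms that are concentrated in the topic, which moves "election" and "clinton" ahead in this toy example.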
The width of the blue bar indicates the “corpus-wide frequencies of each term,” and the width of the red bar represents “the topic-specific frequencies of each term” (Sievert & Shirley, 2014, p. 68). For example, the bars for “election” and “clinton” are fully red, with no blue bar showing (in Fig. 3(a)), which means that these terms are used exclusively in Topic 5 and thus are highly representative of Topic 5. When λ = 0.6 is used in Fig. 3(b), these two terms are the first and second most relevant terms representing Topic 5. After extracting the 30 topics, labels were selected from the top 10 topic words (except for Police & BLM) displayed by LDAvis (relevance parameter λ = 0.6), also considering semantic meaningfulness.

The sizes of the circles (on the left side of Fig. 3, which shows the global topic view) “are proportional to the relative prevalence of the topics in the corpus” (Sievert & Shirley, 2014, p. 68). For example, Topic 1 is prevalent in about 20% of the corpus, while Topic 21 is prevalent in about 2% of the corpus according to the circle sizes displayed in Fig. 3. The biggest and the smallest topics tend to be hard to interpret because they often include a mixture of different topics, according to a study conducted by Hagen (2016). Also, the distance between topics indicates the semantic distance of topics.

¹ The R script and the data we used for the analysis are available at: https://github.com/lonihagen/Topic-Modeling

For the usability assessment, we created a software package with interactive features (snapshots of the package are in Figs. 3, 4, and 5). In addition to the important topic words, the visualization enables the representation of relations between topics and the prevalence of topics in the entire set of petitions. For example, Fig.
4 shows topic 13, which is a topic about police brutality and the Black Lives Matter (BLM) movement (Rickford, 2016). The left pane of Fig. 4 shows the topological positioning of topic 13, which is located close to topics 20 (Http and China, on lacking human rights in China), 6 (Prison Sentence), and 19 (White Genocide). The right-side pane in Fig. 4 shows the most relevant words representing the topic: “police,” “officers,” “enforcement,” “officer,” “violence,” “black,” “shot,” “law,” “unarmed,” “brown,” and “killed.”

In addition, when we click the first topic word, “police,” for example, we can see other topics that include “police” in their topic words. For example, Fig. 5 shows that topics 6 (Prison Sentence topic) and 7 (Terrorism Syria topic) include the term “police” in their topic words. Since the size of topic 6 is bigger in this case, the term “police” plays a more important role in forming topic 6 (Prison Sentence topic) than topic 7 (Terrorism Syria topic).

As such, the LDAvis results show the contextual richness of topic modeling results by conveying the topological position of the topics and the relations of each topic with other closely related topics. Also, the red bar on the right pane shows the level of importance of each term in the topic. This added information provided by LDAvis offers a rich snapshot of public opinions expressed in WtP petitions.

Fig. 1. An example of a WtP petition. Note: The first two lines (bold and large font) are the title of the petition, and the rest of the text is the rationale of this petition.
Fig. 2. Framework of the visual analytics of topic modeling.
Fig. 3. LDAvis results using λ = 1 (a) and λ = 0.6 (b) focused on “clinton” topic.

In addition to LDAvis visualizations, we produced two other types of visualizations.
As a way of visualizing the popularity of each topic, we decided to show signature counts over time (see Fig. 6). Some topics such as Election Clinton, Police & BLM, and Prison Sentence seem to gain public attention over time. Other topics such as Food Labeling, Guns Firearms, Marijuana, and Secession show overall negative slopes and thus indicate decreasing levels of attention. Some other topics behave differently depending on external events. The Police & BLM topic, for example, includes topic words such as “police,” “law,” “officers,” “violence,” “enforcement,” “officer,” “black,” and “death.” The majority of petitions representing the topic are critical of police brutality, especially against African-Americans. Among the top 20 petitions most relevant to the topic, petitions requesting police officers to wear body cameras were extremely popular, starting on August 13, 2014, right after the Michael Brown case, in which a black male was shot by a police officer on August 9, 2014.

Fig. 4. LDAvis results using λ = 0.6 (b) focused on “Police & BLM” topic.
Fig. 5. Topics including “police” in topic words.

Similarly, several petitions under the Guns Firearms topic were initiated right after the Sandy Hook Elementary School shooting on December 14, 2012, but the level of public attention to the Guns Firearms topic (reflected in the number of signatures) has been decreasing ever since (see the Guns Firearms topic in the second row of Fig. 6).

Other information sources such as Google Trends can be used to compare petition topics and their popularity against keyword search results (see Fig. 7). Google Trends results can be used as a proxy to measure what people are thinking (Stephens-Davidowitz, 2017). We selected relevant topic words from a sample of six topics and searched for them in Google Trends for the United States (https://trends.google.com/).
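The WtP side of this comparison, signature counts per topic over time, can be sketched as follows. This is an illustrative Python sketch rather than the authors' R code: the records are hypothetical, assigning each petition to its highest-proportion topic is an assumption about how petitions were "assigned to the topic," and base 10 is an assumed choice for the log scale.

```python
import math
from collections import defaultdict

# Hypothetical records: (creation year, topic proportions, signature count).
petitions = [
    (2012, {"Guns Firearms": 0.7, "Secession": 0.3}, 1000),
    (2012, {"Guns Firearms": 0.2, "Secession": 0.8}, 100),
    (2014, {"Police & BLM": 0.9, "Guns Firearms": 0.1}, 10000),
    (2014, {"Guns Firearms": 0.6, "Police & BLM": 0.4}, 10),
]

def signatures_by_topic_and_year(records):
    """Sum signature counts per (topic, year), assigning each petition
    to its highest-proportion topic (an assumption of this sketch)."""
    totals = defaultdict(int)
    for year, proportions, signatures in records:
        dominant = max(proportions, key=proportions.get)
        totals[(dominant, year)] += signatures
    return dict(totals)

totals = signatures_by_topic_and_year(petitions)
# Log scale for plotting; base 10 is an assumption of this sketch.
log_totals = {key: math.log10(count) for key, count in totals.items()}

for key in sorted(totals):
    print(key, totals[key], round(log_totals[key], 2))
```

A log scale keeps topics with a few hundred signatures visible next to topics with tens of thousands, which a linear axis would flatten.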
The Google Trends results are displayed in the left column, in contrast to the WtP topics and signature counts displayed in the right column of Fig. 7. The popularity of some WtP topics seems to correspond to people's thoughts as reflected in Google Trends. For example, the attention paid to the Marijuana, Guns Firearms, and Secession topics has decreased since peaking in 2012 in both Google Trends and the WtP topics. The Election Clinton and Police & BLM topics have gained higher attention in Google Trends as well as in WtP (fourth and fifth rows of Fig. 7). These results indicate that WtP may reflect the public's attention to certain topics, and that topic modeling results combined with signature counts can reveal the level of popularity of certain topics.

However, due to platform-specific effects, it would be naïve to think that WtP always correctly reflects the public's attention. For example, the White Anti Genocide topic was extremely popular in 2012 and has decreased in popularity on WtP, while making gains in popularity on Google Trends (the last row of Fig. 7). During 2012 and 2013, after President Obama's reelection, there were organized activities relating to petition creation and signing on WtP regarding “white genocide” issues (Hagen, 2016), which have gradually decreased since then. Specific groups of people were dedicated to spreading the agenda on WtP. As seen in the Google Trends results, the public started paying attention to this topic much later (since mid-2014) than WtP users did.

Fig. 6. Changes of number of signatures per topic by time.

These interpretations are merely examples, and were not provided to the experts. If the visual analytics are effective, we expect that policy makers can acquire actionable information and insights that can be used for their policy making.
Note (Fig. 7): The Y axis of the Google Trends results represents search interest relative to the highest point on the chart for the given region (U.S.) and time. A value of 100 is the peak popularity for the term. A value of 50 means that the term is half as popular. The Y axis of the topic popularity shows log values of signature counts of petitions assigned to the topic.

4.3. Usability assessment

Usability assessments have been used as tools to involve users in the development of technologies, to better understand their needs as well as the ways in which technology can support their work processes (Howell & Lang, 2017; Rubin, 1994). User-centered approaches to application development involve the use of tools and methods to help software developers and analysts improve the usability of their applications. The International Organization for Standardization (ISO 9241-11) defines usability as the extent to which a product (in this case, a visualization tool) can serve the needs of a specific user group. Usability tests are commonly used to assess information systems. The ISO standard identifies three main indicators for usability: effectiveness, efficiency, and user satisfaction (ISO 9241-11). Effectiveness refers to the extent to which the product features help the user to accomplish the stated goals. Efficiency is related mainly to the extent to which the product helps the user to reach these goals with the least possible effort. Finally, user satisfaction refers to the subjective perception of the user and the interaction with the product. Nielsen (2012) suggests additional indicators such as learnability (how easy it is to move around the interface), memorability (how easy it is to remember how to use it), and errors (how many errors people make when interacting with the system). The utility of the system (providing the features you need) is closely related to usability. In fact, it is suggested that the usefulness of a system results from considering both usability and utility (Nielsen, 2012).
Fig. 7. Google trends and topic popularity.

We adopted a Heuristic Evaluation approach to usability testing (Nielsen & Molich, 1990) to assess potential ways in which our visualizations may support the process of policy making, as well as potential improvements. We were mostly interested in understanding the utility of the visualizations, as well as their learnability and user satisfaction, and we designed a set of questions with these dimensions in mind. We also included questions about the nature of the experts' expertise and their current positions to better understand their responses. Finally, we asked them to give suggestions for improvement and general comments. The interview included 12 questions (see Appendix II). Consistent with the Heuristic Evaluation approach, we used these questions to explore the expert perspective on the visualizations.

We approached 6 experts who were policy makers, data analysts, or communications specialists. Although our original plan was to involve only policy and data analysts, one of them suggested the inclusion of a communications specialist. The sample size is consistent with usability testing practices, and experts were selected using convenience sampling (Rubin, 1994).

Usability tests were conducted with each expert individually during May and June 2018. Each interview started by asking the expert about their background, experience in data analysis, and perception of social media and petition sites for policy making. Then, we introduced to the interviewees (1) the interactive LDAvis interface, snapshots of which are shown in Figs. 3, 4, and 5; (2) topics and signature counts by time, shown in Fig. 6; and (3) Google Trends and topic popularity, shown in Fig. 7.
It is important to note that some visualizations presented to the interviewees were interactive, allowing them to explore relationships among topics on the computer and to do some simple analysis with the graphs. Each expert had a chance to interact with the LDAvis interactive visualization tool, as well as the two graphs, for 5–10 min. After introducing the visualization tools, we asked experts for their views on the utility and learnability of the tools, and on their satisfaction with them, in their daily jobs. Each interview lasted 45 to 60 min. Five sessions took place in a discussion room on campus reserved by a member of the research team, and the other was conducted at the participant's office.

5. Findings: Usability assessment

In this section, we present the main findings from the usability assessment of the visualization tools introduced in previous sections. Data for the assessment come from the six face-to-face interviews with experts in data analysis, policy making, and communications. Among the experts, three were policy makers from different levels of the public sector in New York State: institution level, district level, and state level. Only one of them had experience using data visualizations for policy making. Two other experts were data analysts with a background in information science. One of them has significant experience in data analysis, algorithm design, and health informatics, and the other has several years of experience using data visualization for decision making in the private sector. We also interviewed a communication specialist from a public institution, using her potential to use data visualization for decision making as a criterion for selection. Table 1 presents an overview of the main responses from experts in the usability assessment. All experts found at least some topics to be relevant for the policy conversation.
Expert 6 suggested that the topics in the interface varied in relevance, with some more important than others.

We found that experts were able to use the interactive LDAvis interface and that, in general, their interpretations of the data were consistent with one another. Experts generally perceived that it was easy to interact with the visualization interface and to interpret the results, especially with a brief introduction from the interviewer. As shown in Table 1, however, at least two of them found the visualizations less intuitive and harder to interpret than the other experts did. Experts' reactions included phrases such as “the interface is designed very well, everything is very clear, I feel comfortable interacting with it,” or “your introduction helps a lot… for me to understand the interface and to interpret the visualization.” The experts thought the tool was potentially helpful for analyzing large amounts of qualitative data through theme generation, and that data visualization provides an easier way to communicate with people possessing different levels of technical proficiency. For example, one mentioned, “couple years ago, we have received a lot of feedback from the residents in our district through the survey we sent out, however, due to a lack of staff and technique, we did not know what to do with it. Now I can see that this tool will be very helpful with analyzing those kind of feedback.” Another one explained, “I think this tool will be very useful to put information into different categories or themes,” and “Data visualization provides summarized results and present it in a very vivid way. It is especially good at presenting the trend and the changes over time.” Some interviewees without prior experience using visualization, however, conveyed their struggle: “The data visualization catches my eyes but I am not sure whether I understand it correctly. Some of the themes are very self-explanatory, some are not.
Maybe because I do not have enough experience, but I think it is very important someone can help people to interpret it in a right way.”

Some experts found the visualizations over time (see Fig. 6) particularly interesting and described them in different ways. Some described the trends using phrases such as: “It seems that the search interest does not match the signatures over time, I don't know why. Some results are even opposite….” and “Hmmm, it is interesting, the search interest does not necessarily match the signatures overtime, which means that people who are interested in search some topic but may not end up act on it to sign the petition about that topic….”

In addition, most of the experts recognized the utility of LDA tools for analyzing qualitative data in general, and they also pointed out potential areas for improvement and obstacles to implementing these tools in their own practice. For example:

Currently, this tool only focuses on topic extraction. However, as a policy maker, when we make decision, we mainly focus on understanding people's opinion, whether they are for or against some issues. We would also be interested to know what specific issues about certain topic that people are interested in. For example, the health care topic, what specific issues people are interested in, do they support or against it?

Other shared concerns were associated with a lack of resources and the need for training in the use of this type of tool. For example, one expert stated:

I am working in the same office with other legislators, and we share one analyst. Most of the time, I will conduct research on my own. For me, I will need some training to be able to use and understand this tool. Also, we have to consider the budget of the department to implement this tool, or even hire some technical person to manage this tool. It is not feasible for my department, at least for now it is not feasible.
Similarly, another expert shared:

Designing and implementing a visualization tool requires additional funding, staffs with technical skills, data analytical skill, critical thinking, reflective ability, communication skills…. Training is necessary, especially for people with no technical background to learn how to use the tool to help with their daily work.

For one expert, how to incorporate the tool into his daily work was unclear: “I rarely use data visualization in my own work, I can see its merit, but I am not sure how to incorporate [it] in my work, maybe in the future, there will be an opportunity for me to do so.”

Experts provided suggestions for the future improvement of the tool. Referring to the LDAvis interface, one of them suggested, “I think in the interface, instead of numbers, adding the labels to each topic will make it easier to see which circle represents which topic.” Another expert suggested including some measure of people's interest or sentiment analysis: “For me, I would like to see the tool to generate more in-depth analysis on what specific aspects related to each topic that people are interested in, and conduct some sentiment analysis to see what their attitudes towards these aspects are.” A third expert suggested including information about the validity of the analysis in the visualization: “Besides the results of the tool, I would be more interested to see how to validate the tool.”

Finally, all experts expressed some skepticism about how well social media or online petitions reflect the interests of the general public. For example, one of them mentioned, “In my opinion, online petitions only represent a small group of people who may share a very extreme idea or has strong motivation to express themselves. It is difficult to assess to what extent it represents the more general interest.” Another expert also discussed the importance of his local constituents and issues related to access to online platforms: “As a policy maker, I care more about the interest and need of the people in my district. In my district, most of the people do not participate in online activities, their opinion may not be shown from these petitions.”

Table 1
Overview of main responses from experts.
Columns: Topic; Expert 1; Expert 2; Expert 3; Expert 4; Expert 5; Expert 6. Entries in each row follow the column order.

Expert background: Legislative Director (Policy Analyst); Principal Data Scientist; Former Technology Director at PwC; Trustee for the NYS Higher Education Services analyzing policy impacts; County Legislature Representative; Communications Specialist. User of data in news.

Relevance of the topics: Only the Ukraine Russia, secession topics are relevant. Some are relevant. Some are relevant. Some are relevant. Many topics are big issues in real life. Some are more relevant to general public interest than the other.

Interpretability: Visualization of topics over time are more meaningful than a simple topic modeling presentation itself. Interesting topics. Some topic signatures seem to match the general interest over time, like Clinton & election. Likes the data visualization of signatures over time, but he would like to see it across longer time frame to get a more information. Some topics are very vague. Change of the number of the signatures match the change of the public interest in politics in real life. He felt the results are interesting, but it could have been little confusing without some extra explanations. The search interest do not necessarily align with the number of the signatures. It catches my eyes, but not very self-explanatory. More interested in the results with big increase or decrease.

Learnability: Very user friendly. If implemented into their domain, only need minor training to use it but need more training to understand the mechanism behind it. Nice design. Very easy to interact with. Easy to interact with it but not clear what are the insights that can be drawn from these results. Easy to understand and interact with the interface. It will be confusing without explanation. It is pretty self-explanatory after the introduction. It is easy to interact with the interface.

Utility: Good to analyze feedback from residents automatically. May also be helpful to analyze some controversial issues (only to a certain degree). It will be useful to track longitudinal change if it can be proved representing general publics. Data visualization help easily communicate insights with people with different levels of technical backgrounds. It will be helpful to dealing with large amount of qualitative data. Google trend results may better represent general public interest rather than petition signatures. Helpful for elected officials and policy makers to get to know the specific people's concern and attitude towards certain issues. Help interact with data, and easier to extract information from the visualization.

Skills needed to be able to produce or apply these tools: Make sure that the data are from expert sources that can represent the general publics. Need to be trained to understand the mechanics behind the scenes. Need solid technical skills to put things together. Also need technical training to use the tools. How to use the tool to effectively communicate with clients and let them understand the information contained in the visualization. Critical thinking, be reflective, good at math, computer science and technology. Data analysis skill, programming skills. Data analytic skill.

Potential of social media to influence policy conversations: Social media and petitions sites definitely play a role, but it cannot represent the general public because of access issues. Not sure how much it will have impact on legislatures or policy making. Skeptical because they only represent small groups of people who are either far-left or far-right. Not sure if social media and petition can represent the general public. It's good to collect feedback and interact with people. Not sure how it will affect the policy making, may raise awareness. Petitions won't necessarily lead to any change in policy making. Inappropriate contents of the petitions make people view them as non-high quality reference.

6. Discussion

Using topic modeling and visualization tools, we observed that the government experts regard the adoption of visual analytics tools as a distant prospect rather than a current, feasible practice. We therefore begin the discussion by considering barriers to, and suggestions for, adopting visual analytics tools for policy making, based on our interview results.

6.1. Barriers, limitations, and suggestions

All experts thought the tools are potentially helpful for analyzing large amounts of qualitative data by generating themes, and that the data visualization offers an easier way to communicate with people with different technical backgrounds. However, the interview data also identified issues involved in adopting these tools and their corresponding analytics results into the experts' work practices. Experts stressed that developing “user-centric” tools that support achieving their goals will be crucial. When it comes to “user-centric” design, previous efforts to provide tools for “users” have mainly assumed that users are citizens or developers (Cisco, 2013; Sahuguet, Krauss, Palacios, & Sangokoya, 2014). Efforts to develop tools with government practitioners as “users” have been lacking. Government practitioners are bound by structure, rules, regulations, and limited resources, which often makes tool development and implementation difficult.

Our tool is specifically designed for government practitioners and adopts no-cost, open source tools in order to address resource constraints. Even with the open source tools, the experts identified a lack of skills as the major barrier to adopting visual analytics tools. Experts stated that training and some level of guidance on interpreting LDA analytics results will be necessary for them to adopt these results for policy making. In fact, experts stated that the minimal introductory training on the LDAvis tool provided to them during the study was very helpful in interpreting the LDA results.
This is in line with a previous study's findings, which stressed the importance of training to increase the confidence of data users (Gascó-Hernández et al., 2018).

Interestingly, though, when it comes to implementing the tools in their practices, experts assume that the new tools should work while leaving current work practices uninterrupted. A policy maker working in the legislative field stated that he relies on document review and door-to-door visits to collect feedback, and that he references this information for agenda-setting activities. He also stated that new tools such as LDA analytics are not relevant to his work because they do not fit into his current work practice. When implementing new tools, therefore, it would be helpful to assess current work practices and to include a feedback loop, so that newly adopted tools are factored into current practices and improve them rather than being regarded as a disruption. Presumably, this view may vary across levels of government and with constituents' perceived access to technology.

6.2. Higher bars for adopting information acquired by data-driven analysis for policy making

When the visual analytics results were presented (without our interpretations), the experts gave mixed responses regarding the interpretability of the results. While all the experts were able to make sense of the LDAvis presentations, which we thought was promising, they were split on interpreting the signature trends and the comparison with Google Trends. This is seemingly because interpreting these additional visual analysis results requires a technical understanding and contextual knowledge of platform-specific effects.
Experts tended to expect that WtP should represent the entire public's opinion in order to add value to the policy process, and based on this expectation, some of them concluded that the analysis was not useful for their decision making because the results cannot be generalized.

Interestingly, when asked about the usual ways of introducing topics into the legislative or policy agenda, experts suggested pathways involving only one or two simple pipelines. Each expert identified only one or two routes to agenda setting in their offices, based on letters written by residents, issues people talk about, stakeholders' concerns, or an institution's priorities and subsequent discussions. That is, although the experts themselves depend on only one or two pathways to agenda setting, when analysis results are produced from one or two platforms and computational methods, they raise the bar and conclude that the results are not usable for their policy making because the data and analysis results are not generalizable (and we do not claim that they are).

Considering policy makers' higher expectations for information extracted via a data-driven process, visual analytics should consider including multiple data sources in analyses. In this way, diverse pathways can be produced that are helpful for agenda setting and are not bound to one specific environment. To clarify, the contextual information attached to data is still important for policy making. What we need to be careful about in analytics tool development is understanding the extent to which we can deliver information while reflecting its contextual basis.

6.3. Implications on tool development

As previous studies have suggested, making good quality open datasets available would be a good start for open data initiatives, but analytics tools provided alongside the datasets help create immediate benefits by extracting useful information from the data. In fact, some U.S.
open data sites provide tools for visualization. For example, New York City, San Francisco, and Orlando, among many other major cities, provide interactive visualizations through private vendors. Unfortunately, no analytical tool that also enables textual analysis is yet available on these platforms.

Our study has implications for tool development, so that engineers can develop usable and useful tools for government practitioners using open data. We demonstrated that our topic modeling analytics and visualizations could be useful for policy making when there are large volumes of text data. For the LDA results and visualizations to be useful for decision making and agenda setting, government practitioners wanted to see more granular information about each topic. Specifically, experts suggested that knowing the issues at a more granular level than the topic, as well as the public attitude expressed in each topic, would be highly valuable for making decisions based on the LDA analytics results. In addition, as stated above, one expert suggested that replacing numbers with labels would make the topics easier to understand at a glance, a move that would make the tool more user-friendly. All in all, the study highlights the importance of engaging users (in this case, government practitioners) in the tool development process.

7. Conclusion

In this paper, we extend open data research by suggesting a process to extract and visualize textual big data in order to make sense of it. LDA topic modeling was used to extract emerging topics from petitions, and visualization tools such as LDAvis were used for visual presentations of the topics. Then, we interviewed 6 experts to assess the usability of the prototype visualizations as well as to gather more general impressions of their potential value for policy making.
The interview results show that the experts were positive about the usability of the analytics results and tools regardless of their technical experience. Still, experts had high overall standards for usability and usefulness. While acknowledging the potential of these tools, they also wished to maintain their current practices for setting the policy agenda. In addition, experts said that a lack of resources and a lack of training are major barriers to adopting such tools.

Visual analytics tools have evolved so that practitioners, even those who are not big data scientists or engineers, can use these techniques to extract useful and actionable information (Marr, 2018). Our results suggest that achieving tangible benefits from using open data for government policy making through innovative tools and techniques may require overcoming major barriers. Nonetheless, involving policy makers as well as policy analysts in the process of tool building and analytics may provide insights and lessons for the continued adoption of visual analytics for policy making.

This study contributes to the open data literature by producing and testing possible solutions for extracting useful information from text data using visual analytics and LDA topic modeling. We expect that these solutions may offer insights to government practitioners as well as scholars of e-government. These possible solutions can be used to convince and motivate other policy makers and to encourage and inspire others to participate in the open data movement.

This study also contributes to the policy and data analytics literature by applying topic modeling, an automatic topic extraction method, to policy making. Hagen (2018) demonstrated the process, validity, and evaluation of topic modeling using a WtP data set. Hagen called for more case studies using topic modeling with additional datasets in order to establish the validity of adopting unsupervised learning methods.
Compared to Hagen (2018), we produced similar topics using a bigger data set, and captured new topics reflecting important issues during 2015–2016, such as the U.S. Presidential election and the Black Lives Matter movement. Our study also validates the stability of MALLET topic modeling for extracting interpretable opinions.

Some limitations should be noted. The LDA topic modeling we adopted for the study treats words as discrete entities, a representation known as bag-of-words, which ignores word order and therefore does not capture the full meaning of the text. This is considered one of the weaknesses of LDA models. More advanced topic modeling methods could potentially increase the quality and interpretability of the topics. Studies show that including semantic information in topic modeling can improve topic quality (Batmanghelich, Saeedi, Narasimhan, & Gershman, 2016). Also, putting higher weights on named entities such as person names, location names, and events can improve the interpretability and usability of topics (Krasnashchok & Jouili, 2018; Lau, Baldwin, & Newman, 2013). Topic modeling is an unsupervised machine learning method devised to enhance human decision making. Therefore, rigorous vetting of interpretability and utility is extremely important. Accordingly, some recent studies have shown that incorporating user feedback in the topic modeling process can improve the interpretability and usefulness of the topics (Feng & Boyd-Graber, 2019; Kumar, Smith-Renner, Findlater, Seppi, & Boyd-Graber, 2019).

In the future, we plan to adopt more advanced topic modeling tools to enhance the interpretability of the topics, and also to analyze the attitude and sentiment associated with each topic, as the experts suggested. Further, we are also interested in studies on training programs to facilitate open data use for value creation using analytical tools. Future research will benefit from more domain-specific tool development and from including policy makers in the tool development process.
In this way, the application will be tailored to the needs of users, and both usability and value will be augmented.

Loni Hagen is an assistant professor at the University of South Florida's School of Information. Her current research interests are in the use of computational methods to extract actionable information from open data for data-driven policy making. Her research domains include e-participation, emergency communication, privacy, and cybersecurity.

Thomas E. Keller is a research scientist affiliated with Research Computing and the Genomics Program at the University of South Florida. His current interests are in data relating to text analysis using open data and social network analysis as well as computational and evolutionary biology with epigenomics and deep learning.

Luis Felipe Luna-Reyes is an Associate Professor in the Department of Public Administration and Policy. He has been a Fulbright Scholar and he is currently Faculty Fellow at the Center for Technology in Government. He is also a Research Affiliate at the Universidad de las Americas, Puebla and a member of the Mexican National Research System. His research is at the intersection of Public Administration, Information Systems and Systems Sciences. He uses multi-method approaches to contribute to a better understanding of collaboration and governance processes in the development of information technologies across functional and organizational boundaries in government. He is the author or co-author of more than 100 articles published in leading Journals and Academic Conferences.

Xiaoyi Zhao is a PhD candidate in the Information Science PhD program at the University at Albany, College of Emergency Preparedness, Homeland Security and Cybersecurity.
Her current research interests are exploring the utilization and impact of open data using mixed methods including quantitative and qualitative data analysis and system dynamic modeling.

Acknowledgements

Loni Hagen was supported by the National Research Foundation of Korea Grant funded by the Korean Government (NRF-2017S1A3A2066084).

Appendix I: 30 topics

Table A1
LDA-topics, labels and topic quality.

Topic ID | Label | Topic words
1 | People | people time make country american stop government states
2 | President Obama** | president obama congress states united petition act administration
3 | Tax Budget* | tax federal pay government money dollars budget employees
4 | Cancer Disease** | health care cancer disease research medical treatment patients
5 | Election Clinton** | vote investigation election clinton investigate people federal party
6 | Prison Sentence** | justice years prison case life trial court release
7 | Terrorism Syria** | war terrorist people stop government terrorism genocide syria
8 | Guns Firearms** | law amendment gun rights states laws ban weapons
9 | Children Gender | children child women sex law sexual parents rights
10 | Religion* | rights government religious human freedom god religion church
11 | National Holiday* | day national american house holiday white awareness world
12 | Water Park Energy** | water national energy park land oil areas gas
13 | Police & BLM** | police law officers violence enforcement officer black death
14 | Internet Companies* | internet service information access companies small business government
15 | Students School Education** | students school education schools student public children college
16 | Ukraine Russia* | ukraine russian russia puerto sanctions japan ukrainian rico
17 | Visa Immigration** | visa immigration united states status family green home
18 | Military Veterans** | military service members veterans soldiers war army forces
19 | White Anti Genocide** | white anti genocide countries whites racist word code
20 | Http & China* | http www org chinese people human china world
21 | Animal* | animals animal dogs wild hong dog kong horses
22 | Secession* | states united government state america people powers nature
23 | Vehicle & FAA** | vehicles safety vehicle faa aircraft air cars flight
24 | Medal Award* | medal honor freedom award presidential game team american
25 | Food Labeling** | food fda products foods health safe labeling ban
26 | Marijuana** | marijuana drug cannabis medical schedule hemp states substances
27 | Ebola & TPP* | ebola trans media trump trade partnership people protect
28 | FDA & Blood | fda blood life india drug sri sikhs drugs
29 | Mcllellan | mcclellan act iran veterans toxic nuclear congress health
30 | Charly Wingate | charly robbery pardon vietnam max retrial wingate circumcision

Based on the topics and visualization results, human coders put labels following the guideline reported in 3.2 and also judged the quality of each topic. Table A1 shows the label, topic quality (indicated by number of asterisks), and eight topic words for each of the 30 topics extracted from the petition data. Asterisks in the “Label” column in Table A1 indicate topic quality judged by a human annotator; ‘**’ indicates “good quality,” ‘*’ indicates “fair quality,” and no asterisk indicates “poor quality” topics.

Appendix II: Interview Questions (IRB approved)

Six to eight sampled policy analysts from capital region of New York State will evaluate the practical usefulness of the text mining tools developed by the researchers. We will come to the interviewees' work places and interview them individually. We will prepare three sets of electronic instruments: 1) the interactive software loaded with the visualization results, 2) one electronic file containing 30 graphs reflecting topics and petition signatures, and 3) another electronic file containing 12 images showing Google Trends and topics in two columns and six rows.
The participants will be instructed about the data and data mining tools used to create the visualizations and the presented images, and will then be asked to examine all of them. Any questions will be answered by the investigator(s). After the participants have finished examining the materials, they will be prompted to answer the questionnaire. The participants will be given as long as necessary to complete the examination.

The questionnaire includes the following questions:

1. What kind of analysis do you do as everyday practice?
2. How things get into the conversation in regards with legislative or policy agenda?
3. What do you think about using social media and petition sites for possible agenda for legislatures or more in general to establish policy?
4. What is your general perception of the relevance of these topics for current legislative and policy agenda?
5. What is your interpretation of these results?
6. How user friendly are these tools from your point of view and experience?
7. How useful do you think this tool and images would be for your work and practice?
8. What would you do to improve the tool and make it more helpful for your practice?
9. How well do you think that these images and analyses represent the interests of the general public?
10. What do you think about skills needed to be able to produce/apply these tools?
11. Do you feel comfortable about applying similar technologies in your work? What kind of skills would be needed to be able to do this?
12. Is there any other relevant topics you would like to discuss or any other thing you want to mention that was not covered in our questions?

References

Aitamurto, T. (2012). Crowdsourcing for democracy: A new era in policy-making. Parliament of Finland.
Attard, J., Orlandi, F., Scerri, S., & Auer, S. (2015). A systematic review of open government data initiatives. Government Information Quarterly, 32(4), 399–418. https://doi.org/10.1016/j.giq.2015.07.006.
Bannister, F., & Connolly, R. (2014).
ICT, public values and transformative government: A framework and programme for research. Government Information Quarterly, 31(1), 119–128. https://doi.org/10.1016/j.giq.2013.06.002.
Batmanghelich, K., Saeedi, A., Narasimhan, K., & Gershman, S. (2016). Nonparametric spherical topic modeling with word embeddings. Proceedings of the Conference. Association for Computational Linguistics. Meeting, 2016 (pp. 537–542). https://doi.org/10.18653/v1/P16-2087.
Bergh, C., & Benghiat, G. (2017). Analytics at Amazon speed: The new normal. Business Intelligence Journal, 22(2), 46–54.
Bertot, J. C., Butler, B. S., & Travis, D. M. (2014). Local big data: The role of libraries in building community data infrastructures. Proceedings of the 15th annual international conference on digital government research (Dg.o 2014) (pp. 17–23). https://doi.org/10.1145/2612733.2612762.
Blei, D. M. (2012). Probabilistic topic models. Communications of the ACM, 55(4), 77–84.
Boyd-Graber, J., Mimno, D., & Newman, D. (2014). Care and feeding of topic models: Problems, diagnostics, and improvements. In E. M. Airoldi, D. Blei, E. A. Erosheva, & S. E. Fienberg (Eds.). Handbook of mixed membership models and their applications (pp. 225–255).
Boca Raton, FL: CRC Press.
Cassi, L., Lahatte, A., Rafols, I., Sautier, P., & de Turckheim, É. (2017). Improving fitness: Mapping research priorities against societal needs on obesity. Journal of Informetrics, 11(4), 1095–1113. https://doi.org/10.1016/j.joi.2017.09.010.
Chatfield, A. T., & Reddick, C. G. (2017). A longitudinal cross-sector analysis of open data portal service capability: The case of Australian local governments. Government Information Quarterly, 34(2), 231–243. https://doi.org/10.1016/j.giq.2017.02.004.
Cisco. (2013). The internet of everything for cities. Retrieved from https://www.cisco.com/c/dam/en_us/solutions/industries/docs/gov/everything-for-cities.pdf.
Dawes, S. S., Pardo, T. A., & Cresswell, A. M. (2004). Designing electronic government information access programs: A holistic approach. Government Information Quarterly, 21(1), 3–23.
Feng, S., & Boyd-Graber, J. (2019). What can AI do for me?: Evaluating machine learning interpretations in cooperative play. Proceedings of the 24th international conference on intelligent user interfaces - IUI ‘19 (pp. 229–239). https://doi.org/10.1145/3301275.3302265.
Gascó-Hernández, M., Martin, E. G., Reggi, L., Pyo, S., & Luna-Reyes, L. F. (2018). Promoting the use of open government data: Cases of training and engagement. Government Information Quarterly, 35(2), 233–242. https://doi.org/10.1016/j.giq.2018.01.003.
Graves, A., & Hendler, J. (2013). Visualization tools for open government data. Proceedings of the 14th annual international conference on digital government research (pp. 136–145).
Hagen, L. (2016). Topic modeling for e-petition analysis: Interpreting petitioners' policy priorities (Ph.D.). United States – New York: State University of New York at Albany.
Hagen, L. (2018). Content analysis of e-petitions with topic modeling: How to train and evaluate LDA models? Information Processing and Management, 54(6), 1292–1307. https://doi.org/10.1016/j.ipm.2018.05.006.
Hagen, L., Harrison, T.
M., & Dumas, C. L. (2018). Data analytics for policy informatics: The case of E-petitioning. Policy analytics, modelling, and informatics (pp. 205–224). Cham: Springer. https://doi.org/10.1007/978-3-319-61762-6_9.
Hagen, L., Uzuner, O., Kotfila, C., Harrison, T. M., & Lamanna, D. (2015). Understanding Citizens' direct policy suggestions to the Federal Government: A natural language processing and topic modeling approach. 2015 48th Hawaii International Conference on System Sciences (HICSS) (pp. 2134–2143). https://doi.org/10.1109/HICSS.2015.257.
Howell, E., & Lang, J. (2017). Researching UX: User research. VIC Australia: Sitepoint Pty Ltd.
Janssen, M., Charalabidis, Y., & Zuiderwijk, A. (2012). Benefits, adoption barriers and myths of open data and open government. Information Systems Management, 29(4), 258–268. https://doi.org/10.1080/10580530.2012.716740.
Janssen, M., & Helbig, N. (2019). Innovating and changing the policy-cycle: Policy-makers be prepared! Government Information Quarterly (in press). https://doi.org/10.1016/j.giq.2015.11.009.
Krasnashchok, K., & Jouili, S. (2018). Improving topic quality by promoting named entities in topic modeling. Proceedings of the 56th annual meeting of the Association for Computational Linguistics: Vol. 2. Short Papers (pp. 247–253). Retrieved from https://www.aclweb.org/anthology/P18-2040.
Kumar, V., Smith-Renner, A., Findlater, L., Seppi, K., & Boyd-Graber, J. (2019). Why Didn't you listen to me? Comparing user control of human-in-the-loop topic models. ArXiv:1905.09864 [Cs]. Retrieved from http://arxiv.org/abs/1905.09864.
Lau, J. H., Baldwin, T., & Newman, D. (2013). On collocations and topic models. ACM Transactions on Speech and Language Processing, 10(3), 10:1–10:14. https://doi.org/10.1145/2483969.2483972.
Luna-Reyes, L. F. (2017). Opportunities and challenges for digital governance in a world of digital participation.
Information Polity, 22(2–3), 197–205. https://doi.org/10.3233/IP-170408.
Luna-Reyes, L. F., & Gil-Garcia, J. R. (2014). Digital government transformation and internet portals: The co-evolution of technology, organizations, and institutions. Government Information Quarterly, 31(4), 545–555. https://doi.org/10.1016/j.giq.2014.08.001.
Magalhaes, G., & Roseira, C. (2017). Open government data and the private sector: An empirical view on business models and value creation. Government Information Quarterly. https://doi.org/10.1016/j.giq.2017.08.004.
Marr, B. (2018, June 20). Comparing data visualization software: Here are the 7 best tools for 2018. Forbes. Retrieved from https://www.forbes.com/sites/bernardmarr/2018/06/20/comparing-data-visualization-software-here-are-the-7-best-tools-for-2018/.
Mergel, I., Kleibrink, A., & Sörvik, J. (2018). Open data outcomes: U.S. cities between product and process innovation. Government Information Quarterly, 35(4), 622–632. https://doi.org/10.1016/j.giq.2018.09.004.
Mimno, D. (2013). Package “mallet”. Retrieved October 9, 2015, from https://cran.r-project.org/web/packages/mallet/mallet.pdf.
Najafabadi, M. M., & Luna-Reyes, L. F. (2017). Open government data ecosystems: A closed-loop perspective. Proceedings of the 50th Hawaii international conference on system science (HICSS-50) (pp. 2711–2720).
Nam, T. (2012). Citizens' attitudes toward Open Government and Government 2.0. International Review of Administrative Sciences, 78(2), 346–368. https://doi.org/10.1177/0020852312438783.
Nielsen, J. (2012). Usability 101: Introduction to usability. Nielsen Norman Group. Retrieved 2018-11-20 from http://www.nngroup.com/articles/usability-101-introduction-to-usability/.
Nielsen, J., & Molich, R. (1990). Heuristic evaluation of user interfaces. CHI ‘90 proceedings of the SIGCHI conference on human factors in computing systems. https://doi.org/10.1145/97243.97281.
Norris, D. F., & Reddick, C. G. (2013).
Local E-Government in the United States: Transformation or incremental change? Public Administration Review, 73(1), 165–175. https://doi.org/10.1111/j.1540-6210.2012.02647.x.
Oracle (2017). MySQL. Retrieved November 13, 2017, from https://www.mysql.com/.
Poucke, S. V., Zhang, Z., Schmitz, M., Vukicevic, M., Laenen, M. V., Celi, L. A., & Deyne, C. D. (2016). Scalable predictive analysis in critically ill patients using a visual open data analysis platform. PLoS One, 11(1), e0145791. https://doi.org/10.1371/journal.pone.0145791.
Puron-Cid, G., Gil-Garcia, J. R., & Luna-Reyes, L. F. (2016). Opportunities and challenges of policy informatics: Tackling complex problems through the combination of open data, technology and analytics. International Journal of Public Administration in the Digital Age, 3(2), 66–85.
rapidminer (2017). Lightning fast unified data science platform. Retrieved November 16, 2017, from https://rapidminer.com/products/.
Reddick, C. G., Chatfield, A. T., & Ojo, A. (2017). A social media text analytics framework for double-loop learning for citizen-centric public services: A case study of a local government Facebook use. Government Information Quarterly, 34(1), 110–125. https://doi.org/10.1016/j.giq.2016.11.001.
Rickford, R. (2016). Black lives matter: Toward a modern practice of mass struggle. New Labor Forum, 25(1), 34–42. https://doi.org/10.1177/1095796015620171.
Rubin, J. (1994). Handbook of usability testing: How to plan, design, and conduct effective tests (1st ed.). Wiley.
Sahuguet, A., Krauss, J., Palacios, L., & Sangokoya, D. (2014). Open civic data: Of the people, for the people, by the people. IEEE Data Engineering Bulletin, 37(4), 15–26.
Siegel, E. (2016). Predictive analytics: The power to predict who will click, buy, lie, or die (2nd ed.). Wiley.
Sievert, C., & Shirley, K. E. (2014). LDAvis: A method for visualizing and interpreting topics. Proceedings of the workshop on interactive language learning, visualization, and interfaces (pp.
63–70).
Sivarajah, U., Weerakkody, V., Waller, P., Lee, H., Irani, Z., Choi, Y., & Glikman, Y. (2016). The role of e-participation and open data in evidence-based policy decision making in local government. Journal of Organizational Computing and Electronic Commerce, 26(1–2), 64–79.
Stephens-Davidowitz, S. (2017). Everybody lies: Big data, new data, and what the internet can tell us about who we really are. New York, NY: Dey Street Books.
The White House. (2011, September 20). Opening remarks by President Obama on open government partnership. Retrieved May 16, 2015, from https://www.whitehouse.gov/node/78625.
The White House (2015). The open government partnership: Third open government National Action Plan for the United States of America. Retrieved from http://www.whitehouse.gov/blog/2013/12/06/united-states-releases-its-second-open-government-national-action-plan.
The White House (2017). For developers: We the people API. (Retrieved November 3, 2017, from /developers).
Toots, M., McBride, K., Kalvet, T., & Krimmer, R. (2017). Open data as enabler of public service co-creation: Exploring the drivers and barriers. 2017 Conference for E-Democracy and Open Government (CeDEM) (pp. 102–112). https://doi.org/10.1109/CeDEM.2017.12.
Treacy, M., & O'Sullivan, J. (2010). e-Government and organisational transformation – A perspective from the property registration Authority of Ireland. Presented at the 10th European conference on e-government (ECEG 2010) (pp. 400–408).
Ubaldi, B. (2013). Open government data: Towards empirical analysis of open government data initiatives. Paris: OECD Working Papers on Public Governance (22) 0_1,1,4-60.
Walters, L. C., Aydelotte, J., & Miller, J. (2000). Putting more public in policy analysis. Public Administration Review, 60(4), 349–359. https://doi.org/10.1111/0033-3352.00097.
Yalçın, M. A., Elmqvist, N., & Bederson, B. B. (2016). Keshif: Out-of-the-box visual and interactive data exploration environment - semantic scholar.
In: Proceedings of the IEEE VIS 2016 workshop on visualization in practice: Open source visualization and visual analytics software (Retrieved from /paper/Keshif-Out-of-the-Box-Visual-and-Interactive-Data-Yalçın-Elmqvist/4364b7bb4f731bef0c9f22067691fefa42d85c93).
Zuiderwijk, A., Helbig, N., Gil-Garcia, R. J., & Janssen, M. (2014). Special issue on innovation through open data - a review of the state-of-the-art and an emerging research agenda: Guest Editors' introduction. Journal of Theoretical & Applied Electronic Commerce Research, 9(2), I–XIII. https://doi.org/10.4067/S0718-18762014000200001.
Zuiderwijk, A., Janssen, M., Choenni, S., Meijer, R., & Alibaks, R. S. (2012). Socio-technical impediments of open data. Electronic Journal of Electronic Government, 10(2), 156–172.