title: Crystalline: Lowering the Cost for Developers to Collect and Organize Information for Decision Making
authors: Liu, Michael Xieyang; Kittur, Aniket; Myers, Brad A.
date: 2022-02-04
DOI: 10.1145/3491102.3501968

Developers perform online sensemaking on a daily basis, such as researching and choosing libraries and APIs. Prior research has introduced tools that help developers capture information from various sources and organize it into structures useful for subsequent decision-making. However, it remains a laborious process for developers to manually identify and clip content, maintain its provenance, and synthesize it with other content. In this work, we introduce a new system called Crystalline that attempts to automatically collect and organize information into tabular structures as the user searches and browses the web. It leverages natural language processing to automatically group similar criteria together to reduce clutter, as well as passive behavioral signals such as mouse movement and dwell time to infer what information to collect and how to visualize and prioritize it. Our user study suggests that developers are able to create comparison tables about 20% faster with a 60% reduction in operational cost without sacrificing the quality of the tables.

Developers spend a large portion of their time searching and making sense of the web for solutions to their programming problems [9, 108]. In many cases, the answers to such problems are not limited to a single solution; instead, developers discover that there are multiple legitimate options, and they must identify relevant criteria and constraints based on their unique contexts and carefully consider the trade-offs among those possible options [42, 63, 77, 78, 81, 82, 92, 94, 100, 107]. For example, when converting an old web application to use a modern JavaScript front-end framework, React.js [34] (with its ability to be progressively adopted into existing code bases) may be more suitable when one wants to gradually convert each separate module while minimizing the overall system downtime, whereas a more comprehensive framework such as Angular [47] might be a better choice if one wants to take advantage of various official utility packages like routing [44], animation [45], and data validation [46]. There have been many commercial and research tools and systems that try to help people make sense of information about trade-offs to facilitate further decision making, such as by helping with easily capturing snippets of information [1, 5, 53, 110, 121] from web pages or organizing and synthesizing information into useful schemas and representations [15, 29, 61, 71, 81, 122]. For example, one common practice that people employ is copying pieces of text, as well as taking screenshots, and putting them in a running Google Doc as they search and browse the web [88].
One system that is relevant to the context of programming is Unakite [81], which enables developers to collect and organize information online into comparison tables with options, criteria, and evidence to help with making decisions (see Figure 2). However, even with the above tools, it remains a challenging process for developers to manually identify and capture the relevant content, maintain its provenance (where it came from), and synthesize it with other content. Prior work suggests that one cause is that people are often uncertain about which information will eventually turn out to be relevant, valuable, and worth capturing, especially at early stages of their learning and exploration when they are overloaded with information [4, 37]. Under these circumstances, people are hesitant to frequently pause and shift their focus from the investigation itself to reasoning about what to capture for later use [14, 58, 72, 109], or they could be too engaged in the sensemaking process and forget to collect anything at all. Indeed, research suggests that interactions for gathering information while performing active reading need to be quick and low effort, otherwise people tend not to capture information in the first place [58, 81, 85, 118]. In addition, though existing tools provide users with the flexibility and agency to synthesize the collected information into useful representations, such as comparison tables [15, 81] or knowledge maps [87], developers still need to perform these organizing operations manually. This is often a laborious process, as developers need to take stock of all the pieces of information, identify connections among them, and directly manipulate the representation to reflect the connections. Another challenge reported in prior work is that developers' needs for collecting and organizing information are often not discovered until part of the way through an investigation process [16, 81].

Figure 1: Crystalline's list view UI (a). As the developer browses a web page (b), Crystalline attempts to automatically collect options and criteria from the page, and display them in the options (c) and criteria panes (d) in the sidebar (a). In addition, Crystalline leverages natural language processing to automatically group similar criteria together, as shown by the multiple-pages icon (e). Crystalline uses behavioral signals such as mouse movement and dwell time to try to automatically detect the relative importance of the criteria (shown by the display order, with the most important at the top). Users can use the "See more" and "See less" buttons (g) to adjust how many criteria are displayed at once. Crystalline will remind users of the existence of additional related evidence through a red notification dot at the top right of a criterion (f). The sidebar can be toggled in and out by clicking the browser extension icon (h). Users may pin (i) important criteria to the top of the list.
This could be due to several major reasons, including but not limited to: 1) additional external requirements, constraints, or user feedback are discovered or introduced in the middle of a project, which significantly complicates the original decision making problem [23, 30, 31]; 2) developers discover many more options, criteria, and trade-offs than they anticipated at the beginning [81]; and/or 3) developers are required to explain or document their decisions and design rationale after the fact for the long-term maintainability and success of a software project [25, 39, 75, 76, 79, 104, 112]. In these situations, it is hard and involves duplicate work for developers to recall and retrace their steps for reaching their current state of sensemaking (the linear history visualization in almost all current browsers is known to be not particularly effective [16, 67, 124]) and recollect all the relevant evidence again.

In our new work, we explore the idea of having a system dynamically help users keep track of and organize information by leveraging the content they are browsing and the signals from their browsing behavior. Although we focus on the domain of programming due to strongly motivating prior work and ease of prototype development due to regularities of the programming context, our work may also generalize to other sensemaking contexts on the web. We instantiate this idea in a prototype system called Crystalline, which is an extension to the Chrome web browser. Crystalline plays the role of a user's copilot and attempts to automatically identify and keep track of the options, criteria, and the corresponding evidence snippets from the web pages that a user has viewed, and organize the snippets into both list and tabular formats. To achieve this, Crystalline mines a variety of behavioral signals while a user browses the web, including scrolling patterns and mouse cursor actions, and employs natural language understanding techniques to automatically classify and organize the collected content. The goal is that users can focus more on reading and understanding web content while occasionally guiding the system when it makes mistakes. We conducted a user study to evaluate the usability and effectiveness of Crystalline compared to Unakite as a baseline, which found that developers are able to build comparison tables about 20% faster with a 60% reduction in operational cost without sacrificing the quality of the tables. In particular, it only requires around 12% of the total task completion time for participants to use the tool to build and maintain a table, compared to around 30% in the baseline condition.

The primary contributions described in this paper include:
• evidence that it is possible to automatically identify options, criteria, and relevant evidence from web pages that a user is browsing using a set of natural language understanding heuristics,
• a set of implicit behavioral signals that users exhibit when browsing the web which can be used for prioritizing and filtering that collected information,
• a prototype system called Crystalline that integrates the heuristics and signals to automatically collect and organize viewed information into list and comparison table views for subsequent decision making, and
• an evaluation that offers empirical insights into the usability, usefulness, and effectiveness of those signals and the system.
Sensemaking is widely considered to be the process of searching, collecting, and organizing information to iteratively develop a mental model that best fits the evidence [96, 106]. As knowledge workers [9], developers perform many activities on a daily basis that involve extensive sensemaking, such as designing the overall software architecture [56, 83], learning and understanding unfamiliar code and concepts [26, 73], debugging and fixing incorrect software behaviors [25, 74], planning and executing code refactorings [32, 41, 86], and evaluating past code and design patterns for future reuse [82, 91]. In this work, we focus on the particular type of sensemaking activity where developers leverage web resources to make a decision to solve their programming problem [9, 63]. Here, developers not only need to find information pertinent to their problem [8, 59, 97, 115], which is the first step in such complex sensemaking tasks [106, 123], but also collect and synthesize relevant information into structured knowledge so that they can make progress towards fully understanding the decision space [53, 71, 72, 81]. Indeed, our survey [63] revealed that over half of the questions asked on Stack Overflow contain answers with multiple options, each option being valuable to the programming community due to a unique set of criteria that it fulfills. Software engineering research has also identified that subsequent developers frequently need help with understanding the rationale of design decisions and code implementations made by previous developers [75, 76, 112]. This can be particularly difficult if the previous developers failed to properly document the rationale [120], or the documentation was incomplete or not up-to-date [38]. Granted, the fundamental challenge here is that it is effort- and time-intensive for decision authors to document their rationale (either in situ or after the fact) with little immediate payoff for themselves [42]. Our previous Unakite tool [81] addressed this challenge by encouraging authors to document their decision making processes and results using the tool's lightweight collecting and organizing features. Building on top of this, Crystalline further transforms the previously active capturing and organizing work [5, 81, 110] into passive monitoring and error-fixing [80], which has been shown to present a much lower entry barrier for people to start contributing [37]. To help people more effectively gather and process online information, systems and tools like SenseMaker [3], SearchPad [5], Hunter Gatherer [110], CoSense [93], Tabs.do [16], as well as commercial systems like the Evernote clipper [33], enable people to take entire pages or snippets of content from the web, classify them, and later put them together into a document with a coherent narrative for sensemaking, decision making, or sharing and collaboration. However, one common characteristic of these tools is that it is mostly the user's responsibility to manually complete the information collection, triage, and organization process, whereas we attempt to do this automatically with Crystalline as the user searches and browses the web. Other threads of prior research have explored different ways for machines to help during sensemaking, which inspired and informed our design.
For example, systems like Entity Quick Click [6, 66, 116] employ techniques like named-entity recognition [84] to pre-process and highlight semantically meaningful entities in web content, and enable users to collect and annotate relevant information with a single click. Previous work like Thresher [60] and Dontcheva et al.'s personal web summarization tool [29] let users annotate and curate patterns and templates of information that they would like to collect on a few example web pages, then automatically collect that information from future pages. In addition, Chang et al.'s Mesh system [15] automatically retrieves relevant consumer product facts and reviews from Amazon into a comparison table to enable users to curate and explore nuanced options and criteria. These systems have largely relied on natural language understanding to analyze and transform the web content that users browse and read, while we argue that leveraging the signals from users' natural browsing behavior, such as dwell time and cursor movements, would unlock a new design space for automated machine support during online sensemaking, motivating us to use both NLP heuristics and passive behavioral signals to infer what information to collect and how to visualize and prioritize it in Crystalline.

Figure 2: The Unakite system [81]. For full details, see [81].

Prior research has investigated various implicit behavioral patterns that people exhibit when reading and interacting with content on a digital screen. One thread of research has explored using behaviors such as dwell time, cursor movements, clicks, scrolling patterns, and gaze positions as implicit signals to approximate user interest on web pages as well as search result relevance [22, 50, 51, 57, 65]. For example, Claypool et al. [22] had participants use a custom-built browser to surf the web and concluded that the time spent on a page, the amount of scrolling on a page, and the combination of time and scrolling had a strong correlation with explicit user interest. In addition, Hijikata [57] discovered that actions such as text tracing and link pointing are decent behavioral indicators of perceived interesting segments of web pages. Similarly, in the domain of web search, Buscher et al. [10-12], Guo and Agichtein [50, 51], and Huang et al. [65] demonstrated that eye tracking, as well as interactions like scrolling and cursor hovers, could accurately predict user interest in search results pages. Building on the empirical understanding laid out by this research, in this work, we explore putting a combination of these implicit behavioral signals into use to approximate user visual attention in a working prototype. We used heuristics and pilot testing to devise mechanisms that translate the raw behavioral signals into numeric scores representing the "amount of attention" a user has given to a particular piece of online content. We then use these scores to filter and rank the content of the evolving comparison table, further reducing the cost for developers to manually manage and prioritize collected information incrementally as they are searching and browsing.

In this work, we explore automatically keeping track of and organizing relevant information on the web about trade-offs for developers as they are making decisions. To ground our research, we build on the "Option-Criterion-Evidence" framework introduced in our Unakite system [81]. We first briefly explain this framework as well as the Unakite system to provide necessary background for this research.
Then we discuss the design goals for the new Crystalline system. Unakite was designed to address both the need of developers to synthesize online information about trade-offs when making programming decisions as well as the need of subsequent developers to be able to understand the rationale behind those decisions [81]. As a Chrome extension, Unakite enables developers to manually collect any content from any web page as snippets (pieces of information, Figure 2-d) into the snippet repository (a holding tank of information snippets, Figure 2-c) by selecting (Figure 2-a1) or dragging out a bounding box to enclose the desired content with the mouse cursor (Figure 2-a2). To organize the collected content, developers can use drag-and-drop to move the collected snippets from the repository into a comparison table (Figure 2-b) consisting of options (as row headers, e.g., a solution to solve a problem), criteria (as column headers, e.g., a standard by which options are judged), and evidence ("thumbs-up" or positive, "thumbs-down" or negative, and "informational" ("i") ratings that spread across the rest of the table cells) that illustrates the trade-offs among the various options on those criteria. Developers can also rank the options and criteria in the table to reflect their unique order of preferences. The resulting comparison table is automatically saved and can be used by subsequent developers to understand the context of the previous decision space: what options and alternatives were explored, what criteria needed to be met, what trade-offs were discovered, and what was considered the most important and why. Although Unakite has been shown to incur less operational overhead when it comes to collecting and organizing information in situ compared to common baseline methods like using Google Docs [81], developers still need to manually collect and structure each piece of content, which can be a costly process [58, 71, 72, 85, 118]. In addition, it forces developers to start using the tool from the outset to be able to capture the whole exploration; but for cases in which the needs for collecting and organizing information are not discovered until partway through an investigation process (which can be quite common in agile-style software development [23, 30, 31, 81] that is widely adopted across the software development industry), developers would have to retrace their exploration paths from the beginning and re-collect and organize the content, wasting time and causing duplicate work. In order to address the above limitations of Unakite as well as other similar sensemaking tools [3, 5, 16, 93], we formulated the following design goals:
• Minimize the cost to collect information. The system should attempt to automatically collect information in the background without the user's specific attention or direction. This will help users focus on the main task of reading and comprehending the content.
• Actively filter, organize, and prioritize information. The system should actively filter, organize, and prioritize the collected information that gets presented to the user and help the user avoid information overload.
• Reduce the cost of incorrect automation support. In cases where machine support is incorrect or undesirable, the system should allow users to easily recover from those mistakes [2, 62].
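To keep the option-criterion-evidence vocabulary used throughout this paper concrete, the following is a minimal data-model sketch; the TypeScript interfaces and field names are our own illustration for exposition, not Unakite's or Crystalline's actual schema.

```typescript
// Illustrative data model for the option-criterion-evidence framework.
// Names and fields are assumptions for exposition, not the systems' real code.

type Rating = "positive" | "negative" | "informational";

interface Snippet {
  id: string;
  html: string;        // captured content, keeping its original styling
  sourceUrl: string;   // provenance: where the snippet came from
  scrollY: number;     // scroll position, for jumping back to the source
  capturedAt: number;  // timestamp (ms since epoch)
}

interface Option {     // a candidate solution, e.g., "React.js"
  id: string;
  name: string;
}

interface Criterion {  // a standard by which options are judged, e.g., "learning curve"
  id: string;
  name: string;
  pinned: boolean;     // user override of the automatic ranking
  attentionScore: number;
}

interface Evidence {   // one cell's worth of support in the comparison table
  optionId: string;
  criterionId: string;
  rating: Rating;
  snippets: Snippet[];
}

interface ComparisonTable {
  options: Option[];     // row headers
  criteria: Criterion[]; // column headers, ordered by importance
  evidence: Evidence[];  // the table cells
}
```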
Guided by prior work and our design goals, we designed and implemented Crystalline, a Chrome extension prototype to help developers automatically collect and organize information relevant to their decision making problems. Users mainly interact with Crystalline through a sidebar (Figure 1a) that is injected directly into every web page. As a developer opens and reads web pages, the sidebar is updated with the automatically collected options (Figure 1c) and criteria (Figure 1d) in the list view (Figure 1c & d). The list view serves as a concise and glanceable outline that reflects one's exploration progress: what options one has encountered and what criteria one has looked into. Clicking on one of the criteria will enter a detailed view for that criterion (Figure 3a), listing all the collected evidence snippets organized by options; similarly, clicking on an option will enter the detailed view for that option, which lists all the related criteria and the corresponding evidence associated with that option. Details on how we currently implemented the automatic collection and organization features are discussed in section 4.2. In addition, developers can also switch to the comparison table view (Figure 3c) that summarizes the decision making space and the trade-offs among the various options in detail. The order in which a criterion gets presented, both in the list and the comparison table view, is based on the estimated importance of the item to the user, which we approximate by the amount of attention a user has given to it. This, in turn, is derived from the user's implicit behavioral signals, which we will discuss in detail in section 4.2.2. To examine a particular piece of evidence in the detailed view or a comparison table cell, users can hover over it to zoom in (Figure 3b), or click on it to teleport to the original web page and scroll position from where it was previously collected. Similar to previous systems [61, 81, 99], the sidebar can be toggled in and out like a drawer by clicking the extension icon (Figure 1h) or using a keyboard shortcut. Developers can passively monitor the sidebar as they are searching and browsing to make sure the system performs correctly, and quickly correct or dismiss the mistakes that the system makes. In addition, developers are free to hide the sidebar to have an unobstructed view of the web page, knowing that all the features for automatic information collection and organization are still running in the background even when the sidebar is hidden. We now discuss how the different features in Crystalline are designed and implemented, and how they support our design goals. In Crystalline, we explore having the system automatically collect relevant information in the background without the user having to explicitly perform the action of collecting information. This has the benefit of minimizing the distraction and cost of keeping track of information as an extra step in addition to thinking about the content on a web page, which, in turn, maximizes a user's attention for reading and understanding the content itself. Specifically, Crystalline collects information about options, criteria, and their associated evidence snippets as discussed previously, which were reported by prior work as the key aspects developers look for when solving decision making problems [63, 75, 81].
Currently, to automatically recognize the options, Crystalline employs the following techniques: (1) it looks for the words or phrases between any instances of "vs." (or other variants like "v.s.", "versus", etc.) in web page titles and opening paragraphs and adds them as potential options. For example, the Medium.com article titled "Tensorflow vs Keras vs Pytorch: Which Framework is the Best?" would yield "Tensorflow", "Keras", and "Pytorch" as three potential options; (2) it first runs noun phrase and entity extraction using the Google Cloud Natural Language API [48] on the web page title and section headers as well as the column and row headers of any HTML tables, then checks if the identified entities are mentioned in the titles of other visited pages. In addition, it also checks if the identified entities frequently come up in each other's Google autocomplete results, using the Google "vs" technique described in [40, 82], which issues queries in the form of "[option_name] vs" to the Google Autocomplete API to get a list of autocomplete results that can be interpreted as potential alternatives to "[option_name]" (an earlier version of this technique was launched as an experimental feature named Google Sets [21, 119]). Furthermore, it checks if the identified entities are mentioned repeatedly across the main content of the current web page. All potential options go through a final deduplication process to produce the final list of options presented in the options pane (Figure 1c) in the sidebar. We chose and tuned these heuristics based on our internal usage and pilot testing results. In the future, more advanced NLP techniques could be used to augment the current set of heuristics. Crystalline uses a similar set of heuristics to identify criteria from web pages, with an emphasis on examining section headers and table headers (and entities extracted from them) rather than website titles. In this work and in the context of programming, we focus on using such heuristics to identify the criteria directly mentioned in the content, such as extracting "learning curve" from "React is widely considered to have quite a steep learning curve." We leave the extraction of latent criteria for future work; these are more commonly seen in domains other than programming, such as extracting "price" from "I bought this mp3 player for almost nothing" [98]. Further, users can always edit the option and criterion names, delete unwanted options or criteria, or manually select and collect any text as either an option or a criterion using the popup menu (Figure 4).
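As an illustration of the first two heuristics above, here is a minimal sketch; the regular expression, the title normalization, and the fetchSuggestions parameter (a stand-in for whichever autocomplete endpoint is used) are assumptions for exposition, not Crystalline's actual implementation.

```typescript
// Sketch of the "vs."-splitting heuristic (heuristic 1 above).
// The regular expression and title normalization are illustrative only.
const VS_PATTERN = /\s+(?:vs\.?|v\.?s\.?|versus)\s+/i;

function optionsFromTitle(title: string): string[] {
  // "Tensorflow vs Keras vs Pytorch: Which Framework is the Best?"
  //   -> ["Tensorflow", "Keras", "Pytorch"]
  const head = title.split(/[:\-–|]/)[0]; // keep only the part before a separator
  if (!VS_PATTERN.test(head)) return [];
  return head
    .split(VS_PATTERN)
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

// Sketch of the Google "vs" autocomplete check described in [40, 82]: issue
// "[option_name] vs" and treat the completions as potential alternatives.
// `fetchSuggestions` stands in for whatever autocomplete endpoint is queried.
async function alternativesFor(
  option: string,
  fetchSuggestions: (query: string) => Promise<string[]>
): Promise<string[]> {
  const prefix = `${option.toLowerCase()} vs `;
  const suggestions = await fetchSuggestions(`${option} vs `);
  return suggestions
    .map((s) => s.toLowerCase())
    .filter((s) => s.startsWith(prefix))
    .map((s) => s.slice(prefix.length).trim())
    .filter((s) => s.length > 0);
}
```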
Two of the implicit behavioral signals that Crystalline tracks (see Table 1) are the following:
• Cursor hover [18, 64, 102, 103]: triggers each time the mouse cursor hovers over a content block b for at least 2 seconds. This accounts for situations where the developer naturally moves the mouse cursor onto the content that is currently being read to guide his or her attention. A cursor hover triggering is disqualified when the system detects an extended period of idling (2 minutes) without any user actions. Signal strength: weak. Score: 0.5 · t, where t is the duration (measured in seconds) of the cursor's stay within the bounds of content block b; the maximum score is 10 (in our pilot testing, users rarely spend more than 10 seconds reading a text block).
• Content dwell [22, 65]: the longer some content stays visible, the more likely it is that the user is interested in it. Triggers each time a content block b gets scrolled into and stays in the visible viewport for at least 2 seconds, indicating that the developer has at least paid attention to b; a dwell triggering during idling is disqualified. Signal strength: weak. Score: 0.2 · t, where t is the duration (measured in seconds) of content block b's stay in the visible browser viewport; the maximum score is 4 (in our pilot testing, users rarely stay at one location for more than 10 seconds).

Table 1: Implicit behavioral signals used in Crystalline to track user attention. Column 1 lists the implicit signals; column 2 provides evidence from selected prior research on the efficacy of the signals; column 3 describes how the signals are used in Crystalline; column 4 indicates the relative strength of a signal in terms of predicting user attention; column 5 details the scoring function used to translate signal triggerings into numeric scores based on the relative signal strengths. The scoring functions were empirically determined through iterative pilot testing.

Not all options or criteria are equally useful to a particular developer. Prior work has suggested that a programming decision usually comes down to how well each option matches the developer's goals and the criteria that he or she deemed important [42, 77, 78, 82, 92, 94, 100, 107]. In this work, we explore using the amount of attention that one pays to a particular criterion to approximate its perceived value or importance. To operationalize this, for each web page that a developer visits, Crystalline processes all of the content blocks (HTML block-level elements, such as <p>, <div>, <li>, <table>, and <td> elements) to detect what options and criteria are associated with each block. Specifically, it prioritizes verbatim mentions of options and criteria within a block, then possible options and criteria identified from section headers above the block, then web page titles. If no options are detected, the page title is used as a placeholder. Next, Crystalline tracks each triggering of five implicit behavioral signals (copying content, text highlighting, clicking, cursor hovering, and content dwelling) listed in Table 1 on any content block and translates it into a numeric score (using the scoring functions in column 5). The final attention score representing the amount of attention that a user pays to a particular criterion $c$ is then calculated using equation (1):

$$\mathrm{AttentionScore}(c) \;=\; \sum_{s \in S} \mathbb{1}(s, c)\, f(s) \qquad (1)$$

where $S$ is the set of all implicit signal triggerings; $s$ is a particular triggering; $\mathbb{1}(s, c)$ returns 1 if $s$ was triggered on a content block that is associated with the criterion $c$, and returns 0 otherwise; and $f(s)$ is the corresponding scoring function found in the last column of Table 1. The scoring functions were empirically determined through iterative pilot testing. To accommodate the various behavioral patterns exhibited by different users, we iteratively recruited four batches of participants with diverse backgrounds and job responsibilities both within our lab and externally. We followed a diary study approach [101] by monitoring their online searching and browsing behavior related to programming through a custom Chrome extension that logs triggerings of the above behavioral signals and ranks the importance of the associated content blocks accordingly (the initial scoring functions were determined through our heuristics). At the end of each sensemaking episode, we prompted them to review how well the system did in inferring what they thought was important, and tuned the scoring function heuristics accordingly (favoring recall over precision). We leave more advanced and adaptive scoring models for future work to investigate. By default, the system shows the top 15 criteria ranked by decreasing attention scores in both the list and the table view. Users can use the "See More" and "See Less" buttons to adjust how many criteria they would like to see at the same time (Figure 1g). As the user browses more content and spreads his or her attention across different content blocks, the order of these criteria changes accordingly in real time, which provides the user with an ambient awareness of what the system thinks is important. To provide users with the flexibility to override the system's ranking, they can right-click on a criterion and use the "pin this criterion" feature to pin it at the top (Figure 1i). They can additionally specify their own order of preferences by dragging and dropping to reorder the criteria in the table view, which will automatically pin a criterion if it is not already pinned. Each time an implicit behavioral signal triggering is detected, Crystalline also collects the target content block as an evidence snippet, which is presented with its original styling [81] in the detail views and the comparison table view as mentioned above.
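Putting the signal scoring and equation (1) together, here is a minimal sketch covering only the two Table 1 rows reproduced above (cursor hover and content dwell); the weights (0.5, 0.2) and caps (10, 4) follow the table, but the data shapes and names are illustrative, and the omitted signals (copying, highlighting, clicking) would add analogous cases.

```typescript
// Illustrative translation of the two recovered Table 1 rows and equation (1).
// Only the weights and caps are taken from the table; everything else is assumed.

type SignalKind = "cursorHover" | "contentDwell"; // copy / highlight / click rows omitted

interface SignalTriggering {
  kind: SignalKind;
  durationSec: number;    // duration t of the hover or dwell, in seconds
  criterionIds: string[]; // criteria associated with the content block it fired on
}

// Scoring function f(s) from the last column of Table 1 (recovered rows only).
function score(s: SignalTriggering): number {
  switch (s.kind) {
    case "cursorHover":
      return Math.min(0.5 * s.durationSec, 10); // 0.5 * t, capped at 10
    case "contentDwell":
      return Math.min(0.2 * s.durationSec, 4);  // 0.2 * t, capped at 4
    default:
      return 0; // rows not reproduced here would get their own functions
  }
}

// Equation (1): AttentionScore(c) = sum of f(s) over all triggerings s that
// fired on a content block associated with criterion c.
function attentionScores(triggerings: SignalTriggering[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const s of triggerings) {
    for (const id of s.criterionIds) {
      totals.set(id, (totals.get(id) ?? 0) + score(s));
    }
  }
  return totals;
}

// Criteria are then shown in decreasing score order, top 15 by default.
function rankCriteria(totals: Map<string, number>, limit = 15): string[] {
  return [...totals.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, limit)
    .map(([id]) => id);
}
```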
One way for Crystalline to actively manage the relationships among the collected information is to automatically merge similar criteria together into criteria groups (indicated by a "multiple items" icon at the end, see Figure 1e). To achieve this, we leverage recent advances in transformer machine learning models such as the Universal Sentence Encoder [13] and BERT [28] that can encode textual content into semantically meaningful vector representations called embeddings [43], i.e., two or more semantically close pieces of content will also be close in the embedding vector space (measured by a distance metric, e.g., the cosine similarity between vectors [113]). Crystalline computes an embedding for every criterion as the average of its own embedding and the embeddings of its corresponding evidence snippets, and automatically merges criteria that are within a specified semantic distance threshold of each other into a group. For example, as shown in Figure 3a, the system automatically merges "Right to Left" (taken from the option "Splide") and "RTL" (taken from the option "Swiper") together since they are semantically similar. The distance threshold was determined empirically through iterative pilot testing. This has the benefit of reducing clutter while helping users make connections among the information that they have seen, which is reported by prior work as one of the difficult steps during sensemaking and schematization [37, 96, 106]. In case the system fails to automatically group similar criteria together, users can use drag and drop to manually make the grouping. Similarly, users can easily split a criteria group by right-clicking on the group and hitting the "split this criteria group" menu item. In situations where a user reads and investigates some criterion at one location, Crystalline will also actively look for evidence for the same or similar criteria from other pages that the user has visited (including the current page) but has not (yet) paid attention to according to the implicit signals. Crystalline will remind the user of the existence of this additional evidence through a red notification dot at the top right of a criterion (Figure 1f) as well as in the detailed views (Figure 3d). This then serves as an additional way for the system to help users uncover and manage unseen relationships in the information space, as well as a springboard for users to jump directly to the "overlooked" information for further investigation. The Crystalline Chrome browser extension is implemented in HTML, JavaScript, and CSS, using the React JavaScript library [34]. It also uses Google's Firebase for database synchronization and persistence, back-end functions, and user authentication. To produce the content embeddings, we used bert-as-a-service [28] and the uncased_L-12_H-768_A-12 pre-trained BERT model to implement a REST API that the extension can query on demand. The embedding calculations are known to incur significant computational costs and delays; therefore, to ensure a smooth user experience, they are better suited to run on a remote server with the necessary resources rather than locally in an end-user's browser. Unlike other systems [33, 95] that help users find more information from new sources, Crystalline only collects information from the web pages that a user has explicitly visited. This is an intentional design choice in the current implementation: the major role of Crystalline is to remove the burden on users of actively keeping track of relevant information that they have personally seen and investigated, so that it is easier for them to revisit and recall. We leave the design space of automating the discovery of new relevant information for future research to explore.
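As a sketch of the embedding-based criteria grouping described above, the following greedy grouping is one plausible way to operationalize it; the embed function is assumed to call a sentence-encoder service (such as the BERT REST API mentioned above), and the 0.8 similarity threshold is a placeholder for the empirically tuned value, not the one Crystalline actually uses.

```typescript
// Illustrative sketch of grouping semantically similar criteria by embedding similarity.
// `embed` is assumed to query a sentence-encoder service; the threshold is a placeholder.

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function average(vectors: number[][]): number[] {
  const out = new Array(vectors[0].length).fill(0);
  for (const v of vectors) {
    for (let i = 0; i < v.length; i++) out[i] += v[i] / vectors.length;
  }
  return out;
}

interface CriterionWithEvidence { id: string; name: string; evidenceTexts: string[]; }

// Greedy single-pass grouping: each criterion joins the first existing group whose
// representative vector is within the similarity threshold, else it starts a new group.
async function groupCriteria(
  criteria: CriterionWithEvidence[],
  embed: (text: string) => Promise<number[]>,
  threshold = 0.8
): Promise<string[][]> {
  const groups: { representative: number[]; ids: string[] }[] = [];
  for (const c of criteria) {
    // A criterion's embedding is the average of its own name's embedding and
    // those of its evidence snippets, following the description above.
    const vectors = await Promise.all([c.name, ...c.evidenceTexts].map(embed));
    const vector = average(vectors);
    const match = groups.find((g) => cosineSimilarity(g.representative, vector) >= threshold);
    if (match) match.ids.push(c.id);
    else groups.push({ representative: vector, ids: [c.id] });
  }
  return groups.map((g) => g.ids);
}
```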
We conducted an initial lab study to evaluate the usability of the Crystalline system in helping developers collect and organize information. We recruited 12 participants (7 male, 5 female) aged 22-35 years (M = 27.6, SD = 3.7) through emails and social media. The participants were required to be 18 or older, fluent in English, and experienced in programming. Participants had on average 6.9 years of programming experience, with half of them currently working or having worked as a professional developer and the rest having gained programming experience in universities. The study was a within-subjects design, where participants were presented with two tasks and were asked to complete one of them using Unakite (baseline condition) and the other using Crystalline (experimental condition), in a counterbalanced order. For each task, participants were presented with a programming decision-making problem, a set of four web pages, some necessary background on the problem, and a list of three options available to solve the problem that they were required to investigate. The provided web pages were either documentation pages of specific options or comprehensive review articles reviewing several options together. Participants were instructed to read through the provided web pages, and use either Unakite or Crystalline to collect and organize information into a comparison table containing all the given options and at least 8 different criteria in the order of their perceived importance. We imposed a 20-minute limit per task to keep participants from getting caught up in one of the tasks. However, they were instructed to inform the researcher when they had collected 8 criteria as well as the associated evidence. If they wished to continue beyond this checkpoint, they were allowed to, until they felt like they could make no further progress. Specifically, the two tasks were to use the corresponding system in each condition to build a comparison table for:
• (A) Choosing a JavaScript carousel library to build a photo sharing web application. The available options were: Splide.js (https://splidejs.com/), Slick (https://kenwheeler.github.io/slick/), and Swiper (https://swiperjs.com/).
• (B) Choosing a front-end framework to implement a basic personal portfolio website. The available options were: React.js (https://reactjs.org/), Angular (https://angular.io/), and Vue.js (https://vuejs.org/).
We chose Unakite over other commercially available tools such as Google Docs as the baseline condition because: 1) it can be easily used to capture richer contexts such as formatted text (example code), images, and links; 2) similar to Crystalline, it also provides a sidebar that allows participants to view and organize the collected information directly rather than switching context over to another browser tab or application to paste in and structure information; and 3) Unakite was shown to be easy to learn and use in prior research and incurs significantly less overhead cost than using Google Docs [81]. In addition, rather than letting participants search for their own pages to research, we provided them with a predefined set of pages to ensure a fair comparison of the results, and because helping to find relevant web pages is not a goal of Crystalline. Requiring participants to only read the predefined pages (each containing on average 7 screenfuls of content) also helps ensure that the two tasks are of roughly equal difficulty in terms of reading and cognitive processing effort.
Furthermore, to ensure realism and participant engagement, the tasks were selected based on actual questions asked and discussed on programming forums and websites. We specifically simplified the requirements and background of task B to match those of task A, since otherwise, choosing a JavaScript framework (e.g., to build interactive industry-level web applications) would arguably be more substantial and involve deeper and much more careful comparisons and team discussions that are beyond the scope of this lab study. In fact, as shown in section 6.1, there was no significant difference by task. Each study session started by obtaining consent and having participants fill out a demographic survey. Participants were then given a 10-minute tutorial showcasing the various features of Unakite and Crystalline and a 10-minute practice session on both systems before starting. At the end of the study, the researcher conducted a survey and an interview eliciting subjective feedback on the Unakite and Crystalline experience. Each study session took approximately 60 minutes, using a designated MacBook Pro computer with Chrome, Unakite, and Crystalline installed. All sessions were carried out in person, with participants and the researcher appropriately masked following COVID-19 mitigation protocols. All participants were compensated $15 for their time. The study was approved by our institution's IRB office.

Table 3: Statistics of scores in the post-task survey. Participants were asked to rate their agreement with statements related to their experience interacting with Crystalline and Unakite on a 7-point Likert scale from "Strongly Disagree" (a score of 1) to "Strongly Agree" (a score of 7). Statistics in columns 2 and 3 are presented in the form of mean (standard deviation). Statistically significant differences (p < 0.05) through paired t-tests are marked with an *.

All participants were able to complete all of the tasks in both conditions, and nobody went over the pre-imposed time limit. Figure 1, together with Figure 3, shows an example table built by one of the participants in the study for task A. To examine how Crystalline performs compared to the baseline Unakite condition, we measured the time it took for participants to finish each task. A two-way repeated measures ANOVA was conducted to examine the within-subject effects of condition (Crystalline vs. Unakite) and task (A vs. B) on task completion time. There was a statistically significant effect of condition (F(1, 20) = 8.06, p = 0.01) such that participants completed tasks significantly faster (21.6% faster) with Crystalline (Mean = 611.8 seconds, SD = 144.6 seconds) than in the Unakite condition (Mean = 780.3 seconds, SD = 137.6 seconds). There was no significant effect of task (F(1, 20) = 0.11, p = 0.74), indicating the two tasks were indeed of roughly equal difficulty. These results suggest Crystalline helped participants build up comparison tables faster overall, even though the majority of their time was necessarily spent reading through the material in both conditions. To account for this reading time, we also compared the overhead cost [81] of using both tools to collect and organize information. For the Crystalline condition, we calculated the overhead cost as the portion of the time participants spent directly interacting with Crystalline (scrolling through the list and table view to examine the evidence collected so far, splitting and merging criteria, pinning important criteria, manually collecting information, etc.)
out of the total time they used for a task (vs. reading and comprehending the web pages). Similarly, in the Unakite condition, the overhead cost was calculated as the percentage of time participants spent directly using Unakite features (selecting and collecting information snippets, dragging and dropping snippets into the comparison table, etc.), in the same way as was done to compare Unakite to Google Docs [81]. A two-way repeated measures ANOVA was conducted to examine the within-subject effects of condition (Crystalline vs. Unakite) and task (A vs. B) on overhead cost. There was a statistically significant effect of condition (F(1, 20) = 77.5, p < 0.001) such that the overhead cost was significantly lower (almost 60% lower) in the Crystalline condition (Mean = 11.6%, SD = 0.04) than in the Unakite condition (Mean = 28.4%, SD = 0.07). Again, there was no significant effect of task (F(1, 20) = 0.53, p = 0.48). Thus, using Crystalline resulted in reduced overhead costs of collecting and organizing information. To gain deeper insights into why the overhead cost was significantly lower in the Crystalline condition, we tallied the number of interactions performed in each task while collecting and organizing information to build the comparison tables (Table 2). Here, we notice that the majority of interactions in the Unakite condition are to manually collect information snippets (on average 26.6 times) and place them into the comparison table (on average 15.5 times). In contrast, in the Crystalline condition, the majority of interactions are to merge criteria into groups (on average 2.08 times) and pin or reorder the criteria in the table (on average 5.42 times). This suggests that, to some extent, Crystalline has transformed the previously active capturing and organizing work into passive monitoring and error-fixing, which explains the lower overhead cost. In the survey, participants reported (on 7-point Likert scales) that they thought the interactions with Crystalline were understandable and clear (Mean = 6.17, SD = 0.39), Crystalline was easy to learn (Mean = 6.08, SD = 0.79), and they enjoyed Crystalline's features (Mean = 6.25, SD = 0.45). In addition, compared to Unakite (Mean = 5.75, SD = 0.45), they thought using Crystalline (Mean = 6.08, SD = 0.29) would help them solve programming problems more efficiently and effectively, and they would recommend Crystalline (Mean = 6.17, SD = 0.58) over Unakite (Mean = 5.58, SD = 0.51) to friends and colleagues doing programming work; both differences were statistically significant under paired t-tests. Details of the survey questions and scores are presented in Table 3.

6.2.1 Usability and usage patterns. Overall, participants appreciated the increased efficiency afforded by various Crystalline features. Many (9/12) mentioned that the perceived workload to collect and organize what they had investigated was minimal, saying that "I feel like I got a table for free" (P3), "the fact that I can see what I've paid a lot of attention to automatically bubbles up to the top is quite magical" (P9), and "It feels as if I was sitting in the passenger seat and not having to do all the steering and maneuvering" (P7). Some (3/12) participants also reported having taken advantage of the overlooked information reminder feature (Figure 3d) to guide their research.
Furthermore, participants reflected that Crystalline relieves them of the burden of trying to anticipate the value of a particular piece of information before collecting it since "the important bits will eventually be at or near the top, hopefully" (P12), and they could "focus on reading the page itself and not context switch to bookkeeping mode again and again" (P5). However, some did voice concerns about the system's ability at the beginning of the tasks, arguing that they were "skeptical if it will actually collect the right things" (P1), and reported that they would "skim through the list view and the table view quite frequently at the beginning" (P7). However, as they progressed through the tasks, their confidence in Crystalline increased, and they only occasionally checked the sidebar. We observed that three of the 12 participants ended up not examining and editing the system's output until they felt like they had finished reading and processing all the given pages, and they made minimal edits to the results. 6.2.2 Working with machine suggestions. Participants generally thought that the benefits of automating the collection and organization process outweighed the costs of dealing with occasional unhelpful machine suggestions, such as incorrectly merging criteria together or prioritizing unimportant criteria at the top of the list. For example, P7 reflected, "it feels like a mind reader. I know it's not perfect, but I also don't expect it to be, and would actually prefer occasionally peeking into what it's been doing and fixing whatever that's not correct than grabbing everything by myself all the time." Some did raise concerns about the ordering of criteria getting changed too frequently ("they [the criteria] were jumping around", P7) at the beginning. This is likely due to the fact that users were skimming through a web page without paying particular attention to anything at the beginning, causing their attention scores to be relatively indistinguishable. For future iterations of the system, we could experiment with less frequent UI update intervals under these circumstances so it would cause less distraction. Similar to what was reported in prior work [99] , since our participants were not explicitly told how the system worked to automatically collect and rank information, they had to form their own mental models and hypotheses about how the system works and how they could affect it with their behavior. For example, P8 noticed that "it looks like if I spend a little bit more time on a particular place on a page, the corresponding criterion would get picked up and bumped up quickly; and if I click on that part a bunch of times, which happens to be what I typically would do when I try to focus my attention on something now that I'm thinking about it, it's [the corresponding criterion] going to go up even faster." This suggests that our implicit signals were working, and further, that with experience users might adapt to explicitly steer the system towards their goal of collecting and prioritizing information, resulting in, to some extent, a mixed-initiative collection approach that still would require much less effort than the baseline methods. Future research could explore the costs and benefits of a wide variety of interactions and signals that lie on the spectrum between implicit behavioral signals to full manual direct manipulations, and any differences caused by directly instructing users about the implicit signals being used. 
Though the current version of Crystalline mainly focuses on reducing the cost for developers to collect and organize information, which was exactly what we tested in the lab study, we were also interested in making sure that the quality of the comparison tables built using Crystalline does not degrade, as has been seen in other automation scenarios [49, 111]. Since there is no gold standard comparison table, we evaluated the correctness of Crystalline's automatic approaches by how much editing participants had to do in order to fix Crystalline's mistakes and make sure that all the content in the table was eventually filled out and ranked correctly according to their understanding, as per the study protocol. As shown in Table 2(b), participants only had to perform on average 12.2 edits to the automatically generated comparison tables, compared to the 51.3 actions that they had to manually perform in the baseline Unakite condition (the difference is statistically significant, p < 0.01). Among these, edits related to collecting information, such as manually selecting and capturing information (0.92 times), renaming (1.92 times), and deleting information (0.50 times), were minimal, suggesting that our combination of NLP and behavioral signal heuristics was working effectively to collect information that the users thought was important. However, participants pinned or reordered the criteria that were automatically ranked by Crystalline on average 5.42 times (SD = 2.27 times). One possible explanation is that the universal scoring functions (in Table 1) did not necessarily apply to every single participant, suggesting the need for a more sophisticated and personalized scoring mechanism in future iterations of Crystalline and other systems that leverage signals from users' natural browsing behavior. In addition, we asked for and coded their opinions about using these tables as if they were the subsequent developers trying to understand the design rationale. In general, participants were excited about using comparison tables automatically built by Crystalline. For example, P10 highlighted scenarios where Crystalline would be useful for his own purposes, saying that "it's sort of like a never-erased whiteboard that would most likely help me remember what I looked at three months ago." In addition, some reflected that compared to having no clue of why a decision was made in a particular way in the first place, they would appreciate at least having access to a Crystalline table even if it was not actively monitored and maintained during the initial developer's sensemaking process. For example, P4 said: "I think being able to read something like this [Crystalline table] is going to make a big difference when you're banging your head against the wall trying to understand why this particularly old API was chosen, I mean, especially when the guy who wrote the code was long gone, I could at least 'read a transcription of his mind' in some sense." Here, we see preliminary evidence that our approach of automatically collecting and organizing information on behalf of developers is useful and valuable. We leave the formal evaluation of the quality of fully automatically built comparison tables with possibly more advanced versions of Crystalline for future work.
Currently, Crystalline works best on a limited set of web pages in the programming domain, including documentation pages that are dedicated to a particular library or a set of APIs, as well as review articles or question-answering pages that discuss and compare several options together. We chose to optimize for these types of web pages in the current prototype as they were reported in prior work [63, 81], as well as in our formative discussions with developers, as some of the most frequently consulted programming resources when it comes to making decisions. However, the performance reported on the web pages used in the study is not necessarily representative of how Crystalline would operate even on web pages of these types for users in general. In addition, Crystalline currently relies heavily on the overall structure of the web pages being standard, meaning that a page uses HTML tags appropriately according to their semantics (e.g., enclosing headers and list items in <h1> and <li> tags rather than wrapping everything in <div> tags) and that there is a strong semantic coherence between a section header and its corresponding content. Though this is sufficient to demonstrate the idea of automatic collection and organization and the benefits they offer, future research is needed to make Crystalline-style tools work on a more diverse set of web pages, as well as to make such tools clear upfront about their limitations in parsing web pages that do not follow appropriate web standards. Furthermore, our lab study has several limitations. Given the short amount of training and practice time participants had, some might not have been able to fully grasp the various features of Crystalline, or they might have been confused about what Unakite (the baseline system) has to offer. The study tasks might not be what participants typically encounter in their daily work, depending on whether they are in a position to make decisions, and thus they may not have been equipped with the necessary motivation or context that they would otherwise have in real life. We mitigate these risks in the study setup by: 1) having participants perform a practice task for each condition simulating what they would have to do in the real tasks; 2) choosing the study tasks based on actual questions that are discussed by developers on Stack Overflow and other popular programming community forums; and 3) providing participants with sufficient background information and context to help them get prepared. In fact, 7 out of 12 participants reported that the tasks were indeed similar to what they would deal with in their daily work. We would like to further address these limitations in the future by having developers use Crystalline on their own work and personal projects, which would provide them with sufficient motivation as well as experience with Crystalline that is enriched over time. Finally, the overhead cost measurement in the study could be conservative, as we did not account for the time participants spent simply glancing or looking at the sidebars without any explicit interactions with them. However, from our observations during the study, participants rarely spent any extended time doing this. Nevertheless, we would like to take advantage of more advanced tools such as eye tracking [7, 89, 90, 103] in the future to more accurately account for the proportion of time when a participant's gaze is fixated on the user interface of the tools rather than on actual web content. Through designing and evaluating Crystalline, we gained deeper insights into the benefits and trade-offs of automatically collecting and organizing information for developers as they make sense of the web to make programming decisions. This motivates some ideas for future work. While Crystalline's approach provides developers with an inexpensive way of capturing knowledge in the browser, it represents only one piece of a larger puzzle of how to support a developer's everyday work that involves sensemaking and decision making. One dimension to characterize this is that developers also frequently perform activities outside their browsers, such as in IDEs, code editors [108], command-line interfaces [19], literate programming notebooks [68, 69], or threads of discussions during formal or informal meetings [125]. Further research would be needed to understand how to collect and organize information from these sources as well as how to integrate them together to provide a more comprehensive picture of the decision making context.
Another relevant dimension is the lifecycle of the knowledge captured via systems like Unakite and Crystalline. Early evidence from the user study suggests a benefit of Crystalline's organization from the perspective of a subsequent developer who needs to understand a previous developer's decision. Future research could investigate how well developers are able to understand and potentially reuse these automatically assembled knowledge artifacts, possibly without any manual intervention from the initial knowledge authors, which could, in turn, eliminate the starting cost associated with initial knowledge creation [37] and unlock the virtuous cycle of accelerated programming knowledge reuse [37, 82].
Though the current set of mechanisms for deriving the importance of criteria from implicit behavioral signals generally works well in the setting of this research, there could be situations where a user's default browsing behaviors and patterns fall outside the limited set of signals and heuristics that Crystalline currently looks for. For example, a user might not have the habit of unconsciously using the cursor as a reading guide, or might not interact with the page at all while reading, which would render the tracking of some of the behavioral signals moot. In addition, users could exhibit different or additional behavior patterns when the approach is generalized to other task domains that involve information-backed decision making, such as comparison shopping and trip planning [15, 53]. For example, when interacting with a map view to find the best local dining option, a user may frequently pan and zoom to view different restaurants, and both the time spent on a particular restaurant and the number of times it is viewed back and forth could be leveraged to approximate the user's interest and investment of effort. One way to address these concerns is to leverage a more diverse set of behavioral signals, and potentially signal combinations, such as scrolling, mouse panning, zooming, eye tracking [35, 36, 89, 90], and facial gesture tracking [70, 117], to build a more accurate picture of what users are attending to on screen. Another direction that could be fruitful is to replace the current rule-based approach for approximating content importance from behavioral signals (an illustrative heuristic of this kind is sketched below) with a machine learning approach. Specifically, we could leverage recent advances in crowdsourcing and labeling [17, 20, 27, 114] to log, annotate, and construct a large-scale data set that maps a variety of behavioral signals to the perceived importance of the content blocks they are triggered on, and train scoring functions on this data set that would generalize more widely. Alternatively, an online learning approach could also be promising, in which the system continuously learns, adapts, and improves from an individual user's behavior over time, as suggested by Horvitz [62].
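The sketch below shows one way a rule-based heuristic of this kind could combine dwell time and cursor activity into an importance score for a content block. The signal names, weights, and saturation thresholds are illustrative assumptions for discussion, not a description of Crystalline's actual rules; a learned model would replace the hand-tuned weighted sum.
// Hypothetical rule-based importance scoring for a content block,
// combining dwell time and cursor activity. Weights and thresholds
// are illustrative assumptions only.
interface BlockSignals {
  dwellTimeMs: number;  // time the block has been visible in the viewport
  hoverCount: number;   // times the cursor entered the block
  hoverTimeMs: number;  // total time the cursor lingered over the block
}
function importanceScore(s: BlockSignals): number {
  // Saturate dwell time at 30s so a single long pause does not dominate.
  const dwell = Math.min(s.dwellTimeMs, 30_000) / 30_000;
  // Repeated cursor revisits suggest re-reading and comparison.
  const revisits = Math.min(s.hoverCount, 5) / 5;
  // Cursor lingering is a weak proxy for gaze when the cursor is used as a reading guide.
  const linger = Math.min(s.hoverTimeMs, 10_000) / 10_000;
  return 0.5 * dwell + 0.3 * revisits + 0.2 * linger;
}
// A block the user never dwells on or hovers over scores 0, which is exactly
// the failure mode described above for readers who do not move the cursor while reading.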
Last but not least, the automation afforded by systems like Crystalline enables people to focus their attention on reading and comprehending web pages rather than splitting attention between reading and collecting and organizing information at the same time. However, prior work in learning science, such as Bransford et al. [24], found that people who personally performed the actions of collecting, categorizing, and organizing information were more likely to recall it correctly and in detail, and exhibited increased confidence in the final outcome. This raises an interesting tension and trade-off between full automation and direct manipulation. Future research would be required to examine the long-term effects on people's learning outcomes, as well as their confidence in decisions made using systems like Crystalline, and to determine the appropriate levels and circumstances under which automatic information bookkeeping should be applied.
This paper explored how automatically collecting and organizing information as developers search and browse the web can better support them in decision-making scenarios. Our designs were motivated by the growing complexity of the decisions that developers need to make and the lack of tooling support to help them efficiently gather and synthesize evidence without causing much interruption to their main focus of reading and understanding content online. We introduced Crystalline, a browser extension that instantiates this idea by leveraging natural language processing and users' behavioral signals, such as mouse movement and dwell time, to infer what information to collect and how to organize and prioritize it on behalf of the user. Through a lab study with 12 participants, we found promising evidence that using Crystalline as a copilot to collect and organize information is much faster and more efficient than manual collection and organization, and that the resulting knowledge artifacts are potentially useful and valuable both for the initial user and for subsequent consumption by people who need to understand the original decision-making context.
REFERENCES
Google Notebook
Guidelines for Human-AI Interaction
SenseMaker: An Information-exploration Interface Supporting the Contextual Evolution of a User's Interests
Perspectives on information overload
SearchPad: explicit capture of search context to support Web search
Entity quick click: rapid text copying based on automatic entity extraction
On-the-fly calibration for improved on-device eye tracking
Example-centric Programming: Integrating Web Search into the Development Environment
Two Studies of Opportunistic Programming: Interleaving Web Foraging, Learning, and Writing Code
What do you see when you're surfing? using eye tracking to predict salient regions of web pages
Eye movements as implicit relevance feedback
Segment-level display time as implicit feedback: a comparison to eye tracking
Supporting Mobile Sensemaking Through Intentionally Uncertain Highlighting
Mesh: Scaffolding Comparison Tables for Online Decision Making. Association for Computing Machinery
Tabs.do: Task-Centric Browser Tab Management
Learning to Detect Human-Object Interactions
What can a mouse cursor tell us more? correlation of eye/mouse movements on web browsing
Bashon: A Hybrid Crowd-Machine Workflow for Shell Command Synthesis
Improving Crowd-Supported GUI Testing with Structural Guidance
Google Sets Will Be Shut Down
Implicit interest indicators
Agile software development, the people factor
How people learn: Brain, mind, experience, and school: Expanded edition
A study of the documentation essential to software maintenance
Easing Program Comprehension by Sharing Navigation Data
ImageNet: A large-scale hierarchical image database
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Summarizing Personal Web Browsing Sessions
Empirical studies of agile software development: A systematic review
Context-Aware User Feedback in Continuous Software Evolution
Measure it? Manage it? Ignore it? Software practitioners and technical debt
Best Note Taking App - Organize Your Notes with Evernote
React - A JavaScript library for building user interfaces
Eyelid Gestures on Mobile Devices for People with Motor Impairments
Eyelid gestures for people with motor impairments
Distributed Sensemaking: Improving Sensemaking by Leveraging the Efforts of Previous Users
Analyzing the co-evolution of comments and source code
The relevance of software documentation, tools and technologies: a survey
The Google 'vs' Trick
Refactoring: improving the design of existing code
Comparative Evaluation of Javascript Frameworks
word2vec Explained: deriving Mikolov et al.'s negative-sampling word-embedding method
Angular - Introduction to Angular animations
Angular - Validating form input
Angular - One Framework
Cloud Natural Language
Viewpoint: When Will AI Exceed Human Performance? Evidence from AI Experts
Exploring mouse movements for inferring query intent
Ready to buy or just browsing? detecting web searcher goals from interaction data
Towards predicting web searcher gaze position from mouse movements
Bento Browser: Complex Mobile Search Without Tabs
HyperSource: Bridging the Gap Between Source and Code-related Web Sites
Interactive Extraction of Examples from Existing Code
Supporting the collaborative development of requirements and architecture documentation
Implicit user profiling for on demand relevance feedback
Informal Information Gathering Techniques for Active Reading
Assieme: Finding and Leveraging Implicit References in a Web Search Interface for Programmers
Thresher: automating the unwrapping of semantic content from the World Wide Web
Understanding How Programmers Can Use Annotations on Documentation
Principles of mixed-initiative user interfaces
An Exploratory Study of Web Foraging to Understand and Support Programming Decisions
No search result left behind: branching behavior with browser tabs
Improving searcher models using mouse cursor activity
Interactive Data Integration through Smart Copy & Paste
A Framework to Collect and Visualize User's Browser History for Better User Experience and Personalized Recommendations
Variolite: Supporting Exploratory Programming by Data Scientists
The Story in the Notebook: Exploratory Data Science using a Literate Programming Tool
Face Recognition Assistant for People with Visual Impairments
Standing on the Schemas of Giants: Socially Augmented Information Foraging
Costs and Benefits of Structured Information Foraging
Six Learning Barriers in End-User Programming Systems
An Exploratory Study of How Developers Seek, Relate, and Collect Relevant Information during Software Maintenance Tasks
Hard-to-answer Questions About Code
Maintaining Mental Models: A Study of Developer Work Habits
Comparing TensorFlow Deep Learning Performance Using CPUs
Performance Comparison and Evaluation of Web Development Technologies in PHP, Python, and Node.js
How software engineers use documentation: the state of the practice
FMT: A Wearable Camera-Based Object Tracking Memory Aid for Older Adults
Unakite: Scaffolding Developers' Decision-Making Using the Web
To Reuse or Not To Reuse? A Framework and System for Evaluating Summarized Knowledge
Exploring the Structure of Complex Software Designs: An Empirical Study of Open Source and Proprietary Code
Named entity recognition approaches
Saving and Using Encountered Information: Implications for Electronic Periodicals
How We Refactor, and How We Know It
SenseMap: Supporting browser-based online sensemaking through analytic provenance
CoNotate: Suggesting Queries Based on Notes Promotes Knowledge Discovery
SearchGazer: Webcam Eye Tracking for Remote Studies of Web Search
WebGazer: Scalable Webcam Eye Tracking Using User Interactions
Facilitating Knowledge Sharing from Domain Experts to Data Scientists for Building NLP Models
A Comprehensive Evaluation of Cryptographic Algorithms
CoSense: Enhancing Sensemaking for Collaborative Web Search
An Empirical Study of the Framework Impact on the Security of JavaScript Web Applications
The Sensemaking Process and Leverage Points for Analyst Technology as Identified Through Cognitive Task Analysis
Seahawk: Stack Overflow in the IDE
A rule-based approach to aspect extraction from product reviews
ForSense: Accelerating Online Research Through Sensemaking Integration and Machine Research Support
JSMeter: Comparing the Behavior of JavaScript Benchmarks with Real Web Applications
The diary study: a workplace-oriented research tool to guide laboratory efforts
Exploring how mouse movements relate to eye movements on web search results pages
Eye-mouse coordination patterns on web search results pages
How Do Professional Developers Comprehend Software
Note the Highlight: Incorporating Active Reading Tools in a Search as Learning Environment
The Cost Structure of Sensemaking. Conference on Human Factors in Computing Systems (CHI '93)
A comparison of bug finding tools for Java
How Developers Search for Code: A Case Study
Beyond paper: supporting active reading with free form digital ink annotations
Hunter Gatherer: Interaction Support for the Creation and Management of Within-web-page Collections
Human-Centered Artificial Intelligence: Reliable, Safe & Trustworthy
Questions Programmers Ask During Software Evolution Tasks
Modern information retrieval: A brief overview
Popup: reconstructing 3D video using particle filtering to aggregate crowd responses
Mica: A Web-Search Tool for Finding API Components and Examples
Citrine: providing intelligent copy-and-paste
TeethTap: Recognizing Discrete Teeth Gestures Using Motion and Acoustic Sensing on an Earpiece
Active reading and its discontents: the situations, problems and ideas of readers
United States Patent: 7350187 - System and methods for automatically creating lists
The documentary structure of source code
Social CheatSheet: An Interactive Community-Curated Information Overlay for Web Applications
OrgBox: Supporting Cognitive and Metacognitive Activities During Exploratory Search
Supporting Exploratory Search, Introduction, Special Issue
Contextual web history: using visual and contextual cues to improve web browser history
Making Sense of Group Chat Through Collaborative Tagging and Summarization
ACKNOWLEDGMENTS
This research was supported in part by NSF grants CCF-1814826 and FW-HTF-RL-1928631, Google, Bosch, the Office of Naval Research, and the CMU Center for Knowledge Acceleration. Any opinions, findings, conclusions, or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the sponsors.
We would like to thank our study participants for their kind participation and our anonymous reviewers for their insightful feedback. We are genuinely grateful to Yongsung Kim, Joseph Chee Chang, and Amber Horvath for their valuable feedback. In addition, we sincerely thank Jinlei Chen, Tianying Chen, Yulan Feng, Nan Gao, Haojian Jin, Toby Jia-Jun Li, Franklin Mingzhe Li, Julia Jiayin Qian, Haitian Sun, Jiachen Wang, Eric Yiyi Wang, Ziyan Wang, Zheng Yao, and Yi Zhou for their constant support, especially during the COVID-19 pandemic.