key: cord-0588406-2j3r7ykj
authors: Mathews, Noble Saji; Chimalakonda, Sridhar
title: Detox Browser -- Towards Filtering Sensitive Content On the Web
date: 2021-06-18
sha: 555d7efe4d184732da725610fe66595209e1c298
doc_id: 588406
cord_uid: 2j3r7ykj

The annual consumption of web-based resources is increasing at a very fast rate, mainly due to the growing affordability and accessibility of the internet. Many people rely on the web to get diverse perspectives, but at the same time, it can expose them to content that is harmful to their mental well-being. Catchy headlines and emotionally charged articles increase the number of readers, which in turn increases ad revenue for websites. When a user consumes a large quantity of negative content, it adversely affects the user's happiness and has a significant impact on his/her mood and state of mind. Many studies carried out during the COVID-19 pandemic have shown that people across the globe, irrespective of their country of origin, have experienced higher levels of anxiety and depression. Web filters can help construct a digital environment that is more suitable for people prone to depression, anxiety and stress. A significant amount of work has been done in the field of web filtering, but there has been limited focus on helping Highly Sensitive Persons (HSPs) or those with stress disorders induced by trauma. In this paper, we propose Detox Browser, a simple tool that enables end-users to tune out of, or control their exposure to, topics that can affect their mental well-being. The extension uses sentiment analysis and keywords to filter flagged content out of Google search results and warns users if any blacklisted topics are detected when navigating across websites.

Due to ease of use and convenience, more and more people rely on web content from search engines and social platforms for reading, based on their topics of interest [4]. The rise in popularity of easily accessible online content has increased the possibility of incidental exposure to news, a situation in which people consume content even when they actually aim to do something else [7]. Content dealing with situations of collective trauma, such as school shootings and acts of terrorism, can cause anxiety [11]. Depending on the type and quantity of content to which a person is exposed, they can become more vulnerable to trauma-related mental health problems over time [8]. According to Sharma et al.'s study of the mental health scenario in India during the COVID-19 pandemic, sensationalized news stories increase anxiety, so it is desirable to avoid them [17]. In a study by Olagoke evaluating the psychological impact of exposure to COVID news in the mainstream media, young educated adults who perceive themselves to be vulnerable to COVID were found to be more prone to depression [15]. Unlike the individualistic culture of countries like the USA, India has a collectivistic culture in which family bonds are generally considered more valuable and a greater share of people live in joint families [13]. In such a situation, COVID-related news may increase the fear that family members in high-risk categories will contract COVID, which in turn may lead to higher levels of depression, anxiety and stress [22]. Recently, there has been a push to provide mental health services through various information and communication technologies [5]. But given the stigma associated with mental health, the general public would be reluctant to involve a third party in matters concerning their mental health [18].

Hence, in order to escape the infodemic that came along with the COVID pandemic, people may need tools that help them control the content they view. Fear, depression and stress are some of the psychological symptoms reported by people who had to quarantine themselves due to COVID [3], and people with prior trauma tend to spend more time on news related to the pandemic, which in turn may increase the severity of post-traumatic stress disorder (PTSD) [19]. Thus it is essential for people with prior trauma to filter out content that can act as a PTSD trigger. Similarly, some people may see political news as a stress inducer and would like to avoid it to maintain their mental stability.

According to Adreas and Mathes [14], this exposure to news can be distracting and can cause people to waste time contemplating frivolous news. Frank & Pero [12] argue that such exposure can lead to an aversion to news.

To solve these problems, we conceptualised and designed Detox Browser, which enables users to filter out content that they feel is adversely affecting their mental well-being. In addition to default filters, the extension lets users customize the browsing experience to their personal requirements. It analyzes sentiment, categorizes content and removes blacklisted topics to prevent users from being overwhelmed with information that could traumatize them. It achieves this through simple keyword-based checks followed by a more detailed analysis. Although it is mainly aimed at Google search results, it also checks other websites for topics blacklisted by the user. The extension is in open beta and its features are being continuously improved.

According to the study by Wu & Li [23] , exposure to negative news related to COVID-19 can potentially cause depression.

Given these harmful effects, many have recommended spreading awareness about various mental health disorders and have explored tele-medical and e-health interventions for treating such cases, as described by Fonseca et al. [6].

Currently, medical infrastructure is highly strained and the infrastructure available to treat such conditions is limited, so it is important to focus on prevention rather than treatment [16].

URL-based content filtering is one of the most commonly available software solutions for filtering web content, but it requires a database of URLs that is cumbersome to maintain [1]. Google has a built-in feature, SafeSearch, which removes age-inappropriate results. Though SafeSearch filters out obscene content, it cannot be customised to a user's requirements. Also, given the nature of the information that has to be filtered out, SafeSearch alone is not enough. This functionality can be extended with Chrome extensions such as Profanity Filter, Safe Words and many more, which remove profane words or censor them with symbolic stand-ins to obfuscate them [21]. These extensions focus on removing profane words, but not on other types of content that the user might want to censor or might be sensitive to.

There are also tools such as the Good News Chrome extension, which blocks news stories based on keywords and phrases, but it works only on the Google News website. Further, to the best of our knowledge, there has been little effort towards applying sentiment analysis to returned search results to give users better control over their web browsing experience. Machine learning has also played an important role in web filtering. Tools have been developed to automatically remove malicious comments that can ruin the experience of news readers [20]. A similar approach was used to build NewsWeeder, which uses user ratings to categorize and display news to the user's liking [10].

From the existing literature, we can see that significant work has been done on abuse and profanity filters. However, little has been done towards personalised search filtering based on the sentiment of articles, or towards enabling users to control the content displayed across the web. To address this gap, we propose Detox Browser.

Detox Browser has been built as a Chrome browser extension, primarily targeted at filtering Google search results. It also supports profanity detection and content warnings when navigating across websites. The behaviour of the extension in each context can be modified via the settings, and its sensitivity can be adjusted through the personalization options.

The workflow utilised by the current version of the extension is shown in Figure 1 .

Once a Google search is detected based on the URL, the extension extracts all HTML nodes on the page that contain links. These links are then used to find their closest parent nodes, which are categorised according to the predefined patterns they match. The categorization patterns are obtained by manually analysing the selectors of the required components of the webpage. Currently, such patterns have been extracted for Google search, and the extension supports normal search results, featured stories, news and videos, while ignoring special elements and Wikipedia/dictionary results so that direct searches for a topic and its meaning are not blocked.
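The detection and categorisation steps could look roughly like the following minimal TypeScript sketch. It is not the extension's code: the selector strings are hypothetical placeholders (the paper does not publish its patterns, and Google's markup changes over time), and `ElementLike` is a hypothetical stand-in for the two standard DOM `Element` methods the sketch relies on, `matches()` and `closest()`, so the logic can be exercised outside a browser.

```typescript
// Subset of the DOM Element API used below; real Elements satisfy it.
interface ElementLike {
  matches(selector: string): boolean;
  closest(selector: string): ElementLike | null;
}

type ResultKind = "result" | "topStory" | "news" | "video";

// HYPOTHETICAL selectors: placeholders for manually extracted patterns.
const IGNORE_SELECTORS = [".hypothetical-wiki-panel", ".hypothetical-dictionary"];
const RESULT_PATTERNS: Array<[ResultKind, string]> = [
  ["topStory", ".hypothetical-top-story"],
  ["news", ".hypothetical-news-card"],
  ["video", ".hypothetical-video-card"],
  ["result", ".hypothetical-organic-result"],
];

// Treats a page as a Google search if it is /search on a google.* host
// and carries a "q" query parameter (an assumption about the URL check).
function isGoogleSearch(href: string): boolean {
  const url = new URL(href);
  return (
    /(^|\.)google\.[a-z.]+$/.test(url.hostname) &&
    url.pathname === "/search" &&
    url.searchParams.has("q")
  );
}

// Walks up from a link to its closest categorised parent. Links inside
// Wikipedia or dictionary panels are skipped so they are never hidden.
function categorise(link: ElementLike): { kind: ResultKind; node: ElementLike } | null {
  if (IGNORE_SELECTORS.some((sel) => link.closest(sel) !== null)) return null;
  for (const [kind, selector] of RESULT_PATTERNS) {
    const node = link.closest(selector);
    if (node !== null) return { kind, node };
  }
  return null;
}

console.log(isGoogleSearch("https://www.google.com/search?q=covid+news"));
```

Checking the ignore list before the result patterns mirrors the paper's choice to leave Wikipedia and dictionary panels untouched. In a content script, the links themselves could be collected with `document.querySelectorAll("a[href]")`.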

A mutation observer, a built-in object that watches for changes to a DOM element, ensures that any changes in the search page content are also checked by the extension. From the parent nodes, the text content is extracted and scored using the AFINN lexicon, which has been shown to perform well in negative sample ratings [2]. AFINN contains words rated from -5 (negative) to +5 (positive), which allows a quick preliminary analysis so that articles can be flagged on page load if their score comes out negative. Flagged articles are replaced by a placeholder div that can be clicked to reinstate the swapped element. An emoji depicting a strongly negative, negative, neutral, positive or strongly positive score is added to the left of each article to indicate the result of the lexical analysis, as shown in Figure 2 [C].
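The preliminary lexical check can be sketched as below, assuming the simplest AFINN-style scoring: lowercase the text, split it into words, and sum the ratings of the words found in the lexicon. The small demo lexicon uses illustrative values rather than entries copied from AFINN, and the cut-offs of -5 and +5 for the strongly negative and strongly positive emoji are assumptions, since the paper does not state its thresholds.

```typescript
type Mood = "strongly negative" | "negative" | "neutral" | "positive" | "strongly positive";

const EMOJI: Record<Mood, string> = {
  "strongly negative": "😠",
  negative: "🙁",
  neutral: "😐",
  positive: "🙂",
  "strongly positive": "😄",
};

// Lowercases the text and keeps runs of letters and apostrophes as words.
function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z']+/g) ?? [];
}

// Sums the ratings of known words; unknown words contribute 0. A Map avoids
// accidental hits on Object.prototype keys such as "constructor".
function lexicalScore(text: string, lexicon: Map<string, number>): number {
  return tokenize(text).reduce((sum, word) => sum + (lexicon.get(word) ?? 0), 0);
}

// Hypothetical cut-offs: the paper does not publish its thresholds.
function moodOf(score: number): Mood {
  if (score <= -5) return "strongly negative";
  if (score < 0) return "negative";
  if (score === 0) return "neutral";
  if (score < 5) return "positive";
  return "strongly positive";
}

// A result is flagged (and would be swapped for a placeholder) when its score is negative.
function assess(text: string, lexicon: Map<string, number>) {
  const score = lexicalScore(text, lexicon);
  const mood = moodOf(score);
  return { score, flagged: score < 0, mood, emoji: EMOJI[mood] };
}

// Illustrative ratings only, not taken from the AFINN word list.
const demoLexicon = new Map<string, number>([
  ["killed", -3], ["crash", -2], ["bad", -3], ["good", 3], ["happy", 3],
]);

console.log(assess("Five killed in highway crash", demoLexicon));
```

Because each word is scored in isolation, a title such as "Not bad at all" sums to a negative value and is flagged, which is the kind of context-blind false positive the paper discusses among its limitations.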

For further checks, the extracted text is passed through a Multinomial Naive Bayes classifier and a natural language processor. To keep the extension compact and to make updates and improvements easy, both are deployed online and served through an API endpoint. The classifier is trained on the 20 Years Times of India Headlines dataset [9] and categorises articles into the top 50 of the dataset's more than 300 categories. The user can hover over content hidden by the extension to see the keywords generated through NLP and, if they feel it is a false positive, unhide it, as shown in Figure 2 [A]. The placeholder also shows the domain name of the original article to help the user judge whether they want to see it.
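To illustrate the classification step, here is a minimal multinomial Naive Bayes classifier with add-one (Laplace) smoothing, written in TypeScript for consistency with the other sketches. The paper's model runs server-side and its implementation is not described, so everything here, including the hypothetical `trainNaiveBayes` name, the simple tokenizer and the toy training headlines, is illustrative; in the extension such a model would sit behind the API endpoint rather than in the browser.

```typescript
interface LabelledDoc {
  text: string;
  label: string;
}

function tokens(text: string): string[] {
  return text.toLowerCase().match(/[a-z']+/g) ?? [];
}

// Counts documents per class and word occurrences per class, then returns
// a classifier closed over those counts.
function trainNaiveBayes(docs: LabelledDoc[]): (text: string) => string {
  const docsPerClass = new Map<string, number>();
  const wordCounts = new Map<string, Map<string, number>>();
  const wordsPerClass = new Map<string, number>();
  const vocabulary = new Set<string>();

  for (const { text, label } of docs) {
    docsPerClass.set(label, (docsPerClass.get(label) ?? 0) + 1);
    const counts = wordCounts.get(label) ?? new Map<string, number>();
    wordCounts.set(label, counts);
    for (const word of tokens(text)) {
      counts.set(word, (counts.get(word) ?? 0) + 1);
      wordsPerClass.set(label, (wordsPerClass.get(label) ?? 0) + 1);
      vocabulary.add(word);
    }
  }

  // Picks the class maximising
  // log P(c) + sum over tokens w of log((count(w, c) + 1) / (words(c) + |V|)).
  return (text: string): string => {
    let best = "";
    let bestScore = -Infinity;
    for (const [label, nDocs] of docsPerClass) {
      const counts = wordCounts.get(label) ?? new Map<string, number>();
      const total = wordsPerClass.get(label) ?? 0;
      let score = Math.log(nDocs / docs.length);
      for (const word of tokens(text)) {
        if (!vocabulary.has(word)) continue; // unseen words carry no evidence
        score += Math.log(((counts.get(word) ?? 0) + 1) / (total + vocabulary.size));
      }
      if (score > bestScore) {
        bestScore = score;
        best = label;
      }
    }
    return best;
  };
}

// Toy headlines standing in for the Times of India training data.
const classify = trainNaiveBayes([
  { text: "Team wins cricket match", label: "sports" },
  { text: "Cricket captain scores century", label: "sports" },
  { text: "Parliament passes new bill", label: "politics" },
  { text: "Minister announces election date", label: "politics" },
]);

console.log(classify("Cricket match today"));
```

Working in log space keeps long headlines from underflowing to zero, and add-one smoothing stops a single word that never appeared in a category from ruling that category out entirely.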

The extension popup allows users to tune the sensitivity to their liking. The polarity list lets users specify phrases along with a score from -5 to 5, which overrides the default values from AFINN. Further, the blacklist enables users to completely remove topics they do not want to see. The blacklist behaviour depends on the options selected in the menu shown in Figure 2 [E]. Words and phrases in the blacklist are searched for using regex queries and, if detected, are blurred by default, with an option to remove the blur on hover; an example is shown in Figure 2 [D]. For search results, if blurring is disabled, the extension removes any result containing blacklisted keywords entirely. The same behaviour is used to filter profanity and abusive language, for which the profane-words package, containing a large collection of English profanity, is used. The extension also includes a background script that shows a warning popup if any blacklisted keywords are detected on a website other than Google search, as shown in Figure 2 [B]. Through the options panel, the extension can be disabled on specific websites.
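The popup settings could be modelled roughly as follows. The `FilterSettings` shape and all function names are hypothetical; the behaviour follows the paragraph above: user polarity scores override the base lexicon, blacklisted terms are combined into one case-insensitive regex (escaped so user input is matched literally), matches are blurred or removed depending on the blur option, and sites can be excluded. Clamping out-of-range polarity scores to [-5, 5] and matching only whole words are assumptions.

```typescript
// Hypothetical settings shape for the popup.
interface FilterSettings {
  polarity: Record<string, number>; // user phrases with scores from -5 to 5
  blacklist: string[]; // words or phrases the user never wants to see
  blur: boolean; // true: blur matches; false: remove the whole search result
  disabledSites: string[]; // hostnames where the extension stays inactive
}

type Action = "show" | "blur" | "remove";

// User scores override the base lexicon; clamping to [-5, 5] is an assumption.
function mergeLexicon(base: Map<string, number>, overrides: Record<string, number>): Map<string, number> {
  const merged = new Map(base);
  for (const [phrase, score] of Object.entries(overrides)) {
    merged.set(phrase.toLowerCase(), Math.max(-5, Math.min(5, score)));
  }
  return merged;
}

// Escapes regex metacharacters so blacklist entries are matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Combines all entries into one case-insensitive, whole-word regex. No "g"
// flag, so test() keeps no lastIndex state between calls.
function buildBlacklistRegex(terms: string[]): RegExp | null {
  const parts = terms.map((t) => t.trim()).filter((t) => t.length > 0).map(escapeRegExp);
  return parts.length > 0 ? new RegExp(`\\b(?:${parts.join("|")})\\b`, "i") : null;
}

function actionFor(text: string, settings: FilterSettings): Action {
  const re = buildBlacklistRegex(settings.blacklist);
  if (re === null || !re.test(text)) return "show";
  return settings.blur ? "blur" : "remove";
}

// A site is excluded if the hostname equals a listed domain or is a subdomain of it.
function isDisabledOn(hostname: string, settings: FilterSettings): boolean {
  return settings.disabledSites.some((site) => hostname === site || hostname.endsWith("." + site));
}
```

Whole-word matching means "election" does not hide "Elections", so a user would need to list both forms; dropping the `\b` anchors trades that for more accidental matches inside longer words.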

Through Detox Browser, we aim to help improve users' mental health by giving them control over the content they see on the web. In preliminary testing, we found that the extension tends to flag articles quickly. This trigger-happy behaviour can be attributed to the lexical analysis and to the limited amount of text in a search result's title and description from which to judge its content. During development, we tried analysing the full content of each article by loading it in the background; however, this approach was quite resource-intensive and hence was not pursued. Lexical analysis was chosen mainly for its small bundle size and speed.

However, this causes many false positives, as it does not take into account the context-based meaning of a statement.

The categorizer is trained on Indian headlines and hence is region-specific to an extent. The sentiment analysis also works only in English, and the tool does not support local languages.

The tool is pending evaluation, and a proper user study with volunteers with prior trauma would help in adapting the extension to the specific needs of such users. We also believe that our tool could be used by the general public, but we cannot comment on its efficacy yet. Another limitation of our approach is that patterns must be specified for the elements to be analysed and replaced properly. This means that adding support for new websites requires manual effort, and that major changes to the way content is displayed on the target website can break the primary scripts. To account for this, we have put checks in place that warn the user when the patterns no longer return results as expected.
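A check of this kind could be as simple as the sketch below, which assumes that "no longer returning results as expected" means a search page has links but none of them fall inside a recognised result container. The result kinds, the report shape, the function names and the warning text are all hypothetical.

```typescript
// Outcome of categorising every link on a page: a result kind, or null when
// the link did not fall inside any known pattern. Kinds are hypothetical.
type Kind = "result" | "topStory" | "news" | "video";

interface PatternReport {
  linksFound: number;
  matched: number;
  byKind: Record<Kind, number>;
}

function summarise(kinds: Array<Kind | null>): PatternReport {
  const byKind: Record<Kind, number> = { result: 0, topStory: 0, news: 0, video: 0 };
  let matched = 0;
  for (const k of kinds) {
    if (k !== null) {
      byKind[k] += 1;
      matched += 1;
    }
  }
  return { linksFound: kinds.length, matched, byKind };
}

// Returns a warning when links exist but no pattern matched, which suggests
// the site's markup changed and filtering would otherwise silently do nothing.
function patternWarning(report: PatternReport): string | null {
  if (report.linksFound > 0 && report.matched === 0) {
    return "Detox Browser could not recognise the results on this page; filtering may be inactive.";
  }
  return null;
}

console.log(patternWarning(summarise([null, null, null])));
```

Warning only on a complete miss keeps false alarms low on pages that legitimately contain many navigation links; a stricter variant could also flag a sudden drop in one result kind.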

In this paper, we introduced Detox Browser, a Google Chrome extension that filters search results according to the user's preferences. It also shows a popup warning if content on any website is blacklisted by the user. In the future, we plan to extend native script support to social media websites. Adding such direct scripts provides much more flexibility in controlling the content delivered on the page. Taking into account dislikes on videos and image metadata, for more features surrounding online media, would also help improve the tool. Further, since the classifier and NLP model are deployed on the server side, we can keep updating the categorizer and NLP toolset to improve their accuracy. The categorizer could also be made to use more generic classes to support use in multiple regions. Further expansion includes adding a secondary check with context-aware sentiment analysis to reduce false positives without requiring the user to tune the filter's sensitivity. To make the tool more user-friendly, we plan to introduce starter keywords in future versions so that casual users can slide through default sensitivity levels based on personalised word lists created and shared by volunteers in the open beta.

[1] Next generation filtering: Offline filtering enhanced proxy architecture for web content filtering

[2] Evaluating the performance of the most important Lexicons used to Sentiment analysis and opinions Mining

[3] The psychological impact of quarantine and how to reduce it: rapid review of the evidence

[4] Seeking and sharing health information online: comparing search engines and social media

[5] The need for a mental health technology revolution in the COVID-19 pandemic

[6] Using Information and Communication Technologies (ICT) for Mental Health Prevention and Treatment

[7] Incidental news exposure online and its impact on well-being

[8] Media exposure to collective trauma, mental health, and functioning: does it matter what you see?

[9] Times of India News Headlines

[10] Newsweeder: Learning to filter netnews

[11] Self-efficacy and health-related outcomes of collective trauma: A systematic review

[12] From incidental exposure to intentional avoidance: Psychological reactance to political communication during the 2017 German national election campaign

[13] Relation between big five personality traits and Hofstede's cultural dimensions: Samples from the USA and India. Cross Cultural Management

[14] Learning from Incidental Exposure to Political Information in Online Environments

[15] Exposure to coronavirus news on mainstream media: The role of risk perceptions and depression

[16] Evidence synthesis of digital interventions to mitigate the negative impact of the COVID-19 pandemic on public mental health: rapid meta-review

[17] Indians vs. COVID-19: The scenario of mental health

[18] Stigma and discrimination as a barrier to mental health service utilization in India

[19] Overwhelmed by the news: A longitudinal study of prior trauma, posttraumatic stress disorder trajectories, and news watching during the COVID-19 pandemic

[20] Automatic identification of personal insults on social news sites

[21] Explicit words filtering mechanism on web browser for kids

[22] Immediate psychological responses and associated factors during the initial stage of the 2019 coronavirus disease (COVID-19) epidemic among the general population in China

[23] The Relationship Between the Duration of Attention to Pandemic News and Depression During the Outbreak of Coronavirus Disease 2019: The Roles of Risk Perception and Future Time Perspective