Tracking environmental policy changes in the Brazilian Federal Official Gazette

Flávio Nakasato Cação, Anna Helena Reali Costa, Natalie Unterstell, Liuca Yonaha, Taciana Stec, Fábio Ishisaki
February 11, 2022

Even though most of its energy generation comes from renewable sources, Brazil is one of the largest emitters of greenhouse gases in the world, due to intense farming and the deforestation of biomes such as the Amazon Rainforest, whose preservation is essential for compliance with the Paris Agreement. Still, regardless of lobbies or prevailing political orientation, all government legal actions are published daily in the Brazilian Federal Official Gazette (BFOG, or "Diário Oficial da União" in Portuguese). However, with hundreds of decrees issued every day by the authorities, it is prohibitively burdensome to manually analyze all these processes and find out which ones can pose serious environmental hazards. In this paper, we present a strategy that combines automated techniques and domain-expert knowledge to process all the data from the BFOG. We also provide the Government Actions Tracker, a highly curated dataset, in Portuguese, annotated by domain experts, of federal government acts concerning Brazilian environmental policies. Finally, we built and compared four different NLP models on the classification task posed by this dataset. Our best model achieved an F1-score of $0.714 \pm 0.031$. In the future, this system should serve to scale up high-quality tracking of all official documents with a minimum of human supervision and contribute to increasing society's awareness of government actions.

Brazil holds some of the largest biodiversity reserves in the world, such as the Amazon Rainforest, the Cerrado, and the Atlantic Forest. The preservation of these biomes is essential for the country to be able to fulfill the objectives of the Paris Agreement [12], since 78% of greenhouse gas emissions in Brazil come from land use and cover change [20]. In 2020, while global emissions fell as a result of the coronavirus pandemic, in Brazil they grew substantially, driven by deforestation and farming [17]; the Amazon Rainforest deforestation rate was the greatest of the decade [15]. At the same time, the country is an agribusiness powerhouse, with 26.6% of its GDP related to the sector [3]. Despite this complex and dynamic environment, governments at all hierarchical levels of the nation are required to record all their legal actions in the Official Gazette. Thus, systematic scrutiny of these documents makes it possible to infer potentially harmful shifts in environmental policy. This makes tracking government acts a powerful tool to alert journalists and empower civil society with qualified and clear information [13]. However, this is an arduous task for manual work alone [6]: hundreds of highly technical documents are issued every day by the Legislative and Executive branches at the federal, state, and municipal levels. According to the Brazilian Institute of Geography and Statistics (IBGE), Brazil has 5,570 municipalities and 26 states in addition to its Federal District, and typically each of these federative entities has its own Official Gazette.
This scenario therefore represents a still underexplored opportunity for the most recent pre-trained language models, particularly for the Brazilian Portuguese language. Pre-trained language models, such as BERT [4] and T5 [11], have become popular in recent years and have set new quality standards in virtually all natural language processing (NLP) tasks, such as classification, translation, and question answering. They have millions or billions of parameters and are built upon the Transformer architecture [19], which leverages self-attention mechanisms and eliminates the need for recurrent neural networks. This allows models to be trained in parallel, in a self-supervised way, over huge corpora, such as the entire Wikipedia. The knowledge accumulated in their parameters can then be transferred efficiently to general language tasks after light fine-tuning on smaller domain-specific datasets. The results obtained in this way usually outperform those of models trained solely on the smaller dataset alone [11].

The work in [18] draws attention to existing opportunities for collaboration between the NLP community and social scientists in studies related to climate change, such as tracing political discourses, topic modeling, and extracting insights when there is no "Big Data", which is often a prerequisite in machine learning. ClimateQA is one of the most recent initiatives to tackle public and open data with advanced NLP models [10]: it consists of a RoBERTa model [9] that can be adapted to any company's reporting database to answer questions about sustainability by consulting the company's unstructured files. Other works, such as [8], address the lack of accountability of politicians by providing a topic modeling system that aggregates policy makers' speeches from multiple data sources, such as Twitter and Facebook. Such systems have the potential to help the public more clearly discern the opinions currently in vogue in public debate. In the Portuguese language, the scarcity of resources, whether datasets or models, is even more striking. Among the initiatives, DEBACER is an algorithm developed to automatically segment blocks of speeches by politicians recorded in the minutes of the Portuguese Parliament [5]. On the environmental side, Pirá is the first Portuguese-English bilingual question answering (QA) dataset on the Brazilian coast and the oceans in general; it was crowdsourced and contains 2,261 question-answer pairs on these subjects [1]. In the same direction, DEEPAGÉ is a QA system dedicated to answering questions about the Brazilian environment in Portuguese. It runs over news and Wikipedia articles on the subject and was fine-tuned on QA pairs filtered and translated from a massive open-domain QA dataset, given the lack of Portuguese QA datasets on the topic [2].

To fill the gap left by approaches like these for the Portuguese language, and to address important federal government acts, in this work we collected thousands of historical documents from the Brazilian Federal Official Gazette (BFOG), known in Brazil as "Diário Oficial da União" (DOU), and compared multiple NLP models on the task of classifying changes in environmental policies. To do so, we first built a rule-based robot to scrape all the official documents from the BFOG and to filter and pre-classify them based on keywords defined by domain experts.
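The actual rule set is curated by the domain experts and is not published in the paper; the following is only a minimal sketch of how such keyword-based pre-classification could work. The theme names are real, but the regular expressions, exclusion rules, and function names are illustrative assumptions:

```python
import re

# Illustrative rules only; the real system uses expert-curated inclusion
# and exclusion expressions per theme, refined over the years.
THEME_RULES = {
    "Climate Change": {
        "include": [r"\bmudanças?\s+climáticas?\b", r"\bemiss(ão|ões)\b"],
        "exclude": [],
    },
    "Amazon Region": {
        "include": [r"\bamaz[ôo]nia\b", r"\bfloresta\s+amaz[ôo]nica\b"],
        # Hypothetical exclusion to filter out unrelated matches.
        "exclude": [r"\bamazonas\s+energia\b"],
    },
}

def pre_classify(text: str) -> list[str]:
    """Return every theme whose inclusion rules match the document
    and whose exclusion rules do not."""
    lowered = text.lower()
    themes = []
    for theme, rules in THEME_RULES.items():
        included = any(re.search(p, lowered) for p in rules["include"])
        excluded = any(re.search(p, lowered) for p in rules["exclude"])
        if included and not excluded:
            themes.append(theme)
    return themes
```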
Then, the same domain experts reviewed and enriched a share of this initial data specifically related to environmental issues. Finally, this curated dataset was split and used to train and compare multiple NLP models on the classification task of federal government acts. We tested four models: from a traditional Naive Bayes and BiLSTMs to two state-of-the-art techniques based on BERT [4], a bidirectional Transformer encoder architecture. To summarize, our main contributions are:

- A new approach to collecting and classifying Brazilian Federal Official Gazette data that takes advantage of automatic pre-classification techniques and knowledge from domain experts;
- The Government Actions Tracker (GAT), an ever-growing, highly curated dataset, in Portuguese, of federal government acts related to the main Brazilian environmental policies; the dataset is made available at https://www.politicaporinteiro.org/monitor-de-atos-publicos/;
- A comparison among multiple NLP models, from LSTMs to BERT, designed to classify the aforementioned acts;
- A BERT model fine-tuned on a Masked Language Model (MLM) task over a corpus of 500k raw documents (not included in GAT) from the BFOG, made available at https://huggingface.co/flavio-nakasato/berdou_500k.

It is noteworthy that the system formed only by the rule-based robot followed by a layer of human supervision already feeds one of the largest newspapers in the country with daily monitoring of acts by the Brazilian government that may have negative consequences for the preservation of the country's native forests and wildlife (the Environmental Policy Monitor: https://arte.folha.uol.com.br/ambiente/monitor-politica-ambiental). Thanks to this, it was possible to identify massive repeals of protection laws pushed by the Federal Government in 2020, with a potential increase in deforestation (as reported at https://www1.folha.uol.com.br/ambiente/2020/07/governo-acelerou-canetadas-sobre-meio-ambiente-durante-a-pandemia.shtml). However, since this process requires the evaluation of human experts, a more effective classification system, such as those presented in this work, could eliminate the need for human supervision in the vast majority of cases, allowing the experts' efforts to be redirected to new challenges and dramatically scaling the system's tracking capability.

In the next sections, we cover the construction strategy of the datasets, highlighting the domain experts' role in this effort and how the rule-based robot performs the initial classifications. We then present GAT, the dataset used for training our models, as well as their main settings. Finally, we discuss our results and present future work perspectives.

Every morning, a robot scrapes all documents published in the Federal Official Gazette (the official documents of the federal government, originally in PDF, are also published in a machine-readable format, XML; these are the files processed by the robot) and pre-classifies them under "Themes", based on rules defined by domain experts and refined over the years. These rules rely mainly on keywords and more complex expressions to include or exclude a document from a given theme. So far, there are 22 possible themes, such as Climate Change, Amazon Region, and Environmental Disasters, and this pre-tagging helps the robot and the domain experts make an initial filter of the most pertinent documents. All official document data is transformed and loaded into a database. The most relevant documents filtered by the robot are also sent to a separate file, where, every day, two specialists jointly review them and annotate Action, Circumstance, and Classification fields for each record, along with some additional metadata. An Action refers to the legal action defined by the document, while a Circumstance usually carries more details about the action taken.
Both are concatenated into a new variable, Context, created to feed the models (a preprocessing sketch is given after the class list below). For the most part, the two strings are simply extracted from the original document with minimal adjustments; we expect this to make it more natural to adapt and scale our models to the other, newer and larger, collections of official documents mentioned in the previous section, which lack human annotators and reviewers. The process is illustrated in Figure 1, whose caption reads, in part: "This filtered and verified subset of data is loaded into the Government Actions Tracker database (step e), which is used to train the NLP models. These models receive a Context variable for each document and are trained/fine-tuned to classify it as a Regulation/Deregulation/Neutral action (step g). The system over the gray area represents Política Por Inteiro's system currently deployed in production."

Regarding the Classification field, domain experts defined 12 classes, described below:

- Regulation: Action that seeks to institute a rule or norm by the public administration, giving guidelines and producing guidance for economic agents;
- Deregulation: Action that seeks to revoke and/or reverse a previously established regulation, or to change its understanding or orientation;
- Institutional reform: Change in the structure, competences, and institutional arrangement related to public policy;
- Response: Action that aims to respond to a significant external event, such as a natural disaster or a major accident;
- Flexibilization: Alteration, temporary or not, of deadlines or conditions for compliance with environmental rules, norms, and legislation;
- Neutral: Action with no significant impact when considered in isolation, but whose cataloging was assessed as necessary because it addresses topics on relevant agendas or with indications of becoming relevant in the medium and long term;
- Retreat: Action that seeks to revoke, replace, or modify previously established regulations due to political or popular pressure;
- Law consolidation: Result of a regulatory review, with no impact on content;
- Revocation: Batch revisions or acts associated with the full revision process;
- Privatization: Action that seeks the alienation of business rights under the competence of the Union; the transfer to the private sector of the execution of public services operated by the Union; or the transfer or grant of rights over movable and immovable property of the Union;
- Legislation: Action that seeks to enact a new law before society, giving guidelines and providing guidance for economic agents;
- Planning: Action that does not institute regulatory processes per se, but discloses guiding documents and strategies, such as management plans, the creation of committees and working groups, and the approval of programs and policies that have not yet been defined, among others.

Misclassifications found by the annotators are also used regularly to refine the rule-based robot, in a process of continuous feedback adjustment via active learning.
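As the sketch referenced above, here is a minimal illustration of how the Context field could be assembled; it assumes the annotated records live in a pandas DataFrame, and the column names and placeholder strings are hypothetical:

```python
import pandas as pd

# Hypothetical column names for the annotated GAT records.
records = pd.DataFrame({
    "action": ["Revoga a Portaria ..."],
    "circumstance": ["A norma revogada estabelecia ..."],
    "classification": ["Deregulation"],
})

# Context = Action + Circumstance: the single text input fed to the models.
records["context"] = (records["action"].str.strip() + " "
                      + records["circumstance"].str.strip())
```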
In the curation process of the GAT dataset, specialists regularly double-checked the original BFOG documents themselves, which substantially improved the rule-based robot's recall over time and minimized the chance of losing relevant material. After the human supervision stage, the verified and enriched data are sent to a separate database, the Government Actions Tracker (GAT) database. Table 1 shows the Theme, Action, and Circumstance columns and the corresponding Classification for three example instances from the GAT database (kept in the original language).

The version of the GAT dataset used in this work has 1,181 instances and no missing data in the Theme or Classification variables. Figure 2 shows their distributions in the dataset. The Action feature has 29.1 ± 19.6 words on average; the Circumstance feature, 70.0 ± 54.0 words. The first record dates from January 1, 2019; the last, from July 12, 2021. Due to the small size of the GAT database, training with the original 12 classes proved very unstable, since the objective was to predict the Classification of a document given only its Context. Thus, we regrouped the previous classes of the Classification variable into three major classes, as follows:

- Regulation: Regulation, Planning, and Response;
- Neutral: Neutral, Retreat, and Legislation;
- Deregulation: Privatization, Deregulation, Flexibilization, Institutional reform, Law consolidation, and Revocation.

The dataset thus ended up with the following class proportions: Regulation (52.0%), Neutral (18.5%), and Deregulation (29.5%).

To discover the most suitable model for classifying the acts in the BFOG, we built and compared four models. We started with two simpler models, based on Naive Bayes [21] and bidirectional LSTMs (BiLSTMs) [14]. We also fine-tuned a BERTimbau Base model [16] (which we call "BERT1") directly on the GAT classification task. Finally, our fourth model ("BERT2") was also based on BERTimbau Base, but in this case we performed two sequential fine-tunings: first on 500k documents from the entire (unprocessed) BFOG database since 2002 (the blue database in Figure 1), and then on the GAT database to classify the government acts. All models were trained under stratified cross-validation with k = 10 folds. We also tried different data augmentation strategies with sequence-to-sequence models, such as T5 [11] and Pegasus [22]; however, as none of them outperformed the models without data augmentation, we left them out of the scope of this work. We describe our four models in more detail below.

Simpler models: Naive Bayes and BiLSTM
As our first models, we trained a Multinomial Naive Bayes (NB) model and a bidirectional LSTM (BiLSTM) network with 32 LSTM units, a batch size of 32, and a maximum sequence length of 50, for 4 epochs; a grid search was performed to select these hyperparameters. The network weights were initialized from a previously trained 300-dimensional CBOW embedding matrix [7].

BERT1
We fine-tuned BERTimbau Base, a pretrained BERT model for Portuguese with 12 layers and 110 million parameters, on the GAT classification task. After multiple inspections, we trained the model for 15 epochs, with a maximum sequence length of 512, a learning rate of 5e-5, a batch size of 128, and the Adam optimizer with weight decay (AdamW).
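A minimal sketch of such a fine-tuning setup is shown below. It assumes the Hugging Face transformers and datasets libraries and the public BERTimbau Base checkpoint (neuralmind/bert-base-portuguese-cased); the two inline records are placeholders standing in for the GAT Context/Classification fields, and the weight decay value is an illustrative choice (the paper only states that Adam with weight decay was used):

```python
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification,
                          AutoTokenizer, Trainer, TrainingArguments)

CHECKPOINT = "neuralmind/bert-base-portuguese-cased"  # public BERTimbau Base
LABELS = ["Regulation", "Neutral", "Deregulation"]

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForSequenceClassification.from_pretrained(
    CHECKPOINT, num_labels=len(LABELS))

# Placeholder records; in practice these come from the GAT database.
train_ds = Dataset.from_dict({
    "context": ["Institui o Programa ...", "Revoga a Portaria ..."],
    "labels": [0, 2],  # indices into LABELS
}).map(lambda b: tokenizer(b["context"], truncation=True, max_length=512,
                           padding="max_length"), batched=True)

# Hyperparameters reported in the paper: 15 epochs, lr 5e-5, batch size 128;
# the Trainer's default optimizer is AdamW (Adam with weight decay).
args = TrainingArguments(output_dir="bert1", num_train_epochs=15,
                         learning_rate=5e-5, per_device_train_batch_size=128,
                         weight_decay=0.01)  # value illustrative

Trainer(model=model, args=args, train_dataset=train_ds).train()
```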
BERT2
As Table 1 illustrates, sentences in the BFOG have a very characteristic writing style, with many technical terms and legal jargon. Hence, instead of fine-tuning BERTimbau Base directly on the GAT classification task, we first trained it on a Masked Language Model (MLM) task over a dataset of 500k BFOG documents published since 2002, scraped from the official websites. We used only the body of the documents (the "ementa detalhada"), with minimal text processing, basically the removal of special characters. The final file comprised 1.5 GB of plain text, and we trained the MLM model for 10 epochs. Finally, the model was fine-tuned on the GAT classification task, as in the previous case and with the same hyperparameters as BERT1, since both models responded similarly in our tests.
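A minimal sketch of this domain-adaptive MLM step, under the same library and checkpoint assumptions as the previous example; the input file name is illustrative, and the file is assumed to hold one pre-processed document body per line:

```python
from datasets import load_dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

CHECKPOINT = "neuralmind/bert-base-portuguese-cased"  # BERTimbau Base

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForMaskedLM.from_pretrained(CHECKPOINT)

# bfog_500k.txt: illustrative name for the 1.5 GB plain-text corpus.
ds = load_dataset("text", data_files="bfog_500k.txt")["train"]
ds = ds.map(lambda b: tokenizer(b["text"], truncation=True, max_length=512),
            batched=True, remove_columns=["text"])

# Randomly masks 15% of the tokens: the standard BERT MLM objective.
collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="berdou_500k", num_train_epochs=10),
    train_dataset=ds,
    data_collator=collator,
)
trainer.train()
# The resulting checkpoint is then fine-tuned on GAT exactly as BERT1.
```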
Table 2 summarizes the results obtained for each model. The pretrained models were superior to the baselines in all metrics, and BERT2 performed slightly above BERT1, although the difference is not statistically significant. Among the baseline models, even though the LSTM network was initialized with a pretrained weight matrix and its hyperparameters were carefully chosen, its results were in general still inferior to those of NB, which reinforces the idea that, without pretrained models based on the Transformer architecture, recurrent networks can be quite limited in offering a substantial improvement over simpler models. Considering the proportions of each class in each experiment, especially the two BERT-based models, BERT1 and BERT2, are promising, despite the challenges of dealing with a database that is small by the current standards of state-of-the-art pretrained models, and despite the imbalanced classes.

In this work, we presented a strategy to leverage automated techniques and expert knowledge to track and classify potentially harmful changes in environmental policies directly from texts in the Brazilian Federal Official Gazette. In addition to this strategy, we contributed the Government Actions Tracker, a new, challenging, curated dataset, in Portuguese, of federal government acts related to Brazilian environmental policies. We also designed and compared four different NLP models on the classification task posed by this dataset, from simpler models to state-of-the-art ones. Monitoring each act published by the government in order to inform civil society is an extremely challenging task, and there is still no single practical solution for it. While a rule-based system working jointly with domain experts already delivers immeasurable value for policy monitoring, it is also paramount that the latest NLP technologies be considered to increase the scalability and performance of these systems. Hence, future work includes expanding the annotated datasets, as well as improving the best models presented here so that they can maintain the quality and stability of the classification over a greater number of classes.

This work is the result of an academic partnership between the Escola Politécnica of Universidade de São Paulo (EP-USP) and Política Por Inteiro. Without the data curation efforts of Política Por Inteiro's experts, building these NLP models would not have been possible. This work was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES, Finance Code 001), by Itaú Unibanco S.A. through the Programa de Bolsas Itaú (PBI) of the Centro de Ciência de Dados (C2D) of EP-USP, and by the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq, grant 310085/2020-9). We also thank the Center for Artificial Intelligence (C4AI-USP), supported by the Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP, grant 2019/07665-4) and by the IBM Corporation. The data, views, and opinions expressed in this article are those of the authors and do not necessarily reflect the official policy or position of the funders.

References
[1] Pirá: A Bilingual Portuguese-English Dataset for Question-Answering about the Ocean.
[2] DEEPAGÉ: Answering Questions in Portuguese about the Brazilian Environment.
[4] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.
[5] DEBACER: A Method for Slicing Moderated Debates.
[6] Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts.
[7] Portuguese Word Embeddings: Evaluating on Word Analogies and Natural Language Tasks.
[8] From Talk to Action with Accountability: Monitoring the Public Discussion of Policy Makers with Deep Neural Networks and Topic Modelling.
[9] RoBERTa: A Robustly Optimized BERT Pretraining Approach.
[10] Analyzing Sustainability Reports Using Natural Language Processing.
[11] Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer.
[12] The Threat of Political Bargaining to Climate Mitigation in Brazil.
[14] Bidirectional Recurrent Neural Networks.
[15] The Brazilian Amazon Deforestation Rate in 2020 Is the Greatest of the Decade.
[16] BERTimbau: Pretrained BERT Models for Brazilian Portuguese.
[17] Deforestation Boosts Brazil Greenhouse Gas Emissions as Global Emissions Fall (Reuters).
[18] The Climate Change Debate and Natural Language Processing.
[19] Attention Is All You Need.
[20] Climatic Benefits From the 2006-2017 Avoided Deforestation in Amazonian Brazil.
[21] Exploring Conditions for the Optimality of Naïve Bayes.
[22] PEGASUS: Pre-training with Extracted Gap-Sentences for Abstractive Summarization.