title: FedNLP: An Interpretable NLP System to Decode Federal Reserve Communications
authors: Lee, Jean; Youn, Hoyoul Luis; Stevens, Nicholas; Poon, Josiah; Han, Soyeon Caren
date: 2021-06-11
DOI: 10.1145/3404835.3462785

The Federal Reserve System (the Fed) plays a significant role in shaping monetary policy and financial conditions worldwide. Although it is important to analyse the Fed's communications to extract useful information, they are generally long-form and complex owing to the ambiguous and esoteric nature of their content. In this paper, we present FedNLP, an interpretable multi-component Natural Language Processing (NLP) system to decode Federal Reserve communications. The system is designed to let end-users explore how NLP techniques can support a holistic understanding of the Fed's communications with no coding. Behind the scenes, FedNLP runs multiple NLP models for each downstream task, ranging from traditional machine learning algorithms to deep neural architectures. The demonstration shows multiple results at once, including sentiment analysis, a summary of the document, a prediction of the Federal Funds Rate movement, and a visualization that interprets the prediction model's result.

Over the years, the role of the U.S. Federal Reserve System (the Fed) has expanded in response to changes in global monetary and financial conditions. The Fed's decisions have a chain effect on a broad range of economic factors such as inflation, employment, currency value, growth, and loans [10]. It is therefore important to analyse the Fed's communications, which anchor and guide market expectations; however, they are generally long-form and complex due to the ambiguous and esoteric nature of their content [2, 7]. Additionally, the Fed has shown growing interest in research exploring the use of Natural Language Processing (NLP) for macroeconomics [4].
This is aligned with the remarkable progress in NLP, which has seen the emergence of a massive number of model architectures (e.g. Transformers [23]) and pre-trained models (e.g. BERT [6], T5 [16]). Given that Fed supervision involves vast amounts of unstructured data, these improvements in NLP research could assist the Fed's needs. However, there are no pilot studies identifying how NLP components could help end-users analyse Federal Reserve communications. In this paper, we present FedNLP, a multi-component NLP system that aims to decode Federal Reserve communications with no code. The system is designed to support end-users' holistic and intuitive understanding of the Fed's communications through multiple NLP components. We define an end-user as a person who works in a broad range of business sectors such as finance and accounting, often reads economic and financial news, and has low to no programming skills. Our objective is to reduce the gap between advances in NLP technology and the demand for their use by building an NLP system for practical use. Inspired by recent research that combines language tools [22], our system presents multiple NLP components, such as sentiment analysis [14], prediction [6, 26], explanation [18], and summarization [16], in one application. Fig. 1 shows the functional system flow, which consists of the NLP and application modules required to deliver a no-code system to an end-user. For each NLP task, multiple models are built and the final models are selected for the web application. The main contributions of this paper are as follows: • To the best of our knowledge, FedNLP is the first interpretable multi-component NLP system for decoding Federal Reserve communications to assist end-users. • We implement multiple NLP components and associated algorithms, from traditional machine learning models to pre-trained deep neural network models.
• Our demonstration facilitates exploration of the system and shows the results from multiple NLP models at once. • Through the demo, an end-user can easily compare the results of general algorithms with those of financial domain-specific algorithms.

The Federal Reserve (the Fed) controls interest rates, specifically the Federal Funds Rate (FFR), in order to maximize employment and achieve price stability for goods and services in the U.S. [3]. Since the FFR indirectly impacts a very broad range of the economy, it is important to interpret the underlying factors that contribute to it. Historically, when the economy shows signs of weakness, as during the Great Recession or the Covid-19 pandemic, the Fed typically lowers rates. This decision is made by a committee within the Fed, the Federal Open Market Committee (FOMC). As the FOMC has become more transparent in its communications [19] and has expanded its research interests to include the use of NLP [4], it is important to identify how NLP components could help end-users make better-informed decisions. In this research, we focus on FOMC members' communications, such as reports, press releases, and speeches, as these also hold importance and insights for the market [8, 11]. There has been some work on interactive analysis for understanding ML performance. Several systems take a black-box approach that does not rely on the internal workings of a model but lets users examine inputs and outputs. Many are general-purpose and focus on visual inspection of a model's behavior on sample data, including ModelTracker [1], Prospector [12], Manifold [27], and the What-If Tool [25]. For example, the What-If Tool provides rich support for intersectional analysis within a dataset, tests hypothetical outcomes, and focuses on ML fairness.
For linguistic tasks, visualisation has been shown to be a useful tool for understanding deep neural networks, as in LSTMVis [21], Seq2Seq-Vis [20], BertViz [24], exBERT [9], and LIT [22]. Typical solutions include visualizing the internal structure or intermediate states of a model to enhance understanding and interpretation, evaluating and analysing the performance of models or algorithms, and interactively improving models at different development stages, such as feature engineering or hyperparameter tuning, through the integration of domain knowledge. However, these tools have been aimed at developers and lack the ability to handle long or complex industry-specific documents. In this research, we design a visualization system for end-users that provides a holistic view of how NLP techniques analyse and interpret the Fed's communications. We first define an end-user and conduct preliminary focus-group interviews with recruited target end-users to identify system requirements. Through these in-depth interviews, we determine which functional components would be most useful, then design the proposed system and its functional system flow (Fig. 1). Additionally, we collect text data associated with the Fed's communications from over 30 websites, labelled by changes in the target Federal Funds Rate (e.g. lower, maintain, or raise). With this real-world data, we implement eight widely used NLP components and algorithms (Table 1). The sentiment analysis, prediction, explanation, and summarization tasks provide a side-by-side comparison of generic and finance-specific algorithms, posing to end-users the question of whether finance-specific algorithms capture the Fed's communications better than generic ones. Finally, the proposed system is deployed in a live environment to provide a simple, easily accessible experience for end-users (Fig. 2).
In our system, we include sentiment analysis, which shows the attitude or emotion of the writer (e.g. positive, negative, or neutral).

Table 1: Multiple language processing components and algorithms in the proposed FedNLP. "General" denotes general algorithms and "Financial" denotes financial domain-specific algorithms.

Algorithm         | Description                                                                  | General | Financial
TextBlob [13]     | Returns polarity and subjectivity using TextBlob for general settings.       |    v    |
LM Sentiment [14] | Returns polarity and subjectivity using LM sentiment for financial settings. |         |     v
LDA [17]          | Visualizes term clusters and topics in HTML using an LDA model.              |    v    |
XGBoost [5]       | Displays ML model predictions with an explanation component.                 |    v    |
FinBERT [26]      | Displays model predictions using a fine-tuned FinBERT.                       |         |     v
LIME [18]         | Visualizes the top 10 highly contributing features and highlights sentences. |    v    |     v
TextRank [15]     | Displays an extractive summarization using a graph-based ranking model.      |    v    |
T5 [16]           | Displays an abstractive summarization using a fine-tuned T5.                 |         |     v
Decoupled APIs    | Shows multiple components on one webpage that works with new input data.     |    v    |     v

For the generic representation, we apply the TextBlob algorithm [13], which trains on the NLTK corpus with a naive Bayes classifier. For the financial representation, we adopt the lexicon-based method for economic and financial documents constructed by Loughran and McDonald (LM sentiment [14]). LM sentiment consists of financial word dictionaries drawn from corporate 10-K/Q filings and earnings calls. We focus on predicting the direction of changes to the FFR by training NLP models on textual representations of the FOMC's decisions. We conduct extensive experiments covering traditional Machine Learning (ML), Neural Network (NN), and pre-trained models, comparing the following: SVM, Linear SVC, Logistic Regression, Random Forest, XGBoost, CNN, a fine-tuned BERT, and a fine-tuned FinBERT.
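The lexicon-based approach described above can be illustrated with a minimal stdlib sketch: count hits against positive and negative word lists and report a polarity score. The word lists below are illustrative placeholders, NOT the actual Loughran-McDonald dictionary, and the scoring formula is a common convention rather than the one used in our system.

```python
import re

# Illustrative word lists; the real LM dictionary contains thousands of terms.
POSITIVE = {"growth", "improve", "strong", "gains", "stability"}
NEGATIVE = {"weakness", "decline", "risk", "uncertainty", "recession"}

def lexicon_polarity(text: str) -> float:
    """Polarity in [-1, 1]: (pos - neg) / (pos + neg) over lexicon hits."""
    tokens = re.findall(r"[a-z]+", text.lower())
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total

print(lexicon_polarity("Signs of weakness and rising uncertainty pose risk."))
```

A sentence with no lexicon hits scores 0.0 (neutral), which is one reason domain-specific lexicons matter: generic lists miss much of the Fed's vocabulary.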
Among all ML experiments, XGBoost with TF-IDF features achieves the highest test accuracy of 0.73 and a weighted-average F1 score of 0.66 while detecting all three classes. Among the neural network baselines, even the best CNN and BERT-base settings overfit and perform relatively poorly. A fine-tuned FinBERT achieves a test accuracy of 0.72 and a weighted-average F1 score of 0.65; however, it detects only the maintain class, which indicates overfitting. Based on these evaluation results, our system implements XGBoost with TF-IDF features and a fine-tuned FinBERT. Due to space limits, we do not include the detailed results in this paper. One question from end-users was how to trust a prediction model and its results. The strength of text-based prediction models is that they can provide explanations that people find easy to understand. In our research, we implement Local Interpretable Model-agnostic Explanations (LIME [18]), which provides a visualisation by using the classifier's output to fit a linear surrogate model. The visualisation shows that the highly contributing words differ across prediction models even when the prediction results are the same. A fine-tuned FinBERT captures more financial and economic jargon than any other model but is computationally expensive. Because of this computation cost, we chose XGBoost with LIME as the explanation module. For deployment, we further optimise delivery of the explanation by isolating the result from the visualisation library (D3). The choice of XGBoost and this delivery optimisation reduced the explanation module's response time from 10 minutes to 30 seconds. The summarization module provides the key information of a document, giving users a quick way to decide whether to read the full content. Automatic summarization is directly beneficial because the Fed's documents are often long and complex.
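The perturbation idea behind LIME can be sketched in a simplified form: measure how a classifier's score changes when each word is removed, and rank words by that change. Real LIME fits a local linear surrogate over many random perturbations; this leave-one-out version only illustrates the intuition. The `score` function below is a hypothetical stand-in for a model's probability of the "raise" class, not our trained classifier.

```python
def score(text: str) -> float:
    """Hypothetical classifier: probability of a rate rise grows with
    mentions of inflation-related terms (illustration only)."""
    words = text.lower().split()
    hits = sum(w in {"inflation", "pressures", "tightening"} for w in words)
    return min(1.0, 0.2 + 0.25 * hits)

def word_importances(text: str, top_k: int = 10):
    """Rank words by the score drop caused by removing each one."""
    words = text.split()
    base = score(text)
    importances = []
    for i, w in enumerate(words):
        perturbed = " ".join(words[:i] + words[i + 1:])
        importances.append((w, base - score(perturbed)))
    return sorted(importances, key=lambda p: abs(p[1]), reverse=True)[:top_k]

print(word_importances("rising inflation pressures warrant tightening", top_k=3))
```

In the demo, the analogous top-10 list from LIME is rendered as highlighted sentences, so an end-user sees which phrases pushed the model toward lower, maintain, or raise.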
Automatic summarization has been studied for decades and falls into two categories: extractive and abstractive. Extractive summarization aims to capture the most information with the least redundancy, whereas abstractive summarization generates new sentences that encapsulate the gist of the input document. In this research, we use TextRank [15] for extractive and a fine-tuned T5 [16] for abstractive summarization. To provide a simple no-code experience, all components are delivered to end-users through a familiar interface: a web application (Fig. 1). The GUI is a lightweight web-based Angular application that provides a simple, intuitive interface for end-users to explore documents and the related predictions. Following Angular design principles, the application architecture consists of components (e.g. document listings and graphs) and page components for each area. Additionally, six data services modularise access to a single REST API and the NLP API. These services make HTTP requests to an API endpoint and parse the response into JavaScript objects for use within components. The GUI is hosted on AWS, published to an S3 bucket, and deployed as a static website using AWS CloudFront for distribution and AWS Route 53 for domain resolution. The REST API delivers static content in JSON format to the GUI. It consists of Node.js microservices deployed to AWS Lambda in a serverless configuration. Each microservice is a simple data-retrieval script that 1) determines which object(s) to retrieve based on path parameters, 2) calls the AWS DynamoDB database containing the static content, and 3) formats and emits the response as JSON. To improve application performance and data-load time, the document data is split across two endpoints: documents contains lightweight data used in lists and section pages, while document-extensions returns the full document content used only in the document view.
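The graph-based ranking idea behind the extractive component discussed above can be sketched with the stdlib: build a sentence-similarity graph from word overlap, run a few PageRank-style iterations, and return the top-ranked sentences. This is a toy stand-in for full TextRank (no stemming, stopword removal, or the log-length normalisation of the original paper).

```python
import re

def summarize(text: str, k: int = 1, damping: float = 0.85, iters: int = 30):
    """Return the k highest-ranked sentences, in document order."""
    sents = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]
    words = [set(re.findall(r"[a-z]+", s.lower())) for s in sents]
    n = len(sents)
    # Similarity: word overlap normalised by union size (Jaccard-like).
    sim = [[len(words[i] & words[j]) / (1 + len(words[i] | words[j]))
            if i != j else 0.0 for j in range(n)] for i in range(n)]
    # Power iteration over the similarity graph (PageRank-style update).
    rank = [1.0 / n] * n
    for _ in range(iters):
        rank = [(1 - damping) / n + damping *
                sum(sim[j][i] * rank[j] / (sum(sim[j]) or 1.0) for j in range(n))
                for i in range(n)]
    top = sorted(range(n), key=lambda i: rank[i], reverse=True)[:k]
    return [sents[i] for i in sorted(top)]
```

A sentence that shares vocabulary with many other sentences accumulates rank, so the selected summary tends to be the most "central" content of a long FOMC document.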
A simple Node.js script extracts and splits document and document-extension content from the NLP pipeline's document data, and a dedicated DynamoDB load script uploads this data into the corresponding tables. Category, domain, and author content were manually sourced, then transformed and uploaded with the same DynamoDB load script. The NLP API is an endpoint for text analysis (word cloud, sentiment analysis, and topic modelling) and NLP tasks (prediction, explanation, and summarization). The API itself is a simple Flask web server running inside a Docker container on an AWS EC2 server. The Docker environment initialises with all required libraries, including PyTorch and TensorFlow, as well as the Flask and network configuration. On initialisation, the trained and packaged models from the NLP pipeline are downloaded from an AWS S3 bucket onto the server. The Flask API exposes one route for each text analysis or NLP task and handles routing, input parsing, execution of the underlying code, and formatting and emitting the response as JSON. We propose the FedNLP system, which is designed to let end-users explore various NLP analyses and tasks to assist with decoding Federal Reserve communications. To the best of our knowledge, our system is the first of its kind to present the use of NLP in analysing many forms of the Fed's documents, including post-meeting minutes, members' speeches, and transcripts. The system enables end-users to experiment with custom input through an interactive demo that presents multiple NLP analysis results automatically through decoupled, productionized APIs. In practical use, FedNLP can serve as a supplementary system that provides text-analysis indicators from Federal Reserve communications and supports further empirical studies.
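The document split described above (heavy full text separated from lightweight listing data so list pages load quickly) can be sketched as a pure function; the production script is Node.js, and the field names here are illustrative, not the actual schema.

```python
def split_document(doc: dict) -> tuple[dict, dict]:
    """Split one document record into a lightweight listing payload
    (for the `documents` endpoint) and a heavy extension payload
    (for the `document-extensions` endpoint)."""
    heavy_fields = {"content", "summary_full", "explanation_html"}  # assumed names
    listing = {k: v for k, v in doc.items() if k not in heavy_fields}
    extension = {"id": doc["id"], **{k: doc[k] for k in heavy_fields if k in doc}}
    return listing, extension

doc = {"id": "fomc-2020-03", "title": "FOMC statement", "date": "2020-03-15",
       "content": "<long full text>"}
listing, extension = split_document(doc)
```

The listing payload keeps only what a list row needs (id, title, date), while the extension is fetched on demand when the user opens the document view.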
References

[1] ModelTracker: Redesigning performance analysis tools for machine learning
[2] Central bank communication and monetary policy: A survey of theory and evidence
[3] What economic goals does the Federal Reserve seek to achieve through its monetary policy?
[4] Board of Governors of the Federal Reserve System. 2020. Workshop on the Use of Natural Language Processing in Supervision. Retrieved October
[5] XGBoost: A scalable tree boosting system
[6] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
[7] Decoding central bankers' language
[8] Do Federal Reserve communications help predict federal funds target rate decisions?
[9] exBERT: A Visual Analysis Tool to Explore Learned Representations in Transformer Models
[10] Impact of Federal Funds Rate on Monthly Stocks Return of United States of America
[11] Have minutes helped to predict fed funds rate changes?
[12] Interacting with predictions: Visual inspection of black-box machine learning models
[13] TextBlob: Simplified Text Processing
[14] When is a liability not a liability? Textual analysis, dictionaries, and 10-Ks
[15] TextRank: Bringing order into text
[16] Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
[17] Software framework for topic modelling with large corpora
[18] "Why Should I Trust You?": Explaining the Predictions of Any Classifier
[19] How the Fed Moves Markets: Central Bank Analysis for the Modern Era
[20] Seq2Seq-Vis: A visual debugging tool for sequence-to-sequence models
[21] LSTMVis: A tool for visual analysis of hidden state dynamics in recurrent neural networks
[22] The Language Interpretability Tool: Extensible, Interactive Visualizations and Analysis for NLP Models
[23] Attention is All you Need
[24] Analyzing the Structure of Attention in a Transformer Language Model
[25] The What-If Tool: Interactive probing of machine learning models
[26] FinBERT: A pretrained language model for financial communications
[27] Manifold: A model-agnostic framework for interpretation and diagnosis of machine learning models

Acknowledgments: We thank Gautam Radhakrishnan Ajit, Manisha Gupta, Renil Austin Mendez, and Lavanshu Agrawal for helping us collect data and formulate the problem. We would also like to thank the anonymous reviewers and the NLP group at the University of Sydney for their valuable comments.