Submitted 19 August 2020. Accepted 28 November 2020. Published 28 January 2021.
Corresponding authors: Hrituja Khatavkar (hrituja.khatavkar@sitpune.edu.in), Ketan Kotecha (head@scaai.siu.edu.in). Academic editor: Rajanikanth Aluvalu.
DOI 10.7717/peerj-cs.340. Copyright 2021 Gite et al. Distributed under Creative Commons CC-BY 4.0.

Explainable stock prices prediction from financial news articles using sentiment analysis

Shilpa Gite1, Hrituja Khatavkar1, Ketan Kotecha2, Shilpi Srivastava1, Priyam Maheshwari1 and Neerav Pandey1
1 Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, Maharashtra, India
2 Symbiosis Center for Applied Artificial Intelligence (SCAAI), Symbiosis International (Deemed University), Pune, Maharashtra, India

ABSTRACT
The stock market is complex and volatile. It is impacted by the positive and negative sentiments that accompany media releases. The scope of stock price analysis relies on the ability to recognise stock movements, which in turn rests on technical fundamentals and an understanding of the hidden trends the market follows. Stock price prediction has consistently been an extremely dynamic field of exploration and research. However, arriving at the ideal degree of precision remains an enticing challenge. In this paper, we propose combining efficient machine learning techniques with a deep learning technique, Long Short-Term Memory (LSTM), to predict stock prices with a high level of accuracy. The sentiments that users derive from news headlines have a tremendous effect on the buying and selling patterns of traders, who are easily influenced by what they read. Hence, fusing this additional dimension of sentiment with technical analysis should improve prediction accuracy. LSTM networks have proved to be a very useful tool for learning and predicting temporal data with long-term dependencies. In our work, the LSTM model uses historical stock data along with sentiments from news items to create a better predictive model.

Subjects: Artificial Intelligence, Data Mining and Machine Learning
Keywords: Long Short-Term Memory (LSTM), Explainable AI (XAI), Stock price prediction, Deep learning

INTRODUCTION
Stock market investment and trading can be tricky and stressful, but rewarding if predicted correctly. The market has been an object of study for many decades, and prediction remains a complex task because of the large number of parameters, disordered information, and dynamism involved. Several technical indicators and sources of information affect stock prices, but the substantial amount of data present makes the prices difficult to predict. However, with advances in technology, particularly in processing large chunks of temporal data, the field is continuously improving toward better prediction accuracy. A famous hypothesis in finance, the Efficient Market Hypothesis, states that asset prices cannot depend entirely on obsolete information and that market prices react to new information, for example financial news articles and social media blogs (Ţiţan, 2015).
These sources change the sentiments of investors and traders. With the advancement of Artificial Intelligence, information from financial time series, which captures sentiment, can be fused with other technical analysis data to predict stock prices. In this paper, we suggest a technique that combines LSTM with the interpretability power of Explainable AI (XAI), whose visual representations provide a definite outline that helps users anticipate their future stock prices. The data we use come from the National Stock Exchange (NSE) and from news headlines aggregated by Pulse (Pulse by Zerodha, 2020). Pulse has aggregated 210,000+ Indian finance news headlines from news websites such as Business Standard, The Hindu Business, Reuters, and many others.

STATE-OF-THE-ART TECHNIQUES
Cho et al. (2014) proved that the Recurrent Neural Network (RNN) is a powerful model for processing context information from textual data. To tackle long-term dependencies, however, the LSTM variant of the RNN has proved very effective at complex text-processing and language-modeling tasks on temporal data (Sherstinsky, 2020). We propose using LSTM for classifying news sentiment, exploiting the interactions of words during the compositional process. LSTM incorporates a memory cell, a unit of computation that supersedes the traditional deep learning neurons in the network (Moghar & Hamiche, 2020; Egeli, Ozturan & Badur, 2003). To understand the behavior of the proposed model, we also intend to make it explainable. XAI aims to develop a collection of machine learning techniques that produce more explainable models (Doran, Schulz & Besold, 2017). Using XAI techniques, we wish to expose the reasoning behind each prediction made by the model so that the user can gain insights for future trading and investment strategies. The model can be interpreted through visual tools, which help us account for the biases in the model before making the final decision.

Kalyani, Bharathi & Rao (2016) used supervised machine learning to classify news headlines, together with additional text mining techniques to examine news polarity. The news articles, with their polarity scores and text converted to a tf-idf vector space, are fed to the classifier. Three classification algorithms (Support Vector Machines (SVM), Naïve Bayes, and Random Forest) are implemented to investigate and enhance classification accuracy, and the results of all three are compared on precision, recall, accuracy, and other model evaluation measures. Across these evaluations, the SVM classifier performs satisfactorily on unseen data, and Random Forest shows better results than the Naïve Bayes algorithm. Finally, the relationship between news articles and stock price data is plotted.
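To make this kind of pipeline concrete, the following is a minimal scikit-learn sketch of a tf-idf-plus-classifier comparison in the spirit of Kalyani, Bharathi & Rao (2016); the headlines, labels, and classifier settings are illustrative assumptions, not the authors' actual setup.

    # Sketch of a tf-idf + classifier comparison; placeholder data only.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.svm import LinearSVC
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import classification_report

    headlines = [
        "Sensex surges 500 points on strong earnings",
        "Markets slump as banking stocks fall",
        "Nifty edges higher on IT gains",
        "Rupee weakens, investors turn cautious",
    ]
    labels = [1, 0, 1, 0]  # 1 = positive polarity, 0 = negative polarity

    # Convert headlines to a tf-idf vector space.
    vectorizer = TfidfVectorizer(stop_words="english")
    X = vectorizer.fit_transform(headlines)

    X_train, X_test, y_train, y_test = train_test_split(
        X, labels, test_size=0.25, random_state=0)

    # Train and compare the three classifiers on precision/recall/accuracy.
    for clf in (LinearSVC(), MultinomialNB(), RandomForestClassifier()):
        clf.fit(X_train, y_train)
        print(type(clf).__name__)
        print(classification_report(y_test, clf.predict(X_test), zero_division=0))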
Nayak, Pai & Pai (2016) used historical data from 2003 obtained from Yahoo Finance and built two models to predict stock trends. One model predicted daily stock movements by considering all the data available each day; the second predicted monthly movements from the data available every month. A different dataset was used for each model: a historical price dataset for the daily prediction model, and the historical data from 2003 obtained from Yahoo Finance for the monthly prediction model. The data were modeled using boosted decision trees, logistic regression, and support vector machines, with up to 70% accuracy observed using the Support Vector Machine.

Vargas, Lima & Evsuko (2017) proposed a Recurrent Convolutional Neural Network (RCNN) model that predicts intraday movements of the S&P 500 index. The model takes as input financial news headlines from the day preceding the forecast day and utilizes several technical indicators extracted from the primary target. Each headline is processed in a two-step procedure: initially, a word2vec model creates a vector representing each word; then the mean (average) of all the word vectors of the same title is computed. The RCNN combines two deep learning models: the CNN is used to extract rule-based information from the text, while the RNN-LSTM captures the context information and interprets the stock attributes for forecasting purposes.

Yoo, Kim & Jan (2005) analyzed and assessed a portion of the existing ML techniques for stock exchange prediction. After comparing models such as multivariate regression, Neural Networks, Support Vector Machines, and Case-Based Reasoning, they inferred that Neural Networks offer the capacity to predict market trends accurately when contrasted with the other procedures, while SVMs and Case-Based Reasoning are popular for predicting stock prices due to their simplicity of use and implementation.

LSTM
LSTM (Long Short-Term Memory) is an improved form of the RNN that avoids the problems RNNs encounter. Hochreiter & Schmidhuber (1997) introduced LSTMs, which make use of memory cells that can either forget unnecessary information or store information for extended periods. LSTMs are explicitly modeled to handle tasks involving historical sequences and, with the help of their memory cells, are able to learn long-term dependencies. LSTMs have a chain-like structure that makes it easy to pass on information: information flows as cell state from one memory cell to another, and the state of these cells modifies the output of the network. The architecture of LSTM allows constant error flow with the help of constant, self-connected units (Hochreiter & Schmidhuber, 1997). This flow of error and state is propagated through the three gates that compose each LSTM memory cell block: the input gate, the output gate, and the forget gate. Input gates modulate the amount of new information received by a cell; forget gates determine how much information from the previous cell is passed on to the current cell, i.e., what information is relevant and what needs to be forgotten (Kim & Won, 2018). Figure 1A shows the structure of a Simple Recurrent Network (SRN); the LSTM cell shown in Fig. 1B differs from the SRN in having multiple gates. The LSTM model has two passes:
• a forward pass, and
• a backward pass.
Figure 1: Detailed schematic of the Simple Recurrent Network (SRN) unit (A) and a Long Short-Term Memory block (B) as used in the hidden layers of a recurrent neural network.

Forward propagation
In neural networks, the storage and calculation of intermediate results in sequence from the input layer to the output layer is called forward propagation. Here we work incrementally through the mechanics of a deep network with one hidden layer. Forward propagation consists of processing the weighted inputs through the activation function to obtain a value; this computed value is compared with the real value to compute the error. In backward propagation we then calculate the derivative of the error with respect to the weights and subtract the resulting value from the weight. The final step in a forward pass (Vargas, Lima & Evsuko, 2017) is therefore the comparison between the predicted and the expected output. Forward propagation is calculated as:

    Z^{[i]} = w^{[i]} a^{[i-1]} + b^{[i-1]}    (1.1)
    g^{[i]}(Z) = \sigma(Z^{[i]})    (1.2)

Backpropagation
Backpropagation (Vargas, Lima & Evsuko, 2017) is analogous to calculating the delta rule for a multilayer feedforward network. It is the method of calculating the gradient of the neural network parameters; its main aim is to minimize the cost function with respect to the weights and biases of the network. The amount of adjustment to be made is determined by the gradients of the cost function for the respective parameters. Assume we have the following functions:

    Q = f(P)    (1.3)
    R = g(Q)    (1.4)
    R = (g \circ f)(P)    (1.5)

where the input and the outputs P, Q, R are tensors of arbitrary shapes. By using the chain rule, we can compute the derivative of R with respect to P:

    dR/dP = (dR/dQ) \cdot (dQ/dP)    (1.6)

A. Forward pass
Let x^t be the input at time t, N the number of LSTM blocks, and M the number of inputs. We then have the following weights for an LSTM layer:
• Input weights: W_z, W_i, W_f, W_o \in R^{N \times M}
• Recurrent weights: R_z, R_i, R_f, R_o \in R^{N \times N}
• Peephole weights: p_i, p_f, p_o \in R^N
• Bias weights: b_z, b_i, b_f, b_o \in R^N

The vector formulas for a vanilla LSTM layer forward pass can then be written as:

    \bar{z}^t = W_z x^t + R_z y^{t-1} + b_z    (1.7)
    z^t = g(\bar{z}^t)    (block input)    (1.8)
    \bar{i}^t = W_i x^t + R_i y^{t-1} + p_i \odot c^{t-1} + b_i    (1.9)
    i^t = \sigma(\bar{i}^t)    (input gate)    (1.10)
    \bar{f}^t = W_f x^t + R_f y^{t-1} + p_f \odot c^{t-1} + b_f    (1.11)
    f^t = \sigma(\bar{f}^t)    (forget gate)    (1.12)
    c^t = z^t \odot i^t + c^{t-1} \odot f^t    (cell state)    (1.13)
    \bar{o}^t = W_o x^t + R_o y^{t-1} + p_o \odot c^t + b_o    (1.14)
    o^t = \sigma(\bar{o}^t)    (output gate)    (1.15)
    y^t = h(c^t) \odot o^t    (block output)    (1.16)

where \sigma, g and h are pointwise non-linear activation functions. The logistic sigmoid, \sigma(x) = 1/(1 + e^{-x}), is employed as the gate activation function, and the hyperbolic tangent, g(x) = h(x) = tanh(x), is typically used as the block input and output activation function. Pointwise multiplication of two vectors is denoted by \odot (Greff et al., 2015).
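The forward-pass equations (1.7)-(1.16) translate almost line-for-line into code. The following is a minimal NumPy sketch of a single LSTM time step with peephole connections; the dimensions and random weights are illustrative only.

    # One forward-pass step of the vanilla LSTM layer of Eqs. (1.7)-(1.16).
    import numpy as np

    def sigma(x):                      # logistic sigmoid
        return 1.0 / (1.0 + np.exp(-x))

    def lstm_step(x_t, y_prev, c_prev, W, R, p, b):
        """W, R, b are dicts keyed by 'z', 'i', 'f', 'o'; peepholes p
        exist only for 'i', 'f', 'o'."""
        z = np.tanh(W['z'] @ x_t + R['z'] @ y_prev + b['z'])              # block input, g = tanh
        i = sigma(W['i'] @ x_t + R['i'] @ y_prev + p['i'] * c_prev + b['i'])  # input gate
        f = sigma(W['f'] @ x_t + R['f'] @ y_prev + p['f'] * c_prev + b['f'])  # forget gate
        c = z * i + c_prev * f                                            # cell state
        o = sigma(W['o'] @ x_t + R['o'] @ y_prev + p['o'] * c + b['o'])   # output gate
        y = np.tanh(c) * o                                                # block output, h = tanh
        return y, c

    # Illustrative dimensions: N = 3 LSTM blocks, M = 2 inputs.
    N, M = 3, 2
    rng = np.random.default_rng(0)
    W = {k: rng.normal(size=(N, M)) for k in 'zifo'}
    R = {k: rng.normal(size=(N, N)) for k in 'zifo'}
    p = {k: rng.normal(size=N) for k in 'ifo'}
    b = {k: np.zeros(N) for k in 'zifo'}
    y, c = lstm_step(rng.normal(size=M), np.zeros(N), np.zeros(N), W, R, p, b)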
B. Backpropagation
The deltas inside an LSTM block are calculated as:

    \delta y^t = \Delta^t + R_z^T \delta \bar{z}^{t+1} + R_i^T \delta \bar{i}^{t+1} + R_f^T \delta \bar{f}^{t+1} + R_o^T \delta \bar{o}^{t+1}    (1.17)
    \delta \bar{o}^t = \delta y^t \odot h(c^t) \odot \sigma'(\bar{o}^t)    (1.18)
    \delta c^t = \delta y^t \odot o^t \odot h'(c^t) + p_o \odot \delta \bar{o}^t + p_i \odot \delta \bar{i}^{t+1} + p_f \odot \delta \bar{f}^{t+1} + \delta c^{t+1} \odot f^{t+1}    (1.19)

Here, \Delta^t is the vector of deltas passed down from the layer above. If E is the loss function, it corresponds to \partial E / \partial y^t, but without counting the recurrent dependencies (Pathmind, Inc., 2020). Then:

    \delta \bar{f}^t = \delta c^t \odot c^{t-1} \odot \sigma'(\bar{f}^t)    (1.20)

Why LSTM?
LSTM networks contain several state cells, and their short- and long-term memory behaviour depends on the state of these cells. The memory cells act as an aide that lets the model remember historical context, since the predictions made by the network are influenced by the past experiences of its inputs. This helps us make better predictions. LSTM networks keep the context of the information fed by the inputs by integrating a loop that allows information to flow in one direction, from one step to the following one.

Explainable AI (XAI)
We want our model to output not only the prediction but also an explanation of why the prediction turned out that way. If a machine makes a prediction, why should the user trust it? Today, machine learning and Artificial Intelligence (AI) are used to make decisions in many fields, such as medicine, finance, and sports, and there are cases where machines are not 100% accurate. Should the user then blindly trust the choice of the machine? How can the user trust AI systems that may derive inferences on unfair grounds? To solve this problem of trust between the user and Artificial Intelligence, XAI (Saffar, 2019) can be employed: it gives us the reasoning for a prediction made by the model. XAI is mainly employed to resolve the black box problem; this "black box" is interpreted by XAI and explained to the users. The user cannot depend completely on a model without a clear understanding of it and of the way its output is achieved. XAI provides a clear, human-understandable explanation of how the model achieved a certain prediction or result. Current models enable interpretation but leave it to the user to apply their own knowledge, bias, and understanding.

METHODOLOGY
Below is a systematic account of how the data are served to the system, how the model is trained and tested, and how XAI works to interpret the model.

System design
Figure 2 explains the overall architecture of the system. We preprocess the News Headlines dataset by performing tokenization, removing stop words, and embedding the text. The headlines are then normalized, and sentiment analysis is performed to comprehend the sentiment behind each headline, i.e., whether a particular headline has a positive or a negative sentiment associated with it. We then classify the preprocessed headlines according to whether they produce a positive or a negative sentiment. This headline dataset, along with the preprocessed Yahoo Finance dataset, is combined to form the final dataset, which is divided into a train and a test set. The training set is used to build the LSTM model; the test set is used to verify the results obtained after training.

Figure 2: System architecture of stock market prediction using LSTM and XAI. Shows how the data are initially processed and divided into train and test sets to build predictions and evaluate the biases using XAI.
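As a concrete illustration of this combine-and-split step, the sketch below merges headlines with prices by date and derives a binary sentiment label from the next day's opening price; the file names, column names, and labeling rule are illustrative assumptions rather than details taken from the paper.

    # Combine the headline dataset with the Yahoo Finance OHLC dataset and
    # derive a binary label from the next day's opening price (assumed rule).
    import pandas as pd

    news = pd.read_csv("news_headlines.csv", parse_dates=["Timestamp"])  # hypothetical file
    ohlc = pd.read_csv("yahoo_finance.csv", parse_dates=["Date"])        # hypothetical file

    news["Date"] = news["Timestamp"].dt.normalize()
    merged = news.merge(ohlc[["Date", "Open"]], on="Date", how="inner")

    # Label a headline positive (1) if the next day's open is higher.
    merged["NextOpen"] = merged["Open"].shift(-1)
    merged = merged.dropna(subset=["NextOpen"])
    merged["Sentiment"] = (merged["NextOpen"] > merged["Open"]).astype(int)

    # 80/10/10 split into train, validation, and test, as described later.
    n = len(merged)
    train = merged[: int(0.8 * n)]
    val = merged[int(0.8 * n): int(0.9 * n)]
    test = merged[int(0.9 * n):]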
Finally, XAI is implemented using the LIME tool, which interprets the model and exposes the biases involved in the dataset.

Algorithm
The algorithm is divided into prediction of the opening price using LSTM-CNN, and XAI for interpreting the model:

Prediction
    Input: processed news headlines
    Instantiate ReduceLRPlateau()
    Instantiate ModelCheckpoint()
    Instantiate EarlyStopping()
    Truncate and pad the input sequences
    Design the model by adding:
        a Sequential LSTM model
        an Embedding layer
        Conv1D() and MaxPooling1D() layers
        an LSTM() layer
        a Dense layer
    for epoch r = 1 to R:
        for batch b = 1 to B:
            Fit the model
    Evaluate the model
    Save the model
    Use XAI to interpret the model
    Output: table of predicted opening values

XAI
    Input: processed headlines
    Instantiate LimeTextExplainer()
    Instantiate explain_instance() with appropriate hyperparameters (text, pipeline.predict_proba, number of features)
    Create an ordered dictionary of words and weights
    Plot a bar plot with appropriate axes
    Output: plot of word biases (weights) vs. words

Procedure
Selecting the prediction model
Based on the evaluation by Ghosh et al. (2019), which compared various models, including simple linear models such as ARIMA, AR, and ARMA and nonlinear models such as ANN, ARCH, RNN, and LSTM, it was concluded that LSTM achieves higher accuracy than the other models. Stock market prediction requires a nonlinear dynamic system, and LSTMs can train themselves on nonlinear input-output pairs. Moreover, combining qualitative factors such as international events, financial news, and news-headline sentiment can achieve much higher accuracy.

Selecting datasets
A dataset of financial news headlines was obtained from Pulse. Pulse is a news aggregator that collects financial news from various sources, thus providing a less biased dataset. This dataset is combined with a Yahoo Finance dataset (Yahoo Finance, 2020). Since our main focus is Indian stock market prices (Hiransha et al., 2018), we extracted news headlines from the dataset accordingly. Moreover, an indication of changes in the stocks (Zhang, Fuehres & Gloor, 2011) for BSE and NSE was included: the values represent the stock market prices of the day the news headline was published and of the day after. The dataset was split into 80% for training, 10% for validation, and 10% for testing. The dataset used in this project has 19,736 data entries and three parameters: News Headline, Website, and Timestamp.

Data preprocessing
The data obtained are raw and unprocessed, and processing them is necessary before any computation; for instance, null entries and trivial values must be removed. Figure 3 shows the raw dataset obtained from the sources, i.e., what the financial news dataset looked like before preprocessing. Table 1 shows the features necessary for our model; we clean the dataset to retain only these features. The steps involved in data preprocessing are represented in Fig. 4. As we first train the LSTM model only for BSE-SENSEX and NSE-NIFTY, we train it on news headlines involving SENSEX- and NIFTY-specific news only. For this, we search for headlines that contain the word "Sensex" (case-insensitive).
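For illustration, a minimal pandas/NLTK sketch of the preprocessing steps of Fig. 4 and the Sensex filter might look as follows; the input file name is a hypothetical placeholder, and NLTK is one possible tokenization tool (the paper does not name one).

    # Dropping, filtering, tokenizing, cleaning, and normalizing headlines.
    import pandas as pd
    import nltk
    from nltk.corpus import stopwords
    from nltk.tokenize import word_tokenize

    nltk.download("punkt")
    nltk.download("stopwords")

    df = pd.read_csv("pulse_headlines.csv")             # hypothetical file name
    df = df[["News headline", "Website", "Timestamp"]]  # keep only the Table 1 columns
    df = df.dropna()                                    # drop null entries

    # Keep only headlines containing the word "Sensex", case-insensitively.
    df = df[df["News headline"].str.contains("sensex", case=False)]

    stop_words = set(stopwords.words("english"))

    def clean(headline):
        tokens = word_tokenize(headline.lower())        # normalize to lowercase
        return [t for t in tokens if t.isalpha() and t not in stop_words]

    df["tokens"] = df["News headline"].apply(clean)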
Figure 3: Dataset before cleaning. The data collected through various sources contain unwanted and redundant values that need to be processed.

Table 1: Features of the clean data. Final columns used in the dataset, as necessary for model building and model evaluation.

    Column A         Column B    Column C
    News headline    Website     Timestamp

Figure 4: Steps in data preprocessing. (A) Dropping: to include only the necessary data, unwanted columns/rows must be dropped. (B) Tokenization: helps increase the speed and efficiency of computation. (C) Cleaning: required after tokenization, since some words are not necessary for the evaluation. (D) Normalizing: with all values brought into one range, it is easier to compare values and evaluate the model.

After running the code on our cleaned dataset, we get the result shown in Fig. 5.

Sentiment analysis for stock market news headlines
To be able to predict the stock market (Patel et al., 2014), we must understand the market's sentiment correctly. Negative news leads to a fall in the price of a stock, and positive news leads to a rise. The words most commonly used in a news headline trigger instant emotions in a reader's mind; hence we made a word cloud of the top 150 words commonly occurring in news headlines from our dataset, depicted in Fig. 6.

Figure 5: Dataset after cleaning. A clean, preprocessed dataset supports efficient model building and model evaluation.

Figure 6: Top 150 commonly used words in news headlines, visualized with WordCloud. The word cloud generated from the data gives an overview of which headline words impact stocks.

The LSTM network is a type of RNN that preserves its state (Olah, 2020). LSTM (Selvin et al., 2017) has already provided satisfactory results in text-data applications such as machine translation and NLP (Bird, Klein & Loper, 2016). Headlines are a type of text data; for that reason, applying this type of network to our data was a rational decision. Figure 7 shows the LSTM network architecture used in our project.

Input
News features scraped from different online news sites are used as input to our application. The headlines are cleaned, and the remaining characters are converted to lowercase to avoid identical words being counted twice under different capitalization. Word embedding is handled by converting each word to its index among the top 2,000 words in the corpus; words not found among the top 2,000 are set to zero. The maximum size of the word embeddings is 100, and smaller headlines are zero-padded.
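A minimal Keras sketch of this input encoding might look as follows, assuming the TensorFlow Keras preprocessing utilities (the paper does not name the exact tokenizer); note that the Keras Tokenizer drops out-of-vocabulary words rather than writing explicit zeros, with zero-padding then filling the sequence to the maximum length.

    # Map words to indices from the top 2,000 corpus words, then zero-pad to 100.
    from tensorflow.keras.preprocessing.text import Tokenizer
    from tensorflow.keras.preprocessing.sequence import pad_sequences

    TOP_WORDS, MAX_LEN = 2000, 100

    headlines = ["sensex surges on strong earnings",
                 "markets slump as banking stocks fall"]   # illustrative

    tokenizer = Tokenizer(num_words=TOP_WORDS)
    tokenizer.fit_on_texts(headlines)

    sequences = tokenizer.texts_to_sequences(headlines)    # words -> indices
    X = pad_sequences(sequences, maxlen=MAX_LEN)           # zero-pad shorter headlines
    print(X.shape)                                         # (2, 100)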
Figure 7: Network architecture of the LSTM network. The architecture depicts how the processed headlines are embedded and passed through a CNN layer and then through an LSTM cell to output the prediction.

LSTM-CNN model for news headlines
1. Initially, we create an instance of ReduceLROnPlateau, which reduces the learning rate when a metric has stopped improving.
2. Secondly, we add a ModelCheckpoint callback. It is used with model.fit() to save the model or its weights at some interval, so that training can later be resumed from the saved state.
3. An instance of EarlyStopping is created; this stops training the model when a monitored metric has stopped improving. The preprocessed data are then loaded, only the top n words are kept (the rest are set to zero), and the data are split into train and test sets. The input sequences (X_train and X_test) are truncated and padded. A 70-30 split is used for train and test.
4. Finally, we create the model.
(a) The first layer is constructed using the Sequential network model of the Keras framework. A Sequential model works well for a plain stack of layers in which each layer has exactly one input tensor and one output tensor.
(b) Next, an Embedding layer is added. Here the text data are encoded: in an embedding, words are represented by dense vectors, each vector being the projection of a word into a continuous vector space.
(c) Then a MaxPooling1D layer is added. A pooling layer is an elementary constituent of a CNN; its main function is to progressively reduce the spatial size of the representation and the number of parameters, and hence the computation required by the network. The pooling layer operates separately on each feature map. Of all the pooling layers, the one best suited to our model was max pooling: it takes the maximum value over the window defined by pool_size and thereby downsamples the input representation. In our model, pool_size is 2. Figure 8 visualises the working of MaxPooling1D.

Next, we construct an LSTM layer of 100 units, as the maximum sequence length of our inputs is 100. We then add a Dense layer with a sigmoid activation function as our final layer. We use the Adam optimizer with a learning rate of 1e-3. The model is finally compiled and saved; we then print the model summary, and the function is called for the today and tomorrow parameters.

Figure 8: MaxPooling1D in a CNN. The MaxPooling1D layer reduces the spatial size of the representation and operates separately on each feature map.
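Putting steps 1-4 together, a hedged Keras sketch of the LSTM-CNN model and its callbacks might look as follows; values not stated in the text (embedding dimension, Conv1D filters and kernel size, EarlyStopping patience, batch size, checkpoint file name) are illustrative assumptions, while the learning rate, pool size, loss function, and the ReduceLROnPlateau factor and patience follow the text and Table 2.

    # Sketch of the LSTM-CNN model described in steps 1-4.
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Embedding, Conv1D, MaxPooling1D, LSTM, Dense
    from tensorflow.keras.callbacks import ReduceLROnPlateau, ModelCheckpoint, EarlyStopping
    from tensorflow.keras.optimizers import Adam

    TOP_WORDS, MAX_LEN = 2000, 100

    callbacks = [
        ReduceLROnPlateau(monitor="val_loss", factor=0.1, patience=5),  # decay per Table 2
        ModelCheckpoint("weights.h5", monitor="val_accuracy",           # logged as val_acc in
                        save_best_only=True),                           # older Keras versions
        EarlyStopping(monitor="val_accuracy", patience=3),
    ]

    model = Sequential([
        Embedding(TOP_WORDS, 32, input_length=MAX_LEN),  # dense word vectors
        Conv1D(32, 3, padding="same", activation="relu"),
        MaxPooling1D(pool_size=2),                       # downsample feature maps
        LSTM(100),                                       # 100 units = max sequence length
        Dense(1, activation="sigmoid"),                  # final binary output
    ])
    model.compile(loss="binary_crossentropy",
                  optimizer=Adam(learning_rate=1e-3),
                  metrics=["accuracy"])
    # model.fit(X_train, y_train, validation_data=(X_val, y_val),
    #           epochs=100, batch_size=64, callbacks=callbacks)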
LSTM model for the OHLC dataset
The OHLC dataset consists of the Open, High, Low, and Close stock prices of a particular company for each period. For this model, we initially load the dataset and split it into test and train data. This dataset does not require much preprocessing, as the data and the operations over them are straightforward. Once the data are loaded, we use MinMaxScaler(), which transforms features by scaling each one to a given range; the fit_transform method is applied to the MinMaxScaler() instance scl to rescale the price values. The data are then split into input 'X' and output 'y' by taking 'i' past days as the input X and the 'j' following days as y. We then build the model, with the first layer constructed using the Sequential model and the next two hidden layers using LSTM; finally, a Dense() layer is used for constructing the output.
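A compact sketch of this OHLC pipeline follows; the window sizes ('i' = 7 past days, 'j' = 1 day ahead), layer widths, and the placeholder price series are illustrative assumptions.

    # MinMaxScaler + sliding-window split + stacked-LSTM forecaster.
    import numpy as np
    from sklearn.preprocessing import MinMaxScaler
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import LSTM, Dense

    def make_windows(series, i=7, j=1):
        X, y = [], []
        for k in range(len(series) - i - j + 1):
            X.append(series[k:k + i])          # 'i' past days as input
            y.append(series[k + i:k + i + j])  # 'j' following days as target
        return np.array(X), np.array(y)

    # Placeholder series; in practice, use the real Open column of the dataset.
    open_prices = np.sin(np.linspace(0, 20, 200)) + 2

    scl = MinMaxScaler()
    prices = scl.fit_transform(open_prices.reshape(-1, 1)).ravel()
    X, y = make_windows(prices)
    X = X[..., np.newaxis]                     # (samples, timesteps, features)

    model = Sequential([
        LSTM(64, return_sequences=True, input_shape=(X.shape[1], 1)),
        LSTM(64),                              # second hidden LSTM layer
        Dense(1),                              # predicted next opening price
    ])
    model.compile(loss="mse", optimizer="adam")
    # model.fit(X, y, epochs=10, validation_split=0.1)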
Hyperparameters, loss function, and optimizer
A hyperparameter is a configuration whose value cannot be determined from the dataset; it requires trial experiments in which one manually tests which hyperparameter values suit the model best. Once fine-tuned, the model trains better. Optimizers are algorithms for altering attributes such as the weights and the learning rate in order to reduce the loss. Table 2 lists the hyperparameters and the values used in the experiment.

Table 2: Hyperparameters, loss function, and optimizer, with the values used and detailed comments.

    Title            Value                   Comments
    Loss function    Binary cross-entropy    The classification performed is binary
    Optimizer        Adam optimization       Used to enhance the network
    Learning rate    0.001                   Set after trials
    Decay            0.1                     Applied when the validation loss has not improved significantly for 5 epochs

LIME
As represented in Fig. 9, after the model is trained and predictions are obtained, the explainer interprets the model so that the user can make appropriate judgements. After studying various XAI tools (Khaleghi, 2019; Ribeiro, Singh & Guestrin, 2016) such as LIME, What-If (Wexler et al., 2019), DeepLift (Shrikumar et al., 2016), and Shapley values (Messalas, Christos & Yannis, 2019), we concluded that the one that would establish the transparency of our model best was LIME.

Figure 9: Working of XAI. The model generates the prediction, which, along with the data, is run through the XAI tool. The XAI tool explains the data based on the biases to facilitate better decision making.

According to our study, the What-If Tool is an interactive visual tool that helps the user comprehend datasets, better understand the output of TensorFlow models, and analyze deployed models; however, it is best suited to XGBoost and scikit-learn models. DeepLift is a tool that compares the activation of each neuron to its 'reference' activation and assigns contribution scores accordingly; a single backward pass is used to compute the scores efficiently, and DeepLift reveals dependencies that are otherwise missed by other methods by giving separate consideration to positive and negative contributions.

LIME, short for Local Interpretable Model-Agnostic Explanations (Ribeiro, Singh & Guestrin, 2016), is a tool used to explain what a machine learning model is doing; it is applied to models whose behaviour we can observe but whose inner workings we do not know. Some machine learning models (Patel et al., 2015), such as linear regression and decision trees, can be explained using plots, although such plots offer only a global explanation and cannot explain any individual, local observation. But some significant data, with more complexity and more dimensions, cannot be explained using plots; a neural network is one example. By using LIME, we can understand what the model does and which features it picks up on to develop a prediction. LIME is model-agnostic, meaning it can be used for any black-box model that we know today or that may be developed in the future. LIME is also local, meaning it is observation-specific: it explains every single observation in a dataset. If a model has high accuracy, can we trust it without interpretability? The answer is no: many models contain noise, so they may predict the output correctly while the way the prediction is made contains faults that are harmful to a long-term process. Interpretability is therefore important for building trust in the model.
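Tying this back to the XAI algorithm given earlier, a minimal LIME sketch for a single headline might look as follows; the headline is illustrative, and predict_proba is a stand-in for the real wrapper around tokenization, padding, and the trained model's prediction.

    # LimeTextExplainer perturbs the headline, queries predict_proba, and
    # returns per-word weights that can be plotted as biases.
    from collections import OrderedDict
    import numpy as np
    import matplotlib.pyplot as plt
    from lime.lime_text import LimeTextExplainer

    def predict_proba(texts):
        # Stand-in for the trained pipeline's predict_proba; replace with the
        # real wrapper (tokenize, pad, model.predict) in practice.
        return np.array([[0.4, 0.6]] * len(texts))

    explainer = LimeTextExplainer(class_names=["negative", "positive"])

    text = "Sensex edges higher as bank stocks surge"
    exp = explainer.explain_instance(text, predict_proba, num_features=10)

    weights = OrderedDict(exp.as_list())          # word -> signed weight
    plt.bar(list(weights), list(weights.values()))
    plt.xlabel("Words")
    plt.ylabel("Weights (biases)")
    plt.show()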
Performance measure
Validation loss
For neural networks, we take the loss to be the negative log-likelihood for classification and the residual sum of squares for regression. Hence, the primary aim of our learning model is to decrease the value of the loss function as we tune its parameters by changing the weights through different optimization methods. The loss value indicates how well a model behaves after each optimization iteration; preferably, we expect the loss to decrease after each iteration. Some caveats should be kept in mind while observing a decreasing loss. For example, one may face overfitting, wherein the model memorizes the training examples and becomes less effective on the test set. Overfitting also occurs when no regularization or normalization is employed, when the model is very intricate, or when the number of data points N is very low.

Validation accuracy
We calculate the accuracy of a model once it is completely trained, i.e., once the parameters are fixed and no further learning takes place. The test dataset is then fed to the model, the number of errors the model makes is recorded after comparison with the target values, and the percentage of misclassified data is calculated. As an example, a model accuracy of 96.2% means that out of 1,000 test samples, the model classifies 962 correctly.

RESULTS AND DISCUSSION
Result
Table 3 presents the results generated by the experiments. For the LSTM-CNN, the input was the preprocessed data combined with the dataset of news headlines and prices; the model was trained on Sensex news headlines. A total of 100 epochs were scheduled for training, but training stops at the 15th epoch to keep the model from overfitting. The accuracy of the model after 15 epochs is 74.76%, and the loss is 0.1693, as seen in Fig. 10.

Table 3: Results generated. The LSTM-CNN model results using the news headlines dataset, compared with the results of the LSTM model using the OHLC dataset, in terms of accuracy and loss.

    Model       Dataset           No. of epochs    Accuracy    Loss
    LSTM-CNN    News Headlines    15               74.76%      0.1693
    LSTM        OHLC              10               88.73%      0.1733

Figure 10: Loss of the LSTM-CNN model for news headlines. The loss values for the LSTM-CNN model show that the model was processed up to epoch 16, with early stopping applied to avoid overfitting.

At epoch 15 the training log reads "Epoch 00015: val_acc did not improve from 0.73876". Of the 100 scheduled epochs, training stopped between the 15th and the 18th because of the EarlyStopping function, which helps avoid the problem of overfitting: the accuracy remains constant after the 15th epoch while val_acc starts decreasing, so training stops.
• val_acc measures how good the model's predictions are on the validation set. In our case, the model is well trained up to the 15th epoch, after which further training is unnecessary.
• acc gives the percentage of instances that are correctly classified; in our case, 93.15% on the training set.
• val_loss is the loss computed on the held-out set, whereas loss is computed on the training set; val_loss is the best quantity to monitor to avoid overfitting, which occurs when the model fits the training data too closely.
The val_loss and loss keep decreasing while acc and val_acc keep increasing, which shows that our model is well trained. Figure 11 visualises the loss over the number of epochs. According to the plot obtained after training, the loss is initially very high while the validation loss is relatively low; as the epochs progress, the loss decreases, providing better learning for our model. The point where the loss and validation loss plateau is the point beyond which no further learning may be possible with the same amount of data and the same number of epochs.

Figure 11: Loss for the LSTM on OHLC data. The graph comparing loss vs. val_loss for the OHLC dataset shows that, as the number of epochs progresses, both values decrease and plateau at approximately 0.0010.

Although the accuracy of the LSTM-CNN with news-headline data is lower than that of the LSTM with OHLC data, interpreting the model adds another perspective on it, which makes it a better model. Compared with an RNN-CNN model (Tan et al., 2019; Zhang, Chen & Huang, 2018), it performs considerably better, since it remembers not only short-term but also long-term information.
Results of XAI
Figure 12: Sample of 1,000 feature weights given by LIME. Result obtained from the LIME tool after modeling: we get a cumulative 1,000 explanation features, with each word weighted.

For the first model, as shown in Fig. 12, we obtain a cumulative 1,000 explanation features. The word "surge" mostly appears in a negative context and hence has a negative weight. Similarly, "higher" appears mostly in sentences depicting the growth of stocks and hence has a positive weight. We can also observe that the word "Sensex" has a near-neutral weight, as it has both positive and negative references and thus accompanies both falls and rises of stocks. Through Table 4 we can thus interpret our data and understand the biases in the dataset. Our model is no longer just a black box: the users can see into the system through a single graph, and we obtain explanations along with the insights from the data.

Table 4: Results of the LIME tool. A sample of the LIME output, in which each word is assigned a weight (bias).

    Word      Weight or bias (approx.)
    Surge     -0.15
    Baroda    +0.1
    Higher    +0.8
    Edges     +0.5
    Live      +0.3
    Bank      +0.2
    Sensex    -0.1
    Of        -0.05
    SBI       -0.7

Test results
As shown in Fig. 13, the trained model can now be tested for predictions. The blue bars represent the predicted values and the red bars the actual values obtained from the dataset, giving a fair understanding of the output. To test on real-time news, we can enter a headline, which gets preprocessed, and obtain a measure of the opening price, i.e., how much the opening price rose or fell from the previous opening price.

Figure 13: Prediction results. Bar graph depicting predicted open vs. actual open. The red bars show the actual opening prices and the blue bars the predicted opening prices for 7 days.

We present values in Table 5 to give an exact idea of these quantities and to gauge the accuracy of our predictions. The table shows the date of the actual open, the previous closing price, the opening price predicted by our model, the actual opening price, the rise/fall in price, the error percentage, and the chi-squared statistic, where:

    Rise/fall = Predicted open - Previous close    (1.21)
    Error (%) = |(Actual - Predicted) / Actual| x 100    (1.22)
    Chi-square = (Actual - Predicted)^2 / Predicted    (1.23)

Table 5: Set of predictions made. The final set of predictions, with comparisons against the actual opening prices and the calculated rise/fall, error percentage, and chi-squared statistic.

    Test case    Date          Previous close    Predicted open    Actual open    Rise/fall    Error (%)    Chi-squared
    1            07/07/2020    36487.28          36558.12          36660.35       70.84        0.27         0.28
    2            08/07/2020    36674.52          36594.59          36738.38       -79.93       0.39         0.56
    3            09/07/2020    36329.01          36512.42          36450.69       183.41       0.16         1.52
    4            10/07/2020    36737.69          36678.35          36555.13       -59.34       0.33         0.41
    5            13/07/2020    36594.33          36610.27          36880.66       15.94        0.73         1.98
    6            14/07/2020    36693.69          36649.78          36517.28       -43.91       0.21         0.48
    7            15/07/2020    36033.06          36257.20          36314.76       224.14       0.15         0.01

As presented in Table 5, seven test cases are stated with the stock market prices along with the rise and fall of those values. The error represents the percentage difference between the actual value and the value predicted by our model, and for our test cases it lies in the range 0.15 to 0.73. Thus, we managed to predict the Indian stock market price, explained with XAI, by referring to the impact of financial news articles. The chi-squared statistic additionally allows us to assess how much the actual values deviate from the predicted values.
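As a quick sanity check, Eqs. (1.21)-(1.23) can be verified against Table 5; for example, for test case 2:

    # Worked check of Eqs. (1.21)-(1.23) on test case 2 of Table 5.
    previous_close, predicted_open, actual_open = 36674.52, 36594.59, 36738.38

    rise_fall = predicted_open - previous_close                          # Eq. (1.21)
    error_pct = abs((actual_open - predicted_open) / actual_open) * 100  # Eq. (1.22)
    chi_sq = (actual_open - predicted_open) ** 2 / predicted_open        # Eq. (1.23)

    print(round(rise_fall, 2), round(error_pct, 2), round(chi_sq, 2))
    # -79.93 0.39 0.56, matching the values reported in Table 5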
CONCLUSION AND FUTURE SCOPE
Financial news articles play a major role in the movement of stock prices and are a dominant factor in how a particular company's stock is perceived by investors at a given time. Making predictions based on news headlines can help budding investors learn how and when stock prices fall or rise, and make decisions accordingly. Our proposed approach creates an explainable model that provides this explanation and thus makes the output meaningful; this was done with XAI, using the LIME tool. Future research directions include automated predictions from news headlines on financial websites and multilingual financial news headline prediction. We could also add emotion-based GIFs as a fun element to make the system more appealing to learners. Finally, the prediction model could be used as the decision-maker in an algorithmic trading model.

ADDITIONAL INFORMATION AND DECLARATIONS

Funding
The authors received no funding for this work.

Competing Interests
The authors declare there are no competing interests.

Author Contributions
• Shilpa Gite and Ketan Kotecha analyzed the data, performed the computation work, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.
• Hrituja Khatavkar conceived and designed the experiments, performed the experiments, performed the computation work, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.
• Shilpi Srivastava conceived and designed the experiments, analyzed the data, performed the computation work, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.
• Priyam Maheshwari performed the experiments, performed the computation work, authored or reviewed drafts of the paper, and approved the final draft.
• Neerav Pandey performed the computation work, authored or reviewed drafts of the paper, and approved the final draft.

Data Availability
The following information was supplied regarding data availability:
Raw data, including financial news headlines obtained from Pulse and data from Yahoo Finance, is available in the Supplementary Files. Code is available at GitHub: https://github.com/Hrituja/Stock-Market-Prediction.

Supplemental Information
Supplemental information for this article can be found online at http://dx.doi.org/10.7717/peerj-cs.340#supplemental-information.

REFERENCES
Bird S, Klein E, Loper E. 2016. Natural language processing with Python. Available at https://b-ok.cc/book/755716/be52d7?dsource=recommend (accessed 16 July 2020).
Cho K, Merrienboer BV, Gulcehre C, Bahdanau D, Bougares F, Schwenk H, Bengio Y. 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). DOI 10.3115/v1/d14-1179.
Doran D, Schulz S, Besold TR. 2017. What does explainable AI really mean? A new conceptualization of perspectives. ArXiv preprint arXiv:1710.00794.
Egeli B, Ozturan M, Badur B. 2003. Stock market prediction using artificial neural networks. In: Proceedings of the 3rd International Conference on Business, Hawaii, 18-21 June 2003. 1-8.
Ghosh A, Bose B, Maji G, Debnath NN, Sen S. 2019. Stock price prediction using LSTM on Indian share market. EasyChair 63:101-110. DOI 10.29007/qgcz.
Greff K, Srivastava RK, Koutnik J, Steunebrink BR, Schmidhuber J. 2015. LSTM: a search space odyssey. ArXiv preprint arXiv:1503.04069.
Hiransha M, Gopalakrishnan EA, VijayKrishna M, Soman KP. 2018. NSE stock market prediction using deep-learning models. Procedia Computer Science 132:1351-1362. DOI 10.1016/j.procs.2018.05.050.
Hochreiter S, Schmidhuber J. 1997. Long short-term memory. Neural Computation 9(8):1735-1780. DOI 10.1162/neco.1997.9.8.1735.
Kalyani J, Bharathi HN, Rao J. 2016. Stock trend prediction using news sentiment analysis. International Journal of Computer Science & Information Technology (IJCSIT) 8(3):67-76.
Khaleghi B. 2019. The how of explainable AI: post-modelling explainability. Towards Data Science. Available at https://towardsdatascience.com/the-how-of-explainable-ai-post-modelling-explainability-8b4cbc7adf5f (accessed 16 July 2020).
Kim H, Won C. 2018. Forecasting the volatility of stock price index: a hybrid model integrating LSTM with multiple GARCH-type models. Expert Systems with Applications 103:23-37. DOI 10.1016/j.eswa.2018.03.002.
Messalas A, Christos M, Yannis K. 2019. Model-agnostic interpretability with Shapley values. DOI 10.1109/IISA.2019.8900669.
Moghar A, Hamiche M. 2020. Stock market prediction using LSTM recurrent neural network. Procedia Computer Science 170:1168-1173. DOI 10.1016/j.procs.2020.03.049.
Nayak A, Pai MM, Pai RM. 2016. Prediction models for Indian stock market. Procedia Computer Science 89:441-449.
Olah C. 2020. Understanding LSTM networks. [Blog post]. Available at https://colah.github.io/posts/2015-08-Understanding-LSTMs/ (accessed 16 July 2020).
Patel J, Shah S, Thakkar P, Kotecha K. 2015. Predicting stock and stock price index movement using trend deterministic data preparation and machine learning techniques. Expert Systems with Applications 42:259-268. DOI 10.1016/j.eswa.2014.07.040.
Patel J, Shah S, Thakkar P, Kotecha K. 2014. Predicting stock market index using fusion of machine learning techniques. Expert Systems with Applications 42:2162-2172. DOI 10.1016/j.eswa.2014.10.031.
Pathmind, Inc. 2020. A beginner's guide to LSTMs and recurrent neural networks. Available at https://pathmind.com/wiki/lstm.
Pulse by Zerodha. 2020. Pulse by Zerodha - the latest financial and market news from all major Indian news sources aggregated in one place. Available at https://pulse.zerodha.com/ (accessed 16 July 2020).
Ribeiro M, Singh S, Guestrin C. 2016. "Why should I trust you?": explaining the predictions of any classifier. 97:7-101. DOI 10.18653/v1/N16-3020.
Saffar M. 2019. How explainable artificial intelligence (XAI) can help us trust AI. Medium. Available at https://medium.com/altaml/how-explainable-artificial-intelligence-xai-can-help-us-trust-ai-8f01b574102d (accessed 16 July 2020).
Selvin S, Vinayakumar R, Gopalakrishnan EA, Menon VK, Soman KP. 2017. Stock price prediction using LSTM, RNN and CNN-sliding window model. In: 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI). DOI 10.1109/icacci.2017.8126078.
Sherstinsky A. 2020. Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network. Physica D: Nonlinear Phenomena 404:132306. DOI 10.1016/j.physd.2019.132306.
Shrikumar A, Greenside P, Shcherbina A, Kundaje A. 2016. Not just a black box: learning important features through propagating activation differences.
Tan KK, Le NQK, Yeh HY, Chua MCH. 2019. Ensemble of deep recurrent neural networks for identifying enhancers via dinucleotide physicochemical properties. Cells 8(7):767. DOI 10.3390/cells8070767.
Ţiţan AG. 2015. The efficient market hypothesis: review of specialized literature and empirical research. Procedia Economics and Finance 32:442-449. DOI 10.1016/s2212-5671(15)01416-1.
Vargas MR, Lima BS, Evsuko A. 2017. Deep learning for stock market prediction from financial news articles. In: 2017 IEEE International Conference on Computational Intelligence and Virtual Environments for Measurement Systems and Applications (CIVEMSA). Piscataway: IEEE, 60-65.
Wexler J, Pushkarna M, Bolukbasi T, Wattenberg M, Viega F, Wilson J. 2019. The What-If Tool: interactive probing of machine learning models. IEEE Transactions on Visualization and Computer Graphics 26:56-65. DOI 10.1109/TVCG.2019.2934619.
Yahoo Finance. 2020. Yahoo Finance - stock market live, quotes, business & finance news. Available at https://in.finance.yahoo.com/ (accessed 16 July 2020).
Yoo PD, Kim MH, Jan T. 2005. Machine learning techniques and use of event information for stock market prediction: a survey and evaluation. In: International Conference on Computational Intelligence for Modelling, Control and Automation (CIMCA 2005). Piscataway: IEEE, 835-841.
Zhang X, Chen F, Huang R. 2018. A combination of RNN and CNN for attention-based relation classification. Procedia Computer Science 131:911-917. DOI 10.1016/j.procs.2018.04.221.
Zhang X, Fuehres H, Gloor PA. 2011. Predicting stock market indicators through Twitter "I hope it is not as bad as I fear". Procedia - Social and Behavioral Sciences 26:55-62. DOI 10.1016/j.sbspro.2011.10.562.