Summary of your 'study carrel'
==============================

This is a summary of your Distant Reader 'study carrel'. The Distant Reader harvested & cached your content into a collection/corpus. It then applied sets of natural language processing and text mining against the collection. The results of this process were reduced to a database file -- a 'study carrel'. The study carrel can then be queried, thus bringing to light specific characteristics of your collection (a small example of such a query appears after the frequency tables below). These characteristics can help you summarize the collection as well as enumerate things you might want to investigate more closely. This report is terse; when processing is complete you will be linked to a more complete narrative report. Eric Lease Morgan

Number of items in the collection; 'How big is my corpus?'
----------------------------------------------------------
43

Average length of all items measured in words; "More or less, how big is each item?"
------------------------------------------------------------------------------------
3339

Average readability score of all items (0 = difficult; 100 = easy)
------------------------------------------------------------------
55

Top 50 statistically significant keywords; "What is my collection about?"
-------------------------------------------------------------------------
4 user; 4 image; 3 task; 3 query; 3 document; 2 word; 2 review; 2 model; 2 graph; 2 dataset; 2 claim; 2 Lucene; 2 BM25; 1 view; 1 tweet; 1 topic; 1 text; 1 term; 1 system; 1 symptom; 1 session; 1 sentence; 1 seed; 1 schema; 1 recommendation; 1 ranker; 1 question; 1 product; 1 premise; 1 patent; 1 passage; 1 ontology; 1 node; 1 network; 1 location; 1 list; 1 lexicon; 1 language; 1 label; 1 item; 1 irony; 1 feature; 1 entity; 1 english; 1 embedding; 1 early; 1 domain; 1 dmp; 1 disease; 1 damage

Top 50 lemmatized nouns; "What is discussed?"
---------------------------------------------
961 model; 762 query; 721 document; 559 task; 522 word; 512 user; 469 information; 465 text; 436 result; 435 dataset; 418 image; 383 approach; 381 term; 354 retrieval; 350 representation; 345 embedding; 339 network; 338 method; 329 system; 327 review; 323 datum; 310 language; 282 feature; 273 score; 271 set; 252 graph; 242 work; 234 search; 227 performance; 226 attention; 218 number; 214 node; 208 context; 200 evaluation; 199 learning; 197 domain; 195 analysis; 193 topic; 192 training; 188 vector; 185 sentence; 180 function; 175 label; 174 time; 174 similarity; 166 claim; 165 relevance; 157 product; 157 premise; 155 sentiment

Top 50 proper nouns; "What are the names of persons or places?"
---------------------------------------------------------------
161 al; 134 q; 132 et; 118 IR; 103 Sect; 94 BM25; 86 Table; 84 Fig; 77 Eq; 68 j; 68 Retrieval; 65 S; 63 Lucene; 61 u; 60 Information; 60 English; 56 k; 54 t; 52 K; 51 Twitter; 51 COLTR; 49 D; 48 T; 47 Bantu; 45 .; 42 d; 42 BERT; 41 s; 41 i; 41 DOI; 39 TREC; 39 Neural; 37 m; 37 TransRev; 36 sha; 33 c; 33 Task; 32 M; 31 L; 30 eRisk; 30 C; 29 VRSS; 29 F; 28 y; 28 w; 28 BC; 27 LSTM; 27 CNN; 27 A; 26 Wikipedia

Top 50 personal pronouns; "To whom are things referred?"
--------------------------------------------------------
1842 we; 428 it; 214 they; 207 i; 96 them; 22 one; 20 us; 12 you; 9 he; 6 itself; 4 u; 4 ours; 4 me; 3 she; 3 s; 3 ourselves; 2 themselves; 2 ndcg@10; 2 him; 2 's; 1 Π; 1 her; 1 f

Top 50 lemmatized verbs; "What do things do?"
---------------------------------------------
4101 be; 1083 use; 695 have; 466 base; 347 learn; 304 show; 300 propose; 227 do; 219 consider; 207 provide; 195 give; 173 follow; 170 generate; 166 make; 163 train; 160 include; 145 rank; 138 evaluate; 135 compare; 134 set; 131 find; 127 compute; 124 contain; 120 embed; 117 define; 116 describe; 115 perform; 114 obtain; 109 support; 107 retrieve; 107 represent; 106 identify; 103 see; 100 introduce; 94 predict; 93 improve; 92 relate; 92 present; 92 exist; 91 take; 91 apply; 89 extract; 87 select; 87 combine; 82 focus; 80 capture; 79 report; 78 require; 77 outperform; 77 need

Top 50 lemmatized adjectives and adverbs; "How are things described?"
---------------------------------------------------------------------
482 not; 403 -; 282 different; 282 also; 269 more; 244 such; 224 other; 222 only; 206 well; 199 first; 197 same; 193 neural; 168 semantic; 161 then; 160 large; 152 new; 151 however; 144 similar; 135 e.g.; 134 relevant; 131 most; 130 good; 128 deep; 117 previous; 116 high; 112 social; 109 therefore; 104 specific; 104 available; 103 non; 101 multi; 97 as; 96 long; 95 many; 92 single; 83 modal; 83 cross; 82 online; 81 original; 79 second; 78 several; 78 common; 75 multiple; 74 important; 74 further; 73 simple; 71 local; 70 very; 70 standard; 70 small

Top 50 lemmatized superlative adjectives; "How are things described to the extreme?"
------------------------------------------------------------------------------------
84 good; 48 most; 19 least; 13 wide; 11 high; 10 Most; 7 near; 6 large; 6 bad; 4 long; 3 small; 3 old; 3 low; 3 late; 3 close; 2 big; 1 weak; 1 topmost; 1 strong; 1 slow; 1 slight; 1 simple; 1 short; 1 hard; 1 great; 1 fast; 1 easy; 1 early; 1 Least; 1 ImageNet; 1 -which; 1 -there; 1 -d

Top 50 lemmatized superlative adverbs; "How are actions described to the extreme?"
----------------------------------------------------------------------------------
83 most; 15 least; 8 well; 1 widest

Top 50 Internet domains; "What Webbed places are alluded to in this corpus?"
----------------------------------------------------------------------------
7 github.com; 2 neural-ir-explorer.ec.tuwien.ac.at; 1 www.gurobi.com; 1 slidewiki.org; 1 ielab

Top 50 URLs; "What is hyperlinked from this corpus?"
----------------------------------------------------
2 http://neural-ir-explorer.ec.tuwien.ac.at/; 2 http://github.com/david-morris/SlideImages/; 2 http://github.com/bioinformatics-ua/BioASQ; 1 http://www.gurobi.com/; 1 http://slidewiki.org/; 1 http://ielab; 1 http://github.com/ly233/Seed-Guided-Topic-Model; 1 http://github.com/Valentyn1997/kg-alignment-lessons-learned; 1 http://github.com/MaziarMF/deep-k-means

Top 50 email addresses; "Who are you gonna call?"
-------------------------------------------------
(none found)
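Every number and list above is derived from the study carrel's database file, and the carrel can be queried directly. Below is a minimal sketch of such a query in Python, assuming the carrel ships as an SQLite file; the path ('etc/reader.db') and the table and column names ('items', 'words', 'keywords') are illustrative assumptions, not the Distant Reader's documented schema.

    # Minimal sketch of querying a study carrel's database file.
    # CAUTION: the path, tables, and columns below are assumptions for
    # illustration; consult the carrel's actual schema before running.
    import sqlite3

    connection = sqlite3.connect("etc/reader.db")
    cursor = connection.cursor()

    # How big is my corpus, and how long is the average item?
    cursor.execute("SELECT COUNT(*), AVG(words) FROM items")
    count, average_words = cursor.fetchone()
    print(f"{count} items; {average_words:.0f} words on average")

    # What is my collection about? Re-create the keyword frequencies.
    cursor.execute("""
        SELECT keyword, COUNT(*) AS frequency
        FROM keywords
        GROUP BY keyword
        ORDER BY frequency DESC
        LIMIT 50""")
    for keyword, frequency in cursor.fetchall():
        print(frequency, keyword)

    connection.close()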
Top 50 positive assertions; "What sentences are in the shape of noun-verb-noun?"
--------------------------------------------------------------------------------
6 document is relevant; 3 approaches do not; 3 data using t; 3 graph embedding methods; 3 image does not; 3 reviews are not; 3 word embedding models; 3 word embedding vectors; 2 approach does not; 2 approach is better; 2 approaches are not; 2 data are available; 2 dataset containing text; 2 dataset is less; 2 embedding is then; 2 image embedding space; 2 images retrieved so; 2 information is not; 2 methods are not; 2 model is not; 2 model was able; 2 results include articles; 2 retrieval using block; 2 retrieval using monolingual; 2 users are more; 2 users do not; 2 word embedding techniques; 2 work is not; 2 work was partially; 1 approach includes verbs; 1 approach is also; 1 approach is beneficial; 1 approach is different; 1 approach is domain; 1 approach is not; 1 approach is semi; 1 approach provided results; 1 approach provides users; 1 approach use l; 1 approach was not; 1 approaches are almost; 1 approaches are common; 1 approaches are incomparable; 1 approaches are mkl; 1 approaches are often; 1 approaches are only; 1 approaches have also; 1 approaches is also; 1 approaches is lower; 1 approaches perform better

Top 50 negative assertions; "What sentences are in the shape of noun-verb-no|not-noun?"
---------------------------------------------------------------------------------------
3 reviews are not available; 2 information is not available; 1 approach does not explicitly; 1 approach is not only; 1 approach was not scalable; 1 approaches are not antagonist; 1 approaches are not probabilistic; 1 data is not available; 1 dataset is not publicly; 1 documents are not available; 1 documents containing no seed; 1 methods are not yet; 1 model was not fine; 1 queries have no lemmas; 1 result was not strictly; 1 review is not available; 1 set does not necessarily; 1 system is not able; 1 systems are not able; 1 systems is not directly; 1 text is not available; 1 users do not accurately; 1 users do not simply; 1 users were not aware; 1 work is not possible

A rudimentary bibliography
--------------------------

id = cord-020896-yrocw53j
author = Agarwal, Mansi
title = MEMIS: Multimodal Emergency Management Information System
date = 2020-03-17
keywords = damage; system; tweet
summary = We present MEMIS, a system that can be used in emergencies like disasters to identify and analyze the damage indicated by user-generated multimodal social media posts, thereby helping the disaster management groups in making informed decisions. To this end, we propose MEMIS, a multimodal system capable of extracting information from social media, one that employs both images and text for identifying damage and its severity in real time (refer Sect. ). Therefore, we effectively have three models for each modality: the first for filtering the informative tweets, then one for those pertaining to the infrastructural damage (or any other category related to the relief group), and finally one for assessing the severity of the damage present. Similarly, if at least one of the text and image modalities predicts an informative tweet as containing infrastructural damage, the tweet undergoes severity analysis. Here, we use attention fusion to combine the feature interpretations from the text and image modalities for the severity analysis module [12, 26].
doi = 10.1007/978-3-030-45439-5_32
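The MEMIS entry above mentions attention fusion of text and image features for severity analysis. The following is a minimal numpy sketch of one common form of attention fusion (softmax-weighted mixing of modality vectors); the dimensions and the scoring vector are illustrative placeholders, not the paper's trained architecture.

    # Minimal sketch of attention fusion over two modality feature vectors.
    # The "learned" parameters are random placeholders; in a MEMIS-style
    # system they would be trained end-to-end with the damage classifier.
    import numpy as np

    rng = np.random.default_rng(0)
    text_features = rng.normal(size=128)   # e.g., from a text encoder
    image_features = rng.normal(size=128)  # e.g., from an image CNN

    # One scalar attention score per modality, via a shared scoring vector.
    scoring_vector = rng.normal(size=128)  # placeholder for a learned weight
    scores = np.array([scoring_vector @ text_features,
                       scoring_vector @ image_features])
    weights = np.exp(scores) / np.exp(scores).sum()  # softmax over modalities

    # Fused representation: attention-weighted sum of the modality features.
    fused = weights[0] * text_features + weights[1] * image_features
    print("modality weights:", weights.round(3), "fused shape:", fused.shape)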
id = cord-020843-cq4lbd0l
author = Almeida, Tiago
title = Calling Attention to Passages for Biomedical Question Answering
date = 2020-03-24
keywords = document; passage
summary = This paper presents a pipeline for document and passage retrieval for biomedical question answering built around a new variant of the DeepRank network model in which the recursive layer is replaced by a self-attention layer combined with a weighting mechanism. On the other hand, models such as the Deep Relevance Matching Model (DRMM) [3] or DeepRank [10] follow an interaction-based approach, in which matching signals between query and document are captured and used by the neural network to produce a ranking score. The main contribution of this work is a new variant of the DeepRank neural network architecture in which the recursive layer originally included in the final aggregation step is replaced by a self-attention layer followed by a weighting mechanism similar to the term gating layer of the DRMM. The proposed model was evaluated on the BioASQ dataset, as part of a document and passage (snippet) retrieval pipeline for biomedical question answering, achieving similar retrieval performance when compared to more complex network architectures.
doi = 10.1007/978-3-030-45442-5_9
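For reference, the self-attention layer named in the Almeida entry follows the standard scaled dot-product formulation, Attention(Q, K, V) = softmax(QK^T / sqrt(d))V. A generic numpy sketch of that computation (random placeholder projections, not the paper's exact layer):

    # Generic scaled dot-product self-attention over a sequence of vectors.
    # Projection matrices are random placeholders for learned weights.
    import numpy as np

    rng = np.random.default_rng(1)
    d = 64
    sequence = rng.normal(size=(10, d))  # e.g., 10 matching-signal vectors

    W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))
    Q, K, V = sequence @ W_q, sequence @ W_k, sequence @ W_v

    logits = Q @ K.T / np.sqrt(d)
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax

    attended = weights @ V  # each position is a weighted mix of all positions
    print(attended.shape)   # (10, 64)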
id = cord-020880-m7d4e0eh
author = Barrón-Cedeño, Alberto
title = CheckThat! at CLEF 2020: Enabling the Automatic Identification and Verification of Claims in Social Media
date = 2020-03-24
keywords = claim; task
summary = Task 3 asks to retrieve text snippets from a given set of Web pages that would be useful for verifying a target tweet's claim. Finally, the lab offers a fifth task that asks to predict the check-worthiness of the claims made in English political debates and speeches. Task 3 is defined as follows: Given a check-worthy claim on a specific topic and a set of text snippets extracted from potentially-relevant webpages, return a ranked list of all evidence snippets for the claim. Once we acquire annotations for Task 1, we share with participants the Web pages and text snippets from them solely for the check-worthy claims, which would enable the start of the evaluation cycle for Task 3. Task 4 is defined as follows: Given a check-worthy claim on a specific topic and a set of potentially-relevant Web pages, predict the veracity of the claim.
doi = 10.1007/978-3-030-45442-5_65

id = cord-020912-tbq7okmj
author = Batra, Vishwash
title = Variational Recurrent Sequence-to-Sequence Retrieval for Stepwise Illustration
date = 2020-03-17
keywords = VRSS; image; text
summary = We evaluate the model for the application of stepwise illustration of recipes, where a sequence of relevant images is retrieved to best match the steps described in the text. More concretely, we incorporate the global context information encoded in the entire text sequence (through the attention mechanism) into a variational autoencoder (VAE) at each time step, which converts the input text into an image representation in the image embedding space. To capture the semantics of the images retrieved so far (in a story/recipe), we assume the prior of the distribution of the topic given the text input follows the distribution conditional on the latent topic from the previous time step. We propose a new variational recurrent seq2seq (VRSS) retrieval model for seq2seq retrieval, which employs temporally-dependent latent variables to capture the sequential semantic structure of text-image sequences. Our work is related to: cross-modal retrieval, story picturing, variational recurrent neural networks, and cooking recipe datasets.
doi = 10.1007/978-3-030-45439-5_4

id = cord-020814-1ty7wzlv
author = Berrendorf, Max
title = Knowledge Graph Entity Alignment with Graph Convolutional Networks: Lessons Learned
date = 2020-03-24
keywords = GCN; entity
summary = In this work, we focus on the problem of entity alignment in Knowledge Graphs (KG) and we report on our experiences when applying a Graph Convolutional Network (GCN) based model for this task. Graph Convolutional Networks (GCN) [7, 9], which have recently become increasingly popular, are at the core of state-of-the-art methods for entity alignment in KGs [3, 6, 22, 24, 27]. We investigate the reproducibility of the published results of a recent GCN-based method for entity alignment and uncover differences between the method's description in the paper and the authors' implementation. Overview of used datasets with their sizes in the number of triples (edges), entities (nodes), relations (different edge types) and alignments. GCN-Align [22] is a GCN-based approach to embed all entities from both graphs into a common embedding space. Semi-supervised entity alignment via knowledge graph embedding with awareness of degree difference; entity alignment between knowledge graphs using attribute embeddings.
doi = 10.1007/978-3-030-45442-5_1
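The matching step behind a GCN-Align-style method, once both graphs are embedded into a common space, reduces to nearest-neighbour search. A minimal sketch of that step, with random placeholder embeddings standing in for the GCN output:

    # Minimal sketch of the alignment step on top of precomputed entity
    # embeddings from two knowledge graphs (random placeholders here).
    import numpy as np

    rng = np.random.default_rng(2)
    kg1 = rng.normal(size=(5, 32))  # entities of graph 1, embedded
    kg2 = rng.normal(size=(7, 32))  # entities of graph 2, embedded

    # Cosine similarity between every cross-graph pair of entities.
    kg1_norm = kg1 / np.linalg.norm(kg1, axis=1, keepdims=True)
    kg2_norm = kg2 / np.linalg.norm(kg2, axis=1, keepdims=True)
    similarity = kg1_norm @ kg2_norm.T  # shape (5, 7)

    # Align each graph-1 entity with its nearest graph-2 neighbour.
    for e1, e2 in enumerate(similarity.argmax(axis=1)):
        print(f"kg1 entity {e1} -> kg2 entity {e2} "
              f"(cosine {similarity[e1, e2]:.3f})")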
id = cord-020890-aw465igx
author = Brochier, Robin
title = Inductive Document Network Embedding with Topic-Word Attention
date = 2020-03-17
keywords = document; topic; word
summary =
doi = 10.1007/978-3-030-45439-5_22

id = cord-020808-wpso3jug
author = Cardoso, João
title = Machine-Actionable Data Management Plans: A Knowledge Retrieval Approach to Automate the Assessment of Funders' Requirements
date = 2020-03-24
keywords = dmp; ontology
summary = In order to guide researchers through the process of managing their data, many funding agencies (e.g., the National Science Foundation (NSF), the European Commission (EC), or the Fundação para a Ciência e Tecnologia (FCT)) have created and published their own open access policies, as well as requiring that any grant proposals be accompanied by a Data Management Plan (DMP). The DMP is a document describing the techniques, methods and policies on how data from a research project is to be created or collected, documented, accessed, preserved and disseminated. The second part comprises the execution of the following four tasks and results in both the collection of the necessary mappings between the ontology and the identified DMP templates, and the creation of DL queries based on the funders' requirements. The DMP Common Standard Ontology (DCSO) was created with the objective of providing an implementation of the DMP Common Standards model expressed through the usage of semantic technology, which has been considered a possible solution in the data management and preservation domains [9].
doi = 10.1007/978-3-030-45442-5_15

id = cord-020908-oe77eupc
author = Chen, Zhiyu
title = Leveraging Schema Labels to Enhance Dataset Search
date = 2020-03-17
keywords = dataset; label; schema
summary =
doi = 10.1007/978-3-030-45439-5_18

id = cord-020899-d6r4fr9r
author = Doinychko, Anastasiia
title = Biconditional Generative Adversarial Networks for Multiview Learning with Missing Views
date = 2020-03-17
keywords = Cond; view
summary = In this paper, we present a conditional GAN with two generators and a common discriminator for multiview learning problems where observations have two views, but one of them may be missing for some of the training samples. We address the problem of multiview learning with Generative Adversarial Networks (GANs) in the case where some observations may have missing views without there being an external resource to complete them. We demonstrate that generated views allow us to achieve state-of-the-art results on a subset of Reuters RCV1/RCV2 collections compared to multiview approaches that rely on Machine Translation (MT) for translating documents, before training the models, into languages in which their versions do not exist. We achieve state-of-the-art performance compared to multiview approaches that rely on external view-generating functions on multilingual document classification, which is another challenging application beyond image analysis, the domain of choice for the design of new GAN models.
doi = 10.1007/978-3-030-45439-5_53

id = cord-020914-7p37m92a
author = Dumani, Lorik
title = A Framework for Argument Retrieval: Ranking Argument Clusters by Frequency and Specificity
date = 2020-03-17
keywords = claim; premise
summary = From an information retrieval perspective, an interesting task within this setting is finding the best supporting (pro) and attacking (con) premises for a given query claim from a large corpus of arguments [31]. Given a user's keyword query, the system retrieves, ranks, and presents premises supporting and attacking the query, taking into account the similarity of the query with the premise, its corresponding claim, and other contextual information. We assume that we work with a large corpus of argumentative text, for example collections of political speeches or forum discussions, that has already been mined and transferred into claims with the corresponding premises and stances. We consider the following problem: Given a controversial claim or topic, for example "We should abandon fossil fuels", a user searches for the most important premises from the corpus supporting or attacking it.
doi = 10.1007/978-3-030-45439-5_29

id = cord-020916-ds0cf78u
author = Fard, Mazar Moradi
title = Seed-Guided Deep Document Clustering
date = 2020-03-17
keywords = SD2C; seed; word
summary = The main contributions of this study can be summarized as follows: (a) we introduce the Seed-guided Deep Document Clustering (SD2C) framework, the first attempt, to the best of our knowledge, to constrain clustering with seed words based on a deep clustering approach; and (b) we validate this framework through experiments based on automatically selected seed words on five publicly available text datasets with various sizes and characteristics.
The constrained clustering problem we are addressing in fact bears strong similarity to that of seed-guided dataless text classification, which consists in categorizing documents based on a small set of seed words describing the classes/clusters. This can be done by enforcing that seed words have more influence either on the learned document embeddings, a solution we refer to as SD2C-Doc, or on the cluster representatives, a solution we refer to as SD2C-Rep. Note that the second solution can only be used when the clustering process is based on cluster representatives (i.e., $R = \{r_k\}_{k=1}^{K}$ with $K$ the number of clusters), which is indeed the case for most current deep clustering methods [1].
doi = 10.1007/978-3-030-45439-5_1

id = cord-020888-ov2lzus4
author = Formal, Thibault
title = Learning to Rank Images with Cross-Modal Graph Convolutions
date = 2020-03-17
keywords = PRF; image; model
summary = While most of the current approaches for cross-modal retrieval revolve around learning how to represent text and images in a shared latent space, we take a different direction: we propose to generalize the cross-modal relevance feedback mechanism, a simple yet effective unsupervised method that relies on standard information retrieval heuristics and the choice of a few hyper-parameters. The model can be understood very simply: similarly to PRF methods in standard information retrieval, the goal is to boost images that are visually similar to top images (from a text point of view), i.e. images that are likely to be relevant to the query but were initially badly ranked (which is likely to happen in the web scenario, where text is crawled from the source page and can be very noisy).
doi = 10.1007/978-3-030-45439-5_39
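The cross-modal relevance feedback mechanism described in the Formal entry can be sketched in a few lines: mix each image's initial text-based score with its visual similarity to the current top-ranked images. The mixing weight, feedback depth, and input matrices below are illustrative assumptions.

    # Minimal sketch of cross-modal pseudo-relevance feedback re-ranking.
    import numpy as np

    rng = np.random.default_rng(3)
    n_images = 100
    text_scores = rng.random(n_images)             # initial text-based scores
    visual_sim = rng.random((n_images, n_images))  # pairwise visual similarity
    visual_sim = (visual_sim + visual_sim.T) / 2   # make it symmetric

    k, alpha = 10, 0.7                    # feedback depth and mixing weight
    top_k = np.argsort(-text_scores)[:k]  # current top images by text score

    # Feedback score: average visual similarity to the top-k images.
    feedback = visual_sim[:, top_k].mean(axis=1)
    final_scores = alpha * text_scores + (1 - alpha) * feedback

    print("new top 5 image ids:", np.argsort(-final_scores)[:5])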
id = cord-020901-aew8xr6n
author = García-Durán, Alberto
title = TransRev: Modeling Reviews as Translations from Users to Items
date = 2020-03-17
keywords = embedding; review; user
summary =
doi = 10.1007/978-3-030-45439-5_16

id = cord-020830-97xmu329
author = Ghanem, Bilal
title = Irony Detection in a Multilingual Context
date = 2020-03-24
keywords = arabic; irony
summary = We show that these monolingual models trained separately on different languages using multilingual word representation or text-based features can open the door to irony detection in languages that lack annotated data for irony. We aim here to bridge the gap by tackling ID in tweets from both multilingual (French, English and Arabic) and multicultural perspectives (Indo-European languages whose speakers share quite the same cultural background vs. ). We can justify that by the fact that the language of the Arabic and French tweets is quite informal and has many dialect words that may not exist in the pretrained embeddings we used, compared to the English ones (lower embedding coverage ratio), which makes it harder for the CNN to learn a clear semantic pattern. The CNN architecture trained on cross-lingual word representation shows that irony has a certain similarity between the languages we targeted despite the cultural differences, which confirms that irony is a universal phenomenon, as already shown in previous linguistic studies [9, 24, 35].
doi = 10.1007/978-3-030-45442-5_18

id = cord-020834-ch0fg9rp
author = Grand, Adrien
title = From MAXSCORE to Block-Max Wand: The Story of How Lucene Significantly Improved Query Evaluation Performance
date = 2020-03-24
keywords = Lucene; Suel
summary = We share the story of how an innovation that originated from academia (block-max indexes and the corresponding block-max Wand query evaluation algorithm of Ding and Suel [6]) made its way into the open-source Lucene search library. We see this paper as having two main contributions beyond providing a narrative of events: first, we report results of experiments that attempt to match the original conditions of Ding and Suel [6] and present additional results on a number of standard academic IR test collections. Support for block-max indexes was the final feature that was implemented, based on the developers' reading of the paper by Ding and Suel [6], which required invasive changes to Lucene's index format. The story of block-max Wand in Lucene provides a case study of how an innovation that originated in academia made its way into the world's most widely-used search library and achieved significant impact in the "real world" through hundreds of production deployments worldwide (if we consider the broader Lucene ecosystem, which includes systems such as Elasticsearch and Solr).
doi = 10.1007/978-3-030-45442-5_3

id = cord-020835-n9v5ln2i
author = Jangra, Anubhav
title = Text-Image-Video Summary Generation Using Joint Integer Linear Programming
date = 2020-03-24
keywords = ILP; image
summary =
doi = 10.1007/978-3-030-45442-5_24

id = cord-020815-j9eboa94
author = Kamphuis, Chris
title = Which BM25 Do You Mean? A Large-Scale Reproducibility Study of Scoring Variants
date = 2020-03-24
keywords = BM25; Lucene
summary = Experiments on three newswire collections show that there are no significant effectiveness differences between them, including Lucene's often maligned approximation of document length. Although learning-to-rank approaches and neural ranking models are widely used today, they are typically deployed as part of a multi-stage reranking architecture, over candidate documents supplied by a simple term-matching method using traditional inverted indexes [1]. Our goal is a large-scale reproducibility study to explore the nuances of different variants of BM25 and their impact on retrieval effectiveness. Their findings are confirmed: effectiveness differences in IR experiments are unlikely to be the result of the choice of BM25 variant a system implements. We implemented a variant that uses exact document lengths, but is otherwise identical to the Lucene default. Storing exact document lengths would allow different ranking functions to be swapped at query time more easily, as no information would be discarded at index time.
doi = 10.1007/978-3-030-45442-5_4
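For orientation, the variants compared by Kamphuis et al. all share the textbook BM25 skeleton and differ in details such as the idf formulation and how document length is stored. A sketch of one common per-term scoring function (a Lucene-style idf with +1 inside the log; the parameters and statistics are toy values):

    # Sketch of a textbook BM25 per-term score. BM25 variants (including
    # Lucene's) differ mainly in the idf form and document-length handling,
    # which is exactly what the reproducibility study compares.
    import math

    def bm25_term(tf, doc_len, avg_doc_len, df, n_docs, k1=1.2, b=0.75):
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)  # Lucene-style
        norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
        return idf * norm

    # Toy statistics: a term occurring 3 times in a 150-word document, in a
    # 10,000-document collection where 200 documents contain the term.
    score = bm25_term(tf=3, doc_len=150, avg_doc_len=250, df=200, n_docs=10_000)
    print(f"BM25 contribution of this term: {score:.3f}")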
id = cord-020806-lof49r72
author = Landin, Alfonso
title = Novel and Diverse Recommendations by Leveraging Linear Models with User and Item Embeddings
date = 2020-03-24
keywords = item; recommendation
summary = In this paper, we present EER, a linear model for the top-N recommendation task, which takes advantage of user and item embeddings to improve novelty and diversity without harming accuracy. In this paper, we propose a method to augment an existing linear recommendation model to make more diverse and novel recommendations, while maintaining similar accuracy. Experiments conducted on three datasets show that our proposal outperforms the original model in both novelty and diversity while maintaining similar levels of accuracy. On the other hand, as the results in Table 3 show, ELP is able to provide good figures in novelty and diversity, thanks to the embedding model capturing non-linear relations between users and items. It is common in the field of recommender systems for methods with lower accuracy to have higher values in diversity and novelty. FISM: factored item similarity models for top-n recommender systems.
doi = 10.1007/978-3-030-45442-5_27

id = cord-020794-d3oru1w5
author = Leekha, Maitree
title = A Multi-task Approach to Open Domain Suggestion Mining Using Language Model for Text Over-Sampling
date = 2020-03-24
keywords = LMOTE
summary = In this work, we introduce a novel over-sampling technique to address the problem of class imbalance, and propose a multi-task deep learning approach for mining suggestions from multiple domains. Experimental results on a publicly available dataset show that our over-sampling technique, coupled with the multi-task framework, outperforms state-of-the-art open domain suggestion mining models in terms of the F-1 measure and AUC. In our study, we generate synthetic positive reviews until the number of suggestion and non-suggestion class samples becomes equal in the training set. All comparisons have been made in terms of the F-1 score of the suggestion class for a fair comparison with prior work on representational learning for open domain suggestion mining [5] (see Baseline in Table 3). In this work, we proposed a multi-task learning framework for open domain suggestion mining along with a novel language-model-based over-sampling technique for text (LMOTE).
doi = 10.1007/978-3-030-45442-5_28

id = cord-020851-hf5c0i9z
author = Losada, David E.
title = eRisk 2020: Self-harm and Depression Challenges
date = 2020-03-24
keywords = early; task
summary =
doi = 10.1007/978-3-030-45442-5_72

id = cord-020801-3sbicp3v
author = MacAvaney, Sean
title = Teaching a New Dog Old Tricks: Resurrecting Multilingual Retrieval Using Zero-Shot Learning
date = 2020-03-24
keywords = TREC; english
summary = In this paper, we tackle the lack of data by leveraging pre-trained multilingual language models to transfer a retrieval system trained on English collections to non-English queries and documents. Our models are evaluated in a zero-shot setting, meaning that we use them to predict relevance scores for query-document pairs in languages never seen during training. [28] leveraged a data set of Wikipedia pages in 25 languages to train a learning-to-rank algorithm for Japanese-English and Swahili-English cross-language retrieval. In particular, to circumvent the lack of training data, we leverage transfer learning techniques to train Arabic, Mandarin, and Spanish retrieval models using English training data. Because large-scale relevance judgments are largely absent in languages other than English, we propose a new setting to evaluate learning-to-rank approaches: zero-shot cross-lingual ranking.
doi = 10.1007/978-3-030-45442-5_31

id = cord-020931-fymgnv1g
author = Meng, Changping
title = ReadNet: A Hierarchical Transformer Framework for Web Article Readability Analysis
date = 2020-03-17
keywords = English; feature; sentence
summary =
doi = 10.1007/978-3-030-45439-5_3

id = cord-020904-x3o3a45b
author = Montazeralghaem, Ali
title = Relevance Ranking Based on Query-Aware Context Analysis
date = 2020-03-17
keywords = query; term
summary = The primary goal of the proposed model is to combine the exact and semantic matching between query and document terms, which has been shown to produce effective performance in information retrieval. In basic retrieval models such as BM25 [30] and the language modeling framework [29], the relevance score of a document is estimated based on explicit matching of query and document terms. Finally, our proposed model for relevance ranking provides the basis for natural integration of semantic term matching and local document context analysis into any retrieval model. [13] proposed a generalized estimate of document language models using a noisy channel, which captures semantic term similarities computed using word embeddings. Note that in this experiment, we only consider methods that select expansion terms based on word embeddings and not other information sources such as the top retrieved documents for each query (PRF).
doi = 10.1007/978-3-030-45439-5_30
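The combination of exact and semantic matching described in the Montazeralghaem entry can be illustrated with a naive interpolation; the embeddings and the mixing weight below are placeholders, and the paper's actual model is considerably more involved.

    # Minimal sketch of mixing exact and semantic query-document matching.
    import numpy as np

    rng = np.random.default_rng(4)
    vocab = ["retrieval", "search", "ranking", "document"]
    embeddings = {w: rng.normal(size=16) for w in vocab}  # placeholders

    def cosine(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    def score(query_terms, doc_terms, mix=0.6):
        # Exact matching: fraction of query terms literally in the document.
        exact = sum(t in doc_terms for t in query_terms) / len(query_terms)
        # Semantic matching: best embedding similarity per query term.
        semantic = np.mean([max(cosine(embeddings[q], embeddings[d])
                                for d in doc_terms)
                            for q in query_terms])
        return mix * exact + (1 - mix) * semantic

    print(score(["retrieval", "ranking"], ["search", "document", "ranking"]))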
id = cord-020848-nypu4w9s
author = Morris, David
title = SlideImages: A Dataset for Educational Image Classification
date = 2020-03-24
keywords = dataset; image
summary = Currently, many document analysis systems are trained in part on scene images due to the lack of large datasets of educational image data. In this paper, we address this issue and present SlideImages, a dataset for the task of classifying educational illustrations. SlideImages contains training data collected from various sources, e.g., Wikimedia Commons and the AI2D dataset, and test data collected from educational slides. Born-digital and educational images need further benchmarks on challenging information retrieval tasks in order to test generalization. While document scans and born-digital educational illustrations have materially different appearance, these papers show that the utility of deep neural networks is not limited to scene image tasks (Fig. 1). The related DocFigure dataset covers similar images and has much more data than SlideImages. In this paper, we have presented the task of classifying educational illustrations and images in slides and introduced a novel dataset, SlideImages.
doi = 10.1007/978-3-030-45442-5_36

id = cord-020811-pacy48qx
author = Muhammad, Shamsuddeen Hassan
title = Incremental Approach for Automatic Generation of Domain-Specific Sentiment Lexicon
date = 2020-03-24
keywords = lexicon
summary = To this end, we propose an approach to automatically generate a domain-specific sentiment lexicon using a vector model enriched by weights. Although research has been carried out on corpus-based approaches for the automatic generation of a domain-specific lexicon [1, 4, 5, 7, 9, 10, 14], existing approaches focused on the creation of a lexicon from a single corpus [4]. To this end, this work proposes an incremental approach for the automatic generation of a domain-specific sentiment lexicon. We aim to investigate an incremental technique for automatically generating a domain-specific sentiment lexicon from a corpus. Can we automatically generate a sentiment lexicon from a corpus and improve on the existing approaches? After detecting the domain shift, we merge the distributions using an approach similar to the one discussed above (in updating using the same corpus) and generate the lexicon.
doi = 10.1007/978-3-030-45442-5_81

id = cord-020918-056bvngu
author = Nchabeleng, Mathibele
title = Evaluating the Effectiveness of the Standard Insights Extraction Pipeline for Bantu Languages
date = 2020-03-17
keywords = Bantu; Runyankore; language
summary =
doi = 10.1007/978-3-030-45439-5_11

id = cord-020832-iavwkdpr
author = Nguyen, Dat Quoc
title = ChEMU: Named Entity Recognition and Event Extraction of Chemical Reactions from Patents
date = 2020-03-24
keywords = chemical; patent
summary = ChEMU involves two key information extraction tasks over chemical reactions from patents. In this paper, we propose a new evaluation lab (called ChEMU) focusing on information extraction over chemical reactions from patents. Our goals are: (1) to develop tasks that impact chemical research in both academia and industry, (2) to provide the community with a new dataset of chemical entities, enriched with relational links between chemical event triggers and arguments, and (3) to advance the state-of-the-art in information extraction over chemical patents. The ChEMU lab at CLEF-2020 offers the two information extraction tasks of named entity recognition (Task 1) and event extraction (Task 2) over chemical reactions from patent documents.
doi = 10.1007/978-3-030-45442-5_74

id = cord-020820-cbikq0v0
author = Papadakos, Panagiotis
title = Dualism in Topical Relevance
date = 2020-03-24
keywords = query; user
summary = To this end, in this paper we elaborate on the idea of leveraging the available antonyms of the original query terms (if they exist) for eventually producing an answer which provides a better overview of the related conceptual and information space. In their comments for these queries, users mention that the selected (i.e., dual) list "provides a more general picture" and "more relevant and interesting results, although contradicting". For the future, we plan to define the appropriate antonym selection algorithms and relevance metrics, implement the proposed functionality in a meta-search setting, and conduct a large-scale evaluation with real users over exploratory tasks, to identify for which queries the dual approach is beneficial and for what types of users.
doi = 10.1007/978-3-030-45442-5_40
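The dualism idea above, querying for the antonyms of the original terms alongside the terms themselves, can be sketched with WordNet; the query-rewriting policy here is only an illustration, not the authors' algorithm.

    # Minimal sketch of collecting WordNet antonyms for a "dual" query.
    # Requires the NLTK WordNet data: nltk.download("wordnet")
    from nltk.corpus import wordnet as wn

    def antonyms(term):
        """Collect antonym lemmas of a term across all its WordNet senses."""
        found = set()
        for synset in wn.synsets(term):
            for lemma in synset.lemmas():
                found.update(ant.name() for ant in lemma.antonyms())
        return found

    query = ["abandon", "fossil", "fuels"]
    dual_terms = {term: antonyms(term) for term in query}
    print(dual_terms)  # antonyms found per query term, possibly empty sets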
id = cord-020909-n36p5n2k
author = Papadakos, Panagiotis
title = bias goggles: Graph-Based Computation of the Bias of Web Domains Through the Eyes of Users
date = 2020-03-17
keywords = biased; domain
summary =
- the bias goggles model for computing the bias characteristics of web domains for a user-defined concept, based on the notions of Biased Concepts (BCs), Aspects of Bias (ABs), and the metrics of the support of the domain for a specific AB and BC, and its bias score for this BC;
- the introduction of the Support Flow Graph (SFG), along with graph-based algorithms for computing the AB support score of domains, which include adaptations of the Independence Cascade (IC) and Linear Threshold (LT) propagation models, and the new Biased-PageRank (Biased-PR) variation that models different behaviours of a biased surfer;
- an initial discussion about performance and implementation issues;
- some promising evaluation results that showcase the effectiveness and efficiency of the approach on a relatively small dataset of crawled pages, using the new AGBR and AGS metrics;
- a publicly accessible prototype of bias goggles.
doi = 10.1007/978-3-030-45439-5_52
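Of the graph algorithms listed in the bias goggles entry, the Biased-PR variation is the easiest to picture: a personalized PageRank whose surfer teleports to a seed set of biased domains rather than to all nodes uniformly. A generic sketch of that core idea (toy graph; the paper's specific surfer behaviours are not reproduced):

    # Minimal personalized-PageRank sketch: the random surfer teleports
    # only to a seed set of "biased" domains. Graph and seeds are toys.
    import numpy as np

    # Column-stochastic transitions: entry [j, i] is P(move to j | at i).
    links = np.array([
        [0.0, 0.5, 0.0, 0.0],
        [1.0, 0.0, 0.5, 0.0],
        [0.0, 0.5, 0.0, 1.0],
        [0.0, 0.0, 0.5, 0.0],
    ])
    seeds = np.array([1.0, 0.0, 0.0, 0.0])  # teleport only to domain 0
    damping = 0.85

    rank = np.full(4, 0.25)
    for _ in range(100):  # power iteration to (approximate) convergence
        rank = damping * links @ rank + (1 - damping) * seeds

    print(rank.round(3))  # a support-like score for each domain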
id = cord-020871-1v6dcmt3
author = Papariello, Luca
title = On the Replicability of Combining Word Embeddings and Retrieval Models
date = 2020-03-24
keywords = Fisher; model
summary =
doi = 10.1007/978-3-030-45442-5_7

id = cord-020905-gw8i6tkn
author = Qu, Xianshan
title = An Attention Model of Customer Expectation to Improve Review Helpfulness Prediction
date = 2020-03-17
keywords = attention; product; review
summary = To model such customer expectations and capture important information from a review text, we propose a novel neural network which leverages review sentiment and product information. In order to address the above issues, we propose a novel neural network architecture to introduce sentiment and product information when identifying helpful content from a review text. In the cold-start scenario, our proposed model demonstrates an AUC improvement of 5.4% and 1.5% on the Amazon and Yelp data sets, respectively, when compared to the state-of-the-art model. From Table 5, we see that adding a sentiment attention layer (HSA) to the base model (HBiLSTM) results in an average improvement in the AUC score of 2.0% and 2.6%, respectively, on the Amazon and Yelp data sets. In this paper, we describe our analysis of review helpfulness prediction and propose a novel neural network model with attention modules to incorporate sentiment and product information.
doi = 10.1007/978-3-030-45439-5_55

id = cord-020872-frr8xba6
author = Santosh, Tokala Yaswanth Sri Sai
title = DAKE: Document-Level Attention for Keyphrase Extraction
date = 2020-03-24
keywords = CRF; document
summary =
doi = 10.1007/978-3-030-45442-5_49

id = cord-020936-k1upc1xu
author = Sanz-Cruzado, Javier
title = Axiomatic Analysis of Contact Recommendation Methods in Social Networks: An IR Perspective
date = 2020-03-17
keywords = BM25; user
summary =
doi = 10.1007/978-3-030-45439-5_12

id = cord-020885-f667icyt
author = Sharma, Ujjwal
title = Semantic Path-Based Learning for Review Volume Prediction
date = 2020-03-17
keywords = graph; network; node
summary = In this work, we present an approach that uses semantically meaningful, bimodal random walks on real-world heterogeneous networks to extract correlations between nodes and bring together nodes with shared or similar attributes. We propose a novel method that incorporates restaurants and their attributes into a multimodal graph and extracts multiple bimodal low-dimensional representations for restaurants based on available paths through shared visual, textual, geographical and categorical features. In this section, we discuss prior work that leverages graph-based structures for extracting information from multiple modalities, focusing on the auto-captioning task that introduced such methods. For each of these sub-networks, we perform random walks and use a variant of the heterogeneous skip-gram objective introduced in [6] to generate low-dimensional bimodal embeddings. Our attention-based model combines separately learned bimodal embeddings using a late-fusion setup for predicting the review volume of the restaurants.
doi = 10.1007/978-3-030-45439-5_54

id = cord-020875-vd4rtxmz
author = Suwaileh, Reem
title = Time-Critical Geolocation for Social Good
date = 2020-03-24
keywords = LMP; location
summary = To address this problem, I aim to exploit different techniques such as training neural models, enriching the tweet representation, and studying methods to mitigate the lack of labeled data. In my work, I am interested in tackling the Location Mention Prediction (LMP) problem during time-critical situations. The location taggers have to address many challenges, including microblogging-specific challenges (e.g., tweet sparsity, noisiness, rapidly changing streams, hashtag riding, etc.) and task-specific challenges (e.g., time-criticality of the solution, scarcity of labeled data, etc.). Alternatively, Sultanik and Fink [25] used an Information Retrieval (IR) based approach to identify the location mentions in tweets. Moreover, Hoang and Mothe [8] combined syntactic and semantic features to train traditional ML-based models, whereas Kumar and Singh [13] trained a Convolutional Neural Network (CNN) model that learns a continuous representation of tweet text and then identifies the location mentions.
doi = 10.1007/978-3-030-45442-5_82

id = cord-020903-qt0ly5d0
author = Tamine, Lynda
title = What Can Task Teach Us About Query Reformulations?
date = 2020-03-17
keywords = session; task
summary = Task-based sessions represent significantly different background contexts to be used in the perspective of better understanding users' query reformulations. Using insights from large-scale search logs, our findings clearly show that the task is an additional relevant search unit that helps better understand users' query reformulation patterns and predict the next user query. To design support processes for task-based search systems, we argue that we need to: (1) fully understand how the user's task, performed in natural settings, drives query reformulation changes; and (2) gauge the level of similarity of these change trends with those observed in time-based sessions. With this in mind, we perform large-scale log analyses of users naturally engaged in tasks to examine query reformulations from both the time-based session and the task perspectives. To identify query reformulation patterns, most previous works used large-scale log analyses segmented into time-based sessions.
doi = 10.1007/978-3-030-45439-5_42

id = cord-020891-lt3m8h41
author = Witschel, Hans Friedrich
title = KvGR: A Graph-Based Interface for Explorative Sequential Question Answering on Heterogeneous Information Sources
date = 2020-03-17
keywords = graph; question; user
summary =
doi = 10.1007/978-3-030-45439-5_50

id = cord-020932-o5scqiyk
author = Zhong, Wei
title = Accelerating Substructure Similarity Search for Formula Retrieval
date = 2020-03-17
keywords = list; query
summary = In text similarity search, query processing can be accelerated through dynamic pruning [18], which typically estimates score upper bounds to prune documents unlikely to be in the top K results. As a result, the posting list entry also stores the root node ID for indexed paths, in order to reconstruct matched substructures at merge time. Define the partial upperbound matrix $W = \{w_{i,j}\}_{|T_q| \times |\mathcal{T}|}$, where $\mathcal{T} = \{T(m), m \in T_q\}$ are all the token paths from the query OPT ($\mathcal{T}$ is essentially the same as the tokenized $P(T_q)$), and a binary variable $x_{|\mathcal{T}| \times 1}$ indicating which corresponding posting lists are placed in the non-requirement set. We have presented rank-safe dynamic pruning strategies that produce an upperbound estimation of structural similarity in order to speed up formula search using subtree matching. Our dynamic pruning strategies and specialized inverted index are different from traditional linear text search pruning methods, and they further associate the query structure representation with posting lists.
doi = 10.1007/978-3-030-45439-5_47

id = cord-020927-89c7rijg
author = Zhuang, Shengyao
title = Counterfactual Online Learning to Rank
date = 2020-03-17
keywords = COLTR; DBGD; ranker
summary =
doi = 10.1007/978-3-030-45439-5_28

id = cord-020846-mfh1ope6
author = Zlabinger, Markus
title = DSR: A Collection for the Evaluation of Graded Disease-Symptom Relations
date = 2020-03-24
keywords = disease; symptom
summary =
doi = 10.1007/978-3-030-45442-5_54
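Finally, the rank-safe dynamic pruning strategy in the Zhong entry above follows a familiar pattern: fully score a candidate only if a cheap upper bound on its score can still beat the current k-th best. A minimal sketch of that pattern with toy scores (Zhong et al. apply it to substructure similarity over math formulas):

    # Minimal sketch of rank-safe dynamic pruning with a top-k heap.
    import heapq

    def top_k(candidates, k):
        """candidates: iterable of (doc_id, upper_bound, exact_score_fn)."""
        heap = []  # min-heap of the best k exact scores seen so far
        for doc_id, upper_bound, exact_score_fn in candidates:
            threshold = heap[0][0] if len(heap) == k else float("-inf")
            if upper_bound <= threshold:
                continue              # rank-safe skip: cannot enter the top k
            score = exact_score_fn()  # expensive scoring, done only rarely
            if len(heap) < k:
                heapq.heappush(heap, (score, doc_id))
            elif score > heap[0][0]:
                heapq.heapreplace(heap, (score, doc_id))
        return sorted(heap, reverse=True)

    # Toy candidates whose exact score is 90% of the (cheap) upper bound.
    toy = [(i, ub, lambda s=ub: s * 0.9)
           for i, ub in enumerate([3.2, 1.1, 4.0, 0.5])]
    print(top_k(toy, k=2))  # ~ [(3.6, 2), (2.88, 0)]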