Carrel name: journal-advancesInInformationRetrieval-cord
Creating study carrel named journal-advancesInInformationRetrieval-cord
Initializing database

file: cache/cord-020793-kgje01qy.json key: cord-020793-kgje01qy authors: Suominen, Hanna; Kelly, Liadh; Goeuriot, Lorraine; Krallinger, Martin title: CLEF eHealth Evaluation Lab 2020 date: 2020-03-24 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45442-5_76 sha: doc_id: 20793 cord_uid: kgje01qy
file: cache/cord-020794-d3oru1w5.json key: cord-020794-d3oru1w5 authors: Leekha, Maitree; Goswami, Mononito; Jain, Minni title: A Multi-task Approach to Open Domain Suggestion Mining Using Language Model for Text Over-Sampling date: 2020-03-24 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45442-5_28 sha: doc_id: 20794 cord_uid: d3oru1w5
file: cache/cord-020811-pacy48qx.json key: cord-020811-pacy48qx authors: Muhammad, Shamsuddeen Hassan; Brazdil, Pavel; Jorge, Alípio title: Incremental Approach for Automatic Generation of Domain-Specific Sentiment Lexicon date: 2020-03-24 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45442-5_81 sha: doc_id: 20811 cord_uid: pacy48qx
file: cache/cord-020820-cbikq0v0.json key: cord-020820-cbikq0v0 authors: Papadakos, Panagiotis; Kalipolitis, Orfeas title: Dualism in Topical Relevance date: 2020-03-24 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45442-5_40 sha: doc_id: 20820 cord_uid: cbikq0v0
file: cache/cord-020885-f667icyt.json key: cord-020885-f667icyt authors: Sharma, Ujjwal; Rudinac, Stevan; Worring, Marcel; Demmers, Joris; van Dolen, Willemijn title: Semantic Path-Based Learning for Review Volume Prediction date: 2020-03-17 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45439-5_54 sha: doc_id: 20885 cord_uid: f667icyt
file: cache/cord-020896-yrocw53j.json key: cord-020896-yrocw53j authors: Agarwal, Mansi; Leekha, Maitree; Sawhney, Ramit; Ratn Shah, Rajiv; Kumar Yadav, Rajesh; Kumar Vishwakarma, Dinesh title: MEMIS: Multimodal Emergency Management Information System date: 2020-03-17 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45439-5_32 sha: doc_id: 20896 cord_uid: yrocw53j
file: cache/cord-020834-ch0fg9rp.json key: cord-020834-ch0fg9rp authors: Grand, Adrien; Muir, Robert; Ferenczi, Jim; Lin, Jimmy title: From MAXSCORE to Block-Max Wand: The Story of How Lucene Significantly Improved Query Evaluation Performance date: 2020-03-24 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45442-5_3 sha: doc_id: 20834 cord_uid: ch0fg9rp
file: cache/cord-020815-j9eboa94.json key: cord-020815-j9eboa94 authors: Kamphuis, Chris; de Vries, Arjen P.; Boytsov, Leonid; Lin, Jimmy title: Which BM25 Do You Mean? A Large-Scale Reproducibility Study of Scoring Variants date: 2020-03-24 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45442-5_4 sha: doc_id: 20815 cord_uid: j9eboa94
file: cache/cord-020841-40f2p3t4.json key: cord-020841-40f2p3t4 authors: Hofstätter, Sebastian; Zlabinger, Markus; Hanbury, Allan title: Neural-IR-Explorer: A Content-Focused Tool to Explore Neural Re-ranking Results date: 2020-03-24 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45442-5_58 sha: doc_id: 20841 cord_uid: 40f2p3t4
file: cache/cord-020801-3sbicp3v.json key: cord-020801-3sbicp3v authors: MacAvaney, Sean; Soldaini, Luca; Goharian, Nazli title: Teaching a New Dog Old Tricks: Resurrecting Multilingual Retrieval Using Zero-Shot Learning date: 2020-03-24 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45442-5_31 sha: doc_id: 20801 cord_uid: 3sbicp3v
file: cache/cord-020806-lof49r72.json key: cord-020806-lof49r72 authors: Landin, Alfonso; Parapar, Javier; Barreiro, Álvaro title: Novel and Diverse Recommendations by Leveraging Linear Models with User and Item Embeddings date: 2020-03-24 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45442-5_27 sha: doc_id: 20806 cord_uid: lof49r72
file: cache/cord-020843-cq4lbd0l.json key: cord-020843-cq4lbd0l authors: Almeida, Tiago; Matos, Sérgio title: Calling Attention to Passages for Biomedical Question Answering date: 2020-03-24 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45442-5_9 sha: doc_id: 20843 cord_uid: cq4lbd0l
file: cache/cord-020912-tbq7okmj.json key: cord-020912-tbq7okmj authors: Batra, Vishwash; Haldar, Aparajita; He, Yulan; Ferhatosmanoglu, Hakan; Vogiatzis, George; Guha, Tanaya title: Variational Recurrent Sequence-to-Sequence Retrieval for Stepwise Illustration date: 2020-03-17 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45439-5_4 sha: doc_id: 20912 cord_uid: tbq7okmj
file: cache/cord-020899-d6r4fr9r.json key: cord-020899-d6r4fr9r authors: Doinychko, Anastasiia; Amini, Massih-Reza title: Biconditional Generative Adversarial Networks for Multiview Learning with Missing Views date: 2020-03-17 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45439-5_53 sha: doc_id: 20899 cord_uid: d6r4fr9r
file: cache/cord-020905-gw8i6tkn.json key: cord-020905-gw8i6tkn authors: Qu, Xianshan; Li, Xiaopeng; Farkas, Csilla; Rose, John title: An Attention Model of Customer Expectation to Improve Review Helpfulness Prediction date: 2020-03-17 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45439-5_55 sha: doc_id: 20905 cord_uid: gw8i6tkn
file: cache/cord-020888-ov2lzus4.json key: cord-020888-ov2lzus4 authors: Formal, Thibault; Clinchant, Stéphane; Renders, Jean-Michel; Lee, Sooyeol; Cho, Geun Hee title: Learning to Rank Images with Cross-Modal Graph Convolutions date: 2020-03-17 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45439-5_39 sha: doc_id: 20888 cord_uid: ov2lzus4
file: cache/cord-020916-ds0cf78u.json key: cord-020916-ds0cf78u authors: Fard, Mazar Moradi; Thonet, Thibaut; Gaussier, Eric title: Seed-Guided Deep Document Clustering date: 2020-03-17 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45439-5_1 sha: doc_id: 20916 cord_uid: ds0cf78u
file: cache/cord-020936-k1upc1xu.json key: cord-020936-k1upc1xu authors: Sanz-Cruzado, Javier; Macdonald, Craig; Ounis, Iadh; Castells, Pablo title: Axiomatic Analysis of Contact Recommendation Methods in Social Networks: An IR Perspective date: 2020-03-17 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45439-5_12 sha: doc_id: 20936 cord_uid: k1upc1xu
file: cache/cord-020914-7p37m92a.json key: cord-020914-7p37m92a authors: Dumani, Lorik; Neumann, Patrick J.; Schenkel, Ralf title: A Framework for Argument Retrieval: Ranking Argument Clusters by Frequency and Specificity date: 2020-03-17 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45439-5_29 sha: doc_id: 20914 cord_uid: 7p37m92a
file: cache/cord-020909-n36p5n2k.json key: cord-020909-n36p5n2k authors: Papadakos, Panagiotis; Konstantakis, Giannis title: bias goggles: Graph-Based Computation of the Bias of Web Domains Through the Eyes of Users date: 2020-03-17 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45439-5_52 sha: doc_id: 20909 cord_uid: n36p5n2k
file: cache/cord-020932-o5scqiyk.json key: cord-020932-o5scqiyk authors: Zhong, Wei; Rohatgi, Shaurya; Wu, Jian; Giles, C. Lee; Zanibbi, Richard title: Accelerating Substructure Similarity Search for Formula Retrieval date: 2020-03-17 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45439-5_47 sha: doc_id: 20932 cord_uid: o5scqiyk
file: cache/cord-020832-iavwkdpr.json key: cord-020832-iavwkdpr authors: Nguyen, Dat Quoc; Zhai, Zenan; Yoshikawa, Hiyori; Fang, Biaoyan; Druckenbrodt, Christian; Thorne, Camilo; Hoessel, Ralph; Akhondi, Saber A.; Cohn, Trevor; Baldwin, Timothy; Verspoor, Karin title: ChEMU: Named Entity Recognition and Event Extraction of Chemical Reactions from Patents date: 2020-03-24 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45442-5_74 sha: doc_id: 20832 cord_uid: iavwkdpr
file: cache/cord-020875-vd4rtxmz.json key: cord-020875-vd4rtxmz authors: Suwaileh, Reem title: Time-Critical Geolocation for Social Good date: 2020-03-24 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45442-5_82 sha: doc_id: 20875 cord_uid: vd4rtxmz
file: cache/cord-020814-1ty7wzlv.json key: cord-020814-1ty7wzlv authors: Berrendorf, Max; Faerman, Evgeniy; Melnychuk, Valentyn; Tresp, Volker; Seidl, Thomas title: Knowledge Graph Entity Alignment with Graph Convolutional Networks: Lessons Learned date: 2020-03-24 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45442-5_1 sha: doc_id: 20814 cord_uid: 1ty7wzlv
file: cache/cord-020813-0wc23ixy.json key: cord-020813-0wc23ixy authors: Hashemi, Helia; Aliannejadi, Mohammad; Zamani, Hamed; Croft, W. Bruce title: ANTIQUE: A Non-factoid Question Answering Benchmark date: 2020-03-24 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45442-5_21 sha: doc_id: 20813 cord_uid: 0wc23ixy
file: cache/cord-020903-qt0ly5d0.json key: cord-020903-qt0ly5d0 authors: Tamine, Lynda; Melgarejo, Jesús Lovón; Pinel-Sauvagnat, Karen title: What Can Task Teach Us About Query Reformulations? date: 2020-03-17 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45439-5_42 sha: doc_id: 20903 cord_uid: qt0ly5d0
file: cache/cord-020830-97xmu329.json key: cord-020830-97xmu329 authors: Ghanem, Bilal; Karoui, Jihen; Benamara, Farah; Rosso, Paolo; Moriceau, Véronique title: Irony Detection in a Multilingual Context date: 2020-03-24 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45442-5_18 sha: doc_id: 20830 cord_uid: 97xmu329
file: cache/cord-020848-nypu4w9s.json key: cord-020848-nypu4w9s authors: Morris, David; Müller-Budack, Eric; Ewerth, Ralph title: SlideImages: A Dataset for Educational Image Classification date: 2020-03-24 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45442-5_36 sha: doc_id: 20848 cord_uid: nypu4w9s
file: cache/cord-020808-wpso3jug.json key: cord-020808-wpso3jug authors: Cardoso, João; Proença, Diogo; Borbinha, José title: Machine-Actionable Data Management Plans: A Knowledge Retrieval Approach to Automate the Assessment of Funders’ Requirements date: 2020-03-24 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45442-5_15 sha: doc_id: 20808 cord_uid: wpso3jug
file: cache/cord-020835-n9v5ln2i.json key: cord-020835-n9v5ln2i authors: Jangra, Anubhav; Jatowt, Adam; Hasanuzzaman, Mohammad; Saha, Sriparna title: Text-Image-Video Summary Generation Using Joint Integer Linear Programming date: 2020-03-24 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45442-5_24 sha: doc_id: 20835 cord_uid: n9v5ln2i
file: cache/cord-020880-m7d4e0eh.json key: cord-020880-m7d4e0eh authors: Barrón-Cedeño, Alberto; Elsayed, Tamer; Nakov, Preslav; Da San Martino, Giovanni; Hasanain, Maram; Suwaileh, Reem; Haouari, Fatima title: CheckThat! at CLEF 2020: Enabling the Automatic Identification and Verification of Claims in Social Media date: 2020-03-24 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45442-5_65 sha: doc_id: 20880 cord_uid: m7d4e0eh
file: cache/cord-020846-mfh1ope6.json key: cord-020846-mfh1ope6 authors: Zlabinger, Markus; Hofstätter, Sebastian; Rekabsaz, Navid; Hanbury, Allan title: DSR: A Collection for the Evaluation of Graded Disease-Symptom Relations date: 2020-03-24 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45442-5_54 sha: doc_id: 20846 cord_uid: mfh1ope6
file: cache/cord-020891-lt3m8h41.json key: cord-020891-lt3m8h41 authors: Witschel, Hans Friedrich; Riesen, Kaspar; Grether, Loris title: KvGR: A Graph-Based Interface for Explorative Sequential Question Answering on Heterogeneous Information Sources date: 2020-03-17 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45439-5_50 sha: doc_id: 20891 cord_uid: lt3m8h41
file: cache/cord-020931-fymgnv1g.json key: cord-020931-fymgnv1g authors: Meng, Changping; Chen, Muhao; Mao, Jie; Neville, Jennifer title: ReadNet: A Hierarchical Transformer Framework for Web Article Readability Analysis date: 2020-03-17 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45439-5_3 sha: doc_id: 20931 cord_uid: fymgnv1g
file: cache/cord-020871-1v6dcmt3.json key: cord-020871-1v6dcmt3 authors: Papariello, Luca; Bampoulidis, Alexandros; Lupu, Mihai title: On the Replicability of Combining Word Embeddings and Retrieval Models date: 2020-03-24 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45442-5_7 sha: doc_id: 20871 cord_uid: 1v6dcmt3
file: cache/cord-020904-x3o3a45b.json key: cord-020904-x3o3a45b authors: Montazeralghaem, Ali; Rahimi, Razieh; Allan, James title: Relevance Ranking Based on Query-Aware Context Analysis date: 2020-03-17 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45439-5_30 sha: doc_id: 20904 cord_uid: x3o3a45b
file: cache/cord-020908-oe77eupc.json key: cord-020908-oe77eupc authors: Chen, Zhiyu; Jia, Haiyan; Heflin, Jeff; Davison, Brian D. title: Leveraging Schema Labels to Enhance Dataset Search date: 2020-03-17 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45439-5_18 sha: doc_id: 20908 cord_uid: oe77eupc
file: cache/cord-020927-89c7rijg.json key: cord-020927-89c7rijg authors: Zhuang, Shengyao; Zuccon, Guido title: Counterfactual Online Learning to Rank date: 2020-03-17 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45439-5_28 sha: doc_id: 20927 cord_uid: 89c7rijg
file: cache/cord-020918-056bvngu.json key: cord-020918-056bvngu authors: Nchabeleng, Mathibele; Byamugisha, Joan title: Evaluating the Effectiveness of the Standard Insights Extraction Pipeline for Bantu Languages date: 2020-03-17 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45439-5_11 sha: doc_id: 20918 cord_uid: 056bvngu
file: cache/cord-020901-aew8xr6n.json key: cord-020901-aew8xr6n authors: García-Durán, Alberto; González, Roberto; Oñoro-Rubio, Daniel; Niepert, Mathias; Li, Hui title: TransRev: Modeling Reviews as Translations from Users to Items date: 2020-03-17 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45439-5_16 sha: doc_id: 20901 cord_uid: aew8xr6n
file: cache/cord-020890-aw465igx.json key: cord-020890-aw465igx authors: Brochier, Robin; Guille, Adrien; Velcin, Julien title: Inductive Document Network Embedding with Topic-Word Attention date: 2020-03-17 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45439-5_22 sha: doc_id: 20890 cord_uid: aw465igx
file: cache/cord-020851-hf5c0i9z.json key: cord-020851-hf5c0i9z authors: Losada, David E.; Crestani, Fabio; Parapar, Javier title: eRisk 2020: Self-harm and Depression Challenges date: 2020-03-24 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45442-5_72 sha: doc_id: 20851 cord_uid: hf5c0i9z
file: cache/cord-020872-frr8xba6.json key: cord-020872-frr8xba6 authors: Santosh, Tokala Yaswanth Sri Sai; Sanyal, Debarshi Kumar; Bhowmick, Plaban Kumar; Das, Partha Pratim title: DAKE: Document-Level Attention for Keyphrase Extraction date: 2020-03-24 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45442-5_49 sha: doc_id: 20872 cord_uid: frr8xba6

Reading metadata file and updating bibliographic records
=== updating bibliographic database
Building study carrel named journal-advancesInInformationRetrieval-cord

=== file2bib.sh ===
OMP: Error #34: System unable to allocate necessary resources for OMP thread:
OMP: System error #11: Resource temporarily unavailable
OMP: Hint Try decreasing the value of OMP_NUM_THREADS.
/data-disk/reader-compute/reader-cord/bin/file2bib.sh: line 39: 79544 Aborted $FILE2BIB "$FILE" > "$OUTPUT"
[The identical OMP Error #34 / "Resource temporarily unavailable" abort repeated for twelve more file2bib.sh invocations: PIDs 80520, 80988, 81131, 81037, 81457, 80985, 81443, 81145, 81469, 81200, 81394, and 81367.]
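The OMP hint above points at the likely fix: each file2bib.sh worker spawns a full OpenMP thread pool, and many concurrent workers exhaust the machine's thread and memory limits. A minimal sketch of the idea in Python follows; the failed-file list and output layout are hypothetical stand-ins, not the carrel's actual driver code.

    import os
    import subprocess

    # Rerun failed conversions with a capped OpenMP pool, following the
    # "Try decreasing the value of OMP_NUM_THREADS" hint in the log.
    env = dict(os.environ, OMP_NUM_THREADS="1")  # one OpenMP thread per worker

    failed = ["cache/cord-020835-n9v5ln2i.json"]  # placeholder; fill from the abort lines
    for path in failed:
        output = path.replace("cache/", "bib/").replace(".json", ".bib")
        with open(output, "w") as out:
            # file2bib.sh reads $FILE and writes its result to stdout, per the log
            subprocess.run(["/data-disk/reader-compute/reader-cord/bin/file2bib.sh", path],
                           stdout=out, env=env, check=False)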
=== file2bib.sh ===
id: cord-020811-pacy48qx author: Muhammad, Shamsuddeen Hassan title: Incremental Approach for Automatic Generation of Domain-Specific Sentiment Lexicon date: 2020-03-24 pages: extension: .txt txt: ./txt/cord-020811-pacy48qx.txt cache: ./cache/cord-020811-pacy48qx.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 3 resourceName b'cord-020811-pacy48qx.txt'
=== file2bib.sh ===
id: cord-020841-40f2p3t4 author: Hofstätter, Sebastian title: Neural-IR-Explorer: A Content-Focused Tool to Explore Neural Re-ranking Results date: 2020-03-24 pages: extension: .txt txt: ./txt/cord-020841-40f2p3t4.txt cache: ./cache/cord-020841-40f2p3t4.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 2 resourceName b'cord-020841-40f2p3t4.txt'
=== file2bib.sh ===
id: cord-020808-wpso3jug author: Cardoso, João title: Machine-Actionable Data Management Plans: A Knowledge Retrieval Approach to Automate the Assessment of Funders’ Requirements date: 2020-03-24 pages: extension: .txt txt: ./txt/cord-020808-wpso3jug.txt cache: ./cache/cord-020808-wpso3jug.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 3 resourceName b'cord-020808-wpso3jug.txt'
=== file2bib.sh ===
id: cord-020875-vd4rtxmz author: Suwaileh, Reem title: Time-Critical Geolocation for Social Good date: 2020-03-24 pages: extension: .txt txt: ./txt/cord-020875-vd4rtxmz.txt cache: ./cache/cord-020875-vd4rtxmz.txt Content-Encoding ISO-8859-1 Content-Type text/plain; charset=ISO-8859-1 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 3 resourceName b'cord-020875-vd4rtxmz.txt'
=== file2bib.sh ===
id: cord-020843-cq4lbd0l author: Almeida, Tiago title: Calling Attention to Passages for Biomedical Question Answering date: 2020-03-24 pages: extension: .txt txt: ./txt/cord-020843-cq4lbd0l.txt cache: ./cache/cord-020843-cq4lbd0l.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 3 resourceName b'cord-020843-cq4lbd0l.txt'
=== file2bib.sh ===
id: cord-020848-nypu4w9s author: Morris, David title: SlideImages: A Dataset for Educational Image Classification date: 2020-03-24 pages: extension: .txt txt: ./txt/cord-020848-nypu4w9s.txt cache: ./cache/cord-020848-nypu4w9s.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 3 resourceName b'cord-020848-nypu4w9s.txt'
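The Content-Encoding, Content-Type, and X-TIKA:* fields in these records are Apache Tika parse metadata. For readers reproducing the extraction, a minimal sketch using the tika-python client (an assumption; the carrel may drive Tika differently, e.g. via tika-server directly) shows where such fields come from:

    # Illustrative only, not the carrel's own code. Requires `pip install tika`;
    # the client auto-starts a local Tika server on first use.
    from tika import parser

    parsed = parser.from_file("txt/cord-020811-pacy48qx.txt")
    print(parsed["metadata"].get("Content-Type"))    # e.g. text/plain; charset=UTF-8
    print((parsed["content"] or "").strip()[:200])   # first 200 chars of extracted text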
=== file2bib.sh ===
id: cord-020820-cbikq0v0 author: Papadakos, Panagiotis title: Dualism in Topical Relevance date: 2020-03-24 pages: extension: .txt txt: ./txt/cord-020820-cbikq0v0.txt cache: ./cache/cord-020820-cbikq0v0.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 4 resourceName b'cord-020820-cbikq0v0.txt'
=== file2bib.sh ===
id: cord-020813-0wc23ixy author: Hashemi, Helia title: ANTIQUE: A Non-factoid Question Answering Benchmark date: 2020-03-24 pages: extension: .txt txt: ./txt/cord-020813-0wc23ixy.txt cache: ./cache/cord-020813-0wc23ixy.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 2 resourceName b'cord-020813-0wc23ixy.txt'
=== file2bib.sh ===
id: cord-020814-1ty7wzlv author: Berrendorf, Max title: Knowledge Graph Entity Alignment with Graph Convolutional Networks: Lessons Learned date: 2020-03-24 pages: extension: .txt txt: ./txt/cord-020814-1ty7wzlv.txt cache: ./cache/cord-020814-1ty7wzlv.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 3 resourceName b'cord-020814-1ty7wzlv.txt'
=== file2bib.sh ===
id: cord-020832-iavwkdpr author: Nguyen, Dat Quoc title: ChEMU: Named Entity Recognition and Event Extraction of Chemical Reactions from Patents date: 2020-03-24 pages: extension: .txt txt: ./txt/cord-020832-iavwkdpr.txt cache: ./cache/cord-020832-iavwkdpr.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 3 resourceName b'cord-020832-iavwkdpr.txt'
=== file2bib.sh ===
id: cord-020794-d3oru1w5 author: Leekha, Maitree title: A Multi-task Approach to Open Domain Suggestion Mining Using Language Model for Text Over-Sampling date: 2020-03-24 pages: extension: .txt txt: ./txt/cord-020794-d3oru1w5.txt cache: ./cache/cord-020794-d3oru1w5.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 2 resourceName b'cord-020794-d3oru1w5.txt'
=== file2bib.sh ===
id: cord-020806-lof49r72 author: Landin, Alfonso title: Novel and Diverse Recommendations by Leveraging Linear Models with User and Item Embeddings date: 2020-03-24 pages: extension: .txt txt: ./txt/cord-020806-lof49r72.txt cache: ./cache/cord-020806-lof49r72.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 209 resourceName b'cord-020806-lof49r72.txt'
=== file2bib.sh ===
id: cord-020801-3sbicp3v author: MacAvaney, Sean title: Teaching a New Dog Old Tricks: Resurrecting Multilingual Retrieval Using Zero-Shot Learning date: 2020-03-24 pages: extension: .txt txt: ./txt/cord-020801-3sbicp3v.txt cache: ./cache/cord-020801-3sbicp3v.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 3 resourceName b'cord-020801-3sbicp3v.txt'
=== file2bib.sh ===
id: cord-020834-ch0fg9rp author: Grand, Adrien title: From MAXSCORE to Block-Max Wand: The Story of How Lucene Significantly Improved Query Evaluation Performance date: 2020-03-24 pages: extension: .txt txt: ./txt/cord-020834-ch0fg9rp.txt cache: ./cache/cord-020834-ch0fg9rp.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 3 resourceName b'cord-020834-ch0fg9rp.txt'
=== file2bib.sh ===
id: cord-020793-kgje01qy author: Suominen, Hanna title: CLEF eHealth Evaluation Lab 2020 date: 2020-03-24 pages: extension: .txt txt: ./txt/cord-020793-kgje01qy.txt cache: ./cache/cord-020793-kgje01qy.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 4 resourceName b'cord-020793-kgje01qy.txt'
=== file2bib.sh ===
id: cord-020815-j9eboa94 author: Kamphuis, Chris title: Which BM25 Do You Mean? A Large-Scale Reproducibility Study of Scoring Variants date: 2020-03-24 pages: extension: .txt txt: ./txt/cord-020815-j9eboa94.txt cache: ./cache/cord-020815-j9eboa94.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 304 resourceName b'cord-020815-j9eboa94.txt'
=== file2bib.sh ===
id: cord-020885-f667icyt author: Sharma, Ujjwal title: Semantic Path-Based Learning for Review Volume Prediction date: 2020-03-17 pages: extension: .txt txt: ./txt/cord-020885-f667icyt.txt cache: ./cache/cord-020885-f667icyt.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 4 resourceName b'cord-020885-f667icyt.txt'
=== file2bib.sh ===
id: cord-020899-d6r4fr9r author: Doinychko, Anastasiia title: Biconditional Generative Adversarial Networks for Multiview Learning with Missing Views date: 2020-03-17 pages: extension: .txt txt: ./txt/cord-020899-d6r4fr9r.txt cache: ./cache/cord-020899-d6r4fr9r.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 3 resourceName b'cord-020899-d6r4fr9r.txt'
=== file2bib.sh ===
id: cord-020916-ds0cf78u author: Fard, Mazar Moradi title: Seed-Guided Deep Document Clustering date: 2020-03-17 pages: extension: .txt txt: ./txt/cord-020916-ds0cf78u.txt cache: ./cache/cord-020916-ds0cf78u.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 3 resourceName b'cord-020916-ds0cf78u.txt'
=== file2bib.sh ===
id: cord-020932-o5scqiyk author: Zhong, Wei title: Accelerating Substructure Similarity Search for Formula Retrieval date: 2020-03-17 pages: extension: .txt txt: ./txt/cord-020932-o5scqiyk.txt cache: ./cache/cord-020932-o5scqiyk.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 244 resourceName b'cord-020932-o5scqiyk.txt'
=== file2bib.sh ===
id: cord-020888-ov2lzus4 author: Formal, Thibault title: Learning to Rank Images with Cross-Modal Graph Convolutions date: 2020-03-17 pages: extension: .txt txt: ./txt/cord-020888-ov2lzus4.txt cache: ./cache/cord-020888-ov2lzus4.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 3 resourceName b'cord-020888-ov2lzus4.txt'
=== file2bib.sh ===
id: cord-020909-n36p5n2k author: Papadakos, Panagiotis title: bias goggles: Graph-Based Computation of the Bias of Web Domains Through the Eyes of Users date: 2020-03-17 pages: extension: .txt txt: ./txt/cord-020909-n36p5n2k.txt cache: ./cache/cord-020909-n36p5n2k.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 2 resourceName b'cord-020909-n36p5n2k.txt'
=== file2bib.sh ===
id: cord-020880-m7d4e0eh author: Barrón-Cedeño, Alberto title: CheckThat! at CLEF 2020: Enabling the Automatic Identification and Verification of Claims in Social Media date: 2020-03-24 pages: extension: .txt txt: ./txt/cord-020880-m7d4e0eh.txt cache: ./cache/cord-020880-m7d4e0eh.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 2 resourceName b'cord-020880-m7d4e0eh.txt'
=== file2bib.sh ===
id: cord-020912-tbq7okmj author: Batra, Vishwash title: Variational Recurrent Sequence-to-Sequence Retrieval for Stepwise Illustration date: 2020-03-17 pages: extension: .txt txt: ./txt/cord-020912-tbq7okmj.txt cache: ./cache/cord-020912-tbq7okmj.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 4 resourceName b'cord-020912-tbq7okmj.txt'
=== file2bib.sh ===
id: cord-020896-yrocw53j author: Agarwal, Mansi title: MEMIS: Multimodal Emergency Management Information System date: 2020-03-17 pages: extension: .txt txt: ./txt/cord-020896-yrocw53j.txt cache: ./cache/cord-020896-yrocw53j.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 3 resourceName b'cord-020896-yrocw53j.txt'
=== file2bib.sh ===
id: cord-020914-7p37m92a author: Dumani, Lorik title: A Framework for Argument Retrieval: Ranking Argument Clusters by Frequency and Specificity date: 2020-03-17 pages: extension: .txt txt: ./txt/cord-020914-7p37m92a.txt cache: ./cache/cord-020914-7p37m92a.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 3 resourceName b'cord-020914-7p37m92a.txt'
=== file2bib.sh ===
id: cord-020830-97xmu329 author: Ghanem, Bilal title: Irony Detection in a Multilingual Context date: 2020-03-24 pages: extension: .txt txt: ./txt/cord-020830-97xmu329.txt cache: ./cache/cord-020830-97xmu329.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 4 resourceName b'cord-020830-97xmu329.txt'
=== file2bib.sh ===
id: cord-020905-gw8i6tkn author: Qu, Xianshan title: An Attention Model of Customer Expectation to Improve Review Helpfulness Prediction date: 2020-03-17 pages: extension: .txt txt: ./txt/cord-020905-gw8i6tkn.txt cache: ./cache/cord-020905-gw8i6tkn.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 4 resourceName b'cord-020905-gw8i6tkn.txt'
=== file2bib.sh ===
id: cord-020903-qt0ly5d0 author: Tamine, Lynda title: What Can Task Teach Us About Query Reformulations? date: 2020-03-17 pages: extension: .txt txt: ./txt/cord-020903-qt0ly5d0.txt cache: ./cache/cord-020903-qt0ly5d0.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 3 resourceName b'cord-020903-qt0ly5d0.txt'
=== file2bib.sh ===
id: cord-020904-x3o3a45b author: Montazeralghaem, Ali title: Relevance Ranking Based on Query-Aware Context Analysis date: 2020-03-17 pages: extension: .txt txt: ./txt/cord-020904-x3o3a45b.txt cache: ./cache/cord-020904-x3o3a45b.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 3 resourceName b'cord-020904-x3o3a45b.txt'

Queue is empty; done
journal-advancesInInformationRetrieval-cord

=== reduce.pl bib ===
id = cord-020811-pacy48qx author = Muhammad, Shamsuddeen Hassan title = Incremental Approach for Automatic Generation of Domain-Specific Sentiment Lexicon date = 2020-03-24 pages = extension = .txt mime = text/plain words = 1725 sentences = 113 flesch = 50
summary = title: Incremental Approach for Automatic Generation of Domain-Specific Sentiment Lexicon. To this end, we propose an approach to automatically generate a domain-specific sentiment lexicon using a vector model enriched by weights. Although research has been carried out on corpus-based approaches for automatic generation of a domain-specific lexicon [1, 4, 5, 7, 9, 10, 14], existing approaches focused on creation of a lexicon from a single corpus [4]. To this end, this work proposes an incremental approach for the automatic generation of a domain-specific sentiment lexicon. We aim to investigate an incremental technique for automatically generating a domain-specific sentiment lexicon from a corpus. Can we automatically generate a sentiment lexicon from a corpus and improve on existing approaches? After detecting the domain shift, we merge the distributions using a similar approach to the one discussed (in updating using the same corpus) and generate the lexicon.
cache = ./cache/cord-020811-pacy48qx.txt txt = ./txt/cord-020811-pacy48qx.txt
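The words / sentences / flesch fields reported for each record below are simple readability statistics. For orientation, here is a sketch of the standard Flesch Reading Ease formula (206.835 - 1.015 * words/sentences - 84.6 * syllables/words) with a crude syllable heuristic; reduce.pl's own tokenization and counting rules may well differ.

    import re

    def count_syllables(word):
        # crude heuristic: count vowel groups; real tools use pronunciation dictionaries
        return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

    def flesch_reading_ease(text):
        sentences = max(1, len(re.findall(r"[.!?]+", text)))
        words = re.findall(r"[A-Za-z']+", text)
        n_words = max(1, len(words))
        syllables = sum(count_syllables(w) for w in words)
        return 206.835 - 1.015 * (n_words / sentences) - 84.6 * (syllables / n_words)

    print(round(flesch_reading_ease("Which BM25 do you mean? We study the variants.")))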
=== reduce.pl bib ===
id = cord-020794-d3oru1w5 author = Leekha, Maitree title = A Multi-task Approach to Open Domain Suggestion Mining Using Language Model for Text Over-Sampling date = 2020-03-24 pages = extension = .txt mime = text/plain words = 1569 sentences = 105 flesch = 59
summary = title: A Multi-task Approach to Open Domain Suggestion Mining Using Language Model for Text Over-Sampling. In this work, we introduce a novel over-sampling technique to address the problem of class imbalance, and propose a multi-task deep learning approach for mining suggestions from multiple domains. Experimental results on a publicly available dataset show that our over-sampling technique, coupled with the multi-task framework, outperforms state-of-the-art open domain suggestion mining models in terms of the F-1 measure and AUC. In our study, we generate synthetic positive reviews until the number of suggestion and non-suggestion class samples becomes equal in the training set. All comparisons have been made in terms of the F-1 score of the suggestion class for a fair comparison with prior work on representational learning for open domain suggestion mining [5] (see Baseline in Table 3). In this work, we proposed a multi-task learning framework for open domain suggestion mining along with a novel language model based over-sampling technique for text (LMOTE).
cache = ./cache/cord-020794-d3oru1w5.txt txt = ./txt/cord-020794-d3oru1w5.txt

=== reduce.pl bib ===
id = cord-020914-7p37m92a author = Dumani, Lorik title = A Framework for Argument Retrieval: Ranking Argument Clusters by Frequency and Specificity date = 2020-03-17 pages = extension = .txt mime = text/plain words = 5482 sentences = 302 flesch = 67
summary = From an information retrieval perspective, an interesting task within this setting is finding the best supporting and attacking premises for a given query claim from a large corpus of arguments. From an information retrieval perspective, an interesting task within this setting is finding the best supporting (pro) and attacking (con) premises for a given query claim [31]. Given a user's keyword query, the system retrieves, ranks, and presents premises supporting and attacking the query, taking similarity of the query with the premise, its corresponding claim, and other contextual information into account. We assume that we work with a large corpus of argumentative text, for example collections of political speeches or forum discussions, that has already been mined and transferred into claims with the corresponding premises and stances. We consider the following problem: Given a controversial claim or topic, for example "We should abandon fossil fuels", a user searches for the most important premises from the corpus supporting or attacking it.
cache = ./cache/cord-020914-7p37m92a.txt txt = ./txt/cord-020914-7p37m92a.txt

=== reduce.pl bib ===
id = cord-020806-lof49r72 author = Landin, Alfonso title = Novel and Diverse Recommendations by Leveraging Linear Models with User and Item Embeddings date = 2020-03-24 pages = extension = .txt mime = text/plain words = 2373 sentences = 150 flesch = 52
summary = title: Novel and Diverse Recommendations by Leveraging Linear Models with User and Item Embeddings. In this paper, we present EER, a linear model for the top-N recommendation task, which takes advantage of user and item embeddings for improving novelty and diversity without harming accuracy. In this paper, we propose a method to augment an existing linear recommendation model to make more diverse and novel recommendations, while maintaining similar accuracy results. Experiments conducted on three datasets show that our proposal outperforms the original model in both novelty and diversity while maintaining similar levels of accuracy. On the other hand, as results in Table 3 show, ELP is able to provide good figures in novelty and diversity, thanks to the embedding model capturing non-linear relations between users and items. It is common in the field of recommender systems for methods with lower accuracy to have higher values in diversity and novelty. FISM: factored item similarity models for top-n recommender systems.
cache = ./cache/cord-020806-lof49r72.txt txt = ./txt/cord-020806-lof49r72.txt

=== reduce.pl bib ===
id = cord-020815-j9eboa94 author = Kamphuis, Chris title = Which BM25 Do You Mean? A Large-Scale Reproducibility Study of Scoring Variants date = 2020-03-24 pages = extension = .txt mime = text/plain words = 2249 sentences = 154 flesch = 60
summary = Experiments on three newswire collections show that there are no significant effectiveness differences between them, including Lucene's often maligned approximation of document length. Although learning-to-rank approaches and neural ranking models are widely used today, they are typically deployed as part of a multi-stage reranking architecture, over candidate documents supplied by a simple term-matching method using traditional inverted indexes [1]. Our goal is a large-scale reproducibility study to explore the nuances of different variants of BM25 and their impact on retrieval effectiveness. Their findings are confirmed: effectiveness differences in IR experiments are unlikely to be the result of the choice of BM25 variant that a system implements. We implemented a variant that uses exact document lengths, but is otherwise identical to the Lucene default. Storing exact document lengths would allow for different ranking functions to be swapped at query time more easily, as no information would be discarded at index time.
cache = ./cache/cord-020815-j9eboa94.txt txt = ./txt/cord-020815-j9eboa94.txt
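Since the record above concerns BM25 scoring variants, a textbook rendering of one common form may help orient the reader. This is only a sketch of one variant, not the paper's definition: the variants the study compares differ in idf smoothing and in whether document length is exact or, as in older Lucene, approximated.

    import math

    def bm25_term(tf, df, N, doc_len, avg_doc_len, k1=0.9, b=0.4):
        # Robertson-style idf; other variants add +0.5 smoothing or +1 inside the log
        idf = math.log(N / df)  # assumes df > 0
        length_norm = 1 - b + b * (doc_len / avg_doc_len)
        return idf * (tf * (k1 + 1)) / (tf + k1 * length_norm)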
=== reduce.pl bib ===
id = cord-020885-f667icyt author = Sharma, Ujjwal title = Semantic Path-Based Learning for Review Volume Prediction date = 2020-03-17 pages = extension = .txt mime = text/plain words = 4026 sentences = 245 flesch = 48
summary = In this work, we present an approach that uses semantically meaningful, bimodal random walks on real-world heterogeneous networks to extract correlations between nodes and bring together nodes with shared or similar attributes. In this work, we propose a novel method that incorporates restaurants and their attributes into a multimodal graph and extracts multiple, bimodal low dimensional representations for restaurants based on available paths through shared visual, textual, geographical and categorical features. In this section, we discuss prior work that leverages graph-based structures for extracting information from multiple modalities, focussing on the auto-captioning task that introduced such methods. For each of these sub-networks, we perform random walks and use a variant of the heterogeneous skip-gram objective introduced in [6] to generate low-dimensional bimodal embeddings. Our attention-based model combines separately learned bimodal embeddings using a late-fusion setup for predicting the review volume of the restaurants.
cache = ./cache/cord-020885-f667icyt.txt txt = ./txt/cord-020885-f667icyt.txt

=== reduce.pl bib ===
id = cord-020888-ov2lzus4 author = Formal, Thibault title = Learning to Rank Images with Cross-Modal Graph Convolutions date = 2020-03-17 pages = extension = .txt mime = text/plain words = 5211 sentences = 256 flesch = 55
summary = While most of the current approaches for cross-modal retrieval revolve around learning how to represent text and images in a shared latent space, we take a different direction: we propose to generalize the cross-modal relevance feedback mechanism, a simple yet effective unsupervised method that relies on standard information retrieval heuristics and the choice of a few hyper-parameters. The model can be understood very simply: similarly to PRF methods in standard information retrieval, the goal is to boost images that are visually similar to top images (from a text point of view), i.e. images that are likely to be relevant to the query but were initially badly ranked (which is likely to happen in the web scenario, where text is crawled from the source page and can be very noisy).
cache = ./cache/cord-020888-ov2lzus4.txt txt = ./txt/cord-020888-ov2lzus4.txt

=== reduce.pl bib ===
id = cord-020916-ds0cf78u author = Fard, Mazar Moradi title = Seed-Guided Deep Document Clustering date = 2020-03-17 pages = extension = .txt mime = text/plain words = 5079 sentences = 265 flesch = 57
summary = The main contributions of this study can be summarized as follows: (a) We introduce the Seed-guided Deep Document Clustering (SD2C) framework, the first attempt, to the best of our knowledge, to constrain clustering with seed words based on a deep clustering approach; and (b) we validate this framework through experiments based on automatically selected seed words on five publicly available text datasets with various sizes and characteristics. The constrained clustering problem we are addressing in fact bears strong similarity to that of seed-guided dataless text classification, which consists of categorizing documents based on a small set of seed words describing the classes/clusters. This can be done by enforcing that seed words have more influence either on the learned document embeddings, a solution we refer to as SD2C-Doc, or on the cluster representatives, a solution we refer to as SD2C-Rep. Note that the second solution can only be used when the clustering process is based on cluster representatives (i.e., R = {r_k}_{k=1}^{K} with K the number of clusters), which is indeed the case for most current deep clustering methods [1].
cache = ./cache/cord-020916-ds0cf78u.txt txt = ./txt/cord-020916-ds0cf78u.txt

=== reduce.pl bib ===
id = cord-020820-cbikq0v0 author = Papadakos, Panagiotis title = Dualism in Topical Relevance date = 2020-03-24 pages = extension = .txt mime = text/plain words = 2468 sentences = 133 flesch = 56
summary = To this end, in this paper we elaborate on the idea of leveraging the available antonyms of the original query terms for eventually producing an answer which provides a better overview of the related conceptual and information space. In this paper we elaborate on the idea of leveraging the available antonyms of the original query terms (if they exist), for eventually producing an answer which provides a better overview of the related information and conceptual space. In their comments for these queries, users mention that the selected (i.e., dual) list "provides a more general picture" and "more relevant and interesting results, although contradicting". For the future, we plan to define the appropriate antonym selection algorithms and relevance metrics, implement the proposed functionality in a meta-search setting, and conduct a large scale evaluation with real users over exploratory tasks, to identify in which queries the dual approach is beneficial and to what types of users.
cache = ./cache/cord-020820-cbikq0v0.txt txt = ./txt/cord-020820-cbikq0v0.txt
=== reduce.pl bib ===
id = cord-020793-kgje01qy author = Suominen, Hanna title = CLEF eHealth Evaluation Lab 2020 date = 2020-03-24 pages = extension = .txt mime = text/plain words = 2379 sentences = 116 flesch = 51
summary = Laypeople's increasing difficulties in retrieving and digesting valid and relevant information in their preferred language to make health-centred decisions have motivated CLEF eHealth to organize yearly labs since 2012. Substantial community interest in the tasks and their resources has led to CLEF eHealth maturing as a primary venue for all interdisciplinary actors of the ecosystem for producing, processing, and consuming electronic health information. Information access conferences have organized evaluation labs on related Electronic Health (eHealth) Information Extraction (IE), Information Management (IM), and Information Retrieval (IR) tasks for almost 20 years. This Consumer Health Search (CHS) task follows a standard IR shared challenge paradigm from the perspective that it provides participants with a test collection consisting of a set of documents and a set of topics to develop IR techniques for. The IR task at the CLEF eHealth evaluation lab 2016: user-centred health information retrieval.
cache = ./cache/cord-020793-kgje01qy.txt txt = ./txt/cord-020793-kgje01qy.txt

=== reduce.pl bib ===
id = cord-020843-cq4lbd0l author = Almeida, Tiago title = Calling Attention to Passages for Biomedical Question Answering date = 2020-03-24 pages = extension = .txt mime = text/plain words = 2235 sentences = 125 flesch = 50
summary = This paper presents a pipeline for document and passage retrieval for biomedical question answering built around a new variant of the DeepRank network model in which the recursive layer is replaced by a self-attention layer combined with a weighting mechanism. On the other hand, models such as the Deep Relevance Matching Model (DRMM) [3] or DeepRank [10] follow an interaction-based approach, in which matching signals between query and document are captured and used by the neural network to produce a ranking score. The main contribution of this work is a new variant of the DeepRank neural network architecture in which the recursive layer originally included in the final aggregation step is replaced by a self-attention layer followed by a weighting mechanism similar to the term gating layer of the DRMM. The proposed model was evaluated on the BioASQ dataset, as part of a document and passage (snippet) retrieval pipeline for biomedical question answering, achieving similar retrieval performance when compared to more complex network architectures.
cache = ./cache/cord-020843-cq4lbd0l.txt txt = ./txt/cord-020843-cq4lbd0l.txt
=== reduce.pl bib ===
id = cord-020841-40f2p3t4 author = Hofstätter, Sebastian title = Neural-IR-Explorer: A Content-Focused Tool to Explore Neural Re-ranking Results date = 2020-03-24 pages = extension = .txt mime = text/plain words = 1526 sentences = 91 flesch = 53
summary = In this paper we look beyond metrics-based evaluation of Information Retrieval systems, to explore the reasons behind ranking results. We present the content-focused Neural-IR-Explorer, which empowers users to browse through retrieval results and inspect the inner workings and fine-grained results of neural re-ranking models. The explorer complements metrics-based evaluation by focusing on the content of queries and documents, and how the neural models relate them to each other. Users can explore each query result in more detail: we show the internal partial scores and content of the returned documents with different highlighting modes to surface the inner workings of a neural re-ranking model. The explorer displays data created by a batched evaluation run of a neural re-ranking model. Additionally, the Neural-IR-Explorer also illuminates the pool bias [12] of the MSMARCO ranking collection: the small number of judged documents per query makes the evaluation fragile. We presented the content-focused Neural-IR-Explorer to complement metrics-based evaluation of retrieval models.
cache = ./cache/cord-020841-40f2p3t4.txt txt = ./txt/cord-020841-40f2p3t4.txt

=== reduce.pl bib ===

=== reduce.pl bib ===
id = cord-020932-o5scqiyk author = Zhong, Wei title = Accelerating Substructure Similarity Search for Formula Retrieval date = 2020-03-17 pages = extension = .txt mime = text/plain words = 4602 sentences = 278 flesch = 65
summary = In text similarity search, query processing can be accelerated through dynamic pruning [18], which typically estimates score upperbounds to prune documents unlikely to be in the top K results. As a result, the posting list entry also stores the root node ID for indexed paths, in order to reconstruct matched substructures at merge time. Define the partial upperbound matrix W = {w_{i,j}} of size |T_q| x |T|, where T = {T(m), m in T_q} are all the token paths from the query OPT (T is essentially the same as tokenized P(T_q)), and a binary variable x of size |T| x 1 indicating which corresponding posting lists are placed in the non-requirement set. We have presented rank-safe dynamic pruning strategies that produce an upperbound estimation of structural similarity in order to speed up formula search using subtree matching. Our dynamic pruning strategies and specialized inverted index are different from traditional linear text search pruning methods and they further associate the query structure representation with posting lists.
cache = ./cache/cord-020932-o5scqiyk.txt txt = ./txt/cord-020932-o5scqiyk.txt

=== reduce.pl bib ===
id = cord-020909-n36p5n2k author = Papadakos, Panagiotis title = bias goggles: Graph-Based Computation of the Bias of Web Domains Through the Eyes of Users date = 2020-03-17 pages = extension = .txt mime = text/plain words = 5005 sentences = 256 flesch = 63
summary =
- the bias goggles model for computing the bias characteristics of web domains for a user-defined concept, based on the notions of Biased Concepts (BCs), Aspects of Bias (ABs), and the metrics of the support of the domain for a specific AB and BC, and its bias score for this BC,
- the introduction of the Support Flow Graph (SFG), along with graph-based algorithms for computing the AB support score of domains, that include adaptations of the Independence Cascade (IC) and Linear Threshold (LT) propagation models, and the new Biased-PageRank (Biased-PR) variation that models different behaviours of a biased surfer,
- an initial discussion about performance and implementation issues,
- some promising evaluation results that showcase the effectiveness and efficiency of the approach on a relatively small dataset of crawled pages, using the new AGBR and AGS metrics,
- a publicly accessible prototype of bias goggles.
cache = ./cache/cord-020909-n36p5n2k.txt txt = ./txt/cord-020909-n36p5n2k.txt
=== reduce.pl bib ===
id = cord-020899-d6r4fr9r author = Doinychko, Anastasiia title = Biconditional Generative Adversarial Networks for Multiview Learning with Missing Views date = 2020-03-17 pages = extension = .txt mime = text/plain words = 4666 sentences = 244 flesch = 56
summary = In this paper, we present a conditional GAN with two generators and a common discriminator for multiview learning problems where observations have two views, but one of them may be missing for some of the training samples. We address the problem of multiview learning with Generative Adversarial Networks (GANs) in the case where some observations may have missing views without there being an external resource to complete them. We demonstrate that generated views allow us to achieve state-of-the-art results on a subset of the Reuters RCV1/RCV2 collections compared to multiview approaches that rely on Machine Translation (MT) for translating documents into languages in which their versions do not exist before training the models (Sect. 3.2); and to achieve state-of-the-art performance compared to multiview approaches that rely on external view-generating functions on multilingual document classification, a more challenging application than image analysis, which is the domain of choice for the design of new GAN models.
cache = ./cache/cord-020899-d6r4fr9r.txt txt = ./txt/cord-020899-d6r4fr9r.txt

=== reduce.pl bib ===
id = cord-020896-yrocw53j author = Agarwal, Mansi title = MEMIS: Multimodal Emergency Management Information System date = 2020-03-17 pages = extension = .txt mime = text/plain words = 4874 sentences = 270 flesch = 52
summary = We present MEMIS, a system that can be used in emergencies like disasters to identify and analyze the damage indicated by user-generated multimodal social media posts, thereby helping the disaster management groups in making informed decisions. To this end, we propose MEMIS, a multimodal system capable of extracting information from social media, which employs both images and text for identifying damage and its severity in real time. Therefore, we effectively have three models for each modality: first for filtering the informative tweets, then for those pertaining to the infrastructural damage (or any other category related to the relief group), and finally for assessing the severity of damage present. Similarly, if at least one of the text and the image modality predicts an informative tweet as containing infrastructural damage, the tweet undergoes severity analysis. Here, we use attention fusion to combine the feature interpretations from the text and image modalities for the severity analysis module [12, 26].
cache = ./cache/cord-020896-yrocw53j.txt txt = ./txt/cord-020896-yrocw53j.txt
cache = ./cache/cord-020905-gw8i6tkn.txt txt = ./txt/cord-020905-gw8i6tkn.txt === reduce.pl bib === id = cord-020912-tbq7okmj author = Batra, Vishwash title = Variational Recurrent Sequence-to-Sequence Retrieval for Stepwise Illustration date = 2020-03-17 pages = extension = .txt mime = text/plain words = 4506 sentences = 247 flesch = 50 summary = We evaluate the model for the application of stepwise illustration of recipes, where a sequence of relevant images is retrieved to best match the steps described in the text. More concretely, we incorporate the global context information encoded in the entire text sequence (through the attention mechanism) into a variational autoencoder (VAE) at each time step, which converts the input text into an image representation in the image embedding space. To capture the semantics of the images retrieved so far (in a story/recipe), we assume the prior of the distribution of the topic given the text input follows the distribution conditional on the latent topic from the previous time step. -We propose a new variational recurrent seq2seq (VRSS) retrieval model for seq2seq retrieval, which employs temporally-dependent latent variables to capture the sequential semantic structure of text-image sequences. Our work is related to: cross-modal retrieval, story picturing, variational recurrent neural networks, and cooking recipe datasets. cache = ./cache/cord-020912-tbq7okmj.txt txt = ./txt/cord-020912-tbq7okmj.txt === reduce.pl bib === id = cord-020801-3sbicp3v author = MacAvaney, Sean title = Teaching a New Dog Old Tricks: Resurrecting Multilingual Retrieval Using Zero-Shot Learning date = 2020-03-24 pages = extension = .txt mime = text/plain words = 2530 sentences = 154 flesch = 53 summary = In this paper, we tackle the lack of data by leveraging pre-trained multilingual language models to transfer a retrieval system trained on English collections to non-English queries and documents. Our models are evaluated in a zero-shot setting, meaning that we use them to predict relevance scores for query-document pairs in languages never seen during training. [28] leveraged a data set of Wikipedia pages in 25 languages to train a learning-to-rank algorithm for Japanese-English and Swahili-English cross-language retrieval. In particular, to circumvent the lack of training data, we leverage transfer learning techniques to train Arabic, Mandarin, and Spanish retrieval models using English training data. We evaluate our models in a zero-shot setting; that is, we use them to predict relevance scores for query-document pairs in languages never seen during training. Because large-scale relevance judgments are largely absent in languages other than English, we propose a new setting to evaluate learning-to-rank approaches: zero-shot cross-lingual ranking. cache = ./cache/cord-020801-3sbicp3v.txt txt = ./txt/cord-020801-3sbicp3v.txt === reduce.pl bib === id = cord-020834-ch0fg9rp author = Grand, Adrien title = From MAXSCORE to Block-Max Wand: The Story of How Lucene Significantly Improved Query Evaluation Performance date = 2020-03-24 pages = extension = .txt mime = text/plain words = 2733 sentences = 137 flesch = 54 summary = We share the story of how an innovation that originated from academia (block-max indexes and the corresponding block-max Wand query evaluation algorithm of Ding and Suel [6]) made its way into the open-source Lucene search library.
We see this paper as having two main contributions beyond providing a narrative of events: First, we report results of experiments that attempt to match the original conditions of Ding and Suel [6] and present additional results on a number of standard academic IR test collections. Support for block-max indexes was the final feature that was implemented, based on the developers' reading of the paper by Ding and Suel [6], which required invasive changes to Lucene's index format. The story of block-max Wand in Lucene provides a case study of how an innovation that originated in academia made its way into the world's most widely-used search library and achieved significant impact in the "real world" through hundreds of production deployments worldwide (if we consider the broader Lucene ecosystem, which includes systems such as Elasticsearch and Solr). cache = ./cache/cord-020834-ch0fg9rp.txt txt = ./txt/cord-020834-ch0fg9rp.txt === reduce.pl bib === id = cord-020875-vd4rtxmz author = Suwaileh, Reem title = Time-Critical Geolocation for Social Good date = 2020-03-24 pages = extension = .txt mime = text/plain words = 2030 sentences = 134 flesch = 51 summary = To address this problem, I aim to exploit different techniques such as training neural models, enriching the tweet representation, and studying methods to mitigate the lack of labeled data. In my work, I am interested in tackling the Location Mention Prediction (LMP) problem during time-critical situations. The location taggers have to address many challenges including microblogging-specific challenges (e.g., tweet sparsity, noisiness, stream rapid-changing, hashtag riding, etc.) and the task-specific challenges (e.g., time-criticality of the solution, scarcity of labeled data, etc.). Alternatively, Sultanik and Fink [25] used an Information Retrieval (IR) based approach to identify the location mentions in tweets. Moreover, Hoang and Mothe [8] combined syntactic and semantic features to train traditional ML-based models, whereas Kumar and Singh [13] trained a Convolutional Neural Network (CNN) model that learns the continuous representation of tweet text and then identifies the location mentions. cache = ./cache/cord-020875-vd4rtxmz.txt txt = ./txt/cord-020875-vd4rtxmz.txt === reduce.pl bib === id = cord-020832-iavwkdpr author = Nguyen, Dat Quoc title = ChEMU: Named Entity Recognition and Event Extraction of Chemical Reactions from Patents date = 2020-03-24 pages = extension = .txt mime = text/plain words = 1980 sentences = 118 flesch = 49 summary = title: ChEMU: Named Entity Recognition and Event Extraction of Chemical Reactions from Patents ChEMU involves two key information extraction tasks over chemical reactions from patents. In this paper, we propose a new evaluation lab (called ChEMU) focusing on information extraction over chemical reactions from patents. Our goals are: (1) To develop tasks that impact chemical research in both academia and industry, (2) To provide the community with a new dataset of chemical entities, enriched with relational links between chemical event triggers and arguments, and (3) To advance the state-of-the-art in information extraction over chemical patents. The ChEMU lab at CLEF-2020 offers the two information extraction tasks of Named entity recognition (Task 1) and Event extraction (Task 2) over chemical reactions from patent documents. ChEMU will focus on two new tasks of named entity recognition and event extraction over chemical reactions from patents.
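To illustrate the named entity recognition task format that ChEMU targets, here is a toy dictionary-based tagger emitting BIO labels. It is only a format illustration under stated assumptions: real ChEMU systems are statistical or neural taggers, and the tiny `LEXICON` and label names here are invented for the example.

```python
# Toy chemical NER with BIO tags; a left-to-right greedy match against a
# hand-made lexicon stands in for a learned sequence-labelling model.
LEXICON = {("sodium", "chloride"), ("acetic", "acid")}

def bio_tag(tokens):
    """Assign B/I/O labels by greedy left-to-right lexicon matching."""
    tags, i = [], 0
    while i < len(tokens):
        match = next((m for m in LEXICON
                      if tuple(t.lower() for t in tokens[i:i + len(m)]) == m), None)
        if match:
            tags += ["B-CHEM"] + ["I-CHEM"] * (len(match) - 1)
            i += len(match)
        else:
            tags.append("O")
            i += 1
    return list(zip(tokens, tags))

print(bio_tag("The mixture of sodium chloride was stirred".split()))
```

Event extraction (Task 2) would then link such tagged entities to reaction trigger words, which is beyond this sketch.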
cache = ./cache/cord-020832-iavwkdpr.txt txt = ./txt/cord-020832-iavwkdpr.txt === reduce.pl bib === id = cord-020814-1ty7wzlv author = Berrendorf, Max title = Knowledge Graph Entity Alignment with Graph Convolutional Networks: Lessons Learned date = 2020-03-24 pages = extension = .txt mime = text/plain words = 2314 sentences = 144 flesch = 55 summary = In this work, we focus on the problem of entity alignment in Knowledge Graphs (KG) and we report on our experiences when applying a Graph Convolutional Network (GCN) based model for this task. Graph Convolutional Networks (GCN) [7, 9], which have recently become increasingly popular, are at the core of state-of-the-art methods for entity alignments in KGs [3, 6, 22, 24, 27]. 1. We investigate the reproducibility of the published results of a recent GCN-based method for entity alignment and uncover differences between the method's description in the paper and the authors' implementation. Overview of used datasets with their sizes in the number of triples (edges), entities (nodes), relations (different edge types) and alignments. GCN-Align [22] is a GCN-based approach to embed all entities from both graphs into a common embedding space (a toy sketch of this embed-and-match idea follows below). Semi-supervised entity alignment via knowledge graph embedding with awareness of degree difference Entity alignment between knowledge graphs using attribute embeddings cache = ./cache/cord-020814-1ty7wzlv.txt txt = ./txt/cord-020814-1ty7wzlv.txt === reduce.pl bib === id = cord-020813-0wc23ixy author = Hashemi, Helia title = ANTIQUE: A Non-factoid Question Answering Benchmark date = 2020-03-24 pages = extension = .txt mime = text/plain words = 2941 sentences = 185 flesch = 59 summary = Despite the importance of the task, the community still feels the significant lack of large-scale non-factoid question answering collections with real questions and comprehensive relevance judgments. Despite the widely-known importance of studying answer passage retrieval for non-factoid questions [1, 2, 8, 18], the research progress for this task is limited by the availability of high-quality public data. Although WikiPassageQA is an invaluable contribution to the community, it does not cover all aspects of the non-factoid question answering task and has the following limitations: (i) it only contains an average of 1.7 relevant passages per question and does not cover many questions with multiple correct answers; (ii) it was created from the Wikipedia website, containing only formal text; (iii) more importantly, the questions in the WikiPassageQA dataset were generated by crowdworkers, which is different from the questions that users ask in real-world systems; (iv) the relevant passages in WikiPassageQA contain the answer to the question in addition to some surrounding text. In contrast, ANTIQUE provides a reliable collection with complete relevance annotations for evaluating non-factoid QA models. cache = ./cache/cord-020813-0wc23ixy.txt txt = ./txt/cord-020813-0wc23ixy.txt === reduce.pl bib === id = cord-020903-qt0ly5d0 author = Tamine, Lynda title = What Can Task Teach Us About Query Reformulations? date = 2020-03-17 pages = extension = .txt mime = text/plain words = 4957 sentences = 264 flesch = 61 summary = task-based sessions represent significantly different background contexts that can be used to better understand users' query reformulations.
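As flagged in the Berrendorf entry above, GCN-based alignment pushes both graphs through shared weights and matches entities by nearest neighbour in the common space. A toy numpy sketch under stated assumptions: random toy graphs, a single propagation layer, and L1 distance for matching (GCN-Align itself is trained with an alignment loss, which is omitted here).

```python
import numpy as np

rng = np.random.default_rng(1)

def gcn_layer(A, X, W):
    """One GCN step: symmetrically normalised adjacency x features x weights."""
    A_hat = A + np.eye(A.shape[0])                 # add self-loops
    D_inv_sqrt = np.diag(A_hat.sum(1) ** -0.5)     # degree normalisation
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ W, 0.0)  # ReLU

def align(emb_a, emb_b):
    """Match each entity of graph A to its nearest neighbour in graph B (L1)."""
    dists = np.abs(emb_a[:, None, :] - emb_b[None, :, :]).sum(-1)
    return dists.argmin(1)

n, d = 5, 4
A1 = (rng.random((n, n)) < 0.4).astype(float)      # toy adjacency matrices
A2 = (rng.random((n, n)) < 0.4).astype(float)
X1, X2 = rng.normal(size=(n, d)), rng.normal(size=(n, d))
W = rng.normal(size=(d, d))                        # shared weights: one common space
print(align(gcn_layer(A1, X1, W), gcn_layer(A2, X2, W)))
```

The shared weight matrix is the crucial detail: without it the two graphs would land in incomparable spaces and nearest-neighbour matching would be meaningless.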
Using insights from large-scale search logs, our findings clearly show that the task is an additional relevant search unit that helps in better understanding users' query reformulation patterns and in predicting the user's next query. To design support processes for task-based search systems, we argue that we need to: (1) fully understand how the user's task, performed in natural settings, drives the query reformulation changes; and (2) gauge the level of similarity of these change trends with those observed in time-based sessions. With this in mind, we perform large-scale log analyses of users naturally engaged in tasks to examine query reformulations from both the time-based session and the task-based session perspectives. To identify query reformulation patterns, most of the previous works used large-scale log analyses segmented into time-based sessions. cache = ./cache/cord-020903-qt0ly5d0.txt txt = ./txt/cord-020903-qt0ly5d0.txt === reduce.pl bib === id = cord-020830-97xmu329 author = Ghanem, Bilal title = Irony Detection in a Multilingual Context date = 2020-03-24 pages = extension = .txt mime = text/plain words = 2806 sentences = 158 flesch = 54 summary = We show that these monolingual models trained separately on different languages using multilingual word representation or text-based features can open the door to irony detection in languages that lack annotated data for irony. We aim here to bridge the gap by tackling ID in tweets from both multilingual (French, English and Arabic) and multicultural perspectives (Indo-European languages whose speakers share quite the same cultural background vs. less culturally close languages). We can justify that by the fact that the language of the Arabic and French tweets is quite informal and contains many dialect words that may not exist in the pretrained embeddings we used, compared to the English ones (lower embedding coverage ratio), which makes it harder for the CNN to learn a clear semantic pattern (a minimal sketch of such a CNN follows below). The CNN architecture trained on cross-lingual word representation shows that irony has a certain similarity between the languages we targeted despite the cultural differences, which confirms that irony is a universal phenomenon, as already shown in previous linguistic studies [9, 24, 35]. cache = ./cache/cord-020830-97xmu329.txt txt = ./txt/cord-020830-97xmu329.txt === reduce.pl bib === id = cord-020848-nypu4w9s author = Morris, David title = SlideImages: A Dataset for Educational Image Classification date = 2020-03-24 pages = extension = .txt mime = text/plain words = 2276 sentences = 145 flesch = 51 summary = Currently, many document analysis systems are trained in part on scene images due to the lack of large datasets of educational image data. In this paper, we address this issue and present SlideImages, a dataset for the task of classifying educational illustrations. SlideImages contains training data collected from various sources, e.g., Wikimedia Commons and the AI2D dataset, and test data collected from educational slides. Born-digital and educational images need further benchmarks on challenging information retrieval tasks in order to test generalization. While document scans and born-digital educational illustrations have materially different appearance, these papers show that the utility of deep neural networks is not limited to scene image tasks (Fig. 1). The related DocFigure dataset covers similar images and has much more data than SlideImages. In this paper, we have presented the task of classifying educational illustrations and images in slides and introduced a novel dataset, SlideImages.
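As noted in the Ghanem entry above, the irony classifier is a CNN over (possibly cross-lingual) word embeddings. A minimal numpy sketch of that pattern, convolve over word-vector n-grams, max-pool over time, read out a probability; the filter widths, dimensions, and random weights are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

rng = np.random.default_rng(2)

def cnn_score(embeddings, filters, w_out):
    """1-D convolution over word vectors, max-over-time pooling, linear read-out."""
    n_words, d = embeddings.shape
    width = filters.shape[1] // d
    feats = []
    for f in filters:                         # each filter spans `width` word vectors
        acts = [np.maximum(f @ embeddings[i:i + width].ravel(), 0.0)
                for i in range(n_words - width + 1)]
        feats.append(max(acts))               # max pooling keeps the strongest n-gram
    logit = np.dot(w_out, feats)
    return 1.0 / (1.0 + np.exp(-logit))       # probability the tweet is ironic

d, width, n_filters = 50, 3, 8
tweet = rng.normal(size=(12, d))              # 12 words, e.g. multilingual embedding rows
filters = rng.normal(size=(n_filters, width * d))
w_out = rng.normal(size=n_filters)
print(cnn_score(tweet, filters, w_out))
```

Because the word vectors live in a shared multilingual space, the same trained filters can in principle be applied to French, English, or Arabic input, which is the transfer the paper examines.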
cache = ./cache/cord-020848-nypu4w9s.txt txt = ./txt/cord-020848-nypu4w9s.txt === reduce.pl bib === id = cord-020808-wpso3jug author = Cardoso, João title = Machine-Actionable Data Management Plans: A Knowledge Retrieval Approach to Automate the Assessment of Funders' Requirements date = 2020-03-24 pages = extension = .txt mime = text/plain words = 2328 sentences = 137 flesch = 49 summary = In order to guide researchers through the process of managing their data, many funding agencies (e.g. the National Science Foundation (NSF), the European Commission (EC), or the Fundação para a Ciência e Tecnologia (FCT)) have created and published their own open access policies, as well as requiring that any grant proposals be accompanied by a Data Management Plan (DMP). The DMP is a document describing the techniques, methods and policies on how data from a research project is to be created or collected, documented, accessed, preserved and disseminated. The second part comprises the execution of the following four tasks and results in both the collection of the necessary mappings between the ontology and the identified DMP templates, and the creation of DL queries based on the funders' requirements. The DMP Common Standard Ontology (DCSO) was created with the objective of providing an implementation of the DMP Common Standards model expressed through the usage of semantic technology, which has been considered a possible solution in the data management and preservation domains [9]. cache = ./cache/cord-020808-wpso3jug.txt txt = ./txt/cord-020808-wpso3jug.txt === reduce.pl bib === id = cord-020880-m7d4e0eh author = Barrón-Cedeño, Alberto title = CheckThat! at CLEF 2020: Enabling the Automatic Identification and Verification of Claims in Social Media date = 2020-03-24 pages = extension = .txt mime = text/plain words = 2693 sentences = 217 flesch = 64 summary = Task 3 asks to retrieve text snippets from a given set of Web pages that would be useful for verifying a target tweet's claim. Finally, the lab offers a fifth task that asks to predict the check-worthiness of the claims made in English political debates and speeches. Task 3 is defined as follows: Given a check-worthy claim on a specific topic and a set of text snippets extracted from potentially-relevant webpages, return a ranked list of all evidence snippets for the claim. Once we acquire annotations for Task 1, we share with participants the Web pages and text snippets from them solely for the check-worthy claims, which would enable the start of the evaluation cycle for Task 3. Task 4 is defined as follows: Given a check-worthy claim on a specific topic and a set of potentially-relevant Web pages, predict the veracity of the claim. cache = ./cache/cord-020880-m7d4e0eh.txt txt = ./txt/cord-020880-m7d4e0eh.txt === reduce.pl bib === id = cord-020904-x3o3a45b author = Montazeralghaem, Ali title = Relevance Ranking Based on Query-Aware Context Analysis date = 2020-03-17 pages = extension = .txt mime = text/plain words = 5192 sentences = 326 flesch = 53 summary = The primary goal of the proposed model is to combine the exact and semantic matching between query and document terms, which has been shown to produce effective performance in information retrieval. In basic retrieval models such as BM25 [30] and the language modeling framework [29], the relevance score of a document is estimated based on explicit matching of query and document terms.
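Since explicit matching via BM25 recurs throughout this carrel (it is also the subject of the Kamphuis et al. reproducibility study indexed here), the following is a self-contained sketch of one common BM25 variant. The idf smoothing below is the Lucene-style form; as that study shows, several scoring variants coexist under the BM25 name, so treat this as one choice among many rather than the definition.

```python
import math

def bm25(query, doc, docs, k1=0.9, b=0.4):
    """Score `doc` for `query` over a corpus `docs` of tokenised documents."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    score = 0.0
    for term in query:
        df = sum(term in d for d in docs)          # document frequency
        if df == 0:
            continue
        idf = math.log((N - df + 0.5) / (df + 0.5) + 1.0)  # smoothed, non-negative idf
        tf = doc.count(term)
        # Term-frequency saturation (k1) and document-length normalisation (b):
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avgdl))
    return score

docs = [["query", "terms", "match"], ["semantic", "matching", "of", "query", "terms"]]
print(bm25(["query", "matching"], docs[1], docs))
```

Montazeralghaem et al.'s point is precisely that such a score sees only exact matches; their model augments it with semantic term matching and query-aware context.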
Finally, our proposed model for relevance ranking provides the basis for natural integration of semantic term matching and local document context analysis into any retrieval model. [13] proposed a generalized estimate of document language models using a noisy channel, which captures semantic term similarities computed using word embeddings. Note that in this experiment, we only consider methods that select expansion terms based on word embeddings and not other information sources such as the top retrieved documents for each query (PRF). cache = ./cache/cord-020904-x3o3a45b.txt txt = ./txt/cord-020904-x3o3a45b.txt === reduce.pl bib === === reduce.pl bib === === reduce.pl bib === === reduce.pl bib === === reduce.pl bib === === reduce.pl bib === === reduce.pl bib === === reduce.pl bib === ===== Reducing email addresses Creating transaction Updating adr table ===== Reducing keywords cord-020793-kgje01qy cord-020794-d3oru1w5 cord-020811-pacy48qx cord-020820-cbikq0v0 cord-020815-j9eboa94 cord-020841-40f2p3t4 cord-020834-ch0fg9rp cord-020896-yrocw53j cord-020806-lof49r72 cord-020843-cq4lbd0l cord-020912-tbq7okmj cord-020885-f667icyt cord-020914-7p37m92a cord-020801-3sbicp3v cord-020899-d6r4fr9r cord-020909-n36p5n2k cord-020832-iavwkdpr cord-020875-vd4rtxmz cord-020932-o5scqiyk cord-020814-1ty7wzlv cord-020905-gw8i6tkn cord-020813-0wc23ixy cord-020916-ds0cf78u cord-020888-ov2lzus4 cord-020830-97xmu329 cord-020936-k1upc1xu cord-020848-nypu4w9s cord-020903-qt0ly5d0 cord-020851-hf5c0i9z cord-020904-x3o3a45b cord-020927-89c7rijg cord-020846-mfh1ope6 cord-020918-056bvngu cord-020891-lt3m8h41 cord-020835-n9v5ln2i cord-020931-fymgnv1g cord-020901-aew8xr6n cord-020872-frr8xba6 cord-020871-1v6dcmt3 cord-020808-wpso3jug cord-020880-m7d4e0eh cord-020890-aw465igx cord-020908-oe77eupc Creating transaction Updating wrd table ===== Reducing urls cord-020841-40f2p3t4 cord-020814-1ty7wzlv cord-020843-cq4lbd0l cord-020848-nypu4w9s cord-020916-ds0cf78u cord-020835-n9v5ln2i cord-020927-89c7rijg Creating transaction Updating url table ===== Reducing named entities cord-020834-ch0fg9rp cord-020820-cbikq0v0 cord-020806-lof49r72 cord-020815-j9eboa94 cord-020793-kgje01qy cord-020841-40f2p3t4 cord-020794-d3oru1w5 cord-020885-f667icyt cord-020896-yrocw53j cord-020912-tbq7okmj cord-020899-d6r4fr9r cord-020843-cq4lbd0l cord-020801-3sbicp3v cord-020811-pacy48qx cord-020914-7p37m92a cord-020888-ov2lzus4 cord-020905-gw8i6tkn cord-020936-k1upc1xu cord-020916-ds0cf78u cord-020909-n36p5n2k cord-020832-iavwkdpr cord-020932-o5scqiyk cord-020875-vd4rtxmz cord-020814-1ty7wzlv cord-020813-0wc23ixy cord-020903-qt0ly5d0 cord-020830-97xmu329 cord-020848-nypu4w9s cord-020808-wpso3jug cord-020846-mfh1ope6 cord-020880-m7d4e0eh cord-020835-n9v5ln2i cord-020904-x3o3a45b cord-020891-lt3m8h41 cord-020871-1v6dcmt3 cord-020908-oe77eupc cord-020927-89c7rijg cord-020931-fymgnv1g cord-020918-056bvngu cord-020901-aew8xr6n cord-020890-aw465igx cord-020851-hf5c0i9z cord-020872-frr8xba6 Creating transaction Updating ent table ===== Reducing parts of speech cord-020793-kgje01qy cord-020794-d3oru1w5 cord-020811-pacy48qx cord-020820-cbikq0v0 cord-020885-f667icyt cord-020896-yrocw53j cord-020834-ch0fg9rp cord-020806-lof49r72 cord-020912-tbq7okmj cord-020841-40f2p3t4 cord-020899-d6r4fr9r cord-020801-3sbicp3v cord-020914-7p37m92a cord-020843-cq4lbd0l cord-020815-j9eboa94 cord-020888-ov2lzus4 cord-020905-gw8i6tkn cord-020909-n36p5n2k cord-020936-k1upc1xu cord-020813-0wc23ixy cord-020916-ds0cf78u cord-020832-iavwkdpr cord-020814-1ty7wzlv cord-020932-o5scqiyk 
cord-020848-nypu4w9s cord-020903-qt0ly5d0 cord-020875-vd4rtxmz cord-020846-mfh1ope6 cord-020835-n9v5ln2i cord-020880-m7d4e0eh cord-020918-056bvngu cord-020830-97xmu329 cord-020851-hf5c0i9z cord-020890-aw465igx cord-020908-oe77eupc cord-020871-1v6dcmt3 cord-020891-lt3m8h41 cord-020904-x3o3a45b cord-020927-89c7rijg cord-020808-wpso3jug cord-020872-frr8xba6 cord-020931-fymgnv1g cord-020901-aew8xr6n Creating transaction Updating pos table Building ./etc/reader.txt cord-020904-x3o3a45b cord-020903-qt0ly5d0 cord-020888-ov2lzus4 cord-020916-ds0cf78u cord-020904-x3o3a45b cord-020936-k1upc1xu number of items: 43 sum of words: 100,167 average size in words: 3,338 average readability score: 55 nouns: model; query; information; text; document; data; models; retrieval; task; results; documents; user; dataset; word; words; image; search; users; attention; approach; tasks; performance; work; number; set; queries; features; learning; methods; graph; embeddings; terms; training; representations; evaluation; system; review; term; analysis; network; context; language; images; relevance; approaches; representation; similarity; sentiment; networks; datasets verbs: using; based; learns; show; propose; consider; provided; given; follows; generate; made; training; include; ranking; evaluate; compared; setting; finding; compute; contain; embedding; define; described; performs; obtain; supporting; retrieve; represent; identify; saw; introduce; predicts; improving; related; present; existing; takes; applying; extract; selected; combined; focused; capture; reported; require; outperforms; needs; sharing; observed; denotes adjectives: different; neural; semantic; new; similar; relevant; large; deep; previous; first; social; specific; available; non; multi; many; single; best; modal; cross; original; several; common; online; multiple; important; local; better; standard; simple; second; long; high; final; effective; top; various; visual; additional; real; able; latent; traditional; automatic; small; natural; possible; biased; average; textual adverbs: also; however; therefore; well; first; respectively; finally; even; instead; significantly; recently; often; better; furthermore; directly; still; automatically; fully; specifically; rather; especially; additionally; hence; randomly; much; moreover; always; already; usually; less; generally; widely; together; similarly; publicly; previously; now; typically; particularly; manually; semantically; far; otherwise; mainly; actually; simply; namely; jointly; highly; effectively pronouns: we; our; it; their; they; i; its; them; one; us; you; he; itself; my; his; your; u; ours; me; she; s; ourselves; themselves; ndcg@10; mine; him; her; 's; Π; f proper nouns: IR; Sect; BM25; Table; Fig; Eq; Retrieval; S; Lucene; Information; English; K; Twitter; COLTR; D; T; Bantu; BERT; i; DOI; TREC; Neural; TransRev; sha; Task; M; L; eRisk; C; VRSS; F; BC; LSTM; CNN; A; Wikipedia; Analysis; Network; LDA; dom; W; AUC; TF; Model; Cond; Amazon; corpus; IDF; DMP; Adam keywords: user; image; task; query; document; word; review; model; lucene; graph; english; dataset; claim; bm25; vrss; view; tweet; trec; topic; text; term; system; symptom; suel; session; sentence; seed; sd2c; schema; runyankore; recommendation; ranker; question; product; prf; premise; patent; passage; ontology; node; network; location; lmp; lmote; list; lexicon; language; label; item; irony one topic; one dimension: model file(s): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148004/ titles(s): CLEF eHealth Evaluation Lab 2020 three 
topics; one dimension: query; task; graph file(s): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148224/, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148247/, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148256/ titles(s): Relevance Ranking Based on Query-Aware Context Analysis | Counterfactual Online Learning to Rank | Axiomatic Analysis of Contact Recommendation Methods in Social Networks: An IR Perspective five topics; three dimensions: model document text; query task retrieval; data learning tweets; schema task dataset; graph nodes information file(s): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148208/, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148223/, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148247/, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148229/, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148211/ titles(s): Learning to Rank Images with Cross-Modal Graph Convolutions | What Can Task Teach Us About Query Reformulations? | Counterfactual Online Learning to Rank | bias goggles: Graph-Based Computation of the Bias of Web Domains Through the Eyes of Users | KvGR: A Graph-Based Interface for Explorative Sequential Question Answering on Heterogeneous Information Sources Type: cord title: journal-advancesInInformationRetrieval-cord date: 2021-05-30 time: 15:05 username: emorgan patron: Eric Morgan email: emorgan@nd.edu input: facet_journal:"Advances in Information Retrieval" ==== make-pages.sh htm files ==== make-pages.sh complex files ==== make-pages.sh named enities ==== making bibliographics id: cord-020896-yrocw53j author: Agarwal, Mansi title: MEMIS: Multimodal Emergency Management Information System date: 2020-03-17 words: 4874.0 sentences: 270.0 pages: flesch: 52.0 cache: ./cache/cord-020896-yrocw53j.txt txt: ./txt/cord-020896-yrocw53j.txt summary: We present MEMIS, a system that can be used in emergencies like disasters to identify and analyze the damage indicated by user-generated multimodal social media posts, thereby helping the disaster management groups in making informed decisions. To this end, we propose MEMIS, a multimodal system capable of extracting information from social media, and employs both images and text for identifying damage and its severity in real-time (refer Sect. Therefore, we effectively have three models for each modality: first for filtering the informative tweets, then for those pertaining to the infrastructural damage (or any other category related to the relief group), and finally for assessing the severity of damage present. Similarly, if at least one of the text and the image modality predicts an informative tweet as containing infrastructural damage, the tweet undergoes severity analysis. Here, we use attention fusion to combine the feature interpretations from the text and image modalities for the severity analysis module [12, 26] . abstract: The recent upsurge in the usage of social media and the multimedia data generated therein has attracted many researchers for analyzing and decoding the information to automate decision-making in several fields. This work focuses on one such application: disaster management in times of crises and calamities. The existing research on disaster damage analysis has primarily taken only unimodal information in the form of text or image into account. These unimodal systems, although useful, fail to model the relationship between the various modalities. Different modalities often present supporting facts about the task, and therefore, learning them together can enhance performance. 
We present MEMIS, a system that can be used in emergencies like disasters to identify and analyze the damage indicated by user-generated multimodal social media posts, thereby helping the disaster management groups in making informed decisions. Our leave-one-disaster-out experiments on a multimodal dataset suggest that not only does fusing information in different media forms improve performance, but that our system can also generalize well to new disaster categories. Further qualitative analysis reveals that the system is responsive and computationally efficient. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148216/ doi: 10.1007/978-3-030-45439-5_32 id: cord-020843-cq4lbd0l author: Almeida, Tiago title: Calling Attention to Passages for Biomedical Question Answering date: 2020-03-24 words: 2235.0 sentences: 125.0 pages: flesch: 50.0 cache: ./cache/cord-020843-cq4lbd0l.txt txt: ./txt/cord-020843-cq4lbd0l.txt summary: This paper presents a pipeline for document and passage retrieval for biomedical question answering built around a new variant of the DeepRank network model in which the recursive layer is replaced by a self-attention layer combined with a weighting mechanism. On the other hand, models such as the Deep Relevance Matching Model (DRMM) [3] or DeepRank [10] follow an interaction-based approach, in which matching signals between query and document are captured and used by the neural network to produce a ranking score. The main contribution of this work is a new variant of the DeepRank neural network architecture in which the recursive layer originally included in the final aggregation step is replaced by a self-attention layer followed by a weighting mechanism similar to the term gating layer of the DRMM (a sketch of this attention-plus-gating idea follows below). The proposed model was evaluated on the BioASQ dataset, as part of a document and passage (snippet) retrieval pipeline for biomedical question answering, achieving similar retrieval performance when compared to more complex network architectures. abstract: Question answering can be described as retrieving relevant information for questions expressed in natural language, possibly also generating a natural language answer. This paper presents a pipeline for document and passage retrieval for biomedical question answering built around a new variant of the DeepRank network model in which the recursive layer is replaced by a self-attention layer combined with a weighting mechanism. This adaptation halves the total number of parameters and makes the network more suited for identifying the relevant passages in each document. The overall retrieval system was evaluated on the BioASQ tasks 6 and 7, achieving similar retrieval performance when compared to more complex network architectures. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148054/ doi: 10.1007/978-3-030-45442-5_9 id: cord-020880-m7d4e0eh author: Barrón-Cedeño, Alberto title: CheckThat! at CLEF 2020: Enabling the Automatic Identification and Verification of Claims in Social Media date: 2020-03-24 words: 2693.0 sentences: 217.0 pages: flesch: 64.0 cache: ./cache/cord-020880-m7d4e0eh.txt txt: ./txt/cord-020880-m7d4e0eh.txt summary: Task 3 asks to retrieve text snippets from a given set of Web pages that would be useful for verifying a target tweet's claim. Finally, the lab offers a fifth task that asks to predict the check-worthiness of the claims made in English political debates and speeches.
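Returning to the Almeida and Matos entry above, as flagged there, here is a minimal numpy sketch of a self-attention layer followed by a passage-weighting (gating) step. The single head, the shapes, and the gating vector are simplifying assumptions rather than the paper's exact DeepRank variant.

```python
import numpy as np

rng = np.random.default_rng(3)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend_passages(P, Wq, Wk, gate):
    """Self-attention across a document's passage vectors, then a learned
    per-passage weighting that aggregates them into one document vector."""
    Q, K = P @ Wq, P @ Wk
    att = softmax(Q @ K.T / np.sqrt(P.shape[1]))   # passage-to-passage attention
    ctx = att @ P                                  # contextualised passage vectors
    weights = softmax(ctx @ gate)                  # gating: importance of each passage
    return weights @ ctx                           # aggregated vector for ranking

n_passages, d = 6, 16
P = rng.normal(size=(n_passages, d))               # passage matching signals
doc_vec = attend_passages(P, rng.normal(size=(d, d)),
                          rng.normal(size=(d, d)), rng.normal(size=d))
print(doc_vec.shape)
```

Replacing a recurrent aggregator with this kind of layer is what lets the model both cut parameters and expose which passages drove the document score.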
Task 3 is defined as follows: Given a check-worthy claim on a specific topic and a set of text snippets extracted from potentially-relevant webpages, return a ranked list of all evidence snippets for the claim. Once we acquire annotations for Task 1, we share with participants the Web pages and text snippets from them solely for the check-worthy claims, which would enable the start of the evaluation cycle for Task 3. Task 4 is defined as follows: Given a check-worthy claim on a specific topic and a set of potentially-relevant Web pages, predict the veracity of the claim. abstract: We describe the third edition of the CheckThat! Lab, which is part of the 2020 Cross-Language Evaluation Forum (CLEF). CheckThat! proposes four complementary tasks and a related task from previous lab editions, offered in English, Arabic, and Spanish. Task 1 asks to predict which tweets in a Twitter stream are worth fact-checking. Task 2 asks to determine whether a claim posted in a tweet can be verified using a set of previously fact-checked claims. Task 3 asks to retrieve text snippets from a given set of Web pages that would be useful for verifying a target tweet's claim. Task 4 asks to predict the veracity of a target tweet's claim using a set of potentially-relevant Web pages. Finally, the lab offers a fifth task that asks to predict the check-worthiness of the claims made in English political debates and speeches. CheckThat! features a full evaluation framework. The evaluation is carried out using mean average precision or precision at rank k for ranking tasks, and F1 for classification tasks. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148108/ doi: 10.1007/978-3-030-45442-5_65 id: cord-020912-tbq7okmj author: Batra, Vishwash title: Variational Recurrent Sequence-to-Sequence Retrieval for Stepwise Illustration date: 2020-03-17 words: 4506.0 sentences: 247.0 pages: flesch: 50.0 cache: ./cache/cord-020912-tbq7okmj.txt txt: ./txt/cord-020912-tbq7okmj.txt summary: We evaluate the model for the application of stepwise illustration of recipes, where a sequence of relevant images is retrieved to best match the steps described in the text. More concretely, we incorporate the global context information encoded in the entire text sequence (through the attention mechanism) into a variational autoencoder (VAE) at each time step, which converts the input text into an image representation in the image embedding space. To capture the semantics of the images retrieved so far (in a story/recipe), we assume the prior of the distribution of the topic given the text input follows the distribution conditional on the latent topic from the previous time step. -We propose a new variational recurrent seq2seq (VRSS) retrieval model for seq2seq retrieval, which employs temporally-dependent latent variables to capture the sequential semantic structure of text-image sequences. Our work is related to: cross-modal retrieval, story picturing, variational recurrent neural networks, and cooking recipe datasets. abstract: We address and formalise the task of sequence-to-sequence (seq2seq) cross-modal retrieval. Given a sequence of text passages as query, the goal is to retrieve a sequence of images that best describes and aligns with the query. This new task extends the traditional cross-modal retrieval, where each image-text pair is treated independently ignoring broader context. We propose a novel variational recurrent seq2seq (VRSS) retrieval model for this seq2seq task.
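The core mechanism the VRSS summary describes, a temporally-dependent latent topic sampled per step and mapped into the image-embedding space, can be sketched in a few lines. This is a minimal illustration assuming linear maps and Gaussian latents; the names (`vrss_step`, the `W_*` matrices) and dimensions are invented for the example, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(4)

def reparameterise(mu, log_var):
    """Sample z ~ N(mu, sigma^2) with the reparameterisation trick."""
    return mu + np.exp(0.5 * log_var) * rng.normal(size=mu.shape)

def vrss_step(text_state, prev_z, W_mu, W_lv, W_img):
    """One time step: condition the latent topic on the attended text state and
    the previous step's topic, then project it into the image-embedding space."""
    h = np.concatenate([text_state, prev_z])
    mu, log_var = W_mu @ h, W_lv @ h
    z = reparameterise(mu, log_var)               # temporally-dependent latent variable
    return z, W_img @ z                           # synthetic image-embedding query point

d_text, d_z, d_img = 32, 8, 16
z = np.zeros(d_z)
W_mu = rng.normal(size=(d_z, d_text + d_z)) * 0.1
W_lv = rng.normal(size=(d_z, d_text + d_z)) * 0.1
W_img = rng.normal(size=(d_img, d_z))
for step_state in rng.normal(size=(3, d_text)):   # e.g. three recipe steps
    z, img_query = vrss_step(step_state, z, W_mu, W_lv, W_img)
print(img_query.shape)
```

At retrieval time the projected point would be matched against candidate image embeddings, so each step's query remembers, through `z`, what was already illustrated.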
Unlike most cross-modal methods, we generate an image vector corresponding to the latent topic obtained from combining the text semantics and context. This synthetic image embedding point associated with every text embedding point can then be employed for either image generation or image retrieval as desired. We evaluate the model for the application of stepwise illustration of recipes, where a sequence of relevant images is retrieved to best match the steps described in the text. To this end, we build and release a new Stepwise Recipe dataset for research purposes, containing 10K recipes (sequences of image-text pairs) having a total of 67K image-text pairs. To our knowledge, it is the first publicly available dataset to offer rich semantic descriptions in a focused category such as food or recipes. Our model is shown to outperform several competitive and relevant baselines in the experiments. We also provide qualitative analysis of how semantically meaningful the results produced by our model are through human evaluation and comparison with relevant existing methods. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148232/ doi: 10.1007/978-3-030-45439-5_4 id: cord-020814-1ty7wzlv author: Berrendorf, Max title: Knowledge Graph Entity Alignment with Graph Convolutional Networks: Lessons Learned date: 2020-03-24 words: 2314.0 sentences: 144.0 pages: flesch: 55.0 cache: ./cache/cord-020814-1ty7wzlv.txt txt: ./txt/cord-020814-1ty7wzlv.txt summary: In this work, we focus on the problem of entity alignment in Knowledge Graphs (KG) and we report on our experiences when applying a Graph Convolutional Network (GCN) based model for this task. Graph Convolutional Networks (GCN) [7, 9], which have recently become increasingly popular, are at the core of state-of-the-art methods for entity alignments in KGs [3, 6, 22, 24, 27]. 1. We investigate the reproducibility of the published results of a recent GCN-based method for entity alignment and uncover differences between the method's description in the paper and the authors' implementation. Overview of used datasets with their sizes in the number of triples (edges), entities (nodes), relations (different edge types) and alignments. GCN-Align [22] is a GCN-based approach to embed all entities from both graphs into a common embedding space. Semi-supervised entity alignment via knowledge graph embedding with awareness of degree difference Entity alignment between knowledge graphs using attribute embeddings abstract: In this work, we focus on the problem of entity alignment in Knowledge Graphs (KG) and we report on our experiences when applying a Graph Convolutional Network (GCN) based model for this task. Variants of GCN are used in multiple state-of-the-art approaches and therefore it is important to understand the specifics and limitations of GCN-based models. Despite serious efforts, we were not able to fully reproduce the results from the original paper and after a thorough audit of the code provided by authors, we concluded that their implementation is different from the architecture described in the paper. In addition, several tricks are required to make the model work and some of them are not very intuitive. We provide an extensive ablation study to quantify the effects these tricks and changes of architecture have on final performance.
Furthermore, we examine current evaluation approaches and systematize available benchmark datasets. We believe that people interested in KG matching might profit from our work, as well as novices entering the field. (Code: https://github.com/Valentyn1997/kg-alignment-lessons-learned). url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148025/ doi: 10.1007/978-3-030-45442-5_1 id: cord-020890-aw465igx author: Brochier, Robin title: Inductive Document Network Embedding with Topic-Word Attention date: 2020-03-17 words: nan sentences: nan pages: flesch: nan cache: txt: summary: abstract: Document network embedding aims at learning representations for a structured text corpus, i.e. when documents are linked to each other. Recent algorithms extend network embedding approaches by incorporating the text content associated with the nodes in their formulations. In most cases, it is hard to interpret the learned representations. Moreover, little importance is given to the generalization to new documents that are not observed within the network. In this paper, we propose an interpretable and inductive document network embedding method. We introduce a novel mechanism, the Topic-Word Attention (TWA), that generates document representations based on the interplay between word and topic representations. We train these word and topic vectors through our general model, Inductive Document Network Embedding (IDNE), by leveraging the connections in the document network. Quantitative evaluations show that our approach achieves state-of-the-art performance on various networks and we qualitatively show that our model produces meaningful and interpretable representations of the words, topics and documents. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148210/ doi: 10.1007/978-3-030-45439-5_22 id: cord-020808-wpso3jug author: Cardoso, João title: Machine-Actionable Data Management Plans: A Knowledge Retrieval Approach to Automate the Assessment of Funders' Requirements date: 2020-03-24 words: 2328.0 sentences: 137.0 pages: flesch: 49.0 cache: ./cache/cord-020808-wpso3jug.txt txt: ./txt/cord-020808-wpso3jug.txt summary: In order to guide researchers through the process of managing their data, many funding agencies (e.g. the National Science Foundation (NSF), the European Commission (EC), or the Fundação para a Ciência e Tecnologia (FCT)) have created and published their own open access policies, as well as requiring that any grant proposals be accompanied by a Data Management Plan (DMP). The DMP is a document describing the techniques, methods and policies on how data from a research project is to be created or collected, documented, accessed, preserved and disseminated. The second part comprises the execution of the following four tasks and results in both the collection of the necessary mappings between the ontology and the identified DMP templates, and the creation of DL queries based on the funders' requirements. The DMP Common Standard Ontology (DCSO) was created with the objective of providing an implementation of the DMP Common Standards model expressed through the usage of semantic technology, which has been considered a possible solution in the data management and preservation domains [9]. abstract: Funding bodies and other policy-makers are increasingly more concerned with Research Data Management (RDM). The Data Management Plan (DMP) is one of the tools available to perform RDM tasks; however, it is not a perfect concept.
The Machine-Actionable Data Management Plan (maDMP) is a concept that aims to make the DMP interoperable, automated and increasingly standardised. In this paper we showcase that through the usage of semantic technologies, it is possible to both express and exploit the features of the maDMP. In particular, we focus on showing how a maDMP formalised as an ontology can be used to automate the assessment of a funder's requirements for a given organisation. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148019/ doi: 10.1007/978-3-030-45442-5_15 id: cord-020908-oe77eupc author: Chen, Zhiyu title: Leveraging Schema Labels to Enhance Dataset Search date: 2020-03-17 words: nan sentences: nan pages: flesch: nan cache: txt: summary: abstract: A search engine's ability to retrieve desirable datasets is important for data sharing and reuse. Existing dataset search engines typically rely on matching queries to dataset descriptions. However, a user may not have enough prior knowledge to write a query using terms that match with description text. We propose a novel schema label generation model which generates possible schema labels based on dataset table content. We incorporate the generated schema labels into a mixed ranking model which not only considers the relevance between the query and dataset metadata but also the similarity between the query and generated schema labels. To evaluate our method on real-world datasets, we create a new benchmark specifically for the dataset retrieval task. Experiments show that our approach can effectively improve the precision and NDCG scores of the dataset retrieval task compared with baseline methods. We also test on a collection of Wikipedia tables to show that the features generated from schema labels can improve the unsupervised and supervised web table retrieval task as well. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148228/ doi: 10.1007/978-3-030-45439-5_18 id: cord-020899-d6r4fr9r author: Doinychko, Anastasiia title: Biconditional Generative Adversarial Networks for Multiview Learning with Missing Views date: 2020-03-17 words: 4666.0 sentences: 244.0 pages: flesch: 56.0 cache: ./cache/cord-020899-d6r4fr9r.txt txt: ./txt/cord-020899-d6r4fr9r.txt summary: In this paper, we present a conditional GAN with two generators and a common discriminator for multiview learning problems where observations have two views, but one of them may be missing for some of the training samples. We address the problem of multiview learning with Generative Adversarial Networks (GANs) in the case where some observations may have missing views without there being an external resource to complete them. We demonstrate that the generated views allow us to achieve state-of-the-art results on a subset of the Reuters RCV1/RCV2 collections compared to multiview approaches that rely on Machine Translation (MT) to translate documents, before training the models, into languages in which their versions do not exist (Sect. 3.2). The approach achieves state-of-the-art performance on multilingual document classification compared to multiview approaches that rely on external view-generating functions; this is also a more challenging application than image analysis, which is the domain of choice for the design of new GAN models. abstract: In this paper, we present a conditional GAN with two generators and a common discriminator for multiview learning problems where observations have two views, but one of them may be missing for some of the training samples.
This is for example the case for multilingual collections where documents are not available in all languages. Some studies tackled this problem by assuming the existence of view generation functions to approximately complete the missing views; for example Machine Translation to translate documents into the missing languages. These functions generally require an external resource to be set and their quality has a direct impact on the performance of the learned multiview classifier over the completed training set. Our proposed approach addresses this problem by jointly learning the missing views and the multiview classifier using a tripartite game with two generators and a discriminator. Each of the generators is associated with one of the views and tries to fool the discriminator by generating the other missing view conditionally on the corresponding observed view. The discriminator then tries to identify if for an observation, one of its views is completed by one of the generators or if both views are completed along with its class. Our results on a subset of Reuters RCV1/RCV2 collections show that the discriminator achieves significant classification performance; and that the generators learn the missing views with high quality without the need of any consequent external resource. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148219/ doi: 10.1007/978-3-030-45439-5_53 id: cord-020914-7p37m92a author: Dumani, Lorik title: A Framework for Argument Retrieval: Ranking Argument Clusters by Frequency and Specificity date: 2020-03-17 words: 5482.0 sentences: 302.0 pages: flesch: 67.0 cache: ./cache/cord-020914-7p37m92a.txt txt: ./txt/cord-020914-7p37m92a.txt summary: From an information retrieval perspective, an interesting task within this setting is finding the best supporting and attacking premises for a given query claim from a large corpus of arguments. From an information retrieval perspective, an interesting task within this setting is finding the best supporting (pro) and attacking (con) premises for a given query claim [31]. Given a user's keyword query, the system retrieves, ranks, and presents premises supporting and attacking the query, taking similarity of the query with the premise, its corresponding claim, and other contextual information into account. We assume that we work with a large corpus of argumentative text, for example collections of political speeches or forum discussions, that has already been mined and transferred into claims with the corresponding premises and stances. We consider the following problem: Given a controversial claim or topic, for example "We should abandon fossil fuels", a user searches for the most important premises from the corpus supporting or attacking it. abstract: Computational argumentation has recently become a fast-growing field of research. An argument consists of a claim, such as "We should abandon fossil fuels", which is supported or attacked by at least one premise, for example "Burning fossil fuels is one cause for global warming". From an information retrieval perspective, an interesting task within this setting is finding the best supporting and attacking premises for a given query claim from a large corpus of arguments. Since the same logical premise can be formulated differently, the system needs to avoid retrieving duplicate results and thus needs to use some form of clustering.
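The first stage of the framework just described, finding claims in the corpus that are highly similar to the query claim and then surfacing their premises, can be illustrated with standard tools. A minimal sketch using scikit-learn tf-idf and cosine similarity; the toy corpus, the 0.2 similarity cut-off, and the ranking by raw claim similarity are assumptions for the example (the paper's framework additionally clusters duplicate premises and accounts for stance and specificity).

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Tiny illustrative corpus of (claim, premise) pairs; real corpora are mined arguments.
pairs = [
    ("We should abandon fossil fuels", "Burning fossil fuels is one cause for global warming"),
    ("Fossil fuels must be phased out", "Renewables have become cheaper than coal"),
    ("School uniforms should be mandatory", "Uniforms reduce peer pressure"),
]

query = "We should stop using fossil fuels"
claims = [c for c, _ in pairs]
vec = TfidfVectorizer().fit(claims + [query])
sims = cosine_similarity(vec.transform([query]), vec.transform(claims))[0]

# Keep premises of sufficiently similar claims, ranked by claim similarity.
ranked = sorted(((s, p) for (c, p), s in zip(pairs, sims) if s > 0.2), reverse=True)
for s, premise in ranked:
    print(f"{s:.2f}  {premise}")
```

In the full framework the retrieved premises would then be grouped into clusters of equivalent formulations and ranked by frequency and specificity rather than by claim similarity alone.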
In this paper we propose a principled probabilistic ranking framework for premises based on the idea of tf-idf that, given a query claim, first identifies highly similar claims in the corpus, and then clusters and ranks their premises, taking clusters of claims as well as the stances of query and premises into account. We compare our approach to a baseline system that uses BM25F, which we outperform even with a primitive implementation of our framework utilising BERT. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148234/ doi: 10.1007/978-3-030-45439-5_29 id: cord-020916-ds0cf78u author: Fard, Mazar Moradi title: Seed-Guided Deep Document Clustering date: 2020-03-17 words: 5079.0 sentences: 265.0 pages: flesch: 57.0 cache: ./cache/cord-020916-ds0cf78u.txt txt: ./txt/cord-020916-ds0cf78u.txt summary: The main contributions of this study can be summarized as follows: (a) We introduce the Seed-guided Deep Document Clustering (SD2C) framework, the first attempt, to the best of our knowledge, to constrain clustering with seed words based on a deep clustering approach; and (b) we validate this framework through experiments based on automatically selected seed words on five publicly available text datasets with various sizes and characteristics. The constrained clustering problem we are addressing in fact bears strong similarity to that of seed-guided dataless text classification, which consists in categorizing documents based on a small set of seed words describing the classes/clusters. This can be done by enforcing that seed words have more influence either on the learned document embeddings, a solution we refer to as SD2C-Doc, or on the cluster representatives, a solution we refer to as SD2C-Rep. Note that the second solution can only be used when the clustering process is based on cluster representatives (i.e., $R = \{r_k\}_{k=1}^{K}$ with $K$ the number of clusters), which is indeed the case for most current deep clustering methods [1]. abstract: Different users may be interested in different clustering views underlying a given collection (e.g., topic and writing style in documents). Enabling them to provide constraints reflecting their needs can then help obtain tailored clustering results. For document clustering, constraints can be provided in the form of seed words, each cluster being characterized by a small set of words. This seed-guided constrained document clustering problem was recently addressed through topic modeling approaches. In this paper, we jointly learn deep representations and bias the clustering results through the seed words, leading to a Seed-guided Deep Document Clustering approach. Its effectiveness is demonstrated on five public datasets. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148236/ doi: 10.1007/978-3-030-45439-5_1 id: cord-020888-ov2lzus4 author: Formal, Thibault title: Learning to Rank Images with Cross-Modal Graph Convolutions date: 2020-03-17 words: 5211.0 sentences: 256.0 pages: flesch: 55.0 cache: ./cache/cord-020888-ov2lzus4.txt txt: ./txt/cord-020888-ov2lzus4.txt summary: While most of the current approaches for cross-modal retrieval revolve around learning how to represent text and images in a shared latent space, we take a different direction: we propose to generalize the cross-modal relevance feedback mechanism, a simple yet effective unsupervised method, that relies on standard information retrieval heuristics and the choice of a few hyper-parameters.
The model can be understood very simply: similarly to PRF methods in standard information retrieval, the goal is to boost images that are visually similar to top images (from a text point of view), i.e. images that are likely to be relevant to the query but were initially badly ranked (which is likely to happen in the web scenario, where text is crawled from the source page and can be very noisy). abstract: We are interested in the problem of cross-modal retrieval for web image search, where the goal is to retrieve images relevant to a text query. While most of the current approaches for cross-modal retrieval revolve around learning how to represent text and images in a shared latent space, we take a different direction: we propose to generalize the cross-modal relevance feedback mechanism, a simple yet effective unsupervised method, that relies on standard information retrieval heuristics and the choice of a few hyper-parameters. We show that we can cast it as a supervised representation learning problem on graphs, using graph convolutions operating jointly over text and image features, namely cross-modal graph convolutions. The proposed architecture directly learns how to combine image and text features for the ranking task, while taking into account the context given by all the other elements in the set of images to be (re-)ranked. We validate our approach on two datasets: a public dataset from a MediaEval challenge, and a small sample of proprietary image search query logs, referred to as WebQ. Our experiments demonstrate that our model improves over standard baselines. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148208/ doi: 10.1007/978-3-030-45439-5_39 id: cord-020901-aew8xr6n author: García-Durán, Alberto title: TransRev: Modeling Reviews as Translations from Users to Items date: 2020-03-17 words: nan sentences: nan pages: flesch: nan cache: txt: summary: abstract: The text of a review expresses the sentiment a customer has towards a particular product. This is exploited in sentiment analysis where machine learning models are used to predict the review score from the text of the review. Furthermore, the products customers have purchased in the past are indicative of the products they will purchase in the future. This is what recommender systems exploit by learning models from purchase information to predict the items a customer might be interested in. The underlying structure of this problem setting is a bipartite graph, wherein customer nodes are connected to product nodes via 'review' links. This is reminiscent of knowledge bases, with 'review' links replacing relation types. We propose TransRev, an approach to the product recommendation problem that integrates ideas from recommender systems, sentiment analysis, and multi-relational learning into a joint learning objective. TransRev learns vector representations for users, items, and reviews. The embedding of a review is learned such that (a) it performs well as an input feature of a regression model for sentiment prediction; and (b) it always translates the reviewer embedding to the embedding of the reviewed item. This is reminiscent of TransE [5], a popular embedding method for link prediction in knowledge bases. This allows TransRev to approximate a review embedding at test time as the difference of the embedding of each item and the user embedding. The approximated review embedding is then used with the regression model to predict the review score for each item.
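The test-time recipe in the TransRev abstract (approximate each candidate review embedding as item minus user, then score it with the regression head) is compact enough to sketch directly. A minimal numpy illustration with random stand-in embeddings; the linear regression head and all dimensions are assumptions for the example, since the trained parameters are of course not available here.

```python
import numpy as np

rng = np.random.default_rng(5)

d, n_items = 16, 100
user = rng.normal(size=d)                # learned user embedding (stand-in)
items = rng.normal(size=(n_items, d))    # learned item embeddings (stand-ins)
w, b = rng.normal(size=d), 0.0           # regression head for the review score

# TransE-style relation from the abstract: user + review ~= item, so at test
# time the unseen review embedding is approximated as item - user.
approx_reviews = items - user            # (n_items, d), one approximation per item
predicted_scores = approx_reviews @ w + b

top = np.argsort(-predicted_scores)[:5]
print("recommended item ids:", top)
```

The same approximated vectors are what the paper uses to retrieve the nearest training review text per user-item pair, which gives the recommendations a degree of explainability.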
TransRev outperforms state-of-the-art recommender systems on a large number of benchmark data sets. Moreover, it is able to retrieve, for each user and item, the review text from the training set whose embedding is most similar to the approximated review embedding. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148221/ doi: 10.1007/978-3-030-45439-5_16 id: cord-020830-97xmu329 author: Ghanem, Bilal title: Irony Detection in a Multilingual Context date: 2020-03-24 words: 2806.0 sentences: 158.0 pages: flesch: 54.0 cache: ./cache/cord-020830-97xmu329.txt txt: ./txt/cord-020830-97xmu329.txt summary: We show that these monolingual models trained separately on different languages using multilingual word representation or text-based features can open the door to irony detection in languages that lack annotated data for irony. We aim here to bridge the gap by tackling ID in tweets from both multilingual (French, English and Arabic) and multicultural perspectives (Indo-European languages whose speakers share quite the same cultural background vs. less culturally close languages). We can justify that by the fact that the language of the Arabic and French tweets is quite informal and contains many dialect words that may not exist in the pretrained embeddings we used, compared to the English ones (lower embedding coverage ratio), which makes it harder for the CNN to learn a clear semantic pattern. The CNN architecture trained on cross-lingual word representation shows that irony has a certain similarity between the languages we targeted despite the cultural differences, which confirms that irony is a universal phenomenon, as already shown in previous linguistic studies [9, 24, 35]. abstract: This paper proposes the first multilingual (French, English and Arabic) and multicultural (Indo-European languages vs. less culturally close languages) irony detection system. We employ both feature-based models and neural architectures using monolingual word representation. We compare the performance of these systems with state-of-the-art systems to identify their capabilities. We show that these monolingual models trained separately on different languages using multilingual word representation or text-based features can open the door to irony detection in languages that lack annotated data for irony. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148041/ doi: 10.1007/978-3-030-45442-5_18 id: cord-020834-ch0fg9rp author: Grand, Adrien title: From MAXSCORE to Block-Max Wand: The Story of How Lucene Significantly Improved Query Evaluation Performance date: 2020-03-24 words: 2733.0 sentences: 137.0 pages: flesch: 54.0 cache: ./cache/cord-020834-ch0fg9rp.txt txt: ./txt/cord-020834-ch0fg9rp.txt summary: We share the story of how an innovation that originated from academia (block-max indexes and the corresponding block-max Wand query evaluation algorithm of Ding and Suel [6]) made its way into the open-source Lucene search library. We see this paper as having two main contributions beyond providing a narrative of events: First, we report results of experiments that attempt to match the original conditions of Ding and Suel [6] and present additional results on a number of standard academic IR test collections. Support for block-max indexes was the final feature that was implemented, based on the developers' reading of the paper by Ding and Suel [6], which required invasive changes to Lucene's index format.
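The block-max index structure just mentioned is easy to illustrate: postings are split into fixed-size blocks, each carrying its maximum impact score, so a query evaluator can skip whole blocks that cannot beat the current top-k threshold. A toy sketch under simplifying assumptions (precomputed float impacts, a single term); Lucene's actual implementation operates on compressed blocks inside its codec, which this does not attempt to model.

```python
# A toy block-max posting list: fixed-size blocks, each storing its max score.
BLOCK = 4
postings = [(1, 0.2), (3, 0.1), (4, 0.3), (7, 0.2),
            (9, 1.4), (12, 0.2), (15, 0.3), (20, 0.1)]
blocks = [postings[i:i + BLOCK] for i in range(0, len(postings), BLOCK)]
block_max = [max(s for _, s in b) for b in blocks]

def scan(threshold):
    """Return docs whose scores can still matter given the heap threshold."""
    survivors = []
    for b, ub in zip(blocks, block_max):
        if ub <= threshold:
            continue          # block-max skip: no doc inside can enter the top-k
        survivors.extend(d for d, s in b if s > threshold)
    return survivors

print(scan(threshold=0.35))   # the first block is skipped outright; only doc 9 survives
```

The per-block upper bounds are what distinguish block-max Wand from plain Wand, which only knows one global maximum per term and therefore skips far less aggressively.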
The story of block-max Wand in Lucene provides a case study of how an innovation that originated in academia made its way into the world's most widely-used search library and achieved significant impact in the "real world" through hundreds of production deployments worldwide (if we consider the broader Lucene ecosystem, which includes systems such as Elasticsearch and Solr). abstract: The latest major release of Lucene (version 8) in March 2019 incorporates block-max indexes and exploits the block-max variant of Wand for query evaluation, which are innovations that originated from academia. This paper shares the story of how this came to be, which provides an interesting case study at the intersection of reproducibility and academic research achieving impact in the "real world". We offer additional thoughts on the often idiosyncratic processes by which academic research makes its way into deployed solutions. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148045/ doi: 10.1007/978-3-030-45442-5_3 id: cord-020813-0wc23ixy author: Hashemi, Helia title: ANTIQUE: A Non-factoid Question Answering Benchmark date: 2020-03-24 words: 2941.0 sentences: 185.0 pages: flesch: 59.0 cache: ./cache/cord-020813-0wc23ixy.txt txt: ./txt/cord-020813-0wc23ixy.txt summary: Despite the importance of the task, the community still faces a significant lack of large-scale non-factoid question answering collections with real questions and comprehensive relevance judgments. Despite the widely-known importance of studying answer passage retrieval for non-factoid questions [1, 2, 8, 18], the research progress for this task is limited by the availability of high-quality public data. Although WikiPassageQA is an invaluable contribution to the community, it does not cover all aspects of the non-factoid question answering task and has the following limitations: (i) it only contains an average of 1.7 relevant passages per question and does not cover many questions with multiple correct answers; (ii) it was created from the Wikipedia website, containing only formal text; (iii) more importantly, the questions in the WikiPassageQA dataset were generated by crowdworkers, which is different from the questions that users ask in real-world systems; (iv) the relevant passages in WikiPassageQA contain the answer to the question in addition to some surrounding text. In contrast, ANTIQUE provides a reliable collection with complete relevance annotations for evaluating non-factoid QA models. abstract: Considering the widespread use of mobile and voice search, answer passage retrieval for non-factoid questions plays a critical role in modern information retrieval systems. Despite the importance of the task, the community still faces a significant lack of large-scale non-factoid question answering collections with real questions and comprehensive relevance judgments. In this paper, we develop and release a collection of 2,626 open-domain non-factoid questions from a diverse set of categories. The dataset, called ANTIQUE, contains 34k manual relevance annotations. The questions were asked by real users in a community question answering service, i.e., Yahoo! Answers. Relevance judgments for all the answers to each question were collected through crowdsourcing. To facilitate further research, we also include a brief analysis of the data as well as baseline results on both classical and neural IR models.
url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148024/ doi: 10.1007/978-3-030-45442-5_21 id: cord-020841-40f2p3t4 author: Hofstätter, Sebastian title: Neural-IR-Explorer: A Content-Focused Tool to Explore Neural Re-ranking Results date: 2020-03-24 words: 1526.0 sentences: 91.0 pages: flesch: 53.0 cache: ./cache/cord-020841-40f2p3t4.txt txt: ./txt/cord-020841-40f2p3t4.txt summary: In this paper we look beyond metrics-based evaluation of Information Retrieval systems, to explore the reasons behind ranking results. We present the content-focused Neural-IR-Explorer, which empowers users to browse through retrieval results and inspect the inner workings and fine-grained results of neural re-ranking models. The explorer complements metrics-based evaluation, by focusing on the content of queries and documents, and how the neural models relate them to each other. Users can explore each query result in more detail: We show the internal partial scores and content of the returned documents with different highlighting modes to surface the inner workings of a neural re-ranking model. The explorer displays data created by a batched evaluation run of a neural re-ranking model. Additionally, the Neural-IR-Explorer illuminates the pool bias [12] of the MSMARCO ranking collection: the small number of judged documents per query makes the evaluation fragile. We presented the content-focused Neural-IR-Explorer to complement metrics-based evaluation of retrieval models. abstract: In this paper we look beyond metrics-based evaluation of Information Retrieval systems, to explore the reasons behind ranking results. We present the content-focused Neural-IR-Explorer, which empowers users to browse through retrieval results and inspect the inner workings and fine-grained results of neural re-ranking models. The explorer includes a categorized overview of the available queries, as well as an individual query result view with various options to highlight semantic connections between query-document pairs. The Neural-IR-Explorer is available at: https://neural-ir-explorer.ec.tuwien.ac.at/. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148052/ doi: 10.1007/978-3-030-45442-5_58 id: cord-020835-n9v5ln2i author: Jangra, Anubhav title: Text-Image-Video Summary Generation Using Joint Integer Linear Programming date: 2020-03-24 words: nan sentences: nan pages: flesch: nan cache: txt: summary: abstract: Automatically generating a summary for asynchronous data can help users to keep up with the rapid growth of multi-modal information on the Internet. However, the current multi-modal systems usually generate summaries composed of text and images. In this paper, we propose a novel research problem of text-image-video summary generation (TIVS). We first develop a multi-modal dataset containing text documents, images and videos. We then propose a novel joint integer linear programming multi-modal summarization (JILP-MMS) framework. We report the performance of our model on the developed dataset. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148046/ doi: 10.1007/978-3-030-45442-5_24 id: cord-020815-j9eboa94 author: Kamphuis, Chris title: Which BM25 Do You Mean?
A Large-Scale Reproducibility Study of Scoring Variants date: 2020-03-24 words: 2249.0 sentences: 154.0 pages: flesch: 60.0 cache: ./cache/cord-020815-j9eboa94.txt txt: ./txt/cord-020815-j9eboa94.txt summary: Experiments on three newswire collections show that there are no significant effectiveness differences between them, including Lucene's often maligned approximation of document length. Although learning-to-rank approaches and neural ranking models are widely used today, they are typically deployed as part of a multi-stage reranking architecture, over candidate documents supplied by a simple term-matching method using traditional inverted indexes [1]. Our goal is a large-scale reproducibility study to explore the nuances of different variants of BM25 and their impact on retrieval effectiveness. Their findings are confirmed: effectiveness differences in IR experiments are unlikely to be the result of the choice of BM25 variant a system implemented. We implemented a variant that uses exact document lengths, but is otherwise identical to the Lucene default. Storing exact document lengths would allow for different ranking functions to be swapped at query time more easily, as no information would be discarded at index time. abstract: When researchers speak of BM25, it is not entirely clear which variant they mean, since many tweaks to Robertson et al.’s original formulation have been proposed. When practitioners speak of BM25, they most likely refer to the implementation in the Lucene open-source search library. Does this ambiguity "matter"? We attempt to answer this question with a large-scale reproducibility study of BM25, considering eight variants. Experiments on three newswire collections show that there are no significant effectiveness differences between them, including Lucene's often maligned approximation of document length. As an added benefit, our empirical approach takes advantage of databases for rapid IR prototyping, which validates both the feasibility and methodological advantages claimed in previous work. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148026/ doi: 10.1007/978-3-030-45442-5_4 id: cord-020806-lof49r72 author: Landin, Alfonso title: Novel and Diverse Recommendations by Leveraging Linear Models with User and Item Embeddings date: 2020-03-24 words: 2373.0 sentences: 150.0 pages: flesch: 52.0 cache: ./cache/cord-020806-lof49r72.txt txt: ./txt/cord-020806-lof49r72.txt summary: title: Novel and Diverse Recommendations by Leveraging Linear Models with User and Item Embeddings In this paper, we present EER, a linear model for the top-N recommendation task, which takes advantage of user and item embeddings for improving novelty and diversity without harming accuracy. In this paper, we propose a method to augment an existing linear recommendation model to make more diverse and novel recommendations, while maintaining similar accuracy results. Experiments conducted on three datasets show that our proposal outperforms the original model in both novelty and diversity while maintaining similar levels of accuracy. On the other hand, as results in Table 3 show, ELP is able to provide good figures in novelty and diversity, thanks to the embedding model capturing non-linear relations between users and items. It is common in the field of recommender systems for methods with lower accuracy to have higher values in diversity and novelty.
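A toy illustration of why the BM25 variants compared in the Kamphuis et al. entry above can disagree: the two calls below share one common BM25 formulation and differ only in whether the document length is exact or lossily quantized. The quantization function is a crude hypothetical stand-in for Lucene's one-byte length encoding, not Lucene's actual code.

import math

def idf(N, df):
    # A common BM25 IDF formulation.
    return math.log(1 + (N - df + 0.5) / (df + 0.5))

def bm25(tf, dl, avgdl, N, df, k1=0.9, b=0.4):
    return idf(N, df) * tf * (k1 + 1) / (tf + k1 * (1 - b + b * dl / avgdl))

def quantize_length(dl):
    # Hypothetical lossy encoding keeping ~2 significant bits of the
    # document length, a crude stand-in for Lucene's compressed lengths.
    shift = max(dl.bit_length() - 2, 0)
    return (dl >> shift) << shift

N, df, avgdl, tf, dl = 100_000, 250, 300.0, 3, 437
print("exact length:    ", round(bm25(tf, dl, avgdl, N, df), 4))
print("quantized length:", round(bm25(tf, quantize_length(dl), avgdl, N, df), 4))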
abstract: Nowadays, item recommendation is an increasing concern for many companies. Users tend to be more reactive than proactive when solving information needs. Recommendation accuracy became the most studied aspect of the quality of the suggestions. However, novel and diverse suggestions also contribute to user satisfaction. Unfortunately, it is common to harm those two aspects when optimizing recommendation accuracy. In this paper, we present EER, a linear model for the top-N recommendation task, which takes advantage of user and item embeddings for improving novelty and diversity without harming accuracy. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148017/ doi: 10.1007/978-3-030-45442-5_27 id: cord-020794-d3oru1w5 author: Leekha, Maitree title: A Multi-task Approach to Open Domain Suggestion Mining Using Language Model for Text Over-Sampling date: 2020-03-24 words: 1569.0 sentences: 105.0 pages: flesch: 59.0 cache: ./cache/cord-020794-d3oru1w5.txt txt: ./txt/cord-020794-d3oru1w5.txt summary: title: A Multi-task Approach to Open Domain Suggestion Mining Using Language Model for Text Over-Sampling In this work, we introduce a novel over-sampling technique to address the problem of class imbalance, and propose a multi-task deep learning approach for mining suggestions from multiple domains. Experimental results on a publicly available dataset show that our over-sampling technique, coupled with the multi-task framework, outperforms state-of-the-art open domain suggestion mining models in terms of the F-1 measure and AUC. In our study, we generate synthetic positive reviews until the number of suggestion and non-suggestion class samples becomes equal in the training set. All comparisons have been made in terms of the F-1 score of the suggestion class for a fair comparison with prior work on representational learning for open domain suggestion mining [5] (see Baseline in Table 3). In this work, we proposed a multi-task learning framework for open domain suggestion mining along with a novel language-model-based over-sampling technique for text (LMOTE). abstract: Consumer reviews online may contain suggestions useful for improving commercial products and services. Mining suggestions is challenging due to the absence of large labeled and balanced datasets. Furthermore, most prior studies attempting to mine suggestions have focused on a single domain such as Hotel or Travel only. In this work, we introduce a novel over-sampling technique to address the problem of class imbalance, and propose a multi-task deep learning approach for mining suggestions from multiple domains. Experimental results on a publicly available dataset show that our over-sampling technique, coupled with the multi-task framework, outperforms state-of-the-art open domain suggestion mining models in terms of the F-1 measure and AUC. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148005/ doi: 10.1007/978-3-030-45442-5_28 id: cord-020851-hf5c0i9z author: Losada, David E. title: eRisk 2020: Self-harm and Depression Challenges date: 2020-03-24 words: nan sentences: nan pages: flesch: nan cache: txt: summary: abstract: This paper describes eRisk, the CLEF lab on early risk prediction on the Internet. eRisk started in 2017 as an attempt to set the experimental foundations of early risk detection.
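Returning to the Leekha et al. entry above, the class-balancing loop it describes can be sketched generically. The code below keeps generating synthetic minority-class ("suggestion") texts until both classes are equal in size; random token dropout is used here as a placeholder generator, since the paper's LMOTE samples from a language model instead. All example reviews are invented.

import random

def oversample_minority(texts, labels, minority=1, drop_p=0.1, seed=0):
    # Balance a binary text dataset by generating perturbed copies of
    # minority-class examples until the class counts are equal.
    rng = random.Random(seed)
    pos = [t for t, y in zip(texts, labels) if y == minority]
    neg = [t for t, y in zip(texts, labels) if y != minority]
    synthetic = []
    while len(pos) + len(synthetic) < len(neg):
        tokens = rng.choice(pos).split()
        kept = [w for w in tokens if rng.random() > drop_p] or tokens
        synthetic.append(" ".join(kept))
    return texts + synthetic, labels + [minority] * len(synthetic)

texts = ["add a night mode please", "great hotel", "room was clean",
         "the pool was cold", "would be nice to have late checkout"]
labels = [1, 0, 0, 0, 1]  # 1 = suggestion (minority), 0 = non-suggestion
bal_texts, bal_labels = oversample_minority(texts, labels)
print(sum(bal_labels), len(bal_labels) - sum(bal_labels))  # now balanced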
Over the last three editions of eRisk (2017, 2018 and 2019), the lab organized a number of early risk detection challenges oriented to the problems of detecting depression, anorexia and self-harm. We review in this paper the main lessons learned from the past and we discuss our future plans for the 2020 edition. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148062/ doi: 10.1007/978-3-030-45442-5_72 id: cord-020801-3sbicp3v author: MacAvaney, Sean title: Teaching a New Dog Old Tricks: Resurrecting Multilingual Retrieval Using Zero-Shot Learning date: 2020-03-24 words: 2530.0 sentences: 154.0 pages: flesch: 53.0 cache: ./cache/cord-020801-3sbicp3v.txt txt: ./txt/cord-020801-3sbicp3v.txt summary: In this paper, we tackle the lack of data by leveraging pre-trained multilingual language models to transfer a retrieval system trained on English collections to non-English queries and documents. Our model is evaluated in a zero-shot setting, meaning that we use it to predict relevance scores for query-document pairs in languages never seen during training. The authors of [28] leveraged a data set of Wikipedia pages in 25 languages to train a learning-to-rank algorithm for Japanese-English and Swahili-English cross-language retrieval. In particular, to circumvent the lack of training data, we leverage transfer learning techniques to train Arabic, Mandarin, and Spanish retrieval models using English training data. We evaluate our models in a zero-shot setting; that is, we use them to predict relevance scores for query-document pairs in languages never seen during training. Because large-scale relevance judgments are largely absent in languages other than English, we propose a new setting to evaluate learning-to-rank approaches: zero-shot cross-lingual ranking. abstract: While billions of non-English speaking users rely on search engines every day, the problem of ad-hoc information retrieval is rarely studied for non-English languages. This is primarily due to a lack of data sets that are suitable for training ranking algorithms. In this paper, we tackle the lack of data by leveraging pre-trained multilingual language models to transfer a retrieval system trained on English collections to non-English queries and documents. Our model is evaluated in a zero-shot setting, meaning that we use it to predict relevance scores for query-document pairs in languages never seen during training. Our results show that the proposed approach can significantly outperform unsupervised retrieval techniques for Arabic, Chinese Mandarin, and Spanish. We also show that augmenting the English training collection with some examples from the target language can sometimes improve performance. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148012/ doi: 10.1007/978-3-030-45442-5_31 id: cord-020931-fymgnv1g author: Meng, Changping title: ReadNet: A Hierarchical Transformer Framework for Web Article Readability Analysis date: 2020-03-17 words: nan sentences: nan pages: flesch: nan cache: txt: summary: abstract: Analyzing the readability of articles has been an important sociolinguistic task. Addressing this task is necessary for the automatic recommendation of appropriate articles to readers with different comprehension abilities, and it further benefits education systems, web information systems, and digital libraries.
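The zero-shot transfer recipe of the MacAvaney et al. entry above amounts to scoring query-document pairs with a multilingual cross-encoder that was fine-tuned only on English pairs. A minimal sketch using the Hugging Face transformers API follows; the checkpoint name is a generic multilingual encoder, not the paper's model, and without the English fine-tuning step the randomly initialized scoring head returns arbitrary values.

# pip install torch transformers
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL = "bert-base-multilingual-cased"  # generic placeholder checkpoint
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=1)
model.eval()

def relevance(query, doc):
    # Score one query-document pair; after fine-tuning on English pairs,
    # the same call scores non-English pairs zero-shot, because the encoder
    # shares one multilingual vocabulary and embedding space.
    enc = tok(query, doc, truncation=True, max_length=256, return_tensors="pt")
    with torch.no_grad():
        return model(**enc).logits.squeeze().item()

# A Spanish pair, scored by a model that (in the paper's setting) would
# have seen only English training data.
print(relevance("síntomas de la gripe", "La gripe causa fiebre y tos."))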
Current methods for assessing readability employ empirical measures or statistical learning techniques that are limited in their ability to characterize complex patterns such as article structures and semantic meanings of sentences. In this paper, we propose a new and comprehensive framework which uses a hierarchical self-attention model to analyze document readability. In this model, measurements of sentence-level difficulty are captured along with the semantic meanings of each sentence. Additionally, the sentence-level features are incorporated to characterize the overall readability of an article with consideration of article structures. We evaluate our proposed approach on three widely-used benchmark datasets against several strong baseline approaches. Experimental results show that our proposed method achieves state-of-the-art performance in estimating the readability of various web articles and literature. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148251/ doi: 10.1007/978-3-030-45439-5_3 id: cord-020904-x3o3a45b author: Montazeralghaem, Ali title: Relevance Ranking Based on Query-Aware Context Analysis date: 2020-03-17 words: 5192.0 sentences: 326.0 pages: flesch: 53.0 cache: ./cache/cord-020904-x3o3a45b.txt txt: ./txt/cord-020904-x3o3a45b.txt summary: The primary goal of the proposed model is to combine the exact and semantic matching between query and document terms, which has been shown to produce effective performance in information retrieval. In basic retrieval models such as BM25 [30] and the language modeling framework [29], the relevance score of a document is estimated based on explicit matching of query and document terms. Finally, our proposed model for relevance ranking provides the basis for natural integration of semantic term matching and local document context analysis into any retrieval model. The authors of [13] proposed a generalized estimate of document language models using a noisy channel, which captures semantic term similarities computed using word embeddings. Note that in this experiment, we only consider methods that select expansion terms based on word embeddings and not other information sources such as the top retrieved documents for each query (PRF). abstract: Word mismatch between queries and documents is a long-standing challenge in information retrieval. Recent advances in distributed word representations address the word mismatch problem by enabling semantic matching. However, most existing models rank documents based on semantic matching between query and document terms without an explicit understanding of the relationship of the match to relevance. To consider semantic matching between query and document, we propose an unsupervised semantic matching model by simulating a user who makes relevance decisions. The primary goal of the proposed model is to combine the exact and semantic matching between query and document terms, which has been shown to produce effective performance in information retrieval. As semantic matching between queries and entire documents is computationally expensive, we propose to use local contexts of query terms in documents for semantic matching. Matching with smaller query-related contexts of documents stems from the relevance judgment process recorded by human observers. The most relevant part of a document is then recognized and used to rank documents with respect to the query.
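The combination of exact and semantic matching over local contexts described in this entry can be sketched generically. The code below is an illustrative simplification, not the paper's model: exact matches are counted directly, while semantic similarity is measured only against tokens inside small windows around query-term occurrences; the embeddings are random stand-ins.

import numpy as np

rng = np.random.default_rng(1)
VOCAB = ["flu", "influenza", "fever", "cough", "causes", "and"]
EMB = {w: rng.normal(size=8) for w in VOCAB}  # random stand-in embeddings

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def score(query, doc, window=3, alpha=0.6):
    # Exact component: plain term-frequency matching.
    exact = sum(doc.count(q) for q in query)
    # Semantic component: only tokens inside windows around positions
    # where a query term occurs are eligible for soft matching.
    ctx = set()
    for pos, w in enumerate(doc):
        if w in query:
            ctx.update(doc[max(0, pos - window): pos + window + 1])
    ctx = {c for c in ctx if c in EMB} - set(query)
    semantic = sum(max((cos(EMB[q], EMB[c]) for c in ctx), default=0.0)
                   for q in query)
    return alpha * exact + (1 - alpha) * semantic

doc = "influenza causes fever and cough".split()  # tokenized document
print(score(["fever", "flu"], doc))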
Experimental results on several representative retrieval models and standard datasets show that our proposed semantic matching model significantly outperforms competitive baselines in all measures. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148224/ doi: 10.1007/978-3-030-45439-5_30 id: cord-020848-nypu4w9s author: Morris, David title: SlideImages: A Dataset for Educational Image Classification date: 2020-03-24 words: 2276.0 sentences: 145.0 pages: flesch: 51.0 cache: ./cache/cord-020848-nypu4w9s.txt txt: ./txt/cord-020848-nypu4w9s.txt summary: Currently, many document analysis systems are trained in part on scene images due to the lack of large datasets of educational image data. In this paper, we address this issue and present SlideImages, a dataset for the task of classifying educational illustrations. SlideImages contains training data collected from various sources, e.g., Wikimedia Commons and the AI2D dataset, and test data collected from educational slides. Born-digital and educational images need further benchmarks on challenging information retrieval tasks in order to test generalization. While document scans and born-digital educational illustrations have a materially different appearance, these papers show that the utility of deep neural networks is not limited to scene image tasks (Fig. 1). The related DocFigure dataset covers similar images and has much more data than SlideImages. In this paper, we have presented the task of classifying educational illustrations and images in slides and introduced a novel dataset, SlideImages. abstract: In the past few years, convolutional neural networks (CNNs) have achieved impressive results in computer vision tasks, which however mainly focus on photos with natural scene content. Besides, non-sensor derived images such as illustrations, data visualizations, figures, etc. are typically used to convey complex information or to explore large datasets. However, this kind of image has received little attention in computer vision. CNNs and similar techniques use large volumes of training data. Currently, many document analysis systems are trained in part on scene images due to the lack of large datasets of educational image data. In this paper, we address this issue and present SlideImages, a dataset for the task of classifying educational illustrations. SlideImages contains training data collected from various sources, e.g., Wikimedia Commons and the AI2D dataset, and test data collected from educational slides. We have reserved all the actual educational images as a test dataset in order to ensure that the approaches using this dataset generalize well to new educational images, and potentially other domains. Furthermore, we present a baseline system using a standard deep neural architecture and discuss dealing with the challenge of limited training data. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148059/ doi: 10.1007/978-3-030-45442-5_36 id: cord-020811-pacy48qx author: Muhammad, Shamsuddeen Hassan title: Incremental Approach for Automatic Generation of Domain-Specific Sentiment Lexicon date: 2020-03-24 words: 1725.0 sentences: 113.0 pages: flesch: 50.0 cache: ./cache/cord-020811-pacy48qx.txt txt: ./txt/cord-020811-pacy48qx.txt summary: title: Incremental Approach for Automatic Generation of Domain-Specific Sentiment Lexicon To this end, we propose an approach to automatically generate a domain-specific sentiment lexicon using a vector model enriched by weights.
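A baseline in the spirit of the SlideImages entry above, for dealing with limited training data: take a standard ImageNet CNN and retrain only its classification head. The class count, tensors, and configuration below are hypothetical, not the authors' exact baseline.

# pip install torch torchvision
import torch, torch.nn as nn
from torchvision import models

NUM_CLASSES = 8  # hypothetical number of educational-illustration classes
net = models.resnet18(weights=None)  # or load pretrained ImageNet weights
net.fc = nn.Linear(net.fc.in_features, NUM_CLASSES)

# Freeze the convolutional backbone; train only the new head.
for p in net.parameters():
    p.requires_grad = False
for p in net.fc.parameters():
    p.requires_grad = True

opt = torch.optim.Adam(net.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(4, 3, 224, 224)          # stand-in image batch
y = torch.randint(0, NUM_CLASSES, (4,))  # stand-in labels
opt.zero_grad()
loss = loss_fn(net(x), y)
loss.backward()
opt.step()
print(float(loss))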
Although research has been carried out on corpus-based approaches for the automatic generation of a domain-specific lexicon [1, 4, 5, 7, 9, 10, 14], existing approaches have focused on the creation of a lexicon from a single corpus [4]. To this end, this work proposes an incremental approach for the automatic generation of a domain-specific sentiment lexicon. We aim to investigate an incremental technique for automatically generating a domain-specific sentiment lexicon from a corpus. Can we automatically generate a sentiment lexicon from a corpus and improve on existing approaches? After detecting the domain shift, we merge the distributions using an approach similar to the one discussed for updating with the same corpus, and generate the lexicon. abstract: A sentiment lexicon plays a vital role in lexicon-based sentiment analysis. The lexicon-based method is often preferred because it leads to more explainable answers in comparison with many machine learning-based methods. But the semantic orientation of a word depends on its domain. Hence, a general-purpose sentiment lexicon may give sub-optimal performance compared with a domain-specific lexicon. However, it is challenging to manually generate a domain-specific sentiment lexicon for each domain. Moreover, it is impractical to generate a complete sentiment lexicon for a domain from a single corpus. To this end, we propose an approach to automatically generate a domain-specific sentiment lexicon using a vector model enriched by weights. Importantly, we propose an incremental approach for updating an existing lexicon to either the same domain or a different domain (domain adaptation). Finally, we discuss how to incorporate sentiment lexicon information in neural models (word embeddings) for better performance. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148022/ doi: 10.1007/978-3-030-45442-5_81 id: cord-020918-056bvngu author: Nchabeleng, Mathibele title: Evaluating the Effectiveness of the Standard Insights Extraction Pipeline for Bantu Languages date: 2020-03-17 words: nan sentences: nan pages: flesch: nan cache: txt: summary: abstract: Extracting insights from data obtained from the web in order to identify people’s views and opinions on various topics is a growing practice. The standard insights extraction pipeline is typically an unsupervised machine learning task composed of processes that preprocess the text, visualize it, cluster and identify the topics and sentiment in each cluster, and then graph the network. Given the increasing amount of data being generated on the internet in Africa today, and the multilingual state of African countries, we evaluated how well the standard pipeline works when applied to text wholly or partially written in indigenous African languages, specifically Bantu languages. We carried out an exploratory investigation using Twitter data and compared the outputs from each step of the pipeline for an English dataset and a mixed Bantu language dataset. We found that for Bantu languages, due to their complex grammatical structure, extra preprocessing steps such as part-of-speech tagging and morphological analysis are required during data cleaning, threshold values should be adjusted during topic modeling, and semantic analysis should be performed before completing text preprocessing.
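Since the Muhammad et al. entry above only summarizes its weighted vector model, the sketch below uses a simpler count-based stand-in to make the incremental idea concrete: polarity scores are seed-anchored PMI values computed from co-occurrence counts, and an update merges counts from a new corpus instead of recomputing from scratch. Seed words and corpora are invented.

import math
from collections import Counter
from itertools import combinations

SEEDS = {"good": +1, "bad": -1}  # tiny hand-picked seed lexicon

def cooccurrence(docs):
    # Count word occurrences and same-document co-occurrences.
    occ, co = Counter(), Counter()
    for doc in docs:
        words = set(doc.split())
        occ.update(words)
        co.update(frozenset(p) for p in combinations(sorted(words), 2))
    return occ, co

def polarity(word, occ, co, n_docs):
    # Seed-anchored PMI: positive seeds pull the score up, negative down.
    score = 0.0
    for seed, sign in SEEDS.items():
        joint = co[frozenset((word, seed))]
        if joint and occ[word] and occ[seed]:
            score += sign * math.log(joint * n_docs / (occ[word] * occ[seed]))
    return score

corpus = ["good camera great photos", "bad battery", "great screen good value"]
occ, co = cooccurrence(corpus)

# Incremental update: merge counts from a new corpus, no full recomputation.
new_docs = ["bad noisy photos"]
occ2, co2 = cooccurrence(new_docs)
occ.update(occ2); co.update(co2)
n = len(corpus) + len(new_docs)
print({w: round(polarity(w, occ, co, n), 2) for w in ["photos", "battery"]})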
url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148238/ doi: 10.1007/978-3-030-45439-5_11 id: cord-020832-iavwkdpr author: Nguyen, Dat Quoc title: ChEMU: Named Entity Recognition and Event Extraction of Chemical Reactions from Patents date: 2020-03-24 words: 1980.0 sentences: 118.0 pages: flesch: 49.0 cache: ./cache/cord-020832-iavwkdpr.txt txt: ./txt/cord-020832-iavwkdpr.txt summary: title: ChEMU: Named Entity Recognition and Event Extraction of Chemical Reactions from Patents ChEMU involves two key information extraction tasks over chemical reactions from patents. In this paper, we propose a new evaluation lab (called ChEMU) focusing on information extraction over chemical reactions from patents. Our goals are: (1) To develop tasks that impact chemical research in both academia and industry, (2) To provide the community with a new dataset of chemical entities, enriched with relational links between chemical event triggers and arguments, and (3) To advance the state-of-the-art in information extraction over chemical patents. The ChEMU lab at CLEF-2020 offers the two information extraction tasks of Named entity recognition (Task 1) and Event extraction (Task 2) over chemical reactions from patent documents. ChEMU will focus on two new tasks of named entity recognition and event extraction over chemical reactions from patents. abstract: We introduce a new evaluation lab named ChEMU (Cheminformatics Elsevier Melbourne University), part of the 11th Conference and Labs of the Evaluation Forum (CLEF-2020). ChEMU involves two key information extraction tasks over chemical reactions from patents. Task 1—Named entity recognition—involves identifying chemical compounds as well as their types in context, i.e., to assign the label of a chemical compound according to the role which the compound plays within a chemical reaction. Task 2—Event extraction over chemical reactions—involves event trigger detection and argument recognition. We briefly present the motivations and goals of the ChEMU tasks, as well as resources and evaluation methodology. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148043/ doi: 10.1007/978-3-030-45442-5_74 id: cord-020820-cbikq0v0 author: Papadakos, Panagiotis title: Dualism in Topical Relevance date: 2020-03-24 words: 2468.0 sentences: 133.0 pages: flesch: 56.0 cache: ./cache/cord-020820-cbikq0v0.txt txt: ./txt/cord-020820-cbikq0v0.txt summary: To this end, in this paper we elaborate on the idea of leveraging the available antonyms of the original query terms for eventually producing an answer which provides a better overview of the related conceptual and information space. In this paper we elaborate on the idea of leveraging the available antonyms of the original query terms (if they exist), for eventually producing an answer which provides a better overview of the related information and conceptual space. In their comments for these queries, users mention that the selected (i.e., dual) list "provides a more general picture" and "more relevant and interesting results, although contradicting". For the future, we plan to define the appropriate antonym selection algorithms and relevance metrics, implement the proposed functionality in a meta-search setting, and conduct a large-scale evaluation with real users over exploratory tasks, to identify for which queries and for which types of users the dual approach is beneficial. abstract: There are several concepts whose interpretation and meaning are defined through their binary opposition with other opposite concepts.
To this end, in this paper we elaborate on the idea of leveraging the available antonyms of the original query terms for eventually producing an answer which provides a better overview of the related conceptual and information space. Specifically, we sketch a method in which antonyms are used for producing dual queries, which can in turn be exploited for defining a multi-dimensional topical relevance based on the antonyms. We motivate this direction by providing examples and by conducting a preliminary evaluation that shows its importance to specific users. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148031/ doi: 10.1007/978-3-030-45442-5_40 id: cord-020909-n36p5n2k author: Papadakos, Panagiotis title: bias goggles: Graph-Based Computation of the Bias of Web Domains Through the Eyes of Users date: 2020-03-17 words: 5005.0 sentences: 256.0 pages: flesch: 63.0 cache: ./cache/cord-020909-n36p5n2k.txt txt: ./txt/cord-020909-n36p5n2k.txt summary: The main contributions are:
- the bias goggles model for computing the bias characteristics of web domains for a user-defined concept, based on the notions of Biased Concepts (BCs), Aspects of Bias (ABs), and the metrics of the support of the domain for a specific AB and BC, and its bias score for this BC,
- the introduction of the Support Flow Graph (SFG), along with graph-based algorithms for computing the AB support score of domains, that include adaptations of the Independence Cascade (IC) and Linear Threshold (LT) propagation models, and the new Biased-PageRank (Biased-PR) variation that models different behaviours of a biased surfer,
- an initial discussion about performance and implementation issues,
- some promising evaluation results that showcase the effectiveness and efficiency of the approach on a relatively small dataset of crawled pages, using the new AGBR and AGS metrics,
- a publicly accessible prototype of bias goggles.
abstract: Ethical issues, along with transparency, disinformation, and bias, are in the focus of our information society. In this work, we propose the bias goggles model, for computing the bias characteristics of web domains to user-defined concepts based on the structure of the web graph. For supporting the model, we exploit well-known propagation models and the newly introduced Biased-PR PageRank algorithm, that models various behaviours of biased surfers. An implementation discussion, along with a preliminary evaluation over a subset of the Greek web graph, shows the applicability of the model even in real-time for small graphs, and showcases rather promising and interesting results. Finally, we pinpoint important directions for future work. A constantly evolving prototype of the bias goggles system is readily available. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148229/ doi: 10.1007/978-3-030-45439-5_52 id: cord-020871-1v6dcmt3 author: Papariello, Luca title: On the Replicability of Combining Word Embeddings and Retrieval Models date: 2020-03-24 words: nan sentences: nan pages: flesch: nan cache: txt: summary: abstract: We replicate recent experiments attempting to demonstrate an attractive hypothesis about the use of the Fisher kernel framework and mixture models for aggregating word embeddings towards document representations and the use of these representations in document classification, clustering, and retrieval.
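The propagation machinery in the bias goggles entry above can be illustrated with a generic personalized-PageRank variant in which teleportation mass flows only to a biased seed set, so a node's score reflects how strongly the link structure funnels support towards it. This is one plausible reading of a "biased surfer", not the paper's exact Biased-PR definition, and the toy graph is made up.

def biased_pagerank(graph, seeds, d=0.85, iters=50):
    # Teleportation goes only to `seeds` (domains already known to support
    # the aspect of bias), unlike uniform teleportation in plain PageRank.
    nodes = list(graph)
    score = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    for _ in range(iters):
        nxt = {n: ((1 - d) / len(seeds) if n in seeds else 0.0) for n in nodes}
        for n, outs in graph.items():
            if outs:
                share = d * score[n] / len(outs)
                for m in outs:
                    nxt[m] += share
            else:  # dangling node: return its mass to the seeds
                for s in seeds:
                    nxt[s] += d * score[n] / len(seeds)
        score = nxt
    return score

web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
print(biased_pagerank(web, seeds={"a"}))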
Specifically, the hypothesis was that the use of a mixture model of von Mises-Fisher (VMF) distributions instead of Gaussian distributions would be beneficial because of the focus on cosine distances of both VMF and the vector space model traditionally used in information retrieval. Previous experiments had validated this hypothesis. Our replication was not able to validate it, despite a large parameter scan space. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148082/ doi: 10.1007/978-3-030-45442-5_7 id: cord-020905-gw8i6tkn author: Qu, Xianshan title: An Attention Model of Customer Expectation to Improve Review Helpfulness Prediction date: 2020-03-17 words: 5412.0 sentences: 330.0 pages: flesch: 60.0 cache: ./cache/cord-020905-gw8i6tkn.txt txt: ./txt/cord-020905-gw8i6tkn.txt summary: To model such customer expectations and capture important information from a review text, we propose a novel neural network which leverages review sentiment and product information. In order to address the above issues, we propose a novel neural network architecture to introduce sentiment and product information when identifying helpful content from a review text. In the cold start scenario, our proposed model demonstrates an AUC improvement of 5.4% and 1.5% on Amazon and Yelp data sets, respectively, when compared to the state-of-the-art model. From Table 5, we see that adding a sentiment attention layer (HSA) to the base model (HBiLSTM) results in an average improvement in the AUC score of 2.0% and 2.6%, respectively, on the Amazon and Yelp data sets. In this paper, we describe our analysis of review helpfulness prediction and propose a novel neural network model with attention modules to incorporate sentiment and product information. abstract: Many people browse reviews online before making purchasing decisions. It is essential to identify the subset of helpful reviews from the large number of reviews of varying quality. This paper aims to build a model to predict review helpfulness automatically. Our work is inspired by the observation that a customer’s expectation of a review can be greatly affected by review sentiment and the degree to which the customer is aware of pertinent product information. Consequently, a customer may pay more attention to that specific content of a review which contributes more to its helpfulness from their perspective. To model such customer expectations and capture important information from a review text, we propose a novel neural network which leverages review sentiment and product information. Specifically, we encode the sentiment of a review through an attention module, to get sentiment-driven information from review text. We also introduce a product attention layer that fuses information from both the target product and related products, in order to capture the product-related information from review text. Our experimental results show an AUC improvement of 5.4% and 1.5% over the previous state-of-the-art model on Amazon and Yelp data sets, respectively. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148225/ doi: 10.1007/978-3-030-45439-5_55 id: cord-020872-frr8xba6 author: Santosh, Tokala Yaswanth Sri Sai title: DAKE: Document-Level Attention for Keyphrase Extraction date: 2020-03-24 words: nan sentences: nan pages: flesch: nan cache: txt: summary: abstract: Keyphrases provide a concise representation of the topical content of a document and they are helpful in various downstream tasks.
Previous approaches model keyphrase extraction as a sequence labelling task and use local contextual information to understand the semantics of the input text, but they fail when the local context is ambiguous or unclear. We present a new framework to improve keyphrase extraction by utilizing additional supporting contextual information. We retrieve this additional information from other sentences within the same document. To this end, we propose Document-level Attention for Keyphrase Extraction (DAKE), which comprises Bidirectional Long Short-Term Memory networks that capture hidden semantics in text, a document-level attention mechanism to incorporate document-level contextual information, gating mechanisms which help to determine the influence of additional contextual information on the fusion with local contextual information, and Conditional Random Fields which capture output label dependencies. Our experimental results on a dataset of research papers show that the proposed model outperforms previous state-of-the-art approaches for keyphrase extraction. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148091/ doi: 10.1007/978-3-030-45442-5_49 id: cord-020936-k1upc1xu author: Sanz-Cruzado, Javier title: Axiomatic Analysis of Contact Recommendation Methods in Social Networks: An IR Perspective date: 2020-03-17 words: nan sentences: nan pages: flesch: nan cache: txt: summary: abstract: Contact recommendation is an important functionality in many social network scenarios including Twitter and Facebook, since it can help grow the social networks of users by suggesting, to a given user, people they might wish to follow. Recently, it has been shown that classical information retrieval (IR) weighting models – such as BM25 – can be adapted to effectively recommend new social contacts to a given user. However, the exact properties that make such adapted contact recommendation models effective at the task are as yet unknown. In this paper, inspired by new advances in the axiomatic theory of IR, we study the existing IR axioms for the contact recommendation task. Our theoretical analysis and empirical findings show that while the classical axioms related to term frequencies and term discrimination seem to have a positive impact on the recommendation effectiveness, those related to length normalization tend not to be desirable for the task. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148256/ doi: 10.1007/978-3-030-45439-5_12 id: cord-020885-f667icyt author: Sharma, Ujjwal title: Semantic Path-Based Learning for Review Volume Prediction date: 2020-03-17 words: 4026.0 sentences: 245.0 pages: flesch: 48.0 cache: ./cache/cord-020885-f667icyt.txt txt: ./txt/cord-020885-f667icyt.txt summary: In this work, we present an approach that uses semantically meaningful, bimodal random walks on real-world heterogeneous networks to extract correlations between nodes and bring together nodes with shared or similar attributes. In this work, we propose a novel method that incorporates restaurants and their attributes into a multimodal graph and extracts multiple, bimodal low-dimensional representations for restaurants based on available paths through shared visual, textual, geographical and categorical features. In this section, we discuss prior work that leverages graph-based structures for extracting information from multiple modalities, focussing on the auto-captioning task that introduced such methods.
For each of these sub-networks, we perform random walks and use a variant of the heterogeneous skip-gram objective introduced in [6] to generate low-dimensional bimodal embeddings. Our attention-based model combines separately learned bimodal embeddings using a late-fusion setup for predicting the review volume of the restaurants. abstract: Graphs offer a natural abstraction for modeling complex real-world systems where entities are represented as nodes and edges encode relations between them. In such networks, entities may share common or similar attributes and may be connected by paths through multiple attribute modalities. In this work, we present an approach that uses semantically meaningful, bimodal random walks on real-world heterogeneous networks to extract correlations between nodes and bring together nodes with shared or similar attributes. An attention-based mechanism is used to combine multiple attribute-specific representations in a late fusion setup. We focus on a real-world network formed by restaurants and their shared attributes and evaluate performance on predicting the number of reviews a restaurant receives, a strong proxy for popularity. Our results demonstrate the rich expressiveness of such representations in predicting review volume and the ability of an attention-based model to selectively combine individual representations for maximum predictive power on the chosen downstream task. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148205/ doi: 10.1007/978-3-030-45439-5_54 id: cord-020793-kgje01qy author: Suominen, Hanna title: CLEF eHealth Evaluation Lab 2020 date: 2020-03-24 words: 2379.0 sentences: 116.0 pages: flesch: 51.0 cache: ./cache/cord-020793-kgje01qy.txt txt: ./txt/cord-020793-kgje01qy.txt summary: Laypeople's increasing difficulties in retrieving and digesting valid and relevant information in their preferred language to make health-centred decisions have motivated CLEF eHealth to organize yearly labs since 2012. Substantial community interest in the tasks and their resources has led to CLEF eHealth maturing as a primary venue for all interdisciplinary actors of the ecosystem for producing, processing, and consuming electronic health information. Information access conferences have organized evaluation labs on related Electronic Health (eHealth) Information Extraction (IE), Information Management (IM), and Information Retrieval (IR) tasks for almost 20 years. This Consumer Health Search (CHS) task follows a standard IR shared challenge paradigm from the perspective that it provides participants with a test collection consisting of a set of documents and a set of topics to develop IR techniques for. abstract: Laypeople's increasing difficulties in retrieving and digesting valid and relevant information in their preferred language to make health-centred decisions have motivated CLEF eHealth to organize yearly labs since 2012. These 20 evaluation tasks on Information Extraction (IE), management, and Information Retrieval (IR) in 2013–2019 have been popular—as demonstrated by the large number of team registrations, submissions, papers, their included authors, and citations (748, 177, 184, 741, and 1299, respectively, up to and including 2018)—and achieved statistically significant improvements in the processing quality.
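The bimodal walks of the Sharma et al. entry above can be sketched as metapath-constrained random walks: alternate restaurant-attribute-restaurant steps restricted to a single attribute modality, then feed the resulting walk "sentences" to any skip-gram implementation (e.g., gensim's Word2Vec) to learn the embeddings. The tiny graph below is hypothetical.

import random

rng = random.Random(42)

# Toy heterogeneous graph: restaurants connect to attribute nodes of
# different modalities (made-up data).
edges = {
    "r1": ["visual:red_decor", "cat:italian"],
    "r2": ["visual:red_decor", "cat:pizza"],
    "r3": ["cat:italian", "cat:pizza"],
}
attr_to_rest = {}
for r, attrs in edges.items():
    for a in attrs:
        attr_to_rest.setdefault(a, []).append(r)

def bimodal_walk(start, modality, length=6):
    # Alternate restaurant -> attribute -> restaurant steps, restricted to
    # one attribute modality (e.g. "cat"), keeping each walk bimodal.
    walk, node = [start], start
    for _ in range(length):
        attrs = [a for a in edges[node] if a.startswith(modality)]
        if not attrs:
            break
        a = rng.choice(attrs)
        node = rng.choice(attr_to_rest[a])
        walk += [a, node]
    return walk

corpus = [bimodal_walk(r, "cat") for r in edges for _ in range(3)]
print(corpus[0])  # e.g. ['r1', 'cat:italian', 'r3', ...]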
In 2020, CLEF eHealth is calling for participants to contribute to the following two tasks: The 2020 Task 1 on IE focuses on term coding for clinical textual data in Spanish. The terms considered are extracted from clinical case records and they are mapped onto the Spanish version of the International Classification of Diseases, the 10th Revision, including also textual evidence spans for the clinical codes. The 2020 Task 2 is a novel extension of the most popular and established task in CLEF eHealth on CHS. This IR task uses the representative web corpus used in the 2018 challenge, but now also spoken queries, as well as textual transcripts of these queries, are offered to the participants. The task is structured into a number of optional subtasks, covering ad-hoc search using the spoken queries, textual transcripts of the spoken queries, or provided automatic speech-to-text conversions of the spoken queries. In this paper we describe the evolution of CLEF eHealth and this year’s tasks. The substantial community interest in the tasks and their resources has led to CLEF eHealth maturing as a primary venue for all interdisciplinary actors of the ecosystem for producing, processing, and consuming electronic health information. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148004/ doi: 10.1007/978-3-030-45442-5_76 id: cord-020875-vd4rtxmz author: Suwaileh, Reem title: Time-Critical Geolocation for Social Good date: 2020-03-24 words: 2030.0 sentences: 134.0 pages: flesch: 51.0 cache: ./cache/cord-020875-vd4rtxmz.txt txt: ./txt/cord-020875-vd4rtxmz.txt summary: To address this problem, I aim to exploit different techniques such as training neural models, enriching the tweet representation, and studying methods to mitigate the lack of labeled data. In my work, I am interested in tackling the Location Mention Prediction (LMP) problem during time-critical situations. The location taggers have to address many challenges, including microblogging-specific challenges (e.g., tweet sparsity, noisiness, rapidly changing streams, hashtag riding, etc.) and task-specific challenges (e.g., time-criticality of the solution, scarcity of labeled data, etc.). Alternatively, Sultanik and Fink [25] used an Information Retrieval (IR) based approach to identify the location mentions in tweets. Moreover, Hoang and Mothe [8] combined syntactic and semantic features to train traditional ML-based models, whereas Kumar and Singh [13] trained a Convolutional Neural Network (CNN) model that learns the continuous representation of tweet text and then identifies the location mentions. abstract: Twitter has become an instrumental source of news in emergencies where efficient access, dissemination of information, and immediate reactions are critical. Nevertheless, due to several challenges, the current fully-automated processing methods are not yet mature enough for deployment in real scenarios. In this dissertation, I focus on tackling the lack of context problem by studying automatic geo-location techniques. I specifically aim to study the Location Mention Prediction problem in which the system has to extract location mentions in tweets and pin them on the map. To address this problem, I aim to exploit different techniques such as training neural models, enriching the tweet representation, and studying methods to mitigate the lack of labeled data.
I anticipate many downstream applications for the Location Mention Prediction problem such as incident detection, real-time action management during emergencies, and fake news and rumor detection, among others. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148099/ doi: 10.1007/978-3-030-45442-5_82 id: cord-020903-qt0ly5d0 author: Tamine, Lynda title: What Can Task Teach Us About Query Reformulations? date: 2020-03-17 words: 4957.0 sentences: 264.0 pages: flesch: 61.0 cache: ./cache/cord-020903-qt0ly5d0.txt txt: ./txt/cord-020903-qt0ly5d0.txt summary: Task-based sessions represent significantly different background contexts for better understanding users' query reformulations. Using insights from large-scale search logs, our findings clearly show that task is an additional relevant search unit that helps to better understand users' query reformulation patterns and to predict the user's next query. To design support processes for task-based search systems, we argue that we need to: (1) fully understand how a user's task, performed in natural settings, drives query reformulation changes; and (2) gauge the level of similarity of these change trends with those observed in time-based sessions. With this in mind, we perform large-scale log analyses of users naturally engaged in tasks to examine query reformulations from both the time-based and the task-based session perspectives. To identify query reformulation patterns, most of the previous works used large-scale log analyses segmented into time-based sessions. abstract: A significant amount of prior research has been devoted to understanding query reformulations. The majority of these works rely on time-based sessions, which are sequences of contiguous queries segmented using a time threshold on users' activities. However, queries are generally issued by users having in mind a particular task, and time-based sessions unfortunately fail to reveal such tasks. In this paper, we are interested in revealing to what extent time-based sessions vs. task-based sessions represent significantly different background contexts for better understanding users' query reformulations. Using insights from large-scale search logs, our findings clearly show that task is an additional relevant search unit that helps to better understand users' query reformulation patterns and to predict the user's next query. The findings from our analyses provide potential implications for model design of task-based search engines. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148223/ doi: 10.1007/978-3-030-45439-5_42 id: cord-020891-lt3m8h41 author: Witschel, Hans Friedrich title: KvGR: A Graph-Based Interface for Explorative Sequential Question Answering on Heterogeneous Information Sources date: 2020-03-17 words: nan sentences: nan pages: flesch: nan cache: txt: summary: abstract: Exploring a knowledge base is often an iterative process: initially vague information needs are refined by interaction. We propose a novel approach for such interaction that supports sequential question answering (SQA) on knowledge graphs. As opposed to previous work, we focus on exploratory settings, which we support with a visual representation of graph structures, helping users to better understand relationships. In addition, our approach keeps track of context – an important challenge in SQA – by allowing users to make their focus explicit via subgraph selection.
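The time-based sessions that the Tamine et al. entry above contrasts with task-based sessions are conventionally built with a simple inactivity threshold; a minimal sketch follows (the 30-minute gap and the log entries are illustrative choices, not the paper's exact setup).

from datetime import datetime, timedelta

def time_based_sessions(log, gap_minutes=30):
    # Split a user's query log into sessions: a new session starts when
    # the gap between consecutive queries exceeds the threshold.
    sessions, gap = [], timedelta(minutes=gap_minutes)
    for ts, query in sorted(log):
        if not sessions or ts - sessions[-1][-1][0] > gap:
            sessions.append([])
        sessions[-1].append((ts, query))
    return sessions

log = [
    (datetime(2020, 3, 17, 9, 0), "ecir 2020 program"),
    (datetime(2020, 3, 17, 9, 5), "ecir 2020 proceedings"),
    (datetime(2020, 3, 17, 14, 0), "flu symptoms"),  # long gap: new session
]
for i, s in enumerate(time_based_sessions(log)):
    print(i, [q for _, q in s])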
Our results show that the interaction principle is either understood immediately or picked up very quickly – and that the possibility of exploring the information space iteratively is appreciated. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148211/ doi: 10.1007/978-3-030-45439-5_50 id: cord-020932-o5scqiyk author: Zhong, Wei title: Accelerating Substructure Similarity Search for Formula Retrieval date: 2020-03-17 words: 4602.0 sentences: 278.0 pages: flesch: 65.0 cache: ./cache/cord-020932-o5scqiyk.txt txt: ./txt/cord-020932-o5scqiyk.txt summary: In text similarity search, query processing can be accelerated through dynamic pruning [18], which typically estimates score upperbounds to prune documents unlikely to be in the top K results. As a result, the posting list entry also stores the root node ID for indexed paths, in order to reconstruct matched substructures at merge time. Define the partial upperbound matrix W = {w_{i,j}} of size |T_q| × |T|, where T = {T(m), m ∈ T_q} are all the token paths from the query OPT (T is essentially the same as the tokenized P(T_q)), and a binary variable x of size |T| × 1 indicating which corresponding posting lists are placed in the non-requirement set. We have presented rank-safe dynamic pruning strategies that produce an upperbound estimation of structural similarity in order to speed up formula search using subtree matching. Our dynamic pruning strategies and specialized inverted index are different from traditional linear text search pruning methods, and they further associate the query structure representation with posting lists. abstract: Formula retrieval systems using substructure matching are effective, but suffer from slow retrieval times caused by the complexity of structure matching. We present a specialized inverted index and a rank-safe dynamic pruning algorithm for faster substructure retrieval. Formulas are indexed from their Operator Tree (OPT) representations. Our model is evaluated using the NTCIR-12 Wikipedia Formula Browsing Task and a new formula corpus produced from Math StackExchange posts. Our approach preserves the effectiveness of structure matching while allowing queries to be executed in real-time. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148252/ doi: 10.1007/978-3-030-45439-5_47 id: cord-020927-89c7rijg author: Zhuang, Shengyao title: Counterfactual Online Learning to Rank date: 2020-03-17 words: nan sentences: nan pages: flesch: nan cache: txt: summary: abstract: Exploiting users’ implicit feedback, such as clicks, to learn rankers is attractive as it does not require editorial labelling effort, and adapts to users’ changing preferences, among other benefits. However, directly learning a ranker from implicit data is challenging, as users’ implicit feedback usually contains bias (e.g., position bias, selection bias) and noise (e.g., clicking on irrelevant but attractive snippets, adversarial clicks). Two main methods have arisen for optimizing rankers based on implicit feedback: counterfactual learning to rank (CLTR), which learns a ranker from the historical click-through data collected from a deployed, logging ranker; and online learning to rank (OLTR), where a ranker is updated by recording user interaction with a result list produced by multiple rankers (usually via interleaving). In this paper, we propose a counterfactual online learning to rank algorithm (COLTR) that combines the key components of both CLTR and OLTR.
It does so by replacing the online evaluation required by traditional OLTR methods with the counterfactual evaluation common in CLTR. Compared to traditional OLTR approaches based on interleaving, COLTR can evaluate a large number of candidate rankers in a more efficient manner. Our empirical results show that COLTR significantly outperforms traditional OLTR methods. Furthermore, COLTR can reach the same effectiveness as the current state-of-the-art under noisy click settings, and has room for future extensions. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148247/ doi: 10.1007/978-3-030-45439-5_28 id: cord-020846-mfh1ope6 author: Zlabinger, Markus title: DSR: A Collection for the Evaluation of Graded Disease-Symptom Relations date: 2020-03-24 words: nan sentences: nan pages: flesch: nan cache: txt: summary: abstract: The effective extraction of ranked disease-symptom relationships is a critical component in various medical tasks, including computer-assisted medical diagnosis or the discovery of unexpected associations between diseases. While existing disease-symptom relationship extraction methods are used as the foundation in various medical tasks, no collection is available to systematically evaluate the performance of such methods. In this paper, we introduce the Disease-Symptom Relation Collection (dsr-collection), created by five physicians as expert annotators. We provide graded symptom judgments for diseases by differentiating between relevant symptoms and primary symptoms. Further, we provide several strong baselines, based on the methods used in previous studies. The first method is based on word embeddings, and the second on co-occurrences of MeSH-keywords of medical articles. For the co-occurrence method, we propose an adaptation in which not only keywords are considered, but also the full text of medical articles. The evaluation on the dsr-collection shows the effectiveness of the proposed adaptation in terms of nDCG, precision, and recall. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148057/ doi: 10.1007/978-3-030-45442-5_54
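The graded judgments of the dsr-collection described above (primary vs. relevant symptoms) map naturally onto graded-relevance metrics such as nDCG. A minimal sketch follows, assuming a gain of 2 for primary symptoms, 1 for relevant ones, and 0 otherwise; the judgments and system ranking are hypothetical, not data from the collection.

import math

def dcg(gains):
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg(ranked, judgments, k=5):
    # nDCG@k with graded gains: 2 = primary symptom, 1 = relevant, 0 = not.
    gains = [judgments.get(s, 0) for s in ranked[:k]]
    ideal = sorted(judgments.values(), reverse=True)[:k]
    return dcg(gains) / dcg(ideal) if any(ideal) else 0.0

# Hypothetical judgments for one disease, in the collection's spirit.
judgments = {"fever": 2, "cough": 2, "fatigue": 1, "nausea": 1}
system_ranking = ["fever", "nausea", "headache", "cough", "fatigue"]
print(round(ndcg(system_ranking, judgments), 3))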