Carrel name: journal-advancesInInformationRetrieval-cord
Creating study carrel named journal-advancesInInformationRetrieval-cord
Initializing database

file: cache/cord-020793-kgje01qy.json key: cord-020793-kgje01qy authors: Suominen, Hanna; Kelly, Liadh; Goeuriot, Lorraine; Krallinger, Martin title: CLEF eHealth Evaluation Lab 2020 date: 2020-03-24 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45442-5_76 sha: doc_id: 20793 cord_uid: kgje01qy
file: cache/cord-020794-d3oru1w5.json key: cord-020794-d3oru1w5 authors: Leekha, Maitree; Goswami, Mononito; Jain, Minni title: A Multi-task Approach to Open Domain Suggestion Mining Using Language Model for Text Over-Sampling date: 2020-03-24 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45442-5_28 sha: doc_id: 20794 cord_uid: d3oru1w5
file: cache/cord-020811-pacy48qx.json key: cord-020811-pacy48qx authors: Muhammad, Shamsuddeen Hassan; Brazdil, Pavel; Jorge, Alípio title: Incremental Approach for Automatic Generation of Domain-Specific Sentiment Lexicon date: 2020-03-24 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45442-5_81 sha: doc_id: 20811 cord_uid: pacy48qx
file: cache/cord-020820-cbikq0v0.json key: cord-020820-cbikq0v0 authors: Papadakos, Panagiotis; Kalipolitis, Orfeas title: Dualism in Topical Relevance date: 2020-03-24 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45442-5_40 sha: doc_id: 20820 cord_uid: cbikq0v0
file: cache/cord-020885-f667icyt.json key: cord-020885-f667icyt authors: Sharma, Ujjwal; Rudinac, Stevan; Worring, Marcel; Demmers, Joris; van Dolen, Willemijn title: Semantic Path-Based Learning for Review Volume Prediction date: 2020-03-17 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45439-5_54 sha: doc_id: 20885 cord_uid: f667icyt
file: cache/cord-020896-yrocw53j.json key: cord-020896-yrocw53j authors: Agarwal, Mansi; Leekha, Maitree; Sawhney, Ramit; Ratn Shah, Rajiv; Kumar Yadav, Rajesh; Kumar Vishwakarma, Dinesh title: MEMIS: Multimodal Emergency Management Information System date: 2020-03-17 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45439-5_32 sha: doc_id: 20896 cord_uid: yrocw53j
file: cache/cord-020834-ch0fg9rp.json key: cord-020834-ch0fg9rp authors: Grand, Adrien; Muir, Robert; Ferenczi, Jim; Lin, Jimmy title: From MAXSCORE to Block-Max Wand: The Story of How Lucene Significantly Improved Query Evaluation Performance date: 2020-03-24 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45442-5_3 sha: doc_id: 20834 cord_uid: ch0fg9rp
file: cache/cord-020815-j9eboa94.json key: cord-020815-j9eboa94 authors: Kamphuis, Chris; de Vries, Arjen P.; Boytsov, Leonid; Lin, Jimmy title: Which BM25 Do You Mean? A Large-Scale Reproducibility Study of Scoring Variants date: 2020-03-24 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45442-5_4 sha: doc_id: 20815 cord_uid: j9eboa94
file: cache/cord-020841-40f2p3t4.json key: cord-020841-40f2p3t4 authors: Hofstätter, Sebastian; Zlabinger, Markus; Hanbury, Allan title: Neural-IR-Explorer: A Content-Focused Tool to Explore Neural Re-ranking Results date: 2020-03-24 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45442-5_58 sha: doc_id: 20841 cord_uid: 40f2p3t4
file: cache/cord-020801-3sbicp3v.json key: cord-020801-3sbicp3v authors: MacAvaney, Sean; Soldaini, Luca; Goharian, Nazli title: Teaching a New Dog Old Tricks: Resurrecting Multilingual Retrieval Using Zero-Shot Learning date: 2020-03-24 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45442-5_31 sha: doc_id: 20801 cord_uid: 3sbicp3v
file: cache/cord-020806-lof49r72.json key: cord-020806-lof49r72 authors: Landin, Alfonso; Parapar, Javier; Barreiro, Álvaro title: Novel and Diverse Recommendations by Leveraging Linear Models with User and Item Embeddings date: 2020-03-24 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45442-5_27 sha: doc_id: 20806 cord_uid: lof49r72
file: cache/cord-020843-cq4lbd0l.json key: cord-020843-cq4lbd0l authors: Almeida, Tiago; Matos, Sérgio title: Calling Attention to Passages for Biomedical Question Answering date: 2020-03-24 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45442-5_9 sha: doc_id: 20843 cord_uid: cq4lbd0l
file: cache/cord-020912-tbq7okmj.json key: cord-020912-tbq7okmj authors: Batra, Vishwash; Haldar, Aparajita; He, Yulan; Ferhatosmanoglu, Hakan; Vogiatzis, George; Guha, Tanaya title: Variational Recurrent Sequence-to-Sequence Retrieval for Stepwise Illustration date: 2020-03-17 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45439-5_4 sha: doc_id: 20912 cord_uid: tbq7okmj
file: cache/cord-020899-d6r4fr9r.json key: cord-020899-d6r4fr9r authors: Doinychko, Anastasiia; Amini, Massih-Reza title: Biconditional Generative Adversarial Networks for Multiview Learning with Missing Views date: 2020-03-17 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45439-5_53 sha: doc_id: 20899 cord_uid: d6r4fr9r
file: cache/cord-020905-gw8i6tkn.json key: cord-020905-gw8i6tkn authors: Qu, Xianshan; Li, Xiaopeng; Farkas, Csilla; Rose, John title: An Attention Model of Customer Expectation to Improve Review Helpfulness Prediction date: 2020-03-17 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45439-5_55 sha: doc_id: 20905 cord_uid: gw8i6tkn
file: cache/cord-020888-ov2lzus4.json key: cord-020888-ov2lzus4 authors: Formal, Thibault; Clinchant, Stéphane; Renders, Jean-Michel; Lee, Sooyeol; Cho, Geun Hee title: Learning to Rank Images with Cross-Modal Graph Convolutions date: 2020-03-17 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45439-5_39 sha: doc_id: 20888 cord_uid: ov2lzus4
file: cache/cord-020916-ds0cf78u.json key: cord-020916-ds0cf78u authors: Fard, Mazar Moradi; Thonet, Thibaut; Gaussier, Eric title: Seed-Guided Deep Document Clustering date: 2020-03-17 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45439-5_1 sha: doc_id: 20916 cord_uid: ds0cf78u
file: cache/cord-020936-k1upc1xu.json key: cord-020936-k1upc1xu authors: Sanz-Cruzado, Javier; Macdonald, Craig; Ounis, Iadh; Castells, Pablo title: Axiomatic Analysis of Contact Recommendation Methods in Social Networks: An IR Perspective date: 2020-03-17 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45439-5_12 sha: doc_id: 20936 cord_uid: k1upc1xu
file: cache/cord-020914-7p37m92a.json key: cord-020914-7p37m92a authors: Dumani, Lorik; Neumann, Patrick J.; Schenkel, Ralf title: A Framework for Argument Retrieval: Ranking Argument Clusters by Frequency and Specificity date: 2020-03-17 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45439-5_29 sha: doc_id: 20914 cord_uid: 7p37m92a
file: cache/cord-020909-n36p5n2k.json key: cord-020909-n36p5n2k authors: Papadakos, Panagiotis; Konstantakis, Giannis title: bias goggles: Graph-Based Computation of the Bias of Web Domains Through the Eyes of Users date: 2020-03-17 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45439-5_52 sha: doc_id: 20909 cord_uid: n36p5n2k
file: cache/cord-020932-o5scqiyk.json key: cord-020932-o5scqiyk authors: Zhong, Wei; Rohatgi, Shaurya; Wu, Jian; Giles, C. Lee; Zanibbi, Richard title: Accelerating Substructure Similarity Search for Formula Retrieval date: 2020-03-17 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45439-5_47 sha: doc_id: 20932 cord_uid: o5scqiyk
file: cache/cord-020832-iavwkdpr.json key: cord-020832-iavwkdpr authors: Nguyen, Dat Quoc; Zhai, Zenan; Yoshikawa, Hiyori; Fang, Biaoyan; Druckenbrodt, Christian; Thorne, Camilo; Hoessel, Ralph; Akhondi, Saber A.; Cohn, Trevor; Baldwin, Timothy; Verspoor, Karin title: ChEMU: Named Entity Recognition and Event Extraction of Chemical Reactions from Patents date: 2020-03-24 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45442-5_74 sha: doc_id: 20832 cord_uid: iavwkdpr
file: cache/cord-020875-vd4rtxmz.json key: cord-020875-vd4rtxmz authors: Suwaileh, Reem title: Time-Critical Geolocation for Social Good date: 2020-03-24 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45442-5_82 sha: doc_id: 20875 cord_uid: vd4rtxmz
file: cache/cord-020814-1ty7wzlv.json key: cord-020814-1ty7wzlv authors: Berrendorf, Max; Faerman, Evgeniy; Melnychuk, Valentyn; Tresp, Volker; Seidl, Thomas title: Knowledge Graph Entity Alignment with Graph Convolutional Networks: Lessons Learned date: 2020-03-24 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45442-5_1 sha: doc_id: 20814 cord_uid: 1ty7wzlv
file: cache/cord-020813-0wc23ixy.json key: cord-020813-0wc23ixy authors: Hashemi, Helia; Aliannejadi, Mohammad; Zamani, Hamed; Croft, W. Bruce title: ANTIQUE: A Non-factoid Question Answering Benchmark date: 2020-03-24 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45442-5_21 sha: doc_id: 20813 cord_uid: 0wc23ixy
file: cache/cord-020903-qt0ly5d0.json key: cord-020903-qt0ly5d0 authors: Tamine, Lynda; Melgarejo, Jesús Lovón; Pinel-Sauvagnat, Karen title: What Can Task Teach Us About Query Reformulations? date: 2020-03-17 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45439-5_42 sha: doc_id: 20903 cord_uid: qt0ly5d0
file: cache/cord-020830-97xmu329.json key: cord-020830-97xmu329 authors: Ghanem, Bilal; Karoui, Jihen; Benamara, Farah; Rosso, Paolo; Moriceau, Véronique title: Irony Detection in a Multilingual Context date: 2020-03-24 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45442-5_18 sha: doc_id: 20830 cord_uid: 97xmu329
file: cache/cord-020848-nypu4w9s.json key: cord-020848-nypu4w9s authors: Morris, David; Müller-Budack, Eric; Ewerth, Ralph title: SlideImages: A Dataset for Educational Image Classification date: 2020-03-24 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45442-5_36 sha: doc_id: 20848 cord_uid: nypu4w9s
file: cache/cord-020808-wpso3jug.json key: cord-020808-wpso3jug authors: Cardoso, João; Proença, Diogo; Borbinha, José title: Machine-Actionable Data Management Plans: A Knowledge Retrieval Approach to Automate the Assessment of Funders’ Requirements date: 2020-03-24 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45442-5_15 sha: doc_id: 20808 cord_uid: wpso3jug
file: cache/cord-020835-n9v5ln2i.json key: cord-020835-n9v5ln2i authors: Jangra, Anubhav; Jatowt, Adam; Hasanuzzaman, Mohammad; Saha, Sriparna title: Text-Image-Video Summary Generation Using Joint Integer Linear Programming date: 2020-03-24 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45442-5_24 sha: doc_id: 20835 cord_uid: n9v5ln2i
file: cache/cord-020880-m7d4e0eh.json key: cord-020880-m7d4e0eh authors: Barrón-Cedeño, Alberto; Elsayed, Tamer; Nakov, Preslav; Da San Martino, Giovanni; Hasanain, Maram; Suwaileh, Reem; Haouari, Fatima title: CheckThat! at CLEF 2020: Enabling the Automatic Identification and Verification of Claims in Social Media date: 2020-03-24 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45442-5_65 sha: doc_id: 20880 cord_uid: m7d4e0eh
file: cache/cord-020846-mfh1ope6.json key: cord-020846-mfh1ope6 authors: Zlabinger, Markus; Hofstätter, Sebastian; Rekabsaz, Navid; Hanbury, Allan title: DSR: A Collection for the Evaluation of Graded Disease-Symptom Relations date: 2020-03-24 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45442-5_54 sha: doc_id: 20846 cord_uid: mfh1ope6
file: cache/cord-020891-lt3m8h41.json key: cord-020891-lt3m8h41 authors: Witschel, Hans Friedrich; Riesen, Kaspar; Grether, Loris title: KvGR: A Graph-Based Interface for Explorative Sequential Question Answering on Heterogeneous Information Sources date: 2020-03-17 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45439-5_50 sha: doc_id: 20891 cord_uid: lt3m8h41
file: cache/cord-020931-fymgnv1g.json key: cord-020931-fymgnv1g authors: Meng, Changping; Chen, Muhao; Mao, Jie; Neville, Jennifer title: ReadNet: A Hierarchical Transformer Framework for Web Article Readability Analysis date: 2020-03-17 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45439-5_3 sha: doc_id: 20931 cord_uid: fymgnv1g
file: cache/cord-020871-1v6dcmt3.json key: cord-020871-1v6dcmt3 authors: Papariello, Luca; Bampoulidis, Alexandros; Lupu, Mihai title: On the Replicability of Combining Word Embeddings and Retrieval Models date: 2020-03-24 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45442-5_7 sha: doc_id: 20871 cord_uid: 1v6dcmt3
file: cache/cord-020904-x3o3a45b.json key: cord-020904-x3o3a45b authors: Montazeralghaem, Ali; Rahimi, Razieh; Allan, James title: Relevance Ranking Based on Query-Aware Context Analysis date: 2020-03-17 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45439-5_30 sha: doc_id: 20904 cord_uid: x3o3a45b
file: cache/cord-020908-oe77eupc.json key: cord-020908-oe77eupc authors: Chen, Zhiyu; Jia, Haiyan; Heflin, Jeff; Davison, Brian D. title: Leveraging Schema Labels to Enhance Dataset Search date: 2020-03-17 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45439-5_18 sha: doc_id: 20908 cord_uid: oe77eupc
file: cache/cord-020927-89c7rijg.json key: cord-020927-89c7rijg authors: Zhuang, Shengyao; Zuccon, Guido title: Counterfactual Online Learning to Rank date: 2020-03-17 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45439-5_28 sha: doc_id: 20927 cord_uid: 89c7rijg
file: cache/cord-020918-056bvngu.json key: cord-020918-056bvngu authors: Nchabeleng, Mathibele; Byamugisha, Joan title: Evaluating the Effectiveness of the Standard Insights Extraction Pipeline for Bantu Languages date: 2020-03-17 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45439-5_11 sha: doc_id: 20918 cord_uid: 056bvngu
file: cache/cord-020901-aew8xr6n.json key: cord-020901-aew8xr6n authors: García-Durán, Alberto; González, Roberto; Oñoro-Rubio, Daniel; Niepert, Mathias; Li, Hui title: TransRev: Modeling Reviews as Translations from Users to Items date: 2020-03-17 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45439-5_16 sha: doc_id: 20901 cord_uid: aew8xr6n
file: cache/cord-020890-aw465igx.json key: cord-020890-aw465igx authors: Brochier, Robin; Guille, Adrien; Velcin, Julien title: Inductive Document Network Embedding with Topic-Word Attention date: 2020-03-17 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45439-5_22 sha: doc_id: 20890 cord_uid: aw465igx
file: cache/cord-020851-hf5c0i9z.json key: cord-020851-hf5c0i9z authors: Losada, David E.; Crestani, Fabio; Parapar, Javier title: eRisk 2020: Self-harm and Depression Challenges date: 2020-03-24 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45442-5_72 sha: doc_id: 20851 cord_uid: hf5c0i9z
file: cache/cord-020872-frr8xba6.json key: cord-020872-frr8xba6 authors: Santosh, Tokala Yaswanth Sri Sai; Sanyal, Debarshi Kumar; Bhowmick, Plaban Kumar; Das, Partha Pratim title: DAKE: Document-Level Attention for Keyphrase Extraction date: 2020-03-24 journal: Advances in Information Retrieval DOI: 10.1007/978-3-030-45442-5_49 sha: doc_id: 20872 cord_uid: frr8xba6

Reading metadata file and updating bibliographic records
=== updating bibliographic database
Building study carrel named journal-advancesInInformationRetrieval-cord

=== file2bib.sh ===
OMP: Error #34: System unable to allocate necessary resources for OMP thread:
OMP: System error #11: Resource temporarily unavailable
OMP: Hint Try decreasing the value of OMP_NUM_THREADS.
/data-disk/reader-compute/reader-cord/bin/file2bib.sh: line 39: 79544 Aborted $FILE2BIB "$FILE" > "$OUTPUT"
[The identical OMP Error #34 / "Resource temporarily unavailable" abort repeated for twelve more file2bib.sh invocations: PIDs 80520, 80988, 81131, 81037, 81457, 80985, 81443, 81145, 81469, 81200, 81394, and 81367.]
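The OMP hint above points at the likely fix: each file2bib.sh worker spawns a full OpenMP thread pool, and many concurrent workers exhaust the machine's thread and memory limits. A minimal sketch of the idea in Python follows; the failed-file list and output layout are hypothetical stand-ins, not the carrel's actual driver code.

    import os
    import subprocess

    # Rerun failed conversions with a capped OpenMP pool, following the
    # "Try decreasing the value of OMP_NUM_THREADS" hint in the log.
    env = dict(os.environ, OMP_NUM_THREADS="1")  # one OpenMP thread per worker

    failed = ["cache/cord-020835-n9v5ln2i.json"]  # placeholder; fill from the abort lines
    for path in failed:
        output = path.replace("cache/", "bib/").replace(".json", ".bib")
        with open(output, "w") as out:
            # file2bib.sh reads $FILE and writes its result to stdout, per the log
            subprocess.run(["/data-disk/reader-compute/reader-cord/bin/file2bib.sh", path],
                           stdout=out, env=env, check=False)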
=== file2bib.sh ===
id: cord-020811-pacy48qx author: Muhammad, Shamsuddeen Hassan title: Incremental Approach for Automatic Generation of Domain-Specific Sentiment Lexicon date: 2020-03-24 pages: extension: .txt txt: ./txt/cord-020811-pacy48qx.txt cache: ./cache/cord-020811-pacy48qx.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 3 resourceName b'cord-020811-pacy48qx.txt'
=== file2bib.sh ===
id: cord-020841-40f2p3t4 author: Hofstätter, Sebastian title: Neural-IR-Explorer: A Content-Focused Tool to Explore Neural Re-ranking Results date: 2020-03-24 pages: extension: .txt txt: ./txt/cord-020841-40f2p3t4.txt cache: ./cache/cord-020841-40f2p3t4.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 2 resourceName b'cord-020841-40f2p3t4.txt'
=== file2bib.sh ===
id: cord-020808-wpso3jug author: Cardoso, João title: Machine-Actionable Data Management Plans: A Knowledge Retrieval Approach to Automate the Assessment of Funders’ Requirements date: 2020-03-24 pages: extension: .txt txt: ./txt/cord-020808-wpso3jug.txt cache: ./cache/cord-020808-wpso3jug.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 3 resourceName b'cord-020808-wpso3jug.txt'
=== file2bib.sh ===
id: cord-020875-vd4rtxmz author: Suwaileh, Reem title: Time-Critical Geolocation for Social Good date: 2020-03-24 pages: extension: .txt txt: ./txt/cord-020875-vd4rtxmz.txt cache: ./cache/cord-020875-vd4rtxmz.txt Content-Encoding ISO-8859-1 Content-Type text/plain; charset=ISO-8859-1 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 3 resourceName b'cord-020875-vd4rtxmz.txt'
=== file2bib.sh ===
id: cord-020843-cq4lbd0l author: Almeida, Tiago title: Calling Attention to Passages for Biomedical Question Answering date: 2020-03-24 pages: extension: .txt txt: ./txt/cord-020843-cq4lbd0l.txt cache: ./cache/cord-020843-cq4lbd0l.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 3 resourceName b'cord-020843-cq4lbd0l.txt'
=== file2bib.sh ===
id: cord-020848-nypu4w9s author: Morris, David title: SlideImages: A Dataset for Educational Image Classification date: 2020-03-24 pages: extension: .txt txt: ./txt/cord-020848-nypu4w9s.txt cache: ./cache/cord-020848-nypu4w9s.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 3 resourceName b'cord-020848-nypu4w9s.txt'
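The Content-Encoding, Content-Type, and X-TIKA:* fields in these records are Apache Tika parse metadata. For readers reproducing the extraction, a minimal sketch using the tika-python client (an assumption; the carrel may drive Tika differently, e.g. via tika-server directly) shows where such fields come from:

    # Illustrative only, not the carrel's own code. Requires `pip install tika`;
    # the client auto-starts a local Tika server on first use.
    from tika import parser

    parsed = parser.from_file("txt/cord-020811-pacy48qx.txt")
    print(parsed["metadata"].get("Content-Type"))    # e.g. text/plain; charset=UTF-8
    print((parsed["content"] or "").strip()[:200])   # first 200 chars of extracted text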
=== file2bib.sh ===
id: cord-020820-cbikq0v0 author: Papadakos, Panagiotis title: Dualism in Topical Relevance date: 2020-03-24 pages: extension: .txt txt: ./txt/cord-020820-cbikq0v0.txt cache: ./cache/cord-020820-cbikq0v0.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 4 resourceName b'cord-020820-cbikq0v0.txt'
=== file2bib.sh ===
id: cord-020813-0wc23ixy author: Hashemi, Helia title: ANTIQUE: A Non-factoid Question Answering Benchmark date: 2020-03-24 pages: extension: .txt txt: ./txt/cord-020813-0wc23ixy.txt cache: ./cache/cord-020813-0wc23ixy.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 2 resourceName b'cord-020813-0wc23ixy.txt'
=== file2bib.sh ===
id: cord-020814-1ty7wzlv author: Berrendorf, Max title: Knowledge Graph Entity Alignment with Graph Convolutional Networks: Lessons Learned date: 2020-03-24 pages: extension: .txt txt: ./txt/cord-020814-1ty7wzlv.txt cache: ./cache/cord-020814-1ty7wzlv.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 3 resourceName b'cord-020814-1ty7wzlv.txt'
=== file2bib.sh ===
id: cord-020832-iavwkdpr author: Nguyen, Dat Quoc title: ChEMU: Named Entity Recognition and Event Extraction of Chemical Reactions from Patents date: 2020-03-24 pages: extension: .txt txt: ./txt/cord-020832-iavwkdpr.txt cache: ./cache/cord-020832-iavwkdpr.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 3 resourceName b'cord-020832-iavwkdpr.txt'
=== file2bib.sh ===
id: cord-020794-d3oru1w5 author: Leekha, Maitree title: A Multi-task Approach to Open Domain Suggestion Mining Using Language Model for Text Over-Sampling date: 2020-03-24 pages: extension: .txt txt: ./txt/cord-020794-d3oru1w5.txt cache: ./cache/cord-020794-d3oru1w5.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 2 resourceName b'cord-020794-d3oru1w5.txt'
=== file2bib.sh ===
id: cord-020806-lof49r72 author: Landin, Alfonso title: Novel and Diverse Recommendations by Leveraging Linear Models with User and Item Embeddings date: 2020-03-24 pages: extension: .txt txt: ./txt/cord-020806-lof49r72.txt cache: ./cache/cord-020806-lof49r72.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 209 resourceName b'cord-020806-lof49r72.txt'
=== file2bib.sh ===
id: cord-020801-3sbicp3v author: MacAvaney, Sean title: Teaching a New Dog Old Tricks: Resurrecting Multilingual Retrieval Using Zero-Shot Learning date: 2020-03-24 pages: extension: .txt txt: ./txt/cord-020801-3sbicp3v.txt cache: ./cache/cord-020801-3sbicp3v.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 3 resourceName b'cord-020801-3sbicp3v.txt'
=== file2bib.sh ===
id: cord-020834-ch0fg9rp author: Grand, Adrien title: From MAXSCORE to Block-Max Wand: The Story of How Lucene Significantly Improved Query Evaluation Performance date: 2020-03-24 pages: extension: .txt txt: ./txt/cord-020834-ch0fg9rp.txt cache: ./cache/cord-020834-ch0fg9rp.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 3 resourceName b'cord-020834-ch0fg9rp.txt'
=== file2bib.sh ===
id: cord-020793-kgje01qy author: Suominen, Hanna title: CLEF eHealth Evaluation Lab 2020 date: 2020-03-24 pages: extension: .txt txt: ./txt/cord-020793-kgje01qy.txt cache: ./cache/cord-020793-kgje01qy.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 4 resourceName b'cord-020793-kgje01qy.txt'
=== file2bib.sh ===
id: cord-020815-j9eboa94 author: Kamphuis, Chris title: Which BM25 Do You Mean? A Large-Scale Reproducibility Study of Scoring Variants date: 2020-03-24 pages: extension: .txt txt: ./txt/cord-020815-j9eboa94.txt cache: ./cache/cord-020815-j9eboa94.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 304 resourceName b'cord-020815-j9eboa94.txt'
=== file2bib.sh ===
id: cord-020885-f667icyt author: Sharma, Ujjwal title: Semantic Path-Based Learning for Review Volume Prediction date: 2020-03-17 pages: extension: .txt txt: ./txt/cord-020885-f667icyt.txt cache: ./cache/cord-020885-f667icyt.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 4 resourceName b'cord-020885-f667icyt.txt'
=== file2bib.sh ===
id: cord-020899-d6r4fr9r author: Doinychko, Anastasiia title: Biconditional Generative Adversarial Networks for Multiview Learning with Missing Views date: 2020-03-17 pages: extension: .txt txt: ./txt/cord-020899-d6r4fr9r.txt cache: ./cache/cord-020899-d6r4fr9r.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 3 resourceName b'cord-020899-d6r4fr9r.txt'
=== file2bib.sh ===
id: cord-020916-ds0cf78u author: Fard, Mazar Moradi title: Seed-Guided Deep Document Clustering date: 2020-03-17 pages: extension: .txt txt: ./txt/cord-020916-ds0cf78u.txt cache: ./cache/cord-020916-ds0cf78u.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 3 resourceName b'cord-020916-ds0cf78u.txt'
=== file2bib.sh ===
id: cord-020932-o5scqiyk author: Zhong, Wei title: Accelerating Substructure Similarity Search for Formula Retrieval date: 2020-03-17 pages: extension: .txt txt: ./txt/cord-020932-o5scqiyk.txt cache: ./cache/cord-020932-o5scqiyk.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 244 resourceName b'cord-020932-o5scqiyk.txt'
=== file2bib.sh ===
id: cord-020888-ov2lzus4 author: Formal, Thibault title: Learning to Rank Images with Cross-Modal Graph Convolutions date: 2020-03-17 pages: extension: .txt txt: ./txt/cord-020888-ov2lzus4.txt cache: ./cache/cord-020888-ov2lzus4.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 3 resourceName b'cord-020888-ov2lzus4.txt'
=== file2bib.sh ===
id: cord-020909-n36p5n2k author: Papadakos, Panagiotis title: bias goggles: Graph-Based Computation of the Bias of Web Domains Through the Eyes of Users date: 2020-03-17 pages: extension: .txt txt: ./txt/cord-020909-n36p5n2k.txt cache: ./cache/cord-020909-n36p5n2k.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 2 resourceName b'cord-020909-n36p5n2k.txt'
=== file2bib.sh ===
id: cord-020880-m7d4e0eh author: Barrón-Cedeño, Alberto title: CheckThat! at CLEF 2020: Enabling the Automatic Identification and Verification of Claims in Social Media date: 2020-03-24 pages: extension: .txt txt: ./txt/cord-020880-m7d4e0eh.txt cache: ./cache/cord-020880-m7d4e0eh.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 2 resourceName b'cord-020880-m7d4e0eh.txt'
=== file2bib.sh ===
id: cord-020912-tbq7okmj author: Batra, Vishwash title: Variational Recurrent Sequence-to-Sequence Retrieval for Stepwise Illustration date: 2020-03-17 pages: extension: .txt txt: ./txt/cord-020912-tbq7okmj.txt cache: ./cache/cord-020912-tbq7okmj.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 4 resourceName b'cord-020912-tbq7okmj.txt'
=== file2bib.sh ===
id: cord-020896-yrocw53j author: Agarwal, Mansi title: MEMIS: Multimodal Emergency Management Information System date: 2020-03-17 pages: extension: .txt txt: ./txt/cord-020896-yrocw53j.txt cache: ./cache/cord-020896-yrocw53j.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 3 resourceName b'cord-020896-yrocw53j.txt'
=== file2bib.sh ===
id: cord-020914-7p37m92a author: Dumani, Lorik title: A Framework for Argument Retrieval: Ranking Argument Clusters by Frequency and Specificity date: 2020-03-17 pages: extension: .txt txt: ./txt/cord-020914-7p37m92a.txt cache: ./cache/cord-020914-7p37m92a.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 3 resourceName b'cord-020914-7p37m92a.txt'
=== file2bib.sh ===
id: cord-020830-97xmu329 author: Ghanem, Bilal title: Irony Detection in a Multilingual Context date: 2020-03-24 pages: extension: .txt txt: ./txt/cord-020830-97xmu329.txt cache: ./cache/cord-020830-97xmu329.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 4 resourceName b'cord-020830-97xmu329.txt'
=== file2bib.sh ===
id: cord-020905-gw8i6tkn author: Qu, Xianshan title: An Attention Model of Customer Expectation to Improve Review Helpfulness Prediction date: 2020-03-17 pages: extension: .txt txt: ./txt/cord-020905-gw8i6tkn.txt cache: ./cache/cord-020905-gw8i6tkn.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 4 resourceName b'cord-020905-gw8i6tkn.txt'
=== file2bib.sh ===
id: cord-020903-qt0ly5d0 author: Tamine, Lynda title: What Can Task Teach Us About Query Reformulations? date: 2020-03-17 pages: extension: .txt txt: ./txt/cord-020903-qt0ly5d0.txt cache: ./cache/cord-020903-qt0ly5d0.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 3 resourceName b'cord-020903-qt0ly5d0.txt'
=== file2bib.sh ===
id: cord-020904-x3o3a45b author: Montazeralghaem, Ali title: Relevance Ranking Based on Query-Aware Context Analysis date: 2020-03-17 pages: extension: .txt txt: ./txt/cord-020904-x3o3a45b.txt cache: ./cache/cord-020904-x3o3a45b.txt Content-Encoding UTF-8 Content-Type text/plain; charset=UTF-8 X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser'] X-TIKA:content_handler ToTextContentHandler X-TIKA:embedded_depth 0 X-TIKA:parse_time_millis 3 resourceName b'cord-020904-x3o3a45b.txt'

Queue is empty; done
journal-advancesInInformationRetrieval-cord

=== reduce.pl bib ===
id = cord-020811-pacy48qx author = Muhammad, Shamsuddeen Hassan title = Incremental Approach for Automatic Generation of Domain-Specific Sentiment Lexicon date = 2020-03-24 pages = extension = .txt mime = text/plain words = 1725 sentences = 113 flesch = 50
summary = title: Incremental Approach for Automatic Generation of Domain-Specific Sentiment Lexicon. To this end, we propose an approach to automatically generate a domain-specific sentiment lexicon using a vector model enriched by weights. Although research has been carried out on corpus-based approaches for automatic generation of a domain-specific lexicon [1, 4, 5, 7, 9, 10, 14], existing approaches focused on creation of a lexicon from a single corpus [4]. To this end, this work proposes an incremental approach for the automatic generation of a domain-specific sentiment lexicon. We aim to investigate an incremental technique for automatically generating a domain-specific sentiment lexicon from a corpus. Can we automatically generate a sentiment lexicon from a corpus and improve on existing approaches? After detecting the domain shift, we merge the distributions using a similar approach to the one discussed (in updating using the same corpus) and generate the lexicon.
cache = ./cache/cord-020811-pacy48qx.txt txt = ./txt/cord-020811-pacy48qx.txt
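The words / sentences / flesch fields reported for each record below are simple readability statistics. For orientation, here is a sketch of the standard Flesch Reading Ease formula (206.835 - 1.015 * words/sentences - 84.6 * syllables/words) with a crude syllable heuristic; reduce.pl's own tokenization and counting rules may well differ.

    import re

    def count_syllables(word):
        # crude heuristic: count vowel groups; real tools use pronunciation dictionaries
        return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

    def flesch_reading_ease(text):
        sentences = max(1, len(re.findall(r"[.!?]+", text)))
        words = re.findall(r"[A-Za-z']+", text)
        n_words = max(1, len(words))
        syllables = sum(count_syllables(w) for w in words)
        return 206.835 - 1.015 * (n_words / sentences) - 84.6 * (syllables / n_words)

    print(round(flesch_reading_ease("Which BM25 do you mean? We study the variants.")))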
=== reduce.pl bib ===
id = cord-020794-d3oru1w5 author = Leekha, Maitree title = A Multi-task Approach to Open Domain Suggestion Mining Using Language Model for Text Over-Sampling date = 2020-03-24 pages = extension = .txt mime = text/plain words = 1569 sentences = 105 flesch = 59
summary = title: A Multi-task Approach to Open Domain Suggestion Mining Using Language Model for Text Over-Sampling. In this work, we introduce a novel over-sampling technique to address the problem of class imbalance, and propose a multi-task deep learning approach for mining suggestions from multiple domains. Experimental results on a publicly available dataset show that our over-sampling technique, coupled with the multi-task framework, outperforms state-of-the-art open domain suggestion mining models in terms of the F-1 measure and AUC. In our study, we generate synthetic positive reviews until the number of suggestion and non-suggestion class samples becomes equal in the training set. All comparisons have been made in terms of the F-1 score of the suggestion class for a fair comparison with prior work on representational learning for open domain suggestion mining [5] (see Baseline in Table 3). In this work, we proposed a multi-task learning framework for open domain suggestion mining along with a novel language model based over-sampling technique for text (LMOTE).
cache = ./cache/cord-020794-d3oru1w5.txt txt = ./txt/cord-020794-d3oru1w5.txt

=== reduce.pl bib ===
id = cord-020914-7p37m92a author = Dumani, Lorik title = A Framework for Argument Retrieval: Ranking Argument Clusters by Frequency and Specificity date = 2020-03-17 pages = extension = .txt mime = text/plain words = 5482 sentences = 302 flesch = 67
summary = From an information retrieval perspective, an interesting task within this setting is finding the best supporting and attacking premises for a given query claim from a large corpus of arguments. From an information retrieval perspective, an interesting task within this setting is finding the best supporting (pro) and attacking (con) premises for a given query claim [31]. Given a user's keyword query, the system retrieves, ranks, and presents premises supporting and attacking the query, taking similarity of the query with the premise, its corresponding claim, and other contextual information into account. We assume that we work with a large corpus of argumentative text, for example collections of political speeches or forum discussions, that has already been mined and transferred into claims with the corresponding premises and stances. We consider the following problem: Given a controversial claim or topic, for example "We should abandon fossil fuels", a user searches for the most important premises from the corpus supporting or attacking it.
cache = ./cache/cord-020914-7p37m92a.txt txt = ./txt/cord-020914-7p37m92a.txt

=== reduce.pl bib ===
id = cord-020806-lof49r72 author = Landin, Alfonso title = Novel and Diverse Recommendations by Leveraging Linear Models with User and Item Embeddings date = 2020-03-24 pages = extension = .txt mime = text/plain words = 2373 sentences = 150 flesch = 52
summary = title: Novel and Diverse Recommendations by Leveraging Linear Models with User and Item Embeddings. In this paper, we present EER, a linear model for the top-N recommendation task, which takes advantage of user and item embeddings for improving novelty and diversity without harming accuracy. In this paper, we propose a method to augment an existing linear recommendation model to make more diverse and novel recommendations, while maintaining similar accuracy results. Experiments conducted on three datasets show that our proposal outperforms the original model in both novelty and diversity while maintaining similar levels of accuracy. On the other hand, as results in Table 3 show, ELP is able to provide good figures in novelty and diversity, thanks to the embedding model capturing non-linear relations between users and items. It is common in the field of recommender systems for methods with lower accuracy to have higher values in diversity and novelty. FISM: factored item similarity models for top-n recommender systems.
cache = ./cache/cord-020806-lof49r72.txt txt = ./txt/cord-020806-lof49r72.txt

=== reduce.pl bib ===
id = cord-020815-j9eboa94 author = Kamphuis, Chris title = Which BM25 Do You Mean? A Large-Scale Reproducibility Study of Scoring Variants date = 2020-03-24 pages = extension = .txt mime = text/plain words = 2249 sentences = 154 flesch = 60
summary = Experiments on three newswire collections show that there are no significant effectiveness differences between them, including Lucene's often maligned approximation of document length. Although learning-to-rank approaches and neural ranking models are widely used today, they are typically deployed as part of a multi-stage reranking architecture, over candidate documents supplied by a simple term-matching method using traditional inverted indexes [1]. Our goal is a large-scale reproducibility study to explore the nuances of different variants of BM25 and their impact on retrieval effectiveness. Their findings are confirmed: effectiveness differences in IR experiments are unlikely to be the result of the choice of BM25 variant that a system implements. We implemented a variant that uses exact document lengths, but is otherwise identical to the Lucene default. Storing exact document lengths would allow for different ranking functions to be swapped at query time more easily, as no information would be discarded at index time.
cache = ./cache/cord-020815-j9eboa94.txt txt = ./txt/cord-020815-j9eboa94.txt
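Since the record above concerns BM25 scoring variants, a textbook rendering of one common form may help orient the reader. This is only a sketch of one variant, not the paper's definition: the variants the study compares differ in idf smoothing and in whether document length is exact or, as in older Lucene, approximated.

    import math

    def bm25_term(tf, df, N, doc_len, avg_doc_len, k1=0.9, b=0.4):
        # Robertson-style idf; other variants add +0.5 smoothing or +1 inside the log
        idf = math.log(N / df)  # assumes df > 0
        length_norm = 1 - b + b * (doc_len / avg_doc_len)
        return idf * (tf * (k1 + 1)) / (tf + k1 * length_norm)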
=== reduce.pl bib ===
id = cord-020885-f667icyt author = Sharma, Ujjwal title = Semantic Path-Based Learning for Review Volume Prediction date = 2020-03-17 pages = extension = .txt mime = text/plain words = 4026 sentences = 245 flesch = 48
summary = In this work, we present an approach that uses semantically meaningful, bimodal random walks on real-world heterogeneous networks to extract correlations between nodes and bring together nodes with shared or similar attributes. In this work, we propose a novel method that incorporates restaurants and their attributes into a multimodal graph and extracts multiple, bimodal low dimensional representations for restaurants based on available paths through shared visual, textual, geographical and categorical features. In this section, we discuss prior work that leverages graph-based structures for extracting information from multiple modalities, focussing on the auto-captioning task that introduced such methods. For each of these sub-networks, we perform random walks and use a variant of the heterogeneous skip-gram objective introduced in [6] to generate low-dimensional bimodal embeddings. Our attention-based model combines separately learned bimodal embeddings using a late-fusion setup for predicting the review volume of the restaurants.
cache = ./cache/cord-020885-f667icyt.txt txt = ./txt/cord-020885-f667icyt.txt

=== reduce.pl bib ===
id = cord-020888-ov2lzus4 author = Formal, Thibault title = Learning to Rank Images with Cross-Modal Graph Convolutions date = 2020-03-17 pages = extension = .txt mime = text/plain words = 5211 sentences = 256 flesch = 55
summary = While most of the current approaches for cross-modal retrieval revolve around learning how to represent text and images in a shared latent space, we take a different direction: we propose to generalize the cross-modal relevance feedback mechanism, a simple yet effective unsupervised method that relies on standard information retrieval heuristics and the choice of a few hyper-parameters. The model can be understood very simply: similarly to PRF methods in standard information retrieval, the goal is to boost images that are visually similar to top images (from a text point of view), i.e. images that are likely to be relevant to the query but were initially badly ranked (which is likely to happen in the web scenario, where text is crawled from the source page and can be very noisy).
cache = ./cache/cord-020888-ov2lzus4.txt txt = ./txt/cord-020888-ov2lzus4.txt

=== reduce.pl bib ===
id = cord-020916-ds0cf78u author = Fard, Mazar Moradi title = Seed-Guided Deep Document Clustering date = 2020-03-17 pages = extension = .txt mime = text/plain words = 5079 sentences = 265 flesch = 57
summary = The main contributions of this study can be summarized as follows: (a) We introduce the Seed-guided Deep Document Clustering (SD2C) framework, the first attempt, to the best of our knowledge, to constrain clustering with seed words based on a deep clustering approach; and (b) we validate this framework through experiments based on automatically selected seed words on five publicly available text datasets with various sizes and characteristics. The constrained clustering problem we are addressing in fact bears strong similarity to that of seed-guided dataless text classification, which consists of categorizing documents based on a small set of seed words describing the classes/clusters. This can be done by enforcing that seed words have more influence either on the learned document embeddings, a solution we refer to as SD2C-Doc, or on the cluster representatives, a solution we refer to as SD2C-Rep. Note that the second solution can only be used when the clustering process is based on cluster representatives (i.e., R = {r_k}_{k=1}^{K} with K the number of clusters), which is indeed the case for most current deep clustering methods [1].
cache = ./cache/cord-020916-ds0cf78u.txt txt = ./txt/cord-020916-ds0cf78u.txt

=== reduce.pl bib ===
id = cord-020820-cbikq0v0 author = Papadakos, Panagiotis title = Dualism in Topical Relevance date = 2020-03-24 pages = extension = .txt mime = text/plain words = 2468 sentences = 133 flesch = 56
summary = To this end, in this paper we elaborate on the idea of leveraging the available antonyms of the original query terms for eventually producing an answer which provides a better overview of the related conceptual and information space. In this paper we elaborate on the idea of leveraging the available antonyms of the original query terms (if they exist), for eventually producing an answer which provides a better overview of the related information and conceptual space. In their comments for these queries, users mention that the selected (i.e., dual) list "provides a more general picture" and "more relevant and interesting results, although contradicting". For the future, we plan to define the appropriate antonym selection algorithms and relevance metrics, implement the proposed functionality in a meta-search setting, and conduct a large scale evaluation with real users over exploratory tasks, to identify in which queries the dual approach is beneficial and to what types of users.
cache = ./cache/cord-020820-cbikq0v0.txt txt = ./txt/cord-020820-cbikq0v0.txt
=== reduce.pl bib ===
id = cord-020793-kgje01qy author = Suominen, Hanna title = CLEF eHealth Evaluation Lab 2020 date = 2020-03-24 pages = extension = .txt mime = text/plain words = 2379 sentences = 116 flesch = 51
summary = Laypeople's increasing difficulties in retrieving and digesting valid and relevant information in their preferred language to make health-centred decisions have motivated CLEF eHealth to organize yearly labs since 2012. Substantial community interest in the tasks and their resources has led to CLEF eHealth maturing as a primary venue for all interdisciplinary actors of the ecosystem for producing, processing, and consuming electronic health information. Information access conferences have organized evaluation labs on related Electronic Health (eHealth) Information Extraction (IE), Information Management (IM), and Information Retrieval (IR) tasks for almost 20 years. This Consumer Health Search (CHS) task follows a standard IR shared challenge paradigm from the perspective that it provides participants with a test collection consisting of a set of documents and a set of topics to develop IR techniques for. The IR task at the CLEF eHealth evaluation lab 2016: user-centred health information retrieval.
cache = ./cache/cord-020793-kgje01qy.txt txt = ./txt/cord-020793-kgje01qy.txt

=== reduce.pl bib ===
id = cord-020843-cq4lbd0l author = Almeida, Tiago title = Calling Attention to Passages for Biomedical Question Answering date = 2020-03-24 pages = extension = .txt mime = text/plain words = 2235 sentences = 125 flesch = 50
summary = This paper presents a pipeline for document and passage retrieval for biomedical question answering built around a new variant of the DeepRank network model in which the recursive layer is replaced by a self-attention layer combined with a weighting mechanism. On the other hand, models such as the Deep Relevance Matching Model (DRMM) [3] or DeepRank [10] follow an interaction-based approach, in which matching signals between query and document are captured and used by the neural network to produce a ranking score. The main contribution of this work is a new variant of the DeepRank neural network architecture in which the recursive layer originally included in the final aggregation step is replaced by a self-attention layer followed by a weighting mechanism similar to the term gating layer of the DRMM. The proposed model was evaluated on the BioASQ dataset, as part of a document and passage (snippet) retrieval pipeline for biomedical question answering, achieving similar retrieval performance when compared to more complex network architectures.
cache = ./cache/cord-020843-cq4lbd0l.txt txt = ./txt/cord-020843-cq4lbd0l.txt
=== reduce.pl bib ===
id = cord-020841-40f2p3t4 author = Hofstätter, Sebastian title = Neural-IR-Explorer: A Content-Focused Tool to Explore Neural Re-ranking Results date = 2020-03-24 pages = extension = .txt mime = text/plain words = 1526 sentences = 91 flesch = 53
summary = In this paper we look beyond metrics-based evaluation of Information Retrieval systems, to explore the reasons behind ranking results. We present the content-focused Neural-IR-Explorer, which empowers users to browse through retrieval results and inspect the inner workings and fine-grained results of neural re-ranking models. The explorer complements metrics-based evaluation by focusing on the content of queries and documents, and how the neural models relate them to each other. Users can explore each query result in more detail: we show the internal partial scores and content of the returned documents with different highlighting modes to surface the inner workings of a neural re-ranking model. The explorer displays data created by a batched evaluation run of a neural re-ranking model. Additionally, the Neural-IR-Explorer also illuminates the pool bias [12] of the MSMARCO ranking collection: the small number of judged documents per query makes the evaluation fragile. We presented the content-focused Neural-IR-Explorer to complement metrics-based evaluation of retrieval models.
cache = ./cache/cord-020841-40f2p3t4.txt txt = ./txt/cord-020841-40f2p3t4.txt

=== reduce.pl bib ===

=== reduce.pl bib ===
id = cord-020932-o5scqiyk author = Zhong, Wei title = Accelerating Substructure Similarity Search for Formula Retrieval date = 2020-03-17 pages = extension = .txt mime = text/plain words = 4602 sentences = 278 flesch = 65
summary = In text similarity search, query processing can be accelerated through dynamic pruning [18], which typically estimates score upperbounds to prune documents unlikely to be in the top K results. As a result, the posting list entry also stores the root node ID for indexed paths, in order to reconstruct matched substructures at merge time. Define the partial upperbound matrix W = {w_{i,j}} of size |T_q| x |T|, where T = {T(m), m in T_q} are all the token paths from the query OPT (T is essentially the same as tokenized P(T_q)), and a binary variable x of size |T| x 1 indicating which corresponding posting lists are placed in the non-requirement set. We have presented rank-safe dynamic pruning strategies that produce an upperbound estimation of structural similarity in order to speed up formula search using subtree matching. Our dynamic pruning strategies and specialized inverted index are different from traditional linear text search pruning methods and they further associate the query structure representation with posting lists.
cache = ./cache/cord-020932-o5scqiyk.txt txt = ./txt/cord-020932-o5scqiyk.txt

=== reduce.pl bib ===
id = cord-020909-n36p5n2k author = Papadakos, Panagiotis title = bias goggles: Graph-Based Computation of the Bias of Web Domains Through the Eyes of Users date = 2020-03-17 pages = extension = .txt mime = text/plain words = 5005 sentences = 256 flesch = 63
summary =
- the bias goggles model for computing the bias characteristics of web domains for a user-defined concept, based on the notions of Biased Concepts (BCs), Aspects of Bias (ABs), and the metrics of the support of the domain for a specific AB and BC, and its bias score for this BC,
- the introduction of the Support Flow Graph (SFG), along with graph-based algorithms for computing the AB support score of domains, that include adaptations of the Independence Cascade (IC) and Linear Threshold (LT) propagation models, and the new Biased-PageRank (Biased-PR) variation that models different behaviours of a biased surfer,
- an initial discussion about performance and implementation issues,
- some promising evaluation results that showcase the effectiveness and efficiency of the approach on a relatively small dataset of crawled pages, using the new AGBR and AGS metrics,
- a publicly accessible prototype of bias goggles.
cache = ./cache/cord-020909-n36p5n2k.txt txt = ./txt/cord-020909-n36p5n2k.txt
=== reduce.pl bib ===
id = cord-020899-d6r4fr9r author = Doinychko, Anastasiia title = Biconditional Generative Adversarial Networks for Multiview Learning with Missing Views date = 2020-03-17 pages = extension = .txt mime = text/plain words = 4666 sentences = 244 flesch = 56
summary = In this paper, we present a conditional GAN with two generators and a common discriminator for multiview learning problems where observations have two views, but one of them may be missing for some of the training samples. We address the problem of multiview learning with Generative Adversarial Networks (GANs) in the case where some observations may have missing views without there being an external resource to complete them. We demonstrate that generated views allow us to achieve state-of-the-art results on a subset of the Reuters RCV1/RCV2 collections compared to multiview approaches that rely on Machine Translation (MT) for translating documents into languages in which their versions do not exist before training the models (Sect. 3.2); and to achieve state-of-the-art performance compared to multiview approaches that rely on external view-generating functions on multilingual document classification, a more challenging application than image analysis, which is the domain of choice for the design of new GAN models.
cache = ./cache/cord-020899-d6r4fr9r.txt txt = ./txt/cord-020899-d6r4fr9r.txt

=== reduce.pl bib ===
id = cord-020896-yrocw53j author = Agarwal, Mansi title = MEMIS: Multimodal Emergency Management Information System date = 2020-03-17 pages = extension = .txt mime = text/plain words = 4874 sentences = 270 flesch = 52
summary = We present MEMIS, a system that can be used in emergencies like disasters to identify and analyze the damage indicated by user-generated multimodal social media posts, thereby helping the disaster management groups in making informed decisions. To this end, we propose MEMIS, a multimodal system capable of extracting information from social media, which employs both images and text for identifying damage and its severity in real time. Therefore, we effectively have three models for each modality: first for filtering the informative tweets, then for those pertaining to the infrastructural damage (or any other category related to the relief group), and finally for assessing the severity of damage present. Similarly, if at least one of the text and the image modality predicts an informative tweet as containing infrastructural damage, the tweet undergoes severity analysis. Here, we use attention fusion to combine the feature interpretations from the text and image modalities for the severity analysis module [12, 26].
cache = ./cache/cord-020896-yrocw53j.txt txt = ./txt/cord-020896-yrocw53j.txt
cache = ./cache/cord-020905-gw8i6tkn.txt txt = ./txt/cord-020905-gw8i6tkn.txt === reduce.pl bib === id = cord-020912-tbq7okmj author = Batra, Vishwash title = Variational Recurrent Sequence-to-Sequence Retrieval for Stepwise Illustration date = 2020-03-17 pages = extension = .txt mime = text/plain words = 4506 sentences = 247 flesch = 50 summary = We evaluate the model for the application of stepwise illustration of recipes, where a sequence of relevant images is retrieved to best match the steps described in the text. More concretely, we incorporate the global context information encoded in the entire text sequence (through the attention mechanism) into a variational autoencoder (VAE) at each time step, which converts the input text into an image representation in the image embedding space. To capture the semantics of the images retrieved so far (in a story/recipe), we assume the prior of the distribution of the topic given the text input follows the distribution conditional on the latent topic from the previous time step. -We propose a new variational recurrent seq2seq (VRSS) retrieval model for seq2seq retrieval, which employs temporally-dependent latent variables to capture the sequential semantic structure of text-image sequences. Our work is related to: cross-modal retrieval, story picturing, variational recurrent neural networks, and cooking recipe datasets. cache = ./cache/cord-020912-tbq7okmj.txt txt = ./txt/cord-020912-tbq7okmj.txt === reduce.pl bib === id = cord-020801-3sbicp3v author = MacAvaney, Sean title = Teaching a New Dog Old Tricks: Resurrecting Multilingual Retrieval Using Zero-Shot Learning date = 2020-03-24 pages = extension = .txt mime = text/plain words = 2530 sentences = 154 flesch = 53 summary = In this paper, we tackle the lack of data by leveraging pre-trained multilingual language models to transfer a retrieval system trained on English collections to non-English queries and documents. Our models are evaluated in a zero-shot setting, meaning that we use them to predict relevance scores for query-document pairs in languages never seen during training. [28] leveraged a data set of Wikipedia pages in 25 languages to train a learning-to-rank algorithm for Japanese-English and Swahili-English cross-language retrieval. In particular, to circumvent the lack of training data, we leverage transfer learning techniques to train Arabic, Mandarin, and Spanish retrieval models using English training data. We evaluate our models in a zero-shot setting; that is, we use them to predict relevance scores for query-document pairs in languages never seen during training. Because large-scale relevance judgments are largely absent in languages other than English, we propose a new setting to evaluate learning-to-rank approaches: zero-shot cross-lingual ranking. cache = ./cache/cord-020801-3sbicp3v.txt txt = ./txt/cord-020801-3sbicp3v.txt === reduce.pl bib === id = cord-020834-ch0fg9rp author = Grand, Adrien title = From MAXSCORE to Block-Max Wand: The Story of How Lucene Significantly Improved Query Evaluation Performance date = 2020-03-24 pages = extension = .txt mime = text/plain words = 2733 sentences = 137 flesch = 54 summary = We share the story of how an innovation that originated from academia (block-max indexes and the corresponding block-max Wand query evaluation algorithm of Ding and Suel [6]) made its way into the open-source Lucene search library.
We see this paper as having two main contributions beyond providing a narrative of events: First, we report results of experiments that attempt to match the original conditions of Ding and Suel [6] and present additional results on a number of standard academic IR test collections. Support for block-max indexes was the final feature that was implemented, based on the developers' reading of the paper by Ding and Suel [6], which required invasive changes to Lucene's index format. The story of block-max Wand in Lucene provides a case study of how an innovation that originated in academia made its way into the world's most widely-used search library and achieved significant impact in the "real world" through hundreds of production deployments worldwide (if we consider the broader Lucene ecosystem, which includes systems such as Elasticsearch and Solr). cache = ./cache/cord-020834-ch0fg9rp.txt txt = ./txt/cord-020834-ch0fg9rp.txt === reduce.pl bib === id = cord-020875-vd4rtxmz author = Suwaileh, Reem title = Time-Critical Geolocation for Social Good date = 2020-03-24 pages = extension = .txt mime = text/plain words = 2030 sentences = 134 flesch = 51 summary = To address this problem, I aim to exploit different techniques such as training neural models, enriching the tweet representation, and studying methods to mitigate the lack of labeled data. In my work, I am interested in tackling the Location Mention Prediction (LMP) problem during time-critical situations. The location taggers have to address many challenges including microblogging-specific challenges (e.g., tweet sparsity, noisiness, stream rapid-changing, hashtag riding, etc.) and the task-specific challenges (e.g., time-criticality of the solution, scarcity of labeled data, etc.). Alternatively, Sultanik and Fink [25] used an Information Retrieval (IR) based approach to identify the location mentions in tweets. Moreover, Hoang and Mothe [8] combined syntactic and semantic features to train traditional ML-based models, whereas Kumar and Singh [13] trained a Convolutional Neural Network (CNN) model that learns the continuous representation of tweet text and then identifies the location mentions. cache = ./cache/cord-020875-vd4rtxmz.txt txt = ./txt/cord-020875-vd4rtxmz.txt === reduce.pl bib === id = cord-020832-iavwkdpr author = Nguyen, Dat Quoc title = ChEMU: Named Entity Recognition and Event Extraction of Chemical Reactions from Patents date = 2020-03-24 pages = extension = .txt mime = text/plain words = 1980 sentences = 118 flesch = 49 summary = title: ChEMU: Named Entity Recognition and Event Extraction of Chemical Reactions from Patents ChEMU involves two key information extraction tasks over chemical reactions from patents. In this paper, we propose a new evaluation lab (called ChEMU) focusing on information extraction over chemical reactions from patents. Our goals are: (1) To develop tasks that impact chemical research in both academia and industry, (2) To provide the community with a new dataset of chemical entities, enriched with relational links between chemical event triggers and arguments, and (3) To advance the state-of-the-art in information extraction over chemical patents. The ChEMU lab at CLEF-2020 offers the two information extraction tasks of Named entity recognition (Task 1) and Event extraction (Task 2) over chemical reactions from patent documents. ChEMU will focus on two new tasks of named entity recognition and event extraction over chemical reactions from patents.
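To illustrate the named entity recognition task format that ChEMU targets, here is a toy dictionary-based tagger emitting BIO labels. It is only a format illustration under stated assumptions: real ChEMU systems are statistical or neural taggers, and the tiny `LEXICON` and label names here are invented for the example.

```python
# Toy chemical NER with BIO tags; a left-to-right greedy match against a
# hand-made lexicon stands in for a learned sequence-labelling model.
LEXICON = {("sodium", "chloride"), ("acetic", "acid")}

def bio_tag(tokens):
    """Assign B/I/O labels by greedy left-to-right lexicon matching."""
    tags, i = [], 0
    while i < len(tokens):
        match = next((m for m in LEXICON
                      if tuple(t.lower() for t in tokens[i:i + len(m)]) == m), None)
        if match:
            tags += ["B-CHEM"] + ["I-CHEM"] * (len(match) - 1)
            i += len(match)
        else:
            tags.append("O")
            i += 1
    return list(zip(tokens, tags))

print(bio_tag("The mixture of sodium chloride was stirred".split()))
```

Event extraction (Task 2) would then link such tagged entities to reaction trigger words, which is beyond this sketch.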
cache = ./cache/cord-020832-iavwkdpr.txt txt = ./txt/cord-020832-iavwkdpr.txt === reduce.pl bib === id = cord-020814-1ty7wzlv author = Berrendorf, Max title = Knowledge Graph Entity Alignment with Graph Convolutional Networks: Lessons Learned date = 2020-03-24 pages = extension = .txt mime = text/plain words = 2314 sentences = 144 flesch = 55 summary = In this work, we focus on the problem of entity alignment in Knowledge Graphs (KG) and we report on our experiences when applying a Graph Convolutional Network (GCN) based model for this task. Graph Convolutional Networks (GCN) [7, 9], which have recently become increasingly popular, are at the core of state-of-the-art methods for entity alignments in KGs [3, 6, 22, 24, 27]. 1. We investigate the reproducibility of the published results of a recent GCN-based method for entity alignment and uncover differences between the method's description in the paper and the authors' implementation. Overview of used datasets with their sizes in the number of triples (edges), entities (nodes), relations (different edge types) and alignments. GCN-Align [22] is a GCN-based approach to embed all entities from both graphs into a common embedding space (a toy sketch of this embed-and-match idea follows below). Semi-supervised entity alignment via knowledge graph embedding with awareness of degree difference Entity alignment between knowledge graphs using attribute embeddings cache = ./cache/cord-020814-1ty7wzlv.txt txt = ./txt/cord-020814-1ty7wzlv.txt === reduce.pl bib === id = cord-020813-0wc23ixy author = Hashemi, Helia title = ANTIQUE: A Non-factoid Question Answering Benchmark date = 2020-03-24 pages = extension = .txt mime = text/plain words = 2941 sentences = 185 flesch = 59 summary = Despite the importance of the task, the community still feels the significant lack of large-scale non-factoid question answering collections with real questions and comprehensive relevance judgments. Despite the widely-known importance of studying answer passage retrieval for non-factoid questions [1, 2, 8, 18], the research progress for this task is limited by the availability of high-quality public data. Although WikiPassageQA is an invaluable contribution to the community, it does not cover all aspects of the non-factoid question answering task and has the following limitations: (i) it only contains an average of 1.7 relevant passages per question and does not cover many questions with multiple correct answers; (ii) it was created from the Wikipedia website, containing only formal text; (iii) more importantly, the questions in the WikiPassageQA dataset were generated by crowdworkers, which is different from the questions that users ask in real-world systems; (iv) the relevant passages in WikiPassageQA contain the answer to the question in addition to some surrounding text. In contrast, ANTIQUE provides a reliable collection with complete relevance annotations for evaluating non-factoid QA models. cache = ./cache/cord-020813-0wc23ixy.txt txt = ./txt/cord-020813-0wc23ixy.txt === reduce.pl bib === id = cord-020903-qt0ly5d0 author = Tamine, Lynda title = What Can Task Teach Us About Query Reformulations? date = 2020-03-17 pages = extension = .txt mime = text/plain words = 4957 sentences = 264 flesch = 61 summary = task-based sessions represent significantly different background contexts that can be used to better understand users' query reformulations.
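As flagged in the Berrendorf entry above, GCN-based alignment pushes both graphs through shared weights and matches entities by nearest neighbour in the common space. A toy numpy sketch under stated assumptions: random toy graphs, a single propagation layer, and L1 distance for matching (GCN-Align itself is trained with an alignment loss, which is omitted here).

```python
import numpy as np

rng = np.random.default_rng(1)

def gcn_layer(A, X, W):
    """One GCN step: symmetrically normalised adjacency x features x weights."""
    A_hat = A + np.eye(A.shape[0])                 # add self-loops
    D_inv_sqrt = np.diag(A_hat.sum(1) ** -0.5)     # degree normalisation
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ W, 0.0)  # ReLU

def align(emb_a, emb_b):
    """Match each entity of graph A to its nearest neighbour in graph B (L1)."""
    dists = np.abs(emb_a[:, None, :] - emb_b[None, :, :]).sum(-1)
    return dists.argmin(1)

n, d = 5, 4
A1 = (rng.random((n, n)) < 0.4).astype(float)      # toy adjacency matrices
A2 = (rng.random((n, n)) < 0.4).astype(float)
X1, X2 = rng.normal(size=(n, d)), rng.normal(size=(n, d))
W = rng.normal(size=(d, d))                        # shared weights: one common space
print(align(gcn_layer(A1, X1, W), gcn_layer(A2, X2, W)))
```

The shared weight matrix is the crucial detail: without it the two graphs would land in incomparable spaces and nearest-neighbour matching would be meaningless.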
Using insights from large-scale search logs, our findings clearly show that the task is an additional relevant search unit that helps in better understanding users' query reformulation patterns and in predicting the user's next query. To design support processes for task-based search systems, we argue that we need to: (1) fully understand how the user's task, performed in natural settings, drives the query reformulation changes; and (2) gauge the level of similarity of these change trends with those observed in time-based sessions. With this in mind, we perform large-scale log analyses of users naturally engaged in tasks to examine query reformulations from both the time-based session and the task-based session perspectives. To identify query reformulation patterns, most of the previous works used large-scale log analyses segmented into time-based sessions. cache = ./cache/cord-020903-qt0ly5d0.txt txt = ./txt/cord-020903-qt0ly5d0.txt === reduce.pl bib === id = cord-020830-97xmu329 author = Ghanem, Bilal title = Irony Detection in a Multilingual Context date = 2020-03-24 pages = extension = .txt mime = text/plain words = 2806 sentences = 158 flesch = 54 summary = We show that these monolingual models trained separately on different languages using multilingual word representation or text-based features can open the door to irony detection in languages that lack annotated data for irony. We aim here to bridge the gap by tackling ID in tweets from both multilingual (French, English and Arabic) and multicultural perspectives (Indo-European languages whose speakers share quite the same cultural background vs. less culturally close languages). We can justify that by the fact that the language of the Arabic and French tweets is quite informal and contains many dialect words that may not exist in the pretrained embeddings we used, compared to the English ones (lower embedding coverage ratio), which makes it harder for the CNN to learn a clear semantic pattern (a minimal sketch of such a CNN follows below). The CNN architecture trained on cross-lingual word representation shows that irony has a certain similarity between the languages we targeted despite the cultural differences, which confirms that irony is a universal phenomenon, as already shown in previous linguistic studies [9, 24, 35]. cache = ./cache/cord-020830-97xmu329.txt txt = ./txt/cord-020830-97xmu329.txt === reduce.pl bib === id = cord-020848-nypu4w9s author = Morris, David title = SlideImages: A Dataset for Educational Image Classification date = 2020-03-24 pages = extension = .txt mime = text/plain words = 2276 sentences = 145 flesch = 51 summary = Currently, many document analysis systems are trained in part on scene images due to the lack of large datasets of educational image data. In this paper, we address this issue and present SlideImages, a dataset for the task of classifying educational illustrations. SlideImages contains training data collected from various sources, e.g., Wikimedia Commons and the AI2D dataset, and test data collected from educational slides. Born-digital and educational images need further benchmarks on challenging information retrieval tasks in order to test generalization. While document scans and born-digital educational illustrations have materially different appearance, these papers show that the utility of deep neural networks is not limited to scene image tasks (Fig. 1). The related DocFigure dataset covers similar images and has much more data than SlideImages. In this paper, we have presented the task of classifying educational illustrations and images in slides and introduced a novel dataset, SlideImages.
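As noted in the Ghanem entry above, the irony classifier is a CNN over (possibly cross-lingual) word embeddings. A minimal numpy sketch of that pattern, convolve over word-vector n-grams, max-pool over time, read out a probability; the filter widths, dimensions, and random weights are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

rng = np.random.default_rng(2)

def cnn_score(embeddings, filters, w_out):
    """1-D convolution over word vectors, max-over-time pooling, linear read-out."""
    n_words, d = embeddings.shape
    width = filters.shape[1] // d
    feats = []
    for f in filters:                         # each filter spans `width` word vectors
        acts = [np.maximum(f @ embeddings[i:i + width].ravel(), 0.0)
                for i in range(n_words - width + 1)]
        feats.append(max(acts))               # max pooling keeps the strongest n-gram
    logit = np.dot(w_out, feats)
    return 1.0 / (1.0 + np.exp(-logit))       # probability the tweet is ironic

d, width, n_filters = 50, 3, 8
tweet = rng.normal(size=(12, d))              # 12 words, e.g. multilingual embedding rows
filters = rng.normal(size=(n_filters, width * d))
w_out = rng.normal(size=n_filters)
print(cnn_score(tweet, filters, w_out))
```

Because the word vectors live in a shared multilingual space, the same trained filters can in principle be applied to French, English, or Arabic input, which is the transfer the paper examines.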
cache = ./cache/cord-020848-nypu4w9s.txt txt = ./txt/cord-020848-nypu4w9s.txt === reduce.pl bib === id = cord-020808-wpso3jug author = Cardoso, João title = Machine-Actionable Data Management Plans: A Knowledge Retrieval Approach to Automate the Assessment of Funders' Requirements date = 2020-03-24 pages = extension = .txt mime = text/plain words = 2328 sentences = 137 flesch = 49 summary = In order to guide researchers through the process of managing their data, many funding agencies (e.g. the National Science Foundation (NSF), the European Commission (EC), or the Fundação para a Ciência e Tecnologia (FCT)) have created and published their own open access policies, as well as requiring that any grant proposals be accompanied by a Data Management Plan (DMP). The DMP is a document describing the techniques, methods and policies on how data from a research project is to be created or collected, documented, accessed, preserved and disseminated. The second part comprises the execution of the following four tasks and results in both the collection of the necessary mappings between the ontology and the identified DMP templates, and the creation of DL queries based on the funders' requirements. The DMP Common Standard Ontology (DCSO) was created with the objective of providing an implementation of the DMP Common Standards model expressed through the usage of semantic technology, which has been considered a possible solution in the data management and preservation domains [9]. cache = ./cache/cord-020808-wpso3jug.txt txt = ./txt/cord-020808-wpso3jug.txt === reduce.pl bib === id = cord-020880-m7d4e0eh author = Barrón-Cedeño, Alberto title = CheckThat! at CLEF 2020: Enabling the Automatic Identification and Verification of Claims in Social Media date = 2020-03-24 pages = extension = .txt mime = text/plain words = 2693 sentences = 217 flesch = 64 summary = Task 3 asks to retrieve text snippets from a given set of Web pages that would be useful for verifying a target tweet's claim. Finally, the lab offers a fifth task that asks to predict the check-worthiness of the claims made in English political debates and speeches. Task 3 is defined as follows: Given a check-worthy claim on a specific topic and a set of text snippets extracted from potentially-relevant webpages, return a ranked list of all evidence snippets for the claim. Once we acquire annotations for Task 1, we share with participants the Web pages and text snippets from them solely for the check-worthy claims, which would enable the start of the evaluation cycle for Task 3. Task 4 is defined as follows: Given a check-worthy claim on a specific topic and a set of potentially-relevant Web pages, predict the veracity of the claim. cache = ./cache/cord-020880-m7d4e0eh.txt txt = ./txt/cord-020880-m7d4e0eh.txt === reduce.pl bib === id = cord-020904-x3o3a45b author = Montazeralghaem, Ali title = Relevance Ranking Based on Query-Aware Context Analysis date = 2020-03-17 pages = extension = .txt mime = text/plain words = 5192 sentences = 326 flesch = 53 summary = The primary goal of the proposed model is to combine the exact and semantic matching between query and document terms, which has been shown to produce effective performance in information retrieval. In basic retrieval models such as BM25 [30] and the language modeling framework [29], the relevance score of a document is estimated based on explicit matching of query and document terms.
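Since explicit matching via BM25 recurs throughout this carrel (it is also the subject of the Kamphuis et al. reproducibility study indexed here), the following is a self-contained sketch of one common BM25 variant. The idf smoothing below is the Lucene-style form; as that study shows, several scoring variants coexist under the BM25 name, so treat this as one choice among many rather than the definition.

```python
import math

def bm25(query, doc, docs, k1=0.9, b=0.4):
    """Score `doc` for `query` over a corpus `docs` of tokenised documents."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    score = 0.0
    for term in query:
        df = sum(term in d for d in docs)          # document frequency
        if df == 0:
            continue
        idf = math.log((N - df + 0.5) / (df + 0.5) + 1.0)  # smoothed, non-negative idf
        tf = doc.count(term)
        # Term-frequency saturation (k1) and document-length normalisation (b):
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avgdl))
    return score

docs = [["query", "terms", "match"], ["semantic", "matching", "of", "query", "terms"]]
print(bm25(["query", "matching"], docs[1], docs))
```

Montazeralghaem et al.'s point is precisely that such a score sees only exact matches; their model augments it with semantic term matching and query-aware context.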
Finally, our proposed model for relevance ranking provides the basis for natural integration of semantic term matching and local document context analysis into any retrieval model. [13] proposed a generalized estimate of document language models using a noisy channel, which captures semantic term similarities computed using word embeddings. Note that in this experiment, we only consider methods that select expansion terms based on word embeddings and not other information sources such as the top retrieved documents for each query (PRF). cache = ./cache/cord-020904-x3o3a45b.txt txt = ./txt/cord-020904-x3o3a45b.txt === reduce.pl bib === === reduce.pl bib === === reduce.pl bib === === reduce.pl bib === === reduce.pl bib === === reduce.pl bib === === reduce.pl bib === === reduce.pl bib === ===== Reducing email addresses Creating transaction Updating adr table ===== Reducing keywords cord-020793-kgje01qy cord-020794-d3oru1w5 cord-020811-pacy48qx cord-020820-cbikq0v0 cord-020815-j9eboa94 cord-020841-40f2p3t4 cord-020834-ch0fg9rp cord-020896-yrocw53j cord-020806-lof49r72 cord-020843-cq4lbd0l cord-020912-tbq7okmj cord-020885-f667icyt cord-020914-7p37m92a cord-020801-3sbicp3v cord-020899-d6r4fr9r cord-020909-n36p5n2k cord-020832-iavwkdpr cord-020875-vd4rtxmz cord-020932-o5scqiyk cord-020814-1ty7wzlv cord-020905-gw8i6tkn cord-020813-0wc23ixy cord-020916-ds0cf78u cord-020888-ov2lzus4 cord-020830-97xmu329 cord-020936-k1upc1xu cord-020848-nypu4w9s cord-020903-qt0ly5d0 cord-020851-hf5c0i9z cord-020904-x3o3a45b cord-020927-89c7rijg cord-020846-mfh1ope6 cord-020918-056bvngu cord-020891-lt3m8h41 cord-020835-n9v5ln2i cord-020931-fymgnv1g cord-020901-aew8xr6n cord-020872-frr8xba6 cord-020871-1v6dcmt3 cord-020808-wpso3jug cord-020880-m7d4e0eh cord-020890-aw465igx cord-020908-oe77eupc Creating transaction Updating wrd table ===== Reducing urls cord-020841-40f2p3t4 cord-020814-1ty7wzlv cord-020843-cq4lbd0l cord-020848-nypu4w9s cord-020916-ds0cf78u cord-020835-n9v5ln2i cord-020927-89c7rijg Creating transaction Updating url table ===== Reducing named entities cord-020834-ch0fg9rp cord-020820-cbikq0v0 cord-020806-lof49r72 cord-020815-j9eboa94 cord-020793-kgje01qy cord-020841-40f2p3t4 cord-020794-d3oru1w5 cord-020885-f667icyt cord-020896-yrocw53j cord-020912-tbq7okmj cord-020899-d6r4fr9r cord-020843-cq4lbd0l cord-020801-3sbicp3v cord-020811-pacy48qx cord-020914-7p37m92a cord-020888-ov2lzus4 cord-020905-gw8i6tkn cord-020936-k1upc1xu cord-020916-ds0cf78u cord-020909-n36p5n2k cord-020832-iavwkdpr cord-020932-o5scqiyk cord-020875-vd4rtxmz cord-020814-1ty7wzlv cord-020813-0wc23ixy cord-020903-qt0ly5d0 cord-020830-97xmu329 cord-020848-nypu4w9s cord-020808-wpso3jug cord-020846-mfh1ope6 cord-020880-m7d4e0eh cord-020835-n9v5ln2i cord-020904-x3o3a45b cord-020891-lt3m8h41 cord-020871-1v6dcmt3 cord-020908-oe77eupc cord-020927-89c7rijg cord-020931-fymgnv1g cord-020918-056bvngu cord-020901-aew8xr6n cord-020890-aw465igx cord-020851-hf5c0i9z cord-020872-frr8xba6 Creating transaction Updating ent table ===== Reducing parts of speech cord-020793-kgje01qy cord-020794-d3oru1w5 cord-020811-pacy48qx cord-020820-cbikq0v0 cord-020885-f667icyt cord-020896-yrocw53j cord-020834-ch0fg9rp cord-020806-lof49r72 cord-020912-tbq7okmj cord-020841-40f2p3t4 cord-020899-d6r4fr9r cord-020801-3sbicp3v cord-020914-7p37m92a cord-020843-cq4lbd0l cord-020815-j9eboa94 cord-020888-ov2lzus4 cord-020905-gw8i6tkn cord-020909-n36p5n2k cord-020936-k1upc1xu cord-020813-0wc23ixy cord-020916-ds0cf78u cord-020832-iavwkdpr cord-020814-1ty7wzlv cord-020932-o5scqiyk 
cord-020848-nypu4w9s cord-020903-qt0ly5d0 cord-020875-vd4rtxmz cord-020846-mfh1ope6 cord-020835-n9v5ln2i cord-020880-m7d4e0eh cord-020918-056bvngu cord-020830-97xmu329 cord-020851-hf5c0i9z cord-020890-aw465igx cord-020908-oe77eupc cord-020871-1v6dcmt3 cord-020891-lt3m8h41 cord-020904-x3o3a45b cord-020927-89c7rijg cord-020808-wpso3jug cord-020872-frr8xba6 cord-020931-fymgnv1g cord-020901-aew8xr6n Creating transaction Updating pos table Building ./etc/reader.txt cord-020904-x3o3a45b cord-020903-qt0ly5d0 cord-020888-ov2lzus4 cord-020916-ds0cf78u cord-020904-x3o3a45b cord-020936-k1upc1xu number of items: 43 sum of words: 100,167 average size in words: 3,338 average readability score: 55 nouns: model; query; information; text; document; data; models; retrieval; task; results; documents; user; dataset; word; words; image; search; users; attention; approach; tasks; performance; work; number; set; queries; features; learning; methods; graph; embeddings; terms; training; representations; evaluation; system; review; term; analysis; network; context; language; images; relevance; approaches; representation; similarity; sentiment; networks; datasets verbs: using; based; learns; show; propose; consider; provided; given; follows; generate; made; training; include; ranking; evaluate; compared; setting; finding; compute; contain; embedding; define; described; performs; obtain; supporting; retrieve; represent; identify; saw; introduce; predicts; improving; related; present; existing; takes; applying; extract; selected; combined; focused; capture; reported; require; outperforms; needs; sharing; observed; denotes adjectives: different; neural; semantic; new; similar; relevant; large; deep; previous; first; social; specific; available; non; multi; many; single; best; modal; cross; original; several; common; online; multiple; important; local; better; standard; simple; second; long; high; final; effective; top; various; visual; additional; real; able; latent; traditional; automatic; small; natural; possible; biased; average; textual adverbs: also; however; therefore; well; first; respectively; finally; even; instead; significantly; recently; often; better; furthermore; directly; still; automatically; fully; specifically; rather; especially; additionally; hence; randomly; much; moreover; always; already; usually; less; generally; widely; together; similarly; publicly; previously; now; typically; particularly; manually; semantically; far; otherwise; mainly; actually; simply; namely; jointly; highly; effectively pronouns: we; our; it; their; they; i; its; them; one; us; you; he; itself; my; his; your; u; ours; me; she; s; ourselves; themselves; ndcg@10; mine; him; her; 's; Π; f proper nouns: IR; Sect; BM25; Table; Fig; Eq; Retrieval; S; Lucene; Information; English; K; Twitter; COLTR; D; T; Bantu; BERT; i; DOI; TREC; Neural; TransRev; sha; Task; M; L; eRisk; C; VRSS; F; BC; LSTM; CNN; A; Wikipedia; Analysis; Network; LDA; dom; W; AUC; TF; Model; Cond; Amazon; corpus; IDF; DMP; Adam keywords: user; image; task; query; document; word; review; model; lucene; graph; english; dataset; claim; bm25; vrss; view; tweet; trec; topic; text; term; system; symptom; suel; session; sentence; seed; sd2c; schema; runyankore; recommendation; ranker; question; product; prf; premise; patent; passage; ontology; node; network; location; lmp; lmote; list; lexicon; language; label; item; irony one topic; one dimension: model file(s): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148004/ titles(s): CLEF eHealth Evaluation Lab 2020 three 
topics; one dimension: query; task; graph file(s): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148224/, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148247/, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148256/ titles(s): Relevance Ranking Based on Query-Aware Context Analysis | Counterfactual Online Learning to Rank | Axiomatic Analysis of Contact Recommendation Methods in Social Networks: An IR Perspective five topics; three dimensions: model document text; query task retrieval; data learning tweets; schema task dataset; graph nodes information file(s): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148208/, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148223/, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148247/, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148229/, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148211/ titles(s): Learning to Rank Images with Cross-Modal Graph Convolutions | What Can Task Teach Us About Query Reformulations? | Counterfactual Online Learning to Rank | bias goggles: Graph-Based Computation of the Bias of Web Domains Through the Eyes of Users | KvGR: A Graph-Based Interface for Explorative Sequential Question Answering on Heterogeneous Information Sources Type: cord title: journal-advancesInInformationRetrieval-cord date: 2021-05-30 time: 15:05 username: emorgan patron: Eric Morgan email: emorgan@nd.edu input: facet_journal:"Advances in Information Retrieval" ==== make-pages.sh htm files ==== make-pages.sh complex files ==== make-pages.sh named enities ==== making bibliographics id: cord-020896-yrocw53j author: Agarwal, Mansi title: MEMIS: Multimodal Emergency Management Information System date: 2020-03-17 words: 4874.0 sentences: 270.0 pages: flesch: 52.0 cache: ./cache/cord-020896-yrocw53j.txt txt: ./txt/cord-020896-yrocw53j.txt summary: We present MEMIS, a system that can be used in emergencies like disasters to identify and analyze the damage indicated by user-generated multimodal social media posts, thereby helping the disaster management groups in making informed decisions. To this end, we propose MEMIS, a multimodal system capable of extracting information from social media, and employs both images and text for identifying damage and its severity in real-time (refer Sect. Therefore, we effectively have three models for each modality: first for filtering the informative tweets, then for those pertaining to the infrastructural damage (or any other category related to the relief group), and finally for assessing the severity of damage present. Similarly, if at least one of the text and the image modality predicts an informative tweet as containing infrastructural damage, the tweet undergoes severity analysis. Here, we use attention fusion to combine the feature interpretations from the text and image modalities for the severity analysis module [12, 26] . abstract: The recent upsurge in the usage of social media and the multimedia data generated therein has attracted many researchers for analyzing and decoding the information to automate decision-making in several fields. This work focuses on one such application: disaster management in times of crises and calamities. The existing research on disaster damage analysis has primarily taken only unimodal information in the form of text or image into account. These unimodal systems, although useful, fail to model the relationship between the various modalities. Different modalities often present supporting facts about the task, and therefore, learning them together can enhance performance. 
We present MEMIS, a system that can be used in emergencies like disasters to identify and analyze the damage indicated by user-generated multimodal social media posts, thereby helping the disaster management groups in making informed decisions. Our leave-one-disaster-out experiments on a multimodal dataset suggest that not only does fusing information in different media forms improve performance, but that our system can also generalize well to new disaster categories. Further qualitative analysis reveals that the system is responsive and computationally efficient. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148216/ doi: 10.1007/978-3-030-45439-5_32 id: cord-020843-cq4lbd0l author: Almeida, Tiago title: Calling Attention to Passages for Biomedical Question Answering date: 2020-03-24 words: 2235.0 sentences: 125.0 pages: flesch: 50.0 cache: ./cache/cord-020843-cq4lbd0l.txt txt: ./txt/cord-020843-cq4lbd0l.txt summary: This paper presents a pipeline for document and passage retrieval for biomedical question answering built around a new variant of the DeepRank network model in which the recursive layer is replaced by a self-attention layer combined with a weighting mechanism. On the other hand, models such as the Deep Relevance Matching Model (DRMM) [3] or DeepRank [10] follow an interaction-based approach, in which matching signals between query and document are captured and used by the neural network to produce a ranking score. The main contribution of this work is a new variant of the DeepRank neural network architecture in which the recursive layer originally included in the final aggregation step is replaced by a self-attention layer followed by a weighting mechanism similar to the term gating layer of the DRMM (a sketch of this attention-plus-gating idea follows below). The proposed model was evaluated on the BioASQ dataset, as part of a document and passage (snippet) retrieval pipeline for biomedical question answering, achieving similar retrieval performance when compared to more complex network architectures. abstract: Question answering can be described as retrieving relevant information for questions expressed in natural language, possibly also generating a natural language answer. This paper presents a pipeline for document and passage retrieval for biomedical question answering built around a new variant of the DeepRank network model in which the recursive layer is replaced by a self-attention layer combined with a weighting mechanism. This adaptation halves the total number of parameters and makes the network more suited for identifying the relevant passages in each document. The overall retrieval system was evaluated on the BioASQ tasks 6 and 7, achieving similar retrieval performance when compared to more complex network architectures. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148054/ doi: 10.1007/978-3-030-45442-5_9 id: cord-020880-m7d4e0eh author: Barrón-Cedeño, Alberto title: CheckThat! at CLEF 2020: Enabling the Automatic Identification and Verification of Claims in Social Media date: 2020-03-24 words: 2693.0 sentences: 217.0 pages: flesch: 64.0 cache: ./cache/cord-020880-m7d4e0eh.txt txt: ./txt/cord-020880-m7d4e0eh.txt summary: Task 3 asks to retrieve text snippets from a given set of Web pages that would be useful for verifying a target tweet's claim. Finally, the lab offers a fifth task that asks to predict the check-worthiness of the claims made in English political debates and speeches.
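Returning to the Almeida and Matos entry above, as flagged there, here is a minimal numpy sketch of a self-attention layer followed by a passage-weighting (gating) step. The single head, the shapes, and the gating vector are simplifying assumptions rather than the paper's exact DeepRank variant.

```python
import numpy as np

rng = np.random.default_rng(3)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend_passages(P, Wq, Wk, gate):
    """Self-attention across a document's passage vectors, then a learned
    per-passage weighting that aggregates them into one document vector."""
    Q, K = P @ Wq, P @ Wk
    att = softmax(Q @ K.T / np.sqrt(P.shape[1]))   # passage-to-passage attention
    ctx = att @ P                                  # contextualised passage vectors
    weights = softmax(ctx @ gate)                  # gating: importance of each passage
    return weights @ ctx                           # aggregated vector for ranking

n_passages, d = 6, 16
P = rng.normal(size=(n_passages, d))               # passage matching signals
doc_vec = attend_passages(P, rng.normal(size=(d, d)),
                          rng.normal(size=(d, d)), rng.normal(size=d))
print(doc_vec.shape)
```

Replacing a recurrent aggregator with this kind of layer is what lets the model both cut parameters and expose which passages drove the document score.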
Task 3 is defined as follows: Given a check-worthy claim on a specific topic and a set of text snippets extracted from potentially-relevant webpages, return a ranked list of all evidence snippets for the claim. Once we acquire annotations for Task 1, we share with participants the Web pages and text snippets from them solely for the check-worthy claims, which would enable the start of the evaluation cycle for Task 3. Task 4 is defined as follows: Given a check-worthy claim on a specific topic and a set of potentially-relevant Web pages, predict the veracity of the claim. abstract: We describe the third edition of the CheckThat! Lab, which is part of the 2020 Cross-Language Evaluation Forum (CLEF). CheckThat! proposes four complementary tasks and a related task from previous lab editions, offered in English, Arabic, and Spanish. Task 1 asks to predict which tweets in a Twitter stream are worth fact-checking. Task 2 asks to determine whether a claim posted in a tweet can be verified using a set of previously fact-checked claims. Task 3 asks to retrieve text snippets from a given set of Web pages that would be useful for verifying a target tweet's claim. Task 4 asks to predict the veracity of a target tweet's claim using a set of potentially-relevant Web pages. Finally, the lab offers a fifth task that asks to predict the check-worthiness of the claims made in English political debates and speeches. CheckThat! features a full evaluation framework. The evaluation is carried out using mean average precision or precision at rank k for ranking tasks, and F1 for classification tasks. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148108/ doi: 10.1007/978-3-030-45442-5_65 id: cord-020912-tbq7okmj author: Batra, Vishwash title: Variational Recurrent Sequence-to-Sequence Retrieval for Stepwise Illustration date: 2020-03-17 words: 4506.0 sentences: 247.0 pages: flesch: 50.0 cache: ./cache/cord-020912-tbq7okmj.txt txt: ./txt/cord-020912-tbq7okmj.txt summary: We evaluate the model for the application of stepwise illustration of recipes, where a sequence of relevant images is retrieved to best match the steps described in the text. More concretely, we incorporate the global context information encoded in the entire text sequence (through the attention mechanism) into a variational autoencoder (VAE) at each time step, which converts the input text into an image representation in the image embedding space. To capture the semantics of the images retrieved so far (in a story/recipe), we assume the prior of the distribution of the topic given the text input follows the distribution conditional on the latent topic from the previous time step. -We propose a new variational recurrent seq2seq (VRSS) retrieval model for seq2seq retrieval, which employs temporally-dependent latent variables to capture the sequential semantic structure of text-image sequences. Our work is related to: cross-modal retrieval, story picturing, variational recurrent neural networks, and cooking recipe datasets. abstract: We address and formalise the task of sequence-to-sequence (seq2seq) cross-modal retrieval. Given a sequence of text passages as query, the goal is to retrieve a sequence of images that best describes and aligns with the query. This new task extends the traditional cross-modal retrieval, where each image-text pair is treated independently ignoring broader context. We propose a novel variational recurrent seq2seq (VRSS) retrieval model for this seq2seq task.
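The core mechanism the VRSS summary describes, a temporally-dependent latent topic sampled per step and mapped into the image-embedding space, can be sketched in a few lines. This is a minimal illustration assuming linear maps and Gaussian latents; the names (`vrss_step`, the `W_*` matrices) and dimensions are invented for the example, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(4)

def reparameterise(mu, log_var):
    """Sample z ~ N(mu, sigma^2) with the reparameterisation trick."""
    return mu + np.exp(0.5 * log_var) * rng.normal(size=mu.shape)

def vrss_step(text_state, prev_z, W_mu, W_lv, W_img):
    """One time step: condition the latent topic on the attended text state and
    the previous step's topic, then project it into the image-embedding space."""
    h = np.concatenate([text_state, prev_z])
    mu, log_var = W_mu @ h, W_lv @ h
    z = reparameterise(mu, log_var)               # temporally-dependent latent variable
    return z, W_img @ z                           # synthetic image-embedding query point

d_text, d_z, d_img = 32, 8, 16
z = np.zeros(d_z)
W_mu = rng.normal(size=(d_z, d_text + d_z)) * 0.1
W_lv = rng.normal(size=(d_z, d_text + d_z)) * 0.1
W_img = rng.normal(size=(d_img, d_z))
for step_state in rng.normal(size=(3, d_text)):   # e.g. three recipe steps
    z, img_query = vrss_step(step_state, z, W_mu, W_lv, W_img)
print(img_query.shape)
```

At retrieval time the projected point would be matched against candidate image embeddings, so each step's query remembers, through `z`, what was already illustrated.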
Unlike most cross-modal methods, we generate an image vector corresponding to the latent topic obtained from combining the text semantics and context. This synthetic image embedding point associated with every text embedding point can then be employed for either image generation or image retrieval as desired. We evaluate the model for the application of stepwise illustration of recipes, where a sequence of relevant images is retrieved to best match the steps described in the text. To this end, we build and release a new Stepwise Recipe dataset for research purposes, containing 10K recipes (sequences of image-text pairs) having a total of 67K image-text pairs. To our knowledge, it is the first publicly available dataset to offer rich semantic descriptions in a focused category such as food or recipes. Our model is shown to outperform several competitive and relevant baselines in the experiments. We also provide qualitative analysis of how semantically meaningful the results produced by our model are through human evaluation and comparison with relevant existing methods. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148232/ doi: 10.1007/978-3-030-45439-5_4 id: cord-020814-1ty7wzlv author: Berrendorf, Max title: Knowledge Graph Entity Alignment with Graph Convolutional Networks: Lessons Learned date: 2020-03-24 words: 2314.0 sentences: 144.0 pages: flesch: 55.0 cache: ./cache/cord-020814-1ty7wzlv.txt txt: ./txt/cord-020814-1ty7wzlv.txt summary: In this work, we focus on the problem of entity alignment in Knowledge Graphs (KG) and we report on our experiences when applying a Graph Convolutional Network (GCN) based model for this task. Graph Convolutional Networks (GCN) [7, 9], which have recently become increasingly popular, are at the core of state-of-the-art methods for entity alignments in KGs [3, 6, 22, 24, 27]. 1. We investigate the reproducibility of the published results of a recent GCN-based method for entity alignment and uncover differences between the method's description in the paper and the authors' implementation. Overview of used datasets with their sizes in the number of triples (edges), entities (nodes), relations (different edge types) and alignments. GCN-Align [22] is a GCN-based approach to embed all entities from both graphs into a common embedding space. Semi-supervised entity alignment via knowledge graph embedding with awareness of degree difference Entity alignment between knowledge graphs using attribute embeddings abstract: In this work, we focus on the problem of entity alignment in Knowledge Graphs (KG) and we report on our experiences when applying a Graph Convolutional Network (GCN) based model for this task. Variants of GCN are used in multiple state-of-the-art approaches and therefore it is important to understand the specifics and limitations of GCN-based models. Despite serious efforts, we were not able to fully reproduce the results from the original paper and after a thorough audit of the code provided by authors, we concluded that their implementation is different from the architecture described in the paper. In addition, several tricks are required to make the model work and some of them are not very intuitive. We provide an extensive ablation study to quantify the effects these tricks and changes of architecture have on final performance.
Furthermore, we examine current evaluation approaches and systematize available benchmark datasets. We believe that people interested in KG matching might profit from our work, as well as novices entering the field. (Code: https://github.com/Valentyn1997/kg-alignment-lessons-learned). url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148025/ doi: 10.1007/978-3-030-45442-5_1 id: cord-020890-aw465igx author: Brochier, Robin title: Inductive Document Network Embedding with Topic-Word Attention date: 2020-03-17 words: nan sentences: nan pages: flesch: nan cache: txt: summary: abstract: Document network embedding aims at learning representations for a structured text corpus, i.e. when documents are linked to each other. Recent algorithms extend network embedding approaches by incorporating the text content associated with the nodes in their formulations. In most cases, it is hard to interpret the learned representations. Moreover, little importance is given to the generalization to new documents that are not observed within the network. In this paper, we propose an interpretable and inductive document network embedding method. We introduce a novel mechanism, the Topic-Word Attention (TWA), that generates document representations based on the interplay between word and topic representations. We train these word and topic vectors through our general model, Inductive Document Network Embedding (IDNE), by leveraging the connections in the document network. Quantitative evaluations show that our approach achieves state-of-the-art performance on various networks and we qualitatively show that our model produces meaningful and interpretable representations of the words, topics and documents. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148210/ doi: 10.1007/978-3-030-45439-5_22 id: cord-020808-wpso3jug author: Cardoso, João title: Machine-Actionable Data Management Plans: A Knowledge Retrieval Approach to Automate the Assessment of Funders' Requirements date: 2020-03-24 words: 2328.0 sentences: 137.0 pages: flesch: 49.0 cache: ./cache/cord-020808-wpso3jug.txt txt: ./txt/cord-020808-wpso3jug.txt summary: In order to guide researchers through the process of managing their data, many funding agencies (e.g. the National Science Foundation (NSF), the European Commission (EC), or the Fundação para a Ciência e Tecnologia (FCT)) have created and published their own open access policies, as well as requiring that any grant proposals be accompanied by a Data Management Plan (DMP). The DMP is a document describing the techniques, methods and policies on how data from a research project is to be created or collected, documented, accessed, preserved and disseminated. The second part comprises the execution of the following four tasks and results in both the collection of the necessary mappings between the ontology and the identified DMP templates, and the creation of DL queries based on the funders' requirements. The DMP Common Standard Ontology (DCSO) was created with the objective of providing an implementation of the DMP Common Standards model expressed through the usage of semantic technology, which has been considered a possible solution in the data management and preservation domains [9]. abstract: Funding bodies and other policy-makers are increasingly more concerned with Research Data Management (RDM). The Data Management Plan (DMP) is one of the tools available to perform RDM tasks; however, it is not a perfect concept.
The Machine-Actionable Data Management Plan (maDMP) is a concept that aims to make the DMP interoperable, automated and increasingly standardised. In this paper we showcase that through the usage of semantic technologies, it is possible to both express and exploit the features of the maDMP. In particular, we focus on showing how a maDMP formalised as an ontology can be used to automate the assessment of a funder's requirements for a given organisation. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148019/ doi: 10.1007/978-3-030-45442-5_15 id: cord-020908-oe77eupc author: Chen, Zhiyu title: Leveraging Schema Labels to Enhance Dataset Search date: 2020-03-17 words: nan sentences: nan pages: flesch: nan cache: txt: summary: abstract: A search engine's ability to retrieve desirable datasets is important for data sharing and reuse. Existing dataset search engines typically rely on matching queries to dataset descriptions. However, a user may not have enough prior knowledge to write a query using terms that match with description text. We propose a novel schema label generation model which generates possible schema labels based on dataset table content. We incorporate the generated schema labels into a mixed ranking model which not only considers the relevance between the query and dataset metadata but also the similarity between the query and generated schema labels. To evaluate our method on real-world datasets, we create a new benchmark specifically for the dataset retrieval task. Experiments show that our approach can effectively improve the precision and NDCG scores of the dataset retrieval task compared with baseline methods. We also test on a collection of Wikipedia tables to show that the features generated from schema labels can improve the unsupervised and supervised web table retrieval task as well. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148228/ doi: 10.1007/978-3-030-45439-5_18 id: cord-020899-d6r4fr9r author: Doinychko, Anastasiia title: Biconditional Generative Adversarial Networks for Multiview Learning with Missing Views date: 2020-03-17 words: 4666.0 sentences: 244.0 pages: flesch: 56.0 cache: ./cache/cord-020899-d6r4fr9r.txt txt: ./txt/cord-020899-d6r4fr9r.txt summary: In this paper, we present a conditional GAN with two generators and a common discriminator for multiview learning problems where observations have two views, but one of them may be missing for some of the training samples. We address the problem of multiview learning with Generative Adversarial Networks (GANs) in the case where some observations may have missing views without there being an external resource to complete them. We demonstrate that the generated views allow us to achieve state-of-the-art results on a subset of the Reuters RCV1/RCV2 collections compared to multiview approaches that rely on Machine Translation (MT) to translate documents, before training the models, into languages in which their versions do not exist (Sect. 3.2). The approach achieves state-of-the-art performance on multilingual document classification compared to multiview approaches that rely on external view-generating functions; this is also a more challenging application than image analysis, which is the domain of choice for the design of new GAN models. abstract: In this paper, we present a conditional GAN with two generators and a common discriminator for multiview learning problems where observations have two views, but one of them may be missing for some of the training samples.
This is for example the case for multilingual collections where documents are not available in all languages. Some studies tackled this problem by assuming the existence of view generation functions to approximately complete the missing views; for example Machine Translation to translate documents into the missing languages. These functions generally require an external resource to be set and their quality has a direct impact on the performance of the learned multiview classifier over the completed training set. Our proposed approach addresses this problem by jointly learning the missing views and the multiview classifier using a tripartite game with two generators and a discriminator. Each of the generators is associated with one of the views and tries to fool the discriminator by generating the other missing view conditionally on the corresponding observed view. The discriminator then tries to identify if for an observation, one of its views is completed by one of the generators or if both views are completed along with its class. Our results on a subset of Reuters RCV1/RCV2 collections show that the discriminator achieves significant classification performance; and that the generators learn the missing views with high quality without the need of any consequent external resource. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148219/ doi: 10.1007/978-3-030-45439-5_53 id: cord-020914-7p37m92a author: Dumani, Lorik title: A Framework for Argument Retrieval: Ranking Argument Clusters by Frequency and Specificity date: 2020-03-17 words: 5482.0 sentences: 302.0 pages: flesch: 67.0 cache: ./cache/cord-020914-7p37m92a.txt txt: ./txt/cord-020914-7p37m92a.txt summary: From an information retrieval perspective, an interesting task within this setting is finding the best supporting and attacking premises for a given query claim from a large corpus of arguments. From an information retrieval perspective, an interesting task within this setting is finding the best supporting (pro) and attacking (con) premises for a given query claim [31]. Given a user's keyword query, the system retrieves, ranks, and presents premises supporting and attacking the query, taking similarity of the query with the premise, its corresponding claim, and other contextual information into account. We assume that we work with a large corpus of argumentative text, for example collections of political speeches or forum discussions, that has already been mined and transferred into claims with the corresponding premises and stances. We consider the following problem: Given a controversial claim or topic, for example "We should abandon fossil fuels", a user searches for the most important premises from the corpus supporting or attacking it. abstract: Computational argumentation has recently become a fast-growing field of research. An argument consists of a claim, such as "We should abandon fossil fuels", which is supported or attacked by at least one premise, for example "Burning fossil fuels is one cause for global warming". From an information retrieval perspective, an interesting task within this setting is finding the best supporting and attacking premises for a given query claim from a large corpus of arguments. Since the same logical premise can be formulated differently, the system needs to avoid retrieving duplicate results and thus needs to use some form of clustering.
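The first stage of the framework just described, finding claims in the corpus that are highly similar to the query claim and then surfacing their premises, can be illustrated with standard tools. A minimal sketch using scikit-learn tf-idf and cosine similarity; the toy corpus, the 0.2 similarity cut-off, and the ranking by raw claim similarity are assumptions for the example (the paper's framework additionally clusters duplicate premises and accounts for stance and specificity).

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Tiny illustrative corpus of (claim, premise) pairs; real corpora are mined arguments.
pairs = [
    ("We should abandon fossil fuels", "Burning fossil fuels is one cause for global warming"),
    ("Fossil fuels must be phased out", "Renewables have become cheaper than coal"),
    ("School uniforms should be mandatory", "Uniforms reduce peer pressure"),
]

query = "We should stop using fossil fuels"
claims = [c for c, _ in pairs]
vec = TfidfVectorizer().fit(claims + [query])
sims = cosine_similarity(vec.transform([query]), vec.transform(claims))[0]

# Keep premises of sufficiently similar claims, ranked by claim similarity.
ranked = sorted(((s, p) for (c, p), s in zip(pairs, sims) if s > 0.2), reverse=True)
for s, premise in ranked:
    print(f"{s:.2f}  {premise}")
```

In the full framework the retrieved premises would then be grouped into clusters of equivalent formulations and ranked by frequency and specificity rather than by claim similarity alone.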
In this paper we propose a principled probabilistic ranking framework for premises based on the idea of tf-idf that, given a query claim, first identifies highly similar claims in the corpus, and then clusters and ranks their premises, taking clusters of claims as well as the stances of query and premises into account. We compare our approach to a baseline system that uses BM25F, which we outperform even with a primitive implementation of our framework utilising BERT. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148234/ doi: 10.1007/978-3-030-45439-5_29 id: cord-020916-ds0cf78u author: Fard, Mazar Moradi title: Seed-Guided Deep Document Clustering date: 2020-03-17 words: 5079.0 sentences: 265.0 pages: flesch: 57.0 cache: ./cache/cord-020916-ds0cf78u.txt txt: ./txt/cord-020916-ds0cf78u.txt summary: The main contributions of this study can be summarized as follows: (a) We introduce the Seed-guided Deep Document Clustering (SD2C) framework, the first attempt, to the best of our knowledge, to constrain clustering with seed words based on a deep clustering approach; and (b) we validate this framework through experiments based on automatically selected seed words on five publicly available text datasets with various sizes and characteristics. The constrained clustering problem we are addressing in fact bears strong similarity to that of seed-guided dataless text classification, which consists in categorizing documents based on a small set of seed words describing the classes/clusters. This can be done by enforcing that seed words have more influence either on the learned document embeddings, a solution we refer to as SD2C-Doc, or on the cluster representatives, a solution we refer to as SD2C-Rep. Note that the second solution can only be used when the clustering process is based on cluster representatives (i.e., $R = \{r_k\}_{k=1}^{K}$ with $K$ the number of clusters), which is indeed the case for most current deep clustering methods [1]. abstract: Different users may be interested in different clustering views underlying a given collection (e.g., topic and writing style in documents). Enabling them to provide constraints reflecting their needs can then help obtain tailored clustering results. For document clustering, constraints can be provided in the form of seed words, each cluster being characterized by a small set of words. This seed-guided constrained document clustering problem was recently addressed through topic modeling approaches. In this paper, we jointly learn deep representations and bias the clustering results through the seed words, leading to a Seed-guided Deep Document Clustering approach. Its effectiveness is demonstrated on five public datasets. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148236/ doi: 10.1007/978-3-030-45439-5_1 id: cord-020888-ov2lzus4 author: Formal, Thibault title: Learning to Rank Images with Cross-Modal Graph Convolutions date: 2020-03-17 words: 5211.0 sentences: 256.0 pages: flesch: 55.0 cache: ./cache/cord-020888-ov2lzus4.txt txt: ./txt/cord-020888-ov2lzus4.txt summary: While most of the current approaches for cross-modal retrieval revolve around learning how to represent text and images in a shared latent space, we take a different direction: we propose to generalize the cross-modal relevance feedback mechanism, a simple yet effective unsupervised method, that relies on standard information retrieval heuristics and the choice of a few hyper-parameters.
The model can be understood very simply: similarly to PRF methods in standard information retrieval, the goal is to boost images that are visually similar to top images (from a text point of view), i.e. images that are likely to be relevant to the query but were initially badly ranked (which is likely to happen in the web scenario, where text is crawled from the source page and can be very noisy). abstract: We are interested in the problem of cross-modal retrieval for web image search, where the goal is to retrieve images relevant to a text query. While most of the current approaches for cross-modal retrieval revolve around learning how to represent text and images in a shared latent space, we take a different direction: we propose to generalize the cross-modal relevance feedback mechanism, a simple yet effective unsupervised method, that relies on standard information retrieval heuristics and the choice of a few hyper-parameters. We show that we can cast it as a supervised representation learning problem on graphs, using graph convolutions operating jointly over text and image features, namely cross-modal graph convolutions. The proposed architecture directly learns how to combine image and text features for the ranking task, while taking into account the context given by all the other elements in the set of images to be (re-)ranked. We validate our approach on two datasets: a public dataset from a MediaEval challenge, and a small sample of proprietary image search query logs, referred to as WebQ. Our experiments demonstrate that our model improves over standard baselines. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148208/ doi: 10.1007/978-3-030-45439-5_39 id: cord-020901-aew8xr6n author: García-Durán, Alberto title: TransRev: Modeling Reviews as Translations from Users to Items date: 2020-03-17 words: nan sentences: nan pages: flesch: nan cache: txt: summary: abstract: The text of a review expresses the sentiment a customer has towards a particular product. This is exploited in sentiment analysis where machine learning models are used to predict the review score from the text of the review. Furthermore, the products customers have purchased in the past are indicative of the products they will purchase in the future. This is what recommender systems exploit by learning models from purchase information to predict the items a customer might be interested in. The underlying structure of this problem setting is a bipartite graph, wherein customer nodes are connected to product nodes via 'review' links. This is reminiscent of knowledge bases, with 'review' links replacing relation types. We propose TransRev, an approach to the product recommendation problem that integrates ideas from recommender systems, sentiment analysis, and multi-relational learning into a joint learning objective. TransRev learns vector representations for users, items, and reviews. The embedding of a review is learned such that (a) it performs well as an input feature of a regression model for sentiment prediction; and (b) it always translates the reviewer embedding to the embedding of the reviewed item. This is reminiscent of TransE [5], a popular embedding method for link prediction in knowledge bases. This allows TransRev to approximate a review embedding at test time as the difference of the embedding of each item and the user embedding. The approximated review embedding is then used with the regression model to predict the review score for each item.
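The test-time recipe in the TransRev abstract (approximate each candidate review embedding as item minus user, then score it with the regression head) is compact enough to sketch directly. A minimal numpy illustration with random stand-in embeddings; the linear regression head and all dimensions are assumptions for the example, since the trained parameters are of course not available here.

```python
import numpy as np

rng = np.random.default_rng(5)

d, n_items = 16, 100
user = rng.normal(size=d)                # learned user embedding (stand-in)
items = rng.normal(size=(n_items, d))    # learned item embeddings (stand-ins)
w, b = rng.normal(size=d), 0.0           # regression head for the review score

# TransE-style relation from the abstract: user + review ~= item, so at test
# time the unseen review embedding is approximated as item - user.
approx_reviews = items - user            # (n_items, d), one approximation per item
predicted_scores = approx_reviews @ w + b

top = np.argsort(-predicted_scores)[:5]
print("recommended item ids:", top)
```

The same approximated vectors are what the paper uses to retrieve the nearest training review text per user-item pair, which gives the recommendations a degree of explainability.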
TransRev outperforms state-of-the-art recommender systems on a large number of benchmark data sets. Moreover, it is able to retrieve, for each user and item, the review text from the training set whose embedding is most similar to the approximated review embedding. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148221/ doi: 10.1007/978-3-030-45439-5_16 id: cord-020830-97xmu329 author: Ghanem, Bilal title: Irony Detection in a Multilingual Context date: 2020-03-24 words: 2806.0 sentences: 158.0 pages: flesch: 54.0 cache: ./cache/cord-020830-97xmu329.txt txt: ./txt/cord-020830-97xmu329.txt summary: We show that these monolingual models trained separately on different languages using multilingual word representation or text-based features can open the door to irony detection in languages that lack annotated data for irony. We aim here to bridge the gap by tackling ID in tweets from both multilingual (French, English and Arabic) and multicultural perspectives (Indo-European languages whose speakers share quite the same cultural background vs. less culturally close languages). We can justify that by the fact that the language of the Arabic and French tweets is quite informal and contains many dialect words that may not exist in the pretrained embeddings we used, compared to the English ones (lower embedding coverage ratio), which makes it harder for the CNN to learn a clear semantic pattern. The CNN architecture trained on cross-lingual word representation shows that irony has a certain similarity between the languages we targeted despite the cultural differences, which confirms that irony is a universal phenomenon, as already shown in previous linguistic studies [9, 24, 35]. abstract: This paper proposes the first multilingual (French, English and Arabic) and multicultural (Indo-European languages vs. less culturally close languages) irony detection system. We employ both feature-based models and neural architectures using monolingual word representation. We compare the performance of these systems with state-of-the-art systems to identify their capabilities. We show that these monolingual models trained separately on different languages using multilingual word representation or text-based features can open the door to irony detection in languages that lack annotated data for irony. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148041/ doi: 10.1007/978-3-030-45442-5_18 id: cord-020834-ch0fg9rp author: Grand, Adrien title: From MAXSCORE to Block-Max Wand: The Story of How Lucene Significantly Improved Query Evaluation Performance date: 2020-03-24 words: 2733.0 sentences: 137.0 pages: flesch: 54.0 cache: ./cache/cord-020834-ch0fg9rp.txt txt: ./txt/cord-020834-ch0fg9rp.txt summary: We share the story of how an innovation that originated from academia (block-max indexes and the corresponding block-max Wand query evaluation algorithm of Ding and Suel [6]) made its way into the open-source Lucene search library. We see this paper as having two main contributions beyond providing a narrative of events: First, we report results of experiments that attempt to match the original conditions of Ding and Suel [6] and present additional results on a number of standard academic IR test collections. Support for block-max indexes was the final feature that was implemented, based on the developers' reading of the paper by Ding and Suel [6], which required invasive changes to Lucene's index format.
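The block-max index structure just mentioned is easy to illustrate: postings are split into fixed-size blocks, each carrying its maximum impact score, so a query evaluator can skip whole blocks that cannot beat the current top-k threshold. A toy sketch under simplifying assumptions (precomputed float impacts, a single term); Lucene's actual implementation operates on compressed blocks inside its codec, which this does not attempt to model.

```python
# A toy block-max posting list: fixed-size blocks, each storing its max score.
BLOCK = 4
postings = [(1, 0.2), (3, 0.1), (4, 0.3), (7, 0.2),
            (9, 1.4), (12, 0.2), (15, 0.3), (20, 0.1)]
blocks = [postings[i:i + BLOCK] for i in range(0, len(postings), BLOCK)]
block_max = [max(s for _, s in b) for b in blocks]

def scan(threshold):
    """Return docs whose scores can still matter given the heap threshold."""
    survivors = []
    for b, ub in zip(blocks, block_max):
        if ub <= threshold:
            continue          # block-max skip: no doc inside can enter the top-k
        survivors.extend(d for d, s in b if s > threshold)
    return survivors

print(scan(threshold=0.35))   # the first block is skipped outright; only doc 9 survives
```

The per-block upper bounds are what distinguish block-max Wand from plain Wand, which only knows one global maximum per term and therefore skips far less aggressively.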
The story of block-max Wand in Lucene provides a case study of how an innovation that originated in academia made its way into the world's most widely-used search library and achieved significant impact in the "real world" through hundreds of production deployments worldwide (if we consider the broader Lucene ecosystem, which includes systems such as Elasticsearch and Solr). abstract: The latest major release of Lucene (version 8) in March 2019 incorporates block-max indexes and exploits the block-max variant of Wand for query evaluation, which are innovations that originated from academia. This paper shares the story of how this came to be, which provides an interesting case study at the intersection of reproducibility and academic research achieving impact in the "real world". We offer additional thoughts on the often idiosyncratic processes by which academic research makes its way into deployed solutions. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148045/ doi: 10.1007/978-3-030-45442-5_3 id: cord-020813-0wc23ixy author: Hashemi, Helia title: ANTIQUE: A Non-factoid Question Answering Benchmark date: 2020-03-24 words: 2941.0 sentences: 185.0 pages: flesch: 59.0 cache: ./cache/cord-020813-0wc23ixy.txt txt: ./txt/cord-020813-0wc23ixy.txt summary: Despite the importance of the task, the community still faces a significant lack of large-scale non-factoid question answering collections with real questions and comprehensive relevance judgments. Despite the widely-known importance of studying answer passage retrieval for non-factoid questions [1, 2, 8, 18], the research progress for this task is limited by the availability of high-quality public data. Although WikiPassageQA is an invaluable contribution to the community, it does not cover all aspects of the non-factoid question answering task and has the following limitations: (i) it only contains an average of 1.7 relevant passages per question and does not cover many questions with multiple correct answers; (ii) it was created from the Wikipedia website, containing only formal text; (iii) more importantly, the questions in the WikiPassageQA dataset were generated by crowdworkers, which is different from the questions that users ask in real-world systems; (iv) the relevant passages in WikiPassageQA contain the answer to the question in addition to some surrounding text. In contrast, ANTIQUE provides a reliable collection with complete relevance annotations for evaluating non-factoid QA models. abstract: Considering the widespread use of mobile and voice search, answer passage retrieval for non-factoid questions plays a critical role in modern information retrieval systems. Despite the importance of the task, the community still faces a significant lack of large-scale non-factoid question answering collections with real questions and comprehensive relevance judgments. In this paper, we develop and release a collection of 2,626 open-domain non-factoid questions from a diverse set of categories. The dataset, called ANTIQUE, contains 34k manual relevance annotations. The questions were asked by real users in a community question answering service, i.e., Yahoo! Answers. Relevance judgments for all the answers to each question were collected through crowdsourcing. To facilitate further research, we also include a brief analysis of the data as well as baseline results on both classical and neural IR models.
url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148024/ doi: 10.1007/978-3-030-45442-5_21 id: cord-020841-40f2p3t4 author: Hofstätter, Sebastian title: Neural-IR-Explorer: A Content-Focused Tool to Explore Neural Re-ranking Results date: 2020-03-24 words: 1526.0 sentences: 91.0 pages: flesch: 53.0 cache: ./cache/cord-020841-40f2p3t4.txt txt: ./txt/cord-020841-40f2p3t4.txt summary: In this paper we look beyond metrics-based evaluation of Information Retrieval systems, to explore the reasons behind ranking results. We present the content-focused Neural-IR-Explorer, which empowers users to browse through retrieval results and inspect the inner workings and fine-grained results of neural re-ranking models. The explorer complements metrics-based evaluation, by focusing on the content of queries and documents, and how the neural models relate them to each other. Users can explore each query result in more detail: We show the internal partial scores and content of the returned documents with different highlighting modes to surface the inner workings of a neural re-ranking model. The explorer displays data created by a batched evaluation run of a neural re-ranking model. Additionally, the Neural-IR-Explorer illuminates the pool bias [12] of the MSMARCO ranking collection: the small number of judged documents per query makes the evaluation fragile. We presented the content-focused Neural-IR-Explorer to complement metrics-based evaluation of retrieval models. abstract: In this paper we look beyond metrics-based evaluation of Information Retrieval systems, to explore the reasons behind ranking results. We present the content-focused Neural-IR-Explorer, which empowers users to browse through retrieval results and inspect the inner workings and fine-grained results of neural re-ranking models. The explorer includes a categorized overview of the available queries, as well as an individual query result view with various options to highlight semantic connections between query-document pairs. The Neural-IR-Explorer is available at: https://neural-ir-explorer.ec.tuwien.ac.at/. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148052/ doi: 10.1007/978-3-030-45442-5_58 id: cord-020835-n9v5ln2i author: Jangra, Anubhav title: Text-Image-Video Summary Generation Using Joint Integer Linear Programming date: 2020-03-24 words: nan sentences: nan pages: flesch: nan cache: txt: summary: abstract: Automatically generating a summary for asynchronous data can help users to keep up with the rapid growth of multi-modal information on the Internet. However, the current multi-modal systems usually generate summaries composed of text and images. In this paper, we propose a novel research problem of text-image-video summary generation (TIVS). We first develop a multi-modal dataset containing text documents, images and videos. We then propose a novel joint integer linear programming multi-modal summarization (JILP-MMS) framework. We report the performance of our model on the developed dataset. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148046/ doi: 10.1007/978-3-030-45442-5_24 id: cord-020815-j9eboa94 author: Kamphuis, Chris title: Which BM25 Do You Mean?
A Large-Scale Reproducibility Study of Scoring Variants date: 2020-03-24 words: 2249.0 sentences: 154.0 pages: flesch: 60.0 cache: ./cache/cord-020815-j9eboa94.txt txt: ./txt/cord-020815-j9eboa94.txt summary: Experiments on three newswire collections show that there are no significant effectiveness differences between them, including Lucene's often maligned approximation of document length. Although learning-to-rank approaches and neural ranking models are widely used today, they are typically deployed as part of a multi-stage reranking architecture, over candidate documents supplied by a simple term-matching method using traditional inverted indexes [1]. Our goal is a large-scale reproducibility study to explore the nuances of different variants of BM25 and their impact on retrieval effectiveness. Their findings are confirmed: effectiveness differences in IR experiments are unlikely to be the result of the choice of BM25 variant a system implemented. We implemented a variant that uses exact document lengths, but is otherwise identical to the Lucene default. Storing exact document lengths would allow for different ranking functions to be swapped at query time more easily, as no information would be discarded at index time. abstract: When researchers speak of BM25, it is not entirely clear which variant they mean, since many tweaks to Robertson et al.’s original formulation have been proposed. When practitioners speak of BM25, they most likely refer to the implementation in the Lucene open-source search library. Does this ambiguity "matter"? We attempt to answer this question with a large-scale reproducibility study of BM25, considering eight variants. Experiments on three newswire collections show that there are no significant effectiveness differences between them, including Lucene's often maligned approximation of document length. As an added benefit, our empirical approach takes advantage of databases for rapid IR prototyping, which validates both the feasibility and methodological advantages claimed in previous work. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148026/ doi: 10.1007/978-3-030-45442-5_4 id: cord-020806-lof49r72 author: Landin, Alfonso title: Novel and Diverse Recommendations by Leveraging Linear Models with User and Item Embeddings date: 2020-03-24 words: 2373.0 sentences: 150.0 pages: flesch: 52.0 cache: ./cache/cord-020806-lof49r72.txt txt: ./txt/cord-020806-lof49r72.txt summary: title: Novel and Diverse Recommendations by Leveraging Linear Models with User and Item Embeddings In this paper, we present EER, a linear model for the top-N recommendation task, which takes advantage of user and item embeddings for improving novelty and diversity without harming accuracy. In this paper, we propose a method to augment an existing linear recommendation model to make more diverse and novel recommendations, while maintaining similar accuracy results. Experiments conducted on three datasets show that our proposal outperforms the original model in both novelty and diversity while maintaining similar levels of accuracy. On the other hand, as results in Table 3 show, ELP is able to provide good figures in novelty and diversity, thanks to the embedding model capturing non-linear relations between users and items. It is common in the field of recommender systems for methods with lower accuracy to have higher values in diversity and novelty.
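A toy illustration of why the BM25 variants compared in the Kamphuis et al. entry above can disagree: the two calls below share one common BM25 formulation and differ only in whether the document length is exact or lossily quantized. The quantization function is a crude hypothetical stand-in for Lucene's one-byte length encoding, not Lucene's actual code.

import math

def idf(N, df):
    # A common BM25 IDF formulation.
    return math.log(1 + (N - df + 0.5) / (df + 0.5))

def bm25(tf, dl, avgdl, N, df, k1=0.9, b=0.4):
    return idf(N, df) * tf * (k1 + 1) / (tf + k1 * (1 - b + b * dl / avgdl))

def quantize_length(dl):
    # Hypothetical lossy encoding keeping ~2 significant bits of the
    # document length, a crude stand-in for Lucene's compressed lengths.
    shift = max(dl.bit_length() - 2, 0)
    return (dl >> shift) << shift

N, df, avgdl, tf, dl = 100_000, 250, 300.0, 3, 437
print("exact length:    ", round(bm25(tf, dl, avgdl, N, df), 4))
print("quantized length:", round(bm25(tf, quantize_length(dl), avgdl, N, df), 4))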
abstract: Nowadays, item recommendation is an increasing concern for many companies. Users tend to be more reactive than proactive when solving information needs. Recommendation accuracy became the most studied aspect of the quality of the suggestions. However, novel and diverse suggestions also contribute to user satisfaction. Unfortunately, it is common to harm those two aspects when optimizing recommendation accuracy. In this paper, we present EER, a linear model for the top-N recommendation task, which takes advantage of user and item embeddings for improving novelty and diversity without harming accuracy. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148017/ doi: 10.1007/978-3-030-45442-5_27 id: cord-020794-d3oru1w5 author: Leekha, Maitree title: A Multi-task Approach to Open Domain Suggestion Mining Using Language Model for Text Over-Sampling date: 2020-03-24 words: 1569.0 sentences: 105.0 pages: flesch: 59.0 cache: ./cache/cord-020794-d3oru1w5.txt txt: ./txt/cord-020794-d3oru1w5.txt summary: title: A Multi-task Approach to Open Domain Suggestion Mining Using Language Model for Text Over-Sampling In this work, we introduce a novel over-sampling technique to address the problem of class imbalance, and propose a multi-task deep learning approach for mining suggestions from multiple domains. Experimental results on a publicly available dataset show that our over-sampling technique, coupled with the multi-task framework, outperforms state-of-the-art open domain suggestion mining models in terms of the F-1 measure and AUC. In our study, we generate synthetic positive reviews until the number of suggestion and non-suggestion class samples becomes equal in the training set. All comparisons have been made in terms of the F-1 score of the suggestion class for a fair comparison with prior work on representational learning for open domain suggestion mining [5] (see Baseline in Table 3). In this work, we proposed a multi-task learning framework for open domain suggestion mining along with a novel language-model-based over-sampling technique for text (LMOTE). abstract: Consumer reviews online may contain suggestions useful for improving commercial products and services. Mining suggestions is challenging due to the absence of large labeled and balanced datasets. Furthermore, most prior studies attempting to mine suggestions have focused on a single domain such as Hotel or Travel only. In this work, we introduce a novel over-sampling technique to address the problem of class imbalance, and propose a multi-task deep learning approach for mining suggestions from multiple domains. Experimental results on a publicly available dataset show that our over-sampling technique, coupled with the multi-task framework, outperforms state-of-the-art open domain suggestion mining models in terms of the F-1 measure and AUC. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148005/ doi: 10.1007/978-3-030-45442-5_28 id: cord-020851-hf5c0i9z author: Losada, David E. title: eRisk 2020: Self-harm and Depression Challenges date: 2020-03-24 words: nan sentences: nan pages: flesch: nan cache: txt: summary: abstract: This paper describes eRisk, the CLEF lab on early risk prediction on the Internet. eRisk started in 2017 as an attempt to set the experimental foundations of early risk detection.
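Returning to the Leekha et al. entry above, the class-balancing loop it describes can be sketched generically. The code below keeps generating synthetic minority-class ("suggestion") texts until both classes are equal in size; random token dropout is used here as a placeholder generator, since the paper's LMOTE samples from a language model instead. All example reviews are invented.

import random

def oversample_minority(texts, labels, minority=1, drop_p=0.1, seed=0):
    # Balance a binary text dataset by generating perturbed copies of
    # minority-class examples until the class counts are equal.
    rng = random.Random(seed)
    pos = [t for t, y in zip(texts, labels) if y == minority]
    neg = [t for t, y in zip(texts, labels) if y != minority]
    synthetic = []
    while len(pos) + len(synthetic) < len(neg):
        tokens = rng.choice(pos).split()
        kept = [w for w in tokens if rng.random() > drop_p] or tokens
        synthetic.append(" ".join(kept))
    return texts + synthetic, labels + [minority] * len(synthetic)

texts = ["add a night mode please", "great hotel", "room was clean",
         "the pool was cold", "would be nice to have late checkout"]
labels = [1, 0, 0, 0, 1]  # 1 = suggestion (minority), 0 = non-suggestion
bal_texts, bal_labels = oversample_minority(texts, labels)
print(sum(bal_labels), len(bal_labels) - sum(bal_labels))  # now balanced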
Over the last three editions of eRisk (2017, 2018 and 2019), the lab organized a number of early risk detection challenges oriented to the problems of detecting depression, anorexia and self-harm. We review in this paper the main lessons learned from the past and we discuss our future plans for the 2020 edition. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148062/ doi: 10.1007/978-3-030-45442-5_72 id: cord-020801-3sbicp3v author: MacAvaney, Sean title: Teaching a New Dog Old Tricks: Resurrecting Multilingual Retrieval Using Zero-Shot Learning date: 2020-03-24 words: 2530.0 sentences: 154.0 pages: flesch: 53.0 cache: ./cache/cord-020801-3sbicp3v.txt txt: ./txt/cord-020801-3sbicp3v.txt summary: In this paper, we tackle the lack of data by leveraging pre-trained multilingual language models to transfer a retrieval system trained on English collections to non-English queries and documents. Our model is evaluated in a zero-shot setting, meaning that we use it to predict relevance scores for query-document pairs in languages never seen during training. The authors of [28] leveraged a data set of Wikipedia pages in 25 languages to train a learning-to-rank algorithm for Japanese-English and Swahili-English cross-language retrieval. In particular, to circumvent the lack of training data, we leverage transfer learning techniques to train Arabic, Mandarin, and Spanish retrieval models using English training data. We evaluate our models in a zero-shot setting; that is, we use them to predict relevance scores for query-document pairs in languages never seen during training. Because large-scale relevance judgments are largely absent in languages other than English, we propose a new setting to evaluate learning-to-rank approaches: zero-shot cross-lingual ranking. abstract: While billions of non-English speaking users rely on search engines every day, the problem of ad-hoc information retrieval is rarely studied for non-English languages. This is primarily due to a lack of data sets that are suitable for training ranking algorithms. In this paper, we tackle the lack of data by leveraging pre-trained multilingual language models to transfer a retrieval system trained on English collections to non-English queries and documents. Our model is evaluated in a zero-shot setting, meaning that we use it to predict relevance scores for query-document pairs in languages never seen during training. Our results show that the proposed approach can significantly outperform unsupervised retrieval techniques for Arabic, Chinese Mandarin, and Spanish. We also show that augmenting the English training collection with some examples from the target language can sometimes improve performance. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148012/ doi: 10.1007/978-3-030-45442-5_31 id: cord-020931-fymgnv1g author: Meng, Changping title: ReadNet: A Hierarchical Transformer Framework for Web Article Readability Analysis date: 2020-03-17 words: nan sentences: nan pages: flesch: nan cache: txt: summary: abstract: Analyzing the readability of articles has been an important sociolinguistic task. Addressing this task is necessary for the automatic recommendation of appropriate articles to readers with different comprehension abilities, and it further benefits education systems, web information systems, and digital libraries.
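The zero-shot transfer recipe of the MacAvaney et al. entry above amounts to scoring query-document pairs with a multilingual cross-encoder that was fine-tuned only on English pairs. A minimal sketch using the Hugging Face transformers API follows; the checkpoint name is a generic multilingual encoder, not the paper's model, and without the English fine-tuning step the randomly initialized scoring head returns arbitrary values.

# pip install torch transformers
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL = "bert-base-multilingual-cased"  # generic placeholder checkpoint
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=1)
model.eval()

def relevance(query, doc):
    # Score one query-document pair; after fine-tuning on English pairs,
    # the same call scores non-English pairs zero-shot, because the encoder
    # shares one multilingual vocabulary and embedding space.
    enc = tok(query, doc, truncation=True, max_length=256, return_tensors="pt")
    with torch.no_grad():
        return model(**enc).logits.squeeze().item()

# A Spanish pair, scored by a model that (in the paper's setting) would
# have seen only English training data.
print(relevance("síntomas de la gripe", "La gripe causa fiebre y tos."))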
Current methods for assessing readability employ empirical measures or statistical learning techniques that are limited in their ability to characterize complex patterns such as article structures and semantic meanings of sentences. In this paper, we propose a new and comprehensive framework which uses a hierarchical self-attention model to analyze document readability. In this model, measurements of sentence-level difficulty are captured along with the semantic meanings of each sentence. Additionally, the sentence-level features are incorporated to characterize the overall readability of an article with consideration of article structures. We evaluate our proposed approach on three widely-used benchmark datasets against several strong baseline approaches. Experimental results show that our proposed method achieves state-of-the-art performance in estimating the readability of various web articles and literature. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148251/ doi: 10.1007/978-3-030-45439-5_3 id: cord-020904-x3o3a45b author: Montazeralghaem, Ali title: Relevance Ranking Based on Query-Aware Context Analysis date: 2020-03-17 words: 5192.0 sentences: 326.0 pages: flesch: 53.0 cache: ./cache/cord-020904-x3o3a45b.txt txt: ./txt/cord-020904-x3o3a45b.txt summary: The primary goal of the proposed model is to combine the exact and semantic matching between query and document terms, which has been shown to produce effective performance in information retrieval. In basic retrieval models such as BM25 [30] and the language modeling framework [29], the relevance score of a document is estimated based on explicit matching of query and document terms. Finally, our proposed model for relevance ranking provides the basis for natural integration of semantic term matching and local document context analysis into any retrieval model. The authors of [13] proposed a generalized estimate of document language models using a noisy channel, which captures semantic term similarities computed using word embeddings. Note that in this experiment, we only consider methods that select expansion terms based on word embeddings and not other information sources such as the top retrieved documents for each query (PRF). abstract: Word mismatch between queries and documents is a long-standing challenge in information retrieval. Recent advances in distributed word representations address the word mismatch problem by enabling semantic matching. However, most existing models rank documents based on semantic matching between query and document terms without an explicit understanding of the relationship of the match to relevance. To consider semantic matching between query and document, we propose an unsupervised semantic matching model by simulating a user who makes relevance decisions. The primary goal of the proposed model is to combine the exact and semantic matching between query and document terms, which has been shown to produce effective performance in information retrieval. As semantic matching between queries and entire documents is computationally expensive, we propose to use local contexts of query terms in documents for semantic matching. Matching with smaller query-related contexts of documents stems from the relevance judgment process recorded by human observers. The most relevant part of a document is then recognized and used to rank documents with respect to the query.
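The combination of exact and semantic matching over local contexts described in this entry can be sketched generically. The code below is an illustrative simplification, not the paper's model: exact matches are counted directly, while semantic similarity is measured only against tokens inside small windows around query-term occurrences; the embeddings are random stand-ins.

import numpy as np

rng = np.random.default_rng(1)
VOCAB = ["flu", "influenza", "fever", "cough", "causes", "and"]
EMB = {w: rng.normal(size=8) for w in VOCAB}  # random stand-in embeddings

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def score(query, doc, window=3, alpha=0.6):
    # Exact component: plain term-frequency matching.
    exact = sum(doc.count(q) for q in query)
    # Semantic component: only tokens inside windows around positions
    # where a query term occurs are eligible for soft matching.
    ctx = set()
    for pos, w in enumerate(doc):
        if w in query:
            ctx.update(doc[max(0, pos - window): pos + window + 1])
    ctx = {c for c in ctx if c in EMB} - set(query)
    semantic = sum(max((cos(EMB[q], EMB[c]) for c in ctx), default=0.0)
                   for q in query)
    return alpha * exact + (1 - alpha) * semantic

doc = "influenza causes fever and cough".split()  # tokenized document
print(score(["fever", "flu"], doc))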
Experimental results on several representative retrieval models and standard datasets show that our proposed semantic matching model significantly outperforms competitive baselines in all measures. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148224/ doi: 10.1007/978-3-030-45439-5_30 id: cord-020848-nypu4w9s author: Morris, David title: SlideImages: A Dataset for Educational Image Classification date: 2020-03-24 words: 2276.0 sentences: 145.0 pages: flesch: 51.0 cache: ./cache/cord-020848-nypu4w9s.txt txt: ./txt/cord-020848-nypu4w9s.txt summary: Currently, many document analysis systems are trained in part on scene images due to the lack of large datasets of educational image data. In this paper, we address this issue and present SlideImages, a dataset for the task of classifying educational illustrations. SlideImages contains training data collected from various sources, e.g., Wikimedia Commons and the AI2D dataset, and test data collected from educational slides. Born-digital and educational images need further benchmarks on challenging information retrieval tasks in order to test generalization. While document scans and born-digital educational illustrations have a materially different appearance, these papers show that the utility of deep neural networks is not limited to scene image tasks (Fig. 1). The related DocFigure dataset covers similar images and has much more data than SlideImages. In this paper, we have presented the task of classifying educational illustrations and images in slides and introduced a novel dataset, SlideImages. abstract: In the past few years, convolutional neural networks (CNNs) have achieved impressive results in computer vision tasks, which however mainly focus on photos with natural scene content. Besides, non-sensor derived images such as illustrations, data visualizations, figures, etc. are typically used to convey complex information or to explore large datasets. However, this kind of image has received little attention in computer vision. CNNs and similar techniques use large volumes of training data. Currently, many document analysis systems are trained in part on scene images due to the lack of large datasets of educational image data. In this paper, we address this issue and present SlideImages, a dataset for the task of classifying educational illustrations. SlideImages contains training data collected from various sources, e.g., Wikimedia Commons and the AI2D dataset, and test data collected from educational slides. We have reserved all the actual educational images as a test dataset in order to ensure that the approaches using this dataset generalize well to new educational images, and potentially other domains. Furthermore, we present a baseline system using a standard deep neural architecture and discuss dealing with the challenge of limited training data. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148059/ doi: 10.1007/978-3-030-45442-5_36 id: cord-020811-pacy48qx author: Muhammad, Shamsuddeen Hassan title: Incremental Approach for Automatic Generation of Domain-Specific Sentiment Lexicon date: 2020-03-24 words: 1725.0 sentences: 113.0 pages: flesch: 50.0 cache: ./cache/cord-020811-pacy48qx.txt txt: ./txt/cord-020811-pacy48qx.txt summary: title: Incremental Approach for Automatic Generation of Domain-Specific Sentiment Lexicon To this end, we propose an approach to automatically generate a domain-specific sentiment lexicon using a vector model enriched by weights.
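A baseline in the spirit of the SlideImages entry above, for dealing with limited training data: take a standard ImageNet CNN and retrain only its classification head. The class count, tensors, and configuration below are hypothetical, not the authors' exact baseline.

# pip install torch torchvision
import torch, torch.nn as nn
from torchvision import models

NUM_CLASSES = 8  # hypothetical number of educational-illustration classes
net = models.resnet18(weights=None)  # or load pretrained ImageNet weights
net.fc = nn.Linear(net.fc.in_features, NUM_CLASSES)

# Freeze the convolutional backbone; train only the new head.
for p in net.parameters():
    p.requires_grad = False
for p in net.fc.parameters():
    p.requires_grad = True

opt = torch.optim.Adam(net.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(4, 3, 224, 224)          # stand-in image batch
y = torch.randint(0, NUM_CLASSES, (4,))  # stand-in labels
opt.zero_grad()
loss = loss_fn(net(x), y)
loss.backward()
opt.step()
print(float(loss))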
Although research has been carried out on corpus-based approaches for the automatic generation of a domain-specific lexicon [1, 4, 5, 7, 9, 10, 14], existing approaches have focused on the creation of a lexicon from a single corpus [4]. To this end, this work proposes an incremental approach for the automatic generation of a domain-specific sentiment lexicon. We aim to investigate an incremental technique for automatically generating a domain-specific sentiment lexicon from a corpus. Can we automatically generate a sentiment lexicon from a corpus and improve on existing approaches? After detecting the domain shift, we merge the distributions using an approach similar to the one discussed for updating with the same corpus, and generate the lexicon. abstract: A sentiment lexicon plays a vital role in lexicon-based sentiment analysis. The lexicon-based method is often preferred because it leads to more explainable answers in comparison with many machine learning-based methods. But the semantic orientation of a word depends on its domain. Hence, a general-purpose sentiment lexicon may give sub-optimal performance compared with a domain-specific lexicon. However, it is challenging to manually generate a domain-specific sentiment lexicon for each domain. Moreover, it is impractical to generate a complete sentiment lexicon for a domain from a single corpus. To this end, we propose an approach to automatically generate a domain-specific sentiment lexicon using a vector model enriched by weights. Importantly, we propose an incremental approach for updating an existing lexicon to either the same domain or a different domain (domain adaptation). Finally, we discuss how to incorporate sentiment lexicon information in neural models (word embeddings) for better performance. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148022/ doi: 10.1007/978-3-030-45442-5_81 id: cord-020918-056bvngu author: Nchabeleng, Mathibele title: Evaluating the Effectiveness of the Standard Insights Extraction Pipeline for Bantu Languages date: 2020-03-17 words: nan sentences: nan pages: flesch: nan cache: txt: summary: abstract: Extracting insights from data obtained from the web in order to identify people’s views and opinions on various topics is a growing practice. The standard insights extraction pipeline is typically an unsupervised machine learning task composed of processes that preprocess the text, visualize it, cluster and identify the topics and sentiment in each cluster, and then graph the network. Given the increasing amount of data being generated on the internet in Africa today, and the multilingual state of African countries, we evaluated how well the standard pipeline works when applied to text wholly or partially written in indigenous African languages, specifically Bantu languages. We carried out an exploratory investigation using Twitter data and compared the outputs from each step of the pipeline for an English dataset and a mixed Bantu language dataset. We found that for Bantu languages, due to their complex grammatical structure, extra preprocessing steps such as part-of-speech tagging and morphological analysis are required during data cleaning, threshold values should be adjusted during topic modeling, and semantic analysis should be performed before completing text preprocessing.
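Since the Muhammad et al. entry above only summarizes its weighted vector model, the sketch below uses a simpler count-based stand-in to make the incremental idea concrete: polarity scores are seed-anchored PMI values computed from co-occurrence counts, and an update merges counts from a new corpus instead of recomputing from scratch. Seed words and corpora are invented.

import math
from collections import Counter
from itertools import combinations

SEEDS = {"good": +1, "bad": -1}  # tiny hand-picked seed lexicon

def cooccurrence(docs):
    # Count word occurrences and same-document co-occurrences.
    occ, co = Counter(), Counter()
    for doc in docs:
        words = set(doc.split())
        occ.update(words)
        co.update(frozenset(p) for p in combinations(sorted(words), 2))
    return occ, co

def polarity(word, occ, co, n_docs):
    # Seed-anchored PMI: positive seeds pull the score up, negative down.
    score = 0.0
    for seed, sign in SEEDS.items():
        joint = co[frozenset((word, seed))]
        if joint and occ[word] and occ[seed]:
            score += sign * math.log(joint * n_docs / (occ[word] * occ[seed]))
    return score

corpus = ["good camera great photos", "bad battery", "great screen good value"]
occ, co = cooccurrence(corpus)

# Incremental update: merge counts from a new corpus, no full recomputation.
new_docs = ["bad noisy photos"]
occ2, co2 = cooccurrence(new_docs)
occ.update(occ2); co.update(co2)
n = len(corpus) + len(new_docs)
print({w: round(polarity(w, occ, co, n), 2) for w in ["photos", "battery"]})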
url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148238/ doi: 10.1007/978-3-030-45439-5_11 id: cord-020832-iavwkdpr author: Nguyen, Dat Quoc title: ChEMU: Named Entity Recognition and Event Extraction of Chemical Reactions from Patents date: 2020-03-24 words: 1980.0 sentences: 118.0 pages: flesch: 49.0 cache: ./cache/cord-020832-iavwkdpr.txt txt: ./txt/cord-020832-iavwkdpr.txt summary: title: ChEMU: Named Entity Recognition and Event Extraction of Chemical Reactions from Patents ChEMU involves two key information extraction tasks over chemical reactions from patents. In this paper, we propose a new evaluation lab (called ChEMU) focusing on information extraction over chemical reactions from patents. Our goals are: (1) To develop tasks that impact chemical research in both academia and industry, (2) To provide the community with a new dataset of chemical entities, enriched with relational links between chemical event triggers and arguments, and (3) To advance the state-of-the-art in information extraction over chemical patents. The ChEMU lab at CLEF-2020 offers the two information extraction tasks of Named entity recognition (Task 1) and Event extraction (Task 2) over chemical reactions from patent documents. ChEMU will focus on two new tasks of named entity recognition and event extraction over chemical reactions from patents. abstract: We introduce a new evaluation lab named ChEMU (Cheminformatics Elsevier Melbourne University), part of the 11th Conference and Labs of the Evaluation Forum (CLEF-2020). ChEMU involves two key information extraction tasks over chemical reactions from patents. Task 1—Named entity recognition—involves identifying chemical compounds as well as their types in context, i.e., to assign the label of a chemical compound according to the role which the compound plays within a chemical reaction. Task 2—Event extraction over chemical reactions—involves event trigger detection and argument recognition. We briefly present the motivations and goals of the ChEMU tasks, as well as resources and evaluation methodology. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148043/ doi: 10.1007/978-3-030-45442-5_74 id: cord-020820-cbikq0v0 author: Papadakos, Panagiotis title: Dualism in Topical Relevance date: 2020-03-24 words: 2468.0 sentences: 133.0 pages: flesch: 56.0 cache: ./cache/cord-020820-cbikq0v0.txt txt: ./txt/cord-020820-cbikq0v0.txt summary: To this end, in this paper we elaborate on the idea of leveraging the available antonyms of the original query terms for eventually producing an answer which provides a better overview of the related conceptual and information space. In this paper we elaborate on the idea of leveraging the available antonyms of the original query terms (if they exist), for eventually producing an answer which provides a better overview of the related information and conceptual space. In their comments for these queries, users mention that the selected (i.e., dual) list "provides a more general picture" and "more relevant and interesting results, although contradicting". For the future, we plan to define the appropriate antonym selection algorithms and relevance metrics, implement the proposed functionality in a meta-search setting, and conduct a large-scale evaluation with real users over exploratory tasks, to identify for which queries and for which types of users the dual approach is beneficial. abstract: There are several concepts whose interpretation and meaning are defined through their binary opposition with other opposite concepts.
To this end, in this paper we elaborate on the idea of leveraging the available antonyms of the original query terms for eventually producing an answer which provides a better overview of the related conceptual and information space. Specifically, we sketch a method in which antonyms are used for producing dual queries, which can in turn be exploited for defining a multi-dimensional topical relevance based on the antonyms. We motivate this direction by providing examples and by conducting a preliminary evaluation that shows its importance to specific users. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148031/ doi: 10.1007/978-3-030-45442-5_40 id: cord-020909-n36p5n2k author: Papadakos, Panagiotis title: bias goggles: Graph-Based Computation of the Bias of Web Domains Through the Eyes of Users date: 2020-03-17 words: 5005.0 sentences: 256.0 pages: flesch: 63.0 cache: ./cache/cord-020909-n36p5n2k.txt txt: ./txt/cord-020909-n36p5n2k.txt summary: The main contributions are:
- the bias goggles model for computing the bias characteristics of web domains for a user-defined concept, based on the notions of Biased Concepts (BCs), Aspects of Bias (ABs), and the metrics of the support of the domain for a specific AB and BC, and its bias score for this BC,
- the introduction of the Support Flow Graph (SFG), along with graph-based algorithms for computing the AB support score of domains, that include adaptations of the Independence Cascade (IC) and Linear Threshold (LT) propagation models, and the new Biased-PageRank (Biased-PR) variation that models different behaviours of a biased surfer,
- an initial discussion about performance and implementation issues,
- some promising evaluation results that showcase the effectiveness and efficiency of the approach on a relatively small dataset of crawled pages, using the new AGBR and AGS metrics,
- a publicly accessible prototype of bias goggles.
abstract: Ethical issues, along with transparency, disinformation, and bias, are in the focus of our information society. In this work, we propose the bias goggles model, for computing the bias characteristics of web domains to user-defined concepts based on the structure of the web graph. For supporting the model, we exploit well-known propagation models and the newly introduced Biased-PR PageRank algorithm, that models various behaviours of biased surfers. An implementation discussion, along with a preliminary evaluation over a subset of the Greek web graph, shows the applicability of the model even in real-time for small graphs, and showcases rather promising and interesting results. Finally, we pinpoint important directions for future work. A constantly evolving prototype of the bias goggles system is readily available. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148229/ doi: 10.1007/978-3-030-45439-5_52 id: cord-020871-1v6dcmt3 author: Papariello, Luca title: On the Replicability of Combining Word Embeddings and Retrieval Models date: 2020-03-24 words: nan sentences: nan pages: flesch: nan cache: txt: summary: abstract: We replicate recent experiments attempting to demonstrate an attractive hypothesis about the use of the Fisher kernel framework and mixture models for aggregating word embeddings towards document representations and the use of these representations in document classification, clustering, and retrieval.
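The propagation machinery in the bias goggles entry above can be illustrated with a generic personalized-PageRank variant in which teleportation mass flows only to a biased seed set, so a node's score reflects how strongly the link structure funnels support towards it. This is one plausible reading of a "biased surfer", not the paper's exact Biased-PR definition, and the toy graph is made up.

def biased_pagerank(graph, seeds, d=0.85, iters=50):
    # Teleportation goes only to `seeds` (domains already known to support
    # the aspect of bias), unlike uniform teleportation in plain PageRank.
    nodes = list(graph)
    score = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    for _ in range(iters):
        nxt = {n: ((1 - d) / len(seeds) if n in seeds else 0.0) for n in nodes}
        for n, outs in graph.items():
            if outs:
                share = d * score[n] / len(outs)
                for m in outs:
                    nxt[m] += share
            else:  # dangling node: return its mass to the seeds
                for s in seeds:
                    nxt[s] += d * score[n] / len(seeds)
        score = nxt
    return score

web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
print(biased_pagerank(web, seeds={"a"}))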
Specifically, the hypothesis was that the use of a mixture model of von Mises-Fisher (VMF) distributions instead of Gaussian distributions would be beneficial because of the focus on cosine distances of both VMF and the vector space model traditionally used in information retrieval. Previous experiments had validated this hypothesis. Our replication was not able to validate it, despite a large parameter scan space. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148082/ doi: 10.1007/978-3-030-45442-5_7 id: cord-020905-gw8i6tkn author: Qu, Xianshan title: An Attention Model of Customer Expectation to Improve Review Helpfulness Prediction date: 2020-03-17 words: 5412.0 sentences: 330.0 pages: flesch: 60.0 cache: ./cache/cord-020905-gw8i6tkn.txt txt: ./txt/cord-020905-gw8i6tkn.txt summary: To model such customer expectations and capture important information from a review text, we propose a novel neural network which leverages review sentiment and product information. In order to address the above issues, we propose a novel neural network architecture to introduce sentiment and product information when identifying helpful content from a review text. In the cold start scenario, our proposed model demonstrates an AUC improvement of 5.4% and 1.5% on Amazon and Yelp data sets, respectively, when compared to the state-of-the-art model. From Table 5, we see that adding a sentiment attention layer (HSA) to the base model (HBiLSTM) results in an average improvement in the AUC score of 2.0% and 2.6%, respectively, on the Amazon and Yelp data sets. In this paper, we describe our analysis of review helpfulness prediction and propose a novel neural network model with attention modules to incorporate sentiment and product information. abstract: Many people browse reviews online before making purchasing decisions. It is essential to identify the subset of helpful reviews from the large number of reviews of varying quality. This paper aims to build a model to predict review helpfulness automatically. Our work is inspired by the observation that a customer’s expectation of a review can be greatly affected by review sentiment and the degree to which the customer is aware of pertinent product information. Consequently, a customer may pay more attention to that specific content of a review which contributes more to its helpfulness from their perspective. To model such customer expectations and capture important information from a review text, we propose a novel neural network which leverages review sentiment and product information. Specifically, we encode the sentiment of a review through an attention module, to get sentiment-driven information from review text. We also introduce a product attention layer that fuses information from both the target product and related products, in order to capture the product-related information from review text. Our experimental results show an AUC improvement of 5.4% and 1.5% over the previous state-of-the-art model on Amazon and Yelp data sets, respectively. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148225/ doi: 10.1007/978-3-030-45439-5_55 id: cord-020872-frr8xba6 author: Santosh, Tokala Yaswanth Sri Sai title: DAKE: Document-Level Attention for Keyphrase Extraction date: 2020-03-24 words: nan sentences: nan pages: flesch: nan cache: txt: summary: abstract: Keyphrases provide a concise representation of the topical content of a document and they are helpful in various downstream tasks.
Previous approaches model keyphrase extraction as a sequence labelling task and use local contextual information to understand the semantics of the input text, but they fail when the local context is ambiguous or unclear. We present a new framework to improve keyphrase extraction by utilizing additional supporting contextual information. We retrieve this additional information from other sentences within the same document. To this end, we propose Document-level Attention for Keyphrase Extraction (DAKE), which comprises Bidirectional Long Short-Term Memory networks that capture hidden semantics in text, a document-level attention mechanism to incorporate document-level contextual information, gating mechanisms which help to determine the influence of additional contextual information on the fusion with local contextual information, and Conditional Random Fields which capture output label dependencies. Our experimental results on a dataset of research papers show that the proposed model outperforms previous state-of-the-art approaches for keyphrase extraction. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148091/ doi: 10.1007/978-3-030-45442-5_49 id: cord-020936-k1upc1xu author: Sanz-Cruzado, Javier title: Axiomatic Analysis of Contact Recommendation Methods in Social Networks: An IR Perspective date: 2020-03-17 words: nan sentences: nan pages: flesch: nan cache: txt: summary: abstract: Contact recommendation is an important functionality in many social network scenarios including Twitter and Facebook, since it can help grow the social networks of users by suggesting, to a given user, people they might wish to follow. Recently, it has been shown that classical information retrieval (IR) weighting models – such as BM25 – can be adapted to effectively recommend new social contacts to a given user. However, the exact properties that make such adapted contact recommendation models effective at the task are as yet unknown. In this paper, inspired by new advances in the axiomatic theory of IR, we study the existing IR axioms for the contact recommendation task. Our theoretical analysis and empirical findings show that while the classical axioms related to term frequencies and term discrimination seem to have a positive impact on the recommendation effectiveness, those related to length normalization tend not to be desirable for the task. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148256/ doi: 10.1007/978-3-030-45439-5_12 id: cord-020885-f667icyt author: Sharma, Ujjwal title: Semantic Path-Based Learning for Review Volume Prediction date: 2020-03-17 words: 4026.0 sentences: 245.0 pages: flesch: 48.0 cache: ./cache/cord-020885-f667icyt.txt txt: ./txt/cord-020885-f667icyt.txt summary: In this work, we present an approach that uses semantically meaningful, bimodal random walks on real-world heterogeneous networks to extract correlations between nodes and bring together nodes with shared or similar attributes. In this work, we propose a novel method that incorporates restaurants and their attributes into a multimodal graph and extracts multiple, bimodal low-dimensional representations for restaurants based on available paths through shared visual, textual, geographical and categorical features. In this section, we discuss prior work that leverages graph-based structures for extracting information from multiple modalities, focussing on the auto-captioning task that introduced such methods.
For each of these sub-networks, we perform random walks and use a variant of the heterogeneous skip-gram objective introduced in [6] to generate low-dimensional bimodal embeddings. Our attention-based model combines separately learned bimodal embeddings using a late-fusion setup for predicting the review volume of the restaurants. abstract: Graphs offer a natural abstraction for modeling complex real-world systems where entities are represented as nodes and edges encode relations between them. In such networks, entities may share common or similar attributes and may be connected by paths through multiple attribute modalities. In this work, we present an approach that uses semantically meaningful, bimodal random walks on real-world heterogeneous networks to extract correlations between nodes and bring together nodes with shared or similar attributes. An attention-based mechanism is used to combine multiple attribute-specific representations in a late fusion setup. We focus on a real-world network formed by restaurants and their shared attributes and evaluate performance on predicting the number of reviews a restaurant receives, a strong proxy for popularity. Our results demonstrate the rich expressiveness of such representations in predicting review volume and the ability of an attention-based model to selectively combine individual representations for maximum predictive power on the chosen downstream task. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148205/ doi: 10.1007/978-3-030-45439-5_54 id: cord-020793-kgje01qy author: Suominen, Hanna title: CLEF eHealth Evaluation Lab 2020 date: 2020-03-24 words: 2379.0 sentences: 116.0 pages: flesch: 51.0 cache: ./cache/cord-020793-kgje01qy.txt txt: ./txt/cord-020793-kgje01qy.txt summary: Laypeople's increasing difficulties in retrieving and digesting valid and relevant information in their preferred language to make health-centred decisions have motivated CLEF eHealth to organize yearly labs since 2012. Substantial community interest in the tasks and their resources has led to CLEF eHealth maturing as a primary venue for all interdisciplinary actors of the ecosystem for producing, processing, and consuming electronic health information. Information access conferences have organized evaluation labs on related Electronic Health (eHealth) Information Extraction (IE), Information Management (IM), and Information Retrieval (IR) tasks for almost 20 years. This Consumer Health Search (CHS) task follows a standard IR shared challenge paradigm from the perspective that it provides participants with a test collection consisting of a set of documents and a set of topics to develop IR techniques for. abstract: Laypeople's increasing difficulties in retrieving and digesting valid and relevant information in their preferred language to make health-centred decisions have motivated CLEF eHealth to organize yearly labs since 2012. These 20 evaluation tasks on Information Extraction (IE), management, and Information Retrieval (IR) in 2013–2019 have been popular—as demonstrated by the large number of team registrations, submissions, papers, their included authors, and citations (748, 177, 184, 741, and 1299, respectively, up to and including 2018)—and achieved statistically significant improvements in the processing quality.
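The bimodal walks of the Sharma et al. entry above can be sketched as metapath-constrained random walks: alternate restaurant-attribute-restaurant steps restricted to a single attribute modality, then feed the resulting walk "sentences" to any skip-gram implementation (e.g., gensim's Word2Vec) to learn the embeddings. The tiny graph below is hypothetical.

import random

rng = random.Random(42)

# Toy heterogeneous graph: restaurants connect to attribute nodes of
# different modalities (made-up data).
edges = {
    "r1": ["visual:red_decor", "cat:italian"],
    "r2": ["visual:red_decor", "cat:pizza"],
    "r3": ["cat:italian", "cat:pizza"],
}
attr_to_rest = {}
for r, attrs in edges.items():
    for a in attrs:
        attr_to_rest.setdefault(a, []).append(r)

def bimodal_walk(start, modality, length=6):
    # Alternate restaurant -> attribute -> restaurant steps, restricted to
    # one attribute modality (e.g. "cat"), keeping each walk bimodal.
    walk, node = [start], start
    for _ in range(length):
        attrs = [a for a in edges[node] if a.startswith(modality)]
        if not attrs:
            break
        a = rng.choice(attrs)
        node = rng.choice(attr_to_rest[a])
        walk += [a, node]
    return walk

corpus = [bimodal_walk(r, "cat") for r in edges for _ in range(3)]
print(corpus[0])  # e.g. ['r1', 'cat:italian', 'r3', ...]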
In 2020, CLEF eHealth is calling for participants to contribute to the following two tasks: The 2020 Task 1 on IE focuses on term coding for clinical textual data in Spanish. The terms considered are extracted from clinical case records and they are mapped onto the Spanish version of the International Classification of Diseases, the 10th Revision, including also textual evidence spans for the clinical codes. The 2020 Task 2 is a novel extension of the most popular and established task in CLEF eHealth on CHS. This IR task uses the representative web corpus used in the 2018 challenge, but now also spoken queries, as well as textual transcripts of these queries, are offered to the participants. The task is structured into a number of optional subtasks, covering ad-hoc search using the spoken queries, textual transcripts of the spoken queries, or provided automatic speech-to-text conversions of the spoken queries. In this paper we describe the evolution of CLEF eHealth and this year’s tasks. The substantial community interest in the tasks and their resources has led to CLEF eHealth maturing as a primary venue for all interdisciplinary actors of the ecosystem for producing, processing, and consuming electronic health information. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148004/ doi: 10.1007/978-3-030-45442-5_76 id: cord-020875-vd4rtxmz author: Suwaileh, Reem title: Time-Critical Geolocation for Social Good date: 2020-03-24 words: 2030.0 sentences: 134.0 pages: flesch: 51.0 cache: ./cache/cord-020875-vd4rtxmz.txt txt: ./txt/cord-020875-vd4rtxmz.txt summary: To address this problem, I aim to exploit different techniques such as training neural models, enriching the tweet representation, and studying methods to mitigate the lack of labeled data. In my work, I am interested in tackling the Location Mention Prediction (LMP) problem during time-critical situations. The location taggers have to address many challenges, including microblogging-specific challenges (e.g., tweet sparsity, noisiness, rapidly changing streams, hashtag riding, etc.) and task-specific challenges (e.g., time-criticality of the solution, scarcity of labeled data, etc.). Alternatively, Sultanik and Fink [25] used an Information Retrieval (IR) based approach to identify the location mentions in tweets. Moreover, Hoang and Mothe [8] combined syntactic and semantic features to train traditional ML-based models, whereas Kumar and Singh [13] trained a Convolutional Neural Network (CNN) model that learns the continuous representation of tweet text and then identifies the location mentions. abstract: Twitter has become an instrumental source of news in emergencies where efficient access, dissemination of information, and immediate reactions are critical. Nevertheless, due to several challenges, the current fully-automated processing methods are not yet mature enough for deployment in real scenarios. In this dissertation, I focus on tackling the lack of context problem by studying automatic geo-location techniques. I specifically aim to study the Location Mention Prediction problem in which the system has to extract location mentions in tweets and pin them on the map. To address this problem, I aim to exploit different techniques such as training neural models, enriching the tweet representation, and studying methods to mitigate the lack of labeled data.
I anticipate many downstream applications for the Location Mention Prediction problem such as incident detection, real-time action management during emergencies, and fake news and rumor detection, among others. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148099/ doi: 10.1007/978-3-030-45442-5_82 id: cord-020903-qt0ly5d0 author: Tamine, Lynda title: What Can Task Teach Us About Query Reformulations? date: 2020-03-17 words: 4957.0 sentences: 264.0 pages: flesch: 61.0 cache: ./cache/cord-020903-qt0ly5d0.txt txt: ./txt/cord-020903-qt0ly5d0.txt summary: Task-based sessions represent significantly different background contexts for better understanding users' query reformulations. Using insights from large-scale search logs, our findings clearly show that task is an additional relevant search unit that helps to better understand users' query reformulation patterns and to predict the user's next query. To design support processes for task-based search systems, we argue that we need to: (1) fully understand how a user's task, performed in natural settings, drives query reformulation changes; and (2) gauge the level of similarity of these change trends with those observed in time-based sessions. With this in mind, we perform large-scale log analyses of users naturally engaged in tasks to examine query reformulations from both the time-based and the task-based session perspectives. To identify query reformulation patterns, most of the previous works used large-scale log analyses segmented into time-based sessions. abstract: A significant amount of prior research has been devoted to understanding query reformulations. The majority of these works rely on time-based sessions, which are sequences of contiguous queries segmented using a time threshold on users' activities. However, queries are generally issued by users having in mind a particular task, and time-based sessions unfortunately fail to reveal such tasks. In this paper, we are interested in revealing to what extent time-based sessions vs. task-based sessions represent significantly different background contexts for better understanding users' query reformulations. Using insights from large-scale search logs, our findings clearly show that task is an additional relevant search unit that helps to better understand users' query reformulation patterns and to predict the user's next query. The findings from our analyses provide potential implications for model design of task-based search engines. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148223/ doi: 10.1007/978-3-030-45439-5_42 id: cord-020891-lt3m8h41 author: Witschel, Hans Friedrich title: KvGR: A Graph-Based Interface for Explorative Sequential Question Answering on Heterogeneous Information Sources date: 2020-03-17 words: nan sentences: nan pages: flesch: nan cache: txt: summary: abstract: Exploring a knowledge base is often an iterative process: initially vague information needs are refined by interaction. We propose a novel approach for such interaction that supports sequential question answering (SQA) on knowledge graphs. As opposed to previous work, we focus on exploratory settings, which we support with a visual representation of graph structures, helping users to better understand relationships. In addition, our approach keeps track of context – an important challenge in SQA – by allowing users to make their focus explicit via subgraph selection.
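The time-based sessions that the Tamine et al. entry above contrasts with task-based sessions are conventionally built with a simple inactivity threshold; a minimal sketch follows (the 30-minute gap and the log entries are illustrative choices, not the paper's exact setup).

from datetime import datetime, timedelta

def time_based_sessions(log, gap_minutes=30):
    # Split a user's query log into sessions: a new session starts when
    # the gap between consecutive queries exceeds the threshold.
    sessions, gap = [], timedelta(minutes=gap_minutes)
    for ts, query in sorted(log):
        if not sessions or ts - sessions[-1][-1][0] > gap:
            sessions.append([])
        sessions[-1].append((ts, query))
    return sessions

log = [
    (datetime(2020, 3, 17, 9, 0), "ecir 2020 program"),
    (datetime(2020, 3, 17, 9, 5), "ecir 2020 proceedings"),
    (datetime(2020, 3, 17, 14, 0), "flu symptoms"),  # long gap: new session
]
for i, s in enumerate(time_based_sessions(log)):
    print(i, [q for _, q in s])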
Our results show that the interaction principle is either understood immediately or picked up very quickly – and that the possibility of exploring the information space iteratively is appreciated. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148211/ doi: 10.1007/978-3-030-45439-5_50 id: cord-020932-o5scqiyk author: Zhong, Wei title: Accelerating Substructure Similarity Search for Formula Retrieval date: 2020-03-17 words: 4602.0 sentences: 278.0 pages: flesch: 65.0 cache: ./cache/cord-020932-o5scqiyk.txt txt: ./txt/cord-020932-o5scqiyk.txt summary: In text similarity search, query processing can be accelerated through dynamic pruning [18], which typically estimates score upperbounds to prune documents unlikely to be in the top K results. As a result, the posting list entry also stores the root node ID for indexed paths, in order to reconstruct matched substructures at merge time. Define the partial upperbound matrix W = {w_{i,j}} of size |T_q| × |T|, where T = {T(m), m ∈ T_q} are all the token paths from the query OPT (T is essentially the same as the tokenized P(T_q)), and a binary variable x of size |T| × 1 indicating which corresponding posting lists are placed in the non-requirement set. We have presented rank-safe dynamic pruning strategies that produce an upperbound estimation of structural similarity in order to speed up formula search using subtree matching. Our dynamic pruning strategies and specialized inverted index are different from traditional linear text search pruning methods, and they further associate the query structure representation with posting lists. abstract: Formula retrieval systems using substructure matching are effective, but suffer from slow retrieval times caused by the complexity of structure matching. We present a specialized inverted index and a rank-safe dynamic pruning algorithm for faster substructure retrieval. Formulas are indexed from their Operator Tree (OPT) representations. Our model is evaluated using the NTCIR-12 Wikipedia Formula Browsing Task and a new formula corpus produced from Math StackExchange posts. Our approach preserves the effectiveness of structure matching while allowing queries to be executed in real-time. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148252/ doi: 10.1007/978-3-030-45439-5_47 id: cord-020927-89c7rijg author: Zhuang, Shengyao title: Counterfactual Online Learning to Rank date: 2020-03-17 words: nan sentences: nan pages: flesch: nan cache: txt: summary: abstract: Exploiting users’ implicit feedback, such as clicks, to learn rankers is attractive as it does not require editorial labelling effort, and adapts to users’ changing preferences, among other benefits. However, directly learning a ranker from implicit data is challenging, as users’ implicit feedback usually contains bias (e.g., position bias, selection bias) and noise (e.g., clicking on irrelevant but attractive snippets, adversarial clicks). Two main methods have arisen for optimizing rankers based on implicit feedback: counterfactual learning to rank (CLTR), which learns a ranker from the historical click-through data collected from a deployed, logging ranker; and online learning to rank (OLTR), where a ranker is updated by recording user interaction with a result list produced by multiple rankers (usually via interleaving). In this paper, we propose a counterfactual online learning to rank algorithm (COLTR) that combines the key components of both CLTR and OLTR.
It does so by replacing the online evaluation required by traditional OLTR methods with the counterfactual evaluation common in CLTR. Compared to traditional OLTR approaches based on interleaving, COLTR can evaluate a large number of candidate rankers in a more efficient manner. Our empirical results show that COLTR significantly outperforms traditional OLTR methods. Furthermore, COLTR can reach the same effectiveness as the current state-of-the-art under noisy click settings, and has room for future extensions. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148247/ doi: 10.1007/978-3-030-45439-5_28 id: cord-020846-mfh1ope6 author: Zlabinger, Markus title: DSR: A Collection for the Evaluation of Graded Disease-Symptom Relations date: 2020-03-24 words: nan sentences: nan pages: flesch: nan cache: txt: summary: abstract: The effective extraction of ranked disease-symptom relationships is a critical component in various medical tasks, including computer-assisted medical diagnosis or the discovery of unexpected associations between diseases. While existing disease-symptom relationship extraction methods are used as the foundation in various medical tasks, no collection is available to systematically evaluate the performance of such methods. In this paper, we introduce the Disease-Symptom Relation Collection (dsr-collection), created by five physicians as expert annotators. We provide graded symptom judgments for diseases by differentiating between relevant symptoms and primary symptoms. Further, we provide several strong baselines, based on the methods used in previous studies. The first method is based on word embeddings, and the second on co-occurrences of MeSH-keywords of medical articles. For the co-occurrence method, we propose an adaptation in which not only keywords are considered, but also the full text of medical articles. The evaluation on the dsr-collection shows the effectiveness of the proposed adaptation in terms of nDCG, precision, and recall. url: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148057/ doi: 10.1007/978-3-030-45442-5_54
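The graded judgments of the dsr-collection described above (primary vs. relevant symptoms) map naturally onto graded-relevance metrics such as nDCG. A minimal sketch follows, assuming a gain of 2 for primary symptoms, 1 for relevant ones, and 0 otherwise; the judgments and system ranking are hypothetical, not data from the collection.

import math

def dcg(gains):
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg(ranked, judgments, k=5):
    # nDCG@k with graded gains: 2 = primary symptom, 1 = relevant, 0 = not.
    gains = [judgments.get(s, 0) for s in ranked[:k]]
    ideal = sorted(judgments.values(), reverse=True)[:k]
    return dcg(gains) / dcg(ideal) if any(ideal) else 0.0

# Hypothetical judgments for one disease, in the collection's spirit.
judgments = {"fever": 2, "cough": 2, "fatigue": 1, "nausea": 1}
system_ranking = ["fever", "nausea", "headache", "cough", "fatigue"]
print(round(ndcg(system_ranking, judgments), 3))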