Bibliometrics and Information Retrieval: Creating Knowledge through Research Synergies



 
Judit Bar-Ilan  

Bar-Ilan University 
Ramat Gan, Israel   

Judit.Bar-Ilan@biu.ac.il 
 
 
Marcus John  
Fraunhofer Institute for Technological Trend Analysis  
Euskirchen, Germany   

marcus.john@int.fraunhofer.de 

 Rob Koopman &  
Shenghui Wang  

OCLC Research Europe  
Leiden, Netherlands   

Rob.Koopman@oclc.org, 
shenghui.wang@gmail.com  

  
Philipp Mayr  

GESIS  
Cologne, Germany  

Philipp.Mayr-Schlegel@gesis.org  

  
Andrea Scharnhorst 
Royal Netherlands Academy of Arts and Sciences 
Amsterdam, Netherlands   

andrea.scharnhorst@dans.knaw.nl  

 
Dietmar Wolfram  

University of Wisconsin-Milwaukee  

Milwaukee, WI USA   
dwolfram@uwm.edu    

 
ABSTRACT 

This panel brings together experts in bibliometrics and 

information retrieval to discuss how each of these two 

important areas of information science can help to inform 

the research of the other. There is a growing body of 

literature that capitalizes on the synergies created by 

combining methodological approaches of each to solve 

research problems and practical issues related to how 

information is created, stored, organized, retrieved and 

used. The session will begin with an overview of the 

common threads that exist between IR and metrics, 

followed by a summary of findings from the BIR 

workshops and examples of research projects that combine 

aspects of each area to benefit IR or metrics research areas, 

including search results ranking, semantic indexing and 

visualization. The panel will conclude with an engaging 

discussion with the audience to identify future areas of 

research and collaboration. 

Keywords 

Bibliometrics, Information Retrieval, Digital Libraries, 

Visualization, Search, Semantic Indexing   

INTRODUCTION 

Information Retrieval (IR) and 

Bibliometrics/Informetrics/Scientometrics (referred to 

hereafter as “metrics”) represent two core areas of study in 

Information Science. Each has a long history with noted 

contributions to our understanding of how information is 

created, stored, organized, retrieved and used. Until 

recently, researchers have treated each of these areas as 

separate areas of investigation, with little overlap between 

the research topics undertaken in each area and little 

collaboration among researchers in both areas. This is 

surprising given that there are many common elements of 

interest to researchers in IR and metrics. Recognition of the 

mutually beneficial relationship that exists between IR and 

metrics has been growing over the past 15 years, with 

literature that specifically addresses this topic (e.g., 

Wolfram, 2003; Mayr & Scharnhorst, 2015) and the recent 

Bibliometric-enhanced Information Retrieval workshops 

(Mayr, Frommholz & Cabanac, 2016) held at metrics and 

IR meetings. The mutually beneficial relationship is evident 

in the application of metric and citation analysis methods in 

the design of IR systems and in the use of techniques 

developed in IR that lend themselves to the study of metric 

phenomena. A prime example is the development and use 

of the PageRank algorithm by Page, Brin, Motwani and 

Winograd (1999), which was inspired by ideas from 

citation analysis and then adapted to the Web to inform 

relevance ranking decisions of documents. It has since been 

re-purposed by metrics researchers for the ranking of 

authors and papers. 

 
ASIST 2016, October 14-18, 2016, Copenhagen, Denmark. 

 
PANEL ORGANIZATION 

This panel brings together researchers in IR and metrics to 

present an overview of how IR and metrics research may be 

combined, to provide examples of research that intersects 

both areas, and to engage in a discussion with the audience 

about future potential topics. The session will begin with an 

overview of the synergies that exist between IR and 

metrics, followed by a summary of findings from the BIR 

workshops and examples of research projects that combine 

aspects of each area to benefit IR and/or metrics research. 

The panel will conclude with an engaging discussion with 

the audience to identify future areas of research and 

collaboration. Initial questions to stimulate the discussion 

will include: 1) why don’t more IR researchers look to 

metrics research to help solve their research problems and 

vice versa; and 2) as an IR or metrics researcher, what do 

you see as a research problem of interest that could benefit 

from the approaches used by the other area?   

OVERVIEW (Dietmar Wolfram) 

Metrics researchers have long recognized that empirical 

regularities or patterns exist in the way information is 

produced and used, such as author and journal productivity, 

the way language is used, and how literatures grow over 

time. These regularities extend to the content of IR systems 

and how the systems are used. Knowledge of these 

regularities, such as patterns in how users interact with IR 

systems, can help to inform the design and evaluation of IR 

systems. Similarly, measures developed for metrics 

research also have applications in IR.   

Conversely, techniques developed that support more 

efficient IR are now being applied in metrics studies. This 

is exemplified in the use of language and topic modeling, 

which were developed to overcome limitations of more 

simplistic “bag of words” approaches in IR. Topic 

modeling has become a useful tool for better understanding 

relationships between papers, authors and journals by 

relying on the language used within the documents of 

interest. These tools complement existing methods based on 

citations and collaborations in helping researchers to reveal 

the underlying structure of disciplines. The relationships 

between metrics research and IR--and in particular 

academic search--are much closer than many researchers 

realize.   

RECENT ADVANCES IN BIBLIOMETRIC-ENHANCED 
INFORMATION RETRIEVAL (Philipp Mayr)  

The presentation will report on recent advances of the 

Bibliometric-enhanced Information Retrieval (BIR) 

workshop initiative. 

Our motivation as organizers of the BIR workshops (2014, 

2015 and 2016) started from the observation that the main 

discourses in both fields are different and the communities 

only partly overlap, as well as from the belief that a 

knowledge transfer is profitable for both sides. 

The first BIR workshop in 2014 set the research agenda by 

introducing each group to the other, illustrating state-of-the-art

methods, reporting on current research problems, and 

brainstorming about common interests. The second 

workshop in 2015 further elaborated these themes. The 

third full-day BIR workshop at ECIR 2016 aimed to 

establish a common ground for the incorporation of 

bibliometric-enhanced services into scholarly search engine 

interfaces. In particular, we addressed specific 

communities, as well as studies on large, cross-domain 

collections such as Mendeley and ResearchGate. The third BIR 

workshop explicitly addressed both scholarly and industrial 

researchers. In June 2016, we will organize the 4th BIR 

workshop at the JCDL conference in collaboration with the 

NLP and computational linguistics research group of 

Min-Yen Kan (see the BIRNDL workshop, 

http://wing.comp.nus.edu.sg/birndl-jcdl2016/). 

The past workshop topics included (but were not limited to) 

the following: 

• IR for digital libraries and scientific information portals 

• IR for scientific domains, e.g. social sciences, life 
sciences, etc. 

• Information seeking behavior 

• Bibliometrics, citation analysis, and network analysis for 
IR 

• Query expansion and relevance feedback approaches 

• Science Modeling (both formal and empirical) 

• Task-based user modeling, interaction, and 
personalization 

• (Long-term) Evaluation methods and test collection 
design 

• Collaborative information handling and information 
sharing 

• Classification, categorization, and clustering approaches 

• Information extraction (including topic detection, entity 
and relation extraction) 

• Recommendations based on explicit and implicit user 
feedback 

Previous BIR workshops have generated a wide range of 

papers. Proceedings are available at http://ceur-ws.org/Vol-1143/, 

http://ceur-ws.org/Vol-1344/ and http://ceur-ws.org/Vol-1567/. 

The main directions of these workshop 

papers have been: 

• IR and recommendation tool development and evaluation  

• Bibliometric IR experiments and data sets 

• Document Clustering for IR 

• Citation Contexts and Analysis 

The presentation will report on highlights of the past 

workshop papers and outline future directions of this 

initiative. 

APPLICATION OF THE H-INDEX FOR RANKING 

SEARCH RESULTS (Judit Bar-Ilan) 

In traditional IR, search results are usually ranked using 

tf*idf (term frequency/inverse document frequency). On the 

web, hypertext links can be utilized as well. The web-graph 

(nodes=web pages, links=hypertext links) is similar to 

citation networks (nodes=publications, links=citations). 

Citations are usually counted without assigning weights to 

individual citations. Similarly, the number of links to a web page 

can be counted, but this turns out to be insufficient because of 

the lack of quality control on the web, and links have to be 

weighted by their “importance”. This is the idea behind 

PageRank (Page et al., 1999), which stems from 

bibliometrics (Pinski & Narin, 1976). It should be noted 

that the PageRank calculation is quite costly. 
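
To make the contrast between counting inlinks and weighting them by importance concrete, the following is a minimal sketch of the power-iteration form of PageRank on a toy graph. The graph, the damping factor of 0.85, the iteration count and the function name `pagerank` are illustrative assumptions and are not taken from the cited papers.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank on a dict mapping page -> list of outlinked pages."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages without outlinks is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for source, targets in links.items():
            if not targets:
                continue
            # Each page passes an equal share of its rank along every outlink.
            share = damping * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web graph: page "c" is linked to by "a", "b" and "d".
toy_web = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
scores = pagerank(toy_web)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

Every iteration touches every link in the graph, which illustrates why the computation is global and becomes expensive at web scale.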

We suggest using a variant of the h-index for ranking. The 

h-index was introduced by Jorge Hirsch (2005). Hirsch is a 

physicist, but the idea of combining publication and citation 

counts captured the imagination of bibliometrics 

researchers, and a huge number of variants were suggested. 

One of them, the h-index of a single journal paper 

suggested by Schubert (2009), can be applied to the web 

graph as well, by assessing the importance of a web page from 

the number of inlinks received by the web pages that link to it 

(Bar-Ilan & Levene, 2015). The advantage of this 

method is that, unlike PageRank, it is based on local 

computation. This idea shows how bibliometrics and 

information retrieval can inform each other. 
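
The local nature of this computation can be sketched as follows. This is a hedged illustration of an h-index for a single page in the spirit of Schubert (2009), not the exact hw-rank of Bar-Ilan and Levene (2015): a page receives the value h if h of the pages linking to it have at least h inlinks each. The toy `inlinks` data and the function names are assumptions made for this example.

```python
def h_index(counts):
    """Largest h such that at least h of the given values are >= h."""
    h = 0
    for position, count in enumerate(sorted(counts, reverse=True), start=1):
        if count >= position:
            h = position
        else:
            break
    return h


def page_h_index(page, inlinks):
    """h-index of one page, from the inlink counts of the pages that link to it."""
    linking_pages = inlinks.get(page, [])
    return h_index(len(inlinks.get(source, [])) for source in linking_pages)


# Hypothetical toy data: page -> list of pages that link to it.
inlinks = {
    "p": ["q", "r", "s"],
    "q": ["r", "s", "t"],
    "r": ["s", "t"],
    "s": ["t"],
}
print(page_h_index("p", inlinks))  # linking pages have 3, 2 and 1 inlinks -> 2
print(page_h_index("q", inlinks))  # linking pages have 2, 1 and 0 inlinks -> 1
```

Only the page's own inlinks and the inlink counts of its immediate neighbors are needed, so no iteration over the whole graph is required.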

SEMANTIC INDEXING FOR INFORMATION RETRIEVAL 
AND BIBLIOMETRIC ANALYSIS (Rob Koopman & 
Shenghui Wang) 
Large-scale digital libraries offer users the opportunity to 

explore a vast amount of information using relatively 

uniform mechanisms, such as keyword-based or faceted 

searches. At the same time, users are challenged to make 

sense of overloaded result sets that are too big and 

complex to comprehend, and to understand and counteract the 

biases introduced by the different ranking mechanisms that 

produce the results. We believe that semantic indexing based 

on statistical analysis together with intuitive interfaces can 

help users to find relevant information and discover patterns 

quickly and reliably. 

In this talk, we will present our Ariadne context explorer, 

which allows users to visually explore the context of 

bibliographic entities, such as authors, subjects, journals, 

citations, publishers, etc. The visualization is built on 

semantic indexing of these entities based on the terms that 

share the same contexts in a large-scale bibliographic 

dataset. The statistical analysis based on Random Projection 

results in an underlying semantic space within which each 

entity is represented vectorially. Each bibliographic record 

or any piece of text could also be represented as a vector in 

this semantic space. The information retrieval task then 

becomes a task of finding the nearest neighbors in this 

space, regardless of whether the search starts with an author, a 

citation, an article or free text. 
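
The idea of retrieval as nearest-neighbor search in a randomly projected space can be sketched as below. This is a minimal illustration under stated assumptions, not the Ariadne implementation: each term is assigned a random +1/-1 index vector, each entity is represented by the sum of the index vectors of the terms it co-occurs with (which is equivalent to multiplying its term co-occurrence counts by a random matrix), and similarity is measured by cosine. The dimensionality, the toy records and all function names are hypothetical.

```python
import math
import random

DIMENSIONS = 64  # assumed size of the reduced semantic space


def random_vector(rng):
    """A random +1/-1 index vector; each distinct term gets one."""
    return [rng.choice((-1.0, 1.0)) for _ in range(DIMENSIONS)]


def build_space(records, seed=0):
    """Represent every entity as the sum of the index vectors of its co-occurring terms."""
    rng = random.Random(seed)
    term_vectors, entity_vectors = {}, {}
    for entities, terms in records:
        for term in terms:
            if term not in term_vectors:
                term_vectors[term] = random_vector(rng)
        for entity in entities:
            vector = entity_vectors.setdefault(entity, [0.0] * DIMENSIONS)
            for term in terms:
                for i, value in enumerate(term_vectors[term]):
                    vector[i] += value
    return term_vectors, entity_vectors


def embed_text(text, term_vectors):
    """Place free text in the same space; unknown terms are ignored."""
    vector = [0.0] * DIMENSIONS
    for term in text.lower().split():
        for i, value in enumerate(term_vectors.get(term, ())):
            vector[i] += value
    return vector


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest(query_vector, entity_vectors, k=3):
    """Return the k entities whose vectors are closest to the query vector."""
    ranked = sorted(entity_vectors,
                    key=lambda e: cosine(query_vector, entity_vectors[e]),
                    reverse=True)
    return ranked[:k]


# Hypothetical bibliographic records: (entities, terms).
records = [
    (["author:A", "journal:Scientometrics"], ["citation", "analysis", "h-index"]),
    (["author:B", "journal:Scientometrics"], ["citation", "network", "ranking"]),
    (["author:C", "journal:IR Journal"], ["query", "expansion", "ranking"]),
]
term_vectors, entity_vectors = build_space(records)
query = embed_text("citation analysis h-index", term_vectors)
print(nearest(query, entity_vectors))
```

Because authors, journals and free text all live in the same space, the same `nearest` call serves every kind of starting point.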

We will demonstrate the Ariadne context explorer and 

report the results of applying such semantic indexing and 

visualization in a topic-delineation exercise. 

SEEKING FOR THE NEEDLE IN THE HAYSTACK: 
BIBLIOMETRICS, INFORMATION RETRIEVAL AND 
VISUALIZATION IN THE CONTEXT OF TECHNOLOGY 
FORESIGHT (Marcus John) 
Technology foresight is an important element of any 

strategic planning process, since it assists decision makers 

in identifying and assessing future technologies. One 

important assumption made in this context is that 

tomorrow's technologies are based on today’s daily work in 

scientific laboratories. Consequently, any technology 

foresight process must rely on a continuous scanning of the 

scientific and technological landscape in order to detect 

scientific advances, breakthroughs and emerging topics. In 

other words, a kind of science observatory has to be 

established. Due to the rising number of scientific papers 

published each year, it becomes more and more difficult to 

restrict this scanning and monitoring process solely to 

classical desktop research and information retrieval 

techniques. Additionally, the classical task of IR, namely the 

identification of relevant information, is exacerbated by the 

need to identify information that is both relevant and new. 

Consequently, the information overload makes it necessary 

to complement classical approaches by quantitative data-

driven approaches stemming from informetrics, 

bibliometrics, data mining and related fields. 

This work-in-progress report presents an overview of the 

ongoing research at the Fraunhofer INT and addresses the 

question of whether and how these quantitative data-driven 

approaches might enhance the classic portfolio of 

technology foresight. This will be illustrated using a 

prototypical technology foresight process, within which 

different IR-related challenges will be identified. It will be 

demonstrated how eavesdropping on today's scientific 

communication by bibliometric means might support this 

process. As an example, a procedure termed "trend 

archaeology" will be presented. This approach examines 

historical scientific trends and looks for specific patterns 

within their temporal evolution. The proposed method is a 

multidimensional approach, since it takes into account 

multiple aspects of a scientific theme using bibliometric 

means. Additionally, "trend archaeology" is based on the 

synoptic inspection of different scientific themes, which 

emanate from different fields like nanotechnology or 

materials science. It will be demonstrated that for 

technology foresight it is mandatory to take into account the 

multidimensional-multiscalar, dynamic and highly 

interconnected nature of science. 


THE PANEL MEMBERS 

Andrea Scharnhorst (moderator) is Head of e-Research at 

the Data Archiving and Networked Services (DANS) 

institution in the Netherlands - a large digital archive for 

research data primarily from the social sciences and 

humanities. She is also a member of the e-humanities group 

at the Royal Netherlands Academy of Arts and Sciences 

(KNAW) in Amsterdam, where she coordinates the 

computational humanities programme. Her work focuses on 

understanding, modeling and simulating the emergence of 

innovations. 

Judit Bar-Ilan (panelist) is a professor at the Department of 

Information Science of Bar-Ilan University in Israel. She 

received her PhD in computer science from the Hebrew 

University of Jerusalem and started her research in 

information science in the mid-1990s at the School of 

Library, Archive and Information Studies of the Hebrew 

University of Jerusalem. She moved to the Department of 

Information Science at Bar-Ilan University in 2002. Her 

areas of interest include informetrics, information retrieval, 

Internet research, information behavior, the semantic Web 

and usability. Additional details are available at: 

http://is.biu.ac.il/en/judit/. 

Marcus John (panelist) received his PhD in the field of 

theoretical astrophysics. Since 2007, he has been a senior 

scientist at the Fraunhofer Institute for Technological Trend 

Analysis where he is mainly concerned with technology 

foresight and future-oriented technology analysis. His main 

fields of interest are complex systems science, physics of 

socio-economic systems, simulation methods and human 

enhancement. Additionally, his work focuses on the 

application of bibliometric and other quantitative methods 

for technology foresight. 

Rob Koopman (panelist) is an architect at the OCLC 

Europe, Middle East and Africa (EMEA) office in 

Leiden, Netherlands. His main research area is applied data 

science. He has a physics background and has worked at 

OCLC EMEA since 1981. 

Philipp Mayr (panelist) is team leader at the GESIS 

department Knowledge Technologies for the Social 

Sciences (WTS). He is the main organizer of the past BIR 

workshops. His team at GESIS runs two retrieval platforms 

which cover bibliographic information and full texts in the 

social sciences. His research interests include informetrics, 

information retrieval and digital libraries. Additional details 

are available at: 

http://www.gesis.org/de/das-institut/mitarbeiterverzeichnis/?alpha=M&name=philipp%2Cmayr. 

Shenghui Wang (panelist) has been a research scientist at OCLC 

Research since 2012, based at the OCLC EMEA office in 

Leiden, Netherlands. Her current research activities include 

text mining, visualization and linked data 

investigations. 

Dietmar Wolfram (panelist) is a professor at the School of 

Information Studies at the University of Wisconsin-

Milwaukee. He received his PhD in Library and 

Information Science from the University of Western 

Ontario. His research interests include informetrics, 

information retrieval, the intersection between these two 

areas, scholarly communication and user studies.  

ACKNOWLEDGMENTS 
This panel is sponsored by ASIS&T SIG/MET. 

REFERENCES 

Bar-Ilan, J., & Levene, M. (2015). The hw-rank: An h-index 

variant for ranking web pages. Scientometrics, 102, 

2247-2253. 

Hirsch, J. E. (2005). An index to quantify an individual's 

scientific research output. Proceedings of the National 

Academy of Sciences of the United States of America, 

102(46), 16569-16572. 

Mayr, P., Frommholz, I., & Cabanac, G. (2016). 

Bibliometric-Enhanced Information Retrieval: 3rd 

International BIR Workshop. In N. Ferro et al. (Eds.), 

Advances in Information Retrieval: 38th European 

Conference on IR Research, ECIR 2016 (pp. 865-868). 

Springer. 

Mayr, P., & Scharnhorst, A. (2015). Scientometrics and 

information retrieval - weak-links revitalized. 

Scientometrics, 102(3), 2193–2199. doi:10.1007/s11192-014-1484-3 

Page, L., Brin, S., Motwani, R., & Winograd, T. (1999). 

The PageRank citation ranking: bringing order to the 

web. URL: http://ilpubs.stanford.edu:8090/422/1/1999-66.pdf.  

Pinski, G., & Narin, F. (1976). Citation influence for 

journal aggregates of scientific publications: Theory, with 

application to the literature of physics. Information 

Processing and Management, 12(5), 297–312. 

Schubert, A. (2009). Using the h-index for assessing single 

publications. Scientometrics, 78(3), 559–565. 

Wolfram, D. (2003). Applied informetrics for information 

retrieval research. Westport, CT: Libraries Unlimited.

 