The Open Library Blog | A web page for every book


    	The Open Library Blog

    
    	A web page for every book


    	Skip to content

    	
    		About


					« Older posts

					
					Importing your Goodreads & Accessing them with Open Library’s APIs


	By mek

	 | 

	Published: December 13, 2020

	
by Mek


Today Joe Alcorn, founder of readng, published an article (https://joealcorn.co.uk/blog/2020/goodreads-retiring-API) sharing news with readers that Amazon’s Goodreads service is in the process of retiring their developer APIs, with an effective start date of last Tuesday, December 8th, 2020. 


A screenshot taken from Joe Alcorn’s post 


The topic stirred discussion among developers and book lovers alike, making the front-page of the popular Hacker News website. 


Hacker News at 2020-12-13 1:30pm Pacific.


The Importance of APIs


For those who are new to the term, an API is a method of accessing data in a way which is designed for computers to consume rather than people. APIs often allow computers to subscribe to (i.e. listen for) events and then take actions. For example, let’s say you wanted to tweet every time your favorite author published a new book. One could sit on Goodreads and refresh the website every fifteen minutes. Or, one might write a twitter bot which automatically connects to Goodreads and checks real-time data using its API. In fact, the reason why Twitter bots work, is that they use Twitter’s API, a mechanism which lets specially designed computer programs submit tweets to the platform.

As one of the more popular book services online today, tens of thousands of readers and organizations rely on Amazon’s Goodreads APIs to lookup information about books and to power their book-related applications across the web. Some authors rely on the data to showcase their works on their personal homepages, online book stores to promote their inventory, innovative new services like thestorygraph are using this data to help readers discover new insights, and even librarians and scholastic websites rely on book data APIs to make sure their catalog information is as up to date and accurate as possible for their patrons.


For years, the Open Library team has been enthusiastic to share the book space with friends like Goodreads who have historically shown great commitment by enabling patrons to control (download and export) their own data and enabling developers to create flourishing ecosystems which promote books and readership through their APIs. When it comes to serving an audience of book lovers, there is no “one size fits all” and we’re glad so many different platforms and APIs exist to provide experiences which meet the needs of different communities. And we’d like to do our part to keep the landscape flourishing.  


“The sad thing is it [retiring their APIs] really only hurts the hobbyist projects and Goodreads users themselves.” — Joe Alcorn


Picture of Aaron Swartz by Noah Berger/Landov from thedailybeast


At Open Library, our top priority is pursuing Aaron Swartz‘s original mission: to serve as an open book catalog for the public (one page for every book ever published) and ensure our community always has free, open data to unlock a world of possibilities. A world which believes in the power of reading to preserve our cultural heritage and empower education and understanding. We sincerely hope that Amazon will decide it’s in Goodreads’ best interests to re-instate their APIs. But either way, Open Library is committed to helping readers, developers, and all book lovers have autonomy over their data and direct access to the data they rely on.


One reason patrons appreciate Open Library is that it aligns with their values


Imports & Exports


In August 2020, one of our Google Summer of Code contributors Tabish Shaikh helped us implement an export option for Open Library Reading Logs to help everyone retain full control of their book data. We also created a Goodreads import feature to help patrons who may want an easy way to check which Goodreads titles may be available to borrow from the Internet Archive’s Controlled Digital Lending program via openlibrary.org and to help patrons organize all their books in one place. We didn’t make a fuss about this feature at the time, because we knew patrons have a lot of options. But things can change quickly and we want patrons to be able to make that decision for themselves.


For those who may not have known, Amazon’s Goodreads website provides an option for downloading/exporting a list of books from one’s bookshelves. You may find instructions on this Goodreads export process here. Open Library’s Goodreads importer enables patrons to take this exported dump of their Goodreads bookshelves and automatically add matching titles to their Open Library Reading Logs.


The Goodreads import feature from https://openlibrary.org/account/import


Known issues. Currently, Open Library’s Goodreads Importer only works for (a) titles that are in the Open Library catalog and (b) which are new enough to have ISBNs. Our staff and community are committed to continuing to improve our catalog to include more titles (we added more than 1M titles this year) and we plan to improve our importer to support other ID types like OCLC and LOC.


APIs & Data


Developers and book overs who have been relying on Amazon’s Goodreads APIs are not out of luck. There are several wonderful services, many of them open-source, including Open Library, which offer free APIs:


	Wikidata.org (by the same group who brought us Wikipedia) is a treasure trove of metadata on Authors and Books. Open Library gratefully leverages this powerful resource to enrich our pages.
	Inventaire.io is a wonderful service which uses Wikidata and Openlibrary data (API: api.inventaire.io)
	Bookbrainz.org (by the group who runs Musicbrainz) is a up-and-coming catalog of books
	WorldCat by OCLC offers various metadata APIs


Did we miss any? Please let us know! We’d love to work together, build stronger integrations with, and support other book-loving services.


Open Library’s APIs. And of course, Open Library has a free, open, Book API which spans nearly 30 million books.


Bulk Data. If you need access to all our data, Open Library releases a free monthly bulk data dump of Authors, Books, and more.


Spoiler: Everything on Open Library is an API!


One of my favorite parts of Open Library is that practically every page is an API. All that is required is adding “.json” to the end. Here are some examples:


Search
https://openlibrary.org/search?q=lord+of+the+rings is our search page for humans…
https://openlibrary.org/search.json?q=lord+of+the+rings is our Search API!

Books
https://openlibrary.org/books/OL25929351M/Harry_Potter_and_the_Methods_of_Rationality is the human page for Harry Potter and the Methods of Rationality…
https://openlibrary.org/books/OL25929351M.json is its API!

Authors
https://openlibrary.org/authors/OL2965893A/Rik_Roots is a human readable author page…
https://openlibrary.org/authors/OL2965893A.json and here is the API!


Did We Mention: Full-text Search over 4M Books?


Major hat tip to the Internet Archive’s Giovanni Damiola for this one: Folks may also appreciate the ability to full-text search across 4M of the Internet Archive’s books (https://blog.openlibrary.org/2018/07/14/search-full-text-within-4m-books) on Open Library: 


You can try it directly here:
http://openlibrary.org/search/inside?q=thanks%20for%20all%20the%20fish


As per usual, nearly all Open Library urls are themselves APIs, e.g.:
http://openlibrary.org/search/inside.json?q=thanks%20for%20all%20the%20fish


Get Involved


Questions? Open Library is an free, open-source, nonprofit project run by the Internet Archive. We do our development transparently in public (here’s our code) and our community spanning more than 40 volunteers meets every week, Tuesday @ 11:30am Pacific. Please contact us to join our call and participate in the process.


Bugs? If something isn’t working as expected, please let us know by opening an issue or joining our weekly community calls.


Want to share thanks? Please follow up on twitter: https://twitter.com/openlibrary and let us know how you’re using our APIs!


Thank you


A special thank you to our lead developers Drini Cami, Chris Clauss, and one of our lead volunteer engineers, Aaron, for spending their weekend helping fix a Python 3 bug which was temporarily preventing Goodreads imports from succeeding.


A Decentralized Future


The Internet Archive has a history cultivating and supporting the decentralized web. We operate a decentralized version of archive.org and host regular meetups and summits to galvanize the distributed web community.


In the future, we can imagine a world where no single website controls all of your data, but rather patrons can participate in a decentralized, distributed network. You may be interested to try Bookwyrm, an open-source decentralized project by Mouse, former engineer on the Internet Archive’s Archive-It team.


						Posted in Uncategorized

						| Leave a comment

					
					On Bookstores, Libraries & Archives in the Digital Age


	By Brewster Kahle

	 | 

	Published: October 7, 2020

	
The following was a guest post by Brewster Kahle on Against The Grain (ATG) – Linking Publishers, Vendors, & Librarians


On Bookstores, Libraries & Archives in the Digital Age-An ATG Guest Post


See the original article here on ATG’s website


By: Brewster Kahle, Founder & Digital Librarian, Internet Archive​​​​​​​


​​​Back in 2006, I was honored to give a keynote at the meeting of the Society of American Archivists, when the president of the Society presented me with a framed blown-up letter “S.”  This was an inside joke about the Internet Archive being named in the singular, Archive, rather than the plural Archives. Of course, he was right, as I should have known all along. The Internet Archive had long since grown out of being an “archive of the Internet”—a singular collection, say of web pages—to being “archives on the Internet,” plural.  My evolving understanding of these different names might help focus a discussion that has become blurry in our digital times: the difference between the roles of publishers, bookstores, libraries, archives, and museums. These organizations and institutions have evolved with different success criteria, not just because of the shifting physical manifestation of knowledge over time, but because of the different roles each group plays in a functioning society. For the moment, let’s take the concepts of Library and Archive.


The traditional definition of a library is that it is made up of published materials, while an archive is made up of unpublished materials. Archives play an important function that must be maintained—we give frightfully little attention to collections of unpublished works in the digital age. Think of all the drafts of books that have disappeared once we started to write with word processors and kept the files on fragile computer floppies and disks. Think of all the videotapes of lectures that are thrown out or were never recorded in the first place. 


Bookstores: The Thrill of the Hunt


Let’s try another approach to understanding distinctions between bookstores, libraries and archives. When I was in my 20’s living in Boston—before Amazon.com and before the World Wide Web (but during the early Internet)—new and used bookstores were everywhere. I thought of them as catering to the specialized interests of their customers: small, selective, and only offering books that might sell and be taken away, with enough profit margin to keep the store in business. I loved them. I especially liked the used bookstore owners—they could peer into my soul (and into my wallet!) to find the right book for me. The most enjoyable aspect of the bookstore was the hunt—I arrived with a tiny sheet of paper in my wallet with a list of the books I wanted, would bring it out and ask the used bookstore owners if I might go home with a bargain. I rarely had the money to buy new books for myself, but I would give new books as gifts. While I knew it was okay to stay for awhile in the bookstore just reading, I always knew the game.


Libraries: Offering Conversations not Answers


The libraries that I used in Boston—MIT Libraries, Harvard Libraries, the Boston Public Library—were very different. I knew of the private Boston Athenæum but I was not a member, so I could not enter. Libraries for me seemed infinite, but still tailored to individual interests. They had what was needed for you to explore and if they did not have it, the reference librarian would proudly proclaim: “We can get it for you!” I loved interlibrary loans—not so much in practice, because it was slow, but because they gave you a glimpse of a network of institutions sharing what they treasured with anyone curious enough to want to know more. It was a dream straight out of Borges’ imagination (if you have not read Borges’ short stories, they are not to be missed, and they are short. I recommend you write them on the little slip of paper you keep in your wallet.) I couldn’t afford to own many of the books I wanted, so it turned off that acquisitive impulse in me. But the libraries allowed me to read anything, old and new. I found I consumed library books very differently. I rarely even brought a book from the shelf to a table; I would stand, browse, read, learn and search in the aisles. Dipping in here and there. The card catalog got me to the right section and from there I learned as I explored. 


Libraries were there to spark my own ideas. The library did not set out to tell a story as a museum would. It was for me to find stories, to create connections, have my own ideas by putting things together. I would come to the library with a question and end up with ideas.  Rarely were these facts or statistics—but rather new points of view. Old books, historical newspapers, even the collection of reference books all illustrated points of view that were important to the times and subject matter. I was able to learn from others who may have been far away or long deceased. Libraries presented me with a conversation, not an answer. Good libraries cause conversations in your head with many writers. These writers, those librarians, challenged me to be different, to be better. 


Staying for hours in a library was not an annoyance for the librarians—it was the point. Yes, you could check books out of the library, and I would, but mostly I did my work in the library—a few pages here, a few pages there—a stack of books in a carrel with index cards tucked into them and with lots of handwritten notes (uh, no laptops yet).


But libraries were still specialized. To learn about draft resisters during the Vietnam War, I needed access to a law library. MIT did not have a law collection and this was before Lexis/Nexis and Westlaw. I needed to get to the volumes of case law of the United States.  Harvard, up the road, had one of the great law libraries, but as an MIT student, I could not get in. My MIT professor lent me his ID that fortunately did not include a photo, so I could sneak in with that. I spent hours in the basement of Harvard’s Law Library reading about the cases of conscientious objectors and others. 


But why was this library of law books not available to everyone? It stung me. It did not seem right. 


A few years later I would apply to library school at Simmons College to figure out how to build a digital library system that would be closer to the carved words over the Boston Public Library’s door in Copley Square:  “Free to All.”  


Archives: A Wonderful Place for Singular Obsessions


When I quizzed the archivist at MIT, she explained what she did and how the MIT Archives worked. I loved the idea, but did not spend any time there—it was not organized for the busy undergraduate. The MIT Library was organized for easy access; the MIT Archives included complete collections of papers, notes, ephemera from others, often professors. It struck me that the archives were collections of collections. Each collection faithfully preserved and annotated.  I think of them as having advertisements on them, beckoning the researcher who wants to dive into the materials in the archive and the mindset of the collector.


So in this formulation, an archive is a collection, archives are collections of collections.  Archivists are presented with collections, usually donations, but sometimes there is some money involved to preserve and catalog another’s life work. Personally, I appreciate almost any evidence of obsession—it can drive toward singular accomplishments. Archives often reveal such singular obsessions. But not all collections are archived, as it is an expensive process.


The cost of archiving collections is changing, especially with digital materials, as is cataloging and searching those collections. But it is still expensive. When the Internet Archive takes on a physical collection, say of records, or old repair manuals, or materials from an art group, we have to weigh the costs and the potential benefits to researchers in the future. 


Archives take the long view. One hundred years from now is not an endpoint, it may be the first time a collection really comes back to light.


Digital Libraries: A Memex Dream, a Global Brain


So when I helped start the Internet Archive, we wanted to build a digital library—a “complete enough” collection, and “organized enough” that everything would be there and findable. A Universal Library. A Library of Alexandria for the digital age. Fulfilling the memex dream of Vanevar Bush (do read “As We May Think“), of Ted Nelson‘s Xanadu, of Tim Berners-Lee‘s World Wide Web, of Danny Hillis‘ Thinking Machine, Raj Reddy’s Universal Access to All Knowledge, and Peter Russell’s Global Brain.


Could we be smarter by having people, the library, networks, and computers all work together?  That is the dream I signed on to.  I dreamed of starting with a collection—an Archive, an Internet Archive. This grew to be  a collection of collections: Archives. Then a critical mass of knowledge complete enough to inform citizens worldwide: a Digital Library. A library accessible by anyone connected to the Internet, “Free to All.”


About the Author: Brewster Kahle, Founder & Digital Librarian, Internet Archive


Brewster Kahle


A passionate advocate for public Internet access and a successful entrepreneur, Brewster Kahle has spent his career intent on a singular focus: providing Universal Access to All Knowledge. He is the founder and Digital Librarian of the Internet Archive, one of the largest digital libraries in the world, which serves more than a million patrons each day. Creator of the Wayback Machine and lending millions of digitized books, the Internet Archive works with more than 800 library and university partners to create a free digital library, accessible to all.


Soon after graduating from the Massachusetts Institute of Technology where he studied artificial intelligence, Kahle helped found the company Thinking Machines, a parallel supercomputer maker. He is an Internet pioneer, creating the Internet’s first publishing system called Wide Area Information Server (WAIS). In 1996, Kahle co-founded Alexa Internet, with technology that helps catalog the Web, selling it to Amazon.com in 1999.  Elected to the Internet Hall of Fame, Kahle is also a Fellow of the American Academy of Arts and Sciences, a member of the National Academy of Engineering, and holds honorary library doctorates from Simmons College and University of Alberta.


						Posted in Discussion, Librarianship, Uncategorized

						| Comments closed

					
					Amplifying the voices behind books


	By mek

	 | 

	Published: September 2, 2020

	
Exploring how Open Library uses author data to help readers move from imagination to impact


By Nick Norman, Edited by Mek & Drini


Image Source: Pexels / Pixabay from popsugar


According to René Descartes, a creative mathematician, “The reading of all good books is like a conversation with the finest [people] of past centuries.” If that’s true, then who are some of the people you’re talking to?


If you’re not sure how to answer that question, you’ll definitely appreciate the ‘Author Stats’ feature  developed  by Open Library.


A deep dive into author stats


Author stats give readers clear insights about their favorite authors that go much deeper than the front cover: such as birthplace, gender, works by time, ethnicity, and country of citizenship. These bits and pieces of knowledge about authors can empower readers in some dynamic ways. But how exactly?


To answer that question, consider a reader who’s passionate about the topic of cultural diversity. However, after the reader examines their personalized author stats, they realize that their reading history lacks diversity. This doesn’t mean the reader isn’t passionate about cultural diversity; rather, author stats empowers the reader to pinpoint specific stats that can be diversified.


Take a moment … or a day, and think about all the books you’ve read — just in the last year or as far back as you can. What if you could align the pages of each of those books with something meaningful … something that matters? What if each time you cracked open a book, the voices inside could point you to places filled with hope and opportunity?


According to Drini Cami — Open Library’s lead developer behind Author Stats ,  “These stats let readers determine where the voices they read are coming from.” Drini continues saying, “A book can be both like a conversation as well as a journey.” He also says, “Statistics related to the authors might help provide readers with feedback as to where the voices they are listening to are coming from, and hopefully encourage the reading of books from a wider variety of perspectives.” Take a moment to let that sink in.


Data with the power to change


While Open Library’s author stats can show author-related demographics, those same stats can do a lot more than that. Drini Cami went on to say that, “Author stats can help readers intelligently alter their  behavior (if they wish to).” A profound statement that Mark Twain — one of the best writers in American history — might even shout from the rooftop.


Broad, wholesome, charitable views of [people] … cannot be acquired by vegetating in one little corner of the earth all one’s lifetime. — Mark Twain


In the eyes of Drini Cami and Mark Twain, books are like miniature time machines that have the power to launch readers into new spaces while changing their behaviors at the same time. For it is only when a reader steps out of their corner of the earth that they can step forward towards becoming a better person — for the entire world.


Connecting two worlds of data


Open Library has gone far beyond the extra mile to provide data about author demographics that some readers may not realize. It started with Open Library’s commitment to providing its readers with what Drini Cami describes as “clean, organized, structured, queryable data.” Simply put, readers can trust that Open Library’s data can be used to provide its audiences with maximum value. Which begs the question, where is all that ‘value’ coming from?


Drini Cami calls it “linked data”. In not so complex terms, you may think of linked data as being two or more storage sheds packed with data. When these storage sheds are connected, well… that’s when the magic happens. For Open Library, that magic starts at the link between Wikidata and Open Library knowledge bases.


Wikidata, a non-profit community-powered project run by Wikimedia, the same team which brought us Wikipedia, is a “free and open knowledge base that can be read and edited by both humans and machines”. It’s like Wikipedia except for storing bite-sized encyclopedic data and facts instead of articles. If you look closely, you may even find some of Wikidata’s data being leveraged within Wikipedia articles.


	Wikipedia’s Summary Info Box
	Source data in Wikidata


Wikidata is where Open Library gets its author demographic data from. This is possible because the entries on Wikidata often include links to source material such as books, authors, learning materials, e-journals, and even to other knowledge bases like Open Library’s. Because of these links, Open Library is able to share its data with Wikidata and often times get back detailed information and structured data in return. Such as author demographics.


Wrangling in the Data


Linking-up services like Wikidata and Open Library doesn’t happen automatically. It requires the hard work of “Metadata Wranglers”. That’s where Charles Horn comes in, the lead Data Engineer at Open Library — without his work, author stats would not be possible.


Charles Horn works closely with Drini Cami and also the team at Wikidata to connect book and author resources on Open Library with the data kept inside Wikidata. By writing clever bots and scripts, Charles and Drini are able to make tens of thousands of connections at scale. To put it simply, as both Open Library and Wikidata grow, their resources and data will become better connected and more accurate. 


Thanks to the help of “Metadata Wranglers”, Open Library users will always have the smartest results — right at their fingertips. 


It’s in a book …


Once Upon a Time, ten-time Grammy Award Winner Chaka Kahn greeted television viewers with her bright voice on the once-popular book reading program, Reading Rainbow. In her words, she sang … “Friends to know, and ways to grow, a Reading Rainbow. I can be anything. Take a look, it’s in a book …”


Thanks to Open Library’s author stats, not only do readers have the power to “take a look” into books, they can see further, and truly change what they see.


Try browsing your author stats and consider following Open Library on twitter.


The “My Reading Stats” option may be found under the “My Books” drop down menu within the main site’s top navigation.


What did you learn about your favorite authors? Please share in the comments below.


						Posted in Community, Cultural Resources, Data

						| Comments closed

					
					Giacomo Cignoni: My Internship at the Internet Archive


	By Drini Cami

	 | 

	Published: August 29, 2020

	
This summer, Open Library and the Internet Archive took part in Google Summer of Code (GSoC), a Google initiative to help students gain coding experience by contributing to open source projects. I was lucky enough to mentor Giacomo while he worked on improving our BookReader experience and infrastructure. We have invited Giacomo to write a blog post to share some of the wonderful work he has done and his learnings. It was a pleasure working with you Giacomo, and we all wish you the best of luck with the rest of your studies! – Drini


Hi, I am Giacomo Cignoni, a 2nd year computer science student from Italy. I submitted my 2020 Google Summer of Code (GSoC) project to work with the Internet Archive and I was selected for it. In this blogpost, I want to tell you about my experience and my accomplishments working this summer on BookReader, Internet Archive’s open source book reading web application.


The BookReader features I enjoyed the most working on are page filters (which includes “dark mode”) and the text selection layer for certain public domain books. They were both challenging, but mostly had a great impact on the user experience of Bookreader. The first allows text to be selected and copied directly from the page images (currently in internal testing), and the second permits turning white-background black-text pages into black-background-white-text ones.


Short summary of implemented features:


	End-to-end testing (search, autoplay, right-to-left books)
	Generic book from Internet Archive demo
	Mobile BookReader table of contents
	Checkbox for filters on book pages (including dark mode)
	Text selection layer plugin for public domain books
	Bug fixes for page flipping
	Using high resolution book images bug fix


First approach to GSoC experience


Once I received the news that I had been selected for GSoC with Internet Archive for my BookReader project, I was really excited, as it was the beginning of a new experience for me. For the same reason, I will not hide that I was a little bit nervous because it was my first internship-like experience. Fortunately, even from the start, my mentor Drini and also Mek were supportive and also ready to offer help. Moreover, the fact that I was already familiar with BookReader was helpful, as I had already used it (and even modified it a little bit) for a personal project.


For most of the month of May, since the 6th, the day of the GSoC selection, I mainly focused on getting to know the other members of the UX team at Internet Archive, whom I would be working with for the rest of the summer, and also define a more precise roadmap of my future work with my mentor, as my proposed project was open to any improvements for BookReader.


End to end testing


The first tasks I worked on, as stated in the project, were about end-to-end testing for BookReader. I learned about the Testcafe tool that was to be used, and my first real task was to remove and explore some old QUnit tests (#308). Then I started to make end-to-end tests for the search feature in BookReader, both for desktop (#314) and mobile (#322). Lastly, I fixed the existent autoplay end-to-end test (#344) that was causing problems and I also had prepared end-to-end tests for right-to-left books (#350), but it wasn’t merged immediately because it needed a feature that I would have implemented later; a system to choose different books from the IA servers to be displayed specifying the book id in the URL.


This work on testing (which lasted until the ~20th of June) was really helpful at the beginning as it allowed me to gain more confidence with the codebase without trying immediately harder tasks and also to gain more confidence with JavaScript ES6. The frequent meetings with my mentor and other members of the team made me really feel part of the workplace.


Working on the source code


The table of contents panel in BookReader mobile


My first experience working on core BookReader source code was during the Internet Archive hackathon on May the 30th when, with the help of my mentor, I created the first draft for the table of content panel for mobile BookReader. I would then resume to work on this feature in July, refining it until it was released (#351). I then worked on a checkbox to apply different filters to the book page images, still on mobile BookReader (#342), which includes a sort of “dark mode”. This feature was probably the one I enjoyed the most working on, as it was challenging but not too difficult, it included some planning and was not purely technical and received great appreciation from users.


 Page filters for BookReader mobile let you read in a “dark mode”


https://twitter.com/openlibrary/status/1280184861957828608


Then I worked on the generic demo feature; a particular demo for BookReader which allows you to choose a book  from the Internet Archive servers to be displayed, by simply adding the book id in the URL as a parameter (#356). This allowed the right to left e2e test to be merged and proved to be useful for manually testing the text selection plugin. In this period I also fixed two page flipping issues: one more critical (when flipping pages in quick succession the pages started turning back and forth randomly) (#386), and the other one less urgent, but it was an issue a user specifically pointed out (in an old BookReader demo it was impossible to turn pages at all) (#383). Another issue I solved was BookReader not correctly displaying high resolution images on high resolution displays (#378).


Open source project experience


One aspect I really enjoyed of my GSoC is the all-around experience of working on an open source project. This includes leaving more approachable tasks for the occasional member of the community to take on and helping them out. Also, I found it interesting working with other members of the team aside from my mentor, both for more technical reasons and for help in UI designing and feedback about the user experience: I always liked having more points of view about my work. Moreover, direct user feedback from the users, which showed appreciation for the new implemented features (such as BookReader “dark mode”), was very motivating and pushed me to do better in the following tasks.


Text selection layer


The normally invisible text layer shown red here for debugging


The biggest feature of my GSoC was implementing the ability to select text directly on the page image from BookReader for public domain books, in order to copy and paste it elsewhere (#367). This was made possible because Internet Archive books have information about each word and its placement in the page, which is collected by doing OCR. To implement this feature we decided to use an invisible text layer placed on top of the page image, with words being correctly positioned and scaled. This made it possible to use the browser’s text selection system instead of creating a new one. The text layer on top of the page was implemented using an SVG element, with subelements for each paragraph and word in the page. The use of the SVG instead of normal html text elements made it a lot easier to overcome most of the problems we expected to find regarding the correct placement and scaling of words in the layer.


I started working sporadically on this feature since the start of July and this led to having a workable demo by the first day of August. The rest of the month of August was spent refining this feature to make it production-ready. This included refining word placement in the layer, adding unit tests, adding support for more browsers, refactoring some functions, making the experience more fluid, making the selected text to be accurate for newlines and spaces on copy. The most challenging part was probably to integrate well the text selection actions in the two page view of BookReader, without disrupting the click-to-flip-page and other functionalities related to mouse-click events.


This feature is currently in internal testing, and scheduled for release in the next few weeks.


The text selection experience


Conclusions


Overall, I was extremely satisfied with my GSoC at the Internet Archive. It was a great opportunity to learn new things for me. I got much more fluent in JavaScript and CSS, thanks to both my mentor and using these languages in practice while coding. I learnt a lot about working on an open source project, but a part that I probably found really interesting was attending and participating in the decision making processes, even about projects I was not involved in. It was also interesting for me to apply concepts I had studied on a more theoretical level at university in a real workplace environment.


To sum things up, the ability to work on something I liked that had an impact on users and the ability to learn useful things for my personal development really made this experience worthwhile for me. I would 100% recommend doing a GSoC at the Internet Archive!


						Posted in BookReader, Community, Google Summer of Code (GSoC), Open Source

						| Comments closed

					
					Google Summer of Code 2020: Adoption by Book Lovers


	By mek

	 | 

	Published: August 29, 2020

	
by Tabish Shaikh & Mek


OpenLibrary.org,the world’s best-kept library secret: Let’s make it easier for book lovers to discover and get started with Open Library.


Hi, my name is Tabish Shaikh and this summer I participated in the Google Summer of Code program with Open Library to develop improvements which will help book lovers discover and use OpenLibrary.org.


My Journey into Open Source


When I got to college, I could tell classes would not be enough to help me get the hands on experience I would need to gain confidence in my programming abilities. I heard from friends and professors within my university that open source projects presented a great opportunity to work with established engineers in the field to gain hands-on experience.


In the past, I tried contributing to a few well known open source projects, like Wikipedia. I selected Wikipedia because the community is large, active, and well established, there’s a lot of documentation, and the project is in a programming language I know well.


I quickly became overwhelmed. Wikipedia may be well established, but a project of that size felt difficult to navigate without a mentor to guide me. I was able to successfully set up my environment, but then I had trouble finding an appropriate first issue to work on and hit a dead end as I tried to familiarize myself with the code. I found myself wishing for a chance to work more closely with the community.


One evening in March of 2018, I was searching for a free algorithms book on Google and discovered Open Library. I had trouble finding the exact book I was looking for, but I could tell Open Library was an important library resource for accessing free books online and I noted their dated design as a big opportunity for improvement. So I bookmarked the page in my browser and was surprised to discover a “Help Us” button. I clicked the button and landed on a github issue which mentions their community calls. This gave me confidence there was a community which could help me get started and answer my questions, so I decided to give it a shot.


The community calls gave me a guided path for positively improving the experience of patrons using the service. During the community calls, members present what they’ve completed, what they’re working on, and what they may be stuck with. In reality, this is a way to be seen for your achievements, update others, and receive help. Having this type of structure helped me discover which appropriate opportunities exist, how to approach and plan to solve the problem.


This experience was really special to me because it was the first time I had been part of an international community and all of the members were aligned toward a common goal of universal access to knowledge.


In the first few months of volunteering I redesigned the website footer and made several pull requests. I also noticed Salman was participating in Google Summer of Code (GSoC) in 2018. I applied to work with Open Library for GSoC in 2019 and was disappointed to learn the Internet Archive didn’t have enough slots for Open Library to participate. Fortunately, I worked with Mek, Open Library’s program lead, who recognized my contributions and arranged an “Internet Archive Summer of Code” (IASoC) internship program where we accomplished a major victory of releasing the sponsorship program which empowers the community to make meaningful, diverse books more available to borrow. You can read the blog post here which was picked up by BoingBoing and Gizmodo.


Noticing a Problem


During my years volunteering, we recognized several indicators that Open Library could be better serving its mission by distributing to a larger audience. Open Library, which has millions of free books to borrow, has an international alexa rank of  #11,079, compared to Goodreads which is a top #300 website without having books to borrow. The data also showed many patrons would drop off at the registration page because it didn’t offer immediate field validation and the fields would be cleared upon submit if, e.g. an email was already registered. The book pages, our most frequently viewed pages, were also very slow to load, causing patrons to drop-off. Also the experience of the book pages was confusing because there were separate views for Works and Editions. Because of all these factors, only around 6% of the Internet Archive’s books were checked out, meaning 94% of the catalog remained underutilized.


I applied to GSoC 2020 with a plan, “Adoption By Book Lovers” to resolve some of these key issues, help more people like myself discover and derive value from the Open Library, and hopefully improve their first experience in the process.


Placing our bets


In the service of helping more patrons discover Open Library, increasing our utilization and engagement, and decreasing confusion and bad experiences, we made 5 key bets:


	Improving Sign Up
	Book Page Redesign
	Shareable Profiles & Public Reading Log
	Imports & Exports
	Twitter Bot


There’s a common saying, “the first impression is the last impression”. This has certainly been true for many patrons attempting to sign up for an Open Library account. The easiest, surest way to help more patrons derive value from the Open Library platform is by Improving Sign Up; reducing the friction and early negative first impressions during account creation.


Open Library’s mission for 2019 was “Reducing bad experiences, confusion, & dead-ends”. By combining our Works and Editions pages into a single more performant Book Page Redesign we believed we’d reduce the confusion of users searching for their favourite books and in turn, also increase distribution. The DoubleClick study by Google shows that 53% of patrons drop off if page load is exceeds 3 seconds and this carries significant SEO penalties. While redesigning our Book Page, a key consideration was page-load performance because we knew this would increase our rank in search engine results and increase retention through the lending and registration funnels.


Finally, by betting on social features, like shareable profiles and public-by-default reading logs, the ability to import books from Goodreads, and a twitter @borrowbot to help patrons discover which books are available to read and borrow on Open Library, we felt confident we could increase the number of patrons that may discover and adopt OpenLibrary.org.


Improving Signup


In 2018, we coincidentally, hit a regression #1431 to our account creation page which presented itself as a server error for patrons trying to register a new account when their username or email was already taken.


Because of this bug, our daily registered users dropped from ~2300 to ~1700 (-500). Through this, we discovered that nearly 1/5 of patrons (i.e. 500 a day) who attempted registration would hit some validation issue when creating their account (e.g. email or username invalid or taken, recaptcha broken). Even after solving the #1431 regression, we hypothesized that many of these 500 patrons were hitting error-cases which refreshed the page and cleared their form inputs, causing patrons to bounce. An easy solution was adding real-time validation to ensure emails, usernames, passwords, and recaptcha are valid before submitting the form.


In order to implement real time validation, we planned Epic #1433 which included two pieces: 


	#2053 – update backend API endpoints
	#2055 – Add real-time field validation for email, username, password to show errors before submission.


While we do not have great analytics on how conversion increased, we do know from our support channels that these changes have anecdotally resulted in a significant decrease in support emails around patron signup.


	Try it out 👉	https://openlibrary.org/account/create


	Issue:	https://github.com/internetarchive/openlibrary/issues/3256


	Pull Request:	https://github.com/internetarchive/openlibrary/pull/3452


Book Page Redesign


User interviews and surveys have taught us that most patrons who visit Open Library are trying to find a “Book”. Many patrons report that the terms Work and Edition may confuse their experience. This confusion is increased because a user can unpredictably be dropped into either a Work page or an Edition page which have different designs.


Our goal in redesigning the books page was to increase clarity of the experience and improve page loading times. To improve clarity and simplify the experience, we merged the work and edition pages to a single book page where patrons may find all the information about a work and learn about the availability of various editions without having to navigate multiple pages.  


When redesigning the Book Page, we made the following changes:


	Editions table. We made the editions table front-and-center to enable readers to quickly switch between the different editions. We also feature editions by availability and language, and allow patrons to change how many results are shown at a time. We added a new search box to enable patrons to find relevant editions without reloading. 
	Navigation tabs. We have bucketed the work’s information into an “Overview” tab and the current Edition’s information in the “This Edition” tab. The tab bar always sticks to the top of the page for easy access to different sections of the page.
	Expandable descriptions. In previous designs, long text descriptions made it difficult to see all important book information at a glance. There are now “Read more” links to expand and collapse long descriptions.
	Clearer buttons. All the favorite actions of readers such as borrowing, searching inside, adding books to one’s reading log, and book star ratings have been grouped together and moved right below the book cover. It’s hopefully more clear now that the “Want to Read”
	Load times. We know page speed is a priority for readers. The new Books Page should be significantly faster (Lazy Loading of Related Works Carousel).


Considerations. We tried to change as little as possible and were careful not to remove existing functionality:


	URLs: Developers and partners will be happy to hear that /works and /books urls and APIs will continue to work as expected without change. Both the work and edition pages will simply appear to use the same consistent design.
	Lists: While admittedly slightly less convenient, you can still add Works to Lists by clicking the “Use this Work” checkbox as shown below. By default, Lists will use Editions.


I had always worked in small teams with not a lot of stakeholders and no clash of ideas. The Books Page Redesign was one in which the issue was open for 3 years and it was being stalled due to clash of interests in how we should display our pages. Completing this issue was a major milestone in my GSoC program where I learned to cooperate and compromise on some aspects of our design so that all stakeholders were happy.


The feedback we received from our patrons was that ~65% patrons found the New Books Page a step forward, ~17% did not have any preference and ~22% found the change a step backward. Therefore we think our hypothesis was correct and this feature would improve user experience and reduce user confusion. 


Read more about the Book Page Redesign: https://blog.openlibrary.org/2020/07/08/re-thinking-open-librarys-book-pages/


	Try it out 👉	https://openlibrary.org/works/OL76837W/The_Da_Vinci_Code


	Issues:	https://github.com/internetarchive/openlibrary/issues/684
	https://github.com/internetarchive/openlibrary/issues/3556


	Pull Request:	https://github.com/internetarchive/openlibrary/pull/3553


Additional Book Page improvements


After completing the Book Page redesign, we made two major improvements to help our Librarian community and to improve performance and load times: a better book /edit experience and Lazy Loading of expensive book page components (e.g. related works carousels).


Book Page Editor. We redesigned the Books Page Editor to enable our librarians edit book metadata with ease. 


Lazy Loading of Related Carousels. To improve the page loading time we firstly created a list of components and their timings and noticed that the Related Works and Author Works took the most time to load thereby slowing down the page for up to 10%. Therefore our hypothesis was to lazy load related works carousel which would then enable our newly designed books pages to load faster. 


The impact of this change was that now pages load up to 10% faster:


	Issues:	https://github.com/internetarchive/openlibrary/issues/3577
	https://github.com/internetarchive/openlibrary/pull/3697


Shareable Profiles & Public Reading Logs 


We noticed that very few patrons share their reading logs or even know they can be shared. However, we know patrons on Goodreads share their reading logs frequently. And also, lists on Open Library are shared all the time. Why is this?


In 2017, when Open Library announced the new Reading Log feature, it was set to be private by default. We expected many patrons would change their reading logs to be public, but because it wasn’t public by default and difficult to discover, patrons didn’t know the feature existed and had no reason to make it public.


In the spirit of being an open platform, we wanted patrons to have the opportunity to make their reading logs public to patrons with similar interests. As a result, we decided to make Reading Logs public by default for new accounts created after 2020-05, with the option for any patron to set their reading logs to private. Even after making this change, we noticed patrons trying to share their generic /account/books page, however this page always reflects the content of the currently logged in user.


By always redirecting /account/books to the publicly shareable /people/username url, we are able to move in a direction which enables patrons to freely share their reading logs and paves the way for other features like “following”, which we’re interested in exploring next year.


Enabling these change required:


	Modifying the user registration page (front-end) + https://github.com/internetarchive/openlibrary/blob/master/openlibrary/plugins/upstream/account.py#L203 (back-end) to support enabling this setting from the account creation form.
	For enabling redirects – Adding a redirect from /account/books page to /people/username and dealing with conditions for public/private reading log.


This change simplified how users share their reading log and profile pages publicly paving a path for more social additions to Open Library.  


	Try it out 👉 	https://openlibrary.org/people/tab91/books/want-to-read, https://openlibrary.org/people/tab91


	Issues:	https://github.com/internetarchive/openlibrary/issues/2058
	https://github.com/internetarchive/openlibrary/issues/3025
	https://github.com/internetarchive/openlibrary/issues/3461


	Pull Requests:	https://github.com/internetarchive/openlibrary/pull/3051
	https://github.com/internetarchive/openlibrary/pull/3476
	https://github.com/internetarchive/openlibrary/pull/3465


Imports & Exports


Goodreads provides a way to download/export a list of books from one’s bookshelves. This feature would allow a user to take an exported dump of their reading log from Goodreads and then add each of these books to their Open Library account.


The Goodreads import feature from https://openlibrary.org/account/import


The export options enables patrons to download a list of Open Library book identifiers from their reading log. 


The download export option from https://openlibrary.org/account/import


A picture of a CSV file crated by the exporter


	Try it out 👉 	https://openlibrary.org/account/import


	Issue:	https://github.com/internetarchive/openlibrary/issues/835
	https://github.com/internetarchive/openlibrary/issues/3661


	Pull Request: 	https://github.com/internetarchive/openlibrary/pull/3597
	https://github.com/internetarchive/openlibrary/pull/3662 


Twitter Bot


Our objective for this task was how do we reach more patrons/readers and help them discover more books on openlibrary.org? According to the hashtag analytics audit done on tweetbinder.com on hashtags #books #amazon using the free version the analytics show that in a 7 day period the number of original tweets(excluding retweets) was approx. 140 with a number impact of 11M. Therefore this is a great opportunity for making our bookshelves discoverable.


Whenever a user tweets out a book with the amazon link/ an ISBN, the twitter @borrowbot would retweet the book with the link from Open Library if it is available. The book will be tweeted only once.


	Try it out 👉 	tweet @borrowbot ISBN/ Amazon link 


	Related Issue:	https://github.com/internetarchive/openlibrary/issues/3255


	Code:	https://github.com/tabshaikh/openlibrary-bots/tree/twitter-bot


Impact


	In no small part because of the bets we made, our international Alexa rank improved by 10% from #11,079 to #9,893.
	Our Book Page load times improved on average by ~10%.
	2 out of 3 of our patrons approved of our Book Page redesign, with 11% celebrating it as game changer.
	More than 5,000 books have already been imported through the Goodreads import tool
	Support team reports significant decrease in account creation support emails


What I learned


I always looked for ways to improve my work and have always loved constructive feedback from my mentor Mek who helped me learn how to estimate time for tasks, effectively identify stakeholders and include them in the process (reaching consensus on decisions was a lot harder than I anticipated), and how to communicate problems and achievements in a way which everyone may understand. Also, writing takes a long time and it’s easy to want to code until the deadline. As our founder Brewster Kahle says, “work backwards from the blog post”. 


I also had the privilege of applying what I’ve learned to be a mentor for both Sachin Naik (#3627 #3622 ) and Fatima (#3454) within our community and helping them submit some of their first pull requests for Open Library


						Posted in Bulk Access, Community, Google Summer of Code (GSoC), Open Source

						| Comments closed

					
							Open Library is an initiative of the Internet Archive, a 501(c)(3) non-profit, building a digital library of Internet sites and other cultural artifacts in digital form.
Other projects include the Wayback Machine, archive.org and  archive-it.org. Your use of the Open Library is subject to the Internet Archive's Terms of Use. 


				« Older posts

					
				Search


			Recent Posts

			
					Importing your Goodreads & Accessing them with Open Library’s APIs
									
	
					On Bookstores, Libraries & Archives in the Digital Age
									
	
					Amplifying the voices behind books
									
	
					Giacomo Cignoni: My Internship at the Internet Archive
									
	
					Google Summer of Code 2020: Adoption by Book Lovers
									

	Archives

		Archives
		Select Month
 December 2020 
 October 2020 
 September 2020 
 August 2020 
 July 2020 
 May 2020 
 November 2019 
 October 2019 
 January 2019 
 October 2018 
 August 2018 
 July 2018 
 June 2018 
 May 2018 
 March 2018 
 December 2017 
 October 2016 
 June 2016 
 May 2016 
 February 2016 
 January 2016 
 November 2015 
 February 2015 
 January 2015 
 December 2014 
 November 2014 
 October 2014 
 August 2014 
 July 2014 
 June 2014 
 May 2014 
 April 2014 
 March 2014 
 April 2013 
 January 2013 
 August 2012 
 December 2011 
 November 2011 
 October 2011 
 July 2011 
 June 2011 
 May 2011 
 April 2011 
 March 2011 
 February 2011 
 January 2011 
 December 2010 
 November 2010 
 October 2010 
 September 2010 
 August 2010 
 July 2010 
 June 2010 
 May 2010 
 April 2010 
 March 2010 
 February 2010 
 January 2010 
 December 2009 
 November 2009 
 October 2009 
 September 2009 
 August 2009 
 July 2009 
 June 2009 
 May 2009 
 April 2009 
 March 2009 
 February 2009 
 January 2009 
 December 2008 
 November 2008 


   			Theme customized from Thematic Theme Framework.