Monday, February 22, 2021

Open Access for Backlist Books, Part II: The All-Stars

Libraries know that a big fraction of their book collections never circulate, even once. The flip side of this fact is that a small fraction of a library's collection accounts for most of the circulation. This is often referred to as Zipf's law; as a physicist I prefer to think of it as another manifestation of log-normal statistics resulting from a preferential attachment mechanism for reading. (English translation: "word-of-mouth".)
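
To see how word-of-mouth produces these lopsided statistics, here's a minimal simulation sketch in Python; the parameters (collection size, checkout count, discovery rate) are invented for illustration:

```python
import random

def simulate_circulation(n_books=100_000, n_checkouts=1_000_000, discovery=0.05):
    """Toy preferential-attachment model: most checkouts go to a book chosen
    in proportion to its past checkouts ("word-of-mouth"); a small fraction
    go to a book chosen uniformly at random ("discovery")."""
    checkouts = [0] * n_books
    history = []  # one entry per past checkout, so sampling from it
                  # is sampling in proportion to past popularity
    for _ in range(n_checkouts):
        if history and random.random() > discovery:
            book = random.choice(history)     # word-of-mouth
        else:
            book = random.randrange(n_books)  # random discovery
        checkouts[book] += 1
        history.append(book)
    return checkouts

counts = sorted(simulate_circulation(), reverse=True)
top_1_percent = sum(counts[: len(counts) // 100])
print(f"top 1% of books: {100 * top_1_percent / sum(counts):.0f}% of circulation")
print(f"books that never circulate: {counts.count(0):,}")
```

Run it and a tiny fraction of the collection soaks up most of the circulation, while a long tail never circulates at all - exactly the pattern librarians observe.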

In my post about the value of Open Access for books, I suggested that usage statistics (circulation, downloads, etc.) are a useful proxy for the value that books generate for their readers. The logical conclusion is that the largest amount of value to be gained from opening the backlist comes from the books that are most used, the "all-stars" of the library, not the discount rack or the discards. If libraries are to provide funding for Open Access backlist books, shouldn't they focus their resources on the books that create the most value?

The question, of course, is how the library community would ever convince publishers, who have monopolies on these books as a consequence of international copyright laws, to convert these books to Open Access. Although some sort of statutory licensing or fair-use carve-outs could eventually do the trick, I believe that Open Access for a significant number of "backlist All-Stars" can be achieved today by pushing ALL the buttons available to supporters of Open Access. Here's where the Open Access movement can learn from the game (and business) of baseball.

"Baseball", Henry Sandham, L. Prang & Co. (1861).
  from Digital Commonwealth


Baseball's best player, Mike Trout, should earn $33.25 million this year, a bit over $205,000 per regular season game. If he's chosen for the All-Star game, he won't get even a penny extra to play unless he's named MVP, in which case he earns a $50,000 bonus. So why would he bother to play for free? It turns out there are lots of reasons. The most important have everything to do with the recognition and honor of being named an All-Star, and with having respect for his fans. But being an All-Star is not without financial benefits, considering endorsement contracts and earning potential outside of baseball. Playing in the All-Star game is an all-around no-brainer for Mike Trout.

Open Access should be an All-Star game for backlist books. We need to create community-based award programs that recognize and reward backlist conversions to OA. If the world's libraries want to spend $50,000 on backlist physics books, for example, isn't it better to spend it on the Mike Trout of physics books than on a team full of discount-rack replacement-level players?

Competent publishers would line up in droves for major-league all-star backlist OA programs. They know that publicity will drive demand for their print versions (especially if NC licenses are used). They know that awards will boost their prestige, and if they're trying to build Open Access publication programs, prestige and quality are a publisher's most important selling points.

The Newbery Medal

Over a hundred backlist books have been converted to open access already this year. Can you name one of them? Probably not, because the publicity value of existing OA conversion programs is negligible. To relicense an All-Star book, you need an all-star publicity program. You've heard of the Newbery Medal, right? You've seen the Newbery Medal sticker on children's books, maybe even special sections for them in bookstores. That prize, awarded by the American Library Association every year to honor the most distinguished contributions to American literature for children, is a powerful driver of sales. The winners get feted in a gala banquet and party (at least they did in the before-times). That's the sort of publicity we need to create for open access books.

If you doubt that "All-Star Open Access" could work, don't discount the fact that it's also the right thing to do. Authors of All-Star backlist books want their books to be used, cherished and remembered. Libraries want books that measurably benefit the communities they serve. Foundations and governmental agencies want to make a difference. Even publishers who look only at their bottom lines can structure a rights conversion as a charitable donation to reduce their tax bills.

And did I mention that there could be Gala Award Celebrations? We need more celebrations, don't you think?

If your community is interested in creating an Open Access program for backlist books, don't hesitate to contact me at the Free Ebook Foundation!

Notes

I've written about the statistics of book usage here, here and here.

This is the third in a series of posts about creating value with Open Access books. The first two are:

Tuesday, February 16, 2021

Open Access for Backlist Books, Part I: The Slush Pile

"Kale emerging from a slush pile"
(CC BY, Eric Hellman)
Book publishers hate their "slush pile": books submitted for publication unsolicited, rarely with literary merit and unlikely to make money for the publisher if accepted. In contrast, book publishers love their backlist; a strong backlist is what allows a book publisher to remain consistently profitable even when most of their newly published books fail to turn a profit. A publisher's backlist typically consists of a large number of "slushy" books that generate negligible income and a few steady "evergreen" earners. Publishers don't talk much about the backlist slush pile, maybe because it reminds them of their inability to predict a book's commercial success.

With the advent of digital books have come new possibilities for generating value from the backlist slush pile. Digital books can be kept "in print" at essentially no cost (printed books need warehouse space), which has allowed publishers to avoid rights reversion in many cases. Some types of books can be bundled in ebook aggregations that can be offered on a subscription basis. This is reminiscent of the way investment bankers created valuable securities by packaging junk bonds with opaque derivatives.

Open access is a more broadly beneficial way to generate value from the backlist slush pile. There is a reason that libraries keep large numbers of books on their shelves even when they don't circulate for years. The myriad ways that books can create value don't have to be tied to book sales, as I wrote in my previous post.

Those of us who want to promote Open Access for backlist ebooks have a number of strategies at our disposal. The most basic strategy is to promote the visibility of these books. Libraries can add listings for these ebooks in their catalogs. Aggregators can make these books easier to find.

Switching backlist books to Open Access licenses can be expensive and difficult. While the cost of digitization has dropped dramatically over the past decade, quality control is still a significant conversion expense. Licensing-related expenses are sometimes large. Unlike journals and journal articles, academic books are typically covered by publishing agreements that give authors royalties on sales and licensing, and give authors control over derivative works such as translations. No publisher would agree to OA relicensing without the consent and support of the author. For older books, a publisher may not even have electronic rights (in the US, the Tasini decision established that electronic rights are separate from print rights), or may need to have a lawyer interpret the language of the original publishing contract.

While most scholarly publishers obtain worldwide rights to the books they publish, rights for trade books are very often divided among markets. Open-access licenses such as the Creative Commons licenses are not limited to markets, so a license conversion would require the participation of every rights holder worldwide. 

The CC BY license can be problematic for books containing illustrations or figures used by permission from third-party rights holders. "All Rights Reserved" illustrations are often included in Open Access books, but they are carved out of the license by separate rights statements, and to be safe, publishers use the CC BY-ND or CC BY-NC-ND license for the complete book, as the permissions do not cover derivative works. Since the CC BY license allows derivative works, it cannot be used in cases where translation rights have been sold (without also buying out the translation rights). Similarly, a publisher cannot use a CC BY license for a translated work without also having rights to the original work.

The bottom line is that converting a backlist book to OA often requires economic motivation quite apart from any lost sales. Luckily, there's evidence that opening access can lead to increased sales. Nagaraj and Reimers found that digitization and exposure through Google Books increased sales of print editions by 35% for books in the Public Domain. In addition, a publisher's commercial position and prestige can be enhanced by the attribution requirement in Creative Commons licenses.

Additional motivation for OA conversion of the backlist slush pile has been supplied by programs such as Knowledge Unlatched's, where libraries contribute to a fund used for "unlatching" backlist books. (Knowledge Unlatched has programs for frontlist books as well.) While such programs can in principle be applied to the "evergreen" backlist, the incentives currently in place result in the unlatching of books in the "slush pile" backlist. While value for society is being gained this way, the willingness of publishers to "unlatch" hundreds of these books raises the question of how much library funding for Open Access should be allocated to the discount bin, as opposed to the backlist books most used in libraries. That's the topic of my next post!

Notes

This is the second in a series of posts about creating value with Open Access books. The others are:

Friday, February 12, 2021

Creating Value with Open Access Books

Can a book be more valuable if it's free? How valuable? To whom? How do we unlock this value?

a lock with ebooks
I've been wrestling with these questions for over ten years now. And for each of these questions, the answer is... it depends. A truism of the bookselling business is that "Every book is different", and the same is true of the book-freeing "business".

Recently there's been increased interest in academic communities around Open Access book publishing and in academic book relicensing (adding an Open Access license to an already-published book). Both endeavors have been struggling with the central question of how to value an open access book. The uncertainty in OA book valuation has led to many rookie mistakes among OA stakeholders. For example, when we first started Unglue.it, we assumed that reader interest would accelerate the relicensing process for older books whose sales had declined. But the opposite turned out to be true. Evidence of reader interest let rights holders know that these backlist titles were much more valuable than sales would indicate, thus precluding any notion of making them Open Access. Pro tip: if you want to pay a publisher to make a book free, don't publish your list of incredibly valuable books!

Instead of a strictly transactional approach, it's more useful to consider the myriad ways that academic books create value. Each of these value mechanisms offers buttons that we can push to promote open access, and points to new structures for markets where participants join together to create mutual value.

First, consider the book's reader. The value created is the reader's increased knowledge, understanding and, sometimes, sheer enjoyment. The fact of open access does not itself create the value, but removes some of the barriers that might suppress this value. It's almost impossible to quantify the understanding and enjoyment from books, but "hours spent reading" might be a useful proxy.

Next consider a book's creator. While a small number of creators derive an income stream from their books, most academic authors benefit primarily from the development and dissemination of their ideas. In many fields of inquiry, publishing a book is the academic's path to tenure. Educators (and their students!) similarly benefit. In principle, you might assess a textbook's value by measuring student performance.

The value of a book to a publisher can be more than just direct sales revenue. A widely distributed book can be a marketing tool for a publisher's entire business. In the world of Open Access, we can see new revenue models emerging - publication charges, events, sponsorships, even grants and memberships. 

The value of a book to society as a whole can be enormous. In areas of research, a book might lead to technological advances, healthier living, or a more equitable society. Or a book might create outrage, civil strife, and misinformation. That's another issue entirely!

Books can be valuable to secondary distributors as well. Both used book resellers and libraries add value to physical books by increasing their usage. This is much harder to accomplish for paywalled ebooks! Since academic libraries are often considered potential funding sources for Open Access publishing, it's worth noting that the value of an open access ebook to a library is entirely indirect. When a library acts as an Open Access funding source, it's acting as a proxy for the community it serves.

This brings us to communities. The vast majority of books create value for specific communities, not society as a whole. I believe that community-based funding is the most sustainable path for support of Open Access books. Community-supported OA article publishing has already had plenty of success. Communities organized by discipline have been particularly successful: consider the success that arXiv has had in promoting Open Access in physics, both at the preprint level and for journals in high-energy physics. A similar story can be told for biomedicine with PubMed and PubMed Central. A different sort of community success story has been SciELO, which has used Open Access to address challenges faced by scholars in Latin America.

So far, however, sustainable Open Access has proven to be challenging for scholarly ebooks. My next few posts will discuss the challenges and ways forward for support of ebook relicensing and for OA ebook creation:

Tuesday, December 29, 2020

Infra-infrastructure, inter-infrastructure and para-infrastructure

No one is against "Investing in Infrastructure". No one wants bridges to collapse, investing is always more popular than spending, and it's even alliterative! What's more, since infrastructure is almost invisible by definition, it's politically safe to support investing in infrastructure because no one will see when you don't follow through on your commitment!

Ponte Morandi collapse - Michele Ferraris, CC BY-SA 4.0 via Wikimedia Commons

Geoffrey Bilder gives a talk where he asks us to think of Crossref and similar services as "information infrastructure" akin to "plumbing", where the implication is that since we, as a society, are accustomed to paying plumbers and bridge builders lots of money, we should also pony up for "information infrastructure", which is obvious once you say it.

What qualifies as infrastructure, anyway? If I invest in a new laptop, is that infrastructure for the Go-to-Hellman blog? Blogspot is Google-owned blogging infrastructure for sure. It's certainly not open infrastructure, but it works, and I haven't had to do much maintenance on it. 

There's a lot of infrastructure used to make Unglue.it, which supports distribution of open-access ebooks. It uses Django, which is open-source software originally developed to support newspaper websites. Unglue.it also uses modules that extend Django, made possible by Django's open license. It works really well, but I've had to put a fair amount of work into updating my code to keep up with new versions of Django. Ironically, most of this work has been in fixing the extensions that have not updated along with Django.

I deploy Unglue.it on AWS, which is DEFINITELY infrastructure. I have a love/hate relationship with AWS because it works so well, but every time I need to change something, I have to spend two hours with documentation to find the one-line incantation that makes it work. But every few months, the cost of using AWS goes down, which I like, but the money goes to Amazon, which is ironic because they really don't care for the free ebooks we distribute.

Aside from AWS and Django, the infrastructure I use to deliver Ebook Foundation services includes Python, Docker, Travis-CI, GitHub, git, Ubuntu Linux, MySQL, Postgres, Ansible, Requests, Beautiful Soup, and many others. The Unglue.it database relies on infrastructure services from DOAB, OAPEN, LibraryThing, Project Gutenberg, OpenLibrary and Google Books. My development environment relies heavily on BBEdit and Jupyter. We depend on Crossref and Internet Archive to resolve some links; we use subject vocabulary from Library of Congress and BISAC.

You can imagine why I was interested in "JROST 2020", which turns out to stand for "Joint Roadmap for Open Science Tools 2020", a meeting organized by a relatively new non-profit, "Invest in Open Infrastructure" (IOI). The meeting was open and free, and despite the challenges associated with such a meeting in our difficult times, it managed to present a provocative program along with a compelling vision.

If you think a bit about how to address the infrastructure needs of open science and open scholarship in general, you come up with at least 3 questions:

  • How do you identify the "leaky pipes" that need fixing so as to avoid systemic collapse?
  • How do you bolster healthy infrastructure so that it won't need repair?
  • How do you build new infrastructure that will be valuable and thrive?

If it were up to me, my first steps would be to:

  1. Get people with a stake in open infrastructure to talk to each other. Break them out of their silos and figure out how their solutions can help solve problems in other communities.
  2. Create a "venture fund" for needed new infrastructure. Work on solving the problems that no one wants to tackle on their own.

Invest in Open Infrastructure is already doing this! Kaitlin Thaney, who's been Executive Director of IOI for less than a year, seems to be pressing all the right buttons. The JROST 2020 meeting was a great start on #1, and #2 is the initial direction of the "JROST Rapid Response Fund", whose first round of awards was announced at the meeting.

Among the first awardees of the JROST Rapid Response Fund announced at JROST 2020 was an organization that ties into the infrastructure that I use: 2i2c. It's a great example of much-needed infrastructure for scientific computing, education, digital humanities and data science. 2i2c aims to create hosted interactive computing environments that run in the cloud and are powered by entirely open-source technology (Jupyter). As I'm a Jupyter user and enthusiast, this makes me happy.

But while 2i2c is the awardee, it's being built on top of Jupyter. Is Jupyter also infrastructure? It needs investment too, doesn't it? There's a lot of overlap between the Jupyter team and the 2i2c team, so investment in one could be investment in the other. In fact, Chris Holdgraf, Executive Director of 2i2c, told me that "we see 2i2c as a way to both increase the impact of Jupyter in the research/education community, and a way to more sustainably drive resources back into the Jupyter community."

Open Science Infrastructure Interdependency (from
“Scoping the Open Science Infrastructure Landscape in Europe”,
https://doi.org/10.5281/zenodo.4153809)


Where does Jupyter fit in the infrastructure landscape? It's nowhere to be seen on the neat "interdependency map" presented by SPARC EU at JROST. If 2i2c is an example of investment-worthy infrastructure, maybe the best way to think of Jupyter is "infra-infrastructure" - the open information infrastructure needed to build open information infrastructure. "Trickle-down" investment in this sort of infrastructure may be the best way to support projects like Jupyter so they stay open and are widely used.

But wait... Jupyter is built on top of Python, right? Python needs people investing in it. Is Python infra-infra-infrastructure? And Python is built on top of C (I won't even mention Jython or PyJS), right? Turtles all the way down. Will 2i2c eventually get buried under other layers of infrastructure, be forgotten and underinvested in, only to be one day excavated and studied by technology archeologists?

Looking carefully at the interdependency map, I don't see a lot of layers. I see a network with lots of loops. And many of the nodes are connectors themselves. ORCID and Crossref resemble roads, bridges and plumbing not because they're hidden underneath, but because they're visible and in-between. They exist because the entities they connect cooperate to make the connection robust instead of incidental. They're not infra-infrastructure, they're inter-infrastructure. Trickle-down investment probably wouldn't work for inter-infrastructure. Instead, investments need to come from the communities that benefit, so that those communities can decide how to manage access to the inter-infrastructure to maximize the community benefit.

There's another type of infrastructure that needs investment. I work in ebooks, and a lot of overlapping communities have tackled their own special ebook problems. But the textbook people don't talk to the public domain people don't talk to the monograph people don't talk to the library people. (A slight exaggeration.) There are lots of "almost" solutions that work well for specific tasks. But with the total amount of effort being expended, we could do some really amazing things... if only we were better at collaborating.

For example, the Jupyter folks have gotten funding from Sloan for the "Executable Book Project". This is really cool. Similarly, there's Bookdown, which comes out of the R community. And there are other efforts to give ebooks the functionality that a website could have. Gitbook is a commercial open-source effort targeting a similar space; Rebus, a non-profit, is using Pressbooks to gain traction in the textbook space; and MIT Press's PubPub has similar goals.

I'll call these overlapping efforts "para-infrastructure." Should investors in open infrastructure target investment in "rolling up" or merging these efforts? When private equity investors have done this to library automation companies, the results have not benefited the user communities, so I'd say "NO!" But what's the alternative?

I've observed that the folks who are doing the best job of just making stuff work rarely have the time or resources to go off to conferences or workshops. Typically, these folks have no incentive to do the work to make their tools work for slightly different problems. That can be time consuming! But it's still easier than taking someone else's work and modifying it to solve your own special problem. I think the best way to invest in open para-infrastructure is to get lots of these folks together and give them the time and incentive to talk and to share solutions (and maybe code). It's hard work, but making the web of open infrastructure stronger and more resilient is what investment in open infrastructure is all about.

Different types of open infrastructure benefit from different styles of investment; I'm hoping that IOI will build on the directions exhibited by its Rapid Response Fund and invest effectively in infra-infrastructure, inter-infrastructure, and para-infrastructure. 

Notes

1. Geoff Bilder and Cameron Neylon have a nice discussion of many of the issues in this post: “Bilder G, Lin J, Neylon C (2016) Where are the pipes? Building Foundational Infrastructures for Future Services”, retrieved [date], http://cameronneylon.net/blog/where-are-the-pipes-building-foundational-infrastructures-for-future-services/

2. "Trickle-down" has a negative connotation in economics, but that's how you feed a tree, right?

Monday, October 19, 2020

We should regulate virality

It turns out that virality on internet platforms is a social hazard! 

Living in the age of the Covid pandemic, we see around us what happens when we let things grow exponentially. The reason that the novel coronavirus has changed our lives is not that it's often lethal - it's that it found a way to jump from one infected person to several others on average, leading to exponential growth. We get infected without regard to the lethality of the virus; all that matters is its reproduction rate.
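
Some back-of-the-envelope arithmetic shows why the reproduction rate, not the lethality, dominates the outcome; the numbers below are purely illustrative, not epidemiology:

```python
# Cases after g generations of spread, starting from one case: R ** g.
# With a generation time of ~5 days, 10 weeks is ~14 generations.
for R in (0.9, 1.1, 2.0):
    print(f"R = {R}: ~{R ** 14:,.0f} cases after 14 generations")

# R = 0.9 fizzles out, R = 1.1 simmers, and R = 2.0 explodes to over
# sixteen thousand cases; small changes in R swamp everything else.
```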

For years, websites have been built to optimize the virality of content. What we see on Facebook or Twitter is not shown to us for its relevance to our lives, its educational value, or even its entertainment value. It's shown to us because it maximizes our "engagement" - our tendency to interact with it and spread it. The more we interact with a website, the more money it makes, and so a generation of minds has been employed in the pursuit of more engagement. Sometimes it's cat videos that delight us, but more often these days it's content that enrages and divides us.

Our dissatisfaction with what the internet has become has led to calls to regulate the giants of the internet. A lot of the political discourse has focused on "Section 230" (https://en.wikipedia.org/wiki/Section_230), a part of US law that gives interactive platforms such as Facebook a set of rules that result in legal immunity for content posted by users. As might be expected, many of the proposals for reform have sounded attractive, but the details are typically unworkable in the real world, and often would have effects opposite of what is intended.

I'd like to argue that the only workable approaches to regulating internet platforms are those that target their virality. Our society has no problem with regulations that force restaurants, food preparation facilities, and even barbershops to prevent the spread of disease, and no one ever complains that the regulations affect "good" bacteria too. These regulations are a component of our society's immune system, and they are necessary for its healthy functioning.

never going to give you covid

You might think that platform virality is too technical to be amenable to regulation, but it's not. That's because of the statistical characteristics of exponential growth. My study of free ebook usage has made me aware of the pervasiveness of exponential statistics on the internet. Sometimes labeled the 80-20 rule, the Pareto principle, or log-normal statistics, it's the natural result of processes that grow at a rate proportional to their size. As a result, it's possible to regulate the virality of platforms because only a very small amount of content is viral enough to dominate the platform. Regulate that tiny amount of super-viral content, and you create an incentive to moderate the virality of platforms. The beauty of doing this is that the huge majority of content is untouched by regulation.
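
Here's a sketch of how little content a reach threshold would actually touch; the log-normal parameters are invented, chosen only to produce the familiar long tail:

```python
import random

random.seed(42)
# Draw per-post audience sizes from a log-normal distribution -- the
# shape that "growth proportional to size" processes naturally produce.
views = [random.lognormvariate(mu=4, sigma=3) for _ in range(1_000_000)]
views.sort(reverse=True)

total = sum(views)
over_threshold = sum(1 for v in views if v > 1_000_000)
print(f"posts with over 1M views: {over_threshold} of {len(views):,}")
print(f"share of all views from the top 0.1% of posts: "
      f"{100 * sum(views[:1000]) / total:.0f}%")
```

A vanishingly small number of posts cross the million-view line, yet they account for a huge share of total views: regulating them regulates the platform.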

How might this work? Imagine a law that removed a platform's immunity for content that it shows to a million people (or maybe 10 million - I'm not sure what the cutoff should be). This makes sense, too; if a platform promotes illegal content in such a way that a million people see it, the platform shouldn't get immunity just because "algorithms"! It also makes it practical for platforms to curate the content for harmlessness - it won't kill off the cat videos! The Facebooks and Twitters of the world will complain, but they'll be able to add antibodies and T-cells to their platforms, and the platforms will be healthier for it. Smaller sites will be free to innovate, without too much worry, but to get funding they'll need to have plans for virality limits.
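
A rough sketch of what those platform-side "antibodies" might look like; everything here (the cap, the review margin, the names) is hypothetical, not a description of any real platform's system:

```python
REACH_IMMUNITY_CAP = 1_000_000  # hypothetical statutory cutoff
REVIEW_MARGIN = 0.10            # start human review at 90% of the cap

def on_impression(post, reach_counts, review_queue):
    """Hypothetical 'viral brake': as a post approaches the reach cap,
    algorithmic amplification stops until a human curator signs off."""
    reach_counts[post.id] += 1
    if (reach_counts[post.id] >= REACH_IMMUNITY_CAP * (1 - REVIEW_MARGIN)
            and not post.reviewed):
        review_queue.add(post.id)  # flag for curation before the cap is hit
        post.throttled = True      # drop from recommendation feeds for now
```

The point of the margin is that review happens before immunity is lost, so the overwhelming majority of posts never encounter the brake at all.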

So we really do have a choice: healthy platforms with diverse content, or cesspools of viral content. Doesn't seem like such a hard decision!