2021-code4lib-org-1940 ---- code4lib 2021
Code4Lib 2021, March 22 - 26 • Online. The conference for people who code for libraries. An annual gathering of technologists from around the world, who largely work for and with libraries, archives, and museums and have a commitment to open technologies.
Conference Recordings: View the full recordings of the conference from the livestream on YouTube. Presentation Slides: View the Open Science Framework repository with slides and materials from the presentations. What Comes Next? Code4Lib 2022. Attendees, fill out the post-conference survey! See you next year! Thanks to our sponsors!
Welcome to Code4Lib 2021
code4lib is everything to me. In the community I feel like my work and knowledge is appreciated, so I feel very comfortable and motivated to volunteer, give talks, teach workshops, participate in conferences, host events. It's a great support network, I've never felt as comfortable as I do in this library group! Kim Pham, University of Denver
The confluence of technology and libraries drew me to Code4Lib when I was a young librarian straddling the areas of library metadata and technology. After eleven years in the community I am still amazed and humbled by the people I meet in the community and the work they do. There isn't another space that seamlessly combines libraries, technology, and the human aspect quite like Code4Lib in the library world. Becky Yoose
I came away from Code4lib wanting to invite most of the people I met into my office and ask all of the questions about what everyone is doing and how they're doing it and how can I do those things and what would they change about their tools; what's better is many of them would gladly help. 7 years on I keep coming back for more because over the years technical excellence isn't the only metric used in this community's continued growth. I have made friends in the Code4lib community. Francis Kayiwa, Princeton University Libraries
Code4Lib offers the space to be self-aware, outwardly conscious, and vastly creative. The community seems to be expanding and learning without ego, and I feel lucky to have been welcomed into the group with open arms. The conference is a place where one can look holistically at technology alongside thoughtful practitioners, while building lasting friendships. Alyssa Loera, Cal Poly Pomona
Code4Lib has been transformative for me. When I first learned of Code4Lib, I was considering leaving libraryland. Attending the first Code4Lib conference opened my eyes to the community I never knew I had.
Code4Lib continues to humble, to inspire, and to anchor; our collective work is grounded in the cultural heritage mission and in the value of working inclusively in the open for the collective good. Here's to another twelve years, Code4Lib---and then some! Michael Giarlo, Stanford University
I attended Code4Lib 2018 on a diversity scholarship and I will always be grateful for that opportunity. It was free of buzzwords, full of welcoming people, and the sessions were interesting and accessible even though I don't work closely with technology or coding. I'm more motivated to explore new areas of technology and librarianship, I've started volunteering with the web committee, and I'm looking forward to attending the conference again! Laura Chuang
Attending my first Code4Lib allowed me to explore the potential of technology, while upholding the importance of empathy and community-building. The connections I made at Code4Lib have continued to deepen over the last year, and it has been fantastic to see how we have implemented ideas that were shaped by conversations there. Code4Lib has modeled accountability and care, including publicly standing up against harassment and organizing support for our community. Nicky Andrews, North Carolina State University
Code4Lib has been a great conference for me as a metadata person interested in gaining computer science skills and insights. Presentations and topics are selected by the community. As such, I find a greater portion of presentations at this conference to be interesting, relevant, and educational than at other conferences in which presentations are jury selected. They also offer generous scholarships to underrepresented folks in the Code4Lib community. Yay! Sonoe Nakasone, North Carolina State University Libraries
At Code4Lib, you really get the sense that people are there to share with and learn from one another — to advance their work individually and collectively — and have fun while they're at it. I left the conference reminded of the widespread passion for libraries as critical features of our society, the passion that draws interesting, creative people to library work, and found I had a renewed sense of purpose in my job. Hannah Frost, Stanford University
Except where otherwise noted, content on this site is licensed under a Creative Commons Attribution 4.0 International license.
accesa-org-4815 ---- Home - ACCESA
ACCESA, Centro Ciudadano de Estudios para una Sociedad Abierta (Citizen Center for Studies for an Open Society). Contact: info@accesa.org. At ACCESA we seek to improve the relationship between the State and society, transforming State structures into more open and transparent ones that can meet citizens' demands, and promoting society's active involvement in solving its own problems.
Transparency: We promote all practices that facilitate citizen oversight and clear accountability regarding decisions, actions, and matters of public interest.
Access to Information: We work to guarantee the human right to access all information of public interest in a broad, free, and equal manner, without any discrimination.
Citizen Participation: We promote the right and duty of the population to take part in processes of collective deliberation and debate in order to influence decision-making on matters of public interest.
Projects: We work to improve the quality of democracy in Costa Rica. Learn about our projects.
Revista Sinergias: Sinergias is our editorial project for outreach, reflection, and analysis on government openness, citizen participation, civic technology, …
Support for the co-creation of the 2019-2022 Open State Action Plan: As part of its obligations as a member of the Alianza para un Gobierno Abierto (Open Government Partnership) …
Collective construction of a Citizen Participation Policy and Regulation for the canton of Osa from an Open Government perspective: During 2019 ACCESA, with the support and funding of the Fundación Trust for the …
The open society is a humanist, inclusive, diverse, plural society oriented toward the common good, one that seeks the realization of the freedom and rights of all people.
Blog: Mapping rural development with open data (24 March 2021): For the past three years our organization has sought to take advantage of the international celebration of Open Data … The co-creation process for the Open State Plan: our reflections (26 January 2021): We consider this to have been the most transparent, participatory, and rigorous co-creation process that … Citizen participation and Open Government: antidotes to the crisis of democracy? (22 December 2020): Actions aimed at putting Open Government policies into practice remain incipient … Our assessment of the Multisectoral Dialogue (15 December 2020): We support collaborative and deliberative initiatives, but we identified errors and shortcomings in this process of …
acrl-ala-org-5792 ---- ACRL TechConnect
Broken Links in the Discovery Layer—Pt. II: Towards an Ethnography of Broken Links: This post continues where my last one left off, investigating broken links in our discovery layer. Be forewarned—most of it will be a long, dry list of all the mundane horrors of librarianship. Metadata mismatches, EZproxy errors, and OpenURL resolvers, oh my! What does it mean when we say a link is broken? The simplest … Continue reading "Broken Links in the Discovery Layer—Pt. II: Towards an Ethnography of Broken Links"
Broken Links in the Discovery Layer—Pt. I: Researching a Problem: Like many administrators of discovery layers, I'm constantly baffled and frustrated when users can't access full text results from their searches. After implementing Summon, we heard a few reports of problems and gradually our librarians started to stumble across them on their own. At first, we had no formal system for tracking these errors. Eventually, … Continue reading "Broken Links in the Discovery Layer—Pt. I: Researching a Problem"
ORCID for System Interoperability in Scholarly Communication Workflows: What is ORCID? If you work in an academic library or otherwise provide support for research and scholarly communication, you have probably heard of ORCID (Open Researcher and Contributor ID) in terms of "ORCID iD," a unique 16-digit identifier that represents an individual in order to mitigate name ambiguity.
The ORCID iD number is presented … Continue reading "ORCID for System Interoperability in Scholarly Communication Workflows" Creating Presentations with Beautiful.AI Updated 2018-11-12 at 3:30PM with accessibility information. Beautiful.AI is a new website that enables users to create dynamic presentations quickly and easily with “smart templates” and other design optimized features. So far the service is free with a paid pro tier coming soon. I first heard about Beautiful.AI in an advertisement on NPR and was … Continue reading "Creating Presentations with Beautiful.AI" National Forum on Web Privacy and Web Analytics We had the fantastic experience of participating in the National Forum on Web Privacy and Web Analytics in Bozeman, Montana last month. This event brought together around forty people from different areas and types of libraries to do in-depth discussion and planning about privacy issues in libraries. Our hosts from Montana State University, Scott Young, … Continue reading "National Forum on Web Privacy and Web Analytics" The Ex Libris Knowledge Center and Orangewashing Two days after ProQuest completed their acquisition of Ex Libris in December 2015, Ex Libris announced the launch of their new online Customer Knowledge Center. In the press release for the Knowledge Center, the company describes it as “a single gateway to all Ex Libris knowledge resources,” including training materials, release notes, and product manuals. … Continue reading "The Ex Libris Knowledge Center and Orangewashing" Managing ILS Updates We’ve done a few screencasts in the past here at TechConnect and I wanted to make a new one to cover a topic that’s come up this summer: managing ILS updates. Integrated Library Systems are huge, unwieldy pieces of software and it can be difficult to track what changes with each update: new settings are … Continue reading "Managing ILS Updates" Blockchain: Merits, Issues, and Suggestions for Compelling Use Cases Blockchain holds a great potential for both innovation and disruption. The adoption of blockchain also poses certain risks, and those risks will need to be addressed and mitigated before blockchain becomes mainstream. A lot of people have heard of blockchain at this point. But many are unfamiliar with how this new technology exactly works and … Continue reading "Blockchain: Merits, Issues, and Suggestions for Compelling Use Cases" Introducing Our New Best Friend, GDPR You’ve seen the letters GDPR in every single email you’ve gotten from a vendor or a mailing list lately, but you might not be exactly sure what it is. With GDPR enforcement starting on May 25, it’s time for a crash course in what GDPR is, and why it could be your new best friend … Continue reading "Introducing Our New Best Friend, GDPR" Names are Hard A while ago I stumbled onto the post “Falsehoods Programmers Believe About Names” and was stunned. Personal names are one of the most deceptively difficult forms of data to work with and this article touched on so many common but unaddressed problems. Assumptions like “people have exactly one canonical name” and “My system will never … Continue reading "Names are Hard" activitypub-rocks-4532 ---- ActivityPub Rocks! Don't you miss the days when the web really was the world's greatest decentralized network? Before everything got locked down into a handful of walled gardens? So do we. Enter ActivityPub! ActivityPub is a decentralized social networking protocol based on the ActivityStreams 2.0 data format. 
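To make the ActivityStreams 2.0 format concrete, here is a minimal sketch in Python of a "Create" activity wrapping a "Note", delivered to an actor's outbox the way the client-to-server API described below is intended to work. The server domain, actor IRI, outbox URL, and bearer token are hypothetical placeholders, and real servers vary in how they handle authentication.

```python
"""Minimal sketch of an ActivityStreams 2.0 activity and a client-to-server
delivery. The server at social.example, the actor IRI, and the bearer token
are hypothetical; this is not a complete ActivityPub client."""
import json
import requests  # third-party: pip install requests

# A "Create" activity wrapping a "Note", expressed in the ActivityStreams 2.0
# vocabulary that ActivityPub builds on.
activity = {
    "@context": "https://www.w3.org/ns/activitystreams",
    "type": "Create",
    "actor": "https://social.example/users/alice",           # hypothetical actor IRI
    "to": ["https://www.w3.org/ns/activitystreams#Public"],
    "object": {
        "type": "Note",
        "content": "Hello, federated world!",
    },
}

# In the client-to-server protocol, the client POSTs the activity to the
# actor's outbox; the server then federates it to followers' inboxes via the
# server-to-server protocol. Authentication is left to each implementation.
response = requests.post(
    "https://social.example/users/alice/outbox",              # hypothetical outbox URL
    data=json.dumps(activity),
    headers={
        "Content-Type": 'application/ld+json; profile="https://www.w3.org/ns/activitystreams"',
        "Authorization": "Bearer YOUR_TOKEN_HERE",             # placeholder credential
    },
)
print(response.status_code, response.headers.get("Location"))
```

If the sketch matches the spec's expectations, a conforming server would answer with 201 Created and put the new activity's id in the Location header, which is why the last line prints it.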
ActivityPub is an official W3C recommended standard published by the W3C Social Web Working Group. It provides a client to server API for creating, updating and deleting content, as well as a federated server to server API for delivering notifications and subscribing to content. Sounds exciting? Dive in! ==> Latest published version <== ==> Latest editor's draft <== Or, are you a user looking for ActivityPub software to use? Check out this guide for ActivityPub users (community edited)!
~= Hey, Implementers! =~ We're so stoked to have you implementing ActivityPub! To make sure ActivityPub implementations work together, we have:
Guide for new ActivityPub implementers -- Community edited and unofficial, but useful!
A test suite -- Make sure your application works right according to the ActivityPub standard.
Implementation reports -- See the implementation coverage of applications which implemented ActivityPub during the standardization process.
Looking to discuss implementing ActivityPub? You can join the #social IRC channel on irc.w3.org! See also SocialHub, a community-run forum to discuss ActivityPub developments and ideas, and the Social CG, a W3C Community Group to continue the work of advancing the federated social web... including ActivityPub!
-=* ActivityPub News *=-
Some (long overdue) site updates (Mon 04 January 2021)
Let us meet on SocialHub! (Thu 26 December 2019)
ActivityPub reaches W3C Recommendation status! Everybody party! (Tue 20 March 2018)
ActivityPub reaches Proposed Recommendation status! (Fri 08 December 2017)
Test suite up, implementation reports page up... let's get more reports in! (Mon 06 November 2017)
Mastodon launches their ActivityPub support, and a new CR! (Sun 10 September 2017)
New tutorial, new logo! (Tue 09 May 2017)
Help submit implementation reports! (Sun 09 April 2017)
ActivityPub reaches Candidate Recommendation status! (Thu 17 November 2016)
activitypub.rocks launches! (Mon 14 November 2016)
Site contents dual licensed under Creative Commons Attribution-Sharealike 4.0 International and the GNU GPL, version 3 or any later version. ActivityPub logo by mray, released into public domain under CC0 1.0.
acrl-ala-org-4948 ---- ACRL TechConnect
Broken Links in the Discovery Layer—Pt. II: Towards an Ethnography of Broken Links
This post continues where my last one left off, investigating broken links in our discovery layer. Be forewarned—most of it will be a long, dry list of all the mundane horrors of librarianship. Metadata mismatches, EZproxy errors, and OpenURL resolvers, oh my! What does it mean when we say a link is broken? The simplest definition would be: when a link that claims to lead to full text does not. But the way that many discovery layers work is by translating article metadata into a query in a separate database, which leads to some gray areas. What if the link leads to a search with only a single result, the resource in question? What if the link leads to a search with two results, a dozen, a hundred…and the resource is among them? What if the link leads to a journal index and it takes some navigation to get to the article's full text? Where do we draw the line? The user's expectation is that selecting something that says "full text" leads to the source itself. I think all of the above count as broken links, though they obviously range in severity. Some mean that the article simply cannot be accessed while others mean that the user has to perform a little more work.
For the purposes of this study, I am primarily concerned with the first case: when the full text is nowhere near the link’s destination. As we discuss individual cases reported by end users, it will solidify our definition. Long List I’m going to enumerate some types of errors I’ve seen, providing a specific example and detailing its nature as much as possible to differentiate the errors from each other. 1. The user selects a full text link but is taken to a database query that doesn’t yield the desired result. We had someone report this with an article entitled “LAND USE: U.S. Soil Erosion Rates–Myth and Reality” in Summon which was translated into a query on the article’s ISSN, publication title, and an accidentally truncated title (just “LAND USE”).1 The query fails to retrieve the article but does show 137 other results. The article is present in the database and can be retrieved by editing the query, for instance by changing the title parameter to “U.S. soil erosion rates”. Indeed, the database has the title as “U.S. soil erosion rates–myth and reality”. The article appears to be part of a recurring column and is labelled “POLICY FORUM: LAND USE” which explains the discovery layer’s representation of the title. Fundamentally, the problem is a disagreement about the title between the discovery layer and database. As another example, I’ve seen this problem occur with book reviews where one side prefixes the title with “Review:” while the other does not. In a third instance of this, I’ve seen a query title = "Julia Brannen Peter Moss "and" Ann Mooney Working "and" Caring over the Twentieth Century Palgrave Macmillan Basingstoke Hampshire 2004 234 pp hbk £50 ISBN 1 4039 2059 1" where a lot of ancillary text spilled into the title. 2. The user is looking for a specific piece except the destination database combines this piece with similar ones into a single record with a generic title such that incoming queries fail. So, for instance, our discovery layer’s link might become a title query for Book Review: Bad Feminist by Roxane Gay in the destination, which only has an article named “Book Reviews” in the same issue of the host publication. In my experience, this is one of the more common discovery layer problems and can be described as a granularity mismatch. The discovery layer and subscription database disagree about what the fundamental unit of the publication is. While book reviews often evince this problem, so too do letters to the editor, opinion pieces, and recurring columns. 3. An article present in one of our subscription databases is not represented in the discovery layer, despite the database being correctly selected in the knowledgebase that informs the discovery system’s index. We’re able to read the article “Kopfkino: Julia Phillips’ sculptures beyond the binary” in an EBSCO database that provides access to the journal Flash Art International but no query in Summon can retrieve it as a result. I suppose this is not technically a broken link as a non-existent link but it falls under the general umbrella of discovery layer content problems. 4. The exact inverse of the above: an article is correctly represented by the discovery layer index as being part of a database subscription that the user should have access to, but the article does not actually exist within the source database due to missing content. This occurred with an interview of Howard Willard in American Artist from 1950. 
While our subscription to Art & Architecture Source does indeed include the issue of American Artist in question, and one can read other articles from it, there was no record for the interview itself in EBSCOHost nor are its pages present in any of the PDF scans of the issue. 5. The user is looking for an article that is combined with another, even though the source seems to agree that they should be treated separately. For instance, one of our users was looking for the article "Musical Curiosities in Athanasius Kircher's Antiquarian Visions" in the journal Music in Art but Summon's link lands on a broken link resolver page in the destination EBSCO database. It turns out, upon closer inspection, that the pages for this article are appended to the PDF of the article that appears before it. All other articles for the issue have their own record. This is an interesting hybrid metadata/content problem similar to granularity mismatch: while there is no record for the article itself in the database, the article's text is present. Yet unlike some granularity mismatches it is impossible to circumvent via search; you have to know to browse the issue and utilize page numbers to locate it. 6. The user selects a link to an article published within the past year in a journal with a year-long embargo. The discovery layer shows a "full text online" link but because the source's link resolver doesn't consider an embargoed article to be a valid destination, the link lands on an error page. This is an instance where Summon would, ideally, at least take you to the article's citation page but in any case the user won't be able to retrieve the full text. 7. The user selects an article that is in a journal not contained within any of the library's database subscriptions. This is usually a simple knowledge base error where the journal lists for a database changed without being updated in the discovery layer index. Still, it's quite common because not all subscription changes are published in a machine-readable manner that would allow discovery layers to automate their ingestion. 8. The user selects an article listed as being published in 2016 in the discovery layer, while the source database has 2017 so the OpenURL fails to resolve properly. Upon investigation, this date mismatch can be traced back to the journal's publisher which lists the individual articles as being published in 2016 while the issue in which they are contained comes from 2017. The Summon support staff rightly points out to me that they can't simply change the article dates to match one source; while it might fix some links, it will break others, and this date mismatch is a fundamentally unsolvable disagreement. This issue highlights the brittleness of real world metadata; publishers, content aggregators, and discovery products do not live in harmony.
Reviewing the list of problems, this dual organization seems to helpfully group like issues:
Metadata & linking problems
- Metadata mismatch (1, 5, 8)
- Granularity mismatch (2)
- Link resolver error (6)
Index problems
- Article not in database/journal/index (3, 4, 5, 6)
- Journal not in database (7)
Of these three, the first category accounts for the vast majority of problems according to my anecdata. It's notable that issues overlap and their classification is inexact. When a link to an embargoed article fails, should we say that is due to the article being "missing" or a link resolver issue? Whatever the case, it is often clear when a link is broken even if we could argue endlessly about how exactly.
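Most of the metadata and linking failures cataloged above come down to the discovery layer serializing its own copy of the metadata into a link (often an OpenURL) that the destination then re-parses into a query. The sketch below, in Python, is a toy illustration of that handoff; it assumes nothing about how Summon or EBSCO actually build or resolve their links. The key names echo OpenURL 1.0 KEV fields, the resolver logic is a stand-in, and the records are invented to mirror the truncated-title example in item 1.

```python
"""Toy illustration of how a discovery layer might build an OpenURL-style link
from its metadata and how the target database's lookup can then miss.
The field names mirror OpenURL 1.0 KEV keys; the resolver and records are
invented for illustration, not how any real product works."""
from urllib.parse import urlencode

# Metadata as the discovery layer has it (title accidentally truncated at the
# colon, as in example 1 above).
discovery_record = {"atitle": "LAND USE", "jtitle": "Science", "issn": "0036-8075"}

# Metadata as the destination database has it.
database_records = [
    {"atitle": "U.S. soil erosion rates--myth and reality",
     "jtitle": "Science", "issn": "0036-8075"},
]

def build_openurl(base, md):
    """Serialize article metadata into an OpenURL-like query string."""
    params = {"url_ver": "Z39.88-2004", "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal"}
    params.update({f"rft.{key}": value for key, value in md.items()})
    return base + "?" + urlencode(params)

def resolve(md, records):
    """Toy link resolver: require a matching ISSN and a title that contains
    the incoming title string. Real resolvers are fancier, but the failure
    mode is the same: the two sides disagree about the title."""
    return [r for r in records
            if r["issn"] == md["issn"]
            and md["atitle"].lower() in r["atitle"].lower()]

link = build_openurl("https://resolver.example/openurl", discovery_record)  # hypothetical resolver base URL
print(link)                                          # what the user clicks
print(resolve(discovery_record, database_records))   # [] -> the dreaded "no results" page
```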
There are also a host of problems that we, as librarians, cause. We might misconfigure EZproxy for a database or fail to keep our knowledge base holdings up to date. The difference with these problems is that they tend to happen once and then be resolved forever; I fix the EZproxy stanza, I remove access to the database we unsubscribed from. So the proportion of errors we account for is vanishingly low, while these other errors are eternal. No matter how many granularity mismatches or missing articles I point out, there are always millions more waiting to cause problems for our users.
Notes
This sort of incredibly poor handling of punctuation in queries is sadly quite common. Even though, in this instance, the source database and discovery layer are made by the same company the link between them still isn't prepared to handle a colon in a text string. Consider how many academic articles have colons in their title. This is not good. ↩
Author: Eric Phetteplace. Posted on July 11, 2019. Categories: discovery, metadata.
Broken Links in the Discovery Layer—Pt. I: Researching a Problem
Like many administrators of discovery layers, I'm constantly baffled and frustrated when users can't access full text results from their searches. After implementing Summon, we heard a few reports of problems and gradually our librarians started to stumble across them on their own. At first, we had no formal system for tracking these errors. Eventually, I added a script which inserted a "report broken link" form into our discovery layer's search results. 1 I hoped that collecting reported problems and then reporting them would identify certain systemic issues that could be resolved, ultimately leading to fewer problems. Pointing out patterns in these errors to vendors should lead to actual progress in terms of user experience. From the broken links form, I began to cull some data on the problem. I can tell you, for instance, which destination databases experience the most problems or what the character of the most common problems is. The issue is the sample bias—are the problems that are reported really the most common? Or are they just the ones that our most diligent researchers (mostly our librarians, graduate students, and faculty) are likely to report? I long for quantifiable evidence of the issue without this bias.
How I classify the broken links that have been reported via our form. N = 57
Select Searches & Search Results
So how would one go about objectively studying broken links in a discovery layer? The first issue to solve is what searches and search results to review. Luckily, we have data on this—we can view in our analytics what the most popular searches are. But a problem becomes apparent when one goes to review those search terms: artstor, hours, jstor, kanopy. Of course, the most commonly occurring searches tend to be single words. These searches all trigger "best bet" or database suggestions that send users directly to other resources. If their result lists do contain broken links, those links are unlikely to ever be visited, making them a poor choice for our study. If I go a little further into the set of most common searches, I see single-word subject searches for "drawing" followed by some proper nouns ("suzanne lacy", "chicago manual of style").
These are better since it’s more likely users actually select items from their results but still aren’t a great representation of all the types of searches that occur. Why are these types of single-word searches not the best test cases? Because search phrases necessarily have a long tail distribution; the most popular searches aren’t that popular in the context of the total quantity of searches performed 2. There are many distinct search queries that were only ever executed once. Our most popular search of “artstor”? It was executed 122 times over the past two years. Yet we’ve had somewhere near 25,000 searches in the past six months alone. This supposedly popular phrase has a negligible share of that total. Meanwhile, just because a search for “How to Hack it as a Working Parent. Jaclyn Bedoya, Margaret Heller, Christina Salazar, and May Yan. Code4Lib (2015) iss. 28″ has only been run once doesn’t mean it doesn’t represent a type of search—exact citation search—that is fairly common and worth examining, since broken links during known item searches are more likely to be frustrating. Even our 500 most popular searches evince a long tail distribution. So let’s say we resolve the problem of which searches to choose by creating a taxonomy of search types, from single-word subjects to copy-pasted citations. 3 We can select a few real world samples of each type to use in our study. Yet we still haven’t decided which search results we’re going to examine! Luckily, this proves much easier to resolve. People don’t look very far down in the search results 4, rarely scrolling past the first “page” listed (Summon has an infinite scroll so there technically are no pages, but you get the idea). Only items within the first ten results are likely to be selected. Once we have our searches and know that we want to examine only the first ten or so results, my next thought is that it might be worth filtering our results that are unlikely to have problems. But does skipping the records from our catalog, institutional repository, LibGuides, etc. make other problems abnormally more apparent? After all, these sorts of results are likely to work since we’re providing direct links to the Summon link. Also, our users do not heavily employ facets—they would be unlikely to filter out results from the library catalog. 5 In a way, by focusing a study on search results that are the most likely to fail and thus give us information about underlying linking issues, we’re diverging away from the typical search experience. In the end, I think it’s worthwhile to stay true to more realistic search patterns and not apply, for instance, a “Full Text Online” filter which would exclude our library catalog. Next Time on Tech Connect—oh how many ways can things go wrong?!? I’ll start investigating broken links and attempt to enumerate their differing natures. Notes This script was largely copied from Robert Hoyt of Fairfield University, so all credit due to him. ↩ For instance, see: Beitzel, S. M., Jensen, E. C., Chowdhury, A., Frieder, O., & Grossman, D. (2007). Temporal analysis of a very large topically categorized web query log. Journal of the American Society for Information Science and Technology, 58(2), 166–178. “… it is clear that the vast majority of queries in an hour appear only one to five times and that these rare queries consistently account for large portions of the total query volume” ↩ Ignore, for the moment, that this taxonomy’s constitution is an entire field of study to itself. 
↩ Pan, B., Hembrooke, H., Joachims, T., Lorigo, L., Gay, G., & Granka, L. (2007). In Google we trust: Users' decisions on rank, position, and relevance. Journal of Computer-Mediated Communication, 12(3), 801–823. ↩ In fact, the most common facet used in our discovery layer is "library catalog" showing that users often want only bibliographic records; the precise opposite of a search aimed at only retrieving article database results. ↩
Author: Eric Phetteplace. Posted on March 11, 2019. Categories: data, discovery.
ORCID for System Interoperability in Scholarly Communication Workflows
What is ORCID? If you work in an academic library or otherwise provide support for research and scholarly communication, you have probably heard of ORCID (Open Researcher and Contributor ID) in terms of "ORCID iD," a unique 16-digit identifier that represents an individual in order to mitigate name ambiguity. The ORCID iD number is presented as a URI (Uniform Resource Identifier) that serves as the link to a corresponding ORCID record, where disambiguating data about an individual is stored. For example, https://orcid.org/0000-0002-9079-593X is the ORCID iD for the late Stephen Hawking, and clicking on this link will take you to Hawking's ORCID record. Data within ORCID records can include things like name(s) and other identifiers, biographical information, organizational affiliations, and works.
Figure 1: This screenshot shows the types of data that can be contained in an ORCID record.
Anyone can register for an ORCID iD for free, and individuals have full control over what data appears in their record, the visibility of that data, and whether other individuals or organizations are authorized to add data to their ORCID record on their behalf. Individuals can populate information in their ORCID record themselves, or they can grant permission to organizations, like research institutions, publishers, and funding agencies, to connect with their ORCID record as trusted parties, establishing an official affiliation between the individual and the organization. For example, Figures 2 and 3 illustrate an authenticated ORCID connection between an individual author and the University of Virginia (UVA) as represented in LibraOpen, the UVA Library's Samvera institutional repository.
Figure 2: The University of Virginia Library's LibraOpen Institutional Repository is configured to make authenticated connections with authors' ORCID records, linking the author to their contributions and to the institution. Once an author authenticates/connects their ORCID iD in the system, ORCID iD URIs are displayed next to the authors' names. Image source: doi.org/10.18130/V3FB8T
Figure 3: By clicking on the author's ORCID iD URI in LibraOpen, we can see the work listed on the individual's ORCID record, with "University of Virginia" as the source of the data, which means that the author gave permission for UVA to write to their ORCID record. This saves time for the author, ensures integrity of metadata, and contributes trustworthy data back to the scholarly communication ecosystem that can then be used by other systems connected with ORCID. Image courtesy of Sherry Lake, UVA https://orcid.org/0000-0002-5660-2970
ORCID Ecosystem & Interoperability
These authenticated connections are made possible by configuring software systems to communicate with the ORCID registry through the ORCID API, which is based on OAuth 2.0.
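As a small illustration of the read side of that exchange, the sketch below uses Python and the requests library to pull a public ORCID record through ORCID's public API. It assumes the v3.0 REST endpoint and JSON field names as I understand them, so treat the exact paths and keys as approximations; the member API adds OAuth 2.0 access tokens and write scopes on top of the same record structure.

```python
"""Minimal sketch: read a public ORCID record via the public API, using the
ORCID iD from the Stephen Hawking example above. Endpoint and field names
assume the v3.0 record schema and may need adjusting; the member API
(read-limited/write) additionally requires OAuth 2.0 tokens."""
import requests  # pip install requests

ORCID_ID = "0000-0002-9079-593X"
url = f"https://pub.orcid.org/v3.0/{ORCID_ID}/record"

resp = requests.get(url, headers={"Accept": "application/json"})
resp.raise_for_status()
record = resp.json()

# Pull a couple of disambiguating fields out of the nested record, guarding
# against sections an individual may have chosen not to make public.
name = record.get("person", {}).get("name", {}) or {}
given = (name.get("given-names") or {}).get("value", "")
family = (name.get("family-name") or {}).get("value", "")
works = (record.get("activities-summary", {}).get("works", {}) or {}).get("group", [])

print(f"{given} {family} ({ORCID_ID}) lists {len(works)} work(s) publicly.")
```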
With individual researchers/contributors at the center, and their affiliated organizations connecting with them through the ORCID API, all participating organizations’ systems can also communicate with each other. In this way, ORCID not only serves as a mechanism for name disambiguation, it also provides a linchpin for system interoperability in the research and scholarly communication ecosystem. Figure 4: ORCID serves as a mechanism for interoperability between systems and data in the scholarly communication ecosystem. Graphic courtesy of the ORCID organization. Publishers, funders, research institutions (employers), government agencies, and other stakeholders have been adopting and using ORCID increasingly in their systems over the past several years. As a global initiative, over 5 million individuals around the world have registered for an ORCID iD, and that number continues to grow steadily as more organizations start to require ORCID iDs in their workflows. For example, over 65 publishers have signed on to an open letter committing to use ORCID in their processes, and grant funders are continuing to come on board with ORCID as well, having recently released their own open letter demonstrating commitment to ORCID. A full list of participating ORCID member organizations around the globe can be found at https://orcid.org/members. ORCID Integrations ORCID can be integrated into any system that touches the types of data contained within an ORCID record, including repositories, publishing and content management platforms, data management systems, central identity management systems, human resources, grants management, and Current Research Information Systems (CRIS). ORCID integrations can either be custom built into local systems, such as the example from UVA above, or made available through a vendor system out of the box. Several vendor-hosted CRIS such as Pure, Faculty 180, Digital Measures, and Symplectic Elements, already have built-in support for authenticated ORCID connections that can be utilized by institutional ORCID members, which provides a quick win for pulling ORCID data into assessment workflows with no development required. While ORCID has a public API that offers limited functionality for connecting with ORCID iDs and reading public ORCID data, the ORCID member API allows organizations to read from, write to, and auto-update ORCID data for their affiliated researchers. The ORCID institutional membership model allows organizations to support the ORCID initiative and benefit from the more robust functionality that the member API provides. ORCID can be integrated with disparate systems, or with one system from which data flows into others, as illustrated in Figure 5. Figure 5: This graphic from the Czech Technical University in Prague illustrates how a central identity management system is configured to connect with the ORCID registry via the ORCID API, with ORCID data flowing internally to other institutional systems. Image Source: Czech Technical University in Prague Central Library & Computing and Information Centre , 2016: Solving a Problem of Authority Control in DSpace During ORCID Implementation ORCID in US Research Institutions In January of 2018, four consortia in the US – the NorthEast Research Libraries (NERL), the Greater Western Library Alliance (GWLA), the Big Ten Academic Alliance (BTAA), and LYRASIS – joined forces to form a national partnership for a consortial approach to ORCID membership among research institutions in the US, known as the ORCID US Community. 
The national partnership allows non-profit research institutions to become premium ORCID member organizations for a significantly discounted fee and employs staff to provide dedicated technical and community support for its members. As of December 1, 2018, there are 107 member organizations in the ORCID US Community. In addition to encouraging adoption of ORCID, a main goal of the consortium approach is to build a community of practice around ORCID in the US. Prior to 2018, any institutions participating in ORCID were essentially going it alone and there were no dedicated communication channels or forums for discussion and sharing around ORCID at a national level. However, with the formation of the ORCID US Community, there is now a website with community resources for ORCID adoption specific to the US, dedicated communication channels, and an open door to collaboration between member institutions. Among ORCID US Community member organizations, just under half have integrated ORCID with one or more systems, and the other slightly more than half are either in early planning stages or technical development. (See the ORCID US Community 2018 newsletter for more information.) As an ecosystem, ORCID relies not only on organizations but also the participation of individual researchers, so all members have also been actively reaching out to their affiliated researchers to encourage them to register for, connect, and use their ORCID iD.
Getting Started with ORCID
ORCID can benefit research institutions by mitigating confusion caused by name ambiguity, providing an interoperable data source that can be used for individual assessment and aggregated review of institutional impact, and allowing institutions to assert authority over their institutional name and verify affiliations with researchers, ultimately saving time and reducing administrative burden for both organizations and individuals. To get the most value from ORCID, research institutions should consider the following three activities as outlined in the ORCID US Planning Guide:
1. Forming a cross-campus ORCID committee or group with stakeholders from different campus units (libraries, central IT, research office, graduate school, grants office, human resources, specific academic units, etc.) to strategically plan ORCID system integration and outreach efforts
2. Assessing all of the current systems used on campus to determine which workflows could benefit from ORCID integration
3. Conducting outreach and education around research impact and ORCID to encourage researchers to register for and use their ORCID iD
The more people and organizations/systems using ORCID, the more all stakeholders can benefit from ORCID by maintaining a record of an individual's scholarly and cultural contributions throughout their career, mitigating confusion caused by name ambiguity, assessing individual contributions as well as institutional impact, and enabling trustworthy and efficient sharing of data across scholarly communication workflows. Effectively, ORCID represents a paradigm shift from siloed, repetitive workflows to the ideal of being able to "enter once, re-use often" by using ORCID to transfer data between systems, workflows, and individuals, ultimately making everyone's lives easier.
Sheila Rabun is the ORCID US Community Specialist at LYRASIS, providing technical and community support for 100+ institutional members of the ORCID US Community.
In prior roles, she managed community and communication for the International Image Interoperability Framework (IIIF) Consortium, and served as a digital project manager for several years at the University of Oregon Libraries’ Digital Scholarship Center. Learn more at https://orcid.org/0000-0002-1196-6279 Author Sheila RabunPosted on December 18, 2018December 17, 2018Categories digital scholarship, publication, Scholarly Communication Creating Presentations with Beautiful.AI Updated 2018-11-12 at 3:30PM with accessibility information. Beautiful.AI is a new website that enables users to create dynamic presentations quickly and easily with “smart templates” and other design optimized features. So far the service is free with a paid pro tier coming soon. I first heard about Beautiful.AI in an advertisement on NPR and was immediately intrigued. The landscape of presentation software platforms has broadened in recent years to include websites like Prezi, Emaze, and an array of others beyond the tried and true PowerPoint. My preferred method of creating presentations for the past couple of years has been to customize the layouts available on Canva and download the completed PDFs for use in PowerPoint. I am also someone who enjoys tinkering with fonts and other design elements until I get a presentation just right, but I know that these steps can be time consuming and overwhelming for many people. With that in mind, I set out to put Beautiful.AI to the test by creating a short “prepare and share” presentation about my first experience at ALA’s Annual Conference this past June for an upcoming meeting. A title slide created with Beautiful.AI. Features To help you get started, Beautiful.AI includes an introductory “Design Tips for Beautiful Slides” presentation. It is also fully customizable so you can play around with all of of the features and options as you explore, or you can click on “create new presentation” to start from scratch. You’ll then be prompted to choose a theme, and you can also choose a color palette. Once you start adding slides you can make use of Beautiful.AI’s template library. This is the foundation of the site’s usefulness because it helps alleviate guesswork about where to put content and that dreaded “staring at the blank slide” feeling. Each individual slide becomes a canvas as you create a presentation, similar to what is likely familiar in PowerPoint. In fact, all of the most popular PowerPoint features are available in Beautiful.AI, they’re just located in very different places. From the navigation at the left of the screen users can adjust the colors and layout of each slide as well as add images, animation, and presenter notes. Options to add, duplicate, or delete a slide are available on the right of the screen. The organize feature also allows you to zoom out and see all of the slides in the presentation. Beautiful.AI offers a built-in template to create a word cloud. One of Beautiful.AI’s best features, and my personal favorite, is its built-in free stock image library. You can choose from pre-selected categories such as Data, Meeting, Nature, or Technology or search for other images. An import feature is also available, but providing the stock images is extremely useful if you don’t have your own photos at the ready. Using these images also ensures that no copyright restrictions are violated and helps add a professional polish to your presentation. 
The options to add an audio track and advance times to slides are also nice to have for creating presentations as tutorials or introductions to a topic. When you’re ready to present, you can do so directly from the browser or export to PDF or PowerPoint. Options to share with a link or embed with code are also available. Usability While intuitive design and overall usability won’t necessarily make or break the existence of a presentation software platform, each will play a role in influencing whether someone uses it more than once. For the most part, I found Beautiful.AI to be easy and fun to use. The interface is bold, yet simplistic, and on trend with current website design aesthetics. Still, users who are new to creating presentations online in a non-PowerPoint environment may find the Beautiful.AI interface to be confusing at first. Most features are consolidated within icons and require you to hover over them to reveal their function. Icons like the camera to represent “Add Image” are pretty obvious, but others such as Layout and Organize are less intuitive. Some of Beautiful.AI’s terminology may also not be as easily recognizable. For example, the use of the term “variations” was confusing to me at first, especially since it’s only an option for the title slide. The absence of any drag and drop capability for text boxes is definitely a feature that’s missing for me. This is really where the automated design adaptability didn’t seem to work as well as I would’ve expected given that it’s one of the company’s most prominent marketing statements. On the title slide of my presentation, capitalizing a letter in the title caused the text to move closer to the edge of the slide. In Canva, I could easily pull the text block over to the left a little or adjust the font size down by a few points. I really am a stickler for spacing in my presentations, and I would’ve expected this to be an element that the “Design AI” would pick up on. Each template also has different pre-set design elements, and it can be confusing when you choose one that includes a feature that you didn’t expect. Yet, text sizes that are pre-set to fit the dimensions of each template does help not only with readability in the creation phase but with overall visibility for audiences. Again, this alleviates some of the guesswork that often happens in PowerPoint with not knowing exactly how large your text sizes will appear when projected onto larger screens. A slide created using a basic template and stock photos available in Beautiful.AI. One feature that does work really well is the export option. Exporting to PowerPoint creates a perfectly sized facsimile presentation, and being able to easily download a PDF is very useful for creating handouts or archiving a presentation later on. Both are nice to have as a backup for conferences where Internet access may be spotty, and it’s nice that Beautiful.AI understands the need for these options. Unfortunately, Beautiful.AI doesn’t address accessibility on its FAQ page nor does it offer alternative text or other web accessibility features. Users will need to add their own slide titles and alt text in PowerPoint and Adobe Acrobat after exporting from Beautiful.AI to create an accessible presentation.  Conclusion Beautiful.AI challenged me to think in new ways about how best to deliver information in a visually engaging way. It’s a useful option for librarians and students who are looking for a presentation website that is fun to use, engaging, and on trend with current web design. 
Click here to view “My first ALA”presentation created with Beautiful.AI. Jeanette Sewell is the Database and Metadata Management Coordinator at Fondren Library, Rice University. Author Jeanette SewellPosted on November 12, 2018November 12, 2018Categories conferences, library, presentation, technology, tools National Forum on Web Privacy and Web Analytics We had the fantastic experience of participating in the National Forum on Web Privacy and Web Analytics in Bozeman, Montana last month. This event brought together around forty people from different areas and types of libraries to do in-depth discussion and planning about privacy issues in libraries. Our hosts from Montana State University, Scott Young, Jason Clark, Sara Mannheimer, and Jacqueline Frank, framed the event with different (though overlapping) areas of focus. We broke into groups based on our interests from a pre-event survey and worked through a number of activities to identify projects. You can follow along with all the activities and documents produced during the Forum in this document that collates all of them. Float your boat exercise             While initially worried that the activities would feel too forced, instead they really worked to release creative ideas. Here’s an example: our groups drew pictures of boats with sails showing opportunities, and anchors showing problems. We started out in two smaller subgroups of our subgroups and drew a boat, then met with the large subgroup to combine the boat ideas. This meant that it was easy to spot the common themes—each smaller group had written some of the same themes (like GDPR). Working in metaphor meant we could express some more complex issues, like politics, as the ocean—something that always surrounds the issue and can be helpful or unhelpful without much warning. This helped us think differently about issues and not get too focused on our own individual perspective. The process of turning metaphor into action was hard. We had to take the whole world of problems and opportunities and come up with how these could be realistically accomplished. Good and important ideas had to get left behind because they were so big there was no way to feasibly plan them, certainly not in a day or two. The differing assortment of groups (which were mixable where ideas overlapped) ensured that we were able to question each other’s assumptions and ask some hard questions. For example, one of the issues Margaret’s group had identified as a problem was disagreement in the profession about what the proper limits were on privacy. Individually identifiable usage metrics are a valuable commodity to some, and a thing not to be touched to others. While everyone in the room was probably biased more in favor of privacy than perhaps the profession at large is, we could share stories and realities of the types of data we were collecting and what it was being used for. Considering the realities of our environments, one of our ideas to bring everyone from across the library and archives world to create a unified set of privacy values was not going to happen. Despite that, we were able to identify one of the core problems that led to a lack of unity, which was, in many cases, lack of knowledge about what privacy issues existed and how these might affect institutions. When you don’t completely understand something, or only half understand it, you are more likely to be afraid of it.             
On the afternoon of the second day and continuing into the morning of the third day, we had to get serious and pick just one idea to focus on to create a project plan. Again, the facilitators utilized a few processes that helped us take a big idea and break it down into more manageable components. We used “Big SCAI” thinking to frame the project: what is the status quo, what are the challenges, what actions are required, and what are the ideals. From there we worked through what was necessary for the project, nice to have, unlikely to get, and completely unnecessary to the project. This helped focus efforts and made the process of writing a project implementation plan much easier. What the workday looked like. Writing the project implementation plan as a group was made easier by shared documents, but we all commented on the irony of using Google Docs to write privacy plans. On the other hand, trying to figure out how to write in groups and easily share what we wrote using any other platform was a challenge in the moment. This reality illustrates the problems with privacy: the tool that is easiest to use and comes to mind first will be the one that ends up being used. We have to create tools that make privacy easy (which was a discussion many of us at the Forum had), but even more so we need to think about the tradeoffs that we make in choosing a tool and educate ourselves and others about this. In this case, since all the outcomes of the project were going to be public anyway, going on the “quick and easy” side was ok.             The Forum project leaders recently presented about their work at the DLF Forum 2018 conference. In this presentation, they outlined the work that they did leading up to the Forum, and the strategies that emerged from the day. They characterized the strategies as Privacy Badging and Certifications, Privacy Leadership Training, Privacy for Tribal Communities and Organizations, Model License for Vendor Contracts, Privacy Research Institute, and a Responsible Assessment Toolkit. You can read through the thought process and implementation strategies for these projects and others yourself at the project plan index. The goal is to ensure that whoever wants to do the work can do it. To quote Scott Young’s follow-up email, “We ask only that you keep in touch with us for the purposes of community facilitation and grant reporting, and to note the provenance of the idea in future proposals—a sort of CC BY designation, to speak in copyright terms.”             For us, this three-day deep dive into privacy was an inspiration and a chance to make new connections (while also catching up with some old friends). But even more, it was a reminder that you don’t need much of anything to create a community. Provided the right framing, as long as you have people with differing experiences and perspectives coming together to learn from each other, you’ve facilitated the community-building.   Author Margaret HellerPosted on October 29, 2018October 29, 2018Categories conferences, privacy The Ex Libris Knowledge Center and Orangewashing Two days after ProQuest completed their acquisition of Ex Libris in December 2015, Ex Libris announced the launch of their new online Customer Knowledge Center. In the press release for the Knowledge Center, the company describes it as “a single gateway to all Ex Libris knowledge resources,” including training materials, release notes, and product manuals. 
A defining feature is that there has never been any paywall or log-on requirement, so that all Knowledge Center materials remain freely accessible to any site visitor. Historically, access to documentation for automated library systems has been restricted to subscribing institutions, so the Knowledge Center represents a unique change in approach. Within the press release, it is also readily apparent how Ex Libris aims to frame the openness of the Knowledge Center as a form of support for open access. As the company states in the second paragraph, “Demonstrating the Company’s belief in the importance of open access, the site is open to all, without requiring any logon procedure.” Former Ex Libris CEO Matti Shem Tov goes a step further in the following paragraph: “We want our resources and documentation to be as accessible and as open as our library management, discovery, and higher-education technology solutions are.” The problem with how Ex Libris frames their press release is that it elides the difference between mere openness and actual open access. They are a for-profit company, and their currently burgeoning market share is dependent upon a software-as-a-service (SaaS) business model. Therefore, one way to describe their approach in this case is orangewashing. During a recent conversation with me, Margaret Heller came up with the term, based on the color of the PLOS open access symbol. Similar in concept to greenwashing, we can define orangewashing as a misappropriation of open access rhetoric for business purposes. What perhaps makes orangewashing more initially difficult to diagnose in Ex Libris’s (and more broadly, ProQuest’s) case is that they attempt to tie support for open access to other product offerings. Even before purchasing Ex Libris, ProQuest had been including an author-side paid open-access publishing option to its Electronic Thesis and Dissertation platform, though we can question whether this is actually a good option for authors. For its part, Ex Libris has listened to customer feedback about open access discovery. As an example, there are now open access filters for both the Primo and Summon discovery layers. Ex Libris has also, generally speaking, remained open to customer participation regarding systems development, particularly with initiatives like the Developer Network and Idea Exchange. Perhaps the most credible example is in a June 24, 2015 press release, where the company declares “support of the Open Discovery Initiative (ODI) and conformance with ODI’s recommended practice for pre-indexed ‘web-scale’ discovery services.” A key implication is that “conforming to ODI regulations about ranking of search results, linking to content, inclusion of materials in Primo Central, and discovery of open access content all uphold the principles of content neutrality.” Given the above information, in the case of the Knowledge Center, it is tempting to give Ex Libris the benefit of the doubt. As an access services librarian, I understand how much of a hassle it can be to find and obtain systems documentation in order to properly do my job. I currently work for an Ex Libris institution, and can affirm that the Knowledge Center is of tangible benefit. Besides providing easier availability for their materials, Ex Libris has done fairly well in keeping information and pathing up to date. Notably, as of last month, customers can also contribute their own documentation to product-specific Community Knowledge sections within the Knowledge Center. 
Nevertheless, this does not change the fact that while the Knowledge Center is unique in its format, it represents a low bar to clear for a company of Ex Libris’s size. Their systems documentation should be openly accessible in any case. Moreover, the Knowledge Center represents openness—in the form of company transparency and customer participation—for systems and products that are not open. This is why when we go back to the Knowledge Center press release, we can identify it as orangewashing. Open access is not the point of a profit-driven company offering freely accessible documentation, and any claims to this effect ultimately ring hollow. So what is the likely point of the Knowledge Center, then? We should consider that Alma has become the predominant service platform within academic libraries, with Primo and Summon being the only supported discovery layers for it. While OCLC and EBSCO offer or support competing products, Ex Libris already held an advantageous position even before the ProQuest purchase. Therefore, besides the Knowledge Center serving as supportive measure for current customers, we can view it as a sales pitch to future ones. This may be a smart business strategy, but again, it has little to do with open access. Two other recent developments provide further evidence of Ex Libris’s orangewashing. The first is MLA’s announcement that EBSCO will become the exclusive vendor for the MLA International Bibliography. On the PRIMO-L listserv, Ex Libris posted a statement [listserv subscription required] noting that the agreement “goes against the goals of NISO’s Open Discovery Initiative…to promote collaboration and transparency among content and discovery providers.” Nevertheless, despite not being involved in the agreement, Ex Libris shares some blame given the long-standing difficulty over EBSCO not providing content to the Primo Central Index. As a result, what may occur is the “siloing” of an indispensable research database, while Ex Libris customers remain dependent on the company to help determine an eventual route to access. Secondly, in addition to offering research publications through ProQuest and discovery service through Primo/Summon, Ex Libris now provides end-to-end content management through Esploro. Monetizing more aspects of the research process is certainly far from unusual among academic publishers and service providers. Elsevier arguably provides the most egregious example, and as Lisa Janicke Hinchliffe notes, their pattern of recent acquisitions belies an apparent goal of creating a vertical stack service model for publication services. In considering what Elsevier is doing, it is unsurprising—from a business standpoint—for Ex Libris and ProQuest to pursue profits in a similar manner. That said, we should bear in mind that libraries are already losing control over open access as a consequence of the general strategy that Elsevier is employing. Esploro will likely benefit from having strong library development partners and “open” customer feedback, but the potential end result could place its customers in a more financially disadvantageous and less autonomous position. This is simply antithetical to open access. Over the past few years, Ex Libris has done well not just in their product development, but also their customer support. Making the Knowledge Center “open to all” in late 2015 was a very positive step forward. Yet the company’s decision to orangewash through claiming support for open access as part of a product unveiling still warrants critique. 
Peter Suber reminds us that open access is a “revolutionary kind of access”—one that is “unencumbered by a motive of financial gain.” While Ex Libris can perhaps talk about openness with a little more credibility than their competitors, their bottom line is still what really matters. Author: Chris Martin | Posted on September 25, 2018 | Categories: open access, Scholarly Communication Managing ILS Updates We’ve done a few screencasts in the past here at TechConnect and I wanted to make a new one to cover a topic that’s come up this summer: managing ILS updates. Integrated Library Systems are huge, unwieldy pieces of software, and it can be difficult to track what changes with each update: new settings are introduced, behaviors change, bugs are (hopefully) fixed. The video below shows my approach to managing this process and keeping track of ongoing issues with our Koha ILS. Author: Eric Phetteplace | Posted on August 13, 2018 | Categories: library Blockchain: Merits, Issues, and Suggestions for Compelling Use Cases Blockchain holds great potential for both innovation and disruption. The adoption of blockchain also poses certain risks, and those risks will need to be addressed and mitigated before blockchain becomes mainstream. A lot of people have heard of blockchain at this point, but many are unfamiliar with how exactly this new technology works and unsure under what circumstances it may be useful to libraries. In this post, I will provide a brief overview of the merits and the issues of blockchain, and at the end I will make some suggestions for compelling use cases. What Blockchain Accomplishes Blockchain is the technology that underpins a well-known decentralized cryptocurrency, Bitcoin. To put it simply, blockchain is a kind of distributed digital ledger on a peer-to-peer (P2P) network, in which records are confirmed and encrypted. Blockchain records and keeps data in its original state in a secure and tamper-proof manner[1] by its technical implementation alone, thereby obviating the need for a third-party authority to guarantee the authenticity of the data. Records in a blockchain are stored in multiple ledgers in a distributed network instead of one central location. This prevents a single point of failure and secures records by protecting them from potential damage or loss. Blocks in each blockchain ledger are chained to one another by the mechanism called ‘proof of work.’ (For those familiar with a version control system such as Git, a blockchain ledger can be thought of as something similar to a P2P-hosted git repository that allows sequential commits only.[2]) This makes records in a block immutable and irreversible, that is, tamper-proof. In areas where the authenticity and security of records is of paramount importance, such as electronic health records, digital identity authentication/authorization, digital rights management, historic records that may be contested or challenged due to the vested interests of certain groups, and digital provenance, to name a few, blockchain can lead to efficiency, convenience, and cost savings. For example, with blockchain implemented in banking, one will be able to transfer funds across different countries without going through banks.[3] This can drastically lower the fees involved, and the transaction will take effect much more quickly, if not immediately.
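To make the idea of hash-chained blocks and proof of work concrete, here is a minimal, purely illustrative Python sketch (a toy example, not part of any real blockchain implementation); an actual blockchain distributes these blocks across many peer nodes and uses much harder puzzles.

import hashlib
import json
import time

def hash_block(block):
    # Hash the block's contents deterministically.
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def mine_block(data, previous_hash, difficulty=4):
    # Toy proof of work: find a nonce so the block's hash starts with `difficulty` zeros.
    block = {"timestamp": time.time(), "data": data,
             "previous_hash": previous_hash, "nonce": 0}
    while not hash_block(block).startswith("0" * difficulty):
        block["nonce"] += 1
    return block

# Each block embeds the previous block's hash, so tampering with an old record
# invalidates every later block unless all of them are re-mined.
chain = [mine_block("genesis record", previous_hash="0")]
chain.append(mine_block("a later record", previous_hash=hash_block(chain[-1])))

The chaining of hashes is what makes earlier records tamper-evident; the proof-of-work loop is what makes rewriting them expensive.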
Adopted in real estate transactions, blockchain could similarly make the process of buying and selling a property more straightforward and efficient, saving time and money.[4] Disruptive Potential of Blockchain The disruptive potential of blockchain lies in its aforementioned ability to render obsolete the role of a third-party authority, which records and validates transactions and guarantees their authenticity should a dispute arise. In this respect, blockchain can serve as an alternative trust protocol that decentralizes traditional authorities. Since blockchain achieves this through public key cryptography, however, if one loses one’s personal key to the blockchain ledger holding one’s financial or real estate assets, for example, that will result in the permanent loss of those assets. With the third-party authority gone, there will be no institution to step in and remedy the situation. Issues This is only one of the issues with blockchain. Other issues include (a) interoperability between different blockchain systems, (b) scalability of blockchain at a global scale with large amounts of data, (c) potential security issues such as the 51% attack,[5] and (d) the huge energy consumption[6] that a blockchain requires to add a block to a ledger. Note that the last issue, energy consumption, has both environmental and economic ramifications because it can cancel out the cost savings gained from eliminating a third-party authority and related processes and fees. Challenges for Wider Adoption There is growing interest in blockchain among information professionals, but there are also some obstacles to that interest gaining momentum and moving towards wider trial and adoption. One obstacle is the lack of general understanding about blockchain among the larger audience of information professionals. Due to its original association with Bitcoin, many mistake blockchain for cryptocurrency. Another obstacle is technical. The use of blockchain requires setting up and running a node in a blockchain network, such as Ethereum,[7] which may be daunting to those who are not tech-savvy. This raises the barrier to entry for those who are not familiar with command-line scripting and yet still want to try out and test how a blockchain functions. The last and most important obstacle is the lack of compelling use cases for libraries, archives, and museums. To many, blockchain is an interesting new technology, but even many blockchain enthusiasts are skeptical of its practical benefits at this point, when all associated costs are considered. Of course, this is not an insurmountable obstacle. The more familiar people become with blockchain, the more ways they will discover to use it in the information profession for purposes where it is uniquely beneficial. Suggestions for Compelling Use Cases of Blockchain In order to determine what may make a compelling use case of blockchain, the information profession would benefit from considering the following: (a) What kind of data/records (or series thereof) must be stored and preserved exactly the way they were created? (b) What kind of information is at great risk of being altered and compromised by changing circumstances? (c) What type of interactions may need to take place between such data/records and their users?[8] (d) What would be a reasonable cost for implementation?
These questions will help connect the potential benefits of blockchain with real-world use cases and take the information profession one step closer to wider testing and adoption. To those further interested in blockchain and libraries, I recommend the recordings from the Library 2.018 online mini-conference, “Blockchain Applied: Impact on the Information Profession,” held back in June. The Blockchain National Forum, which is funded by IMLS and is to take place in San Jose, CA on August 6th, will also be livestreamed. Notes [1] For an excellent introduction to blockchain, see “The Great Chain of Being Sure about Things,” The Economist, October 31, 2015, https://www.economist.com/news/briefing/21677228-technology-behind-bitcoin-lets-people-who-do-not-know-or-trust-each-other-build-dependable. [2] Justin Ramos, “Blockchain: Under the Hood,” ThoughtWorks (blog), August 12, 2016, https://www.thoughtworks.com/insights/blog/blockchain-under-hood. [3] The World Food Programme, the food-assistance branch of the United Nations, is using blockchain to increase its humanitarian aid to refugees. Blockchain may possibly be used not only for financial transactions but also for identity verification for refugees. Russ Juskalian, “Inside the Jordan Refugee Camp That Runs on Blockchain,” MIT Technology Review, April 12, 2018, https://www.technologyreview.com/s/610806/inside-the-jordan-refugee-camp-that-runs-on-blockchain/. [4] Joanne Cleaver, “Could Blockchain Technology Transform Homebuying in Cook County — and Beyond?,” Chicago Tribune, July 9, 2018, http://www.chicagotribune.com/classified/realestate/ct-re-0715-blockchain-homebuying-20180628-story.html. [5] “51% Attack,” Investopedia, September 7, 2016, https://www.investopedia.com/terms/1/51-attack.asp. [6] Sherman Lee, “Bitcoin’s Energy Consumption Can Power An Entire Country — But EOS Is Trying To Fix That,” Forbes, April 19, 2018, https://www.forbes.com/sites/shermanlee/2018/04/19/bitcoins-energy-consumption-can-power-an-entire-country-but-eos-is-trying-to-fix-that/#49ff3aa41bc8. [7] Osita Chibuike, “How to Setup an Ethereum Node,” The Practical Dev, May 23, 2018, https://dev.to/legobox/how-to-setup-an-ethereum-node-41a7. [8] The interaction can also be a self-executing program that runs when certain conditions are met in a blockchain ledger. This is called a “smart contract.” See Mike Orcutt, “States That Are Passing Laws to Govern ‘Smart Contracts’ Have No Idea What They’re Doing,” MIT Technology Review, March 29, 2018, https://www.technologyreview.com/s/610718/states-that-are-passing-laws-to-govern-smart-contracts-have-no-idea-what-theyre-doing/. Author: Bohyun Kim | Posted on July 24, 2018 | Categories: coding, data, technology | Tags: bitcoin, blockchain, distributed ledger technology Introducing Our New Best Friend, GDPR You’ve seen the letters GDPR in every single email you’ve gotten from a vendor or a mailing list lately, but you might not be exactly sure what it is. With GDPR enforcement starting on May 25, it’s time for a crash course in what GDPR is, and why it could be your new best friend whether you are in the EU or not. First, you can check out the EU GDPR information site (though it probably will be under heavy load for a few days!) for lots of information on this.
It’s important to recognize, however, that for universities like mine with a campus located in the EU, the GDPR has created additional oversight to ensure that our own data collection practices are GDPR compliant, or that we restrict people residing in the EU from accessing those services. You should definitely work with legal counsel on your own campus in making any decisions about GDPR compliance. So what does the GDPR actually mean in practice? The requirements break down this way: any company which holds the data of any EU citizen must provide data controls, no matter where the company or the data is located. This means that every large web platform and pretty much every library vendor must comply or face heavy fines. The GDPR offers the following protections for personally identifiable information, which includes things like IP address: privacy terms and conditions must be written in easy-to-understand language; data breaches require quick notification; individuals have the right to know what data is being collected and to receive a copy of it; there is a “right to be forgotten,” or data erasure (unless it is in the public interest for the data to be retained); data can be transferred between providers; systems must be private by design and collect only necessary data; and companies must appoint data privacy officers without conflicts of interest. How this all works in practice is not consistent, and there will be a lot to be worked out in the courts in the coming years. Note that Google recently lost several right-to-be-forgotten cases and was required to remove information that it had originally stated was in the public interest to retain. The GDPR has actually been around for a few years, but May 25, 2018 was set as the enforcement date, so many people have been scrambling to meet that deadline. If you’re reading this today, there’s probably not a lot of time to do anything about your own practices, but if you haven’t yet reviewed what your vendors are doing, this would be a good time. Note too that there are no rights guaranteed for any Americans, and several companies, including Facebook, have moved data governance out of their Irish office to California to be out of reach of suits brought in Irish courts. Where possible, however, we should be using all the features at our disposal. As librarians, we already tend toward the “privacy by design” philosophy, even though we aren’t always perfect at it. As I wrote in my last post, my library worked on auditing our practices and creating a new privacy policy, and one of the last issues was trying to figure out how we would approach some of the third-party services that we need in order to serve our patrons but that did not allow deleting data. Now some of those features are being made available. For example, Google Analytics now has a data retention feature, which allows you to set data to expire and be deleted after a certain amount of time. Google provides some more detailed instructions to ensure that you are not accidentally collecting personally identifiable information in your analytics data. Lots of our library vendors provide personal account features, and those too are subject to these new GDPR requirements. This means that there are new levels of transparency about what kinds of tracking they are doing, greater ability for patrons to control data, and greater ability for you to control data on behalf of patrons.
Here are a few example vendor GDPR compliance statements or FAQs: EBSCO, Ex Libris, ProQuest, Springshare. Note that some vendors, like EBSCO, are moving to HTTPS for all sites that weren’t using it before, and so this may require changes to proxy servers or other links. I am excited about GDPR because no matter where we are located, it gives us new tools to defend the privacy of our patrons. Even better than that, it is providing lots of opportunities on our campuses to talk about privacy with all stakeholders. At my institution, the library has been able to showcase our privacy expertise and have some good conversations about data governance and future goals for privacy. It doesn’t mean that all our problems will be solved, but we are moving in a more positive direction. Author: Margaret Heller | Posted on May 24, 2018 | Categories: administration, privacy | Tags: gdpr Names are Hard A while ago I stumbled onto the post “Falsehoods Programmers Believe About Names” and was stunned. Personal names are one of the most deceptively difficult forms of data to work with and this article touched on so many common but unaddressed problems. Assumptions like “people have exactly one canonical name” and “My system will never have to deal with names from China/Japan/Korea” were apparent everywhere. I consider myself a fairly critical and studious person; I devote time to thinking about the consequences of design decisions and carefully attempt to avoid poor assumptions. But I’ve repeatedly run into trouble when handling personal names as data. There is a cognitive dissonance surrounding names; we treat them as rigid identifiers when they’re anything but. We acknowledge their importance but struggle to take them seriously. Names change. They change due to marriage, divorce, child custody, adoption, gender identity, religious devotion, performance art, witness protection, or none of these at all. Sometimes people just want a new name. And none of these reasons for change are more or less valid than others, though our legal system doesn’t always treat them equally. We have students who change their legal name, which is often something systems expect, but then they have the audacity to want to change their username, too! And that works less often because all sorts of system integrations expect usernames to be persistent. Names do not have a universal structure. There is no set quantity of components in a name nor an established order to those components. At my college, we have students without surnames. In almost all our systems, surname is a required field, so we put a period “.” there to satisfy that requirement. Then, on displays in our digital repository where surnames are assumed, we end up with bolded section headers like “., Johnathan” which look awkward. Many Western names might follow a [Given name] – [Middle name] – [Surname] structure, and an unfortunate number of the systems I have to deal with assume all names share this structure. It’s easy to see how this yields problematic results. For instance, if you want to see a sorted list of users, you probably want to sort by family name, but many systems sort by the name in the last position, causing, for instance, Chinese names[1] to be handled differently from Western ones.[2] But it’s not only that someone might not have a middle name, or might have two middle names, or might have a family name in the first position—no, even that would be too simple! Some name components defy simple classification. I once met a person named “Bus Stop”.
“Stop” is clearly not a family affiliation, despite coming in the final position of the name. Sometimes the second component of a tripartite Western name isn’t a middle name at all, but a maiden name or the second word of a two-word first name (e.g. “Mary Anne” or “Lady Bird”)! One cannot determine, just by looking at a familiar structure, the roles of all of a name’s pieces! Names are also contextual. One’s name with family, with legal institutions, and with classmates can all differ. Many of our international students have alternative Westernized first names. Their family may call them Qiáng but they introduce themselves as Brian in class. We ask for a “preferred name” in a lot of systems, which is a nice step forward, but don’t ask when it’s preferred. Names might be meant for different situations. We have no system remotely ready for this, despite the personalization that’s been seeping into web platforms for decades. So if names are such trouble, why not do our best and move on? Aren’t these fringe cases that don’t affect the vast majority of our users? These issues simply cannot be ignored because names are vital. What one is called, even if it’s not a stable identifier, has great effects on one’s life. It’s dispiriting to witness one’s name misspelled, mispronounced, treated as an inconvenience, botched at every turn. A system that won’t adapt to suit a name delegitimizes the name. It says, “oh that’s not your real name” as if names had differing degrees of reality. But a person may have multiple names—or many overlapping names over time—and while one may be more institutionally recognized at a given time, none are less real than the others. If even a single student a year is affected, affirming their name(s) is the absolute least amount of respect we can show. So what are we to do? Endless enumerations of the difficulties of working with names do little but paralyze us. Honestly, when I consider the best implementation of personal names, the MODS metadata schema comes to mind. Having a <name> element with any number of <namePart> children is the best model available. The <namePart>s can be ordered in particular ways, a “@type” attribute can define a part’s function,[3] a record can include multiple names referencing the same person, multiple names with distinct parts can be linked to the same authority record, etc. MODS has a flexible and comprehensive treatment of name data. Unfortunately, returning to “Falsehoods Programmers Believe”, none of the library systems I administer do anywhere near as good a job as this metadata schema. Nor is it necessarily a problem with Western bias—even the Chinese government can’t develop computer systems to accurately represent the names of people in the country, or even agree on what the legal character set should be![4] It seems that programmers start their apps by creating a “users” database table with columns for unique identifier, username, “firstname”/“lastname” [sic], and work from there. On the bright side, the name isn’t used as the identifier at least! We all learned that in databases class, but we didn’t learn to make “names” a separate table linked to “users” in our relational databases (a rough sketch of that idea appears below). In my day-to-day work, the best I’ve done is to be sensitive to the importance of name changes specifically and how our systems handle them. After a few meetings with a cross-departmental team, we developed a name change process at our college. System administrators from across the institution are on a shared listserv where name changes are announced.
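As a rough illustration of that separate names table, here is a minimal, hypothetical sketch using Python's sqlite3 module; the schema and the inserted values are invented for this example and are not drawn from any real library system.

import sqlite3

# Hypothetical schema: one row per name part, not one name column per user.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    username TEXT UNIQUE
);
CREATE TABLE names (
    id INTEGER PRIMARY KEY,
    user_id INTEGER REFERENCES users(id),
    part TEXT NOT NULL,      -- the name component itself
    part_type TEXT,          -- e.g. 'given', 'family', 'matronymic'; NULL when unknown
    position INTEGER,        -- display order, since ordering is not universal
    context TEXT             -- e.g. 'legal', 'preferred', 'classroom'
);
""")
conn.execute("INSERT INTO users (id, username) VALUES (1, 'jdoe')")
conn.executemany(
    "INSERT INTO names (user_id, part, part_type, position, context) VALUES (?, ?, ?, ?, ?)",
    [(1, "Qiáng", "given", 1, "legal"),
     (1, "Brian", "given", 1, "preferred")],
)

In a model like this, a name change means adding or updating rows in one table rather than rewriting a column that every integration assumes is immutable.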
In the libraries, I spoke with our frontline service staff about assisting with name changes. Our people at the circulation desk know to notice name discrepancies—sometimes a name badge has been updated but not our catalog records, and we can offer to make them match—but also to guide students who may need to contact the registrar or other departments on campus to initiate the top-down name change process. While most of the library’s systems don’t easily accommodate username changes, I can write administrative scripts for our institutional repository that alter the ownership of a set of items from an old username to a new one. I think it’s important to remember that we’re inconveniencing the user with the work of implementing their name change and not the other way around. So taking whatever extra steps we can on our own, without pushing labor onto our students and staff, is the best way we can mitigate how poorly our tools support the protean nature of personal names. Notes [1] Chinese names typically have the surname first, followed by the given name. [2] Another poor implementation can be seen in The Chicago Manual of Style’s indexing instructions, which have an extensive list of exceptions to the Western norm and how to handle them. But CMoS provides no guidance on how one would go about identifying a name’s cultural background or, for instance, identifying a compound surname. [3] Although the MODS user guidelines sadly limit the use of the type attribute to a fixed list of values which includes “family” and “given”, rendering it subject to most of the critiques in this post. Substantially expanding this list with “maiden”, “patronymic/matronymic” (names based on a parental given name, e.g. Mikhailovich), and more, as well as some sort of open-ended “other” option, would be a great improvement. [4] https://www.nytimes.com/2009/04/21/world/asia/21china.html Author: Eric Phetteplace | Posted on May 14, 2018 | Categories: change, data, diversity About: ACRL TechConnect is a moderated blog written by librarians and archivists covering innovative projects, emerging tech tools, coding, usability, design, and more. ACRL TechConnect serves as your source for technology-related content from the Association of College and Research Libraries, a division of the American Library Association, and C&RL News magazine. This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License. Based on a work at acrl.ala.org/techconnect.
afonte-info-140 ---- Home - Afonte Jornalismo de Dados. Recent posts: “Cásper Líbero online course covers political marketing on social networks” (April 19, 2021): tools, methods, and strategies for building a public image will be explored over two Saturdays of classes (May 15 and 22). “10 references on fact-checking for research and academic work” (March 31, 2021): reading suggestions on fact-checking and disinformation for beginning researchers. Data: “Journalists and data scientists are the most cited by Brazilian experts on Twitter” (February 11, 2021): research by IBPAD and Science Pulse analyzed the most-mentioned profiles in discussions about Covid-19. “So, what is professional journalism?” (January 13, 2021): a study recently published by Folha suggests that people who consume “professional journalism” are less likely to believe disinformation, but there is a prior concept to be discussed before that conclusion makes sense. Events: “See how the launch of the Postar ou Não project went” (March 31, 2021): the site and e-book seek to encourage critical reading of digital content, offering bibliographic references and activities focused on young audiences. “Afonte and Goethe-Institut Porto Alegre launch a media literacy site and e-book” (March 25, 2021): in site and e-book form, “Postar ou Não?” is a hypermedia guide that seeks to encourage critical reading of digital content with concepts, tips, and quizzes.
afroimpacto-com-470 ---- Afroimpacto: transforming the lives of Black people, creating an afro-impact. Who we are: We are Afroimpacto, a hub for Afro-entrepreneurial development that runs initiatives along the lines of consulting, entrepreneurial education, and development programs, with the goal of reducing social, economic, and educational inequality in the entrepreneurship landscape. In the innovation world, a “hub” is a set of integrated services offered to the entrepreneurial community, connecting people and promoting equal opportunities for development. We want to connect ecosystems to boost Black entrepreneurs and, in doing so, promote their socio-economic development. To fulfill our mission, we work on several fronts, adapting the language and content of entrepreneurship education to the reality of Black people. Clube Afro: Clube Afro is an Afro-entrepreneurial content club that connects and strengthens Black entrepreneurs: weekly content on Afro-entrepreneurship and business in plain language; exercises and tools to develop your business, along with a forum for questions; and a network of Afro-entrepreneurs at different stages, available around the clock. The club subscription is monthly and affordably priced. E-book: To be an entrepreneur is to launch yourself, to innovate, to create solutions, to turn problems into business opportunities, among many other traits. Entrepreneurship also has many branches, among them Black entrepreneurship. But how is this form of entrepreneurship defined? Before answering that question, the e-book presents a series of introductory data needed to understand the specific, current context of the Black population in Brazil, which shapes both how these entrepreneurs start out and the perspective from which they run their businesses. Contact: contato@afroimpacto.com allcontributors-org-6127 ---- All Contributors: ✨ Recognize all contributors, not just the ones who push code ✨ Add new contributors in seconds: We’ve built a bot to automate the tedious stuff for adding project contributors, so you can focus on your project instead of managing your ReadMe.
How it works: 1. Install the bot to your project (check the Installation doc for how to add it). 2. Start a pull request or comment. 3. Mention the @all-contributors bot. 4. Add a contributor’s username and contribution type (check the contribution types in the Emoji Key cheatsheet). 5. Post, and your ReadMe updates automatically! It’ll add the Contributor Table for your first time, too. Who’s using it? There are 2000+ projects using All Contributors. andromedayelton-com-2311 ---- andromeda yelton I haven’t failed, I’ve just tried a lot of ML approaches that don’t work “Let’s blog every Friday,” I thought. “It’ll be great. People can see what I’m doing with ML, and it will be a useful practice for me!” And then I went through weeks on end of feeling like I had nothing to report because I was trying approach after approach to this one problem that simply didn’t work, hence not blogging. And finally realized: oh, the process is the thing to talk about… Hi. I’m Andromeda! I am trying to make a neural net better at recognizing people in archival photos. After running a series of experiments — enough for me to have written 3,804 words of notes — I now have a neural net that is ten times worse at its task. 🎉 And now I have 3,804 words of notes to turn into a blog post (a situation which gets harder every week). So let me catch you up on the outline of the problem: download a whole bunch of archival photos and their metadata (thanks, DPLA!); use a face detection ML library to locate faces, crop them out, and save them in a standardized way; benchmark an off-the-shelf face recognition system to see how good it is at identifying these faces; retrain it; benchmark my new system. Step 3: profit, right? Well. Let me also catch you up on some problems along the way: Alas, metadata Archival photos are great because they have metadata, and metadata is like labels, and labels mean you can do supervised learning, right? Well…. Is he “Du Bois, W. E. B. (William Edward Burghardt), 1868-1963” or “Du Bois, W. E. B. (William Edward Burghardt) 1868-1963” or “Du Bois, W. E. B. (William Edward Burghardt)” or “W.E.B. Du Bois”? I mean, these are all options. People have used a lot of different metadata practices at different institutions and in different times. But I’m going to confuse the poor computer if I imply to it that all these photos of the same person are photos of different people. (I have gone through several attempts to resolve this computationally without needing to do everything by hand, with only modest success.) What about “Photographs”? That appears in the list of subject labels for lots of things in my data set. “Photographs” is a person, right? I ended up pulling in an entire other ML component here — spaCy, to do some natural language processing to at least guess which lines are probably names, so I can clear the rest of them out of my way.
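A minimal, illustrative sketch of that spaCy filtering step (made-up strings rather than the actual DPLA metadata, and not the real pipeline):

import spacy

# Hypothetical subject strings standing in for the metadata described above.
subjects = [
    "Du Bois, W. E. B. (William Edward Burghardt), 1868-1963",
    "Photographs",
    "Martin Luther King Day",
]

nlp = spacy.load("en_core_web_sm")  # small English model, downloaded separately

def probably_a_person(label):
    # Keep a label only if spaCy tags something in it as a PERSON entity.
    return any(ent.label_ == "PERSON" for ent in nlp(label).ents)

names = [s for s in subjects if probably_a_person(s)]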
Even so, spaCy only has ~90% accuracy on personal names and, guess what, because everything is terrible in predictable ways, it has no idea “Kweisi Mfume” is a person. Is a person who appears in the photo guaranteed to be a person who appears in the metadata? Nope. Is a person who appears in the metadata guaranteed to be a person who appears in the photo? Also nope! Often they’re a photographer or other creator. Sometimes they are the subject of the depicted event, but not themselves in the photo. (spaCy will happily tell you that there’s personal name content in something like “Martin Luther King Day”, but MLK is unlikely to appear in a photo of an MLK Day event.) Oh dear, linear algebra OK but let’s imagine for the sake of argument that we live in a perfect world where the metadata is exactly what we need — no more, no less — and its formatting is perfectly consistent. 🦄 Here you are, in this perfect world, confronted with a photo that contains two people and has two names. How do you like them apples? I spent more time than I care to admit trying to figure this out. Can I bootstrap from photos that have one person and one name — identify those, subtract them out of photos of two people, go from there? (Not reliably — there’s a lot of data I never reach that way — and it’s horribly inefficient.) Can I do something extremely clever with matrix multiplication? Like…once I generate vector space embeddings of all the photos, can I do some sort of like dot-product thing across all of my photos, or big batches of them, and correlate the closest-match photos with overlaps in metadata? Not only is this a process which begs the question — I’d have to do that with the ML system I have not yet optimized for archival photo recognition, thus possibly just baking bad data in — but have I mentioned I have taken exactly one linear algebra class, which I didn’t really grasp, in 1995? What if I train yet another ML system to do some kind of k-means clustering on the embeddings? This is both a promising approach and some really first-rate yak-shaving, combining all the question-begging concerns of the previous paragraph with all the crystalline clarity of black box ML. Possibly at this point it would have been faster to tag them all by hand, but that would be admitting defeat. Also I don’t have a research assistant, which, let’s be honest, is the person who would usually be doing this actual work. I do have a 14-year-old and I am strongly considering paying her to do it for me, but to facilitate that I’d have to actually build a web interface and probably learn more about AWS, and the prospect of reading AWS documentation has a bracing way of reminding me of all of the more delightful and engaging elements of my todo list, like calling some people on the actual telephone to sort out however they’ve screwed up some health insurance billing. Nowhere to go but up Despite all of that, I did actually get all the way through the 5 steps above. I have a truly, spectacularly terrible neural net. Go me! But at a thousand-plus words, perhaps I should leave that story for next week…. Andromeda Uncategorized Leave a comment April 16, 2021 this time: speaking about machine learning No tech blogging this week because most of my time was taken up with telling people about ML instead! One talk for an internal Harvard audience, “Alice in Dataland”, where I explained some of the basics of neural nets and walked people through the stories I found through visualizing HAMLET data.
One talk for the NISO plus conference, “Discoverability in an AI World”, about ways libraries and other cultural heritage institutions are using AI both to enhance traditional discovery interfaces and provide new ones. This was recorded today but will be played at the conference on the 23rd, so there’s still time to register if you want to see it! NISO Plus will also include a session on AI, metadata, and bias featuring Dominique Luster, who gave one of my favorite code4lib talks, and one on AI and copyright featuring one of my go-to JD/MLSes, Nancy Sims. And I’m prepping for an upcoming talk that has not yet been formally announced. Which is to say, I guess, I have a lot of talks about AI and cultural heritage in my back pocket, if you were looking for someone to speak about that 😉 Andromeda Uncategorized Leave a comment February 12, 2021 archival face recognition for fun and nonprofit In 2019, Dominique Luster gave a super good Code4Lib talk about applying AI to metadata for the Charles “Teenie” Harris collection at the Carnegie Museum of Art — more than 70,000 photographs of Black life in Pittsburgh. They experimented with solutions to various metadata problems, but the one that’s stuck in my head since 2019 is the face recognition one. It sure would be cool if you could throw AI at your digitized archival photos to find all the instances of the same person, right? Or automatically label them, given that any of them are labeled correctly? Sadly, because we cannot have nice things, the data sets used for pretrained face recognition embeddings are things like lots of modern photos of celebrities, a corpus which wildly underrepresents 1) archival photos and 2) Black people. So the results of the face recognition process are not all that great. I have some extremely technical ideas for how to improve this — ideas which, weirdly, some computer science PhDs I’ve spoken with haven’t seen in the field. So I would like to experiment with them. But I must first invent the universe set up a data processing pipeline. Three steps here: Fetch archival photographs; Do face detection (draw bounding boxes around faces and crop them out for use in the next step); Do face recognition. For step 1, I’m using DPLA, which has a super straightforward and well-documented API and an easy-to-use Python wrapper (which, despite not having been updated in a while, works just fine with Python 3.6, the latest version compatible with some of my dependencies). For step 2, I’m using mtcnn, because I’ve been following this tutorial. For step 3, face recognition, I’m using the steps in the same tutorial, but purely for proof-of-concept — the results are garbage because archival photos from mid-century don’t actually look anything like modern-day celebrities. (Neural net: “I have 6% confidence this is Stevie Wonder!” How nice for you.) Clearly I’m going to need to build my own corpus of people, which I have a plan for (i.e. I spent some quality time thinking about numpy) but haven’t yet implemented. So far the gotchas have been: Gotcha 1: If you fetch a page from the API and assume you can treat its contents as an image, you will be sad. 
You have to treat them as a raw data stream and interpret that as an image, thusly:

import io
import requests
from PIL import Image

response = requests.get(url, stream=True)
response.raw.decode_content = True
data = response.content  # read the body of the streamed response
Image.open(io.BytesIO(data))

This code is, of course, hilariously lacking in error handling, despite fetching content from a cesspool of untrustworthiness, aka the internet. It’s a first draft. Gotcha 2: You see code snippets to convert images to pixel arrays (suitable for AI ingestion) that look kinda like this: np.array(image).astype('uint8'). Except they say astype('float32') instead of astype('uint8'). I got a creepy photonegative effect when I used floats. Gotcha 3: Although PIL was happy to manipulate the .pngs fetched from the API, it was not happy to write them to disk; I needed to convert formats first (image.convert('RGB')). Gotcha 4: The suggested keras_vggface library doesn’t have a Pipfile or requirements.txt, so I had to manually install keras and tensorflow. Luckily the setup.py documented the correct versions. Sadly the tensorflow version is only compatible with python up to 3.6 (hence the comment about DPyLA compatibility above). I don’t love this, but it got me up and running, and it seems like an easy enough part of the pipeline to rip out and replace if it’s bugging me too much. The plan from here, not entirely in order, subject to change as I don’t entirely know what I’m doing until after I’ve done it:
- Build my own corpus of identified people. This means the numpy thoughts, above. It also means spending more quality time with the API to see if I can automatically apply names from photo metadata rather than having to spend too much of my own time manually labeling the corpus.
- Decide how much metadata I need to pull down in my data pipeline and how to store it.
- Figure out some kind of benchmark and measure it.
- Try out my idea for improving recognition accuracy.
- Benchmark again.
- Hopefully celebrate awesomeness.
Andromeda Uncategorized Leave a comment February 5, 2021 sequence models of language: slightly irksome Not much AI blogging this week because I have been buried in adulting all week, which hasn’t left much time for machine learning. Sadface. However, I’m in the last week of the last deeplearning.ai course! (Well. Of the deeplearning.ai sequence that existed when I started, anyway. They’ve since added an NLP course and a GANs course, so I’ll have to think about whether I want to take those too, but at the moment I’m leaning toward a break from the formal structure in order to give myself more time for project-based learning.) This one is on sequence models (i.e. “the data comes in as a stream, like music or language”) and machine translation (“what if we also want our output to be a stream, because we are going from a sentence to a sentence, and not from a sentence to a single output as in, say, sentiment analysis”). And I have to say, as a former language teacher, I’m slightly irked. Because the way the models work is — OK, consume your input sentence one token at a time, with some sort of memory that allows you to keep track of prior tokens in processing current ones (so far, so okay). And then for your output — spit out a few most-likely candidate tokens for the first output term, and then consider your options for the second term and pick your most-likely two-token pairs, and then consider all the ways your third term could combine with those pairs and pick your most likely three-token sequences, et cetera, continue until done.
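Here is a toy sketch of that beam-search decoding loop, purely illustrative: the made-up next_token_probs function stands in for a trained model, and the probabilities mean nothing.

import math

def next_token_probs(prefix):
    # Stand-in for a real model: return {token: probability} given the sequence so far.
    return {"the": 0.4, "cat": 0.3, "sat": 0.2, "<end>": 0.1}

def beam_search(beam_width=2, max_len=4):
    beams = [([], 0.0)]  # each beam is (token sequence, log-probability)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq and seq[-1] == "<end>":
                candidates.append((seq, score))  # finished sequences carry forward unchanged
                continue
            for token, p in next_token_probs(seq).items():
                candidates.append((seq + [token], score + math.log(p)))
        # keep only the top `beam_width` partial sequences at each step
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams

print(beam_search())

The sort-and-truncate line is the step being described: only a handful of partial sequences survive from one token to the next.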
And that is…not how language works? Look at Cicero, presuming upon your patience as he cascades through clause after clause which hang together in parallel but are not resolved until finally, at the end, a verb. The sentence’s full range of meanings doesn’t collapse until that verb at the end, which means you cannot be certain if you move one token at a time; you need to reconsider the end in light of the beginning. But, at the same time, that ending token is not equally presaged by all former tokens. It is a verb, it has a subject, and when we reached that subject, likely near the beginning of the sentence, helpfully (in Latin) identified by the nominative case, we already knew something about the verb — a fact we retained all the way until the end. And on our way there, perhaps we tied off clause after clause, chunking them into neat little packages, but none of them nearly so relevant to the verb — perhaps in fact none of them really tied to the verb at all, because they’re illuminating some noun we met along the way. Pronouns, pointing at nouns. Adjectives, pointing at nouns. Nouns, suspended with verbs like a mobile, hanging above and below, subject and object. Adverbs, keeping company only with verbs and each other. There’s so much data in the sentence about which word informs which that the beam model casually discards. Wasteful. And forcing the model to reinvent all these things we already knew — to allocate some of its neural space to re-engineering things we could have told it from the beginning. Clearly I need to get my hands on more modern language models (a bizarre sentence since this class is all of 3 years old, but the field moves that fast). Andromeda Uncategorized 1 Comment January 15, 2021 Adapting Coursera’s neural style transfer code to localhost Last time, when making cats from the void, I promised that I’d discuss how I adapted the neural style transfer code from Coursera’s Convolutional Neural Networks course to run on localhost. Here you go! Step 1: First, of course, download (as python) the script. You’ll also need the nst_utils.py file, which you can access via File > Open. Step 2: While the Coursera file is in .py format, it’s iPython in its heart of hearts. So I opened a new file and started copying over the bits I actually needed, reading them as I went to be sure I understood how they all fit together. Along the way I also organized them into functions, to clarify where each responsibility happened and give it a name. The goal here was ultimately to get something I could run at the command line via python dpla_cats.py, so that I could find out where it blew up in step 3. Step 3: Time to install dependencies. I promptly made a pipenv and, in running the code and finding what ImportErrors showed up, discovered what I needed to have installed: scipy, pillow, imageio, tensorflow. Whatever available versions of the former three worked, but for tensorflow I pinned to the version used in Coursera — 1.2.1 — because there are major breaking API changes with the current (2.x) versions. This turned out to be a bummer, because tensorflow promptly threw warnings that it could be much faster on my system if I compiled it with various flags my computer supports. 
OK, so I looked up the docs for doing that, which said I needed bazel/bazelisk — but of course I needed a paleolithic version of that for tensorflow 1.2.1 compat, so it was irritating to install — and then running that failed because it needed a version of Java old enough that I didn’t have it, and at that point I gave up because I have better things to do than installing quasi-EOLed Java versions. Updating the code to be compatible with the latest tensorflow version and compiling an optimized version of that would clearly be the right answer, but also it would have been work and I wanted messed-up cat pictures now. (As for the rest of my dependencies, I ended up with scipy==1.5.4, pillow==8.0.1, and imageio==2.9.0, and then whatever sub-dependencies pipenv installed. Just in case the latest versions don’t work by the time you read this. 🙂 At this point I had achieved goal 1, aka “getting anything to run at all”. Step 4: I realized that, honestly, almost everything in nst_utils wanted to be an ImageUtility, which was initialized with metadata about the content and style files (height, width, channels, paths), and carried the globals (shudder) originally in nst_utils as class data. This meant that my new dpla_cats script only had to import ImageUtility rather than * (from X import * is, of course, deeply unnerving), and that utility could pingpong around knowing how to do the things it knew how to do, whenever I needed to interact with image-y functions (like creating a generated image or saving outputs) rather than neural-net-ish stuff. Everything in nst_utils that properly belonged in an ImageUtility got moved, step by step, into that class; I think one or two functions remained, and they got moved into the main script. Step 5: Ughhh, scope. The notebook plays fast and loose with scope; the raw python script is, rightly, not so forgiving. But that meant I had to think about what got defined at what level, what got passed around in an argument, what order things happened in, et cetera. I’m not happy with the result — there’s a lot of stuff that will fail with minor edits — but it works. Scope errors will announce themselves pretty loudly with exceptions; it’s just nice to know you’re going to run into them. Step 5a: You have to initialize the Adam optimizer before you run sess.run(tf.global_variables_initializer()). (Thanks, StackOverflow!) The error message if you don’t is maddeningly unhelpful. (FailedPreconditionError, I mean, what.) Step 6: argparse! I spent some quality time reading this neural style implementation early on and thought, gosh, that’s argparse-heavy. Then I found myself wanting to kick off a whole bunch of different script runs to do their thing overnight investigating multiple hypotheses and discovered how very much I wanted there to be command-line arguments, so I could configure all the different things I wanted to try right there and leave it alone. Aw yeah. 
I’ve ended up with the following:

parser.add_argument('--content', required=True)
parser.add_argument('--style', required=True)
parser.add_argument('--iterations', default=400)  # was 200
parser.add_argument('--learning_rate', default=3.0)  # was 2.0
parser.add_argument('--layer_weights', nargs=5, default=[0.2, 0.2, 0.2, 0.2, 0.2])
parser.add_argument('--run_until_steady', default=False)
parser.add_argument('--noisy_start', default=True)

content is the path to the content image; style is the path to the style image; iterations and learning_rate are the usual; layer_weights is the value of STYLE_LAYERS in the original code, i.e. how much to weight each layer; run_until_steady is a bad API because it means to ignore the value of the iterations parameter and instead run until there is no longer significant change in cost; and noisy_start is whether to use the content image plus static as the first input or just the plain content image. I can definitely see adding more command line flags if I were going to be spending a lot of time with this code. (For instance, a layer_names parameter that adjusted what STYLE_LAYERS considered could be fun! Or making “significant change in cost” be a user-supplied rather than hardcoded parameter!) Step 6a: Correspondingly, I configured the output filenames to record some of the metadata used to create the image (content, style, layer_weights), to make it easier to keep track of which images came from which script runs. Stuff I haven’t done but it might be great:
- Updating tensorflow, per above, and recompiling it. The slowness is acceptable — I can run quite a few trials on my 2015 MacBook overnight — but it would get frustrating if I were doing a lot of this.
- Supporting both num_iterations and run_until_steady means my iterator inside the model_nn function is kind of a mess right now. I think they’re itching to be two very thin subclasses of a superclass that knows all the things about neural net training, with the subclass just handling the iterator, but I didn’t spend a lot of time thinking about this.
- Reshaping input files. Right now it needs both input files to be the same dimensions. Maybe it would be cool if it didn’t need that.
- Trying different pretrained models! It would be easy to pass a different arg to load_vgg_model. It would subsequently be annoying to make sure that STYLE_LAYERS worked — the available layer names would be different, and load_vgg_model makes a lot of assumptions about how that model is shaped.
As your reward for reading this post, you get another cat image! A friend commented that a thing he dislikes about neural style transfer is that it’s allergic to whitespace; it wants to paint everything with a texture. This makes sense — it sees subtle variations within that whitespace and it tries to make them conform to patterns of variation it knows. This is why I ended up with the noisy_start flag; I wondered what would happen if I didn’t add the static to the initial image, so that the original negative space stayed more negative-spacey. This, as you can probably tell, uses the Harlem renaissance style image. It’s still allergic to negative space — even without the generated static there are variations in pixel color in the original — but they are much subtler, so instead of saying “maybe what I see is coiled hair?” it says “big open blue patches; we like those”.
But the semantics of the original image are more in place — the kittens more kitteny, the card more readable — even though the whole image has been pushed more to colorblocks and bold lines. I find I like the results better without the static — even though the cost function is larger, and thus in a sense the algorithm is less successful. Look, one more. Superhero! Andromeda Uncategorized Leave a comment January 3, 2021 Dear Internet, merry Christmas; my robot made you cats from the void Recently I learned how neural style transfer works. I wanted to be able to play with it more and gain some insights, so I adapted the Coursera notebook code to something that works on localhost (more on that in a later post), found myself a nice historical cat image via DPLA, and started mashing it up with all manner of images of varying styles culled from DPLA’s list of primary source sets. (It really helped me that these display images were already curated for looking cool, and cropped to uniform size!) These sweet babies do not know what is about to happen to them. Let’s get started, shall we? Style image from the Fake News in the 1890s: Yellow Journalism primary source set. I really love how this one turned out. It’s pulled the blue and yellow colors, and the concerned face of the lower kitten was a perfect match for the expression on the right-hand muckraker. The lines of the card have taken on the precise quality of those in the cartoon — strong outlines and textured interiors. “Merry Christmas” the bird waves, like an eager newsboy. Style image from the Food and Social Justice exhibit. This is one of the first ones I made, and I was delighted by how it learned the square-iness of its style image. Everything is more snapped to a grid. The colors are bolder, too, cueing off of that dominant yellow. The Christmas banner remains almost readable and somehow heraldic. Style image from the Truth, Justice, and the American Way primary source set. How about Christmas of Steel? These kittens have broadly retained their shape (perhaps as the figures in the comic book foreground have organic detail?), but the background holly is more polygon-esque. The colors have been nudged toward primary, and the static of the background has taken on a swirl of dynamic motion lines. Style image from the Visual Art During the Harlem Renaissance primary source set. How about starting with something boldly colored and almost abstract? Why look: the kittens have learned a world of black and white and blue, with the background transformed into that stippled texture it picked up from the hair. The holly has gone more colorblocky and the lines bolder. Style image from the Treaty of Versailles and the End of World War I primary source set. This one learned its style so aptly that I couldn’t actually tell where the boundary between the second and third images was when I was placing that equals sign. The soft pencil lines, the vertical textures of shadows and jail bars, the fact that all the colors in the world are black and white and orange (the latter mostly in the middle) — these kittens are positively melting before the force of Wilsonian propaganda. Imagine them in the Hall of Mirrors, drowning in gold and reflecting back at you dozens of times, for full nightmare effect. Style image from the Victorian Era primary source set. Shall we step back a few decades to something slightly more calming? These kittens have learned to take on soft lines and swathes of pale pink. 
The holly is perfectly happy to conform itself to the texture of these New England trees. The dark space behind the kittens wonders if, perhaps, it is meant to be lapels. I totally can’t remember how I found this cropped version of US food propaganda. And now for kittens from the void. Brown, it has learned. The world is brown. The space behind the kittens is brown. Those dark stripes were helpfully already brown. The eyes were brown. Perhaps they can be the same brown, a hole dropped through kitten-space. I thought this was honestly pretty creepy, and I wondered if rerunning the process with different layer weights might help. Each layer of the neural net notices different sorts of things about its image; it starts with simpler things (colors, straight lines), moves through compositions of those (textures, basic shapes), and builds its way up to entire features (faces). The style transfer algorithm looks at each of those layers and applies some of its knowledge to the generated image. So I thought, what if I change the weights? The initial algorithm weights each of five layers equally; I reran it weighted toward the middle layers and entirely ignoring the first layer, in hopes that it would learn a little less about gaping voids of brown. Same thing, less void. This worked! There’s still a lot of brown, but the kitten’s eye is at least separate from its facial markings. My daughter was also delighted by how both of these images want to be letters; there are lots of letter-ish shapes strewn throughout, particularly on the horizontal line that used to be the edge of a planter, between the lower cat and the demon holly. So there you go, internet; some Christmas cards from the nightmare realm. May 2021 bring fewer nightmares to us all. Andromeda Uncategorized 1 Comment December 24, 2020December 24, 2020 this week in my AI After visualizing a whole bunch of theses and learning about neural style transfer and flinging myself at t-SNE I feel like I should have something meaty this week but they can’t all be those weeks, I guess. Still, I’m trying to hold myself to Friday AI blogging, so here are some work notes: Finished course 4 of the deeplearning.ai sequence. Yay! The facial recognition assignment is kind of buggy and poorly documented and I felt creepy for learning it in the first place, but I’m glad to have finished. Only one more course to go! It’s a 3-week course, so if I’m particularly aggressive I might be able to get it all done by year’s end. Tried making a 3d version of last week’s visualization — several people had asked — but it turned out to not really add anything. Oh well. Been thinking about Charlie Harper’s talk at SWiB this year, Generating metadata subject labels with Doc2Vec and DBPedia. This talk really grabbed me because he started with the exact same questions and challenges as HAMLET — seriously, the first seven and a half minutes of this talk could be the first seven and a half minutes of a talk on HAMLET, essentially verbatim — but took it off in a totally different direction (assigning subject labels). I have lots of ideas about where one might go with this but right now they are all sparkling Voronoi diagrams in my head and that’s not a language I can readily communicate. All done with the second iteration of my AI for librarians course. There were some really good final projects this term. Yay, students! Andromeda Uncategorized 1 Comment December 18, 2020December 19, 2020 Though these be matrices, yet there is method in them. 
When I first trained a neural net on 43,331 theses to make HAMLET, one of the things I most wanted to do is be able to visualize them. If word2vec places documents ‘near’ each other in some kind of inferred conceptual space, we should be able to see some kind of map of them, yes? Even if I don’t actually know what I’m doing? Turns out: yes. And it’s even better than I’d imagined. 43,331 graduate theses, arranged by their conceptual similarity. Let me take you on a tour! Region 1 is biochemistry. The red dots are biology; the orange ones, chemistry. Theses here include Positional cloning and characterization of the mouse pudgy locus and Biosynthetic engineering for the assembly of better drugs. If you look closely, you will see a handful of dots in different colors, like a buttery yellow. This color is electrical engineering & computer science, and its dots in this region include Computational regulatory genomics : motifs, networks, and dynamics — that is to say, a computational biology thesis that happens to have been housed in computation rather than biology. The green south of Region 2 is physics. But you will note a bit of orange here. Yes, that’s chemistry again; for example, Dynamic nuclear polarization of amorphous and crystalline small molecules. If (like me), you almost majored in chemistry and realized only your senior year that the only chemistry classes that interested you were the ones that were secretly physics…this is your happy place. In fact, most of the theses here concern nuclear magnetic resonance applications. Region 3 has a striking vertical green stripe which turns out to be the nuclear engineering department. But you’ll see some orange streaks curling around it like fingers, almost suggesting three-dimensional depth. I point this out as a reminder that the original neural net embeds these 43,331 documents in a 52-dimensional space; I have projected that down to 2 dimensions because I don’t know about you but I find 52 dimensions somewhat challenging to visualize. However — just as objects may overlap in a 2-dimensional photo even when they are quite distant in 3-dimensional space — dots that are close together in this projection may be quite far apart in reality. Trust the overall structure more than each individual element. The map is not the territory. That little yellow thumb by Region 4 is mathematics, now a tiny appendage off of the giant discipline it spawned — our old friend buttery yellow, aka electrical engineering & computer science. If you zoom in enough you find EECS absolutely everywhere, applied to all manner of disciplines (as above with biology), but the bulk of it — including the quintessential parts, like compilers — is right here. Dramatically red Region 5, clustered together tightly and at the far end, is architecture. This is a renowned department (it graduated I.M. Pei!), but definitely a different sort of creature than most of MIT, so it makes sense that it’s at one extreme of the map. That said, the other two programs in its school — Urban Studies & Planning and Media Arts & Sciences — are just to its north. Region 6 — tiny, yellow, and pale; you may have missed it at first glance — is linguistics island, housing theses such as Topics in the stress and syntax of words. You see how there are also a handful of red dots on this island? They are Brain & Cognitive Science theses — and in particular, ones that are secretly linguistics, like Intonational phrasing in language production and comprehension. 
Similarly — although at MIT it is not the department of linguistics, but the department of linguistics & philosophy — the philosophy papers are elsewhere. (A few of the very most abstract ones are hanging out near math.) And what about Region 7, the stingray swimming vigorously away from everything else? I spent a long time looking at this and not seeing a pattern. You can tell there’s a lot of colors (departments) there, randomly assorted; even looking at individual titles I couldn’t see anything. Only when I looked at the original documents did I realize that this is the island of terrible OCR. Almost everything here is an older thesis, with low-quality printing or even typewriting, often in a regrettable font, maybe with the reverse side of the page showing through. (A randomly chosen example; pdf download.) A good reminder of the importance of high-quality digitization labor. A heartbreaking example of the things we throw away when we make paper the archival format for born-digital items. And also a technical inspiration — look how much vector space we’ve had to carve out to make room for these! the poor neural net, trying desperately to find signal in the noise, needing all this space to do it. I’m tempted to throw out the entire leftmost quarter of this graph, rerun the 2d projection, and see what I get — would we be better able to see the structures in the high-quality data if they had room to breathe? And were I to rerun the entire neural net training process again, I’d want to include some sort of threshold score for OCR quality. It would be a shame to throw things away — especially since they will be a nonrandom sample, mostly older theses — but I have already had to throw away things I could not OCR at all in an earlier pass, and, again, I suspect the neural net would do a better job organizing the high-quality documents if it could use the whole vector space to spread them out, rather than needing some of it to encode the information “this is terrible OCR and must be kept away from its fellows”. Clearly I need to share the technical details of how I did this, but this post is already too long, so maybe next week. tl;dr I reached out to Matt Miller after reading his cool post on vectorizing the DPLA and he tipped me off to UMAP and here we are — thanks, Matt! And just as clearly you want to play with this too, right? Well, it’s super not ready to be integrated into HAMLET due to any number of usability issues but if you promise to forgive me those — have fun. You see how when you hover over a dot you get a label with the format 1721.1-X.txt? It corresponds to a URL of the format https://hamlet.andromedayelton.com/similar_to/X. Go play :). Andromeda Uncategorized 2 Comments December 11, 2020 Of such stuff are (deep)dreams made: convolutional networks and neural style transfer Skipped FridAI blogging last week because of Thanksgiving, but let’s get back on it! Top-of-mind today are the firing of AI queen Timnit Gebru (letter of support here) and a couple of grant applications that I’m actually eligible for (this is rare for me! I typically need things for which I can apply in my individual capacity, so it’s always heartening when they exist — wish me luck). But for blogging today, I’m gonna talk about neural style transfer, because it’s cool as hell. I started my ML-learning journey on Coursera’s intro ML class and have been continuing with their deeplearning.ai sequence; I’m on course 4 of 5 there, so I’ve just gotten to neural style transfer.
This is the thing where a neural net outputs the content of one picture in the style of another: Via https://medium.com/@build_it_for_fun/neural-style-transfer-with-swift-for-tensorflow-b8544105b854. OK, so! Let me explain while it’s still fresh. If you have a neural net trained on images, it turns out that each layer is responsible for recognizing different, and progressively more complicated, things. The specifics vary by neural net and data set, but you might find that the first layer gets excited about straight lines and colors; the second about curves and simple textures (like stripes) that can be readily composed from straight lines; the third about complex textures and simple objects (e.g. wheels, which are honestly just fancy circles); and so on, until the final layers recognize complex whole objects. You can interrogate this by feeding different images into the neural net and seeing which ones trigger the highest activation in different neurons. Below, each 3×3 grid represents the most exciting images for a particular neuron. You can see that in this network, there are Layer 1 neurons excited about colors (green, orange), and about lines of particular angles that form boundaries between dark and colored space. In Layer 2, these get built together like tiny image legos; now we have neurons excited about simple textures such as vertical stripes, concentric circles, and right angles. Via https://adeshpande3.github.io/The-9-Deep-Learning-Papers-You-Need-To-Know-About.html, originally from Zeiler & Fergus, Visualizing and Understanding Convolutional Networks So how do we get from here to neural style transfer? We need to extract information about the content of one image, and the style of another, in order to make a third image that approximates both of them. As you already expect if you have done a little machine learning, that means that we need to write cost functions that mean “how close is this image to the desired content?” and “how close is this image to the desired style?” And then there’s a wrinkle that I haven’t fully understood, which is that we don’t actually evaluate these cost functions (necessarily) against the outputs of the neural net; we actually compare the activations of the neurons, as they react to different images — and not necessarily from the final layer! In fact, choice of layer is a hyperparameter we can vary (I super look forward to playing with this on the Coursera assignment and thereby getting some intuition). So how do we write those cost functions? The content one is straightforward: if two images have the same content, they should yield the same activations. The greater the differences, the greater the cost (specifically via a squared error function that, again, you may have guessed if you’ve done some machine learning). The style one is beautifully sneaky; it’s a measure of the difference in correlation between activations across channels. What does that mean in English? Well, let’s look at the van Gogh painting, above. If an edge detector is firing (a boundary between colors), then a swirliness detector is probably also firing, because all the lines are curves — that’s characteristic of van Gogh’s style in this painting. On the other hand, if a yellowness detector is firing, a blueness detector may or may not be (sometimes we have tight parallel yellow and blue lines, but sometimes yellow is in the middle of a large yellow region). Style transfer posits that artistic style lies in the correlations between different features. See? Sneaky. And elegant.
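If it helps to see that written down, here is a minimal numpy sketch of the Gram-matrix style cost, roughly as the Coursera assignment formulates it; in the real code these are tensorflow tensors, and the function names are illustrative:

import numpy as np

def gram_matrix(activations):
    """One layer's activations, shape (n_H, n_W, n_C), reduced to channel-by-channel correlations."""
    n_H, n_W, n_C = activations.shape
    A = activations.reshape(n_H * n_W, n_C).T   # unroll to (n_C, n_H * n_W)
    return A @ A.T                              # (n_C, n_C): how strongly pairs of channels co-fire

def style_cost_for_layer(a_style, a_generated):
    """Squared difference between the two Gram matrices, with the usual normalization."""
    n_H, n_W, n_C = a_style.shape
    G_style = gram_matrix(a_style)
    G_generated = gram_matrix(a_generated)
    return np.sum((G_style - G_generated) ** 2) / (4 * n_C ** 2 * (n_H * n_W) ** 2)

The "swirliness fires whenever edges fire" intuition lives in the off-diagonal entries of those Gram matrices; matching them between the style image and the generated image is what transfers the style.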
Finally, for the style-transferred output, you need to generate an image that does as well as possible on both cost functions simultaneously — getting as close to the content as it can without unduly sacrificing the style, and vice versa. As a side note, I think I now understand why DeepDream is fixated on a really rather alarming number of eyes. Since the layer choice is a hyperparameter, I hypothesize that choosing too deep a layer — one that’s started to find complex features rather than mere textures and shapes — will communicate to the system, yes, what I truly want is for you to paint this image as if those complex features are matters of genuine stylistic significance. And, of course, eyes are simple enough shapes to be recognized relatively early (not very different from concentric circles), yet ubiquitous in image data sets. So…this is what you wanted, right? the eager robot helpfully offers. https://www.ucreative.com/inspiration/google-deep-dream-is-the-trippiest-thing-in-the-internet/ I’m going to have fun figuring out what the right layer hyperparameter is for the Coursera assignment, but I’m going to have so much more fun figuring out the wrong ones. Andromeda Uncategorized 2 Comments December 4, 2020 Let’s visualize some HAMLET data! Or, d3 and t-SNE for the lols. In 2017, I trained a neural net on ~44K graduate theses using the Doc2Vec algorithm, in hopes that doing so would provide a backend that could support novel and delightful discovery mechanisms for unique library content. The result, HAMLET, worked better than I hoped; it not only pulls together related works from different departments (thus enabling discovery that can’t be supported with existing metadata), but it does a spirited job on documents whose topics are poorly represented in my initial data set (e.g. when given a fiction sample it finds theses from programs like media studies, even though there are few humanities theses in the data set). That said, there are a bunch of exploratory tools I’ve had in my head ever since 2017 that I’ve not gotten around to implementing. But here, in the spirit of tossing out things that don’t bring me joy (like 2020) and keeping those that do, I’m gonna make some data viz! There are only two challenges with this: By default Doc2Vec embeds content in a 100-dimensional space, which is kind of hard to visualize. I need to project that down to 2 or 3 dimensions. I don’t actually know anything about dimensionality reduction techniques, other than that they exist. I also don’t know JavaScript much beyond a copy-paste level. I definitely don’t know d3, or indeed the pros and cons of various visualization libraries. Also art. Or, like, all that stuff in Tufte’s book, which I bounced off of. (But aside from that, Mr. Lincoln, how was the play?) I decided I should start with the pages that display the theses most similar to a given thesis (shout-out to Jeremy Brown, startup founder par excellence) rather than with my ideas for visualizing the whole collection, because I’ll only need to plot ten or so points instead of 44K. This will make it easier for me to tell visually if I’m on the right track and should let me skip dealing with performance issues for now. On the down side, it means I may need to throw out any code I write at this stage when I’m working on the next one. 🤷‍♀️ And I now have a visualization on localhost! Which you can’t see because I don’t trust it yet.
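Under the hood, the shape of that similar-theses pipeline is pretty small. A hedged sketch follows; the function names and the choice of scikit-learn's t-SNE are illustrative assumptions, not the actual HAMLET code:

import numpy as np
from sklearn.manifold import TSNE

def project_to_2d(vectors, random_state=0):
    """Reduce the ten-ish 100-dimensional Doc2Vec vectors to 2-d points for plotting."""
    coords = TSNE(n_components=2, perplexity=5, random_state=random_state).fit_transform(np.array(vectors))
    return [{"x": float(x), "y": float(y)} for x, y in coords]

# points = project_to_2d(similar_thesis_vectors)  # hand these to d3 on the page

Pinning random_state at least makes the layout deterministic between pageloads, though it does nothing about how faithfully the projection preserves distances.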
But here are the problems I’ve solved thus far: It’s hard to copy-paste d3 examples on the internet. d3’s been around for long enough there’s substantial content about different versions, so you have to double-check. But also most of the examples are live code notebooks on Observable, which is a wicked cool service but not the same environment as a web page! If you just copy-paste from there you will have things that don’t work due to invisible environment differences and then you will be sad. 😢 I got tipped off to this by Mollie Marie Pettit’s great Your First d3 Scatterplot notebook, which both names the phenomenon and provides two versions of the code (the live-editable version and the one you can actually copy/paste into your editor). If you start googling for dimensionality reduction techniques you will mostly find people saying “use t-SNE”, but t-SNE is a lying liar who lies. Mind you, it’s what I’m using right now because it’s so well-documented it was the easiest thing to set up. (This is why I said above that I don’t trust my viz.) But it produces different results for the same data on different pageloads (obviously different, so no one looking at the page will trust it either), and it’s not doing a good job preserving the distances I care about. (I accept that anything projecting from 100d down to 2d will need to distort distances, but I want to adequately preserve meaning — I want the visualization to not just look pretty but to give people an intellectually honest insight into the data — and I’m not there yet.) Conveniently this is not my first time at the software engineering rodeo, so I encapsulated my dimensionality reduction strategy inside a function, and I can swap it out for whatever I like without needing to rewrite the d3 as long as I return the same data structure. So that’s my next goal — try out UMAP (hat tip to Matt Miller for suggesting that to me), try out PCA, fiddle some parameters, try feeding it just the data I want to visualize vs larger neighborhoods, see if I’m happier with what I get. UMAP in particular alleges itself to be fast with large data sets, so if I can get it working here I should be able to leverage that knowledge for my ideas for visualizing the whole thing. Onward, upward, et cetera. 🎉 Andromeda Uncategorized 2 Comments November 20, 2020 andromedayelton-com-4308 ---- andromeda yelton I haven’t failed, I’ve just tried a lot of ML approaches that don’t work “Let’s blog every Friday,” I thought. “It’ll be great. People can see what I’m doing with ML, and it will be a useful practice for me!” And then I went through weeks on end of feeling like I had nothing to report because I was trying approach after approach to this one problem that simply … Continue reading I haven’t failed, I’ve just tried a lot of ML approaches that don’t work → this time: speaking about machine learning No tech blogging this week because most of my time was taken up with telling people about ML instead! One talk for an internal Harvard audience, “Alice in Dataland”, where I explained some of the basics of neural nets and walked people through the stories I found through visualizing HAMLET data.
One talk for the … Continue reading this time: speaking about machine learning → archival face recognition for fun and nonprofit In 2019, Dominique Luster gave a super good Code4Lib talk about applying AI to metadata for the Charles “Teenie” Harris collection at the Carnegie Museum of Art — more than 70,000 photographs of Black life in Pittsburgh. They experimented with solutions to various metadata problems, but the one that’s stuck in my head since 2019 … Continue reading archival face recognition for fun and nonprofit → sequence models of language: slightly irksome Not much AI blogging this week because I have been buried in adulting all week, which hasn’t left much time for machine learning. Sadface. However, I’m in the last week of the last deeplearning.ai course! (Well. Of the deeplearning.ai sequence that existed when I started, anyway. They’ve since added an NLP course and a GANs … Continue reading sequence models of language: slightly irksome → Adapting Coursera’s neural style transfer code to localhost Last time, when making cats from the void, I promised that I’d discuss how I adapted the neural style transfer code from Coursera’s Convolutional Neural Networks course to run on localhost. Here you go! Step 1: First, of course, download (as python) the script. You’ll also need the nst_utils.py file, which you can access via … Continue reading Adapting Coursera’s neural style transfer code to localhost → Dear Internet, merry Christmas; my robot made you cats from the void Recently I learned how neural style transfer works. I wanted to be able to play with it more and gain some insights, so I adapted the Coursera notebook code to something that works on localhost (more on that in a later post), found myself a nice historical cat image via DPLA, and started mashing it … Continue reading Dear Internet, merry Christmas; my robot made you cats from the void → this week in my AI After visualizing a whole bunch of theses and learning about neural style transfer and flinging myself at t-SNE I feel like I should have something meaty this week but they can’t all be those weeks, I guess. Still, I’m trying to hold myself to Friday AI blogging, so here are some work notes: Finished course … Continue reading this week in my AI → Though these be matrices, yet there is method in them. When I first trained a neural net on 43,331 theses to make HAMLET, one of the things I most wanted to do is be able to visualize them. If word2vec places documents ‘near’ each other in some kind of inferred conceptual space, we should be able to see some kind of map of them, yes? … Continue reading Though these be matrices, yet there is method in them. → Of such stuff are (deep)dreams made: convolutional networks and neural style transfer Skipped FridAI blogging last week because of Thanksgiving, but let’s get back on it! Top-of-mind today are the firing of AI queen Timnit Gebru (letter of support here) and a couple of grant applications that I’m actually eligible for (this is rare for me! I typically need things for which I can apply in my … Continue reading Of such stuff are (deep)dreams made: convolutional networks and neural style transfer → Let’s visualize some HAMLET data! Or, d3 and t-SNE for the lols. In 2017, I trained a neural net on ~44K graduate theses using the Doc2Vec algorithm, in hopes that doing so would provide a backend that could support novel and delightful discovery mechanisms for unique library content. 
The result, HAMLET, worked better than I hoped; it not only pulls together related works from different departments (thus … Continue reading Let’s visualize some HAMLET data! Or, d3 and t-SNE for the lols. → andromedayelton-com-509 ---- I haven’t failed, I’ve just tried a lot of ML approaches that don’t work – andromeda yelton Skip to content andromeda yelton Menu Home About Contact Resume HAMLET LITA Talks Machine Learning (ALA Midwinter 2019) Boston Python Meetup (August 21, 2018) SWiB16 LibTechConf 2016 Code4Lib 2015 Keynote Texas Library Association 2014 Online Northwest 2014: Five Conversations About Code New Jersey ESummit (May 2, 2013) Westchester Library Association (January 7, 2013) Bridging the Digital Divide with Mobile Services (Webjunction, July 25 2012) I haven’t failed, I’ve just tried a lot of ML approaches that don’t work Andromeda Uncategorized April 16, 2021 “Let’s blog every Friday,” I thought. “It’ll be great. People can see what I’m doing with ML, and it will be a useful practice for me!” And then I went through weeks on end of feeling like I had nothing to report because I was trying approach after approach to this one problem that simply didn’t work, hence not blogging. And finally realized: oh, the process is the thing to talk about… Hi. I’m Andromeda! I am trying to make a neural net better at recognizing people in archival photos. After running a series of experiments — enough for me to have written 3,804 words of notes — I now have a neural net that is ten times worse at its task. 🎉 And now I have 3,804 words of notes to turn into a blog post (a situation which gets harder every week). So let me catch you up on the outline of the problem: Download a whole bunch of archival photos and their metadata (thanks, DPLA!) Use a face detection ML library to locate faces, crop them out, and save them in a standardized way Benchmark an off-the-shelf face recognition system to see how good it is at identifying these faces Retrain it Benchmark my new system Step 3: profit, right? Well. Let me also catch you up on some problems along the way: Alas, metadata Archival photos are great because they have metadata, and metadata is like labels, and labels mean you can do supervised learning, right? Well…. Is he “Du Bois, W. E. B. (William Edward Burghardt), 1868-1963” or “Du Bois, W. E. B. (William Edward Burghardt) 1868-1963” or “Du Bois, W. E. B. (William Edward Burghardt)” or “W.E.B. Du Bois”? I mean, these are all options. People have used a lot of different metadata practices at different institutions and in different times. But I’m going to confuse the poor computer if I imply to it that all these photos of the same person are photos of different people. (I have gone through several attempts to resolve this computationally without needing to do everything by hand, with only modest success.) What about “Photographs”? That appears in the list of subject labels for lots of things in my data set. “Photographs” is a person, right? I ended up pulling in an entire other ML component here — spaCy, to do some natural language processing to at least guess which lines are probably names, so I can clear the rest of them out of my way. But spaCy only has ~90% accuracy on personal names anyway and, guess what, because everything is terrible, in predictable ways, it has no idea “Kweisi Mfume” is a person. Is a person who appears in the photo guaranteed to be a person who appears in the photo? Nope. Is a person who appears in the metadata guaranteed to be a person who appears in the photo? 
Also nope! Often they’re a photographer or other creator. Sometimes they are the subject of the depicted event, but not themselves in the photo. (spaCy will happily tell you that there’s personal name content in something like “Martin Luther King Day”, but MLK is unlikely to appear in a photo of an MLK day event.) Oh dear, linear algebra OK but let’s imagine for the sake of argument that we live in a perfect world where the metadata is exactly what we need — no more, no less — and its formatting is perfectly consistent. 🦄 Here you are, in this perfect world, confronted with a photo that contains two people and has two names. How do you like them apples? I spent more time than I care to admit trying to figure this out. Can I bootstrap from photos that have one person and one name — identify those, subtract them out of photos of two people, go from there? (Not reliably — there’s a lot of data I never reach that way — and it’s horribly inefficient.) Can I do something extremely clever with matrix multiplication? Like…once I generate vector space embeddings of all the photos, can I do some sort of like dot-product thing across all of my photos, or big batches of them, and correlate the closest-match photos with overlaps in metadata? Not only is this a process which begs the question — I’d have to do that with the ML system I have not yet optimized for archival photo recognition, thus possibly just baking bad data in — but have I mentioned I have taken exactly one linear algebra class, which I didn’t really grasp, in 1995? What if I train yet another ML system to do some kind of k-means clustering on the embeddings? This is both a promising approach and some really first-rate yak-shaving, combining all the question-begging concerns of the previous paragraph with all the crystalline clarity of black box ML. Possibly at this point it would have been faster to tag them all by hand, but that would be admitting defeat. Also I don’t have a research assistant, which, let’s be honest, is the person who would usually be doing this actual work. I do have a 14-year-old and I am strongly considering paying her to do it for me, but to facilitate that I’d have to actually build a web interface and probably learn more about AWS, and the prospect of reading AWS documentation has a bracing way of reminding me of all of the more delightful and engaging elements of my todo list, like calling some people on the actual telephone to sort out however they’ve screwed up some health insurance billing. Nowhere to go but up Despite all of that, I did actually get all the way through the 5 steps above. I have a truly, spectacularly terrible neural net. Go me! But at a thousand-plus words, perhaps I should leave that story for next week…. Tagged fridAI Published by Andromeda Romantic analytical technologist librarian. Published April 16, 2021
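For what it's worth, the spaCy filter mentioned earlier boils down to something like this: a minimal sketch, with the model choice and the shape of the metadata as assumptions, and with exactly the accuracy caveats described above.

import spacy

nlp = spacy.load("en_core_web_sm")  # assumption: the small English model; none of them are perfect on names

def probable_person_headings(subject_headings):
    """Keep only the metadata lines spaCy thinks contain a personal name."""
    keep = []
    for heading in subject_headings:
        doc = nlp(heading)
        if any(ent.label_ == "PERSON" for ent in doc.ents):
            keep.append(heading)
    return keep

# probable_person_headings(["Du Bois, W. E. B. (William Edward Burghardt), 1868-1963", "Photographs"])
# ideally keeps just the first entry, though as noted spaCy still misses some real
# names and happily passes through things like "Martin Luther King Day".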
api-flickr-com-6743 ---- Recent Uploads tagged code4lib IMG_9817 IMG_9861 IMG_9945 IMG_9946 IMG_9922 IMG_9924 IMG_9932 IMG_9941 IMG_9881 IMG_9866 IMG_9952 IMG_9877 IMG_9959 IMG_9882 IMG_9905 IMG_9845 IMG_9823 IMG_9843 IMG_9895 IMG_9855 apps-lib-umich-edu-4914 ---- Library Tech Talk - U-M Library Technology Innovations and Project Updates from the U-M Library I.T. Division Library IT Services Portfolio Academic library service portfolios are mostly a mix of big to small strategic initiatives and tactical projects. Systems developed in the past can become a durable bedrock of workflows and services around the library, remaining relevant and needed for five, ten, and sometimes as long as twenty years. There is, of course, never enough time and resources to do everything. The challenge faced by Library IT divisions is to balance the tension of sustaining these legacy systems while continuing to innovate and develop new services. The University of Michigan’s Library IT portfolio has legacy systems in need of ongoing maintenance and support, in addition to new projects and services that add to and expand the portfolio. We, at Michigan, worked on a process to balance the portfolio of services and projects for our Library IT division. We started working on the idea of developing a custom tool for our needs since all the other available tools are oriented towards corporate organizations and we needed a light-weight tool to support our process. We went through a complete planning process first on whiteboards and paper, then developed an open source tool called TRACC for helping us with portfolio management. 4 keys to a dazzling library website redesign The U-M Library launched a completely new primary website in July after 2 years of work. The redesign project team focused on building a strong team, internal communication, content strategy, and practicing needs informed design and development to make the project a success. Sweet Sixteen: Digital Collections Completed July 2019 - June 2020 Digital Content & Collections (DCC) relies on content and subject experts to bring us new digital collections. This year, 16 digital collections were created or significantly enhanced. Here you will find links to videos and articles by the subject experts speaking in their own words about the digital collections they were involved in and why they found it so important to engage in this work with us. Thank you to all of the people involved in each of these digital collections! Adding Ordered Metadata Fields to Samvera Hyrax How to add ordered metadata fields in Samvera Hyrax. Includes example code and links to actual code. Sinking our Teeth into Metadata Improvement Like many attempts at revisiting older materials, working with a couple dozen volumes of dental pamphlets started very simply but ended up being an interesting opportunity to explore the challenges of making the diverse range of materials held in libraries accessible to patrons in a digital environment. And while improving metadata may not sound glamorous, having sufficient metadata for users to be able to find what they are looking for is essential for the utility of digital libraries.
Collaboration and Generosity Provide the Missing Issue of The American Jewess What started with a bit of wondering and conversation within our unit of the Library led to my reaching out to Princeton University with a request but no expectations of having that request fulfilled. Individuals at Princeton, however, considered the request and agreed to provide us with the single issue of The American Jewess that we needed to complete the full run of the periodical within our digital collection. Especially in these stressful times, we are delighted to bring you a positive story, one of collaboration and generosity across institutions, while also sharing the now-complete digital collection itself. How to stop being negative, or digitizing the Harry A. Franck film collection This article reviews how 9,000+ frames of photographic negatives from the Harry A. Franck collection are being digitally preserved. Combine Metadata Harvester: Aggregate ALL the data! The Digital Public Library of America (DPLA) has collected and made searchable a vast quantity of metadata from digital collections all across the country. The Michigan Service Hub works with cultural heritage institutions throughout the state to collect their metadata, transform those metadata to be compatible with the DPLA’s online library, and send the transformed metadata to the DPLA, using the Combine aggregator software, which is being developed here at the U of M Library. Hacks with Friends 2020 Retrospective: A pitch to hitch in 2021 When the students go on winter break I go to Hacks with Friends (HWF) and highly recommend and encourage everyone who can to participate in HWF 2021. Not only is it two days of free breakfast, lunch, and snacks at the Ross School of Business, but it’s a chance to work with a diverse cross section of faculty, staff, and students on innovative solutions to complex problems. U-M Library’s Digital Collection Items are now Included in Library Search The University Library’s digital collections, encompassing more than 300 collections with over a million items, are now discoverable through the library’s Articles discovery tool, powered by Summon. Read on to learn about searching this trove of images and text, and how to add it to your library’s Summon instance. apps-lib-umich-edu-9881 ---- Library Tech Talk Blog | U-M Library Skip to main content Log in Library Tech Talk Technology Innovations and Project Updates from the U-M Library I.T. 
Division Search Library Tech Talk Subscribe To RSS feed Get updates via Email (U-M Only) Popular posts for Library Tech Talk Library IT Services Portfolio 4 keys to a dazzling library website redesign Sweet Sixteen: Digital Collections Completed July 2019 - June 2020 Adding Ordered Metadata Fields to Samvera Hyrax Sinking our Teeth into Metadata Improvement Tags in Library Tech Talk HathiTrust Library Website MLibrary Labs DLXS Web Content Strategy Mirlyn Digital Collections Digitization search Design MTagger OAI Accessibility Usability Group UX Archive for Library Tech Talk Show 2020 October 2020 (1) September 2020 (1) August 2020 (1) July 2020 (1) June 2020 (2) April 2020 (2) March 2020 (1) January 2020 (1) Show 2019 October 2019 (1) June 2019 (2) April 2019 (1) February 2019 (2) January 2019 (1) Show 2018 December 2018 (1) November 2018 (1) September 2018 (1) July 2018 (2) April 2018 (1) February 2018 (1) Show Older Show 2017 November 2017 (3) September 2017 (1) August 2017 (1) June 2017 (1) April 2017 (1) March 2017 (1) February 2017 (1) January 2017 (1) Show 2016 December 2016 (2) November 2016 (2) August 2016 (2) June 2016 (1) April 2016 (1) March 2016 (1) February 2016 (1) January 2016 (1) Show 2015 December 2015 (1) November 2015 (1) October 2015 (2) September 2015 (2) July 2015 (2) June 2015 (2) May 2015 (2) April 2015 (2) March 2015 (2) February 2015 (2) January 2015 (2) Show 2014 December 2014 (2) November 2014 (2) October 2014 (2) September 2014 (2) August 2014 (2) July 2014 (2) June 2014 (2) Show 2012 December 2012 (1) October 2012 (1) September 2012 (2) April 2012 (2) March 2012 (1) January 2012 (1) Show 2011 August 2011 (2) July 2011 (1) June 2011 (1) May 2011 (1) Show 2010 December 2010 (1) November 2010 (2) September 2010 (2) July 2010 (5) May 2010 (1) April 2010 (1) March 2010 (2) Show 2009 December 2009 (3) October 2009 (2) September 2009 (1) August 2009 (1) July 2009 (1) May 2009 (1) February 2009 (1) January 2009 (2) Show 2008 December 2008 (3) November 2008 (1) October 2008 (2) September 2008 (2) August 2008 (3) July 2008 (5) June 2008 (6) May 2008 (6) Library IT Services Portfolio Academic library service portfolios are mostly a mix of big to small strategic initiatives and tactical projects. Systems developed in the past can become a durable bedrock of workflows and services around the library, remaining relevant and needed for five, ten, and sometimes as long as twenty years. There is, of course, never enough time and resources to do everything. The challenge faced by Library IT divisions is to balance the tension of sustaining these legacy systems while continuing to... October 7, 2020 See all posts by Nabeela Jaffer 4 keys to a dazzling library website redesign The U-M Library launched a completely new primary website in July after 2 years of work. The redesign project team focused on building a strong team, internal communication, content strategy, and practicing needs informed design and development to make the project a success. September 8, 2020 See all posts by Heidi Steiner Burkhardt Sweet Sixteen: Digital Collections Completed July 2019 - June 2020 Digital Content & Collections (DCC) relies on content and subject experts to bring us new digital collections. This year, 16 digital collections were created or significantly enhanced. Here you will find links to videos and articles by the subject experts speaking in their own words about the digital collections they were involved in and why they found it so important to engage in this work with us. 
Thank you to all of the people involved in each of these digital collections! August 6, 2020 See all posts by Lauren Havens Adding Ordered Metadata Fields to Samvera Hyrax How to add ordered metadata fields in Samvera Hyrax. Includes example code and links to actual code. July 20, 2020 See all posts by Fritz Freiheit Sinking our Teeth into Metadata Improvement Like many attempts at revisiting older materials, working with a couple dozen volumes of dental pamphlets started very simply but ended up being an interesting opportunity to explore the challenges of making the diverse range of materials held in libraries accessible to patrons in a digital environment. And while improving metadata may not sound glamorous, having sufficient metadata for users to be able to find what they are looking for is essential for the utility of digital libraries. June 30, 2020 See all posts by Jackson Huang Collaboration and Generosity Provide the Missing Issue of The American Jewess What started with a bit of wondering and conversation within our unit of the Library led to my reaching out to Princeton University with a request but no expectations of having that request fulfilled. Individuals at Princeton, however, considered the request and agreed to provide us with the single issue of The American Jewess that we needed to complete the full run of the periodical within our digital collection. Especially in these stressful times, we are delighted to bring you a positive... June 15, 2020 See all posts by Lauren Havens How to stop being negative, or digitizing the Harry A. Franck film collection This article reviews how 9,000+ frames of photographic negatives from the Harry A. Franck collection are being digitally preserved. April 27, 2020 See all posts by Larry Wentzel Library Contact Information University of Michigan Library 818 Hatcher Graduate Library South, 913 S. University Avenue Ann Arbor, MI 48109-1190 (734) 764-0400 | contact-mlibrary@umich.edu Except where otherwise noted, this work is subject to a Creative Commons Attribution 4.0 license. For details and exceptions, see the Library Copyright Policy. ©2014, Regents of the University of Michigan archive-org-5338 ---- Internet Archive: About IA
About Blog Projects Help Donate An illustration of a heart shape Contact Jobs Volunteer People Search Metadata Search text contents Search TV news captions Search archived websites Advanced Search Sign up for free Log in Read More Server Statistics Archive Statistics Job Opportunities at the Internet Archive Events News [more] Hooniverse: Wayback Machine Allows a Peek into Defunct Detroit Automaker Websites Laughing Squid: An Amazing Collection of Pulp Magazines Going Back 75 Years Is Available Online at The Internet Archive Far Out Magazine: Over 100,000 historic vinyl records are being digitised and made available to stream online for free GigaZine: ウェブ上の情報を記録・保存する「インターネット・アーカイブ」の存続をひっそりと脅かしているものとは? ActuaLitte: Plongez dans l'art japonais de la fin du XIXe siècle grâce à ce magazine numérisé Library Journal: Better World Libraries, Internet Archive Partner, Acquires Better World Books Open Culture: The Internet Archive Is Digitizing & Preserving Over 100,000 Vinyl Records: Hear 750 Full Albums Now Against The Grain: ATG Newsflash: For the Love of Literacy–Better World Books and the Internet Archive Unite to Preserve Millions of Books Research Information: Better World Books affiliates with Internet Archive Wired: The Internet Archive Is Making Wikipedia More Reliable About the Internet Archive The Internet Archive, a 501(c)(3) non-profit, is building a digital library of Internet sites and other cultural artifacts in digital form. Like a paper library, we provide free access to researchers, historians, scholars, the print disabled, and the general public. Our mission is to provide Universal Access to All Knowledge. We began in 1996 by archiving the Internet itself, a medium that was just beginning to grow in use. Like newspapers, the content published on the web was ephemeral - but unlike newspapers, no one was saving it. Today we have 25+ years of web history accessible through the Wayback Machine and we work with 750+ library and other partners through our Archive-It program to identify important web pages. As our web archive grew, so did our commitment to providing digital versions of other published works. Today our archive contains: 475 billion web pages 28 million books and texts 14 million audio recordings (including 220,000 live concerts) 6 million videos (including 2 million Television News programs) 3.5 million images 580,000 software programs Anyone with a free account can upload media to the Internet Archive. We work with thousands of partners globally to save copies of their work into special collections. Because we are a library, we pay special attention to books. Not everyone has access to a public or academic library with a good collection, so to provide universal access we need to provide digital versions of books. We began a program to digitize books in 2005 and today we scan 3,500 books per day in 18 locations around the world. Books published prior to 1926 are available for download, and hundreds of thousands of modern books can be borrowed through our Open Library site. Some of our digitized books are only available to people with print disabilities. Like the Internet, television is also an ephemeral medium. We began archiving television programs in late 2000, and our first public TV project was an archive of TV news surrounding the events of September 11, 2001. In 2009 we began to make selected U.S. television news broadcasts searchable by captions in our TV News Archive. 
This service allows researchers and the public to use television as a citable and sharable reference. The Internet Archive serves millions of people each day and is one of the top 300 web sites in the world. A single copy of the Internet Archive library collection occupies 70+ Petabytes of server space (and we store at least 2 copies of everything). We are funded through donations, grants, and by providing web archiving and book digitization services for our partners. As with most libraries we value the privacy of our patrons, so we avoid keeping the IP (Internet Protocol) addresses of our readers and offer our site in https (secure) protocol. You can find information about our projects on our blog (including important announcements), contact us, buy swag in our store, and follow us on Twitter and Facebook. Welcome to the library! Recent foundation funding generously provided by: Andrew W. Mellon Foundation Council on Library and Information Resources Democracy Fund Federal Communications Commission Universal Service Program for Schools and Libraries (E-Rate) Institute of Museum and Library Services (IMLS) Knight Foundation Laura and John Arnold Foundation National Endowment for the Humanities, Office of Digital Humanities National Science Foundation The Peter and Carmen Lucia Buck Foundation The Philadelphia Foundation Rita Allen Foundation The Internet Archive is a member of: American Library Association (ALA) Biodiversity Heritage Library (BHL) Boston Library Consortium (BLC) Califa Council on Library and Information Resources (CLIR) Coalition for Networked Information (CNI) Digital Library Federation (DLF) Digital Preservation Coalition (DPC) Digital Public Library of America (DPLA) International Federation of Library Associations and Institutions (IFLA) International Internet Preservation Consortium (IIPC) Music Library Association National Digital Stewardship Alliance (NDSA) ReShare archive-org-9038 ---- Internet Archive: Wayback Machine Explore more than 544 billion web pages saved over time
deviantart.com Oct 15, 2013 21:28:20 cl.cam.ac.uk Feb 29, 2000 18:34:39 foodnetwork.com Oct 20, 2013 22:40:56 yahoo.com Dec 20, 1996 15:45:10 spiegel.com Oct 01, 2013 15:26:30 imdb.com Oct 21, 2013 16:53:47 stackoverflow.com Oct 14, 2013 21:22:10 ubl.com Dec 27, 1996 20:38:47 bloomberg.com Oct 01, 2013 23:10:45 reference.com Oct 18, 2013 07:12:58 feedmag.com Dec 23, 1996 10:53:17 wikihow.com Oct 21, 2013 20:56:46 nbcnews.com Oct 21, 2013 17:24:52 goodreads.com Oct 21, 2013 00:42:42 obamaforillinois.com Nov 09, 2004 04:28:06 geocities.com Feb 22, 1997 17:47:51 amazon.com Feb 04, 2005 00:47:33 nytimes.com Oct 01, 2013 01:42:36 bbc.co.uk Oct 01, 2013 00:13:32 huffingtonpost.com Oct 21, 2013 17:11:12 reddit.com Oct 01, 2013 03:15:39 cnet.com Oct 21, 2013 02:07:03 whitehouse.gov Dec 27, 1996 06:25:41 aol.com Oct 01, 2013 05:01:31 yelp.com Oct 19, 2013 02:44:53 etsy.com Jun 01, 2013 01:38:52 foxnews.com Oct 01, 2013 01:08:27 well.com Jan 08, 1997 06:53:37 w3schools.com Oct 19, 2013 00:55:10 buzzfeed.com Oct 21, 2013 17:32:21 nasa.gov Dec 31, 1996 23:58:47 mashable.com Oct 21, 2013 02:16:14 nfl.com Oct 21, 2013 07:39:25 Tools Wayback Machine Availability API Build your own tools. WordPress Broken Link Checker Banish broken links from your blog. 404 Handler for Webmasters Help users get where they were going. Subscription Service Archive-It enables you to capture, manage and search collections of digital content without any technical expertise or hosting facilities. Visit Archive-It to build and browse the collections. Save Page Now Capture a web page as it appears now for use as a trusted citation in the future. Only available for sites that allow crawlers. arstechnica-com-2640 ---- Fender bender in Arizona illustrates Waymo’s commercialization challenge | Ars Technica Self-driving — Fender bender in Arizona illustrates Waymo’s commercialization challenge Self-driving systems won’t necessarily make the same mistakes as human drivers. Timothy B. Lee - Apr 2, 2021 5:07 pm UTC A Waymo self-driving car in Silicon Valley in 2019. Sundry Photography / Getty A police report obtained by the Phoenix New Times this week reveals a minor Waymo-related crash that occurred last October but hadn’t been publicly reported until now. Here’s how the New Times describes the incident: A white Waymo minivan was traveling westbound in the middle of three westbound lanes on Chandler Boulevard, in autonomous mode, when it unexpectedly braked for no reason. A Waymo backup driver behind the wheel at the time told Chandler police that "all of a sudden the vehicle began to stop and gave a code to the effect of 'stop recommended' and came to a sudden stop without warning."
A red Chevrolet Silverado pickup behind the vehicle swerved to the right but clipped its back panel, causing minor damage. Nobody was hurt. Overall, Waymo has a strong safety record. Waymo has racked up more than 20 million testing miles in Arizona, California, and other states, far more than any human being will drive in a lifetime. Waymo's vehicles have been involved in a relatively small number of crashes, and those crashes have been overwhelmingly minor, with no fatalities and few, if any, serious injuries. Waymo says that a large majority of those crashes have been the fault of the other driver. So it's very possible that Waymo's self-driving software is significantly safer than a human driver. Further Reading: This Arizona college student has taken over 60 driverless Waymo rides. At the same time, Waymo isn't acting like a company with a multi-year head start on potentially world-changing technology. Three years ago, Waymo announced plans to buy "up to" 20,000 electric Jaguars and 62,000 Pacifica minivans for its self-driving fleet. The company hasn't recently released numbers on its fleet size, but it's safe to say it is nowhere near hitting those targets. The service territory for the Waymo One taxi service in suburban Phoenix hasn't expanded much since it launched two years ago. Waymo hasn't addressed the slow pace of expansion, but incidents like last October's fender-bender might help explain it. It's hard to be sure if self-driving technology is safe. Rear-end collisions like this rarely get anyone killed, and Waymo likes to point out that Arizona law prohibits tailgating. In most rear-end crashes, the driver in the back is considered to be at fault. At the same time, it's obviously not ideal for a self-driving car to suddenly come to a stop in the middle of the road. More generally, Waymo's vehicles sometimes hesitate longer than a human would when they encounter complex situations they don't fully understand. Human drivers sometimes find this frustrating, and it occasionally leads to crashes. In January 2020, a Waymo vehicle unexpectedly stopped as it approached an intersection where the stoplight was green. A police officer in an unmarked vehicle couldn't stop in time and hit the Waymo vehicle from behind. Again, no one was seriously injured. It's difficult to know if this kind of thing happens more often with Waymo's vehicles than with human drivers. Minor fender benders aren't always reported to the police and may not be reflected in official crash statistics, overstating the safety of human drivers. By contrast, any crash involving cutting-edge self-driving technology is likely to attract public attention. The more serious problem for Waymo is that the company can't be sure that the idiosyncrasies of its self-driving software won't contribute to a more serious crash in the future. Human drivers cause a fatality about once every 100 million miles of driving—far more miles than Waymo has tested so far (a rough back-of-envelope calculation at the end of this section puts numbers on that gap). If Waymo scaled up rapidly, it would be taking a risk that an unnoticed flaw in Waymo's programming could lead to someone getting killed. And crucially, self-driving cars are likely to make different types of mistakes than human drivers. So it's not sufficient to make a list of mistakes human drivers commonly make and verify that self-driving software avoids making them. You also need to figure out if self-driving cars will screw up in scenarios that human drivers deal with easily.
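To make that mileage gap concrete, here is a rough back-of-envelope sketch, not an analysis from Waymo or from the article, using the two figures cited above (roughly 20 million test miles, and roughly one fatality per 100 million human-driven miles) and a simple Poisson assumption:

```python
# Back-of-envelope only: assumes the figures quoted in the article and a simple
# Poisson model of fatal crashes. Not Waymo's data or methodology.
import math

test_miles = 20_000_000                 # reported Waymo testing mileage
human_fatality_rate = 1 / 100_000_000   # ~1 fatality per 100 million human-driven miles

expected_fatalities = test_miles * human_fatality_rate  # if the software were merely human-level
p_zero_even_if_human_level = math.exp(-expected_fatalities)

print(f"Expected fatalities at human-level risk: {expected_fatalities:.2f}")                   # ~0.20
print(f"Chance of a clean record even at human-level risk: {p_zero_even_if_human_level:.0%}")  # ~82%
```

In other words, a zero-fatality record over 20 million miles is likely even for a system that is no safer than a human driver, which is why fatality-level safety claims need far more mileage (or other evidence) to support them.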
And there may be no other way to find these scenarios than with lots and lots of testing. Waymo has logged far more testing miles than other companies in the US, but there's every reason to think Waymo's competitors will face this same dilemma as they move toward large-scale commercial deployments. By now, a number of companies have developed self-driving cars that can handle most situations correctly most of the time. But building a car that can go millions of miles without a significant mistake is hard. And proving it is even harder. Timothy B. Lee is a senior reporter covering tech policy, blockchain technologies and the future of transportation. He lives in Washington DC. Email timothy.lee@arstechnica.com // Twitter @binarybits arstechnica-com-4270 ---- Ars Technica’s non-fungible guide to NFTs | Ars Technica. This article is for sale as an NFT, probably. Is blockchain item authentication a speculative fad or a technological sea change? Kyle Orland - Mar 29, 2021 11:15 am UTC. Look ma, I'm on the blockchain. Chris Torres | Beeple | Aurich Lawson. It has been nearly 10 years now since Ars Technica first described Bitcoin to readers as “the world’s first virtual currency… designed by an enigmatic, freedom-loving hacker, and currently used by the geek underground to buy and sell everything from servers to cellphone jammers.” A decade later, Bitcoin and other cryptocurrencies are practically mainstream, and even most non-techies know the blockchain basics powering a decentralized financial revolution (or a persistent bubble, if you prefer). What Bitcoin was to 2011, NFTs are to 2021.
So-called “non-fungible tokens” are having a bit of a moment in recent weeks, attracting a surge of venture capital cash and eye-watering speculative values for traceable digital goods. That's despite the fact that most of the general public barely understands how this blockchain-based system of digital authentication works, or why it's behind people paying $69 million for a single GIF. Fungible? Token? Perhaps the simplest way to start thinking about NFTs is as a digital version of the various “certificates of authenticity” that are prevalent in the market for real-world art and collectibles. Instead of a slip of paper, though, NFTs use cryptographic smart contracts and a distributed blockchain (most often built on top of Ethereum these days) to certify who owns each distinct, authentic token. As with cryptocurrencies, those contracts are verified by the collective distributed work of miners who keep the entire system honest with their computational work (the electricity for which creates a lot of nasty carbon emissions). And just like cryptocurrencies, those NFTs can be sold and traded directly on any number of marketplaces without any centralized control structure dictating the rules of those transfers. What makes NFTs different from your run-of-the-mill cryptocurrency is each token’s distinctiveness. With a cryptocurrency like Bitcoin, each individual unit is indistinguishable from another and has an identical value. Each individual Bitcoin can be traded or divided up just like any other Bitcoin (i.e., the Bitcoins are fungible). NFTs being “non-fungible” means each one represents a distinct entity with a distinct value that can’t be divided into smaller units. Just as anyone can start printing their own line of Certificates of Authenticity (or anyone can start up their own cryptocurrency to try to be “the next Bitcoin”), anyone with just a little technical know-how can start minting their own distinct NFTs. Etherscan currently lists over 9,600 distinct NFT contracts, each its own network of trust representing and tracking its own set of digital goods. It's trivial to make a digital copy of any of the images for sale on Rarible. But those copies won't have the "authenticity" of the actual NFT being sold... These NFT contracts can represent pretty much anything that can exist digitally: a webpage, a GIF, a video clip, you name it. Digital artists are using NFTs to create “scarce” verified versions of their pieces, while collectible companies are using them to create traceable, unforgeable digital trading cards. Video game items and characters can be represented as NFTs, too, allowing for easy proof of ownership and portability even between games controlled by different companies (though the market for such games is still very immature). There are plenty of even odder examples out there. Vid is a TikTok-like social media network that gives users NFT-traced ownership of their posted videos (and royalty payments for the same). The Ethereum Name Service is using NFTs to set up a decentralized version of the ICANN-controlled Domain Name Service for finding online content. Aavegotchi is a weird hybrid that uses digital pets to represent your stake in a decentralized finance protocol called Aave. Essentially, there are hundreds of companies looking to NFTs for situations where they need to trace and verify ownership of distinct digital goods. The idea has been catching on quickly, at least among speculators with a lot of money to throw around.
Nonfungibles’ database of hundreds of different NFTs has tracked over $48 million in sales across nearly 40,000 NFT transactions in just the last week. Rarible, one of the most popular NFT marketplaces, saw its daily trading volume hit $1.9 million earlier this month, tripling the same number from just a day before. Cryptopunks, an early NFT representing 10,000 unique pixellated avatars, has seen over $176 million in total transactions since its creation in 2017 (with over 10 percent of that volume coming in the last week). How does it work? On a technical level, most NFTs are built on the ERC-721 standard. That framework sets up the basic cryptographic system to track ownership of each individual token (by linking it to user-controlled digital wallets) and allow for secure, verified transfer on the blockchain. Some NFT contracts have built additional attributes and features on top of that standard. The NFT for a cryptokitty, for instance, contains metadata representing that digital avatar’s unique look and traits. That metadata also establishes rules for how often it can “breed” new cryptokitty NFTs and what traits it will pass down to future generations. Those attributes are set and verified on the blockchain, and they can’t be altered no matter how or where the cryptokitty is used. When NFTs are used to represent digital files (like GIFs or videos), however, those files usually aren’t stored directly “on-chain” in the token itself. Doing so for any decently sized file could get prohibitively expensive, given the cost of replicating those files across every user on the chain. Instead, most NFTs store the actual content as a simple URI string in their metadata, pointing to an Internet address where the digital thing actually resides. (A minimal code sketch of reading a token's owner and metadata URI this way appears a little further down.) It may seem odd to link a system of decentralized, distributed digital goods to content hosted on centralized servers controlled by actual people or companies. Given that the vast majority of webpage links become defunct after just a few years, an NFT pointing to a plain-old web address wouldn’t seem to be a good long-term store of value. A diagram laying out the basic difference between IPFS distributed file storage and standard, centrally controlled HTTP servers. Blocknomi / MaxCDN. Many NFTs get around this by using burgeoning blockchain-based file networks such as IPFS or pixelchain. These networks are designed to let users find, copy, and store cryptographically signed files that could be distributed among any number of independent nodes (including ones controlled by the NFT owner). In theory, linking an NFT to an IPFS address could ensure the digital file in question will continue to be accessible in perpetuity, as long as someone has mirrored a verifiable copy on some node in the IPFS network. Are NFTs really that valuable? Just like a certificate of authenticity, the value of an NFT (and the “unique” digital item it represents) is strongly tied to its provenance. The person who spent $560,000 for an NFT representing the original Nyan Cat meme, for instance, obviously didn’t purchase every copy of the famous animated GIF of a pop-tart cat with a rainbow trail behind it. You can still download your own identical copy with a few clicks. The NFT doesn’t even include the copyright to Nyan Cat, which would at least give the owner some legal control over the work (though some NFTs try to embed such rights in their contracts).
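To make the "How does it work?" mechanics above concrete: the following is a minimal sketch, assuming the web3.py library and an Ethereum JSON-RPC endpoint, of reading an ERC-721 token's current owner and metadata URI. The endpoint URL, contract address, and token ID are placeholders for illustration, not values from the article.

```python
# Minimal sketch of reading ERC-721 ownership and metadata fields with web3.py.
# The RPC endpoint, contract address, and token ID below are placeholders.
from web3 import Web3

ERC721_MIN_ABI = [
    {"name": "ownerOf", "type": "function", "stateMutability": "view",
     "inputs": [{"name": "tokenId", "type": "uint256"}],
     "outputs": [{"name": "", "type": "address"}]},
    {"name": "tokenURI", "type": "function", "stateMutability": "view",
     "inputs": [{"name": "tokenId", "type": "uint256"}],
     "outputs": [{"name": "", "type": "string"}]},
]

w3 = Web3(Web3.HTTPProvider("https://example-ethereum-node.invalid"))  # placeholder endpoint
nft = w3.eth.contract(
    address="0x0000000000000000000000000000000000000000",  # placeholder ERC-721 contract
    abi=ERC721_MIN_ABI,
)

token_id = 1  # placeholder
print("owner:", nft.functions.ownerOf(token_id).call())         # wallet currently holding the token
print("metadata URI:", nft.functions.tokenURI(token_id).call())  # often an ipfs:// or https:// address
```

The tokenURI value is the "simple URI string" described above; whether it points at an ordinary web server or at an IPFS address is exactly the longevity question the article raises.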
What makes the Nyan Cat NFT interesting (and potentially valuable) is that it was verified and sold by Chris Torres, the person who created and posted the original Nyan Cat video to YouTube in 2011. That gives this copy of Nyan Cat a unique history and a tie to the meme’s creation that can’t be matched by any other copy (or any other NFT, unless Torres starts diluting the value by minting more). And the blockchain technology behind the NFT ensures the chain of custody for that version of the GIF can be traced back to Torres' original minting, no matter how many times it's sold or transferred. This Nyan Cat GIF is practically worthless. So why is an NFT of an "identical" GIF worth so much money to a collector? Does that fact alone really give this NFT any more value than all of the other identical Nyan Cat GIFs floating around on the Internet? That’s for a highly speculative market to figure out. But just as a stroke-for-stroke copy of a Vermeer masterpiece doesn’t have the same value as the one-of-a-kind original, a verified “original” Nyan Cat from the meme’s creator may retain some persistent value to collectors. Just because digital goods are easier to copy than paintings doesn’t make one less valuable than the other, either. It’s trivial to make a near-perfect copy of a photographic print, but original photographs can still sell for millions of dollars to the right buyer. On the other hand, these NFTs might end up being more akin to those novelty deeds that claim the document gives you “ownership” of a star in the night sky. While there’s probably some sentimental value to the idea of owning a star, there isn’t any real robust market where the most coveted stars trade for large sums. And just like there are a lot of competing organizations offering “star deeds” these days, there are a lot of competing firms that could dilute the market with their own NFT offerings. Do you know where your NFT came from? All of this means that tracing the provenance of any given NFT can be of prime importance to its implicit value. NFT marketplace SuperRare ensures its NFTs are “authentic” by only minting tokens for a set of “hand-picked artists” for the time being. NBA Top Shot, meanwhile, relies on its NBA license to make sure its randomized packs of basketball video clips are each unique and have an “official” air to them. But there are plenty of situations where the original ownership of a particular NFT is more questionable. Game developer Jason Rohrer drew some controversy earlier this month by trying to sell NFT tokens for artwork originally created by other artists for his 2012 game The Castle Doctrine. To say the least, this did not please the many artists who were not aware their digital work was being resold as tokens. Then there’s Tokenized Tweets, a simple service that can create a sellable NFT token representing any tweet on the service, including ones created by other people. The service has recently stopped tokenizing tweets that include visual media, and it lets artists make takedown requests if their copyrighted art/photography is tokenized by the service. But that seems like a pretty skimpy Band-Aid for an offering that seems rife with fraud potential. The NFT-backed "Marble Card" frame for Reddit.com has no actual connection to the creators or owners of Reddit. But does that matter? There are also gray areas like Marble Cards, which lets you create an NFT “frame” intended to go around a specific, unique webpage URL.
That makes each frame akin to a unique trading card with a picture of a webpage on it. While the service states clearly that “no third-party content is claimed or saved on any blockchain,” the direct link and implicit association with the webpage in question could lead to some thorny questions of ownership. With literally thousands of companies jumping into the NFT space, there’s a gold rush mentality that seems primed to spawn plenty of scams. And even legitimate NFT efforts could see their values fade away quickly if the market’s attention moves on to a different blockchain as its store of “authentic” value. Cryptokitties, one of the first popular NFT collectibles in late 2017, saw transaction volume plummet 98 percent in 2018 as high Ethereum fees and lack of novelty drove some of the more speculative players away. Back in 2011, it was unclear if Bitcoin was going to be a lasting financial instrument or a flash-in-the-pan technological fad. And here in 2021, you can say the same thing about the future for NFTs. Promoted comment from kaworu1986: So the NFT gives you... nothing? It has no connection to copyright ownership of the work it refers to and there's nothing stopping the creation of multiple NFTs for the same work either. Seems really pointless. Surprised to see there's people willing to pay actual money for any of this. Kyle Orland is the Senior Gaming Editor at Ars Technica, specializing in video game hardware and software. He has journalism and computer science degrees from University of Maryland. He is based in the Washington, DC area. Email kyle.orland@arstechnica.com // Twitter @KyleOrl archivesblogs-com-4254 ---- ArchivesBlogs | a syndicated collection of blogs by and for archivists Meet Ike Posted on September 18, 2020 from AOTUS “I come from the very heart of America.” – Dwight Eisenhower, June 12, 1945 At a time when the world fought to overcome tyranny, he helped lead the course to victory as the Supreme Allied Commander in Europe. When our nation needed a leader, he upheld the torch of liberty as our 34th president.
As a new memorial is unveiled, now is the time for us to meet Dwight David Eisenhower. Eisenhower Memorial statue and sculptures, photo by the Dwight D. Eisenhower Memorial Commission An opportunity to get to know this man can be found at the newly unveiled Eisenhower Memorial in Washington, DC, and the all-new exhibits in the Eisenhower Presidential Library and Museum in Abilene, Kansas. Each site in its own way tells the story of a humble man who grew up in small-town America and became the leader of the free world. The Eisenhower Presidential Library and Museum is a 22-acre campus which includes several buildings where visitors can interact with the life of this president. Starting with the Boyhood Home, guests discover the early years of Eisenhower as he avidly read history books, played sports, and learned lessons of faith and leadership. The library building houses the documents of his administration. With more than 26 million pages and 350,000 images, researchers can explore the career of a 40+-year public servant. The 25,000 square feet of all-new exhibits located in the museum building is where visitors get to meet Ike and Mamie again…for the first time. Using NARA’s holdings, guests gain insight into the life and times of President Eisenhower. Finally, visitors can be reflective in the Place of Meditation where Eisenhower rests beside his first-born son, Doud, and his beloved wife Mamie. A true encapsulation of his life. Eisenhower Presidential Library and Museum, Abilene, Kansas The updated gallery spaces were opened in 2019. The exhibition includes many historic objects from our holdings which highlight Eisenhower’s career through the military years and into the White House. Showcased items include Ike’s West Point letterman’s sweater, the D-Day Planning Table, Soviet lunasphere, and letters related to the Crisis at Little Rock. Several new films and interactives have been added throughout the exhibit including a D-Day film using newly digitized footage from the archives. Eisenhower Presidential Library and Museum, Abilene, Kansas In addition to facts and quotes, visitors will leave with an understanding of how his experiences made Ike the perfect candidate for Supreme Allied Commander of the Allied Expeditionary Force in Europe and the 34th President of the United States. The Eisenhower Memorial, which opened to the public on September 18, is located at an important historical corridor in Washington, DC. The 4-acre urban memorial park is surrounded by four buildings housing institutions that were formed during the Eisenhower Administration and was designed by award-winning architect, Frank Gehry. In 2011, the National Archives hosted Frank Gehry and his collaborator, theater artist Robert Wilson in a discussion about the creation of the Eisenhower National Memorial.  As part of the creative process, Gehry’s team visited the Eisenhower Presidential Library and drew inspiration from the campus. They also used the holdings of the Eisenhower Presidential Library to form the plans for the memorial itself. This also led to the development of online educational programs which will have a continued life through the Eisenhower Foundation. Visitors to both sites will learn lasting lessons from President Eisenhower’s life of public service. Eisenhower Memorial, photo by the Dwight D. 
Eisenhower Memorial Commission Link to Post | Language: English The First Post 9/11 Phone-In: Richard Hake Sitting In for Brian Lehrer Posted on September 16, 2020 from NYPR Archives & Preservation On September 18, 2001, the late Richard Hake sat in for Brian Lehrer at Columbia University’s new studios at WKCR. Just one week after the attack on the World Trade Center, WNYC was broadcasting on FM at reduced power from the Empire State Building and over WNYE (91.5 FM). Richard spoke with New York Times columnist Paul Krugman on airport security, author James Fallows on the airline industry, Robert Roach Jr. of the International Association of Machinists, and security expert and former New York City Police Commissioner William Bratton as well as WNYC listeners. Link to Post | Language: English Capturing Virtual FSU Posted on September 16, 2020 from Illuminations When the world of FSU changed in March 2020, the website for FSU was used as one of the primary communication tools to let students, faculty, and staff know what was going on. New webpages created specifically to share information and news popped up all over fsu.edu, and we had no idea how long those pages would exist (ah, the hopeful days of March), so Heritage & University Archives wanted to be sure to capture those pages quickly and often as they changed and morphed into new online resources for the FSU community. Screenshot of a capture of the main FSU News feed regarding coronavirus. Captured March 13, 2020. While FSU has had an Archive-It account for a while, we hadn't fully implemented its use yet. Archive-It is a web archiving service that captures and preserves content on websites as well as allowing us to provide metadata and a public interface for viewing the collected webpages. COVID-19 fast-tracked me on figuring out Archive-It and how we could best use it to capture these unique webpages documenting FSU's response to the pandemic. I worked to configure crawls of websites to capture the data we needed, set up a schedule that would be sufficient to capture changes but also not overwhelm our data allowance, and describe the sites being captured (see the sketch below for one generic way to check how often a page has been captured). It took me a few tries, but we've successfully been capturing a set of COVID-related FSU URLs since March. One of the challenges of this work was that some of the webpages had functionality the web crawler just wouldn't capture. This was due to some interactive widgets on pages or potentially some CSS choices the crawler didn't like. I decided the content was the most important thing to capture in this case, more so than making sure the webpage looked exactly like the original. A good example of this is the International Programs Alerts page. We're capturing this to track information about our study abroad programs, but what Archive-It displays is quite different from the current site in terms of design. The content is all there though. On the left is how Archive-It displays a capture of the International Programs Alerts page. On the right is how the site actually looks. While the content is the same, the formatting and design are not. As the pandemic dragged on and it became clear that Fall 2020 would be a unique semester, I added the online orientation site and the Fall 2020 site to my collection line-up. The Fall 2020 page, once used to track the re-opening plan, recently morphed into the Stay Healthy FSU site where the community can look for current information and resources but also see the original re-opening document.
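As a generic illustration of the kind of capture-frequency check described above (this is not the workflow or tooling FSU's Heritage & University Archives used; their captures live in Archive-It, which has its own interfaces), one way to list how often a page has been captured in a Wayback-style index is the public CDX API at web.archive.org. The seed URL and date range below are placeholders:

```python
# Illustrative only: lists captures of a page in the public Wayback Machine CDX index.
# The URL and date range are placeholders, not FSU's actual Archive-It seed list.
import requests

resp = requests.get(
    "https://web.archive.org/cdx/search/cdx",
    params={
        "url": "news.fsu.edu",        # placeholder seed URL
        "from": "20200301",
        "to": "20201001",
        "output": "json",
        "fl": "timestamp,statuscode",
    },
    timeout=60,
)
rows = resp.json() if resp.text.strip() else []
if rows:
    header, captures = rows[0], rows[1:]   # first row lists the field names
    print(f"{len(captures)} captures in the requested window")
    for timestamp, status in captures[:5]:
        print(timestamp, status)
else:
    print("No captures found in that window.")
```

A check like this is one way to confirm that a crawl schedule is actually keeping pace with how often a page changes.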
We’ll continue crawling and archiving these pages in our FSU Coronavirus Archive for future researchers until they are retired and the university community returns to “normal” operations – whatever that might look like when we get there! Link to Post | Language: English Welcome to the New ClintonLibrary.Gov! Posted on September 14, 2020 from AOTUS The National Archives’ Presidential Libraries and Museums preserve and provide access to the records of 14 presidential administrations. In support of this mission, we developed an ongoing program to modernize the technologies and designs that support the user experience of our Presidential Library websites. Through this program, we have updated the websites of the Hoover, Truman, Eisenhower and Nixon Presidential Libraries.  Recently we launched an updated website for the William J. Clinton Presidential Library & Museum. The website, which received more than 227,000 visitors over the past year, now improves access to the Clinton Presidential Library holdings by providing better performance, improving accessibility, and delivering a mobile-friendly experience. The updated website’s platform and design, based in the Drupal web content management framework, enables the Clinton Presidential Library staff to make increasing amounts of resources available online—especially while working remotely during the COVID-19 crisis. To achieve this website redesign, staff from the National Archives’ Office of Innovation, with both web development and user experience expertise, collaborated with staff from the Clinton Presidential Library to define goals for the new website. Our user experience team first launched the project by interviewing staff of the Clinton Presidential Library to determine the necessary improvements for the updated website to facilitate their work. Next, the user experience team researched the Library’s customers—researchers, students, educators, and the general public—by analyzing user analytics, heatmaps, recordings of real users navigating the site, and top search referrals. Based on the data collected, the user experience team produced wireframes and moodboards that informed the final site design. The team also refined the website’s information architecture to improve the user experience and meet the Clinton Library staff’s needs.  Throughout the project, the team used Agile project management development processes to deliver iterative changes focused on constant improvement. To be Agile, specific goals were outlined, defined, and distributed among team members for mutual agreement. Work on website designs and features was broken into development “sprints”—two-week periods to complete defined amounts of work. At the end of each development sprint, the resulting designs and features were demonstrated to the Clinton Presidential Library staff stakeholders for feedback which helped further refine the website. The project to update the Clinton Presidential Library and Museum website was guided by the National Archives’ strategic goals—to Make Access Happen, Connect with Customers, Maximize NARA’s Value to the Nation, and Build our Future Through our People. By understanding the needs of the Clinton Library’s online users and staff, and leveraging the in-house expertise of our web development and user experience staff, the National Archives is providing an improved website experience for all visitors. Please visit the site, and let us know what you think! 
Link to Post | Language: English The Road to Edinburgh (Part 2) Posted on September 11, 2020 from Culture on Campus “Inevitably, official thoughts early turned to the time when Scotland would be granted the honour of acting as hosts. Thought was soon turned into action and resulted in Scotland pursuing the opportunity to be host to the Games more relentlessly than any other country has.” From foreword to The Official History of the IXth Commonwealth Games (1970) In our last blog post we left the campaigners working to bring the Commonwealth Games to Edinburgh reflecting on the loss of the 1966 Games to Kingston, Jamaica. The original plan of action sketched out by Willie Carmichael in 1957 had factored in a renewed campaign for 1970 if the initial approach to host the 1966 Games proved unsuccessful. The choice of host cities for the Games was made at the biennial General Assemblies of the Commonwealth Games Federation. The campaign to choose the host for 1970 began at a meeting held in Tokyo in 1964 (to coincide with the Olympics), with the final vote taking place at the 1966 Kingston Games. In 1964 the Edinburgh campaign presented a document to the Federation restating its desire to be host city for the Games in 1970. Entitled ‘Scotland Invites’ it laid out Scotland’s case: “We are founder members of the Federation; we have taken part in each Games since the inception in 1930; and we are the only one of six countries who have taken part in every Games, who have not yet had the honour of celebrating the Games.” From Scotland Invites, British Empire and Commonwealth Games Council for Scotland (1964) Documents supporting Edinburgh’s bid to host the 1970 Commonwealth Games presented to meetings of the General Assembly of the Commonwealth Games Federation at Tokyo in 1964 and Kingston in 1966 (ref. WC/2/9/2) Edinburgh faced a rival bid from Christchurch, New Zealand, with the competition between the two cities recorded in a series of press cutting files collected by Willie Carmichael. Reports in the Scottish press presented Edinburgh as the favourites for 1970, with Christchurch using their bid as a rehearsal for a more serious campaign to host the 1974 competition. However, the New Zealanders rejected this assessment, arguing that it was the turn of a country in the Southern Hemisphere to host the Games. The 1966 Games brought the final frantic round of lobbying and promotion for the rival bids as members of the Commonwealth Games Federation gathered in Kingston. The British Empire and Commonwealth Games Council for Scotland presented a bid document entitled ‘Scotland 1970’ which included detailed information on the venues and facilities to be provided for the competition along with a broader description of the city of Edinburgh. Artist’s impression of the new Meadowbank athletics stadium, Edinburgh (ref. WC/2/9/2/12) At the General Assembly of the Commonwealth Games Federation held in Kingston, Jamaica, on 7 August 1966 the vote took place to decide the host of the 1970 Games. Edinburgh was chosen as host city by 18 votes to 11. The Edinburgh campaign team kept a souvenir of this important event. At the end of the meeting they collected together the evidence of their success and put it in an envelope marked ‘Ballot Cards – which recorded votes for Scotland at Kingston 1966.’ The voting cards and envelope now sit in an administrative file which forms part of the Commonwealth Games Scotland Archive. Voting card recording vote for Scotland to host the 1970 Commonwealth Games (ref.
CG/2/9/1/2/7) Link to Post | Language: English New Ancient Texts Research Guide Posted on September 10, 2020 from Illuminations “What are the oldest books you have?” is a common question posed to Special Collections & Archives staff at Strozier Library. In fact, the oldest materials in the collection are not books at all but cuneiform tablets ranging in date from 2350 to 1788 BCE (4370-3808 years old). These cuneiform tablets, along with papyrus fragments and ostraka comprise the ancient texts collection in Special Collections & Archives. In an effort to enhance remote research opportunities for students to engage with the oldest materials housed in Strozier Library, a research guide to Ancient Texts at FSU Libraries has been created by Special Collections & Archives staff. Ancient Texts Research Guide The Ancient Texts at FSU Libraries research guide provides links to finding aids with collections information, high-resolution photos of the objects in the digital library, and links to articles or books about the collections. Research guides can be accessed through the tile, “Research Guides,” on the library’s main page. Special Collections & Archives currently has 11 research guides published that share information and resources on specific collections or subjects that can be accessed remotely. While direct access to physical collections is unavailable at this time due to Covid-19, we hope to resume in-person research when it is safe to do so, and Special Collections & Archives is still available to assist you remotely with research and instruction. Please get in touch with us via email at: lib-specialcollections@fsu.edu. For a full list of our remote services, please visit our services page. Link to Post | Language: English SSCI Members Embrace Need for Declassification Reform, Discuss PIDB Recommendations at Senate Hearing Posted on September 10, 2020 from Transforming Classification The Board would like to thank Acting Chairman Marco Rubio (R-FL), Vice Chairman Mark Warner (D-VA), and members of the Senate Select Committee on Intelligence (SSCI) for their invitation to testify yesterday (September 9, 2020) at the open hearing on “Declassification Policy and Prospects for Reform.”    At the hearing, PIDB Member John Tierney responded to questions from committee members about recommendations in the PIDB’s May 2020 Report to the President. He stressed the need for modernizing information security systems and the critical importance of sustained leadership through a senior-level Executive Agent (EA) to oversee and implement meaningful reform. In addition to Congressman Tierney, Greg Koch, the Acting Director of Information Management in the Office of the Director of National Intelligence (ODNI), testified in response to the SSCI’s concerns about the urgent need to improve how the Executive Branch classifies and declassifies national security information. Much of the discussion focused on the PIDB recommendation that the President designate the ODNI as the EA to coordinate the application of information technology, including artificial intelligence and machine learning, to modernize classification and declassification across the Executive Branch. Senator Jerry Moran (R-KS), and Senator Ron Wyden (D-OR), who is a member of the SSCI, joined the hearing to discuss the bill they are cosponsoring to modernize declassification. 
Their proposed “Declassification Reform Act of 2020” aligns with the PIDB Report recommendations, including the recommendation to designate the ODNI as the EA for coordinating the required reforms. The Board would like to thank Senators Moran and Wyden for their continued support and attention to this crucial issue. Modernizing the classification and declassification system is important for our 21st-century national security, and it is important for transparency and our democracy. Video of the entire hearing is available on the SSCI’s website and from C-SPAN. The transcript of prepared testimony submitted to the SSCI by Mr. Tierney is posted on the PIDB website. Link to Post | Language: English Be Connected, Keep A Stir Diary Posted on September 9, 2020 from Culture on Campus The new semester approaches and it’s going to be a bit different from what we’re used to here at the University of Stirling. To help you with your mental health and wellbeing this semester, we’ve teamed up with the Chaplaincy to provide students new and returning with a diary where you can keep your thoughts and feelings, process your new environment, record your joys and capture what the University was like for you in this unprecedented time. Diaries will be stationed at the Welcome Lounges from 12th September and we encourage students to take one for their personal use. Please be considerate of others and only take one diary each. Inside each diary is a QR code which will take you to our project page where you can learn more about the project and where we will be creating an online resource for you to explore the amazing diaries that we keep in Archives and Special Collections. We will be updating this page throughout the semester with information from the Archives and events for you to join. Keep an eye out for #StirDiary on social media for all the updates! At the end of semester, you are able to donate your diary to the Archive where it will sit with the University’s institutional records and form a truthful and creative account of what student life was like in 2020. You absolutely don’t have to donate your diary if you don’t want to; the diary belongs to you, and you can keep it, throw it away, donate it or anything else (wreck it?) as you like. If you would like to take part in the project but you have missed the Welcome Lounges, don’t worry! Contact Rosie on archives@stir.ac.uk or Janet on janet.foggie1@stir.ac.uk. Welcome to the University of Stirling – pick a colour! Link to Post | Language: English PIDB Member John Tierney to Support Modernizing Classification and Declassification before the Senate Select Committee on Intelligence, Tomorrow at 3:00 p.m., Live on C-SPAN Posted on September 8, 2020 from Transforming Classification PIDB member John Tierney will testify at an open hearing on declassification policy and the prospects for reform, to be held by the Senate Select Committee on Intelligence (SSCI) tomorrow, Wednesday, September 9, 2020, from 3:00-4:30 p.m. EST. The hearing will be shown on the SSCI’s website, and televised live on C-SPAN. SSCI members Senators Ron Wyden (D-OR) and Jerry Moran (R-KS) have cosponsored the proposed “Declassification Reform Act of 2020,” which aligns with recommendations of the PIDB’s latest report to the President, A Vision for the Digital Age: Modernization of the U.S. National Security Classification and Declassification System (May 2020).
In an Opinion-Editorial appearing today on the website Just Security, Senators Wyden and Moran present their case for legislative reform to address the challenges of outmoded systems for classification and declassification. At the hearing tomorrow, Mr. Tierney will discuss how the PIDB recommendations present a vision for a uniform, integrated, and modernized security classification system that appropriately defends national security interests, instills confidence in the American people, and maintains sustainability in the digital environment. Mr. Greg Koch, Acting Director of the Information Management Office for the Office of the Director of National Intelligence, will also testify at the hearing. The PIDB welcomes the opportunity to speak before the SSCI and looks forward to discussing the need for reform with the Senators. After the hearing, the PIDB will post a copy of Mr. Tierney’s prepared testimony on its website and on this blog. Link to Post | Language: English Wiki loves monuments – digital skills and exploring stirling Posted on September 8, 2020 from Culture on Campus Every year the Wikimedia Foundation runs Wiki Loves Monuments – the world’s largest photo competition. Throughout September there is a push to take good quality images of listed buildings and monuments and add them to Wiki Commons where they will be openly licensed and available for use across the world – they may end up featuring on Wikipedia pages, on Google, in research and presentations worldwide and will be entered into the UK competition where there are prizes to be had! Below you’ll see a map covered in red and blue pins. These represent all of the listed buildings and monuments that are covered by the Wiki Loves Monuments competition, blue pins are places that already have a photograph and red pins have no photograph at all. The aim of the campaign is to turn as many red pins blue as possible, greatly enhancing the amazing bank of open knowledge across the Wikimedia platforms. The University of Stirling sits within the black circle. The two big clusters of red pins on the map are Stirling and Bridge of Allan – right on your doorstep! We encourage you to explore your local area. Knowing your surroundings, finding hidden gems and learning about the history of the area will all help Stirling feel like home to you, whether you’re a first year or returning student. Look at all those red dots! Of course, this year we must be cautious and safe while taking part in this campaign and you should follow social distancing rules and all government coronavirus guidelines, such as wearing facemasks where appropriate, while you are out taking photographs. We encourage you to walk to locations you wish to photograph, or use the NextBikes which are situated on campus and in Stirling rather than take excessive public transport purely for the purposes of this project. Walking and cycling will help you to get a better sense of where everything is in relation to where you live and keeping active is beneficial to your mental health and wellbeing. Here are your NextBike points on campus where you can pick up a bike to use We hope you’ll join us for this campaign – we have a session planned for 4-5pm on Thursday 17th September on Teams where we’ll tell you more about Wiki Loves Monuments and show you how to upload your images. Sign up to the session on Eventbrite. If you cannot make our own University of Stirling session then Wikimedia UK have their own training session on the 21st September which you can join. 
Please note that if you want your photographs to be considered for the competition prizes then they must be submitted before midnight on the 30th September. Photographs in general can be added at any time so you can carry on exploring for as long as you like! Finally, just to add a little incentive, this year we’re having a friendly competition between the University of Stirling and the University of St Andrews students to see who can make the most edits, so come along to a training session, pick up some brilliant digital skills and let’s paint the town green! Link to Post | Language: English What’s the Tea? Posted on September 4, 2020 from Illuminations Katie McCormick, Associate Dean (she/her/hers) For this post, I interviewed Katie McCormick in order to get a better understanding of the dynamics of Special Collections & Archives. Katie is one of the Associate Deans and has been with SCA for about nine years now (here’s a video of Katie discussing some of our collections on C-SPAN in 2014!). Because she is a vital part of the library and our leader in Special Collections & Archives, I wanted to get her opinion on how the division has progressed thus far and how it plans to continue to do so with regard to diversity and inclusion. How would you describe FSU SCA when you first started? “…People didn’t feel comfortable communicating [with each other]… There was one person who really wrote for the blog, and maybe it would happen once every couple of months. When I came on board, my general sense was that we were a department and a group of people with a lot of really great ideas and some fantastic materials, who had come a long way from where things had been, but who hadn’t gotten to a place to be able to organize to change more or to really work more as a team… We were definitely valued as (mostly) the fancy crown jewel group. Really all that mattered was the stuff… it didn’t matter what we were doing with it.” How do you feel the lapse in communication affected diversity and inclusion? “While I don’t have any direct evidence that it excluded people or helped create an environment that was exclusive, I do know that even with our staff at the time, there were times where it contributed to hostilities, frustrations, an environment where people didn’t feel able to speak or be comfortable in… Everybody just wanted to be comfortable with the people who were just like them, and that definitely created some potentially hostile environments. Looking back, I recognize what a poor job we did, as a workplace and a community, truly being inclusive, and not just in ways that are immediately visible.” How diverse was SCA when you started? “In Special Collections there was minimal diversity, certainly less than we have now… [For the libraries as a whole] as you go up in classification and pay, the diversity decreases. That was certainly true when I got here and that remains true.” How would you rank SCA’s diversity and inclusion when you first started? “…Squarely a 5, possibly in some arenas a 4. Not nothing, but I feel like no one was really thinking of it.” And how would you describe it now? “Maybe we’re approaching a 7, I feel like there’s been progress, but there’s still a long way to go in my opinion.” What are some ways we can start addressing these issues? What are some tangible changes you are planning to enact?
“For me, some of the first places [are] forming the inclusive research services task force in Special Collections, pulling together a group to look at descriptive practices and applications, and what we’re doing with creating coordinated processing workflows. Putting these issues on the table from the beginning is really important… Right now, because we’re primarily in an online environment, I think we have some time to negotiate and change our practices so when we re-open to the public and people are physically coming into the spaces, we have new forms, new trainings, people have gone through training that gives them a better sense of identity, communication, diversity.” After my conversation with Katie, I feel optimistic about the direction we are heading in. Knowing how open Special Collections & Archives is about taking critique and trying to put it into action brought me comfort. I’m excited to see how these concerns are addressed and how the department will be putting Dynamic Inclusivity, one of Florida State University’s core values, at the forefront of its practice. I would like to give a big thank you to Katie McCormick for taking the time to do this post with me and for having these conversations! Link to Post | Language: English friday art blog: Terry Frost Posted on September 3, 2020 from Culture on Campus Black and Red on Blue (Screenprint, A/P, 1968) Born in Leamington Spa, Warwickshire, in 1915, Terry Frost KBE RA did not become an artist until he was in his 30s. During World War II, he served in France, the Middle East and Greece, before joining the commandos. While in Crete in June 1941 he was captured and sent to various prisoner of war camps. As a prisoner at Stalag 383 in Bavaria, he met Adrian Heath, who encouraged him to paint. After the war he attended Camberwell School of Art and the St. Ives School of Art and painted his first abstract work in 1949. In 1951 he moved to Newlyn and worked as an assistant to the sculptor Barbara Hepworth. He was joined there by Roger Hilton, and they began a collaboration in collage and construction techniques. In 1960 he put on his first exhibition in the USA, in New York, and there he met many of the American abstract expressionists, including Mark Rothko, who became a great friend. Terry Frost’s career included teaching at the Bath Academy of Art, serving as Gregory Fellow at the University of Leeds, and also teaching at the Cyprus College of Art. He later became the artist in residence and Professor of Painting at the Department of Fine Art of the University of Reading. Orange Dusk (Lithograph, 2/75, 1970) Frost was renowned for his use of the Cornish light, colour and shape. He became a leading exponent of abstract art and a recognised figure of the British art establishment. These two prints were purchased in the early days of the Art Collection at the beginning of the 1970s. Terry Frost married Kathleen Clarke in 1945 and they had six children, two of whom became artists (and another, Stephen Frost, a comedian). His grandson Luke Frost, also an artist, is shown here, speaking about his grandfather. Link to Post | Language: English PIDB Sets Next Virtual Public Meeting for October 7, 2020 Posted on September 3, 2020 from Transforming Classification The Public Interest Declassification Board (PIDB) has scheduled its next virtual public meeting for Wednesday, October 7, 2020, from 1:00 to 2:30 p.m.
At the meeting, PIDB members will discuss their priorities for improving classification and declassification in the next 18 months. They will also introduce former Congressman Trey Gowdy, who was appointed on August 24, 2020, to a three-year term on the PIDB. A full agenda, as well as information on how to pre-register, and how to submit questions and comments to the PIDB prior to the virtual meeting, will be posted soon to Transforming Classification. The PIDB looks forward to your participation in continuing our public discussion of priorities for modernizing the classification system going forward. Link to Post | Language: English Digital Collections Updates Posted on September 3, 2020 from UNC Greensboro Digital Collections So as we start a new academic year, we thought this would be a good time for an update on what we’ve been working on recently. Digital collections migration: After more than a year’s delay, the migration of our collections into a new and more user-friendly (and mobile-friendly) platform driven by the Islandora open-source content management system is in the home stretch. This has been a major undertaking and has given us the opportunity to reassess how our collections work. We hope to be live with the new platform in November. 30,000 items (over 380,000 digital images) have already been migrated. 2019-2020 Projects: We’ve made significant progress on most of this year’s projects (see link for project descriptions), though many of these are currently not yet online pending our migration to the Islandora platform: Grant-funded projects: Temple Emanuel Project: We are working with the Public History department and a graduate student in that program. Several hundred items have already been digitized and more work is being done. We are also exploring grant options with the temple to digitize more material. People Not Property: NC Slave Deeds Project: We are in the final year of this project funded by the National Archives and hope to have it online as part of the Digital Library on American Slavery late next year. We are also exploring additional funding options to continue this work. Women Who Answered the Call: This project was funded by a CLIR Recordings at Risk grant. The fragile cassettes have been digitized and we are midway through the process of getting them online in the new platform. Library-funded projects: Poetas sin Fronteras: Poets Without Borders, the Scrapbooks of Dr. Ramiro Lagos: These items have been digitized and will go online when the new platform launches. North Carolina Runaway Slaves Ads Project, Phase 2: Work continues on this ongoing project and over 5700 ads are now online. This second phase has involved both locating and digitizing/transcribing the ads, and we will soon triple the number of ads done in Phase One. We are also working on tighter integration of this project into the Digital Library on American Slavery. PRIDE! of the Community: This ongoing project stemmed from an NEH grant two years ago and is growing to include numerous new oral history interviews and (just added) a project to digitize and display ads from LGBTQ+ bars and other businesses in the Triad during the 1980s and 1990s. We are also working with two Public History students on contextual and interpretive projects based on the digital collection. Faculty-involved projects: Black Lives Matter Collections: This is a community-based initiative to document the Black Lives Matter movement and recent demonstrations and artwork in the area. Faculty: Dr. 
Tara Green (African American and Diaspora Studies); Stacey Krim, Erin Lawrimore, Dr. Rhonda Jones, David Gwynn (University Libraries). Civil Rights Oral Histories: This has become multiple projects. We are working with several faculty members in the Media Studies department to make these transcribed interviews available online. November is the target. Faculty: Matt Barr, Jenida Chase, Hassan Pitts, and Michael Frierson (Media Studies); Richard Cox, Erin Lawrimore, David Gwynn (University Libraries). Oral Contraceptive Ads: Working with a faculty member and a student on this project, which may be online by the end of the year. Faculty: Dr. Heather Adams (English); David Gwynn and Richard Cox (University Libraries). Well-Crafted NC: Work is ongoing and we are in the second year of a UNCG P2 grant, working with a faculty member in the Bryan School and a brewer based in Asheboro. Faculty: Erin Lawrimore, Richard Cox, David Gwynn (University Libraries), Dr. Erick Byrd (Marketing, Entrepreneurship, Hospitality, and Tourism). New projects taken on during the pandemic: City of Greensboro Scrapbooks: Huge collection of scrapbooks from the Greensboro Urban Development Department dating back to the 1940s. These items have been digitized and will go online when the new platform launches. Negro Health Week Pamphlets: 1930s-1950s pamphlets published by the State of North Carolina. These items are currently being digitized and will go online when the new platform launches. Clara Booth Byrd Collection: Manuscript collection. These items are currently being digitized and will go online when the new platform launches. North Carolina Speaker Ban Collection: Manuscript collection. These items are currently being digitized and will go online when the new platform launches. Mary Dail Dixon Papers: Manuscript collection. These items are currently being digitized and will go online when the new platform launches. Ruth Wade Hunter Collection: Manuscript collection. These items are currently being digitized and will go online when the new platform launches. Projects on hold due to the pandemic: Junior League of Greensboro: Much of this has already been digitized and will go online when the new platform launches. UNCG Graduate School Bulletins: Much of this has already been digitized and will go online when the new platform launches. David Gwynn (Digitization Coordinator, me) offers kudos to Erica Rau and Kathy Howard (Digitization and Metadata Technicians); Callie Coward (Special Collections Cataloging & Digital Projects Library Technician); Charley Birkner (Technology Support Technician); and Dr. Brian Robinson (Fellow for Digital Curation and Scholarship) for their great work in very surreal circumstances over the past six months. Link to Post | Language: English CORRECTION: Creative Fellowship Call for Proposals Posted on September 3, 2020 from Notes For Bibliophiles We have an update to our last post! We’re still accepting proposals for our 2021 Creative Fellowship… But we’ve decided to postpone both the Fellowship and our annual Exhibition & Program Series by six months due to the coronavirus. The annual exhibition will now open on October 1, 2021 (which is 13 months away, but we’re still hard at work planning!). The new due date for Fellowship proposals is April 1, 2021. We’ve adjusted the timeline and due dates in the call for proposals accordingly.
Link to Post | Language: English On This Day in the Florida Flambeau, Friday, September 2, 1983 Posted on September 2, 2020 from Illuminations Today in 1983, a disgruntled reader sent in this letter to the editor of the Flambeau. In it, the reader describes the outcome of a trial and the potential effects that outcome will have on the City of Tallahassee. Florida Flambeau, September 2, 1983 It is such a beautifully written letter that I still can’t tell whether or not it’s satire. Do you think the author is being serious or sarcastic? Leave a comment below telling us what you think! Link to Post | Language: English Hartgrove, Meriwether, and Mattingly Posted on September 2, 2020 from The Consecrated Eminence The past few months have been a challenging time for archivists everywhere as we adjust to doing our work remotely. Fortunately, the materials available in Amherst College Digital Collections enable us to continue doing much of our work. Back in February, I posted about five Black students from the 1870s and 1880s — Black Men of Amherst, 1877-1883 — and now we’re moving into the early 20th century. A small clue in The Olio has revealed another Black student that was not included in Harold Wade’s Black Men of Amherst. Robert Sinclair Hartgrove (AC 1905) was known to Wade, as was Robert Mattingly (AC 1906), but we did not know about Robert Henry Meriwether. These three appear to be the first Black students to attend Amherst in the twentieth century. Robert Sinclair Hartgrove, Class of 1905 The text next to Hartgrove’s picture in the 1905 yearbook gives us a tiny glimpse into his time at Amherst. The same yearbook shows Hartgrove not just jollying the players, but playing second base for the Freshman baseball team during the 1902 season. Freshman Baseball Team, 1902 The reference to Meriwether sent me to the Amherst College Biographical Record, where I found Robert Henry Meriwether listed as a member of the Class of 1904. A little digging into the College Catalogs revealed that he belongs with the Class of 1905. College Catalog, 1901-02 Hartgrove and Meriwether are both listed as members of the Freshman class in the 1901-02 catalog. The catalog also notes that they were both from Washington, DC and the Biographical Record indicates that they both prepped at Howard University before coming to Amherst. We find Meriwether’s name in the catalog for 1902-03, but he did not “pull through” as The Olio hopes Hartgrove will; Meriwether returned to Howard University where he earned his LLB in 1907. Hartgrove also became a lawyer, earning his JB from Boston University in 1908 and spending most of his career in Jersey City, NJ. Robert Nicholas Mattingly, Class of 1906 Mattingly was born in Louisville, KY in 1884 and prepped for Amherst at The M Street School in Washington, DC, which changed its name in 1916 to The Dunbar School. Matt Randolph (AC 2016) wrote “Remembering Dunbar: Amherst College and African-American Education in Washington, DC” for the book Amherst in the World, which includes more details of Mattingly’s life. The Amherst College Archives and Special Collections reading room is closed to on-site researchers. However, many of our regular services are available remotely, with some modifications. Please read our Services during COVID-19 page for more information. Contact us at archives@amherst.edu. 
Link to Post | Language: English Democratizing Access to our Records Posted on September 1, 2020 from AOTUS The National Archives has a big, hairy, audacious strategic goal to provide public access to 500 million digital copies of our records through our online Catalog by FY24. When we first announced this goal in 2010, we had less than a million digital copies in the Catalog and getting to 500 million sounded to some like a fairy tale. The goal received a variety of reactions from people across the archival profession, our colleagues and our staff. Some were excited to work on the effort and wanted particular sets of records to be first in line to scan. Some laughed out loud at the sheer impossibility of it. Some were angry and said it was a waste of time and money. Others were fearful that digitizing the records could take their jobs away. We moved ahead. Staff researched emerging technologies and tested them through pilots in order to increase our efficiency. We set up a room at our facilities in College Park to transfer our digital copies from individual hard drives to new technology from Amazon, known as snowballs. We worked on developing new partnership projects in order to get more records digitized. We streamlined the work in our internal digitization labs and we piloted digitization projects with staff in order to find new ways to get digital copies into the Catalog. By 2015, we had 10 million in the Catalog. We persisted. In 2017, we added more digital objects, with their metadata, to the Catalog in a single year than we had for the preceding decade of the project. Late in 2019, we surpassed a major milestone by having more than 100 million digital copies of our records in the Catalog. And yes, it has strained our technology. The Catalog has developed growing pains, which we continue to monitor and mitigate. We also created new finding aids that focus on digital copies of our records that are now available online: see our Record Group Explorer and our Presidential Library Explorer. So now, anyone with a smartphone or access to a computer with wifi can view at least some of the permanent records of the U.S. Federal government without having to book a trip to Washington, D.C. or one of our other facilities around the country. The descriptions of over 95% of our records are also available through the Catalog, so even if you can’t see it immediately, you can know what records exist. And that is convenient for the millions of visitors we get each year to our website, even more so during the pandemic. National Archives Identifier 20802392 We are well on our way to 500 million digital copies in the Catalog by FY24. And yet, with over 13 billion pages of records in our holdings, we know we have only just begun. Link to Post | Language: English Lola Hayes and “Tone Pictures of the Negro in Music” Posted on August 31, 2020 from NYPR Archives & Preservation Lola Wilson Hayes (1906-2001) was a highly regarded African-American mezzo-soprano, WNYC producer, and later, much sought-after vocal teacher and coach. A Boston native, Hayes was a music graduate of Radcliffe College and studied voice with Frank Bibb at Baltimore’s Peabody Conservatory. She taught briefly at a black vocational boarding school in New Jersey known as the ‘Tuskeegee of the north'[1] before embarking on a recital and show career which took her to Europe and around the United States.
During World War II, she also made frequent appearances at the American Theatre Wing of the Stage Door Canteen of New York and entertained troops at USO clubs and hospitals. Headline from The New York Age, August 12, 1944, pg. 10. (WNYC Archive Collections) Hayes also made time to produce a short but notable run of WNYC programs, which she hosted and performed on the home front. Her November and December 1943 broadcasts were part of a rotating half-hour time slot designated for known recitalists. She shared the late weekday afternoon slot with sopranos Marjorie Hamill, Pina La Corte, Jean Carlton, Elaine Malbin, and the Hungarian pianist Arpád Sándor. Hayes’ series, Tone Pictures of the Negro in Music, sought to highlight African-American composers and was frequently referred to as The Negro in Music. The following outline of 1943 and 1944 broadcasts was pieced together from the WNYC Masterwork Bulletin program guide and period newspaper radio listings. Details on the 1943 programs are sparse. We know that Hayes’ last broadcast in 1943 featured the pianist William Duncan Allen (1906-1999) performing They Led My Lord Away by Roland Hayes and Good Lord Done Been Here by Hall Johnson, and a Porgy and Bess medley by George Gershwin. Excerpt from “Behind the Mike,” November/December 1944, WNYC Masterwork Bulletin. (WNYC Archive Collections) The show was scheduled again in August 1944 as a 15-minute late Tuesday afternoon program and in November that year as a half-hour Wednesday evening broadcast. The August programs began with an interview of soprano Abbie Mitchell (1884-1960), the widow of composer and choral director Will Marion Cook (1869-1944). The composer and arranger Hall Johnson (1888-1970) was her studio guest the following week. The third Tuesday of the month featured pianist Jonathan Brice performing “songs of young contemporary Negro composers,” and the August shows concluded with selections from Porgy and Bess and Carmen Jones. The November broadcasts focused on the work of William Grant Still, “the art songs, spirituals and street cries” of William Lawrence, as well as the songs and spirituals of William Rhodes, lyric soprano Lillian Evanti, and baritone Harry T. Burleigh. Hayes also spent airtime on the work of neo-romantic composer and violinist Clarence Cameron White. The November 29th program considered “the musical setting of poems by Langston Hughes” and reportedly included the bard himself. “Langston Hughes was guest of honor and punctuated his interview with a reading from his opera Troubled Island.”[2] This was not the first time the poet’s work was the subject of Hayes’ broadcast. Below is a rare copy of her script from a program airing eight months earlier when she sat in for the regularly scheduled host, soprano Marjorie Hamill. The script for Tone Pictures of the Negro in Music hosted by Lola Hayes on March 24, 1944. (Image used with permission of the Van Vechten Trust and courtesy of the Carl Van Vechten Papers Relating to African American Arts and Letters, James Weldon Johnson Collection in the Yale Collection of American Literature, Beinecke Rare Book and Manuscript Library)[3] It is unfortunate, but it appears there are no recordings of Lola Hayes’ WNYC program. We can’t say if that’s because they weren’t recorded or, if they were, the lacquer discs have not survived.
We do know that World War II-era transcription discs, in general, are less likely to have survived since most of them were cut on coated glass, rather than aluminum, to save vital metals for the war effort. After the war, Hayes focused on voice teaching and coaching. Her students included well-known performers like Dorothy Rudd Moore, Hilda Harris, Raoul Abdul-Rahim, Carol Brice, Nadine Brewer, Elinor Harper, Lucia Hawkins, and Margaret Tynes. She was the first African-American president of the New York Singing Teachers Association (NYSTA), serving in that post from 1970-1972. In her later years, she devoted much of her time to the Lola Wilson Hayes Vocal Artists Award, which gave substantial financial aid to young professional singers worldwide.[4]  ___________________________________________________________ [1] The Manual Training and Industrial School for Colored Youth in Bordentown, New Jersey [2] “The Listening Room,” The People’s Voice, December 2, 1944, pg. 29. The newspaper noted that the broadcast included Hall Johnson’s Mother to Son, Cecil Cohen’s Death of an Old Seaman and Florence Price’s Song to a Dark Virgin, all presumably sung by host, Lola Hayes.  Troubled Island is an opera set in Haiti in 1791. It was composed by William Grant Still with a libretto by Langston Hughes and Verna Arvey. [3] Page two of the script notes Langston Hughes’ grandmother was married to a veteran of the 1859 Harper’s Ferry raid led by abolitionist John Brown. Indeed, Hughes’ grandmother’s first husband was Lewis Sheridan Leary, who was one of Brown’s raiders at Harper’s Ferry. For more on the story please see: A Shawl From Harper’s Ferry. [4] Abdul, Raoul, “Winners of the Lola Hayes Vocal Scholarship and Awards,” The New York Amsterdam News, February 8, 1992, pg. 25. Special thanks to Valeria Martinez for research assistance.   Link to Post | Language: English the road to edinburgh Posted on August 28, 2020 from Culture on Campus On the 50th anniversary of the 1970 Edinburgh Commonwealth Games newly catalogued collections trace the long road to the first Games held in Scotland. A handwritten note dated 10th April 1957 sits on the top of a file marked ‘Scotland for 1970 Host’. The document forms part of a series of files recording the planning, organisation and operation of the 1970 Edinburgh Commonwealth Games, the first to be held in Scotland. Written by Willie Carmichael, a key figure in Scotland’s Games history, the note sets out his plans to secure the Commonwealth Games for Scotland. He begins by noting that Scotland’s intention to host the Games was made at a meeting of Commonwealth Games Federations at the 1956 Melbourne Olympic Games. Carmichael then proceeds to lay out the steps required to make Scotland’s case to be the host of the Games in 1966 or 1970. Willie Carmichael The steps which Carmichael traced out in his note can be followed through the official records and personal papers relating to the Games held in the University Archives. The recently catalogued administrative papers of Commonwealth Games Scotland for the period provide a detailed account of the long process of planning for this major event, recording in particular the close collaboration with Edinburgh Corporation which was an essential element in securing the Games for Scotland (with major new venues being required for the city to host the event). 
Further details and perspectives on the road to the 1970 Games can be found in the personal papers of figures associated with Commonwealth Games Scotland also held in the University Archives, including Sir Peter Heatly and Willie Carmichael himself. The choice of host city for the 1966 Games was to be made at a meeting held at the 1962 Games in Perth, Australia. Meeting the first target in Carmichael’s plan, the Edinburgh campaign put forward its application as host city at a Federation meeting held in Rome in 1960. A series of press cutting files collected by Carmichael trace the campaign’s progress from this initial declaration of intent through to the final decision made in Perth. Documents supporting Edinburgh’s bid to host the 1966 Commonwealth Games presented to meetings of the Commonwealth Games Federation in Rome (1960) and Perth (1962), part of the Willie Carmichael Archive. Edinburgh faced competition both within Scotland, with the press reporting a rival bid from Glasgow, and across the Commonwealth, with other nations including Jamaica, India and Southern Rhodesia expressing an interest in hosting the 1966 competition. When it came to the final decision in 1962, three cities remained in contention: Edinburgh, Kingston in Jamaica, and Salisbury in Southern Rhodesia. The first round of voting saw Salisbury eliminated. In the subsequent head-to-head vote, Kingston was selected as host city for the 1966 Games by the narrowest of margins (17 votes to 16). As Carmichael had sketched out in his 1957 plan, if Edinburgh failed in its attempt to host the 1966 Games, it would have another opportunity to make its case to hold the 1970 event. Carmichael and his colleagues travelled to Kingston in 1966 confident of securing the support required to bring the Games to Scotland in 1970. In our next blog we’ll look at how they succeeded in making the case for Edinburgh. ‘Scotland Invites’, title page to document supporting Edinburgh’s bid to host the 1966 Commonwealth Games (Willie Carmichael Archive). Link to Post | Language: English friday art blog: kate downie Posted on August 27, 2020 from Culture on Campus Nanbei by Kate Downie (Oil on canvas, 2013) During a series of visits to China a few years ago, Kate Downie was brought into contact with traditional ink painting techniques, and also with the China of today. There she encountered the contrasts and meeting points between the epic industrial and epic romantic landscapes: the motorways, rivers, cityscapes and geology – all of which she absorbed and reflected on in a series of oil and ink paintings. As Kate creates studies for her paintings in situ, she is very much immersed in the landscapes that she is responding to and reflecting on. The artwork shown above, ‘Nanbei’, which was purchased by the Art Collection in 2013, tackles similar themes to Downie’s Scottish-based work, reflecting both her interest in the urban landscape and also the edges where land meets water. Here we encounter both aspects within a new setting – an industrial Chinese landscape set by the edge of a vast river. Downie is also obsessed with bridges. As well as the bridge that appears in this image, seemingly supported by trees that follow its line, the space depicted forms an unseen bridge between two worlds and two extremes, between epic natural and epic industrial forms. In this imagined landscape, north meets south (Nanbei literally means North South) and mountains meet skyscrapers; here both natural and industrial structures dominate the landscape.
This juxtaposition is one of the aspects of China that impressed the artist and inspired the resulting work. After purchasing this work by Kate Downie, the Art Collection invited her to be one of three exhibiting artists in its exhibition ‘Reflections of the East’ in 2015 (the other two artists were Fanny Lam Christie and Emma Scott Smith). All artists had links to China, and ‘Nanbei’ was central to the display of works in the Crush Hall that Kate had entitled ‘Shared Vision’. Temple Bridge (Monoprint, 2015) Kate Downie studied Fine Art at Gray’s School of Art, Aberdeen, and has held artists’ residencies in the USA and Europe. She has exhibited widely and has also taught and directed major art projects. In 2010 Kate Downie travelled to Beijing and Shanghai to work with ink painting masters and she has since returned there several times, slowly building a lasting relationship with Chinese culture. On a recent visit she learned how to carve seals from soapstone, and these red stamps can now be seen on all of her work, including on her print ‘Temple Bridge’ above, which was purchased by the Collection at the end of the exhibition. Kate Downie recently gave an interesting online talk about her work and life in lockdown. It was organised by The Scottish Gallery in Edinburgh, which is currently holding an exhibition entitled ‘Modern Masters Women’ featuring many women artists. Watch Kate Downie’s talk below: Link to Post | Language: English Telling Untold Stories Through the Emmett Till Archives Posted on August 27, 2020 from Illuminations Detail of a newspaper clipping from the Joseph Tobias Papers, MSS 2017-002 Friday, August 28th, marks the 65th anniversary of the abduction and murder of Emmett Till. Till’s murder is regarded as a significant catalyst for the mid-century African-American Civil Rights Movement. Calls for justice for Till still drive national conversations about racism and oppression in the United States. In 2015, Florida State University (FSU) Libraries Special Collections & Archives established the Emmett Till Archives in collaboration with Emmett Till scholar Davis Houck, filmmaker Keith Beauchamp, and author Devery Anderson. Since then, we have continued to build robust research collections of primary and secondary sources related to the life, murder, and commemoration of Emmett Till. We invite researchers from around the world, from any age group, to explore these collections and ask questions. It is through research and exploration of original, primary resources that Till’s story can be best understood and that truth can be shared. “Mamie had a little boy…”, from the Wright Family Interview, Keith Beauchamp Audiovisual Recordings, MSS 2015-016 FSU Special Collections & Archives. As noted in our Emmett Till birthday post this year, an interview with Emmett Till’s family, conducted by civil rights filmmaker Keith Beauchamp in 2018, is now available through the FSU Digital Library in two parts. Willie Wright, Thelma Wright Edwards, and Wilma Wright Edwards were kind enough to share their perspectives with Beauchamp and in a panel presentation at the FSU Libraries Heritage Museum that spring. Soon after this writing, original audio and video files from the interview will also be available to any visitor, researcher, or aspiring documentary filmmaker through the FSU Digital Library. Emmett Till, December 1954.
Image from the Davis Houck Papers A presentation by a Till scholar in 2019 led to renewed contact with and a valuable donation from FSU alum Steve Whitaker, who in a way was the earliest contributor to Emmett Till research at FSU. His seminal 1963 master’s thesis, completed right here at Florida State University, is still the earliest known scholarly work on the kidnapping and murder of Till, and was influential on many subsequent retellings of the story. The Till Archives recently received a few personal items from Whitaker documenting life in mid-century Mississippi, as well as a small library of books on Till, Mississippi law, and other topics that can give researchers valuable context for his thesis and the larger Till story. In the future, the newly-founded Emmett Till Lecture and Archives Fund will ensure further opportunities to commemorate Till through events and collection development. FSU Libraries will continue to partner with Till’s family, the Emmett Till Memory Project, Emmett Till Interpretive Center, the Emmett Till Project, the FSU Civil Rights Institute, and other institutions and private donors to collect, preserve and provide access to the ongoing story of Emmett Till. Sources and Further Reading FSU Libraries. Emmett Till Archives Research Guide. https://guides.lib.fsu.edu/till Wright Family Interview, Keith Beauchamp Audiovisual Recordings, MSS 2015-016, Special Collections & Archives, Florida State University, Tallahassee, Florida. Interview Part I: http://purl.flvc.org/fsu/fd/FSU_MSS2015-016_BD_001 Interview Part II: http://purl.flvc.org/fsu/fd/FSU_MSS2015-016_BD_002 Link to Post | Language: English Former Congressman Trey Gowdy Appointed to the PIDB Posted on August 26, 2020 from Transforming Classification On August 24, 2020, House Minority Leader Kevin McCarthy (R-CA) appointed former Congressman Harold W. “Trey” Gowdy, III as a member of the Public Interest Declassification Board. Mr. Gowdy served four terms in Congress, representing his hometown of Spartanburg in South Carolina’s 4th congressional district. The Board members and staff welcome Mr. Gowdy and look forward to working with him in continuing efforts to modernize and improve how the Federal Government classifies and declassifies sensitive information. Mr. Gowdy was appointed by Minority Leader McCarthy on August 24, 2020. He is serving his first three-year term on the Board. His appointment was announced on August 25, 2020, in the Congressional Record: https://www.congress.gov/116/crec/2020/08/25/CREC-2020-08-25-house.pdf Link to Post | Language: English Tracey Sterne Posted on August 25, 2020 from NYPR Archives & Preservation In November of 1981, an item appeared in The New York Times -and it seemed all of us in New York (and elsewhere) who were interested in music, radio, and culture in general, saw it:  “Teresa Sterne,” it read, “who in 14 years helped build the Nonesuch Record label into one of the most distinguished and innovative in the recording industry, will be named Director of Music Programming at WNYC radio next month.” The piece went on to promise that Ms. Sterne, under WNYC’s management, would be creating “new kinds of programming -including some innovative approaches to new music and a series of live music programs.”  This was incredible news. Sterne, by this time, was a true cultural legend.
She was known not only for those 14 years she’d spent building Nonesuch, a remarkably smart, serious, and daring record label —but also for how it had all ended, with her sudden dismissal from that label by Elektra, its parent company (whose own parent company was Warner Communications), two years earlier. The widely publicized outrage over her termination from Nonesuch included passionate letters of protest from the likes of Leonard Bernstein, Elliott Carter, Aaron Copland —only the alphabetical beginning of a long list of notable musicians, critics and journalists who saw her firing as a sharp blow to excellence and diversity in music. But the dismissal stood.  By coincidence, only three weeks before the news of her hiring broke, I had applied for a job as a part-time music-host at WNYC. Steve Post, a colleague whom I’d met while doing some producing and on-air work at New York’s decidedly non-profit Pacifica station, WBAI, had come over from there to WNYC, a year before, to do the weekday morning music and news program. “Fishko,” he said to me, “they need someone on the weekends -and I think they want a woman.” My day job of longstanding was as a freelance film editor, but I wanted to keep my hand in the radio world. Weekends would be perfect. In two interviews with executives at WNYC, I had failed to impress. But now I could feel hopeful about making a connection to Ms. Sterne, who was a music person, as was I.  Soon after her tenure began, I threw together a sample tape and got it to her through a contact on the inside. And she said, simply: Yeah, let’s give her a chance. And so it began.  Tracey—the name she was called by all friends and colleagues — seemed, immediately, to be a fascinating, controversial character: she was uniquely qualified to do the work at hand, but at the same time she was a fish out of water. She was un-corporate, not inclined to be polite to the young executives upstairs, and not at all enamored of current trends or audience research. For this we dearly loved her, those of us on the air. She cared how the station sounded, how the music connected, how the information about the music surrounded it. Her preoccupations seemed, even then, to be of the Old School. But she was also fiercely modern in her attitude toward the music, unafraid to mix styles and periods, admiring of new music, up on every instrumentalist and conductor and composer, young, old, avant-garde, traditional. And she had her own emphatic and impeccable taste. Always the best, that was her motto —whatever it is, if it’s great, or even just extremely good, it will distinguish itself and find its audience, she felt.  Tracey Sterne, age 13, rehearsing for a Tchaikovsky concerto performance at WNYC in March 1940. (Finkelstein/WNYC Archive Collections) She had developed her ear and her convictions, as it turned out, as a musician, having been a piano prodigy who performed at Madison Square Garden at age 12. She went on to a debut with the New York Philharmonic, gave concerts at Lewisohn Stadium and the Brooklyn Museum, and so on. I could relate. Though my gifts were not nearly at her level, I, too, had been a dedicated, early pianist and I, too, had looked later for other ways to use what I’d learned at the piano keyboard. And our birthdays were on the same date in March. So, despite being at least a couple of decades apart in age, we bonded.  Tracey’s tenure at WNYC was fruitful, though not long. As she had at Nonesuch, she embraced ambitious and adventurous music programming. 
She encouraged some of the on-air personalities to express themselves about the music, to “personalize” the air, to some degree. That was also happening in special programs launched shortly before she arrived as part of a New Music initiative, with John Schaefer and Tim Page presenting a range of music way beyond the standard classical fare. And because of Tracey’s deep history and contacts in the New York music business, she forged partnerships with music institutions and found ways to work live performances by individual musicians and chamber groups into the programming. She helped me carve out a segment on air for something we called Great Collaborations, a simple and very flexible idea of hers that spread out to every area of music and made a nice framework for some observations about musical style and history. She loved to talk (sometimes to a fault) and brainstorm about ways to enliven the idea of classical music on the radio, not something all that many people were thinking about, then.  But management found her difficult, slow and entirely too perfectionistic. She found management difficult, slow and entirely too superficial. And after a short time, maybe a year, she packed up her sneakers —essential for navigating the unforgiving marble floors in that old place— and left the long, dusty hallways of the Municipal Building.  After that, I occasionally visited Tracey’s house in Brooklyn for events which I can only refer to as “musicales.” Her residence was on the Upper West Side, but this family house was treated as a country place, she’d go on the weekends. She’d have people over, they’d play piano, and sing, and it might be William Bolcom and Joan Morris, or some other notables, spending a musical and social afternoon. Later, she and I produced a big, New York concert together for the 300th birthday of Domenico Scarlatti –which exact date fell on a Saturday in 1985. “Scarlatti Saturday,” we called it, with endless phone-calling, musician-wrangling and fundraising needed for months to get it off the ground.  The concert itself, much of which was also broadcast on WNYC, went on for many hours, with appearances by some of the finest pianists and harpsichordists in town and out, lines all up and down Broadway to get into Symphony Space.  Throughout, Tracey was her incorruptible self — and a brilliant organizer, writer, thinker, planner, and impossibly driven producing-partner.  I should make clear, however, that for all her knowledge and perfectionistic, obsessive behavior, she was never the cliche of the driven, lonely careerist -or whatever other cliche you might want to choose. She was a warm, haimish person with friends all over the world, friends made mostly through music. A case in point: the “Scarlatti Saturday” event was produced by the two of us on a shoestring. And Tracey, being Tracey, she insisted that we provide full musical and performance information in printed programs, offered free to all audience members, and of course accurate to the last comma. How to assure this? She quite naturally charmed and befriended the printer — who wound up practically donating the costly programs to the event. By the time we were finished she was making him batches of her famous rum balls and he was giving us additional, corrected pages —at no extra charge. It was not a calculated maneuver -it was just how she did things.  You just had to love and respect her for the life force, the intelligence, the excellence and even the temperament she displayed at every turn. 
Sometimes even now, after her death many years ago at 73 from ALS, I still feel Tracey Sterne’s high standards hanging over me —in the friendliest possible way. ___________________________________________ Sara Fishko hosts WNYC’s culture series, Fishko Files. Link to Post | Language: English Heroes Work Here Posted on August 24, 2020 from AOTUS The National Archives is home to an abundance of remarkable records that chronicle and celebrate the rich history of our nation. It is a privilege to be Archivist of the United States—to be the custodian of our most treasured documents and the head of an agency with such a unique and rewarding mission. But it is my greatest privilege to work with such an accomplished and dedicated staff—the real treasures of the National Archives go home at night. Today I want to recognize and thank the mission-essential staff of NARA’s National Personnel Records Center (NPRC). Like all NARA offices, the NPRC closed in late March to protect its workforce and patrons from the spread of the pandemic and comply with local government movement orders. While modern military records are available electronically and can be referenced remotely, the majority of NPRC’s holdings and reference activity involve paper records that can be accessed only by on-site staff. Furthermore, these records are often needed to support veterans and their families with urgent matters such as medical emergencies, homeless veterans seeking shelter, and funeral services for deceased veterans. Concerned about the impact a disruption in service would have on veterans and their families, over 150 staff voluntarily set aside concerns for their personal welfare and regularly reported to the office throughout the period of closure to respond to these types of urgent requests. These exceptional staff were pioneers in the development of alternative work processes to incorporate social distancing and other protective measures to ensure a safe work environment while providing this critical service. National Personnel Records Center (NPRC) building in St. Louis The Center is now in Phase One of a gradual re-opening, allowing for additional on-site staff.  The same group that stepped up during the period of closure continues to report to the office and are now joined by additional staff volunteers, enabling them to also respond to requests supporting employment opportunities and home loan guaranty benefits. There are now over 200 staff supporting on-site reference services on a rotational basis. Together they have responded to over 32,000 requests since the facility closed in late March. More than half of these requests supported funeral honors for deceased veterans. With each passing day we are a day closer to the pandemic being behind us. Though it may seem far off, there will come a time when Covid-19 is no longer the threat that it is today, and the Pandemic of 2020 will be discussed in the context of history. When that time comes, the mission essential staff of NPRC will be able to look back with pride and know that during this unprecedented crisis, when their country most needed them, they looked beyond their personal well-being to serve others in the best way they were able. As Archivist of the United States, I applaud you for your commitment to the important work of the National Archives, and as a Navy veteran whose service records are held at NPRC, I thank you for your unwavering support to America’s veterans. 
Link to Post | Language: English Contribute to the FSU Community COVID 19 Project Posted on August 21, 2020 from Illuminations Masks Sign, contributed by Lorraine Mon, view this item in the digital library here Students, faculty, and alumni! Heritage & University Archives is collecting stories and experiences from the FSU community during COVID-19. University life during a pandemic will be studied by future scholars. During this pandemic, we have received requests surrounding the 1918 Flu Pandemic. Unfortunately, not many documents describing these experiences survive in the archive. To create a rich record of life in these unique times, we are asking the FSU Community to contribute their thoughts, experiences, plans, and photographs to the archive. Working from Home, contributed by Shaundra Lee, view this item in the digital library here How did COVID-19 affect your summer? Tell us about your plans for fall. How did COVID-19 change your plans for classes? Upload photographs of your dorm rooms or your work-from-home setups. If you’d like to see examples of what people have already contributed, please see the collection on Diginole. You can add your story to the project here. Link to Post | Language: English 2021 Creative Fellowship – Call for Proposals Posted on August 21, 2020 from Notes For Bibliophiles PPL is now accepting proposals for our 2021 Creative Fellowship! We’re looking for an artist working in illustration or two-dimensional artwork to create new work related to the theme of our 2021 exhibition, Tomboys. View the full call for proposals, including application instructions, here. The application deadline is now April 1, 2021 (originally October 1, 2020)*. *This deadline has shifted since we originally posted this call for proposals! The 2021 Fellowship, and the Exhibition & Program Series, have both been shifted forward by six months due to the coronavirus. Updated deadlines and timeline in the call for proposals! Link to Post | Language: English Friday art blog: still life in the collection Posted on August 20, 2020 from Culture on Campus Welcome to our new regular blog slot, the ‘Friday Art Blog’. We look forward to your continued company over the next weeks and months. You can return to the Art Collection website here, and search our entire permanent collection here. Pears by Jack Knox (Oil on board, 1973) This week we are taking a look at some of the still life works of art in the permanent collection. ‘Still life’ (or ‘nature morte’ as it is also widely known) refers to the depiction of mostly inanimate subject matter. It has been a part of art from the very earliest days, from thousands of years ago in Ancient Egypt, found also on the walls in 1st century Pompeii, and featured in illuminated medieval manuscripts. During the Renaissance, when it began to gain recognition as a genre in its own right, it was adapted for religious purposes. Dutch Golden Age artists in particular, in the early 17th century, depicted objects which had a symbolic significance. The still life became a moralising meditation on the brevity of life and the vanity of the acquisition of possessions. But, with urbanization and the rise of a middle class with money to spend, it also became fashionable simply as a celebration of those possessions – in paintings of rare flowers or sumptuous food-laden table tops with expensive silverware and the best china. The still life has remained a popular feature through many modern art movements.
Artists might use it as an exercise in technique (much cheaper than a live model), as a study in colour, form, or light and shade, or as a meditation in order to express a deeper mood. Or indeed all of these. The works collected by the University of Stirling Art Collection over the past fifty years reflect its continuing popularity amongst artists and art connoisseurs alike. Bouteille et Fruits by Henri Hayden (Lithograph, 75/75, 1968) In the modern era, the still life featured in the post-impressionist art of Van Gogh, Cezanne and Picasso. Henri Hayden trained in Warsaw, but moved to Paris in 1907 where Cezanne and Cubism were influences. From 1922 he rejected this aesthetic and developed a more figurative manner, but later in life there were signs of a return to a sub-cubist mannerism in his work. As a result, the landscapes and still lifes of his last 20 years became both more simplified and more definitely composed than those of the previous period, with an elegant calligraphy. They combine a new richness of colour with lyrical melancholy. Meditation and purity of vision mark the painter’s last years. Black Lace by Anne Redpath (Gouache, 1951) Anne Redpath is best known for her still lifes and interiors, often with added textural interest, and also with the slightly forward-tilted table top, of which this painting is a good example. Although this work is largely monochrome, it retains the fascination the artist had with fabric and textiles – the depiction of the lace is enhanced by the restrained palette. Untitled still life by Euan Heng (Linocut, 1/5, 1974) While Euan Heng’s work is contemporary in practice, his imagery is not always contemporary in origin. He has long been influenced by Italian iconography, medieval paintings and frescoes. Origin of a rose by Ceri Richards (Lithograph, 30/70, 1967) In Ceri Richards’ work there is a constant recurrence of visual symbols and motifs always associated with the mythic cycles of nature and life. These symbols include rock formations, plant forms, sun, moon and seed-pods, leaf and flower. These themes refer to the cycle of human life and its transience within the landscape of earth. Still Life, Summer by Elizabeth Blackadder (Oil on canvas, 1963) This is a typical example of one of Elizabeth Blackadder’s ‘flattened’ still life paintings, with no perspective. Works such as this retain the form of the table, with the top raised to give the fullest view. Broken Cast by David Donaldson (Oil on canvas, 1975) David Donaldson was well known for his still lifes and landscape paintings as well as literary, biblical and allegorical subjects. Flowers for Fanny by William MacTaggart (Oil on board, 1954) William MacTaggart typically painted landscapes, seascapes and still lifes featuring vases of flowers. These flowers, for his wife, Fanny Aavatsmark, are unusual for not being poppies, his most commonly painted flower. Cake by Fiona Watson (Digital print, 18/25, 2009) We end this blog post with one of the most popular still lifes in the collection. This depiction of the Scottish classic, the Tunnock’s teacake, is a modern take on the still life. It is a firm favourite whenever it is on display. Image by Julie Howden Link to Post | Language: English Solar Energy: A Brief Look Back Posted on August 20, 2020 from Illuminations In the early 1970s, the United States was in the midst of an energy crisis. Massive oil shortages and high prices made it clear that alternative ideas for energy production were needed, and solar power was a clear front-runner.
The origins of the solar cell in the United States date back to inventor Charles Fritts in the 1880s, and the first attempts at harvesting solar energy for homes, to the late 1930s. In 1974, the State of Florida put its name in the ring to become the host of the National Solar Energy Research Institute. Site proposal for the National Solar Energy Research Institute. Claude Pepper Papers S. 301 B. 502 F. 4 With potential build sites in Miami and Cape Canaveral, the latter possessing the added benefit of proximity to NASA, the Florida Solar Energy Task Force, led by Robert Nabors and endorsed by Representative Pepper, felt confident. The state made it to the final rounds of the search before the final location of Golden, Colorado, was settled upon; the institute opened there in 1977. Around this same time, however (1975), the Florida Solar Energy Center was established at the University of Central Florida. The Claude Pepper Papers contain a wealth of information on Florida’s efforts in the solar energy arena from the onset of the energy crisis to the late 1980s. Carbon copy of correspondence between Claude Pepper and Robert L. Nabors regarding the Cape Canaveral proposed site for the National Solar Research Institute. Claude Pepper Papers S. 301 B. 502 F. 4 Earlier this year, “Tallahassee Solar II”, a new solar energy farm, began operating in Florida’s capital city. Located near the Tallahassee International Airport, it provides electricity for more than 9,500 homes in the Leon County area. With the steady gains that the State of Florida continues to make in the area of solar energy expansion, it gets closer to fully realizing its nickname, “the Sunshine State.” Link to Post | Language: English (C)istory Lesson Posted on August 18, 2020 from Illuminations Our next submission is from Rachel Duke, our Rare Books Librarian, who has been with Special Collections for two years. This project was primarily geared towards full-time faculty and staff, so I chose to highlight her contribution to see what a full-time faculty member’s experience would be like looking through the catalog. Frontispiece and Title Page, Salome, 1894. Image from https://collection.cooperhewitt.org/objects/68775953/ The item she chose as her object was Salome, originally written in French by Oscar Wilde and then translated into English. While this book does not explicitly identify as a “Queer Text,” Wilde has become canonized in queer historical literature. In the first edition of the book, there is even a dedication to his lover, Lord Alfred Bruce Douglas, who helped with the translation. While there are documented historical examples of what we would refer to today as “queerness” (queer meaning non-straight), there is still no demarcation of his queerness anywhere in the catalog record. Although the author is not necessarily unpacking his own queer experiences in the text, “both [Salome’s] author and its legacy participate strongly in queer history” as Duke states in her submission.  Oscar Wilde and Lord Alfred Bruce Douglas Even though Wilde was in a queer relationship with Lord Alfred Bruce Douglas, and has been accepted into the Queer canon, why doesn’t his catalog record reflect that history? Well, a few factors come into play. One of the main ones is an aversion to retroactively labeling historical figures. Since we cannot confirm which modern label would fit Wilde, we can’t necessarily outright label him as gay.
How would a queer researcher like me go about finding authors and artists from the past who are connected with queer history? It is important to acknowledge LGBTQ+ erasure when discussing this topic. Since the LGBTQ+ community has historically been marginalized, documentation of queerness is hard to come by because: People did not collect, and even actively erased, Queer and Trans Histories. LGBTQ+ history has been passed down primarily as an oral tradition. Historically, we cannot confirm which labels people would have identified with. Language and social conventions change over time. So while we view and know someone to be queer, since it is not in official documentation we have no “proof.” On the other hand, in some cultures, gay relations were socially acceptable. For example, in the Middle Ages, there was a legislatively approved form of same-sex marriage, known as affrèrement. This example is clearly labeled as *gay* in related library-based description because it was codified that way in the historical record. By contrast, Shakespeare’s sonnets, which (arguably) use queer motifs and themes, are not labeled as “queer” or “gay.” Does queer content mean we retroactively label the AUTHOR queer? Does the implication of queerness mean we should make the text discoverable under queer search terms? Cartoon depicting Oscar Wilde’s visit to San Francisco. By George Frederick Keller – The Wasp, March 31, 1882. Personally, I see both sides. As someone who is queer, I would not want a random person trying to retroactively label me as something I don’t identify with. On the other hand, as a queer researcher, I find it vital to have access to that information. Although they might not have been seen as queer in their time period, their experiences speak to queer history. Identities and people will change, which is completely normal, but as a group that has experienced erasure of their history, it is important to acknowledge all examples of historical queerness as a proof that LGBTQ+ individuals have existed throughout time. How do we responsibly and ethically go about making historical queerness discoverable in our finding aids and catalogs? Click Here to see some more historical figures you might not have known were LGBTQ+. Link to Post | Language: English
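One practical question underneath the (C)istory Lesson post above is where, mechanically, a queer-history connection can live in a catalog record once a library decides to record it. The sketch below is only an illustration, not FSU's practice and not a recommendation of particular headings: it uses pymarc (assuming pymarc 4.x-style subfield lists; pymarc 5 switched to Subfield objects) to add a hypothetical local subject access point and a public note to a minimal record for Salome. MARC reserves the 69X block for locally defined subject fields, and 500 general notes are keyword-searchable in most catalogs, so both are plausible homes for this kind of context.

```python
# A minimal, hypothetical sketch of recording a queer-history connection in a
# catalog record without asserting an identity label in an authority-controlled
# heading. Field values here are illustrative only.
# Assumes pymarc 4.x-style subfield lists (pymarc 5.x uses Subfield objects).
from pymarc import Record, Field

record = Record()
record.add_field(
    # Basic title statement for the work discussed in the post.
    Field(tag="245", indicators=["1", "0"],
          subfields=["a", "Salome :", "b", "a tragedy in one act /",
                     "c", "Oscar Wilde."]),
    # 69X is reserved for local use; the heading text is a made-up local term.
    Field(tag="690", indicators=[" ", " "],
          subfields=["a", "Queer history (local heading)"]),
    # A public note explaining the basis for the access point, visible to
    # researchers and indexed for keyword searching in most catalogs.
    Field(tag="500", indicators=[" ", " "],
          subfields=["a", "The 1894 English edition was translated with the "
                          "involvement of Lord Alfred Douglas; the work figures "
                          "prominently in queer literary history."]),
)
print(record)  # text (mnemonic) view of the record for review before loading
```

Keeping the assertion in a local field plus a note, rather than in a controlled name or subject heading, is one way to make the material findable under queer-history searches while leaving the retroactive-labeling question to local policy.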
arstechnica-com-5132 ---- Dogecoin has risen 400 percent in the last week because why not | Ars Technica Dogecoin rallied after Elon Musk tweeted a photo of "Doge Barking at the Moon." Timothy B.
Lee - Apr 16, 2021 6:56 pm UTC Dogecoin, a blockchain-based digital currency named for a meme about an excitable canine, has seen its price rise by a factor of five over the last week. The price spike has made it one of the world's 10 most valuable cryptocurrencies, with a market capitalization of $45 billion. Understanding the value of cryptocurrencies is never easy, and it's especially hard for Dogecoin, which was created as a joke. Dogecoin isn't known for any particular technology innovations and doesn't seem to have many practical applications. What Dogecoin does have going for it, however, is memorable branding and an enthusiastic community of fans. And in 2021, that counts for a lot. In recent months, we've seen shares of GameStop soar to levels that are hard to justify based on the performance of GameStop's actual business. People bought GameStop because it was fun and they thought the price might go up. So too for Dogecoin. Tesla CEO Elon Musk may have also played an important role in Dogecoin's ascendancy. Musk has periodically tweeted about the cryptocurrency, and those tweets are frequently followed by rallies in Dogecoin's price. Late on Wednesday night, Musk tweeted out this image: Doge Barking at the Moon pic.twitter.com/QFB81D7zOL — Elon Musk (@elonmusk) April 15, 2021 Dogecoin's price tripled over the next 36 hours. My editor suggested that I write about whether Dogecoin's rise is a sign of an overheated crypto market, but for a coin like Dogecoin, I'm not sure that's even a meaningful concept. Dogecoin isn't a company that has revenues or profits. And unlike bitcoin and ether, no one seriously thinks it's going to be the foundation of a new financial system. People are trading Dogecoin because it's fun to trade and because they think they might make money from it. The rising price is a sign that a lot of people have decided it would be fun to speculate in Dogecoin. Of course, the fact that lots of people have money to spend on joke investments might itself be a result of larger macroeconomic forces. The combination of stimulus spending, low interest rates, and pandemic-related saving means that a lot of people have more money than usual sitting in their bank accounts. And restrictions on travel and nightlife mean that many of those same people have a lot of time on their hands.
arstechnica-com-8015 ---- Tesla: "Full self-driving beta" isn't designed for full self-driving | Ars Technica Tesla told California regulators the FSD beta lacks "true autonomous features." Timothy B. Lee - Mar 9, 2021 11:15 pm UTC YouTuber Brandon M captured this drone footage of his Tesla steering toward a parked car in October 2020, shortly after the FSD beta became available to the public. Brandon M / YouTube The transparency site PlainSite recently published a pair of letters Tesla wrote to the California Department of Motor Vehicles in late 2020. The letters cast doubt on Elon Musk's optimistic timeline for the development of fully driverless technology. For years, Elon Musk has been predicting that fully driverless technology is right around the corner. At an April 2019 event, Musk predicted that Teslas would be capable of fully driverless operation—known in industry jargon as "level 5"—by the end of 2020. "There's three steps to self-driving," Musk told Tesla investors at the event. "There's being feature complete. Then there's being feature complete to the degree where we think the person in the car does not need to pay attention. And then there's being at a reliability level where we also convince regulators that that is true." Tesla obviously missed Musk's 2020 deadline. But you might be forgiven for thinking Tesla is now belatedly executing the strategy he described two years ago. In October, Tesla released what it called its "full self-driving beta" software to a few-dozen Tesla owners. A few days ago, Musk announced plans to expand the program to more customers. Given that the product is called "full self-driving," this might seem like the first step in Musk's three-step progression. After a few more months of testing, perhaps it will become reliable enough to operate without human supervision. That could allow Musk to make good on his latest optimistic timeline for Autopilot: in a December 2020 interview, Musk said he was "extremely confident" that Tesla vehicles would reach level 5 by the end of 2021. But a letter Tesla sent to California regulators the same month had a different tone.
Despite the "full self-driving" name, Tesla admitted it doesn't consider the current beta software suitable for fully driverless operation. The company said it wouldn't start testing "true autonomous features" until some unspecified point in the future. “We do not expect significant enhancements” Enlarge In a pair of letters last November and December, officials at the California DMV asked Tesla for details about the FSD beta program. Tesla requires drivers using the beta software to actively supervise it so they can quickly intervene if needed. The DMV wanted to know if Tesla planned to relax requirements for human supervision once the software was made available to the general public. Advertisement In its first response, sent in November, Tesla emphasized that the beta software had limited functionality. Tesla told state regulators that the software is "not capable of recognizing or responding" to "static objects and road debris, emergency vehicles, construction zones, large uncontrolled intersections with multiple incoming ways, occlusions, adverse weather, complicated or adversarial vehicles in the driving path, and unmapped roads." In a December follow-up, Tesla added that "we expect the functionality to remain largely unchanged in a future, full release to the customer fleet." Tesla added that "we do not expect significant enhancements" that would "shift the responsibility for the entire dynamic driving task to the system." The system "will continue to be an SAE Level 2, advanced driver-assistance feature." SAE level 2 is industry jargon for driver-assistance systems that perform functions like lane-keeping and adaptive cruise control. By definition, level 2 systems require continual human oversight. Fully driverless systems—like the taxi service Waymo is operating in the Phoenix area—are considered level 4 systems. In its letter to California officials, Tesla added that "Tesla's development of true autonomous features will follow our iterative process (development, validation, early release, etc.) and any such features will not be released to the general public until we have fully validated them." Critics pounced on the disclosure. "Here it is, straight from Tesla," tweeted prominent Tesla skeptic Ed Niedermeyer. "'Full Self-Driving' is not, and will never be, actually self-driving." This might not be quite fair to Tesla—the company apparently does plan to develop more advanced software eventually. But at a minimum, Tesla's public communication about the full self-driving package could easily give customers the wrong impression about the software's future capabilities. Full autonomy is always right around the corner Enlarge / Elon Musk in 2020. BRENDAN SMIALOWSKI / Getty Since 2016, Tesla has given customers every reason to expect that its "full self-driving" software would be, well, fully self-driving. Early promotional materials for the FSD package described a driver getting out of the vehicle and having it find a parking spot on its own. Tesla has repeatedly talked about the FSD package enabling a Tesla vehicle to operate as an autonomous taxi—an application that requires the car to drive itself without anyone behind the wheel. In 2016, Musk predicted that, within two years, a Tesla owner in Los Angeles would be able to summon their vehicle from New York City. 
Advertisement Further Reading Tesla’s autonomy event: Impressive progress with an unrealistic timeline If Tesla is really going to achieve fully driverless operation in 2021, that doesn't leave much time to develop, test, and validate complex, safety-critical software. So it would be natural for customers to assume that the software Tesla named "Full Self Driving beta" is, in fact, a beta version of Tesla's long-awaited fully self-driving software. But in its communications with California officials, Tesla makes it clear that's not true. Of course, Elon Musk has a long history of announcing over-optimistic timelines for his products. It's not really news that Tesla failed to meet an optimistic deadline set by its CEO. But there's a deeper philosophical issue that may go beyond a few blown deadlines. The long road to full autonomy Enlarge / Waymo tested its driverless taxis in the Phoenix area for more than three years before beginning driverless commercial operations. Waymo Tesla's overall Autopilot strategy is to start with a driver-assistance system and gradually evolve it into a fully driverless system. A bunch of other companies in the industry—led by Google's Waymo—believe that this is a mistake. They think the requirements of the two products are so different that it makes more sense to create a driverless taxi, shuttle, or delivery service from scratch. In particular, companies like Waymo argue that it's too difficult to get regular customers to pay close attention to an almost-but-not-fully driverless vehicle. If a car drives perfectly for 1,000 miles and then makes a big mistake, there's a significant risk the human driver won't be paying close enough attention to prevent a crash. Waymo initially considered creating an Autopilot-like driver assistance system and licensing it to automakers, but the company ultimately decided that doing so would be too risky. Musk has always shrugged this critique off. As we've seen, he believes improvements to Autopilot's driver-assistance features will transform it into a system capable of fully driverless operation. But in its comments to the DMV, Tesla seems to endorse the opposite viewpoint: that adding "true autonomous features" to Autopilot will require more than just incrementally improving the performance of its existing software. Tesla acknowledged that it needs more sophisticated systems for handling the "static objects, road debris, emergency vehicles, construction zones." And this makes it a little hard to believe Musk's boast that Tesla will achieve level 5 autonomy by the end of 2021. Notably, Google's prototype self-driving vehicles have been able to navigate most roadway conditions—much like today's Tesla FSD beta software—since roughly 2015. Yet the company needed another five years to refine the technology enough to enable fully driverless operation. And that was within a limited geographic area and with help from powerful lidar sensors. Tesla is trying to achieve the same feat for every street nationwide—and using only cameras and radar. Perhaps Tesla will move faster than Waymo, and it won't take another five years to achieve fully driverless operation. But customers considering whether to pay $10,000 for Tesla's full self-driving software package should certainly take Musk's optimistic timeline with a grain of salt. Promoted Comments Frodo Douchebaggins Ars Tribunus Militum jump to post I bought FSD over three years ago when Elon's charisma roll beat my wisdom save. 
There have been a number of questionable claims, and a few outright lies regarding things that were 100% within their control ( https://web.archive.org/web/20190304232 ... capability being a notable example. TLDR; If you bought FSD and were not thrilled with the price drop after you bought it but well before a single feature was delivered, take heart, for you won’t receive a refund of the difference like an ethical business will do, but you will receive an invite to the early access program and get to use the upcoming features before other people! Except that a month or so later they took down that blog post, and the invites never happened.) I understand that they probably really did think they'd be further along now than they are, but the fact that they're not letting us transfer our FSD license if we want to buy a new car means I'm likely not buying another tesla. It's tantamount to preordering a product and then being told you can't change your shipping address when you move a few years later, and is a slap in the face to the early FSD buyers who have received only a single small feature for that money. Why would I give them more money after what they've done? Their pride is costing them their relationship with the customers that should be the most loyal, and they're doing it over something that costs them nothing except flipping a bit each on the current car and the new car, and for that they get to sell another car. At this point I think we have to be close to large class actions finally emerging, and while they won't help us get what we paid for, and won't get our money back, maybe it'll hurt enough to make them stop over promising and underdelivering 2710 posts | registered 12/17/2012 nimble Ars Centurion et Subscriptor jump to post jeffpkamp wrote: Soon tesla will have four assist levels. Autopilot, Full Self driving, level 5, and , no-for-real-la-to-ny-without-assistance (NFRLNWA). But all joking and sarcasm aside. I've watched probably 2 hours of FSD beta videos, and the only thing that really seemed to give the car trouble was roundabouts. Residential level streets, busy commercial streets, and highways were all navigated with relatives from what I could see. And the computer seemed to be accurately picking up everything important. There were some hilarious glitches as the CV software tried to classify things like turning semi trucks, but it got the positions right. Honestly I think this is just a report from someone who speaks fluent bureaucrat, which Elon obviously does not. In terms of achieving full unmonitored self driving, a video like that means very little. It can demonstrate that the car did the right thing in the particular circumstances of the video. It says next to nothing about how reliably it can do it, nor whether it can deal with the enormous range of possible situations that aren't in the video. For such a video to be meaningful, it would need to be hundeds of thousands of hours long, contain a random selection of driving situations that the software is expected to deal with, and show that no driver interventions were required. In other words, it's only large scale statistics that can demonstrate whether self driving systems are safe, not a bunch of short and quite possibly cherry-picked video clips. The good news is that Tesla are collecting those statistics. The bad news is that they're not sharing them. https://www.forbes.com/sites/bradtemple ... 
95cc0f7fab 216 posts | registered 5/30/2005 arxiv-org-2331 ---- None bibwild-wordpress-com-389 ---- Bibliographic Wilderness Code that Lasts: Sustainable And Usable Open Source Code A presentation I gave at online conference Code4Lib 2021, on Monday March 21. I have realized that the open source projects I am most proud of are a few that have existed for years now, increasing in popularity, with very little maintenance required. Including traject and bento_search. While community aspects matter for open source sustainability, … Continue reading Code that Lasts: Sustainable And Usable Open Source Code → Product management In my career working in the academic sector, I have realized that one thing that is often missing from in-house software development is “product management.” But what does that mean exactly? You don’t know it’s missing if you don’t even realize it’s a thing and people can use different terms to mean different roles/responsibilities. Basically, … Continue reading Product management → Rails auto-scaling on Heroku We are investigating moving our medium-small-ish Rails app to heroku. We looked at both the Rails Autoscale add-on available on heroku marketplace, and the hirefire.io service which is not listed on heroku marketplace and I almost didn’t realize it existed. I guess hirefire.io doesn’t have any kind of a partnership with heroku, but still uses … Continue reading Rails auto-scaling on Heroku → Managed Solr SaaS Options I was recently looking for managed Solr “software-as-a-service” (SaaS) options, and had trouble figuring out what was out there. So I figured I’d share what I learned. Even though my knowledge here is far from exhaustive, and I have only looked seriously at one of the ones I found. The only managed Solr options I … Continue reading Managed Solr SaaS Options → Gem authors, check your release sizes Most gems should probably be a couple hundred kb at most. I’m talking about the package actually stored in and downloaded from rubygems by an app using the gem. After all, source code is just text, and it doesn’t take up much space. OK, maybe some gems have a couple images in there.
But if … Continue reading Gem authors, check your release sizes → Every time you decide to solve a problem with code… Every time you decide to solve a problem with code, you are committing part of your future capacity to maintaining and operating that code. Software is never done. Software is drowning the world by James Abley Updating SolrCloud configuration in ruby We have an app that uses Solr. We currently run a Solr in legacy “not cloud” mode. Our solr configuration directory is on disk on the Solr server, and it’s up to our processes to get our desired solr configuration there, and to update it when it changes. We are in the process of moving … Continue reading Updating SolrCloud configuration in ruby → Are you talking to Heroku redis in cleartext or SSL? In “typical” Redis installation, you might be talking to redis on localhost or on a private network, and clients typically talk to redis in cleartext. Redis doesn’t even natively support communications over SSL. (Or maybe it does now with redis6?) However, the Heroku redis add-on (the one from Heroku itself) supports SSL connections via “Stunnel”, … Continue reading Are you talking to Heroku redis in cleartext or SSL? → Comparing performance of a Rails app on different Heroku formations I develop a “digital collections” or “asset management” app, which manages and makes digitized historical objects and their descriptions available to the public, from the collections here at the Science History Institute. The app receives relatively low level of traffic (according to Google Analytics, around 25K pageviews a month), although we want it to be … Continue reading Comparing performance of a Rails app on different Heroku formations → Deep Dive: Moving ruby projects from Travis to Github Actions for CI So this is one of my super wordy posts, if that’s not your thing abort now, but some people like them. We’ll start with a bit of context, then get to some detailed looks at Github Actions features I used to replace my travis builds, with example config files and examination of options available. For … Continue reading Deep Dive: Moving ruby projects from Travis to Github Actions for CI → Unexpected performance characteristics when exploring migrating a Rails app to Heroku I work at a small non-profit research institute. I work on a Rails app that is a “digital collections” or “digital asset management” app. Basically it manages and provides access (public as well as internal) to lots of files and description about those files, mostly images. It’s currently deployed on some self-managed Amazon EC2 instances … Continue reading Unexpected performance characteristics when exploring migrating a Rails app to Heroku → faster_s3_url: Optimized S3 url generation in ruby Subsequent to my previous investigation about S3 URL generation performance, I ended up writing a gem with optimized implementations of S3 URL generation. github: faster_s3_url It has no dependencies (not even aws-sdk). It can speed up both public and presigned URL generation by around an order of magnitude. In benchmarks on my 2015 MacBook compared … Continue reading faster_s3_url: Optimized S3 url generation in ruby → Delete all S3 key versions with ruby AWS SDK v3 If your S3 bucket is versioned, then deleting an object from s3 will leave a previous version there, as a sort of undo history. You may have a “noncurrent expiration lifecycle policy” set which will delete the old versions after so many days, but within that window, they are there. 
What if you were deleting … Continue reading Delete all S3 key versions with ruby AWS SDK v3 → Github Actions tutorial for ruby CI on Drifting Ruby I’ve been using travis for free automated testing (“continuous integration”, CI) on my open source projects for a long time. It works pretty well. But it’s got some little annoyances here and there, including with github integration, that I don’t really expect to get fixed after its acquisition by private equity. They also seem to … Continue reading Github Actions tutorial for ruby CI on Drifting Ruby → More benchmarking optimized S3 presigned_url generation In a recent post, I explored profiling and optimizing S3 presigned_url generation in ruby to be much faster. In that post, I got down to using a Aws::Sigv4::Signer instance from the AWS SDK, but wondered if there was a bunch more optimization to be done within that black box. Julik posted a comment on that … Continue reading More benchmarking optimized S3 presigned_url generation → Delivery patterns for non-public resources hosted on S3 I work at the Science History Institute on our Digital Collections app (written in Rails), which is kind of a “digital asset management” app combined with a public catalog of our collection. We store many high-resolution TIFF images that can be 100MB+ each, as well as, currently, a handful of PDFs and audio files. We … Continue reading Delivery patterns for non-public resources hosted on S3 → Speeding up S3 URL generation in ruby It looks like the AWS SDK is very slow at generating S3 URLs, both public and presigned, and that you can generate around an order of magnitude faster in both cases. This can matter if you are generating hundreds of S3 URLs at once. My app The app I work is a “digital collections” or … Continue reading Speeding up S3 URL generation in ruby → A custom local OHMS front-end Here at the Science History Institute, we’ve written a custom OHMS viewer front-end, to integrate seamlessly with our local custom “content management system” (a Rails-based digital repository app with source available), and provide some local functionality like the ability to download certain artifacts related to the oral history. We spent quite a bit of energy … Continue reading A custom local OHMS front-end → Encrypting patron data (in Rails): why and how Special guest post by Eddie Rubeiz I’m Eddie Rubeiz. Along with the owner of this blog, Jonathan Rochkind, and our system administrator, Dan, I work on the Science History Institute’s digital collections website, where you will find, among other marvels, this picture of the inventor of Styrofoam posing with a Santa “sculpture”, which predates the … Continue reading Encrypting patron data (in Rails): why and how → Intentionally considering fixity checking In our digital collections app rewrite at Science History Institute, we took a moment to step back and  be intentional about how we approach “fixity checking” features and UI, to make sure it’s well-supporting the needs it’s meant to.  I think we do a good job of providing UI to let repository managers and technical … Continue reading Intentionally considering fixity checking → bitcoinmagazine-com-5669 ---- What Is The Bitcoin Block Size Limit? 
- Bitcoin Magazine: Bitcoin News, Articles, Charts, and Guides What Is The Bitcoin Block Size Limit? Author: Bitcoin Magazine Publish date: Aug 17, 2020 The Bitcoin block size limit is a parameter in the Bitcoin protocol that limits the size of Bitcoin blocks, and, therefore, the number of transactions that can be confirmed on the network approximately every 10 minutes. Although Bitcoin launched without this parameter, Satoshi Nakamoto added a 1 megabyte block size limit back when he was still the lead developer of the project. This translated into about three to seven transactions per second, depending on the size of transactions. Further Reading: Who Created Bitcoin? In 2017, Bitcoin’s block size limit was replaced by a block weight limit of 4 million “weight units.” This changed how data in blocks is “counted”: some data weighs more than other data. Perhaps more importantly, it also represented an effective block size limit increase: Bitcoin blocks now have a theoretical maximum size of 4 megabytes and a more realistic maximum size of 2 megabytes. The exact size depends on the types of transactions included. Why Is the Block Size Limit Controversial? The block size limit is controversial because there is disagreement over whether or not such a limit “should be” part of the Bitcoin protocol, and if it should, how big it should be. Satoshi Nakamoto never publicly specified why he added a block size limit to the Bitcoin protocol. It has been speculated that he intended it to be an anti-spam measure, to prevent an attacker from overloading the Bitcoin network with artificially large Bitcoin blocks full of bogus transactions. Some have also speculated that he intended for it to be a temporary measure, but it is unclear how temporary or under what conditions he foresaw the block size limit being increased or lifted. The code itself that enforces the block size limit certainly wasn’t temporary. Further Reading: Can Bitcoin Scale? A couple of years after Satoshi Nakamoto left the project, developers and users started to disagree on the temporality and necessity of the block size limit. As Bitcoin’s user base grew, some believed it was time to increase or lift the block size limit entirely, specifically before Bitcoin blocks would start filling up with transactions. Others came to believe that the block size limit represents a vital security parameter of the protocol and believed it should not be lifted — or at least, it should be lifted more conservatively. Yet others think that the 1 megabyte limit put in place by Satoshi Nakamoto was actually too large, and advocated for a block size limit decrease. Adding more complications, since Bitcoin is decentralized, no particular group or person is in charge of decisions like increasing or decreasing the block size. Disagreements on how such decisions should be made, by whom, or if they should be made at all, have probably led to at least as much controversy as the block size limit itself — but this aspect of the debate is outside the scope of this article. Further Reading: What Is Bitcoin? Why Shouldn’t Bitcoin Blocks Be Too Small? Note: Almost anything about Bitcoin’s block size limit and the risks of it being too big or too small is contested, but these are some of the more general arguments.
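To make the “weight units” accounting introduced earlier in this article concrete, here is a minimal sketch in Python of how block weight is computed under the 2017 rules (BIP 141): bytes outside the segregated-witness portion of a transaction count as 4 weight units each, witness bytes count as 1, and a block may contain at most 4,000,000 weight units. The byte counts below are illustrative assumptions, not figures from the article.

    MAX_BLOCK_WEIGHT = 4_000_000  # consensus limit, in weight units

    def block_weight(non_witness_bytes: int, witness_bytes: int) -> int:
        # Non-witness data is weighted 4x; segregated-witness data is weighted 1x.
        return 4 * non_witness_bytes + witness_bytes

    # A block with no witness data hits the limit at 1,000,000 bytes -- the old 1 MB cap:
    print(block_weight(1_000_000, 0) == MAX_BLOCK_WEIGHT)        # True
    # A block that is mostly witness data can approach 4 MB of raw size:
    print(block_weight(100_000, 3_600_000) <= MAX_BLOCK_WEIGHT)  # True (3.7 MB of data)
    # Realistic transaction mixes land closer to 2 MB per block, as the article notes.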
If Bitcoin blocks are too small, not many transactions can be processed by the Bitcoin network. Broadly speaking, proponents of a block size limit increase (“big blockers”) argue this can have two negative consequences. Not Enough Space? Firstly, smaller bitcoin blocks would mean that there isn’t enough space to include everyone’s transactions in these blocks, and the transaction fee “bidding war” to get transactions confirmed would price most people out of using bitcoin at all. Instead, it could lead to a future where only bank-like institutions make transactions with one another, while regular users hold accounts with these institutions. This would, in turn, open the door to fractional reserve banking, transaction censorship and more of the problems with traditional finance that many bitcoiners hoped to get away from. Deterrent to Adoption Secondly — and this is probably what many “big blockers” consider to be a more pressing concern — users would simply give up on Bitcoin altogether because blocks are too small. Perhaps users would switch to a competing cryptocurrency or they would give up on this type of technology altogether. Why Shouldn’t Bitcoin Blocks Be Too Big? Note: Almost anything about Bitcoin’s block size limit and the risks of it being too big or too small is contested, but these are some of the more general arguments. Opponents of a block size limit increase (“small blockers”) argue there are, roughly speaking, three risks if blocks are too big, each of which have several “sub-risks” as well as nuances. Increased Cost for Bitcoin Nodes The first of these risks is that bigger blocks increase the cost of operating a Bitcoin node. It increases this cost in four ways: It increases the cost of storing the blockchain, as the blockchain would grow faster. It increases bandwidth costs to download (and upload) all transactions and blocks. It increases CPU costs required to validate all transactions and blocks. The bigger the total blockchain is, the longer it takes to bootstrap a new node on the network: It has to download and validate all past transactions and blocks. If the cost to operate a Bitcoin node becomes too high, and users have to (or choose to) use lightweight clients instead, they can no longer verify that the transactions they receive are valid. They could, for example, receive a transaction from an attacker that created coins out of thin air; without knowing the entire history of the Bitcoin blockchain, there is no way to tell the difference. In that case, users would only find out that their coins are fake once they try to spend them later on. Even if users do validate that the block that includes the transaction was mined sufficiently (which is common), miners could be colluding with the attacker. Further Reading: What Is Bitcoin Mining? Perhaps an even bigger risk could arise if, over time, so few users choose to run Bitcoin nodes that the fraudulent coins are noticed too late or not at all. In that case, the Bitcoin protocol itself effectively becomes subject to changes imposed by miners. Miners could go as far as to increase the coin supply or spend coins they do not own. Only a healthy ecosystem with a significant share of users validating their own transactions prevents this. 
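As a rough illustration of the storage-cost argument above: with one block roughly every 10 minutes there are about 144 blocks per day, so the chain's maximum growth rate scales directly with the block size. A small back-of-the-envelope sketch in Python; the block sizes chosen are hypothetical examples, not figures from the article.

    BLOCKS_PER_DAY = 24 * 60 // 10  # one block roughly every 10 minutes -> about 144 per day

    def max_chain_growth_gb_per_year(block_size_mb: float) -> float:
        # Upper bound on blockchain growth if every block were completely full.
        return block_size_mb * BLOCKS_PER_DAY * 365 / 1024

    for size_mb in (1, 2, 8, 32):  # hypothetical block sizes, in megabytes
        print(f"{size_mb} MB blocks -> up to ~{max_chain_growth_gb_per_year(size_mb):.0f} GB of new chain data per year")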
In the Bitcoin white paper, Satoshi Nakamoto acknowledged the above mentioned problems and suggested that light clients could be made secure through a technical solution called “fraud proofs.” Unfortunately, however, he did not detail what these fraud proofs would look like exactly, and so far no one has been able to figure it out. (In fact, some of today’s Bitcoin developers do not believe fraud proofs are viable.) Mining Centralization The second risk of bigger blocks is that they could lead to mining centralization. Whenever a miner finds a new block, it sends this block to the rest of the network, and, in normal circumstances, bigger blocks take longer to find their way to all other miners. While the block is finding its way, however, the miner that found it can immediately start mining on top of the new block himself, giving him a head start on finding the next block. Bigger miners (or pools) find more blocks than smaller miners, thereby gaining more head starts. This means that smaller miners will be less profitable and will eventually be outcompeted, leading to a more centralized mining ecosystem. If mining becomes too centralized, some miners could end up in a position where they can 51 attack the network. That said, this is probably the most complex and nuanced argument against smaller blocks. For one, even big miners have an incentive against creating blocks that are too big: While they can benefit from a head start, too much delay can work to their detriment as a competing block may find its way through the network faster, and other miners will mine on that block instead. There are also technical solutions to speed up block relay, as well as technical solutions to limit the damage from mining centralization itself, but these solutions come with trade-offs of their own. Lower Block Subsidies Could Lead to Less Network Security The third and final risk of big blocks is that they could disincentivize users from adding fees to their transactions. As long as block space is limited, users must outbid each other to have their transactions included in blocks, and as Bitcoin’s block subsidy diminishes, this will have to become a more significant part of the block reward to support Bitcoin’s security model. Without a block size limit, this incentive is taken away. (While individual miners can still choose to only include fees with a minimum fee, other miners would still have an incentive to include transactions below that threshold — thereby diminishing the fee incentive after all.) Attentive readers will have noticed that this last argument in particular works both ways. While “big blockers” see high fees as a problem as it would make Bitcoin less attractive, “small blockers” see high fees as a positive as it would benefit Bitcoin’s security. Will Bitcoin Core Developers Ever Increase the Block Size Limit? Bitcoin Core is the predominant — though not only — Bitcoin implementation in use on the Bitcoin network today. Therefore, many “big blockers” have been looking at Bitcoin Core developers to implement an increase.  Bitcoin Core developers did indeed increase the block size limit, through the Segregated Witness (SegWit) protocol upgrade. By replacing it for a block weight limit, blocks now have a theoretical limit of 4 megabytes and a more realistic limit of 2 megabytes. Cleverly, this was a backwards-compatible soft fork protocol upgrade, which meant that users could opt into the change without splitting the network. 
However, exactly because this was a soft fork, and not a hard fork as many “big blockers” preferred, they sometimes do not “count” this increase as a block size limit increase at all. Further Reading: What Are Bitcoin Forks? Indeed, Bitcoin Core developers have not deployed a block size limit increase through a hard fork, which is a backwards-incompatible protocol upgrade. This would either require consensus from all of Bitcoin’s users or possibly split the Bitcoin network in two: a version of Bitcoin with the current block weight limit and a version of Bitcoin with the increased block size/weight limit. Users of the version of Bitcoin with the current block weight limit would probably not even consider the hard-forked version of Bitcoin to be “Bitcoin” at all; they might refer to it as “Bitcoin Core coin” or something along these lines. Perhaps more importantly, the current group of Bitcoin Core contributors seem to have no desire to dictate Bitcoin’s protocol rules, nor do they want to split the network. Therefore, they are unlikely to deploy a hard fork (for the block size limit or otherwise) without broad consensus throughout Bitcoin’s user base for such a protocol upgrade. Given the controversial nature of the block size/weight parameter, it’s unlikely that such consensus will form anytime soon, but it could happen down the road. Alternative Solutions There are some alternative solutions to increase Bitcoin’s block size limit, like Extension Blocks, as well as solutions that could achieve something similar, such as “big block” sidechains. It’s not clear that any of these solutions will see the light of day anytime soon either, however; current focus seems more directed toward “layer two” scaling solutions like the Lightning Network. Further Reading: What Is the Lightning Network? Is Bitcoin Block Size Limit Discussion Censored? The short answer is no. As for a slightly longer answer… During the heat of the block size limit debate, one of the most popular Bitcoin discussion platforms on the internet, the Bitcoin-focused subreddit r/bitcoin, imposed heavy-handed moderation. This moderation was intended to stop forum users from promoting consensus-breaking software before the greater user base had actually come to a consensus on the best way forward.  At the time, it was not obvious to everyone that using such software could lead to a split (a non-backwards-compatible hard fork) of the network, and it was often advertised as if it couldn’t. Arguing in favor of a block size limit increase and/or hard fork without directly promoting consensus-breaking software was always allowed. Whether this constituted a form of “censorship” is perhaps in the eye of the beholder, but what’s certain is that anyone who disagreed with this policy was free to start or contribute to competing Bitcoin subreddits, and this is exactly what happened. The r/btc subreddit in particular become a popular discussion platform for those who favored a block size limit increase hard fork. Furthermore, Reddit is only a relatively small part of the internet and an even smaller part of the entire world. While there are some other platforms that have been accused of similar censorship (such as the Bitcointalk forum and the Bitcoin-development mailing list), it is hard to deny that the debate took place loud and clear across social media, news sites, conferences, chat groups and far beyond. 
Anyone interested in hearing about the different arguments had every chance to inform themselves and even those who didn’t care had a hard time escaping the fallout from the debate. In the end, those who favored a block size limit increase hard fork were unable to convince enough people of their case, and it seems as if some of them have channeled their frustration about this disappointment into anger toward a particular subreddit and its moderators. (Or maybe, by writing this, Bitcoin Magazine is just part of a great cover-up conspiracy. Spooky!) What Is Bitcoin Cash? What Is Bitcoin SV? When it became clear that Bitcoin would increase its block size limit (among other things) through the SegWit soft fork protocol upgrade, some “big blockers” decided to move forward with a block size limit increase hard fork, even knowing that they would be in a minority and split off into their own network to become a new cryptocurrency. This new network and the resulting cryptocurrency is called Bitcoin Cash. Since Bitcoin Cash split off from Bitcoin, it has itself implemented several more hard fork upgrades, some of which, in turn, led to even more splits in the network and new cryptocurrencies. The most notable of these is Bitcoin SV, loosely centered around Craig Wright, one of the men who (almost certainly fraudulently) claims to have been behind the pseudonym Satoshi Nakamoto. It has an even bigger block size limit than Bitcoin Cash does. By Bitcoin Magazine bit-ly-1192 ---- Documenting the Now Slack The DocNow team and advisory board is using Slack as a collaboration space. You are welcome to join us by filling out the form below if you are interested in contributing to the conversation around the ethics of social media archiving, the DocNow application, and web archiving practices in general. Once you've been invited you can join our slack at: http://docnowteam.slack.com If you have questions or comments that are not answered in a timely manner, or that you would prefer to ask privately please get in touch with the core team at info@docnow.io and we will get back to you. Documenting the Now is dedicated to a harassment-free experience for everyone.
Our anti-harassment policy can be found at: https://github.com/DocNow/code-of-conduct blog-cbeer-info-8871 ---- blog.cbeer.info Autoscaling AWS Elastic Beanstalk worker tier based on SQS queue length LDPath in 3 examples Building a Pivotal Tracker IRC bot with Sinatra and Cinch Real-time statistics with Graphite, Statsd, and GDash Icemelt: A stand-in for integration tests against AWS Glacier blog-dataunbound-com-3127 ---- Data Unbound : Helping organizations access and share data effectively. Special focus on web APIs for data integration. Some of what I missed from the Cmd-D Automation Conference The CMD-D|Masters of Automation one-day conference in early August would have been right up my alley: It’ll be a full day of exploring the current state of automation technology on both Apple platforms, sharing ideas and concepts, and showing what’s possible—all with the goal of inspiring and furthering development of your own automation projects. Fortunately, those of us who missed it can still get a meaty summary of the meeting by listening to the podcast segment Upgrade #154: Masters of Automation – Relay FM. I've been keen on automation for a long time now and was delighted to hear the panelists express their own enthusiasm for customizing their Macs, iPhones, or iPads to make repetitive tasks much easier and less time-consuming. Noteworthy take-aways from the podcast include: Something that I hear and believe but have yet to experience in person: non-programmers can make use of automation through applications such as Automator — for macOS — and Workflow for iOS. Also mentioned often as tools that are accessible to non-geeks: Hazel and Alfred – Productivity App for Mac OS X. Automation can make the lives of computer users easier but it's not immediately obvious to many people exactly how. To make a lot of headway in automating your workflow, you need a problem that you are motivated to solve. Many people use AppleScript by borrowing from others, just like how many learn HTML and CSS from copying, pasting, and adapting source on the web. Once you get a taste for automation, you will seek out applications that are scriptable and avoid those that are not. My question is how to make it easier for developers to make their applications scriptable without incurring onerous development or maintenance costs? E-book production is an interesting use case for automation. People have built businesses around scripting Photoshop [is there really a large enough market?] OmniGroup's automation model is well worth studying and using. I hope there will be a conference next year to continue fostering this community of automation enthusiasts and professionals. 2017 09 25 Raymond Yee automation macOS Comments (0) Permalink Fine-tuning a Python wrapper for the hypothes.is web API and other #ianno17 followup In anticipation of #ianno17 Hack Day, I wrote about my plans for the event, one of which was to revisit my own Python wrapper for the nascent hypothes.is web API. Instead of spending much time on my own wrapper, I spent most of the day working with Jon Udell's wrapper for the API.
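For readers unfamiliar with what such a wrapper wraps: Hypothesis exposes a JSON search endpoint, so a minimal client is only a few lines of Python. The sketch below is an illustration, not Jon Udell's or this post's actual library; it assumes the public api.hypothes.is/api/search endpoint and shows the kind of retry-with-exponential-backoff configuration the post goes on to mention.

    import requests
    from requests.adapters import HTTPAdapter
    from urllib3.util.retry import Retry

    def make_session() -> requests.Session:
        # Retry transient failures with exponential backoff (0.5s, 1s, 2s, ...).
        retry = Retry(total=5, backoff_factor=0.5,
                      status_forcelist=[429, 500, 502, 503, 504])
        session = requests.Session()
        session.mount("https://", HTTPAdapter(max_retries=retry))
        return session

    def search_annotations(uri: str, limit: int = 20) -> list:
        # Query the public Hypothesis search API for annotations on a given page.
        resp = make_session().get("https://api.hypothes.is/api/search",
                                  params={"uri": uri, "limit": limit},
                                  timeout=10)
        resp.raise_for_status()
        return resp.json().get("rows", [])

    # Example: list annotators and text snippets for one of the posts mentioned below.
    for row in search_annotations("https://blog.dataunbound.com/2017/05/01/revisiting-hypothes-is-at-i-annotate-2017/"):
        print(row.get("user"), "-", (row.get("text") or "")[:60])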
I've been working on my own revisions of the library but haven't yet incorporated Jon's latest changes. One nice little piece of the puzzle is that I learned how to introduce retries and exponential backoff into the library, thanks to a hint from Nick Stenning and a nice answer on Stackoverflow . Other matters In addition to the Python wrapper, there are other pieces of follow-up for me. I hope to write more extensively on those matters down the road but simply note those topics for the moment. Videos from the conference I might start by watching videos from #ianno17 conference: I Annotate 2017 – YouTube. Because I didn't attend the conference per se, I might glean insight into two particular topics of interest to me (the role of page owner in annotations and the intermingling of annotations in ebooks.) An extension for embedding selectors in the URL I will study and try Treora/precise-links: Browser extension to support Web Annotation Selectors in URIs. I've noticed that the same annotation is shown in two related forms: https://hyp.is/Zj2dyi9tEeeTmxvuPjLhSw/blog.dataunbound.com/2017/05/01/revisiting-hypothes-is-at-i-annotate-2017/ https://blog.dataunbound.com/2017/05/01/revisiting-hypothes-is-at-i-annotate-2017/#annotations:Zj2dyi9tEeeTmxvuPjLhSw Does the precise-links extension let me write the selectors into the URL? 2017 05 22 Raymond Yee annotation Comments (0) Permalink Revisiting hypothes.is at I Annotate 2017 I'm looking forward to hacking on web and epub annotation at the #ianno17 Hack Day. I won't be at the I Annotate 2017 conference per se but will be curious to see what comes out of the annual conference. I continue to have high hopes for digital annotations, both on the Web and in non-web digital contexts. I have used Hypothesis on and off since Oct 2013. My experiences so far: I like the ability to highlight and comment on very granular sections of articles for comment, something the hypothes.is annotation tool makes easy to do. I appreciate being able to share annotation/highlight with others (on Twitter or Facebook), though I'm pretty sure most people who bother to click on the links might wonder "what's this" when they click on the link. A small user request: hypothes.is should allow a user to better customize the Facebook preview image for the annotation. I've enjoyed using hypothes.is for code review on top of GitHub. (Exactly how hypothes.is complements the extensive code-commenting functionality in GitHub might be worth a future blog post.) My Plans for Hack Day Python wrapper for hypothes.is This week, I plan to revisit rdhyee/hypothesisapi: A Python wrapper for the nascent hypothes.is web API to update or abandon it in favor of new developments. (For example, I should look at kshaffer/pypothesis: Python scripts for interacting with the hypothes.is API.) Epubs + annotations I want to figure out the state of art for epubs and annotations. I'm happy to see the announcement of a partnership to bring open annotation to eBooks from March 2017. I'd definitely like to figure out how to annotate epubs (e.g., Oral Literature in Africa (at unglue.it) or Moby Dick). The best approach is probably for me to wait until summer at which time we'll see the fruits of the partnership: Together, our goal is to complete a working integration of Hypothesis with both EPUB frameworks by Summer 2017. NYU plans to deploy the ReadiumJS implementation in the NYU Press Enhanced Networked Monographs site as a first use case. 
Based on lessons learned in the NYU deployment, we expect to see wider integration of annotation capabilities in eBooks as EPUB uptake continues to grow. In the meantime, I can catch up on the current state of futurepress/epub.js: Enhanced eBooks in the browser., grok Epub CFI Updates, and relearn how to parse epubs using Python (e.g., rdhyee/epub_avant_garde: an experiment to apply ideas from https://github.com/sandersk/ebook_avant_garde to arbitrary epubs). Role of page owners I plan to check in on what's going on with efforts at Hypothes.is to involve owners in page annotations: In the past months we launched a small research initiative to gather different points of view about website publishers and authors consent to annotation. Our goal was to identify different paths forward taking into account the perspectives of publishers, engineers, developers and people working on abuse and harassment issues. We have published a first summary of our discussion on our blog post about involving page owners in annotation. I was reminded of these efforts after reading that Audrey Watters had blocked annotation services like hypothes.is and genius from her domains: Un-Annotated Episode 52: Marginalia In the spirit of communal conversation, I threw in my two cents: Have there been any serious exploration of easy opt-out mechanisms for domain owners? Something like robots.txt for annotation tools? 2017 05 01 Raymond Yee annotation Comments (2) Permalink My thoughts about Fargo.io using fargo.io 2013 11 03 Raymond Yee Uncategorized Comments (0) Permalink Organizing Your Life With Python: a submission for PyCon 2015? I have penciled into my calendar a trip  to Montreal to attend PyCon 2014.   In my moments of suboptimal planning, I wrote an overly ambitious abstract for a talk or poster session I was planning to submit.  As I sat down this morning to meet the deadline for submitting a proposal for a poster session (Nov 1), I once again encountered the ominous (but for me, definitive) admonition: Avoid presenting a proposal for code that is far from completion. The program committee is very skeptical of "conference-driven development". It's true: my efforts to organize my life with Python are in the early stages. I hope that I'll be able to write something like the following for PyCon 2015. Organizing Your Life with Python David Allen's Getting Things Done (GTD) system is a popular system for personal productivity. Although GTD can be implemented without any computer technology, I have pursued two different digital implementations, including my current implementation using Evernote, the popular note-taking program. This talk explores using Python in conjunction with the Evernote API to implement GTD on top of Evernote. I have found that a major practical hinderance for using GTD is that it way too easy to commit to too many projects. I will discuss how to combine Evernote, Python, GTD with concepts from Personal Kanban to solve this problem. Addendum: Whoops…I find it embarrassing that I already quoted my abstract in a previous blog post in September that I had forgotten about. Oh well. Where's my fully functioning organization system when I need it! Tagged PyCon, Python 2013 10 30 Raymond Yee Evernote GTD Comments (0) Permalink Current Status of Data Unbound LLC in Pennsylvania I'm currently in the process of closing down Data Unbound LLC in Pennsylvania.  
I submitted the paperwork to dissolve the legal entity in April 2013 and have been amazed to learn that it may take up to a year to get the final approval done. In the meantime, as I establish a similar California legal entity, I will certainly continue to write on this blog about APIs, mashups, and open data. 2013 10 30 Raymond Yee Data Unbound LLC Comments (0) Permalink Must Get Cracking on Organizing Your Life with Python Talk and tutorial proposals for PyCon 2014 are due tomorrow (9/15). I was considering submitting a proposal until I took to heart the program committee's appropriate admonition against "conference-driven" development. I will nonetheless use the Oct 15 and Nov 1 deadlines for lightning talks and proposals respectively to judge whether to submit a refinement of the following proposal idea: Organizing Your Life with Python David Allen's Getting Things Done (GTD) system is a popular system for personal productivity. Although GTD can be implemented without any computer technology, I have pursued two different digital implementations, including my current implementation using Evernote, the popular note-taking program. This talk explores using Python in conjunction with the Evernote API to implement GTD on top of Evernote. I have found that a major practical hindrance for using GTD is that it is way too easy to commit to too many projects. I will discuss how to combine Evernote, Python, and GTD with concepts from Personal Kanban to solve this problem. 2013 09 14 Raymond Yee Getting Things Done Python Comments (0) Permalink Embedding Github gists in WordPress As I gear up to write more about programming, I have installed the Embed GitHub Gist plugin. So by writing [gist id=5625043] in the text of this post, I can embed https://gist.github.com/rdhyee/5625043 into the post to get:

    from itertools import islice

    def triangular():
        n = 1
        i = 1
        while True:
            yield n
            i += 1
            n += i

    # for i, n in enumerate(islice(triangular(), 10)): print(i + 1, n)

Tagged gist, github 2013 05 21 Raymond Yee Wordpress Comments (2) Permalink Working with Open Data I'm very excited to be teaching a new course Working with Open Data at the UC Berkeley School of Information in the Spring 2013 semester: Open data — data that is free for use, reuse, and redistribution — is an intellectual treasure-trove that has given rise to many unexpected and often fruitful applications. In this course, students will 1) learn how to access, visualize, clean, interpret, and share data, especially open data, using Python, Python-based libraries, and supplementary computational frameworks and 2) understand the theoretical underpinnings of open data and their connections to implementations in the physical and life sciences, government, social sciences, and journalism. 2012 11 23 Raymond Yee Uncategorized Comments (0) Permalink A mundane task: updating a config file to retain old settings I want to have a hand in creating an excellent personal information manager (PIM) that can be a worthy successor to Ecco Pro. So far, running EccoExt (a clever and expansive hack of Ecco Pro) has been an eminently practical solution. You can download the most recent version of this actively developed extension from the files section of the ecco_pro Yahoo! group. I would do so regularly but one of the painful problems with unpacking (using unrar) the new files is that there wasn't an updater that would retain the configuration options of the existing setup.
So a mundane but happy-making programming task of this afternoon was to write a Python script to do exactly that, making use of the builtin ConfigParser library.

    """
    compare eccoext.ini files
    My goal is to edit the new file so that any overlapping values take on the current value
    """
    current_file_path = "/private/tmp/14868/C/Program Files/ECCO/eccoext.ini"
    new_file_path = "/private/tmp/14868/C/utils/eccoext.ini"
    updated_file = "/private/tmp/14868/C/utils/updated_eccoext.ini"

    # extract the key value pairs in both files to compare the two
    # http://docs.python.org/library/configparser.html
    import ConfigParser

    def extract_values(fname):
        # generate a parsed configuration object, plus a set of (section, option) pairs
        config = ConfigParser.SafeConfigParser()
        options_set = set()
        config.read(fname)
        sections = config.sections()
        for section in sections:
            options = config.options(section)
            for option in options:
                #value = config.get(section, option)
                options_set.add((section, option))
        return (config, options_set)

    # process current file and new file
    (current_config, current_options) = extract_values(current_file_path)
    (new_config, new_options) = extract_values(new_file_path)

    # what are the overlapping options
    overlapping_options = current_options & new_options

    # figure out which of the overlapping options have values that differ,
    # and carry the current value over into the new configuration
    for (section, option) in overlapping_options:
        current_value = current_config.get(section, option)
        new_value = new_config.get(section, option)
        if current_value != new_value:
            print section, option, current_value, new_value
            new_config.set(section, option, current_value)

    # write the updated config file
    with open(updated_file, 'wb') as configfile:
        new_config.write(configfile)

2011 02 12 Raymond Yee Ecco Pro Python Comments (0) Permalink blog-dataunbound-com-6587 ---- Data Unbound Helping organizations access and share data effectively. Special focus on web APIs for data integration.
Some of what I missed from the Cmd-D Automation Conference The CMD-D|Masters of Automation one-day conference in early August would have been right up my alley: It’ll be a full day of exploring the current state of automation technology on both Apple platforms, sharing ideas and concepts, and showing what’s possible—all with the goal of inspiring and furthering development of your own automation projects. Fortunately, […] Fine-tuning a Python wrapper for the hypothes.is web API and other #ianno17 followup In anticipation of #ianno17 Hack Day, I wrote about my plans for the event, one of which was to revisit my own Python wrapper for the nascent hypothes.is web API. Instead of spending much time on my own wrapper, I spent most of the day working with Jon Udell's wrapper for the API. I've been […] Revisiting hypothes.is at I Annotate 2017 I'm looking forward to hacking on web and epub annotation at the #ianno17 Hack Day. I won't be at the I Annotate 2017 conference per se but will be curious to see what comes out of the annual conference. I continue to have high hopes for digital annotations, both on the Web and in non-web […] My thoughts about Fargo.io using fargo.io Organizing Your Life With Python: a submission for PyCon 2015? I have penciled into my calendar a trip  to Montreal to attend PyCon 2014.   In my moments of suboptimal planning, I wrote an overly ambitious abstract for a talk or poster session I was planning to submit.  As I sat down this morning to meet the deadline for submitting a proposal for a poster […] Current Status of Data Unbound LLC in Pennsylvania I'm currently in the process of closing down Data Unbound LLC in Pennsylvania.  I submitted the paperwork to dissolve the legal entity in April 2013 and have been amazed to learn that it may take up to a year to get the final approval done.  In the meantime, as I establishing a similar California legal […] Must Get Cracking on Organizing Your Life with Python Talk and tutorial proposals for PyCon 2014 are due tomorrow (9/15) .  I was considering submitting a proposal until I took the heart the appropriate admonition against "conference-driven" development of the program committee.   I will nonetheless use the Oct 15 and Nov 1 deadlines for lightning talks and proposals respectively to judge whether to […] Embedding Github gists in WordPress As I gear up I to write more about programming, I have installed the Embed GitHub Gist plugin. So by writing [gist id=5625043] in the text of this post, I can embed https://gist.github.com/rdhyee/5625043 into the post to get: Working with Open Data I'm very excited to be teaching a new course Working with Open Data at the UC Berkeley School of Information in the Spring 2013 semester: Open data — data that is free for use, reuse, and redistribution — is an intellectual treasure-trove that has given rise to many unexpected and often fruitful applications. In this […] A mundane task: updating a config file to retain old settings I want to have a hand in creating an excellent personal information manager (PIM) that can be a worthy successor to Ecco Pro. So far, running EccoExt (a clever and expansive hack of Ecco Pro) has been a eminently practical solution.   
You can download the most recent version of this actively developed extension from […]

bibwild-wordpress-com-6809 ---- Bibliographic Wilderness

Code that Lasts: Sustainable And Usable Open Source Code

A presentation I gave at the online conference Code4Lib 2021, on Monday, March 22. I have realized that the open source projects I am most proud of are a few that have existed for years now, increasing in popularity, with very little maintenance required. Including traject and bento_search. While community aspects matter for open source sustainability, the task gets so much easier when the code requires less effort to keep alive, for maintainers and utilizers. Using these projects as examples, can we as developers identify what makes code "inexpensive" to use and maintain over the long haul with little "churn", and how to do that?

Slides on Google Docs

Rough transcript (really the script I wrote for myself)

Hi, I'm Jonathan Rochkind, and this is "Code that Lasts: Sustainable and Usable Open Source Code". So, who am I? I have been developing open source library software since 2006, mainly in ruby and Rails. Over that time, I have participated in a variety of open source projects meant to be used by multiple institutions, and I've often seen us having challenges with long-term maintenance sustainability and usability of our software. This includes projects I have been instrumental in creating myself; we've all been there!

We're used to thinking of this problem in terms of needing more maintainers. But let's first think more about what the situation looks like, before we assume what causes it. In addition to features or changes people want not getting done, it also can look like, for instance: being stuck using out-of-date dependencies like old, even end-of-lifed, versions of Rails or ruby; or a reduction in software "polish" over time.

What do I mean by "polish"? Engineer Richard Schneeman writes: [quote] "When we say something is "polished" it means that it is free from sharp edges, even the small ones. I view polished software to be ones that are mostly free from frustration. They do what you expect them to and are consistent." I have noticed that software can start out very well polished, but over time lose that polish. This usually goes along with decreasing "cohesion" in software over time, a feeling that different parts of the software no longer tell the developer a consistent story together.

While there can be an element of truth in needing more maintainers in some cases – zero maintainers is obviously too few – there are also ways that increasing the number of committers or maintainers can result in diminishing returns and additional challenges. One of the theses of Fred Brooks' famous 1975 book "The Mythical Man-Month" is sometimes called "Brooks' Law": "under certain conditions, an incremental person when added to a project makes the project take more, not less time." Why? One of the main reasons Brooks discusses is the additional time taken for communication and coordination between more people – with every person you add, the number of connections between people goes up combinatorially. That may explain the phenomenon we sometimes see with so-called "Design by committee" where "too many cooks in the kitchen" can produce inconsistency or excessive complexity.
Cohesion and polish require a unified design vision — that's not incompatible with increasing numbers of maintainers, but it does make it more challenging, because it takes more time to get everyone on the same page and iterate while maintaining a unifying vision. (There's also more to be said here about the difference between just a bunch of committers committing PR's, and the maintainer's role of maintaining historical context and design vision for how all the parts fit together.)

Instead of assuming adding more committers or maintainers is the solution, can there instead be ways to reduce the amount of maintenance required? I started thinking about this when I noticed a couple projects of mine which had become more widely successful than I had any right to expect, considering how little maintenance was being put into them.

Bento_search is a toolkit for searching different external search engines in a consistent way. It's especially but not exclusively for displaying multiple search results in "bento box" style, which is what Tito Sierra from NCSU first called these little side by side search results. I wrote bento_search for use at a former job in 2012. 55% of all commits to the project were made in 2012. 95% of all commits in 2016 or earlier. (I gave it a bit of attention for a contracting project in 2016). But bento_search has never gotten a lot of maintenance, and I don't use it anymore myself. It's not in wide use, but I found it kind of amazing, when I saw people giving me credit in conference presentations for the gem (thanks!), when I didn't even know they were using it and I hadn't been paying it any attention at all! It's still used by a handful of institutions for whom it just works with little attention from maintainers. (The screenshot is from Cornell University Libraries.)

Traject is a Marc-to-Solr indexing tool written in ruby (or, more generally, it can be a general purpose extract-transform-load tool) that I wrote with Bill Dueber from the University of Michigan in 2013. We hoped it would catch on in the Blacklight community, but for the first couple years, its uptake was slow. However, since then, it has come to be pretty popular in the Blacklight and Samvera communities, and among a few other library technologists. You can see the spikes of commit activity in the graph for a 2.0 release in 2015 and a 3.0 release in 2018 – but for the most part at other times, nobody has really been spending much time on maintaining traject. Every once in a while a community member submits a minor Pull Request, and it's usually me who reviews it. Bill and I remain the only maintainers. And yet traject just keeps plugging along, picking up adoption and working well for adopters.

So, this made me start thinking, based on what I've seen in my career, what are some of the things that might make open source projects both low-maintenance and successful in their adoption and ease-of-use for developers? One thing both of these projects did was take backwards compatibility very seriously. The first step there is following "semantic versioning", a set of rules whose main point is that releases can't include backwards incompatible changes unless they are a new major version, like going from 1.x to 2.0. This is important, but it alone is not enough to minimize backwards incompatible changes that add maintenance burden to the ecosystem.
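As a hedged aside: the practical payoff of semantic versioning for a consuming app is that it can lean on pessimistic version constraints and take upgrades with some confidence. A minimal sketch of a hypothetical consuming app's Gemfile (the constraint numbers here are illustrative):

# Gemfile of a hypothetical consuming app
source "https://rubygems.org"

gem "traject", "~> 3.0"       # any 3.x release is expected to stay backwards compatible
gem "bento_search", "~> 1.0"  # likewise for 1.x releases

Those "~>" constraints only stay safe if the gem really keeps the semantic versioning promise being described here.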
If the real goal is preventing the pain of backwards incompatibility, we also need to limit the number of major version releases, and limit the number and scope of backwards breaking changes in each major release! The Bento_search gem has only had one major release, it's never had a 2.0 release, and it's still backwards compatible to its initial release. Traject is on a 3.X release after 8 years, but the major releases of traject have had extremely few backwards breaking changes; most people could upgrade through major versions changing very little, or most often nothing, in their projects.

So OK, sure, everyone wants to minimize backwards incompatibility, but that's easy to say, how do you DO it? Well, it helps to have less code overall, that changes less often overall – ok, again, great, but how do you do THAT?

Parsimony is a word in general English that means "The quality of economy or frugality in the use of resources." In terms of software architecture, it means having as few moving parts as possible inside your code: fewer classes, types, components, entities, whatever. Or most fundamentally, I like to think of it in terms of minimizing the concepts in the mental model a programmer needs to grasp how the code works and what parts do what. The goal of architecture design is: what is the smallest possible architecture we can create to make [quote] "simple things simple and complex things possible", as computer scientist Alan Kay described the goal of software design.

We can see this in bento_search, which has very few internal architectural concepts. The main thing bento_search does is provide a standard API for querying a search engine and representing results of a search. These are consistent across different search engines, with a common metadata vocabulary for what results look like. This makes search engines interchangeable to calling code. And then it includes half a dozen or so search engine implementations for services I needed or wanted to evaluate when I wrote it. This search engine API at the ruby level can be used all by itself even without the next part, the actual "bento style", which is built-in support for displaying search engine results in boxes on a page of your choice in a Rails app, while writing very little boilerplate code.

Traject has an architecture which basically has just three parts at the top. There is a reader which sends objects into the pipeline. There are some indexing rules which are transformation steps from the source object to build an output Hash object. And then a writer which translates the Hash object to write to some store, such as Solr. The reader, transformation steps, and writer are all independent and uncaring about each other, and can be mixed and matched. That's MOST of traject right there. It seems simple and obvious once you have it, but it can take a lot of work to end up with what's simple and obvious in retrospect! When designing code I'm often reminded of the apocryphal quote: "I would have written a shorter letter, but I did not have the time."

And, to be fair, there's a lot of complexity within that "indexing rules" step in traject, but its design was approached the same way. We have use cases about supporting configuration settings in a file or on the command line, or about allowing re-usable custom transformation logic – what's the simplest possible architecture we can come up with to support those cases? OK, again, that sounds nice, but how do you do it?
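(To make that reader / indexing-rules / writer split concrete, here is a hedged sketch of what a developer-user's traject configuration file roughly looks like; the field names and MARC tags are purely illustrative, not from the talk:)

# my_index_config.rb -- run with something like: traject -c my_index_config.rb records.mrc
settings do
  provide "writer_class_name", "Traject::SolrJsonWriter"
  provide "solr.url", "http://localhost:8983/solr/my_collection"
end

# indexing rules: transformation steps from the source MARC record to the output Hash
to_field "id",      extract_marc("001", first: true)
to_field "title_t", extract_marc("245ab", trim_punctuation: true)

The reader is picked via settings, the to_field lines are the transformation rules, and the writer only ever sees the output Hash, which is what keeps the three parts swappable.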
I don't have a paint by numbers, but I can say that for both these projects I took some time – a few weeks even – at the beginning to work out these architectures: lots of diagramming, some prototyping I was prepared to throw out, and in some cases "documentation-driven design" where I wrote some docs for code I hadn't written yet. For traject it was invaluable to have Bill Dueber at the University of Michigan also interested in spending some design time up front, bouncing ideas back and forth with each other – to actually intentionally go through an architectural design phase before the implementation.

Figuring out a good parsimonious architecture takes domain knowledge: what things your "industry" – other potential institutions – are going to want to do in this area, and specifically what developers are going to want to do with your tool. We're maybe used to thinking of "use cases" in terms of end-users, but it can be useful at the architectural design stage to formalize this in terms of developer use cases. What is a developer going to want to do, and how can I come up with a small number of software pieces she can use to assemble together to do those things? When we said "make simple things simple and complex things possible", we can say domain analysis and use cases are about identifying which things we're going to put in either, or neither, of those categories. The "simple thing" for bento_search, for instance, is just "do a simple keyword search in a search engine, and display results, without having the calling code need to know anything about the specifics of that search engine."

Another way to get a head-start on solid domain knowledge is to start with another tool you have experience with, that you want to create a replacement for. Before Traject, I and other users used a tool written in Java called SolrMarc — I knew how we had used it, and where we had had roadblocks or things that we found harder or more complicated than we'd like, so I knew my goals were to make those things simpler.

We're used to hearing arguments about avoiding rewrites, but like most things in software engineering, there can be pitfalls at either extreme. I was amused to notice that Fred Brooks, in the previously mentioned Mythical Man-Month, makes some arguments in both directions. Brooks famously warns about a "second-system effect", the [quote] "tendency of small, elegant, and successful systems to be succeeded by over-engineered, bloated systems, due to inflated expectations and overconfidence" – one reason to be cautious of a rewrite. But Brooks in the very same book ALSO writes [quote] "In most projects, the first system built is barely usable….Hence plan to throw one away; you will, anyhow." It's up to us to figure out which case we're in. I personally think an application is more likely to be bitten by the "second-system effect" danger of a rewrite, while a shared re-usable library is more likely to benefit from a rewrite (in part because a reusable library is harder to change in place without disruption!).

We could sum up a lot of different principles as variations of "Keep it small". Both traject and bento_search are tools that developers can use to build something. Bento_search just puts search results in a box on a page; the developer is responsible for the page and an overall app. Yes, this means that you have to be a ruby developer to use it. Does this limit its audience?
While we might aspire to make tools that even not-really-developers can just use out of the box, my experience has been that our open source attempts at shrinkwrapped "solutions" often end up still needing development expertise to successfully deploy. Keeping our tools simple and small and not trying to supply a complete app can actually leave more time for these developers to focus on meeting local needs, instead of fighting with a complicated framework that doesn't do quite what they need.

It also means we can limit interactions with any external dependencies. Traject was developed for use with a Blacklight project, but traject code does not refer to Blacklight or even Rails at all, which means new releases of Blacklight or Rails can't possibly break traject. Bento_search, by doing one thing and not caring about the details of its host application, has kept working from Rails 3.2 all the way up to current Rails 6.1 with pretty much no changes needed except to the test suite setup. Sometimes when people try to have lots of small tools working together, it can turn into a nightmare where you get a pile of cascading software breakages every time one piece changes. Keeping assumptions and couplings down is what lets us avoid this maintenance nightmare.

And another way of keeping it small is: don't be afraid to say "no" to features when you can't figure out how to fit them in without serious harm to the parsimony of your architecture. Your domain knowledge is what lets you take an educated guess as to which features are core to your audience and need to be accommodated, and which are edge cases and can be fulfilled by extension points, or sometimes not at all.

By extension points we mean we prefer opportunities for developer-users to write their own code which works with your tools, rather than trying to build less commonly needed features in as configurable features. As an example, traject does include some built-in logic, but one of its extension point use cases is making sure it's simple to add whatever transformation logic a developer-user wants, and have it look just as "built-in" as what came with traject. And since traject makes it easy to write your own reader or writer, its built-in readers and writers don't need to include every possible feature – we plan for developers writing their own if they need something else. Looking at bento_search, it makes it easy to write your own search engine adapter — one that will be usable interchangeably with the built-in ones. Also, bento_search provides a standard way to add custom search arguments specific to a particular adapter – these won't be directly interchangeable with other adapters, but they are provided for in the architecture, and won't break in future bento_search releases – it's another form of extension point.

These extension points are the second half of "simple things simple, complex things possible" – the complex things possible. Planning for them is part of understanding your developer use-cases, and designing an architecture that can easily handle them. Ideally, it takes no extra layers of abstraction to handle them; you are using the exact architectural join points the out-of-the-box code is using, just supplying custom components.

So here's an example of how these things worked out in practice with traject, pretty well I think. Stanford ended up writing a package of extensions to traject called TrajectPlus, to take care of some features they needed that traject didn't provide.
Commit history suggests it was written in 2017, which was Traject 2.0 days. I can't recall, but I'd guess they approached me with change requests to traject at that time and I put them off because I couldn't figure out how to fit them in parsimoniously, or didn't have time to figure it out. But the fact that they were *able* to extend traject in this way I consider a validation of traject's architecture: that they could make it do what they needed, without much coordination with me, and use it in many projects (I think beyond just Stanford).

Much of the 3.0 release of traject was "back-porting" some features that TrajectPlus had implemented, including out-of-the-box support for XML sources. But I didn't always do them with the same implementation or API as TrajectPlus – this is another example of being able to use a second go at it to figure out how to do something even more parsimoniously, sometimes figuring out small changes to traject's architecture to support flexibility in the right dimensions. When Traject 3.0 came out, the TrajectPlus users didn't necessarily want to retrofit all their code to the new traject way of doing it. But TrajectPlus could still be used with traject 3.0 with few or possibly no changes, doing things the old way; they weren't forced to upgrade to the new way. This is a huge win for traject's backwards compat – everyone was able to do what they needed to do, even taking separate paths, with relatively minimized maintenance work.

As I think about these things philosophically, one of my takeaways is that software engineering is still a craft – and software design is a serious thing to be studied and engaged in. Especially for shared libraries rather than local apps, it's not always to be dismissed as so-called "bike-shedding". It's worth it to take time to think about design, self-reflectively and with your peers, instead of just rushing to put out fires or deliver features; it will reduce maintenance costs and increase value over the long term. And I want to just briefly plug "kithe", a project of mine which tries to be guided by these design goals to create a small focused toolkit for building Digital Collections applications in Rails.

I could easily talk about all of this for another twenty minutes, but that's our time! I'm always happy to talk more; find me on slack or IRC or email. This last slide has some sources mentioned in the talk. Thanks for your time!

jrochkind General Leave a comment March 23, 2021

Product management

In my career working in the academic sector, I have realized that one thing that is often missing from in-house software development is "product management." But what does that mean exactly? You don't know it's missing if you don't even realize it's a thing, and people can use different terms to mean different roles/responsibilities. Basically: deciding what the software should do. This is not about colors on screen or margins (what our stakeholders often enjoy micro-managing) — I'd consider those still the how of doing it, rather than the what to do. The what is often at a much higher level, about what features or components to develop at all. When done right, it is going to be based on both knowledge of the end-user's needs and preferences (user research), but also knowledge of internal stakeholders' desires and preferences (overall organizational strategy, but also just practically what is going to make the right people happy to keep us resourced).
Also knowledge of the local capacity: what pieces do we need to put in place to get these things developed. When done seriously, it will necessarily involve prioritization — there are many things we could possibly do, some subset of them we very well may do eventually, but which ones should we do now?

My experience tells me it is a very big mistake to try to have a developer doing this kind of product management. Not because a developer can't have the right skillset to do it. But because having the same person leading development and product management is a mistake. The developer is too close to the development lens, and there's just a clarity that comes when these roles are separate. My experience also tells me that it's a mistake to have a committee doing these things, much as that is popular in the academic sector. Because, well, just of course it is.

But okay, this is all still pretty abstract. Things might become more clear if we get more specific about the actual tasks and work of this kind of product management role. I found Damilola Ajiboye's blog post on "Product Manager vs Product Marketing Manager vs Product Owner" very clear and helpful here. While it is written to distinguish between three different product-management-related roles, Ajiboye also acknowledges that in a smaller organization "a product manager is often tasked with the duty of these 3 roles." Regardless of whether the responsibilities are done by one, two, or three people, Ajiboye's post serves as a concise listing of the work to be done in managing a product — deciding the what of the product, in an ongoing iterative and collaborative manner, so that developers and designers can get to the how and to implementation. I recommend reading the whole article, and I'll excerpt much of it here, slightly rearranged.

The Product Manager

These individuals are often referred to as mini CEOs of a product. They conduct customer surveys to figure out the customer's pain and build solutions to address it. The PM also prioritizes what features are to be built next and prepares and manages a cohesive and digital product roadmap and strategy. The Product Manager will interface with the users through user interviews/feedback surveys or other means to hear directly from the users. They will come up with hypotheses alongside the team and validate them through prototyping and user testing. They will then create a strategy on the feature and align the team and stakeholders around it. The PM who is also the chief custodian of the entire product roadmap will, therefore, be tasked with the duty of prioritization. Before going ahead to carry out research and strategy, they will have to convince the stakeholders if it is a good choice to build the feature in context at that particular time or wait a bit longer based on the content of the roadmap.

The Product Marketing Manager

The PMM communicates vital product value — the "why", "what" and "when" of a product to intending buyers. He manages the go-to-market strategy/roadmap and also oversees the pricing model of the product. The primary goal of a PMM is to create demand for the products through effective messaging and marketing programs so that the product has a shorter sales cycle and higher revenue. The product marketing manager is tasked with market feasibility and discovering if the features being built align with the company's sales and revenue plan for the period.
They also make research on how sought-after the feature is being anticipated and how it will impact the budget. They communicate the values of the feature; the why, what, and when to potential buyers — In this case users in countries with poor internet connection.

[While expressed in terms of a for-profit enterprise selling something, I think it's not hard to translate this to a non-profit or academic environment. You still have an audience whose uptake you need to be successful, whether internal or external. — jrochkind]

The Product Owner

A product owner (PO) maximizes the value of a product through the creation and management of the product backlog, creation of user stories for the development team. The product owner is the customer's representative to the development team. He addresses customer's pain points by managing and prioritizing a visible product backlog. The PO is the first point of call when the development team needs clarity about interpreting a product feature to be implemented. The product owner will first have to prioritize the backlog to see if there are no important tasks to be executed and if this new feature is worth leaving whatever is being built currently. They will also consider the development effort required to build the feature i.e the time, tools, and skill set that will be required. They will be the one to tell if the expertise of the current developers is enough or if more engineers or designers are needed to be able to deliver at the scheduled time. The product owner is also armed with the task of interpreting the product/feature requirements for the development team. They serve as the interface between the stakeholders and the development team.

When you have someone(s) doing these roles well, it ensures that the development team is actually spending time on things that meet user and business needs. I have found that it makes things so much less stressful and more rewarding for everyone involved. When you have nobody doing these roles, or someone doing it in a cursory or un-intentional way not recognized as part of their core job responsibilities, or have a lead developer trying to do it on top of development, I find it leads to feelings of: spinning wheels, everything-is-an-emergency, lack of appreciation, miscommunication and lack of shared understanding between stakeholders and developers, general burnout and dissatisfaction — and at the root, a product that is not meeting user or business needs well, leading to these inter-personal and personal problems.

jrochkind General Leave a comment February 3, 2021

Rails auto-scaling on Heroku

We are investigating moving our medium-small-ish Rails app to heroku. We looked at both the Rails Autoscale add-on available on the heroku marketplace, and the hirefire.io service, which is not listed on the heroku marketplace and I almost didn't realize it existed. I guess hirefire.io doesn't have any kind of a partnership with heroku, but still uses the heroku API to provide an autoscale service. hirefire.io ended up looking more fully-featured and lower priced than Rails Autoscale; so the main purpose of this post is just trying to increase visibility of hirefire.io and therefore competition in the field, which benefits us consumers.

Background: Interest in auto-scaling Rails background jobs

At first I didn't realize there was such a thing as "auto-scaling" on heroku, but once I did, I realized it could indeed save us lots of money.
I am more interested in scaling Rails background workers than web workers, though — our background workers are busiest when we are doing "ingests" into our digital collections/digital asset management system, so the work is highly variable. Auto-scaling up to more when there is ingest work piling up can give us really nice ingest throughput while keeping costs low. On the other hand, our web traffic is fairly low and probably isn't going to go up by an order of magnitude (non-profit cultural institution here). And after discovering that a "standard" dyno is just too slow, we will likely be running a performance-m or performance-l anyway — which likely can handle all anticipated traffic on its own. If we have an auto-scaling solution, we might configure it for web dynos, but we are especially interested in good features for background scaling.

There is a heroku built-in autoscale feature, but it only works for performance dynos, and won't do anything for Rails background job dynos, so that was right out. The Rails Autoscale add-on on the heroku marketplace could work for Rails bg jobs; and then we found hirefire.io.

Pricing: Pretty different

hirefire

As of now, January 2021, hirefire.io has pretty simple and affordable pricing: $15/month/heroku application, auto-scaling as many dynos and process types as you like. hirefire.io by default only checks your app's metrics once per minute to decide if a scaling event should occur. If you want more frequent than that (up to once every 15 seconds), you have to pay an additional $10/month, for $25/month/heroku application. Even though it is not a heroku add-on, hirefire does advertise that they bill pro-rated to the second, just like heroku and heroku add-ons.

Rails autoscale

Rails autoscale has a more tiered approach to pricing that is based on the number and type of dynos you are scaling. Starting at $9/month for 1-3 standard dynos, the next tier up is $39 for up to 9 standard dynos, all the way up to $279 (!) for 1 to 99 dynos. If you have performance dynos involved, it's from $39/month for 1-3 performance dynos, up to $599/month for up to 99 performance dynos. For our anticipated uses… if we only scale bg dynos, I might want to scale from (low) 1 or 2 to (high) 5 or 6 standard dynos, so we'd be at $39/month. Our web dynos are likely to be performance and I wouldn't want/need to scale more than probably 2, but that puts us into the performance dyno tier, so we're looking at $99/month. This is of course significantly more expensive than hirefire.io's flat rate.

Metric Resolution

Since Hirefire had an additional charge for finer than 1-minute resolution on checks for autoscaling, we'll discuss resolution here in this section too. Rails Autoscale has the same resolution for all tiers, and I think it's generally 10 seconds, so approximately the same as hirefire if you pay the extra $10 for increased resolution.

Configuration

Let's look at configuration screens to get a sense of feature-sets.

Rails Autoscale

web dynos

To configure web dynos, here's what you get, with default values: The metric Rails Autoscale uses for scaling web dynos is time in the heroku routing queue, which seems right to me — when things are spending longer in the heroku routing queue before getting to a dyno, it means scale up.

worker dynos

For scaling worker dynos, Rails Autoscale can scale a dyno type named "worker" — it can understand the ruby queuing libraries Sidekiq, Resque, Delayed Job, or Que. I'm not certain if there are options for writing custom adapter code for other backends.
Here's what the configuration options are — sorry, these aren't the defaults, I've already customized them and lost track of what the defaults are. You can see that worker dynos are scaled based on the metric "number of jobs queued", and you can tell it to only pay attention to certain queues if you want.

Hirefire

Hirefire has far more options for customization than Rails Autoscale, which can make it a bit overwhelming, but also potentially more powerful.

web dynos

You can actually configure as many Heroku process types as you have for autoscale, not just ones named "web" and "worker". And for each, you have your choice of several metrics to be used as scaling triggers. For web, I think Queue Time (percentile, average) matches what Rails Autoscale does, configured to percentile, 95, and is probably the best to use unless you have a reason to use another. ("Rails Autoscale tracks the 95th percentile queue time, which for most applications will hover well below the default threshold of 100ms.") Here's what configuration Hirefire makes available if you are scaling on "queue time" like Rails Autoscale; configuration may vary for other metrics. I think if you fill in the right numbers, you can configure it to work equivalently to Rails Autoscale.

worker dynos

If you have more than one heroku process type for workers — say, working on different queues — Hirefire can scale them independently, with entirely separate configuration. This is pretty handy, and I don't think Rails Autoscale offers this. (Update: I may be wrong, Rails Autoscale says they do support this, so check on it yourself if it matters to you.)

For worker dynos, you could choose to scale based on actual "dyno load", but I think this is probably mostly for types of processes where there isn't the ability to look at "number of jobs". A "number of jobs in queue" metric like Rails Autoscale uses makes a lot more sense to me as an effective metric for scaling queue-based bg workers. Hirefire's metric is slightly different from Rails Autoscale's "jobs in queue". For recognized ruby queue systems (a larger list than Rails Autoscale's; and you can write your own custom adapter for whatever you like), it actually measures jobs in queue plus workers currently busy. So queued+in-progress, rather than Rails Autoscale's just queued. I actually have a bit of trouble wrapping my head around the implications of this, but basically, it means that Hirefire's "jobs in queue" metric strategy is intended to try to scale all the way to emptying your queue, or reaching your max scale limit, whichever comes first. I think this may make sense and work out at least as well or perhaps better than Rails Autoscale's approach?

Here's what configuration Hirefire makes available for worker dynos scaling on the "job queue" metric. Since the metric isn't the same as Rails Autoscale's, we can't configure this to work identically. But there are a whole bunch of configuration options, some similar to Rails Autoscale's. The most important thing here is that "Ratio" configuration. It may not be obvious, but with the way the hirefire metric works, you are basically meant to configure this to equal the number of workers/threads you have on each dyno. I have it configured to 3 because my heroku worker processes use resque, with resque_pool, configured to run 3 resque workers on each dyno. If you use sidekiq, set ratio to your configured concurrency — or if you are running more than one sidekiq process, processes*concurrency.
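To make that arithmetic concrete, here is a hedged sketch in plain ruby (the numbers are from my setup or hypothetical; this is not Hirefire's API, just the calculation it expects you to do):

# resque via resque-pool, 3 resque workers per dyno (my setup):
resque_workers_per_dyno = 3
ratio = resque_workers_per_dyno                            # => 3

# sidekiq instead (hypothetical numbers):
sidekiq_processes_per_dyno = 1
sidekiq_concurrency        = 10
ratio = sidekiq_processes_per_dyno * sidekiq_concurrency   # => 10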
Basically, how many jobs your dyno can be concurrently working on is what you should normally set for 'ratio'.

Hirefire not a heroku plugin

Hirefire isn't actually a heroku plugin. In addition to that meaning separate invoicing, there can be some other inconveniences. Since hirefire can only interact with the heroku API, for some metrics (including the "queue time" metric that is probably optimal for web dyno scaling) you have to configure your app to log regular statistics to heroku's "Logplex" system. This can add a lot of noise to your log, and for heroku logging add-ons that are tiered based on number of log lines or bytes, it can push you up to higher pricing tiers. If you use Papertrail, I think you should be able to use its log filtering feature to solve this, keeping that noise out of your logs and avoiding impact on log data transfer limits. However, if you ever have cause to look at heroku's raw logs, that noise will still be there.

Support and Docs

I asked a couple questions of both Hirefire and Rails Autoscale as part of my evaluation, and got back well-informed and easy-to-understand answers quickly from both. Support for both seems to be great. I would say the documentation is decent-but-not-exhaustive for both products. Hirefire may have slightly more complete documentation.

Other Features?

There are other things you might want to compare, various kinds of observability (bar chart or graph of dynos or observed metrics) and notification. I don't have time to get into the details (and didn't actually spend much time exploring them to evaluate), but they seem to offer roughly similar features.

Conclusion

Rails Autoscale is quite a bit more expensive than hirefire.io's flat rate, once you get past Rails Autoscale's most basic tier (scaling no more than 3 standard dynos). It's true that autoscaling saves you money over not autoscaling, so even an expensive price could be considered a 'cut' of that, and possibly for many ecommerce sites even $99 a month might be a drop in the bucket (!)…. but this price difference is so significant with hirefire (which has a flat rate regardless of dynos) that it seems to me it would take a lot of additional features/value to justify. And it's not clear that Rails Autoscale has any feature advantage. In general, hirefire.io seems to have more features and flexibility.

Until 2021, hirefire.io could only analyze metrics with 1-minute resolution, so perhaps that was a "killer feature"? Honestly I wonder if this price difference is sustained by Rails Autoscale only because most customers aren't aware of hirefire.io, it not being listed on the heroku marketplace? Single-invoice billing is handy, but probably not worth $80+ a month. I guess hirefire's logplex noise is a bit inconvenient? Or is there something else I'm missing? Pricing competition is good for the consumer. And are there any other heroku autoscale solutions, that can handle Rails bg job dynos, that I still don't know about?

Update, a day after writing: djcp on a reddit thread writes:

I used to be a principal engineer for the heroku add-ons program. One issue with hirefire is they request account level oauth tokens that essentially give them ability to do anything with your apps, where Rails Autoscaling worked with us to create a partnership and integrate with our "official" add-on APIs that limits security concerns and are scoped to the application that's being scaled.
Part of the reason for hirefire working the way it does is historical, but we’ve supported the endpoints they need to scale for “official” partners for years now. A lot of heroku customers use hirefire so please don’t think I’m spreading FUD, but you should be aware you’re giving a third party very broad rights to do things to your apps. They probably won’t, of course, but what if there’s a compromise? “Official” add-on providers are given limited scoped tokens to (mostly) only the actions / endpoints they need, minimizing blast radius if they do get compromised. You can read some more discussion at that thread. jrochkind General 2 Comments January 27, 2021January 30, 2021 Managed Solr SaaS Options I was recently looking for managed Solr “software-as-a-service” (SaaS) options, and had trouble figuring out what was out there. So I figured I’d share what I learned. Even though my knowledge here is far from exhaustive, and I have only looked seriously at one of the ones I found. The only managed Solr options I found were: WebSolr; SearchStax; and OpenSolr. Of these, i think WebSolr and SearchStax are more well-known, I couldn’t find anyone with experience with OpenSolr, which perhaps is newer. Of them all, SearchStax is the only one I actually took for a test drive, so will have the most to say about. Why we were looking We run a fairly small-scale app, whose infrastructure is currently 4 self-managed AWS EC2 instances, running respectively: 1) A rails web app 2) Bg workers for the rails web app 3) Postgres, and 4) Solr. Oh yeah, there’s also a redis running one of those servers, on #3 with pg or #4 with solr, I forget. Currently we manage this all ourselves, right on the EC2. But we’re looking to move as much as we can into “managed” servers. Perhaps we’ll move to Heroku. Perhaps we’ll use hatchbox. Or if we do stay on AWS resources we manage directly, we’d look at things like using an AWS RDS Postgres instead of installing it on an EC2 ourselves, an AWS ElastiCache for Redis, maybe look into Elastic Beanstalk, etc. But no matter what we do, we need a Solr, and we’d like to get it managed. Hatchbox has no special Solr support, AWS doesn’t have a Solr service, Heroku does have a solr add-on but you can also use any Solr with it and we’ll get to that later. Our current Solr use is pretty small scale. We don’t run “SolrCloud mode“, just legacy ordinary Solr. We only have around 10,000 documents in there (tiny for Solr), our index size is only 70MB. Our traffic is pretty low — when I tried to figure out how low, it doesn’t seem we have sufficient logging turned on to answer that specifically but using proxy metrics to guess I’d say 20K-40K requests a day, query as well as add. This is a pretty small Solr installation, although it is used centrally for the primary functions of the (fairly low-traffic) app. It currently runs on an EC2 t3a.small, which is a “burstable” EC2 type with only 2G of RAM. It does have two vCPUs (that is one core with ‘hyperthreading’). The t3a.small EC2 instance only costs $14/month on-demand price! We know we’ll be paying more for managed Solr, but we want to do get out of the business of managing servers — we no longer really have the staff for it. WebSolr (didn’t actually try out) WebSolr is the only managed Solr currently listed as a Heroku add-on. It is also available as a managed Solr independent of heroku. The pricing in the heroku plans vs the independent plans seems about the same. 
As a heroku add-on there is a $20 “staging” plan that doesn’t exist in the independent plans. (Unlike some other heroku add-ons, no time-limited free plan is available for WebSolr). But once we go up from there, the plans seem to line up. Starting at: $59/month for: 1 million document limit 40K requests/day 1 index 954MB storage 5 concurrent requests limit (this limit is not mentioned on the independent pricing page?) Next level up is $189/month for: 5 million document limit 150K requests/day 4.6GB storage 10 concurrent request limit (again concurrent request limits aren’t mentioned on independent pricing page) As you can see, WebSolr has their plans metered by usage. $59/month is around the price range we were hoping for (we’ll need two, one for staging one for production). Our small solr is well under 1 million documents and ~1GB storage, and we do only use one index at present. However, the 40K requests/day limit I’m not sure about, even if we fit under it, we might be pushing up against it. And the “concurrent request” limit simply isn’t one I’m even used to thinking about. On a self-managed Solr it hasn’t really come up. What does “concurrent” mean exactly in this case, how is it measured? With 10 puma web workers and sometimes a possibly multi-threaded batch index going on, could we exceed a limit of 4? Seems plausible. What happens when they are exceeded? Your Solr request results in an HTTP 429 error! Do I need to now write the app to rescue those gracefully, or use connection pooling to try to avoid them, or something? Having to rewrite the way our app functions for a particular managed solr is the last thing we want to do. (Although it’s not entirely clear if those connection limits exist on the non-heroku-plugin plans, I suspect they do?). And in general, I’m not thrilled with the way the pricing works here, and the price points. I am positive for a lot of (eg) heroku customers an additional $189*2=$378/month is peanuts not even worth accounting for, but for us, a small non-profit whose app’s traffic does not scale with revenue, that starts to be real money. It is not clear to me if WebSolr installations (at “standard” plans) are set up in “SolrCloud mode” or not; I’m not sure what API’s exist for uploading your custom schema.xml (which we’d need to do), or if they expect you to do this only manually through a web UI (that would not be good); I’m not sure if you can upload custom solrconfig.xml settings (this may be running on a shared solr instance with standard solrconfig.xml?). Basically, all of this made WebSolr not the first one we looked at. Does it matter if we’re on heroku using a managed Solr that’s not a Heroku plugin? I don’t think so. In some cases, you can get a better price from a Heroku plug-in than you could get from that same vendor not on heroku or other competitors. But that doesn’t seem to be the case here, and other that that does it matter? Well, all heroku plug-ins are required to bill you by-the-minute, which is nice but not really crucial, other forms of billing could also be okay at the right price. With a heroku add-on, your billing is combined into one heroku invoice, no need to give a credit card to anyone else, and it can be tracked using heroku tools. Which is certainly convenient and a plus, but not essential if the best tool for the job is not a heroku add-on. And as a heroku add-on, WebSolr provides a WEBSOLR_URL heroku config/env variable automatically to code running on heroku. 
OK, that's kind of nice, but it's not a big deal to set a SOLR_URL heroku config manually referencing the appropriate address. I suppose as a heroku add-on, WebSolr also takes care of securing and authenticating connections between the heroku dynos and the solr, so we need to make sure we have a reasonable way to do this from any alternative.

SearchStax (did take it for a spin)

SearchStax's pricing tiers are not based on metering usage. There are no limits based on requests/day or concurrent connections. SearchStax runs on dedicated-to-you individual Solr instances (I would guess running on dedicated-to-you individual (eg) EC2, but I'm not sure). Instead, the pricing is based on the size of the host running Solr. You can choose to run on instances deployed to AWS, Google Cloud, or Azure. We'll be sticking to AWS (the others, I think, have a slight price premium).

While SearchStax gives you a pricing page that looks like the "new-way-of-doing-things" transparent pricing, in fact there isn't really enough info on public pages to see all the price points and understand what you're getting; there is still a kind of "talk to a salesperson who has a price sheet" thing going on. What I think I have figured out from talking to a salesperson and support is that the "Silver" plans ("Starting at $19 a month", although we'll say more about that in a bit) are basically: we give you a Solr, we don't provide any technical support for Solr. While the "Gold" plans "from $549/month" are actually about paying for Solr consultants to set up and tune your schema/index etc. That is not something we need, and $549+/month is way more than the price range we are looking for.

While the SearchStax pricing/plan pages kind of imply the "Silver" plan is not suitable for production, in fact there is no real reason not to use it for production I think, and the salesperson I talked to confirmed that — just reaffirming that you were on your own managing the Solr configuration/setup. That's fine, that's what we want; we just don't want to manage the OS or set up the Solr or upgrade it etc. The Silver plans have no SLA, but as far as I can tell their uptime is just fine. The Silver plans only guarantee a 72-hour support response time — but for the couple support tickets I filed asking questions while under a free 14-day trial (oh yeah, that's available), I got prompt same-day responses, and knowledgeable responses that answered my questions.

So a "silver" plan is what we are interested in, but the pricing is not actually transparent. $19/month is for the smallest instance available, and IF you prepay/contract for a year. They call that small instance an NDN1 and it has 1GB of RAM and 8GB of storage. If you pay-as-you-go instead of contracting for a year, that already jumps to $40/month. (That price is available on the trial page). When you are paying-as-you-go, you are actually billed per-day, which might not be as nice as heroku's per-minute, but it's pretty okay, and useful if you need to bring up a temporary solr instance as part of a migration/upgrade or something like that. The next step up is an "NDN2" which has 2G of RAM and 16GB of storage, and has a ~$80/month pay-as-you-go price — you can find that price if you sign up for a free trial. The discounted price for an annual contract is a discount similar to the NDN1's 50%: $40/month — that price I got only from a salesperson, I don't know if it's always stable. It only occurs to me now that they don't tell you how many CPUs are available.
I’m not sure if I can fit our Solr in the 1G NDN1, but I am sure I can fit it in the 2G NDN2 with some headroom, so I didn’t look at plans above that — but they are available, still under “silver”, with prices going up accordingly. All SearchStax solr instances run in “SolrCloud” mode — these NDN1 and NDN2 ones we’re looking at just run one node with one zookeeper, but still in cloud mode. There are also “silver” plans available with more than one node in a “high availability” configuration, but the prices start going up steeply, and we weren’t really interested in that. Because it’s SolrCloud mode though, you can use the standard Solr API for uploading your configuration. It’s just Solr! So no arbitrary usage limits, no features disabled. The SearchStax web console seems competently implemented; it let’s you create and delete individual Solr “deployments”, manage accounts to login to console (on “silver” plan you only get two, or can pay $10/month/account for more, nah), and set up auth for a solr deployment. They support IP-based authentication or HTTP Basic Auth to the Solr (no limit to how many Solr Basic Auth accounts you can create). HTTP Basic Auth is great for us, because trying to do IP-based from somewhere like heroku isn’t going to work. All Solrs are available over HTTPS/SSL — great! SearchStax also has their own proprietary HTTP API that lets you do most anything, including creating/destroying deployments, managing Solr basic auth users, basically everything. There is some API that duplicates the Solr Cloud API for adding configsets, I don’t think there’s a good reason to use it instead of standard SolrCloud API, although their docs try to point you to it. There’s even some kind of webhooks for alerts! (which I haven’t really explored). Basically, SearchStax just seems to be a sane and rational managed Solr option, it has all the features you’d expect/need/want for dealing with such. The prices seem reasonable-ish, generally more affordable than WebSolr, especially if you stay in “silver” and “one node”. At present, we plan to move forward with it. OpenSolr (didn’t look at it much) I have the least to say about this, have spent the least time with it, after spending time with SearchStax and seeing it met our needs. But I wanted to make sure to mention it, because it’s the only other managed Solr I am even aware of. Definitely curious to hear from any users. Here is the pricing page. The prices seem pretty decent, perhaps even cheaper than SearchStax, although it’s unclear to me what you get. Does “0 Solr Clusters” mean that it’s not SolrCloud mode? After seeing how useful SolrCloud APIs are for management (and having this confirmed by many of my peers in other libraries/museums/archives who choose to run SolrCloud), I wouldn’t want to do without it. So I guess that pushes us to “executive” tier? Which at $50/month (billed yearly!) is still just fine, around the same as SearchStax. But they do limit you to one solr index; I prefer SearchStax’s model of just giving you certain host resources and do what you want with it. It does say “shared infrastructure”. Might be worth investigating, curious to hear more from anyone who did. Now, what about ElasticSearch? We’re using Solr mostly because that’s what various collaborative and open source projects in the library/museum/archive world have been doing for years, since before ElasticSearch even existed. So there are various open source libraries and toolsets available that we’re using. 
But for whatever reason, there seem to be SO MANY MORE managed ElasticSearch SaaS available, at possibly much cheaper pricepoints. Is this because the ElasticSearch market is just bigger? Or is ElasticSearch easier/cheaper to run in a SaaS environment? Or what? I don't know. But there's the controversial AWS ElasticSearch Service; there's the Elastic Cloud "from the creators of ElasticSearch". On Heroku, which lists one Solr add-on, there are THREE ElasticSearch add-ons listed: ElasticCloud, Bonsai ElasticSearch, and SearchBox ElasticSearch. If you just google "managed ElasticSearch" you immediately see 3 or 4 other names. I don't know enough about ElasticSearch to evaluate them. They seem, on first glance at pricing pages, to be more affordable, but I may not know what I'm comparing and be looking at tiers that aren't actually usable for anything or will have hidden fees. But I know there are definitely many more managed ElasticSearch SaaS than Solr.

I think ElasticSearch probably does everything our app needs. If I were to start from scratch, I would definitely consider ElasticSearch over Solr just based on how many more SaaS options there are. While it would require some knowledge-building (I have developed a lot of knowledge of Solr and zero of ElasticSearch) and rewriting some parts of our stack, I might still consider switching to ES in the future; we don't do anything too too complicated with Solr that would be too too hard to switch to ES, probably.

jrochkind General Leave a comment January 12, 2021 / January 27, 2021

Gem authors, check your release sizes

Most gems should probably be a couple hundred kb at most. I'm talking about the package actually stored in and downloaded from rubygems by an app using the gem. After all, source code is just text, and it doesn't take up much space. OK, maybe some gems have a couple images in there. But if you are looking at your gem in rubygems and realize that it's 10MB or bigger… and that it seems to be getting bigger with every release… something is probably wrong and worth looking into.

One way to look into it is to look at the actual gem package. If you use the handy bundler rake task to release your gem (and I recommend it), you have a ./pkg directory in the source you last released from. Inside it are ".gem" files for each release you've made from there, unless you've cleaned it up recently. .gem files are just .tar files, it turns out, that have more tar and gz files inside them, etc. We can go into one, extract the contents, and use the handy unix utility du -sh to see what is taking up all the space.

How I found the bytes

jrochkind-chf kithe (master ?) $ cd pkg
jrochkind-chf pkg (master ?) $ ls
kithe-2.0.0.beta1.gem      kithe-2.0.0.pre.rc1.gem
kithe-2.0.0.gem            kithe-2.0.1.gem
kithe-2.0.0.pre.beta1.gem  kithe-2.0.2.gem
jrochkind-chf pkg (master ?) $ mkdir exploded
jrochkind-chf pkg (master ?) $ cp kithe-2.0.0.gem exploded/kithe-2.0.0.tar
jrochkind-chf pkg (master ?) $ cd exploded
jrochkind-chf exploded (master ?) $ tar -xvf kithe-2.0.0.tar
x metadata.gz
x data.tar.gz
x checksums.yaml.gz
jrochkind-chf exploded (master ?) $ mkdir unpacked_data_tar
jrochkind-chf exploded (master ?) $ tar -xvf data.tar.gz -C unpacked_data_tar/
jrochkind-chf exploded (master ?) $ cd unpacked_data_tar/
/Users/jrochkind/code/kithe/pkg/exploded/unpacked_data_tar
jrochkind-chf unpacked_data_tar (master ?) $ du -sh *
4.0K    MIT-LICENSE
 12K    README.md
4.0K    Rakefile
160K    app
8.0K    config
 32K    db
100K    lib
300M    spec
jrochkind-chf unpacked_data_tar (master ?) $ cd spec
jrochkind-chf spec (master ?) $ du -sh *
8.0K    derivative_transformers
300M    dummy
 12K    factories
 24K    indexing
 72K    models
4.0K    rails_helper.rb
 44K    shrine
 12K    simple_form_enhancements
8.0K    spec_helper.rb
188K    test_support
4.0K    validators
jrochkind-chf spec (master ?) $ cd dummy/
jrochkind-chf dummy (master ?) $ du -sh *
4.0K    Rakefile
 56K    app
 24K    bin
124K    config
4.0K    config.ru
8.0K    db
300M    log
4.0K    package.json
 12K    public
4.0K    tmp

Doh! In this particular gem, I have a dummy rails app, and it has 300MB of logs, because I haven't bothered trimming them in a while, and they are winding up included in the gem release package distributed to rubygems and downloaded by all consumers! Even if they were small, I don't want these in the released gem package at all! That's not good! It only turns into 12MB instead of 300MB, because log files are so compressible and there is compression involved in assembling the rubygems package. But I have no idea how much space it's actually taking up on consuming applications' machines. This is very irresponsible!

What controls what files are included in the gem package? Your .gemspec file, of course. The line s.files = is an array of every file to include in the gem package. Well, plus s.test_files is another array of more files, that aren't supposed to be necessary to run the gem, but are to test it. (Rubygems was set up to allow automated *testing* of gems after download, which is why test files are included in the release package. I am not sure how useful this is, and who if anyone does it; although I believe that some linux distro packagers try to make use of it, for better or worse.)

But nobody wants to list every file in your gem individually, manually editing the array every time you add, remove, or move one. Fortunately, gemspec files are executable ruby code, so you can use ruby as a shortcut. I have seen two main ways of doing this, with different "gem skeleton generators" taking one of two approaches.

Sometimes a shell out to git is used — the idea is that everything you have checked into your git should be in the gem release package, no more or no less. For instance, one of my gems has this in it, not sure where it came from or who/what generated it:

spec.files = `git ls-files -z`.split("\x0").reject do |f|
  f.match(%r{^(test|spec|features)/})
end

In that case, it wouldn't have included anything in ./spec anyway, so this obviously isn't actually the gem we were looking at before. But in this case, in addition to using ruby logic to manipulate the results, nothing excluded by your .gitignore file will end up included in your gem package, great! In kithe, which we were looking at before, those log files were in the .gitignore (they weren't in my repo!), so if I had been using that git-shellout technique, they wouldn't have been included in the gem release already. But… I wasn't. Instead this gem has a gemspec that looks like:

s.test_files = Dir["spec/*/"]

Just include every single file inside ./spec in the test_files list. Oops. Then I get all those log files!

One way to fix

I don't really know which is to be preferred of the git-shellout approach vs the dir-glob approach. I suspect it is the subject of historical religious wars in rubydom, when there were still more people around to argue about such things. Any opinions? Or another approach?
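Whichever approach a gemspec takes, one quick sanity check, sketched here with stock rubygems APIs (the file names reuse the kithe example above and are otherwise hypothetical), is to ask rubygems what would actually ship:

require "rubygems/package"

# List every file inside an already-built .gem package:
Gem::Package.new("pkg/kithe-2.0.0.gem").contents.each { |path| puts path }

# Or, before building, load the gemspec and inspect the computed file list:
spec = Gem::Specification.load("kithe.gemspec")
puts (spec.files + spec.test_files).sort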
Without being in the mood to restructure this gemspec in any way, I just did the simplest thing to keep those log files out…

Dir["spec/*/"].delete_if {|a| a =~ %r{/dummy/log/}}

Build the package without releasing, with the handy bundler-supplied rake build task… and my gem release package size goes from 12MB to 64K. (Which actually kind of sounds like a minimum block size or something, right?)

Phew! That’s a big difference! Sorry to anyone using previous versions and winding up downloading all that cruft! (Actually this particular gem is mostly a proof of concept at this point and I don’t think anyone else is using it).

Check your gem sizes!

I’d be willing to bet there are lots of released gems with heavily bloated release packages like this. This isn’t the first one I’ve realized was my fault. Because who pays attention to gem sizes anyway? Apparently not many! But rubygems does list them, so it’s pretty easy to see.

Are your gem release packages multiple megs, when there’s no good reason for them to be? Do they get bigger every release by far more than the lines of code you think were added? At some point in gem history was there a big jump from hundreds of KB to multiple MB, when nothing in particular actually happened to the gem logic to lead to that? All hints that you might be including things you didn’t mean to include, possibly things that grow each release.

You don’t need to have a dummy rails app in your repo to accidentally do this (I accidentally did it once with a gem that had nothing to do with rails). There could be other kinds of log files. Or test coverage or performance metric files, or any other artifacts of your build or your development, especially ones that grow over time — things that aren’t actually meant for, or needed as part of, the gem release package!

It’s good to sanity check your gem release packages now and then. In most cases, your gem release package should be hundreds of KB at most, not MBs. Help keep your users’ installs and builds faster and slimmer!

jrochkind General Leave a comment January 11, 2021

Every time you decide to solve a problem with code…

Every time you decide to solve a problem with code, you are committing part of your future capacity to maintaining and operating that code. Software is never done.

— Software is drowning the world, by James Abley

jrochkind General Leave a comment January 10, 2021

Updating SolrCloud configuration in ruby

We have an app that uses Solr. We currently run Solr in legacy “not cloud” mode. Our solr configuration directory is on disk on the Solr server, and it’s up to our processes to get our desired solr configuration there, and to update it when it changes.

We are in the process of moving to Solr in “SolrCloud mode“, probably via the SearchStax managed Solr service. Our Solr “Cloud” might only have one node, but “SolrCloud mode” gives us access to additional API’s for managing our solr configuration, as opposed to writing it directly to disk (which may not be possible at all in SolrCloud mode? And certainly isn’t using managed SearchStax). That is, the Solr ConfigSets API, although you might also want to use a few pieces of the Collection Management API for associating a configset with a Solr collection.

Basically, you are taking your desired solr config directory, zipping it up, and uploading it to Solr as a “config set” [or “configset”] with a certain name. Then you can create collections using this config set, or reassign which named configset an existing collection uses.
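To make that concrete, here is a hedged sketch of what the raw v1 ConfigSets upload call can look like from ruby (this isn’t the code I ended up with below; it assumes the rubyzip gem, a local ./solr/conf directory, a made-up Solr URL, and no Solr auth):

# Hypothetical sketch: zip up ./solr/conf and upload it to Solr's v1 ConfigSets
# API as a configset named "myConfigset". Names and URL are placeholders.
require "zip"       # rubyzip gem
require "net/http"
require "uri"

def zipped_config(conf_dir)
  Zip::OutputStream.write_buffer do |zio|
    Dir.glob("#{conf_dir}/**/*").select { |f| File.file?(f) }.sort.each do |path|
      zio.put_next_entry(path.delete_prefix("#{conf_dir}/")) # entry paths relative to conf dir
      zio.write(File.binread(path))
    end
  end.string
end

uri = URI("https://example.com/solr/admin/configs?action=UPLOAD&name=myConfigset")
request = Net::HTTP::Post.new(uri)
request["Content-Type"] = "application/octet-stream"
request.body = zipped_config("./solr/conf")

response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
  http.request(request)
end
raise "Solr configset upload failed: #{response.body}" unless response.code == "200"

The same admin/configs endpoint also handles action=LIST and action=DELETE, which is basically all the wrapper object below is doing, with nicer error handling.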
I wasn’t able to find any existing ruby gems for interacting with these Solr API’s. RSolr is a “ruby client for interacting with solr”, but was written before most of these administrative API’s existed for Solr, and doesn’t seem to have been updated to deal with them (unless I missed it), RSolr seems to be mostly/only about querying solr, and some limited indexing. But no worries, it’s not too hard to wrap the specific API I want to use in some ruby. Which did seem far better to me than writing the specific HTTP requests each time (and making sure you are dealing with errors etc!). (And yes, I will share the code with you). I decided I wanted an object that was bound to a particular solr collection at a particular solr instance; and was backed by a particular local directory with solr config. That worked well for my use case, and I wound up with an API that looks like this: updater = SolrConfigsetUpdater.new( solr_url: "https://example.com/solr", conf_dir: "./solr/conf", collection_name: "myCollection" ) # will zip up ./solr/conf and upload it as named MyConfigset: updater.upload("myConfigset") updater.list #=> ["myConfigSet"] updater.config_name # what configset name is MyCollection currently configured to use? # => "oldConfigSet" # what if we try to delete the one it's using? updater.delete("oldConfigSet") # => raises SolrConfigsetUpdater::SolrError with message: # "Can not delete ConfigSet as it is currently being used by collection [myConfigset]" # okay let's change it to use the new one and delete the old one updater.update_config_name("myConfigset") # now MyCollection uses this new configset, although we possibly # need to reload the collection to make that so updater.reload # now let's delete the one we're not using updater.delete("oldConfigSet") OK, great. There were some tricks in there in trying to catch the apparently multiple ways Solr can report different kinds of errors, to make sure Solr-reported errors turn into exceptions ideally with good error messages. Now, in addition to uploading a configset initially for a collection you are creating to use, the main use case I have is wanting to UPDATE the configuration to new values in an existing collection. Sure, this often requires a reindex afterwards. If you have the recently released Solr 8.7, it will let you overwrite an existing configset, so this can be done pretty easily. updater.upload(updater.config_name, overwrite: true) updater.reload But prior to Solr 8.7 you can not overwrite an existing configset. And SearchStax doesn’t yet have Solr 8.7. So one way or another, we need to do a dance where we upload the configset under a new name than switch the collection to use it. Having this updater object that lets us easily execute relevant Solr API lets us easily experiment with different logic flows for this. For instance in a Solr listserv thread, Alex Halovnic suggests a somewhat complicated 8-step process workaround, which we can implement like so: current_name = updater.config_name temp_name = "#{current_name}_temp" updater.create(from: current_name, to: temp_name) updater.change_config_name(temp_name) updater.reload updater.delete(current_name) updater.upload(configset_name: current_name) updater.change_config_name(current_name) updater.reload updater.delete(temp_name) That works. But talking to Dann Bohn at Penn State University, he shared a different algorithm, which goes like: Make a cryptographic digest hash of the entire solr directory, which we’re going to use in the configset name. 
Check if the collection is already using a configset named $name_$digest; if it already is, you’re done, no change needed. Otherwise, upload the configset with the fingerprint-based name, switch the collection to use it, reload, delete the configset that the collection used to use.

At first this seemed like overkill to me, but after thinking and experimenting with it, I like it! It is really quick to make a digest of a handful of files, that’s not a big deal. (I use the first 7 chars of a hex SHA256). And even if we had Solr 8.7, I like that we can avoid doing any operation on solr at all if there had been no changes — I really want to use this operation much like a Rails db:migrate, running it on every deploy to make sure the solr schema matches the one in the repo for the deploy.

Dann also shared his open source code with me, which was helpful for seeing how to make the digest, how to make a Zip file in ruby, etc. Thanks Dann!

Sharing my code

So I also wrote some methods to implement those variant updating strategies, Dann’s, Alex Halovnic’s from the list, etc. I thought about wrapping this all up as a gem, but I didn’t really have the time to make it good enough for that. My API is a little bit janky; I didn’t spend the extra time thinking it out really well to minimize the need for future backwards-incompatible changes, like I would if it were a gem. I also couldn’t figure out a great way to write automated tests for this that I would find particularly useful; so in my code base it’s actually not currently test-covered (shhhhh), but in a gem I’d want to solve that somehow.

But I did try to write the code general-purpose/flexible so other people could use it for their use cases; I tried to document it to my highest standards; and I put it all in one file, which actually might not be the best OO abstraction/design, but makes it easier for you to copy and paste the single file for your own use. :) So you can find my code here; it is apache-licensed; and you are welcome to copy and paste it and do whatever you like with it, including making a gem yourself if you want. Maybe I’ll get around to making it a gem in the future myself, I dunno, curious if there’s interest.

The SearchStax proprietary API’s

SearchStax has its own API’s that can, I think, be used for updating configsets and setting collections to use certain configsets, etc. When I started exploring them, they aren’t the worst vendor API’s I’ve seen, but I did find them a bit cumbersome to work with. The auth system involves a lot of steps (why can’t you just create an API Key from the SearchStax Web GUI?). Overall I found them harder to use than just the standard SolrCloud API’s, which worked fine in the SearchStax deployment, and have the added bonus of being transferable to any SolrCloud deployment instead of being SearchStax-specific.

While the SearchStax docs and support try to steer you to the SearchStax-specific API’s, I don’t think there’s really any good reason for this. (Perhaps the custom SearchStax API’s were written long ago when Solr API’s weren’t as complete?) SearchStax support suggested that the SearchStax APIs were somehow more secure; but my SearchStax Solr API’s are protected behind HTTP basic auth, and if I’ve created basic auth credentials (or an IP addr allowlist) those API’s will be available to anyone with auth to access Solr whether I use them or not!
And support also suggested that the SearchStax API use would be logged, whereas my direct Solr API use would not be, which seems to be true at least in default setup, I can probably configure solr logging differently, but it just isn’t that important to me for these particular functions. So after some initial exploration with SearchStax API, I realized that SolrCloud API (which I had never used before) could do everything I need and was more straightforward and transferable to use, and I’m happy with my decision to go with that. jrochkind General 3 Comments December 15, 2020December 16, 2020 Are you talking to Heroku redis in cleartext or SSL? In “typical” Redis installation, you might be talking to redis on localhost or on a private network, and clients typically talk to redis in cleartext. Redis doesn’t even natively support communications over SSL. (Or maybe it does now with redis6?) However, the Heroku redis add-on (the one from Heroku itself) supports SSL connections via “Stunnel”, a tool popular with other redis users use to get SSL redis connections too. (Or maybe via native redis with redis6? Not sure if you’d know the difference, or if it matters). There are heroku docs on all of this which say: While you can connect to Heroku Redis without the Stunnel buildpack, it is not recommend. The data traveling over the wire will be unencrypted. Perhaps especially because on heroku your app does not talk to redis via localhost or on a private network, but on a public network. But I think I’ve worked on heroku apps before that missed this advice and are still talking to heroku in the clear. I just happened to run across it when I got curious about the REDIS_TLS_URL env/config variable I noticed heroku setting. Which brings us to another thing, that heroku doc on it is out of date, it doesn’t mention the REDIS_TLS_URL config variable, just the REDIS_URL one. The difference? the TLS version will be a url beginning with rediss:// instead of redis:// , note extra s, which many redis clients use as a convention for “SSL connection to redis probably via stunnel since redis itself doens’t support it”. The redis docs provide ruby and go examples which instead use REDIS_URL and writing code to swap the redis:// for rediss:// and even hard-code port number adjustments, which is silly! (While I continue to be very impressed with heroku as a product, I keep running into weird things like this outdated documentation, that does not match my experience/impression of heroku’s all-around technical excellence, and makes me worry if heroku is slipping…). The docs also mention a weird driver: ruby arg for initializing the Redis client that I’m not sure what it is and it doesn’t seem necessary. The docs are correct that you have to tell the ruby Redis client not to try to verify SSL keys against trusted root certs, and this implementation uses a self-signed cert. 
Otherwise you will get an error that looks like:

OpenSSL::SSL::SSLError: SSL_connect returned=1 errno=0 state=error: certificate verify failed (self signed certificate in certificate chain)

So, it can be as simple as:

redis_client = Redis.new(url: ENV['REDIS_TLS_URL'], ssl_params: { verify_mode: OpenSSL::SSL::VERIFY_NONE })
$redis = redis_client
# and/or
Resque.redis = redis_client

I don’t use sidekiq on this project currently, but to get the SSL connection with VERIFY_NONE, looking at the sidekiq docs, you might have to do something like(?):

redis_conn = proc {
  Redis.new(url: ENV['REDIS_TLS_URL'], ssl_params: { verify_mode: OpenSSL::SSL::VERIFY_NONE })
}

Sidekiq.configure_client do |config|
  config.redis = ConnectionPool.new(size: 5, &redis_conn)
end

Sidekiq.configure_server do |config|
  config.redis = ConnectionPool.new(size: 25, &redis_conn)
end

(Not sure what values you should pick for connection pool size).

While the sidekiq docs mention heroku in passing, they don’t mention the need for SSL connections — I think awareness of this heroku feature and their recommendation that you use it may not actually be common!

Update: Beware REDIS_URL can also be rediss

On one of my apps I saw a REDIS_URL which used redis: and a REDIS_TLS_URL which uses (secure) rediss:. But on another app, it provides *only* a REDIS_URL, which is rediss — meaning you have to set the verify_mode: OpenSSL::SSL::VERIFY_NONE when passing it to the ruby redis client. So you have to be prepared to do this with REDIS_URL values too — I think it shouldn’t hurt to set the ssl_params option even if you pass it a non-ssl redis: url, so just set it all the time?

This second app was on the heroku-20 stack, and the first was on heroku-18; is that the difference? No idea. Documented anywhere? I doubt it. Definitely seems sloppy for what I expect of heroku, making me a bit suspicious of whether heroku is sticking to the really impressive level of technical excellence and documentation I expect from them.

So, your best bet is to check for both REDIS_TLS_URL and REDIS_URL, preferring the TLS one if present, realizing the REDIS_URL can have a rediss:// value in it too.

The heroku docs also say you don’t get a secure TLS redis connection on “hobby” plans, but I’m not sure that’s actually true anymore on heroku-20? Not trusting the docs is not a good sign.

jrochkind General 4 Comments November 24, 2020 November 25, 2020

Comparing performance of a Rails app on different Heroku formations

I develop a “digital collections” or “asset management” app, which manages and makes digitized historical objects and their descriptions available to the public, from the collections here at the Science History Institute. The app receives a relatively low level of traffic (according to Google Analytics, around 25K pageviews a month), although we want it to be able to handle spikes without falling down. It is not the most performance-optimized app; it does have some relatively slow responses and can be RAM-hungry. But it works adequately on our current infrastructure: web traffic is handled on a single AWS EC2 t2.medium instance, with 10 passenger processes (free version of passenger, so no multi-threading).

We are currently investigating the possibility of moving our infrastructure to heroku. After realizing that heroku standard dynos did not seem to have the performance characteristics I had expected, I decided to approach performance testing more methodically, to compare different heroku dyno formations to each other and to our current infrastructure.
Our basic research question is probably What heroku formation do we need to have similar performance to our existing infrastructure? I am not an expert at doing this — I did some research, read some blog posts, did some thinking, and embarked on this. I am going to lead you through how I approached this and what I found. Feedback or suggestions are welcome. The most surprising result I found was much poorer performance from heroku standard dynos than I expected, and specifically that standard dynos would not match performance of present infrastructure. What URLs to use in test Some older load-testing tools only support testing one URL over and over. I decided I wanted to test a larger sample list of URLs — to be a more “realistic” load, and also because repeatedly requesting only one URL might accidentally use caches in ways you aren’t expecting giving you unrepresentative results. (Our app does not currently use fragment caching, but caches you might not even be thinking about include postgres’s built-in automatic caches, or passenger’s automatic turbocache (which I don’t think we have turned on)). My initial thought to get a list of such URLs from our already-in-production app from production logs, to get a sample of what real traffic looks like. There were a couple barriers for me to using production logs as URLs: Some of those URLs might require authentication, or be POST requests. The bulk of our app’s traffic is GET requests available without authentication, and I didn’t feel like the added complexity of setting up anything else in a load traffic was worthwhile. Our app on heroku isn’t fully functional yet. Without having connected it to a Solr or background job workers, only certain URLs are available. In fact, a large portion of our traffic is an “item” or “work” detail page like this one. Additionally, those are the pages that can be the biggest performance challenge, since the current implementation includes a thumbnail for every scanned page or other image, so response time unfortunately scales with number of pages in an item. So I decided a good list of URLs was simply a representative same of those “work detail” pages. In fact, rather than completely random sample, I took the 50 largest/slowest work pages, and then added in another 150 randomly chosen from our current ~8K pages. And gave them all a randomly shuffled order. In our app, every time a browser requests a work detail page, the JS on that page makes an additional request for a JSON document that powers our page viewer. So for each of those 200 work detail pages, I added the JSON request URL, for a more “realistic” load, and 400 total URLs. Performance: “base speed” vs “throughput under load” Thinking about it, I realized there were two kinds of “performance” or “speed” to think about. You might just have a really slow app, to exagerate let’s say typical responses are 5 seconds. That’s under low/no-traffic, a single browser is the only thing interacting with the app, it makes a single request, and has to wait 5 seconds for a response. That number might be changed by optimizations or performance regressions in your code (including your dependencies). It might also be changed by moving or changing hardware or virtualization environment — including giving your database more CPU/RAM resources, etc. But that number will not change by horizontally scaling your deployment — adding more puma or passenger processes or threads, scaling out hosts with a load balancer or heroku dynos. 
None of that will change this base speed, because it’s just how long the app takes to prepare a response when not under load — how slow it is in a test with only one web worker, where adding web workers won’t matter because they won’t be used.

Then there’s what happens to the app actually under load by multiple users at once. The base speed is kind of a lower bound on throughput under load — page response time is never going to get better than 5s for our hypothetical very slow app (without changing the underlying base speed). But it can get a lot worse if it’s hammered by traffic. This throughput under load can be affected not only by changing base speed, but also by various forms of horizontal scaling — how many puma or passenger processes you have with how many threads each, and how many CPUs they have access to, as well as the number of heroku dynos or other hosts behind a load balancer.

(I had been thinking about this distinction already, but Nate Berkopec’s great blog post on scaling Rails apps gave me the “speed” vs “throughput” terminology to use).

For my situation, we are not changing the code at all. But we are changing the host architecture from a manual EC2 t2.medium to heroku dynos (of various possible types) in a way that could affect base speed, and we’re also changing our scaling architecture in a way that could change throughput under load on top of that — from one t2.medium with 10 passenger processes to possibly multiple heroku dynos behind heroku’s load balancer, and also (for Reasons) switching from free passenger to trying puma with multiple threads per process. (We are running puma 5 with new experimental performance features turned on).

So we’ll want to get a sense of the base speed of the various host choices, and also look at how throughput under load changes based on various choices.

Benchmarking tool: wrk

We’re going to use wrk. There are LOTS of choices for HTTP benchmarking/load testing, with really varying complexity and from different eras of web history. I got a bit overwhelmed by it, but settled on wrk. Some other choices didn’t have all the features we need (some way to test a list of URLs, with at least some limited percentile distribution reporting). Others were much more flexible and complicated and I had trouble even figuring out how to use them!

wrk does need a custom lua script in order to handle a list of URLs. I found a nice script here, and modified it slightly to take the filename from an ENV variable, and to not randomly shuffle the input list.

It’s a bit confusing understanding the meaning of “threads” vs “connections” in wrk arguments. This blog post from appfolio clears it up a bit. I decided to leave threads set to 1, and vary connections for load — so -c1 -t1 is a “one URL at a time” setting we can use to test “base speed”, and we can benchmark throughput under load by increasing connections.

We want to make sure we run the test for long enough to touch all 400 URLs in our list at least once, even in the slower setups, to have a good comparison — ideally it would go through the list more than once, but for my own ergonomics I had to get through a lot of tests, so I ended up with less than ideal. (Should I have put fewer than 400 URLs in? Not sure).

Conclusions in advance

As benchmarking posts go (especially when I’m the one writing them), I’m about to drop a lot of words and data on you. So to maximize the audience that sees the conclusions (because they surprise me, and I want feedback/pushback on them), I’m going to give you some conclusions up front.
Our current infrastructure has the web app on a single EC2 t2.medium, which is a burstable EC2 type — our relatively low-traffic app does not exhaust its burst credits.

Measuring base speed (just one concurrent request at a time), we found that performance dynos seem to have about the CPU speed of a bursting t2.medium (just a hair slower). But standard dynos are as a rule 2 to 3 times slower; additionally they are highly variable, and that variability can be over hours/days. A 3-minute period can have measured response times 2 or more times slower than another 3-minute period a couple hours later. But they seem to typically be 2-3x slower than our current infrastructure.

Under load, they scale about how you’d expect if you knew how many CPUs are present, no real surprises. Our existing t2.medium has two CPUs, so it can handle 2 simultaneous requests as fast as 1, and after that degrades linearly. A single performance-L ($500/month) has 4 CPUs (8 hyperthreads), so it scales under load much better than our current infrastructure. A single performance-M ($250/month) has only 1 CPU (!), so it scales pretty terribly under load. Testing scaling with 4 standard-2x’s ($200/month total), we see that it scales relatively evenly, although lumpily because of variability; and it starts out performing so much worse that even as it scales “evenly” it’s still out-performed by all the other architectures. :( (At these relatively fast median response times you might say it’s still fast enough, who cares; but in our fat tail of slower pages it gets more distressing).

Now we’ll give you lots of measurements, or you can skip all that and go to my summary discussion or the conclusions for our own project at the end.

Let’s compare base speed

OK, let’s get to actual measurements! For “base speed” measurements, we’ll be telling wrk to use only one connection and one thread.

Existing t2.medium: base speed

Our current infrastructure is one EC2 t2.medium. This EC2 instance type has two vCPUs and 4GB of RAM. On that single EC2 instance, we run passenger (free, not enterprise) set to have 10 passenger processes, although the base speed test with only one connection should only touch one of the workers. The t2 is a “burstable” type, and we do always have burst credits (this is not a high traffic app; I verified we never exhausted burst credits in these tests), so our test load may be taking advantage of burst cpu.

$ URLS=./sample_works.txt wrk -c 1 -t 1 -d 3m --timeout 20s --latency -s load_test/multiplepaths.lua.txt https://[current staging server]
multiplepaths: Found 400 paths
multiplepaths: Found 400 paths
Running 3m test @ https://staging-digital.sciencehistory.org
  1 threads and 1 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   311.00ms  388.11ms   2.37s    86.45%
    Req/Sec    11.89      8.96     40.00     69.95%
  Latency Distribution
     50%   90.99ms
     75%  453.40ms
     90%  868.81ms
     99%    1.72s
  966 requests in 3.00m, 177.43MB read
Requests/sec:      5.37
Transfer/sec:      0.99MB

I’m actually feeling pretty good about those numbers on our current infrastructure! 90ms median, not bad, and even a 453ms 75th percentile is not too bad. Now, our test load involves some JSON responses that are quicker to deliver than the corresponding HTML pages, but still, pretty good. The 90th/99th percentiles and max request (2.37s) aren’t great, but I knew I had some slow pages; this matches my previous understanding of how slow they are in our current infrastructure. The 90th percentile is ~9 times the 50th percentile.
I don’t have an understanding of why the two different Req/Sec and Requests/sec values are so different, and don’t totally understand what to do with the Stdev and +/- Stdev values, so I’m just going to stick to looking at the latency percentiles; I think “latency” could also be called “response time” here.

But ok, this is our baseline for this workload. And doing this 3-minute test at various points over the past few days, I can say it’s nicely regular and consistent; occasionally I got a slower run, but the 50th percentile was usually 90ms–105ms, right around there.

Heroku standard-2x: base speed

From previous mucking about, I learned I can only reliably fit one puma worker in a standard-1x, and heroku says “we typically recommend a minimum of 2 processes, if possible” (for routing algorithmic reasons when scaled to multiple dynos), so I am just starting at a standard-2x with two puma workers each with 5 threads, matching heroku recommendations for a standard-2x dyno.

So one thing I discovered is that benchmarks from a heroku standard dyno are really variable, but here are typical ones:

$ heroku dyno:resize
type     size         qty  cost/mo
───────  ───────────  ───  ───────
web      Standard-2X  1    50

$ heroku config:get --shell WEB_CONCURRENCY RAILS_MAX_THREADS
WEB_CONCURRENCY=2
RAILS_MAX_THREADS=5

$ URLS=./sample_works.txt wrk -c 1 -t 1 -d 3m --timeout 20s --latency -s load_test/multiplepaths.lua.txt https://scihist-digicoll.herokuapp.com/
multiplepaths: Found 400 paths
multiplepaths: Found 400 paths
Running 3m test @ https://scihist-digicoll.herokuapp.com/
  1 threads and 1 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   645.08ms  768.94ms   4.41s    85.52%
    Req/Sec     5.78      4.36     20.00     72.73%
  Latency Distribution
     50%  271.39ms
     75%  948.00ms
     90%    1.74s
     99%    3.50s
  427 requests in 3.00m, 74.51MB read
Requests/sec:      2.37
Transfer/sec:    423.67KB

I had heard that heroku standard dynos would have variable performance, because they are shared multi-tenant resources. I had been thinking of this as: during a 3-minute test I might see around the same median with more standard deviation — but instead, what it looks like to me is that running this benchmark on Monday at 9am might give very different results than at 9:50am or Tuesday at 2pm. The variability is over a way longer timeframe than my 3-minute test — so that’s something learned.

Running this here and there over the past week, the above results seem to me typical of what I saw. (To get better than “seem typical” on this resource, you’d have to run a test over several days or a week, I think, probably not hammering the server the whole time, to get a sense of the actual statistical distribution of the variability). I sometimes saw tests that were quite a bit slower than this, up to a 500ms median. I rarely if ever saw results much faster than this on a standard-2x.

The 90th percentile is ~6x the median, less than on my current infrastructure, but that still gets up there to 1.74s instead of ~870ms. This typical run is quite a bit slower than our current infrastructure: the median response time is about 3x ours, with the 90th percentile and max being around 2x. This was worse than I expected.

Heroku performance-m: base speed

Although we might be able to fit more puma workers in RAM, we’re running a single-connection base speed test, so it shouldn’t matter, and we won’t adjust it.
$ heroku dyno:resize type size qty cost/mo ─────── ───────────── ─── ─────── web Performance-M 1 250 $ heroku config:get --shell WEB_CONCURRENCY RAILS_MAX_THREADS WEB_CONCURRENCY=2 RAILS_MAX_THREADS=5 $ URLS=./sample_works.txt wrk -c 1 -t 1 -d 3m --timeout 20s --latency -s load_test/multiplepaths.lua.txt https://scihist-digicoll.herokuapp.com/ multiplepaths: Found 400 paths multiplepaths: Found 400 paths Running 3m test @ https://scihist-digicoll.herokuapp.com/ 1 threads and 1 connections Thread Stats Avg Stdev Max +/- Stdev Latency 377.88ms 481.96ms 3.33s 86.57% Req/Sec 10.36 7.78 30.00 37.03% Latency Distribution 50% 117.62ms 75% 528.68ms 90% 1.02s 99% 2.19s 793 requests in 3.00m, 145.70MB read Requests/sec: 4.40 Transfer/sec: 828.70KB This is a lot closer to the ballpark of our current infrastructure. It’s a bit slower (117ms median intead of 90ms median), but in running this now and then over the past week it was remarkably, thankfully, consistent. Median and 99th percentile are both 28% slower (makes me feel comforted that those numbers are the same in these two runs!), that doesn’t bother me so much if it’s predictable and regular, which it appears to be. The max appears to me still a little bit less regular on heroku for some reason, since performance is supposed to be non-shared AWS resources, you wouldn’t expect it to be, but slow requests are slow, ok. 90th percentile is ~9x median, about the same as my current infrastructure. heroku performance-l: base speed $ heroku dyno:resize type size qty cost/mo ─────── ───────────── ─── ─────── web Performance-L 1 500 $ heroku config:get --shell WEB_CONCURRENCY RAILS_MAX_THREADS WEB_CONCURRENCY=2 RAILS_MAX_THREADS=5 URLS=./sample_works.txt wrk -c 1 -t 1 -d 3m --timeout 20s --latency -s load_test/multiplepaths.lua.txt https://scihist-digicoll.herokuapp.com/ multiplepaths: Found 400 paths multiplepaths: Found 400 paths Running 3m test @ https://scihist-digicoll.herokuapp.com/ 1 threads and 1 connections Thread Stats Avg Stdev Max +/- Stdev Latency 471.29ms 658.35ms 5.15s 87.98% Req/Sec 10.18 7.78 30.00 36.20% Latency Distribution 50% 123.08ms 75% 635.00ms 90% 1.30s 99% 2.86s 704 requests in 3.00m, 130.43MB read Requests/sec: 3.91 Transfer/sec: 741.94KB No news is good news, it looks very much like performance-m, which is exactly what we expected, because this isn’t a load test. It tells us that performance-m and performance-l seem to have similar CPU speeds and similar predictable non-variable regularity, which is what I find running this test periodically over a week. 90th percentile is ~10x median, about the same as current infrastructure. The higher Max speed is just evidence of what I mentioned, the speed of slowest request did seem to vary more than on our manual t2.medium, can’t really explain why. Summary: Base speed Not sure how helpful this visualization is, charting 50th, 75th, and 90th percentile responses across architectures. But basically: performance dynos perform similarly to my (bursting) t2.medium. Can’t explain why performance-l seems slightly slower than performance-m, might be just incidental variation when I ran the tests. The standard-2x is about twice as slow as my (bursting) t2.medium. Again recall standard-2x results varied a lot every time I ran them, the one I reported seems “typical” to me, that’s not super scientific, admittedly, but I’m confident that standard-2x are a lot slower in median response times than my current infrastructure. 
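(Incidentally, every data point in this post is just the same wrk invocation with a different -c value; a rough sketch of a little ruby driver for collecting a series of them, not the exact harness behind these numbers, might look like this.)

# Rough sketch: run the same wrk command at increasing connection counts and
# pull the median latency out of each run's output. Connection counts here are
# just an illustrative list.
CONNECTION_COUNTS = [1, 2, 3, 4, 6, 8, 10, 12]

results = CONNECTION_COUNTS.map do |connections|
  output = `URLS=./sample_works.txt wrk -c #{connections} -t 1 -d 3m --timeout 20s --latency -s load_test/multiplepaths.lua.txt https://scihist-digicoll.herokuapp.com/`
  median = output[/^\s*50%\s+(\S+)/, 1]  # wrk prints a line like "    50%  271.39ms"
  [connections, median]
end

results.each { |connections, median| puts "#{connections} connections: median #{median}" }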
Throughput under load Ok, now we’re going to test using wrk to use more connections. In fact, I’ll test each setup with various number of connections, and graph the result, to get a sense of how each formation can handle throughput under load. (This means a lot of minutes to get all these results, at 3 minutes per number of connection test, per formation!). An additional thing we can learn from this test, on heroku we can look at how much RAM is being used after a load test, to get a sense of the app’s RAM usage under traffic to understand the maximum number of puma workers we might be able to fit in a given dyno. Existing t2.medium: Under load A t2.medium has 4G of RAM and 2 CPUs. We run 10 passenger workers (no multi-threading, since we are free, rather than enterprise, passenger). So what do we expect? With 2 CPUs and more than 2 workers, I’d expect it to handle 2 simultaneous streams of requests almost as well as 1; 3-10 should be quite a bit slower because they are competing for the 2 CPUs. Over 10, performance will probably become catastrophic. 2 connections are exactly flat with 1, as expected for our two CPUs, hooray! Then it goes up at a strikingly even line. Going over 10 (to 12) simultaneous connections doesn’t matter, even though we’ve exhausted our workers, I guess at this point there’s so much competition for the two CPUs already. The slope of this curve is really nice too actually. Without load, our median response time is 100ms, but even at a totally overloaded 12 overloaded connections, it’s only 550ms, which actually isn’t too bad. We can make a graph that in addition to median also has 75th, 90th, and 99th percentile response time on it: It doesn’t tell us too much; it tells us the upper percentiles rise at about the same rate as the median. At 1 simultaneous connection 90th percentile of 846ms is about 9 times the median of 93ms; at 10 requests the 90th percentile of 3.6 seconds is about 8 times the median of 471ms. This does remind us that under load when things get slow, this has more of a disastrous effect on already slow requests than fast requests. When not under load, even our 90th percentile was kind of sort of barley acceptable at 846ms, but under load at 3.6 seconds it really isn’t. Single Standard-2X dyno: Under load A standard-2X dyno has 1G of RAM. The (amazing, excellent, thanks schneems) heroku puma guide suggests running two puma workers with 5 threads each. At first I wanted to try running three workers, which seemed to fit into available RAM — but under heavy load-testing I was getting Heroku R14 Memory Quota Exceeded errors, so we’ll just stick with the heroku docs recommendations. Two workers with 5 threads each fit with plenty of headroom. A standard-2x dyno is runs on shared (multi-tenant) underlying Amazon virtual hardware. So while it is running on hardware with 4 CPUs (each of which can run two “hyperthreads“), the puma doc suggests “it is best to assume only one process can execute at a time” on standard dynos. What do we expect? Well, if it really only had one CPU, it would immediately start getting bad at 2 simulataneous connections, and just get worse from there. When we exceed the two worker count, will it get even worse? What about when we exceed the 10 thread (2 workers * 5 threads) count? You’d never run just one dyno if you were expecting this much traffic, you’d always horizontally scale. This very artificial test is just to get a sense of it’s characteristics. 
Also, we remember that standard-2x’s are just really variable; I could get much worse or better runs than this, but I graphed numbers from a run that seemed typical.

Well, it really does act like 1 CPU: 2 simultaneous connections is immediately a lot worse than 1. The line isn’t quite as straight as in our existing t2.medium, but it’s still pretty straight; I’d attribute the slight lumpiness to just the variability of a shared-architecture standard dyno, and figure it would get perfectly straight with more data.

It degrades at about the same rate as our baseline t2.medium, but when you start out slower, that’s more disastrous. Our t2.medium at an overloaded 10 simultaneous requests is 473ms (pretty tolerable actually), 5 times the median at one request only. This standard-2x has a median response time of 273ms at only one simultaneous request, and at an overloaded 10 requests has a median response time also about 5x worse, but that becomes a less tolerable 1480ms.

Does also graphing the 75th, 90th, and 99th percentiles tell us much? Eh, I think the lumpiness is still just standard shared-architecture variability. The rate of “getting worse” as we add more overloaded connections is actually a bit better than it was on our t2.medium, but since it already starts out so much slower, we’ll just call it a wash. (On t2.medium, the 90th percentile without load is 846ms and under an overloaded 10 connections 3.6s. On this single standard-2x, it’s 1.8s and 5.2s). I’m not sure how much these charts with various percentiles on them tell us, so I won’t include them for every architecture from here on.

standard-2x, 4 dynos: Under load

OK, realistically we already know you shouldn’t have just one standard-2x dyno under that kind of load. You’d scale out, either manually or perhaps using something like the neat Rails Autoscale add-on. Let’s measure with 4 dynos. Each is still running 2 puma workers, with 5 threads each.

What do we expect? Hm, treating each dyno as if it has only one CPU, we’d expect it to be able to handle traffic pretty levelly up to 4 simultaneous connections, distributed to 4 dynos. It’s going to do worse after that, but up to 8 there is still one puma worker per connection, so might it get even worse after 8?

Well… I think that actually is relatively flat from 1 to 4 simultaneous connections, except for lumpiness from variability. But lumpiness from variability is huge! We’re talking a 250ms median measured at 1 connection, up to a 369ms measured median at 2, down to 274ms at 3. And then maybe yeah, a fairly shallow slope up to 8 simultaneous connections, then steeper. But it’s all a fairly shallow slope compared to our base t2.medium. At 8 connections (after which we pretty much max out), the standard-2x median of 464ms is only 1.8 times the median at 1 connection. Compared to the t2.medium increase of 3.7 times.

As we’d expect, scaling out to 4 dynos (with four CPUs/8 hyperthreads) helps us scale well — the problem is the baseline is so slow to begin with (with very high bounds of variability making it regularly even slower).

performance-m: Under load

A performance-m has 2.5 GB of memory. It only has one physical CPU, although two “vCPUs” (two hyperthreads) — and these are all yours, it is not shared. By testing under load, I demonstrated I could actually fit 12 workers on there without any memory limit errors. But is there any point to doing that with only 1 physical CPU / 2 hyperthreads? Under a bit of testing, it appeared not. The heroku puma docs recommend only 2 processes with 5 threads.
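(For reference, in all of these runs the process and thread counts are just coming from the usual WEB_CONCURRENCY and RAILS_MAX_THREADS env vars; a minimal sketch of the kind of config/puma.rb in play, not necessarily our exact file, looks like this.)

# config/puma.rb -- minimal sketch, not necessarily the app's exact config
workers Integer(ENV.fetch("WEB_CONCURRENCY", 2))

max_threads = Integer(ENV.fetch("RAILS_MAX_THREADS", 5))
threads max_threads, max_threads

preload_app!

port        ENV.fetch("PORT", 3000)
environment ENV.fetch("RACK_ENV", "development")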
You could do a whole little mini-experiment just trying to measure/optimize process/thread count on performance-m! We’ve already got too much data here, but in some experimentation it looked to me like 5 processes with 2 threads each performed better (and certainly no worse) than 2 processes with 5 threads — if you’ve got the RAM just sitting there anyway (as we do), why not? I actually tested with 6 puma processes with 2 threads each. There is still a large amount of RAM headroom we aren’t going to use even under load.

What do we expect? Well, with the 2 “hyperthreads” perhaps it can handle 2 simultaneous requests nearly as well as 1 (or not?); after that, we expect it to degrade quickly, same as our original t2.medium did.

It can handle 2 connections slightly better than you’d expect if there really was only 1 CPU, so I guess a hyperthread does give you something. Then the slope picks up, as you’d expect; and it looks like it does get steeper after 4 simultaneous connections, yup.

performance-l: Under load

A performance-l ($500/month) costs twice as much as a performance-m ($250/month), but has far more than twice the resources. A performance-l has a whopping 14GB of RAM compared to performance-m’s 2.5GB; and a performance-l has 4 real CPUs/8 hyperthreads available to use (visible using the nproc technique in the heroku puma article).

Because we have plenty of RAM to do so, we’re going to run 10 worker processes to match our original t2.medium’s. We still ran with 2 threads, just because it seems like maybe you should never run a puma worker with only one thread? But who knows, maybe 10 workers with 1 thread each would perform better; plenty of room (but not plenty of my energy) for yet more experimentation.

What do we expect? The graph should be pretty flat up to 4 simultaneous connections, then it should start getting worse, pretty evenly, as simultaneous connections rise all the way up to 12.

It is indeed pretty flat up to 4 simultaneous connections. Then up to 8 it’s still not too bad — the median at 8 is only ~1.5x the median at 1 (!). Then it gets worse after 8 (oh yeah, 8 hyperthreads?). But the slope is wonderfully shallow all the way. Even at 12 simultaneous connections, the median response time of 266ms is only 2.5x what it was at one connection. (In our original t2.medium, at 12 simultaneous connections the median response time was over 5x what it was at 1 connection). This thing is indeed a monster.

Summary Comparison: Under load

We showed a lot of graphs that look similar, but they all had different scales on the y-axis. Let’s plot the median response times under load of all architectures on the same graph, and see what we’re really dealing with.

The blue t2.medium is our baseline, what we have now. We can see that there isn’t really a similar heroku option; we have our choice of better or worse.

The performance-l is just plain better than what we have now. It starts out performing about the same as what we have now for 1 or 2 simultaneous connections, but then scales so much flatter.

The performance-m also starts out about the same, but scales so much worse than even what we have now. (It’s that 1 real CPU instead of 2, I guess?).

The standard-2x scaled to 4 dynos… has its own characteristics. Its baseline is pretty terrible; it’s 2 to 3 times as slow as what we have now even not under load. But then it scales pretty well, since it’s 4 dynos after all; it doesn’t get worse as fast as performance-m does.
But it started out so bad that it remains far worse than our original t2.medium even under load. Adding more dynos to standard-2x will help it remain steady under even higher load, but won’t help its underlying problem, which is that it’s just slower than everyone else.

Discussion: Thoughts and Surprises

I had been thinking of a t2.medium (even with burst) as “typical” (it is after all much slower than my 2015 Macbook), and had been assuming (in retrospect with no particular basis) that a heroku standard dyno would perform similarly. Most discussion and heroku docs, as well as the naming itself, suggest that a ‘standard’ dyno is, well, standard, and performance dynos are for “super scale, high traffic apps”, which is not me.

But in fact, heroku standard dynos are much slower and more variable in performance than a bursting t2.medium. I suspect they are slower than other options you might consider “typical” non-heroku options.

My conclusion is honestly that “standard” dynos are really “for very fast, well-optimized apps that can handle slow and variable CPU” and “performance” dynos are really “standard, matching the CPU speeds you’d get from a typical non-heroku option”. But this is not how they are documented or usually talked about.

Are other people having really different experiences/conclusions than me? If so, why, or where have I gone wrong? This of course has implications for estimating your heroku budget if considering switching over. :(

If you have a well-optimized, fast app, say even the 95th percentile is 200ms (on a bursting t2.medium), then you can handle standard slowness — so what if your 95th percentile is now 600ms (and during some time periods even much slower, 1s or worse, due to variability)? That’s not so bad for a 95th percentile.

One way to get a very fast app is of course caching. There is lots of discussion of using caching in Rails; sometimes the message (explicit or implicit) is “you have to use lots of caching to get reasonable performance because Rails is so slow.” What if many of these people are on heroku, and it’s really “you have to use lots of caching to get reasonable performance on a heroku standard dyno”?? I personally don’t think caching is maintenance-free; in my experience, properly doing cache invalidation and dealing with the significant processing spikes needed when you choose to invalidate your entire cache (because cached HTML needs to change) lead to real maintenance/development cost. I have not needed caching to meet my performance goals on our present architecture.

Everyone doesn’t necessarily have the same performance goals/requirements. Mine, for a low-traffic non-commercial site, are maybe more modest; I just need users not to be super annoyed. But whatever your performance goals, you’re going to have to spend more time on optimization on a heroku standard dyno than on something with a much faster CPU — like a standard affordable mid-tier EC2. Am I wrong?

One significant factor in heroku standard dyno performance is that they use shared/multi-tenant infrastructure. I wonder if they’ve actually gotten lower performance over time, as many customers (who you may be sharing with) have gotten better at maximizing their utilization, so the shared CPUs are typically more busy? Like a frog boiling, maybe nobody noticed that standard dynos have become lower performance? I dunno, brainstorming. Or maybe there are so many apps that start on heroku instead of switching from somewhere else, that people just don’t realize that standard dynos are much slower than other low/mid-tier options?
I was expecting to pay a premium for heroku — but even standard-2x’s are a significant premium over paying for t2.medium EC2 yourself, one I found quite reasonable…. performance dynos are of course even more premium. I had a sort of baked-in premise that most Rails apps are “IO-bound”, they spend more time waiting on IO than using CPU. I don’t know where I got that idea, I heard it once a long time ago and it became part of my mental model. I now do not believe this is true true of my app, and I do not in fact believe it is true of most Rails apps in 2020. I would hypothesize that most Rails apps today are in fact CPU-bound. The performance-m dyno only has one CPU. I had somehow also been assuming that it would have two CPUs — I’m not sure why, maybe just because at that price! It would be a much better deal with two CPUs. Instead we have a huge jump from $250 performance-m to $500 performance-l that has 4x the CPUs and ~5x the RAM. So it doesn’t make financial sense to have more than one performance-m dyno, you might as well go to performance-l. But this really complicates auto-scaling, whether using Heroku’s feature , or the awesome Rails Autoscale add-on. I am not sure I can afford a performance-l all the time, and a performance-m might be sufficient most of the time. But if 20% of the time I’m going to need more (or even 5%, or even unexpectedly-mentioned-in-national-media), it would be nice to set things up to autoscale up…. I guess to financially irrational 2 or more performance-m’s? :( The performance-l is a very big machine, that is significantly beefier than my current infrastructure. And has far more RAM than I need/can use with only 4 physical cores. If I consider standard dynos to be pretty effectively low tier (as I do), heroku to me is kind of missing mid-tier options. A 2 CPU option at 2.5G or 5G of RAM would make a lot of sense to me, and actually be exactly what I need… really I think performance-m would make more sense with 2 CPUs at it’s existing already-premium price point, and to be called a “performance” dyno. . Maybe heroku is intentionally trying set options to funnel people to the highest-priced performance-l. Conclusion: What are we going to do? In my investigations of heroku, my opinion of the developer UX and general service quality only increases. It’s a great product, that would increase our operational capacity and reliability, and substitute for so many person-hours of sysadmin/operational time if we were self-managing (even on cloud architecture like EC2). But I had originally been figuring we’d use standard dynos (even more affordably, possibly auto-scaled with Rails Autoscale plugin), and am disappointed that they end up looking so much lower performance than our current infrastructure. Could we use them anyway? Response time going from 100ms to 300ms — hey, 300ms is still fine, even if I’m sad to lose those really nice numbers I got from a bit of optimization. But this app has a wide long-tail ; our 75th percentile going from 450ms to 1s, our 90th percentile going from 860ms to 1.74s and our 99th going from 2.3s to 4.4s — a lot harder to swallow. Especially when we know that due to standard dyno variability, a slow-ish page that on my present architecture is reliably 1.5s, could really be anywhere from 3 to 9(!) on heroku. 
I would anticipate having to spend a lot more developer time on optimization on heroku standard dynos — or, i this small over-burdened non-commercial shop, not prioritizing that (or not having the skills for it), and having our performance just get bad. So I’m really reluctant to suggest moving our app to heroku with standard dynos. A performance-l dyno is going to let us not have to think about performance any more than we do now, while scaling under high-traffic better than we do now — I suspect we’d never need to scale to more than one performance-l dyno. But it’s pricey for us. A performance-m dyno has a base-speed that’s fine, but scales very poorly and unaffordably. Doesn’t handle an increase in load very well as one dyno, and to get more CPUs you have to pay far too much (especially compared to standard dynos I had been assuming I’d use). So I don’t really like any of my options. If we do heroku, maybe we’ll try a performance-m, and “hope” our traffic is light enough that a single one will do? Maybe with Rails autoscale for traffic spikes, even though 2 performance-m dynos isn’t financially efficient? If we are scaling to 2 (or more!) performance-m’s more than very occasionally, switch to performance-l, which means we need to make sure we have the budget for it? jrochkind General Leave a comment November 19, 2020November 19, 2020 Deep Dive: Moving ruby projects from Travis to Github Actions for CI So this is one of my super wordy posts, if that’s not your thing abort now, but some people like them. We’ll start with a bit of context, then get to some detailed looks at Github Actions features I used to replace my travis builds, with example config files and examination of options available. For me, by “Continuous Integration” (CI), I mostly mean “Running automated tests automatically, on your code repo, as you develop”, on every PR and sometimes with scheduled runs. Other people may mean more expansive things by “CI”. For a lot of us, our first experience with CI was when Travis-ci started to become well-known, maybe 8 years ago or so. Travis was free for open source, and so darn easy to set up and use — especially for Rails projects, it was a time when it still felt like most services focused on docs and smooth fit for ruby and Rails specifically. I had heard of doing CI, but as a developer in a very small and non-profit shop, I want to spend time writing code not setting up infrastructure, and would have had to get any for-cost service approved up the chain from our limited budget. But it felt like I could almost just flip a switch and have Travis on ruby or rails projects working — and for free! Free for open source wasn’t entirely selfless, I think it’s part of what helped Travis literally define the market. (Btw, I think they were the first to invent the idea of a “badge” URL for a github readme?) Along with an amazing Developer UX (which is today still a paragon), it just gave you no reason not to use it. And then once using it, it started to seem insane to not have CI testing, nobody would ever again want to develop software without the build status on every PR before merge. Travis really set a high bar for ease of use in a developer tool, you didn’t need to think about it much, it just did what you needed, and told you what you needed to know in it’s read-outs. I think it’s an impressive engineering product. But then. End of an era Travis will no longer be supporting open source projects with free CI. 
The free open source travis projects originally ran on travis-ci.org, with paid commercial projects on travis-ci.com. In May 2018, they announced they’d be unifying these on travis-ci.com only, but with no announced plan that the policy for free open source would change. This migration seemed to proceed very slowly though. Perhaps because it was part of preparing the company for a sale, in Jan 2019 it was announced private equity firm Idera had bought travis. At the time the announcement said “We will continue to maintain a free, hosted service for open source projects,” but knowing what “private equity” usually means, some were concerned for the future. (HN discussion). While the FAQ on the migration to travis-ci.com still says that travis-ci.org should remain reliable until projects are fully migrated, in fact over the past few months travis-ci.org projects largely stopped building, as travis apparently significantly reduced resources on the platform. Some people began manually migrating their free open source projects to travis-ci.com where builds still worked. But, while the FAQ also still says “Will Travis CI be getting rid of free users? Travis CI will continue to offer a free tier for public or open-source repositories on travis-ci.com” — in fact, travis announced that they are ending the free service for open source. The “free tier” is a limited trial (available not just to open source), and when it expires, you can pay, or apply to a special program for an extension, over and over again. They are contradicting themselves enough that while I’m not sure exactly what is going to happen, but no longer trust them as a service. Enter Github Actions I work mostly on ruby and Rails projects. They are all open source, almost all of them use travis. So while (once moved to travis-ci.com) they are all currently working, it’s time to start moving them somewhere else, before I have dozens of projects with broken CI and still don’t know how to move them. And the new needs to be free — many of these projects are zero-budget old-school “volunteer” or “informal multi-institutional collaboration” open source. There might be several other options, but the one I chose is Github Actions — my sense that it had gotten mature enough to start approaching travis level of polish, and all of my projects are github-hosted, and Github Actions is free for unlimited use for open source. (pricing page; Aug 2019 announcement of free for open source). And we are really fortunate that it became mature and stable in time for travis to withdraw open source support (if travis had been a year earlier, we’d be in trouble). Github Actions is really powerful. It is built to do probably WAY MORE than travis does, definitely way beyond “automated testing” to various flows for deployment and artifact release, to really just about any kind of process for managing your project you want. The logic you can write almost unlimited, all running on github’s machines. As a result though…. I found it a bit overwhelming to get started. The Github Actions docs are just overwhelmingly abstract, there is so much there, you can almost anything — but I don’t actually want to learn a new platform, I just want to get automated test CI for my ruby project working! There are some language/project speccific Guides available, for node.js, python, a few different Java setups — but not for ruby or Rails! My how Rails has fallen, from when most services like this would be focusing on Rails use cases first. 
There are some third-party guides available that might focus on ruby/rails, but one of the problems is that Actions has been evolving for a few years with some pivots, so it’s easy to find outdated instructions. One orientation I found helpful was this Drifting Ruby screencast. This screencast showed me there is a kind of limited web UI with integrated docs searcher — but I didn’t end up using it, I just created the text config file by hand, same as I would have for travis. Github provides templates for “ruby” or “ruby gem”, but the Drifting Ruby screencast said “these won’t really work for our ruby on rails application so we’ll have to set up one manually”, so that’s what I did too. ¯\_(ツ)_/¯ But the cost of all the power github Actions provides is… there are a lot more switches and dials to understand and get right (and maintain over time and across multiple projects). I’m not someone who likes copy-paste without understanding it, so I spent some time trying to understand the relevant options and alternatives; in the process I found some things I might have otherwise copy-pasted from other people’s examples that could be improved. So I give you the results of my investigations, to hopefully save you some time, if wordy comprehensive reports are up your alley. A Simple Test Workflow: ruby gem, test with multiple ruby versions Here’s a file for a fairly simple test workflow. You can see it’s in the repo at .github/workflows. The name of the file doesn’t matter — while this one is called ruby.yml, I’ve since moved over to naming the file to match the name: key in the workflow for easier traceability, so I would have called it ci.yml instead. Triggers You can see we say that this workflow should be run on any push to the master branch, and also for any pull_request at all. Many other examples I’ve seen define pull_request: branches: ["main"], which seems to mean only run on Pull Requests with main as the base. While that’s most of my PRs, if there is ever a PR that uses another branch as a base for whatever reason, I still want to run CI! While hypothetically you should be able to leave branches out to mean “any branch”, I only got it to work by explicitly saying branches: ["**"] Matrix For this gem, we want to run CI on multiple ruby versions. You can see we define them here. This works similarly to travis matrixes. If you have more than one matrix variable defined, the workflow will run for every combination of variables (hence the name “matrix”).

matrix:
  ruby: [ '2.4.4', '2.5.1', '2.6.1', '2.7.0', 'jruby-9.1.17.0', 'jruby-9.2.9.0' ]

In a given run, the current value of the matrix variables is available in the github actions “context”, which you can access as eg ${{ matrix.ruby }}. You can see how I use that in the name, so that the job will show up with its ruby version in it: name: Ruby ${{ matrix.ruby }} Ruby install While Github itself provides an action for ruby install, it seems most people are using this third-party action. Which we reference as `ruby/setup-ruby@v1`. You can see we use the matrix.ruby context to tell the setup-ruby action what version of ruby to install, which works because our matrix values are the correct values recognized by the action. Which are documented in the README, but note that values like jruby-head are also supported. Note, although it isn’t clearly documented, you can say 2.4 to mean “latest available 2.4.x” (rather than it meaning “2.4.0”), which is hugely useful, and I’ve switched to doing that.
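To pull the pieces described so far together, here is a minimal sketch of what a whole workflow file along these lines can look like — the ruby versions, job id, and test command (bundle exec rspec) are illustrative placeholders rather than the exact contents of any of my projects:

name: CI

on:
  push:
    branches: [ master ]
  pull_request:
    branches: [ "**" ]

jobs:
  tests:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        ruby: [ '2.5', '2.6', '2.7', 'jruby-9.2.9.0' ]
    # Include the ruby version in the job name so it shows up legibly in the GH UI
    name: Ruby ${{ matrix.ruby }}
    steps:
      # Check out the repo so the test suite is available to run
      - uses: actions/checkout@v2
      # Third-party action that installs the requested ruby (and a recent bundler)
      - uses: ruby/setup-ruby@v1
        with:
          ruby-version: ${{ matrix.ruby }}
      - name: Run tests
        run: |
          bundle install --jobs 4 --retry 3
          bundle exec rspec

Note that the bare '2.5'-style entries in that illustrative matrix lean on the “latest available patch release” behavior just described.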
I don’t believe that was available via travis’s rvm-based ruby install feature. For a project that isn’t testing under multiple rubies, if we leave out the with: ruby-version, the action will conveniently use a .ruby-version file present in the repo. Note you don’t need to put a gem install bundler into your workflow yourself; while I’m not sure it’s clearly documented, I found the ruby/setup-ruby action would do this for you (installing the latest available bundler, instead of using whatever was packaged with the ruby version), btw regardless of whether you are using the bundler-cache feature (see below). Note on How Matrix Jobs Show Up to Github With travis, testing for multiple ruby or rails versions with a matrix, we got one (or, well, actually two) jobs showing up on the Github PR: Each of those lines summarizes a collection of matrix jobs (eg different ruby versions). If any of the individual jobs within the matrix failed, the whole build would show up as failed. Success or failure, you could click on “Details” to see each job and its status: I thought this worked pretty well — especially for “green” builds I really don’t need to see the details on the PR, the summary is great, and if I want to see the details I can click through, great. With Github Actions, each matrix job shows up directly on the PR. If you have a large matrix, it can be… a lot. Some of my projects have way more than 6. On PR: Maybe it’s just because I was used to it, but I preferred the Travis way. (This also makes me think maybe I should change the name key in my workflow to say eg CI: Ruby 2.4.4 to be more clear? Oops, tried that, it just looks even weirder in other GH contexts, not sure.) Oh, also, that travis way of doing the build twice, once for “pr” and once for “push”? Github Actions doesn’t seem to do that, it just does one, I think corresponding to travis “push”. While the travis feature seemed technically smart, I’m not sure I ever actually saw one of these builds pass while the other failed in any of my projects, so I probably won’t miss it. Badge Did you have a README badge for travis? Don’t forget to swap it for the equivalent in Github Actions. The image url looks like: https://github.com/$OWNER/$REPOSITORY/workflows/$WORKFLOW_NAME/badge.svg?branch=master, where $WORKFLOW_NAME of course has to be URL-escaped if it contains spaces etc. The github page at https://github.com/owner/repo/actions, if you select a particular workflow/branch, does, like travis, give you a badge URL/markdown you can copy/paste if you click on the three-dots and then “Create status badge”. Unlike travis, what it gives you to copy/paste is just image markdown, it doesn’t include a link. But I definitely want the badge to link to viewing the results of the last build in the UI. So I do it manually. Limit to the specific workflow and branch that you made the badge for in the UI, then just copy and paste the URL from the browser. It’s a bit of confusing markdown to construct manually; here’s what it ended up looking like for me:

[![CI Status](https://github.com/jrochkind/attr_json/workflows/CI/badge.svg?branch=master)](https://github.com/jrochkind/attr_json/actions?query=workflow%3ACI+branch%3Amaster)

I copy and paste that from an existing project when I need it in a new one. :shrug: Require CI to merge PR? However, that difference in how jobs show up to Github, the way each matrix job shows up separately now, has an even more negative impact on requiring CI success to merge a PR.
If you want to require that CI passes before merging a PR, you configure that at https://github.com/acct/project/settings/branches under “Branch protection rules”. When you click “Add Rule”, you can/must choose WHICH jobs are “required”. For travis, that’d be those two “master” jobs, but for the new system, every matrix job shows up separately — in fact, if you’ve been messing with job names trying to get it right as I have, you are offered any job name that was ever used in the last 7 days, and they don’t have the Github workflow name appended to them or anything (another reason to put the github workflow name in the job name?). But the really problematic part is that if you edit your list of jobs in the matrix — adding or removing ruby versions as one does, or even just changing the name that shows up for a job — you have to go back to this screen to add or remove jobs as a “required status check”. That seems really unworkable to me, I’m not sure how it hasn’t been a major problem already for users. It would be better if we could configure “all the checks in the WORKFLOW, whatever they may be”, or perhaps best of all if we could configure a check as required in the workflow YML file, the same place we’re defining it, just a required_before_merge key you could set to true or use a matrix context to define or whatever. I’m currently not requiring status checks for merge on most of my projects (even though I did with travis), because I was finding it unmanageable to keep the job names sync’d, especially as I get used to Github Actions and kept tweaking things in a way that would change job names. So that’s a bit annoying. fail-fast: false By default, if one of the matrix jobs fails, Github Actions will cancel all remaining jobs, not bother to run them at all. After all, you know the build is going to fail if one job fails, what do you need those others for? Well, for my use case, it is pretty annoying to be told, say, “Job for ruby 2.7.0 failed, we can’t tell you whether the other ruby versions would have passed or failed or not” — the first thing I want to know is if it failed on all ruby versions or just 2.7.0, so now I’d have to spend extra time figuring that out manually? No thanks. So I set `fail-fast: false` on all of my workflows, to disable this behavior. Note that travis had a similar (opt-in) fast_finish feature, which worked subtly differently: Travis would report failure to Github on first failure (and notify, I think), but would actually keep running all jobs. So when I saw a failure, I could click through to ‘details’ to see which (eg) ruby versions passed, from the whole matrix. This did work for me, so I chose to opt in to that travis feature. Unfortunately, the Github Actions subtle difference in effect makes it not desirable to me. Note You may see some people referencing a Github Actions continue-on-error feature. I found the docs confusing, but after experimentation what this really does is mark a job as successful even when it fails. It shows up in all GH UI as succeeded even when it failed, the only way to know it failed would be to click through to the actual build log to see failure in the logged console. I think “continue on error” is a weird name for this; it is not useful to me with regard to fine-tuning fail-fast, or honestly in any other use case I can think of that I have.
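For reference, here is where the fail-fast flag itself sits in a workflow file — a minimal sketch, reusing an illustrative matrix rather than one of my real ones:

jobs:
  tests:
    strategy:
      # Keep running the other matrix jobs even if one of them fails
      fail-fast: false
      matrix:
        ruby: [ '2.5', '2.6', '2.7' ]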
Bundle cache? bundle install can take 60+ seconds, and be a significant drag on your build (not to mention a lot of load on rubygems servers from all these builds). So when travis introduced a feature to cache: bundler: true, it was very popular. True to form, Github Actions gives you a generic caching feature you can try to configure for your particular case (npm, bundler, whatever), instead of an out-of-the-box “just do the right thing for bundler” feature — you figure it out. The ruby/setup-ruby third-party action has a built-in feature to cache bundler installs for you, but I found that it does not work right if you do not have a Gemfile.lock checked into the repo. (Ie, for most any gem, rather than app, project). It will end up re-using cached dependencies even if there are new releases of some of your dependencies, which is a big problem for how I use CI for a gem — I expect it to always be building with the latest releases of dependencies, so I can find out if one breaks the build. This may get fixed in the action. If you have an app (rather than gem) with a Gemfile.lock checked into the repo, the bundler-cache: true feature should be just fine. Otherwise, Github has some suggestions for using the generic cache feature for ruby bundler (search for “ruby – bundler” on this page) — but I actually don’t believe they will work right without a Gemfile.lock checked into the repo either. Starting from that example, and using the restore-keys feature, I think it should be possible to design a use that works much like travis’s bundler cache did, and works fine without a checked-in Gemfile.lock. We’d want it to use a cache from the most recent previous (similar) job, then run bundle install anyway, and then cache the results again at the end, always, to be available for the next run. But I haven’t had time to work that out, so for now my gem builds are simply not using bundler caching. (My gem builds tend to take around 60 seconds to do a bundle install, so that’s in every build now, could be worse). update nov 27: The ruby/setup-ruby action should be fixed to properly cache-bust when you don’t have a Gemfile.lock checked in. If you are using a matrix of gemfiles, as below, you must tell it which gemfile to use by setting the BUNDLE_GEMFILE env variable rather than the way we did it below, and there is a certain way Github Actions requires/provides that you do that, it’s not just export. See the issue in the ruby/setup-ruby project.
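For the curious, here is roughly the shape such a restore-keys scheme could take — an untested sketch on my part, with made-up cache key names, not something I actually have running: restore the most recent cache for the same OS and ruby, run bundle install anyway, and always save a fresh cache at the end by using a key that is unique per run.

steps:
  - uses: actions/checkout@v2
  - uses: ruby/setup-ruby@v1
    with:
      ruby-version: ${{ matrix.ruby }}
  - name: Cache gems
    uses: actions/cache@v2
    with:
      path: vendor/bundle
      # Unique per run, so a fresh cache is saved at the end of every build...
      key: bundle-${{ runner.os }}-${{ matrix.ruby }}-${{ github.run_id }}
      # ...while the most recent previous cache for this OS/ruby is restored
      restore-keys: |
        bundle-${{ runner.os }}-${{ matrix.ruby }}-
  - name: Bundle install
    run: |
      # Install into the cached path rather than system gems
      bundle config set path vendor/bundle
      bundle install --jobs 4 --retry 3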
Notifications: Not great Travis has really nice defaults for notifications: The person submitting the PR would get an email generally only on status changes (from pass to fail or fail to pass) rather than on every build. And travis would even figure out what email to send to based on what email you used in your git commits. (Originally perhaps a workaround to lack of Github API at travis’ origin, I found it a nice feature). And then travis has sophisticated notification customization available on a per-repo basis. Github notifications are unfortunately much more basic and limited. The only notification settings available are for your entire account at https://github.com/settings/notifications, “GitHub Actions”. So they apply to all github workflows in all projects, there are no workflow- or project-specific settings. You can set to receive notification via web push or email or both or neither. You can receive notifications for all builds or only failed builds. That’s it. The author of a PR is the one who receives the notifications, same as in travis. You will get notifications for every single build, even repeated successes or failures in a series. I’m not super happy with the notification options. I may end up just turning off Github Actions notifications entirely for my account. Hypothetically, someone could probably write a custom Github action to give you notifications exactly how travis offered — after all, travis was using a public GH API that should be available to any other author, and I think should be usable from within an action. But when I started to think through it, while it seemed an interesting project, I realized it was definitely beyond the “spare hobby time” I was inclined to give to it at present, especially not being much of a JS developer (the language of custom GH actions, generally). (While you can list third-party actions on the github “marketplace”, I don’t think there’s a way to charge for them). There are custom third-party actions available to do things like notify slack for build completion; I haven’t looked too much into any of them, beyond seeing that I didn’t see any that would be “like travis defaults”. A more complicated gem: postgres, and Rails matrix Let’s move to a different example workflow file, in a different gem. You can see I called this one ci.yml, matching its name: CI, to have less friction for a developer (including future me) trying to figure out what’s going on. This gem does have rails as a dependency and does test against it, but isn’t actually a Rails engine as it happens. It also needs to test against Postgres, not just sqlite3. Scheduled Builds At one point travis introduced a feature for scheduling (eg) weekly builds even when no PR/commit had been made. I enthusiastically adopted this for my gem projects. Why? Gem releases are meant to work on a variety of different ruby versions and different exact versions of dependencies (including Rails). Sometimes a new release of ruby or rails will break the build, and you want to know about that and fix it. With CI builds happening only on new code, you find out about this with some random new code that is unlikely to be related to the failure; and you only find out about it on the next “new code” that triggers a build after a dependency release, which on some mature and stable gems could be a long time after the actual dependency release that broke it. So scheduled builds for gems! (I have no purpose for scheduled test runs on apps). Github Actions does have this feature. Hooray. One problem is that you will receive no notification of the result of the scheduled build, success or failure. :( I suppose you could include a third-party action to notify a fixed email address or Slack or something else; not sure how you’d configure that to apply only to the scheduled builds and not the commit/PR-triggered builds if that’s what you wanted. (Or make a custom action to file a GH issue on failure??? But make sure it doesn’t spam you with issues on repeated failures). I haven’t had the time to investigate this yet. Also oops just noticed this: “In a public repository, scheduled workflows are automatically disabled when no repository activity has occurred in 60 days.” Which poses some challenges for relying on scheduled builds to make sure a stable slow-moving gem isn’t broken by dependency updates. I am definitely a committer on gems that are still in wide use and can go 6-12+ months without a commit, because they are mature/done. I still have it configured in my workflow; I guess even without notifications it will affect the “badge” on the README, and… maybe I’ll notice? Very far from ideal, work in progress. :(
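For completeness, the scheduled trigger itself is just one more entry under on: in the workflow file — a minimal sketch, with an arbitrary example cron expression (times are UTC):

on:
  push:
    branches: [ master ]
  pull_request:
  schedule:
    # Also run every Monday at 08:00 UTC, even with no new commits
    - cron: '0 8 * * 1'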
Rails Matrix OK, this one needs to test against various ruby versions AND various Rails versions. A while ago I realized that an actual matrix of every ruby combined with every rails was far too many builds. Fortunately, Github Actions supports the same kind of matrix/include syntax as travis, which I use.

matrix:
  include:
    - gemfile: rails_5_0
      ruby: 2.4
    - gemfile: rails_6_0
      ruby: 2.7

I use the appraisal gem to handle setting up testing under multiple rails versions, which I highly recommend. You could use it for testing variant versions of any dependencies, I use it mostly for varying Rails. Appraisal results in a separate Gemfile committed to your repo for each (in my case) rails version, eg ./gemfiles/rails_5_0.gemfile. So those values I use for my gemfile matrix key are actually portions of the Gemfile path I’m going to want to use for each job. Then we just need to tell bundler, in a given matrix job, to use the gemfile we specified in the matrix. The old-school way to do this is with the BUNDLE_GEMFILE environmental variable, but I found it error-prone to make sure it stayed consistently set in each workflow step. I found that the newer (although not that new!) bundle config set gemfile worked swimmingly! I just set it before the bundle install, it stays set for the rest of the run including the actual test run.

steps:
  # [...]
  - name: Bundle install
    run: |
      bundle config set gemfile "${GITHUB_WORKSPACE}/gemfiles/${{ matrix.gemfile }}.gemfile"
      bundle install --jobs 4 --retry 3

Note that single braces are used for ordinary bash syntax to reference the ENV variable ${GITHUB_WORKSPACE}, but double braces for the github actions context value interpolation ${{ matrix.gemfile }}. Works great! Oh, note how we set the name of the job to include both ruby and rails matrix values, important for it showing up legibly in the Github UI: name: ${{ matrix.gemfile }}, ruby ${{ matrix.ruby }}. Because of how we constructed our gemfile matrix, that shows up with job names like rails_5_0, ruby 2.7. Still not using bundler caching in this workflow. As before, we’re concerned about the ruby/setup-ruby built-in bundler-cache feature not working as desired without a Gemfile.lock in the repo. This time, I’m also not sure how to get that feature to play nicely with the variant gemfiles and bundle config set gemfile. Github Actions makes you put a lot more pieces together yourself compared to travis, there are still things I just postponed figuring out for now. update jan 11: the ruby/setup-ruby action now includes a gemfile matrix example in its README: https://github.com/ruby/setup-ruby#matrix-of-gemfiles It does require you to use the BUNDLE_GEMFILE env variable, rather than the bundle config set gemfile command I used here. This should ordinarily be fine, but is something to watch out for in case other instructions you are following try to use bundle config set gemfile instead, for good reasons or not. Postgres This project needs to build against a real postgres. That is relatively easy to set up in Github Actions. Postgres normally by default allows connections on localhost without a username/password set, and my past builds (in travis or locally) took advantage of this to not bother setting one, which the app then didn’t have to know about. But the postgres image used for Github Actions doesn’t allow this, you have to set a username/password.
So the section of the workflow that sets up postgres looks like:

jobs:
  tests:
    services:
      db:
        image: postgres:9.4
        env:
          POSTGRES_USER: postgres
          POSTGRES_PASSWORD: postgres
        ports: ['5432:5432']

5432 is the default postgres port, we need to set it and map it so it will be available as expected. Note you also can specify whatever version of postgres you want, this one is intentionally testing on one a bit old. OK, now our Rails app that will be executed under rspec needs to know that username and password to use in its postgres connection, where before it connected without a username/password. That env under the postgres service image is not actually available to the job steps. I didn’t find any way to DRY the username/password in one place, I had to repeat it in another env block, which I put at the top level of the workflow so it would apply to all steps. And then I had to alter my database.yml to use those ENV variables, in the test environment. On a local dev machine, if your postgres doesn’t have a username/password requirement and you don’t set the ENV variables, it keeps working as before. I also needed to add host: localhost to the database.yml; before, the absence of the host key meant it used a unix-domain socket (filesystem-located) to connect to postgres, but that won’t work in the Github Actions containerized environment. Note, there are things you might see in other examples that I don’t believe you need: No need for an apt-get of pg dev libraries. I think everything you need is on the default GH Actions images now. Some examples I’ve seen do a thing with options: --health-cmd pg_isready, my builds seem to be working just fine without it, and less code is less code to maintain. allow_failures In travis, I took advantage of the travis allow_failures key in most of my gems. Why? I am testing against various ruby and Rails versions; I want to test against *future* (pre-release, edge) ruby and rails versions, because it’s useful to know if I’m already, with no effort, passing on them, and I’d like to keep passing on them — but I don’t want to mandate it, or prevent PR merges if the build fails on a pre-release dependency. (After all, it could very well be a bug in the dependency too!) There is no great equivalent to allow_failures in Github Actions. (Note again, continue-on-error just makes failed jobs look identical to successful jobs, and isn’t very helpful here). I investigated some alternatives, which I may go into more detail on in a future post, but on one project I am trying a separate workflow just for “future ruby/rails allowed failures” which only checks master commits (not PRs), and has a separate badge on the README (which is actually pretty nice for advertising to potential users “Yeah, we ALREADY work on rails edge/6.1.rc1!”). The main downside there is having to copy/paste-synchronize what’s really the same workflow in two files. A Rails app I’m a committer on many more projects that are gems, but I spend more of my time on apps, one app in particular. So here’s an example Github Actions CI workflow for a Rails app. It mostly remixes the features we’ve already seen. It doesn’t need any matrix. It does need a postgres. It does need some “OS-level” dependencies — the app does some shell-out to media utilities like vips and ffmpeg, and there are integration tests that utilize this. Easy enough to just install those with apt-get, works swimmingly.
- name: Install apt dependencies
  run: |
    sudo apt-get -y install libvips-tools ffmpeg mediainfo

Update 25 Nov: My apt-get that worked for a couple weeks started failing for some reason on trying to install a libpulse0 dependency of one of those packages; the solution was doing a sudo apt-get update before the sudo apt-get install. I guess this is always good practice? (That forum post also uses apt install and apt update instead of apt-get install and apt-get update, which I can’t tell you much about, I’m really not a linux admin). In addition to the bundle install, a modern Rails app using webpacker needs yarn install. This just worked for me — no need to include lines for installing npm itself or yarn or any yarn dependencies, although some examples I find online have them. (My yarn installs seem to happen in ~20 seconds, so I’m not motivated to try to figure out caching for yarn). And we need to create the test database in the postgres, which I do with RAILS_ENV=test bundle exec rails db:create — typical Rails test setup will then automatically run migrations if needed. There might be other (better?) ways to prep the database, but I was having trouble getting rake db:prepare to work, and didn’t spend the time to debug it, just went with something that worked.

- name: Set up app
  run: |
    RAILS_ENV=test bundle exec rails db:create
    yarn install

Rails test setup usually ends up running migrations automatically, which is why I think this worked alone, but you could also throw in a RAILS_ENV=test bundle exec rake db:schema:load if you wanted. Under travis I had to install chrome with addons: chrome: stable to have it available to use with capybara via the webdrivers gem. No need for installing chrome in Github Actions, some (recent-ish?) version of it is already there as part of the standard Github Actions build image. In this workflow, you can also see a custom use of the github “cache” action to cache a Solr install that the test setup automatically downloads and sets up. In this case the cache doesn’t actually save us any build time, but is kinder on the apache foundation servers we are downloading from with every build otherwise (and have gotten throttled from in the past). Conclusion Github Actions is a really impressively powerful product. And it’s totally going to work to replace travis for me. It’s also probably going to take more of my time to maintain. The trade-off of more power/flexibility and focusing on almost limitless use cases is more things the individual project has to get right for its use case. For instance figuring out the right configuration to get caching for bundler or yarn right, instead of just writing cache: { yarn: true, bundler: true }. And when you have to figure it out yourself, you can get it wrong, which when you are working on many projects at once means you have a bunch of places to fix. The amazingness of the third-party action “marketplace” means you have to figure out the right action to use (the third-party ruby/setup-ruby instead of the vendor’s actions/setup-ruby), and again if you change your mind about that you have a bunch of projects to update. Anyway, it is what it is — and I’m grateful to have such a powerful and in fact relatively easy to use service available for free! I could not really live without CI anymore, and won’t have to! Oh, and Github Actions is giving me way more (free) simultaneous parallel workers than travis ever did, for my many-job builds!
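P.S. For a concrete picture of that generic cache usage, here is roughly the shape such a step can take — the path, key, and version here are made-up illustrations rather than my project’s actual configuration; an exact-match key means the cached download is reused until you bump the version:

- name: Cache Solr download
  uses: actions/cache@v2
  with:
    # Hypothetical directory the test setup downloads Solr into
    path: tmp/solr_dist
    # Exact-match key: change the version to invalidate the cache
    key: solr-dist-8.7.0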
jrochkind General 7 Comments November 12, 2020 / January 11, 2021 Bibliographic Wilderness is a blog by Jonathan Rochkind about digital library services, ruby, and web development.
blog-dshr-org-2809 ---- DSHR's Blog: Elon Musk: Threat or Menace? DSHR's Blog I'm David Rosenthal, and this is a place to discuss the work I'm doing in Digital Preservation. Tuesday, April 6, 2021 Elon Musk: Threat or Menace? Although both Tesla and SpaceX are major engineering achievements, Elon Musk seems completely unable to understand the concept of externalities, unaccounted-for costs that society bears as a result of these achievements. First, in Tesla: carbon offsetting, but in reverse, Jaime Powell reacted to Tesla taking $1.6B in carbon offsets which provided the only profit Tesla ever made and putting them into Bitcoin: Looked at differently, a single Bitcoin purchase at a price of ~$50,000 has a carbon footprint of 270 tons, the equivalent of 60 ICE cars. Tesla’s average selling price in the fourth quarter of 2020? $49,333. We’re not sure about you, but FT Alphaville is struggling to square the circle of “buy a Tesla with a bitcoin and create the carbon output of 60 internal combustion engine cars” with its legendary environmental ambitions. Unless, of course, that was never the point in the first place. Below the fold, more externalities Musk is ignoring. Second, there is Musk's obsession with establishing a colony on Mars. Even assuming SpaceX can stop their Starship second stage exploding on landing, and do the same with the much bigger first stage, the Mars colony scheme would have massive environmental impacts. Musk envisages a huge fleet of Starships ferrying people and supplies to Mars for between 40 and 100 years. The climate effects of dumping this much rocket exhaust into the upper atmosphere over such a long period would be significant. The idea that a world suffering the catastrophic effects of climate change could sustain such an expensive program over many decades simply for the benefit of a minuscule fraction of the population is laughable. These externalities are in the future. But there is a more immediate set of externalities. Back in 2017 I expressed my skepticism about "Level 5" self-driving cars in Techno-hype part 1, stressing that the problem was that to get to Level 5, or as Musk calls it "Full Self-Driving", you need to pass through the levels where the software has to hand off to the human.
And the closer you get to Level 5, the harder this problem becomes: Suppose, for the sake of argument, that self-driving cars three times as good as Waymo's are in wide use by normal people. A normal person would encounter a hand-off once in 15,000 miles of driving, or less than once a year. Driving would be something they'd be asked to do maybe 50 times in their life. Even if, when the hand-off happened, the human was not "climbing into the back seat, climbing out of an open car window, and even smooching" and had full "situational awareness", they would be faced with a situation too complex for the car's software. How likely is it that they would have the skills needed to cope, when the last time they did any driving was over a year ago, and on average they've only driven 25 times in their life? Current testing of self-driving cars hands-off to drivers with more than a decade of driving experience, well over 100,000 miles of it. It bears no relationship to the hand-off problem with a mass deployment of self-driving technology. Mack Hogan's Tesla's "Full Self Driving" Beta Is Just Laughably Bad and Potentially Dangerous starts: A beta version of Tesla's "Full Self Driving" Autopilot update has begun rolling out to certain users. And man, if you thought "Full Self Driving" was even close to a reality, this video of the system in action will certainly relieve you of that notion. It is perhaps the best comprehensive video at illustrating just how morally dubious, technologically limited, and potentially dangerous Autopilot's "Full Self Driving" beta program is. Hogan sums up the lesson of the video: Tesla's software clearly does a decent job of identifying cars, stop signs, pedestrians, bikes, traffic lights, and other basic obstacles. Yet to think this constitutes anything close to "full self-driving" is ludicrous. There's nothing wrong with having limited capabilities, but Tesla stands alone in its inability to acknowledge its own shortcomings. Hogan goes on to point out the externalities: When technology is immature, the natural reaction is to continue working on it until it's ironed out. Tesla has opted against that strategy here, instead choosing to sell software it knows is incomplete, charging a substantial premium, and hoping that those who buy it have the nuanced, advanced understanding of its limitations—and the ability and responsibility to jump in and save it when it inevitably gets baffled. In short, every Tesla owner who purchases "Full Self-Driving" is serving as an unpaid safety supervisor, conducting research on Tesla's behalf. Perhaps more damning, the company takes no responsibility for its actions and leaves it up to driver discretion to decide when and where to test it out. That leads to videos like this, where early adopters carry out uncontrolled tests on city streets, with pedestrians, cyclists, and other drivers unaware that they're part of the experiment. If even one of those Tesla drivers slips up, the consequences can be deadly. Of course, the drivers are only human so they do slip up: the Tesla arrives at an intersection where it has a stop sign and cross traffic doesn't. It proceeds with two cars incoming, the first car narrowly passing the car's front bumper and the trailing car braking to avoid T-boning the Model 3. It is absolutely unbelievable and indefensible that the driver, who is supposed to be monitoring the car to ensure safe operation, did not intervene there. 
An example of the kinds of problems that can be caused by autonomous vehicles behaving in ways that humans don't expect is reported by Timothy B. Lee in Fender bender in Arizona illustrates Waymo’s commercialization challenge: A white Waymo minivan was traveling westbound in the middle of three westbound lanes on Chandler Boulevard, in autonomous mode, when it unexpectedly braked for no reason. A Waymo backup driver behind the wheel at the time told Chandler police that "all of a sudden the vehicle began to stop and gave a code to the effect of 'stop recommended' and came to a sudden stop without warning." A red Chevrolet Silverado pickup behind the vehicle swerved to the right but clipped its back panel, causing minor damage. Nobody was hurt. The Tesla in the video made a similar unexpected stop. Lee stresses that, unlike Tesla's, Waymo's responsible test program has resulted in a generally safe product, but not one that is safe enough: Waymo has racked up more than 20 million testing miles in Arizona, California, and other states. This is far more than any human being will drive in a lifetime. Waymo's vehicles have been involved in a relatively small number of crashes. These crashes have been overwhelmingly minor with no fatalities and few if any serious injuries. Waymo says that a large majority of those crashes have been the fault of the other driver. So it's very possible that Waymo's self-driving software is significantly safer than a human driver. ... The more serious problem for Waymo is that the company can't be sure that the idiosyncrasies of its self-driving software won't contribute to a more serious crash in the future. Human drivers cause a fatality about once every 100 million miles of driving—far more miles than Waymo has tested so far. If Waymo scaled up rapidly, it would be taking a risk that an unnoticed flaw in Waymo's programming could lead to someone getting killed. I'm a pedestrian, cyclist and driver in an area infested with Teslas owned, but potentially not actually being driven, by fanatical early adopters and members of the cult of Musk. I'm personally at risk from these people believing that what they paid good money for was "Full Self Driving". When SpaceX tests Starship at their Boca Chica site they take precautions, including road closures, to ensure innocent bystanders aren't at risk from the rain of debris when things go wrong. Tesla, not so much. Of course, Tesla doesn't tell the regulators that what the cult members paid for was "Full Self Driving"; that might cause legal problems. As Timothy B. Lee reports, Tesla: “Full self-driving beta” isn’t designed for full self-driving: "Despite the "full self-driving" name, Tesla admitted it doesn't consider the current beta software suitable for fully driverless operation. The company said it wouldn't start testing "true autonomous features" until some unspecified point in the future. ... Tesla added that "we do not expect significant enhancements" that would "shift the responsibility for the entire dynamic driving task to the system." The system "will continue to be an SAE Level 2, advanced driver-assistance feature." SAE level 2 is industry jargon for driver-assistance systems that perform functions like lane-keeping and adaptive cruise control. By definition, level 2 systems require continual human oversight. Fully driverless systems—like the taxi service Waymo is operating in the Phoenix area—are considered level 4 systems."
There is an urgent need for regulators to step up and stop this dangerous madness: The NHTSA should force Tesla to disable "Full Self Driving" in all its vehicles until the technology has passed an approved test program. Any vehicles taking part in such a test program on public roads should be clearly distinguishable from Teslas being driven by actual humans, for example with orange flashing lights. Self-driving test vehicles from less irresponsible companies such as Waymo are distinguishable in this way, Teslas in which some cult member has turned on "Full Self Driving Beta" are not. The FTC should force Tesla to refund, with interest, every dollar paid by their customers under the false pretense that they were paying for "Full Self Driving". Posted by David. at 8:00 AM Labels: techno-hype 5 comments: David. said... Aaron Gordon's This Is the Most Embarrassing News Clip in American Transportation History is a brutal takedown of yet another of Elon Musk's fantasies: "Last night, Shepard Smith ran a segment on his CNBC show revealing Elon Musk's Boring Company's new Las Vegas car tunnel, which was paid for by $50 million in taxpayer dollars. It is one of the most bizarre and embarrassing television segments in American transportation history, a perfect cap for one of the most bizarre and embarrassing transportation projects in American history." April 11, 2021 at 7:20 AM David. said... Eric Berger's A new documentary highlights the visionary behind space settlement reviews The High Frontier: The Untold Story of Gerard K. O'Neill: "O'Neill popularized the idea of not just settling space, but of doing so in free space rather than on the surface of other planets or moons. His ideas spread through the space-enthusiast community at a time when NASA was about to debut its space shuttle, which first flew in 1981. NASA had sold the vehicle as offering frequent, low-cost access to space. It was the kind of transportation system that allowed visionaries like O'Neill to think about what humans could do in space if getting there were cheaper. The concept of "O'Neill cylinders" began with a question he posed to his physics classes at Princeton: "Is a planetary surface the right place for an expanding industrial civilization?" As it turned out, following their analysis, the answer was no. Eventually, O'Neill and his students came to the idea of free-floating, rotating, cylindrical space colonies that could have access to ample solar energy." However attractive the concept is in the far future, I need to point out that pursuing it before the climate crisis has been satisfactorily resolved will make the lives of the vast majority of humanity worse for the benefit of a tiny minority. April 11, 2021 at 4:24 PM David. said... ‘No one was driving the car’: 2 men dead after fiery Tesla crash in Spring, officials say: "Harris County Precinct 4 Constable Mark Herman told KPRC 2 that the investigation showed “no one was driving” the fully-electric 2019 Tesla when the accident happened. There was a person in the passenger seat of the front of the car and in the rear passenger seat of the car." April 18, 2021 at 10:27 AM David. said... Timothy B. Lee's Consumer Reports shows Tesla Autopilot works with no one in the driver’s seat reports: "Tesla defenders also insisted that Autopilot couldn't have been active because the technology doesn't operate unless someone is in the driver's seat.
Consumer Reports decided to test this latter claim by seeing if it could get Autopilot to activate without anyone in the driver's seat. It turned out not to be very difficult. Sitting in the driver's seat, Consumer Reports' Jake Fisher enabled Autopilot and then used the speed dial on the steering wheel to bring the car to a stop. He then placed a weighted chain on the steering wheel (to simulate pressure from a driver's hands) and hopped into the passenger seat. From there, he could reach over and increase the speed using the speed dial. Autopilot won't function unless the driver's seatbelt is buckled, but it was also easy to defeat this check by threading the seatbelt behind the driver. ... the investigation makes clear that activating Autopilot without being in the driver's seat requires deliberately disabling safety measures. Fisher had to buckle the seatbelt behind himself, put a weight on the steering wheel, and crawl over to the passenger seat without opening any doors. Anybody who does that knows exactly what they're doing. Tesla fans argue that people who deliberately bypass safety measures like this have only themselves to blame if it leads to a deadly crash." Well, yes, but Musk's BS has been convincing them to try stunts like this for years. He has to be held responsible, and he has to disable "Full Self Driving" before some innocent bystanders get killed. April 22, 2021 at 2:57 PM David. said... This Automotive News editorial is right but misses the bigger picture: "Tesla's years of misleading consumers about its vehicles' "full self-driving" capabilities — or lack thereof — claimed two more lives this month. ... When critics say the term "autopilot" gives the impression that the car can drive without oversight, Tesla likes to argue that that's based on an erroneous understanding of airplanes' systems. But the company exploits consumers' overconfidence in that label with the way the feature is sold and promoted without correction among Tesla's fanatical online community. Those practices encourage misunderstanding and misuse. In public, Musk says the company is very close to full SAE Level 5 automated driving. In conversations with regulators, the company admits that Autopilot and Full Self-Driving are Level 2 driver-assist suites, not unlike those sold by many other automakers. This nation does not have a good track record of holding manufacturers accountable when their products are misused by the public, which is what happened in this case." It isn't just the Darwin Award winners at risk, it is innocent bystanders at risk. April 27, 2021 at 8:50 AM
blog-dshr-org-397 ---- DSHR's Blog: A Note On Blockchains DSHR's Blog I'm David Rosenthal, and this is a place to discuss the work I'm doing in Digital Preservation. Tuesday, October 6, 2020 A Note On Blockchains Blockchains have three components, a data structure, a set of replicas, and a consensus mechanism: The data structure is often said to provide immutability or to be tamper-proof, but this is wrong. It is made out of bits, and bits can be changed or destroyed. What it actually provides is tamper-evidence, revealing that the data structure has changed. If an unauthorized change to the data structure is detected the damage must be repaired. So there must be multiple replicas of the data structure to allow an undamaged replica to be copied to the damaged replica. The role of the consensus mechanism is to authorize changes to the data structure, and prevent unauthorized changes. A change is authorized if the consensus of the replicas agrees to it. Below the fold, some details. Data Structure The data structure used for blockchains is a form of Merkle or hash tree, published by Ralph Merkle in 1980.
In the blockchain application it is a linear chain to which fixed-size blocks are added at regular intervals. Each block contains the hash of its predecessor; a chain of blocks. Hash algorithms have a limited lifetime, but while the hash algorithm remains unbroken it is extremely difficult to change blocks in the chain while maintaining the same hash values. A change that does not maintain the same hash values is easy to detect. Replicas The set of replicas can be either closed, composed of only replicas approved by some authority, or open, in which case no approval is required for participation. In blockchain jargon, closed replica sets correspond to permissioned blockchains, and open replica sets to permissionless blockchains. Consensus Mechanism (Faults tolerated f vs. replicas required 3f+1: 1 → 4, 2 → 7, 3 → 10, 4 → 13, 5 → 16, 6 → 19.) An important result in theoretical computer science was published in The Byzantine Generals Problem by Lamport et al in 1982. They showed that the minimum size of a replica set to survive f simultaneous failures was 3f+1. Thus Byzantine Fault Tolerance (BFT) is the most efficient possible consensus mechanism in terms of number of replicas. BFT requires a closed replica set, and synchronized operation of the replicas, so can be used only in permissioned blockchains. If joining the replica set of a permissionless blockchain is free, it will be vulnerable to Sybil attacks, in which an attacker creates many apparently independent replicas which are actually under his sole control. If creating and maintaining a replica is free, anyone can authorize any change they choose simply by creating enough Sybil replicas. Defending against Sybil attacks requires that membership in a replica set be expensive. The cost of an attack is at least the membership cost of half the replica set, so that the attacker controls a majority of the replicas. Permissionless blockchains have implemented a number of ways to make it expensive to take part, including: Proof of Work (PoW), a concept originated by Cynthia Dwork and Moni Naor in 1992, in which the expensive resource is CPU cycles. This is the "mining" technique used by Bitcoin, and is the only technique that has been demonstrated to work well at scale. But at scale the cost and environmental damage is unsustainable; the top 5 cryptocurrencies are estimated to use as much energy as The Netherlands. At smaller scales it doesn't work well because renting 51% of the mining power is cheap enough to motivate attacks. 51% attacks have become endemic among the smaller alt-coins. For example, there were three successful attacks on Ethereum Classic in a single month. Proof of Stake (PoS), in which the expensive resource is capital tied up, or staked. Participants stand to lose their stake in case of detected misbehavior. The Ethereum blockchain has been trying to implement PoS for 5 years, so far without success. The technique has similar economic limits and vulnerabilities to PoW. Proofs of Time & Space (PoTS), advocated by Bram Cohen, in which the expensive resource is disk storage. Conclusion Eric Budish points out the fundamental problem with expensive defenses in The Economic Limits of Bitcoin and the Blockchain: From a computer security perspective, the key thing to note ... is that the security of the blockchain is linear in the amount of expenditure on mining power, ... In contrast, in many other contexts investments in computer security yield convex returns (e.g., traditional uses of cryptography) ...
analogously to how a lock on a door increases the security of a house by more than the cost of the lock. The difference between permissioned and permissionless blockchains is the presence or absence of a trusted authority controlling the replica set. A decision not to trust such an authority imposes enormous additional costs and performance penalties on the system because the permissionless consensus mechanism has to be expensive. Decentralization in Bitcoin and Ethereum Networks by Adem Efe Gencer et al compares the cost of a permissioned system using BFT to the actual Bitcoin PoW blockchain: a Byzantine quorum system of size 20 could achieve better decentralization than proof-of-work mining at a much lower resource cost. As an Englishman I appreciate understatement. By "much lower", they mean around 5 orders of magnitude lower. Posted by David. at 8:00 AM Labels: bitcoin 2 comments: David. said... Going from Bad to Worse: From Internet Voting to Blockchain Voting by Sunoo Park, Neha Narula, Michael Specter and Ronald L. Rivest argues that: "given the current state of computer security, any turnout increase derived from Internet- or blockchain-based voting would come at the cost of losing meaningful assurance that votes have been counted as they were cast, and not undetectably altered or discarded. This state of affairs will continue as long as standard tactics such as malware, zero days, and denial-of-service attacks continue to be effective. This article analyzes and systematizes prior research on the security risks of online and electronic voting, and show that not only do these risks persist in blockchain-based voting systems, but blockchains may introduce additional problems for voting systems." November 19, 2020 at 8:48 AM Michael Hogan said... Which is why voting systems still include paper records, and probably will always include paper records. It calls to mind my oft-stated admonition to amateur futurists, that all of the cool stuff in our increasingly digitized world still relies to far too great an extent on a technology commercialized in 1882 (burning fossil fuels in a boiler to spin a turbine-generator), and even the dominant battery chemistry is about 50 years old. Beware of the "TED talk" mindset - be on the lookout for the dirty old smelter behind that shiny penny. November 28, 2020 at 4:04 AM
blog-dshr-org-4 ---- DSHR's Blog I'm David Rosenthal, and this is a place to discuss the work I'm doing in Digital Preservation. Thursday, April 22, 2021 Dogecoin Disrupts Bitcoin! Two topics I've posted about recently, Elon Musk's cult and the illusory "prices" of cryptocurrencies, just intersected in spectacular fashion. On April 14 the Bitcoin "price" peaked at $63.4K. Early on April 15, the Musk cult saw this tweet from their prophet. Immediately, the Dogecoin "price" took off like a Falcon 9. A day later, Jemima Kelley reported that If you believe, they put a Dogecoin on the moon. That was to say that: Dogecoin — the crypto token that was started as a joke and that is the favourite of Elon Musk — is having a bit of a moment. And when we say a bit of a moment, we mean that it is on a lunar trajectory (in crypto talk: it is going to da moon).
At the time of writing this, it is up over 200 per cent in the past 24 hours — more than tripling in value (for those of you who need help on percentages, it is Friday afternoon after all). Over the past week it’s up more than 550 per cent (almost seven times higher!). The headlines tell the story — Timothy B. Lee's Dogecoin has risen 400 percent in the last week because why not and Joanna Ossinger's Dogecoin Rips in Meme-Fueled Frenzy on Pot-Smoking Holiday. The Dogecoin "price" graph Kelly posted was almost vertical. The same day, Peter Schiff, the notorious gold-bug, tweeted: So far in 2021 #Bitcoin has lost 97% of its value verses #Dogecoin. The market has spoken. Dogecoin is eating Bitcoin. All the Bitcoin pumpers who claim Bitcoin is better than gold because its price has risen more than gold's must now concede that Dogecoin is better than Bitcoin. Below the fold I look back at this revolution in crypto-land. Read more » Posted by David. at 9:00 AM 1 comment: Labels: bitcoin What Is The Point? During a discussion of NFTs, Larry Masinter pointed me to his 2012 proposal The 'tdb' and 'duri' URI schemes, based on dated URIs. The proposal's abstract reads: This document defines two URI schemes. The first, 'duri' (standing for "dated URI"), identifies a resource as of a particular time. This allows explicit reference to the "time of retrieval", similar to the way in which bibliographic references containing URIs are often written. The second scheme, 'tdb' ( standing for "Thing Described By"), provides a way of minting URIs for anything that can be described, by the means of identifying a description as of a particular time. These schemes were posited as "thought experiments", and therefore this document is designated as Experimental. As far as I can tell, this proposal went nowhere, but it raises a question that is also raised by NFTs. What is the point of a link that is unlikely to continue to resolve to the expected content? Below the fold I explore this question. Read more » Posted by David. at 8:00 AM No comments: Labels: personal digital preservation, web archiving Thursday, April 15, 2021 NFTs and Web Archiving One of the earliest observations of the behavior of the Web at scale was "link rot". There were a lot of 404s, broken links. Research showed that the half-life of Web pages was alarmingly short. Even in 1996 this problem was obvious enough for Brewster Kahle to found the Internet Archive to address it. From the Wikipedia entry for Link Rot: A 2003 study found that on the Web, about one link out of every 200 broke each week,[1] suggesting a half-life of 138 weeks. This rate was largely confirmed by a 2016–2017 study of links in Yahoo! Directory (which had stopped updating in 2014 after 21 years of development) that found the half-life of the directory's links to be two years.[2] One might have thought that academic journals were a relatively stable part of the Web, but research showed that their references decayed too, just somewhat less rapidly. A 2013 study found a half-life of 9.3 years. See my 2015 post The Evanescent Web. I expect you have noticed the latest outbreak of blockchain-enabled insanity, Non-Fungible Tokens (NFTs). Someone "paying $69M for a JPEG" or $560K for a New York Times column attracted a lot of attention. Follow me below the fold for the connection between NFTs, "link rot" and Web archiving. Read more » Posted by David. 
at 8:00 AM 2 comments: Labels: bitcoin, distributed web, web archiving Tuesday, April 13, 2021 Cryptocurrency's Carbon Footprint China’s bitcoin mines could derail carbon neutrality goals, study says and Bitcoin mining emissions in China will hit 130 million tonnes by 2024, the headlines say it all. Excusing this climate-destroying externality of Proof-of-Work blockchains requires a continuous flow of new misleading arguments. Below the fold I discuss one of the more recent novelties. Read more » Posted by David. at 8:00 AM 5 comments: Labels: bitcoin, security Tuesday, April 6, 2021 Elon Musk: Threat or Menace? Although both Tesla and SpaceX are major engineering achievements, Elon Musk seems completely unable to understand the concept of externalities, unaccounted-for costs that society bears as a result of these achievements. First, in Tesla: carbon offsetting, but in reverse, Jaime Powell reacted to Tesla taking $1.6B in carbon offsets which provided the only profit Tesla ever made and putting them into Bitcoin: Looked at differently, a single Bitcoin purchase at a price of ~$50,000 has a carbon footprint of 270 tons, the equivalent of 60 ICE cars. Tesla’s average selling price in the fourth quarter of 2020? $49,333. We’re not sure about you, but FT Alphaville is struggling to square the circle of “buy a Tesla with a bitcoin and create the carbon output of 60 internal combustion engine cars” with its legendary environmental ambitions. Unless, of course, that was never the point in the first place. Below the fold, more externalities Musk is ignoring. Read more » Posted by David. at 8:00 AM 5 comments: Labels: techno-hype Thursday, March 25, 2021 Internet Archive Storage The Internet Archive is a remarkable institution, which has become increasingly important during the pandemic. It has been for many years in the world's top 300 Web sites and is currently ranked #209, sustaining almost 60Gb/s outbound bandwidth from its collection of almost half a trillion archived Web pages and much other content. It does this on a budget of under $20M/yr, yet maintains 99.98% availability. Jonah Edwards, who runs the Core Infrastructure team, gave a presentation on the Internet Archive's storage infrastructure to the Archive's staff. Below the fold, some details and commentary. Read more » Posted by David. at 8:00 AM 1 comment: Labels: storage costs, storage failures, storage media Tuesday, March 16, 2021 Correlated Failures The invaluable statistics published by Backblaze show that, despite being built from technologies close to the physical limits (Heat-Assisted Magnetic Recording, 3D NAND Flash), modern digital storage media are extraordinarily reliable. However, I have long believed that the models that attempt to project the reliability of digital storage systems from the statistics of media reliability are wildly optimistic. They ignore foreseeable causes of data loss such as Coronal Mass Ejections and ransomware attacks, which cause correlated failures among the media in the system. No matter how many they are, if all replicas are destroyed or corrupted the data is irrecoverable. Modelling these "black swan" events is clearly extremely difficult, but much less dramatic causes are in practice important too. It has been known at least since Talagala's 1999 Ph.D. thesis that media failures in storage systems are significantly correlated, and at least since Jiang et al's 2008 Are Disks the Dominant Contributor for Storage Failures? 
A Comprehensive Study of Storage Subsystem Failure Characteristics that only about half the failures in storage systems are traceable to media failures. The rest happen in the pipeline from the media to the CPU. Because this typically aggregates data from many media components, it naturally causes correlations. As I wrote in 2015's Disk reliability, discussing Backblaze's experience of a 40% Annual Failure Rate (AFR) in over 1,100 Seagate 3TB drives: Alas, there is a long history of high failure rates among particular batches of drives. An experience similar to Backblaze's at Facebook is related here, with an AFR over 60%. My first experience of this was nearly 30 years ago in the early days of Sun Microsystems. Manufacturing defects, software bugs, mishandling by distributors, vibration resonance, there are many causes for these correlated failures. Despite plenty of anecdotes, there is little useful data on which to base models of correlated failures in storage systems. Below the fold I summarize and comment on an important paper by a team from the Chinese University of Hong Kong and Alibaba that helps remedy this. Read more » Posted by David. at 8:00 AM No comments: Labels: fault tolerance, storage failures, storage media
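The point about correlated failures can be made with a one-line calculation. This is a minimal sketch with invented numbers; none of these rates come from the post or from Backblaze's data. If replica failures were independent, three replicas would make loss vanishingly rare, but even a small probability of an event that destroys all replicas at once (ransomware, a coronal mass ejection, a bad drive batch) dominates the total risk.

    # Back-of-the-envelope: why correlated failures dominate data-loss risk.
    # All rates below are illustrative assumptions.
    replicas = 3
    p_fail = 0.02          # assumed independent annual failure probability per replica
    p_correlated = 0.001   # assumed annual probability of an event hitting all replicas at once

    p_loss_independent = p_fail ** replicas
    p_loss_with_correlation = p_correlated + (1 - p_correlated) * p_loss_independent

    print(f"independent failures only : {p_loss_independent:.2e}")      # 8.00e-06
    print(f"with a correlated threat  : {p_loss_with_correlation:.2e}") # ~1.0e-03, over 100x worse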
blog-dshr-org-5180 ---- DSHR's Blog: Cryptocurrency's Carbon Footprint DSHR's Blog I'm David Rosenthal, and this is a place to discuss the work I'm doing in Digital Preservation. Tuesday, April 13, 2021 Cryptocurrency's Carbon Footprint China’s bitcoin mines could derail carbon neutrality goals, study says and Bitcoin mining emissions in China will hit 130 million tonnes by 2024, the headlines say it all. Excusing this climate-destroying externality of Proof-of-Work blockchains requires a continuous flow of new misleading arguments. Below the fold I discuss one of the more recent novelties.

In Bitcoin and Ethereum Carbon Footprints – Part 2, Moritz Seibert claims the reason for mining is to get the mining reward: Bitcoin transactions themselves don’t cause a lot of power usage. Getting the network to accept a transaction consumes almost no power, but having ASIC miners grind through the mathematical ether to solve valid blocks does. Miners are incentivized to do this because they are compensated for it. Presently, that compensation includes a block reward which is paid in bitcoin (6.25 BTC per block) as well as a miner fee (transaction fee). Transaction fees are denominated in fractional bitcoins and paid by the initiator of the transaction. Today, about 15% of total miners’ rewards are transaction fees, and about 85% are block rewards.

So, he argues, Bitcoin's current catastrophic carbon footprint doesn't matter because, as the reward decreases, so will the carbon footprint: This also means that the power usage of the Bitcoin network won’t scale linearly with the number of transactions as the network becomes predominantly fee-based and less rewards-based (which causes a lot of power to be thrown at it in light of increasing BTC prices), and especially if those transactions take place on secondary layers. In other words, taking the ratio of “Bitcoin’s total power usage” to “Number of transactions” to calculate the “Power cost per transaction” falsely implies that all transactions hit the final settlement layer (they don’t) and disregards the fact that the final state of the Bitcoin base layer is a fee-based state which requires a very small fraction of Bitcoin’s overall power usage today (no more block rewards).

Seibert has some vague idea that there are implications of this not just for the carbon footprint but also for the security of the Bitcoin blockchain: Going forward however, miners’ primary revenue source will change from block rewards to the fees paid for the processing of transactions, which don’t per se cause high carbon emissions.
Bitcoin is set to become be a purely fee-based system (which may pose a risk to the security of the system itself if the overall hash rate declines, but that’s a topic for another article because a blockchain that is fully reliant on fees requires that BTCs are transacted with rather than held in Michael Saylor-style as HODLing leads to low BTC velocity, which does not contribute to security in a setup where fees are the only rewards for miners.) Lets leave aside the stunning irresponsibility of arguing that it is acceptable to dump huge amounts of long-lasting greenhouse gas into the atmosphere now because you believe that in the future you will dump less. How realistic is the idea that decreasing the mining reward will decrease the carbon footprint? The graph shows the history of the hash rate, which is a proxy for the carbon footprint. You can see the effect of the "halvening", when on May 11th 2020 the mining reward halved. There was a temporary drop, but the hash rate resumed its inexorable rise. This experiment shows that reducing the mining reward doesn't reduce the carbon footprint. So why does Seibert think that eliminating it will reduce the carbon footprint? The answer appears to be that Seibert thinks the purpose of mining is to create new Bitcoins, that the reason for the vast expenditure of energy is to make the process of creating new coins secure, and that it has nothing to do with the security of transactions. This completely misunderstands the technology. In The Economic Limits of Bitcoin and the Blockchain, Eric Budish examines the return on investment in two kinds of attacks on a blockchain like Bitcoin's. The simpler one is a 51% attack, in which an attacker controls the majority of the mining power. Budish explains what this allows the attacker to do: An attacker could (i) spend Bitcoins, i.e., engage in a transaction in which he sends his Bitcoins to some merchant in exchange for goods or assets; then (ii) allow that transaction to be added to the public blockchain (i.e., the longest chain); and then subsequently (iii) remove that transaction from the public blockchain, by building an alternative longest chain, which he can do with certainty given his majority of computing power. The merchant, upon seeing the transaction added to the public blockchain in (ii), gives the attacker goods or assets in exchange for the Bitcoins, perhaps after an escrow period. But, when the attacker removes the transaction from the public blockchain in (iii), the merchant effectively loses his Bitcoins, allowing the attacker to “double spend” the coins elsewhere. Such attacks are endemic among the smaller alt-coins; for example there were three successful attacks on Ethereum Classic in a single month last year. Clearly, Seibert's future "transaction only" Bitcoin must defend against them. There are two ways to mount a 51% attack, from the outside or from the inside. An outside attack requires more mining power than the insiders are using, whereas an insider attack only needs a majority of the mining power to conspire. Bitcoin miners collaborate in "mining pools" to reduce volatility of their income, and for many years it would have taken only three or so pools to conspire for a successful attack. But assuming insiders are honest, outsiders must acquire more mining power than the insiders are using. Clearly, Bitcoin insiders are using so much mining power that this isn't feasible. The point of mining isn't to create new Bitcoins. 
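A toy calculation makes the economics concrete. This is a minimal sketch with made-up numbers: the honest-mining spend, attack duration and double-spend value are invented, and the fee figures are round numbers of the same order as the snapshot quoted later in the post, not market data.

    # Back-of-the-envelope sketch of two points made in the post.
    # All numbers here are illustrative assumptions.

    def attack_profitable(honest_spend_per_day, attack_days, double_spend_gain):
        # Budish's point: the defence is only as strong as the ongoing spend,
        # so the attack cost scales linearly with what honest miners spend.
        attack_cost = honest_spend_per_day * attack_days
        return double_spend_gain > attack_cost

    # At Bitcoin's scale (assumed ~$50M/day of honest mining spend) a half-day
    # attack to double spend $10M does not pay...
    print(attack_profitable(50_000_000, 0.5, 10_000_000))   # False

    # ...but at a small alt-coin's scale (assumed ~$50K/day) the same double
    # spend is wildly profitable, which is why 51% attacks are endemic
    # among the smaller coins.
    print(attack_profitable(50_000, 0.5, 10_000_000))        # True

    # A fee-only Bitcoin with unchanged security and block size: fees must
    # replace the revenue the block reward used to provide.
    fee_share = 0.08                     # ~8% of miner revenue from fees (post's snapshot)
    current_avg_fee = 20                 # ~$20 average fee (post's snapshot)
    required_multiplier = 1 / fee_share  # fees must rise ~12x to keep miner revenue constant
    print("average fee in a fee-only system: about $%.0f" %
          (current_avg_fee * required_multiplier))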
Mining is needed to make the process of adding a block to the chain, and thus adding a set of transactions to the chain, so expensive that it isn't worth it for an attacker to subvert the process. The cost, and thus in the case of Proof of Work the carbon footprint, is the whole point. As Budish wrote: From a computer security perspective, the key thing to note ... is that the security of the blockchain is linear in the amount of expenditure on mining power, ... In contrast, in many other contexts investments in computer security yield convex returns (e.g., traditional uses of cryptography) — analogously to how a lock on a door increases the security of a house by more than the cost of the lock.

Lets consider the possible futures of a fee-based Bitcoin blockchain. It turns out that currently fee revenue is a smaller proportion of total miner revenue than Seibert claims. Here is the chart of total revenue (~$60M/day): And here is the chart of fee revenue (~$5M/day): Thus the split is about 8% fee, 92% reward:

- If security stays the same, blocksize stays the same, fees must increase to keep the cost of a 51% attack high enough. The chart shows the average fee hovering around $20, so the average cost of a single transaction would be over $240. This might be a problem for Seibert's requirement that "BTCs are transacted with rather than held".

- If blocksize stays the same, fees stay the same, security must decrease because the fees cannot cover the cost of enough hash power to deter a 51% attack. Similarly, in this case it would be 12 times cheaper to mount a 51% attack, which would greatly increase the risk of delivering anything in return for Bitcoin. It is already the case that users are advised to wait 6 blocks (about an hour) before treating a transaction as final. Waiting nearly half a day before finality would probably be a disincentive.

- If fees stay the same, security stays the same, blocksize must increase to allow for enough transactions so that their fees cover the cost of enough hash power to deter a 51% attack. Since 2017 Bitcoin blocks have been effectively limited to around 2MB, and the blockchain is now over one-third of a Terabyte, growing at over 25%/yr. Increasing the size limit to say 22MB would solve the long-term problem of a fee-based system at the cost of reducing miners' income in the short term by reducing the scarcity value of a slot in a block. Doubling the effective size of the block caused a huge controversy in the Bitcoin community for precisely this short vs. long conflict, so a much larger increase would be even more controversial. Not to mention that the size of the blockchain a year from now would be 3 times bigger, imposing additional storage costs on miners.

That is just the supply side. On the demand side it is an open question as to whether there would be 12 times the current demand for transactions costing $20 and taking an hour which, at least in the US, must each be reported to the tax authorities.

[Chart: Short vs. Long]

None of these alternatives look attractive. But there's also a second type of attack in Budish's analysis, which he calls "sabotage". He quotes Rosenfeld: In this section we will assume q < p [i.e., that the attacker does not have a majority]. Otherwise, all bets are off with the current Bitcoin protocol ... The honest miners, who no longer receive any rewards, would quit due to lack of incentive; this will make it even easier for the attacker to maintain his dominance. This will cause either the collapse of Bitcoin or a move to a modified protocol.
As such, this attack is best seen as an attempt to destroy Bitcoin, motivated not by the desire to obtain Bitcoin value, but rather wishing to maintain entrenched economical systems or obtain speculative profits from holding a short position. Short interest in Bitcoin is currently small relative to the total stock, but much larger relative to the circulating supply. Budish analyzes various sabotage attack cases, with a parameter ∆attack representing the proportion of the Bitcoin value destroyed by the attack: For example, if ∆attack = 1, i.e., if the attack causes a total collapse of the value of Bitcoin, the attacker loses exactly as much in Bitcoin value as he gains from double spending; in effect, there is no chance to “double” spend after all. ... However, ∆attack is something of a “pick your poison” parameter. If ∆attack is small, then the system is vulnerable to the double-spending attack ... and the implicit transactions tax on economic activity using the blockchain has to be high. If ∆attack is large, then a short time period of access to a large amount of computing power can sabotage the blockchain. The current cryptocurrency bubble ensures that everyone is making enough paper profits from the golden eggs to deter them from killing the goose that lays them. But it is easy to create scenarios in which a rush for the exits might make killing the goose seem like the best way out. Seibert's misunderstanding illustrates the fundamental problem with permissionless blockchains. As I wrote in A Note On Blockchains: If joining the replica set of a permissionless blockchain is free, it will be vulnerable to Sybil attacks, in which an attacker creates many apparently independent replicas which are actually under his sole control. If creating and maintaining a replica is free, anyone can authorize any change they choose simply by creating enough Sybil replicas. Defending against Sybil attacks requires that membership in a replica set be expensive. There are many attempts to provide less environmentally damaging ways to make adding a block to a blockchain expensive, but attempts to make adding a block cheaper are self-defeating because they make the blockchain less secure. There are two reasons why the primary use of a permissionless blockchain cannot be transactions as opposed to HODL-ing: The lack of synchronization between the peers means that transactions must necessarily be slow. The need to defend against Sybil attacks means either that transactions must necessarily be expensive, or that blocks must be impractically large. Posted by David. at 8:00 AM Labels: bitcoin, security 5 comments: David. said... Seibert apparently believes (a) that a fee-only Bitcoin network would be secure, used for large numbers of transactions, and have a low carbon footprint, and (b) that the network would have a low carbon footprint because most transactions would use the Lightning network. Ignoring the contradiction, anyone who believes that the Lightning network would do the bulk of the transactions needs to read the accounts of people actually trying to transact using it. David Gerard writes: "Crypto guy loses a bet, and tries to pay the bet using the Lightning Network. Hilarity ensues." Indeed, the archived Twitter thread from the loser is a laugh-a-minute read. April 20, 2021 at 7:16 PM David. said... 
Jaime Powell shreds another attempt at cryptocurrency carbon footprint gaslighting in The destructive green fantasy of the bitcoin fanatics: "It is in this context that we should consider the latest “research” from the good folks at ETF-house-come-fund manager ARK Invest and $113bn payment company Square. Titled “Bitcoin is Key to an Abundant, Clean Energy Future”, it does exactly what you’d expect it to. Which is to try justify, after the fact, bitcoin’s insane energy use. Why? Because both entities are deeply involved in this “space” and now need to a) feel better about themselves and b) guard against people going off crypto on the grounds that it is actually a Very Bad Thing. ... The white paper imagines bitcoin mining being a solution, alongside battery storage, for excess energy. It also imagines that if solar and wind prices continue to collapse, bitcoin could eventually transition to being completely renewable-powered in the future. “Imagines” is the key word here. Because in reality, bitcoin mining is quite the polluter. It’s estimated that 72 per cent of bitcoin mining is concentrated in China, where nearly two-thirds of all electricity is generated by coal power, according to a recent Bank of America report. In fact, mining uses coal power so aggressively that when one coal mine flooded and shut down in Xianjiang province over the weekend, one-third of all bitcoin’s computing power went offline." April 25, 2021 at 5:00 PM David. said... In Jack Dorsey and Elon Musk agree on bitcoin's green credentials the BBC reports on yet another of Elon Musk's irresponsible cryptocurrency tweets: "The tweet comes soon after the release of a White Paper from Mr Dorsey's digital payment services firm Square, and global asset management business ARK Invest. Entitled "Bitcoin as key to an abundant, clean energy future", the paper argues that "bitcoin miners are unique energy buyers", because they offer flexibility, pay in a cryptocurrency, and can be based anywhere with an internet connection." The BBC fails to point out that Musk and Dorsey are "talking their book"; Tesla invested $1.6B and Square $220M in Bitcoin. So they have over $1.8B reasons to worry about efforts to limit its carbon footprint. April 25, 2021 at 5:10 PM David. said... This comment has been removed by the author. April 25, 2021 at 5:45 PM David. said... Nathan J. Robinson's Why Cryptocurrency Is A Giant Fraud has an interesting footnote, discussing a "pseudoscholarly masterpiece" of Bitcoin puffery by Vijay Boyapati: "Interestingly, Boyapati cites Bitcoin’s high transaction fees as a feature rather than a bug: “A recent criticism of the Bitcoin network is that the increase in fees to transmit bitcoins makes it unsuitable as a payment system. However, the growth in fees is healthy and expected… A network with ‘low’ fees is a network with little security and prone to external censorship. Those touting the low fees of Bitcoin alternatives are unknowingly describing the weakness of these so-called ‘alt-coins.’” As you can see, this successfully makes the case that high fees are unavoidable, but it also undermines the reasons why any sane person would use this as currency rather than a speculative investment." Right! A permissionless blockchain has to be expensive to run if it is to be secure. Those costs have either to be borne, ultimately, by the blockchain's users, or dumped on the rest of us as externalities (e.g. the blockchain's carbon footprint, the shortage of GPUs, ...). 
April 25, 2021 at 5:55 PM

blog-dshr-org-6705 ---- DSHR's Blog: Dogecoin Disrupts Bitcoin! DSHR's Blog I'm David Rosenthal, and this is a place to discuss the work I'm doing in Digital Preservation. Thursday, April 22, 2021 Dogecoin Disrupts Bitcoin! Two topics I've posted about recently, Elon Musk's cult and the illusory "prices" of cryptocurrencies, just intersected in spectacular fashion. On April 14 the Bitcoin "price" peaked at $63.4K. Early on April 15, the Musk cult saw this tweet from their prophet. Immediately, the Dogecoin "price" took off like a Falcon 9.
A day later, Jemima Kelley reported that If you believe, they put a Dogecoin on the moon. That was to say that: Dogecoin — the crypto token that was started as a joke and that is the favourite of Elon Musk — is having a bit of a moment. And when we say a bit of a moment, we mean that it is on a lunar trajectory (in crypto talk: it is going to da moon). At the time of writing this, it is up over 200 per cent in the past 24 hours — more than tripling in value (for those of you who need help on percentages, it is Friday afternoon after all). Over the past week it’s up more than 550 per cent (almost seven times higher!). The headlines tell the story — Timothy B. Lee's Dogecoin has risen 400 percent in the last week because why not and Joanna Ossinger's Dogecoin Rips in Meme-Fueled Frenzy on Pot-Smoking Holiday. The Dogecoin "price" graph Kelly posted was almost vertical. The same day, Peter Schiff, the notorious gold-bug, tweeted: So far in 2021 #Bitcoin has lost 97% of its value verses #Dogecoin. The market has spoken. Dogecoin is eating Bitcoin. All the Bitcoin pumpers who claim Bitcoin is better than gold because its price has risen more than gold's must now concede that Dogecoin is better than Bitcoin. Below the fold I look back at this revolution in crypto-land. I'm writing on April 21, and the Bitcoin "price" is around $55K, about 87% of its peak on April 14. In the same period Dogecoin's "price" peaked at $0.37, and is now around $0.32, or 267% of its $0.12 "price" on April 14. There are some reasons for Bitcoin's slump apart from people rotating out of BTC into DOGE in response to Musk's tweet. Nivesh Rustgi reports: Bitcoin’s hashrate dropped 25% from all-time highs after an accident in the Xinjiang region’s mining industry caused flooding and a gas explosion, leading to 12 deaths with 21 workers trapped since. ... The leading Bitcoin mining data centers in the region have closed operations to comply with the fire and safety inspections. The Chinese central authority is conducting site inspections “on individual mining operations and related local government agencies,” tweeted Dovey Wan, partner at Primitive Crypto. ... The accident has reignited the centralization problems arising from China’s dominance of the Bitcoin mining sector, despite global expansion efforts. The drop in the hash rate had the obvious effects. David Gerard reports: The Bitcoin hash rate dropped from 220 exahashes per second to 165 EH/s. The rate of new blocks slowed. The Bitcoin mempool — the backlog of transactions waiting to be processed — has filled. Transaction fees peaked at just over $50 average on 18 April. The average BTC transaction fee is now just short of $60, with a median fee over $26! The BTC blockchain did around 350K transactions on April 15, but on April 16 it could only manage 190K. It is also true that DOGE had upward momentum before Musk's tweet. After being nearly flat for almost a month, it had already doubled since April 6. Kelly quotes David Kimberley at Freetrade: Dogecoin’s rise is a classic example of greater fool theory at play, Dogecoin investors are basically betting they’ll be able to cash out by selling to the next person wanting to invest. People are buying the cryptocurrency, not because they think it has any meaningful value, but because they hope others will pile in, push the price up and then they can sell off and make a quick buck. But when everyone is doing this, the bubble eventually has to burst and you’re going to be left short-changed if you don’t get out in time. 
And it’s almost impossible to say when that’s going to happen. Kelly also quotes Khadim Shubber explaining that this is all just entertainment: Bitcoin, and cryptocurrencies in general, are not directly analogous to the fairly mundane practice of buying a Lottery ticket, but this part of its appeal is often ignored in favour of more intellectual or high-brow explanations. It has all the hallmarks of a fun game, played out across the planet with few barriers to entry and all the joy and pain that usually accompanies gambling. There’s a single, addictive reward system: the price. The volatility of cryptocurrencies is often highlighted as a failing, but in fact it’s a key part of its appeal. Where’s the fun in an asset whose price snoozes along a predictable path? The rollercoaster rise and fall and rise again of the crypto world means that it’s never boring. If it’s down one day (and boy was it down yesterday) well, maybe the next day it’ll be up again. Note the importance of volatility. In a must-read interview that New York Magazine entitled BidenBucks Is Beeple Is Bitcoin Prof. George Galloway also stressed the importance of volatility: Young people want volatility. If you have assets and you’re already rich, you want to take volatility down. You want things to stay the way they are. But young people are willing to take risks because they can afford to lose everything. For the opportunity to double their money, they will risk losing everything. Imagine a person who has the least to lose: He’s in solitary confinement in a supermax-security prison. That person wants maximum volatility. He prays for such volatility, that there’s a revolution and they open the prison. People under the age of 40 are fed up. They have less than half of the economic security, as measured by the ratio of wealth to income, that their parents did at their age. Their share of overall wealth has crashed. A lot of them are bored. A lot of them have some stimulus money in their pocket. And in the case of GameStop, they did what’s kind of a mob short squeeze. ... I see crypto as a mini-revolution, just like GameStop. The central banks and governments are all conspiring to create more money to keep the shareholder class wealthy. Young people think, That’s not good for me, so I’m going to exit the ecosystem and I’m going to create my own currency. This all reinforces my skepticism about the "price" and "market cap" of cryptocurrencies. Posted by David. at 9:00 AM Labels: bitcoin 1 comment: David. said... Joe Weisenthal (@TheStalwart) tweeted: "WHY I LOVE THE DOGECOIN RALLY SO MUCH See all the serious stuff about decentralized finance, or stores of value, or people thirsting for alternatives for the dollar. Nobody can talk about it with a straight face when it comes to Dogecoin. ... But really, all the crypto talking points go out the window with Doge." April 26, 2021 at 12:19 PM Post a Comment Older Post Home Subscribe to: Post Comments (Atom) Blog Rules Posts and comments are copyright of their respective authors who, by posting or commenting, license their work under a Creative Commons Attribution-Share Alike 3.0 United States License. Off-topic or unsuitable comments will be deleted. DSHR DSHR in ANWR Recent Comments Full comments Blog Archive ▼  2021 (18) ▼  April (5) Dogecoin Disrupts Bitcoin! What Is The Point? NFTs and Web Archiving Cryptocurrency's Carbon Footprint Elon Musk: Threat or Menace? 
blog-dshr-org-7459 ---- DSHR's Blog: Techno-hype part 1 DSHR's Blog I'm David Rosenthal, and this is a place to discuss the work I'm doing in Digital Preservation. Tuesday, November 14, 2017 Techno-hype part 1 Don't, don't, don't, don't believe the hype! Public Enemy New technologies are routinely over-hyped because people under-estimate the gap between a technology that works and a technology that is in everyday use by normal people. You have probably figured out that I'm skeptical of the hype surrounding blockchain technology. Despite incident-free years spent routinely driving in company with Waymo's self-driving cars, I'm also skeptical of the self-driving car hype. Below the fold, an explanation. Clearly, self-driving cars driven by a trained self-driving car driver in Bay Area traffic work fine: We've known for several years now that Waymo's (previously Google's) cars can handle most road conditions without a safety driver intervening. Last year, the company reported that its cars could go about 5,000 miles on California roads, on average, between human interventions.
Crashes per 100M miles Waymo's cars are much safer than almost all human drivers: Waymo has logged over two million miles on U.S. streets and has only had fault in one accident, making its cars by far the lowest at-fault rate of any driver class on the road— about 10 times lower than our safest demographic of human drivers (60–69 year-olds) and 40 times lower than new drivers, not to mention the obvious benefits gained from eliminating drunk drivers. However, Waymo’s vehicles have a knack for getting hit by human drivers. When we look at total accidents (at fault and not), the Waymo accident rate is higher than the accident rate of most experienced drivers ... Most of these accidents are fender-benders caused by humans, with no fatalities or serious injuries. The leading theory is that Waymo’s vehicles adhere to the letter of traffic law, leading them to brake for things they are legally supposed to brake for (e.g., pedestrians approaching crosswalks). Since human drivers are not used to this lawful behavior, it leads to a higher rate of rear-end collisions (where the human driver is at-fault). Clearly, this is a technology that works. I would love it if my grand-children never had to learn to drive, but even a decade from now I think they will still need to. But, as Google realized some time ago, just being safer on average than most humans almost all the time is not enough for mass public deployment of self-driving cars. Back in June, John Markoff wrote: Three years ago, Google’s self-driving car project abruptly shifted from designing a vehicle that would drive autonomously most of the time while occasionally requiring human oversight, to a slow-speed robot without a brake pedal, accelerator or steering wheel. In other words, human driving was no longer permitted. The company made the decision after giving self-driving cars to Google employees for their work commutes and recording what the passengers did while the autonomous system did the driving. In-car cameras recorded employees climbing into the back seat, climbing out of an open car window, and even smooching while the car was in motion, according to two former Google engineers. “We saw stuff that made us a little nervous,” Chris Urmson, a roboticist who was then head of the project, said at the time. He later mentioned in a blog post that the company had spotted a number of “silly” actions, including the driver turning around while the car was moving. Johnny Luu, a spokesman for Google’s self-driving car effort, now called Waymo, disputed the accounts that went beyond what Mr. Urmson described, but said behavior like an employee’s rummaging in the back seat for his laptop while the car was moving and other “egregious” acts contributed to shutting down the experiment. Gareth Corfield at The Register adds: Google binned its self-driving cars' "take over now, human!" feature because test drivers kept dozing off behind the wheel instead of watching the road, according to reports. "What we found was pretty scary," Google Waymo's boss John Krafcik told Reuters reporters during a recent media tour of a Waymo testing facility. "It's hard to take over because they have lost contextual awareness." ... Since then, said Reuters, Google Waymo has focused on technology that does not require human intervention. Timothy B. Lee at Ars Technica writes: Waymo cars are designed to never have anyone touch the steering wheel or pedals. So the cars have a greatly simplified four-button user interface for passengers to use. 
There are buttons to call Waymo customer support, lock and unlock the car, pull over and stop the car, and start a ride. But, during a recent show-and-tell with reporters, they weren't allowed to press the "pull over" button: a Waymo spokesman tells Ars that the "pull over" button does work. However, the event had a tight schedule, and it would have slowed things down too much to let reporters push it. Google was right to identify the "hand-off" problem as essentially insoluble, because the human driver would have lost "situational awareness". Jean-Louis Gassée has an appropriately skeptical take on the technology, based on interviews with Chris Urmson: Google's Director of Self-Driving Cars from 2013 to late 2016 (he had joined the team in 2009). In a SXSW talk in early 2016, Urmson gives a sobering yet helpful vision of the project's future, summarized by Lee Gomes in an IEEE Spectrum article [as always, edits and emphasis mine]: "Not only might it take much longer to arrive than the company has ever indicated — as long as 30 years, said Urmson — but the early commercial versions might well be limited to certain geographies and weather conditions. Self-driving cars are much easier to engineer for sunny weather and wide-open roads, and Urmson suggested the cars might be sold for those markets first."

But the problem is actually much worse than either Google or Urmson say. Suppose, for the sake of argument, that self-driving cars three times as good as Waymo's are in wide use by normal people. A normal person would encounter a hand-off once in 15,000 miles of driving, or less than once a year. Driving would be something they'd be asked to do maybe 50 times in their life. Even if, when the hand-off happened, the human was not "climbing into the back seat, climbing out of an open car window, and even smooching" and had full "situational awareness", they would be faced with a situation too complex for the car's software. How likely is it that they would have the skills needed to cope, when the last time they did any driving was over a year ago, and on average they've only driven 25 times in their life? Current testing of self-driving cars hands off to drivers with more than a decade of driving experience, well over 100,000 miles of it. It bears no relationship to the hand-off problem with a mass deployment of self-driving technology.

Remember the crash of AF447? the aircraft crashed after temporary inconsistencies between the airspeed measurements – likely due to the aircraft's pitot tubes being obstructed by ice crystals – caused the autopilot to disconnect, after which the crew reacted incorrectly and ultimately caused the aircraft to enter an aerodynamic stall, from which it did not recover. This was a hand-off to a crew that was highly trained, but had never before encountered a hand-off during cruise. What this means is that unrestricted mass deployment of self-driving cars requires Level 5 autonomy:

Level 5 - Full Automation
- System capability: The driverless car can operate on any road and in any conditions a human driver could negotiate.
- Driver involvement: Entering a destination.

Note that Waymo is just starting to work with Level 4 cars (the link is to a fascinating piece by Alexis C. Madrigal on Waymo's simulation and testing program). There are many other difficulties on the way to mass deployment, outlined by Timothy B. Lee at Ars Technica.
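The arithmetic behind that argument is simple to check. A minimal sketch: the 5,000 miles between interventions and the "three times as good" multiplier come from the post, while the annual mileage and driving lifetime are my assumed round numbers.

    # Rough check of the hand-off arithmetic in the post.
    miles_between_interventions = 5_000 * 3   # hypothetical system 3x better than Waymo's figure
    annual_miles = 12_000                     # assumed average miles driven per year
    driving_years = 60                        # assumed driving lifetime

    handoffs_per_year = annual_miles / miles_between_interventions
    lifetime_handoffs = handoffs_per_year * driving_years

    print(f"hand-offs per year: {handoffs_per_year:.1f}")      # ~0.8, i.e. less than once a year
    print(f"hand-offs per lifetime: {lifetime_handoffs:.0f}")  # ~50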
Although Waymo is actually testing Level 4 cars in the benign environment of Phoenix, AZ: Waymo, the autonomous car company from Google’s parent company Alphabet, has started testing a fleet of self-driving vehicles without any backup drivers on public roads, its chief executive officer said Tuesday. The tests, which will include passengers within the next few months, mark an important milestone that brings autonomous vehicle technology closer to operating without any human intervention. But the real difficulty is this. The closer the technology gets to Level 5, the worse the hand-off problem gets, because the human has less experience. Incremental progress in deployments doesn't make this problem go away. Self-driving taxis in restricted urban areas maybe in the next five years; a replacement for the family car, don't hold your breath. My grand-children will still need to learn to drive. Posted by David. at 8:00 AM Labels: techno-hype 31 comments: David. said... Cecilia Kang's Where Self-Driving Cars Go To Learn looks at the free-for-all testing environment in Arizona: "Over the past two years, Arizona deliberately cultivated a rules-free environment for driverless cars, unlike dozens of other states that have enacted autonomous vehicle regulations over safety, taxes and insurance. Arizona took its anything-goes approach while federal regulators delayed formulating an overarching set of self-driving car standards, leaving a gap for states. The federal government is only now poised to create its first law for autonomous vehicles; the law, which echoes Arizona’s stance, would let hundreds of thousands of them be deployed within a few years and would restrict states from putting up hurdles for the industry." What could possibly go wrong? November 16, 2017 at 9:30 PM Mike K said... It seems to me that that there's a "good enough" solution for mass deployment before Stage 5 is in production, provided that the "pull over button" works and that in all situations where you invoke concern about a human-driver-takeover, the AI can reliably default to avoiding hitting anything while it decelerates. That is, if the AI realizes it doesn't know how to handle the situation normally, it accepts defeat and comes to stop. (That seems to be the norm during current testing, based on my read of Madrigal's Waymo article.) If that's the case, humans don't suddenly have to take over a moving vehicle that's already in a boundary situation. Instead, having stopped, the AI can then reassess (if the confounding factors have changed) or the human can slowly drive out of proximity. Or perhaps such situations become akin to a flat tire is now--some people are capable of recovering on their own, others wait for roadside assistance. Coming to a stop on, or even alongside, a highway is far from ideal, I concede, and will lead to more rear-enders as long as humans still drive some percentage of vehicles. But rear end accidents are far less likely to cause fatalities than other types (citation needed,) so that seems like an acceptable trade-off during a transitional period. All that said, I'm cautiously pessimistic about self-driving cars in our lifetimes. I'm more worried about bugs, outages, and hacking preventing widespread implementation. November 20, 2017 at 12:33 PM David. said... "how much preparation have federal transportation authorities carried out to meet the challenge of the advent of self-driving cars and trucks? 
Not nearly enough, according to a new 44-page report by the Government Accountability Office, a Congressional watchdog agency." reports Paul Feldman. And: "the U.S. House of Representatives has approved a bill allowing self-driving vehicles to operate on public roadways with minimal government supervision. Similar legislation has been OK’d by a Senate committee, but is currently stalled by a handful of senators concerned about safety provisions." December 11, 2017 at 7:13 AM David. said... In increasing order of skepticism, we have first A Decade after DARPA: Our View on the State of the Art in Self-Driving Cars by Bryan Salesky, CEO, Argo AI (Ford's self-driving effort): "Those who think fully self-driving vehicles will be ubiquitous on city streets months from now or even in a few years are not well connected to the state of the art or committed to the safe deployment of the technology." Second, After Peak Hype, Self-Driving Cars Enter the Trough of Disillusionment by Aarian Marshall at Wired using Gartner’s “hype cycle” methodology: "Volvo’s retreat is just the latest example of a company cooling on optimistic self-driving car predictions. In 2012, Google CEO Sergey Brin said even normies would have access to autonomous vehicles in fewer than five years—nope. Those who shelled out an extra $3,000 for Tesla’s Enhanced Autopilot are no doubt disappointed by its non-appearance, nearly six months after its due date. New Ford CEO Jim Hackett recently moderated expectations for the automaker’s self-driving service, which his predecessor said in 2016 would be deployed at scale by 2021. “We are going to be in the market with products in that time frame,” he told the San Francisco Chronicle. “But the nature of the romanticism by everybody in the media about how this robot works is overextended right now.”" And third Wired: Self Driving Car Hype Crashes Into Harsh Realities by Yves Smith at naked capitalism, which is the only piece to bring up the hand-off problem: "The fudge is to have a human at ready to take over the car in case it asks for help. First, as one might infer, the human who is suddenly asked to intervene is going to have to quickly asses the situation. The handoff delay means a slower response than if a human had been driving the entire time. Second, and even worse, the human suddenly asked to take control might not even see what the emergency need is. Third, the car itself might not recognize that it is about to get into trouble." All three pieces are worth reading. December 30, 2017 at 7:09 AM David. said... More skepticism from Christian Wolmar: “This is a fantasy that has not been thought through, and is being promoted by technology and auto manufacturers because tech companies have vast amounts of footloose capital they don’t know what to do with, and auto manufacturers are terrified they’re not on board with the new big thing,” he said. “So billions are being spent developing technology that nobody has asked for, that will not be practical, and that will have many damaging effects.” He has an entire book on the topic. January 11, 2018 at 8:21 AM David. said... Tim Bradshaw reports: "Autonomous vehicles are in danger of being turned into “weapons”, leading governments around the world to block cars operated by foreign companies, the head of Baidu’s self-driving car programme has warned. Qi Lu, chief operating officer at the Chinese internet group, said security concerns could become a problem for global carmakers and technology companies, including the US and China. 
“It has nothing to do with any particular government — it has to do with the very nature of autonomy,” he said on the sidelines of the Consumer Electronics Show last week. “You have an object that is capable of moving by itself. By definition, it is a weapon.” Charlie Stross figured this out ten years ago. January 15, 2018 at 8:40 AM David. said... “We will have autonomous cars on the road, I believe within the next 18 months,” [Uber CEO Khosrowshahi} said. ... for example, Phoenix, there will be 95% of cases where the company may not have everything mapped perfectly, or the weather might not be perfect, or there could be other factors that will mean Uber will opt to send a driver. “But in 5 percent of cases, we’ll send an autonomous car,” Khosrowshahi said, when everything’s just right, and still the user will be able to choose whether they get an AV or a regular car." reports Darrell Etherington at TechCrunch. Given that Uber loses $5B/yr and Khosrowshahi has 25 months to IPO it, you should treat everything he says as pre-IPO hype. January 23, 2018 at 2:01 PM David. said... Uber and Lyft want you banned from using your own self-driving car in urban areas is the title of a piece by Ethan Baron at siliconbeat. The geometric impossibility of replacing mass transit with fleets of autonomous cars is starting to sink in. February 4, 2018 at 5:06 PM David. said... Ross Marchand at Real Clear Policy looks into Waymo's reported numbers: "The company’s headline figures since 2015 are certainly encouraging, with “all reported disengagements” dropping from .80 per thousand miles (PTM) driven to .18 PTM. Broken down by category, however, this four-fold decrease in disengagements appears very uneven. While the rate of technology failures has fallen by more than 90 percent (from .64 to .06), unsafe driving rates decreased only by 25 percent (from .16 to .12). ... But the ability of cars to analyze situations on the road and respond has barely shown improvement since the beginning of 2016. In key categories, like “incorrect behavior prediction” and “unwanted maneuver of the vehicle,” Waymo vehicles actually did worse in 2017 than in 2016." February 19, 2018 at 11:30 AM David. said... And also The most cutting-edge cars on the planet require an old-fashioned handwashing: "For example, soap residue or water spots could effectively "blind" an autonomous car. A traditional car wash's heavy brushes could jar the vehicle's sensors, disrupting their calibration and accuracy. Even worse, sensors, which can cost over $100,000, could be broken. A self-driving vehicle's exterior needs to be cleaned even more frequently than a typical car because the sensors must remain free of obstructions. Dirt, dead bugs, bird droppings or water spots can impact the vehicle's ability to drive safely." February 23, 2018 at 7:24 AM David. said... "[California]’s Department of Motor Vehicles said Monday that it was eliminating a requirement for autonomous vehicles to have a person in the driver’s seat to take over in the event of an emergency. ... The new rules also require companies to be able to operate the vehicle remotely ... and communicate with law enforcement and other drivers when something goes wrong." reports Daisuke Wakabayashi at the NYT. Note that these are not level 5 autonomous cars, they are remote-controlled. February 26, 2018 at 8:37 PM David. said... "Cruise vehicles "can't easily handle two-way residential streets that only have room for one car to pass at a time. 
That's because Cruise cars treat the street as one lane and always prefer to be in the center of a lane, and oncoming traffic causes the cars to stop." Other situations that give Cruise vehicles trouble: - Distinguishing between motorcycles and bicycles - Entering tunnels, which can interfere with the cars' GPS sensors - U-turns - Construction zones" From Timothy B. Lee's New report highlights limitations of Cruise self-driving cars. It is true that GM's Cruise is trying to self-drive in San Francisco, which isn't an easy place for humans. But they are clearly a long way from Waymo's level, even allowing for the easier driving in Silicon Valley and Phoenix. March 14, 2018 at 5:29 PM David. said... "While major technology and car companies are teaching cars to drive themselves, Phantom Auto is working on remote control systems, often referred to as teleoperation, that many see as a necessary safety feature for the autonomous cars of the future. And that future is closer than you might think: California will allow companies to test autonomous vehicles without a safety driver — as long as the car can be operated remotely — starting next month." from John R. Quain's When Self-Driving Cars Can’t Help Themselves, Who Takes the Wheel?. So the car is going to call Tech Support and be told "All our operators are busy driving other cars. You call is important to us, please don't hang up." March 15, 2018 at 10:59 AM David. said... "Police in Tempe, Arizona, have released dash cam footage showing the final seconds before an Uber self-driving vehicle crashed into 49-year-old pedestrian Elaine Herzberg. She died at the hospital shortly afterward. ... Tempe police also released internal dash cam footage showing the car's driver, Rafaela Vasquez, in the seconds before the crash. Vasquez can be seen looking down toward her lap for almost five seconds before glancing up again. Almost immediately after looking up, she gets a look of horror on her face as she realizes the car is about to hit Herzberg." writes Timothy B. Lee at Ars Technica. In this case the car didn't hand off to the human, but even if it had the result would likely have been the same. March 22, 2018 at 6:17 AM David. said... Timothy B. Lee at Ars Technica has analyzed the video and writes Video suggests huge problems with Uber’s driverless car program: "The video shows that Herzberg crossed several lanes of traffic before reaching the lane where the Uber car was driving. You can debate whether a human driver should have been able to stop in time. But what's clear is that the vehicle's lidar and radar sensors—which don't depend on ambient light and had an unobstructed view—should have spotted her in time to stop. On top of that, the video shows that Uber's "safety driver" was looking down at her lap for nearly five seconds just before the crash. This suggests that Uber was not doing a good job of supervising its safety drivers to make sure they actually do their jobs." March 22, 2018 at 5:03 PM David. said... "In a blogpost, Tesla said the driver of the sport-utility Model X that crashed in Mountain View, 38-year-old Apple software engineer Wei Huang, “had received several visual and one audible hands-on warning earlier in the drive and the driver’s hands were not detected on the wheel for six seconds prior to the collision." reports The Guardian. The car tried to hand off to the driver but he didn't respond. March 31, 2018 at 8:43 PM David. said... 
“Technology does not eliminate error, but it changes the nature of errors that are made, and it introduces new kinds of errors,” said Chesley Sullenberger, the former US Airways pilot who landed a plane in the Hudson River in 2009 after its engines were struck by birds and who now sits on a Department of Transportation advisory committee on automation. “We have to realize that it’s not a panacea.” from the New York Times editorial The Bright, Shiny Distraction of Self-Driving Cars. April 1, 2018 at 8:29 PM David. said... In The way we regulate self-driving cars is broken—here’s how to fix it Timothy B. Lee sets out a very pragmatic approach to regulation of self-driving cars. Contrast this with the current rush to exempt them from regulations! For example: "Anyone can buy a conventional car and perform safety tests on it. Academic researchers, government regulators, and other independent experts can take a car apart, measure its emissions, probe it for computer security flaws, and subject it to crash tests. This means that if a car has problems that aren't caught (or are even covered up) by the manufacturer, they're likely to be exposed by someone else. But this kind of independent analysis won't be an option when Waymo introduces its driverless car service later this year. Waymo's cars won't be for sale at any price, and the company likely won't let customers so much as open the hood. This means that the public will be mostly dependent on Waymo itself to provide information about how its cars work." April 10, 2018 at 12:11 PM David. said... In People must retain control of autonomous vehicles Ashley Nunes, Bryan Reimer and Joseph F. Coughlin sound a warning against Level 5 self-driving vehicles and many strong cautions against rushed deployment of lower levels in two areas: Liability: "Like other producers, developers of autonomous vehicles are legally liable for damages that stem from the defective design, manufacture and marketing of their products. The potential liability risk is great for driverless cars because complex systems interact in ways that are unexpected." Safety: "Driverless cars should be treated much like aircraft, in which the involvement of people is required despite such systems being highly automated. Current testing of autonomous vehicles abides by this principle. Safety drivers are present, even though developers and regulators talk of full automation." April 11, 2018 at 2:06 PM David. said... Alex Roy's The Half-Life Of Danger: The Truth Behind The Tesla Model X Crash is a must-read deep dive into the details of the argument in this post, with specifics about Tesla's "Autopilot" and Cadillac's "SuperCruise": "As I stated a year ago, the more such systems substitute for human input, the more human skills erode, and the more frequently a 'failure' and/or crash is attributed to the technology rather than human ignorance of it. Combine the toxic marriage of human ignorance and skill degradation with an increasing number of such systems on the road, and the number of crashes caused by this interplay is likely to remain constant—or even rise—even if their crash rate declines." April 18, 2018 at 9:16 AM David. said... A collection of posts about Stanford's autonomous car research is here. See, in particular, Holly Russell's research on the hand-off problem. April 26, 2018 at 9:07 AM David. said... 
"All companies testing autonomous vehicles on [California]’s public roads must provide annual reports to the DMV about “disengagements” that occur when a human backup driver has to take over from the robotic system. The DMV told eight companies with testing permits to provide clarification about their reports." from Ethan Barron's Self-driving cars’ shortcomings revealed in DMV reports. The clarifications are interesting, including such things as: "delayed perception of a pedestrian walking into the street" "failed to give way to another vehicle trying to enter a lane" "trouble when other drivers behaved badly. Other drivers had failed to yield, run stop signs, drifted out of their own lane and cut in front aggressively" May 3, 2018 at 4:03 PM David. said... Angie Schmidt's How Uber’s Self-Driving System Failed to Brake and Avoid Killing Elaine Herzberg reports on the devastating NTSB report: "The report doesn’t assign culpability for the crash but it points to deficiencies in Uber’s self-driving car tests. Uber’s vehicle used Volvo software to detect external objects. Six seconds before striking Herzberg, the system detected her but didn’t identify her as a person. The car was traveling at 43 mph. The system determined 1.3 seconds before the crash that emergency braking would be needed to avert a collision. But the vehicle did not respond, striking Herzberg at 39 mph. NTSB writes: According to Uber, emergency braking maneuvers are not enabled while the vehicle is under computer control, to reduce the potential for erratic vehicle behavior. The vehicle operator is relied on to intervene and take action. The system is not designed to alert the operator. Amir Efrati at The Information cites two anonymous sources at Uber who say the company “tuned” its emergency brake system to be less sensitive to unidentified objects." People need to be jailed for this kind of irresponsibility. May 24, 2018 at 3:40 PM David. said... Timothy B. Lee's As Uber and Tesla struggle with driverless cars, Waymo moves forward stresses how far ahead Waymo is in (mostly) self-driving cars: "So Waymo's recently announced car deals—20,000 cars from Jaguar Land Rover, another 62,000 from Fiat Chrysler—are just the latest sign that Waymo is assembling all the pieces it will need for a full-scale commercial taxi service in the Phoenix area and likely other places not long after that. It would be foolish for Waymo to invest so heavily in all this infrastructure if its technology were still years away from being ready for commercial deployment. Those 23 rider support workers need customers to talk to. And, of course, Waymo needs to get those 82,000 Jaguar and Chrysler vehicles on the road to avoid losing millions of dollars on the investment. Throughout all this, Waymo has been testing its vehicles at a faster and faster pace. It took Waymo six months to go from 3 million testing miles in May 2017 to 4 million miles in November. Then it took around three months to reach 5 million miles in February, and less than three months to reach 6 million in early May." June 1, 2018 at 8:36 PM David. said... Timothy B. Lee's Why emergency braking systems sometimes hit parked cars and lane dividers makes the same point as my post, this time about "driver assistance" systems: "The fundamental issue here is that tendency to treat lane-keeping, adaptive cruise control, and emergency braking as independent systems. 
As we've seen, today's driver assistance systems have been created in a piecemeal fashion, with each system following a do-no-harm philosophy. They only intervene if they're confident they can prevent an accident—or at least avoid causing one. If they're not sure, they do nothing and let the driver make the decision. The deadly Tesla crash in Mountain View illustrates how dangerous this kind of system can be." Thus: "Once a driver-assistance system reaches a certain level of complexity, the assumption that it's safest for the system to do nothing no longer makes sense. Complex driver assistance systems can behave in ways that surprise and confuse drivers, leading to deadly accidents if the driver's attention wavers for just a few seconds. At the same time, by handling most situations competently, these systems can lull drivers into a false sense of security and cause them to pay less careful attention to the road." June 8, 2018 at 9:22 AM David. said... "[Drive.AI board member Andrew Ng] seems to be saying that he is giving up on the promise of self-driving cars seamlessly slotting into the existing infrastructure. Now he is saying that every person, every “bystander”, is going to be responsible for changing their behavior to accommodate imperfect self-driving systems. And they are all going to have to be trained! I guess that means all of us. Whoa!!!! The great promise of self-driving cars has been that they will eliminate traffic deaths. Now [Ng] is saying that they will eliminate traffic deaths as long as all humans are trained to change their behavior? What just happened? If changing everyone’s behavior is on the table then let’s change everyone’s behavior today, right now, and eliminate the annual 35,000 fatalities on US roads, and the 1 million annual fatalities world-wide. Let’s do it today, and save all those lives." From Bothersome Bystanders and Self Driving Cars, Rodney Brooks' awesome takedown of Andrew Ng's truly stupid comments reported in Russell Brandom's Self-driving cars are headed toward an AI roadblock: "There’s growing concern among AI experts that it may be years, if not decades, before self-driving systems can reliably avoid accidents. As self-trained systems grapple with the chaos of the real world, experts like NYU’s Gary Marcus are bracing for a painful recalibration in expectations, a correction sometimes called “AI winter.” That delay could have disastrous consequences for companies banking on self-driving technology, putting full autonomy out of reach for an entire generation." July 5, 2018 at 7:56 PM David. said... "Drive.ai plans to license its technology to others, and has struck a deal with Lyft, a ride-hailing firm, to operate vehicles in and around San Francisco. “I think the autonomous-vehicle industry should be upfront about recognising the limitations of today’s technology,” says Mr Ng. It is surely better to find pragmatic ways to work around those limitations than pretend they do not exist or promise that solving them will be easy." reports The Economist. They describe drive.ai's extremely constrained trial service: "Drive.ai, a startup, has deployed seven minivans to transport people within a limited area of the city that includes an office park and a retail area. ... All pick-ups and drop-offs happen at designated stops, to minimise disruption as passengers get on and off. ... The vans are painted a garish orange and clearly labelled as self-driving vehicles. ... 
Screens mounted on the vans’ exteriors let them communicate with pedestrians and other road users, ... Similarly, rather than trying to build a vehicle that can navigate roadworks (a notoriously difficult problem, given inconsistent signage), Drive.ai has arranged for the city authorities to tell it where any roadworks are each day, so that its vehicles can avoid them. ... Drive.ai will limit the service to daylight hours, which makes things simpler and safer. Each vehicle will initially have a safety driver, ... If a van gets confused it can stop and call for help: a remote supervisor then advises it how to proceed (rather than driving the vehicle remotely, which would not be safe, says Mr Ng).: It seems that Mr. Ng has learned from the response to his comments that it isn't our responsibility to avoid running into his cars. August 4, 2018 at 12:38 PM David. said... In Even self-driving leader Waymo is struggling to reach full autonomy Timothy B. Lee reports on the "launch" of Waymo's "public" "autonomous" taxi service: "In late September, a Waymo spokeswoman told Ars by email that the Phoenix service would be fully driverless and open to members of the public—claims I reported in this article. We now know that Waymo One won't be fully driverless; there will be a driver in the driver's seat. And Waymo One is open to the public in only the narrowest, most technical sense: initially it will only be available to early riders—the same people who have been participating in Waymo's test program for months." Even in the benign environment of Phoenix, trained self-driving car drivers are still needed: "Over the course of October and November, Randazzo spent three days observing Waymo's cars in action—either by following them on the roads or staking out the company's depot in Chandler. He posted his findings in a YouTube video. The findings suggest that Waymo's vehicles aren't yet ready for fully autonomous operation." December 7, 2018 at 11:03 AM David. said... Paris Marx writes in Self-Driving Cars Will Always Be Limited. Even the Industry Leader Admits it: "even Waymo’s CEO, John Krafcik, now admits that the self-driving car that can drive in any condition, on any road, without ever needing a human to take control — what’s usually called a “level 5” autonomous vehicle — will never exist. At the Wall Street Journal’s D.Live conference on November 13, Krafcik said that “autonomy will always have constraints.” It will take decades for self-driving cars to become common on roads, and even then they will not be able to drive in certain conditions, at certain times of the year, or in any weather. In short, sensors on autonomous vehicles don’t work well in snow or rain — and that may never change." January 8, 2019 at 6:12 AM David. said... Christian Wolmar's My speech on driverless cars at the Transportation Research Board, Washington DC, 15/1/19 is a must-read debunking of the autonomous car hype by a respected British transport journalist. Among his many points: "Michael DeKort, an aerospace engineer turned whistleblower wrote recently: ‘Handover cannot be made safe no matter what monitoring and notification system is used. That is because enough time cannot be provided to regain proper situational awareness in critical scenarios.’" No-one could have predicted ... January 20, 2019 at 6:06 AM David. said... 
Ashley Nunes' The Cost of Self-Driving Cars Will Be the Biggest Barrier to Their Adoption tackles the important question of whether, even if they can be made safe, self-driving cars can be affordable: "However, the systems underlying HAVs, namely sensors, radar, and communication devices, are costly compared to older (less safe) vehicles. This raises questions about the affordability of life-saving technology for those who need it most. While all segments of society are affected by road crashes, the risks are greatest for the poor. These individuals are more likely to die on the road partly because they own older vehicles that lack advanced safety features and have lower crash-test ratings. Some people have suggested that the inability to purchase HAVs outright may be circumvented by offering these vehicles for-hire. This setup, analogous to modern day taxis, distributes operating costs over a large number of consumers making mobility services more affordable. Self-driving technology advocates suggest that so-called robotaxis, operated by for-profit businesses, could produce considerable savings for consumers." Nunes computes that, even assuming the capital cost of a robotaxi is a mere $15K, the answer is public subsidy: "consumer subsidies will be crucial to realizing the life-saving benefits of this technology. Although politically challenging, public revenues already pay for a portion of road crash-related expenditures. In the United States alone, this amounts to $18 billion, the equivalent of over $156 in added taxes for every household." But to justify the subsidy, they have to be safe. Which brings us back to the hand-off problem. March 14, 2019 at 7:05 AM
blog-dshr-org-9208 ---- DSHR's Blog: NFTs and Web Archiving Thursday, April 15, 2021 NFTs and Web Archiving One of the earliest observations of the behavior of the Web at scale was "link rot". There were a lot of 404s, broken links. Research showed that the half-life of Web pages was alarmingly short. Even in 1996 this problem was obvious enough for Brewster Kahle to found the Internet Archive to address it. From the Wikipedia entry for Link Rot: A 2003 study found that on the Web, about one link out of every 200 broke each week,[1] suggesting a half-life of 138 weeks. This rate was largely confirmed by a 2016–2017 study of links in Yahoo! Directory (which had stopped updating in 2014 after 21 years of development) that found the half-life of the directory's links to be two years.[2] One might have thought that academic journals were a relatively stable part of the Web, but research showed that their references decayed too, just somewhat less rapidly. A 2013 study found a half-life of 9.3 years. See my 2015 post The Evanescent Web. I expect you have noticed the latest outbreak of blockchain-enabled insanity, Non-Fungible Tokens (NFTs). Someone "paying $69M for a JPEG" or $560K for a New York Times column attracted a lot of attention. Follow me below the fold for the connection between NFTs, "link rot" and Web archiving.
Kahle's idea for addressing "link rot", which became the Wayback Machine, was to make a copy of the content at some URL, say: http://www.example.com/page.html keep the copy for posterity, and re-publish it at a URL like: https://web.archive.org/web/19960615083712/http://www.example.com/page.html What is the difference between the two URLs? The original is controlled by Example.Com, Inc.; they can change or delete it on a whim. The copy is controlled by the Internet Archive, whose mission is to preserve it unchanged "for ever". The original is subject to "link rot", the second is, one hopes, not subject to "link rot". The Wayback Machine's URLs have three components: https://web.archive.org/web/ locates the archival copy at the Internet Archive. 19960615083712 indicates that the copy was made on 15th June, 1996 at 8:37:12. http://www.example.com/page.html is the URL from which the copy was made. The fact that the archival copy is at a different URL from the original causes a set of problems that have bedevilled Web archiving. One is that, if the original goes away, all the links that pointed to it break, even though there may be an archival copy to which they could point to fulfill the intent of the link creator. Another is that, if the content at the original URL changes, the link will continue to resolve but the content it returns may no longer reflect the intent of the link creator, although there may be an archival copy that does. Even in the early days of the Web it was evident that Web pages changed and vanished at an alarming rate. The point is that the meaning of a generic Web URL is "whatever content, or lack of content, you find at this location". That is why URL stands for Uniform Resource Locator. Note the difference with URI, which stands for Uniform Resource Identifier. Anyone can create a URL or URI linking to whatever content they choose, but doing so provides no rights in or control over the linked-to content. In People's Expensive NFTs Keep Vanishing. This Is Why, Ben Munster reports that: over the past few months, numerous individuals have complained about their NFTs going “missing,” “disappearing,” or becoming otherwise unavailable on social media. This despite the oft-repeated NFT sales pitch: that NFT artworks are logged immutably, and irreversibly, onto the Ethereum blockchain. So NFTs have the same problem that Web pages do. Isn't the blockchain supposed to make things immortal and immutable? Kyle Orland's Ars Technica’s non-fungible guide to NFTs provides an over-simplified explanation: When NFT’s are used to represent digital files (like GIFs or videos), however, those files usually aren’t stored directly “on-chain” in the token itself. Doing so for any decently sized file could get prohibitively expensive, given the cost of replicating those files across every user on the chain. Instead, most NFTs store the actual content as a simple URI string in their metadata, pointing to an Internet address where the digital thing actually resides. NFTs are just links to the content they represent, not the content itself. The Bitcoin blockchain actually does contain some images, such as this ASCII portrait of Len Sassaman and some pornographic images. But the blocks of the Bitcoin blockchain were originally limited to 1MB and are now effectively limited to around 2MB, enough space for small image files. What’s the Maximum Ethereum Block Size? explains: Instead of a fixed limit, Ethereum block size is bound by how many units of gas can be spent per block.
This limit is known as the block gas limit ... At the time of writing this, miners are currently accepting blocks with an average block gas limit of around 10,000,000 gas. Currently, the average Ethereum block size is anywhere between 20 to 30 kb in size. That's a little out-of-date. Currently the block gas limit is around 12.5M gas per block and the average block is about 45KB. Nowhere near enough space for a $69M JPEG. The NFT for an artwork can only be a link. Most NFTs are ERC-721 tokens, providing the optional Metadata extension: /// @title ERC-721 Non-Fungible Token Standard, optional metadata extension /// @dev See https://eips.ethereum.org/EIPS/eip-721 /// Note: the ERC-165 identifier for this interface is 0x5b5e139f. interface ERC721Metadata /* is ERC721 */ { /// @notice A descriptive name for a collection of NFTs in this contract function name() external view returns (string _name); /// @notice An abbreviated name for NFTs in this contract function symbol() external view returns (string _symbol); /// @notice A distinct Uniform Resource Identifier (URI) for a given asset. /// @dev Throws if `_tokenId` is not a valid NFT. URIs are defined in RFC /// 3986. The URI may point to a JSON file that conforms to the "ERC721 /// Metadata JSON Schema". function tokenURI(uint256 _tokenId) external view returns (string); } The Metadata JSON Schema specifies an object with three string properties: name: "Identifies the asset to which this NFT represents" description: "Describes the asset to which this NFT represents" image: "A URI pointing to a resource with mime type image/* representing the asset to which this NFT represents. Consider making any images at a width between 320 and 1080 pixels and aspect ratio between 1.91:1 and 4:5 inclusive." Note that the JSON metadata is not in the Ethereum blockchain, it is only pointed to by the token on the chain. If the art-work is the "image", it is two links away from the blockchain. So, given the evanescent nature of Web links, the standard provides no guarantee that the metadata exists, or is unchanged from when the token was created. Even if it is, the standard provides no guarantee that the art-work exists or is unchanged from when the token is created. Caveat emptor — Absent unspecified actions, the purchaser of an NFT is buying a supposedly immutable, non-fungible object that points to a URI pointing to another URI. In practice both are typically URLs. The token provides no assurance that either of these links resolves to content, or that the content they resolve to at any later time is what the purchaser believed at the time of purchase. There is no guarantee that the creator of the NFT had any copyright in, or other rights to, the content to which either of the links resolves at any particular time. There are thus two issues to be resolved about the content of each of the NFT's links: Does it exist? I.e. does it resolve to any content? Is it valid? I.e. is the content to which it resolves unchanged from the time of purchase? These are the same questions posed by the Holy Grail of Web archiving, persistent URLs. Assuming existence for now, how can validity be assured? There have been a number of systems that address this problem by switching from naming files by their location, as URLs do, to naming files by their content by using the hash of the content as its name. 
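As a concrete illustration of naming by content rather than by location, here is a minimal sketch in Python; it is an illustration only, since real systems such as BitTorrent and IPFS use their own hash formats (info-hashes, multihash-based CIDs) rather than bare SHA-256 hex digests:

import hashlib

def content_address(data: bytes) -> str:
    # The name is derived from the bytes themselves, not from where they are stored.
    return hashlib.sha256(data).hexdigest()

def verify(name: str, data: bytes) -> bool:
    # A copy fetched from anywhere can be checked against its name.
    return content_address(data) == name

artwork = b"...image bytes..."           # stand-in for the actual file
name = content_address(artwork)
assert verify(name, artwork)             # an intact copy matches its name
assert not verify(name, artwork + b"!")  # any change to the content changes its name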
The idea was the basis for Bram Cohen's highly successful BitTorrent — it doesn't matter where the data comes from provided its integrity is assured because the hash in the name matches the hash of the content. The content-addressable file system most used for NFTs is the Interplanetary File System (IPFS). From its Wikipedia page: As opposed to a centrally located server, IPFS is built around a decentralized system[5] of user-operators who hold a portion of the overall data, creating a resilient system of file storage and sharing. Any user in the network can serve a file by its content address, and other peers in the network can find and request that content from any node who has it using a distributed hash table (DHT). In contrast to BitTorrent, IPFS aims to create a single global network. This means that if Alice and Bob publish a block of data with the same hash, the peers downloading the content from Alice will exchange data with the ones downloading it from Bob.[6] IPFS aims to replace protocols used for static webpage delivery by using gateways which are accessible with HTTP.[7] Users may choose not to install an IPFS client on their device and instead use a public gateway. If the purchaser gets both the NFT's metadata and the content to which it refers via IPFS URIs, they can be assured that the data is valid. What do these IPFS URIs look like? The (excellent) IPFS documentation explains: https://ipfs.io/ipfs/ # e.g https://ipfs.io/ipfs/Qme7ss3ARVgxv6rXqVPiikMJ8u2NLgmgszg13pYrDKEoiu Browsers that support IPFS can redirect these requests to your local IPFS node, while those that don't can fetch the resource from the ipfs.io gateway. You can swap out ipfs.io for your own http-to-ipfs gateway, but you are then obliged to keep that gateway running forever. If your gateway goes down, users with IPFS aware tools will still be able to fetch the content from the IPFS network as long as any node still hosts it, but for those without, the link will be broken. Don't do that. Note the assumption here that the ipfs.io gateway will be running forever. Note also that only some browsers are capable of accessing IPFS content without using a gateway. Thus the ipfs.io gateway is a single point of failure, although the failure is not complete. In practice NFTs using IPFS URIs are dependent upon the continued existence of Protocol Labs, the organization behind IPFS. The ipfs.io URIs in the NFT metadata are actually URLs; they don't point to IPFS, but to a Web server that accesses IPFS. Pointing to the NFT's metadata and content using IPFS URIs assures their validity but does it assure their existence? The IPFS documentation's section Persistence, permanence, and pinning explains: Nodes on the IPFS network can automatically cache resources they download, and keep those resources available for other nodes. This system depends on nodes being willing and able to cache and share resources with the network. Storage is finite, so nodes need to clear out some of their previously cached resources to make room for new resources. This process is called garbage collection. To ensure that data persists on IPFS, and is not deleted during garbage collection, data can be pinned to one or more IPFS nodes. Pinning gives you control over disk space and data retention. As such, you should use that control to pin any content you wish to keep on IPFS indefinitely. To assure the existence of the NFT's metadata and content they must both be not just written to IPFS but also pinned to at least one IPFS node. 
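For example, here is a minimal sketch of adding an NFT's metadata file to one's own node and pinning it, assuming a local IPFS (Kubo) daemon exposing its RPC API on the default port 5001; "metadata.json" is a hypothetical file name:

import requests

API = "http://127.0.0.1:5001/api/v0"  # assumed: default RPC endpoint of a local IPFS (Kubo) daemon

def add_and_pin(path: str) -> str:
    # Add the file to the local node; the response includes its content identifier (CID).
    with open(path, "rb") as f:
        resp = requests.post(f"{API}/add", files={"file": f})
    resp.raise_for_status()
    cid = resp.json()["Hash"]
    # Pin the CID so the node's garbage collection will not discard it.
    requests.post(f"{API}/pin/add", params={"arg": cid}).raise_for_status()
    return cid

print("ipfs://" + add_and_pin("metadata.json"))  # hypothetical metadata file

Pinning to a single node you run yourself only helps for as long as that node stays up and keeps the pin, which is why the IPFS documentation goes on to recommend pinning services: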
To ensure that your important data is retained, you may want to use a pinning service. These services run lots of IPFS nodes and allow users to pin data on those nodes for a fee. Some services offer free storage-allowance for new users. Pinning services are handy when: You don't have a lot of disk space, but you want to ensure your data sticks around. Your computer is a laptop, phone, or tablet that will have intermittent connectivity to the network. Still, you want to be able to access your data on IPFS from anywhere at any time, even when the device you added it from is offline. You want a backup that ensures your data is always available from another computer on the network if you accidentally delete or garbage-collect your data on your own computer. Thus to assure the existence of the NFT's metadata and content pinning must be rented from a pinning service, another single point of failure. In summary, it is possible to take enough precautions and pay enough ongoing fees to be reasonably assured that your $69M NFT and its metadata and the JPEG it refers to will remain accessible. Whether in practice these precautions are taken is definitely not always the case. David Gerard reports: But functionally, IPFS works the same way as BitTorrent with magnet links — if nobody bothers seeding your file, there’s no file there. Nifty Gateway turn out not to bother to seed literally the files they sold, a few weeks later. [Twitter; Twitter] Anil Dash claims to have invented, with Kevin McCoy, the concept of NFTs referencing Web URLs in 2014. He writes in his must-read NFTs Weren’t Supposed to End Like This: Seven years later, all of today’s popular NFT platforms still use the same shortcut. This means that when someone buys an NFT, they’re not buying the actual digital artwork; they’re buying a link to it. And worse, they’re buying a link that, in many cases, lives on the website of a new start-up that’s likely to fail within a few years. Decades from now, how will anyone verify whether the linked artwork is the original? All common NFT platforms today share some of these weaknesses. They still depend on one company staying in business to verify your art. They still depend on the old-fashioned pre-blockchain internet, where an artwork would suddenly vanish if someone forgot to renew a domain name. “Right now NFTs are built on an absolute house of cards constructed by the people selling them,” the software engineer Jonty Wareing recently wrote on Twitter. My only disagreement with Dash is that, as someone who worked on archiving the "old-fashioned pre-blockchain internet" for two decades, I don't believe that there is a new-fangled post-blockchain Internet that makes the problems go away. And neither does David Gerard: The pictures for NFTs are often stored on the Interplanetary File System, or IPFS. Blockchain promoters talk like IPFS is some sort of bulletproof cloud storage that works by magic and unicorns. Posted by David. at 8:00 AM Labels: bitcoin, distributed web, web archiving 2 comments: David. said... Kal Rustiala & Christopher Jon Sprigman's The One Redeeming Quality of NFTs Might Not Even Exist explains: "once you understand what the NFT is and how it actually works, you can see that it does nothing to permit the buyer, as the New Yorker put it, to own a “digital Beanie Baby” with only one existing copy. In fact, the NFT may make the authenticity question even more difficult to resolve." 
They quote David Hockney agreeing with David Gerard: "On an art podcast, Hockney recently said, “What is it that they’re owning? I don’t really know.” NFTs, Hockney said, are the domain of “international crooks and swindlers.” Hockney may have a point. If you look at them closely, NFTs do almost nothing to guarantee authenticity. In fact, for reasons we’ll explain, NFTs may actually make the problem of authenticity in digital art worse." April 20, 2021 at 7:58 PM David. said... Who could have predicted counterfeit NFTs? Tim Schneider's The Gray Market: How a Brazen Hack of That $69 Million Beeple Revealed the True Vulnerability of the NFT Market (and Other Insights) reports that: "In the opening days of April, an artist operating under the pseudonym Monsieur Personne (“Mr. Nobody”) tried to short-circuit the NFT hype machine by unleashing “sleepminting,” a process that complicates, if not corrodes, one of the value propositions underlying non-fungible tokens. ... Sleepminting enables him to mint NFTs for, and to, the crypto wallets of other artists, then transfer ownership back to himself without their consent or knowing participation. Nevertheless, each of these transactions appears as legitimate on the blockchain record as if the unwitting artist had initiated them on their own, opening up the prospect of sophisticated fraud on a mass scale." And it is arguably legal because NFTs are just a (pair of) links: "Personne told me that, after being “thoroughly consulted and advised by personal lawyers and specialist law firms,” he is confident there are “little to no legal repercussions for sleepminting.” His argument is that ERC721 smart contracts only contain a link pointing to a JSON (Javascript Object Notation) file, which in turn points to a “publicly available and hosted digital asset file”—here, Beeple’s Everydays image." April 23, 2021 at 2:07 PM
blog-dshr-org-9536 ---- DSHR's Blog: The Evanescent Web Tuesday, February 10, 2015 The Evanescent Web Papers drawing attention to the decay of links in academic papers have quite a history; I blogged about three relatively early ones six years ago. Now Martin Klein and a team from the Hiberlink project have taken the genre to a whole new level with a paper in PLoS One entitled Scholarly Context Not Found: One in Five Articles Suffers from Reference Rot. Their dataset is 2-3 orders of magnitude bigger than previous studies, their methods are far more sophisticated, and they study both link rot (links that no longer resolve) and content drift (links that now point to different content). There's a summary on the LSE's blog. Below the fold, some thoughts on the Klein et al paper. As regards link rot, they write: In order to combat link rot, the Digital Object Identifier (DOI) was introduced to persistently identify journal articles.
In addition, the DOI resolver for the URI version of DOIs was introduced to ensure that web links pointing at these articles remain actionable, even when the articles change web location. But even when used correctly, such as http://dx.doi.org/10.1371/journal.pone.0115253, DOIs introduce a single point of failure. This became obvious on January 20th when the doi.org domain name briefly expired. DOI links all over the Web failed, illustrating yet another fragility of the Web. It hasn't been a good time for access to academic journals for other reasons either. Among the publishers unable to deliver content to their customers in the last week or so were Elsevier, Springer, Nature, HighWire Press and Oxford Art Online. I've long been a fan of Herbert van de Sompel's work, especially Memento. He's a co-author on the paper and we have been discussing it. Unusually, we've been disagreeing. We completely agree on the underlying problem of the fragility of academic communication in the Web era as opposed to its robustness in the paper era. Indeed, in the introduction of another (but much less visible) recent paper entitled Towards Robust Hyperlinks for Web-Based Scholarly Communication Herbert and his co-authors echo the comparison between the paper and Web worlds from the very first paper we published on the LOCKSS system a decade and a half ago. Nor am I critical of the research underlying the paper, which is clearly of high quality and which reveals interesting and disturbing properties of Web-based academic communication. All I'm disagreeing with Herbert about is the way the research is presented in the paper. My problem with the presentation is that this paper, which has a far higher profile than other recent publications in this area, and which comes at a time of unexpectedly high visibility for web archiving, seems to me to be excessively optimistic, and to fail to analyze the roots of the problem it is addressing. It thus fails to communicate the scale of the problem. The paper is, for very practical reasons of publication in a peer-reviewed journal, focused on links from academic papers to the web-at-large. But I see it as far too optimistic in its discussion of the likely survival of the papers themselves, and the other papers they link to (see Content Drift below). I also see it as far too optimistic in its discussion of proposals to fix the problem of web-at-large references that it describes (see Dependence on Authors below). All the proposals depend on actions being taken either before or during initial publication by either the author or the publisher. There is evidence in the paper itself (see Getting Links Right below) that neither authors nor publishers can get DOIs right. Attempts to get authors to deposit their papers in institutional repositories notoriously fail. The LOCKSS team has met continual frustration in getting publishers to make small changes to their publishing platforms that would make preservation easier, or in some cases even possible. Viable solutions to the problem cannot depend on humans to act correctly. Neither authors nor publishers have anything to gain from preservation of their work. In addition, the paper fails to even mention the elephant in the room, the fact that both the papers and the web-at-large content are copyright. The archives upon which the proposed web-at-large solutions rest, such as the Internet Archive, are themselves fragile. 
Not just for the normal economic and technical reasons we outlined nearly a decade ago, but because they operate under the DMCA's "safe harbor" provision and thus must take down content upon request from a claimed copyright holder. The archives such as Portico and LOCKSS that preserve the articles themselves operate instead with permission from the publisher, and thus must impose access restrictions. This is the root of the problem. In the paper world in order to monetize their content the copyright owner had to maximize the number of copies of it. In the Web world, in order to monetize their content the copyright owner has to minimize the number of copies. Thus the fundamental economic motivation for Web content militates against its preservation in the ways that Herbert and I would like. None of this is to suggest that developing and deploying partial solutions is a waste of time. It is what I've been doing the last quarter of my life. There cannot be a single comprehensive technical solution. The best we can do is to combine a diversity of partial solutions. But we need to be clear that even if we combine everything anyone has worked on we are still a long way from solving the problem. Now for some details. Content Drift As regards content drift, they write: Content drift is hardly a matter of concern for references to journal articles, because of the inherent fixity that, especially PDF-formated, articles exhibit. Nevertheless, special-purpose solutions for long-term digital archiving of the digital journal literature, such as LOCKSS, CLOCKSS, and Portico, have emerged to ensure that articles and the articles they reference can be revisited even if the portals that host them vanish from the web. More recently, the Keepers Registry has been introduced to keep track of the extent to which the digital journal literature is archived by what memory organizations. These combined efforts ensure that it is possible to revisit the scholarly context that consists of articles referenced by a certain article long after its publication. While I understand their need to limit the scope of their research to web-at-large resources, the last sentence is far too optimistic. First, research using the Keepers Registry and other resources shows that at most 50% of all articles are preserved. So future scholars depending on archives of digital journals will encounter large numbers of broken links. Second, even the 50% of articles that are preserved may not be accessible to a future scholar. CLOCKSS is a dark archive and is not intended to provide access to future scholars unless the content is triggered. Portico is a subscription archive, future scholars' institutions may not have a subscription. LOCKSS provides access only to readers at institutions running a LOCKSS box. These restrictions are a response to the copyright on the content and are not susceptible to technical fixes. Third, the assumption that journal articles exhibit "inherent fixity" is, alas, outdated. Both the HTML and PDF versions of articles from state-of-the-art publishing platforms contain dynamically generated elements, even when they are not entirely generated on-the-fly. The LOCKSS system encounters this on a daily basis. As each LOCKSS box collects content from the publisher independently, each box gets content that differs in unimportant respects. For example, the HTML content is probably personalized ("Welcome Stanford University") and updated ("Links to this article"). 
PDF content is probably watermarked ("Downloaded by 192.168.1.100"). Content elements such as these need to be filtered out of the comparisons between the "same" content at different LOCKSS boxes. One might assume that the words, figures, etc. that form the real content of articles do not drift, but in practice it would be very difficult to validate this assumption. Soft-404 Responses I've written before about the problems caused for archiving by "soft-403 and soft-404" responses by Web servers. These result from Web site designers who believe their only audience is humans, so instead of providing the correct response code when they refuse to supply content, they return a pretty page with a 200 response code indicating valid content. The valid content is a refusal to supply the requested content. Interestingly, PubMed is an example, as I discovered when clicking on the (broken) PubMed link in the paper's reference 58. Klein et al define a live web page thus: On the one hand, the HTTP transaction chain could end successfully with a 2XX-level HTTP response code. In this case we declared the URI to be active on the live web. Their estimate of the proportion of links which are still live is thus likely to be optimistic, as they are likely to have encountered at least soft-404s if not soft-403s. Getting Links Right Even when the dx.doi.org resolver is working, its effectiveness in persisting links depends on its actually being used. Klein et al discover that in many cases it isn't: one would assume that URI references to journal articles can readily be recognized by detecting HTTP URIs that carry a DOI, e.g., http://dx.doi.org/10.1007/s00799-014-0108-0. However, it turns out that references rather frequently have a direct link to an article in a publisher's portal, e.g. http://link.springer.com/article/10.1007%2Fs00799-014-0108-0, instead of the DOI link. The direct link may well survive relocation of the content within the publisher's site. But journals are frequently bought and sold between publishers, causing the link to break. I believe there are two causes for these direct links, publisher's platforms inserting them so as not to risk losing the reader, but more importantly the difficulty for authors to create correct links. Cutting and pasting from the URL bar in their browser necessarily gets the direct link, creating the correct one via dx.doi.org requires the author to know that it should be hand-edited, and to remember to do it. Attempts to ensure linked materials are preserved suffer from a similar problem: The solutions component of Hiberlink also explores how to best reference archived snapshots. The common and obvious approach, followed by Webcitation and Perma.cc, is to replace the original URI of the referenced resource with the URI of the Memento deposited in a web archive. This approach has several drawbacks. First, through removal of the original URI, it becomes impossible to revisit the originally referenced resource, for example, to determine what its content has become some time after referencing. Doing so can be rather relevant, for example, for software or dynamic scientific wiki pages. Second, the original URI is the key used to find Mementos of the resource in all web archives, using both their search interface and the Memento protocol. Removing the original URI is akin to throwing away that key: it makes it impossible to find Mementos in web archives other than the one in which the specific Memento was deposited. 
This means that the success of the approach is fully dependent on the long term existence of that one archive. If it permanently ceases to exist, for example, as a result of legal or financial pressure, or if it becomes temporally inoperative as a result of technical failure, the link to the Memento becomes rotten. Even worse, because the original URI was removed from the equation, it is impossible to use other web archives as a fallback mechanism. As such, in the approach that is currently common, one link rot problem is replaced by another. The paper, and a companion paper, describe Hiberlink's solution, which is to decorate the link to the original resource with an additional link to its archived Memento. Rene Voorburg of the KB has extended this by implementing robustify.js:  robustify.js checks the validity of each link a user clicks. If the linked page is not available, robustify.js will try to redirect the user to an archived version of the requested page. The script implements Herbert Van de Sompel's Memento Robust Links - Link Decoration specification (as part of the Hiberlink project) in how it tries to discover an archived version of the page. As a default, it will use the Memento Time Travel service as a fallback. You can easily implement robustify.js on your web pages in so that it redirects pages to your preferred web archive. Note, however, that soft-403s and soft-404s pose the same problem for robustify.js as they do for all Web archiving technologies. Dependence on Authors Many of the solutions that have been proposed to the problem of reference rot also suffer from dependence on authors: Webcitation was a pioneer in this problem domain when, years ago, it introduced the service that allows authors to archive, on demand, web resources they intend to reference. ... But Webcitation has not been met with great success, possibly the result of a lack of authors' awareness regarding reference rot, possibly because the approach requires an explicit action by authors, likely because of both. Webcitation is not the only one: To a certain extent, portals like FigShare and Zenodo play in this problem domain as they allow authors to upload materials that might otherwise be posted to the web at large. The recent capability offered by these systems that allows creating a snapshot of a GitHub repository, deposit it, and receive a DOI in return, serves as a good example. The main drivers for authors to do so is to contribute to open science and to receive a citable DOI, and, hence potentially credit for the contribution. But the net effect, from the perspective of the reference rot problem domain, is the creation of a snapshot of an otherwise evolving resource. Still, these services target materials created by authors, not, like web archives do, resources on the web irrespective of their authorship. Also, an open question remains to which extent such portals truly fulfill a long term archival function rather than being discovery and access environments. Hiberlink is trying to reduce this dependence: In the solutions thread of Hiberlink, we explore pro-active archiving approaches intended to seamlessly integrate into the life cycle of an article and to require less explicit intervention by authors. One example is an experimental Zotero extension that archives web resources as an author bookmarks them during note taking. 
Another is HiberActive, a service that can be integrated into the workflow of a repository or a manuscript submission system and that issues requests to web archives to archive all web at large resources referenced in submitted articles. But note that these services (and Voorburg's) depend on the author or the publisher installing them. Experience shows that authors are focused on getting their current paper accepted, large publishers are reluctant to implement extensions to their publishing platforms that offer no immediate benefit, and small publishers lack the expertise to do so. Ideally, these services would be back-stopped by a service that scanned recently-published articles for web-at-large links and submitted them for archiving, thus requiring no action by author or publisher. The problem is that doing so requires the service to have access to the content as it is published. The existing journal archiving services, LOCKSS, CLOCKSS and Portico, have such access to about half the published articles, and could in principle be extended to perform this service. In practice doing so would need at least modest funding. The problem isn't as simple as it appears at first glance, even for the articles that are archived. For those that aren't, primarily from less IT-savvy authors and small publishers, the outlook is bleak.

Archiving

Finally, the solutions assume that submitting a URL to an archive is enough to ensure preservation. It isn't. The referenced web site might have a robots.txt policy preventing collection. The site might have crawler traps, exceed the archive's crawl depth, or use Javascript in ways that prevent the archive collecting a usable representation. Or the archive may simply not process the request in time to avoid content drift or link rot.

Acknowledgement

I have to thank Herbert van de Sompel for greatly improving this post through constructive criticism. But it remains my opinion alone. Update: Fixed broken link to Geoff Bilder post at Crossref flagged by Rob Baxter in comments to a December 2016 post on a similar topic.

Posted by David. at 8:00 AM Labels: digital preservation, e-journals, memento 10 comments: rv said... "Note, however, that soft-403s and soft-404s pose the same problem for robustify.js as they do for all Web archiving technologies." I just uploaded a new version of the robustify.js helper script (https://github.com/renevoorburg/robustify.js) that attempts to recognize soft-404s. It does so by forcing a '404' with a random request and comparing the results of that with the results of the original request (using fuzzy hashing). It seems to work very well but I am missing a good test set of soft 404's. February 10, 2015 at 1:14 PM David. said... Good idea, René! February 10, 2015 at 2:51 PM Unknown said... As ever, a good and challenging read. Although I am not one of the authors of the paper you review, I have been involved in a lot of the underlying thinking as one of the PIs in the project, described at Hiberlink.org, and would like to add a few comments, especially on the matter of potential remedy. We were interested in the prospect of change & intervention in three simple workflows (for the author; for the issuing body; for the hapless library/repository) in order to enable transactional archiving of referenced content - reasoning that it was best that this was done as early as possible after the content on the web was regarded as important, and also that such archiving was best done when the actor in question had their mind in gear.
The prototyping using Zotero and OJS was done via plug-ins because, having access to the source code, our colleague Richard Wincewicz could mock this up as a demonstrator. One strategy was that this would then invite 'borrowing' of the functionality (of snapshot/DateTimeStamp/archive/'decorate' with DateTimeStamp of URI within the citation) by commercial reference managers and editorial software so that authors and/or publishers (editors?) did not have to do something special. Reference rot is a function of time: the sooner the fish (fruit?) is flash frozen the less chance it has to rot. However, immediate post-publication remedy is better than none. The suggestion that there is a pro-active fix for content ingested into LOCKSS, CLOCKSS and Portico (and other Keepers of digital content) by archiving of references is very much welcomed. This is part of our thinking for remodelling Repository Junction Broker, which supports machine ingest into institutional repositories, but what you suggest could have greater impact. February 12, 2015 at 6:27 AM Martin Klein said... A comment on the issue of soft404s: Your point is well taken and the paper's methodology section would clearly have benefited from mentioning this detriment and why we chose to not address it. My co-authors and I are very well aware of the soft404 issue, common approaches to detect them (such as introduced in [1] and [2]), and have, in fact, applied such methods in the past [3]. However, given the scale of our corpus of 1 million URIs, and the soft404 ratio found in previous studies (our [3] found a ratio of 0.36% and [4] found 3.41%), we considered checking for soft404s too expensive in light of potential return. Especially since, as you have pointed out in the past [5], web archives also archive soft404s, we would have had to detect soft404s on the live web as well as in web archives. Regardless, I absolutely agree that our reference rot numbers for links to web at large resources likely represent a lower bound. It would be interesting to investigate the ratio of soft404s and build a good size corpus to evaluate common and future detection approaches. The soft404 on the paper's reference 58 (which is introduced by the publisher) seems to "only" be a function of the PubMed search as a request for [6] returns a 404. [1] http://dx.doi.org/10.1145/988672.988716 [2] http://dx.doi.org/10.1145/1526709.1526886 [3] http://arxiv.org/abs/1102.0930 [4] http://dx.doi.org/10.1007/978-3-642-33290-6_22 [5] http://blog.dshr.org/2013/04/making-memento-succesful.html [6] http://www.ncbi.nlm.nih.gov/pubmed/aodfhdskjhfsjkdhfskldfj February 13, 2015 at 2:47 PM David. said... Peter Burnhill supports the last sentence of my post with this very relevant reference: thoughts of (Captain) Clarence Birdseye Some advice on quick freezing references to Web caught resources: Better done when references are noted (by the author), and then could be re-examined at point of issue (by the editor / publisher). When delivered by the crate (onto digital shelves) the rot may have set in for some of these fish ... March 1, 2015 at 6:39 AM David. said... Geoffrey Bilder has a very interesting and detailed first instalment of a multi-part report on the DOI outage that is well worth reading. April 20, 2015 at 7:24 PM David. said... As reported on the UK Serials Group listserv, UK Elsevier subscribers encountered a major outage last weekend due to "unforeseen technical issues". June 9, 2015 at 9:23 AM David. said... The outages continued sporadically through Tuesday.
This brings up another issue about the collection of link rot statistics. The model behind these studies so far is that a Web resource appears at some point in time, remains continually accessible for a period, then becomes inaccessible and remains inaccessible "for ever". Clearly, the outages noted here show that this isn't the case. Between the resource's first appearance and its last, there is some (probably time-varying) probability, less than 1, that it is available. June 10, 2015 at 7:29 AM David. said... Timothy Geigner at TechDirt supplies the canonical example of why depending on the DMCA "safe harbor" is risky for preservation. Although in this case the right thing happened in response to a false DMCA takedown notice, detecting them is between difficult and impossible. July 10, 2015 at 5:15 PM David. said... Herbert Van de Sompel, Martin Klein and Shawn Jones revisit the issue of why DOIs are not in practice used to refer to articles in a poster for WWW2016 Persistent URIs Must Be Used To Be Persistent. Note that this link is not a DOI, in this case because the poster doesn't have one (yet?). March 1, 2016 at 6:11 AM
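The soft-404 detection heuristic described in the post and in Voorburg's comment above (force a miss with a deliberately bogus request and compare it with the response for the real URL) can be sketched in a few lines of Python. This is only a minimal illustration of the idea, not Voorburg's actual robustify.js helper; it assumes the third-party requests library and uses the standard library's difflib as a crude stand-in for the fuzzy hashing he mentions, with an arbitrary similarity threshold.

import difflib
import uuid
import requests

def looks_like_soft_404(url, timeout=10):
    # A soft 404 is an error page served with a 200 status code.
    real = requests.get(url, timeout=timeout)
    if real.status_code != 200:
        return False  # a hard 4xx/5xx is not a *soft* error
    # Request a path on the same site that almost certainly does not exist.
    bogus = url.rstrip('/') + '/' + uuid.uuid4().hex
    probe = requests.get(bogus, timeout=timeout)
    if probe.status_code != 200:
        return False  # the site returns real error codes, so the 200 above is credible
    # If the requested page looks much like the page served for a nonsense URL,
    # the 200 response is probably a soft 404 (or soft 403).
    similarity = difflib.SequenceMatcher(None, real.text, probe.text).ratio()
    return similarity > 0.9

The 0.9 threshold is arbitrary; a real checker would want proper fuzzy hashing, the same treatment for soft-403s, and a corpus of known soft 404s to tune against (the test set Voorburg says he lacks).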
blog-dshr-org-9748 ---- DSHR's Blog: The Bitcoin "Price"
Thursday, January 14, 2021 The Bitcoin "Price" Jemima Kelly writes No, bitcoin is not "the ninth-most-valuable asset in the world" and it's a must-read. Below the fold, some commentary. The "price" of BTC in USD has quadrupled in the last three months, and thus its "market cap" has sparked claims that it is the 9th most valuable asset in the world. Kelly explains the math: Just like you would calculate a company's market capitalisation by multiplying its stock price by the number of shares outstanding, with bitcoin you just multiply its price by its total "supply" of coins (ie, the number of coins that have been mined since the first one was in January 2009). Simples! If you do that sum, you'll see that you get to a very large number — if you take the all-time-high of $37,751 and multiply that by the bitcoin supply (roughly 18.6m) you get to just over $665bn. And, if that were accurate and representative and if you could calculate bitcoin's value in this way, that would place it just below Tesla and Alibaba in terms of its "market value". (On Wednesday!) Then Kelly starts her critique, which is quite different from mine in Stablecoins: In the context of companies, the "market cap" can be thought of as loosely representing what someone would have to pay to buy out all the shareholders in order to own the company outright (though in practice the shares have often been over- or undervalued by the market, so shareholders are often offered a premium or a discount). Companies, of course, have real-world assets with economic value. And there are ways to analyse them to work out whether they are over- or undervalued, such as price-to-earnings ratios, net profit margins, etc. With bitcoin, the whole value proposition rests on the idea of the network. If you took away the coinholders there would be literally nothing there, and so bitcoin's value would fall to nil. Trying to value it by talking about a "market cap" therefore makes no sense at all. Secondly, she takes aim at the circulating BTC supply: Another problem is that although 18.6m bitcoins have indeed been mined, far fewer can actually be said to be "in circulation" in any meaningful way. For a start, it is estimated that about 20 per cent of bitcoins have been lost in various ways, never to be recovered. Then there are the so-called "whales" that hold most of the bitcoin, whose dominance of the market has risen in recent months. The top 2.8 per cent of bitcoin addresses now control 95 per cent of the supply (including many that haven't moved any bitcoin for the past half-decade), and more than 63 per cent of the bitcoin supply hasn't been moved for the past year, according to recent estimates.
The small circulating supply means that BTC liquidity is an illusion: the idea that you can get out of your bitcoin position at any time and the market will stay intact is frankly a nonsense. And that's why the bitcoin religion's "HODL" mantra is so important to be upheld, of course. Because if people start to sell, bad things might happen! And they sometimes do. The excellent crypto critic Trolly McTrollface (not his real name, if you're curious) pointed out on Twitter that on Saturday a sale of just 150 bitcoin resulted in a 10 per cent drop in the price. And there are a lot of "whales" HODL-ing. If one decides to cash out, everyone will get trampled in the rush for the exits: More than 2,000 wallets contain over 1,000 bitcoin in them. What would happen to the price if just one of those tried to unload their coins on to the market at once? It wouldn't be pretty, we would wager. What we call the "bitcoin price" is in fact only the price of the very small number of bitcoins that wash around the retail market, and doesn't represent the price that 18.6m bitcoins would actually be worth, even if they were all actually available. Note that Kelly's critique implicitly assumes that BTC is priced in USD, not in the mysteriously inflatable USDT. The graph shows that the vast majority of the "very small number of bitcoins that wash around the retail market" are traded for, and thus priced in, USDT. So the actual number of bitcoins being traded for real money is a small fraction of a very small number. Bitfinex & Tether have agreed to comply with the New York Supreme Court and turn over their financial records to the New York Attorney General by 15th January. If they actually do, and the details of what is actually backing the current stock of nearly 24 billion USDT become known, things could get rather dynamic. As Tim Swanson explains in Parasitic Stablecoins, the 24B USD are notionally in a bank account, and the solvency of that account is not guaranteed by any government deposit insurance. So even if there were a bank account containing 24B USD, if there is a rush for the exits the bank holding that account could well go bankrupt. To give a sense of scale, the 150 BTC sale that crashed the "price" by 10% represents (150 / 6.25) / 6 = 4 hours of mining reward. If miners were cashing out their rewards, they would be selling 900BTC or $36M/day. In the long term, the lack of barriers to entry means that the margins on mining are small. But in the short term, mining capacity can't respond quickly to large changes in the "price". It certainly can't increase four times in three months. Let's assume that three months ago, when 1BTC≈10,000USDT, the BTC ecosystem was in equilibrium with the mining rewards plus fees slightly more than the cost of mining. While the BTC "price" has quadrupled, the hash rate and thus the cost of mining has oscillated between 110M and 150M TeraHash/s. It hasn't increased significantly, so miners now need to sell only about 225BTC or $9M/day to cover their costs. (A worked sketch of this arithmetic appears after the comments.) With the price soaring, they have an incentive to HODL their rewards. Posted by David. at 8:00 AM Labels: bitcoin 23 comments: David. said... Alex Pickard was an early buyer of BTC, and became a miner in 2017. But the scales have fallen from his eyes. In Bitcoin: Magic Internet Money he explains that BTC is useless for anything except speculation: "Essentially overnight it became “digital gold” with no use other than for people to buy and hodl ...
and hope more people would buy and hodl, and increase the price of BTC until everyone on earth sells their fiat currency for BTC, and then…? Well, what exactly happens then, when BTC can only handle about 350,000 transactions per day and 7.8 billion people need to buy goods and services?" And he is skeptical that Tether will survive: "If Tether continues as a going concern, and if the rising price of BTC is linked to USDT issuance, then BTC will likely continue to mechanically build a castle to the sky. I have shown how BTC price increases usually follow USDT issuance. In late 2018, when roughly 1 billion USDT were redeemed, the price of BTC subsequently fell by over 50%. Now, imagine what would happen if Tether received a cease-and-desist order, and its bank accounts were seized. Today’s digital gold would definitely lose its luster." January 14, 2021 at 10:10 AM David. said... The saga of someone trying to turn "crypto" into "fiat". January 14, 2021 at 3:39 PM David. said... An anonymous Bitcoin HODL-er finally figured out the Tether scam and realized his winnings. His must-read account is The Bit Short: Inside Crypto’s Doomsday Machine: "The legitimate crypto exchanges, like Coinbase and Bitstamp, clearly know to stay far away from Tether: neither supports Tether on their platforms. And the feeling is mutual! Because if Tether Ltd. were ever to allow a large, liquid market between Tethers and USD to develop, the fraud would instantly become obvious to everyone as the market-clearing price of Tether crashed far below $1. Kraken is the biggest USD-banked crypto exchange on which Tether and US dollars trade freely against each other. The market in that trading pair on Kraken is fairly modest — about $16M worth of daily volume — and Tether Ltd. surely needs to keep a very close eye on its movements. In fact, whenever someone sells Tether for USD on Kraken, Tether Ltd. has no choice but to buy it — to do otherwise would risk letting the peg slip, and unmask the whole charade. My guess is that maintaining the Tether peg on Kraken represents the single biggest ongoing capital expense of this entire fraud. If the crooks can’t scrape together enough USD to prop up the Tether peg on Kraken, then it’s game over, and the whole shambles collapses. And that makes it the fraud’s weak point." January 18, 2021 at 8:22 AM David. said... Tether's bank is Deltec, in the Bahamas. The anonymous Bitcoin HODL-er points out that: "Bahamas discloses how much foreign currency its domestic banks hold each month." As of the end of September 2020, all Bahamian banks in total held about $5.3B USD worth of foreign currency. At that time there were about 15.5B USDT in circulation. Even if we assume that Deltec held all of it, USDT was only 34% backed by actual money. January 18, 2021 at 9:14 AM David. said... David Gerard's Tether printer go brrrrr — cryptocurrency’s substitute dollar problem collects a lot of nuggets about Tether, but also this: "USDC loudly touts claims that it’s well-regulated, and implies that it’s audited. But USDC is not audited — accountants Grant Thornton sign a monthly attestation that Centre have told them particular things, and that the paperwork shows the right numbers. An audit would show for sure whether USDC’s reserve was real money, deposited by known actors — and not just a barrel of nails with a thin layer of gold and silver on top supplied by dubious entities. But, y’know, it’s probably fine and you shouldn’t worry." February 3, 2021 at 3:05 PM David. said... 
In 270 addresses are responsible for 55% of all cryptocurrency money laundering, Catalin Cimpanu discusses a report from Chainalysis: "1,867 addresses received 75% of all criminally-linked cryptocurrency funds in 2020, a sum estimated at around $1.7 billion. ... The company believes that the cryptocurrency-related money laundering field is now in a vulnerable position where a few well-orchestrated law enforcement actions against a few cryptocurrency operators could cripple the movement of illicit funds of many criminal groups at the same time. Furthermore, additional analysis also revealed that many of the services that play a crucial role in money laundering operations are also second-tier services hosted at larger legitimate operators. In this case, a law enforcement action wouldn't even be necessary, as convincing a larger company to enforce its anti-money-laundering policies would lead to the shutdown of many of today's cryptocurrency money laundering hotspots." February 15, 2021 at 12:24 PM David. said... In Bitcoin is now worth $50,000 — and it's ruining the planet faster than ever, Eric Holthaus points out the inevitable result of the recent spike in BTC: "The most recent data, current as of February 17 from the University of Cambridge shows that Bitcoin is drawing about 13.62 Gigawatts of electricity, an annualized consumption of 124 Terawatt-hours – about a half-percent of the entire world's total – or about as much as the entire country of Pakistan. Since most electricity used to mine Bitcoin comes from fossil fuels, Bitcoin produces a whopping 37 million tons of carbon dioxide annually, about the same amount as Switzerland does by simply existing." February 21, 2021 at 12:19 PM David. said... In Elon Musk wants clean power, but Tesla's dealing in environmentally dirty bitcoin, Reuters notes that: "Tesla boss Elon Musk is a poster child of low-carbon technology. Yet the electric carmaker's backing of bitcoin this week could turbocharge global use of a currency that's estimated to cause more pollution than a small country every year. Tesla revealed on Monday it had bought $1.5 billion of bitcoin and would soon accept it as payment for cars, sending the price of the cryptocurrency through the roof. ... The digital currency is created via high-powered computers, an energy-intensive process that currently often relies on fossil fuels, particularly coal, the dirtiest of them all." But Reuters fails to ask where the $1.5B that spiked BTC's "price" came from. It wasn't Musk's money, it was the Tesla shareholders' money. And how did they get it? By selling carbon offsets. So Musk is taking subsidies intended to reduce carbon emissions and using them to generate carbon emissions. February 21, 2021 at 12:30 PM David. said... One flaw in Eric Holthaus' Bitcoin is now worth $50,000 — and it's ruining the planet faster than ever is that while he writes: "There are decent alternatives to Bitcoin for people still convinced by the potential social benefits of cryptocurrencies. Ethereum, the world's number two cryptocurrency, is currently in the process of converting its algorithm from one that's fundamentally competitive (proof-of-work, like Bitcoin uses) to one that's collaborative (proof-of-stake), a move that will conserve more than 99% of its electricity use." He fails to point out that (a) Ethereum has been trying to move to proof-of-stake for many years without success, and (b) there are a huge number of other proof-of-work cryptocurrencies that, in aggregate, also generate vast carbon emissions.
February 21, 2021 at 12:57 PM David. said... Four posts worth reading inspired by Elon Musk's pump-and-HODL of Bitcoin. First, Jamie Powell's Tesla and bitcoin: the accounting explains how $1.5B of BTC will further obscure the underlying business model of Tesla. Of course, if investors actually understood Tesla's business model they might not be willing to support a PE of, currently, 1,220.78, so the obscurity may be the reason for the HODL. Second, Izabella Kaminska's What does institutional bitcoin mean? looks at the investment strategies hedge funds like Blackrock will use as they "dabble in Bitcoin". It involves the BTC futures market being in contango and is too complex to extract but well worth reading. Third, David Gerard's Number go up with Tether — Musk and Bitcoin set the world on fire points out that Musk's $1.5B only covers 36 hours of USDT printing: "Tether has given up caring about plausible appearances, and is now printing a billion tethers at a time. As I write this, Tether states its reserve as $34,427,896,266.91 of book value. That's $34.4 billion — every single dollar of which is backed by … pinky-swears, maybe? Tether still won't reveal what they're claiming to constitute backing reserves." Fourth, in Bitcoin's 'Elon Musk pump' rally to $48K was exclusively driven by whales, Joseph Young writes: "In recent months, so-called “mega whales” sold large amounts of Bitcoin between $33,000 and $40,000. Orders ranging from $1 million to $10 million rose significantly across major cryptocurrency exchanges, including Binance. But as the price of Bitcoin began to consolidate above $33,000 after the correction from $40,000, the buyer demand from whales surged once again. Analysts at “Material Scientist” said that whales have been showing unusually large volume, around $150 million in 24 hours. This metric shows that whales are consistently accumulating Bitcoin in the aftermath of the news that Tesla bought $1.5 billion worth of BTC." February 21, 2021 at 4:49 PM David. said... Ethereum consumes about 22.5TWh/yr - much less than Bitcoin's 124TWh/yr, but still significant. It will continue to waste power until the switch to proof-of-stake, underway for the past 7 years, finally concludes. Don't hold your breath. February 22, 2021 at 10:21 AM David. said... The title of Jemima Kelly's Hey Citi, your bitcoin report is embarrassingly bad says all that needs to be said, but her whole post is a fun read. March 2, 2021 at 9:26 AM David. said... Jemima Kelly takes Citi's embarrassing "bitcoin report" to the woodshed again in The many chart crimes of *that* Citi bitcoin report: "Not only was this "report" actually just a massive bitcoin-shilling exercise, it also contained some really quite embarrassing errors from what is meant to be one of the top banks in the world (and their "premier thought leadership" division at that). The error that was probably most shocking was the apparent failure of the six Citi analysts who authored the report to grasp the difference between basis points and percentage points." March 3, 2021 at 6:43 AM David. said... Adam Tooze's Talking (and reading) about Bitcoin is an economist's view of Bitcoin: "To paraphrase Gramsci, crypto is the morbid symptom of an interregnum, an interregnum in which the gold standard is dead but a fully political money that dares to speak its name has not yet been born. Crypto is the libertarian spawn of neoliberalism's ultimately doomed effort to depoliticize money."
Tooze quotes Izabella Kaminska contrasting the backing of "fiat" by the requirement to pay tax with Bitcoin: "Private “hackers” routinely raise revenue from stealing private information and then demanding cryptocurrency in return. The process is known as a ransom attack. It might not be legal. It might even be classified as extortion or theft. But to the mindset of those who oppose “big government” or claim that “tax is theft”, it doesn’t appear all that different. A more important consideration is which of these entities — the hacker or a government — is more effective at enforcing their form of “tax collection” upon the system. The government, naturally, has force, imprisonment and the law on its side. And yet, in recent decades, that hasn’t been quite enough to guarantee effective tax collection from many types of individuals or corporations. Hackers, at a minimum, seem at least comparably effective at extracting funds from rich individuals or multinational organisations. In many cases, they also appear less willing to negotiate or to cut deals." March 5, 2021 at 8:49 AM David. said... IBM Blockchain Is a Shell of Its Former Self After Revenue Misses, Job Cuts: Sources by Ian Allison is the semi-official death-knell for IBM's Hyperledger: "IBM has cut its blockchain team down to almost nothing, according to four people familiar with the situation. Job losses at IBM (NYSE: IBM) escalated as the company failed to meet its revenue targets for the once-fêted technology by 90% this year, according to one of the sources." David Gerard comments: "Hyperledger was a perfect IBM project — a Potemkin village open source project, where all the work was done in an IBM office somewhere." March 5, 2021 at 2:23 PM David. said... Ketan Joshi's Bitcoin is a mouth hungry for fossil fuels is a righteous rant about cryptocurrencies' energy usage: "I think the story of Bitcoin isn’t a sideshow to climate; it’s actually a very significant and central force that will play a major role in dragging down the accelerating pace of positive change. This is because it has an energy consumption problem, it has a fossil fuel industry problem, and it has a deep cultural / ideological problem. All three, in symbiotic concert, position Bitcoin to stamp out the hard-fought wins of the past two decades, in climate. Years of blood, sweat and tears – in activism, in technological development, in policy and regulation – extinguished by a bunch of bros with laser-eye profile pictures." March 16, 2021 at 10:57 AM David. said... The externalities of cryptocurrencies, and bitcoin in particular, don't just include ruining the climate, but also ruining the lives of vulnerable elderly who have nothing to do with "crypto". Mark Rober's fascinating video Glitterbomb Trap Catches Phone Scammer (who gets arrested) reveals that Indian phone scammers transfer their ill-gotten gains from stealing the life savings of elderly victims from the US to India using Bitcoin. March 19, 2021 at 6:19 PM David. said... The subhead of Noah Smith's Bitcoin Miners Are on a Path to Self-Destruction is: "Producing the cryptocurrency is a massive drain on global power and computer chip supplies. Another way is needed before countries balk." March 26, 2021 at 11:50 AM David. said... In Before Bitfinex and Tether, Bennett Tomlin pulls together the "interesting" backgrounds of the "trustworthy" people behind Bitfinex & Tether. March 29, 2021 at 4:14 PM David. said... 
David Gerard reports that: "Coinbase has had to pay a $6.5 million fine to the CFTC for allowing an unnamed employee to wash-trade Litecoin on the platform. On some days, the employee's wash-trading was 99% of the Litecoin/Bitcoin trading pair's volume. Coinbase also operated two trading bots, "Hedger and Replicator," which often matched each others' orders, and reported these matches to the market." As he says: "If Coinbase — one of the more regulated exchanges — did this, just think what the unregulated exchanges get up to." Especially with the "trustworthy" characters running the unregulated exchanges. March 29, 2021 at 4:19 PM David. said... Martin C. W. Walker and Winnie Mosioma's Regulated cryptocurrency exchanges: sign of a maturing market or oxymoron? examines the (mostly lack of) regulation of exchanges and concludes: "In general, cryptocurrencies lack anyone that is genuinely accountable for core processes such as transfers of ownership, trade validation and creation of cryptocurrencies. A concern that can ultimately only be dealt with by acceptance of the situation or outright bans. However, the almost complete lack of regulation of the highly centralised cryptocurrency exchanges should be an easier-to-fill gap. Regulated entities relying on prices from "exchanges" for accounting or calculation of the value of futures contracts are clearly putting themselves at significant risk." Coinbase just filed for a $65B direct listing despite having just been fined $6.5M for wash-trading Litecoin. April 14, 2021 at 12:10 PM David. said... Izabella Kaminska outlines the risks underlying Coinbase's IPO in Why Coinbase's stellar earnings are not what they seem. The sub-head is: "It's easy to be profitable if your real unique selling point is being a beneficiary of regulatory arbitrage." And she concludes: "Coinbase may be a hugely profitable business, but it may also be a uniquely risky one relative to regulated trading venues such as the CME or ICE, neither of which are allowed to take principal positions to facilitate liquidity on their platforms. Instead, they rely on third party liquidity providers. Coinbase, however, is not only known to match client transactions on an internalised "offchain" basis (that is, not via the primary blockchain) but also to square-off residual unmatched positions via bilateral relationships in crypto over-the-counter markets, where it happens to have established itself as a prominent market maker. It's an ironic state of affairs because the netting processes that are at the heart of this system expose Coinbase to the very same risks that real-time gross settlement systems (such as bitcoin) were meant to vanquish." April 16, 2021 at 1:24 PM David. said... Nathan J. Robinson hits the nail on the head with Why Cryptocurrency Is A Giant Fraud: "You may have ignored Bitcoin because the evangelists for it are some of the most insufferable people on the planet—and you may also have kicked yourself because if you had listened to the first guy you met who told you about Bitcoin way back, you'd be a millionaire today. But now it's time to understand: is this, as its proponents say, the future of money?" and: "But as is generally the case when someone is trying to sell you something, the whole thing should seem extremely fishy. In fact, much of the cryptocurrency pitch is worse than fishy. It's downright fraudulent, promising people benefits that they will not get and trying to trick them into believing in and spreading something that will not do them any good.
When you examine the actual arguments made for using cryptocurrencies as currency, rather than just being wowed by the complex underlying system and words like "autonomy," "global," and "seamless," the case for their use by most people collapses utterly. Many believe in it because they have swallowed libertarian dogmas that do not reflect how the world actually works." Robinson carefully dismantles the idea that cryptocurrencies offer "security", "privacy", "convenience", and many of the other arguments for them. The whole article is well worth reading. April 25, 2021 at 5:34 PM
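The back-of-the-envelope mining arithmetic in the post above (150 BTC as four hours of block rewards, roughly 900 BTC or $36M/day if all rewards were sold, and about 225 BTC or $9M/day needed to cover costs) can be reproduced in a few lines of Python. A minimal sketch: the block reward, block rate and prices are the round numbers used in the post, not live data, and the $40,000 figure is simply the quadrupled equilibrium price implied by the post's $36M/day and $9M/day numbers.

BLOCK_REWARD_BTC = 6.25   # coinbase reward per block at the time of the post
BLOCKS_PER_HOUR = 6       # one block roughly every ten minutes
OLD_PRICE_USD = 10_000    # assumed equilibrium price three months earlier
NEW_PRICE_USD = 40_000    # the roughly quadrupled price implied by the post

# The 150 BTC sale expressed as hours of mining reward: (150 / 6.25) / 6 = 4.
hours_of_reward = (150 / BLOCK_REWARD_BTC) / BLOCKS_PER_HOUR

# Daily reward issuance, and its value if miners sold it all at the new price.
btc_per_day = BLOCK_REWARD_BTC * BLOCKS_PER_HOUR * 24        # 900 BTC/day
rewards_usd_per_day = btc_per_day * NEW_PRICE_USD            # $36M/day

# If costs roughly matched rewards at the old equilibrium price, daily costs are
# about 900 * $10,000 = $9M, which at the new price requires selling only:
cost_usd_per_day = btc_per_day * OLD_PRICE_USD               # $9M/day
btc_to_sell_per_day = cost_usd_per_day / NEW_PRICE_USD       # 225 BTC/day

print(hours_of_reward, btc_per_day, rewards_usd_per_day, btc_to_sell_per_day)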
blog-dshr-org-9954 ---- DSHR's Blog: What Is The Point?
Thursday, April 22, 2021 What Is The Point? During a discussion of NFTs, Larry Masinter pointed me to his 2012 proposal The 'tdb' and 'duri' URI schemes, based on dated URIs. The proposal's abstract reads: This document defines two URI schemes. The first, 'duri' (standing for "dated URI"), identifies a resource as of a particular time. This allows explicit reference to the "time of retrieval", similar to the way in which bibliographic references containing URIs are often written. The second scheme, 'tdb' (standing for "Thing Described By"), provides a way of minting URIs for anything that can be described, by the means of identifying a description as of a particular time. These schemes were posited as "thought experiments", and therefore this document is designated as Experimental. As far as I can tell, this proposal went nowhere, but it raises a question that is also raised by NFTs. What is the point of a link that is unlikely to continue to resolve to the expected content? Below the fold I explore this question.

I think there are two main reasons why duri: went nowhere:

The duri: concept implies that Web content in general is not static, but it is actually much more dynamic than that. Even the duri: specification admits this: There are many URIs which are, unfortunately, not particularly "uniform", in the sense that two clients can observe completely different content for the same resource, at exactly the same time. Personalization, advertisements, geolocation, watermarks, all make it very unlikely that either several clients accessing the same URI at the same time, or a single client accessing the same URI at different times, would see the same content.

When this proposal was put forward in 2012, it was competing with a less elegant but much more useful competitor that had been in use for 16 years. The duri: specification admits that: There are no direct resolution servers or processes for 'duri' or 'tdb' URIs. However, a 'duri' URI might be "resolvable" in the sense that a resource that was accessed at a point in time might have the result of that access cached or archived in an Internet archive service. See, for example, the "Internet Archive" project But the duri: URI doesn't provide the information needed to resolve to the "cached or archived" content. The Internet Archive's Wayback Machine uses URIs which, instead of the prefix duri:[datetime]:, have the prefix https://web.archive.org/web/[datetime]/. This is more useful, both because browsers will actually resolve these URIs, and because they resolve to a service devoted to delivering the content of the URI at the specified time. The competition for duri: was not merely long established, but also actually did what users presumably wanted, which was to resolve to the content of the specified URL at the specified time. It is true that a user creating a Wayback Machine URL, perhaps using the "Save Page Now" button, would preserve the content accessed by the Wayback Machine's crawler, which might be different from that accessed by the user themselves. But the user could compare the two versions at the time of creation, and avoid using the created Wayback Machine URL if the differences were significant (a sketch of such a comparison follows the post). Publishing a Wayback Machine URL carries an implicit warranty that the creator regarded any differences as insignificant.
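The mapping described above is mechanical enough to show directly. This is a minimal sketch, not part of Masinter's proposal or of the Internet Archive's own tooling; the helper name is mine, and it assumes the Wayback Machine's usual 14-digit YYYYMMDDhhmmss timestamps.

from datetime import datetime, timezone

def wayback_url(original_url, when=None):
    # Build the resolvable counterpart of a duri:[datetime]:[URL] reference,
    # i.e. https://web.archive.org/web/[datetime]/[URL].
    when = when or datetime.now(timezone.utc)
    timestamp = when.strftime('%Y%m%d%H%M%S')  # 14-digit Wayback-style datetime
    return f'https://web.archive.org/web/{timestamp}/{original_url}'

# For example, a reference to the state of a DOI link as of 1 February 2015:
print(wayback_url('http://dx.doi.org/10.1007/s00799-014-0108-0',
                  datetime(2015, 2, 1, tzinfo=timezone.utc)))

Such a URL only resolves usefully if the Wayback Machine actually holds a capture; in my experience it redirects to the capture closest to the requested timestamp, which is exactly the service that a bare duri: reference lacked.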
The history of duri: suggests that there isn't a lot of point in "durable" URIs lacking an expectation that they will continue to resolve to the original content. NFTs have the expectation, but lack the mechanism necessary to satisfy the expectation. Posted by David. at 8:00 AM Labels: personal digital preservation, web archiving
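The comparison step the post describes (checking, at creation time, whether the live page still matches its archived snapshot) can also be sketched briefly. Again this is only an illustration under assumptions: it uses the third-party requests library, a crude difflib similarity ratio rather than any real content-drift measure, and an arbitrary threshold.

import difflib
import requests

def classify_reference(original_url, archived_url, timeout=10, threshold=0.8):
    # Distinguish the two failure modes discussed in the post:
    # link rot (the original no longer resolves) and content drift
    # (it resolves, but to something substantially different).
    try:
        live = requests.get(original_url, timeout=timeout)
    except requests.RequestException:
        return 'rotted'
    if live.status_code != 200:
        return 'rotted'
    archived = requests.get(archived_url, timeout=timeout)
    similarity = difflib.SequenceMatcher(None, live.text, archived.text).ratio()
    return 'intact' if similarity >= threshold else 'drifted'

Note that a soft 404 would show up here as 'drifted' rather than 'rotted', which is one more reason the status code alone cannot be trusted.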
blog-esilibrary-com-6095 ---- Equinox Open Library Initiative
Equinox provides innovative open source software for libraries of all types. Extraordinary service. Exceptional value. As a 501(c)(3) nonprofit corporation, Equinox supports library automation by investing in open source software and providing technology services for libraries.

News & Events: Press Release: Equinox Open Library Initiative Awards Center for Khmer Studies the Equinox Open Source Grant. Press Release: Equinox Open Library Initiative Awards Vermont Jazz Center the Equinox Open Source Grant. Press Release: Equinox Launches New Website Featuring Open Source Library Products, Services, and Education.

Products & Services: Koha is the first free and open source library automation package. Equinox's team includes some of Koha's core developers. Evergreen is a unique and powerful open source ILS designed to support large, dispersed, and multi-tiered library networks. Equinox provides ongoing educational opportunities through equinoxEDU, including live webinars, workshops, and online resources. Fulfillment is an open source interlibrary loan management system. Fulfillment can be used alongside or in connection with any integrated library system. CORAL is an open source electronic resources management system. Its interoperable modules allow libraries to streamline their management of electronic resources. Customized For Your Library: Consulting, Migration, Development, Hosting & Support, Training & Education.

Why Choose Equinox? Equinox is different from most ILS providers. As a non-profit organization, our guiding principle is to provide a transparent, open software development process, and we release all code developed to publicly available repositories. Equinox is experienced with serving libraries of all types in the United States and internationally. We've supported and migrated libraries of all sizes, from single library sites to full statewide implementations. Equinox is technically proficient, with skilled project managers, software developers, and data services staff ready to assist you. We've helped libraries automating for the first time and those migrating from legacy ILS systems. Equinox knows libraries. More than fifty percent of our team are professional librarians with direct experience working in academic, government, public and special libraries. We understand the context and ecosystem of library software.

Working with Equinox has been like night and day. It's amazing to have a system so accessible to our patrons and easy to use. It has super-charged our library lending power! Brooke Matson, Executive Director, Spark Central. Equinox Open Library Initiative hosts Evergreen for the SCLENDS library consortium. Their technical support has been both prompt, responsive, and professional in reacting to our support requests during COVID-19. They have been a valuable consortium partner in meeting the needs of the member libraries and their patrons. Chris Yates, South Carolina State Library. Working with Equinox was great! They were able to migrate our entire consortium with no down time during working hours. The Equinox team went the extra mile in helping Missouri Evergreen. Colleen Knight, Missouri Evergreen.

Events: Open Source Twitter Chat with Guest Moderator Becky Yoose #ChatOpenS: Join us on Twitter with the hashtag #ChatOpenS as we discuss cybersecurity with Becky Yoose of LDH Consulting Services. Open Source Twitter Chat with Rogan Hamby #ChatOpenS, 03/17/2021: Join us on Twitter @EquinoxOLI and the #ChatOpenS hashtag from 12-1pm EDT as we discuss all things #opensource & libraries. Moderated by Rogan Hamby, Data and Project Analyst for EquinoxEDU: Spotlight on Evergreen 3.6, 03/05/2021: Join us for an EquinoxEDU: Spotlight session on new features in the Evergreen ILS! In this live webinar we will highlight some of the newest features in version 3.6.

Equinox Open Library Initiative Inc. is a 501(c)3 corporation devoted to the support of open source software for public libraries, academic libraries, school libraries, and special libraries. As the successor to Equinox Software, Inc., Equinox provides exceptional service and technical expertise delivered by experienced librarians and technical staff. Equinox offers affordable, customized consulting services, software development, hosting, training, and technology support for libraries of all sizes and types. Contact Us: info@equinoxoli.org, 877.OPEN.ILS (877.673.6457), +1.770.709.5555, PO Box 69, Norcross, GA 30091.
Privacy Policy  |   Terms of Use  |   Equinox Library Services Canada  |   Site Map Skip to content Open toolbar Accessibility Tools Increase Text Decrease Text Grayscale High Contrast Negative Contrast Light Background Links Underline Readable Font Reset blog-esilibrary-com-7328 ---- Equinox Open Library Initiative Skip to content Facebook-f Twitter Linkedin-in Vimeo About Our Team Newsroom Events History Ethics Disclosures Products Evergreen Koha Fulfillment CORAL Services Consulting Migration Development Hosting & Support Training & Education Learn equinoxEDU Tips & Tricks Conference Presentations Collaborate Communities Partnerships Grants We Provide Connect Sales Support Donate Social Media × About Our Team Newsroom Events History Ethics Disclosures Products Evergreen Koha Fulfillment CORAL Services Consulting Migration Development Hosting & Support Training & Education Learn equinoxEDU Tips & Tricks Conference Presentations Collaborate Communities Partnerships Grants We Provide Connect Sales Support Donate Social Media About Our Team Newsroom Events History Ethics Disclosures Products Evergreen Koha Koha On Demand Koha Dedicated Hosting Fulfillment CORAL SubjectsPlus Services Consulting Workflow and Advanced ILS Consultation Data Services Web Design IT Consultation Migration Development Hosting & Support Sequoia Training & Education Learn equinoxEDU Tips & Tricks Conference Presentations Resource Library Collaborate Communities Evergreen Koha CORAL Equinox Grants Connect Sales Support Donate Contact Us × About Our Team Newsroom Events History Ethics Disclosures Products Evergreen Koha Koha On Demand Koha Dedicated Hosting Fulfillment CORAL SubjectsPlus Services Consulting Workflow and Advanced ILS Consultation Data Services Web Design IT Consultation Migration Development Hosting & Support Sequoia Training & Education Learn equinoxEDU Tips & Tricks Conference Presentations Resource Library Collaborate Communities Evergreen Koha CORAL Equinox Grants Connect Sales Support Donate Contact Us About Our Team Newsroom Events History Ethics Disclosures Products Evergreen Koha Fulfillment CORAL Services Consulting Migration Development Hosting & Support Training & Education Learn equinoxEDU Tips & Tricks Conference Presentations Collaborate Communities Partnerships Grants We Provide Connect Sales Support Donate Social Media × About Our Team Newsroom Events History Ethics Disclosures Products Evergreen Koha Fulfillment CORAL Services Consulting Migration Development Hosting & Support Training & Education Learn equinoxEDU Tips & Tricks Conference Presentations Collaborate Communities Partnerships Grants We Provide Connect Sales Support Donate Social Media Equinox provides innovative open source software for libraries of all types. Extraordinary service. Exceptional value. As a 501(c)(3) nonprofit corporation, Equinox supports library automation by investing in open source software and providing technology services for libraries. Products Services Ask Us How » About Equinox » News & Events Press Release Equinox Open Library Initiative Awards Center for Khmer Studies the Equinox Open Source Grant Learn More » Press Release Equinox Open Library Initiative Awards Vermont Jazz Center the Equinox Open Source Grant Learn More » Press Release Equinox Launches New Website Featuring Open Source Library Products, Services, and Education Learn More » Products & Services Koha is the first free and open source library automation package. Equinox’s team includes some of Koha’s core developers. 
Evergreen is a unique and powerful open source ILS designed to support large, dispersed, and multi-tiered library networks.
Equinox provides ongoing educational opportunities through equinoxEDU, including live webinars, workshops, and online resources.
Fulfillment is an open source interlibrary loan management system. Fulfillment can be used alongside or in connection with any integrated library system.
CORAL is an open source electronic resources management system. Its interoperable modules allow libraries to streamline their management of electronic resources.
Customized For Your Library: Consulting · Migration · Development · Hosting & Support · Training & Education
Why Choose Equinox?
Equinox is different from most ILS providers. As a non-profit organization, our guiding principle is to provide a transparent, open software development process, and we release all code developed to publicly available repositories. Equinox is experienced with serving libraries of all types in the United States and internationally. We've supported and migrated libraries of all sizes, from single library sites to full statewide implementations. Equinox is technically proficient, with skilled project managers, software developers, and data services staff ready to assist you. We've helped libraries automate for the first time and those migrating from legacy ILS systems. Equinox knows libraries. More than fifty percent of our team are professional librarians with direct experience working in academic, government, public and special libraries. We understand the context and ecosystem of library software.
Sign up today for news & updates!
"Working with Equinox has been like night and day. It's amazing to have a system so accessible to our patrons and easy to use. It has super-charged our library lending power!" – Brooke Matson, Executive Director, Spark Central
"Equinox Open Library Initiative hosts Evergreen for the SCLENDS library consortium. Their technical support has been both prompt, responsive, and professional in reacting to our support requests during COVID-19. They have been a valuable consortium partner in meeting the needs of the member libraries and their patrons." – Chris Yates, South Carolina State Library
"Working with Equinox was great! They were able to migrate our entire consortium with no down time during working hours. The Equinox team went the extra mile in helping Missouri Evergreen." – Colleen Knight, Missouri Evergreen
Twitter
Equinox OLI @EquinoxOLI · 13h: Our latest: "Equinox Open Library Initiative Awards Vermont Jazz Center the Equinox Open Source Grant." Read more: https://bit.ly/3xjCLFT #equinoxgrant #grant #opensource #kohails #oss @vtjazz
Equinox OLI retweeted Newswire @inewswire · 20 Apr: Equinox Open Library Initiative celebrates 15 years as a small business delivering 'Extraordinary Service.
Exceptional Value' to libraries worldwide: https://www.newswire.com/news/equinox-launches-new-website-featuring-open-source-library-products-21368731
Equinox OLI @EquinoxOLI · 20 Apr: equinoxEDU: Spotlight on the @EvergreenILS Booking Module registration is open! Save your spot for April 28, 1-2pm EDT. Free & open: https://bit.ly/3ahwbGl
blog-iandavis-com-2237 ---- Internet Alchemy, the blog of Ian Davis. Internet Alchemy est.
1999.
Mon, Oct 23, 2017
Serverless: why microfunctions > microservices
This post follows on from a post I wrote a couple of years back called Why Service Architectures Should Focus on Workflows. In that post I attempted to describe the fragility of microservice systems that were simply translating object-oriented patterns to the new paradigm. These systems were migrating domain models and their interactions from in-memory objects to separate networked processes. They were replacing in-process function calls with cross-network RPC calls, adding latency and infrastructure complexity. The goal was scalability and flexibility but, I argued, the entity modelling approach introduced new failure modes. I suggested a solution: instead of carving up the domain by entity, focus on the workflows.
If I were writing that post today I would say "focus on the functions" because the future is serverless functions, not microservices. Or, more brashly: microfunctions > microservices.
The industry has moved apace in the last 3 years with a focus on solving the infrastructure challenges caused by running hundreds of intercommunicating microservices. Containers have matured and become the de facto standard for the unit of microservice deployment, with management platforms such as Kubernetes to orchestrate them and frameworks like gRPC for robust interservice communication. The focus still tends to be on interacting entities though: when placing an order the "order service" talks to the "customer service", which reserves items by talking to the "stock service" and the "payment service", which talks to the "payment gateway" after first checking with the "fraud service". When the order needs to be shipped, the "shipping service" asks the "order service" for orders that need to be fulfilled, tells the "stock service" to remove the reservation, and then asks the "customer service" to locate the customer, and so on. All of these services are likely to be persisting state in various backend databases. Microservices are organized as vertical slices through the domain.
The same problems still exist: if the customer service is overwhelmed by the shipping service then the order service can't take new orders. The container manager will, of course, scale up the number of customer service instances and register them with the appropriate load balancers, discovery servers, monitoring and logging. However, it cannot easily cope with a critical failure in this service, perhaps caused by a repeated bad request that panics the service and prevents multiple dependent services from operating properly. Failures and slowdowns in response times are handled within client services through backoff strategies, circuit breakers and retries. The system as a whole increases in complexity but remains fragile.
By contrast, in a serverless architecture, the emphasis is on the functions of the system. For this reason serverless is sometimes called FaaS – Functions as a Service. Systems are decomposed into functions that encapsulate a single task in a single process. Instead of each request involving the orchestration of multiple services, the request uses an instance of the appropriate function. Rather than the domain model being exploded into separate networked processes, its entities are provided in code libraries compiled into the function at build time. Calls to entity methods are in-process, so they don't pay the network latency or reliability taxes. In this paradigm the "place order" function simply calls methods on customer, stock and payment objects, which may then interact with the various backend databases directly. Instead of a dozen networked RPC calls, the function relies on 2-3 database calls.
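To make that concrete, here is a minimal sketch of what a "place order" microfunction might look like. This is not code from the original post: the handler shape, the module-level helpers standing in for the compiled-in domain libraries, and the event fields are all illustrative assumptions.

```python
# Minimal sketch of a "place order" microfunction (illustrative only).
# The domain logic lives in libraries linked into the function at build
# time, so the calls below are ordinary in-process function calls; only
# the datastore/gateway round-trips would cross the network.

from dataclasses import dataclass


@dataclass
class Order:
    customer_id: str
    items: dict        # sku -> quantity
    total_cents: int


# Stand-ins for the compiled-in domain libraries (hypothetical names).
def reserve_stock(items: dict) -> bool:
    # would be one database call: atomically decrement available stock
    return True


def charge_customer(customer_id: str, amount_cents: int) -> bool:
    # would be one call to the payments backend
    return True


def save_order(order: Order) -> str:
    # would be one database call: persist the order and return its id
    return "order-123"


def place_order(event: dict) -> dict:
    """Entry point the FaaS platform invokes once per request."""
    order = Order(
        customer_id=event["customer_id"],
        items=event["items"],
        total_cents=event["total_cents"],
    )
    if not reserve_stock(order.items):
        return {"status": 409, "body": "out of stock"}
    if not charge_customer(order.customer_id, order.total_cents):
        return {"status": 402, "body": "payment declined"}
    return {"status": 201, "body": save_order(order)}


if __name__ == "__main__":
    print(place_order({"customer_id": "c1",
                       "items": {"sku-9": 2},
                       "total_cents": 4200}))
```

Because the whole workflow lives in one process, a crash or bad deploy here is confined to order placement; functions handling other workflows keep running.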
Additionally, if a function is particularly hot it can be scaled directly without affecting the operation of other functions and, crucially, it can fail completely without taking down other functions. (Modulo the reliability of databases, which affect both styles of architecture identically.) Microfunctions are horizontal slices through the domain.
The advantages I wrote last time still hold up when translated to serverless terminology:
- Deploying or retiring a function becomes as simple as switching it on or off, which leads to greater freedom to experiment.
- Scaling a function is limited to scaling a single type of process horizontally, and the costs of doing this can be cleanly evaluated.
- The system as a whole becomes much more robust. When a function encounters problems it is limited to a single workflow, such as issuing invoices. Other functions can continue to operate independently.
- Latency, bandwidth use and reliability are all improved because there are fewer network calls. The function still relies on the database and other support systems such as lock servers, but most of the data flow is controlled in-process.
- The unit of testing and deployment is a single function, which reduces the complexity and cost of maintenance.
One major advantage that I missed is the potential for extreme cost savings through scale, particularly the scale attainable by running on public shared infrastructure. Since all the variability of microservice deployment configurations is abstracted away into a simple request/response interface, the microfunctions can be run as isolated shared-nothing processes, billed only for the resources they use in their short lifetime. Anyone who has costed for redundant microservices simply for basic resilience will appreciate the potential here.
Although there are a number of cloud providers in this space (AWS Lambda, Google Cloud Functions, Azure Functions), serverless is still an emerging paradigm with the problems that come with immaturity. Adrian Colyer recently summarized an excellent paper and presentation dealing with the challenges of building serverless systems which highlights many of these, including the lack of service level agreements and loose performance guarantees. It seems almost certain though that these will improve as the space matures and overtakes the microservice paradigm.
Other posts tagged as architecture, distributed-systems, technology, serverless, faas
Earlier Posts: Gorecipes: Fin (Wed, Mar 30 2016) · Another Blog Refresh (Sun, Feb 22 2015) · Why Service Architectures Should Focus on Workflows (Mon, Mar 31 2014) · Help me crowdfund my game Amberfell (Mon, Nov 12 2012)
blog-librarything-com-3311 ---- The Thingology Blog
New Syndetics Unbound Feature: Mark and Boost Electronic Resources
ProQuest and LibraryThing have just introduced a major new feature to our catalog-enrichment suite, Syndetics Unbound, to meet the needs of libraries during the COVID-19 crisis. Our friends at ProQuest blogged about it briefly on the ProQuest blog.
This blog post goes into greater detail about what we did, how we did it, and what […] Introducing Syndetics Unbound Short Version Today we’re going public with a new product for libraries, jointly developed by LibraryThing and ProQuest. It’s called Syndetics Unbound, and it makes library catalogs better, with catalog enrichments that provide information about each item, and jumping-off points for exploring the catalog. To see it in action, check out the Hartford Public Library […] ALAMW 2016 in Boston (and Free Passes)! Abby and KJ will be at ALA Midwinter in Boston this weekend, showing off LibraryThing for Libraries. Since the conference is so close to LibraryThing headquarters, chances are good that a few other LT staff members may appear, as well! Visit Us. Stop by booth #1717 to meet Abby & KJ (and potential mystery guests!), […] For ALA 2015: Three Free OPAC Enhancements For a limited time, LibraryThing for Libraries (LTFL) is offering three of its signature enhancements for free! There are no strings attached. We want people to see how LibraryThing for Libraries can improve your catalog. Check Library. The Check Library button is a “bookmarklet” that allows patrons to check if your library has a book […] ALA 2015 in San Francisco (Free Passes) Our booth. But this is Kate, not Tim or Abby. She had the baby. Tim and I are headed to San Francisco this weekend for the ALA Annual Conference. Visit Us. Stop by booth #3634 to talk to us, get a demo, and learn about all the new and fun things we’re up to with […] New “More Like This” for LibraryThing for Libraries We’ve just released “More Like This,” a major upgrade to LibraryThing for Libraries’ “Similar items” recommendations. The upgrade is free and automatic for all current subscribers to LibraryThing for Libraries Catalog Enhancement Package. It adds several new categories of recommendations, as well as new features. We’ve got text about it below, but here’s a short […] Subjects and the Ship of Theseus I thought I might take a break to post an amusing photo of something I wrote out today: The photo is a first draft of a database schema for a revamp of how LibraryThing will do library subjects. All told, it has 26 tables. Gulp. About eight of the tables do what a good cataloging […] LibraryThing Recommends in BiblioCommons Does your library use BiblioCommons as its catalog? LibraryThing and BiblioCommons now work together to give you high-quality reading recommendations in your BiblioCommons catalog. You can see some examples here. Look for “LibraryThing Recommends” on the right side. Not That Kind of Girl (Daniel Boone Regional Library) Carthage Must Be Destroyed (Ottowa Public Library) The […] NEW: Annotations for Book Display Widgets Our Book Display Widgets is getting adopted by more and more libraries, and we’re busy making it better and better. Last week we introduced Easy Share. This week we’re rolling out another improvement—Annotations! Book Display Widgets is the ultimate tool for libraries to create automatic or hand-picked virtual book displays for their home page, blog, […] Send us a programmer, win $1,000 in books. We just posted a new job post Job: Library Developer at LibraryThing (Telecommute). To sweeten the deal, we are offering $1,000 worth of books to the person who finds them. That’s a lot of books. Rules! You get a $1,000 gift certificate to the local, chain or online bookseller of your choice. 
To qualify, you […] blog-librarything-com-8946 ---- The Thingology Blog The LibraryThing Blog Thingology Monday, April 20th, 2020 New Syndetics Unbound Feature: Mark and Boost Electronic Resources ProQuest and LibraryThing have just introduced a major new feature to our catalog-enrichment suite, Syndetics Unbound, to meet the needs of libraries during the COVID-19 crisis. Our friends at ProQuest blogged about it briefly on the ProQuest blog. This blog post goes into greater detail about what we did, how we did it, and what efforts like this may mean for library catalogs in the future. What it Does The feature, “Mark and Boost Electronic Resources,” turns Syndetics Unbound from a general catalog enrichment tool to one focused on your library’s electronic resources—the resources patrons can access during a library shutdown. We hope it encourages libraries to continue to promote their catalog, the library’s own and most complete collection repository, instead of sending patrons to a host of partial, third-party eresource platforms. The new feature marks the library’s electronic resources and “boosts,” or promotes, them in Syndetics Unbound’s discovery enhancements, such as “You May Also Like,” “Other Editions,” “Tags” and “Reading Levels.” Here’s a screenshot showing the feature in action. How it Works The feature is composed of three settings. By default, they all turn on together, but they can be independently turned off and on. Boost electronic resources chooses to show electronic editions of an item where they exist, and boosts such items within discovery elements. Mark electronic resources with an “e” icon marks all electronic resources—ebooks, eaudio, and streaming video. Add electronic resources message at top of page adds a customizable message to the top of the Syndetics Unbound area. “Mark and Boost Electronic Holdings” works across all enrichments. It is particularly important for “Also Available As” which lists all the other formats for a given title. Enabling this feature sorts electronic resources to the front of the list. We also suggest that, for now, libraries may want to put “Also Available As” at the top of their enrichment order. Why We Did It Your catalog is only as good as your holdings. Faced with a world in which physical holdings are off-limits and electronic resources essential, many libraries have discouraged use of the catalog, which is dominated by non-digital resources, in favor of linking directly to Overdrive, Hoopla, Freegal and so forth. Unfortunately, these services are silos, containing only what you bought from that particular vendor. “Mark and Boost Electronic Resources” turns your catalog toward digital resources, while preserving what makes a catalog important—a single point of access to ALL library resources, not a vendor silo. Maximizing Your Electronic Holdings To make the best use of “Mark and Boost Electronic Resources,” we need to know about all your electronic resources. Unfortunately, some systems separate MARC holdings and electronic holdings; all resources appear in the catalog, but only some are available for export to Syndetics Unbound. Other libraries send us holding files with everything, but they are unable to send us updates every time new electronic resources are added. To address this issue, we have therefore advanced a new feature—”Auto-discover electronic holdings.” Turn this on and we build up an accurate representation of your library’s electronic resource holdings, without requiring any effort on your part. 
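The three settings (plus auto-discovery) described above boil down to a few flags and an ordering rule. The sketch below is purely illustrative: the setting names, data shapes, and sort rule are assumptions made for this example, not Syndetics Unbound's actual configuration or API, but they mirror the described behavior of marking electronic items and sorting them to the front of "Also Available As."

```python
# Illustrative sketch only: hypothetical setting names and data shapes
# modeling the behavior described in the post, not Syndetics Unbound's
# real configuration or API.

settings = {
    "boost_electronic_resources": True,   # prefer e-editions in discovery elements
    "mark_electronic_resources": True,    # show an "e" icon on ebooks/eaudio/video
    "electronic_resources_message": "Looking for something to read right now? "
                                    "Items marked with an 'e' are available online.",
    "auto_discover_electronic_holdings": True,
}


def order_editions(editions, settings):
    """Sort electronic editions to the front when boosting is enabled."""
    if not settings["boost_electronic_resources"]:
        return list(editions)
    # sorted() is stable, so ties keep their original relative order
    return sorted(editions, key=lambda e: not e.get("electronic", False))


editions = [
    {"format": "Hardcover", "electronic": False},
    {"format": "eBook", "electronic": True},
    {"format": "Audio CD", "electronic": False},
    {"format": "eAudio", "electronic": True},
]

print([e["format"] for e in order_editions(editions, settings)])
# -> ['eBook', 'eAudio', 'Hardcover', 'Audio CD']
```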
Adapting to Change “Mark and Boost Electronic Resources” is our first feature change to address the current crisis. But we are eager to do others, and to adapt the feature over time, as the situation develops. We are eager to get feedback from librarians and patrons! — The ProQuest and LibraryThing teams Labels: new features, new product, Syndetics Unbound posted by Tim @3:12 pm 0 Comments » Share Thursday, October 27th, 2016 Introducing Syndetics Unbound Short Version Today we’re going public with a new product for libraries, jointly developed by LibraryThing and ProQuest. It’s called Syndetics Unbound, and it makes library catalogs better, with catalog enrichments that provide information about each item, and jumping-off points for exploring the catalog. To see it in action, check out the Hartford Public Library in Hartford, CT. Here are some sample links: The Raven Boys by Maggie Stiefvater Alexander Hamilton by Ron Chernow Faithful Place by Tana French We’ve also got a press release and a nifty marketing site. UPDATE: Webinars Every Week! We’re now having weekly webinars, in which you can learn all about Syndetics Unbound, and ask us questions. Visit ProQuest’s WebEx portal to see the schedule and sign up! Long Version The Basic Idea Syndetics Unbound aims to make patrons happier and increase circulation. It works by enhancing discovery within your OPAC, giving patrons useful information about books, movies, music, and video games, and helping them find other things they like. This means adding elements like cover images, summaries, recommendations, series, tags, and both professional and user reviews. In one sense, Syndetics Unbound combines products—the ProQuest product Syndetics Plus and the LibraryThing products LibraryThing for Libraries and Book Display Widgets. In a more important sense, however, it leaps forward from these products to something new, simple, and powerful. New elements were invented. Static elements have become newly dynamic. Buttons provide deep-dives into your library’s collection. And—we think—everything looks better than anything Syndetics or LibraryThing have done before! (That’s one of only two exclamation points in this blog post, so we mean it.) Simplicity Syndetics Unbound is a complete and unified solution, not a menu of options spread across one or even multiple vendors. This simplicity starts with the design, which is made to look good out of the box, already configured for your OPAC and look. The installation requirements for Syndetics Unbound are minimal. If you already have Syndetics Plus or LibraryThing for Libraries, you’re all set. If you’ve never been a customer, you only need to add a line of HTML to your OPAC, and to upload your holdings. Although it’s simple, we didn’t neglect options. Libraries can reorder elements, or drop them entirely. We expect libraries will pick and choose, and evaluate elements according to patron needs, or feedback from our detailed usage stats. Libraries can also tweak the look and feel with custom CSS stylesheets. And simplicity is cheap. To assemble a not-quite-equivalent bundle from ProQuest’s and LibraryThing’s separate offerings would cost far more. We want everyone who has Syndetics Unbound to have it in its full glory. Comprehensiveness and Enrichments Syndetics Unbound enriches your catalog with some sixteen enrichments, but the number is less important than the options they encompass. 
These include both professional and user-generated content, information about the item you’re looking at, and jumping-off points to explore similar items. Quick descriptions of the enrichments: Boilterplate covers for items without covers. Premium Cover Service. Syndetics offers the most comprehensive cover database in existence for libraries—over 25 million full-color cover images for books, videos, DVDs, and CDs, with thousands of new covers added every week. For Syndetics Unbound, we added boilerplate covers for items that don’t have a cover, which include the title, author, and media type. Summaries. Over 18 million essential summaries and annotations, so patrons know what the book’s about. About the Author. This section includes the author biography and a small shelf of other items by the author. The section is also adorned by a small author photo—a first in the catalog, although familiar elsewhere on the web. Look Inside. Includes three previous Syndetics enrichments—first chapters or excerpts, table of contents and large-size covers—newly presented as a “peek inside the book” feature. Series. Shows a book’s series, including reading order. If the library is missing part of the series, those covers are shown but grayed out. You May Also Like. Provides sharp, on-the-spot readers advisory in your catalog, with the option to browse a larger world of suggestions, drawn from LibraryThing members and big-data algorithms. In this and other enrichments, Syndetics Unbound only recommends items that your library owns. The Syndetics Unbound recommendations cover far more of your collection than any similar service. For example, statistics from the Hartford Public Library show this feature on 88% of items viewed. Professional Reviews includes more than 5.4 million reviews from Library Journal, School Library Journal, New York Times, The Guardian, The Horn Book, BookList, BookSeller + Publisher Magazine, Choice, Publisher’s Weekly, and Kirkus. A la carte review sources include Voice of Youth Advocates: VOYA, Doody’s Medical Reviews and Quill and Quire. Reader Reviews includes more than 1.5 million vetted, reader reviews from LibraryThing members. It also allows patrons and librarians to add their own ratings and reviews, right in your catalog, and then showcase them on a library’s home page and social media. Also Available As helps patrons find other available formats and versions of a title in your collection, including paper, audio, ebook, and translations. Exploring the tag system Tags rethinks LibraryThing’s celebrated tag clouds—redesigning them toward simplicity and consistency, and away from the “ransom note” look of most clouds. As data, tags are based on over 131 million tags created by LibraryThing members, and hand-vetted by our staff librarians for quality. A new exploration interface allows patrons to explore what LibraryThing calls “tag mashes”—finding books by combinations of tags—in a simple faceted way. I’m going to be blogging about the redesign of tag clouds in the near future. Considering dozens of designs, we decided on a clean break with the past. (I expect it will get some reactions.) Book Profile is a newly dynamic version of what Bowker has done for years—analyzing thousands of new works of fiction, short-story collections, biographies, autobiographies, and memoirs annually. Now every term is clickable, and patrons can search and browse over one million profiles. 
Explore Reading Levels Reading Level is a newly dynamic way to see and explore other books in the same age and grade range. Reading Level also includes Metametrics Lexile® Framework for Reading. Click the “more” button to get a new, super-powered reading-level explorer. This is one my favorite features! (Second and last exclamation point.) Awards highlights the awards a title has won, and helps patrons find highly-awarded books in your collection. Includes biggies like the National Book Award and the Booker Prize, but also smaller awards like the Bram Stoker Award and Oklahoma’s Sequoyah Book Award. Browse Shelf gives your patrons the context and serendipity of browsing a physical shelf, using your call numbers. Includes a mini shelf-browser that sits on your detail pages, and a full-screen version, launched from the detail page. Video and Music adds summaries and other information for more than four million video and music titles including annotations, performers, track listings, release dates, genres, keywords, and themes. Video Games provides game descriptions, ESRB ratings, star ratings, system requirements, and even screenshots. Book Display Widgets. Finally, Syndetics Unbound isn’t limited to the catalog, but includes the LibraryThing product Book Display Widgets—virtual book displays that go on your library’s homepage, blog, LibGuides, Facebook, Twitter, Pinterest, or even in email newsletters. Display Widgets can be filled with preset content, such as popular titles, new titles, DVDs, journals, series, awards, tags, and more. Or you point them at a web page, RSS feed, or list of ISBNs, UPCs, or ISSNs. If your data is dynamic, the widget updates automatically. Here’s a page of Book Display Widget examples. Find out More Made it this far? You really need to see Syndetics Unbound in action. Check it Out. Again, here are some sample links of Syndetics Unbound at Hartford Public Library in Hartford, CT: The Raven Boys by Maggie Stiefvater, Alexander Hamilton by Ron Chernow, Faithful Place by Tana French. Webinars. We hold webinars every Tuesday and walk you through the different elements and answer questions. To sign up for a webinar, visit this Webex page and search for “Syndetics Unbound.” Interested in Syndetics Unbound at your library? Go here to contact a representative at ProQuest. Or read more about at the Syndetics Unbound website. Or email us at ltflsupport@librarything.com and we’ll help you find the right person or resource. Labels: librarything for libraries, new feature, new features, new product posted by Tim @10:45 am 4 Comments » Share Thursday, January 7th, 2016 ALAMW 2016 in Boston (and Free Passes)! Abby and KJ will be at ALA Midwinter in Boston this weekend, showing off LibraryThing for Libraries. Since the conference is so close to LibraryThing headquarters, chances are good that a few other LT staff members may appear, as well! Visit Us. Stop by booth #1717 to meet Abby & KJ (and potential mystery guests!), get a demo, and learn about all the new and fun things we’re up to with LibraryThing for Libraries, TinyCat, and LibraryThing. Get in Free. Are you in the Boston area and want to go to ALAMW? We have free exhibit only passes. Click here to sign up and get one! Note: It will get you just into the exhibit hall, not the conference sessions themselves. 
Labels: Uncategorized posted by Kate @4:05 pm 0 Comments » Share Thursday, June 25th, 2015 For ALA 2015: Three Free OPAC Enhancements For a limited time, LibraryThing for Libraries (LTFL) is offering three of its signature enhancements for free! There are no strings attached. We want people to see how LibraryThing for Libraries can improve your catalog. Check Library. The Check Library button is a “bookmarklet” that allows patrons to check if your library has a book while on Amazon and most other book websites. Unlike other options, LibraryThing knows all of the editions out there, so it finds the edition your library has. Learn more about Check Library Other Editions Let your users know everything you have. Don’t let users leave empty-handed when the record that came up is checked out. Other editions links all your holdings together in a FRBR model—paper, audiobook, ebook, even translations. Lexile Measures Put MetaMetrics’ The Lexile Framework® for Reading in your catalog, to help librarians and patrons find material based on reading level. In addition to showing the Lexile numbers, we also include an interactive browser. Easy to Add LTFL Enhancements are easy to install and can be added to every major ILS/OPAC system and most of the minor ones. Enrichments can be customized and styled to fit your catalog, and detailed usage reporting lets you know how they’re doing. See us at ALA. Stop by booth 3634 at ALA Annual this weekend in San Francisco to talk to Tim and Abby and see how these enhancements work. If you need a free pass to the exhibit hall, details are in this blog post. Sign up We’re offering these three enhancements free, for at least two years. We’ll probably send you links showing you how awesome other enhancements would look in your catalog, but that’s it. Find out more http://www.librarything.com/forlibraries or email Abby Blachly at abby@librarything.com. Labels: alaac15, Lexile measures, librarything for libraries, ltfl posted by Abby @1:31 pm 0 Comments » Share Tuesday, June 23rd, 2015 ALA 2015 in San Francisco (Free Passes) Our booth. But this is Kate, not Tim or Abby. She had the baby. Tim and I are headed to San Francisco this weekend for the ALA Annual Conference. Visit Us. Stop by booth #3634 to talk to us, get a demo, and learn about all the new and fun things we’re up to with LibraryThing for Libraries! Stay tuned this week for more announcements of what we’ll be showing off. No, really. It’s going to be awesome. Get in Free. In the SF area and want to go to ALA? We have free exhibit only passes. Click here to sign up and get one. It will get you just into the exhibit hall, not the conference sessions themselves. Labels: ala, alaac15 posted by Abby @2:17 pm 4 Comments » Share Monday, February 9th, 2015 New “More Like This” for LibraryThing for Libraries We’ve just released “More Like This,” a major upgrade to LibraryThing for Libraries’ “Similar items” recommendations. The upgrade is free and automatic for all current subscribers to LibraryThing for Libraries Catalog Enhancement Package. It adds several new categories of recommendations, as well as new features. We’ve got text about it below, but here’s a short (1:28) video: What’s New Similar items now has a See more link, which opens More Like This. Browse through different types of recommendations, including: Similar items More by author Similar authors By readers Same series By tags By genre You can also choose to show one or several of the new categories directly on the catalog page. 
Click a book in the lightbox to learn more about it—a summary when available, and a link to go directly to that item in the catalog. Rate the usefulness of each recommended item right in your catalog—hovering over a cover gives you buttons that let you mark whether it's a good or bad recommendation.
Try it Out! Click "See more" to open the More Like This browser in one of these libraries: Spokane County Library District · Arapahoe Public Library · Waukegan Public Library · Cape May Public Library · SAILS Library Network
Find out more: Find more details for current customers on what's changing and what customizations are available on our help pages. For more information on LibraryThing for Libraries or if you're interested in a free trial, email abby@librarything.com, visit http://www.librarything.com/forlibraries, or register for a webinar.
Labels: librarything for libraries, ltfl, recommendations, similar books posted by Abby @2:02 pm 2 Comments » Share
Thursday, February 5th, 2015
Subjects and the Ship of Theseus
I thought I might take a break to post an amusing photo of something I wrote out today: the photo is a first draft of a database schema for a revamp of how LibraryThing will do library subjects. All told, it has 26 tables. Gulp.
About eight of the tables do what a good cataloging system would do:
- Distinguishes the various subject systems (LCSH, Medical Subjects, etc.)
- Preserves the semantic richness of subject cataloging, including the stuff that never makes it into library systems.
- Breaks subjects into their facets (e.g., "Man-woman relationships — Fiction" has two subject facets).
Most of the tables, however, satisfy LibraryThing's unusual core commitments: to let users do their own thing, like their own little library, but also to let them benefit from and participate in the data and contributions of others.(1) So it:
- Links to subjects from various "levels," including book-level, edition-level, ISBN-level and work-level.
- Allows members to use their own data, or "inherit" subjects from other levels.
- Allows for members to "play librarian," improving good data and suppressing bad data.(2)
- Allows for real-time, fully reversible aliasing of subjects and subject facets.
The last is perhaps the hardest. Nine years ago (!) I compared LibraryThing to the "Ship of Theseus," a ship which is "preserved" although its components are continually changed. The same goes for much of its data, although "shifting sands" might be a better analogy. Accounting for this makes for some interesting database structures, and interesting programming. Not every system at LibraryThing does this perfectly. But I hope this structure will help us do that better for subjects.(3)
Weird as all this is, I think it's the way things are going. At present most libraries maintain their own data, which, while generally copied from another library, is fundamentally siloed. Like an evolving species, library records descend from each other; they aren't dynamically linked. The data inside the records are siloed as well, trapped in a non-relational model. The profession that invented metadata, and indeed invented sharing metadata, is, at least as far as its catalogs go, far behind.
Eventually that will end. It may end in a "Library Goodreads," every library sharing the same data, with global changes possible, but reserved for special catalogers. But my bet is on a more LibraryThing-like future, where library systems will both respect local cataloging choices and, if they like, benefit instantly from improvements made elsewhere in the system. When that future arrives, we've got the schema!
1. I'm betting another ten tables are added before the system is complete.
2. The system doesn't presume whether changes will be made unilaterally, or voted on. Voting, like much else, exists in a separate system, even if it ends up looking like part of the subject system.
3. This is a long-term project. Our first steps are much more modest–the tables have an order-of-use, not shown. First off we're going to duplicate the current system, but with appropriate character sets and segmentation by thesaurus and language.
Labels: cataloging, subjects posted by Tim @7:44 pm 3 Comments » Share
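To give a feel for what "levels" and reversible aliasing imply at the table level, here is a deliberately tiny sketch. It is not LibraryThing's schema (the post's draft runs to 26 tables); the table and column names are assumptions chosen only to illustrate two of the commitments above: facets stored separately from full headings, and aliasing recorded as reversible data rather than destructive edits.

```python
# Tiny illustrative subject schema (NOT LibraryThing's actual 26 tables);
# table and column names are assumptions made for this sketch.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE thesaurus     (id INTEGER PRIMARY KEY, name TEXT);            -- LCSH, MeSH, ...
CREATE TABLE subject       (id INTEGER PRIMARY KEY, thesaurus_id INTEGER,
                            heading TEXT);                                 -- the full heading
CREATE TABLE facet         (id INTEGER PRIMARY KEY, subject_id INTEGER,
                            position INTEGER, text TEXT);                  -- split subdivisions
CREATE TABLE item_subject  (item_id INTEGER, subject_id INTEGER,
                            level TEXT);                                   -- book/edition/ISBN/work
CREATE TABLE subject_alias (from_id INTEGER, to_id INTEGER,
                            created_at TEXT, reverted_at TEXT);            -- reversible aliasing
""")

conn.execute("INSERT INTO thesaurus VALUES (1, 'LCSH')")
conn.execute("INSERT INTO subject VALUES (1, 1, 'Man-woman relationships -- Fiction')")
conn.executemany("INSERT INTO facet VALUES (?, 1, ?, ?)",
                 [(1, 0, "Man-woman relationships"), (2, 1, "Fiction")])

print(conn.execute("SELECT text FROM facet ORDER BY position").fetchall())
# -> [('Man-woman relationships',), ('Fiction',)]
```

An alias row pointing one subject at another can be applied at query time and undone later by stamping reverted_at, which is one way to preserve the post's "Ship of Theseus" property: the data keeps changing while nothing is destructively overwritten.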
Tuesday, January 20th, 2015
LibraryThing Recommends in BiblioCommons
Does your library use BiblioCommons as its catalog? LibraryThing and BiblioCommons now work together to give you high-quality reading recommendations in your BiblioCommons catalog. You can see some examples here. Look for "LibraryThing Recommends" on the right side: Not That Kind of Girl (Daniel Boone Regional Library) · Carthage Must Be Destroyed (Ottawa Public Library) · The Martian (Edmonton Public Library) · Little Bear (West Vancouver Memorial Library) · Station Eleven (Chapel Hill Public Library) · The Brothers Karamazov (Calgary Public Library)
Quick facts: As with all LibraryThing for Libraries products, LibraryThing Recommends only recommends other books within a library's catalog. LibraryThing Recommends stretches across media, providing recommendations not just for print titles, but also for ebooks, audiobooks, and other media. LibraryThing Recommends shows up to two titles up front, with up to three displayed under "Show more." Recommendations come from LibraryThing's recommendations system, which draws on hundreds of millions of data points in readership patterns, tags, series, popularity, and other data.
Not using BiblioCommons? Well, you can get LibraryThing recommendations—and much more—integrated in almost every catalog (OPAC and ILS) on earth, with all the same basic functionality, like recommending only books in your catalog, as well as other LibraryThing for Libraries features, like reviews, series and tags. Check out some examples on different systems here: SirsiDynix Enterprise (Saint Louis Public Library) · SirsiDynix Horizon Information Portal (Hume Libraries) · SirsiDynix eLibrary (Spokane County Public Library) · III Encore (Arapahoe Public Library) · III WebPac Pro (Waukegan Public Library) · Polaris (Cape May County Library) · Ex Libris Voyager (University of Wisconsin-Eau Claire)
Interested? BiblioCommons: email info@bibliocommons.com or visit http://www.bibliocommons.com/AugmentedContent. See the full specifics here. Other Systems: email abby@librarything.com or visit http://www.librarything.com/forlibraries.
Labels: Uncategorized posted by Tim @12:43 pm 0 Comments » Share
Thursday, October 16th, 2014
NEW: Annotations for Book Display Widgets
Our Book Display Widgets is getting adopted by more and more libraries, and we're busy making it better and better. Last week we introduced Easy Share. This week we're rolling out another improvement—Annotations! Book Display Widgets is the ultimate tool for libraries to create automatic or hand-picked virtual book displays for their home page, blog, Facebook or elsewhere. Annotations allows libraries to add explanations for their picks.
Some Ways to Use Annotations
1. Explain Staff Picks right on your homepage.
2. Let students know if a book is reserved for a particular class.
3. Add context for special collections displays.
How it Works: Check out the LibraryThing for Libraries Wiki for instructions on how to add Annotations to your Book Display Widgets. It's pretty easy.
Interested? Watch a quick screencast explaining Book Display Widgets and how you can use them. Find out more about LibraryThing for Libraries and Book Display Widgets. And sign up for a free trial of either by contacting ltflsupport@librarything.com.
Labels: Book Display Widgets, librarything for libraries, new feature, new features, widgets posted by KJ @10:21 am 0 Comments » Share
Tuesday, October 14th, 2014
Send us a programmer, win $1,000 in books.
We just posted a new job post: Job: Library Developer at LibraryThing (Telecommute). To sweeten the deal, we are offering $1,000 worth of books to the person who finds them. That's a lot of books.
Rules! You get a $1,000 gift certificate to the local, chain or online bookseller of your choice. To qualify, you need to connect us to someone. Either you introduce them to us—and they follow up by applying themselves—or they mention your name in their email ("So-and-so told me about this"). You can recommend yourself, but if you found out about it from someone else, we hope you'll do the right thing and make them the beneficiary.
Small print: Our decision is final, incontestable, irreversible and completely dictatorial. It only applies when an employee is hired full-time, not part-time, contract or for a trial period. If we don't hire someone for the job, we don't pay. The contact must happen in the next month. If we've already been in touch with the candidate, it doesn't count. Void where prohibited. You pay taxes, and the insidious hidden tax of shelving. Employees and their families are eligible to win, provided they aren't work contacts. Tim is not.
» Job: Library Developer at LibraryThing (Telecommute)
Labels: jobs posted by Tim @10:04 am 1 Comment » Share
Thingology is LibraryThing's ideas blog, on the philosophy and methods of tags, libraries and suchnot.
blog-library-villanova-edu-2758 ---- Falvey Memorial Library Blog: The collection of blogs published by Falvey Memorial Library, Villanova University
blog-library-villanova-edu-6016 ---- Falvey Memorial Library :: The collection of blogs published by Falvey Memorial Library, Villanova University
Falvey Library Blogs
Dig Deeper: Award-Winning Children's Author Beverly Cleary (April 27, 2021 | Library News | Beverly Cleary, Dig Deeper, Library Resources) Disappointed with the children's books she read growing up, Beverly Cleary was determined to tell stories kids could relate to. "I wanted to read funny stories about the sort of children I knew," she wrote, "and I decided ...
In Praise of Scrapple (April 27, 2021 | Blue Electrode: Sparking between Silicon and Paper | digital library, Distinctive Collections, national poetry month, poems) In honor of National Poetry Month, I thought I would share this poem by Philadelphia poet and Villanova alumnus Thomas Augustine Daly (1871-1948). The poem appears in McAroni Ballads and Other Verses (1919), newly digitized in our Digital Library ...
From The Archives: Owl Hop (April 26, 2021 | Blue Electrode: Sparking between Silicon and Paper | Distinctive Collections, University Archives) When you step onto campus, you'll discover Villanova's many unique traditions. Some you may find are as old as the University itself and others are much more recent—but they all play an important role in the life of Villanova students. ...
New Resource: Eighteenth Century Collections Online (April 26, 2021 | Library News | Eighteenth Century Collections Online, Gale Primary Sources, Library Resources) Eighteenth Century Collections Online is broken into two parts and offers full text access to nearly every English-language and foreign-language title printed in the United Kingdom, alongside thousands of works published in the Americas, between ...
An Evening with Sr. Thea Bowman (1937-1990): Songs, Service, Struggle on April 27 (April 26, 2021 | Library News | African American Spirituality, African American Studies, Biblical interpretation, black history, Campus Ministry, Sr. Thea Bowman) The Villanova campus community is invited to join Campus Ministry for an evening of prayer and reflection, April 27, 7-8:15 p.m., with the song and spirit of Sr. Thea Bowman, FSPA. Presenters Rev. Naomi Washington-Leapheart and Michelle Sherman ...
Happy World Book Day and Shakespeare Day (April 23, 2021 | Library News | "William Shakespeare", book, falvey memorial library, Photo Friday, World Book Day) Happy World Book Day and Shakespeare Day! To celebrate the Bard's many contributions to culture and language, we wanted to share this striking edition that is contained in our physical collection. While the collection indeed contains several ...
Content Roundup – Third Week – 2021 (April 23, 2021 | Blue Electrode: Sparking between Silicon and Paper | Content Roundup) This week sees the addition of materials digitized recently, including more Dime Novels and Story Papers and newly acquired letters written from William T. Sherman to Mrs. Mary C. Audenried, widow of Sherman's longtime aide-de-camp. Dime Novel ...
Villanova Open Educational Resource (OER) Adoption Grant (April 22, 2021 | Library News) The Affordable Materials Project (AMP) is offering 5 grants in the amount of $500 to tenure track or continuing faculty to encourage the adoption of an open educational resource (OER) as the primary course material for a class offered in the 2021 ...
TBT: 2019 Climate Strikes (April 22, 2021 | Library News | climate change, Earth Day, Earth Week, Earth Week 2021, TBT, Throwback, throwback Thursday) Here comes a BONUS TBT in honor of Earth Day! The photos featured here come from the March 15, 2019 Climate Strike at Villanova. This was just one of many climate strikes taking place on college campuses across the country. These strikes were ...
blog-libux-co-7064 ---- Library User Experience Community - Medium: A blog and slack community organized around design and the user experience in libraries, non-profits, and the higher-ed web.
A Library System for the Future: This is a what-if story. ...
Alexa, get me the articles (voice interfaces in academia): Thinking about interfaces has led me down a path of all sorts of exciting/mildly terrifying ways of interacting with our devices — from…
Accessibility Information on Library Websites · Is autocomplete on your library home page? · Writing for the User Experience with Rebecca Blakiston · First look at Primo's new user interface · What users expect · Write for LibUX · On the User Experience of Ebooks · Unambitious and incapable men in librarianship
blog-libux-co-8185 ---- Library User Experience Community Homepage: Practical Design Thinking for Libraries · Guest Write ( - we pay!) · Our Slack Community
A Library System for the Future: This is a what-if story. (Kelly Dagan, Feb 25, 2018)
Latest
Alexa, get me the articles (voice interfaces in academia): Thinking about interfaces has led me down a path of all sorts of exciting/mildly terrifying ways of interacting with our devices — from… (Kelly Dagan, Feb 11, 2018)
Accessibility Information on Library Websites: An important part of making your library accessible is advertising that your library's spaces and services are accessible and inclusive. (Carli Spina, Nov 17, 2017)
Is autocomplete on your library home page?
Literature and some testing I’ve done this semester convinces me that autocomplete fundamentally improves the user experience Jaci Paige WilkinsonAug 20, 2017 Writing for the User Experience with Rebecca Blakiston Writing for the User Experience with Rebecca Blakiston 53:25 | Rebecca Blakiston — author of books on usability testing and writing with clarity; Library Journal mover and shaker — talks shop in… Michael SchofieldAug 1, 2017 Write for LibUX Write for LibUX We should aspire to push the #libweb forward by creating content that sets the bar for the conversation way up there, and I would love your… Michael SchofieldApr 28, 2017 First look at Primo’s new user interface First look at Primo’s new user interface Impressions of some key innovations of Primo’s new UI as well as challenges involved making customizations. Ron GilmourFeb 27, 2017 Today, I learned about the Accessibility Tree Today, I learned about the Accessibility Tree If you didn’t think your grip on web accessibility could get any looser. Michael SchofieldFeb 18, 2017 What users expect What users expect We thought it would be fun to emulate some of our favorite sites in a lightweight concept discovery layer we call Libre. Trey GordnerJan 29, 2017 Critical Librarianship in the Design of Libraries Critical Librarianship in the Design of Libraries Design decisions position libraries to more deliberately influence the user experience toward advocacy — such as communicating moral or… Michael SchofieldJan 10, 2017 The Non-Reader Persona The Non-Reader Persona Michael SchofieldDec 1, 2016 IU Libraries’ Redesign and the descending hero search IU Libraries’ Redesign and the descending hero search Michael SchofieldAug 8, 2016 Accessible, sort of — #a11eh Michael SchofieldJul 21, 2016 Create Once, Publish Everywhere Create Once, Publish Everywhere Michael SchofieldJul 17, 2016 Web education must go further than a conference budget Michael SchofieldMay 8, 2016 Blur the Line Between the Website and the Building Michael SchofieldNov 2, 2015 Say “Ok Library” Say “Ok Library” Michael SchofieldOct 28, 2015 Unambitious and incapable men in librarianship Unambitious and incapable men in librarianship Michael SchofieldOct 25, 2015 On the User Experience of Ebooks On the User Experience of Ebooks So, when it comes to ebooks I am in the minority: I prefer them to the real thing. The aesthetic or whats-it about the musty trappings of… Michael SchofieldOct 5, 2015 About Library User Experience CommunityLatest StoriesArchiveAbout MediumTermsPrivacy blog-openlibrary-org-1866 ---- The Open Library Blog The Open Library Blog A web page for every book Introducing the Open Library Explorer Try it here! If you like it, share it. Bringing 100 Years of Librarian-Knowledge to Life By Nick Norman with Drini Cami & Mek At the Library Leaders Forum 2020 (demo), Open Library unveiled the beta for what it’s calling the Library Explorer: an immersive interface which powerfully recreates and enhances the experience of navigating […] Importing your Goodreads & Accessing them with Open Library’s APIs by Mek Today Joe Alcorn, founder of readng, published an article (https://joealcorn.co.uk/blog/2020/goodreads-retiring-API) sharing news with readers that Amazon’s Goodreads service is in the process of retiring their developer APIs, with an effective start date of last Tuesday, December 8th, 2020. 
The topic stirred discussion among developers and book lovers alike, making the front-page of the […] On Bookstores, Libraries & Archives in the Digital Age The following was a guest post by Brewster Kahle on Against The Grain (ATG) – Linking Publishers, Vendors, & Librarians By: Brewster Kahle, Founder & Digital Librarian, Internet Archive​​​​​​​ ​​​Back in 2006, I was honored to give a keynote at the meeting of the Society of American Archivists, when the president of the Society presented me with a […] Amplifying the voices behind books Exploring how Open Library uses author data to help readers move from imagination to impact By Nick Norman, Edited by Mek & Drini According to René Descartes, a creative mathematician, “The reading of all good books is like a conversation with the finest [people] of past centuries.” If that’s true, then who are some of […] Giacomo Cignoni: My Internship at the Internet Archive This summer, Open Library and the Internet Archive took part in Google Summer of Code (GSoC), a Google initiative to help students gain coding experience by contributing to open source projects. I was lucky enough to mentor Giacomo while he worked on improving our BookReader experience and infrastructure. We have invited Giacomo to write a […] Google Summer of Code 2020: Adoption by Book Lovers by Tabish Shaikh & Mek OpenLibrary.org,the world’s best-kept library secret: Let’s make it easier for book lovers to discover and get started with Open Library. Hi, my name is Tabish Shaikh and this summer I participated in the Google Summer of Code program with Open Library to develop improvements which will help book lovers discover […] Open Library for Language Learners By Guyrandy Jean-Gilles 2020-07-21 A quick browse through the App Store and aspiring language learners will find themselves swimming in useful programs. But for experienced linguaphiles, the never-ending challenge is finding enough raw content and media to consume in their adopted tongue. Open Library can help. Earlier this year, Open Library added reading levels to […] Meet the Librarians of Open Library By Lisa Seaberg Are you a book lover looking to contribute to a warm, inclusive library community? We’d love to work with you: Learn more about Volunteering @ Open Library Behind the scenes of Open Library is a whole team of developers, data scientists, outreach experts, and librarians working together to make Open Library better […] Re-thinking Open Library’s Book Pages by Mek Karpeles, Tabish Shaikh We’ve redesigned our Book Pages: Before →After. Please share your feedback with us. A web page for every book… This is the mission of Open Library: a free, inclusive, online digital library catalog which helps readers find information about any book ever published. Millions of books in Open Library’s catalog […] Reading Logs: Going Public & Helping Book Lovers Share Hi book lovers, Starting 2020-05-26, Reading Logs for new Open Library accounts will be public by default. Readers may go here to view or manage their Reading Log privacy preferences. This will not affect the privacy of your reading history — only books which you explicitly mark as Want to Read, Currently Reading, or Already […] blog-openlibrary-org-2691 ---- The Open Library Blog | A web page for every book The Open Library Blog A web page for every book Skip to content About « Older posts Introducing the Open Library Explorer By mek | Published: December 16, 2020 Try it here! If you like it, share it. 
Bringing 100 Years of Librarian-Knowledge to Life By Nick Norman with Drini Cami & Mek At the Library Leaders Forum 2020 (demo), Open Library unveiled the beta for what it’s calling the Library Explorer: an immersive interface which powerfully recreates and enhances the experience of navigating a physical library. If the tagline doesn’t grab your attention, wait until you see it in action: Drini showcasing Library Explorer at the Library Leaders Forum Get Ready to Explore In this article, we’ll give you a tour of the Open Library Explorer and teach you how one may take full advantage of its features. You’ll also get a crash course on the 100+ years of library history which led to its innovation and an opportunity to test-drive it for yourself. So let’s get started!   What better way to set the stage than by taking a trip down memory lane to the last time you were able to visit your local public library. As you pass the front desk, a friendly librarian scribbles some numbers on a piece of paper which they hand to you and points you towards a relevant section. With the list of library call numbers in your hand as your compass, you eagerly make your way through waves of towering bookshelves. Suddenly, you depart from reality and find yourself navigating through a sea of books, discovering treasures you didn’t even know existed. Library photo courtesy of pixy.org/5775865/ Before you know it, one book gets stuffed under one arm, two more books go under your other arm, and a few more books get positioned securely between your knees. You’re doing the math to see how close you are to your check-out limit. Remember those days? What if you could replicate that same library experience and access it every single day, from the convenience of your web browser? Well, thanks to the new Open Library Explorer, you can experience the joys of a physical library right in your web browser, as well as leverage superpowers which enable you to explore in ways which may have previously been impossible. Before we dive into the bells-and-whistles of the Library Explorer, it’s worth learning how and why such innovations came to be. Who needs Library Explorer? This year we’ve seen systems stressed to their max due to the COVID-19 pandemic. With libraries and schools closing their doors globally and stay-at-home orders hampering our access, there has been a paradigm shift in the needs of researchers, educators, students, and families to access fundamental resources online. Getting this information online is a challenge in and of itself. Making it easy to discover and use materials online is another entirely. How does one faithfully compress the entire experience of a reliable, unbiased, expansive public library and its helpful, friendly staff into a 14” computer screen? Some sites, like Netflix or YouTube, solve this problem with recommendation engines that populate information based on what people have previously seen or searched. Consequently, readers may unknowingly find themselves caught in a sort of “algorithmic bubble.” An algorithmic bubble (or “filter bubble”) is a state of intellectual or informational isolation that’s perpetuated by personalized content. Algorithmic bubbles can make it difficult for users to access information beyond their own opinions—effectively isolating them in their own cultural or ideological silos.  Drini Cami, the creator of Library Explorer, says that users’ caught inside these algorithmic bubbles “won’t be exposed to information that is completely foreign to [them]. 
There is no way to systematically and feasibly explore.” Hence, the Library Explorer grew out of a need to discover information without the constraints of algorithmic bubbles. As readers are exposed to more information, the question becomes: how can readers fully explore swaths of new information and still enjoy the experience? Let’s take a look at how the Library Explorer tackles that half of the problem. Humanity’s Knowledge Brought to Life Earlier this year, Open Library added the ability to search materials by both Dewey Decimal Classification and Library of Congress Classification. These systems embed over 100 years of librarian experience and provide a systematized approach to sorting through the entirety of humanity’s knowledge embedded in books. It is important to note that the systematization of knowledge alone does not necessarily make it easily discoverable. This is what makes the Library Explorer so special. Its digital interface opens the door for readers to seamlessly navigate centuries of books anywhere online. Thanks to innovations such as the Library Explorer, readers can explore more books and access more knowledge with a better experience. A tour of Library Explorer’s features If you’re pulling up a chair for the first time, the Library Explorer presents you with tall, clickable bookshelves situated across your screen. Each shelf has its own identity that can morph into new classes of books and subject categories with a single click. And that’s only the beginning of what it offers. In addition to those smart filters, the Library Explorer wants you to steer the ship… not the other way around. In other words, you can personalize single rows of books, expand entire shelves, or construct an entire library experience that revolves around your exact interests. You can custom-tailor your own personal library from the comfort of your device, wherever you may be. Quick question: as a kid, did you ever lay out your newly checked-out library books on your bed to admire them? Well, the creators behind the Library Explorer found a way to mimic that same experience. If you so choose, you can zoom out of the Library Explorer interface to get a complete view of the library you’ve constructed. Let’s explore one more set of cool features the Library Explorer offers by clicking on the “Filter” icon at the bottom of the page. By selecting “Juvenile,” you can instantly transform your entire library into a children’s library, but keep all the useful organization and structure provided by the bookshelves. It’s as if your own personal librarian ran in at lightning speed and removed every book from each shelf that didn’t meet your criteria. Or you may type in “subject:biography” and suddenly your entire library shows you a tailored collection of just biographies on every subject. The sky is the limit. If you click on the Settings tab, you’re given several options to customize the look and feel of your personal Library Explorer. You can switch between using Library of Congress or Dewey Decimal classification to organize your shelves. You can also choose from a variety of delightful options to see your books in 3D. Each book has the correct thickness, determined by its actual number of pages. To see your favorite book in 3D, click the settings icon at the bottom of the screen and then press the 3D button. Library Explorer’s 3D view Maybe you’ve experienced a time when you had limited space in your book bag.
Perhaps because of that, you chose to wait on checking out heavier books. Or, maybe you judged a book’s strength of knowledge based on its thickness. If that’s you, guess what? The Open Library Explorer lets you do that.  It gets personal… The primary goal of the Library Explorer was to create an experimental interface that ‘opens the door’ for readers to locate new books and engage with their favorite books. The Library Explorer is one of many steps that both the Internet Archive and the Open Library have made towards making knowledge easy to discover. As you know, such innovation couldn’t be possible without people who believe in the necessity of reading. Here is a list of the names of those who contributed to the creation of the Library Explorer: Drini Cami, Open Library Developer and Library Explorer Creator Mek Karpeles, Open Library Program Lead Jim Shelton, UX Designer, Internet Archive Ziyad Basheer, Product Designer Tinnei Pang, Illustrator and Product Designer James Hill-Khurana, Product Designer Nick Norman, Open Library Storyteller & Volunteer Communications Lead  Well, this is the moment you’ve been waiting for. Go here and give the Library Explorer a beta test-run. Also, follow @OpenLibrary on Twitter to learn about other features as soon as they’re released. But before you go… in the comments below, tell us your favorite library experience. We’d love to hear! Posted in Uncategorized | Comments closed Importing your Goodreads & Accessing them with Open Library’s APIs By mek | Published: December 13, 2020 by Mek Today Joe Alcorn, founder of readng, published an article (https://joealcorn.co.uk/blog/2020/goodreads-retiring-API) sharing news with readers that Amazon’s Goodreads service is in the process of retiring their developer APIs, with an effective start date of last Tuesday, December 8th, 2020. A screenshot taken from Joe Alcorn’s post The topic stirred discussion among developers and book lovers alike, making the front-page of the popular Hacker News website. Hacker News at 2020-12-13 1:30pm Pacific. The Importance of APIs For those who are new to the term, an API is a method of accessing data in a way which is designed for computers to consume rather than people. APIs often allow computers to subscribe to (i.e. listen for) events and then take actions. For example, let’s say you wanted to tweet every time your favorite author published a new book. One could sit on Goodreads and refresh the website every fifteen minutes. Or, one might write a twitter bot which automatically connects to Goodreads and checks real-time data using its API. In fact, the reason why Twitter bots work, is that they use Twitter’s API, a mechanism which lets specially designed computer programs submit tweets to the platform. As one of the more popular book services online today, tens of thousands of readers and organizations rely on Amazon’s Goodreads APIs to lookup information about books and to power their book-related applications across the web. Some authors rely on the data to showcase their works on their personal homepages, online book stores to promote their inventory, innovative new services like thestorygraph are using this data to help readers discover new insights, and even librarians and scholastic websites rely on book data APIs to make sure their catalog information is as up to date and accurate as possible for their patrons. 
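To make the bot example above concrete, here is a minimal sketch (added for illustration, not part of the original post) of the same idea built on Open Library's public author works endpoint rather than the retiring Goodreads API. The author key, the polling interval, and the assumption that the endpoint returns an "entries" list of works are all illustrative assumptions to verify against the current API documentation.

```python
# A minimal sketch (not from the original post) of the "notify me when my
# favorite author publishes something new" idea, using Open Library's
# author works endpoint instead of the retired Goodreads API.
import time
import requests

AUTHOR_KEY = "OL23919A"   # illustrative/assumed author key
POLL_SECONDS = 15 * 60    # check every fifteen minutes, as in the post's example

def fetch_work_keys(author_key: str) -> set:
    """Return the set of work identifiers currently listed for an author."""
    url = f"https://openlibrary.org/authors/{author_key}/works.json?limit=1000"
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return {entry["key"] for entry in response.json().get("entries", [])}

def watch(author_key: str) -> None:
    seen = fetch_work_keys(author_key)
    while True:
        time.sleep(POLL_SECONDS)
        current = fetch_work_keys(author_key)
        for new_key in current - seen:
            # A real bot would post to Twitter here; this sketch just prints.
            print(f"New work detected: https://openlibrary.org{new_key}")
        seen = current

if __name__ == "__main__":
    watch(AUTHOR_KEY)
```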
For years, the Open Library team has been enthusiastic about sharing the book space with friends like Goodreads, who have historically shown great commitment by enabling patrons to control (download and export) their own data and enabling developers to create flourishing ecosystems which promote books and readership through their APIs. When it comes to serving an audience of book lovers, there is no “one size fits all,” and we’re glad so many different platforms and APIs exist to provide experiences which meet the needs of different communities. And we’d like to do our part to keep the landscape flourishing. “The sad thing is it [retiring their APIs] really only hurts the hobbyist projects and Goodreads users themselves.” — Joe Alcorn Picture of Aaron Swartz by Noah Berger/Landov from thedailybeast At Open Library, our top priority is pursuing Aaron Swartz’s original mission: to serve as an open book catalog for the public (one page for every book ever published) and ensure our community always has free, open data to unlock a world of possibilities. A world which believes in the power of reading to preserve our cultural heritage and empower education and understanding. We sincerely hope that Amazon will decide it’s in Goodreads’ best interests to reinstate their APIs. But either way, Open Library is committed to helping readers, developers, and all book lovers have autonomy over their data and direct access to the data they rely on. One reason patrons appreciate Open Library is that it aligns with their values. Imports & Exports In August 2020, one of our Google Summer of Code contributors, Tabish Shaikh, helped us implement an export option for Open Library Reading Logs to help everyone retain full control of their book data. We also created a Goodreads import feature to help patrons who may want an easy way to check which Goodreads titles may be available to borrow from the Internet Archive’s Controlled Digital Lending program via openlibrary.org, and to help patrons organize all their books in one place. We didn’t make a fuss about this feature at the time, because we knew patrons have a lot of options. But things can change quickly, and we want patrons to be able to make that decision for themselves. For those who may not have known, Amazon’s Goodreads website provides an option for downloading/exporting a list of books from one’s bookshelves. You may find instructions on this Goodreads export process here. Open Library’s Goodreads importer enables patrons to take this exported dump of their Goodreads bookshelves and automatically add matching titles to their Open Library Reading Logs. The Goodreads import feature from https://openlibrary.org/account/import Known issues. Currently, Open Library’s Goodreads Importer only works for (a) titles that are in the Open Library catalog and (b) titles which are new enough to have ISBNs. Our staff and community are committed to continuing to improve our catalog to include more titles (we added more than 1M titles this year), and we plan to improve our importer to support other ID types like OCLC and LOC. APIs & Data Developers and book lovers who have been relying on Amazon’s Goodreads APIs are not out of luck. There are several wonderful services, many of them open source, including Open Library, which offer free APIs: Wikidata.org (by the same group who brought us Wikipedia) is a treasure trove of metadata on Authors and Books. Open Library gratefully leverages this powerful resource to enrich our pages.
Inventaire.io is a wonderful service which uses Wikidata and Open Library data (API: api.inventaire.io) Bookbrainz.org (by the group who runs Musicbrainz) is an up-and-coming catalog of books WorldCat by OCLC offers various metadata APIs Did we miss any? Please let us know! We’d love to work together, build stronger integrations with, and support other book-loving services. Open Library’s APIs. And of course, Open Library has a free, open Book API which spans nearly 30 million books. Bulk Data. If you need access to all our data, Open Library releases a free monthly bulk data dump of Authors, Books, and more. Spoiler: Everything on Open Library is an API! One of my favorite parts of Open Library is that practically every page is an API. All that is required is adding “.json” to the end. Here are some examples: Search https://openlibrary.org/search?q=lord+of+the+rings is our search page for humans… https://openlibrary.org/search.json?q=lord+of+the+rings is our Search API! Books https://openlibrary.org/books/OL25929351M/Harry_Potter_and_the_Methods_of_Rationality is the human page for Harry Potter and the Methods of Rationality… https://openlibrary.org/books/OL25929351M.json is its API! Authors https://openlibrary.org/authors/OL2965893A/Rik_Roots is a human-readable author page… https://openlibrary.org/authors/OL2965893A.json and here is the API! Did We Mention: Full-text Search over 4M Books? Major hat tip to the Internet Archive’s Giovanni Damiola for this one: Folks may also appreciate the ability to full-text search across 4M of the Internet Archive’s books (https://blog.openlibrary.org/2018/07/14/search-full-text-within-4m-books) on Open Library: You can try it directly here: http://openlibrary.org/search/inside?q=thanks%20for%20all%20the%20fish As per usual, nearly all Open Library URLs are themselves APIs, e.g.: http://openlibrary.org/search/inside.json?q=thanks%20for%20all%20the%20fish Get Involved Questions? Open Library is a free, open-source, nonprofit project run by the Internet Archive. We do our development transparently in public (here’s our code), and our community, spanning more than 40 volunteers, meets every week, Tuesday @ 11:30am Pacific. Please contact us to join our call and participate in the process. Bugs? If something isn’t working as expected, please let us know by opening an issue or joining our weekly community calls. Want to share thanks? Please follow up on twitter: https://twitter.com/openlibrary and let us know how you’re using our APIs! Thank you A special thank you to our lead developers Drini Cami, Chris Clauss, and one of our lead volunteer engineers, Aaron, for spending their weekend helping fix a Python 3 bug which was temporarily preventing Goodreads imports from succeeding. A Decentralized Future The Internet Archive has a history of cultivating and supporting the decentralized web. We operate a decentralized version of archive.org and host regular meetups and summits to galvanize the distributed web community. In the future, we can imagine a world where no single website controls all of your data, but rather patrons can participate in a decentralized, distributed network. You may be interested to try Bookwyrm, an open-source decentralized project by Mouse, a former engineer on the Internet Archive’s Archive-It team.
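Before moving on, here is a small sketch (an editorial illustration, not from the original post) of the ".json suffix" pattern described above, using the Search and Books URLs listed earlier. The response field names used (docs, title, key) reflect the public Search API as commonly documented and should be treated as assumptions to verify.

```python
# A small sketch of the ".json suffix" pattern: the same URLs shown for
# humans become APIs when ".json" is appended.
import requests

def search_books(query: str, limit: int = 5) -> list:
    """Call the Search API and return the list of result documents."""
    resp = requests.get(
        "https://openlibrary.org/search.json",
        params={"q": query, "limit": limit},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json().get("docs", [])

def get_record(key: str) -> dict:
    """Fetch a work, edition, or author record; key looks like '/books/OL25929351M'."""
    resp = requests.get(f"https://openlibrary.org{key}.json", timeout=30)
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    for doc in search_books("lord of the rings"):
        print(doc.get("title"), "->", doc.get("key"))
```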
Posted in Uncategorized | Comments closed On Bookstores, Libraries & Archives in the Digital Age By Brewster Kahle | Published: October 7, 2020 The following was a guest post by Brewster Kahle on Against The Grain (ATG) – Linking Publishers, Vendors, & Librarians On Bookstores, Libraries & Archives in the Digital Age-An ATG Guest Post See the original article here on ATG’s website By: Brewster Kahle, Founder & Digital Librarian, Internet Archive​​​​​​​ ​​​Back in 2006, I was honored to give a keynote at the meeting of the Society of American Archivists, when the president of the Society presented me with a framed blown-up letter “S.”  This was an inside joke about the Internet Archive being named in the singular, Archive, rather than the plural Archives. Of course, he was right, as I should have known all along. The Internet Archive had long since grown out of being an “archive of the Internet”—a singular collection, say of web pages—to being “archives on the Internet,” plural.  My evolving understanding of these different names might help focus a discussion that has become blurry in our digital times: the difference between the roles of publishers, bookstores, libraries, archives, and museums. These organizations and institutions have evolved with different success criteria, not just because of the shifting physical manifestation of knowledge over time, but because of the different roles each group plays in a functioning society. For the moment, let’s take the concepts of Library and Archive. The traditional definition of a library is that it is made up of published materials, while an archive is made up of unpublished materials. Archives play an important function that must be maintained—we give frightfully little attention to collections of unpublished works in the digital age. Think of all the drafts of books that have disappeared once we started to write with word processors and kept the files on fragile computer floppies and disks. Think of all the videotapes of lectures that are thrown out or were never recorded in the first place.  Bookstores: The Thrill of the Hunt Let’s try another approach to understanding distinctions between bookstores, libraries and archives. When I was in my 20’s living in Boston—before Amazon.com and before the World Wide Web (but during the early Internet)—new and used bookstores were everywhere. I thought of them as catering to the specialized interests of their customers: small, selective, and only offering books that might sell and be taken away, with enough profit margin to keep the store in business. I loved them. I especially liked the used bookstore owners—they could peer into my soul (and into my wallet!) to find the right book for me. The most enjoyable aspect of the bookstore was the hunt—I arrived with a tiny sheet of paper in my wallet with a list of the books I wanted, would bring it out and ask the used bookstore owners if I might go home with a bargain. I rarely had the money to buy new books for myself, but I would give new books as gifts. While I knew it was okay to stay for awhile in the bookstore just reading, I always knew the game. Libraries: Offering Conversations not Answers The libraries that I used in Boston—MIT Libraries, Harvard Libraries, the Boston Public Library—were very different. I knew of the private Boston Athenæum but I was not a member, so I could not enter. Libraries for me seemed infinite, but still tailored to individual interests. 
They had what was needed for you to explore and if they did not have it, the reference librarian would proudly proclaim: “We can get it for you!” I loved interlibrary loans—not so much in practice, because it was slow, but because they gave you a glimpse of a network of institutions sharing what they treasured with anyone curious enough to want to know more. It was a dream straight out of Borges’ imagination (if you have not read Borges’ short stories, they are not to be missed, and they are short. I recommend you write them on the little slip of paper you keep in your wallet.) I couldn’t afford to own many of the books I wanted, so it turned off that acquisitive impulse in me. But the libraries allowed me to read anything, old and new. I found I consumed library books very differently. I rarely even brought a book from the shelf to a table; I would stand, browse, read, learn and search in the aisles. Dipping in here and there. The card catalog got me to the right section and from there I learned as I explored.  Libraries were there to spark my own ideas. The library did not set out to tell a story as a museum would. It was for me to find stories, to create connections, have my own ideas by putting things together. I would come to the library with a question and end up with ideas.  Rarely were these facts or statistics—but rather new points of view. Old books, historical newspapers, even the collection of reference books all illustrated points of view that were important to the times and subject matter. I was able to learn from others who may have been far away or long deceased. Libraries presented me with a conversation, not an answer. Good libraries cause conversations in your head with many writers. These writers, those librarians, challenged me to be different, to be better.  Staying for hours in a library was not an annoyance for the librarians—it was the point. Yes, you could check books out of the library, and I would, but mostly I did my work in the library—a few pages here, a few pages there—a stack of books in a carrel with index cards tucked into them and with lots of handwritten notes (uh, no laptops yet). But libraries were still specialized. To learn about draft resisters during the Vietnam War, I needed access to a law library. MIT did not have a law collection and this was before Lexis/Nexis and Westlaw. I needed to get to the volumes of case law of the United States.  Harvard, up the road, had one of the great law libraries, but as an MIT student, I could not get in. My MIT professor lent me his ID that fortunately did not include a photo, so I could sneak in with that. I spent hours in the basement of Harvard’s Law Library reading about the cases of conscientious objectors and others.  But why was this library of law books not available to everyone? It stung me. It did not seem right.  A few years later I would apply to library school at Simmons College to figure out how to build a digital library system that would be closer to the carved words over the Boston Public Library’s door in Copley Square:  “Free to All.”   Archives: A Wonderful Place for Singular Obsessions When I quizzed the archivist at MIT, she explained what she did and how the MIT Archives worked. I loved the idea, but did not spend any time there—it was not organized for the busy undergraduate. The MIT Library was organized for easy access; the MIT Archives included complete collections of papers, notes, ephemera from others, often professors. 
It struck me that the archives were collections of collections. Each collection faithfully preserved and annotated.  I think of them as having advertisements on them, beckoning the researcher who wants to dive into the materials in the archive and the mindset of the collector. So in this formulation, an archive is a collection, archives are collections of collections.  Archivists are presented with collections, usually donations, but sometimes there is some money involved to preserve and catalog another’s life work. Personally, I appreciate almost any evidence of obsession—it can drive toward singular accomplishments. Archives often reveal such singular obsessions. But not all collections are archived, as it is an expensive process. The cost of archiving collections is changing, especially with digital materials, as is cataloging and searching those collections. But it is still expensive. When the Internet Archive takes on a physical collection, say of records, or old repair manuals, or materials from an art group, we have to weigh the costs and the potential benefits to researchers in the future.  Archives take the long view. One hundred years from now is not an endpoint, it may be the first time a collection really comes back to light. Digital Libraries: A Memex Dream, a Global Brain So when I helped start the Internet Archive, we wanted to build a digital library—a “complete enough” collection, and “organized enough” that everything would be there and findable. A Universal Library. A Library of Alexandria for the digital age. Fulfilling the memex dream of Vanevar Bush (do read “As We May Think“), of Ted Nelson‘s Xanadu, of Tim Berners-Lee‘s World Wide Web, of Danny Hillis‘ Thinking Machine, Raj Reddy’s Universal Access to All Knowledge, and Peter Russell’s Global Brain. Could we be smarter by having people, the library, networks, and computers all work together?  That is the dream I signed on to.  I dreamed of starting with a collection—an Archive, an Internet Archive. This grew to be  a collection of collections: Archives. Then a critical mass of knowledge complete enough to inform citizens worldwide: a Digital Library. A library accessible by anyone connected to the Internet, “Free to All.” About the Author: Brewster Kahle, Founder & Digital Librarian, Internet Archive Brewster Kahle A passionate advocate for public Internet access and a successful entrepreneur, Brewster Kahle has spent his career intent on a singular focus: providing Universal Access to All Knowledge. He is the founder and Digital Librarian of the Internet Archive, one of the largest digital libraries in the world, which serves more than a million patrons each day. Creator of the Wayback Machine and lending millions of digitized books, the Internet Archive works with more than 800 library and university partners to create a free digital library, accessible to all. Soon after graduating from the Massachusetts Institute of Technology where he studied artificial intelligence, Kahle helped found the company Thinking Machines, a parallel supercomputer maker. He is an Internet pioneer, creating the Internet’s first publishing system called Wide Area Information Server (WAIS). In 1996, Kahle co-founded Alexa Internet, with technology that helps catalog the Web, selling it to Amazon.com in 1999.  
Elected to the Internet Hall of Fame, Kahle is also a Fellow of the American Academy of Arts and Sciences, a member of the National Academy of Engineering, and holds honorary library doctorates from Simmons College and University of Alberta. Posted in Discussion, Librarianship, Uncategorized | Comments closed Amplifying the voices behind books By mek | Published: September 2, 2020 Exploring how Open Library uses author data to help readers move from imagination to impact By Nick Norman, Edited by Mek & Drini Image Source: Pexels / Pixabay from popsugar According to René Descartes, a creative mathematician, “The reading of all good books is like a conversation with the finest [people] of past centuries.” If that’s true, then who are some of the people you’re talking to? If you’re not sure how to answer that question, you’ll definitely appreciate the ‘Author Stats’ feature developed by Open Library. A deep dive into author stats Author stats give readers clear insights about their favorite authors that go much deeper than the front cover: such as birthplace, gender, works by time, ethnicity, and country of citizenship. These bits and pieces of knowledge about authors can empower readers in some dynamic ways. But how exactly? To answer that question, consider a reader who’s passionate about the topic of cultural diversity. However, after the reader examines their personalized author stats, they realize that their reading history lacks diversity. This doesn’t mean the reader isn’t passionate about cultural diversity; rather, author stats empowers the reader to pinpoint specific stats that can be diversified. Take a moment … or a day, and think about all the books you’ve read — just in the last year or as far back as you can. What if you could align the pages of each of those books with something meaningful … something that matters? What if each time you cracked open a book, the voices inside could point you to places filled with hope and opportunity? According to Drini Cami — Open Library’s lead developer behind Author Stats , “These stats let readers determine where the voices they read are coming from.” Drini continues saying, “A book can be both like a conversation as well as a journey.” He also says, “Statistics related to the authors might help provide readers with feedback as to where the voices they are listening to are coming from, and hopefully encourage the reading of books from a wider variety of perspectives.” Take a moment to let that sink in. Data with the power to change While Open Library’s author stats can show author-related demographics, those same stats can do a lot more than that. Drini Cami went on to say that, “Author stats can help readers intelligently alter their  behavior (if they wish to).” A profound statement that Mark Twain — one of the best writers in American history — might even shout from the rooftop. Broad, wholesome, charitable views of [people] … cannot be acquired by vegetating in one little corner of the earth all one’s lifetime. — Mark Twain In the eyes of Drini Cami and Mark Twain, books are like miniature time machines that have the power to launch readers into new spaces while changing their behaviors at the same time. For it is only when a reader steps out of their corner of the earth that they can step forward towards becoming a better person — for the entire world. Connecting two worlds of data Open Library has gone far beyond the extra mile to provide data about author demographics that some readers may not realize. 
It started with Open Library’s commitment to providing its readers with what Drini Cami describes as “clean, organized, structured, queryable data.” Simply put, readers can trust that Open Library’s data can be used to provide its audiences with maximum value. Which raises the question: where is all that ‘value’ coming from? Drini Cami calls it “linked data”. In not-so-complex terms, you may think of linked data as being two or more storage sheds packed with data. When these storage sheds are connected, well… that’s when the magic happens. For Open Library, that magic starts at the link between the Wikidata and Open Library knowledge bases. Wikidata, a non-profit, community-powered project run by Wikimedia, the same team which brought us Wikipedia, is a “free and open knowledge base that can be read and edited by both humans and machines”. It’s like Wikipedia, except for storing bite-sized encyclopedic data and facts instead of articles. If you look closely, you may even find some of Wikidata’s data being leveraged within Wikipedia articles. Wikipedia’s Summary Info Box Source data in Wikidata Wikidata is where Open Library gets its author demographic data from. This is possible because the entries on Wikidata often include links to source material such as books, authors, learning materials, e-journals, and even to other knowledge bases like Open Library’s. Because of these links, Open Library is able to share its data with Wikidata and oftentimes get back detailed information and structured data in return, such as author demographics. Wrangling in the Data Linking up services like Wikidata and Open Library doesn’t happen automatically. It requires the hard work of “Metadata Wranglers”. That’s where Charles Horn comes in, the lead Data Engineer at Open Library — without his work, author stats would not be possible. Charles Horn works closely with Drini Cami and also the team at Wikidata to connect book and author resources on Open Library with the data kept inside Wikidata. By writing clever bots and scripts, Charles and Drini are able to make tens of thousands of connections at scale. To put it simply, as both Open Library and Wikidata grow, their resources and data will become better connected and more accurate. Thanks to the help of “Metadata Wranglers”, Open Library users will always have the smartest results — right at their fingertips. It’s in a book … Once upon a time, ten-time Grammy Award winner Chaka Khan greeted television viewers with her bright voice on the once-popular book reading program, Reading Rainbow. In her words, she sang … “Friends to know, and ways to grow, a Reading Rainbow. I can be anything. Take a look, it’s in a book …” Thanks to Open Library’s author stats, not only do readers have the power to “take a look” into books, they can see further, and truly change what they see. Try browsing your author stats and consider following Open Library on twitter. The “My Reading Stats” option may be found under the “My Books” drop-down menu within the main site’s top navigation. What did you learn about your favorite authors? Please share in the comments below. Posted in Community, Cultural Resources, Data | Comments closed Giacomo Cignoni: My Internship at the Internet Archive By Drini Cami | Published: August 29, 2020 This summer, Open Library and the Internet Archive took part in Google Summer of Code (GSoC), a Google initiative to help students gain coding experience by contributing to open source projects.
I was lucky enough to mentor Giacomo while he worked on improving our BookReader experience and infrastructure. We have invited Giacomo to write a blog post to share some of the wonderful work he has done and his learnings. It was a pleasure working with you, Giacomo, and we all wish you the best of luck with the rest of your studies! – Drini Hi, I am Giacomo Cignoni, a 2nd-year computer science student from Italy. I submitted my 2020 Google Summer of Code (GSoC) project proposal to work with the Internet Archive, and I was selected for it. In this blog post, I want to tell you about my experience and my accomplishments working this summer on BookReader, Internet Archive’s open source book reading web application. The BookReader features I enjoyed working on the most are page filters (which include “dark mode”) and the text selection layer for certain public domain books. They were both challenging, but mostly had a great impact on the user experience of BookReader. The first permits turning white-background, black-text pages into black-background, white-text ones, and the second allows text to be selected and copied directly from the page images (currently in internal testing). Short summary of implemented features: End-to-end testing (search, autoplay, right-to-left books) Generic book from Internet Archive demo Mobile BookReader table of contents Checkbox for filters on book pages (including dark mode) Text selection layer plugin for public domain books Bug fixes for page flipping Using high resolution book images bug fix First approach to GSoC experience Once I received the news that I had been selected for GSoC with the Internet Archive for my BookReader project, I was really excited, as it was the beginning of a new experience for me. For the same reason, I will not hide that I was a little bit nervous, because it was my first internship-like experience. Fortunately, even from the start, my mentor Drini and also Mek were supportive and ready to offer help. Moreover, the fact that I was already familiar with BookReader was helpful, as I had already used it (and even modified it a little bit) for a personal project. For most of the month of May, starting on the 6th, the day of the GSoC selection, I mainly focused on getting to know the other members of the UX team at the Internet Archive, whom I would be working with for the rest of the summer, and also on defining a more precise roadmap of my future work with my mentor, as my proposed project was open to any improvements for BookReader. End-to-end testing The first tasks I worked on, as stated in the project, were about end-to-end testing for BookReader. I learned about the Testcafe tool that was to be used, and my first real task was to explore and remove some old QUnit tests (#308). Then I started to write end-to-end tests for the search feature in BookReader, both for desktop (#314) and mobile (#322). Lastly, I fixed the existing autoplay end-to-end test (#344) that was causing problems, and I also prepared end-to-end tests for right-to-left books (#350), but they weren’t merged immediately because they needed a feature that I would implement later: a system to choose different books from the IA servers to be displayed by specifying the book id in the URL. This work on testing (which lasted until the ~20th of June) was really helpful at the beginning, as it allowed me to gain more confidence with the codebase without immediately trying harder tasks, and also to gain more confidence with JavaScript ES6.
The frequent meetings with my mentor and other members of the team made me really feel part of the workplace. Working on the source code The table of contents panel in BookReader mobile My first experience working on core BookReader source code was during the Internet Archive hackathon on May the 30th when, with the help of my mentor, I created the first draft for the table of content panel for mobile BookReader. I would then resume to work on this feature in July, refining it until it was released (#351). I then worked on a checkbox to apply different filters to the book page images, still on mobile BookReader (#342), which includes a sort of “dark mode”. This feature was probably the one I enjoyed the most working on, as it was challenging but not too difficult, it included some planning and was not purely technical and received great appreciation from users. Page filters for BookReader mobile let you read in a “dark mode” https://twitter.com/openlibrary/status/1280184861957828608 Then I worked on the generic demo feature; a particular demo for BookReader which allows you to choose a book  from the Internet Archive servers to be displayed, by simply adding the book id in the URL as a parameter (#356). This allowed the right to left e2e test to be merged and proved to be useful for manually testing the text selection plugin. In this period I also fixed two page flipping issues: one more critical (when flipping pages in quick succession the pages started turning back and forth randomly) (#386), and the other one less urgent, but it was an issue a user specifically pointed out (in an old BookReader demo it was impossible to turn pages at all) (#383). Another issue I solved was BookReader not correctly displaying high resolution images on high resolution displays (#378). Open source project experience One aspect I really enjoyed of my GSoC is the all-around experience of working on an open source project. This includes leaving more approachable tasks for the occasional member of the community to take on and helping them out. Also, I found it interesting working with other members of the team aside from my mentor, both for more technical reasons and for help in UI designing and feedback about the user experience: I always liked having more points of view about my work. Moreover, direct user feedback from the users, which showed appreciation for the new implemented features (such as BookReader “dark mode”), was very motivating and pushed me to do better in the following tasks. Text selection layer The normally invisible text layer shown red here for debugging The biggest feature of my GSoC was implementing the ability to select text directly on the page image from BookReader for public domain books, in order to copy and paste it elsewhere (#367). This was made possible because Internet Archive books have information about each word and its placement in the page, which is collected by doing OCR. To implement this feature we decided to use an invisible text layer placed on top of the page image, with words being correctly positioned and scaled. This made it possible to use the browser’s text selection system instead of creating a new one. The text layer on top of the page was implemented using an SVG element, with subelements for each paragraph and word in the page. The use of the SVG instead of normal html text elements made it a lot easier to overcome most of the problems we expected to find regarding the correct placement and scaling of words in the layer. 
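To make the approach more tangible, here is a rough sketch of the technique described above. It is not BookReader's actual code, and the OCR word-box format (text plus x, y, w, h in page pixels) is an assumption; the idea is simply to emit an SVG layer whose transparent text elements sit where the scanned words are, so the browser's native selection and copy behavior can be reused.

```python
# Not BookReader's implementation — a minimal illustration of an invisible
# SVG text layer positioned over a page image so native text selection works.
import xml.etree.ElementTree as ET

def build_text_layer(words, page_width, page_height):
    """Return an SVG string with one transparent <text> element per OCR word."""
    svg = ET.Element("svg", {
        "xmlns": "http://www.w3.org/2000/svg",
        "viewBox": f"0 0 {page_width} {page_height}",
        # The layer is absolutely positioned on top of the page image.
        "style": "position:absolute; top:0; left:0; width:100%; height:100%;",
    })
    for word in words:
        text = ET.SubElement(svg, "text", {
            "x": str(word["x"]),
            "y": str(word["y"] + word["h"]),   # baseline at the bottom of the word box
            "textLength": str(word["w"]),      # stretch glyphs to match the scanned word
            "lengthAdjust": "spacingAndGlyphs",
            "font-size": str(word["h"]),
            "fill-opacity": "0",               # invisible, but still selectable
        })
        text.text = word["text"]
    return ET.tostring(svg, encoding="unicode")

if __name__ == "__main__":
    sample = [{"text": "Open", "x": 100, "y": 200, "w": 120, "h": 40},
              {"text": "Library", "x": 230, "y": 200, "w": 160, "h": 40}]
    print(build_text_layer(sample, page_width=1600, page_height=2400))
```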
I started working sporadically on this feature at the start of July, and this led to having a workable demo by the first day of August. The rest of the month of August was spent refining this feature to make it production-ready. This included refining word placement in the layer, adding unit tests, adding support for more browsers, refactoring some functions, making the experience more fluid, and making the selected text accurate for newlines and spaces on copy. The most challenging part was probably integrating the text selection actions well into the two-page view of BookReader, without disrupting the click-to-flip-page and other functionalities related to mouse-click events. This feature is currently in internal testing, and scheduled for release in the next few weeks. The text selection experience Conclusions Overall, I was extremely satisfied with my GSoC at the Internet Archive. It was a great opportunity for me to learn new things. I got much more fluent in JavaScript and CSS, thanks to both my mentor and using these languages in practice while coding. I learnt a lot about working on an open source project, but a part that I found really interesting was attending and participating in the decision-making processes, even about projects I was not involved in. It was also interesting for me to apply concepts I had studied on a more theoretical level at university in a real workplace environment. To sum things up, the ability to work on something I liked that had an impact on users, and the ability to learn useful things for my personal development, really made this experience worthwhile for me. I would 100% recommend doing a GSoC at the Internet Archive! Posted in BookReader, Community, Google Summer of Code (GSoC), Open Source | Comments closed Open Library is an initiative of the Internet Archive, a 501(c)(3) non-profit, building a digital library of Internet sites and other cultural artifacts in digital form. Other projects include the Wayback Machine, archive.org and archive-it.org. Your use of the Open Library is subject to the Internet Archive's Terms of Use.
blog-reeset-net-1240 ---- MarcEdit 7.5 Update – Terry's Worklog MarcEdit 7.5 Update ChangeLog: https://marcedit.reeset.net/software/update75.txt Highlights Preview Changes One of the most requested features over the years has been the ability to preview changes prior to running them. As of 7.5.8, a new preview option has been added to many of the global editing tools in the MarcEditor. Currently, you will find the preview option attached to the following functions: Replace All Add New Field Delete Field Edit Subfield Edit Field Edit Indicator Copy Field Swap Field Functions that include a preview option will be denoted with the following button: When this button is pressed, the following option is made available. When Preview Results is selected, the program will execute the defined action and display the potential results in a preview screen. For example: To protect performance, only 500 results at a time will be loaded into the preview grid, though users can keep adding results to the grid and continue to review items. Additionally, users have the ability to search for items within the grid as well as jump to a specific record number (not row number). These new options will show up first in the Windows version of MarcEdit, but will be added to the MarcEdit Mac 3.5.x branch in the coming weeks. New JSON => XML Translation To better support the translation of data from JSON to MARC, I’ve included a JSON => MARC algorithm in the MARCEngine. This will allow JSON data to be serialized into XML. The benefit of including this option is that I’ve been able to update the XML Functions options to allow JSON to be a starting format. This will be specifically useful for users that want to make use of linked data vocabularies to generate MARC Authority records. Users can direct MarcEdit to facilitate the translation from JSON to XML, and then create XSLT translations that can then be used to complete the process to MARCXML and MARC. I’ve demonstrated how this process works using a vocabulary of interest to the #critcat community, the Homosaurus vocabulary (How do I generate MARC authority records from the Homosaurus vocabulary? – Terry’s Worklog (reeset.net)). OCLC API Interactions Working with the OCLC API is sometimes tricky. MarcEdit utilizes a specific authentication process that requires OCLC keys to be set up and configured to work a certain way. When issues come up, it is sometimes very difficult to debug them. I’ve updated the process and error handling to surface more information – so when problems occur and XML debugging information isn’t available, the actual exception and inner exception data will be surfaced instead. This can often provide information to help understand why the process isn’t able to complete. Wrap up As noted, there have been a number of updates. While many fall under the category of housekeeping (updating icons, UX improvements, actions, default values, etc.), this update does include a number of often-requested, significant updates that I hope will improve user workflows. –tr Published April 3, 2021 By reeset Categorized as MarcEdit
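To make the JSON => XML step more concrete, here is a rough sketch of the general idea; it is not MarcEdit's internal code, and the sample JSON-LD field names (@id, skos:prefLabel, skos:altLabel) are assumptions about the vocabulary's shape rather than anything MarcEdit defines. The point is only that a generic JSON-to-XML serialization gives an XSLT something to work with on the way to MARCXML and MARC.

```python
# Not MarcEdit's code — a sketch of serializing a JSON(-LD) record into plain
# XML that an XSLT could then transform into MARCXML.
import json
import xml.etree.ElementTree as ET

def json_to_xml(obj, tag="record"):
    """Serialize a nested dict/list structure into a simple element tree."""
    elem = ET.Element(tag)
    if isinstance(obj, dict):
        for key, value in obj.items():
            # Strip characters that are not legal in XML element names.
            safe = key.replace("@", "").replace(":", "_")
            elem.append(json_to_xml(value, safe))
    elif isinstance(obj, list):
        for item in obj:
            elem.append(json_to_xml(item, "item"))
    else:
        elem.text = str(obj)
    return elem

if __name__ == "__main__":
    # Hypothetical concept record; labels are placeholders, not real vocabulary data.
    concept = json.loads("""{
        "@id": "https://homosaurus.org/v3/example-id",
        "skos:prefLabel": "Example heading",
        "skos:altLabel": ["Example variant"]
    }""")
    print(ET.tostring(json_to_xml(concept), encoding="unicode"))
```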
Post navigation Previous post How do I generate MARC authority records from the Homosaurus vocabulary? Next post Thoughts on NACOs proposed process on updating CJK records Search… Terry's Worklog Proudly powered by WordPress. Dark Mode: blog-reeset-net-1241 ---- None blog-reeset-net-1510 ---- Terry's Worklog – On my work (programming, digital libraries, cataloging) and other stuff that perks my interest (family, cycling, etc) Skip to content Terry's Worklog On my work (programming, digital libraries, cataloging) and other stuff that perks my interest (family, cycling, etc) Menu Close Home About Me MarcEdit Homepage GitHub Page Privacy Policy Thoughts on NACOs proposed process on updating CJK records I would like to take a few minutes and share my thoughts about an updated best practice recently posted by the PCC and NACO related to an update on CJK records. The update is found here: https://www.loc.gov/aba/pcc/naco/CJK/CJK-Best-Practice-NCR.docx. I’m not certain if this is active or a simply a proposal, but I’ve been having a number… Continue reading Thoughts on NACOs proposed process on updating CJK records Published April 20, 2021Categorized as Cataloging, MarcEdit MarcEdit 7.5 Update ChangeLog: https://marcedit.reeset.net/software/update75.txt Highlights Preview Changes One of the most requested features over the years has been the ability to preview changes prior to running them.  As of 7.5.8 – a new preview option has been added to many of the global editing tools in the MarcEditor.  Currently, you will find the preview option attached to… Continue reading MarcEdit 7.5 Update Published April 3, 2021Categorized as MarcEdit How do I generate MARC authority records from the Homosaurus vocabulary? Step by step instructions here: https://youtu.be/FJsdQI3pZPQ Ok, so last week, I got an interesting question on the listserv where a user asked specifically about generating MARC records for use in one’s ILS system from a JSONLD vocabulary.  In this case, the vocabulary in question as Homosaurus (Homosaurus Vocabulary Site) – and the questioner was specifically… Continue reading How do I generate MARC authority records from the Homosaurus vocabulary? Published April 3, 2021Categorized as MarcEdit MarcEdit: State of the Community *2020-2021 * Sigh – original title said 2019-2020.  Obviously, this is for this past year (Jan. 2020-Dec. 31, 2020).   Per usual, I wanted to take a couple minutes and look at the state of the MarcEdit project. This is something that I try to do once a year to gauge the current health of the community,… Continue reading MarcEdit: State of the Community *2020-2021 Published March 24, 2021Categorized as MarcEdit, Uncategorized MarcEdit 7.3.x/7.5.x (beta) Updates Versions are available at: https://marcedit.reeset.net/downloads Information about the changes: 7.3.10 Change Log: https://marcedit.reeset.net/software/update7.txt 7.5.0 Change Log: https://marcedit.reeset.net/software/update75.txt If you are using 7.x – this will prompt as normal for update. 7.5.x is the beta build, please be aware I expect to be releasing updates to this build weekly and also expect to find some issues.… Continue reading MarcEdit 7.3.x/7.5.x (beta) Updates Published February 2, 2021Categorized as MarcEdit MarcEdit 7.5.x/MacOS 3.5.x Timelines I sent this to the MarcEdit Listserv to provide info about my thoughts around timelines related to the beta and release.  Here’s the info. Dear All, As we are getting close to Feb. 
1 (when I’ll make the 7.5 beta build available for testing) – I wanted to provide information about the update process going… Continue reading MarcEdit 7.5.x/MacOS 3.5.x Timelines Published January 26, 2021 Categorized as MarcEdit MarcEdit 7.5 Change/Bug Fix list * Updated; 1/20 Change: Allow OS to manage supported Security Protocol types. Change: Remove com.sun dependency related to dns and httpserver Change: Changed AppData Path Change: First install automatically imports settings from MarcEdit 7.0-2.x Change: Field Count – simplify UI (consolidate elements) Change: 008 Windows — update help urls to oclc Change: Generate FAST… Continue reading MarcEdit 7.5 Change/Bug Fix list Published January 20, 2021 Categorized as MarcEdit MarcEdit 7.5 Updates Current list of MarcEdit 7.5 general updates. I’ll be walking through many of these changes in a webinar 1/15. Significant Changes: Targeted Framework: .NET 5.0 (What’s new in .NET 5 | Microsoft Docs) XML Wizard Changes Support for Attribute-based mapping (extends previous entity-based mapping) Linked Data Components updated SPARQL Components Updated Linked Data Rules… Continue reading MarcEdit 7.5 Updates Published January 12, 2021 Categorized as MarcEdit MarcEdit 7.5 Update Status I’m planning to start making testing versions of the new MarcEdit instance available around the first of the year broadly, and to a handful of testers in mid-Dec. The translation from .NET 4.7.2 to .NET 5 was more significant than I would have thought – and includes a number of swapped default values – so hunting… Continue reading MarcEdit 7.5 Update Status Published November 30, 2020 Categorized as Uncategorized Changes to System.Diagnostics.Process in .NET Core In .NET Core, one of the changes that caught me by surprise is the change related to starting processes. In the .NET Framework, you can open a web site, file, etc. just by using the following: System.Diagnostics.Process.Start(path); However, in .NET Core, this won’t work. When trying to open a file, the process will… Continue reading Changes to System.Diagnostics.Process in .NET Core Published November 19, 2020 Categorized as Uncategorized blog-reeset-net-2049 ---- None blog-reeset-net-2210 ---- None blog-reeset-net-2983 ---- Thoughts on NACOs proposed process on updating CJK records – Terry's Worklog Thoughts on NACOs proposed process on updating CJK records I would like to take a few minutes and share my thoughts about an updated best practice recently posted by the PCC and NACO related to an update on CJK records. The update is found here: https://www.loc.gov/aba/pcc/naco/CJK/CJK-Best-Practice-NCR.docx. I’m not certain if this is active or simply a proposal, but I’ve been having a number of private discussions with members at the Library of Congress and the PCC as I’ve been trying to understand the genesis of this policy change. I personally believe that formally adopting a policy like this would be exceptionally problematic, and I wanted to flesh out my thoughts on why, and some potentially better options that could fix the issue this proposal is attempting to solve. But first, I owe some folks an apology.
In chatting with some folks at LC (because, let's be clear, this proposal was created specifically because there are local, limiting practices at LC that are artificially complicating this work), it came to my attention that the individuals who spent a good deal of time considering and creating this proposal have received some unfair criticism – and I think I bear a lot of responsibility for that. I have done work creating best practices and standards, and it's thankless, difficult work. Because of that, in cases where I disagree with a particular best practice, my preference has been to address those concerns privately and attempt to understand and share my issues with a set of practices. This is what I have been doing related to this work. However, on the MarcEdit list (a private list), when a request was made related to a feature request in MarcEdit to support this work, I was less thoughtful in my response, as the proposed change could fundamentally undo almost a decade of work: I have dealt with thousands of libraries stymied by these kinds of best practices, which have significant unintended consequences. My regret is that I've been told that my thoughts shared on the MarcEdit list have been used by others in more public spaces to take this committee's work to task. This is unfortunate and disappointing, and something I should have been more thoughtful about in my responses on the MarcEdit list, especially given that every member of that committee is doing this work as a service to the community. I know I forget that sometimes. So, to the folks who did this work: I've not followed (or seen) any feedback you may have received, but inasmuch as I'm sure I played a part in any pushback you may have received, I'm sorry.

What problem does this proposal seek to solve?

If you look at the proposal, I think that the writers do a good job identifying the issue. Essentially, this issue is unique to authority records. At present, NACO still requires that records created within the program only utilize UTF8 characters that fall within the MARC-8 repertoire. OCLC, the pipeline for creating these records, enforces this rule by invalidating records with UTF8 characters outside the MARC8 range. The proposal seeks to address this by encouraging the use of NCR (Numeric Character Reference) data in UTF8 records to work around this restriction. So, in a nutshell, that is the problem, and that is the proposed solution. But before we move on, let's talk a little bit about how we got here. This problem exists because of what I believe to be an extremely narrow and unproductive read of what the MARC8 repertoire actually means. For those not in libraries, MARC8 is essentially a made-up character encoding, used only in libraries, that has long outlived its usefulness. Modern systems have largely stopped supporting it outside of legacy ingest workflows. The issue is that for every academic library or national library that has transitioned to UTF8, hundreds of small libraries or organizations around the world have not. MARC8 continues to exist because the infrastructure that supports these smaller libraries is built around it. But again, I think it is worth asking what, today, the MARC8 repertoire actually is. Previously, this had been a hard set of defined values. But really, that changed around 2004, when LC updated its guidance and introduced the concept of NCRs to preserve lossless data transfer between systems that were fully UTF8 compliant and older MARC8 systems.
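To make the NCR idea concrete, here is a minimal sketch of the kind of lossless round trip NCRs enable when UTF8 data has to be pushed into a MARC8 environment. It is an illustration only, not LC's or MarcEdit's actual conversion code, and for simplicity it treats everything outside ASCII as falling outside the MARC-8 repertoire, which is narrower than the real repertoire:

import re

def to_ncr(text: str) -> str:
    """Replace characters outside a stand-in 'MARC-8 repertoire' (here: ASCII)
    with Numeric Character References of the form &#xXXXX;."""
    return "".join(ch if ord(ch) < 128 else f"&#x{ord(ch):04X};" for ch in text)

def from_ncr(text: str) -> str:
    """Restore the original characters from NCR notation (lossless)."""
    return re.sub(r"&#x([0-9A-Fa-f]+);", lambda m: chr(int(m.group(1), 16)), text)

heading = "東京 (Tōkyō)"
exported = to_ncr(heading)            # '&#x6771;&#x4EAC; (T&#x014D;ky&#x014D;)'
assert from_ncr(exported) == heading  # nothing is lost on the way back
print(exported)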
NCRs in MARC8 were workable because they left local systems the ability to handle (or not handle) the data as they saw fit, and they finally provided an avenue for the library community as a whole to move on from the limitations MARC8 was imposing on systems. They made it easier to move data into UTF8-compliant non-MARC formats and provided a pathway for reusing data from other metadata formats in MARC records. I would argue that today, the MARC8 repertoire includes NCR notation – and to assume or pretend otherwise is shortsighted and revisionist. But why is all of this important? Well, it is at the heart of the problem that we find ourselves in. For authority data, the Library of Congress appears to have adopted this very narrow view of what MARC8 means (against their own stated recommendations), and as a result, NACO and OCLC place artificial limits on the pipeline. There are lots of reasons why LC does this; I recognize they are moving slowly because any changes that they make are often met with some level of resistance from members of our community – but in this case, this paralysis is causing more harm to the community than good.

Why is this proposal problematic?

So, this is the environment that we are working in and the issue this proposal sought to solve. The issue, however, is that the proposal attempts to solve this problem by adopting a MARC8 solution and applying it within UTF8 data – essentially making the case that NCR values can be embedded in UTF8 records to ensure lossless data entry. And while I can see why someone might think that, that assumption is fundamentally incorrect. When LC developed its guidance on NCR notation, that guidance was specifically directed at the lossless translation of data to MARC8. UTF8 data has no need for NCR notation. This does not mean that it does not sometimes show up – and as a practical matter, I've spent thousands of hours working with libraries dealing with the issues this creates in local systems. Aside from the issues this creates in MARC systems around indexing and discovery, it makes data almost impossible to use outside of that system and at times of migration. In thinking about the implications of this change in the context of MarcEdit, I had the following specific concerns:
- NCR data in UTF8 records would break existing workflows for users with current-generation systems that would have no reason to expect this data to be present in UTF8 MARC records.
- It would make normalization virtually impossible and potentially re-introduce a problem I spent months solving for organizations related to how UTF8 data is normalized and introduced into local systems (see the sketch after this list).
- It would break many of the transformation options. MarcEdit allows for the flow of data to many different metadata formats – all are built on the concept that the first thing MarcEdit does is clean up character encodings to ensure the output data is in UTF8.
- MarcEdit is used by ~20k active users and ~60k annual users. Over 1/3 of those users do not use MARC21 and do not use MARC-8. Allowing the mixing of NCRs and UTF8 data potentially breaks functionality for broad groups of international users.
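A small sketch of why this matters in practice (illustrative code, not MarcEdit's internals): once NCR escape sequences sit inside otherwise valid UTF8 data, ordinary string comparison and Unicode normalization no longer treat the same heading as equal, so indexing, deduplication, and merge routines silently miss matches.

import unicodedata

clean = "Tōkyō"                 # fully decoded UTF8 heading
mixed = "T&#x014D;ky&#x014D;"   # the same heading with NCRs embedded in UTF8 data

# Unicode normalization cannot reconcile the two forms; the NCRs are just
# ASCII punctuation and digits as far as any UTF8-aware system is concerned.
print(clean == mixed)                                            # False
print(unicodedata.normalize("NFC", clean) ==
      unicodedata.normalize("NFC", mixed))                       # False

# An index or match-and-merge routine keyed on the heading therefore treats
# these as two different access points unless every consumer adds special
# NCR-decoding logic first.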
While I very much appreciate the issue that this is attempting to solve, I've spent years working with libraries where this kind of practice would introduce a long-term data issue that is very difficult to identify and fix and that often shows up unexpectedly when it comes time to migrate or share this information with other services, communities, or organizations.

So what is the solution?

I think that we can address this issue on two fronts. First, I would advise NACO and OCLC to stop limiting data entry to this very limited notion of the MARC8 repertoire. In all other contexts, OCLC provides the ability to enter any valid UTF8 data. This current limit within the authority process is artificial and unnecessary. OCLC could easily remove it, and NACO could amend their process to allow record entry to utilize any valid UTF8 character. This would address the problem that this group was attempting to solve for catalogers creating these records. The second step could take two forms. If LC continues to ignore their own guidance and cleave to an outdated concept of the MARC8 repertoire, OCLC could provide to LC, via their pipeline, a version of the records where the data includes NCR notation for use in LC's own systems. It would mean that I would not recommend using LC as a trusted system for downloading authorities if this were the practice, unless I had an internal local process to remove any NCR data found in valid UTF8 records. Essentially, we would treat LC's requirements as a disease and quarantine them and their influence in this process. Of course, what would be more ideal is for LC to make the decision to accept UTF8 data without restrictions and rely on applicable guidance and MARC21 best practice by supporting UTF8 data fully, and, for those still needing MARC8 data, providing that data using the lossless process of NCRs (per their own recommendations).

Conclusion

Ultimately, this proposal is a recognition that the current NACO rules and process are broken, and broken in a way that is actively undermining other work in the PCC around linked data development. And while I very much appreciate the thoughtful work that went into the consideration of a different approach, I think the unintended side effects would cause more long-term damage than any short-term gains. Ultimately, what we need is for the principals to rethink why these limitations are in place and, honestly, to really consider ways that we start to deemphasize the role LC plays as a standards holder if, in that role, LC's presence continues to be an impediment to moving libraries forward.
Published April 20, 2021. By reeset. Categorized as Cataloging, MarcEdit
blog-reeset-net-3028 ---- None
blog-reeset-net-3412 ---- Terry's Worklog – On my work (programming, digital libraries, cataloging) and other stuff that perks my interest (family, cycling, etc)
blog-reeset-net-6539 ---- None blog-reeset-net-7876 ---- None
blog-reeset-net-794 ---- How do I generate MARC authority records from the Homosaurus vocabulary? – Terry's Worklog
How do I generate MARC authority records from the Homosaurus vocabulary?
Step by step instructions here: https://youtu.be/FJsdQI3pZPQ
Ok, so last week, I got an interesting question on the listserv where a user asked specifically about generating MARC records for use in one's ILS system from a JSONLD vocabulary. In this case, the vocabulary in question was Homosaurus (Homosaurus Vocabulary Site) – and the questioner was specifically looking for a way to pull individual terms for generation into MARC Authority records to add to one's ILS to improve search and discovery. When the question was first asked, my immediate thought was that this could likely be accommodated using the XML/JSON profiling wizard in MarcEdit. This tool can review a sample XML or JSON file and allow a user to create a portable processing file based on the content in the file. However, there were two issues with this approach:
1. The profile wizard assumes that the data format is static – i.e., that the sample file is representative of other files. Unfortunately, for this vocabulary, that isn't the case.
2. The profile wizard was designed to work with JSON – JSON-LD is actually a different animal due to the inclusion of the @ symbol.
While I updated the Profiler to recognize and work better with JSON-LD, the first challenge is one that doesn't make this a good fit for a generic process. So, I looked at how this could be built into the normal processing options. To do this, I added a new default serialization, JSON=>XML, which MarcEdit now supports. This allows the tool to take a JSON file and deserialize the data so that it is output reliably as XML.
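As a rough illustration of what a JSON-to-XML deserialization step like this does (a minimal sketch, not MarcEdit's actual serializer — the element names and the key sanitizing are invented for the example), a generic walk over the parsed JSON might look like this:

import xml.etree.ElementTree as ET

def json_to_xml(value, tag="record"):
    # XML element names cannot contain '@' or ':' as-is, so JSON-LD keys such
    # as '@id' or 'skos:prefLabel' are sanitized for this sketch.
    elem = ET.Element(tag.replace("@", "").replace(":", "_") or "field")
    if isinstance(value, dict):
        for key, child in value.items():
            elem.append(json_to_xml(child, key))
    elif isinstance(value, list):
        for child in value:
            elem.append(json_to_xml(child, "item"))   # 'item' is an invented wrapper name
    else:
        elem.text = str(value)
    return elem

# A trimmed-down version of the Homosaurus term shown below.
sample = {
    "@id": "http://homosaurus.org/v2/adoptiveParents",
    "@type": "skos:Concept",
    "skos:prefLabel": "Adoptive parents",
    "skos:related": [{"@id": "http://homosaurus.org/v2/birthParents"}],
}
print(ET.tostring(json_to_xml(sample), encoding="unicode"))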
So, for example, here is a sample JSON-LD file (homosaurus.org/v2/adoptiveParents.jsonld):

{
  "@context": {
    "dc": "http://purl.org/dc/terms/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "xsd": "http://www.w3.org/2001/XMLSchema#"
  },
  "@id": "http://homosaurus.org/v2/adoptiveParents",
  "@type": "skos:Concept",
  "dc:identifier": "adoptiveParents",
  "dc:issued": { "@value": "2019-05-14", "@type": "xsd:date" },
  "dc:modified": { "@value": "2019-05-14", "@type": "xsd:date" },
  "skos:broader": { "@id": "http://homosaurus.org/v2/parentsLGBTQ" },
  "skos:hasTopConcept": [
    { "@id": "http://homosaurus.org/v2/familyMembers" },
    { "@id": "http://homosaurus.org/v2/familiesLGBTQ" }
  ],
  "skos:inScheme": { "@id": "http://homosaurus.org/terms" },
  "skos:prefLabel": "Adoptive parents",
  "skos:related": [
    { "@id": "http://homosaurus.org/v2/socialParenthood" },
    { "@id": "http://homosaurus.org/v2/LGBTQAdoption" },
    { "@id": "http://homosaurus.org/v2/LGBTQAdoptiveParents" },
    { "@id": "http://homosaurus.org/v2/birthParents" }
  ]
}

In MarcEdit, the new JSON=>XML process can take this file and output it in XML like this (element values shown): http://purl.org/dc/terms/ http://www.w3.org/2004/02/skos/core# http://www.w3.org/2001/XMLSchema# http://homosaurus.org/v2/adoptiveParents skos:Concept adoptiveParents 2019-05-14 xsd:date 2019-05-14 xsd:date http://homosaurus.org/v2/parentsLGBTQ http://homosaurus.org/v2/familyMembers http://homosaurus.org/v2/familiesLGBTQ http://homosaurus.org/terms Adoptive parents http://homosaurus.org/v2/socialParenthood http://homosaurus.org/v2/LGBTQAdoption http://homosaurus.org/v2/LGBTQAdoptiveParents http://homosaurus.org/v2/birthParents

The ability to reliably convert JSON/JSON-LD to XML means that I can now allow users to utilize the same XSLT/XQuery process MarcEdit utilizes for other library metadata format transformations. All that was left to make this happen was to add a new origin data format to the XML Function template – and we are off and running. The end result is that users could utilize this process with any JSON-LD vocabulary (assuming they created the XSLT) to facilitate the automation of MARC Authority data. In the case of this vocabulary, I've created an XSLT and added it to my github space: https://github.com/reeset/marcedit_xslt_files/blob/master/homosaurus_xml.xsl, and I have also included the XSLT in the MarcEdit XSLT directory in current downloads. In order to use this XSLT and allow your version of MarcEdit to generate MARC Authority records from this vocabulary, you would use the following steps:
1. Be using MarcEdit 7.5.8+ or MarcEdit Mac 3.5.8+ (the Mac version will be available around 4/8). I have not decided if I will backport to 7.3.
2. Open the XML Functions Editor in MarcEdit.
3. Add a new Transformation, using JSON as the original format and MARC as the final format. Make sure the XSLT path is pointed to the location where you saved the downloaded XSLT file.
4. Save.
That should be pretty much it. I've recorded the steps and placed them here: https://youtu.be/FJsdQI3pZPQ, including some information on values you may wish to edit should you want to localize the XSLT.
Published April 3, 2021. By reeset. Categorized as MarcEdit
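For readers who want to try the same idea outside of MarcEdit, the sketch below applies an XSLT to an intermediate XML file with lxml. The file names are placeholders; in MarcEdit itself the transformation is driven by the homosaurus_xml.xsl linked above through the XML Functions engine rather than by a script like this.

from lxml import etree

# Placeholder file names: an XML serialization of one vocabulary term and the
# stylesheet that maps it to a MARC-flavored authority record.
source = etree.parse("adoptiveParents.xml")
transform = etree.XSLT(etree.parse("homosaurus_xml.xsl"))

result = transform(source)
print(str(result))  # transformed output, ready for loading or conversion in MARC tools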
blog-twitter-com-3439 ---- Enabling the future of academic research with the Twitter API
Enabling the future of academic research with the Twitter API
By Adam Tornes and Leanne Trujillo, Tuesday, 26 January 2021

When we introduced the next generation of the Twitter API in July 2020, we also shared our plans to invest in the success of the academic research community with tailored solutions that better serve their goals. Today, we're excited to launch the Academic Research product track on the new Twitter API.

Why we're launching this & how we got here

Since the Twitter API was first introduced in 2006, academic researchers have used data from the public conversation to study topics as diverse as the conversation on Twitter itself - from state-backed efforts to disrupt the public conversation to floods and climate change, from attitudes and perceptions about COVID-19 to efforts to promote healthy conversation online. Today, academic researchers are one of the largest groups of people using the Twitter API. Our developer platform hasn't always made it easy for researchers to access the data they need, and many have had to rely on their own resourcefulness to find the right information. Despite this, for over a decade, academic researchers have used Twitter data for discoveries and innovations that help make the world a better place. Over the past couple of years, we've taken iterative steps to improve the experience for researchers, like when we launched a webpage dedicated to Academic Research, and updated our Twitter Developer Policy to make it easier to validate or reproduce others' research using Twitter data. We've also made improvements to help academic researchers use Twitter data to advance their disciplines, answer urgent questions during crises, and even help us improve Twitter. For example, in April 2020, we released the COVID-19 stream endpoint - the first free, topic-based stream built solely for researchers to use data from the global conversation for the public good. Researchers from around the world continue to use this endpoint for a number of projects. Over two years ago, we started our own extensive research to better understand the needs, constraints and challenges that researchers have when studying the public conversation. In October 2020, we tested this product track in a private beta program where we gathered additional feedback. This gave us a glimpse into some of the important work that the free Academic Research product track we're launching today can now enable. "The Academic Research product track gives researchers a window into understanding the use of Twitter and social media at large, and is an important step by Twitter to support the scientific community." - Dr. Sarah Shugars, Assistant Professor at New York University "Twitter's enhancements for academic research have the potential to eliminate many of the bottlenecks that scholars confront in working with Twitter's API, and allow us to better evaluate the impact and origin of trends we discover." - Dr.
David Lazer, Professor at Northeastern University

What's launching today

With the new Academic Research product track, qualified researchers will have access to all v2 endpoints released to date, as well as:
- Free access to the full history of public conversation via the full-archive search endpoint, which was previously limited to paid premium or enterprise customers (see the request sketch below)
- Higher levels of access to the Twitter developer platform for free, including a significantly higher monthly Tweet volume cap of 10 million (20x higher than what's available on the Standard product track today)
- More precise filtering capabilities across all v2 endpoints to limit data collection to what is relevant for your study and minimize data cleaning requirements
- New technical and methodological guides to maximize the success of your studies
The release of the Academic Research product track is just a starting point. This initial solution is intended to address the most requested, biggest challenges faced when conducting research on the platform. We are excited to enable even more research that can create a positive impact on the world, and on Twitter, in the future. For more in-depth details about what's available, see our post on the Twitter community forum.

Where do I start?

To use this track, new and existing Twitter developers will need to apply for access with the Academic Research application. An improved developer portal experience guides you to the product track that best fits your needs. We require this additional application step to help protect the security and privacy of people who use Twitter and our developer platform. Each application will go through a manual review process to determine whether the described use cases for accessing our Academic Research product track adhere to our Developer Policy, and that applicants meet these three requirements:
- You are either a master's student, doctoral candidate, post-doc, faculty, or research-focused employee at an academic institution or university.
- You have a clearly defined research objective, and you have specific plans for how you intend to use, analyze, and share Twitter data from your research. Learn more about the application.
- You will use this product track for non-commercial purposes. Learn about non-commercial use.
We understand that these requirements are not representative of everyone doing academic research with Twitter data (for example, if you are an undergraduate, independent researcher, or a non-profit). Our future goal is to serve the complete range of research use cases for public Twitter data. In the meantime, anyone can apply to start with our v2 endpoints on the Standard product track. The new application for the Academic Research track asks specific questions related to your academic profile and research project details. Learn more about the application here.

What's next for the Twitter API v2?

Today's launch marks the beginning of how we plan to support this community with unprecedented access to data that can advance research objectives for nearly any discipline. While we recognize what we're launching today may not address all needs of the community, this is a starting point and we are committed to continued support for academic researchers in the future. We'll continue to listen and learn from you all, and welcome your feedback on how we can continue to improve and best serve your needs.
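As a rough illustration of the kind of call this access enables (the request sketch referenced in the list above), here is a minimal example against the v2 full-archive search endpoint. The bearer token, query, and date range are placeholders, and pagination and error handling are omitted.

import os
import requests

# Placeholder credential: the bearer token of an approved Academic Research project.
headers = {"Authorization": f"Bearer {os.environ['TWITTER_BEARER_TOKEN']}"}

params = {
    "query": "(#covid19 OR coronavirus) lang:en -is:retweet",  # illustrative query
    "start_time": "2020-03-01T00:00:00Z",
    "end_time": "2020-03-02T00:00:00Z",
    "max_results": 100,
}

resp = requests.get("https://api.twitter.com/2/tweets/search/all",
                    headers=headers, params=params)
resp.raise_for_status()
for tweet in resp.json().get("data", []):
    print(tweet["id"], tweet["text"][:80])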
As we've seen over the last 15 years, the research topics that can be studied with Twitter data are vast, and the future possibilities are endless. We hope you are as excited as we are about the possibilities this new product track creates for your research. In coming months, we will introduce a specialized Business product track, as well as additional levels of access within our Academic Research, Standard, and Business product tracks. We are also exploring more flexible access terms, support for additional Projects with unique use cases within your product track, and other improvements intended to help researchers and developers to get started, grow, and scale their projects all within the same API. To follow our planned releases, check out the product roadmap. Eventually, the new Twitter API will fully replace the v1.1 standard, premium, and enterprise APIs. Though before that can happen, we have a lot more to build, which is why we are referring to today's launch as Early Access. Early access gives you a chance to get started and get ahead on using our new, v2 endpoints. Learn more about how we plan to roll out the new Twitter API here. Have questions or want to connect with other researchers using the Twitter API? Check out our academic research community forum. Have ideas about how we can improve the new Twitter API? Upvote ideas or add your own in the v2 API feedback channel.
Adam Tornes, @atornes, Staff Product Manager, Developer & Enterprise Solutions
Leanne Trujillo, @leanne_tru, Sr. Program Manager, Developer & Enterprise Solutions
blog-twitter-com-4317 ---- Introducing a new and improved Twitter API
Introducing a new and improved Twitter API
By Ian Cairns and Priyanka Shetty, Thursday, 16 July 2020

We planned to launch the new Twitter API on July 16, 2020. But given the security incident we discovered on July 15, 2020, the timing of our launch no longer made sense or felt right.
We updated this post on August 12, 2020 to include additional details below to support the official launch of the new Twitter API.

Today, we're introducing the new Twitter API. Rebuilt from the ground up to deliver new features faster, today's release includes the first set of new endpoints and features we're launching so developers can help the world connect to the public conversation happening on Twitter. If you can't wait to check it out, visit the new developer portal. If you can, then read on for more about what we're building, what's new about the Twitter API v2, what's launching first, and what's coming next.

Building in the open and what we've learned

Your feedback has been essential in helping us define our vision and roadmap for the new Twitter API. From Tweets to focus groups, you have shared a ton of feedback with us over the past few years about what you need out of the Twitter API and what we can do better. We also learned a lot through Twitter Developer Labs where you've been sharing real-time feedback on the new API features we've tested in the open. We've always known that our developer ecosystem is diverse, but our API has long taken a one-size-fits-all approach. Your feedback helped us see the importance of making the new Twitter API more flexible and scalable to fit your needs. With the new API, we are building new elevated access options and new product tracks, so more developers can find options to meet their needs. More on that below. We also know it's important to be able to plan ahead, and we want to do a better job of sharing our plans with you in advance. Going forward, we'll share more of what's coming next on our public roadmap (updates coming soon). We're also sharing a Guide to the future of the Twitter API for more about what to expect as we roll out the new API. We have a lot planned, and it will evolve and improve as we continue to hear from you.

Twitter API v2: What's New?

A new foundation - The new API is built on a completely new foundation — rebuilt for the first time since 2012 — and includes new features so you can get more out of the public conversation. That new foundation allows us to add new functionality faster and better than we've done in the past, so expect more new features from Twitter to show up in the API. With this new foundation, developers can expect to see:
- A cleaner API that's easier to use, with new developer features like the ability to specify which fields get returned, or retrieve more Tweets from a conversation within the same response
- Some of the most requested features that were missing from the API, including conversation threading, poll results in Tweets, pinned Tweets on profiles, spam filtering, and a more powerful stream filtering and search query language
New access levels - With the new Twitter API, we're building multiple access levels to make it easier for developers to get started and to grow what they build.
In the past, the Twitter API was separated into three different platforms and experiences: standard (free), premium (self-serve paid), and enterprise (custom paid). As a developer's needs expanded, it required tedious migration to each API. In the future, all developers — from academic researchers to makers to businesses — will have options to get elevated access and grow on the same API.
New product tracks - We love the incredible diversity of developers who use our API. Our plan is to introduce new, distinct product tracks to better serve different groups of developers and provide them with the right experience and support for their needs, along with a range of relevant access levels, and appropriate pricing (where applicable). To start, these product tracks will include:
- Standard: Available first, this will be the default product track for most developers, including those just getting started, building something for fun, for a good cause, and to learn or teach. We plan to add Elevated access to this track in the future.
- Academic Research: Academic researchers use the Twitter API to understand what's happening in the public conversation. In the future, qualified academic researchers will have a way to get Elevated or Custom access to relevant endpoints. We're also providing tools and guides to make it easier to conduct academic research with the Twitter API.
- Business: Developers build businesses on the Twitter API, including our Twitter Official Partners and enterprise data customers. We love that their products help other people and businesses better understand and engage with the conversation on Twitter. In the future, this track will include Elevated or Custom access to relevant endpoints.
A new developer portal - To help you get the most out of the new API, we've also designed and built a new developer portal. This is where you can get started with our new onboarding wizard, manage Apps, understand your API usage and limits, access our new support center, find documentation, and more to come in the future. With the new Twitter API, we hope to enable more:
- Academic research that helps the world better understand our shared perspectives on important topics such as: people's attitudes about COVID-19, the social impact of floods and climate change, or the prevalence of hateful speech and how to address it.
- Tools that help make Twitter better for the people who use it, like: BlockParty, TweetDelete, and Tokimeki Unfollow.
- Bots that share information and make conversations more fun, like the: HAM: Drawings bot, House of Lords Hansard bot, and Emoji Mashup bot.
- Businesses like Black Swan, Spiketrap, and Social Market Analytics who serve innovative use cases such as social prediction of future product trends, AI-powered consumer insights and FinTech market intelligence.
- Twitter Official Partners such as Brandwatch, Sprinklr and Sprout Social who help brands better understand and engage with their industry and customers.
- And much more, including new things we haven't thought of yet, but that we know you will...

So, what's launching first?

One of the most common reasons developers use the Twitter API is to listen to and analyze the conversation happening on Twitter.
So, soon we'll release Early Access to an initial set of new endpoints for developers to:
- Stream Tweets in real-time or analyze past conversations to help the world understand the public conversations happening on Twitter, or help businesses discover customer insights from the conversation
- Measure Tweet performance to help people and businesses get better at using Twitter
- Listen for important events to help people learn about new things that matter to them on Twitter
- And a whole lot more, with new options to explore Tweets from any account
All API features we're releasing first will be available in our new – always free – Basic access level. For most developers, Basic access will provide everything you need to get started and build something awesome. Eventually, the new API will fully replace the v1.1 standard, premium, and enterprise APIs. Before that can happen though, we have more to build, which is why we are referring to this phase as Early Access. It's a chance to get started now and get ahead. Unlike Twitter Developer Labs, which hosts our experiments, everything in the first release will be fully supported and ready for you to use in production. To see the full list of API functionality and endpoints that are included in today's release, check out our developer forum post. You can get started on the new API by creating a new Project and App today in the new developer portal. You can also connect your new Project to existing Apps, if you would like. To get started with Early Access to the new Twitter API, visit the new developer portal. If you don't yet have a developer account, apply to get started.

What's Next?

This is just the beginning. We're sharing our public roadmap to keep you updated on our vision for the API, along with options to share feedback so that we can continue to learn from you along the way and so you can plan for what's to come. On deck: full support to hide (and unhide) replies, and free Elevated access for academic researchers. Developers like you push us and inspire us every day. Your creativity and work with our API make Twitter better for people & businesses, and make the world a better place. Thanks for your partnership on the journey ahead.
Ian Cairns, @cairns, Head of Product, Twitter Developer Platform
Priyanka Shetty, @_priyankashetty, Product Manager, Twitter Developer Platform
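To make the "specify which fields get returned" capability described earlier in this post concrete, here is a small sketch against the v2 recent search endpoint. The bearer token and query are placeholders, and the field lists are merely examples of what can be requested.

import os
import requests

headers = {"Authorization": f"Bearer {os.environ['TWITTER_BEARER_TOKEN']}"}

params = {
    "query": "#DogsofTwitter has:images -is:retweet",   # illustrative filter
    "tweet.fields": "created_at,public_metrics,conversation_id",
    "expansions": "author_id",
    "user.fields": "username,verified",
    "max_results": 10,
}

resp = requests.get("https://api.twitter.com/2/tweets/search/recent",
                    headers=headers, params=params)
resp.raise_for_status()
payload = resp.json()

# Join expanded author objects back onto the Tweets they belong to.
users = {u["id"]: u["username"] for u in payload.get("includes", {}).get("users", [])}
for tweet in payload.get("data", []):
    print(users.get(tweet["author_id"]), tweet["created_at"], tweet["text"][:60])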
blog-twitter-com-6356 ---- Enabling the future of academic research with the Twitter API
blog-twitter-com-7472 ---- Rebuilding Twitter's public API
Rebuilding Twitter's public API
By Jenny Qiu Hylbert and Steve Cosenza, Wednesday, 12 August 2020

Today we launched the new Twitter API v2. Our first launch of a public API was in 2006 and shortly after, we began building API access to new features with the intention of opening our platform and inviting developers to build the future with us. Six years after the first launch, in 2012, we released the v1.1 API that introduced new requirements and stricter policies needed to curb abuse and protect the Twitter platform. Today's launch marks the most significant rebuild of our public API since 2012. It's built to deliver new features, faster, and to better serve the diverse set of developers who build on Twitter. It's also built to incorporate many of our experiences and lessons learned over the past fourteen years of operating public APIs. We'd like to show you how we thought about designing and building this from the ground up.

Establishing goals

The public Twitter API v1.1 endpoints are currently implemented by a large set of HTTP microservices, a decision we made as part of our re-architecture from a Ruby monolith. While the microservices approach enabled increased development speeds at first, it also resulted in a scattered and disjointed Twitter API as independent teams designed and built endpoints for their specific use cases with little coordination. For the new Twitter API v2, we knew we needed a new architecture that could more easily scale with the large number of API endpoints to serve our planned and new functionality going forward. As part of this design process, we drafted the following goals:
- Abstraction: Enable Twitter engineers building the Twitter API to focus on querying, mutating, or subscribing to only the data they care about, without needing to worry about the infrastructure and operations of running a production HTTP service.
- Ownership: Contain core and common API logic in a single place, owned by a single team.
- Consistency: Provide a consistent experience for external developers by relying on our API design principles to reinforce uniformity.
With the above goals in mind, we've built a common platform to host all of our new Twitter API endpoints. To operate this multi-tenant platform at scale, we had to minimize any endpoint-specific business logic, otherwise the system would quickly become unmaintainable. A powerful data access layer that emphasized declarative queries over imperative code was crucial to this strategy.

Unified data access layer

Around this same time, representatives from teams building Twitter for web, iOS, and Android began migrating from individual internal REST endpoints to a unified GraphQL service. Our team followed suit as we realized that the data querying needs of the public Twitter API are similar to the needs of our Twitter mobile and desktop clients. Put another way, Twitter clients query for data and render UIs, while the public Twitter APIs query for data and render JSON responses.
A bonus from consolidating our data querying through a single interface is that the Twitter API can now easily deliver new Twitter features by querying for GraphQL data already being directly used by our consumer apps. When considering exposing GraphQL directly to external developers, we opted for a design most familiar to a broad set of developers in the form of a REST API. This model also makes it easier to protect against unexpected query complexity so we can ensure a reliable service for all developers.

Componentizing the API platform

With the platform approach decided, we needed a way for different teams to build and contribute to the overall API. To facilitate this, we designed the following three components:
- Routes to represent the external HTTP endpoints, e.g. /2/tweets
- Selections to represent the ways to find resources, e.g. "Tweet lookup by id". To implement a selection, create a GraphQL query which returns one or more resources
- Resources to represent the core resources in our system, e.g. Tweets and users. To implement a resource, create a directory for every resource field which contains a GraphQL query to fetch the data for that specific field, e.g. Tweet/text
Using these three components to construct a directory structure, teams can independently own and contribute different parts of the overall Twitter API while still returning uniform representations in responses. For example, here's a subset of our selections and resources directories:

├── selections
│   └── tweet
│       ├── id
│       │   ├── Selection.scala
│       │   ├── selection.graphql
│       ├── multi_ids
│       │   ├── Selection.scala
│       │   ├── selection.graphql
│       ├── search
│       │   ├── Selection.scala
│       │   ├── selection.graphql
├── resources
│   ├── tweet
│   │   ├── id
│   │   │   ├── Field.scala
│   │   │   └── fragment.graphql
│   │   ├── author_id
│   │   │   ├── Field.scala
│   │   │   └── fragment.graphql
│   │   ├── text
│   │   │   ├── Field.scala
│   │   │   └── fragment.graphql

GraphQL plays a key role in this architecture. We can utilize GraphQL fragments as the unit of our rendering reuse (in a similar way to React Relay). For example, the GraphQL queries below all use a "platform_tweet" fragment, which is a fragment created by combining all the customer-requested fields in the /resources/tweet directory:

https://api.twitter.com/2/tweets/20
Selection: /selections/tweet/id/selection.graphql

query TweetById($id: String!) {
  tweet_by_rest_id(rest_id: $id) {
    ...platform_tweet
  }
}

https://api.twitter.com/2/tweets?ids=20,21
Selection: /selections/tweet/multi_ids/selection.graphql

query TweetsByIds($ids: [String!]!) {
  tweets_by_rest_ids(rest_ids: $ids) {
    ...platform_tweet
  }
}

https://api.twitter.com/2/tweets/search/recent?query=%23DogsofTwitter
Selection: /selections/tweet/search/selection.graphql

query TweetsBySearch($query: String!, $start_time: String, $end_time: String, ...) {
  search_query(query: $query) {
    matched_tweets(from_date: $start_time, to_date: $end_time, ...) {
      tweets {
        ...platform_tweet
      }
      next_token
    }
  }
}

Putting it all together

At this point in the story, you may be curious where endpoint-specific business logic actually lives.
We offer two options:

When an endpoint's business logic can be represented in StratoQL (the language used by Twitter's data catalog system, known as Strato, which powers the GraphQL schema), we only need to write a function in StratoQL, without requiring a separate service.
Otherwise, the business logic is contained in a Finatra Thrift microservice written in Scala, exposed by a Thrift Strato Column.

With the platform providing the common needs for all HTTP endpoints, new routes and resources can be released without spinning up any new HTTP services. We can ensure uniformity through the platform by standardizing how a Tweet is rendered or how a set of Tweets is paginated, regardless of the actual endpoint used for retrieval. Additionally, if an endpoint can be constructed from queries for data that already exists in the GraphQL schema, or if its logic can be implemented in StratoQL, then teams can not only bypass almost all "service owning" responsibilities but also deliver faster access to new Twitter features!

One aspect of the platform that has been top of mind since the beginning is the importance of serving the health of the public conversation and protecting the personal data of people using Twitter. The new platform takes a strong stance on where related business logic should live by pushing all security- and privacy-related logic to backend services. The result is that the API layer is agnostic to this logic, and privacy decisions are applied uniformly across all of the Twitter clients and the API. By isolating where these decisions are made, we can limit inconsistent data exposure, so that what you see in the iOS app will be the same as what you get from programmatic querying through the API.

This is the start of our journey and our work is far from done. We have many more existing v1.1 endpoints to migrate and improve, and entirely new public endpoints to build. We know developers want the ability to interact with all of the different features in the Twitter app, and we're excited for you to see how we've leveraged this platform approach to do just that. We can't wait to bring more features to the new Twitter API! To see more about our plans, check out our Guide to the future of the new API.

Jenny Qiu Hylbert (@jqiu), Senior Engineering Manager
Steve Cosenza (@scosenza), Senior Staff Engineer

Tags: API, microservices, infrastructure
blog-vlib-mpg-de-7011 ---- Max Planck vLib News

MPG/SFX server maintenance, Tuesday 01 December, 5-6 pm
The database of the MPG/SFX server will undergo scheduled maintenance. The downtime will start at 5 pm. Services are expected to be back after 30 minutes. We apologize for any inconvenience.

How to get Elsevier articles after December 31, 2018
The Max Planck Digital Library has been mandated to discontinue their Elsevier subscription when the current agreement expires on December 31, 2018. Read more about the background in the full press release. Nevertheless, most journal articles published until that date will remain available, due to the rights stipulated in the MPG contracts to date. To … Continue reading How to get Elsevier articles after December 31, 2018 →

Aleph multipool search: parallel searching of MPG library catalogs
Update, 07.12.2018: The multipool search is now also available as a web interface. The multipool expert mode in the Aleph cataloging client is used for fast searching across several databases at once. The databases can either reside directly on the Aleph server or be connected as external resources via the z39.50 protocol. In addition to the local libraries, the MPI library catalog in the GBV is … Continue reading Aleph multipool search: parallel searching of MPG library catalogs →

Goodbye vLib! Shutdown after October 31, 2018
In 2002 the Max Planck virtual Library (vLib) was launched, with the idea of making all information resources relevant for Max Planck users simultaneously searchable under a common user interface. Since then, the vLib project partners from the Max Planck libraries, information retrieval services groups, the GWDG and the MPDL invested much time and effort … Continue reading Goodbye vLib! Shutdown after October 31, 2018 →

HTTPS only for MPG/SFX and MPG.eBooks
As of next week, all http requests to the MPG/SFX link resolver will be redirected to a corresponding https request. The Max Planck Society electronic Book Index is scheduled to be switched to https-only access the week after, starting on November 27, 2017. Regular web browser use of the above services should not be … Continue reading HTTPS only for MPG/SFX and MPG.eBooks →

HTTPS enabled for MPG/SFX
The MPG/SFX link resolver is now alternatively accessible via the https protocol. The secure base URL of the productive MPG/SFX instance is: https://sfx.mpg.de/sfx_local. HTTPS support enables secure third-party sites to load or to embed content from MPG/SFX without causing mixed content errors.
Please feel free to update your applications or your links to the MPG/SFX … Continue reading HTTPS enabled for MPG/SFX →

Citation Trails in Primo Central Index (PCI)
The May 2016 release brought an interesting functionality to the Primo Central Index (PCI) … Continue reading Citation Trails in Primo Central Index (PCI) →

MPG/SFX server maintenance, Wednesday 20 April, 8-9 am
The MPG/SFX server updates to a new database (MariaDB) on Wednesday morning. The downtime will begin at 8 am and is scheduled to last until 9 am. We apologize for any inconvenience.

ProQuest Illustrata databases discontinued
Last year, the information provider ProQuest decided to discontinue its "Illustrata Technology" and "Illustrata Natural Science" databases. Unfortunately, this represents a preliminary end to ProQuest's long-standing investment in deep indexing content. In a corresponding support article ProQuest states that there "[…] will be no loss of full text and full text + graphics images because … Continue reading ProQuest Illustrata databases discontinued →

MPG.ReNa via https only
The MPG Resource Navigator MPG.ReNa is now accessible via https only. If in doubt, please double-check any routines and applications loading or embedding content via MPG.ReNa APIs. Please note that you may need to re-subscribe to resource feeds, or update URLs of RSS widgets in your Content Management System, etc. We apologize for any inconvenience.

blog-vlib-mpg-de-8647 ---- Max Planck vLib News

MPG/SFX server maintenance, Tuesday 01 December, 5-6 pm (30. November 2020, eia)
The database of the MPG/SFX server will undergo scheduled maintenance. The downtime will start at 5 pm. Services are expected to be back after 30 minutes. We apologize for any inconvenience.

How to get Elsevier articles after December 31, 2018 (20. December 2018, inga)
The Max Planck Digital Library has been mandated to discontinue their Elsevier subscription when the current agreement expires on December 31, 2018. Read more about the background in the full press release. Nevertheless, most journal articles published until that date will remain available, due to the rights stipulated in the MPG contracts to date. To fulfill the content needs of Max Planck researchers when Elsevier shuts off access to recent content at the beginning of January, the Max Planck libraries and MPDL have coordinated the setup of a common document order service. This will be integrated into the MPG/SFX interface and can be addressed as follows:

Step 1: Search in ScienceDirect, start in any other database, or enter the article details into the MPG/SFX citation linker.
Step 2: Click the MPG/SFX button. Note: In ScienceDirect, it appears in the "Get Access" section at the top of those article pages for which the full text is no longer available.
Step 3: Check the options in the service menu presented to you, e.g. freely available full text versions (if available).
Step 4: To order the article via your local library or the MPDL, select the corresponding link, e.g. "Request document via your local library". Please note that the wording might differ slightly according to your location.
Step 5: Add your personal details to the order form in the next screen and submit your document request.

The team in your local library or at the MPDL will get back to you as soon as possible. Please feel free to contact us if you face any problem or want to raise a question.
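For readers who want to reach the MPG/SFX citation linker mentioned in Step 1 programmatically rather than from a database, an OpenURL-style request can be assembled against the base URL given elsewhere on this blog. The Ruby sketch below is illustrative only: the citation values are made up, and the exact parameter set your SFX instance expects may differ.

require "uri"

# Illustrative article metadata (not a real citation from the post)
params = {
  "url_ver"     => "Z39.88-2004",                  # OpenURL 1.0 KEV format
  "rft_val_fmt" => "info:ofi/fmt:kev:mtx:journal", # journal article genre
  "rft.atitle"  => "An example article title",
  "rft.jtitle"  => "Journal of Examples",
  "rft.issn"    => "1234-5678",
  "rft.volume"  => "42",
  "rft.spage"   => "101",
  "rft.date"    => "2018"
}

base    = "https://sfx.mpg.de/sfx_local" # MPG/SFX base URL from this blog
openurl = "#{base}?#{URI.encode_www_form(params)}"
puts openurl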
Update, 06.06.2019: Check out our new flyer "How to deal with no subscription DEAL" prepared in cooperation with Max Planck's PhDnet.

Aleph multipool search: parallel searching of MPG library catalogs (2. November 2018, inga)
Update, 07.12.2018: The multipool search is now also available as a web interface.
The multipool expert mode in the Aleph cataloging client is used for fast searching across several databases at once. The databases can either reside directly on the Aleph server or be connected as external resources via the z39.50 protocol. In addition to the local libraries, the MPI library catalog in the GBV comes preconfigured on the Aleph server. The multipool function can be found in the search area of the Aleph cataloging client (2nd tab). Below the area for selecting the relevant databases, you can enter the search query. Notes on the command language used can be found in the Aleph help. After submitting the search query, the result list with the databases and the respective number of hits is displayed in the lower frame. A double click is enough to open an individual result set. For shared catalogs, such as the MPI library catalog in the GBV, the full record view indicates the holding library. To set up the multipool search, the configuration files used by the local Aleph client (library.ini and searbase.dat) have to be extended. On request, we are happy to share the files we use. Further information can also be found in the Aleph wiki: download and installation of the Aleph client; setting up additional Z39.50 connections.

Goodbye vLib! Shutdown after October 31, 2018 (24. October 2018, inga)
In 2002 the Max Planck virtual Library (vLib) was launched, with the idea of making all information resources relevant for Max Planck users simultaneously searchable under a common user interface. Since then, the vLib project partners from the Max Planck libraries, information retrieval services groups, the GWDG and the MPDL invested much time and effort to integrate various library catalogs, reference databases, full-text collections and other information resources into MetaLib, a federated search system developed by Ex Libris. With the rise of large search engines and discovery tools in recent years, usage slowly shifted away and the metasearch technology applied was no longer fulfilling users' expectations. Therefore, the termination of most vLib services was announced two years ago and now we are approaching the final shutdown: the vLib portal will cease to operate after the 31st of October 2018. As you know, there are many alternatives to the former vLib services: MPG.ReNa will remain available for browsing and discovering electronic resources available to Max Planck users. In addition, we'll post some information on how to cross search Max Planck library catalogs soon. Let us take the opportunity to send a big "Thank you!" to all vLib users and collaborators within and outside the Max Planck Society. It always was and will continue to be a pleasure to work with and for you. Goodbye!… and please feel free to contact us in case of any further questions.

HTTPS only for MPG/SFX and MPG.eBooks (17. November 2017, eia)
As of next week, all http requests to the MPG/SFX link resolver will be redirected to a corresponding https request.
The Max Planck Society electronic Book Index is scheduled to be switched to https-only access the week after, starting on November 27, 2017. Regular web browser use of the above services should not be affected. Please thoroughly test any solutions that integrate these services via their web APIs. Please consider re-subscribing to MPG.eBooks RSS feeds.

HTTPS enabled for MPG/SFX (27. June 2016, inga)
The MPG/SFX link resolver is now alternatively accessible via the https protocol. The secure base URL of the productive MPG/SFX instance is: https://sfx.mpg.de/sfx_local. HTTPS support enables secure third-party sites to load or to embed content from MPG/SFX without causing mixed content errors. Please feel free to update your applications or your links to the MPG/SFX server.

Citation Trails in Primo Central Index (PCI) (2. June 2016, inga)
The May 2016 release brought an interesting functionality to the Primo Central Index (PCI): The new "Citation Trail" capability enables PCI users to discover relevant materials by providing cited and citing publications for selected article records. At this time the only data source for the citation trail feature is CrossRef, thus the number of citing articles will be below the "Cited by" counts in other sources like Scopus and Web of Science. Further information: Short video demonstrating the citation trail feature (by Ex Libris). Detailed feature description (by Ex Libris).

MPG/SFX server maintenance, Wednesday 20 April, 8-9 am (20. April 2016, inga)
The MPG/SFX server updates to a new database (MariaDB) on Wednesday morning. The downtime will begin at 8 am and is scheduled to last until 9 am. We apologize for any inconvenience.

ProQuest Illustrata databases discontinued (15. April 2016, inga)
Last year, the information provider ProQuest decided to discontinue its "Illustrata Technology" and "Illustrata Natural Science" databases. Unfortunately, this represents a preliminary end to ProQuest's long-standing investment in deep indexing content. In a corresponding support article ProQuest states that there "[…] will be no loss of full text and full text + graphics images because of the removal of Deep Indexed content". In addition, they promise to "[…] develop an even better way for researchers to discover images, figures, tables, and other relevant visual materials related to their research tasks". The MPG.ReNa records for ProQuest Illustrata: Technology and ProQuest Illustrata: Natural Science have been marked as "terminating" and will be deactivated soon.

MPG.ReNa via https only (30. March 2016, eia)
The MPG Resource Navigator MPG.ReNa is now accessible via https only. If in doubt, please double-check any routines and applications loading or embedding content via MPG.ReNa APIs. Please note that you may need to re-subscribe to resource feeds, or update URLs of RSS widgets in your Content Management System, etc. We apologize for any inconvenience.

In short: In this blog you'll find updates on information resources, vendor platforms and access systems provided by the Max Planck Digital Library. Use MPG.ReNa to search and browse through the journal collections, eBook collections and databases available to MPG researchers.

New Resources in MPG.ReNa: Australian Education Index (ProQuest), 25. April 2021; RiffReporter, 25. March 2021; Journal on Excellence in College Teaching, 2.
March 2021; Persian E-Books Miras Maktoob (Brill), 16. February 2021; Translated CIA Documents with Global Perspectives (NewsBank), 14. February 2021.

Related Blogs: FHI library, MPIs Stuttgart Library, PubMan blog.

carpentries-org-9661 ---- The Carpentries

We teach foundational coding and data science skills to researchers worldwide.

What we do
The Carpentries teaches foundational coding and data science skills to researchers worldwide. Software Carpentry, Data Carpentry, and Library Carpentry workshops are based on our lessons. Workshop hosts, Instructors, and learners must be prepared to follow our Code of Conduct. More ›

Who we are
Our diverse, global community includes Instructors, helpers, Trainers, Maintainers, Mentors, community champions, member organisations, supporters, workshop organisers, staff and a whole lot more. More ›

Get involved
See all the ways you can engage with The Carpentries. Get information about upcoming events such as workshops, meetups, and discussions from our community calendar, or from our twice-monthly newsletter, Carpentry Clippings. Follow us on Twitter, Facebook, and Slack. More ›

Subscribe to our newsletter "Carpentry Clippings": events, community updates, and teaching tips in your inbox, twice a month.

New Blog Posts
Core Team 'Acc-athon' to Add Alt Text Across Carpentries Curricula: Carpentries Core Team gets a start on alt text updates for Carpentries lessons. Read More ›
More Posts: Incubator Lesson Spotlight: Python for Business; The Carpentries Strategic Plan: One Year Update; Foundations of Astronomical Data Science - Call for Beta Pilot Applications. More ›

Resources for Online Workshops
Official Carpentries' Recommendations: This page holds an official set of recommendations by The Carpentries to help you organise and run online Carpentries workshops. The page is updated periodically as we continue to receive input and feedback from our community. Go to Page.
Community-Created Resources: This resource is a section in our Handbook containing an evolving list of all community-created resources and conversations around teaching Carpentries workshops online. The section is updated periodically to include newer resources and emerging conversations on the subject. Go to Page.

Upcoming Carpentries Workshops
Click on an individual event to learn more about that event, including contact information and registration instructions.
Brac University (online) ** Instructors: Annajiat Alim Rasel, Benson Muite Feb 14 - May 21, 2021 Brac University (online) ** Instructors: Annajiat Alim Rasel Feb 20 - May 22, 2021 University of Edinburgh ** Instructors: Fran Baseby, Chris Wood, Charlotte Desvages, Alex Casper Cline Helpers: Jen Harris, Marco Crotti, Robert Smith, Graham Blyth, Matthew Fellion, Francine Millard Mar 4 - May 13, 2021 ENES unidad León, Licenciatura en Ciencias Agrogenómicas ** Instructors: Tania Vanessa Arellano Fernández, J Abraham Avelar-Rivas Helpers: Maria Cambero, Nelly Sélem, Aarón Jaime Mar 20 - May 8, 2021 King's College London Instructors: Rohit Goswami, Sanjay Fuloria, Annajiat Alim Rasel Helpers: Fursham Hamid, James Cain, Kai Lim Apr 14 - Apr 28, 2021 UCLA (online) Instructors: Jamie Jamison, Scott Gruber, Kristian Allen, Elizabeth McAulay Helpers: Tim Dennis, Geno Sanchez, Leigh Phan, Zhiyuan Yao, Dave George Apr 16 - May 7, 2021 Institute for Modeling Collaboration and Innovation @ The University of Idaho (online) Instructors: Erich Seamon Helpers: Travis Seaborn Apr 20 - Apr 29, 2021 Swansea University (online) Instructors: Ed Bennett, Vladimir Khodygo, Ben Thorpe, Michele Mesiti Helpers: Tom Pritchard Apr 26 - Apr 30, 2021 United States Department of Agriculture (USDA) Instructors: Meghan Sposato, Aditya Bandla, Adrienne Traxler, Kristina Riemer Apr 27 - May 4, 2021 Workshop on Programming using Python ** Instructors: Bezaye Tesfaye Belayneh, Christian Meeßen, Hannes Fuchs, Maximilian Dolling Helpers: Stefan Lüdtke Apr 27 - Apr 28, 2021 United States Geological Survey Instructors: Anthony Valente, Ainhoa Oliden Sanchez, Ian Carroll, Melissa Fabros Apr 27 - Apr 28, 2021 Emerging Public Leaders (online) Instructors: Benson Muite Helpers: Selorm Tamakloe May 1 - May 29, 2021 UW-Madison (online) ** Instructors: Trisha Adamus, Clare Michaud, Tobin Magle, Sailendharan Sudakaran Helpers: Karl Broman, Erin Jonaitis, Casey Schacher, Heather Shimon, Sarah Stevens May 3 - May 10, 2021 Queensland Cyber Infrastructure Foundation ** Instructors: David Green, Paula Andrea Martinez Helpers: Marlies Hankel, Betsy Alpert May 5 - May 5, 2021 University of California, Santa Barbara (online) Instructors: Torin White, Camila Vargas, Greg Janée Helpers: Kristi Liu, Renata Curty May 6 - May 7, 2021 National Oceanic and Atmospheric Administration Instructors: Callum Rollo, D. 
Sarah Stamps, Jonathan Guyer, Annajiat Alim Rasel May 10 - May 13, 2021 King's College London (online) ** Instructors: Stefania Marcotti, Flavia Flaviani, Alessia Visconti Helpers: Fursham Hamid, Alejandro Santana-Bonilla May 12 - May 26, 2021 UW-Madison (online) Instructors: Trisha Adamus, Clare Michaud, Sarah Stevens, Erwin Lares Helpers: Karl Broman, Casey Schacher, Sarah Stevens, Heather Shimon May 12 - May 19, 2021 Netherlands eScience Center (online) Instructors: Pablo Rodríguez-Sánchez, Alessio Sclocco Helpers: Barbara Vreede, Lieke de Boer May 17 - May 20, 2021 Openscapes Instructors: Jake Szamosi, Bia Villas Boas, Makhan Virdi, Negin Valizadegan May 17 - May 19, 2021 NHS Library and Health Services Instructors: Jez Cope, Fran Baseby, Annajiat Alim Rasel May 18 - May 19, 2021 Queensland Cyber Infrastructure Foundation ** Instructors: Jason Bell, Dag Evensberget, Kasia Koziara Helpers: David Green, Marlies Hankel, Betsy Alpert, Stéphane Guillou, Shern Tee May 19 - May 20, 2021 AUC Data Science Initiative Instructors: Monah Abou Alezz, Muhammad Zohaib Anwar, Yaqing Xu, Jason Williams May 20 - May 25, 2021 Joint Genome Institute/UC Merced ** Instructors: Rhondene Wint Jun 1 - Jun 4, 2021 King's College London (online) ** Instructors: Alessia Visconti, Stefania Marcotti, Flavia Flaviani Helpers: Fursham Hamid, Alejandro Santana-Bonilla Jun 9 - Jun 16, 2021 NWU, South Africa Instructors: Sebastian Mosidi, Martin Dreyer Aug 10 - Aug 13, 2021 ** Workshops marked with asterisks are based on curriculum from The Carpentries lesson programs but may not follow our standard workshop format. Workshops with a globe icon are being held online. The corresponding flag notes the country where the host organization is based. Click here to see our past workshops.  About The Carpentries The Carpentries is a fiscally sponsored project of Community Initiatives, a registered 501(c)3 non-profit organisation based in California, USA. We are a global community teaching foundational computational and data science skills to researchers in academia, industry and government. More › Services Contact RSS Atom sitemap.xml Links Our Code of Conduct Our Community Handbook Our Privacy Policy Our Annual Reports Software Carpentry website Data Carpentry website Library Carpentry website casrai-org-1267 ---- CRediT - Contributor Roles Taxonomy Skip to content CASRAI Less burden. More research. Menu About Blog Resources Supporters More CRediT – Contributor Roles Taxonomy CRediT (Contributor Roles Taxonomy) is high-level taxonomy, including 14 roles, that can be used to represent the roles typically played by contributors to scientific scholarly output. The roles describe each contributor’s specific contribution to the scholarly output. 14 Contributor Roles Conceptualization Data curation Formal Analysis Funding acquisition Investigation Methodology Project administration Resources Software Supervision Validation Visualization Writing – original draft Writing – review & editing Contributor Roles Defined Conceptualization – Ideas; formulation or evolution of overarching research goals and aims. Data curation – Management activities to annotate (produce metadata), scrub data and maintain research data (including software code, where it is necessary for interpreting the data itself) for initial use and later re-use. Formal analysis – Application of statistical, mathematical, computational, or other formal techniques to analyze or synthesize study data. 
Funding acquisition ​- Acquisition of the financial support for the project leading to this publication. Investigation – ​Conducting a research and investigation process, specifically performing the experiments, or data/evidence collection. Methodology – Development or design of methodology; creation of models. Project administration – Management and coordination responsibility for the research activity planning and execution. Resources – Provision of study materials, reagents, materials, patients, laboratory samples, animals, instrumentation, computing resources, or other analysis tools. Software – Programming, software development; designing computer programs; implementation of the computer code and supporting algorithms; testing of existing code components. Supervision – Oversight and leadership responsibility for the research activity planning and execution, including mentorship external to the core team. Validation – Verification, whether as a part of the activity or separate, of the overall replication/reproducibility of results/experiments and other research outputs. Visualization – Preparation, creation and/or presentation of the published work, specifically visualization/data presentation. Writing – original draft – ​Preparation, creation and/or presentation of the published work, specifically writing the initial draft (including substantive translation). Writing – review & editing – Preparation, creation and/or presentation of the published work by those from the original research group, specifically critical review, commentary or revision – including pre- or post-publication stages. Background CRediT grew from a practical realization that bibliographic conventions for describing and listing authors on scholarly outputs are increasingly outdated and fail to represent the range of contributions that researchers make to published output. Furthermore, there is growing interest among researchers, funding agencies, academic institutions, editors, and publishers in increasing both the transparency and accessibility of research contributions. Most publishers require author and contribution disclosure statements upon article submission – some in structured form, some in free-text form – at the same time that funders are developing more scientifically rigorous ways to track the outputs and impact of their research investments. In mid-2012 the Wellcome Trust and Harvard University co-hosted a workshop to bring together members of the academic, publishing, and funder communities interested in exploring alternative contributorship and attribution models. Following the workshop (see workshop report), and working initially with a group of mainly biomedical journal editors (and members of the ICMJE a pilot project was established to develop a controlled vocabulary of contributor roles (taxonomy) that could be used to describe the typical range of ‘contributions’ to scholarly published output for biomedical and science more broadly. The aim was to develop a taxonomy that was both practical and easy to understand while minimizing the potential for misuse. A draft taxonomy was tested with a sample of recent corresponding authors publishing across science and was relatively well received. The outcomes of the pilot test are described in Nature commentary (April 2014). 
Benefits Since 2014, the contributor taxonomy – otherwise known as CRediT (Contributor Roles Taxonomy) has been widely adopted across a range of publishers to improve accessibility and visibility of the range of contribution to published research outputs, bringing a number of important and practical benefits to the research ecosystem more broadly, including: Helping to reduce the potential for author disputes. Supporting adherence to authorship/contributorship processes and policies. Enabling visibility and recognition of the different contributions of researchers, particularly in multi-authored works – across all aspects of the research being reported (including data curation, statistical analysis, etc.) Support identification of peer reviewers and specific expertise. ​Support grant making by enabling funders to more easily identify those responsible for specific research products, developments or breakthroughs. Improving the ability to track the outputs and contributions of individual research specialists and grant recipients. Easy identification of potential collaborators and opportunities for research networking. Further developments in data management and nano-publication. ​Inform ‘science of science’ (‘meta-research) to help enhance scientific efficacy and effectiveness. ​Enable new indicators of research value, use and re-use, credit and attribution. Adopters This list is constantly evolving and will be frequently updated. To share information about a CRediT adoption, please email: [credit] at [casrai] dot [org] ​Publishers American Association of Petroleum Geologists BMJ British Psychological Society Cell Press “CPC” Business Perspectives Dartmouth Journal Services De Gruyter Open Duke University Press eLife Elsevier Evidence Based Communications F1000 Research Geological Society of London Health & Medical Publishing Group International Centre of Insect Physiology and Ecology The Journal of Bone & Joint Surgery KAMJE Press Lippincott Williams & Wilkins MA Healthcare MDPI MIT Press Oman Medical Specialty Board Oxford University Press Public Library of Science (Plos) SAE International SAGE Publishing ScholarOne SLACK Incorporated Springer Springer Publishing Company Virtus Interpress Wiley VCH Wolters Kluwer Institutions University of Glasgow Integrators Allen Press/ Peer Track Aries Systems/ Editorial Manager Clarivate Analytics/ ScholarOne Coko Foundation/ PubSweet OpenConf River Valley/ ReView eJournalPress Rescognito ​Worktribe Publishing Outlets Gates Open Research HRB Open Research Wellcome Open Research How to Implement CRediT For academics Just begin allocating the terms appropriately to your contributors within research outputs. Advocate that your institution and any publications you’re submitting to acknowledge and adopt the taxonomy. For Publishers CRediT adoption can be achieved via a manual workflow outside of Submission and Peer Review systems, or through using a system with an existing CRediT integration. The roles given in the above taxonomy include, but are not limited to, traditional authorship roles. The roles are not intended to define what constitutes authorship, but instead to capture all the work that allows scholarly publications to be produced. 
Recommendations for applying the CRediT taxonomy are: List all Contributions – All contributions should be listed, whether from those listed as authors or individuals named in acknowledgements; Multiple Roles Possible – Individual contributors can be assigned multiple roles, and a given role can be assigned to multiple contributors; Degree of Contribution Optional – Where multiple individuals serve in the same role, the degree of contribution can optionally be specified as ‘lead’, ‘equal’, or ‘supporting’; Shared Responsibility – Corresponding authors should assume responsibility for role assignment, and all contributors should be given the opportunity to review and confirm assigned roles; Make CRediT Machine Readable – CRediT tagged contributions should be coded in JATS xml v1.2 ​The taxonomy has been refined by Consortia Advancing Standards in Research Administration (CASRAI) and National Information Standards Organization (NISO). It is in adoption by Cell Press, PLOS and many other publishers, and has been integrated into some submission and peer review systems including Aries’ Editorial Manager, and River Valley’s ReView. It will be integrated into Coko Foundation’s xPub. For publishers to make CRediT machine readable and with full meta-data available, CRediT should be  coded in JATS xml v1.2, described via this link: https://jats4r.org/credit-taxonomy Links of Interest Resources PLOS & CRediT Cell Press Adoption Interview, Council for Science Editors’ Science Editor Aries Systems CRediT Integration FAQ Aries Systems CRediT Integration video​ Aries Systems/ JBJS CRediT Integration Case Study Articles & Publications How can we ensure visibility and diversity in research contributions? How the Contributor Role Taxonomy (CRediT) is helping the shift from authorship to contributorship Making research contributions more transparent: report of a FORCE workshop Farewell authors, hello contributors Contributorship, Not Authorship: Use CRediT to Indicate Who Did What Now is the time for a team-based approach to team science CRediT where credit is due Now is the time for a team-based approach to team science Increase transparency by adding CRediT to workflow with PubSweet Credit data generators for data reuse Report on the International Workshop on Contributorship and Scholarly Attribution (2012) Guglielmi, Giorgia, Who gets credit? Survey digs into the thorny question of authorship. Nature News. doi: 10.1038/d41586-018-05280-0 Brand, A.; Allen, L.; Altman, M.; Hlava, M.; Scott, J., Beyond Authorship: attribution, contribution, collaboration, and credit. Learned Publishing 2015, 28 (2), 151-155. Allen, L.; Brand, A.; Scott, J.; Altman, M.; Hlava, M., Credit where credit is due. Nature 2014, 508 (7496), 312-313. “Academic Recognition of Team Science: How to Optimize the Canadian Academic System,” (Canadian Academy of Health Sciences, Ottawa (ON), 2017). “Improving recognition of team science contributions in biomedical research careers,” (Academy of Medical Sciences 2016). V. Ilik, M. Conlon, G. Triggs, M. Haendel, K. L. Holmes, OpenVIVO: Transparency in Scholarship. Frontiers in Research Metrics and Analytics preprint (2018). Interview with @DKingsley, Cambridge University Meet the Chairs Liz Allen, Director of Strategic Initiatives, F1000 Research Alison McGonagle-O’Connell, Founder, O’Connell Consulting. Get Involved We have been overwhelmed by the interest in CRediT to date and are working to support adoption and encourage practical usage. 
We are also working to ensure that CRediT is tied to ORCID and included in the Crossref metadata capture. CRediT is currently managed as an informal standard at CASRAI and we are working towards formal standardisation of the taxonomy at NISO. But please do get involved by joining the community CRediT Interest Group, spreading the word, and providing feedback!

catalog-docnow-io-6355 ---- Home | DocNow Tweet Catalog

The DocNow Catalog is a collectively curated listing of Twitter datasets. Public datasets are shared as Tweet IDs, which can be hydrated back into full datasets using our Hydrator desktop application. Note: all metadata is shared under a CC0 license. Please read our Code of Conduct for more information about contributing datasets.

cbeer-info-3687 ---- blog.cbeer.info (Chris Beer, chris@cbeer.info, cbeer, _cb_)

May 25, 2016 Autoscaling AWS Elastic Beanstalk worker tier based on SQS queue length

We are deploying a Rails application (for the Hydra-in-a-Box project) to AWS Elastic Beanstalk. Elastic Beanstalk offers us easy deployment, monitoring, and simple auto-scaling with a built-in dashboard and management interface. Our application uses several potentially long-running background jobs to characterize, checksum, and create derivatives for uploaded content. Since we're deploying this application within AWS, we're also taking advantage of the Simple Queue Service (SQS), using the active-elastic-job gem to queue and run ActiveJob tasks.

Elastic Beanstalk provides settings for "Web server" and "Worker" tiers. Web servers are provisioned behind a load balancer and handle end-user requests, while Workers automatically handle background tasks (via SQS + active-elastic-job). Elastic Beanstalk provides basic autoscaling based on a variety of metrics collected from the underlying instances (CPU, network, I/O, etc). While that is sufficient for our "Web server" tier, we'd like to scale our "Worker" tier based on the number of tasks waiting to be run. Currently, though, the ability to auto-scale the worker tier based on the underlying queue depth isn't enabled through the Elastic Beanstalk interface.

However, as Beanstalk merely manages and aggregates other AWS resources, we have access to the underlying resources, including the autoscaling group for our environment. We should be able to attach a custom auto-scaling policy to that auto scaling group to scale based on additional alarms. For example, let's say we want to add additional worker nodes if there are more than 10 tasks for more than 5 minutes (and, to save money and resources, also remove worker nodes when there are no tasks available). To create the new policy, we'll need to: find the appropriate auto-scaling group by finding the auto-scaling group with the elasticbeanstalk:environment-id tag that matches the worker tier environment id; find the appropriate SQS queue for the worker tier; add auto-scaling policies that add (and remove) instances in the autoscaling group; create a new CloudWatch alarm that fires when the SQS queue depth exceeds our configured threshold (10 messages, for 5 minutes) and triggers the auto-scaling policy to add additional worker instances; and, conversely, create a new CloudWatch alarm that fires when the SQS queue depth hits 0 and triggers the auto-scaling policy to remove worker instances.
and, similarly for scaling back down. Even though there are several manual steps, they aren’t too difficult (other than discovering the various resources we’re trying to orchestrate), and using Elastic Beanstalk is still valuable for the rest of its functionality. But, we’re in the cloud, and really want to automate everything. With a little CloudFormation trickery, we can even automate creating the worker tier with the appropriate autoscaling policies. First, knowing that the CloudFormation API allows us to pass in an existing SQS queue for the worker tier, let’s create an explicit SQS queue resource for the workers: "DefaultQueue" : { "Type" : "AWS::SQS::Queue", } And wire it up to the Beanstalk application by setting the aws:elasticbeanstalk:sqsd:WorkerQueueURL (not shown: sending the worker queue to the web server tier): "WorkersConfigurationTemplate" : { "Type" : "AWS::ElasticBeanstalk::ConfigurationTemplate", "Properties" : { "ApplicationName" : { "Ref" : "AWS::StackName" }, "OptionSettings" : [ ..., { "Namespace": "aws:elasticbeanstalk:sqsd", "OptionName": "WorkerQueueURL", "Value": { "Ref" : "DefaultQueue"} } } } }, "WorkerEnvironment": { "Type": "AWS::ElasticBeanstalk::Environment", "Properties": { "ApplicationName": { "Ref" : "AWS::StackName" }, "Description": "Worker Environment", "EnvironmentName": { "Fn::Join": ["-", [{ "Ref" : "AWS::StackName"}, "workers"]] }, "TemplateName": { "Ref": "WorkersConfigurationTemplate" }, "Tier": { "Name": "Worker", "Type": "SQS/HTTP" }, "SolutionStackName" : "64bit Amazon Linux 2016.03 v2.1.2 running Ruby 2.3 (Puma)" ... } } Using our queue we can describe one of the CloudWatch::Alarm resources and start describing a scaling policy: "ScaleOutAlarm" : { "Type": "AWS::CloudWatch::Alarm", "Properties": { "MetricName": "ApproximateNumberOfMessagesVisible", "Namespace": "AWS/SQS", "Statistic": "Average", "Period": "60", "Threshold": "10", "ComparisonOperator": "GreaterThanOrEqualToThreshold", "Dimensions": [ { "Name": "QueueName", "Value": { "Fn::GetAtt" : ["DefaultQueue", "QueueName"] } } ], "EvaluationPeriods": "5", "AlarmActions": [{ "Ref" : "ScaleOutPolicy" }] } }, "ScaleOutPolicy" : { "Type": "AWS::AutoScaling::ScalingPolicy", "Properties": { "AdjustmentType": "ChangeInCapacity", "AutoScalingGroupName": ????, "ScalingAdjustment": "1", "Cooldown": "60" } }, However, to connect the policy to the auto-scaling group, we need to know the name for the autoscaling group. Unfortunately, the autoscaling group is abstracted behind the Beanstalk environment. 
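Outside of CloudFormation, you can see exactly what we are after by asking the Elastic Beanstalk API for the environment's resources. A rough Ruby sketch using the aws-sdk gem is shown here; the region and environment name are placeholders, and it simply mirrors the describeEnvironmentResources call used in the Lambda function below.

require "aws-sdk-elasticbeanstalk" # or the all-in-one aws-sdk gem

# Illustrative sketch: find the auto-scaling group behind a Beanstalk worker
# environment. "my-app-workers" is a placeholder environment name.
eb = Aws::ElasticBeanstalk::Client.new(region: "us-east-1")

resources = eb.describe_environment_resources(environment_name: "my-app-workers")
              .environment_resources

# Beanstalk worker environments are backed by a single auto-scaling group.
puts resources.auto_scaling_groups.first.name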
To gain access to it, we’ll need to create a custom resource backed by a Lambda function to extract the information from the AWS APIs: "BeanstalkStack": { "Type": "Custom::BeanstalkStack", "Properties": { "ServiceToken": { "Fn::GetAtt" : ["BeanstalkStackOutputs", "Arn"] }, "EnvironmentName": { "Ref": "WorkerEnvironment" } } }, "BeanstalkStackOutputs": { "Type": "AWS::Lambda::Function", "Properties": { "Code": { "ZipFile": { "Fn::Join": ["\n", [ "var response = require('cfn-response');", "exports.handler = function(event, context) {", " console.log('REQUEST RECEIVED:\\n', JSON.stringify(event));", " if (event.RequestType == 'Delete') {", " response.send(event, context, response.SUCCESS);", " return;", " }", " var environmentName = event.ResourceProperties.EnvironmentName;", " var responseData = {};", " if (environmentName) {", " var aws = require('aws-sdk');", " var eb = new aws.ElasticBeanstalk();", " eb.describeEnvironmentResources({EnvironmentName: environmentName}, function(err, data) {", " if (err) {", " responseData = { Error: 'describeEnvironmentResources call failed' };", " console.log(responseData.Error + ':\\n', err);", " response.send(event, context, resource.FAILED, responseData);", " } else {", " responseData = { AutoScalingGroupName: data.EnvironmentResources.AutoScalingGroups[0].Name };", " response.send(event, context, response.SUCCESS, responseData);", " }", " });", " } else {", " responseData = {Error: 'Environment name not specified'};", " console.log(responseData.Error);", " response.send(event, context, response.FAILED, responseData);", " }", "};" ]]} }, "Handler": "index.handler", "Runtime": "nodejs", "Timeout": "10", "Role": { "Fn::GetAtt" : ["LambdaExecutionRole", "Arn"] } } } With the custom resource, we can finally get access the autoscaling group name and complete the scaling policy: "ScaleOutPolicy" : { "Type": "AWS::AutoScaling::ScalingPolicy", "Properties": { "AdjustmentType": "ChangeInCapacity", "AutoScalingGroupName": { "Fn::GetAtt": [ "BeanstalkStack", "AutoScalingGroupName" ] }, "ScalingAdjustment": "1", "Cooldown": "60" } }, The complete worker tier is part of our CloudFormation stack: https://github.com/hybox/aws/blob/master/templates/worker.json Mar 8, 2015 LDPath in 3 examples At Code4Lib 2015, I gave a quick lightning talk on LDPath, a declarative domain-specific language for flatting linked data resources to a hash (e.g. for indexing to Solr). LDPath can traverse the Linked Data Cloud as easily as working with local resources and can cache remote resources for future access. The LDPath language is also (generally) implementation independent (java, ruby) and relatively easy to implement. The language also lends itself to integration within development environments (e.g. ldpath-angular-demo-app, with context-aware autocompletion and real-time responses). For me, working with the LDPath language and implementation was the first time that linked data moved from being a good idea to being a practical solution to some problems. Here is a selection from the VIAF record [1]: <> void:inDataset <../data> ; a genont:InformationResource, foaf:Document ; foaf:primaryTopic <../65687612> . <../65687612> schema:alternateName "Bittman, Mark" ; schema:birthDate "1950-02-17" ; schema:familyName "Bittman" ; schema:givenName "Mark" ; schema:name "Bittman, Mark" ; schema:sameAs , ; a schema:Person ; rdfs:seeAlso <../182434519>, <../310263569>, <../314261350>, <../314497377>, <../314513297>, <../314718264> ; foaf:isPrimaryTopicOf . 
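Before walking through the examples, it may help to see how a program in the LDPath language is actually run. The sketch below uses the Ruby implementation linked above; both the one-line field definition and the gem interface shown here are my own illustrative assumptions rather than one of the post's original examples.

require "ldpath"

# Illustrative sketch: a one-field LDPath program evaluated against a VIAF
# resource. The field name ("name") and the path are assumptions.
program = Ldpath::Program.parse <<~LDPATH
  @prefix foaf : <http://xmlns.com/foaf/0.1/> ;
  @prefix schema : <http://schema.org/> ;

  name = foaf:primaryTopic / schema:name :: xsd:string ;
LDPATH

# Evaluating against a remote URI dereferences it (and any resources the
# paths traverse) over the Linked Data Cloud, returning a field => values hash.
p program.evaluate(RDF::URI.new("http://viaf.org/viaf/152427175/"))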
We can use LDPath to extract the person’s name: So far, this is not so different from traditional approaches. But, if we look deeper in the response, we can see other resources, including books by the author. <../310263569> schema:creator <../65687612> ; schema:name "How to Cook Everything : Simple Recipes for Great Food" ; a schema:CreativeWork . We can traverse the links to include the titles in our record: LDPath also gives us the ability to write this query using a reverse property selector, e.g: books = foaf:primaryTopic / ^schema:creator[rdf:type is schema:CreativeWork] / schema:name :: xsd:string ; The resource links out to some external resources, including a link to dbpedia. Here is a selection from record in dbpedia: dbpedia-owl:abstract "Mark Bittman (born c. 1950) is an American food journalist, author, and columnist for The New York Times."@en, "Mark Bittman est un auteur et chroniqueur culinaire américain. Il a tenu une chronique hebdomadaire pour le The New York Times, appelée The Minimalist (« le minimaliste »), parue entre le 17 septembre 1997 et le 26 janvier 2011. Bittman continue d'écrire pour le New York Times Magazine, et participe à la section Opinion du journal. Il tient également un blog."@fr ; dbpedia-owl:birthDate "1950+02:00"^^ ; dbpprop:name "Bittman, Mark"@en ; dbpprop:shortDescription "American journalist, food writer"@en ; dc:description "American journalist, food writer", "American journalist, food writer"@en ; dcterms:subject , , , , , , ; LDPath allows us to transparently traverse that link, allowing us to extract the subjects for VIAF record: [1] If you’re playing along at home, note that, as of this writing, VIAF.org fails to correctly implement content negotiation and returns HTML if it appears anywhere in the Accept header, e.g.: curl -H "Accept: application/rdf+xml, text/html; q=0.1" -v http://viaf.org/viaf/152427175/ will return a text/html response. This may cause trouble for your linked data clients. Mar 13, 2013 Building a Pivotal Tracker IRC bot with Sinatra and Cinch We're using Pivotal Tracker on the Fedora Futures project. We also have an IRC channel where the tech team hangs out most of the day, and let each other know what we're working on, which tickets we're taking, and give each other feedback on those tickets. In order to document this, we try to put most of our the discussion in the tickets for future reference (although we are logging the IRC channel, it's not nearly as easy to look up decisions there). Because we're (lazy) developers, we wanted updates in Pivotal to get surfaced in the IRC channel. There was a (neglected) IRC bot, Pivotal-Tracker-IRC-bot, but it was designed to push and pull data from Pivotal based on commands in IRC (and, seems fairly abandoned). So, naturally, we built our own integration: Pivotal-IRC. This was my first time using Cinch to build a bot, and it was a surprisingly pleasant and straightforward experience: bot = Cinch::Bot.new do configure do |c| c.nick = $nick c.server = $irc_server c.channels = [$channel] end end # launch the bot in a separate thread, because we're using this one for the webapp. 
Thread.new { bot.start }

And we have a really tiny Sinatra app that can parse the Pivotal Webhooks payload and funnel it into the channel:

post '/' do
  message = Pivotal::WebhookMessage.new request.body.read
  bot.channel_list.first.msg("#{message.description} #{message.story_url}")
end

It turns out we also send links to Pivotal tickets not infrequently, and building two-way communication (using the Pivotal REST API, and the handy pivotal-tracker gem) was also easy. Cinch exposes a handy DSL that parses messages using regular expressions and capturing groups:

bot.on :message, /story\/show\/([0-9]+)/ do |m, ticket_id|
  story = project.stories.find(ticket_id)
  m.reply "#{story.story_type}: #{story.name} (#{story.current_state}) / owner: #{story.owned_by}"
end

Mar 9, 2013 Real-time statistics with Graphite, Statsd, and GDash

We have a Graphite-based stack of real-time visualization tools, including the data aggregator Statsd. These tools let us easily record real-time data from arbitrary services with minimal fuss. We present some curated graphs through GDash, a simple Sinatra front-end. For example, we record the time it takes for Solr to respond to queries from our SearchWorks catalog, using this simple bash script:

tail -f /var/log/tomcat6/catalina.out | ruby solr_stats.rb

(We rotate these logs through truncation; you can also use `tail -f --retry` for logs that are moved away when rotated.)

And the ruby script that does the actual parsing:

require 'statsd.rb'
STATSD = Statsd.new(..., 8125)

# Listen to stdin
while str = gets
  if str =~ /QTime=([^ ]+)/
    # extract the QTime
    ms = $1.to_i
    # record it, based on our hostname
    STATSD.timing("#{ENV['HOSTNAME'].gsub('.', '-')}.solr.qtime", ms)
  end
end

From this data, we can start asking questions like:

Is our load-balancer configured optimally? (hint: not quite; for a variety of reasons, we've sacrificed some marginal performance benefit for this non-invasive, simpler load-balance configuration.)

Why are our 90th-percentile query times creeping up? (time in ms)

(Answers to these questions and more in a future post, I'm sure.)

We also use this setup to monitor other services, e.g.: What's happening in our Fedora instance (and, which services are using the repository)? Note the red line ("warn_0") in the top graph. It marks the point where our (asynchronous) indexing system is unable to keep up with demand, and updates may appear at a delay.

Given time (and sufficient data, of course), this also gives us the ability to forecast and plan for issues:

Is our Solr query time getting worse? (Ganglia can perform some basic manipulation, including taking integrals and derivatives.)

What is the rate of growth of our indexing backlog, and, can we process it in a reasonable timeframe, or should we scale the indexer service?

Given our rate of disk usage, are we on track to run out of disk space this month? this week?

If we build graphs to monitor those conditions, we can add Nagios alerts to trigger service alerts. GDash helpfully exposes a REST endpoint that lets us know if a service has those WARN or CRITICAL thresholds. We currently have a home-grown system monitoring system that we're tempted to fold into here as well. I've been evaluating Diamond, which seems to do a pretty good job of collecting granular system statistics (CPU, RAM, IO, disk space, etc).

Mar 8, 2013 Icemelt: A stand-in for integration tests against AWS Glacier

One of the threads we've been pursuing as part of the Fedora Futures project is integration with asynchronous and/or very slow storage.
We've taken on AWS Glacier as a prime, generally accessable example. Uploading content is slow, but can be done synchronously in one API request: POST /:account_id/vaults/:vault_id/archives x-amz-archive-description: Description ...Request body (aka your content)... Where things get radically different is when requesting content back. First, you let Glacier know you'd like to retrieve your content: POST /:account_id/vaults/:vault_id/jobs HTTP/1.1 { "Type": "archive-retrieval", "ArchiveId": String, [...] } Then, you wait. and wait. and wait some more; from the documentation: Most Amazon Glacier jobs take about four hours to complete. You must wait until the job output is ready for you to download. If you have either set a notification configuration on the vault identifying an Amazon Simple Notification Service (Amazon SNS) topic or specified an Amazon SNS topic when you initiated a job, Amazon Glacier sends a message to that topic after it completes the job. [emphasis added] Icemelt If you're iterating on some code, waiting hours to get your content back isn't realistic. So, we wrote a quick Sinatra app called Icemelt in order to mock the Glacier REST API (and, perhaps taking less time to code than retrieving content from Glacier ). We've tested it using the Ruby Fog client, as well as the official AWS Java SDK, and it actually works! Your content gets stored locally, and the delay for retrieving content is configurable (default: 5 seconds). Configuring the official SDK looks something like this: PropertiesCredentials credentials = new PropertiesCredentials( TestIcemeltGlacierMock.class .getResourceAsStream("AwsCredentials.properties")); AmazonGlacierClient client = new AmazonGlacierClient(credentials); client.setEndpoint("http://localhost:3000/"); And for Fog, something like: Fog::AWS::Glacier.new :aws_access_key_id => '', :aws_secret_access_key => '', :scheme => 'http', :host => 'localhost', :port => '3000' Right now, Icemelt skips a lot of unnecessary work (e.g. checking HMAC digests for authentication, validating hashes, etc), but, as always, patches are very welcome. 
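To give a feel for how a client exercises Icemelt, here is a rough Ruby sketch that uploads an archive and then initiates a retrieval job against a local instance, using the same REST paths described above. It deliberately skips request signing (which Icemelt does not check), and the vault name, the "-" account-id placeholder, and the headers read from the responses are illustrative assumptions rather than a statement of exactly what Icemelt returns.

require "net/http"
require "json"
require "uri"

ICEMELT = URI("http://localhost:3000") # default Icemelt address from the post
VAULT   = "test-vault"                 # assumption: any vault name

http = Net::HTTP.new(ICEMELT.host, ICEMELT.port)

# Upload an archive (mirrors POST /:account_id/vaults/:vault_id/archives)
upload = Net::HTTP::Post.new("/-/vaults/#{VAULT}/archives")
upload["x-amz-archive-description"] = "Icemelt smoke test"
upload.body = "...your content..."
archive_id = http.request(upload)["x-amz-archive-id"]

# Ask for it back (mirrors POST /:account_id/vaults/:vault_id/jobs); with
# Icemelt the job completes after the configured delay instead of ~4 hours.
job = Net::HTTP::Post.new("/-/vaults/#{VAULT}/jobs")
job.body = { "Type" => "archive-retrieval", "ArchiveId" => archive_id }.to_json
puts http.request(job)["Location"]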
cedat-mak-ac-ug-494 ---- iLabs Project | The College of Engineering, Design, Art and Technology, Makerere University

iLabs@MAK
iLabs@MAK is a CEDAT-based research project that develops remote laboratories on the iLabs platform to supplement the conventional laboratories under the Electrical Engineering Department. An iLab system links three computer stations: a Lab Server, which runs the laboratory hardware; a Client, the graphical user interface customized to remotely access the laboratory hardware; and a Service Broker, which manages access to the laboratory and mediates information flow between the Lab Server and the Client. iLabs@MAK is carried out in collaboration with the Massachusetts Institute of Technology (MIT), Obafemi Awolowo University (OAU) and the University of Dar-es-salaam. The fundamental hardware and software used is provided mainly by National Instruments (NI).

Mission
Advancing knowledge and skills beneficial to Uganda in particular and the world at large through collaboration with pre-eminent research institutions spearheaded by MIT.

Vision
To facilitate the improvement of the student learning experience by contributing meaningfully to the movement within higher education leading to global sharing of lab experiments over the Internet.

Research
iLabs@MAK comprises student developers (both graduate and undergraduate) and members of staff from the Faculty of Technology. The student developers carry out research in the development of new labs to support the curricula of the BSc. Electrical, Telecommunications and Computer Engineering programmes. To date, laboratories have been developed supporting experimentation in the fields of digital circuit analysis, amplitude/frequency modulation, pulse code modulation and digital data transmission. These experiments are used in the courses of Introduction to Digital Electronics (1st year), Applied Digital Electronics, Basic Telephony and Communication Theory I (3rd year).
The ongoing research seeks to develop laboratories supporting the fields of Digital Signal Processing, Embedded Systems, Fiber Optic Systems, and Control Systems Engineering. For more information about the iLabs project, please visit our website: cedat.mak.ac.ug/ilabs iLabs Events 5th Annual iLabs-National Instruments Conference iLabs@MAK Project Wins the Best Exhibitor Award iLabs Science and Technology Innovations Challenge 2013 iLabs successfully holds Central Robotics challenge iLabs@Mak Project concludes search for best innovators iLabs Robotics Final 2013   One thought on “iLabs Project” Pingback:Meet the organisations receiving Open Data Day 2021 mini-grants – Open Knowledge Foundation blog Comments are closed. Quick Access Message from the Principal Academic Staff Administrative Staff Undergraduate Programs Graduate Programs Ceremonies and Events Partnerships and Collaborations Publications CEDAT Events << Apr 2021 >> M T W T F S S 29 30 31 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 1 2 Twitter Feed Two Mechanical Engineering @MakCEDAT @MakerereU students are part of a team that is now a FINALIST for @WegePrize 2021—a global design competition. They belong to a team Musana, which created a stove using solar power and water to fuel cooking, eliminating the need for wood fuel pic.twitter.com/gBZSz1fsC3 About 3 weeks ago from Makerere University CEDAT's Twitter via Twitter Web App 4. Bsc. Construction Management will be still offered at Graduate level as MSc. Construction Management. This will allow for specialization at this level from a wide base of undergraduate programmes and also offer an opportunity for advanced research in Construction Management pic.twitter.com/QwnoW9k4HK Last month from Makerere University CEDAT's Twitter via Twitter Web App 3. BSc. Telecommunications Engineering with BSc. Computer Engineering are being merged to form BSc. Computer and Communications Engineering which will provide the students with a wider scope of employment opportunities and more options for specialization pic.twitter.com/Z8LQC5Flya Last month from Makerere University CEDAT's Twitter via Twitter Web App 2. Those already enrolled on these programmes will not be affected in anyway. Students on the programmes will continue to have normal classes until graduation. pic.twitter.com/RcgsNoE91C Last month from Makerere University CEDAT's Twitter via Twitter Web App Communication about phased out programmes at CEDAT... 1. The phased out programmes are BSc. Telecommunications Engineering, BSc. Computer Engineering and BSc. Construction Management... cedat.mak.ac.ug/news/communic… pic.twitter.com/LxGYOb32yC Last month from Makerere University CEDAT's Twitter via Twitter Web App 50 years of Technology https://www.youtube.com/watch?v=-qtuxc7oHLw Important Links MAK-Home | Webmail Policies | Intranet Presidential Initiative International Collaborations Tag Cloud analysis. 
architecture CEDAT Charles Niwagaba Civil Engineering communication computer engineering concepts Conferences Construction course Department of Electrical and Computer Engineering Department of Geomatics and Land Management design Development drawing engineering Engineering Mathematics Entrepreneurship Environment Exhibition fundamentals Independent Study Innovation Innovations Makerere Art Gallery Makerere University management materials media methods MTSIFA planning principles process project projects Rationale Research skills student students techniques technology Uganda Connect With Us Contact Us THE COLLEGE OF ENGINEERING, DESIGN, ART AND TECHNOLOGY Makerere University P.O.Box 7062, Kampala-Uganda Email: pr@cedat.mak.ac.ug Web: www.cedat.mak.ac.ug Copyright © 2021 The College of Engineering, Design, Art and Technology. climadeeleicao-com-br-3043 ---- Análise de planos – Clima de Eleição Home Sobre Nosso Time Lideranças pelo clima Análise de planos Publicações Contato Brasil +55 41 998997470 climadeeleicao@gmail.com joaohencer@gmail.com Entre em contato Nosso e-mail → climadeeleicao@gmail.com Entrar em contato Projeto Clima de Eleição capacita centenas de candidaturas sobre a crise climática durante as eleições municipais de 2020. Links úteis Sobre nós Candidatos Nosso time Contato ©2020, Clima de Eleição. Todos direitos reservados. clir-informz-net-471 ---- Template Name: Characters remaining: Template Description: Characters remaining: Name: Filename: Select the destination folder: Select a folder Report Edit Test Activate Deactivate Save As Template Copy Delete Undelete Archive Set as In Progress Form Content Only Unsubscribe Form Custom Some non-required fields on this form are blank. Do you wish to save with blank values? Loading... Join the DLF Forum Newsletter mailing list! Email: * First Name * Last Name * Institution Title Opt-in to join the list DLF Forum News Please verify that you are not a robot. Sign Up DLF never shares your information. But we do like to share information with you! code4lib-org-861 ---- Code4Lib | We are developers and technologists for libraries, museums, and archives who are dedicated to being a diverse and inclusive community, seeking to share ideas and build collaboration. About Chat Conference Jobs Journal Local Mailing List Planet Wiki Code4Lib.org was migrated from Drupal to Jekyll in June 2018. Some links may still be broken. To report issues or help fix see: https://github.com/code4lib/code4lib.github.io Posts Nov 25, 2020 Code4Lib 2021 Sep 5, 2019 Code4Lib 2020 Aug 27, 2018 Code4Lib 2019 Apr 17, 2018 Code4Lib Journal Issue 41 Call for Papers Oct 18, 2017 Issue 38 of the Code4Lib Journal Aug 8, 2017 Code4Lib 2018 Jul 18, 2017 Issue 37 of the Code4Lib Journal Jun 12, 2017 Code4Lib Journal Issue 38 Call for Papers Oct 28, 2016 Code4Lib Journal #34 Oct 14, 2016 C4L17: Call for Presentation/Panel proposals Oct 13, 2016 Code4Lib 2017 Jul 19, 2016 Code4Lib Journal #33 Apr 26, 2016 Code4Lib Journal #32 Sep 17, 2015 jobs.code4lib.org studied Aug 10, 2015 Code4Lib 2016 Jul 27, 2015 Code4Lib Northern California: Stanford, CA Jul 15, 2015 Code4Lib Journal #29 Apr 15, 2015 Code4Lib Journal #28: Special Issue on Diversity in Library Technology Mar 7, 2015 Code4Lib 2016 will be in Philadelphia Mar 7, 2015 Code4Lib 2016 Conference Proposals Feb 21, 2015 Code4Lib North 2015: St. 
Catharines, ON Feb 21, 2015 Code4Lib 2015 videos Jan 31, 2015 2015 Code of Conduct Dec 12, 2014 Code4Lib 2015 Diversity Scholarships Dec 5, 2014 Your code does not exist in a vacuum Dec 5, 2014 Your Chocolate is in My Peanut Butter! Mixing up Content and Presentation Layers to Build Smarter Books in Browsers with RDFa, Schema.org, and Linked Data Topics Dec 5, 2014 You Gotta Keep 'em Separated: The Case for "Bento Box" Discovery Interfaces Dec 5, 2014 Refinery — An open source locally deployable web platform for the analysis of large document collections Dec 5, 2014 Programmers are not projects: lessons learned from managing humans Dec 5, 2014 Our $50,000 Problem: Why Library School? Dec 5, 2014 Making your digital objects embeddable around the web Dec 5, 2014 Leveling Up Your Git Workflow Dec 5, 2014 Level Up Your Coding with Code Club (yes, you can talk about it) Dec 5, 2014 How to Hack it as a Working Parent: or, Should Your Face be Bathed in the Blue Glow of a Phone at 2 AM? Dec 5, 2014 Helping Google (and scholars, researchers, educators, & the public) find archival audio Dec 5, 2014 Heiðrún: DPLA's Metadata Harvesting, Mapping and Enhancement System Dec 5, 2014 Got Git? Getting More Out of Your GitHub Repositories Dec 5, 2014 Feminist Human Computer Interaction (HCI) in Library Software Dec 5, 2014 Dynamic Indexing: a Tragic Solr Story Dec 5, 2014 Docker? VMs? EC2? Yes! With Packer.io Dec 5, 2014 Digital Content Integrated with ILS Data for User Discovery: Lessons Learned Dec 5, 2014 Designing and Leading a Kick A** Tech Team Dec 5, 2014 Consuming Big Linked Open Data in Practice: Authority Shifts and Identifier Drift Dec 5, 2014 BYOB: Build Your Own Bootstrap Dec 5, 2014 Book Reader Bingo: Which Page-Turner Should I Use? Dec 5, 2014 Beyond Open Source Dec 5, 2014 Awesome Pi, LOL! Dec 5, 2014 Annotations as Linked Data with Fedora4 and Triannon (a Real Use Case for RDF!) Dec 5, 2014 American (Archives) Horror Story: LTO Failure and Data Loss Dec 5, 2014 A Semantic Makeover for CMS Data Dec 4, 2014 Code4lib 2007 Lighting Talks Nov 16, 2014 Store Nov 11, 2014 Voting for Code4Lib 2015 Prepared Talks is now open. Nov 10, 2014 Keynote voting for the 2015 conference is now open! Sep 23, 2014 Code4Lib 2015: Call for Proposals Sep 21, 2014 Code4Lib North (Ottawa): Tuesday October 7th, 2014 Sep 10, 2014 code4libBC: November 27 and 28, 2014 Sep 6, 2014 2015 Conference Schedule Jul 22, 2014 Code4Lib Journal issue 25 Jul 15, 2014 Code4Lib NorCal 28 July in San Mateo Jul 2, 2014 Code4Lib 2015 Apr 18, 2014 Code4Lib 2014 Trip Report - Zahra Ashktorab Apr 18, 2014 Code4Lib 2014 Trip Report- Nabil Kashyap Apr 18, 2014 Code4Lib 2014 Trip Report - Junior Tidal Apr 18, 2014 Code4Lib 2014 Trip Report - Jennifer Maiko Kishi Apr 18, 2014 Code4Lib 2014 Trip Report - J. (Jenny) Gubernick Apr 18, 2014 Code4Lib 2014 Trip Report - Emily Reynolds Apr 18, 2014 Code4Lib 2014 Trip Report - Coral Sheldon Hess Apr 18, 2014 Code4Lib 2014 Trip Report - Christina Harlow Apr 18, 2014 CODE4LIB 2014 Trip Report - Arie Nugraha Mar 10, 2014 Call for proposals: Code4Lib Journal, issue 25 Feb 3, 2014 2014 Code of Conduct Jan 30, 2014 Code4Lib 2015 Call for Host Proposals Jan 24, 2014 Code4Lib 2014 Sponsors Jan 21, 2014 WebSockets for Real-Time and Interactive Interfaces Jan 21, 2014 We Are All Disabled! 
Universal Web Design Making Web Services Accessible for Everyone Jan 21, 2014 Visualizing Solr Search Results with D3.js for User-Friendly Navigation of Large Results Sets Jan 21, 2014 Visualizing Library Resources as Networks Jan 21, 2014 Under the Hood of Hadoop Processing at OCLC Research Jan 21, 2014 Towards Pasta Code Nirvana: Using JavaScript MVC to Fill Your Programming Ravioli Jan 21, 2014 Sustaining your Open Source project through training Jan 21, 2014 Structured data NOW: seeding schema.org in library systems Jan 21, 2014 Quick and Easy Data Visualization with Google Visualization API and Google Chart Libraries Jan 21, 2014 Queue Programming -- how using job queues can make the Library coding world a better place Jan 21, 2014 PhantomJS+Selenium: Easy Automated Testing of AJAX-y UIs Jan 21, 2014 Personalize your Google Analytics Data with Custom Events and Variables Jan 21, 2014 Organic Free-Range API Development - Making Web Services That You Will Actually Want to Consume Jan 21, 2014 Next Generation Catalogue - RDF as a Basis for New Services Jan 21, 2014 More Like This: Approaches to Recommending Related Items using Subject Headings Jan 21, 2014 Lucene's Latest (for Libraries) Jan 21, 2014 Discovering your Discovery System in Real Time Jan 21, 2014 Dead-simple Video Content Management: Let Your Filesystem Do The Work Jan 21, 2014 Building for others (and ourselves): the Avalon Media System Jan 21, 2014 Behold Fedora 4: The Incredible Shrinking Repository! Jan 21, 2014 All Tiled Up Jan 21, 2014 A reusable application to enable self deposit of complex objects into a digital preservation environment Jan 21, 2014 A Book, a Web Browser and a Tablet: How Bibliotheca Alexandrina's Book Viewer Framework Makes It Possible Jan 21, 2014 2014 Conference Schedule Jan 17, 2014 Code4Lib 2014 Conference Diversity Scholarship Recipients Nov 19, 2013 Code4lib 2014 Diversity Scholarships (Application Deadline: Dec. 13, 2013, 5pm EST) Nov 12, 2013 Code4Lib 2014 Keynote Speakers Sep 30, 2013 Code4Lib 2014 Jun 10, 2013 Code4Lib 2014 Conference Prospectus for Sponsors Mar 28, 2013 Code4Lib 2014 Conference Proposals Jan 31, 2013 Ask Anything! Dec 5, 2012 Code4Lib 2014 Call for Host Proposals Dec 4, 2012 The Care and Feeding of a Crowd Dec 4, 2012 The Avalon Media System: A Next Generation Hydra Head For Audio and Video Delivery Dec 4, 2012 Solr Update Dec 4, 2012 REST IS Your Mobile Strategy Dec 4, 2012 Practical Relevance Ranking for 10 million books. Dec 4, 2012 Pitfall! Working with Legacy Born Digital Materials in Special Collections Dec 4, 2012 n Characters in Search of an Author Dec 4, 2012 Linked Open Communism: Better discovery through data dis- and re- aggregation Dec 4, 2012 Hybrid Archival Collections Using Blacklight and Hydra Dec 4, 2012 HTML5 Video Now! Dec 4, 2012 Hands off! 
Best Practices and Top Ten Lists for Code Handoffs Dec 4, 2012 Hacking the DPLA Dec 4, 2012 Google Analytics, Event Tracking and Discovery Tools Dec 4, 2012 Evolving Towards a Consortium MARCR Redis Datastore Dec 4, 2012 EAD without XSLT: A Practical New Approach to Web-Based Finding Aids Dec 4, 2012 De-sucking the Library User Experience Dec 4, 2012 Data-Driven Documents: Visualizing library data with D3.js Dec 4, 2012 Creating a Commons Dec 4, 2012 Citation search in SOLR and second-order operators Dec 4, 2012 Browser/Javascript Integration Testing with Ruby Dec 4, 2012 ARCHITECTING ScholarSphere: How We Built a Repository App That Doesn't Feel Like Yet Another Janky Old Repository App Dec 4, 2012 All Teh Metadatas Re-Revisited Dec 4, 2012 Actions speak louder than words: Analyzing large-scale query logs to improve the research experience Nov 30, 2012 Code4Lib 2013 Scholarship (deadline: December 14, 2012) Nov 2, 2012 Code4Lib 2013 Nov 2, 2012 Code4Lib 2013 Schedule Oct 2, 2012 Code4Lib Conference 2013 Call for Propoosals Sep 5, 2012 Keynote voting for the 2013 conference is now open! Jul 11, 2012 Dates Set for Code4Lib 2013 in Chicago May 29, 2012 Code4Lib Journal - Call for Proposals May 7, 2012 ruby-marc 0.5.0 released Apr 10, 2012 Code4Lib Journal: Editors Wanted Feb 3, 2012 Code4Lib Journal Issue 16 is published! Feb 3, 2012 Ask Anything! – Facilitated by Carmen Mitchell- Code4Lib 2012 Jan 26, 2012 Relevance Ranking in the Scholarly Domain - Tamar Sadeh, PhD Jan 26, 2012 Kill the search button II - the handheld devices are coming - Jørn Thøgersen, Michael Poltorak Nielsen Jan 25, 2012 Stack View: A Library Browsing Tool - Annie Cain Jan 25, 2012 Search Engine Relevancy Tuning - A Static Rank Framework for Solr/Lucene - Mike Schultz Jan 25, 2012 Practical Agile: What's Working for Stanford, Blacklight, and Hydra - Naomi Dushay Jan 25, 2012 NoSQL Bibliographic Records: Implementing a Native FRBR Datastore with Redis - Jeremy Nelson Jan 25, 2012 Lies, Damned Lies, and Lines of Code Per Day - James Stuart Jan 25, 2012 Indexing big data with Tika, Solr & map-reduce - Scott Fisher, Erik Hetzner Jan 25, 2012 In-browser data storage and me - Jason Casden Jan 25, 2012 How people search the library from a single search box - Cory Lown Jan 25, 2012 Discovering Digital Library User Behavior with Google Analytics - Kirk Hess Jan 25, 2012 Building research applications with Mendeley - William Gunn Jan 23, 2012 Your UI can make or break the application (to the user, anyway) - Robin Schaaf Jan 23, 2012 Your Catalog in Linked Data - Tom Johnson Jan 23, 2012 The Golden Road (To Unlimited Devotion): Building a Socially Constructed Archive of Grateful Dead Artifacts - Robin Chandler Jan 23, 2012 Quick and Dirty Clean Usability: Rapid Prototyping with Bootstrap - Shaun Ellis Jan 23, 2012 “Linked-Data-Ready” Software for Libraries - Jennifer Bowen Jan 23, 2012 HTML5 Microdata and Schema.org - Jason Ronallo Jan 23, 2012 HathiTrust Large Scale Search: Scalability meets Usability - Tom Burton-West Jan 23, 2012 Design for Developers - Lisa Kurt Jan 23, 2012 Beyond code: Versioning data with Git and Mercurial - Charlie Collett, Martin Haye Jan 23, 2012 ALL TEH METADATAS! 
or How we use RDF to keep all of the digital object metadata formats thrown at us - Declan Fleming Dec 29, 2011 Discussion for Elsevier App Challenge during Code4Lib 2012 Dec 14, 2011 So you want to start a Kindle lending program Dec 1, 2011 Code4Lib 2013 Call for Host Proposals Nov 29, 2011 Code4Lib 2012 Scholarship (deadline: December 9, 2011) Oct 21, 2011 code4lib 2012 Sponsor Listing Oct 19, 2011 Code4Lib 2012 Schedule Jul 28, 2011 Code4Lib 2012 Feb 11, 2011 Code4Lib 2012 Sponsorship Jan 26, 2011 VuFind Beyond MARC: Discovering Everything Else - Demian Katz Jan 26, 2011 One Week | One Tool: Ultra-Rapid Open Source Development Among Strangers - Scott Hanrath Jan 26, 2011 Letting In the Light: Using Solr as an External Search Component - Jay Luker and Benoit Thiell Jan 26, 2011 Kuali OLE: Architecture for Diverse and Linked Data - Tim McGeary and Brad Skiles Jan 26, 2011 Keynote Address - Diane Hillmann Jan 26, 2011 Hey, Dilbert. Where's My Data?! - Thomas Barker Jan 26, 2011 Enhancing the Mobile Experience: Mobile Library Services at Illinois - Josh Bishoff - Josh Bishoff Jan 26, 2011 Drupal 7 as Rapid Application Development Tool - Cary Gordon Jan 26, 2011 Code4Lib 2012 in Seattle Jan 26, 2011 2011 Lightning Talks Jan 26, 2011 2011 Breakout Sessions Jan 25, 2011 (Yet Another) Home-Grown Digital Library System, Built Upon Open Source XML Technologies and Metadata Standards - David Lacy Jan 25, 2011 Why (Code4) Libraries Exist - Eric Hellman Jan 25, 2011 Visualizing Library Data - Karen Coombs Jan 25, 2011 Sharing Between Data Repositories - Kevin S. Clarke Jan 25, 2011 Practical Relevancy Testing - Naomi Dushay Jan 25, 2011 Opinionated Metadata (OM): Bringing a Bit of Sanity to the World of XML Metadata - Matt Zumwalt Jan 25, 2011 Mendeley's API and University Libraries: Three Examples to Create Value - Ian Mulvany Jan 25, 2011 Let's Get Small: A Microservices Approach to Library Websites - Sean Hannan Jan 25, 2011 GIS on the Cheap - Mike Graves Jan 25, 2011 fiwalk With Me: Building Emergent Pre-Ingest Workflows for Digital Archival Records using Open Source Forensic Software - Mark M Jan 25, 2011 Enhancing the Performance and Extensibility of the XC’s MetadataServicesToolkit - Ben Anderson Jan 25, 2011 Chicago Underground Library’s Community-Based Cataloging System - Margaret Heller and Nell Taylor Jan 25, 2011 Building an Open Source Staff-Facing Tablet App for Library Assessment - Jason Casden and Joyce Chapman Jan 25, 2011 Beyond Sacrilege: A CouchApp Catalog - Gabriel Farrell Jan 25, 2011 Ask Anything! – Facilitated by Dan Chudnov Jan 25, 2011 A Community-Based Approach to Developing a Digital Exhibit at Notre Dame Using the Hydra Framework - Rick Johnson and Dan Brubak Dec 12, 2010 Code4Lib 2011 schedule Dec 10, 2010 Code4Lib 2012 Call for Host Proposals Nov 17, 2010 Scholarships to Attend the 2011 Code4Lib Conference (Deadline Dec. 6, 2010) Sep 23, 2010 Code4Lib 2011 Sponsorship Jun 28, 2010 Issue 10 of the Code4Lib Journal Mar 23, 2010 Location of code4lib 2011 Mar 23, 2010 Code4Lib 2011: Get Ready for the Best Code4lib Conference Yet! Mar 22, 2010 Issue 9 of the Code4Lib Journal Mar 12, 2010 Vote on Code4Lib 2011 hosting proposals Feb 24, 2010 You Either Surf or You Fight: Integrating Library Services With Google Wave - Sean Hannan - Code4Lib 2010 Feb 24, 2010 Vampires vs. 
Werewolves: Ending the War Between Developers and Sysadmins with Puppet - Bess Sadler - Code4Lib 2010 Feb 24, 2010 The Linked Library Data Cloud: Stop talking and start doing - Ross Singer - Code4Lib 2010 Feb 24, 2010 Taking Control of Library Metadata and Websites Using the eXtensible Catalog - Jennifer Bowen - Code4Lib 2010 Feb 24, 2010 Public Datasets in the Cloud - Rosalyn Metz and Michael B. Klein - Code4Lib 2010 Feb 24, 2010 Mobile Web App Design: Getting Started - Michael Doran - Code4Lib 2010 Feb 24, 2010 Metadata Editing – A Truly Extensible Solution - David Kennedy and David Chandek-Stark - Code4Lib 2010 Feb 24, 2010 Media, Blacklight, and Viewers Like You (pdf, 2.61MB) - Chris Beer - Code4Lib 2010 Feb 24, 2010 Matching Dirty Data – Yet Another Wheel - Anjanette Young and Jeff Sherwood - Code4Lib 2010 Feb 24, 2010 library/mobile: Developing a Mobile Catalog - Kim Griggs - Code4Lib 2010 Feb 24, 2010 Keynote #2: catfish, cthulhu, code, clouds and Levenshtein distance - Paul Jones - Code4Lib 2010 Feb 24, 2010 Keynote #1: Cathy Marshall - Code4Lib 2010 Feb 24, 2010 Iterative Development Done Simply - Emily Lynema - Code4Lib 2010 Feb 24, 2010 I Am Not Your Mother: Write Your Test Code - Naomi Dushay, Willy Mene, and Jessie Keck - Code4Lib 2010 Feb 24, 2010 How to Implement A Virtual Bookshelf With Solr - Naomi Dushay and Jessie Keck - Code4Lib 2010 Feb 24, 2010 HIVE: A New Tool for Working With Vocabularies - Ryan Scherle and Jose Aguera - Code4Lib 2010 Feb 24, 2010 Enhancing Discoverability With Virtual Shelf Browse - Andreas Orphanides, Cory Lown, and Emily Lynema - Code4Lib 2010 Feb 24, 2010 Drupal 7: A more powerful platform for building library applications - Cary Gordon - Code4Lib 2010 Feb 24, 2010 Do It Yourself Cloud Computing with Apache and R - Harrison Dekker - Code4Lib 2010 Feb 24, 2010 Cloud4Lib - Jeremy Frumkin and Terry Reese - Code4Lib 2010 Feb 24, 2010 Becoming Truly Innovative: Migrating from Millennium to Koha - Ian Walls - Code4Lib 2010 Feb 24, 2010 Ask Anything! – Facilitated by Dan Chudnov - Code4Lib 2010 Feb 24, 2010 A Better Advanced Search - Naomi Dushay and Jessie Keck - Code4Lib 2010 Feb 24, 2010 7 Ways to Enhance Library Interfaces with OCLC Web Services - Karen Coombs - Code4Lib 2010 Feb 22, 2010 Code4Lib 2010 Lightning Talks Feb 22, 2010 Code4Lib 2010 Breakout Sessions Feb 21, 2010 Code4Lib 2010 Participant Release Form Feb 5, 2010 Code4Lib 2011 Hosting Proposals Solicited Jan 16, 2010 2010 Code4lib Scholarship Recipients Jan 12, 2010 Code4Lib North Dec 21, 2009 Scholarships to Attend the 2010 Code4Lib Conference Dec 16, 2009 Code4Lib 2010 Registration Dec 14, 2009 2010 Conference info Dec 10, 2009 Code4Lib 2010 Schedule Dec 4, 2009 Code4Lib 2010 Sponsorship Nov 16, 2009 2010 Code4Lib Conference Prepared Talks Voting Now Open! Oct 30, 2009 Code4Lib 2010 Call for Prepared Talk Proposals Sep 21, 2009 Vote for code4lib 2010 keynotes! Jul 10, 2009 Code4Lib 2010 Jun 26, 2009 Code4Lib Journal: new issue 7 now available May 15, 2009 Visualizing Media Archives: A Case Study May 15, 2009 The Open Platform Strategy: what it means for library developers May 15, 2009 If You Love Something...Set it Free May 14, 2009 What We Talk About When We Talk About FRBR May 14, 2009 The Rising Sun: Making the most of Solr power May 14, 2009 Great facets, like your relevance, but can I have links to Amazon and Google Book Search? 
May 14, 2009 FreeCite - An Open Source Free-Text Citation Parser May 14, 2009 Freebasing for Fun and Enhancement May 14, 2009 Extending biblios, the open source web based metadata editor May 14, 2009 Complete faceting May 14, 2009 A New Platform for Open Data - Introducing ‡biblios.net Web Services May 13, 2009 Sebastian Hammer, Keynote Address May 13, 2009 Blacklight as a unified discovery platform May 13, 2009 A new frontier - the Open Library Environment (OLE) May 8, 2009 The Dashboard Initiative May 8, 2009 RESTafarian-ism at the NLA May 8, 2009 Open Up Your Repository With a SWORD! May 8, 2009 LuSql: (Quickly and easily) Getting your data from your DBMS into Lucene May 8, 2009 Like a can opener for your data silo: simple access through AtomPub and Jangle May 8, 2009 LibX 2.0 May 8, 2009 How I Failed To Present on Using DVCS for Managing Archival Metadata May 8, 2009 djatoka for djummies May 8, 2009 A Bookless Future for Libraries: A Comedy in 3 Acts May 1, 2009 Why libraries should embrace Linked Data Mar 31, 2009 Code4Lib Journal: new issue 6 now available Feb 28, 2009 See you next year in Asheville Feb 20, 2009 Code4Lib 2009 Lightning Talks Feb 19, 2009 code4lib2010 venue voting Feb 17, 2009 OCLC Grid Services Boot Camp (2009 Preconference) Feb 16, 2009 Code4Lib 2010 Hosting Proposals Jan 29, 2009 Code4Lib Logo Jan 29, 2009 Code4Lib Logo Debuts Jan 28, 2009 Code4Lib 2009 Breakout Sessions Jan 16, 2009 Call for Code4Lib 2010 Hosting Proposals Jan 11, 2009 2009 Code4lib Scholarship Recipients Jan 5, 2009 Code4lib 2009 T-shirt Design Contest Dec 17, 2008 code4lib2009 registration open! Dec 15, 2008 Code4Lib Journal Issue 5 Published Dec 5, 2008 Code4lib 2009 Gender Diversity and Minority Scholarships Dec 5, 2008 Calling all Code4Libers Attending Midwinter Dec 3, 2008 Logo Design Process Launched Dec 3, 2008 Code4Lib 2009 Schedule Dec 2, 2008 2009 Pre-Conferences Nov 25, 2008 Voting On Presentations for code4lib 2009 Open until December 3 Nov 18, 2008 drupal4lib unconference (02/27/2009 Darien, CT) Oct 24, 2008 Call for Proposals, Code4Lib 2009 Conference Oct 10, 2008 ne.code4lib.org Sep 30, 2008 code4lib2009 keynote voting Sep 23, 2008 Logo? You Decide Sep 17, 2008 solrpy google code project Sep 3, 2008 Code4Lib 2009 Sep 3, 2008 Code4Lib 2009 Sponsorship Aug 27, 2008 Code4LibNYC Aug 22, 2008 Update from LinkedIn Jul 15, 2008 LinkedIn Group Growing Fast Jul 3, 2008 code4lib group on LInkedIn Apr 17, 2008 ELPUB 2008 Open Scholarship: Authority, Community and Sustainability in the Age of Web 2.0 Mar 4, 2008 Code4libcon 2008 Lightning Talks Mar 3, 2008 Brown University to Host Code4Lib 2009 Feb 26, 2008 Desktop Presenter software Feb 25, 2008 Presentations from LibraryFind pre-conference Feb 21, 2008 Vote for Code4Lib 2009 Host! Feb 19, 2008 Karen Coyle Keynote - R&D: Can Resource Description become Rigorous Data? Feb 6, 2008 Code4libcon 2008 Breakout Sessions Feb 1, 2008 Call for Code4Lib 2009 Hosting Proposals Jan 30, 2008 Code4lib 2008 Conference T-Shirt Design Jan 7, 2008 Code4lib 2008 Registration now open! 
Dec 27, 2007 Zotero and You, or Bibliography on the Semantic Web Dec 27, 2007 XForms for Metadata creation Dec 27, 2007 Working with the WorldCat API Dec 27, 2007 Using a CSS Framework Dec 27, 2007 The Wayback Machine Dec 27, 2007 The Making of The Code4Lib Journal Dec 27, 2007 The Code4Lib Future Dec 27, 2007 Show Your Stuff, using Omeka Dec 27, 2007 Second Life Web Interoperability - Moodle and Merlot.org Dec 27, 2007 RDF and RDA: declaring and modeling library metadata Dec 27, 2007 ÖpënÜRL Dec 27, 2007 OSS Web-based cataloging tool Dec 27, 2007 MARCThing Dec 27, 2007 Losing sleep over REST? Dec 27, 2007 From Idea to Open Source Dec 27, 2007 Finding Relationships in MARC Data Dec 27, 2007 DLF ILS Discovery Interface Task Force API recommendation Dec 27, 2007 Delivering Library Services in the Web 2.0 environment: OSU Libraries Publishing System for and by Librarians Dec 27, 2007 CouchDB is sacrilege... mmm, delicious sacrilege Dec 27, 2007 Building the Open Library Dec 27, 2007 Building Mountains Out of Molehills Dec 27, 2007 A Metadata Registry Dec 17, 2007 Code4lib 2008 Gender Diversity and Minority Scholarships Dec 12, 2007 Conference Schedule Nov 20, 2007 Code4lib 2008 Keynote Survey Oct 31, 2007 Code4lib 2008 Call for Proposals Oct 16, 2007 Code4Lib 2008 Schedule Jul 18, 2007 code4lib 2008 conference Jul 6, 2007 Random #code4lib Quotes Jun 13, 2007 Request for Proposals: Innovative Uses of CrossRef Metadata May 16, 2007 Library Camp NYC, August 14, 2007 Apr 3, 2007 Code4Lib 2007 - Video, Audio and Podcast Available Mar 14, 2007 Code4Lib 2007 - Day 1 Video Available Mar 13, 2007 Erik Hatcher Keynote Mar 12, 2007 My Adventures in Getting Data into the ArchivistsToolkit Mar 9, 2007 Karen Schneider Keynote "Hurry up please it's time" Mar 9, 2007 Code4Lib Conference Feedback Available Mar 9, 2007 Code4Lib 2007 Video Trickling In Mar 1, 2007 Code4Lib.org Restored Feb 24, 2007 Code4Lib 2008 will be in Portland, OR Feb 13, 2007 Code4Lib Blog Anthology Feb 9, 2007 The Intellectual Property Disclosure Process: Releasing Open Source Software in Academia Feb 6, 2007 Polling for interest in a European code4lib Feb 5, 2007 Call for Proposals to Host Code4Lib 2008 Feb 5, 2007 2007 Code4lib Scholarship Recipients Feb 3, 2007 Delicious! Flare + SIMILE Exhibit Jan 30, 2007 Open Access Self-Archiving Mandate Jan 17, 2007 Evergreen Keynote Jan 17, 2007 Code4Lib 2007 T-Shirt Contest Jan 16, 2007 Stone Soup Jan 10, 2007 #code4lib logging Jan 2, 2007 Two scholarships to attend the 2007 code4lib conference Dec 20, 2006 2007 Conference Schedule Now Available Dec 19, 2006 code4lib 2007 pre-conference workshop: Lucene, Solr, and your data Dec 18, 2006 Traversing the Last Mile Dec 18, 2006 The XQuery Exposé: Practical Experiences from a Digital Library Dec 18, 2006 The BibApp Dec 18, 2006 Smart Subjects - Application Independent Subject Recommendations Dec 18, 2006 Open-Source Endeca in 250 Lines or Less Dec 18, 2006 On the Herding of Cats Dec 18, 2006 Obstacles to Agility Dec 18, 2006 MyResearch Portal: An XML based Catalog-Independent OPAC Dec 18, 2006 LibraryFind Dec 18, 2006 Library-in-a-Box Dec 18, 2006 Library Data APIs Abound! Dec 18, 2006 Get Groovy at Your Public Library Dec 18, 2006 Fun with ZeroConfMetaOpenSearch Dec 18, 2006 Free the Data: Creating a Web Services Interface to the Online Catalog Dec 18, 2006 Forget the Lipstick. This Pig Just Needs Social Skills. 
Dec 18, 2006 Atom Publishing Protocol Primer Nov 27, 2006 barton data Nov 21, 2006 MIT Catalog Data Oct 29, 2006 Code4Lib Downtime Oct 16, 2006 Call for Proposals Aug 24, 2006 Code4Lib2006 Audio Aug 15, 2006 book club Jul 4, 2006 Code4LibCon Site Proposals Jul 1, 2006 Improving Code4LibCon 200* Jun 28, 2006 Code4Lib Conference Hosting Jun 22, 2006 Learning to Scratch Our Own Itches Jun 15, 2006 2007 Code4Lib Conference Jun 15, 2006 2007 Code4Lib Conference Schedule Jun 15, 2006 2007 Code4Lib Conference Lightning Talks Jun 15, 2006 2007 Code4Lib Conference Breakouts Mar 31, 2006 Results of the journal name vote Mar 22, 2006 #dspace Mar 20, 2006 #code4lib logging Mar 14, 2006 regulars on the #code4lib irc channel Mar 14, 2006 Code4lib Journal Name Vote Mar 14, 2006 code4lib journal: mission, format, guidelines Mar 14, 2006 #code4lib irc channel faq Feb 27, 2006 CUFTS2 AIM/AOL/ICQ bot Feb 24, 2006 code4lib journal: draft purpose, format, and guidelines Feb 21, 2006 2006 code4lib Breakout Sessions Feb 17, 2006 unapi revision 1 Feb 15, 2006 code4lib 2006 presentations will be available Feb 14, 2006 planet update Feb 13, 2006 Weather in Corvallis for Code4lib Feb 13, 2006 Holiday Inn Express Feb 9, 2006 conference wiki Jan 31, 2006 Portland Hostel Jan 27, 2006 Lightning Talks Jan 23, 2006 Code4lib 2006 T-Shirt design vote! Jan 19, 2006 Portland Jazz Festival Jan 13, 2006 unAPI version 0 Jan 13, 2006 conference schedule in hCalendar Jan 12, 2006 code4lib 2006 T-shirt design contest Jan 11, 2006 Conference Schedule Set Jan 11, 2006 code4lib 2006 registration count pool Jan 10, 2006 WikiD Jan 10, 2006 The Case for Code4Lib 501c(3) Jan 10, 2006 Teaching the Library and Information Community How to Remix Information Jan 10, 2006 Practical Aspects of Implementing Open Source in Armenia Jan 10, 2006 Lipstick on a Pig: 7 Ways to Improve the Sex Life of Your OPAC Jan 10, 2006 Generating Recommendations in OPACS: Initial Results and Open Areas for Exploration Jan 10, 2006 ERP Options in an OSS World Jan 10, 2006 AHAH: When Good is Better than Best Jan 10, 2006 1,000 Lines of Code, and other topics from OCLC Research Jan 9, 2006 What Blog Applications Can Teach Us About Library Software Architecture Jan 9, 2006 Standards, Reusability, and the Mating Habits of Learning Content Jan 9, 2006 Quality Metrics Jan 9, 2006 Library Text Mining Jan 9, 2006 Connecting Everything with unAPI and OPA Jan 9, 2006 Chasing Babel Jan 9, 2006 Anatomy of aDORe Jan 6, 2006 Voting on Code4Lib 2006 Presentation Proposals Jan 3, 2006 one more week for proposals Dec 19, 2005 code4lib card Dec 15, 2005 planet facelift Dec 6, 2005 Registration is Open Dec 3, 2005 planet code4lib & blogs Dec 1, 2005 Code4lib 2006 Call For Proposals Nov 29, 2005 code4lib Conference 2006: Schedule Nov 21, 2005 panizzi Nov 21, 2005 drupal installed Nov 21, 2005 code4lib 2006 subscribe via RSS Code4Lib Code4Lib code4lib code4lib.social code4lib code4lib We are developers and technologists for libraries, museums, and archives who are dedicated to being a diverse and inclusive community, seeking to share ideas and build collaboration. codeforpakistan-org-1574 ---- Code for Pakistan - Civic Innovation in Pakistan Login | Register Username or Email Address Password Lost your password? 
Home About Us Programs Civic Innovation Labs Civic Hackathons Civic Hackathon 2020 SDG Hackathon 2019 Pakistan @100 Innovation Hackathon Previous Hackathons Fellowship Batch 1: KP Fellowship Batch 2: KP Fellowship Batch 3: KP Fellowship Batch 4: KP Fellowship Batch 5: KP Fellowship Batch 6: KP Fellowship Women And Tech Events Civic Apps Annual Reports Impact Report 2019 Blog Contact Careers Code for Pakistan Our goal is to bring together civic-minded software developers to use technology to innovate in public services, by creating open source solutions to address the needs of citizens. This is an opportunity for citizens and the private sector to give back to Pakistan by engendering civic innovation. Read More CfP Founder Sheba Najmi at #021Disrupt19 Play Video Sheba Najmi shared her thoughts on how companies design products and user experiences today, and why it is essential for us to develop a more human approach when it comes to both. Civic Innovation Labs CfP runs Civic Innovation Labs in major cities. Learn more about joining or starting a Lab. Civic Hackathons Civic Hackathons are events that spark civic engagement by bringing designers, developers, and community organizers together to prototype solutions to civic problems. Upcoming Events There is always something interesting going on at CfP. Join our events. CfP is part of a global movement. Watch this video message from Code for America. What We've Done So Far 0 Github Repositories 0 Civic Hackathons 0 Civic Innovation Labs 0 Fellows Graduated Why It Matters Civic innovation starts to reframe the relationship between local government and citizens, which is essential if the two are to live together smartly. Toward a progressive Pakistan! Collaborative Model Through the creation of open source technology to address civic needs, we aim to transform civic life by increasing civic engagement, encouraging the opening of government data, and supporting innovation in the public domain. Our Labs meet regularly to collaborate with local stakeholders (including Government, partner Non-profit Organizations, and Media Organizations) on projects that focus on how to use 21st century web and data tools to improve civic interfaces. Learn more about our Programs Latest From Our Blog April 9, 2021 Applications Open: KP Government Innovation Fellowship Program 2021 (7th Cycle) Read more January 12, 2021 Job Opening: Country Director Read more January 11, 2021 Civic Hackathon 2020 Concludes Read more If you are interested in learning more about Code for Pakistan, Contact Us! We would love to hear from you. Join our Discord community . License CC BY-SA 4.0 coding-confessions-github-io-9008 ---- Coding Confessions | Normalising failure in research software. CodingConfessions About Read Confessions MAKE A CONFESSION Normalising failure. Normalising failure in research software creates an inclusive space for sharing experiences, and generates opportunity to learn. What is Coding Confessions? Simply put: "Where somebody admits to mistakes or bad practice in code they've developed." What's the problem? Everybody who develops software has at some point written some software badly, quickly, cut corners or simply made a mistake that made it function incorrectly. Due to imposter syndrome many people feel like this makes them less worthy developers. Often the root cause is time pressure to make something that "just works" (or at least appeared to). 
These little short cuts often end up becoming core pieces of software upon which research conclusions and publications are based. People don't like to admit to making mistakes, cutting corners or not following best practice, sometimes hiding these problems away. Why do this? We want to: Change the culture of research so that mistakes can be disclosed without fear. Document mistakes and allow the entire community to benefit from the lessons learned. These will be published on our blog. How to submit a confession Please only submit a confession about something you did yourself, don't submit confessions about the work of others. Send us one paragraph about each of the following: The background to the problem, what were you trying to do? The mistake you made. What steps can be taken to avoid this mistake in the future. You can do this publicly (with atribution) or privately (anonymously). We will then publish them on our blog. See this example blog post. See the submit a confession page for more information. Submit a confession to us How to run a Confessions Workshop at your own event Read confessions in our blog. Confessions Below are the latest confessions from our blog. Confession 1 Dave 1 April 2021 The typo that nearly broke my first paper Eirini 9 February 2021 Confession 2 Dave 9 February 2021 Coding Confessions. Normalising failure in research software. Software Sustainability Institute This project and website was created as part of the Hack Day in the Collaborations Workshop 2021. The Software Sustainability Institute cultivates better, more sustainable, research software to enable world-class research. They help people build better software, and we work with researchers, developers, funders and infrastructure providers to identify key issues and best practice in scientific software. Privacy Thanks Github Pages. Menu commonplace-net-5233 ---- commonplace.net – Data. The final frontier. Skip to content commonplace.net Data. The final frontier. Publications A Common Place All Posts About Contact Infrastructure for heritage institutions – ARK PID’s November 3, 2020November 11, 2020 Lukas KosterData, Infrastructure, Library In the Digital Infrastructure program at the Library of the University of Amsterdam we have reached a first milestone. In my previous post in the Infrastructure for heritage institutions series, “Change of course“, I mentioned the coming implementation of ARK persistent identifiers for our collection objects. Since November 3, 2020, ARK PID’s are available for our university library Alma catalogue through the Primo user interface. Implementation of ARK PID’s for the other collection description systems […] Read more Infrastructure for heritage institutions – change of course June 23, 2020 Lukas KosterData, Infrastructure, Library In July 2019 I published the first post about our planning to realise a “coherent and future proof digital infrastructure” for the Library of the University of Amsterdam. In February I reported on the first results. As frequently happens, since then the conditions have changed, and naturally we had to adapt the direction we are following to achieve our goals. In other words: a change of course, of course.  
Projects  I will leave aside the […] Read more Infrastructure for heritage institutions – first results February 24, 2020February 25, 2020 Lukas KosterData, Infrastructure, Library In July 2019 I published the post Infrastructure for heritage institutions in which I described our planning to realise a “coherent and future proof digital infrastructure” for the Library of the University of Amsterdam. Time to look back: how far have we come? And time to look forward: what’s in store for the near future? Ongoing activities I mentioned three “currently ongoing activities”:  Monitoring and advising on infrastructural aspects of new projects Maintaining a structured dynamic overview […] Read more Infrastructure for heritage institutions July 11, 2019January 11, 2020 Lukas KosterData, Infrastructure, Library During my vacation I saw this tweet by LIBER about topics to address, as suggested by the participants of the LIBER 2019 conference in Dublin: It shows a word cloud (yes, a word cloud) containing a large number of terms. I list the ones I can read without zooming in (so the most suggested ones, I guess), more or less grouped thematically: Open scienceOpen dataOpen accessLicensingCopyrightsLinked open dataOpen educationCitizen science Scholarly communicationDigital humanities/DHDigital scholarshipResearch assessmentResearch […] Read more Ten years linked open data June 4, 2016February 13, 2020 Lukas KosterData, Library This post is the English translation of my original article in Dutch, published in META (2016-3), the Flemish journal for information professionals. Ten years after the term “linked data” was introduced by Tim Berners-Lee it appears to be time to take stock of the impact of linked data for libraries and other heritage institutions in the past and in the future. I will do this from a personal historical perspective, as a library technology professional, […] Read more Maps, dictionaries and guidebooks August 3, 2015February 3, 2020 Lukas KosterData Interoperability in heterogeneous library data landscapes Libraries have to deal with a highly opaque landscape of heterogeneous data sources, data types, data formats, data flows, data transformations and data redundancies, which I have earlier characterized as a “data maze”. The level and magnitude of this opacity and heterogeneity varies with the amount of content types and the number of services that the library is responsible for. Academic and national libraries are possibly dealing with more […] Read more Standard deviations in data modeling, mapping and manipulation June 16, 2015February 3, 2020 Lukas KosterData Or: Anything goes. What are we thinking? An impression of ELAG 2015 This year’s ELAG conference in Stockholm was one of many questions. Not only the usual questions following each presentation (always elicited in the form of yet another question: “Any questions?”). But also philosophical ones (Why? What?). And practical ones (What time? Where? How? How much?). And there were some answers too, fortunately. This is my rather personal impression of the event. 
For a […] Read more Analysing library data flows for efficient innovation November 27, 2014February 14, 2020 Lukas KosterLibrary In my work at the Library of the University of Amsterdam I am currently taking a step forward by actually taking a step back from a number of forefront activities in discovery, linked open data and integrated research information towards a more hidden, but also more fundamental enterprise in the area of data infrastructure and information architecture. All for a good cause, for in the end a good data infrastructure is essential for delivering high […] Read more Looking for data tricks in Libraryland September 5, 2014January 12, 2020 Lukas KosterLibrary IFLA 2014 Annual World Library and Information Congress Lyon – Libraries, Citizens, Societies: Confluence for Knowledge After attending the IFLA 2014 Library Linked Data Satellite Meeting in Paris I travelled to Lyon for the first three days (August 17-19) of the IFLA 2014 Annual World Library and Information Congress. This year’s theme “Libraries, Citizens, Societies: Confluence for Knowledge” was named after the confluence or convergence of the rivers Rhône and Saône where the city of […] Read more Library Linked Data Happening August 26, 2014January 12, 2020 Lukas KosterLibrary On August 14 the IFLA 2014 Satellite Meeting ‘Linked Data in Libraries: Let’s make it happen!’ took place at the National Library of France in Paris. Rurik Greenall (who also wrote a very readable conference report) and I had the opportunity to present our paper ‘An unbroken chain: approaches to implementing Linked Open Data in libraries; comparing local, open-source, collaborative and commercial systems’. In this paper we do not go into reasons for libraries to […] Read more Posts navigation Older posts Profiles and social @lukask on Twitter @lukask on Mastodon My ORCID My Impactstory My Zotero My UvA profile Recent Posts Infrastructure for heritage institutions – ARK PID’s Infrastructure for heritage institutions – change of course Infrastructure for heritage institutions – first results Infrastructure for heritage institutions Ten years linked open data Maps, dictionaries and guidebooks Most Popular Posts Is an e-book a book? (8,462 views) Who needs MARC? (5,839 views) Linked Data for Libraries (5,058 views) Mobile app or mobile web? (4,561 views) User experience in public and academic libraries (4,279 views) Mainframe to mobile (3,484 views) (Discover AND deliver) OR else (3,260 views) Recent Comments Maarten Brinkerink on Infrastructure for heritage institutions Gittaca on Infrastructure for heritage institutions Libraries & the Future of Scholarly Communication at #BTPDF2 – UC3 Portal on Beyond The Library Tatiana Bryant (@BibliotecariaT) on Analysing library data flows for efficient innovation @BibliotecariaT on Analysing library data flows for efficient innovation @LizWoolcott on Analysing library data flows for efficient innovation Tags apps authority files catalog collection conferences cultural heritage data data management developer platforms discovery tools elag exlibris foaf frbr hardware identifiers igelu infrastructure innovation integration interoperability libraries library Library2.0 library systems linked data linked open data marc meetings metadata mobile next generation open data open source open stack open systems people persistent identifiers rda rdf semantic web social networking technology uri web2.0 This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. 
community-esri-com-4132 ---- Participatory Mapping with Google Forms, Google Sheets, and ArcGIS Online - Esri Community. Education Blog post by JosephKerski, Esri Frequent Contributor, 07-21-2017 06:03 AM. Labels: Higher Education, Schools (K - 12). I have been receiving questions from schools that have become "Google Schools" as well as universities and individual researchers who want to use Google Sheets in ArcGIS Online. What are the advantages of using Google Sheets (spreadsheets, really, is what they are) over using an Excel spreadsheet on your own computer? Google Sheets live in the cloud, just like ArcGIS Online, so they can be edited from any device, anywhere, and the author of the Sheet can invite others to add data to it, so they can accept input from multiple collaborators, students, and faculty. Some educators want to map data that they have input into Google Sheets. Others want to go to the next level, where multiple students or researchers edit Google Sheets in a participatory mapping or citizen science environment, and the resulting data is mapped and automatically refreshes as the data continues to be added. Both of these scenarios are possible with ArcGIS Online.
To illustrate, I created a form where students are asked, "What country have you visited?", shown below. After students fill out the form, I go to the "responses" zone in Google Forms, and access the spreadsheet that is created from the data.  Now that my data is in my Google Sheet, I access > File > Publish to the Web > and change "Web Page" to "Comma Separated Values (.csv)" file > Publish.   Then, I copy the resulting URL: Then, I access my ArcGIS Online account, open a new or existing map > Add > Add Layer from Web - CSV file > paste your URL for my Google Sheet here.   Next, I > Add Layer > I indicate which fields contain my location information (address, latitude-longitude, city/state/country combination).   That's really all there is to it!  My results are in this map linked here, and shown below: Note that I used one of the fun new basemaps in ArcGIS Online that I wrote about here. In another example, this time using cities instead of countries, see this map of the 10 most polluted and 10 least polluted large cities of the world.  Students examine spatial patterns and reasons for the pollution (or lack of it) in each city using the map and the metadata here.  I created this map by populating this Google Sheet, below.  My students could add 10 or 20 more to this sheet and their changes would be reflected in my ArcGIS Online map. Here is the map from the data, below.  For those explanatory labels, I used this custom label expression:   $feature.City + " is the #" + " " + $feature.Rank + " " + $feature.Variable and set the text color to match the point symbol color for clarity.  For more about expressions, see my blog post here. In another example, my colleague created this google sheet of some schools in India by latitude-longitude. Then she added the published content from Google to her map.  Let's explore a bit deeper.  Let's say that I wanted to visualize the most commonly visited countries among my students.  I can certainly examine the statistics from my Google form, as seen below: However, my goal is really to see this data on a map.  With the analysis tools in ArcGIS Online, this too is quickly done. The Aggregate Points tool will summarize points in polygons.  For my polygons, I added a generalized world countries map layer, and then used Aggregate Points to summarize my point data within those countries.  The result is shown below and is visible as a layer in the map I referenced above.  Another point worth noting is that you can adjust the settings of how your map interacts with your Google Sheet.  Go to the layer's metadata page, and under “Published content & settings”, select "Automatically republish when changes are made." You can set the refresh interval to, for example, 1 minute, but the actual refresh on your map may take somewhat longer because Google’s “Auto re-publish” isn’t quite "real-time".  Then do the following for the layer: Note that if you are geocoding by address (such as city/country, as I did above, or street address), the automatic refresh option is not available: To get around this challenge, I manually added the latitude-longitude values to my cities spreadsheet.  Thanks to the Measure tool in ArcGIS Online, this took less than 1 minute per city.  I simply typed in the city name in ArcGIS Online, and used the Location button under the Measure tools, clicked on the map where the city was located, and entered the resulting coordinates into my spreadsheet. For more information, see this blog essay.   
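One small, optional check that can save some head-scratching in this workflow: the "Publish to the web" link serves plain CSV, so you can read it with any scripting language before pointing ArcGIS Online at it, to confirm the header row and location fields look right (especially if you are adding latitude-longitude columns by hand for the auto-refresh case). Here is a minimal sketch in Python, assuming pandas is available; the URL is a placeholder for the link Google gives you, and the column names are only examples, not part of the original post.

import pandas as pd

# Placeholder: the URL from Google Sheets > File > Publish to the web > CSV
PUBLISHED_CSV_URL = "https://docs.google.com/spreadsheets/d/e/XXXX/pub?output=csv"

# The published link returns plain CSV, so it reads like any other CSV file.
df = pd.read_csv(PUBLISHED_CSV_URL)

print(df.head())              # eyeball the first few form responses
print(df.columns.tolist())    # confirm the header row came through

# Example: if you geocode by coordinates (needed for automatic refresh),
# check that both columns exist and hold plausible values.
for col in ("Latitude", "Longitude"):   # example column names
    if col in df.columns:
        print(col, "ranges from", df[col].min(), "to", df[col].max())
    else:
        print("Missing expected column:", col)

The same published URL is what goes into Add > Add Layer from Web > CSV file in the map viewer.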
Labels: Higher Education, Schools (K - 12). Tags: citizen science, crowdsourcing, google forms, google sheets, participatory mapping. 3 Kudos. 4 Comments. by JosephKerski, Esri Frequent Contributor, 07-26-2017 05:52 PM: Important update! Because of my experience with not being able to flip the ramp in the top 10 polluted cities map, our awesome development team added the Invert button in smart mapping. Now you don't need to write an equation and have a legend from 0 to 1. See below. Very useful indeed! --Joseph Kerski (1 Kudo) by HaleyNelson, New Contributor II, 06-12-2018 01:48 PM: This is great! Will this process work in reverse? For example, will (or can) the google sheets be automatically updated if new points are added to the map, or attributes are updated in the web map? Is this a possible workflow? For example, can I connect a feature layer to a google sheet, collect data on that feature layer in Survey123, and have this data populate in a connected Google Sheet based on the web map refresh interval? (1 Kudo) by deleted-user-0eS87ljx3Rcy, New Contributor II, 01-30-2019 09:49 AM: Does anybody know how to secure the published google sheets data? We want to bring the google sheet data to AGOL but google clearly states that data is not secured. (0 Kudos) by FlorentBigirimana, New Contributor III, 05-27-2020 07:48 AM: I have created a google sheet with some records and managed to have the data from it on my web map as a web layer. One thing I was expecting is that when values are updated from the google sheet, the value is also automatically updated on my layer in the web map. However this is not happening. What am I missing? (0 Kudos) About the Author I believe that spatial thinking can transform education and society through the application of Geographic Information Systems for instruction, research, administration, and policy. I hold 3 degrees in Geography, have served at NOAA, the US Census Bureau, and USGS as a cartographer and geographer, and teach a variety of F2F (Face to Face) (including T3G) and online courses. I have authored a variety of books and textbooks about the environment, STEM, GIS, and education. These include "Interpreting Our World", "Essentials of the Environment", "Tribal GIS", "The GIS Guide to Public Domain Data", "International Perspectives on Teaching and Learning with GIS In Secondary Education", "Spatial Mathematics" and others. I write for 2 blogs, 2 monthly podcasts, and a variety of journals, and have created over 5,000 videos on the Our Earth YouTube channel. Yet, as time passes, the more I realize my own limitations and that this is a lifelong learning endeavor and thus I actively seek mentors and collaborators.
Labels Curriculum-Learning Resources 4 Education Facilities 28 GeoInquiries 1 Higher Education 324 Informal Education 191 Licensing Best Practices 1 Pedagogy and Education Theory 107 Schools (K - 12) 294 Schools (K-12) 4 STEM 1 Students - Higher Education 148 Students - K-12 Schools 1 Success Stories 1 TeacherDesk 1 Tech Tips 3 Terms of Use Community Guidelines Community Basics Privacy Trust Center Legal Contact Esri costhonduras-hn-6426 ---- Inicio - CostHonduras Menú Empleo y Consultorías Sitios de Interés Preguntas Frecuentes Mapa del Sitio Close top bar 112 Your Adress 23 Washington DC 1234 Call us anytime 415 555 1234 Send us a mail mail@domain.com Inicio Acerca de CoST La Iniciativa Historia Estatutos Financiamiento Plan Estratégico Grupo Multisectorial CoST International Historias de Éxito Procesos CoST Divulgación SISOCS Monitoreo de Proyectos Divulgados Aseguramiento Aseguramientos Realizados Auditoría Social EASI Diplomado para Periodistas Diplomado Virtual Inscripciones Recursos Sala de Prensa Noticias Boletines Contáctenos Search Toggle navigation Infraestructura mejor valorada NOTICIAS RECIENTES Proyectos de infraestructura divulgan información bajo nuevo estándar de datos 25 enero, 2021 Las instituciones y entidades ejecutoras que publican información en los portales Sisocs.org y Siscos APP, ahora deberán divulgar información de sus proyectos bajo los lineamientos de las Contrataciones Abiertas para el Estándar de Datos sobre Infraestructura (OC4IDS). La Iniciativa de… Presentan Guía de Respuesta Rápida para No miembros de CoST, para fomentar transparencia en infraestructura en tiempo de crisis 11 enero, 2021 La Iniciativa de Transparencia en Infraestructura (CoST Internacional) presentó, en los primeros días de 2021, una Guía de Respuesta Rápida para impulsar la transparencia y la rendición de cuentas en los proyectos de infraestructira pública ejecutados en tiempos de crisis,… En el Día Internacional Contra la Corrupción, CoST presenta Manual del Índice de Transparencia en Infraestructura 16 diciembre, 2020 Honduras fue, en 2017, el país piloto donde se implementó esta herramienta El ITI no sólo considera el acceso a la información, también la calidad de la misma Para 2030 se estima se podrían ahorrar cerca de 6,000 millones de… Ver más noticias 1237 Proyectos divulgados red vial y APP 500 Personas capacitadas 5 Porcentaje divulgación información proyectos 1 Proyectos en Estudios de Aseguramiento NUESTRAS REDES SOCIALES Tweets by CostHonduras GRUPO MULTISECTORIAL Procesos CoST Divulgación Aseguramiento Auditoría Social CosT Honduras © 2019 | Todos los Derechos Reservados cyber-fsi-stanford-edu-1164 ---- FSI | Cyber | Internet Observatory - David Thiel Skip to: Skip to content Skip to navigation A program of the Cyber Policy Center, part of the Freeman Spogli Institute for International Studies. Search form Home Opportunities Projects End-to-End Encryption Takedowns Trust and Safety Virality Project attribution.news (external link) Election Integrity Project (external link) About Search form Home Opportunities Projects End-to-End Encryption Takedowns Trust and Safety Virality Project attribution.news (external link) Election Integrity Project (external link) About David Thiel David Thiel Chief Technical Officer, Stanford Internet Observatory Big Data Architect Download image Bio David is the Big Data Architect and Chief Technology Officer of the Stanford Internet Observatory. 
Prior to Stanford, David worked at Facebook, primarily focusing on security and safety for Facebook Connectivity, a collection of projects aimed at providing faster and less expensive internet connectivity to unconnected or underconnected communities. Projects included the Terragraph mesh networking system, the Magma open source mobile network platform, Express Wi-Fi and Facebook Lite. Before Facebook, David was a VP at iSEC Partners and later NCC Group, managing the North American security consulting and research team, as well as producing original security research, coordinating vulnerability disclosure and performing security assessments and penetration testing for companies across a wide range of business sectors. David has spoken at various industry conferences, including Black Hat, DEFCON, PacSec and SOURCE Boston. He is also the author of iOS Application Security (No Starch Press) and coauthor of Mobile Application Security (McGraw-Hill). Publications Combine fields filter All White Paper Contours and Controversies of Parler Stoking Conflicts by Keystroke: An Operation Run by IRA-Linked Individuals Targeting Libya, Sudan, and Syria #ZakzakyLifeMatters: An Investigation into a Facebook Operation Linked to the Islamic Movement in Nigeria An Investigation into a Female-Focused Online Campaign in Iran and Afghanistan targeting Afghans More Publications Topics Security Our Address Encina Hall 616 Jane Stanford Way Stanford University Stanford, CA 94305-6055 Navigate Research Education People Centers News Events About Follow Us General inquiries 650-723-4581 Mail Twitter Facebook Youtube Instagram   Support Us Learn more about how your support makes a difference or make a gift now Make a gift   Top Stanford Home Maps & Directions Search Stanford Emergency Info Terms of Use Privacy Copyright Trademarks Non-Discrimination Accessibility © Stanford University, Stanford, California 94305. Copyright Complaints cynthiang-ca-6115 ---- Learning (Lib)Tech – Stories from my Life as a Technologist Skip to content Learning (Lib)Tech Stories from my Life as a Technologist Menu About Me About this Blog Contact Me Twitter GitHub LinkedIn Flickr RSS UBC iSchool Career Talk Series: Journey from LibTech to Tech The UBC iSchool reached out to me recently asking me to talk about my path from getting my library degree to ending up working in a tech company. Below is the script for my portion of the talk, along with a transcription of the questions I answered. Continue reading “UBC iSchool Career Talk Series: Journey from LibTech to Tech” Author CynthiaPosted on March 5, 2021March 5, 2021Categories Events, LibrarianshipTags career growth, reflectionLeave a comment on UBC iSchool Career Talk Series: Journey from LibTech to Tech Choosing not to go into management (again) Often, to move up and get a higher pay, you have to become a manager, but not everyone is suited to become a manager, and sometimes given the preference, it’s not what someone wants to do. Thankfully at GitLab, in every engineering team including Support, we have two tracks: technical (individual contributor), and management. 
Continue reading “Choosing not to go into management (again)” Author CynthiaPosted on February 2, 2021March 5, 2021Categories Work cultureTags career growth, management, reflectionLeave a comment on Choosing not to go into management (again) Prioritization in Support: Tickets, Slack, issues, and more I mentioned in my GitLab reflection that prioritization has been quite different working in Support compared to other previous work I’ve done. In most of my previous work, I’ve had to take “desk shifts” but those are discreet where you’re focused on providing customer service during that period of time and you can focus on other things the rest of the time. In Support, we have to constantly balance all the different work that we have, especially in helping to ensure that tickets are responded to within the Service Level Agreement (SLA). It doesn’t always happen, but I ultimately try to reach inbox 0 (with read-only items possibly left), and GitLab to-do 0 by the end of the every week. People often ask me how I manage to do that, so hopefully this provides a bit of insight. Continue reading “Prioritization in Support: Tickets, Slack, issues, and more” Author CynthiaPosted on December 11, 2020December 24, 2020Categories MethodologyTags productivityLeave a comment on Prioritization in Support: Tickets, Slack, issues, and more Reflection Part 2: My second year at GitLab and on becoming Senior again This reflection is a direct continuation of part 1 of my time at GitLab so far. If you haven’t, please read the first part before beginning this one. Continue reading “Reflection Part 2: My second year at GitLab and on becoming Senior again” Author CynthiaPosted on June 17, 2020January 31, 2021Categories Update, Work cultureTags GitLab, organizational culture, reflectionLeave a comment on Reflection Part 2: My second year at GitLab and on becoming Senior again Reflection Part 1: My first year at GitLab and becoming Senior About a year ago, I wrote a reflection on Summit and Contribute, our all staff events, and later that year, wrote a series of posts on the GitLab values and culture from my own perspective. There is a lot that I mention in the blog post series and I’ll try not to repeat myself (too much), but I realize I never wrote a general reflection at year 1, so I’ve decided to write about both years now but split into 2 parts. Continue reading “Reflection Part 1: My first year at GitLab and becoming Senior” Author CynthiaPosted on June 16, 2020January 31, 2021Categories Update, Work cultureTags GitLab, organizational culture, reflectionLeave a comment on Reflection Part 1: My first year at GitLab and becoming Senior Is blog reading dead? There was a bit more context to the question, but a friend recently asked me: What you do think? Is Blogging dead? Continue reading “Is blog reading dead?” Author CynthiaPosted on May 8, 2020May 7, 2020Categories UpdateTags reflectionLeave a comment on Is blog reading dead? Working remotely at home as a remote worker during a pandemic I’m glad that I still have a job, that my life isn’t wholly impacted by the pandemic we’re in, but to say that nothing is different just because I was already a remote worker would be wrong. The effect the pandemic is having on everyone around you has affects your life. It seems obvious to me, but apparently that fact is lost on a lot of people. I’d expect that’s not the case for those who read my blog, but I thought it’d be worth reflecting on anyway. 
Continue reading “Working remotely at home as a remote worker during a pandemic” Author CynthiaPosted on May 4, 2020May 2, 2020Categories Work cultureTags remoteLeave a comment on Working remotely at home as a remote worker during a pandemic Code4libBC Lightning Talk Notes: Day 2 Code4libBC Day 2 lightning talk notes! Continue reading “Code4libBC Lightning Talk Notes: Day 2” Author CynthiaPosted on November 29, 2019Categories EventsTags authentication, big data, c4lbc, code, code4lib, digital collections, privacy, reference, teachingLeave a comment on Code4libBC Lightning Talk Notes: Day 2 Code4libBC Lightning Talk Notes: Day 1 Code4libBC Day 1 lightning talk notes! Continue reading “Code4libBC Lightning Talk Notes: Day 1” Author CynthiaPosted on November 28, 2019Categories EventsTags c4lbc, digital collections, intranet, MARC, metadata, teachingLeave a comment on Code4libBC Lightning Talk Notes: Day 1 Presentation: Implementing Values in Practical Ways This was presented at Code4libBC 2019. Continue reading “Presentation: Implementing Values in Practical Ways” Author CynthiaPosted on November 28, 2019November 28, 2019Categories Events, Work cultureTags c4lbc, organizational culture, presentation, valuesLeave a comment on Presentation: Implementing Values in Practical Ways Posts navigation Page 1 Page 2 … Page 47 Next page Cynthia Technologist, Librarian, Metadata and Technical Services expert, Educator, Mentor, Web Developer, UXer, Accessibility Advocate, Documentarian View Full Profile → Follow Us Twitter LinkedIn GitHub Telegram Search for: Search Categories Events Librarianship Library Academic Public Special Tours Methodology Project work Technology Tools Update Web design Work culture Follow via Email Enter your email address to receive notifications of new posts by email. Email Address: Follow About Me About this Blog Contact Me Twitter GitHub LinkedIn Flickr RSS Learning (Lib)Tech You must be logged in to post a comment. Loading Comments... Comment × cynthiang-ca-7810 ---- Learning (Lib)Tech Learning (Lib)Tech Stories from my Life as a Technologist UBC iSchool Career Talk Series: Journey from LibTech to Tech The UBC iSchool reached out to me recently asking me to talk about my path from getting my library degree to ending up working in a tech company. Below is the script for my portion of the talk, along with a transcription of the questions I answered. Context To provide a bit of context (and … Continue reading "UBC iSchool Career Talk Series: Journey from LibTech to Tech" Choosing not to go into management (again) Often, to move up and get a higher pay, you have to become a manager, but not everyone is suited to become a manager, and sometimes given the preference, it’s not what someone wants to do. Thankfully at GitLab, in every engineering team including Support, we have two tracks: technical (individual contributor), and management. Progression … Continue reading "Choosing not to go into management (again)" Prioritization in Support: Tickets, Slack, issues, and more I mentioned in my GitLab reflection that prioritization has been quite different working in Support compared to other previous work I’ve done. 
In most of my previous work, I’ve had to take “desk shifts” but those are discreet where you’re focused on providing customer service during that period of time and you can focus on … Continue reading "Prioritization in Support: Tickets, Slack, issues, and more" Reflection Part 2: My second year at GitLab and on becoming Senior again This reflection is a direct continuation of part 1 of my time at GitLab so far. If you haven’t, please read the first part before beginning this one. Becoming an Engineer (18 months) The more time I spent working in Support, the more I realized that the job was much more technical than I originally … Continue reading "Reflection Part 2: My second year at GitLab and on becoming Senior again" Reflection Part 1: My first year at GitLab and becoming Senior About a year ago, I wrote a reflection on Summit and Contribute, our all staff events, and later that year, wrote a series of posts on the GitLab values and culture from my own perspective. There is a lot that I mention in the blog post series and I’ll try not to repeat myself (too … Continue reading "Reflection Part 1: My first year at GitLab and becoming Senior" Is blog reading dead? There was a bit more context to the question, but a friend recently asked me: What you do think? Is Blogging dead? I think blogging the way it used to work is (mostly) dead. Back in the day, we had a bunch of blogs and people who subscribe to them via email and RSS feeds. … Continue reading "Is blog reading dead?" Working remotely at home as a remote worker during a pandemic I’m glad that I still have a job, that my life isn’t wholly impacted by the pandemic we’re in, but to say that nothing is different just because I was already a remote worker would be wrong. The effect the pandemic is having on everyone around you has affects your life. It seems obvious to … Continue reading "Working remotely at home as a remote worker during a pandemic" Code4libBC Lightning Talk Notes: Day 2 Code4libBC Day 2 lightning talk notes! Code club for adults/seniors – Dethe Elza Richmond Public Library, Digital Services Technician started code clubs, about 2 years ago used to call code and coffee, chain event, got little attendance had code codes for kids, teens, so started one for adults and seniors for people who have done … Continue reading "Code4libBC Lightning Talk Notes: Day 2" Code4libBC Lightning Talk Notes: Day 1 Code4libBC Day 1 lightning talk notes! Scraping index pages and VuFind implementation – Louise Brittain Boisvert Systems Librarian at Legislative collection development policy: support legislators and staff, receive or collect publications, many of them digital but also some digitized (mostly PDF, but others) accessible via link in MARC record previously, would create an index page … Continue reading "Code4libBC Lightning Talk Notes: Day 1" Presentation: Implementing Values in Practical Ways This was presented at Code4libBC 2019. Slides Slides on GitHub Hi everyone, hope you’re enjoying Code4libBC so far. While I’m up here, I just want to take a quick moment to thank the organizers past and present. We’re on our 7th one and still going strong. 
I hope to continue attending and see this event … Continue reading "Presentation: Implementing Values in Practical Ways" dancohen-org-6268 ---- Dan Cohen – Vice Provost, Dean, and Professor at Northeastern University Skip to the content Search Dan Cohen Vice Provost, Dean, and Professor at Northeastern University Menu About Blog Newsletter Podcast Publications Social Media CV RSS Search Search for: Close search Close Menu About Blog Newsletter Podcast Publications Social Media CV RSS What’s New Podcast Humane Ingenuity Newsletter Blog Publications © 2021 Dan Cohen Powered by WordPress To the top ↑ Up ↑ datafest-ge-59 ---- DataFest 2020 Speakers Agenda Partners About Passes Past Editions X Speakers Agenda Partners About Passes Past Editions 15-17 December Watch the recordings! Follow us: #DataFestTbilisi Online celebration for data lovers Online celebration for data lovers DataFest Tbilisi 2020 is the 4th edition of an annual international data conference happening in the vibrant capital of Georgia. This time, it will take place online and, traditionally, will bring together hundreds of data professionals from all around the world, to inspire and encourage, and to create meaningful connections.    Journalism Human Rights & Democracy Design Analytics Technology Business Speakers All Speakers Gert Franke Co-founder / Managing Director @ CLEVER°FRANKE | The Netherlands Nasser Oudjidane Co-Founder & CEO @ Intrro | UK Devendra Vyavahare Senior data engineer @ Delivery Hero | Germany Rocío Joo Statistician, Researcher, Data scientist @ University of Florida | USA Gev Sogomonian Co-founder @ AimHub | Armenia Tetyana Bohdanova Fellow @ Prague Civil Society Centre | Ukraine Anahit Karapetyan Compliance Investigator / AML Trainer @ Revolut | Poland Wael Eskandar Analyst @ Tactical Tech | Germany Erekle Magradze Director of Engineering @ MaxinAI, Associate Professor @ Ilia State University | Georgia Lasha Pertakhia Machine Learning Engineer @ MaxinAI | Georgia Dr. Divya Seernani Psychologist, Researcher, Co-organizer @ R-Ladies Freiburg | Germany Luca Borella Business Development @ TESOBE | Germany Yulia Kim Business Intelligence Manager @ GoCardless | UK Stefanie Posavec Designer, Artist, Author | UK Henrietta Ross Course leader on MA Data Visualisation @ London College of Communication | UK Varlam Ebanoidze Co-founder @ RiskTech 4 FinTech | UK Gianluigi Davassi CEO @ faire.ai | Germany Miriam Quick Data Journalist, Researcher, Author | UK Rodrigo Menegat Data Journalist | Brazil Mara Pometti Data Strategist @ IBM | Italy / UK Charles Frye Deep Learning Educator @ Weights & Biases | USA Viktor Nestulia Senior Manager @ Open Contracting Partnership | Ukraine Ana Brandusescu McConnell Foundation Professor of Practice | Canada All Speakers Duncan Geere Generative Artist & Information Designer | Sweden Lauren Klein Associate Professor @ Emory University | USA Denise Ajiri Adjunct Assistant Professor @ Columbia University | USA Pedro Ecija Serrano Head of Actuarial and Analytics @ Grant Thornton Ireland | Ireland Uli Köppen Head of AI + Automation Lab @ German Public Broadcaster | Germany Irakli Gogatishvili Head of Data Research Lab @ Bank of Georgia | Georgia Natalia Voutova Head @ Council of Europe Office in Georgia | Georgia Omar Ferwati Researcher @ Forensic Architecture | Canada Caroline Lair Founder @ The Good AI | France Shabnam Mojtahedi Sr. 
Program Manager @ Benetech | USA Bilal Mateen Clinical Technology Lead @ Wellcome Trust | UK David Mark Human Rights Adviser @ ODIHR | Poland Michela Graziani Co-founder & Product designer @ Symbolikon | Italy Sandra Rendgen Author, Visualization Strategist | Germany Adina Renner Visual Data Journalist @ Neue Zürcher Zeitung | Switzerland Evelina Judeikyte Data Analyst @ iziwork | France Evelyn Münster Data Visualization Designer | Germany Barnaby Skinner Head of Visuals @ Neue Zürcher Zeitung | Switzerland Kathy Rowell Co-Founder & Principal @ HealthDataViz | USA Jane Zhang Data Visualization Designer | Canada Frederico Pires Senior Customer Growth Manager @ UTRUST | Portugal Carlotta Dotto Senior Data Journalist @ First Draft | UK Ashish Singh Co-founder @ ScatterPie Analytics | India Don't miss the event Get your pass Organizers & Partners Questions? Contact us: hello@datafest.ge Follow us: #DataFestTbilisi Subscribe here for news: Send Made by ForSet, Wandio with datamish-com-5819 ---- Bitcoin shorts vs Longs - Click for BTC margin charts - Datamish About Bitcoin Bitcoin is the revolutionary P2P digital cash envisioned by Satoshi Nakamoto. Many attempts have been made to dethrone Bitcoin, but real connoisseurs accept no imitations. Bitcoin is referred to as digital gold with good reason. It is borderless, decentralized, censorship resistant, and open source. Trade Bitcoin The Bitcoin market is not the most volatile crypto market, but is by far the most liquid and most traded. Futures: Spot: Bitcoin development The network has been running for 10 years, but development is in no way stagnant. Bitcoin developers are some of the best in the space, and they are constantly looking for safe ways to improve and upgrade the system. Recent highlights: Segwit Lightning Network Smart contracts (via RSK) Read the Bitcoin white paper or see this page if you want to learn more about Bitcoin. 360D 180D 90D 30D 14D 7D 2D 24H 12H 6H 4H 2H 1H Dashboards Bitcoin BTC Litecoin LTC Ethereum ETH Cardano ADA Monero XMR Zcash ZEC IOTA IOT EOS Ripple XRP Help & Contact Help page Contact Share this page: Bitcoin margin data - BTC 24H Bitcoin BTCUSD 24 hour timeframe ADAGE - A low fee Cardano stake pool Price {{ price }} {{ daily_change_pct }}% From ATH {{ ath_change_pct }}% Year to date {{ ytd_change_pct }}% Longs {{ total_longs }} {{ symbol_longs_amount_pct_change }}% USD lending rate {{ usd_rate }}% USD available {{ usd_funding_available }} Shorts {{ total_shorts }} {{ symbol_shorts_amount_pct_change }}% BTC lending rate {{ symbol_rate }}% BTC available {{ symbol_funding_available }} 00:00 London 00:00 Berlin 00:00 Athens 00:00 Moscow 00:00 Dubai 00:00 Hong Kong 00:00 Beijing 00:00 Seoul 00:00 Tokyo 00:00 Melbourne 00:00 Los Angeles 00:00 Mexico City 00:00 New York Bitcoin price & total long and short interest Left Y: BTC total longs & shorts Right Y: Price USD {{ price }} Mana vs. Pain 24h health score Longs Shorts Pain {{ long_pain_24h }} {{ short_pain_24h }} Mana {{ long_mana_24h }} {{ short_mana_24h }} 7d health score Longs Shorts Pain {{ long_pain_7d }} {{ short_pain_7d }} Mana {{ long_mana_7d }} {{ short_mana_7d }} 14d health score Longs Shorts Pain {{ long_pain_14d }} {{ short_pain_14d }} Mana {{ long_mana_14d }} {{ short_mana_14d }} Longs & USD interest rate Left Y: USD daily interest rate Right Y: BTC longs Shorts & BTC interest rate Left Y: BTC daily interest rate Right Y: BTC shorts Percent longs and shorts Left Y: Percent short vs. 
long. Hedged and unhedged shorts (Left Y: BTC shorts). Today's sentiment changes, past 24h (Left Y: BTC sentiment change). Bitfinex long & short liquidations past 14d (Left Y: BTC volume liquidated). Bitfinex total long & short liquidations, timeframe (Left Y: BTC volume liquidated). Bitmex long & short liquidations past 14d (Left Y: USD/contract volume liquidated). Bitmex total long & short liquidations, timeframe (Left Y: USD/contract volume liquidated).
Longs vs. shorts (live data, Bitfinex): On the Bitcoin price chart you can see the Bitcoin price in USD (white line), total Bitcoin longs (green line), and total Bitcoin shorts (red line). Both longs and shorts are measured in BTC. On shorter timeframes (say, below one week) longs and shorts are typically almost straight lines, because they don't fluctuate much and because of Y-axis scaling. The two charts below the price chart show the same values for total longs and shorts, but capture the short-term fluctuations much better.
Pain & Mana score (live data): Pain and Mana is a health score calculated by Datamish. Both longs and shorts have a pain and a mana score. Pain is bad and mana is good. The pain score increases when traders are adding to Bitcoin positions while the market is moving against them, so they could be in for a squeeze. The mana score increases when traders are closing their Bitcoin positions while price is moving with them; they are regaining energy. A positive mana score can sometimes happen after the other side has been squeezed successfully. The pain and mana score depends on timeframe, so Datamish calculates the scores for three different timeframes: 24h, 7d, and 14d. The pain and mana score does not tell you anything you couldn't figure out for yourself by looking at the price chart, the longs chart, and the shorts chart. If you want to learn more about how the pain and mana score works, go to one of the three timeframes and consider how price, shorts, and longs have developed within that timeframe.
Longs and USD interest rate (live data, Bitfinex): On the chart you can see the daily interest rate for USD (grey line, left Y) and total longs measured in Bitcoin (green line, right Y). Changes in long positions are important to consider. Increasing longs express a bullish sentiment, and decreasing longs express a bearish sentiment. If the USD interest rate is high, traders are less likely to borrow USD to go long Bitcoin. The interest rate can be pushed up if there is little funding available, so it is a good idea to keep an eye on both interest rates and available funding. At the top of the page there is a section where you can see how much USD funding is available. The risk of liquidation means that margin traders are "weak hands" who can easily be shaken out of their positions. If there are too many longs, this can result in a long squeeze.
Shorts and Bitcoin interest rate (live data, Bitfinex): On the chart you can see the daily interest rate for Bitcoin (grey line, left Y) and total shorts measured in BTC (red line, right Y). If the Bitcoin interest rate is high, traders are less likely to borrow Bitcoin to go short. The interest rate can be pushed up if there is little Bitcoin funding available, so that is worth considering.
At the top of the page there is a section where you can see how much Bitcoin funding is available. Changes in short positions are important to consider. If shorts increase then sentiment is bearish, and if shorts decrease then bearish sentiment is decreasing. The risk of liquidation means that margin traders are "weak hands" that can easily be shaken out of their positions. If there are too many shorters then that can lead to a short squeeze. Esc to close Percent longs and shorts × Live data BFX This chart shows the distribution of longs and shorts as a percentage of the total margin interest, and tracks how this distribution has changed over time. Esc to close Hedged and unhedged shorts × Live data BFX On this chart you can see: Yellow line: The amount of BTC shorts that are known to be hedged. Red line: The amount of BTC shorts that are unhedged (or rather not known to be hedged). Adding hedged and unhedged shorts gives you the total amount of shorts. Sometimes you will see a sudden and substantial drop in the total amount of shorts that has no effect on price. This can seem surprising because you usually complete a trade when you close a short position. The explanation is that the closed short position was hedged. In other words the trader that closed his position did not need to go into the market to buy cover when the position was closed. Esc to close Todays sentiment changes × Live data BFX On this chart you can see sentiment changes for the past 24 hours (the timeframe is fixed). Essentially the chart is reflecting how much BTC has been added or removed on the short side (red line) and how much BTC that has been added or removed on the long side (green line). If the green line is above the red line then sentiment can be said to be more bullish than bearish. Likewise sentiment can be said to be more bearish than bullish if the red line is above the green line. Esc to close Bitcoin liquidations on Bitfinex past 14d × Live data BFX This chart shows the volume liquidated each day for the past two weeks (timeframe is fixed). Short liquidations are green, and long liquidations are red. Bitcoin Liquidations on Bitfinex are measured in BTC. For Bitcoin and Ethereum the charts include liquidation data from both spot AND futures exchanges. Esc to close Total long & short Bitcoin liquidations on Bitfinex (timeframe) × Live data BFX This chart shows the total BTC volume liquidated for the selected timeframe. Short liquidations are green, and long liquidations are red. Liquidations on Bitfinex are measured in BTC so this is what we have on the Y-Axis. Above each bar you can see how many positions has been liquidated in total. For Bitcoin and Ethereum the charts include liquidation data from both spot AND futures exchanges. Esc to close Bitcoin liquidations on Bitmex for the past 14d × Live data BITMEX This chart shows the volume liquidated for the Bitcoin-USD trading pair each day for the past two weeks (timeframe is fixed). Short liquidations are green, and long liquidations are red. Liquidations on Bitmex are measured in contracts. Each contract is 1 USD. Esc to close Total Bitcoin liquidations on Bitmex (timeframe) × Live data BITMEX This chart shows the total volume liquidated for the selected timeframe. Short liquidations are green, and long liquidations are red. Bitcoin liquidations on Bitmex are measured in contracts (1 USD), so this is what we have on the Y-Axis. Above each bar you can see how many positions have been liquidated in total. 
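Datamish does not publish the pain and mana formula, so the description above is qualitative. Purely as a toy illustration of the idea (my own simplification, not the site's actual calculation), "adding to a position while price moves against you" and "closing a position while price moves with you" can be turned into numbers like this, with the 24h, 7d and 14d scores simply using different slices of the series:

```python
# Toy pain/mana score for one side of the market (my own simplification; Datamish's
# actual formula is not published). "Pain" accrues when traders add to positions while
# price moves against them; "mana" accrues when they close positions while price moves
# with them.

def pain_and_mana(prices, positions, side="long"):
    """Return (pain, mana) for a series of prices and open positions (in BTC)."""
    pain = mana = 0.0
    for i in range(1, len(prices)):
        price_change = prices[i] - prices[i - 1]
        pos_change = positions[i] - positions[i - 1]
        # For longs, a falling price is adverse; for shorts, a rising price is adverse.
        adverse = price_change < 0 if side == "long" else price_change > 0
        if pos_change > 0 and adverse:
            pain += pos_change * abs(price_change)
        elif pos_change < 0 and not adverse:
            mana += -pos_change * abs(price_change)
    return pain, mana

# Example: longs added into a falling market accrue pain; longs closed into a rally accrue mana.
prices = [58000, 57500, 57000, 57800, 58500]
longs  = [30000, 30400, 30900, 30700, 30500]
print(pain_and_mana(prices, longs, side="long"))
```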
Esc to close datosabiertospj-eastus-cloudapp-azure-com-7111 ---- Estándar de Datos de Contrataciones Abiertas (OCDS) - Conjuntos de datos - Datos Abiertos del Poder Judicial de Costa Rica Ir al contenido Iniciar Sesión Registro Conjuntos de datos Organizaciones Grupos Acerca de Buscar conjuntos de datos Inicio Organizaciones Poder Judicial de Costa Rica Estándar de Datos de ... Estándar de Datos de Contrataciones Abiertas (OCDS) Seguidores 0 Organización Poder Judicial de Costa Rica Poder Judicial de Costa Rica leer más Social Twitter Facebook Licencia Creative Commons Attribution Conjunto de datos Grupos Flujo de Actividad Estándar de Datos de Contrataciones Abiertas (OCDS) Estándar de Datos de Contrataciones Abiertas (OCDS) Datos y Recursos Estándar de Datos de Contrataciones Abiertas ...ZIP Estándar de Datos de Contrataciones Abiertas (OCDS) - Masivo Explorar Más información Ir al recurso Estándar de Datos de Contrataciones Abiertas ...JSON Estándar de Datos de Contrataciones Abiertas (OCDS) - 2018 Explorar Más información Ir al recurso Estándar de Datos de Contrataciones Abiertas ...JSON Estándar de Datos de Contrataciones Abiertas (OCDS) - 2019 Explorar Más información Ir al recurso Estándar de Datos de Contrataciones Abiertas ...JSON Estándar de Datos de Contrataciones Abiertas (OCDS) - 2020 Explorar Más información Ir al recurso Estándar de Datos de Contrataciones Abiertas ...JSON Estándar de Datos de Contrataciones Abiertas (OCDS) - 2021 Explorar Más información Ir al recurso Información Adicional Campo Valor Autor Poder Judicial Mantenedor Poder Judicial Versión 1.0 Última actualización 28 Enero, 2021, 23:03 (UTC) Creado 3 Agosto, 2020, 23:21 (UTC) Lineamiento de publicación de datos abiertos del Poder Judicial de Costa RicasegúnOpen Contracting Data Standard (OCDS) https://proveeduria.poder-judicial.go.cr/images/Documentos/Lineamientos_Open_Contracting_PJCRC_version_final_REV_JU_Innovaapv1.pdf Acerca de Datos Abiertos del Poder Judicial de Costa Rica API CKAN CKAN Association Gestionado con CKAN Idioma español English português (Brasil) 日本語 italiano čeština (Česká republika) català français Ελληνικά svenska српски norsk bokmål (Norge) slovenčina suomi русский Deutsch polski Nederlands български 한국어 (대한민국) magyar slovenščina latviešu Tiếng Việt srpski (latinica) 中文 (简体, 中国) فارسی (ایران) ខ្មែរ English (Australia) українська (Україна) नेपाली galego shqip עברית македонски ไทย українська Indonesia Türkçe español (Argentina) hrvatski íslenska dansk (Danmark) монгол (Монгол) العربية română português (Portugal) lietuvių 中文 (繁體, 台灣) Filipino (Pilipinas) Ir davidgerard-co-uk-1719 ---- News: vanishing NFTs, Free Keene not so free, Coinbase wash-trading Litecoin – Attack of the 50 Foot Blockchain Skip to content Attack of the 50 Foot Blockchain Blockchain and cryptocurrency news and analysis by David Gerard About the author Attack of the 50 Foot Blockchain: The Book Book extras Business bafflegab, but on the Blockchain Buterin’s quantum quest Dogecoin Ethereum smart contracts in practice ICOs: magic beans and bubble machines Imogen Heap: “Tiny Human”. 
Total sales: $133.20 Index Libra Shrugged: How Facebook Tried to Take Over the Money My cryptocurrency and blockchain commentary and writing for others Press coverage: Attack of the 50 Foot Blockchain Press coverage: Libra Shrugged Table of Contents The conspiracy theory economics of Bitcoin The DAO: the steadfast iron will of unstoppable code Search for: Main Menu News: vanishing NFTs, Free Keene not so free, Coinbase wash-trading Litecoin 29th March 202111th April 2021 - by David Gerard - Leave a Comment I have printed copies of Libra Shrugged and Attack of the 50 Foot Blockchain here — if you’d like to get yourself copies of the books signed by the author, go to this post and see how much to PayPal me. You can support my work by signing up for the Patreon — a few dollars every month ensures the continuing flow of delights. It really does help. [Patreon] I added a $100/month Corporate tier to the Patreon — you get early access to stories I’m working on, and the opportunity to ask your blockchain questions and have me answer! You get that on the other tiers too — but the number is bigger on this tier, and will look more impressive on your analyst newsletter expense account. [Patreon] And tell your friends and colleagues to sign up for this newsletter by email! [scroll down, or click here] Prole art threat I’m an “is it art?” maximalist. NFTs can be used for creative artistic value — just as anything can. The creator and the buyers are playing a game together; there can be genuine appreciation and participation there. I’m not gonna tell ’em they’re wrong. Of course, it may be art, but it can also be a reprehensible scam. The serious problems with the wider NFT market remain. And when the KLF burnt a million quid, they only set it on fire once. If you think of the most absolutely inept and trash-tier way of performing any real-world function, then crypto will reliably not meet even that bar. The pictures for NFTs are often stored on the Interplanetary File System, or IPFS. Blockchain promoters talk like IPFS is some sort of bulletproof cloud storage that works by magic and unicorns. But functionally, IPFS works the same way as BitTorrent with magnet links — if nobody bothers seeding your file, there’s no file there. Nifty Gateway turn out not to bother to seed literally the files they sold, a few weeks later. [Twitter; Twitter] How does the OpenSea NFT platform deal with copyright violations? They keep the unfortunate buyer’s money — and tell them they should have done their own research. (Story by Ben Munster.) [Vice] Beeple has made the wisest play in the NFT game — he got the $60 million in ether for his JPEG, and sold it for dollars immediately. [New Yorker]   woopsie pic.twitter.com/sAQBee0YJ5 — Kim Parker (@thatkimparker) March 28, 2021   This is Radio Freedom Activists from the Free Keene movement, who seek to turn Keene, New Hampshire into a Libertarian paradise, are being ground under the statist jackboot — just for using sound money on a website! Well, running a money transmission business — specifically, exchanging cryptocurrency for actual money — that wasn’t “licensed” by the bureaucratic oppressors who hate freedom. And something about opening bank accounts in the names of churches — “The Shire Free Church”, “The Crypto Church of NH”, “The Church of the Invisible Hand”, and “The Reformed Satanic Church” — and pretending that the money coming in was tax-deductible religious donations. The usual governmental overreach. 
[Justice Department; Patch; indictment, PDF; case docket] Ian Freeman (On The Land?) had $1.6 million worth of bitcoins, and $178,000 in a safe, when the FBI raided the house where the arrestees lived — which was owned by “Shire Free Church Monadnock.” The same house was raided by the FBI in 2016 — on an investigation into child pornography. Must be one of those coincidences. [Keene Sentinel, archive; Union Leader, archive] For those whose day isn’t complete without some cheering Cantwell News — you know who you are — this particular bunch are all ex-friends of Chris “The Crying Nazi” Cantwell, who moved to Keene specifically to join Free Keene. The recently-arrested activists now claim Cantwell was never part of Free Keene, but that’s completely false — they showed up as moral support to Cantwell’s recent trial on threats of rape, only to throw him under the bus when he was convicted. [Manchester Ink Link] In my 2019 Foreign Policy piece on the ways neo-Nazis used Bitcoin, this bit at the end was about Cantwell: [Foreign Policy, 2019] One neo-Nazi podcaster found a credit card processor that was fine with the content of his show but said he was untouchable for another reason: He was considered a money laundering risk because he dealt in cryptocurrency. One story that didn’t get into that piece is how Cantwell got out of jail after the Unite The Right neo-Nazi rally in Charlottesville, North Carolina in 2017 and bought up big into Bitcoin! … right at the December peak of the 2017 bubble. He lost so much money on Bitcoin that he had to sell his guns to pay his lawyer.   every crypto vision of the future is trying to take a technology developed for hyperadversarial contexts and being like Let's build a society on this. like saying all transit should take place in armored tanks, or all interpersonal disputes should go through full legal discovery — stephanie (@isosteph) March 10, 2021   Lie dream of a casino soul Coinbase has had to pay a $6.5 million fine to the CFTC for allowing an unnamed employee to wash-trade Litecoin on the platform. On some days, the employee’s wash-trading was 99% of the Litecoin/Bitcoin trading pair’s volume. Coinbase also operated two trading bots, “Hedger and Replicator,” which often matched each others’ orders, and reported these matches to the market. [press release; order, PDF] CFTC commissioner Dawn Stump issued an opinion that concurred with the stated facts, but disputes that the issue was within CFTC’s jurisdiction, and says that the reporting didn’t affect the market. This appears not to be the case — it did affect the markets that depended on Coinbase’s numbers. [CFTC; New Money Review] Coinbase’s direct listing public offering has been pushed back at least to April — no reason given, but doubtless coincidental with Coinbase getting caught letting an employee run wild wash-trading on the exchange. [Bloomberg Quint] If Coinbase — one of the more regulated exchanges — did this, just think what the unregulated exchanges get up to. Bloomberg reports a CFTC probe into Binance, and whether the non-US exchange had US customers — attributed to unnamed “people familiar with the matter.” There doesn’t seem to be further news on this one as yet. [Bloomberg] Ben Delo and Arthur Hayes from BitMEX will be surrendering to US authorities to face the Department of Justice charges against them. [Bloomberg; Twitter] Bennett Tomlin summarises what Bitfinex/Tether executives did before Bitfinex or Tether. 
[blog post]   Said differently – unfortunately Coinbase requires its customers to retain counsel to get customer service… — David Silver (SILVER MILLER) (@dcsilver) March 29, 2021   Baby’s on fire Alex de Vries (Digiconomist) has a study published in Joule on what the rising Bitcoin price means for the Bitcoin network’s energy consumption. He thinks the Bitcoin network could already ues as much energy as every other data centre in the world — with a carbon footprint the size of London. [Joule] “Coin miners have basically added a province’s worth of electricity consumption without adding a province’s worth of economic output, so Bitcoin mining is actually a net drag on the economy as a whole,” Tim Swanson told Al Jazeera. [Al Jazeera] In late 2017, Benjamin Reynolds of Control-Finance Ltd ran a Bitcoin investment scam in the UK. The CFTC, in association with the FCA, now have a $571 million default judgement against him. The hard part: finding him. [press release] New Bitcoin use case found! Selling fake insider trading tips on the dark web. [SEC; complaint, PDF]   An entire generation (or maybe just a cargo cult on twitter/reddit) read the inflation chapter of an econ textbook, panicked & stopped before they read the rest. Maybe the fed should do some PSAs or something. Pay @cullenroche or @TheStalwart to do a youtube series. — Adam Singer (@AdamSinger) March 29, 2021   Be less Brenda The Advertising Standards Authority (UK) has finally acted against an ad for Bitcoin — in this case, a Coinfloor ad running in local papers, featuring a woman buying bitcoins with a third of her pension. The complainant said the ad was: misleading, because it failed to make clear the risks associated with Bitcoin investments, including loss of capital, and that neither Coinfloor Ltd nor the general Bitcoin market were regulated in the UK; and socially irresponsible, because it suggested that purchasing Bitcoin was a good or secure way to invest one’s savings or pension. The ASA upheld both objections. [ASA]   In 1955, a McDonald's hamburger cost $0.15. Today, they're worth $2.50 each. If you had bought 400,000 of them for just $60,000 and never sold, those burgers would be worth $1,000,000 today.#investing #CFA #compounders — abstractify 📚 (@abstractify) March 20, 2021   Carpe Diem Facebook’s Diem applied for a money transmitter licence to FINMA, the Swiss regulator, in April 2020 — back when it was still called Libra. The application is still pending, nearly a year later. FINMA apparently has internal disagreements on whether to let Diem go forward — and they know they absolutely need this to be okay with regulators in the US and EU before they proceed. [SRF, audio in German; Twitter] Kevin Weil, one of Libra/Diem’s four founders, and co-author of the Libra white paper, has quit Facebook Finance. He’s moving to satellite surveillance startup Planet.com. “I’m beyond excited to be working on a non-zombie project,” Weil didn’t quite say. [Twitter; Planet] I’m wondering how long before David Marcus gets bored running WhatsApp Pay and wanders off too. There’s still active contributions to the Diem GitHub repo, if only from Facebook staff. [GitHub] The East Caribbean Central Bank is launching its DCash CBDC pilot on 31 March. [ECCB, archive] The European Central bank has blogged on their plans for a digital euro! That is: no specific plans whatsoever, and repeated reassurances that they’re not about to replace cash, impose negative interest rates, or push out the commercial banks. 
And they don’t have a consumer use case as yet. [ECB]   Facebook's strategy for protecting their crypto projects from regulators is to rename the project and cycle out all the executives every 6 months so that no regulator can possibly remember if "that Libra or Diem thing" is still around — Kyle S. Gibson (@KyleSGibson) March 18, 2021   ICO, ICO Telegram’s ICO failed so hard that founder Pavel Durov ended up owing $500 million to investors — specifically, the sort of investors who have robust ideas on how to deal with perceived shenanigans. “Pavel’s got a smart team, I’m sure they’ll come up with something,” said one creditor. Durov announced in December that Telegram would start running advertising in public channels. [Telegram] Now Durov has announced a $1 billion bond issue. [Telegram] He is delighted to share that he can finally pay back the guys who put money into the ICO, and that he will continue to enjoy the use of his limbs. SEC’s action against Ripple Labs, claiming XRP is a security, continues — and so far, they’re still sniping over what the case will cover: The SEC asks to strike Ripple’s Fourth Affirmative Defense, “Lack of Due Process and Fair Notice”; Ripple complains that the SEC won’t submit documents in discovery on what it thinks of Bitcoin and Ethereum; Ripple executives Brad Garlinghouse and Christian Larsen ask to quash the SEC subpoenas to look into their personal bank accounts; and John Deaton, representing a group calling itself the XRP Holders, wishes to join the case on the grounds that the SEC has damaged the value of their XRP. Much of this will be dealt with in pleadings to be filed over April, May and June. [Case docket, with linked PDFs] Trailofbits has been fuzz-testing the compiler for Solidity — the language most blockchain smart contracts are written in — for bugs and vulnerabilities. [Trailofbits]   We do have proof that the FTC did, in fact, say “Buttcoin”https://t.co/5eywXuXsO2 https://t.co/QaxYL9OfYg — Buttcoin (@ButtCoin) March 24, 2021   Things happen The crypto ban in India looks set to go ahead, penalising miners and traders — “Officials are confident of getting the bill enacted into law as Prime Minister Narendra Modi’s government holds a comfortable majority in parliament.” You’ll have six months to liquidate your holding. [Reuters] In the meantime, Indian companies will have to disclose their crypto holdings in their profit-and-loss and balance sheets. [Ministry of Corporate Affairs, PDF; Finance Magnates] How’s Reddit’s subforum crypto token experiment going? Well, /r/cryptocurrency is now pay-to-post — 1000 MOON tokens a month, or $5. You can imagine my surprise at seeing the scheme end up being run as a scam to enrich local forum moderators. [Reddit] Visa moves to allow payment settlements using dollar-substitute stablecoin USDC, in a pilot programme with Anchorage and Crypto.com: “Visa has launched a pilot that allows Crypto.com to send USDC to Visa to settle a portion of its obligations for the Crypto.com Visa card program.” The size of the “portion” is not specified. Visa also tweeted some non-detail details. [press release; Reuters; Twitter] Former SEC chair Jay Clayton has his first post-SEC crypto consulting gig — as an advisor to One River Digital Asset Management. [press release]   I have digitized your plums and sold them although, strictly speaking, I sold a hash of a URL to a JSON file describing your plums in perpetuity or, for as long as https://t.co/3jkcTOHQBo stays in business the plums themselves? 
i burned them forgive me they made a lot of smoke — Ian Holmes (@ianholmes) March 17, 2021   Living on video I did a ton of media on NFTs in the past month, including the BBC’s explainer: What are NFTs and why are some worth millions? “The same guys who’ve always been at it, trying to come up with a new form of worthless magic bean that they can sell for money.” [BBC] Business Insider writes on NFTs, quoting me — and the Independent quotes Business Insider quoting me. [Business Insider, Independent] I went on the Coingeek Conversations podcast again, to talk about NFTs with Josh Petty, a.k.a. Elon Moist of Twetch. We ended up agreeing on most stuff — that you can definitely do good and fun things with NFTs, but the present mainstream market is awful. [Coingeek] I don’t yet know of anyone busted for money-laundering through NFTs, but it’s the obvious use case for objects of purely subjective value being traded in an art market at the speed of crypto. Crypto News has an article, with quotes from me. [Crypto News] I was interviewed on NTD about NFTs: Expert Warns About NFT Digital Crypto Art. [NTD] Kenny Schachter from Artnet writes about NFTs. He’s an art professor, and very much into the potential of NFTs, but he was great to talk to about this stuff. [Artnet] I can’t name it until it airs — they’re worried about their competition sniping them — but I recorded a segment this evening on NFTs for a TV show with quite a large and important audience. Should be out tomorrow, or maybe the day after. Someone sold a house for $3.3 million in bitcoins. I went on a TV segment about it, to explain what the heck a bitcoin is. [video; transcript] Sky News Arabia has a 28-minute bitcoin documentary, with me in — my bits are 15:19–15:34, 16:42–17:13 (holding up one book backwards) and 17:42–18:12. It’s all in Arabic, so I have no idea of its quality, but they’re part of the sane Sky News (UK), not the crazy one (Australia). I’m told the voiceover translations of my bits are accurate. [YouTube] I talked about celebrity crypto scams on NTD — the Elon Musk scams on Twitter, and the Instagram influencer who conned his followers out of bitcoins. Had to use the laptop camera, but ehh, it gave usable results. My segment starts 17:34. [YouTube] Not cryptocurrency related — that’s coming later, when we do the “Bitcoin Nazis” episode — but I’m on the podcast I Don’t Speak German, talking to a couple of antifa commies about Scott Alexander, author of the Intellectual Dark Web rationalist blog Slate Star Codex. I Don’t Speak German is mostly about neo-Nazis and white nationalists, and Slate Star Codex isn’t really that — but Scott Alexander is a massive and explicit fan of eugenics, “human biodiversity” (scientific racism), sterilising those he sees as unfit, and the neoreactionary movement, so that was close enough for our purposes. (For cites on all those claims, listen to the podcast.) It was a fun episode. Also appearing is Elizabeth Sandifer, author of Neoreaction a Basilisk (UK, US), and the person responsible for me starting Attack of the 50 Foot Blockchain. [I Don’t Speak German] Hint for crypto video media: when sending a query, say who the hosts and all the guests are, and what the format is. The media arm of one crypto news site that’s definitely large enough to know better nearly (inadvertently) sprang an ambush live debate on me, until I questioned more closely. Don’t be the outlet that your prospective subjects warn each other about.   
Rare that a single tweet so perfectly encapsulates everything that makes my skin crawl about SF Bay Area's moneyed, whitebread techie monoculture. Truly the most cursed thing I have seen in recent memory. https://t.co/8eCWPGlu1O — KC 🏴 (@KdotCdot) March 18, 2021   Check out my technical analysis on the stuck boat, big breakout incoming. Should be unstuck any time now. Very bullish pic.twitter.com/u2BknxUyqT — G. Kennedy Fuld Jr., CFA, MBA, ChEA, FRM (@MemberSee) March 25, 2021   Your subscriptions keep this site going. Sign up today! Share this: Click to share on Twitter (Opens in new window) Click to share on Facebook (Opens in new window) Click to share on LinkedIn (Opens in new window) Click to share on Reddit (Opens in new window) Click to share on Telegram (Opens in new window) Click to share on Hacker News (Opens in new window) Click to email this to a friend (Opens in new window) Taggedanchoragearthur hayesasabeepleben delobenjamin reynoldsbinancebitfinexbitmexcftcchristopher cantwellcoinbasecoinfloorcontrol-financecrypto.comdawn stumpdcashdiemdigiconomistecbeccbfacebook financefinmafree keeneian freemanicoindiaipfsjay claytonkevin weillinkslitecoinnftnifty gatewayone riveropenseapavel durovredditripplesecsolidityswitzerlandtelegramtethertim swansontrailofbitsusdcvisaxrp Post navigation Previous Article Quadriga documentary ‘Dead Man’s Switch’ — the trailer is out Next Article Tether produces a new attestation — it says nothing useful Leave a Reply Cancel reply Your email address will not be published. Required fields are marked * Comment Name * Email * Website Notify me of follow-up comments by email. Notify me of new posts by email. This site uses Akismet to reduce spam. Learn how your comment data is processed. Search for: Click here to get signed copies of the books!   Get blog posts by email! Email Address Subscribe Support this site on Patreon! Hack through the blockchain bafflegab: $5/month for early access to works in progress! $20/month for early access and even greater support! $100/month corporate rate, for your analyst newsletter budget! Buy the books! Libra Shrugged US Paperback UK/Europe Paperback ISBN-13: 9798693053977 Kindle: UK, US, Australia, Canada (and all other Kindle stores) — no DRM Google Play Books (PDF) Apple Books Kobo Smashwords Other e-book stores Attack of the 50 Foot Blockchain US Paperback UK/Europe Paperback ISBN-13: 9781974000067 Kindle: UK, US, Australia, Canada (and all other Kindle stores) — no DRM Google Play Books (PDF) Apple Books Kobo Smashwords Other e-book stores Available worldwide  RSS - Posts  RSS - Comments Recent blog posts News: Coinbase goes public, Bitcoin hashrate goes down, NFTs go down, proof-of-space trashes hard disk market Stilgherrian: The 9pm Dumb Anarcho-Capitalist Blockchain Scams with David Gerard Podcast: I Don’t Speak German #85: Crypto Fascists, with David Gerard Desperate investors, neoliberalism and Keynes: how to increase returns New York’s Excelsior Pass for COVID-19, on IBM Blockchain: doing the wrong thing, badly Excerpts from the book Table of Contents The conspiracy theory economics of Bitcoin Dogecoin Buterin’s quantum quest ICOs: magic beans and bubble machines Ethereum smart contracts in practice The DAO: the steadfast iron will of unstoppable code Business bafflegab, but on the Blockchain Imogen Heap: “Tiny Human”. 
Total sales: $133.20 Index
davidgerard-co-uk-2274 ---- NFTs: crypto grifters try to scam artists, again – Attack of the 50 Foot Blockchain 11th March 2021 - by David Gerard - 15 Comments. Non-fungible tokens, or NFTs, are the crypto hype for 2021 — since DeFi ran out of steam in 2020, and Bitcoin's pumped bubble seems to be deflating. The scam is to sell NFTs to artists as a get-rich-quick scheme, to make life-changing money. There's a gusher of money out there! You just create a token! And any number of crypto grifters would be delighted to assist you. For a small consideration. It's con men with a new variety of magic beans to feed the bubble machine — and artists are their excuse this time. The NFT grift works like this: Tell artists there's a gusher of free money! They need to buy into crypto to get the gusher of free money. They become crypto advocates, and make excuses for proof-of-work and so on. A few artists really are making life-changing money from this! You probably won't be one of them. In a nicer, happier world, NFTs would be fun little things you could make and collect and trade, and it'd be great. It's a pity this is crypto. What is an NFT? An NFT is a crypto-token on a blockchain. The token is virtual — the thing you own is a cryptographic key to a particular address on the blockchain — but legally, it's property that you can buy, own or sell like any other property. Most crypto-tokens, such as bitcoins, are "fungible" — e.g., you mostly don't care which particular bitcoins you have, only how much Bitcoin you have. Non-fungible tokens are a bit different. Each one is unique — and can be used as an identifier for an individual object. The NFT can contain a web address, or maybe just a number, that points somewhere else.
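As a toy illustration of that pointer idea (a deliberate simplification; a real NFT is an entry in an Ethereum contract, not a Python dictionary), the whole data structure amounts to a mapping from a token ID to an owner address and a URL:

```python
# Deliberately simplified model of an NFT ledger: a token is just an ID that maps to
# an owner address and a URL. Nothing about the artwork itself lives in the ledger.
from dataclasses import dataclass

@dataclass
class Token:
    owner: str         # address of whoever holds the key
    metadata_url: str  # pointer to a JSON file or image on some website

class ToyNFTLedger:
    def __init__(self):
        self.tokens: dict[int, Token] = {}
        self.next_id = 1

    def mint(self, owner: str, metadata_url: str) -> int:
        """Create a new token that merely records a pointer; return its ID."""
        token_id = self.next_id
        self.tokens[token_id] = Token(owner, metadata_url)
        self.next_id += 1
        return token_id

    def transfer(self, token_id: int, new_owner: str) -> None:
        """Selling the NFT changes the owner field, and nothing else."""
        self.tokens[token_id].owner = new_owner

ledger = ToyNFTLedger()
tid = ledger.mint("0xARTIST", "https://example.com/art/123.json")  # hypothetical addresses and URL
ledger.transfer(tid, "0xCOLLECTOR")
print(ledger.tokens[tid])  # the buyer owns a pointer; the file behind the URL can change or vanish
```

If the website behind metadata_url changes or disappears, the token still exists, but it points at nothing, which is the "easily changeable" problem noted in the next paragraph.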
An NFT is just a pointer. If the place the NFT points to is a site that claims to sell NFTs that represent artworks — then you have what’s being called crypto-art! Note that it’s only the token that’s non-fungible — the art it points to is on a website, under centralised control, and easily changeable. When I buy an NFT, what do I get? The art itself is not in the blockchain — the NFT is just a pointer to a piece of art on a website. You’re buying the key to a crypto-token. You’re not buying anything else. An NFT doesn’t convey copyright, usage rights, moral rights, or any other rights, unless there’s an explicit licence saying so. It’s like a “Certificate of Authenticity” that’s in Comic Sans, and misspelt. At absolute best, you’re buying a piece of official merchandise — one that’s just a number pointing to a website. Why is an NFT? NFTs exist so that the crypto grifters can have a new kind of magic bean to sell for actual money, and pretend they’re not selling magic beans. The purpose of NFTs is to get you to give your money to crypto grifters. When the grifter has your money, the NFT has done its job, and none of the fabulous claims about NFTs need to work or be true past that point. NFTs are entirely for the benefit of the crypto grifters. The only purpose the artists serve is as aspiring suckers to pump the concept of crypto — and, of course, to buy cryptocurrency to pay for “minting” NFTs. Sometimes the artist gets some crumbs to keep them pumping the concept of crypto. CryptoKitties, in late 2017, was the first popular NFT. CryptoKitties was largely fueled by bored holders of ether — the cryptocurrency for Ethereum — spending their ether, that they had too much of to cash out easily, on some silly toys that they traded amongst themselves. Since then, various marketers have tried to push the idea along. People pay real money for hats in video games, don’t they? Then surely they’ll buy crypto tokens that allegedly represent their favourite commercial IP! These mostly haven’t taken off. The first real success is NBA Top Shots, where you buy an official NBA-marketed token that gives you a website trading card of a video snippet. This has taken off hugely. NBA Top Shots has its own issues, which I’ll probably deal with in a later post. DeFi pumpers tried pushing NFTs in October last year, but they couldn’t get the idea to stick. The recent Bitcoin bubble feels like it’s running out of steam — so they’re pushing the NFT idea again, and pumping it hard. With NBA Top Shots and some heavily promoted big-money alleged sales, crypto art NFTs are hitting the headlines. How do I make an NFT? If you aren’t a technically-minded blockchain enthusiast, there are websites where you can “mint” an NFT. First, you need to buy some ether. This covers the transaction fee to make your NFT. You’ll need Ethereum wallet software, probably Metamask, which is a browser extension. How much do you need? Well, guess and hope you’re lucky. Ethereum transaction fees peaked at $40 per transaction in February. Lots of poor artists have tried making NFTs and lost over $100 they really couldn’t spare — so guess high! You might notice that this looks a lot like a vanity gallery scam, or pay-to-play. You’d be correct — the purpose is to suck your precious actual-money into the crypto economy. Connect your Ethereum wallet to one of the NFT marketplaces. Upload your file and its description. You have created a token! Now you need to hope a bored crypto holder will buy it. What is “digital ownership”? 
Without a specific contract saying otherwise, an NFT does not grant ownership of the artwork it points to in any meaningful sense. All implications otherwise are lies to get your money. This is the “registration scam” — like selling your name on a star, or a square foot of land on the moon. Musicians will know the “band name registry” scam, where the scammer sells something that they imply will work like a trademark on your name — but, of course, it doesn’t. (There have been multiple “register your band name on a blockchain” scams.) Crypto grifters will talk about “digital ownership.” This is meaningless. The more detail you ask for what actual usable rights this “ownership” conveys, the vaguer the claims will get. The whole idea of Bitcoin was property unconfiscatable by the government, that they could use as money. Instead of a framework of laws and rights, they’d use … a blockchain! This notion is incoherent and stupid on multiple levels — money is a construct agreed upon in a society, property rights are a construct of law and social expectations — but it’s also what the bitcoiners believe and what they wanted. NFTs try to justify themselves with variations on this claim as the marketing pitch. Christie’s auction of an NFT is a fabulous worked example. There’s a 33-page terms and conditions document, and if you wade through the circuitous verbiage, it finally admits that … you’re just buying the crypto-token itself: [Christie’s, PDF, archive] You acknowledge that ownership of an NFT carries no rights, express or implied, other than property rights for the lot (specifically, digital artwork tokenized by the NFT). … You acknowledge and represent that there is substantial uncertainty as to the characterization of NFTs and other digital assets under applicable law. The magic bean in question is bidding at $13 million as I write this, which means Christie’s stands to make about $2 million commission. Pretty good payday for a cryptographic hash. [Christie’s] I don’t understand any of this. Please explain it like I’m five. “Would you like to watch your favourite CBeebies show — or would you like me to write on a piece of paper that you own the show? All you get is the piece of paper.” The trouble with explaining NFTs to a five-year-old is that you’ll have a hard time convincing a five-year-old that this nonsense isn’t the nonsense it obviously is. It sounds unfathomably stupid because it’s unfathomably stupid. The K Foundation Burn A Million NFTs: Crypto art’s ghastly CO2 production Proof-of-work is the reprehensible, planet-destroying mechanism that the Ethereum and Bitcoin blockchains use to decide who gets fresh ether or bitcoins. Proof-of-work is inexcusable nonsense, and every single person making money in anything linked to Ethereum or Bitcoin should feel personal shame. (Crypto grifters don’t possess a shame organ.) Like Bitcoin, Ethereum uses an whole country’s worth of electricity just to keep running — and generates a country’s worth of CO2. The Ethereum developers claim they’re totally moving off proof-of-work any day now — but they’ve been saying that since 2014. Crypto grifters making bad excuses for proof-of-work will often object to calculating their favourite magic bean’s per-transaction energy use, at all. The excuse is that adding more transactions doesn’t directly increase Bitcoin or Ethereum’s energy consumption. The actual reason is that the numbers for Bitcoin and Ethereum are bloody awful. 
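For what that arithmetic actually looks like, here is a worked sketch in Python. The energy and transaction figures are illustrative round numbers of the sort Digiconomist publishes, not exact current estimates.

# Per-transaction energy: total energy used, divided by total work done.
# The figures below are illustrative round numbers, not current estimates.
KWH_PER_TWH = 1e9  # 1 terawatt-hour = one billion kilowatt-hours

chains = {
    # name: (assumed annual energy use in TWh, assumed annual transactions)
    "Bitcoin":  (100, 110_000_000),
    "Ethereum": (40, 440_000_000),
}

for name, (annual_twh, annual_txs) in chains.items():
    kwh_per_tx = annual_twh * KWH_PER_TWH / annual_txs
    print(f"{name}: roughly {kwh_per_tx:,.0f} kWh per transaction")
# Bitcoin: roughly 909 kWh per transaction
# Ethereum: roughly 91 kWh per transaction

The same division applied to any payment network's published energy use and transaction counts is how every other per-transaction efficiency figure gets worked out.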
[Digiconomist; Digiconomist] The grifters will routinely pretend it’s somehow impossible to do arithmetic, and divide the energy use by the work achieved with it — in the precise same manner we do for literally every other enterprise or industry that uses energy. But if you’re calculating energy efficiency — of Bitcoin, Ethereum, Visa, Twitter or banks — then taking the total energy used and dividing it by the total work done is the standard way to work that out. Sites have sprung up to calculate the share of energy that crypto art spends. The site cryptoart.wtf picks a random piece of crypto art and calculates that transaction’s energy use. “These figures do not include the production or storage of the works, or even web hosting, but is simply for the act of using the PoW Ethereum blockchain to keep track of sales and activity.” The creator also has a blog post to explain the site, and address common bad excuses for proof-of-work. [cryptoart.wtf; Medium] You may tell yourself “but my personal marginal effect is minimal” — but in that case, don’t pretend you’re not just another aspiring crypto grifter. There are other blockchains that don’t use proof-of-work. Hardly anybody does NFTs on these chains — almost nobody uses them, and the local cryptocurrency for your fees is a lot more work to get hold of. And even if you did use one of these other blockchains, all the other ways that NFTs are a scam would still hold. But what about artists? They need money too Artist pay is terrible. Even quite successful artists whose names you know wonder if they could tap into the rich people status-and-vanity art market, and get life-changing money. (I’ve already seen one artist bedazzled by the prospect of NFT money say that anyone who objects to crypto art must be a shill for Big Tech.) Artists don’t know technology any more than anyone else does, so a lot of artists who tentatively essayed an NFT were completely unaware of the ghastly CO2 production involved in anything that touches cryptocurrency. Several were shocked at the backlash over an issue they’d had no idea existed. Famous artists are getting into NFTs. Grimes did an NFT, and it’d be fair to say that Elon Musk’s partner isn’t going to be doing an NFT for the money. Even if it’s a bit at odds with her album about ecological collapse. But famous musicians have long had a habit of adopting some awful headline-friendly technology that’s utterly unready for prime time consumer use, in order to show that they are hep and up to speed with the astounding future. Then they never speak of it again. Remember Björk’s cryptocurrency album in 2017? Kings of Leon are doing an NFT of their new album — sort of. Their page on NFT site Opensea suggests that you buy a digital download (not an NFT), limited edition vinyl (not an NFT), or a collectible artwork (a wallpaper). So what you’re actually buying is a vinyl record with a download, and in return, you not only give the band money, but hasten ecological collapse. Some small artists have done very well indeed from NFTs — and that’s excellent news! If you’ve made life-changing money from an NFT, then that’s good for the world as well as for you — ‘cos now the money’s out of the hands of the crypto grifters. (For goodness’ sake, cash out now.) An important rule of crypto is: every number that can be faked is faked. NFTs are the sort of con where a shill appears to make a ton of money, so you’ll think you can too. 
Put a large price tag on your NFT by buying it from yourself — then write a press release talking about your $100,000 sale, and you’re only out the transaction fee. Journalists who can’t be bothered checking things will write this up without verifying that the buyer is a separate person who exists. Just like the high-end art world! Another thing that the high-end art world shares with crypto is money laundering. Press coverage tends to focus on cultural value, and assume this stuff must be of artistic weight because someone spent a fortune on it. The part that functions as a money-laundering scam is only starting to get comment recently. [National Law Review, 2019; Art & Object, 2020] NFTs will almost certainly be used for money laundering as well, because crypto has always been a favourite for that use case. Banksying the unbanksied: fraudulent NFTs There is no mechanism to ensure that an NFT for an artwork is created by the artist. A lot of NFTs are just straight-up fraud. If NFTs weren’t a scam, there would be legal and technical safeguards to help ensure the NFT was being created by someone who owned the work in question, to fend off scammers. But there aren’t any — the sites all work on the basis “we’ll clean it up later, maybe.” This is because NFTs only exist to further the crypto grift. There are multiple NFT sites — you could create an unlimited number of NFTs that all claimed to be of a single particular work. There are a number of Twitter bots that will make an NFT of any tweet you point them at. The point is for the bot owner to make a commission from the sale of the NFTs, before the suckers catch on. Don’t expect Twitter to do anything about these people — Twitter CEO Jack Dorsey has a $2.5 million offer for an NFT of his first tweet. The offer is from Dorsey’s fellow crypto grifter Justin Sun. Now, you might think these two massive crypto holders were just trying to get headlines for the NFT market. [Rolling Stone] Someone NFTed all of dinosaur artist Corbin Rainbolt’s tweeted illustrations — and he took down the lot and put up watermarked versions. “I am not pleased that I have to take this sort of scorched earth policy with my artwork, frankly I am livid.” [Twitter] You could go through and block and report all the Twitter bots, though more will just spring up. [Twitter] But think of all the good things you could do with NFTs, you luddite When you point out that cryptocurrencies are terrible and NFTs are a scam, crypto grifters will start talking about all the things that you could potentially do if NFTs worked like they claim they do. This is a standard crypto grifter move — any clear miserable failure in the present will be answered with talking about the fabulous future! e.g., claiming Bitcoin or blockchain promises will surely come true, because it’s just like the early Internet. Which, of course, it isn’t. What can artists and buyers do about fraudulent NFTs? If the NFT site has a copy of your artwork up, you can send a DMCA notice to them, and to their upstream network provider. If the NFT site is just claiming or implying that you created this NFT when you did not, this is clearly fraudulent (misrepresentation, passing off) — but may be harder to get immediate action on. If you bought an NFT thinking it was put up by the artist, and it wasn’t, then you’ve been defrauded, and should ask for a refund. If the NFT site won’t refund you, then bring to bear absolutely everything you can on them. 
If the site is unresponsive to notices of fraud — which is quite common, because crypto grifters think “digital ownership” is a thing, and don’t care that other rights might exist in law or society — it is absolutely in order to shout from the rooftops that they are frauds, and blacken their name as best you can. Contact their financial backers too. Then talk about that as well. Ask around to see if you have a lawyer friend, or a friend of a friend, who might be in a position to assist pro bono just because these grifters are that terrible. The most important thing for artists to do about NFT fraud is to work to make NFTs widely considered to be worthless, fraudulent magic beans, with massive CO2 generation per transaction. This shouldn’t be terribly difficult, given that NFTs are in fact worthless, fraudulent magic beans, with massive CO2 generation per transaction. But is it art? You can tell that crypto art is definitely art, because so many proponents of it are insufferable manifesto bros. Just the manifestos could cause runaway global warming from sheer volume of hot air. (“Banksying the unbanksied” courtesy Etienne Beureux.)   Pleased to offer a NFT version of Neoreaction a Basilisk, which you can obtain at https://t.co/SPZnjZIgOI — El Sandifer, Rationality Expert to the Stars (@ElSandifer) March 1, 2021   you claim to place such moral stock in "artists getting paid" yet do not subscribe to my patreon, curious — Dr Samantha Keeper MD (@SamFateKeeper) March 5, 2021   We have a unique opportunity to help the planet and make culture better for future generations, and everyone can contribute simply by not giving a toss about NFTs. — Dan Davies (@dsquareddigest) March 7, 2021   Your subscriptions keep this site going. Sign up today! Share this: Click to share on Twitter (Opens in new window) Click to share on Facebook (Opens in new window) Click to share on LinkedIn (Opens in new window) Click to share on Reddit (Opens in new window) Click to share on Telegram (Opens in new window) Click to share on Hacker News (Opens in new window) Click to email this to a friend (Opens in new window) Taggedchristie'scorbin rainboltcryptokittiesethereumgrimesjack dorseyjustin sunkings of leonnba top shotsnftopenseaproof of work Post navigation Previous Article News: India crypto ban, North Korea, BitMEX execs to appear, IBM Blockchain dead, more McAfee charges Next Article Foreign Policy: It’s a $69 Million JPEG, but Is It Art? 15 Comments on “NFTs: crypto grifters try to scam artists, again” Adam Achen says: 12th March 2021 at 12:24 am Wait, so, NFT don’t even typically include a license for use of the underlying?! Reply David Gerard says: 12th March 2021 at 10:24 am nope! Note how even the Christie’s contract basically says “we dunno wtf this thing is, have fun” Reply K. Paul says: 12th March 2021 at 4:40 am Isn’t it curious that on the same day that 1 billion Tethers get minted on the TRON blockchain, Beeple’s NFT gets sold for USD 69 million worth of ETH? Apparently Justin Sun (founder of the TRON blockchain) was the leading bidder until losing it to another crypto bro at the last bid. Super curious, no? Money laundering? Reply David Gerard says: 12th March 2021 at 10:25 am I’m sure it’s just coincidence, and that Sun definitely didn’t snipe his own bid under another name for press release purposes. Reply K. Paul says: 13th March 2021 at 10:16 am LOL Reply David Gerard says: 16th March 2021 at 11:50 pm It turns out it was bought by … a guy Beeple was already in the crypto business with! 
https://amycastor.com/2021/03/14/metakovan-the-mystery-beeple-art-buyer-and-his-nft-defi-scheme/ so the $9m (in ETH) to Christie’s is correctly viewed as a marketing expense Reply K. Paul says: 17th March 2021 at 2:40 am I think Sun, Beeple, Christie’s, Vignesh, Musk, etc. are all working together to push NFTs. It all just seems so… planned and organized in advance. Look at what Musk is doing now. Meanwhile, Tether printer goes BRRRRRRRRR!!! WK says: 13th March 2021 at 2:19 pm Thanks; this was an interesting read. I’ve been reading about “crypto” on and off for a while now, trying to understand what it’s all about because it seems like nonsense. My initial skepticism has so far been reinforced and I completely fail to see how Bitcoin or any other digital currency is independent of actual existing hard currencies. This NFT business ($2.5m for a Tweet?) is headscratchingly ridiculous. Reply John S says: 21st March 2021 at 9:01 pm Crypto is a perfect way to take money from “midwits.” Lower IQ people instinctively know it’s dumb and the barrier of entry keeps them out. Genuinely smart people (I’m not a genius but I would place myself in this category) read all the claims and conclude that there is no intrinsic value, regardless of limits on supply etc. People in the middle read the claims and convince themselves they understand this stuff and the marketing (better than FIAT, banks, libertarian utopia etc) are true and get burned. The people who make money in crypto are either insiders or they know it’s crap and sell it during bubble periods instead of holding with the expectation that the value will perpetually increase due to magical properties. Reply Ingvar says: 16th March 2021 at 1:49 pm JWZ on NFTs. Worth a read, including the comments (which, frankly, is not something I am used to saying). Reply Blaise says: 17th March 2021 at 8:26 pm Great work: I now have a much better understanding of NFT and your “contrarian” view makes perfect sense. Reply Adam Burns says: 20th March 2021 at 4:17 pm > It’s [NFTs are] like a “Certificate of Authenticity” that’s in Comic Sans, and misspelt. written in crypto crayons. for the love of humanity! oh … but wait. check out these ‘humanitarians’ https://www.proofofhumanity.id/ Reply JetBlack says: 23rd March 2021 at 12:18 am Just more proof of the unmitigated stupidity of the world we live in. Wow. And Grimes just sold a bunch of NTFs for a tidy sum. Hmm… I wonder who bought those? Is she connected to anyone with a lot of disposable income with an vested interest in Bitcoin and crypto? Reply Alex says: 24th March 2021 at 12:15 am Hello! I have a question, help me out. Let’s say an artist put up his/her work in a conditional NFT market, an auction started and he/she successfully sold it. After this event – what rights does the artist have towards the auctioned work? Or the work is still the intellectual property of the artist? Reply David Gerard says: 24th March 2021 at 12:51 am All the rights, unless explicitly stated otherwise in the sale of the NFT. The purchaser might try to claim implied rights – e.g. a limited right to reproduce the work for the purpose of saying “this is what I bought an NFT of” – but not major rights like copyright or reproduction without an explicit license. Though I am not your lawyer, so ask one if it’s important. Reply Leave a Reply Cancel reply Your email address will not be published. Required fields are marked * Comment Name * Email * Website Notify me of follow-up comments by email. Notify me of new posts by email. 
davidgerard-co-uk-4192 ---- News: Coinbase goes public, Bitcoin hashrate goes down, NFTs go down, proof-of-space trashes hard disk market – Attack of the 50 Foot Blockchain
Total sales: $133.20 Index Libra Shrugged: How Facebook Tried to Take Over the Money My cryptocurrency and blockchain commentary and writing for others Press coverage: Attack of the 50 Foot Blockchain Press coverage: Libra Shrugged Table of Contents The conspiracy theory economics of Bitcoin The DAO: the steadfast iron will of unstoppable code Search for: Main Menu News: Coinbase goes public, Bitcoin hashrate goes down, NFTs go down, proof-of-space trashes hard disk market 20th April 2021 - by David Gerard - 1 Comment If you’d like to get yourself copies of the books signed by the author, go to this post and see how much to PayPal me! You can support my work by signing up for the Patreon — $5 or $20 a month is like a few drinks down the pub while we rant about cryptos once a month. It really does help. [Patreon] The Patreon also has a $100/month Corporate tier — the number is bigger on this tier, and will look more impressive on your analyst newsletter expense account. [Patreon] And tell your friends and colleagues to sign up for this newsletter by email! [scroll down, or click here] The Bernard L. Madoff Memorial Coinbase Listing On 14 April 2021, Coinbase listed on NASDAQ as a public company! On the same day the Bitcoin price peaked, and Bernie Madoff — the patron saint of Bitcoin — died. Being a public company brings much closer attention to just what’s going on here, without the sort of dumb excuses that crypto bros will accept. Coinbase’s stock price is unsustainable — the starting price was at 79 times revenue, let alone earnings. For comparison, Palantir’s direct listing was at 19 times revenue. [FT Alphaville, free with registration] The stock price has behaved accordingly — and went from $409.62 on launch day, to $319.00 as I write this. [MarketWatch, archive] The other nice thing about a public listing is that the Coinbase stock price is a proxy for the price of Bitcoin — and you can’t short Bitcoin reliably, but you can certainly short stocks reliably. The Coinbase listing was thoroughly in the spirit of crypto offerings — insiders dumped a pile of their shares immediately, including the chief financial officer selling 100% of hers. Apologists swooped in to say that they only sold vested shares — which means, the shares they actually had, and not the shares they didn’t have. [Twitter; OpenInsider, archive] A lawyer — specifically, a professor of contracts — looks at the Coinbase terms of service, and specifically the requirement to take disputes to arbitration. She’s unconvinced the terms are even enforceable. [ContractsProf Blog] Martin Walker and Winnie Mosioma: “Many cryptocurrency exchanges are now making proud claims about their regulated status, but does ‘regulated’ really mean what investors think?” A review of sixteen crypto exchanges.  [LSE Business Review] Not so much a revolving door as a recirculating sewer — Brian Brooks, formerly of Coinbase, and then of the Office of the Comptroller of the Currency, becomes the CEO of Binance US. [CoinDesk]   Bitcoin is just Avon for men in their late 20s don’t at me — CryptoCharles (@CryptoCharles__) April 12, 2021   Hashrate go down, number follows Bitcoin is so robustly decentralised that a power outage in a single area — or, by some reports, in a single data centre — in Xinjiang took half of Bitcoin’s hashpower offline, across multiple “independent” mining pools. Decentralised! 
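As a rough sketch of why a hashpower drop slows everything down: Bitcoin only re-targets its difficulty every 2016 blocks, so until that happens, blocks simply arrive late in proportion to the lost hashrate. The figures below are the ones quoted in the next paragraph.

# Rough sketch: expected block interval scales inversely with hashrate until
# the next difficulty retarget (every 2016 blocks). Figures as quoted below.
TARGET_MINUTES_PER_BLOCK = 10   # Bitcoin's design target
hashrate_before_ehs = 220       # exahashes per second
hashrate_after_ehs = 165

interval = TARGET_MINUTES_PER_BLOCK * hashrate_before_ehs / hashrate_after_ehs
print(f"roughly {interval:.1f} minutes per block until difficulty adjusts")
# roughly 13.3 minutes per block -- fewer blocks means less transaction
# capacity, so the mempool backs up and fees rise, as described below.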
[NASDAQ News] An accident in a coal mine on 10 April didn’t directly stop the flow of electricity — but it did lead to widespread safety inspections in various industries. This included Bitcoin mining data centres being shut down. [Crypto Briefing] The Bitcoin hash rate dropped from 220 exahashes per second to 165 EH/s. The rate of new blocks slowed. The Bitcoin mempool — the backlog of transactions waiting to be processed — has filled. Transaction fees peaked at just over $50 average on 18 April. [Johoe’s Bitcoin Mempool Statistics, archive of 20 April 2021; Ycharts, archive of 20 April 2021] This turned a slight dip in the BTC price over the weekend into a crash — from $64,000 down to $51,000. It’s hard to pump the market if you can’t move your coins. Though that hasn’t stopped Tether doing two-billion-USDT pumps. I’m sure this is all 100% backed with something that won’t crash if you look at it funny. Binance finds itself suddenly unable to fulfil withdrawals of crypto — direct from them to you on the blockchain, without even being able to blame the legacy financial system. Affected tokens: BNB (BEP2 and BEP20), USDT (TRC20 and BEP20) BTC, XRP, DOGE, BUSD (BEP20). But I’m sure it’ll all be fine, and Binance definitely have all the cryptos they claimed to. [Twitter, Twitter] You can cash out any time you like! As long as nobody else is trying to.   who decided to call them NFTs instead of GIF Certificates??? — adam j. sontag (@ajpiano) April 18, 2021   Q. What do you call unsmokeable mushrooms? A. Non-Tokeable Fungi NFTs have a problem: number go … not up. It turns out there isn’t a secondary market for NFTs — nobody buys them after the pumpers have had their turn. [Bloomberg] “It’s not meaningful to characterize a concept as a financial bubble,” said Chris Wilmer, a University of Pittsburgh academic who co-edits a blockchain research journal, and thinks playing with words obscures that NFTs were a month-long bubble. Some news stories called NFTs a “stimulus-led fad”. Now, you might think that was a remarkable euphemism for a blatant pump by crypto bros to fake the appearance of a market. Popular NFT marketplace Rarible has been targeted by … scammers and malware! Unheard of in crypto. [Bleeping Computer] Brian Livingston’s newsletter Muscular Portfolios traces a bit more of the follow-the-money on Metakovan’s purchase of a $69 million NFT. [Muscular Portfolios] Kim Parker: Most artists are not making money off NFTs — and here are some graphs to prove it. [Medium]   Minty Bingo for when NFTs die and everyone comes back crying https://t.co/1aCPplzdui pic.twitter.com/j725JsoIzq — 🕯️synthwave void gremlin 🕯️ (@Lokinne) April 6, 2021   He is genius in allocation of space Proof-of-space crypto may do to hard disks and SSDs what proof-of-work altcoins did to video cards. Bram Cohen’s Chia network seems to already be leading to local shortages of large hard drives — prices in Hong Kong for the 4TB and above range are up to triple the usual price.[HKEPC, in Chinese; WCCFTech] How wonderfully energy-efficient is proof-of-space? Not so great — Shokunin tried out the client: “I tested this Chia thing overnight. Gave it 200GB plot and two CPU threads. After 10 hours it consumed 400GB temp space, didn’t sync yet, CPU usage is always 80%+. Estimated reward time is 5 months. This isn’t green, already being centralised on large waste producing servers.” [Twitter] David S. H. 
Rosenthal noted precisely this in 2018: “One aspect of the talk that concerned me was that Cohen didn’t seem well-informed about the landscape of storage … If the cloud companies chose to burn-in their new drives by using them for Proof of Space they would easily dominate the network at almost zero cost.” [blog post, 2018] Baby’s on fire CoinHive used to host crypto-miners on web pages — scraps of JavaScript that would use your electricity to mine for Monero. The service was also popular with web malware vendors. CoinHive shut down in 2019. The coinhive.com domain name is now owned by security expert Troy Hunt — if you go to a page that’s still trying to load the CoinHive script, you get a page that warns you about cryptos, web-based malware and cross-site scripting.  [Troy Hunt] There’s enough Bitcoin mining in China that the Bitcoin mining alone is a serious problem for the country to meet its CO2 targets. [Nature; The Economist] David S. H. Rosenthal on how Bitcoin mining can never be green — because the carbon footprint is the point. [blog post] Gothamist: Andrew Yang Wants To Turn NYC Into A Bitcoin Megahub. That Would Be Terrible For Climate Change. “Bitcoin advocates never talk about displacement because it makes the numbers sound bad,” I was quoted as saying. [Gothamist] The Times: The idea of bitcoin going green is laughable — hey Bitcoin, this is what attention from the mainstream looks like. [Times, paywalled, archive]   while y'all are over here getting excited over NFTs I'm making the original NFT pic.twitter.com/Jcf01LB0BZ — live tucker reaction (@vogon) April 5, 2021   ICO, ICO The SEC has sued LBRY over their 2016 ICO — and their still-ongoing offerings of tokens in a manner that, on the face of it, appears to be a ridiculously obvious unregistered offering of securities. The SEC investigation has been going on three years. LBRY decided to market more tokens last year, which may have been the last straw for the SEC. [SEC press release; complaint, PDF] LBRY has struck back! With a site called HELP LBRY SAVE CRYPTO. The FAQ on the site makes a string of assertions which are best answered “read the complaint”. [HELP LBRY SAVE CRYPTO] Paragon was an ICO for “blockchain technology in the cannabis industry”. It was, as usual, an illegal offering of unregistered securities. Paragon settled with the SEC in 2018 — they had to return everyone’s money, and pay a $250,000 fine. Shockingly, the pot coin guys turned out to be flakes — Paragon defaulted on its settlement. [WSJ, 2019, paywalled] Paragon’s founders have disappeared. Aggrieved investors tried to mount a class action last year. [CoinDesk, 2020] Only $175,000 of the SEC penalty was paid, and this will be distributed to Paragon’s investors. [Order, PDF] In SEC v. Ripple, the SEC has been denied access to eight years of personal financial information of Ripple executives Brad Garlinghouse and Christian Larsen. [Order, PDF] And Ripple has gained partial access to SEC discussions on whether XRP was a security, as compared to BTC or ETH. [CoinTelegraph] The independent Telegram messaging service, beloved of crypto pumpers, will be a thing of the past — Pavel Durov was so screwed by paying back the investors in Telegram’s disastrous ICO that he’s now planning to take the company public. According to a claimed leak from the investment bankers preparing the offering, Telegram plans to sell 10% to 25% of the company in a direct US listing, in the hope of $30 to 50 billion, likely in 2023. 
[CoinDesk; Vedomosti, in Russian] The SEC has published a “Framework for ‘Investment Contract’ Analysis of Digital Assets.” None of this should be news to anyone here, though that won’t stop the crypto bros yelling like stuck pigs. [SEC]   Economists may sometimes say that the sky is green. The average crypto person will fight you on a 67 tweet thread arguing the colour of the sky is wet and in any case inflation is making the Nash equilibrium Llama. — 𝖤𝖽𝗆𝗎𝗇𝖽 𝖲𝖼𝗁𝗎𝗌𝗍𝖾𝗋 (@Edmund_Schuster) March 9, 2021   My beautiful launderette The Bank for International Settlements has a new report: “Supervising cryptoassets for anti-money laundering.” BIS concludes: “the first priority should be implementing the FATF standards wherever that has not taken place yet. This is the absolute minimum needed to mitigate the risks posed by cryptoassets at a global level.” This isn’t saying anything controversial, or advocating anything that isn’t happening — but crypto bros wishfully thinking the FATF ratchet will stop tightening on crypto are incorrect. [BIS, PDF] More on Signal and MobileCoin — Dan Davies (author of Lying for Money, a book that everyone reading this blog should read — UK, US) points out that the FCA already considers doing financial business over WhatsApp, Telegram or Signal “self-evidently suspicious.” In real finance, the traders’ chat channels are logged for compliance — because, without that, traders reliably dive headlong into illegal market shenanigans. And often, even with compliance logging. [Financial News, paywalled; Twitter] Dan correctly describes the innovation of MobileCoin: “pass on illegal inside information, receive payment and launder the proceeds, all in the same app!” [Twitter] The IRS wants information on Kraken crypto exchange customers, and on Circle customers — the latter may include when they owned Poloniex. [Forbes; Justice Department] Turkey gives cryptocurrencies official legal recognition as a payments mechanism, regulating their use either directly or indirectly! All use of cryptos in payments is banned. [Reuters; Resmi Gazete, in Turkish]   Welcome to finance Twitter. Please select your Guy: -Programmer trading in IRA -Leftist sympathizer, detests coworkers -Mysterious furry rumored to hav $500M AUM, 40% returns every year somehow -PhD high energy theory retired at 34 -Guy with tinder name John-MBA,CFA like LinkedIn — diet divorced guy (@neoliberal_dad) November 7, 2019   Central banking, not on the blockchain The Bank of England and the UK Treasury are forming a task force on central bank digital currencies (CBDCs). One of the task force’s vague and ill-specified jobs will be to look into whether they can find a use case for this in the UK — where most cash-like spending is actually a card anyway. [Bank of England] The Bank has been terribly excited about the fabulous possibilities of blockchain since they first noticed Bitcoin in 2013 — they’ve put out a pile of speculative papers, but none with an actual use case. That’s fine — speculating on weird possibilities is one of the things a central bank research unit does. (See Libra Shrugged, chapter 15.) But starting at an idea without a use case is the problem with blockchains in general. 
The Wall Street Journal has a pretty generic article on China’s DC/EP, but it includes the detail that the latest trial includes e-CNY that expires — “Beijing has tested expiration dates to encourage users to spend it quickly, for times when the economy needs a jump-start.” So even if DC/EP turns into Alipay-but-it’s-PBOC, being run by the PBOC means they can do interesting things with it if they need to. [WSJ, paywalled] The New Republic: Cryptocurrencies Are the Next Frontier for the Surveillance State — on the surveillance potential of CBDCs. With quotes and ideas from Libra Shrugged. [The New Republic]   So far in 2021 #Bitcoin has lost 97% of its value verses #Dogecoin. The market has spoken. Dogecoin is eating Bitcoin. All the Bitcoin pumpers who claim Bitcoin is better than gold because its price has risen more than gold's must now concede that Dogecoin is better than Bitcoin. — Peter Schiff (@PeterSchiff) April 16, 2021   Things happen Dogecoin is having another price pump, firmly establishing DOGE as the true crypto store of value and BTC as a deprecated altcoin. The big pump coincided with 400 million Tethers being deployed. Everything I said in February in my Foreign Policy piece on Dogecoin applies twice as hard. [Reddit] Australian plans to put disability payments on a … blockchain! It’ll work great! Right? With a quote from me. This particular bad idea somewhat resembles the plan to put welfare spending onto a blockchain that the UK government put into its 2016 paper “Distributed Ledger Technology: Beyond Blockchain” [gov.uk, 2016], which I wrote up in chapter 11 of Attack of the 50 Foot Blockchain. [ZDNet] The Marvelous Money Machine! A children’s book for grown-ups. This is great. Pay what you want for the PDF. [Gumroad] Facebook’s WhatsApp Pay Brazil has still not been allowed to go live, in the version where it hooks into the national PIX retail real-time settlement system. [Reuters] Der Spiegel: the German COVID vaccine tracker was going to use five blockchains! It will now use none. Nice try, IBM. [Der Spiegel, archive] Crypto guy loses a bet, and tries to pay the bet using the Lightning Network. Hilarity ensues. [Twitter thread, archive] PayPal lets you make payments with crypto! If it’s crypto you already had in your PayPal crypto holdings — which you can’t top up by depositing crypto from outside, only by buying crypto on PayPal with money. [Reuters] Why do this? The CEO of PayPal is a massive coiner, but he also has to worry about things like “the law.” So this gets crypto into news headlines on the company dime. Living on video Here’s the third pocast I did last week: Dunc Tank with Duncan Gammie! Talking about Attack of the 50 Foot Blockchain and the crypto skeptic view. [Podbean] I went on NTD again to talk about crypto “market cap” and how it’s a meaningless number, starting 11:35. [YouTube] And to talk about the Coinbase listing, starts 13:43. [YouTube] My laptop webcam is still mediocre, but it was better than the other Zoom experts’ webcams. The Naked Scientists podcast has done an episode on “Bitcoin Decrypted: Cash, Code, Crime & Power”. This is going out through BBC Radio 5 Live in the UK, and Radio National in Australia. [my segment; whole podcast] Byline Times: “So who is behind the onward march of the crypto, nearly 13 years on from the credit crunch and the arrival of Bitcoin and the thousands of digital currencies in its slipstream? The short answer is: idealists, ideologues and opportunists.” With a quote from me. 
[Byline Times] Sydney Morning Herald: ‘Financial weapon’: Bitcoin becomes another factor in China-US contest — with quotes from me. [SMH] I spoke to CNet about altcoins. [CNet] Investor’s Business Daily: Bitcoin Hits Tipping Point After Skyrocketing On Investment Mania — with quotes from me. [Investor’s Business Daily]   learning how to regurgitate on demand like a frightened vulture for the next time a man tries to explain cryptocurrencies to me — Kat Maddox (@ctrlshifti) April 8, 2021   Your subscriptions keep this site going. Sign up today! Share this: Click to share on Twitter (Opens in new window) Click to share on Facebook (Opens in new window) Click to share on LinkedIn (Opens in new window) Click to share on Reddit (Opens in new window) Click to share on Telegram (Opens in new window) Click to share on Hacker News (Opens in new window) Click to email this to a friend (Opens in new window) Taggedaustraliabank of englandbernie madoffbinancebisbitcoinblockchainbrad garlinghousebrazilbrian brooksbrian livingstoncbdcchiachinachristian larsencirclecoinbasecoinhivedcepdogecoindunc tankibmicoirskim parkerkrakenlbrylightning networklinksmarvelous money machineminingmobilecoinnftparagonpaypalpixpodcastpoloniexproof of spaceraribleripplesecsignaltelegramtethertroy huntturkeyunited kingdomwhatsapp payxinjiang Post navigation Previous Article Stilgherrian: The 9pm Dumb Anarcho-Capitalist Blockchain Scams with David Gerard One Comment on “News: Coinbase goes public, Bitcoin hashrate goes down, NFTs go down, proof-of-space trashes hard disk market” D says: 21st April 2021 at 2:21 am Fred Flintstone And The Marvelous Money Machine https://www.amazon.com/dp/B002UZQ0ZC Reply Leave a Reply Cancel reply Your email address will not be published. Required fields are marked * Comment Name * Email * Website Notify me of follow-up comments by email. Notify me of new posts by email. This site uses Akismet to reduce spam. Learn how your comment data is processed. Search for: Click here to get signed copies of the books!   Get blog posts by email! Email Address Subscribe Support this site on Patreon! Hack through the blockchain bafflegab: $5/month for early access to works in progress! $20/month for early access and even greater support! $100/month corporate rate, for your analyst newsletter budget! Buy the books! 
developer-twitter-com-2745 ---- Expansions | Twitter Developer Expansions Overview With expansions, developers can expand objects referenced in the payload. Objects available for expansion are referenced by ID. For example, the referenced_tweets.id and author_id fields returned in the Tweets lookup payload can be expanded into complete objects. If you would like to request fields related to the user that posted that Tweet, or the media, poll, or place that was included in that Tweet, you will need to pass the related expansion query parameter in your request to receive that data in your response. When including an expansion in your request, we will include that expanded object's default fields within the same response. It helps return additional data in the same response without the need for separate requests. If you would like to request additional fields related to the expanded object, you can include the field parameter associated with that expanded object, along with a comma-separated list of fields that you would like to receive in your response. Please note fields are not always returned in the same order they were requested in the query.
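The same request can be made from code. Here is a sketch using Python's requests library against the Tweet lookup endpoint used in the samples that follow; BEARER_TOKEN is a placeholder for your own app's Bearer Token.

import requests

BEARER_TOKEN = "..."  # placeholder: your app's Bearer Token
tweet_id = "1212092628029698048"  # the Tweet used in the samples below

response = requests.get(
    f"https://api.twitter.com/2/tweets/{tweet_id}",
    headers={"Authorization": f"Bearer {BEARER_TOKEN}"},
    params={"expansions": "attachments.media_keys,referenced_tweets.id,author_id"},
)
payload = response.json()

# Expanded objects arrive alongside the requested Tweet, nested under "includes".
print(payload["data"]["text"])
print(list(payload.get("includes", {}).keys()))  # e.g. ['media', 'users', 'tweets']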
{ "data": { "attachments": { "media_keys": [ "16_1211797899316740096" ] }, "author_id": "2244994945", "id": "1212092628029698048", "referenced_tweets": [ { "type": "replied_to", "id": "1212092627178287104" } ], "text": "We believe the best future version of our API will come from building it with YOU. Here’s to another great year with everyone who builds on the Twitter platform. We can’t wait to continue working with you in the new year. https://t.co/yvxdK6aOo2" } } The Tweet payload above contains some reference IDs for complementary objects we can expand on. We can expand on attachments.media_keys to view the media object, author_id to view the user object, and referenced_tweets.id to view the Tweet object the originally requested Tweet was referencing. Expanded objects will be nested in the "includes" object, as can be seen in the sample response below.   Available expansions in a Tweet payload Expansion Description author_id Returns a user object representing the Tweet’s author referenced_tweets.id Returns a Tweet object that this Tweet is referencing (either as a Retweet, Quoted Tweet, or reply) in_reply_to_user_id Returns a user object representing the Tweet author this requested Tweet is a reply of attachments.media_keys Returns a media object representing the images, videos, GIFs included in the Tweet attachments.poll_ids Returns a poll object containing metadata for the poll included in the Tweet geo.place_id Returns a place object containing metadata for the location tagged in the Tweet entities.mentions.username Returns a user object for the user mentioned in the Tweet referenced_tweets.id.author_id Returns a user object for the author of the referenced Tweet   Available expansion in a user payload Expansion Description pinned_tweet_id Returns a Tweet object representing the Tweet pinned to the top of the user’s profile   Expanding the media, Tweet, and user objects In the following request, we are requesting the following expansions to include alongside the default Tweet fields.  Be sure to replace $BEARER_TOKEN with your own generated bearer token. attachments.media_keys referenced_tweets.id author_id   Sample Request   curl 'https://api.twitter.com/2/tweets/1212092628029698048?expansions=attachments.media_keys,referenced_tweets.id,author_id' --header 'Authorization: Bearer $BEARER_TOKEN' Code copied to clipboard   Sample Response { "data": { "attachments": { "media_keys": [ "16_1211797899316740096" ] }, "author_id": "2244994945", "id": "1212092628029698048", "referenced_tweets": [ { "type": "replied_to", "id": "1212092627178287104" } ], "text": "We believe the best future version of our API will come from building it with YOU. Here’s to another great year with everyone who builds on the Twitter platform. We can’t wait to continue working with you in the new year. https://t.co/yvxdK6aOo2" }, "includes": { "media": [ { "media_key": "16_1211797899316740096", "type": "animated_gif" } ], "users": [ { "id": "2244994945", "name": "Twitter Dev", "username": "TwitterDev" } ], "tweets": [ { "author_id": "2244994945", "id": "1212092627178287104", "referenced_tweets": [ { "type": "replied_to", "id": "1212092626247110657" } ], "text": "These launches would not be possible without the feedback you provided along the way, so THANK YOU to everyone who has contributed your time and ideas. Have more feedback? 
Let us know ⬇️ https://t.co/Vxp4UKnuJ9" } ] } } Expanding the poll object In the following request, we are requesting the following expansions to include alongside the default Tweet fields: attachments.poll_ids   Sample Request curl 'https://api.twitter.com/2/tweets/1199786642791452673?expansions=attachments.poll_ids' --header 'Authorization: Bearer $BEARER_TOKEN' Code copied to clipboard Sample Response { "data": { "attachments": { "poll_ids": [ "1199786642468413448" ] }, "id": "1199786642791452673", "text": "C#" }, "includes": { "polls": [ { "id": "1199786642468413448", "options": [ { "position": 1, "label": "“C Sharp”", "votes": 795 }, { "position": 2, "label": "“C Hashtag”", "votes": 156 } ] } ] } } Expanding the place object In the following request, we are requesting the following expansions to include alongside the default Tweet fields: geo.place_id   Sample Request curl 'https://api.twitter.com/2/tweets/:ID?expansions=geo.place_id’ --header 'Authorization: Bearer $BEARER_TOKEN' Code copied to clipboard Sample Response { "data": { "geo": { "place_id": "01a9a39529b27f36" }, "id": "ID", "text": "Test" }, "includes": { "places": [ { "full_name": "Manhattan, NY", "id": "01a9a39529b27f36" } ] } } Next step Learn how to use Fields with Expansions Review the different data objects available with Twitter API v2 Was this document helpful? Thank you for the feedback. We’re really glad we could help! Thank you for the feedback. How could we improve this document? This page is missing information. The information was hard to follow or confusing. There is inaccurate information. There is a broken link or typo. Specific Feedback Submit feedback Skip Thank you for the feedback. Your comments will help us improve our documents in the future. Developer agreement, policy & terms Follow @twitterdev Subscribe to developer news Twitter platform Twitter.com Status Card validator Privacy Center Transparency Center Twitter, Inc. About the company Twitter for Good Company news Brand toolkit Jobs and internships Investors Help Help Center Using Twitter Twitter Media Ads Help Center Managing your account Safety and security Rules and policies Contact us Developer resources Developer home Documentation Forums Communities Developer blog Engineering blog Developer terms Business resources Advertise Twitter for business Resources and guides Twitter for marketers Marketing insights Brand inspiration Twitter Data Twitter Flight School © 2021 Twitter, Inc. Cookies Privacy Terms and conditions Language Developer By using Twitter’s services you agree to our Cookies Use. We use cookies for purposes including analytics, personalisation, and ads. OK This page and certain other Twitter sites place and read third party cookies on your browser that are used for non-essential purposes including targeting of ads. Through these cookies, Google, LinkedIn and Demandbase collect personal data about you for their own purposes. Learn more. Accept Decline developer-twitter-com-2901 ---- Fields | Twitter Developer Fields Introduction The Twitter API v2 endpoints are equipped with a new set of parameters called fields, which allows you to select just the data that you want from each of our objects in your endpoint response. For example, if you only need to retrieve a Tweet’s created date, or a user’s bio description, you can specifically request that data to return with a set of other default fields without the full set of fields that associate with that data object. 
This provides a higher degree of customization by enabling you to only request the fields you require depending on your use case. Default fields will always be returned in the response. With the fields query parameters, you can request additional fields of the object to include in the response. This is done by specifying one of the below parameters, including a comma-separated list of fields that you would like to return. Each object has its own parameter which is used to specifically request the fields that are associated with that object. Here are the different fields parameters that are currently available: Tweet → tweet.fields User → user.fields Media → media.fields Poll → poll.fields Place → place.fields When using an endpoint that primarily returns a particular object, simply use the matching field parameter and specify the field(s) desired in a comma-separated list as the value to that parameter to retrieve those fields in the response.   For example, if you are using the GET /tweets/search/recent endpoint, you will primarily receive Tweet objects in that response. Without specifying any fields parameters, you will just receive the default values, id and text. If you are interested in receiving the public metrics of the Tweets that are returned in the response, you will want to include the tweet.fields parameter in your request, with public_metrics set as the value.  This request would look like the following. If you would like to use this request, make sure to replace $BEARER_TOKEN with your Bearer Token and send it using your command line tool. curl --request GET \ --url 'https://api.twitter.com/2/tweets/search/recent?query=from%3Atwitterdev&tweet.fields=public_metrics' \ --header 'Authorization: Bearer $BEARER_TOKEN' Code copied to clipboard If you send this request in your terminal, then each of the Tweets that return will include the following fields: { "data": { "id": "1263150595717730305", "public_metrics": { "retweet_count": 12, "reply_count": 14, "like_count": 49, "quote_count": 7 }, "text": "Do you 👀our new Tweet settings?\n\nWe want to know how and why you’d use a feature like this in the API. Get the details and let us know what you think👇\nhttps://t.co/RtMhhfAcIB https://t.co/8wxeZ9fJER" } } If you would like to retrieve a set of fields from a secondary object that is associated with the primary object returned by an endpoint, you will need to include an additional expansions parameter.  For example, if you were using the same GET search/tweets/recent endpoint as earlier, and you wanted to retrieve the author's profile description, you will have to pass the expansions=author_id and user.fields=description with your request. Here is an example of what this might look like. If you would like to try this request, make sure to replace the $BEARER_TOKEN with your Bearer Token before pasting it into your command line tool. curl --request GET \ --url 'https://api.twitter.com/2/tweets/search/recent?query=from%3Atwitterdev&tweet.fields=public_metrics&expansions=author_id&user.fields=description' \ --header 'Authorization: Bearer $BEARER_TOKEN' Code copied to clipboard If you specify this in the request, then each of the Tweets that deliver will have the following fields, and the related user object's default and specified fields will return within includes. The user object can be mapped back to the corresponding Tweet(s) by matching the tweet.author_id and users.id fields.   
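To show that mapping in practice, here is a sketch in Python using the requests library. It issues the same recent search request as the curl example above (BEARER_TOKEN is a placeholder) and joins each Tweet to its expanded author object via author_id.

import requests

BEARER_TOKEN = "..."  # placeholder: your app's Bearer Token

response = requests.get(
    "https://api.twitter.com/2/tweets/search/recent",
    headers={"Authorization": f"Bearer {BEARER_TOKEN}"},
    params={
        "query": "from:twitterdev",
        "tweet.fields": "public_metrics",
        "expansions": "author_id",
        "user.fields": "description",
    },
)
payload = response.json()

# Index the expanded user objects by id, then look up each Tweet's author_id.
users_by_id = {u["id"]: u for u in payload.get("includes", {}).get("users", [])}
for tweet in payload.get("data", []):
    author = users_by_id.get(tweet["author_id"], {})
    print(author.get("username"), tweet["public_metrics"]["like_count"], tweet["text"][:60])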
{ "data": [ { "id": "1263150595717730305", "author_id": "2244994945", "text": "Do you 👀our new Tweet settings?\n\nWe want to know how and why you’d use a feature like this in the API. Get the details and let us know what you think👇\nhttps://t.co/RtMhhfAcIB https://t.co/8wxeZ9fJER", "public_metrics": { "retweet_count": 12, "reply_count": 13, "like_count": 51, "quote_count": 7 } } ], "includes": { "users": [ { "id": "2244994945", "username": "TwitterDev", "description": "The voice of the #TwitterDev team and your official source for updates, news, and events, related to the #TwitterAPI.", "name": "Twitter Dev" } ] } } Bear in mind that you cannot request specific subfields (for example, public_metrics.retweet_count). All subfields will be returned when the top-level field (public_metrics) is specified. We have listed all possible fields that you can request in each endpoints' API reference page's parameters table.  A full list of fields are listed in the object model. To expand and request fields on an object that is not that endpoint’s primary resource, use the expansions parameter with fields. Next step Learn how to use Fields with Expansions Review the different data objects available with Twitter API v2 Make your first request with Fields and Expansions Was this document helpful? Thank you for the feedback. We’re really glad we could help! Thank you for the feedback. How could we improve this document? This page is missing information. The information was hard to follow or confusing. There is inaccurate information. There is a broken link or typo. Specific Feedback Submit feedback Skip Thank you for the feedback. Your comments will help us improve our documents in the future. Developer agreement, policy & terms Follow @twitterdev Subscribe to developer news Twitter platform Twitter.com Status Card validator Privacy Center Transparency Center Twitter, Inc. About the company Twitter for Good Company news Brand toolkit Jobs and internships Investors Help Help Center Using Twitter Twitter Media Ads Help Center Managing your account Safety and security Rules and policies Contact us Developer resources Developer home Documentation Forums Communities Developer blog Engineering blog Developer terms Business resources Advertise Twitter for business Resources and guides Twitter for marketers Marketing insights Brand inspiration Twitter Data Twitter Flight School © 2021 Twitter, Inc. Cookies Privacy Terms and conditions Language Developer By using Twitter’s services you agree to our Cookies Use. We use cookies for purposes including analytics, personalisation, and ads. OK This page and certain other Twitter sites place and read third party cookies on your browser that are used for non-essential purposes including targeting of ads. Through these cookies, Google, LinkedIn and Demandbase collect personal data about you for their own purposes. Learn more. Accept Decline developer-twitter-com-5357 ---- Guide to the future of the Twitter API | Twitter Developer Early Access Guide to the future of the Twitter API Overview Support for diverse use cases New access options to get started and grow Bringing it all together Evolution of the developer portal Support for OAuth 2 and new permissions Versioning Rolling out the new Twitter API At Twitter, our purpose is to serve the public conversation and we believe developers play a critical role in achieving this. Twitter wouldn’t be what it is today if it weren’t for you. Your creativity and work with our API make Twitter, and the world, a better place.  
We’re building the next generation of the Twitter API to better serve our diverse community of developers. Our API gives you the ability to learn from and engage with the conversation on Twitter, and we want to give you tools to further uncover, build on, and share the value of this conversation with the world.

To serve our diverse ecosystem, we plan to introduce a few new concepts and components to the platform. Consider this page your guide to the future of the Twitter API. As we build, we’ll update our product roadmap and will keep you updated here about the details of our plans. And with our new API foundation, we'll be better able to incorporate your feedback and make improvements along the way.

If you missed our recent announcements, make sure to read our blog posts introducing the new and improved Twitter API and the Academic Research product track. If you have questions, feedback, or suggestions about any of the following, let us know. Share your feedback. Last updated: January 26, 2021

Support for diverse use cases
Understanding the global conversation · Engaging with people on Twitter · Improving Twitter

We’ve always known that our developer ecosystem is diverse, but our API has long taken a one-size-fits-all approach. You helped us understand the use cases you have when you work with the Twitter API, and we're building the new API to help support these use cases—releasing new functionality in phases, each supporting the core use cases we’ve heard from you.

Understanding the global conversation
Our first few releases will be focused on making it easier to understand the public conversation. One of the most common reasons developers use the Twitter API is to listen to and analyze the conversation happening on Twitter. We’re not done yet: in the coming months, we will continue to release additional endpoints to help you understand the conversation, discover insights, and make informed decisions. Explore the Listen & Analyze use case.

Engaging with people on Twitter
People come to Twitter to connect and interact with each other, with their favorite teams, celebrities, or musicians, with world leaders, with their communities, with brands, with fun bots, and more. Developers play a critical role in creating content and engaging in various ways on the platform. In the coming months, we’ll release a number of endpoints (including new versions of endpoints for creating Tweets) to Early Access to support these use cases.

Improving Twitter
Developers have played a key part in making Twitter healthier and more engaging since the beginning. Your love for Twitter shows through your work, and we want to make it easy for you to channel your passion for Twitter into actively making it better. We want to empower you to give people more control over their experience on Twitter. The new Academic Research product track represents a crucial beginning to this process, as researchers' work and discoveries can help make the world a better place, and even help improve experiences on Twitter. Although we started with academic researchers, keep an eye open for new endpoints, guidance, and tools to fuel this kind of work across our Standard, Academic Research, and Business product tracks.
New access options to get started and grow
Access levels · Product tracks · New license terms · Supporting the health of the public conversation

Your feedback helped us see the importance of making the new Twitter API more flexible and scalable to fit your needs. With the new API, we are building new access options and product tracks so more developers can find options to support their use cases.

Access levels
Within the new Twitter API, we intend to introduce three core access levels that make it easy to grow and scale:
Basic access: Free, default access to endpoints for developers with an approved developer account. Based on research over the past few years, we expect that the large majority of developers (>80%) will find the access they need within this tier to get started and build something awesome.
Elevated access: Increased access to collections of relevant endpoints, including access to more Tweets, increased rate limits, and more advanced reliability features.
Custom access: While the majority of developers’ goals will be met by Basic and Elevated access, for those who need more, we can help get you what you need.

Product tracks
We love the incredible diversity of developers who use our API. And we want to provide a platform that serves many types of developers with access and tools that fit their use cases, continues to offer free and open access for developers, and provides a dedicated and supported path for both commercial and non-commercial services built with the API. To accomplish these goals, we're introducing new, distinct product tracks to better serve different groups of developers and provide them with a tailored experience and support, a range of relevant access levels, and appropriate pricing (where applicable). Developers who already have a developer account will start in the Standard product track and will be able to apply for others. New developers will be able to apply for the tracks that are relevant to them.
Standard: The default product track for most developers, including those building something for fun, for a good cause, or to learn or teach.
Academic Research: Academic researchers are one of the largest groups looking to understand what’s happening in the public conversation. Within this track, qualified academic researchers will get increased levels of access to a relevant collection of endpoints, including a new full-archive search endpoint. We’re also providing resources for researchers to make it easier to conduct academic research with the Twitter API.
Business: Developers build businesses on the Twitter API. And we love that their products help other people and businesses better understand and engage with the conversation on Twitter. This track will include the option for Elevated access to relevant collections of endpoints, or Custom access.
A key part of our strategy is our commitment to working with a diverse set of developers to enable their success. Some developers, including those building client-like applications, deserve more clarity in how to operate with the new Twitter API. Though it is too early to share any specifics, reaching this clarity may require a fresh look at policy and product access details that affect them. We’re looking ahead and seek to determine how best to work with these groups to serve the public conversation together.
New license terms
We’re designing these tracks with products, pricing, and access level options to better serve the unique needs of different types of developers. To support this, certain product tracks are reserved for non-commercial purposes only. We’ve therefore introduced new commercial use terms to the Developer Agreement that govern how the API can be used in product tracks designated as non-commercial. The Academic Research product track is the first product track we’ve released that is reserved for non-commercial purposes. As we continue introducing additional product tracks, namely the Business product track, we will provide more information about serving commercial use of the Twitter API. For now, commercial use cases are supported through Standard Basic access or the v1.1 Twitter API. Know that if you are using the API for commercial purposes, this does not necessarily mean that you are required to pay for access (for example, Basic access on any of our product tracks will be available for free). We want to continue learning more from you about this approach. If you’re interested, let us know your thoughts on these plans with this short survey.

Supporting the health of the public conversation
As with the introduction of our developer application a few years ago, we are committed to a developer platform that works in service of the overall health of the conversation on Twitter. Simultaneously, we are committed to a developer platform that is open and serves diverse needs. The introduction of these new access levels and product tracks allows us to offer more options and access with increased trust, as well as more controls to help address platform abuse and to keep the Twitter service safe and secure for everyone. Our hope is that these paths provide even more clarity about how to adhere to our Developer Terms and make it easier to scale your use of the Twitter API for years to come.

Bringing it all together
With work underway and several new access levels and product tracks planned, we want to share an illustration of how they may all come together (Overview · Standard · Academic Research · Business). This is an evolving vision, and it will take some time before all of these access options are available. We hope this will help you understand which path may eventually make sense for you. We want to continue learning more from you to be sure our approach is right. If you’re interested, please share your feedback with us about these plans.

Evolution of the developer portal
Over the last few months, all developers saw a new interface when they logged in to their developer accounts. This new developer portal is the home base for managing your use of the new Twitter API, with continual improvements and new features to help you build. We’re planning to create new ways to manage access for multiple development environments, to help you rethink how you manage a team of collaborators, track and understand your API usage, move up and down between access levels, and find resources to help you be successful. If you have other ideas you’d like to see, let us know and share your feedback! We’ve also introduced "Projects" within the developer portal as a way to organize your work and manage your access to the Twitter API for each use case you’re building with it.
We’re starting with just one Project per developer account for the first Early Access release, so you can begin using Basic access to the new Twitter API. With the recent release of the Academic Research product track, eligible researchers can now add a Project in the Academic Research product track. They may also create or maintain another Project in the Standard product track for a distinct and separate use case. As we roll out further access levels and product tracks, you’ll be able to create multiple Projects for different use cases. We plan to support separate production, staging, and development Apps within a Project as distinct environments to help you better manage your integration and to make it easier for a team to manage a Project and its Apps. For now, you can still use your existing, standalone Apps and create new ones if you need to; eventually, all API access will be through Projects.

Support for OAuth 2 and new permissions
We are working to add support for OAuth 2. In doing so, we intend to improve the developer experience with more granular permissions, giving you more control and serving the expectations of people authorizing your application. It will be some time before we make this available; however, this is a path we are actively pursuing. We'll share more in the future about how to test this. Share your feedback and suggestions as we build.

Versioning
We expect to launch new major API versions more often than we have in the past (the last one was 8 years ago!), but we'll still make it a goal to avoid doing so unless there's a compelling reason. We don't expect to make major version updates more often than once every 12 months, and when we do, it will be our goal to support the previous version for at least 1 year until retirement. Between major version changes, you’ll continue to see us add non-breaking improvements as they’re ready. Our goal is that you will only need to update your integration if you’d like to take advantage of new functionality.

Rolling out the new Twitter API
Early Access · Deprecations and migrations · Expected sequence of events

Early Access
In August 2020, we released Early Access to the new Twitter API v2. Eventually, the new API will fully replace the v1.1 standard, premium, and enterprise APIs. Before that can happen, we have more to build. Since our initial release, we’ve added a handful of new features, including the new hide replies endpoint, the user Tweet timeline and user mention timeline endpoints, and the new follows lookup endpoints. Additionally, we launched the Academic Research product track on the new Twitter API. This specialized track for researchers offers higher levels of access, free access to full-archive search, and other v2 endpoints for approved developers, as well as enhanced features and functionality to get more precise and complete data for analyzing the public conversation. Please note that this product track does have increased eligibility requirements. Academics with a specific research use case for Twitter data can now apply for the Academic Research product track. For all other developers, we continue to encourage use of Early Access. Everything we’ve released and will continue releasing into Early Access is fully supported and ready for you to build in production.
Once we've completed releasing new versions of core functionality, we’ll move the new API version (v2) into the General Availability (GA) phase and make it the new default version of the Twitter API. To learn more, visit the Early Access overview. For a preview of what’s to come, and what we have planned, check out our expected sequence of events below! Get started with Early Access: if you don't yet have a developer account, apply to get started.

Deprecations and migrations
We know migrations can be challenging, and we’re committed to doing our part to make migrating to our new API as easy as we can. Whether you use the current standard v1.1, premium, or enterprise endpoints — or a combination — you likely won’t need to migrate for some time. Our intent is to provide plenty of migration time (along with resources to help) when we deprecate existing endpoints. Our goal is to wait until we have completed releasing new versions of core functionality, but there may be exceptions where we need to turn off some legacy services sooner, including:
Standard v1.1 statuses/sample and statuses/filter endpoints. Later this year we plan to announce a shorter deprecation window for these two endpoints. The replacements for these endpoints are available in Early Access: the filtered stream and sampled stream endpoints. We're giving you this heads-up so you can begin exploring these replacements now. For specific requests or to provide your thoughts on this update, please share your feedback.
For those who want to get ahead and migrate early, check out our migration resources for the Twitter API v2.

Expected sequence of updates
The effort to replace the v1.1, premium, and enterprise APIs will take some time. To help you plan, we want to share a rough outline of the order in which we hope to roll out changes (Timeline · Endpoints · Product tracks · Deprecation). Should our plans evolve, we will do our best to keep it updated here. To receive notification about the progress of specific items, sign up to "watch" any cards within our product roadmap.

Stay tuned! We will continue to evolve and improve our plans as we learn. Have specific thoughts you'd like to share? We're always listening, so please share your feedback. We'd love to hear from you!
digitallibrarian-org-4082 ---- The Digital Librarian – Information. Organization. Access.

Recent posts:
• Libraries and the state of the Internet (June 27, 2016) - digital libraries
• Meaningful Web Metrics (January 3, 2016) - Web metrics
• Site migrated (October 1, 2012) - blog
• The new iPad (March 18, 2012) - Apple, Hardware, iPad
• 3rd SITS Meeting – Geneva (August 3, 2011) - Conferences, digital libraries, workshops; tagged: digital libraries, DLF, SITS
• David Lewis’ presentation on Collections Futures (March 2, 2011) - eBooks, Librarianship; tagged: collections, future, provisioning
• Librarians are *the* search experts… (August 19, 2010) - Librarianship
• What do we want from Discovery? Maybe it’s to save the time of the user…. (August 18, 2010)
• Putting a library in Starbucks (August 12, 2010) - digital libraries, Librarianship; tagged: coffee, digital library, library, monopsony, starbucks, upsell
• 1 week of iPad (April 14, 2010) - Apple, eBooks, Hardware, iPad; tagged: Apple, digital lifestyle, iPad, mobile, tablet

digitallibrarian-org-6938 ---- The Digital Librarian (http://digitallibrarian.org) - Information. Organization. Access. - feed last updated Mon, 27 Jun 2016 19:04:01 +0000

Libraries and the state of the Internet (Mon, 27 Jun 2016 12:04:01 +0000) - http://digitallibrarian.org/?p=229

Mary Meeker presented her 2016 Internet Trends report earlier this month. If you want a better understanding of how tech and the tech industry is evolving, you should watch her talk and read her slides.

This year’s talk was fairly time constrained, and she did not go into as much detail as she has in years past. That being said, there is still an enormous amount of value in the data she presents and the trends she identifies via that data.

Some interesting takeaways:

  • The growth in total number of internet users worldwide is slowing (the year-to-year growth rate is flat; overall growth is around 7% new users per year)
  • However, growth in India is still accelerating, and India is now the #2 global user market (behind China; USA is 3rd)
  • Similarly, there is a slowdown in the growth of the number of smartphone users and number of smartphones being shipped worldwide (still growing, but at a slower rate)
  • Android continues to gain market share; Android devices continue to be significantly less costly than Apple devices.
  • Overall, there are opportunities for businesses that innovate / increase efficiency / lower prices / create jobs
  • Advertising continues to demonstrate strong growth; advertising efficacy still has a ways to go (internet advertising is effective and can be even more so)
  • Internet as distribution channel continues to grow in use and importance
  •  Brand recognition is increasingly important
  • Visual communication channel usage is increasing – Generation Z relies more on communicating with images than with text
  • Messaging is becoming a core communication channel for business interactions in addition to social interactions
  • Voice on mobile rapidly rising as important user interface – lots of activity around this
  • Data as platform – important!

So, what kind of take-aways might be most useful to consider in the library context? Some top-of-head thoughts:

  • In the larger context of the Internet, Libraries need to be more aggressive in marketing their brand and brand value. We are, by nature, fairly passive, especially compared to our commercial competition, and a failure to better leverage the opportunity for brand exposure leaves the door open to commercial competitors.
  • Integration of library services and content through messaging channels will become more important, especially with younger users. (Integration may actually be too weak a term; understanding how to use messaging inherently within the digital lifestyles of our users is critical)
  • Voice – are any libraries doing anything with voice? Integration with Amazon’s Alexa voice search? How do we fit into the voice as platform paradigm?

One parting thought, that I’ll try to tease out in a follow-up post: Libraries need to look very seriously at the importance of personalized, customized curation of collections for users, something that might actually be antithetical to the way we currently approach collection development. Think Apple Music, but for books, articles, and other content provided by libraries. It feels like we are doing this in slices and pieces, but that we have not yet established a unifying platform that integrates with the larger Internet ecosystem.

Meaningful Web Metrics (Sun, 03 Jan 2016 20:10:52 +0000) - http://digitallibrarian.org/?p=207

This article from Wired magazine is a must-read if you are interested in more impactful metrics for your library’s web site. At MPOE, we are scaling up our need for in-house web product expertise, but regardless of how much we invest in terms of staffing, it is likely that the amount of requested web support will always exceed the amount of resourcing we have for that support. Leveraging meaningful impact metrics can help us understand the value we get from the investment we make in our web presence, and more importantly help us define what types of impact we want to achieve through that investment. This is no easy feat, but it is good to see that others in the information ecosystem are looking at the same challenges.

Site migrated (Mon, 01 Oct 2012 20:25:53 +0000) - http://digitallibrarian.org/?p=154

Just a quick note – digitallibrarian.org has been migrated to a new server. You may see a few quirks here and there, but things should be mostly in good shape. If you notice anything major, send me a Challah. Really. A nice bread. Or just an email. Your choice. 🙂

The new iPad (Sun, 18 Mar 2012 16:20:55 +0000) - http://digitallibrarian.org/?p=141

I decided that it was time to upgrade my original iPad, so I pre-ordered a new iPad, which arrived this past Friday. After a few days, here are my initial thoughts / observations:

  • Compared to the original iPad, the new iPad is a huge improvement. Much zippier, feels lighter, and of course the display is fantastic.
  • I’ve just briefly tried the dictation feature, and though I haven’t used it extensively yet, the accuracy seems pretty darned good. I wonder if a future update will support Siri?
  • The beauty of the display cannot be overstated – crisp, clear (especially for someone with aging eyes)
  • I purchased a 32 GB model with LTE, but I have not tried the cell network yet. I did see 4G show up, so I’m hoping that Tucson indeed has the newer network.
  • Not really new, but going from the original iPad to the new iPad, I really like the smart cover approach. Ditto with the form factor.
  • Again, not specific to the new model, the ability to access my music, videos, and apps via iCloud means that I can utilize the storage on the iPad more effectively.
  • All-in-all, I can see myself using the new iPad consistently for a variety of tasks, not just for consuming information. Point-in-fact, this post was written with the new iPad.

    3rd SITS Meeting – Geneva (Wed, 03 Aug 2011 09:38:19 +0000) - http://digitallibrarian.org/?p=130

    Back in June I attended the 3rd SITS (Scholarly Infrastructure Technical Summit) meeting, held in conjunction with the OAI7 workshop and sponsored by JISC and the Digital Library Federation. This meeting, held in lovely Geneva, Switzerland, brought together library technologists and technology leaders from North America, Europe, Australia, and Asia for the purpose of exploring common technology and technology-related issues that crossed our geographic boundaries.

    This is the first SITS meeting that I attended – prior to this meeting, there were two other SITS meetings (one in London and one in California). As this SITS meeting was attached to the OAI7 conference, it brought together a group of stakeholders whose roles in their organizations spanned from technology implementors to technology strategists and decision makers. From having chatted with some of the folks who had attended previous SITS meetings, the attendees at those meetings tended to weigh heavily on the technology implementer / developer side, while this particular instance of SITS had a broader range of discussion that, while centered on technology, also incorporated much of the context to which technology was being applied. For me, that actually made this a more intriguing and productive discussion, as I think that while there are certainly a great variety of strictly technical issues with which we grapple, what often gets lost when talking semantic web, linked data, digital preservation, etc. is the context and focus of the purpose of deploying said technology. So, with that particular piece of context, I’ll describe some of the conversation that occurred at this particular SITS event.

    Due to the schedule of OAI7, this SITS meeting was held in two parts – the afternoon of 24 June, and the morning of 25 June. For the first session, the group met in one of the lecture rooms at the conference venue, and this worked out quite nicely. SITS uses an open agenda / open meeting format, which allows the attendees to basically nominate and elect the topics of discussion for the meeting. After initial introductions, we began proposing topics. I tried to capture as best I could all of the topics that were proposed, though I might have missed one or two:

    * stable links for linked data vs. stable bitstreams for preservation
    * authority hubs / clustered IDs / researcher IDs / ORCID in DSpace
    * effective synchronization of digital resources
    * consistency and usage of usage data
    * digital preservation architecture – integration of tape-based storage and other storage environments (external to the library)
    * integration between repositories and media delivery (i.e. streaming) – particularly to access control enforcement
    * nano publications and object granularity
    * pairing storage with different types of applications
    * linking research data to scholarly publications to faculty assessment
    * well-behaved document
    * research impacts and outputs
    * linked open data: from vision to deployment
    * Relationship between open linked data and open research data
    * Name disambiguation

    Following the process, we took the brainstormed list above and voted on which topic to discuss first. The first topic chosen was researcher identities, which began with discussion around ORCID, a project that currently has reasonable mindshare behind it. While ORCID has a lot of backers, it is not clear whether a singular researcher ID is a feasible approach, though I believe we’ll discover the answer based on the success (or not) of the project. In general, I think most of the attendees will be paying attention to ORCID, but that a wait-and-see approach is also likely, as there are many, many issues around researcher IDs that still need to be worked through.

    The next topic was the assessment of research impacts and outputs. This particular topic was not particularly technically focused, but did bring about some interesting discussion about the impact of assessment activities, both positive and negative.

    The next topic, linking research data to scholarly publications to faculty assessment, was a natural progression from the previous topic, and much of the discussion revolved around how to support such relationships. I must admit that while I think this topic is important, I didn’t feel that the discussion really resolved any of the potential issues with supporting researchers in linking data to publications (and then capturing this data for assessment purposes). What is clear is that the concept of publishing data, especially open data, is not as straightforward as one would hope when you get into the details, such as where to publish data, how to credit such publication, how the data is maintained, etc. There is a lot of work to be done here.

    Next to be discussed was the preservation of data and software. It was brought up that the sustainability and preservation of data, especially open data, was somewhat analogous to the sustainability and preservation of software, in that both required a certain amount of ongoing, active work in order to ensure that both data and software were continually usable. It is also clear that much data requires the proper software in order to be usable, and therefore the issues of software and data sustainability and preservation are, to my mind, interwoven.

    The group then moved to a brief discussion of the harvesting and use of usage data. Efforts such as COUNTER and PIRUS2 were mentioned. The ability to track data in a way that balances anonymity and privacy against added value back to the user was discussed; the fact that usage data can be leveraged to provide better services back to users was a key consideration.

    The next discussion topic was influenced by the OAI7 workshop. The issue of the synchronisation of resources was discussed; during OAI7, there was a breakout session that looked at the future of OAI-PMH, both in terms of sustaining 1.x and in terms of work that might eventually produce an OAI-PMH 2.0. Interestingly, there was even some discussion of whether data synchronization is still needed with the advent of linked data; I can see why this would come up, but I personally believe that linked data isn’t at the point where other methods for keeping data synchronized are unnecessary (nor may it ever be).

    Speaking of linked data, the concept arose in many of the SITS discussions, though the group did not officially address it until late in the agenda. I must admit that I’ve yet to drink the linked data lemonade, in the sense that I really don’t see it being the silver bullet that many of its proponents make it out to be, but I do see it as one approach for enabling extended use of data and resources. One challenge of the linked data approach that came up in the discussion was the need to map between ontologies.

    At this point, it was getting a bit late into the meeting, but we did talk about two more topics: One was very pragmatic, while the other was a bit more future-thinking (though there might be some disagreement on that). The first was a discussion about how organizationally digital preservation architectures were being supported – were they being supported by central IT, by the Library IT, or otherwise? It seemed that (not surprisingly) a lot depended upon the specific organization, and that perhaps more coordination could be undertaken through efforts such as PASIG. The second discussion was on the topic of “nano-publications”, which the group defined as “things that simply tell you what is being asserted (e.g. Europe is a continent)”. I must admit I got a bit lost about the importance and purpose of nano-publications, but again, it was close to the end of the meeting.

    BTW, as I’m finishing this an email just came through with the official notes from the SITS meeting, which can be accessed at http://eprints.ecs.soton.ac.uk/22546/

    David Lewis’ presentation on Collections Futures (Wed, 02 Mar 2011 21:05:12 +0000) - http://digitallibrarian.org/?p=126

    Peter Murray (aka the Disruptive Library Technology Jester) has provided an audio-overlay of David Lewis’ slideshare of his plenary at last June’s RLG Annual Partners meeting. If you are at all interested in understanding the future of academic libraries, you should take an hour of your time and listen to this presentation. Of particular note, because David says it almost in passing, is that academic libraries are moving away from being collectors of information to being provisioners of information – the difference being that instead of purchasing everything that might be used, academic libraries are moving to ensuring that there is a path for provisioning access to materials that are actually requested for use by their users. Again, well worth an hour of your time.

    Librarians are *the* search experts… (Thu, 19 Aug 2010 14:22:46 +0000) - http://digitallibrarian.org/?p=121

    …so I wonder how many librarians know all of the tips and tricks for using Google that are mentioned here?

    What do we want from Discovery? Maybe it’s to save the time of the user…. (Wed, 18 Aug 2010 13:14:04 +0000) - http://digitallibrarian.org/?p=119

    Just a quick thought on discovery tools – the major newish discovery services being vended to libraries (WorldCat local, Summon, Ebsco Discovery Service, etc.) all have their strengths, their complexity, their middle-of-the-road politician trying to be everything to everybody features. One question I have asked and not yet had a good answer to is “How does your tool save the time of the user?”. For me, that’s the most important feature of any discovery tool.

    Show me data or study results that prove your tool saves the time of the user as compared to other vended tools (and Google and Google Scholar), and you have a clear advantage, at least in what I am considering when choosing to implement a discovery tool.

    Putting a library in Starbucks (Thu, 12 Aug 2010 09:40:58 +0000) - http://digitallibrarian.org/?p=114

    It is not uncommon to find a coffee shop in a library these days. Turn that concept around, though – would you expect a library inside a Starbucks? Or maybe that’s the wrong question – how would you react to having a library inside a Starbucks? Well, that concept is shuffling its way towards reality, as Starbucks is now experimenting with offering premium (i.e. non-free) content to users while they are on the free wireless that Starbucks provides. In fact, Starbucks actually has a collection development policy for their content – they are providing content in the following areas, which they call channels: News, Entertainment, Wellness, Business & Careers and My Neighborhood. They even call their offerings “curated content”.

    Obviously, this isn’t the equivalent of putting the full contents of a library into a coffee shop, but it is worth our time to pay attention to how this new service approach from Starbucks evolves. Starbucks isn’t giving away content for free just to get customers in the door; they are looking at how they might monetize this service through upsell techniques. The business models and agreements are going to have impact on how libraries do business, and we need to pay attention to how Starbucks brokers agreements with content providers. Eric Hellman’s current favorite term, monopsony, comes to mind here – though in reality Starbucks isn’t buying anything, as no money is actually changing hands, at least to start. Content providers are happy to allow Starbucks to provide limited access (i.e. limited by geographic location / network access) to content for free in order to promote their content and provide a discovery to delivery path that will allow users to extend their use of the content for a price.

    This raises the question: should libraries look at upsell opportunities, especially if it means we can reduce our licensing costs? At the very least, the idea is worth exploring.

    Source: Yahoo News

    1 week of iPad (Wed, 14 Apr 2010 11:10:36 +0000) - http://digitallibrarian.org/?p=101

    It has been a little over a week since my iPad was delivered, and in that time I have had the opportunity to try it out at home, at work, and on the road. In fact, I’m currently typing this entry on it from the hotel restaurant at the CNI Spring task force meeting. I feel that I have used it enough now to provide some of my insights and thoughts about the iPad, how I am using it, and what I think of it.

    So, how best to describe the iPad? Fun. Convenient. Fun again. The iPad is more than the sum of its parts; much like the iPhone, it provides an overall experience, one that is enjoyable and, yes, efficient. Browsing is great fun; I have only run into one site that was completely inaccessible because of the lack of Flash support (a local restaurant site). A number of sites that I regularly peruse have some Flash aspect that is not available via the iPad, but typically this isn’t a big loss. For example, if there is an Engadget article that contains video, I won’t get the video. However, the NY Times, ESPN, and other major sites are already supporting HTML5 embedded video, and I expect to see a strong push towards HTML5 and away from Flash. In the grand scheme of things, most of the sites I browse are text and image based, and have no issues.

    Likewise for email and calendaring – both work like a charm. Email on the iPad is easy, fun, and much better than on the iPhone. The keyboard, when in landscape mode, is actually much better than I expected, and very suitable for email replies (not to mention blog posts). I’d go as far as to say that the usability of the onscreen keyboard (when the iPad is in landscape mode) is as good as or better than a typical netbook keyboard. Also, an unintended bonus is that typing on the keyboard is pretty much silent; this is somewhat noticeable during conference sessions where a dozen or so attendees are typing their notes and the clack of their keyboards starts to add up.

    So, how am I using my iPad? Well, on this trip, I have used it to read (one novel and a bunch of work-related articles), do email, listen to music, watch videos, stream some netflix, browse the web, draft a policy document for my place of employment, diagram a repository architecture, and take notes during conference sessions. Could I do all of this on a laptop? Sure. Could I do all of this on a laptop without plugging in at any point in the day? Possibly, with the right laptop or net book. But here’s the thing – at the conference, instead of lugging my laptop bag around with me, my iPad replaced the laptop, my notepad, and everything else I would have dragged around in my bag. I literally only took my iPad, which is actually smaller than a standard paper notebook, and honestly I didn’t miss a beat. Quickly jot down a note? Easy. Sketch out an idea? Ditto. It’s all just right there, all the functionality, in a so-much-more convenient form factor.

    Is the iPad perfect? By no means – the desktop interface is optimized for the iPhone / iPod touch, and feels a bit inefficient for the larger iPad. Because of the current lack of multitasking (something that Apple has already announced will be available in the next version of the OS), I can’t keep an IM client running in the background. There is no inherent folder system, so saving files outside of applications is more complex than it should be. Fingerprints show up much more than I expected, though they wipe away fairly easily with a cloth. The weight (1.5 lbs) is just enough to make you need to shift how you hold the iPad after a period of time.

    Again, here’s the thing: the iPad doesn’t need to be perfect, it needs to be niche. Is it niche? Ask my laptop bag.

dihslovenia-si-2585 ---- Digitalno inovacijsko stičišče Slovenije (Digital Innovation Hub Slovenia)

Highlights: during Slovenia's presidency of the Council of the EU, present your work at the "Technology for People" digital exhibition; a call for companies to take part in the "Online Marketplaces" call; SPS is again supporting digitalization with vouchers. The investment is co-financed by the Republic of Slovenia and the European Union from the European Regional Development Fund.

Browse the catalog of experts · Obtain a voucher for co-financing · Register in the catalog of experts

News:
• 22 Apr 2021 - Opportunities for the digitalization of the Slovenian economy within the new 2021-2027 financial perspective
• 01 Apr 2021 - Shaping proposed content for study programmes
• 23 Mar 2021 - A new initiative makes it easier to gain digital skills for the jobs of the future

Achieve similar results yourself. Work with us! Discover the advantages of connecting partners within DIH Slovenia. Events: take part in meetings on digital transformation. Vouchers: up to 60% co-financing in the area of digitalization.

We enable digital transformation. We build cross-sectoral and multidisciplinary partnerships: universities, research and business institutions, companies, ICT providers, and business support organizations, which together form an ecosystem for sustainable short- and long-term support of this vision.
Networking: DIH Slovenia provides connections with investors, eases access to financing for digital transformation, connects users and providers of digital innovations, and enables synergies between digital and other key technologies.
Competences: developing the digital competences and talent of the future.
Support for digital transformation: joint development of services to support the management of digital transformation in companies.
Innovation and prototypes: promoting open innovation, designing new business models, and building experimental and pilot environments.
Internationalization: transferring good practices and cooperating with other Digital Innovation Hubs in the EU.

Strategic partners help us build Slovenia's digital future. Stay up to date: subscribe to the eNewsletter!
Contact: Dimičeva 13, 1503 Ljubljana, Slovenija · 040 606 710 (Mon-Fri, 10:00-13:00) · info@dihslovenia.si

dlfteach-pubpub-org-6389 ---- None
dlfteach-pubpub-org-6525 ---- None
dlfteach-pubpub-org-9995 ---- None
dltj-org-1250 ---- Publishers going-it-alone (for now?)
with GetFTR | Disruptive Library Technology Jester

Publishers going-it-alone (for now?) with GetFTR
Posted on December 03, 2019 and updated on April 03, 2021 · 5 minute read

In early December 2019, a group of publishers announced Get-Full-Text-Research, or GetFTR for short. I read about this first in Roger Schonfeld’s “Publishers Announce a Major New Service to Plug Leakage” piece in The Scholarly Kitchen via Jeff Pooley’s Twitter thread and blog post. Details about how this works are thin, so I’m leaning heavily on Roger’s description. I’m not as negative about this as Jeff, and I’m probably a little more opinionated than Roger. This is an interesting move by publishers, and—as the title of this post suggests—I am critical of the publisher’s “go-it-alone” approach.

First, some disclosure might be in order. My background has me thinking of this in the context of how it impacts libraries and library consortia. For the past four years, I’ve been co-chair of the NISO Information Discovery and Interchange topic committee (and its predecessor, the “Discovery to Delivery” topic committee), so this is squarely in what I’ve been thinking about in the broader library-publisher professional space. I also traced the early development of RA21 and more recently am volunteering on the SeamlessAccess Entity Category and Attribute Bundles Working Group; that’ll become more important a little further down this post.

I was nodding along with Roger’s narrative until I stopped short here:

The five major publishing houses that are the driving forces behind GetFTR are not pursuing this initiative through one of the major industry collaborative bodies. All five are leading members of the STM Association, NISO, ORCID, Crossref, and CHORUS, to name several major industry groups. But rather than working through one of these existing groups, the houses plan instead to launch a new legal entity. While [Vice President of Product Strategy & Partnerships for Wiley Todd] Toler and [Senior Director, Technology Strategy & Partnerships for the American Chemical Society Ralph] Youngen were too politic to go deeply into the details of why this might be, it is clear that the leadership of the large houses have felt a major sense of mismatch between their business priorities on the one hand and the capabilities of these existing industry bodies. At recent industry events, publishing house CEOs have voiced extensive concerns about the lack of cooperation-driven innovation in the sector. For example, Judy Verses from Wiley spoke to this issue in spring 2018, and several executives did so at Frankfurt this fall. In both cases, long standing members of the scholarly publishing sector questioned if these executives perhaps did not realize the extensive collaborations driven through Crossref and ORCID, among others. It is now clear to me that the issue is not a lack of knowledge but rather a concern at the executive level about the perceived inability of existing collaborative vehicles to enable the new strategic directions that publishers feel they must pursue.

This is the publishers going-it-alone.
As Roger describes it, they are going to create this web service that allows publishers to determine the appropriate copy for a patron and do it without input from the libraries. Librarians will just be expected to put this web service widget into their discovery services to get “colored buttons indicating that the link will take [patrons] to the version of record, an alternative pathway, or (presumably in rare cases) no access at all.” (Let’s set aside for the moment the privacy implications of having a fourth-party web service recording all of the individual articles that come up in a patron’s search results.) Librarians will not get to decide the “alternative pathway” that is appropriate for the patron: “Some publishers might choose to provide access to a preprint or a read-only version, perhaps in some cases on some kind of metered basis.” (Roger goes on to say that he “expect[s] publishers will typically enable some alternative version for their content, in which case the vast majority of scholarly content will be freely available through publishers even if it is not open access in terms of licensing.” I’m not so confident.)

No, thank you. If publishers want to engage in technical work to enable libraries and others to build web services that determine the direct link to an article based on a DOI, then great. Libraries can build a tool that consumes that information as well as takes into account information about preprint services, open access versions, interlibrary loan, and other methods of access. But to ask libraries to accept this publisher-controlled access button in their discovery layers, their learning management systems, their scholarly profile services, and their other tools? That sounds destined for disappointment.

I am only somewhat encouraged by the fact that RA21 started out as a small, isolated collaboration of publishers before they brought in NISO and invited libraries to join the discussion. Did it mean that it slowed down deployment of RA21? Undoubtedly yes. Did persnickety librarians demand transparent discussions and decisions about privacy-related concerns, like what attributes the publisher would get about the patron in the Shibboleth-powered backchannel? Yes, but because the patrons weren’t there to advocate for themselves. Will it likely mean wider adoption? I’d like to think so. Have publishers learned that forcing these kinds of technologies onto users without consultation is a bad idea? At the moment it would appear not.

Some of what publishers are seeking with GetFTR can be implemented with straight-up OpenURL or—at the very least—limited-scope additions to OpenURL (the Z39.88 open standard!). That they didn’t start with OpenURL, a robust existing standard, is both concerning and annoying. I’ll be watching and listening for points of engagement, so I remain hopeful.
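To make the OpenURL point concrete, here is a minimal sketch (my own illustration, not something from the post) of the kind of Z39.88 link a discovery layer can already hand to a library's link resolver. The resolver hostname, DOI, and referrer identifier are placeholders.

from urllib.parse import urlencode

# Hypothetical institutional link resolver; the DOI and referrer below are placeholders.
RESOLVER_BASE = "https://resolver.example.edu/openurl"

context = {
    "url_ver": "Z39.88-2004",                       # OpenURL 1.0 version
    "url_ctx_fmt": "info:ofi/fmt:kev:mtx:ctx",      # KEV context object format
    "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",  # the referent is a journal article
    "rft_id": "info:doi/10.1000/example.doi",       # the article's DOI
    "rfr_id": "info:sid/discovery.example.org",     # who is asking (the referring service)
}

openurl = RESOLVER_BASE + "?" + urlencode(context)
print(openurl)
# The library's resolver, not the publisher, then decides the appropriate copy for
# this patron: the licensed version of record, an open access copy, or interlibrary loan.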
A few words about Jeff Pooley’s five-step “laughably creaky and friction-filled effort” that is SeamlessAccess. Many of the steps Jeff describes are invisible and well-established technical protocols. What Jeff fails to take into account is the very visible and friction-filled effect of patrons accessing content beyond the boundaries of campus-recognized internet network addresses. Those patrons get stopped at step two with a “pay $35 please” message. I’m all for removing that barrier entirely by making all published content “open access”. It is folly to think, though, that researchers and readers can enforce an open access business model on all publishers, so solutions like SeamlessAccess will have a place. (Which is to say nothing of the benefit of inter-institutional resource collaboration opened up by a more widely deployed Shibboleth infrastructure powered by SeamlessAccess.)

Tags: discovery, GetFTR, niso, openurl, ra21, SeamlessAccess
Categories: Linking Technologies

dltj-org-165 ---- Managing Remote Conference Presenters with Zoom | Disruptive Library Technology Jester

Managing Remote Conference Presenters with Zoom
Posted on March 14, 2020 and updated on April 03, 2021 · 8 minute read

Bringing remote presenters into a face-to-face conference is challenging and fraught with peril. In this post, I describe a scheme using Zoom that had in-person attendees forgetting that the presenter was remote!

The Code4Lib conference was this week, and with the COVID-19 pandemic breaking out, many individuals and institutions decided not to travel to Pittsburgh for the meeting. We had an unprecedented nine presentations that were brought into the conference via Zoom. I was chairing the livestream committee for the conference (as I have done for several years—skipping last year), so it made the most sense for me to arrange a scheme for remote presenters. With the help of the on-site A/V contractor, we were able to pull this off with minimal requirements for the remote presenter.

List of Requirements
• 2 Zoom Pro accounts
• 1 PC/Mac with video output, as if you were connecting an external monitor (the “Receiving Zoom” computer)
• 1 PC/Mac (the “Coordinator Zoom” computer)
• 1 USB audio interface
• Hardwired network connection for the Receiving Zoom computer (recommended)

The Pro-level Zoom accounts were required because we needed to run a group call for longer than 40 minutes (to include setup time). And two were needed: one for the Coordinator Zoom machine and one for the dedicated Receiving Zoom machine.
It would have been possible to consolidate the two Zoom Pro accounts and the two PC/Mac machines into one, but we had back-to-back presenters at Code4Lib, and I wanted to be able to help one remote presenter get ready while another was presenting. In addition to this equipment, the A/V contractor was indispensable in making the connection work. We fed the remote presenter’s video and audio from the Receiving Zoom computer to the contractor’s A/V switch through HDMI, and the contractor put the video on the ballroom projectors and the audio through the ballroom speakers. The contractor gave us a selective audio feed of the program audio minus the remote presenter’s audio (so they wouldn’t hear themselves come back through the Zoom meeting). This becomes a little clearer in the connection description below.

Physical Connections and Setup
The diagram in the original post (not reproduced here) showed the physical connections between machines. The Audio Mixer and Video Switch were provided and run by the A/V contractor. The Receiving Zoom machine was the one connected to the A/V contractor’s Video Switch via an HDMI cable coming off the computer’s external monitor connection. In the Receiving Zoom computer’s control panel, we set the external monitor to mirror what was on the main monitor. The audio and video from the computer (i.e., the Zoom call) went out the HDMI cable to the A/V contractor’s Video Switch. The A/V contractor took the audio from the Receiving Zoom computer through the Video Switch and added it to the Audio Mixer as an input channel. From there, the audio was sent out to the ballroom speakers the same way audio from the podium microphone was amplified to the audience. We asked the A/V contractor to create an audio mix that included all of the audio sources except the Receiving Zoom computer (e.g., in-room microphones) and plugged that into the USB Audio interface. That way, the remote presenter could hear the sounds from the ballroom—ambient laughter, questions from the audience, etc.—in their Zoom call. (Note that it was important to remove the remote presenter’s own speaking voice from this audio mix; there was a significant, distracting delay between the time the presenter spoke and the time the audio was returned to them through the Zoom call.) We used a hardwired network connection to the internet, and I would recommend that—particularly with tech-heavy conferences that might overflow the venue wi-fi. (You don’t want your remote presenter’s Zoom to have to compete with what attendees are doing.) Be aware that the hardwired network connection will cost more from the venue and may take some time to get functioning, since this doesn’t seem to be something that hotels often do. In the Zoom meeting, we unmuted the microphone and selected the USB Audio interface as the microphone input. As the Zoom meeting was connected, we made the meeting window full-screen so the remote presenter’s face and/or presentation were at the maximum size on the ballroom projectors.

Setting Up the Zoom Meetings
The two Zoom accounts came from the Open Library Foundation. (Thank you!) As mentioned in the requirements section above, these were Pro-level accounts. The two accounts were olf_host2@openlibraryfoundation.org and olf_host3@openlibraryfoundation.org. The olf_host2 account was used for the Receiving Zoom computer, and the olf_host3 account was used for the Coordinator Zoom computer. The Zoom meeting edit page (screenshot not reproduced here) configured the “Code4Lib 2020 Remote Presenter A” meeting with the primary host as olf_host2@openlibraryfoundation.org; a scripted sketch of roughly the same configuration follows.
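The meetings themselves were configured through the Zoom web interface. For reference, roughly the same setup could be scripted against the Zoom REST API (v2); the sketch below is illustrative only, with a placeholder token and dates, and only the two account addresses taken from the post.

```python
# A rough sketch (not how the meetings were actually created, which was done
# in the Zoom web interface) of the same configuration against the Zoom REST
# API v2. The bearer token, dates, and times are placeholders; only the two
# account addresses come from the post.
import requests

ZOOM_API = "https://api.zoom.us/v2"
TOKEN = "..."  # placeholder: an OAuth/JWT token allowed to create meetings


def create_presenter_meeting(host_email: str, alt_host: str, topic: str) -> dict:
    body = {
        "topic": topic,
        "type": 8,                                # recurring, fixed time
        "start_time": "2020-03-09T08:00:00",      # placeholder start
        "duration": 600,                          # 8:00am to 6:00pm, in minutes
        "timezone": "America/New_York",
        "recurrence": {
            "type": 1,                            # daily
            "repeat_interval": 1,
            "end_date_time": "2020-03-11T22:00:00Z",  # placeholder end
        },
        "settings": {
            "join_before_host": True,             # presenter may arrive first
            "auto_recording": "cloud",            # backup recording
            "alternative_hosts": alt_host,        # the other OLF account
        },
    }
    resp = requests.post(f"{ZOOM_API}/users/{host_email}/meetings",
                         json=body,
                         headers={"Authorization": f"Bearer {TOKEN}"},
                         timeout=10)
    resp.raise_for_status()
    return resp.json()  # includes the join_url to send to the presenter


# Meeting A hosted by olf_host2 with olf_host3 as alternative host;
# Meeting B is the mirror image of the same settings.
create_presenter_meeting("olf_host2@openlibraryfoundation.org",
                         "olf_host3@openlibraryfoundation.org",
                         "Code4Lib 2020 Remote Presenter A")
create_presenter_meeting("olf_host3@openlibraryfoundation.org",
                         "olf_host2@openlibraryfoundation.org",
                         "Code4Lib 2020 Remote Presenter B")
```

Scripting it this way would mainly matter if the same paired-meeting arrangement were recreated for every conference day or for future events.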
Note these settings:
- A recurring meeting that ran from 8:00am to 6:00pm each day of the conference.
- Enable join before host is checked in case the remote presenter got on the meeting before I did.
- Record the meeting automatically in the cloud to use as a backup in case something goes wrong.
- Alternative Hosts is olf_host3@openlibraryfoundation.org.

The “Code4Lib 2020 Remote Presenter B” meeting was exactly the same except the primary host was olf_host3, and olf_host2 was added as an alternative host. The meetings were set up with each other as the alternative host so that the Coordinator Zoom computer could start the meeting, seamlessly hand it off to the Receiving Zoom computer, then disconnect.

Preparing the Remote Presenter
Remote presenters were given this information:

Code4Lib will be using Zoom for remote presenters. In addition to the software, having the proper audio setup is vital for a successful presentation.
- Microphone: The best option is a headset or earbuds so a microphone is close to your mouth. Built-in laptop microphones are okay, but using them will make it harder for the audience to hear you.
- Speaker: A headset or earbuds are required. Do not use your computer’s built-in speakers. The echo cancellation software is designed for small rooms and cannot handle the delay caused by large ballrooms.
You can test your setup with a test Zoom call. Be sure your microphone and speakers are set correctly in Zoom. Also, try sharing your screen on the test call so you understand how to start and stop screen sharing. The audience will see everything on your screen, so quit/disable/turn-off notifications that come from chat programs, email clients, and similar tools. Plan to connect to the Zoom meeting 30 minutes before your talk to work out any connection or setup issues.

At the 30-minute mark before the remote presentation, I went to the ballroom lobby and connected to the designated Zoom meeting for the remote presenter using the Coordinator Zoom computer. I used this checklist with each presenter:
- Check the presenter’s microphone level and sound quality (make sure the headset/earbud microphone is being used!)
- Check the presenter’s speakers and ensure there is no echo
- Test screen-sharing (start and stop) with the presenter
- Remind the presenter to turn off notifications from chat programs, email clients, etc.
- Remind the presenter that they need to keep track of their own time; there is no way for us to give them cues about timing other than interrupting them when their time is up

The critical item was making sure the audio worked (that their computer was set to use the headset/earbud microphone and audio output). The result was excellent sound quality for the audience. When the remote presenter was set on the Zoom meeting, I returned to the A/V table and asked a livestream helper to connect the Receiving Zoom to the remote presenter’s Zoom meeting. At this point, the remote presenter could hear the ballroom audio of the speaker before them coming through the Receiving Zoom computer. Now I would lock the Zoom meeting to prevent others from joining and interrupting the presenter (from the Zoom Participants panel, select More then Lock Meeting). I hung out on the remote presenter’s meeting on the Coordinator Zoom computer in case they had any last-minute questions. As the speaker in the ballroom was finishing up, I wished the remote presenter well and disconnected the Coordinator Zoom computer from the meeting.
(I always selected Leave Meeting rather than End Meeting for All so that the Zoom meeting continued with the remote presenter and the Receiving Zoom computer.) As the remote presenter was being introduced—and the speaker would know because they could hear it in their Zoom meeting—the A/V contractor switched the video source for the ballroom projectors to the Receiving Zoom computer and unmuted the Receiving Zoom computer’s channel on the Audio Mixer. At this point, the remote speaker was off and running!

Last Thoughts
This worked really well. Surprisingly well. So well that I had a few people comment that they were taken aback when they realized that there was no one standing at the podium during the presentation. I’m glad I had set up the two Zoom meetings. We had two cases where remote presenters were back-to-back. I was able to get the first remote presenter set up and ready on one Zoom meeting while preparing the second remote presenter on the other Zoom meeting. The most stressful part was the point when we disconnected the first presenter’s Zoom meeting and quickly connected to the second presenter’s Zoom meeting. This was slightly awkward for the second remote presenter because they didn’t hear their full introduction as it happened and had to jump right into their presentation. This could be solved by setting up a second Receiving Zoom computer, but that added complexity seemed to be too much for the benefit gained. I would definitely recommend making this setup a part of the typical A/V preparations for future Code4Lib conferences. We don’t know when an individual’s circumstances (much less a worldwide pandemic) might cause a last-minute request for a remote presentation capability, and the overhead of the setup is pretty minimal.

Tags: code4lib, howto, zoom Categories: Raw Technology
dltj-org-2962 ---- More Thoughts on Pre-recording Conference Talks | Disruptive Library Technology Jester

More Thoughts on Pre-recording Conference Talks
Posted on April 08, 2021. 7 minute read.

Over the weekend, I posted an article here about pre-recording conference talks and sent a tweet about the idea on Monday. I hoped to generate discussion about recording talks to fill in gaps—positive and negative—about the concept, and I was not disappointed. I’m particularly thankful to Lisa Janicke Hinchliffe and Andromeda Yelton along with Jason Griffey, Junior Tidal, and Edward Lim Junhao for generously sharing their thoughts. Daniel S and Kate Deibel also commented on the Code4Lib Slack team. I added to the previous article’s bullet points and am expanding on some of the issues here. I’m inviting everyone mentioned to let me know if I’m mischaracterizing their thoughts, and I will correct this post if I hear from them. (I haven’t found a good comments system to hook into this static site blog.)

Pre-recorded Talks Limit Presentation Format
Lisa Janicke Hinchliffe made this point early in the feedback:

@DataG For me downside is it forces every session into being a lecture. For two decades CfPs have emphasized how will this season be engaging/not just a talking head? I was required to turn workshops into talks this year. Even tho tech can do more. Not at all best pedagogy for learning — Lisa Janicke Hinchliffe (@lisalibrarian) April 5, 2021

Jason described the “flipped classroom” model that he had in mind as the NISOplus2021 program was being developed. The flipped classroom model is one where students do the work of reading material and watching lectures, then come to the interactive time with the instructors ready with questions and comments about the material. Rather than the instructor lecturing during class time, the class time becomes a discussion about the material. For NISOplus, “the recording is the material the speaker and attendees are discussing” during the live Zoom meetings. In the previous post, I described how having the speaker able to respond in text chat while the recording plays back is beneficial. Lisa went on to say:

@DataG Q+A is useful but isn't an interactive session. To me, interactive = participants are co-creating the session, not watching then commenting on it. — Lisa Janicke Hinchliffe (@lisalibrarian) April 5, 2021

She described an example: the SSP preconference she ran at CHS. (I’m paraphrasing her tweets in this paragraph.) The preconference had a short keynote and an “Oprah-style” panel discussion (not pre-prepared talks). This was done live; nothing was recorded. After the panel, people worked in small groups using Zoom and a set of Google Slides to guide the group work. The small groups reported their discussions back to all participants. Andromeda points out (paraphrasing twitter-speak): “Presenters will need much more—and more specialized—skills to pull it off, and it takes a lot more work.” And Lisa adds: “Just so there is no confusion … I don’t think being online makes it harder to do interactive. It’s the pre-recording. Interactive means participants co-create the session.
A pause to chat isn’t going to shape what comes next on the recording.”

Increased Technical Burden on Speakers and Organizers

@ThatAndromeda @DataG Totally agree on this. I had to pre-record a conference presentation recently and it was a terrible experience, logistically. I feel like it forces presenters to become video/sound editors, which is obviously another thing to worry about on top of content and accessibility. — Junior Tidal (@JuniorTidal) April 5, 2021

Andromeda also agreed with this: “I will say one of the things I appreciated about NISO is that @griffey did ALL the video editing, so I was not forced to learn how that works.” She continued, “everyone has different requirements for prerecording, and in [Code4Lib’s] case they were extensive and kept changing.” And later added: “Part of the challenge is that every conference has its own tech stack/requirements. If as a presenter I have to learn that for every conference, it’s not reducing my workload.” It is hard not to agree with this; a high-quality (stylistically and technically) recording is not easy to do with today’s tools. This is also a technical burden for meeting organizers. The presenters will put a lot of work into talks—including making sure the recordings look good; whatever playback mechanism is used has to honor the fidelity of that recording. For instance, presenters who have gone through the effort to ensure the accessibility of the presentation color scheme want the conference platform to display the talk “as I created it.” The previous post noted that recorded talks also allow for the creation of better, non-real-time transcriptions. Lisa points out that presenters will want to review that transcription for accuracy, which Jason noted adds to the length of time needed before the start of a conference to complete the preparations.

Increased Logistical Burden on Presenters

@ThatAndromeda @DataG @griffey Even if prep is no more than the time it would take to deliver live (which has yet to be case for me and I'm good at this stuff), it is still double the time if you are expected to also show up live to watch along with everyone else. — Lisa Janicke Hinchliffe (@lisalibrarian) April 5, 2021

This is a consideration I hadn’t thought through—that presenters have to devote more clock time to the presentation because first they have to record it and then they have to watch it. (Or, as Andromeda added, “significantly more than twice the time for some people, if they are recording a bunch in order to get it right and/or doing editing.”)

No. Audience. Reaction.

@DataG @griffey 3) No. Audience. Reaction. I give a joke and no one laughs. Was it funny? Was it not funny? Talks are a *performance* and a *relationship*; I'm getting energy off the audience, I'm switching stuff on the fly to meet their vibe. Prerecorded/webinar is dead. Feels like I'm bombing. — Andromeda Yelton (@ThatAndromeda) April 5, 2021

Wow, yes. I imagine it would take a bit of imagination to get in the right mindset to give a talk to a small camera instead of an audience. I wonder how stand-up comedians are dealing with this as they try to put on virtual shows. Andromeda summed this up:

@DataG @griffey oh and I mean 5) I don't get tenure or anything for speaking at conferences and goodness knows I don't get paid. So the ENTIRE benefit to me is that I enjoy doing the talk and connect to people around it. prerecorded talk + f2f conf removes one of these; online removes both.
— Andromeda Yelton (@ThatAndromeda) April 5, 2021

Also in this heading could be “No Speaker Reaction”—or the inability for subsequent speakers at a conference to build on something that someone said earlier. In the Code4Lib Slack team, Daniel S noted: “One thing comes to mind on the pre-recording [is] the issue that prerecorded talks lose the ‘conversation’ aspect where some later talks at a conference will address or comment on earlier talks.” Kate Deibel added: “Exactly. Talks don’t get to spontaneously build off of each other or from other conversations that happen at the conference.”

Currency of information

Lisa points out that pre-recording talks before an event means there is a delay between the recording and the playback. In the example she pointed out, a pre-recorded talk at RLUK would have been about the University of California working on an Open Access deal with Elsevier; live, it was able to be about “the deal we announced earlier this week”.

Conclusions?

Near the end of the discussion, Lisa added:

@DataG @griffey @ThatAndromeda I also recommend going forward that the details re what is required of presenters be in the CfP. It was one thing for conferences that pivoted (huge effort!) but if you write the CfP since the pivot it should say if pre-record, platform used, etc. — Lisa Janicke Hinchliffe (@lisalibrarian) April 5, 2021

…and Andromeda added: “Strong agree here. I understand that this year everyone was making it up as they went along, but going forward it’d be great to know that in advance.” That means conferences will need to take these needs into account well before the Call for Proposals (CfP) is published. A conference that is thinking now about pre-recording its talks must work through these issues and set expectations with presenters early.

As I hoped, the Twitter replies tempered my eagerness for the all-recorded style with some real-world experience. There could be possibilities here, but adapting face-to-face meetings to a world with less travel won’t be simple and will take significant thought beyond the issues of technology platforms. Edward Lim Junhao summarized this nicely: “I favor unpacking what makes up our prof conferences. I’m interested in recreating that shared experience, the networking, & the serendipity of learning sth you didn’t know. I feel in-person conferences now have to offer more in order to justify people traveling to attend them.” Related, Andromeda said: “Also, for a conf that ultimately puts its talks online, it’s critical that it have SOMEthing beyond content delivery during the actual conference to make it worth registering rather than just waiting for youtube. realtime interaction with the speaker is a pretty solid option.” If you have something to add, reach out to me on Twitter. Given enough responses, I’ll create another summary. Let’s keep talking about what that looks like and sharing discoveries with each other.

The Tree of Tweets

It was a great discussion, and I think I pulled in the major ideas in the summary above. With some guidance from Ed Summers, I’m going to embed the Twitter threads below using Treeverse by Paul Butler. We might be stretching the boundaries of what is possible, so no guarantees that this will be viewable for the long term.

Tags: code4lib, covid19, meeting planning, NISOplus Categories: L/IS Profession
dltj-org-3521 ---- Should All Conference Talks be Pre-recorded? | Disruptive Library Technology Jester

Should All Conference Talks be Pre-recorded?
Posted on April 03, 2021 and updated on April 08, 2021. 6 minute read.

The Code4Lib conference was last week. That meeting used all pre-recorded talks, and we saw the benefits of pre-recording for attendees, presenters, and conference organizers. Should all talks be pre-recorded, even when we are back face-to-face?

Note! After I posted a link to this article on Twitter, there was a great response of thoughtful comments. I've included new bullet points below and summarized the responses in another blog post.

As an entirely virtual conference, I think we can call Code4Lib 2021 a success. Success ≠ Perfect, of course, and last week the conference coordinating team got together on a Zoom call for a debriefing session. We had a lengthy discussion about what we learned and what we wanted to take forward to the 2022 conference, which we’re anticipating will be something with a face-to-face component. That last sentence was tough to compose: “…will be face-to-face”? “…will be both face-to-face and virtual”? (Or another fully virtual event?) Truth be told, I don’t think we know yet. I think we know with some certainty that the COVID pandemic will become much more manageable by this time next year—at least in North America and Europe. (Code4Lib draws primarily from North American library technologists with a few guests from other parts of the world.) I’m hearing from higher education institutions, though, that travel is going to be severely curtailed…if not for health risk reasons, then because budgets have been slashed. So one has to wonder what a conference will look like next year.

I’ve been to two online conferences this year: NISOplus21 and Code4Lib. Both meetings recorded talks in advance and started playback of the recordings at a fixed point in time. This was beneficial for a couple of reasons. For organizers and presenters, pre-recording allowed technical glitches to be worked through without the pressure of a live event happening. Technology is not nearly perfect enough or ubiquitously spread to count on it working in real-time. [1]
NISOplus21 also used the recordings to get transcribed text for the videos. (Code4Lib used live transcriptions on the synchronous playback.) Attendees and presenters benefited from pre-recording because the presenters could be in the text chat channel to answer questions and provide insights. Having the presenter free during the playback offers new possibilities for making talks more engaging: responding in real-time to polls, getting advance knowledge of topics for subsequent real-time question/answer sessions, and so forth. The synchronous playback time meant that there was a point when (almost) everyone was together watching the same talk—just as in face-to-face sessions.

During the Code4Lib conference coordinating debrief call, I asked the question: “If we saw so many benefits to pre-recording talks, do we want to pre-record them all next year?” In addition to the reasons above, pre-recorded talks benefit those who are not comfortable speaking English or are first-time presenters. (They have a chance to re-do their talk as many times as they need in a much less stressful environment.) “Live” demos are much smoother because a recording can be restarted if something goes wrong. Each year, at least one presenter needs to use their own machine (custom software, local development environment, etc.), and swapping out presenter computers in real-time is risky. And it is undoubtedly easier to impose time requirements with recorded sessions. So why not pre-record all of the talks?

I get it—it would be different to sit in a ballroom watching a recording play on big screens at the front of the room while the podium is empty. But is it so different as to dramatically change the experience of watching a speaker at a podium? In many respects, we had a dry run of this during Code4Lib 2020. It was at the early stages of the coming lockdowns when institutions started barring employee travel, and we had to bring in many presenters remotely. I wrote a blog post describing the setup we used for remote presenters, and at the end, I said:

I had a few people comment that they were taken aback when they realized that there was no one standing at the podium during the presentation.

Some attendees, at least, quickly adjusted to this format. For those with the means and privilege of traveling, there can still be face-to-face discussions in the hall, over meals, and social activities. For those that can’t travel (due to risks of traveling, family/personal responsibilities, or budget cuts), the attendee experience is a little more level—everyone is watching the same playback and in the same text backchannels during the talk. I can imagine a conference tool capable of segmenting chat sessions during the talk playback into “tables” where you and close colleagues can exchange ideas and then promote the best ones to a conference-wide chat room. Something like that would be beneficial as attendance grows for events with an online component, and it would be a new form of engagement that isn’t practical now.

There are undoubtedly reasons not to pre-record all session talks (beyond the feels-weird-to-stare-at-an-unoccupied-ballroom-podium reasons). During the debriefing session, one person brought up that having all pre-recorded talks erodes the justification for in-person attendance. I can see a manager saying, “All of the talks are online…just watch it from your desk. Even your own presentation is pre-recorded, so there is no need for you to fly to the meeting.” That’s legitimate.
So if you like bullet points, here’s how it lays out.

Pre-recording all talks is better for:
- Accessibility: better transcriptions for recorded audio versus real-time transcription (and probably at a lower cost, too)
- Engagement: the speaker can be in the text chat during playback, and there could be new options for backchannel discussions
- Better quality: speakers can re-record their talk as many times as needed
- Closer equality: in-person attendees are having much the same experience during the talk as remote attendees

Downsides for pre-recording all talks:
- Feels weird: yeah, it would be different
- Erodes justification: indeed a problem, especially for those for whom giving a speech is the only path to getting the networking benefits of face-to-face interaction
- Limits presentation format: it forces every session into being a lecture. For two decades CfPs have emphasized how the session will be engaging/not just a talking head. (Lisa Janicke Hinchliffe)
- Increased Technical Burden on Speaker and Organizers: conference organizers asking presenters to do their own pre-recording is a barrier (Junior Tidal), and organizers have added new requirements for themselves
- No Audience Feedback: pre-recording forces the presenter into an unnatural state relative to the audience (Andromeda Yelton)
- Currency of information: pre-recording talks before an event naturally introduces a delay between the recording and the playback (Lisa Janicke Hinchliffe)

I’m curious to hear of other reasons, for and against. Reach out to me on Twitter if you have some. The COVID-19 pandemic has changed our society and will undoubtedly transform it in ways that we can’t even anticipate. Is the way that we hold professional conferences one of them?

[1] Can we just pause for a moment and consider the decades of work and layers of technology that make a modern teleconference call happen? For you younger folks, there was a time when one couldn’t assume the network to be there. As in: the operating system on your computer couldn’t be counted on to have a network stack built into it. In the earliest years of my career, we were tickled pink to have Macintoshes at the forefront of connectivity through GatorBoxes. Go read the first paragraph of that Wikipedia article on GatorBoxes…TCP/IP was tunneled through LocalTalk running over PhoneNet on unshielded twisted pairs no faster than about 200 kbit/second. (And we loved it!) Now the network is expected; needing to know about TCP/IP is pushed so far down the stack as to be forgotten…assumed. Sure, the software on top now is buggy and bloated—is my Zoom client working? has Zoom’s service gone down?—but the network…we take that for granted.

Tags: code4lib, covid19, meeting planning, NISOplus Categories: L/IS Profession
dltj-org-4212 ---- Should All Conference Talks be Pre-recorded? | Disruptive Library Technology Jester
dltj-org-6401 ---- Publishers going-it-alone (for now?) with GetFTR | Disruptive Library Technology Jester

Publishers going-it-alone (for now?) with GetFTR
Posted on December 03, 2019 and updated on April 03, 2021. 5 minute read.

In early December 2019, a group of publishers announced Get-Full-Text-Research, or GetFTR for short. I read about this first in Roger Schonfeld’s “Publishers Announce a Major New Service to Plug Leakage” piece in The Scholarly Kitchen via Jeff Pooley’s Twitter thread and blog post. Details about how this works are thin, so I’m leaning heavily on Roger’s description. I’m not as negative about this as Jeff, and I’m probably a little more opinionated than Roger. This is an interesting move by publishers, and—as the title of this post suggests—I am critical of the publishers’ “go-it-alone” approach.

First, some disclosure might be in order. My background has me thinking of this in the context of how it impacts libraries and library consortia. For the past four years, I’ve been co-chair of the NISO Information Discovery and Interchange topic committee (and its predecessor, the “Discovery to Delivery” topic committee), so this is squarely in what I’ve been thinking about in the broader library-publisher professional space. I also traced the early development of RA21 and more recently am volunteering on the SeamlessAccess Entity Category and Attribute Bundles Working Group; that’ll become more important a little further down this post.

I was nodding along with Roger’s narrative until I stopped short here:

The five major publishing houses that are the driving forces behind GetFTR are not pursuing this initiative through one of the major industry collaborative bodies. All five are leading members of the STM Association, NISO, ORCID, Crossref, and CHORUS, to name several major industry groups. But rather than working through one of these existing groups, the houses plan instead to launch a new legal entity. While [Vice President of Product Strategy & Partnerships for Wiley Todd] Toler and [Senior Director, Technology Strategy & Partnerships for the American Chemical Society Ralph] Youngen were too politic to go deeply into the details of why this might be, it is clear that the leadership of the large houses have felt a major sense of mismatch between their business priorities on the one hand and the capabilities of these existing industry bodies. At recent industry events, publishing house CEOs have voiced extensive concerns about the lack of cooperation-driven innovation in the sector. For example, Judy Verses from Wiley spoke to this issue in spring 2018, and several executives did so at Frankfurt this fall.
In both cases, long standing members of the scholarly publishing sector questioned if these executives perhaps did not realize the extensive collaborations driven through Crossref and ORCID, among others. It is now clear to me that the issue is not a lack of knowledge but rather a concern at the executive level about the perceived inability of existing collaborative vehicles to enable the new strategic directions that publishers feel they must pursue.

This is the publishers going-it-alone.
Tags: discovery, GetFTR, niso, openurl, ra21, SeamlessAccess Categories: Linking Technologies

dltj-org-7097 ---- Should All Conference Talks be Pre-recorded? | Disruptive Library Technology Jester
dltj-org-7931 ---- Disruptive Library Technology Jester. Peter Murray: library technologist, open source advocate, striving to think globally while acting locally. Columbus, Ohio. Recent posts: More Thoughts on Pre-recording Conference Talks; Should All Conference Talks be Pre-recorded?; User Behavior Access Controls at a Library Proxy Server are Okay; As a Cog in the Election System: Reflections on My Role as a Precinct Election Official; Running an All-Online Conference with Zoom [post removed].
dltj-org-8735 ---- Should All Conference Talks be Pre-recorded?
Posted on April 03, 2021 and updated on April 08, 2021. 6 minute read. The Code4Lib conference was last week. That meeting used all pre-recorded talks, and we saw the benefits of pre-recording for attendees, presenters, and conference organizers. Should all talks be pre-recorded, even when we are back face-to-face? Note! After I posted a link to this article on Twitter, there was a great response of thoughtful comments. I’ve included new bullet points below and summarized the responses in another blog post. As an entirely virtual conference, I think we can call Code4Lib 2021 a success. Success ≠ Perfect, of course, and last week the conference coordinating team got together on a Zoom call for a debriefing session. We had a lengthy discussion about what we learned and what we wanted to take forward to the 2022 conference, which we’re anticipating will be something with a face-to-face component. That last sentence was tough to compose: “…will be face-to-face”? “…will be both face-to-face and virtual”? (Or another fully virtual event?) Truth be told, I don’t think we know yet. I think we know with some certainty that the COVID pandemic will become much more manageable by this time next year—at least in North America and Europe. (Code4Lib draws primarily from North American library technologists, with a few guests from other parts of the world.) I’m hearing from higher education institutions, though, that travel is going to be severely curtailed…if not for health risk reasons, then because budgets have been slashed. So one has to wonder what a conference will look like next year. I’ve been to two online conferences this year: NISOplus21 and Code4Lib. Both meetings recorded talks in advance and started playback of the recordings at a fixed point in time. This was beneficial for a couple of reasons. For organizers and presenters, pre-recording allowed technical glitches to be worked through without the pressure of a live event happening. Technology is not nearly perfect enough or ubiquitously spread to count on it working in real-time. [1] NISOplus21 also used the recordings to get transcribed text for the videos. (Code4Lib used live transcriptions on the synchronous playback.) Attendees and presenters benefited from pre-recording because the presenters could be in the text chat channel to answer questions and provide insights. Having the presenter free during the playback offers new possibilities for making talks more engaging: responding in real-time to polls, getting advance knowledge of topics for subsequent real-time question/answer sessions, and so forth. The synchronous playback time meant that there was a point when (almost) everyone was together watching the same talk—just as in face-to-face sessions. During the Code4Lib conference coordinating debrief call, I asked the question: “If we saw so many benefits to pre-recording talks, do we want to pre-record them all next year?” In addition to the reasons above, pre-recorded talks benefit those who are not comfortable speaking English or are first-time presenters.
(They have a chance to re-do their talk as many times as they need in a much less stressful environment.) “Live” demos are much smoother because a recording can be restarted if something goes wrong. Each year, at least one presenter needs to use their own machine (custom software, local development environment, etc.), and swapping out presenter computers in real-time is risky. And it is undoubtedly easier to impose time requirements with recorded sessions. So why not pre-record all of the talks? I get it—it would be different to sit in a ballroom watching a recording play on big screens at the front of the room while the podium is empty. But is it so different as to dramatically change the experience of watching a speaker at a podium? In many respects, we had a dry-run of this during Code4Lib 2020. It was at the early stages of the coming lockdowns when institutions started barring employee travel, and we had to bring in many presenters remotely. I wrote a blog post describing the setup we used for remote presenters, and at the end, I said: I had a few people comment that they were taken aback when they realized that there was no one standing at the podium during the presentation. Some attendees, at least, quickly adjusted to this format. For those with the means and privilege of traveling, there can still be face-to-face discussions in the hall, over meals, and social activities. For those that can’t travel (due to risks of traveling, family/personal responsibilities, or budget cuts), the attendee experience is a little more level—everyone is watching the same playback and in the same text backchannels during the talk. I can imagine a conference tool capable of segmenting chat sessions during the talk playback to “tables” where you and close colleagues can exchange ideas and then promote the best ones to a conference-wide chat room. Something like that would be beneficial as attendance grows for events with an online component, and it would be a new form of engagement that isn’t practical now. There are undoubtedly reasons not to pre-record all session talks (beyond the feels-weird-to-stare-at-an-unoccupied-ballroom-podium reasons). During the debriefing session, one person brought up that having all pre-recorded talks erodes the justification for in-person attendance. I can see a manager saying, “All of the talks are online…just watch it from your desk. Even your own presentation is pre-recorded, so there is no need for you to fly to the meeting.” That’s legitimate. So if you like bullet points, here’s how it lays out. Pre-recording all talks is better for: Accessibility: better transcriptions for recorded audio versus real-time transcription (and probably at a lower cost, too) Engagement: the speaker can be in the text chat during playback, and there could be new options for backchannel discussions Better quality: speakers can re-record their talk as many times as needed Closer equality: in-person attendees are having much the same experience during the talk as remote attendees Downsides for pre-recording all talks: Feels weird: yeah, it would be different Erodes justification: indeed a problem, especially for those for whom giving a speech is the only path to getting the networking benefits of face-to-face interaction Limits presentation format: it forces every session into being a lecture. For two decades CfPs have emphasized how will this season be engaging/not just a talking head? 
(Lisa Janicke Hinchliffe) Increased Technical Burden on Speaker and Organizers: conference organizers asking presenters to do their own pre-recording is a barrier (Junior Tidal), and organizers have added new requirements for themselves No Audience Feedback: pre-recording forces the presenter into an unnatural state relative to the audience (Andromeda Yelton) Currency of information: pre-recording talks before an event naturally introduces a delay between the recording and the playback. (Lisa Janicke Hinchliffe) I’m curious to hear of other reasons, for and against. Reach out to me on Twitter if you have some. The COVID-19 pandemic has changed our society and will undoubtedly transform it in ways that we can’t even anticipate. Is the way that we hold professional conferences one of them? [1] Can we just pause for a moment and consider the decades of work and layers of technology that make a modern teleconference call happen? For you younger folks, there was a time when one couldn’t assume the network to be there. As in: the operating system on your computer couldn’t be counted on to have a network stack built into it. In the earliest years of my career, we were tickled pink to have Macintoshes at the forefront of connectivity through GatorBoxes. Go read the first paragraph of that Wikipedia article on GatorBoxes…TCP/IP was tunneled through LocalTalk running over PhoneNet on unshielded twisted pairs no faster than about 200 kbit/second. (And we loved it!) Now the network is expected; needing to know about TCP/IP is pushed so far down the stack as to be forgotten…assumed. Sure, the software on top now is buggy and bloated—is my Zoom client working? has Zoom’s service gone down?—but the network…we take that for granted. ↩ Tags: code4lib, covid19, meeting planning, NISOplus Categories: L/IS Profession
dltj-org-9010 ---- What is known about GetFTR at the end of 2019 | Disruptive Library Technology Jester. Posted on December 28, 2019 and updated on April 03, 2021. 14 minute read. In early December 2019, a group of publishers announced Get-Full-Text-Research, or GetFTR for short. There was a heck of a response on social media, and it was—on the whole—not positive from my librarian-dominated corner of Twitter. For my early take on GetFTR, see my December 3rd blog post “Publishers going-it-alone (for now?) with GetFTR.” As that post title suggests, I took the five founding GetFTR publishers to task on their take-it-or-leave-it approach. I think that is still a problem. To get you caught up, here is a list of other commentary. Roger Schonfeld’s December 3rd “Publishers Announce a Major New Service to Plug Leakage” piece in The Scholarly Kitchen Tweet from Herbert Van de Sompel, the lead author of the OpenURL spec, on solving the appropriate copy problem December 5th post “Get To Fulltext Ourselves, Not GetFTR.” on the Open Access Button blog Twitter thread on December 7th between @cshillum and @lisalibrarian on the positioning of GetFTR in relation to link resolvers and an unanswered question about how GetFTR aligns with library interests Twitter thread started by @TAC_NISO on December 9th looking for more information with a link to an STM Association presentation added by @aarontay A tree of tweets starting from @mrgunn’s [I don’t trust publishers to decide] is the crux of the whole thing. In particular, threads of that tweet that include Jason Griffey of NISO saying he knew nothing about GetFTR and Bernhard Mittermaier’s point about hidden motivations behind GetFTR Twitter thread started by @aarontay on December 7th saying “GetFTR is bad for researchers/readers and librarians. It only benefits publishers, change my mind.” Lisa Janicke Hinchliffe’s December 10th “Why are Librarians Concerned about GetFTR?” in The Scholarly Kitchen and take note of the follow-up discussion in the comments Twitter thread between @alison_mudditt and @lisalibrarian clarifying PLOS is not on the Advisory Board, with some comments from @TAC_NISO as well. Ian Mulvany’s December 11th “thoughts on GetFTR” on ScholCommsProd GetFTR’s December 11th “Updating the community” post on their website The Spanish Federation of Associations of Archivists, Librarians, Archaeologists, Museologists and Documentalists (ANABAD)’s December 12th “GetFTR: new publishers service to speed up access to research articles” (original in Spanish, Google Translate to English) December 20th news entry from eContent Pro with the title “What GetFTR Means for Journal Article Access”; my only quarrel is with this sentence: “Thus, GetFTR is a service where Academic articles are found and provided to you at absolutely no cost.” No—if you are in academia the cost is borne by your library even if you don’t see it. But this seems like a third party service that isn’t directly related to publishers or libraries, so perhaps they can be forgiven for not getting that nuance.
Wiley’s Chemistry Views news post on December 26th titled simply “Get Full Text Research (GetFTR)” is perhaps only notable for the sentence “Growing leakage has steadily eroded the ability of the publishers to monetize the value they create.” If you are looking for a short list of what to look at, I recommend these posts. GetFTR’s Community Update On December 11—after the two posts I list below—an “Updating the Community” web page was posted to the GetFTR website. From a public relations perspective, it was…interesting. We are committed to being open and transparent This section goes on to say, “If the community feels we need to add librarians to our advisory group we will certainly do so and we will explore ways to ensure we engage with as many of our librarian stakeholders as possible.” If the GetFTR leadership didn’t get the indication between December 3 and December 12 that librarians feel strongly about being at the table, then I don’t know what more it would take. And it isn’t about being on the advisory group; it is about being seen and appreciated as important stakeholders in the research discovery process. I’m not sure who the “community” is in this section, but it is clear that librarians are—at best—an afterthought. That is not the kind of “open and transparent” that is welcoming. Later on in the Questions about library link resolvers section is this sentence: We have, or are planning to, consult with existing library advisory boards that participating publishers have, as this enables us to gather views from a significant number of librarians from all over the globe, at a range of different institutions. As I said in my previous post, I don’t know why GetFTR is not engaging in existing cross-community (publisher/technology-supplier/library) organizations to have this discussion. It feels intentional, which colors the perception of what the publishers are trying to accomplish. To be honest, I don’t think the publishers are using GetFTR to drive a wedge between library technology service providers (who are needed to make GetFTR a reality for libraries) and libraries themselves. But I can see how that interpretation could be made. Understandably, we have been asked about privacy. I punted on privacy in my previous post, so let’s talk about it here. It remains to be seen what is included in the GetFTR API request between the browser and the publisher site. Sure, it needs to include the DOI and a token that identifies the patron’s institution. We can inspect that API request to ensure nothing else is included. But the fact that the design of GetFTR has the browser making the call to the publisher site means that the publisher site knows the IP address of the patron’s browser, and the IP address can be considered personally identifiable information. This issue could be fixed by having the link resolver or the discovery layer software make the API request, and according to the Questions about library link resolvers section of the community update, this may be under consideration. So, yes, an auditable privacy policy and implementation is key for GetFTR. GetFTR is fully committed to supporting third-party aggregators This is good to hear. I would love to see more information published about this, including how discipline-specific repositories and institutional repositories can have their holdings represented in GetFTR responses.
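To make the privacy point above concrete, here is a minimal, purely hypothetical sketch in TypeScript of a browser-side entitlement lookup versus the same lookup proxied through the library's link resolver. The endpoint (publisher.example.org), the query parameter, the bearer token, and the EntitlementResponse shape are illustrative assumptions only; the actual GetFTR request and response formats had not been published in this level of detail.

```typescript
// Hypothetical sketch only: the endpoint, parameters, and response shape are
// assumptions for illustration, not the actual GetFTR API.

interface EntitlementResponse {
  doi: string;
  entitled: boolean;
  fullTextUrl?: string; // direct link to the article when entitled
}

// Option A: the patron's browser calls the publisher endpoint directly.
// Because the HTTP connection originates on the patron's machine, the
// publisher necessarily sees the patron's IP address along with the DOI
// and the institution token.
async function checkEntitlementFromBrowser(
  doi: string,
  institutionToken: string
): Promise<EntitlementResponse> {
  const url =
    "https://publisher.example.org/entitlements?doi=" + encodeURIComponent(doi);
  const resp = await fetch(url, {
    headers: { Authorization: "Bearer " + institutionToken }, // names the institution, not the person
  });
  return (await resp.json()) as EntitlementResponse;
}

// Option B: the library's link resolver or discovery layer makes the same
// call server-side on the patron's behalf. The publisher sees the library
// server's IP address rather than the patron's, and the library stays in
// the loop to offer other options (green OA copy, aggregator, ILL) next to
// the publisher link.
async function checkEntitlementFromLinkResolver(
  doi: string,
  institutionToken: string
): Promise<EntitlementResponse> {
  // Identical wire format in this sketch; only the origin of the request differs.
  return checkEntitlementFromBrowser(doi, institutionToken);
}
```

Either way the request itself can be inspected to confirm that it carries nothing more than the DOI and the institution token; the difference is whose network address the publisher sees, which is exactly the distinction the community update leaves open.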
My Take-a-ways In the second to last paragraph: “Researchers should have easy, seamless pathways to research, on whatever platform they are using, wherever they are.” That is a statement that I think every library could sign onto. This Updating the Community post is a good start, but the project has dug a deep hole of trust and it hasn’t reached level ground yet. Lisa Janicke Hinchliffe’s “Why are Librarians Concerned about GetFTR?” Posted on December 10th in The Scholarly Kitchen, Lisa outlines a series of concerns from a librarian perspective. I agree with some of these; others are not an issue in my opinion. Librarian Concern: The Connection to Seamless Access Many librarians have expressed a concern about how patron information can leak to the publisher through ill-considered settings at an institution’s identity provider. Seamless Access can ease access control because it leverages a campus’ single sign-on solution—something that a library patron is likely to be familiar with. If the institution’s identity provider is overly permissive in the attributes about a patron that get transmitted to the publisher, then there is a serious risk of tying a user’s research activity to their identity and the bad things that come from that (patrons self-censoring their research paths, commoditization of patron activity, etc.). I’m serving on a Seamless Access task force that is addressing this issue, and I think there are technical, policy, and education solutions to this concern. In particular, I think some sort of intermediate display of the attributes being transmitted to the publisher is most appropriate. Librarian Concern: The Limited User Base Enabled As Lisa points out, the population of institutions that can take advantage of Seamless Access, a prerequisite for GetFTR, is very small and weighted heavily towards well-resourced institutions. To the extent that projects like Seamless Access (spurred on by a desire to have GetFTR-like functionality) help with the adoption of SAML-based infrastructure like Shibboleth, the whole academic community benefits from a shared authentication/identity layer that can be assumed to exist. Librarian Concern: The Insertion of New Stumbling Blocks Of the issues Lisa mentioned here, I’m not concerned about users being redirected to their campus single sign-on system in multiple browsers on multiple machines. This is something we should be training users about—there is a single website to put your username/password into for whatever you are accessing at the institution. That a user might already be logged into the institution single sign-on system in the course of doing other school work and never see a logon screen is an attractive benefit to this system. That said, it would be useful for an API call from a library’s discovery layer to a publisher’s GetFTR endpoint to be able to say, “This is my user. Trust me when I say that they are from this institution.” If that were possible, then the Seamless Access Where-Are-You-From service could be bypassed for the GetFTR purpose of determining whether a user’s institution has access to an article on the publisher’s site. It would sure be nice if librarians were involved in the specification of the underlying protocols early on so these use cases could be offered. Update Lisa reached out on Twitter to say (in part): “Issue is GetFTR doesn’t redirect and SA doesnt when you are IPauthenticated.
Hence user ends up w mishmash of experience.” I went back to read her Scholarly Kitchen post and realized I did not fully understand her point. If GetFTR is relying on a Seamless Access token to know which institution a user is coming from, then that token must get into the user’s browser. The details we have seen about GetFTR don’t address how that Seamless Access institution token is put in the user’s browser if the user has not been to the Seamless Access select-your-institution portal. One such case is when the user is coming from an IP-address-authenticated computer on a campus network. Do the GetFTR indicators appear even when the Seamless Access institution token is not stored in the browser? If at the publisher site the GetFTR response also uses the institution IP address table to determine entitlements, what does a user see when they have neither the Seamless Access institution token nor the institution IP address? And, to Lisa’s point, how does one explain this disparity to users? Is the situation better if the GetFTR determination is made in the link resolver rather than in the user browser? Librarian Concern: Exclusion from Advisory Committee See previous paragraph. That librarians are not at the table offering use cases and technical advice means that the developers are likely closing off options that meet library needs. Addressing those needs would ease the acceptance of the GetFTR project as mutually beneficial. So an emphatic “AGREE!” with Lisa on her points in this section. Publishers—what were you thinking? Librarian Concern: GetFTR Replacing the Library Link Resolver Libraries and library technology companies are making significant investments in tools that ease the path from discovery to delivery. Would the library’s link resolver benefit from a real-time API call to a publisher’s service that determines the direct URL to a specific DOI? Oh, yes—that would be mighty beneficial. The library could put that link right at the top of a series of options that include a link to a version of the article in a Green Open Access repository, redirection to a content aggregator, one-click access to an interlibrary-loan form, or even an option where the library purchases a copy of the article on behalf of the patron. (More likely, the link resolver would take the patron right to the article URL supplied by GetFTR, but the library link resolver needs to be in the loop to be able to offer the other options.) My Take-a-ways The patron is affiliated with the institution, and the institution (through the library) is subscribing to services from the publisher. The institution’s library knows best what options are available to the patron (see above section). Want to know why librarians are concerned? Because the publishers are inserting themselves as the arbiter of access to content, whether it is in the patron’s best interest or not. It is also useful to reinforce Lisa’s closing paragraph: Whether GetFTR will act to remediate these concerns remains to be seen. In some cases, I would expect that they will. In others, they may not. Publishers’ interests are not always aligned with library interests and they may accept a fraying relationship with the library community as the price to pay to pursue their strategic goals. Ian Mulvany’s “thoughts on GetFTR” Ian’s entire post from December 11th in ScholCommsProd is worth reading. I think it is an insightful look at the technology and its implications.
Here are some specific comments: Clarifying the relation between SeamlessAccess and GetFTR There are a couple of things that I disagree with: OK, so what is the difference, for the user, between seamlessaccess and GetFTR? I think that the difference is the following - with seamless access you the user have to log in to the publisher site. With GetFTR if you are providing pages that contain DOIs (like on a discovery service) to your researchers, you can give them links they can click on that have been setup to get those users direct access to the content. That means as a researcher, so long as the discovery service has you as an authenticated user, you don’t need to even think about logins, or publisher access credentials. To the best of my understanding, this is incorrect. With SeamlessAccess, the user is not “logging into the publisher site.” If the publisher site doesn’t know who a user is, the user is bounced back to their institution’s single sign-on service to authenticate. If the publisher site doesn’t know where a user is from, it invokes the SeamlessAccess Where-Are-You-From service to learn which institution’s single sign-on service is appropriate for the user. If a user follows a GetFTR-supplied link to a publisher site but the user doesn’t have the necessary authentication token from the institution’s single sign-on service, then they will be bounced back for the username/password and redirected to the publisher’s site. GetFTR signaling that an institution is entitled to view an article does not mean the user can get it without proving that they are a member of the institution. What does this mean for Green Open Access A key point that Ian raises is this: One example of how this could suck, lets imagine that there is a very usable green OA version of an article, but the publisher wants to push me to using some “e-reader limited functionality version” that requires an account registration, or god forbid a browser exertion, or desktop app. If the publisher shows only this limited utility version, and not the green version, well that sucks. Oh, yeah…that does suck, and it is because the library—not the publisher of record—is better positioned to know what is best for a particular user. Will GetFTR be adopted? Ian asks, “Will google scholar implement this, will other discovery services do so?” I do wonder if GetFTR is big enough to attract the attention of Google Scholar and Microsoft Research. My gut tells me “no”: I don’t think Google and Microsoft are going to add GetFTR buttons to their search results screens unless they are paid a lot. As for Google Scholar, it is more likely that Google would build something like GetFTR to get the analytics rather than rely on a publisher’s version. I’m even more doubtful that the companies pushing GetFTR can convince discovery layers makers to embed GetFTR into their software. Since the two widely adopted discovery layers (in North America, at least) are also aggregators of journal content, I don’t see the discovery-layer/aggregator companies devaluing their product by actively pushing users off their site. My Take-a-ways It is also useful to reinforce Ian’s closing paragraph: I have two other recommendations for the GetFTR team. Both relate to building trust. First up, don’t list orgs as being on an advisory board, when they are not. Secondly it would be great to learn about the team behind the creation of the Service. At the moment its all very anonymous. Where Do We Stand? Wow, I didn’t set out to write 2,500 words on this topic. 
At the start I was just taking some time to review everything that happened since this was announced at the start of December and see what sense I could make of it. It turned into a literature review of sorts. While GetFTR has some powerful backers, it also has some pretty big blockers: Can GetFTR help spur adoption of Seamless Access enough to convince big and small institutions to invest in identity provider infrastructure and single sign-on systems? Will GetFTR grab the interest of Google, Google Scholar, and Microsoft Research (where admittedly a lot of article discovery is already happening)? Will developers of discovery layers and link resolvers prioritize GetFTR implementation in their services? Will libraries find enough value in GetFTR to enable it in their discovery layers and link resolvers? Would libraries argue against GetFTR in learning management systems, faculty profile systems, and other campus systems if their own services cannot be included in GetFTR displays? I don’t know, but I think it is up to the principals behind GetFTR to make more inclusive decisions. The next step is theirs. Tags: discovery, GetFTR, niso, openurl, ra21, SeamlessAccess Categories: Linking Technologies
dltj-org-9420 ---- More Thoughts on Pre-recording Conference Talks | Disruptive Library Technology Jester. Posted on April 08, 2021. 7 minute read. Over the weekend, I posted an article here about pre-recording conference talks and sent a tweet about the idea on Monday. I hoped to generate discussion about recording talks to fill in gaps—positive and negative—about the concept, and I was not disappointed. I’m particularly thankful to Lisa Janicke Hinchliffe and Andromeda Yelton along with Jason Griffey, Junior Tidal, and Edward Lim Junhao for generously sharing their thoughts. Daniel S and Kate Deibel also commented on the Code4Lib Slack team. I added to the previous article’s bullet points and am expanding on some of the issues here.
I’m inviting everyone mentioned to let me know if I’m mischaracterizing their thoughts, and I will correct this post if I hear from them. (I haven’t found a good comments system to hook into this static site blog.) Pre-recorded Talks Limit Presentation Format Lisa Janicke Hinchliffe made this point early in the feedback: @DataG For me downside is it forces every session into being a lecture. For two decades CfPs have emphasized how will this season be engaging/not just a talking head? I was required to turn workshops into talks this year. Even tho tech can do more. Not at all best pedagogy for learning — Lisa Janicke Hinchliffe (@lisalibrarian) April 5, 2021 Jason described the “flipped classroom” model that he had in mind as the NISOplus2021 program was being developed. The flipped classroom model is one where students do the work of reading material and watching lectures, then come to the interactive time with the instructors ready with questions and comments about the material. Rather than the instructor lecturing during class time, the class time becomes a discussion about the material. For NISOplus, “the recording is the material the speaker and attendees are discussing” during the live Zoom meetings. In the previous post, I described how it is beneficial to have the speaker free to respond in the text chat while the recording plays. Lisa went on to say: @DataG Q+A is useful but isn't an interactive session. To me, interactive = participants are co-creating the session, not watching then commenting on it. — Lisa Janicke Hinchliffe (@lisalibrarian) April 5, 2021 She described an example: the SSP preconference she ran at CHS. I’m paraphrasing her tweets in this paragraph. The preconference had a short keynote and an “Oprah-style” panel discussion (not pre-prepared talks). This was done live; nothing was recorded. After the panel, people worked in small groups using Zoom and a set of Google Slides to guide the group work. The small groups reported their discussions back to all participants. Andromeda points out (paraphrasing twitter-speak): “Presenters will need much more—and more specialized—skills to pull it off, and it takes a lot more work.” And Lisa adds: “Just so there is no confusion … I don’t think being online makes it harder to do interactive. It’s the pre-recording. Interactive means participants co-create the session. A pause to chat isn’t going to shape what comes next on the recording.” Increased Technical Burden on Speakers and Organizers @ThatAndromeda @DataG Totally agree on this. I had to pre-record a conference presentation recently and it was a terrible experience, logistically. I feel like it forces presenters to become video/sound editors, which is obviously another thing to worry about on top of content and accessibility. — Junior Tidal (@JuniorTidal) April 5, 2021 Andromeda also agreed with this: “I will say one of the things I appreciated about NISO is that @griffey did ALL the video editing, so I was not forced to learn how that works.” She continued, “everyone has different requirements for prerecording, and in [Code4Lib’s] case they were extensive and kept changing.” And later added: “Part of the challenge is that every conference has its own tech stack/requirements. If as a presenter I have to learn that for every conference, it’s not reducing my workload.” It is hard not to agree with this; a high-quality (stylistically and technically) recording is not easy to do with today’s tools. This is also a technical burden for meeting organizers.
The presenters will put a lot of work into talks—including making sure the recordings look good; whatever playback mechanism is used has to honor the fidelity of that recording. For instance, presenters who have gone through the effort to ensure the accessibility of the presentation color scheme want the conference platform to display the talk “as I created it.” The previous post noted that recorded talks also allow for the creation of better, non-real-time transcriptions. Lisa points out that presenters will want to review that transcription for accuracy, which Jason noted adds to the length of time needed before the start of a conference to complete the preparations. Increased Logistical Burden on Presenters @ThatAndromeda @DataG @griffey Even if prep is no more than the time it would take to deliver live (which has yet to be case for me and I'm good at this stuff), it is still double the time if you are expected to also show up live to watch along with everyone else. — Lisa Janicke Hinchliffe (@lisalibrarian) April 5, 2021 This is a consideration I hadn’t thought through—that presenters have to devote more clock time to the presentation because first they have to record it and then they have to watch it. (Or, as Andromeda added, “significantly more than twice the time for some people, if they are recording a bunch in order to get it right and/or doing editing.”) No. Audience. Reaction. @DataG @griffey 3) No. Audience. Reaction. I give a joke and no one laughs. Was it funny? Was it not funny? Talks are a *performance* and a *relationship*; I'm getting energy off the audience, I'm switching stuff on the fly to meet their vibe. Prerecorded/webinar is dead. Feels like I'm bombing. — Andromeda Yelton (@ThatAndromeda) April 5, 2021 Wow, yes. I imagine it would take some effort to get in the right mindset to give a talk to a small camera instead of an audience. I wonder how stand-up comedians are dealing with this as they try to put on virtual shows. Andromeda summed this up: @DataG @griffey oh and I mean 5) I don't get tenure or anything for speaking at conferences and goodness knows I don't get paid. So the ENTIRE benefit to me is that I enjoy doing the talk and connect to people around it. prerecorded talk + f2f conf removes one of these; online removes both. — Andromeda Yelton (@ThatAndromeda) April 5, 2021 Also under this heading could be “No Speaker Reaction”—or the inability for subsequent speakers at a conference to build on something that someone said earlier. In the Code4Lib Slack team, Daniel S noted: “One thing comes to mind on the pre-recording [is] the issue that prerecorded talks lose the ‘conversation’ aspect where some later talks at a conference will address or comment on earlier talks.” Kate Deibel added: “Exactly. Talks don’t get to spontaneously build off of each other or from other conversations that happen at the conference.” Currency of information Lisa points out that pre-recording talks before an event means there is a delay between the recording and the playback. In the example she pointed out, a talk at RLUK, had it been pre-recorded, would have been about the University of California working on an Open Access deal with Elsevier; delivered live, it could be about “the deal we announced earlier this week”. Conclusions? Near the end of the discussion, Lisa added: @DataG @griffey @ThatAndromeda I also recommend going forward that the details re what is required of presenters be in the CfP. It was one thing for conferences that pivoted (huge effort!)
but if you write the CfP since the pivot it should say if pre-record, platform used, etc. — Lisa Janicke Hinchliffe (@lisalibrarian) April 5, 2021 …and Andromeda added: “Strong agree here. I understand that this year everyone was making it up as they went along, but going forward it’d be great to know that in advance.” That means conferences will need to take these needs into account well before the Call for Proposals (CfP) is published. A conference that is thinking now about pre-recording their talks must work through these issues and set expectations with presenters early. As I hoped, the Twitter replies tempered my eagerness for the all-recorded style with some real-world experience. There could be possibilities here, but adapting face-to-face meetings to a world with less travel won’t be simple and will take significant thought beyond the issues of technology platforms. Edward Lim Junhao summarized this nicely: “I favor unpacking what makes up our prof conferences. I’m interested in recreating that shared experience, the networking, & the serendipity of learning sth you didn’t know. I feel in-person conferences now have to offer more in order to justify people traveling to attend them.” Related, Andromeda said: “Also, for a conf that ultimately puts its talks online, it’s critical that it have SOMEthing beyond content delivery during the actual conference to make it worth registering rather than just waiting for youtube. realtime interaction with the speaker is a pretty solid option.” If you have something to add, reach out to me on Twitter. Given enough responses, I’ll create another summary. Let’s keep talking about what that looks like and sharing discoveries with each other. The Tree of Tweets It was a great discussion, and I think I pulled in the major ideas in the summary above. With some guidance from Ed Summers, I’m going to embed the Twitter threads below using Treeverse by Paul Butler. We might be stretching the boundaries of what is possible, so no guarantees that this will be viewable for the long term. Tags: code4lib, covid19, meeting planning, NISOplus Categories: L/IS Profession
docs-google-com-400 ---- Documentation Sprint 1.1.0 - Google Sheets
1 Review Complete Auditor Reviewer Link Status Type Audience Goal PROBLEMS NOTES 2 About https://islandora.github.io/documentation/ Good For Now Conceptual Stranger Explains at a high level what islandora does 3 Concepts Structuring menu item - not a page 4 Reviewed DIG MA ├── Collections https://islandora.github.io/documentation/concepts/collection/ Needs Work Conceptual Newcomer Explain the concept of Collections in Islandora, with reference to bulk management and the interaction of Islandora Defaults. Points to page that does not exist yet (Bulk Editing). Assumes some Basic Drupal knowledge and knowledge of Islandora Defaults, too early (because this is one of the first pages in the documentation). Collections should probably not be the first page in the documentation tree. 'Content Types' should be in the Glossary. Add more links. 5 Audited MH ├── Access Control https://islandora.github.io/documentation/concepts/access-control/ Needs Work Conceptual DevOps, repository manager Explain what mechanism(s) for access control are available and how restrictions affect Islandora repo content Mixes documentation type and audiences; make this conceptual documentation for repository managers that explains which levels of restriction can be configured, how inheritance works (it doesn't), separate out sysadmin/devops documentation about preventing access to other components of the stack, consider moving overview over contrib modules not part of Islandora core/default to a "solution gallery" or cookbook section with recommendations; fix link to documentation page on managing user accounts 6 KC ├── Accessibility https://islandora.github.io/documentation/concepts/accessibility/ Conceptual 7 ├── Component Overview https://islandora.github.io/documentation/installation/component_overview/ Conceptual Stranger Give an understanding of what components Islandora includes and how they work together. This should have a link to the architecture diagram: https://islandora.github.io/documentation/technical-documentation/diagram/ (MA) 8 AB ├── Modelling content in Islandora 8 vs. 7 https://islandora.github.io/documentation/user-documentation/objects_to_resource_nodes/ Conceptual Islandora 7 user Translate between the "object" and "datastreams" model and the "nodes" and "media" model 9 └── Islandora Defaults https://islandora.github.io/documentation/reference/islandora_defaults_reference/ Conceptual Create sensible expectations around configurability and ongoing support 10 Installation Structuring menu item - not a page Proposed page under this menu item: Installation overview, describing why we have so many installation methods 11 ├── Docker Compose (ISLE-DC) https://islandora.github.io/documentation/installation/docker-compose/ Conceptual Reference page: what is ISLE.
Explain "best practices" like Remov tutorial Proposed sub-page: Tutorial Create a Dev-Environment; procedural; geared towards 'baby devs'; Hand-hold walkthrough of creating a local sandbox 12 ├── Ansible Playbook https://islandora.github.io/documentation/installation/playbook/ Needs Work Procedural 13 ├── Manual Installation Structuring menu item - not a page Procedural 14 │ ├── Introduction https://islandora.github.io/documentation/installation/manual/introduction/ Procedural Site Builder Assumes, but does not specify Ubuntu (or similar) operating system 15 CG │ ├── Preparing a LAPP Webserver https://islandora.github.io/documentation/installation/manual/preparing_a_webserver/ Needs Work Procedural Site Builder Remove jargon, check specifications. Is this locked to PHP 7.2? To PostgreSQL? LAPP? Linux Apache PostgreSQL & PHP? 16 │ ├── Installing Composer, Drush, and Drupal https://islandora.github.io/documentation/installation/manual/installing_composer_drush_and_drupal/ Procedural Site Builder 17 │ ├── Installing Tomcat and Cantaloupe https://islandora.github.io/documentation/installation/manual/installing_tomcat_and_cantaloupe/ Procedural Site Builder 18 │ ├── Installing Fedora, Syn, and Blazegraph https://islandora.github.io/documentation/installation/manual/installing_fedora_syn_and_blazegraph/ Procedural Site Builder 19 │ ├── Installing Solr https://islandora.github.io/documentation/installation/manual/installing_solr/ Procedural Site Builder 20 │ ├── Installing Crayfish https://islandora.github.io/documentation/installation/manual/installing_crayfish/ Procedural Site Builder 21 │ ├── Installing Karaf and Alpaca https://islandora.github.io/documentation/installation/manual/installing_karaf_and_alpaca/ Procedural Site Builder 22 │ └── Configuring Drupal https://islandora.github.io/documentation/installation/manual/configuring_drupal/ Procedural Site Builder 23 └── Installing Modules https://islandora.github.io/documentation/technical-documentation/install-enable-drupal-modules/ Procedural Site Builder 24 Tutorials Structuring menu item - not a page 25 Reviewed MC MAC ├── Create a Resource Node https://islandora.github.io/documentation/tutorials/create-a-resource-node/ Good For Now Procedural Islandora/Drupal Novice, Content/Collection Manager Hand holdy walkthrough of creating a resource node with a media file. Note in tutorial to Keep it simple and avoid fields with the autocomplete symbol could stand an explanation for avoiding, or a link to more information elsewhere. 26 Audited MC KC ├── Create a Collection https://islandora.github.io/documentation/tutorials/how-to-create-collection/ Good For Now Procedural Islandora/Drupal Novice, Content/Collection Manager Walkthrough of creating and populating a Collection in UI Minor accuracy issue: References to "Collection Members" tab should be changed to "Children tab" as shown in screenshots. This tutorial has "Introduction" section, while previous tutorial has opening "Overview" section 27 Audited MC ├── Configure Blocks https://islandora.github.io/documentation/tutorials/blocks/ Needs Work Procedural Islandora/Drupal Novice, Site Builder Walkthrough of general Block layout and Context configurations Lack of labeled "Overview" or "Introduction" section. Screenshots and steps in the Using Context section need to be updated to match current release (as seen on public sandbox). 
For example, Context list page on sandbox shows more context groupings than screenshot; text for "Click 'Configure' button" step should read "click 'Edit' option" I found myself wondering if there are Islandora-specific blocks of interest, or if the majority of Islandora-centric configurations are in the Context options (which seems to be the case). 28 Reviewed MC MAC ├── Create or Update a View https://islandora.github.io/documentation/tutorials/create_update_views/ Needs Work Procedural Islandora/Drupal Novice, Site Builder Walkthrough of how to modify existing and create new views Screenshot for step 4.a doesn't match sandbox (different button name). In Create new view section, instructions include selecting "Create a block." Some explanation of relationship with blocks as they are explained in separate page would be helpful. 29 Audited MC └── Video Documentation https://islandora.github.io/documentation/user-documentation/video-docs/ Needs Work Reference Islandora/Drupal Novice, consumers of documentation in video format Provide browsable list of video tutorials available, organized by broad categories Lacks Intro/Overview section in TOC, even though there is intro text. Link to "the playlist" is a link to this page (self-referencing, instead of linking out to YouTube playlist). Text for "Regenerating a Derivative" video link has a typo. The intro text mentions that new videos are added to the playlist (and updated here on this page?) regularly, so it would be nice to place the page's last update info at the top rather than in the footer as it is currently. 30 Documentation Structuring menu item - not a page 31 ├── Introduction https://islandora.github.io/documentation/user-documentation/user-intro/ Conceptual 32 AB KC ├── Intro to Linked Data https://islandora.github.io/documentation/user-documentation/intro-to-ld-for-islandora-8/ Conceptual 33 Audited MA ├── Versioning https://islandora.github.io/documentation/user-documentation/versioning/ Needs Work Conceptual Islandora/Drupal Novice, Site Builder Describes how versioning works in Islandora and Fedora+Islandora, including the workflow Specifically references Islandora 8.x-1.1. This should be updated or made evergreen. This page could also be a good place to introduce/explain semantic versioning? 34 ├── Content in Islandora 8 Structuring menu item - not a page Conceptual 35 Reviewed MC MA │ ├── Resource Nodes https://islandora.github.io/documentation/user-documentation/resource-nodes/ Conceptual Islandora/Drupal Novice, Repository admins Provide detailed explanation of the components and configuration options for resource nodes. Lacks Intro/Overview section in TOC, even though there is intro text. Last update date at top of page doesn't match last update date in footer. Islandora 8 Property/Value table is missing a row for uid. Field section could use expansion covering how to view/manage/configure fields, to be more consistent with other sections on page. Display modes section needs more clarity in last paragraph about order and overrides. Adding links between this page and the Create a Resource page at https://islandora.github.io/documentation/tutorials/create-a-resource-node/ would be helpful.
36 MC │ ├── Media https://islandora.github.io/documentation/user-documentation/media/ Conceptual 37 MC │ ├── Paged Content https://islandora.github.io/documentation/user-documentation/paged-content/ Conceptual 38 MR │ └── Metadata https://islandora.github.io/documentation/user-documentation/metadata/ Good For Now Conceptual Systems Admin, Users, Novice To describe the basic metadata configuration, how it's stored, and ways it can be configured One minor note is that I was a bit confused by the paragraph that began with "Not all content types in your Drupal site need be Islandora "resource nodes"." It took me two reads to grasp what they were talking about. 39 ├── Configuring Islandora Structuring menu item - not a page Procedural 40 AB │ ├── Modify or Create a Content Type https://islandora.github.io/documentation/user-documentation/content_types/ Procedural 41 │ ├── Configure Search https://islandora.github.io/documentation/user-documentation/searching/ Procedural 42 RL │ ├── Configure Context https://islandora.github.io/documentation/user-documentation/context/ Procedural 43 MR MC │ ├── Multilingual https://islandora.github.io/documentation/user-documentation/multilingual/ Procedural 44 Audited MA │ ├── Extending Islandora https://islandora.github.io/documentation/user-documentation/extending/ Good For Now Reference Site builders To describe and link to additional resources for adding non-Islandora Drupal modules. Mostly pointing to the Cookbook. Very brief, just pointing out. Could be improved by adding https://www.drupal.org/project/project_theme as a link when mentioning themes. 45 Audited MA │ ├── Viewers https://islandora.github.io/documentation/user-documentation/file_viewers/ Needs Work Conceptual Site builders Explains how viewers work, including a configuration example Attempts to be procedural, but the example is not quite written step-by-step enough to follow along and accomplish a goal. Audience seems to be Site builders, especially based on context of the other pages in this section, but it's written a little technical. 46 MA │ ├── IIIF https://islandora.github.io/documentation/user-documentation/iiif/ Reference Site builders Explains what IIIF is and how it works in the Islandora context. Crosses the line between procedural and reference, since it both explains, and has some steps for making changes 47 MR │ ├── OAI-PMH https://islandora.github.io/documentation/user-documentation/oai/ Procedural 48 MR │ ├── RDF Generation https://islandora.github.io/documentation/islandora/rdf-mapping/ Procedural 49 MR │ ├── Drupal Bundle Configurations https://islandora.github.io/documentation/islandora/drupal-bundle-configurations/ Procedural 50 │ └── Flysystem https://islandora.github.io/documentation/technical-documentation/flysystem/ Procedural 51 └── Operating an Islandora Repository Structuring menu item - not a page Procedural 52 MC . ├── Create and Manage User Accounts https://islandora.github.io/documentation/user-documentation/users/ Procedural 53 .
└── Usage Stats https://islandora.github.io/documentation/user-documentation/usage-stats/ Procedural 54 System Administrator Documentation Structuring menu item - not a page 55 Reviewed MH MA ├── Updating Drupal https://islandora.github.io/documentation/technical-documentation/updating_drupal/ Needs Work Procedural system administrator explain steps needed to update the Drupal component of the Islandora stack check if described process reflects the approach necessary for ISLE; page says it's missing description on updating Islandora features; 'make backup' admonition should be a step in the process; 'alternate syntax needed' admonition should be a step in the process; highlight more explicitly if Islandora pins versions of Drupal components or modules Missing pages: Describe how to update any other component of the stack that requires special instructions 56 Audited MH RL ├── Uploading large files https://islandora.github.io/documentation/technical-documentation/uploading-large-files/ Good For Now Reference system administrator explain configuration options for use case "I want Islandora users to be able to upload large files" Consider moving to a new "solution gallery" section, or a new "configuration options" page under the Sys Admin documentation 57 Audited MH RL └── JWT Authentication https://islandora.github.io/documentation/technical-documentation/jwt/ Good For Now Reference developer and/or systems administrator lists key storage locations and explains configuration of JWT authentication for secure communication between components Consider moving to installation instructions 58 Documentation for Developers Structuring menu item - not a page 59 Reviewed MH MA ├── Architecture Diagram https://islandora.github.io/documentation/technical-documentation/diagram/ Needs Work Reference developer and system administrator overview of Islandora stack components and their interaction Is "Syn" something that needs to feature in the diagram and list of components? check to make sure the diagram and list of components is up to date 60 ├── REST Documentation Structuring menu item - not a page 61 Audited MH │ ├── Introduction https://islandora.github.io/documentation/technical-documentation/using-rest-endpoints/ Needs Work Reference developer overview of the RESTful API, which allows for programmatic interaction with Islandora content link to Drupal documentation about RESTful API, if it exists; documentation about Authentication should have a separate page 62 Audited MH │ ├── GET https://islandora.github.io/documentation/technical-documentation/rest-get/ Good For Now Reference developer describe how to retrieve metadata for nodes, media and file entities, as well as binary file URLs (a request sketch appears after this table) 63 Audited MH │ ├── POST/PUT https://islandora.github.io/documentation/technical-documentation/rest-create/ Needs Work Reference developer describe how to create a node, media/file entities through the REST API unclear if JSON data in request can contain more than just the required fields (I suppose it can, add an example?); consider creating separate pages for POST and PUT, since the verbs are used for different things (creating node vs. creating file) and are used at slightly different endpoints (Drupal vs.
Islandora); check and document if there are for instance file size limitations for using PUT requests (link to https://islandora.github.io/documentation/technical-documentation/uploading-large-files/) 64 Audited MH │ ├── PATCH https://islandora.github.io/documentation/technical-documentation/rest-patch/ Good For Now Reference developer describe how to update values on fields of nodes or media using the REST API 65 Audited MH │ ├── DELETE https://islandora.github.io/documentation/technical-documentation/rest-delete/ Needs Work Reference developer describe how to delete nodes, media or files using the REST API verify and document if deleting nodes/media through REST API can leave media/files orphaned, and how to mitigate that 66 Audited MH │ └── Signposting https://islandora.github.io/documentation/technical-documentation/rest-signposting/ Good For Now Reference developer, system admin describe which HTTP Link Headers Islandora returns in the response to a GET request perhaps link to https://signposting.org/ for rationale and sample use cases? If the Link Headers provided by either Drupal or Islandora are configurable, document that 67 ├── Tests Structuring menu item - not a page Procedural 68 │ ├── Running Tests https://islandora.github.io/documentation/technical-documentation/running-automated-tests/ Procedural 69 │ └── Testing Notes https://islandora.github.io/documentation/technical-documentation/testing-notes/ Procedural 70 ├── Updating drupal-project https://islandora.github.io/documentation/technical-documentation/drupal-project/ Procedural 71 Audited RL ├── Versioning Policy https://islandora.github.io/documentation/technical-documentation/versioning/ Needs Work Reference developer describe how we version the various components of Islandora? Be the "Versioning policy" that seems necessary. Page could be more explicit about how we release major/minor versions, incorporating more of the semver explanations, such as this page: https://docs.launchdarkly.com/sdk/concepts/versioning Actually, I have questions about whether the Drupal 8/9 modules are still using "core compatibility" as the first number, since Drupal 9 is HERE (the page says no) 72 Audited RL ├── Adding back ?_format=jsonld https://islandora.github.io/documentation/technical-documentation/adding_format_jsonld/ Needs Work Procedural developer Document that we changed behaviour around the 1.0 release so that devs can revert if desired This page doesn't make sense as a standalone page. It is random and bizarre. It should be part of the discussion of what Milliner is, and maybe what a URI is in the context of Islandora and Fedora. I don't think we've had this discussion. 73 ├── Updating a `deb` and adding it to Lyrasis PPA https://islandora.github.io/documentation/technical-documentation/ppa-documentation/ Procedural 74 └── Alpaca Structuring menu item - not a page Procedural 75 . ├── Alpaca Technical Stack https://islandora.github.io/documentation/alpaca/alpaca-technical-stack/ Procedural 76 . 
└── Alpaca Tips https://islandora.github.io/documentation/technical-documentation/alpaca_tips/ Procedural 77 Migration Structuring menu item - not a page 78 ├── Migration Overview https://islandora.github.io/documentation/technical-documentation/migration-overview/ Procedural 79 RL ├── CSV https://islandora.github.io/documentation/technical-documentation/migrate-csv/ Procedural 80 └── Islandora 7 https://islandora.github.io/documentation/technical-documentation/migrate-7x/ Procedural 81 Contributing Structuring menu item - not a page 82 Audited MA ├── How to contribute https://islandora.github.io/documentation/contributing/CONTRIBUTING/ Needs Work Procedural New contributors Explains the avenues and procedures for making contributions to the Islandora codebase and documentation This is based on the CONTRIBUTING.md file that is standard in every Islandora GitHub repo. Because those have to stand alone, it doesn't really read well as part of the larger documentation set, and it could be more verbose in this context, especially in terms of how to contribute to documentation. Example of another CONTRIBUTING.md: https://github.com/Islandora/islandora/blob/7.x/CONTRIBUTING.md 83 Audited MA ├── Resizing a VM https://islandora.github.io/documentation/technical-documentation/resizing_vm/ Needs Work Procedural Testers Instructions for adjusting the size allocated to a Virtual Machine so that larger files can be adjusted. These instructions are great, but it's weird that this is a page all on its own. It should be a section or note in a page about using an Islandora VM 84 Audited MA ├── Checking Coding Standards https://islandora.github.io/documentation/technical-documentation/checking-coding-standards/ Needs Work Procedural Developers Describes the commands to run to check coding standards before making a contribution. This should be verified by someone with a dev background to make sure it's all still relevant, and it probably does not need to be its own page. It could be rolled into the description of how to do a pull request that is included in the "How to contribute" page in this same section. 85 ├── Contributing Workflow https://islandora.github.io/documentation/contributing/contributing-workflow/ Procedural 86 YS ├── Creating GitHub Issues https://islandora.github.io/documentation/contributing/create_issues/ Procedural 87 Audited YS ├── Editing Documentation https://islandora.github.io/documentation/contributing/editing-docs/ Needs Work Procedural documentation contributors, developers, committers Instructions for editing the documentation using the online GitHub code editor and by creating a pull request online. A) explain how markdown is a formatting language and that mkdocs uses it B) Refer to "THIS PROJECT'S Documentation Style Guide" to explain the provenance of the style guide D) mention that you can request a Contributor License Agreement if you don't have one.
E) explain that "Starting from the page you want to edit" refers to any of the github.io versions of this content F) mention that there is a way to contribute docs with Issues as mentioned here, by creating an issue ...https://github.com/Islandora/documentation/blob/24155c50257de067d02aa4e6e48a381ace273d94/CONTRIBUTING.md G) specifically mention that documentation can be built by forking then cloning a local copy of the repo and then one can follow a typical PR process 88 Audited YS ├── How to Build Documentation https://islandora.github.io/documentation/technical-documentation/docs-build/ Needs Work Procedural documentation contributors, developers, committers Instructions on how to build the documentation from the documentation repo, including how to install the mkdocs Python-based software needed to build the docs. A) Provide macOS install syntax referring to "pip3 --user" B) Verify if we need to run git submodule update --init --recursive to build docs. C) Consider spelling out the steps from the linked training video on how to test a doc pull request (download a zip version of the PR branch/commit, mkdocs build --clean, mkdocs serve). D) mention that you can use ctrl-c to quit out of mkdocs on the terminal. 89 Audited YS ├── Documentation Style Guide https://islandora.github.io/documentation/contributing/docs_style_guide/ Good For Now Reference documentation contributors, developers, committers List of suggestions for how to create well-formatted and well-styled documentation. In the bullet that mentions that doc submissions should use GitHub PRs we could link to the "Editing Documentation" page that explains the basics of PRs. This page could cover cross-page linking syntax for this project. 90 Audited MA └── Committers https://islandora.github.io/documentation/contributing/committers/ Needs Work Reference Everyone? Describes the rights and responsibilities of Islandora committers, and how new committers are nominated and approved. Also lists current and Emeritus committers. Alan Stanley is listed as working for Prince Edward Islandora [sic]. 91 Glossary https://islandora.github.io/documentation/user-documentation/glossary/ Reference
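To make the REST rows above concrete (and in the spirit of the "add an example?" note on the POST/PUT row), here is a minimal, hypothetical request sketch; it is not taken from the audited pages, and the host, node ID, and credentials are placeholders. It assumes the JSON serialization described on the GET page is enabled on the site.

```python
# Hypothetical sketch: retrieve a resource node's metadata as JSON from an
# Islandora (Drupal) site. The host, node ID, and credentials are placeholders.
import requests

BASE = "https://islandora.example.org"           # placeholder site

resp = requests.get(f"{BASE}/node/1",
                    params={"_format": "json"},  # "jsonld" where that format is enabled
                    auth=("admin", "password"))  # placeholder credentials
resp.raise_for_status()
node = resp.json()
print(node.get("title"), node.get("type"))
```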
docs-google-com-7032 ---- Islandora Open Meeting: April 27, 2021 - Google Docs
docs-google-com-7108 ---- Documentation Sprint 1.1.0 - Google Sheets
doi-org-3035 ---- A cross disciplinary study of link decay and the effectiveness of mitigation techniques | BMC Bioinformatics | Full Text Volume 14 Supplement 14 Proceedings of the Tenth Annual MCBIOS Conference Proceedings Open Access Published: 09 October 2013 A cross disciplinary study of link decay and the effectiveness of mitigation techniques Jason Hennessey1 & Steven Xijin Ge1 BMC Bioinformatics volume 14, Article number: S5 (2013) Abstract Background The dynamic, decentralized world-wide-web has become an essential part of scientific research and communication. Researchers create thousands of web sites every year to share software, data and services. These valuable resources tend to disappear over time. The problem has been documented in many subject areas. Our goal is to conduct a cross-disciplinary investigation of the problem and test the effectiveness of existing remedies. Results We accessed 14,489 unique web pages found in the abstracts within Thomson Reuters' Web of Science citation index that were published between 1996 and 2010 and found that the median lifespan of these web pages was 9.3 years with 62% of them being archived. Survival analysis and logistic regression were used to find significant predictors of URL lifespan. The availability of a web page is most dependent on the time it is published and the top-level domain names. Similar statistical analysis revealed biases in current solutions: the Internet Archive favors web pages with fewer layers in the Universal Resource Locator (URL) while WebCite is significantly influenced by the source of publication. We also created a prototype for a process to submit web pages to the archives and increased coverage of our list of scientific webpages in the Internet Archive and WebCite by 22% and 255%, respectively. Conclusion Our results show that link decay continues to be a problem across different disciplines and that current solutions for static web pages are helping and can be improved. Background Scholarly Internet resources play an increasingly important role in modern research. We can see this by the increasing number of URLs published in a paper's title or abstract [1](also see Figure 1). Until now, maintaining the availability of scientific contributions has been decentralized, mature and effective, utilizing methods developed over centuries to archive the books and journals in which they were communicated. As the Internet is still a relatively new medium for communicating scientific thought, the community is still figuring out how best to use it in a way that preserves contributions for years to come. One problem is that continued availability of these online resources is at the mercy of the organizations or individuals that host them. Many disappear after publication (and some even disappear before[2]), leading to a well-documented phenomenon referred to as link rot or link decay. Figure 1 Growth of scholarly online resources. Not only are the number of URL-containing articles (those with "http" in the title or abstract) published per year increasing (dotted line), but also the percentage of published items containing URLs (solid line). The annual increase in articles according to a linear fit was 174 with R2 0.97.
The linear trend for the percentage was an increase of 0.010% per year with R2 0.98. Source: Thomson Reuters' Web of Science The problem has been documented in several subject areas, with Table 1 containing a large list of these subject-specific studies. In terms of wide, cross-disciplinary analyses, the closest thus far are those of the biological and medical MEDLINE and PubMed databases by Ducut [1] and Wren [3, 4], in addition to Yang's study of the Social Sciences within the Chinese Social Sciences Citation Index (CSSCI) [5]. Table 1 Link decay has been studied for several years in specific subject areas. Some solutions have been proposed which attack the problem from different angles. The Internet Archive (IA) [6] and WebCite (WC) [7] address the issue by archiving web pages, though their mechanisms for acquiring those pages differ. The IA, beginning from a partnership with the Alexa search engine, employs an algorithm that crawls the Internet at large, storing snapshots of pages it encounters along the way. In contrast, WebCite archives only those pages which are submitted to it, and it is geared toward the scientific community. These two methods, however, can only capture information that is visible from the client. Logic and data housed on the server are not frequently available. Other tools, like the Digital Object Identifier (DOI) System [8] and Persistent Uniform Resource Locator (PURL) [9], provide solutions for when a web resource is moved to a different URL but is still available. The DOI System was created by an international consortium of organizations wishing to assign unique identifiers to items such as movies, television shows, books, journal articles, web sites and data sets. It encompasses several thousand "Naming Authorities" organized under a few "Registration Agencies" that have a lot of flexibility in their business models[10]. Perhaps 30-60% of link rot could be solved using DOIs and PURLs[11, 12]. However, they are not without pitfalls. One is that a researcher or company could stop caring about a particular tool for various reasons and thus not be interested in updating its permanent identifier. Another is that the one wanting the permanent URL (the publishing author) is frequently not the same as the person administering the site itself over the long term, thus we have an imbalance of desire vs. responsibilities between the two parties. A third in the case of the DOI System is that there may be a cost in terms of money and time associated with registering their organization that could be prohibitive to authors that don't already have access to a Naming Authority[1]. One example of a DOI System business model would be that of the California Digital Library's EZID service, which charges a flat rate (currently $2,500 for a research institution) for up to 1 million DOIs per year[13]. In this study, we ask two questions: what are the problem's characteristics in scientific literature as a whole and how is it being addressed? To assess progress in combating the problem, we evaluate the effectiveness of the two most prevalent preservation engines and examine the effectiveness of one prototyped solution. If a URL is published in the abstract, it is assumed that the URL plays a prominent role within that paper, similar to the rationale proposed by Wren [4].
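As a concrete illustration of the kind of client-side check such a study relies on, the following minimal Python sketch (not the authors' code; the example URL and timeout are placeholders) tests whether a URL still answers on the live web and whether the Internet Archive's Wayback Machine reports a snapshot through its public availability endpoint, assuming that endpoint keeps its current JSON shape.

```python
# Minimal sketch (not the study's code): check whether a published URL still
# resolves on the live web and whether the Wayback Machine holds a snapshot.
# Assumes the availability endpoint returns JSON of the form
# {"archived_snapshots": {"closest": {"available": true, ...}}}.
import requests

def is_live(url, timeout=30):
    """True if the URL answers with an HTTP status below 400."""
    try:
        resp = requests.head(url, allow_redirects=True, timeout=timeout)
        if resp.status_code >= 400:  # some servers reject HEAD; fall back to GET
            resp = requests.get(url, allow_redirects=True, timeout=timeout)
        return resp.status_code < 400
    except requests.RequestException:
        return False

def in_wayback(url, timeout=30):
    """True if the Wayback Machine reports a closest archived snapshot."""
    try:
        resp = requests.get("https://archive.org/wayback/available",
                            params={"url": url}, timeout=timeout)
        closest = resp.json().get("archived_snapshots", {}).get("closest")
        return bool(closest and closest.get("available"))
    except (requests.RequestException, ValueError):
        return False

if __name__ == "__main__":
    url = "http://example.org/some/published/tool/"  # placeholder URL
    print(url, "live:", is_live(url), "archived:", in_wayback(url))
```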
Results Our goals are to provide some metrics that are useful in understanding the problem of link decay in a cross-disciplinary fashion and to examine the effectiveness of the existing archival methods while proposing some incremental improvements. To accomplish these tasks, we downloaded 18,231 Web of Science (WOS) abstracts containing "http" in the title or abstract from the years under study (1996-2010), out of which 17,110 URLs (14,489 unique) were extracted and used. We developed Python scripts to access these URLs over a 30-day period. For the period studied, 69% of the published URLs (67% of the unique) were available on the live Internet, the Internet Archive's Wayback Machine had archived 62% (59% unique) of the total and WebCite had 21% (16% unique). Overall, 65% of all URLs (62% unique) were available from one of the two surveyed archival engines. Figure 2 contains a breakdown by year for availability on the live web as well as through the combined archives, and Figure 3 illustrates each archival engine's coverage. The median lifetime for published URLs was found to be 9.3 years (95% CI [9.3,10.0]), with the median lifetime amongst unique URLs also being 9.3 years (95% CI [9.3,9.3]). Subject-specific lifetimes may be found in Table 2. Using a simple linear model, the chances that a URL published in a particular year is still available go down by 3.7% for each year added to its age with an R2 of 0.96. Its chances of being archived go up after an initial period of flux (see Figure 2). Submitting our list of unarchived but living URLs to the archival engines showed dramatic promise, increasing the Internet Archive's coverage of the dataset by 2080 URLs, an increase of 22%, and WebCite's by 6348, an increase of 255%. Figure 2 The accessibility of URLs from a particular year is closely correlated with age. The probability of being available (solid line) declines by 3.7% every year based on a linear model with R2 0.96. The surveyed archival engines have about a 70-80% archival rate (dotted line) following an initial ramp time. Figure 3 URL presence in the archives. Percentage of URLs found in the archives of the Internet Archive (dashed line), WebCite (dotted line) or in any group (solid line). IA is older, and thus accounts for the lion's share of earlier published URLs, though as time goes on WebCite is offering more and more. Table 2 Comparison of certain statistics based on the subject of a given URL. How common are published, scholarly online resources? For WOS, both the percentage of published items which contained a URL as well as their absolute number increased steadily since 1996 as seen in Figure 1. Simple linear fits showed the former's annual increase at a conservative 0.010% per year with an R2 of 0.98 while the latter's increase was 174 papers with an R2 of 0.97. A total of 189 (167 unique) DOI URLs were identified, consisting of 1% of the total, while 9 PURLs (8 unique) were identified. Due to cost[14], it is likely that DOIs will remain useful for tracking commercially published content though not the scholarly online items independent of those publishers. URL survival In order to shed some light on the underlying phenomena of link rot, a survival regression model was fitted with data from the unique URLs.
This model, shown in Table 3, identified 17 top-level domains, the number of times a URL has been published, a URL's directory structure depth (hereafter referred to as "depth", using the same definition as [15]), the number of times the publishing article(s) has been cited, whether articles contain funding text as well as 4 journals as having a significant impact on a URL's lifetime at the P < 0.001 level. This survival regression used the logistic distribution and is interpreted similarly to logistic models. To determine the predicted outcome for a particular URL, one takes the intercept (5.2) and adds to it the coefficients for the individual predictors if those predictors are different from the base level; coefficients here are given in years. If numeric, one first multiplies before adding. The result is then interpreted as the location of the peak of a bell curve for the expected lifetime, instead of a log odds ratio as a regular logistic model would give. Among the two categorical predictors (domains and journals having more than 100 samples), the three having the largest positive impact on lifetimes were the journal Zoological Studies (+16) and the top-level domains org and dk (+8 for both). A URL from an org domain published in Zoological Studies, for example, would therefore have a predicted peak lifetime of roughly 5.2 + 16 + 8 ≈ 29 years, before the numeric predictors are applied. Though smaller in magnitude than the positive ones, the 3 categorical predictors having the largest negative impact were the journals Computer Physics Communications (-4) and Bioinformatics (-2) as well as the domain kr (-3), though the P values associated with the latter two are more marginal than some of the others (.006 and .02 respectively). Table 3 Results of fitting a parametric survival regression using the logistic distribution to the unique URLs. Predictors of availability While examining URL survival and archival, it is not only interesting to ask which factors significantly correlate with a URL lasting but also which account for most of the differences. To that end, we fit logistic models for each of the measured outcomes (live web, Internet Archive and Web Citation availabilities) to help tease out that information. To enhance comparability, a similar list of predictors (differing only in whether the first or last year a URL was published was used) without interaction terms was employed for all 3 methods and unique deviance calculated by dropping each term from the model and measuring the change in residual deviance. Results were then expressed as a percentage of the total uniquely explained deviance and are graphically shown in Figure 4. Figure 4 How important is each predictor in predicting whether a URL is available? This graph compares what portion of the overall deviance is explained uniquely by each predictor for each of the measured outcomes. A similar list of predictors (differing only in whether the first or last year a URL was published) without interaction terms was employed to construct 3 logistic regression models. The dependent variable for each of the outcomes under study (Live Web, Internet Archive and WebCite) was availability at the time of measurement. Unique deviance was calculated by dropping each term and measuring the change in explained deviance in the logistic model. Results were then expressed as a percentage of the total uniquely explained deviance for each of the 3 methods. For live web availability, the most deviance was explained by the last year a URL was published (42%) followed by the domain (26%). That these two predictors are very important agrees with much of the published literature thus far.
For the Internet Archive, by far the most important predictor was the URL depth at 45%. Based on this, it stands to reason that the Internet Archive either prefers more popular URLs which happen to be at lower depths or employs an algorithm that prioritizes breadth over depth. Similar to the IA, WC had a single predictor that accounted for much of the explained deviance, with the publishing journal representing 49% of the explained deviance. This may reflect WC's efforts to work with publishers as the model shows one of the announced early adopters, BioMed Central [7], as having the two measured journals (BMC Bioinformatics and BMC Genomics) with the highest retention rates. Therefore, WC is biased towards a publication's source (journals). Archive site performance Another way to measure the effectiveness of the current solutions to link decay is to look at the number of "saved" URLs, or those missing ones that are available through archival engines. Out of the 31% of URLs (33% of the unique) which were not accessible on the live web, 49% of them (47% of the unique) were available in one of the two engines, with IA having 47% (46% unique) and WC having 7% (6% unique). WC's comparatively lower performance can likely be attributed to a combination of its requirement for human interaction and its still-growing adoption. In order to address the discrepancy, all sites that were still active but not archived were submitted to the engine(s) from which they were missing. Using the information gleaned from probing the sites as well as the archives, URLs missing from one or both of the archives, yet still alive, were submitted programmatically. This included submitting 2,662 to the Wayback Machine as well as 7,477 to WebCite, of which 2,080 and 6,348 were successful, respectively. Discussion Submission of missing URLs to archives Archiving missing URLs in each of the archival engines had its own special nuances. For the Internet Archive, the lack of a practical documented way of submitting URLs (see http://faq.web.archive.org/my-sites-not-archived-how-can-i-add-it/) necessitated trusting a message shown by the Wayback Machine when one finds a URL that isn't archived and clicks the "Latest" button. In this instance, the user is sent to the URL "http://liveweb.archive.org/" which has a banner proclaiming that the page "will become part of the permanent archive in the next few months". Interestingly, as witnessed by requests for a web page hosted on a server for which the authors could monitor the logs, only those items requested by the client were downloaded. This meant that if only a page's text were fetched, supporting items such as images and CSS files would not be archived. To archive the supporting items and avoid duplicating work, wget's "--page-requisites" option was used instead of a custom parser. WebCite has an easy-to-use API for submitting URLs, though limitations during the submission of our dataset presented some issues. The biggest issue was WebCite's abuse detection process, which would flag the robot after it had made a certain number of requests. To account for this and be generally nice users, we added logic to ensure a minimum delay between archival requests submitted to both the IA and WC. Exponential delay logic was implemented for WC when encountering general timeouts, other failures (like MySQL error messages) or the abuse logic.
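That throttling can be sketched as follows; this is a hypothetical illustration rather than the scripts used in the study, and submit_to_archive, the delay constants, and the retry cap are placeholders.

```python
# Hypothetical sketch of polite submission throttling: a fixed minimum delay
# between archival requests plus exponential backoff and a retry cap on
# failure. submit_to_archive() stands in for the actual archival request.
import time

MIN_DELAY = 5        # placeholder: minimum seconds between requests
MAX_BACKOFF = 3600   # placeholder: cap on the backoff interval

def submit_with_backoff(url, submit_to_archive, max_retries=8):
    """Try to archive one URL, sleeping progressively longer after failures."""
    delay = MIN_DELAY
    for _ in range(max_retries):
        time.sleep(delay)                    # stay under the rate limit
        try:
            if submit_to_archive(url):       # True means the engine accepted it
                return True
        except Exception:
            pass                             # treat exceptions as failures
        delay = min(delay * 2, MAX_BACKOFF)  # exponential backoff
    return False                             # give up after max_retries tries
```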
Eventually, we learned that certain URLs would cause WC's crawler to time out indefinitely, requiring the implementation of a maximum retry count (and a failure status) if the error wasn't caused by the abuse logic. To estimate what impact we had on the archives' coverage of the study URLs, we compared a URL survey done directly prior to our submission process to one done afterwards; a period of about 3.5 months. It was assumed that the contribution due to unrelated processes would not be very large given that there was only a modest increase in coverage, 5% for IA and 1% for WC, over the previous period of just under a year and a half. Each of the two archival engines had interesting behaviors which required gauging successful submission of a URL by whether it was archived as of a subsequent survey rather than using the statuses returned by the engines. For the Internet Archive, it was discovered that an error didn't always indicate failure, as there were 872 URLs for which wget returned an error but which were successfully archived. Conversely, WebCite returned an asynchronous status, such that even in the case of a successful return the URL might fail archival; this was the case in 955 out of a total of 7,285. Submitting the 2,662 URLs to IA took a little less than a day, whereas submitting 7,285 to WC took over 2 months. This likely reflects IA's large server capacity, funding and platform maturity due to its age. Generating the list of unique URLs Converting some of the potential predictors from the list of published URLs to the list of unique URLs presented some unique issues. In particular, while converting those based on the URL itself (domain, depth, whether alive or in an archive) was straightforward, those which depended upon a publishing article (number of times URL was published, the number of times an article was cited, publishing journal, whether there was funding text) were estimated by collating the data from each publishing. Only a small amount, 8%, of the unique URLs appeared more than once, and among the measured variables that pertained to the publishing there was not a large amount of variety. Amongst repeatedly-published URLs, 43% appeared in only one journal and the presence of funding text was the same 76% of the time. For calculating the number of times a paper was published, multiple appearances of a URL within a given title/abstract were counted as one. Thus, while efforts were made to provide a representative collated value where appropriate, it's expected that different methods would not have produced significantly different results. Additional sources of error Even though WOS's index appears to have better quality Optical Character Recognition (OCR) than PubMed, it still has OCR artifacts. To compensate for this, the URL extraction script tried to use some heuristics to detect the most common sources of error and correct them. Some of the biggest sources of error were: randomly inserted spaces in URLs, "similar to" being substituted for the tilde character, periods being replaced with commas and extra punctuation being appended to the URL (sometimes due to the logic added to address the first issue). Likely the largest contributors to false negatives are errors in OCR and the attempts to compensate for them. In assessing the effectiveness of our submissions to IA, it is possible that the estimate could be understated due to URLs that had been submitted but not yet made available within the Wayback Machine.
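A rough illustration of clean-up heuristics of the kind just listed; this is not the study's extraction script, and the patterns, their ordering, and the sample string are simplified assumptions.

```python
# Simplified illustration (not the study's code) of OCR clean-up heuristics
# for URLs pulled from abstracts: spelled-out tildes, stray spaces, commas
# misread for periods, and punctuation glued on from the surrounding prose.
import re

def clean_ocr_url(raw):
    url = raw.strip()
    url = url.replace("similar to", "~")   # OCR tends to spell out the tilde
    url = re.sub(r"\s+", "", url)          # drop randomly inserted spaces
    url = url.replace(",", ".")            # periods misread as commas
    return url.rstrip(".,;:!)]}")          # trailing punctuation from prose

print(clean_ocr_url("http://www.exa mple,org/similar tosmith/tool."))
# -> http://www.example.org/~smith/tool
```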
Dynamic websites with interactive content, if only present via an archiving engine, would be a source of false positives, as the person accessing the resource would presumably want to use it rather than view the design of its landing page. If a published web site goes away and another is installed in its place (especially likely if a .com or .net domain is allowed to expire), then the program will not be able to tell the difference, since it will see a valid (though irrelevant) web site. In addition, though page contents can change and lose relevance from their original use [16], dates of archival were not compared to the publication date. Another source of false positive error would be uncaught OCR artifacts that insert spaces within URLs, if they truncated the path but left the correct host intact. The result would be a higher probability that the URL would appear as a higher-level index page; such pages are generally more likely to function than pages at deeper levels [11, 12]. Bibliographic database Web of Science was chosen because, compared to PubMed, it was more cross-sectional and had better OCR quality based on a small sampling. Many of the other evaluation criteria were similar between PubMed and WOS, as both contain scholarly work and have an interface to download bibliographic data. Interestingly, due to the continued presence of OCR issues in newer articles, it appears that bibliographic information for some journals is not yet passed electronically. Conclusions Based on the data gathered in this and other studies, it is apparent that there is still a problem with irretrievable scholarly research on the Internet. We found that roughly 50% of URLs published 11 years prior to the survey (i.e., in 2000) were still available. Interestingly, the rate of decay for recently published URLs (within the past 11 years) appears to be higher than that for older ones, lending credence to what Koehler suggested about eventual decay rate stabilization [17]. Survival rates for living URLs published between 1996 and 1999, inclusive, vary by only 2.4% (1.5% for unique URLs) and have poor linear fits (R2 of .51, or .18 for unique URLs). In contrast, the years 2000-2010 show a linear slope of 0.031 with R2 of .90 (0.036 and R2 of .95 for unique URLs using the first published year), indicating that availability for older URLs is much more stable from year to year, whereas the availability of more recently published resources follows a linear trend with a predictable loss rate. Overall, 84% of URLs (82% of the unique) were available in some manner: either via the live web, IA or WC. Several remedies are available to address different aspects of the link decay problem. For data-based sites that can be archived properly with an engine such as the Internet Archive or WebCite, one remedy is to submit the missing sites which are still alive to the archiving engines. Based on the results of our prototype (illustrated in Figure 5), this method was highly successful, increasing IA's coverage of the study's URLs by 22% and WebCite's by 255%. Journals could require authors to submit URLs to both the Internet Archive and WebCite, or alternatively programs similar to those employed in this study could be used to do it automatically. Another way to increase archival would be for the owners of published sites to ease restrictions for archiving engines, since 507 (352 unique) of the published URLs had archiving disabled via robots.txt according to the Internet Archive. Amongst these, 16% (22% of the unique) had already ceased to be valid.
While some sites may have good reason for blocking automated archivers (such as dynamic content or licensing issues), there may be others that could remove their restrictions entirely or provide an exception for preservation engines. Figure 5: Coverage of the scholarly URL list for each archival engine at different times. All URLs marked as alive in 2011 but missing from an archive were submitted between the 2012 and 2013 surveys. The effect of submitting the URLs is most evident in the WebCite case, though the Internet Archive also showed substantial improvement. Implementing an automated process to do this could vastly improve the retention of scholarly static web pages. To address the control issue for redirection solutions (DOI, PURL) mentioned in the introduction, those who administer cited tools could begin to maintain and publish a permanent URL on the web site itself. Perhaps an even more radical step would be for either these existing tools or some new tool to take a Wikipedia approach and allow end-users to update and search a database of permanent URLs. Considering the studies that have shown around 30% of dead URLs to be locatable using web search engines [3, 18], such a peer-maintained system could be effective and efficient, though spam could be an issue if not properly addressed. For dynamic websites, the current solutions are more technically involved, potentially expensive and less feasible. These include mirroring (hosting a website on another server, possibly at another institution) and providing access to the source code, both of which require time and effort. Once the source is acquired, it can sometimes take considerable expertise to make use of it, as there may be complex library or framework configuration, local assumptions hard-coded into the software, or it could be written for a different platform (GPU, Unix, Windows, etc.). Efforts towards reproducible research, where the underlying logic and data behind the results of a publication are made available to the greater community, share many of the same requirements as preserving dynamic websites [19, 20]. Innovation in this area could thus have multiple benefits beyond archival alone. Methods Data preparation and analysis The then-current year (2011) was excluded to eliminate bias from certain journals being indexed sooner than others. For analysis and statistical modeling, the R program [21] and its "survival" library [22] were used (scripts included in Additional file 1). Wherever possible, statistics are presented in two forms: one representing the raw list of URLs extracted from abstracts and the other representing a deduplicated set of those URLs. The former is most appropriate when thinking about what a researcher would encounter when trying to use a published URL in an article of interest, and it also serves as a way to give weight to multiply-published URLs. The latter is more appropriate when contemplating scholarly URLs as a whole or when using statistical models that assume independence between samples. URLs that were not the focus of this study, such as journal promotions and invalid URLs, were excluded using computational methods as much as possible in order to minimize subjective bias. The first method, which removed 943 URLs (26 unique), looked for identical URLs that comprised a large percentage of a journal's published collection within a given year; upon manual examination, a decision was then made whether to eliminate them.
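A sketch of that first screening step might look like the following: it flags any URL whose repeated, identical appearances account for more than some share of a journal's URLs in a given year, so that it can be reviewed manually. The data layout and the 20% threshold are assumptions made for illustration, not the values used in the study.

from collections import Counter, defaultdict

def flag_promotional_urls(records, share_threshold=0.2):
    # records: iterable of (journal, year, url) tuples for every published URL.
    # Returns {(journal, year): [urls]} for URLs whose identical repetitions
    # make up more than share_threshold of that journal-year's URLs.
    by_group = defaultdict(list)
    for journal, year, url in records:
        by_group[(journal, year)].append(url)

    flagged = {}
    for group, urls in by_group.items():
        counts = Counter(urls)
        suspects = [u for u, n in counts.items()
                    if n > 1 and n / len(urls) > share_threshold]
        if suspects:
            flagged[group] = suspects        # candidates for manual exclusion
    return flagged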
The second method, which identified 18 invalid URLs (all unique), consisted of checking for WebCitation's "UnexpectedXML" error. These URLs were corrupted to the point that they interfered with XML interpretation of the request, due either to an error in our parsing or to OCR artifacts. DOI sites were identified by the presence of "http://dx.doi.org" in the URL, and PURL sites by the presence of "http://purl.". Interestingly, 3 PURL servers were identified through this mechanism: http://purl.oclc.org, http://purl.org and http://purl.access.gpo.gov. To make the results more comparable to prior work and the analysis easier to interpret, a URL was considered available if it successfully responded to at least 90% of the requests and unavailable otherwise. This is similar to the method used by Wren [4], and differs from Ducut's [1] by not using a "variable availability" category defined as being available > 0% and < 90% of the time. Our results show that 466 unique URLs (3.2%) would have fallen into this middle category, a proportion quite similar to those in Wren's and Ducut's studies (3.4% and 3.2%, respectively). Because they are such a small percentage of the total, their treatment is unlikely to affect the analysis much regardless of how they are interpreted. Having binary data also eases interpretation of the statistical models. In addition, due to the low URL counts for 1994 (3) and 1995 (22), these years were excluded from analysis. Survival model Survival analysis was chosen to analyze URL lifetimes because it is a natural fit: like people, URLs have lifetimes, and we are interested in what causes those lifetimes to be longer or shorter and by how much. Lifetimes were calculated by assuming URLs were alive each time they were published, which is a potential source of error [2]. Data were coded as either right- or left-censored: right-censored since living URLs will presumably die at an unknown time in the future, and left-censored because it was unknown when a non-responding URL had died. Ages were coded in months rather than years in order to increase accuracy and precision. Parametric survival regression models were constructed using R's survreg(). In selecting the distribution to use, all of those available were tried, with the logistic distribution showing the best overall fit based on Akaike Information Criterion (AIC) score. Better fits for two of the numeric predictors (number of citations to a publishing paper and number of times a URL was published) were obtained by taking the base 2 logarithm. Collinearity was checked by calculating the variance inflation factor against a logistic regression fit to the web outcome variable. Overall lifetime estimates were made using the survfit() function from R's survival library. Extracting and testing URLs To prepare a list of URLs (and their associated data), a collection of bibliographic data was compiled by searching WOS for "http" in the title or abstract, downloading the results (500 at a time), and then collating them into a single file. A custom program (extract_urls.py in Additional file 1) was then used to extract the URLs and associated metadata from these, after which 5 positive and 2 negative controls were added. A particular URL was only included once per paper. With the extracted URLs in hand, another custom program (check_urls_web.py in Additional file 1) was used to test the availability of the URLs 3 times a day over the course of 30 days, starting April 16, 2011.
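The following sketch conveys the flavor of that repeated liveness testing together with the 90% availability rule described earlier. It uses Python 3's urllib (the original check_urls_web.py targeted the older urllib2) and a spoofed User-Agent, so it is an approximation of the study's script rather than a reproduction of it.

import urllib.request

HEADERS = {"User-Agent": "Mozilla/5.0"}   # some servers reject unrecognized clients

def url_alive(url, timeout=30):
    # One probe: any exception, including HTTP error statuses such as 404, counts as a failure.
    try:
        request = urllib.request.Request(url, headers=HEADERS)
        urllib.request.urlopen(request, timeout=timeout)
        return True
    except Exception:
        return False

def classify(results, threshold=0.9):
    # Apply the 90% rule to the list of per-run True/False outcomes for one URL.
    return "available" if sum(results) / len(results) >= threshold else "unavailable"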
The daily run times were generated randomly by scheduler.py (included in Additional file 1), with the algorithm guaranteeing that no two consecutive runs were closer than 2 hours apart. A given URL was only visited once per run even if it was published multiple times, reducing load on the servers and speeding up the total runtime (which averaged about 25 minutes due to the use of parallelism). Failure was defined as anything that caused an exception in Python's "urllib2" package (which includes error statuses, like 404), with the exception reason being recorded for later analysis. While investigating some of the failed fetches, a curious thing was noted: there were URLs that would consistently work with a web browser but not with the Python program or other command-line downloaders like wget. After some investigation, it was realized that the web servers in question were denying access to unrecognized User-Agent strings. In response, the Python program adopted the User-Agent string of a regular browser, which subsequently reduced the number of failed URLs. At the end of the live web testing period, a custom program (check_urls_archived.py in Additional file 1) was used to programmatically query the archive engines on May 23, 2011. For the Internet Archive's Wayback Machine, this was done using an HTTP HEAD request (which saves resources compared to GET) on the URL formed by appending the URL in question to "http://web.archive.org/web/*/". Status was judged by the resulting HTTP status code, with 200 meaning success, 404 meaning not archived, 403 signifying a page blocked due to robots.txt, and 503 meaning that the server was too busy. Because there were a number of these 503 codes, the script would make up to 4 attempts to access the URL, with increasing back-off delays to keep from overloading IA's servers. The end result still contained 18 URLs returning 503, which were counted as not archived for the analysis. For WebCite, the documented API was used; it supports returning XML, a format well suited to automated parsing [23]. For sites containing multiple statuses, any successful archiving was taken as a success. References 1. Ducut E, Liu F, Fontelo P: An update on Uniform Resource Locator (URL) decay in MEDLINE abstracts and measures for its mitigation. BMC Med Inform Decis Mak. 2008, 8. 2. Aronsky D, Madani S, Carnevale RJ, Duda S, Feyder MT: The prevalence and inaccessibility of Internet references in the biomedical literature at the time of publication. J Am Med Inform Assn. 2007, 14: 232-234. 10.1197/jamia.M2243. 3. Wren JD: URL decay in MEDLINE - a 4-year follow-up study. Bioinformatics. 2008, 24: 1381-1385. 10.1093/bioinformatics/btn127. 4. Wren JD: 404 not found: the stability and persistence of URLs published in MEDLINE. Bioinformatics. 2004, 20: 668-U208. 10.1093/bioinformatics/btg465. 5. Yang SL, Qiu JP, Xiong ZY: An empirical study on the utilization of web academic resources in humanities and social sciences based on web citations. Scientometrics. 2010, 84: 1-19. 10.1007/s11192-009-0142-7. 6. The Internet Archive. [http://www.archive.org/web/web.php] 7. Eysenbach G, Trudell M: Going, going, still there: Using the WebCite service to permanently archive cited web pages. Journal of Medical Internet Research. 2005, 7: 2-6. 10.2196/jmir.7.1.e2. 8. The DOI System. [http://www.doi.org/] 9. PURL Home Page. [http://purl.org] 10. Key Facts on Digital Object Identifier System.
[http://www.doi.org/factsheets/DOIKeyFacts.html] 11. Wren JD, Johnson KR, Crockett DM, Heilig LF, Schilling LM, Dellavalle RP: Uniform resource locator decay in dermatology journals - Author attitudes and preservation practices. Arch Dermatol. 2006, 142: 1147-1152. 10.1001/archderm.142.9.1147. 12. Casserly MF, Bird JE: Web citation availability: Analysis and implications for scholarship. College & Research Libraries. 2003, 64: 300-317. 10.5860/crl.64.4.300. 13. EZID: Pricing. [http://n2t.net/ezid/home/pricing] 14. Wagner C, Gebremichael MD, Taylor MK, Soltys MJ: Disappearing act: decay of uniform resource locators in health care management journals. J Med Libr Assoc. 2009, 97: 122-130. 10.3163/1536-5050.97.2.009. 15. Koehler W: An analysis of Web page and Web site constancy and permanence. J Am Soc Inf Sci. 1999, 50: 162-180. 10.1002/(SICI)1097-4571(1999)50:2<162::AID-ASI7>3.0.CO;2-B. 16. Bar-Ilan J, Peritz BC: Evolution, continuity, and disappearance of documents on a specific topic on the web: A longitudinal study of "informetrics". Journal of the American Society for Information Science and Technology. 2004, 55: 980-990. 10.1002/asi.20049. 17. Koehler W: A longitudinal study of Web pages continued: a consideration of document persistence. Information Research - an International Electronic Journal. 2004, 9. 18. Casserly MF, Bird JE: Web citation availability - A follow-up study. Libr Resour Tech Ser. 2008, 52: 42-53. 10.5860/lrts.52n1.42. 19. Peng RD: Reproducible research and Biostatistics. Biostatistics. 2009, 10: 405-408. 10.1093/biostatistics/kxp014. 20. Ince DC, Hatton L, Graham-Cumming J: The case for open computer programs. Nature. 2012, 482: 485-488. 10.1038/nature10836. 21. R Development Core Team: R: A Language and Environment for Statistical Computing. 2011, R Foundation for Statistical Computing. 22. Therneau T: A Package for Survival Analysis in S. 2012, version 2.36-12. 23. WebCite Technical Background and Best Practices Guide. [http://www.webcitation.org/doc/WebCiteBestPracticesGuide.pdf] 24. Markwell J, Brooks DW: "Link rot" limits the usefulness of web-based educational materials in biochemistry and molecular biology. Biochemistry and Molecular Biology Education. 2003, 31: 69-72. 10.1002/bmb.2003.494031010165. 25. Thorp AW, Brown L: Accessibility of internet references in Annals of Emergency Medicine: Is it time to require archiving? Ann Emerg Med. 2007, 50: 188-192. 10.1016/j.annemergmed.2006.11.019. 26. Carnevale RJ, Aronsky D: The life and death of URLs in five biomedical informatics journals. International Journal of Medical Informatics. 2007, 76: 269-273. 10.1016/j.ijmedinf.2005.12.001. 27. Dimitrova DV, Bugeja M: Consider the source: Predictors of online citation permanence in communication journals. Portal: Libraries and the Academy. 2006, 6: 269-283. 10.1353/pla.2006.0032. 28. Duda JJ, Camp RJ: Ecology in the information age: patterns of use and attrition rates of internet-based citations in ESA journals, 1997-2005. Frontiers in Ecology and the Environment.
2008, 6: 145-151. 10.1890/070022. 29. Rhodes S: Breaking Down Link Rot: The Chesapeake Project Legal Information Archive's Examination of URL Stability. Law Library Journal. 2010, 102: 581-597. 30. Goh DHL, Ng PK: Link decay in leading information science journals. Journal of the American Society for Information Science and Technology. 2007, 58: 15-24. 10.1002/asi.20513. 31. Russell E, Kane J: The missing link - Assessing the reliability of Internet citations in history journals. Technology and Culture. 2008, 49: 420-429. 10.1353/tech.0.0028. 32. Dellavalle RP, Hester EJ, Heilig LF, Drake AL, Kuntzman JW, Graber M, Schilling LM: Information science - Going, going, gone: Lost Internet references. Science. 2003, 302: 787-788. 10.1126/science.1088234. 33. Evangelou E, Trikalinos TA, Ioannidis JPA: Unavailability of online supplementary scientific information from articles published in major journals. Faseb Journal. 2005, 19: 1943-1944. 10.1096/fj.05-4784lsf. 34. Sellitto C: The impact of impermanent web-located citations: A study of 123 scholarly conference publications. Journal of the American Society for Information Science and Technology. 2005, 56: 695-703. 10.1002/asi.20159. 35. Bar-Ilan J, Peritz B: The lifespan of "informetrics" on the Web: An eight year study (1998-2006). Scientometrics. 2009, 79: 7-25. 10.1007/s11192-009-0401-7. 36. Gomes D, Silva MJ: Modelling Information Persistence on the Web. 2006. 37. Markwell J, Brooks DW: Evaluating web-based information: Access and accuracy. Journal of Chemical Education. 2008, 85: 458-459. 10.1021/ed085p458. 38. Wu ZQ: An empirical study of the accessibility of web references in two Chinese academic journals. Scientometrics. 2009, 78: 481-503. 10.1007/s11192-007-1951-1. Acknowledgements The authors would like to thank the South Dakota State University departments of Mathematics & Statistics and Biology & Microbiology for their valuable feedback. Declarations Publication of this article was funded by the National Institutes of Health [GM083226 to SXG]. This article has been published as part of BMC Bioinformatics Volume 14 Supplement 14, 2013: Proceedings of the Tenth Annual MCBIOS Conference. Discovery in a sea of data. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcbioinformatics/supplements/14/S14. Author information Affiliations Department of Mathematics and Statistics, South Dakota State University, Box 2220, Brookings, SD, 57007, USA: Jason Hennessey & Steven Xijin Ge. Corresponding author Correspondence to Steven Xijin Ge. Additional information Competing interests The authors declare that they have no competing interests. Authors' contributions JH implemented the tools for data acquisition and statistical analysis as well as performed a literature review and drafting of the paper. SXG implemented an initial prototype and provided valuable feedback at every step of the process, including critical revision of this manuscript.
Electronic supplementary material Additional file 1: supplement.zip. Contains source code used to perform the study, written in Python and R. README.txt contains descriptions for each file. (ZIP 40 KB) Rights and permissions This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. About this article Cite this article Hennessey, J., Ge, S.X. A cross disciplinary study of link decay and the effectiveness of mitigation techniques. BMC Bioinformatics 14, S5 (2013). https://doi.org/10.1186/1471-2105-14-S14-S5 Published: 09 October 2013. Keywords: Optical Character Recognition, Universal Resource Locator, Internet Archive, Naming Authority, Survival Regression Model. doi-org-9525 ---- "Blockchain Empowers Social Resistance and Terrorism Through Decentrali" by Armin Krishnan Article Title Blockchain Empowers Social Resistance and Terrorism Through Decentralized Autonomous Organizations Authors Armin Krishnan, East Carolina University Author Biography Armin Krishnan, PhD is an Associate Professor and Director of Security Studies at East Carolina University. He is the author of five books on new developments in warfare and conflict, including Killer Robots: The Legality and Ethicality of Autonomous Weapons published by Ashgate and Military Neuroscience and the Coming Age of Neurowarfare published by Routledge. His most recent book is Why Paramilitary Operations Fail published by Palgrave Macmillan. Dr. Krishnan earned his doctorate from the University of Salford, UK, and holds other graduate degrees in Political Science and International Relations from the University of Munich and the University of Salford. He has previously taught Intelligence Studies as a Visiting Assistant Professor at the University of Texas at El Paso. DOI https://doi.org/10.5038/1944-0472.13.1.1743 Subject Area Keywords Cybersecurity, Nonstate actors, Security studies, Social media Abstract The invention of the Internet has changed the way social resistance, revolutionary movements and terror groups are organized, with new features such as loose network organization, netwars, social media campaigns, and lone wolf attacks.
This article argues that blockchain technology will lead to more far-reaching changes in the organization of resistance to authority. Blockchain is a distributed ledger that records transactions using a consensus protocol, and it also enables smart contracts that execute transactions automatically when objective conditions are met. Blockchain technology is not only a system for transferring value, but also a trustless system in which strangers can cooperate without having to trust each other, as computer code governs their interactions. Blockchain will not only allow resistance/terror organizations to easily receive donations globally, to hold assets that a government cannot easily confiscate, and to disseminate censorship-resistant propaganda, but, more importantly, to operate and cooperate across the world in a truly leaderless, coordinated, and highly decentralized fashion. Governments will need to be more proactive in the area of blockchain technology to mitigate some of the dangers to political stability that may emerge from it. Acknowledgements I want to thank the anonymous reviewers of the article for their encouragement, insights, and constructive criticism that helped to improve the quality of the article. Recommended Citation Krishnan, Armin. "Blockchain Empowers Social Resistance and Terrorism Through Decentralized Autonomous Organizations." Journal of Strategic Security 13, no. 1 (2020): 41-58. DOI: https://doi.org/10.5038/1944-0472.13.1.1743 Available at: https://scholarcommons.usf.edu/jss/vol13/iss1/3
ISSN: 1944-0464 (Print); ISSN: 1944-0472 (Online). dp-la-4798 ---- Digital Public Library of America Discover 43,203,601 images, texts, videos, and sounds from across the United States. DPLA Ebooks Ebook services are core to our commitment to a library-led digital future. We've redesigned our DPLA Ebooks site to showcase how we are helping libraries take control of acquisition and delivery and make more diverse materials easily available while advocating for the needs of libraries in the marketplace. Online Exhibitions Recreational Tourism in the Mountain West The Show Must Go On! American Theater in the Great Depression Race to the Moon A History of US Public Libraries In Focus: The Evolution of the Personal Camera Activism in the US Primary Source Sets Voting Rights Act of 1965 Ida B. Wells and Anti-Lynching Activism The Poetry of Maya Angelou The New Woman Lyndon Johnson's Great Society The Fifteenth Amendment Immigration and Americanization, 1880 - 1930 Space Race How can I use DPLA? Education Educators and students explore our Primary Source Sets to discover history and culture through primary sources and ideas for classroom use. Family Research Genealogists use our search tools to find free materials for their family history research projects. Lifelong Learning Lifelong learners enjoy browsing by topic and viewing Online Exhibitions to learn more about their interests. Scholarly Research Scholarly researchers use DPLA to find open access sources from archives across the country through a single portal. If you're new to DPLA, these research guides will give you a head start using our site. The guides reflect a few key activities that attract visitors to DPLA, but you can explore many other interests here too. DPLA News Flexible licensing models from the DPLA Exchange April 13, 2021 DPLA's ebooks program serves our mission of maximizing access to digital content by giving libraries across the country greater control over their acquisition and delivery of ebooks and audiobooks DPLA to host Book Talk on Mistrust, by Ethan Zuckerman, on April 22nd at 1 pm ET April 2, 2021 We are pleased to invite you to join us at the inaugural DPLA Book Talk, which will feature a conversation between Mistrust author Ethan Zuckerman and Wikimedia Foundation CEO and… Join the DPLA Community + Open Board Meeting on April 9th March 25, 2021 With expanded vaccine access, many of us have begun to conceive of what our post-Covid worlds might look like.
These visions are necessarily colored by all that we have learned… duraspace-org-3775 ---- Home - Duraspace.org Help us preserve and provide access to the world's intellectual, cultural and scientific heritage. Latest News 4.22.21 Fedora 6.0: A Migration Story – The Berlin State Library 4.21.21 DSPACE 7.0 Beta 5 Now Available 4.16.21 DSpace 7.0 Testathon: How You Can Help Us Build a Better DSpace Through Testing & Reporting Our Global Community The community DuraSpace serves is alive with ideas and innovation aimed at collaboratively meeting the needs of the scholarly ecosystem that connects us all. Our global community contributes to the advancement of DSpace, Fedora and VIVO. At the same time subscribers to DuraSpace Services are helping to build best practices for delivery of high quality customer service. We are grateful for our community's continued support and engagement in the enterprise we share as we work together to provide enduring access to the world's digital heritage. Open Source Projects The Fedora, DSpace and VIVO community-supported projects are proud to provide more than 2500 users worldwide from more than 120 countries with freely-available open source software. Fedora is a flexible repository platform with native linked data capabilities. DSpace is a turnkey institutional repository application. VIVO creates an integrated record of the scholarly work of your organization. Our Services ArchivesDirect, DSpaceDirect, and DuraCloud services from DuraSpace provide access to institutional resources, preservation of treasured collections, and simplified data management tools. Our services are built on solid open source software platforms, can be set up quickly, and are competitively priced. Staff experts work directly with customers to provide personalized on-boarding and superb customer support. DuraCloud is a hosted service that lets you control where and how your content is preserved in the cloud. DSpaceDirect is a hosted turnkey repository solution. ArchivesDirect is a complete, hosted archiving solution.
duraspace-org-9771 ---- News – Duraspace.org Meet the Members Welcome to the first in a series of blog posts aimed at introducing you to some of the movers and shakers who work tirelessly to advocate, educate and promote Fedora and other community-supported programs like ours. At Fedora, we are strong because of our people and without individuals like this advocating for continued development we... Fedora Migration Paths and Tools Project Update: January 2021 This is the fourth in a series of monthly updates on the Fedora Migration Paths and Tools project – please see last month's post for a summary of the work completed up to that point. This project has been generously funded by the IMLS. The grant team has been focused on completing an initial build... Fedora Migration Paths and Tools Project Update: December 2020 This is the third in a series of monthly updates on the Fedora Migration Paths and Tools project – please see last month's post for a summary of the work completed up to that point. This project has been generously funded by the IMLS. The Principal Investigator, David Wilcox, participated in a presentation for CNI... Fedora 6 Alpha Release is Here Today marks a milestone in our progress toward Fedora 6 – the Alpha Release is now available for download and testing! Over the past year, our dedicated Fedora team, along with an extensive list of active community members and committers, have been working hard to deliver this exciting release to all of our users. So... Fedora Migration Paths and Tools Project Update: October 2020 This is the first in a series of monthly blog posts that will provide updates on the IMLS-funded Fedora Migration Paths and Tools: a Pilot Project. The first phase of the project began in September with kick-off meetings for each pilot partner: the University of Virginia and Whitman College. These meetings established roles and responsibilities... Fedora in the time of COVID-19 The impacts of coronavirus disease 2019 are being felt around the world, and access to digital materials is essential in this time of remote work and study. The Fedora community has been reflecting on the value of our collective digital repositories in helping our institutions and researchers navigate this unprecedented time. Many member institutions have...
NOW AVAILABLE: DSpace 7.0 Beta 2 The DSpace Leadership Group, the DSpace Committers and LYRASIS are proud to announce that DSpace 7.0 Beta 2 is now available for download and testing. Beta 2 is the second scheduled Beta release provided for community feedback and to introduce the new features of the 7.0 platform. As a Beta release, we highly advise against... NOW AVAILABLE: VIVO 1.11.1 VIVO 1.11.1 is now available! VIVO 1.11.1 is a point release containing two patches to the previous 1.11.0 release: – Security patch that now prevents users with self-edit privileges from editing other user profiles [1] – Minor security patch to underlying puppycrawl dependency (CVE-2019-9658) [2] Upgrading from 1.11.0 to 1.11.1 should be a trivial drop-in... NOW AVAILABLE: DSpace 7.0 Beta 1 The DSpace Leadership Group, the DSpace Committers and LYRASIS are proud to announce that DSpace 7.0 Beta 1 is now available for download and testing. Beta1 is the first of several scheduled Beta releases provided for community feedback and to introduce the new features of the 7.0 platform. As a Beta release, we do not... Curriculum Available: Islandora and Fedora Camp in Arizona The curriculum for the upcoming Islandora and Fedora Camp at Arizona State University, February 24-26, 2020 is now available here. Islandora and Fedora Camp, hosted by Arizona State University Libraries, offers everyone a chance to dive in and learn all about the latest versions of Islandora and Fedora. Training will begin with the basics and build... ejournals-bc-edu-5722 ---- Information Technology and Libraries Current Issue Vol 40 No 1 (2021) Published: 2021-03-15 Editorials Reviewers Wanted Letter from the Editor Kenneth J. Varnum The Fourth Industrial Revolution Does It Pose an Existential Threat to Libraries? Brady Lund We Can Do It for Free! Using Freeware for Online Patron Engagement Karin Suni, Christopher A.
Brown Utilizing Technology to Support and Extend Access to Students and Job Seekers during the Pandemic Daniel Berra Articles User Experience Testing in the Open Textbook Adaptation Workflow A Case Study Camille Thomas, Kimberly Vardeman, Jingjing Wu Peer Reading Promotion in University Libraries Based on a Simulation Study about Readers' Opinion Seeking in Social Network Yiping Jiang, Xiaobo Chi, Yan Lou, Lihua Zuo, Yeqi Chu, Qingyi Zhuge Web Content Strategy in Practice within Academic Libraries Courtney McDonald, Heidi Burkhardt Solving SEO Issues in DSpace-based Digital Repositories A Case Study and Assessment of Worldwide Repositories Matus Formanek Development of a Gold-standard Pashto Dataset and a Segmentation App Yan Han, Marek Rychlik Personalization of Search Results Representation of a Digital Library Ljubomir Paskali, Lidija Ivanovic, Georgia Kapitsaki, Dragan Ivanovic, Bojana Dimic Surla, Dusan Surla Communications User Testing with Microinteractions Enhancing a Next-Generation Repository Sara Gonzales, Matthew B. Carson, Guillaume Viger, Lisa O'Keefe, Norrina B. Allen, Joseph P. Ferrie, Kristi Holmes ejournals-bc-edu-7307 ---- None elibtronic-ca-1923 ---- None elibtronic-ca-3082 ---- None emory-zoom-us-7617 ---- Webinar Registration - Zoom Topic Samvera Virtual Connect 2021 Description Samvera Virtual Connect (SVC) is an opportunity for Samvera Community participants to gather online to learn about initiatives taking place across interest groups, working groups, local and collaborative development projects, and other efforts. SVC will give the Samvera community a chance to come together to catch up on developments, make new connections, and learn more about the Community. Webinar is over, you cannot register now. If you have any questions, please contact the webinar host: Heather Greer Klein (she/her/hers).
en-wikipedia-org-1224 ---- The Age of Surveillance Capitalism - Wikipedia The Age of Surveillance Capitalism From Wikipedia, the free encyclopedia Book published in 2019 Author Shoshana Zuboff Subject Politics, cybersecurity Publisher Profile Books Publication date January 15, 2019 ISBN 9781781256855 The Age of Surveillance Capitalism is a 2019 non-fiction book by Professor Shoshana Zuboff which looks at the development of digital companies like Google and Amazon, and suggests that their business models represent a new form of capitalist accumulation that she calls "surveillance capitalism".[1][2] While industrial capitalism exploited and controlled nature with devastating consequences, surveillance capitalism exploits and controls human nature, with a totalitarian order as the endpoint of the development.[3] Premise Zuboff states that surveillance capitalism "unilaterally claims human experience as free raw material for translation into behavioural data [which] are declared as a proprietary behavioural surplus, fed into advanced manufacturing processes known as 'machine intelligence', and fabricated into prediction products that anticipate what you will do now, soon, and later." She states that these new capitalist products "are traded in a new kind of marketplace that I call behavioural futures markets."[4] In a capitalist society, information such as a user's likes and dislikes, observed when they access a platform like Facebook, can be freely used by that platform to improve the user's experience by feeding them content that data from their previous activity suggests they will be interested in. In many cases this is done through an algorithm that automatically filters information. The danger of surveillance capitalism is that platforms and tech companies treat themselves as entitled to this information because it is free for them to access, with very little supervision by governments or by users themselves. Because of this, there has been backlash over how these companies have used the information gathered. For example, Google, described by Zuboff (2019)[5] as "the pioneer of surveillance capitalism", introduced features that used "commercial models…discovered by people in a time and place".[5] This means that advertisements are not only specifically targeted to you through your phone, but now work hand in hand with your environment and habits, such as showing you an advertisement for a local bar while you walk around downtown in the evening. Advertising this technical and specific can easily influence a person's decision-making, both in the activities they choose and in political decisions. The fact that these companies go largely unchecked while having the power to observe and influence thinking is one of the many reasons tech companies such as Google are under so much scrutiny. Furthermore, the freedom allotted to tech companies rests on the idea that "surveillance capitalism does not abandon established capitalist 'laws' such as competitive production, profit maximization, productivity and growth" (Zuboff 2019),[5] since these are principles any business in a capitalist society should aim to excel at in order to be competitive. Zuboff (2019)[5] also argues that this "new logic of accumulation…introduces its own laws of motion".
In other words, this is a new phenomenon in capitalist operations that should be treated as such, with its own specific restrictions and limitations. Lastly, as invasive as platforms have been in accumulating information, they have also led to what is now called a "sharing economy" (Van Dijck 2018),[6] in which individuals can obtain digital information and carry out their own surveillance capitalism with the aid of the platforms themselves. Thus "individuals can greatly benefit from this transformation because it empowers them to set up business" (Van Dijck 2018).[6] Small businesses can also benefit, potentially growing faster than they would have without knowledge of consumer demands and wants. This leaves surveillance capitalism as an exceptionally useful tool for businesses, but also an invasion of users' privacy. Reception The New Yorker listed The Age of Surveillance Capitalism as one of its top non-fiction books of 2019.[7] Former President of the United States Barack Obama also listed it as one of his favourite books of 2019, which journalism researcher Avi Asher-Schapiro noted as an interesting choice, given that the book heavily criticises the "revolving door of personnel who migrated between Google & the Obama admin".[8] Sam DiBella, writing for the LSE Blog, criticised the book's approach, which could "inspire paralysis rather than praxis when it comes to forging collective action to counter systematic corporate surveillance."[9] The Financial Times called the book a "masterwork of original thinking and research".[10] References ^ Bridle, James (2 February 2019). "The Age of Surveillance Capitalism by Shoshana Zuboff review – we are the pawns". The Guardian. ISSN 0261-3077. Retrieved 2020-01-17. ^ Naughton, John (20 January 2019). "'The goal is to automate us': welcome to the age of surveillance capitalism". The Observer. ISSN 0029-7712. Retrieved 2020-01-17. ^ "The new tech totalitarianism". New Statesman. Retrieved 2021-02-21. ^ Naughton, John (2019-01-20). "'The goal is to automate us': welcome to the age of surveillance capitalism". The Observer. ISSN 0029-7712. Retrieved 2020-01-16. ^ Zuboff, Shoshana; Möllers, Norma; Murakami Wood, David; Lyon, David (2019-03-31). "Surveillance Capitalism: An Interview with Shoshana Zuboff". Surveillance & Society. 17 (1/2): 257–266. doi:10.24908/ss.v17i1/2.13238. ISSN 1477-7487. ^ van Dijck, José; Poell, Thomas; de Waal, Martijn (2018-10-18). "The Platform Society". Oxford Scholarship Online. doi:10.1093/oso/9780190889760.001.0001. ISBN 9780190889760. ^ "Our Favorite Nonfiction Books of 2019". The New Yorker. 2019-12-18. ISSN 0028-792X. Retrieved 2020-01-16. ^ Binder, Matt. "Obama praises book that slams his White House for its Google relationship". Mashable. Retrieved 2020-01-16. ^ "Book Review: The Age of Surveillance Capitalism: The Fight for a Human Future at the New Frontier of Power by Shoshana Zuboff". USAPP. 2019-11-10. Retrieved 2020-01-16. ^ "The Age of Surveillance Capitalism by Shoshana Zuboff". FT Business book of the year award. Retrieved 2020-01-16.
Retrieved from "https://en.wikipedia.org/w/index.php?title=The_Age_of_Surveillance_Capitalism&oldid=1019967094" Categories: American non-fiction books 2019 non-fiction books Books critical of capitalism Hidden categories: CS1 maint: discouraged parameter Articles with short description Short description matches Wikidata Navigation menu Personal tools Not logged in Talk Contributions Create account Log in Namespaces Article Talk Variants Views Read Edit View history More Search Navigation Main page Contents Current events Random article About Wikipedia Contact us Donate Contribute Help Learn to edit Community portal Recent changes Upload file Tools What links here Related changes Upload file Special pages Permanent link Page information Cite this page Wikidata item Print/export Download as PDF Printable version Languages Italiano Edit links This page was last edited on 26 April 2021, at 12:27 (UTC). Text is available under the Creative Commons Attribution-ShareAlike License; additional terms may apply. By using this site, you agree to the Terms of Use and Privacy Policy. Wikipedia® is a registered trademark of the Wikimedia Foundation, Inc., a non-profit organization. Privacy policy About Wikipedia Disclaimers Contact Wikipedia Mobile view Developers Statistics Cookie statement en-wikipedia-org-1629 ---- Double-spending - Wikipedia Double-spending From Wikipedia, the free encyclopedia Jump to navigation Jump to search Failure mode of digital cash schemes Double-spending is a potential flaw in a digital cash scheme in which the same single digital token can be spent more than once. Unlike physical cash, a digital token consists of a digital file that can be duplicated or falsified.[1][2] As with counterfeit money, such double-spending leads to inflation by creating a new amount of copied currency that did not previously exist. This devalues the currency relative to other monetary units or goods and diminishes user trust as well as the circulation and retention of the currency. Fundamental cryptographic techniques to prevent double-spending, while preserving anonymity in a transaction, are blind signatures and, particularly in offline systems, secret splitting.[2] Contents 1 Centralized currencies 2 Decentralized currencies 3 51% attack 4 References Centralized currencies[edit] Prevention of double-spending is usually implemented using an online central trusted third party that can verify whether a token has been spent.[2] This normally represents a single point of failure from both availability and trust viewpoints. Decentralized currencies[edit] In a decentralized system, the double-spending problem is significantly harder to solve. To avoid the need for a trusted third party, many servers must store identical up-to-date copies of a public transaction ledger, but as transactions (requests to spend money) are broadcast, they will arrive at each server at slightly different times. If two transactions attempt to spend the same token, each server will consider the first transaction it sees to be valid, and the other invalid. Once the servers disagree, there is no way to determine true balances, as each server's observations are considered equally valid. Most decentralized systems solve this with a consensus algorithm, a way to bring the servers back in sync. Two notable types of consensus mechanisms are proof-of-work and proof-of-stake. 
By 2007, a number of distributed systems for the prevention of double-spending had been proposed.[3][4] The cryptocurrency Bitcoin implemented a solution in early 2009. Its cryptographic protocol used a proof-of-work consensus mechanism where transactions are batched into blocks and chained together using a linked list of hash pointers (blockchain). Any server can produce a block by solving a computationally difficult puzzle (specifically finding a partial hash collision) called mining. The block commits to the entire history of bitcoin transactions as well as the new set of incoming transactions. The miner is rewarded some bitcoins for solving it. The double-spending problem persists, however, if two blocks (with conflicting transactions) are mined at the same approximate time. When servers inevitably disagree on the order of the two blocks, they each keep both blocks temporarily. As new blocks arrive, they must commit to one history or the other, and eventually a single chain will continue on, while the other(s) will not. Since the longest (more technically "heaviest") chain is considered to be the valid data set, miners are incentivized to only build blocks on the longest chain they know about in order for it to become part of that dataset (and for their reward to be valid). Transactions in this system are therefore never technically "final" as a conflicting chain of blocks can always outgrow the current canonical chain. However, as blocks are built on top of a transaction, it becomes increasingly unlikely/costly for another chain to overtake it. 51% attack[edit] The total computational power of a decentralized proof-of-work system is the sum of the computational power of the nodes, which can differ significantly due to the hardware used. Larger computational power increases the chance to win the mining reward for each new block mined, which creates an incentive to accumulate clusters of mining nodes, or mining pools. Any pool that achieves 51% hashing power can effectively overturn network transactions, resulting in double spending. One of the Bitcoin forks, Bitcoin Gold, was hit by such an attack in 2018 and then again in 2020.[5] A given cryptocurrency's susceptibility to attack depends on the existing hashing power of the network since the attacker needs to overcome it. For the attack to be economically viable, the market cap of the currency must be sufficiently large to justify the cost to rent hashing power.[6][7] In 2014, mining pool Ghash.io obtained 51% hashing power in Bitcoin which raised significant controversies about the safety of the network. The pool has voluntarily capped their hashing power at 39.99% and requested other pools to follow in order to restore trust in the network.[8] References[edit] ^ The Double Spending Problem and Cryptocurrencies. Banking & Insurance Journal. Social Science Research Network (SSRN). Accessed 24 December 2017. ^ a b c Mark Ryan. "Digital Cash". School of Computer Science, University of Birmingham. Retrieved 2017-05-27. ^ Jaap-Henk Hoepman (2008). "Distributed Double Spending Prevention". arXiv:0802.0832v1 [cs.CR]. ^ Osipkov, I.; Vasserman, E. Y.; Hopper, N.; Kim, Y. (2007). "Combating Double-Spending Using Cooperative P2P Systems". 27th International Conference on Distributed Computing Systems (ICDCS '07). p. 41. CiteSeerX 10.1.1.120.52. doi:10.1109/ICDCS.2007.91. ^ Canellis, David (2020-01-27). "Bitcoin Gold hit by 51% attacks, $72K in cryptocurrency double-spent". Hard Fork | The Next Web. Retrieved 2020-02-29. 
^ "Cost of a 51% Attack for Different Cryptocurrencies | Crypto51". www.crypto51.app. Retrieved 2020-02-29. ^ Varshney, Neer (2018-05-24). "Why Proof-of-work isn't suitable for small cryptocurrencies". Hard Fork | The Next Web. Retrieved 2018-05-25. ^ "Popular Bitcoin Mining Pool Promises To Restrict Its Compute Power To Prevent Feared '51%' Fiasco". TechCrunch. Retrieved 2020-02-29. v t e Cryptocurrencies Technology Blockchain Cryptocurrency tumbler Cryptocurrency exchange Cryptocurrency wallet Cryptographic hash function Distributed ledger Fork Lightning Network MetaMask Smart contract Consensus mechanisms Proof of authority Proof of personhood Proof of space Proof of stake Proof of work Proof of work currencies SHA-256-based Bitcoin Bitcoin Cash Counterparty LBRY MazaCoin Namecoin Peercoin Titcoin Ethash-based Ethereum Ethereum Classic Scrypt-based Auroracoin Bitconnect Coinye Dogecoin Litecoin Equihash-based Bitcoin Gold Zcash RandomX-based Monero X11-based Dash Petro Other AmbaCoin Firo IOTA Primecoin Verge Vertcoin Proof of stake currencies Cardano EOS.IO Gridcoin Nxt Peercoin Polkadot Steem Tezos TRON ERC-20 tokens Augur Aventus Bancor Basic Attention Token Chainlink Kin KodakCoin Minds The DAO Uniswap Stablecoins Dai Diem Tether USD Coin Other currencies Filecoin GNU Taler Hashgraph Nano NEO Ripple Stellar WhopperCoin Related topics Airdrop BitLicense Blockchain game Complementary currency Crypto-anarchism Cryptocurrency bubble Decentralized Finance Digital currency Double-spending Hyperledger Initial coin offering Initial exchange offering Initiative Q List of cryptocurrencies Non-fungible token Token money Virtual currency Category Commons List Retrieved from "https://en.wikipedia.org/w/index.php?title=Double-spending&oldid=1018658906" Categories: Digital currencies Financial cryptography Payment systems Internet fraud Distributed computing Cryptocurrencies Hidden categories: Articles with short description Short description matches Wikidata Navigation menu Personal tools Not logged in Talk Contributions Create account Log in Namespaces Article Talk Variants Views Read Edit View history More Search Navigation Main page Contents Current events Random article About Wikipedia Contact us Donate Contribute Help Learn to edit Community portal Recent changes Upload file Tools What links here Related changes Upload file Special pages Permanent link Page information Cite this page Wikidata item Print/export Download as PDF Printable version Languages العربية Español فارسی Français Italiano Português Русский 中文 Edit links This page was last edited on 19 April 2021, at 06:08 (UTC). Text is available under the Creative Commons Attribution-ShareAlike License; additional terms may apply. By using this site, you agree to the Terms of Use and Privacy Policy. Wikipedia® is a registered trademark of the Wikimedia Foundation, Inc., a non-profit organization. Privacy policy About Wikipedia Disclaimers Contact Wikipedia Mobile view Developers Statistics Cookie statement en-wikipedia-org-1453 ---- BitTorrent - Wikipedia BitTorrent From Wikipedia, the free encyclopedia Jump to navigation Jump to search Peer-to-peer file sharing protocol This article is about the file sharing protocol. For other uses, see BitTorrent (disambiguation). 
BitTorrent Original author(s): Bram Cohen Developer(s): BitTorrent, Inc Initial release: 2001 Repository: github.com/bittorrent/bittorrent.org Operating systems: Android, iOS, Linux, macOS, Windows, other Standard: The BitTorrent Protocol Specification[1] Type: peer-to-peer file sharing License: Unknown Website: www.bittorrent.org BitTorrent (abbreviated to BT) is a communication protocol for peer-to-peer file sharing (P2P), which enables users to distribute data and electronic files over the Internet in a decentralized manner. BitTorrent is one of the most common protocols for transferring large files, such as digital video files containing TV shows and video clips, or digital audio files containing songs. As of February 2009, P2P networks have been estimated to collectively account for approximately 43% to 70% of Internet traffic, depending on location.[2] In February 2013, BitTorrent was responsible for 3.35% of all worldwide bandwidth – more than half of the 6% of total bandwidth dedicated to file sharing.[3] In 2019, BitTorrent was a dominant file sharing protocol and generated a substantial amount of Internet traffic, with 2.46% of downstream and 27.58% of upstream traffic.[4] To send or receive files, a person uses a BitTorrent client on their Internet-connected computer. A BitTorrent client is a computer program that implements the BitTorrent protocol. Popular clients include μTorrent, Xunlei Thunder,[5][6] Transmission, qBittorrent, Vuze, Deluge, BitComet and Tixati. BitTorrent trackers provide a list of files available for transfer and allow the client to find peer users, known as "seeds", who may transfer the files. Programmer Bram Cohen, a University at Buffalo alumnus,[7] designed the protocol in April 2001 and released the first available version on 2 July 2001.[8] As of June 2020, the most recent version was implemented in 2017.[1] BitTorrent clients are available for a variety of computing platforms and operating systems, including an official client released by BitTorrent, Inc. As of 2013, BitTorrent had 15–27 million concurrent users at any time.[9] As of January 2012, BitTorrent was utilized by 150 million active users.
Based on this figure, the total number of monthly users may be estimated at more than a quarter of a billion (≈ 250 million).[10] Torrenting may sometimes be limited by Internet Service Providers (ISPs) on legal or copyright grounds. In response, users may choose to run seedboxes or Virtual Private Networks (VPNs) as an alternative. On May 15, 2017, BitTorrent released an update to the protocol specification, called BitTorrent v2.[11][12] libtorrent was updated to support the new version on September 6, 2020.[13] [Figure: animation of protocol use. The colored dots beneath each computer represent different parts of the file being shared; by the time a copy of each part reaches one destination computer, copies of that part (or other parts) are already being transferred to other users.] Description [Figure: the middle computer acts as a "seed", providing a file to the other computers, which act as peers.] The BitTorrent protocol can be used to reduce the server and network impact of distributing large files. Rather than downloading a file from a single source server, the BitTorrent protocol allows users to join a "swarm" of hosts to upload to and download from each other simultaneously. The protocol is an alternative to the older single-source, multiple-mirror technique for distributing data, and can work effectively over networks with lower bandwidth. Using the BitTorrent protocol, several basic computers, such as home computers, can replace large servers while efficiently distributing files to many recipients. This lower bandwidth usage also helps prevent large spikes in internet traffic in a given area, keeping internet speeds higher for all users in general, regardless of whether or not they use the BitTorrent protocol. The first release of the BitTorrent client had no search engine and no peer exchange, so users who wanted to upload a file had to create a small torrent descriptor file that they would upload to a torrent index site. The first uploader acted as a seed, and downloaders would initially connect as peers (see the figure above). Those who wished to download the file would download the torrent, which their client would use to connect to a tracker which had a list of the IP addresses of other seeds and peers in the swarm. Once a peer completed a download of the complete file, it could in turn function as a seed. The file being distributed is divided into segments called pieces. As each peer receives a new piece of the file, it becomes a source (of that piece) for other peers, relieving the original seed from having to send that piece to every computer or user wishing a copy. With BitTorrent, the task of distributing the file is shared by those who want it; it is entirely possible for the seed to send only a single copy of the file itself and eventually distribute to an unlimited number of peers.
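As a concrete illustration of the piece model just described, the sketch below splits an in-memory payload into fixed-size pieces and computes one SHA-1 digest per piece, the hashes a v1 client records in the torrent descriptor. The piece size and the stand-in payload are illustrative assumptions.

```python
# Sketch of splitting a payload into fixed-size pieces and hashing each one,
# as a BitTorrent v1 client does when preparing a torrent. The piece size
# and the in-memory payload are illustrative assumptions.
import hashlib

PIECE_SIZE = 256 * 1024  # 256 KiB pieces (typical sizes are powers of two)

def split_into_pieces(data: bytes, piece_size: int = PIECE_SIZE) -> list[bytes]:
    return [data[i:i + piece_size] for i in range(0, len(data), piece_size)]

def piece_hashes(pieces: list[bytes]) -> list[bytes]:
    """One 20-byte SHA-1 digest per piece, as stored in the torrent descriptor."""
    return [hashlib.sha1(p).digest() for p in pieces]

payload = b"example content " * 100_000   # stand-in for a file read from disk
pieces = split_into_pieces(payload)
hashes = piece_hashes(pieces)
print(f"{len(pieces)} pieces, last piece {len(pieces[-1])} bytes")
print("first piece hash:", hashes[0].hex())
```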
Each piece is protected by a cryptographic hash contained in the torrent descriptor.[1] This ensures that any modification of the piece can be reliably detected, and thus prevents both accidental and malicious modifications of any of the pieces received at other nodes. If a node starts with an authentic copy of the torrent descriptor, it can verify the authenticity of the entire file it receives. Pieces are typically downloaded non-sequentially, and are rearranged into the correct order by the BitTorrent client, which monitors which pieces it needs, and which pieces it has and can upload to other peers. Pieces are of the same size throughout a single download (for example, a 10 MB file may be transmitted as ten 1 MB pieces or as forty 256 KB pieces). Due to the nature of this approach, the download of any file can be halted at any time and be resumed at a later date, without the loss of previously downloaded information, which in turn makes BitTorrent particularly useful in the transfer of larger files. This also enables the client to seek out readily available pieces and download them immediately, rather than halting the download and waiting for the next (and possibly unavailable) piece in line, which typically reduces the overall time of the download. This eventual transition from peers to seeders determines the overall "health" of the file (as determined by the number of times a file is available in its complete form). The distributed nature of BitTorrent can lead to a flood-like spreading of a file throughout many peer computer nodes. As more peers join the swarm, the likelihood of a successful download by any particular node increases. Relative to traditional Internet distribution schemes, this permits a significant reduction in the original distributor's hardware and bandwidth resource costs. Distributed downloading protocols in general provide redundancy against system problems, reduce dependence on the original distributor,[14] and provide sources for the file which are generally transient and therefore there is no single point of failure as in one way server-client transfers. Operation[edit] A BitTorrent client is capable of preparing, requesting, and transmitting any type of computer file over a network, using the protocol. Up until 2005, the only way to share files was by creating a small text file called a "torrent". These files contain metadata about the files to be shared and the trackers which keep track of the other seeds and peers. Users that want to download the file first obtain a torrent file for it, and connect to the tracker or seeds. In 2005, first Vuze and then the BitTorrent client introduced distributed tracking using distributed hash tables which allowed clients to exchange data on swarms directly without the need for a torrent file. In 2006, peer exchange functionality was added allowing clients to add peers based on the data found on connected nodes. Though both ultimately transfer files over a network, a BitTorrent download differs from a one way server-client download (as is typical with an HTTP or FTP request, for example) in several fundamental ways: BitTorrent makes many small data requests over different IP connections to different machines, while server-client downloading is typically made via a single TCP connection to a single machine. BitTorrent downloads in a random or in a "rarest-first"[15] approach that ensures high availability, while classic downloads are sequential. 
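A simplified sketch of the rarest-first selection mentioned above: count how many known peers advertise each piece and request the missing piece held by the fewest of them. The peer bitfields below are illustrative assumptions, not any particular client's internals.

```python
# Simplified rarest-first piece selection: among pieces we still need,
# request the one advertised by the fewest peers. The data structures are
# illustrative assumptions, not a specific client's implementation.
from collections import Counter

def rarest_first(have: set[int], peer_bitfields: dict[str, set[int]]):
    """Return the index of the rarest missing piece, or None if nothing is needed."""
    availability = Counter()
    for pieces in peer_bitfields.values():
        availability.update(pieces)
    wanted = [idx for idx in availability if idx not in have]
    if not wanted:
        return None
    return min(wanted, key=lambda idx: availability[idx])

peers = {
    "peer-a": {0, 1, 2, 3},
    "peer-b": {0, 1, 2},
    "peer-c": {0, 1},
}
print(rarest_first(have={0}, peer_bitfields=peers))  # 3: only peer-a has it
```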
Taken together, these differences allow BitTorrent to achieve much lower cost to the content provider, much higher redundancy, and much greater resistance to abuse or to "flash crowds" than regular server software. However, this protection, theoretically, comes at a cost: downloads can take time to rise to full speed because it may take time for enough peer connections to be established, and it may take time for a node to receive sufficient data to become an effective uploader. This contrasts with regular downloads (such as from an HTTP server, for example) that, while more vulnerable to overload and abuse, rise to full speed very quickly, and maintain this speed throughout. In the beginning, BitTorrent's non-contiguous download methods made it harder to support "streaming playback". In 2014, the client Popcorn Time allowed for streaming of BitTorrent video files. Since then, more and more clients are offering streaming options. Search queries[edit] The BitTorrent protocol provides no way to index torrent files. As a result, a comparatively small number of websites have hosted a large majority of torrents, many linking to copyrighted works without the authorization of copyright holders, rendering those sites especially vulnerable to lawsuits.[16] A BitTorrent index is a "list of .torrent files, which typically includes descriptions" and information about the torrent's content.[17] Several types of websites support the discovery and distribution of data on the BitTorrent network. Public torrent-hosting sites such as The Pirate Bay allow users to search and download from their collection of torrent files. Users can typically also upload torrent files for content they wish to distribute. Often, these sites also run BitTorrent trackers for their hosted torrent files, but these two functions are not mutually dependent: a torrent file could be hosted on one site and tracked by another unrelated site. Private host/tracker sites operate like public ones except that they may restrict access to registered users and may also keep track of the amount of data each user uploads and downloads, in an attempt to reduce "leeching". Web search engines allow the discovery of torrent files that are hosted and tracked on other sites; examples include The Pirate Bay, Torrentz, isoHunt and BTDigg. These sites allow the user to ask for content meeting specific criteria (such as containing a given word or phrase) and retrieve a list of links to torrent files matching those criteria. This list can often be sorted with respect to several criteria, relevance (seeders-leechers ratio) being one of the most popular and useful (due to the way the protocol behaves, the download bandwidth achievable is very sensitive to this value). Metasearch engines allow one to search several BitTorrent indices and search engines at once. The Tribler BitTorrent client was among the first to incorporate built-in search capabilities. With Tribler, users can find .torrent files held by random peers and taste buddies.[18] It adds such an ability to the BitTorrent protocol using a gossip protocol, somewhat similar to the eXeem network which was shut down in 2005. The software includes the ability to recommend content as well. 
After a dozen downloads, the Tribler software can roughly estimate the download taste of the user, and recommend additional content.[19] In May 2007, researchers at Cornell University published a paper proposing a new approach to searching a peer-to-peer network for inexact strings,[20] which could replace the functionality of a central indexing site. A year later, the same team implemented the system as a plugin for Vuze called Cubit[21] and published a follow-up paper reporting its success.[22] A somewhat similar facility but with a slightly different approach is provided by the BitComet client through its "Torrent Exchange"[23] feature. Whenever two peers using BitComet (with Torrent Exchange enabled) connect to each other they exchange lists of all the torrents (name and info-hash) they have in the Torrent Share storage (torrent files which were previously downloaded and for which the user chose to enable sharing by Torrent Exchange). Thus each client builds up a list of all the torrents shared by the peers it connected to in the current session (or it can even maintain the list between sessions if instructed). At any time the user can search into that Torrent Collection list for a certain torrent and sort the list by categories. When the user chooses to download a torrent from that list, the .torrent file is automatically searched for (by info-hash value) in the DHT Network and when found it is downloaded by the querying client which can after that create and initiate a downloading task. Downloading torrents and sharing files[edit] Users find a torrent of interest on a torrent index site or by using a search engine built into the client, download it, and open it with a BitTorrent client. The client connects to the tracker(s) or seeds specified in the torrent file, from which it receives a list of seeds and peers currently transferring pieces of the file(s). The client connects to those peers to obtain the various pieces. If the swarm contains only the initial seeder, the client connects directly to it, and begins to request pieces. Clients incorporate mechanisms to optimize their download and upload rates. The effectiveness of this data exchange depends largely on the policies that clients use to determine to whom to send data. Clients may prefer to send data to peers that send data back to them (a "tit for tat" exchange scheme), which encourages fair trading. But strict policies often result in suboptimal situations, such as when newly joined peers are unable to receive any data because they don't have any pieces yet to trade themselves or when two peers with a good connection between them do not exchange data simply because neither of them takes the initiative. To counter these effects, the official BitTorrent client program uses a mechanism called "optimistic unchoking", whereby the client reserves a portion of its available bandwidth for sending pieces to random peers (not necessarily known good partners, so called preferred peers) in hopes of discovering even better partners and to ensure that newcomers get a chance to join the swarm.[24] Although "swarming" scales well to tolerate "flash crowds" for popular content, it is less useful for unpopular or niche market content. Peers arriving after the initial rush might find the content unavailable and need to wait for the arrival of a "seed" in order to complete their downloads. The seed arrival, in turn, may take long to happen (this is termed the "seeder promotion problem"). 
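The tit-for-tat and optimistic-unchoking policy described earlier in this section can be sketched roughly as follows; the slot count, rates, and peer names are illustrative assumptions, not the official client's exact algorithm.

```python
# Rough sketch of a tit-for-tat unchoke round with one optimistic slot:
# unchoke the peers that recently sent us the most data, plus one random
# peer so newcomers can prove themselves. Rates and slot counts are
# illustrative assumptions.
import random

REGULAR_SLOTS = 3  # peers unchoked because they upload to us the fastest

def choose_unchoked(download_rates: dict[str, float]) -> set[str]:
    ranked = sorted(download_rates, key=download_rates.get, reverse=True)
    unchoked = set(ranked[:REGULAR_SLOTS])
    remaining = [peer for peer in ranked if peer not in unchoked]
    if remaining:
        unchoked.add(random.choice(remaining))  # optimistic unchoke
    return unchoked

rates = {"p1": 120.0, "p2": 95.0, "p3": 80.0, "p4": 10.0, "p5": 0.0}
print(choose_unchoked(rates))  # the three fastest peers plus one random extra
```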
Since maintaining seeds for unpopular content entails high bandwidth and administrative costs, this runs counter to the goals of publishers that value BitTorrent as a cheap alternative to a client-server approach. This occurs on a huge scale; measurements have shown that 38% of all new torrents become unavailable within the first month.[25] A strategy adopted by many publishers which significantly increases availability of unpopular content consists of bundling multiple files in a single swarm.[26] More sophisticated solutions have also been proposed; generally, these use cross-torrent mechanisms through which multiple torrents can cooperate to better share content.[27] Creating and publishing torrents[edit] The peer distributing a data file treats the file as a number of identically sized pieces, usually with byte sizes of a power of 2, and typically between 32 kB and 16 MB each. The peer creates a hash for each piece, using the SHA-1 hash function, and records it in the torrent file. Pieces with sizes greater than 512 kB will reduce the size of a torrent file for a very large payload, but is claimed to reduce the efficiency of the protocol.[28] When another peer later receives a particular piece, the hash of the piece is compared to the recorded hash to test that the piece is error-free.[1] Peers that provide a complete file are called seeders, and the peer providing the initial copy is called the initial seeder. The exact information contained in the torrent file depends on the version of the BitTorrent protocol. By convention, the name of a torrent file has the suffix .torrent. Torrent files have an "announce" section, which specifies the URL of the tracker, and an "info" section, containing (suggested) names for the files, their lengths, the piece length used, and a SHA-1 hash code for each piece, all of which are used by clients to verify the integrity of the data they receive. Though SHA-1 has shown signs of cryptographic weakness, Bram Cohen did not initially consider the risk big enough for a backward incompatible change to, for example, SHA-3. As of BitTorrent v2 the hash function has been updated to SHA-256.[29] In the early days, torrent files were typically published to torrent index websites, and registered with at least one tracker. The tracker maintained lists of the clients currently connected to the swarm.[1] Alternatively, in a trackerless system (decentralized tracking) every peer acts as a tracker. Azureus was the first[30] BitTorrent client to implement such a system through the distributed hash table (DHT) method. An alternative and incompatible DHT system, known as Mainline DHT, was released in the Mainline BitTorrent client three weeks later (though it had been in development since 2002)[30] and subsequently adopted by the μTorrent, Transmission, rTorrent, KTorrent, BitComet, and Deluge clients. After the DHT was adopted, a "private" flag – analogous to the broadcast flag – was unofficially introduced, telling clients to restrict the use of decentralized tracking regardless of the user's desires.[31] The flag is intentionally placed in the info section of the torrent so that it cannot be disabled or removed without changing the identity of the torrent. The purpose of the flag is to prevent torrents from being shared with clients that do not have access to the tracker. 
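Putting the torrent-file description above together, the sketch below assembles a minimal single-file v1 metadata dictionary, including the unofficial private flag, bencodes it with a hand-rolled encoder, and derives the info-hash as the SHA-1 of the bencoded "info" section. The tracker URL, file details, placeholder piece hashes, and the minimal bencoder are illustrative assumptions rather than a complete implementation.

```python
# Sketch of a minimal single-file v1 torrent: build the "announce"/"info"
# layout described above, bencode it, and derive the info-hash (SHA-1 of the
# bencoded "info" dictionary). The tiny bencoder handles only the types used
# here; the tracker URL and file details are illustrative assumptions.
import hashlib

def bencode(value) -> bytes:
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode()
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):  # bencoded dictionaries use sorted keys
        return b"d" + b"".join(bencode(k) + bencode(value[k]) for k in sorted(value)) + b"e"
    raise TypeError(f"cannot bencode {type(value)!r}")

piece_length = 262144
metadata = {
    "announce": "http://tracker.example.org:6969/announce",  # hypothetical tracker
    "info": {
        "name": "example-file.bin",
        "length": 2 * piece_length + 100_000,
        "piece length": piece_length,
        "pieces": b"\x00" * 20 * 3,  # placeholder for 3 concatenated SHA-1 digests
        "private": 1,                # unofficial flag: use only the listed tracker(s)
    },
}

torrent_bytes = bencode(metadata)                                # contents of the .torrent file
info_hash = hashlib.sha1(bencode(metadata["info"])).hexdigest()  # identifies the torrent
print(len(torrent_bytes), "bytes of metadata; info-hash:", info_hash)
```

A real client would also handle multi-file torrents and the many optional keys, but the announce/info split and the info-hash derivation are the parts the surrounding text describes; the placeholder digests stand in for piece hashes produced as in the earlier piece-hashing sketch.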
The flag was requested for inclusion in the official specification in August 2008, but has not been accepted yet.[32] Clients that have ignored the private flag were banned by many trackers, discouraging the practice.[33] Anonymity[edit] BitTorrent does not, on its own, offer its users anonymity. One can usually see the IP addresses of all peers in a swarm in one's own client or firewall program. This may expose users with insecure systems to attacks.[24] In some countries, copyright organizations scrape lists of peers, and send takedown notices to the internet service provider of users participating in the swarms of files that are under copyright. In some jurisdictions, copyright holders may launch lawsuits against uploaders or downloaders for infringement, and police may arrest suspects in such cases. Various means have been used to promote anonymity. For example, the BitTorrent client Tribler makes available a Tor-like onion network, optionally routing transfers through other peers to obscure which client has requested the data. The exit node would be visible to peers in a swarm, but the Tribler organization provides exit nodes. One advantage of Tribler is that clearnet torrents can be downloaded with only a small decrease in download speed from one "hop" of routing. i2p provides a similar anonymity layer although in that case, one can only download torrents that have been uploaded to the i2p network.[34] The bittorrent client Vuze allows users who are not concerned about anonymity to take clearnet torrents, and make them available on the i2p network.[35] Most BitTorrent clients are not designed to provide anonymity when used over Tor,[36] and there is some debate as to whether torrenting over Tor acts as a drag on the network.[37] Private torrent trackers are usually invitation only, and require members to participate in uploading, but have the downside of a single centralized point of failure. Oink's Pink Palace and What.cd are examples of private trackers which have been shut down. Seedbox services download the torrent files first to the company's servers, allowing the user to direct download the file from there.[38][39] One's IP address would be visible to the Seedbox provider, but not to third parties. Virtual private networks encrypt transfers, and substitute a different IP address for the user's, so that anyone monitoring a torrent swarm will only see that address. BitTorrent v2[edit] BitTorrent v2 is intended to work seamlessly with previous versions of the BitTorrent protocol. The main reason for the update was that the old cryptographic hash function, SHA-1 is no longer considered safe from malicious attacks by the developers, and as such, v2 uses SHA-256. To ensure backwards compatibility, the v2 .torrent file format supports a hybrid mode where the torrents are hashed through both the new method and the old method, with the intent that the files will be shared with peers on both v1 and v2 swarms. Another update to the specification is adding a hash tree to speed up time from adding a torrent to downloading files, and to allow more granular checks for file corruption. In addition, each file is now hashed individually, enabling files in the swarm to be deduplicated, so that if multiple torrents include the same files, but seeders are only seeding the file from some, downloaders of the other torrents can still download the file. 
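A simplified illustration of the per-file hashing idea described for BitTorrent v2: hash a file's 16 KiB blocks with SHA-256 and fold them into a Merkle root, so identical files yield identical roots and can be deduplicated across swarms. The padding rule and tree layout here are simplifications, not the v2 specification's exact algorithm.

```python
# Simplified per-file Merkle root over 16 KiB blocks with SHA-256, in the
# spirit of BitTorrent v2's per-file hash trees. Padding and exact layout
# are simplified; this is a sketch, not the specification's algorithm.
import hashlib

BLOCK = 16 * 1024

def merkle_root(data: bytes) -> bytes:
    layer = [hashlib.sha256(data[i:i + BLOCK]).digest()
             for i in range(0, len(data), BLOCK)] or [hashlib.sha256(b"").digest()]
    while len(layer) > 1:
        if len(layer) % 2:              # duplicate the last node on odd layers
            layer.append(layer[-1])     # (the real v2 tree pads differently)
        layer = [hashlib.sha256(layer[i] + layer[i + 1]).digest()
                 for i in range(0, len(layer), 2)]
    return layer[0]

file_a = b"shared content" * 10_000
file_b = b"shared content" * 10_000
print(merkle_root(file_a) == merkle_root(file_b))  # identical files share a root
```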
Magnet links for v2 also support a hybrid mode to ensure support for legacy clients.[40] Adoption[edit] A growing number of individuals and organizations are using BitTorrent to distribute their own or licensed works (e.g. indie bands distributing digital files of their new songs). Independent adopters report that without using BitTorrent technology, and its dramatically reduced demands on their private networking hardware and bandwidth, they could not afford to distribute their files.[41] Some uses of BitTorrent for file sharing may violate laws in some jurisdictions (see legal issues section). Film, video, and music[edit] BitTorrent Inc. has obtained a number of licenses from Hollywood studios for distributing popular content from their websites.[citation needed] Sub Pop Records releases tracks and videos via BitTorrent Inc.[42] to distribute its 1000+ albums. Babyshambles and The Libertines (both bands associated with Pete Doherty) have extensively used torrents to distribute hundreds of demos and live videos. US industrial rock band Nine Inch Nails frequently distributes albums via BitTorrent. Podcasting software is starting to integrate BitTorrent to help podcasters deal with the download demands of their MP3 "radio" programs. Specifically, Juice and Miro (formerly known as Democracy Player) support automatic processing of .torrent files from RSS feeds. Similarly, some BitTorrent clients, such as μTorrent, are able to process web feeds and automatically download content found within them. DGM Live purchases are provided via BitTorrent.[43] VODO, a service which distributes "free-to-share" movies and TV shows via BitTorrent.[44][45][46] Broadcasters[edit] In 2008, the CBC became the first public broadcaster in North America to make a full show (Canada's Next Great Prime Minister) available for download using BitTorrent.[47] The Norwegian Broadcasting Corporation (NRK) has since March 2008 experimented with bittorrent distribution, available online.[48] Only selected works in which NRK owns all royalties are published. Responses have been very positive, and NRK is planning to offer more content. The Dutch VPRO broadcasting organization released four documentaries in 2009 and 2010 under a Creative Commons license using the content distribution feature of the Mininova tracker.[49][50][51] Personal works[edit] The Amazon S3 "Simple Storage Service" is a scalable Internet-based storage service with a simple web service interface, equipped with built-in BitTorrent support.[52] Software[edit] Blizzard Entertainment uses BitTorrent (via a proprietary client called the "Blizzard Downloader", associated with the Blizzard "BattleNet" network) to distribute content and patches for Diablo III, StarCraft II and World of Warcraft, including the games themselves.[53] Wargaming uses BitTorrent in their popular titles World of Tanks, World of Warships and World of Warplanes to distribute game updates.[54] CCP Games, maker of the space Simulation MMORPG Eve Online, has announced that a new launcher will be released that is based on BitTorrent.[55][56] Many software games, especially those whose large size makes them difficult to host due to bandwidth limits, extremely frequent downloads, and unpredictable changes in network traffic, will distribute instead a specialized, stripped down bittorrent client with enough functionality to download the game from the other running clients and the primary server (which is maintained in case not enough peers are available). 
Many major open source and free software projects encourage BitTorrent as well as conventional downloads of their products (via HTTP, FTP etc.) to increase availability and to reduce load on their own servers, especially when dealing with larger files.[57] Government[edit] The British government used BitTorrent to distribute details about how the tax money of British citizens was spent.[58][59] Education[edit] Florida State University uses BitTorrent to distribute large scientific data sets to its researchers.[60] Many universities that have BOINC distributed computing projects have used the BitTorrent functionality of the client-server system to reduce the bandwidth costs of distributing the client-side applications used to process the scientific data. If a BOINC distributed computing application needs to be updated (or merely sent to a user), it can do so with little impact on the BOINC server.[61] The developing Human Connectome Project uses BitTorrent to share their open dataset.[62] Academic Torrents is a BitTorrent tracker for use by researchers in fields that need to share large datasets[63][64] Others[edit] Facebook uses BitTorrent to distribute updates to Facebook servers.[65] Twitter uses BitTorrent to distribute updates to Twitter servers.[66][67] The Internet Archive added BitTorrent to its file download options for over 1.3 million existing files, and all newly uploaded files, in August 2012.[68][69] This method is the fastest means of downloading media from the Archive.[68][70] As of 2011[update], BitTorrent had 100 million users and a greater share of network bandwidth than Netflix and Hulu combined.[71][72] In early 2015, AT&T estimates that BitTorrent represents 20% of all broadband traffic.[73] Routers that use network address translation (NAT) must maintain tables of source and destination IP addresses and ports. Typical home routers are limited to about 2000 table entries[citation needed] while some more expensive routers have larger table capacities. BitTorrent frequently contacts 20–30 servers per second, rapidly filling the NAT tables. This is a known cause of some home routers ceasing to work correctly.[74][75] Technologies built on BitTorrent[edit] Distributed trackers[edit] On 2 May 2005, Azureus 2.3.0.0 (now known as Vuze) was released,[76] introducing support for "trackerless" torrents through a system called the "distributed database." This system is a Distributed hash table implementation which allows the client to use torrents that do not have a working BitTorrent tracker. Instead just bootstrapping server is used (router.bittorrent.com, dht.transmissionbt.com or router.utorrent.com[77][78]). The following month, BitTorrent, Inc. released version 4.2.0 of the Mainline BitTorrent client, which supported an alternative DHT implementation (popularly known as "Mainline DHT", outlined in a draft on their website) that is incompatible with that of Azureus. In 2014, measurement showed concurrent users of Mainline DHT to be from 10 million to 25 million, with a daily churn of at least 10 million.[79] Current versions of the official BitTorrent client, μTorrent, BitComet, Transmission and BitSpirit all share compatibility with Mainline DHT. Both DHT implementations are based on Kademlia.[80] As of version 3.0.5.0, Azureus also supports Mainline DHT in addition to its own distributed database through use of an optional application plugin.[81] This potentially allows the Azureus/Vuze client to reach a bigger swarm. 
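Both DHT flavours mentioned above are based on Kademlia, where node IDs and info-hashes share the same 160-bit space and "closeness" is the XOR of two IDs; a lookup repeatedly queries the known nodes closest to the target info-hash. The sketch below shows that distance metric; the node names are made-up assumptions.

```python
# Sketch of the Kademlia XOR distance used by the DHT implementations noted
# above: node IDs and info-hashes live in one 160-bit space, and a lookup
# queries the known nodes whose IDs are XOR-closest to the target.
# The node names below are illustrative assumptions.
import hashlib

def node_id(seed: str) -> int:
    return int.from_bytes(hashlib.sha1(seed.encode()).digest(), "big")

def xor_distance(a: int, b: int) -> int:
    return a ^ b

target = node_id("example torrent info dict")  # stand-in for an info-hash
known_nodes = {name: node_id(name) for name in ["node-a", "node-b", "node-c", "node-d"]}

closest = sorted(known_nodes, key=lambda n: xor_distance(known_nodes[n], target))[:2]
print("query these nodes first:", closest)
```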
Another idea that has surfaced in Vuze is that of virtual torrents. This idea is based on the distributed tracker approach and is used to describe some web resource. Currently, it is used for instant messaging. It is implemented using a special messaging protocol and requires an appropriate plugin. Anatomic P2P is another approach, which uses a decentralized network of nodes that route traffic to dynamic trackers. Most BitTorrent clients also use Peer exchange (PEX) to gather peers in addition to trackers and DHT. Peer exchange checks with known peers to see if they know of any other peers. With the 3.0.5.0 release of Vuze, all major BitTorrent clients now have compatible peer exchange. Web seeding[edit] Web "seeding" was implemented in 2006 as the ability of BitTorrent clients to download torrent pieces from an HTTP source in addition to the "swarm". The advantage of this feature is that a website may distribute a torrent for a particular file or batch of files and make those files available for download from that same web server; this can simplify long-term seeding and load balancing through the use of existing, cheap, web hosting setups. In theory, this would make using BitTorrent almost as easy for a web publisher as creating a direct HTTP download. In addition, it would allow the "web seed" to be disabled if the swarm becomes too popular while still allowing the file to be readily available. This feature has two distinct specifications, both of which are supported by Libtorrent and the 26+ clients that use it. Hash web seeding[edit] The first was created by John "TheSHAD0W" Hoffman, who created BitTornado.[82][83] This first specification requires running a web service that serves content by info-hash and piece number, rather than filename. HTTP web seeding[edit] The other specification is created by GetRight authors and can rely on a basic HTTP download space (using byte serving).[84][85] Other[edit] In September 2010, a new service named Burnbit was launched which generates a torrent from any URL using webseeding.[86] There are server-side solutions that provide initial seeding of the file from the web server via standard BitTorrent protocol and when the number of external seeders reach a limit, they stop serving the file from the original source.[87] RSS feeds[edit] Main article: Broadcatching A technique called broadcatching combines RSS feeds with the BitTorrent protocol to create a content delivery system, further simplifying and automating content distribution. Steve Gillmor explained the concept in a column for Ziff-Davis in December 2003.[88] The discussion spread quickly among bloggers (Ernest Miller,[89] Chris Pirillo, etc.). In an article entitled Broadcatching with BitTorrent, Scott Raymond explained: I want RSS feeds of BitTorrent files. A script would periodically check the feed for new items, and use them to start the download. Then, I could find a trusted publisher of an Alias RSS feed, and "subscribe" to all new episodes of the show, which would then start downloading automatically – like the "season pass" feature of the TiVo. — Scott Raymond, scottraymond.net[90] The RSS feed will track the content, while BitTorrent ensures content integrity with cryptographic hashing of all data, so feed subscribers will receive uncorrupted content. One of the first and popular software clients (free and open source) for broadcatching is Miro. Other free software clients such as PenguinTV and KatchTV are also now supporting broadcatching. 
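The broadcatching flow described above can be sketched as scanning an RSS feed for torrent enclosures and handing each one to a client. The inline feed is a made-up sample, and start_download is a hypothetical stand-in for whatever client API or command-line call would actually be used.

```python
# Sketch of broadcatching: scan an RSS feed for torrent enclosures and queue
# each URL for download. The feed content is an inline illustrative sample,
# and start_download is a hypothetical placeholder for a real client call.
import xml.etree.ElementTree as ET

FEED = """<rss version="2.0"><channel>
  <item><title>Episode 1</title>
    <enclosure url="http://feeds.example.org/ep1.torrent" type="application/x-bittorrent"/>
  </item>
  <item><title>Episode 2</title>
    <enclosure url="http://feeds.example.org/ep2.torrent" type="application/x-bittorrent"/>
  </item>
</channel></rss>"""

def torrent_enclosures(feed_xml: str) -> list[str]:
    root = ET.fromstring(feed_xml)
    return [enc.get("url") for enc in root.iter("enclosure")
            if enc.get("type") == "application/x-bittorrent"]

def start_download(url: str) -> None:
    print("would queue torrent:", url)   # placeholder for a real client call

for url in torrent_enclosures(FEED):
    start_download(url)
```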
The BitTorrent web-service MoveDigital added the ability to make torrents available to any web application capable of parsing XML through its standard REST-based interface in 2006,[91] though this has since been discontinued. Additionally, Torrenthut is developing a similar torrent API that will provide the same features, and help bring the torrent community to Web 2.0 standards. Alongside this release is a first PHP application built using the API called PEP, which will parse any Really Simple Syndication (RSS 2.0) feed and automatically create and seed a torrent for each enclosure found in that feed.[92] Throttling and encryption[edit] Main article: BitTorrent protocol encryption Since BitTorrent makes up a large proportion of total traffic, some ISPs have chosen to "throttle" (slow down) BitTorrent transfers. For this reason, methods have been developed to disguise BitTorrent traffic in an attempt to thwart these efforts.[93] Protocol header encrypt (PHE) and Message stream encryption/Protocol encryption (MSE/PE) are features of some BitTorrent clients that attempt to make BitTorrent hard to detect and throttle. As of November 2015, Vuze, Bitcomet, KTorrent, Transmission, Deluge, μTorrent, MooPolice, Halite, qBittorrent, rTorrent, and the latest official BitTorrent client (v6) support MSE/PE encryption. In August 2007, Comcast was preventing BitTorrent seeding by monitoring and interfering with the communication between peers. Protection against these efforts is provided by proxying the client-tracker traffic via an encrypted tunnel to a point outside of the Comcast network.[94] In 2008, Comcast called a "truce" with BitTorrent, Inc. with the intention of shaping traffic in a protocol-agnostic manner.[95] Questions about the ethics and legality of Comcast's behavior have led to renewed debate about net neutrality in the United States.[96] In general, although encryption can make it difficult to determine what is being shared, BitTorrent is vulnerable to traffic analysis. Thus, even with MSE/PE, it may be possible for an ISP to recognize BitTorrent and also to determine that a system is no longer downloading but only uploading data, and terminate its connection by injecting TCP RST (reset flag) packets. Multitracker[edit] Another unofficial feature is an extension to the BitTorrent metadata format proposed by John Hoffman[97] and implemented by several indexing websites. It allows the use of multiple trackers per file, so if one tracker fails, others can continue to support file transfer. It is implemented in several clients, such as BitComet, BitTornado, BitTorrent, KTorrent, Transmission, Deluge, μTorrent, rtorrent, Vuze, and Frostwire. Trackers are placed in groups, or tiers, with a tracker randomly chosen from the top tier and tried, moving to the next tier if all the trackers in the top tier fail. Torrents with multiple trackers can decrease the time it takes to download a file, but also have a few consequences: Poorly implemented[98] clients may contact multiple trackers, leading to more overhead-traffic. Torrents from closed trackers suddenly become downloadable by non-members, as they can connect to a seed via an open tracker. Peer selection[edit] As of December 2008[update], BitTorrent, Inc. is working with Oversi on new Policy Discover Protocols that query the ISP for capabilities and network architecture information. 
Oversi's ISP hosted NetEnhancer box is designed to "improve peer selection" by helping peers find local nodes, improving download speeds while reducing the loads into and out of the ISP's network.[99] Implementations[edit] Main article: Comparison of BitTorrent clients The BitTorrent specification is free to use and many clients are open source, so BitTorrent clients have been created for all common operating systems using a variety of programming languages. The official BitTorrent client, μTorrent, qBittorrent, Transmission, Vuze, and BitComet are some of the most popular clients.[100][101][102][103] Some BitTorrent implementations such as MLDonkey and Torrentflux are designed to run as servers. For example, this can be used to centralize file sharing on a single dedicated server which users share access to on the network.[104] Server-oriented BitTorrent implementations can also be hosted by hosting providers at co-located facilities with high bandwidth Internet connectivity (e.g., a datacenter) which can provide dramatic speed benefits over using BitTorrent from a regular home broadband connection. Services such as ImageShack can download files on BitTorrent for the user, allowing them to download the entire file by HTTP once it is finished. The Opera web browser supports BitTorrent,[105] as does Wyzo and Brave.[106] BitLet allows users to download Torrents directly from their browser using a Java applet. An increasing number of hardware devices are being made to support BitTorrent. These include routers and NAS devices containing BitTorrent-capable firmware like OpenWrt. Proprietary versions of the protocol which implement DRM, encryption, and authentication are found within managed clients such as Pando. Legal issues[edit] Main article: Legal issues with BitTorrent Although the protocol itself is legal,[107] problems stem from using the protocol to traffic copyright infringing works, since BitTorrent is often used to download otherwise paid content, such as movies and video games. There has been much controversy over the use of BitTorrent trackers. BitTorrent metafiles themselves do not store file contents. Whether the publishers of BitTorrent metafiles violate copyrights by linking to copyrighted works without the authorization of copyright holders is controversial. Various jurisdictions have pursued legal action against websites that host BitTorrent trackers. High-profile examples include the closing of Suprnova.org, TorrentSpy, LokiTorrent, BTJunkie, Mininova, Oink's Pink Palace and What.cd. The Pirate Bay torrent website, formed by a Swedish group, is noted for the "legal" section of its website in which letters and replies on the subject of alleged copyright infringements are publicly displayed. On 31 May 2006, The Pirate Bay's servers in Sweden were raided by Swedish police on allegations by the MPAA of copyright infringement;[108] however, the tracker was up and running again three days later. In the study used to value NBC Universal in its merger with Comcast, Envisional examined the 10,000 torrent swarms managed by PublicBT which had the most active downloaders. 
After excluding pornographic and unidentifiable content, it was found that only one swarm offered legitimate content.[109] In the United States, more than 200,000 lawsuits have been filed for copyright infringement on BitTorrent since 2010.[110] On 30 April 2012, the High Court of Justice ordered five ISPs to block BitTorrent search engine The Pirate Bay.[111] (see List of websites blocked in the United Kingdom) Security problems[edit] One concern is the UDP flood attack. BitTorrent implementations often use μTP for their communication. To achieve high bandwidths, the underlying protocol used is UDP, which allows spoofing of source addresses of internet traffic. It has been possible to carry out Denial-of-service attacks in a P2P lab environment, where users running BitTorrent clients act as amplifiers for an attack at another service.[112] However this is not always an effective attack because ISPs can check if the source address is correct. Malware[edit] Several studies on BitTorrent found files containing malware, available for download. In particular, one small sample[113] indicated that 18% of all executable programs available for download contained malware. Another study[114] claims that as much as 14.5% of BitTorrent downloads contain zero-day malware, and that BitTorrent was used as the distribution mechanism for 47% of all zero-day malware they have found. See also[edit] Anonymous P2P Napster Gnutella Anti-Counterfeiting Trade Agreement Bencode Cache Discovery Protocol Comparison of BitTorrent clients Comparison of BitTorrent sites Comparison of BitTorrent tracker software FastTrack Glossary of BitTorrent terms Magnet URI scheme μTP (Micro Transport Protocol) Peer-to-peer file sharing Segmented file transfer Simple file verification Super-seeding Torrent file Torrent poisoning VPN References[edit] ^ a b c d e Cohen, Bram (October 2002). "BitTorrent Protocol 1.0". BitTorrent.org. Archived from the original on 8 February 2014. Retrieved 1 June 2020. ^ Schulze, Hendrik; Klaus Mochalski (2009). "Internet Study 2008/2009" (PDF). Leipzig, Germany: ipoque. Archived from the original (PDF) on 26 June 2011. Retrieved 3 October 2011. Peer-to-peer file sharing (P2P) still generates by far the most traffic in all monitored regions – ranging from 43% in Northern Africa to 70% Eastern Europe. ^ "Application Usage & Threat Report". Palo Alto Networks. 2013. Archived from the original on 31 October 2013. Retrieved 7 April 2013. ^ Marozzo, Fabrizio; Talia, Domenico; Trunfio, Paolo (2020). "A Sleep-and-Wake technique for reducing energy consumption in BitTorrent networks". Concurrency and Computation: Practice and Experience. 32 (14). doi:10.1002/cpe.5723. ISSN 1532-0634. S2CID 215841734. ^ Van der Sar, Ernesto (4 December 2009). "Thunder Blasts uTorrent's Market Share Away - TorrentFreak". TorrentFreak. Archived from the original on 20 February 2016. Retrieved 18 June 2018. ^ "迅雷-全球共享计算与区块链创领者". www.xunlei.com. Retrieved 21 November 2019. ^ "UB Engineering Tweeter". University at Buffalo's School of Engineering and Applied Sciences. Archived from the original on 11 November 2013. ^ Cohen, Bram (2 July 2001). "BitTorrent – a new P2P app". Yahoo eGroups. Archived from the original on 29 January 2008. Retrieved 15 April 2007. ^ Wang, Liang; Kangasharju, J. (1 September 2013). "Measuring large-scale distributed systems: Case of Bit Torrent Mainline DHT". IEEE P2P 2013 Proceedings. pp. 1–10. doi:10.1109/P2P.2013.6688697. ISBN 978-1-4799-0515-7. S2CID 5659252. 
en-wikipedia-org-2210 ---- Chilling effect - Wikipedia

Chilling effect

For other uses, see Chilling effect (disambiguation). Not to be confused with Chilling Effects.
In a legal context, a chilling effect is the inhibition or discouragement of the legitimate exercise of natural and legal rights by the threat of legal sanction.[1] The right that is most often described as being suppressed by a chilling effect is the US constitutional right to free speech. A chilling effect may be caused by legal actions such as the passing of a law, the decision of a court, or the threat of a lawsuit; any legal action that would cause people to hesitate to exercise a legitimate right (freedom of speech or otherwise) for fear of legal repercussions. When that fear is brought about by the threat of a libel lawsuit, it is called libel chill.[2] A lawsuit initiated specifically for the purpose of creating a chilling effect may be called a Strategic Lawsuit Against Public Participation ("SLAPP"). "Chilling" in this context normally implies an undesirable slowing. Outside the legal context, in common usage any coercion or threat of coercion (or other unpleasantness) can have a chilling effect on a group of people regarding a specific behavior, and the effect can often be statistically measured or plainly observed. Examples include the news headline "Flood insurance [price] spikes have chilling effect on some home sales"[3] and the title of a two-part survey of 160 college students involved in dating relationships, "The chilling effect of aggressive potential on the expression of complaints in intimate relationships."[4]

Usage

In United States and Canadian law, the term chilling effects refers to the stifling effect that vague or excessively broad laws may have on legitimate speech activity.[5] However, the term is also now commonly used outside American legal jargon, such as the chilling effects of high prices[3] or of corrupt police, or of "anticipated aggressive repercussions" in, say, personal relationships.[4] A chilling effect is an effect that reduces, suppresses, discourages, delays, or otherwise retards the reporting of concerns of any kind. An example of the "chilling effect" in Canadian case law can be found in Iorfida v. MacIntyre, where the constitutionality of a criminal law prohibiting the publication of literature depicting illicit drug use was challenged.
The court found that the law had a "chilling effect" on legitimate forms of expression and could stifle political debate on issues such as the legalization of marijuana.[6] The court noted that it did not adopt the same "chilling effect" analysis used in American law but considered the chilling effect of the law as a part of its own analysis.[7] Regarding the Ömer Faruk Gergerlioğlu case in Turkey, a press release of the Office of the United Nations High Commissioner for Human Rights (OHCHR) stated that Turkey's misuse of counter-terrorism measures can have a chilling effect on the enjoyment of fundamental freedoms and human rights.[8]

History

In 1644 John Milton expressed the chilling effect of censorship in Areopagitica: For to distrust the judgement and the honesty of one who hath but a common repute in learning and never yet offended, as not to count him fit to print his mind without a tutor or examiner, lest he should drop a schism or something of corruption, is the greatest displeasure and indignity to a free and knowing spirit that can be put upon him.[9] The term chilling effect has been in use in the United States since as early as 1950.[10] The United States Supreme Court first referred to the "chilling effect" in the context of the United States Constitution in Wieman v. Updegraff in 1952.[11] It became further established as a legal term when William J. Brennan, a justice of the United States Supreme Court, used it in a judicial decision (Lamont v. Postmaster General) which overturned a law requiring a postal patron receiving "communist political propaganda"[12] to specifically authorize the delivery.[13] The Lamont case, however, did not center on a law that explicitly stifles free speech. The "chilling effect" referred to at the time was a "deterrent effect" on freedom of expression, even when there is no law explicitly prohibiting it. However, in general, "chilling effect" is now often used in reference to laws or actions that do not explicitly prohibit legitimate speech, but that impose undue burdens.[13][failed verification]

Chilling effects on Wikipedia users

Edward Snowden disclosed in 2013 that the US government's Upstream program was collecting data on people reading Wikipedia articles. This revelation had a significant impact on readers' self-censorship, as shown by the fact that there were substantially fewer views of articles related to terrorism and security.[14] The court case Wikimedia Foundation v. NSA has since followed.

See also

Censorship Culture of fear Opinion corridor Fear mongering Media transparency Prior restraint Self-censorship Strategic lawsuit against public participation

References

^ chilling effect. (n.d.). Retrieved October 19, 2011, from http://law.yourdictionary.com/chilling-effect ^ Green, Allen (October 15, 2009). "Banish the libel chill". The Guardian. ^ a b "Flood insurance spikes have chilling effect on some home sales". WWL-TV Eyewitness News. October 15, 2013. Archived from the original on November 19, 2013. Realtors say [price spikes are] already causing home sales to fall through when buyers realize they can't afford the flood insurance. ^ a b Cloven, Denise H.; Roloff, Michael E. (1993). "The Chilling Effect of Aggressive Potential on The Expression of Complaints in Intimate Relationships". Communication Monographs. 60 (3): 199–219. doi:10.1080/03637759309376309. A two-part survey of 160 college students involved in dating relationships....
This chilling effect was greater when individuals who generally feared conflict anticipated aggressive repercussions (p < .001), and when people anticipated symbolic aggression from relationally independent partners (p < .05). ^ "censorship-reports-striking-a-balance-hate-speech-freedom-of-expression-and-nondiscrimination-1992-431-pp". doi:10.1163/2210-7975_hrd-2210-0079. ^ Iorfida v. MacIntyre, 1994 CanLII 7341 (ON SC) at para. 20, archived copy, archived from the original on July 13, 2012, retrieved October 25, 2011. ^ Iorfida v. MacIntyre, 1994 CanLII 7341 (ON SC) at para. 37, archived copy, archived from the original on July 13, 2012, retrieved October 25, 2011. ^ https://www.ohchr.org/EN/NewsEvents/Pages/DisplayNews.aspx?NewsID=26934&LangID=E&s=09 ^ John Milton (1644) Areopagitica, edited by George H. Sabine (1951), page 29, Appleton-Century-Crofts. ^ Freund, Paul A. "The Supreme Court and Civil Liberties". 4 Vanderbilt Law Review 533, at 539 (1950–1951). ^ "The Chilling Effect in Constitutional Law". Columbia Law Review. 69 (5): 808–842. May 1969. doi:10.2307/1121147. JSTOR 1121147. ^ Safire, William (July 20, 2005). "Safire Urges Federal Journalist Shield Law". Center For Individual Freedom. Retrieved June 18, 2008. Justice Brennan reported having written a 1965 decision striking down a state's intrusion on civil liberty because of its "chilling effect upon the exercise of First Amendment rights..." ^ a b "Lamont v. Postmaster General, 381 U.S. 301 (1965)". Justia. Retrieved June 18, 2008. ^ Penney, Jonathon W. (2016). "Chilling Effects: Online Surveillance and Wikipedia Use". Berkeley Technology Law Journal. doi:10.15779/z38ss13. Retrieved August 20, 2019.

External links

Lumen, containing many current examples of alleged chilling effects Terms associated with libel cases Cato Policy Analysis No. 270 Chilling The Internet?
Lessons from FCC Regulation of Radio Broadcasting Libel Reform Campaign The Chilling Effect of English libel law
en-wikipedia-org-2238 ---- Simple Mail Transfer Protocol - Wikipedia

Simple Mail Transfer Protocol

Internet protocol used for relaying e-mails. "SMTP" redirects here. For the email delivery company, see SMTP (company). For Short Message Transfer Protocol, see GSM 03.40.

The Simple Mail Transfer Protocol (SMTP) is an Internet standard communication protocol for electronic mail transmission. Mail servers and other message transfer agents use SMTP to send and receive mail messages. User-level email clients typically use SMTP only for sending messages to a mail server for relaying, and typically submit outgoing email to the mail server on port 587 or 465, per RFC 8314. For retrieving messages, IMAP and POP3 are standard, but proprietary servers also often implement proprietary protocols, e.g., Exchange ActiveSync. Since SMTP's introduction in 1981, it has been updated, modified and extended multiple times. The protocol version in common use today has an extensible structure with various extensions for authentication, encryption, binary data transfer, and internationalized email addresses. SMTP servers commonly use the Transmission Control Protocol on port number 25 (for plaintext) and 587 (for encrypted communications).
History

Predecessors to SMTP

Various forms of one-to-one electronic messaging were used in the 1960s. Users communicated using systems developed for specific mainframe computers. As more computers were interconnected, especially in the U.S. Government's ARPANET, standards were developed to permit exchange of messages between different operating systems. SMTP grew out of these standards developed during the 1970s. SMTP traces its roots to two implementations described in 1971: the Mail Box Protocol, whose implementation has been disputed[1] but which is discussed in RFC 196 and other RFCs, and the SNDMSG program, which, according to RFC 2235, Ray Tomlinson of BBN invented for TENEX computers to send mail messages across the ARPANET.[2][3][4] Fewer than 50 hosts were connected to the ARPANET at this time.[5] Further implementations include FTP Mail[6] and Mail Protocol, both from 1973.[7] Development work continued throughout the 1970s, until the ARPANET transitioned into the modern Internet around 1980.

Original SMTP

In 1980, Jon Postel published RFC 772, which proposed the Mail Transfer Protocol as a replacement for the use of the File Transfer Protocol (FTP) for mail. RFC 780 of May 1981 removed all references to FTP and allocated port 57 for TCP and UDP, an allocation that has since been removed by IANA.[citation needed] In November 1981, Postel published RFC 788 "Simple Mail Transfer Protocol". The SMTP standard was developed around the same time as Usenet, a one-to-many communication network with some similarities.[citation needed] SMTP became widely used in the early 1980s. At the time, it was a complement to the Unix to Unix Copy Program (UUCP), which was better suited for handling email transfers between machines that were intermittently connected. SMTP, on the other hand, works best when both the sending and receiving machines are connected to the network all the time. Both used a store and forward mechanism and are examples of push technology.
Though Usenet's newsgroups were still propagated with UUCP between servers,[8] UUCP as a mail transport has virtually disappeared[9] along with the "bang paths" it used as message routing headers.[10] Sendmail, released with 4.1cBSD in 1982, soon after RFC 788 was published in November 1981, was one of the first mail transfer agents to implement SMTP.[11] Over time, as BSD Unix became the most popular operating system on the Internet, Sendmail became the most common MTA (mail transfer agent).[12] The original SMTP protocol supported only unauthenticated, unencrypted 7-bit ASCII text communications, susceptible to trivial man-in-the-middle attacks, spoofing, and spamming, and requiring any binary data to be encoded to readable text before transmission. Due to the absence of a proper authentication mechanism, by design every SMTP server was an open mail relay. The Internet Mail Consortium (IMC) reported that 55% of mail servers were open relays in 1998,[13] but less than 1% in 2002.[14] Because of spam concerns most email providers blocklist open relays,[15] making original SMTP essentially impractical for general use on the Internet.

Modern SMTP

In November 1995, RFC 1869 defined Extended Simple Mail Transfer Protocol (ESMTP), which established a general structure for all existing and future extensions which aimed to add in the features missing from the original SMTP. ESMTP defines consistent and manageable means by which ESMTP clients and servers can be identified and servers can indicate supported extensions. Message submission (RFC 2476) and SMTP-AUTH (RFC 2554) were introduced in 1998 and 1999, both describing new trends in email delivery. Originally, SMTP servers were typically internal to an organization, receiving mail for the organization from the outside, and relaying messages from the organization to the outside. But as time went on, SMTP servers (mail transfer agents), in practice, were expanding their roles to become message submission agents for mail user agents, some of which were now relaying mail from outside of an organization (e.g. a company executive wishes to send email while on a trip using the corporate SMTP server). This issue, a consequence of the rapid expansion and popularity of the World Wide Web, meant that SMTP had to include specific rules and methods for relaying mail and authenticating users to prevent abuses such as relaying of unsolicited email (spam). Work on message submission (RFC 2476) was originally started because popular mail servers would often rewrite mail in an attempt to fix problems in it, for example, adding a domain name to an unqualified address. This behavior is helpful when the message being fixed is an initial submission, but dangerous and harmful when the message originated elsewhere and is being relayed. Cleanly separating mail into submission and relay was seen as a way to permit and encourage rewriting submissions while prohibiting rewriting relay. As spam became more prevalent, it was also seen as a way to provide authorization for mail being sent out from an organization, as well as traceability. This separation of relay and submission quickly became a foundation for modern email security practices. As this protocol started out purely ASCII text-based, it did not deal well with binary files, or characters in many non-English languages. Standards such as Multipurpose Internet Mail Extensions (MIME) were developed to encode binary files for transfer through SMTP.
Mail transfer agents (MTAs) developed after Sendmail also tended to be implemented 8-bit-clean, so that the alternate "just send eight" strategy could be used to transmit arbitrary text data (in any 8-bit ASCII-like character encoding) via SMTP. Mojibake was still a problem due to differing character set mappings between vendors, although the email addresses themselves still allowed only ASCII. 8-bit-clean MTAs today tend to support the 8BITMIME extension, permitting some binary files to be transmitted almost as easily as plain text (limits on line length and permitted octet values still apply, so that MIME encoding is needed for most non-text data and some text formats). In 2012, the SMTPUTF8 extension was created to support UTF-8 text, allowing international content and addresses in non-Latin scripts like Cyrillic or Chinese. Many people contributed to the core SMTP specifications, among them Jon Postel, Eric Allman, Dave Crocker, Ned Freed, Randall Gellens, John Klensin, and Keith Moore.

Mail processing model

(Diagram caption: blue arrows depict implementation of SMTP variations.)

Email is submitted by a mail client (mail user agent, MUA) to a mail server (mail submission agent, MSA) using SMTP on TCP port 587. Most mailbox providers still allow submission on traditional port 25. The MSA delivers the mail to its mail transfer agent (MTA). Often, these two agents are instances of the same software launched with different options on the same machine. Local processing can be done either on a single machine, or split among multiple machines; mail agent processes on one machine can share files, but if processing is on multiple machines, they transfer messages between each other using SMTP, where each machine is configured to use the next machine as a smart host. Each process is an MTA (an SMTP server) in its own right. The boundary MTA uses DNS to look up the MX (mail exchanger) record for the recipient's domain (the part of the email address on the right of @). The MX record contains the name of the target MTA. Based on the target host and other factors, the sending MTA selects a recipient server and connects to it to complete the mail exchange. Message transfer can occur in a single connection between two MTAs, or in a series of hops through intermediary systems. A receiving SMTP server may be the ultimate destination, an intermediate "relay" (that is, it stores and forwards the message) or a "gateway" (that is, it may forward the message using some protocol other than SMTP). Per RFC 5321 section 2.1, each hop is a formal handoff of responsibility for the message, whereby the receiving server must either deliver the message or properly report the failure to do so. Once the final hop accepts the incoming message, it hands it to a mail delivery agent (MDA) for local delivery. An MDA saves messages in the relevant mailbox format. As with sending, this reception can be done using one or multiple computers, but in the diagram above the MDA is depicted as one box near the mail exchanger box. An MDA may deliver messages directly to storage, or forward them over a network using SMTP or another protocol such as Local Mail Transfer Protocol (LMTP), a derivative of SMTP designed for this purpose. Once delivered to the local mail server, the mail is stored for batch retrieval by authenticated mail clients (MUAs).
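The MX lookup step described above can be illustrated with a short script. This is a minimal sketch, assuming the third-party dnspython package as the resolver (the article does not prescribe any particular library), and the recipient address is a placeholder:

```python
# Minimal sketch of MX-based mail routing, assuming the third-party
# dnspython package (pip install dnspython); not part of the article.
import dns.resolver

def mx_hosts(recipient: str):
    """Return candidate mail exchangers for a recipient address,
    lowest preference value (highest priority) first."""
    domain = recipient.rsplit("@", 1)[-1]
    try:
        answers = dns.resolver.resolve(domain, "MX")
    except dns.resolver.NoAnswer:
        # A conformant relay may fall back to the domain's A record.
        return [domain]
    records = sorted(answers, key=lambda r: r.preference)
    return [str(r.exchange).rstrip(".") for r in records]

print(mx_hosts("alice@example.com"))
```

The sending MTA would then try the returned hosts in order of preference, mirroring the selection behaviour described above.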
Mail is retrieved by end-user applications, called email clients, using Internet Message Access Protocol (IMAP), a protocol that both facilitates access to mail and manages stored mail, or the Post Office Protocol (POP), which typically uses the traditional mbox mail file format, or a proprietary system such as Microsoft Exchange/Outlook or Lotus Notes/Domino. Webmail clients may use either method, but the retrieval protocol is often not a formal standard. SMTP defines message transport, not the message content. Thus, it defines the mail envelope and its parameters, such as the envelope sender, but not the header (except trace information) nor the body of the message itself. STD 10 and RFC 5321 define SMTP (the envelope), while STD 11 and RFC 5322 define the message (header and body), formally referred to as the Internet Message Format.

Protocol overview

SMTP is a connection-oriented, text-based protocol in which a mail sender communicates with a mail receiver by issuing command strings and supplying necessary data over a reliable ordered data stream channel, typically a Transmission Control Protocol (TCP) connection. An SMTP session consists of commands originated by an SMTP client (the initiating agent, sender, or transmitter) and corresponding responses from the SMTP server (the listening agent, or receiver) so that the session is opened, and session parameters are exchanged. A session may include zero or more SMTP transactions. An SMTP transaction consists of three command/reply sequences:
MAIL command, to establish the return address, also called return-path,[16] reverse-path,[17] bounce address, mfrom, or envelope sender.
RCPT command, to establish a recipient of the message. This command can be issued multiple times, one for each recipient. These addresses are also part of the envelope.
DATA, to signal the beginning of the message text; the content of the message, as opposed to its envelope. It consists of a message header and a message body separated by an empty line. DATA is actually a group of commands, and the server replies twice: once to the DATA command itself, to acknowledge that it is ready to receive the text, and the second time after the end-of-data sequence, to either accept or reject the entire message.
Besides the intermediate reply for DATA, each server's reply can be either positive (2xx reply codes) or negative. Negative replies can be permanent (5xx codes) or transient (4xx codes). A reject is a permanent failure and the client should send a bounce message to the server it received it from. A drop is a positive response followed by message discard rather than delivery. The initiating host, the SMTP client, can be either an end-user's email client, functionally identified as a mail user agent (MUA), or a relay server's mail transfer agent (MTA), that is an SMTP server acting as an SMTP client, in the relevant session, in order to relay mail. Fully capable SMTP servers maintain queues of messages for retrying message transmissions that resulted in transient failures. A MUA knows the outgoing mail SMTP server from its configuration. A relay server typically determines which server to connect to by looking up the MX (Mail eXchange) DNS resource record for each recipient's domain name. If no MX record is found, a conformant relaying server (not all are) instead looks up the A record. Relay servers can also be configured to use a smart host.
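As a purely illustrative sketch (not from the article), the reply-code classes just described can be expressed in a few lines of code; the category names are informal labels, not protocol terminology:

```python
# Illustrative only: classify SMTP reply codes along the lines described
# above (2xx positive, 3xx intermediate, 4xx transient, 5xx permanent).
def classify_reply(code: int) -> str:
    if 200 <= code <= 299:
        return "positive"           # command accepted
    if 300 <= code <= 399:
        return "intermediate"       # e.g. 354 after DATA
    if 400 <= code <= 499:
        return "transient failure"  # sender may queue and retry later
    if 500 <= code <= 599:
        return "permanent failure"  # sender should generate a bounce
    return "unknown"

for code in (250, 354, 421, 550):
    print(code, classify_reply(code))
```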
A relay server initiates a TCP connection to the server on the "well-known port" for SMTP: port 25, or for connecting to an MSA, port 587. The main difference between an MTA and an MSA is that connecting to an MSA requires SMTP Authentication.

SMTP vs mail retrieval

SMTP is a delivery protocol only. In normal use, mail is "pushed" to a destination mail server (or next-hop mail server) as it arrives. Mail is routed based on the destination server, not the individual user(s) to which it is addressed. Other protocols, such as the Post Office Protocol (POP) and the Internet Message Access Protocol (IMAP), are specifically designed for use by individual users retrieving messages and managing mail boxes. To permit an intermittently-connected mail server to pull messages from a remote server on demand, SMTP has a feature to initiate mail queue processing on a remote server (see Remote Message Queue Starting below). POP and IMAP are unsuitable protocols for relaying mail by intermittently-connected machines; they are designed to operate after final delivery, when information critical to the correct operation of mail relay (the "mail envelope") has been removed.

Remote Message Queue Starting

Remote Message Queue Starting enables a remote host to start processing of the mail queue on a server so it may receive messages destined to it by sending a corresponding command. The original TURN command was deemed insecure and was extended in RFC 1985 with the ETRN command, which operates more securely using an authentication method based on Domain Name System information.[18]

Outgoing mail SMTP server

An email client needs to know the IP address of its initial SMTP server and this has to be given as part of its configuration (usually as a DNS name). This server will deliver outgoing messages on behalf of the user.

Outgoing mail server access restrictions

Server administrators need to impose some control on which clients can use the server. This enables them to deal with abuse, for example spam. Two solutions have been in common use:
In the past, many systems imposed usage restrictions by the location of the client, only permitting usage by clients whose IP address is one that the server administrators control. Usage from any other client IP address is disallowed.
Modern SMTP servers typically offer an alternative system that requires authentication of clients by credentials before allowing access.

Restricting access by location

Under this system, an ISP's SMTP server will not allow access by users who are outside the ISP's network. More precisely, the server may only allow access to users with an IP address provided by the ISP, which is equivalent to requiring that they are connected to the Internet using that same ISP. A mobile user may often be on a network other than that of their normal ISP, and will then find that sending email fails because the configured SMTP server choice is no longer accessible. This system has several variations. For example, an organisation's SMTP server may only provide service to users on the same network, enforcing this by firewalling to block access by users on the wider Internet. Or the server may perform range checks on the client's IP address. These methods were typically used by corporations and institutions such as universities which provided an SMTP server for outbound mail only for use internally within the organisation. However, most of these bodies now use client authentication methods, as described below.
Where a user is mobile, and may use different ISPs to connect to the internet, this kind of usage restriction is onerous, and altering the configured outbound email SMTP server address is impractical. It is highly desirable to be able to use email client configuration information that does not need to change.

Client authentication

Modern SMTP servers typically require authentication of clients by credentials before allowing access, rather than restricting access by location as described earlier. This more flexible system is friendly to mobile users and allows them to have a fixed choice of configured outbound SMTP server. SMTP Authentication, often abbreviated SMTP AUTH, is an extension of SMTP that allows clients to log in using an authentication mechanism.

Ports

Communication between mail servers generally uses the standard TCP port 25 designated for SMTP. Mail clients, however, generally don't use this, instead using specific "submission" ports. Mail services generally accept email submission from clients on one of:
587 (Submission), as formalized in RFC 6409 (previously RFC 2476)
465 (this port was deprecated after RFC 2487, until the issue of RFC 8314)
Port 2525 and others may be used by some individual providers, but have never been officially supported. Many Internet service providers now block all outgoing port 25 traffic from their customers, mainly as an anti-spam measure,[19] but also to avoid the higher cost they incur when leaving it open, perhaps by charging more to the few customers that require it open.

SMTP transport example

A typical example of sending a message via SMTP to two mailboxes (alice and theboss) located in the same mail domain (example.com or localhost.com) is reproduced in the following session exchange. (In this example, the conversation parts are prefixed with S: and C:, for server and client, respectively; these labels are not part of the exchange.) After the message sender (SMTP client) establishes a reliable communications channel to the message receiver (SMTP server), the session is opened with a greeting by the server, usually containing its fully qualified domain name (FQDN), in this case smtp.example.com. The client initiates its dialog by responding with a HELO command identifying itself in the command's parameter with its FQDN (or an address literal if none is available).[20]

S: 220 smtp.example.com ESMTP Postfix
C: HELO relay.example.com
S: 250 smtp.example.com, I am glad to meet you
C: MAIL FROM:<bob@example.org>
S: 250 Ok
C: RCPT TO:<alice@example.com>
S: 250 Ok
C: RCPT TO:<theboss@example.com>
S: 250 Ok
C: DATA
S: 354 End data with <CR><LF>.<CR><LF>
C: From: "Bob Example" <bob@example.org>
C: To: Alice Example <alice@example.com>
C: Cc: theboss@example.com
C: Date: Tue, 15 Jan 2008 16:02:43 -0500
C: Subject: Test message
C:
C: Hello Alice.
C: This is a test message with 5 header fields and 4 lines in the message body.
C: Your friend,
C: Bob
C: .
S: 250 Ok: queued as 12345
C: QUIT
S: 221 Bye
{The server closes the connection}

The client notifies the receiver of the originating email address of the message in a MAIL FROM command. This is also the return or bounce address in case the message cannot be delivered. In this example the email message is sent to two mailboxes on the same SMTP server: one for each recipient listed in the To and Cc header fields. The corresponding SMTP command is RCPT TO. Each successful reception and execution of a command is acknowledged by the server with a result code and response message (e.g., 250 Ok).
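The same exchange can be driven from a client program. The following is a minimal sketch using Python's standard smtplib; the host and mailbox names are the placeholders from the example above, and enabling debug output prints each command and server reply so the dialogue can be observed directly:

```python
# Sketch: drive the example session above with Python's standard smtplib.
# Host and addresses are the example placeholders, not real servers.
import smtplib

message = """\
From: "Bob Example" <bob@example.org>
To: Alice Example <alice@example.com>
Cc: theboss@example.com
Date: Tue, 15 Jan 2008 16:02:43 -0500
Subject: Test message

Hello Alice.
This is a test message with 5 header fields and 4 lines in the message body.
Your friend,
Bob
"""

with smtplib.SMTP("smtp.example.com", 25) as server:
    server.set_debuglevel(1)   # print each SMTP command and reply
    server.sendmail(
        "bob@example.org",                              # envelope sender (MAIL FROM)
        ["alice@example.com", "theboss@example.com"],   # one RCPT TO per recipient
        message,                                        # header and body (DATA)
    )
```

Note that the envelope addresses passed to sendmail() are what appear in MAIL FROM and RCPT TO; the To and Cc header fields inside the message are not consulted for routing, which is the envelope/content distinction described earlier.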
The transmission of the body of the mail message is initiated with a DATA command after which it is transmitted verbatim line by line and is terminated with an end-of-data sequence. This sequence consists of a new-line (<CR><LF>), a single full stop (period), followed by another new-line (<CR><LF>). Since a message body can contain a line with just a period as part of the text, the client sends two periods every time a line starts with a period; correspondingly, the server replaces every sequence of two periods at the beginning of a line with a single one. Such escaping method is called dot-stuffing. The server's positive reply to the end-of-data, as exemplified, implies that the server has taken the responsibility of delivering the message. A message can be doubled if there is a communication failure at this time, e.g. due to a power shortage: until the sender has received that 250 reply, it must assume the message was not delivered. On the other hand, after the receiver has decided to accept the message, it must assume the message has been delivered to it. Thus, during this time span, both agents have active copies of the message that they will try to deliver.[21] The probability that a communication failure occurs exactly at this step is directly proportional to the amount of filtering that the server performs on the message body, most often for anti-spam purposes. The limiting timeout is specified to be 10 minutes.[22] The QUIT command ends the session. If the email has other recipients located elsewhere, the client would QUIT and connect to an appropriate SMTP server for subsequent recipients after the current destination(s) had been queued. The information that the client sends in the HELO and MAIL FROM commands is added (not seen in the example) as additional header fields to the message by the receiving server. It adds a Received and Return-Path header field, respectively. Some clients are implemented to close the connection after the message is accepted (250 Ok: queued as 12345), so the last two lines may actually be omitted. This causes an error on the server when trying to send the 221 reply.

SMTP Extensions

Extension discovery mechanism

Clients learn a server's supported options by using the EHLO greeting, as exemplified below, instead of the original HELO. Clients fall back to HELO only if the server does not support the EHLO greeting.[23] Modern clients may use the ESMTP extension keyword SIZE to query the server for the maximum message size that will be accepted. Older clients and servers may try to transfer excessively sized messages that will be rejected after consuming network resources, including connect time to network links that is paid by the minute.[24] Users can manually determine in advance the maximum size accepted by ESMTP servers. The client replaces the HELO command with the EHLO command.

S: 220 smtp2.example.com ESMTP Postfix
C: EHLO bob.example.com
S: 250-smtp2.example.com Hello bob.example.org [192.0.2.201]
S: 250-SIZE 14680064
S: 250-PIPELINING
S: 250 HELP

Thus smtp2.example.com declares that it can accept a fixed maximum message size no larger than 14,680,064 octets (8-bit bytes). In the simplest case, an ESMTP server declares a maximum SIZE immediately after receiving an EHLO. According to RFC 1870, however, the numeric parameter to the SIZE extension in the EHLO response is optional.
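From the client side, the keywords returned to EHLO can be inspected programmatically. A minimal sketch using Python's standard smtplib follows; the host name is the placeholder from the example above, and, as just noted, the SIZE parameter may be absent or carry no numeric value:

```python
# Sketch: ESMTP extension discovery with Python's standard smtplib.
# The host is the placeholder from the example above.
import smtplib

with smtplib.SMTP("smtp2.example.com", 25) as server:
    code, banner = server.ehlo("bob.example.org")   # send EHLO instead of HELO
    print(code, banner.decode(errors="replace"))    # multi-line 250 reply
    print(server.esmtp_features)                    # e.g. {'size': '14680064', 'pipelining': '', 'help': ''}
    if server.has_extn("size") and server.esmtp_features["size"]:
        print("Server accepts up to", int(server.esmtp_features["size"]), "octets")
```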
Clients may instead, when issuing a MAIL FROM command, include a numeric estimate of the size of the message they are transferring, so that the server can refuse receipt of overly-large messages.

Binary data transfer

Original SMTP supports only a single body of ASCII text, therefore any binary data needs to be encoded as text into that body of the message before transfer, and then decoded by the recipient. Binary-to-text encodings, such as uuencode and BinHex, were typically used. The 8BITMIME command was developed to address this. It was standardized in 1994 as RFC 1652.[25] It facilitates the transparent exchange of e-mail messages containing octets outside the seven-bit ASCII character set by encoding them as MIME content parts, typically encoded with Base64.

Mail delivery mechanism extensions

On-Demand Mail Relay

Main article: On-Demand Mail Relay
On-Demand Mail Relay (ODMR) is an SMTP extension standardized in RFC 2645 that allows an intermittently-connected SMTP server to receive email queued for it when it is connected.

Internationalization extension

Main article: International email
Original SMTP supports email addresses composed of ASCII characters only, which is inconvenient for users whose native script is not Latin based, or who use diacritics not in the ASCII character set. This limitation was alleviated via extensions enabling UTF-8 in address names. RFC 5336 introduced the experimental[26] UTF8SMTP command, which was later superseded by RFC 6531, which introduced the SMTPUTF8 command. These extensions provide support for multi-byte and non-ASCII characters in email addresses, such as those with diacritics and other language characters such as Greek and Chinese.[27] Current support is limited, but there is strong interest in broad adoption of RFC 6531 and the related RFCs in countries like China that have a large user base where Latin (ASCII) is a foreign script.

Extensions

Like SMTP, ESMTP is a protocol used to transport Internet mail. It is used as both an inter-server transport protocol and (with restricted behavior enforced) a mail submission protocol. The main identification feature for ESMTP clients is to open a transmission with the command EHLO (Extended HELLO), rather than HELO (Hello, the original RFC 821 standard). A server will respond with success (code 250), failure (code 550) or error (code 500, 501, 502, 504, or 421), depending on its configuration. An ESMTP server returns the code 250 OK in a multi-line reply with its domain and a list of keywords to indicate supported extensions. An RFC 821 compliant server returns error code 500, allowing ESMTP clients to try either HELO or QUIT. Each service extension is defined in an approved format in subsequent RFCs and registered with the Internet Assigned Numbers Authority (IANA). The first definitions were the RFC 821 optional services: SEND, SOML (Send or Mail), SAML (Send and Mail), EXPN, HELP, and TURN. The format of additional SMTP verbs, and of new parameters to MAIL and RCPT, was also set.
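Because 8BITMIME is not universally supported, binary content is in practice wrapped in MIME parts and Base64-encoded, as described under Binary data transfer above. A minimal sketch using Python's standard email package follows; the file name, addresses and host are placeholders:

```python
# Sketch: package binary content as a Base64-encoded MIME attachment and
# hand it to an SMTP server. File name, addresses and host are placeholders.
import smtplib
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "bob@example.org"
msg["To"] = "alice@example.com"
msg["Subject"] = "Report attached"
msg.set_content("The report is attached.")

with open("report.pdf", "rb") as f:
    msg.add_attachment(f.read(),
                       maintype="application", subtype="pdf",
                       filename="report.pdf")   # Base64 content-transfer-encoding by default

with smtplib.SMTP("smtp.example.com", 25) as server:
    server.send_message(msg)   # envelope derived from the From/To/Cc headers
```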
Some relatively common keywords (not all of them corresponding to commands) used today are:
8BITMIME – 8-bit data transmission, RFC 6152
ATRN – Authenticated TURN for On-Demand Mail Relay, RFC 2645
AUTH – Authenticated SMTP, RFC 4954
CHUNKING – Chunking, RFC 3030
DSN – Delivery status notification, RFC 3461 (see Variable envelope return path)
ETRN – Extended version of remote message queue starting command TURN, RFC 1985
HELP – Supply helpful information, RFC 821
PIPELINING – Command pipelining, RFC 2920
SIZE – Message size declaration, RFC 1870
STARTTLS – Transport Layer Security, RFC 3207 (2002)
SMTPUTF8 – Allow UTF-8 encoding in mailbox names and header fields, RFC 6531
UTF8SMTP – Allow UTF-8 encoding in mailbox names and header fields, RFC 5336 (deprecated[28])

The ESMTP format was restated in RFC 2821 (superseding RFC 821) and updated to the latest definition in RFC 5321 in 2008. Support for the EHLO command in servers became mandatory, and HELO was designated a required fallback. Non-standard, unregistered service extensions can be used by bilateral agreement; these services are indicated by an EHLO message keyword starting with "X", with any additional parameters or verbs similarly marked. SMTP commands are case-insensitive. They are presented here in capitalized form for emphasis only. An SMTP server that requires a specific capitalization is in violation of the standard.[citation needed]

8BITMIME
At least the following servers advertise the 8BITMIME extension:
Apache James (since 2.3.0a1)[29]
Citadel (since 7.30)
Courier Mail Server
Gmail[30]
IceWarp
IIS SMTP Service
Kerio Connect
Lotus Domino
Microsoft Exchange Server (as of Exchange Server 2000)
Novell GroupWise
OpenSMTPD
Oracle Communications Messaging Server
Postfix
Sendmail (since 6.57)

The following servers can be configured to advertise 8BITMIME, but do not perform conversion of 8-bit data to 7-bit when connecting to non-8BITMIME relays:
Exim and qmail do not translate eight-bit messages to seven-bit when attempting to relay 8-bit data to non-8BITMIME peers, as is required by the RFC.[31] This does not cause problems in practice, since virtually all modern mail relays are 8-bit clean.[32]
Microsoft Exchange Server 2003 advertises 8BITMIME by default, but relaying to a non-8BITMIME peer results in a bounce. This is allowed by RFC 6152 section 3.

SMTP-AUTH
Main article: SMTP Authentication
The SMTP-AUTH extension provides an access control mechanism. It consists of an authentication step through which the client effectively logs into the mail server during the process of sending mail. Servers that support SMTP-AUTH can usually be configured to require clients to use this extension, ensuring the true identity of the sender is known. The SMTP-AUTH extension is defined in RFC 4954. SMTP-AUTH can be used to allow legitimate users to relay mail while denying relay service to unauthorized users, such as spammers. It does not necessarily guarantee the authenticity of either the SMTP envelope sender or the RFC 2822 "From:" header. For example, spoofing, in which one sender masquerades as someone else, is still possible with SMTP-AUTH unless the server is configured to limit message from-addresses to addresses this AUTHed user is authorized for. The SMTP-AUTH extension also allows one mail server to indicate to another that the sender has been authenticated when relaying mail.
In general this requires the recipient server to trust the sending server, meaning that this aspect of SMTP-AUTH is rarely used on the Internet.[citation needed]

SMTPUTF8
Supporting servers include:
Postfix (version 3.0 and later)[33]
Momentum (versions 4.1[34] and 3.6.5, and later)
Sendmail (under development)
Exim (experimental as of the 4.86 release)
CommuniGate Pro as of version 6.2.2[35]
Courier-MTA as of version 1.0[36]
Halon as of version 4.0[37]
Microsoft Exchange Server as of protocol revision 14.0[38]
Haraka and other servers.[39]
Oracle Communications Messaging Server as of release 8.0.2.[40]

Security extensions
Mail delivery can occur over both plain-text and encrypted connections; however, the communicating parties might not know in advance of the other party's ability to use a secure channel.

STARTTLS or "Opportunistic TLS"
Main articles: Opportunistic TLS and Email encryption
The STARTTLS extension enables a supporting SMTP server to notify connecting clients that it supports TLS-encrypted communication and offers the opportunity for clients to upgrade their connection by sending the STARTTLS command. Servers supporting the extension do not inherently gain any security benefit from its implementation on its own, as upgrading to a TLS-encrypted session is dependent on the connecting client deciding to exercise this option, hence the term opportunistic TLS. STARTTLS is effective only against passive observation attacks, since the STARTTLS negotiation happens in plain text and an active attacker can trivially remove STARTTLS commands. This type of man-in-the-middle attack is sometimes referred to as STRIPTLS, where the encryption negotiation information sent from one end never reaches the other. In this scenario both parties take the invalid or unexpected responses as an indication that the other does not properly support STARTTLS, defaulting to traditional plain-text mail transfer.[41] Note that STARTTLS is also defined for IMAP and POP3 in other RFCs, but these protocols serve different purposes: SMTP is used for communication between message transfer agents, while IMAP and POP3 are for end clients and message transfer agents. The Electronic Frontier Foundation maintains a "STARTTLS Everywhere" list that, similarly to the "HTTPS Everywhere" list, allows relying parties to discover others supporting secure communication without prior communication.[42] RFC 8314 officially declared plain text obsolete and recommended always using TLS, adding ports with implicit TLS.

SMTP MTA Strict Transport Security
A newer 2018 standard, RFC 8461, called "SMTP MTA Strict Transport Security (MTA-STS)", aims to address the problem of an active adversary by defining a protocol for mail servers to declare their ability to use secure channels in specific files on the server and specific DNS TXT records. The relying party would regularly check for the existence of such a record, cache it for the amount of time specified in the record, and never communicate over insecure channels until the record expires.[41] Note that MTA-STS records apply only to SMTP traffic between mail servers; communications between an end client and the mail server are protected by HTTPS and HTTP Strict Transport Security. In April 2019 Google Mail announced support for MTA-STS.[43]

SMTP TLS Reporting
A number of protocols allow secure delivery of messages, but they can fail due to misconfiguration or deliberate active interference, leading to undelivered messages or delivery over unencrypted or unauthenticated channels.
RFC 8460 "SMTP TLS Reporting" describes a reporting mechanism and format for sharing statistics and specific information about potential failures with recipient domains. Recipient domains can then use this information to both detect potential attacks and diagnose unintentional misconfigurations. In April 2019 Google Mail announced support for SMTP TLS Reporting.[43] Spoofing and spamming[edit] Main articles: Anti-spam techniques and Email authentication The original design of SMTP had no facility to authenticate senders, or check that servers were authorized to send on their behalf, with the result that email spoofing is possible, and commonly used in email spam and phishing. Occasional proposals are made to modify SMTP extensively or replace it completely. One example of this is Internet Mail 2000, but neither it, nor any other has made much headway in the face of the network effect of the huge installed base of classic SMTP. Instead, mail servers now use a range of techniques, such as stricter enforcement of standards such as RFC 5322,[44][45] DomainKeys Identified Mail, Sender Policy Framework and DMARC, DNSBLs and greylisting to reject or quarantine suspicious emails.[46] Implementations[edit] There is also SMTP proxy implementation as for example nginx.[47] Main articles: List of mail server software and Comparison of mail servers Related requests for comments[edit] RFC 1123 – Requirements for Internet Hosts—Application and Support (STD 3) RFC 1870 – SMTP Service Extension for Message Size Declaration (оbsoletes: RFC 1653) RFC 2505 – Anti-Spam Recommendations for SMTP MTAs (BCP 30) RFC 2821 – Simple Mail Transfer Protocol RFC 2920 – SMTP Service Extension for Command Pipelining (STD 60) RFC 3030 – SMTP Service Extensions for Transmission of Large and Binary MIME Messages RFC 3207 – SMTP Service Extension for Secure SMTP over Transport Layer Security (obsoletes RFC 2487) RFC 3461 – SMTP Service Extension for Delivery Status Notifications (obsoletes RFC 1891) RFC 3463 – Enhanced Status Codes for SMTP (obsoletes RFC 1893, updated by RFC 5248) RFC 3464 – An Extensible Message Format for Delivery Status Notifications (obsoletes RFC 1894) RFC 3798 – Message Disposition Notification (updates RFC 3461) RFC 3834 – Recommendations for Automatic Responses to Electronic Mail RFC 3974 – SMTP Operational Experience in Mixed IPv4/v6 Environments RFC 4952 – Overview and Framework for Internationalized Email (updated by RFC 5336) RFC 4954 – SMTP Service Extension for Authentication (obsoletes RFC 2554, updates RFC 3463, updated by RFC 5248) RFC 5068 – Email Submission Operations: Access and Accountability Requirements (BCP 134) RFC 5248 – A Registry for SMTP Enhanced Mail System Status Codes (BCP 138) (updates RFC 3463) RFC 5321 – The Simple Mail Transfer Protocol (obsoletes RFC 821 aka STD 10, RFC 974, RFC 1869, RFC 2821, updates RFC 1123) RFC 5322 – Internet Message Format (obsoletes RFC 822 aka STD 11, and RFC 2822) RFC 5504 – Downgrading Mechanism for Email Address Internationalization RFC 6409 – Message Submission for Mail (STD 72) (obsoletes RFC 4409, RFC 2476) RFC 6522 – The Multipart/Report Content Type for the Reporting of Mail System Administrative Messages (obsoletes RFC 3462, and in turn RFC 1892) RFC 6531 – SMTP Extension for Internationalized Email Addresses (updates RFC 2821, RFC 2822, RFC 4952, and RFC 5336) RFC 8314 – Cleartext Considered Obsolete: Use of Transport Layer Security (TLS) for Email Submission and Access See also[edit] Bounce address CRAM-MD5 (a SASL mechanism for ESMTPA) 
RFC 2195 Email Email encryption DKIM Ident List of mail server software List of SMTP server return codes POP before SMTP / SMTP after POP Internet Message Access Protocol Binary Content Extension RFC 3516 Sender Policy Framework (SPF) Simple Authentication and Security Layer (SASL) RFC 4422 SMTP Authentication Variable envelope return path Comparison of email clients for information about SMTP support Notes[edit] ^ The History of Electronic Mail, Tom Van Vleck: "It is not clear this protocol was ever implemented" ^ The First Network Email, Ray Tomlinson, BBN ^ Picture of "The First Email Computer" by Dan Murphy, a PDP-10 ^ Dan Murphy's TENEX and TOPS-20 Papers Archived November 18, 2007, at the Wayback Machine ^ RFC 2235 ^ RFC 469 – Network Mail Meeting Summary ^ RFC 524 – A Proposed Mail Protocol ^ Tldp.org ^ draft-barber-uucp-project-conclusion-05 – The Conclusion of the UUCP Mapping Project ^ The article about sender rewriting contains technical background info about the early SMTP history and source routing before RFC 1123. ^ Eric Allman (1983), Sendmail – An Internetwork Mail Router (PDF), BSD UNIX documentation set, Berkeley: University of California, retrieved June 29, 2012 ^ Craig Partridge (2008), The Technical Development of Internet Email (PDF), IEEE Annals of the History of Computing, 30, IEEE Computer Society, pp. 3–29, doi:10.1109/MAHC.2008.32, S2CID 206442868, archived from the original (PDF) on May 12, 2011 ^ Paul Hoffman (February 1, 1998). "Allowing Relaying in SMTP: A Survey". Internet Mail Consortium. Retrieved May 30, 2010. CS1 maint: discouraged parameter (link) ^ Paul Hoffman (August 2002). "Allowing Relaying in SMTP: A Series of Surveys". Internet Mail Consortium. Archived from the original on January 18, 2007. Retrieved May 30, 2010. CS1 maint: discouraged parameter (link) ^ "In Unix, what is an open mail relay? - Knowledge Base". web.archive.org. June 17, 2007. Retrieved March 15, 2021. ^ "The MAIL, RCPT, and DATA verbs", [D. J. Bernstein] ^ RFC 5321 Section-7.2 ^ Systems, Message. "Message Systems Introduces Latest Version Of Momentum With New API-Driven Capabilities". www.prnewswire.com. Retrieved July 19, 2020. ^ Cara Garretson (2005). "ISPs Pitch In to Stop Spam". PC World. Retrieved January 18, 2016. Last month, the Anti-Spam Technical Alliance, formed last year by Yahoo, America Online, EarthLink, and Microsoft, issued a list of antispam recommendations that includes filtering Port 25. ^ RFC 5321, Simple Mail Transfer Protocol, J. Klensin, The Internet Society (October 2008) ^ RFC 1047 ^ rfc5321#section-4.5.3.2.6 ^ John Klensin; Ned Freed; Marshall T. Rose; Einar A. Stefferud; Dave Crocker (November 1995). SMTP Service Extensions. IETF. doi:10.17487/RFC1869. RFC 1869. ^ "MAIL Parameters". IANA. Retrieved April 3, 2016. ^ Which was obsoleted in 2011 by RFC 6152 corresponding to the then new STD 71 ^ "MAIL Parameters". November 15, 2018. ^ Jiankang Yao (December 19, 2014). "Chinese email address". EAI (Mailing list). IETF. Retrieved May 24, 2016. ^ "SMTP Service Extension Parameters". IANA. Retrieved November 5, 2013. ^ James Server - ChangeLog. James.apache.org. Retrieved on 2013-07-17. ^ 8BITMIME service advertised in response to EHLO on gmail-smtp-in.l.google.com port 25, checked 23 November 2011 ^ Qmail bugs and wishlist. Home.pages.de. Retrieved on 2013-07-17. ^ The 8BITMIME extension. Cr.yp.to. Retrieved on 2013-07-17. 
^ "Postfix SMTPUTF8 support is enabled by default", February 8, 2015, postfix.org ^ "Message Systems Introduces Latest Version Of Momentum With New API-Driven Capabilities" (Press release). ^ "Version 6.2 Revision History". CommuniGate.com. ^ Sam Varshavchik (September 18, 2018). "New releases of Courier packages". courier-announce (Mailing list). ^ changelog ^ "MS-OXSMTP: Simple Mail Transfer Protocol (SMTP) Extensions". July 24, 2018. ^ "EAI Readiness in TLDs" (PDF). February 12, 2019. ^ "Communications Messaging Server Release Notes". oracle.com. October 2017. ^ a b "Introducing MTA Strict Transport Security (MTA-STS) | Hardenize Blog". www.hardenize.com. Retrieved April 25, 2019. ^ "STARTTLS Everywhere". EFF. Retrieved August 15, 2019. ^ a b Cimpanu, Catalin. "Gmail becomes first major email provider to support MTA-STS and TLS Reporting". ZDNet. Retrieved April 25, 2019. ^ Message Non Compliant with RFC 5322 ^ Message could not be delivered. Please ensure the message is RFC 5322 compliant. ^ Why are the emails sent to Microsoft Account rejected for policy reasons? ^ "NGINX Docs | Configuring NGINX as a Mail Proxy Server". References[edit] Hughes, L (1998). Internet E-mail: Protocols, Standards and Implementation. Artech House Publishers. ISBN 978-0-89006-939-4. Hunt, C (2003). sendmail Cookbook. O'Reilly Media. ISBN 978-0-596-00471-2. Johnson, K (2000). Internet Email Protocols: A Developer's Guide. Addison-Wesley Professional. ISBN 978-0-201-43288-6. Loshin, P (1999). Essential Email Standards: RFCs and Protocols Made Practical. John Wiley & Sons. ISBN 978-0-471-34597-8. Rhoton, J (1999). Programmer's Guide to Internet Mail: SMTP, POP, IMAP, and LDAP. Elsevier. ISBN 978-1-55558-212-8. Wood, D (1999). Programming Internet Mail. O'Reilly. ISBN 978-1-56592-479-6. External links[edit] IANA registry of mail parameters includes service extension keywords RFC 1869 SMTP Service Extensions RFC 5321 Simple Mail Transfer Protocol RFC 4954 SMTP Service Extension for Authentication (obsoletes RFC 2554) RFC 3848 SMTP and LMTP Transmission Types Registration (with ESMTPA) RFC 6409 Message Submission for Mail (obsoletes RFC 4409, which obsoletes RFC 2476) v t e Email clients Free software Current Alpine Balsa Citadel/UX Claws Mail Cleancode eMail Cone Evolution fetchmail fdm Geary getmail GNUMail Gnus Gnuzilla IMP KMail Mahogany Mailpile Mailx Mailx (Heirloom Project) Modest Mozilla Thunderbird Mulberry Mutt nmh / MH OfflineIMAP Roundcube SeaMonkey SquirrelMail Sylpheed Trojitá YAM Zimbra Discontinued Arachne Beonex Communicator BlitzMail Classilla Columbia MM Elm FossaMail Hula Mailody Mozilla Mail & Newsgroups Nylas N1 Spicebird Proprietary Freeware eM Client EmailTray Foxmail i.Scribe Mailbird Opera Mail Spark Spike TouchMail Retail Hiri Bloomba/WordPerfect Mail Newton IBM Notes InScribe Apple Mail Mail (Windows) Microsoft Outlook Novell GroupWise Airmail Postbox Shareware Becky! Forté Agent GyazMail The Bat! 
en-wikipedia-org-319 ---- InterPlanetary File System - Wikipedia InterPlanetary File System
Content-addressable, peer-to-peer hypermedia distribution protocol
Original author(s): Juan Benet and Protocol Labs[1]
Developer(s): Protocol Labs
Initial release: February 2015[1]
Stable release: 0.8.0 / 18 February 2021[2]
Repository: github.com/ipfs/ipfs
Written in: protocol implementations in Go (reference implementation), JavaScript, C,[3] Python; client libraries in Go, Java, JavaScript, Python, Scala, Haskell, Swift, Common Lisp, Rust, Ruby, PHP, C#, Erlang
Operating system: Linux, FreeBSD, OpenBSD, macOS, Windows
Available in: Go, JavaScript, Python
Type: Protocol, distributed file system, content delivery network
License: MIT license, Apache license 2.0
Website: ipfs.io

The InterPlanetary File System (IPFS) is a protocol and peer-to-peer network for storing and sharing data in a distributed file system. IPFS uses content-addressing to uniquely identify each file in a global namespace connecting all computing devices.[4]

Design
IPFS allows users to host and receive content in a manner similar to BitTorrent. As opposed to a centrally located server, IPFS is built around a decentralized system[5] of user-operators who hold a portion of the overall data, creating a resilient system of file storage and sharing. Any user in the network can serve a file by its content address, and other peers in the network can find and request that content from any node who has it using a distributed hash table (DHT). In contrast to BitTorrent, IPFS aims to create a single global network.
This means that if Alice and Bob publish a block of data with the same hash, the peers downloading the content from Alice will exchange data with the ones downloading it from Bob.[6] IPFS aims to replace protocols used for static webpage delivery by using gateways which are accessible with HTTP.[7] Users may choose not to install an IPFS client on their device and instead use a public gateway. A list of these gateways is maintained on the IPFS GitHub page.[8] History[edit] This section needs expansion. You can help by adding to it. (June 2020) IPFS was launched in an alpha version in February 2015, and by October of the same year was described by TechCrunch as "quickly spreading by word of mouth."[1] The Catalan independence referendum, taking place in September–October 2017, was deemed illegal by the Constitutional Court of Spain and many related websites were blocked. Subsequently, the Catalan Pirate Party mirrored the website on IPFS to bypass the High Court of Justice of Catalonia order of blocking.[9][10] Phishing attacks have also been distributed through Cloudflare's IPFS gateway since July 2018. The phishing scam HTML is stored on IPFS, and displayed via Cloudflare's gateway. The connection shows as secure via a Cloudflare SSL certificate.[11] The IPStorm botnet, first detected in June 2019, uses IPFS, so it can hide its command-and-control amongst the flow of legitimate data on the IPFS network.[12] Security researchers had worked out previously the theoretical possibility of using IPFS as a botnet command-and-control system.[13][14] Other notable uses[edit] During the block of Wikipedia in Turkey, IPFS was used to create a mirror of Wikipedia, which allows access to the content of Wikipedia despite the ban.[15] That archived version of Wikipedia is a limited immutable copy that cannot be updated. Filecoin, also inter-related to IPFS and developed by Juan Benet and Protocol Labs, is an IPFS-based cooperative storage cloud.[16] Cloudflare runs a distributed web gateway to simplify, speed up, and secure access to IPFS without needing a local node.[17] Microsoft's self-sovereign identity system, Microsoft ION, builds on the Bitcoin blockchain and IPFS through a Sidetree-based DID network.[18] Brave uses Origin Protocol and IPFS to host its decentralized merchandise store[19] and in 2021 added support into their browser.[20] Opera for Android has default support for IPFS, allowing mobile users to browse ipfs:// links to access data on the IPFS network.[21] See also[edit] Content addressable storage Dat (software) Distributed file system Freenet GNUnet ZeroNet References[edit] ^ a b c Case, Amber (4 October 2015). "Why The Internet Needs IPFS Before It's Too Late". TechCrunch. Retrieved 16 July 2019. ^ https://github.com/ipfs/go-ipfs/releases ^ Agorise (23 October 2017). "c-ipfs: IPFS implementation in C. Why C? Think Bitshares' Stealth backups, OpenWrt routers (decentralize the internet/meshnet!), Android TV, decentralized Media, decentralized websites, decent." Github.com. Retrieved 25 October 2017. ^ Finley, Klint (20 June 2016). "The Inventors of the Internet Are Trying to Build a Truly Permanent Web". Wired. ^ Krishnan, Armin (2020). "Blockchain Empowers Social Resistance and Terrorism Through Decentralized Autonomous Organizations". Journal of Strategic Security. 13 (1): 41–58. doi:10.5038/1944-0472.13.1.1743. ISSN 1944-0464. JSTOR 26907412. ^ "Content addressing". docs.ipfs.io. Retrieved 29 August 2020. ^ "IPFS Gateway". docs.ipfs.io. Retrieved 29 August 2020. 
^ "Public Gateway Checker | IPFS". ipfs.github.io. Retrieved 29 August 2020. ^ Balcell, Marta Poblet (5 October 2017). "Inside Catalonia's cypherpunk referendum". Eureka Street. ^ Hill, Paul (30 September 2017). "Catalan referendum app removed from Google Play Store". Neowin. Retrieved 6 October 2017. ^ Abrams, Lawrence (4 October 2018). "Phishing Attacks Distributed Through Cloudflare's IPFS Gateway". Bleeping Computer. Retrieved 31 August 2019. ^ Palmer, Danny (11 June 2019). "This unusual Windows malware is controlled via a P2P network". ZDNet. Retrieved 31 August 2019. ^ Patsakis, Constantinos; Casino, Fran (4 June 2019). "Hydras and IPFS: a decentralised playground for malware". International Journal of Information Security. 18 (6): 787–799. arXiv:1905.11880. doi:10.1007/s10207-019-00443-0. S2CID 167217444. ^ Bruno Macabeus; Marcus Vinicius; Jo ̃ao Paolo Cavalcante; Cidcley Teixeira de Souza (6 May 2018). "Protocolos IPFS e IPNS como meio para o controle de botnet: prova de conceito" (PDF). WSCDC - SBRC 2018 (in Portuguese). Retrieved 27 April 2021. ^ Dale, Brady (10 May 2017). "Turkey Can't Block This Copy of Wikipedia". Observer Media. Archived from the original on 18 October 2017. Retrieved 20 December 2017. ^ Johnson, Steven (16 January 2018). "Beyond the Bitcoin Bubble". The New York Times. Retrieved 26 September 2018. ^ Orcutt, Mike (5 October 2018). "A big tech company is working to free the internet from big tech companies". MIT Technology Review. Retrieved 21 April 2020. ^ Simons, Alex (13 May 2019). "Toward scalable decentralized identifier systems". Azure Active Directory Identity Blog. Retrieved 27 April 2021. ^ "Brave Launches New Swag Store Powered by Origin". Brave.com (Press release). 24 March 2020. Retrieved 21 April 2020. ^ Porter, Jon (19 January 2021). "Brave browser takes step toward enabling a decentralized web". The Verge. Retrieved 29 January 2021. ^ "Opera introduces major updates to its blockchain-browser on Android". Opera Blog (Press release). 3 March 2020. Retrieved 21 April 2020. External links[edit] Official website v t e File systems Comparison of file systems distributed Unix filesystem Disk ADFS AdvFS Amiga FFS Amiga OFS APFS AthFS bcachefs BeeGFS BFS Be File System Boot File System Btrfs CVFS CXFS DFS EFS Encrypting File System Extent File System Episode ext ext2 ext3 ext3cow ext4 FFS/FFS2 FAT exFAT Files-11 Fossil GPFS HAMMER HAMMER2 HFS HFS+ HPFS HTFS JFS LFS MFS Macintosh File System TiVo Media File System MINIX NetWare File System Next3 NILFS NILFS2 NSS NTFS OneFS PFS QFS QNX4FS ReFS ReiserFS Reiser4 Reliance Reliance Nitro RFS SFS SNFS Soup (Apple) Tux3 UBIFS UFS soft updates WAPBL VxFS WAFL Xiafs XFS Xsan zFS ZFS Optical disc HSF ISO 9660 ISO 13490 UDF Flash memory and SSD APFS FAT exFAT CHFS TFAT EROFS FFS2 F2FS HPFS JFFS JFFS2 JFS LogFS NILFS NILFS2 NVFS YAFFS UBIFS Distributed CXFS GFS2 Google File System OCFS2 OrangeFS PVFS QFS Xsan more... NAS 9P AFS (OpenAFS) AFP Coda DFS Google File System GPFS Lustre NCP NFS POHMELFS Hadoop SMB (CIFS) SSHFS more... 
en-wikipedia-org-3301 ---- Link rot - Wikipedia Link rot
For link rot in Wikipedia, see Wikipedia:Link rot.
Phenomenon of URLs tending to cease functioning
Link rot (also called link death, link breaking, or reference rot) is the phenomenon of hyperlinks tending over time to cease to point to their originally targeted file, web page, or server due to that resource being relocated to a new address or becoming permanently unavailable. A link that no longer points to its target, often called a broken or dead link, is a specific form of dangling pointer. The rate of link rot is a subject of study and research due to its significance to the internet's ability to preserve information. Estimates of that rate vary dramatically between studies.

Prevalence
A number of studies have examined the prevalence of link rot within the World Wide Web, in academic literature that uses URLs to cite web content, and within digital libraries. A 2003 study found that on the Web, about one link out of every 200 broke each week,[1] suggesting a half-life of 138 weeks. This rate was largely confirmed by a 2016–2017 study of links in Yahoo! Directory (which had stopped updating in 2014 after 21 years of development) that found the half-life of the directory's links to be two years.[2] A 2004 study showed that subsets of Web links (such as those targeting specific file types or those hosted by academic institutions) could have dramatically different half-lives.[3] The URLs selected for publication appear to have greater longevity than the average URL.
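The half-life figures above follow directly from a constant weekly breakage rate. For example, the 2003 estimate of one broken link per 200 per week works out as follows; this is a small worked check, not a calculation taken from the cited study itself:

    import math

    weekly_breakage = 1 / 200                     # about one link in 200 breaks per week
    survival_per_week = 1 - weekly_breakage

    # The half-life t satisfies survival_per_week ** t == 0.5
    half_life_weeks = math.log(0.5) / math.log(survival_per_week)
    print(round(half_life_weeks))                 # about 138 weeks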
A 2015 study by Weblock analyzed more than 180,000 links from references in the full-text corpora of three major open access publishers and found a half-life of about 14 years,[4] generally confirming a 2005 study that found that half of the URLs cited in D-Lib Magazine articles were active 10 years after publication.[5] Other studies have found higher rates of link rot in academic literature but typically suggest a half-life of four years or greater.[6][7] A 2013 study in BMC Bioinformatics analyzed nearly 15,000 links in abstracts from Thomson Reuters's Web of Science citation index and found that the median lifespan of web pages was 9.3 years, and just 62% were archived.[8] A 2002 study suggested that link rot within digital libraries is considerably slower than on the web, finding that about 3% of the objects were no longer accessible after one year[9] (equating to a half-life of nearly 23 years). Causes[edit] Link rot can result from several occurrences. A target web page may be removed. The server that hosts the target page could fail, be removed from service, or relocate to a new domain name. A domain name's registration may lapse or be transferred to another party. Some causes will result in the link failing to find any target and returning an error such as HTTP 404. Other causes will cause a link to target content other than what was intended by the link's author. Other reasons for broken links include: the restructuring of websites that causes changes in URLs (e.g. domain.net/pine_tree might be moved to domain.net/tree/pine) relocation of formerly free content to behind a paywall a change in server architecture that results in code such as PHP functioning differently dynamic page content such as search results that changes by design the presence of user-specific information (such as a login name) within the link deliberate blocking by content filters or firewalls the removal of gTLDs[10] Prevention and detection[edit] Strategies for preventing link rot can focus on placing content where its likelihood of persisting is higher, authoring links that are less likely to be broken, taking steps to preserve existing links, or repairing links whose targets have been relocated or removed. The creation of URLs that will not change with time is the fundamental method of preventing link rot. Preventive planning has been championed by Tim Berners-Lee and other web pioneers.[11] Strategies pertaining to the authorship of links include: linking to primary rather than secondary sources and prioritizing stable sites[citation needed] avoiding links that point to resources on researchers' personal pages[5] using clean URLs[12] or otherwise employing URL normalization or URL canonicalization using permalinks and persistent identifiers such as ARKs, DOIs, Handle System references, and PURLs avoiding linking to documents other than web pages[12] avoiding deep linking linking to web archives such as the Internet Archive,[13] WebCite,[14] Archive.is, Perma.cc,[15] or Amber[16] Strategies pertaining to the protection of existing links include: using redirection mechanisms such as HTTP 301 to automatically refer browsers and crawlers to relocated content using content management systems which can automatically update links when content within the same site is relocated or automatically replace links with canonical URLs[17] integrating search resources into HTTP 404 pages[18] The detection of broken links may be done manually or automatically. 
Automated methods include plug-ins for content management systems as well as standalone broken-link checkers such as like Xenu's Link Sleuth. Automatic checking may not detect links that return a soft 404 or links that return a 200 OK response but point to content that has changed.[19] See also[edit] Software rot Digital preservation Deletionism and inclusionism in Wikipedia Further reading[edit] Markwell, John; Brooks, David W. (2002). "Broken Links: The Ephemeral Nature of Educational WWW Hyperlinks". Journal of Science Education and Technology. 11 (2): 105–108. doi:10.1023/A:1014627511641. Gomes, Daniel; Silva, Mário J. (2006). "Modelling Information Persistence on the Web" (PDF). Proceedings of the 6th International Conference on Web Engineering. ICWE'06. Archived from the original (PDF) on 2011-07-16. Retrieved 14 September 2010. Dellavalle, Robert P.; Hester, Eric J.; Heilig, Lauren F.; Drake, Amanda L.; Kuntzman, Jeff W.; Graber, Marla; Schilling, Lisa M. (2003). "Going, Going, Gone: Lost Internet References". Science. 302 (5646): 787–788. doi:10.1126/science.1088234. PMID 14593153. Koehler, Wallace (1999). "An Analysis of Web Page and Web Site Constancy and Permanence". Journal of the American Society for Information Science. 50 (2): 162–180. doi:10.1002/(SICI)1097-4571(1999)50:2<162::AID-ASI7>3.0.CO;2-B. Sellitto, Carmine (2005). "The impact of impermanent Web-located citations: A study of 123 scholarly conference publications" (PDF). Journal of the American Society for Information Science and Technology. 56 (7): 695–703. CiteSeerX 10.1.1.473.2732. doi:10.1002/asi.20159. Notes & references[edit] Notes References ^ Fetterly, Dennis; Manasse, Mark; Najork, Marc; Wiener, Janet (2003). "A large-scale study of the evolution of web pages". Proceedings of the 12th international conference on World Wide Web. Archived from the original on 9 July 2011. Retrieved 14 September 2010. ^ van der Graaf, Hans. "The half-life of a link is two year". ZOMDir's blog. Archived from the original on 2017-10-17. Retrieved 2019-01-31. ^ Koehler, Wallace (2004). "A longitudinal study of web pages continued: a consideration of document persistence". Information Research. 9 (2). Archived from the original on 2017-09-11. Retrieved 2019-01-31. ^ "All-Time Weblock Report". August 2015. Archived from the original on 4 March 2016. Retrieved 12 January 2016. ^ a b McCown, Frank; Chan, Sheffan; Nelson, Michael L.; Bollen, Johan (2005). "The Availability and Persistence of Web References in D-Lib Magazine" (PDF). Proceedings of the 5th International Web Archiving Workshop and Digital Preservation (IWAW'05). Archived from the original (PDF) on 2012-07-17. Retrieved 2005-10-12. ^ Spinellis, Diomidis (2003). "The Decay and Failures of Web References". Communications of the ACM. 46 (1): 71–77. CiteSeerX 10.1.1.12.9599. doi:10.1145/602421.602422. Archived from the original on 2020-07-23. Retrieved 2007-09-29. ^ Steve Lawrence; David M. Pennock; Gary William Flake; et al. (March 2001). "Persistence of Web References in Scientific Research". Computer. 34 (3): 26–31. CiteSeerX 10.1.1.97.9695. doi:10.1109/2.901164. ISSN 0018-9162. Wikidata Q21012586. ^ Hennessey, Jason; Xijin Ge, Steven (2013). "A Cross Disciplinary Study of Link Decay and the Effectiveness of Mitigation Techniques". BMC Bioinformatics. 14: S5. doi:10.1186/1471-2105-14-S14-S5. PMC 3851533. PMID 24266891. ^ Nelson, Michael L.; Allen, B. Danette (2002). "Object Persistence and Availability in Digital Libraries". D-Lib Magazine. 8 (1). 
doi:10.1045/january2002-nelson. Archived from the original on 2020-07-19. Retrieved 2019-09-24. ^ "The death of a TLD". blog.benjojo.co.uk. Archived from the original on 2018-07-26. Retrieved 2018-07-27. ^ Berners-Lee, Tim (1998). "Cool URIs Don't Change". Archived from the original on 2000-03-02. Retrieved 2019-01-31. ^ a b Kille, Leighton Walter (8 November 2014). "The Growing Problem of Internet "Link Rot" and Best Practices for Media and Online Publishers". Journalist's Resource, Harvard Kennedy School. Archived from the original on 12 January 2015. Retrieved 16 January 2015. ^ "Internet Archive: Digital Library of Free Books, Movies, Music & Wayback Machine". 2001-03-10. Archived from the original on 26 January 1997. Retrieved 7 October 2013. ^ Eysenbach, Gunther; Trudel, Mathieu (2005). "Going, going, still there: Using the WebCite service to permanently archive cited web pages". Journal of Medical Internet Research. 7 (5): e60. doi:10.2196/jmir.7.5.e60. PMC 1550686. PMID 16403724. ^ Zittrain, Jonathan; Albert, Kendra; Lessig, Lawrence (12 June 2014). "Perma: Scoping and Addressing the Problem of Link and Reference Rot in Legal Citations" (PDF). Legal Information Management. 14 (2): 88–99. doi:10.1017/S1472669614000255. Archived (PDF) from the original on 1 November 2020. Retrieved 10 June 2020. ^ "Harvard University's Berkman Center Releases Amber, a "Mutual Aid" Tool for Bloggers & Website Owners to Help Keep the Web Available | Berkman Center". cyber.law.harvard.edu. Archived from the original on 2016-02-02. Retrieved 2016-01-28. ^ Rønn-Jensen, Jesper (2007-10-05). "Software Eliminates User Errors And Linkrot". Justaddwater.dk. Archived from the original on 11 October 2007. Retrieved 5 October 2007. ^ Mueller, John (2007-12-14). "FYI on Google Toolbar's Latest Features". Google Webmaster Central Blog. Archived from the original on 13 September 2008. Retrieved 9 July 2008. ^ Bar-Yossef, Ziv; Broder, Andrei Z.; Kumar, Ravi; Tomkins, Andrew (2004). "Sic transit gloria telae: towards an understanding of the Web's decay". Proceedings of the 13th international conference on World Wide Web – WWW '04. pp. 328–337. CiteSeerX 10.1.1.1.9406. doi:10.1145/988672.988716. ISBN 978-1581138443. External links[edit] The Wikibook Authoring Webpages has a page on the topic of: Preventing link rot Future-Proofing Your URIs Jakob Nielsen, "Fighting Linkrot", Jakob Nielsen's Alertbox, June 14, 1998. 
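As a rough sketch of the automated detection approach described in the Prevention and detection section above (and subject to the soft-404 caveat noted there), a checker can issue an HTTP request per link and flag error responses. The example below uses only the Python standard library and treats 4xx/5xx statuses and network failures as broken; the URLs shown are placeholders.

    from urllib.request import Request, urlopen
    from urllib.error import HTTPError, URLError

    def check_link(url, timeout=10):
        # Return (url, status) where status is an HTTP code or an error description.
        try:
            req = Request(url, method="HEAD")        # HEAD avoids downloading the body
            with urlopen(req, timeout=timeout) as resp:
                return url, resp.status
        except HTTPError as err:
            return url, err.code                     # e.g. 404 for a dead link
        except URLError as err:
            return url, str(err.reason)              # DNS failure, refused connection, ...

    for url in ["https://example.com/", "https://example.com/missing"]:
        print(check_link(url))

A checker like this cannot distinguish a soft 404 or a page whose content has silently changed, which is why the article notes that a 200 OK response is not proof that the link is still valid.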
Retrieved from "https://en.wikipedia.org/w/index.php?title=Link_rot&oldid=1016420788" Categories: URL Data quality Product expiration Hidden categories: Articles with short description Short description is different from Wikidata All articles with unsourced statements Articles with unsourced statements from January 2019 Articles prone to spam from November 2015 Navigation menu Personal tools Not logged in Talk Contributions Create account Log in Namespaces Article Talk Variants Views Read Edit View history More Search Navigation Main page Contents Current events Random article About Wikipedia Contact us Donate Contribute Help Learn to edit Community portal Recent changes Upload file Tools What links here Related changes Upload file Special pages Permanent link Page information Cite this page Wikidata item Print/export Download as PDF Printable version In other projects Wikimedia Commons Languages Bosanski Dansk Deutsch Eesti Español فارسی Français 한국어 Bahasa Indonesia Nederlands 日本語 Norsk bokmål Polski Português Русский Suomi Svenska ไทย Türkçe Edit links This page was last edited on 7 April 2021, at 02:19 (UTC). Text is available under the Creative Commons Attribution-ShareAlike License; additional terms may apply. By using this site, you agree to the Terms of Use and Privacy Policy. Wikipedia® is a registered trademark of the Wikimedia Foundation, Inc., a non-profit organization. Privacy policy About Wikipedia Disclaimers Contact Wikipedia Mobile view Developers Statistics Cookie statement en-wikipedia-org-3361 ---- Unix philosophy - Wikipedia Unix philosophy From Wikipedia, the free encyclopedia Jump to navigation Jump to search Philosophy on developing software Ken Thompson and Dennis Ritchie, key proponents of the Unix philosophy The Unix philosophy, originated by Ken Thompson, is a set of cultural norms and philosophical approaches to minimalist, modular software development. It is based on the experience of leading developers of the Unix operating system. Early Unix developers were important in bringing the concepts of modularity and reusability into software engineering practice, spawning a "software tools" movement. Over time, the leading developers of Unix (and programs that ran on it) established a set of cultural norms for developing software; these norms became as important and influential as the technology of Unix itself; this has been termed the "Unix philosophy." The Unix philosophy emphasizes building simple, short, clear, modular, and extensible code that can be easily maintained and repurposed by developers other than its creators. The Unix philosophy favors composability as opposed to monolithic design. Contents 1 Origin 2 The UNIX Programming Environment 3 Program Design in the UNIX Environment 4 Doug McIlroy on Unix programming 5 Do One Thing and Do It Well 6 Eric Raymond's 17 Unix Rules 7 Mike Gancarz: The UNIX Philosophy 8 "Worse is better" 9 Criticism 10 See also 11 Notes 12 References 13 External links Origin[edit] The Unix philosophy is documented by Doug McIlroy[1] in the Bell System Technical Journal from 1978:[2] Make each program do one thing well. To do a new job, build afresh rather than complicate old programs by adding new "features". Expect the output of every program to become the input to another, as yet unknown, program. Don't clutter output with extraneous information. Avoid stringently columnar or binary input formats. Don't insist on interactive input. Design and build software, even operating systems, to be tried early, ideally within weeks. 
Don't hesitate to throw away the clumsy parts and rebuild them. Use tools in preference to unskilled help to lighten a programming task, even if you have to detour to build the tools and expect to throw some of them out after you've finished using them. It was later summarized by Peter H. Salus in A Quarter-Century of Unix (1994):[1] Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface. In their award-winning Unix paper of 1974[citation needed], Ritchie and Thompson quote the following design considerations:[3] Make it easy to write, test, and run programs. Interactive use instead of batch processing. Economy and elegance of design due to size constraints ("salvation through suffering"). Self-supporting system: all Unix software is maintained under Unix. The whole philosophy of UNIX seems to stay out of assembler. — Michael Sean Mahoney[4] The UNIX Programming Environment[edit] In their preface to the 1984 book, The UNIX Programming Environment, Brian Kernighan and Rob Pike, both from Bell Labs, give a brief description of the Unix design and the Unix philosophy:[5] Rob Pike, co-author of The UNIX Programming Environment Even though the UNIX system introduces a number of innovative programs and techniques, no single program or idea makes it work well. Instead, what makes it effective is the approach to programming, a philosophy of using the computer. Although that philosophy can't be written down in a single sentence, at its heart is the idea that the power of a system comes more from the relationships among programs than from the programs themselves. Many UNIX programs do quite trivial things in isolation, but, combined with other programs, become general and useful tools. The authors further write that their goal for this book is "to communicate the UNIX programming philosophy."[5] Program Design in the UNIX Environment[edit] Brian Kernighan has written at length about the Unix philosophy In October 1984, Brian Kernighan and Rob Pike published a paper called Program Design in the UNIX Environment. In this paper, they criticize the accretion of program options and features found in some newer Unix systems such as 4.2BSD and System V, and explain the Unix philosophy of software tools, each performing one general function:[6] Much of the power of the UNIX operating system comes from a style of program design that makes programs easy to use and, more important, easy to combine with other programs. This style has been called the use of software tools, and depends more on how the programs fit into the programming environment and how they can be used with other programs than on how they are designed internally. [...] This style was based on the use of tools: using programs separately or in combination to get a job done, rather than doing it by hand, by monolithic self-sufficient subsystems, or by special-purpose, one-time programs. The authors contrast Unix tools such as cat, with larger program suites used by other systems.[6] The design of cat is typical of most UNIX programs: it implements one simple but general function that can be used in many different applications (including many not envisioned by the original author). Other commands are used for other functions. For example, there are separate commands for file system tasks like renaming files, deleting them, or telling how big they are. 
Other systems instead lump these into a single "file system" command with an internal structure and command language of its own. (The PIP file copy program found on operating systems like CP/M or RSX-11 is an example.) That approach is not necessarily worse or better, but it is certainly against the UNIX philosophy.

Doug McIlroy on Unix programming
[Image: Doug McIlroy (left) with Dennis Ritchie]
McIlroy, then head of the Bell Labs Computing Sciences Research Center, and inventor of the Unix pipe,[7] summarized the Unix philosophy as follows:[1] This is the Unix philosophy: Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface. Beyond these statements, he has also emphasized simplicity and minimalism in Unix programming:[1] The notion of "intricate and beautiful complexities" is almost an oxymoron. Unix programmers vie with each other for "simple and beautiful" honors — a point that's implicit in these rules, but is well worth making overt. Conversely, McIlroy has criticized modern Linux as having software bloat, remarking that, "adoring admirers have fed Linux goodies to a disheartening state of obesity."[8] He contrasts this with the earlier approach taken at Bell Labs when developing and revising Research Unix:[9] Everything was small... and my heart sinks for Linux when I see the size of it. [...] The manual page, which really used to be a manual page, is now a small volume, with a thousand options... We used to sit around in the Unix Room saying, 'What can we throw out? Why is there this option?' It's often because there is some deficiency in the basic design — you didn't really hit the right design point. Instead of adding an option, think about what was forcing you to add that option.

Do One Thing and Do It Well
As stated by McIlroy, and generally accepted throughout the Unix community, Unix programs have always been expected to follow the concept of DOTADIW, or "Do One Thing And Do It Well." There are limited sources for the acronym DOTADIW on the Internet, but it is discussed at length during the development and packaging of new operating systems, especially in the Linux community. Patrick Volkerding, the project lead of Slackware Linux, invoked this design principle in a criticism of the systemd architecture, stating that, "attempting to control services, sockets, devices, mounts, etc., all within one daemon flies in the face of the Unix concept of doing one thing and doing it well."[10]

Eric Raymond's 17 Unix Rules
In his book The Art of Unix Programming, first published in 2003,[11] Eric S. Raymond, an American programmer and open source advocate, summarizes the Unix philosophy as the KISS principle of "Keep it Simple, Stupid."[12] He provides a series of design rules:[1]
Build modular programs
Write readable programs
Use composition
Separate mechanisms from policy
Write simple programs
Write small programs
Write transparent programs
Write robust programs
Make data complicated when required, not the program
Build on potential users' expected knowledge
Avoid unnecessary output
Write programs which fail in a way that is easy to diagnose
Value developer time over machine time
Write abstract programs that generate code instead of writing code by hand
Prototype software before polishing it
Write flexible and open programs
Make the program and protocols extensible.
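The composition and text-stream precepts above are easiest to see in a pipeline. The following sketch, which assumes a Unix-like system with the standard ls and wc tools on the PATH, wires two small single-purpose programs together from Python using the subprocess module:

    import subprocess

    # Equivalent of the shell pipeline: ls | wc -l
    ls = subprocess.Popen(["ls"], stdout=subprocess.PIPE)
    wc = subprocess.Popen(["wc", "-l"], stdin=ls.stdout,
                          stdout=subprocess.PIPE, text=True)
    ls.stdout.close()                 # let ls receive SIGPIPE if wc exits first
    count, _ = wc.communicate()       # each tool does one thing; text is the interface
    print(count.strip(), "directory entries")

Neither tool knows about the other; the plain text flowing between them is the only contract, which is the point of treating text streams as a universal interface.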
Mike Gancarz: The UNIX Philosophy[edit] In 1994, Mike Gancarz (a member of the team that designed the X Window System), drew on his own experience with Unix, as well as discussions with fellow programmers and people in other fields who depended on Unix, to produce The UNIX Philosophy which sums it up in nine paramount precepts: Small is beautiful. Make each program do one thing well. Build a prototype as soon as possible. Choose portability over efficiency. Store data in flat text files. Use software leverage to your advantage. Use shell scripts to increase leverage and portability. Avoid captive user interfaces. Make every program a filter. "Worse is better"[edit] Main article: Worse is better Richard P. Gabriel suggests that a key advantage of Unix was that it embodied a design philosophy he termed "worse is better", in which simplicity of both the interface and the implementation are more important than any other attributes of the system—including correctness, consistency, and completeness. Gabriel argues that this design style has key evolutionary advantages, though he questions the quality of some results. For example, in the early days Unix used a monolithic kernel (which means that user processes carried out kernel system calls all on the user stack). If a signal was delivered to a process while it was blocked on a long-term I/O in the kernel, then what should be done? Should the signal be delayed, possibly for a long time (maybe indefinitely) while the I/O completed? The signal handler could not be executed when the process was in kernel mode, with sensitive kernel data on the stack. Should the kernel back-out the system call, and store it, for replay and restart later, assuming that the signal handler completes successfully? In these cases Ken Thompson and Dennis Ritchie favored simplicity over perfection. The Unix system would occasionally return early from a system call with an error stating that it had done nothing—the "Interrupted System Call", or an error number 4 (EINTR) in today's systems. Of course the call had been aborted in order to call the signal handler. This could only happen for a handful of long-running system calls such as read(), write(), open(), and select(). On the plus side, this made the I/O system many times simpler to design and understand. The vast majority of user programs were never affected because they did not handle or experience signals other than SIGINT and would die right away if one was raised. For the few other programs—things like shells or text editors that respond to job control key presses—small wrappers could be added to system calls so as to retry the call right away if this EINTR error was raised. Thus, the problem was solved in a simple manner. Criticism[edit] In a 1981 article entitled "The truth about Unix: The user interface is horrid"[13] published in Datamation, Don Norman criticized the design philosophy of Unix for its lack of concern for the user interface. Writing from his background in cognitive science and from the perspective of the then-current philosophy of cognitive engineering,[4] he focused on how end-users comprehend and form a personal cognitive model of systems—or, in the case of Unix, fail to understand, with the result that disastrous mistakes (such as losing an hour's worth of work) are all too easy. 
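Returning to the EINTR behaviour described in the "Worse is better" section above, the small retry wrapper it mentions can be sketched as follows. This is only an illustration of the pattern: since PEP 475, Python itself retries most system calls interrupted by a signal, so such a loop is rarely needed in modern Python code.

    import os

    def read_retrying(fd, n):
        # Call os.read, retrying if the call is interrupted by a signal (errno EINTR).
        while True:
            try:
                return os.read(fd, n)
            except InterruptedError:      # raised when a signal handler interrupts the call
                continue                  # the "small wrapper": simply retry the call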
See also[edit] Cognitive engineering Unix architecture Minimalism (computing) Software engineering KISS principle Hacker ethic List of software development philosophies Everything is a file Worse is better Notes[edit] ^ a b c d e Raymond, Eric S. (2004). "Basics of the Unix Philosophy". The Art of Unix Programming. Addison-Wesley Professional (published 2003-09-23). ISBN 0-13-142901-9. Retrieved 2016-11-01. ^ Doug McIlroy, E. N. Pinson, B. A. Tague (8 July 1978). "Unix Time-Sharing System: Foreword". The Bell System Technical Journal. Bell Laboratories: 1902–1903.CS1 maint: multiple names: authors list (link) ^ Dennis Ritchie; Ken Thompson (1974), "The UNIX time-sharing system" (PDF), Communications of the ACM, 17 (7): 365–375, doi:10.1145/361011.361061, S2CID 53235982 ^ a b "An Oral History of Unix". Princeton University History of Science. ^ a b Kernighan, Brian W. Pike, Rob. The UNIX Programming Environment. 1984. viii ^ a b Rob Pike; Brian W. Kernighan (October 1984). "Program Design in the UNIX Environment" (PDF). ^ Dennis Ritchie (1984), "The Evolution of the UNIX Time-Sharing System" (PDF), AT&T Bell Laboratories Technical Journal, 63 (8): 1577–1593, doi:10.1002/j.1538-7305.1984.tb00054.x ^ Douglas McIlroy. "Remarks for Japan Prize award ceremony for Dennis Ritchie, May 19, 2011, Murray Hill, NJ" (PDF). Retrieved 2014-06-19. ^ Bill McGonigle. "Ancestry of Linux — How the Fun Began (2005)". Retrieved 2014-06-19. ^ "Interview with Patrick Volkerding of Slackware". linuxquestions.org. 2012-06-07. Retrieved 2015-10-24. ^ Raymond, Eric (2003-09-19). The Art of Unix Programming. Addison-Wesley. ISBN 0-13-142901-9. Retrieved 2009-02-09. ^ Raymond, Eric (2003-09-19). "The Unix Philosophy in One Lesson". The Art of Unix Programming. Addison-Wesley. ISBN 0-13-142901-9. Retrieved 2009-02-09. ^ Norman, Don (1981). "The truth about Unix: The user interface is horrid" (PDF). Datamation. 27 (12). References[edit] The Unix Programming Environment by Brian Kernighan and Rob Pike, 1984 Program Design in the UNIX Environment – The paper by Pike and Kernighan that preceded the book. Notes on Programming in C, Rob Pike, September 21, 1989 A Quarter Century of Unix, Peter H. Salus, Addison-Wesley, May 31, 1994 ( ISBN 0-201-54777-5) Philosophy — from The Art of Unix Programming, Eric S. Raymond, Addison-Wesley, September 17, 2003 ( ISBN 0-13-142901-9) Final Report of the Multics Kernel Design Project by M. D. Schroeder, D. D. Clark, J. H. Saltzer, and D. H. Wells, 1977. 
The UNIX Philosophy, Mike Gancarz, ISBN 1-55558-123-4 External links[edit] Basics of the Unix Philosophy – by Catb.org The Unix Philosophy: A Brief Introduction – by The Linux Information Project (LINFO) Why the Unix Philosophy still matters
en-wikipedia-org-338 ---- Evaluation strategy - Wikipedia Evaluation strategies are used by programming languages to determine two things—when to evaluate the arguments of a function call and what kind of value to pass to the function. To illustrate, a function application may evaluate the argument before evaluating the function's body and pass the ability to look up the argument's current value and modify it via assignment.[1] The notion of reduction strategy in lambda calculus is similar but distinct. In practical terms, many modern programming languages like C# and Java have converged on a call-by-value/call-by-reference evaluation strategy for function calls.[clarification needed] Some languages, especially lower-level languages such as C++, combine several notions of parameter passing. Historically, call by value and call by name date back to ALGOL 60, which was designed in the late 1950s. Call by reference is used by PL/I and some Fortran systems.[2] Purely functional languages like Haskell, as well as non-purely functional languages like R, use call by need. Evaluation strategy is specified by the programming language definition, and is not a function of any specific implementation.
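As a small illustration of the first of those two questions (when arguments are evaluated), the C sketch below shows strict evaluation together with a detail the language definition deliberately leaves open; the functions are invented for the example:

#include <stdio.h>

int left(void)  { printf("left evaluated\n");  return 1; }
int right(void) { printf("right evaluated\n"); return 2; }

int add(int a, int b) { return a + b; }

int main(void)
{
    /* C is strict: both arguments are fully evaluated before add() runs.
       However, the order in which left() and right() are evaluated is
       unspecified, so either trace line may be printed first. */
    printf("%d\n", add(left(), right()));
    return 0;
}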
Contents 1 Strict evaluation 1.1 Applicative order 1.2 Call by value 1.2.1 Implicit limitations 1.3 Call by reference 1.4 Call by sharing 1.5 Call by copy-restore 1.6 Partial evaluation 2 Non-strict evaluation 2.1 Normal order 2.2 Call by name 2.3 Call by need 2.4 Call by macro expansion 3 Nondeterministic strategies 3.1 Full β-reduction 3.2 Call by future 3.3 Optimistic evaluation 4 See also 5 References 6 Further reading Strict evaluation[edit] Main article: Eager evaluation In strict evaluation, the arguments to a function are always evaluated completely before the function is applied. Under Church encoding, eager evaluation of operators maps to strict evaluation of functions; for this reason, strict evaluation is sometimes called "eager". Most existing programming languages use strict evaluation for functions. Applicative order[edit] Applicative order evaluation is an evaluation strategy in which an expression is evaluated by repeatedly evaluating its leftmost innermost reducible expression. This means that a function's arguments are evaluated before the function is applied.[3] Call by value[edit] Call by value (also known as pass by value) is the most common evaluation strategy, used in languages as different as C and Scheme. In call by value, the argument expression is evaluated, and the resulting value is bound to the corresponding variable in the function (frequently by copying the value into a new memory region). If the function or procedure is able to assign values to its parameters, only its local variable is assigned—that is, anything passed into a function call is unchanged in the caller's scope when the function returns. Call by value is not a single evaluation strategy, but rather the family of evaluation strategies in which a function's argument is evaluated before being passed to the function. While many programming languages (such as Common Lisp, Eiffel and Java) that use call by value evaluate function arguments left-to-right, some evaluate functions and their arguments right-to-left, and others (such as Scheme, OCaml and C) do not specify order. Implicit limitations[edit] In some cases, the term "call by value" is problematic, as the value which is passed is not the value of the variable as understood by the ordinary meaning of value, but an implementation-specific reference to the value. The effect is that what syntactically looks like call by value may end up rather behaving like call by reference or call by sharing, often depending on very subtle aspects of the language semantics. The reason for passing a reference is often that the language technically does not provide a value representation of complicated data, but instead represents them as a data structure while preserving some semblance of value appearance in the source code. Exactly where the boundary is drawn between proper values and data structures masquerading as such is often hard to predict. In C, an array (of which strings are special cases) is a data structure but the name of an array is treated as (has as value) the reference to the first element of the array, while a struct variable's name refers to a value even if it has fields that are vectors. In Maple, a vector is a special case of a table and therefore a data structure, but a list (which gets rendered and can be indexed in exactly the same way) is a value. In Tcl, values are "dual-ported" such that the value representation is used at the script level, and the language itself manages the corresponding data structure, if one is required. 
Modifications made via the data structure are reflected back to the value representation and vice versa. The description "call by value where the value is a reference" is common (but should not be understood as being call by reference); another term is call by sharing. Thus the behaviour of call by value Java or Visual Basic and call by value C or Pascal are significantly different: in C or Pascal, calling a function with a large structure as an argument will cause the entire structure to be copied (except if it's actually a reference to a structure), potentially causing serious performance degradation, and mutations to the structure are invisible to the caller. However, in Java or Visual Basic only the reference to the structure is copied, which is fast, and mutations to the structure are visible to the caller. Call by reference[edit] Call by reference (or pass by reference) is an evaluation strategy where a function receives an implicit reference to a variable used as argument, rather than a copy of its value. This typically means that the function can modify (i.e., assign to) the variable used as argument—something that will be seen by its caller. Call by reference can therefore be used to provide an additional channel of communication between the called function and the calling function. A call-by-reference language makes it more difficult for a programmer to track the effects of a function call, and may introduce subtle bugs. A simple litmus test for whether a language supports call-by-reference semantics is if it's possible to write a traditional swap(a, b) function in the language.[4] Many languages support call by reference in some form, but few use it by default. FORTRAN II is an early example of a call-by-reference language. A few languages, such as C++, PHP, Visual Basic .NET, C# and REALbasic, default to call by value, but offer a special syntax for call-by-reference parameters. C++ additionally offers call by reference to const. Call by reference can be simulated in languages that use call by value and don't exactly support call by reference, by making use of references (objects that refer to other objects), such as pointers (objects representing the memory addresses of other objects). Languages such as C, ML and Rust use this technique. It is not a separate evaluation strategy—the language calls by value—but sometimes it is referred to as "call by address" or "pass by address". In ML, references are type- and memory-safe, similar to Rust. A similar effect is achieved by call by sharing (passing an object, which can then be mutated), used in languages like Java, Python, and Ruby. In purely functional languages there is typically no semantic difference between the two strategies (since their data structures are immutable, so there is no possibility for a function to modify any of its arguments), so they are typically described as call by value even though implementations frequently use call by reference internally for the efficiency benefits. Following is an example that demonstrates call by reference in the E programming language:

def modify(var p, &q) {
    p := 27  # passed by value: only the local parameter is modified
    q := 27  # passed by reference: variable used in call is modified
}
? var a := 1   # value: 1
? var b := 2   # value: 2
? modify(a, &b)
? a            # value: 1
? b            # value: 27

Following is an example of call by address that simulates call by reference in C:

void modify(int p, int* q, int* r) {
    p = 27;  // passed by value: only the local parameter is modified
    *q = 27; // passed by value or reference, check call site to determine which
    *r = 27; // passed by value or reference, check call site to determine which
}

int main() {
    int a = 1;
    int b = 1;
    int x = 1;
    int* c = &x;
    modify(a, &b, c); // a is passed by value, b is passed by reference by creating a pointer (call by value),
                      // c is a pointer passed by value
                      // b and x are changed
    return 0;
}

Call by sharing[edit] Call by sharing (also known as "call by object" or "call by object-sharing") is an evaluation strategy first noted by Barbara Liskov in 1974 for the CLU language.[5] It is used by languages such as Python,[6] Java (for object references), Ruby, JavaScript, Scheme, OCaml, AppleScript, and many others. However, the term "call by sharing" is not in common use; the terminology is inconsistent across different sources. For example, in the Java community, they say that Java is call by value.[7] Call by sharing implies that values in the language are based on objects rather than primitive types, i.e., that all values are "boxed". Because they are boxed they can be said to pass by copy of reference (where primitives are boxed before passing and unboxed at called function). The semantics of call by sharing differ from call by reference: "In particular it is not call by value because mutations of arguments performed by the called routine will be visible to the caller. And it is not call by reference because access is not given to the variables of the caller, but merely to certain objects".[8] So, for example, if a variable was passed, it is not possible to simulate an assignment on that variable in the callee's scope.[9] However, since the function has access to the same object as the caller (no copy is made), mutations to those objects, if the objects are mutable, within the function are visible to the caller, which may appear to differ from call by value semantics. Mutations of a mutable object within the function are visible to the caller because the object is not copied or cloned—it is shared. For example, in Python, lists are mutable, so:

def f(a_list):
    a_list.append(1)

m = []
f(m)
print(m)

outputs [1] because the append method modifies the object on which it is called. Assignments within a function are not noticeable to the caller, because, in these languages, passing the variable only means passing (access to) the actual object referred to by the variable, not access to the original (caller's) variable. Since the rebound variable only exists within the scope of the function, the counterpart in the caller retains its original binding. Compare the Python mutation above with the code below, which binds the formal argument to a new object:

def f(a_list):
    a_list = [1]

m = []
f(m)
print(m)

outputs [], because the statement a_list = [1] reassigns a new list to the variable rather than to the location it references. For immutable objects, there is no real difference between call by sharing and call by value, except if object identity is visible in the language.
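The same distinction can be sketched in C, which is plainly call by value, with an explicit pointer playing the role of the shared reference; this is only a rough analogue of the two Python functions above, written for this illustration:

#include <stdio.h>

struct box { int value; };

/* Mutating through the pointer is visible to the caller,
   like a_list.append(1) above. */
void mutate(struct box *b) {
    b->value = 1;
}

/* Reassigning the parameter only rebinds the local pointer,
   like a_list = [1] above; the caller's object is untouched. */
void rebind(struct box *b) {
    struct box other = { 1 };
    b = &other;
    (void)b;
}

int main(void) {
    struct box m = { 0 };
    mutate(&m);
    printf("%d\n", m.value);  /* prints 1: the shared object was mutated */
    m.value = 0;
    rebind(&m);
    printf("%d\n", m.value);  /* prints 0: the rebinding was invisible */
    return 0;
}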
The use of call by sharing with mutable objects is an alternative to input/output parameters: the parameter is not assigned to (the argument is not overwritten and object identity is not changed), but the object (argument) is mutated.[10] Although this term has widespread usage in the Python community, identical semantics in other languages such as Java and Visual Basic are often described as call by value, where the value is implied to be a reference to the object.[citation needed] Call by copy-restore[edit] Call by copy-restore—also known as "copy-in copy-out", "call by value result", "call by value return" (as termed in the Fortran community)—is a special case of call by reference where the provided reference is unique to the caller. This variant has gained attention in multiprocessing contexts and Remote procedure call:[11] if a parameter to a function call is a reference that might be accessible by another thread of execution, its contents may be copied to a new reference that is not; when the function call returns, the updated contents of this new reference are copied back to the original reference ("restored"). The semantics of call by copy-restore also differ from those of call by reference, where two or more function arguments alias one another (i.e., point to the same variable in the caller's environment). Under call by reference, writing to one will affect the other; call by copy-restore avoids this by giving the function distinct copies, but leaves the result in the caller's environment undefined depending on which of the aliased arguments is copied back first—will the copies be made in left-to-right order both on entry and on return? When the reference is passed to the callee uninitialized, this evaluation strategy may be called "call by result". Partial evaluation[edit] Main article: Partial evaluation In partial evaluation, evaluation may continue into the body of a function that has not been applied. Any sub-expressions that do not contain unbound variables are evaluated, and function applications whose argument values are known may be reduced. If there are side effects, complete partial evaluation may produce unintended results, which is why systems that support partial evaluation tend to do so only for "pure" expressions (i.e., those without side effects) within functions. Non-strict evaluation[edit] In non-strict evaluation, arguments to a function are not evaluated unless they are actually used in the evaluation of the function body. Under Church encoding, lazy evaluation of operators maps to non-strict evaluation of functions; for this reason, non-strict evaluation is often referred to as "lazy". Boolean expressions in many languages use a form of non-strict evaluation called short-circuit evaluation, where evaluation returns as soon as it can be determined that an unambiguous Boolean will result—for example, in a disjunctive expression (OR) where true is encountered, or in a conjunctive expression (AND) where false is encountered, and so forth. Conditional expressions also usually use lazy evaluation, where evaluation returns as soon as an unambiguous branch will result. Normal order[edit] Normal order evaluation is an evaluation strategy in which an expression is evaluated by repeatedly evaluating its leftmost outermost reducible expression.
This means that a function's arguments are not evaluated before the function is applied.[12] Call by name[edit] Call by name is an evaluation strategy where the arguments to a function are not evaluated before the function is called—rather, they are substituted directly into the function body (using capture-avoiding substitution) and then left to be evaluated whenever they appear in the function. If an argument is not used in the function body, the argument is never evaluated; if it is used several times, it is re-evaluated each time it appears. (See Jensen's Device.) Call-by-name evaluation is occasionally preferable to call-by-value evaluation. If a function's argument is not used in the function, call by name will save time by not evaluating the argument, whereas call by value will evaluate it regardless. If the argument is a non-terminating computation, the advantage is enormous. However, when the function argument is used, call by name is often slower, requiring a mechanism such as a thunk. An early use was ALGOL 60. Today's .NET languages can simulate call by name using delegates or Expression<T> parameters. The latter results in an abstract syntax tree being given to the function. Eiffel provides agents, which represent an operation to be evaluated when needed. Seed7 provides call by name with function parameters. Java programs can accomplish similar lazy evaluation using lambda expressions and the java.util.function.Supplier interface. Call by need[edit] Main article: Lazy evaluation Call by need is a memoized variant of call by name, where, if the function argument is evaluated, that value is stored for subsequent use. If the argument is pure (i.e., free of side effects), this produces the same results as call by name, saving the cost of recomputing the argument. Haskell is a well-known language that uses call-by-need evaluation. Because evaluation of expressions may happen arbitrarily far into a computation, Haskell only supports side effects (such as mutation) via the use of monads. This eliminates any unexpected behavior from variables whose values change prior to their delayed evaluation. In R's implementation of call by need, all arguments are passed, meaning that R allows arbitrary side effects. Lazy evaluation is the most common implementation of call-by-need semantics, but variations like optimistic evaluation exist. .NET languages implement call by need using the type Lazy<T>. Call by macro expansion[edit] Call by macro expansion is similar to call by name, but uses plain textual substitution rather than capture-avoiding substitution. Macro substitution may therefore cause mistakes, resulting in variable capture and undesired behavior. Hygienic macros avoid this problem by checking for and replacing shadowed variables that are not parameters. Nondeterministic strategies[edit] Full β-reduction[edit] Under "full β-reduction", any function application may be reduced (substituting the function's argument into the function using capture-avoiding substitution) at any time. This may be done even within the body of an unapplied function. Call by future[edit] See also: Futures and promises "Call by future", also known as "parallel call by name", is a concurrent evaluation strategy in which the value of a future expression is computed concurrently with the flow of the rest of the program with promises, also known as futures.
When the promise's value is needed, the main program blocks until the promise has a value (the promise or one of the promises finishes computing, if it has not already completed by then). This strategy is non-deterministic, as the evaluation can occur at any time between creation of the future (i.e., when the expression is given) and use of the future's value. It is similar to call by need in that the value is only computed once, and computation may be deferred until the value is needed, but it may be started before. Further, if the value of a future is not needed, such as if it is a local variable in a function that returns, the computation may be terminated partway through. If implemented with processes or threads, creating a future will spawn one or more new processes or threads (for the promises), accessing the value will synchronize these with the main thread, and terminating the computation of the future corresponds to killing the promises computing its value. If implemented with a coroutine, as in .NET async/await, creating a future calls a coroutine (an async function), which may yield to the caller, and in turn be yielded back to when the value is used, cooperatively multitasking. Optimistic evaluation[edit] Optimistic evaluation is another call-by-need variant where the function's argument is partially evaluated for some amount of time (which may be adjusted at runtime). After that time has passed, evaluation is aborted and the function is applied using call by need.[13] This approach avoids some of the call-by-need strategy's runtime expenses while retaining desired termination characteristics. See also[edit] Beta normal form Comparison of programming languages eval Lambda calculus Call-by-push-value Parameter (computer science) References[edit] ^ Daniel P. Friedman; Mitchell Wand (2008). Essentials of Programming Languages (third ed.). Cambridge, MA: The MIT Press. ISBN 978-0262062794. ^ Some Fortran systems use call by copy-restore. ^ "Applicative order reduction". Encyclopedia2.thefreedictionary.com. Retrieved 2019-11-19. ^ "Java is Pass-by-Value, Dammit!". Retrieved 2016-12-24. ^ Liskov, Barbara; Atkinson, Russ; Bloom, Toby; Moss, Eliot; Schaffert, Craig; Scheifler, Craig; Snyder, Alan (October 1979). "CLU Reference Manual" (PDF). Laboratory for Computer Science. Massachusetts Institute of Technology. Archived from the original (PDF) on 2006-09-22. Retrieved 2011-05-19. ^ Lundh, Fredrik. "Call By Object". effbot.org. Retrieved 2011-05-19. ^ "Java is Pass-by-Value, Dammit!". Retrieved 2016-12-24. ^ CLU Reference Manual (1974), p. 14-15. ^ Note: in CLU language, "variable" corresponds to "identifier" and "pointer" in modern standard usage, not to the general/usual meaning of variable. ^ "CA1021: Avoid out parameters". Microsoft. ^ "RPC: Remote Procedure Call Protocol Specification Version 2". tools.ietf.org. IETF. Retrieved 7 April 2018. ^ "Normal order reduction". Encyclopedia2.thefreedictionary.com. Retrieved 2019-11-19. ^ Ennals, Robert; Jones, Simon Peyton (August 2003). "Optimistic Evaluation: a fast evaluation strategy for non-strict programs". Further reading[edit] Abelson, Harold; Sussman, Gerald Jay (1996).
Structure and Interpretation of Computer Programs (Second ed.). Cambridge, Massachusetts: The MIT Press. ISBN 978-0-262-01153-2. Baker-Finch, Clem; King, David; Hall, Jon; Trinder, Phil (1999-03-10). "An Operational Semantics for Parallel Call-by-Need" (ps). Research report. Faculty of Mathematics & Computing, The Open University. 99 (1). Ennals, Robert; Peyton Jones, Simon (2003). Optimistic Evaluation: A Fast Evaluation Strategy for Non-Strict Programs (PDF). International Conference on Functional Programming. ACM Press. Ludäscher, Bertram (2001-01-24). "CSE 130 lecture notes". CSE 130: Programming Languages: Principles & Paradigms. Pierce, Benjamin C. (2002). Types and Programming Languages. MIT Press. ISBN 0-262-16209-1. Sestoft, Peter (2002). Mogensen, T; Schmidt, D; Sudborough, I. H. (eds.). Demonstrating Lambda Calculus Reduction (PDF). The Essence of Computation: Complexity, Analysis, Transformation. Essays Dedicated to Neil D. Jones. Lecture Notes in Computer Science. 2566. Springer-Verlag. pp. 420–435. ISBN 3-540-00326-6. "Call by Value and Call by Reference in C Programming". Call by Value and Call by Reference in C Programming explained. Archived from the original on 2013-01-21.
en-wikipedia-org-3552 ---- API - Wikipedia Set of subroutine definitions, protocols, and tools for building software and applications In computing, an application programming interface (API) is an interface that defines interactions between multiple software applications or mixed hardware-software intermediaries.[1] It defines the kinds of calls or requests that can be made, how to make them, the data formats that should be used, the conventions to follow, etc.
It can also provide extension mechanisms so that users can extend existing functionality in various ways and to varying degrees.[2] An API can be entirely custom, specific to a component, or designed based on an industry-standard to ensure interoperability. Through information hiding, APIs enable modular programming, allowing users to use the interface independently of the implementation. Reference to Web APIs is currently the most common use of the term.[3] There are also APIs for programming languages, software libraries, computer operating systems, and computer hardware. APIs originated in the 1940s, though the term API did not emerge until the 1960s and 70s. Contents 1 Purpose 2 History of the term 3 Usage 3.1 Libraries and frameworks 3.2 Operating systems 3.3 Remote APIs 3.4 Web APIs 4 Design 5 Release policies 5.1 Public API implications 6 Documentation 7 Dispute over copyright protection for APIs 8 Examples 9 See also 10 References 11 Further reading Purpose[edit] In building applications, an API (application programming interface) simplifies programming by abstracting the underlying implementation and only exposing objects or actions the developer needs. While a graphical interface for an email client might provide a user with a button that performs all the steps for fetching and highlighting new emails, an API for file input/output might give the developer a function that copies a file from one location to another without requiring that the developer understand the file system operations occurring behind the scenes.[4] History of the term[edit] A diagram from 1978 proposing the expansion of the idea of the API to become a general programming interface, beyond application programs alone.[5] The meaning of the term API has expanded over its history. It first described an interface only for end-user-facing programs, known as application programs. This origin is still reflected in the name "application programming interface." Today, the term API is broader, including also utility software and even hardware interfaces.[6] The idea of the API is much older than the term. British computer scientists Wilkes and Wheeler worked on modular software libraries in the 1940s for the EDSAC computer. Their book The Preparation of Programs for an Electronic Digital Computer contains the first published API specification. Joshua Bloch claims that Wilkes and Wheeler "latently invented" the API, because it is more of a concept that is discovered than invented.[6] Although the people who coined the term API were implementing software on a Univac 1108, the goal of their API was to make hardware independent programs possible.[7] The term "application program interface" (without an -ing suffix) is first recorded in a paper called Data structures and techniques for remote computer graphics presented at an AFIPS conference in 1968.[8][6] The authors of this paper use the term to describe the interaction of an application — a graphics program in this case — with the rest of the computer system. A consistent application interface (consisting of Fortran subroutine calls) was intended to free the programmer from dealing with idiosyncrasies of the graphics display device, and to provide hardware independence if the computer or the display were replaced.[7] The term was introduced to the field of databases by C. J. 
Date[9] in a 1974 paper called The Relational and Network Approaches: Comparison of the Application Programming Interface.[10] An API became a part of ANSI/SPARC framework for database management systems. This framework treated the application programming interface separately from other interfaces, such as the query interface. Database professionals in the 1970s observed these different interfaces could be combined; a sufficiently rich application interface could support the other interfaces as well.[5] This observation led to APIs that supported all types of programming, not just application programming. By 1990, the API was defined simply as "a set of services available to a programmer for performing certain tasks" by technologist Carl Malamud.[11] The conception of the API was expanded again with the dawn of web APIs. Roy Fielding's dissertation Architectural Styles and the Design of Network-based Software Architectures at UC Irvine in 2000 outlined Representational state transfer (REST) and described the idea of a "network-based Application Programming Interface" that Fielding contrasted with traditional "library-based" APIs.[12] XML and JSON web APIs saw widespread commercial adoption beginning in 2000 and continuing as of 2021. The web API is now the most common meaning of the term API.[3] When used in this way, the term API has some overlap in meaning with the terms communication protocol and remote procedure call. The Semantic Web proposed by Tim Berners-Lee in 2001 included "semantic APIs" that recast the API as an open, distributed data interface rather than a software behavior interface.[13] Instead, proprietary interfaces and agents became more widespread. Usage[edit] Libraries and frameworks[edit] The interface to a software library is one type of API. The API describes and prescribes the "expected behavior" (a specification) while the library is an "actual implementation" of this set of rules. A single API can have multiple implementations (or none, being abstract) in the form of different libraries that share the same programming interface. The separation of the API from its implementation can allow programs written in one language to use a library written in another. For example, because Scala and Java compile to compatible bytecode, Scala developers can take advantage of any Java API.[14] API use can vary depending on the type of programming language involved. An API for a procedural language such as Lua could consist primarily of basic routines to execute code, manipulate data or handle errors while an API for an object-oriented language, such as Java, would provide a specification of classes and its class methods.[15][16] Language bindings are also APIs. By mapping the features and capabilities of one language to an interface implemented in another language, a language binding allows a library or service written in one language to be used when developing in another language.[17] Tools such as SWIG and F2PY, a Fortran-to-Python interface generator, facilitate the creation of such interfaces.[18] An API can also be related to a software framework: a framework can be based on several libraries implementing several APIs, but unlike the normal use of an API, the access to the behavior built into the framework is mediated by extending its content with new classes plugged into the framework itself. 
Moreover, the overall program flow of control can be out of the control of the caller and in the framework's hands by inversion of control or a similar mechanism.[19][20] Operating systems[edit] An API can specify the interface between an application and the operating system.[21] POSIX, for example, specifies a set of common APIs that aim to enable an application written for a POSIX conformant operating system to be compiled for another POSIX conformant operating system. Linux and Berkeley Software Distribution are examples of operating systems that implement the POSIX APIs.[22] Microsoft has shown a strong commitment to a backward-compatible API, particularly within its Windows API (Win32) library, so older applications may run on newer versions of Windows using an executable-specific setting called "Compatibility Mode".[23] An API differs from an application binary interface (ABI) in that an API is source code based while an ABI is binary based. For instance, POSIX provides APIs while the Linux Standard Base provides an ABI.[24][25] Remote APIs[edit] Remote APIs allow developers to manipulate remote resources through protocols, specific standards for communication that allow different technologies to work together, regardless of language or platform. For example, the Java Database Connectivity API allows developers to query many different types of databases with the same set of functions, while the Java remote method invocation API uses the Java Remote Method Protocol to allow invocation of functions that operate remotely, but appear local to the developer.[26][27] Therefore, remote APIs are useful in maintaining the object abstraction in object-oriented programming; a method call, executed locally on a proxy object, invokes the corresponding method on the remote object, using the remoting protocol, and acquires the result to be used locally as a return value. A modification of the proxy object will also result in a corresponding modification of the remote object.[28] Web APIs[edit] Main article: Web API Web APIs are the defined interfaces through which interactions happen between an enterprise and applications that use its assets, which also is a Service Level Agreement (SLA) to specify the functional provider and expose the service path or URL for its API users. An API approach is an architectural approach that revolves around providing a program interface to a set of services to different applications serving different types of consumers.[29] When used in the context of web development, an API is typically defined as a set of specifications, such as Hypertext Transfer Protocol (HTTP) request messages, along with a definition of the structure of response messages, usually in an Extensible Markup Language (XML) or JavaScript Object Notation (JSON) format. An example might be a shipping company API that can be added to an eCommerce-focused website to facilitate ordering shipping services and automatically include current shipping rates, without the site developer having to enter the shipper's rate table into a web database. 
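To make the shipping example above concrete, a request to such a web API and its response might look roughly like the following. The endpoint, host, parameters, and fields are entirely hypothetical; real carrier APIs define their own paths and schemas, but the overall shape (an HTTP request returning structured JSON) is typical:

GET /v1/rates?origin=NL&destination=US&weight_kg=2.5 HTTP/1.1
Host: api.example-carrier.com
Accept: application/json

HTTP/1.1 200 OK
Content-Type: application/json

{
  "currency": "EUR",
  "options": [
    { "service": "standard", "price": 12.50, "days": 5 },
    { "service": "express",  "price": 29.00, "days": 2 }
  ]
}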
While "web API" historically has been virtually synonymous with web service, the recent trend (so-called Web 2.0) has been moving away from Simple Object Access Protocol (SOAP) based web services and service-oriented architecture (SOA) towards more direct representational state transfer (REST) style web resources and resource-oriented architecture (ROA).[30] Part of this trend is related to the Semantic Web movement toward Resource Description Framework (RDF), a concept to promote web-based ontology engineering technologies. Web APIs allow the combination of multiple APIs into new applications known as mashups.[31] In the social media space, web APIs have allowed web communities to facilitate sharing content and data between communities and applications. In this way, content that is created in one place dynamically can be posted and updated to multiple locations on the web.[32] For example, Twitter's REST API allows developers to access core Twitter data and the Search API provides methods for developers to interact with Twitter Search and trends data.[33] Design[edit] The design of an API has significant impact on its usage.[4] The principle of information hiding describes the role of programming interfaces as enabling modular programming by hiding the implementation details of the modules so that users of modules need not understand the complexities inside the modules.[34] Thus, the design of an API attempts to provide only the tools a user would expect.[4] The design of programming interfaces represents an important part of software architecture, the organization of a complex piece of software.[35] Release policies[edit] APIs are one of the more common ways technology companies integrate. Those that provide and use APIs are considered as being members of a business ecosystem.[36] The main policies for releasing an API are:[37] Private: The API is for internal company use only. Partner: Only specific business partners can use the API. For example, vehicle for hire companies such as Uber and Lyft allow approved third-party developers to directly order rides from within their apps. This allows the companies to exercise quality control by curating which apps have access to the API, and provides them with an additional revenue stream.[38] Public: The API is available for use by the public. For example, Microsoft makes the Windows API public, and Apple releases its API Cocoa, so that software can be written for their platforms. Not all public APIs are generally accessible by everybody. For example, Internet service providers like Cloudflare or Voxility, use RESTful APIs to allow customers and resellers access to their infrastructure information, DDoS stats, network performance or dashboard controls.[39] Access to such APIs is granted either by “API tokens”, or customer status validations.[40] Public API implications[edit] An important factor when an API becomes public is its "interface stability". Changes to the API—for example adding new parameters to a function call—could break compatibility with the clients that depend on that API.[41] When parts of a publicly presented API are subject to change and thus not stable, such parts of a particular API should be documented explicitly as "unstable". For example, in the Google Guava library, the parts that are considered unstable, and that might change soon, are marked with the Java annotation @Beta.[42] A public API can sometimes declare parts of itself as deprecated or rescinded. 
This usually means that part of the API should be considered a candidate for being removed, or modified in a backward incompatible way. Therefore, these changes allow developers to transition away from parts of the API that will be removed or not supported in the future.[43] Client code may contain innovative or opportunistic usages that were not intended by the API designers. In other words, for a library with a significant user base, when an element becomes part of the public API, it may be used in diverse ways.[44] On February 19, 2020, Akamai published their annual “State of the Internet” report, showcasing the growing trend of cybercriminals targeting public API platforms at financial services worldwide. From December 2017 through November 2019, Akamai witnessed 85.42 billion credential violation attacks. About 20%, or 16.55 billion, were against hostnames defined as API endpoints. Of these, 473.5 million have targeted financial services sector organizations.[45] Documentation[edit] API documentation describes what services an API offers and how to use those services, aiming to cover everything a client would need to know for practical purposes. Documentation is crucial for the development and maintenance of applications using the API.[46] API documentation is traditionally found in documentation files but can also be found in social media such as blogs, forums, and Q&A websites.[47] Traditional documentation files are often presented via a documentation system, such as Javadoc or Pydoc, that has a consistent appearance and structure. However, the types of content included in the documentation differ from API to API.[48] In the interest of clarity, API documentation may include a description of classes and methods in the API as well as "typical usage scenarios, code snippets, design rationales, performance discussions, and contracts", but implementation details of the API services themselves are usually omitted. Restrictions and limitations on how the API can be used are also covered by the documentation. For instance, documentation for an API function could note that its parameters cannot be null, or that the function itself is not thread safe.[49] Because API documentation tends to be comprehensive, it is a challenge for writers to keep the documentation updated and for users to read it carefully, potentially yielding bugs.[41] API documentation can be enriched with metadata information like Java annotations. This metadata can be used by the compiler, tools, and by the run-time environment to implement custom behaviors or custom handling.[50] It is possible to generate API documentation in a data-driven manner. By observing many programs that use a given API, it is possible to infer the typical usages, as well as the required contracts and directives.[51] Then, templates can be used to generate natural language from the mined data. Dispute over copyright protection for APIs[edit] Main article: Oracle America, Inc. v. Google, Inc. In 2010, Oracle Corporation sued Google for having distributed a new implementation of Java embedded in the Android operating system.[52] Google had not acquired any permission to reproduce the Java API, although permission had been given to the similar OpenJDK project. Judge William Alsup ruled in the Oracle v.
Google case that APIs cannot be copyrighted in the U.S. and that a victory for Oracle would have widely expanded copyright protection to a "functional set of symbols" and allowed the copyrighting of simple software commands: To accept Oracle's claim would be to allow anyone to copyright one version of code to carry out a system of commands and thereby bar all others from writing its different versions to carry out all or part of the same commands.[53][54] In 2014, however, Alsup's ruling was overturned on appeal to the Court of Appeals for the Federal Circuit, though the question of whether such use of APIs constitutes fair use was left unresolved.[55][56] In 2016, following a two-week trial, a jury determined that Google's reimplementation of the Java API constituted fair use, but Oracle vowed to appeal the decision.[57] Oracle won on its appeal, with the Court of Appeals for the Federal Circuit ruling that Google's use of the APIs did not qualify for fair use.[58] In 2019, Google appealed to the Supreme Court of the United States over both the copyrightability and fair use rulings, and the Supreme Court granted review.[59] Due to the COVID-19 pandemic, the oral hearings in the case were delayed until October 2020.[60] Examples[edit] Main category: Application programming interfaces ASPI for SCSI device interfacing Cocoa and Carbon for the Macintosh DirectX for Microsoft Windows EHLLAPI Java APIs ODBC for Microsoft Windows OpenAL cross-platform sound API OpenCL cross-platform API for general-purpose computing for CPUs & GPUs OpenGL cross-platform graphics API OpenMP API that supports multi-platform shared memory multiprocessing programming in C, C++, and Fortran on many architectures, including Unix and Microsoft Windows platforms. Server Application Programming Interface (SAPI) Simple DirectMedia Layer (SDL) See also[edit] API testing API writer Augmented web Calling convention Common Object Request Broker Architecture (CORBA) Comparison of application virtual machines Document Object Model (DOM) Double-chance function Foreign function interface Front and back ends Interface (computing) Interface control document List of 3D graphics APIs Microservices Name mangling Open API Open Service Interface Definitions Parsing Plugin RAML (software) Software development kit (SDK) Web API Web content vendor XPCOM References[edit] ^ "What is an API". Hubspire. ^ Fisher, Sharon (1989). "OS/2 EE to Get 3270 Interface Early". Google Books. ^ a b Lane, Kin (October 10, 2019). "Intro to APIs: History of APIs". Postman. Retrieved September 18, 2020. When you hear the acronym “API” or its expanded version “Application Programming Interface,” it is almost always in reference to our modern approach, in that we use HTTP to provide access to machine readable data in a JSON or XML format, often simply referred to as “web APIs.” APIs have been around almost as long as computing, but modern web APIs began taking shape in the early 2000s. ^ a b c Clarke, Steven (2004). "Measuring API Usability". Dr. Dobb's. Retrieved 29 July 2016. ^ a b Database architectures—a feasibility workshop (Report). Washington D.C.: U.S. Department of Commerce, National Bureau of Standards. April 1981. pp. 45–47. hdl:2027/mdp.39015077587742. LCCN 81600004. NBS special publication 500-76. Retrieved September 18, 2020. ^ a b c Bloch, Joshua (August 8, 2018). A Brief, Opinionated History of the API (Speech). QCon. San Francisco: InfoQ. Retrieved September 18, 2020. ^ a b Cotton, Ira W.; Greatorex, Frank S. (December 1968).
"Data structures and techniques for remote computer graphics". AFIPS '68: Proceedings of the December 9-11, 1968, Fall Joint Computer Conference. AFIPS 1968 Fall Joint Computer Conference. I. San Francisco, California: Association for Computing Machinery. pp. 533–544. doi:10.1145/1476589.1476661. ISBN 978-1450378994. OCLC 1175621908. ^ "application program interface". Oxford English Dictionary (Online ed.). Oxford University Press. (Subscription or participating institution membership required.) ^ Date, C. J. (July 18, 2019). E. F. Codd and Relational Theory: A Detailed Review and Analysis of Codd's Major Database Writings. p. 135. ISBN 978-1684705276. ^ Date, C. J.; Codd, E. F. (January 1975). "The relational and network approaches: Comparison of the application programming interfaces". In Randall Rustin (ed.). Proceedings of 1974 ACM-SIGMOD Workshop on Data Description, Access and Control. SIGMOD Workshop 1974. 2. Ann Arbor, Michigan: Association for Computing Machinery. pp. 83–113. doi:10.1145/800297.811532. ISBN 978-1450374187. OCLC 1175623233. ^ Carl, Malamud (1990). Analyzing Novell Networks. Van Nostrand Reinhold. p. 294. ISBN 978-0442003647. ^ Fielding, Roy (2000). Architectural Styles and the Design of Network-based Software Architectures (PhD). Retrieved September 18, 2020. ^ Dotsika, Fefie (August 2010). "Semantic APIs: Scaling up towards the Semantic Web". International Journal of Information Management. 30 (4): 335–342. doi:10.1016/j.ijinfomgt.2009.12.003. ^ Odersky, Martin; Spoon, Lex; Venners, Bill (10 December 2008). "Combining Scala and Java". www.artima.com. Retrieved 29 July 2016. ^ de Figueiredo, Luiz Henrique; Ierusalimschy, Roberto; Filho, Waldemar Celes. "The design and implementation of a language for extending applications". TeCGraf Grupo de Tecnologia Em Computacao Grafica. CiteSeerX 10.1.1.47.5194. S2CID 59833827. Retrieved 29 July 2016. ^ Sintes, Tony (13 July 2001). "Just what is the Java API anyway?". JavaWorld. Retrieved 2020-07-18. ^ Emery, David. "Standards, APIs, Interfaces and Bindings". Acm.org. Archived from the original on 2015-01-16. Retrieved 2016-08-08. ^ "F2PY.org". F2PY.org. Retrieved 2011-12-18. ^ Fowler, Martin. "Inversion Of Control". ^ Fayad, Mohamed. "Object-Oriented Application Frameworks". ^ Lewine, Donald A. (1991). POSIX Programmer's Guide. O'Reilly & Associates, Inc. p. 1. ISBN 9780937175736. Retrieved 2 August 2016. ^ West, Joel; Dedrick, Jason (2001). "Open source standardization: the rise of Linux in the network era" (PDF). Knowledge, Technology & Policy. 14 (2): 88–112. Retrieved 2 August 2016. ^ Microsoft (October 2001). "Support for Windows XP". Microsoft. p. 4. Archived from the original on 2009-09-26. ^ "LSB Introduction". Linux Foundation. 21 June 2012. Retrieved 2015-03-27. ^ Stoughton, Nick (April 2005). "Update on Standards" (PDF). USENIX. Retrieved 2009-06-04. ^ Bierhoff, Kevin (23 April 2009). "API Protocol Compliance in Object-Oriented Software" (PDF). CMU Institute for Software Research. Retrieved 29 July 2016. ^ Wilson, M. Jeff (10 November 2000). "Get smart with proxies and RMI". JavaWorld. Retrieved 2020-07-18. ^ Henning, Michi; Vinoski, Steve (1999). Advanced CORBA Programming with C++. Addison-Wesley. ISBN 978-0201379273. Retrieved 16 June 2015. ^ "API-fication" (PDF download). www.hcltech.com. August 2014. ^ Benslimane, Djamal; Schahram Dustdar; Amit Sheth (2008). "Services Mashups: The New Generation of Web Applications". IEEE Internet Computing, vol. 12, no. 5. Institute of Electrical and Electronics Engineers. 
pp. 13–15. Archived from the original on 2011-09-28. Retrieved 2019-10-01. ^ Niccolai, James (2008-04-23), "So What Is an Enterprise Mashup, Anyway?", PC World ^ Parr, Ben. "The Evolution of the Social Media API". Mashable. Retrieved 26 July 2016. ^ "GET trends/place". developer.twitter.com. Retrieved 2020-04-30. ^ Parnas, D.L. (1972). "On the Criteria To Be Used in Decomposing Systems into Modules" (PDF). Communications of the ACM. 15 (12): 1053–1058. doi:10.1145/361598.361623. S2CID 53856438. ^ Garlan, David; Shaw, Mary (January 1994). "An Introduction to Software Architecture" (PDF). Advances in Software Engineering and Knowledge Engineering. 1. Retrieved 8 August 2016. ^ de Ternay, Guerric (Oct 10, 2015). "Business Ecosystem: Creating an Economic Moat". BoostCompanies. Retrieved 2016-02-01. ^ Boyd, Mark (2014-02-21). "Private, Partner or Public: Which API Strategy Is Best for Business?". ProgrammableWeb. Retrieved 2 August 2016. ^ Weissbrot, Alison (7 July 2016). "Car Service APIs Are Everywhere, But What's In It For Partner Apps?". AdExchanger. ^ "Cloudflare API v4 Documentation". cloudflare. 25 February 2020. Retrieved 27 February 2020. ^ Liew, Zell (17 January 2018). "Car Service APIs Are Everywhere, But What's In It For Partner Apps". Smashing Magazine. Retrieved 27 February 2020. ^ a b Shi, Lin; Zhong, Hao; Xie, Tao; Li, Mingshu (2011). An Empirical Study on Evolution of API Documentation. International Conference on Fundamental Approaches to Software Engineering. Lecture Notes in Computer Science. 6603. pp. 416–431. doi:10.1007/978-3-642-19811-3_29. ISBN 978-3-642-19810-6. Retrieved 22 July 2016. ^ "guava-libraries - Guava: Google Core Libraries for Java 1.6+ - Google Project Hosting". 2014-02-04. Retrieved 2014-02-11. ^ Oracle. "How and When to Deprecate APIs". Java SE Documentation. Retrieved 2 August 2016. ^ Mendez, Diego; Baudry, Benoit; Monperrus, Martin (2013). "Empirical evidence of large-scale diversity in API usage of object-oriented software". 2013 IEEE 13th International Working Conference on Source Code Analysis and Manipulation (SCAM). pp. 43–52. arXiv:1307.4062. doi:10.1109/SCAM.2013.6648183. ISBN 978-1-4673-5739-5. S2CID 6890739. ^ Takanashi, Dean (19 February 2020). "Akamai: Cybercriminals are attacking APIs at financial services firms". Venture Beat. Retrieved 27 February 2020. ^ Dekel, Uri; Herbsleb, James D. (May 2009). "Improving API Documentation Usability with Knowledge Pushing". Institute for Software Research, School of Computer Science. CiteSeerX 10.1.1.446.4214. ^ Parnin, Chris; Treude, Cristoph (May 2011). "Measuring API Documentation on the Web". Web2SE: 25–30. doi:10.1145/1984701.1984706. ISBN 9781450305952. S2CID 17751901. Retrieved 22 July 2016. ^ Maalej, Waleed; Robillard, Martin P. (April 2012). "Patterns of Knowledge in API Reference Documentation" (PDF). IEEE Transactions on Software Engineering. Retrieved 22 July 2016. ^ Monperrus, Martin; Eichberg, Michael; Tekes, Elif; Mezini, Mira (3 December 2011). "What should developers be aware of? An empirical study on the directives of API documentation". Empirical Software Engineering. 17 (6): 703–737. arXiv:1205.6363. doi:10.1007/s10664-011-9186-4. S2CID 8174618. ^ "Annotations". Sun Microsystems. Archived from the original on 2011-09-25. Retrieved 2011-09-30.. ^ Bruch, Marcel; Mezini, Mira; Monperrus, Martin (2010). "Mining subclassing directives to improve framework reuse". 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010). pp. 141–150. CiteSeerX 10.1.1.434.15. 
doi:10.1109/msr.2010.5463347. ISBN 978-1-4244-6802-7. S2CID 1026918. ^ "Oracle and the End of Programming As We Know It". DrDobbs. 2012-05-01. Retrieved 2012-05-09. ^ "APIs Can't be Copyrighted Says Judge in Oracle Case". TGDaily. 2012-06-01. Retrieved 2012-12-06. ^ "Oracle America, Inc. vs. Google Inc" (PDF). Wired. 2012-05-31. Retrieved 2013-09-22. ^ "Oracle Am., Inc. v. Google Inc., No. 13-1021, Fed. Cir. 2014". ^ Rosenblatt, Seth (May 9, 2014). "Court sides with Oracle over Android in Java patent appeal". CNET. Retrieved 2014-05-10. ^ "Google beats Oracle—Android makes "fair use" of Java APIs". Ars Technica. 2016-05-26. Retrieved 2016-07-28. ^ Decker, Susan (March 27, 2018). "Oracle Wins Revival of Billion-Dollar Case Against Google". Bloomberg Businessweek. Retrieved March 27, 2018. ^ Lee, Timothy (January 25, 2019). "Google asks Supreme Court to overrule disastrous ruling on API copyrights". Ars Technica. Retrieved February 8, 2019. ^ vkimber (2020-09-28). "Google LLC v. Oracle America, Inc". LII / Legal Information Institute. Retrieved 2021-03-06. Further reading[edit] Taina Bucher (16 November 2013). "Objects of Intense Feeling: The Case of the Twitter API". Computational Culture (3). ISSN 2047-2390. Argues that "APIs are far from neutral tools" and form a key part of contemporary programming, understood as a fundamental part of culture. What is an API? - in the U.S. supreme court opinion, Google v. Oracle 2021, pp.3-7 - "For each task, there is computer code; API (also known as Application Program Interface) is the method for calling that 'computer code' (instruction - like a recipe - rather than cooking instruction, this is machine instruction) to be carry out"
en-wikipedia-org-359 ---- Half-life - Wikipedia

Half-life

This article is about the scientific and mathematical concept. For the video game, see Half-Life (video game). For other uses, see Half-Life (disambiguation).

Number of half-lives elapsed   Fraction remaining   Percentage remaining
0        1/1       100
1        1/2       50
2        1/4       25
3        1/8       12.5
4        1/16      6.25
5        1/32      3.125
6        1/64      1.5625
7        1/128     0.78125
...      ...       ...
n        1/2^n     100/2^n

Half-life (symbol t1/2) is the time required for a quantity to reduce to half of its initial value. The term is commonly used in nuclear physics to describe how quickly unstable atoms undergo radioactive decay or how long stable atoms survive. The term is also used more generally to characterize any type of exponential or non-exponential decay. For example, the medical sciences refer to the biological half-life of drugs and other chemicals in the human body. The converse of half-life is doubling time. The original term, half-life period, dating to Ernest Rutherford's discovery of the principle in 1907, was shortened to half-life in the early 1950s.[1] Rutherford applied the principle of a radioactive element's half-life to studies of age determination of rocks by measuring the decay period of radium to lead-206. Half-life is constant over the lifetime of an exponentially decaying quantity, and it is a characteristic unit for the exponential decay equation. The accompanying table shows the reduction of a quantity as a function of the number of half-lives elapsed.
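The table above is simply N(t)/N0 = (1/2)^n. A minimal sketch in Python (not part of the original article; the function names are illustrative) that reproduces the percentage column and evaluates the remaining fraction after an arbitrary elapsed time; the carbon-14 half-life below is the commonly cited value of about 5730 years.

```python
def remaining_fraction(n_half_lives: float) -> float:
    """Fraction of the original quantity left after n half-lives: (1/2)**n."""
    return 0.5 ** n_half_lives

def remaining_after_time(t: float, t_half: float) -> float:
    """Fraction left after elapsed time t, given half-life t_half (same units)."""
    return 0.5 ** (t / t_half)

if __name__ == "__main__":
    # Reproduce the percentage column of the table above.
    for n in range(8):
        print(n, f"{100 * remaining_fraction(n):.5g}%")
    # Example: two carbon-14 half-lives leave a quarter of the original amount.
    print(remaining_after_time(11460, 5730))  # -> 0.25
```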
Probabilistic nature

[Figure: simulation of many identical atoms undergoing radioactive decay, starting with either 4 atoms per box (left) or 400 (right); the number at the top shows how many half-lives have elapsed. With more atoms, the overall decay is more regular and more predictable, a consequence of the law of large numbers.]

A half-life usually describes the decay of discrete entities, such as radioactive atoms. In that case, it does not work to use the definition that states "half-life is the time required for exactly half of the entities to decay". For example, if there is just one radioactive atom, and its half-life is one second, there will not be "half of an atom" left after one second. Instead, the half-life is defined in terms of probability: "Half-life is the time required for exactly half of the entities to decay on average". In other words, the probability of a radioactive atom decaying within its half-life is 50%.[2] For example, the simulation shown in the figure involves many identical atoms undergoing radioactive decay. Note that after one half-life there are not exactly one-half of the atoms remaining, only approximately, because of the random variation in the process. Nevertheless, when there are many identical atoms decaying (right boxes), the law of large numbers suggests that it is a very good approximation to say that half of the atoms remain after one half-life. Various simple exercises can demonstrate probabilistic decay, for example involving flipping coins or running a statistical computer program.[3][4][5]

Formulas for half-life in exponential decay

Main article: Exponential decay

An exponential decay can be described by any of the following three equivalent formulas:[6]:109-112

N(t) = N_0 \left(\frac{1}{2}\right)^{t/t_{1/2}}
N(t) = N_0 e^{-t/\tau}
N(t) = N_0 e^{-\lambda t}

where

N_0 is the initial quantity of the substance that will decay (this quantity may be measured in grams, moles, number of atoms, etc.),
N(t) is the quantity that still remains and has not yet decayed after a time t,
t_{1/2} is the half-life of the decaying quantity,
\tau is a positive number called the mean lifetime of the decaying quantity,
\lambda is a positive number called the decay constant of the decaying quantity.

The three parameters t_{1/2}, \tau, and \lambda are all directly related in the following way:

t_{1/2} = \frac{\ln(2)}{\lambda} = \tau \ln(2)

where ln(2) is the natural logarithm of 2 (approximately 0.693).[6]:112

Half-life and reaction orders

The value of the half-life depends on the reaction order:

Zero-order kinetics: The rate of this kind of reaction does not depend on the substrate concentration. The rate law of zero-order kinetics is

[A] = [A]_0 - kt

To find the half-life, we substitute [A]_0/2 for the concentration and solve for the time.
Doing so gives the half-life of a zero-order reaction:

t_{1/2} = \frac{[A]_0}{2k}

This formula shows that the half-life of a zero-order reaction depends on both the initial concentration and the rate constant.

First-order kinetics: In first-order reactions, the concentration of the reactant decreases as time progresses until it reaches zero, and the half-life is constant, independent of concentration. The time for [A] to decrease from [A]_0 to ½[A]_0 in a first-order reaction is given by the following equation:

k t_{1/2} = -\ln\left(\frac{\tfrac{1}{2}[A]_0}{[A]_0}\right) = -\ln\frac{1}{2} = \ln 2

For a first-order reaction, the half-life of a reactant is independent of its initial concentration. Therefore, if the concentration of A at some arbitrary stage of the reaction is [A], then it will have fallen to ½[A] after a further interval of (ln 2)/k. Hence, the half-life of a first-order reaction is given as:

t_{1/2} = \frac{\ln 2}{k}

The half-life of a first-order reaction is independent of its initial concentration and depends solely on the reaction rate constant, k.

Second-order kinetics: In second-order reactions, the concentration of the reactant decreases according to

\frac{1}{[A]} = kt + \frac{1}{[A]_0}

Substituting [A]_0/2 for [A] and solving for the time gives the half-life of reactant A:

t_{1/2} = \frac{1}{k[A]_0}

Thus the half-life of a second-order reaction depends on the initial concentration as well as the rate constant.

Decay by two or more processes

Some quantities decay by two exponential-decay processes simultaneously. In this case, the actual half-life T_{1/2} can be related to the half-lives t_1 and t_2 that the quantity would have if each of the decay processes acted in isolation:

\frac{1}{T_{1/2}} = \frac{1}{t_1} + \frac{1}{t_2}

For three or more processes, the analogous formula is:

\frac{1}{T_{1/2}} = \frac{1}{t_1} + \frac{1}{t_2} + \frac{1}{t_3} + \cdots

For a proof of these formulas, see Exponential decay § Decay by two or more processes.

Examples

[Figure: half-life demonstrated using dice in a classroom experiment.]

Further information: Exponential decay § Applications and examples

There is a half-life describing any exponential-decay process. For example:

As noted above, in radioactive decay the half-life is the length of time after which there is a 50% chance that an atom will have undergone nuclear decay. It varies depending on the atom type and isotope, and is usually determined experimentally. See List of nuclides.
The current flowing through an RC circuit or RL circuit decays with a half-life of ln(2)RC or ln(2)L/R, respectively. For this example the term half time tends to be used rather than "half-life", but they mean the same thing.
In a chemical reaction, the half-life of a species is the time it takes for the concentration of that substance to fall to half of its initial value. In a first-order reaction the half-life of the reactant is ln(2)/λ, where λ is the reaction rate constant.
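A short numerical check of the relationships above (not from the article; variable names and the example values are illustrative): the three exponential-decay forms agree once t1/2 = ln(2)/λ = τ·ln(2), and combining two decay channels obeys the reciprocal-sum rule.

```python
import math

lam = 0.35                     # decay constant λ (arbitrary example value)
tau = 1 / lam                  # mean lifetime
t_half = math.log(2) / lam     # half-life

def n_half_life(t, n0=1.0):    # N0 * (1/2)**(t / t_half)
    return n0 * 0.5 ** (t / t_half)

def n_mean_life(t, n0=1.0):    # N0 * exp(-t / tau)
    return n0 * math.exp(-t / tau)

def n_decay_const(t, n0=1.0):  # N0 * exp(-lambda * t)
    return n0 * math.exp(-lam * t)

# The three equivalent formulas give the same remaining quantity.
for t in (0.0, 1.0, 5.0, 10.0):
    assert math.isclose(n_half_life(t), n_mean_life(t))
    assert math.isclose(n_mean_life(t), n_decay_const(t))

# Two simultaneous decay channels: 1/T_half = 1/t1 + 1/t2,
# which is the same as adding the decay constants, λ_total = λ1 + λ2.
t1, t2 = 3.0, 7.0
combined = 1 / (1 / t1 + 1 / t2)
lam_total = math.log(2) / t1 + math.log(2) / t2
assert math.isclose(combined, math.log(2) / lam_total)
print(t_half, combined)
```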
In non-exponential decay

The term "half-life" is almost exclusively used for decay processes that are exponential (such as radioactive decay or the other examples above), or approximately exponential (such as biological half-life discussed below). In a decay process that is not even close to exponential, the half-life will change dramatically while the decay is happening. In this situation it is generally uncommon to talk about half-life in the first place, but sometimes people will describe the decay in terms of its "first half-life", "second half-life", etc., where the first half-life is defined as the time required for decay from the initial value to 50%, the second half-life is from 50% to 25%, and so on.[7]

In biology and pharmacology

See also: Biological half-life

A biological half-life or elimination half-life is the time it takes for a substance (drug, radioactive nuclide, or other) to lose one-half of its pharmacologic, physiologic, or radiological activity. In a medical context, the half-life may also describe the time that it takes for the concentration of a substance in blood plasma to reach one-half of its steady-state value (the "plasma half-life"). The relationship between the biological and plasma half-lives of a substance can be complex, due to factors including accumulation in tissues, active metabolites, and receptor interactions.[8]

While a radioactive isotope decays almost perfectly according to so-called "first order kinetics", where the rate constant is a fixed number, the elimination of a substance from a living organism usually follows more complex chemical kinetics. For example, the biological half-life of water in a human being is about 9 to 10 days,[9] though this can be altered by behavior and other conditions. The biological half-life of caesium in human beings is between one and four months.

The concept of a half-life has also been utilized for pesticides in plants,[10] and certain authors maintain that pesticide risk and impact assessment models rely on and are sensitive to information describing dissipation from plants.[11]

In epidemiology, the concept of half-life can refer to the length of time for the number of incident cases in a disease outbreak to drop by half, particularly if the dynamics of the outbreak can be modeled exponentially.[12][13]

See also

Half time (physics)
List of radioactive nuclides by half-life
Mean lifetime
Median lethal dose

References

^ John Ayto, 20th Century Words (1989), Cambridge University Press. ^ Muller, Richard A. (April 12, 2010). Physics and Technology for Future Presidents. Princeton University Press. pp. 128-129. ISBN 9780691135045. ^ Chivers, Sidney (March 16, 2003). "Re: What happens during half-lifes [sic] when there is only one atom left?". MADSCI.org. ^ "Radioactive-Decay Model". Exploratorium.edu. Retrieved 2012-04-25. ^ Wallin, John (September 1996). "Assignment #2: Data, Simulations, and Analytic Science in Decay". Astro.GLU.edu. Archived from the original on 2011-09-29. ^ a b Rösch, Frank (September 12, 2014). Nuclear- and Radiochemistry: Introduction. 1. Walter de Gruyter. ISBN 978-3-11-022191-6. ^ Jonathan Crowe; Tony Bradshaw (2014). Chemistry for the Biosciences: The Essential Concepts. p. 568. ISBN 9780199662883. ^ Lin VW; Cardenas DD (2003). Spinal cord medicine. Demos Medical Publishing, LLC. p. 251. ISBN 978-1-888799-61-3. ^ Pang, Xiao-Feng (2014). Water: Molecular Structure and Properties. New Jersey: World Scientific. p. 451. ISBN 9789814440424.
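To make the "first half-life / second half-life" idea concrete, the second-order kinetics described earlier is a non-exponential case in which each successive half-life is twice as long as the previous one. A small sketch under that assumption (not part of the article; names are illustrative):

```python
def conc_second_order(t, a0, k):
    """Concentration for second-order decay: 1/[A] = k*t + 1/[A]0."""
    return 1.0 / (k * t + 1.0 / a0)

def successive_half_lives(a0, k, n=3):
    """Durations needed to go from a0 -> a0/2 -> a0/4 -> ..., one per step."""
    times, t, target = [], 0.0, a0 / 2
    while len(times) < n:
        # Solve 1/target = k*t_new + 1/a0 for the time at which [A] == target.
        t_new = (1.0 / target - 1.0 / a0) / k
        times.append(t_new - t)
        t, target = t_new, target / 2
    return times

# Sanity check of the closed form: at t = 2 with a0 = 1, k = 0.5, [A] = 0.5.
assert abs(conc_second_order(2.0, 1.0, 0.5) - 0.5) < 1e-12

print(successive_half_lives(a0=1.0, k=0.5))
# -> [2.0, 4.0, 8.0]: each "half-life" doubles, unlike exponential decay.
```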
^ Australian Pesticides and Veterinary Medicines Authority (31 March 2015). "Tebufenozide in the product Mimic 700 WP Insecticide, Mimic 240 SC Insecticide". Australian Government. Retrieved 30 April 2018. ^ Fantke, Peter; Gillespie, Brenda W.; Juraske, Ronnie; Jolliet, Olivier (11 July 2014). "Estimating Half-Lives for Pesticide Dissipation from Plants". Environmental Science & Technology. 48 (15): 8588-8602. Bibcode:2014EnST...48.8588F. doi:10.1021/es500434p. PMID 24968074. ^ Balkew, Teshome Mogessie (December 2010). The SIR Model When S(t) is a Multi-Exponential Function (Thesis). East Tennessee State University. ^ Ireland, MW, ed. (1928). The Medical Department of the United States Army in the World War, vol. IX: Communicable and Other Diseases. Washington, U.S.: U.S. Government Printing Office. pp. 116-7.

External links

Welcome to Nucleonica, Nucleonica.net (archived 2017)
wiki: Decay Engine, Nucleonica.net (archived 2016)
System Dynamics – Time Constants, Bucknell.edu
Researchers Nikhef and UvA measure slowest radioactive decay ever: Xe-124 with 18 billion trillion years
en-wikipedia-org-5784 ---- Unix philosophy - Wikipedia

Unix philosophy

[Photo: Ken Thompson and Dennis Ritchie, key proponents of the Unix philosophy.]

The Unix philosophy, originated by Ken Thompson, is a set of cultural norms and philosophical approaches to minimalist, modular software development. It is based on the experience of leading developers of the Unix operating system. Early Unix developers were important in bringing the concepts of modularity and reusability into software engineering practice, spawning a "software tools" movement. Over time, the leading developers of Unix (and programs that ran on it) established a set of cultural norms for developing software; these norms became as important and influential as the technology of Unix itself; this has been termed the "Unix philosophy." The Unix philosophy emphasizes building simple, short, clear, modular, and extensible code that can be easily maintained and repurposed by developers other than its creators. The Unix philosophy favors composability as opposed to monolithic design.

Origin

The Unix philosophy is documented by Doug McIlroy[1] in the Bell System Technical Journal from 1978:[2] Make each program do one thing well. To do a new job, build afresh rather than complicate old programs by adding new "features". Expect the output of every program to become the input to another, as yet unknown, program. Don't clutter output with extraneous information. Avoid stringently columnar or binary input formats. Don't insist on interactive input. Design and build software, even operating systems, to be tried early, ideally within weeks. Don't hesitate to throw away the clumsy parts and rebuild them.
Use tools in preference to unskilled help to lighten a programming task, even if you have to detour to build the tools and expect to throw some of them out after you've finished using them. It was later summarized by Peter H. Salus in A Quarter-Century of Unix (1994):[1] Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface. In their award-winning Unix paper of 1974[citation needed], Ritchie and Thompson quote the following design considerations:[3] Make it easy to write, test, and run programs. Interactive use instead of batch processing. Economy and elegance of design due to size constraints ("salvation through suffering"). Self-supporting system: all Unix software is maintained under Unix. The whole philosophy of UNIX seems to stay out of assembler. — Michael Sean Mahoney[4] The UNIX Programming Environment[edit] In their preface to the 1984 book, The UNIX Programming Environment, Brian Kernighan and Rob Pike, both from Bell Labs, give a brief description of the Unix design and the Unix philosophy:[5] Rob Pike, co-author of The UNIX Programming Environment Even though the UNIX system introduces a number of innovative programs and techniques, no single program or idea makes it work well. Instead, what makes it effective is the approach to programming, a philosophy of using the computer. Although that philosophy can't be written down in a single sentence, at its heart is the idea that the power of a system comes more from the relationships among programs than from the programs themselves. Many UNIX programs do quite trivial things in isolation, but, combined with other programs, become general and useful tools. The authors further write that their goal for this book is "to communicate the UNIX programming philosophy."[5] Program Design in the UNIX Environment[edit] Brian Kernighan has written at length about the Unix philosophy In October 1984, Brian Kernighan and Rob Pike published a paper called Program Design in the UNIX Environment. In this paper, they criticize the accretion of program options and features found in some newer Unix systems such as 4.2BSD and System V, and explain the Unix philosophy of software tools, each performing one general function:[6] Much of the power of the UNIX operating system comes from a style of program design that makes programs easy to use and, more important, easy to combine with other programs. This style has been called the use of software tools, and depends more on how the programs fit into the programming environment and how they can be used with other programs than on how they are designed internally. [...] This style was based on the use of tools: using programs separately or in combination to get a job done, rather than doing it by hand, by monolithic self-sufficient subsystems, or by special-purpose, one-time programs. The authors contrast Unix tools such as cat, with larger program suites used by other systems.[6] The design of cat is typical of most UNIX programs: it implements one simple but general function that can be used in many different applications (including many not envisioned by the original author). Other commands are used for other functions. For example, there are separate commands for file system tasks like renaming files, deleting them, or telling how big they are. Other systems instead lump these into a single "file system" command with an internal structure and command language of its own. 
(The PIP file copy program found on operating systems like CP/M or RSX-11 is an example.) That approach is not necessarily worse or better, but it is certainly against the UNIX philosophy. Doug McIlroy on Unix programming[edit] Doug McIlroy (left) with Dennis Ritchie McIlroy, then head of the Bell Labs Computing Sciences Research Center, and inventor of the Unix pipe,[7] summarized the Unix philosophy as follows:[1] This is the Unix philosophy: Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface. Beyond these statements, he has also emphasized simplicity and minimalism in Unix programming:[1] The notion of "intricate and beautiful complexities" is almost an oxymoron. Unix programmers vie with each other for "simple and beautiful" honors — a point that's implicit in these rules, but is well worth making overt. Conversely, McIlroy has criticized modern Linux as having software bloat, remarking that, "adoring admirers have fed Linux goodies to a disheartening state of obesity."[8] He contrasts this with the earlier approach taken at Bell Labs when developing and revising Research Unix:[9] Everything was small... and my heart sinks for Linux when I see the size of it. [...] The manual page, which really used to be a manual page, is now a small volume, with a thousand options... We used to sit around in the Unix Room saying, 'What can we throw out? Why is there this option?' It's often because there is some deficiency in the basic design — you didn't really hit the right design point. Instead of adding an option, think about what was forcing you to add that option. Do One Thing and Do It Well[edit] As stated by McIlroy, and generally accepted throughout the Unix community, Unix programs have always been expected to follow the concept of DOTADIW, or "Do One Thing And Do It Well." There are limited sources for the acronym DOTADIW on the Internet, but it is discussed at length during the development and packaging of new operating systems, especially in the Linux community. Patrick Volkerding, the project lead of Slackware Linux, invoked this design principle in a criticism of the systemd architecture, stating that, "attempting to control services, sockets, devices, mounts, etc., all within one daemon flies in the face of the Unix concept of doing one thing and doing it well."[10] Eric Raymond's 17 Unix Rules[edit] In his book The Art of Unix Programming that was first published in 2003,[11] Eric S. Raymond, an American programmer and open source advocate, summarizes the Unix philosophy as KISS Principle of "Keep it Simple, Stupid."[12] He provides a series of design rules:[1] Build modular programs Write readable programs Use composition Separate mechanisms from policy Write simple programs Write small programs Write transparent programs Write robust programs Make data complicated when required, not the program Build on potential users' expected knowledge Avoid unnecessary output Write programs which fail in a way that is easy to diagnose Value developer time over machine time Write abstract programs that generate code instead of writing code by hand Prototype software before polishing it Write flexible and open programs Make the program and protocols extensible. 
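The "do one thing, communicate via text streams" idea that runs through McIlroy's and Raymond's rules is easy to demonstrate outside the shell as well. Below is a minimal sketch in Python (not from the book or the article; the script name and pipeline are hypothetical) of a line-oriented filter that reads standard input and writes standard output, so it composes with other tools in a pipeline.

```python
#!/usr/bin/env python3
"""A tiny line-oriented filter: lower-case each input line and drop blanks.

Because it only reads stdin and writes stdout, it can be combined with
other programs, e.g.:  cat words.txt | python3 lowercase.py | sort | uniq -c
"""
import sys

def main() -> int:
    for line in sys.stdin:
        line = line.strip().lower()
        if line:                      # don't clutter output with blank lines
            sys.stdout.write(line + "\n")
    return 0

if __name__ == "__main__":
    sys.exit(main())
```

The design choice is the point: the filter does one narrow job and leaves sorting and counting to programs that already exist.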
Mike Gancarz: The UNIX Philosophy[edit] In 1994, Mike Gancarz (a member of the team that designed the X Window System), drew on his own experience with Unix, as well as discussions with fellow programmers and people in other fields who depended on Unix, to produce The UNIX Philosophy which sums it up in nine paramount precepts: Small is beautiful. Make each program do one thing well. Build a prototype as soon as possible. Choose portability over efficiency. Store data in flat text files. Use software leverage to your advantage. Use shell scripts to increase leverage and portability. Avoid captive user interfaces. Make every program a filter. "Worse is better"[edit] Main article: Worse is better Richard P. Gabriel suggests that a key advantage of Unix was that it embodied a design philosophy he termed "worse is better", in which simplicity of both the interface and the implementation are more important than any other attributes of the system—including correctness, consistency, and completeness. Gabriel argues that this design style has key evolutionary advantages, though he questions the quality of some results. For example, in the early days Unix used a monolithic kernel (which means that user processes carried out kernel system calls all on the user stack). If a signal was delivered to a process while it was blocked on a long-term I/O in the kernel, then what should be done? Should the signal be delayed, possibly for a long time (maybe indefinitely) while the I/O completed? The signal handler could not be executed when the process was in kernel mode, with sensitive kernel data on the stack. Should the kernel back-out the system call, and store it, for replay and restart later, assuming that the signal handler completes successfully? In these cases Ken Thompson and Dennis Ritchie favored simplicity over perfection. The Unix system would occasionally return early from a system call with an error stating that it had done nothing—the "Interrupted System Call", or an error number 4 (EINTR) in today's systems. Of course the call had been aborted in order to call the signal handler. This could only happen for a handful of long-running system calls such as read(), write(), open(), and select(). On the plus side, this made the I/O system many times simpler to design and understand. The vast majority of user programs were never affected because they did not handle or experience signals other than SIGINT and would die right away if one was raised. For the few other programs—things like shells or text editors that respond to job control key presses—small wrappers could be added to system calls so as to retry the call right away if this EINTR error was raised. Thus, the problem was solved in a simple manner. Criticism[edit] In a 1981 article entitled "The truth about Unix: The user interface is horrid"[13] published in Datamation, Don Norman criticized the design philosophy of Unix for its lack of concern for the user interface. Writing from his background in cognitive science and from the perspective of the then-current philosophy of cognitive engineering,[4] he focused on how end-users comprehend and form a personal cognitive model of systems—or, in the case of Unix, fail to understand, with the result that disastrous mistakes (such as losing an hour's worth of work) are all too easy. 
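The small EINTR retry wrappers mentioned in the "Worse is better" section above can be sketched in a few lines. This is an illustration only, not code from any Unix source; note that since PEP 475 modern Python already retries most interrupted system calls itself, so the pattern is shown purely to make the idea concrete.

```python
import errno

def retry_on_eintr(syscall, *args, **kwargs):
    """Call syscall(...) and simply retry if it was interrupted by a signal.

    This mirrors the wrappers described above: instead of the kernel
    restarting the call transparently, the caller loops on EINTR.
    """
    while True:
        try:
            return syscall(*args, **kwargs)
        except InterruptedError:        # OSError subclass with errno == EINTR
            continue
        except OSError as exc:
            if exc.errno == errno.EINTR:
                continue
            raise

# Hypothetical usage: data = retry_on_eintr(os.read, fd, 4096)
```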
See also[edit] Cognitive engineering Unix architecture Minimalism (computing) Software engineering KISS principle Hacker ethic List of software development philosophies Everything is a file Worse is better Notes[edit] ^ a b c d e Raymond, Eric S. (2004). "Basics of the Unix Philosophy". The Art of Unix Programming. Addison-Wesley Professional (published 2003-09-23). ISBN 0-13-142901-9. Retrieved 2016-11-01. ^ Doug McIlroy, E. N. Pinson, B. A. Tague (8 July 1978). "Unix Time-Sharing System: Foreword". The Bell System Technical Journal. Bell Laboratories: 1902–1903.CS1 maint: multiple names: authors list (link) ^ Dennis Ritchie; Ken Thompson (1974), "The UNIX time-sharing system" (PDF), Communications of the ACM, 17 (7): 365–375, doi:10.1145/361011.361061, S2CID 53235982 ^ a b "An Oral History of Unix". Princeton University History of Science. ^ a b Kernighan, Brian W. Pike, Rob. The UNIX Programming Environment. 1984. viii ^ a b Rob Pike; Brian W. Kernighan (October 1984). "Program Design in the UNIX Environment" (PDF). ^ Dennis Ritchie (1984), "The Evolution of the UNIX Time-Sharing System" (PDF), AT&T Bell Laboratories Technical Journal, 63 (8): 1577–1593, doi:10.1002/j.1538-7305.1984.tb00054.x ^ Douglas McIlroy. "Remarks for Japan Prize award ceremony for Dennis Ritchie, May 19, 2011, Murray Hill, NJ" (PDF). Retrieved 2014-06-19. ^ Bill McGonigle. "Ancestry of Linux — How the Fun Began (2005)". Retrieved 2014-06-19. ^ "Interview with Patrick Volkerding of Slackware". linuxquestions.org. 2012-06-07. Retrieved 2015-10-24. ^ Raymond, Eric (2003-09-19). The Art of Unix Programming. Addison-Wesley. ISBN 0-13-142901-9. Retrieved 2009-02-09. ^ Raymond, Eric (2003-09-19). "The Unix Philosophy in One Lesson". The Art of Unix Programming. Addison-Wesley. ISBN 0-13-142901-9. Retrieved 2009-02-09. ^ Norman, Don (1981). "The truth about Unix: The user interface is horrid" (PDF). Datamation. 27 (12). References[edit] The Unix Programming Environment by Brian Kernighan and Rob Pike, 1984 Program Design in the UNIX Environment – The paper by Pike and Kernighan that preceded the book. Notes on Programming in C, Rob Pike, September 21, 1989 A Quarter Century of Unix, Peter H. Salus, Addison-Wesley, May 31, 1994 ( ISBN 0-201-54777-5) Philosophy — from The Art of Unix Programming, Eric S. Raymond, Addison-Wesley, September 17, 2003 ( ISBN 0-13-142901-9) Final Report of the Multics Kernel Design Project by M. D. Schroeder, D. D. Clark, J. H. Saltzer, and D. H. Wells, 1977. 
The UNIX Philosophy, Mike Gancarz, ISBN 1-55558-123-4

External links

Basics of the Unix Philosophy – by Catb.org
The Unix Philosophy: A Brief Introduction – by The Linux Information Project (LINFO)
Why the Unix Philosophy still matters

en-wikipedia-org-5954 ---- Communal work - Wikipedia

Communal work

See also: Mutual aid (organization theory)

[Photo: a quilting bee is a form of communal work.]

Communal work is a gathering for mutually accomplishing a task or for communal fundraising. Communal work provided manual labour to others, especially for major projects such as barn raising, bees of various kinds, log rolling, and subbotniks. Different words have been used to describe such gatherings. They are less common in today's more individualistic cultures, where there is less reliance on others than in preindustrial agricultural and hunter-gatherer societies.

Major jobs such as clearing a field of timber or raising a barn needed many workers. It was often both a social and a utilitarian event. Jobs like corn husking or sewing could be done as a group to allow socializing during an otherwise tedious chore. Such gatherings often included refreshments and entertainment. In more modern societies, the word "bee" has also been used for some time for other social gatherings without communal work, for example for competitions such as a spelling bee.
In specific cultures

Africa

East Africa

Harambee (Swahili pronunciation: [haramˈbeː]) is an East African (Kenyan, Tanzanian and Ugandan) tradition of community self-help events, e.g. fundraising or development activities. Harambee literally means "all pull together" in Swahili, and is also the official motto of Kenya and appears on its coat of arms.

Rwanda

Umuganda is a national day of community service held on the last Saturday of each month in Rwanda. In 2009, umuganda was institutionalized in the country. It is translated as "coming together in common purpose to achieve an outcome."[1]

Ethiopia

A social event is held to build a house or a farm, especially for the elderly and widows who do not have the physical strength to do it on their own.

Sudan

Naffīr (نفير) is an Arabic word used in parts of Sudan (including Kordofan, Darfur, parts of the Nuba mountains and Kassala) to describe particular types of communal work undertakings. Naffīr has been described as including a group recruited through family networks, in-laws and village neighbors for some particular purpose, which then disbands when that purpose is fulfilled.[2] An alternative, more recent, definition describes naffīr as "to bring someone together from the neighborhood or community to carry out a certain project, such as building a house or providing help during the harvest season."[3]

The word may be related to the standard Arabic word nafr (نفر), which describes a band, party, group or troop, typically mobilized for war. In standard Arabic, a naffīr āmm (نفير عام) refers to a general call to arms.[4] Naffīr has also been used in a military context in Sudan. For example, the term was used to refer to the an-Naffīr ash-Sha'abī or "People's Militias" that operated in the central Nuba Mountains region in the early 1990s.[5]

Asia

Indonesia

[Photo: the traditional communal slametan unggahan ceremony of Bonokeling village, Banyumas, Central Java, in which the participants literally perform the notion of gotong royong (carrying together).]

Gotong-royong is a conception of sociality familiar to Indonesia and, arguably in wider extent, also Malaysia, Brunei and Singapore. In Indonesian languages, especially Javanese, gotong means "carrying a burden using one's shoulder", while royong means "together" or "communally"; thus the combined phrase gotong royong can be translated literally as "joint bearing of burdens". It translates to working together, helping each other, or mutual assistance.[6] A village's public facilities, such as irrigation works, streets, and houses of worship (the village's mosque, church or pura), are usually constructed in a gotong royong way, with the funds and materials collected mutually. Traditional communal events, such as the slametan ceremony, are also usually held in this gotong royong ethos of communal work, in which each member of society is expected to contribute and participate in the endeavour harmoniously.
The phrase has been translated into English in many ways, most of which hearken to the conception of reciprocity or mutual aid. For M. Nasroen, gotong royong forms one of the core tenets of Indonesian philosophy. Paul Michael Taylor and Lorraine V. Aragon state that "gotong royong [is] cooperation among many people to attain a shared goal."[7] Background[edit] In a 1983 essay Clifford Geertz points to the importance of gotong royong in Indonesian life: An enormous inventory of highly specific and often quite intricate institutions for effecting the cooperation in work, politics, and personal relations alike, vaguely gathered under culturally charged and fairly well indefinable value-images--rukun ("mutual adjustment"), gotong royong ("joint bearing of burdens"), tolong-menolong ("reciprocal assistance")--governs social interaction with a force as sovereign as it is subdued.[8] Anthropologist Robert A. Hahn writes: Javanese culture is stratified by social class and by level of adherence to Islam. ...Traditional Javanese culture does not emphasize material wealth. ...There is respect for those who contribute to the general village welfare over personal gain. And the spirit of gotong royong, or volunteerism, is promoted as a cultural value.[9] Gotong royong has long functioned as the scale of the village, as a moral conception of the political economy. Pottier records the impact of the Green Revolution in Java: "Before the GR, 'Java' had relatively 'open' markets, in which many local people were rewarded in kind. With the GR, rural labour markets began to foster 'exclusionary practices'... This resulted in a general loss of rights, especially secure harvesting rights within a context of mutual cooperation, known as gotong royong." Citing Ann Laura Stoler's ethnography from the 1970s, Pottier writes that cash was replacing exchange, that old patron-client ties were breaking, and that social relations were becoming characterized more by employer-employee qualities.[10] Political appropriation[edit] For Prime Minister Muhammad Natsir, gotong royong was an ethical principle of sociality, in marked contrast to both the "unchecked" feudalism of the West, and the social anomie of capitalism.[11] Ideas of reciprocity, ancient and deeply enmeshed aspects of kampung morality, were seized upon by postcolonial politicians. John Sidel writes: "Ironically, national-level politicians drew on " village conceptions of adat and gotong royong. They drew on notions "of traditional community to justify new forms of authoritarian rule."[12] During the presidency of Sukarno, the idea of gotong royong was officially elevated to a central tenet of Indonesian life. For Sukarno, the new nation was to be synonymous with gotong royong. He said that the Pancasila could be reduced to the idea of gotong royong. On June 1, 1945, Sukarno said of the Pancasila: The first two principles, nationalism and internationalism, can be pressed to one, which I used to call 'socionationalism.' Similarly with democracy 'which is not the democracy of the West' together with social justice for all can be pressed down to one, and called socio democracy. Finally – belief in God. 'And so what originally was five has become three: socio nationalism, socio democracy, and belief in God.' 'If I press down five to get three, and three to get one, then I have a genuine Indonesian term – GOTONG ROYONG [mutual co-operation]. The state of Indonesia which we are to establish should be a state of mutual co-operation. How fine that is ! 
A Gotong Royong state![13] In 1960, Sukarno dissolved the elected parliament and implemented the Gotong Royong Parliament. Governor of Jakarta, Ali Sadikin, spoke of a desire to reinvigorate urban areas with village sociality, with gotong royong.[14] Suharto's New Order was characterized by much discourse about tradition. During the New Order, Siskamling harnessed the idea of gotong royong. By the 1990s, if not sooner, gotong royong had been "fossilized" by New Order sloganeering.[15] During the presidency of Megawati, the Gotong Royong Cabinet was implemented. It lasted from 2001 to 2004. Philippines[edit] Members of the community volunteering to move a house to new location. Though no longer commonplace, this method of moving houses has become a traditional symbol for the concept of bayanihan. Bayanihan (pronounced [ˌbajɐˈniːhan]) is a Filipino term taken from the word bayan, referring to a nation, country,[16] town or community. The whole term bayanihan refers to a spirit of communal unity or effort to achieve a particular objective. It is focused on doing things as a group as it relates to one's community.[17] Etymology[edit] The origin of the term bayanihan can be traced from a common tradition in Philippine towns where community members volunteer to help a family move to a new place by volunteering to transport the house to a specific location. The process, which is the classic illustration of the term,[18] involves literally carrying the house to its new location. This is done by putting bamboo poles forming a strong frame to lift the stilts from the ground and carrying the whole house with the men positioned at the ends of each pole. The tradition also features a small fiesta hosted by the family to express gratitude to the volunteers. Usage[edit] In society, bayanihan has been adopted as a term to refer to a local civil effort to resolve national issues. One of the first groups to use the term is the Bayanihan Philippine National Folk Dance Company which travels to countries to perform traditional folk dances of the country with the objective of promoting Philippine culture. The concept is related to damayán ("to help one another"). In computing, the term bayanihan has evolved into many meanings and incorporated as codenames to projects that depict the spirit of cooperative effort involving a community of members. An example of these projects is the Bayanihan Linux project which is a Philippines-based desktop-focused Linux distribution. In ethnic newspapers, Bayanihan News is the name of community newspaper for the Philippine community in Australia. It is in English and in Filipino with regular news and articles on Philippine current events and history. It was established in October 1998 in Sydney, Australia. Turkey[edit] Imece is a name given for a traditional Turkish village-scale collaboration. For example, if a couple is getting married, villagers participate in the overall organization of the ceremony including but not limited to preparation of the celebration venue, food, building and settlement of the new house for the newly weds. Tasks are often distributed according to expertise and has no central authority to govern activities. Europe[edit] Finland & the Baltics[edit] A tent is being raised in a talkoot for midsummer in Ylimuonio in 2005. Talkoot (from Finnish: talkoo, almost always used in plural, talkoot) is a Finnish expression for a gathering of friends and neighbors organized to accomplish a task. 
The word is borrowed into Finland Swedish as talko[19] but is unknown to most Swedes. However, cognate terms and in approximately the same context are used in Estonia (talgu(d)),[20] Latvia (noun talka, verb talkot), and Lithuania (noun talka, verb talkauti). It is the cultural equivalent of communal work in a village community, although adapted to the conditions of Finland, where most families traditionally lived in isolated farms often miles away from the nearest village. A talkoot is by definition voluntary, and the work is unpaid. The voluntary nature might be imaginary due to social pressure, especially in small communities, and one's honour and reputation may be severely damaged by non-attendance or laziness. The task of the talkoot may be something that is a common concern for the good of the group, or it may be to help someone with a task that exceeds his or her own capacity. For instance, elderly neighbours or relatives can need help if their house or garden is damaged by a storm, or siblings can agree to arrange a party for a parent's special birthday as a talkoot. Typically, club houses, landings, churches, and parish halls can be repaired through a talkoot, or environmental tasks for the neighborhood are undertaken. The parents of pre-school children may gather to improve the playground, or the tenants of a tenement house may arrange a talkoot to put their garden in order for the summer or winter. A person unable to contribute with actual work may contribute food for the talkoot party, or act as a baby-sitter. When a talkoot is for the benefit of an individual, he or she is the host of the talkoot party and is obliged to offer food and drink. Russia, Ukraine, Belarus, Poland[edit] Toloka[21] or Taloka (also pomoch) in Russian (Toloka in Ukrainian and Talaka in Belarusian, Tłoka in Polish) is the form of communal voluntary work. Neighbours gathered together to build something or to harvest crops. Hungary[edit] Kaláka (ˈkɒlaːkɒ) is the Hungarian word for working together for a common goal. This can be building a house or doing agricultural activities together, or any other communal work on a volunteer basis. Ireland[edit] Meitheal (Irish pronunciation: [ˈmɛhəl]) is the Irish word for a work team, gang, or party and denotes the co-operative labour system in Ireland where groups of neighbours help each other in turn with farming work such as harvesting crops.[22] The term is used in various writings of Irish language authors. It can convey the idea of community spirit in which neighbours respond to each other's needs. In modern use for example, a meitheal could be a party of neighbours and friends invited to help decorate a house in exchange for food and drink, or in scouting, where volunteer campsite wardens maintain campsites around Ireland. Asturias[edit] Andecha (from Latin indictia 'announcement) it is a voluntary, unpaid and punctual aid to help a neighbor carry out agricultural tasks (cutting hay, taking out potatoes, building a barn, picking up the apple to make cider, etc.). The work is rewarded with a snack or a small party and the tacit commitment that the person assisted will come with his family to the call of another andecha when another neighbor requests it.[23] Very similar to Irish Meitheal. It should not be confused with another Asturian collective work institution, the Sestaferia. 
In this, the provision of the service is mandatory (under penalty of fine) and is not called a to help of an individual but the provision of common services (repair of bridges, cleaning of roads, etc.) Norway[edit] Dugnad is a Norwegian term for voluntary work done together with other people.[24] It's a very core phenomenon for Norwegians, and the word was voted as the Norwegian word of the year 2004 in the TV programme «Typisk norsk» ("Typically Norwegian"). Participation in a dugnad is often followed by a common meal, served by the host, or consisting of various dishes brought by the participants, thus the meal is also a dugnad. In urban areas, the dugnad is most commonly identified with outdoor spring cleaning and gardening in housing co-operatives. Dugnader (dugnads) are also a phenomenon in kindergartens and elementary schools to make the area nice, clean and safe and to do decorating etc. such as painting and other types of maintenance. Dugnader occur more widely in remote and rural areas. Neighbours sometimes participate during house or garage building, and organizations (such as kindergartens or non-profit organisations) may arrange annual dugnader. The Norwegian word "dugnadsånd" is translatable to the spirit of will to work together for a better community. Many Norwegians will describe this as a typical Norwegian thing to have. The word dugnad was used to unite the people of Norway to cooperate and shut down public activities to fight the pandemic of 2020.[25] Serbia[edit] Moba (Serbian: моба) is an old Serbian tradition of communal self-help in villages. It was a request for help in labor-intensive activities, like harvesting wheat, building a church or repairing village roads. Work was entirely voluntary and no compensation, except possibly meals for workers, was expected. North America[edit] Cherokee[edit] Gadugi (Cherokee:ᎦᏚᎩ) is a term used in the Cherokee language which means "working together"[26] or "cooperative labor" within a community.[27] Historically, the word referred to a labor gang of men and/or women working together for projects such as harvesting crops or tending to gardens of elderly or infirm tribal members.[28] The word Gadugi was derived from the Cherokee word for "bread", which is Gadu. In recent years the Cherokee Nation tribal government has promoted the concept of Gadugi. The GaDuGi Health Center is a tribally run clinic in Tahlequah, Oklahoma, the capital of the Cherokee Nation. The concept is becoming more widely known. In Lawrence, Kansas, in 2004 the rape crisis center affiliated with the University of Kansas, adopted the name, the Gadugi Safe Center, for its programs to aid all people affected by sexual violence.[26] Gadugi is the name of a font included with Microsoft Windows 8 that includes support for the Cherokee language along with other languages of the Americas such as Inuktitut. Latin America[edit] Mexico[edit] Tequio [es]. Zapoteca Quechua[edit] Main article: Minka (communal work) Mink'a or minka (Quechua[29][30] or Kichwa,[31] Hispanicized minca, minga) is a type of traditional communal work in the Andes in favor of the whole community (ayllu). Participants are traditionally paid in kind. Mink'a is still practiced in indigenous communities in Peru, Ecuador, Bolivia, and Chile, especially among the Quechua and the Aymara. 
Chile[edit] In rural southern Chile, labor reciprocity and communal work remained common through the twentieth century and into the twenty-first, particularly in rural communities on the Archipelago of Chiloé.[32] Referred to as "mingas," the practice can be traced to pre-contact Mapuche and Huilliche traditions of communal labor.[33] In Chiloé, mingas took the form either of días cambiados (tit for tat exchanges of labor between neighbors) or large-scale work parties hosted by a particular family, accompanied by food and drink, and often lasting several days.[34] Most agricultural work and community construction projects were done by way of mingas. The tiradura de casa ("house pull") involved moving a house from one location to another. Panama In rural Panama, especially in the Azuero peninsula region and its diaspora, it is common to hold a 'junta' party[35] as a communal labor event. Most commonly these events are used to harvest rice, clear brush with machetes, or to build houses. Workers generally work without compensation but are provided with meals and often alcoholic beverages such as fermented chicha fuerte and seco. Bee[edit] History[edit] This use of the word bee is common in literature describing colonial North America. One of the earliest documented occurrences is found in the Boston Gazette for October 16, 1769, where it is reported that "Last Thursday about twenty young Ladies met at the house of Mr. L. on purpose for a Spinning Match; (or what is called in the Country a Bee)."[36] It was, and continues to be, commonly used in Australia also, most often as "working bee".[37][38] In literature[edit] Uses in literature include: "There was a bee to-day for making a road up to the church." – Anne Langton "The cellar … was dug by a bee in a single day." – S. G. Goodrich "I made a bee; that is, I collected as many of the most expert and able-bodied of the settlers to assist at the raising." – John Galt, Lawrie Todd (1830) "When one of the pioneers had chopped down timber and got it in shape, he would make a logging bee, get two or three gallons of New England Rum, and the next day the logs were in great heaps. ... after a while there was a carding and jutting mill started where people got their wool made into rolls, when the women spun and wove it. Sometimes the women would have spinning bees. They would put rolls among their neighbors and on a certain day they would all bring in their yarn and at night the boys would come with their fiddles for a dance. ... He never took a salary, had a farm of 80 acres [324,000 m2] and the church helped him get his wood (cut and drawn by a bee), and also his hay." – James Slocum "'I am in a regular quandary', said the mistress of the house, when the meal was about half over. Mr. Van Brunt looked up for an instant, and asked, 'What about?' 'Why, how I am ever going to do to get those apples and sausage-meat done. If I go to doing 'em myself I shall about get through by spring.' 'Why don't you make a bee?' said Mr. Van Brunt." – Susan Warner, The Wide, Wide World (1850)[39] "She is gone out with Cousin Deborah to an apple bee." – Charlotte Mary Yonge, The Trial; or More Links of the Daisy Chain (1864) Etymology[edit] The origin of the word "bee" in this sense is debated. Because it describes people working together in a social group, a common belief is that it derives from the insect of the same name and similar social behavior. 
This derivation appears in, for example, the Oxford English Dictionary.[40] Other dictionaries, however, regard this as a false etymology, and suggest that the word comes from dialectal been or bean (meaning "help given by neighbors"), derived in turn from Middle English bene (meaning "prayer", "boon" and "extra service by a tenant to his lord").[41][42] See also[edit] Sharing References[edit] ^ "Umuganda". Rwanda Governance Board. Retrieved 2019-12-03. ^ Manger, Leif O. (1987). "Communal Labour in the Sudan". University of Bergen: 7. Cite journal requires |journal= (help) ^ 'Conceptual analysis of volunteer', 2004 ^ Wehr, Hans. A Dictionary of Modern Written Arabic, Arabic - English. Beirut: Librarie Du Liban. ^ Kevlihan, Rob (2005). "Developing Connectors in Humanitarian Emergencies: Is it possible in Sudan?" (PDF). Humanitarian Exchange. 30. ^ "Gotong Royong - KBBI Daring". kbbi.kemdikbud.go.id. Retrieved 2020-05-23. ^ Taylor, Paul Michael; Aragon, Lorraine V (1991). Beyond the Java Sea: Art of Indonesia's Outer Islands. Abrams. p. 10. ISBN 0-8109-3112-5. ^ Geertz, Clifford. "Local Knowledge: Fact and Law in Comparative Perspective," pp. 167–234 in Geertz Local Knowledge: Further Essays in Interpretive Anthropology, NY: Basic Books. 1983. ^ Hahn, Robert A. (1999). Anthropology in Public Health: Bridging Differences in Culture and Society. Oxford, UK: Oxford University Press. ^ Pottier, Johan (1999). Anthropology of Food: The Social Dynamics of Food Security. Oxford, UK: Blackwell. p. 84. ^ Natsir, Muhammad. "The Indonesian Revolution." In Kurzman, Charles Liberal Islam: A Sourcebook, p. 62. Oxford, UK: Oxford University Press. 1998. ^ Sidel, John Thayer (2006). Riots, Pogroms, Jihad: Religious Violence in Indonesia. Ithaca, NY: Cornell University Press. p. 32. ^ "BUNG KARNO: 6 JUNE - 21 JUNE". Antenna. Retrieved 25 March 2013. ^ Kusno, Abidin (2003). Behind the Postcolonial: Architecture, Urban Space and Political Cultures. NY: Routledge. p. 152. ^ Anderson, Benedict (1990). Language and Power: Exploring Political Cultures in Indonesia. Ithaca, NY: Cornell UP. p. 148. ^ Visser, Wayne; Tolhurst, Nick (2017). The World Guide to CSR: A Country-by-Country Analysis of Corporate Sustainability and Responsibility. Routledge. ISBN 978-1-351-27890-4. Retrieved 9 April 2020. ^ Gripaldo, Rolando M. (2005). Filipino Cultural Traits: Claro R. Ceniza Lectures. CRVP. p. 173. ISBN 978-1-56518-225-7. Retrieved 9 April 2020. ^ Smith, Bradford; Shue, Sylvia; Villarreal, Joseph (1992). Asian and Hispanic philanthropy: sharing and giving money, goods, and services in the Chinese, Japanese, Filipino, Mexican, and Guatemalan communities in the San Francisco Bay Area. University of San Francisco, Institute for Nonprofit Organization Management, College of Professional Studies. p. 113. Retrieved 9 April 2020. ^ Mikael Reuter: En/ett iögonfallande talko? (in Swedish). Retrieved: 2010-10-04. ^ "[EKSS] "Eesti keele seletav sõnaraamat"". eki.ee. ^ "Vasmer's Etymological Dictionary". dic.academic.ru. ^ "Meitheal". Irish Dictionary Online. englishirishdictionary.com. Archived from the original on 10 July 2011. Retrieved 28 March 2013. ^ https://dej.rae.es/lema/andecha ^ Ottar Brox; John M. Bryden; Robert Storey (2006). The political economy of rural development: modernisation without centralisation?. Eburon Uitgeverij B.V. p. 79. ISBN 90-5972-086-5. ^ One Word Spared Norway From COVID-19 Disaster Kelsey L.O. July 20, 2020 ^ a b "GaDuGi SafeCenter's Mission Statement and Vision Statement". GaDuGi SafeCenter. 
Retrieved 25 March 2013. ^ Feeling, Durbin (1975). Cherokee-English Dictionary. Cherokee Nation of Oklahoma. p. 73. ^ Dunaway, Wilma. "The Origin of Gadugi". Cherokee Nation. Retrieved 28 March 2013. ^ Teofilo Laime Ajacopa, Diccionario Bilingüe Iskay simipi yuyayk'ancha, La Paz, 2007 (Quechua-Spanish dictionary) ^ Diccionario Quechua - Español - Quechua, Academía Mayor de la Lengua Quechua, Gobierno Regional Cusco, Cusco 2005 (Quechua-Spanish dictionary) ^ Fabián Potosí C. et al., Ministerio de Educación del Ecuador: Kichwa Yachakukkunapa Shimiyuk Kamu, Runa Shimi - Mishu Shimi, Mishu Shimi - Runa Shimi. Quito (DINEIB, Ecuador) 2009. (Kichwa-Spanish dictionary) ^ Daughters, Anton. "Solidarity and Resistance on the Island of Llingua." Anthropology Now 7:1 pp.1-11 (April 2015) ^ Cárdenas Álvarez, Renato, Daniel Montiel Vera, and Catherine Grace Hall. Los Chonos y los Veliche de Chiloé (Santiago, Chile: Ediciones Olimpho) 1991 ^ Daughters, Anton. "Southern Chile's Archipelago of Chiloé: Shifting Identities in a New Economy." Journal of Latin American and Caribbean Anthropology 21:2 pp.317.335 (July 2016) ^ "Folklore.PanamaTipico.com (English)". folklore.panamatipico.com. Retrieved 2018-10-30. ^ Boston Gazette, October 16, 1769. ^ The Australian Bosses roll up for Tony Abbott's working bee August 11, 2012 Retrieved 3 March 2015. ^ "Brisbane working bee hits streets". abc.net.au. January 15, 2011. Retrieved March 3, 2015. ^ Warner, Susan (1851). The Wide, Wide World. 1. New York: Putnam. p. 277. ^ "bee, n.". Oxford English Dictionary (Online ed.). Oxford University Press. (Subscription or participating institution membership required.) ^ "Bee". Dictionary.com. Retrieved March 3, 2015. ^ "Bee". Merriam-Webster. Retrieved 27 December 2020. Wikimedia Commons has media related to Communal work. 
en-wikipedia-org-699 ---- GatorBox - Wikipedia

The GatorBox is a LocalTalk-to-Ethernet bridge, a router used on Macintosh-based networks to allow AppleTalk communications between clients on LocalTalk and Ethernet physical networks. The GatorSystem software also allowed TCP/IP and DECnet protocols to be carried to LocalTalk-equipped clients via tunneling, providing them with access to these normally Ethernet-only systems. When the GatorBox is running GatorPrint software, computers on the Ethernet network can send print jobs to printers on the LocalTalk network using the 'lpr' print spool command. When the GatorBox is running GatorShare software, computers on the LocalTalk network can access Network File System (NFS) hosts on Ethernet.

Specifications
The original GatorBox (model: 10100) is a desktop model that has a 10 MHz Motorola 68000 CPU, 1 MB RAM, 128k EPROM for boot program storage, 2 kB NVRAM for configuration storage, a LocalTalk Mini-DIN-8 connector, a serial port Mini-DIN-8 connector, a BNC connector, and an AUI connector, and is powered by an external power supply (a 16 VAC, 1 A transformer connected by a 2.5 mm plug). This model requires a software download when it is powered on to be able to operate. The GatorBox CS (model: 10101) is a desktop model that uses an internal power supply (120/240 V, 1.0 A, 50–60 Hz). The GatorMIM CS is a media interface module that fits in a Cabletron Multi-Media Access Center (MMAC). The GatorBox CS/Rack (model: 10104) is a rack-mountable version of the GatorBox CS that uses an internal power supply (120/240 V, 1.0 A, 50–60 Hz).
The GatorStar GXM integrates the GatorMIM CS with a 24 port LocalTalk repeater.[1] The GatorStar GXR integrates the GatorBox CS/Rack with a 24 port LocalTalk repeater.[2] This model does not have a BNC connector and the serial port is a female DE-9 connector. All "CS" models have 2 MB of memory and can boot from images of the software that have been downloaded into the EPROM using the GatorInstaller application.

Software
There are three disks in the GatorBox software package. Note that the content of the disks for an original GatorBox is different from that of the GatorBox CS models.
Configuration - contains GatorKeeper, MacTCP folder and either GatorInstaller (for CS models) or GatorBox TFTP and GatorBox UDP-TFTP (for original GatorBox model)
Application - contains GatorSystem, GatorPrint or GatorShare, which is the software that runs in the GatorBox. The application software for the GatorBox CS product family has a "CS" at the end of the filename. GatorPrint includes GatorSystem functionality. GatorShare includes GatorSystem and GatorPrint functionality.
Network Applications - NCSA Telnet, UnStuffit

Software Requirements
The GatorKeeper 2.0 application requires:
Macintosh System version 6.0.2 up to 7.5.1 and Finder version 6.1 (or later)
MacTCP (not Open Transport)[3]

See also
Kinetics FastPath
Line Printer Daemon protocol – Print Spooling
LocalTalk-to-Ethernet bridge – Other LocalTalk-to-Ethernet bridges/routers
MacIP – TCP/IP Gateway

References
McCoy, Michael (August 1991). Setting Up Your GatorBox - Hardware Installation Guide. Cayman Systems. pp. 1–1, A-1–2. ^ Data Communication Network at the ASRM Facility - See 3.1.9 ^ "Glossary of Macintosh Networking terms - See GatorStar". Archived from the original on 2006-10-03. Retrieved 2007-01-25. ^ Christopher, Mason. "GatorBox Software".

External links
GatorBox CS configuration information
Internet Archive copy of a configuration guide produced by the University of Illinois
Juiced.GS magazine Volume 10, Issue 4 (Dec 2005) contains an article on how to set up a GatorBox for use with an Apple IIgs
Software and scanned manuals for the GatorBox and GatorBox CS
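Returning to the GatorPrint capability described above: a minimal sketch of how a Unix host on the Ethernet side might queue a print job to a LocalTalk printer through the standard BSD lpr command. This is not taken from the GatorBox manual; the queue name "localtalk-lw" is a hypothetical example, and the real name would come from the site's lpd/GatorBox configuration.

import subprocess

def print_via_gatorprint(path: str, queue: str = "localtalk-lw") -> None:
    # Queue a file on an lpd print queue; -P selects the named printer queue.
    subprocess.run(["lpr", "-P", queue, path], check=True)

if __name__ == "__main__":
    print_via_gatorprint("report.ps")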
en-wikipedia-org-7147 ---- Hackathon - Wikipedia

A hackathon (also known as a hack day, hackfest, datathon or codefest; a portmanteau of "hacking marathon") is a design-sprint-like event in which computer programmers and others involved in software development, including graphic designers, interface designers, project managers, domain experts, and others, collaborate intensively on software projects. The goal of a hackathon is to create functioning software or hardware by the end of the event.[1] Hackathons tend to have a specific focus, which can include the programming language used, the operating system, an application, an API, or the subject and the demographic group of the programmers. In other cases, there is no restriction on the type of software being created.

Etymology
The word "hackathon" is a portmanteau of the words "hack" and "marathon", where "hack" is used in the sense of exploratory programming, not its alternate meaning as a reference to breaching computer security. OpenBSD's apparent first use of the term referred to a cryptographic development event held in Calgary on June 4, 1999,[2] where ten developers came together to avoid legal problems caused by export regulations of cryptographic software from the United States. Since then, a further three to five events per year have occurred around the world to advance development, generally on university campuses. For Sun Microsystems, the usage referred to an event at the JavaOne conference from June 15 to June 19, 1999; there John Gage challenged attendees to write a program in Java for the new Palm V using the infrared port to communicate with other Palm users and register it on the Internet. Starting in the mid-to-late 2000s, hackathons became significantly more widespread and began to be increasingly viewed by companies and venture capitalists as a way to quickly develop new software technologies, and to locate new areas for innovation and funding. Some major companies were born from these hackathons, such as GroupMe, which began as a project at a hackathon at the TechCrunch Disrupt 2010 conference; in 2011 it was acquired by Skype for $85 million. The software PhoneGap began as a project at the iPhoneDevCamp (later renamed iOSDevCamp) in 2008;[3] the company whose engineers developed PhoneGap, Nitobi, refocused itself around PhoneGap, and Nitobi was bought by Adobe in 2011 for an undisclosed amount.[4]

Structure
Hackathons typically start with communication via a presentation or a web page from the hosting organization that mentions the objectives, terms, and details of the hackathon. Developers register to participate in the hackathon and are qualified after the organization screens their background and skills. When the hackathon event begins, the participating individuals or teams start their programming work.
The administrator of the hackathon is typically able to answer questions and offer help when issues come up during the event. Hackathons can last from several hours to several days. For hackathons that last 24 hours or longer, especially competitive ones, eating is often informal, with participants often subsisting on food like pizza and energy drinks. Sometimes sleeping is informal as well, with participants sleeping on-site in sleeping bags. At the end of hackathons, there is usually a series of demonstrations in which each group presents its results. To capture the ideas and work in progress, people often post a video of the demonstrations, blog about the results with screenshots and details, share links and progress on social media, suggest a place for the open source code, and generally make it possible for others to share, learn from, and possibly build on the ideas generated and the initial work completed. There is sometimes a contest element as well, in which a panel of judges selects the winning teams and prizes are given. At many hackathons, the judges are made up of organisers and sponsors. At BarCamp-style hackathons that are organised by the development community, such as iOSDevCamp, the judges are usually made up of peers and colleagues in the field. Such prizes are sometimes a substantial amount of money: a social gaming hackathon at the TechCrunch Disrupt conference offered $250,000 in funding to the winners, while a controversial[5] 2013 hackathon run by Salesforce.com had a payout of $1 million to the winners, billed as the largest-ever prize.[6]

Types of hackathons

For an application type
Some hackathons focus on a particular platform such as mobile apps, a desktop operating system, web development or video game development. Mobile app hackathons like Over the Air, held at Phoenix Park, Ireland, can see a large amount of corporate sponsorship and interest.[7][8] Music Hack Day, a hackathon for music-related software and hardware applications, is a popular event, having been held over 30 times around the world since 2009.[9] Music Tech Fest, a three-day interdisciplinary festival for music ideas that brings together musicians with hackers, researchers and industry, also features a hackathon.[10] Similarly, Science Hack Day, a hackathon for making things with science, has been held over 45 times in over 15 countries around the world since 2010.[11] Hackathons have been held to develop applications that run on various mobile device operating systems, such as Android,[12] iOS[13] and MeeGo.[14] Hackathons have also been held to develop video-based applications and computer games.[15] Hackathons where video games are developed are sometimes called game jams. "TV Hackfest" events have been held in both London[16] and San Francisco,[17] focusing mainly on social television and second screen technologies. In TV Hackfests, challenge briefs are typically submitted by content producers and brands, in the form of broadcast industry metadata or video content, while sponsors supply APIs, SDKs and pre-existing open source software code.[18] Hackathons have also been used in the life sciences to advance the informatics infrastructure that supports research.
The Open Bioinformatics Foundation ran two hackathons for its member projects in 2002 and 2003, and since 2010 has held 2-day "codefests" preceding its annual conference.[19] The National Evolutionary Synthesis Center has co-organized and sponsored hackathons for evolutionary bioinformatics since 2006.[20][21] BioHackathon[22] is an annual event that started in 2008 targeted at advancing standards to enable interoperable bioinformatics tools and Web services. Neuroscientists have also used hackathons to bring developers and scientists together to address issues that range from focusing on a specific information system (e.g., Neurosynth Hackathon[23] and the Allen Brain Atlas Hackathon[24]) and providing reserved time for broad scientific inquiry (e.g., Brainhack),[25] to using specific challenges that focus hacking activity (e.g., HBM Hackathon).[26] There has been an emergence of 'datathons' or data-focused hackathons in recent years.[27][28][29] These events challenge data scientists and others to use creativity and data analysis skills and platforms to build, test and explore solutions and dashboards which analyse huge datasets in a limited amount of time. These are increasingly being used to deliver insights in big public and private datasets in various disciplines including business,[30] health care[31][32] news media[33] and for social causes.[34] Using a specific programming language, API, or framework[edit] There have been hackathons devoted to creating applications that use a specific language or framework, like JavaScript,[35] Node.js,[36] HTML5[37] and Ruby on Rails.[38] Some hackathons focus on applications that make use of the application programming interface, or API, from a single company or data source. Open Hack, an event run publicly by Yahoo! since 2006 (originally known as "Hack Day", then "Open Hack Day"), has focused on usage of the Yahoo! API, in addition to APIs of websites owned by Yahoo!, like Flickr.[39] The company's Open Hack India event in 2012 had over 700 attendees.[40] Google has run similar events for their APIs,[41] as has the travel guide company Lonely Planet.[42] The website Foursquare notably held a large, global hackathon in 2011, in which over 500 developers at over 30 sites around the world competed to create applications using the Foursquare API.[43] A second Foursquare hackathon, in 2013, had around 200 developers.[44] The IETF organizes Hackathons for each IETF meetings which are focused on IETF Internet Draft and IETF RFC implementation for better inter-operability and improved Internet Standards.[45] For a cause or purpose[edit] There have been a number of hackathons devoted to improving government, and specifically to the cause of open government.[46] One such event, in 2011, was hosted by the United States Congress.[47] Starting in 2012, NASA has been annually hosting the International Space Apps Challenge. In 2014, the British government and HackerNest ran DementiaHack,[48] the world's first hackathon dedicated to improving the lives of people living with dementia and their caregivers.[49][50] The series continues in 2015, adding the Canadian government and Facebook as major sponsors.[51] The Global Game Jam, the largest video game development hackathon,[52] often includes optional requirements called 'diversifiers'[53] that aim to promote game accessibility and other causes. 
Various hackathons have been held to improve city transit systems.[54] Hackathons aimed at improvements to city local services are increasing, with one of the London Councils (Hackney) creating a number of successful local solutions with a two-day Hackney-thon.[55] There have also been a number of hackathons devoted to improving education, including Education Hack Day[56] and on a smaller scale, looking specifically at the challenges of field work based geography education, the Field Studies Council[57] hosted FSCHackday.[58] Random Hacks of Kindness is another popular hackathon, devoted to disaster management and crisis response.[59] ThePort[60] instead is a hackathon devoted to solving humanitarian, social and public interest challenges. It's hosted by CERN with partners from other non-governmental organizations such as ICRC and UNDP. In March 2020, numerous world-wide initiatives led by entrepreneurs and governmental representatives from European countries resulted in a series of anti-crisis hackathons Hack the Crisis, with first to happen in Estonia,[61] followed up by Poland,[62] Latvia, and Ukraine. As a tribute or a memorial[edit] A number of hackathons around the world have been planned in memory of computer programmer and internet activist Aaron Swartz, who died in 2013.[63][64][65][66] For a demographic group[edit] Some hackathons are intended only for programmers within a certain demographic group, like teenagers, college students, or women.[67] Hackathons at colleges have become increasingly popular, in the United States and elsewhere. These are usually annual or semiannual events that are open to college students at all universities. They are often competitive, with awards provided by the University or programming-related sponsors. Many of them are supported by the organization Major League Hacking, which was founded in 2013 to assist with the running of collegiate hackathons. PennApps at the University of Pennsylvania was the first student-run college hackathon; in 2015 it became the largest college hackathon with its 12th iteration hosting over 2000 people and offering over $60k in prizes.[68][69] The University of Mauritius Computer Club and Cyberstorm.mu organized a Hackathon dubbed "Code Wars" focused on implementing an IETF RFC in Lynx in 2017.[70][71] ShamHacks at Missouri University of Science and Technology is held annually as an outreach activity of the campus's Curtis Laws Wilson Library. ShamHacks 2018[72] focused on problem statements to better quality of life factors for US veterans, by pairing with veteran-owned company sponsors.[73] For internal innovation and motivation[edit] Some companies hold internal hackathons to promote new product innovation by the engineering staff. For example, Facebook's Like button was conceived as part of a hackathon.[74] To connect local tech communities[edit] Some hackathons (such as StartupBus, founded in 2010 in Australia) combine the competitive element with a road trip, to connect local tech communities in multiple cities along the bus routes. This is now taking place across North America, Europe, Africa and Australasia.[75] Code sprints[edit] Not to be confused with Scrum (software development) § Sprint. In some hackathons, all work is on a single application, such as an operating system, programming language, or content management system. 
Such events are often known as "code sprints", and are especially popular for open source software projects, where such events are sometimes the only opportunity for developers to meet face-to-face.[76] Code sprints typically last from one week to three weeks and often take place near conferences that most of the team attend. Unlike other hackathons, these events rarely include a competitive element. The annual hackathon to work on the operating system OpenBSD, held since 1999, is one such event; it may have originated the word "hackathon".[citation needed]

Edit-a-thon
An edit-a-thon (a portmanteau of editing marathon) is an event where editors of online communities such as Wikipedia, OpenStreetMap (also called a "mapathon"), and LocalWiki edit and improve a specific topic or type of content. The events typically include basic editing training for new editors.

Controversies
A team at the September 2013 TechCrunch Disrupt Hackathon presented the TitStare app, which allowed users to post and view pictures of men staring at women's cleavage.[77] TechCrunch issued an apology later that day.[78] A November 2013 hackathon run by Salesforce.com, billed as having the largest-ever grand prize at $1 million, was accused of impropriety after it emerged that the winning entrants, a two-person startup called Upshot, had been developing the technology that they demoed for over a year and that one of the two was a former Salesforce employee.[5] Major League Hacking expelled a pair of hackers from the September 2015 hackathon Hack the North at the University of Waterloo for making jokes that were interpreted as bomb threats, leading many hackers to criticize the organization.[79] As a result of the controversy, Victor Vucicevich resigned from the Hack the North organizing team.[80] Use of hackathon participants as de facto unpaid laborers by some commercial ventures has been criticized as exploitative.[81][82]:193–194

Notable events
MHacks HackMIT Junction (hackathon)

See also
Game Jam Installfest Editathon Charrette Startup Weekend Campus Party

References
^ "Hackathon definition". dictionary.com. ^ "OpenBSD Hackathons". OpenBSD. Retrieved 2015-04-10. ^ PhoneGap: It's Like AIR for the IPhone Archived 2013-03-10 at the Wayback Machine, Dave Johnson, PhoneGap Blog, 18 September 2008 ^ Adobe Acquires Developer Of HTML5 Mobile App Framework PhoneGap Nitobi, Leena Rao, TechCrunch, October 3, 2011 ^ a b Biddle, Sam (November 22, 2013). "The "Biggest Hackathon Prize In History" Was Won By Cheaters". Valleywag. ^ Williams, Alex (November 21, 2013). "Two Harvard University Alum Win Disputed Salesforce $1M Hackathon Prize At Dreamforce [Updated]". TechCrunch. ^ Hackers Get Hired At Bletchley Park Archived 2011-09-26 at the Wayback Machine, HuffPost Tech UK, September 19, 2011 ^ "Mobile App Hackathon - TechVenture 2011". 21 December 2011. Archived from the original on 21 December 2011. Retrieved 16 March 2018. ^ "Music Hack Day homepage". Musichackday.org. Retrieved 2013-10-09. ^ Rich, L. J. (2014-04-20). "Music Hackathon at Music Tech Fest in Boston". BBC News. BBC.com. Retrieved 2015-03-05. ^ "Science Hack Day homepage". Sciencehackday.org. Retrieved 2014-12-09. ^ "Android Hackathon". Android Hackathon. 2010-03-13. Retrieved 2013-10-09. ^ "iOSDevCamp 2011 Hackathon".
Iosdevcamp.org. Retrieved 2013-10-09. ^ "N9 Hackathon" (in German). Metalab.at. Retrieved 2013-10-09. ^ "Nordeus 2011 Game Development Hackathon". Seehub.me. Archived from the original on 2013-10-29. Retrieved 2013-10-09. ^ "TV Hackfest homepage". Hackfest.tv. Retrieved 2013-10-09. ^ "Article on TV Hackfest San Francisco". Techzone360.com. 2012-12-19. Retrieved 2013-10-09. ^ "PDF of Feature article on TV Hackfest in AIB The Channel" (PDF). Archived from the original (PDF) on 2014-02-26. Retrieved 2013-10-09. ^ "OBF Hackathons". Open-bio.org. 2013-03-12. Retrieved 2013-10-09. ^ "NESCent-sponsored Hackathons". Informatics.nescent.org. Retrieved 2013-10-09. ^ T Hill (2007-12-14). "Hilmar Lapp, Sendu Bala, James P. Balhoff, Amy Bouck, Naohisa Goto, Mark Holder, Richard Holland, et al. 2007. "The 2006 NESCent Phyloinformatics Hackathon: A Field Report." Evolutionary Bioinformatics Online 3: 287–296". La-press.com. Retrieved 2013-10-09. ^ "biohackathon.org". biohackathon.org. Retrieved 2013-10-09. ^ "hackathon.neurosynth.org". hackathon.neurosynth.org. Archived from the original on 2013-12-02. Retrieved 2013-10-09. ^ "2012 Allen Brain Atlas Hackathon - Hackathon - Allen Brain Atlas User Community". Community.brain-map.org. 2012-09-04. Archived from the original on 2013-12-02. Retrieved 2013-10-09. ^ "Brainhack.org". Brainhack.org. Retrieved 2013-10-09. ^ "HBM Hackathon - Organization for Human Brain Mapping". Humanbrainmapping.org. Retrieved 2013-10-09. ^ "Datathon 2020 the International Sata Science Hackathon". Data Science Society. Retrieved 16 December 2020. ^ "Datathon 2020". Data Republic. Retrieved 16 December 2020. ^ "WiDS Datathon 2021". Women in Data Science. Retrieved 16 December 2020. ^ "KPMG Datathon Challenge". KPMG Malaysia. ^ PubMed: US National Library of Medicine https://www.ncbi.nlm.nih.gov/pmc/?term=datathon. Retrieved 16 December 2020. Missing or empty |title= (help) ^ Aboab, Jerome; Celi, Leo; Charlton, Peter; Feng, Mengling (6 April 2016). "A "datathon" model to support cross-disciplinary collaboration". Science Translational Medicine. 8 (333): 8. doi:10.1126/scitranslmed.aad9072. PMC 5679209. PMID 27053770. ^ "Hack the News Datathon". Data Science Society. ^ "Datathon for Social Good". Our Community. Retrieved 16 December 2020. ^ DownCityJS, the Providence JavaScript Hackathon Archived 2014-03-25 at the Wayback Machine ^ Knockout, Node. "Node Knockout". www.nodeknockout.com. Retrieved 16 March 2018. ^ HTML5 App Hackathon Archived 2014-03-25 at the Wayback Machine, May 5–6, 2012, Berlin, Germany ^ "Pune Rails Hackathon: July 29-30, 2006". Punehackathon.pbworks.com. Retrieved 2013-10-09. ^ Open! Hack! Day!, Flickr blog, September 3, 2008 ^ Purple in Bangalore – Inside Yahoo! Open Hack India 2012 Archived 2013-10-21 at the Wayback Machine, Pushpalee Johnson, August 11, 2012, YDN Blog ^ "Google Hackathon • Vivacity 2015". Vivacity. 2014-12-25. Archived from the original on 2015-01-26. Retrieved 2015-01-10. ^ "Melbourne Hack Day: List Of Presentations And Winners". Archived from the original on 2011-04-22. ^ The hackathon heard round the world! Archived 2012-03-01 at the Wayback Machine, Foursquare blog, September 20, 2011 ^ If you build it, they will come. Check out all the cool new things you can do with Foursquare! #hackathon Archived 2013-04-29 at the Wayback Machine, Foursquare blog, January 8, 2013 ^ "IETF Hackathon". www.ietf.org. Retrieved 2017-12-18. 
^ Open government hackathons matter, Mark Headd, govfresh, August 24, 2011 ^ In #HackWeTrust - The House of Representatives Opens Its Doors to Transparency Through Technology, Daniel Schuman, Sunlight Foundation blog, December 8, 2011 ^ Toronto dementia hackathon 12-14 September, Dr. John Preece, British Foreign & Commonwealth Office Blogs, August 8, 2014 ^ Toronto hackathon to target dementia challenges with innovative ideas, British High Commission Ottawa, GOV.UK, July 25, 2014 ^ HackerNest hooks up with British Consulate-General Toronto for new DementiaHack, Joseph Czikk, Betakit, August 12, 2014 ^ "DementiaHack - HackerNest". Archived from the original on 2014-12-16. Retrieved 2015-09-03. ^ "About the Global Game Jam". GlobalGameJam. 2013-09-13. Retrieved 19 April 2016. ^ "Global Game Jam Diversifiers". GlobalGameJam. 2014-01-21. Retrieved 19 April 2016. ^ All aboard the transit hackathon express Archived 2012-01-08 at the Wayback Machine, Roberto Rocha, The Gazette, December 16, 2011 ^ "Hackney Hackathon succeeds in new services". 2014-11-20. Retrieved 17 July 2015. ^ "Education Hack Day". Education Hack Day. Retrieved 2013-10-09. ^ Council, Field Studies. "Page Not Found - FSC". www.field-studies-council.org. Retrieved 16 March 2018. Cite uses generic title (help) ^ "fschackday.org". fschackday.org. Retrieved 2013-10-09. ^ NASA, Microsoft, Google Hosting Hackathon, Elizabeth Montalbano, InformationWeek, June 7, 2010 ^ "THE Port". theport.ch. Retrieved 2017-12-13. ^ "Estonia organized a public-private e-hackatlon to hack the crisis". Retrieved 16 December 2020. ^ "Anti-crisis hackers join forces to find COVID-19 solutions". Retrieved 16 December 2020. ^ Rocheleau, Matt. "In Aaron Swartz' memory, hackathons to be held across globe, including at MIT, next month". Boston Globe. Retrieved 17 October 2013. ^ Doctorow, Cory. "Aaron Swartz hackathon". Boing Boing. Retrieved 17 October 2013. ^ Sifry, Micah L. "techPresident". Personal Democracy Media. Retrieved 11 October 2013. ^ "Aaron Swartz Hackathon". Archived from the original on 29 March 2014. Retrieved 30 October 2013. ^ Female Geeks Flex Their Skills At Ladies-Only Hackathon, Jed Lipinski, Fast Company, September 14, 2011 ^ World's largest student hackathon descends on Wells Fargo Center, Philadelphia Business Journal ^ Student computer whizzes compete at PennApps Hackathon, Philly.com ^ "Code Wars". University Of Mauritius Computer Club. 2017-09-13. Retrieved 2017-10-20. ^ "UoM CodeWars 2017 - Real life code implementations ! - Codarren". Codarren. 2017-09-26. Retrieved 2017-10-20. ^ Goetz, Nicole (1 September 2017). "ShamHacks: Missouri S&T hackathon". ShamHacks. Retrieved 4 April 2018. ^ Sheeley, Andrew (15 February 2018). "ShamHacks' first hackathon benefits veterans and students". Phelps County Focus. Retrieved 5 April 2018. ^ "Stay focused and keep hacking". www.facebook.com. Retrieved 16 March 2018. ^ "Local Talent Drives Startup Culture In Tampa Bay". 83Degrees. Retrieved 2017-08-15. ^ A.Sigfridsson, G. Avram, A. Sheehan and D. K. Sullivan "Sprint-driven development: working, learning and the process of enculturation in the PyPy community" in the Proceedings of the Third International Conference on Open Source Systems, Limerick, Ireland, June 11–13, 2007, Springer, pp. 133-146 ^ "Meet 'Titstare,' the Tech World's Latest 'Joke' from the Minds of Brogrammers". The Wire. 2013-09-09. Retrieved 2015-11-09. ^ "An Apology From". TechCrunch. Retrieved 2015-11-09. ^ Mike Swift (2015-09-19). "When Jokes go too Far". 
Major League Hacking. Retrieved 2016-06-06. ^ Victor Vucicevich (2015-09-23). "Leaving Hack the North". Medium. Retrieved 2016-06-06. ^ "Sociologists Examine Hackathons and See Exploitation". Wired. ISSN 1059-1028. Retrieved 2020-11-26. ^ Dariusz Jemielniak; Aleksandra Przegalinska (18 February 2020). Collaborative Society. MIT Press. ISBN 978-0-262-35645-9.

External links
"Media-Making Strategies to Support Community and Learning at Hackathons". MIT Center for Civic Media. June 30, 2014.
"Demystifying the hackathon". Article from McKinsey, October 2015.

en-wikipedia-org-7772 ---- Distributed hash table - Wikipedia

A distributed hash table (DHT) is a distributed system that provides a lookup service similar to a hash table: key-value pairs are stored in a DHT, and any participating node can efficiently retrieve the value associated with a given key. The main advantage of a DHT is that nodes can be added or removed with minimum work around re-distributing keys.
Keys are unique identifiers which map to particular values, which in turn can be anything from addresses, to documents, to arbitrary data.[1] Responsibility for maintaining the mapping from keys to values is distributed among the nodes, in such a way that a change in the set of participants causes a minimal amount of disruption. This allows a DHT to scale to extremely large numbers of nodes and to handle continual node arrivals, departures, and failures. DHTs form an infrastructure that can be used to build more complex services, such as anycast, cooperative web caching, distributed file systems, domain name services, instant messaging, multicast, and also peer-to-peer file sharing and content distribution systems. Notable distributed networks that use DHTs include BitTorrent's distributed tracker, the Coral Content Distribution Network, the Kad network, the Storm botnet, the Tox instant messenger, Freenet, the YaCy search engine, and the InterPlanetary File System. Distributed hash tables Contents 1 History 2 Properties 3 Structure 3.1 Keyspace partitioning 3.1.1 Consistent hashing 3.1.2 Rendezvous hashing 3.1.3 Locality-preserving hashing 3.2 Overlay network 3.3 Algorithms for overlay networks 4 Security 5 Implementations 6 Examples 6.1 DHT protocols and implementations 6.2 Applications using DHTs 7 See also 8 References 9 External links History[edit] DHT research was originally motivated, in part, by peer-to-peer (P2P) systems such as Freenet, Gnutella, BitTorrent and Napster, which took advantage of resources distributed across the Internet to provide a single useful application. In particular, they took advantage of increased bandwidth and hard disk capacity to provide a file-sharing service.[2] These systems differed in how they located the data offered by their peers. Napster, the first large-scale P2P content delivery system, required a central index server: each node, upon joining, would send a list of locally held files to the server, which would perform searches and refer the queries to the nodes that held the results. This central component left the system vulnerable to attacks and lawsuits. Gnutella and similar networks moved to a query flooding model – in essence, each search would result in a message being broadcast to every other machine in the network. While avoiding a single point of failure, this method was significantly less efficient than Napster. Later versions of Gnutella clients moved to a dynamic querying model which vastly improved efficiency.[3] Freenet is fully distributed, but employs a heuristic key-based routing in which each file is associated with a key, and files with similar keys tend to cluster on a similar set of nodes. Queries are likely to be routed through the network to such a cluster without needing to visit many peers.[4] However, Freenet does not guarantee that data will be found. Distributed hash tables use a more structured key-based routing in order to attain both the decentralization of Freenet and Gnutella, and the efficiency and guaranteed results of Napster. One drawback is that, like Freenet, DHTs only directly support exact-match search, rather than keyword search, although Freenet's routing algorithm can be generalized to any key type where a closeness operation can be defined.[5] In 2001, four systems—CAN,[6] Chord,[7] Pastry, and Tapestry—ignited DHTs as a popular research topic. 
A project called the Infrastructure for Resilient Internet Systems (Iris) was funded by a $12 million grant from the United States National Science Foundation in 2002.[8] Researchers included Sylvia Ratnasamy, Ion Stoica, Hari Balakrishnan and Scott Shenker.[9] Outside academia, DHT technology has been adopted as a component of BitTorrent and in the Coral Content Distribution Network.

Properties
DHTs characteristically emphasize the following properties:
Autonomy and decentralization: the nodes collectively form the system without any central coordination.
Fault tolerance: the system should be reliable (in some sense) even with nodes continuously joining, leaving, and failing.[10]
Scalability: the system should function efficiently even with thousands or millions of nodes.
A key technique used to achieve these goals is that any one node needs to coordinate with only a few other nodes in the system – most commonly, O(log n) of the n participants (see below) – so that only a limited amount of work needs to be done for each change in membership. Some DHT designs seek to be secure against malicious participants[11] and to allow participants to remain anonymous, though this is less common than in many other peer-to-peer (especially file sharing) systems; see anonymous P2P. Finally, DHTs must deal with more traditional distributed systems issues such as load balancing, data integrity, and performance (in particular, ensuring that operations such as routing and data storage or retrieval complete quickly).

Structure
The structure of a DHT can be decomposed into several main components.[12][13] The foundation is an abstract keyspace, such as the set of 160-bit strings. A keyspace partitioning scheme splits ownership of this keyspace among the participating nodes. An overlay network then connects the nodes, allowing them to find the owner of any given key in the keyspace. Once these components are in place, a typical use of the DHT for storage and retrieval might proceed as follows. Suppose the keyspace is the set of 160-bit strings. To index a file with a given filename and data in the DHT, the SHA-1 hash of filename is generated, producing a 160-bit key k, and a message put(k, data) is sent to any node participating in the DHT. The message is forwarded from node to node through the overlay network until it reaches the single node responsible for key k as specified by the keyspace partitioning. That node then stores the key and the data. Any other client can then retrieve the contents of the file by again hashing filename to produce k and asking any DHT node to find the data associated with k with a message get(k). The message will again be routed through the overlay to the node responsible for k, which will reply with the stored data. The keyspace partitioning and overlay network components are described below with the goal of capturing the principal ideas common to most DHTs; many designs differ in the details.
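To make the storage-and-retrieval walkthrough above concrete, here is a minimal sketch in Python. It is not the API of any particular DHT implementation: the DHTNode class, its put/get methods, and the single-process stand-in for the overlay are hypothetical; a real node would forward the message through the overlay rather than store everything locally.

import hashlib

def key_for(filename: str) -> int:
    # 160-bit key derived from the filename with SHA-1, as in the walkthrough.
    return int.from_bytes(hashlib.sha1(filename.encode()).digest(), "big")

class DHTNode:
    def __init__(self):
        self.store = {}          # stand-in for "the node responsible for k"

    def put(self, k: int, data: bytes) -> None:
        self.store[k] = data     # a real node forwards when k is outside its range

    def get(self, k: int):
        return self.store.get(k) # a real node routes the lookup through the overlay

node = DHTNode()                 # "any node participating in the DHT"
k = key_for("example.txt")
node.put(k, b"file contents")
assert node.get(key_for("example.txt")) == b"file contents"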
Keyspace partitioning
Most DHTs use some variant of consistent hashing or rendezvous hashing to map keys to nodes. The two algorithms appear to have been devised independently and simultaneously to solve the distributed hash table problem. Both consistent hashing and rendezvous hashing have the essential property that removal or addition of one node changes only the set of keys owned by the nodes with adjacent IDs, and leaves all other nodes unaffected. Contrast this with a traditional hash table, in which addition or removal of one bucket causes nearly the entire keyspace to be remapped. Since any change in ownership typically corresponds to bandwidth-intensive movement of objects stored in the DHT from one node to another, minimizing such reorganization is required to efficiently support high rates of churn (node arrival and failure).

Consistent hashing
Further information: Consistent hashing
Consistent hashing employs a function δ(k1, k2) that defines an abstract notion of the distance between the keys k1 and k2, which is unrelated to geographical distance or network latency. Each node is assigned a single key called its identifier (ID). A node with ID ix owns all the keys km for which ix is the closest ID, measured according to δ(km, ix). For example, the Chord DHT uses consistent hashing, which treats nodes as points on a circle, and δ(k1, k2) is the distance traveling clockwise around the circle from k1 to k2. Thus, the circular keyspace is split into contiguous segments whose endpoints are the node identifiers. If i1 and i2 are two adjacent IDs, with a shorter clockwise distance from i1 to i2, then the node with ID i2 owns all the keys that fall between i1 and i2.

Rendezvous hashing
Further information: Rendezvous hashing
In rendezvous hashing, also called highest random weight (HRW) hashing, all clients use the same hash function h() (chosen ahead of time) to associate a key with one of the n available servers. Each client has the same list of identifiers {S1, S2, ..., Sn}, one for each server. Given some key k, a client computes the n hash weights w1 = h(S1, k), w2 = h(S2, k), ..., wn = h(Sn, k). The client associates that key with the server corresponding to the highest hash weight for that key. A server with ID Sx owns all the keys km for which the hash weight h(Sx, km) is higher than the hash weight of any other node for that key.

Locality-preserving hashing
Further information: Locality-preserving hashing
Locality-preserving hashing ensures that similar keys are assigned to similar objects. This can enable a more efficient execution of range queries; however, in contrast to consistent hashing, there is no longer any assurance that the keys (and thus the load) are uniformly and randomly distributed over the key space and the participating peers. DHT protocols such as Self-Chord and Oscar[14] address such issues. Self-Chord decouples object keys from peer IDs and sorts keys along the ring with a statistical approach based on the swarm intelligence paradigm.[15] Sorting ensures that similar keys are stored by neighbour nodes and that discovery procedures, including range queries, can be performed in logarithmic time. Oscar constructs a navigable small-world network based on random walk sampling, also assuring logarithmic search time.
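A minimal sketch of the two ownership rules described above, assuming a 160-bit circular keyspace and SHA-1 as the hash; the node and server names are illustrative, and the way the server ID and key are combined for HRW scoring is just one simple choice, not a prescribed scheme.

import hashlib

SPACE = 2 ** 160  # 160-bit circular keyspace

def h(value: str) -> int:
    return int.from_bytes(hashlib.sha1(value.encode()).digest(), "big")

# Consistent hashing, Chord style: ownership is decided by clockwise distance
# on the ring, so a key belongs to the first node ID reached going clockwise.
def clockwise_distance(a: int, b: int) -> int:
    return (b - a) % SPACE

def ring_owner(key: int, node_ids: list[int]) -> int:
    return min(node_ids, key=lambda nid: clockwise_distance(key, nid))

# Rendezvous (HRW) hashing: score every server for the key, pick the highest.
def hrw_owner(key: str, server_ids: list[str]) -> str:
    return max(server_ids, key=lambda sid: h(sid + key))

nodes = [h(f"node-{i}") for i in range(4)]          # illustrative node IDs
print(ring_owner(h("example.txt"), nodes))
print(hrw_owner("example.txt", ["S1", "S2", "S3"]))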
Overlay network
Each node maintains a set of links to other nodes (its neighbors or routing table). Together, these links form the overlay network.[16] A node picks its neighbors according to a certain structure, called the network's topology. All DHT topologies share some variant of the most essential property: for any key k, each node either has a node ID that owns k or has a link to a node whose node ID is closer to k, in terms of the keyspace distance defined above. It is then easy to route a message to the owner of any key k using the following greedy algorithm (which is not necessarily globally optimal): at each step, forward the message to the neighbor whose ID is closest to k. When there is no such neighbor, then we must have arrived at the closest node, which is the owner of k as defined above. This style of routing is sometimes called key-based routing. Beyond basic routing correctness, two important constraints on the topology are to guarantee that the maximum number of hops in any route (route length) is low, so that requests complete quickly; and that the maximum number of neighbors of any node (maximum node degree) is low, so that maintenance overhead is not excessive. Of course, having shorter routes requires higher maximum degree. Some common choices for maximum degree and route length are as follows, where n is the number of nodes in the DHT, using Big O notation:

Max. degree | Max. route length | Used in | Note
O(1) | O(n) | | Worst lookup lengths, with likely much slower lookup times
O(1) | O(log n) | Koorde (with constant degree) | More complex to implement, but acceptable lookup time can be found with a fixed number of connections
O(log n) | O(log n) | Chord, Kademlia, Pastry, Tapestry | Most common, but not optimal (degree/route length). Chord is the most basic version, with Kademlia seeming the most popular optimized variant (should have improved average lookup)
O(log n) | O(log n / log(log n)) | Koorde (with optimal lookup) | More complex to implement, but lookups might be faster (have a lower worst-case bound)
O(√n) | O(1) | | Worst local storage needs, with much communication after any node connects or disconnects

The most common choice, O(log n) degree/route length, is not optimal in terms of the degree/route length tradeoff, but such topologies typically allow more flexibility in choice of neighbors. Many DHTs use that flexibility to pick neighbors that are close in terms of latency in the physical underlying network. In general, all DHTs construct navigable small-world network topologies, which trade off route length against network degree.[17] Maximum route length is closely related to diameter: the maximum number of hops in any shortest path between nodes. Clearly, the network's worst-case route length is at least as large as its diameter, so DHTs are limited by the degree/diameter tradeoff[18] that is fundamental in graph theory. Route length can be greater than diameter, since the greedy routing algorithm may not find shortest paths.[19]
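A minimal sketch of the greedy key-based routing described above, under the same Chord-style clockwise distance used earlier. The Node class and its neighbor lists are hypothetical; a real DHT would also handle routing-table maintenance, timeouts, and parallel lookups, and correctness of the greedy walk relies on the topology property stated in the text (every node knows a strictly closer neighbor unless it owns the key).

SPACE = 2 ** 160

def distance(a: int, b: int) -> int:
    # Clockwise keyspace distance from a to b.
    return (b - a) % SPACE

class Node:
    def __init__(self, node_id: int):
        self.node_id = node_id
        self.neighbors: list["Node"] = []

    def route(self, key: int) -> "Node":
        current = self
        while True:
            best = min(current.neighbors,
                       key=lambda n: distance(key, n.node_id),
                       default=None)
            if best is None or distance(key, best.node_id) >= distance(key, current.node_id):
                return current   # no strictly closer neighbor: current node owns the key
            current = best       # otherwise forward the message one hop closer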
Algorithms for overlay networks
Aside from routing, there exist many algorithms that exploit the structure of the overlay network for sending a message to all nodes, or a subset of nodes, in a DHT.[20] These algorithms are used by applications to do overlay multicast, range queries, or to collect statistics. Two systems that are based on this approach are Structella,[21] which implements flooding and random walks on a Pastry overlay, and DQ-DHT, which implements a dynamic querying search algorithm over a Chord network.[22]

Security
Because of the decentralization, fault tolerance, and scalability of DHTs, they are inherently more resilient against a hostile attacker than a centralized system.[vague] Open systems for distributed data storage that are robust against massive hostile attackers are feasible.[23] A DHT system that is carefully designed to have Byzantine fault tolerance can defend against a security weakness, known as the Sybil attack, which affects all current DHT designs.[24][25] Petar Maymounkov, one of the original authors of Kademlia, has proposed a way to circumvent the weakness to the Sybil attack by incorporating social trust relationships into the system design.[26] The new system, codenamed Tonika and also known by its domain name as 5ttt, is based on an algorithm design known as "electric routing" and co-authored with the mathematician Jonathan Kelner.[27] Maymounkov has now undertaken a comprehensive implementation effort of this new system. However, research into effective defences against Sybil attacks is generally considered an open question, and a wide variety of potential defences are proposed every year in top security research conferences.[citation needed]

Implementations
The most notable differences encountered in practical instances of DHT implementations include at least the following:
The address space is a parameter of the DHT. Several real-world DHTs use a 128-bit or 160-bit key space.
Some real-world DHTs use hash functions other than SHA-1.
In the real world the key k could be a hash of a file's content rather than a hash of a file's name, to provide content-addressable storage, so that renaming of the file does not prevent users from finding it. Some DHTs may also publish objects of different types. For example, key k could be the node ID and the associated data could describe how to contact this node. This allows publication-of-presence information and is often used in IM applications, etc. In the simplest case, ID is just a random number that is directly used as key k (so in a 160-bit DHT, ID will be a 160-bit number, usually randomly chosen). In some DHTs, publishing of nodes' IDs is also used to optimize DHT operations.
Redundancy can be added to improve reliability. The (k, data) key pair can be stored in more than one node corresponding to the key. Usually, rather than selecting just one node, real-world DHT algorithms select i suitable nodes, with i being an implementation-specific parameter of the DHT.
In some DHT designs, nodes agree to handle a certain keyspace range, the size of which may be chosen dynamically, rather than hard-coded.
Some advanced DHTs like Kademlia perform iterative lookups through the DHT first in order to select a set of suitable nodes and send put(k, data) messages only to those nodes, thus drastically reducing useless traffic, since published messages are only sent to nodes that seem suitable for storing the key k; and iterative lookups cover just a small set of nodes rather than the entire DHT, reducing useless forwarding. In such DHTs, forwarding of put(k, data) messages may only occur as part of a self-healing algorithm: if a target node receives a put(k, data) message, but believes that k is out of its handled range and a closer node (in terms of DHT keyspace) is known, the message is forwarded to that node.
Otherwise, data are indexed locally. This leads to a somewhat self-balancing DHT behavior. Of course, such an algorithm requires nodes to publish their presence data in the DHT so the iterative lookups can be performed. Since on most machines sending messages is much more expensive than local hash table accesses, it makes sense to bundle many messages concerning a particular node into a single batch. Assuming each node has a local batch consisting of at most b operations, the bundling procedure is as follows. Each node first sorts its local batch by the identifier of the node responsible for the operation. Using bucket sort, this can be done in O(b + n), where n is the number of nodes in the DHT. When there are multiple operations addressing the same key within one batch, the batch is condensed before being sent out. For example, multiple lookups of the same key can be reduced to one or multiple increments can be reduced to a single add operation. This reduction can be implemented with the help of a temporary local hash table. Finally, the operations are sent to the respective nodes.[28] Examples[edit] DHT protocols and implementations[edit] Apache Cassandra BATON Overlay Mainline DHT – standard DHT used by BitTorrent (based on Kademlia as provided by Khashmir)[29] Content addressable network (CAN) Chord Koorde Kademlia Pastry P-Grid Riak Tapestry TomP2P Voldemort Applications using DHTs[edit] BTDigg: BitTorrent DHT search engine Codeen: web caching Coral Content Distribution Network Freenet: a censorship-resistant anonymous network GlusterFS: a distributed file system used for storage virtualization GNUnet: Freenet-like distribution network including a DHT implementation I2P: An open-source anonymous peer-to-peer network I2P-Bote: serverless secure anonymous email IPFS: A content-addressable, peer-to-peer hypermedia distribution protocol JXTA: open-source P2P platform Oracle Coherence: an in-memory data grid built on top of a Java DHT implementation Perfect Dark: a peer-to-peer file-sharing application from Japan Retroshare: a Friend-to-friend network[30] Jami: a privacy-preserving voice, video and chat communication platform, based on a Kademlia-like DHT Tox: an instant messaging system intended to function as a Skype replacement Twister: a microblogging peer-to-peer platform YaCy: a distributed search engine See also[edit] Couchbase Server: a persistent, replicated, clustered distributed object storage system compatible with memcached protocol. Memcached: a high-performance, distributed memory object caching system. Prefix hash tree: sophisticated querying over DHTs. Merkle tree: tree having every non-leaf node labelled with the hash of the labels of its children nodes. Most distributed data stores employ some form of DHT for lookup. Skip graphs are an efficient data structure for implementing DHTs. References[edit] ^ Stoica, I.; Morris, R.; Karger, D.; Kaashoek, M. F.; Balakrishnan, H. (2001). "Chord: A scalable peer-to-peer lookup service for internet applications" (PDF). ACM SIGCOMM Computer Communication Review. 31 (4): 149. doi:10.1145/964723.383071. A value can be an address, a document, or an arbitrary data item. ^ Liz, Crowcroft; et al. (2005). "A survey and comparison of peer-to-peer overlay network schemes" (PDF). IEEE Communications Surveys & Tutorials. 7 (2): 72–93. CiteSeerX 10.1.1.109.6124. doi:10.1109/COMST.2005.1610546. ^ Richter, Stevenson; et al. (2009). "Analysis of the impact of dynamic querying models on client-server relationships". 
Trends in Modern Computing: 682–701. ^ Searching in a Small World Chapters 1 & 2 (PDF), retrieved 2012-01-10 ^ "Section 5.2.2" (PDF), A Distributed Decentralized Information Storage and Retrieval System, retrieved 2012-01-10 ^ Ratnasamy; et al. (2001). "A Scalable Content-Addressable Network" (PDF). In Proceedings of ACM SIGCOMM 2001. Retrieved 2013-05-20. Cite journal requires |journal= (help) ^ Hari Balakrishnan, M. Frans Kaashoek, David Karger, Robert Morris, and Ion Stoica. Looking up data in P2P systems. In Communications of the ACM, February 2003. ^ David Cohen (October 1, 2002). "New P2P network funded by US government". New Scientist. Retrieved November 10, 2013. ^ "MIT, Berkeley, ICSI, NYU, and Rice Launch the IRIS Project". Press release. MIT. September 25, 2002. Archived from the original on September 26, 2015. Retrieved November 10, 2013. ^ R Mokadem, A Hameurlain and AM Tjoa. Resource discovery service while minimizing maintenance overhead in hierarchical DHT systems. Proc. iiWas, 2010 ^ Guido Urdaneta, Guillaume Pierre and Maarten van Steen. A Survey of DHT Security Techniques. ACM Computing Surveys 43(2), January 2011. ^ Moni Naor and Udi Wieder. Novel Architectures for P2P Applications: the Continuous-Discrete Approach. Proc. SPAA, 2003. ^ Gurmeet Singh Manku. Dipsea: A Modular Distributed Hash Table Archived 2004-09-10 at the Wayback Machine. Ph. D. Thesis (Stanford University), August 2004. ^ Girdzijauskas, Šarūnas; Datta, Anwitaman; Aberer, Karl (2010-02-01). "Structured overlay for heterogeneous environments". ACM Transactions on Autonomous and Adaptive Systems. 5 (1): 1–25. doi:10.1145/1671948.1671950. ISSN 1556-4665. ^ Forestiero, Agostino; Leonardi, Emilio; Mastroianni, Carlo; Meo, Michela (October 2010). "Self-Chord: A Bio-Inspired P2P Framework for Self-Organizing Distributed Systems". IEEE/ACM Transactions on Networking. 18 (5): 1651–1664. doi:10.1109/TNET.2010.2046745. ^ Galuba, Wojciech; Girdzijauskas, Sarunas (2009), "Peer to Peer Overlay Networks: Structure, Routing and Maintenance", in LIU, LING; ÖZSU, M. TAMER (eds.), Encyclopedia of Database Systems, Springer US, pp. 2056–2061, doi:10.1007/978-0-387-39940-9_1215, ISBN 9780387399409 ^ Girdzijauskas, Sarunas (2009). Designing peer-to-peer overlays a small-world perspective. epfl.ch. EPFL. ^ The (Degree,Diameter) Problem for Graphs, Maite71.upc.es, archived from the original on 2012-02-17, retrieved 2012-01-10 ^ Gurmeet Singh Manku, Moni Naor, and Udi Wieder. "Know thy Neighbor's Neighbor: the Power of Lookahead in Randomized P2P Networks". Proc. STOC, 2004. ^ Ali Ghodsi. "Distributed k-ary System: Algorithms for Distributed Hash Tables", Archived 22 May 2007 at the Wayback Machine. KTH-Royal Institute of Technology, 2006. ^ Castro, Miguel; Costa, Manuel; Rowstron, Antony (1 January 2004). "Should we build Gnutella on a structured overlay?" (PDF). ACM SIGCOMM Computer Communication Review. 34 (1): 131. CiteSeerX 10.1.1.221.7892. doi:10.1145/972374.972397. ^ Talia, Domenico; Trunfio, Paolo (December 2010). "Enabling Dynamic Querying over Distributed Hash Tables". Journal of Parallel and Distributed Computing. 70 (12): 1254–1265. doi:10.1016/j.jpdc.2010.08.012. ^ Baruch Awerbuch, Christian Scheideler. "Towards a scalable and robust DHT". 2006. doi:10.1145/1148109.1148163 ^ Maxwell Young; Aniket Kate; Ian Goldberg; Martin Karsten. "Practical Robust Communication in DHTs Tolerating a Byzantine Adversary". ^ Natalya Fedotova; Giordano Orzetti; Luca Veltri; Alessandro Zaccagnini. 
"Byzantine agreement for reputation management in DHT-based peer-to-peer networks". doi:10.1109/ICTEL.2008.4652638 ^ Chris Lesniewski-Laas. "A Sybil-proof one-hop DHT" (PDF): 20. Cite journal requires |journal= (help) ^ Jonathan Kelner, Petar Maymounkov (2009). "Electric routing and concurrent flow cutting". arXiv:0909.2859. Bibcode:2009arXiv0909.2859K. Cite journal requires |journal= (help) ^ Sanders, Peter; Mehlhorn, Kurt; Dietzfelbinger, Martin; Dementiev, Roman (2019). Sequential and Parallel Algorithms and Data Structures: The Basic Toolbox. Springer International Publishing. ISBN 978-3-030-25208-3. ^ Tribler wiki Archived December 4, 2010, at the Wayback Machine retrieved January 2010. ^ Retroshare FAQ retrieved December 2011 External links[edit] Distributed Hash Tables, Part 1 by Brandon Wiley. Distributed Hash Tables links Carles Pairot's Page on DHT and P2P research kademlia.scs.cs.nyu.edu Archive.org snapshots of kademlia.scs.cs.nyu.edu Eng-Keong Lua; Crowcroft, Jon; Pias, Marcelo; Sharma, Ravi; Lim, Steve (2005). "IEEE Survey on overlay network schemes". CiteSeerX 10.1.1.111.4197: Cite journal requires |journal= (help) covering unstructured and structured decentralized overlay networks including DHTs (Chord, Pastry, Tapestry and others). Mainline DHT Measurement at Department of Computer Science, University of Helsinki, Finland. v t e BitTorrent Companies BitTorrent, Inc. Vuze, Inc. People Bram Cohen Ross Cohen Eric Klinker Ashwin Navin Justin Sun Technology Glossary Broadcatching Distributed hash tables DNA I2P index Local Peer Discovery Peer exchange Protocol encryption Super-seeding Tracker Torrent file TCP UDP µTP WebRTC WebTorrent Clients (comparison, usage share) Ares Galaxy BitTorrent (original client) BitComet BitLord Deluge Free Download Manager Flashget FrostWire Getright Go!Zilla KTorrent libtorrent (library) LimeWire µTorrent Miro MLDonkey qBittorrent rTorrent Shareaza Tixati Transmission Tribler Vuze (formerly Azureus) WebTorrent Desktop Xunlei Tracker software (comparison) opentracker PeerTracker TorrentPier XBT Tracker Search engines (comparison) 1337x BTDigg Demonoid etree ExtraTorrent EZTV isoHunt Karagarga KickassTorrents Nyaa Torrents The Pirate Bay RARBG Tamil Rockers Torrentz YIFY yourBittorrent Defunct websites BTJunkie Burnbit LokiTorrent Mininova Oink's Pink Palace OpenBitTorrent Suprnova.org t411 Torrent Project TorrentSpy What.CD YouTorrent Related topics aXXo BitTorrent Open Source License Glossary of BitTorrent terms Popcorn Time Slyck.com TorrentFreak Category Commons Retrieved from "https://en.wikipedia.org/w/index.php?title=Distributed_hash_table&oldid=1013584054" Categories: Distributed data storage File sharing Distributed data structures Hash based data structures Network architecture Hashing Hidden categories: CS1 errors: missing periodical Webarchive template wayback links Articles with short description Short description is different from Wikidata Articles needing additional references from September 2020 All articles needing additional references All Wikipedia articles needing clarification Wikipedia articles needing clarification from June 2016 All articles with unsourced statements Articles with unsourced statements from May 2020 Navigation menu Personal tools Not logged in Talk Contributions Create account Log in Namespaces Article Talk Variants Views Read Edit View history More Search Navigation Main page Contents Current events Random article About Wikipedia Contact us Donate Contribute Help Learn to edit Community portal Recent changes 
Upload file Tools What links here Related changes Upload file Special pages Permanent link Page information Cite this page Wikidata item Print/export Download as PDF Printable version Languages Български Català Deutsch Español فارسی Français 한국어 Italiano Magyar Nederlands 日本語 Norsk bokmål Polski Português Русский Српски / srpski Suomi Svenska Türkçe Українська Tiếng Việt 中文 Edit links This page was last edited on 22 March 2021, at 12:23 (UTC). Text is available under the Creative Commons Attribution-ShareAlike License; additional terms may apply. By using this site, you agree to the Terms of Use and Privacy Policy. Wikipedia® is a registered trademark of the Wikimedia Foundation, Inc., a non-profit organization. Privacy policy About Wikipedia Disclaimers Contact Wikipedia Mobile view Developers Statistics Cookie statement en-wikipedia-org-7867 ---- Shure SM57 - Wikipedia Shure SM57 From Wikipedia, the free encyclopedia Jump to navigation Jump to search The Shure SM57 microphone The Shure SM57 is a low-impedance cardioid dynamic microphone made by Shure Incorporated and commonly used in live sound reinforcement and studio recording. It is one of the best-selling microphones in the world. It is used extensively in amplified music and has been used for speeches by every U.S. president since its introduction in 1965.[1] In 2004, honoring its four decades of "solid, dependable performance", it was inducted into the first-ever TEC Awards TECnology Hall of Fame.[1] Contents 1 Background 2 Characteristics 3 Use 4 Specifications 5 See also 6 References 7 External links Background[edit] The origin of SM57 may be traced to 1937, when Shure engineer Benjamin Bauer developed the first single-element directional microphone, the Unidyne, which had a cardioid pickup pattern.[1] In 1959, another Shure engineer, Ernie Seeler, advanced the art of microphone design significantly with the Unidyne III.[1] Seeler torture-tested the Unidyne III during three years of research and development and thereby, produced the SM series of rugged and reliable Shure microphone capsules.[1] The "SM" stands for Studio Microphone;[2] Seeler was an aficionado of classical music and expected the SM57 to be used for orchestras. Because he "despised" rock music, the TEC Foundation said that it was ironic that the microphone has become "a mainstay of rock music."[1] Characteristics[edit] The SM57 uses the same capsule as the popular SM58. Like the SM58, the SM57 is fitted with an XLR connector and features a balanced output, which helps to minimize electrical hum and noise pickup. According to Shure, the frequency response extends from 40 Hertz (Hz) to 15 kHz. The SM57 is manufactured in the United States, Mexico, and China. The Shure A2WS is an accessory windscreen for the SM57 that attenuates wind noise and plosives ("pop" sounds), and protects the microphone capsule. Use[edit] Shure SM57 microphones with A2WS windscreens installed on the lectern of former United States President Barack Obama. The microphone kit (two SM57 microphones, windscreens, microphone stands, and black right-angle XLR cables) is referred to as the VIP/high-profile microphone kit. The SM57 is a popular choice of musicians due to its sturdy construction and ability to work well with instruments that produce high sound pressure levels, such as percussion instruments and electric guitars. 
The School of Audio Engineering (SAE) recommends the SM57 (along with other makes and models) for four roles in a drum kit: kick drum, snare drum, rack toms, and floor tom.[3] The cardioid pickup pattern of the microphone reduces the pickup of unwanted background sound and the generation of acoustic feedback. SM57s have also been a staple when reinforcing the sound from guitar amplifiers. In a more unconventional fashion, the SM57 has been favoured by some as a vocal mic, both live and in the studio. Notable singers known to have recorded vocals with an SM57 include Anthony Kiedis, Brandon Flowers,[4] Madonna,[5] David Bowie,[6] John Lennon,[7] Jack White,[8] Bjork,[9] Peter Gabriel,[10] Paul Rodgers,[11] Tom Waits,[12] Wayne Coyne,[13] Tom Petty [14]Alice Cooper, Erykah Badu,[15] Caleb Followill[16] and Raphael Saadiq.[17] An early model of the mic, the Unidyne 545 was used on Pet Sounds for Brian Wilson's vocal tracks. Every U.S. president since Lyndon B. Johnson has delivered speeches through an SM57.[1] It became the lectern microphone of the White House Communications Agency in 1965, the year of its introduction, and remains so.[18] Due to its popularity, the SM57 has been counterfeited frequently by manufacturers in China and Thailand.[19] Shure Distribution UK reports that the SM57, SM58, Beta 57A, and Beta 58A are their microphones that are most commonly counterfeited.[20] In 2006, Shure mounted a campaign against the trading of counterfeit microphones.[21] Specifications[edit] SM57 Unidyne III, ca. 1984 Type Dynamic Frequency response 40 to 15,000 Hz Polar pattern Cardioid Sensitivity (at 1,000 Hz open circuit voltage) −56.0 dBV/Pa (at 1,000 Hz) Impedance Rated impedance is 150 ohms (300 ohms actual) for connection to microphone inputs rated low impedance Connector Three-pin professional audio connector (male XLR type) Produced 1965–present See also[edit] Shure SM58 References[edit] ^ a b c d e f g TECnology Hall of Fame: 2004 Archived 2013-12-13 at the Wayback Machine ^ History of Shure Incorporated Archived 2008-04-28 at the Wayback Machine ^ "Microphone Placement: Let's take a look at a standard drum kit". SAE. Retrieved April 6, 2011. CS1 maint: discouraged parameter (link) ^ https://reverb.com/news/gear-tribute-the-shure-sm57-from-rumours-to-the-white-house ^ https://web.archive.org/web/20110830193214/http://www.sheppettibone.com/sp_erotica_diaries.htm ^ https://timpalmer.com/wp-content/themes/timpalmer/pdfs/Melody_maker_1989.pdf ^ https://www.soundonsound.com/people/john-lennon-whatever-gets-you-thru-night ^ https://www.soundonsound.com/techniques/inside-track-jack-white ^ http://www.moredarkthanshark.org/eno_int_audproint-oct08.html ^ https://www.youtube.com/watch?v=scmYG1Pv1_Q&feature=youtu.be&t=35m45s ^ https://www.analogplanet.com/content/royal-sessions-finds-paul-rodgers-fine-voice ^ https://www.soundonsound.com/people/bones-howe-tom-waits ^ https://www.youtube.com/watch?v=zzk4AkZw9vc&feature=youtu.be&t=256 ^ https://www.soundonsound.com/techniques/inside-track-tom-pettys-hypnotic-eye ^ https://web.archive.org/web/20171122013017/http://www.emusician.com/gear/1332/earth-sun-moon/39259 ^ https://www.mixonline.com/recording/kings-leon-365832 ^ Farinella, David John (January 1, 2009). "Music: Raphael Saadiq". Mix. Archived from the original on September 21, 2012. Retrieved April 21, 2012. CS1 maint: discouraged parameter (link) ^ Charles J. Kouri; Rose L. Shure; Hayward Blake; John Lee (2001). Shure: sound people, products, and values. 1. Shure Inc. p. xiii. 
ISBN 0-9710738-0-5. ^ Home Recording. Joe Shambro, Spotting a Fake Shure Microphone: How to tell if your mic is genuine—or not ^ Shure Distribution UK. What is a counterfeit? Archived 2009-03-03 at the Wayback Machine ^ Shure Distribution UK. Shure Distribution UK Clamp Down on Counterfeiters Archived 2009-04-25 at the Wayback Machine External links[edit] SM57 official page Sound&Recording - 50 Years of Shure SM57 Retrieved from "https://en.wikipedia.org/w/index.php?title=Shure_SM57&oldid=1000474223" Categories: Microphones Hidden categories: Webarchive template wayback links CS1 maint: discouraged parameter Navigation menu Personal tools Not logged in Talk Contributions Create account Log in Namespaces Article Talk Variants Views Read Edit View history More Search Navigation Main page Contents Current events Random article About Wikipedia Contact us Donate Contribute Help Learn to edit Community portal Recent changes Upload file Tools What links here Related changes Upload file Special pages Permanent link Page information Cite this page Wikidata item Print/export Download as PDF Printable version In other projects Wikimedia Commons Languages Català Deutsch Español Français Italiano Nederlands 日本語 Suomi Edit links This page was last edited on 15 January 2021, at 07:40 (UTC). Text is available under the Creative Commons Attribution-ShareAlike License; additional terms may apply. By using this site, you agree to the Terms of Use and Privacy Policy. Wikipedia® is a registered trademark of the Wikimedia Foundation, Inc., a non-profit organization. Privacy policy About Wikipedia Disclaimers Contact Wikipedia Mobile view Developers Statistics Cookie statement en-wikipedia-org-8598 ---- Shure SM58 - Wikipedia Shure SM58 From Wikipedia, the free encyclopedia Jump to navigation Jump to search The Shure SM58 microphone The Shure SM58 is a professional cardioid dynamic microphone, commonly used in live vocal applications. Produced since 1966 by Shure Incorporated, it has built a strong reputation among musicians for its durability and sound, and half a century later it is still considered the industry standard for live vocal performance microphones.[1][2][3] The SM58 and its sibling, the SM57, are the best-selling microphones in the world.[4] The SM stands for Studio Microphone.[5] Like all directional microphones, the SM58 is subject to proximity effect, a low frequency boost when used close to the source. The cardioid response reduces pickup from the side and rear, helping to avoid feedback onstage. There are wired (with and without on/off switch) and wireless versions. The wired version provides balanced audio through a male XLR connector. The SM58 uses an internal shock mount to reduce handling noise. A distinctive feature of the SM58 is its pneumatic suspension system for the microphone capsule.[6] The capsule, a readily replaceable component, is surrounded by a soft rubber balloon, rather than springs or solid rubber. This gives notably good isolation from handling noise; one reason for its being a popular microphone for stage vocalists. Microphones with this feature are intended primarily for hand-held use, rather than on a stand or for instrument miking. The SM58 is unswitched, while the otherwise identical SM58S has a sliding on-off switch on the body. 
Other suffixes refer to any accessories supplied with the microphone: when a cable is provided, the model is actually SM58-CN, while the SM58-LC has no provided cable; the SM58-X2u kit consists of the SM58-LC and an inline X2u XLR-to-USB signal adaptor (capable of providing phantom power for condenser microphones, and offering an in-built headphone jack for monitoring).[7] Contents 1 Specifications 2 Awards 3 Counterfeiting 4 See also 5 References 6 External links Specifications[edit] Robert Lockwood, Jr using an SM58 Patti Smith performing with an SM58 in Finland Randall Bramblett with an SM58 Lower-cost 588SD, circa 1970[8] Type: Dynamic[9] (moving coil) Frequency Response 50 to 15,000 Hz[9] Polar Pattern Cardioid,[9] rotationally symmetrical about microphone axis, uniform with frequency Sensitivity (at 1,000 Hz Open Circuit Voltage) −54.5 dBV/Pa (1.85 mV); 1 Pa = 94 dB SPL[9] Impedance Rated impedance is 150 ohms (300 ohms actual) for connection to microphone inputs rated low impedance[9] Polarity Positive pressure on diaphragm produces positive voltage on pin 2 with respect to pin 3[9] Connector Three-pin male XLR[9] Net Weight 298 grams (10.5 oz)[9] Awards[edit] In 2008, for the second year running, the SM58 microphone won the MI Pro Retail Survey "Best Live Microphone" award.[10] In 2011, Acoustic Guitar magazine honored the SM58 with a Gold Medal in the Player's Choice Awards.[11] Counterfeiting[edit] The SM58 and SM57 have been extensively counterfeited.[12][13][14][15][16][17] Most of these counterfeit microphones are at least functional, but have poorer performance and do not have the pneumatic suspension. There are many other subtle details which can reveal most of these fakes.[18][19] See also[edit] Shure SM57 Shure Beta 58A References[edit] ^ Live Sound International, September/October 2002. Real World: Wired Vocal Microphones Archived 2009-01-07 at the Wayback Machine ^ Miller, Peter L. (2001). Speaking Skills for Every Occasion. Blake's Guides. Pascal Press. p. 30. ISBN 1741250463. ^ Morris, Tee; Tomasi, Chuck; Terra, Evo (2008). Podcasting For Dummies (2 ed.). John Wiley & Sons. p. 36. ISBN 047027557X. ^ Paul Stamler, Shure SM57 Impedance Modification, Recording Magazine, archived from the original on 2014-04-21, retrieved 2014-04-20 CS1 maint: discouraged parameter (link) ^ History of Shure Incorporated ^ Goodwyn, Peterson. "Shure's Secret, Invisible Shockmount". Recording Hacks. Retrieved 1 November 2013. CS1 maint: discouraged parameter (link) ^ "SM58+X2u USB Digital Bundle". Shure Europe. ^ Shure webpage ^ a b c d e f g h Product Specifications (PDF), Shure, retrieved 2012-10-06 CS1 maint: discouraged parameter (link) ^ http://www.shure.com/americas/about-shure/history/index.htm ^ Gerken, Teja. "Acoustic Guitar Player's Choice Awards 2011 - Shure SM58". Acoustic Guitar. String Letter Publishing. Retrieved August 19, 2012. CS1 maint: discouraged parameter (link) ^ "Sennheiser, Shure Team Up For Counterfeit Raid", December 21, 2001, MIX ^ "Counterfeit Shure Microphones Destroyed", October 9, 2002, MIX ^ "Thai-based counterfeit ring smashed", February 1, 2006, Music Trades. "Among the products in this shipment was a large quantity of counterfeit SM58 microphones destined for retail outlets around Thailand." ^ "Auction websites' threat to legitimate brands", January 1, 2007, Pro Sound News Europe. "The SM57, SM58, Beta 57 and Beta 58 are among the fixities proving most attractive to counterfeiters." 
^ ""Shure Seizes Counterfeit Microphones in China", November 14, 2007, MIX ^ "Counterfeit Shure Gear Seized: Thousands of counterfeit microphones were recently confiscated in Peru and Paraguay by customs officials", February 2, 2012, Broadcasting & Cable ^ "Spotting a Fake Shure Microphone: How to tell if your mic is genuine -- or not". About.com Home Recording ^ [1]"5 Tips on Spotting a Fake Shure SM58" External links[edit] SM58 official page Shure Asia SM58 official page Shure SM58 history page Retrieved from "https://en.wikipedia.org/w/index.php?title=Shure_SM58&oldid=975527289" Categories: Microphones Hidden categories: Webarchive template wayback links CS1 maint: discouraged parameter Navigation menu Personal tools Not logged in Talk Contributions Create account Log in Namespaces Article Talk Variants Views Read Edit View history More Search Navigation Main page Contents Current events Random article About Wikipedia Contact us Donate Contribute Help Learn to edit Community portal Recent changes Upload file Tools What links here Related changes Upload file Special pages Permanent link Page information Cite this page Wikidata item Print/export Download as PDF Printable version In other projects Wikimedia Commons Languages Català Deutsch Español Français Italiano Nederlands 日本語 Русский Suomi Edit links This page was last edited on 29 August 2020, at 01:20 (UTC). Text is available under the Creative Commons Attribution-ShareAlike License; additional terms may apply. By using this site, you agree to the Terms of Use and Privacy Policy. Wikipedia® is a registered trademark of the Wikimedia Foundation, Inc., a non-profit organization. Privacy policy About Wikipedia Disclaimers Contact Wikipedia Mobile view Developers Statistics Cookie statement en-wikipedia-org-8991 ---- Randori - Wikipedia Randori From Wikipedia, the free encyclopedia Jump to navigation Jump to search Free-style practice in Japanese martial arts This article needs additional citations for verification. Please help improve this article by adding citations to reliable sources. Unsourced material may be challenged and removed. Find sources: "Randori" – news · newspapers · books · scholar · JSTOR (December 2009) (Learn how and when to remove this template message) Randori Japanese name Kanji 乱取り Hiragana らんどり Transcriptions Revised Hepburn randori Randori (乱取り) is a term used in Japanese martial arts to describe free-style practice (sparring). The term denotes an exercise in 取り tori, applying technique to a random ( 乱 ran) succession of uke attacks. The actual connotation of randori depends on the martial art it is used in. In judo, jujutsu, and Shodokan aikido, among others, it most often refers to one-on-one sparring where partners attempt to resist and counter each other's techniques. In other styles of aikido, in particular Aikikai, it refers to a form of practice in which a designated aikidoka defends against multiple attackers in quick succession without knowing how they will attack or in what order. Contents 1 In Japan 2 In Judo 3 In Tenshin Aikido 4 In Kendo 5 In Karate 6 In ninjutsu 7 See also 8 References 9 External links In Japan[edit] The term is used in aikido, judo, and Brazilian jiu-jitsu dojos outside Japan. In Japan, this form of practice is called taninzu-gake (多人数掛け), which literally means multiple attackers. 
In Judo[edit] The term was described by Jigoro Kano, the founder of Judo, in a speech at the 1932 Los Angeles Olympic Games: "Randori, meaning "free exercise", is practiced under conditions of actual contest. It includes throwing, choking, holding the opponent down, and bending or twisting of the arms. The two combatants may use whatever methods they like provided they do not hurt each other and obey the rules of Judo concerning etiquette, which are essential to its proper working." [1] There are 2 types of Randori.[2] [3] In Tenshin Aikido[edit] In Steven Seagal's Tenshin Aikido Federation (affiliated with the Aikikai), randori is different from that of Aikikai, in that the attackers can do anything to the defender (e.g. punch, grab, kick, etc.), and the randori continues on the ground until a pin. In Kendo[edit] In kendo, jigeiko means "friendly" free combat, as in competition, but without counting points. In Karate[edit] Although in karate the word kumite is usually reserved for sparring, some schools also employ the term randori with regard to "mock-combat" in which both karateka move with speed, parrying and attacking with all four limbs (including knees, elbows, etc.). In these schools, the distinction between randori and kumite is that in randori, the action is uninterrupted when a successful technique is applied. (Also known as ju kumite or soft sparring.) In ninjutsu[edit] Randori is also practiced in Bujinkan ninjutsu and usually represented to the practitioner when he reaches the "Shodan" level. In ninjutsu, randori puts the practitioner in a position where he is armed or unarmed and is attacked by multiple attackers. See also[edit] Kata Sparring Randori-no-kata References[edit] ^ Original text of this speech available at The Judo Information Site at http://judoinfo.com/kano1.htm ^ Ohlenkamp, Neil (16 May 2018). Black Belt Judo. New Holland. ISBN 9781845371098 – via Google Books. ^ Tello, Rodolfo (1 August 2016). Judo: Seven Steps to Black Belt (An Introductory Guide for Beginners). Amakella Publishing. ISBN 9781633870086 – via Google Books. 
External links[edit] Judo Information Site YouTube: Randori In Tenshin Aikido
Privacy policy About Wikipedia Disclaimers Contact Wikipedia Mobile view Developers Statistics Cookie statement en-wikipedia-org-9073 ---- Sybil attack - Wikipedia Sybil attack From Wikipedia, the free encyclopedia Jump to navigation Jump to search Attack done by multiple fake identities In a Sybil attack, the attacker subverts the reputation system of a network service by creating a large number of pseudonymous identities and uses them to gain a disproportionately large influence. It is named after the subject of the book Sybil, a case study of a woman diagnosed with dissociative identity disorder.[1] The name was suggested in or before 2002 by Brian Zill at Microsoft Research.[2] The term pseudospoofing had previously been coined by L. Detweiler on the Cypherpunks mailing list and used in the literature on peer-to-peer systems for the same class of attacks prior to 2002, but this term did not gain as much influence as "Sybil attack".[3] Sybil attacks are also called sock puppetry. Contents 1 Description 2 Example 3 Prevention 3.1 Identity validation 3.2 Social trust graphs 3.3 Economic costs 3.4 Personhood validation 3.5 Application-specific defenses 4 See also 5 References 6 External links Description[edit] The Sybil attack in computer security is an attack wherein a reputation system is subverted by creating multiple identities.[4] A reputation system's vulnerability to a Sybil attack depends on how cheaply identities can be generated, the degree to which the reputation system accepts inputs from entities that do not have a chain of trust linking them to a trusted entity, and whether the reputation system treats all entities identically. As of 2012[update], evidence showed that large-scale Sybil attacks could be carried out in a very cheap and efficient way in extant realistic systems such as BitTorrent Mainline DHT.[5][6] An entity on a peer-to-peer network is a piece of software which has access to local resources. An entity advertises itself on the peer-to-peer network by presenting an identity. More than one identity can correspond to a single entity. In other words, the mapping of identities to entities is many to one. Entities in peer-to-peer networks use multiple identities for purposes of redundancy, resource sharing, reliability and integrity. In peer-to-peer networks, the identity is used as an abstraction so that a remote entity can be aware of identities without necessarily knowing the correspondence of identities to local entities. By default, each distinct identity is usually assumed to correspond to a distinct local entity. In reality, many identities may correspond to the same local entity. An adversary may present multiple identities to a peer-to-peer network in order to appear and function as multiple distinct nodes. The adversary may thus be able to acquire a disproportionate level of control over the network, such as by affecting voting outcomes. In the context of (human) online communities, such multiple identities are sometimes known as sockpuppets. Example[edit] A notable Sybil attack (in conjunction with a traffic confirmation attack) was launched against the Tor anonymity network for several months in 2014 by unknown perpetrators.[7][8] Prevention[edit] Known approaches to Sybil attack prevention include identity validation, social trust graph algorithms, or economic costs, personhood validation, and application-specific defenses. Identity validation[edit] Validation techniques can be used to prevent Sybil attacks and dismiss masquerading hostile entities. 
A local entity may accept a remote identity based on a central authority which ensures a one-to-one correspondence between an identity and an entity and may even provide a reverse lookup. An identity may be validated either directly or indirectly. In direct validation the local entity queries the central authority to validate the remote identities. In indirect validation the local entity relies on already-accepted identities which in turn vouch for the validity of the remote identity in question. Practical network applications and services often use a variety of identity proxies to achieve limited Sybil attack resistance, such as telephone number verification, credit card verification, or even based on the IP address of a client. These methods have the limitations that it is usually possible to obtain multiple such identity proxies at some cost—or even to obtain many at low cost through techniques such as SMS spoofing or IP address spoofing. Use of such identity proxies can also exclude those without ready access to the required identity proxy: e.g., those without their own mobile phone or credit card, or users located behind carrier-grade network address translation who share their IP addresses with many others. Identity-based validation techniques generally provide accountability at the expense of anonymity, which can be an undesirable tradeoff especially in online forums that wish to permit censorship-free information exchange and open discussion of sensitive topics. A validation authority can attempt to preserve users' anonymity by refusing to perform reverse lookups, but this approach makes the validation authority a prime target for attack. Protocols using threshold cryptography can potentially distribute the role of such a validation authority among multiple servers, protecting users' anonymity even if one or a limited number of validation servers is compromised.[9] Social trust graphs[edit] Sybil prevention techniques based on the connectivity characteristics of social graphs can also limit the extent of damage that can be caused by a given Sybil attacker while preserving anonymity. Examples of such prevention techniques include SybilGuard,[10] SybilLimit,[11] the Advogato Trust Metric,[12] and the sparsity based metric to identify Sybil clusters in a distributed P2P based reputation system.[13] These techniques cannot prevent Sybil attacks entirely, and may be vulnerable to widespread small-scale Sybil attacks. In addition, it is not clear whether real-world online social networks will satisfy the trust or connectivity assumptions that these algorithms assume.[14] Economic costs[edit] Alternatively, imposing economic costs as artificial barriers to entry may be used to make Sybil attacks more expensive. Proof of work, for example, requires a user to prove that they expended a certain amount of computational effort to solve a cryptographic puzzle. In Bitcoin and related permissionless cryptocurrencies, miners compete to append blocks to a blockchain and earn rewards roughly in proportion to the amount of computational effort they invest in a given time period. Investments in other resources such as storage or stake in existing cryptocurrency may similarly be used to impose economic costs. 
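As a rough illustration of the economic-cost idea, the sketch below implements a toy hash puzzle of the proof-of-work kind just described. The SHA-256 choice, the leading-zero-bits difficulty criterion, and the function names are assumptions made for this example, not a description of any specific deployed system; the point is the asymmetry that creating each identity costs roughly 2^difficulty_bits hash evaluations while verification costs one.

    import hashlib
    import os

    def solve_puzzle(identity: bytes, difficulty_bits: int) -> bytes:
        # Search for a nonce such that SHA-256(identity || nonce) falls below a
        # target, i.e. starts with difficulty_bits zero bits. Expected cost is
        # about 2**difficulty_bits hashes, which is what makes mass identity
        # creation expensive.
        target = 1 << (256 - difficulty_bits)
        nonce = 0
        while True:
            digest = hashlib.sha256(identity + nonce.to_bytes(8, "big")).digest()
            if int.from_bytes(digest, "big") < target:
                return nonce.to_bytes(8, "big")
            nonce += 1

    def verify_puzzle(identity: bytes, nonce: bytes, difficulty_bits: int) -> bool:
        # Verification is a single hash, so honest participants pay almost nothing.
        digest = hashlib.sha256(identity + nonce).digest()
        return int.from_bytes(digest, "big") < (1 << (256 - difficulty_bits))

    # Example admission check a service might run before accepting a new identity.
    identity = os.urandom(16)
    proof = solve_puzzle(identity, difficulty_bits=16)  # roughly 65,000 hashes on average
    assert verify_puzzle(identity, proof, difficulty_bits=16)

Raising difficulty_bits raises the cost of each additional identity without changing the cost of verification, which is the property the economic-cost defences above rely on.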
Personhood validation[edit] As an alternative to identity verification that attempts to maintain a strict "one-per-person" allocation rule, a validation authority can use some mechanism other than knowledge of a user's real identity - such as verification of an unidentified person's physical presence at a particular place and time as in a pseudonym party[15] - to enforce a one-to-one correspondence between online identities and real-world users. Such proof of personhood approaches have been proposed as a basis for permissionless blockchains and cryptocurrencies in which each human participant would wield exactly one vote in consensus.[16][17] A variety of approaches to proof of personhood have been proposed, some with deployed implementations, although many usability and security issues remain.[18] Application-specific defenses[edit] A number of distributed protocols have been designed with Sybil attack protection in mind. SumUp[19] and DSybil[20] are Sybil-resistant algorithms for online content recommendation and voting. Whānau is a Sybil-resistant distributed hash table algorithm.[21] I2P's implementation of Kademlia also has provisions to mitigate Sybil attacks.[22] See also[edit] Astroturfing Ballot stuffing Social bot Sockpuppetry References[edit] ^ Lynn Neary (20 October 2011). Real 'Sybil' Admits Multiple Personalities Were Fake. NPR. Retrieved 8 February 2017. ^ Douceur, John R (2002). "The Sybil Attack". Peer-to-Peer Systems. Lecture Notes in Computer Science. 2429. pp. 251–60. doi:10.1007/3-540-45748-8_24. ISBN 978-3-540-44179-3. ^ Oram, Andrew. Peer-to-peer: harnessing the benefits of a disruptive technology. ^ Trifa, Zied; Khemakhem, Maher (2014). "Sybil Nodes as a Mitigation Strategy Against Sybil Attack". Procedia Computer Science. 32: 1135–40. doi:10.1016/j.procs.2014.05.544. ^ Wang, Liang; Kangasharju, Jussi (2012). "Real-world sybil attacks in BitTorrent mainline DHT". 2012 IEEE Global Communications Conference (GLOBECOM). pp. 826–32. doi:10.1109/GLOCOM.2012.6503215. ISBN 978-1-4673-0921-9. ^ Wang, Liang; Kangasharju, Jussi (2013). "Measuring large-scale distributed systems: case of BitTorrent Mainline DHT". IEEE P2P 2013 Proceedings. pp. 1–10. doi:10.1109/P2P.2013.6688697. ISBN 978-1-4799-0515-7. ^ (30 July 2014). Tor security advisory: "relay early" traffic confirmation attack. ^ Dan Goodin (31 July 2014). Active attack on Tor network tried to decloak users for five months. ^ John Maheswaran, Daniel Jackowitz, Ennan Zhai, David Isaac Wolinsky, and Bryan Ford (9 March 2016). Building Privacy-Preserving Cryptographic Credentials from Federated Online Identities (PDF). 6th ACM Conference on Data and Application Security and Privacy (CODASPY).CS1 maint: uses authors parameter (link) ^ Yu, Haifeng; Kaminsky, Michael; Gibbons, Phillip B; Flaxman, Abraham (2006). SybilGuard: defending against sybil attacks via social networks. 2006 conference on Applications, technologies, architectures, and protocols for computer communications - SIGCOMM '06. pp. 267–78. doi:10.1145/1159913.1159945. ISBN 978-1-59593-308-9. ^ SybilLimit: A Near-Optimal Social Network Defense against Sybil Attacks. IEEE Symposium on Security and Privacy. 19 May 2008. ^ O'Whielacronx, Zooko. "Levien's attack-resistant trust metric". . gmane.org. Retrieved 10 February 2012. CS1 maint: discouraged parameter (link) ^ Kurve, Aditya; Kesidis, George (2011). "Sybil Detection via Distributed Sparse Cut Monitoring". 2011 IEEE International Conference on Communications (ICC). pp. 1–6. doi:10.1109/icc.2011.5963402. 
ISBN 978-1-61284-232-5. ^ Bimal Viswanath, Ansley Post, Krishna Phani Gummadi, and Alan E Mislove (August 2010). "An analysis of social network-based Sybil defenses". ACM SIGCOMM Computer Communication Review. doi:10.1145/1851275.1851226.CS1 maint: uses authors parameter (link) ^ Ford, Bryan; Strauss, Jacob (1 April 2008). An Offline Foundation for Online Accountable Pseudonyms. 1st Workshop on Social Network Systems - SocialNets '08. pp. 31–6. doi:10.1145/1435497.1435503. ISBN 978-1-60558-124-8. ^ Maria Borge, Eleftherios Kokoris-Kogias, Philipp Jovanovic, Linus Gasser, Nicolas Gailly, Bryan Ford (29 April 2017). Proof-of-Personhood: Redemocratizing Permissionless Cryptocurrencies. IEEE Security & Privacy on the Blockchain (IEEE S&B).CS1 maint: uses authors parameter (link) ^ Ford, Bryan (December 2020). "Technologizing Democracy or Democratizing Technology? A Layered-Architecture Perspective on Potentials and Challenges". In Lucy Bernholz; Hélène Landemore; Rob Reich (eds.). Digital Technology and Democratic Theory. University of Chicago Press. ISBN 9780226748573. ^ Divya Siddarth, Sergey Ivliev, Santiago Siri, Paula Berman (13 October 2020). "Who Watches the Watchmen? A Review of Subjective Approaches for Sybil-resistance in Proof of Personhood Protocols". arXiv:2008.05300.CS1 maint: uses authors parameter (link) ^ Nguyen Tran, Bonan Min, Jinyang Li, and Lakshminarayanan Subramanian (22 April 2009). Sybil-Resilient Online Content Voting (PDF). NSDI ’09: 6th USENIX Symposium on Networked Systems Design and Implementation.CS1 maint: uses authors parameter (link) ^ Haifeng Yu, Chenwei Shi, Michael Kaminsky, Phillip B. Gibbons, and Feng Xiao (19 May 2009). DSybil: Optimal Sybil-Resistance for Recommendation Systems. 30th IEEE Symposium on Security and Privacy.CS1 maint: uses authors parameter (link) ^ Chris Lesniewski-Laas and M. Frans Kaashoek (28 April 2010). Whānau: A Sybil-proof Distributed Hash Table (PDF). 7th USENIX Symposium on Network Systems Design and Implementation (NSDI).CS1 maint: uses authors parameter (link) ^ "The Network Database - I2P". External links[edit] Querci, Daniele; Hailes, Stephen (2010). "Sybil Attacks Against Mobile Users: Friends and Foes to the Rescue". 2010 Proceedings IEEE INFOCOM. pp. 1–5. CiteSeerX 10.1.1.360.8730. doi:10.1109/INFCOM.2010.5462218. ISBN 978-1-4244-5836-3. Bazzi, Rida A; Konjevod, Goran (2006). "On the establishment of distinct identities in overlay networks". Distributed Computing. 19 (4): 267–87. doi:10.1007/s00446-006-0012-y. Lesniewski-Laas, Chris (2008). "A Sybil-proof one-hop DHT". Proceedings of the 1st workshop on Social network systems - SocialNets '08. pp. 19–24. doi:10.1145/1435497.1435501. ISBN 978-1-60558-124-8. Newsome, James; Shi, Elaine; Song, Dawn; Perrig, Adrian (2004). "The sybil attack in sensor networks". Proceedings of the third international symposium on Information processing in sensor networks - IPSN'04. pp. 259–68. doi:10.1145/984622.984660. ISBN 978-1581138467. A Survey of Solutions to the Sybil Attack On Network formation: Sybil attacks and Reputation systems Seigneur, Jean-Marc; Gray, Alan; Jensen, Christian Damsgaard (2005). "Trust Transfer: Encouraging Self-recommendations Without Sybil Attack". Trust Management. Lecture Notes in Computer Science. 3477. pp. 321–37. CiteSeerX 10.1.1.391.5003. doi:10.1007/11429760_22. ISBN 978-3-540-26042-4. A Survey of DHT Security Techniques by Guido Urdaneta, Guillaume Pierre and Maarten van Steen. ACM Computing surveys, 2009. 
An experiment on the weakness of reputation algorithms used in professional social networks: the case of Naymz by Marco Lazzari. Proceedings of the IADIS International Conference e-Society 2010. Retrieved from "https://en.wikipedia.org/w/index.php?title=Sybil_attack&oldid=1000481849" Categories: Computer network security Reputation management Hidden categories: CS1 maint: uses authors parameter CS1 maint: discouraged parameter Articles with short description Short description matches Wikidata Articles containing potentially dated statements from 2012 All articles containing potentially dated statements Use dmy dates from April 2011 Navigation menu Personal tools Not logged in Talk Contributions Create account Log in Namespaces Article Talk Variants Views Read Edit View history More Search Navigation Main page Contents Current events Random article About Wikipedia Contact us Donate Contribute Help Learn to edit Community portal Recent changes Upload file Tools What links here Related changes Upload file Special pages Permanent link Page information Cite this page Wikidata item Print/export Download as PDF Printable version Languages Deutsch Español فارسی Français Italiano Português Русский Українська Edit links This page was last edited on 15 January 2021, at 08:15 (UTC). Text is available under the Creative Commons Attribution-ShareAlike License; additional terms may apply. By using this site, you agree to the Terms of Use and Privacy Policy. Wikipedia® is a registered trademark of the Wikimedia Foundation, Inc., a non-profit organization. Privacy policy About Wikipedia Disclaimers Contact Wikipedia Mobile view Developers Statistics Cookie statement en-wikipedia-org-9317 ---- Fear of missing out - Wikipedia Fear of missing out From Wikipedia, the free encyclopedia Jump to navigation Jump to search "FOMO" redirects here. For the album by Liam Finn, see FOMO (album). type of social anxiety Smartphones enable people to remain in contact with their social and professional network continuously. This may result in compulsive checking for status updates and messages, for fear of missing an opportunity.[1] Fear of missing out (FOMO) is a social anxiety[2] stemming from the belief that others might be having fun while the person experiencing the anxiety is not present. It is characterized by a desire to stay continually connected with what others are doing.[3] FOMO is also defined as a fear of regret,[4] which may lead to concerns that one might miss an opportunity for social interaction, a novel experience or a profitable investment.[5] It is the fear that deciding not to participate is the wrong choice.[4][6] Social networking creates many opportunities for FOMO. While it provides opportunities for social engagement,[3] it offers an endless stream of activities in which any given person is not involved. Psychological dependence on social networks can result in anxiety and can lead to FOMO[7] or even pathological internet use.[8] FOMO is claimed to negatively influence psychological health and well-being.[4] Contents 1 History 2 Definition 3 Effects 4 Causes 5 Marketing technique 6 See also 7 References History[edit] The phenomenon was first identified in 1996 by marketing strategist Dr. Dan Herman, who conducted research for Adam Bellouch and published the first academic paper on the topic in 2000 in The Journal of Brand Management.[9] Author Patrick J. McGinnis coined the term FOMO[10] and popularized it in a 2004 op-ed in The Harbus, the magazine of Harvard Business School. 
The article was titled McGinnis' Two FOs: Social Theory at HBS, and also referred to another related condition, Fear of a Better Option (FoBO), and their role in the school's social life.[11][12][13] The origin of FOMO has also been traced to the 2004 Harbus article by academic Joseph Reagle.[14] Definition[edit] FOMO refers to the apprehension that one is either not in the know or missing out on information, events, experiences, or decisions that could make one's life better.[3] Those affected by it may not know exactly what they are missing but may still worry that others are having a much better time or doing something better than they are, without them.[2] FOMO could result from not knowing about a conversation,[15] missing a T.V. show, not attending a wedding or party,[16] or hearing that others have discovered a new restaurant.[17] Within video games, FOMO is also used to describe the similar anxiety around missing the ability to obtain in-game items or complete activities that are only available for a limited time.[18] Effects[edit] A study by JWTIntelligence suggests that FOMO can influence the formation of long-term goals and self-perceptions.[2] In this study, around half of the respondents stated that they are overwhelmed by the amount of information needed to stay up-to-date, and that it is impossible to not miss out on something. The process of relative deprivation creates FOMO and dissatisfaction. It reduces psychological well-being.[3][4][19] FOMO led to negative social and emotional experiences, such as boredom and loneliness.[20] A 2013 study found that it negatively impacts mood and life satisfaction,[3] reduces self-esteem, and affects mindfulness.[21] According to John M. Grohol, founder and Editor-in-Chief of Psych Central, FOMO may lead to a constant search for new connections with others, abandoning current connections to do so. Moreover, the desire to stay in touch may endanger personal safety, e.g., while driving.[22] A 2019 University of Glasgow study surveyed 467 adolescents, and found that the respondents felt societal pressure to always be available.[23] FOMO-sufferers may increasingly seek access to others' social lives, and consume an escalating amount of real-time information.[24] Causes[edit] FOMO arises from situational or long-term deficits in psychological needs satisfaction, which are not a new phenomenon.[3] Before the Internet, a related phenomenon, "keeping up with the Jones'", was widely experienced. FOMO generalized and intensified this experience because so much more of people's lives became publicly documented and easily accessed. Further, a common tendency is to post about positive experiences (that great restaurant) rather than negative ones (bad first date). Self-determination theory contends that an individual's psychological satisfaction in their competence, autonomy, and relatedness consists of three basic psychological needs for human beings.[25] Test subjects with lower levels of basic psychological satisfaction reported a higher level of FOMO. Basic psychological satisfaction and FOMO were positively correlated.[3] Four in ten young people reported FOMO sometimes or often.[2] FOMO was found to be negatively correlated with age, and men were more likely than women to report It.[3] Social media platforms that are associated with FOMO include Snapchat,[26] Facebook,[27] Instagram,[28] and Twitter. Marketing technique[edit] Advertising and marketing campaigns may seek to intensify FOMO within a marketing strategy. 
Examples include AT&T's "Don't be left behind" campaign, Duracell's Powermat "Stay in charge" campaign and Heineken's "Sunrise" campaign.[2] The "Sunrise" campaign, in particular, aimed to encourage responsible drinking by portraying excessive drinking as a way to miss the best parts of a party, rather than claiming that excessive drinking is a risk to personal health. Other brands attempt counter FOMO, such as Nescafé's "Wake up to life" campaign.[2] Harnessing TV viewers' FOMO is also perceived to foster higher broadcast ratings. Real-time updates about status and major social events allow for a more engaging media consumption experience and faster dissemination of information.[2] Real-time tweets about the Super Bowl are considered to be correlated with higher TV ratings due to their appeal to FOMO and the prevalence of social media usage.[2] See also[edit] Hyperbolic discounting Kiasu Loss aversion Irrational exuberance Missed connections Murray's system of needs Opportunity cost Relative deprivation Self-determination theory Social media Status anxiety Social proof References[edit] ^ Anderson, Hephzibah (16 April 2011). "Never heard of Fomo? You're so missing out". The Guardian. Retrieved 6 June 2017. ^ a b c d e f g h "Fear of Missing Out (FOMO)" (PDF). J. Walter Thompson. March 2012. Archived from the original (PDF) on June 26, 2015. ^ a b c d e f g h Przybylski, Andrew K.; Murayama, Kou; DeHaan, Cody R.; Gladwell, Valerie (July 2013). "Motivational, emotional, and behavioral correlates of fear of missing out". Computers in Human Behavior. 29 (4): 1841–1848. doi:10.1016/j.chb.2013.02.014. ^ a b c d Wortham, J. (April 10, 2011). "Feel like a wall flower? Maybe it's your Facebook wall". The New York Times. ^ Shea, Michael (27 July 2015). "Living with FOMO". The Skinny. Retrieved 9 January 2016. ^ Alt, Dorit; Boniel-Nissim, Meyran (2018-06-20). "Parent–Adolescent Communication and Problematic Internet Use: The Mediating Role of Fear of Missing Out (FoMO)". Journal of Family Issues. 39 (13): 3391–3409. doi:10.1177/0192513x18783493. ISSN 0192-513X. S2CID 149746950. ^ Jonathan K. J. (1998). "Internet Addiction on Campus: The Vulnerability of College Students". CyberPsychology & Behavior. 1 (1): 11–17. doi:10.1089/cpb.1998.1.11. Archived from the original on 2014-05-13. ^ Song, Indeok; Larose, Robert; Eastin, Matthew S.; Lin, Carolyn A. (September 2004). "Internet Gratifications and Internet Addiction: On the Uses and Abuses of New Media". CyberPsychology & Behavior. 7 (4): 384–394. doi:10.1089/cpb.2004.7.384. PMID 15331025. ^ Herman, Dan (2000-05-01). "Introducing short-term brands: A new branding tool for a new consumer reality". Journal of Brand Management. 7 (5): 330–340. doi:10.1057/bm.2000.23. ISSN 1350-231X. S2CID 167311741. ^ Kozodoy, Peter (2017-10-09). "The Inventor of FOMO is Warning Leaders About a New, More Dangerous Threat". Inc.com. Retrieved 2017-10-10. ^ "Social Theory at HBS: McGinnis' Two FOs". The Harbus. 10 May 2004. Retrieved 30 March 2017. ^ Schreckinger, Ben (29 July 2014). "The Home of FOMO". Boston. Retrieved 30 March 2017. ^ Blair, Linda (6 October 2017). "How to beat 'fear of missing out' as the growth of social media sites feeds the trend - Independent.ie". Independent.ie. Retrieved 2017-10-10. ^ "FOMO's etymology". reagle.org. Retrieved 2017-10-10. ^ Tait, Amelia (2018-10-11). "Why do we experience the curse of conversation envy?". Metro. Retrieved 2020-05-31. ^ "Why FOMO at uni is totally OK to feel". Debut. 2016-10-11. Retrieved 2020-05-31. ^ Delmar, Niamh. 
"FOMO: Are you afraid of missing out?". The Irish Times. Retrieved 2020-05-31. ^ Close, James; Lloyd, Joanne (2021). Lifting the Lid on Loot-Boxes (PDF) (Report). GambleAware. Retrieved 2 April 2021. CS1 maint: discouraged parameter (link) ^ Morford, M. (August 4, 2010). "Oh my god you are so missing out". San Francisco Chronicle. ^ Burke, M.; Marlow, C. & Lento, T. (2010). Social network activity and social well-being. Postgraduate Medical Journal. 85. pp. 455–459. CiteSeerX 10.1.1.184.2702. doi:10.1145/1753326.1753613. ISBN 9781605589299. S2CID 207178564. ^ "The FoMo Health Factor". Psychology Today. Retrieved 2020-04-09. ^ Grohol, J. (February 28, 2015). "FOMO Addiction: The Fear of Missing Out". World of Psychology. Psych Central. ^ "Woods, H. C. and Scott, H. (2016) #Sleepyteens: social media use in adolescence is associated with poor sleep quality, anxiety, depression and low self-esteem. Journal of Adolescence, 51, pp. 41-49" (PDF). University of Glasgow. Retrieved 28 May 2020. ^ Amichai-Hamburger, Y. & Ben-Artzi, E. (2003), "Loneliness and internet use", Computers in Human Behavior, 19 (1): 71–80, doi:10.1016/S0747-5632(02)00014-6 ^ Deci, E.L. & Ryan, R.M. (1985). Intrinsic motivation and self-determination in human behavior. Plenum Press. ISBN 9780306420221. ^ "Why Snapchat Is The Leading Cause Of FOMO". The Odyssey Online. 2016-03-21. Retrieved 2017-12-06. ^ Krasnova, Hanna; Widjaja, Thomas; Wenninger, Helena; Buxmann, Peter (2013). "Envy on Facebook: A Hidden Threat to Users' Life Satisfaction? - Semantic Scholar". doi:10.7892/boris.47080. S2CID 15408147. Cite journal requires |journal= (help) ^ Djisseglo, Ayoko (2019-05-05). "FOMO: An Instagram Anxiety". Medium. Retrieved 2020-05-31. v t e Conformity Enforcement Proscription Enemy of the people Enemy of the state Ostracism Blacklisting Cancel culture Censorship Outlaw Civil death Vogelfrei Public enemy Group pressure Bandwagon effect Brainwashing Collectivism Consensus reality Deplatforming Dogma Emotional contagion Behavioral Crime Hysterical Suicide Fear of missing out Groupthink Hazing Herd mentality Indoctrination Invented tradition Memory conformity Milieu control Mobbing Nationalism Normalization Normative social influence Patriotism Peer pressure Pluralistic ignorance Propaganda Rally 'round the flag effect Scapegoating Shunning Social influence Socialization Spiral of silence Teasing Tyranny of the majority Untouchability Xeer Individual pressure Authoritarianism Personality Control freak Obsessive–compulsive personality disorder Conformity Compliance Communal reinforcement Countersignaling Herd behavior Internalization Obedience Social proof Experiments Asch conformity experiments Breaching experiment Milgram experiment Stanford prison experiment Anticonformity Alternative media Anti-authoritarianism Anti-social behaviour Auto-segregation Civil disobedience Cosmopolitanism Counterculture Culture jamming Deviance Devil's advocate Dissent Eccentricity Eclecticism Hermit Idiosyncrasy Individualism Pueblo clown Rebellion Red team Satire Shock value Counterconformists Cagot Damnatio memoriae Dissident Exile Homo sacer Nonperson Outcast Persona non grata Retrieved from "https://en.wikipedia.org/w/index.php?title=Fear_of_missing_out&oldid=1019659121" Categories: Advertising Anxiety Internet culture Social media Hidden categories: CS1 maint: discouraged parameter CS1 errors: missing periodical Articles with short description Short description matches Wikidata Navigation menu Personal tools Not logged in Talk Contributions 
en-wikipedia-org-9575 ---- Filter (software) - Wikipedia Filter (software) From Wikipedia, the free encyclopedia For Internet filtering software, see Content-control software. For video filtering software, see Filter (video). For other uses, see Email filtering. A filter is a computer program or subroutine to process a stream, producing another stream. Although a single filter can be used on its own, filters are frequently strung together to form a pipeline. Some operating systems such as Unix are rich with filter programs. Windows 7 and later are also rich with filters, as they include Windows PowerShell. In comparison, only a few filters are built into cmd.exe (the original command-line interface of Windows), though most of those have significant enhancements relative to the similar filter commands that were available in MS-DOS. OS X includes filters from its underlying Unix base but also has Automator, which allows filters (known as "Actions") to be strung together to form a pipeline. Unix In Unix and Unix-like operating systems, a filter is a program that gets most of its data from its standard input (the main input stream) and writes its main results to its standard output (the main output stream). Auxiliary input may come from command-line flags or configuration files, while auxiliary output may go to standard error. The command syntax for reading data from a device or file other than standard input is the input operator (<); similarly, the syntax for sending data to a device or file other than standard output is the output operator (>). To append data lines to an existing output file, one can use the append operator (>>). Filters may be strung together into a pipeline with the pipe operator ("|"), which signifies that the main output of the command to the left is passed as main input to the command on the right.
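A brief editorial illustration of these operators (a sketch added here for clarity, not part of the original article; it assumes an ordinary POSIX shell, and the file names are placeholders):
sort < unsorted.txt > sorted.txt      # read from a file instead of standard input, write to a new file
sort < unsorted.txt >> all-sorted.txt # append the sorted lines to an existing file
sort < unsorted.txt | uniq            # pipe the sorted lines into another filter as part of a pipeline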
The Unix philosophy encourages combining small, discrete tools to accomplish larger tasks. The classic filter in Unix is Ken Thompson's grep, which Doug McIlroy cites as what "ingrained the tools outlook irrevocably" in the operating system, with later tools imitating it.[1] grep at its simplest prints any lines containing a character string to its output. The following is an example: cut -d : -f 1 /etc/passwd | grep foo This finds all registered users that have "foo" as part of their username by using the cut command to take the first field (username) of each line of the Unix system password file and passing them all as input to grep, which searches its input for lines containing the character string "foo" and prints them on its output. Common Unix filter programs are: cat, cut, grep, head, sort, uniq, and tail. Programs like awk and sed can be used to build quite complex filters because they are fully programmable. Unix filters can also be used by data scientists to get a quick overview of a file-based dataset.[2] List of Unix filter programs awk cat comm cut expand compress fold grep head less more nl perl paste pr sed sh sort split strings tail tac tee tr uniq wc zcat
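As an editorial sketch (an illustrative addition, not part of the original article): because awk and sed behave like any other filter, they can be mixed freely with the fixed-function tools above in a single pipeline. Assuming the conventional /etc/passwd layout, in which the seventh colon-separated field is the login shell, the following reports how many accounts use each shell, most common first:
# count accounts per login shell, most common first (illustrative only)
cut -d : -f 7 /etc/passwd | sort | uniq -c | sort -rn
# the same report produced with a single programmable filter plus sort
awk -F : '{ count[$7]++ } END { for (shell in count) print count[shell], shell }' /etc/passwd | sort -rn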
DOS Two standard filters from the early days of DOS-based computers are find and sort. Examples: find "keyword" < inputfilename > outputfilename sort < inputfilename > outputfilename find /v "keyword" < inputfilename | sort > outputfilename (The second example simply sorts its input; unlike find, sort does not take a search keyword.) Such filters may be used in batch files (*.bat, *.cmd etc.). For use in the same command shell environment, there are many more filters available than those built into Windows. Some of these are freeware, some shareware and some are commercial programs. A number of these mimic the function and features of the filters in Unix. Some filtering programs have a graphical user interface (GUI) to enable users to design a customized filter to suit their special data processing and/or data mining requirements. Windows Windows Command Prompt inherited MS-DOS commands, improved some and added a few. For example, Windows Server 2003 features six command-line filters for modifying Active Directory that can be chained by piping: DSAdd, DSGet, DSMod, DSMove, DSRm and DSQuery.[3] Windows PowerShell adds an entire host of filters known as "cmdlets", which can be chained together with a pipe, except a few simple ones such as Clear-Host. The following example gets a list of files in the C:\Windows folder, gets the size of each and sorts the sizes in ascending order. It shows how three filters (Get-ChildItem, ForEach-Object and Sort-Object) are chained with pipes (Sort-Object sorts in ascending order by default): Get-ChildItem C:\Windows | ForEach-Object { $_.length } | Sort-Object References ^ McIlroy, M. D. (1987). A Research Unix reader: annotated excerpts from the Programmer's Manual, 1971–1986 (PDF) (Technical report). CSTR. Bell Labs. 139. ^ Data Analysis with the Unix Shell. Archived 2016-01-22 at the Wayback Machine. Bernd Zuther, comSysto GmbH, 2013. ^ Holme, Dan; Thomas, Orin (2004). Managing and maintaining a Microsoft Windows Server 2003 environment: exam 70-290. Redmond, WA: Microsoft Press. pp. 3-17–3-26. ISBN 9780735614376. External links http://www.webopedia.com/TERM/f/filter.html erambler-co-uk-3824 ---- Collaborations Workshop 2021: talks & panel session Date: 2021-04-05 Series: Collaborations Workshop 2021 Tags: [Technology] [Conference] [SSI] [Research] [Disability] [Equality, diversity & inclusion] Series: This post is part of a series on the SSI Collaborations Workshop in 2021. Collaborations Workshop 2021: collaborative ideas & hackday > Collaborations Workshop 2021: talks & panel session < Contents: Provocations; FAIR Research Software; Equality, Diversity & Inclusion: how to go about it; Equality, Diversity & Inclusion: disability issues; Lightning talks; Data & metadata; Learning & teaching/community; Wrapping up. I've just finished attending (online) the three days of this year's SSI Collaborations Workshop (CW for short), and once again it's been a brilliant experience, as well as mentally exhausting, so I thought I'd better get a summary down while it's still fresh in my mind. Collaborations Workshop is, as the name suggests, much more focused on facilitating collaborations than a typical conference, and has settled into a structure that starts off with longer keynotes and lectures, and progressively gets more interactive, culminating with a hack day on the third day. That's a lot to write about, so for this post I'll focus on the talks and panel session, and follow up with another post about the collaborative bits. I'll also probably need to come back and add in more links to bits and pieces once slides and the "official" summary of the event become available. Updates 2021-04-07: Added links to recordings of keynotes and panel sessions. Provocations The first day began with two keynotes on this year's main themes: FAIR Research Software and Diversity & Inclusion, and day 2 had a great panel session focused on disability.
All three were streamed live and the recordings remain available on YouTube: View the keynotes recording; Google-free alternative link View the panel session recording; Google-free alternative link FAIR Research Software Dr Michelle Barker, Director of the Research Software Alliance, spoke on the challenges to recognition of software as part of the scholarly record: software is not often cited. The FAIR4RS working group has been set up to investigate and create guidance on how the FAIR Principles for data can be adapted to research software as well; as they stand, the Principles are not ideally suited to software. This work will only be the beginning though, as we will also need metrics, training, career paths and much more. ReSA itself has 3 focus areas: people, policy and infrastructure. If you're interested in getting more involved in this, you can join the ReSA email list. Equality, Diversity & Inclusion: how to go about it Dr Chonnettia Jones, Vice President of Research, Michael Smith Foundation for Health Research, spoke extensively and persuasively on the need for Equality, Diversity & Inclusion (EDI) initiatives within research, as there is abundant robust evidence that all research outcomes are improved. She highlighted the difficulties current approaches to EDI have in effecting structural change and in changing not just individual behaviours but the cultures & practices that perpetuate inequity. While initiatives are often constructed around making up for individual deficits, a better framing is to start from an understanding of individuals having equal stature but different lived experiences. Commenting on the current focus on "research excellence", she pointed out that the hyper-competition this promotes is deeply unhealthy, suggesting instead that true excellence requires diversity, and we should focus on an inclusive excellence driven by inclusive leadership. Equality, Diversity & Inclusion: disability issues Day 2's EDI panel session brought together five disabled academics to discuss the problems of disability in research. Dr Becca Wilson, UKRI Innovation Fellow, Institute of Population Health Science, University of Liverpool (Chair) Phoenix C S Andrews (PhD Student, Information Studies, University of Sheffield and Freelance Writer) Dr Ella Gale (Research Associate and Machine Learning Subject Specialist, School of Chemistry, University of Bristol) Prof Robert Stevens (Professor and Head of Department of Computer Science, University of Manchester) Dr Robin Wilson (Freelance Data Scientist and SSI Fellow) NB. The discussion flowed quite freely, so the following summary mixes up input from all the panel members. Researchers are often assumed to be single-minded in following their research calling, and aptness for jobs is often partly judged on "time served", which disadvantages any disabled person who has been forced to take a career break. On top of this, disabled people are often time-poor because of the extra time needed to manage their condition, leaving them with less "output" to show for their time served on many common metrics. This can particularly affect early-career researchers, since resources for these are often restricted on a "years-since-PhD" criterion. Time poverty also makes funding with short deadlines that much harder to apply for.
Employers add more demands right from the start: new starters are typically expected to complete a health and safety form, generally a brief affair that will suddenly become an 80-page bureaucratic nightmare if you tick the box declaring a disability. Many employers claim to be inclusive yet utterly fail to understand the needs of their disabled staff. Wheelchairs are liberating for those who use them (despite the awful but common phrase “wheelchair-bound”) and yet employers will refuse to insure a wheelchair while travelling for work, classifying it as a “high value personal item” that the owner would take the same responsibility for as an expensive camera. Computers open up the world for blind people in a way that was never possible without them, but it’s not unusual for mandatory training to be inaccessible to screen readers. Some of these barriers can be overcome, but doing so takes yet more time that could and should be spent on more important work. What can we do about it? Academia works on patronage whether we like it or not, so be the person who supports people who are different to you rather than focusing on the one you “recognise yourself in” to mentor. As a manager, it’s important to ask each individual what they need and believe them: they are the expert in their own condition and their lived experience of it. Don’t assume that because someone else in your organisation with the same disability needs one set of accommodations, it’s invalid for your staff member to require something totally different. And remember: disability is unusual as a protected characteristic in that anyone can acquire it at any time without warning! Lightning talks Lightning talk sessions are always tricky to summarise, and while this doesn’t do them justice, here are a few highlights from my notes. Data & metadata Malin Sandstrom talked about a much-needed refinement of contributor role taxonomies for scientific computing Stephan Druskat showcased a project to crowdsource a corpus of research software for further analysis Learning & teaching/community Matthew Bluteau introduced the concept of the “coding dojo” as a way to enhance community of practice. A group of coders got together to practice & learn by working together to solve a problem and explaining their work as they go He described 2 models: a code jam, where people work in small groups, and the Randori method, where 2 people do pair programming while the rest observe. I’m excited to try this out! Steve Crouch talked about intermediate skills and helping people take the next step, which I’m also very interested in with the GLAM Data Science network Esther Plomp recounted experience of running multiple Carpentry workshops online, while Diego Alonso Alvarez discussed planned workshops on making research software more usable with GUIs Shoaib Sufi showcased the SSI’s new event organising guide Caroline Jay reported on a diary study into autonomy & agency in RSE during COVID Lopez, T., Jay, C., Wermelinger, M., & Sharp, H. (2021). How has the covid-19 pandemic affected working conditions for research software engineers? Unpublished manuscript. Wrapping up That’s not everything! But this post is getting pretty long so I’ll wrap up for now. I’ll try to follow up soon with a summary of the “collaborative” part of Collaborations Workshop: the idea-generating sessions and hackday! 
Comments You can comment on this post, "Collaborations Workshop 2021: talks & panel session", by: Replying to its tweet on Twitter or its toot on Mastodon Sending a Webmention from your own site to https://erambler.co.uk/blog/collabw21-part-1/ Using this button: Comments & reactions haven't loaded yet. You might have JavaScript disabled but that's cool 😎. me elsewhere :: keyoxide | keybase | mastodon | matrix | twitter | github | gitlab | orcid | pypi | linkedin © 2021 Jez Cope | Built by: Hugo | Theme: Mnemosyne Build status: Except where noted, this work is licensed under a Creative Commons Attribution 4.0 International License. erambler-co-uk-4392 ---- Beginner's guide to Twitter Part I: messages, followers and searching | eRambler eRambler Jez Cope's blog on becoming a research technologist Home About Blogroll Please note: this older content has been archived and is no longer fully linked into the site. Please go to the current home page for up-to-date content. Beginner's guide to Twitter Part I: messages, followers and searching Sunday 15 March 2009 Tagged with Howto Message Social media Social networking Tutorial Tweet Twitter Web 2.0 Twitter home page I’ve recently signed up to Twitter. It’s not a new thing; it’s been around for a few years and it’s probably safe to say that I’m way behind the curve on this one. For those who haven’t come across it yet, it’s a very, very simple social networking site which allows you to broadcast 140-character messages. However, in spite of this simplicity, it’s a very powerful tool, and can be quite off-putting for new users. Since I’m a bit techie and tend to pick these things up quite quickly, a few friends have suggested that I lay down some words on how to get to grips with Twitter. I’ve ended up breaking it into three to make it a bit more digestible: Twitter basics: messages, followers and searching; Confusing conventions: @s, #s and RTs; Useful tools to make your Twittering life easier. I’ll spread them out by publishing them over a period of three days. So, without further ado, here’s the first part of my guide to making this very cool tool work for you. How does it work? When I said it was simple, I wasn’t kidding. Once you’ve signed up on the Twitter website, you do one of three things: send and receive messages, follow people (more on what this means in a bit), or search through the archive of old messages. That’s it. Let’s have a look at those components in more detail. Messages The core of Twitter is the status update or tweet; that’s a brief message, broadcast to every other user, taking up no more than 140 characters (letters, digits, punctuation, spaces). By and large, this will be some form of answer to the question “What are you doing?” You can send as many of these as you like, whenever you like. You can even split a longer message across several tweets (manually), but if you need to do this, you might want to question whether another medium might be more appropriate. You can also send direct messages to specific users: these are completely private one-to-one communications. If you’re having a conversation publicly with another user and it’s starting to ramble on, think about switching to direct messages to avoid subjecting everyone else to a conversation that doesn’t concern them. You can only send direct messages to users who are following you: more on what this means next. Followers Wading through the tweets of every other twitterer on the planet is going to take some time. The answer to this problem is ‘following’. 
You’ll notice that, to begin with, your home page shows only your own tweets. No, Twitter isn’t broken: this page will only show the tweets of people you’re following. This hands control over what you read back to you: you don’t have to follow anyone you don’t want to. I can’t emphasise enough how important this is: don’t follow anyone whose tweets aren’t worth reading. By all means follow someone for a while before you make this decision, and change your mind all you want. Just remember that if you’re not interested in updates on userxyz’s cat at 90-second intervals, no-one says you have to follow them. Follow button You can follow someone by visiting their profile page, which will have the form “http://twitter.com/username”. This page lists their most recent tweets, newest first. Right at the top, underneath their picture, there’s a button marked “Follow”: click this and it’ll change to a message telling you that you’re now following them. To stop following someone, click this message and it’ll reveal a “Remove” button for you to press. Twitter will send them an email when you start following them, but not when you stop. Following info On the left of your home page, there are links entitled “Following” and “Followers” which take you to a list of people you follow and people who follow you, respectively. On your followers list, you’ll see a tick next to anyone you’re also following, and a follow button next to anyone you’re not. Following people who follow you is good for at least three reasons: It allows you to hold a conversation, and to receive direct messages from them; It's a great way to build your network; It's considered polite. That said, my previous advice still stands: you don’t have to follow anyone you don’t want to. So how do you find people to follow? You’ve got a few options here. The best way to get started is to follow people you know in real life: try searching for them. As I’ve already mentioned you can follow people who follow you. You can wade through the global list of tweets and follow people with similar interests (searching will help here: see the next section). You could have a look at the we follow directory to find people. Finally, you can explore your network by looking at your followers’ followers and so on. It’s worth reiterating at this point that all your tweets are visible, ultimately, to anyone on the network. If you’re not happy with this, you can restrict access, which means that only your followers can read your tweets. It’ll also mean that you have to give your approval before someone can follow you. This might work for you, but openness has it’s benefits: you’ll find it a lot more people will follow you if you keep your account open. You’ll get a lot more out of Twitter if you stay open and simply avoid saying anything that you don’t want the whole world to know. Search So, you’ve got to grips with sending and reading tweets, you’ve chosen a few people to follow and started to join in the global conversation that is Twitter. You’re already getting a lot out of this great tool. But what about all the tweets you’re missing? Perhaps you represent a company and want to know who’s talking about your brand. Maybe you’re going to attend a conference and want to connect with other delegates. Maybe you just want the answer to a question and want to see if someone’s already mentioned it. For these, and many more, problems, Twitter search is the answer. 
Try searching for a brand, a conference or anything else you’re interested in, and you’ll quickly and easily discover what twitterers the world over are saying about it. You might even want to follow some of them. Well, that’s it for today. Tomorrow I’ll be looking at some of the initially confusing but massively useful conventions that have grown up within Twitter: @replies, #hashtags and retweeting. Did you find this post useful? Is there something I’ve totally missed that you think should really be in there? Perhaps you just think I’m great (well, it might happen). I want to bring you really high quality stuff, and the only way I do that is if you (yes, you with the web browser) tell me how I’m doing. Please leave a comment below or link to me from your own blog (that’ll appear here as a comment too, with a link back to you: free publicity!). I’ll do my best to respond to feedback, correct inaccuracies in the text and write more about things that interest both me and you. Finally, if you find this post useful please tell your friends and colleagues. Thanks for stopping by! Hi, I’m Jez Cope and this is my blog, where I talk about technology in research and higher education, including: Research data management; e-Research; Learning; Teaching; Educational technology. Me elsewhere Twitter github LinkedIn Diigo Zotero Google+ eRambler by Jez Cope is licensed under a Creative Commons Attribution-ShareAlike 4.0 International license erambler-co-uk-5000 ---- eRambler eRambler home about series tags talks rdm resources a blog about research communication & higher education & open culture & technology & making & librarianship & stuff Intro to the fediverse Date: 2021-04-11 Tags: [Fediverse] [Social media] [Twitter] Wow, it turns out to be 10 years since I wrote this beginners guide to Twitter. Things have moved on a loooooong way since then. Far from being the interesting, disruptive technology it was back then, Twitter has become part of the mainstream, the establishment. Almost everyone and everything is on Twitter now, which has both pros and cons. So what’s the problem? It’s now possible to follow all sorts of useful information feeds, from live updates on transport delays to your favourite sports team’s play-by-play performance to an almost infinite number of cat pictures. Read more... Collaborations Workshop 2021: collaborative ideas & hackday Date: 2021-04-07 Series: Collaborations Workshop 2021 Tags: [Technology] [Conference] [SSI] [Research] [Disability] [Equality, diversity & inclusion] My last post covered the more “traditional” lectures-and-panel-sessions approach of the first half of the SSI Collaborations Workshop. The rest of the workshop was much more interactive, consisting of a discussion session, a Collaborative Ideas session, and a whole-day hackathon! The discussion session on day one had us choose a topic (from a list of topics proposed leading up to the workshop) and join a breakout room for that topic with the aim of producing a “speed blog” by then end of 90 minutes. Read more... Collaborations Workshop 2021: talks & panel session Date: 2021-04-05 Series: Collaborations Workshop 2021 Tags: [Technology] [Conference] [SSI] [Research] [Disability] [Equality, diversity & inclusion] I’ve just finished attending (online) the three days of this year’s SSI Collaborations Workshop (CW for short), and once again it’s been a brilliant experience, as well as mentally exhausting, so I thought I’d better get a summary down while it’s still fresh it my mind. 
Collaborations Workshop is, as the name suggests, much more focused on facilitating collaborations than a typical conference, and has settled into a structure that starts off with with longer keynotes and lectures, and progressively gets more interactive culminating with a hack day on the third day. Read more... Date: 2021-04-03 Tags: [Meta] [Design] I’ve decided to try switching this website back to using Hugo to manage the content and generate the static HTML pages. I’ve been on the Python-based Nikola for a few years now, but recently I’ve been finding it quite slow, and very confusing to understand how to do certain things. I used Hugo recently for the GLAM Data Science Network website and found it had come on a lot since the last time I was using it, so I thought I’d give it another go, and redesign this site to be a bit more minimal at the same time. The theme is still a work in progress so it’ll probably look a bit rough around the edges for a while, but I think I’m happy enough to publish it now. When I get round to it I might publish some more detailed thoughts on the design. Ideas for Accessible Communications Date: 2021-03-20 Tags: [Stuff] [Accessibility] [Ablism] The Disability Support Network at work recently ran a survey on “accessible communications”, to develop guidance on how to make communications (especially internal staff comms) more accessible to everyone. I grabbed a copy of my submission because I thought it would be useful to share more widely, so here it is. Please note that these are based on my own experiences only. I am in no way suggesting that these are the only things you would need to do to ensure your communications are fully accessible. Read more... Matrix self-hosting Date: 2021-03-12 Tags: [Technology] [Matrix] [Communication] [Self-hosting] [DWeb] I started running my own Matrix server a little while ago. Matrix is something rather cool, a chat system similar to IRC or Slack, but open and federated. Open in that the standard is available for anyone to view, but also the reference implementations of server and client are open source, along with many other clients and a couple of nascent alternative servers. Federated in that, like email, it doesn’t matter what server you sign up with, you can talk to users on your own or any other server. Read more... What do you miss least about pre-lockdown life? Date: 2021-02-26 Tags: [Stuff] [Reflection] [Pandemic] @JanetHughes on Twitter: What do you miss the least from pre-lockdown life? I absolutely do not miss wandering around the office looking for a meeting room for a confidential call or if I hadn’t managed to book a room in advance. Let’s never return to that joyless frustration, hey? 10:27 AM · Feb 3, 2021 After seeing Terence Eden taking Janet Hughes' tweet from earlier this month as a writing prompt, I thought I might do the same. Read more... Remarkable blogging Date: 2021-02-06 Tags: [Technology] [Writing] [Gadgets] And the handwritten blog saga continues, as I’ve just received my new reMarkable 2 tablet, which is designed for reading, writing and nothing else. It uses a super-responsive e-ink display and writing on it with a stylus is a dream. It has a slightly rough texture with just a bit of friction that makes my writing come out a lot more legibly than on a slippery glass touchscreen. If that was all there was to it, I might not have wasted my money, but it turns out that it runs on Linux and the makers have wisely decided not to lock it down but to give you full root mess. Read more... 
GLAM Data Science Network fellow travellers Date: 2021-02-03 Series: GLAM Data Science Network Tags: [Data science] [GLAM] [Librarianship] [Humanities] [Cultural heritage] Updates 2021-02-04 Thanks to Gene @dzshuniper@ausglam.space for suggesting ADHO and a better attribution for the opening quote (see comments below for details) See comments & webmentions for details. “If you want to go fast, go alone. If you want to go far, go together.” — African proverb, probably popularised in English by Kenyan church leader Rev. Samuel Kobia (original) This quote is a popular one in the Carpentries community, and I interpret it in this context to mean that a group of people working together is more sustainable than individuals pursuing the same goal independently. Read more... Date: 2021-01-26 Tags: [Font] [Writing] [Stuff] I’ve updated my blog theme to use the quasi-proportional fonts Iosevka Aile and Iosevka Etoile. I really like the aesthetic, as they look like fixed-width console fonts (I use the true fixed-width version of Iosevka in my terminal and text editor) but they’re actually proportional which makes them easier to read. https://typeof.net/Iosevka/ 1 of 9 Next Page me elsewhere :: keyoxide | keybase | mastodon | matrix | twitter | github | gitlab | orcid | pypi | linkedin © 2021 Jez Cope | Built by: Hugo | Theme: Mnemosyne Build status: Except where noted, this work is licensed under a Creative Commons Attribution 4.0 International License. erambler-co-uk-6760 ---- Collaborations Workshop 2021: collaborative ideas & hackday eRambler home about series tags talks rdm resources Collaborations Workshop 2021: collaborative ideas & hackday Date: 2021-04-07 Series: Collaborations Workshop 2021 Tags: [Technology] [Conference] [SSI] [Research] [Disability] [Equality, diversity & inclusion] Series This post is part of a series on the SSI Collaborations Workshop in 2021. > Collaborations Workshop 2021: collaborative ideas & hackday < Collaborations Workshop 2021: talks & panel session My last post covered the more “traditional” lectures-and-panel-sessions approach of the first half of the SSI Collaborations Workshop. The rest of the workshop was much more interactive, consisting of a discussion session, a Collaborative Ideas session, and a whole-day hackathon! The discussion session on day one had us choose a topic (from a list of topics proposed leading up to the workshop) and join a breakout room for that topic with the aim of producing a “speed blog” by then end of 90 minutes. Those speed blogs will be published on the SSI blog over the coming weeks, so I won’t go into that in more detail. The Collaborative Ideas session is a way of generating hackday ideas, by putting people together at random into small groups to each raise a topic of interest to them before discussing and coming up with a combined idea for a hackday project. Because of the serendipitous nature of the groupings, it’s a really good way of generating new ideas from unexpected combinations of individual interests. After that, all the ideas from the session, along with a few others proposed by various participants, were pitched as ideas for the hackday and people started to form teams. Not every idea pitched gets worked on during the hackday, but in the end 9 teams of roughly equal size formed to spend the third day working together. My team’s project: “AHA! 
An Arts & Humanities Adventure” There’s a lot of FOMO around choosing which team to join for an event like this: there were so many good ideas and I wanted to work on several of them! In the end I settled on a team developing an escape room concept to help Arts & Humanities scholars understand the benefits of working with research software engineers for their research. Five of us rapidly mapped out an example storyline for an escape room, got a website set up with GitHub and populated it with the first few stages of the game. We decided to focus on a story that would help the reader get to grips with what an API is and I’m amazed how much we managed to get done in less than a day’s work! You can try playing through the escape room (so far) yourself on the web, or take a look at the GitHub repository, which contains the source of the website along with a list of outstanding tasks to work on if you’re interested in contributing. I’m not sure yet whether this project has enough momentum to keep going, but it was a really valuable way both of getting to know and building trust with some new people and demonstrating the concept is worth more work. Other projects Here’s a brief rundown of the other projects worked on by teams on the day. Coding Confessions Everyone starts somewhere and everyone cuts corners from time to time. Real developers copy and paste! Fight imposter syndrome by looking through some of these confessions or contributing your own. https://coding-confessions.github.io/ CarpenPI A template to set up a Raspberry Pi with everything you need to run a Carpentries (https://carpentries.org/) data science/software engineering workshop in a remote location without internet access. https://github.com/CarpenPi/docs/wiki Research Dugnads A guide to running an event that is a coming together of a research group or team to share knowledge, pass on skills, tidy and review code, among other software and working best practices (based on the Norwegian concept of a dugnad, a form of “voluntary work done together with other people”) https://research-dugnads.github.io/dugnads-hq/ Collaborations Workshop ideas A meta-project to collect together pitches and ideas from previous Collaborations Workshop conferences and hackdays, to analyse patterns and revisit ideas whose time might now have come. https://github.com/robintw/CW-ideas howDescribedIs Integrate existing tools to improve the machine-readable metadata attached to open research projects by integrating projects like SOMEF, codemeta.json and HowFAIRIs (https://howfairis.readthedocs.io/en/latest/index.html). Complete with CI and badges! https://github.com/KnowledgeCaptureAndDiscovery/somef-github-action Software end-of-project plans Develop a template to plan and communicate what will happen when the fixed-term project funding for your research software ends. Will maintenance continue? When will the project sunset? Who owns the IP? https://github.com/elichad/software-twilight Habeas Corpus A corpus of machine readable data about software used in COVID-19 related research, based on the CORD19 dataset. https://github.com/softwaresaved/habeas-corpus Credit-all Extend the all-contributors GitHub bot (https://allcontributors.org/) to include rich information about research project contributions such as the CASRAI Contributor Roles Taxonomy (https://casrai.org/credit/) https://github.com/dokempf/credit-all I’m excited to see so many metadata-related projects! 
I plan to take a closer look at what the Habeas Corpus, Credit-all and howDescribedIs teams did when I get time. I also really want to try running a dugnad with my team or for the GLAM Data Science network. erambler-co-uk-7259 ---- Intro to the fediverse Date: 2021-04-11 Tags: [Fediverse] [Social media] [Twitter] Wow, it turns out to be 10 years since I wrote this beginner's guide to Twitter. Things have moved on a loooooong way since then. Far from being the interesting, disruptive technology it was back then, Twitter has become part of the mainstream, the establishment. Almost everyone and everything is on Twitter now, which has both pros and cons. So what's the problem? It's now possible to follow all sorts of useful information feeds, from live updates on transport delays to your favourite sports team's play-by-play performance to an almost infinite number of cat pictures. In my professional life it's almost guaranteed that anyone I meet will be on Twitter, meaning that I can contact them to follow up at a later date without having to exchange contact details (and they have options to block me if they don't like that). On the other hand, a medium where everyone's opinion is equally valid regardless of knowledge or life experience has turned some parts of the internet into a toxic swamp of hatred and vitriol. It's easier than ever to forget that we have more common ground with any random stranger than we have differences, and that's led to some truly awful acts and a poisonous political arena. Part of the problem here is that each of the social media platforms is controlled by a single entity with almost no accountability to anyone other than shareholders. Technological change has been so rapid that the regulatory regime has no idea how to handle them, leaving them largely free to operate how they want. This has led to a whole heap of nasty consequences that many other people have done a much better job of documenting than I could (Shoshana Zuboff's book The Age of Surveillance Capitalism is a good example). What I'm going to focus on instead are some possible alternatives. If you accept the above argument, one obvious solution is to break up the effective monopoly enjoyed by Facebook, Twitter et al. We need to be able to retain the wonderful affordances of social media but democratise control of it, so that it can never be dominated by a small number of overly powerful players. What's the solution? There's actually a thing that already exists, that almost everyone is familiar with and that already works like this. It's email.
There are a hundred thousand email servers, but my email can always find your inbox if I know your address, because that address identifies both you and the email service you use, and they communicate using the same protocol, Simple Mail Transfer Protocol (SMTP)[1]. I can't send a message to your Twitter from my Facebook though, because they're completely incompatible, like oil and water. Facebook has no idea how to talk to Twitter and vice versa (and the companies that control them have zero interest in such interoperability anyway). Just like email, a federated social media service like Mastodon allows you to use any compatible server, or even run your own, and follow accounts on your home server or anywhere else, even servers running different software, as long as they use the same ActivityPub protocol. There's no lock-in because you can move to another server any time you like, and interact with all the same people from your new home, just like changing your email address. Smaller servers mean that no one server ends up with enough power to take over and control everything, as the social media giants do with their own platforms. But at the same time, a small server with a small moderator team can enforce local policy much more easily and block accounts or whole servers that host trolls, nazis or other poisonous people. How do I try it? I have no problem with anyone choosing to continue to use what we're already calling "traditional" social media; frankly, Facebook and Twitter are still useful for me to keep in touch with a lot of my friends. However, I do think it's useful to know some of the alternatives, if only to make a more informed decision to stick with your current choices. Most of these services only ask for an email address when you sign up, and use of your real name vs a pseudonym is entirely optional, so there's not really any risk in signing up and giving one a try. That said, make sure you take sensible precautions like not reusing a password from another account.
Instead of… → Try…
Twitter, Facebook → Mastodon, Pleroma, Misskey
Slack, Discord, IRC → Matrix
WhatsApp, FB Messenger, Telegram → Also Matrix
Instagram, Flickr → PixelFed
YouTube → PeerTube
The web → Interplanetary File System (IPFS)
[1] Which, if you can believe it, was formalised nearly 40 years ago in 1982 and has only had fairly minor changes since then!
erinrwhite-com-2076 ---- Libraries – erin white library technology, UX, the web, bikes, #RVA Talk: Using light from the dumpster fire to illuminate a more just digital world This February I gave a lightning talk for the Richmond Design Group. My question: what if we use the light from the dumpster fire of 2020 to see an equitable, just digital world? How can we change our thinking to build the future web we need? Presentation is embedded here; text of talk is below.
[…] Podcast interview: Names, binaries and trans-affirming systems on Legacy Code Rocks! In February I was honored to be invited to join Scott Ford on his podcast Legacy Code Rocks!. I’m embedding the audio below. View the full episode transcript — thanks to trans-owned Deep South Transcription Services! I’ve pulled out some of the topics we discussed and heavily edited/rearranged them for clarity. Names in systems Legal […] Trans-inclusive design at A List Apart I am thrilled and terrified to say that I have an article on Trans-inclusive design out on A List Apart today. I have read A List Apart for years and have always seen it as The Site for folks who make websites, so it is an honor to be published there. Coming out as nonbinary at work This week, after 10 years of working at VCU Libraries, I have been letting my colleagues know that I’m nonbinary. Response from my boss, my team, and my colleagues has been so positive, and has made this process so incredibly easy. I didn’t really have a template for a coming-out message, so ended up writing […] What it means to stay Seven years ago last month I interviewed for my job at VCU. I started work a few months later, assuming I’d stick around for a couple of years then move on to my Next Academic Library Job. Instead I found myself signing closing papers on a house on my sixth work anniversary, having decided to […] Back-to-school mobile snapshot This week I took a look at mobile phone usage on the VCU Libraries website for the first couple weeks of class and compared that to similar time periods from the past couple years. 2015 Here’s some data from the first week of class through today. Note that mobile is 9.2% of web traffic. To round […] Recruiting web workers for your library In the past few years I’ve created a couple of part-time, then full-time, staff positions on the web team at VCU Libraries. We now have a web designer and a web developer who’ve both been with us for a while, but for a few years it was a revolving door of hires. So let’s just say I’ve hired lots […] Easier access for databases and research guides at VCU Libraries Today VCU Libraries launched a couple of new web tools that should make it easier for people to find or discover our library’s databases and research guides. This project’s goal was to help connect “hunters” to known databases and help “gatherers” explore new topic areas in databases and research guides1. Our web redesign task force […] Why this librarian supports the Ada Initiative This week the Ada Initiative is announcing a fundraising drive just for the library community. I’m pitching in, and I hope you will, too. The Ada Initiative’s mission is to increase the status and participation of women in open technology and culture. The organization holds AdaCamps, ally workshops for men, and impostor syndrome trainings; and […] A new look for search at VCU Libraries This week we launched a new design for VCU Libraries Search (our instance of Ex Libris’ Primo discovery system). The guiding design principles behind this project: Mental models: Bring elements of the search interface in line with other modern, non-library search systems that our users are used to. 
In our case, we looked to e-commerce websites […] erinrwhite-com-4563 ---- Talk: Using light from the dumpster fire to illuminate a more just digital world – erin white erinrwhite Published April 16, 2021 Skip to content erinrwhite in Libraries, Richmond | April 16, 2021 Talk: Using light from the dumpster fire to illuminate a more just digital world This February I gave a lightning talk for the Richmond Design Group. My question: what if we use the light from the dumpster fire of 2020 to see an equitable, just digital world? How can we change our thinking to build the future web we need? Presentation is embedded here; text of talk is below. Hi everybody, I’m Erin. Before I get started I want to say thank you to the RVA Design Group organizers. This is hard work and some folks have been doing it for YEARS. Thank you to the organizers of this group for doing this work and for inviting me to speak. This talk isn’t about 2020. This talk is about the future. But to understand the future, we gotta look back. The web in 1996 Travel with me to 1996. Twenty-five years ago! I want to transport us back to the mindset of the early web. The fundamental idea of hyperlinks, which we now take for granted, really twisted everyone’s noodles. So much of the promise of the early web was that with broad access to publish in hypertext, the opportunities were limitless. Technologists saw the web as an equalizing space where systems of oppression that exist in the real world wouldn’t matter, and that we’d all be equal and free from prejudice. Nice idea, right? You don’t need to’ve been around since 1996 to know that’s just not the way things have gone down. Pictured before you are some of the early web pioneers. Notice a pattern here? These early visions of the web, including Barlow’s declaration of independence of cyberspace, while inspiring and exciting, were crafted by the same types of folks who wrote the actual declaration of independence: the landed gentry, white men with privilege. Their vision for the web echoed the declaration of independence’s authors’ attempts to describe the world they envisioned. And what followed was the inevitable conflict with reality. We all now hold these truths to be self-evident: The systems humans build reflect humans’ biases and prejudices. We continue to struggle to diversify the technology industry. Knowledge is interest-driven. Inequality exists, online and off. Celebrating, rather than diminishing, folks’ intersecting identities is vital to human flourishing. The web we have known Profit first: monetization, ads, the funnel, dark patterns Can we?: Innovation for innovation’s sake Solutionism: code will save us Visual design: aesthetics over usability Lone genius: “hard” skills and rock star coders Short term thinking: move fast, break stuff Shipping: new features, forsaking infrastructure Let’s move forward quickly through the past 25 years or so of the web, of digital design. All of the web we know today has been shaped in some way by intersecting matrices of domination: colonialism, capitalism, white supremacy, patriarchy. (Thank you, bell hooks.) The digital worlds where we spend our time – and that we build!! – exist in this way. This is not an indictment of anyone’s individual work, so please don’t take it personally. What I’m talking about here is the digital milieu where we live our lives. The funnel drives everything. 
Folks who work in nonprofits and public entities often tie ourselves in knots to retrofit our use cases in order to use common web tools (google analytics, anyone?) In chasing innovation™ we often overlook important infrastructure work, and devalue work — like web accessibility, truly user-centered design, care work, documentation, customer support and even care for ourselves and our teams — that doesn’t drive the bottom line. We frequently write checks for our future selves to cash, knowing damn well that we’ll keep burying ourselves in technical debt. That’s some tough stuff for us to carry with us every day. The “move fast” mentality has resulted in explosive growth, but at what cost? And in creating urgency where it doesn’t need to exist, focusing on new things rather than repair, the end result is that we’re building a house of cards. And we’re exhausted. To zoom way out, this is another manifestation of late capitalism. Emphasis on LATE. Because…2020 happened. What 2020 taught us Hard times amplify existing inequalities Cutting corners mortgages our future Infrastructure is essential “Colorblind”/color-evasive policy doesn’t cut it Inclusive design is vital We have a duty to each other Technology is only one piece Together, we rise The past year has been awful for pretty much everybody. But what the light from this dumpster fire has illuminated is that things have actually been awful for a lot of people, for a long time. This year has shown us how perilous it is to avoid important infrastructure work and to pursue innovation over access. It’s also shown us that what is sometimes referred to as colorblindness — I use the term color-evasiveness because it is not ableist and it is more accurate — a color-evasive approach that assumes everyone’s needs are the same in fact leaves people out, especially folks who need the most support. We’ve learned that technology is a crucial tool and that it’s just one thing that keeps us connected to each other as humans. Finally, we’ve learned that if we work together we can actually make shit happen, despite a world that tells us individual action is meaningless. Like biscuits in a pan, when we connect, we rise together. Marginalized folks have been saying this shit for years. More of us than ever see these things now. And now we can’t, and shouldn’t, unsee it. The web we can build together Current state: – Profit first – Can we? – Solutionism – Aesthetics – “Hard” skills – Rockstar coders – Short term thinking – Shipping Future state: – People first: security, privacy, inclusion – Should we? – Holistic design – Accessibility – Soft skills – Teams – Long term thinking – Sustaining So let’s talk about the future. I told you this would be a talk about the future. Like many of y’all I have had a very hard time this year thinking about the future at all. It’s hard to make plans. It’s hard to know what the next few weeks, months, years will look like. And who will be there to see it with us. But sometimes, when I can think clearly about something besides just making it through every day, I wonder. What does a people-first digital world look like? Who’s been missing this whole time? Just because we can do something, does it mean we should? Will technology actually solve this problem? Are we even defining the problem correctly? What does it mean to design knowing that even “able-bodied” folks are only temporarily so? And that our products need to be used, by humans, in various contexts and emotional states? 
(There are also false binaries here: aesthetics vs. accessibility; abled and disabled; binaries are dangerous!) How can we nourish our collaborations with each other, with our teams, with our users? And focus on the wisdom of the folks in the room rather than assigning individuals as heroes? How can we build for maintenance and repair? How do we stop writing checks our future selves to cash – with interest? Some of this here, I am speaking of as a web user and a web creator. I’ve only ever worked in the public sector. When I talk with folks working in the private sector I always do some amount of translating. At the end of the day, we’re solving many of the same problems. But what can private-sector workers learn from folks who come from a public-sector organization? And, as we think about what we build online, how can we also apply that thinking to our real-life communities? What is our role in shaping the public conversation around the use of technologies? I offer a few ideas here, but don’t want them to limit your thinking. Consider the public sector Here’s a thread about public service. ⚖️🏛️ 💪🏼💻🇺🇸 — Dana Chisnell (she / her) (@danachis) February 5, 2021 I don’t have a ton of time left today. I wanted to talk about public service like the very excellent Dana Chisnell here. Like I said, I’ve worked in the public sector, in higher ed, for a long time. It’s my bread and butter. It’s weird, it’s hard, it’s great. There’s a lot of work to be done, and it ain’t happening at civic hackathons or from external contractors. The call needs to come from inside the house. Working in the public sector Government should be – inclusive of all people – responsive to needs of the people – effective in its duties & purpose — Dana Chisnell (she / her) (@danachis) February 5, 2021 I want you to consider for a minute how many folks are working in the public sector right now, and how technical expertise — especially in-house expertise — is something that is desperately needed. Pictured here are the old website and new website for the city of Richmond. I have a whole ‘nother talk about that new Richmond website. I FOIA’d the contracts for this website. There are 112 accessibility errors on the homepage alone. It’s been in development for 3 years and still isn’t in full production. Bottom line, good government work matters, and it’s hard to find. Important work is put out for the lowest bidder and often external agencies don’t get it right. What would it look like to have that expertise in-house? Influencing technology policy We also desperately need lawmakers and citizens who understand technology and ask important questions about ethics and human impact of systems decisions. Pictured here are some headlines as well as a contract from the City of Richmond. Y’all know we spent $1.5 million on a predictive policing system that will disproportionately harm citizens of color? And that earlier this month, City Council voted to allow Richmond and VCU PD’s to start sharing their data in that system? The surveillance state abides. Technology facilitates. I dare say these technologies are designed to bank on the fact that lawmakers don’t know what they’re looking at. My theory is, in addition to holding deep prejudices, lawmakers are also deeply baffled by technology. The hard questions aren’t being asked, or they’re coming too late, and they’re coming from citizens who have to put themselves in harm’s way to do so. Technophobia is another harmful element that’s emerged in the past decades. 
What would a world look like where technology is not a thing to shrug off as un-understandable, but is instead deftly co-designed to meet our needs, rather than licensed to our city for 1.5 million dollars? What if everyone knew that technology is not neutral? Closing This is some of the future I can see. I hope that it’s sparked new thoughts for you. Let’s envision a future together. What has the light illuminated for you? Thank you! erinrwhite Published April 16, 2021 erinrwhite-com-5053 ---- erin white – library technology, UX, the web, bikes, #RVA erinrwhite in Libraries, Richmond | April 16, 2021 Talk: Using light from the dumpster fire to illuminate a more just digital world
April 16, 2021 | Comment This car runs: Love letter to a 1997 Honda Accord Three years ago I sold my 1997 Honda Accord DX. Here’s the Craigslist ad love letter I wrote to it. 1997 Honda Accord DX – 4dr, automatic – This car runs. – $500 (Richmond, VA) 1997 Honda Accord DX 4 door 4 cylinders 206,193 miles Color: “Eucalyptus green pearl” aka the color and year that […] in Life, Richmond | April 1, 2021 Podcast interview: Names, binaries and trans-affirming systems on Legacy Code Rocks! In February I was honored to be invited to join Scott Ford on his podcast Legacy Code Rocks!. I’m embedding the audio below. View the full episode transcript — thanks to trans-owned Deep South Transcription Services! I’ve pulled out some of the topics we discussed and heavily edited/rearranged them for clarity. Names in systems Legal […] in Libraries | March 31, 2021 erambler-co-uk-3847 ---- eRambler eRambler Recent content on eRambler Intro to the fediverse Wow, it turns out to be 10 years since I wrote this beginners guide to Twitter. Things have moved on a loooooong way since then. Far from being the interesting, disruptive technology it was back then, Twitter has become part of the mainstream, the establishment. Almost everyone and everything is on Twitter now, which has both pros and cons. So what’s the problem? It’s now possible to follow all sorts of useful information feeds, from live updates on transport delays to your favourite sports team’s play-by-play performance to an almost infinite number of cat pictures.
In my professional life it’s almost guaranteed that anyone I meet will be on Twitter, meaning that I can contact them to follow up at a later date without having to exchange contact details (and they have options to block me if they don’t like that). On the other hand, a medium where everyone’s opinion is equally valid regardless of knowledge or life experience has turned some parts of the internet into a toxic swamp of hatred and vitriol. It’s easier than ever to forget that we have more common ground with any random stranger than we have differences, and that’s led to some truly awful acts and a poisonous political arena. Part of the problem here is that each of the social media platforms is controlled by a single entity with almost no accountability to anyone other than shareholders. Technological change has been so rapid that the regulatory regime has no idea how to handle these platforms, leaving them largely free to operate how they want. This has led to a whole heap of nasty consequences that many other people have done a much better job of documenting than I could (Shoshana Zuboff’s book The Age of Surveillance Capitalism is a good example). What I’m going to focus on instead are some possible alternatives. If you accept the above argument, one obvious solution is to break up the effective monopoly enjoyed by Facebook, Twitter et al. We need to be able to retain the wonderful affordances of social media but democratise control of it, so that it can never be dominated by a small number of overly powerful players. What’s the solution? There’s actually a thing that already exists, that almost everyone is familiar with and that already works like this. It’s email. There are a hundred thousand email servers, but my email can always find your inbox if I know your address because that address identifies both you and the email service you use, and they communicate using the same protocol, Simple Mail Transfer Protocol (SMTP)1. I can’t send a message to your Twitter from my Facebook though, because they’re completely incompatible, like oil and water. Facebook has no idea how to talk to Twitter and vice versa (and the companies that control them have zero interest in such interoperability anyway). Just like email, a federated social media service like Mastodon allows you to use any compatible server, or even run your own, and follow accounts on your home server or anywhere else, even servers running different software as long as they use the same ActivityPub protocol. There’s no lock-in because you can move to another server any time you like, and interact with all the same people from your new home, just like changing your email address. Smaller servers mean that no one server ends up with enough power to take over and control everything, as the social media giants do with their own platforms. But at the same time, a small server with a small moderator team can enforce local policy much more easily and block accounts or whole servers that host trolls, nazis or other poisonous people. How do I try it? I have no problem with anyone choosing to continue to use what we’re already calling “traditional” social media; frankly, Facebook and Twitter are still useful for me to keep in touch with a lot of my friends. However, I do think it’s useful to know some of the alternatives if only to make a more informed decision to stick with your current choices.
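To make the email analogy above a little more concrete, here is a minimal sketch of the discovery step fediverse servers share: a WebFinger lookup that turns a user@domain handle into an ActivityPub actor URL. It assumes the Python requests library and a made-up handle, and any Mastodon-compatible server exposes the same well-known endpoint.

```python
# Minimal sketch: resolving a fediverse handle to its ActivityPub actor via WebFinger.
# The handle "someone@example.social" is a placeholder; the /.well-known/webfinger
# endpoint and the "self" link with type application/activity+json are standard.
import requests

def find_actor(handle: str) -> str:
    """Return the ActivityPub actor URL for a user@domain handle."""
    user, domain = handle.lstrip("@").split("@", 1)
    resp = requests.get(
        f"https://{domain}/.well-known/webfinger",
        params={"resource": f"acct:{user}@{domain}"},
        timeout=10,
    )
    resp.raise_for_status()
    for link in resp.json().get("links", []):
        if link.get("rel") == "self" and link.get("type") == "application/activity+json":
            return link["href"]
    raise ValueError(f"No ActivityPub actor found for {handle}")

if __name__ == "__main__":
    print(find_actor("someone@example.social"))
```

Because every server answers this the same way, it doesn’t matter which one you sign up with: your home server can always find the account you want to follow, just as any mail server can find any inbox.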
Most of these services only ask for an email address when you sign up, and use of your real name vs a pseudonym is entirely optional, so there’s not really any risk in signing up and giving one a try. That said, make sure you take sensible precautions like not reusing a password from another account.
Instead of… → Try…
Twitter, Facebook → Mastodon, Pleroma, Misskey
Slack, Discord, IRC → Matrix
WhatsApp, FB Messenger, Telegram → also Matrix
Instagram, Flickr → PixelFed
YouTube → PeerTube
The web → Interplanetary File System (IPFS)
1. Which, if you can believe it, was formalised nearly 40 years ago in 1982 and has only had fairly minor changes since then! ↩︎ Collaborations Workshop 2021: collaborative ideas & hackday My last post covered the more “traditional” lectures-and-panel-sessions approach of the first half of the SSI Collaborations Workshop. The rest of the workshop was much more interactive, consisting of a discussion session, a Collaborative Ideas session, and a whole-day hackathon! The discussion session on day one had us choose a topic (from a list of topics proposed leading up to the workshop) and join a breakout room for that topic with the aim of producing a “speed blog” by the end of 90 minutes. Those speed blogs will be published on the SSI blog over the coming weeks, so I won’t go into that in more detail. The Collaborative Ideas session is a way of generating hackday ideas, by putting people together at random into small groups to each raise a topic of interest to them before discussing and coming up with a combined idea for a hackday project. Because of the serendipitous nature of the groupings, it’s a really good way of generating new ideas from unexpected combinations of individual interests. After that, all the ideas from the session, along with a few others proposed by various participants, were pitched as ideas for the hackday and people started to form teams. Not every idea pitched gets worked on during the hackday, but in the end 9 teams of roughly equal size formed to spend the third day working together. My team’s project: “AHA! An Arts & Humanities Adventure” There’s a lot of FOMO around choosing which team to join for an event like this: there were so many good ideas and I wanted to work on several of them! In the end I settled on a team developing an escape room concept to help Arts & Humanities scholars understand the benefits of working with research software engineers for their research. Five of us rapidly mapped out an example storyline for an escape room, got a website set up with GitHub and populated it with the first few stages of the game. We decided to focus on a story that would help the reader get to grips with what an API is and I’m amazed how much we managed to get done in less than a day’s work! You can try playing through the escape room (so far) yourself on the web, or take a look at the GitHub repository, which contains the source of the website along with a list of outstanding tasks to work on if you’re interested in contributing. I’m not sure yet whether this project has enough momentum to keep going, but it was a really valuable way both of getting to know and building trust with some new people and demonstrating the concept is worth more work. Other projects Here’s a brief rundown of the other projects worked on by teams on the day. Coding Confessions Everyone starts somewhere and everyone cuts corners from time to time. Real developers copy and paste! Fight imposter syndrome by looking through some of these confessions or contributing your own.
https://coding-confessions.github.io/ CarpenPI A template to set up a Raspberry Pi with everything you need to run a Carpentries (https://carpentries.org/) data science/software engineering workshop in a remote location without internet access. https://github.com/CarpenPi/docs/wiki Research Dugnads A guide to running an event that is a coming together of a research group or team to share knowledge, pass on skills, tidy and review code, among other software and working best practices (based on the Norwegian concept of a dugnad, a form of “voluntary work done together with other people”) https://research-dugnads.github.io/dugnads-hq/ Collaborations Workshop ideas A meta-project to collect together pitches and ideas from previous Collaborations Workshop conferences and hackdays, to analyse patterns and revisit ideas whose time might now have come. https://github.com/robintw/CW-ideas howDescribedIs Integrate existing tools to improve the machine-readable metadata attached to open research projects by integrating projects like SOMEF, codemeta.json and HowFAIRIs (https://howfairis.readthedocs.io/en/latest/index.html). Complete with CI and badges! https://github.com/KnowledgeCaptureAndDiscovery/somef-github-action Software end-of-project plans Develop a template to plan and communicate what will happen when the fixed-term project funding for your research software ends. Will maintenance continue? When will the project sunset? Who owns the IP? https://github.com/elichad/software-twilight Habeas Corpus A corpus of machine readable data about software used in COVID-19 related research, based on the CORD19 dataset. https://github.com/softwaresaved/habeas-corpus Credit-all Extend the all-contributors GitHub bot (https://allcontributors.org/) to include rich information about research project contributions such as the CASRAI Contributor Roles Taxonomy (https://casrai.org/credit/) https://github.com/dokempf/credit-all I’m excited to see so many metadata-related projects! I plan to take a closer look at what the Habeas Corpus, Credit-all and howDescribedIs teams did when I get time. I also really want to try running a dugnad with my team or for the GLAM Data Science network. Collaborations Workshop 2021: talks & panel session I’ve just finished attending (online) the three days of this year’s SSI Collaborations Workshop (CW for short), and once again it’s been a brilliant experience, as well as mentally exhausting, so I thought I’d better get a summary down while it’s still fresh it my mind. Collaborations Workshop is, as the name suggests, much more focused on facilitating collaborations than a typical conference, and has settled into a structure that starts off with with longer keynotes and lectures, and progressively gets more interactive culminating with a hack day on the third day. That’s a lot to write about, so for this post I’ll focus on the talks and panel session, and follow up with another post about the collaborative bits. I’ll also probably need to come back and add in more links to bits and pieces once slides and the “official” summary of the event become available. Updates 2021-04-07 Added links to recordings of keynotes and panel sessions Provocations The first day began with two keynotes on this year’s main themes: FAIR Research Software and Diversity & Inclusion, and day 2 had a great panel session focused on disability. 
All three were streamed live and the recordings remain available on Youtube: View the keynotes recording; Google-free alternative link View the panel session recording; Google-free alternative link FAIR Research Software Dr Michelle Barker, Director of the Research Software Alliance, spoke on the challenges to recognition of software as part of the scholarly record: software is not often cited. The FAIR4RS working group has been set up to investigate and create guidance on how the FAIR Principles for data can be adapted to research software as well; as they stand, the Principles are not ideally suited to software. This work will only be the beginning though, as we will also need metrics, training, career paths and much more. ReSA itself has 3 focus areas: people, policy and infrastructure. If you’re interested in getting more involved in this, you can join the ReSA email list. Equality, Diversity & Inclusion: how to go about it Dr Chonnettia Jones, Vice President of Research, Michael Smith Foundation for Health Research, spoke extensively and persuasively on the need for Equality, Diversity & Inclusion (EDI) initiatives within research, as there is abundant robust evidence that all research outcomes are improved. She highlighted the difficulty current approaches to EDI have in effecting structural change, changing not just individual behaviours but the cultures & practices that perpetuate iniquity. While initiatives are often constructed around making up for individual deficits, a better framing is to start from an understanding of individuals having equal stature but different lived experiences. Commenting on the current focus on “research excellence”, she pointed out that the hyper-competition this promotes is deeply unhealthy, suggesting instead that true excellence requires diversity, and we should focus on an inclusive excellence driven by inclusive leadership. Equality, Diversity & Inclusion: disability issues Day 2’s EDI panel session brought together five disabled academics to discuss the problems of disability in research.
Dr Becca Wilson, UKRI Innovation Fellow, Institute of Population Health Science, University of Liverpool (Chair)
Phoenix C S Andrews (PhD Student, Information Studies, University of Sheffield and Freelance Writer)
Dr Ella Gale (Research Associate and Machine Learning Subject Specialist, School of Chemistry, University of Bristol)
Prof Robert Stevens (Professor and Head of Department of Computer Science, University of Manchester)
Dr Robin Wilson (Freelance Data Scientist and SSI Fellow)
NB. The discussion flowed quite freely, so the following summary mixes up input from all the panel members. Researchers are often assumed to be single-minded in following their research calling, and aptness for jobs is often partly judged on “time served”, which disadvantages any disabled person who has been forced to take a career break. On top of this, disabled people are often time-poor because of the extra time needed to manage their condition, leaving them with less “output” to show for their time served on many common metrics. This can particularly affect early-career researchers, since resources for these are often restricted on a “years-since-PhD” criterion. Time poverty also makes funding with short deadlines that much harder to apply for.
Employers add more demands right from the start: new starters are typically expected to complete a health and safety form, generally a brief affair that will suddenly become an 80-page bureaucratic nightmare if you tick the box declaring a disability. Many employers claim to be inclusive yet utterly fail to understand the needs of their disabled staff. Wheelchairs are liberating for those who use them (despite the awful but common phrase “wheelchair-bound”) and yet employers will refuse to insure a wheelchair while travelling for work, classifying it as a “high value personal item” that the owner would take the same responsibility for as an expensive camera. Computers open up the world for blind people in a way that was never possible without them, but it’s not unusual for mandatory training to be inaccessible to screen readers. Some of these barriers can be overcome, but doing so takes yet more time that could and should be spent on more important work. What can we do about it? Academia works on patronage whether we like it or not, so be the person who supports people who are different to you rather than focusing on the one you “recognise yourself in” to mentor. As a manager, it’s important to ask each individual what they need and believe them: they are the expert in their own condition and their lived experience of it. Don’t assume that because someone else in your organisation with the same disability needs one set of accommodations, it’s invalid for your staff member to require something totally different. And remember: disability is unusual as a protected characteristic in that anyone can acquire it at any time without warning! Lightning talks Lightning talk sessions are always tricky to summarise, and while this doesn’t do them justice, here are a few highlights from my notes. Data & metadata Malin Sandstrom talked about a much-needed refinement of contributor role taxonomies for scientific computing Stephan Druskat showcased a project to crowdsource a corpus of research software for further analysis Learning & teaching/community Matthew Bluteau introduced the concept of the “coding dojo” as a way to enhance community of practice. A group of coders got together to practice & learn by working together to solve a problem and explaining their work as they go He described 2 models: a code jam, where people work in small groups, and the Randori method, where 2 people do pair programming while the rest observe. I’m excited to try this out! Steve Crouch talked about intermediate skills and helping people take the next step, which I’m also very interested in with the GLAM Data Science network Esther Plomp recounted experience of running multiple Carpentry workshops online, while Diego Alonso Alvarez discussed planned workshops on making research software more usable with GUIs Shoaib Sufi showcased the SSI’s new event organising guide Caroline Jay reported on a diary study into autonomy & agency in RSE during COVID Lopez, T., Jay, C., Wermelinger, M., & Sharp, H. (2021). How has the covid-19 pandemic affected working conditions for research software engineers? Unpublished manuscript. Wrapping up That’s not everything! But this post is getting pretty long so I’ll wrap up for now. I’ll try to follow up soon with a summary of the “collaborative” part of Collaborations Workshop: the idea-generating sessions and hackday! Time for a new look... I’ve decided to try switching this website back to using Hugo to manage the content and generate the static HTML pages. 
I’ve been on the Python-based Nikola for a few years now, but recently I’ve been finding it quite slow, and very confusing to understand how to do certain things. I used Hugo recently for the GLAM Data Science Network website and found it had come on a lot since the last time I was using it, so I thought I’d give it another go, and redesign this site to be a bit more minimal at the same time. The theme is still a work in progress so it’ll probably look a bit rough around the edges for a while, but I think I’m happy enough to publish it now. When I get round to it I might publish some more detailed thoughts on the design. Ideas for Accessible Communications The Disability Support Network at work recently ran a survey on “accessible communications”, to develop guidance on how to make communications (especially internal staff comms) more accessible to everyone. I grabbed a copy of my submission because I thought it would be useful to share more widely, so here it is. Please note that these are based on my own experiences only. I am in no way suggesting that these are the only things you would need to do to ensure your communications are fully accessible. They’re just some things to keep in mind. Policies/procedures/guidance can be stressful to use if anything is vague or inconsistent, or if it looks like there might be more information implied than is explicitly given (a common cause of this is use of jargon in e.g. HR policies). Emails relating to these policies have similar problems, made worse because they tend to be very brief. Online meetings can be very helpful, but can also be exhausting, especially if there are too many people, or not enough structure. Larger meetings & webinars without agendas (or where the agenda is ignored, or timings are allowed to drift without acknowledgement) are very stressful, as are those where there is not enough structure to ensure fair opportunities to contribute. Written reference documents and communications should: Be carefully checked for consistency and clarity Have all all key points explicitly stated Explicitly acknowledge the need for flexibility where it is necessary, rather than implying or hinting at it Clearly define jargon & acronyms where they are necessary to the point being made, and avoid them otherwise Include links to longer, more explicit versions where space is tight Provide clear bullet-point summaries with links to the details Online meetings should: Include sufficient break time (at least 10 minutes out of every hour) and not allow this to be compromised just because a speaker has misjudged the length of their talk Include initial “settling-in” time in agendas to avoid timing getting messed up from the start Ensure the agenda is stuck to, or that divergence from the agenda is acknowledged explicitly by the chair and updated timing briefly discussed to ensure everyone is clear Establish a norm for participation at the start of the meeting and stick to it e.g. 
ask people to raise hands when they have a point to make, or have specific time for round-robin contributions Ensure quiet/introverted people have space to contribute, but don’t force them to do so if they have nothing to add at the time Offer a text-based alternative to contributing verbally If appropriate, at the start of the meeting assign specific roles of: Gatekeeper: ensures everyone has a chance to contribute Timekeeper: ensures meeting runs to time Scribe: ensures a consistent record of the meeting Be chaired by someone with the confidence to enforce the above: offer training to all staff on chairing meetings to ensure everyone has the skills to run a meeting effectively Matrix self-hosting I started running my own Matrix server a little while ago. Matrix is something rather cool, a chat system similar to IRC or Slack, but open and federated. Open in that the standard is available for anyone to view, but also the reference implementations of server and client are open source, along with many other clients and a couple of nascent alternative servers. Federated in that, like email, it doesn’t matter what server you sign up with, you can talk to users on your own or any other server. I decided to host my own for three reasons. Firstly, to see if I could and to learn from it. Secondly, to try and rationalise the Cambrian explosion of Slack teams I was being added to in 2019. Thirdly, to take some control of the loss of access to historical messages in some communities that rely on Slack (especially the Carpentries and RSE communities). Since then, I’ve also added a fourth goal: taking advantage of various bridges to bring other messaging networks I use (such as Signal and Telegram) into a consistent UI. I’ve also found that my use of Matrix-only rooms has grown as more individuals & communities have adopted the platform. So, I really like Matrix and I use it daily. My problem now is whether to keep self-hosting. Synapse, the only full server implementation at the moment, is really heavy on memory, so I’ve ended up running it on a much bigger server than I thought I’d need, which seems overkill for a single-user instance. So now I have to make a decision about whether it’s worth keeping going, or shutting it down and going back to matrix.org, or setting up on one of the other servers that have sprung up in the last couple of years. There are a couple of other considerations here. Firstly, Synapse resource usage is entirely down to the size of the rooms joined by users of the homeserver, not directly the number of users. So if users have mostly overlapping interests, and thus keep to the same rooms, you can support quite a large community without significant extra resource usage. Secondly, there are a couple of alternative server implementations in development specifically addressing this issue for small servers: Dendrite and Conduit. Neither are quite ready for what I want yet, but are getting close, and when ready that will allow running small homeservers with much more sensible resource usage. So I could start opening up for other users, and at least justify the size of the server that way. I wouldn’t ever want to make it a paid-for service but perhaps people might be willing to make occasional donations towards running costs. That still leaves me with the question of whether I’m comfortable running a service that others may come to rely on, or being responsible for the safety of their information.
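One thing that makes the decision less fraught than it sounds is that every homeserver speaks the same client-server API, so scripts and clients don’t care where an account lives. As a rough illustration (the homeserver URL, access token and room ID below are placeholders, not my real setup), posting a message looks the same whether the server is matrix.org, a self-hosted Synapse, or eventually Dendrite or Conduit:

```python
# Hedged sketch: sending a message via the Matrix client-server API (r0 prefix,
# the stable version at the time of writing). All credentials below are placeholders.
import time
from urllib.parse import quote

import requests

HOMESERVER = "https://matrix.example.org"  # hypothetical homeserver
ACCESS_TOKEN = "placeholder_access_token"  # obtained via /login or copied from a client
ROOM_ID = "!abcdefg:example.org"           # a room this account has already joined

def send_text(body: str) -> str:
    """Send an m.room.message event to ROOM_ID and return the new event ID."""
    room = quote(ROOM_ID, safe="")          # room IDs contain ! and : so URL-encode them
    txn_id = str(int(time.time() * 1000))   # transaction ID makes the PUT idempotent
    url = f"{HOMESERVER}/_matrix/client/r0/rooms/{room}/send/m.room.message/{txn_id}"
    resp = requests.put(
        url,
        headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
        json={"msgtype": "m.text", "body": body},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["event_id"]

if __name__ == "__main__":
    print(send_text("Hello from a tiny federated client!"))
```

Moving the account to another homeserver only means changing the URL and token; the protocol, and therefore any tooling built on it, stays the same.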
I could also hold out for Dendrite or Conduit to mature enough that I’m ready to try them, which might not be more than a few months off. Hmm, seems like I’ve convinced myself to stick with it for now, and we’ll see how it goes. In the meantime, if you know me and you want to try it out let me know and I might risk setting you up with an account! What do you miss least about pre-lockdown life? @JanetHughes on Twitter: What do you miss the least from pre-lockdown life? I absolutely do not miss wandering around the office looking for a meeting room for a confidential call or if I hadn’t managed to book a room in advance. Let’s never return to that joyless frustration, hey? 10:27 AM · Feb 3, 2021 After seeing Terence Eden taking Janet Hughes' tweet from earlier this month as a writing prompt, I thought I might do the same. The first thing that leaps to my mind is commuting. At various points in my life I’ve spent between one and three hours a day travelling to and from work and I’ve never more than tolerated it at best. It steals time from your day, and societal norms dictate that it’s your leisure & self-care time that must be sacrificed. Longer commutes allow more time to get into a book or podcast, especially if not driving, but I’d rather have that time at home rather than trying to be comfortable in a train seat designed for some mythical average man shaped nothing like me! The other thing I don’t miss is the colds and flu! Before the pandemic, British culture encouraged working even when ill, which meant constantly coming into contact with people carrying low-grade viruses. I’m not immunocompromised but some allergies and residue of being asthmatic as a child meant that I would get sick 2-3 times a year. A pleasant side-effect of the COVID precautions we’re all taking is that I haven’t been sick for over 12 months now, which is amazing! Finally, I don’t miss having so little control over my environment. One of the things that working from home has made clear is that there are certain unavoidable aspects of working in my shared office that cause me sensory stress, and that are completely unrelated to my work. Working (or trying to work) next to a noisy automatic scanner; trying to find a light level that works for 6 different people doing different tasks; lacking somewhere quiet and still to eat lunch and recover from a morning of meetings or the constant vaguely-distracting bustle of a large shared office. It all takes energy. Although it’s partly been replaced by the new stress of living through a global pandemic, that old stress was a constant drain on my productivity and mood that had been growing throughout my career as I moved (ironically, given the common assumption that seniority leads to more privacy) into larger and larger open plan offices. Remarkable blogging And the handwritten blog saga continues, as I’ve just received my new reMarkable 2 tablet, which is designed for reading, writing and nothing else. It uses a super-responsive e-ink display and writing on it with a stylus is a dream. It has a slightly rough texture with just a bit of friction that makes my writing come out a lot more legibly than on a slippery glass touchscreen. If that was all there was to it, I might not have wasted my money, but it turns out that it runs on Linux and the makers have wisely decided not to lock it down but to give you full root mess. Yes, you read that right: root access. 
It presents as an ethernet device over USB, so you can SSH in with a password found in the settings and have full control over your own devices. What a novel concept. This fact alone has meant it’s built a small yet devoted community of users who have come up with some clever ways of extending its functionality. In fact, many of these are listed on this GitHub repository. Finally, from what I’ve seen so far, the handwriting recognition is impressive to say the least. This post was written on it and needed only a little editing. I think this is a device that will get a lot of use! GLAM Data Science Network fellow travellers Updates 2021-02-04 Thanks to Gene @dzshuniper@ausglam.space for suggesting ADHO and a better attribution for the opening quote (see comments below for details) See comments & webmentions for details. “If you want to go fast, go alone. If you want to go far, go together.” — African proverb, probably popularised in English by Kenyan church leader Rev. Samuel Kobia (original) This quote is a popular one in the Carpentries community, and I interpret it in this context to mean that a group of people working together is more sustainable than individuals pursuing the same goal independently. That’s something that speaks to me, and that I want to make sure is reflected in nurturing this new community for data science in galleries, archives, libraries & museums (GLAM). To succeed, this work needs to be complementary and collaborative, rather than competitive, so I want to acknowledge a range of other networks & organisations whose activities complement this. The rest of this article is an unavoidably incomplete list of other relevant organisations whose efforts should be acknowledged and potentially built on. And it should go without saying, but just in case: if the work I’m planning fits right into an existing initiative, then I’m happy to direct my resources there rather than duplicate effort. Inspirations & collaborators Groups with similar goals or undertaking similar activities, but focused on a different sector, geographic area or topic. I think we should make as much use of and contribution to these existing communities as possible since there will be significant overlap. code4lib Probably the closest existing community to what I want to build, but primarily based in the US, so timezones (and physical distance for in-person events) make it difficult to participate fully. This is a well-established community though, with regular events including an annual conference so there’s a lot to learn here. newCardigan Similar to code4lib but an Australian focus, so the timezone problem is even bigger! GLAM Labs Focused on supporting the people experimenting with and developing the infrastructure to enable scholars to access GLAM materials in new ways. In some ways, a GLAM data science network would be complementary to their work, by providing people not directly involved with building GLAM Labs with the skills to make best use of GLAM Labs infrastructure. UK Government data science community Another existing community with very similar intentions, but focused on UK Government sector. Clearly the British Library and a few national & regional museums & archives fall into this, but much of the rest of the GLAM sector does not. 
Artifical Intelligence for Libraries, Archives & Museums (AI4LAM) A multinational collaboration between several large libraries, archives and museums with a specific focus on the Artificial Intelligence (AI) subset of data science UK Reproducibility Network A network of researchers, primarily in HEIs, with an interest in improving the transparency and reliability of academic research. Mostly science-focused but with some overlap of goals around ethical and robust use of data. Museums Computer Group I’m less familiar with this than the others, but it seems to have a wider focus on technology generally, within the slightly narrower scope of museums specifically. Again, a lot of potential for collaboration. Training Several organisations and looser groups exist specifically to develop and deliver training that will be relevant to members of this network. The network also presents an opportunity for those who have done a workshop with one of these and want to know what the “next steps” are to continue their data science journey. The Carpentries, aka: Library Carpentry Data Carpentry Software Carpentry Data Science Training for Librarians (DST4L) The Programming Historian CDH Cultural Heritage Data School Supporters These misson-driven organisations have goals that align well with what I imagine for the GLAM DSN, but operate at a more strategic level. They work by providing expert guidance and policy advice, lobbying and supporting specific projects with funding and/or effort. In particular, the SSI runs a fellowship programme which is currently providing a small amount of funding to this project. Digital Preservation Coalition (DPC) Software Sustainability Institute (SSI) Research Data Alliance (RDA) Alliance of Digital Humanities Organizations (ADHO) … and its Libraries and Digital Humanities Special Interest Group (Lib&DH SIG) Professional bodies These organisations exist to promote the interests of professionals in particular fields, including supporting professional development. I hope they will provide communication channels to their various members at the least, and may be interested in supporting more directly, depending on their mission and goals. Society of Research Software Engineering Chartered Institute of Library and Information Professionals Archives & Records Association Museums Association Conclusion As I mentioned at the top of the page, this list cannot possibly be complete. This is a growing area and I’m not the only or first person to have this idea. If you can think of anything glaring that I’ve missed and you think should be on this list, leave a comment or tweet/toot at me! A new font for the blog I’ve updated my blog theme to use the quasi-proportional fonts Iosevka Aile and Iosevka Etoile. I really like the aesthetic, as they look like fixed-width console fonts (I use the true fixed-width version of Iosevka in my terminal and text editor) but they’re actually proportional which makes them easier to read. https://typeof.net/Iosevka/ Training a model to recognise my own handwriting If I’m going to train an algorithm to read my weird & awful writing, I’m going to need a decent-sized training set to work with. And since one of the main things I want to do with it is to blog “by hand” it makes sense to focus on that type of material for training. In other words, I need to write out a bunch of blog posts on paper, scan them and transcribe them as ground truth. 
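The bookkeeping side of that is simple enough to sketch. Something like the following (the folder names and CSV layout are illustrative assumptions on my part, not anything Transkribus requires) would pair each scanned page with its ground-truth transcription ready for training:

```python
# Hedged sketch: pairing scanned handwriting images with ground-truth transcriptions
# into a simple training manifest. Folder names and CSV format are illustrative only.
import csv
from pathlib import Path

SCANS = Path("scans")              # e.g. scans/blog-post-01_p1.png
TRANSCRIPTS = Path("transcripts")  # e.g. transcripts/blog-post-01_p1.txt

def build_manifest(out_file: str = "training_manifest.csv") -> int:
    """Write a CSV of (image, transcription) pairs and return how many were matched."""
    rows = []
    for image in sorted(SCANS.glob("*.png")):
        text_file = TRANSCRIPTS / (image.stem + ".txt")
        if not text_file.exists():
            print(f"No transcription yet for {image.name}, skipping")
            continue
        ground_truth = text_file.read_text(encoding="utf-8").strip()
        rows.append({"image": str(image), "text": ground_truth})
    with open(out_file, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["image", "text"])
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)

if __name__ == "__main__":
    print(f"{build_manifest()} page/transcription pairs written")
```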
The added bonus of this plan is that after transcribing, I also end up with some digital text I can use as an actual post — multitasking! So, by the time you read this, I will have already run it through a manual transcription process using Transkribus to add it to my training set, and copy-pasted it into emacs for posting. This is a fun little project because it means I can: Write more by hand with one of my several nice fountain pens, which I enjoy Learn more about the operational process some of my colleagues go through when digitising manuscripts Learn more about the underlying technology & maths, and how to tune the process Produce more lovely content! For you to read! Yay! Write in a way that forces me to put off editing until after a first draft is done and focus more on getting the whole of what I want to say down. That’s it for now — I’ll keep you posted as the project unfolds. Addendum Tee hee! I’m actually just enjoying the process of writing stuff by hand in long-form prose. It’ll be interesting to see how the accuracy turns out and if I need to be more careful about neatness. Will it be better or worse than the big but generic models used by Samsung Notes or OneNote. Maybe I should include some stylus-written text for comparison. Blogging by hand I wrote the following text on my tablet with a stylus, which was an interesting experience: So, thinking about ways to make writing fun again, what if I were to write some of them by hand? I mean I have a tablet with a pretty nice stylus, so maybe handwriting recognition could work. One major problem, of course, is that my handwriting is AWFUL! I guess I’ll just have to see whether the OCR is good enough to cope… It’s something I’ve been thinking about recently anyway: I enjoy writing with a proper fountain pen, so is there a way that I can have a smooth workflow to digitise handwritten text without just typing it back in by hand? That would probably be preferable to this, which actually seems to work quite well but does lead to my hand tensing up to properly control the stylus on the almost-frictionless glass screen. I’m surprised how well it worked! Here’s a sample of the original text: And here’s the result of converting that to text with the built-in handwriting recognition in Samsung Notes: Writing blog posts by hand So, thinking about ways to make writing fun again, what if I were to write some of chum by hand? I mean, I have a toldest winds a pretty nice stylus, so maybe handwriting recognition could work. One major problems, ofcourse, is that my , is AWFUL! Iguess I’ll just have to see whattime the Ocu is good enough to cope… It’s something I’ve hun tthinking about recently anyway: I enjoy wilting with a proper fountain pion, soischeme a way that I can have a smooch workflow to digitise handwritten text without just typing it back in by hand? That wouldprobally be preferableto this, which actually scams to work quito wall but doers load to my hand tensing up to properly couldthe stylus once almost-frictionlessg lass scream. It’s pretty good! It did require a fair bit of editing though, and I reckon we can do better with a model that’s properly trained on a large enough sample of my own handwriting. What I want from a GLAM/Cultural Heritage Data Science Network Introduction As I mentioned last year, I was awarded a Software Sustainability Institute Fellowship to pursue the project of setting up a Cultural Heritage/GLAM data science network. 
Obviously, the global pandemic has forced a re-think of many plans and this is no exception, so I’m coming back to reflect on it and make sure I’m clear about the core goals so that everything else still moves in the right direction. One of the main reasons I have for setting up a GLAM data science network is because it’s something I want. The advice to “scratch your own itch” is often given to people looking for an open project to start or contribute to, and the lack of a community of people with whom to learn & share ideas and practice is something that itches for me very much. The “motivation” section in my original draft project brief for this work said: Cultural heritage work, like all knowledge work, is increasingly data-based, or at least gives opportunities to make use of data day-to-day. The proper skills to use this data enable more effective working. Knowledge and experience thus gained improves understanding of and empathy with users also using such skills. But of course, I have my own reasons for wanting to do this too. In particular, I want to: Advocate for the value of ethical, sustainable data science across a wide range of roles within the British Library and the wider sector Advance the sector to make the best use of data and digital sources in the most ethical and sustainable way possible Understand how and why people use data from the British Library, and plan/deliver better services to support that Keep up to date with relevant developments in data science Learn from others' skills and experiences, and share my own in turn Those initial goals imply some further supporting goals: Build up the confidence of colleagues who might benefit from data science skills but don’t feel they are “technical” or “computer literate” enough Further to that, build up a base of colleagues with the confidence to share their skills & knowledge with others, whether through teaching, giving talks, writing or other channels Identify common awareness gaps (skills/knowledge that people don’t know they’re missing) and address them Develop a communal space (primarily online) in which people feel safe to ask questions Develop a body of professional practice and help colleagues to learn and contribute to the evolution of this, including practices of data ethics, software engineering, statistics, high performance computing, … Break down language barriers between data scientists and others I’ll expand on this separately as my planning develops, but here are a few specific activities that I’d like to be able to do to support this: Organise less-formal learning and sharing events to complement the more formal training already available within organisations and the wider sector, including “show and tell” sessions, panel discussions, code cafés, masterclasses, guest speakers, reading/study groups, co-working sessions, … Organise training to cover intermediate skills and knowledge currently missing from the available options, including the awareness gaps and professional practice mentioned above Collect together links to other relevant resources to support self-led learning Decisions to be made There are all sorts of open questions in my head about this right now, but here are some of the key ones. Is it GLAM or Cultural Heritage? When I first started planning this whole thing, I went with “Cultural Heritage”, since I was pretty transparently targeting my own organisation. The British Library is fairly unequivocally a CH organisation. 
But as I’ve gone along I’ve found myself gravitating more towards the term “GLAM” (which stands for Galleries, Libraries, Archives, Museums) as it covers a similar range of work but is clearer (when you spell out the acronym) about what kinds of work are included. What skills are relevant? This turns out to be surprisingly important, at least in terms of how the community is described, as they define the boundaries of the community and can be the difference between someone feeling welcome or excluded. For example, I think that some introductory statistics training would be immensely valuable for anyone working with data to understand what options are open to them and what limitations those options have, but is the word “statistics” offputting per se to those who’ve chosen a career in arts & humanities? I don’t know because I don’t have that background and perspective. Keep it internal to the BL, or open up early on? I originally planned to focus primarily on my own organisation to start with, feeling that it would be easier to organise events and build a network within a single organisation. However, the pandemic has changed my thinking significantly. Firstly, it’s now impossible to organise in-person events and that will continue for quite some time to come, so there is less need to focus on the logistics of getting people into the same room. Secondly, people within the sector are much more used to attending remote events, which can easily be opened up to multiple organisations in many countries, timezones allowing. It now makes more sense to focus primarily on online activities, which opens up the possibility of building a critical mass of active participants much more quickly by opening up to the wider sector. Conclusion This is the type of post that I could let run and run without ever actually publishing, but since it’s something I need feedback and opinions on from other people, I’d better ship it! I really want to know what you think about this, whether you feel it’s relevant to you and what would make it useful. Comments are open below, or you can contact me via Mastodon or Twitter. Writing About Not Writing Under Construction Grunge Sign by Nicolas Raymond — CC BY 2.0 Every year, around this time of year, I start doing two things. First, I start thinking I could really start to understand monads and write more than toy programs in Haskell. This is unlikely to ever actually happen unless and until I get a day job where I can justify writing useful programs in Haskell, but Advent of Code always gets me thinking otherwise. Second, I start mentally writing this same post. You know, the one about how the blogger in question hasn’t had much time to write but will be back soon? “Sorry I haven’t written much lately…” It’s about as cliché as a Geocities site with a permanent “Under construction” GIF. At some point, not long after the dawn of ~time~ the internet, most people realised that every website was permanently under construction and publishing something not ready to be published was just pointless. So I figured this year I’d actually finish writing it and publish it. After all, what’s the worst that could happen? If we’re getting all reflective about this, I could probably suggest some reasons why I’m not writing much: For a start, there’s a lot going on in both my world and The World right now, which doesn’t leave a lot of spare energy after getting up, eating, housework, working and a few other necessary activities. 
As a result, I’m easily distracted and I tend to let myself get dragged off in other directions before I even get to writing much of anything. If I do manage to focus on this blog in general, I’ll often end up working on some minor tweak to the theme or functionality. I mean, right now I’m wondering if I can do something clever in my text-editor (Emacs, since you’re asking) to streamline my writing & editing process so it’s more elegant, efficient, ergonomic and slightly closer to perfect in every way. It also makes me much more likely to self-censor, and to indulge my perfectionist tendencies to try and tweak the writing until it’s absolutely perfect, which of course never happens. I’ve got a whole heap of partly-written posts that are juuuust waiting for the right motivation for me to just finish them off. The only real solution is to accept that: I’m not going to write much and that’s probably OK What I do write won’t always be the work of carefully-researched, finely crafted genius that I want it to be, and that’s probably OK too Also to remember why I started writing and publishing stuff in the first place: to reflect and get my thoughts out onto a (virtual) page so that I can see them, figure out whether I agree with myself and learn; and to stimulate discussion and get other views on my (possibly uninformed, incorrect or half-formed) thoughts, also to learn. In other words, a thing I do for me. It’s easy to forget that and worry too much about whether anyone else wants to read my s—t. Will you notice any changes? Maybe? Maybe not? Who knows. But it’s a new year and that’s as good a time for a change as any. When is a persistent identifier not persistent? Or an identifier? I wrote a post on the problems with ISBNs as persistent identifiers (PIDs) for work, so check it out if that sounds interesting. IDCC20 reflections I’m just back from IDCC20, so here are a few reflections on this year’s conference. You can find all the available slides and links to shared notes on the conference programme. There’s also a list of all the posters and an overview of the Unconference. Skills for curation of diverse datasets Here in the UK and elsewhere, you’re unlikely to find many institutions claiming to apply a deep level of curation to every dataset/software package/etc deposited with them. There are so many different kinds of data and so few people in any one institution doing “curation” that it’s impossible to do this for everything. Absent the knowledge and skills required to fully evaluate an object, the best that can be done is usually to make a sense check on the metadata and flag up with the depositor the potential for high-level issues such as accidental disclosure of sensitive personal information. The Data Curation Network in the United States is aiming to address this issue by pooling expertise across multiple organisations. The pilot has been highly successful and they’re now looking to obtain funding to continue this work. The Swedish National Data Service is experimenting with a similar model, also with a lot of success. As well as sharing individual expertise, the DCN collaboration has also produced some excellent online quick-reference guides for curating common types of data. We had some further discussion as part of the Unconference on the final day about what it would look like to introduce this model in the UK. There was general agreement that this was a good idea and a way to make optimal use of sparse resources.
There were also very valid concerns that it would be difficult in the current financial climate for anyone to justify doing work for another organisation, apparently for free. In my mind there are two ways around this, which are not mutually exclusive by any stretch of the imagination. First is to Just Do It: form an informal network of curators around something simple like a mailing list, and give it a try. Second is for one or more trusted organisations to provide some coordination and structure. There are several candidates for this including DCC, Jisc, DPC and the British Library; we all have complementary strengths in this area so it’s my hope that we’ll be able to collaborate around it. In the meantime, I hope the discussion continues. Artificial intelligence, machine learning et al As you might expect at any tech-oriented conference there was a strong theme of AI running through many presentations, starting from the very first keynote from Francine Berman. Her talk, The Internet of Things: Utopia or Dystopia? used self-driving cars as a case study to unpack some of the ethical and privacy implications of AI. For example, driverless cars can potentially increase efficiency, both through route-planning and driving technique, but also by allowing fewer vehicles to be shared by more people. However, a shared vehicle is not a private space in the way your own car is: anything you say or do while in that space is potentially open to surveillance. Aside from this, there are some interesting ideas being discussed, particularly around the possibility of using machine learning to automate increasingly complex actions and workflows such as data curation and metadata enhancement. I didn’t get the impression anyone is doing this in the real world yet, but I’ve previously seen theoretical concepts discussed at IDCC make it into practice so watch this space! Playing games! Training is always a major IDCC theme, and this year two of the most popular conference submissions described games used to help teach digital curation concepts and skills. Mary Donaldson and Matt Mahon of the University of Glasgow presented their use of Lego to teach the concept of sufficient metadata. Participants build simple models before documenting the process and breaking them down again. Then everyone had to use someone else’s documentation to try and recreate the models, learning important lessons about assumptions and including sufficient detail. Kirsty Merrett and Zosia Beckles from the University of Bristol brought along their card game “Researchers, Impact and Publications (RIP)”, based on the popular “Cards Against Humanity”. RIP encourages players to examine some of the reasons for and against data sharing with plenty of humour thrown in. Both games were trialled by many of the attendees during Thursday’s Unconference. Summary I realised in Dublin that it’s 8 years since I attended my first IDCC, held at the University of Bristol in December 2011 while I was still working at the nearby University of Bath. While I haven’t been every year, I’ve been to every one held in Europe since then and it’s interesting to see what has and hasn’t changed. We’re no longer discussing data management plans, data scientists or various other things as abstract concepts that we’d like to encourage, but dealing with the real-world consequences of them. The conference has also grown over the years: this year was the biggest yet, boasting over 300 attendees. 
There has been especially big growth in attendees from North America, Australasia, Africa and the Middle East. That’s great for the diversity of the conference as it brings in more voices and viewpoints than ever. With more people around to interact with I have to work harder to manage my energy levels but I think that’s a small price to pay. Iosevka: a nice fixed-width font Iosevka is a nice, slender monospace font with a lot of configurable variations. Check it out: https://typeof.net/Iosevka/ Replacing comments with webmentions Just a quickie to say that I’ve replaced the comment section at the bottom of each post with webmentions, which allows you to comment by posting on your own site and linking here. It’s a fundamental part of the IndieWeb, which I’m slowly getting to grips with, having been a halfway member of it for years by virtue of having my own site on my own domain. I’d already got rid of Google Analytics to stop forcing that tracking on my visitors, and I wanted to get rid of Disqus too because I’m pretty sure the only way that is free for me is if they’re selling my data and yours to third parties. Webmention is a nice alternative because it relies only on open standards, has no tracking and allows people to control their own comments. While I’m currently using a third-party service to help, I can switch to self-hosted at any point in the future, completely transparently. Thanks to webmention.io, which handles incoming webmentions for me, and webmention.js, which displays them on the site, I can keep it all static and not have to implement any of this myself, which is nice. It’s a bit harder to comment because you have to be able to host your own content somewhere, but then almost no-one ever commented anyway, so it’s not like I’ll lose anything! Plus, if I get Bridgy set up right, you should be able to comment just by replying on Mastodon, Twitter or a few other places. A spot of web searching shows that I’m not the first to make the Disqus -> webmentions switch (yes, I’m putting these links in blatantly to test outgoing webmentions with Telegraph…): So long Disqus, hello webmention — Nicholas Hoizey Bye Disqus, hello Webmention! — Evert Pot Implementing Webmention on a static site — Deluvi Let’s see how this goes! Bridging Carpentries Slack channels to Matrix It looks like I’ve accidentally taken charge of bridging a bunch of The Carpentries Slack channels over to Matrix. Given this, it seems like a good idea to explain what that sentence means and reflect a little on my reasoning. I’m more than happy to discuss the pros and cons of this approach. If you just want to try chatting in Matrix, jump to the getting started section. What are Slack and Matrix? Slack (see also on Wikipedia), for those not familiar with it, is an online text chat platform with the feel of IRC (Internet Relay Chat), a modern look and feel, and both web and smartphone interfaces. By providing a free tier that meets many people’s needs on its own, Slack has become the communication platform of choice for thousands of online communities, private projects and more. One of the major disadvantages of using Slack’s free tier, as many community organisations do, is that as an incentive to upgrade to a paid service your chat history is limited to the most recent 10,000 messages across all channels. For a busy community like The Carpentries, this means that messages older than about 6-7 weeks are already inaccessible, rendering some of the quieter channels apparently empty.
As Slack is at pains to point out, that history isn’t gone, just archived and hidden from view unless you pay the low, low price of $1/user/month. That doesn’t seem too pricy, unless you’re a non-profit organisation with a lot of projects you want to fund and an active membership of several hundred worldwide, at which point it soon adds up. Slack does offer to waive the cost for registered non-profit organisations, but only for one community. The Carpentries is not an independent organisation, but one fiscally sponsored by Community Initiatives, which has already used its free quota of one elsewhere, rendering the Carpentries ineligible. Other umbrella organisations such as NumFocus (and, I expect, Mozilla) also run into this problem with Slack. So, we have a community which is slowly and inexorably losing its own history behind a paywall. For some people this is simply annoying, but from my perspective as a facilitator of the preservation of digital things the community is haemorrhaging an important record of its early history. Enter Matrix. Matrix is a chat platform similar to IRC, Slack or Discord. It’s divided into separate channels, and users can join one or more of these to take part in the conversation happening in those channels. What sets it apart from older technology like IRC and walled gardens like Slack & Discord is that it’s federated. Federation means simply that users on any server can communicate with users and channels on any other server. Usernames and channel addresses specify both the individual identifier and the server it calls home, just as your email address contains all the information needed for my email server to route messages to it. While users are currently tied to their home server, channels can be mirrored and synchronised across multiple servers, making the overall system much more resilient. Can’t connect to your favourite channel on server X? No problem: just connect via its alias on server Y and when X comes back online it will be resynchronised. The technology used is much more modern and secure than the aging IRC protocol, and there’s no vendor lock-in like there is with closed platforms like Slack and Discord. On top of that, Matrix channels can easily be “bridged” to channels/rooms on other platforms, including, yes, Slack, so that you can join on Matrix and transparently talk to people connected to the bridged room, or vice versa. So, to summarise: The current Carpentries Slack channels could be bridged to Matrix at no cost and with no disruption to existing users The history of those channels from that point on would be retained on matrix.org and accessible even when it’s no longer available on Slack If at some point in the future The Carpentries chose to invest in its own Matrix server, it could adopt and become the main Matrix home of these channels without disruption to users of either Matrix or (if it’s still in use at that point) Slack Matrix is an open protocol, with a reference server implementation and wide range of clients all available as free software, which aligns with the values of the Carpentries community On top of this: I’m fed up of having so many different Slack teams to switch between to see the channels in all of them, and prefer having all the channels I regularly visit in a single unified interface; I wanted to see how easy this would be and whether others would also be interested.
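(An aside for the programmatically inclined: because Matrix is an open protocol with open client libraries, it’s also easy to script. The snippet below is purely illustrative: a minimal sketch using the matrix-nio Python library, with an invented account, password and room alias, and no error handling; nothing like this is needed just to chat.)

import asyncio
from nio import AsyncClient

async def main():
    # Connect to the matrix.org homeserver as an (invented) existing account.
    client = AsyncClient("https://matrix.org", "@alice:matrix.org")
    await client.login("not-a-real-password")

    # Join a room by its alias; the response includes the canonical room ID.
    resp = await client.join("#some-bridged-channel:matrix.org")

    # Post a plain-text message into that room.
    await client.room_send(
        resp.room_id,
        message_type="m.room.message",
        content={"msgtype": "m.text", "body": "Hello from a little Matrix script!"},
    )
    await client.close()

asyncio.run(main())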
Given all this, I thought I’d go ahead and give it a try to see if it made things more manageable for me and to see what the reaction would be from the community. How can I get started? !!! reminder Please remember that, like any other Carpentries space, the Code of Conduct applies in all of these channels. First, sign up for a Matrix account. The quickest way to do this is on the Matrix “Try now” page, which will take you to the Riot Web client which for many is synonymous with Matrix. Other clients are also available for the adventurous. Second, join one of the channels. The links below will take you to a page that will let you connect via your preferred client. You’ll need to log in as they are set not to allow guest access, but, unlike Slack, you won’t need an invitation to be able to join. #general — the main open channel to discuss all things Carpentries #random — anything that would be considered offtopic elsewhere #welcome — join in and introduce yourself! That’s all there is to getting started with Matrix. To find all the bridged channels there’s a Matrix “community” that I’ve added them all to: Carpentries Matrix community. There’s a lot more, including how to bridge your favourite channels from Slack to Matrix, but this is all I’ve got time and space for here! If you want to know more, leave a comment below, or send me a message on Slack (jezcope) or maybe Matrix (@petrichor:matrix.org)! I’ve also made a separate channel for Matrix-Slack discussions: #matrix on Slack and Carpentries Matrix Discussion on Matrix MozFest19 first reflections Discussions of neurodiversity at #mozfest Photo by Jennifer Riggins The other weekend I had my first experience of Mozilla Festival, aka #mozfest. It was pretty awesome. I met quite a few people in real life that I’ve previously only known (/stalked) on Twitter, and caught up with others that I haven’t seen for a while. I had the honour of co-facilitating a workshop session on imposter syndrome and how to deal with it with the wonderful Yo Yehudi and Emmy Tsang. We all learned a lot and hope our participants did too; we’ll be putting together a summary blog post as soon as we can get our act together! I also attended a great session, led by Kiran Oliver (psst, they’re looking for a new challenge), on how to encourage and support a neurodiverse workforce. I was only there for the one day, and I really wish that I’d taken the plunge and committed to the whole weekend. There’s always next year though! To be honest, I’m just disappointed that I never had the courage to go sooner, Music for working Today1 the office conversation turned to blocking out background noise. (No, the irony is not lost on me.) Like many people I work in a large, open-plan office, and I’m not alone amongst my colleagues in sometimes needing to find a way to boost concentration by blocking out distractions. Not everyone is like this, but I find music does the trick for me. I also find that different types of music are better for different types of work, and I use this to try and manage my energy better. There are more distractions than auditory noise, and at times I really struggle with visual noise. Rather than have this post turn into a rant about the evils of open-plan offices, I’ll just mention that the scientific evidence doesn’t paint them in a good light2, or at least suggests that the benefits are more limited in scope than is commonly thought3, and move on to what I actually wanted to share: good music for working to. 
There are a number of genres that I find useful for working. Generally, these have in common a consistent tempo, a lack of lyrics, and enough variation to prevent boredom without distracting. Familiarity helps my concentration too, so I’ll often listen to a restricted set of albums for a while, gradually moving on by dropping one out and bringing in another. In my case this includes: Traditional dance music, generally from northern and western European traditions for me. This music has to be rhythmically consistent to allow social dancing, and while the melodies are typically simple repeated phrases, skilled musicians improvise around that to make something beautiful. I tend to go through phases of listening to particular traditions; I’m currently listening to a lot of French, Belgian and Scandinavian. Computer game soundtracks, which are specifically designed to enhance gameplay without distracting, making them perfect for other activities requiring a similar level of concentration. Chiptunes and other music incorporating it; partly overlapping with the previous category, chiptunes is music made by hacking the audio chips from (usually) old computers and games machines to become an instrument for new music. Because of the nature of the instrument, this will have millisecond-perfect rhythm and again makes for undistracting noise blocking with an extra helping of nostalgia! Purists would disagree with me, but I like artists that combine chiptunes with other instruments and effects to make something more complete-sounding. Retrowave/synthwave/outrun, synth-driven music that’s instantly familiar as the soundtrack to many 90s sci-fi and thriller movies. Atmospheric, almost dreamy, but rhythmic with a driving beat, it’s another genre that fits into the “pleasing but not too surprising” category for me. So where to find this stuff? One of the best resources I’ve found is Music for Programming, which provides carefully curated playlists of mostly electronic music designed to energise without distracting. They’re so well done that the tracks move seamlessly, one to the next, without ever getting boring. Spotify is an obvious option, and I do use it quite a lot. However, I’ve started trying to find ways to support artists more directly, and Bandcamp seems to be a good way of doing that. It’s really easy to browse by genre, or discover artists similar to what you’re currently hearing. You can listen for free as long as you don’t mind occasional nags to buy the music you’re hearing, but you can also buy tracks or albums. Music you’ve paid for is downloadable in several open, DRM-free formats for you to keep, and you know that a decent chunk of that cash is going directly to that artist. I also love noise generators; not exactly music, but a variety of pleasant background noises, some of which nicely obscure typical office noise. I particularly like mynoise.net, which has a cornucopia of different natural and synthetic noises. Each generator comes with a range of sliders allowing you to tweak the composition and frequency range, and will even animate them randomly for you to create a gently shifting soundscape. A much simpler, but still great, option is Noisli with its nice clean interface. Both offer apps for iOS and Android. For bonus points, you can always try combining one or more of the above. Adding in a noise generator allows me to listen to quieter music while still getting good environmental isolation when I need concentration.
Another favourite combo is to open both the cafe and rainfall generators from myNoise, made easier by the ability to pop out a mini-player then open up a second generator. I must be missing stuff though. What other musical genres should I try? What background sounds are nice to work to? Well, you know. The other day. Whatever. ↩︎ See e.g.: Lee, So Young, and Jay L. Brand. ‘Effects of Control over Office Workspace on Perceptions of the Work Environment and Work Outcomes’. Journal of Environmental Psychology 25, no. 3 (1 September 2005): 323–33. https://doi.org/10.1016/j.jenvp.2005.08.001. ↩︎ Open plan offices can actually work under certain conditions, The Conversation ↩︎ Working at the British Library: 6 months in It barely seems like it, but I’ve been at the British Library now for nearly 6 months. It always takes a long time to adjust and from experience I know it’ll be another year before I feel fully settled, but my team, department and other colleagues have really made me feel welcome and like I belong. One thing that hasn’t got old yet is the occasional thrill of remembering that I work at my national library now. Every now and then I’ll catch a glimpse of the collections at Boston Spa or step into one of the reading rooms and think “wow, I actually work here!” I also like having a national and international role to play, which means I get to travel a bit more than I used to. Budgets are still tight so there are limits, and I still prefer to be home more often than not, but there is more scope in this job than I’ve had previously for travelling to conferences, giving talks that change the way people think, and learning in different contexts. I’m learning a lot too, especially how to work with and manage people split across multiple sites, and the care and feeding of budgets. As well as missing my old team at Sheffield, I do also miss some of the direct contact I had with researchers in HE. I especially miss the teaching work, but also the higher-level influencing of more senior academics to change practices on a wider scale. Still, I get to use those influencing skills in different ways now, and I’m still involved with the Carpentries, which should let me keep my hand in with teaching. I still deal with my general tendency to try and do All The Things, and as before I’m slowly learning to recognise it, tame it and very occasionally turn it to my advantage. That also leads to feelings of imposterism that are only magnified by the knowledge that I now work at a national institution! It’s a constant struggle some days to believe that I’ve actually earned my place here through hard work. Even if I don’t always feel that I have, my colleagues here certainly have, so I should have more faith in their opinion of me. Finally, I couldn’t write this type of thing without mentioning the commute. I’ve gone from 90 minutes each way on a good day (up to twice that if the trains were disrupted) to 35 minutes each way along fairly open roads. I have less time to read, but much more time at home. On top of that, the library has implemented flexitime across all pay grades, with even senior managers strongly encouraged to make full use. Not only is this an important enabler of equality across the organisation, it relieves, for me personally, the pressure to work over my contracted hours and the guilt I’ve always felt at leaving work even 10 minutes early. If I work late, it’s now a choice I’m making based on business needs instead of guilt and in full knowledge that I’ll get that time back later.
So that’s where I am right now. I’m really enjoying the work and the culture, and I look forward to what the next 6 months will bring! RDA Plenary 13 reflection Photo by me I sit here writing this in the departure lounge at Philadelphia International Airport, waiting for my Aer Lingus flight back after a week at the 13th Research Data Alliance (RDA) Plenary (although I’m actually publishing this a week or so later at home). I’m pretty exhausted, partly because of the jet lag, and partly because it’s been a very full week with so much to take in. It’s my first time at an RDA Plenary, and it was quite a new experience for me! First off, it’s my first time outside Europe, and thus my first time crossing quite so many timezones. I’ve been waking at 5am and ready to drop by 8pm, but I’ve struggled on through! Secondly, it’s the biggest conference I’ve been to for a long time, both in number of attendees and number of parallel sessions. There’s been a lot of sustained input so I’ve been very glad to have a room in the conference hotel and be able to escape for a few minutes when I needed to recharge. Thirdly, it’s not really like any other conference I’ve been to: rather than having large numbers of presentations submitted by attendees, each session comprises lots of parallel meetings of RDA interest groups and working groups. It’s more community-oriented: an opportunity for groups to get together face to face and make plans or show off results. I found it pretty intense and struggled to take it all in, but incredibly valuable nonetheless. Lots of information to process (I took a lot of notes) and a few contacts to follow up on too, so overall I loved it! Using Pipfile in Binder Photo by Sear Greyson on Unsplash I recently attended a workshop, organised by the excellent team of the Turing Way project, on a tool called BinderHub. BinderHub, along with public hosting platform MyBinder, allows you to publish computational notebooks online as “binders” such that they’re not static but fully interactive. It’s able to do this by using a tool called repo2docker to capture the full computational environment and dependencies required to run the notebook. !!! aside “What is the Turing Way?” The Turing Way is, in its own words, “a lightly opinionated guide to reproducible data science.” The team is building an open textbook and running a number of workshops for scientists and research software engineers, and you should check out the project on Github. You could even contribute! The Binder process goes roughly like this: Do some work in a Jupyter Notebook or similar Put it into a public git repository Add some extra metadata describing the packages and versions your code relies on Go to mybinder.org and tell it where to find your repository Open the URL it generates for you Profit Other than step 5, which can take some time to build the binder, this is a remarkably quick process. It supports a number of different languages too, including built-in support for R, Python and Julia and the ability to configure pretty much any other language that will run on Linux. However, the Python support currently requires you to have either a requirements.txt or Conda-style environment.yml file to specify dependencies, and I commonly use a Pipfile for this instead. Pipfile allows you to specify a loose range of compatible versions for maximal convenience, but then locks in specific versions for maximal reproducibility. 
You can upgrade packages any time you want, but you’re fully in control of when that happens, and the locked versions are checked into version control so that everyone working on a project gets consistency. Since Pipfile is emerging as something of a standard, I thought I’d see if I could use that in a binder, and it turns out to be remarkably simple. The reference implementation of Pipfile is a tool called pipenv by the prolific Kenneth Reitz. All you need to use this in your binder is two files of one line each. requirements.txt tells repo2docker to build a Python-based binder, and contains a single line to install the pipenv package: pipenv Then postBuild is used by repo2docker to install all other dependencies using pipenv: pipenv install --system The --system flag tells pipenv to install packages globally (its default behaviour is to create a Python virtualenv). With these two files, the binder builds and runs as expected. You can see a complete example that I put together during the workshop here on Gitlab. What do you think I should write about? I’ve found it increasingly difficult to make time to blog, and it’s not so much not having the time — I’m pretty privileged in that regard — but finding the motivation. Thinking about what used to motivate me, one of the big things was writing things that other people wanted to read. Rather than try to guess, I thought I’d ask! Those who know what I'm about, what would you read about, if it was written by me? I'm trying to break through the blog-writers block and would love to know what other people would like to see my ill-considered opinions on.— Jez Cope (@jezcope) March 7, 2019 I’m still looking for ideas, so please tweet me or leave me a comment below. Below are a few thoughts that I’m planning to do something with. Something taking one of the more techy aspects of Open Research, breaking it down and explaining the benefits for non-techy folks?— Dr Beth 🏳️‍🌈 🐺 (@PhdGeek) March 7, 2019 Skills (both techy and non techy) that people need to most effectively support RDM— Kate O'Neill (@KateFONeill) March 7, 2019 Sometimes I forget that my background makes me well-qualified to take some of these technical aspects of the job and break them down for different audiences. There might be a whole series in this… Carrying on our conversation last week I'd love to hear more about how you've found moving from an HE lib to a national library and how you see the BL's role in RDM. Appreciate this might be a bit niche/me looking for more interesting things to cite :)— Rosie Higman (@RosieHLib) March 7, 2019 This is interesting, and something I’d like to reflect on; moving from one job to another always has lessons and it’s easy to miss them if you’re not paying attention. Another one for the pile. Life without admin rights to your computer— Mike Croucher (@walkingrandomly) March 7, 2019 This is so frustrating as an end user, but at the same time I get that endpoint security is difficult and there are massive risks associated with letting end users have admin rights. This is particularly important at the BL: as custodians of a nation’s cultural heritage, the risk for us is bigger than for many and for this reason we are now Cyber Essentials Plus certified. At some point I’d like to do some research and have a conversation with someone who knows a lot more about InfoSec to work out what the proper approach to this is, maybe involving VMs and a demilitarized zone on the network.
I’m always looking for more inspiration, so please leave a comment if you’ve got anything you’d like to read my thoughts on. If you’re not familiar with my writing, please take a minute or two to explore the blog; the tags page is probably a good place to get an overview. Ultimate Hacking Keyboard: first thoughts Following on from the excitement of having built a functioning keyboard myself, I got a parcel on Monday. Inside was something that I’ve been waiting for since September: an Ultimate Hacking Keyboard! Where the custom-built Laplace is small and quiet for travelling, the UHK is to be my main workhorse in the study at home. Here are my first impressions: Key switches I went with Kailh blue switches from the available options. In stark contrast to the quiet blacks on the Laplace, blues are NOISY! They have an extra piece of plastic inside the switch that causes an audible and tactile click when the switch activates. This makes them very satisfying to type on and should help as I train my fingers not to bottom out while typing, but does make them unsuitable for use in a shared office! Here are some animations showing how the main types of key switch vary. Layout This keyboard has what’s known as a 60% layout: no number pad, arrows or function keys. As with the more spartan Laplace, these “missing” keys are made up for with programmable layers. For example, the arrow keys are on the Mod layer on the I/J/K/L keys, so I can access them without moving from the home row. I actually find this preferable to having to move my hand to the right to reach them, and I really never used the number pad in any case. Split This is a split keyboard, which means that the left and right halves can be separated to place the hands further apart which eases strain across the shoulders. The UHK has a neat coiled cable joining the two which doesn’t get in the way. A cool design feature is that the two halves can be slotted back together and function perfectly well as a non-split keyboard too, held together by magnets. There are even electrical contacts so that when the two are joined you don’t need the linking cable. Programming The board is fully programmable, and this is achieved via a custom (open source) GUI tool which talks to the (open source) firmware on the board. You can have multiple keymaps, each of which has a separate Base, Mod, Fn and Mouse layer, and there’s an LED display that shows a short mnemonic for the currently active map. I already have a customised Dvorak layout for day-to-day use, plus a standard QWERTY for not-me to use and an alternative QWERTY which will be slowly tweaked for games that don’t work well with Dvorak. Mouse keys One cool feature that the designers have included in the firmware is the ability to emulate a mouse. There’s a separate layer that allows me to move the cursor, scroll and click without moving my hands from the keyboard. Palm rests Not much to say about the palm rests, other than they are solid wood, and chunky, and really add a little something. I have to say, I really like it so far! Overall it feels really well designed, with every little detail carefully thought out and excellent build quality and a really solid feeling. Custom-built keyboard I’m typing this post on a keyboard I made myself, and I’m rather excited about it! Why make my own keyboard? 
I wanted to learn a little bit about practical electronics, and I like to learn by doing I wanted to have the feeling of making something useful with my own hands I actually need a small, keyboard with good-quality switches now that I travel a fair bit for work and this lets me completely customise it to my needs Just because! While it is possible to make a keyboard completely from scratch, it makes much more sense to put together some premade parts. The parts you need are: PCB (printed circuit board): the backbone of the keyboard, to which all the other electrical components attach, this defines the possible physical locations for each key Switches: one for each key to complete a circuit whenever you press it Keycaps: switches are pretty ugly and pretty uncomfortable to press, so each one gets a cap; these are what you probably think of as the “keys” on your keyboard and come in almost limitless variety of designs (within the obvious size limitation) and are the easiest bit of personalisation Controller: the clever bit, which detects open and closed switches on the PCB and tells your computer what keys you pressed via a USB cable Firmware: the program that runs on the controller starts off as source code like any other program, and altering this can make the keyboard behave in loads of different ways, from different layouts to multiple layers accessed by holding a particular key, to macros and even emulating a mouse! In my case, I’ve gone for the following: PCB Laplace from keeb.io, a very compact 47-key (“40%") board, with no number pad, function keys or number row, but a lot of flexibility for key placement on the bottom row. One of my key design goals was small size so I can just pop it in my bag and have on my lap on the train. Controller Elite-C, designed specifically for keyboard builds to be physically compatible with the cheaper Pro Micro, with a more-robust USB port (the Pro Micro’s has a tendency to snap off), and made easier to program with a built-in reset button and better bootloader. Switches Gateron Black: Gateron is one of a number of manufacturers of mechanical switches compatible with the popular Cherry range. The black switch is linear (no click or bump at the activation point) and slightly heavier sprung than the more common red. Cherry also make a black switch but the Gateron version is slightly lighter and having tested a few I found them smoother too. My key goal here was to reduce noise, as the stronger spring will help me type accurately without hitting the bottom of the keystroke with an audible sound. Keycaps Blank grey PBT in DSA profile: this keyboard layout has a lot of non-standard sized keys, so blank keycaps meant that I wouldn’t be putting lots of keys out of their usual position; they’re also relatively cheap, fairly classy IMHO and a good placeholder until I end up getting some really cool caps on a group buy or something; oh, and it minimises the chance of someone else trying the keyboard and getting freaked out by the layout… Firmware QMK (Quantum Mechanical Keyboard), with a work-in-progress layout, based on Dvorak. QMK has a lot of features and allows you to fully program each and every key, with multiple layers accessed through several different routes. Because there are so few keys on this board, I’ll need to make good use of layers to make all the keys on a usual keyboard available. 
Dvorak Simplified Keyboard I’m grateful to the folks of the Leeds Hack Space, especially Nav & Mark, who patiently coached me in various soldering techniques and good practice, but also everyone else, who was so friendly and welcoming and interested in my project. I’m really pleased with the result, which is small, light and fully customisable. Playing with QMK firmware features will keep me occupied for quite a while! This isn’t the end though, as I’ll need a case to keep the dust out. I’m hoping to be able to 3D print this or mill it from wood with a CNC mill, for which I’ll need to head back to the Hack Space! Less, but better “Weniger, aber besser” — Dieter Rams {:.big-quote} I can barely believe it’s a full year since I published my intentions for 2018. A lot has happened since then. Principally: in November I started a new job as Data Services Lead at The British Library. One thing that hasn’t changed is my tendency to try to do too much, so this year I’m going to try and focus on a single intention, a translation of designer Dieter Rams' famous quote above: Less, but better. This chimes with a couple of other things I was toying with over the Christmas break, as they’re essentially other ways of saying the same thing: Take it steady One thing at a time I’m also going to keep in mind those touchstones from last year: What difference is this making? Am I looking after myself? Do I have evidence for this? I mainly forget to think about them, so I’ll be sticking up post-its everywhere to help me remember! How to extend Python with Rust: part 1 Python is great, but I find it useful to have an alternative language under my belt for occasions when no amount of Pythonic cleverness will make some bit of code run fast enough. One of my main reasons for wanting to learn Rust was to have something better than C for that. Not only does Rust have all sorts of advantages that make it a good choice for code that needs to run fast and correctly, it’s also got a couple of rather nice crates (libraries) that make interfacing with Python a lot nicer. Here’s a little tutorial to show you how easy it is to call a simple Rust function from Python. If you want to try it yourself, you’ll find the code on GitHub. !!! prerequisites I’m assuming for this tutorial that you’re already familiar with writing Python scripts and importing & using packages, and that you’re comfortable using the command line. You’ll also need to have installed Rust. The Rust bit The quickest way to get compiled code into Python is to use the built-in ctypes package. This is Python’s “Foreign Function Interface” or FFI: a means of calling functions outside the language you’re using to make the call. ctypes allows us to call arbitrary functions in a shared library1, as long as those functions conform to certain standard C language calling conventions. Thankfully, Rust tries hard to make it easy for us to build such a shared library. The first thing to do is to create a new project with cargo, the Rust build tool: $ cargo new rustfrompy Created library `rustfrompy` project $ tree . ├── Cargo.toml └── src └── lib.rs 1 directory, 2 files !!! aside I use the fairly common convention that text set in fixed-width font is either example code or commands to type in. For the latter, a $ precedes the command that you type (omit the $), and lines that don’t start with a $ are output from the previous command. I assume a basic familiarity with Unix-style command line, but I should probably put in some links to resources if you need to learn more!
We need to edit the Cargo.toml file and add a [lib] section: [package] name = "rustfrompy" version = "0.1.0" authors = ["Jez Cope <j.cope@erambler.co.uk>"] [dependencies] [lib] name = "rustfrompy" crate-type = ["cdylib"] This tells cargo that we want to make a C-compatible dynamic library (crate-type = ["cdylib"]) and what to call it, plus some standard metadata. We can then put our code in src/lib.rs. We’ll just use a simple toy function that adds two numbers together: #[no_mangle] pub extern "C" fn add(a: i64, b: i64) -> i64 { a + b } Notice the pub keyword, which instructs the compiler to make this function accessible to other modules; the extern "C" qualifier, which tells it to use the standard C calling convention; and the #[no_mangle] annotation, which tells it to use the standard C naming conventions for functions. If we don’t use #[no_mangle], then Rust will generate a new name for the function for its own nefarious purposes, and as a side effect we won’t know what to call it when we want to use it from Python. Being good developers, let’s also add a test: #[cfg(test)] mod test { use ::*; #[test] fn test_add() { assert_eq!(4, add(2, 2)); } } We can now run cargo test which will compile that code and run the test: $ cargo test Compiling rustfrompy v0.1.0 (file:///home/jez/Personal/Projects/rustfrompy) Finished dev [unoptimized + debuginfo] target(s) in 1.2 secs Running target/debug/deps/rustfrompy-3033caaa9f5f17aa running 1 test test test::test_add ... ok test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out Everything worked! Now just to build that shared library and we can try calling it from Python: $ cargo build Compiling rustfrompy v0.1.0 (file:///home/jez/Personal/Projects/rustfrompy) Finished dev [unoptimized + debuginfo] target(s) in 0.30 secs Notice that the build is unoptimized and includes debugging information: this is useful in development, but once we’re ready to use our code it will run much faster if we compile it with optimisations. Cargo makes this easy: $ cargo build --release Compiling rustfrompy v0.1.0 (file:///home/jez/Personal/Projects/rustfrompy) Finished release [optimized] target(s) in 0.30 secs The Python bit After all that, the Python bit is pretty short. First we import the ctypes package (which is included in all recent Python versions): from ctypes import cdll Cargo has tidied our shared library away into a folder, so we need to tell Python where to load it from. On Linux, it will be called lib<something>.so where the “something” is the crate name from Cargo.toml, “rustfrompy”: lib = cdll.LoadLibrary('target/release/librustfrompy.so') Finally we can call the function anywhere we want. Here it is in a pytest-style test: def test_rust_add(): assert lib.add(27, 15) == 42 If you have pytest installed (and you should!) you can run the whole test like this: $ pytest --verbose test.py ====================================== test session starts ====================================== platform linux -- Python 3.6.4, pytest-3.1.1, py-1.4.33, pluggy-0.4.0 -- /home/jez/.virtualenvs/datasci/bin/python cachedir: .cache rootdir: /home/jez/Personal/Projects/rustfrompy, inifile: collected 1 items test.py::test_rust_add PASSED It worked! I’ve put both the Rust and Python code on GitHub if you want to try it for yourself. Shortcomings OK, so that was a pretty simple example, and I glossed over a lot of things. For example, what would happen if we did lib.add(2.0, 2)? This causes Python to throw an error because our Rust function only accepts integers (64-bit signed integers, i64, to be precise), and we gave it a floating point number.
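As a taste of the fix described in the next paragraph, here is a minimal, hypothetical sketch (not taken from the original repository) of declaring the function’s signature on the Python side, so that ctypes knows exactly what to pass in and what to expect back:

from ctypes import cdll, c_int64

lib = cdll.LoadLibrary('target/release/librustfrompy.so')

# Declare the FFI signature: two 64-bit signed integers in, one 64-bit
# signed integer out. Without this, ctypes falls back to the platform's
# default c_int for both the arguments and the return value.
lib.add.argtypes = [c_int64, c_int64]
lib.add.restype = c_int64

assert lib.add(27, 15) == 42
# lib.add(2.0, 2) would still raise ctypes.ArgumentError, but now it is
# checked against the declared c_int64 types rather than a guess.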
ctypes can’t guess what type(s) a given function will work with, but it can at least tell us when we get it wrong. To fix this properly, we need to do some extra work, telling the ctypes library what the argument and return types for each function are (as in the sketch above). For a more complex library, there will probably be more housekeeping to do, such as translating return codes from functions into more Pythonic-style errors. For a small example like this there isn’t much of a problem, but the bigger your compiled library the more extra boilerplate is required on the Python side just to use all the functions. When you’re working with an existing library you don’t have much choice about this, but if you’re building it from scratch specifically to interface with Python, there’s a better way using the Python C API. You can call this directly in Rust, but there are a couple of Rust crates that make life much easier, and I’ll be taking a look at those in a future blog post. .so on Linux, .dylib on Mac and .dll on Windows ↩︎ New Year’s irresolution Photo by Andrew Hughes on Unsplash I’ve chosen not to make any specific resolutions this year; I’ve found that they just don’t work for me. Like many people, all I get is a sense of guilt when I inevitably fail to live up to the expectations I set myself at the start of the year. However, I have set a couple of what I’m referring to as “themes” for the year: touchstones that I’ll aim to refer to when setting priorities or just feeling a bit overwhelmed or lacking in direction. They are: Contribution Self-care Measurement I may do some blog posts expanding on these, but in the meantime, I’ve put together a handful of questions to help me think about priorities and get perspective when I’m doing (or avoiding doing) something. What difference is this making? I feel more motivated when I can figure out how I’m contributing to something bigger than myself. In society? In my organisation? To my friends & family? Am I looking after myself? I focus a lot on the expectations others have (or at least that I think others have) of me, but I can’t do anything well unless I’m generally happy and healthy. Is this making me happier and healthier? Is this building my capacity to look after myself, my family & friends and do my job? Is this worth the amount of time and energy I’m putting in? Do I have evidence for this? I don’t have to base decisions purely on feelings/opinions: I have the skills to obtain, analyse and interpret data. Is this fact or opinion? What are the facts? Am I overthinking this? Can I put a confidence interval on this? Build documents from code and data with Saga !!! tldr “TL;DR” I’ve made Saga, a thing for compiling documents by combining code and data with templates. What is it? Saga is a very simple command-line tool that reads in one or more data files, runs one or more scripts, then passes the results into a template to produce a final output document. It enables you to maintain a clean separation between data, logic and presentation and produce data-based documents that can easily be updated. That allows the flow of data through the document to be easily understood, a cornerstone of reproducible analysis. You run it like this: saga build -d data.yaml -d other_data.yaml \ -s analysis.py -t report.md.tmpl \ -O report.md Any scripts specified with -s will have access to the data in local variables, and any changes to local variables in a script will be retained when everything is passed to the template for rendering.
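To make the flow from data files through scripts to the final template concrete, here is a rough sketch of the idea in plain Python. This is not Saga’s actual implementation, just an illustration of the basic loop it wraps, using single files named as in the example command above:

import yaml
from mako.template import Template

env = {}

# Data files become variables available to the scripts and the template.
with open("data.yaml") as f:
    env.update(yaml.safe_load(f))   # assumes the YAML top level is a mapping

# Scripts run with those variables in scope; anything they define is kept.
with open("analysis.py") as f:
    exec(f.read(), env)

# Finally, render the template with the accumulated environment.
with open("report.md.tmpl") as f:
    public = {k: v for k, v in env.items() if not k.startswith("__")}
    print(Template(f.read()).render(**public))

Saga itself layers the command-line interface and support for multiple data files and scripts on top of something like this.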
For debugging, you can also do: saga dump -d data.yaml -d other_data.yaml -s analysis.py which will print out the full environment that would be passed to your template with saga build. Features Right now this is a really early version. It does the job but I have lots of ideas for features to add if I ever have time. At present it does the following: Reads data from one or more YAML files Transforms data with one or more Python scripts Renders a template in Mako format Works with any plain-text output format, including Markdown, LaTeX and HTML Use cases Write reproducible reports & papers based on machine-readable data Separate presentation from content in any document, e.g. your CV (example coming soon) Yours here? Get it! I haven’t released this on PyPI yet, but all the code is available on GitHub to try out. If you have pipenv installed (and if you use Python you should!), you can try it out in an isolated virtual environment by doing: git clone https://github.com/jezcope/sagadoc.git cd sagadoc pipenv install pipenv run saga or you can set up for development and run some tests: pipenv install --dev pipenv run pytest Why? Like a lot of people, I have to produce reports for work, often containing statistics computed from data. Although these generally aren’t academic research papers, I see no reason not to aim for a similar level of reproducibility: after all, if I’m telling other people to do it, I’d better take my own advice! A couple of times now I’ve done this by writing a template that holds the text of the report and placeholders for values, along with a Python script that reads in the data, calculates the statistics I want and completes the template. This is valuable for two main reasons: If anyone wants to know how I processed the data and calculated those statistics, it’s all there: no need to try and remember and reproduce a series of button clicks in Excel; If the data or calculations change, I just need to update the relevant part and run it again, and all the relevant parts of the document will be updated. This is particularly important if changing a single data value requires recalculation of dozens of tables, charts, etc. It also gives me the potential to factor out and reuse bits of code in the future, add tests and version control everything. Now that I’ve done this more than once (and it seems likely I’ll do it again) it makes sense to package that script up in a more portable form so I don’t have to write it over and over again (or, shock horror, copy & paste it!). It saves time, and gives others the possibility to make use of it. Prior art I’m not the first person to think of this, but I couldn’t find anything that did exactly what I needed. Several tools will let you interweave code and prose, including the results of evaluating each code snippet in the document: chief among these are Jupyter and Rmarkdown. There are also tools that let you write code in the order that makes most sense to read and then rearrange it into the right order to execute, so-called literate programming. The original tool for this is the venerable noweb. Sadly there is very little that combines both of these and allows you to insert the results of various calculations at arbitrary points in a document, independent of the order of either presenting or executing the code. The only two that I’m aware of are Dexy and org-mode. Unfortunately, Dexy currently only works on Legacy Python (/Python 2) and org-mode requires Emacs (which is fine but not exactly portable).
Rmarkdown comes close and supports a range of languages but the full feature set is only available with R. Actually, my ideal solution is org-mode without the Emacs dependency, because that’s the most flexible solution; maybe one day I’ll have both the time and skill to implement that. It’s also possible I might be able to figure out Dexy’s internals to add what I want to it, but until then Saga does the job! Future work There are lots of features that I’d still like to add when I have time: Some actual documentation! And examples! More data formats (e.g. CSV, JSON, TOML) More languages (e.g. R, Julia) Fetching remote data over HTTP Caching of intermediate results to speed up rebuilds For now, though, I’d love for you to try it out and let me know what you think! As ever, comment here, tweet me or start an issue on GitHub. Why try Rust for scientific computing? When you’re writing analysis code, Python (or R, or JavaScript, or …) is usually the right choice. These high-level languages are set up to make you as productive as possible, and common tasks like array manipulation have been well optimised. However, sometimes you just can’t get enough speed and need to turn to a lower-level compiled language. Often that will be C, C++ or Fortran, but I thought I’d do a short post on why I think you should consider Rust. One of my goals for 2017’s Advent of Code was to learn a modern, memory-safe, statically-typed language. I now know that there are quite a lot of options in this space, but two seem to stand out: Go & Rust. I gave both of them a try, and although I’ll probably go back to give Go a more thorough test at some point I found I got quite hooked on Rust. Both languages, though young, are definitely production-ready. Servo, the core of the new Firefox browser, is entirely written in Rust. In fact, Mozilla have been trying to rewrite the rendering core in C for nearly a decade, and switching to Rust let them get it done in just a couple of years. !!! tldr “TL;DR” - It’s fast: competitive with idiomatic C/C++, and no garbage-collection overhead - It’s harder to write buggy code, and compiler errors are actually helpful - It’s C-compatible: you can call into Rust code anywhere you’d call into C, call C/C++ from Rust, and incrementally replace C/C++ code with Rust - It has sensible modern syntax that makes your code clearer and more concise - Support for scientific computing is getting better all the time (matrix algebra libraries, built-in SIMD, safe concurrency) - It has a really friendly and active community - It’s production-ready: Servo, the new rendering core in Firefox, is built entirely in Rust Performance To start with, as a compiled language Rust executes much faster than a (pseudo-)interpreted language like Python or R; the price you pay for this is time spent compiling during development. However, having a compile step also allows the language to enforce certain guarantees, such as type-correctness and memory safety, which between them prevent whole classes of bugs from even being possible. Unlike Go (which, like many higher-level languages, uses a garbage collector), Rust handles memory safety at compile time through the concepts of ownership and borrowing. These can take some getting used to and were a big source of frustration when I was first figuring out the language, but ultimately contribute to Rust’s reliably-fast performance.
Performance can be unpredictable in a garbage-collected language because you can’t be sure when the GC is going to run and you need to understand it really well to stand a chance of optimising it if it becomes a problem. On the other hand, code that has the potential to be unsafe will result in compilation errors in Rust. There are a number of benchmarks (example) that show Rust’s performance on a par with idiomatic C & C++ code, something that very few languages can boast. Helpful error messages Because beginner Rust programmers often get compile errors, it’s really important that those errors are easy to interpret and fix, and Rust is great at this. Not only does it tell you what went wrong, but wherever possible it prints out your code annotated with arrows to show exactly where the error is, and makes specific suggestions for how to fix the error, which usually turn out to be correct. It also has a nice suite of warnings (things that don’t cause compilation to fail but may indicate bugs) that are just as informative, and this can be extended even further by using the clippy linting tool to further analyse your code. warning: unused variable: `y` --> hello.rs:3:9 | 3 | let y = x; | ^ | = note: #[warn(unused_variables)] on by default = note: to avoid this warning, consider using `_y` instead Easy to integrate with other languages If you’re like me, you’ll probably only use a low-level language for performance-critical code that you can call from a high-level language, and this is an area where Rust shines. Most programmers will turn to C, C++ or Fortran for this because they have a well-established ABI (Application Binary Interface) which can be understood by languages like Python and R¹. In Rust, it’s trivial to make a C-compatible shared library, and the standard library includes extra features for working with C types. That also means that existing C code can be incrementally ported to Rust: see remacs for an example. On top of this, there are projects like rust-cpython and PyO3 which provide macros and structures that wrap the Python C API to let you build Python modules in Rust with minimal glue code; rustr does a similar job for R. Nice language features Rust has some really nice features, which let you write efficient, concise and correct code. Several feel particularly comfortable as they remind me of similar things available in Haskell, including: Enums, a super-powered combination of C enums and unions (similar to Haskell’s algebraic data types) that enable some really nice code with no runtime cost Generics and traits that let you get more done with less code Pattern matching, a kind of case statement that lets you extract parts of structs, tuples & enums and do all sorts of other clever things Lazy computation based on an iterator pattern, for efficient processing of lists of things: you can do for item in list { ... } instead of the C-style use of an index², or you can use higher-order functions like map and filter Functions/closures as first-class citizens Scientific computing Although it’s a general-purpose language and not designed specifically for scientific computing, Rust’s support is improving all the time. There are some interesting matrix algebra libraries available, and built-in SIMD is incoming. The memory safety features also work to ensure thread safety, so it’s harder to write concurrency bugs. You should be able to use your favourite MPI implementation too, and there’s at least one attempt to portably wrap MPI in a more Rust-like way.
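Tying a few of the language features above together (an enum, pattern matching and an iterator chain), here's a small self-contained toy sketch; it doesn't depend on any particular crate and is purely illustrative:

```rust
// An enum ("sum type"), similar in spirit to Haskell's algebraic data types:
// a reading is either a value or explicitly missing, with no runtime overhead.
#[derive(Debug)]
enum Reading {
    Value(f64),
    Missing,
}

fn main() {
    let readings = vec![Reading::Value(1.5), Reading::Missing, Reading::Value(2.5)];

    // Pattern matching extracts the payload; iterator adapters replace the
    // C-style index loop and still compile down to a tight loop.
    let total: f64 = readings
        .iter()
        .filter_map(|r| match r {
            Reading::Value(v) => Some(*v),
            Reading::Missing => None,
        })
        .sum();

    println!("Sum of the readings that were present: {}", total);
}
```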
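And on the thread-safety claim, here's a minimal toy sketch using only the standard library (std::thread plus an mpsc channel) of the pattern the ownership rules make safe by construction: each worker owns its chunk of data outright, so there's simply no shared mutable state to race on.

```rust
use std::sync::mpsc;
use std::thread;

fn main() {
    let (tx, rx) = mpsc::channel();

    // Each spawned thread takes ownership of one chunk and one clone of the
    // sender; trying to share a chunk mutably across threads wouldn't compile.
    for chunk in vec![vec![1u64, 2, 3], vec![4, 5, 6], vec![7, 8, 9]] {
        let tx = tx.clone();
        thread::spawn(move || {
            let partial: u64 = chunk.iter().sum();
            tx.send(partial).unwrap();
        });
    }
    drop(tx); // drop the original sender so the receiver knows when all results are in

    let total: u64 = rx.iter().sum();
    println!("Parallel sum: {}", total);
}
```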
Active development and friendly community One of the things you notice straight away is how active and friendly the Rust community is. There are several IRC channels on irc.mozilla.org including #rust-beginners, which is a great place to get help. The compiler is under constant but carefully-managed development, so that new features are landing all the time but without breaking existing code. And the fabulous Cargo build tool and crates.io are enabling the rapid growth of a healthy ecosystem of open source libraries that you can use to write less code yourself. Summary So, next time you need a compiled language to speed up hotspots in your code, try Rust. I promise you won’t regret it! Julia actually allows you to call C and Fortran functions as a first-class language feature ↩︎ Actually, since C++11 there’s for (auto item : list) { ... } but still… ↩︎ Reflections on #aoc2017 Trees reflected in a lake Joshua Reddekopp on Unsplash It seems like ages ago, but way back in November I committed to completing Advent of Code. I managed it all, and it was fun! All of my code is available on GitHub if you’re interested in seeing what I did, and I managed to get out a blog post for every one with a bit more commentary, which you can see in the series list above. How did I approach it? I’ve not really done any serious programming challenges before. I don’t get to write a lot of code at the moment, so all I wanted from AoC was an excuse to do some proper problem-solving. I never really intended to take a polyglot approach, though I did think that I might use mainly Python with a bit of Haskell. In the end, though, I used: Python (×12); Haskell (×7); Rust (×4); Go; C++; Ruby; Julia; and Coconut. For the most part, my priorities were getting the right answer, followed by writing readable code. I didn’t specifically focus on performance but did try to avoid falling into traps that I knew about. What did I learn? I found Python the easiest to get on with: it’s the language I know best and although I can’t always remember exact method names and parameters I know what’s available and where to look to remind myself, as well as most of the common idioms and some performance traps to avoid. Python was therefore the language that let me focus most on solving the problem itself. C++ and Ruby were more challenging, and it was harder to write good idiomatic code but I can still remember quite a lot. Haskell I haven’t used since university, and just like back then I really enjoyed working out how to solve problems in a functional style while still being readable and efficient (not always something I achieved…). I learned a lot about core Haskell concepts like monads & functors, and I’m really amazed by the way the Haskell community and ecosystem has grown up in the last decade. I also wanted to learn at least one modern, memory-safe compiled language, so I tried both Go and Rust. Both seem like useful languages, but Rust really intrigued me with its conceptual similarities to both Haskell and C++ and its promise of memory safety without a garbage collector. I struggled a lot initially with the “borrow checker” (the component that enforces memory safety at compile time) but eventually started thinking in terms of ownership and lifetimes after which things became easier. The Rust community seems really vibrant and friendly too. What next? I really want to keep this up, so I’m going to look out some more programming challenges (Project Euler looks interesting). 
It turns out there’s a regular Code Dojo meetup in Leeds, so hopefully I’ll try that out too. I’d like to do more realistic data-science stuff, so I’ll be taking a closer look at stuff like Kaggle too, and figuring out how to do a bit more analysis at work. I’m also feeling motivated to find an open source project to contribute to and/or release a project of my own, so we’ll see if that goes anywhere! I’ve always found the advice to “scratch your own itch” difficult to follow because everything I think of myself has already been done better. Most of the projects I use enough to want to contribute to tend to be pretty well developed with big communities and any bugs that might be accessible to me will be picked off and fixed before I have a chance to get started. Maybe it’s time to get over myself and just reimplement something that already exists, just for the fun of it! The Halting Problem — Python — #adventofcode Day 25 Today’s challenge takes us back to a bit of computing history: a good old-fashioned Turing Machine. → Full code on GitHub !!! commentary Today’s challenge was a nice bit of nostalgia, taking me back to my university days learning about the theory of computing. Turing Machines are a classic bit of computing theory, and are provably able to compute any value that is possible to compute: a value is computable if and only if a Turing Machine can be written that computes it (though in practice anything non-trivial is mind-bendingly hard to write as a TM). A bit of a library-fest today, compared to other days! from collections import deque, namedtuple from collections.abc import Iterator from tqdm import tqdm import re import fileinput as fi These regular expressions are used to parse the input that defines the transition table for the machine. RE_ISTATE = re.compile(r'Begin in state (?P<state>\w+)\.') RE_RUNTIME = re.compile( r'Perform a diagnostic checksum after (?P<steps>\d+) steps.') RE_STATETRANS = re.compile( r"In state (?P<state>\w+):\n" r" If the current value is (?P<read0>\d+):\n" r" - Write the value (?P<write0>\d+)\.\n" r" - Move one slot to the (?P<move0>left|right).\n" r" - Continue with state (?P<next0>\w+).\n" r" If the current value is (?P<read1>\d+):\n" r" - Write the value (?P<write1>\d+)\.\n" r" - Move one slot to the (?P<move1>left|right).\n" r" - Continue with state (?P<next1>\w+).") MOVE = {'left': -1, 'right': 1} A namedtuple to provide some sugar when using a transition rule. Rule = namedtuple('Rule', 'write move next_state') The TuringMachine class does all the work. class TuringMachine: def __init__(self, program=None): self.tape = deque() self.transition_table = {} self.state = None self.runtime = 0 self.steps = 0 self.pos = 0 self.offset = 0 if program is not None: self.load(program) def __str__(self): return f"Current: {self.state}; steps: {self.steps} of {self.runtime}" Some jiggery-pokery to allow us to use self[pos] to reference an infinite tape. def __getitem__(self, i): i += self.offset if i < 0 or i >= len(self.tape): return 0 else: return self.tape[i] def __setitem__(self, i, x): i += self.offset if i >= 0 and i < len(self.tape): self.tape[i] = x elif i == -1: self.tape.appendleft(x) self.offset += 1 elif i == len(self.tape): self.tape.append(x) else: raise IndexError('Tried to set position off end of tape') Parse the program and set up the transition table.
def load(self, program): if isinstance(program, Iterator): program = ''.join(program) match = RE_ISTATE.search(program) self.state = match['state'] match = RE_RUNTIME.search(program) self.runtime = int(match['steps']) for match in RE_STATETRANS.finditer(program): self.transition_table[match['state']] = { int(match['read0']): Rule(write=int(match['write0']), move=MOVE[match['move0']], next_state=match['next0']), int(match['read1']): Rule(write=int(match['write1']), move=MOVE[match['move1']], next_state=match['next1']), } Run the program for the required number of steps (given by self.runtime). tqdm isn’t in the standard library but it should be: it shows a lovely text-mode progress bar as we go. def run(self): for _ in tqdm(range(self.runtime), desc="Running", unit="steps", unit_scale=True): read = self[self.pos] rule = self.transition_table[self.state][read] self[self.pos] = rule.write self.pos += rule.move self.state = rule.next_state Calculate the “diagnostic checksum” required for the answer. @property def checksum(self): return sum(self.tape) Aaand GO! machine = TuringMachine(fi.input()) machine.run() print("Checksum:", machine.checksum) Electromagnetic Moat — Rust — #adventofcode Day 24 Today’s challenge, the penultimate, requires us to build a bridge capable of reaching across to the CPU, our final destination. → Full code on GitHub !!! commentary We have a finite number of components that fit together in a restricted way from which to build a bridge, and we have to work out both the strongest and the longest bridge we can build. The most obvious way to do this is to recursively build every possible bridge and select the best, but that’s an O(n!) algorithm that could blow up quickly, so might as well go with a nice fast language! Might have to try this in Haskell too, because it’s the type of algorithm that lends itself naturally to a pure functional approach. I feel like I've applied some of the things I've learned in previous challenges I used Rust for, and spent less time mucking about with ownership, and made better use of various language features, including structs and iterators. I'm rather pleased with how my learning of this language is progressing. I'm definitely overusing `Option.unwrap` at the moment though: this is a lazy way to deal with `Option` results and will panic if the result is not what's expected. I'm not sure whether I need to be cloning the components `Vector` either, or whether I could just be passing iterators around. First, we import some bits of standard library and define some data types. The BridgeResult struct lets us use the same algorithm for both parts of the challenge and simply change the value used to calculate the maximum. 
use std::io; use std::fmt; use std::io::BufRead; #[derive(Debug, Copy, Clone, PartialEq, Eq, Hash)] struct Component(u8, u8); #[derive(Debug, Copy, Clone, Default)] struct BridgeResult { strength: u16, length: u16, } impl Component { fn from_str(s: &str) -> Component { let parts: Vec<&str> = s.split('/').collect(); assert!(parts.len() == 2); Component(parts[0].parse().unwrap(), parts[1].parse().unwrap()) } fn fits(self, port: u8) -> bool { self.0 == port || self.1 == port } fn other_end(self, port: u8) -> u8 { if self.0 == port { return self.1; } else if self.1 == port { return self.0; } else { panic!("{:?} doesn't fit port {}", self, port); } } fn strength(self) -> u16 { self.0 as u16 + self.1 as u16 } } impl fmt::Display for BridgeResult { fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result { write!(f, "(S: {}, L: {})", self.strength, self.length) } } best_bridge calculates the length and strength of the “best” bridge that can be built from the remaining components and fits the required port. Whether this is based on strength or length is given by the key parameter, which is passed to Iter.max_by_key. fn best_bridge<F>(port: u8, key: &F, components: &Vec<Component>) -> Option<BridgeResult> where F: Fn(&BridgeResult) -> u16 { if components.len() == 0 { return None; } components.iter() .filter(|c| c.fits(port)) .map(|c| { let b = best_bridge(c.other_end(port), key, &components.clone().into_iter() .filter(|x| x != c).collect()) .unwrap_or_default(); BridgeResult{strength: c.strength() + b.strength, length: 1 + b.length} }) .max_by_key(key) } Now all that remains is to read the input and calculate the result. I was rather pleasantly surprised to find that in spite of my pessimistic predictions about efficiency, when compiled with optimisations turned on this terminates in less than 1s on my laptop. fn main() { let stdin = io::stdin(); let components: Vec<_> = stdin.lock() .lines() .map(|l| Component::from_str(&l.unwrap())) .collect(); match best_bridge(0, &|b: &BridgeResult| b.strength, &components) { Some(b) => println!("Strongest bridge is {}", b), None => println!("No strongest bridge found") }; match best_bridge(0, &|b: &BridgeResult| b.length, &components) { Some(b) => println!("Longest bridge is {}", b), None => println!("No longest bridge found") }; } Coprocessor Conflagration — Haskell — #adventofcode Day 23 Today’s challenge requires us to understand why a coprocessor is working so hard to perform an apparently simple calculation. → Full code on GitHub !!! commentary Today’s problem is based on an assembly-like language very similar to day 18, so I went back and adapted my code from that, which works well for the first part. I’ve also incorporated some advice from /r/haskell, and cleaned up all warnings shown by the -Wall compiler flag and the hlint tool. Part 2 requires the algorithm to run with much larger inputs, and since some analysis shows that it's an `O(n^3)` algorithm it gets intractable pretty fast. There are several approaches to this. First up, if you have a fast enough processor and an efficient enough implementation I suspect that the simulation would probably terminate eventually, but that would likely still take hours: not good enough. I also thought about doing some peephole optimisations on the instructions, but the last time I did compiler optimisation was during my degree so I wasn't really sure where to start. What I ended up doing was actually analysing the input code by hand to figure out what it was doing, and then just doing that calculation in a sensible way.
I'd like to say I managed this on my own (and I like to think I would have) but I did get some tips on [/r/adventofcode](https://reddit.com/r/adventofcode). The majority of this code is simply a cleaned-up version of day 18, with some tweaks to accommodate the different instruction set: module Main where import qualified Data.Vector as V import qualified Data.Map.Strict as M import Control.Monad.State.Strict import Text.ParserCombinators.Parsec hiding (State) type Register = Char type Value = Int type Argument = Either Value Register data Instruction = Set Register Argument | Sub Register Argument | Mul Register Argument | Jnz Argument Argument deriving Show type Program = V.Vector Instruction data Result = Cont | Halt deriving (Eq, Show) type Registers = M.Map Char Int data Machine = Machine { dRegisters :: Registers , dPtr :: !Int , dMulCount :: !Int , dProgram :: Program } instance Show Machine where show d = show (dRegisters d) ++ " @" ++ show (dPtr d) ++ " ×" ++ show (dMulCount d) defaultMachine :: Machine defaultMachine = Machine M.empty 0 0 V.empty type MachineState = State Machine program :: GenParser Char st Program program = do instructions <- endBy instruction eol return $ V.fromList instructions where instruction = try (regOp "set" Set) <|> regOp "sub" Sub <|> regOp "mul" Mul <|> jump "jnz" Jnz regOp n c = do string n >> spaces val1 <- oneOf "abcdefgh" secondArg c val1 jump n c = do string n >> spaces val1 <- regOrVal secondArg c val1 secondArg c val1 = do spaces val2 <- regOrVal return $ c val1 val2 regOrVal = register <|> value register = do name <- lower return $ Right name value = do val <- many $ oneOf "-0123456789" return $ Left $ read val eol = char '\n' parseProgram :: String -> Either ParseError Program parseProgram = parse program "" getReg :: Char -> MachineState Int getReg r = do st <- get return $ M.findWithDefault 0 r (dRegisters st) putReg :: Char -> Int -> MachineState () putReg r v = do st <- get let current = dRegisters st new = M.insert r v current put $ st { dRegisters = new } modReg :: (Int -> Int -> Int) -> Char -> Argument -> MachineState () modReg op r v = do u <- getReg r v' <- getRegOrVal v putReg r (u `op` v') incPtr getRegOrVal :: Argument -> MachineState Int getRegOrVal = either return getReg addPtr :: Int -> MachineState () addPtr n = do st <- get put $ st { dPtr = n + dPtr st } incPtr :: MachineState () incPtr = addPtr 1 execInst :: Instruction -> MachineState () execInst (Set reg val) = do newVal <- getRegOrVal val putReg reg newVal incPtr execInst (Mul reg val) = do result <- modReg (*) reg val st <- get put $ st { dMulCount = 1 + dMulCount st } return result execInst (Sub reg val) = modReg (-) reg val execInst (Jnz val1 val2) = do test <- getRegOrVal val1 jump <- if test /= 0 then getRegOrVal val2 else return 1 addPtr jump execNext :: MachineState Result execNext = do st <- get let prog = dProgram st p = dPtr st if p >= length prog then return Halt else do execInst (prog V.!
p) return Cont runUntilTerm :: MachineState () runUntilTerm = do result <- execNext unless (result == Halt) runUntilTerm This implements the actual calculation: the number of non-primes between (for my input) 107900 and 124900: optimisedCalc :: Int -> Int -> Int -> Int optimisedCalc a b k = sum $ map (const 1) $ filter notPrime [a,a+k..b] where notPrime n = elem 0 $ map (mod n) [2..(floor $ sqrt (fromIntegral n :: Double))] main :: IO () main = do input <- getContents case parseProgram input of Right prog -> do let c = defaultMachine { dProgram = prog } (_, c') = runState runUntilTerm c putStrLn $ show (dMulCount c') ++ " multiplications made" putStrLn $ "Calculation result: " ++ show (optimisedCalc 107900 124900 17) Left e -> print e Sporifica Virus — Rust — #adventofcode Day 22 Today’s challenge has us helping to clean up (or spread, I can’t really tell) an infection of the “sporifica” virus. → Full code on GitHub !!! commentary I thought I’d have another play with Rust, as its Haskell-like features resonate with me at the moment. I struggled quite a lot with the Rust concepts of ownership and borrowing, and this is a cleaned-up version of the code based on some good advice from the folks on /r/rust. use std::io; use std::env; use std::io::BufRead; use std::collections::HashMap; #[derive(PartialEq, Clone, Copy, Debug)] enum Direction {Up, Right, Down, Left} #[derive(PartialEq, Clone, Copy, Debug)] enum Infection {Clean, Weakened, Infected, Flagged} use self::Direction::*; use self::Infection::*; type Grid = HashMap<(isize, isize), Infection>; fn turn_left(d: Direction) -> Direction { match d {Up => Left, Right => Up, Down => Right, Left => Down} } fn turn_right(d: Direction) -> Direction { match d {Up => Right, Right => Down, Down => Left, Left => Up} } fn turn_around(d: Direction) -> Direction { match d {Up => Down, Right => Left, Down => Up, Left => Right} } fn make_move(d: Direction, x: isize, y: isize) -> (isize, isize) { match d { Up => (x-1, y), Right => (x, y+1), Down => (x+1, y), Left => (x, y-1), } } fn basic_step(grid: &mut Grid, x: &mut isize, y: &mut isize, d: &mut Direction) -> usize { let mut infect = 0; let current = match grid.get(&(*x, *y)) { Some(v) => *v, None => Clean, }; if current == Infected { *d = turn_right(*d); } else { *d = turn_left(*d); infect = 1; }; grid.insert((*x, *y), match current { Clean => Infected, Infected => Clean, x => panic!("Unexpected infection state {:?}", x), }); let new_pos = make_move(*d, *x, *y); *x = new_pos.0; *y = new_pos.1; infect } fn nasty_step(grid: &mut Grid, x: &mut isize, y: &mut isize, d: &mut Direction) -> usize { let mut infect = 0; let new_state: Infection; let current = match grid.get(&(*x, *y)) { Some(v) => *v, None => Infection::Clean, }; match current { Clean => { *d = turn_left(*d); new_state = Weakened; }, Weakened => { new_state = Infected; infect = 1; }, Infected => { *d = turn_right(*d); new_state = Flagged; }, Flagged => { *d = turn_around(*d); new_state = Clean; } }; grid.insert((*x, *y), new_state); let new_pos = make_move(*d, *x, *y); *x = new_pos.0; *y = new_pos.1; infect } fn virus_infect<F>(mut grid: Grid, mut step: F, mut x: isize, mut y: isize, mut d: Direction, n: usize) -> usize where F: FnMut(&mut Grid, &mut isize, &mut isize, &mut Direction) -> usize, { (0..n).map(|_| step(&mut grid, &mut x, &mut y, &mut d)) .sum() } fn main() { let args: Vec<String> = env::args().collect(); let n_basic: usize = args[1].parse().unwrap(); let n_nasty: usize = args[2].parse().unwrap(); let stdin = io::stdin(); let lines: 
Vec<String> = stdin.lock() .lines() .map(|x| x.unwrap()) .collect(); let mut grid: Grid = HashMap::new(); let x0 = (lines.len() / 2) as isize; let y0 = (lines[0].len() / 2) as isize; for (i, line) in lines.iter().enumerate() { for (j, c) in line.chars().enumerate() { grid.insert((i as isize, j as isize), match c {'#' => Infected, _ => Clean}); } } let basic_steps = virus_infect(grid.clone(), basic_step, x0, y0, Up, n_basic); println!("Basic: infected {} times", basic_steps); let nasty_steps = virus_infect(grid, nasty_step, x0, y0, Up, n_nasty); println!("Nasty: infected {} times", nasty_steps); } Fractal Art — Python — #adventofcode Day 21 Today’s challenge asks us to assist an artist building fractal patterns from a rulebook. → Full code on GitHub !!! commentary Another fairly straightforward algorithm: the really tricky part was breaking the pattern up into chunks and rejoining it again. I could probably have done that more efficiently, and would have needed to if I had to go for a few more iterations and the grid grows with every iteration and gets big fast. Still behind on the blog posts… import fileinput as fi from math import sqrt from functools import reduce, partial import operator INITIAL_PATTERN = ((0, 1, 0), (0, 0, 1), (1, 1, 1)) DECODE = ['.', '#'] ENCODE = {'.': 0, '#': 1} concat = partial(reduce, operator.concat) def rotate(p): size = len(p) return tuple(tuple(p[i][j] for i in range(size)) for j in range(size - 1, -1, -1)) def flip(p): return tuple(p[i] for i in range(len(p) - 1, -1, -1)) def permutations(p): yield p yield flip(p) for _ in range(3): p = rotate(p) yield p yield flip(p) def print_pattern(p): print('-' * len(p)) for row in p: print(' '.join(DECODE[x] for x in row)) print('-' * len(p)) def build_pattern(s): return tuple(tuple(ENCODE[c] for c in row) for row in s.split('/')) def build_pattern_book(lines): book = {} for line in lines: source, target = line.strip().split(' => ') for rotation in permutations(build_pattern(source)): book[rotation] = build_pattern(target) return book def subdivide(pattern): size = 2 if len(pattern) % 2 == 0 else 3 n = len(pattern) // size return (tuple(tuple(pattern[i][j] for j in range(y * size, (y + 1) * size)) for i in range(x * size, (x + 1) * size)) for x in range(n) for y in range(n)) def rejoin(parts): n = int(sqrt(len(parts))) size = len(parts[0]) return tuple(concat(parts[i + k][j] for i in range(n)) for k in range(0, len(parts), n) for j in range(size)) def enhance_once(p, book): return rejoin(tuple(book[part] for part in subdivide(p))) def enhance(p, book, n, progress=None): for _ in range(n): p = enhance_once(p, book) return p book = build_pattern_book(fi.input()) intermediate_pattern = enhance(INITIAL_PATTERN, book, 5) print("After 5 iterations:", sum(sum(row) for row in intermediate_pattern)) final_pattern = enhance(intermediate_pattern, book, 13) print("After 18 iterations:", sum(sum(row) for row in final_pattern)) Particle Swarm — Python — #adventofcode Day 20 Today’s challenge finds us simulating the movements of particles in space. → Full code on GitHub !!! commentary Back to Python for this one, another relatively straightforward simulation, although it’s easier to calculate the answer to part 1 than to simulate. import fileinput as fi import numpy as np import re First we parse the input into 3 2D arrays: using numpy enables us to do efficient arithmetic across the whole set of particles in one go. 
PARTICLE_RE = re.compile(r'p=<(-?\d+),(-?\d+),(-?\d+)>, ' r'v=<(-?\d+),(-?\d+),(-?\d+)>, ' r'a=<(-?\d+),(-?\d+),(-?\d+)>') def parse_input(lines): x = [] v = [] a = [] for l in lines: m = PARTICLE_RE.match(l) x.append([int(x) for x in m.group(1, 2, 3)]) v.append([int(x) for x in m.group(4, 5, 6)]) a.append([int(x) for x in m.group(7, 8, 9)]) return (np.arange(len(x)), np.array(x), np.array(v), np.array(a)) i, x, v, a = parse_input(fi.input()) Now we can calculate which particle will be closest to the origin in the long-term: this is simply the particle with the smallest acceleration. It turns out that several have the same acceleration, so of these, the one we want is the one with the lowest starting velocity. This is only complicated slightly by the need to get the number of the particle rather than its other information, hence the need to use numpy.argmin. a_abs = np.sum(np.abs(a), axis=1) a_min = np.min(a_abs) a_i = np.squeeze(np.argwhere(a_abs == a_min)) closest = i[a_i[np.argmin(np.sum(np.abs(v[a_i]), axis=1))]] print("Closest: ", closest) Now we define functions to simulate collisions between particles. We have to use the return_index and return_counts options to numpy.unique to be able to get rid of all the duplicate positions (the standard usage is to keep one of each duplicate). def resolve_collisions(x, v, a): (_, i, c) = np.unique(x, return_index=True, return_counts=True, axis=0) i = i[c == 1] return x[i], v[i], a[i] The termination criterion for this loop is an interesting aspect: the most robust to my mind seems to be that eventually the particles will end up sorted in order of their initial acceleration in terms of distance from the origin, so you could check for this but that’s pretty computationally expensive. In the end, all that was needed was a bit of trial and error: terminating arbitrarily after 1,000 iterations seems to work! In fact, all the collisions are over after about 40 iterations for my input but there was always the possibility that two particles with very slightly different accelerations would eventually intersect much later. def simulate_collisions(x, v, a, iterations=1000): for _ in range(iterations): v += a x += v x, v, a = resolve_collisions(x, v, a) return len(x) print("Remaining particles: ", simulate_collisions(x, v, a)) A Series of Tubes — Rust — #adventofcode Day 19 Today’s challenge asks us to help a network packet find its way. → Full code on GitHub !!! commentary Today’s challenge was fairly straightforward, following an ASCII art path, so I thought I’d give Rust another try. I’m a bit behind on the blog posts, so I’m presenting the code below without any further commentary. I’m not really convinced this is good idiomatic Rust, and it was interesting turning a set of strings into a 2D array of characters because there are both u8 (byte) and char types to deal with. use std::io; use std::io::BufRead; const ALPHA: &'static str = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"; fn change_direction(dia: &Vec<Vec<u8>>, x: usize, y: usize, dx: &mut i32, dy: &mut i32) { assert_eq!(dia[x][y], b'+'); if dx.abs() == 1 { *dx = 0; if y + 1 < dia[x].len() && (dia[x][y + 1] == b'-' || ALPHA.contains(dia[x][y + 1] as char)) { *dy = 1; } else if dia[x][y - 1] == b'-' || ALPHA.contains(dia[x][y - 1] as char) { *dy = -1; } else { panic!("Huh? 
{} {}", dia[x][y+1] as char, dia[x][y-1] as char); } } else { *dy = 0; if x + 1 < dia.len() && (dia[x + 1][y] == b'|' || ALPHA.contains(dia[x + 1][y] as char)) { *dx = 1; } else if dia[x - 1][y] == b'|' || ALPHA.contains(dia[x - 1][y] as char) { *dx = -1; } else { panic!("Huh?"); } } } fn follow_route(dia: Vec<Vec<u8>>) -> (String, i32) { let mut x: i32 = 0; let mut y: i32; let mut dx: i32 = 1; let mut dy: i32 = 0; let mut result = String::new(); let mut steps = 1; match dia[0].iter().position(|x| *x == b'|') { Some(i) => y = i as i32, None => panic!("Could not find '|' in first row"), } loop { x += dx; y += dy; match dia[x as usize][y as usize] { b'A'...b'Z' => result.push(dia[x as usize][y as usize] as char), b'+' => change_direction(&dia, x as usize, y as usize, &mut dx, &mut dy), b' ' => return (result, steps), _ => (), } steps += 1; } } fn main() { let stdin = io::stdin(); let lines: Vec<Vec<u8>> = stdin.lock().lines() .map(|l| l.unwrap().into_bytes()) .collect(); let result = follow_route(lines); println!("Route: {}", result.0); println!("Steps: {}", result.1); } Duet — Haskell — #adventofcode Day 18 Today’s challenge introduces a type of simplified assembly language that includes instructions for message-passing. First we have to simulate a single program (after humorously misinterpreting the snd and rcv instructions as “sound” and “recover”), but then we have to simulate two concurrent processes and the message passing between them. → Full code on GitHub !!! commentary Well, I really learned a lot from this one! I wanted to get to grips with more complex stuff in Haskell and this challenge seemed like an excellent opportunity to figure out a) parsing with the parsec library and b) using the State monad to keep the state of the simulator. As it turned out, that wasn't all I'd learned: I also ran into an interesting situation whereby lazy evaluation was creating an infinite loop where there shouldn't be one, so I also had to learn how to selectively force strict evaluation of values. I'm pretty sure this isn't the best Haskell in the world, but I'm proud of it. First we have to import a bunch of stuff to use later, but also notice the pragma on the first line which instructs the compiler to enable the BangPatterns language extension, which will be important later. {-# LANGUAGE BangPatterns #-} module Main where import qualified Data.Vector as V import qualified Data.Map.Strict as M import Data.List import Data.Either import Data.Maybe import Control.Monad.State.Strict import Control.Monad.Loops import Text.ParserCombinators.Parsec hiding (State) First up we define the types that will represent the program code itself. data DuetVal = Reg Char | Val Int deriving Show type DuetQueue = [Int] data DuetInstruction = Snd DuetVal | Rcv DuetVal | Jgz DuetVal DuetVal | Set DuetVal DuetVal | Add DuetVal DuetVal | Mul DuetVal DuetVal | Mod DuetVal DuetVal deriving Show type DuetProgram = V.Vector DuetInstruction Next we define the types to hold the machine state, which includes: registers, instruction pointer, send & receive buffers and the program code, plus a counter of the number of sends made (to provide the solution). 
type DuetRegisters = M.Map Char Int data Duet = Duet { dRegisters :: DuetRegisters , dPtr :: Int , dSendCount :: Int , dRcvBuf :: DuetQueue , dSndBuf :: DuetQueue , dProgram :: DuetProgram } instance Show Duet where show d = show (dRegisters d) ++ " @" ++ show (dPtr d) ++ " S" ++ show (dSndBuf d) ++ " R" ++ show (dRcvBuf d) defaultDuet = Duet M.empty 0 0 [] [] V.empty type DuetState = State Duet program is a parser built on the cool parsec library to turn the program text into a Haskell format that we can work with, a Vector of instructions. Yes, using a full-blown parser is overkill here (it would be much simpler just to split each line on whitespace), but I wanted to see how Parsec works. I’m using Vector here because we need random access to the instruction list, which is much more efficient with Vector: O(1) compared with the O(n) of the built-in Haskell list ([]) type. parseProgram applies the parser to a string and returns the result. program :: GenParser Char st DuetProgram program = do instructions <- endBy instruction eol return $ V.fromList instructions where instruction = try (oneArg "snd" Snd) <|> oneArg "rcv" Rcv <|> twoArg "set" Set <|> twoArg "add" Add <|> try (twoArg "mul" Mul) <|> twoArg "mod" Mod <|> twoArg "jgz" Jgz oneArg n c = do string n >> spaces val <- regOrVal return $ c val twoArg n c = do string n >> spaces val1 <- regOrVal spaces val2 <- regOrVal return $ c val1 val2 regOrVal = register <|> value register = do name <- lower return $ Reg name value = do val <- many $ oneOf "-0123456789" return $ Val $ read val eol = char '\n' parseProgram :: String -> Either ParseError DuetProgram parseProgram = parse program "" Next up we have some utility functions that sit in the DuetState monad we defined above and perform common manipulations on the state: getting/setting/updating registers, updating the instruction pointer and sending/receiving messages via the relevant queues. getReg :: Char -> DuetState Int getReg r = do st <- get return $ M.findWithDefault 0 r (dRegisters st) putReg :: Char -> Int -> DuetState () putReg r v = do st <- get let current = dRegisters st new = M.insert r v current put $ st { dRegisters = new } modReg :: (Int -> Int -> Int) -> Char -> DuetVal -> DuetState Bool modReg op r v = do u <- getReg r v' <- getRegOrVal v putReg r (u `op` v') incPtr return False getRegOrVal :: DuetVal -> DuetState Int getRegOrVal (Reg r) = getReg r getRegOrVal (Val v) = return v addPtr :: Int -> DuetState () addPtr n = do st <- get put $ st { dPtr = n + dPtr st } incPtr = addPtr 1 send :: Int -> DuetState () send v = do st <- get put $ st { dSndBuf = (dSndBuf st ++ [v]), dSendCount = dSendCount st + 1 } recv :: DuetState (Maybe Int) recv = do st <- get case dRcvBuf st of (x:xs) -> do put $ st { dRcvBuf = xs } return $ Just x [] -> return Nothing execInst implements the logic for each instruction. It returns False as long as the program can continue, but True if the program tries to receive from an empty buffer.
execInst :: DuetInstruction -> DuetState Bool execInst (Set (Reg reg) val) = do newVal <- getRegOrVal val putReg reg newVal incPtr return False execInst (Mul (Reg reg) val) = modReg (*) reg val execInst (Add (Reg reg) val) = modReg (+) reg val execInst (Mod (Reg reg) val) = modReg mod reg val execInst (Jgz val1 val2) = do st <- get test <- getRegOrVal val1 jump <- if test > 0 then getRegOrVal val2 else return 1 addPtr jump return False execInst (Snd val) = do v <- getRegOrVal val send v st <- get incPtr return False execInst (Rcv (Reg r)) = do st <- get v <- recv handle v where handle :: Maybe Int -> DuetState Bool handle (Just x) = putReg r x >> incPtr >> return False handle Nothing = return True execInst x = error $ "execInst not implemented yet for " ++ show x execNext looks up the next instruction and executes it. runUntilWait runs the program until execNext returns True to signal the wait state has been reached. execNext :: DuetState Bool execNext = do st <- get let prog = dProgram st p = dPtr st if p >= length prog then return True else execInst (prog V.! p) runUntilWait :: DuetState () runUntilWait = do waiting <- execNext unless waiting runUntilWait runTwoPrograms handles the concurrent running of two programs, by running first one and then the other to a wait state, then swapping each program’s send buffer to the other’s receive buffer before repeating. If you look carefully, you’ll see a “bang” (!) before the two arguments of the function: runTwoPrograms !d0 !d1. Haskell is a lazy language and usually doesn’t evaluate a computation until you ask for a result, instead carrying around a “thunk” or plan for how to carry out the computation. Sometimes that can be a problem because the amount of memory your program is using can explode unnecessarily as a long computation turns into a large thunk which isn’t evaluated until the very end. That’s not the problem here though. What happens here without the bangs is another side-effect of laziness. The exit condition of this recursive function is that a deadlock has been reached: both programs are waiting to receive, but neither has sent anything, so neither can ever continue. The check for this is (null $ dSndBuf d0') && (null $ dSndBuf d1'). As long as the first program has something in its send buffer, the test fails without ever evaluating the second part, which means the result d1' of running the second program is never needed. The function immediately goes to the recursive case and tries to continue the first program again, which immediately returns because it’s still waiting to receive. The same thing happens again, and the result is that instead of running the second program to obtain something for the first to receive, we get into an infinite loop trying and failing to continue the first program. The bang forces both d0 and d1 to be evaluated at the point we recurse, which forces the rest of the computation: running the second program and swapping the send/receive buffers. With that, the evaluation proceeds correctly and we terminate with a result instead of getting into an infinite loop! 
runTwoPrograms :: Duet -> Duet -> (Int, Int) runTwoPrograms !d0 !d1 | (null $ dSndBuf d0') && (null $ dSndBuf d1') = (dSendCount d0', dSendCount d1') | otherwise = runTwoPrograms d0'' d1'' where (_, d0') = runState runUntilWait d0 (_, d1') = runState runUntilWait d1 d0'' = d0' { dSndBuf = [], dRcvBuf = dSndBuf d1' } d1'' = d1' { dSndBuf = [], dRcvBuf = dSndBuf d0' } All that remains to be done now is to run the programs and see how many messages were sent before the deadlock. main = do prog <- fmap (fromRight V.empty . parseProgram) getContents let d0 = defaultDuet { dProgram = prog, dRegisters = M.fromList [('p', 0)] } d1 = defaultDuet { dProgram = prog, dRegisters = M.fromList [('p', 1)] } (send0, send1) = runTwoPrograms d0 d1 putStrLn $ "Program 0 sent " ++ show send0 ++ " messages" putStrLn $ "Program 1 sent " ++ show send1 ++ " messages" Spinlock — Rust/Python — #adventofcode Day 17 In today’s challenge we deal with a monstrous whirlwind of a program, eating up CPU and memory in equal measure. → Full code on GitHub (and Python driver script) !!! commentary One of the things I wanted from AoC was an opportunity to try out some popular languages that I don’t currently know, including the memory-safe, strongly-typed compiled languages Go and Rust. Realistically though, I’m likely to continue doing most of my programming in Python, and use one of these other languages when it has better tools or I need the extra speed. In which case, what I really want to know is how I can call functions written in Go or Rust from Python. I thought I'd try Rust first, as it seems to be designed to be C-compatible and that makes it easy to call from Python using [`ctypes`](https://docs.python.org/3.6/library/ctypes.html). Part 1 was another straightforward simulation: translate what the "spinlock" monster is doing into code and run it. It was pretty obvious from the story of this challenge and experience of the last few days that this was going to be another one where the simulation is too computationally expensive for part two, which turns out to be correct. So, first thing to do is to implement the meat of the solution in Rust. spinlock solves the first part of the problem by doing exactly what the monster does. Since we only have to go up to 2017 iterations, this is very tractable. The last number we insert is 2017, so we just return the number immediately after that. #[no_mangle] pub extern fn spinlock(n: usize, skip: usize) -> i32 { let mut buffer: Vec<i32> = Vec::with_capacity(n+1); buffer.push(0); buffer.push(1); let mut pos = 1; for i in 2..n+1 { pos = (pos + skip + 1) % buffer.len(); buffer.insert(pos, i as i32); } pos = (pos + 1) % buffer.len(); return buffer[pos]; } For the second part, we have to do 50 million iterations instead, which is a lot. Given that every time you insert an item in the list it has to move up all the elements after that position, I’m pretty sure the algorithm is O(n^2), so it’s going to take a lot longer than 10,000ish times the first part. Thankfully, we don’t need to build the whole list, just keep track of where 0 is and what number is immediately after it. There may be a closed-form solution to simply calculate the result, but I couldn’t think of it and this is good enough. 
#[no_mangle] pub extern fn spinlock0(n: usize, skip: usize) -> i32 { let mut pos = 1; let mut pos_0 = 0; let mut after_0 = 1; for i in 2..n+1 { pos = (pos + skip + 1) % i; if pos == pos_0 + 1 { after_0 = i; } if pos <= pos_0 { pos_0 += 1; } } return after_0 as i32; } Now it’s time to call this code from Python. Notice the #[no_mangle] pragmas and pub extern declarations for each function above, which are required to make sure the functions are exported in a C-compatible way. We can build this into a shared library like this: rustc --crate-type=cdylib -o spinlock.so 17-spinlock.rs The Python script is as simple as loading this library, reading the puzzle input from the command line and calling the functions. The ctypes module does a lot of magic so that we don’t have to worry about converting from Python types to native types and back again. import ctypes import sys lib = ctypes.cdll.LoadLibrary("./spinlock.so") skip = int(sys.argv[1]) print("Part 1:", lib.spinlock(2017, skip)) print("Part 2:", lib.spinlock0(50_000_000, skip)) This is a toy example as far as calling Rust from Python is concerned, but it’s worth noting that already we can play with the parameters to the two Rust functions without having to recompile. For more serious work, I’d probably be looking at something like PyO3 to make a proper Python module. Looks like there’s also a very early Rust numpy integration for integrating numerical stuff. You can also do the same thing from Julia, which has a ccall function built in: ccall((:spinlock, "./spinlock.so"), Int32, (UInt64, UInt64), 2017, 377) My next thing to try might be Haskell → Python though… Permutation Promenade — Julia — #adventofcode Day 16 Today’s challenge rather appeals to me as a folk dancer, because it describes a set of instructions for a dance and asks us to work out the positions of the dancing programs after each run through the dance. → Full code on GitHub !!! commentary So, part 1 is pretty straightforward: parse the set of instructions, interpret them and keep track of the dancer positions as you go. One time through the dance. However, part 2 asks for the positions after 1 billion (yes, that’s 1,000,000,000) times through the dance. In hindsight I should have immediately become suspicious, but I thought I’d at least try the brute force approach first because it was simpler to code. So I give it a try, and after waiting for a while, having a cup of tea etc. it still hasn't terminated. I try reducing the number of iterations to 1,000. Now it terminates, but takes about 6 seconds. A spot of arithmetic suggests that running the full version will take a little over 190 years. There must be a better way than that! I'm a little embarrassed that I didn't spot the solution immediately (blaming Julia) and tried again in Python to see if I could get it to terminate quicker. When that didn't work I had to think again. A little further investigation with a while loop shows that in fact the dance position repeats (in the case of my input) every 48 times. After that it becomes much quicker! Oh, and it was time for a new language, so I wasted some extra time working out the quirks of Julia. First, a function to evaluate a single move — for neatness, this dispatches to a dedicated function depending on the type of move, although this isn’t really necessary to solve the challenge. Ending a function name with a bang (!) is a Julia convention to indicate that it has side-effects.
function eval_move!(move, dancers) move_type = move[1] params = move[2:end] if move_type == 's' # spin eval_spin!(params, dancers) elseif move_type == 'x' # exchange eval_exchange!(params, dancers) elseif move_type == 'p' # partner swap eval_partner!(params, dancers) end end These take care of the individual moves. Parsing the parameters from a string every single time probably isn’t ideal, but as it turns out, that optimisation isn’t really necessary. Note the + 1 in eval_exchange!, which is necessary because Julia is one of those crazy languages where indexes start from 1 instead of 0. These actions are pretty nice to implement, because Julia has circshift as a builtin to rotate a list, and allows you to assign to list slices and swap values in place with a single statement. function eval_spin!(params, dancers) shift = parse(Int, params) dancers[1:end] = circshift(dancers, shift) end function eval_exchange!(params, dancers) i, j = map(x -> parse(Int, x) + 1, split(params, "/")) dancers[i], dancers[j] = dancers[j], dancers[i] end function eval_partner!(params, dancers) a, b = split(params, "/") ia = findfirst([x == a for x in dancers]) ib = findfirst([x == b for x in dancers]) dancers[ia], dancers[ib] = b, a end dance! takes a list of moves and takes the dancers once through the dance. function dance!(moves, dancers) for m in moves eval_move!(m, dancers) end end To solve part 1, we simply need to read the moves in, set up the initial positions of the dancers and run the dance through once. join is necessary to a) turn characters into length-1 strings, and b) convert the list of strings back into a single string to print out. moves = split(readchomp(STDIN), ",") dancers = collect(join(c) for c in 'a':'p') orig_dancers = copy(dancers) dance!(moves, dancers) println(join(dancers)) Part 2 requires a little more work. We run the dance through again and again until we get back to the initial position, saving the intermediate positions in a list. The list now contains every possible position available from that starting point, so we can find position 1 billion by taking 1,000,000,000 modulo the list length (plus 1 because 1-based indexing) and use that to index into the list to get the final position. dance_cycle = [orig_dancers] while dancers != orig_dancers push!(dance_cycle, copy(dancers)) dance!(moves, dancers) end println(join(dance_cycle[1_000_000_000 % length(dance_cycle) + 1])) This terminates on my laptop in about 1.6s: Brute force 0; Careful thought 1! Dueling Generators — Rust — #adventofcode Day 15 Today’s challenge introduces two pseudo-random number generators which are trying to agree on a series of numbers. We play the part of the “judge”, counting the number of times their numbers agree in the lowest 16 bits. → Full code on GitHub Ever since I used Go to solve day 3, I’ve had a hankering to try the other new kid on the memory-safe compiled language block, Rust. I found it a bit intimidating at first because the syntax wasn’t as close to the C/C++ I’m familiar with and there are quite a few concepts unique to Rust, like the use of traits. But I figured it out, so I can tick another language off my to-try list. I also implemented a version in Python for comparison: the Python version is more concise and easier to read but the Rust version runs about 10× faster. First we include the std::env “crate” which will let us get access to commandline arguments, and define some useful constants for later.
use std::env; const M: i64 = 2147483647; const MASK: i64 = 0b1111111111111111; const FACTOR_A: i64 = 16807; const FACTOR_B: i64 = 48271; gen_next generates the next number for a given generator’s sequence. gen_next_picky does the same, but for the “picky” generators, only returning values that meet their criteria. fn gen_next(factor: i64, current: i64) -> i64 { return (current * factor) % M; } fn gen_next_picky(factor: i64, current: i64, mult: i64) -> i64 { let mut next = gen_next(factor, current); while next % mult != 0 { next = gen_next(factor, next); } return next; } duel runs a single duel, and returns the number of times the generators agreed in the lowest 16 bits (found by doing a binary & with the mask defined above). Rust allows functions to be passed as parameters, so we use this to be able to run both versions of the duel using only this one function. fn duel<F, G>(n: i64, next_a: F, mut value_a: i64, next_b: G, mut value_b: i64) -> i64 where F: Fn(i64) -> i64, G: Fn(i64) -> i64, { let mut count = 0; for _ in 0..n { value_a = next_a(value_a); value_b = next_b(value_b); if (value_a & MASK) == (value_b & MASK) { count += 1; } } return count; } Finally, we read the start values from the command line and run the two duels. The expressions that begin |n| are closures (anonymous functions, often called lambdas in other languages) that we use to specify the generator functions for each duel. fn main() { let args: Vec<String> = env::args().collect(); let start_a: i64 = args[1].parse().unwrap(); let start_b: i64 = args[2].parse().unwrap(); println!( "Duel 1: {}", duel( 40000000, |n| gen_next(FACTOR_A, n), start_a, |n| gen_next(FACTOR_B, n), start_b, ) ); println!( "Duel 2: {}", duel( 5000000, |n| gen_next_picky(FACTOR_A, n, 4), start_a, |n| gen_next_picky(FACTOR_B, n, 8), start_b, ) ); } Disk Defragmentation — Haskell — #adventofcode Day 14 Today’s challenge has us helping a disk defragmentation program by identifying contiguous regions of used sectors on a 2D disk. → Full code on GitHub !!! commentary Wow, today’s challenge had a pretty steep learning curve. Day 14 was the first to directly reuse code from a previous day: the “knot hash” from day 10. I solved day 10 in Haskell, so I thought it would be easier to stick with Haskell for today as well. The first part was straightforward, but the second was pretty mind-bending in a pure functional language! I ended up solving it by implementing a flood fill algorithm. It's recursive, which is right in Haskell's wheelhouse, but I ended up using `Data.Sequence` instead of the standard list type as its API for indexing is better. I haven't tried it, but I think it will also be a little faster than a naive list-based version. It took a looong time to figure everything out, but I had a day off work to be able to concentrate on it! A lot more imports for this solution, as we’re exercising a lot more of the standard library. module Main where import Prelude hiding (length, filter, take) import Data.Char (ord) import Data.Sequence import Data.Foldable hiding (length) import Data.Ix (inRange) import Data.Function ((&)) import Data.Maybe (fromJust, mapMaybe, isJust) import qualified Data.Set as Set import Text.Printf (printf) import System.Environment (getArgs) Also we’ll extract the key bits from day 10 into a module and import that. import KnotHash Now we define a few data types to make the code a bit more readable.
Sector represents the state of a particular disk sector, either free, used (but unmarked) or used and marked as belonging to a given integer-labelled group. Grid is a 2D matrix of Sector, as a sequence of sequences. data Sector = Free | Used | Mark Int deriving (Eq) instance Show Sector where show Free = " ." show Used = " #" show (Mark i) = printf "%4d" i type GridRow = Seq Sector type Grid = Seq (GridRow) Some utility functions to make it easier to view the grids (which can be quite large): used for debugging but not in the finished solution. subGrid :: Int -> Grid -> Grid subGrid n = fmap (take n) . take n printRow :: GridRow -> IO () printRow row = do mapM_ (putStr . show) row putStr "\n" printGrid :: Grid -> IO () printGrid = mapM_ printRow makeKey generates the hash key for a given row. makeKey :: String -> Int -> String makeKey input n = input ++ "-" ++ show n stringToGridRow converts a binary string of ‘1’ and ‘0’ characters to a sequence of Sector values. stringToGridRow :: String -> GridRow stringToGridRow = fromList . map convert where convert x | x == '1' = Used | x == '0' = Free makeRow and makeGrid build up the grid to use based on the provided input string. makeRow :: String -> Int -> GridRow makeRow input n = stringToGridRow $ concatMap (printf "%08b") $ dense $ fullKnotHash 256 $ map ord $ makeKey input n makeGrid :: String -> Grid makeGrid input = fromList $ map (makeRow input) [0..127] Utility functions to count the number of used and free sectors, to give the solution to part 1. countEqual :: Sector -> Grid -> Int countEqual x = sum . fmap (length . filter (==x)) countUsed = countEqual Used countFree = countEqual Free Now the real meat begins! findUnmarked finds the location of the next used sector that we haven’t yet marked. It returns a Maybe value, which is Just (x, y) if there is still an unmarked block or Nothing if there’s nothing left to mark. findUnmarked :: Grid -> Maybe (Int, Int) findUnmarked g | y == Nothing = Nothing | otherwise = Just (fromJust x, fromJust y) where hasUnmarked row = isJust $ elemIndexL Used row x = findIndexL hasUnmarked g y = case x of Nothing -> Nothing Just x' -> elemIndexL Used $ index g x' floodFill implements a very simple recursive flood fill. It takes a target and replacement value and a starting location, and fills in the replacement value for every connected location that currently has the target value. We use it below to replace a connected used region with a marked region. floodFill :: Sector -> Sector -> (Int, Int) -> Grid -> Grid floodFill t r (x, y) g | inRange (0, length g - 1) x && inRange (0, length g - 1) y && elem == t = let newRow = update y r row newGrid = update x newRow g in newGrid & floodFill t r (x+1, y) & floodFill t r (x-1, y) & floodFill t r (x, y+1) & floodFill t r (x, y-1) | otherwise = g where row = g `index` x elem = row `index` y markNextGroup looks for an unmarked group and marks it if found. If no more groups are found it returns Nothing. markAllGroups then repeatedly applies markNextGroup until Nothing is returned. markNextGroup :: Int -> Grid -> Maybe Grid markNextGroup i g = case findUnmarked g of Nothing -> Nothing Just loc -> Just $ floodFill Used (Mark i) loc g markAllGroups :: Grid -> Grid markAllGroups g = markAllGroups' 1 g where markAllGroups' i g = case markNextGroup i g of Nothing -> g Just g' -> markAllGroups' (i+1) g' onlyMarks filters a grid row and returns a list of (possibly duplicated) group numbers in the row. onlyMarks :: GridRow -> [Int] onlyMarks = mapMaybe getMark .
toList where getMark Free = Nothing getMark Used = Nothing getMark (Mark i) = Just i Finally, countGroups puts all the group numbers into a set to get rid of duplicates and returns the size of the set, i.e. the total number of separate groups. countGroups :: Grid -> Int countGroups g = Set.size groupSet where groupSet = foldl' Set.union Set.empty $ fmap rowToSet g rowToSet = Set.fromList . toList . onlyMarks As always, every Haskell program needs a main function to drive the I/O and produce the actual result. main = do input <- fmap head getArgs let grid = makeGrid input used = countUsed grid marked = countGroups $ markAllGroups grid putStrLn $ "Used sectors: " ++ show used putStrLn $ "Groups: " ++ show marked Packet Scanners — Haskell — #adventofcode Day 13 Today’s challenge requires us to sneak past a firewall made up of a series of scanners. → Full code on GitHub !!! commentary I wasn’t really thinking straight when I solved this challenge. I got a solution without too much trouble, but I ended up simulating the step-by-step movement of the scanners. I finally realised that I could calculate whether or not a given scanner was safe at a given time directly with modular arithmetic, and it bugged me so much that I reimplemented the solution. Both are given below, the faster one first. First we introduce some standard library stuff and define some useful utilities. module Main where import qualified Data.Text as T import Data.Maybe (mapMaybe) strip :: String -> String strip = T.unpack . T.strip . T.pack splitOn :: String -> String -> [String] splitOn sep = map T.unpack . T.splitOn (T.pack sep) . T.pack parseScanner :: String -> (Int, Int) parseScanner s = (d, r) where [d, r] = map read $ splitOn ": " s traverseFW does all the hard work: it checks for each scanner whether or not it’s safe as we pass through, and returns a list of the severities of each time we’re caught. mapMaybe is like the standard map in many languages, but operates on a list of Haskell Maybe values, like a combined map and filter. If the value is Just x, x gets included in the returned list; if the value is Nothing, then it gets thrown away. traverseFW :: Int -> [(Int, Int)] -> [Int] traverseFW delay = mapMaybe caught where caught (d, r) = if (d + delay) `mod` (2*(r-1)) == 0 then Just (d * r) else Nothing Then the total severity of our passage through the firewall is simply the sum of each individual severity. severity :: [(Int, Int)] -> Int severity = sum . traverseFW 0 But we don’t want to know how badly we got caught, we want to know how long to wait before setting off to get through safely. findDelay tries traversing the firewall with increasing delay, and returns the delay for the first pass where we predict not getting caught. findDelay :: [(Int, Int)] -> Int findDelay scanners = head $ filter (null . flip traverseFW scanners) [0..] And finally, we put it all together and calculate and print the result. main = do scanners <- fmap (map parseScanner . 
lines) getContents putStrLn $ "Severity: " ++ (show $ severity scanners) putStrLn $ "Delay: " ++ (show $ findDelay scanners) I’m not generally bothered about performance for these challenges, but here I’ll note that my second attempt runs in a little under 2 seconds on my laptop: $ time ./13-packet-scanners-redux < 13-input.txt Severity: 1900 Delay: 3966414 ./13-packet-scanners-redux < 13-input.txt 1.73s user 0.02s system 99% cpu 1.754 total Compare that with the first, simulation-based one, which takes nearly a full minute: $ time ./13-packet-scanners < 13-input.txt Severity: 1900 Delay: 3966414 ./13-packet-scanners < 13-input.txt 57.63s user 0.27s system 100% cpu 57.902 total And for good measure, here’s the code. Notice the tick and tickOne functions, which together simulate moving all the scanners by one step; for this to work we have to track the full current state of each scanner, which is easier to read with a Haskell record-based custom data type. traverseFW is more complicated because it has to drive the simulation, but the rest of the code is mostly the same. module Main where import qualified Data.Text as T import Control.Monad (forM_) data Scanner = Scanner { depth :: Int , range :: Int , pos :: Int , dir :: Int } instance Show Scanner where show (Scanner d r p dir) = show d ++ "/" ++ show r ++ "/" ++ show p ++ "/" ++ show dir strip :: String -> String strip = T.unpack . T.strip . T.pack splitOn :: String -> String -> [String] splitOn sep str = map T.unpack $ T.splitOn (T.pack sep) $ T.pack str parseScanner :: String -> Scanner parseScanner s = Scanner d r 0 1 where [d, r] = map read $ splitOn ": " s tickOne :: Scanner -> Scanner tickOne (Scanner depth range pos dir) | pos <= 0 = Scanner depth range (pos+1) 1 | pos >= range - 1 = Scanner depth range (pos-1) (-1) | otherwise = Scanner depth range (pos+dir) dir tick :: [Scanner] -> [Scanner] tick = map tickOne traverseFW :: [Scanner] -> [(Int, Int)] traverseFW = traverseFW' 0 where traverseFW' _ [] = [] traverseFW' layer scanners@((Scanner depth range pos _):rest) -- | layer == depth && pos == 0 = (depth*range) + (traverseFW' (layer+1) $ tick rest) | layer == depth && pos == 0 = (depth,range) : (traverseFW' (layer+1) $ tick rest) | layer == depth && pos /= 0 = traverseFW' (layer+1) $ tick rest | otherwise = traverseFW' (layer+1) $ tick scanners severity :: [Scanner] -> Int severity = sum . map (uncurry (*)) . traverseFW empty :: [a] -> Bool empty [] = True empty _ = False findDelay :: [Scanner] -> Int findDelay scanners = delay where (delay, _) = head $ filter (empty . traverseFW . snd) $ zip [0..] $ iterate tick scanners main = do scanners <- fmap (map parseScanner . lines) getContents putStrLn $ "Severity: " ++ (show $ severity scanners) putStrLn $ "Delay: " ++ (show $ findDelay scanners) Digital Plumber — Python — #adventofcode Day 12 Today’s challenge has us helping a village of programs who are unable to communicate. We have a list of the the communication channels between their houses, and need to sort them out into groups such that we know that each program can communicate with others in its own group but not any others. Then we have to calculate the size of the group containing program 0 and the total number of groups. → Full code on GitHub !!! commentary This is one of those problems where I’m pretty sure that my algorithm isn’t close to being the most efficient, but it definitely works! For the sake of solving the challenge that’s all that matters, but it still bugs me. 
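For what it’s worth, the textbook cure for that inefficiency would be a disjoint-set (union-find) structure, which merges groups in effectively constant time instead of repeatedly rescanning the whole list. Here’s a rough, untested sketch of that approach; it assumes the input has already been parsed into a list of (program, neighbours) pairs rather than being read with fileinput as in the real solution below.

```python
# Sketch only: a union-find (disjoint-set) version of the grouping problem.
# Assumes `pairs` is a list of (program, [neighbours]) tuples parsed elsewhere.
parent = {}

def find(x):
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
        x = parent[x]
    return x

def union(a, b):
    parent[find(a)] = find(b)

for prog, neighbours in pairs:
    for other in neighbours:
        union(prog, other)

print("Number in group 0:", sum(1 for p in parent if find(p) == find(0)))
print("Number of groups:", len({find(p) for p in parent}))
```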
By now I’ve become used to using fileinput to transparently read data either from files given on the command-line or standard input if no arguments are given. import fileinput as fi First we make an initial pass through the input data, creating a group for each line representing the programs on that line (which can communicate with each other). We store this as a Python set. groups = [] for line in fi.input(): head, rest = line.split(' <-> ') group = set([int(head)]) group.update([int(x) for x in rest.split(', ')]) groups.append(group) Now we iterate through the groups, starting with the first, and merge any we find that overlap with our current group. i = 0 while i < len(groups): current = groups[i] Each pass through the groups brings more programs into the current group, so we have to go through and check their connections too. We make several merge passes, until we detect that no more merges took place. num_groups = len(groups) + 1 while num_groups > len(groups): j = i+1 num_groups = len(groups) This inner loop does the actual merging, and deletes each group as it’s merged in. while j < len(groups): if len(current & groups[j]) > 0: current.update(groups[j]) del groups[j] else: j += 1 i += 1 All that’s left to do now is to display the results. print("Number in group 0:", len([g for g in groups if 0 in g][0])) print("Number of groups:", len(groups)) Hex Ed — Python — #adventofcode Day 11 Today’s challenge is to help a program find its child process, which has become lost on a hexagonal grid. We need to follow the path taken by the child (given as input) and calculate the distance it is from home along with the furthest distance it has been at any point along the path. → Full code on GitHub !!! commentary I found this one quite interesting in that it was very quick to solve. In fact, I got lucky and my first quick implementation (max(abs(l)) below) gave the correct answer in spite of missing an obvious not-so-edge case. Thinking about it, there’s only a ⅓ chance that the first incorrect implementation would give the wrong answer! The code is shorter, so you get more words today. ☺ There are a number of different co-ordinate systems on a hexagonal grid (I discovered while reading up after solving it…). I intuitively went for the system known as ‘axial’ coordinates, where you pick two directions aligned to the grid as your x and y axes: note that these won’t be perpendicular. I chose ne/sw as the x axis and se/nw as y, but there are three other possible choices. That leads to the following definition for the directions, encoded as numpy arrays because that makes some of the code below neater. import numpy as np STEPS = {d: np.array(v) for d, v in [('ne', (1, 0)), ('se', (0, -1)), ('s', (-1, -1)), ('sw', (-1, 0)), ('nw', (0, 1)), ('n', (1, 1))]} hex_grid_distance, given a location l, calculates the number of steps needed to reach that location from the centre at (0, 0). Notice that we can’t simply use the Manhattan distance here because, for example, one step north takes us to (1, 1), which would give a Manhattan distance of 2.
Instead, we can see that moving in the n/s direction allows us to increment or decrement both coordinates at the same time: If the coordinates have the same sign: move n/s until one of them is zero, then move along the relevant ne or se axis back to the origin; in this case the number of steps is the greatest of the absolute values of the two coordinates If the coordinates have opposite signs: move independently along the ne and se axes to reduce each to 0; this time the number of steps is the sum of the absolute values of the two coordinates def hex_grid_distance(l): if sum(np.sign(l)) == 0: # i.e. opposite signs return sum(abs(l)) else: return max(abs(l)) Now we can read in the path followed by the child and follow it ourselves, tracking the maximum distance from home along the way. path = input().strip().split(',') location = np.array((0, 0)) max_distance = 0 for step in map(STEPS.get, path): location += step max_distance = max(max_distance, hex_grid_distance(location)) distance = hex_grid_distance(location) print("Child process is at", location, "which is", distance, "steps away") print("Greatest distance was", max_distance) Knot Hash — Haskell — #adventofcode Day 10 Today’s challenge asks us to help a group of programs implement a (highly questionable) hashing algorithm that involves repeatedly reversing parts of a list of numbers. → Full code on GitHub !!! commentary I went with Haskell again today, because it’s the weekend so I have a bit more time, and I really enjoyed yesterday’s Haskell implementation. Today gave me the opportunity to explore the standard library a bit more, as well as lending itself nicely to being decomposed into smaller parts to be combined using higher-order functions. You know the drill by now: import stuff we’ll use later. module Main where import Data.Char (ord) import Data.Bits (xor) import Data.Function ((&)) import Data.List (unfoldr) import Text.Printf (printf) import qualified Data.Text as T The worked example uses a concept of the “current position” as a pointer to a location in a static list. In Haskell it makes more sense to instead use the front of the list as the current position, and rotate the whole list as we progress to bring the right element to the front. rotate :: Int -> [Int] -> [Int] rotate 0 xs = xs rotate n xs = drop n' xs ++ take n' xs where n' = n `mod` length xs The simple version of the hash requires working through the input list, modifying the working list as we go, and incrementing a “skip” counter with each step. Converting this to a functional style, we simply zip up the input with an infinite list [0, 1, 2, 3, ...] to give the counter values. Notice that we also have to calculate how far to rotate the working list to get back to its original position. foldl lets us specify a function that returns a modified version of the working list and feeds the input list in one element at a time. simpleKnotHash :: Int -> [Int] -> [Int] simpleKnotHash size input = foldl step [0..size-1] input' & rotate (negate finalPos) where input' = zip input [0..] finalPos = sum $ zipWith (+) input [0..] reversePart xs n = (reverse $ take n xs) ++ drop n xs step xs (n, skip) = reversePart xs n & rotate (n+skip) The full version of the hash (part 2 of the challenge) starts the same way as the simple version, except making 64 passes instead of one: we can do this by using replicate to make a list of 64 copies, then collapse that into a single list with concat.
fullKnotHash :: Int -> [Int] -> [Int] fullKnotHash size input = simpleKnotHash size input' where input' = concat $ replicate 64 input The next step in calculating the full hash collapses the full 256-element “sparse” hash down into 16 elements by XORing groups of 16 together. unfoldr is a nice efficient way of doing this. dense :: [Int] -> [Int] dense = unfoldr dense' where dense' [] = Nothing dense' xs = Just (foldl1 xor $ take 16 xs, drop 16 xs) The final hash step is to convert the list of integers into a hexadecimal string. hexify :: [Int] -> String hexify = concatMap (printf "%02x") These two utility functions put together building blocks from the Data.Text module to parse the input string. Note that no arguments are given: the functions are defined purely by composing other functions using the . operator. In Haskell this is referred to as “point-free” style. strip :: String -> String strip = T.unpack . T.strip . T.pack parseInput :: String -> [Int] parseInput = map (read . T.unpack) . T.splitOn (T.singleton ',') . T.pack Now we can put it all together, including building the weird input for the “full” hash. main = do input <- fmap strip getContents let simpleInput = parseInput input asciiInput = map ord input ++ [17, 31, 73, 47, 23] (a:b:_) = simpleKnotHash 256 simpleInput print $ (a*b) putStrLn $ fullKnotHash 256 asciiInput & dense & hexify Stream Processing — Haskell — #adventofcode Day 9 In today’s challenge we come across a stream that we need to cross. But of course, because we’re stuck inside a computer, it’s not water but data flowing past. The stream is too dangerous to cross until we’ve removed all the garbage, and to prove we can do that we have to calculate a score for the valid data “groups” and the number of garbage characters to remove. → Full code on GitHub !!! commentary One of my goals for this process was to knock the rust off my functional programming skills in Haskell, and I haven’t done that for the whole of the first week. Processing strings character by character and acting according to which character shows up seems like a good choice for pattern-matching though, so here we go. I also wanted to take a bash at test-driven development in Haskell, so I loaded up the Test.Hspec module to give it a try. I did find keeping track of all the state in arguments a bit mind boggling, and I think it could have been improved through use of a data type using record syntax and the `State` monad, so that's something to look at for a future challenge. First import the extra bits we’ll need. module Main where import Test.Hspec import Data.Function ((&)) countGroups solves the first part of the problem, counting up the “score” of the valid data in the stream. countGroups' is an auxiliary function that holds some state in its arguments. We use pattern matching for the base case: [] represents the empty list in Haskell, which indicates we’ve finished the whole stream. Otherwise, we split the remaining stream into its first character and remainder, and use guards to decide how to interpret it. If skip is true, discard the character and carry on with skip set back to false. If we find a “!”, that tells us to skip the next character. Other characters mark groups or sets of garbage: groups increase the score when they close and garbage is discarded. We continue to progress the list by recursing with the remainder of the stream and any updated state.
countGroups :: String -> Int countGroups = countGroups' 0 0 False False where countGroups' score _ _ _ [] = score countGroups' score level garbage skip (c:rest) | skip = countGroups' score level garbage False rest | c == '!' = countGroups' score level garbage True rest | garbage = case c of '>' -> countGroups' score level False False rest _ -> countGroups' score level True False rest | otherwise = case c of '{' -> countGroups' score (level+1) False False rest '}' -> countGroups' (score+level) (level-1) False False rest ',' -> countGroups' score level False False rest '<' -> countGroups' score level True False rest c -> error $ "Garbage character found outside garbage: " ++ show c countGarbage works almost identically to countGroups, except it ignores groups and counts garbage. They are structured so similarly that it would probably make more sense to combine them to a single function that returns both counts. countGarbage :: String -> Int countGarbage = countGarbage' 0 False False where countGarbage' count _ _ [] = count countGarbage' count garbage skip (c:rest) | skip = countGarbage' count garbage False rest | c == '!' = countGarbage' count garbage True rest | garbage = case c of '>' -> countGarbage' count False False rest _ -> countGarbage' (count+1) True False rest | otherwise = case c of '<' -> countGarbage' count True False rest _ -> countGarbage' count False False rest Hspec gives us a domain-specific language heavily inspired by the rspec library for Ruby: the tests read almost like natural language. I built up these tests one-by-one, gradually implementing the appropriate bits of the functions above, a process known as Test-driven development. runTests = hspec $ do describe "countGroups" $ do it "counts valid groups" $ do countGroups "{}" `shouldBe` 1 countGroups "{{{}}}" `shouldBe` 6 countGroups "{{{},{},{{}}}}" `shouldBe` 16 countGroups "{{},{}}" `shouldBe` 5 it "ignores garbage" $ do countGroups "{<a>,<a>,<a>,<a>}" `shouldBe` 1 countGroups "{{<ab>},{<ab>},{<ab>},{<ab>}}" `shouldBe` 9 it "skips marked characters" $ do countGroups "{{<!!>},{<!!>},{<!!>},{<!!>}}" `shouldBe` 9 countGroups "{{<a!>},{<a!>},{<a!>},{<ab>}}" `shouldBe` 3 describe "countGarbage" $ do it "counts garbage characters" $ do countGarbage "<>" `shouldBe` 0 countGarbage "<random characters>" `shouldBe` 17 countGarbage "<<<<>" `shouldBe` 3 it "ignores non-garbage" $ do countGarbage "{{},{}}" `shouldBe` 0 countGarbage "{{<ab>},{<ab>},{<ab>},{<ab>}}" `shouldBe` 8 it "skips marked characters" $ do countGarbage "<{!>}>" `shouldBe` 2 countGarbage "<!!>" `shouldBe` 0 countGarbage "<!!!>" `shouldBe` 0 countGarbage "<{o\"i!a,<{i<a>" `shouldBe` 10 Finally, the main function reads in the challenge input and calculates the answers, printing them on standard output. main = do runTests repeat '=' & take 78 & putStrLn input <- getContents & fmap (filter (/='\n')) putStrLn $ "Found " ++ show (countGroups input) ++ " groups" putStrLn $ "Found " ++ show (countGarbage input) ++ " characters garbage" I Heard You Like Registers — Python — #adventofcode Day 8 Today’s challenge describes a simple instruction set for a CPU, incrementing and decrementing values in registers according to simple conditions. We have to interpret a stream of these instructions, and to prove that we’ve done so, give the highest value of any register, both at the end of the program and throughout the whole program. → Full code on GitHub !!! 
commentary This turned out to be a nice straightforward one to implement, as the instruction format was easily parsed by regular expression, and Python provides the eval function which made evaluating the conditions a doddle. Import various standard library bits that we’ll use later. import re import fileinput as fi from math import inf from collections import defaultdict We could just parse the instructions by splitting the string, but using a regular expression is a little bit more robust because it won’t match at all if given an invalid instruction. INSTRUCTION_RE = re.compile(r'(\w+) (inc|dec) (-?\d+) if (.+)\s*') def parse_instruction(instruction): match = INSTRUCTION_RE.match(instruction) return match.group(1, 2, 3, 4) Executing an instruction simply checks the condition and if it evaluates to True updates the relevant register. def exec_instruction(registers, instruction): name, op, value, cond = instruction value = int(value) if op == 'dec': value = -value if eval(cond, globals(), registers): registers[name] += value highest_value returns the maximum value found in any register. def highest_value(registers): return sorted(registers.items(), key=lambda x: x[1], reverse=True)[0][1] Finally, loop through all the instructions and carry them out, updating global_max as we go. We need to be able to deal with registers that haven’t been accessed before. Keeping the registers in a dictionary means that we can evaluate the conditions directly using eval above, passing it as the locals argument. The standard dict will raise an exception if we try to access a key that doesn’t exist, so instead we use collections.defaultdict, which allows us to specify what the default value for a non-existent key will be. New registers start at 0, so we use a simple lambda to define a function that always returns 0. global_max = -inf registers = defaultdict(lambda: 0) for i in map(parse_instruction, fi.input()): exec_instruction(registers, i) global_max = max(global_max, highest_value(registers)) print('Max value:', highest_value(registers)) print('All-time max:', global_max) Recursive Circus — Ruby — #adventofcode Day 7 Today’s challenge introduces a set of processes balancing precariously on top of each other. We find them stuck and unable to get down because one of the processes is the wrong size, unbalancing the whole circus. Our job is to figure out the root from the input and then find the correct weight for the single incorrect process. → Full code on GitHub !!! commentary So I didn’t really intend to take a full polyglot approach to Advent of Code, but it turns out to have been quite fun, so I made a shortlist of languages to try. Building a tree is a classic application for object-orientation using a class to represent tree nodes, and I’ve always liked the feel of Ruby’s class syntax, so I gave it a go. First make sure we have access to Set, which we’ll use later. require 'set' Now to define the CircusNode class, which represents nodes in the tree. attr :s automatically creates a function s that returns the value of the instance attribute @s class CircusNode attr :name, :weight def initialize(name, weight, children=nil) @name = name @weight = weight @children = children || [] end Add a << operator (the same syntax for adding items to a list) that adds a child to this node. def <<(c) @children << c @total_weight = nil end total_weight recursively calculates the weight of this node and everything above it. The @total_weight ||= blah idiom caches the value so we only calculate it once. 
def total_weight @total_weight ||= @weight + @children.map {|c| c.total_weight}.sum end balance_weight does the hard work of figuring out the proper weight for the incorrect node by recursively searching through the tree. def balance_weight(target=nil) by_weight = Hash.new{|h, k| h[k] = []} @children.each{|c| by_weight[c.total_weight] << c} if by_weight.size == 1 then if target return @weight - (total_weight - target) else raise ArgumentError, 'This tree seems balanced!' end else odd_one_out = by_weight.select {|k, v| v.length == 1}.first[1][0] child_target = by_weight.select {|k, v| v.length > 1}.first[0] return odd_one_out.balance_weight child_target end end A couple of utility functions for displaying trees finish off the class. def to_s "#{@name} (#{@weight})" end def print_tree(n=0) puts "#{' '*n}#{self} -> #{self.total_weight}" @children.each do |child| child.print_tree n+1 end end end build_circus takes input as a list of lists [name, weight, children]. We make two passes over this list, first creating all the nodes, then building the tree by adding children to parents. def build_circus(data) all_nodes = {} all_children = Set.new data.each do |name, weight, children| all_nodes[name] = CircusNode.new name, weight end data.each do |name, weight, children| children.each {|child| all_nodes[name] << all_nodes[child]} all_children.merge children end root_name = (all_nodes.keys.to_set - all_children).first return all_nodes[root_name] end Finally, build the tree and solve the problem! Note that we use String.to_sym to convert the node names to symbols (written in Ruby as :symbol), because they’re faster to work with in Hashes and Sets as we do above. data = readlines.map do |line| match = /(?<parent>\w+) \((?<weight>\d+)\)(?: -> (?<children>.*))?/.match line [match['parent'].to_sym, match['weight'].to_i, match['children'] ? match['children'].split(', ').map {|x| x.to_sym} : []] end root = build_circus data puts "Root node: #{root}" puts root.balance_weight Memory Reallocation — Python — #adventofcode Day 6 Today’s challenge asks us to follow a recipe for redistributing objects in memory that bears a striking resemblance to the rules of the African game Mancala. → Full code on GitHub !!! commentary When I was doing my MSci, one of our programming exercises was to write (in Haskell, IIRC) a program to play a Mancala variant called Oware, so this had a nice ring of nostalgia. Back to Python today: it's already become clear that it's by far my most fluent language, which makes sense as it's the only one I've used consistently since my schooldays. I'm a bit behind on the blog posts, so you get this one without any explanation, for now at least! import math def reallocate(mem): max_val = -math.inf size = len(mem) for i, x in enumerate(mem): if x > max_val: max_val = x max_index = i i = max_index mem[i] = 0 remaining = max_val while remaining > 0: i = (i + 1) % size mem[i] += 1 remaining -= 1 return mem def detect_cycle(mem): mem = list(mem) steps = 0 prev_states = {} while tuple(mem) not in prev_states: prev_states[tuple(mem)] = steps steps += 1 mem = reallocate(mem) return (steps, steps - prev_states[tuple(mem)]) initial_state = map(int, input().split()) print("Initial state is ", initial_state) steps, cycle = detect_cycle(initial_state) print("Steps to cycle: ", steps) print("Steps in cycle: ", cycle) A Maze of Twisty Trampolines — C++ — #adventofcode Day 5 Today’s challenge has us attempting to help the CPU escape from a maze of instructions. 
It’s not quite a Turing Machine, but it has that feeling of moving a read/write head up and down a tape acting on and changing the data found there. → Full code on GitHub !!! commentary I haven’t written anything in C++ for over a decade. It sounds like there have been lots of interesting developments in the language since then, with C++11, C++14 and the freshly finalised C++17 standards (built-in parallelism in the STL!). I won’t use any of those, but I thought I’d dust off my C++ and see what happened. Thankfully the Standard Template Library classes still did what I expected! As usual, we first include the parts of the standard library we’re going to use: iostream for input & output; vector for the container. We also declare that we’re using the std namespace, so that we don’t have to prepend vector and the other classes with std::. #include <iostream> #include <vector> using namespace std; steps_to_escape_part1 implements part 1 of the challenge: we read a location, move forward/backward by the number of steps given in that location, then add one to the location before repeating. The result is the number of steps we take before jumping outside the list. int steps_to_escape_part1(vector<int>& instructions) { int pos = 0, iterations = 0, new_pos; while (pos < instructions.size()) { new_pos = pos + instructions[pos]; instructions[pos]++; pos = new_pos; iterations++; } return iterations; } steps_to_escape_part2 solves part 2, which is very similar, except that an offset greater than 3 is decremented instead of incremented before moving on. int steps_to_escape_part2(vector<int>& instructions) { int pos = 0, iterations = 0, new_pos, offset; while (pos < instructions.size()) { offset = instructions[pos]; new_pos = pos + offset; instructions[pos] += offset >=3 ? -1 : 1; pos = new_pos; iterations++; } return iterations; } Finally we pull it all together and link it up to the input. int main() { vector<int> instructions1, instructions2; int n; The cin class lets us read data from standard input, which we then add to a vector of ints to give our list of instructions. while (true) { cin >> n; if (cin.eof()) break; instructions1.push_back(n); } Solving the problem modifies the input, so we need to take a copy to solve part 2 as well. Thankfully the STL makes this easy with iterators. instructions2.insert(instructions2.begin(), instructions1.begin(), instructions1.end()); Finally, compute the result and print it on standard output. cout << steps_to_escape_part1(instructions1) << endl; cout << steps_to_escape_part2(instructions2) << endl; return 0; } High Entropy Passphrases — Python — #adventofcode Day 4 Today’s challenge describes some simple rules supposedly intended to enforce the use of secure passwords. All we have to do is test a list of passphrase and identify which ones meet the rules. → Full code on GitHub !!! commentary Fearing that today might be as time-consuming as yesterday, I returned to Python and it’s hugely powerful “batteries-included” standard library. Thankfully this challenge was more straightforward, and I actually finished this before finishing day 3. First, let’s import two useful utilities. from fileinput import input from collections import Counter Part 1 requires simply that a passphrase contains no repeated words. No problem: we split the passphrase into words and count them, and check if any was present more than once. Counter is an amazingly useful class to have in a language’s standard library. 
All it does is count things: you add objects to it, and then it will tell you how many of a given object you have. We’re going to use it to count those potentially duplicated words. def is_valid(passphrase): counter = Counter(passphrase.split()) return counter.most_common(1)[0][1] == 1 Part 2 requires that no word in the passphrase be an anagram of any other word. Since we don’t need to do anything else with the words afterwards, we can check for anagrams by sorting the letters in each word: “leaf” and “flea” both become “aefl” and can be compared directly. Then we count as before. def is_valid_ana(passphrase): counter = Counter(''.join(sorted(word)) for word in passphrase.split()) return counter.most_common(1)[0][1] == 1 Finally we pull everything together. sum(map(boolean_func, list)) is a common idiom in Python for counting the number of times a condition (checked by boolean_func) is true. In Python, True and False can be treated as the numbers 1 and 0 respectively, so that summing a list of Boolean values gives you the number of True values in the list. lines = list(input()) print(sum(map(is_valid, lines))) print(sum(map(is_valid_ana, lines))) Spiral Memory — Go — #adventofcode Day 3 Today’s challenge requires us to perform some calculations on an “experimental memory layout”, with cells moving outwards from the centre of a square spiral (squiral?). → Full code on GitHub !!! commentary I’ve been wanting to try my hand at Go, the memory-safe, statically typed compiled language from Google for a while. Today’s challenge seemed a bit more mathematical in nature, meaning that I wouldn’t need too many advanced language features or knowledge of a standard library, so I thought I’d give it a “go”. It might have been my imagination, but it was impressive how quickly the compiled program chomped through 60 different input values while I was debugging. I actually spent far too long on this problem because my brain led me down a blind alley trying to do the wrong calculation, but I got there in the end! The solution is a bit difficult to explain without diagrams, which I don't really have time to draw right now, but fear not because several other people have. First take a look at [the challenge itself which explains the spiral memory concept](http://adventofcode.com/2017/day/3). Then look at the [nice diagrams that Phil Tooley made with Python](http://acceleratedscience.co.uk/blog/adventofcode-day-3-spiral-memory/) and hopefully you'll be able to see what's going on! It's interesting to note that this challenge also admits of an algorithmic solution instead of the mathematical one: you can model the memory as an infinite grid using a suitable data structure and literally move around it in a spiral. In hindsight this is a much better way of solving the challenge quickly because it's easier and less error-prone to code. I'm quite pleased with my maths-ing though, and it's much quicker than the algorithmic version! First some Go boilerplate: we have to define the package we’re in (main, because it’s an executable we’re producing) and import the libraries we’ll use. package main import ( "fmt" "math" "os" ) Weirdly, Go doesn’t seem to have these basic mathematics functions for integers in its standard library (please someone correct me if I’m wrong!) so I’ll define them instead of mucking about with data types. Go doesn’t do any implicit type conversion, even between numeric types, and the math builtin package only operates on float64 values. 
func abs(n int) int { if n < 0 { return -n } return n } func min(x, y int) int { if x < y { return x } return y } func max(x, y int) int { if x > y { return x } return y } This does the heavy lifting for part one: converting from a position on the spiral to a column and row in the grid. (0, 0) is the centre of the spiral. This actually does a bit more than is necessary to calculate the distance as required for part 1, but we’ll use it again for part 2. func spiral_to_xy(n int) (int, int) { if n == 1 { return 0, 0 } r := int(math.Floor((math.Sqrt(float64(n-1)) + 1) / 2)) n_r := n - (2*r-1)*(2*r-1) o := ((n_r - 1) % (2 * r)) - r + 1 sector := (n_r - 1) / (2 * r) switch sector { case 0: return r, o case 1: return -o, r case 2: return -r, -o case 3: return o, -r } return 0, 0 } Now use spiral_to_xy to calculate the Manhattan distance that the value at location n in the spiral memory must be carried to reach the “access port” at 0. func distance(n int) int { x, y := spiral_to_xy(n) return abs(x) + abs(y) } This function does the opposite of spiral_to_xy, translating a grid position back to its position on the spiral. This is the one that took me far too long to figure out because I had a brain bug and tried to calculate the value s (which sector or quarter of the spiral we’re looking at) in a way that was never going to work! Fortunately I came to my senses. func xy_to_spiral(x, y int) int { if x == 0 && y == 0 { return 1 } r := max(abs(x), abs(y)) var s, o, n int if x+y > 0 && x-y >= 0 { s = 0 } else if x-y < 0 && x+y >= 0 { s = 1 } else if x+y < 0 && x-y <= 0 { s = 2 } else { s = 3 } switch s { case 0: o = y case 1: o = -x case 2: o = -y case 3: o = x } n = o + r*(2*s+1) + (2*r-1)*(2*r-1) return n } This is a utility function that uses xy_to_spiral to fetch the value at a given (x, y) location, and returns zero if we haven’t filled that location yet. func get_spiral(mem []int, x, y int) int { n := xy_to_spiral(x, y) - 1 if n < len(mem) { return mem[n] } return 0 } Finally we solve part 2 of the problem, which involves going round the spiral writing values into it that are the sum of some values already written. The result is the first of these sums that is greater than or equal to the given input value. func stress_test(input int) int { mem := make([]int, 1) n := 0 mem[0] = 1 for mem[n] < input { n++ x, y := spiral_to_xy(n + 1) mem = append(mem, get_spiral(mem, x+1, y)+ get_spiral(mem, x+1, y+1)+ get_spiral(mem, x, y+1)+ get_spiral(mem, x-1, y+1)+ get_spiral(mem, x-1, y)+ get_spiral(mem, x-1, y-1)+ get_spiral(mem, x, y-1)+ get_spiral(mem, x+1, y-1)) } return mem[n] } Now the last part of the program puts it all together, reading the input value from a command-line argument and printing the results of the two parts of the challenge: func main() { var n int fmt.Sscanf(os.Args[1], "%d", &n) fmt.Printf("Input is %d\n", n) fmt.Printf("Distance is %d\n", distance(n)) fmt.Printf("Stress test result is %d\n", stress_test(n)) } Corruption Checksum — Python — #adventofcode Day 2 Today’s challenge is to calculate a rather contrived “checksum” over a grid of numbers. → Full code on GitHub !!! commentary Today I went back to plain Python, and I didn’t do formal tests because only one test case was given for each part of the problem. I just got stuck in. I did write part 2 out as nested `for` loops as an intermediate step to working out the generator expression. I think that expanded version may have been more readable.
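Reconstructed from memory, so treat it as a sketch rather than the exact code I ran, that intermediate version looked something like this (with `sheet` being the parsed list of rows that appears below):

```python
# Rough reconstruction of the intermediate nested-loop version of part 2.
# `sheet` is assumed to be the list of rows of ints parsed as shown below.
def rowsum_div_loops(row):
    row = sorted(row)
    total = 0
    for i, x in enumerate(row):
        for y in row[i+1:]:   # only compare against the larger elements
            if y % x == 0:
                total += y // x
    return total

print(sum(rowsum_div_loops(row) for row in sheet))
```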
Having got that far, I couldn't then work out how to finally eliminate the need for an auxiliary function entirely without either sorting the same elements multiple times or sorting each row as it's read. First we read in the input, split it and convert it to numbers. fileinput.input() returns an iterator over the lines in all the files passed as command-line arguments, or over standard input if no files are given. from fileinput import input sheet = [[int(x) for x in l.split()] for l in input()] Part 1 of the challenge calls for finding the difference between the largest and smallest number in each row, and then summing those differences: print(sum(max(x) - min(x) for x in sheet)) Part 2 is a bit more involved: for each row we have to find the unique pair of elements that divide into each other without remainder, then sum the result of those divisions. We can make it a little easier by sorting each row; then we can take each number in turn and compare it only with the numbers after it (which are guaranteed to be larger). Doing this ensures we only make each comparison once. def rowsum_div(row): row = sorted(row) return sum(y // x for i, x in enumerate(row) for y in row[i+1:] if y % x == 0) print(sum(map(rowsum_div, sheet))) We can make this code shorter (if not easier to read) by sorting each row as it’s read: sheet = [sorted(int(x) for x in l.split()) for l in input()] Then we can just use the first and last elements in each row for part 1, as we know those are the smallest and largest respectively in the sorted row: print(sum(x[-1] - x[0] for x in sheet)) Part 2 then becomes a sum over a single generator expression: print(sum(y // x for row in sheet for i, x in enumerate(row) for y in row[i+1:] if y % x == 0)) Very satisfying! Inverse Captcha — Coconut — #adventofcode Day 1 Well, December’s here at last, and with it Day 1 of Advent of Code. … It goes on to explain that you may only leave by solving a captcha to prove you’re not a human. Apparently, you only get one millisecond to solve the captcha: too fast for a normal human, but it feels like hours to you. … As well as posting solutions here when I can, I’ll be putting them all on https://github.com/jezcope/aoc2017 too. !!! commentary After doing some challenges from last year in Haskell for a warm up, I felt inspired to try out the functional-ish Python dialect, Coconut. Now that I’ve done it, it feels a bit of an odd language, neither fish nor fowl. It’ll look familiar to any Pythonista, but is loaded with features normally associated with functional languages, like pattern matching, destructuring assignment, partial application and function composition. That makes it quite fun to work with, as it works similarly to Haskell, but because it's restricted by the basic rules of Python syntax everything feels a bit more like hard work than it should. The accumulator approach feels clunky, but it's necessary to allow [tail call elimination](https://en.wikipedia.org/wiki/Tail_call), which Coconut will do and I wanted to see in action. Lo and behold, if you take a look at the [compiled Python version](https://github.com/jezcope/aoc2017/blob/86c8100824bda1b35e5db6e02d4b80890be7a022/01-inverse-captcha.py#L675) you'll see that my recursive implementation has been turned into a non-recursive `while` loop. Then again, maybe I'm just jealous of Phil Tooley's [one-liner solution in Python](https://github.com/ptooley/aocGolf/blob/1380d78194f1258748ccfc18880cfd575baf5d37/2017.py#L8). 
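For comparison, and before the Coconut version itself, here’s roughly what the same two functions look like in ordinary Python; this is a quick sketch rather than anything I actually submitted.

```python
# Plain-Python sketch of the two captcha variants, for comparison with the
# Coconut solution that follows.
def inverse_captcha(s):
    # Sum digits that match the next digit, wrapping around to the start.
    return sum(int(a) for a, b in zip(s, s[1:] + s[0]) if a == b)

def inverse_captcha_halfway(s):
    # Sum digits that match the digit halfway around the (even-length) string.
    half = len(s) // 2
    return sum(int(a) for a, b in zip(s, s[half:] + s[:half]) if a == b)

assert inverse_captcha("91212129") == 9
assert inverse_captcha_halfway("12131415") == 4
```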
import sys def inverse_captcha_(s, acc=0): case reiterable(s): match (|d, d|) :: rest: return inverse_captcha_((|d|) :: rest, acc + int(d)) match (|d0, d1|) :: rest: return inverse_captcha_((|d1|) :: rest, acc) return acc def inverse_captcha(s) = inverse_captcha_(s :: s[0]) def inverse_captcha_1_(s0, s1, acc=0): case (reiterable(s0), reiterable(s1)): match ((|d0|) :: rest0, (|d0|) :: rest1): return inverse_captcha_1_(rest0, rest1, acc + int(d0)) match ((|d0|) :: rest0, (|d1|) :: rest1): return inverse_captcha_1_(rest0, rest1, acc) return acc def inverse_captcha_1(s) = inverse_captcha_1_(s, s$[len(s)//2:] :: s) def test_inverse_captcha(): assert "1111" |> inverse_captcha == 4 assert "1122" |> inverse_captcha == 3 assert "1234" |> inverse_captcha == 0 assert "91212129" |> inverse_captcha == 9 def test_inverse_captcha_1(): assert "1212" |> inverse_captcha_1 == 6 assert "1221" |> inverse_captcha_1 == 0 assert "123425" |> inverse_captcha_1 == 4 assert "123123" |> inverse_captcha_1 == 12 assert "12131415" |> inverse_captcha_1 == 4 if __name__ == "__main__": sys.argv[1] |> inverse_captcha |> print sys.argv[1] |> inverse_captcha_1 |> print Advent of Code 2017: introduction It’s a common lament of mine that I don’t get to write a lot of code in my day-to-day job. I like the feeling of making something from nothing, and I often look for excuses to write bits of code, both at work and outside it. Advent of Code is a daily series of programming challenges for the month of December, and is about to start its third annual incarnation. I discovered it too late to take part in any serious way last year, but I’m going to give it a try this year. There are no restrictions on programming language (so of course some people delight in using esoteric languages like Brainf**k), but I think I’ll probably stick with Python for the most part. That said, I miss my Haskell days and I’m intrigued by new kids on the block Go and Rust, so I might end up throwing in a few of those on some of the simpler challenges. I’d like to focus a bit more on how I solve the puzzles. They generally come in two parts, with the second part only being revealed after successful completion of the first part. With that in mind, test-driven development makes a lot of sense, because I can verify that I haven’t broken the solution to the first part in modifying to solve the second. I may also take a literate programming approach with org-mode or Jupyter notebooks to document my solutions a bit more, and of course that will make it easier to publish solutions here so I’ll do that as much as I can make time for. On that note, here are some solutions for 2016 that I’ve done recently as a warmup. 
Day 1: Python Day 1 instructions import numpy as np import pytest as t import sys TURN = { 'L': np.array([[0, 1], [-1, 0]]), 'R': np.array([[0, -1], [1, 0]]) } ORIGIN = np.array([0, 0]) NORTH = np.array([0, 1]) class Santa: def __init__(self, location, heading): self.location = np.array(location) self.heading = np.array(heading) self.visited = [(0,0)] def execute_one(self, instruction): start_loc = self.location.copy() self.heading = self.heading @ TURN[instruction[0]] self.location += self.heading * int(instruction[1:]) self.mark(start_loc, self.location) def execute_many(self, instructions): for i in instructions.split(','): self.execute_one(i.strip()) def distance_from_start(self): return sum(abs(self.location)) def mark(self, start, end): for x in range(min(start[0], end[0]), max(start[0], end[0])+1): for y in range(min(start[1], end[1]), max(start[1], end[1])+1): if any((x, y) != start): self.visited.append((x, y)) def find_first_crossing(self): for i in range(1, len(self.visited)): for j in range(i): if self.visited[i] == self.visited[j]: return self.visited[i] def distance_to_first_crossing(self): crossing = self.find_first_crossing() if crossing is not None: return abs(crossing[0]) + abs(crossing[1]) def __str__(self): return f'Santa @ {self.location}, heading {self.heading}' def test_execute_one(): s = Santa(ORIGIN, NORTH) s.execute_one('L1') assert all(s.location == np.array([-1, 0])) assert all(s.heading == np.array([-1, 0])) s.execute_one('L3') assert all(s.location == np.array([-1, -3])) assert all(s.heading == np.array([0, -1])) s.execute_one('R3') assert all(s.location == np.array([-4, -3])) assert all(s.heading == np.array([-1, 0])) s.execute_one('R100') assert all(s.location == np.array([-4, 97])) assert all(s.heading == np.array([0, 1])) def test_execute_many(): s = Santa(ORIGIN, NORTH) s.execute_many('L1, L3, R3') assert all(s.location == np.array([-4, -3])) assert all(s.heading == np.array([-1, 0])) def test_distance(): assert Santa(ORIGIN, NORTH).distance_from_start() == 0 assert Santa((10, 10), NORTH).distance_from_start() == 20 assert Santa((-17, 10), NORTH).distance_from_start() == 27 def test_turn_left(): east = NORTH @ TURN['L'] south = east @ TURN['L'] west = south @ TURN['L'] assert all(east == np.array([-1, 0])) assert all(south == np.array([0, -1])) assert all(west == np.array([1, 0])) def test_turn_right(): west = NORTH @ TURN['R'] south = west @ TURN['R'] east = south @ TURN['R'] assert all(east == np.array([-1, 0])) assert all(south == np.array([0, -1])) assert all(west == np.array([1, 0])) if __name__ == '__main__': instructions = sys.stdin.read() santa = Santa(ORIGIN, NORTH) santa.execute_many(instructions) print(santa) print('Distance from start:', santa.distance_from_start()) print('Distance to target: ', santa.distance_to_first_crossing()) Day 2: Haskell Day 2 instructions module Main where data Pos = Pos Int Int deriving (Show) -- Magrittr-style pipe operator (|>) :: a -> (a -> b) -> b x |> f = f x swapPos :: Pos -> Pos swapPos (Pos x y) = Pos y x clamp :: Int -> Int -> Int -> Int clamp lower upper x | x < lower = lower | x > upper = upper | otherwise = x clampH :: Pos -> Pos clampH (Pos x y) = Pos x' y' where y' = clamp 0 4 y r = abs (2 - y') x' = clamp r (4-r) x clampV :: Pos -> Pos clampV = swapPos . clampH . swapPos buttonForPos :: Pos -> String buttonForPos (Pos x y) = [buttons !! y !! 
x] where buttons = [" D ", " ABC ", "56789", " 234 ", " 1 "] decodeChar :: Pos -> Char -> Pos decodeChar (Pos x y) 'R' = clampH $ Pos (x+1) y decodeChar (Pos x y) 'L' = clampH $ Pos (x-1) y decodeChar (Pos x y) 'U' = clampV $ Pos x (y+1) decodeChar (Pos x y) 'D' = clampV $ Pos x (y-1) decodeLine :: Pos -> String -> Pos decodeLine p "" = p decodeLine p (c:cs) = decodeLine (decodeChar p c) cs makeCode :: String -> String makeCode instructions = lines instructions -- split into lines |> scanl decodeLine (Pos 1 1) -- decode to positions |> tail -- drop start position |> concatMap buttonForPos -- convert to buttons main = do input <- getContents putStrLn $ makeCode input Research Data Management Forum 18, Manchester !!! intro "" Monday 20 and Tuesday 21 November 2017 I’m at the Research Data Management Forum in Manchester. I thought I’d use this as an opportunity to try liveblogging, so during the event some notes should appear in the box below (you may have to manually refresh your browser tab periodically to get the latest version). I've not done this before, so if the blog stops updating then it's probably because I've stopped updating it to focus on the conference instead! This was made possible using GitHub's cool [Gist](https://gist.github.com) tool. Draft content policy I thought it was about time I had some sort of content policy on here so this is a first draft. It will eventually wind up as a separate page. Feedback welcome! !!! aside “Content policy” This blog’s primary purpose is as a reflective learning tool for my own development; my aim in writing any given post is mainly to expose and develop my own thinking on a topic. My reasons for making a public blog rather than a private journal are: 1. If I'm lucky, someone smarter than me will provide feedback that will help me and my readers to learn more 2. If I'm extra lucky, someone else might learn from the material as well Each post, therefore, represents the state of my thinking at the time I wrote it, or perhaps a deliberate provocation or exaggeration; either way, if you don't know me personally please don't judge me based entirely on my past words. This is a request though, not an attempt to excuse bad behaviour on my part. I accept full responsibility for any consequences of my words, whether intended or not. I will not remove comments or ban individuals for disagreeing with me, only for behaving offensively or disrespectfully. I will do my best to be fair and balanced and explain decisions that I take, but I reserve the right to take those decisions without making any explanation at all if it seems likely to further inflame a situation. If I end up responding to anything simply with a link to this policy, that's probably all the explanation you're going to get. It should go without saying, but the opinions presented in this blog are my own and not those of my employer or anyone else I might at times represent. Learning to live with anxiety !!! intro "" This is a post that I’ve been writing for months, and writing in my head for years. For some it will explain aspects of my personality that you might have wondered about. For some it will just be another person banging on self-indulgently about so-called “mental health issues”. Hopefully, for some it will demystify some stuff and show that you’re not alone and things do get better. For as long as I can remember I’ve been a worrier. I’ve also suffered from bouts of what I now recognise as depression, on and off since my school days. 
It’s only relatively recently that I’ve come to the realisation that these two might be connected and that my ‘worrying’ might in fact be outside the normal range of healthy human behaviour and might more accurately be described as chronic anxiety. You probably won’t have noticed it, but it’s been there. More recently I’ve begun feeling like I’m getting on top of it and feeling “normal” for the first time in my life. Things I’ve found that help include: getting out of the house more and socialising with friends; and getting a range of exercise, outdoors and away from the city (rock climbing is mentally and physically engaging and open water swimming is indescribably joyful). But mostly it’s the cognitive behavioural therapy (CBT) and the antidepressants. Before I go any further, a word about drugs (“don’t do drugs, kids”): I’m on the lowest available dose of a common antidepressant. This isn’t because it stops me being sad all the time (I’m not) or because it makes all my problems go away (it really doesn’t). It’s because the scientific evidence points to a combination of CBT and antidepressants as being the single most effective treatment for generalised anxiety disorder. The reason for this is simple: CBT isn’t easy, because it asks you to challenge habits and beliefs you’ve held your whole life. In the short term there is going to be more anxiety and some antidepressants are also effective at blunting the effect of this additional anxiety. In short, CBT is what makes you better, and the drugs just make it a little bit more effective. A lot of people have misconceptions about what it means to be ‘in therapy’. I suspect a lot of these are derived from the psychoanalysis we often see portrayed in (primarily US) film and TV. The problem with that type of navel-gazing therapy is that you can spend years doing it, finally reach some sort of breakthrough insight, and still have no idea what the supposed insight means for your actual life. CBT is different in that rather than addressing feelings directly it focuses on habits in your thoughts (cognitive) and actions (behavioural) with feeling better as an outcome (therapy). CBT and related forms of therapy now have decades of clinical evidence showing that they really work. CBT uses a wide range of techniques to identify, challenge and reduce various common unhelpful thoughts and behaviours. By choosing and practicing these, you can break bad mental habits that you’ve been carrying around, often for decades. For me this means giving fair weight to my successes as well as my failings, allowing flexibility into the rigid rules that I have always, subconsciously, lived by, and being a bit kinder to myself when I make mistakes. It’s not been easy and I have to remind myself to practice this every day, but it’s really helped. !!! aside “More info” If you live in the UK, you might not be aware that you can get CBT and other psychological therapies on the NHS through a scheme called IAPT (improving access to psychological therapies). You can self-refer so you don’t need to see a doctor first, but you might want to anyway if you think medication might help. They also have a progression of treatments, so you might be offered a course of “guided self-help” and then progressed to CBT or another talking therapy if need be. This is what happened to me, and it did help a bit but it was CBT that helped me the most. Becoming a librarian What is a librarian? Is it someone who has a masters degree in librarianship and information science?
Is it someone who looks after information for other people? Is it simply someone who works in a library? I’ve been grappling with this question a lot lately because I’ve worked in academic libraries for about 3 years now and I never really thought that’s something that might happen. People keep referring to me as “a librarian” but there’s some imposter feelings here because all the librarians around me have much more experience, have skills in areas like cataloguing and collection management and, generally, have a librarian masters degree. So I’ve been thinking about what it actually means to me to be a librarian or not. NB. some of these may be tongue-in-cheek Ways in which I am a librarian: I work in a library I help people to access and organise information I have a cat I like gin Ways in which I am not a librarian: I don’t have a librarianship qualification I don’t work with books 😉 I don’t knit (though I can probably remember how if pressed) I don’t shush people or wear my hair in a bun (I can confirm that this is also true of every librarian I know) Ways in which I am a shambrarian: I like beer I have more IT experience and qualification than librarianship At the end of the day, I still don’t know how I feel about this or, for that matter, how important it is. I’m probably going to accept whatever title people around me choose to bestow, though any label will chafe at times! Lean Libraries: applying agile practices to library services Kanban board Jeff Lasovski (via Wikimedia Commons) I’ve been working with our IT services at work quite closely for the last year as product owner for our new research data portal, ORDA. That’s been a fascinating process for me as I’ve been able to see first-hand some of the agile techniques that I’ve been reading about from time-to-time on the web over the last few years. They’re in the process of adopting a specific set of practices going under the name “Scrum”, which is fun because it uses some novel terminology that sounds pretty weird to non-IT folks, like “scrum master”, “sprint” and “product backlog”. On my small project we’ve had great success with the short cycle times and been able to build trust with our stakeholders by showing concrete progress on a regular basis. Modern librarianship is increasingly fluid, particularly in research services, and I think that to handle that fluidity it’s absolutely vital that we are able to work in a more agile way. I’m excited about the possibilities of some of these ideas. However, Scrum as implemented by our IT services doesn’t seem something that transfers directly to the work that we do: it’s too specialised for software development to adapt directly. What I intend to try is to steal some of the individual practices on an experimental basis and simply see what works and what doesn’t. The Lean concepts currently popular in IT were originally developed in manufacturing: if they can be translated from the production of physical goods to IT, I don’t see why we can’t make the ostensibly smaller step of translating them to a different type of knowledge work. I’ve therefore started reading around this subject to try and get as many ideas as possible. I’m generally pretty rubbish at taking notes from books, so I’m going to try and record and reflect on any insights I make on this blog. The framework for trying some of these out is clearly a Plan-Do-Check-Act continuous improvement cycle, so I’ll aim to reflect on that process too. 
I’m sure there will have been people implementing Lean in libraries already, so I’m hoping to be able to discover and learn from them instead of starting from scratch. Wish me luck! Mozilla Global Sprint 2017 Photo by Lena Bell on Unsplash Every year, the Mozilla Foundation runs a two-day Global Sprint, giving people around the world 50 hours to work on projects supporting and promoting open culture and tech. Though much of the work during the sprint is, of course, technical software development work, there are always tasks suited to a wide range of different skill sets and experience levels. The participants include writers, designers, teachers, information professionals and many others. This year, for the first time, the University of Sheffield hosted a site, providing a space for local researchers, developers and others to get out of their offices, work on #mozsprint and link up with others around the world. The Sheffield site was organised by the Research Software Engineering group in collaboration with the University Library. Our site was only small compared to others, but we still had people working on several different projects. My reason for taking part in the sprint was to contribute to the international effort on the Library Carpentry project. A team spread across four continents worked throughout the whole sprint to review and develop our lesson material. As there were no other Library Carpentry volunteers at the Sheffield site, I chose to do some urgent work around improving the presentation of our workshops and lessons on the web and related workflows. It was a really nice subproject to work on, requiring not only cleaning up and normalising the metadata we hold on workshops and lessons, but also digesting and formalising our current ad hoc process of lesson development. The largest group were solar physicists from the School of Maths and Statistics, working on the SunPy project, an open source environment for solar data analysis. They pushed loads of bug fixes and documentation improvements, and also mentored a new contributor through their first additions to the project. Anna Krystalli from Research Software Engineering worked on the EchoBurst project, which is building a web browser extension to help people break out of their online echo chambers. It does this by using natural language processing techniques to highlight well-written, logically sound articles that disagree with the reader’s stated views on particular topics of interest. Anna was part of an effort to begin extending this technology to online videos. We had a couple of individuals simply taking the opportunity to break out of their normal work environments to work or learn, including a couple of members of library staff who showed up for a couple of hours to learn how to use git on a new project! IDCC 2017 reflection For most of the last few years I've been lucky enough to attend the International Digital Curation Conference (IDCC). One of the main audiences attending is people who, like me, work on research data management at universities around the world and it's begun to feel like a sort of "home" conference to me. This year, IDCC was held at the Royal College of Surgeons in the beautiful city of Edinburgh.
For the last couple of years, my overall impression has been that, as a community, we're moving away from the "first-order" problem of trying to convince people (from PhD students to senior academics) to take RDM seriously and into a rich set of "second-order" problems around how to do things better and widen support to more people. This year has been no exception. Here are a few of my observations and takeaway points. Everyone has a repository now Only last year, the most common question you'd get asked by strangers in the coffee break would be "Do you have a data repository?" Now the question is more likely to be "What are you using for your data repository?", along with more subtle questions about specific components of systems and how they interact. Integrating active storage and archival systems Now that more institutions have data worth preserving, there is more interest in (and in many cases experience of) setting up more seamless integrations between active and archival storage. There are lessons here we can learn. Freezing in amber vs actively maintaining assets There seemed to be an interesting debate going on throughout the conference around the aim of preservation: should we be faithfully preserving the bits and bytes provided without trying to interpret them, or should we take a more active approach by, for example, migrating obsolete formats to newer alternatives? If the former, should we attempt to preserve the software required to access the data as well? If the latter, how much effort do we invest and how do we ensure nothing is lost or altered in the migration? Demonstrating Data Science instead of debating what it is The phrase "Data Science" was once again one of the most commonly uttered phrases of the conference. However, there is now less abstract discussion about what, exactly, is meant by this "data science" thing; this has largely been replaced by concrete demonstrations. This change was exemplified perfectly by the keynote by data scientist Alice Daish, who spent a riveting 40 minutes or so enthusing about all the cool stuff she does with data at the British Museum. Recognition of software as an issue Even as recently as last year, I struggled to drum up much interest in discussing software sustainability and preservation at events like this; the interest was there, but there were higher priorities. So I was completely taken by surprise when we ended up with 30+ people in the Software Preservation Birds of a Feather (BoF) session, and when very little input was needed from me as chair to keep a productive discussion going for a full 90 minutes. Unashamed promotion of openness As a community we seem to have nearly overthrown our collective embarrassment about the phrase "open data" (although maybe this is just me). We've always known it was a good thing, but I know I've been a bit of an apologist in the past, feeling that I had to "soften the blow" when asking researchers to be more open. Now I feel more confident in leading with the benefits of openness, and it felt like that's a change reflected in the community more widely. Becoming more involved in the conference This year, I took a decision to try and do more to contribute to the conference itself, and I felt like this was pretty successful both in making that contribution and building up my own profile a bit. I presented a paper on one of my current passions, Library Carpentry; it felt really good to be able to share my enthusiasm.
I presented a poster on our work integrating our data repository and digital preservation platform; this gave me more of a structure for networking during breaks, as I was able to stand by the poster and start discussions with anyone who seemed interested. I chaired a parallel session: a first for me, and a different challenge from presenting or simply attending the talks. And finally, I proposed and chaired the Software Preservation BoF session (blog post forthcoming). Renewed excitement It's weird, and possibly all in my imagination, but there seemed to be more energy at this conference than at the previous couple I've been to. More people seemed to be excited about the work we're all doing, recent achievements and the possibilities for the future. Introducing PyRefine: OpenRefine meets Python I’m knocking the rust off my programming skills by attempting to write a pure-Python interpreter for OpenRefine “scripts”. OpenRefine is a great tool for exploring and cleaning datasets prior to analysing them. It also records an undo history of all actions that you can export as a sort of script in JSON format. One thing that bugs me though is that, having spent some time interactively cleaning up your dataset, you then need to fire up OpenRefine again and do some interactive mouse-clicky stuff to apply that cleaning routine to another dataset. You can at least re-import the JSON undo history to make that as quick as possible, but there’s no getting around the fact that there’s no quick way to do it from a cold start. There is a project, BatchRefine, that extends the OpenRefine server to accept batch requests over an HTTP API, but that isn’t useful when you can’t or don’t want to keep a full Java stack running in the background the whole time. My concept is this: you use OpenRefine to explore the data interactively and design a cleaning process, but then export the process to JSON and integrate it into your analysis in Python. That way it can be repeated ad nauseam without having to fire up a full Java stack. I’m taking some inspiration from the great talk “So you want to be a wizard?” by Julia Evans (@b0rk), who recommends trying experiments as a way to learn. She gives these Rules of Programming Experiments: “it doesn’t have to be good; it doesn’t have to work; you have to learn something” In that spirit, my main priorities are: to see if this can be done; to see how far I can get implementing it; and to learn something. If it also turns out to be a useful thing, well, that’s a bonus. Some of the interesting possible challenges here: Implement all core operations; there are quite a lot of these, some of which will be fun (i.e. non-trivial) to implement Implement (a subset of?) GREL, the General Refine Expression Language; I guess my undergrad course on implementing parsers and compilers will come in handy after all! Generate clean, sane Python code from the JSON rather than merely executing it; more than anything, this would be a nice educational tool for users of OpenRefine who want to see how to do equivalent things in Python Selectively optimise key parts of the process; this will involve profiling the code to identify bottlenecks as well as tweaking the actual code to go faster Potentially handle contributions to the code from other people; I’d be really happy if this happened but I’m realistic… If you’re interested, the project is called PyRefine and it’s on github. Constructive criticism, issues & pull requests all welcome!
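To make the concept a bit more concrete, here is a minimal sketch of the sort of thing PyRefine aims to do: load an exported OpenRefine JSON history and replay a couple of operations against a pandas DataFrame. This is an illustration rather than the actual PyRefine code, and the operation and field names ("core/column-rename" and so on) are from memory, so check them against a real OpenRefine export before relying on them.

import json
import pandas as pd

def apply_history(df: pd.DataFrame, history_path: str) -> pd.DataFrame:
    """Replay a (simplified) OpenRefine undo history against a DataFrame."""
    with open(history_path) as f:
        operations = json.load(f)  # assumed to be a list of operation dicts

    for op in operations:
        kind = op.get("op")
        if kind == "core/column-rename":
            # field names assumed; verify against a real export
            df = df.rename(columns={op["oldColumnName"]: op["newColumnName"]})
        elif kind == "core/column-removal":
            df = df.drop(columns=[op["columnName"]])
        else:
            raise NotImplementedError(f"Operation not implemented yet: {kind}")
    return df

# Usage: apply a recipe designed interactively in OpenRefine to a fresh dataset
# cleaned = apply_history(pd.read_csv("new-data.csv"), "cleaning-history.json")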
Implementing Yesterbox in emacs with mu4e I’ve been meaning to give Yesterbox a try for a while. The general idea is that each day you only deal with email that arrived yesterday or earlier. This forms your inbox for the day, hence “yesterbox”. Once you’ve emptied your yesterbox, or at least got through some minimum number (10 is recommended), then you can look at emails from today. Even then you only really want to be dealing with things that are absolutely urgent. Anything else can wait til tomorrow. The motivation for doing this is to get away from the feeling that we are King Canute, trying to hold back the tide. I find that when I’m processing my inbox toward zero there’s always a temptation to keep skipping to the new stuff that’s just come in. Hiding away the new email until I’ve dealt with the old is a very interesting idea. I use mu4e in emacs for reading my email, and handily the mu search syntax is very flexible so you’d think it would be easy to create a yesterbox filter: maildir:"/INBOX" date:..1d Unfortunately, 1d is interpreted as “24 hours ago from right now” so this filter misses everything that was sent yesterday but less than 24 hours ago. There was a feature request raised on the mu github repository to implement an additional date filter syntax but it seems to have died a death for now. In the meantime, the answer to this is to remember that my workplace observes fairly standard office hours, so that anything sent more than 9 hours ago is unlikely to have been sent today. The following does the trick: maildir:"/INBOX" date:..9h In my mu4e bookmarks list, that looks like this:

(setq mu4e-bookmarks
      '(("flag:unread AND NOT flag:trashed" "Unread messages" ?u)
        ("flag:flagged maildir:/archive" "Starred messages" ?s)
        ("date:today..now" "Today's messages" ?t)
        ("date:7d..now" "Last 7 days" ?w)
        ("maildir:\"/Mailing lists.*\" (flag:unread OR flag:flagged)" "Unread in mailing lists" ?M)
        ("maildir:\"/INBOX\" date:..9h" "Yesterbox" ?y))) ;; <- this is the new one

Rewarding good practice in research From opensource.com on Flickr Whenever I’m involved in a discussion about how to encourage researchers to adopt new practices, eventually someone will come out with some variant of the following phrase: “That’s all very well, but researchers will never do XYZ until it’s made a criterion in hiring and promotion decisions.” With all the discussion of carrots and sticks I can see where this attitude comes from, and strongly empathise with it, but it raises two main problems: It’s unfair and more than a little insulting to anyone to be lumped into one homogeneous group; and Taking all the different possible XYZs into account, that’s an awful lot of hoops to expect anyone to jump through. Firstly, “researchers” are as diverse as the rest of us in terms of what gets them out of bed in the morning. Some of us want prestige; some want to contribute to a greater good; some want to create new things; some just enjoy the work. One thing I’d argue we all have in common is this: nothing is more offputting than feeling like you’re being strongarmed into something you don’t want to do. If we rely on simplistic metrics, people will focus on those and miss the point. At best people will disengage and at worst they will actively game the system. I’ve got to do these ten things to get my next payrise, and still retain my sanity? Ok, what’s the least I can get away with and still tick them off? You see it with students taking poorly-designed assessments and grown-ups are no different.
We do need to wield carrots as well as sticks, but the whole point is that these practices are beneficial in and of themselves. The carrots are already there if we articulate them properly and clear the roadblocks (don’t you enjoy mixed metaphors?). Creating artificial benefits will just dilute the value of the real ones. Secondly, I’ve heard a similar argument made for all of the following practices and more: Research data management Open Access publishing Public engagement New media (e.g. blogging) Software management and sharing Some researchers devote every waking hour to their work, whether it’s in the lab, writing grant applications, attending conferences, authoring papers, teaching, and so on and so on. It’s hard to see how someone with all this in their schedule can find time to exercise any of these new skills, let alone learn them in the first place. And what about the people who sensibly restrict the hours taken by work to spend more time doing things they enjoy? Yes, all of the above practices are valuable, both for the individual and the community, but they’re all new (to most) and hence require more effort up front to learn. We have to accept that it’s inevitably going to take time for all of them to become “business as usual”. I think if the hiring/promotion/tenure process has any role in this, it’s in asking whether the researcher can build a coherent narrative as to why they’ve chosen to focus their efforts in this area or that. You’re not on Twitter but your data is being used by 200 research groups across the world? Great! You didn’t have time to tidy up your source code for github but your work is directly impacting government policy? Brilliant! We still need to convince more people to do more of these beneficial things, so how? Call me naïve, but maybe we should stick to making rational arguments, calming fears and providing low-risk opportunities to learn new skills. Acting (compassionately) like a stuck record can help. And maybe we’ll need to scale back our expectations in other areas (journal impact factors, anyone?) to make space for the new stuff. Software Carpentry: SC Test; does your software do what you meant? “The single most important rule of testing is to do it.” — Brian Kernighan and Rob Pike, The Practice of Programming (quote taken from SC Test page) One of the trickiest aspects of developing software is making sure that it actually does what it’s supposed to. Sometimes failures are obvious: you get completely unreasonable output or even (shock!) a comprehensible error message. But failures are often more subtle. Would you notice if your result was out by a few percent, or consistently ignored the first row of your input data? The solution to this is testing: take some simple example input with a known output, run the code and compare the actual output with the expected one. Implement a new feature, test and repeat. Sounds easy, doesn’t it? But then you implement a new bit of code. You test it and everything seems to work fine, except that your new feature required changes to existing code and those changes broke something else. So in fact you need to test everything, and do it every time you make a change. Further than that, you probably want to test that all your separate bits of code work together properly (integration testing) as well as testing the individual bits separately (unit testing). In fact, splitting your tests up like that is a good way of holding on to your sanity.
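As a tiny, concrete illustration of that cycle, here is the sort of thing a unit test looks like using Python's pytest framework (the function and the example values are made up for the purpose of the sketch):

def mean(values):
    """The code under test: the arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

def test_mean_of_simple_list():
    # simple example input with a known expected output
    assert mean([1, 2, 3, 4]) == 2.5

def test_mean_of_single_value():
    assert mean([42]) == 42

Running pytest from the command line discovers every function whose name starts with test_, runs the lot and reports any assertion that fails, so the whole suite can be re-run with a single command after every change.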
This is actually a lot less scary than it sounds, because there are plenty of tools now to automate that testing: you just type a simple test command and everything is verified. There are even tools that run your tests automatically whenever you check the code into version control (a process known as continuous integration or CI), and some will go further and automatically deploy code that passes the tests (continuous deployment). The big problems with testing are that it’s tedious, your code seems to work without it and no-one tells you off for not doing it. At the time when the Software Carpentry competition was being run, the idea of testing wasn’t new, but the tools to help were in their infancy. “Existing tools are obscure, hard to use, expensive, don’t actually provide much help, or all three.” The SC Test category asked entrants “to design a tool, or set of tools, which will help programmers construct and maintain black box and glass box tests of software components at all levels, including functions, modules, and classes, and whole programs.” The SC Test category is interesting in that the competition administrators clearly found it difficult to specify what they wanted to see in an entry. In fact, the whole category was reopened with a refined set of rules and expectations. Ultimately, it’s difficult to tell whether this category made a significant difference. Where the tools to write tests used to be very sparse and difficult to use, they are now plentiful, with several options available for most programming languages. With this proliferation, several tried-and-tested methodologies have emerged which are consistent across many different tools, so while things still aren’t perfect they are much better. In recent years there has been a culture shift in the wider software development community towards both testing in general and test-first development, where the tests for a new feature are written first, and then the implementation is coded incrementally until all tests pass. The current challenge is to transfer this culture shift to the academic research community! Tools for collaborative markdown editing Photo by Alan Cleaver I really love Markdown1. I love its simplicity; its readability; its plain-text nature. I love that it can be written and read with nothing more complicated than a text-editor. I love how nicely it plays with version control systems. I love how easy it is to convert to different formats with Pandoc and how it’s become effectively the native text format for a wide range of blogging platforms. One frustration I’ve had recently, then, is that it’s surprisingly difficult to collaborate on a Markdown document. There are various solutions that almost work but at best feel somehow inelegant, especially when compared with rock solid products like Google Docs. Finally, though, we’re starting to see some real possibilities. Here are some of the things I’ve tried, but I’d be keen to hear about other options. 1. Just suck it up To be honest, Google Docs isn’t that bad. In fact it works really well, and has almost no learning curve for anyone who’s ever used Word (i.e. practically anyone who’s used a computer since the 90s). When I’m working with non-technical colleagues there’s nothing I’d rather use. It still feels a bit uncomfortable though, especially the vendor lock-in. You can export a Google Doc to Word, ODT or PDF, but you need to use Google Docs to do that. Plus as soon as I start working in a word processor I get tempted to muck around with formatting. 2.
Git(hub) The obvious solution to most techies is to set up a GitHub repo, commit the document and go from there. This works very well for bigger documents written over a longer time, but seems a bit heavyweight for a simple one-page proposal, especially over short timescales. Who wants to muck around with pull requests and merging changes for a document that’s going to take 2 days to write tops? This type of project doesn’t need a bug tracker or a wiki or a public homepage anyway. Even without GitHub in the equation, using git for such a trivial use case seems clunky. 3. Markdown in Etherpad/Google Docs Etherpad is a great tool for collaborative editing, but suffers from two key problems: no syntax highlighting or preview for markdown (it’s just treated as simple text); and you need to find a server to host it or do it yourself. However, there’s nothing to stop you editing markdown with it. You can do the same thing in Google Docs, in fact, and I have. Editing a fundamentally plain-text format in a word processor just feels weird though. 4. Overleaf/Authorea Overleaf and Authorea are two products developed to support academic editing. Authorea has built-in markdown support but lacks proper simultaneous editing. Overleaf has great simultaneous editing but only supports markdown by wrapping a bunch of LaTeX boilerplate around it. Both OK but unsatisfactory. 5. StackEdit Now we’re starting to get somewhere. StackEdit has both Markdown syntax highlighting and near-realtime preview, as well as integrating with Google Drive and Dropbox for file synchronisation. 6. HackMD HackMD is one that I only came across recently, but it looks like it does exactly what I’m after: a simple markdown-aware editor with live preview that also permits simultaneous editing. I’m a little circumspect simply because I know simultaneous editing is difficult to get right, but it certainly shows promise. 7. Classeur I discovered Classeur literally today: it’s developed by the same team as StackEdit (which is now apparently no longer in development), and is currently in beta, but it looks to offer two killer features: real-time collaboration, including commenting, and pandoc-powered export to loads of different formats. Anything else? Those are the options I’ve come up with so far, but they can’t be the only ones. Is there anything I’ve missed? Other plain-text formats are available. I’m also a big fan of org-mode. ↩︎ Software Carpentry: SC Track; hunt those bugs! This competition will be an opportunity for the next wave of developers to show their skills to the world — and to companies like ours. — Dick Hardt, ActiveState (quote taken from SC Track page) All code contains bugs, and all projects have features that users would like but which aren’t yet implemented. Open source projects tend to get more of these as their user communities grow and start requesting improvements to the product. As your open source project grows, it becomes harder and harder to keep track of and prioritise all of these potential chunks of work. What do you do? The answer, as ever, is to make a to-do list. Different projects have used different solutions, including mailing lists, forums and wikis, but fairly quickly a whole separate class of software evolved: the bug tracker, which includes such well-known examples as Bugzilla, Redmine and the mighty JIRA. Bug trackers are built entirely around such requests for improvement, and typically track them through workflow stages (planning, in progress, fixed, etc.)
with scope for the community to discuss and add various bits of metadata. In this way, it becomes easier both to prioritise problems against each other and to use the hive mind to find solutions. Unfortunately most bug trackers are big, complicated beasts, more suited to large projects with dozens of developers and hundreds or thousands of users. Clearly a project of this size is more difficult to manage and requires a certain feature set, but the result is that the average bug tracker is non-trivial to set up for a small single-developer project. The SC Track category asked entrants to propose a better bug tracking system. In particular, the judges were looking for something easy to set up and configure without compromising on functionality. The winning entry was a bug-tracker called Roundup, proposed by Ka-Ping Yee. Here we have another tool which is still in active use and development today. Given that there is now a huge range of options available in this area, including the mighty github, this is no small achievement. These days, of course, github has become something of a de facto standard for open source project management. Although ostensibly a version control hosting platform, each github repository also comes with a built-in issue tracker, which is also well-integrated with the “pull request” workflow system that allows contributors to submit bug fixes and features themselves. Github’s competitors, such as GitLab and Bitbucket, also include similar features. Not everyone wants to work in this way though, so it’s good to see that there is still a healthy ecosystem of open source bug trackers, and that Software Carpentry is still having an impact. Software Carpentry: SC Config; write once, compile anywhere Nine years ago, when I first released Python to the world, I distributed it with a Makefile for BSD Unix. The most frequent questions and suggestions I received in response to these early distributions were about building it on different Unix platforms. Someone pointed me to autoconf, which allowed me to create a configure script that figured out platform idiosyncrasies. Unfortunately, autoconf is painful to use – its grouping, quoting and commenting conventions don’t match those of the target language, which makes scripts hard to write and even harder to debug. I hope that this competition comes up with a better solution — it would make porting Python to new platforms a lot easier! — Guido van Rossum, Technical Director, Python Consortium (quote taken from SC Config page) On to the next Software Carpentry competition category, then. One of the challenges of writing open source software is that you have to make it run on a wide range of systems over which you have no control. You don’t know what operating system any given user might be using or what libraries they have installed, or even what versions of those libraries. This means that whatever build system you use, you can’t just send the Makefile (or whatever) to someone else and expect everything to go off without a hitch. For a very long time, it’s been common practice for source packages to include a configure script that, when executed, runs a bunch of tests to see what it has to work with and sets up the Makefile accordingly. Writing these scripts by hand is a nightmare, so tools like autoconf and automake evolved to make things a little easier. They did, and if the tests you want to use are already implemented they work very well indeed.
Unfortunately they’re built on an unholy combination of shell scripting and the archaic Gnu M4 macro language. That means if you want to write new tests you need to understand both of these as well as the architecture of the tools themselves — not an easy task for the average self-taught research programmer. SC Conf, then, called for a re-engineering of the autoconf concept, to make it easier for researchers to make their code available in a portable, platform-independent format. The second round configuration tool winner was SapCat, “a tool to help make software portable”. Unfortunately, this one seems not to have gone anywhere, and I could only find the original proposal on the Internet Archive. There were a lot of good ideas in this category about making catalogues and databases of system quirks to avoid having to rerun the same expensive tests again the way a standard ./configure script does. I think one reason none of these ideas survived is that they were overly ambitious, imagining a grand architecture where their tool would provide some overarching source of truth. This is in stark contrast to the way most Unix-like systems work, where each tool does one very specific job well and tools are easy to combine in various ways. In the end though, I think Moore’s Law won out here, making it easier to do the brute-force checks each time than to try anything clever to save time — a good example of avoiding unnecessary optimisation. Add to that the evolution of the generic pkg-config tool from earlier package-specific tools like gtk-config, and it’s now much easier to check for particular versions and features of common packages. On top of that, much of the day-to-day coding of a modern researcher happens in interpreted languages like Python and R, which give you a fully-functioning pre-configured environment with a lot less compiling to do. As a side note, Tom Tromey, another of the shortlisted entrants in this category, is still a major contributor to the open source world. He still seems to be involved in the automake project, contributes a lot of code to the emacs community too and blogs sporadically at The Cliffs of Inanity. Semantic linefeeds: one clause per line I’ve started using “semantic linefeeds” when writing content, a concept I discovered on Brandon Rhodes' blog and one described in that article far better than I could. It turns out this is a very old idea, promoted way back in the day by Brian W Kernighan, contributor to the original Unix system, co-creator of the AWK and AMPL programming languages and co-author of a lot of seminal programming textbooks including “The C Programming Language”. The basic idea is that you break lines at natural gaps between clauses and phrases, rather than simply after the last word before you hit 80 characters. Keeping line lengths strictly to 80 characters isn’t really necessary in these days of wide aspect ratios for screens. Breaking lines at points that make semantic sense in the sentence is really helpful for editing, especially in the context of version control, because it isolates changes to the clause in which they occur rather than just the nearest 80-character block. I also like it because it makes my crappy prose feel just a little bit more like poetry. ☺ Software Carpentry: SC Build; or making a better make Software tools often grow incrementally from small beginnings into elaborate artefacts. Each increment makes sense, but the final edifice is a mess.
make is an excellent example: a simple tool that has grown into a complex domain-specific programming language. I look forward to seeing the improvements we will get from designing the tool afresh, as a whole… — Simon Peyton-Jones, Microsoft Research (quote taken from SC Build page) Most people who have had to compile an existing software tool will have come across the venerable make tool (which usually these days means GNU Make). It allows the developer to write a declarative set of rules specifying how the final software should be built from its component parts, mostly source code, allowing the build itself to be carried out by simply typing make at the command line and hitting Enter. Given a set of rules, make will work out all the dependencies between components and ensure everything is built in the right order and nothing that is up-to-date is rebuilt. Great in principle but make is notoriously difficult for beginners to learn, as much of the logic for how builds are actually carried out is hidden beneath the surface. This also makes it difficult to debug problems when building large projects. For these reasons, the SC Build category called for a replacement build tool engineered from the ground up to solve these problems. The second round winner, ScCons, is a Python-based make-like build tool written by Steven Knight. While I could find no evidence of any of the other shortlisted entries, this project (now renamed SCons) continues in active use and development to this day. I actually use this one myself from time to time and to be honest I prefer it in many cases to trendy new tools like rake or grunt and the behemoth that is Apache Ant. Its Python-based SConstruct file syntax is remarkably intuitive and scales nicely from very simple builds up to big and complicated projects, with good dependency tracking to avoid unnecessary recompiling. It has a lot of built-in rules for performing common build & compile tasks, but it’s trivial to add your own, either by combining existing building blocks or by writing a new builder with the full power of Python. A minimal SConstruct file looks like this: Program('hello.c') Couldn’t be simpler! And you have the full power of Python syntax to keep your build file simple and readable. It’s interesting that all the entries in this category apart from one chose to use a Python-derived syntax for describing build steps. Python was clearly already a language of choice for flexible multi-purpose computing. The exception is the entry that chose to use XML instead, which I think is a horrible idea (oh how I used to love XML!) but has been used to great effect in the Java world by tools like Ant and Maven. What happened to the original Software Carpentry? “Software Carpentry was originally a competition to design new software tools, not a training course. The fact that you didn’t know that tells you how well it worked.” When I read this in a recent post on Greg Wilson’s blog, I took it as a challenge. I actually do remember the competition, although looking at the dates it was long over by the time I found it. I believe it did have impact; in fact, I still occasionally use one of the tools it produced, so Greg’s comment got me thinking: what happened to the other competition entries? Working out what happened will need a bit of digging, as most of the relevant information is now only available on the Internet Archive. It certainly seems that by November 2008 the domain name had been allowed to lapse and had been replaced with a holding page by the registrar.
There were four categories in the competition, each representing a category of tool that the organisers thought could be improved: SC Build: a build tool to replace make SC Conf: a configuration management tool to replace autoconf and automake SC Track: a bug tracking tool SC Test: an easy to use testing framework I’m hoping to be able to show that this work had a lot more impact than Greg is admitting here. I’ll keep you posted on what I find! Changing static site generators: Nanoc → Hugo I’ve decided to move the site over to a different static site generator, Hugo. I’ve been using Nanoc for a long time and it’s worked very well, but lately it’s been taking longer and longer to compile the site and throwing weird errors that I can’t get to the bottom of. At the time I started using Nanoc, static site generators were in their infancy. There weren’t the huge number of feature-loaded options that there are now, so I chose one and I built a whole load of blogging-related functionality myself. I did it in ways that made sense at the time but no longer work well with Nanoc’s latest versions. So it’s time to move to something that has blogging baked-in from the beginning and I’m taking the opportunity to overhaul the look and feel too. Again, when I started there weren’t many pre-existing themes so I built the whole thing myself and though I’m happy with the work I did on it, it never quite felt polished enough. Now I’ve got the opportunity to adapt one of the many well-designed themes already out there, so I’ve taken one from the Hugo themes gallery and tweaked the colours to my satisfaction. Hugo also has various features that I’ve wanted to implement in Nanoc but never quite got round to. The nicest one is proper handling of draft posts and future dates, but I keep finding others. There’s a lot of old content that isn’t quite compatible with the way Hugo does things so I’ve taken the old Nanoc-compiled content and frozen it to make sure that old links still work. I could probably fiddle with it for years without doing much so it’s probably time to go ahead and publish it. I’m still not completely happy with my choice of theme but one of the joys of Hugo is that I can change that whenever I want. Let me know what you think! License Except where otherwise stated, all content on eRambler by Jez Cope is licensed under a Creative Commons Attribution-ShareAlike 4.0 International license. RDM Resources I occasionally get asked for resources to help someone learn more about research data management (RDM) as a discipline (i.e. for those providing RDM support rather than simply wanting to manage their own data). I’ve therefore collected a few resources together on this page. If you’re lucky I might even update it from time to time! First, a caveat: this is very focussed on UK Higher Education, though much of it will still be relevant for people outside that narrow demographic. My general recommendation would be to start with the Digital Curation Centre (DCC) website and follow links out from there. I also have a slowly growing list of RDM links on Diigo, and there’s an RDM section in my list of blogs and feeds too.
Mailing lists Jiscmail is a popular list server run for the benefit of further and higher education in the UK; the following lists are particularly relevant: RESEARCH-DATAMAN DATA-PUBLICATION DIGITAL-PRESERVATION LIS-RESEARCHSUPPORT The Research Data Alliance have a number of Interest Groups and Working Groups that discuss issues by email Events International Digital Curation Conference — major annual conference Research Data Management Forum — roughly every six months, places are limited! RDA Plenary — also every 6 months, but only about 1 in every 3 in Europe Books In no particular order: Martin, Victoria. Demystifying eResearch: A Primer for Librarians. Libraries Unlimited, 2014. Borgman, Christine L. Big Data, Little Data, No Data: Scholarship in the Networked World. Cambridge, Massachusetts: The MIT Press, 2015. Corti, Louise, Veerle Van den Eynden, and Libby Bishop. Managing and Sharing Research Data. Thousand Oaks, CA: SAGE Publications Ltd, 2014. Pryor, Graham, ed. Managing Research Data. Facet Publishing, 2012. Pryor, Graham, Sarah Jones, and Angus Whyte, eds. Delivering Research Data Management Services: Fundamentals of Good Practice. Facet Publishing, 2013. Ray, Joyce M., ed. Research Data Management: Practical Strategies for Information Professionals. West Lafayette, Indiana: Purdue University Press, 2014. Reports ‘Ten Recommendations for Libraries to Get Started with Research Data Management’. LIBER, 24 August 2012. http://libereurope.eu/news/ten-recommendations-for-libraries-to-get-started-with-research-data-management/. ‘Science as an Open Enterprise’. Royal Society, 2 June 2012. https://royalsociety.org/policy/projects/science-public-enterprise/Report/. Auckland, Mary. ‘Re-Skilling for Research’. RLUK, January 2012. http://www.rluk.ac.uk/wp-content/uploads/2014/02/RLUK-Re-skilling.pdf. Journals International Journal of Digital Curation (IJDC) Journal of eScience Librarianship (JeSLib) Fairphone 2: initial thoughts on the original ethical smartphone I’ve had my eye on the Fairphone 2 for a while now, and when my current phone, an aging Samsung Galaxy S4, started playing up I decided it was time to take the plunge. A few people have asked for my thoughts on the Fairphone so here are a few notes. Why I bought it The thing that sparked my interest, and the main reason for buying the phone really, was the ethical stance of the manufacturer. The small Dutch company have gone to great lengths to ensure that both labour and materials are sourced as responsibly as possible. They regularly inspect the factories where the parts are made and assembled to ensure fair treatment of the workers and they source all the raw materials carefully to minimise the environmental impact and the use of conflict minerals. Another side to this ethical stance is a focus on longevity of the phone itself. This is not a product with an intentionally limited lifespan. Instead, it’s designed to be modular and as repairable as possible, by the owner themselves. Spares are available for all of the parts that commonly fail in phones (including screen and camera), and at the time of writing the Fairphone 2 is the only phone to receive 10/10 for repairability from iFixit. There are plans to allow hardware upgrades, including an expansion port on the back so that NFC or wireless charging could be added with a new case, for example. What I like So far, the killer feature for me is the dual SIM card slots.
I have both a personal and a work phone, and the latter was always getting left at home or in the office or running out of charge. Now I have both SIMs in the one phone: I can receive calls on either number, turn them on and off independently and choose which account to use when sending a text or making a call. The OS is very close to “standard” Android, which is nice, and I really don’t miss all the extra bloatware that came with the Galaxy S4. It also has twice the storage of that phone, which is hardly unique but is still nice to have. Overall, it seems like a solid, reliable phone, though it’s not going to outperform anything else at the same price point. It certainly feels nice and snappy for everything I want to use it for. I’m no mobile gamer, but there is that distant promise of upgradability on the horizon if you are. What I don’t like I only have two bugbears so far. Once or twice it’s locked up and become unresponsive, requiring a “manual reset” (removing and replacing the battery) to get going again. It also lacks NFC, which isn’t really a deal breaker, but I was just starting to make occasional use of it on the S4 (mostly experimenting with my Yubikey NEO) and it would have been nice to try out Android Pay when it finally arrives in the UK. Overall It’s definitely a serious contender if you’re looking for a new smartphone and aren’t bothered about serious mobile gaming. You do pay a premium for the ethical sourcing and modularity, but I feel that’s worth it for me. I’m looking forward to seeing how it works out as a phone. Wiring my web I’m a nut for automating repetitive tasks, so I was dead pleased a few years ago when I discovered that IFTTT let me plug different bits of the web together. I now use it for tasks such as: Syndicating blog posts to social media Creating scheduled/repeating todo items from a Google Calendar Making a note to revisit an article I’ve starred in Feedly I’d probably only be half-joking if I said that I spend more time automating things than I save not having to do said things manually. Thankfully it’s also a great opportunity to learn, and recently I’ve been thinking about reimplementing some of my IFTTT workflows myself to get to grips with how it all works. There are some interesting open source projects designed to offer a lot of this functionality, such as Huginn, but I decided to go for a simpler option for two reasons: I want to spend my time learning about the APIs of the services I use and how to wire them together, rather than learning how to use another big framework; and I only have a small Amazon EC2 server to play with and a heavy Ruby on Rails app like Huginn (plus web server) needs more memory than I have. Instead I’ve gone old-school with a little collection of individual scripts to do particular jobs. I’m using the built-in scheduling functionality of systemd, which is already part of a modern Linux operating system, to get them to run periodically. It also means I can vary the language I use to write each one depending on the needs of the job at hand and what I want to learn/feel like at the time. Currently it’s all done in Python, but I want to have a go at Lisp sometime, and there are some interesting new languages like Go and Julia that I’d like to get my teeth into as well. You can see my code on github as it develops: https://github.com/jezcope/web-plumbing. Comments and contributions are welcome (if not expected) and let me know if you find any of the code useful.
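To give a flavour of what one of those little scripts might look like (this isn't the real web-plumbing code; the feed URL and file names are placeholders), here's a minimal Python sketch using the feedparser library that appends any new blog posts to a plain-text to-do list, ready to be run periodically by a systemd timer:

import pathlib
import feedparser  # third-party: pip install feedparser

FEED_URL = "https://example.org/index.xml"     # placeholder feed address
TODO_FILE = pathlib.Path.home() / "todo.txt"   # placeholder to-do list
SEEN_FILE = pathlib.Path.home() / ".seen-posts"

def main():
    # remember which posts we've already dealt with between runs
    seen = set(SEEN_FILE.read_text().splitlines()) if SEEN_FILE.exists() else set()
    feed = feedparser.parse(FEED_URL)
    with TODO_FILE.open("a") as todo:
        for entry in feed.entries:
            if entry.link not in seen:
                todo.write(f"Share new post: {entry.title} {entry.link}\n")
                seen.add(entry.link)
    SEEN_FILE.write_text("\n".join(sorted(seen)) + "\n")

if __name__ == "__main__":
    main()

A user-level systemd timer (systemctl --user enable --now some-job.timer, with whatever unit names you choose) then takes the place of IFTTT's scheduling.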
Image credit: xkcd #1319, Automation Data is like water, and language is like clothing I admit it: I’m a grammar nerd. I know the difference between ‘who’ and ‘whom’, and I’m proud. I used to be pretty militant, but these days I’m more relaxed. I still take joy in the mechanics of the language, but I also believe that English is defined by its usage, not by a set of arbitrary rules. I’m just as happy to abuse it as to use it, although I still think it’s important to know what rules you’re breaking and why. My approach now boils down to this: language is like clothing. You (probably) wouldn’t show up to a job interview in your pyjamas1, but neither are you going to wear a tuxedo or ballgown to the pub. Getting commas and semicolons in the right place is like getting your shirt buttons done up right. Getting it wrong doesn’t mean you’re an idiot. Everyone will know what you meant. It will affect how you’re perceived, though, and that will affect how your message is perceived. And there are former rules2 that some still enforce that are nonetheless dropping out of regular usage. There was a time when everyone in an office job wore formal clothing. Then it became acceptable just to have a blouse, or a shirt and tie. Then the tie became optional and now there are many professions where perfectly well-respected and competent people are expected to show up wearing nothing smarter than jeans and a t-shirt. One such rule IMHO is that ‘data’ is a plural and should take pronouns like ‘they’ and ‘these’. The origin of the word ‘data’ is in the Latin plural of ‘datum’, and that idea has clung on for a considerable period. But we don’t speak Latin and the English language continues to evolve: ‘agenda’ also began life as a Latin plural, but we don’t use the word ‘agendum’ any more. It’s common everyday usage to refer to data with singular pronouns like ‘it’ and ‘this’, and it’s very rare to see someone referring to a single datum (as opposed to ‘data point’ or something). If you want to get technical, I tend to think of data as a mass noun, like ‘water’ or ‘information’. It’s uncountable: talking about ‘a water’ or ‘an information’ doesn’t make much sense, but it uses singular pronouns, as in ‘this information’. If you’re interested, the Oxford English Dictionary also takes this position, while Chambers leaves the choice of singular or plural noun up to you. There is absolutely nothing wrong, in my book, with referring to data in the plural as many people still do. But it’s no longer a rule and for me it’s weakened further from guideline to preference. It’s like wearing a bow-tie to work. There’s nothing wrong with it and some people really make it work, but it’s increasingly outdated and even a little eccentric. or maybe you’d totally rock it. ↩︎ Like not starting a sentence with a conjunction… ↩︎ #IDCC16 day 2: new ideas Well, I did a great job of blogging the conference for a couple of days, but then I was hit by the bug that’s been going round and didn’t have a lot of energy for anything other than paying attention and making notes during the day! I’ve now got round to reviewing my notes so here are a few reflections on day 2. Day 2 was the day of many parallel talks! So many great and inspiring ideas to take in! Here are a few of my take-home points. Big science and the long tail The first parallel session had examples of practical data management in the real world. 
Jian Qin & Brian Dobreski (School of Information Studies, Syracuse University) worked on reproducibility with one of the research groups involved with the recent gravitational wave discovery. “Reproducibility” for this work (as with much of physics) mostly equates to computational reproducibility: tracking the provenance of the code and its input and output is key. They also found that in practice the scientists' focus was on making the big discovery, and ensuring reproducibility was seen as secondary. This goes some way to explaining why current workflows and tools don’t really capture enough metadata. Milena Golshan & Ashley Sands (Center for Knowledge Infrastructures, UCLA) investigated the use of Software-as-a-Service (SaaS, such as Google Drive, Dropbox or more specialised tools) as a way of meeting the needs of long-tail science research such as ocean science. This research is characterised by small teams, diverse data, dynamic local development of tools, local practices and difficulty disseminating data. This results in a need for researchers to be generalists, as opposed to “big science” research areas, where they can afford to specialise much more deeply. Such generalists tend to develop their own isolated workflows, which can differ greatly even within a single lab. Long-tail research also often suffers from a lack of dedicated IT support. They found that use of SaaS could help to meet these challenges, but with a high cost required to cover the needed guarantees of security and stability. Education & training This session focussed on the professional development of library staff. Eleanor Mattern (University of Pittsburgh) described the immersive training introduced to improve librarians' understanding of the data needs of their subject areas as part of their RDM service delivery model. The participants each conducted a “disciplinary deep dive”, shadowing researchers and then reporting back to the group on their discoveries with a presentation and discussion. Liz Lyon (also University of Pittsburgh, formerly UKOLN/DCC) gave a systematic breakdown of the skills, knowledge and experience required in different data-related roles, obtained from an analysis of job adverts. She identified distinct roles of data analyst, data engineer and data journalist, and, as well as each role’s distinctive skills, pinpointed common requirements of all three: Python, R, SQL and Excel. This work follows on from an earlier phase which identified an allied set of roles: data archivist, data librarian and data steward. Data sharing and reuse This session gave an overview of several specific workflow tools designed for researchers. Marisa Strong (University of California Curation Centre/California Digital Library) presented Dash, a highly modular tool for manual data curation and deposit by researchers. It’s built on their flexible backend, Stash, and though it’s currently optimised to deposit in their Merritt data repository it could easily be hooked up to other repositories. It captures DataCite metadata and a few other fields, and is integrated with ORCID to uniquely identify people. In a different vein, Eleni Castro (Institute for Quantitative Social Science, Harvard University) discussed some of the ways that Harvard’s Dataverse repository is streamlining deposit by enabling automation. It provides a number of standardised endpoints such as OAI-PMH for metadata harvest and SWORD for deposit, as well as custom APIs for discovery and deposit.
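To show what those standard endpoints make possible, here is a minimal sketch of harvesting Dublin Core metadata over OAI-PMH with Python's requests library. The verb and metadataPrefix parameters are part of the OAI-PMH standard itself, but the base URL below is an unverified guess at the Dataverse endpoint, so treat it as a placeholder.

import requests
import xml.etree.ElementTree as ET

# Placeholder endpoint; check the repository's documentation for the real one.
BASE_URL = "https://dataverse.harvard.edu/oai"

response = requests.get(
    BASE_URL,
    params={"verb": "ListRecords", "metadataPrefix": "oai_dc"},
    timeout=30,
)
response.raise_for_status()

# OAI-PMH responses are XML; as a quick demo, print each record identifier.
root = ET.fromstring(response.content)
for identifier in root.iter("{http://www.openarchives.org/OAI/2.0/}identifier"):
    print(identifier.text)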
Interesting use cases include: An addon for the Open Science Framework to deposit in Dataverse via SWORD An R package to enable automatic deposit of simulation and analysis results Integration with publisher workflows such as Open Journal Systems A growing set of visualisations for deposited data In the future they’re also looking to integrate with DMPtool to capture data management plans and with Archivematica for digital preservation. Andrew Treloar (Australian National Data Service) gave us some reflections on the ANDS “applications programme”, a series of 25 small funded projects intended to address the fourth of their strategic transformations, single use → reusable. He observed that essentially these projects worked because they were able to throw money at a problem until they found a solution: not very sustainable. Some of them stuck to a traditional “waterfall” approach to project management, resulting in “the right solution 2 years late”. Every researcher’s needs are “special” and communities are still constrained by old ways of working. The conclusions from this programme were that: “Good enough” is fine most of the time Adopt/Adapt/Augment is better than Build Existing toolkits let you focus on the 10% functionality that’s missing Successful projects involved research champions who can: 1) articulate their community’s requirements; and 2) promote project outcomes Summary All in all, it was a really exciting conference, and I’ve come home with loads of new ideas and plans to develop our services at Sheffield. I noticed a continuation of some of the trends I spotted at last year’s IDCC, especially an increasing focus on “second-order” problems: we’re no longer spending most of our energy just convincing researchers to take data management seriously and are able to spend more time helping them to do it better and get value out of it. There’s also a shift in emphasis (identified by closing speaker Cliff Lynch) from sharing to reuse, and making sure that data is not just available but valuable. #IDCC16 Day 1: Open Data The main conference opened today with an inspiring keynote by Barend Mons, Professor in Biosemantics, Leiden University Medical Center. The talk had plenty of great stuff, but two points stood out for me. First, Prof Mons described a newly discovered link between Huntington’s Disease and a previously unconsidered gene. No-one had previously recognised this link, but on mining the literature, an indirect link was identified in more than 10% of the roughly 1 million scientific claims analysed. This is knowledge for which we already had more than enough evidence, but which could never have been discovered without such a wide-ranging computational study. Second, he described a number of behaviours which should be considered “malpractice” in science: Relying on supplementary data in articles for data sharing: the majority of this is trash (paywalled, embedded in bitmap images, missing) Using the Journal Impact Factor to evaluate science and ignoring altmetrics Not writing data stewardship plans for projects (he prefers this term to “data management plan”) Obstructing tenure for data experts by assuming that all highly-skilled scientists must have a long publication record A second plenary talk from Andrew Sallons of the Centre for Open Science introduced a number of interesting-looking bits and bobs, including the Transparency & Openness Promotion (TOP) Guidelines which set out a pathway to help funders, publishers and institutions move towards more open science.
The rest of the day was taken up with a panel on open data, a poster session, some demos and a birds-of-a-feather session on sharing sensitive/confidential data. There was a great range of posters, but a few that stood out to me were: Lessons learned about ISO 16363 (“Audit and certification of trustworthy digital repositories”) certification from the British Library Two separate posters (from the Universities of Toronto and Colorado) about disciplinary RDM information & training for liaison librarians A template for sharing psychology data developed by a psychologist-turned-information researcher from Carnegie Mellon University More to follow, but for now it’s time for the conference dinner! #IDCC16 Day 0: business models for research data management I’m at the International Digital Curation Conference 2016 (#IDCC16) in Amsterdam this week. It’s always a good opportunity to pick up some new ideas and catch up with colleagues from around the world, and I always come back full of new possibilities. I’ll try and do some more reflective posts after the conference but I thought I’d do some quick reactions while everything is still fresh. Monday and Thursday are pre- and post-conference workshop days, and today I attended Developing Research Data Management Services. Joy Davidson and Jonathan Rans from the Digital Curation Centre (DCC) introduced us to the Business Model Canvas, a template for designing a business model on a single sheet of paper. The model prompts you to think about all of the key facets of a sustainable, profitable business, and can easily be adapted to the task of building a service model within a larger institution. The DCC used it as part of the Collaboration to Clarify Curation Costs (4C) project, whose output, the Curation Costs Exchange, is also worth a look. It was a really useful exercise to be able to work through the whole process for an aspect of research data management (my table focused on training & guidance provision), both because of the ideas that came up and also the experience of putting the framework into practice. It seems like a really valuable tool and I look forward to seeing how it might help us with our RDM service development. Tomorrow the conference proper begins, with a range of keynotes, panel sessions and birds-of-a-feather meetings so hopefully more then! About me I help people in Higher Education communicate and collaborate more effectively using technology. I currently work at the University of Sheffield focusing on research data management policy, practice, training and advocacy. In my free time, I like to: run; play the accordion; morris dance; climb; cook; read (fiction and non-fiction); write. Better Science Through Better Data #scidata17 Better Science through Better Doughnuts (image credit: Jez Cope) Update: fixed the link to the slides so it works now! Last week I had the honour of giving my first ever keynote talk, at an event entitled Better Science Through Better Data hosted jointly by Springer Nature and the Wellcome Trust. It was nerve-wracking but exciting and seemed to go down fairly well. I even got accidentally awarded a PhD in the programme — if only it was that easy! The slides for the talk, “Supporting Open Research: The role of an academic library”, are available online (doi:10.15131/shef.data.5537269), and the whole event was video’d for posterity and viewable online. I got some good questions too, mainly from the clever online question system.
I didn’t get to answer all of them, so I’m thinking of doing a blog post or two to address a few more. There were loads of other great presentations as well, both keynotes and 7-minute lightning talks, so I’d encourage you to take a look at at least some of it. I’ll pick out a few of my highlights. Dr Aled Edwards (University of Toronto) There’s a major problem with science funding that I hadn’t really thought about before. The available funding pool for research is divided up into pots by country, and often by funding body within a country. Each of these pots has robust processes to award funding to the most important problems and most capable researchers. The problem comes because there is no coordination between these pots, so researchers all over the world end up getting funded to research the most popular problems, leading to a lot of duplication of effort. Industry funding suffers from a similar problem, particularly the pharmaceutical industry. Because there is no sharing of data or negative results, multiple companies spend billions researching the same dead ends chasing after the same drugs. This is where the astronomical costs of drug development come from. Dr Edwards presented one alternative, modelled by a company called M4K Pharma. The idea is to use existing IP laws to try and give academic researchers a reasonable, morally-justifiable and sustainable profit on drugs they develop, in contrast to the current model where basic research is funded by governments while large corporations hoover up as much profit as they possibly can. This new model would develop drugs all the way to human trial within academia, then license the resulting drugs to companies to manufacture with a price cap to keep the medicines affordable to all who need them. Core to this effort is openness with data, materials and methodology, and Dr Edwards presented several examples of how this approach benefited academic researchers, industry and patients compared with a closed, competitive focus. Dr Kirstie Whitaker (Alan Turing Institute) This was a brilliant presentation, presenting a practical how-to guide to doing reproducible research, from one researcher to another. I suggest you take a look at her slides yourself: Showing your working: a how-to guide to reproducible research. Dr Whitaker briefly addressed a number of common barriers to reproducible research: Is not considered for promotion: so it should be! Held to higher standards than others: reviewers should be discouraged from nitpicking just because the data/code/whatever is available (true unbiased peer review of these would be great though) Publication bias towards novel findings: it is morally wrong to not publish reproductions, replications etc. so we need to address the common taboo on doing so Plead the 5th: if you share, people may find flaws, but if you don’t they can’t — if you’re worried about this you should ask yourself why! Support additional users: some (much?) of the burden should reasonably fall on the reuser, not the sharer Takes time: this is only true if you hack it together after the fact; if you do it from the start, the whole process will be quicker!
- Requires additional skills: important to provide training, but also to judge PhD students on their ability to do this, not just on their thesis & papers

The rest of the presentation, the “how-to” guide of the title, was a well-chosen and passionately delivered set of recommendations, but the thing that really stuck out for me is how good Dr Whitaker is at making the point that you only have to do one of these things to improve the quality of your research. It’s easy to get the impression at the moment that you have to be fully, perfectly open or not at all, but it’s actually OK to get there one step at a time, or even not to go all the way at all! Anyway, I think this is a slide deck that speaks for itself, so I won’t say any more!

Lightning talk highlights

There was plenty of good stuff in the lightning talks, which were constrained to 7 minutes each, but a few of the things that stood out for me were, in no particular order:

- Code Ocean — share and run code in the cloud
- dat project — peer-to-peer data synchronisation tool. Can automate metadata creation, data syncing, versioning. Set up a secure data sharing network that keeps the data in sync but off the cloud
- Berlin Institute of Health — open science course for students. Pre-print paper. Course materials
- InterMine — taking the pain out of data cleaning & analysis
- Nix/NixOS as a component of a reproducible paper
- BoneJ (ImageJ plugin for bone analysis) — developed by a scientist, used a lot, now has a Wellcome-funded RSE to develop the next version
- ESASky — amazing live, online archive of masses of astronomical data

Coda

I really enjoyed the event (and the food was excellent too). My thanks go out to:

- The programme committee for asking me to come and give my take — I hope I did it justice!
- The organising team who did a brilliant job of keeping everything running smoothly before and during the event
- The University of Sheffield for letting me get away with doing things like this!

Blog platform switch

I’ve just switched my blog over to the Nikola static site generator. Hopefully you won’t notice a thing, but there might be a few weird spectres around til I get all the kinks ironed out. I’ve made the switch for a couple of main reasons:

- Nikola supports Jupyter notebooks as a source format for blog posts, which will be useful to include code snippets
- It’s written in Python, a language which I actually know, so I’m more likely to be able to fix things that break, customise it and potentially contribute to the open source project (by contrast, Hugo is written in Go, which I’m not really familiar with)

Chat rooms vs Twitter: how I communicate now

CC0, Pixabay

This time last year, Brad Colbow published a comic in his “The Brads” series entitled “The long slow death of Twitter”. It really encapsulates the way I’ve been feeling about Twitter for a while now. Go ahead and take a look. I’ll still be here when you come back. According to my Twitter profile, I joined in February 2009 as user #20,049,102. It was nearing its 3rd birthday and, though there were clearly a lot of people already signed up at that point, it was still relatively quiet, especially in the UK. I was a lonely PhD student just starting to get interested in educational technology, and one thing that Twitter had in great supply was (and still is) people pushing back the boundaries of what tech can do in different contexts.
Somewhere along the way Twitter got really noisy, partly because more people (especially commercial companies) are using it more to talk about stuff that doesn’t interest me, and partly because I now follow 1,200+ people and find I get several tweets a second at peak times, which no-one could be expected to handle. More recently I’ve found my attention drawn to more focussed communities instead of that big old shouting match. I find I’m much more comfortable discussing things and asking questions in small focussed communities because I know who might be interested in what. If I come across an article about a cool new Python library, I’ll geek out about it with my research software engineer friends; if I want advice on an aspect of my emacs setup, I’ll ask a bunch of emacs users. I feel like I’m talking to people who want to hear what I’m saying. Next to that experience, Twitter just feels like standing on a street corner shouting. IRC channels (mostly on Freenode), and similar things like Slack and gitter form the bulk of this for me, along with a growing number of WhatsApp group chats. Although online chat is theoretically a synchronous medium, I find that I can treat it more as “semi-synchronous”: I can have real-time conversations as they arise, but I can also close them and tune back in later to catch up if I want. Now I come to think about it, this is how I used to treat Twitter before the 1,200 follows happened. I also find I visit a handful of forums regularly, mostly of the Reddit link-sharing or StackExchange Q&A type. /r/buildapc was invaluable when I was building my latest box, /r/EarthPorn (very much not NSFW) is just beautiful. I suppose the risk of all this is that I end up reinforcing my own echo chamber. I’m not sure how to deal with that, but I certainly can’t deal with it while also suffering from information overload. Not just certifiable… A couple of months ago, I went to Oxford for an intensive, 2-day course run by Software Carpentry and Data Carpentry for prospective new instructors. I’ve now had confirmation that I’ve completed the checkout procedure so it’s official: I’m now a certified Data Carpentry instructor! As far as I’m aware, the certification process is now combined, so I’m also approved to teach Software Carpentry material too. And of course there’s Library Carpentry too… SSI Fellowship 2020 I’m honoured and excited to be named one of this year’s Software Sustainability Institute Fellows. There’s not much to write about yet because it’s only just started, but I’m looking forward to sharing more with you. In the meantime, you can take a look at the 2020 fellowship announcement and get an idea of my plans from my application video: Talks Here is a selection of talks that I’ve given. Intro to the fediverse Wow, it turns out to be 10 years since I wrote this beginners guide to Twitter. Things have moved on a loooooong way since then. Far from being the interesting, disruptive technology it was back then, Twitter has become part of the mainstream, the establishment. Almost everyone and everything is on Twitter now, which has both pros and cons. So what’s the problem?
It’s now possible to follow all sorts of useful information feeds, from live updates on transport delays to your favourite sports team’s play-by-play performance to an almost infinite number of cat pictures. In my professional life it’s almost guaranteed that anyone I meet will be on Twitter, meaning that I can contact them to follow up at a later date without having to exchange contact details (and they have options to block me if they don’t like that). On the other hand, a medium where everyone’s opinion is equally valid regardless of knowledge or life experience has turned some parts of the internet into a toxic swamp of hatred and vitriol. It’s easier than ever to forget that we have more common ground with any random stranger than we have similarities, and that’s led to some truly awful acts and a poisonous political arena. Part of the problem here is that each of the social media platforms is controlled by a single entity with almost no accountability to anyone other than shareholders. Technological change has been so rapid that the regulatory regime has no idea how to handle them, leaving them largely free to operate how they want. This has led to a whole heap of nasty consequences that many other people have done a much better job of documenting than I could (Shoshana Zuboff’s book The Age of Surveillance Capitalism is a good example). What I’m going to focus on instead are some possible alternatives. If you accept the above argument, one obvious solution is to break up the effective monopoly enjoyed by Facebook, Twitter et al. We need to be able to retain the wonderful affordances of social media but democratise control of it, so that it can never be dominated by a small number of overly powerful players. What’s the solution? There’s actually a thing that already exists, that almost everyone is familiar with and that already works like this. It’s email. There are a hundred thousand email servers, but my email can always find your inbox if I know your address because that address identifies both you and the email service you use, and they communicate using the same protocol, Simple Mail Transfer Protocol (SMTP)1. I can’t send a message to your Twitter from my Facebook though, because they’re completely incompatible, like oil and water. Facebook has no idea how to talk to Twitter and vice versa (and the companies that control them have zero interest in such interoperability anyway). Just like email, a federated social media service like Mastodon allows you to use any compatible server, or even run your own, and follow accounts on your home server or anywhere else, even servers running different software as long as they use the same ActivityPub protocol. There’s no lock-in because you can move to another server any time you like, and interact with all the same people from your new home, just like changing your email address. Smaller servers mean that no one server ends up with enough power to take over and control everything, as the social media giants do with their own platforms. But at the same time, a small server with a small moderator team can enforce local policy much more easily and block accounts or whole servers that host trolls, nazis or other poisonous people. How do I try it? I have no problem with anyone for choosing to continue to use what we’re already calling “traditional” social media; frankly, Facebook and Twitter are still useful for me to keep in touch with a lot of my friends. 
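To make the email analogy concrete, here is a minimal sketch of the lookup a federated server performs to find someone on another server — the cross-server step that closed platforms simply don’t have. It assumes the Python requests library and a made-up address, alice@example.social; Mastodon and other ActivityPub servers answer exactly this kind of WebFinger query.

```python
# Minimal sketch: how one fediverse server finds another user's ActivityPub
# profile, much as SMTP servers route mail by the domain in an email address.
# Assumes the `requests` library; alice@example.social is a made-up address.
import requests

user, domain = "alice", "example.social"

# Step 1: WebFinger lookup on the user's home server
webfinger = requests.get(
    f"https://{domain}/.well-known/webfinger",
    params={"resource": f"acct:{user}@{domain}"},
    timeout=10,
).json()

# Step 2: pick out the ActivityPub actor document from the returned links
actor_url = next(
    link["href"]
    for link in webfinger["links"]
    if link.get("rel") == "self" and link.get("type") == "application/activity+json"
)

# Step 3: fetch the actor document itself (profile, inbox, outbox, ...)
actor = requests.get(
    actor_url, headers={"Accept": "application/activity+json"}, timeout=10
).json()
print(actor.get("preferredUsername"), actor.get("inbox"))
```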
However, I do think it’s useful to know some of the alternatives if only to make a more informed decision to stick with your current choices. Most of these services only ask for an email address when you sign up, and use of your real name vs a pseudonym is entirely optional, so there’s not really any risk in signing up and giving one a try. That said, make sure you take sensible precautions like not reusing a password from another account.

Instead of…                          Try…
Twitter, Facebook                    Mastodon, Pleroma, Misskey
Slack, Discord, IRC                  Matrix
WhatsApp, FB Messenger, Telegram     Also Matrix
Instagram, Flickr                    PixelFed
YouTube                              PeerTube
The web                              Interplanetary File System (IPFS)

1. Which, if you can believe it, was formalised nearly 40 years ago in 1982 and has only had fairly minor changes since then! ↩︎

Collaborations Workshop 2021: collaborative ideas & hackday

My last post covered the more “traditional” lectures-and-panel-sessions approach of the first half of the SSI Collaborations Workshop. The rest of the workshop was much more interactive, consisting of a discussion session, a Collaborative Ideas session, and a whole-day hackathon! The discussion session on day one had us choose a topic (from a list of topics proposed leading up to the workshop) and join a breakout room for that topic with the aim of producing a “speed blog” by the end of 90 minutes. Those speed blogs will be published on the SSI blog over the coming weeks, so I won’t go into that in more detail. The Collaborative Ideas session is a way of generating hackday ideas, by putting people together at random into small groups to each raise a topic of interest to them before discussing and coming up with a combined idea for a hackday project. Because of the serendipitous nature of the groupings, it’s a really good way of generating new ideas from unexpected combinations of individual interests. After that, all the ideas from the session, along with a few others proposed by various participants, were pitched as ideas for the hackday and people started to form teams. Not every idea pitched gets worked on during the hackday, but in the end 9 teams of roughly equal size formed to spend the third day working together.

My team’s project: “AHA! An Arts & Humanities Adventure”

There’s a lot of FOMO around choosing which team to join for an event like this: there were so many good ideas and I wanted to work on several of them! In the end I settled on a team developing an escape room concept to help Arts & Humanities scholars understand the benefits of working with research software engineers for their research. Five of us rapidly mapped out an example storyline for an escape room, got a website set up with GitHub and populated it with the first few stages of the game. We decided to focus on a story that would help the reader get to grips with what an API is, and I’m amazed how much we managed to get done in less than a day’s work! You can try playing through the escape room (so far) yourself on the web, or take a look at the GitHub repository, which contains the source of the website along with a list of outstanding tasks to work on if you’re interested in contributing. I’m not sure yet whether this project has enough momentum to keep going, but it was a really valuable way both of getting to know and building trust with some new people and demonstrating the concept is worth more work.

Other projects

Here’s a brief rundown of the other projects worked on by teams on the day.
Coding Confessions: Everyone starts somewhere and everyone cuts corners from time to time. Real developers copy and paste! Fight imposter syndrome by looking through some of these confessions or contributing your own. https://coding-confessions.github.io/

CarpenPI: A template to set up a Raspberry Pi with everything you need to run a Carpentries (https://carpentries.org/) data science/software engineering workshop in a remote location without internet access. https://github.com/CarpenPi/docs/wiki

Research Dugnads: A guide to running an event that is a coming together of a research group or team to share knowledge, pass on skills, tidy and review code, among other software and working best practices (based on the Norwegian concept of a dugnad, a form of “voluntary work done together with other people”). https://research-dugnads.github.io/dugnads-hq/

Collaborations Workshop ideas: A meta-project to collect together pitches and ideas from previous Collaborations Workshop conferences and hackdays, to analyse patterns and revisit ideas whose time might now have come. https://github.com/robintw/CW-ideas

howDescribedIs: Improve the machine-readable metadata attached to open research projects by integrating existing tools like SOMEF, codemeta.json and HowFAIRIs (https://howfairis.readthedocs.io/en/latest/index.html). Complete with CI and badges! https://github.com/KnowledgeCaptureAndDiscovery/somef-github-action

Software end-of-project plans: Develop a template to plan and communicate what will happen when the fixed-term project funding for your research software ends. Will maintenance continue? When will the project sunset? Who owns the IP? https://github.com/elichad/software-twilight

Habeas Corpus: A corpus of machine-readable data about software used in COVID-19 related research, based on the CORD19 dataset. https://github.com/softwaresaved/habeas-corpus

Credit-all: Extend the all-contributors GitHub bot (https://allcontributors.org/) to include rich information about research project contributions, such as the CASRAI Contributor Roles Taxonomy (https://casrai.org/credit/). https://github.com/dokempf/credit-all

I’m excited to see so many metadata-related projects! I plan to take a closer look at what the Habeas Corpus, Credit-all and howDescribedIs teams did when I get time. I also really want to try running a dugnad with my team or for the GLAM Data Science network.

Collaborations Workshop 2021: talks & panel session

I’ve just finished attending (online) the three days of this year’s SSI Collaborations Workshop (CW for short), and once again it’s been a brilliant experience, as well as mentally exhausting, so I thought I’d better get a summary down while it’s still fresh in my mind. Collaborations Workshop is, as the name suggests, much more focused on facilitating collaborations than a typical conference, and has settled into a structure that starts off with longer keynotes and lectures, and progressively gets more interactive, culminating with a hack day on the third day. That’s a lot to write about, so for this post I’ll focus on the talks and panel session, and follow up with another post about the collaborative bits. I’ll also probably need to come back and add in more links to bits and pieces once slides and the “official” summary of the event become available.
Updates

2021-04-07: Added links to recordings of keynotes and panel sessions

Provocations

The first day began with two keynotes on this year’s main themes: FAIR Research Software and Diversity & Inclusion, and day 2 had a great panel session focused on disability. All three were streamed live and the recordings remain available on YouTube:

- View the keynotes recording; Google-free alternative link
- View the panel session recording; Google-free alternative link

FAIR Research Software

Dr Michelle Barker, Director of the Research Software Alliance, spoke on the challenges to recognition of software as part of the scholarly record: software is not often cited. The FAIR4RS working group has been set up to investigate and create guidance on how the FAIR Principles for data can be adapted to research software as well; as they stand, the Principles are not ideally suited to software. This work will only be the beginning though, as we will also need metrics, training, career paths and much more. ReSA itself has 3 focus areas: people, policy and infrastructure. If you’re interested in getting more involved in this, you can join the ReSA email list.

Equality, Diversity & Inclusion: how to go about it

Dr Chonnettia Jones, Vice President of Research, Michael Smith Foundation for Health Research, spoke extensively and persuasively on the need for Equality, Diversity & Inclusion (EDI) initiatives within research, as there is abundant robust evidence that all research outcomes are improved. She highlighted the difficulties current approaches to EDI have in effecting structural change, and in changing not just individual behaviours but the cultures & practices that perpetuate iniquity. While initiatives are often constructed around making up for individual deficits, a better framing is to start from an understanding of individuals having equal stature but different lived experiences. Commenting on the current focus on “research excellence”, she pointed out that the hyper-competition this promotes is deeply unhealthy, suggesting instead that true excellence requires diversity, and we should focus on an inclusive excellence driven by inclusive leadership.

Equality, Diversity & Inclusion: disability issues

Day 2’s EDI panel session brought together five disabled academics to discuss the problems of disability in research:

- Dr Becca Wilson, UKRI Innovation Fellow, Institute of Population Health Science, University of Liverpool (Chair)
- Phoenix C S Andrews (PhD Student, Information Studies, University of Sheffield and Freelance Writer)
- Dr Ella Gale (Research Associate and Machine Learning Subject Specialist, School of Chemistry, University of Bristol)
- Prof Robert Stevens (Professor and Head of Department of Computer Science, University of Manchester)
- Dr Robin Wilson (Freelance Data Scientist and SSI Fellow)

NB. The discussion flowed quite freely, so the following summary mixes up input from all the panel members.

Researchers are often assumed to be single-minded in following their research calling, and aptness for jobs is often partly judged on “time served”, which disadvantages any disabled person who has been forced to take a career break. On top of this, disabled people are often time-poor because of the extra time needed to manage their condition, leaving them with less “output” to show for their time served on many common metrics. This can particularly affect early-career researchers, since resources for these are often restricted on a “years-since-PhD” criterion.
Time poverty also makes funding with short deadlines that much harder to apply for. Employers add more demands right from the start: new starters are typically expected to complete a health and safety form, generally a brief affair that will suddenly become an 80-page bureaucratic nightmare if you tick the box declaring a disability. Many employers claim to be inclusive yet utterly fail to understand the needs of their disabled staff. Wheelchairs are liberating for those who use them (despite the awful but common phrase “wheelchair-bound”) and yet employers will refuse to insure a wheelchair while travelling for work, classifying it as a “high value personal item” that the owner would take the same responsibility for as an expensive camera. Computers open up the world for blind people in a way that was never possible without them, but it’s not unusual for mandatory training to be inaccessible to screen readers. Some of these barriers can be overcome, but doing so takes yet more time that could and should be spent on more important work. What can we do about it? Academia works on patronage whether we like it or not, so be the person who supports people who are different to you rather than focusing on the one you “recognise yourself in” to mentor. As a manager, it’s important to ask each individual what they need and believe them: they are the expert in their own condition and their lived experience of it. Don’t assume that because someone else in your organisation with the same disability needs one set of accommodations, it’s invalid for your staff member to require something totally different. And remember: disability is unusual as a protected characteristic in that anyone can acquire it at any time without warning! Lightning talks Lightning talk sessions are always tricky to summarise, and while this doesn’t do them justice, here are a few highlights from my notes. Data & metadata Malin Sandstrom talked about a much-needed refinement of contributor role taxonomies for scientific computing Stephan Druskat showcased a project to crowdsource a corpus of research software for further analysis Learning & teaching/community Matthew Bluteau introduced the concept of the “coding dojo” as a way to enhance community of practice. A group of coders got together to practice & learn by working together to solve a problem and explaining their work as they go He described 2 models: a code jam, where people work in small groups, and the Randori method, where 2 people do pair programming while the rest observe. I’m excited to try this out! Steve Crouch talked about intermediate skills and helping people take the next step, which I’m also very interested in with the GLAM Data Science network Esther Plomp recounted experience of running multiple Carpentry workshops online, while Diego Alonso Alvarez discussed planned workshops on making research software more usable with GUIs Shoaib Sufi showcased the SSI’s new event organising guide Caroline Jay reported on a diary study into autonomy & agency in RSE during COVID Lopez, T., Jay, C., Wermelinger, M., & Sharp, H. (2021). How has the covid-19 pandemic affected working conditions for research software engineers? Unpublished manuscript. Wrapping up That’s not everything! But this post is getting pretty long so I’ll wrap up for now. I’ll try to follow up soon with a summary of the “collaborative” part of Collaborations Workshop: the idea-generating sessions and hackday! Time for a new look... 
I’ve decided to try switching this website back to using Hugo to manage the content and generate the static HTML pages. I’ve been on the Python-based Nikola for a few years now, but recently I’ve been finding it quite slow, and very confusing to understand how to do certain things. I used Hugo recently for the GLAM Data Science Network website and found it had come on a lot since the last time I was using it, so I thought I’d give it another go, and redesign this site to be a bit more minimal at the same time. The theme is still a work in progress so it’ll probably look a bit rough around the edges for a while, but I think I’m happy enough to publish it now. When I get round to it I might publish some more detailed thoughts on the design.

Ideas for Accessible Communications

The Disability Support Network at work recently ran a survey on “accessible communications”, to develop guidance on how to make communications (especially internal staff comms) more accessible to everyone. I grabbed a copy of my submission because I thought it would be useful to share more widely, so here it is. Please note that these are based on my own experiences only. I am in no way suggesting that these are the only things you would need to do to ensure your communications are fully accessible. They’re just some things to keep in mind.

Policies/procedures/guidance can be stressful to use if anything is vague or inconsistent, or if it looks like there might be more information implied than is explicitly given (a common cause of this is use of jargon in e.g. HR policies). Emails relating to these policies have similar problems, made worse because they tend to be very brief. Online meetings can be very helpful, but can also be exhausting, especially if there are too many people, or not enough structure. Larger meetings & webinars without agendas (or where the agenda is ignored, or timings are allowed to drift without acknowledgement) are very stressful, as are those where there is not enough structure to ensure fair opportunities to contribute.

Written reference documents and communications should:

- Be carefully checked for consistency and clarity
- Have all key points explicitly stated
- Explicitly acknowledge the need for flexibility where it is necessary, rather than implying or hinting at it
- Clearly define jargon & acronyms where they are necessary to the point being made, and avoid them otherwise
- Include links to longer, more explicit versions where space is tight
- Provide clear bullet-point summaries with links to the details

Online meetings should:

- Include sufficient break time (at least 10 minutes out of every hour) and not allow this to be compromised just because a speaker has misjudged the length of their talk
- Include initial “settling-in” time in agendas to avoid timing getting messed up from the start
- Ensure the agenda is stuck to, or that divergence from the agenda is acknowledged explicitly by the chair and updated timing briefly discussed to ensure everyone is clear
- Establish a norm for participation at the start of the meeting and stick to it, e.g.
  ask people to raise hands when they have a point to make, or have specific time for round-robin contributions
- Ensure quiet/introverted people have space to contribute, but don’t force them to do so if they have nothing to add at the time
- Offer a text-based alternative to contributing verbally
- If appropriate, at the start of the meeting assign specific roles of:
  - Gatekeeper: ensures everyone has a chance to contribute
  - Timekeeper: ensures the meeting runs to time
  - Scribe: ensures a consistent record of the meeting
- Be chaired by someone with the confidence to enforce the above: offer training to all staff on chairing meetings to ensure everyone has the skills to run a meeting effectively

Matrix self-hosting

I started running my own Matrix server a little while ago. Matrix is something rather cool, a chat system similar to IRC or Slack, but open and federated. Open in that the standard is available for anyone to view, but also the reference implementations of server and client are open source, along with many other clients and a couple of nascent alternative servers. Federated in that, like email, it doesn’t matter what server you sign up with, you can talk to users on your own or any other server. I decided to host my own for three reasons. Firstly, to see if I could and to learn from it. Secondly, to try and rationalise the Cambrian explosion of Slack teams I was being added to in 2019. Thirdly, to take some control of the loss of access to historical messages in some communities that rely on Slack (especially the Carpentries and RSE communities). Since then, I’ve also added a fourth goal: taking advantage of various bridges to bring the other messaging networks I use (such as Signal and Telegram) into a consistent UI. I’ve also found that my use of Matrix-only rooms has grown as more individuals & communities have adopted the platform. So, I really like Matrix and I use it daily. My problem now is whether to keep self-hosting. Synapse, the only full server implementation at the moment, is really heavy on memory, so I’ve ended up running it on a much bigger server than I thought I’d need, which seems overkill for a single-user instance. So now I have to make a decision about whether it’s worth keeping going, or shutting it down and going back to matrix.org, or setting up on one of the other servers that have sprung up in the last couple of years. There are a couple of other considerations here. Firstly, Synapse resource usage is entirely down to the size of the rooms joined by users of the homeserver, not directly the number of users. So if users have mostly overlapping interests, and thus keep to the same rooms, you can support quite a large community without significant extra resource usage. Secondly, there are a couple of alternative server implementations in development specifically addressing this issue for small servers: Dendrite and Conduit. Neither is quite ready for what I want yet, but both are getting close, and when ready they will allow running small homeservers with much more sensible resource usage. So I could start opening up for other users, and at least justify the size of the server that way. I wouldn’t ever want to make it a paid-for service but perhaps people might be willing to make occasional donations towards running costs. That still leaves me with the question of whether I’m comfortable running a service that others may come to rely on, or being responsible for the safety of their information.
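As an aside, one nice thing about Matrix being an ordinary HTTP-based protocol is that you can poke at any homeserver from the outside, which is handy when weighing up matrix.org against self-hosting. A minimal sketch, assuming the Python requests library and using example.org as a placeholder domain:

```python
# Minimal sketch: check a Matrix homeserver from the outside.
# Assumes the `requests` library; example.org is a placeholder domain.
import requests

domain = "example.org"

# Many domains delegate their Matrix traffic to a homeserver hosted elsewhere
# via a .well-known file (e.g. example.org -> matrix.example.org).
try:
    well_known = requests.get(
        f"https://{domain}/.well-known/matrix/client", timeout=10
    ).json()
    base_url = well_known["m.homeserver"]["base_url"].rstrip("/")
except Exception:
    base_url = f"https://{domain}"  # no delegation: talk to the domain directly

# Every spec-compliant homeserver (Synapse, Dendrite, Conduit, ...) answers this.
versions = requests.get(f"{base_url}/_matrix/client/versions", timeout=10).json()
print(f"{domain} is served by {base_url}")
print("Supported client-server spec versions:", versions["versions"])
```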
I could also hold out for Dendrite or Conduit to mature enough that I’m ready to try them, which might not be more than a few months off. Hmm, seems like I’ve convinced myself to stick with it for now, and we’ll see how it goes. In the meantime, if you know me and you want to try it out let me know and I might risk setting you up with an account!

What do you miss least about pre-lockdown life?

@JanetHughes on Twitter: What do you miss the least from pre-lockdown life? I absolutely do not miss wandering around the office looking for a meeting room for a confidential call or if I hadn’t managed to book a room in advance. Let’s never return to that joyless frustration, hey? 10:27 AM · Feb 3, 2021

After seeing Terence Eden taking Janet Hughes' tweet from earlier this month as a writing prompt, I thought I might do the same. The first thing that leaps to my mind is commuting. At various points in my life I’ve spent between one and three hours a day travelling to and from work and I’ve never more than tolerated it at best. It steals time from your day, and societal norms dictate that it’s your leisure & self-care time that must be sacrificed. Longer commutes allow more time to get into a book or podcast, especially if not driving, but I’d rather have that time at home than trying to be comfortable in a train seat designed for some mythical average man shaped nothing like me! The other thing I don’t miss is the colds and flu! Before the pandemic, British culture encouraged working even when ill, which meant constantly coming into contact with people carrying low-grade viruses. I’m not immunocompromised, but some allergies and the residue of being asthmatic as a child meant that I would get sick 2-3 times a year. A pleasant side-effect of the COVID precautions we’re all taking is that I haven’t been sick for over 12 months now, which is amazing! Finally, I don’t miss having so little control over my environment. One of the things that working from home has made clear is that there are certain unavoidable aspects of working in my shared office that cause me sensory stress, and that are completely unrelated to my work. Working (or trying to work) next to a noisy automatic scanner; trying to find a light level that works for 6 different people doing different tasks; lacking somewhere quiet and still to eat lunch and recover from a morning of meetings or the constant vaguely-distracting bustle of a large shared office. It all takes energy. Although it’s partly been replaced by the new stress of living through a global pandemic, that old stress was a constant drain on my productivity and mood that had been growing throughout my career as I moved (ironically, given the common assumption that seniority leads to more privacy) into larger and larger open plan offices.

Remarkable blogging

And the handwritten blog saga continues, as I’ve just received my new reMarkable 2 tablet, which is designed for reading, writing and nothing else. It uses a super-responsive e-ink display and writing on it with a stylus is a dream. It has a slightly rough texture with just a bit of friction that makes my writing come out a lot more legibly than on a slippery glass touchscreen. If that was all there was to it, I might not have wasted my money, but it turns out that it runs on Linux and the makers have wisely decided not to lock it down but to give you full root access. Yes, you read that right: root access.
It presents as an ethernet device over USB, so you can SSH in with a password found in the settings and have full control over your own devices. What a novel concept. This fact alone has meant it’s built a small yet devoted community of users who have come up with some clever ways of extending its functionality. In fact, many of these are listed on this GitHub repository. Finally, from what I’ve seen so far, the handwriting recognition is impressive to say the least. This post was written on it and needed only a little editing. I think this is a device that will get a lot of use! GLAM Data Science Network fellow travellers Updates 2021-02-04 Thanks to Gene @dzshuniper@ausglam.space for suggesting ADHO and a better attribution for the opening quote (see comments below for details) See comments & webmentions for details. “If you want to go fast, go alone. If you want to go far, go together.” — African proverb, probably popularised in English by Kenyan church leader Rev. Samuel Kobia (original) This quote is a popular one in the Carpentries community, and I interpret it in this context to mean that a group of people working together is more sustainable than individuals pursuing the same goal independently. That’s something that speaks to me, and that I want to make sure is reflected in nurturing this new community for data science in galleries, archives, libraries & museums (GLAM). To succeed, this work needs to be complementary and collaborative, rather than competitive, so I want to acknowledge a range of other networks & organisations whose activities complement this. The rest of this article is an unavoidably incomplete list of other relevant organisations whose efforts should be acknowledged and potentially built on. And it should go without saying, but just in case: if the work I’m planning fits right into an existing initiative, then I’m happy to direct my resources there rather than duplicate effort. Inspirations & collaborators Groups with similar goals or undertaking similar activities, but focused on a different sector, geographic area or topic. I think we should make as much use of and contribution to these existing communities as possible since there will be significant overlap. code4lib Probably the closest existing community to what I want to build, but primarily based in the US, so timezones (and physical distance for in-person events) make it difficult to participate fully. This is a well-established community though, with regular events including an annual conference so there’s a lot to learn here. newCardigan Similar to code4lib but an Australian focus, so the timezone problem is even bigger! GLAM Labs Focused on supporting the people experimenting with and developing the infrastructure to enable scholars to access GLAM materials in new ways. In some ways, a GLAM data science network would be complementary to their work, by providing people not directly involved with building GLAM Labs with the skills to make best use of GLAM Labs infrastructure. UK Government data science community Another existing community with very similar intentions, but focused on UK Government sector. Clearly the British Library and a few national & regional museums & archives fall into this, but much of the rest of the GLAM sector does not. 
Artificial Intelligence for Libraries, Archives & Museums (AI4LAM): A multinational collaboration between several large libraries, archives and museums with a specific focus on the Artificial Intelligence (AI) subset of data science.

UK Reproducibility Network: A network of researchers, primarily in HEIs, with an interest in improving the transparency and reliability of academic research. Mostly science-focused but with some overlap of goals around ethical and robust use of data.

Museums Computer Group: I’m less familiar with this than the others, but it seems to have a wider focus on technology generally, within the slightly narrower scope of museums specifically. Again, a lot of potential for collaboration.

Training

Several organisations and looser groups exist specifically to develop and deliver training that will be relevant to members of this network. The network also presents an opportunity for those who have done a workshop with one of these and want to know what the “next steps” are to continue their data science journey.

- The Carpentries, aka: Library Carpentry, Data Carpentry, Software Carpentry
- Data Science Training for Librarians (DST4L)
- The Programming Historian
- CDH Cultural Heritage Data School

Supporters

These mission-driven organisations have goals that align well with what I imagine for the GLAM DSN, but operate at a more strategic level. They work by providing expert guidance and policy advice, lobbying and supporting specific projects with funding and/or effort. In particular, the SSI runs a fellowship programme which is currently providing a small amount of funding to this project.

- Digital Preservation Coalition (DPC)
- Software Sustainability Institute (SSI)
- Research Data Alliance (RDA)
- Alliance of Digital Humanities Organizations (ADHO) … and its Libraries and Digital Humanities Special Interest Group (Lib&DH SIG)

Professional bodies

These organisations exist to promote the interests of professionals in particular fields, including supporting professional development. I hope they will provide communication channels to their various members at the least, and may be interested in supporting more directly, depending on their mission and goals.

- Society of Research Software Engineering
- Chartered Institute of Library and Information Professionals
- Archives & Records Association
- Museums Association

Conclusion

As I mentioned at the top of the page, this list cannot possibly be complete. This is a growing area and I’m not the only or first person to have this idea. If you can think of anything glaring that I’ve missed and you think should be on this list, leave a comment or tweet/toot at me!

A new font for the blog

I’ve updated my blog theme to use the quasi-proportional fonts Iosevka Aile and Iosevka Etoile. I really like the aesthetic, as they look like fixed-width console fonts (I use the true fixed-width version of Iosevka in my terminal and text editor) but they’re actually proportional, which makes them easier to read. https://typeof.net/Iosevka/

Training a model to recognise my own handwriting

If I’m going to train an algorithm to read my weird & awful writing, I’m going to need a decent-sized training set to work with. And since one of the main things I want to do with it is to blog “by hand” it makes sense to focus on that type of material for training. In other words, I need to write out a bunch of blog posts on paper, scan them and transcribe them as ground truth.
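To give a feel for what that ground truth might look like once assembled (this layout is my own illustration, not something Transkribus prescribes), here is a rough sketch that pairs each scanned page with its transcription in a simple manifest; the folder names and the post-001.png/post-001.txt convention are hypothetical.

```python
# Sketch only: pair scanned page images with their transcriptions into a
# simple CSV manifest for training. The scans/ and transcripts/ folders and
# their naming convention (post-001.png <-> post-001.txt) are hypothetical.
import csv
from pathlib import Path

scans = Path("scans")              # post-001.png, post-002.png, ...
transcripts = Path("transcripts")  # post-001.txt, post-002.txt, ...

with open("ground-truth.csv", "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(["image", "text"])
    for image in sorted(scans.glob("*.png")):
        text_file = transcripts / (image.stem + ".txt")
        if not text_file.exists():
            print(f"No transcription yet for {image.name}, skipping")
            continue
        writer.writerow([str(image), text_file.read_text(encoding="utf-8").strip()])
```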
The added bonus of this plan is that after transcribing, I also end up with some digital text I can use as an actual post — multitasking! So, by the time you read this, I will have already run it through a manual transcription process using Transkribus to add it to my training set, and copy-pasted it into emacs for posting. This is a fun little project because it means I can: Write more by hand with one of my several nice fountain pens, which I enjoy Learn more about the operational process some of my colleagues go through when digitising manuscripts Learn more about the underlying technology & maths, and how to tune the process Produce more lovely content! For you to read! Yay! Write in a way that forces me to put off editing until after a first draft is done and focus more on getting the whole of what I want to say down. That’s it for now — I’ll keep you posted as the project unfolds. Addendum Tee hee! I’m actually just enjoying the process of writing stuff by hand in long-form prose. It’ll be interesting to see how the accuracy turns out and if I need to be more careful about neatness. Will it be better or worse than the big but generic models used by Samsung Notes or OneNote. Maybe I should include some stylus-written text for comparison. Blogging by hand I wrote the following text on my tablet with a stylus, which was an interesting experience: So, thinking about ways to make writing fun again, what if I were to write some of them by hand? I mean I have a tablet with a pretty nice stylus, so maybe handwriting recognition could work. One major problem, of course, is that my handwriting is AWFUL! I guess I’ll just have to see whether the OCR is good enough to cope… It’s something I’ve been thinking about recently anyway: I enjoy writing with a proper fountain pen, so is there a way that I can have a smooth workflow to digitise handwritten text without just typing it back in by hand? That would probably be preferable to this, which actually seems to work quite well but does lead to my hand tensing up to properly control the stylus on the almost-frictionless glass screen. I’m surprised how well it worked! Here’s a sample of the original text: And here’s the result of converting that to text with the built-in handwriting recognition in Samsung Notes: Writing blog posts by hand So, thinking about ways to make writing fun again, what if I were to write some of chum by hand? I mean, I have a toldest winds a pretty nice stylus, so maybe handwriting recognition could work. One major problems, ofcourse, is that my , is AWFUL! Iguess I’ll just have to see whattime the Ocu is good enough to cope… It’s something I’ve hun tthinking about recently anyway: I enjoy wilting with a proper fountain pion, soischeme a way that I can have a smooch workflow to digitise handwritten text without just typing it back in by hand? That wouldprobally be preferableto this, which actually scams to work quito wall but doers load to my hand tensing up to properly couldthe stylus once almost-frictionlessg lass scream. It’s pretty good! It did require a fair bit of editing though, and I reckon we can do better with a model that’s properly trained on a large enough sample of my own handwriting. What I want from a GLAM/Cultural Heritage Data Science Network Introduction As I mentioned last year, I was awarded a Software Sustainability Institute Fellowship to pursue the project of setting up a Cultural Heritage/GLAM data science network. 
Obviously, the global pandemic has forced a re-think of many plans and this is no exception, so I’m coming back to reflect on it and make sure I’m clear about the core goals so that everything else still moves in the right direction. One of the main reasons I have for setting up a GLAM data science network is because it’s something I want. The advice to “scratch your own itch” is often given to people looking for an open project to start or contribute to, and the lack of a community of people with whom to learn & share ideas and practice is something that itches for me very much. The “motivation” section in my original draft project brief for this work said: Cultural heritage work, like all knowledge work, is increasingly data-based, or at least gives opportunities to make use of data day-to-day. The proper skills to use this data enable more effective working. Knowledge and experience thus gained improves understanding of and empathy with users also using such skills. But of course, I have my own reasons for wanting to do this too. In particular, I want to: Advocate for the value of ethical, sustainable data science across a wide range of roles within the British Library and the wider sector Advance the sector to make the best use of data and digital sources in the most ethical and sustainable way possible Understand how and why people use data from the British Library, and plan/deliver better services to support that Keep up to date with relevant developments in data science Learn from others' skills and experiences, and share my own in turn Those initial goals imply some further supporting goals: Build up the confidence of colleagues who might benefit from data science skills but don’t feel they are “technical” or “computer literate” enough Further to that, build up a base of colleagues with the confidence to share their skills & knowledge with others, whether through teaching, giving talks, writing or other channels Identify common awareness gaps (skills/knowledge that people don’t know they’re missing) and address them Develop a communal space (primarily online) in which people feel safe to ask questions Develop a body of professional practice and help colleagues to learn and contribute to the evolution of this, including practices of data ethics, software engineering, statistics, high performance computing, … Break down language barriers between data scientists and others I’ll expand on this separately as my planning develops, but here are a few specific activities that I’d like to be able to do to support this: Organise less-formal learning and sharing events to complement the more formal training already available within organisations and the wider sector, including “show and tell” sessions, panel discussions, code cafés, masterclasses, guest speakers, reading/study groups, co-working sessions, … Organise training to cover intermediate skills and knowledge currently missing from the available options, including the awareness gaps and professional practice mentioned above Collect together links to other relevant resources to support self-led learning Decisions to be made There are all sorts of open questions in my head about this right now, but here are some of the key ones. Is it GLAM or Cultural Heritage? When I first started planning this whole thing, I went with “Cultural Heritage”, since I was pretty transparently targeting my own organisation. The British Library is fairly unequivocally a CH organisation. 
But as I’ve gone along I’ve found myself gravitating more towards the term “GLAM” (which stands for Galleries, Libraries, Archives, Museums) as it covers a similar range of work but is clearer (when you spell out the acronym) about what kinds of work are included. What skills are relevant? This turns out to be surprisingly important, at least in terms of how the community is described, as they define the boundaries of the community and can be the difference between someone feeling welcome or excluded. For example, I think that some introductory statistics training would be immensely valuable for anyone working with data to understand what options are open to them and what limitations those options have, but is the word “statistics” offputting per se to those who’ve chosen a career in arts & humanities? I don’t know because I don’t have that background and perspective. Keep it internal to the BL, or open up early on? I originally planned to focus primarily on my own organisation to start with, feeling that it would be easier to organise events and build a network within a single organisation. However, the pandemic has changed my thinking significantly. Firstly, it’s now impossible to organise in-person events and that will continue for quite some time to come, so there is less need to focus on the logistics of getting people into the same room. Secondly, people within the sector are much more used to attending remote events, which can easily be opened up to multiple organisations in many countries, timezones allowing. It now makes more sense to focus primarily on online activities, which opens up the possibility of building a critical mass of active participants much more quickly by opening up to the wider sector. Conclusion This is the type of post that I could let run and run without ever actually publishing, but since it’s something I need feedback and opinions on from other people, I’d better ship it! I really want to know what you think about this, whether you feel it’s relevant to you and what would make it useful. Comments are open below, or you can contact me via Mastodon or Twitter. Writing About Not Writing Under Construction Grunge Sign by Nicolas Raymond — CC BY 2.0 Every year, around this time of year, I start doing two things. First, I start thinking I could really start to understand monads and write more than toy programs in Haskell. This is unlikely to ever actually happen unless and until I get a day job where I can justify writing useful programs in Haskell, but Advent of Code always gets me thinking otherwise. Second, I start mentally writing this same post. You know, the one about how the blogger in question hasn’t had much time to write but will be back soon? “Sorry I haven’t written much lately…” It’s about as cliché as a Geocities site with a permanent “Under construction” GIF. At some point, not long after the dawn of ~time~ the internet, most people realised that every website was permanently under construction and publishing something not ready to be published was just pointless. So I figured this year I’d actually finish writing it and publish it. After all, what’s the worst that could happen? If we’re getting all reflective about this, I could probably suggest some reasons why I’m not writing much: For a start, there’s a lot going on in both my world and The World right now, which doesn’t leave a lot of spare energy after getting up, eating, housework, working and a few other necessary activities. 
As a result, I’m easily distracted and I tend to let myself get dragged off in other directions before I even get to writing much of anything. If I do manage to focus on this blog in general, I’ll often end up working on some minor tweak to the theme or functionality. I mean, right now I’m wondering if I can do something clever in my text-editor (Emacs, since you’re asking) to streamline my writing & editing process so it’s more elegant, efficient, ergonomic and slightly closer to perfect in every way. It also makes me much more likely to self-censor, and to indulge my perfectionist tendencies to try and tweak the writing until it’s absolutely perfect, which of course never happens. I’ve got a whole heap of partly-written posts that are juuuust waiting for the right motivation for me to just finish them off. The only real solution is to accept that: I’m not going to write much and that’s probably OK What I do write won’t always be the work of carefully-researched, finely crafted genius that I want it to be, and that’s probably OK too Also to remember why I started writing and publishing stuff in the first place: to reflect and get my thoughts out onto a (virtual) page so that I can see them, figure out whether I agree with myself and learn; and to stimulate discussion and get other views on my (possibly uninformed, incorrect or half-formed) thoughts, also to learn. In other words, a thing I do for me. It’s easy to forget that and worry too much about whether anyone else wants to read my s—t. Will you notice any changes? Maybe? Maybe not? Who knows. But it’s a new year and that’s as good a time for a change as any. When is a persistent identifier not persistent? Or an identifier? I wrote a post on the problems with ISBNs as persistent identifiers (PIDS) for work, so check it out if that sounds interesting. IDCC20 reflections I’m just back from IDCC20, so here are a few reflections on this year’s conference. You can find all the available slides and links to shared notes on the conference programme. There’s also a list of all the posters and an overview of the Unconference Skills for curation of diverse datasets Here in the UK and elsewhere, you’re unlikely to find many institutions claiming to apply a deep level of curation to every dataset/software package/etc deposited with them. There are so many different kinds of data and so few people in any one institution doing “curation” that it’s impossible to do this for everything. Absent the knowledge and skills required to fully evaluate an object the best that can be done is usually to make a sense check on the metadata and flag up with the depositor potential for high-level issues such as accidental disclosure of sensitive personal information. The Data Curation Network in the United States is aiming to address this issue by pooling expertise across multiple organisations. The pilot has been highly successful and they’re now looking to obtain funding to continue this work. The Swedish National Data Service is experimenting with a similar model, also with a lot of success. As well as sharing individual expertise, the DCN collaboration has also produced some excellent online quick-reference guides for curating common types of data. We had some further discussion as part of the Unconference on the final day about what it would look like to introduce this model in the UK. There was general agreement that this was a good idea and a way to make optimal use of sparse resources. 
There were also very valid concerns that it would be difficult in the current financial climate for anyone to justify doing work for another organisation, apparently for free. In my mind there are two ways around this, which are not mutually exclusive by any stretch of the imagination. First is to Just Do It: form an informal network of curators around something simple like a mailing list, and give it a try. Second is for one or more trusted organisations to provide some coordination and structure. There are several candidates for this including DCC, Jisc, DPC and the British Library; we all have complementary strengths in this area so it’s my hope that we’ll be able to collaborate around it. In the meantime, I hope the discussion continues. Artificial intelligence, machine learning et al As you might expect at any tech-oriented conference there was a strong theme of AI running through many presentations, starting from the very first keynote from Francine Berman. Her talk, The Internet of Things: Utopia or Dystopia? used self-driving cars as a case study to unpack some of the ethical and privacy implications of AI. For example, driverless cars can potentially increase efficiency, both through route-planning and driving technique, but also by allowing fewer vehicles to be shared by more people. However, a shared vehicle is not a private space in the way your own car is: anything you say or do while in that space is potentially open to surveillance. Aside from this, there are some interesting ideas being discussed, particularly around the possibility of using machine learning to automate increasingly complex actions and workflows such as data curation and metadata enhancement. I didn’t get the impression anyone is doing this in the real world yet, but I’ve previously seen theoretical concepts discussed at IDCC make it into practice so watch this space! Playing games! Training is always a major IDCC theme, and this year two of the most popular conference submissions described games used to help teach digital curation concepts and skills. Mary Donaldson and Matt Mahon of the University of Glasgow presented their use of Lego to teach the concept of sufficient metadata. Participants build simple models before documenting the process and breaking them down again. Then everyone had to use someone else’s documentation to try and recreate the models, learning important lessons about assumptions and including sufficient detail. Kirsty Merrett and Zosia Beckles from the University of Bristol brought along their card game “Researchers, Impact and Publications (RIP)”, based on the popular “Cards Against Humanity”. RIP encourages players to examine some of the reasons for and against data sharing with plenty of humour thrown in. Both games were trialled by many of the attendees during Thursday’s Unconference. Summary I realised in Dublin that it’s 8 years since I attended my first IDCC, held at the University of Bristol in December 2011 while I was still working at the nearby University of Bath. While I haven’t been every year, I’ve been to every one held in Europe since then and it’s interesting to see what has and hasn’t changed. We’re no longer discussing data management plans, data scientists or various other things as abstract concepts that we’d like to encourage, but dealing with the real-world consequences of them. The conference has also grown over the years: this year was the biggest yet, boasting over 300 attendees. 
There has been especially big growth in attendees from North America, Australasia, Africa and the Middle East. That’s great for the diversity of the conference as it brings in more voices and viewpoints than ever. With more people around to interact with I have to work harder to manage my energy levels but I think that’s a small price to pay. Iosevka: a nice fixed-width font Iosevka is a nice, slender monospace font with a lot of configurable variations. Check it out: https://typeof.net/Iosevka/ Replacing comments with webmentions Just a quickie to say that I’ve replaced the comment section at the bottom of each post with webmentions, which allows you to comment by posting on your own site and linking here. It’s a fundamental part of the IndieWeb, which I’m slowly getting to grips with, having been a halfway member of it for years by virtue of having my own site on my own domain. I’d already got rid of Google Analytics to stop forcing that tracking on my visitors, and I wanted to get rid of Disqus too because I’m pretty sure the only way that it’s free for me is if they’re selling my data and yours to third parties. Webmention is a nice alternative because it relies only on open standards, has no tracking and allows people to control their own comments. While I’m currently using a third-party service to help, I can switch to self-hosted at any point in the future, completely transparently. Thanks to webmention.io, which handles incoming webmentions for me, and webmention.js, which displays them on the site, I can keep it all static and not have to implement any of this myself, which is nice. It’s a bit harder to comment because you have to be able to host your own content somewhere, but then almost no-one ever commented anyway, so it’s not like I’ll lose anything! Plus, if I get Bridgy set up right, you should be able to comment just by replying on Mastodon, Twitter or a few other places. A spot of web searching shows that I’m not the first to make the Disqus -> webmentions switch (yes, I’m putting these links in blatantly to test outgoing webmentions with Telegraph…): So long Disqus, hello webmention — Nicholas Hoizey Bye Disqus, hello Webmention! — Evert Pot Implementing Webmention on a static site — Deluvi Let’s see how this goes! Bridging Carpentries Slack channels to Matrix It looks like I’ve accidentally taken charge of bridging a bunch of The Carpentries Slack channels over to Matrix. Given this, it seems like a good idea to explain what that sentence means and reflect a little on my reasoning. I’m more than happy to discuss the pros and cons of this approach. If you just want to try chatting in Matrix, jump to the getting started section. What are Slack and Matrix? Slack (see also on Wikipedia), for those not familiar with it, is an online text chat platform with the feel of IRC (Internet Relay Chat), a modern look and feel and both web and smartphone interfaces. By providing a free tier that meets many people's needs on its own, Slack has become the communication platform of choice for thousands of online communities, private projects and more. One of the major disadvantages of using Slack’s free tier, as many community organisations do, is that as an incentive to upgrade to a paid service your chat history is limited to the most recent 10,000 messages across all channels. For a busy community like The Carpentries, this means that messages older than about 6-7 weeks are already inaccessible, rendering some of the quieter channels apparently empty.
As Slack is at pains to point out, that history isn’t gone, just archived and hidden from view unless you pay the low, low price of $1/user/month. That doesn’t seem too pricy, unless you’re a non-profit organisation with a lot of projects you want to fund and an active membership of several hundred worldwide, at which point it soon adds up. Slack does offer to waive the cost for registered non-profit organisations, but only for one community. The Carpentries is not an independent organisation, but one fiscally sponsored by Community Initiatives, which has already used its free quota of one elsewhere, rendering the Carpentries ineligible. Other umbrella organisations such as NumFocus (and, I expect, Mozilla) also run into this problem with Slack. So, we have a community which is slowly and inexorably losing its own history behind a paywall. For some people this is simply annoying, but from my perspective as a facilitator of the preservation of digital things the community is haemorrhaging an important record of its early history. Enter Matrix. Matrix is a chat platform similar to IRC, Slack or Discord. It’s divided into separate channels, and users can join one or more of these to take part in the conversation happening in those channels. What sets it apart from older technology like IRC and walled gardens like Slack & Discord is that it’s federated. Federation means simply that users on any server can communicate with users and channels on any other server. Usernames and channel addresses specify both the individual identifier and the server it calls home, just as your email address contains all the information needed for my email server to route messages to it. While users are currently tied to their home server, channels can be mirrored and synchronised across multiple servers, making the overall system much more resilient. Can’t connect to your favourite channel on server X? No problem: just connect via its alias on server Y and when X comes back online it will be resynchronised. The technology used is much more modern and secure than the ageing IRC protocol, and there’s no vendor lock-in like there is with closed platforms like Slack and Discord. On top of that, Matrix channels can easily be “bridged” to channels/rooms on other platforms, including, yes, Slack, so that you can join on Matrix and transparently talk to people connected to the bridged room, or vice versa. So, to summarise: The current Carpentries Slack channels could be bridged to Matrix at no cost and with no disruption to existing users The history of those channels from that point on would be retained on matrix.org and accessible even when it’s no longer available on Slack If at some point in the future The Carpentries chose to invest in its own Matrix server, it could adopt and become the main Matrix home of these channels without disruption to users of either Matrix or (if it’s still in use at that point) Slack Matrix is an open protocol, with a reference server implementation and a wide range of clients all available as free software, which aligns with the values of the Carpentries community On top of this: I’m fed up of having so many different Slack teams to switch between to see the channels in all of them, and prefer having all the channels I regularly visit in a single unified interface; I wanted to see how easy this would be and whether others would also be interested.
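To make the addressing scheme concrete, a Matrix user ID and a room alias look something like the following (the particular names here are just placeholders):

```
@jez:matrix.org        <- user "jez", whose home server is matrix.org
#general:matrix.org    <- room alias "general", published by matrix.org
```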
Given all this, I thought I’d go ahead and give it a try to see if it made things more manageable for me and to see what the reaction would be from the community. How can I get started? !!! reminder Please remember that, like any other Carpentries space, the Code of Conduct applies in all of these channels. First, sign up for a Matrix account. The quickest way to do this is on the Matrix “Try now” page, which will take you to the Riot Web client which for many is synonymous with Matrix. Other clients are also available for the adventurous. Second, join one of the channels. The links below will take you to a page that will let you connect via your preferred client. You’ll need to log in as they are set not to allow guest access, but, unlike Slack, you won’t need an invitation to be able to join. #general — the main open channel to discuss all things Carpentries #random — anything that would be considered offtopic elsewhere #welcome — join in and introduce yourself! That’s all there is to getting started with Matrix. To find all the bridged channels there’s a Matrix “community” that I’ve added them all to: Carpentries Matrix community. There’s a lot more, including how to bridge your favourite channels from Slack to Matrix, but this is all I’ve got time and space for here! If you want to know more, leave a comment below, or send me a message on Slack (jezcope) or maybe Matrix (@petrichor:matrix.org)! I’ve also made a separate channel for Matrix-Slack discussions: #matrix on Slack and Carpentries Matrix Discussion on Matrix MozFest19 first reflections Discussions of neurodiversity at #mozfest Photo by Jennifer Riggins The other weekend I had my first experience of Mozilla Festival, aka #mozfest. It was pretty awesome. I met quite a few people in real life that I’ve previously only known (/stalked) on Twitter, and caught up with others that I haven’t seen for a while. I had the honour of co-facilitating a workshop session on imposter syndrome and how to deal with it with the wonderful Yo Yehudi and Emmy Tsang. We all learned a lot and hope our participants did too; we’ll be putting together a summary blog post as soon as we can get our act together! I also attended a great session, led by Kiran Oliver (psst, they’re looking for a new challenge), on how to encourage and support a neurodiverse workforce. I was only there for the one day, and I really wish that I’d taken the plunge and committed to the whole weekend. There’s always next year though! To be honest, I’m just disappointed that I never had the courage to go sooner, Music for working Today1 the office conversation turned to blocking out background noise. (No, the irony is not lost on me.) Like many people I work in a large, open-plan office, and I’m not alone amongst my colleagues in sometimes needing to find a way to boost concentration by blocking out distractions. Not everyone is like this, but I find music does the trick for me. I also find that different types of music are better for different types of work, and I use this to try and manage my energy better. There are more distractions than auditory noise, and at times I really struggle with visual noise. Rather than have this post turn into a rant about the evils of open-plan offices, I’ll just mention that the scientific evidence doesn’t paint them in a good light2, or at least suggests that the benefits are more limited in scope than is commonly thought3, and move on to what I actually wanted to share: good music for working to. 
There are a number of genres that I find useful for working. Generally, these have in common a consistent tempo, a lack of lyrics, and enough variation to prevent boredom without distracting. Familiarity helps my concentration too so I’ll often listen to a restricted set of albums for a while, gradually moving on by dropping one out and bringing in another. In my case this includes: Traditional dance music, generally from northern and western European traditions for me. This music has to be rhythmically consistent to allow social dancing, and while the melodies are typically simple repeated phrases, skilled musicians improvise around that to make something beautiful. I tend to go through phases of listening to particular traditions; I’m currently listening to a lot of French, Belgian and Scandinavian. Computer game soundtracks, which are specifically designed to enhance gameplay without distracting, making them perfect for other activities requiring a similar level of concentration. Chiptunes and other music incorporating it; partly overlapping with the previous category, chiptunes is music made by hacking the audio chips from (usually) old computers and games machines to become an instrument for new music. Because of the nature of the instrument, this will have millisecond-perfect rhythm and again makes for undistracting noise blocking with an extra helping of nostalgia! Purists would disagree with me, but I like artists that combine chiptunes with other instruments and effects to make something more complete-sounding. Retrowave/synthwave/outrun, synth-driven music that’s instantly familiar as the soundtrack to many 90s sci-fi and thriller movies. Atmospheric, almost dreamy, but rhythmic with a driving beat, it’s another genre that fits into the “pleasing but not too surprising” category for me. So where to find this stuff? One of the best resources I’ve found is Music for Programming which provides carefully curated playlists of mostly electronic music designed to energise without distracting. They’re so well done that the tracks move seamlessly, one to the next, without ever getting boring. Spotify is an obvious option, and I do use it quite a lot. However, I’ve started trying to find ways to support artists more directly, and Bandcamp seems to be a good way of doing that. It’s really easy to browse by genre, or discover artists similar to what you’re currently hearing. You can listen for free as long as you don’t mind occasional nags to buy the music you’re hearing, but you can also buy tracks or albums. Music you’ve paid for is downloadable in several open, DRM-free formats for you to keep, and you know that a decent chunk of that cash is going directly to that artist. I also love noise generators; not exactly music, but a variety of pleasant background noises, some of which nicely obscure typical office noise. I particularly like mynoise.net, which has a cornucopia of different natural and synthetic noises. Each generator comes with a range of sliders allowing you to tweak the composition and frequency range, and will even animate them randomly for you to create a gently shifting soundscape. A much simpler, but still great, option is Noisli with it’s nice clean interface. Both offer apps for iOS and Android. For bonus points, you can always try combining one or more of the above. Adding in a noise generator allows me to listen to quieter music while still getting good environmental isolation when I need concentration. 
Another favourite combo is to open both the cafe and rainfall generators from myNoise, made easier by the ability to pop out a mini-player then open up a second generator. I must be missing stuff though. What other musical genres should I try? What background sounds are nice to work to? Well, you know. The other day. Whatever. ↩︎ See e.g.: Lee, So Young, and Jay L. Brand. ‘Effects of Control over Office Workspace on Perceptions of the Work Environment and Work Outcomes’. Journal of Environmental Psychology 25, no. 3 (1 September 2005): 323–33. https://doi.org/10.1016/j.jenvp.2005.08.001. ↩︎ Open plan offices can actually work under certain conditions, The Conversation ↩︎ Working at the British Library: 6 months in It barely seems like it, but I’ve been at the British Library now for nearly 6 months. It always takes a long time to adjust and from experience I know it’ll be another year before I feel fully settled, but my team, department and other colleagues have really made me feel welcome and like I belong. One thing that hasn’t got old yet is the occasional thrill of remembering that I work at my national library now. Every now and then I’ll catch a glimpse of the collections at Boston Spa or step into one of the reading rooms and think “wow, I actually work here!” I also like having a national and international role to play, which means I get to travel a bit more than I used to. Budgets are still tight so there are limits, and I still prefer to be home more often than not, but there is more scope in this job than I’ve had previously for travelling to conferences, giving talks that change the way people think, and learning in different contexts. I’m learning a lot too, especially how to work with and manage people split across multiple sites, and the care and feeding of budgets. As well as missing my old team at Sheffield, I do also miss some of the direct contact I had with researchers in HE. I especially miss the teaching work, but also the higher-level influencing of more senior academics to change practices on a wider scale. Still, I get to use those influencing skills in different ways now, and I’m still involved with the Carpentries, which should let me keep my hand in with teaching. I still deal with my general tendency to try and do All The Things, and as before I’m slowly learning to recognise it, tame it and very occasionally turn it to my advantage. That also leads to feelings of imposterism that are only magnified by the knowledge that I now work at a national institution! It’s a constant struggle some days to believe that I’ve actually earned my place here through hard work. Even if I don’t always feel that I have, my colleagues here certainly have, so I should have more faith in their opinion of me. Finally, I couldn’t write this type of thing without mentioning the commute. I’ve gone from 90 minutes each way on a good day (up to twice that if the trains were disrupted) to 35 minutes each way along fairly open roads. I have less time to read, but much more time at home. On top of that, the library has implemented flexitime across all pay grades, with even senior managers strongly encouraged to make full use of it. Not only is this an important enabler of equality across the organisation, it relieves for me personally the pressure to work over my contracted hours and the guilt I’ve always felt at leaving work even 10 minutes early. If I work late, it’s now a choice I’m making based on business needs instead of guilt and in full knowledge that I’ll get that time back later.
So that’s where I am right now. I’m really enjoying the work and the culture, and I look forward to what the next 6 months will bring! RDA Plenary 13 reflection Photo by me I sit here writing this in the departure lounge at Philadelphia International Airport, waiting for my Aer Lingus flight back after a week at the 13th Research Data Alliance (RDA) Plenary (although I’m actually publishing this a week or so later at home). I’m pretty exhausted, partly because of the jet lag, and partly because it’s been a very full week with so much to take in. It’s my first time at an RDA Plenary, and it was quite a new experience for me! First off, it’s my first time outside Europe, and thus my first time crossing quite so many timezones. I’ve been waking at 5am and ready to drop by 8pm, but I’ve struggled on through! Secondly, it’s the biggest conference I’ve been to for a long time, both in number of attendees and number of parallel sessions. There’s been a lot of sustained input so I’ve been very glad to have a room in the conference hotel and be able to escape for a few minutes when I needed to recharge. Thirdly, it’s not really like any other conference I’ve been to: rather than having large numbers of presentations submitted by attendees, each session comprises lots of parallel meetings of RDA interest groups and working groups. It’s more community-oriented: an opportunity for groups to get together face to face and make plans or show off results. I found it pretty intense and struggled to take it all in, but incredibly valuable nonetheless. Lots of information to process (I took a lot of notes) and a few contacts to follow up on too, so overall I loved it! Using Pipfile in Binder Photo by Sear Greyson on Unsplash I recently attended a workshop, organised by the excellent team of the Turing Way project, on a tool called BinderHub. BinderHub, along with public hosting platform MyBinder, allows you to publish computational notebooks online as “binders” such that they’re not static but fully interactive. It’s able to do this by using a tool called repo2docker to capture the full computational environment and dependencies required to run the notebook. !!! aside “What is the Turing Way?” The Turing Way is, in its own words, “a lightly opinionated guide to reproducible data science.” The team is building an open textbook and running a number of workshops for scientists and research software engineers, and you should check out the project on Github. You could even contribute! The Binder process goes roughly like this: Do some work in a Jupyter Notebook or similar Put it into a public git repository Add some extra metadata describing the packages and versions your code relies on Go to mybinder.org and tell it where to find your repository Open the URL it generates for you Profit Other than step 5, which can take some time to build the binder, this is a remarkably quick process. It supports a number of different languages too, including built-in support for R, Python and Julia and the ability to configure pretty much any other language that will run on Linux. However, the Python support currently requires you to have either a requirements.txt or Conda-style environment.yml file to specify dependencies, and I commonly use a Pipfile for this instead. Pipfile allows you to specify a loose range of compatible versions for maximal convenience, but then locks in specific versions for maximal reproducibility. 
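For anyone who hasn’t come across one, here’s a minimal sketch of what a Pipfile looks like — the package names and version constraints below are purely illustrative, not taken from any particular project:

```toml
[[source]]
name = "pypi"
url = "https://pypi.org/simple"
verify_ssl = true

[packages]
# Loose constraints go here; `pipenv lock` pins exact versions in Pipfile.lock
pandas = ">=0.24"
matplotlib = "*"

[requires]
python_version = "3.6"
```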
You can upgrade packages any time you want, but you’re fully in control of when that happens, and the locked versions are checked into version control so that everyone working on a project gets consistency. Since Pipfile is emerging as something of a standard, I thought I’d see if I could use that in a binder, and it turns out to be remarkably simple. The reference implementation of Pipfile is a tool called pipenv by the prolific Kenneth Reitz. All you need to use this in your binder is two files of one line each. requirements.txt tells repo2docker to build a Python-based binder, and contains a single line to install the pipenv package: pipenv Then postBuild is used by repo2docker to install all other dependencies using pipenv: pipenv install --system The --system flag tells pipenv to install packages globally (its default behaviour is to create a Python virtualenv). With these two files, the binder builds and runs as expected. You can see a complete example that I put together during the workshop here on Gitlab. What do you think I should write about? I’ve found it increasingly difficult to make time to blog, and it’s not so much not having the time — I’m pretty privileged in that regard — but finding the motivation. Thinking about what used to motivate me, one of the big things was writing things that other people wanted to read. Rather than try to guess, I thought I’d ask! Those who know what I'm about, what would you read about, if it was written by me? I'm trying to break through the blog-writers block and would love to know what other people would like to see my ill-considered opinions on.— Jez Cope (@jezcope) March 7, 2019 I’m still looking for ideas, so please tweet me or leave me a comment below. Below are a few thoughts that I’m planning to do something with. Something taking one of the more techy aspects of Open Research, breaking it down and explaining the benefits for non-techy folks?— Dr Beth 🏳️‍🌈 🐺 (@PhdGeek) March 7, 2019 Skills (both techy and non techy) that people need to most effectively support RDM— Kate O'Neill (@KateFONeill) March 7, 2019 Sometimes I forget that my background makes me well-qualified to take some of these technical aspects of the job and break them down for different audiences. There might be a whole series in this… Carrying on our conversation last week I'd love to hear more about how you've found moving from an HE lib to a national library and how you see the BL's role in RDM. Appreciate this might be a bit niche/me looking for more interesting things to cite :)— Rosie Higman (@RosieHLib) March 7, 2019 This is interesting, and something I’d like to reflect on; moving from one job to another always has lessons and it’s easy to miss them if you’re not paying attention. Another one for the pile. Life without admin rights to your computer— Mike Croucher (@walkingrandomly) March 7, 2019 This is so frustrating as an end user, but at the same time I get that endpoint security is difficult and there are massive risks associated with letting end users have admin rights. This is particularly important at the BL: as custodians of the nation’s cultural heritage, the risk for us is bigger than for many and for this reason we are now Cyber Essentials Plus certified. At some point I’d like to do some research and have a conversation with someone who knows a lot more about InfoSec to work out what the proper approach to this is, maybe involving VMs and a demilitarized zone on the network.
I’m always looking for more inspiration, so please leave a comment if you’ve got anything you’d like to read my thoughts on. If you’re not familiar with my writing, please take a minute or two to explore the blog; the tags page is probably a good place to get an overview. Ultimate Hacking Keyboard: first thoughts Following on from the excitement of having built a functioning keyboard myself, I got a parcel on Monday. Inside was something that I’ve been waiting for since September: an Ultimate Hacking Keyboard! Where the custom-built Laplace is small and quiet for travelling, the UHK is to be my main workhorse in the study at home. Here are my first impressions: Key switches I went with Kailh blue switches from the available options. In stark contrast to the quiet blacks on the Laplace, blues are NOISY! They have an extra piece of plastic inside the switch that causes an audible and tactile click when the switch activates. This makes them very satisfying to type on and should help as I train my fingers not to bottom out while typing, but does make them unsuitable for use in a shared office! Here are some animations showing how the main types of key switch vary. Layout This keyboard has what’s known as a 60% layout: no number pad, arrows or function keys. As with the more spartan Laplace, these “missing” keys are made up for with programmable layers. For example, the arrow keys are on the Mod layer on the I/J/K/L keys, so I can access them without moving from the home row. I actually find this preferable to having to move my hand to the right to reach them, and I really never used the number pad in any case. Split This is a split keyboard, which means that the left and right halves can be separated to place the hands further apart which eases strain across the shoulders. The UHK has a neat coiled cable joining the two which doesn’t get in the way. A cool design feature is that the two halves can be slotted back together and function perfectly well as a non-split keyboard too, held together by magnets. There are even electrical contacts so that when the two are joined you don’t need the linking cable. Programming The board is fully programmable, and this is achieved via a custom (open source) GUI tool which talks to the (open source) firmware on the board. You can have multiple keymaps, each of which has a separate Base, Mod, Fn and Mouse layer, and there’s an LED display that shows a short mnemonic for the currently active map. I already have a customised Dvorak layout for day-to-day use, plus a standard QWERTY for not-me to use and an alternative QWERTY which will be slowly tweaked for games that don’t work well with Dvorak. Mouse keys One cool feature that the designers have included in the firmware is the ability to emulate a mouse. There’s a separate layer that allows me to move the cursor, scroll and click without moving my hands from the keyboard. Palm rests Not much to say about the palm rests, other than they are solid wood, and chunky, and really add a little something. I have to say, I really like it so far! Overall it feels really well designed, with every little detail carefully thought out and excellent build quality and a really solid feeling. Custom-built keyboard I’m typing this post on a keyboard I made myself, and I’m rather excited about it! Why make my own keyboard? 
I wanted to learn a little bit about practical electronics, and I like to learn by doing I wanted to have the feeling of making something useful with my own hands I actually need a small, keyboard with good-quality switches now that I travel a fair bit for work and this lets me completely customise it to my needs Just because! While it is possible to make a keyboard completely from scratch, it makes much more sense to put together some premade parts. The parts you need are: PCB (printed circuit board): the backbone of the keyboard, to which all the other electrical components attach, this defines the possible physical locations for each key Switches: one for each key to complete a circuit whenever you press it Keycaps: switches are pretty ugly and pretty uncomfortable to press, so each one gets a cap; these are what you probably think of as the “keys” on your keyboard and come in almost limitless variety of designs (within the obvious size limitation) and are the easiest bit of personalisation Controller: the clever bit, which detects open and closed switches on the PCB and tells your computer what keys you pressed via a USB cable Firmware: the program that runs on the controller starts off as source code like any other program, and altering this can make the keyboard behave in loads of different ways, from different layouts to multiple layers accessed by holding a particular key, to macros and even emulating a mouse! In my case, I’ve gone for the following: PCB Laplace from keeb.io, a very compact 47-key (“40%") board, with no number pad, function keys or number row, but a lot of flexibility for key placement on the bottom row. One of my key design goals was small size so I can just pop it in my bag and have on my lap on the train. Controller Elite-C, designed specifically for keyboard builds to be physically compatible with the cheaper Pro Micro, with a more-robust USB port (the Pro Micro’s has a tendency to snap off), and made easier to program with a built-in reset button and better bootloader. Switches Gateron Black: Gateron is one of a number of manufacturers of mechanical switches compatible with the popular Cherry range. The black switch is linear (no click or bump at the activation point) and slightly heavier sprung than the more common red. Cherry also make a black switch but the Gateron version is slightly lighter and having tested a few I found them smoother too. My key goal here was to reduce noise, as the stronger spring will help me type accurately without hitting the bottom of the keystroke with an audible sound. Keycaps Blank grey PBT in DSA profile: this keyboard layout has a lot of non-standard sized keys, so blank keycaps meant that I wouldn’t be putting lots of keys out of their usual position; they’re also relatively cheap, fairly classy IMHO and a good placeholder until I end up getting some really cool caps on a group buy or something; oh, and it minimises the chance of someone else trying the keyboard and getting freaked out by the layout… Firmware QMK (Quantum Mechanical Keyboard), with a work-in-progress layout, based on Dvorak. QMK has a lot of features and allows you to fully program each and every key, with multiple layers accessed through several different routes. Because there are so few keys on this board, I’ll need to make good use of layers to make all the keys on a usual keyboard available. 
Dvorak Simplified Keyboard I’m grateful to the folks of the Leeds Hack Space, especially Nav & Mark who patiently coached me in various soldering techniques and good practice, but also everyone else who was so friendly and welcoming and interested in my project. I’m really pleased with the result, which is small, light and fully customisable. Playing with QMK firmware features will keep me occupied for quite a while! This isn’t the end though, as I’ll need a case to keep the dust out. I’m hoping to be able to 3D print this or mill it from wood with a CNC mill, for which I’ll need to head back to the Hack Space! Less, but better “Weniger, aber besser” — Dieter Rams {:.big-quote} I can barely believe it’s a full year since I published my intentions for 2018. A lot has happened since then. Principally: in November I started a new job as Data Services Lead at The British Library. One thing that hasn’t changed is my tendency to try to do too much, so this year I’m going to try and focus on a single intention, a translation of designer Dieter Rams' famous quote above: Less, but better. This chimes with a couple of other things I was toying with over the Christmas break, as they’re essentially other ways of saying the same thing: Take it steady One thing at a time I’m also going to keep in mind those touchstones from last year: What difference is this making? Am I looking after myself? Do I have evidence for this? I mainly forget to think about them, so I’ll be sticking up post-its everywhere to help me remember! How to extend Python with Rust: part 1 Python is great, but I find it useful to have an alternative language under my belt for occasions when no amount of Pythonic cleverness will make some bit of code run fast enough. One of my main reasons for wanting to learn Rust was to have something better than C for that. Not only does Rust have all sorts of advantages that make it a good choice for code that needs to run fast and correctly, it’s also got a couple of rather nice crates (libraries) that make interfacing with Python a lot nicer. Here’s a little tutorial to show you how easy it is to call a simple Rust function from Python. If you want to try it yourself, you’ll find the code on GitHub. !!! prerequisites I’m assuming for this tutorial that you’re already familiar with writing Python scripts and importing & using packages, and that you’re comfortable using the command line. You’ll also need to have installed Rust. The Rust bit The quickest way to get compiled code into Python is to use the built-in ctypes package. This is Python’s “Foreign Function Interface” or FFI: a means of calling functions outside the language you’re using to make the call. ctypes allows us to call arbitrary functions in a shared library1, as long as those functions conform to certain standard C language calling conventions. Thankfully, Rust tries hard to make it easy for us to build such a shared library. The first thing to do is to create a new project with cargo, the Rust build tool: $ cargo new rustfrompy Created library `rustfrompy` project $ tree . ├── Cargo.toml └── src └── lib.rs 1 directory, 2 files !!! aside I use the fairly common convention that text set in fixed-width font is either example code or commands to type in. For the latter, a $ precedes the command that you type (omit the $), and lines that don’t start with a $ are output from the previous command. I assume a basic familiarity with the Unix-style command line, but I should probably put in some links to resources if you need to learn more!
We need to edit the Cargo.toml file and add a [lib] section: [package] name = "rustfrompy" version = "0.1.0" authors = ["Jez Cope <j.cope@erambler.co.uk>"] [dependencies] [lib] name = "rustfrompy" crate-type = ["cdylib"] This tells cargo that we want to make a C-compatible dynamic library (crate-type = ["cdylib"]) and what to call it, plus some standard metadata. We can then put our code in src/lib.rs. We’ll just use a simple toy function that adds two numbers together: #[no_mangle] pub fn add(a: i64, b: i64) -> i64 { a + b } Notice the pub keyword, which instructs the compiler to make this function accessible to other modules, and the #[no_mangle] annotation, which tells it to use the standard C naming conventions for functions. If we don’t do this, then Rust will generate a new name for the function for its own nefarious purposes, and as a side effect we won’t know what to call it when we want to use it from Python. Being good developers, let’s also add a test: #[cfg(test)] mod test { use ::*; #[test] fn test_add() { assert_eq!(4, add(2, 2)); } } We can now run cargo test which will compile that code and run the test: $ cargo test Compiling rustfrompy v0.1.0 (file:///home/jez/Personal/Projects/rustfrompy) Finished dev [unoptimized + debuginfo] target(s) in 1.2 secs Running target/debug/deps/rustfrompy-3033caaa9f5f17aa running 1 test test test::test_add ... ok test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out Everything worked! Now just to build that shared library and we can try calling it from Python: $ cargo build Compiling rustfrompy v0.1.0 (file:///home/jez/Personal/Projects/rustfrompy) Finished dev [unoptimized + debuginfo] target(s) in 0.30 secs Notice that the build is unoptimized and includes debugging information: this is useful in development, but once we’re ready to use our code it will run much faster if we compile it with optimisations. Cargo makes this easy: $ cargo build --release Compiling rustfrompy v0.1.0 (file:///home/jez/Personal/Projects/rustfrompy) Finished release [optimized] target(s) in 0.30 secs The Python bit After all that, the Python bit is pretty short. First we import the ctypes package (which is included in all recent Python versions): from ctypes import cdll Cargo has tidied our shared library away into a folder, so we need to tell Python where to load it from. On Linux, it will be called lib<something>.so where the “something” is the crate name from Cargo.toml, “rustfrompy”: lib = cdll.LoadLibrary('target/release/librustfrompy.so') Finally we can call the function anywhere we want. Here it is in a pytest-style test: def test_rust_add(): assert lib.add(27, 15) == 42 If you have pytest installed (and you should!) you can run the whole test like this: $ pytest --verbose test.py ====================================== test session starts ====================================== platform linux -- Python 3.6.4, pytest-3.1.1, py-1.4.33, pluggy-0.4.0 -- /home/jez/.virtualenvs/datasci/bin/python cachedir: .cache rootdir: /home/jez/Personal/Projects/rustfrompy, inifile: collected 1 items test.py::test_rust_add PASSED It worked! I’ve put both the Rust and Python code on github if you want to try it for yourself. Shortcomings Ok, so that was a pretty simple example, and I glossed over a lot of things. For example, what would happen if we did lib.add(2.0, 2)? This causes Python to throw an error because our Rust function only accepts integers (64-bit signed integers, i64, to be precise), and we gave it a floating point number. 
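Here’s one way to catch that kind of mismatch early — this is a sketch of my own rather than part of the tutorial repository, but it uses only standard ctypes features:

```python
from ctypes import cdll, c_int64

lib = cdll.LoadLibrary('target/release/librustfrompy.so')

# Declare the signature of the Rust `add` function so ctypes can check
# and convert arguments for us instead of guessing
lib.add.argtypes = [c_int64, c_int64]
lib.add.restype = c_int64

print(lib.add(27, 15))   # 42
# lib.add(2.0, 2) now fails immediately with ctypes.ArgumentError
```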
ctypes can’t guess what type(s) a given function will work with, but it can at least tell us when we get it wrong. To fix this properly, we need to do some extra work along those lines, telling the ctypes library what the argument and return types for each function are. For a more complex library, there will probably be more housekeeping to do, such as translating return codes from functions into more Pythonic-style errors. For a small example like this there isn’t much of a problem, but the bigger your compiled library the more extra boilerplate is required on the Python side just to use all the functions. When you’re working with an existing library you don’t have much choice about this, but if you’re building it from scratch specifically to interface with Python, there’s a better way using the Python C API. You can call this directly in Rust, but there are a couple of Rust crates that make life much easier, and I’ll be taking a look at those in a future blog post. .so on Linux, .dylib on Mac and .dll on Windows ↩︎ New Year's irresolution Photo by Andrew Hughes on Unsplash I’ve chosen not to make any specific resolutions this year; I’ve found that they just don’t work for me. Like many people, all I get is a sense of guilt when I inevitably fail to live up to the expectations I set myself at the start of the year. However, I have set a couple of what I’m referring to as “themes” for the year: touchstones that I’ll aim to refer to when setting priorities or just feeling a bit overwhelmed or lacking in direction. They are: Contribution Self-care Measurement I may do some blog posts expanding on these, but in the meantime, I’ve put together a handful of questions to help me think about priorities and get perspective when I’m doing (or avoiding doing) something. What difference is this making? I feel more motivated when I can figure out how I’m contributing to something bigger than myself. In society? In my organisation? To my friends & family? Am I looking after myself? I focus a lot on the expectations others have (or at least that I think they have) of me, but I can’t do anything well unless I’m generally happy and healthy. Is this making me happier and healthier? Is this building my capacity to look after myself, my family & friends and do my job? Is this worth the amount of time and energy I’m putting in? Do I have evidence for this? I don’t have to base decisions purely on feelings/opinions: I have the skills to obtain, analyse and interpret data. Is this fact or opinion? What are the facts? Am I overthinking this? Can I put a confidence interval on this? Build documents from code and data with Saga !!! tldr “TL;DR” I’ve made Saga, a thing for compiling documents by combining code and data with templates. What is it? Saga is a very simple command-line tool that reads in one or more data files, runs one or more scripts, then passes the results into a template to produce a final output document. It enables you to maintain a clean separation between data, logic and presentation and produce data-based documents that can easily be updated. That allows the flow of data through the document to be easily understood, a cornerstone of reproducible analysis. You run it like this: saga build -d data.yaml -d other_data.yaml \ -s analysis.py -t report.md.tmpl \ -O report.md Any scripts specified with -s will have access to the data in local variables, and any changes to local variables in a script will be retained when everything is passed to the template for rendering.
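To make that concrete, here’s a deliberately tiny, made-up example — none of these file contents come from the real sagadoc repository, and I’m assuming (as the description above suggests) that the top-level keys in the data files become local variable names. Given a data file data.yaml:

```yaml
# Invented data, purely for illustration
quarters: [12, 18, 9, 21]
```

and a script analysis.py that sees that data as local variables:

```python
# `quarters` comes from data.yaml; `mean` will be visible to the template
mean = sum(quarters) / len(quarters)
```

a Mako template report.md.tmpl can then drop the results into the prose:

```
Mean sales per quarter: ${round(mean, 1)}
```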
For debugging, you can also do: saga dump -d data.yaml -d other_data.yaml -s analysis.py which will print out the full environment that would be passed to your template with saga build. Features Right now this is a really early version. It does the job but I have lots of ideas for features to add if I ever have time. At present it does the following: Reads data from one or more YAML files Transforms data with one or more Python scripts Renders a template in Mako format Works with any plain-text output format, including Markdown, LaTeX and HTML Use cases Write reproducible reports & papers based on machine-readable data Separate presentation from content in any document, e.g. your CV (example coming soon) Yours here? Get it! I haven’t released this on PyPI yet, but all the code is available on GitHub to try out. If you have pipenv installed (and if you use Python you should!), you can try it out in an isolated virtual environment by doing: git clone https://github.com/jezcope/sagadoc.git cd sagadoc pipenv install pipenv run saga or you can set up for development and run some tests: pipenv install --dev pipenv run pytest Why? Like a lot of people, I have to produce reports for work, often containing statistics computed from data. Although these generally aren’t academic research papers, I see no reason not to aim for a similar level of reproducibility: after all, if I’m telling other people to do it, I’d better take my own advice! A couple of times now I’ve done this by writing a template that holds the text of the report and placeholders for values, along with a Python script that reads in the data, calculates the statistics I want and completes the template. This is valuable for two main reasons: If anyone wants to know how I processed the data and calculated those statistics, it’s all there: no need to try and remember and reproduce a series of button clicks in Excel; If the data or calculations change, I just need to update the relevant part and run it again, and all the relevant parts of the document will be updated. This is particularly important if changing a single data value requires recalculation of dozens of tables, charts, etc. It also gives me the potential to factor out and reuse bits of code in the future, add tests and version control everything. Now that I’ve done this more than once (and it seems likely I’ll do it again) it makes sense to package that script up in a more portable form so I don’t have to write it over and over again (or, shock horror, copy & paste it!). It saves time, and gives others the possibility to make use of it. Prior art I’m not the first person to think of this, but I couldn’t find anything that did exactly what I needed. Several tools will let you interweave code and prose, including the results of evaluating each code snippet in the document: chief among these are Jupyter and Rmarkdown. There are also tools that let you write code in the order that makes most sense to read and then rearrange it into the right order to execute, so-called literate programming. The original tool for this is the venerable noweb. Sadly there is very little that combines both of these and allows you to insert the results of various calculations at arbitrary points in a document, independent of the order of either presenting or executing the code. The only two that I’m aware of are Dexy and org-mode. Unfortunately, Dexy currently only works on Legacy Python (/Python 2) and org-mode requires emacs (which is fine but not exactly portable).
Rmarkdown comes close and supports a range of languages but the full feature set is only available with R. Actually, my ideal solution is org-mode without the emacs dependency, because that’s the most flexible solution; maybe one day I’ll have both the time and skill to implement that. It’s also possible I might be able to figure out Dexy’s internals to add what I want to it, but until then Saga does the job! Future work There are lots of features that I’d still like to add when I have time: Some actual documentation! And examples! More data formats (e.g. CSV, JSON, TOML) More languages (e.g. R, Julia) Fetching remote data over http Caching of intermediate results to speed up rebuilds For now, though, I’d love for you to try it out and let me know what you think! As ever, comment here, tweet me or start an issue on GitHub. Why try Rust for scientific computing? When you’re writing analysis code, Python (or R, or JavaScript, or …) is usually the right choice. These high-level languages are set up to make you as productive as possible, and common tasks like array manipulation have been well optimised. However, sometimes you just can’t get enough speed and need to turn to a lower-level compiled language. Often that will be C, C++ or Fortran, but I thought I’d do a short post on why I think you should consider Rust. One of my goals for 2017’s Advent of Code was to learn a modern, memory-safe, statically-typed language. I now know that there are quite a lot of options in this space, but two seem to stand out: Go & Rust. I gave both of them a try, and although I’ll probably go back to give Go a more thorough test at some point I found I got quite hooked on Rust. Both languages, though young, are definitely production-ready. Servo, the core of the new Firefox browser, is entirely written in Rust. In fact, Mozilla have been trying to rewrite the rendering core in C for nearly a decade, and switching to Rust let them get it done in just a couple of years. !!! tldr “TL;DR” - It’s fast: competitive with idiomatic C/C++, and no garbage-collection overhead - It’s harder to write buggy code, and compiler errors are actually helpful - It’s C-compatible: you can call into Rust code anywhere you’d call into C, call C/C++ from Rust, and incrementally replace C/C++ code with Rust - It has sensible modern syntax that makes your code clearer and more concise - Support for scientific computing are getting better all the time (matrix algebra libraries, built-in SIMD, safe concurrency) - It has a really friendly and active community - It’s production-ready: Servo, the new rendering core in Firefox, is built entirely in Rust Performance To start with, as a compiled language Rust executes much faster than a (pseudo-)interpreted language like Python or R; the price you pay for this is time spent compiling during development. However, having a compile step also allows the language to enforce certain guarantees, such as type-correctness and memory safety, which between them prevent whole classes of bugs from even being possible. Unlike Go (which, like many higher-level languages, uses a garbage collector), Rust handles memory safety at compile time through the concepts of ownership and borrowing. These can take some getting used to and were a big source of frustration when I was first figuring out the language, but ultimately contribute to Rust’s reliably-fast performance. 
Performance can be unpredictable in a garbage-collected language because you can’t be sure when the GC is going to run and you need to understand it really well to stand a chance of optimising it if becomes a problem. On the other hand, code that has the potential to be unsafe will result in compilation errors in Rust. There are a number of benchmarks (example) that show Rust’s performance on a par with idiomatic C & C++ code, something that very few languages can boast. Helpful error messages Because beginner Rust programmers often get compile errors, it’s really important that those errors are easy to interpret and fix, and Rust is great at this. Not only does it tell you what went wrong, but wherever possible it prints out your code annotated with arrows to show exactly where the error is, and makes specific suggestions how to fix the error which usually turn out to be correct. It also has a nice suite of warnings (things that don’t cause compilation to fail but may indicate bugs) that are just as informative, and this can be extended even further by using the clippy linting tool to further analyse your code. warning: unused variable: `y` --> hello.rs:3:9 | 3 | let y = x; | ^ | = note: #[warn(unused_variables)] on by default = note: to avoid this warning, consider using `_y` instead Easy to integrate with other languages If you’re like me, you’ll probably only use a low-level language for performance-critical code that you can call from a high-level language, and this is an area where Rust shines. Most programmers will turn to C, C++ or Fortran for this because they have a well established ABI (Application Binary Interface) which can be understood by languages like Python and R1. In Rust, it’s trivial to make a C-compatible shared library, and the standard library includes extra features for working with C types. That also means that existing C code can be incrementally ported to Rust: see remacs for an example. On top of this, there are projects like rust-cpython and PyO3 which provide macros and structures that wrap the Python C API to let you build Python modules in Rust with minimal glue code; rustr does a similar job for R. Nice language features Rust has some really nice features, which let you write efficient, concise and correct code. Several feel particularly comfortable as they remind me of similar things available in Haskell, including: Enums, a super-powered combination of C enums and unions (similar to Haskell’s algebraic data types) that enable some really nice code with no runtime cost Generics and traits that let you get more done with less code Pattern matching, a kind of case statement that lets you extract parts of structs, tuples & enums and do all sorts of other clever things Lazy computation based on an iterator pattern, for efficient processing of lists of things: you can do for item in list { ... } instead of the C-style use of an index2, or you can use higher-order functions like map and filter Functions/closures as first-class citizens Scientific computing Although it’s a general-purpose language and not designed specifically for scientific computing, Rust’s support is improving all the time. There are some interesting matrix algebra libraries available, and built-in SIMD is incoming. The memory safety features also work to ensure thread safety, so it’s harder to write concurrency bugs. You should be able to use your favourite MPI implementation too, and there’s at least one attempt to portably wrap MPI in a more Rust-like way. 
Active development and friendly community One of the things you notice straight away is how active and friendly the Rust community is. There are several IRC channels on irc.mozilla.org including #rust-beginners, which is a great place to get help. The compiler is under constant but carefully-managed development, so that new features are landing all the time but without breaking existing code. And the fabulous Cargo build tool and crates.io are enabling the rapid growth of a healthy ecosystem of open source libraries that you can use to write less code yourself. Summary So, next time you need a compiled language to speed up hotspots in your code, try Rust. I promise you won’t regret it! Julia actually allows you to call C and Fortran functions as a first-class language feature ↩︎ Actually, since C++11 there’s for (auto item : list) { ... } but still… ↩︎ Reflections on #aoc2017 Trees reflected in a lake Joshua Reddekopp on Unsplash It seems like ages ago, but way back in November I committed to completing Advent of Code. I managed it all, and it was fun! All of my code is available on GitHub if you’re interested in seeing what I did, and I managed to get out a blog post for every one with a bit more commentary, which you can see in the series list above. How did I approach it? I’ve not really done any serious programming challenges before. I don’t get to write a lot of code at the moment, so all I wanted from AoC was an excuse to do some proper problem-solving. I never really intended to take a polyglot approach, though I did think that I might use mainly Python with a bit of Haskell. In the end, though, I used: Python (×12); Haskell (×7); Rust (×4); Go; C++; Ruby; Julia; and Coconut. For the most part, my priorities were getting the right answer, followed by writing readable code. I didn’t specifically focus on performance but did try to avoid falling into traps that I knew about. What did I learn? I found Python the easiest to get on with: it’s the language I know best and although I can’t always remember exact method names and parameters I know what’s available and where to look to remind myself, as well as most of the common idioms and some performance traps to avoid. Python was therefore the language that let me focus most on solving the problem itself. C++ and Ruby were more challenging, and it was harder to write good idiomatic code but I can still remember quite a lot. Haskell I haven’t used since university, and just like back then I really enjoyed working out how to solve problems in a functional style while still being readable and efficient (not always something I achieved…). I learned a lot about core Haskell concepts like monads & functors, and I’m really amazed by the way the Haskell community and ecosystem has grown up in the last decade. I also wanted to learn at least one modern, memory-safe compiled language, so I tried both Go and Rust. Both seem like useful languages, but Rust really intrigued me with its conceptual similarities to both Haskell and C++ and its promise of memory safety without a garbage collector. I struggled a lot initially with the “borrow checker” (the component that enforces memory safety at compile time) but eventually started thinking in terms of ownership and lifetimes after which things became easier. The Rust community seems really vibrant and friendly too. What next? I really want to keep this up, so I’m going to look out some more programming challenges (Project Euler looks interesting). 
It turns out there’s a regular Code Dojo meetup in Leeds, so hopefully I’ll try that out too. I’d like to do more realistic data-science stuff, so I’ll be taking a closer look at stuff like Kaggle too, and figuring out how to do a bit more analysis at work. I’m also feeling motivated to find an open source project to contribute to and/or release a project of my own, so we’ll see if that goes anywhere! I’ve always found the advice to “scratch your own itch” difficult to follow because everything I think of myself has already been done better. Most of the projects I use enough to want to contribute to tend to be pretty well developed with big communities and any bugs that might be accessible to me will be picked off and fixed before I have a chance to get started. Maybe it’s time to get over myself and just reimplement something that already exists, just for the fun of it! The Halting Problem — Python — #adventofcode Day 25 Today’s challenge, takes us back to a bit of computing history: a good old-fashioned Turing Machine. → Full code on GitHub !!! commentary Today’s challenge was a nice bit of nostalgia, taking me back to my university days learning about the theory of computing. Turing Machines are a classic bit of computing theory, and are provably able to compute any value that is possible to compute: a value is computable if and only if a Turing Machine can be written that computes it (though in practice anything non-trivial is mind-bendingly hard to write as a TM). A bit of a library-fest today, compared to other days! from collections import deque, namedtuple from collections.abc import Iterator from tqdm import tqdm import re import fileinput as fi These regular expressions are used to parse the input that defines the transition table for the machine. RE_ISTATE = re.compile(r'Begin in state (?P<state>\w+)\.') RE_RUNTIME = re.compile( r'Perform a diagnostic checksum after (?P<steps>\d+) steps.') RE_STATETRANS = re.compile( r"In state (?P<state>\w+):\n" r" If the current value is (?P<read0>\d+):\n" r" - Write the value (?P<write0>\d+)\.\n" r" - Move one slot to the (?P<move0>left|right).\n" r" - Continue with state (?P<next0>\w+).\n" r" If the current value is (?P<read1>\d+):\n" r" - Write the value (?P<write1>\d+)\.\n" r" - Move one slot to the (?P<move1>left|right).\n" r" - Continue with state (?P<next1>\w+).") MOVE = {'left': -1, 'right': 1} A namedtuple to provide some sugar when using a transition rule. Rule = namedtuple('Rule', 'write move next_state') The TuringMachine class does all the work. class TuringMachine: def __init__(self, program=None): self.tape = deque() self.transition_table = {} self.state = None self.runtime = 0 self.steps = 0 self.pos = 0 self.offset = 0 if program is not None: self.load(program) def __str__(self): return f"Current: {self.state}; steps: {self.steps} of {self.runtime}" Some jiggery-pokery to allow us to use self[pos] to reference an infinite tape. def __getitem__(self, i): i += self.offset if i < 0 or i >= len(self.tape): return 0 else: return self.tape[i] def __setitem__(self, i, x): i += self.offset if i >= 0 and i < len(self.tape): self.tape[i] = x elif i == -1: self.tape.appendleft(x) self.offset += 1 elif i == len(self.tape): self.tape.append(x) else: raise IndexError('Tried to set position off end of tape') Parse the program and set up the transtion table. 
def load(self, program): if isinstance(program, Iterator): program = ''.join(program) match = RE_ISTATE.search(program) self.state = match['state'] match = RE_RUNTIME.search(program) self.runtime = int(match['steps']) for match in RE_STATETRANS.finditer(program): self.transition_table[match['state']] = { int(match['read0']): Rule(write=int(match['write0']), move=MOVE[match['move0']], next_state=match['next0']), int(match['read1']): Rule(write=int(match['write1']), move=MOVE[match['move1']], next_state=match['next1']), } Run the program for the required number of steps (given by self.runtime). tqdm isn’t in the standard library but it should be: it shows a lovely text-mode progress bar as we go. def run(self): for _ in tqdm(range(self.runtime), desc="Running", unit="steps", unit_scale=True): read = self[self.pos] rule = self.transition_table[self.state][read] self[self.pos] = rule.write self.pos += rule.move self.state = rule.next_state Calculate the “diagnostic checksum” required for the answer. @property def checksum(self): return sum(self.tape) Aaand GO! machine = TuringMachine(fi.input()) machine.run() print("Checksum:", machine.checksum) Electromagnetic Moat — Rust — #adventofcode Day 24 Today’s challenge, the penultimate, requires us to build a bridge capable of reaching across to the CPU, our final destination. → Full code on GitHub !!! commentary We have a finite number of components that fit together in a restricted way from which to build a bridge, and we have to work out both the strongest and the longest bridge we can build. The most obvious way to do this is to recursively build every possible bridge and select the best, but that’s an O(n!) algorithm that could blow up quickly, so might as well go with a nice fast language! Might have to try this in Haskell too, because it’s the type of algorithm that lends itself naturally to a pure functional approach. I feel like I've applied some of the things I've learned in previous challenges I used Rust for, and spent less time mucking about with ownership, and made better use of various language features, including structs and iterators. I'm rather pleased with how my learning of this language is progressing. I'm definitely overusing `Option.unwrap` at the moment though: this is a lazy way to deal with `Option` results and will panic if the result is not what's expected. I'm not sure whether I need to be cloning the components `Vector` either, or whether I could just be passing iterators around. First, we import some bits of standard library and define some data types. The BridgeResult struct lets us use the same algorithm for both parts of the challenge and simply change the value used to calculate the maximum. 
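For context, the puzzle input is simply one component per line, each giving its two port sizes separated by a slash; a made-up handful of lines in that format might be:

```
0/2
2/2
2/3
3/4
0/1
10/1
```

Component::from_str below parses exactly these.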
use std::io; use std::fmt; use std::io::BufRead; #[derive(Debug, Copy, Clone, PartialEq, Eq, Hash)] struct Component(u8, u8); #[derive(Debug, Copy, Clone, Default)] struct BridgeResult { strength: u16, length: u16, } impl Component { fn from_str(s: &str) -> Component { let parts: Vec<&str> = s.split('/').collect(); assert!(parts.len() == 2); Component(parts[0].parse().unwrap(), parts[1].parse().unwrap()) } fn fits(self, port: u8) -> bool { self.0 == port || self.1 == port } fn other_end(self, port: u8) -> u8 { if self.0 == port { return self.1; } else if self.1 == port { return self.0; } else { panic!("{} doesn't fit port {}", self, port); } } fn strength(self) -> u16 { self.0 as u16 + self.1 as u16 } } impl fmt::Display for BridgeResult { fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result { write!(f, "(S: {}, L: {})", self.strength, self.length) } } best_bridge calculates the length and strength of the “best” bridge that can be built from the remaining components and fits the required port. Whether this is based on strength or length is given by the key parameter, which is passed to Iter.max_by_key. fn best_bridge<F>(port: u8, key: &F, components: &Vec<Component>) -> Option<BridgeResult> where F: Fn(&BridgeResult) -> u16 { if components.len() == 0 { return None; } components.iter() .filter(|c| c.fits(port)) .map(|c| { let b = best_bridge(c.other_end(port), key, &components.clone().into_iter() .filter(|x| x != c).collect()) .unwrap_or_default(); BridgeResult{strength: c.strength() + b.strength, length: 1 + b.length} }) .max_by_key(key) } Now all that remains is to read the input and calculate the result. I was rather pleasantly surprised to find that in spite of my pessimistic predictions about efficiency, when compiled with optimisations turned on this terminates in less than 1s on my laptop. fn main() { let stdin = io::stdin(); let components: Vec<_> = stdin.lock() .lines() .map(|l| Component::from_str(&l.unwrap())) .collect(); match best_bridge(0, &|b: &BridgeResult| b.strength, &components) { Some(b) => println!("Strongest bridge is {}", b), None => println!("No strongest bridge found") }; match best_bridge(0, &|b: &BridgeResult| b.length, &components) { Some(b) => println!("Longest bridge is {}", b), None => println!("No longest bridge found") }; } Coprocessor Conflagration — Haskell — #adventofcode Day 23 Today’s challenge requires us to understand why a coprocessor is working so hard to perform an apparently simple calculation. → Full code on GitHub !!! commentary Today’s problem is based on an assembly-like language very similar to day 18, so I went back and adapted my code from that, which works well for the first part. I’ve also incorporated some advice from /r/haskell, and cleaned up all warnings shown by the -Wall compiler flag and the hlint tool. Part 2 requires the algorithm to run with much larger inputs, and since some analysis shows that it's an `O(n^3)` algorithm it gets intractible pretty fast. There are several approaches to this. First up, if you have a fast enough processor and an efficient enough implementation I suspect that the simulation would probably terminate eventually, but that would likely still take hours: not good enough. I also thought about doing some peephole optimisations on the instructions, but the last time I did compiler optimisation was my degree so I wasn't really sure where to start. What I ended up doing was actually analysing the input code by hand to figure out what it was doing, and then just doing that calculation in a sensible way. 
I'd like to say I managed this on my own (and I ike to think I would have) but I did get some tips on [/r/adventofcode](https://reddit.com/r/adventofcode). The majority of this code is simply a cleaned-up version of day 18, with some tweaks to accommodate the different instruction set: module Main where import qualified Data.Vector as V import qualified Data.Map.Strict as M import Control.Monad.State.Strict import Text.ParserCombinators.Parsec hiding (State) type Register = Char type Value = Int type Argument = Either Value Register data Instruction = Set Register Argument | Sub Register Argument | Mul Register Argument | Jnz Argument Argument deriving Show type Program = V.Vector Instruction data Result = Cont | Halt deriving (Eq, Show) type Registers = M.Map Char Int data Machine = Machine { dRegisters :: Registers , dPtr :: !Int , dMulCount :: !Int , dProgram :: Program } instance Show Machine where show d = show (dRegisters d) ++ " @" ++ show (dPtr d) ++ " ×" ++ show (dMulCount d) defaultMachine :: Machine defaultMachine = Machine M.empty 0 0 V.empty type MachineState = State Machine program :: GenParser Char st Program program = do instructions <- endBy instruction eol return $ V.fromList instructions where instruction = try (regOp "set" Set) <|> regOp "sub" Sub <|> regOp "mul" Mul <|> jump "jnz" Jnz regOp n c = do string n >> spaces val1 <- oneOf "abcdefgh" secondArg c val1 jump n c = do string n >> spaces val1 <- regOrVal secondArg c val1 secondArg c val1 = do spaces val2 <- regOrVal return $ c val1 val2 regOrVal = register <|> value register = do name <- lower return $ Right name value = do val <- many $ oneOf "-0123456789" return $ Left $ read val eol = char '\n' parseProgram :: String -> Either ParseError Program parseProgram = parse program "" getReg :: Char -> MachineState Int getReg r = do st <- get return $ M.findWithDefault 0 r (dRegisters st) putReg :: Char -> Int -> MachineState () putReg r v = do st <- get let current = dRegisters st new = M.insert r v current put $ st { dRegisters = new } modReg :: (Int -> Int -> Int) -> Char -> Argument -> MachineState () modReg op r v = do u <- getReg r v' <- getRegOrVal v putReg r (u `op` v') incPtr getRegOrVal :: Argument -> MachineState Int getRegOrVal = either return getReg addPtr :: Int -> MachineState () addPtr n = do st <- get put $ st { dPtr = n + dPtr st } incPtr :: MachineState () incPtr = addPtr 1 execInst :: Instruction -> MachineState () execInst (Set reg val) = do newVal <- getRegOrVal val putReg reg newVal incPtr execInst (Mul reg val) = do result <- modReg (*) reg val st <- get put $ st { dMulCount = 1 + dMulCount st } return result execInst (Sub reg val) = modReg (-) reg val execInst (Jnz val1 val2) = do test <- getRegOrVal val1 jump <- if test /= 0 then getRegOrVal val2 else return 1 addPtr jump execNext :: MachineState Result execNext = do st <- get let prog = dProgram st p = dPtr st if p >= length prog then return Halt else do execInst (prog V.! 
p) return Cont runUntilTerm :: MachineState () runUntilTerm = do result <- execNext unless (result == Halt) runUntilTerm This implements the actual calculation: the number of non-primes between (for my input) 107900 and 124900: optimisedCalc :: Int -> Int -> Int -> Int optimisedCalc a b k = sum $ map (const 1) $ filter notPrime [a,a+k..b] where notPrime n = elem 0 $ map (mod n) [2..(floor $ sqrt (fromIntegral n :: Double))] main :: IO () main = do input <- getContents case parseProgram input of Right prog -> do let c = defaultMachine { dProgram = prog } (_, c') = runState runUntilTerm c putStrLn $ show (dMulCount c') ++ " multiplications made" putStrLn $ "Calculation result: " ++ show (optimisedCalc 107900 124900 17) Left e -> print e Sporifica Virus — Rust — #adventofcode Day 22 Today’s challenge has us helping to clean up (or spread, I can’t really tell) an infection of the “sporifica” virus. → Full code on GitHub !!! commentary I thought I’d have another play with Rust, as its Haskell-like features resonate with me at the moment. I struggled quite a lot with the Rust concepts of ownership and borrowing, and this is a cleaned-up version of the code based on some good advice from the folks on /r/rust. use std::io; use std::env; use std::io::BufRead; use std::collections::HashMap; #[derive(PartialEq, Clone, Copy, Debug)] enum Direction {Up, Right, Down, Left} #[derive(PartialEq, Clone, Copy, Debug)] enum Infection {Clean, Weakened, Infected, Flagged} use self::Direction::*; use self::Infection::*; type Grid = HashMap<(isize, isize), Infection>; fn turn_left(d: Direction) -> Direction { match d {Up => Left, Right => Up, Down => Right, Left => Down} } fn turn_right(d: Direction) -> Direction { match d {Up => Right, Right => Down, Down => Left, Left => Up} } fn turn_around(d: Direction) -> Direction { match d {Up => Down, Right => Left, Down => Up, Left => Right} } fn make_move(d: Direction, x: isize, y: isize) -> (isize, isize) { match d { Up => (x-1, y), Right => (x, y+1), Down => (x+1, y), Left => (x, y-1), } } fn basic_step(grid: &mut Grid, x: &mut isize, y: &mut isize, d: &mut Direction) -> usize { let mut infect = 0; let current = match grid.get(&(*x, *y)) { Some(v) => *v, None => Clean, }; if current == Infected { *d = turn_right(*d); } else { *d = turn_left(*d); infect = 1; }; grid.insert((*x, *y), match current { Clean => Infected, Infected => Clean, x => panic!("Unexpected infection state {:?}", x), }); let new_pos = make_move(*d, *x, *y); *x = new_pos.0; *y = new_pos.1; infect } fn nasty_step(grid: &mut Grid, x: &mut isize, y: &mut isize, d: &mut Direction) -> usize { let mut infect = 0; let new_state: Infection; let current = match grid.get(&(*x, *y)) { Some(v) => *v, None => Infection::Clean, }; match current { Clean => { *d = turn_left(*d); new_state = Weakened; }, Weakened => { new_state = Infected; infect = 1; }, Infected => { *d = turn_right(*d); new_state = Flagged; }, Flagged => { *d = turn_around(*d); new_state = Clean; } }; grid.insert((*x, *y), new_state); let new_pos = make_move(*d, *x, *y); *x = new_pos.0; *y = new_pos.1; infect } fn virus_infect<F>(mut grid: Grid, mut step: F, mut x: isize, mut y: isize, mut d: Direction, n: usize) -> usize where F: FnMut(&mut Grid, &mut isize, &mut isize, &mut Direction) -> usize, { (0..n).map(|_| step(&mut grid, &mut x, &mut y, &mut d)) .sum() } fn main() { let args: Vec<String> = env::args().collect(); let n_basic: usize = args[1].parse().unwrap(); let n_nasty: usize = args[2].parse().unwrap(); let stdin = io::stdin(); let lines: 
Vec<String> = stdin.lock() .lines() .map(|x| x.unwrap()) .collect(); let mut grid: Grid = HashMap::new(); let x0 = (lines.len() / 2) as isize; let y0 = (lines[0].len() / 2) as isize; for (i, line) in lines.iter().enumerate() { for (j, c) in line.chars().enumerate() { grid.insert((i as isize, j as isize), match c {'#' => Infected, _ => Clean}); } } let basic_steps = virus_infect(grid.clone(), basic_step, x0, y0, Up, n_basic); println!("Basic: infected {} times", basic_steps); let nasty_steps = virus_infect(grid, nasty_step, x0, y0, Up, n_nasty); println!("Nasty: infected {} times", nasty_steps); } Fractal Art — Python — #adventofcode Day 21 Today’s challenge asks us to assist an artist building fractal patterns from a rulebook. → Full code on GitHub !!! commentary Another fairly straightforward algorithm: the really tricky part was breaking the pattern up into chunks and rejoining it again. I could probably have done that more efficiently, and would have needed to if I had to go for a few more iterations and the grid grows with every iteration and gets big fast. Still behind on the blog posts… import fileinput as fi from math import sqrt from functools import reduce, partial import operator INITIAL_PATTERN = ((0, 1, 0), (0, 0, 1), (1, 1, 1)) DECODE = ['.', '#'] ENCODE = {'.': 0, '#': 1} concat = partial(reduce, operator.concat) def rotate(p): size = len(p) return tuple(tuple(p[i][j] for i in range(size)) for j in range(size - 1, -1, -1)) def flip(p): return tuple(p[i] for i in range(len(p) - 1, -1, -1)) def permutations(p): yield p yield flip(p) for _ in range(3): p = rotate(p) yield p yield flip(p) def print_pattern(p): print('-' * len(p)) for row in p: print(' '.join(DECODE[x] for x in row)) print('-' * len(p)) def build_pattern(s): return tuple(tuple(ENCODE[c] for c in row) for row in s.split('/')) def build_pattern_book(lines): book = {} for line in lines: source, target = line.strip().split(' => ') for rotation in permutations(build_pattern(source)): book[rotation] = build_pattern(target) return book def subdivide(pattern): size = 2 if len(pattern) % 2 == 0 else 3 n = len(pattern) // size return (tuple(tuple(pattern[i][j] for j in range(y * size, (y + 1) * size)) for i in range(x * size, (x + 1) * size)) for x in range(n) for y in range(n)) def rejoin(parts): n = int(sqrt(len(parts))) size = len(parts[0]) return tuple(concat(parts[i + k][j] for i in range(n)) for k in range(0, len(parts), n) for j in range(size)) def enhance_once(p, book): return rejoin(tuple(book[part] for part in subdivide(p))) def enhance(p, book, n, progress=None): for _ in range(n): p = enhance_once(p, book) return p book = build_pattern_book(fi.input()) intermediate_pattern = enhance(INITIAL_PATTERN, book, 5) print("After 5 iterations:", sum(sum(row) for row in intermediate_pattern)) final_pattern = enhance(intermediate_pattern, book, 13) print("After 18 iterations:", sum(sum(row) for row in final_pattern)) Particle Swarm — Python — #adventofcode Day 20 Today’s challenge finds us simulating the movements of particles in space. → Full code on GitHub !!! commentary Back to Python for this one, another relatively straightforward simulation, although it’s easier to calculate the answer to part 1 than to simulate. import fileinput as fi import numpy as np import re First we parse the input into 3 2D arrays: using numpy enables us to do efficient arithmetic across the whole set of particles in one go. 
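Each particle occupies one line of the input, giving its position, velocity and acceleration as three comma-separated integer triples; an illustrative line (made-up values) looks like:

```
p=<-317,1413,1507>, v=<19,-102,-108>, a=<1,-3,-3>
```

The regular expression below captures those nine signed integers.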
PARTICLE_RE = re.compile(r'p=<(-?\d+),(-?\d+),(-?\d+)>, ' r'v=<(-?\d+),(-?\d+),(-?\d+)>, ' r'a=<(-?\d+),(-?\d+),(-?\d+)>') def parse_input(lines): x = [] v = [] a = [] for l in lines: m = PARTICLE_RE.match(l) x.append([int(x) for x in m.group(1, 2, 3)]) v.append([int(x) for x in m.group(4, 5, 6)]) a.append([int(x) for x in m.group(7, 8, 9)]) return (np.arange(len(x)), np.array(x), np.array(v), np.array(a)) i, x, v, a = parse_input(fi.input()) Now we can calculate which particle will be closest to the origin in the long-term: this is simply the particle with the smallest acceleration. It turns out that several have the same acceleration, so of these, the one we want is the one with the lowest starting velocity. This is only complicated slightly by the need to get the number of the particle rather than its other information, hence the need to use numpy.argmin. a_abs = np.sum(np.abs(a), axis=1) a_min = np.min(a_abs) a_i = np.squeeze(np.argwhere(a_abs == a_min)) closest = i[a_i[np.argmin(np.sum(np.abs(v[a_i]), axis=1))]] print("Closest: ", closest) Now we define functions to simulate collisions between particles. We have to use the return_index and return_counts options to numpy.unique to be able to get rid of all the duplicate positions (the standard usage is to keep one of each duplicate). def resolve_collisions(x, v, a): (_, i, c) = np.unique(x, return_index=True, return_counts=True, axis=0) i = i[c == 1] return x[i], v[i], a[i] The termination criterion for this loop is an interesting aspect: the most robust to my mind seems to be that eventually the particles will end up sorted in order of their initial acceleration in terms of distance from the origin, so you could check for this but that’s pretty computationally expensive. In the end, all that was needed was a bit of trial and error: terminating arbitrarily after 1,000 iterations seems to work! In fact, all the collisions are over after about 40 iterations for my input but there was always the possibility that two particles with very slightly different accelerations would eventually intersect much later. def simulate_collisions(x, v, a, iterations=1000): for _ in range(iterations): v += a x += v x, v, a = resolve_collisions(x, v, a) return len(x) print("Remaining particles: ", simulate_collisions(x, v, a)) A Series of Tubes — Rust — #adventofcode Day 19 Today’s challenge asks us to help a network packet find its way. → Full code on GitHub !!! commentary Today’s challenge was fairly straightforward, following an ASCII art path, so I thought I’d give Rust another try. I’m a bit behind on the blog posts, so I’m presenting the code below without any further commentary. I’m not really convinced this is good idiomatic Rust, and it was interesting turning a set of strings into a 2D array of characters because there are both u8 (byte) and char types to deal with. use std::io; use std::io::BufRead; const ALPHA: &'static str = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"; fn change_direction(dia: &Vec<Vec<u8>>, x: usize, y: usize, dx: &mut i32, dy: &mut i32) { assert_eq!(dia[x][y], b'+'); if dx.abs() == 1 { *dx = 0; if y + 1 < dia[x].len() && (dia[x][y + 1] == b'-' || ALPHA.contains(dia[x][y + 1] as char)) { *dy = 1; } else if dia[x][y - 1] == b'-' || ALPHA.contains(dia[x][y - 1] as char) { *dy = -1; } else { panic!("Huh? 
{} {}", dia[x][y+1] as char, dia[x][y-1] as char); } } else { *dy = 0; if x + 1 < dia.len() && (dia[x + 1][y] == b'|' || ALPHA.contains(dia[x + 1][y] as char)) { *dx = 1; } else if dia[x - 1][y] == b'|' || ALPHA.contains(dia[x - 1][y] as char) { *dx = -1; } else { panic!("Huh?"); } } } fn follow_route(dia: Vec<Vec<u8>>) -> (String, i32) { let mut x: i32 = 0; let mut y: i32; let mut dx: i32 = 1; let mut dy: i32 = 0; let mut result = String::new(); let mut steps = 1; match dia[0].iter().position(|x| *x == b'|') { Some(i) => y = i as i32, None => panic!("Could not find '|' in first row"), } loop { x += dx; y += dy; match dia[x as usize][y as usize] { b'A'...b'Z' => result.push(dia[x as usize][y as usize] as char), b'+' => change_direction(&dia, x as usize, y as usize, &mut dx, &mut dy), b' ' => return (result, steps), _ => (), } steps += 1; } } fn main() { let stdin = io::stdin(); let lines: Vec<Vec<u8>> = stdin.lock().lines() .map(|l| l.unwrap().into_bytes()) .collect(); let result = follow_route(lines); println!("Route: {}", result.0); println!("Steps: {}", result.1); } Duet — Haskell — #adventofcode Day 18 Today’s challenge introduces a type of simplified assembly language that includes instructions for message-passing. First we have to simulate a single program (after humorously misinterpreting the snd and rcv instructions as “sound” and “recover”), but then we have to simulate two concurrent processes and the message passing between them. → Full code on GitHub !!! commentary Well, I really learned a lot from this one! I wanted to get to grips with more complex stuff in Haskell and this challenge seemed like an excellent opportunity to figure out a) parsing with the parsec library and b) using the State monad to keep the state of the simulator. As it turned out, that wasn't all I'd learned: I also ran into an interesting situation whereby lazy evaluation was creating an infinite loop where there shouldn't be one, so I also had to learn how to selectively force strict evaluation of values. I'm pretty sure this isn't the best Haskell in the world, but I'm proud of it. First we have to import a bunch of stuff to use later, but also notice the pragma on the first line which instructs the compiler to enable the BangPatterns language extension, which will be important later. {-# LANGUAGE BangPatterns #-} module Main where import qualified Data.Vector as V import qualified Data.Map.Strict as M import Data.List import Data.Either import Data.Maybe import Control.Monad.State.Strict import Control.Monad.Loops import Text.ParserCombinators.Parsec hiding (State) First up we define the types that will represent the program code itself. data DuetVal = Reg Char | Val Int deriving Show type DuetQueue = [Int] data DuetInstruction = Snd DuetVal | Rcv DuetVal | Jgz DuetVal DuetVal | Set DuetVal DuetVal | Add DuetVal DuetVal | Mul DuetVal DuetVal | Mod DuetVal DuetVal deriving Show type DuetProgram = V.Vector DuetInstruction Next we define the types to hold the machine state, which includes: registers, instruction pointer, send & receive buffers and the program code, plus a counter of the number of sends made (to provide the solution). 
type DuetRegisters = M.Map Char Int data Duet = Duet { dRegisters :: DuetRegisters , dPtr :: Int , dSendCount :: Int , dRcvBuf :: DuetQueue , dSndBuf :: DuetQueue , dProgram :: DuetProgram } instance Show Duet where show d = show (dRegisters d) ++ " @" ++ show (dPtr d) ++ " S" ++ show (dSndBuf d) ++ " R" ++ show (dRcvBuf d) defaultDuet = Duet M.empty 0 0 [] [] V.empty type DuetState = State Duet program is a parser built on the cool parsec library to turn the program text into a Haskell format that we can work with, a Vector of instructions. Yes, using a full-blown parser is overkill here (it would be much simpler just to split each line on whitespace, but I wanted to see how Parsec works. I’m using Vector here because we need random access to the instruction list, which is much more efficient with Vector: O(1) compared with the O(n) of the built in Haskell list ([]) type. parseProgram applies the parser to a string and returns the result. program :: GenParser Char st DuetProgram program = do instructions <- endBy instruction eol return $ V.fromList instructions where instruction = try (oneArg "snd" Snd) <|> oneArg "rcv" Rcv <|> twoArg "set" Set <|> twoArg "add" Add <|> try (twoArg "mul" Mul) <|> twoArg "mod" Mod <|> twoArg "jgz" Jgz oneArg n c = do string n >> spaces val <- regOrVal return $ c val twoArg n c = do string n >> spaces val1 <- regOrVal spaces val2 <- regOrVal return $ c val1 val2 regOrVal = register <|> value register = do name <- lower return $ Reg name value = do val <- many $ oneOf "-0123456789" return $ Val $ read val eol = char '\n' parseProgram :: String -> Either ParseError DuetProgram parseProgram = parse program "" Next up we have some utility functions that sit in the DuetState monad we defined above and perform common manipulations on the state: getting/setting/updating registers, updating the instruction pointer and sending/receiving messages via the relevant queues. getReg :: Char -> DuetState Int getReg r = do st <- get return $ M.findWithDefault 0 r (dRegisters st) putReg :: Char -> Int -> DuetState () putReg r v = do st <- get let current = dRegisters st new = M.insert r v current put $ st { dRegisters = new } modReg :: (Int -> Int -> Int) -> Char -> DuetVal -> DuetState Bool modReg op r v = do u <- getReg r v' <- getRegOrVal v putReg r (u `op` v') incPtr return False getRegOrVal :: DuetVal -> DuetState Int getRegOrVal (Reg r) = getReg r getRegOrVal (Val v) = return v addPtr :: Int -> DuetState () addPtr n = do st <- get put $ st { dPtr = n + dPtr st } incPtr = addPtr 1 send :: Int -> DuetState () send v = do st <- get put $ st { dSndBuf = (dSndBuf st ++ [v]), dSendCount = dSendCount st + 1 } recv :: DuetState (Maybe Int) recv = do st <- get case dRcvBuf st of (x:xs) -> do put $ st { dRcvBuf = xs } return $ Just x [] -> return Nothing execInst implements the logic for each instruction. It returns False as long as the program can continue, but True if the program tries to receive from an empty buffer. 
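To make that concrete, a program in this little assembly language is just a list of lines like the following (a small illustrative example in the same format the parser above accepts):

```
set a 1
add a 2
mul a a
mod a 5
snd a
set a 0
rcv a
jgz a -1
```

execInst pattern-matches on the corresponding DuetInstruction values one at a time: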
execInst :: DuetInstruction -> DuetState Bool execInst (Set (Reg reg) val) = do newVal <- getRegOrVal val putReg reg newVal incPtr return False execInst (Mul (Reg reg) val) = modReg (*) reg val execInst (Add (Reg reg) val) = modReg (+) reg val execInst (Mod (Reg reg) val) = modReg mod reg val execInst (Jgz val1 val2) = do st <- get test <- getRegOrVal val1 jump <- if test > 0 then getRegOrVal val2 else return 1 addPtr jump return False execInst (Snd val) = do v <- getRegOrVal val send v st <- get incPtr return False execInst (Rcv (Reg r)) = do st <- get v <- recv handle v where handle :: Maybe Int -> DuetState Bool handle (Just x) = putReg r x >> incPtr >> return False handle Nothing = return True execInst x = error $ "execInst not implemented yet for " ++ show x execNext looks up the next instruction and executes it. runUntilWait runs the program until execNext returns True to signal the wait state has been reached. execNext :: DuetState Bool execNext = do st <- get let prog = dProgram st p = dPtr st if p >= length prog then return True else execInst (prog V.! p) runUntilWait :: DuetState () runUntilWait = do waiting <- execNext unless waiting runUntilWait runTwoPrograms handles the concurrent running of two programs, by running first one and then the other to a wait state, then swapping each program’s send buffer to the other’s receive buffer before repeating. If you look carefully, you’ll see a “bang” (!) before the two arguments of the function: runTwoPrograms !d0 !d1. Haskell is a lazy language and usually doesn’t evaluate a computation until you ask for a result, instead carrying around a “thunk” or plan for how to carry out the computation. Sometimes that can be a problem because the amount of memory your program is using can explode unnecessarily as a long computation turns into a large thunk which isn’t evaluated until the very end. That’s not the problem here though. What happens here without the bangs is another side-effect of laziness. The exit condition of this recursive function is that a deadlock has been reached: both programs are waiting to receive, but neither has sent anything, so neither can ever continue. The check for this is (null $ dSndBuf d0') && (null $ dSndBuf d1'). As long as the first program has something in its send buffer, the test fails without ever evaluating the second part, which means the result d1' of running the second program is never needed. The function immediately goes to the recursive case and tries to continue the first program again, which immediately returns because it’s still waiting to receive. The same thing happens again, and the result is that instead of running the second program to obtain something for the first to receive, we get into an infinite loop trying and failing to continue the first program. The bang forces both d0 and d1 to be evaluated at the point we recurse, which forces the rest of the computation: running the second program and swapping the send/receive buffers. With that, the evaluation proceeds correctly and we terminate with a result instead of getting into an infinite loop! 
runTwoPrograms :: Duet -> Duet -> (Int, Int) runTwoPrograms !d0 !d1 | (null $ dSndBuf d0') && (null $ dSndBuf d1') = (dSendCount d0', dSendCount d1') | otherwise = runTwoPrograms d0'' d1'' where (_, d0') = runState runUntilWait d0 (_, d1') = runState runUntilWait d1 d0'' = d0' { dSndBuf = [], dRcvBuf = dSndBuf d1' } d1'' = d1' { dSndBuf = [], dRcvBuf = dSndBuf d0' } All that remains to be done now is to run the programs and see how many messages were sent before the deadlock. main = do prog <- fmap (fromRight V.empty . parseProgram) getContents let d0 = defaultDuet { dProgram = prog, dRegisters = M.fromList [('p', 0)] } d1 = defaultDuet { dProgram = prog, dRegisters = M.fromList [('p', 1)] } (send0, send1) = runTwoPrograms d0 d1 putStrLn $ "Program 0 sent " ++ show send0 ++ " messages" putStrLn $ "Program 1 sent " ++ show send1 ++ " messages" Spinlock — Rust/Python — #adventofcode Day 17 In today’s challenge we deal with a monstrous whirlwind of a program, eating up CPU and memory in equal measure. → Full code on GitHub (and Python driver script) !!! commentary One of the things I wanted from AoC was an opportunity to try out some popular languages that I don’t currently know, including the memory-safe, strongly-typed compiled languages Go and Rust. Realistically though, I’m likely to continue doing most of my programming in Python, and use one of these other languages when it has better tools or I need the extra speed. In which case, what I really want to know is how I can call functions written in Go or Rust from Python. I thought I'd try Rust first, as it seems to be designed to be C-compatible and that makes it easy to call from Python using [`ctypes`](https://docs.python.org/3.6/library/ctypes.html). Part 1 was another straightforward simulation: translate what the "spinlock" monster is doing into code and run it. It was pretty obvious from the story of this challenge and experience of the last few days that this was going to be another one where the simulation is too computationally expensive for part two, which turns out to be correct. So, first thing to do is to implement the meat of the solution in Rust. spinlock solves the first part of the problem by doing exactly what the monster does. Since we only have to go up to 2017 iterations, this is very tractable. The last number we insert is 2017, so we just return the number immediately after that. #[no_mangle] pub extern fn spinlock(n: usize, skip: usize) -> i32 { let mut buffer: Vec<i32> = Vec::with_capacity(n+1); buffer.push(0); buffer.push(1); let mut pos = 1; for i in 2..n+1 { pos = (pos + skip + 1) % buffer.len(); buffer.insert(pos, i as i32); } pos = (pos + 1) % buffer.len(); return buffer[pos]; } For the second part, we have to do 50 million iterations instead, which is a lot. Given that every time you insert an item in the list it has to move up all the elements after that position, I’m pretty sure the algorithm is O(n^2), so it’s going to take a lot longer than 10,000ish times the first part. Thankfully, we don’t need to build the whole list, just keep track of where 0 is and what number is immediately after it. There may be a closed-form solution to simply calculate the result, but I couldn’t think of it and this is good enough. 
#[no_mangle] pub extern fn spinlock0(n: usize, skip: usize) -> i32 { let mut pos = 1; let mut pos_0 = 0; let mut after_0 = 1; for i in 2..n+1 { pos = (pos + skip + 1) % i; if pos == pos_0 + 1 { after_0 = i; } if pos <= pos_0 { pos_0 += 1; } } return after_0 as i32; } Now it’s time to call this code from Python. Notice the #[no_mangle] pragmas and pub extern declarations for each function above, which are required to make sure the functions are exported in a C-compatible way. We can build this into a shared library like this: rustc --crate-type=cdylib -o spinlock.so 17-spinlock.rs The Python script is as simple as loading this library, reading the puzzle input from the command line and calling the functions. The ctypes module does a lot of magic so that we don’t have to worry about converting from Python types to native types and back again. import ctypes import sys lib = ctypes.cdll.LoadLibrary("./spinlock.so") skip = int(sys.argv[1]) print("Part 1:", lib.spinlock(2017, skip)) print("Part 2:", lib.spinlock0(50_000_000, skip)) This is a toy example as far as calling Rust from Python is concerned, but it’s worth noting that already we can play with the parameters to the two Rust functions without having to recompile. For more serious work, I’d probably be looking at something like PyO3 to make a proper Python module. Looks like there’s also a very early Rust numpy integration for integrating numerical stuff. You can also do the same thing from Julia, which has a ccall function built in: ccall((:spinlock, "./spinlock.so"), Int32, (UInt64, UInt64), 2017, 377) My next thing to try might be Haskell → Python though… Permutation Promenade — Julia — #adventofcode Day 16 Today’s challenge rather appeals to me as a folk dancer, because it describes a set of instructions for a dance and asks us to work out the positions of the dancing programs after each run through the dance. → Full code on GitHub !!! commentary So, part 1 is pretty straightforward: parse the set of instructions, interpret them and keep track of the dancer positions as you go. One time through the dance. However, part 2 asks for the positions after 1 billion (yes, that’s 1,000,000,000) times through the dance. In hindsight I should have immediately become suspicious, but I thought I’d at least try the brute force approach first because it was simpler to code. So I give it a try, and after waiting for a while, having a cup of tea etc., it still hasn't terminated. I try reducing the number of iterations to 1,000. Now it terminates, but takes about 6 seconds. A spot of arithmetic suggests that running the full version will take a little over 190 years. There must be a better way than that! I'm a little embarrassed that I didn't spot the solution immediately (blaming Julia) and tried again in Python to see if I could get it to terminate quicker. When that didn't work I had to think again. A little further investigation with a while loop shows that in fact the dance position repeats (in the case of my input) every 48 times. After that it becomes much quicker! Oh, and it was time for a new language, so I wasted some extra time working out the quirks of Julia. First, a function to evaluate a single move — for neatness, this dispatches to a dedicated function depending on the type of move, although this isn’t really necessary to solve the challenge. Ending a function name with a bang (!) is a Julia convention to indicate that it has side-effects.
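For reference, the dance itself arrives as a single comma-separated line of moves; a tiny illustrative example in the same format would be:

```
s1,x3/4,pe/b
```

that is, a spin of 1, an exchange of the programs at positions 3 and 4, and a partner swap of programs e and b. eval_move! dispatches on that first character: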
function eval_move!(move, dancers) move_type = move[1] params = move[2:end] if move_type == 's' # spin eval_spin!(params, dancers) elseif move_type == 'x' # exchange eval_exchange!(params, dancers) elseif move_type == 'p' # partner swap eval_partner!(params, dancers) end end These take care of the individual moves. Parsing the parameters from a string every single time probably isn’t ideal, but as it turns out, that optimisation isn’t really necessary. Note the + 1 in eval_exchange!, which is necessary because Julia is one of those crazy languages where indexes start from 1 instead of 0. These actions are pretty nice to implement, because Julia has circshift as a builtin to rotate a list, and allows you to assign to list slices and swap values in place with a single statement. function eval_spin!(params, dancers) shift = parse(Int, params) dancers[1:end] = circshift(dancers, shift) end function eval_exchange!(params, dancers) i, j = map(x -> parse(Int, x) + 1, split(params, "/")) dancers[i], dancers[j] = dancers[j], dancers[i] end function eval_partner!(params, dancers) a, b = split(params, "/") ia = findfirst([x == a for x in dancers]) ib = findfirst([x == b for x in dancers]) dancers[ia], dancers[ib] = b, a end dance! takes a list of moves and takes the dancers once through the dance. function dance!(moves, dancers) for m in moves eval_move!(m, dancers) end end To solve part 1, we simply need to read the moves in, set up the initial positions of the dancers and run the dance through once. join is necessary to a) turn characters into length-1 strings, and b) convert the list of strings back into a single string to print out. moves = split(readchomp(STDIN), ",") dancers = collect(join(c) for c in 'a':'p') orig_dancers = copy(dancers) dance!(moves, dancers) println(join(dancers)) Part 2 requires a little more work. We run the dance through again and again until we get back to the initial position, saving the intermediate positions in a list. The list now contains every possible position available from that starting point, so we can find position 1 billion by taking 1,000,000,000 modulo the list length (plus 1 because 1-based indexing) and use that to index into the list to get the final position. dance_cycle = [orig_dancers] while dancers != orig_dancers push!(dance_cycle, copy(dancers)) dance!(moves, dancers) end println(join(dance_cycle[1_000_000_000 % length(dance_cycle) + 1])) This terminates on my laptop in about 1.6s: Brute force 0; Careful thought 1! Dueling Generators — Rust — #adventofcode Day 15 Today’s challenge introduces two pseudo-random number generators which are trying to agree on a series of numbers. We play the part of the “judge”, counting the number of times their numbers agree in the lowest 16 bits. → Full code on GitHub Ever since I used Go to solve day 3, I’ve had a hankering to try the other new kid on the memory-safe compiled language block, Rust. I found it a bit intimidating at first because the syntax wasn’t as close to the C/C++ I’m familiar with as I’d expected, and there are quite a few concepts unique to Rust, like the use of traits. But I figured it out, so I can tick another language off my to-try list. I also implemented a version in Python for comparison: the Python version is more concise and easier to read but the Rust version runs about 10× faster. First we include the std::env “crate” which will let us get access to commandline arguments, and define some useful constants for later.
use std::env; const M: i64 = 2147483647; const MASK: i64 = 0b1111111111111111; const FACTOR_A: i64 = 16807; const FACTOR_B: i64 = 48271; gen_next generates the next number for a given generator’s sequence. gen_next_picky does the same, but for the “picky” generators, only returning values that meet their criteria. fn gen_next(factor: i64, current: i64) -> i64 { return (current * factor) % M; } fn gen_next_picky(factor: i64, current: i64, mult: i64) -> i64 { let mut next = gen_next(factor, current); while next % mult != 0 { next = gen_next(factor, next); } return next; } duel runs a single duel, and returns the number of times the generators agreed in the lowest 16 bits (found by doing a binary & with the mask defined above). Rust allows functions to be passed as parameters, so we use this to be able to run both versions of the duel using only this one function. fn duel<F, G>(n: i64, next_a: F, mut value_a: i64, next_b: G, mut value_b: i64) -> i64 where F: Fn(i64) -> i64, G: Fn(i64) -> i64, { let mut count = 0; for _ in 0..n { value_a = next_a(value_a); value_b = next_b(value_b); if (value_a & MASK) == (value_b & MASK) { count += 1; } } return count; } Finally, we read the start values from the command line and run the two duels. The expressions that begin |n| are closures (anonymous functions, often called lambdas in other languages) that we use to specify the generator functions for each duel. fn main() { let args: Vec<String> = env::args().collect(); let start_a: i64 = args[1].parse().unwrap(); let start_b: i64 = args[2].parse().unwrap(); println!( "Duel 1: {}", duel( 40000000, |n| gen_next(FACTOR_A, n), start_a, |n| gen_next(FACTOR_B, n), start_b, ) ); println!( "Duel 2: {}", duel( 5000000, |n| gen_next_picky(FACTOR_A, n, 4), start_a, |n| gen_next_picky(FACTOR_B, n, 8), start_b, ) ); } Disk Defragmentation — Haskell — #adventofcode Day 14 Today’s challenge has us helping a disk defragmentation program by identifying contiguous regions of used sectors on a 2D disk. → Full code on GitHub !!! commentary Wow, today’s challenge had a pretty steep learning curve. Day 14 was the first to directly reuse code from a previous day: the “knot hash” from day 10. I solved day 10 in Haskell, so I thought it would be easier to stick with Haskell for today as well. The first part was straightforward, but the second was pretty mind-bending in a pure functional language! I ended up solving it by implementing a [flood fill algorithm][flood]. It's recursive, which is right in Haskell's wheelhouse, but I ended up using `Data.Sequence` instead of the standard list type as its API for indexing is better. I haven't tried it, but I think it will also be a little faster than a naive list-based version. It took a looong time to figure everything out, but I had a day off work to be able to concentrate on it! A lot more imports for this solution, as we’re exercising a lot more of the standard library. module Main where import Prelude hiding (length, filter, take) import Data.Char (ord) import Data.Sequence import Data.Foldable hiding (length) import Data.Ix (inRange) import Data.Function ((&)) import Data.Maybe (fromJust, mapMaybe, isJust) import qualified Data.Set as Set import Text.Printf (printf) import System.Environment (getArgs) Also we’ll extract the key bits from day 10 into a module and import that. import KnotHash Now we define a few data types to make the code a bit more readable. 
Sector represents the state of a particular disk sector, either free, used (but unmarked) or used and marked as belonging to a given integer-labelled group. Grid is a 2D matrix of Sector, as a sequence of sequences. data Sector = Free | Used | Mark Int deriving (Eq) instance Show Sector where show Free = " ." show Used = " #" show (Mark i) = printf "%4d" i type GridRow = Seq Sector type Grid = Seq (GridRow) Some utility functions to make it easier to view the grids (which can be quite large): used for debugging but not in the finished solution. subGrid :: Int -> Grid -> Grid subGrid n = fmap (take n) . take n printRow :: GridRow -> IO () printRow row = do mapM_ (putStr . show) row putStr "\n" printGrid :: Grid -> IO () printGrid = mapM_ printRow makeKey generates the hash key for a given row. makeKey :: String -> Int -> String makeKey input n = input ++ "-" ++ show n stringToGridRow converts a binary string of ‘1’ and ‘0’ characters to a sequence of Sector values. stringToGridRow :: String -> GridRow stringToGridRow = fromList . map convert where convert x | x == '1' = Used | x == '0' = Free makeRow and makeGrid build up the grid to use based on the provided input string. makeRow :: String -> Int -> GridRow makeRow input n = stringToGridRow $ concatMap (printf "%08b") $ dense $ fullKnotHash 256 $ map ord $ makeKey input n makeGrid :: String -> Grid makeGrid input = fromList $ map (makeRow input) [0..127] Utility functions to count the number of used and free sectors, to give the solution to part 1. countEqual :: Sector -> Grid -> Int countEqual x = sum . fmap (length . filter (==x)) countUsed = countEqual Used countFree = countEqual Free Now the real meat begins! findUnmarked finds the location of the next used sector that we haven’t yet marked. It returns a Maybe value, which is Just (x, y) if there is still an unmarked block or Nothing if there’s nothing left to mark. findUnmarked :: Grid -> Maybe (Int, Int) findUnmarked g | y == Nothing = Nothing | otherwise = Just (fromJust x, fromJust y) where hasUnmarked row = isJust $ elemIndexL Used row x = findIndexL hasUnmarked g y = case x of Nothing -> Nothing Just x' -> elemIndexL Used $ index g x' floodFill implements a very simple recursive flood fill. It takes a target and replacement value and a starting location, and fills in the replacement value for every connected location that currently has the target value. We use it below to replace a connected used region with a marked region. floodFill :: Sector -> Sector -> (Int, Int) -> Grid -> Grid floodFill t r (x, y) g | inRange (0, length g - 1) x && inRange (0, length g - 1) y && elem == t = let newRow = update y r row newGrid = update x newRow g in newGrid & floodFill t r (x+1, y) & floodFill t r (x-1, y) & floodFill t r (x, y+1) & floodFill t r (x, y-1) | otherwise = g where row = g `index` x elem = row `index` y markNextGroup looks for an unmarked group and marks it if found. If no more groups are found it returns Nothing. markAllGroups then repeatedly applies markNextGroup until Nothing is returned. markNextGroup :: Int -> Grid -> Maybe Grid markNextGroup i g = case findUnmarked g of Nothing -> Nothing Just loc -> Just $ floodFill Used (Mark i) loc g markAllGroups :: Grid -> Grid markAllGroups g = markAllGroups' 1 g where markAllGroups' i g = case markNextGroup i g of Nothing -> g Just g' -> markAllGroups' (i+1) g' onlyMarks filters a grid row and returns a list of (possibly duplicated) group numbers in the row. onlyMarks :: GridRow -> [Int] onlyMarks = mapMaybe getMark .
toList where getMark Free = Nothing getMark Used = Nothing getMark (Mark i) = Just i Finally, countGroups puts all the group numbers into a set to get rid of duplicates and returns the size of the set, i.e. the total number of separate groups. countGroups :: Grid -> Int countGroups g = Set.size groupSet where groupSet = foldl' Set.union Set.empty $ fmap rowToSet g rowToSet = Set.fromList . toList . onlyMarks As always, every Haskell program needs a main function to drive the I/O and produce the actual result. main = do input <- fmap head getArgs let grid = makeGrid input used = countUsed grid marked = countGroups $ markAllGroups grid putStrLn $ "Used sectors: " ++ show used putStrLn $ "Groups: " ++ show marked Packet Scanners — Haskell — #adventofcode Day 13 Today’s challenge requires us to sneak past a firewall made up of a series of scanners. → Full code on GitHub !!! commentary I wasn’t really thinking straight when I solved this challenge. I got a solution without too much trouble, but I ended up simulating the step-by-step movement of the scanners. I finally realised that I could calculate whether or not a given scanner was safe at a given time directly with modular arithmetic, and it bugged me so much that I reimplemented the solution. Both are given below, the faster one first. First we introduce some standard library stuff and define some useful utilities. module Main where import qualified Data.Text as T import Data.Maybe (mapMaybe) strip :: String -> String strip = T.unpack . T.strip . T.pack splitOn :: String -> String -> [String] splitOn sep = map T.unpack . T.splitOn (T.pack sep) . T.pack parseScanner :: String -> (Int, Int) parseScanner s = (d, r) where [d, r] = map read $ splitOn ": " s traverseFW does all the hard work: it checks for each scanner whether or not it’s safe as we pass through, and returns a list of the severities of each time we’re caught. mapMaybe is like the standard map in many languages, but operates on a list of Haskell Maybe values, like a combined map and filter. If the value is Just x, x gets included in the returned list; if the value is Nothing, then it gets thrown away. traverseFW :: Int -> [(Int, Int)] -> [Int] traverseFW delay = mapMaybe caught where caught (d, r) = if (d + delay) `mod` (2*(r-1)) == 0 then Just (d * r) else Nothing Then the total severity of our passage through the firewall is simply the sum of each individual severity. severity :: [(Int, Int)] -> Int severity = sum . traverseFW 0 But we don’t want to know how badly we got caught, we want to know how long to wait before setting off to get through safely. findDelay tries traversing the firewall with increasing delay, and returns the delay for the first pass where we predict not getting caught. findDelay :: [(Int, Int)] -> Int findDelay scanners = head $ filter (null . flip traverseFW scanners) [0..] And finally, we put it all together and calculate and print the result. main = do scanners <- fmap (map parseScanner . 
lines) getContents putStrLn $ "Severity: " ++ (show $ severity scanners) putStrLn $ "Delay: " ++ (show $ findDelay scanners) I’m not generally bothered about performance for these challenges, but here I’ll note that my second attempt runs in a little under 2 seconds on my laptop: $ time ./13-packet-scanners-redux < 13-input.txt Severity: 1900 Delay: 3966414 ./13-packet-scanners-redux < 13-input.txt 1.73s user 0.02s system 99% cpu 1.754 total Compare that with the first, simulation-based one, which takes nearly a full minute: $ time ./13-packet-scanners < 13-input.txt Severity: 1900 Delay: 3966414 ./13-packet-scanners < 13-input.txt 57.63s user 0.27s system 100% cpu 57.902 total And for good measure, here’s the code. Notice the tick and tickOne functions, which together simulate moving all the scanners by one step; for this to work we have to track the full current state of each scanner, which is easier to read with a Haskell record-based custom data type. traverseFW is more complicated because it has to drive the simulation, but the rest of the code is mostly the same. module Main where import qualified Data.Text as T import Control.Monad (forM_) data Scanner = Scanner { depth :: Int , range :: Int , pos :: Int , dir :: Int } instance Show Scanner where show (Scanner d r p dir) = show d ++ "/" ++ show r ++ "/" ++ show p ++ "/" ++ show dir strip :: String -> String strip = T.unpack . T.strip . T.pack splitOn :: String -> String -> [String] splitOn sep str = map T.unpack $ T.splitOn (T.pack sep) $ T.pack str parseScanner :: String -> Scanner parseScanner s = Scanner d r 0 1 where [d, r] = map read $ splitOn ": " s tickOne :: Scanner -> Scanner tickOne (Scanner depth range pos dir) | pos <= 0 = Scanner depth range (pos+1) 1 | pos >= range - 1 = Scanner depth range (pos-1) (-1) | otherwise = Scanner depth range (pos+dir) dir tick :: [Scanner] -> [Scanner] tick = map tickOne traverseFW :: [Scanner] -> [(Int, Int)] traverseFW = traverseFW' 0 where traverseFW' _ [] = [] traverseFW' layer scanners@((Scanner depth range pos _):rest) -- | layer == depth && pos == 0 = (depth*range) + (traverseFW' (layer+1) $ tick rest) | layer == depth && pos == 0 = (depth,range) : (traverseFW' (layer+1) $ tick rest) | layer == depth && pos /= 0 = traverseFW' (layer+1) $ tick rest | otherwise = traverseFW' (layer+1) $ tick scanners severity :: [Scanner] -> Int severity = sum . map (uncurry (*)) . traverseFW empty :: [a] -> Bool empty [] = True empty _ = False findDelay :: [Scanner] -> Int findDelay scanners = delay where (delay, _) = head $ filter (empty . traverseFW . snd) $ zip [0..] $ iterate tick scanners main = do scanners <- fmap (map parseScanner . lines) getContents putStrLn $ "Severity: " ++ (show $ severity scanners) putStrLn $ "Delay: " ++ (show $ findDelay scanners) Digital Plumber — Python — #adventofcode Day 12 Today’s challenge has us helping a village of programs who are unable to communicate. We have a list of the the communication channels between their houses, and need to sort them out into groups such that we know that each program can communicate with others in its own group but not any others. Then we have to calculate the size of the group containing program 0 and the total number of groups. → Full code on GitHub !!! commentary This is one of those problems where I’m pretty sure that my algorithm isn’t close to being the most efficient, but it definitely works! For the sake of solving the challenge that’s all that matters, but it still bugs me. 
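For the record, the textbook tool for this kind of connected-components problem is a disjoint-set (union-find) structure, which merges groups in near-constant time per link. A minimal sketch of that idea in Python, for comparison only and not the approach I describe below:

```python
# Union-find sketch for comparison; not the solution walked through below.
def find(parent, x):
    # Follow parent pointers to the set representative, halving the path as we go.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def union(parent, a, b):
    # Merge the sets containing a and b.
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[rb] = ra

def build_groups(lines):
    # Each 'a <-> b, c' line just unions a with b and c.
    parent = {}
    for line in lines:
        head, rest = line.split(' <-> ')
        nodes = [int(head)] + [int(x) for x in rest.split(', ')]
        for n in nodes:
            parent.setdefault(n, n)
        for n in nodes[1:]:
            union(parent, nodes[0], n)
    groups = {}
    for n in parent:
        groups.setdefault(find(parent, n), set()).add(n)
    return groups
```

The size of the group containing program 0 and the total number of groups then fall straight out of the returned dictionary. Anyway, here's what I actually did.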
By now I’ve become used to using fileinput to transparently read data either from files given on the command-line or standard input if no arguments are given. import fileinput as fi First we make an initial pass through the input data, creating a group for each line representing the programs on that line (which can communicate with each other). We store this as a Python set. groups = [] for line in fi.input(): head, rest = line.split(' <-> ') group = set([int(head)]) group.update([int(x) for x in rest.split(', ')]) groups.append(group) Now we iterate through the groups, starting with the first, and merging any we find that overlap with our current group. i = 0 while i < len(groups): current = groups[i] Each pass through the groups brings more programs into the current group, so we have to go through and check their connections too. We make several merge passes, until we detect that no more merges took place. num_groups = len(groups) + 1 while num_groups > len(groups): j = i+1 num_groups = len(groups) This inner loop does the actual merging, and deletes each group as it’s merged in. while j < len(groups): if len(current & groups[j]) > 0: current.update(groups[j]) del groups[j] else: j += 1 i += 1 All that’s left to do now is to display the results. print("Number in group 0:", len([g for g in groups if 0 in g][0])) print("Number of groups:", len(groups)) Hex Ed — Python — #adventofcode Day 11 Today’s challenge is to help a program find its child process, which has become lost on a hexagonal grid. We need to follow the path taken by the child (given as input) and calculate the distance it is from home along with the furthest distance it has been at any point along the path. → Full code on GitHub !!! commentary I found this one quite interesting in that it was very quick to solve. In fact, I got lucky and my first quick implementation (max(abs(l)) below) gave the correct answer in spite of missing an obvious not-so-edge case. Thinking about it, there’s only a ⅓ chance that the first incorrect implementation would give the wrong answer! The code is shorter, so you get more words today. ☺ There are a number of different co-ordinate systems on a hexagonal grid (I discovered while reading up after solving it…). I intuitively went for the system known as ‘axial’ coordinates, where you pick two directions aligned to the grid as your x and y axes: note that these won’t be perpendicular. I chose ne/sw as the x axis and se/nw as y, but there are three other possible choices. That leads to the following definition for the directions, encoded as numpy arrays because that makes some of the code below neater. import numpy as np STEPS = {d: np.array(v) for d, v in [('ne', (1, 0)), ('se', (0, -1)), ('s', (-1, -1)), ('sw', (-1, 0)), ('nw', (0, 1)), ('n', (1, 1))]} hex_grid_dist, given a location l calculates the number of steps needed to reach that location from the centre at (0, 0). Notice that we can’t simply use the Manhattan distance here because, for example, one step north takes us to (1, 1), which would give a Manhattan distance of 2. 
Instead, we can see that moving in the n/s direction allows us to increment or decrement both coordinates at the same time: If the coordinates have the same sign: move n/s until one of them is zero, then move along the relevant ne or se axis back to the origin; in this case the number of steps is the greatest of the absolute values of the two coordinates If the coordinates have opposite signs: move independently along the ne and se axes to reduce each to 0; this time the number of steps is the sum of the absolute values of the two coordinates def hex_grid_distance(l): if sum(np.sign(l)) == 0: # i.e. opposite signs return sum(abs(l)) else: return max(abs(l)) Now we can read in the path followed by the child and follow it ourselves, tracking the maximum distance from home along the way. path = input().strip().split(',') location = np.array((0, 0)) max_distance = 0 for step in map(STEPS.get, path): location += step max_distance = max(max_distance, hex_grid_distance(location)) distance = hex_grid_distance(location) print("Child process is at", location, "which is", distance, "steps away") print("Greatest distance was", max_distance) Knot Hash — Haskell — #adventofcode Day 10 Today’s challenge asks us to help a group of programs implement a (highly questionable) hashing algorithm that involves repeatedly reversing parts of a list of numbers. → Full code on GitHub !!! commentary I went with Haskell again today, because it’s the weekend so I have a bit more time, and I really enjoyed yesterday’s Haskell implementation. Today gave me the opportunity to explore the standard library a bit more, as well as lending itself nicely to being decomposed into smaller parts to be combined using higher-order functions. You know the drill by now: import stuff we’ll use later. module Main where import Data.Char (ord) import Data.Bits (xor) import Data.Function ((&)) import Data.List (unfoldr) import Text.Printf (printf) import qualified Data.Text as T The worked example uses a concept of the “current position” as a pointer to a location in a static list. In Haskell it makes more sense to instead use the front of the list as the current position, and rotate the whole list as we progress to bring the right element to the front. rotate :: Int -> [Int] -> [Int] rotate 0 xs = xs rotate n xs = drop n' xs ++ take n' xs where n' = n `mod` length xs The simple version of the hash requires working through the input list, modifying the working list as we go, and incrementing a “skip” counter with each step. Converting this to a functional style, we simply zip up the input with an infinite list [0, 1, 2, 3, ...] to give the counter values. Notice that we also have to calculate how far to rotate the working list to get back to its original position. foldl lets us specify a function that returns a modified version of the working list and feeds the input list in one at a time. simpleKnotHash :: Int -> [Int] -> [Int] simpleKnotHash size input = foldl step [0..size-1] input' & rotate (negate finalPos) where input' = zip input [0..] finalPos = sum $ zipWith (+) input [0..] reversePart xs n = (reverse $ take n xs) ++ drop n xs step xs (n, skip) = reversePart xs n & rotate (n+skip) The full version of the hash (part 2 of the challenge) starts the same way as the simple version, except making 64 passes instead of one: we can do this by using replicate to make a list of 64 copies, then collapse that into a single list with concat.
fullKnotHash :: Int -> [Int] -> [Int] fullKnotHash size input = simpleKnotHash size input' where input' = concat $ replicate 64 input The next step in calculating the full hash collapses the full 256-element “sparse” hash down into 16 elements by XORing groups of 16 together. unfoldr is a nice efficient way of doing this. dense :: [Int] -> [Int] dense = unfoldr dense' where dense' [] = Nothing dense' xs = Just (foldl1 xor $ take 16 xs, drop 16 xs) The final hash step is to convert the list of integers into a hexadecimal string. hexify :: [Int] -> String hexify = concatMap (printf "%02x") These two utility functions put together building blocks from the Data.Text module to parse the input string. Note that no arguments are given: the functions are defined purely by composing other functions using the . operator. In Haskell this is referred to as “point-free” style. strip :: String -> String strip = T.unpack . T.strip . T.pack parseInput :: String -> [Int] parseInput = map (read . T.unpack) . T.splitOn (T.singleton ',') . T.pack Now we can put it all together, including building the weird input for the “full” hash. main = do input <- fmap strip getContents let simpleInput = parseInput input asciiInput = map ord input ++ [17, 31, 73, 47, 23] (a:b:_) = simpleKnotHash 256 simpleInput print $ (a*b) putStrLn $ fullKnotHash 256 asciiInput & dense & hexify Stream Processing — Haskell — #adventofcode Day 9 In today’s challenge we come across a stream that we need to cross. But of course, because we’re stuck inside a computer, it’s not water but data flowing past. The stream is too dangerous to cross until we’ve removed all the garbage, and to prove we can do that we have to calculate a score for the valid data “groups” and the number of garbage characters to remove. → Full code on GitHub !!! commentary One of my goals for this process was to knock the rust off my functional programming skills in Haskell, and I haven’t done that for the whole of the first week. Processing strings character by character and acting according to which character shows up seems like a good choice for pattern-matching though, so here we go. I also wanted to take a bash at test-driven development in Haskell, so I loaded up the Test.Hspec module to give it a try. I did find keeping track of all the state in arguments a bit mind boggling, and I think it could have been improved through use of a data type using record syntax and the `State` monad, so that's something to look at for a future challenge. First import the extra bits we’ll need. module Main where import Test.Hspec import Data.Function ((&)) countGroups solves the first part of the problem, counting up the “score” of the valid data in the stream. countGroups' is an auxiliary function that holds some state in its arguments. We use pattern matching for the base case: [] represents the empty list in Haskell, which indicates we’ve finished the whole stream. Otherwise, we split the remaining stream into its first character and remainder, and use guards to decide how to interpret it. If skip is true, discard the character and carry on with skip set back to false. If we find a “!”, that tells us to skip the next character. Other characters mark groups or sets of garbage: groups increase the score when they close and garbage is discarded. We continue to progress the list by recursing with the remainder of the stream and any updated state.
countGroups :: String -> Int countGroups = countGroups' 0 0 False False where countGroups' score _ _ _ [] = score countGroups' score level garbage skip (c:rest) | skip = countGroups' score level garbage False rest | c == '!' = countGroups' score level garbage True rest | garbage = case c of '>' -> countGroups' score level False False rest _ -> countGroups' score level True False rest | otherwise = case c of '{' -> countGroups' score (level+1) False False rest '}' -> countGroups' (score+level) (level-1) False False rest ',' -> countGroups' score level False False rest '<' -> countGroups' score level True False rest c -> error $ "Garbage character found outside garbage: " ++ show c countGarbage works almost identically to countGroups, except it ignores groups and counts garbage. They are structured so similarly that it would probably make more sense to combine them to a single function that returns both counts. countGarbage :: String -> Int countGarbage = countGarbage' 0 False False where countGarbage' count _ _ [] = count countGarbage' count garbage skip (c:rest) | skip = countGarbage' count garbage False rest | c == '!' = countGarbage' count garbage True rest | garbage = case c of '>' -> countGarbage' count False False rest _ -> countGarbage' (count+1) True False rest | otherwise = case c of '<' -> countGarbage' count True False rest _ -> countGarbage' count False False rest Hspec gives us a domain-specific language heavily inspired by the rspec library for Ruby: the tests read almost like natural language. I built up these tests one-by-one, gradually implementing the appropriate bits of the functions above, a process known as Test-driven development. runTests = hspec $ do describe "countGroups" $ do it "counts valid groups" $ do countGroups "{}" `shouldBe` 1 countGroups "{{{}}}" `shouldBe` 6 countGroups "{{{},{},{{}}}}" `shouldBe` 16 countGroups "{{},{}}" `shouldBe` 5 it "ignores garbage" $ do countGroups "{<a>,<a>,<a>,<a>}" `shouldBe` 1 countGroups "{{<ab>},{<ab>},{<ab>},{<ab>}}" `shouldBe` 9 it "skips marked characters" $ do countGroups "{{<!!>},{<!!>},{<!!>},{<!!>}}" `shouldBe` 9 countGroups "{{<a!>},{<a!>},{<a!>},{<ab>}}" `shouldBe` 3 describe "countGarbage" $ do it "counts garbage characters" $ do countGarbage "<>" `shouldBe` 0 countGarbage "<random characters>" `shouldBe` 17 countGarbage "<<<<>" `shouldBe` 3 it "ignores non-garbage" $ do countGarbage "{{},{}}" `shouldBe` 0 countGarbage "{{<ab>},{<ab>},{<ab>},{<ab>}}" `shouldBe` 8 it "skips marked characters" $ do countGarbage "<{!>}>" `shouldBe` 2 countGarbage "<!!>" `shouldBe` 0 countGarbage "<!!!>" `shouldBe` 0 countGarbage "<{o\"i!a,<{i<a>" `shouldBe` 10 Finally, the main function reads in the challenge input and calculates the answers, printing them on standard output. main = do runTests repeat '=' & take 78 & putStrLn input <- getContents & fmap (filter (/='\n')) putStrLn $ "Found " ++ show (countGroups input) ++ " groups" putStrLn $ "Found " ++ show (countGarbage input) ++ " characters garbage" I Heard You Like Registers — Python — #adventofcode Day 8 Today’s challenge describes a simple instruction set for a CPU, incrementing and decrementing values in registers according to simple conditions. We have to interpret a stream of these instructions, and to prove that we’ve done so, give the highest value of any register, both at the end of the program and throughout the whole program. → Full code on GitHub !!! 
commentary This turned out to be a nice straightforward one to implement, as the instruction format was easily parsed by regular expression, and Python provides the eval function which made evaluating the conditions a doddle. Import various standard library bits that we’ll use later. import re import fileinput as fi from math import inf from collections import defaultdict We could just parse the instructions by splitting the string, but using a regular expression is a little bit more robust because it won’t match at all if given an invalid instruction. INSTRUCTION_RE = re.compile(r'(\w+) (inc|dec) (-?\d+) if (.+)\s*') def parse_instruction(instruction): match = INSTRUCTION_RE.match(instruction) return match.group(1, 2, 3, 4) Executing an instruction simply checks the condition and if it evaluates to True updates the relevant register. def exec_instruction(registers, instruction): name, op, value, cond = instruction value = int(value) if op == 'dec': value = -value if eval(cond, globals(), registers): registers[name] += value highest_value returns the maximum value found in any register. def highest_value(registers): return sorted(registers.items(), key=lambda x: x[1], reverse=True)[0][1] Finally, loop through all the instructions and carry them out, updating global_max as we go. We need to be able to deal with registers that haven’t been accessed before. Keeping the registers in a dictionary means that we can evaluate the conditions directly using eval above, passing it as the locals argument. The standard dict will raise an exception if we try to access a key that doesn’t exist, so instead we use collections.defaultdict, which allows us to specify what the default value for a non-existent key will be. New registers start at 0, so we use a simple lambda to define a function that always returns 0. global_max = -inf registers = defaultdict(lambda: 0) for i in map(parse_instruction, fi.input()): exec_instruction(registers, i) global_max = max(global_max, highest_value(registers)) print('Max value:', highest_value(registers)) print('All-time max:', global_max) Recursive Circus — Ruby — #adventofcode Day 7 Today’s challenge introduces a set of processes balancing precariously on top of each other. We find them stuck and unable to get down because one of the processes is the wrong size, unbalancing the whole circus. Our job is to figure out the root from the input and then find the correct weight for the single incorrect process. → Full code on GitHub !!! commentary So I didn’t really intend to take a full polyglot approach to Advent of Code, but it turns out to have been quite fun, so I made a shortlist of languages to try. Building a tree is a classic application for object-orientation using a class to represent tree nodes, and I’ve always liked the feel of Ruby’s class syntax, so I gave it a go. First make sure we have access to Set, which we’ll use later. require 'set' Now to define the CircusNode class, which represents nodes in the tree. attr :s automatically creates a function s that returns the value of the instance attribute @s class CircusNode attr :name, :weight def initialize(name, weight, children=nil) @name = name @weight = weight @children = children || [] end Add a << operator (the same syntax for adding items to a list) that adds a child to this node. def <<(c) @children << c @total_weight = nil end total_weight recursively calculates the weight of this node and everything above it. The @total_weight ||= blah idiom caches the value so we only calculate it once. 
def total_weight @total_weight ||= @weight + @children.map {|c| c.total_weight}.sum end balance_weight does the hard work of figuring out the proper weight for the incorrect node by recursively searching through the tree. def balance_weight(target=nil) by_weight = Hash.new{|h, k| h[k] = []} @children.each{|c| by_weight[c.total_weight] << c} if by_weight.size == 1 then if target return @weight - (total_weight - target) else raise ArgumentError, 'This tree seems balanced!' end else odd_one_out = by_weight.select {|k, v| v.length == 1}.first[1][0] child_target = by_weight.select {|k, v| v.length > 1}.first[0] return odd_one_out.balance_weight child_target end end A couple of utility functions for displaying trees finish off the class. def to_s "#{@name} (#{@weight})" end def print_tree(n=0) puts "#{' '*n}#{self} -> #{self.total_weight}" @children.each do |child| child.print_tree n+1 end end end build_circus takes input as a list of lists [name, weight, children]. We make two passes over this list, first creating all the nodes, then building the tree by adding children to parents. def build_circus(data) all_nodes = {} all_children = Set.new data.each do |name, weight, children| all_nodes[name] = CircusNode.new name, weight end data.each do |name, weight, children| children.each {|child| all_nodes[name] << all_nodes[child]} all_children.merge children end root_name = (all_nodes.keys.to_set - all_children).first return all_nodes[root_name] end Finally, build the tree and solve the problem! Note that we use String.to_sym to convert the node names to symbols (written in Ruby as :symbol), because they’re faster to work with in Hashes and Sets as we do above. data = readlines.map do |line| match = /(?<parent>\w+) \((?<weight>\d+)\)(?: -> (?<children>.*))?/.match line [match['parent'].to_sym, match['weight'].to_i, match['children'] ? match['children'].split(', ').map {|x| x.to_sym} : []] end root = build_circus data puts "Root node: #{root}" puts root.balance_weight Memory Reallocation — Python — #adventofcode Day 6 Today’s challenge asks us to follow a recipe for redistributing objects in memory that bears a striking resemblance to the rules of the African game Mancala. → Full code on GitHub !!! commentary When I was doing my MSci, one of our programming exercises was to write (in Haskell, IIRC) a program to play a Mancala variant called Oware, so this had a nice ring of nostalgia. Back to Python today: it's already become clear that it's by far my most fluent language, which makes sense as it's the only one I've used consistently since my schooldays. I'm a bit behind on the blog posts, so you get this one without any explanation, for now at least! import math def reallocate(mem): max_val = -math.inf size = len(mem) for i, x in enumerate(mem): if x > max_val: max_val = x max_index = i i = max_index mem[i] = 0 remaining = max_val while remaining > 0: i = (i + 1) % size mem[i] += 1 remaining -= 1 return mem def detect_cycle(mem): mem = list(mem) steps = 0 prev_states = {} while tuple(mem) not in prev_states: prev_states[tuple(mem)] = steps steps += 1 mem = reallocate(mem) return (steps, steps - prev_states[tuple(mem)]) initial_state = list(map(int, input().split())) print("Initial state is ", initial_state) steps, cycle = detect_cycle(initial_state) print("Steps to cycle: ", steps) print("Steps in cycle: ", cycle) A Maze of Twisty Trampolines — C++ — #adventofcode Day 5 Today’s challenge has us attempting to help the CPU escape from a maze of instructions.
It’s not quite a Turing Machine, but it has that feeling of moving a read/write head up and down a tape acting on and changing the data found there. → Full code on GitHub !!! commentary I haven’t written anything in C++ for over a decade. It sounds like there have been lots of interesting developments in the language since then, with C++11, C++14 and the freshly finalised C++17 standards (built-in parallelism in the STL!). I won’t use any of those, but I thought I’d dust off my C++ and see what happened. Thankfully the Standard Template Library classes still did what I expected! As usual, we first include the parts of the standard library we’re going to use: iostream for input & output; vector for the container. We also declare that we’re using the std namespace, so that we don’t have to prepend vector and the other classes with std::. #include <iostream> #include <vector> using namespace std; steps_to_escape_part1 implements part 1 of the challenge: we read a location, move forward/backward by the number of steps given in that location, then add one to the value at that location before repeating. The result is the number of steps we take before jumping outside the list. int steps_to_escape_part1(vector<int>& instructions) { int pos = 0, iterations = 0, new_pos; while (pos < instructions.size()) { new_pos = pos + instructions[pos]; instructions[pos]++; pos = new_pos; iterations++; } return iterations; } steps_to_escape_part2 solves part 2, which is very similar, except that an offset of three or more is decremented instead of incremented before moving on. int steps_to_escape_part2(vector<int>& instructions) { int pos = 0, iterations = 0, new_pos, offset; while (pos < instructions.size()) { offset = instructions[pos]; new_pos = pos + offset; instructions[pos] += offset >=3 ? -1 : 1; pos = new_pos; iterations++; } return iterations; } Finally we pull it all together and link it up to the input. int main() { vector<int> instructions1, instructions2; int n; The cin stream lets us read data from standard input, which we then add to a vector of ints to give our list of instructions. while (true) { cin >> n; if (cin.eof()) break; instructions1.push_back(n); } Solving the problem modifies the input, so we need to take a copy to solve part 2 as well. Thankfully the STL makes this easy with iterators. instructions2.insert(instructions2.begin(), instructions1.begin(), instructions1.end()); Finally, compute the result and print it on standard output. cout << steps_to_escape_part1(instructions1) << endl; cout << steps_to_escape_part2(instructions2) << endl; return 0; } High Entropy Passphrases — Python — #adventofcode Day 4 Today’s challenge describes some simple rules supposedly intended to enforce the use of secure passwords. All we have to do is test a list of passphrases and identify which ones meet the rules. → Full code on GitHub !!! commentary Fearing that today might be as time-consuming as yesterday, I returned to Python and its hugely powerful “batteries-included” standard library. Thankfully this challenge was more straightforward, and I actually finished this before finishing day 3. First, let’s import two useful utilities. from fileinput import input from collections import Counter Part 1 requires simply that a passphrase contains no repeated words. No problem: we split the passphrase into words and count them, and check if any was present more than once. Counter is an amazingly useful class to have in a language’s standard library.
All it does is count things: you add objects to it, and then it will tell you how many of a given object you have. We’re going to use it to count those potentially duplicated words. def is_valid(passphrase): counter = Counter(passphrase.split()) return counter.most_common(1)[0][1] == 1 Part 2 requires that no word in the passphrase be an anagram of any other word. Since we don’t need to do anything else with the words afterwards, we can check for anagrams by sorting the letters in each word: “leaf” and “flea” both become “aefl” and can be compared directly. Then we count as before. def is_valid_ana(passphrase): counter = Counter(''.join(sorted(word)) for word in passphrase.split()) return counter.most_common(1)[0][1] == 1 Finally we pull everything together. sum(map(boolean_func, list)) is a common idiom in Python for counting the number of times a condition (checked by boolean_func) is true. In Python, True and False can be treated as the numbers 1 and 0 respectively, so that summing a list of Boolean values gives you the number of True values in the list. lines = list(input()) print(sum(map(is_valid, lines))) print(sum(map(is_valid_ana, lines))) Spiral Memory — Go — #adventofcode Day 3 Today’s challenge requires us to perform some calculations on an “experimental memory layout”, with cells moving outwards from the centre of a square spiral (squiral?). → Full code on GitHub !!! commentary I’ve been wanting to try my hand at Go, the memory-safe, statically typed compiled language from Google for a while. Today’s challenge seemed a bit more mathematical in nature, meaning that I wouldn’t need too many advanced language features or knowledge of a standard library, so I thought I’d give it a “go”. It might have been my imagination, but it was impressive how quickly the compiled program chomped through 60 different input values while I was debugging. I actually spent far too long on this problem because my brain led me down a blind alley trying to do the wrong calculation, but I got there in the end! The solution is a bit difficult to explain without diagrams, which I don't really have time to draw right now, but fear not because several other people have. First take a look at [the challenge itself which explains the spiral memory concept](http://adventofcode.com/2017/day/3). Then look at the [nice diagrams that Phil Tooley made with Python](http://acceleratedscience.co.uk/blog/adventofcode-day-3-spiral-memory/) and hopefully you'll be able to see what's going on! It's interesting to note that this challenge also admits of an algorithmic solution instead of the mathematical one: you can model the memory as an infinite grid using a suitable data structure and literally move around it in a spiral. In hindsight this is a much better way of solving the challenge quickly because it's easier and less error-prone to code. I'm quite pleased with my maths-ing though, and it's much quicker than the algorithmic version! First some Go boilerplate: we have to define the package we’re in (main, because it’s an executable we’re producing) and import the libraries we’ll use. package main import ( "fmt" "math" "os" ) Weirdly, Go doesn’t seem to have these basic mathematics functions for integers in its standard library (please someone correct me if I’m wrong!) so I’ll define them instead of mucking about with data types. Go doesn’t do any implicit type conversion, even between numeric types, and the math builtin package only operates on float64 values. 
func abs(n int) int { if n < 0 { return -n } return n } func min(x, y int) int { if x < y { return x } return y } func max(x, y int) int { if x > y { return x } return y } This does the heavy lifting for part one: converting from a position on the spiral to a column and row in the grid. (0, 0) is the centre of the spiral. This actually does a bit more than is necessary to calculate the distance as required for part 1, but we’ll use it again for part 2. func spiral_to_xy(n int) (int, int) { if n == 1 { return 0, 0 } r := int(math.Floor((math.Sqrt(float64(n-1)) + 1) / 2)) n_r := n - (2*r-1)*(2*r-1) o := ((n_r - 1) % (2 * r)) - r + 1 sector := (n_r - 1) / (2 * r) switch sector { case 0: return r, o case 1: return -o, r case 2: return -r, -o case 3: return o, -r } return 0, 0 } Now we use spiral_to_xy to calculate the Manhattan distance that the value at location n in the spiral memory must be carried to reach the “access port” at (0, 0). func distance(n int) int { x, y := spiral_to_xy(n) return abs(x) + abs(y) } This function does the opposite of spiral_to_xy, translating a grid position back to its position on the spiral. This is the one that took me far too long to figure out because I had a brain bug and tried to calculate the value s (which sector or quarter of the spiral we’re looking at) in a way that was never going to work! Fortunately I came to my senses. func xy_to_spiral(x, y int) int { if x == 0 && y == 0 { return 1 } r := max(abs(x), abs(y)) var s, o, n int if x+y > 0 && x-y >= 0 { s = 0 } else if x-y < 0 && x+y >= 0 { s = 1 } else if x+y < 0 && x-y <= 0 { s = 2 } else { s = 3 } switch s { case 0: o = y case 1: o = -x case 2: o = -y case 3: o = x } n = o + r*(2*s+1) + (2*r-1)*(2*r-1) return n } This is a utility function that uses xy_to_spiral to fetch the value at a given (x, y) location, and returns zero if we haven’t filled that location yet. func get_spiral(mem []int, x, y int) int { n := xy_to_spiral(x, y) - 1 if n < len(mem) { return mem[n] } return 0 } Finally we solve part 2 of the problem, which involves going round the spiral writing values into it that are the sum of some values already written. The result is the first of these sums that is greater than or equal to the given input value. func stress_test(input int) int { mem := make([]int, 1) n := 0 mem[0] = 1 for mem[n] < input { n++ x, y := spiral_to_xy(n + 1) mem = append(mem, get_spiral(mem, x+1, y)+ get_spiral(mem, x+1, y+1)+ get_spiral(mem, x, y+1)+ get_spiral(mem, x-1, y+1)+ get_spiral(mem, x-1, y)+ get_spiral(mem, x-1, y-1)+ get_spiral(mem, x, y-1)+ get_spiral(mem, x+1, y-1)) } return mem[n] } Now the last part of the program puts it all together, reading the input value from a command-line argument and printing the results of the two parts of the challenge: func main() { var n int fmt.Sscanf(os.Args[1], "%d", &n) fmt.Printf("Input is %d\n", n) fmt.Printf("Distance is %d\n", distance(n)) fmt.Printf("Stress test result is %d\n", stress_test(n)) } Corruption Checksum — Python — #adventofcode Day 2 Today’s challenge is to calculate a rather contrived “checksum” over a grid of numbers. → Full code on GitHub !!! commentary Today I went back to plain Python, and I didn’t do formal tests because only one test case was given for each part of the problem. I just got stuck in. I did write part 2 out as nested `for` loops as an intermediate step to working out the generator expression. I think that expanded version may have been more readable.
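For the curious, that intermediate version looked something like the following sketch (reconstructed from memory rather than copied from my editor history, and assuming sheet has already been read in as a list of rows of integers, as below):

total = 0
for row in sheet:
    row = sorted(row)           # sort so later elements are always >= earlier ones
    for i, x in enumerate(row):
        for y in row[i+1:]:     # only compare each pair once, with y the larger
            if y % x == 0:
                total += y // x
print(total)

It does exactly the same work as the generator expression further down, just with the loops spelled out.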
Having got that far, I couldn't then work out how to finally eliminate the need for an auxiliary function entirely without either sorting the same elements multiple times or sorting each row as it's read. First we read in the input, split it and convert it to numbers. fileinput.input() returns an iterator over the lines in all the files passed as command-line arguments, or over standard input if no files are given. from fileinput import input sheet = [[int(x) for x in l.split()] for l in input()] Part 1 of the challenge calls for finding the difference between the largest and smallest number in each row, and then summing those differences: print(sum(max(x) - min(x) for x in sheet)) Part 2 is a bit more involved: for each row we have to find the unique pair of elements that divide into each other without remainder, then sum the result of those divisions. We can make it a little easier by sorting each row; then we can take each number in turn and compare it only with the numbers after it (which are guaranteed to be larger). Doing this ensures we only make each comparison once. def rowsum_div(row): row = sorted(row) return sum(y // x for i, x in enumerate(row) for y in row[i+1:] if y % x == 0) print(sum(map(rowsum_div, sheet))) We can make this code shorter (if not easier to read) by sorting each row as it’s read: sheet = [sorted(int(x) for x in l.split()) for l in input()] Then we can just use the first and last elements in each row for part 1, as we know those are the smallest and largest respectively in the sorted row: print(sum(x[-1] - x[0] for x in sheet)) Part 2 then becomes a sum over a single generator expression: print(sum(y // x for row in sheet for i, x in enumerate(row) for y in row[i+1:] if y % x == 0)) Very satisfying! Inverse Captcha — Coconut — #adventofcode Day 1 Well, December’s here at last, and with it Day 1 of Advent of Code. … It goes on to explain that you may only leave by solving a captcha to prove you’re not a human. Apparently, you only get one millisecond to solve the captcha: too fast for a normal human, but it feels like hours to you. … As well as posting solutions here when I can, I’ll be putting them all on https://github.com/jezcope/aoc2017 too. !!! commentary After doing some challenges from last year in Haskell for a warm up, I felt inspired to try out the functional-ish Python dialect, Coconut. Now that I’ve done it, it feels a bit of an odd language, neither fish nor fowl. It’ll look familiar to any Pythonista, but is loaded with features normally associated with functional languages, like pattern matching, destructuring assignment, partial application and function composition. That makes it quite fun to work with, as it works similarly to Haskell, but because it's restricted by the basic rules of Python syntax everything feels a bit more like hard work than it should. The accumulator approach feels clunky, but it's necessary to allow [tail call elimination](https://en.wikipedia.org/wiki/Tail_call), which Coconut will do and I wanted to see in action. Lo and behold, if you take a look at the [compiled Python version](https://github.com/jezcope/aoc2017/blob/86c8100824bda1b35e5db6e02d4b80890be7a022/01-inverse-captcha.py#L675) you'll see that my recursive implementation has been turned into a non-recursive `while` loop. Then again, maybe I'm just jealous of Phil Tooley's [one-liner solution in Python](https://github.com/ptooley/aocGolf/blob/1380d78194f1258748ccfc18880cfd575baf5d37/2017.py#L8). 
import sys def inverse_captcha_(s, acc=0): case reiterable(s): match (|d, d|) :: rest: return inverse_captcha_((|d|) :: rest, acc + int(d)) match (|d0, d1|) :: rest: return inverse_captcha_((|d1|) :: rest, acc) return acc def inverse_captcha(s) = inverse_captcha_(s :: s[0]) def inverse_captcha_1_(s0, s1, acc=0): case (reiterable(s0), reiterable(s1)): match ((|d0|) :: rest0, (|d0|) :: rest1): return inverse_captcha_1_(rest0, rest1, acc + int(d0)) match ((|d0|) :: rest0, (|d1|) :: rest1): return inverse_captcha_1_(rest0, rest1, acc) return acc def inverse_captcha_1(s) = inverse_captcha_1_(s, s$[len(s)//2:] :: s) def test_inverse_captcha(): assert "1111" |> inverse_captcha == 4 assert "1122" |> inverse_captcha == 3 assert "1234" |> inverse_captcha == 0 assert "91212129" |> inverse_captcha == 9 def test_inverse_captcha_1(): assert "1212" |> inverse_captcha_1 == 6 assert "1221" |> inverse_captcha_1 == 0 assert "123425" |> inverse_captcha_1 == 4 assert "123123" |> inverse_captcha_1 == 12 assert "12131415" |> inverse_captcha_1 == 4 if __name__ == "__main__": sys.argv[1] |> inverse_captcha |> print sys.argv[1] |> inverse_captcha_1 |> print Advent of Code 2017: introduction It’s a common lament of mine that I don’t get to write a lot of code in my day-to-day job. I like the feeling of making something from nothing, and I often look for excuses to write bits of code, both at work and outside it. Advent of Code is a daily series of programming challenges for the month of December, and is about to start its third annual incarnation. I discovered it too late to take part in any serious way last year, but I’m going to give it a try this year. There are no restrictions on programming language (so of course some people delight in using esoteric languages like Brainf**k), but I think I’ll probably stick with Python for the most part. That said, I miss my Haskell days and I’m intrigued by new kids on the block Go and Rust, so I might end up throwing in a few of those on some of the simpler challenges. I’d like to focus a bit more on how I solve the puzzles. They generally come in two parts, with the second part only being revealed after successful completion of the first part. With that in mind, test-driven development makes a lot of sense, because I can verify that I haven’t broken the solution to the first part in modifying to solve the second. I may also take a literate programming approach with org-mode or Jupyter notebooks to document my solutions a bit more, and of course that will make it easier to publish solutions here so I’ll do that as much as I can make time for. On that note, here are some solutions for 2016 that I’ve done recently as a warmup. 
Day 1: Python Day 1 instructions import numpy as np import pytest as t import sys TURN = { 'L': np.array([[0, 1], [-1, 0]]), 'R': np.array([[0, -1], [1, 0]]) } ORIGIN = np.array([0, 0]) NORTH = np.array([0, 1]) class Santa: def __init__(self, location, heading): self.location = np.array(location) self.heading = np.array(heading) self.visited = [(0,0)] def execute_one(self, instruction): start_loc = self.location.copy() self.heading = self.heading @ TURN[instruction[0]] self.location += self.heading * int(instruction[1:]) self.mark(start_loc, self.location) def execute_many(self, instructions): for i in instructions.split(','): self.execute_one(i.strip()) def distance_from_start(self): return sum(abs(self.location)) def mark(self, start, end): for x in range(min(start[0], end[0]), max(start[0], end[0])+1): for y in range(min(start[1], end[1]), max(start[1], end[1])+1): if any((x, y) != start): self.visited.append((x, y)) def find_first_crossing(self): for i in range(1, len(self.visited)): for j in range(i): if self.visited[i] == self.visited[j]: return self.visited[i] def distance_to_first_crossing(self): crossing = self.find_first_crossing() if crossing is not None: return abs(crossing[0]) + abs(crossing[1]) def __str__(self): return f'Santa @ {self.location}, heading {self.heading}' def test_execute_one(): s = Santa(ORIGIN, NORTH) s.execute_one('L1') assert all(s.location == np.array([-1, 0])) assert all(s.heading == np.array([-1, 0])) s.execute_one('L3') assert all(s.location == np.array([-1, -3])) assert all(s.heading == np.array([0, -1])) s.execute_one('R3') assert all(s.location == np.array([-4, -3])) assert all(s.heading == np.array([-1, 0])) s.execute_one('R100') assert all(s.location == np.array([-4, 97])) assert all(s.heading == np.array([0, 1])) def test_execute_many(): s = Santa(ORIGIN, NORTH) s.execute_many('L1, L3, R3') assert all(s.location == np.array([-4, -3])) assert all(s.heading == np.array([-1, 0])) def test_distance(): assert Santa(ORIGIN, NORTH).distance_from_start() == 0 assert Santa((10, 10), NORTH).distance_from_start() == 20 assert Santa((-17, 10), NORTH).distance_from_start() == 27 def test_turn_left(): east = NORTH @ TURN['L'] south = east @ TURN['L'] west = south @ TURN['L'] assert all(east == np.array([-1, 0])) assert all(south == np.array([0, -1])) assert all(west == np.array([1, 0])) def test_turn_right(): west = NORTH @ TURN['R'] south = west @ TURN['R'] east = south @ TURN['R'] assert all(east == np.array([-1, 0])) assert all(south == np.array([0, -1])) assert all(west == np.array([1, 0])) if __name__ == '__main__': instructions = sys.stdin.read() santa = Santa(ORIGIN, NORTH) santa.execute_many(instructions) print(santa) print('Distance from start:', santa.distance_from_start()) print('Distance to target: ', santa.distance_to_first_crossing()) Day 2: Haskell Day 2 instructions module Main where data Pos = Pos Int Int deriving (Show) -- Magrittr-style pipe operator (|>) :: a -> (a -> b) -> b x |> f = f x swapPos :: Pos -> Pos swapPos (Pos x y) = Pos y x clamp :: Int -> Int -> Int -> Int clamp lower upper x | x < lower = lower | x > upper = upper | otherwise = x clampH :: Pos -> Pos clampH (Pos x y) = Pos x' y' where y' = clamp 0 4 y r = abs (2 - y') x' = clamp r (4-r) x clampV :: Pos -> Pos clampV = swapPos . clampH . swapPos buttonForPos :: Pos -> String buttonForPos (Pos x y) = [buttons !! y !! 
x] where buttons = [" D ", " ABC ", "56789", " 234 ", " 1 "] decodeChar :: Pos -> Char -> Pos decodeChar (Pos x y) 'R' = clampH $ Pos (x+1) y decodeChar (Pos x y) 'L' = clampH $ Pos (x-1) y decodeChar (Pos x y) 'U' = clampV $ Pos x (y+1) decodeChar (Pos x y) 'D' = clampV $ Pos x (y-1) decodeLine :: Pos -> String -> Pos decodeLine p "" = p decodeLine p (c:cs) = decodeLine (decodeChar p c) cs makeCode :: String -> String makeCode instructions = lines instructions -- split into lines |> scanl decodeLine (Pos 1 1) -- decode to positions |> tail -- drop start position |> concatMap buttonForPos -- convert to buttons main = do input <- getContents putStrLn $ makeCode input Research Data Management Forum 18, Manchester !!! intro "" Monday 20 and Tuesday 21 November 2017 I’m at the Research Data Management Forum in Manchester. I thought I’d use this as an opportunity to try liveblogging, so during the event some notes should appear in the box below (you may have to manually refresh your browser tab periodically to get the latest version). I've not done this before, so if the blog stops updating then it's probably because I've stopped updating it to focus on the conference instead! This was made possible using GitHub's cool [Gist](https://gist.github.com) tool. Draft content policy I thought it was about time I had some sort of content policy on here so this is a first draft. It will eventually wind up as a separate page. Feedback welcome! !!! aside “Content policy” This blog’s primary purpose is as a reflective learning tool for my own development; my aim in writing any given post is mainly to expose and develop my own thinking on a topic. My reasons for making a public blog rather than a private journal are: 1. If I'm lucky, someone smarter than me will provide feedback that will help me and my readers to learn more 2. If I'm extra lucky, someone else might learn from the material as well Each post, therefore, represents the state of my thinking at the time I wrote it, or perhaps a deliberate provocation or exaggeration; either way, if you don't know me personally please don't judge me based entirely on my past words. This is a request though, not an attempt to excuse bad behaviour on my part. I accept full responsibility for any consequences of my words, whether intended or not. I will not remove comments or ban individuals for disagreeing with me, only for behaving offensively or disrespectfully. I will do my best to be fair and balanced and explain decisions that I take, but I reserve the right to take those decisions without making any explanation at all if it seems likely to further inflame a situation. If I end up responding to anything simply with a link to this policy, that's probably all the explanation you're going to get. It should go without saying, but the opinions presented in this blog are my own and not those of my employer or anyone else I might at times represent. Learning to live with anxiety !!! intro "" This is a post that I’ve been writing for months, and writing in my head for years. For some it will explain aspects of my personality that you might have wondered about. For some it will just be another person banging on self-indulgently about so-called “mental health issues”. Hopefully, for some it will demystify some stuff and show that you’re not alone and things do get better. For as long as I can remember I’ve been a worrier. I’ve also suffered from bouts of what I now recognise as depression, on and off since my school days. 
It’s only relatively recently that I’ve come to the realisation that these two might be connected and that my ‘worrying’ might in fact be outside the normal range of healthy human behaviour and might more accurately be described as chronic anxiety. You probably won’t have noticed it, but it’s been there. More recently I’ve begun feeling like I’m getting on top of it and feeling “normal” for the first time in my life. Things I’ve found that help include: getting out of the house more and socialising with friends; and getting a range of exercise, outdoors and away from the city (rock climbing is mentally and physically engaging and open water swimming is indescribably joyful). But mostly it’s the cognitive behavioural therapy (CBT) and the antidepressants. Before I go any further, a word about drugs (“don’t do drugs, kids”): I’m on the lowest available dose of a common antidepressant. This isn’t because it stops me being sad all the time (I’m not) or because it makes all my problems go away (it really doesn’t). It’s because the scientific evidence points to a combination of CBT and antidepressants as being the single most effective treatment for generalised anxiety disorder. The reason for this is simple: CBT isn’t easy, because it asks you to challenge habits and beliefs you’ve held your whole life. In the short term there is going to be more anxiety, and some antidepressants are also effective at blunting the effect of this additional anxiety. In short, CBT is what makes you better, and the drugs just make it a little bit more effective. A lot of people have misconceptions about what it means to be ‘in therapy’. I suspect a lot of these are derived from the psychoanalysis we often see portrayed in (primarily US) film and TV. The problem with that type of navel-gazing therapy is that you can spend years doing it, finally reach some sort of breakthrough insight, and still have no idea what the supposed insight means for your actual life. CBT is different in that rather than addressing feelings directly it focuses on habits in your thoughts (cognitive) and actions (behavioural) with feeling better as an outcome (therapy). CBT and related forms of therapy now have decades of clinical evidence showing that they really work. CBT uses a wide range of techniques to identify, challenge and reduce various common unhelpful thoughts and behaviours. By choosing and practicing these, you can break bad mental habits that you’ve been carrying around, often for decades. For me this means giving fair weight to my successes as well as my failings, allowing flexibility into the rigid rules that I have always, subconsciously, lived by, and being a bit kinder to myself when I make mistakes. It’s not been easy and I have to remind myself to practice this every day, but it’s really helped. !!! aside “More info” If you live in the UK, you might not be aware that you can get CBT and other psychological therapies on the NHS through a scheme called IAPT (improving access to psychological therapies). You can self-refer so you don’t need to see a doctor first, but you might want to anyway if you think medication might help. They also have a progression of treatments, so you might be offered a course of “guided self-help” and then progressed to CBT or another talking therapy if need be. This is what happened to me, and it did help a bit but it was CBT that helped me the most. Becoming a librarian What is a librarian? Is it someone who has a masters degree in librarianship and information science?
Is it someone who looks after information for other people? Is it simply someone who works in a library? I’ve been grappling with this question a lot lately because I’ve worked in academic libraries for about 3 years now and I never really thought that’s something that might happen. People keep referring to me as “a librarian” but there’s some imposter feelings here because all the librarians around me have much more experience, have skills in areas like cataloguing and collection management and, generally, have a librarian masters degree. So I’ve been thinking about what it actually means to me to be a librarian or not. NB. some of these may be tongue-in-cheek Ways in which I am a librarian: I work in a library I help people to access and organise information I have a cat I like gin Ways in which I am not a librarian: I don’t have a librarianship qualification I don’t work with books 😉 I don’t knit (though I can probably remember how if pressed) I don’t shush people or wear my hair in a bun (I can confirm that this is also true of every librarian I know) Ways in which I am a shambrarian: I like beer I have more IT experience and qualification than librarianship At the end of the day, I still don’t know how I feel about this or, for that matter, how important it is. I’m probably going to accept whatever title people around me choose to bestow, though any label will chafe at times! Lean Libraries: applying agile practices to library services Kanban board Jeff Lasovski (via Wikimedia Commons) I’ve been working with our IT services at work quite closely for the last year as product owner for our new research data portal, ORDA. That’s been a fascinating process for me as I’ve been able to see first-hand some of the agile techniques that I’ve been reading about from time-to-time on the web over the last few years. They’re in the process of adopting a specific set of practices going under the name “Scrum”, which is fun because it uses some novel terminology that sounds pretty weird to non-IT folks, like “scrum master”, “sprint” and “product backlog”. On my small project we’ve had great success with the short cycle times and been able to build trust with our stakeholders by showing concrete progress on a regular basis. Modern librarianship is increasingly fluid, particularly in research services, and I think that to handle that fluidity it’s absolutely vital that we are able to work in a more agile way. I’m excited about the possibilities of some of these ideas. However, Scrum as implemented by our IT services doesn’t seem something that transfers directly to the work that we do: it’s too specialised for software development to adapt directly. What I intend to try is to steal some of the individual practices on an experimental basis and simply see what works and what doesn’t. The Lean concepts currently popular in IT were originally developed in manufacturing: if they can be translated from the production of physical goods to IT, I don’t see why we can’t make the ostensibly smaller step of translating them to a different type of knowledge work. I’ve therefore started reading around this subject to try and get as many ideas as possible. I’m generally pretty rubbish at taking notes from books, so I’m going to try and record and reflect on any insights I make on this blog. The framework for trying some of these out is clearly a Plan-Do-Check-Act continuous improvement cycle, so I’ll aim to reflect on that process too. 
I’m sure there will have been people implementing Lean in libraries already, so I’m hoping to be able to discover and learn from them instead of starting from scratch. Wish me luck! Mozilla Global Sprint 2017 Photo by Lena Bell on Unsplash Every year, the Mozilla Foundation runs a two-day Global Sprint, giving people around the world 50 hours to work on projects supporting and promoting open culture and tech. Though much of the work during the sprint is, of course, technical software development work, there are always tasks suited to a wide range of different skill sets and experience levels. The participants include writers, designers, teachers, information professionals and many others. This year, for the first time, the University of Sheffield hosted a site, providing a space for local researchers, developers and others to get out of their offices, work on #mozsprint and link up with others around the world. The Sheffield site was organised by the Research Software Engineering group in collaboration with the University Library. Our site was only small compared to others, but we still had people working on several different projects. My reason for taking part in the sprint was to contribute to the international effort on the Library Carpentry project. A team spread across four continents worked throughout the whole sprint to review and develop our lesson material. As there were no other Library Carpentry volunteers at the Sheffield site, I chose to pick up some urgent work around improving the presentation of our workshops and lessons on the web and related workflows. It was a really nice subproject to work on, requiring not only cleaning up and normalising the metadata we hold on workshops and lessons, but also digesting and formalising our current ad hoc process of lesson development. The largest group were solar physicists from the School of Maths and Statistics, working on the SunPy project, an open source environment for solar data analysis. They pushed loads of bug fixes and documentation improvements, and also mentored a new contributor through their first additions to the project. Anna Krystalli from Research Software Engineering worked on the EchoBurst project, which is building a web browser extension to help people break out of their online echo chambers. It does this by using natural language processing techniques to highlight well-written, logically sound articles that disagree with the reader’s stated views on particular topics of interest. Anna was part of an effort to begin extending this technology to online videos. We had a couple of individuals simply taking the opportunity to break out of their normal work environments to work or learn, including a couple of members of library staff who showed up for a couple of hours to learn how to use git on a new project! IDCC 2017 reflection For most of the last few years I've been lucky enough to attend the International Digital Curation Conference (IDCC). One of the main audiences attending is people who, like me, work on research data management at universities around the world and it's begun to feel like a sort of "home" conference to me. This year, IDCC was held at the Royal College of Surgeons in the beautiful city of Edinburgh.
For the last couple of years, my overall impression has been that, as a community, we're moving away from the "first-order" problem of trying to convince people (from PhD students to senior academics) to take RDM seriously and into a rich set of "second-order" problems around how to do things better and widen support to more people. This year has been no exception. Here are a few of my observations and takeaway points. Everyone has a repository now Only last year, the most common question you'd get asked by strangers in the coffee break would be "Do you have a data repository?" Now the question is more likely to be "What are you using for your data repository?", along with more subtle questions about specific components of systems and how they interact. Integrating active storage and archival systems Now that more institutions have data worth preserving, there is more interest in (and in many cases experience of) setting up more seamless integrations between active and archival storage. There are lessons here we can learn. Freezing in amber vs actively maintaining assets There seemed to be an interesting debate going on throughout the conference around the aim of preservation: should we be faithfully preserving the bits and bytes provided without trying to interpret them, or should we take a more active approach by, for example, migrating obsolete formats to newer alternatives. If the former, should we attempt to preserve the software required to access the data as well? If the latter, how much effort do we invest and how do we ensure nothing is lost or altered in the migration? Demonstrating Data Science instead of debating what it is The phrase "Data Science" was once again one of the most commonly uttered of the conference. However, there is now less abstract discussion about what, exactly, is meant by this "data science" thing; this has been replaced more by concrete demonstrations. This change was exemplified perfectly by the keynote by data scientist Alice Daish, who spent a riveting 40 minutes or so enthusing about all the cool stuff she does with data at the British Museum. Recognition of software as an issue Even as recently as last year, I've struggled to drum up much interest in discussing software sustainability and preservation at events like this; the interest was there, but there were higher priorities. So I was completely taken by surprise when we ended up with 30+ people in the Software Preservation Birds of a Feather (BoF) session, and when very little input was needed from me as chair to keep a productive discussion going for a full 90 minutes. Unashamed promotion of openness As a community we seem to have nearly overthrown our collective embarrassment about the phrase "open data" (although maybe this is just me). We've always known it was a good thing, but I know I've been a bit of an apologist in the past, feeling that I had to "soften the blow" when asking researchers to be more open. Now I feel more confident in leading with the benefits of openness, and it felt like that's a change reflected in the community more widely. Becoming more involved in the conference This year, I took a decision to try and do more to contribute to the conference itself, and I felt like this was pretty successful both in making that contribution and building up my own profile a bit. I presented a paper on one of my current passions, Library Carpentry; it felt really good to be able to share my enthusiasm. 
I presented a poster on our work integrating our data repository and digital preservation platform; this gave me more of a structure for networking during breaks, as I was able to stand by the poster and start discussions with anyone who seemed interested. I chaired a parallel session; a first for me, and a different challenge from presenting or simply attending the talks. And finally, I proposed and chaired the Software Preservation BoF session (blog post forthcoming). Renewed excitement It's weird, and possibly all in my imagination, but there seemed to be more energy at this conference than at the previous couple I've been to. More people seemed to be excited about the work we're all doing, recent achievements and the possibilities for the future. Introducing PyRefine: OpenRefine meets Python I’m knocking the rust off my programming skills by attempting to write a pure-Python interpreter for OpenRefine “scripts”. OpenRefine is a great tool for exploring and cleaning datasets prior to analysing them. It also records an undo history of all actions that you can export as a sort of script in JSON format. One thing that bugs me though is that, having spent some time interactively cleaning up your dataset, you then need to fire up OpenRefine again and do some interactive mouse-clicky stuff to apply that cleaning routine to another dataset. You can at least re-import the JSON undo history to make that as quick as possible, but there’s no getting around the fact that there’s no quick way to do it from a cold start. There is a project, BatchRefine, that extends the OpenRefine server to accept batch requests over a HTTP API, but that isn’t useful when you can’t or don’t want to keep a full Java stack running in the background the whole time. My concept is this: you use OR to explore the data interactively and design a cleaning process, but then export the process to JSON and integrate it into your analysis in Python. That way it can be repeated ad nauseam without having to fire up a full Java stack. I’m taking some inspiration from the great talk “So you want to be a wizard?" by Julia Evans (@b0rk), who recommends trying experiments as a way to learn. She gives these Rules of Programming Experiments: “it doesn’t have to be good it doesn’t have to work you have to learn something” In that spirit, my main priorities are: to see if this can be done; to see how far I can get implementing it; and to learn something. If it also turns out to be a useful thing, well, that’s a bonus. Some of the interesting possible challenges here: Implement all core operations; there are quite a lot of these, some of which will be fun (i.e. non-trivial) to implement Implement (a subset of?) GREL, the General Refine Expression Language; I guess my undergrad course on implementing parsers and compilers will come in handy after all! Generate clean, sane Python code from the JSON rather than merely executing it; more than anything, this would be a nice educational tool for users of OpenRefine who want to see how to do equivalent things in Python Selectively optimise key parts of the process; this will involve profiling the code to identify bottlenecks as well as tweaking the actual code to go faster Potentially handle contributions to the code from other people; I’d be really happy if this happened but I’m realistic… If you’re interested, the project is called PyRefine and it’s on github. Constructive criticism, issues & pull requests all welcome! 
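To make the idea a little more concrete, here’s the sort of thing I’m aiming for, sketched with pandas. This isn’t PyRefine’s actual code, and the operation and field names (“core/column-rename” and friends) are written from memory of the exported JSON so may not be exactly right; treat it purely as an illustration of replaying an exported history over a DataFrame:

import json
import pandas as pd

def apply_refine_history(df, history_path):
    # Apply a (tiny) subset of an OpenRefine JSON undo history to a DataFrame
    with open(history_path) as f:
        operations = json.load(f)
    for op in operations:
        kind = op.get("op")
        if kind == "core/column-rename":
            df = df.rename(columns={op["oldColumnName"]: op["newColumnName"]})
        elif kind == "core/column-removal":
            df = df.drop(columns=[op["columnName"]])
        else:
            raise NotImplementedError("Not handled in this sketch: " + str(kind))
    return df

# e.g. cleaned = apply_refine_history(pd.read_csv("dirty.csv"), "refine-history.json")

The real project will need to cover far more operations (and GREL), but the shape of the solution is the same.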
Implementing Yesterbox in emacs with mu4e I’ve been meaning to give Yesterbox a try for a while. The general idea is that each day you only deal with email that arrived yesterday or earlier. This forms your inbox for the day, hence “yesterbox”. Once you’ve emptied your yesterbox, or at least got through some minimum number (10 is recommended), then you can look at emails from today. Even then you only really want to be dealing with things that are absolutely urgent. Anything else can wait til tomorrow. The motivation for doing this is to get away from the feeling that we are King Canute, trying to hold back the tide. I find that when I’m processing my inbox toward zero there’s always a temptation to keep skipping to the new stuff that’s just come in. Hiding away the new email until I’ve dealt with the old is a very interesting idea. I use mu4e in emacs for reading my email, and handily the mu search syntax is very flexible so you’d think it would be easy to create a yesterbox filter: maildir:"/INBOX" date:..1d Unfortunately, 1d is interpreted as “24 hours ago from right now” so this filter misses everything that was sent yesterday but less than 24 hours ago. There was a feature request raised on the mu github repository to implement an additional date filter syntax but it seems to have died a death for now. In the meantime, the answer to this is to remember that my workplace observes fairly standard office hours, so that anything sent more than 9 hours ago is unlikely to have been sent today. The following does the trick: maildir:"/INBOX" date:..9h In my mu4e bookmarks list, that looks like this: (setq mu4e-bookmarks '(("flag:unread AND NOT flag:trashed" "Unread messages" ?u) ("flag:flagged maildir:/archive" "Starred messages" ?s) ("date:today..now" "Today's messages" ?t) ("date:7d..now" "Last 7 days" ?w) ("maildir:\"/Mailing lists.*\" (flag:unread OR flag:flagged)" "Unread in mailing lists" ?M) ("maildir:\"/INBOX\" date:..9h" "Yesterbox" ?y))) ;; <- this is the new one Rewarding good practice in research From opensource.com on Flickr Whenever I’m involved in a discussion about how to encourage researchers to adopt new practices, eventually someone will come out with some variant of the following phrase: “That’s all very well, but researchers will never do XYZ until it’s made a criterion in hiring and promotion decisions.” With all the discussion of carrots and sticks I can see where this attitude comes from, and strongly empathise with it, but it raises two main problems: It’s unfair and more than a little insulting to anyone to be lumped into one homogeneous group; and Taking all the different possible XYZs into account, that’s an awful lot of hoops to expect anyone to jump through. Firstly, “researchers” are as diverse as the rest of us in terms of what gets them out of bed in the morning. Some of us want prestige; some want to contribute to a greater good; some want to create new things; some just enjoy the work. One thing I’d argue we all have in common is this: nothing is more offputting than feeling like you’re being strongarmed into something you don’t want to do. If we rely on simplistic metrics, people will focus on those and miss the point. At best people will disengage and at worst they will actively game the system. I’ve got to do these ten things to get my next payrise, and still retain my sanity? Ok, what’s the least I can get away with and still tick them off. You see it with students taking poorly-designed assessments and grown-ups are no different.
We do need to wield carrots as well as sticks, but the whole point is that these practices are beneficial in and of themselves. The carrots are already there if we articulate them properly and clear the roadblocks (don’t you enjoy mixed metaphors?). Creating artificial benefits will just dilute the value of the real ones. Secondly, I’ve heard a similar argument made for all of the following practices and more:
Research data management
Open Access publishing
Public engagement
New media (e.g. blogging)
Software management and sharing
Some researchers devote every waking hour to their work, whether it’s in the lab, writing grant applications, attending conferences, authoring papers, teaching, and so on and so on. It’s hard to see how someone with all this in their schedule can find time to exercise any of these new skills, let alone learn them in the first place. And what about the people who sensibly restrict the hours taken by work to spend more time doing things they enjoy? Yes, all of the above practices are valuable, both for the individual and the community, but they’re all new (to most) and hence require more effort up front to learn. We have to accept that it’s inevitably going to take time for all of them to become “business as usual”. I think if the hiring/promotion/tenure process has any role in this, it’s in asking whether the researcher can build a coherent narrative as to why they’ve chosen to focus their efforts in this area or that. You’re not on Twitter but your data is being used by 200 research groups across the world? Great! You didn’t have time to tidy up your source code for github but your work is directly impacting government policy? Brilliant! We still need to convince more people to do more of these beneficial things, so how? Call me naïve, but maybe we should stick to making rational arguments, calming fears and providing low-risk opportunities to learn new skills. Acting (compassionately) like a stuck record can help. And maybe we’ll need to scale back our expectations in other areas (journal impact factors, anyone?) to make space for the new stuff.
Software Carpentry: SC Test; does your software do what you meant?
“The single most important rule of testing is to do it.” — Brian Kernighan and Rob Pike, The Practice of Programming (quote taken from SC Test page)
One of the trickiest aspects of developing software is making sure that it actually does what it’s supposed to. Sometimes failures are obvious: you get completely unreasonable output or even (shock!) a comprehensible error message. But failures are often more subtle. Would you notice if your result was out by a few percent, or consistently ignored the first row of your input data? The solution to this is testing: take some simple example input with a known output, run the code and compare the actual output with the expected one. Implement a new feature, test and repeat. Sounds easy, doesn’t it? But then you implement a new bit of code. You test it and everything seems to work fine, except that your new feature required changes to existing code and those changes broke something else. So in fact you need to test everything, and do it every time you make a change. Further than that, you probably want to test that all your separate bits of code work together properly (integration testing) as well as testing the individual bits separately (unit testing). In fact, splitting your tests up like that is a good way of holding on to your sanity.
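To give a flavour of how lightweight this can be today (my own example, not part of the original Software Carpentry materials), a unit test with a tool like pytest is just a function whose name starts with test_ and which asserts something about a known input and output:

# test_stats.py (run with: pytest)
def mean(values):
    """Arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)

def test_mean_of_simple_sequence():
    # simple example input with a known output
    assert mean([1, 2, 3, 4]) == 2.5

def test_mean_does_not_ignore_first_value():
    # guards against the "consistently ignored the first row" kind of subtle failure
    assert mean([10, 20]) == 15

Integration tests can be written the same way; many projects simply keep them in a separate directory so that each kind can be run on its own.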
This is actually a lot less scary than it sounds, because there are plenty of tools now to automate that testing: you just type a simple test command and everything is verified. There are even tools that enable you to have tests run automatically when you check the code into version control, and even automatically deploy code that passes the tests, a process known as continuous integration or CI. The big problems with testing are that it’s tedious, your code seems to work without it and no-one tells you off for not doing it. At the time when the Software Carpentry competition was being run, the idea of testing wasn’t new, but the tools to help were in their infancy. “Existing tools are obscure, hard to use, expensive, don’t actually provide much help, or all three.” The SC Test category asked entrants “to design a tool, or set of tools, which will help programmers construct and maintain black box and glass box tests of software components at all levels, including functions, modules, and classes, and whole programs.” The SC Test category is interesting in that the competition administrators clearly found it difficult to specify what they wanted to see in an entry. In fact, the whole category was reopened with a refined set of rules and expectations. Ultimately, it’s difficult to tell whether this category made a significant difference. Where the tools for writing tests used to be sparse and difficult to use, there are now several good options for most programming languages. With this proliferation, several tried-and-tested methodologies have emerged which are consistent across many different tools, so while things still aren’t perfect they are much better. In recent years there has been a culture shift in the wider software development community towards both testing in general and test-first development, where the tests for a new feature are written first, and then the implementation is coded incrementally until all tests pass. The current challenge is to transfer this culture shift to the academic research community!
Tools for collaborative markdown editing
Photo by Alan Cleaver I really love Markdown[1]. I love its simplicity; its readability; its plain-text nature. I love that it can be written and read with nothing more complicated than a text-editor. I love how nicely it plays with version control systems. I love how easy it is to convert to different formats with Pandoc and how it’s become effectively the native text format for a wide range of blogging platforms. One frustration I’ve had recently, then, is that it’s surprisingly difficult to collaborate on a Markdown document. There are various solutions that almost work but at best feel somehow inelegant, especially when compared with rock solid products like Google Docs. Finally, though, we’re starting to see some real possibilities. Here are some of the things I’ve tried, but I’d be keen to hear about other options.
1. Just suck it up
To be honest, Google Docs isn’t that bad. In fact it works really well, and has almost no learning curve for anyone who’s ever used Word (i.e. practically anyone who’s used a computer since the 90s). When I’m working with non-technical colleagues there’s nothing I’d rather use. It still feels a bit uncomfortable though, especially the vendor lock-in. You can export a Google Doc to Word, ODT or PDF, but you need to use Google Docs to do that. Plus as soon as I start working in a word processor I get tempted to muck around with formatting.
2. Git(hub)
The obvious solution to most techies is to set up a GitHub repo, commit the document and go from there. This works very well for bigger documents written over a longer time, but seems a bit heavyweight for a simple one-page proposal, especially over short timescales. Who wants to muck around with pull requests and merging changes for a document that’s going to take 2 days to write tops? This type of project doesn’t need a bug tracker or a wiki or a public homepage anyway. Even without GitHub in the equation, using git for such a trivial use case seems clunky.
3. Markdown in Etherpad/Google Docs
Etherpad is a great tool for collaborative editing, but suffers from two key problems: no syntax highlighting or preview for markdown (it’s just treated as simple text); and you need to find a server to host it or do it yourself. However, there’s nothing to stop you editing markdown with it. You can do the same thing in Google Docs, in fact, and I have. Editing a fundamentally plain-text format in a word processor just feels weird though.
4. Overleaf/Authorea
Overleaf and Authorea are two products developed to support academic editing. Authorea has built-in markdown support but lacks proper simultaneous editing. Overleaf has great simultaneous editing but only supports markdown by wrapping a bunch of LaTeX boilerplate around it. Both OK but unsatisfactory.
5. StackEdit
Now we’re starting to get somewhere. StackEdit has both Markdown syntax highlighting and near-realtime preview, as well as integrating with Google Drive and Dropbox for file synchronisation.
6. HackMD
HackMD is one that I only came across recently, but it looks like it does exactly what I’m after: a simple markdown-aware editor with live preview that also permits simultaneous editing. I’m a little circumspect simply because I know simultaneous editing is difficult to get right, but it certainly shows promise.
7. Classeur
I discovered Classeur literally today: it’s developed by the same team as StackEdit (which is now apparently no longer in development), and is currently in beta, but it looks to offer two killer features: real-time collaboration, including commenting, and pandoc-powered export to loads of different formats.
Anything else?
Those are the options I’ve come up with so far, but they can’t be the only ones. Is there anything I’ve missed?
[1] Other plain-text formats are available. I’m also a big fan of org-mode.
Software Carpentry: SC Track; hunt those bugs!
This competition will be an opportunity for the next wave of developers to show their skills to the world — and to companies like ours. — Dick Hardt, ActiveState (quote taken from SC Track page)
All code contains bugs, and all projects have features that users would like but which aren’t yet implemented. Open source projects tend to get more of these as their user communities grow and start requesting improvements to the product. As your open source project grows, it becomes harder and harder to keep track of and prioritise all of these potential chunks of work. What do you do? The answer, as ever, is to make a to-do list. Different projects have used different solutions, including mailing lists, forums and wikis, but fairly quickly a whole separate class of software evolved: the bug tracker, which includes such well-known examples as Bugzilla, Redmine and the mighty JIRA. Bug trackers are built entirely around such requests for improvement, and typically track them through workflow stages (planning, in progress, fixed, etc.)
with scope for the community to discuss and add various bits of metadata. In this way, it becomes easier both to prioritise problems against each other and to use the hive mind to find solutions. Unfortunately most bug trackers are big, complicated beasts, more suited to large projects with dozens of developers and hundreds or thousands of users. Clearly a project of this size is more difficult to manage and requires a certain feature set, but the result is that the average bug tracker is non-trivial to set up for a small single-developer project. The SC Track category asked entrants to propose a better bug tracking system. In particular, the judges were looking for something easy to set up and configure without compromising on functionality. The winning entry was a bug-tracker called Roundup, proposed by Ka-Ping Yee. Here we have another tool which is still in active use and development today. Given that there is now a huge range of options available in this area, including the mighty github, this is no small achievement. These days, of course, github has become something of a de facto standard for open source project management. Although ostensibly a version control hosting platform, each github repository also comes with a built-in issue tracker, which is also well-integrated with the “pull request” workflow system that allows contributors to submit bug fixes and features themselves. Github’s competitors, such as GitLab and Bitbucket, also include similar features. Not everyone wants to work in this way though, so it’s good to see that there is still a healthy ecosystem of open source bug trackers, and that Software Carpentry is still having an impact.
Software Carpentry: SC Config; write once, compile anywhere
Nine years ago, when I first released Python to the world, I distributed it with a Makefile for BSD Unix. The most frequent questions and suggestions I received in response to these early distributions were about building it on different Unix platforms. Someone pointed me to autoconf, which allowed me to create a configure script that figured out platform idiosyncrasies. Unfortunately, autoconf is painful to use – its grouping, quoting and commenting conventions don’t match those of the target language, which makes scripts hard to write and even harder to debug. I hope that this competition comes up with a better solution — it would make porting Python to new platforms a lot easier! — Guido van Rossum, Technical Director, Python Consortium (quote taken from SC Config page)
On to the next Software Carpentry competition category, then. One of the challenges of writing open source software is that you have to make it run on a wide range of systems over which you have no control. You don’t know what operating system any given user might be using or what libraries they have installed, or even what versions of those libraries. This means that whatever build system you use, you can’t just send the Makefile (or whatever) to someone else and expect everything to go off without a hitch. For a very long time, it’s been common practice for source packages to include a configure script that, when executed, runs a bunch of tests to see what it has to work with and sets up the Makefile accordingly. Writing these scripts by hand is a nightmare, so tools like autoconf and automake evolved to make things a little easier. They did, and if the tests you want to use are already implemented they work very well indeed.
Unfortunately they’re built on an unholy combination of shell scripting and the archaic GNU M4 macro language. That means if you want to write new tests you need to understand both of these as well as the architecture of the tools themselves — not an easy task for the average self-taught research programmer. SC Config, then, called for a re-engineering of the autoconf concept, to make it easier for researchers to make their code available in a portable, platform-independent format. The second round configuration tool winner was SapCat, “a tool to help make software portable”. Unfortunately, this one seems not to have gone anywhere, and I could only find the original proposal on the Internet Archive. There were a lot of good ideas in this category about making catalogues and databases of system quirks to avoid having to rerun the same expensive tests again the way a standard ./configure script does. I think one reason none of these ideas survived is that they were overly ambitious, imagining a grand architecture where their tool would provide some overarching source of truth. This is in stark contrast to the way most Unix-like systems work, where each tool does one very specific job well and tools are easy to combine in various ways. In the end though, I think Moore’s Law won out here, making it easier to do the brute-force checks each time than to try anything clever to save time — a good example of avoiding unnecessary optimisation. Add to that the evolution of the generic pkg-config tool from earlier package-specific tools like gtk-config, and it’s now much easier to check for particular versions and features of common packages. On top of that, much of the day-to-day coding of a modern researcher happens in interpreted languages like Python and R, which give you a fully-functioning pre-configured environment with a lot less compiling to do. As a side note, Tom Tromey, another of the shortlisted entrants in this category, is still a major contributor to the open source world. He still seems to be involved in the automake project, contributes a lot of code to the emacs community too and blogs sporadically at The Cliffs of Inanity.
Semantic linefeeds: one clause per line
I’ve started using “semantic linefeeds”, a concept I discovered on Brandon Rhodes' blog, when writing content; the idea is described in that article far better than I could manage. It turns out this is a very old idea, promoted way back in the day by Brian W Kernighan, contributor to the original Unix system, co-creator of the AWK and AMPL programming languages and co-author of a lot of seminal programming textbooks including “The C Programming Language”. The basic idea is that you break lines at natural gaps between clauses and phrases, rather than simply after the last word before you hit 80 characters. Keeping line lengths strictly to 80 characters isn’t really necessary in these days of wide aspect ratios for screens. Breaking lines at points that make semantic sense in the sentence is really helpful for editing, especially in the context of version control, because it isolates changes to the clause in which they occur rather than just the nearest 80-character block. I also like it because it makes my crappy prose feel just a little bit more like poetry. ☺
Software Carpentry: SC Build; or making a better make
Software tools often grow incrementally from small beginnings into elaborate artefacts. Each increment makes sense, but the final edifice is a mess.
make is an excellent example: a simple tool that has grown into a complex domain-specific programming language. I look forward to seeing the improvements we will get from designing the tool afresh, as a whole… — Simon Peyton Jones, Microsoft Research (quote taken from SC Build page)
Most people who have had to compile an existing software tool will have come across the venerable make tool (which usually these days means GNU Make). It allows the developer to write a declarative set of rules specifying how the final software should be built from its component parts, mostly source code, allowing the build itself to be carried out by simply typing make at the command line and hitting Enter. Given a set of rules, make will work out all the dependencies between components and ensure everything is built in the right order and nothing that is up-to-date is rebuilt. Great in principle but make is notoriously difficult for beginners to learn, as much of the logic for how builds are actually carried out is hidden beneath the surface. This also makes it difficult to debug problems when building large projects. For these reasons, the SC Build category called for a replacement build tool engineered from the ground up to solve these problems. The second round winner, ScCons, is a Python-based make-like build tool written by Steven Knight. While I could find no evidence of any of the other shortlisted entries, this project (now renamed SCons) continues in active use and development to this day. I actually use this one myself from time to time and to be honest I prefer it in many cases to trendy new tools like rake or grunt and the behemoth that is Apache Ant. Its Python-based SConstruct file syntax is remarkably intuitive and scales nicely from very simple builds up to big and complicated projects, with good dependency tracking to avoid unnecessary recompiling. It has a lot of built-in rules for performing common build & compile tasks, but it’s trivial to add your own, either by combining existing building blocks or by writing a new builder with the full power of Python. A minimal SConstruct file looks like this:
Program('hello.c')
Couldn’t be simpler! And you have the full power of Python syntax to keep your build file simple and readable. It’s interesting that all the entries in this category apart from one chose to use a Python-derived syntax for describing build steps. Python was clearly already a language of choice for flexible multi-purpose computing. The exception is the entry that chose to use XML instead, which I think is a horrible idea (oh how I used to love XML!) but has been used to great effect in the Java world by tools like Ant and Maven.
What happened to the original Software Carpentry?
“Software Carpentry was originally a competition to design new software tools, not a training course. The fact that you didn’t know that tells you how well it worked.” When I read this in a recent post on Greg Wilson’s blog, I took it as a challenge. I actually do remember the competition, although looking at the dates it was long over by the time I found it. I believe it did have impact; in fact, I still occasionally use one of the tools it produced, so Greg’s comment got me thinking: what happened to the other competition entries? Working out what happened will need a bit of digging, as most of the relevant information is now only available on the Internet Archive. It certainly seems that by November 2008 the domain name had been allowed to lapse and had been replaced with a holding page by the registrar.
There were four categories in the competition, each representing a category of tool that the organisers thought could be improved:
SC Build: a build tool to replace make
SC Config: a configuration management tool to replace autoconf and automake
SC Track: a bug tracking tool
SC Test: an easy to use testing framework
I’m hoping to be able to show that this work had a lot more impact than Greg is admitting here. I’ll keep you posted on what I find!
Changing static site generators: Nanoc → Hugo
I’ve decided to move the site over to a different static site generator, Hugo. I’ve been using Nanoc for a long time and it’s worked very well, but lately it’s been taking longer and longer to compile the site and throwing weird errors that I can’t get to the bottom of. At the time I started using Nanoc, static site generators were in their infancy. There weren’t the huge number of feature-loaded options that there are now, so I chose one and I built a whole load of blogging-related functionality myself. I did it in ways that made sense at the time but no longer work well with Nanoc’s latest versions. So it’s time to move to something that has blogging baked-in from the beginning and I’m taking the opportunity to overhaul the look and feel too. Again, when I started there weren’t many pre-existing themes so I built the whole thing myself and though I’m happy with the work I did on it it never quite felt polished enough. Now I’ve got the opportunity to adapt one of the many well-designed themes already out there, so I’ve taken one from the Hugo themes gallery and tweaked the colours to my satisfaction. Hugo also has various features that I’ve wanted to implement in Nanoc but never quite got round to it. The nicest one is proper handling of draft posts and future dates, but I keep finding others. There’s a lot of old content that isn’t quite compatible with the way Hugo does things so I’ve taken the old Nanoc-compiled content and frozen it to make sure that old links still work. I could probably fiddle with it for years without doing much so it’s probably time to go ahead and publish it. I’m still not completely happy with my choice of theme but one of the joys of Hugo is that I can change that whenever I want. Let me know what you think!
License
Except where otherwise stated, all content on eRambler by Jez Cope is licensed under a Creative Commons Attribution-ShareAlike 4.0 International license.
RDM Resources
I occasionally get asked for resources to help someone learn more about research data management (RDM) as a discipline (i.e. for those providing RDM support rather than simply wanting to manage their own data). I’ve therefore collected a few resources together on this page. If you’re lucky I might even update it from time to time! First, a caveat: this is very focussed on UK Higher Education, though much of it will still be relevant for people outside that narrow demographic. My general recommendation would be to start with the Digital Curation Centre (DCC) website and follow links out from there. I also have a slowly growing list of RDM links on Diigo, and there’s an RDM section in my list of blogs and feeds too.
Mailing lists
Jiscmail is a popular list server run for the benefit of further and higher education in the UK; the following lists are particularly relevant: RESEARCH-DATAMAN, DATA-PUBLICATION, DIGITAL-PRESERVATION and LIS-RESEARCHSUPPORT. The Research Data Alliance have a number of Interest Groups and Working Groups that discuss issues by email.
Events
International Digital Curation Conference — major annual conference
Research Data Management Forum — roughly every six months, places are limited!
RDA Plenary — also every 6 months, but only about 1 in every 3 in Europe
Books
In no particular order:
Martin, Victoria. Demystifying eResearch: A Primer for Librarians. Libraries Unlimited, 2014.
Borgman, Christine L. Big Data, Little Data, No Data: Scholarship in the Networked World. Cambridge, Massachusetts: The MIT Press, 2015.
Corti, Louise, Veerle Van den Eynden, and Libby Bishop. Managing and Sharing Research Data. Thousand Oaks, CA: SAGE Publications Ltd, 2014.
Pryor, Graham, ed. Managing Research Data. Facet Publishing, 2012.
Pryor, Graham, Sarah Jones, and Angus Whyte, eds. Delivering Research Data Management Services: Fundamentals of Good Practice. Facet Publishing, 2013.
Ray, Joyce M., ed. Research Data Management: Practical Strategies for Information Professionals. West Lafayette, Indiana: Purdue University Press, 2014.
Reports
‘Ten Recommendations for Libraries to Get Started with Research Data Management’. LIBER, 24 August 2012. http://libereurope.eu/news/ten-recommendations-for-libraries-to-get-started-with-research-data-management/.
‘Science as an Open Enterprise’. Royal Society, 2 June 2012. https://royalsociety.org/policy/projects/science-public-enterprise/Report/.
Mary Auckland. ‘Re-Skilling for Research’. RLUK, January 2012. http://www.rluk.ac.uk/wp-content/uploads/2014/02/RLUK-Re-skilling.pdf.
Journals
International Journal of Digital Curation (IJDC)
Journal of eScience Librarianship (JeSLib)
Fairphone 2: initial thoughts on the original ethical smartphone
I’ve had my eye on the Fairphone 2 for a while now, and when my current phone, an aging Samsung Galaxy S4, started playing up I decided it was time to take the plunge. A few people have asked for my thoughts on the Fairphone so here are a few notes.
Why I bought it
The thing that sparked my interest, and the main reason for buying the phone really, was the ethical stance of the manufacturer. The small Dutch company have gone to great lengths to ensure that both labour and materials are sourced as responsibly as possible. They regularly inspect the factories where the parts are made and assembled to ensure fair treatment of the workers and they source all the raw materials carefully to minimise the environmental impact and the use of conflict minerals. Another side to this ethical stance is a focus on longevity of the phone itself. This is not a product with an intentionally limited lifespan. Instead, it’s designed to be modular and as repairable as possible, by the owner themselves. Spares are available for all of the parts that commonly fail in phones (including screen and camera), and at the time of writing the Fairphone 2 is the only phone to receive 10/10 for repairability from iFixit. There are plans to allow hardware upgrades, including an expansion port on the back so that NFC or wireless charging could be added with a new case, for example.
What I like
So far, the killer feature for me is the dual SIM card slots.
I have both a personal and a work phone, and the latter was always getting left at home or in the office or running out of charge. Now I have both SIMs in the one phone: I can receive calls on either number, turn them on and off independently and choose which account to use when sending a text or making a call. The OS is very close to “standard” Android, which is nice, and I really don’t miss all the extra bloatware that came with the Galaxy S4. It also has twice the storage of that phone, which is hardly unique but is still nice to have. Overall, it seems like a solid, reliable phone, though it’s not going to outperform anything else at the same price point. It certainly feels nice and snappy for everything I want to use it for. I’m no mobile gamer, but there is that distant promise of upgradability on the horizon if you are.
What I don’t like
I only have two bugbears so far. Once or twice it’s locked up and become unresponsive, requiring a “manual reset” (removing and replacing the battery) to get going again. It also lacks NFC, which isn’t really a deal breaker, but I was just starting to make occasional use of it on the S4 (mostly experimenting with my Yubikey NEO) and it would have been nice to try out Android Pay when it finally arrives in the UK.
Overall
It’s definitely a serious contender if you’re looking for a new smartphone and aren’t bothered about serious mobile gaming. You do pay a premium for the ethical sourcing and modularity, but I feel that’s worth it for me. I’m looking forward to seeing how it works out as a phone.
Wiring my web
I’m a nut for automating repetitive tasks, so I was dead pleased a few years ago when I discovered that IFTTT let me plug different bits of the web together. I now use it for tasks such as:
Syndicating blog posts to social media
Creating scheduled/repeating todo items from a Google Calendar
Making a note to revisit an article I’ve starred in Feedly
I’d probably only be half-joking if I said that I spend more time automating things than I save not having to do said things manually. Thankfully it’s also a great opportunity to learn, and recently I’ve been thinking about reimplementing some of my IFTTT workflows myself to get to grips with how it all works. There are some interesting open source projects designed to offer a lot of this functionality, such as Huginn, but I decided to go for a simpler option for two reasons: I want to spend my time learning about the APIs of the services I use and how to wire them together, rather than learning how to use another big framework; and I only have a small Amazon EC2 server to play with and a heavy Ruby on Rails app like Huginn (plus web server) needs more memory than I have. Instead I’ve gone old-school with a little collection of individual scripts to do particular jobs. I’m using the built-in scheduling functionality of systemd, which is already part of a modern Linux operating system, to get them to run periodically. It also means I can vary the language I use to write each one depending on the needs of the job at hand and what I want to learn/feel like at the time. Currently it’s all done in Python, but I want to have a go at Lisp sometime, and there are some interesting new languages like Go and Julia that I’d like to get my teeth into as well. You can see my code on github as it develops: https://github.com/jezcope/web-plumbing. Comments and contributions are welcome (if not expected) and let me know if you find any of the code useful.
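As an illustration of the kind of script involved (a simplified sketch of my own rather than code from that repository; the feed URL and webhook address are placeholders), here is a small Python job that checks an RSS feed and pushes any new items to a webhook:

import json
from pathlib import Path

import feedparser  # pip install feedparser
import requests    # pip install requests

FEED_URL = "https://example.org/feed.xml"    # placeholder: the feed to watch
WEBHOOK_URL = "https://example.org/webhook"  # placeholder: where to send new items
SEEN_FILE = Path.home() / ".cache" / "feed-to-webhook-seen.json"

def load_seen() -> set:
    return set(json.loads(SEEN_FILE.read_text())) if SEEN_FILE.exists() else set()

def save_seen(seen: set) -> None:
    SEEN_FILE.parent.mkdir(parents=True, exist_ok=True)
    SEEN_FILE.write_text(json.dumps(sorted(seen)))

def main() -> None:
    seen = load_seen()
    for entry in feedparser.parse(FEED_URL).entries:
        uid = entry.get("id", entry.get("link"))  # fall back to the link if there's no GUID
        if uid not in seen:
            requests.post(WEBHOOK_URL, json={"title": entry.get("title"), "link": entry.get("link")}, timeout=30)
            seen.add(uid)
    save_seen(seen)

if __name__ == "__main__":
    main()

A systemd service and timer pair pointing at a script like this then stands in for IFTTT's scheduler, and swapping the webhook call for a todo-list or social media API gives the other workflows.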
Image credit: xkcd #1319, Automation
Data is like water, and language is like clothing
I admit it: I’m a grammar nerd. I know the difference between ‘who’ and ‘whom’, and I’m proud. I used to be pretty militant, but these days I’m more relaxed. I still take joy in the mechanics of the language, but I also believe that English is defined by its usage, not by a set of arbitrary rules. I’m just as happy to abuse it as to use it, although I still think it’s important to know what rules you’re breaking and why. My approach now boils down to this: language is like clothing. You (probably) wouldn’t show up to a job interview in your pyjamas[1], but neither are you going to wear a tuxedo or ballgown to the pub. Getting commas and semicolons in the right place is like getting your shirt buttons done up right. Getting it wrong doesn’t mean you’re an idiot. Everyone will know what you meant. It will affect how you’re perceived, though, and that will affect how your message is perceived. And there are former rules[2] that some still enforce that are nonetheless dropping out of regular usage. There was a time when everyone in an office job wore formal clothing. Then it became acceptable just to have a blouse, or a shirt and tie. Then the tie became optional and now there are many professions where perfectly well-respected and competent people are expected to show up wearing nothing smarter than jeans and a t-shirt. One such rule IMHO is that ‘data’ is a plural and should take pronouns like ‘they’ and ‘these’. The origin of the word ‘data’ is in the Latin plural of ‘datum’, and that idea has clung on for a considerable period. But we don’t speak Latin and the English language continues to evolve: ‘agenda’ also began life as a Latin plural, but we don’t use the word ‘agendum’ any more. It’s common everyday usage to refer to data with singular pronouns like ‘it’ and ‘this’, and it’s very rare to see someone referring to a single datum (as opposed to ‘data point’ or something). If you want to get technical, I tend to think of data as a mass noun, like ‘water’ or ‘information’. It’s uncountable: talking about ‘a water’ or ‘an information’ doesn’t make much sense, but it uses singular pronouns, as in ‘this information’. If you’re interested, the Oxford English Dictionary also takes this position, while Chambers leaves the choice of singular or plural noun up to you. There is absolutely nothing wrong, in my book, with referring to data in the plural as many people still do. But it’s no longer a rule and for me it’s weakened further from guideline to preference. It’s like wearing a bow-tie to work. There’s nothing wrong with it and some people really make it work, but it’s increasingly outdated and even a little eccentric.
[1] or maybe you’d totally rock it.
[2] Like not starting a sentence with a conjunction…
#IDCC16 day 2: new ideas
Well, I did a great job of blogging the conference for a couple of days, but then I was hit by the bug that’s been going round and didn’t have a lot of energy for anything other than paying attention and making notes during the day! I’ve now got round to reviewing my notes so here are a few reflections on day 2. Day 2 was the day of many parallel talks! So many great and inspiring ideas to take in! Here are a few of my take-home points.
Big science and the long tail
The first parallel session had examples of practical data management in the real world.
Jian Qin & Brian Dobreski (School of Information Studies, Syracuse University) worked on reproducibility with one of the research groups involved with the recent gravitational wave discovery. “Reproducibility” for this work (as with much of physics) mostly equates to computational reproducibility: tracking the provenance of the code and its input and output is key. They also found that in practice the scientists' focus was on making the big discovery, and ensuring reproducibility was seen as secondary. This goes some way to explaining why current workflows and tools don’t really capture enough metadata. Milena Golshan & Ashley Sands (Center for Knowledge Infrastructures, UCLA) investigated the use of Software-as-a-Service (SaaS, such as Google Drive, Dropbox or more specialised tools) as a way of meeting the needs of long-tail science research such as ocean science. This research is characterised by small teams, diverse data, dynamic local development of tools, local practices and difficulty disseminating data. This results in a need for researchers to be generalists, as opposed to “big science” research areas, where they can afford to specialise much more deeply. Such generalists tend to develop their own isolated workflows, which can differ greatly even within a single lab. Long-tail research also often struggles from a lack of dedicated IT support. They found that use of SaaS could help to meet these challenges, but with a high cost required to cover the needed guarantees of security and stability.
Education & training
This session focussed on the professional development of library staff. Eleanor Mattern (University of Pittsburgh) described the immersive training introduced to improve librarians' understanding of the data needs of their subject areas, as part of their RDM service delivery model. The participants each conducted a “disciplinary deep dive”, shadowing researchers and then reporting back to the group on their discoveries with a presentation and discussion. Liz Lyon (also University of Pittsburgh, formerly UKOLN/DCC) gave a systematic breakdown of the skills, knowledge and experience required in different data-related roles, obtained from an analysis of job adverts. She identified distinct roles of data analyst, data engineer and data journalist, and as well as each role’s distinctive skills, pinpointed common requirements of all three: Python, R, SQL and Excel. This work follows on from an earlier phase which identified an allied set of roles: data archivist, data librarian and data steward.
Data sharing and reuse
This session gave an overview of several specific workflow tools designed for researchers. Marisa Strong (University of California Curation Centre/California Digital Library) presented Dash, a highly modular tool for manual data curation and deposit by researchers. It’s built on their flexible backend, Stash, and though it’s currently optimised to deposit in their Merritt data repository it could easily be hooked up to other repositories. It captures DataCite metadata and a few other fields, and is integrated with ORCID to uniquely identify people. In a different vein, Eleni Castro (Institute for Quantitative Social Science, Harvard University) discussed some of the ways that Harvard’s Dataverse repository is streamlining deposit by enabling automation. It provides a number of standardised endpoints such as OAI-PMH for metadata harvest and SWORD for deposit, as well as custom APIs for discovery and deposit.
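For anyone who hasn't met OAI-PMH before, here is a rough sketch (my own illustration, not code from the talk) of harvesting Dublin Core titles from such an endpoint with plain HTTP requests; the base URL is a placeholder, and a real harvester would also follow resumption tokens to page through large result sets.

import requests
import xml.etree.ElementTree as ET

BASE_URL = "https://repository.example.edu/oai"  # placeholder OAI-PMH endpoint
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
DC_NS = "{http://purl.org/dc/elements/1.1/}"

def list_titles():
    """Yield the Dublin Core title of each record in the first page of results."""
    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    response = requests.get(BASE_URL, params=params, timeout=30)
    response.raise_for_status()
    root = ET.fromstring(response.content)
    for record in root.iter(f"{OAI_NS}record"):
        title = record.find(f".//{DC_NS}title")
        if title is not None and title.text:
            yield title.text

if __name__ == "__main__":
    for title in list_titles():
        print(title)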
Interesting use cases include:
An addon for the Open Science Framework to deposit in Dataverse via SWORD
An R package to enable automatic deposit of simulation and analysis results
Integration with publisher workflows such as Open Journal Systems
A growing set of visualisations for deposited data
In the future they’re also looking to integrate with DMPtool to capture data management plans and with Archivematica for digital preservation. Andrew Treloar (Australian National Data Service) gave us some reflections on the ANDS “applications programme”, a series of 25 small funded projects intended to address the fourth of their strategic transformations, single use → reusable. He observed that essentially these projects worked because they were able to throw money at a problem until they found a solution: not very sustainable. Some of them stuck to a traditional “waterfall” approach to project management, resulting in “the right solution 2 years late”. Every researcher’s needs are “special” and communities are still constrained by old ways of working. The conclusions from this programme were that:
“Good enough” is fine most of the time
Adopt/Adapt/Augment is better than Build
Existing toolkits let you focus on the 10% functionality that’s missing
Successful projects involved research champions who can: 1) articulate their community’s requirements; and 2) promote project outcomes
Summary
All in all, it was a really exciting conference, and I’ve come home with loads of new ideas and plans to develop our services at Sheffield. I noticed a continuation of some of the trends I spotted at last year’s IDCC, especially an increasing focus on “second-order” problems: we’re no longer spending most of our energy just convincing researchers to take data management seriously and are able to spend more time helping them to do it better and get value out of it. There’s also a shift in emphasis (identified by closing speaker Cliff Lynch) from sharing to reuse, and making sure that data is not just available but valuable.
#IDCC16 Day 1: Open Data
The main conference opened today with an inspiring keynote by Barend Mons, Professor in Biosemantics, Leiden University Medical Center. The talk had plenty of great stuff, but two points stood out for me. First, Prof Mons described a newly discovered link between Huntington’s Disease and a previously unconsidered gene. No-one had previously recognised this link, but on mining the literature, an indirect link was identified in more than 10% of the roughly 1 million scientific claims analysed. This is knowledge for which we already had more than enough evidence, but which could never have been discovered without such a wide-ranging computational study. Second, he described a number of behaviours which should be considered “malpractice” in science:
Relying on supplementary data in articles for data sharing: the majority of this is trash (paywalled, embedded in bitmap images, missing)
Using the Journal Impact Factor to evaluate science and ignoring altmetrics
Not writing data stewardship plans for projects (he prefers this term to “data management plan”)
Obstructing tenure for data experts by assuming that all highly-skilled scientists must have a long publication record
A second plenary talk from Andrew Sallans of the Centre for Open Science introduced a number of interesting-looking bits and bobs, including the Transparency & Openness Promotion (TOP) Guidelines which set out a pathway to help funders, publishers and institutions move towards more open science.
The rest of the day was taken up with a panel on open data, a poster session, some demos and a birds-of-a-feather session on sharing sensitive/confidential data. There was a great range of posters, but a few that stood out to me were:
Lessons learned about ISO 16363 (“Audit and certification of trustworthy digital repositories”) certification from the British Library
Two separate posters (from the Universities of Toronto and Colorado) about disciplinary RDM information & training for liaison librarians
A template for sharing psychology data developed by a psychologist-turned-information researcher from Carnegie Mellon University
More to follow, but for now it’s time for the conference dinner!
#IDCC16 Day 0: business models for research data management
I’m at the International Digital Curation Conference 2016 (#IDCC16) in Amsterdam this week. It’s always a good opportunity to pick up some new ideas and catch up with colleagues from around the world, and I always come back full of new possibilities. I’ll try and do some more reflective posts after the conference but I thought I’d do some quick reactions while everything is still fresh. Monday and Thursday are pre- and post-conference workshop days, and today I attended Developing Research Data Management Services. Joy Davidson and Jonathan Rans from the Digital Curation Centre (DCC) introduced us to the Business Model Canvas, a template for designing a business model on a single sheet of paper. The model prompts you to think about all of the key facets of a sustainable, profitable business, and can easily be adapted to the task of building a service model within a larger institution. The DCC used it as part of the Collaboration to Clarify Curation Costs (4C) project, whose output, the Curation Costs Exchange, is also worth a look. It was a really useful exercise to be able to work through the whole process for an aspect of research data management (my table focused on training & guidance provision), both because of the ideas that came up and also the experience of putting the framework into practice. It seems like a really valuable tool and I look forward to seeing how it might help us with our RDM service development. Tomorrow the conference proper begins, with a range of keynotes, panel sessions and birds-of-a-feather meetings so hopefully more then!
About me
I help people in Higher Education communicate and collaborate more effectively using technology. I currently work at the University of Sheffield focusing on research data management policy, practice, training and advocacy. In my free time, I like to: run; play the accordion; morris dance; climb; cook; read (fiction and non-fiction); write.
Better Science Through Better Data #scidata17
Better Science through Better Doughnuts. Image credit: Jez Cope
Update: fixed the link to the slides so it works now! Last week I had the honour of giving my first ever keynote talk, at an event entitled Better Science Through Better Data hosted jointly by Springer Nature and the Wellcome Trust. It was nerve-wracking but exciting and seemed to go down fairly well. I even got accidentally awarded a PhD in the programme — if only it was that easy! The slides for the talk, “Supporting Open Research: The role of an academic library”, are available online (doi:10.15131/shef.data.5537269), and the whole event was video’d for posterity and viewable online. I got some good questions too, mainly from the clever online question system.
I didn’t get to answer all of them, so I’m thinking of doing a blog post or two to address a few more. There were loads of other great presentations as well, both keynotes and 7-minute lightning talks, so I’d encourage you to take a look at at least some of it. I’ll pick out a few of my highlights.
Dr Aled Edwards (University of Toronto)
There’s a major problem with science funding that I hadn’t really thought about before. The available funding pool for research is divided up into pots by country, and often by funding body within a country. Each of these pots has robust processes to award funding to the most important problems and most capable researchers. The problem comes because there is no coordination between these pots, so researchers all over the world end up getting funded to research the most popular problems, leading to a lot of duplication of effort. Industry funding suffers from a similar problem, particularly the pharmaceutical industry. Because there is no sharing of data or negative results, multiple companies spend billions researching the same dead ends chasing after the same drugs. This is where the astronomical costs of drug development come from. Dr Edwards presented one alternative, modelled by a company called M4K Pharma. The idea is to use existing IP laws to try and give academic researchers a reasonable, morally-justifiable and sustainable profit on drugs they develop, in contrast to the current model where basic research is funded by governments while large corporations hoover up as much profit as they possibly can. This new model would develop drugs all the way to human trial within academia, then license the resulting drugs to companies to manufacture with a price cap to keep the medicines affordable to all who need them. Core to this effort is openness with data, materials and methodology, and Dr Edwards presented several examples of how this approach benefited academic researchers, industry and patients compared with a closed, competitive focus.
Dr Kirstie Whitaker (Alan Turing Institute)
This was a brilliant presentation: a practical how-to guide to doing reproducible research, from one researcher to another. I suggest you take a look at her slides yourself: Showing your working: a how-to guide to reproducible research. Dr Whitaker briefly addressed a number of common barriers to reproducible research:
Is not considered for promotion: so it should be!
Held to higher standards than others: reviewers should be discouraged from nitpicking just because the data/code/whatever is available (true unbiased peer review of these would be great though)
Publication bias towards novel findings: it is morally wrong to not publish reproductions, replications etc. so we need to address the common taboo on doing so
Plead the 5th: if you share, people may find flaws, but if you don’t they can’t — if you’re worried about this you should ask yourself why!
Support additional users: some (much?) of the burden should reasonably fall on the reuser, not the sharer
Takes time: this is only true if you hack it together after the fact; if you do it from the start, the whole process will be quicker!
Requires additional skills: important to provide training, but also to judge PhD students on their ability to do this, not just on their thesis & papers
The rest of the presentation, the “how-to” guide of the title, was a well-chosen and passionately delivered set of recommendations, but the thing that really stuck out for me is how good Dr Whitaker is at making the point that you only have to do one of these things to improve the quality of your research. It’s easy to get the impression at the moment that you have to be fully, perfectly open or not at all, but it’s actually OK to get there one step at a time, or even not to go all the way at all! Anyway, I think this is a slide deck that speaks for itself, so I won’t say any more!
Lightning talk highlights
There was plenty of good stuff in the lightning talks, which were constrained to 7 minutes each, but a few of the things that stood out for me were, in no particular order:
Code Ocean — share and run code in the cloud
dat project — peer to peer data synchronisation tool: can automate metadata creation, data syncing and versioning, and set up a secure data sharing network that keeps the data in sync but off the cloud
Berlin Institute of Health — open science course for students (pre-print paper and course materials)
InterMine — taking the pain out of data cleaning & analysis
Nix/NixOS as a component of a reproducible paper
BoneJ (ImageJ plugin for bone analysis) — developed by a scientist, used a lot, now has a Wellcome-funded RSE to develop next version
ESASky — amazing live, online archive of masses of astronomical data
Coda
I really enjoyed the event (and the food was excellent too). My thanks go out to:
The programme committee for asking me to come and give my take — I hope I did it justice!
The organising team who did a brilliant job of keeping everything running smoothly before and during the event
The University of Sheffield for letting me get away with doing things like this!
Blog platform switch
I’ve just switched my blog over to the Nikola static site generator. Hopefully you won’t notice a thing, but there might be a few weird spectres around til I get all the kinks ironed out. I’ve made the switch for a couple of main reasons:
Nikola supports Jupyter notebooks as a source format for blog posts, which will be useful to include code snippets
It’s written in Python, a language which I actually know, so I’m more likely to be able to fix things that break, customise it and potentially contribute to the open source project (by contrast, Hugo is written in Go, which I’m not really familiar with)
Chat rooms vs Twitter: how I communicate now
CC0, Pixabay
This time last year, Brad Colbow published a comic in his “The Brads” series entitled “The long slow death of Twitter”. It really encapsulates the way I’ve been feeling about Twitter for a while now. Go ahead and take a look. I’ll still be here when you come back. According to my Twitter profile, I joined in February 2009 as user #20,049,102. It was nearing its 3rd birthday and, though there were clearly a lot of people already signed up at that point, it was still relatively quiet, especially in the UK. I was a lonely PhD student just starting to get interested in educational technology, and one thing that Twitter had in great supply was (and still is) people pushing back the boundaries of what tech can do in different contexts.
Somewhere along the way Twitter got really noisy, partly because more people (especially commercial companies) are using it more to talk about stuff that doesn’t interest me, and partly because I now follow 1,200+ people and find I get several tweets a second at peak times, which no-one could be expected to handle. More recently I’ve found my attention drawn to more focussed communities instead of that big old shouting match. I find I’m much more comfortable discussing things and asking questions in small focussed communities because I know who might be interested in what. If I come across an article about a cool new Python library, I’ll geek out about it with my research software engineer friends; if I want advice on an aspect of my emacs setup, I’ll ask a bunch of emacs users. I feel like I’m talking to people who want to hear what I’m saying. Next to that experience, Twitter just feels like standing on a street corner shouting. IRC channels (mostly on Freenode), and similar things like Slack and gitter form the bulk of this for me, along with a growing number of WhatsApp group chats. Although online chat is theoretically a synchronous medium, I find that I can treat it more as “semi-synchronous”: I can have real-time conversations as they arise, but I can also close them and tune back in later to catch up if I want. Now I come to think about it, this is how I used to treat Twitter before the 1,200 follows happened. I also find I visit a handful of forums regularly, mostly of the Reddit link-sharing or StackExchange Q&A type. /r/buildapc was invaluable when I was building my latest box, /r/EarthPorn (very much not NSFW) is just beautiful. I suppose the risk of all this is that I end up reinforcing my own echo chamber. I’m not sure how to deal with that, but I certainly can’t deal with it while also suffering from information overload.
Not just certifiable…
A couple of months ago, I went to Oxford for an intensive, 2-day course run by Software Carpentry and Data Carpentry for prospective new instructors. I’ve now had confirmation that I’ve completed the checkout procedure so it’s official: I’m now a certified Data Carpentry instructor! As far as I’m aware, the certification process is now combined, so I’m also approved to teach Software Carpentry material too. And of course there’s Library Carpentry too…
SSI Fellowship 2020
I’m honoured and excited to be named one of this year’s Software Sustainability Institute Fellows. There’s not much to write about yet because it’s only just started, but I’m looking forward to sharing more with you. In the meantime, you can take a look at the 2020 fellowship announcement and get an idea of my plans from my application video:
Talks
Here is a selection of talks that I’ve given. [Table of talks on the site: date, title and location.]
escueladefiscales-com-2189 ---- Escuela de Fiscales
Escuela de Fiscales: Citizen Participation and Open Government. Join Proyecto Yarquen!
Climate change is one of the greatest threats facing our world today, so we have to use every means at our disposal to stop it. At Escuela de Fiscales we created "Proyecto Yarquen", which uses open data for environmental activism. Proyecto Yarquen consists of building a website aimed at civil society organisations, environmental activists, data journalists and people interested in environmental issues which, using an API tool currently under development, pulls in datasets from the official transparency portals of Argentina's national, provincial and municipal governments, organises them into different categories and, with an internal search engine, makes them easy to access for people who are not familiar with using and working with open data. One of the biggest difficulties open data users face in our country is that datasets are spread across dozens of different official portals, making it very hard for people who do not work with open data regularly to get complete information on a specific topic. The Proyecto Yarquen website aims to remove those barriers by allowing all the available information to be accessed from a single place, with simple search expressions that return the complete set of matching datasets. The portal will also include a section for generating freedom of information requests, for cases where some necessary information is not available on the official websites. It will also have a special section where civil society organisations and environmental activists can register, making it easier for them to connect and work together. All of this will give civil society useful tools to work for the care and preservation of the environment more effectively, with greater knowledge and information about specific issues. How can you help? If you are a programmer, designer, data journalist or environmental activist, work with open data, belong to an organisation or collective fighting to protect the environment, or simply care about climate change and want to do your part, we need you! Send an email to info@escueladefiscales.com and we will get in touch with you shortly.
escueladefiscales-com-6368 ---- Escuela de Fiscales – Participación Ciudadana y Gobierno Abierto
An Argentine project shortlisted for the "Net Zero Challenge", an international competition that rewards the use of open data for climate action.
Mar del Plata celebrated Open Data Day.
Open Data Day Mar del Plata 2021.
We took part in the creation of the first Open Congress Action Plan.
Check your details in the 2021 electoral roll (PADRÓN 2021)!
Escuela de Fiscales took part in the virtual leaders’ summit of the Open Government Partnership (OGP). Frena la Curva! To do our part in the measures against the new Covid-19 coronavirus pandemic, we joined #FrenaLaCurvaArgentina, part of the international #FrenaLaCurva network. Federal Forum against Gender-Based Violence: on Thursday 13 February we took part in the meeting convened in Chapadmalal by the national Ministry of Women, Genders and Diversity, which also brought together community organisations, civil society organisations, representatives of local and provincial governments, legislators and the general public. Escuela de Fiscales attended the launch of the 4th Open Government Plan at the Casa Rosada: Escuela de Fiscales, an organisation that promotes citizen participation and institutional and electoral transparency, took part in the launch of the 4th National Open Government Plan and in Argentina’s assumption of the co-chairship of the global Open Government Partnership, at an event held on Thursday 19 September at the Casa Rosada, also attended by government officials, ambassadors and representatives of more than 70 civil society organisations that worked on drafting the Plan. Get to know our activities: Open Government, Democracy and Elections, Gender Policies. Featured News: An Argentine project shortlisted to win the “Net Zero Challenge”, an international competition that rewards the use of open data for climate action. The project is Proyecto Yarquen, developed by the civil society organisation Escuela de Fiscales, which uses open data and technology as tools in the fight for the environment. Mar del Plata celebrated Open Data Day: for the fourth year running, Mar del Plata was part of the international calendar of “Open Data Day” events, with a gathering organised by Escuela de Fiscales that worked on the environment, technology, sustainable development, ecology and environmental activism. On Saturday 6 March, Mar del Plata once again joined the international Open Data Day events, an annual celebration … We took part in creating the first Open Congress Action Plan.
The Honourable Chamber of Deputies of the Nation began the process of drawing up the first Open Congress Action Plan, with the aim of building a more open, transparent and participatory parliament, and Escuela de Fiscales took part in the working groups and in co-creating the commitments that the HCDN will take on between March 2021 and July of … Escuela de Fiscales took part in the virtual leaders’ summit of the Open Government Partnership (OGP): on 24 September, Escuela de Fiscales took part in the OGP virtual leaders’ summit, at which the co-chairship of the organisation, which our country had held alongside Robin Hodess, was handed over to the Government of Korea and to María Baron representing civil society. Blog: An Argentine project shortlisted to win the “Net Zero Challenge”, an international competition that rewards the use of open data for climate action; Mar del Plata celebrated Open Data Day; Open Data Day Mar del Plata 2021: Escuela de Fiscales is preparing an event on the environment and … erambler-co-uk-695 ---- eRambler: recent content on eRambler. Intro to the fediverse Wow, it turns out to be 10 years since I wrote this beginners’ guide to Twitter. Things have moved on a loooooong way since then. Far from being the interesting, disruptive technology it was back then, Twitter has become part of the mainstream, the establishment. Almost everyone and everything is on Twitter now, which has both pros and cons. So what’s the problem? It’s now possible to follow all sorts of useful information feeds, from live updates on transport delays to your favourite sports team’s play-by-play performance to an almost infinite number of cat pictures. In my professional life it’s almost guaranteed that anyone I meet will be on Twitter, meaning that I can contact them to follow up at a later date without having to exchange contact details (and they have options to block me if they don’t like that). On the other hand, a medium where everyone’s opinion is equally valid regardless of knowledge or life experience has turned some parts of the internet into a toxic swamp of hatred and vitriol. It’s easier than ever to forget that we have more common ground with any random stranger than we have differences, and that’s led to some truly awful acts and a poisonous political arena.
Part of the problem here is that each of the social media platforms is controlled by a single entity with almost no accountability to anyone other than shareholders. Technological change has been so rapid that the regulatory regime has no idea how to handle them, leaving them largely free to operate how they want. This has led to a whole heap of nasty consequences that many other people have done a much better job of documenting than I could (Shoshana Zuboff’s book The Age of Surveillance Capitalism is a good example). What I’m going to focus on instead are some possible alternatives. If you accept the above argument, one obvious solution is to break up the effective monopoly enjoyed by Facebook, Twitter et al. We need to be able to retain the wonderful affordances of social media but democratise control of it, so that it can never be dominated by a small number of overly powerful players. What’s the solution? There’s actually a thing that already exists, that almost everyone is familiar with and that already works like this. It’s email. There are a hundred thousand email servers, but my email can always find your inbox if I know your address because that address identifies both you and the email service you use, and they communicate using the same protocol, Simple Mail Transfer Protocol (SMTP)1. I can’t send a message to your Twitter from my Facebook though, because they’re completely incompatible, like oil and water. Facebook has no idea how to talk to Twitter and vice versa (and the companies that control them have zero interest in such interoperability anyway). Just like email, a federated social media service like Mastodon allows you to use any compatible server, or even run your own, and follow accounts on your home server or anywhere else, even servers running different software as long as they use the same ActivityPub protocol. There’s no lock-in because you can move to another server any time you like, and interact with all the same people from your new home, just like changing your email address. Smaller servers mean that no one server ends up with enough power to take over and control everything, as the social media giants do with their own platforms. But at the same time, a small server with a small moderator team can enforce local policy much more easily and block accounts or whole servers that host trolls, nazis or other poisonous people. How do I try it? I have no problem with anyone choosing to continue to use what we’re already calling “traditional” social media; frankly, Facebook and Twitter are still useful for me to keep in touch with a lot of my friends. However, I do think it’s useful to know some of the alternatives, if only to make a more informed decision to stick with your current choices. Most of these services only ask for an email address when you sign up, and use of your real name vs a pseudonym is entirely optional, so there’s not really any risk in signing up and giving one a try. That said, make sure you take sensible precautions like not reusing a password from another account. Instead of Twitter or Facebook, try Mastodon, Pleroma or Misskey. Instead of Slack, Discord or IRC, try Matrix. Instead of WhatsApp, FB Messenger or Telegram, also try Matrix. Instead of Instagram or Flickr, try PixelFed. Instead of YouTube, try PeerTube. And instead of the web itself, try the Interplanetary File System (IPFS). 1. Which, if you can believe it, was formalised nearly 40 years ago in 1982 and has only had fairly minor changes since then!
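To make the email analogy a bit more concrete: a fediverse address like someone@example.social is resolved to the right server using WebFinger, a standard discovery endpoint that ActivityPub servers such as Mastodon expose. Here is a minimal Python sketch of that lookup; the handle and domain are made up, and this is just an illustration of the addressing model, not something an ordinary user ever needs to do.

```python
# Minimal sketch: resolve a fediverse handle (user@domain) to its ActivityPub
# actor URL via WebFinger, the same discovery step a client or another server
# performs. The handle below is hypothetical.
import requests

def resolve_actor(handle: str) -> str:
    user, domain = handle.lstrip("@").split("@", 1)
    resp = requests.get(
        f"https://{domain}/.well-known/webfinger",
        params={"resource": f"acct:{user}@{domain}"},
        timeout=10,
    )
    resp.raise_for_status()
    for link in resp.json().get("links", []):
        # The "self" link points at the ActivityPub actor document.
        if link.get("rel") == "self" and "activity+json" in link.get("type", ""):
            return link["href"]
    raise ValueError(f"No ActivityPub actor found for {handle}")

print(resolve_actor("@someone@example.social"))
```

The actor document found this way tells other servers where to deliver posts, in much the same way that DNS tells a mail server where to hand over a message.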
↩︎ Collaborations Workshop 2021: collaborative ideas & hackday My last post covered the more “traditional” lectures-and-panel-sessions approach of the first half of the SSI Collaborations Workshop. The rest of the workshop was much more interactive, consisting of a discussion session, a Collaborative Ideas session, and a whole-day hackathon! The discussion session on day one had us choose a topic (from a list of topics proposed leading up to the workshop) and join a breakout room for that topic with the aim of producing a “speed blog” by then end of 90 minutes. Those speed blogs will be published on the SSI blog over the coming weeks, so I won’t go into that in more detail. The Collaborative Ideas session is a way of generating hackday ideas, by putting people together at random into small groups to each raise a topic of interest to them before discussing and coming up with a combined idea for a hackday project. Because of the serendipitous nature of the groupings, it’s a really good way of generating new ideas from unexpected combinations of individual interests. After that, all the ideas from the session, along with a few others proposed by various participants, were pitched as ideas for the hackday and people started to form teams. Not every idea pitched gets worked on during the hackday, but in the end 9 teams of roughly equal size formed to spend the third day working together. My team’s project: “AHA! An Arts & Humanities Adventure” There’s a lot of FOMO around choosing which team to join for an event like this: there were so many good ideas and I wanted to work on several of them! In the end I settled on a team developing an escape room concept to help Arts & Humanities scholars understand the benefits of working with research software engineers for their research. Five of us rapidly mapped out an example storyline for an escape room, got a website set up with GitHub and populated it with the first few stages of the game. We decided to focus on a story that would help the reader get to grips with what an API is and I’m amazed how much we managed to get done in less than a day’s work! You can try playing through the escape room (so far) yourself on the web, or take a look at the GitHub repository, which contains the source of the website along with a list of outstanding tasks to work on if you’re interested in contributing. I’m not sure yet whether this project has enough momentum to keep going, but it was a really valuable way both of getting to know and building trust with some new people and demonstrating the concept is worth more work. Other projects Here’s a brief rundown of the other projects worked on by teams on the day. Coding Confessions Everyone starts somewhere and everyone cuts corners from time to time. Real developers copy and paste! Fight imposter syndrome by looking through some of these confessions or contributing your own. https://coding-confessions.github.io/ CarpenPI A template to set up a Raspberry Pi with everything you need to run a Carpentries (https://carpentries.org/) data science/software engineering workshop in a remote location without internet access. 
https://github.com/CarpenPi/docs/wiki Research Dugnads A guide to running an event that is a coming together of a research group or team to share knowledge, pass on skills, tidy and review code, among other software and working best practices (based on the Norwegian concept of a dugnad, a form of “voluntary work done together with other people”) https://research-dugnads.github.io/dugnads-hq/ Collaborations Workshop ideas A meta-project to collect together pitches and ideas from previous Collaborations Workshop conferences and hackdays, to analyse patterns and revisit ideas whose time might now have come. https://github.com/robintw/CW-ideas howDescribedIs Integrate existing tools to improve the machine-readable metadata attached to open research projects by integrating projects like SOMEF, codemeta.json and HowFAIRIs (https://howfairis.readthedocs.io/en/latest/index.html). Complete with CI and badges! https://github.com/KnowledgeCaptureAndDiscovery/somef-github-action Software end-of-project plans Develop a template to plan and communicate what will happen when the fixed-term project funding for your research software ends. Will maintenance continue? When will the project sunset? Who owns the IP? https://github.com/elichad/software-twilight Habeas Corpus A corpus of machine readable data about software used in COVID-19 related research, based on the CORD19 dataset. https://github.com/softwaresaved/habeas-corpus Credit-all Extend the all-contributors GitHub bot (https://allcontributors.org/) to include rich information about research project contributions such as the CASRAI Contributor Roles Taxonomy (https://casrai.org/credit/) https://github.com/dokempf/credit-all I’m excited to see so many metadata-related projects! I plan to take a closer look at what the Habeas Corpus, Credit-all and howDescribedIs teams did when I get time. I also really want to try running a dugnad with my team or for the GLAM Data Science network. Collaborations Workshop 2021: talks & panel session I’ve just finished attending (online) the three days of this year’s SSI Collaborations Workshop (CW for short), and once again it’s been a brilliant experience, as well as mentally exhausting, so I thought I’d better get a summary down while it’s still fresh it my mind. Collaborations Workshop is, as the name suggests, much more focused on facilitating collaborations than a typical conference, and has settled into a structure that starts off with with longer keynotes and lectures, and progressively gets more interactive culminating with a hack day on the third day. That’s a lot to write about, so for this post I’ll focus on the talks and panel session, and follow up with another post about the collaborative bits. I’ll also probably need to come back and add in more links to bits and pieces once slides and the “official” summary of the event become available. Updates 2021-04-07 Added links to recordings of keynotes and panel sessions Provocations The first day began with two keynotes on this year’s main themes: FAIR Research Software and Diversity & Inclusion, and day 2 had a great panel session focused on disability. All three were streamed live and the recordings remain available on Youtube: View the keynotes recording; Google-free alternative link View the panel session recording; Google-free alternative link FAIR Research Software Dr Michelle Barker, Director of the Research Software Alliance, spoke on the challenges to recognition of software as part of the scholarly record: software is not often cited. 
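A small, concrete step towards both of these threads, the machine-readable project descriptions the howDescribedIs team were integrating and software that is easier to cite, is to ship a codemeta.json file alongside the code. The snippet below is only an illustrative sketch with a made-up project; the field names come from the CodeMeta 2.0 vocabulary used by tools such as SOMEF and howfairis.

```python
# Illustrative sketch: write a minimal codemeta.json for a hypothetical
# research software project, giving metadata tools something machine-readable
# to work with and making the software easier to cite.
import json

codemeta = {
    "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
    "@type": "SoftwareSourceCode",
    "name": "example-analysis-tool",  # hypothetical project
    "description": "Scripts for analysing digitised catalogue records.",
    "codeRepository": "https://github.com/example/example-analysis-tool",
    "license": "https://spdx.org/licenses/MIT",
    "version": "0.1.0",
    "author": [{"@type": "Person", "givenName": "Jane", "familyName": "Doe"}],
}

with open("codemeta.json", "w", encoding="utf-8") as f:
    json.dump(codemeta, f, indent=2)
```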
The FAIR4RS working group has been set up to investigate and create guidance on how the FAIR Principles for data can be adapted to research software as well; as they stand, the Principles are not ideally suited to software. This work will only be the beginning though, as we will also need metrics, training, career paths and much more. ReSA itself has 3 focus areas: people, policy and infrastructure. If you’re interested in getting more involved in this, you can join the ReSA email list. Equality, Diversity & Inclusion: how to go about it Dr Chonnettia Jones, Vice President of Research, Michael Smith Foundation for Health Research, spoke extensively and persuasively on the need for Equality, Diversity & Inclusion (EDI) initiatives within research, as there is abundant, robust evidence that they improve all research outcomes. She highlighted the difficulties current approaches to EDI have in effecting structural change, changing not just individual behaviours but the cultures & practices that perpetuate inequity. Where initiatives are often constructed around making up for individual deficits, a better framing is to start from an understanding of individuals having equal stature but different lived experiences. Commenting on the current focus on “research excellence”, she pointed out that the hyper-competition this promotes is deeply unhealthy, suggesting instead that true excellence requires diversity, and that we should focus on an inclusive excellence driven by inclusive leadership. Equality, Diversity & Inclusion: disability issues Day 2’s EDI panel session brought together five disabled academics to discuss the problems of disability in research. Dr Becca Wilson, UKRI Innovation Fellow, Institute of Population Health Science, University of Liverpool (Chair) Phoenix C S Andrews (PhD Student, Information Studies, University of Sheffield and Freelance Writer) Dr Ella Gale (Research Associate and Machine Learning Subject Specialist, School of Chemistry, University of Bristol) Prof Robert Stevens (Professor and Head of Department of Computer Science, University of Manchester) Dr Robin Wilson (Freelance Data Scientist and SSI Fellow) NB. The discussion flowed quite freely, so the following summary mixes up input from all the panel members. Researchers are often assumed to be single-minded in following their research calling, and aptness for jobs is often partly judged on “time served”, which disadvantages any disabled person who has been forced to take a career break. On top of this, disabled people are often time-poor because of the extra time needed to manage their condition, leaving them with less “output” to show for their time served on many common metrics. This can particularly affect early-career researchers, since resources for these are often restricted on a “years-since-PhD” criterion. Time poverty also makes funding with short deadlines that much harder to apply for. Employers add more demands right from the start: new starters are typically expected to complete a health and safety form, generally a brief affair that will suddenly become an 80-page bureaucratic nightmare if you tick the box declaring a disability. Many employers claim to be inclusive yet utterly fail to understand the needs of their disabled staff.
Wheelchairs are liberating for those who use them (despite the awful but common phrase “wheelchair-bound”) and yet employers will refuse to insure a wheelchair while travelling for work, classifying it as a “high value personal item” that the owner would take the same responsibility for as an expensive camera. Computers open up the world for blind people in a way that was never possible without them, but it’s not unusual for mandatory training to be inaccessible to screen readers. Some of these barriers can be overcome, but doing so takes yet more time that could and should be spent on more important work. What can we do about it? Academia works on patronage whether we like it or not, so be the person who supports people who are different to you rather than focusing on the one you “recognise yourself in” to mentor. As a manager, it’s important to ask each individual what they need and believe them: they are the expert in their own condition and their lived experience of it. Don’t assume that because someone else in your organisation with the same disability needs one set of accommodations, it’s invalid for your staff member to require something totally different. And remember: disability is unusual as a protected characteristic in that anyone can acquire it at any time without warning! Lightning talks Lightning talk sessions are always tricky to summarise, and while this doesn’t do them justice, here are a few highlights from my notes. Data & metadata Malin Sandstrom talked about a much-needed refinement of contributor role taxonomies for scientific computing Stephan Druskat showcased a project to crowdsource a corpus of research software for further analysis Learning & teaching/community Matthew Bluteau introduced the concept of the “coding dojo” as a way to enhance community of practice. A group of coders got together to practice & learn by working together to solve a problem and explaining their work as they go He described 2 models: a code jam, where people work in small groups, and the Randori method, where 2 people do pair programming while the rest observe. I’m excited to try this out! Steve Crouch talked about intermediate skills and helping people take the next step, which I’m also very interested in with the GLAM Data Science network Esther Plomp recounted experience of running multiple Carpentry workshops online, while Diego Alonso Alvarez discussed planned workshops on making research software more usable with GUIs Shoaib Sufi showcased the SSI’s new event organising guide Caroline Jay reported on a diary study into autonomy & agency in RSE during COVID Lopez, T., Jay, C., Wermelinger, M., & Sharp, H. (2021). How has the covid-19 pandemic affected working conditions for research software engineers? Unpublished manuscript. Wrapping up That’s not everything! But this post is getting pretty long so I’ll wrap up for now. I’ll try to follow up soon with a summary of the “collaborative” part of Collaborations Workshop: the idea-generating sessions and hackday! Time for a new look... I’ve decided to try switching this website back to using Hugo to manage the content and generate the static HTML pages. I’ve been on the Python-based Nikola for a few years now, but recently I’ve been finding it quite slow, and very confusing to understand how to do certain things. 
I used Hugo recently for the GLAM Data Science Network website and found it had come on a lot since the last time I was using it, so I thought I’d give it another go, and redesign this site to be a bit more minimal at the same time. The theme is still a work in progress so it’ll probably look a bit rough around the edges for a while, but I think I’m happy enough to publish it now. When I get round to it I might publish some more detailed thoughts on the design. Ideas for Accessible Communications The Disability Support Network at work recently ran a survey on “accessible communications”, to develop guidance on how to make communications (especially internal staff comms) more accessible to everyone. I grabbed a copy of my submission because I thought it would be useful to share more widely, so here it is. Please note that these are based on my own experiences only. I am in no way suggesting that these are the only things you would need to do to ensure your communications are fully accessible. They’re just some things to keep in mind. Policies/procedures/guidance can be stressful to use if anything is vague or inconsistent, or if it looks like there might be more information implied than is explicitly given (a common cause of this is use of jargon in e.g. HR policies). Emails relating to these policies have similar problems, made worse because they tend to be very brief. Online meetings can be very helpful, but can also be exhausting, especially if there are too many people, or not enough structure. Larger meetings & webinars without agendas (or where the agenda is ignored, or timings are allowed to drift without acknowledgement) are very stressful, as are those where there is not enough structure to ensure fair opportunities to contribute. Written reference documents and communications should: Be carefully checked for consistency and clarity Have all all key points explicitly stated Explicitly acknowledge the need for flexibility where it is necessary, rather than implying or hinting at it Clearly define jargon & acronyms where they are necessary to the point being made, and avoid them otherwise Include links to longer, more explicit versions where space is tight Provide clear bullet-point summaries with links to the details Online meetings should: Include sufficient break time (at least 10 minutes out of every hour) and not allow this to be compromised just because a speaker has misjudged the length of their talk Include initial “settling-in” time in agendas to avoid timing getting messed up from the start Ensure the agenda is stuck to, or that divergence from the agenda is acknowledged explicitly by the chair and updated timing briefly discussed to ensure everyone is clear Establish a norm for participation at the start of the meeting and stick to it e.g. 
ask people to raise hands when they have a point to make, or have specific time for round-robin contributions Ensure quiet/introverted people have space to contribute, but don’t force them to do so if they have nothing to add at the time Offer a text-based alternative to contributing verbally If appropriate, at the start of the meeting assign specific roles of: Gatekeeper: ensures everyone has a chance to contribute Timekeeper: ensures meeting runs to time Scribe: ensures a consistent record of the meeting Be chaired by someone with the confidence to enforce the above: offer training to all staff on chairing meetings to ensure everyone has the skills to run a meeting effectively Matrix self-hosting I started running my own Matrix server a little while ago. Matrix is something rather cool, a chat system similar to IRC or Slack, but open and federated. Open in that the standard is available for anyone to view, but also the reference implementations of server and client are open source, along with many other clients and a couple of nascent alternative servers. Federated in that, like email, it doesn’t matter what server you sign up with, you can talk to users on your own or any other server. I decided to host my own for three reasons. Firstly, to see if I could and to learn from it. Secondly, to try and rationalise the Cambrian explosion of Slack teams I was being added to in 2019. Thirdly, to take some control of the loss of access to historical messages in some communities that rely on Slack (especially the Carpentries and RSE communities). Since then, I’ve also added a fourth goal: taking advantage of various bridges to bring other messaging networks I use (such as Signal and Telegram) into a consistent UI. I’ve also found that my use of Matrix-only rooms has grown as more individuals & communities have adopted the platform. So, I really like Matrix and I use it daily. My problem now is whether to keep self-hosting. Synapse, the only full server implementation at the moment, is really heavy on memory, so I’ve ended up running it on a much bigger server than I thought I’d need, which seems overkill for a single-user instance. So now I have to make a decision about whether it’s worth keeping going, or shutting it down and going back to matrix.org, or setting up on one of the other servers that have sprung up in the last couple of years. There are a couple of other considerations here. Firstly, Synapse resource usage is entirely down to the size of the rooms joined by users of the homeserver, not directly the number of users. So if users have mostly overlapping interests, and thus keep to the same rooms, you can support quite a large community without significant extra resource usage. Secondly, there are a couple of alternative server implementations in development specifically addressing this issue for small servers: Dendrite and Conduit. Neither is quite ready for what I want yet, but both are getting close, and when they are they will allow running small homeservers with much more sensible resource usage. So I could start opening up for other users, and at least justify the size of the server that way. I wouldn’t ever want to make it a paid-for service but perhaps people might be willing to make occasional donations towards running costs. That still leaves me with the question of whether I’m comfortable running a service that others may come to rely on, or being responsible for the safety of their information.
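One thing that softens the decision is that, from a client’s point of view, a self-hosted homeserver behaves exactly like any other: the same few lines of code work against matrix.org or my own Synapse. Here is a rough sketch using the matrix-nio Python library; the server, user, password and room ID are all placeholders.

```python
# Rough sketch with matrix-nio: the same client code works whether the
# homeserver is matrix.org or self-hosted. All identifiers below are made up.
import asyncio
from nio import AsyncClient

async def main() -> None:
    client = AsyncClient("https://matrix.example.org", "@me:example.org")
    await client.login("not-my-real-password")
    await client.room_send(
        room_id="!abc123:example.org",
        message_type="m.room.message",
        content={"msgtype": "m.text", "body": "Hello from my own homeserver!"},
    )
    await client.close()

asyncio.run(main())
```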
I could also hold out for Dendrite or Conduit to mature enough that I’m ready to try them, which might not be more than a few months off. Hmm, seems like I’ve convinced myself to stick with it for now, and we’ll see how it goes. In the meantime, if you know me and you want to try it out let me know and I might risk setting you up with an account! What do you miss least about pre-lockdown life? @JanetHughes on Twitter: What do you miss the least from pre-lockdown life? I absolutely do not miss wandering around the office looking for a meeting room for a confidential call or if I hadn’t managed to book a room in advance. Let’s never return to that joyless frustration, hey? 10:27 AM · Feb 3, 2021 After seeing Terence Eden taking Janet Hughes' tweet from earlier this month as a writing prompt, I thought I might do the same. The first thing that leaps to my mind is commuting. At various points in my life I’ve spent between one and three hours a day travelling to and from work and I’ve never more than tolerated it at best. It steals time from your day, and societal norms dictate that it’s your leisure & self-care time that must be sacrificed. Longer commutes allow more time to get into a book or podcast, especially if not driving, but I’d rather have that time at home rather than trying to be comfortable in a train seat designed for some mythical average man shaped nothing like me! The other thing I don’t miss is the colds and flu! Before the pandemic, British culture encouraged working even when ill, which meant constantly coming into contact with people carrying low-grade viruses. I’m not immunocompromised but some allergies and residue of being asthmatic as a child meant that I would get sick 2-3 times a year. A pleasant side-effect of the COVID precautions we’re all taking is that I haven’t been sick for over 12 months now, which is amazing! Finally, I don’t miss having so little control over my environment. One of the things that working from home has made clear is that there are certain unavoidable aspects of working in my shared office that cause me sensory stress, and that are completely unrelated to my work. Working (or trying to work) next to a noisy automatic scanner; trying to find a light level that works for 6 different people doing different tasks; lacking somewhere quiet and still to eat lunch and recover from a morning of meetings or the constant vaguely-distracting bustle of a large shared office. It all takes energy. Although it’s partly been replaced by the new stress of living through a global pandemic, that old stress was a constant drain on my productivity and mood that had been growing throughout my career as I moved (ironically, given the common assumption that seniority leads to more privacy) into larger and larger open plan offices. Remarkable blogging And the handwritten blog saga continues, as I’ve just received my new reMarkable 2 tablet, which is designed for reading, writing and nothing else. It uses a super-responsive e-ink display and writing on it with a stylus is a dream. It has a slightly rough texture with just a bit of friction that makes my writing come out a lot more legibly than on a slippery glass touchscreen. If that was all there was to it, I might not have wasted my money, but it turns out that it runs on Linux and the makers have wisely decided not to lock it down but to give you full root mess. Yes, you read that right: root access. 
It presents as an ethernet device over USB, so you can SSH in with a password found in the settings and have full control over your own devices. What a novel concept. This fact alone has meant it’s built a small yet devoted community of users who have come up with some clever ways of extending its functionality. In fact, many of these are listed on this GitHub repository. Finally, from what I’ve seen so far, the handwriting recognition is impressive to say the least. This post was written on it and needed only a little editing. I think this is a device that will get a lot of use! GLAM Data Science Network fellow travellers Updates 2021-02-04 Thanks to Gene @dzshuniper@ausglam.space for suggesting ADHO and a better attribution for the opening quote (see comments below for details) See comments & webmentions for details. “If you want to go fast, go alone. If you want to go far, go together.” — African proverb, probably popularised in English by Kenyan church leader Rev. Samuel Kobia (original) This quote is a popular one in the Carpentries community, and I interpret it in this context to mean that a group of people working together is more sustainable than individuals pursuing the same goal independently. That’s something that speaks to me, and that I want to make sure is reflected in nurturing this new community for data science in galleries, archives, libraries & museums (GLAM). To succeed, this work needs to be complementary and collaborative, rather than competitive, so I want to acknowledge a range of other networks & organisations whose activities complement this. The rest of this article is an unavoidably incomplete list of other relevant organisations whose efforts should be acknowledged and potentially built on. And it should go without saying, but just in case: if the work I’m planning fits right into an existing initiative, then I’m happy to direct my resources there rather than duplicate effort. Inspirations & collaborators Groups with similar goals or undertaking similar activities, but focused on a different sector, geographic area or topic. I think we should make as much use of and contribution to these existing communities as possible since there will be significant overlap. code4lib Probably the closest existing community to what I want to build, but primarily based in the US, so timezones (and physical distance for in-person events) make it difficult to participate fully. This is a well-established community though, with regular events including an annual conference so there’s a lot to learn here. newCardigan Similar to code4lib but an Australian focus, so the timezone problem is even bigger! GLAM Labs Focused on supporting the people experimenting with and developing the infrastructure to enable scholars to access GLAM materials in new ways. In some ways, a GLAM data science network would be complementary to their work, by providing people not directly involved with building GLAM Labs with the skills to make best use of GLAM Labs infrastructure. UK Government data science community Another existing community with very similar intentions, but focused on UK Government sector. Clearly the British Library and a few national & regional museums & archives fall into this, but much of the rest of the GLAM sector does not. 
Artifical Intelligence for Libraries, Archives & Museums (AI4LAM) A multinational collaboration between several large libraries, archives and museums with a specific focus on the Artificial Intelligence (AI) subset of data science UK Reproducibility Network A network of researchers, primarily in HEIs, with an interest in improving the transparency and reliability of academic research. Mostly science-focused but with some overlap of goals around ethical and robust use of data. Museums Computer Group I’m less familiar with this than the others, but it seems to have a wider focus on technology generally, within the slightly narrower scope of museums specifically. Again, a lot of potential for collaboration. Training Several organisations and looser groups exist specifically to develop and deliver training that will be relevant to members of this network. The network also presents an opportunity for those who have done a workshop with one of these and want to know what the “next steps” are to continue their data science journey. The Carpentries, aka: Library Carpentry Data Carpentry Software Carpentry Data Science Training for Librarians (DST4L) The Programming Historian CDH Cultural Heritage Data School Supporters These misson-driven organisations have goals that align well with what I imagine for the GLAM DSN, but operate at a more strategic level. They work by providing expert guidance and policy advice, lobbying and supporting specific projects with funding and/or effort. In particular, the SSI runs a fellowship programme which is currently providing a small amount of funding to this project. Digital Preservation Coalition (DPC) Software Sustainability Institute (SSI) Research Data Alliance (RDA) Alliance of Digital Humanities Organizations (ADHO) … and its Libraries and Digital Humanities Special Interest Group (Lib&DH SIG) Professional bodies These organisations exist to promote the interests of professionals in particular fields, including supporting professional development. I hope they will provide communication channels to their various members at the least, and may be interested in supporting more directly, depending on their mission and goals. Society of Research Software Engineering Chartered Institute of Library and Information Professionals Archives & Records Association Museums Association Conclusion As I mentioned at the top of the page, this list cannot possibly be complete. This is a growing area and I’m not the only or first person to have this idea. If you can think of anything glaring that I’ve missed and you think should be on this list, leave a comment or tweet/toot at me! A new font for the blog I’ve updated my blog theme to use the quasi-proportional fonts Iosevka Aile and Iosevka Etoile. I really like the aesthetic, as they look like fixed-width console fonts (I use the true fixed-width version of Iosevka in my terminal and text editor) but they’re actually proportional which makes them easier to read. https://typeof.net/Iosevka/ Training a model to recognise my own handwriting If I’m going to train an algorithm to read my weird & awful writing, I’m going to need a decent-sized training set to work with. And since one of the main things I want to do with it is to blog “by hand” it makes sense to focus on that type of material for training. In other words, I need to write out a bunch of blog posts on paper, scan them and transcribe them as ground truth. 
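For what it’s worth, here is a rough sketch of how that ground truth might be laid out on disk once the transcriptions exist: each scanned page paired with its transcription, with a reproducible random split holding some pages back for validation. The directory names are hypothetical, and this is independent of how Transkribus stores its own data.

```python
# Sketch: pair scanned page images with their transcriptions and hold back a
# validation set. Directory layout and file names are hypothetical.
import random
from pathlib import Path

scans = sorted(Path("ground-truth/scans").glob("*.png"))
texts = Path("ground-truth/text")
# Keep only pages that have both a scan and a transcription.
pairs = [(img, texts / (img.stem + ".txt"))
         for img in scans if (texts / (img.stem + ".txt")).exists()]

random.seed(42)  # reproducible split
random.shuffle(pairs)
cut = int(0.8 * len(pairs))
train, validation = pairs[:cut], pairs[cut:]
print(f"{len(train)} pages for training, {len(validation)} for validation")
```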
The added bonus of this plan is that after transcribing, I also end up with some digital text I can use as an actual post — multitasking! So, by the time you read this, I will have already run it through a manual transcription process using Transkribus to add it to my training set, and copy-pasted it into emacs for posting. This is a fun little project because it means I can: Write more by hand with one of my several nice fountain pens, which I enjoy Learn more about the operational process some of my colleagues go through when digitising manuscripts Learn more about the underlying technology & maths, and how to tune the process Produce more lovely content! For you to read! Yay! Write in a way that forces me to put off editing until after a first draft is done and focus more on getting the whole of what I want to say down. That’s it for now — I’ll keep you posted as the project unfolds. Addendum Tee hee! I’m actually just enjoying the process of writing stuff by hand in long-form prose. It’ll be interesting to see how the accuracy turns out and if I need to be more careful about neatness. Will it be better or worse than the big but generic models used by Samsung Notes or OneNote. Maybe I should include some stylus-written text for comparison. Blogging by hand I wrote the following text on my tablet with a stylus, which was an interesting experience: So, thinking about ways to make writing fun again, what if I were to write some of them by hand? I mean I have a tablet with a pretty nice stylus, so maybe handwriting recognition could work. One major problem, of course, is that my handwriting is AWFUL! I guess I’ll just have to see whether the OCR is good enough to cope… It’s something I’ve been thinking about recently anyway: I enjoy writing with a proper fountain pen, so is there a way that I can have a smooth workflow to digitise handwritten text without just typing it back in by hand? That would probably be preferable to this, which actually seems to work quite well but does lead to my hand tensing up to properly control the stylus on the almost-frictionless glass screen. I’m surprised how well it worked! Here’s a sample of the original text: And here’s the result of converting that to text with the built-in handwriting recognition in Samsung Notes: Writing blog posts by hand So, thinking about ways to make writing fun again, what if I were to write some of chum by hand? I mean, I have a toldest winds a pretty nice stylus, so maybe handwriting recognition could work. One major problems, ofcourse, is that my , is AWFUL! Iguess I’ll just have to see whattime the Ocu is good enough to cope… It’s something I’ve hun tthinking about recently anyway: I enjoy wilting with a proper fountain pion, soischeme a way that I can have a smooch workflow to digitise handwritten text without just typing it back in by hand? That wouldprobally be preferableto this, which actually scams to work quito wall but doers load to my hand tensing up to properly couldthe stylus once almost-frictionlessg lass scream. It’s pretty good! It did require a fair bit of editing though, and I reckon we can do better with a model that’s properly trained on a large enough sample of my own handwriting. What I want from a GLAM/Cultural Heritage Data Science Network Introduction As I mentioned last year, I was awarded a Software Sustainability Institute Fellowship to pursue the project of setting up a Cultural Heritage/GLAM data science network. 
Obviously, the global pandemic has forced a re-think of many plans and this is no exception, so I’m coming back to reflect on it and make sure I’m clear about the core goals so that everything else still moves in the right direction. One of the main reasons I have for setting up a GLAM data science network is because it’s something I want. The advice to “scratch your own itch” is often given to people looking for an open project to start or contribute to, and the lack of a community of people with whom to learn & share ideas and practice is something that itches for me very much. The “motivation” section in my original draft project brief for this work said: Cultural heritage work, like all knowledge work, is increasingly data-based, or at least gives opportunities to make use of data day-to-day. The proper skills to use this data enable more effective working. Knowledge and experience thus gained improves understanding of and empathy with users also using such skills. But of course, I have my own reasons for wanting to do this too. In particular, I want to: Advocate for the value of ethical, sustainable data science across a wide range of roles within the British Library and the wider sector Advance the sector to make the best use of data and digital sources in the most ethical and sustainable way possible Understand how and why people use data from the British Library, and plan/deliver better services to support that Keep up to date with relevant developments in data science Learn from others' skills and experiences, and share my own in turn Those initial goals imply some further supporting goals: Build up the confidence of colleagues who might benefit from data science skills but don’t feel they are “technical” or “computer literate” enough Further to that, build up a base of colleagues with the confidence to share their skills & knowledge with others, whether through teaching, giving talks, writing or other channels Identify common awareness gaps (skills/knowledge that people don’t know they’re missing) and address them Develop a communal space (primarily online) in which people feel safe to ask questions Develop a body of professional practice and help colleagues to learn and contribute to the evolution of this, including practices of data ethics, software engineering, statistics, high performance computing, … Break down language barriers between data scientists and others I’ll expand on this separately as my planning develops, but here are a few specific activities that I’d like to be able to do to support this: Organise less-formal learning and sharing events to complement the more formal training already available within organisations and the wider sector, including “show and tell” sessions, panel discussions, code cafés, masterclasses, guest speakers, reading/study groups, co-working sessions, … Organise training to cover intermediate skills and knowledge currently missing from the available options, including the awareness gaps and professional practice mentioned above Collect together links to other relevant resources to support self-led learning Decisions to be made There are all sorts of open questions in my head about this right now, but here are some of the key ones. Is it GLAM or Cultural Heritage? When I first started planning this whole thing, I went with “Cultural Heritage”, since I was pretty transparently targeting my own organisation. The British Library is fairly unequivocally a CH organisation. 
But as I’ve gone along I’ve found myself gravitating more towards the term “GLAM” (which stands for Galleries, Libraries, Archives, Museums) as it covers a similar range of work but is clearer (when you spell out the acronym) about what kinds of work are included. What skills are relevant? This turns out to be surprisingly important, at least in terms of how the community is described, as they define the boundaries of the community and can be the difference between someone feeling welcome or excluded. For example, I think that some introductory statistics training would be immensely valuable for anyone working with data to understand what options are open to them and what limitations those options have, but is the word “statistics” offputting per se to those who’ve chosen a career in arts & humanities? I don’t know because I don’t have that background and perspective. Keep it internal to the BL, or open up early on? I originally planned to focus primarily on my own organisation to start with, feeling that it would be easier to organise events and build a network within a single organisation. However, the pandemic has changed my thinking significantly. Firstly, it’s now impossible to organise in-person events and that will continue for quite some time to come, so there is less need to focus on the logistics of getting people into the same room. Secondly, people within the sector are much more used to attending remote events, which can easily be opened up to multiple organisations in many countries, timezones allowing. It now makes more sense to focus primarily on online activities, which opens up the possibility of building a critical mass of active participants much more quickly by opening up to the wider sector. Conclusion This is the type of post that I could let run and run without ever actually publishing, but since it’s something I need feedback and opinions on from other people, I’d better ship it! I really want to know what you think about this, whether you feel it’s relevant to you and what would make it useful. Comments are open below, or you can contact me via Mastodon or Twitter. Writing About Not Writing Under Construction Grunge Sign by Nicolas Raymond — CC BY 2.0 Every year, around this time of year, I start doing two things. First, I start thinking I could really start to understand monads and write more than toy programs in Haskell. This is unlikely to ever actually happen unless and until I get a day job where I can justify writing useful programs in Haskell, but Advent of Code always gets me thinking otherwise. Second, I start mentally writing this same post. You know, the one about how the blogger in question hasn’t had much time to write but will be back soon? “Sorry I haven’t written much lately…” It’s about as cliché as a Geocities site with a permanent “Under construction” GIF. At some point, not long after the dawn of ~time~ the internet, most people realised that every website was permanently under construction and publishing something not ready to be published was just pointless. So I figured this year I’d actually finish writing it and publish it. After all, what’s the worst that could happen? If we’re getting all reflective about this, I could probably suggest some reasons why I’m not writing much: For a start, there’s a lot going on in both my world and The World right now, which doesn’t leave a lot of spare energy after getting up, eating, housework, working and a few other necessary activities. 
As a result, I’m easily distracted and I tend to let myself get dragged off in other directions before I even get to writing much of anything. If I do manage to focus on this blog in general, I’ll often end up working on some minor tweak to the theme or functionality. I mean, right now I’m wondering if I can do something clever in my text-editor (Emacs, since you’re asking) to streamline my writing & editing process so it’s more elegant, efficient, ergonomic and slightly closer to perfect in every way. It also makes me much more likely to self-censor, and to indulge my perfectionist tendencies to try and tweak the writing until it’s absolutely perfect, which of course never happens. I’ve got a whole heap of partly-written posts that are juuuust waiting for the right motivation for me to just finish them off. The only real solution is to accept that: I’m not going to write much and that’s probably OK What I do write won’t always be the work of carefully-researched, finely crafted genius that I want it to be, and that’s probably OK too Also to remember why I started writing and publishing stuff in the first place: to reflect and get my thoughts out onto a (virtual) page so that I can see them, figure out whether I agree with myself and learn; and to stimulate discussion and get other views on my (possibly uninformed, incorrect or half-formed) thoughts, also to learn. In other words, a thing I do for me. It’s easy to forget that and worry too much about whether anyone else wants to read my s—t. Will you notice any changes? Maybe? Maybe not? Who knows. But it’s a new year and that’s as good a time for a change as any. When is a persistent identifier not persistent? Or an identifier? I wrote a post on the problems with ISBNs as persistent identifiers (PIDS) for work, so check it out if that sounds interesting. IDCC20 reflections I’m just back from IDCC20, so here are a few reflections on this year’s conference. You can find all the available slides and links to shared notes on the conference programme. There’s also a list of all the posters and an overview of the Unconference Skills for curation of diverse datasets Here in the UK and elsewhere, you’re unlikely to find many institutions claiming to apply a deep level of curation to every dataset/software package/etc deposited with them. There are so many different kinds of data and so few people in any one institution doing “curation” that it’s impossible to do this for everything. Absent the knowledge and skills required to fully evaluate an object the best that can be done is usually to make a sense check on the metadata and flag up with the depositor potential for high-level issues such as accidental disclosure of sensitive personal information. The Data Curation Network in the United States is aiming to address this issue by pooling expertise across multiple organisations. The pilot has been highly successful and they’re now looking to obtain funding to continue this work. The Swedish National Data Service is experimenting with a similar model, also with a lot of success. As well as sharing individual expertise, the DCN collaboration has also produced some excellent online quick-reference guides for curating common types of data. We had some further discussion as part of the Unconference on the final day about what it would look like to introduce this model in the UK. There was general agreement that this was a good idea and a way to make optimal use of sparse resources. 
There were also very valid concerns that it would be difficult in the current financial climate for anyone to justify doing work for another organisation, apparently for free. In my mind there are two ways around this, which are not mutually exclusive by any stretch of the imagination. First is to Just Do It: form an informal network of curators around something simple like a mailing list, and give it a try. Second is for one or more trusted organisations to provide some coordination and structure. There are several candidates for this including DCC, Jisc, DPC and the British Library; we all have complementary strengths in this area so it’s my hope that we’ll be able to collaborate around it. In the meantime, I hope the discussion continues. Artificial intelligence, machine learning et al As you might expect at any tech-oriented conference there was a strong theme of AI running through many presentations, starting from the very first keynote from Francine Berman. Her talk, The Internet of Things: Utopia or Dystopia? used self-driving cars as a case study to unpack some of the ethical and privacy implications of AI. For example, driverless cars can potentially increase efficiency, both through route-planning and driving technique, but also by allowing fewer vehicles to be shared by more people. However, a shared vehicle is not a private space in the way your own car is: anything you say or do while in that space is potentially open to surveillance. Aside from this, there are some interesting ideas being discussed, particularly around the possibility of using machine learning to automate increasingly complex actions and workflows such as data curation and metadata enhancement. I didn’t get the impression anyone is doing this in the real world yet, but I’ve previously seen theoretical concepts discussed at IDCC make it into practice so watch this space! Playing games! Training is always a major IDCC theme, and this year two of the most popular conference submissions described games used to help teach digital curation concepts and skills. Mary Donaldson and Matt Mahon of the University of Glasgow presented their use of Lego to teach the concept of sufficient metadata. Participants build simple models before documenting the process and breaking them down again. Then everyone had to use someone else’s documentation to try and recreate the models, learning important lessons about assumptions and including sufficient detail. Kirsty Merrett and Zosia Beckles from the University of Bristol brought along their card game “Researchers, Impact and Publications (RIP)”, based on the popular “Cards Against Humanity”. RIP encourages players to examine some of the reasons for and against data sharing with plenty of humour thrown in. Both games were trialled by many of the attendees during Thursday’s Unconference. Summary I realised in Dublin that it’s 8 years since I attended my first IDCC, held at the University of Bristol in December 2011 while I was still working at the nearby University of Bath. While I haven’t been every year, I’ve been to every one held in Europe since then and it’s interesting to see what has and hasn’t changed. We’re no longer discussing data management plans, data scientists or various other things as abstract concepts that we’d like to encourage, but dealing with the real-world consequences of them. The conference has also grown over the years: this year was the biggest yet, boasting over 300 attendees. 
There has been especially big growth in attendees from North America, Australasia, Africa and the Middle East. That’s great for the diversity of the conference as it brings in more voices and viewpoints than ever. With more people around to interact with I have to work harder to manage my energy levels but I think that’s a small price to pay. Iosevka: a nice fixed-width-font Iosevka is a nice, slender monospace font with a lot of configurable variations. Check it out: https://typeof.net/Iosevka/ Replacing comments with webmentions Just a quickie to say that I’ve replaced the comment section at the bottom of each post with webmentions, which allows you to comment by posting on your own site and linking here. It’s a fundamental part of the IndieWeb, which I’m slowly getting to grips with having been a halfway member of it for years by virtue of having my own site on my own domain. I’d already got rid of Google Analytics to stop forcing that tracking on my visitors, I wanted to get rid of Disqus too because I’m pretty sure the only way that is free for me is if they’re selling my data and yours to third parties. Webmention is a nice alternative because it relies only on open standards, has no tracking and allows people to control their own comments. While I’m currently using a third-party service to help, I can switch to self-hosted at any point in the future, completely transparently. Thanks to webmention.io, which handles incoming webmentions for me, and webmention.js, which displays them on the site, I can keep it all static and not have to implement any of this myself, which is nice. It’s a bit harder to comment because you have to be able to host your own content somewhere, but then almost no-one ever commented anyway, so it’s not like I’ll lose anything! Plus, if I get Bridgy set up right, you should be able to comment just by replying on Mastodon, Twitter or a few other places. A spot of web searching shows that I’m not the first to make the Disqus -> webmentions switch (yes, I’m putting these links in blatantly to test outgoing webmentions with Telegraph…): So long Disqus, hello webmention — Nicholas Hoizey Bye Disqus, hello Webmention! — Evert Pot Implementing Webmention on a static site — Deluvi Let’s see how this goes! Bridging Carpentries Slack channels to Matrix It looks like I’ve accidentally taken charge of bridging a bunch of The Carpentries Slack channels over to Matrix. Given this, it seems like a good idea to explain what that sentence means and reflect a little on my reasoning. I’m more than happy to discuss the pros and cons of this approach If you just want to try chatting in Matrix, jump to the getting started section What are Slack and Matrix? Slack (see also on Wikipedia), for those not familiar with it, is an online text chat platform with the feel of IRC (Internet Relay Chat), a modern look and feel and both web and smartphone interfaces. By providing a free tier that meets many peoples' needs on its own Slack has become the communication platform of choice for thousands of online communities, private projects and more. One of the major disadvantages of using Slack’s free tier, as many community organisations do, is that as an incentive to upgrade to a paid service your chat history is limited to the most recent 10,000 messages across all channels. For a busy community like The Carpentries, this means that messages older than about 6-7 weeks are already inaccessible, rendering some of the quieter channels apparently empty. 
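As a rough sanity check on that six-to-seven-week figure (my own back-of-the-envelope arithmetic, not a number from the Carpentries), a 10,000-message window disappearing that quickly implies an overall traffic level of a couple of hundred messages a day across all channels:

```python
# Back-of-the-envelope check: how fast does a 10,000-message window scroll away?
FREE_TIER_VISIBLE = 10_000   # messages visible on Slack's free tier
WEEKS_UNTIL_HIDDEN = 6.5     # rough figure quoted above

messages_per_day = FREE_TIER_VISIBLE / (WEEKS_UNTIL_HIDDEN * 7)
print(f"About {messages_per_day:.0f} messages/day across all channels")
# About 220 messages/day, which is very plausible for a busy community workspace.
```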
As Slack is at pains to point out, that history isn’t gone, just archived and hidden from view unless you pay the low, low price of $1/user/month. That doesn’t seem too pricy, unless you’re a non-profit organisation with a lot of projects you want to fund and an active membership of several hundred worldwide, at which point it soon adds up. Slack does offer to waive the cost for registered non-profit organisations, but only for one community. The Carpentries is not an independent organisation, but one fiscally sponsored by Community Initiatives, which has already used its free quota of one elsewhere, rendering the Carpentries ineligible. Other umbrella organisations such as NumFocus (and, I expect, Mozilla) also run into this problem with Slack. So, we have a community which is slowly and inexorably losing its own history behind a paywall. For some people this is simply annoying, but from my perspective as a facilitator of the preservation of digital things the community is haemorrhaging an important record of its early history. Enter Matrix. Matrix is a chat platform similar to IRC, Slack or Discord. It’s divided into separate channels, and users can join one or more of these to take part in the conversation happening in those channels. What sets it apart from older technology like IRC and walled gardens like Slack & Discord is that it’s federated. Federation means simply that users on any server can communicate with users and channels on any other server. Usernames and channel addresses specify both the individual identifier and the server it calls home, just as your email address contains all the information needed for my email server to route messages to it. While users are currently tied to their home server, channels can be mirrored and synchronised across multiple servers, making the overall system much more resilient. Can’t connect to your favourite channel on server X? No problem: just connect via its alias on server Y and when X comes back online it will be resynchronised. The technology used is much more modern and secure than the ageing IRC protocol, and there’s no vendor lock-in like there is with closed platforms like Slack and Discord. On top of that, Matrix channels can easily be “bridged” to channels/rooms on other platforms, including, yes, Slack, so that you can join on Matrix and transparently talk to people connected to the bridged room, or vice versa. So, to summarise: The current Carpentries Slack channels could be bridged to Matrix at no cost and with no disruption to existing users The history of those channels from that point on would be retained on matrix.org and accessible even when it’s no longer available on Slack If at some point in the future The Carpentries chose to invest in its own Matrix server, it could adopt and become the main Matrix home of these channels without disruption to users of either Matrix or (if it’s still in use at that point) Slack Matrix is an open protocol, with a reference server implementation and a wide range of clients all available as free software, which aligns with the values of the Carpentries community On top of this: I’m fed up of having so many different Slack teams to switch between to see the channels in all of them, and prefer having all the channels I regularly visit in a single unified interface; I wanted to see how easy this would be and whether others would also be interested.
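To make the user and room addressing just described a little more concrete, here is a small illustrative sketch. It is my own toy code rather than part of any Matrix library, and the example identifiers are just examples (though @petrichor:matrix.org is the address given later in this post); it simply splits an identifier into its sigil, local part and home server, mirroring the email analogy above.

```python
# Toy illustration of Matrix addressing: @user:server for people, #alias:server
# for room aliases, !id:server for raw room IDs. Not a real Matrix client.
def parse_matrix_id(matrix_id: str):
    """Return (kind, localpart, homeserver) for e.g. '@petrichor:matrix.org'."""
    kinds = {'@': 'user', '#': 'room alias', '!': 'room id'}
    sigil, rest = matrix_id[0], matrix_id[1:]
    if sigil not in kinds or ':' not in rest:
        raise ValueError(f"Not a Matrix identifier: {matrix_id!r}")
    localpart, homeserver = rest.split(':', 1)
    return kinds[sigil], localpart, homeserver

print(parse_matrix_id('@petrichor:matrix.org'))  # ('user', 'petrichor', 'matrix.org')
print(parse_matrix_id('#some-room:matrix.org'))  # ('room alias', 'some-room', 'matrix.org')
```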
Given all this, I thought I’d go ahead and give it a try to see if it made things more manageable for me and to see what the reaction would be from the community. How can I get started? !!! reminder Please remember that, like any other Carpentries space, the Code of Conduct applies in all of these channels. First, sign up for a Matrix account. The quickest way to do this is on the Matrix “Try now” page, which will take you to the Riot Web client which for many is synonymous with Matrix. Other clients are also available for the adventurous. Second, join one of the channels. The links below will take you to a page that will let you connect via your preferred client. You’ll need to log in as they are set not to allow guest access, but, unlike Slack, you won’t need an invitation to be able to join. #general — the main open channel to discuss all things Carpentries #random — anything that would be considered offtopic elsewhere #welcome — join in and introduce yourself! That’s all there is to getting started with Matrix. To find all the bridged channels there’s a Matrix “community” that I’ve added them all to: Carpentries Matrix community. There’s a lot more, including how to bridge your favourite channels from Slack to Matrix, but this is all I’ve got time and space for here! If you want to know more, leave a comment below, or send me a message on Slack (jezcope) or maybe Matrix (@petrichor:matrix.org)! I’ve also made a separate channel for Matrix-Slack discussions: #matrix on Slack and Carpentries Matrix Discussion on Matrix MozFest19 first reflections Discussions of neurodiversity at #mozfest Photo by Jennifer Riggins The other weekend I had my first experience of Mozilla Festival, aka #mozfest. It was pretty awesome. I met quite a few people in real life that I’ve previously only known (/stalked) on Twitter, and caught up with others that I haven’t seen for a while. I had the honour of co-facilitating a workshop session on imposter syndrome and how to deal with it with the wonderful Yo Yehudi and Emmy Tsang. We all learned a lot and hope our participants did too; we’ll be putting together a summary blog post as soon as we can get our act together! I also attended a great session, led by Kiran Oliver (psst, they’re looking for a new challenge), on how to encourage and support a neurodiverse workforce. I was only there for the one day, and I really wish that I’d taken the plunge and committed to the whole weekend. There’s always next year though! To be honest, I’m just disappointed that I never had the courage to go sooner, Music for working Today1 the office conversation turned to blocking out background noise. (No, the irony is not lost on me.) Like many people I work in a large, open-plan office, and I’m not alone amongst my colleagues in sometimes needing to find a way to boost concentration by blocking out distractions. Not everyone is like this, but I find music does the trick for me. I also find that different types of music are better for different types of work, and I use this to try and manage my energy better. There are more distractions than auditory noise, and at times I really struggle with visual noise. Rather than have this post turn into a rant about the evils of open-plan offices, I’ll just mention that the scientific evidence doesn’t paint them in a good light2, or at least suggests that the benefits are more limited in scope than is commonly thought3, and move on to what I actually wanted to share: good music for working to. 
There are a number of genres that I find useful for working. Generally, these have in common a consistent tempo, a lack of lyrics, and enough variation to prevent boredom without distracting. Familiarity helps my concentration too so I’ll often listen to a restricted set of albums for a while, gradually moving on by dropping one out and bringing in another. In my case this includes: Traditional dance music, generally from northern and western European traditions for me. This music has to be rhythmically consistent to allow social dancing, and while the melodies are typically simple repeated phrases, skilled musicians improvise around that to make something beautiful. I tend to go through phases of listening to particular traditions; I’m currently listening to a lot of French, Belgian and Scandinavian. Computer game soundtracks, which are specifically designed to enhance gameplay without distracting, making them perfect for other activities requiring a similar level of concentration. Chiptunes and other music incorporating it; partly overlapping with the previous category, chiptunes is music made by hacking the audio chips from (usually) old computers and games machines to become an instrument for new music. Because of the nature of the instrument, this will have millisecond-perfect rhythm and again makes for undistracting noise blocking with an extra helping of nostalgia! Purists would disagree with me, but I like artists that combine chiptunes with other instruments and effects to make something more complete-sounding. Retrowave/synthwave/outrun, synth-driven music that’s instantly familiar as the soundtrack to many 90s sci-fi and thriller movies. Atmospheric, almost dreamy, but rhythmic with a driving beat, it’s another genre that fits into the “pleasing but not too surprising” category for me. So where to find this stuff? One of the best resources I’ve found is Music for Programming which provides carefully curated playlists of mostly electronic music designed to energise without distracting. They’re so well done that the tracks move seamlessly, one to the next, without ever getting boring. Spotify is an obvious option, and I do use it quite a lot. However, I’ve started trying to find ways to support artists more directly, and Bandcamp seems to be a good way of doing that. It’s really easy to browse by genre, or discover artists similar to what you’re currently hearing. You can listen for free as long as you don’t mind occasional nags to buy the music you’re hearing, but you can also buy tracks or albums. Music you’ve paid for is downloadable in several open, DRM-free formats for you to keep, and you know that a decent chunk of that cash is going directly to that artist. I also love noise generators; not exactly music, but a variety of pleasant background noises, some of which nicely obscure typical office noise. I particularly like mynoise.net, which has a cornucopia of different natural and synthetic noises. Each generator comes with a range of sliders allowing you to tweak the composition and frequency range, and will even animate them randomly for you to create a gently shifting soundscape. A much simpler, but still great, option is Noisli with it’s nice clean interface. Both offer apps for iOS and Android. For bonus points, you can always try combining one or more of the above. Adding in a noise generator allows me to listen to quieter music while still getting good environmental isolation when I need concentration. 
Another favourite combo is to open both the cafe and rainfall generators from myNoise, made easier by the ability to pop out a mini-player then open up a second generator. I must be missing stuff though. What other musical genres should I try? What background sounds are nice to work to? Well, you know. The other day. Whatever. ↩︎ See e.g.: Lee, So Young, and Jay L. Brand. ‘Effects of Control over Office Workspace on Perceptions of the Work Environment and Work Outcomes’. Journal of Environmental Psychology 25, no. 3 (1 September 2005): 323–33. https://doi.org/10.1016/j.jenvp.2005.08.001. ↩︎ Open plan offices can actually work under certain conditions, The Conversation ↩︎ Working at the British Library: 6 months in It barely seems like it, but I’ve been at the British Library now for nearly 6 months. It always takes a long time to adjust, and from experience I know it’ll be another year before I feel fully settled, but my team, department and other colleagues have really made me feel welcome and like I belong. One thing that hasn’t got old yet is the occasional thrill of remembering that I work at my national library now. Every now and then I’ll catch a glimpse of the collections at Boston Spa or step into one of the reading rooms and think “wow, I actually work here!” I also like having a national and international role to play, which means I get to travel a bit more than I used to. Budgets are still tight so there are limits, and I still prefer to be home more often than not, but there is more scope in this job than I’ve had previously for travelling to conferences, giving talks that change the way people think, and learning in different contexts. I’m learning a lot too, especially how to work with and manage people split across multiple sites, and the care and feeding of budgets. As well as missing my old team at Sheffield, I do also miss some of the direct contact I had with researchers in HE. I especially miss the teaching work, but also the higher-level influencing of more senior academics to change practices on a wider scale. Still, I get to use those influencing skills in different ways now, and I’m still involved with the Carpentries, which should let me keep my hand in with teaching. I still deal with my general tendency to try and do All The Things, and as before I’m slowly learning to recognise it, tame it and very occasionally turn it to my advantage. That also leads to feelings of imposterism that are only magnified by the knowledge that I now work at a national institution! It’s a constant struggle some days to believe that I’ve actually earned my place here through hard work. Even if I don’t always feel that I have, my colleagues here certainly have, so I should have more faith in their opinion of me. Finally, I couldn’t write this type of thing without mentioning the commute. I’ve gone from 90 minutes each way on a good day (up to twice that if the trains were disrupted) to 35 minutes each way along fairly open roads. I have less time to read, but much more time at home. On top of that, the library has implemented flexitime across all pay grades, with even senior managers strongly encouraged to make full use of it. Not only is this an important enabler of equality across the organisation, it also relieves, for me personally, the pressure to work over my contracted hours and the guilt I’ve always felt at leaving work even 10 minutes early. If I work late, it’s now a choice I’m making based on business needs instead of guilt, and in full knowledge that I’ll get that time back later.
So that’s where I am right now. I’m really enjoying the work and the culture, and I look forward to what the next 6 months will bring! RDA Plenary 13 reflection Photo by me I sit here writing this in the departure lounge at Philadelphia International Airport, waiting for my Aer Lingus flight back after a week at the 13th Research Data Alliance (RDA) Plenary (although I’m actually publishing this a week or so later at home). I’m pretty exhausted, partly because of the jet lag, and partly because it’s been a very full week with so much to take in. It’s my first time at an RDA Plenary, and it was quite a new experience for me! First off, it’s my first time outside Europe, and thus my first time crossing quite so many timezones. I’ve been waking at 5am and ready to drop by 8pm, but I’ve struggled on through! Secondly, it’s the biggest conference I’ve been to for a long time, both in number of attendees and number of parallel sessions. There’s been a lot of sustained input so I’ve been very glad to have a room in the conference hotel and be able to escape for a few minutes when I needed to recharge. Thirdly, it’s not really like any other conference I’ve been to: rather than having large numbers of presentations submitted by attendees, each session comprises lots of parallel meetings of RDA interest groups and working groups. It’s more community-oriented: an opportunity for groups to get together face to face and make plans or show off results. I found it pretty intense and struggled to take it all in, but incredibly valuable nonetheless. Lots of information to process (I took a lot of notes) and a few contacts to follow up on too, so overall I loved it! Using Pipfile in Binder Photo by Sear Greyson on Unsplash I recently attended a workshop, organised by the excellent team of the Turing Way project, on a tool called BinderHub. BinderHub, along with public hosting platform MyBinder, allows you to publish computational notebooks online as “binders” such that they’re not static but fully interactive. It’s able to do this by using a tool called repo2docker to capture the full computational environment and dependencies required to run the notebook. !!! aside “What is the Turing Way?” The Turing Way is, in its own words, “a lightly opinionated guide to reproducible data science.” The team is building an open textbook and running a number of workshops for scientists and research software engineers, and you should check out the project on Github. You could even contribute! The Binder process goes roughly like this: Do some work in a Jupyter Notebook or similar Put it into a public git repository Add some extra metadata describing the packages and versions your code relies on Go to mybinder.org and tell it where to find your repository Open the URL it generates for you Profit Other than step 5, which can take some time to build the binder, this is a remarkably quick process. It supports a number of different languages too, including built-in support for R, Python and Julia and the ability to configure pretty much any other language that will run on Linux. However, the Python support currently requires you to have either a requirements.txt or Conda-style environment.yml file to specify dependencies, and I commonly use a Pipfile for this instead. Pipfile allows you to specify a loose range of compatible versions for maximal convenience, but then locks in specific versions for maximal reproducibility. 
You can upgrade packages any time you want, but you’re fully in control of when that happens, and the locked versions are checked into version control so that everyone working on a project gets consistency. Since Pipfile is emerging as something of a standard, I thought I’d see if I could use that in a binder, and it turns out to be remarkably simple. The reference implementation of Pipfile is a tool called pipenv by the prolific Kenneth Reitz. All you need to use this in your binder is two files of one line each. requirements.txt tells repo2docker to build a Python-based binder, and contains a single line to install the pipenv package: pipenv Then postBuild is used by repo2docker to install all other dependencies using pipenv: pipenv install --system The --system flag tells pipenv to install packages globally (its default behaviour is to create a Python virtualenv). With these two files, the binder builds and runs as expected. You can see a complete example that I put together during the workshop here on Gitlab. What do you think I should write about? I’ve found it increasingly difficult to make time to blog, and it’s not so much not having the time — I’m pretty privileged in that regard — but finding the motivation. Thinking about what used to motivate me, one of the big things was writing things that other people wanted to read. Rather than try to guess, I thought I’d ask! Those who know what I'm about, what would you read about, if it was written by me? I'm trying to break through the blog-writers block and would love to know what other people would like to see my ill-considered opinions on.— Jez Cope (@jezcope) March 7, 2019 I’m still looking for ideas, so please tweet me or leave me a comment below. Below are a few thoughts that I’m planning to do something with. Something taking one of the more techy aspects of Open Research, breaking it down and explaining the benefits for non-techy folks?— Dr Beth 🏳️‍🌈 🐺 (@PhdGeek) March 7, 2019 Skills (both techy and non techy) that people need to most effectively support RDM— Kate O'Neill (@KateFONeill) March 7, 2019 Sometimes I forget that my background makes me well-qualified to take some of these technical aspects of the job and break them down for different audiences. There might be a whole series in this… Carrying on our conversation last week I'd love to hear more about how you've found moving from an HE lib to a national library and how you see the BL's role in RDM. Appreciate this might be a bit niche/me looking for more interesting things to cite :)— Rosie Higman (@RosieHLib) March 7, 2019 This is interesting, and something I’d like to reflect on; moving from one job to another always has lessons and it’s easy to miss them if you’re not paying attention. Another one for the pile. Life without admin rights to your computer— Mike Croucher (@walkingrandomly) March 7, 2019 This is so frustrating as an end user, but at the same time I get that endpoint security is difficult and there are massive risks associated with letting end users have admin rights. This is particularly important at the BL: as custodians of a nation’s cultural heritage, the risk for us is bigger than for many, and for this reason we are now Cyber Essentials Plus certified. At some point I’d like to do some research and have a conversation with someone who knows a lot more about InfoSec to work out what the proper approach to this is, maybe involving VMs and a demilitarized zone on the network.
I’m always looking for more inspiration, so please leave a comment if you’ve got anything you’d like to read my thoughts on. If you’re not familiar with my writing, please take a minute or two to explore the blog; the tags page is probably a good place to get an overview. Ultimate Hacking Keyboard: first thoughts Following on from the excitement of having built a functioning keyboard myself, I got a parcel on Monday. Inside was something that I’ve been waiting for since September: an Ultimate Hacking Keyboard! Where the custom-built Laplace is small and quiet for travelling, the UHK is to be my main workhorse in the study at home. Here are my first impressions: Key switches I went with Kailh blue switches from the available options. In stark contrast to the quiet blacks on the Laplace, blues are NOISY! They have an extra piece of plastic inside the switch that causes an audible and tactile click when the switch activates. This makes them very satisfying to type on and should help as I train my fingers not to bottom out while typing, but does make them unsuitable for use in a shared office! Here are some animations showing how the main types of key switch vary. Layout This keyboard has what’s known as a 60% layout: no number pad, arrows or function keys. As with the more spartan Laplace, these “missing” keys are made up for with programmable layers. For example, the arrow keys are on the Mod layer on the I/J/K/L keys, so I can access them without moving from the home row. I actually find this preferable to having to move my hand to the right to reach them, and I really never used the number pad in any case. Split This is a split keyboard, which means that the left and right halves can be separated to place the hands further apart which eases strain across the shoulders. The UHK has a neat coiled cable joining the two which doesn’t get in the way. A cool design feature is that the two halves can be slotted back together and function perfectly well as a non-split keyboard too, held together by magnets. There are even electrical contacts so that when the two are joined you don’t need the linking cable. Programming The board is fully programmable, and this is achieved via a custom (open source) GUI tool which talks to the (open source) firmware on the board. You can have multiple keymaps, each of which has a separate Base, Mod, Fn and Mouse layer, and there’s an LED display that shows a short mnemonic for the currently active map. I already have a customised Dvorak layout for day-to-day use, plus a standard QWERTY for not-me to use and an alternative QWERTY which will be slowly tweaked for games that don’t work well with Dvorak. Mouse keys One cool feature that the designers have included in the firmware is the ability to emulate a mouse. There’s a separate layer that allows me to move the cursor, scroll and click without moving my hands from the keyboard. Palm rests Not much to say about the palm rests, other than they are solid wood, and chunky, and really add a little something. I have to say, I really like it so far! Overall it feels really well designed, with every little detail carefully thought out and excellent build quality and a really solid feeling. Custom-built keyboard I’m typing this post on a keyboard I made myself, and I’m rather excited about it! Why make my own keyboard? 
I wanted to learn a little bit about practical electronics, and I like to learn by doing I wanted to have the feeling of making something useful with my own hands I actually need a small, keyboard with good-quality switches now that I travel a fair bit for work and this lets me completely customise it to my needs Just because! While it is possible to make a keyboard completely from scratch, it makes much more sense to put together some premade parts. The parts you need are: PCB (printed circuit board): the backbone of the keyboard, to which all the other electrical components attach, this defines the possible physical locations for each key Switches: one for each key to complete a circuit whenever you press it Keycaps: switches are pretty ugly and pretty uncomfortable to press, so each one gets a cap; these are what you probably think of as the “keys” on your keyboard and come in almost limitless variety of designs (within the obvious size limitation) and are the easiest bit of personalisation Controller: the clever bit, which detects open and closed switches on the PCB and tells your computer what keys you pressed via a USB cable Firmware: the program that runs on the controller starts off as source code like any other program, and altering this can make the keyboard behave in loads of different ways, from different layouts to multiple layers accessed by holding a particular key, to macros and even emulating a mouse! In my case, I’ve gone for the following: PCB Laplace from keeb.io, a very compact 47-key (“40%") board, with no number pad, function keys or number row, but a lot of flexibility for key placement on the bottom row. One of my key design goals was small size so I can just pop it in my bag and have on my lap on the train. Controller Elite-C, designed specifically for keyboard builds to be physically compatible with the cheaper Pro Micro, with a more-robust USB port (the Pro Micro’s has a tendency to snap off), and made easier to program with a built-in reset button and better bootloader. Switches Gateron Black: Gateron is one of a number of manufacturers of mechanical switches compatible with the popular Cherry range. The black switch is linear (no click or bump at the activation point) and slightly heavier sprung than the more common red. Cherry also make a black switch but the Gateron version is slightly lighter and having tested a few I found them smoother too. My key goal here was to reduce noise, as the stronger spring will help me type accurately without hitting the bottom of the keystroke with an audible sound. Keycaps Blank grey PBT in DSA profile: this keyboard layout has a lot of non-standard sized keys, so blank keycaps meant that I wouldn’t be putting lots of keys out of their usual position; they’re also relatively cheap, fairly classy IMHO and a good placeholder until I end up getting some really cool caps on a group buy or something; oh, and it minimises the chance of someone else trying the keyboard and getting freaked out by the layout… Firmware QMK (Quantum Mechanical Keyboard), with a work-in-progress layout, based on Dvorak. QMK has a lot of features and allows you to fully program each and every key, with multiple layers accessed through several different routes. Because there are so few keys on this board, I’ll need to make good use of layers to make all the keys on a usual keyboard available. 
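Real QMK keymaps are written in C, but to illustrate the idea of layers mentioned above, here is a deliberately simplified toy model in Python. It is my own illustration rather than QMK's data structures, and the exact arrow placement on I/J/K/L is my guess based on the Mod-layer arrangement described for the UHK earlier:

```python
# Toy model of keyboard layers: each layer maps physical keys to actions, and a
# held modifier selects which layer is consulted, falling back to the base
# layer. Purely illustrative; real QMK keymaps are C arrays.
BASE = {'I': 'i', 'J': 'j', 'K': 'k', 'L': 'l'}
MOD = {'I': 'Up', 'J': 'Left', 'K': 'Down', 'L': 'Right'}

def resolve(key: str, mod_held: bool = False) -> str:
    """Return the action for a keypress given the currently active layer."""
    layer = MOD if mod_held else BASE
    return layer.get(key, BASE.get(key, key))

assert resolve('J') == 'j'
assert resolve('J', mod_held=True) == 'Left'
```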
Dvorak Simplified Keyboard I’m grateful to the folks of the Leeds Hack Space, especially Nav & Mark who patiently coached me in various soldering techniques and good practice, but also everyone else who were so friendly and welcoming and interested in my project. I’m really pleased with the result, which is small, light and fully customisable. Playing with QMK firmware features will keep me occupied for quite a while! This isn’t the end though, as I’ll need a case to keep the dust out. I’m hoping to be able to 3D print this or mill it from wood with a CNC mill, for which I’ll need to head back to the Hack Space! Less, but better “Wenniger aber besser” — Dieter Rams {:.big-quote} I can barely believe it’s a full year since I published my intentions for 2018. A lot has happened since then. Principally: in November I started a new job as Data Services Lead at The British Library. One thing that hasn’t changed is my tendency to try to do too much, so this year I’m going to try and focus on a single intention, a translation of designer Dieter Rams' famous quote above: Less, but better. This chimes with a couple of other things I was toying with over the Christmas break, as they’re essentially other ways of saying the same thing: Take it steady One thing at a time I’m also going to keep in mind those touchstones from last year: What difference is this making? Am I looking after myself? Do I have evidence for this? I mainly forget to think about them, so I’ll be sticking up post-its everywhere to help me remember! How to extend Python with Rust: part 1 Python is great, but I find it useful to have an alternative language under my belt for occasions when no amount of Pythonic cleverness will make some bit of code run fast enough. One of my main reasons for wanting to learn Rust was to have something better than C for that. Not only does Rust have all sorts of advantages that make it a good choice for code that needs to run fast and correctly, it’s also got a couple of rather nice crates (libraries) that make interfacing with Python a lot nicer. Here’s a little tutorial to show you how easy it is to call a simple Rust function from Python. If you want to try it yourself, you’ll find the code on GitHub. !!! prerequisites I’m assuming for this tutorial that you’re already familiar with writing Python scripts and importing & using packages, and that you’re comfortable using the command line. You’ll also need to have installed Rust. The Rust bit The quickest way to get compiled code into Python is to use the builtin ctypes package. This is Python’s “Foreign Function Interface” or FFI: a means of calling functions outside the language you’re using to make the call. ctypes allows us to call arbitrary functions in a shared library1, as long as those functions conform to certain standard C language calling conventions. Thankfully, Rust tries hard to make it easy for us to build such a shared library. The first thing to do is to create a new project with cargo, the Rust build tool: $ cargo new rustfrompy Created library `rustfrompy` project $ tree . ├── Cargo.toml └── src └── lib.rs 1 directory, 2 files !!! aside I use the fairly common convention that text set in fixed-width font is either example code or commands to type in. For the latter, a $ precedes the command that you type (omit the $), and lines that don’t start with a $ are output from the previous command. I assume a basic familiarity with Unix-style command line, but I should probably put in some links to resources if you need to learn more! 
We need to edit the Cargo.toml file and add a [lib] section: [package] name = "rustfrompy" version = "0.1.0" authors = ["Jez Cope <j.cope@erambler.co.uk>"] [dependencies] [lib] name = "rustfrompy" crate-type = ["cdylib"] This tells cargo that we want to make a C-compatible dynamic library (crate-type = ["cdylib"]) and what to call it, plus some standard metadata. We can then put our code in src/lib.rs. We’ll just use a simple toy function that adds two numbers together: #[no_mangle] pub fn add(a: i64, b: i64) -> i64 { a + b } Notice the pub keyword, which instructs the compiler to make this function accessible to other modules, and the #[no_mangle] annotation, which tells it to use the standard C naming conventions for functions. If we don’t do this, then Rust will generate a new name for the function for its own nefarious purposes, and as a side effect we won’t know what to call it when we want to use it from Python. Being good developers, let’s also add a test: #[cfg(test)] mod test { use ::*; #[test] fn test_add() { assert_eq!(4, add(2, 2)); } } We can now run cargo test which will compile that code and run the test: $ cargo test Compiling rustfrompy v0.1.0 (file:///home/jez/Personal/Projects/rustfrompy) Finished dev [unoptimized + debuginfo] target(s) in 1.2 secs Running target/debug/deps/rustfrompy-3033caaa9f5f17aa running 1 test test test::test_add ... ok test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out Everything worked! Now just to build that shared library and we can try calling it from Python: $ cargo build Compiling rustfrompy v0.1.0 (file:///home/jez/Personal/Projects/rustfrompy) Finished dev [unoptimized + debuginfo] target(s) in 0.30 secs Notice that the build is unoptimized and includes debugging information: this is useful in development, but once we’re ready to use our code it will run much faster if we compile it with optimisations. Cargo makes this easy: $ cargo build --release Compiling rustfrompy v0.1.0 (file:///home/jez/Personal/Projects/rustfrompy) Finished release [optimized] target(s) in 0.30 secs The Python bit After all that, the Python bit is pretty short. First we import the ctypes package (which is included in all recent Python versions): from ctypes import cdll Cargo has tidied our shared library away into a folder, so we need to tell Python where to load it from. On Linux, it will be called lib<something>.so where the “something” is the crate name from Cargo.toml, “rustfrompy”: lib = cdll.LoadLibrary('target/release/librustfrompy.so') Finally we can call the function anywhere we want. Here it is in a pytest-style test: def test_rust_add(): assert lib.add(27, 15) == 42 If you have pytest installed (and you should!) you can run the whole test like this: $ pytest --verbose test.py ====================================== test session starts ====================================== platform linux -- Python 3.6.4, pytest-3.1.1, py-1.4.33, pluggy-0.4.0 -- /home/jez/.virtualenvs/datasci/bin/python cachedir: .cache rootdir: /home/jez/Personal/Projects/rustfrompy, inifile: collected 1 items test.py::test_rust_add PASSED It worked! I’ve put both the Rust and Python code on github if you want to try it for yourself. Shortcomings Ok, so that was a pretty simple example, and I glossed over a lot of things. For example, what would happen if we did lib.add(2.0, 2)? This causes Python to throw an error because our Rust function only accepts integers (64-bit signed integers, i64, to be precise), and we gave it a floating point number. 
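One way to catch that kind of mistake up front (this is standard ctypes usage rather than anything Rust-specific) is to declare the function's signature on the Python side, so that ctypes checks and converts arguments before the call ever crosses the FFI boundary; the c_int64 types below are chosen to match the i64 parameters in the Rust function:

```python
from ctypes import cdll, c_int64

lib = cdll.LoadLibrary('target/release/librustfrompy.so')

# Declare the signature so ctypes knows how to marshal arguments and the result.
lib.add.argtypes = [c_int64, c_int64]  # matches the i64 arguments in Rust
lib.add.restype = c_int64              # matches the i64 return type

assert lib.add(2, 2) == 4
# lib.add(2.0, 2) is now rejected with a clear ctypes.ArgumentError before the call.
```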
ctypes can’t guess what type(s) a given function will work with, but it can at least tell us when we get it wrong. To fix this properly, we need to do some extra work along the lines sketched above, telling the ctypes library what the argument and return types for each function are. For a more complex library, there will probably be more housekeeping to do, such as translating return codes from functions into more Pythonic-style errors. For a small example like this there isn’t much of a problem, but the bigger your compiled library the more extra boilerplate is required on the Python side just to use all the functions. When you’re working with an existing library you don’t have much choice about this, but if you’re building it from scratch specifically to interface with Python, there’s a better way using the Python C API. You can call this directly in Rust, but there are a couple of Rust crates that make life much easier, and I’ll be taking a look at those in a future blog post. .so on Linux, .dylib on Mac and .dll on Windows ↩︎ New Year's irresolution Photo by Andrew Hughes on Unsplash I’ve chosen not to make any specific resolutions this year; I’ve found that they just don’t work for me. Like many people, all I get is a sense of guilt when I inevitably fail to live up to the expectations I set myself at the start of the year. However, I have set a couple of what I’m referring to as “themes” for the year: touchstones that I’ll aim to refer to when setting priorities or just feeling a bit overwhelmed or lacking in direction. They are: Contribution Self-care Measurement I may do some blog posts expanding on these, but in the meantime, I’ve put together a handful of questions to help me think about priorities and get perspective when I’m doing (or avoiding doing) something. What difference is this making? I feel more motivated when I can figure out how I’m contributing to something bigger than myself. In society? In my organisation? To my friends & family? Am I looking after myself? I focus a lot on the expectations others have (or at least that I think they have) of me, but I can’t do anything well unless I’m generally happy and healthy. Is this making me happier and healthier? Is this building my capacity to look after myself, my family & friends and do my job? Is this worth the amount of time and energy I’m putting in? Do I have evidence for this? I don’t have to base decisions purely on feelings/opinions: I have the skills to obtain, analyse and interpret data. Is this fact or opinion? What are the facts? Am I overthinking this? Can I put a confidence interval on this? Build documents from code and data with Saga !!! tldr “TL;DR” I’ve made Saga, a thing for compiling documents by combining code and data with templates. What is it? Saga is a very simple command-line tool that reads in one or more data files, runs one or more scripts, then passes the results into a template to produce a final output document. It enables you to maintain a clean separation between data, logic and presentation and produce data-based documents that can easily be updated. That allows the flow of data through the document to be easily understood, a cornerstone of reproducible analysis. You run it like this: saga build -d data.yaml -d other_data.yaml \ -s analysis.py -t report.md.tmpl \ -O report.md Any scripts specified with -s will have access to the data in local variables, and any changes to local variables in a script will be retained when everything is passed to the template for rendering.
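If it helps to picture that data -> script -> template flow, here is a minimal standalone sketch of the same idea using PyYAML and Mako directly. This is my own illustration of the concept, not Saga's implementation, and the inline strings stand in for the data.yaml, analysis.py and report.md.tmpl files named above.

```python
# Minimal sketch of the data -> script -> template idea (not Saga's actual code).
import yaml
from mako.template import Template

# 1. Data: would normally live in a file like data.yaml
env = yaml.safe_load("samples: [4, 8, 15, 16, 23, 42]")

# 2. Logic: would normally live in analysis.py; results land in the same namespace
env["total"] = sum(env["samples"])
env["mean"] = env["total"] / len(env["samples"])

# 3. Presentation: would normally live in report.md.tmpl
template = Template("We analysed ${len(samples)} samples; the mean value was ${'%.1f' % mean}.")
print(template.render(**env))
# prints: We analysed 6 samples; the mean value was 18.0.
```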
For debugging, you can also do: saga dump -d data.yaml -d other_data.yaml -s analysis.py which will print out the full environment that would be passed to your template with saga build. Features Right now this is a really early version. It does the job but I have lots of ideas for features to add if I ever have time. At present it does the following: Reads data from one or more YAML files Transforms data with one or more Python scripts Renders a template in Mako format Works with any plain-text output format, including Markdown, LaTeX and HTML Use cases Write reproducible reports & papers based on machine-readable data Separate presentation from content in any document, e.g. your CV (example coming soon) Yours here? Get it! I haven’t released this on PyPI yet, but all the code is available on GitHub to try out. If you have pipenv installed (and if you use Python you should!), you can try it out in an isolated virtual environment by doing: git clone https://github.com/jezcope/sagadoc.git cd sagadoc pipenv install pipenv run saga or you can set up for development and run some tests: pipenv install --dev pipenv run pytest Why? Like a lot of people, I have to produce reports for work, often containing statistics computed from data. Although these generally aren’t academic research papers, I see no reason not to aim for a similar level of reproducibility: after all, if I’m telling other people to do it, I’d better take my own advice! A couple of times now I’ve done this by writing a template that holds the text of the report and placeholders for values, along with a Python script that reads in the data, calculates the statistics I want and completes the template. This is valuable for two main reasons: If anyone wants to know how I processed the data and calculated those statistics, it’s all there: no need to try and remember and reproduce a series of button clicks in Excel; If the data or calculations change, I just need to update the relevant part and run it again, and all the relevant parts of the document will be updated. This is particularly important if changing a single data value requires recalculation of dozens of tables, charts, etc. It also gives me the potential to factor out and reuse bits of code in the future, add tests and version control everything. Now that I’ve done this more than once (and it seems likely I’ll do it again) it makes sense to package that script up in a more portable form so I don’t have to write it over and over again (or, shock horror, copy & paste it!). It saves time, and gives others the possibility to make use of it. Prior art I’m not the first person to think of this, but I couldn’t find anything that did exactly what I needed. Several tools will let you interweave code and prose, including the results of evaluating each code snippet in the document: chief among these are Jupyter and Rmarkdown. There are also tools that let you write code in the order that makes most sense to read and then rearrange it into the right order to execute, so-call literate programming. The original tool for this is the venerable noweb. Sadly there is very little that combine both of these and allow you to insert the results of various calculations at arbitrary points in a document, independent of the order of either presenting or executing the code. The only two that I’m aware of are: Dexy and org-mode. Unfortunately, Dexy currently only works on Legacy Python (/Python 2) and org-mode requires emacs (which is fine but not exactly portable). 
Rmarkdown comes close and supports a range of languages but the full feature set is only available with R. Actually, my ideal solution is org-mode without the emacs dependency, because that’s the most flexible solution; maybe one day I’ll have both the time and skill to implement that. It’s also possible I might be able to figure out Dexy’s internals to add what I want to it, but until then Saga does the job! Future work There are lots of features that I’d still like to add when I have time: Some actual documentation! And examples! More data formats (e.g. CSV, JSON, TOML) More languages (e.g. R, Julia) Fetching remote data over http Caching of intermediate results to speed up rebuilds For now, though, I’d love for you to try it out and let me know what you think! As ever, comment here, tweet me or start an issue on GitHub. Why try Rust for scientific computing? When you’re writing analysis code, Python (or R, or JavaScript, or …) is usually the right choice. These high-level languages are set up to make you as productive as possible, and common tasks like array manipulation have been well optimised. However, sometimes you just can’t get enough speed and need to turn to a lower-level compiled language. Often that will be C, C++ or Fortran, but I thought I’d do a short post on why I think you should consider Rust. One of my goals for 2017’s Advent of Code was to learn a modern, memory-safe, statically-typed language. I now know that there are quite a lot of options in this space, but two seem to stand out: Go & Rust. I gave both of them a try, and although I’ll probably go back to give Go a more thorough test at some point I found I got quite hooked on Rust. Both languages, though young, are definitely production-ready. Servo, the core of the new Firefox browser, is entirely written in Rust. In fact, Mozilla have been trying to rewrite the rendering core in C for nearly a decade, and switching to Rust let them get it done in just a couple of years. !!! tldr “TL;DR” - It’s fast: competitive with idiomatic C/C++, and no garbage-collection overhead - It’s harder to write buggy code, and compiler errors are actually helpful - It’s C-compatible: you can call into Rust code anywhere you’d call into C, call C/C++ from Rust, and incrementally replace C/C++ code with Rust - It has sensible modern syntax that makes your code clearer and more concise - Support for scientific computing are getting better all the time (matrix algebra libraries, built-in SIMD, safe concurrency) - It has a really friendly and active community - It’s production-ready: Servo, the new rendering core in Firefox, is built entirely in Rust Performance To start with, as a compiled language Rust executes much faster than a (pseudo-)interpreted language like Python or R; the price you pay for this is time spent compiling during development. However, having a compile step also allows the language to enforce certain guarantees, such as type-correctness and memory safety, which between them prevent whole classes of bugs from even being possible. Unlike Go (which, like many higher-level languages, uses a garbage collector), Rust handles memory safety at compile time through the concepts of ownership and borrowing. These can take some getting used to and were a big source of frustration when I was first figuring out the language, but ultimately contribute to Rust’s reliably-fast performance. 
Performance can be unpredictable in a garbage-collected language because you can’t be sure when the GC is going to run and you need to understand it really well to stand a chance of optimising it if becomes a problem. On the other hand, code that has the potential to be unsafe will result in compilation errors in Rust. There are a number of benchmarks (example) that show Rust’s performance on a par with idiomatic C & C++ code, something that very few languages can boast. Helpful error messages Because beginner Rust programmers often get compile errors, it’s really important that those errors are easy to interpret and fix, and Rust is great at this. Not only does it tell you what went wrong, but wherever possible it prints out your code annotated with arrows to show exactly where the error is, and makes specific suggestions how to fix the error which usually turn out to be correct. It also has a nice suite of warnings (things that don’t cause compilation to fail but may indicate bugs) that are just as informative, and this can be extended even further by using the clippy linting tool to further analyse your code. warning: unused variable: `y` --> hello.rs:3:9 | 3 | let y = x; | ^ | = note: #[warn(unused_variables)] on by default = note: to avoid this warning, consider using `_y` instead Easy to integrate with other languages If you’re like me, you’ll probably only use a low-level language for performance-critical code that you can call from a high-level language, and this is an area where Rust shines. Most programmers will turn to C, C++ or Fortran for this because they have a well established ABI (Application Binary Interface) which can be understood by languages like Python and R1. In Rust, it’s trivial to make a C-compatible shared library, and the standard library includes extra features for working with C types. That also means that existing C code can be incrementally ported to Rust: see remacs for an example. On top of this, there are projects like rust-cpython and PyO3 which provide macros and structures that wrap the Python C API to let you build Python modules in Rust with minimal glue code; rustr does a similar job for R. Nice language features Rust has some really nice features, which let you write efficient, concise and correct code. Several feel particularly comfortable as they remind me of similar things available in Haskell, including: Enums, a super-powered combination of C enums and unions (similar to Haskell’s algebraic data types) that enable some really nice code with no runtime cost Generics and traits that let you get more done with less code Pattern matching, a kind of case statement that lets you extract parts of structs, tuples & enums and do all sorts of other clever things Lazy computation based on an iterator pattern, for efficient processing of lists of things: you can do for item in list { ... } instead of the C-style use of an index2, or you can use higher-order functions like map and filter Functions/closures as first-class citizens Scientific computing Although it’s a general-purpose language and not designed specifically for scientific computing, Rust’s support is improving all the time. There are some interesting matrix algebra libraries available, and built-in SIMD is incoming. The memory safety features also work to ensure thread safety, so it’s harder to write concurrency bugs. You should be able to use your favourite MPI implementation too, and there’s at least one attempt to portably wrap MPI in a more Rust-like way. 
Active development and friendly community One of the things you notice straight away is how active and friendly the Rust community is. There are several IRC channels on irc.mozilla.org including #rust-beginners, which is a great place to get help. The compiler is under constant but carefully-managed development, so that new features are landing all the time but without breaking existing code. And the fabulous Cargo build tool and crates.io are enabling the rapid growth of a healthy ecosystem of open source libraries that you can use to write less code yourself. Summary So, next time you need a compiled language to speed up hotspots in your code, try Rust. I promise you won’t regret it! Julia actually allows you to call C and Fortran functions as a first-class language feature ↩︎ Actually, since C++11 there’s for (auto item : list) { ... } but still… ↩︎ Reflections on #aoc2017 Trees reflected in a lake Joshua Reddekopp on Unsplash It seems like ages ago, but way back in November I committed to completing Advent of Code. I managed it all, and it was fun! All of my code is available on GitHub if you’re interested in seeing what I did, and I managed to get out a blog post for every one with a bit more commentary, which you can see in the series list above. How did I approach it? I’ve not really done any serious programming challenges before. I don’t get to write a lot of code at the moment, so all I wanted from AoC was an excuse to do some proper problem-solving. I never really intended to take a polyglot approach, though I did think that I might use mainly Python with a bit of Haskell. In the end, though, I used: Python (×12); Haskell (×7); Rust (×4); Go; C++; Ruby; Julia; and Coconut. For the most part, my priorities were getting the right answer, followed by writing readable code. I didn’t specifically focus on performance but did try to avoid falling into traps that I knew about. What did I learn? I found Python the easiest to get on with: it’s the language I know best and although I can’t always remember exact method names and parameters I know what’s available and where to look to remind myself, as well as most of the common idioms and some performance traps to avoid. Python was therefore the language that let me focus most on solving the problem itself. C++ and Ruby were more challenging, and it was harder to write good idiomatic code but I can still remember quite a lot. Haskell I haven’t used since university, and just like back then I really enjoyed working out how to solve problems in a functional style while still being readable and efficient (not always something I achieved…). I learned a lot about core Haskell concepts like monads & functors, and I’m really amazed by the way the Haskell community and ecosystem has grown up in the last decade. I also wanted to learn at least one modern, memory-safe compiled language, so I tried both Go and Rust. Both seem like useful languages, but Rust really intrigued me with its conceptual similarities to both Haskell and C++ and its promise of memory safety without a garbage collector. I struggled a lot initially with the “borrow checker” (the component that enforces memory safety at compile time) but eventually started thinking in terms of ownership and lifetimes after which things became easier. The Rust community seems really vibrant and friendly too. What next? I really want to keep this up, so I’m going to look out some more programming challenges (Project Euler looks interesting). 
It turns out there’s a regular Code Dojo meetup in Leeds, so hopefully I’ll try that out too. I’d like to do more realistic data-science stuff, so I’ll be taking a closer look at stuff like Kaggle too, and figuring out how to do a bit more analysis at work. I’m also feeling motivated to find an open source project to contribute to and/or release a project of my own, so we’ll see if that goes anywhere! I’ve always found the advice to “scratch your own itch” difficult to follow because everything I think of myself has already been done better. Most of the projects I use enough to want to contribute to tend to be pretty well developed with big communities and any bugs that might be accessible to me will be picked off and fixed before I have a chance to get started. Maybe it’s time to get over myself and just reimplement something that already exists, just for the fun of it! The Halting Problem — Python — #adventofcode Day 25 Today’s challenge, takes us back to a bit of computing history: a good old-fashioned Turing Machine. → Full code on GitHub !!! commentary Today’s challenge was a nice bit of nostalgia, taking me back to my university days learning about the theory of computing. Turing Machines are a classic bit of computing theory, and are provably able to compute any value that is possible to compute: a value is computable if and only if a Turing Machine can be written that computes it (though in practice anything non-trivial is mind-bendingly hard to write as a TM). A bit of a library-fest today, compared to other days! from collections import deque, namedtuple from collections.abc import Iterator from tqdm import tqdm import re import fileinput as fi These regular expressions are used to parse the input that defines the transition table for the machine. RE_ISTATE = re.compile(r'Begin in state (?P<state>\w+)\.') RE_RUNTIME = re.compile( r'Perform a diagnostic checksum after (?P<steps>\d+) steps.') RE_STATETRANS = re.compile( r"In state (?P<state>\w+):\n" r" If the current value is (?P<read0>\d+):\n" r" - Write the value (?P<write0>\d+)\.\n" r" - Move one slot to the (?P<move0>left|right).\n" r" - Continue with state (?P<next0>\w+).\n" r" If the current value is (?P<read1>\d+):\n" r" - Write the value (?P<write1>\d+)\.\n" r" - Move one slot to the (?P<move1>left|right).\n" r" - Continue with state (?P<next1>\w+).") MOVE = {'left': -1, 'right': 1} A namedtuple to provide some sugar when using a transition rule. Rule = namedtuple('Rule', 'write move next_state') The TuringMachine class does all the work. class TuringMachine: def __init__(self, program=None): self.tape = deque() self.transition_table = {} self.state = None self.runtime = 0 self.steps = 0 self.pos = 0 self.offset = 0 if program is not None: self.load(program) def __str__(self): return f"Current: {self.state}; steps: {self.steps} of {self.runtime}" Some jiggery-pokery to allow us to use self[pos] to reference an infinite tape. def __getitem__(self, i): i += self.offset if i < 0 or i >= len(self.tape): return 0 else: return self.tape[i] def __setitem__(self, i, x): i += self.offset if i >= 0 and i < len(self.tape): self.tape[i] = x elif i == -1: self.tape.appendleft(x) self.offset += 1 elif i == len(self.tape): self.tape.append(x) else: raise IndexError('Tried to set position off end of tape') Parse the program and set up the transtion table. 
def load(self, program): if isinstance(program, Iterator): program = ''.join(program) match = RE_ISTATE.search(program) self.state = match['state'] match = RE_RUNTIME.search(program) self.runtime = int(match['steps']) for match in RE_STATETRANS.finditer(program): self.transition_table[match['state']] = { int(match['read0']): Rule(write=int(match['write0']), move=MOVE[match['move0']], next_state=match['next0']), int(match['read1']): Rule(write=int(match['write1']), move=MOVE[match['move1']], next_state=match['next1']), } Run the program for the required number of steps (given by self.runtime). tqdm isn’t in the standard library but it should be: it shows a lovely text-mode progress bar as we go. def run(self): for _ in tqdm(range(self.runtime), desc="Running", unit="steps", unit_scale=True): read = self[self.pos] rule = self.transition_table[self.state][read] self[self.pos] = rule.write self.pos += rule.move self.state = rule.next_state Calculate the “diagnostic checksum” required for the answer. @property def checksum(self): return sum(self.tape) Aaand GO! machine = TuringMachine(fi.input()) machine.run() print("Checksum:", machine.checksum) Electromagnetic Moat — Rust — #adventofcode Day 24 Today’s challenge, the penultimate, requires us to build a bridge capable of reaching across to the CPU, our final destination. → Full code on GitHub !!! commentary We have a finite number of components that fit together in a restricted way from which to build a bridge, and we have to work out both the strongest and the longest bridge we can build. The most obvious way to do this is to recursively build every possible bridge and select the best, but that’s an O(n!) algorithm that could blow up quickly, so might as well go with a nice fast language! Might have to try this in Haskell too, because it’s the type of algorithm that lends itself naturally to a pure functional approach. I feel like I've applied some of the things I've learned in previous challenges I used Rust for, and spent less time mucking about with ownership, and made better use of various language features, including structs and iterators. I'm rather pleased with how my learning of this language is progressing. I'm definitely overusing `Option.unwrap` at the moment though: this is a lazy way to deal with `Option` results and will panic if the result is not what's expected. I'm not sure whether I need to be cloning the components `Vector` either, or whether I could just be passing iterators around. First, we import some bits of standard library and define some data types. The BridgeResult struct lets us use the same algorithm for both parts of the challenge and simply change the value used to calculate the maximum. 
use std::io; use std::fmt; use std::io::BufRead; #[derive(Debug, Copy, Clone, PartialEq, Eq, Hash)] struct Component(u8, u8); #[derive(Debug, Copy, Clone, Default)] struct BridgeResult { strength: u16, length: u16, } impl Component { fn from_str(s: &str) -> Component { let parts: Vec<&str> = s.split('/').collect(); assert!(parts.len() == 2); Component(parts[0].parse().unwrap(), parts[1].parse().unwrap()) } fn fits(self, port: u8) -> bool { self.0 == port || self.1 == port } fn other_end(self, port: u8) -> u8 { if self.0 == port { return self.1; } else if self.1 == port { return self.0; } else { panic!("{} doesn't fit port {}", self, port); } } fn strength(self) -> u16 { self.0 as u16 + self.1 as u16 } } impl fmt::Display for BridgeResult { fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result { write!(f, "(S: {}, L: {})", self.strength, self.length) } } best_bridge calculates the length and strength of the “best” bridge that can be built from the remaining components and fits the required port. Whether this is based on strength or length is given by the key parameter, which is passed to Iter.max_by_key. fn best_bridge<F>(port: u8, key: &F, components: &Vec<Component>) -> Option<BridgeResult> where F: Fn(&BridgeResult) -> u16 { if components.len() == 0 { return None; } components.iter() .filter(|c| c.fits(port)) .map(|c| { let b = best_bridge(c.other_end(port), key, &components.clone().into_iter() .filter(|x| x != c).collect()) .unwrap_or_default(); BridgeResult{strength: c.strength() + b.strength, length: 1 + b.length} }) .max_by_key(key) } Now all that remains is to read the input and calculate the result. I was rather pleasantly surprised to find that in spite of my pessimistic predictions about efficiency, when compiled with optimisations turned on this terminates in less than 1s on my laptop. fn main() { let stdin = io::stdin(); let components: Vec<_> = stdin.lock() .lines() .map(|l| Component::from_str(&l.unwrap())) .collect(); match best_bridge(0, &|b: &BridgeResult| b.strength, &components) { Some(b) => println!("Strongest bridge is {}", b), None => println!("No strongest bridge found") }; match best_bridge(0, &|b: &BridgeResult| b.length, &components) { Some(b) => println!("Longest bridge is {}", b), None => println!("No longest bridge found") }; } Coprocessor Conflagration — Haskell — #adventofcode Day 23 Today’s challenge requires us to understand why a coprocessor is working so hard to perform an apparently simple calculation. → Full code on GitHub !!! commentary Today’s problem is based on an assembly-like language very similar to day 18, so I went back and adapted my code from that, which works well for the first part. I’ve also incorporated some advice from /r/haskell, and cleaned up all warnings shown by the -Wall compiler flag and the hlint tool. Part 2 requires the algorithm to run with much larger inputs, and since some analysis shows that it's an `O(n^3)` algorithm it gets intractible pretty fast. There are several approaches to this. First up, if you have a fast enough processor and an efficient enough implementation I suspect that the simulation would probably terminate eventually, but that would likely still take hours: not good enough. I also thought about doing some peephole optimisations on the instructions, but the last time I did compiler optimisation was my degree so I wasn't really sure where to start. What I ended up doing was actually analysing the input code by hand to figure out what it was doing, and then just doing that calculation in a sensible way. 
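In Python terms, the hand-derived replacement boils down to counting the composite numbers between two bounds, stepping by the increment the assembly uses. Here is a minimal sketch of that idea (not part of the solution itself; the bounds 107900 and 124900 and the step of 17 come from my input, exactly as used by optimisedCalc further down):

```python
# Count the composite (non-prime) numbers from 107900 to 124900 inclusive,
# stepping by 17, using simple trial division.
def is_composite(n):
    return any(n % d == 0 for d in range(2, int(n ** 0.5) + 1))

print(sum(1 for n in range(107900, 124901, 17) if is_composite(n)))
```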
I'd like to say I managed this on my own (and I ike to think I would have) but I did get some tips on [/r/adventofcode](https://reddit.com/r/adventofcode). The majority of this code is simply a cleaned-up version of day 18, with some tweaks to accommodate the different instruction set: module Main where import qualified Data.Vector as V import qualified Data.Map.Strict as M import Control.Monad.State.Strict import Text.ParserCombinators.Parsec hiding (State) type Register = Char type Value = Int type Argument = Either Value Register data Instruction = Set Register Argument | Sub Register Argument | Mul Register Argument | Jnz Argument Argument deriving Show type Program = V.Vector Instruction data Result = Cont | Halt deriving (Eq, Show) type Registers = M.Map Char Int data Machine = Machine { dRegisters :: Registers , dPtr :: !Int , dMulCount :: !Int , dProgram :: Program } instance Show Machine where show d = show (dRegisters d) ++ " @" ++ show (dPtr d) ++ " ×" ++ show (dMulCount d) defaultMachine :: Machine defaultMachine = Machine M.empty 0 0 V.empty type MachineState = State Machine program :: GenParser Char st Program program = do instructions <- endBy instruction eol return $ V.fromList instructions where instruction = try (regOp "set" Set) <|> regOp "sub" Sub <|> regOp "mul" Mul <|> jump "jnz" Jnz regOp n c = do string n >> spaces val1 <- oneOf "abcdefgh" secondArg c val1 jump n c = do string n >> spaces val1 <- regOrVal secondArg c val1 secondArg c val1 = do spaces val2 <- regOrVal return $ c val1 val2 regOrVal = register <|> value register = do name <- lower return $ Right name value = do val <- many $ oneOf "-0123456789" return $ Left $ read val eol = char '\n' parseProgram :: String -> Either ParseError Program parseProgram = parse program "" getReg :: Char -> MachineState Int getReg r = do st <- get return $ M.findWithDefault 0 r (dRegisters st) putReg :: Char -> Int -> MachineState () putReg r v = do st <- get let current = dRegisters st new = M.insert r v current put $ st { dRegisters = new } modReg :: (Int -> Int -> Int) -> Char -> Argument -> MachineState () modReg op r v = do u <- getReg r v' <- getRegOrVal v putReg r (u `op` v') incPtr getRegOrVal :: Argument -> MachineState Int getRegOrVal = either return getReg addPtr :: Int -> MachineState () addPtr n = do st <- get put $ st { dPtr = n + dPtr st } incPtr :: MachineState () incPtr = addPtr 1 execInst :: Instruction -> MachineState () execInst (Set reg val) = do newVal <- getRegOrVal val putReg reg newVal incPtr execInst (Mul reg val) = do result <- modReg (*) reg val st <- get put $ st { dMulCount = 1 + dMulCount st } return result execInst (Sub reg val) = modReg (-) reg val execInst (Jnz val1 val2) = do test <- getRegOrVal val1 jump <- if test /= 0 then getRegOrVal val2 else return 1 addPtr jump execNext :: MachineState Result execNext = do st <- get let prog = dProgram st p = dPtr st if p >= length prog then return Halt else do execInst (prog V.! 
p) return Cont runUntilTerm :: MachineState () runUntilTerm = do result <- execNext unless (result == Halt) runUntilTerm This implements the actual calculation: the number of non-primes between (for my input) 107900 and 124900: optimisedCalc :: Int -> Int -> Int -> Int optimisedCalc a b k = sum $ map (const 1) $ filter notPrime [a,a+k..b] where notPrime n = elem 0 $ map (mod n) [2..(floor $ sqrt (fromIntegral n :: Double))] main :: IO () main = do input <- getContents case parseProgram input of Right prog -> do let c = defaultMachine { dProgram = prog } (_, c') = runState runUntilTerm c putStrLn $ show (dMulCount c') ++ " multiplications made" putStrLn $ "Calculation result: " ++ show (optimisedCalc 107900 124900 17) Left e -> print e Sporifica Virus — Rust — #adventofcode Day 22 Today’s challenge has us helping to clean up (or spread, I can’t really tell) an infection of the “sporifica” virus. → Full code on GitHub !!! commentary I thought I’d have another play with Rust, as its Haskell-like features resonate with me at the moment. I struggled quite a lot with the Rust concepts of ownership and borrowing, and this is a cleaned-up version of the code based on some good advice from the folks on /r/rust. use std::io; use std::env; use std::io::BufRead; use std::collections::HashMap; #[derive(PartialEq, Clone, Copy, Debug)] enum Direction {Up, Right, Down, Left} #[derive(PartialEq, Clone, Copy, Debug)] enum Infection {Clean, Weakened, Infected, Flagged} use self::Direction::*; use self::Infection::*; type Grid = HashMap<(isize, isize), Infection>; fn turn_left(d: Direction) -> Direction { match d {Up => Left, Right => Up, Down => Right, Left => Down} } fn turn_right(d: Direction) -> Direction { match d {Up => Right, Right => Down, Down => Left, Left => Up} } fn turn_around(d: Direction) -> Direction { match d {Up => Down, Right => Left, Down => Up, Left => Right} } fn make_move(d: Direction, x: isize, y: isize) -> (isize, isize) { match d { Up => (x-1, y), Right => (x, y+1), Down => (x+1, y), Left => (x, y-1), } } fn basic_step(grid: &mut Grid, x: &mut isize, y: &mut isize, d: &mut Direction) -> usize { let mut infect = 0; let current = match grid.get(&(*x, *y)) { Some(v) => *v, None => Clean, }; if current == Infected { *d = turn_right(*d); } else { *d = turn_left(*d); infect = 1; }; grid.insert((*x, *y), match current { Clean => Infected, Infected => Clean, x => panic!("Unexpected infection state {:?}", x), }); let new_pos = make_move(*d, *x, *y); *x = new_pos.0; *y = new_pos.1; infect } fn nasty_step(grid: &mut Grid, x: &mut isize, y: &mut isize, d: &mut Direction) -> usize { let mut infect = 0; let new_state: Infection; let current = match grid.get(&(*x, *y)) { Some(v) => *v, None => Infection::Clean, }; match current { Clean => { *d = turn_left(*d); new_state = Weakened; }, Weakened => { new_state = Infected; infect = 1; }, Infected => { *d = turn_right(*d); new_state = Flagged; }, Flagged => { *d = turn_around(*d); new_state = Clean; } }; grid.insert((*x, *y), new_state); let new_pos = make_move(*d, *x, *y); *x = new_pos.0; *y = new_pos.1; infect } fn virus_infect<F>(mut grid: Grid, mut step: F, mut x: isize, mut y: isize, mut d: Direction, n: usize) -> usize where F: FnMut(&mut Grid, &mut isize, &mut isize, &mut Direction) -> usize, { (0..n).map(|_| step(&mut grid, &mut x, &mut y, &mut d)) .sum() } fn main() { let args: Vec<String> = env::args().collect(); let n_basic: usize = args[1].parse().unwrap(); let n_nasty: usize = args[2].parse().unwrap(); let stdin = io::stdin(); let lines: 
Vec<String> = stdin.lock() .lines() .map(|x| x.unwrap()) .collect(); let mut grid: Grid = HashMap::new(); let x0 = (lines.len() / 2) as isize; let y0 = (lines[0].len() / 2) as isize; for (i, line) in lines.iter().enumerate() { for (j, c) in line.chars().enumerate() { grid.insert((i as isize, j as isize), match c {'#' => Infected, _ => Clean}); } } let basic_steps = virus_infect(grid.clone(), basic_step, x0, y0, Up, n_basic); println!("Basic: infected {} times", basic_steps); let nasty_steps = virus_infect(grid, nasty_step, x0, y0, Up, n_nasty); println!("Nasty: infected {} times", nasty_steps); } Fractal Art — Python — #adventofcode Day 21 Today’s challenge asks us to assist an artist building fractal patterns from a rulebook. → Full code on GitHub !!! commentary Another fairly straightforward algorithm: the really tricky part was breaking the pattern up into chunks and rejoining it again. I could probably have done that more efficiently, and would have needed to if I had to go for a few more iterations and the grid grows with every iteration and gets big fast. Still behind on the blog posts… import fileinput as fi from math import sqrt from functools import reduce, partial import operator INITIAL_PATTERN = ((0, 1, 0), (0, 0, 1), (1, 1, 1)) DECODE = ['.', '#'] ENCODE = {'.': 0, '#': 1} concat = partial(reduce, operator.concat) def rotate(p): size = len(p) return tuple(tuple(p[i][j] for i in range(size)) for j in range(size - 1, -1, -1)) def flip(p): return tuple(p[i] for i in range(len(p) - 1, -1, -1)) def permutations(p): yield p yield flip(p) for _ in range(3): p = rotate(p) yield p yield flip(p) def print_pattern(p): print('-' * len(p)) for row in p: print(' '.join(DECODE[x] for x in row)) print('-' * len(p)) def build_pattern(s): return tuple(tuple(ENCODE[c] for c in row) for row in s.split('/')) def build_pattern_book(lines): book = {} for line in lines: source, target = line.strip().split(' => ') for rotation in permutations(build_pattern(source)): book[rotation] = build_pattern(target) return book def subdivide(pattern): size = 2 if len(pattern) % 2 == 0 else 3 n = len(pattern) // size return (tuple(tuple(pattern[i][j] for j in range(y * size, (y + 1) * size)) for i in range(x * size, (x + 1) * size)) for x in range(n) for y in range(n)) def rejoin(parts): n = int(sqrt(len(parts))) size = len(parts[0]) return tuple(concat(parts[i + k][j] for i in range(n)) for k in range(0, len(parts), n) for j in range(size)) def enhance_once(p, book): return rejoin(tuple(book[part] for part in subdivide(p))) def enhance(p, book, n, progress=None): for _ in range(n): p = enhance_once(p, book) return p book = build_pattern_book(fi.input()) intermediate_pattern = enhance(INITIAL_PATTERN, book, 5) print("After 5 iterations:", sum(sum(row) for row in intermediate_pattern)) final_pattern = enhance(intermediate_pattern, book, 13) print("After 18 iterations:", sum(sum(row) for row in final_pattern)) Particle Swarm — Python — #adventofcode Day 20 Today’s challenge finds us simulating the movements of particles in space. → Full code on GitHub !!! commentary Back to Python for this one, another relatively straightforward simulation, although it’s easier to calculate the answer to part 1 than to simulate. import fileinput as fi import numpy as np import re First we parse the input into 3 2D arrays: using numpy enables us to do efficient arithmetic across the whole set of particles in one go. 
PARTICLE_RE = re.compile(r'p=<(-?\d+),(-?\d+),(-?\d+)>, ' r'v=<(-?\d+),(-?\d+),(-?\d+)>, ' r'a=<(-?\d+),(-?\d+),(-?\d+)>') def parse_input(lines): x = [] v = [] a = [] for l in lines: m = PARTICLE_RE.match(l) x.append([int(x) for x in m.group(1, 2, 3)]) v.append([int(x) for x in m.group(4, 5, 6)]) a.append([int(x) for x in m.group(7, 8, 9)]) return (np.arange(len(x)), np.array(x), np.array(v), np.array(a)) i, x, v, a = parse_input(fi.input()) Now we can calculate which particle will be closest to the origin in the long-term: this is simply the particle with the smallest acceleration. It turns out that several have the same acceleration, so of these, the one we want is the one with the lowest starting velocity. This is only complicated slightly by the need to get the number of the particle rather than its other information, hence the need to use numpy.argmin. a_abs = np.sum(np.abs(a), axis=1) a_min = np.min(a_abs) a_i = np.squeeze(np.argwhere(a_abs == a_min)) closest = i[a_i[np.argmin(np.sum(np.abs(v[a_i]), axis=1))]] print("Closest: ", closest) Now we define functions to simulate collisions between particles. We have to use the return_index and return_counts options to numpy.unique to be able to get rid of all the duplicate positions (the standard usage is to keep one of each duplicate). def resolve_collisions(x, v, a): (_, i, c) = np.unique(x, return_index=True, return_counts=True, axis=0) i = i[c == 1] return x[i], v[i], a[i] The termination criterion for this loop is an interesting aspect: the most robust to my mind seems to be that eventually the particles will end up sorted in order of their initial acceleration in terms of distance from the origin, so you could check for this but that’s pretty computationally expensive. In the end, all that was needed was a bit of trial and error: terminating arbitrarily after 1,000 iterations seems to work! In fact, all the collisions are over after about 40 iterations for my input but there was always the possibility that two particles with very slightly different accelerations would eventually intersect much later. def simulate_collisions(x, v, a, iterations=1000): for _ in range(iterations): v += a x += v x, v, a = resolve_collisions(x, v, a) return len(x) print("Remaining particles: ", simulate_collisions(x, v, a)) A Series of Tubes — Rust — #adventofcode Day 19 Today’s challenge asks us to help a network packet find its way. → Full code on GitHub !!! commentary Today’s challenge was fairly straightforward, following an ASCII art path, so I thought I’d give Rust another try. I’m a bit behind on the blog posts, so I’m presenting the code below without any further commentary. I’m not really convinced this is good idiomatic Rust, and it was interesting turning a set of strings into a 2D array of characters because there are both u8 (byte) and char types to deal with. use std::io; use std::io::BufRead; const ALPHA: &'static str = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"; fn change_direction(dia: &Vec<Vec<u8>>, x: usize, y: usize, dx: &mut i32, dy: &mut i32) { assert_eq!(dia[x][y], b'+'); if dx.abs() == 1 { *dx = 0; if y + 1 < dia[x].len() && (dia[x][y + 1] == b'-' || ALPHA.contains(dia[x][y + 1] as char)) { *dy = 1; } else if dia[x][y - 1] == b'-' || ALPHA.contains(dia[x][y - 1] as char) { *dy = -1; } else { panic!("Huh? 
{} {}", dia[x][y+1] as char, dia[x][y-1] as char); } } else { *dy = 0; if x + 1 < dia.len() && (dia[x + 1][y] == b'|' || ALPHA.contains(dia[x + 1][y] as char)) { *dx = 1; } else if dia[x - 1][y] == b'|' || ALPHA.contains(dia[x - 1][y] as char) { *dx = -1; } else { panic!("Huh?"); } } } fn follow_route(dia: Vec<Vec<u8>>) -> (String, i32) { let mut x: i32 = 0; let mut y: i32; let mut dx: i32 = 1; let mut dy: i32 = 0; let mut result = String::new(); let mut steps = 1; match dia[0].iter().position(|x| *x == b'|') { Some(i) => y = i as i32, None => panic!("Could not find '|' in first row"), } loop { x += dx; y += dy; match dia[x as usize][y as usize] { b'A'...b'Z' => result.push(dia[x as usize][y as usize] as char), b'+' => change_direction(&dia, x as usize, y as usize, &mut dx, &mut dy), b' ' => return (result, steps), _ => (), } steps += 1; } } fn main() { let stdin = io::stdin(); let lines: Vec<Vec<u8>> = stdin.lock().lines() .map(|l| l.unwrap().into_bytes()) .collect(); let result = follow_route(lines); println!("Route: {}", result.0); println!("Steps: {}", result.1); } Duet — Haskell — #adventofcode Day 18 Today’s challenge introduces a type of simplified assembly language that includes instructions for message-passing. First we have to simulate a single program (after humorously misinterpreting the snd and rcv instructions as “sound” and “recover”), but then we have to simulate two concurrent processes and the message passing between them. → Full code on GitHub !!! commentary Well, I really learned a lot from this one! I wanted to get to grips with more complex stuff in Haskell and this challenge seemed like an excellent opportunity to figure out a) parsing with the parsec library and b) using the State monad to keep the state of the simulator. As it turned out, that wasn't all I'd learned: I also ran into an interesting situation whereby lazy evaluation was creating an infinite loop where there shouldn't be one, so I also had to learn how to selectively force strict evaluation of values. I'm pretty sure this isn't the best Haskell in the world, but I'm proud of it. First we have to import a bunch of stuff to use later, but also notice the pragma on the first line which instructs the compiler to enable the BangPatterns language extension, which will be important later. {-# LANGUAGE BangPatterns #-} module Main where import qualified Data.Vector as V import qualified Data.Map.Strict as M import Data.List import Data.Either import Data.Maybe import Control.Monad.State.Strict import Control.Monad.Loops import Text.ParserCombinators.Parsec hiding (State) First up we define the types that will represent the program code itself. data DuetVal = Reg Char | Val Int deriving Show type DuetQueue = [Int] data DuetInstruction = Snd DuetVal | Rcv DuetVal | Jgz DuetVal DuetVal | Set DuetVal DuetVal | Add DuetVal DuetVal | Mul DuetVal DuetVal | Mod DuetVal DuetVal deriving Show type DuetProgram = V.Vector DuetInstruction Next we define the types to hold the machine state, which includes: registers, instruction pointer, send & receive buffers and the program code, plus a counter of the number of sends made (to provide the solution). 
type DuetRegisters = M.Map Char Int data Duet = Duet { dRegisters :: DuetRegisters , dPtr :: Int , dSendCount :: Int , dRcvBuf :: DuetQueue , dSndBuf :: DuetQueue , dProgram :: DuetProgram } instance Show Duet where show d = show (dRegisters d) ++ " @" ++ show (dPtr d) ++ " S" ++ show (dSndBuf d) ++ " R" ++ show (dRcvBuf d) defaultDuet = Duet M.empty 0 0 [] [] V.empty type DuetState = State Duet program is a parser built on the cool parsec library to turn the program text into a Haskell format that we can work with, a Vector of instructions. Yes, using a full-blown parser is overkill here (it would be much simpler just to split each line on whitespace, but I wanted to see how Parsec works. I’m using Vector here because we need random access to the instruction list, which is much more efficient with Vector: O(1) compared with the O(n) of the built in Haskell list ([]) type. parseProgram applies the parser to a string and returns the result. program :: GenParser Char st DuetProgram program = do instructions <- endBy instruction eol return $ V.fromList instructions where instruction = try (oneArg "snd" Snd) <|> oneArg "rcv" Rcv <|> twoArg "set" Set <|> twoArg "add" Add <|> try (twoArg "mul" Mul) <|> twoArg "mod" Mod <|> twoArg "jgz" Jgz oneArg n c = do string n >> spaces val <- regOrVal return $ c val twoArg n c = do string n >> spaces val1 <- regOrVal spaces val2 <- regOrVal return $ c val1 val2 regOrVal = register <|> value register = do name <- lower return $ Reg name value = do val <- many $ oneOf "-0123456789" return $ Val $ read val eol = char '\n' parseProgram :: String -> Either ParseError DuetProgram parseProgram = parse program "" Next up we have some utility functions that sit in the DuetState monad we defined above and perform common manipulations on the state: getting/setting/updating registers, updating the instruction pointer and sending/receiving messages via the relevant queues. getReg :: Char -> DuetState Int getReg r = do st <- get return $ M.findWithDefault 0 r (dRegisters st) putReg :: Char -> Int -> DuetState () putReg r v = do st <- get let current = dRegisters st new = M.insert r v current put $ st { dRegisters = new } modReg :: (Int -> Int -> Int) -> Char -> DuetVal -> DuetState Bool modReg op r v = do u <- getReg r v' <- getRegOrVal v putReg r (u `op` v') incPtr return False getRegOrVal :: DuetVal -> DuetState Int getRegOrVal (Reg r) = getReg r getRegOrVal (Val v) = return v addPtr :: Int -> DuetState () addPtr n = do st <- get put $ st { dPtr = n + dPtr st } incPtr = addPtr 1 send :: Int -> DuetState () send v = do st <- get put $ st { dSndBuf = (dSndBuf st ++ [v]), dSendCount = dSendCount st + 1 } recv :: DuetState (Maybe Int) recv = do st <- get case dRcvBuf st of (x:xs) -> do put $ st { dRcvBuf = xs } return $ Just x [] -> return Nothing execInst implements the logic for each instruction. It returns False as long as the program can continue, but True if the program tries to receive from an empty buffer. 
execInst :: DuetInstruction -> DuetState Bool execInst (Set (Reg reg) val) = do newVal <- getRegOrVal val putReg reg newVal incPtr return False execInst (Mul (Reg reg) val) = modReg (*) reg val execInst (Add (Reg reg) val) = modReg (+) reg val execInst (Mod (Reg reg) val) = modReg mod reg val execInst (Jgz val1 val2) = do st <- get test <- getRegOrVal val1 jump <- if test > 0 then getRegOrVal val2 else return 1 addPtr jump return False execInst (Snd val) = do v <- getRegOrVal val send v st <- get incPtr return False execInst (Rcv (Reg r)) = do st <- get v <- recv handle v where handle :: Maybe Int -> DuetState Bool handle (Just x) = putReg r x >> incPtr >> return False handle Nothing = return True execInst x = error $ "execInst not implemented yet for " ++ show x execNext looks up the next instruction and executes it. runUntilWait runs the program until execNext returns True to signal the wait state has been reached. execNext :: DuetState Bool execNext = do st <- get let prog = dProgram st p = dPtr st if p >= length prog then return True else execInst (prog V.! p) runUntilWait :: DuetState () runUntilWait = do waiting <- execNext unless waiting runUntilWait runTwoPrograms handles the concurrent running of two programs, by running first one and then the other to a wait state, then swapping each program’s send buffer to the other’s receive buffer before repeating. If you look carefully, you’ll see a “bang” (!) before the two arguments of the function: runTwoPrograms !d0 !d1. Haskell is a lazy language and usually doesn’t evaluate a computation until you ask for a result, instead carrying around a “thunk” or plan for how to carry out the computation. Sometimes that can be a problem because the amount of memory your program is using can explode unnecessarily as a long computation turns into a large thunk which isn’t evaluated until the very end. That’s not the problem here though. What happens here without the bangs is another side-effect of laziness. The exit condition of this recursive function is that a deadlock has been reached: both programs are waiting to receive, but neither has sent anything, so neither can ever continue. The check for this is (null $ dSndBuf d0') && (null $ dSndBuf d1'). As long as the first program has something in its send buffer, the test fails without ever evaluating the second part, which means the result d1' of running the second program is never needed. The function immediately goes to the recursive case and tries to continue the first program again, which immediately returns because it’s still waiting to receive. The same thing happens again, and the result is that instead of running the second program to obtain something for the first to receive, we get into an infinite loop trying and failing to continue the first program. The bang forces both d0 and d1 to be evaluated at the point we recurse, which forces the rest of the computation: running the second program and swapping the send/receive buffers. With that, the evaluation proceeds correctly and we terminate with a result instead of getting into an infinite loop! 
runTwoPrograms :: Duet -> Duet -> (Int, Int) runTwoPrograms !d0 !d1 | (null $ dSndBuf d0') && (null $ dSndBuf d1') = (dSendCount d0', dSendCount d1') | otherwise = runTwoPrograms d0'' d1'' where (_, d0') = runState runUntilWait d0 (_, d1') = runState runUntilWait d1 d0'' = d0' { dSndBuf = [], dRcvBuf = dSndBuf d1' } d1'' = d1' { dSndBuf = [], dRcvBuf = dSndBuf d0' } All that remains to be done now is to run the programs and see how many messages were sent before the deadlock. main = do prog <- fmap (fromRight V.empty . parseProgram) getContents let d0 = defaultDuet { dProgram = prog, dRegisters = M.fromList [('p', 0)] } d1 = defaultDuet { dProgram = prog, dRegisters = M.fromList [('p', 1)] } (send0, send1) = runTwoPrograms d0 d1 putStrLn $ "Program 0 sent " ++ show send0 ++ " messages" putStrLn $ "Program 1 sent " ++ show send1 ++ " messages" Spinlock — Rust/Python — #adventofcode Day 17 In today’s challenge we deal with a monstrous whirlwind of a program, eating up CPU and memory in equal measure. → Full code on GitHub (and Python driver script) !!! commentary One of the things I wanted from AoC was an opportunity to try out some popular languages that I don’t currently know, including the memory-safe, strongly-typed compiled languages Go and Rust. Realistically though, I’m likely to continue doing most of my programming in Python, and use one of these other languages when it has better tools or I need the extra speed. In which case, what I really want to know is how I can call functions written in Go or Rust from Python. I thought I'd try Rust first, as it seems to be designed to be C-compatible and that makes it easy to call from Python using [`ctypes`](https://docs.python.org/3.6/library/ctypes.html). Part 1 was another straightforward simulation: translate what the "spinlock" monster is doing into code and run it. It was pretty obvious from the story of this challenge and experience of the last few days that this was going to be another one where the simulation is too computationally expensive for part two, which turns out to be correct. So, first thing to do is to implement the meat of the solution in Rust. spinlock solves the first part of the problem by doing exactly what the monster does. Since we only have to go up to 2017 iterations, this is very tractable. The last number we insert is 2017, so we just return the number immediately after that. #[no_mangle] pub extern fn spinlock(n: usize, skip: usize) -> i32 { let mut buffer: Vec<i32> = Vec::with_capacity(n+1); buffer.push(0); buffer.push(1); let mut pos = 1; for i in 2..n+1 { pos = (pos + skip + 1) % buffer.len(); buffer.insert(pos, i as i32); } pos = (pos + 1) % buffer.len(); return buffer[pos]; } For the second part, we have to do 50 million iterations instead, which is a lot. Given that every time you insert an item in the list it has to move up all the elements after that position, I’m pretty sure the algorithm is O(n^2), so it’s going to take a lot longer than 10,000ish times the first part. Thankfully, we don’t need to build the whole list, just keep track of where 0 is and what number is immediately after it. There may be a closed-form solution to simply calculate the result, but I couldn’t think of it and this is good enough. 
#[no_mangle] pub extern fn spinlock0(n: usize, skip: usize) -> i32 { let mut pos = 1; let mut pos_0 = 0; let mut after_0 = 1; for i in 2..n+1 { pos = (pos + skip + 1) % i; if pos == pos_0 + 1 { after_0 = i; } if pos <= pos_0 { pos_0 += 1; } } return after_0 as i32; } Now it’s time to call this code from Python. Notice the #[no_mangle] pragmas and pub extern declarations for each function above, which are required to make sure the functions are exported in a C-compatible way. We can build this into a shared library like this: rustc --crate-type=cdylib -o spinlock.so 17-spinlock.rs The Python script is as simple as loading this library, reading the puzzle input from the command line and calling the functions. The ctypes module does a lot of magic so that we don’t have to worry about converting from Python types to native types and back again. import ctypes import sys lib = ctypes.cdll.LoadLibrary("./spinlock.so") skip = int(sys.argv[1]) print("Part 1:", lib.spinlock(2017, skip)) print("Part 2:", lib.spinlock0(50_000_000, skip)) This is a toy example as far as calling Rust from Python is concerned, but it’s worth noting that already we can play with the parameters to the two Rust functions without having to recompile. For more serious work, I’d probably be looking at something like PyO3 to make a proper Python module. Looks like there’s also a very early Rust numpy integration for numerical stuff. You can also do the same thing from Julia, which has a ccall function built in: ccall((:spinlock, "./spinlock.so"), Int32, (UInt64, UInt64), 2017, 377) My next thing to try might be Haskell → Python though… Permutation Promenade — Julia — #adventofcode Day 16 Today’s challenge rather appeals to me as a folk dancer, because it describes a set of instructions for a dance and asks us to work out the positions of the dancing programs after each run through the dance. → Full code on GitHub !!! commentary So, part 1 is pretty straightforward: parse the set of instructions, interpret them and keep track of the dancer positions as you go. One time through the dance. However, part 2 asks for the positions after 1 billion (yes, that’s 1,000,000,000) times through the dance. In hindsight I should have immediately become suspicious, but I thought I’d at least try the brute force approach first because it was simpler to code. So I give it a try, and after waiting for a while, having a cup of tea etc. it still hasn't terminated. I try reducing the number of iterations to 1,000. Now it terminates, but takes about 6 seconds. A spot of arithmetic suggests that running the full version will take a little over 190 years. There must be a better way than that! I'm a little embarrassed that I didn't spot the solution immediately (blaming Julia) and tried again in Python to see if I could get it to terminate quicker. When that didn't work I had to think again. A little further investigation with a while loop shows that in fact the dance position repeats (in the case of my input) every 48 times. After that it becomes much quicker! Oh, and it was time for a new language, so I wasted some extra time working out the quirks of [Julia][]. First, a function to evaluate a single move — for neatness, this dispatches to a dedicated function depending on the type of move, although this isn’t really necessary to solve the challenge. Ending a function name with a bang (!) is a Julia convention to indicate that it has side-effects.
function eval_move!(move, dancers) move_type = move[1] params = move[2:end] if move_type == 's' # spin eval_spin!(params, dancers) elseif move_type == 'x' # exchange eval_exchange!(params, dancers) elseif move_type == 'p' # partner swap eval_partner!(params, dancers) end end These take care of the individual moves. Parsing the parameters from a string every single time probably isn’t ideal, but as it turns out, that optimisation isn’t really necessary. Note the + 1 in eval_exchange!, which is necessary because Julia is one of those crazy languages where indexes start from 1 instead of 0. These actions are pretty nice to implement, because Julia has circshift as a builtin to rotate a list, and allows you to assign to list slices and swap values in place with a single statement. function eval_spin!(params, dancers) shift = parse(Int, params) dancers[1:end] = circshift(dancers, shift) end function eval_exchange!(params, dancers) i, j = map(x -> parse(Int, x) + 1, split(params, "/")) dancers[i], dancers[j] = dancers[j], dancers[i] end function eval_partner!(params, dancers) a, b = split(params, "/") ia = findfirst([x == a for x in dancers]) ib = findfirst([x == b for x in dancers]) dancers[ia], dancers[ib] = b, a end dance! takes a list of moves and takes the dancers once through the dance. function dance!(moves, dancers) for m in moves eval_move!(m, dancers) end end To solve part 1, we simply need to read the moves in, set up the initial positions of the dancers and run the dance through once. join is necessary to a) turn characters into length-1 strings, and b) convert the list of strings back into a single string to print out. moves = split(readchomp(STDIN), ",") dancers = collect(join(c) for c in 'a':'p') orig_dancers = copy(dancers) dance!(moves, dancers) println(join(dancers)) Part 2 requires a little more work. We run the dance through again and again until we get back to the initial position, saving the intermediate positions in a list. The list now contains every possible position available from that starting point, so we can find position 1 billion by taking 1,000,000,000 modulo the list length (plus 1 because 1-based indexing) and use that to index into the list to get the final position. dance_cycle = [orig_dancers] while dancers != orig_dancers push!(dance_cycle, copy(dancers)) dance!(moves, dancers) end println(join(dance_cycle[1_000_000_000 % length(dance_cycle) + 1])) This terminates on my laptop in about 1.6s: Brute force 0; Careful thought 1! Dueling Generators — Rust — #adventofcode Day 15 Today’s challenge introduces two pseudo-random number generators which are trying to agree on a series of numbers. We play the part of the “judge”, counting the number of times their numbers agree in the lowest 16 bits. → Full code on GitHub Ever since I used Go to solve day 3, I’ve had a hankering to try the other new kid on the memory-safe compiled language block, Rust. I found it a bit intimidating at first because the syntax wasn’t as close to the C/C++ I’m familiar with and there are quite a few concepts unique to Rust, like the use of traits. But I figured it out, so I can tick another language off my to-try list. I also implemented a version in Python for comparison: the Python version is more concise and easier to read but the Rust version runs about 10× faster. First we include the std::env “crate” which will let us get access to commandline arguments, and define some useful constants for later.
use std::env; const M: i64 = 2147483647; const MASK: i64 = 0b1111111111111111; const FACTOR_A: i64 = 16807; const FACTOR_B: i64 = 48271; gen_next generates the next number for a given generator’s sequence. gen_next_picky does the same, but for the “picky” generators, only returning values that meet their criteria. fn gen_next(factor: i64, current: i64) -> i64 { return (current * factor) % M; } fn gen_next_picky(factor: i64, current: i64, mult: i64) -> i64 { let mut next = gen_next(factor, current); while next % mult != 0 { next = gen_next(factor, next); } return next; } duel runs a single duel, and returns the number of times the generators agreed in the lowest 16 bits (found by doing a binary & with the mask defined above). Rust allows functions to be passed as parameters, so we use this to be able to run both versions of the duel using only this one function. fn duel<F, G>(n: i64, next_a: F, mut value_a: i64, next_b: G, mut value_b: i64) -> i64 where F: Fn(i64) -> i64, G: Fn(i64) -> i64, { let mut count = 0; for _ in 0..n { value_a = next_a(value_a); value_b = next_b(value_b); if (value_a & MASK) == (value_b & MASK) { count += 1; } } return count; } Finally, we read the start values from the command line and run the two duels. The expressions that begin |n| are closures (anonymous functions, often called lambdas in other languages) that we use to specify the generator functions for each duel. fn main() { let args: Vec<String> = env::args().collect(); let start_a: i64 = args[1].parse().unwrap(); let start_b: i64 = args[2].parse().unwrap(); println!( "Duel 1: {}", duel( 40000000, |n| gen_next(FACTOR_A, n), start_a, |n| gen_next(FACTOR_B, n), start_b, ) ); println!( "Duel 2: {}", duel( 5000000, |n| gen_next_picky(FACTOR_A, n, 4), start_a, |n| gen_next_picky(FACTOR_B, n, 8), start_b, ) ); } Disk Defragmentation — Haskell — #adventofcode Day 14 Today’s challenge has us helping a disk defragmentation program by identifying contiguous regions of used sectors on a 2D disk. → Full code on GitHub !!! commentary Wow, today’s challenge had a pretty steep learning curve. Day 14 was the first to directly reuse code from a previous day: the “knot hash” from day 10. I solved day 10 in Haskell, so I thought it would be easier to stick with Haskell for today as well. The first part was straightforward, but the second was pretty mind-bending in a pure functional language! I ended up solving it by implementing a [flood fill algorithm][flood]. It's recursive, which is right in Haskell's wheelhouse, but I ended up using `Data.Sequence` instead of the standard list type as its API for indexing is better. I haven't tried it, but I think it will also be a little faster than a naive list-based version. It took a looong time to figure everything out, but I had a day off work to be able to concentrate on it! A lot more imports for this solution, as we’re exercising a lot more of the standard library. module Main where import Prelude hiding (length, filter, take) import Data.Char (ord) import Data.Sequence import Data.Foldable hiding (length) import Data.Ix (inRange) import Data.Function ((&)) import Data.Maybe (fromJust, mapMaybe, isJust) import qualified Data.Set as Set import Text.Printf (printf) import System.Environment (getArgs) Also we’ll extract the key bits from day 10 into a module and import that. import KnotHash Now we define a few data types to make the code a bit more readable. 
Sector represents the state of a particular disk sector, either free, used (but unmarked) or used and marked as belonging to a given integer-labelled group. Grid is a 2D matrix of Sector, as a sequence of sequences. data Sector = Free | Used | Mark Int deriving (Eq) instance Show Sector where show Free = " ." show Used = " #" show (Mark i) = printf "%4d" i type GridRow = Seq Sector type Grid = Seq (GridRow) Some utility functions to make it easier to view the grids (which can be quite large): used for debugging but not in the finished solution. subGrid :: Int -> Grid -> Grid subGrid n = fmap (take n) . take n printRow :: GridRow -> IO () printRow row = do mapM_ (putStr . show) row putStr "\n" printGrid :: Grid -> IO () printGrid = mapM_ printRow makeKey generates the hash key for a given row. makeKey :: String -> Int -> String makeKey input n = input ++ "-" ++ show n stringToGridRow converts a binary string of ‘1’ and ‘0’ characters to a sequence of Sector values. stringToGridRow :: String -> GridRow stringToGridRow = fromList . map convert where convert x | x == '1' = Used | x == '0' = Free makeRow and makeGrid build up the grid to use based on the provided input string. makeRow :: String -> Int -> GridRow makeRow input n = stringToGridRow $ concatMap (printf "%08b") $ dense $ fullKnotHash 256 $ map ord $ makeKey input n makeGrid :: String -> Grid makeGrid input = fromList $ map (makeRow input) [0..127] Utility functions to count the number of used and free sectors, to give the solution to part 1. countEqual :: Sector -> Grid -> Int countEqual x = sum . fmap (length . filter (==x)) countUsed = countEqual Used countFree = countEqual Free Now the real meat begins! findUnmarked finds the location of the next used sector that we haven’t yet marked. It returns a Maybe value, which is Just (x, y) if there is still an unmarked block or Nothing if there’s nothing left to mark. findUnmarked :: Grid -> Maybe (Int, Int) findUnmarked g | y == Nothing = Nothing | otherwise = Just (fromJust x, fromJust y) where hasUnmarked row = isJust $ elemIndexL Used row x = findIndexL hasUnmarked g y = case x of Nothing -> Nothing Just x' -> elemIndexL Used $ index g x' floodFill implements a very simple recursive flood fill. It takes a target and replacement value and a starting location, and fills in the replacement value for every connected location that currently has the target value. We use it below to replace a connected used region with a marked region. floodFill :: Sector -> Sector -> (Int, Int) -> Grid -> Grid floodFill t r (x, y) g | inRange (0, length g - 1) x && inRange (0, length g - 1) y && elem == t = let newRow = update y r row newGrid = update x newRow g in newGrid & floodFill t r (x+1, y) & floodFill t r (x-1, y) & floodFill t r (x, y+1) & floodFill t r (x, y-1) | otherwise = g where row = g `index` x elem = row `index` y markNextGroup looks for an unmarked group and marks it if found. If no more groups are found it returns Nothing. markAllGroups then repeatedly applies markNextGroup until Nothing is returned. markNextGroup :: Int -> Grid -> Maybe Grid markNextGroup i g = case findUnmarked g of Nothing -> Nothing Just loc -> Just $ floodFill Used (Mark i) loc g markAllGroups :: Grid -> Grid markAllGroups g = markAllGroups' 1 g where markAllGroups' i g = case markNextGroup i g of Nothing -> g Just g' -> markAllGroups' (i+1) g' onlyMarks filters a grid row and returns a list of (possibly duplicated) group numbers in the row. onlyMarks :: GridRow -> [Int] onlyMarks = mapMaybe getMark .
toList where getMark Free = Nothing getMark Used = Nothing getMark (Mark i) = Just i Finally, countGroups puts all the group numbers into a set to get rid of duplicates and returns the size of the set, i.e. the total number of separate groups. countGroups :: Grid -> Int countGroups g = Set.size groupSet where groupSet = foldl' Set.union Set.empty $ fmap rowToSet g rowToSet = Set.fromList . toList . onlyMarks As always, every Haskell program needs a main function to drive the I/O and produce the actual result. main = do input <- fmap head getArgs let grid = makeGrid input used = countUsed grid marked = countGroups $ markAllGroups grid putStrLn $ "Used sectors: " ++ show used putStrLn $ "Groups: " ++ show marked Packet Scanners — Haskell — #adventofcode Day 13 Today’s challenge requires us to sneak past a firewall made up of a series of scanners. → Full code on GitHub !!! commentary I wasn’t really thinking straight when I solved this challenge. I got a solution without too much trouble, but I ended up simulating the step-by-step movement of the scanners. I finally realised that I could calculate whether or not a given scanner was safe at a given time directly with modular arithmetic, and it bugged me so much that I reimplemented the solution. Both are given below, the faster one first. First we introduce some standard library stuff and define some useful utilities. module Main where import qualified Data.Text as T import Data.Maybe (mapMaybe) strip :: String -> String strip = T.unpack . T.strip . T.pack splitOn :: String -> String -> [String] splitOn sep = map T.unpack . T.splitOn (T.pack sep) . T.pack parseScanner :: String -> (Int, Int) parseScanner s = (d, r) where [d, r] = map read $ splitOn ": " s traverseFW does all the hard work: it checks for each scanner whether or not it’s safe as we pass through, and returns a list of the severities of each time we’re caught. mapMaybe is like the standard map in many languages, but operates on a list of Haskell Maybe values, like a combined map and filter. If the value is Just x, x gets included in the returned list; if the value is Nothing, then it gets thrown away. traverseFW :: Int -> [(Int, Int)] -> [Int] traverseFW delay = mapMaybe caught where caught (d, r) = if (d + delay) `mod` (2*(r-1)) == 0 then Just (d * r) else Nothing Then the total severity of our passage through the firewall is simply the sum of each individual severity. severity :: [(Int, Int)] -> Int severity = sum . traverseFW 0 But we don’t want to know how badly we got caught, we want to know how long to wait before setting off to get through safely. findDelay tries traversing the firewall with increasing delay, and returns the delay for the first pass where we predict not getting caught. findDelay :: [(Int, Int)] -> Int findDelay scanners = head $ filter (null . flip traverseFW scanners) [0..] And finally, we put it all together and calculate and print the result. main = do scanners <- fmap (map parseScanner . 
lines) getContents putStrLn $ "Severity: " ++ (show $ severity scanners) putStrLn $ "Delay: " ++ (show $ findDelay scanners) I’m not generally bothered about performance for these challenges, but here I’ll note that my second attempt runs in a little under 2 seconds on my laptop: $ time ./13-packet-scanners-redux < 13-input.txt Severity: 1900 Delay: 3966414 ./13-packet-scanners-redux < 13-input.txt 1.73s user 0.02s system 99% cpu 1.754 total Compare that with the first, simulation-based one, which takes nearly a full minute: $ time ./13-packet-scanners < 13-input.txt Severity: 1900 Delay: 3966414 ./13-packet-scanners < 13-input.txt 57.63s user 0.27s system 100% cpu 57.902 total And for good measure, here’s the code. Notice the tick and tickOne functions, which together simulate moving all the scanners by one step; for this to work we have to track the full current state of each scanner, which is easier to read with a Haskell record-based custom data type. traverseFW is more complicated because it has to drive the simulation, but the rest of the code is mostly the same. module Main where import qualified Data.Text as T import Control.Monad (forM_) data Scanner = Scanner { depth :: Int , range :: Int , pos :: Int , dir :: Int } instance Show Scanner where show (Scanner d r p dir) = show d ++ "/" ++ show r ++ "/" ++ show p ++ "/" ++ show dir strip :: String -> String strip = T.unpack . T.strip . T.pack splitOn :: String -> String -> [String] splitOn sep str = map T.unpack $ T.splitOn (T.pack sep) $ T.pack str parseScanner :: String -> Scanner parseScanner s = Scanner d r 0 1 where [d, r] = map read $ splitOn ": " s tickOne :: Scanner -> Scanner tickOne (Scanner depth range pos dir) | pos <= 0 = Scanner depth range (pos+1) 1 | pos >= range - 1 = Scanner depth range (pos-1) (-1) | otherwise = Scanner depth range (pos+dir) dir tick :: [Scanner] -> [Scanner] tick = map tickOne traverseFW :: [Scanner] -> [(Int, Int)] traverseFW = traverseFW' 0 where traverseFW' _ [] = [] traverseFW' layer scanners@((Scanner depth range pos _):rest) -- | layer == depth && pos == 0 = (depth*range) + (traverseFW' (layer+1) $ tick rest) | layer == depth && pos == 0 = (depth,range) : (traverseFW' (layer+1) $ tick rest) | layer == depth && pos /= 0 = traverseFW' (layer+1) $ tick rest | otherwise = traverseFW' (layer+1) $ tick scanners severity :: [Scanner] -> Int severity = sum . map (uncurry (*)) . traverseFW empty :: [a] -> Bool empty [] = True empty _ = False findDelay :: [Scanner] -> Int findDelay scanners = delay where (delay, _) = head $ filter (empty . traverseFW . snd) $ zip [0..] $ iterate tick scanners main = do scanners <- fmap (map parseScanner . lines) getContents putStrLn $ "Severity: " ++ (show $ severity scanners) putStrLn $ "Delay: " ++ (show $ findDelay scanners) Digital Plumber — Python — #adventofcode Day 12 Today’s challenge has us helping a village of programs who are unable to communicate. We have a list of the the communication channels between their houses, and need to sort them out into groups such that we know that each program can communicate with others in its own group but not any others. Then we have to calculate the size of the group containing program 0 and the total number of groups. → Full code on GitHub !!! commentary This is one of those problems where I’m pretty sure that my algorithm isn’t close to being the most efficient, but it definitely works! For the sake of solving the challenge that’s all that matters, but it still bugs me. 
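As an aside, the textbook way to make this kind of grouping fast is a disjoint-set (union-find) structure, which merges groups in effectively constant time per connection. Here is a minimal Python sketch of that alternative (illustrative only, not the approach used below; it assumes the same `<->` input format that the solution parses):

```python
# Union-find sketch: read "a <-> b, c" lines and group connected programs.
import sys

parent = {}

def find(x):
    # Walk up to the root of x's group, compressing the path as we go.
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def union(a, b):
    # Link the root of a's group to the root of b's group.
    parent[find(a)] = find(b)

for line in sys.stdin:
    head, rest = line.split(' <-> ')
    for other in rest.split(', '):
        union(int(head), int(other))

roots = [find(p) for p in parent]
print("Number in group 0:", roots.count(find(0)))
print("Number of groups:", len(set(roots)))
```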
By now I’ve become used to using fileinput to transparently read data either from files given on the command-line or standard input if no arguments are given. import fileinput as fi First we make an initial pass through the input data, creating a group for each line representing the programs on that line (which can communicate with each other). We store this as a Python set. groups = [] for line in fi.input(): head, rest = line.split(' <-> ') group = set([int(head)]) group.update([int(x) for x in rest.split(', ')]) groups.append(group) Now we iterate through the groups, starting with the first, and merging any we find that overlap with our current group. i = 0 while i < len(groups): current = groups[i] Each pass through the groups brings more programs into the current group, so we have to go through and check their connections too. We make several merge passes, until we detect that no more merges took place. num_groups = len(groups) + 1 while num_groups > len(groups): j = i+1 num_groups = len(groups) This inner loop does the actual merging, and deletes each group as it’s merged in. while j < len(groups): if len(current & groups[j]) > 0: current.update(groups[j]) del groups[j] else: j += 1 i += 1 All that’s left to do now is to display the results. print("Number in group 0:", len([g for g in groups if 0 in g][0])) print("Number of groups:", len(groups)) Hex Ed — Python — #adventofcode Day 11 Today’s challenge is to help a program find its child process, which has become lost on a hexagonal grid. We need to follow the path taken by the child (given as input) and calculate the distance it is from home along with the furthest distance it has been at any point along the path. → Full code on GitHub !!! commentary I found this one quite interesting in that it was very quick to solve. In fact, I got lucky and my first quick implementation (max(abs(l)) below) gave the correct answer in spite of missing an obvious not-so-edge case. Thinking about it, there’s only a ⅓ chance that the first incorrect implementation would give the wrong answer! The code is shorter, so you get more words today. ☺ There are a number of different co-ordinate systems on a hexagonal grid (I discovered while reading up after solving it…). I intuitively went for the system known as ‘axial’ coordinates, where you pick two directions aligned to the grid as your x and y axes: note that these won’t be perpendicular. I chose ne/sw as the x axis and se/nw as y, but there are three other possible choices. That leads to the following definition for the directions, encoded as numpy arrays because that makes some of the code below neater. import numpy as np STEPS = {d: np.array(v) for d, v in [('ne', (1, 0)), ('se', (0, -1)), ('s', (-1, -1)), ('sw', (-1, 0)), ('nw', (0, 1)), ('n', (1, 1))]} hex_grid_dist, given a location l calculates the number of steps needed to reach that location from the centre at (0, 0). Notice that we can’t simply use the Manhattan distance here because, for example, one step north takes us to (1, 1), which would give a Manhattan distance of 2. 
Instead, we can see that moving in the n/s direction allows us to increment or decrement both coordinates at the same time: If the coordinates have the same sign: move n/s until one of them is zero, then move along the relevant ne or se axis back to the origin; in this case the number of steps is greatest of the absolute values of the two coordinates If the coordinates have opposite signs: move independently along the ne and se axes to reduce each to 0; this time the number of steps is the sum of the absolute values of the two coordinates def hex_grid_distance(l): if sum(np.sign(l)) == 0: # i.e. opposite signs return sum(abs(l)) else: return max(abs(l)) Now we can read in the path followed by the child and follow it ourselves, tracking the maximum distance from home along the way. path = input().strip().split(',') location = np.array((0, 0)) max_distance = 0 for step in map(STEPS.get, path): location += step max_distance = max(max_distance, hex_grid_distance(location)) distance = hex_grid_distance(location) print("Child process is at", location, "which is", distance, "steps away") print("Greatest distance was", max_distance) Knot Hash — Haskell — #adventofcode Day 10 Today’s challenge asks us to help a group of programs implement a (highly questionable) hashing algorithm that involves repeatedly reversing parts of a list of numbers. → Full code on GitHub !!! commentary I went with Haskell again today, because it’s the weekend so I have a bit more time, and I really enjoyed yesterday’s Haskell implementation. Today gave me the opportunity to explore the standard library a bit more, as well as lending itself nicely to being decomposed into smaller parts to be combined using higher-order functions. You know the drill by know: import stuff we’ll use later. module Main where import Data.Char (ord) import Data.Bits (xor) import Data.Function ((&)) import Data.List (unfoldr) import Text.Printf (printf) import qualified Data.Text as T The worked example uses a concept of the “current position” as a pointer to a location in a static list. In Haskell it makes more sense to instead use the front of the list as the current position, and rotate the whole list as we progress to bring the right element to the front. rotate :: Int -> [Int] -> [Int] rotate 0 xs = xs rotate n xs = drop n' xs ++ take n' xs where n' = n `mod` length xs The simple version of the hash requires working through the input list, modifying the working list as we go, and incrementing a “skip” counter with each step. Converting this to a functional style, we simply zip up the input with an infinite list [0, 1, 2, 3, ...] to give the counter values. Notice that we also have to calculate how far to rotate the working list to get back to its original position. foldl lets us specify a function that returns a modified version of the working list and feeds the input list in one at a time. simpleKnotHash :: Int -> [Int] -> [Int] simpleKnotHash size input = foldl step [0..size-1] input' & rotate (negate finalPos) where input' = zip input [0..] finalPos = sum $ zipWith (+) input [0..] reversePart xs n = (reverse $ take n xs) ++ drop n xs step xs (n, skip) = reversePart xs n & rotate (n+skip) The full version of the hash (part 2 of the challenge) starts the same way as the simple version, except making 64 passes instead of one: we can do this by using replicate to make a list of 64 copies, then collapse that into a single list with concat. 
fullKnotHash :: Int -> [Int] -> [Int] fullKnotHash size input = simpleKnotHash size input' where input' = concat $ replicate 64 input The next step in calculating the full hash collapses the full 256-element “sparse” hash down into 16 elements by XORing groups of 16 together. unfoldr is a nice efficient way of doing this. dense :: [Int] -> [Int] dense = unfoldr dense' where dense' [] = Nothing dense' xs = Just (foldl1 xor $ take 16 xs, drop 16 xs) The final hash step is to convert the list of integers into a hexadecimal string. hexify :: [Int] -> String hexify = concatMap (printf "%02x") These two utility functions put together building blocks from the Data.Text module to parse the input string. Note that no arguments are given: the functions are defined purely by composing other functions using the . operator. In Haskell this is referred to as “point-free” style. strip :: String -> String strip = T.unpack . T.strip . T.pack parseInput :: String -> [Int] parseInput = map (read . T.unpack) . T.splitOn (T.singleton ',') . T.pack Now we can put it all together, including building the weird input for the “full” hash. main = do input <- fmap strip getContents let simpleInput = parseInput input asciiInput = map ord input ++ [17, 31, 73, 47, 23] (a:b:_) = simpleKnotHash 256 simpleInput print $ (a*b) putStrLn $ fullKnotHash 256 asciiInput & dense & hexify Stream Processing — Haskell — #adventofcode Day 9 In today’s challenge we come across a stream that we need to cross. But of course, because we’re stuck inside a computer, it’s not water but data flowing past. The stream is too dangerous to cross until we’ve removed all the garbage, and to prove we can do that we have to calculate a score for the valid data “groups” and the number of garbage characters to remove. → Full code on GitHub !!! commentary One of my goals for this process was to knock the rust of my functional programming skills in Haskell, and I haven’t done that for the whole of the first week. Processing strings character by character and acting according to which character shows up seems like a good choice for pattern-matching though, so here we go. I also wanted to take a bash at test-driven development in Haskell, so I also loaded up the Test.Hspec module to give it a try. I did find keeping track of all the state in arguments a bit mind boggling, and I think it could have been improved through use of a data type using record syntax and the `State` monad, so that's something to look at for a future challenge. First import the extra bits we’ll need. module Main where import Test.Hspec import Data.Function ((&)) countGroups solves the first part of the problem, counting up the “score” of the valid data in the stream. countGroups' is an auxiliary function that holds some state in its arguments. We use pattern matching for the base case: [] represents the empty list in Haskell, which indicates we’ve finished the whole stream. Otherwise, we split the remaining stream into its first character and remainder, and use guards to decide how to interpret it. If skip is true, discard the character and carry on with skip set back to false. If we find a “!”, that tells us to skip the next. Other characters mark groups or sets of garbage: groups increase the score when they close and garbage is discarded. We continue to progress the list by recursing with the remainder of the stream and any updated state. 
countGroups :: String -> Int countGroups = countGroups' 0 0 False False where countGroups' score _ _ _ [] = score countGroups' score level garbage skip (c:rest) | skip = countGroups' score level garbage False rest | c == '!' = countGroups' score level garbage True rest | garbage = case c of '>' -> countGroups' score level False False rest _ -> countGroups' score level True False rest | otherwise = case c of '{' -> countGroups' score (level+1) False False rest '}' -> countGroups' (score+level) (level-1) False False rest ',' -> countGroups' score level False False rest '<' -> countGroups' score level True False rest c -> error $ "Garbage character found outside garbage: " ++ show c countGarbage works almost identically to countGroups, except it ignores groups and counts garbage. They are structured so similarly that it would probably make more sense to combine them to a single function that returns both counts. countGarbage :: String -> Int countGarbage = countGarbage' 0 False False where countGarbage' count _ _ [] = count countGarbage' count garbage skip (c:rest) | skip = countGarbage' count garbage False rest | c == '!' = countGarbage' count garbage True rest | garbage = case c of '>' -> countGarbage' count False False rest _ -> countGarbage' (count+1) True False rest | otherwise = case c of '<' -> countGarbage' count True False rest _ -> countGarbage' count False False rest Hspec gives us a domain-specific language heavily inspired by the rspec library for Ruby: the tests read almost like natural language. I built up these tests one-by-one, gradually implementing the appropriate bits of the functions above, a process known as Test-driven development. runTests = hspec $ do describe "countGroups" $ do it "counts valid groups" $ do countGroups "{}" `shouldBe` 1 countGroups "{{{}}}" `shouldBe` 6 countGroups "{{{},{},{{}}}}" `shouldBe` 16 countGroups "{{},{}}" `shouldBe` 5 it "ignores garbage" $ do countGroups "{<a>,<a>,<a>,<a>}" `shouldBe` 1 countGroups "{{<ab>},{<ab>},{<ab>},{<ab>}}" `shouldBe` 9 it "skips marked characters" $ do countGroups "{{<!!>},{<!!>},{<!!>},{<!!>}}" `shouldBe` 9 countGroups "{{<a!>},{<a!>},{<a!>},{<ab>}}" `shouldBe` 3 describe "countGarbage" $ do it "counts garbage characters" $ do countGarbage "<>" `shouldBe` 0 countGarbage "<random characters>" `shouldBe` 17 countGarbage "<<<<>" `shouldBe` 3 it "ignores non-garbage" $ do countGarbage "{{},{}}" `shouldBe` 0 countGarbage "{{<ab>},{<ab>},{<ab>},{<ab>}}" `shouldBe` 8 it "skips marked characters" $ do countGarbage "<{!>}>" `shouldBe` 2 countGarbage "<!!>" `shouldBe` 0 countGarbage "<!!!>" `shouldBe` 0 countGarbage "<{o\"i!a,<{i<a>" `shouldBe` 10 Finally, the main function reads in the challenge input and calculates the answers, printing them on standard output. main = do runTests repeat '=' & take 78 & putStrLn input <- getContents & fmap (filter (/='\n')) putStrLn $ "Found " ++ show (countGroups input) ++ " groups" putStrLn $ "Found " ++ show (countGarbage input) ++ " characters garbage" I Heard You Like Registers — Python — #adventofcode Day 8 Today’s challenge describes a simple instruction set for a CPU, incrementing and decrementing values in registers according to simple conditions. We have to interpret a stream of these instructions, and to prove that we’ve done so, give the highest value of any register, both at the end of the program and throughout the whole program. → Full code on GitHub !!! 
commentary This turned out to be a nice straightforward one to implement, as the instruction format was easily parsed by regular expression, and Python provides the eval function which made evaluating the conditions a doddle. Import various standard library bits that we’ll use later. import re import fileinput as fi from math import inf from collections import defaultdict We could just parse the instructions by splitting the string, but using a regular expression is a little bit more robust because it won’t match at all if given an invalid instruction. INSTRUCTION_RE = re.compile(r'(\w+) (inc|dec) (-?\d+) if (.+)\s*') def parse_instruction(instruction): match = INSTRUCTION_RE.match(instruction) return match.group(1, 2, 3, 4) Executing an instruction simply checks the condition and if it evaluates to True updates the relevant register. def exec_instruction(registers, instruction): name, op, value, cond = instruction value = int(value) if op == 'dec': value = -value if eval(cond, globals(), registers): registers[name] += value highest_value returns the maximum value found in any register. def highest_value(registers): return sorted(registers.items(), key=lambda x: x[1], reverse=True)[0][1] Finally, loop through all the instructions and carry them out, updating global_max as we go. We need to be able to deal with registers that haven’t been accessed before. Keeping the registers in a dictionary means that we can evaluate the conditions directly using eval above, passing it as the locals argument. The standard dict will raise an exception if we try to access a key that doesn’t exist, so instead we use collections.defaultdict, which allows us to specify what the default value for a non-existent key will be. New registers start at 0, so we use a simple lambda to define a function that always returns 0. global_max = -inf registers = defaultdict(lambda: 0) for i in map(parse_instruction, fi.input()): exec_instruction(registers, i) global_max = max(global_max, highest_value(registers)) print('Max value:', highest_value(registers)) print('All-time max:', global_max) Recursive Circus — Ruby — #adventofcode Day 7 Today’s challenge introduces a set of processes balancing precariously on top of each other. We find them stuck and unable to get down because one of the processes is the wrong size, unbalancing the whole circus. Our job is to figure out the root from the input and then find the correct weight for the single incorrect process. → Full code on GitHub !!! commentary So I didn’t really intend to take a full polyglot approach to Advent of Code, but it turns out to have been quite fun, so I made a shortlist of languages to try. Building a tree is a classic application for object-orientation using a class to represent tree nodes, and I’ve always liked the feel of Ruby’s class syntax, so I gave it a go. First make sure we have access to Set, which we’ll use later. require 'set' Now to define the CircusNode class, which represents nodes in the tree. attr :s automatically creates a function s that returns the value of the instance attribute @s class CircusNode attr :name, :weight def initialize(name, weight, children=nil) @name = name @weight = weight @children = children || [] end Add a << operator (the same syntax for adding items to a list) that adds a child to this node. def <<(c) @children << c @total_weight = nil end total_weight recursively calculates the weight of this node and everything above it. The @total_weight ||= blah idiom caches the value so we only calculate it once. 
def total_weight @total_weight ||= @weight + @children.map {|c| c.total_weight}.sum end balance_weight does the hard work of figuring out the proper weight for the incorrect node by recursively searching through the tree. def balance_weight(target=nil) by_weight = Hash.new{|h, k| h[k] = []} @children.each{|c| by_weight[c.total_weight] << c} if by_weight.size == 1 then if target return @weight - (total_weight - target) else raise ArgumentError, 'This tree seems balanced!' end else odd_one_out = by_weight.select {|k, v| v.length == 1}.first[1][0] child_target = by_weight.select {|k, v| v.length > 1}.first[0] return odd_one_out.balance_weight child_target end end A couple of utility functions for displaying trees finish off the class. def to_s "#{@name} (#{@weight})" end def print_tree(n=0) puts "#{' '*n}#{self} -> #{self.total_weight}" @children.each do |child| child.print_tree n+1 end end end build_circus takes input as a list of lists [name, weight, children]. We make two passes over this list, first creating all the nodes, then building the tree by adding children to parents. def build_circus(data) all_nodes = {} all_children = Set.new data.each do |name, weight, children| all_nodes[name] = CircusNode.new name, weight end data.each do |name, weight, children| children.each {|child| all_nodes[name] << all_nodes[child]} all_children.merge children end root_name = (all_nodes.keys.to_set - all_children).first return all_nodes[root_name] end Finally, build the tree and solve the problem! Note that we use String.to_sym to convert the node names to symbols (written in Ruby as :symbol), because they’re faster to work with in Hashes and Sets as we do above. data = readlines.map do |line| match = /(?<parent>\w+) \((?<weight>\d+)\)(?: -> (?<children>.*))?/.match line [match['parent'].to_sym, match['weight'].to_i, match['children'] ? match['children'].split(', ').map {|x| x.to_sym} : []] end root = build_circus data puts "Root node: #{root}" puts root.balance_weight Memory Reallocation — Python — #adventofcode Day 6 Today’s challenge asks us to follow a recipe for redistributing objects in memory that bears a striking resemblance to the rules of the African game Mancala. → Full code on GitHub !!! commentary When I was doing my MSci, one of our programming exercises was to write (in Haskell, IIRC) a program to play a Mancala variant called Oware, so this had a nice ring of nostalgia. Back to Python today: it's already become clear that it's by far my most fluent language, which makes sense as it's the only one I've used consistently since my schooldays. I'm a bit behind on the blog posts, so you get this one without any explanation, for now at least! import math def reallocate(mem): max_val = -math.inf size = len(mem) for i, x in enumerate(mem): if x > max_val: max_val = x max_index = i i = max_index mem[i] = 0 remaining = max_val while remaining > 0: i = (i + 1) % size mem[i] += 1 remaining -= 1 return mem def detect_cycle(mem): mem = list(mem) steps = 0 prev_states = {} while tuple(mem) not in prev_states: prev_states[tuple(mem)] = steps steps += 1 mem = reallocate(mem) return (steps, steps - prev_states[tuple(mem)]) initial_state = map(int, input().split()) print("Initial state is ", initial_state) steps, cycle = detect_cycle(initial_state) print("Steps to cycle: ", steps) print("Steps in cycle: ", cycle) A Maze of Twisty Trampolines — C++ — #adventofcode Day 5 Today’s challenge has us attempting to help the CPU escape from a maze of instructions. 
It’s not quite a Turing Machine, but it has that feeling of moving a read/write head up and down a tape acting on and changing the data found there. → Full code on GitHub !!! commentary I haven’t written anything in C++ for over a decade. It sounds like there have been lots of interesting developments in the language since then, with C++11, C++14 and the freshly finalised C++17 standards (built-in parallelism in the STL!). I won’t use any of those, but I thought I’d dust off my C++ and see what happened. Thankfully the Standard Template Library classes still did what I expected! As usual, we first include the parts of the standard library we’re going to use: iostream for input & output; vector for the container. We also declare that we’re using the std namespace, so that we don’t have to prepend vector and the other classes with std::. #include <iostream> #include <vector> using namespace std; steps_to_escape_part1 implements part 1 of the challenge: we read the offset at the current location, move forward/backward by that many steps, then add one to the offset stored at that location before repeating. The result is the number of steps we take before jumping outside the list. int steps_to_escape_part1(vector<int>& instructions) { int pos = 0, iterations = 0, new_pos; while (pos < instructions.size()) { new_pos = pos + instructions[pos]; instructions[pos]++; pos = new_pos; iterations++; } return iterations; } steps_to_escape_part2 solves part 2, which is very similar, except that an offset of three or more is decremented instead of incremented before moving on. int steps_to_escape_part2(vector<int>& instructions) { int pos = 0, iterations = 0, new_pos, offset; while (pos < instructions.size()) { offset = instructions[pos]; new_pos = pos + offset; instructions[pos] += offset >= 3 ? -1 : 1; pos = new_pos; iterations++; } return iterations; } Finally we pull it all together and link it up to the input. int main() { vector<int> instructions1, instructions2; int n; The cin stream lets us read data from standard input, which we then add to a vector of ints to give our list of instructions. while (true) { cin >> n; if (cin.eof()) break; instructions1.push_back(n); } Solving the problem modifies the input, so we need to take a copy to solve part 2 as well. Thankfully the STL makes this easy with iterators. instructions2.insert(instructions2.begin(), instructions1.begin(), instructions1.end()); Finally, compute the result and print it on standard output. cout << steps_to_escape_part1(instructions1) << endl; cout << steps_to_escape_part2(instructions2) << endl; return 0; }

High Entropy Passphrases — Python — #adventofcode Day 4 Today’s challenge describes some simple rules supposedly intended to enforce the use of secure passwords. All we have to do is test a list of passphrases and identify which ones meet the rules. → Full code on GitHub !!! commentary Fearing that today might be as time-consuming as yesterday, I returned to Python and its hugely powerful “batteries-included” standard library. Thankfully this challenge was more straightforward, and I actually finished this before finishing day 3. First, let’s import two useful utilities. from fileinput import input from collections import Counter Part 1 requires simply that a passphrase contains no repeated words. No problem: we split the passphrase into words and count them, and check if any was present more than once. Counter is an amazingly useful class to have in a language’s standard library.
All it does is count things: you add objects to it, and then it will tell you how many of a given object you have. We’re going to use it to count those potentially duplicated words. def is_valid(passphrase): counter = Counter(passphrase.split()) return counter.most_common(1)[0][1] == 1 Part 2 requires that no word in the passphrase be an anagram of any other word. Since we don’t need to do anything else with the words afterwards, we can check for anagrams by sorting the letters in each word: “leaf” and “flea” both become “aefl” and can be compared directly. Then we count as before. def is_valid_ana(passphrase): counter = Counter(''.join(sorted(word)) for word in passphrase.split()) return counter.most_common(1)[0][1] == 1 Finally we pull everything together. sum(map(boolean_func, list)) is a common idiom in Python for counting the number of times a condition (checked by boolean_func) is true. In Python, True and False can be treated as the numbers 1 and 0 respectively, so that summing a list of Boolean values gives you the number of True values in the list. lines = list(input()) print(sum(map(is_valid, lines))) print(sum(map(is_valid_ana, lines))) Spiral Memory — Go — #adventofcode Day 3 Today’s challenge requires us to perform some calculations on an “experimental memory layout”, with cells moving outwards from the centre of a square spiral (squiral?). → Full code on GitHub !!! commentary I’ve been wanting to try my hand at Go, the memory-safe, statically typed compiled language from Google for a while. Today’s challenge seemed a bit more mathematical in nature, meaning that I wouldn’t need too many advanced language features or knowledge of a standard library, so I thought I’d give it a “go”. It might have been my imagination, but it was impressive how quickly the compiled program chomped through 60 different input values while I was debugging. I actually spent far too long on this problem because my brain led me down a blind alley trying to do the wrong calculation, but I got there in the end! The solution is a bit difficult to explain without diagrams, which I don't really have time to draw right now, but fear not because several other people have. First take a look at [the challenge itself which explains the spiral memory concept](http://adventofcode.com/2017/day/3). Then look at the [nice diagrams that Phil Tooley made with Python](http://acceleratedscience.co.uk/blog/adventofcode-day-3-spiral-memory/) and hopefully you'll be able to see what's going on! It's interesting to note that this challenge also admits of an algorithmic solution instead of the mathematical one: you can model the memory as an infinite grid using a suitable data structure and literally move around it in a spiral. In hindsight this is a much better way of solving the challenge quickly because it's easier and less error-prone to code. I'm quite pleased with my maths-ing though, and it's much quicker than the algorithmic version! First some Go boilerplate: we have to define the package we’re in (main, because it’s an executable we’re producing) and import the libraries we’ll use. package main import ( "fmt" "math" "os" ) Weirdly, Go doesn’t seem to have these basic mathematics functions for integers in its standard library (please someone correct me if I’m wrong!) so I’ll define them instead of mucking about with data types. Go doesn’t do any implicit type conversion, even between numeric types, and the math builtin package only operates on float64 values. 
func abs(n int) int { if n < 0 { return -n } return n } func min(x, y int) int { if x < y { return x } return y } func max(x, y int) int { if x > y { return x } return y } This does the heavy lifting for part one: converting from a position on the spiral to a column and row in the grid. (0, 0) is the centre of the spiral. This actually does a bit more than is necessary to calculate the distance as required for part 1, but we’ll use it again for part 2. func spiral_to_xy(n int) (int, int) { if n == 1 { return 0, 0 } r := int(math.Floor((math.Sqrt(float64(n-1)) + 1) / 2)) n_r := n - (2*r-1)*(2*r-1) o := ((n_r - 1) % (2 * r)) - r + 1 sector := (n_r - 1) / (2 * r) switch sector { case 0: return r, o case 1: return -o, r case 2: return -r, -o case 3: return o, -r } return 0, 0 } Now use spiral_to_xy to calculate the Manhattan distance that the value at location n in the spiral memory are carried to reach the “access port” at 0. func distance(n int) int { x, y := spiral_to_xy(n) return abs(x) + abs(y) } This function does the opposite of spiral_to_xy, translating a grid position back to its position on the spiral. This is the one that took me far too long to figure out because I had a brain bug and tried to calculate the value s (which sector or quarter of the spiral we’re looking at) in a way that was never going to work! Fortunately I came to my senses. func xy_to_spiral(x, y int) int { if x == 0 && y == 0 { return 1 } r := max(abs(x), abs(y)) var s, o, n int if x+y > 0 && x-y >= 0 { s = 0 } else if x-y < 0 && x+y >= 0 { s = 1 } else if x+y < 0 && x-y <= 0 { s = 2 } else { s = 3 } switch s { case 0: o = y case 1: o = -x case 2: o = -y case 3: o = x } n = o + r*(2*s+1) + (2*r-1)*(2*r-1) return n } This is a utility function that uses xy_to_spiral to fetch the value at a given (x, y) location, and returns zero if we haven’t filled that location yet. func get_spiral(mem []int, x, y int) int { n := xy_to_spiral(x, y) - 1 if n < len(mem) { return mem[n] } return 0 } Finally we solve part 2 of the problem, which involves going round the spiral writing values into it that are the sum of some values already written. The result is the first of these sums that is greater than or equal to the given input value. func stress_test(input int) int { mem := make([]int, 1) n := 0 mem[0] = 1 for mem[n] < input { n++ x, y := spiral_to_xy(n + 1) mem = append(mem, get_spiral(mem, x+1, y)+ get_spiral(mem, x+1, y+1)+ get_spiral(mem, x, y+1)+ get_spiral(mem, x-1, y+1)+ get_spiral(mem, x-1, y)+ get_spiral(mem, x-1, y-1)+ get_spiral(mem, x, y-1)+ get_spiral(mem, x+1, y-1)) } return mem[n] } Now the last part of the program puts it all together, reading the input value from a commandline argument and printing the results of the two parts of the challenge: func main() { var n int fmt.Sscanf(os.Args[1], "%d", &n) fmt.Printf("Input is %d\n", n) fmt.Printf("Distance is %d\n", distance(n)) fmt.Printf("Stress test result is %d\n", stress_test(n)) } Corruption Checksum — Python — #adventofcode Day 2 Today’s challenge is to calculate a rather contrived “checksum” over a grid of numbers. → Full code on GitHub !!! commentary Today I went back to plain Python, and I didn’t do formal tests because only one test case was given for each part of the problem. I just got stuck in. I did write part 2 out in as nested `for` loops as an intermediate step to working out the generator expression. I think that expanded version may have been more readable. 
Having got that far, I couldn't then work out how to finally eliminate the need for an auxiliary function entirely without either sorting the same elements multiple times or sorting each row as it's read. First we read in the input, split it and convert it to numbers. fileinput.input() returns an iterator over the lines in all the files passed as command-line arguments, or over standard input if no files are given. from fileinput import input sheet = [[int(x) for x in l.split()] for l in input()] Part 1 of the challenge calls for finding the difference between the largest and smallest number in each row, and then summing those differences: print(sum(max(x) - min(x) for x in sheet)) Part 2 is a bit more involved: for each row we have to find the unique pair of elements that divide into each other without remainder, then sum the result of those divisions. We can make it a little easier by sorting each row; then we can take each number in turn and compare it only with the numbers after it (which are guaranteed to be larger). Doing this ensures we only make each comparison once. def rowsum_div(row): row = sorted(row) return sum(y // x for i, x in enumerate(row) for y in row[i+1:] if y % x == 0) print(sum(map(rowsum_div, sheet))) We can make this code shorter (if not easier to read) by sorting each row as it’s read: sheet = [sorted(int(x) for x in l.split()) for l in input()] Then we can just use the first and last elements in each row for part 1, as we know those are the smallest and largest respectively in the sorted row: print(sum(x[-1] - x[0] for x in sheet)) Part 2 then becomes a sum over a single generator expression: print(sum(y // x for row in sheet for i, x in enumerate(row) for y in row[i+1:] if y % x == 0)) Very satisfying! Inverse Captcha — Coconut — #adventofcode Day 1 Well, December’s here at last, and with it Day 1 of Advent of Code. … It goes on to explain that you may only leave by solving a captcha to prove you’re not a human. Apparently, you only get one millisecond to solve the captcha: too fast for a normal human, but it feels like hours to you. … As well as posting solutions here when I can, I’ll be putting them all on https://github.com/jezcope/aoc2017 too. !!! commentary After doing some challenges from last year in Haskell for a warm up, I felt inspired to try out the functional-ish Python dialect, Coconut. Now that I’ve done it, it feels a bit of an odd language, neither fish nor fowl. It’ll look familiar to any Pythonista, but is loaded with features normally associated with functional languages, like pattern matching, destructuring assignment, partial application and function composition. That makes it quite fun to work with, as it works similarly to Haskell, but because it's restricted by the basic rules of Python syntax everything feels a bit more like hard work than it should. The accumulator approach feels clunky, but it's necessary to allow [tail call elimination](https://en.wikipedia.org/wiki/Tail_call), which Coconut will do and I wanted to see in action. Lo and behold, if you take a look at the [compiled Python version](https://github.com/jezcope/aoc2017/blob/86c8100824bda1b35e5db6e02d4b80890be7a022/01-inverse-captcha.py#L675) you'll see that my recursive implementation has been turned into a non-recursive `while` loop. Then again, maybe I'm just jealous of Phil Tooley's [one-liner solution in Python](https://github.com/ptooley/aocGolf/blob/1380d78194f1258748ccfc18880cfd575baf5d37/2017.py#L8). 
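For comparison, here is a rough plain-Python sketch of the part 1 digit-matching logic (sum every digit that matches the one after it, treating the sequence as circular) written as an ordinary loop rather than an accumulator-style recursion, which is roughly what the tail-call-eliminated version boils down to. This is just my illustration with a made-up function name, not the compiled output; the actual Coconut solution follows below.

```python
def inverse_captcha_loop(digits: str) -> int:
    """Sum every digit that matches the next digit in the circular sequence."""
    total = 0
    # Pair each digit with its successor, wrapping the last digit around to the first.
    for current, following in zip(digits, digits[1:] + digits[:1]):
        if current == following:
            total += int(current)
    return total

assert inverse_captcha_loop("1122") == 3
assert inverse_captcha_loop("91212129") == 9
```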
import sys def inverse_captcha_(s, acc=0): case reiterable(s): match (|d, d|) :: rest: return inverse_captcha_((|d|) :: rest, acc + int(d)) match (|d0, d1|) :: rest: return inverse_captcha_((|d1|) :: rest, acc) return acc def inverse_captcha(s) = inverse_captcha_(s :: s[0]) def inverse_captcha_1_(s0, s1, acc=0): case (reiterable(s0), reiterable(s1)): match ((|d0|) :: rest0, (|d0|) :: rest1): return inverse_captcha_1_(rest0, rest1, acc + int(d0)) match ((|d0|) :: rest0, (|d1|) :: rest1): return inverse_captcha_1_(rest0, rest1, acc) return acc def inverse_captcha_1(s) = inverse_captcha_1_(s, s$[len(s)//2:] :: s) def test_inverse_captcha(): assert "1111" |> inverse_captcha == 4 assert "1122" |> inverse_captcha == 3 assert "1234" |> inverse_captcha == 0 assert "91212129" |> inverse_captcha == 9 def test_inverse_captcha_1(): assert "1212" |> inverse_captcha_1 == 6 assert "1221" |> inverse_captcha_1 == 0 assert "123425" |> inverse_captcha_1 == 4 assert "123123" |> inverse_captcha_1 == 12 assert "12131415" |> inverse_captcha_1 == 4 if __name__ == "__main__": sys.argv[1] |> inverse_captcha |> print sys.argv[1] |> inverse_captcha_1 |> print Advent of Code 2017: introduction It’s a common lament of mine that I don’t get to write a lot of code in my day-to-day job. I like the feeling of making something from nothing, and I often look for excuses to write bits of code, both at work and outside it. Advent of Code is a daily series of programming challenges for the month of December, and is about to start its third annual incarnation. I discovered it too late to take part in any serious way last year, but I’m going to give it a try this year. There are no restrictions on programming language (so of course some people delight in using esoteric languages like Brainf**k), but I think I’ll probably stick with Python for the most part. That said, I miss my Haskell days and I’m intrigued by new kids on the block Go and Rust, so I might end up throwing in a few of those on some of the simpler challenges. I’d like to focus a bit more on how I solve the puzzles. They generally come in two parts, with the second part only being revealed after successful completion of the first part. With that in mind, test-driven development makes a lot of sense, because I can verify that I haven’t broken the solution to the first part in modifying to solve the second. I may also take a literate programming approach with org-mode or Jupyter notebooks to document my solutions a bit more, and of course that will make it easier to publish solutions here so I’ll do that as much as I can make time for. On that note, here are some solutions for 2016 that I’ve done recently as a warmup. 
Day 1: Python Day 1 instructions import numpy as np import pytest as t import sys TURN = { 'L': np.array([[0, 1], [-1, 0]]), 'R': np.array([[0, -1], [1, 0]]) } ORIGIN = np.array([0, 0]) NORTH = np.array([0, 1]) class Santa: def __init__(self, location, heading): self.location = np.array(location) self.heading = np.array(heading) self.visited = [(0,0)] def execute_one(self, instruction): start_loc = self.location.copy() self.heading = self.heading @ TURN[instruction[0]] self.location += self.heading * int(instruction[1:]) self.mark(start_loc, self.location) def execute_many(self, instructions): for i in instructions.split(','): self.execute_one(i.strip()) def distance_from_start(self): return sum(abs(self.location)) def mark(self, start, end): for x in range(min(start[0], end[0]), max(start[0], end[0])+1): for y in range(min(start[1], end[1]), max(start[1], end[1])+1): if any((x, y) != start): self.visited.append((x, y)) def find_first_crossing(self): for i in range(1, len(self.visited)): for j in range(i): if self.visited[i] == self.visited[j]: return self.visited[i] def distance_to_first_crossing(self): crossing = self.find_first_crossing() if crossing is not None: return abs(crossing[0]) + abs(crossing[1]) def __str__(self): return f'Santa @ {self.location}, heading {self.heading}' def test_execute_one(): s = Santa(ORIGIN, NORTH) s.execute_one('L1') assert all(s.location == np.array([-1, 0])) assert all(s.heading == np.array([-1, 0])) s.execute_one('L3') assert all(s.location == np.array([-1, -3])) assert all(s.heading == np.array([0, -1])) s.execute_one('R3') assert all(s.location == np.array([-4, -3])) assert all(s.heading == np.array([-1, 0])) s.execute_one('R100') assert all(s.location == np.array([-4, 97])) assert all(s.heading == np.array([0, 1])) def test_execute_many(): s = Santa(ORIGIN, NORTH) s.execute_many('L1, L3, R3') assert all(s.location == np.array([-4, -3])) assert all(s.heading == np.array([-1, 0])) def test_distance(): assert Santa(ORIGIN, NORTH).distance_from_start() == 0 assert Santa((10, 10), NORTH).distance_from_start() == 20 assert Santa((-17, 10), NORTH).distance_from_start() == 27 def test_turn_left(): east = NORTH @ TURN['L'] south = east @ TURN['L'] west = south @ TURN['L'] assert all(east == np.array([-1, 0])) assert all(south == np.array([0, -1])) assert all(west == np.array([1, 0])) def test_turn_right(): west = NORTH @ TURN['R'] south = west @ TURN['R'] east = south @ TURN['R'] assert all(east == np.array([-1, 0])) assert all(south == np.array([0, -1])) assert all(west == np.array([1, 0])) if __name__ == '__main__': instructions = sys.stdin.read() santa = Santa(ORIGIN, NORTH) santa.execute_many(instructions) print(santa) print('Distance from start:', santa.distance_from_start()) print('Distance to target: ', santa.distance_to_first_crossing()) Day 2: Haskell Day 2 instructions module Main where data Pos = Pos Int Int deriving (Show) -- Magrittr-style pipe operator (|>) :: a -> (a -> b) -> b x |> f = f x swapPos :: Pos -> Pos swapPos (Pos x y) = Pos y x clamp :: Int -> Int -> Int -> Int clamp lower upper x | x < lower = lower | x > upper = upper | otherwise = x clampH :: Pos -> Pos clampH (Pos x y) = Pos x' y' where y' = clamp 0 4 y r = abs (2 - y') x' = clamp r (4-r) x clampV :: Pos -> Pos clampV = swapPos . clampH . swapPos buttonForPos :: Pos -> String buttonForPos (Pos x y) = [buttons !! y !! 
x] where buttons = [" D ", " ABC ", "56789", " 234 ", " 1 "] decodeChar :: Pos -> Char -> Pos decodeChar (Pos x y) 'R' = clampH $ Pos (x+1) y decodeChar (Pos x y) 'L' = clampH $ Pos (x-1) y decodeChar (Pos x y) 'U' = clampV $ Pos x (y+1) decodeChar (Pos x y) 'D' = clampV $ Pos x (y-1) decodeLine :: Pos -> String -> Pos decodeLine p "" = p decodeLine p (c:cs) = decodeLine (decodeChar p c) cs makeCode :: String -> String makeCode instructions = lines instructions -- split into lines |> scanl decodeLine (Pos 1 1) -- decode to positions |> tail -- drop start position |> concatMap buttonForPos -- convert to buttons main = do input <- getContents putStrLn $ makeCode input Research Data Management Forum 18, Manchester !!! intro "" Monday 20 and Tuesday 21 November 2017 I’m at the Research Data Management Forum in Manchester. I thought I’d use this as an opportunity to try liveblogging, so during the event some notes should appear in the box below (you may have to manually refresh your browser tab periodically to get the latest version). I've not done this before, so if the blog stops updating then it's probably because I've stopped updating it to focus on the conference instead! This was made possible using GitHub's cool [Gist](https://gist.github.com) tool. Draft content policy I thought it was about time I had some sort of content policy on here so this is a first draft. It will eventually wind up as a separate page. Feedback welcome! !!! aside “Content policy” This blog’s primary purpose is as a reflective learning tool for my own development; my aim in writing any given post is mainly to expose and develop my own thinking on a topic. My reasons for making a public blog rather than a private journal are: 1. If I'm lucky, someone smarter than me will provide feedback that will help me and my readers to learn more 2. If I'm extra lucky, someone else might learn from the material as well Each post, therefore, represents the state of my thinking at the time I wrote it, or perhaps a deliberate provocation or exaggeration; either way, if you don't know me personally please don't judge me based entirely on my past words. This is a request though, not an attempt to excuse bad behaviour on my part. I accept full responsibility for any consequences of my words, whether intended or not. I will not remove comments or ban individuals for disagreeing with me, only for behaving offensively or disrespectfully. I will do my best to be fair and balanced and explain decisions that I take, but I reserve the right to take those decisions without making any explanation at all if it seems likely to further inflame a situation. If I end up responding to anything simply with a link to this policy, that's probably all the explanation you're going to get. It should go without saying, but the opinions presented in this blog are my own and not those of my employer or anyone else I might at times represent. Learning to live with anxiety !!! intro "" This is a post that I’ve been writing for months, and writing in my head for years. For some it will explain aspects of my personality that you might have wondered about. For some it will just be another person banging on self-indulgently about so-called “mental health issues”. Hopefully, for some it will demystify some stuff and show that you’re not alone and things do get better. For as long as I can remember I’ve been a worrier. I’ve also suffered from bouts of what I now recognise as depression, on and off since my school days. 
It’s only relatively recently that I’ve come to the realisation that these two might be connected and that my ‘worrying’ might in fact be outside the normal range of healthy human behaviour and might more accurately be described as chronic anxiety. You probably won’t have noticed it, but it’s been there. More recently I’ve begun feeling like I’m getting on top of it and feeling “normal” for the first time in my life. Things I’ve found that help include: getting out of the house more and socialising with friends; and getting a range of exercise, outdoors and away from the city (rock climbing is mentally and physically engaging and open water swimming is indescribably joyful). But mostly it’s the cognitive behavioural therapy (CBT) and the antidepressants. Before I go any further, a word about drugs (“don’t do drugs, kids”): I’m on the lowest available dose of a common antidepressant. This isn’t because it stops me being sad all the time (I’m not) or because it makes all my problems go away (it really doesn’t). It’s because the scientific evidence points to a combination of CBT and antidepressants as being the single most effective treatment for generalised anxiety disorder. The reason for this is simple: CBT isn’t easy, because it asks you to challenge habits and beliefs you’ve held your whole life. In the short term there is going to be more anxiety and some antidepressants are also effective at blunting the effect of this additional anxiety. In short, CBT is what makes you better, and the drugs just make it a little bit more effective. A lot of people have misconceptions about what it means to be ‘in therapy’. I suspect a lot of these are derived from the psychoanalysis we often see portrayed in (primarily US) film and TV. The problem with that type of navel-gazing therapy is that you can spend years doing it, finally reach some sort of breakthrough insight, and still have no idea what the supposed insight means for your actual life. CBT is different in that rather than addressing feelings directly it focuses on habits in your thoughts (cognitive) and actions (behavioural) with feeling better as an outcome (therapy). CBT and related forms of therapy now have decades of clinical evidence showing that they really work. CBT uses a wide range of techniques to identify, challenge and reduce various common unhelpful thoughts and behaviours. By choosing and practicing these, you can break bad mental habits that you’ve been carrying around, often for decades. For me this means giving fair weight to my successes as well as my failings, allowing flexibility into the rigid rules that I have always, subconsciously, lived by, and being a bit kinder to myself when I make mistakes. It’s not been easy and I have to remind myself to practice this every day, but it’s really helped. !!! aside “More info” If you live in the UK, you might not be aware that you can get CBT and other psychological therapies on the NHS through a scheme called IAPT (improving access to psychological therapies). You can self-refer so you don’t need to see a doctor first, but you might want to anyway if you think medication might help. They also have a progression of treatments, so you might be offered a course of “guided self-help” and then progressed to CBT or another talking therapy if need be. This is what happened to me, and it did help a bit but it was CBT that helped me the most.

Becoming a librarian What is a librarian? Is it someone who has a masters degree in librarianship and information science?
Is it someone who looks after information for other people? Is it simply someone who works in a library? I’ve been grappling with this question a lot lately because I’ve worked in academic libraries for about 3 years now and I never really thought that’s something that might happen. People keep referring to me as “a librarian” but there’s some imposter feelings here because all the librarians around me have much more experience, have skills in areas like cataloguing and collection management and, generally, have a librarian masters degree. So I’ve been thinking about what it actually means to me to be a librarian or not. NB. some of these may be tongue-in-cheek Ways in which I am a librarian: I work in a library I help people to access and organise information I have a cat I like gin Ways in which I am not a librarian: I don’t have a librarianship qualification I don’t work with books 😉 I don’t knit (though I can probably remember how if pressed) I don’t shush people or wear my hair in a bun (I can confirm that this is also true of every librarian I know) Ways in which I am a shambrarian: I like beer I have more IT experience and qualification than librarianship At the end of the day, I still don’t know how I feel about this or, for that matter, how important it is. I’m probably going to accept whatever title people around me choose to bestow, though any label will chafe at times! Lean Libraries: applying agile practices to library services Kanban board Jeff Lasovski (via Wikimedia Commons) I’ve been working with our IT services at work quite closely for the last year as product owner for our new research data portal, ORDA. That’s been a fascinating process for me as I’ve been able to see first-hand some of the agile techniques that I’ve been reading about from time-to-time on the web over the last few years. They’re in the process of adopting a specific set of practices going under the name “Scrum”, which is fun because it uses some novel terminology that sounds pretty weird to non-IT folks, like “scrum master”, “sprint” and “product backlog”. On my small project we’ve had great success with the short cycle times and been able to build trust with our stakeholders by showing concrete progress on a regular basis. Modern librarianship is increasingly fluid, particularly in research services, and I think that to handle that fluidity it’s absolutely vital that we are able to work in a more agile way. I’m excited about the possibilities of some of these ideas. However, Scrum as implemented by our IT services doesn’t seem something that transfers directly to the work that we do: it’s too specialised for software development to adapt directly. What I intend to try is to steal some of the individual practices on an experimental basis and simply see what works and what doesn’t. The Lean concepts currently popular in IT were originally developed in manufacturing: if they can be translated from the production of physical goods to IT, I don’t see why we can’t make the ostensibly smaller step of translating them to a different type of knowledge work. I’ve therefore started reading around this subject to try and get as many ideas as possible. I’m generally pretty rubbish at taking notes from books, so I’m going to try and record and reflect on any insights I make on this blog. The framework for trying some of these out is clearly a Plan-Do-Check-Act continuous improvement cycle, so I’ll aim to reflect on that process too. 
I’m sure there will have been people implementing Lean in libraries already, so I’m hoping to be able to discover and learn from them instead of starting from scratch. Wish me luck!

Mozilla Global Sprint 2017 Photo by Lena Bell on Unsplash Every year, the Mozilla Foundation runs a two-day Global Sprint, giving people around the world 50 hours to work on projects supporting and promoting open culture and tech. Though much of the work during the sprint is, of course, technical software development work, there are always tasks suited to a wide range of different skill sets and experience levels. The participants include writers, designers, teachers, information professionals and many others. This year, for the first time, the University of Sheffield hosted a site, providing a space for local researchers, developers and others to get out of their offices, work on #mozsprint and link up with others around the world. The Sheffield site was organised by the Research Software Engineering group in collaboration with the University Library. Our site was only small compared to others, but we still had people working on several different projects. My reason for taking part in the sprint was to contribute to the international effort on the Library Carpentry project. A team spread across four continents worked throughout the whole sprint to review and develop our lesson material. As there were no other Library Carpentry volunteers at the Sheffield site, I chose to pick up some urgent work around improving the presentation of our workshops and lessons on the web and related workflows. It was a really nice subproject to work on, requiring not only cleaning up and normalising the metadata we hold on workshops and lessons, but also digesting and formalising our current ad hoc process of lesson development. The largest group were solar physicists from the School of Maths and Statistics, working on the SunPy project, an open source environment for solar data analysis. They pushed loads of bug fixes and documentation improvements, and also mentored a new contributor through their first additions to the project. Anna Krystalli from Research Software Engineering worked on the EchoBurst project, which is building a web browser extension to help people break out of their online echo chambers. It does this by using natural language processing techniques to highlight well-written, logically sound articles that disagree with the reader’s stated views on particular topics of interest. Anna was part of an effort to begin extending this technology to online videos. We had a couple of individuals simply taking the opportunity to break out of their normal work environments to work or learn, including a couple of members of library staff who showed up for a couple of hours to learn how to use git on a new project!

IDCC 2017 reflection For most of the last few years I've been lucky enough to attend the International Digital Curation Conference (IDCC). One of the main audiences attending is people who, like me, work on research data management at universities around the world and it's begun to feel like a sort of "home" conference to me. This year, IDCC was held at the Royal College of Surgeons in the beautiful city of Edinburgh.
For the last couple of years, my overall impression has been that, as a community, we're moving away from the "first-order" problem of trying to convince people (from PhD students to senior academics) to take RDM seriously and into a rich set of "second-order" problems around how to do things better and widen support to more people. This year has been no exception. Here are a few of my observations and takeaway points. Everyone has a repository now Only last year, the most common question you'd get asked by strangers in the coffee break would be "Do you have a data repository?" Now the question is more likely to be "What are you using for your data repository?", along with more subtle questions about specific components of systems and how they interact. Integrating active storage and archival systems Now that more institutions have data worth preserving, there is more interest in (and in many cases experience of) setting up more seamless integrations between active and archival storage. There are lessons here we can learn. Freezing in amber vs actively maintaining assets There seemed to be an interesting debate going on throughout the conference around the aim of preservation: should we be faithfully preserving the bits and bytes provided without trying to interpret them, or should we take a more active approach by, for example, migrating obsolete formats to newer alternatives. If the former, should we attempt to preserve the software required to access the data as well? If the latter, how much effort do we invest and how do we ensure nothing is lost or altered in the migration? Demonstrating Data Science instead of debating what it is The phrase "Data Science" was once again one of the most commonly uttered of the conference. However, there is now less abstract discussion about what, exactly, is meant by this "data science" thing; this has been replaced more by concrete demonstrations. This change was exemplified perfectly by the keynote by data scientist Alice Daish, who spent a riveting 40 minutes or so enthusing about all the cool stuff she does with data at the British Museum. Recognition of software as an issue Even as recently as last year, I've struggled to drum up much interest in discussing software sustainability and preservation at events like this; the interest was there, but there were higher priorities. So I was completely taken by surprise when we ended up with 30+ people in the Software Preservation Birds of a Feather (BoF) session, and when very little input was needed from me as chair to keep a productive discussion going for a full 90 minutes. Unashamed promotion of openness As a community we seem to have nearly overthrown our collective embarrassment about the phrase "open data" (although maybe this is just me). We've always known it was a good thing, but I know I've been a bit of an apologist in the past, feeling that I had to "soften the blow" when asking researchers to be more open. Now I feel more confident in leading with the benefits of openness, and it felt like that's a change reflected in the community more widely. Becoming more involved in the conference This year, I took a decision to try and do more to contribute to the conference itself, and I felt like this was pretty successful both in making that contribution and building up my own profile a bit. I presented a paper on one of my current passions, Library Carpentry; it felt really good to be able to share my enthusiasm. 
I presented a poster on our work integrating our data repository and digital preservation platform; this gave me more of a structure for networking during breaks, as I was able to stand by the poster and start discussions with anyone who seemed interested. I chaired a parallel session; a first for me, and a different challenge from presenting or simply attending the talks. And finally, I proposed and chaired the Software Preservation BoF session (blog post forthcoming). Renewed excitement It's weird, and possibly all in my imagination, but there seemed to be more energy at this conference than at the previous couple I've been to. More people seemed to be excited about the work we're all doing, recent achievements and the possibilities for the future. Introducing PyRefine: OpenRefine meets Python I’m knocking the rust off my programming skills by attempting to write a pure-Python interpreter for OpenRefine “scripts”. OpenRefine is a great tool for exploring and cleaning datasets prior to analysing them. It also records an undo history of all actions that you can export as a sort of script in JSON format. One thing that bugs me though is that, having spent some time interactively cleaning up your dataset, you then need to fire up OpenRefine again and do some interactive mouse-clicky stuff to apply that cleaning routine to another dataset. You can at least re-import the JSON undo history to make that as quick as possible, but there’s no getting around the fact that there’s no quick way to do it from a cold start. There is a project, BatchRefine, that extends the OpenRefine server to accept batch requests over a HTTP API, but that isn’t useful when you can’t or don’t want to keep a full Java stack running in the background the whole time. My concept is this: you use OR to explore the data interactively and design a cleaning process, but then export the process to JSON and integrate it into your analysis in Python. That way it can be repeated ad nauseam without having to fire up a full Java stack. I’m taking some inspiration from the great talk “So you want to be a wizard?" by Julia Evans (@b0rk), who recommends trying experiments as a way to learn. She gives these Rules of Programming Experiments: “it doesn’t have to be good it doesn’t have to work you have to learn something” In that spirit, my main priorities are: to see if this can be done; to see how far I can get implementing it; and to learn something. If it also turns out to be a useful thing, well, that’s a bonus. Some of the interesting possible challenges here: Implement all core operations; there are quite a lot of these, some of which will be fun (i.e. non-trivial) to implement Implement (a subset of?) GREL, the General Refine Expression Language; I guess my undergrad course on implementing parsers and compilers will come in handy after all! Generate clean, sane Python code from the JSON rather than merely executing it; more than anything, this would be a nice educational tool for users of OpenRefine who want to see how to do equivalent things in Python Selectively optimise key parts of the process; this will involve profiling the code to identify bottlenecks as well as tweaking the actual code to go faster Potentially handle contributions to the code from other people; I’d be really happy if this happened but I’m realistic… If you’re interested, the project is called PyRefine and it’s on github. Constructive criticism, issues & pull requests all welcome! 
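For the curious, here is a minimal sketch of the sort of interpreter loop I have in mind, using pandas as a stand-in data backend. It only handles two operations, and the operation names and JSON keys are written from memory rather than checked against OpenRefine's actual export format, so treat all of them as assumptions rather than a faithful description of the project.

```python
import json
import pandas as pd

def apply_history(df: pd.DataFrame, history_path: str) -> pd.DataFrame:
    """Replay a (tiny, assumed) subset of an exported OpenRefine history on a DataFrame."""
    with open(history_path) as f:
        operations = json.load(f)  # assumed: a JSON array of operation objects

    for op in operations:
        kind = op.get("op")
        if kind == "core/column-rename":  # operation and key names here are assumptions
            df = df.rename(columns={op["oldColumnName"]: op["newColumnName"]})
        elif kind == "core/column-removal":
            df = df.drop(columns=[op["columnName"]])
        else:
            raise NotImplementedError(f"Operation not implemented yet: {kind}")
    return df
```

A real implementation would need to cover the full set of core operations (and eventually GREL), which is exactly where the interesting challenges listed above begin.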
Implementing Yesterbox in emacs with mu4e I’ve been meaning to give Yesterbox a try for a while. The general idea is that each day you only deal with email that arrived yesterday or earlier. This forms your inbox for the day, hence “yesterbox”. Once you’ve emptied your yesterbox, or at least got through some minimum number (10 is recommended), then you can look at emails from today. Even then you only really want to be dealing with things that are absolutely urgent. Anything else can wait til tomorrow. The motivation for doing this is to get away from the feeling that we are King Canute, trying to hold back the tide. I find that when I’m processing my inbox toward zero there’s always a temptation to keep skipping to the new stuff that’s just come in. Hiding away the new email until I’ve dealt with the old is a very interesting idea. I use mu4e in emacs for reading my email, and handily the mu search syntax is very flexible so you’d think it would be easy to create a yesterbox filter: maildir:"/INBOX" date:..1d Unfortunately, 1d is interpreted as “24 hours ago from right now” so this filter misses everything that was sent yesterday but less than 24 hours ago. There was a feature request raised on the mu github repository to implement an additional date filter syntax but it seems to have died a death for now. In the meantime, the answer to this is to remember that my workplace observes fairly standard office hours, so that anything sent more than 9 hours ago is unlikely to have been sent today. The following does the trick: maildir:"/INBOX" date:..9h In my mu4e bookmarks list, that looks like this: (setq mu4e-bookmarks '(("flag:unread AND NOT flag:trashed" "Unread messages" ?u) ("flag:flagged maildir:/archive" "Starred messages" ?s) ("date:today..now" "Today's messages" ?t) ("date:7d..now" "Last 7 days" ?w) ("maildir:\"/Mailing lists.*\" (flag:unread OR flag:flagged)" "Unread in mailing lists" ?M) ("maildir:\"/INBOX\" date:..9h" "Yesterbox" ?y))) ;; <- this is the new one

Rewarding good practice in research From opensource.com on Flickr Whenever I’m involved in a discussion about how to encourage researchers to adopt new practices, eventually someone will come out with some variant of the following phrase: “That’s all very well, but researchers will never do XYZ until it’s made a criterion in hiring and promotion decisions.” With all the discussion of carrots and sticks I can see where this attitude comes from, and strongly empathise with it, but it raises two main problems: It’s unfair and more than a little insulting to anyone to be lumped into one homogeneous group; and Taking all the different possible XYZs into account, that’s an awful lot of hoops to expect anyone to jump through. Firstly, “researchers” are as diverse as the rest of us in terms of what gets them out of bed in the morning. Some of us want prestige; some want to contribute to a greater good; some want to create new things; some just enjoy the work. One thing I’d argue we all have in common is this: nothing is more off-putting than feeling like you’re being strongarmed into something you don’t want to do. If we rely on simplistic metrics, people will focus on those and miss the point. At best people will disengage and at worst they will actively game the system. I’ve got to do these ten things to get my next payrise, and still retain my sanity? Ok, what’s the least I can get away with and still tick them off. You see it with students taking poorly-designed assessments and grown-ups are no different.
We do need to wield carrots as well as sticks, but the whole point is that these practices are beneficial in and of themselves. The carrots are already there if we articulate them properly and clear the roadblocks (don’t you enjoy mixed metaphors?). Creating artificial benefits will just dilute the value of the real ones. Secondly, I’ve heard a similar argument made for all of the following practices and more: Research data management Open Access publishing Public engagement New media (e.g. blogging) Software management and sharing Some researchers devote every waking hour to their work, whether it’s in the lab, writing grant applications, attending conferences, authoring papers, teaching, and so on and so on. It’s hard to see how someone with all this in their schedule can find time to exercise any of these new skills, let alone learn them in the first place. And what about the people who sensibly restrict the hours taken by work to spend more time doing things they enjoy? Yes, all of the above practices are valuable, both for the individual and the community, but they’re all new (to most) and hence require more effort up front to learn. We have to accept that it’s inevitably going to take time for all of them to become “business as usual”. I think if the hiring/promotion/tenure process has any role in this, it’s in asking whether the researcher can build a coherent narrative as to why they’ve chosen to focus their efforts in this area or that. You’re not on Twitter but your data is being used by 200 research groups across the world? Great! You didn’t have time to tidy up your source code for github but your work is directly impacting government policy? Brilliant! We still need to convince more people to do more of these beneficial things, so how? Call me naïve, but maybe we should stick to making rational arguments, calming fears and providing low-risk opportunities to learn new skills. Acting (compassionately) like a stuck record can help. And maybe we’ll need to scale back our expectations in other areas (journal impact factors, anyone?) to make space for the new stuff. Software Carpentry: SC Test; does your software do what you meant? “The single most important rule of testing is to do it.” — Brian Kernighan and Rob Pike, The Practice of Programming (quote taken from SC Test page) One of the trickiest aspects of developing software is making sure that it actually does what it’s supposed to. Sometimes failures are obvious: you get completely unreasonable output or even (shock!) a comprehensible error message. But failures are often more subtle. Would you notice if your result was out by a few percent, or consistently ignored the first row of your input data? The solution to this is testing: take some simple example input with a known output, run the code and compare the actual output with the expected one. Implement a new feature, test and repeat. Sounds easy, doesn’t it? But then you implement a new bit of code. You test it and everything seems to work fine, except that your new feature required changes to existing code and those changes broke something else. So in fact you need to test everything, and do it every time you make a change. Further than that, you probably want to test that all your separate bits of code work together properly (integration testing) as well as testing the individual bits separately (unit testing). In fact, splitting your tests up like that is a good way of holding on to your sanity.
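To make that concrete, here is a minimal sketch of a unit test using Python's pytest; the function and file names are invented for illustration, not taken from any of the tools discussed here:

# test_stats.py -- run with `pytest` in the same directory
def mean(numbers):
    """Return the arithmetic mean of a sequence of numbers."""
    return sum(numbers) / len(numbers)

def test_mean_simple():
    # Known input, known expected output
    assert mean([1, 2, 3, 4]) == 2.5

def test_mean_single_value():
    # Edge case: the mean of a single value is that value
    assert mean([42]) == 42

Because pytest discovers every function whose name starts with test_, the whole suite can be re-run with a single command after each change, which is what makes testing everything every time practical.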
This is actually a lot less scary than it sounds, because there are plenty of tools now to automate that testing: you just type a simple test command and everything is verified. There are even tools that enable you to have tests run automatically when you check the code into version control, and even automatically deploy code that passes the tests, a process known as continuous integration or CI. The big problems with testing are that it’s tedious, your code seems to work without it and no-one tells you off for not doing it. At the time when the Software Carpentry competition was being run, the idea of testing wasn’t new, but the tools to help were in their infancy. “Existing tools are obscure, hard to use, expensive, don’t actually provide much help, or all three.” The SC Test category asked entrants “to design a tool, or set of tools, which will help programmers construct and maintain black box and glass box tests of software components at all levels, including functions, modules, and classes, and whole programs.” The SC Test category is interesting in that the competition administrators clearly found it difficult to specify what they wanted to see in an entry. In fact, the whole category was reopened with a refined set of rules and expectations. Ultimately, it’s difficult to tell whether this category made a significant difference. Where the tools to write tests used to be very sparse and difficult to use they are now many and several options exist for most programming languages. With this proliferation, several tried-and-tested methodologies have emerged which are consistent across many different tools, so while things still aren’t perfect they are much better. In recent years there has been a culture shift in the wider software development community towards both testing in general and test-first development, where the tests for a new feature are written first, and then the implementation is coded incrementally until all tests pass. The current challenge is to transfer this culture shift to the academic research community! Tools for collaborative markdown editing Photo by Alan Cleaver I really love Markdown1. I love its simplicity; its readability; its plain-text nature. I love that it can be written and read with nothing more complicated than a text-editor. I love how nicely it plays with version control systems. I love how easy it is to convert to different formats with Pandoc and how it’s become effectively the native text format for a wide range of blogging platforms. One frustration I’ve had recently, then, is that it’s surprisingly difficult to collaborate on a Markdown document. There are various solutions that almost work but at best feel somehow inelegant, especially when compared with rock solid products like Google Docs. Finally, though, we’re starting to see some real possibilities. Here are some of the things I’ve tried, but I’d be keen to hear about other options. 1. Just suck it up To be honest, Google Docs isn’t that bad. In fact it works really well, and has almost no learning curve for anyone who’s ever used Word (i.e. practically anyone who’s used a computer since the 90s). When I’m working with non-technical colleagues there’s nothing I’d rather use. It still feels a bit uncomfortable though, especially the vendor lock-in. You can export a Google Doc to Word, ODT or PDF, but you need to use Google Docs to do that. Plus as soon as I start working in a word processor I get tempted to muck around with formatting. 2. 
Git(hub) The obvious solution to most techies is to set up a GitHub repo, commit the document and go from there. This works very well for bigger documents written over a longer time, but seems a bit heavyweight for a simple one-page proposal, especially over short timescales. Who wants to muck around with pull requests and merging changes for a document that’s going to take 2 days to write tops? This type of project doesn’t need a bug tracker or a wiki or a public homepage anyway. Even without GitHub in the equation, using git for such a trivial use case seems clunky. 3. Markdown in Etherpad/Google Docs Etherpad is a great tool for collaborative editing, but suffers from two key problems: no syntax highlighting or preview for markdown (it’s just treated as simple text); and you need to find a server to host it or do it yourself. However, there’s nothing to stop you editing markdown with it. You can do the same thing in Google Docs, in fact, and I have. Editing a fundamentally plain-text format in a word processor just feels weird though. 4. Overleaf/Authorea Overleaf and Authorea are two products developed to support academic editing. Authorea has built-in markdown support but lacks proper simultaneous editing. Overleaf has great simultaneous editing but only supports markdown by wrapping a bunch of LaTeX boilerplate around it. Both OK but unsatisfactory. 5. StackEdit Now we’re starting to get somewhere. StackEdit has both Markdown syntax highlighting and near-realtime preview, as well as integrating with Google Drive and Dropbox for file synchronisation. 6. HackMD HackMD is one that I only came across recently, but it looks like it does exactly what I’m after: a simple markdown-aware editor with live preview that also permits simultaneous editing. I’m a little circumspect simply because I know simultaneous editing is difficult to get right, but it certainly shows promise. 7. Classeur I discovered Classeur literally today: it’s developed by the same team as StackEdit (which is now apparently no longer in development), and is currently in beta, but it looks to offer two killer features: real-time collaboration, including commenting, and pandoc-powered export to loads of different formats. Anything else? Those are the options I’ve come up with so far, but they can’t be the only ones. Is there anything I’ve missed? Other plain-text formats are available. I’m also a big fan of org-mode. ↩︎ Software Carpentry: SC Track; hunt those bugs! This competition will be an opportunity for the next wave of developers to show their skills to the world — and to companies like ours. — Dick Hardt, ActiveState (quote taken from SC Track page) All code contains bugs, and all projects have features that users would like but which aren’t yet implemented. Open source projects tend to get more of these as their user communities grow and start requesting improvements to the product. As your open source project grows, it becomes harder and harder to keep track of and prioritise all of these potential chunks of work. What do you do? The answer, as ever, is to make a to-do list. Different projects have used different solutions, including mailing lists, forums and wikis, but fairly quickly a whole separate class of software evolved: the bug tracker, which includes such well-known examples as Bugzilla, Redmine and the mighty JIRA. Bug trackers are built entirely around such requests for improvement, and typically track them through workflow stages (planning, in progress, fixed, etc.)
with scope for the community to discuss and add various bits of metadata. In this way, it becomes easier both to prioritise problems against each other and to use the hive mind to find solutions. Unfortunately most bug trackers are big, complicated beasts, more suited to large projects with dozens of developers and hundreds or thousands of users. Clearly a project of this size is more difficult to manage and requires a certain feature set, but the result is that the average bug tracker is non-trivial to set up for a small single-developer project. The SC Track category asked entrants to propose a better bug tracking system. In particular, the judges were looking for something easy to set up and configure without compromising on functionality. The winning entry was a bug-tracker called Roundup, proposed by Ka-Ping Yee. Here we have another tool which is still in active use and development today. Given that there is now a huge range of options available in this area, including the mighty github, this is no small achievement. These days, of course, github has become something of a de facto standard for open source project management. Although github is ostensibly a version control hosting platform, each repository also comes with a built-in issue tracker, which is also well-integrated with the “pull request” workflow system that allows contributors to submit bug fixes and features themselves. Github’s competitors, such as GitLab and Bitbucket, also include similar features. Not everyone wants to work in this way though, so it’s good to see that there is still a healthy ecosystem of open source bug trackers, and that Software Carpentry is still having an impact. Software Carpentry: SC Config; write once, compile anywhere Nine years ago, when I first released Python to the world, I distributed it with a Makefile for BSD Unix. The most frequent questions and suggestions I received in response to these early distributions were about building it on different Unix platforms. Someone pointed me to autoconf, which allowed me to create a configure script that figured out platform idiosyncrasies. Unfortunately, autoconf is painful to use – its grouping, quoting and commenting conventions don’t match those of the target language, which makes scripts hard to write and even harder to debug. I hope that this competition comes up with a better solution — it would make porting Python to new platforms a lot easier! — Guido van Rossum, Technical Director, Python Consortium (quote taken from SC Config page) On to the next Software Carpentry competition category, then. One of the challenges of writing open source software is that you have to make it run on a wide range of systems over which you have no control. You don’t know what operating system any given user might be using or what libraries they have installed, or even what versions of those libraries. This means that whatever build system you use, you can’t just send the Makefile (or whatever) to someone else and expect everything to go off without a hitch. For a very long time, it’s been common practice for source packages to include a configure script that, when executed, runs a bunch of tests to see what it has to work with and sets up the Makefile accordingly. Writing these scripts by hand is a nightmare, so tools like autoconf and automake evolved to make things a little easier. They did, and if the tests you want to use are already implemented they work very well indeed.
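To give a flavour of the kind of check a configure script performs, here is a rough sketch in Python (purely illustrative, not part of autoconf or any real tool): it asks the system C compiler whether a trivial program using a given header will compile, which is roughly what an autoconf feature test does under the hood. It assumes a compiler is available on the PATH as cc.

import os
import subprocess
import tempfile

def have_header(header, compiler="cc"):
    """Return True if a trivial C program including `header` compiles."""
    source = f"#include <{header}>\nint main(void) {{ return 0; }}\n"
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "conftest.c")
        with open(src, "w") as f:
            f.write(source)
        # A zero exit code from the compiler means the header was found and usable
        result = subprocess.run([compiler, src, "-o", os.path.join(tmp, "conftest")],
                                capture_output=True)
        return result.returncode == 0

print("zlib.h available:", have_header("zlib.h"))

Real configure scripts run dozens of checks like this and then write the results into the Makefile, which is where the complexity starts to pile up.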
Unfortunately they’re built on an unholy combination of shell scripting and the archaic GNU M4 macro language. That means if you want to write new tests you need to understand both of these as well as the architecture of the tools themselves — not an easy task for the average self-taught research programmer. SC Conf, then, called for a re-engineering of the autoconf concept, to make it easier for researchers to make their code available in a portable, platform-independent format. The second round configuration tool winner was SapCat, “a tool to help make software portable”. Unfortunately, this one seems not to have gone anywhere, and I could only find the original proposal on the Internet Archive. There were a lot of good ideas in this category about making catalogues and databases of system quirks to avoid having to rerun the same expensive tests again the way a standard ./configure script does. I think one reason none of these ideas survived is that they were overly ambitious, imagining a grand architecture where their tool would provide some overarching source of truth. This is in stark contrast to the way most Unix-like systems work, where each tool does one very specific job well and tools are easy to combine in various ways. In the end though, I think Moore’s Law won out here, making it easier to do the brute-force checks each time than to try anything clever to save time — a good example of avoiding unnecessary optimisation. Add to that the evolution of the generic pkg-config tool from earlier package-specific tools like gtk-config, and it’s now much easier to check for particular versions and features of common packages. On top of that, much of the day-to-day coding of a modern researcher happens in interpreted languages like Python and R, which give you a fully-functioning pre-configured environment with a lot less compiling to do. As a side note, Tom Tromey, another of the shortlisted entrants in this category, is still a major contributor to the open source world. He still seems to be involved in the automake project, contributes a lot of code to the emacs community too and blogs sporadically at The Cliffs of Inanity. Semantic linefeeds: one clause per line I’ve started using “semantic linefeeds” when writing content, a concept I discovered on Brandon Rhodes' blog, where it’s described far better than I could. It turns out this is a very old idea, promoted way back in the day by Brian W Kernighan, contributor to the original Unix system, co-creator of the AWK and AMPL programming languages and co-author of a lot of seminal programming textbooks including “The C Programming Language”. The basic idea is that you break lines at natural gaps between clauses and phrases, rather than simply after the last word before you hit 80 characters. Keeping line lengths strictly to 80 characters isn’t really necessary in these days of wide aspect ratios for screens. Breaking lines at points that make semantic sense in the sentence is really helpful for editing, especially in the context of version control, because it isolates changes to the clause in which they occur rather than just the nearest 80-character block. I also like it because it makes my crappy prose feel just a little bit more like poetry. ☺ Software Carpentry: SC Build; or making a better make Software tools often grow incrementally from small beginnings into elaborate artefacts. Each increment makes sense, but the final edifice is a mess.
make is an excellent example: a simple tool that has grown into a complex domain-specific programming language. I look forward to seeing the improvements we will get from designing the tool afresh, as a whole… — Simon Peyton-Jones, Microsoft Research (quote taken from SC Build page) Most people who have had to compile an existing software tool will have come across the venerable make tool (which usually these days means GNU Make). It allows the developer to write a declarative set of rules specifying how the final software should be built from its component parts, mostly source code, allowing the build itself to be carried out by simply typing make at the command line and hitting Enter. Given a set of rules, make will work out all the dependencies between components and ensure everything is built in the right order and nothing that is up-to-date is rebuilt. Great in principle, but make is notoriously difficult for beginners to learn, as much of the logic for how builds are actually carried out is hidden beneath the surface. This also makes it difficult to debug problems when building large projects. For these reasons, the SC Build category called for a replacement build tool engineered from the ground up to solve these problems. The second round winner, ScCons, is a Python-based make-like build tool written by Steven Knight. While I could find no evidence of any of the other shortlisted entries, this project (now renamed SCons) continues in active use and development to this day. I actually use this one myself from time to time and to be honest I prefer it in many cases to trendy new tools like rake or grunt and the behemoth that is Apache Ant. Its Python-based SConstruct file syntax is remarkably intuitive and scales nicely from very simple builds up to big and complicated projects, with good dependency tracking to avoid unnecessary recompiling. It has a lot of built-in rules for performing common build & compile tasks, but it’s trivial to add your own, either by combining existing building blocks or by writing a new builder with the full power of Python. A minimal SConstruct file looks like this: Program('hello.c') Couldn’t be simpler! And you have the full power of Python syntax to keep your build file simple and readable. It’s interesting that all the entries in this category apart from one chose to use a Python-derived syntax for describing build steps. Python was clearly already a language of choice for flexible multi-purpose computing. The exception is the entry that chose to use XML instead, which I think is a horrible idea (oh how I used to love XML!) but has been used to great effect in the Java world by tools like Ant and Maven. What happened to the original Software Carpentry? “Software Carpentry was originally a competition to design new software tools, not a training course. The fact that you didn’t know that tells you how well it worked.” When I read this in a recent post on Greg Wilson’s blog, I took it as a challenge. I actually do remember the competition, although looking at the dates it was long over by the time I found it. I believe it did have impact; in fact, I still occasionally use one of the tools it produced, so Greg’s comment got me thinking: what happened to the other competition entries? Working out what happened will need a bit of digging, as most of the relevant information is now only available on the Internet Archive. It certainly seems that by November 2008 the domain name had been allowed to lapse and had been replaced with a holding page by the registrar.
There were four categories in the competition, each representing a category of tool that the organisers thought could be improved:

SC Build: a build tool to replace make
SC Conf: a configuration management tool to replace autoconf and automake
SC Track: a bug tracking tool
SC Test: an easy-to-use testing framework

I’m hoping to be able to show that this work had a lot more impact than Greg is admitting here. I’ll keep you posted on what I find! Changing static site generators: Nanoc → Hugo I’ve decided to move the site over to a different static site generator, Hugo. I’ve been using Nanoc for a long time and it’s worked very well, but lately it’s been taking longer and longer to compile the site and throwing weird errors that I can’t get to the bottom of. At the time I started using Nanoc, static site generators were in their infancy. There weren’t the huge number of feature-loaded options that there are now, so I chose one and I built a whole load of blogging-related functionality myself. I did it in ways that made sense at the time but no longer work well with Nanoc’s latest versions. So it’s time to move to something that has blogging baked-in from the beginning and I’m taking the opportunity to overhaul the look and feel too. Again, when I started there weren’t many pre-existing themes so I built the whole thing myself and though I’m happy with the work I did on it, it never quite felt polished enough. Now I’ve got the opportunity to adapt one of the many well-designed themes already out there, so I’ve taken one from the Hugo themes gallery and tweaked the colours to my satisfaction. Hugo also has various features that I’ve wanted to implement in Nanoc but never quite got round to. The nicest one is proper handling of draft posts and future dates, but I keep finding others. There’s a lot of old content that isn’t quite compatible with the way Hugo does things so I’ve taken the old Nanoc-compiled content and frozen it to make sure that old links still work. I could probably fiddle with it for years without doing much so it’s probably time to go ahead and publish it. I’m still not completely happy with my choice of theme but one of the joys of Hugo is that I can change that whenever I want. Let me know what you think! License Except where otherwise stated, all content on eRambler by Jez Cope is licensed under a Creative Commons Attribution-ShareAlike 4.0 International license. RDM Resources I occasionally get asked for resources to help someone learn more about research data management (RDM) as a discipline (i.e. for those providing RDM support rather than simply wanting to manage their own data). I’ve therefore collected a few resources together on this page. If you’re lucky I might even update it from time to time! First, a caveat: this is very focussed on UK Higher Education, though much of it will still be relevant for people outside that narrow demographic. My general recommendation would be to start with the Digital Curation Centre (DCC) website and follow links out from there. I also have a slowly growing list of RDM links on Diigo, and there’s an RDM section in my list of blogs and feeds too.
Mailing lists Jiscmail is a popular list server run for the benefit of further and higher education in the UK; the following lists are particularly relevant: RESEARCH-DATAMAN DATA-PUBLICATION DIGITAL-PRESERVATION LIS-RESEARCHSUPPORT The Research Data Alliance have a number of Interest Groups and Working Groups that discuss issues by email Events International Digital Curation Conference — major annual conference Research Data Management Forum — roughly every six months, places are limited! RDA Plenary — also every 6 months, but only about 1 in every 3 in Europe Books In no particular order: Martin, Victoria. Demystifying eResearch: A Primer for Librarians. Libraries Unlimited, 2014. Borgman, Christine L. Big Data, Little Data, No Data: Scholarship in the Networked World. Cambridge, Massachusetts: The MIT Press, 2015. Corti, Louise, Veerle Van den Eynden, and Libby Bishop. Managing and Sharing Research Data. Thousand Oaks, CA: SAGE Publications Ltd, 2014. Pryor, Graham, ed. Managing Research Data. Facet Publishing, 2012. Pryor, Graham, Sarah Jones, and Angus Whyte, eds. Delivering Research Data Management Services: Fundamentals of Good Practice. Facet Publishing, 2013. Ray, Joyce M., ed. Research Data Management: Practical Strategies for Information Professionals. West Lafayette, Indiana: Purdue University Press, 2014. Reports ‘Ten Recommendations for Libraries to Get Started with Research Data Management’. LIBER, 24 August 2012. http://libereurope.eu/news/ten-recommendations-for-libraries-to-get-started-with-research-data-management/. ‘Science as an Open Enterprise’. Royal Society, 2 June 2012. https://royalsociety.org/policy/projects/science-public-enterprise/Report/. Auckland, Mary. ‘Re-Skilling for Research’. RLUK, January 2012. http://www.rluk.ac.uk/wp-content/uploads/2014/02/RLUK-Re-skilling.pdf. Journals International Journal of Digital Curation (IJDC) Journal of eScience Librarianship (JeSLib) Fairphone 2: initial thoughts on the original ethical smartphone I’ve had my eye on the Fairphone 2 for a while now, and when my current phone, an aging Samsung Galaxy S4, started playing up I decided it was time to take the plunge. A few people have asked for my thoughts on the Fairphone so here are a few notes. Why I bought it The thing that sparked my interest, and the main reason for buying the phone really, was the ethical stance of the manufacturer. The small Dutch company have gone to great lengths to ensure that both labour and materials are sourced as responsibly as possible. They regularly inspect the factories where the parts are made and assembled to ensure fair treatment of the workers and they source all the raw materials carefully to minimise the environmental impact and the use of conflict minerals. Another side to this ethical stance is a focus on longevity of the phone itself. This is not a product with an intentionally limited lifespan. Instead, it’s designed to be modular and as repairable as possible, by the owner themselves. Spares are available for all of the parts that commonly fail in phones (including screen and camera), and at the time of writing the Fairphone 2 is the only phone to receive 10/10 for repairability from iFixit. There are plans to allow hardware upgrades, including an expansion port on the back so that NFC or wireless charging could be added with a new case, for example. What I like So far, the killer feature for me is the dual SIM card slots.
I have both a personal and a work phone, and the latter was always getting left at home or in the office or running out of charge. Now I have both SIMs in the one phone: I can receive calls on either number, turn them on and off independently and choose which account to use when sending a text or making a call. The OS is very close to “standard” Android, which is nice, and I really don’t miss all the extra bloatware that came with the Galaxy S4. It also has twice the storage of that phone, which is hardly unique but is still nice to have. Overall, it seems like a solid, reliable phone, though it’s not going to outperform anything else at the same price point. It certainly feels nice and snappy for everything I want to use it for. I’m no mobile gamer, but there is that distant promise of upgradability on the horizon if you are. What I don’t like I only have two bugbears so far. Once or twice it’s locked up and become unresponsive, requiring a “manual reset” (removing and replacing the battery) to get going again. It also lacks NFC, which isn’t really a deal breaker, but I was just starting to make occasional use of it on the S4 (mostly experimenting with my Yubikey NEO) and it would have been nice to try out Android Pay when it finally arrives in the UK. Overall It’s definitely a serious contender if you’re looking for a new smartphone and aren’t bothered about serious mobile gaming. You do pay a premium for the ethical sourcing and modularity, but I feel that’s worth it for me. I’m looking forward to seeing how it works out as a phone. Wiring my web I’m a nut for automating repetitive tasks, so I was dead pleased a few years ago when I discovered that IFTTT let me plug different bits of the web together. I now use it for tasks such as: Syndicating blog posts to social media Creating scheduled/repeating todo items from a Google Calendar Making a note to revisit an article I’ve starred in Feedly I’d probably only be half-joking if I said that I spend more time automating things than I save not having to do said things manually. Thankfully it’s also a great opportunity to learn, and recently I’ve been thinking about reimplementing some of my IFTTT workflows myself to get to grips with how it all works. There are some interesting open source projects designed to offer a lot of this functionality, such as Huginn, but I decided to go for a simpler option for two reasons: I want to spend my time learning about the APIs of the services I use and how to wire them together, rather than learning how to use another big framework; and I only have a small Amazon EC2 server to play with and a heavy Ruby on Rails app like Huginn (plus web server) needs more memory than I have. Instead I’ve gone old-school with a little collection of individual scripts to do particular jobs. I’m using the built-in scheduling functionality of systemd, which is already part of a modern Linux operating system, to get them to run periodically. It also means I can vary the language I use to write each one depending on the needs of the job at hand and what I want to learn/feel like at the time. Currently it’s all done in Python, but I want to have a go at Lisp sometime, and there are some interesting new languages like Go and Julia that I’d like to get my teeth into as well. You can see my code on github as it develops: https://github.com/jezcope/web-plumbing. Comments and contributions are welcome (if not expected) and let me know if you find any of the code useful.
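As an example of what one of those plumbing scripts might look like, here is a hypothetical sketch (not the actual code in that repository) that polls an RSS/Atom feed with the feedparser library and prints any entries it hasn't seen before; the feed URL and state file name are placeholders:

import json
import pathlib

import feedparser  # third-party library: pip install feedparser

FEED_URL = "https://example.org/feed.xml"      # placeholder feed
SEEN_FILE = pathlib.Path("seen-entries.json")  # remembers what we've already handled

def load_seen():
    if SEEN_FILE.exists():
        return set(json.loads(SEEN_FILE.read_text()))
    return set()

def main():
    seen = load_seen()
    feed = feedparser.parse(FEED_URL)
    for entry in feed.entries:
        entry_id = entry.get("id", entry.link)
        if entry_id not in seen:
            # A real script would post to social media, file a todo, etc.
            print(f"New entry: {entry.title} -> {entry.link}")
            seen.add(entry_id)
    SEEN_FILE.write_text(json.dumps(sorted(seen)))

if __name__ == "__main__":
    main()

A systemd timer (or plain old cron) then just runs the script every few minutes.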
Image credit: xkcd #1319, Automation Data is like water, and language is like clothing I admit it: I’m a grammar nerd. I know the difference between ‘who’ and ‘whom’, and I’m proud. I used to be pretty militant, but these days I’m more relaxed. I still take joy in the mechanics of the language, but I also believe that English is defined by its usage, not by a set of arbitrary rules. I’m just as happy to abuse it as to use it, although I still think it’s important to know what rules you’re breaking and why. My approach now boils down to this: language is like clothing. You (probably) wouldn’t show up to a job interview in your pyjamas1, but neither are you going to wear a tuxedo or ballgown to the pub. Getting commas and semicolons in the right place is like getting your shirt buttons done up right. Getting it wrong doesn’t mean you’re an idiot. Everyone will know what you meant. It will affect how you’re perceived, though, and that will affect how your message is perceived. And there are former rules2 that some still enforce that are nonetheless dropping out of regular usage. There was a time when everyone in an office job wore formal clothing. Then it became acceptable just to have a blouse, or a shirt and tie. Then the tie became optional and now there are many professions where perfectly well-respected and competent people are expected to show up wearing nothing smarter than jeans and a t-shirt. One such rule IMHO is that ‘data’ is a plural and should take pronouns like ‘they’ and ‘these’. The origin of the word ‘data’ is in the Latin plural of ‘datum’, and that idea has clung on for a considerable period. But we don’t speak Latin and the English language continues to evolve: ‘agenda’ also began life as a Latin plural, but we don’t use the word ‘agendum’ any more. It’s common everyday usage to refer to data with singular pronouns like ‘it’ and ‘this’, and it’s very rare to see someone referring to a single datum (as opposed to ‘data point’ or something). If you want to get technical, I tend to think of data as a mass noun, like ‘water’ or ‘information’. It’s uncountable: talking about ‘a water’ or ‘an information’ doesn’t make much sense, but it uses singular pronouns, as in ‘this information’. If you’re interested, the Oxford English Dictionary also takes this position, while Chambers leaves the choice of singular or plural noun up to you. There is absolutely nothing wrong, in my book, with referring to data in the plural as many people still do. But it’s no longer a rule and for me it’s weakened further from guideline to preference. It’s like wearing a bow-tie to work. There’s nothing wrong with it and some people really make it work, but it’s increasingly outdated and even a little eccentric. or maybe you’d totally rock it. ↩︎ Like not starting a sentence with a conjunction… ↩︎ #IDCC16 day 2: new ideas Well, I did a great job of blogging the conference for a couple of days, but then I was hit by the bug that’s been going round and didn’t have a lot of energy for anything other than paying attention and making notes during the day! I’ve now got round to reviewing my notes so here are a few reflections on day 2. Day 2 was the day of many parallel talks! So many great and inspiring ideas to take in! Here are a few of my take-home points. Big science and the long tail The first parallel session had examples of practical data management in the real world. 
Jian Qin & Brian Dobreski (School of Information Studies, Syracuse University) worked on reproducibility with one of the research groups involved with the recent gravitational wave discovery. “Reproducibility” for this work (as with much of physics) mostly equates to computational reproducibility: tracking the provenance of the code and its input and output is key. They also found that in practice the scientists' focus was on making the big discovery, and ensuring reproducibility was seen as secondary. This goes some way to explaining why current workflows and tools don’t really capture enough metadata. Milena Golshan & Ashley Sands (Center for Knowledge Infrastructures, UCLA) investigated the use of Software-as-a-Service (SaaS, such as Google Drive, Dropbox or more specialised tools) as a way of meeting the needs of long-tail science research such as ocean science. This research is characterised by small teams, diverse data, dynamic local development of tools, local practices and difficulty disseminating data. This results in a need for researchers to be generalists, as opposed to “big science” research areas, where they can afford to specialise much more deeply. Such generalists tend to develop their own isolated workflows, which can differ greatly even within a single lab. Long-tail research also often struggles from a lack of dedicated IT support. They found that use of SaaS could help to meet these challenges, but with a high cost required to cover the needed guarantees of security and stability. Education & training This session focussed on the professional development of library staff. Eleanor Mattern (University of Pittsburgh) described the immersive training introduced to improve librarians' understanding of the data needs of their subject areas in delivering their RDM service delivery model. The participants each conducted a “disciplinary deep dive”, shadowing researchers and then reporting back to the group on their discoveries with a presentation and discussion. Liz Lyon (also University of Pittsburgh, formerly UKOLN/DCC) gave a systematic breakdown of the skills, knowledge and experience required in different data-related roles, obtained from an analysis of job adverts. She identified distinct roles of data analyst, data engineer and data journalist, and as well as each role’s distinctive skills, pinpointed common requirements of all three: Python, R, SQL and Excel. This work follows on from an earlier phase which identified an allied set of roles: data archivist, data librarian and data steward. Data sharing and reuse This session gave an overview of several specific workflow tools designed for researchers. Marisa Strong (University of California Curation Centre/California Digital Libraries) presented Dash, a highly modular tool for manual data curation and deposit by researchers. It’s built on their flexible backend, Stash, and though it’s currently optimised to deposit in their Merritt data repository it could easily be hooked up to other repositories. It captures DataCite metadata and a few other fields, and is integrated with ORCID to uniquely identify people. In a different vein, Eleni Castro (Institute for Quantitative Social Science, Harvard University) discussed some of the ways that Harvard’s Dataverse repository is streamlining deposit by enabling automation. It provides a number of standardised endpoints such as OAI-PMH for metadata harvest and SWORD for deposit, as well as custom APIs for discovery and deposit. 
Interesting use cases include: An addon for the Open Science Framework to deposit in Dataverse via SWORD An R package to enable automatic deposit of simulation and analysis results Integration with publisher workflows Open Journal Systems A growing set of visualisations for deposited data In the future they’re also looking to integrate with DMPtool to capture data management plans and with Archivematica for digital preservation. Andrew Treloar (Australian National Data Service) gave us some reflections on the ANDS “applications programme”, a series of 25 small funded projects intended to address the fourth of their strategic transformations, single use → reusable. He observed that essentially these projects worked because they were able to throw money at a problem until they found a solution: not very sustainable. Some of them stuck to a traditional “waterfall” approach to project management, resulting in “the right solution 2 years late”. Every researcher’s needs are “special” and communities are still constrained by old ways of working. The conclusions from this programme were that: “Good enough” is fine most of the time Adopt/Adapt/Augment is better than Build Existing toolkits let you focus on the 10% functionality that’s missing Successful projects involved research champions who can: 1) articulate their community’s requirements; and 2) promote project outcomes Summary All in all, it was a really exciting conference, and I’ve come home with loads of new ideas and plans to develop our services at Sheffield. I noticed a continuation of some of the trends I spotted at last year’s IDCC, especially an increasing focus on “second-order” problems: we’re no longer spending most of our energy just convincing researchers to take data management seriously and are able to spend more time helping them to do it better and get value out of it. There’s also a shift in emphasis (identified by closing speaker Cliff Lynch) from sharing to reuse, and making sure that data is not just available but valuable. #IDCC16 Day 1: Open Data The main conference opened today with an inspiring keynote by Barend Mons, Professor in Biosemantics, Leiden University Medical Center. The talk had plenty of great stuff, but two points stood out for me. First, Prof Mons described a newly discovered link between Huntington’s Disease and a previously unconsidered gene. No-one had previously recognised this link, but on mining the literature, an indirect link was identified in more than 10% of the roughly 1 million scientific claims analysed. This is knowledge for which we already had more than enough evidence, but which could never have been discovered without such a wide-ranging computational study. Second, he described a number of behaviours which should be considered “malpractice” in science: Relying on supplementary data in articles for data sharing: the majority of this is trash (paywalled, embedded in bitmap images, missing) Using the Journal Impact Factor to evaluate science and ignoring altmetrics Not writing data stewardship plans for projects (he prefers this term to “data management plan”) Obstructing tenure for data experts by assuming that all highly-skilled scientists must have a long publication record A second plenary talk from Andrew Sallons of the Centre for Open Science introduced a number of interesting-looking bits and bobs, including the Transparency & Openness Promotion (TOP) Guidelines which set out a pathway to help funders, publishers and institutions move towards more open science.
The rest of the day was taken up with a panel on open data, a poster session, some demos and a birds-of-a-feather session on sharing sensitive/confidential data. There was a great range of posters, but a few that stood out to me were: Lessons learned about ISO 16363 (“Audit and certification of trustworthy digital repositories”) certification from the British Library Two separate posters (from the Universities of Toronto and Colorado) about disciplinary RDM information & training for liaison librarians A template for sharing psychology data developed by a psychologist-turned-information researcher from Carnegie Mellon University More to follow, but for now it’s time for the conference dinner! #IDCC16 Day 0: business models for research data management I’m at the International Digital Curation Conference 2016 (#IDCC16) in Amsterdam this week. It’s always a good opportunity to pick up some new ideas and catch up with colleagues from around the world, and I always come back full of new possibilities. I’ll try and do some more reflective posts after the conference but I thought I’d do some quick reactions while everything is still fresh. Monday and Thursday are pre- and post-conference workshop days, and today I attended Developing Research Data Management Services. Joy Davidson and Jonathan Rans from the Digital Curation Centre (DCC) introduced us to the Business Model Canvas, a template for designing a business model on a single sheet of paper. The model prompts you to think about all of the key facets of a sustainable, profitable business, and can easily be adapted to the task of building a service model within a larger institution. The DCC used it as part of the Collaboration to Clarify Curation Costs (4C) project, whose output the Curation Costs Exchange is also worth a look. It was a really useful exercise to be able to work through the whole process for an aspect of research data management (my table focused on training & guidance provision), both because of the ideas that came up and also the experience of putting the framework into practice. It seems like a really valuable tool and I look forward to seeing how it might help us with our RDM service development. Tomorrow the conference proper begins, with a range of keynotes, panel sessions and birds-of-a-feather meetings so hopefully more then! About me I help people in Higher Education communicate and collaborate more effectively using technology. I currently work at the University of Sheffield focusing on research data management policy, practice, training and advocacy. In my free time, I like to: run; play the accordion; morris dance; climb; cook; read (fiction and non-fiction); write. Better Science Through Better Data #scidata17 Better Science through Better Doughnuts (photo: Jez Cope) Update: fixed the link to the slides so it works now! Last week I had the honour of giving my first ever keynote talk, at an event entitled Better Science Through Better Data hosted jointly by Springer Nature and the Wellcome Trust. It was nerve-wracking but exciting and seemed to go down fairly well. I even got accidentally awarded a PhD in the programme — if only it was that easy! The slides for the talk, “Supporting Open Research: The role of an academic library”, are available online (doi:10.15131/shef.data.5537269), and the whole event was video’d for posterity and viewable online. I got some good questions too, mainly from the clever online question system.
I didn’t get to answer all of them, so I’m thinking of doing a blog post or two to address a few more. There were loads of other great presentations as well, both keynotes and 7-minute lightning talks, so I’d encourage you to take a look at at least some of it. I’ll pick out a few of my highlights. Dr Aled Edwards (University of Toronto) There’s a major problem with science funding that I hadn’t really thought about before. The available funding pool for research is divided up into pots by country, and often by funding body within a country. Each of these pots has robust processes to award funding to the most important problems and most capable researchers. The problem comes because there is no coordination between these pots, so researchers all over the world end up getting funded to research the most popular problems, leading to a lot of duplication of effort. Industry funding suffers from a similar problem, particularly the pharmaceutical industry. Because there is no sharing of data or negative results, multiple companies spend billions researching the same dead ends chasing after the same drugs. This is where the astronomical costs of drug development come from. Dr Edwards presented one alternative, modelled by a company called M4K Pharma. The idea is to use existing IP laws to try and give academic researchers a reasonable, morally-justifiable and sustainable profit on drugs they develop, in contrast to the current model where basic research is funded by governments while large corporations hoover up as much profit as they possibly can. This new model would develop drugs all the way to human trial within academia, then license the resulting drugs to companies to manufacture with a price cap to keep the medicines affordable to all who need them. Core to this effort is openness with data, materials and methodology, and Dr Edwards presented several examples of how this approach benefited academic researchers, industry and patients compared with a closed, competitive focus. Dr Kirstie Whitaker (Alan Turing Institute) This was a brilliant presentation: a practical how-to guide to doing reproducible research, from one researcher to another. I suggest you take a look at her slides yourself: Showing your working: a how-to guide to reproducible research. Dr Whitaker briefly addressed a number of common barriers to reproducible research: Is not considered for promotion: so it should be! Held to higher standards than others: reviewers should be discouraged from nitpicking just because the data/code/whatever is available (true unbiased peer review of these would be great though) Publication bias towards novel findings: it is morally wrong to not publish reproductions, replications etc. so we need to address the common taboo on doing so Plead the 5th: if you share, people may find flaws, but if you don’t they can’t — if you’re worried about this you should ask yourself why! Support additional users: some (much?) of the burden should reasonably fall on the reuser, not the sharer Takes time: this is only true if you hack it together after the fact; if you do it from the start, the whole process will be quicker!
Requires additional skills: important to provide training, but also to judge PhD students on their ability to do this, not just on their thesis & papers The rest of the presentation, the “how-to” guide of the title, was a well-chosen and passionately delivered set of recommendations, but the thing that really stuck out for me is how good Dr Whitaker is at making the point that you only have to do one of these things to improve the quality of your research. It’s easy to get the impression at the moment that you have to be fully, perfectly open or not at all, but it’s actually OK to get there one step at a time, or even not to go all the way at all! Anyway, I think this is a slide deck that speaks for itself, so I won’t say any more! Lightning talk highlights There was plenty of good stuff in the lightning talks, which were constrained to 7 minutes each, but a few of the things that stood out for me were, in no particular order: Code Ocean — share and run code in the cloud dat project — peer-to-peer data synchronisation tool Can automate metadata creation, data syncing, versioning Set up a secure data sharing network that keeps the data in sync but off the cloud Berlin Institute of Health — open science course for students Pre-print paper Course materials InterMine — taking the pain out of data cleaning & analysis Nix/NixOS as a component of a reproducible paper BoneJ (ImageJ plugin for bone analysis) — developed by a scientist, used a lot, now has a Wellcome-funded RSE to develop next version ESASky — amazing live, online archive of masses of astronomical data Coda I really enjoyed the event (and the food was excellent too). My thanks go out to: The programme committee for asking me to come and give my take — I hope I did it justice! The organising team who did a brilliant job of keeping everything running smoothly before and during the event The University of Sheffield for letting me get away with doing things like this! Blog platform switch I’ve just switched my blog over to the Nikola static site generator. Hopefully you won’t notice a thing, but there might be a few weird spectres around til I get all the kinks ironed out. I’ve made the switch for a couple of main reasons: Nikola supports Jupyter notebooks as a source format for blog posts, which will be useful for including code snippets It’s written in Python, a language which I actually know, so I’m more likely to be able to fix things that break, customise it and potentially contribute to the open source project (by contrast, Hugo is written in Go, which I’m not really familiar with) Chat rooms vs Twitter: how I communicate now CC0, Pixabay This time last year, Brad Colbow published a comic in his “The Brads” series entitled “The long slow death of Twitter”. It really encapsulates the way I’ve been feeling about Twitter for a while now. Go ahead and take a look. I’ll still be here when you come back. According to my Twitter profile, I joined in February 2009 as user #20,049,102. It was nearing its 3rd birthday and, though there were clearly a lot of people already signed up at that point, it was still relatively quiet, especially in the UK. I was a lonely PhD student just starting to get interested in educational technology, and one thing that Twitter had in great supply was (and still is) people pushing back the boundaries of what tech can do in different contexts.
Somewhere along the way Twitter got really noisy, partly because more people (especially commercial companies) are using it more to talk about stuff that doesn’t interest me, and partly because I now follow 1,200+ people and find I get several tweets a second at peak times, which no-one could be expected to handle. More recently I’ve found my attention drawn to more focussed communities instead of that big old shouting match. I find I’m much more comfortable discussing things and asking questions in small focussed communities because I know who might be interested in what. If I come across an article about a cool new Python library, I’ll geek out about it with my research software engineer friends; if I want advice on an aspect of my emacs setup, I’ll ask a bunch of emacs users. I feel like I’m talking to people who want to hear what I’m saying. Next to that experience, Twitter just feels like standing on a street corner shouting. IRC channels (mostly on Freenode), and similar things like Slack and gitter form the bulk of this for me, along with a growing number of WhatsApp group chats. Although online chat is theoretically a synchronous medium, I find that I can treat it more as “semi-synchronous”: I can have real-time conversations as they arise, but I can also close them and tune back in later to catch up if I want. Now I come to think about it, this is how I used to treat Twitter before the 1,200 follows happened. I also find I visit a handful of forums regularly, mostly of the Reddit link-sharing or StackExchange Q&A type. /r/buildapc was invaluable when I was building my latest box, /r/EarthPorn (very much not NSFW) is just beautiful. I suppose the risk of all this is that I end up reinforcing my own echo chamber. I’m not sure how to deal with that, but I certainly can’t deal with it while also suffering from information overload. Not just certifiable… A couple of months ago, I went to Oxford for an intensive, 2-day course run by Software Carpentry and Data Carpentry for prospective new instructors. I’ve now had confirmation that I’ve completed the checkout procedure so it’s official: I’m now a certified Data Carpentry instructor! As far as I’m aware, the certification process is now combined, so I’m also approved to teach Software Carpentry material too. And of course there’s Library Carpentry too… SSI Fellowship 2020 I’m honoured and excited to be named one of this year’s Software Sustainability Institute Fellows. There’s not much to write about yet because it’s only just started, but I’m looking forward to sharing more with you. In the meantime, you can take a look at the 2020 fellowship announcement and get an idea of my plans from my application video: Talks Here is a selection of talks that I’ve given. ethereum-org-4312 ---- ERC-721 Non-Fungible Token Standard | ethereum.org
ERC-721 Non-Fungible Token Standard Introduction What is a Non-Fungible Token? A Non-Fungible Token (NFT) is used to identify something or someone in a unique way. This type of Token is perfect to be used on platforms that offer collectible items, access keys, lottery tickets, numbered seats for concerts and sports matches, etc. This special type of Token has amazing possibilities, so it deserves a proper Standard; the ERC-721 came to solve that! What is ERC-721? The ERC-721 introduces a standard for NFTs; in other words, this type of Token is unique and can have a different value from another Token in the same Smart Contract, maybe due to its age, rarity or even something else like its visual appearance. Wait, visual? Yes! All NFTs have a uint256 variable called tokenId, so for any ERC-721 Contract, the pair contract address, uint256 tokenId must be globally unique. That said, a dApp can have a "converter" that uses the tokenId as input and outputs an image of something cool, like zombies, weapons, skills or amazing kitties! Prerequisites Accounts Smart Contracts Token standards Body The ERC-721 (Ethereum Request for Comments 721), proposed by William Entriken, Dieter Shirley, Jacob Evans, Nastassia Sachs in January 2018, is a Non-Fungible Token Standard that implements an API for tokens within Smart Contracts. It provides functionality to transfer tokens from one account to another, to get the current token balance of an account, to get the owner of a specific token, and to get the total supply of tokens available on the network. Besides these, it also has some other functionality, like approving that a token owned by one account can be moved by a third-party account. If a Smart Contract implements the following methods and events it can be called an ERC-721 Non-Fungible Token Contract and, once deployed, it will be responsible to keep track of the created tokens on Ethereum.
From EIP-721: Methods 1 function balanceOf(address _owner) external view returns (uint256); 2 function ownerOf(uint256 _tokenId) external view returns (address); 3 function safeTransferFrom(address _from, address _to, uint256 _tokenId, bytes data) external payable; 4 function safeTransferFrom(address _from, address _to, uint256 _tokenId) external payable; 5 function transferFrom(address _from, address _to, uint256 _tokenId) external payable; 6 function approve(address _approved, uint256 _tokenId) external payable; 7 function setApprovalForAll(address _operator, bool _approved) external; 8 function getApproved(uint256 _tokenId) external view returns (address); 9 function isApprovedForAll(address _owner, address _operator) external view returns (bool); 10 Show all Copy Events 1 event Transfer(address indexed _from, address indexed _to, uint256 indexed _tokenId); 2 event Approval(address indexed _owner, address indexed _approved, uint256 indexed _tokenId); 3 event ApprovalForAll(address indexed _owner, address indexed _operator, bool _approved); 4 Copy Examples Let's see how a Standard is so important to make things simple for us to inspect any ERC-721 Token Contract on Ethereum. We just need the Contract Application Binary Interface (ABI) to create an interface to any ERC-721 Token. As you can see below we will use a simplified ABI, to make it a low friction example. Web3.py Example First, make sure you have installed Web3.py Python library: 1$ pip install web3 2 1from web3 import Web3 2from web3.utils.events import get_event_data 3 4 5w3 = Web3(Web3.HTTPProvider("https://cloudflare-eth.com")) 6 7ck_token_addr = "0x06012c8cf97BEaD5deAe237070F9587f8E7A266d" # CryptoKitties Contract 8 9acc_address = "0xb1690C08E213a35Ed9bAb7B318DE14420FB57d8C" # CryptoKitties Sales Auction 10 11# This is a simplified Contract Application Binary Interface (ABI) of an ERC-721 NFT Contract. 
12# It will expose only the methods: balanceOf(address), name(), ownerOf(tokenId), symbol(), totalSupply() 13simplified_abi = [ 14 { 15 'inputs': [{'internalType': 'address', 'name': 'owner', 'type': 'address'}], 16 'name': 'balanceOf', 17 'outputs': [{'internalType': 'uint256', 'name': '', 'type': 'uint256'}], 18 'payable': False, 'stateMutability': 'view', 'type': 'function', 'constant': True 19 }, 20 { 21 'inputs': [], 22 'name': 'name', 23 'outputs': [{'internalType': 'string', 'name': '', 'type': 'string'}], 24 'stateMutability': 'view', 'type': 'function', 'constant': True 25 }, 26 { 27 'inputs': [{'internalType': 'uint256', 'name': 'tokenId', 'type': 'uint256'}], 28 'name': 'ownerOf', 29 'outputs': [{'internalType': 'address', 'name': '', 'type': 'address'}], 30 'payable': False, 'stateMutability': 'view', 'type': 'function', 'constant': True 31 }, 32 { 33 'inputs': [], 34 'name': 'symbol', 35 'outputs': [{'internalType': 'string', 'name': '', 'type': 'string'}], 36 'stateMutability': 'view', 'type': 'function', 'constant': True 37 }, 38 { 39 'inputs': [], 40 'name': 'totalSupply', 41 'outputs': [{'internalType': 'uint256', 'name': '', 'type': 'uint256'}], 42 'stateMutability': 'view', 'type': 'function', 'constant': True 43 }, 44] 45 46ck_extra_abi = [ 47 { 48 'inputs': [], 49 'name': 'pregnantKitties', 50 'outputs': [{'name': '', 'type': 'uint256'}], 51 'payable': False, 'stateMutability': 'view', 'type': 'function', 'constant': True 52 }, 53 { 54 'inputs': [{'name': '_kittyId', 'type': 'uint256'}], 55 'name': 'isPregnant', 56 'outputs': [{'name': '', 'type': 'bool'}], 57 'payable': False, 'stateMutability': 'view', 'type': 'function', 'constant': True 58 } 59] 60 61ck_contract = w3.eth.contract(address=w3.toChecksumAddress(ck_token_addr), abi=simplified_abi+ck_extra_abi) 62name = ck_contract.functions.name().call() 63symbol = ck_contract.functions.symbol().call() 64kitties_auctions = ck_contract.functions.balanceOf(acc_address).call() 65print(f"{name} [{symbol}] NFTs in Auctions: {kitties_auctions}") 66 67pregnant_kitties = ck_contract.functions.pregnantKitties().call() 68print(f"{name} [{symbol}] NFTs Pregnants: {pregnant_kitties}") 69 70# Using the Transfer Event ABI to get info about transferred Kitties. 71tx_event_abi = { 72 'anonymous': False, 73 'inputs': [ 74 {'indexed': False, 'name': 'from', 'type': 'address'}, 75 {'indexed': False, 'name': 'to', 'type': 'address'}, 76 {'indexed': False, 'name': 'tokenId', 'type': 'uint256'}], 77 'name': 'Transfer', 78 'type': 'event' 79} 80 81# We need the event's signature to filter the logs 82event_signature = w3.sha3(text="Transfer(address,address,uint256)").hex() 83 84logs = w3.eth.getLogs({ 85 "fromBlock": w3.eth.blockNumber - 120, 86 "address": w3.toChecksumAddress(ck_token_addr), 87 "topics": [event_signature] 88}) 89 90# Notes: 91# - 120 blocks is the max range for CloudFlare Provider 92# - If you didn't find any Transfer event you can also try to get a tokenId at: 93# https://etherscan.io/address/0x06012c8cf97BEaD5deAe237070F9587f8E7A266d#events 94# Click to expand the event's logs and copy its "tokenId" argument 95 96recent_tx = [get_event_data(tx_event_abi, log)["args"] for log in logs] 97 98kitty_id = recent_tx[0]['tokenId'] # Paste the "tokenId" here from the link above 99is_pregnant = ck_contract.functions.isPregnant(kitty_id).call() 100print(f"{name} [{symbol}] NFTs {kitty_id} is pregnant: {is_pregnant}") 101 Show all Copy CryptoKitties Contract has some interesting Events other than the Standard ones. 
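Before looking at those, one hedged aside on the example above: the simplified tx_event_abi declares from/to/tokenId as not indexed, which is what that particular decoding assumes. Contracts that follow the final standard declare all three Transfer parameters as indexed (see the Events listing earlier on this page), in which case the decoding ABI would look more like this sketch and the values arrive as log topics rather than in the data field.

```python
from web3 import Web3

w3 = Web3(Web3.HTTPProvider("https://cloudflare-eth.com"))

# Event ABI for a contract that implements the final ERC-721 standard,
# where from, to, and tokenId are all indexed.
standard_transfer_abi = {
    'anonymous': False,
    'inputs': [
        {'indexed': True, 'name': 'from', 'type': 'address'},
        {'indexed': True, 'name': 'to', 'type': 'address'},
        {'indexed': True, 'name': 'tokenId', 'type': 'uint256'}],
    'name': 'Transfer',
    'type': 'event'
}

# The topic filter is built exactly as before; only the decoding ABI changes.
event_signature = w3.sha3(text="Transfer(address,address,uint256)").hex()
print(event_signature)
```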
Let's check two of them, Pregnant and Birth. 1# Using the Pregnant and Birth Events ABI to get info about new Kitties. 2ck_extra_events_abi = [ 3 { 4 'anonymous': False, 5 'inputs': [ 6 {'indexed': False, 'name': 'owner', 'type': 'address'}, 7 {'indexed': False, 'name': 'matronId', 'type': 'uint256'}, 8 {'indexed': False, 'name': 'sireId', 'type': 'uint256'}, 9 {'indexed': False, 'name': 'cooldownEndBlock', 'type': 'uint256'}], 10 'name': 'Pregnant', 11 'type': 'event' 12 }, 13 { 14 'anonymous': False, 15 'inputs': [ 16 {'indexed': False, 'name': 'owner', 'type': 'address'}, 17 {'indexed': False, 'name': 'kittyId', 'type': 'uint256'}, 18 {'indexed': False, 'name': 'matronId', 'type': 'uint256'}, 19 {'indexed': False, 'name': 'sireId', 'type': 'uint256'}, 20 {'indexed': False, 'name': 'genes', 'type': 'uint256'}], 21 'name': 'Birth', 22 'type': 'event' 23 }] 24 25# We need the event's signature to filter the logs 26ck_event_signatures = [ 27 w3.sha3(text="Pregnant(address,uint256,uint256,uint256)").hex(), 28 w3.sha3(text="Birth(address,uint256,uint256,uint256,uint256)").hex(), 29] 30 31# Here is a Pregnant Event: 32# - https://etherscan.io/tx/0xc97eb514a41004acc447ac9d0d6a27ea6da305ac8b877dff37e49db42e1f8cef#eventlog 33pregnant_logs = w3.eth.getLogs({ 34 "fromBlock": w3.eth.blockNumber - 120, 35 "address": w3.toChecksumAddress(ck_token_addr), 36 "topics": [ck_extra_events_abi[0]] 37}) 38 39recent_pregnants = [get_event_data(ck_extra_events_abi[0], log)["args"] for log in pregnant_logs] 40 41# Here is a Birth Event: 42# - https://etherscan.io/tx/0x3978028e08a25bb4c44f7877eb3573b9644309c044bf087e335397f16356340a 43birth_logs = w3.eth.getLogs({ 44 "fromBlock": w3.eth.blockNumber - 120, 45 "address": w3.toChecksumAddress(ck_token_addr), 46 "topics": [ck_extra_events_abi[1]] 47}) 48 49recent_births = [get_event_data(ck_extra_events_abi[1], log)["args"] for log in birth_logs] 50 Show all Copy Popular NFTs Etherscan NFT Tracker list the top NFT on Ethereum by tranfers volume. CryptoKitties is a game centered around breedable, collectible, and oh-so-adorable creatures we call CryptoKitties. Sorare is a global fantasy football game where you can collect limited editions collectibles, manage your teams and compete to earn prizes. The Ethereum Name Service (ENS) offers a secure & decentralised way to address resources both on and off the blockchain using simple, human-readable names. Unstoppable Domains is a San Francisco-based company building domains on blockchains. Blockchain domains replace cryptocurrency addresses with human-readable names and can be used to enable censorship-resistant websites. Gods Unchained Cards is a TCG on the Ethereum blockchain that uses NFT's to bring real ownership to in-game assets. Further reading EIP-721: ERC-721 Non-Fungible Token Standard OpenZeppelin - ERC-721 Docs OpenZeppelin - ERC-721 Implementation Back to top ↑ Did this page help answer your question? YesNo PreviousERC-20: Fungible Tokens NextOracles Edit page On this page Introduction Prerequisites Body Examples Popular NFTs Further reading Website last updated: April 27, 2021 Use Ethereum Ethereum Wallets Get ETH Decentralized applications (dapps) Stablecoins Stake ETH Learn What is Ethereum? What is ether (ETH)? 
Community guides and resources History of Ethereum Ethereum Whitepaper Ethereum 2.0 Ethereum Glossary Ethereum Improvement Proposals Developers Get started Documentation Tutorials Learn by coding Set up local environment Developer Resources Ecosystem Ethereum Community Ethereum Foundation Ethereum Foundation Blog Ecosystem Support Program Ecosystem Grant Programs Ethereum Brand Assets Devcon Enterprise Mainnet Ethereum Private Ethereum Enterprise About ethereum.org About us Jobs Contributing Language Support Privacy policy Terms of Use Cookie Policy Contact evergreen-ils-org-1147 ---- Evergreen Downloads – Evergreen ILS Skip to content Evergreen – Open Source Library Software Evergreen – Open Source Library Software About Us Overview Annual Reports F.A.Q. Evergreen Event Code of Conduct Software Freedom Conservancy Project Governance Trademark Policy Documentation Official Documentation Documentation Interest Group Evergreen Roadmap Evergreen Wiki Tabular Release Notes Get Involved! Get Involved! Committees & Interest Groups Communications Mailing Lists IRC Calendar Blog Jobs Proposed Development Projects Merchandise T-shirts and more Conference All Conferences 2021 Evergreen International Online Conference 2020 Evergreen International Online Conference Event Photography Policy Code of Conduct Downloads Evergreen Downloads OpenSRF Downloads Home » Evergreen Downloads Evergreen Downloads Evergreen Downloads Evergreen depends on the following technologies Perl, C, JavaScript, XML, XPath, XSLT, XMPP, OpenSRF, Apache, mod_perl, and PostgreSQL. The latest stable release of a supported Linux distribution is recommended for an Evergreen installation. For Ubuntu, please use the 18.04 64-bit LTS (long term support) Server release. Currently the latest release from the Evergreen 3.6 series is recommended for new installations and stable releases are suggested for production systems. Note: Evergreen servers and staff clients must match. For example, if you are running server version 3.1.0, you should use version 3.1.0 of the staff client. Evergreen 3.2.0+ no longer supports a separate client by default, but building a client remains as an unsupported option. Server & staff client downloads 3.7 Series 3.6 Series 3.5 Series Status stable stable stable Latest Release 3.7.0 3.6.3 3.5.4 Release Date 2021-04-14 2021-04-01 2021-04-01 Release Notes Release Notes Release Notes Release Notes Tabular release notes summary ChangeLog ChangeLog ChangeLog ChangeLog Evergreen Installation Install Instructions Install Instructions Install Instructions Upgrading Notes on upgrading from 3.6.2 TBD TBD OpenSRF Software 3.2.1 (md5) 3.2.1 (md5) 3.2.1 (md5) Server Software Source (md5) Source (md5) Source (md5) Web Staff Client Extension (“Hatch”) Windows Hatch Installer 0.3.2 (md5) – Installation Instructions (Windows & Linux) Git Repository Git Location Git Location Git Location Other Evergreen Staff Clients Staff Client Archive Windows Staff Clients for slightly older stable releases (2.11, 2.10). For Mac and Linux Installing the Evergreen client on Macs Evergreen 2.8.3 Mac Staff Client [.dmg] Evergreen 2.9.0 Mac Staff Client [.dmg] Evergreen 2.12.0 Mac Staff Client [.zip] Evergreen 3.0.0 Mac Staff Client [.zip] Pre-built MAC staff client for Evergreen 2.10 and 2.8 – Provided by SITKA Evergreen in action Visit the Evergreen catalog on our demonstration and development servers, or visit this list of live Evergreen libraries. 
You can also download an Evergreen staff client and point it at the Evergreen demo or development server (see the community servers page for details). Bug Reports Please report any Evergreen bugs/wishlist on Launchpad. To submit a vulnerability please email your report to open-ils-security@esilibrary.com. Evergreen Code Museum Older versions of Evergreen software are available from the Evergreen Code Museum. Source Code Repository A Gitweb instance sits atop the Git repositories for Evergreen and OpenSRF. You can find both repositories at git.evergreen-ils.org. Here is the running change log for the Evergreen code repository: watch us work. Trac sends code commits to two public Evergreen mailing lists: For Evergreen commits, subscribe to open-ils-commits For OpenSRF commits, subscribe to opensrf-commits About Evergreen This is the project site for Evergreen, a highly-scalable software for libraries that helps library patrons find library materials, and helps libraries manage, catalog, and circulate those materials, no matter how large or complex the libraries. © 2008-2020 GPLS and others. Evergreen is open source software, freely licensed under GNU GPLv2 or later. The Evergreen Project is a 501(c)3 nonprofit organization. Community Links Evergreen Bug Tracker Evergreen on Open HUB Evergreen Wiki Git Repositories Join IRC! IRC Logs Official Documentation · © 2021 Evergreen ILS · Powered by · Designed with the Customizr theme · evergreen-ils-org-2740 ---- None evergreen-ils-org-3150 ---- Evergreen 3.7.0 Release Notes Evergreen 3.7.0 Release Notes Table of Contents JavaScript must be enabled in your browser to display the table of contents. 1. Upgrade notes 1.1. Database Upgrade Procedure The database schema upgrade for Evergreen 3.7 has more steps than normal. The general procedure, assuming Evergreen 3.6.2 as the starting point, is: Run the main 3.6.2 ⇒ to 3.7 schema update script from the Evergreen source directory, supplying database connection parameters as needed: psql -f Open-ILS/src/sql/Pg/version-upgrade/3.6.2-3.7.0-upgrade-db.sql 2>&1 | tee 3.6.2-3.7.0-upgrade-db.log Create and ingest search suggestions: Run the following from psql to export the strings to files: \a \t \o title select value from metabib.title_field_entry; \o author select value from metabib.author_field_entry; \o subject select value from metabib.subject_field_entry; \o series select value from metabib.series_field_entry; \o identifier select value from metabib.identifier_field_entry; \o keyword select value from metabib.keyword_field_entry; \o \a \t From the command line, convert the exported words into SQL scripts to load into the database. This step assumes that you are at the top of the Evergreen source tree. $ ./Open-ILS/src/support-scripts/symspell-sideload.pl title > title.sql $ ./Open-ILS/src/support-scripts/symspell-sideload.pl author > author.sql $ ./Open-ILS/src/support-scripts/symspell-sideload.pl subject > subject.sql $ ./Open-ILS/src/support-scripts/symspell-sideload.pl series > series.sql $ ,/Open-ILS/src/support-scripts/symspell-sideload.pl identifier > identifier.sql $ ./Open-ILS/src/support-scripts/symspell-sideload.pl keyword > keyword.sql Back in psql, import the suggestions. This step can take several hours in a large databases, but the \i $FILE.sql` steps can be run in parallel. 
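One way to parallelize the imports (a hypothetical helper, not part of the official upgrade script) is to drive one psql process per class file from Python. The ALTER TABLE ... SET UNLOGGED/TRUNCATE setup and the CLUSTER/REINDEX/SET LOGGED finishing steps from the single-session procedure shown below still run once, before and after the imports; connection details here are assumptions.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Hypothetical helper: run the six symspell import files in parallel,
# one psql session per file. Adjust the database name/host/user as needed
# (or rely on the standard PG* environment variables).
DB = os.environ.get("PGDATABASE", "evergreen")
FILES = ["identifier.sql", "author.sql", "title.sql",
         "subject.sql", "series.sql", "keyword.sql"]

def run_import(sql_file):
    # Equivalent to running \i sql_file inside psql.
    subprocess.run(["psql", "-d", DB, "-v", "ON_ERROR_STOP=1", "-f", sql_file],
                   check=True)
    return sql_file

# NOTE: run the ALTER TABLE ... SET UNLOGGED and TRUNCATE statements from the
# procedure below first, then these imports, then the CLUSTER/REINDEX/
# SET LOGGED/VACUUM steps.
with ThreadPoolExecutor(max_workers=len(FILES)) as pool:
    for done in pool.map(run_import, FILES):
        print(f"imported {done}")
```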
ALTER TABLE search.symspell_dictionary SET UNLOGGED; TRUNCATE search.symspell_dictionary; \i identifier.sql \i author.sql \i title.sql \i subject.sql \i series.sql \i keyword.sql CLUSTER search.symspell_dictionary USING symspell_dictionary_pkey; REINDEX TABLE search.symspell_dictionary; ALTER TABLE search.symspell_dictionary SET LOGGED; VACUUM ANALYZE search.symspell_dictionary; DROP TABLE search.symspell_dictionary_partial_title; DROP TABLE search.symspell_dictionary_partial_author; DROP TABLE search.symspell_dictionary_partial_subject; DROP TABLE search.symspell_dictionary_partial_series; DROP TABLE search.symspell_dictionary_partial_identifier; DROP TABLE search.symspell_dictionary_partial_keyword; (optional) Apply the new opt-in setting for overdue and predue notices. The following query will set the circ.default_overdue_notices_enabled user setting to true (the default value) for all existing users, ensuring they continue to receive overdue/predue emails. INSERT INTO actor.usr_setting (usr, name, value) SELECT id, 'circ.default_overdue_notices_enabled', 'true' FROM actor.usr; The following query will add the circ.default_overdue_notices_enabled user setting as an opt-in setting for all action triggers that send emails based on a circ being due (unless another opt-in setting is already in use). UPDATE action_trigger.event_definition SET opt_in_setting = 'circ.default_overdue_notices_enabled', usr_field = 'usr' WHERE opt_in_setting IS NULL AND hook = 'checkout.due' AND reactor = 'SendEmail'; Evergreen admins who wish to use the new setting should run both of the above queries. Admins who do not wish to use it, or who are already using a custom opt-in setting of their own, do not need to do anything. Perform a VACUUM ANALYZE of the following tables using psql: VACUUM ANALYZE authority.full_rec; VACUUM ANALYZE authority.simple_heading; VACUUM ANALYZE metabib.identifier_field_entry; VACUUM ANALYZE metabib.combined_identifier_field_entry; VACUUM ANALYZE metabib.title_field_entry; VACUUM ANALYZE metabib.combined_title_field_entry; VACUUM ANALYZE metabib.author_field_entry; VACUUM ANALYZE metabib.combined_author_field_entry; VACUUM ANALYZE metabib.subject_field_entry; VACUUM ANALYZE metabib.combined_subject_field_entry; VACUUM ANALYZE metabib.keyword_field_entry; VACUUM ANALYZE metabib.combined_keyword_field_entry; VACUUM ANALYZE metabib.series_field_entry; VACUUM ANALYZE metabib.combined_series_field_entry; VACUUM ANALYZE metabib.real_full_rec; 1.2. New Seed Data 1.2.1. New Permissions Administer geographic location services (ADMIN_GEOLOCATION_SERVICES) Administer library groups (ADMIN_LIBRARY_GROUPS) Manage batch (subscription) hold events (MANAGE_HOLD_GROUPS) Modify patron SSO settings (SSO_ADMIN) View geographic location services (VIEW_GEOLOCATION_SERVICES) 1.2.2. New Global Flags Block the ability of expired user with the STAFF_LOGIN permission to log into Evergreen (auth.block_expired_staff_login) Offer use of geographic location services in the public catalog (opac.use_geolocation) 1.2.3. New Internal Flags Maximum search result count at which spelling suggestions may be offered (opac.did_you_mean.low_result_threshold) 1.2.4.
New Library Settings Allow both Shibboleth and native OPAC authentication (opac.login.shib_sso.allow_native) Allow renewal request if renewal recipient privileges have expired (circ.renew.expired_patron_allow) Enable Holdings Sort by Geographic Proximity ('opac.holdings_sort_by_geographic_proximity`) Enable Shibboleth SSO for the OPAC (opac.login.shib_sso.enable) Evergreen SSO matchpoint (opac.login.shib_sso.evergreen_matchpoint) Geographic Location Service to use for Addresses (opac.geographic_location_service_for_address) Keyboard distance score weighting in OPAC spelling suggestions (search.symspell.keyboard_distance.weight) Log out of the Shibboleth IdP (opac.login.shib_sso.logout) Minimum required uses of a spelling suggestions that may be offered (search.symspell.min_suggestion_use_threshold) Pg_trgm score weighting in OPAC spelling suggestions (search.symspell.pg_trgm.weight) Randomize group hold order (holds.subscription.randomize) Shibboleth SSO Entity ID (opac.login.shib_sso.entityId) Shibboleth SSO matchpoint (opac.login.shib_sso.shib_matchpoint) Show Geographic Proximity in Miles (opac.geographic_proximity_in_miles) Soundex score weighting in OPAC spelling suggestions (search.symspell.soundex.weight) 1.2.5. New Stock Action/Trigger Event Definitions Hold Group Hold Placed for Patron Email Notification 2. New Features 2.1. Administration 2.1.1. Single Sign On (Shibboleth) Public Catalog integration The Evergreen OPAC can now be used as a Service Provider (SP) in a Single Sign On infrastructure. This allows system administrators to connect the Evergreen OPAC to an identity provider (IdP). Such a scenario offers significant usability improvements to patrons: They can use the same, IdP-provided login screen and credentials that they use for other applications (SPs). If they have already logged into another participating application, when they arrive at the Evergreen OPAC, they can be logged in without needing to enter any credentials at all. Evergreen can be configured to offer a Single Sign-out service, where logging out of the Evergreen OPAC will also log the user out of all other SPs. It can also offer security benefits, if it enables a Shibboleth-enabled Evergreen installation to move away from insecure autogenerated user passwords (e.g. year of birth or last four digits of a phone number). Different Org Units can use different IdPs. This development also supports a mix of Shibboleth and non-Shibboleth libraries. Note that only the OPAC can be integrated with Shibboleth at this time; no such support exists for the staff client, self-check, etc. Also note that this development does not include automatic provisioning of accounts. At this time, matching accounts must already exist in Evergreen for a patron to successfully authenticate into the OPAC via Single Sign On. Installation Installing and configuring Shibboleth support is a complex project. In broad strokes, the process includes: Installing Shibboleth and the Shibboleth Apache module (apt install libapache2-mod-shib2 on Debian and Ubuntu) Configuring Shibboleth, including: Setting up a certificate assigning an Entity ID getting metadata about the IdP from the IdP (perhaps "locally maintained metadata", where an XML file from the IdP is copied into place on your Evergreen server) Understanding what attributes the IdP will provide about your users, and describing them in the attribute-map.xml file. Providing your Entity ID, information about possible bindings, and any other requested information to the IdP administrator. 
Much of this information will be available at http://YOUR_EVERGREEN_DOMAIN/Shibboleth.sso/Metadata Configuring Apache, including: Enabling Shibboleth authentication in the eg_vhost.conf file (Optional) Using the new sso_loc Apache variable to identify which org unit should be used as the context location when fetching Shibboleth-related library settings. As a user with the new SSO_ADMIN permission, configure Evergreen using the Library Settings Editor, including: Enable Shibboleth SSO for the OPAC (Optional) Configure whether you will use SSO exclusively, or offer patrons a choice between SSO and standard Evergreen authentication (Optional) Configure whether or not you will use Single Log Out (Optional) In scenarios where a single Evergreen installation is connected to multiple IdPs, assign org units to the relevant IdPs, referenced by the IdP’s Entity Id. Of the attributes defined in attribute-map.xml, configure which one should be used to match users in the Evergreen database. This defaults to uid. For the attribute you chose in the previous step, configure which Evergreen field it should match against. Options are usrname (default), barcode, and email. This video on the SAML protocol can be very helpful for introducing the basic concepts used in the installation and configuration processes. 2.2. Architecture 2.2.1. Block Login of Expired Staff Accounts Evergreen now has the ability to prevent staff users whose accounts have expired from logging in. This is controlled by the new global flag "auth.block_expired_staff_login", which is not enabled by default. If that flag is turned on, accounts that have the STAFF_LOGIN permission and whose expiration date is in the past are prevented from logging into any Evergreen interface, including the staff client, the public catalog, and SIP2. It should be noted that ordinary patrons are allowed to log into the public catalog if their circulation privileges have expired. This feature prevents expired staff users from logging into the public catalog (and all other Evergreen interfaces and APIs) outright in order to prevent them from getting into the staff interface anyway by creative use of Evergreen’s authentication APIs. Evergreen admins are advised to check the expiration status of staff accounts before turning on the global flag, as otherwise it is possible to lock staff users out unexpectedly. The following SQL query will identify expired but otherwise un-deleted users that would be blocked by turning on the flag: SELECT DISTINCT usrname, expire_date FROM actor.usr au, permission.usr_has_perm_at_all(id, 'STAFF_LOGIN') WHERE active AND NOT deleted AND NOT barred AND expire_date < NOW() Note that this query can take a long time to run in large databases given the general way that it checks for users that have the STAFF_LOGIN permission. Replacing the use of permission.usr_has_perm_at_all() with a query on expired users with profiles known to have the STAFF_LOGIN permission will be much faster. 2.2.2. Migration From GIST to GIN Indexes for Full Text Search Evergreen now uses GIN indexes for full text search in PostgreSQL. GIN indexes offer better performance than GIST. For more information on the differences in the two index types, please refer to the PostgreSQL documentation. An upgrade script is provided as part of this migration. If you upgrade normally from a previous release of Evergreen, this upgrade script should run as part of the upgrade process. 
The migration script recommends that you run a VACUUM ANALYZE in PostgreSQL on the tables that had the indexes changed. The migration process does not do this for you, so you should do it as soon as is convenient after the upgrade. Updating Your Own Indexes If you have added your own full text indexes of type GIST, and you wish to migrate them to GIN, you may do so. The following query, when run in your Evergreen databsase after the migration from GIST to GIN, will identify the remaining GIST indexes in your database: SELECT schemaname, indexname FROM pg_indexes WHERE indexdef ~* 'gist'; If the above query produces output, you can run the next query to output a SQL script to migrate the remaining indexes from GIST to GIN: SELECT 'DROP INDEX ' || schemaname || '.' || indexname || E';\n' || REGEXP_REPLACE(indexdef, 'gist', 'gin', 'i') || E';\n' || 'VACUUM ANAlYZE ' || schemaname || '.' || tablename || ';' FROM pg_indexes WHERE indexdef ~* 'gist'; 2.2.3. Removal of Custom Dojo Build Evergreen had a method of making a custom build of the Dojo JavaScript library. Following this procedure could improve the load times for the OPAC and other interfaces that use Dojo. However, very few sites took advantage of this process or even knew of its existence. As a part of the process, an openils_dojo.js file was built and installed along with the other Dojo files. Evergreen had many references to load this optional file. For the majority of sites that did not use this custom Dojo process, this file did not exist. Browsers would spend time and resources requesting this nonexistent file. This situation also contributed noise to the Apache logs with the 404 errors from these requests. In keeping with the goal of eliminating Dojo from Evergreen, all references to openils_dojo.js have been removed from the OPAC and other files. The profile script required to make the custom Dojo build has also been removed. 2.3. Cataloging 2.3.1. Czech language records in sample data This release adds 7 Czech-language MARC records to the sample data set (also known as Concerto data set). 2.3.2. Publisher Catalog Display Includes 264 Tag Publisher values are now extracted for display from tags 260 OR 264. Upgrade Notes A partial reingest is required to extract the new publisher data for display. This query may be long-running. WITH affected_bibs AS ( SELECT DISTINCT(bre.id) AS id FROM biblio.record_entry bre JOIN metabib.real_full_rec mrfr ON (mrfr.record = bre.id AND mrfr.tag = '264') WHERE NOT bre.deleted ) SELECT metabib.reingest_metabib_field_entries(id, TRUE, FALSE, TRUE, TRUE) FROM affected_bibs; 2.4. Circulation 2.4.1. Hold Groups This feature allows staff to add multiple users to a named hold group bucket and place title-level holds for a record for that entire set of users. Users can be added to such a hold group bucket from either the patron search result interface, via the Add to Bucket dropdown, or through a dedicated Hold Group interface available from the Circulation menu. Adding new patrons to a hold group bucket will require staff have the PLACE_HOLD permission. Holds can be placed for the users in a hold group bucket either directly from the normal staff-place hold interface in the embedded OPAC, or by supplying the record ID within the hold group bucket interface. In the latter case, the list of users for which a hold was attempted but failed to be placed can be downloaded by staff in order to address any placement issues. 
Placing a hold group bucket hold will require that staff have the MANAGE_HOLD_GROUPS permission, which is new with this development. In the event of a mistaken hold group hold, staff with the MANAGE_HOLD_GROUPS permission will have the ability to cancel all unfulfilled holds created as part of a hold group event. A link to the title’s hold interface is available from the list of hold group events in the dedicated hold group interface. 2.4.2. Scan Item as Missing Pieces Angular Port The Scan Item As Missing Pieces interface is now an Angular interface. The functionality is the same, but the interface displays more details on the item in question (title/author/callnum) before proceeding with the missing pieces process. 2.4.3. Opt-In Setting for Overdue and Predue Emails The "Receive Overdue and Courtesy Emails" user setting permits users to control whether they receive email notifications about overdue items. To use the setting, modify any action trigger event definitions which send emails about overdue items, setting the "Opt In Setting" to "circ.default_overdue_notices_enabled" and the "User Field" to "usr". You can accomplish this by running the following query in your database: UPDATE action_trigger.event_definition SET opt_in_setting = 'circ.default_overdue_notices_enabled', usr_field = 'usr' WHERE opt_in_setting IS NULL AND hook = 'checkout.due' AND reactor = 'SendEmail'; Once this is done, the patron registration screen in the staff client will show a "Receive Overdue and Courtesy Emails" checkbox, which will be checked by default. To ensure that existing patrons continue to receive email notifications, you will need to add the user setting to their accounts, which you can do by running the following query in your database: INSERT INTO actor.usr_setting (usr, name, value) SELECT id, 'circ.default_overdue_notices_enabled', 'true' FROM actor.usr; 2.4.4. Allow Circulation Renewal for Expired Patrons The "Allow renewal request if renewal recipient privileges have expired" organizational unit setting can be set to true to permit expired patrons to renew circulations. Allowing renewals for expired patrons reduces the number of auto-renewal failures and assumes that a patron with items out eligible for renewals has not been expired for very long and that such patrons are likely to renew their privileges in a timely manner. The setting is referenced based on the current circulation library for the renewal. It takes into account the global flags for "Circ: Use original circulation library on desk renewal instead of the workstation library" and "Circ: Use original circulation library on opac renewal instead of user home library." 2.5. OPAC 2.5.1. Consistent Ordering for Carousels Carousel ordering is now stable and predictable: Newly Cataloged Item and Newest Items by Shelving Location carousels are ordered from most recently cataloged to least recently cataloged. Recently Returned Item carousels are ordered from most recently returned to least recently returned. Top Circulated Items carousels are ordered from most circulated to least circulated. Manual carousels (as of now, without the ability to adjust the position of items) are in the order they are added to the backing bucket. Emptying and refilling the bucket allows reordering. 2.5.2. Default Public Catalog to the Bootstrap Skin The public catalog now defaults to the Bootstrap skin rather than the legacy TPAC skin.
Bootstrap is now the default in order to encourage more testing, but users should be aware of the following issues; certain specific functionality is available only in the TPAC skin. The TPAC skin remains available for use, but current Evergreen users should start actively considering migrating to the Bootstrap skin. In order to continue to use the TPAC skin, comment out the following line in eg_vhost.conf PerlAddVar OILSWebTemplatePath "@localstatedir@/templates-bootstrap" # Comment this line out to use the legacy TPAC 2.5.3. Did You Mean? Single word search suggestions This feature is the first in the series to add native search suggestions to the Evergreen search logic. A significant portion of the code is dedicated to infrastructure that will be used in later enhancements to the functionality. Overview When searching the public or staff catalog in a single search class (title, author, subject, series, identifier, or keyword) with a single search term users can be presented with alternate search terms. Depending on how the instance has been configured, suggestions may be provided for only misspelled words (as defined by existence in the bibliographic corpus), terms that are spelled properly but occur very few times, or on every single-term search. Settings The following new library settings control the behavior of the suggestions: Maximum search result count at which spelling suggestions may be offered Minimum required uses of a spelling suggestions that may be offered Maximum number of spelling suggestions that may be offered Pg_trgm score weighting in OPAC spelling suggestions Soundex score weighting in OPAC spelling suggestions QWERTY Keyboard similarity score weighting in OPAC spelling suggestions There are also two new internal flags: symspell.prefix_length symspell.max_edit_distance Upgrading This feature requires the addition of new Perl module dependencies. Please run the app server and database server dependency Makefiles before applying the database and code updates. At the end of the database upgrade script, the administrator is presented with a set of instructions necessary to precompute the suggestion dictionary based on the current bibliographic database. The first half of this procedure can be started even before the upgrade begins, as soon as the Evergreen database is no longer accessible to users that might cause changes to bibliographic records. For very large instances, this dictionary generation can take several hours and needs to be run on a server with significant RAM and CPU resources. Please look at the upgrade script before beginning an upgrade and plan this dictionary creation as part of the overall upgrade procedure. Given a server, such as a database server with 64G of RAM, you should be able to run all six of the shell commands in parallel in screen sessions or with a tool such as GNU parallel. These commands invoke a script that will generate a class-specific sub-set of the dictionary, and can be used to recreate the dictionary if necessary in the future. 2.5.4. Sort Holdings by Geographical Proximity This functionality integrates 3rd party geographic lookup services to allow patrons to enter an address on the record details page in the OPAC and sort the holdings for that record based on proximity of their circulating libraries to the entered address. To support this, latitude and longitude coordinates may be associated with each org unit. Care is given to not log or leak patron provided addresses or the context in which they are used. 
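As an illustration of the underlying idea only (not Evergreen's actual implementation, which is in Perl and uses the geocoding services listed below), a minimal sketch of sorting holding libraries by great-circle distance from a geocoded patron address might look like this; the library names and coordinates are made up.

```python
from math import asin, cos, radians, sin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance between two (lat, lon) points, in kilometers.
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

# Hypothetical org units with latitude/longitude stored on their physical
# addresses; the patron's address is assumed to have been geocoded already.
org_units = [
    {"name": "Branch A", "lat": 45.52, "lon": -122.68},
    {"name": "Branch B", "lat": 45.60, "lon": -122.50},
]
patron = (45.50, -122.60)

MILES_PER_KM = 0.621371  # used when "Show Geographic Proximity in Miles" is on

for ou in sorted(org_units,
                 key=lambda o: haversine_km(patron[0], patron[1], o["lat"], o["lon"])):
    km = haversine_km(patron[0], patron[1], ou["lat"], ou["lon"])
    print(f"{ou['name']}: {km:.1f} km ({km * MILES_PER_KM:.1f} mi)")
```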
Requires the following Perl modules: Geo::Coder::Free, Geo::Coder::Google, and Geo::Coder::OSM Configuration instructions: Register an account with a third party geographic location service and copy the API Key. Configure the Geographic Location Service (Server Administration > Geographic Location Service > New Geographic Location Service). Enable Global Flag by navigating to Server Administration → Global Flags and locating the opac.use_geolocation flag. (Any entry in the Value field will be ignored.) Enable Library Setting: Enable Holdings Sort by Geographic Proximity (set to True). Enable Library Setting: Geographic Location Service to use for Addresses (use the value from the Name field entered in the Geographic Location Services Configuration entry). Enable Library Setting: Show Geographic Proximity in Miles (if not set, it will default to kilometers). Set the geographic coordinates for each location by navigating to Server Administration > Organizational Units. Select the org unit, switch to the Physical Address subtab and either manually enter Latitude and Longitude values or use the Get Coordinate button. Two new permissions, VIEW_GEOLOCATION_SERVICES and ADMIN_GEOLOCATION_SERVICES, control viewing and editing values in the Geolocation Location Services interface. They are added to the System Administrator and Global Administrator permissions groups by default. 2.5.5. Library Groups The Library Groups search feature revives a longstanding internal concept in Evergreen called “Lassos,” which allows an administrator to define a group of organizational units for searching outside of the standard organizational unit hierarchy. Use case examples include creating a group of law or science libraries within a university consortium, or grouping all school libraries together within a mixed school/public library consortium. Searches can be restricted to a particular Library Group from the library selector in the public catalog basic search page and from the new "Where" selector on the advanced search page. Restricting catalog searches by Library Group is available only in the public catalog and "traditional" staff catalog; it is not available in the Angular staff catalog. This feature adds a new permission, ADMIN_LIBRARY_GROUPS, that allows updating Library Groups and Library Group Maps. This permission is not associated with any profiles by default, and replaces the CREATE_LASSO, UPDATE_LASSO, and DELETE_LASSO permissions. To define new library groups, use the Server Administration Library Groups and Library Group Maps pages. An autogen and a reload of Apache should be performed after making changes to Library Groups. 2.5.6. Easier Styling of Public Catalog Logo and Cart Images Evergreen now has IDs associated with logos and cart images in the TPAC and Bootstrap OPACs to aid in customization. Images are as follows: small Evergreen logo in navigation bar is topnav_logo_image the large Evergreen logo in the center of the splash page of the TPAC is homesearch_main_logo_image the cart icon is cart_icon_image the small logo in the footer is footer_logo_image The Bootstrap OPAC does not have a homesearch logo icon as it is added in the background by CSS and can be directly styled through the CSS. 2.5.7. Easier TPAC Customization via colors.tt2 Twelve new colors for TPAC have been added to the colors.tt2 file as well as having corresponding changes to the style.css.tt2 file. These use descriptive rather than abstract names. 
These changes help avoid situations where unreadable values are placed on top of each other and where different values are wanted for elements that only referenced a single color previously. Guidelines are below for setting values that correspond to the previous values used in the colors.tt2 file. For more diverse customizations, the OPAC should be reviewed before a production load. footer is used for the background color of the footer. It replaces primary. footer_text sets the text color in the footer and replaces text_invert header sets the background of the header and replaces primary_fade header_text sets the color of text in the header and replaces text_invert header_links_bar sets the background of the links bar that separates the header on the front page of the opac and replaces background_invert header_links_text sets the text on the links bar and replaces text_invert header_links_text_hover sets the hover text color on the links bar and replaces primary opac_button sets the background color of the My Opac button and replaces control opac_button_text explicitly sets the text color on the My Opac button opac_button_hover sets the background color of the My Opac button when the mouse is hovering over it and replaces primary opac_button_hover_text sets the text color of the My Opac button when the mouse is hovering over it and replaces text_invert Note that this patch is primarily meant for users who wish to continue using TPAC rather than the Bootstrap skin for a while; new Evergreen users are advised to use the now-default Bootstrap skin. 2.5.8. Configurable Read More Accordion for OPAC Search and Record View (TPAC) Read More Button Public catalog record fields (in the TPAC skin only) now truncate themselves based on a configurable number of characters. The full field may be displayed upon hitting a (Read More) link, which will then toggle into a (Read Less) link to re-truncate the field. Configuration Open-ILS/src/templates/opac/parts/config.tt2 contains two new configuration variables: truncate_contents (default: 1) contents_truncate_length (default: 50). Setting truncate_contents to 0 will disable the read more functionality. The variable contents_truncate_length corresponds to the number of characters to display before truncating the text. If contents_truncate_length is removed, it will default to 100. Additional configuration for note fields can be made in Open-ILS/src/templates/opac/parts/record/contents.tt2, allowing a trunc_length variable for each individual type of note, which will override contents_truncate_length for that specific type of note. Adding Read More Functionality to further fields To add Read More functionality to any additional fields, you may use the macro accordion(), defined in misc_util.tt2. It can take three variables: str, trunc_length, and element. str corresponds to the string you want to apply it to, trunc_length (optional) will override contents_truncate_length if supplied, and element (optional) provides an alternative HTML element to look at for the truncation process (useful in situations such as the Authors and Cast fields, where each field is processed individually, but needs to be treated as a single field). 2.6. Reports 2.6.1. Reports Scheduler Improvements Previously, the reports scheduler allowed duplicated reports under certain circumstances. A uniqueness constraint now disallows this without adversely affecting the reports process. 3.
Miscellaneous The Create Reservation form in the Booking module now includes an option to search for the patron by attributes other than just their barcode. (Bug 1816655) The form to add a user to a Course now includes an option to search for the patron by attributes other than just their barcode. (Bug 1907921) For consistency with the menu action Cataloging ⇒ Retrieve Record by TCN Value, the staff catalog Numeric Search ⇒ TCN search now includes deleted bib records. (Bug 1881650) Add a new command-line script, overdrive-api-checker.pl, for testing the OverDrive API. (Bug 1696825) The Shelving Location Groups editor is ported to Angular. (Bug 1852321) The staff catalog now has the ability to add all search results (up to 1,000 titles) to the basket in one fell swoop. (Bug 1885179) Add All Videos as a search format. (Bug 1917826) Server-side print templates can now have print contexts set. (Bug 1891550) Add ability to set the print context for a print template to "No-Print" to specify, well, that a given receipt should never be printed. (Bug 1891550) Add Check Number as an available column to the Bill History grids. (Bug 1705693) Adds a new control to the item table in the TPAC public catalog only to specify that only items that are available should be displayed. (Bug 1853006) Adds warning before deleting bib records with holds (Bug 1398107) Library scope on (Angular) Administration pages now defaults to workstation location rather than consortium (Bug 173322) Pending users now set last four digits of phone number as password when library setting is enabled (Bug 1887852) 4. Acknowledgments The Evergreen project would like to acknowledge the following organizations that commissioned developments in this release of Evergreen: BC Libraries Cooperative Community Library (Sunbury) Consortium of Ohio Libraries (COOL) Evergreen Community Development Initiative Evergreen Indiana Georgia PINES Linn-Benton Community College Pennsylvania Integrated Library System (PaILS) We would also like to thank the following individuals who contributed code, translations, documentation, patches, and tests to this release of Evergreen: John Amundson Zavier Banks Felicia Beaudry Jason Boyer Dan Briem Andrea Buntz Neiman Christine Burns Galen Charlton Garry Collum Eva Cerniňáková Dawn Dale Elizabeth Davis Jeff Davis Martha Driscoll Bill Erickson Jason Etheridge Ruth Frasur Blake Graham-Henderson Katie Greenleaf Martin Rogan Hamby Elaine Hardy Kyle Huckins Angela Kilsdonk Tiffany Little Mary Llewellyn Terran McCanna Chauncey Montgomery Gina Monti Michele Morgan Carmen Oleskevich Jennifer Pringle Mike Risher Mike Rylander Jane Sandberg Chris Sharp Ben Shum Remington Steed Jason Stephenson Jennifer Weston Beth Willis We also thank the following organizations whose employees contributed patches: BC Libraries Cooperative Calvin College Catalyte CW MARS Equinox Open Library Initiative Georgia Public Library Service Kenton County Public Library King County Library System Linn-Benton Community College MOBIUS NOBLE Westchester Library System We regret any omissions. If a contributor has been inadvertently missed, please open a bug at http://bugs.launchpad.net/evergreen/ with a correction. Last updated 2021-04-14 15:04:29 EDT evergreen-ils-org-4101 ---- Evergreen 3.7.0 Release Notes Evergreen 3.7.0 Release Notes Table of Contents JavaScript must be enabled in your browser to display the table of contents. 1. Upgrade notes 1.1. Database Upgrade Procedure The database schema upgrade for Evergreen 3.7 has more steps than normal. 
The general procedure, assuming Evergreen 3.6.2 as the starting point, is: Run the main 3.6.2 ⇒ to 3.7 schema update script from the Evergreen source directory, supplying database connection parameters as needed: psql -f Open-ILS/src/sql/Pg/version-upgrade/3.6.2-3.7.0-upgrade-db.sql 2>&1 | tee 3.6.2-3.7.0-upgrade-db.log Create and ingest search suggestions: Run the following from psql to export the strings to files: \a \t \o title select value from metabib.title_field_entry; \o author select value from metabib.author_field_entry; \o subject select value from metabib.subject_field_entry; \o series select value from metabib.series_field_entry; \o identifier select value from metabib.identifier_field_entry; \o keyword select value from metabib.keyword_field_entry; \o \a \t From the command line, convert the exported words into SQL scripts to load into the database. This step assumes that you are at the top of the Evergreen source tree. $ ./Open-ILS/src/support-scripts/symspell-sideload.pl title > title.sql $ ./Open-ILS/src/support-scripts/symspell-sideload.pl author > author.sql $ ./Open-ILS/src/support-scripts/symspell-sideload.pl subject > subject.sql $ ./Open-ILS/src/support-scripts/symspell-sideload.pl series > series.sql $ ,/Open-ILS/src/support-scripts/symspell-sideload.pl identifier > identifier.sql $ ./Open-ILS/src/support-scripts/symspell-sideload.pl keyword > keyword.sql Back in psql, import the suggestions. This step can take several hours in a large databases, but the \i $FILE.sql` steps can be run in parallel. ALTER TABLE search.symspell_dictionary SET UNLOGGED; TRUNCATE search.symspell_dictionary; \i identifier.sql \i author.sql \i title.sql \i subject.sql \i series.sql \i keyword.sql CLUSTER search.symspell_dictionary USING symspell_dictionary_pkey; REINDEX TABLE search.symspell_dictionary; ALTER TABLE search.symspell_dictionary SET LOGGED; VACUUM ANALYZE search.symspell_dictionary; DROP TABLE search.symspell_dictionary_partial_title; DROP TABLE search.symspell_dictionary_partial_author; DROP TABLE search.symspell_dictionary_partial_subject; DROP TABLE search.symspell_dictionary_partial_series; DROP TABLE search.symspell_dictionary_partial_identifier; DROP TABLE search.symspell_dictionary_partial_keyword; (optional) Apply the new opt-in setting for overdue and preduce notices. The following query will set the circ.default_overdue_notices_enabled user setting to true (the default value) for all existing users, ensuring they continue to receive overdue/predue emails. INSERT INTO actor.usr_setting (usr, name, value) SELECT id, circ.default_overdue_notices_enabled, true FROM actor.usr; The following query will add the circ.default_overdue_notices_enabled user setting as an opt-in setting for all action triggers that send emails based on a circ being due (unless another opt-in setting is already in use). UPDATE action_trigger.event_definition SET opt_in_setting = circ.default_overdue_notices_enabled, usr_field = usr WHERE opt_in_setting IS NULL AND hook = checkout.due AND reactor = SendEmail; Evergreen admins who wish to use the new setting should run both of the above queries. Admins who do not wish to use it, or who are already using a custom opt-in setting of their own, do not need to do anything. 
Perform a VACUUM ANALYZE of the following tables using psql: VACUUM ANALYZE authority.full_rec; VACUUM ANALYZE authority.simple_heading; VACUUM ANALYZE metabib.identifier_field_entry; VACUUM ANALYZE metabib.combined_identifier_field_entry; VACUUM ANALYZE metabib.title_field_entry; VACUUM ANALYZE metabib.combined_title_field_entry; VACUUM ANALYZE metabib.author_field_entry; VACUUM ANALYZE metabib.combined_author_field_entry; VACUUM ANALYZE metabib.subject_field_entry; VACUUM ANALYZE metabib.combined_subject_field_entry; VACUUM ANALYZE metabib.keyword_field_entry; VACUUM ANALYZE metabib.combined_keyword_field_entry; VACUUM ANALYZE metabib.series_field_entry; VACUUM ANALYZE metabib.combined_series_field_entry; VACUUM ANALYZE metabib.real_full_rec; 1.2. New Seed Data 1.2.1. New Permissions Administer geographic location services (ADMIN_GEOLOCATION_SERVICES) Administer library groups (ADMIN_LIBRARY_GROUPS) Manage batch (subscription) hold events (MANAGE_HOLD_GROUPS) Modify patron SSO settings (SSO_ADMIN) View geographic location services (VIEW_GEOLOCATION_SERVICES) 1.2.2. New Global Flags Block the ability of expired user with the STAFF_LOGIN permission to log into Evergreen (auth.block_expired_staff_login) Offer use of geographic location services in the public catalog (opac.use_geolocation) 1.2.3. New Internal Flags Maximum search result count at which spelling suggestions may be offered (opac.did_you_mean.low_result_threshold) 1.2.4. New Library Settings Allow both Shibboleth and native OPAC authentication (opac.login.shib_sso.allow_native) Allow renewal request if renewal recipient privileges have expired (circ.renew.expired_patron_allow) Enable Holdings Sort by Geographic Proximity ('opac.holdings_sort_by_geographic_proximity`) Enable Shibboleth SSO for the OPAC (opac.login.shib_sso.enable) Evergreen SSO matchpoint (opac.login.shib_sso.evergreen_matchpoint) Geographic Location Service to use for Addresses (opac.geographic_location_service_for_address) Keyboard distance score weighting in OPAC spelling suggestions (search.symspell.keyboard_distance.weight) Log out of the Shibboleth IdP (opac.login.shib_sso.logout) Minimum required uses of a spelling suggestions that may be offered (search.symspell.min_suggestion_use_threshold) Pg_trgm score weighting in OPAC spelling suggestions (search.symspell.pg_trgm.weight) Randomize group hold order (holds.subscription.randomize) Shibboleth SSO Entity ID (opac.login.shib_sso.entityId) Shibboleth SSO matchpoint (opac.login.shib_sso.shib_matchpoint) Show Geographic Proximity in Miles (opac.geographic_proximity_in_miles) Soundex score weighting in OPAC spelling suggestions (search.symspell.soundex.weight) 1.2.5. New Stock Action/Trigger Event Definitions Hold Group Hold Placed for Patron Email Notification 2. New Features 2.1. Administration 2.1.1. Single Sign On (Shibboleth) Public Catalog integration The Evergreen OPAC can now be used as a Service Provider (SP) in a Single Sign On infrastructure. This allows system administrators to connect the Evergreen OPAC to an identity provider (IdP). Such a scenario offers significant usability improvements to patrons: They can use the same, IdP-provided login screen and credentials that they use for other applications (SPs). If they have already logged into another participating application, when they arrive at the Evergreen OPAC, they can be logged in without needing to enter any credentials at all. 
Evergreen can be configured to offer a Single Sign-out service, where logging out of the Evergreen OPAC will also log the user out of all other SPs. It can also offer security benefits, if it enables a Shibboleth-enabled Evergreen installation to move away from insecure autogenerated user passwords (e.g. year of birth or last four digits of a phone number). Different Org Units can use different IdPs. This development also supports a mix of Shibboleth and non-Shibboleth libraries. Note that only the OPAC can be integrated with Shibboleth at this time; no such support exists for the staff client, self-check, etc. Also note that this development does not include automatic provisioning of accounts. At this time, matching accounts must already exist in Evergreen for a patron to successfully authenticate into the OPAC via Single Sign On. Installation Installing and configuring Shibboleth support is a complex project. In broad strokes, the process includes: Installing Shibboleth and the Shibboleth Apache module (apt install libapache2-mod-shib2 on Debian and Ubuntu) Configuring Shibboleth, including: Setting up a certificate assigning an Entity ID getting metadata about the IdP from the IdP (perhaps "locally maintained metadata", where an XML file from the IdP is copied into place on your Evergreen server) Understanding what attributes the IdP will provide about your users, and describing them in the attribute-map.xml file. Providing your Entity ID, information about possible bindings, and any other requested information to the IdP administrator. Much of this information will be available at http://YOUR_EVERGREEN_DOMAIN/Shibboleth.sso/Metadata Configuring Apache, including: Enabling Shibboleth authentication in the eg_vhost.conf file (Optional) Using the new sso_loc Apache variable to identify which org unit should be used as the context location when fetching Shibboleth-related library settings. As a user with the new SSO_ADMIN permission, configure Evergreen using the Library Settings Editor, including: Enable Shibboleth SSO for the OPAC (Optional) Configure whether you will use SSO exclusively, or offer patrons a choice between SSO and standard Evergreen authentication (Optional) Configure whether or not you will use Single Log Out (Optional) In scenarios where a single Evergreen installation is connected to multiple IdPs, assign org units to the relevant IdPs, referenced by the IdP’s Entity Id. Of the attributes defined in attribute-map.xml, configure which one should be used to match users in the Evergreen database. This defaults to uid. For the attribute you chose in the previous step, configure which Evergreen field it should match against. Options are usrname (default), barcode, and email. This video on the SAML protocol can be very helpful for introducing the basic concepts used in the installation and configuration processes. 2.2. Architecture 2.2.1. Block Login of Expired Staff Accounts Evergreen now has the ability to prevent staff users whose accounts have expired from logging in. This is controlled by the new global flag "auth.block_expired_staff_login", which is not enabled by default. If that flag is turned on, accounts that have the STAFF_LOGIN permission and whose expiration date is in the past are prevented from logging into any Evergreen interface, including the staff client, the public catalog, and SIP2. It should be noted that ordinary patrons are allowed to log into the public catalog if their circulation privileges have expired. 
This feature prevents expired staff users from logging into the public catalog (and all other Evergreen interfaces and APIs) outright in order to prevent them from getting into the staff interface anyway by creative use of Evergreen’s authentication APIs. Evergreen admins are advised to check the expiration status of staff accounts before turning on the global flag, as otherwise it is possible to lock staff users out unexpectedly. The following SQL query will identify expired but otherwise un-deleted users that would be blocked by turning on the flag: SELECT DISTINCT usrname, expire_date FROM actor.usr au, permission.usr_has_perm_at_all(id, 'STAFF_LOGIN') WHERE active AND NOT deleted AND NOT barred AND expire_date < NOW() Note that this query can take a long time to run in large databases given the general way that it checks for users that have the STAFF_LOGIN permission. Replacing the use of permission.usr_has_perm_at_all() with a query on expired users with profiles known to have the STAFF_LOGIN permission will be much faster. 2.2.2. Migration From GIST to GIN Indexes for Full Text Search Evergreen now uses GIN indexes for full text search in PostgreSQL. GIN indexes offer better performance than GIST. For more information on the differences in the two index types, please refer to the PostgreSQL documentation. An upgrade script is provided as part of this migration. If you upgrade normally from a previous release of Evergreen, this upgrade script should run as part of the upgrade process. The migration script recommends that you run a VACUUM ANALYZE in PostgreSQL on the tables that had the indexes changed. The migration process does not do this for you, so you should do it as soon as is convenient after the upgrade. Updating Your Own Indexes If you have added your own full text indexes of type GIST, and you wish to migrate them to GIN, you may do so. The following query, when run in your Evergreen databsase after the migration from GIST to GIN, will identify the remaining GIST indexes in your database: SELECT schemaname, indexname FROM pg_indexes WHERE indexdef ~* 'gist'; If the above query produces output, you can run the next query to output a SQL script to migrate the remaining indexes from GIST to GIN: SELECT 'DROP INDEX ' || schemaname || '.' || indexname || E';\n' || REGEXP_REPLACE(indexdef, 'gist', 'gin', 'i') || E';\n' || 'VACUUM ANAlYZE ' || schemaname || '.' || tablename || ';' FROM pg_indexes WHERE indexdef ~* 'gist'; 2.2.3. Removal of Custom Dojo Build Evergreen had a method of making a custom build of the Dojo JavaScript library. Following this procedure could improve the load times for the OPAC and other interfaces that use Dojo. However, very few sites took advantage of this process or even knew of its existence. As a part of the process, an openils_dojo.js file was built and installed along with the other Dojo files. Evergreen had many references to load this optional file. For the majority of sites that did not use this custom Dojo process, this file did not exist. Browsers would spend time and resources requesting this nonexistent file. This situation also contributed noise to the Apache logs with the 404 errors from these requests. In keeping with the goal of eliminating Dojo from Evergreen, all references to openils_dojo.js have been removed from the OPAC and other files. The profile script required to make the custom Dojo build has also been removed. 2.3. Cataloging 2.3.1. 
2.2.2. Migration From GIST to GIN Indexes for Full Text Search

Evergreen now uses GIN indexes for full text search in PostgreSQL. GIN indexes offer better performance than GIST. For more information on the differences in the two index types, please refer to the PostgreSQL documentation.

An upgrade script is provided as part of this migration. If you upgrade normally from a previous release of Evergreen, this upgrade script should run as part of the upgrade process. The migration script recommends that you run a VACUUM ANALYZE in PostgreSQL on the tables that had the indexes changed. The migration process does not do this for you, so you should do it as soon as is convenient after the upgrade.

Updating Your Own Indexes

If you have added your own full text indexes of type GIST, and you wish to migrate them to GIN, you may do so. The following query, when run in your Evergreen database after the migration from GIST to GIN, will identify the remaining GIST indexes in your database:

SELECT schemaname, indexname
FROM pg_indexes
WHERE indexdef ~* 'gist';

If the above query produces output, you can run the next query to output a SQL script to migrate the remaining indexes from GIST to GIN:

SELECT 'DROP INDEX ' || schemaname || '.' || indexname || E';\n' ||
       REGEXP_REPLACE(indexdef, 'gist', 'gin', 'i') || E';\n' ||
       'VACUUM ANALYZE ' || schemaname || '.' || tablename || ';'
FROM pg_indexes
WHERE indexdef ~* 'gist';

2.2.3. Removal of Custom Dojo Build

Evergreen had a method of making a custom build of the Dojo JavaScript library. Following this procedure could improve the load times for the OPAC and other interfaces that use Dojo. However, very few sites took advantage of this process or even knew of its existence. As a part of the process, an openils_dojo.js file was built and installed along with the other Dojo files. Evergreen had many references to load this optional file. For the majority of sites that did not use this custom Dojo process, this file did not exist. Browsers would spend time and resources requesting this nonexistent file. This situation also contributed noise to the Apache logs with the 404 errors from these requests. In keeping with the goal of eliminating Dojo from Evergreen, all references to openils_dojo.js have been removed from the OPAC and other files. The profile script required to make the custom Dojo build has also been removed.

2.3. Cataloging

2.3.1. Czech language records in sample data

This release adds 7 Czech-language MARC records to the sample data set (also known as the Concerto data set).

2.3.2. Publisher Catalog Display Includes 264 Tag

Publisher values are now extracted for display from tags 260 or 264.

Upgrade Notes

A partial reingest is required to extract the new publisher data for display. This query may be long-running.

WITH affected_bibs AS (
    SELECT DISTINCT(bre.id) AS id
    FROM biblio.record_entry bre
    JOIN metabib.real_full_rec mrfr ON (mrfr.record = bre.id AND mrfr.tag = '264')
    WHERE NOT bre.deleted
)
SELECT metabib.reingest_metabib_field_entries(id, TRUE, FALSE, TRUE, TRUE)
FROM affected_bibs;

2.4. Circulation

2.4.1. Hold Groups

This feature allows staff to add multiple users to a named hold group bucket and place title-level holds for a record for that entire set of users. Users can be added to such a hold group bucket from either the patron search result interface, via the Add to Bucket dropdown, or through a dedicated Hold Group interface available from the Circulation menu. Adding new patrons to a hold group bucket will require staff to have the PLACE_HOLD permission.

Holds can be placed for the users in a hold group bucket either directly from the normal staff place hold interface in the embedded OPAC, or by supplying the record ID within the hold group bucket interface. In the latter case, the list of users for which a hold was attempted but failed to be placed can be downloaded by staff in order to address any placement issues. Placing a hold group bucket hold requires that staff have the MANAGE_HOLD_GROUPS permission, which is new with this development.

In the event of a mistaken hold group hold, staff with the MANAGE_HOLD_GROUPS permission will have the ability to cancel all unfulfilled holds created as part of a hold group event. A link to the title’s hold interface is available from the list of hold group events in the dedicated hold group interface.

2.4.2. Scan Item as Missing Pieces Angular Port

The Scan Item As Missing Pieces interface is now an Angular interface. The functionality is the same, but the interface displays more details on the item in question (title/author/callnum) before proceeding with the missing pieces process.

2.4.3. Opt-In Setting for Overdue and Predue Emails

The "Receive Overdue and Courtesy Emails" user setting permits users to control whether they receive email notifications about overdue items. To use the setting, modify any action trigger event definitions which send emails about overdue items, setting the "Opt In Setting" to "circ.default_overdue_notices_enabled" and the "User Field" to "usr". You can accomplish this by running the following query in your database:

UPDATE action_trigger.event_definition
SET opt_in_setting = 'circ.default_overdue_notices_enabled', usr_field = 'usr'
WHERE opt_in_setting IS NULL
AND hook = 'checkout.due'
AND reactor = 'SendEmail';

Once this is done, the patron registration screen in the staff client will show a "Receive Overdue and Courtesy Emails" checkbox, which will be checked by default. To ensure that existing patrons continue to receive email notifications, you will need to add the user setting to their accounts, which you can do by running the following query in your database:

INSERT INTO actor.usr_setting (usr, name, value)
SELECT id, 'circ.default_overdue_notices_enabled', 'true' FROM actor.usr;
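If the INSERT above has already been run once, or if some patrons already carry the setting, re-running it could create duplicate rows. A guarded variant (a sketch, assuming you only want to add the setting where it is not already present) is:

-- Hedged variant: add the opt-in setting only for patrons who lack it.
INSERT INTO actor.usr_setting (usr, name, value)
SELECT au.id, 'circ.default_overdue_notices_enabled', 'true'
  FROM actor.usr au
 WHERE NOT EXISTS (
         SELECT 1
           FROM actor.usr_setting aus
          WHERE aus.usr = au.id
            AND aus.name = 'circ.default_overdue_notices_enabled'
       );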
2.4.4. Allow Circulation Renewal for Expired Patrons

The "Allow renewal request if renewal recipient privileges have expired" organizational unit setting can be set to true to permit expired patrons to renew circulations. Allowing renewals for expired patrons reduces the number of auto-renewal failures; it assumes that a patron with items out that are eligible for renewal has not been expired for very long, and that such patrons are likely to renew their privileges in a timely manner.

The setting is referenced based on the current circulation library for the renewal. It takes into account the global flags for "Circ: Use original circulation library on desk renewal instead of the workstation library" and "Circ: Use original circulation library on opac renewal instead of user home library."

2.5. OPAC

2.5.1. Consistent Ordering for Carousels

Carousel ordering is now stable and predictable:

- Newly Cataloged Item and Newest Items by Shelving Location carousels are ordered from most recently cataloged to least recently cataloged.
- Recently Returned Item carousels are ordered from most recently returned to least recently returned.
- Top Circulated Items carousels are ordered from most circulated to least circulated.
- Manual carousels (as of now, without the ability to adjust the position of items) are in the order they are added to the backing bucket. Emptying and refilling the bucket allows reordering.

2.5.2. Default Public Catalog to the Bootstrap Skin

The public catalog now defaults to the Bootstrap skin rather than the legacy TPAC skin. Bootstrap is now the default in order to encourage more testing, but users should be aware that certain specific functionality is available only in the TPAC skin. The TPAC skin remains available for use, but current Evergreen users should start actively considering migrating to the Bootstrap skin. In order to continue to use the TPAC skin, comment out the following line in eg_vhost.conf:

PerlAddVar OILSWebTemplatePath "@localstatedir@/templates-bootstrap" # Comment this line out to use the legacy TPAC

2.5.3. Did You Mean? Single word search suggestions

This feature is the first in a series to add native search suggestions to the Evergreen search logic. A significant portion of the code is dedicated to infrastructure that will be used in later enhancements to the functionality.

Overview

When searching the public or staff catalog in a single search class (title, author, subject, series, identifier, or keyword) with a single search term, users can be presented with alternate search terms. Depending on how the instance has been configured, suggestions may be provided for only misspelled words (as defined by existence in the bibliographic corpus), terms that are spelled properly but occur very few times, or on every single-term search.

Settings

The following new library settings control the behavior of the suggestions:

- Maximum search result count at which spelling suggestions may be offered
- Minimum required uses of a spelling suggestions that may be offered
- Maximum number of spelling suggestions that may be offered
- Pg_trgm score weighting in OPAC spelling suggestions
- Soundex score weighting in OPAC spelling suggestions
- QWERTY Keyboard similarity score weighting in OPAC spelling suggestions

There are also two new internal flags:

- symspell.prefix_length
- symspell.max_edit_distance
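Both internal flags are stored alongside Evergreen's other internal flags, so their current values can be reviewed from the database before any tuning. A minimal sketch, assuming the stock config.internal_flag table:

-- Sketch: inspect the symspell-related internal flags and their values.
SELECT name, value, enabled
  FROM config.internal_flag
 WHERE name LIKE 'symspell.%';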
Upgrading

This feature requires the addition of new Perl module dependencies. Please run the app server and database server dependency Makefiles before applying the database and code updates.

At the end of the database upgrade script, the administrator is presented with a set of instructions necessary to precompute the suggestion dictionary based on the current bibliographic database. The first half of this procedure can be started even before the upgrade begins, as soon as the Evergreen database is no longer accessible to users that might cause changes to bibliographic records. For very large instances, this dictionary generation can take several hours and needs to be run on a server with significant RAM and CPU resources. Please look at the upgrade script before beginning an upgrade and plan this dictionary creation as part of the overall upgrade procedure. Given a server, such as a database server with 64G of RAM, you should be able to run all six of the shell commands in parallel in screen sessions or with a tool such as GNU parallel. These commands invoke a script that will generate a class-specific subset of the dictionary, and can be used to recreate the dictionary if necessary in the future.

2.5.4. Sort Holdings by Geographical Proximity

This functionality integrates third-party geographic lookup services to allow patrons to enter an address on the record details page in the OPAC and sort the holdings for that record based on the proximity of their circulating libraries to the entered address. To support this, latitude and longitude coordinates may be associated with each org unit. Care is given to not log or leak patron-provided addresses or the context in which they are used.

Requires the following Perl modules: Geo::Coder::Free, Geo::Coder::Google, and Geo::Coder::OSM.

Configuration instructions:

- Register an account with a third party geographic location service and copy the API Key.
- Configure the Geographic Location Service (Server Administration > Geographic Location Service > New Geographic Location Service).
- Enable the Global Flag by navigating to Server Administration > Global Flags and locating the opac.use_geolocation flag. (Any entry in the Value field will be ignored.)
- Enable Library Setting: Enable Holdings Sort by Geographic Proximity (set to True).
- Enable Library Setting: Geographic Location Service to use for Addresses (use the value from the Name field entered in the Geographic Location Services Configuration entry).
- Enable Library Setting: Show Geographic Proximity in Miles (if not set, it will default to kilometers).
- Set the geographic coordinates for each location by navigating to Server Administration > Organizational Units. Select the org unit, switch to the Physical Address subtab, and either manually enter Latitude and Longitude values or use the Get Coordinate button.

Two new permissions, VIEW_GEOLOCATION_SERVICES and ADMIN_GEOLOCATION_SERVICES, control viewing and editing values in the Geographic Location Services interface. They are added to the System Administrator and Global Administrator permissions groups by default.
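Sites that want staff other than System or Global Administrators to manage these entries can grant the new permissions to another profile group. The sketch below is illustrative only: the group ID 10 and the consortium-level depth of 0 are placeholder assumptions, not values from the release.

-- Hedged sketch: grant the geolocation admin permission to a local group.
INSERT INTO permission.grp_perm_map (grp, perm, depth, grantable)
SELECT 10, id, 0, FALSE
  FROM permission.perm_list
 WHERE code = 'ADMIN_GEOLOCATION_SERVICES';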
2.5.5. Library Groups

The Library Groups search feature revives a longstanding internal concept in Evergreen called “Lassos,” which allows an administrator to define a group of organizational units for searching outside of the standard organizational unit hierarchy. Use case examples include creating a group of law or science libraries within a university consortium, or grouping all school libraries together within a mixed school/public library consortium.

Searches can be restricted to a particular Library Group from the library selector in the public catalog basic search page and from the new "Where" selector on the advanced search page. Restricting catalog searches by Library Group is available only in the public catalog and "traditional" staff catalog; it is not available in the Angular staff catalog.

This feature adds a new permission, ADMIN_LIBRARY_GROUPS, that allows updating Library Groups and Library Group Maps. This permission is not associated with any profiles by default, and replaces the CREATE_LASSO, UPDATE_LASSO, and DELETE_LASSO permissions. To define new library groups, use the Server Administration Library Groups and Library Group Maps pages. An autogen and a reload of Apache should be performed after making changes to Library Groups.

2.5.6. Easier Styling of Public Catalog Logo and Cart Images

Evergreen now has IDs associated with logos and cart images in the TPAC and Bootstrap OPACs to aid in customization. Images are as follows:

- the small Evergreen logo in the navigation bar is topnav_logo_image
- the large Evergreen logo in the center of the splash page of the TPAC is homesearch_main_logo_image
- the cart icon is cart_icon_image
- the small logo in the footer is footer_logo_image

The Bootstrap OPAC does not have a homesearch logo icon, as it is added in the background by CSS and can be directly styled through the CSS.

2.5.7. Easier TPAC Customization via colors.tt2

Twelve new colors have been added to the TPAC colors.tt2 file, with corresponding changes to the style.css.tt2 file. These use descriptive rather than abstract names. These changes help avoid situations where unreadable values are placed on top of each other, and where different values are wanted for elements that previously referenced a single color. Guidelines are below for setting values that correspond to the previous values used in the colors.tt2 file. For more diverse customizations, the OPAC should be reviewed before a production load.

- footer is used for the background color of the footer. It replaces primary.
- footer_text sets the text color in the footer and replaces text_invert
- header sets the background of the header and replaces primary_fade
- header_text sets the color of text in the header and replaces text_invert
- header_links_bar sets the background of the links bar that separates the header on the front page of the opac and replaces background_invert
- header_links_text sets the text on the links bar and replaces text_invert
- header_links_text_hover sets the hover text color on the links bar and replaces primary
- opac_button sets the background color of the My Opac button and replaces control
- opac_button_text explicitly sets the text color on the My Opac button
- opac_button_hover sets the background color of the My Opac button when the mouse is hovering over it and replaces primary
- opac_button_hover_text sets the text color of the My Opac button when the mouse is hovering over it and replaces text_invert

Note that this patch is primarily meant for users who wish to continue using TPAC rather than the Bootstrap skin for a while; new Evergreen users are advised to use the now-default Bootstrap skin.

2.5.8. Configurable Read More Accordion for OPAC Search and Record View (TPAC)

Read More Button

Public catalog record fields (in the TPAC skin only) now truncate themselves based on a configurable number of characters. The full field may be displayed upon hitting a (Read More) link, which will then toggle into a (Read Less) link to re-truncate the field.
Configuration

Open-ILS/src/templates/opac/parts/config.tt2 contains two new configuration variables: truncate_contents (default: 1) and contents_truncate_length (default: 50). Setting truncate_contents to 0 will disable the read more functionality. The variable contents_truncate_length corresponds to the number of characters to display before truncating the text. If contents_truncate_length is removed, it will default to 100.

Additional configuration for note fields can be made in Open-ILS/src/templates/opac/parts/record/contents.tt2, allowing a trunc_length variable for each individual type of note, which will override contents_truncate_length for that specific type of note.

Adding Read More Functionality to further fields

To add Read More functionality to any additional fields, you may use the macro accordion(), defined in misc_util.tt2. It can take three variables: str, trunc_length, and element. str corresponds to the string you want to apply it to, trunc_length (optional) will override contents_truncate_length if supplied, and element (optional) provides an alternative HTML element to look at for the truncation process (useful in situations such as the Authors and Cast fields, where each field is processed individually, but needs to be treated as a single field).

2.6. Reports

2.6.1. Reports Scheduler Improvements

Previously, the reports scheduler allowed duplicated reports under certain circumstances. A uniqueness constraint now disallows this without adversely affecting the reports process.

3. Miscellaneous

- The Create Reservation form in the Booking module now includes an option to search for the patron by attributes other than just their barcode. (Bug 1816655)
- The form to add a user to a Course now includes an option to search for the patron by attributes other than just their barcode. (Bug 1907921)
- For consistency with the menu action Cataloging ⇒ Retrieve Record by TCN Value, the staff catalog Numeric Search ⇒ TCN search now includes deleted bib records. (Bug 1881650)
- Add a new command-line script, overdrive-api-checker.pl, for testing the OverDrive API. (Bug 1696825)
- The Shelving Location Groups editor is ported to Angular. (Bug 1852321)
- The staff catalog now has the ability to add all search results (up to 1,000 titles) to the basket in one fell swoop. (Bug 1885179)
- Add All Videos as a search format. (Bug 1917826)
- Server-side print templates can now have print contexts set. (Bug 1891550)
- Add ability to set the print context for a print template to "No-Print" to specify, well, that a given receipt should never be printed. (Bug 1891550)
- Add Check Number as an available column to the Bill History grids. (Bug 1705693)
- Adds a new control to the item table in the TPAC public catalog only to specify that only items that are available should be displayed. (Bug 1853006)
- Adds warning before deleting bib records with holds (Bug 1398107)
- Library scope on (Angular) Administration pages now defaults to workstation location rather than consortium (Bug 173322)
- Pending users now set last four digits of phone number as password when library setting is enabled (Bug 1887852)

4.
Acknowledgments The Evergreen project would like to acknowledge the following organizations that commissioned developments in this release of Evergreen: BC Libraries Cooperative Community Library (Sunbury) Consortium of Ohio Libraries (COOL) Evergreen Community Development Initiative Evergreen Indiana Georgia PINES Linn-Benton Community College Pennsylvania Integrated Library System (PaILS) We would also like to thank the following individuals who contributed code, translations, documentation, patches, and tests to this release of Evergreen: John Amundson Zavier Banks Felicia Beaudry Jason Boyer Dan Briem Andrea Buntz Neiman Christine Burns Galen Charlton Garry Collum Eva Cerniňáková Dawn Dale Elizabeth Davis Jeff Davis Martha Driscoll Bill Erickson Jason Etheridge Ruth Frasur Blake Graham-Henderson Katie Greenleaf Martin Rogan Hamby Elaine Hardy Kyle Huckins Angela Kilsdonk Tiffany Little Mary Llewellyn Terran McCanna Chauncey Montgomery Gina Monti Michele Morgan Carmen Oleskevich Jennifer Pringle Mike Risher Mike Rylander Jane Sandberg Chris Sharp Ben Shum Remington Steed Jason Stephenson Jennifer Weston Beth Willis We also thank the following organizations whose employees contributed patches: BC Libraries Cooperative Calvin College Catalyte CW MARS Equinox Open Library Initiative Georgia Public Library Service Kenton County Public Library King County Library System Linn-Benton Community College MOBIUS NOBLE Westchester Library System We regret any omissions. If a contributor has been inadvertently missed, please open a bug at http://bugs.launchpad.net/evergreen/ with a correction. Last updated 2021-04-14 15:04:29 EDT evergreen-ils-org-4187 ---- Evergreen 3.7.0 released – Evergreen ILS Skip to content Evergreen – Open Source Library Software Evergreen – Open Source Library Software About Us Overview Annual Reports F.A.Q. Evergreen Event Code of Conduct Software Freedom Conservancy Project Governance Trademark Policy Documentation Official Documentation Documentation Interest Group Evergreen Roadmap Evergreen Wiki Tabular Release Notes Get Involved! Get Involved! Committees & Interest Groups Communications Mailing Lists IRC Calendar Blog Jobs Proposed Development Projects Merchandise T-shirts and more Conference All Conferences 2021 Evergreen International Online Conference 2020 Evergreen International Online Conference Event Photography Policy Code of Conduct Downloads Evergreen Downloads OpenSRF Downloads Home » Announcements » Evergreen 3.7.0 released Evergreen 3.7.0 released This entry was posted in Announcements Releases on 4/14/2021 by Galen Charlton The Evergreen Community is pleased to announce the release of Evergreen 3.7.0. Evergreen is highly-scalable software for libraries that helps library patrons find library materials and helps libraries manage, catalog, and circulate those materials, no matter how large or complex the libraries. 
Evergreen 3.7.0 is a major release that includes the following new features of note:

- Support for SAML-based Single Sign On
- Hold Groups, a feature that allows staff to add multiple users to a named hold group bucket and place title-level holds for a record for that entire set of users
- The Bootstrap public catalog skin is now the default
- “Did you mean?” functionality for catalog search focused on making suggestions for single search terms
- Holdings on the public catalog record details page can now be sorted by geographic proximity
- Library Groups, a feature that allows defining groups of organizational units outside of the hierarchy that can be used to limit catalog search results
- Expired staff accounts can now be blocked from logging in
- Publisher data in the public catalog display is now drawn from both the 260 and 264 fields
- The staff catalog can now save all search results (up to 1,000) to a bucket in a single operation
- New opt-in settings for overdue and predue email notifications
- A new setting to allow expired patrons to renew loans
- Porting of additional interfaces to Angular, including Scan Item as Missing Pieces and Shelving Location Groups

Evergreen admins installing or upgrading to 3.7.0 should be aware of the following: The minimum version of PostgreSQL required to run Evergreen 3.7 is PostgreSQL 9.6. The minimum version of OpenSRF is 3.2. This release adds a new OpenSRF service, open-ils.geo. The release also adds several new Perl module dependencies: Geo::Coder::Google, Geo::Coder::OSM, String::KeyboardDistance, and Text::Levenshtein::Damerau::XS. The database update procedure has more steps than usual; please consult the upgrade section of the release notes.
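Because the PostgreSQL floor is the requirement most likely to trip up an upgrade, it is worth confirming the server version before scheduling the work; one quick check from a psql session:

-- Confirm the running PostgreSQL version (must be 9.6 or later for 3.7.0).
SHOW server_version;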
The release is available on the Evergreen downloads page. Additional information, including a full list of new features, can be found in the release notes.

evergreen-ils-org-5070 ---- Evergreen 3.7-rc available – Evergreen ILS

Evergreen 3.7-rc available

This entry was posted in Development Update on 4/12/2021 by Galen Charlton The Evergreen Community is pleased to announce the availability of the release candidate for Evergreen 3.7. This release follows up on the recent beta release. The general release of 3.7.0 is planned for Wednesday, 14 April 2021. Between now and then, please download the release candidate and try it out. Additional information, including a full list of new features, can be found in the release notes.

evergreen-ils-org-5339 ---- Evergreen Downloads – Evergreen ILS

Evergreen Downloads

Evergreen depends on the following technologies Perl, C, JavaScript, XML, XPath, XSLT, XMPP, OpenSRF, Apache, mod_perl, and PostgreSQL. The latest stable release of a supported Linux distribution is recommended for an Evergreen installation. For Ubuntu, please use the 18.04 64-bit LTS (long term support) Server release. Currently the latest release from the Evergreen 3.6 series is recommended for new installations and stable releases are suggested for production systems.

Note: Evergreen servers and staff clients must match. For example, if you are running server version 3.1.0, you should use version 3.1.0 of the staff client. Evergreen 3.2.0+ no longer supports a separate client by default, but building a client remains as an unsupported option.
Server & staff client downloads (3.7 Series / 3.6 Series / 3.5 Series)

- Status: stable / stable / stable
- Latest Release: 3.7.0 / 3.6.3 / 3.5.4
- Release Date: 2021-04-14 / 2021-04-01 / 2021-04-01
- Release Notes: Release Notes / Release Notes / Release Notes (plus a tabular release notes summary)
- ChangeLog: ChangeLog / ChangeLog / ChangeLog
- Evergreen Installation: Install Instructions / Install Instructions / Install Instructions
- Upgrading: Notes on upgrading from 3.6.2 / TBD / TBD
- OpenSRF Software: 3.2.1 (md5) / 3.2.1 (md5) / 3.2.1 (md5)
- Server Software: Source (md5) / Source (md5) / Source (md5)
- Web Staff Client Extension (“Hatch”): Windows Hatch Installer 0.3.2 (md5) – Installation Instructions (Windows & Linux)
- Git Repository: Git Location / Git Location / Git Location

Other Evergreen Staff Clients

- Staff Client Archive: Windows Staff Clients for slightly older stable releases (2.11, 2.10).
- For Mac and Linux: Installing the Evergreen client on Macs
- Evergreen 2.8.3 Mac Staff Client [.dmg]
- Evergreen 2.9.0 Mac Staff Client [.dmg]
- Evergreen 2.12.0 Mac Staff Client [.zip]
- Evergreen 3.0.0 Mac Staff Client [.zip]
- Pre-built MAC staff client for Evergreen 2.10 and 2.8 – Provided by SITKA

Evergreen in action

Visit the Evergreen catalog on our demonstration and development servers, or visit this list of live Evergreen libraries. You can also download an Evergreen staff client and point it at the Evergreen demo or development server (see the community servers page for details).

Bug Reports

Please report any Evergreen bugs/wishlist on Launchpad. To submit a vulnerability please email your report to open-ils-security@esilibrary.com.

Evergreen Code Museum

Older versions of Evergreen software are available from the Evergreen Code Museum.

Source Code Repository

A Gitweb instance sits atop the Git repositories for Evergreen and OpenSRF. You can find both repositories at git.evergreen-ils.org. Here is the running change log for the Evergreen code repository: watch us work. Trac sends code commits to two public Evergreen mailing lists: For Evergreen commits, subscribe to open-ils-commits. For OpenSRF commits, subscribe to opensrf-commits.

evergreen-ils-org-8730 ---- Evergreen ILS – Evergreen – Open Source Library Software
evergreen-ils-org-9796 ---- Evergreen 3.7-beta available – Evergreen ILS
Evergreen 3.7-beta available

This entry was posted in Development Update on 4/1/2021 by Galen Charlton The Evergreen Community is pleased to announce the availability of the beta release for Evergreen 3.7. This release contains various new features and enhancements, including:

- Support for SAML-based Single Sign On
- Hold Groups, a feature that allows staff to add multiple users to a named hold group bucket and place title-level holds for a record for that entire set of users
- The Bootstrap public catalog skin is now the default
- “Did you mean?” functionality for catalog search focused on making suggestions for single search terms
- Holdings on the public catalog record details page can now be sorted by geographic proximity
- Library Groups, a feature that allows defining groups of organizational units outside of the hierarchy that can be used to limit catalog search results
- Expired staff accounts can now be blocked from logging in
- Publisher data in the public catalog display is now drawn from both the 260 and 264 fields
- The staff catalog can now save all search results (up to 1,000) to a bucket in a single operation
- New opt-in settings for overdue and predue email notifications
- A new setting to allow expired patrons to renew loans
- Porting of additional interfaces to Angular, including Scan Item as Missing Pieces and Shelving Location Groups

Evergreen admins installing the beta or upgrading a test system to the beta should be aware of the following: The minimum version of PostgreSQL required to run Evergreen 3.7 is PostgreSQL 9.6. The minimum version of OpenSRF is 3.2. This release adds a new OpenSRF service, open-ils.geo. The release also adds several new Perl module dependencies: Geo::Coder::Google, Geo::Coder::OSM, String::KeyboardDistance, and Text::Levenshtein::Damerau::XS. The database update procedure has more steps than usual; please consult the upgrade section of the release notes. The beta release should not be used for production. Additional information, including a full list of new features, can be found in the release notes.

everybodyslibraries-com-2729 ---- Everybody's Libraries

Everybody's Libraries
Libraries for everyone, by everyone, shared with everyone, about everything

Public Domain Day 2021: Honoring a lost generation

It’s Public Domain Day again.
In much of Europe, and other countries with “life+70 years” copyright terms, works by authors who died in 1950, such as George Orwell, Karin Michaelis, George Bernard Shaw, and Edna St. Vincent Millay, have joined … Continue reading → Counting down to 1925 in the public domain We’re rapidly approaching another Public Domain Day, the day at the start of the year when a year’s worth of creative work joins the public domain. This will be the third year in a row that the US will have … Continue reading → From our subjects to yours (and vice versa) (TL;DR: I’m starting to implement services and publish data to support searching across library collections that use customized subject headings, such as the increasingly-adopted substitutes for LCSH terms like “Illegal aliens”. Read on for what I’m doing, why, and where … Continue reading → Everybody’s Library Questions: Finding films in the public domain Welcome to another installment of Everybody’s Library Questions, where I give answers to questions people ask me (in comments or email) that seem to be useful for general consumption. Before I start, though, I want to put in a plug … Continue reading → Build a better registry: My intended comments to the Library of Congress on the next Register of Copyrights The Library of Congress is seeking public input on abilities and priorities desired for the next Register of Copyrights, who heads the Copyright Office, a department within the Library of Congress.  The deadline for comments as I write this is … Continue reading → Welcome to everybody’s online libraries As coronavirus infections spread throughout the world, lots of people are staying home to slow down the spread and save lives.  In the US, many universities, schools, and libraries have closed their doors.  (Here’s what happening at the library where … Continue reading → Public Domain Day 2020: Coming Around Again I’m very happy for 2020 to be arriving.  As the start of the 2020s, it represents a new decade in which we can have a fresh start, and hope to make better decisions and have better outcomes than some of … Continue reading → 2020 vision #5: Rhapsody in Blue by George Gershwin It’s only a few hours from the new year where I write this, but before I ring in the new year, and a new year’s worth of public domain material, I’d like to put in a request for what music … Continue reading → 2020 vision #4: Ding Dong Merrily on High by George Ratcliffe Woodward and others It’s beginning to sound a lot like Christmas everywhere I go.  The library where I work had its holiday party earlier this week, where I joined librarian colleagues singing Christmas, Hanukkah, and winter-themed songs in a pick-up chorus.  Radio stations … Continue reading → 2020 vision #3: The Most Dangerous Game by Richard Connell “Be a realist. The world is made up of two classes–the hunters and the huntees. 
Luckily, you and I are hunters.” Sanger Rainsford speaks these words at the start of “The Most Dangerous Game”, one of the most famous short … Continue reading → everybodyslibraries-com-9960 ---- Everybody's Libraries | Libraries for everyone, by everyone, shared with everyone, about everything Everybody's Libraries Libraries for everyone, by everyone, shared with everyone, about everything Skip to content Home About About the Free Decimal Correspondence Free Decimal Correspondence ILS services for discovery applications John Mark Ockerbloom The Metadata Challenge ← Older posts Public Domain Day 2021: Honoring a lost generation Posted on January 1, 2021 by John Mark Ockerbloom It’s Public Domain Day again. In much of Europe, and other countries with “life+70 years” copyright terms, works by authors who died in 1950, such as George Orwell, Karin Michaelis, George Bernard Shaw, and Edna St. Vincent Millay, have joined the public domain. Canada, and other countries that still have the Berne Convention’s “life+50 years” copyright terms, get works by authors like E. M. Forster, Nelly Sachs, Bertrand Russell, Elsa Triolet, and other authors who died in 1970 in the public domain. And in the United States, copyrights from 1925 that are still in force have expired, introducing to the public domain a wide variety of works I’ve covered in my prior blog post. The new public domain work that I’ve seen most widely noted is F. Scott Fitzgerald’s Jazz Age novel The Great Gatsby. My library has a copy of the first edition, and its scan of the volume became available on HathiTrust today. Though he doesn’t use the term in Gatsby, Fitzgerald and many other authors writing around 1925 are often considered both members and chroniclers of the “Lost Generation”. The term was coined by Gertrude Stein, and made famous by Ernest Hemingway, who used it in the epigraph to his novel The Sun Also Rises (one of many more works scheduled to join the US public domain a year from now). The Lost Generation describes an age cohort that was disrupted by the First World War, and all the deaths caused by that war and by the influenza pandemic that arose in its wake. Society would never be the same afterwards. It’s ironic that some of the definitive creations of that generation are themselves part of a largely lost generation. At the time of their publication, they were supposed to enter the public domain after 56 years at most, but that maximum term has been extended by 39 more years, well over a generation’s worth of time. The creators of these works that got the full copyright term are almost all now dead, and many of the less famous works in this cohort have also become lost from most people’s memories. Some, including many fragile films of that era, now have all copies lost as well. The generation that now sees these works joining the public domain also has many of the makings of a new “lost generation”. The number of deaths from COVID-19 in the United States, which badly botched its response compared to many similar countries, far exceeds the number of American deaths in World War I, and is a sizable and rapidly growing fraction of all the American deaths from the 1918-1920 flu pandemic. Many more people who have dealt with illness and quarantine have also experienced what feels like a lost year, one that hasn’t ended yet despite today’s change in the calendar. But it’s also important to recognize the key role of the public domain and of open access publications in preventing further loss. 
While Philadelphia, where I live, has been hit hard by this pandemic, it hasn’t been hit as hard as some other places, in part because masking and other behavioral changes have been more widely used and accepted here. Not long before the current pandemic started, the Mutter Museum’s Spit Spreads Death exhibit reminded us of the horrifying death toll of the 1918 flu pandemic here, caused in large part by failing to stop mass gatherings that made the flu spread like wildfire here. The exhibit’s narrative, which many other local media outlets further elaborated on, was able to freely draw on a wide variety of source materials of the era that were all in the public domain due to their age. The freely available sources from 1918 helped spread public health awareness here in 2020. Open access to resources also spurred the rapid development and testing of effective treatments against COVID. Open sharing of the novel coronavirus genomes, and related scientific data, enabled research on the virus and effective responses to be carried out by many different labs across the globe, and many of the resulting research papers and research materials have also been made freely available in venues that are usually limited to paid subscribers. While much of this work is not public domain, strictly speaking, it is being shared and built on largely as if it were. That has enabled vaccines to be safely rolled out much more quickly than they have been for other diseases. While we celebrate today’s belated additions to the public domain, it’s also important to promote and protect it, because there are still efforts to freeze it or roll it back. The successor to the NAFTA trade deal requires Canada to add 20 years to its copyright terms, for instance (though Canada has not yet implemented that provision). And while there is no current legislation to extend US copyright terms any further, such extensions have been proposed in the past, and we’ve just seen in Congress’s recent funding bill how questionable changes to copyright law can be jammed into “must-pass” legislation with little or no warning or recourse. The public domain enriches our culture, reminds us and lets us learn from our past, and helps us make better futures. As 2021 gives us opportunities to turn the page, let’s celebrate the new opportunities we have to enjoy, share, reuse, and build on our newly public domain works. And let’s make sure we don’t lose any more generations. Posted in online books, open access, publicdomain | 3 Comments Counting down to 1925 in the public domain Posted on December 15, 2020 by John Mark Ockerbloom We’re rapidly approaching another Public Domain Day, the day at the start of the year when a year’s worth of creative work joins the public domain. This will be the third year in a row that the US will have a full crop of new public domain works (after a prior 20-year drought), and once again, I’m noting and celebrating works that will be entering the public domain shortly. Approaching 2019, I wrote a one-post-a-day Advent Calendar for 1923 works throughout the month of December, and approaching 2020, I highlighted a few 1924 works, and related copyright issues, in a series of December posts called 2020 Vision. This year I took to Twitter, making one tweet per day featuring a different 1925 work and creator using the #PublicDomainDayCountdown hashtag. 
Tweets are shorter than blog posts, but I started 99 days out, so by the time I finish the series at the end of December, I’ll have written short notices on more works than ever. Since not everyone reads Twitter, and there’s no guarantee that my tweets will always be accessible on that site, I’ll reproduce them here. (This post will be updated to include all the tweets up to 2021.) The tweet links have been reformatted for the blog, a couple of 2-tweet threads have been recombined, and some typos may be corrected. If you’d like to comment yourself on any of the works mentioned here, or suggest others I can feature, feel free to reply here or on Twitter. (My account there is @JMarkOckerbloom. You’ll also find some other people tweeting on the #PublicDomainDayCountdown hashtag, and you’re welcome to join in as well.) September 24: It’s F. Scott Fitzgerald’s birthday. His best-known book, The Great Gatsby, joins the US public domain 99 days from now, along with other works with active 1925 copyrights. #PublicDomainDayCountdown (Links to free online books by Fitzgerald here.) September 25: C. K. Scott-Moncrieff’s birthday’s today. He translated Proust’s Remembrance of Things Past (a controversial title, as the Public Domain Review notes). The Guermantes Way, his translation of Proust’s 3rd volume, joins the US public domain in 98 days. #PublicDomainDayCountdown September 26: Today is T.S. Eliot’s birthday. His poem “The Hollow Men” (which ends “…not with a bang but a whimper”) was first published in full in 1925, & joins the US public domain in 97 days. #PublicDomainDayCountdown More by & about him here. September 27: Lady Cynthia Asquith, born today in 1887, edited a number of anthologies that have long been read by children and fans of fantasy and supernatural fiction. Her first major collection, The Flying Carpet, joins the US public domain in 96 days. #PublicDomainDayCountdown September 28: As @Marketplace reported tonight, Agatha Christie’s mysteries remain popular after 100 years. In 95 days, her novel The Secret of Chimneys will join the US public domain, as will the expanded US Poirot Investigates collection. #PublicDomainDayCountdown September 29: Homer Hockett’s and Arthur Schlesinger, Sr.’s Political and Social History of the United States first came out in 1925, and was an influential college textbook for years thereafter. The first edition joins the public domain in 94 days. #PublicDomainDayCountdown September 30: Inez Haynes Gillmore Irwin died 50 years ago this month, after a varied, prolific writing career. This 2012 blog post looks at 4 of her books, including Gertrude Haviland’s Divorce, which joins the public domain in 93 days. #PublicDomainDayCountdown October 1: For some, spooky stories and themes aren’t just for October, but for the whole year. We’ll be welcoming a new year’s worth of Weird Tales to the public domain in 3 months. See what’s coming, and what’s already free online, here. #PublicDomainDayCountdown October 2: Misinformation and quackery has been a threat to public health for a long time. In 13 weeks, the 1925 book The Patent Medicine and the Public Health, by American quack-fighter Arthur J. Cramp joins the public domain. #PublicDomainDayCountdown October 3: Sophie Treadwell, born this day in 1885, was a feminist, modernist playwright with several plays produced on Broadway, but many of her works are now hard to find. Her 1925 play “Many Mansions” joins the public domain in 90 days. #PublicDomainDayCountdown October 4: It’s Edward Stratemeyer’s birthday. 
Books of his syndicate joining the public domain in 89 days include the debuts of Don Sturdy & the Blythe Girls, & further adventures of Tom Swift, Ruth Fielding, Baseball Joe, Betty Gordon, the Bobbsey Twins, & more. #PublicDomainDayCountdown October 5: Russell Wilder was a pioneering diabetes doctor, testing newly invented insulin treatments that saved many patients’ lives. His 1925 book Diabetes: Its Cause and its Treatment with Insulin joins the public domain in 88 days. #PublicDomainDayCountdown October 6: Queer British Catholic author Radclyffe Hall is best known for The Well of Loneliness. Hall’s earlier novel A Saturday Life is lighter, though it has some similar themes in subtext. It joins the US public domain in 87 days. #PublicDomainDayCountdown October 7: Edgar Allan Poe’s stories have long been public domain, but some work unpublished when he died (on this day in 1849) stayed in © much longer. In 86 days, the Valentine Museum’s 1925 book of his previously unpublished letters finally goes public domain. #PublicDomainDayCountdown October 8: In 1925, the Nobel Prize in Literature went to George Bernard Shaw. In 85 days, his Table-Talk, published that year, will join the public domain in the US, and all his solo works published in his lifetime will be public domain nearly everywhere else. #PublicDomainDayCountdown October 9: Author and editor Edward Bok was born this day in 1863. In Twice Thirty (1925), he follows up his Pulitzer-winning memoir The Americanization of Edward Bok with a set of essays from the perspective of his 60s. It joins the public domain in 84 days. #PublicDomainDayCountdown October 10: In the 1925 silent comedy “The Freshman”, Harold Lloyd goes to Tate University, “a large football stadium with a college attached”, and goes from tackling dummy to unlikely football hero. It joins the public domain in 83 days. #PublicDomainDayCountdown October 11: It’s François Mauriac’s birthday. His Le Desert de l’Amour, a novel that won the 1926 Grand Prix of the Académie Française, joins the US public domain in 82 days. Published translations may stay copyrighted, but Americans will be free to make new ones. #PublicDomainDayCountdown October 12: Pulitzer-winning legal scholar Charles Warren’s Congress, the Constitution, and the Supreme Court (1925) analyzes controversies, some still argued, over relations between the US legislature and the US judiciary. It joins the public domain in 81 days. #PublicDomainDayCountdown October 13: Science publishing in 1925 was largely a boys’ club, but some areas were more open to women authors, such as nursing & science education. I look forward to Maude Muse’s Textbook of Psychology for Nurses going public domain in 80 days. #PublicDomainDayCountdown #AdaLovelaceDay October 14: Happy birthday to poet E. E. Cummings, born this day in 1894. (while some of his poetry is lowercase he usually still capitalized his name when writing it out) His collection XLI Poems joins the public domain in 79 days. #PublicDomainDayCountdown October 15: It’s PG Wodehouse’s birthday. In 78 days more of his humorous stories join the US public domain, including Sam in the Suburbs. It originally ran as a serial in the Saturday Evening Post in 1925. All that year’s issues also join the public domain then. #PublicDomainDayCountdown October 16: Playwright and Nobel laureate Eugene O’Neill was born today in 1888. His “Desire Under the Elms” entered the US public domain this year; in 77 days, his plays “Marco’s Millions” and “The Great God Brown” will join it. 
#PublicDomainDayCountdown October 17: Not everything makes it to the end of the long road to the US public domain. In 76 days, the copyright for the film Man and Maid (based on a book by Elinor Glyn) expires, but no known copies survive. Maybe someone will find one? #PublicDomainDayCountdown October 18: Corra Harris became famous for her novel A Circuit Rider’s Wife and her World War I reporting. The work she considered her best, though, was As a Woman Thinks. It joins the public domain in 75 days. #PublicDomainDayCountdown October 19: Edna St. Vincent Millay died 70 years ago today. All her published work joins the public domain in 74 days in many places outside the US. Here, magazine work like “Sonnet to Gath” (in Sep 1925 Vanity Fair) will join, but renewed post-’25 work stays in ©. #PublicDomainDayCountdown October 20: All songs eventually reach the public domain. Authors can put them there themselves, like Tom Lehrer just did for his lyrics. But other humorous songs arrive by the slow route, like Tilzer, Terker, & Heagney’s “Pardon Me (While I Laugh)” will in 73 days. #PublicDomainDayCountdown October 21: Sherwood Anderson’s Winesburg, Ohio wasn’t a best-seller when it came out, but his Dark Laughter was. Since Joycean works fell out of fashion, that book’s been largely forgotten, but may get new attention when it joins the public domain in 72 days. #PublicDomainDayCountdown October 22: Artist NC Wyeth was born this day in 1882. The Brandywine Museum near Philadelphia shows many of his works. His illustrated edition of Francis Parkman’s book The Oregon Trail joins the public domain in 71 days. #PublicDomainDayCountdown October 23: Today (especially at 6:02, on 10/23) many chemists celebrate #MoleDay. In 70 days, they’ll also get to celebrate historically important chemistry publications joining the US public domain, including all 1925 issues of Justus Liebigs Annalen der Chemie. #PublicDomainDayCountdown October 24: While some early Alfred Hitchcock films were in the US public domain for a while due to formality issues, the GATT accords restored their copyrights. His directorial debut, The Pleasure Garden, rejoins the public domain (this time for good) in 69 days. #PublicDomainDayCountdown (Addendum: There may still be one more year of copyright to this film as of 2021; see the comments to this post for details.) October 25: Albert Barnes took a different approach to art than most of his contemporaries. The first edition of The Art in Painting, where he explains his theories and shows examples from his collection, joins the public domain in 68 days. #PublicDomainDayCountdown October 26: Prolific writer Carolyn Wells had a long-running series of mystery novels featuring Fleming Stone. Here’s a blog post by The Passing Tramp on one of them, The Daughter of the House, which will join the public domain in 67 days. #PublicDomainDayCountdown October 27: Theodore Roosevelt was born today in 1858, and died over 100 years ago, but some of his works are still copyrighted. In 66 days, 2 volumes of his correspondence with Henry Cabot Lodge, written from 1884-1918 and published in 1925, join the public domain. #PublicDomainDayCountdown October 28: American composer and conductor Howard Hanson was born on this day in 1896. His choral piece “Lament for Beowulf” joins the public domain in 65 days. #PublicDomainDayCountdown October 29: “Skitter Cat” was a white Persian cat who had adventures in several children’s books by Eleanor Youmans, illustrated by Ruth Bennett. 
The first of the books joins the public domain in 64 days. #PublicDomainDayCountdown #NationalCatDay October 30: “Secret Service Smith” was a detective created by Canadian author R. T. M. Maitland. His first magazine appearance was in 1920; his first original full-length novel, The Black Magician, joins the public domain in 9 weeks. #PublicDomainDayCountdown October 31: Poet John Keats was born this day in 1795. Amy Lowell’s 2-volume biography links his Romantic poetry with her Imagist poetry. (1 review.) She finished and published it just before she died. It joins the public domain in 62 days. #PublicDomainDayCountdown November 1: “Not just for an hour, not for just a day, not for just a year, but always.” Irving Berlin gave the rights to this song to his bride in 1926. Both are gone now, and in 2 months it will join the public domain for all of us, always. #PublicDomainDayCountdown November 2: Mikhail Fokine’s The Dying Swan dance, set to music by Camille Saint-Saëns, premiered in 1905, but its choreography wasn’t published until 1925, the same year a film of it was released. It joins the public domain in 60 days. #PublicDomainDayCountdown (Choreography copyright is weird. Not only does the term not start until publication, which can be long after 1st performance, but what’s copyrightable has also changed. Before 1978 it had to qualify as dramatic; now it doesn’t, but it has to be more than a short step sequence.) November 3: Herbert Hoover was the only sitting president to be voted out of office between 1912 & 1976. Before taking office, he wrote the foreword to Carolyn Crane’s Everyman’s House, part of a homeowners’ campaign he co-led. It goes out of copyright in 59 days. #PublicDomainDayCountdown November 4: “The Golden Cocoon” is a 1925 silent melodrama featuring an election, jilted lovers, and extortion. The Ruth Cross novel it’s based on went public domain this year. The film will join it there in 58 days. #PublicDomainDayCountdown November 5: Investigative journalist Ida Tarbell was born today in 1857. Her History of Standard Oil helped break up that trust in 1911, but her Life of Elbert H. Gary wrote more admiringly of his chairmanship of US Steel. It joins the public domain in 57 days. #PublicDomainDayCountdown November 6: Harold Ross was born on this day in 1892. He was the first editor of The New Yorker, which he established in coöperation with his wife, Jane Grant. After ninety-five years, the magazine’s first issues are set to join the public domain in fifty-six days. #PublicDomainDayCountdown November 7: “Sweet Georgia Brown” by Ben Bernie & Maceo Pinkard (lyrics by Kenneth Casey) is a jazz standard, the theme tune of the Harlem Globetrotters, and a song often played in celebration. One thing we can celebrate in 55 days is it joining the public domain. #PublicDomainDayCountdown November 8: Today I hiked on the Appalachian Trail. It was completed in 1937, but parts are much older. Walter Collins O’Kane’s Trails and Summits of the White Mountains, published in 1925 when the AT was more idea than reality, goes public domain in 54 days. #PublicDomainDayCountdown November 9: In Sinclair Lewis’ Arrowsmith, a brilliant medical researcher deals with personal and ethical issues as he tries to find a cure for a deadly epidemic. The novel has stayed relevant well past its 1925 publication, and joins the public domain in 53 days. #PublicDomainDayCountdown November 10: John Marquand was born today in 1893. 
He’s known for his spy stories and satires, but an early novel, The Black Cargo, features a sailor curious about a mysterious payload on a ship he’s been hired onto. It joins the US public domain in 52 days. #PublicDomainDayCountdown November 11: The first world war, whose armistice was 102 years ago today, cast a long shadow. Among the many literary works looking back to it is Ford Madox Ford’s novel No More Parades, part of his “Parade’s End” tetralogy. It joins the public domain in 51 days. #PublicDomainDayCountdown November 12: Anne Parrish was born on this day in 1888. In 1925, The Dream Coach, co-written with her brother, got a Newbery honor , and her novel The Perennial Bachelor was a best-seller. The latter book joins the public domain in 50 days. #PublicDomainDayCountdown November 13: In “The Curse of the Golden Cross”, G. K. Chesterton’s Father Brown once again finds a natural explanation to what seem to be preternatural symbols & events. As of today, Friday the 13th, the 1925 story is exactly 7 weeks away from the US public domain. #PublicDomainDayCountdown November 14: The pop standard “Yes Sir, That’s My Baby” was the baby of Walter Donaldson (music) and Gus Kahn (lyrics). It’s been performed by many artists since its composition, and in 48 days, this baby steps out into the public domain. #PublicDomainDayCountdown November 15: Marianne Moore, born on this day in 1887, had a long literary career, including editing the influential modernist magazine The Dial from 1925 on. In 47 days, all 1925 issues of that magazine will be fully in the public domain. #PublicDomainDayCountdown November 16: George S. Kaufman, born today in 1889, wrote or directed a play in every Broadway season from 1921 till 1958. In 46 days, several of his plays join the public domain, including his still-performed comedy “The Butter and Egg Man”. #PublicDomainDayCountdown November 17: Shen of the Sea was a Newbery-winning collection of stories presented as “Chinese” folktales, but written by American author Arthur Bowie Chrisman. Praised when first published, seen more as appropriation later, it’ll be appropriable itself in 45 days. #PublicDomainDayCountdown November 18: I share a birthday today with Jacques Maritain, a French Catholic philosopher who influenced the Universal Declaration of Human Rights. His book on 3 reformers (Luther, Descartes, and Rousseau) joins the public domain in 44 days. #PublicDomainDayCountdown November 19: Prevailing views of history change a lot over 95 years. The 1926 Pulitzer history prize went to a book titled “The War for Southern Independence”. The last volume of Edward Channing’s History of the United States, it joins the public domain in 43 days. #PublicDomainDayCountdown November 20: Alfred North Whitehead’s Science and the Modern World includes a nuanced discussion of science and religion differing notably from many of his contemporaries’. (A recent review of it.) It joins the US public domain in 6 weeks. November 21: Algonquin Round Table member Robert Benchley tried reporting, practical writing, & reviews, but soon found that humorous essays & stories were his forte. One early collection, Pluck and Luck, joins the public domain in 41 days. #PublicDomainDayCountdown November 22: I’ve often heard people coming across a piano sit down & pick out Hoagy Carmichael’s “Heart and Soul”. He also had other hits, one being “Washboard Blues“. His original piano instrumental version becomes public domain in 40 days. 
#PublicDomainDayCountdown November 23: Harpo Marx, the Marx Brothers mime, was born today in 1888. In his oldest surviving film, “Too Many Kisses” he does “speak”, but silently (like everyone else in it), without his brothers. It joins the public domain in 39 days. #PublicDomainDayCountdown November 24: In The Man Nobody Knows, Bruce Barton likened the world of Jesus to the world of business. Did he bring scriptural insight to management, or subordinate Christianity to capitalism? It’ll be easier to say, & show, after it goes public domain in 38 days. #PublicDomainDayCountdown November 25: Before Virgil Thomson (born today in 1896) was well-known as a composer, he wrote a music column for Vanity Fair. His first columns, and the rest of Vanity Fair for 1925, join the public domain in 37 days. #PublicDomainDayCountdown November 26: “Each moment that we’re apart / You’re never out of my heart / I’d rather be lonely and wait for you only / Oh how I miss you tonight” Those staying safe by staying apart this holiday might appreciate this song, which joins the public domain in 36 days. #PublicDomainDayCountdown (The song, “Oh, How I Miss You Tonight” is by Benny Davis, Joe Burke, and Mark Fisher, was published in 1925, and performed and recorded by many musicians since then, some of whom are mentioned in this Wikipedia article.) November 27: Feminist author Katharine Anthony, born today in 1877, was best known for her biographies. Her 1925 biography of Catherine the Great, which drew extensively on the empress’s private memoirs, joins the public domain in 35 days. #PublicDomainDayCountdown November 28: Tonight in 1925 “Barn Dance” (soon renamed “Grand Ole Opry”) debuted in Nashville. Most country music on it & similar shows then were old favorites, but there were new hits too, like “The Death of Floyd Collins”, which joins the public domain in 34 days. #PublicDomainDayCountdown (The song, with words by Andrew Jenkins and music by John Carson, was in the line of other disaster ballads that were popular in the 1920s. This particular disaster had occurred earlier in the year, and became the subject of song, story, drama, and film.) November 29: As many folks get ready for Christmas, many Christmas-themed works are also almost ready to join the public domain in 33 days. One is The Holly Hedge, and Other Christmas Stories by Temple Bailey. More on the book & author. #PublicDomainDayCountdown November 30: In 1925 John Maynard Keynes published The Economic Consequences of Sterling Parity objecting to Winston Churchill returning the UK to the gold standard. That policy ended in 1931; the book’s US copyright lasted longer, but will finally end in 32 days. #PublicDomainDayCountdown December 1: Du Bose Heyward’s novel Porgy has a distinguished legacy of adaptations, including a 1927 Broadway play, and Gershwin’s opera “Porgy and Bess”. When the book joins the public domain a month from now, further adaptation possibilities are limitless. #PublicDomainDayCountdown December 2: In Dorothy Black’s Romance — The Loveliest Thing a young Englishwoman “inherits a small sum of money, buys a motor car and goes off in search of adventure and romance”. First serialized in Ladies’ Home Journal, it joins the public domain in 30 days. #PublicDomainDayCountdown December 3: Joseph Conrad was born on this day in 1857, and died in 1924, leaving unfinished his Napoleonic novel Suspense. But it was still far enough along to get serialized in magazines and published as a book in 1925, and it joins the public domain in 29 days. 
#PublicDomainDayCountdown December 4: Ernest Hemingway’s first US-published story collection In Our Time introduced his distinctive style to an American audience that came to view his books as classics of 20th century fiction: It joins the public domain in 28 days. #PublicDomainDayCountdown December 5: Libertarian author Rose Wilder Lane helped bring her mother’s “Little House” fictionalized memoirs into print. Before that, she published biographical fiction based on the life of Jack London, called He Was a Man. It joins the public domain in 27 days. #PublicDomainDayCountdown December 6: Indiana naturalist and author Gene Stratton-Porter died on this day in 1924. Her final novel, The Keeper of the Bees, was published the following year, and joins the public domain in 26 days. One review. #PublicDomainDayCountdown December 7: Willa Cather was born today in 1873. Her novel The Professor’s House depicts 1920s cultural dislocation from a different angle than F. Scott Fitzgerald’s better-known Great Gatsby. It too joins the public domain in 25 days. #PublicDomainDayCountdown December 8: The last symphony published by Finnish composer Jean Sibelius (born on this day in 1865) is described in the Grove Dictionary as his “most remarkable compositional achievement”. It joins the public domain in the US in 24 days. #PublicDomainDayCountdown December 9: When the Habsburg Empire falls, what comes next for the people & powers of Vienna? The novel Old Wine, by Phyllis Bottome (wife of the local British intelligence head) depicts a society undergoing rapid change. It joins the US public domain in 23 days. #PublicDomainDayCountdown December 10: Lewis Browne was “a world traveler, author, rabbi, former rabbi, lecturer, socialist and friend of the literary elite”. His first book, Stranger than Fiction: A Short History of the Jews, joins the public domain in 22 days. #PublicDomainDayCountdown December 11: In 1925, John Scopes was convicted for teaching evolution in Tennessee. Books explaining the science to lay audiences were popular that year, including Henshaw Ward’s Evolution for John Doe. It becomes public domain in 3 weeks. #PublicDomainDayCountdown December 12: Philadelphia artist Jean Leon Gerome Ferris was best known for his “Pageant of a Nation” paintings. Three of them, “The Birth of Pennsylvania”, “Gettysburg, 1863”, and “The Mayflower Compact”, join the public domain in 20 days. #PublicDomainDayCountdown December 13: The Queen of Cooks, and Some Kings was a memoir of London hotelier Rosa Lewis, as told to Mary Lawton. Her life story was the basis for the BBC and PBS series “The Duchess of Duke Street”. It joins the public domain in 19 days. #PublicDomainDayCountdown December 14: Today we’re celebrating new films being added to the National Film Registry. In 18 days, we can also celebrate more Registry films joining the public domain. One is The Clash of the Wolves, starring Rin Tin Tin. #PublicDomainDayCountdown December 15: Etsu Inagaki Sugimoto, daughter of a high-ranking Japanese official, moved to the US in an arranged marriage after her family fell on hard times. Her 1925 memoir, A Daughter of the Samurai, joins the public domain in 17 days. #PublicDomainDayCountdown December 16: On the Trail of Negro Folk-Songs compiled by Dorothy Scarborough assisted by Ola Lee Gulledge, has over 100 songs. Scarborough’s next of kin (not Gulledge, or any of their sources) renewed its copyright in 1953. But in 16 days, it’ll be free for all. 
#PublicDomainDayCountdown December 17: Virginia Woolf’s writings have been slowly entering the public domain in the US. We’ve had the first part of her Mrs. Dalloway for a while. The complete novel, and her first Common Reader essay collection, join it in 15 days. #PublicDomainDayCountdown December 18: Lovers in Quarantine with Harrison Ford sounds like a movie made for 2020, but it’s actually a 1925 silent comedy (with a different Harrison Ford). It’ll be ready to go out into the public domain after a 14-day quarantine. #PublicDomainDayCountdown December 19: Ma Rainey wrote, sang, and recorded many blues songs in a multi-decade career. Two of her songs becoming public domain in 13 days are “Shave ’em Dry” (written with William Jackson) & “Army Camp Harmony Blues” (with Hooks Tilford). #PublicDomainDayCountdown December 20: For years we’ve celebrated the works of prize-winning novelist Edith Wharton as her stories join the public domain. In 12 days, The Writing of Fiction, her book on how she writes her memorable tales, will join that company. #PublicDomainDayCountdown December 21: Albert Payson Terhune, born today in 1872, raised and wrote about dogs he kept at what’s now a public park in New Jersey. His book about Wolf, who died heroically and is buried there, will also be in the public domain in 11 days. #PublicDomainDayCountdown December 22: In the 1920s it seemed Buster Keaton could do anything involving movies. Go West, a 1925 feature film that he co-wrote, directed, co-produced, and starred in, is still enjoyed today, and it joins the public domain in 10 days. #PublicDomainDayCountdown December 23: In 9 days, not only will Theodore Dreiser’s massive novel An American Tragedy be in the public domain, but so will a lot of the raw material that went into it. Much of it is in @upennlib‘s special collections. #PublicDomainDayCountdown December 24: Johnny Gruelle, born today in 1880, created the Raggedy Ann doll, and a series of books sold with it that went under many Christmas trees. Two of them, Raggedy Ann’s Alphabet Book and Raggedy Ann’s Wishing Pebble, join the public domain in 8 days. #PublicDomainDayCountdown December 25: Written in Hebrew by Joseph Klausner, translated into English by Anglican priest Herbert Danby, Jesus of Nazareth reviewed Jesus’s life and teachings from a Jewish perspective. It made a stir when published in 1925, & joins the public domain in 7 days. #PublicDomainDayCountdown December 26: “It’s a travesty that this wonderful, hilarious, insightful book lives under the inconceivably large shadow cast by The Great Gatsby.” A review of Anita Loos’s Gentlemen Prefer Blondes, also joining the public domain in 6 days. #PublicDomainDayCountdown December 27: “On revisiting Manhattan Transfer, I came away with an appreciation not just for the breadth of its ambition, but also for the genius of its representation.” A review of the John Dos Passos novel becoming public domain in 5 days. #PublicDomainDayCountdown December 28: All too often legal systems and bureaucracies can be described as “Kafkaesque”. The Kafka work most known for that sense of arbitrariness and doom is Der Prozess (The Trial), reviewed here. It joins the public domain in 4 days. #PublicDomainDayCountdown December 29: Chocolate Kiddies, an African American music and dance revue that toured Europe in 1925, featured songs by Duke Ellington and Jo Trent including “Jig Walk”, “Jim Dandy”, and “With You”. They join the public domain in 3 days. 
#PublicDomainDayCountdown December 30: Lon Chaney starred in 2 of the top-grossing movies of 1925. The Phantom of the Opera has long been in the public domain due to copyright nonrenewal. The Unholy Three, which was renewed, joins it in the public domain in 2 days. #PublicDomainDayCountdown (If you’re wondering why some of the other big film hits of 1925 haven’t been in this countdown, in many cases it’s also because their copyrights weren’t renewed. Or they weren’t actually copyrighted in 1925.) December 31: “…You might as well live.” Dorothy Parker published “Resumé” in 1925, and ultimately outlived most of her Algonquin Round Table-mates. This poem, and her other 1925 writing for periodicals, will be in the public domain tomorrow. #PublicDomainDayCountdown Posted in copyright, publicdomain | 3 Comments From our subjects to yours (and vice versa) Posted on December 3, 2020 by John Mark Ockerbloom (TL;DR: I’m starting to implement services and publish data to support searching across library collections that use customized subject headings, such as the increasingly-adopted substitutes for LCSH terms like “Illegal aliens”. Read on for what I’m doing, why, and where I would value advice and discussion on how to proceed.) I’ve run the Forward to Libraries service for a few years now. As I’ve noted in earlier posts here, it’s currently used on The Online Books Page and in some Wikipedia articles to search for resources in your local library (or any other library you’re interested in) on a subject you’re exploring. One of the key pieces of infrastructure that makes it work is the Library of Congress Subject Headings (LCSH) system, which many research libraries use to describe their holdings. Using the headings in the system, along with mappings between it and other systems for describing subjects (such as the English Wikipedia article titles that Forward to Libraries knows how to relate to LCSH) allows researchers to find materials on the same subjects across multiple collections, using common terminology. There are limitations to relying on LCSH for cross-collection subject searches, though. First of all, many libraries, particularly those outside the US, do not use LCSH. Some use other subject vocabularies. If a mapping has been defined between LCSH and another subject vocabulary (as has been done, for example, with MeSH) one can use that mapping to determine search terms to use in libraries that use that subject vocabulary. We don’t yet have that capability in Forward to Libraries, but I’m hoping to add it eventually. Changing the subjects I’m now also seeing more libraries that use LCSH, but that also use different terms for certain subjects that they find more appropriate for their users. While there is a process for updating LCSH terms (and its terms get updated on a monthly basis) the process can be slow, hard for non-specialists to participate in, and contentious, particularly for larger-scale subject heading changes. It can also be subject to pressure by non-librarians. The Library of Congress ultimately answers to Congress (as its name suggests), and members of Congress have used funding bills to block changes in subject headings that the librarian-run process had approved. They did that in 2016 for the subject heading “Illegal aliens”, where librarians had recommended using other terms to cover subjects related to unauthorized immigration. The documentary film “Change the Subject” (linked with context in this article) has a detailed report on this controversy. 
Four years after the immigration subject changes were blocked, some libraries have decided not to wait for LCSH to change, and are introducing their own subject terms. The University of Colorado Boulder, for example, announced in 2018 that they would use the term “Undocumented immigrants” where the Library of Congress had “Illegal aliens”. Other libraries have recently announced similar changes. Some library consortia have organized systematic programs to supersede outdated and offensive terms in LCSH in their catalogs. Some groups now maintain specialized subject vocabularies that can both supplement and supersede LCSH terms, such as Homosaurus for LGBT+-related subjects. And there’s also been increasing interest in using subject terms and classifications adapted to local communities. For instance, the Brian Deer Classification System is intended to be both used and shaped by local indigenous communities, and therefore libraries in different locations that use it may well use different terms for some subjects, depending on local usage and interests.
Supporting cross-collection search in a community of localized catalogs
We can still search across collections that use local terms, as long as we know what those terms are and how to translate between them. Forward to Libraries already uses a data file indicating Wikipedia article titles that correspond closely to LCSH subjects, and vice versa. By extension, we can also create a data file indicating terms to use at a given library that correspond to terms in LCSH and other vocabularies, so we can see what resources are available at different places on a given topic. You can see how that works in practice at The Online Books Page. As I write this, we’re still using the unaltered LCSH subjects (updated to October 2020), so we have a subject page showing free online books on “Illegal aliens”. You can follow links from there to see what other libraries have. If you select the “elsewhere” link in the upper left column and choose the Library of Congress as the library to search, you’ll see what they hold under that subject heading. But if you instead choose the University of Colorado Boulder, you’ll see what they have under “Undocumented immigrants”, the subject term they’ve adopted there. Similar routing happens from Wikipedia. The closest related Wikipedia article at present is “Illegal immigration”, and if you go down to the Further Reading section and select links in the Library Resources box, selecting “Online books” or most libraries will currently take you to their “Illegal aliens” subject search. But selecting University of Colorado Boulder (from “Resources in other libraries” if you don’t already have it specified as your preferred library in Wikipedia) will take you to their “Undocumented immigrants” search. This routing applies two mappings, one from Wikipedia terms to LCSH terms, and another from LCSH terms to local library terms.
A common data resource
These sorts of transformations are fundamentally data-driven. My Forward to Libraries Github repository now includes a data file listing local subject terms that different libraries use, and how they relate to LCSH subject terms. (The library codes used in the file are the same ones that are used in my libraries data file, and are based on OCLC and/or ISIL identifiers.)
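To make the two-step routing concrete, here is a minimal sketch in Python. The dictionaries, library codes, and function name below are hypothetical stand-ins invented for illustration; the actual Forward to Libraries data files and code have their own formats and identifiers.

```python
# A minimal sketch of the two-step subject routing described above.
# The data here is a hypothetical stand-in for the Forward to Libraries
# data files; the real files use their own formats and library codes.

# Wikipedia article title -> LCSH heading
WIKIPEDIA_TO_LCSH = {
    "Illegal immigration": "Illegal aliens",
}

# (library code, LCSH heading) -> local heading used at that library
LOCAL_TERMS = {
    ("EXAMPLE-LIB", "Illegal aliens"): "Undocumented immigrants",
}


def local_subject_term(wikipedia_title: str, library_code: str) -> str:
    """Return the subject term to search at a given library for a Wikipedia topic.

    Falls back to the LCSH heading itself when the library has no local override.
    """
    lcsh_heading = WIKIPEDIA_TO_LCSH.get(wikipedia_title, wikipedia_title)
    return LOCAL_TERMS.get((library_code, lcsh_heading), lcsh_heading)


print(local_subject_term("Illegal immigration", "ANOTHER-LIB"))  # "Illegal aliens"
print(local_subject_term("Illegal immigration", "EXAMPLE-LIB"))  # "Undocumented immigrants"
```

The design point is simply that the local overrides live in a small, shareable lookup table layered on top of the existing Wikipedia-to-LCSH mapping, so adding a library's customizations does not require changing the base mapping.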
The local subject terms file is very short for now; as I write this, it only has enough data for the examples I’ve described above, but I’ll be adding more data shortly for other libraries that have announced and implemented subject heading changes. (And I’ll be glad to hear about more so I can add them.) As with other data in this repository, the data in this file is CC0, so it can be used by anyone for any purpose. In particular, it could be used by services other than my Forward to Libraries tool, such as by aggregated catalogs that incorporate data from multiple libraries, some of which might use localized subject terms that have LCSH analogues.
Where to go next
What I’ve shown so far is not far removed from a proof-of-concept demo, but I hope it suggests ways that services can be developed to support searches among and across library collections with diverse subject headings. As I mentioned, I’ll be adding more data on localized subject headings as I hear about it, as well as adding more functionality to the Forward to Libraries service (such as the ability to link from a collection with localized subject headings, so I can support them in The Online Books Page, or in other libraries that have such headings and want to use the service). There are some extensions that could be done to the basic data model to support scaling up these sorts of localizations, such as customizations used by all the libraries in a given consortium, or ones that adopt wholesale an alternative set of subjects, whether that be MeSH, Homosaurus, or the subject thesaurus of a national library outside the US. Even with data declarations supporting those sorts of “bulk” subject mappings, a universal subject mapping knowledge base could get large over time. I’ve created my own mapping file for my services, and for now I’m happy to grow it as needed and share the data freely. But if there is another suitable mapping hub already available or in the works, I’m happy to consider using that instead. It’s important to support exploration across a community of diverse libraries with a diverse array of subject terms and descriptions. I hope the tools and data I’ve described here will help advance us towards that goal, and that I can help grow them from their current nascent state to make them more broadly useful.
Posted in discovery, metadata, subjects, wikipedia | Leave a comment
Everybody’s Library Questions: Finding films in the public domain
Posted on March 30, 2020 by John Mark Ockerbloom
Welcome to another installment of Everybody’s Library Questions, where I give answers to questions people ask me (in comments or email) that seem to be useful for general consumption. Before I start, though, I want to put in a plug for your local librarians. Even though many library buildings are closed now (as they should be) while we’re trying to get propagation and treatment for COVID-19 under control, many of those libraries offer online services, including interactive online help from librarians. (Many of our libraries are also expanding the scope and hours of these services during this health crisis.) Your local librarians will have the best knowledge of what’s available to you, can find out more about your needs when they talk to you, and will usually be able to respond to questions faster than I or other specific folks on the Internet can. Check out your favorite library’s website, and look for links like “get help” or “online chat” and see what they offer.
OK, now here’s the question, extracted from a comment made by Nicholas Escobar to a recent post:
I am currently studying at the University of Edinburgh getting masters degree in film composition. For my final project I am required to score a 15 minute film. I was thinking of picking a short silent film (any genre) in the public domain that is 15 minutes (or very close to that length) and was wondering if you had any suggestions?
There are three questions implied by this one: First, how do you find out what films exist that meet your content criteria? Second, how do you find out whether films in that set are in the public domain? Finally, how can you get access to a film so you can do things with it (such as write a score for it)?
There are a few ways you can come up with films to consider. One is to ask your local librarian (see above) or professor to recommend reference works or data sources that feature short films. (Information about feature films, which run longer, is often easier to find, but there’s a fair bit out there as well on short films.) Another is to search some of the reference works and online data sources I’ll mention in the other answers below.
The answer to the copyright question depends on where you are. In the United States, there are basically three categories of public domain films:
First, there are films copyrighted before 1925. All such films’ copyrights have now expired in the US. This covers most, but not all, of the commercial silent-film era; once The Jazz Singer came out in 1927, movie studios quickly switched to films with sound.
Second, there are US films that entered the public domain because they did not take the steps required to secure or maintain their copyrights. Researching whether this has occurred with a particular film can be complicated, but because there’s been so much interest in cinema history, others have already researched the copyright history of many US films. The Wikipedia article “List of films in the public domain in the United States” cites a number of reference sources you can check for the status of various films. (It also lists specific films believed to be in the public domain, but you should check sources cited in the article for those films, and not just take the word of what could be a random Internet user before relying on that information.)
Third, there are films created in their entirety by the US government. There’s a surprisingly large number of these, in various genres and lengths, with tens of thousands or more digitized in the Internet Archive’s United States Government film collection or listed in the National Archives catalog. You can do lots of things with works of the United States government, which are generally not subject to copyright.
That’s the situation in the United States, at least. However, if you’re not in the United States, different rules may apply. In Edinburgh and elsewhere in the United Kingdom (and in most of the rest of Europe), works are generally copyrighted until the end of the 70th year after the death of the last author. In the UK, the authors of a film are considered to be the principal director, the screenwriter(s), and the composer(s). (For more specifics, see the relevant portion of UK law.) However, some countries will also let the copyrights of foreign works expire when they do in their country of origin, and in those countries a US film that’s in the public domain in the US would also be public domain.
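For readers who like to see the decision process laid out, here is a rough Python sketch of the US-side triage just described. It is a simplification for illustration only: the function and its parameters are invented for this sketch, it encodes only the three broad categories above, and it is no substitute for checking renewal records, copyright notices, and GATT restoration; as the next paragraphs note, non-US rules differ as well.

```python
from typing import Optional


def us_film_pd_triage(
    publication_year: int,
    us_government_work: bool = False,
    formalities_maintained: Optional[bool] = None,
) -> str:
    """Rough triage of a film's US public domain status (as of 2020).

    Encodes only the three broad categories discussed above; real
    determinations require research into renewals, notices, and
    GATT restoration for foreign films.
    """
    if us_government_work:
        return "public domain: US government work"
    if publication_year < 1925:
        return "public domain: copyright expired"
    if formalities_maintained is False:
        return "likely public domain: formalities not met (verify with cited sources)"
    return "assume in copyright until research shows otherwise"


# Example: a 1923 short subject vs. a 1940 studio feature of unknown status.
print(us_film_pd_triage(1923))  # public domain: copyright expired
print(us_film_pd_triage(1940))  # assume in copyright until research shows otherwise
```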
As you can see in the UK law section I link to, the UK does apply such a “rule of the shorter term” to films from outside the European Economic Area (EEA), if none of the authors are EEA nationals. So you might be good to go in the UK with many, but not all, US films that are public domain in the US. (I’m not a UK copyright expert, though; you might want to talk to one to be sure.)
Let’s suppose you’ve come up with some suitable possible films, either ones that are in the public domain, ones that have suitable Creative Commons licenses or for which you can otherwise get permission to score, or ones that are in-copyright but that you could score in the context of a study project, even if you couldn’t publish the resulting audiovisual work. (Educational fair use is a thing, though its scope also varies from country to country. Here’s a guide from the British Library on how it works in the UK.) We then move on to the last question: How do you get hold of a copy so you can write a score for it?
The answer to that question depends on your situation. Right now, the situation for many of us is that we’re stuck at home, and can’t visit libraries or archives in person. (And our ability to get physical items like DVDs or videotapes may be limited too.) So for now, you may be limited to films you can obtain online. There are various free sources of public domain films: I’ve already mentioned the Internet Archive, whose moving image archive includes many films that are in the public domain (and many that are not, so check rights before choosing one to score). The Library of Congress also offers more than 2,000 compilations and individual films free to all online. And your local library may well offer more, as digital video, or as physical recordings (if you can still obtain those). A number of streaming services that libraries or individuals can subscribe to offer films in the public domain that you can feel free to set to music. Check with your librarian or browse the collection of your favorite streaming service.
I’m not an expert in films myself. Folks reading this who know more, or have more suggestions, should feel free to add comments to this post while comments are open. In general, the first librarians you talk to won’t usually be experts about the questions you ask. But even when we can’t give definitive answers on our own, we’re good at sending researchers in productive directions, whether that’s to useful research and reference sources, or to more knowledgeable people. I hope you’ll take advantage of your librarians’ help, especially during this health crisis. And, for my questioner and other folks who are interested in scoring or otherwise building on public domain films, I’ll be very interested in hearing about the new works you produce from them.
Posted in copyright, publicdomain, Questions | Comments Off on Everybody’s Library Questions: Finding films in the public domain
Build a better registry: My intended comments to the Library of Congress on the next Register of Copyrights
Posted on March 19, 2020 by John Mark Ockerbloom
The Library of Congress is seeking public input on abilities and priorities desired for the next Register of Copyrights, who heads the Copyright Office, a department within the Library of Congress. The deadline for comments as I write this is March 20, though I’m currently having trouble getting the form to accept my input, and operations at the Library, like many other places, are in flux due to the COVID-19 pandemic.
Below I reproduce the main portion of the comments I’m hoping to get in before the deadline, in the hope that they will be useful both for them and for others interested in copyright. I’ve added a few hyperlinks for context.
At root, the Register of Copyrights needs to do the job the position title implies: Build and maintain an effective copyright registry. A well designed, up-to-date digital registry should make it easy for rightsholders to register, and for the public to use registration information. Using today’s copyright registry involves outdated, cumbersome, and costly technologies and practices. Much copyright data is not online, and the usability of what is online is limited.
The Library of Congress is now redesigning its catalogs for linked data and modern interfaces. Its Copyright Office thus also has an opportunity to build a modern copyright registry linked to Library databases and to the world, with compatible linked data technologies, robust APIs, and free open bulk downloads. The Copyright Office’s registry and the Library of Congress’s bibliographic and authority knowledge bases could share data, using global identifiers to name and describe entities they both cover, including publications, works, creators, rightsholders, publishers, serials and other aggregations, registrations, relationships, and transactions. The Copyright Office need not convert wholesale to BIBFRAME, or to other Library-specific systems. It simply needs to create and support identifiers for semantic entities described in the registry (“things, not strings“), associate data with them, and exchange data in standard formats with the Library of Congress catalog and other knowledge bases. As a comprehensive US registry for creative works of all types, the Copyright Office is uniquely positioned to manage such data.
The Deep Backfile project at the University of Pennsylvania (which I maintain) provides one example of uses that can be made of linked copyright data. One of its pages shows selected copyrights associated with Collier’s Magazine (1888-1957). It links to online copies of public domain issues, contents and descriptive information from external sources like FictionMags, Wikidata, and Wikipedia, and rights contact information for some of its authors. The information shown has no rights restrictions, and can be used by humans and machines. JSON files, and the entire Deep Backfile knowledge base, are available from this page and from Github.
It is not the Copyright Office’s job to produce applications like these. But it can provide data that powers them. Much of our Deep Backfile data was copied manually from scanned Catalog of Copyright Entries pages, and from online catalogs lacking easily exported or linked data. The Copyright Office and the Library of Congress could instead produce such data natively (first prospectively, eventually retrospectively). In the process, they could also cross-pollinate each other’s knowledge bases.
To implement this vision, the Register needs to understand library standards and linked open data technologies, gather and manage a skilled implementation team, and be sufficiently persuasive, trusted, and organized to bring stakeholders together inside and outside the Copyright Office and the Library of Congress to support and fund a new system’s development. If explained and implemented well, a registry of the sort described here could greatly benefit copyright holders and copyright users alike.
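As a purely illustrative aside, here is a sketch (in Python, serialized as JSON) of what an identifier-centered registration record of the “things, not strings” sort might look like. Every URI, property name, and value below is invented for the sketch; a real registry would publish its own vocabulary, identifiers, and serializations, and nothing here reflects an actual Copyright Office or Library of Congress schema.

```python
import json

# An invented, illustrative registration record organized around identifiers
# rather than free-text strings. None of these URIs or property names come
# from an actual Copyright Office or Library of Congress schema.
registration = {
    "id": "https://registry.example.gov/registration/B000001",
    "type": "CopyrightRegistration",
    "registrationDate": "1925-09-05",
    "registers": {
        "id": "https://registry.example.gov/work/example-serial-issue-1925-09-05",
        "type": "SerialIssue",
        "partOf": "https://registry.example.gov/serial/example-magazine",
    },
    "claimant": "https://registry.example.gov/agent/example-publisher",
}

# Publishing records like this in bulk, in a standard serialization, would let
# library catalogs and projects like Deep Backfile link to the same entities
# by identifier instead of re-keying strings from scanned catalog pages.
print(json.dumps(registration, indent=2))
```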
The Register of Copyrights should also know copyright law thoroughly, implement sensible regulations required by copyright law and policy, and be a trusted and inclusive expert that rightsholders, users, and policymakers can consult. I expect other commenters to go into more detail about these skills, which are also useful in building a trustworthy registry of the sort I describe. But the Copyright Office is long overdue to be led by a Register who can revitalize its defining purpose: Register copyrights, in up-to-date, scalable, and flexible ways that encourage wide use of the creations they cover, and thus promote the progress of science and useful arts.
Update, March 20: As of the late afternoon on the day of the deadline, the form appears to be still rejecting my submission, without a clear error message. It did, however, accept a very short submission without any attachment, and with a URL pointing here. So below I include the rest of my intended comment, listing 3 top priorities. (The essay above was for the longer comment asked for about knowledge, skills, and abilities.) These priorities largely restate in summary form what I wrote above. If anyone else reading this was unable to post their full comment by the deadline due to technical difficulties, you can try emailing something to me (or leaving a comment to this post) and posting a simple comment to that effect on the LC site, and I’ll do my best to get your full comment posted on this blog.
Priority #1: Make copyright registration data easy to use: Data should be easy to search, consult, and analyze, individually and in bulk, by people and machines, linked with the Library of Congress’s rich bibliographic data, facilitating verification of copyright ownership, licensing from rightsholders, and cataloging and analysis by libraries, publishers, vendors, and researchers.
Priority #2: Make effective copyright registration easy to do: Ensure copyright registration is simple, inexpensive, supports a variety of electronic and physical deposits, and where possible supports persistent, addressable identifiers and accompanying data for semantic entities described in registrations, and their relationships.
Priority #3: Be a trusted, inclusive resource for understanding copyright and its uses: Creators, publishers, consumers, and policymakers all are concerned with copyright, and with possible reforms. The Register should help all understand their rights, and provide expert and impartial advice and mediation for diverse copyright stakeholders and policymaking priorities.
Other factors: The Register of Copyrights should also be capable of creating, implementing, and keeping up to date appropriate regulations and practices required or implied by Congressional statutes. (For the “additional comments” attachment, I had a static PDF attachment showing the Collier’s web page linked from my main essay, as it was on March 19.)
Posted in copyright, data, metadata, open access, serials | Comments Off on Build a better registry: My intended comments to the Library of Congress on the next Register of Copyrights
Welcome to everybody’s online libraries
Posted on March 16, 2020 by John Mark Ockerbloom
As coronavirus infections spread throughout the world, lots of people are staying home to slow down the spread and save lives. In the US, many universities, schools, and libraries have closed their doors. (Here’s what’s happening at the library where I work, which as I write this has closed all its buildings.)
But lots of people are still looking for information, to continue studies online, or just to find something good to read. Libraries are stepping up to provide these things online. Many libraries have provided online information for years, through our own websites, electronic resources that we license, create, or link to, and other online services. During this crisis, as our primary forms of interaction move online, many of us will be working hard to meet increased demand for digital materials and services (even as many library workers also have to cope with increased demands and stresses on their personal lives). Services are likely to be in flux for a while. I have a few suggestions for the near term:
Check your libraries’ web sites regularly. They should tell you whether the libraries are now physically open or closed (many are closed now, for good reason), and what services the library is currently offering. Those might change over time, sometimes quickly. Our main library location at Penn, for instance, was declared closed indefinitely last night, less than 12 hours before it was next due to reopen. On the other hand, some digitally mediated library services and resources might not be available initially, but then become available after we have safe and workable procedures set up for them and sufficient staffing. Many library web sites also prominently feature their most useful electronic resources and services, and have extensive collections of electronic resources in their catalogs or online directories. They may be acquiring more electronic resources to meet increased user demand for online content. Some providers are also increasing what they offer to their library customers during the crisis, and sometimes making some of their material free for all to access. If you need particular things from your library during this crisis, reach out to them using the contact information given on their website. When libraries know what their users need, they can often make those needs a priority, and can let you know if and when they can provide them.
Check out other free online library services. I run one of them, The Online Books Page, which now lists over 3 million books and serials freely readable online due to their public domain status or the generosity of their rightsholders. We’ll be adding more material there over the next few weeks as we incorporate the listings of more collections, and respond to your requests. There are many other services online as well. Wikipedia serves not only as a crowd-sourced collection of articles on millions of topics, but also as a directory of further online resources related to those topics. And the Internet Archive also offers access to millions of books and other information resources no longer readily commercially available, many through controlled digital lending and other manifestations of fair use. (While the limits of fair use are often subject to debate, library copyright specialists make a good case that its bounds tend to increase during emergencies like this one. See also Kyle Courtney’s blog for more discussion of useful things libraries can do in a health crisis with their copyright powers.)
Support the people who provide the informative and creative resources you value. The current health crisis has also triggered an economic crisis that will make life more precarious for many creators. If you have funds you can spare, send some of them their way so they can keep making and publishing the content you value.
Humble Bundles, for instance, offer affordable packages of ebooks, games, and other online content you can enjoy while you’re staying home, and pay for to support their authors, publishers, and associated charities. (I recently bought their Tachyon SF bundle with that in mind; it’s on offer for two more weeks as I write this.) Check the websites of your favorite authors and artists to see if they offer ways to sponsor their work, or specific projects they’re planning. Buy books from your favorite independent booksellers (and if they’re closed now, check their website or call them to see if you can buy gift cards to keep them afloat now and redeem them for books later on). Pay for journalism you value. Support funding robust libraries in your community.
Consider ways you can help build up online libraries. Many research papers on COVID-19 and related topics have been opened to free access by their authors or publishers since the crisis began. Increasing numbers of scholarly and other works are also being made open access, especially by those who have already been paid for creating them. If you’re interested in sharing your work more broadly, and want to learn more about how you can secure rights to do so, the Authors’ Alliance has some useful resources. As libraries shift focus from in-person to online service, some librarians may be busy with new tasks, while others may be left hanging until new plans and procedures get put into motion. If you’re in the latter category, and want something to do, there are various library-related projects you can work on or learn about. One that I’m running is the deep backfile project to identify serial issues that are in the public domain in less-than-obvious ways, and to find or create free digital copies of these serials (so that, among other things, people who are stuck at home can read them online). I’ve recently augmented my list of serial backfiles to research to include serials held by the library in which I work, in the hopes that we could eventually find or produce digital surrogates for some of them that our readers (and anyone else interested) could access from afar. I can also add sets for other libraries; if you’re interested in one for yours, let me know and I can go into more detail about the data I’m looking for. (I’m not too worried about creating too many serial sets to research, especially since once information about a serial is added into one of the serial sets, it also gets automatically added into any other sets that include that serial.)
Take care of yourself, and your loved ones. Whether you work in libraries or just use them, this is a stressful time. Give yourself and those around you room and resources to cope, as we disengage from many of our previous activities, and deal with new responsibilities and concerns. I’m gratified to see the response of the Wikimedia Foundation, for instance, which is committed both to keeping the world well-informed and up-to-date through Wikipedia and related projects, and also to letting its staff and contractors work half-time for the same pay during the crisis, and waiving sick-day limits. Among new online community support initiatives, I’m also pleased to see librarian-created resources like the Ontario Library Association’s pandemic information brief, with useful information for library users and workers, and the COVID4GLAM Discord community, a discussion space to support the professional and personal needs of people working in libraries, archives, galleries and museums.
These will be difficult times ahead. Our libraries can make a difference online, even as our doors are closed. I hope you’ll be able to put them to good use.
Posted in libraries, online books, open access | 4 Comments
Public Domain Day 2020: Coming Around Again
Posted on January 1, 2020 by John Mark Ockerbloom
I’m very happy for 2020 to be arriving. As the start of the 2020s, it represents a new decade in which we can have a fresh start, and hope to make better decisions and have better outcomes than some of what we’ve gone through in recent years. And I’m also excited to have a full year’s worth of copyrighted works entering the public domain in much of the world, including in the US for the second year in a row after a 20-year public domain freeze.
Outside the US, in countries that still use the Berne Convention’s “life plus 50 years” copyright terms, works by authors who died in 1969 are now in the public domain. (Such countries include Canada, New Zealand, and a number of other countries mostly in Asia and Africa.) Many other countries, including most European countries, have extended copyright terms to life of the author(s) plus 70 years, often under pressure from the United States or the European Union. In those countries, works by authors who died in 1949 are now in the public domain. The Public Domain Review has a “class of 2020” post featuring some of these authors, along with links to lists of other people who died in the relevant years.
In the US, nearly all remaining copyrights from 1924 have now expired, just as copyrights from 1923 expired at the start of last year. (The exceptions are sound recordings, which will still be under copyright for a little while longer. But thanks to recent changes in copyright law, those too will join the public domain soon instead of remaining indefinitely in state copyright.) I discussed some of the works joining the public domain in a series of blog posts last month, in the last one linking to some posts by others that mentioned new public domain arrivals from 1924. But I’m happy not just because of these specific works, but also because new arrivals to the US public domain are now an annual event, and not just something that happens with published works at rare intervals. I could get used to this.
It isn’t all good news this year. The most recent draft of the intellectual property chapter of the US-Canada-Mexico trade agreement requires Canada to extend its copyrights another 20 years, making it freeze its public domain not long after we’ve unfrozen our own in the US. But the agreement hasn’t yet been ratified, and could conceivably still be changed or rejected. And the continued force of copyrights from the second half of the previous ’20s while we’re entering a new set of ’20s is a reminder that US copyright terms remain overlong; so long, in fact, that many works from that era are lost or severely deteriorated before their copyrights expire.
But there’s now an annual checklist of things to do for me and for many other library organizations. For me, some of the things to do for The Online Books Page include:
Updating our documentation on what’s public domain (done) and on what versions of our site are public domain (also done; as in previous years, I’m dedicating to the public domain works that I wrote whose copyrights I control that were published more than 14 years ago. This year that includes the 2005 copyrights to The Online Books Page.)
Removing the “no US access” notices from 1924 books I’d linked to at non-US sites, when I couldn’t previously establish that they were public domain here; and removing “US access only” notices for 1879 volumes at HathiTrust, which over the next few days will be making 140-year-old volumes globally accessible without requiring author-death-date review. (This and other activities below will start tomorrow and continue until done.)
Updating our list of first active renewals for serials and our “Determining copyright status of serial issues” decision guide to reflect the expiration of 1924’s copyrights. As part of this process, I’ll be deleting all the 1924 serial issue and contribution renewals currently recorded in our serials knowledge base, since they’re no longer in force. If anyone wants to know what they were for historical or other analytical purposes, I have a zipped collection of all our serial renewals records as of the end of 2019, available on request. They can also be found in the January 1, 2020 commit of this Github directory.
Adding newly opened or scanned 1924 books to our listings, through our automated OAI harvests of selected digital collections, readers’ suggestions and requests, surveys of prize winners and other relevant collections, and our own bibliographer selections.
All of this is work I’m glad to be doing this year, and hope to be doing more in the years to come. (And I’m already streamlining our processes to make it easier to do in years to come.) It’s the job of libraries to collect and preserve works of knowledge and creativity and make them easy for people to discover, access, and use. It’s also our job to empower our users to draw on those works to make new ones. As the public domain grows, we can freely collect and widely share more works, and our users can likewise build on and reuse more public domain works in their own creations. Supporting the public domain, then, is supporting the work and mission of libraries. I therefore hope that all libraries and their users will support a robust public domain, and have more works to celebrate and work with every year. Happy Public Domain Day!
Posted in publicdomain | Comments Off on Public Domain Day 2020: Coming Around Again
2020 vision #5: Rhapsody in Blue by George Gershwin
Posted on December 31, 2019 by John Mark Ockerbloom
It’s only a few hours from the new year where I write this, but before I ring in the new year, and a new year’s worth of public domain material, I’d like to put in a request for what music to ring it in with: George Gershwin’s Rhapsody in Blue, which joins the public domain in the US as the clock strikes twelve, over 95 years after it was first performed. The unofficial song for Public Domain Day 2019 turned out to be “Yes! We Have No Bananas”, one of the members of the first big class of US public domain works in the last 20 years. That’s a fun novelty song, and certainly memorable, but not something I necessarily want to hear a lot. In contrast, for me Rhapsody in Blue has a freshness that makes it a joy to hear repeatedly, right from the opening clarinet glissando (apparently the idea of clarinetist Ross Gorman, who took the scale that Gershwin had composed for the piece and gave it the bendy, slidy wail that tells you right away that this is no ordinary concert piece).
It’s brought together classical, popular, high-art and everyday music, as it’s been played and recorded countless times by jazz bands (the original scoring is for jazz band and piano), symphony orchestras, and pop musicians like Billy Joel. Even its licensing as a theme tune for an airline hasn’t diminished it.
There’s lots of other work joining the public domain along with Gershwin’s tune. I’ve only had a chance to mention a few others in my short series, but others have mentioned more works you may find of interest. At the Internet Archive’s blog, Elizabeth Townsend Gard writes about Vera Brittain’s Not without Honour and other 1924 works that will be in the public domain very soon. Duke’s Public Domain Day 2020 post mentions various books, films, and musical compositions joining the public domain as well (and has more to say on Rhapsody in Blue). Wikipedia’s various 1924 articles also mention various works that will either be joining the public domain, or becoming more clearly established there. And HathiTrust will begin opening access to tens of thousands of scanned volumes from 1924 over the next few days.
I’ll have more to say on the new arrivals tomorrow, sometime after the midnight bells chime. By tradition, the first tune played in the New Year is usually the public domain song “Auld Lang Syne”. But after that, at your New Year’s party or at a later Public Domain celebration, you might enjoy hearing or playing Gershwin’s new arrival in the public domain.
Posted in publicdomain | Comments Off on 2020 vision #5: Rhapsody in Blue by George Gershwin
2020 vision #4: Ding Dong Merrily on High by George Ratcliffe Woodward and others
Posted on December 19, 2019 by John Mark Ockerbloom
It’s beginning to sound a lot like Christmas everywhere I go. The library where I work had its holiday party earlier this week, where I joined librarian colleagues singing Christmas, Hanukkah, and winter-themed songs in a pick-up chorus. Radio stations and shopping centers play a familiar rotation of popular seasonal songs whose biggest hits are from a surprisingly narrow date range centered in the 1950s. And more traditional familiar Christmas carols, hymns, and songs are being sung and played in concert halls and churches well into January.
The more “classic” Christmas music often feels timeless to those of us singing and hearing it. But while their roots often go back far, the form in which we know them is often much newer than we might think. Notice how the list in the previous link, for instance, includes “Carol of the Bells”, dated 1936. That’s when it was first published as a Christmas song, one that’s still under copyright. Its roots are older, and darker, as is made clear in a recent Slate article well worth reading. As noted there, the melody is based on a Ukrainian folk tune (date unknown), its full musical setting composed by Mykola Leontovych (assassinated by a Soviet agent in 1921), and Christmas-themed lyrics written by the Ukrainian-descended American musician Peter Wilhousky (who lived until 1978).
While “Carol of the Bells” still has a number of years left to go on its copyright, another classic Christmas carol will most likely be joining the public domain in the US in just under two weeks. Like Carol of the Bells, “Ding Dong Merrily on High” is based on a folk tune, in this case a secular dance tune first published in France in the 16th century under the title “Branle de l’Official”.
In 1924, George Ratcliffe Woodward, an English cleric already known for publishing collections of old songs, wrote lyrics for the tune recalling earlier ages, and included them in the Cambridge Carol-Book, published that year by the Society for Promoting Christian Knowledge. Charles Wood, who’d collaborated with Woodward on the earlier Cowley Carol Book,  wrote a harmonization to go with it.  While you won’t hear it at every Christmas service, it remains widely sung this time of year.  That’s in large part because it’s so much fun to sing, with its dance-like rhythms, its long bell-like vocal runs on “Gloria” (something also heard in “Angels We Have Heard on High“), and its praise of various forms of music (musicians liking to hear good things about themselves as much as anyone else). I don’t actually know for sure that “Ding Dong Merrily on High” is still under copyright here.  I have not found a 1951 or 1952 copyright renewal for the song or the book it was published in, but I’m assuming that, if nothing else, GATT restoration retroactively secured and automatically renewed a 1924 US copyright for the song as published in the Cambridge Carol-Book.  (Folks with more knowledge or legal expertise are free to correct me on that.)  Later published arrangements of the song may continue to have active copyrights, but only for material original to those arrangements.  1924’s remaining copyrights, on the other hand, all end in the US on January 1.   (And since Woodward and Wood both died over 70 years ago, the song’s already public domain in most other countries.) The arrival of 2020, then, should at least clear up any ambiguity about the public domain status of the basic carol.  I appreciate that, in part because this song, like many other Christmas carols, lives in a sort of liminal space between the private property regimes set up for copyright holders and the older, more informal understandings of folk culture.  Both kinds of spaces have good reason to exist. On the one hand, it’s good to have more than a few people who can earn a living through music, and one important way many musicians do so is by controlling rights to their compositions.  On the other hand, the folk process, which originally gave rise to the tunes for both “Ding Dong Merrily on High” and “Carol of the Bells”, is also a very good way of creating and passing on shared cultural works. Conflict can rage when two different sets of cultural expectations around creative works try to occupy the same space.  That’s one reason we’ve seen decades of conflict in academia over open access, where scholarly work is largely published by companies that depend on its control and sale to earn money, while it’s largely written by scholars who earn their money in other ways, and tend to prefer free, widespread availability of their work.  Sometimes informal arrangements work best to keep the peace.  Publishers, for instance, have grown more used to free preprint servers, and memes and fan fiction communities have become more widely accepted (and even winning awards) as long as they stay well away from unauthorized commercial exploitation (where both big and small creators tend to draw the line). Sometimes, though, it’s best to have a more formal understanding that works are free for anyone to freely use as we like.  That’s what we’ll have when 1924’s copyrights end, and the works they cover, such as “Ding Dong Merrily on High” are clearly seen to be in the public domain.  
And then, those of us who are so inclined can freely sing “hosanna in excelsis!“ Posted in publicdomain | Comments Off on 2020 vision #4: Ding Dong Merrily on High by George Ratcliffe Woodward and others 2020 vision #3: The Most Dangerous Game by Richard Connell Posted on December 13, 2019 by John Mark Ockerbloom “Be a realist. The world is made up of two classes–the hunters and the huntees. Luckily, you and I are hunters.” Sanger Rainsford speaks these words at the start of “The Most Dangerous Game”, one of the most famous short stories of all time. First published in Collier’s magazine in 1924, it’s been reprinted in numerous anthologies, been adapted for radio, TV, and multiple movies, and assigned in countless middle and high school English classes.  The tropes established in the story, in which a hunter finds himself a “huntee”, are so well-established in present-day American culture that there are lengthy TV Tropes pages not just for the story itself, but for the trope named by its title. Up until now, the story’s been under copyright in the US, as well as in Europe and other countries that have “life plus 70 years” copyright terms.  (The author, Richard Connell,  died just over 70 years ago in 1949, so as of January 1, it will be public domain nearly everywhere in the world.)  Anyone reprinting the story, or explicitly adapting it for drama or art has had to get permission or pay a royalty.  On the other hand, many creators have reused its basic idea– humans being hunted for sport or entertainment– without getting such permission. That’s because ideas themselves are not copyrightable, but rather the expression of those ideas.  And the basic idea long predates this particular story: Consider, for instance, gladiators in Roman arenas, or tributes being hunted down in the Labyrinth by the Minotaur of Greek mythology.  But the particular formulation in Connell’s short story, in which General Zaroff, a former nobleman bored with hunting animals, lures humans to his private island to hunt and kill them for sport, is both distinctively memorable, and copyrightable.  Stray too close to it, or quote too much from the story, and you may find yourself the target of lawyers.  (But perhaps not if you yourself are dangerous enough game.  I don’t know if the makers of “The Incredibles“, which also featured a rich recluse using his wits and inventions to hunt humans on a private island, paid royalties to Connell’s estate, or relied on fair use or arguments about uncopyrightable ideas.  But in any case, Disney is better equipped to either negotiate or defend themselves against infringement lawsuits than others would be.) Rereading the story recently, I’m struck by both how it reflects its time in some ways, and in how its action is surprisingly economical.  In 1924, we were still living in the shadow of the First World War, in which multiple empires and noble houses fell, while others continued but began to teeter.  The deadly spectacles of public executions and lynchings were still not uncommon in the United States.  And the dividing of people into two classes– those who are inherently privileged and those who are left in the cold or even considered fair game– was particularly salient that year, as the second incarnation of the Ku Klux Klan neared its peak in popularity, and as immigration law was changed to explicitly keep out people of the “wrong” national origin or race.  Those sorts of division haunt our society to this day. 
Rainsford objects to Zaroff’s dehumanizing game in what we now tend to think of as the story’s setup, which actually takes up most of the story’s telling.  (The description of the hunt itself is relatively brief, and no words at all are used to describe the final showdown, which implicitly takes place in the gap between the story’s last two sentences.)  In the end, though, Rainsford prevails by beating his opponent at his own game.  He doesn’t want to kill another human being, but when pressed to the extreme, he adopts his opponent’s rules (at the end giving Zaroff the sporting warning “I am still a beast at bay… Get ready”) and proves to be the better killer. With the story entering the public domain in less than three weeks, we’ll have the chance to reuse, adapt, and critique the story in quotation more freely than ever before.  I hope we use the opportunity not just to recapitulate the story, but to go beyond it in new ways. That’s what happens in the best reuses of tropes.  Consider, for instance, how in the Hunger Games books the main character, Katniss, repeatedly finds ways to subvert the trope of killing others for entertainment.  Instead of prevailing by beating opponents at the deadly human-hunting game the enemy has created, she and her allies find ways to reject the game’s premise, cut it short, or prevent its recurrence. When, in 19 days, we get another year’s worth of public domain works, I hope we too find ways not just to revisit what’s come before, but to make new and better work out of it.  That’s something that the public domain allows everyone, and not just members of some privileged class, to do. Posted in publicdomain | Comments Off on 2020 vision #3: The Most Dangerous Game by Richard Connell
faillab-wordpress-com-5107 ---- Fail!lab Fail!lab technology, libraries and the future!
faillab-wordpress-com-7233 ---- Fail!lab | technology, libraries and the future! Fail!lab technology, libraries and the future! Luddites, Trumpism and Change: A crossroads for libraries Posted on December 6, 2016 by mryanhess “Globalization is a proxy for technology-powered capitalism, which tends to reward fewer and fewer members of society.” – Om Malik Corner someone and they will react. We may be seeing this across the world as change, globalization, technology and economic dislocation force more and more people into the corner of benefit-nots. They are reacting out of desperation. It’s not rational. It’s not pretty. But it shouldn’t be surprising. Years ago at a library conference, one of the keynote speakers forecast that there would be a return to the analog (sorry, my Twitter-based memory does not identify the person). The rapidity of digitization would be met by a reaction. People would scurry back to the familiar, he said. They always do. Fast forward to 2016, where the decades-long trends toward globalization, borderless labor markets, denationalization, exponential technological change and corresponding social revolutions have hit the wall of public reaction. Brexit. Global Trumpism. Call it what you will. We’re in a change moment. The reaction is here. Reacting to the Reaction People in the Blue Zones, the Technorati, the beneficiaries of cheap foreign labor, free trade and technological innovation are scratching their heads. For all their algorithms and AI, they didn’t see this coming. Everything looked good on their feeds. No danger could possibly burst their self-assured bubble of inevitability. All was quiet. It was like a clear blue September 2001 morning in New York City. It was like the boardroom in the Federal Reserve in 2006. The serenity was over in an instant. Since Brexit, and then Trump’s election, the Glittery Digitarians have initiated a period of introspection. They’re looking up from their stock tickers and gold-plated smart watches to find a grim reality: the world is crowded with people who have lost much ground to the global maelstrom that has elevated a very small, lucky few to greatness. They are now seeing, as if for the first time, the shuttered towns. The empty retail stores. The displaced and homeless. Suddenly their confident talk of personal AI assistants has turned from technolust to terror. Their success suddenly looks short-sighted. Om Malik wrote in his recent New Yorker op-ed that Silicon Valley may soon find itself equated with the super villains on Wall Street. He posits that a new business model needs to account for the public good…or else.
I recently read Throwing Rocks at the Google Bus: How Growth Became the Enemy of Prosperity by Douglas Rushkoff. If you haven’t read it, now would be a good time. Like Bernie Sanders and others, Rushkoff has been warning of this kind of reaction for a while. The system is not designed for the public good, but only around a narrow set of shareholder requirements. All other considerations do not compute. My Reaction Let me put this in personal perspective. In my work, I engage the public in “the heart of Silicon Valley” on what they want from their community and what’s missing. What I hear is concern about the loss of quiet, of connection to others, of a pace of life that is not 24/7 and always a click away. This is consistent. People feel overwhelmed. As one of the chief technologists for my library, I find this puts me in a strange place. And I’ve been grappling with it for the past few months. On the one hand, people are curious. They’re happy to try the next big thing. But you also hear the frustration. Meanwhile, the burden of the Tech Industry is more than inflated rents and traffic. There’s a very obvious divide between long-time residents and newcomers. There’s a sense that something has been lost. There’s anger too, even here in the shadow of Google and Facebook. The Library as a Philosophy The other day, I was visited by a European Library Director who wanted to talk about VR. He asked me where I thought we’d be in ten years. I hesitated. My thoughts immediately went back to the words of despair that I’d been hearing from the public lately. Of course, the genie’s out of the bottle. We can’t stop the digital era. VR interface revolutions will likely emerge. The robots will come. But we can harness this change to our benefit. We can add rules to bend it to our collective needs. This is where the Library comes in. We have a sharing culture. A model that values bridging divides, pooling resources and re-distributing knowledge. It’s a model that is practically unique to the library, if you think about it. As I read Rushkoff, I kept coming back to the librarian’s philosophy on sharing. In his book, he contends that we need to re-imagine (re-code) our economy to work for people. He recalls technologies like HTTP and RSS, which were invented and then given away to the world to share and re-use. This sounded very ‘librarian’ to me. We share knowledge in the form of access to technology, after all. We host training on new maker gear, coding, robotics, virtual reality. Perhaps we need to double down on this philosophy. Perhaps we can be more than just a bridge. Maybe we can be the engine driving our communities to the other side. We can not just advocate, but do. Having a hackathon? Build a public alternative to the Airbnb app to be used by people in your town. Know the Future In the end, libraries, technologists and digitarians need to tell a better story. We need to get outside our bubbles and tell that story with words that resonate with the benefit-nots. And more, we need that story to be backed up with real-world benefits. It starts with asking the community what kind of world they want to live in. What obstacles keep them from living that way? And then asking how the library and technology can help make change. We have the philosophy, we have the spaces and we have public permission. Let’s get to work. Posted in innovation, librarianship, society, technology, Uncategorized | Leave a comment Is 3D Printing Dying? 
Posted on October 12, 2016 by mryanhess Inc.’s John Brandon recently wrote about The Slow, Sad, and Ultimately Predictable Decline of 3D Printing. Uh, not so fast. 3D Printing is just getting started. For libraries whose adopted mission is to introduce people to emerging technologies, this is a fantastic opportunity to do so. But it has to be done right. Another dead end? Brandon cites a few reasons for his pessimism: 3D printed objects are low quality and the printers are finicky 3D printing growth is falling behind initial estimates people in manufacturing are not impressed and the costs are too high I won’t get into all that’s wrong with this analysis, as I feel like most of it is incorrect, or at the very least, a temporary problem typical of a new technology. Instead, I’d like to discuss this in the library maker context. And in fact, you can apply these ideas to any tech project. How to make failure a win—no matter what Libraries are quick to jump on tech. Remember those QR Codes that would revolutionize mobile access? Did your library consider a Second Life branch? How about those Chromebooks! Inevitably, these experiments are going to fail. But that’s okay. As this blog often suggests, failure is a win when doing so teaches you something. Experimenting is the first step in the process of discovery. And that’s really what all these kinds of projects need to be. In the case of a 3D Printing project at your library, it’s important to keep this notion front and center. A 3D Printing pilot with the goal of introducing the public to the technology can be successful if people simply try it out. That seems easy enough. But to be really successful, even this kind of basic 3D Printing project needs to have a fair amount of up-front planning attached to it. Chicago Public Library created a successful Maker Lab. Their program was pretty simple: Hold regular classes showing people how to use the 3D printers and then allow those that completed the introductory course to use the printers in open studio lab times. When I tried this out at CPL, it was quite difficult to get a spot in the class due to popularity. The grant-funded project was so successful, based on the number of attendees, that it was extended and continues to this day. As a grant-funded endeavor, CPL likely wrote out the specifics before any money was handed over. But even an internally-funded project should do this. Keep the goals simple and clear so expectations on the front line match those up the chain of command. Figure out what your measurements of success are before you even purchase the first printer. Be realistic. Always document everything. And return to that documentation throughout the project’s timeline. Taking it to the next level San Diego Public Library is an example of a Maker Project that went to the next level. Uyen Tran saw an opportunity to merge startup seminars with their maker tools at her library. She brought aspiring entrepreneurs into her library for a Startup Weekend event where budding innovators learned how the library could be a resource for them as they launched their companies. 3D printers were part of this successful program. It’s important to note that Uyen already had the maker lab in place before she launched this project. And it would be risky for a library to skip the establishment of a rudimentary 3D printer program before trying for this more ambitious program. But it could be done if that library was well organized with solid project managers and deep roots in the target community. 
But that’s a tall order to fill. What’s the worst thing that could go wrong? The worst thing that could go wrong is doubling down on failure: repeating one failed project after another without changing the flawed approach behind it. I’d also add that libraries are often out ahead of the public on these technologies, so dead ends are inevitable. To address this, I would also add one more tactic to your tech projects: listening. The public has lots of concerns about a variety of things. If you ask them, they’ll tell you all about them. Many of their concerns are directly related to libraries, but we can often help. We have permission to do so. People trust us. It’s a great position to be in. But we have to ask them to tell us what’s on their mind. We have to listen. And then we need to think creatively. Listening and thinking outside the box was how San Diego took their 3D Printers to the next level. The Long Future of 3D Printing The Wright Brothers first flight managed only 120 feet in the air. A year later, they flew 24 miles. These initial attempts looked nothing like the jet age and yet the technology of flight was born from these humble experiments. Already, 3D printing is being adopted in multiple industries. Artists are using it to prototype their designs. Astronauts are using it to print parts aboard the International Space Station. Bio-engineers are now looking at printing stem-cell structures to replace organs and bones. We’re decades away from the jet age of 3D printing, but this tech is here to stay. John Brandon’s read is incorrect simply because he’s looking at the current state and not seeing the long-term promise. When he asks a Ford engineer for his take on 3D Printing in the assembly process, he gets a smirk. Not a hotbed of innovation. What kind of reaction would he have gotten from an engineer at Tesla? At Apple? Fundamentally, he’s approaching 3D Printers from the wrong perspective and this is why it looks doomed. Libraries should not make this mistake. The world is changing ever more quickly and the public needs us to help them navigate the new frontier. We need to do this methodically, with careful planning and a good dose of optimism. Posted in innovation, technology | Tagged 3D printing, innovation, project planning | 2 Comments The State of the Library Website Posted on September 28, 2016 by mryanhess T’was a time when the Library Website was an abomination. Those dark days have lightened significantly. But new clouds have appeared on the horizon. Darkest Before the Dawn In the dark ages of Library Websites, users suffered under UX regimes that were rigid, unhelpful and confusing. This was before responsive design became a standard in the library world. It was before search engine optimization started to creep into Library meetings. It was before user experience became an actual librarian job title. We’ve come a long way since I wrote The Ugly Truth About Library Websites. Most libraries have evolved beyond the old “website as pamphlet” paradigm to one that is dynamic and focused on user tasks. Public libraries have deployed platforms like BiblioCommons to serve responsive, task-oriented interfaces that integrate their catalogs, programming and website into a single social platform. Books, digital resources, programs and even loanable equipment are all accessible via a single search. What’s more, the critical social networking aspects of library life are also embedded along the user’s path. 
Celebrated examples of this integrated solution include the San Francisco Public Library and Chicago Public Library. Queens is also hard at work to develop a custom solution. In the academic realm, libraries have turned to unified discovery layers like WorldCat Discovery and EBSCO Discovery Service to simplify (Googlize) the research process. These systems put a single-search box front and center that access resources on the shelf, but also all those electronic resources that make up the bulk of academic budgets. And while there are still many laggards, few libraries ignore these problems outright. The Storm Ahead While the general state of online library interfaces has improved, the unforgiving, hyperbolic curve of change continues to press forward. And libraries cannot stay put. Indeed, we need to quicken our pace and prepare our organizations for ongoing recalibration as the tempo of change increases. The biggest problem for library websites, is that there is little future for the library website. That’s because people will get less and less information through web browsers. Indeed, consider how often you use a web browser on your phone versus an app. Developments in AI, Augmented Reality and Virtual Reality will compound that trend. If you’re like Chris Milk, videographer and VR evangelist, you see the writing on the wall. The modes of how we experience information are about to undergo a fundamental revolution. Milk likens the current state of VR to the old black and white silent films at the dawn of motion pictures. I’d extend this line of thinking to the web page. Within a decade or two, I expect people will look back on web pages as a brief, transitory medium bridging print information to linked data. And as our AI, VR and AR technologies take off, they will liberate information from the old print paradigms altogether. In short, people will interact with information in more direct ways. They will ask a computer to provide them the answer. They will virtually travel to a “space” where they can experience the information they seek. Get Ready to Re-invent the Library…again So where does the library fit into this virtualized and automated future? One possibility is that the good work to transform library data into linked data will enable us to survive this revolution. In fact, it may be our best hope. Another hope is that we continue to emphasize the library as a social space for people to come together around ideas. Whether its a virtual library space or a physical one, the library can be the place in both local and global communities where people meet their universal thirst for connecting with others. The modes of those ideas (books, ebooks, videos, games) will matter far less than the act of connecting. In a sense, you could define the future online library as something between an MMORPG, Meetup.com and the TED conference. So, the library website is vastly improved, but we won’t have long to rest on our laurels. Ready Player One? Put on your VR goggles. Call up Siri. Start rethinking everything you know about the Library website.     Posted in information architecture, librarianship | Tagged internet, libraries, user experience, web design, websites | 1 Comment Virtual Realty is Getting Real in the Library Posted on June 20, 2016 by mryanhess My library just received three Samsung S7 devices with Gear VR goggles. We put them to work right away. The first thought I had was: Wow, this will change everything. My second thought was: Wow, I can’t wait for Apple to make a VR device! 
The Samsung Gear VR experience is grainy and fraught with limitations, but you can see the potential right away. The virtual reality is, after all, working off a smartphone. There is no high-end graphics card working under the hood. Really, the goggles are just a plastic case holding the phone up to your eyes. But still, despite all this, it’s amazing. Within twenty-four hours, I’d surfed beside the world’s top surfers on giant waves off Hawaii, hung out with the Masai in Africa and shared an intimate moment with a pianist and his dog in their (New York?) apartment. It was all beautiful. We’ve Been Here Before Remember when the Internet came online? If you’re old enough, you’ll recall the crude attempts to chat on digital bulletin board systems (BBS) or, much later, the publication of the first colorful (often jarringly so) HTML pages. It’s the Hello World! moment for VR now. People are just getting started. You can tell the content currently available is just scratching the surface of potentialities for this medium. But once you try VR and consider the ways it can be used, you start to realize nothing will be the same again. The Internet Will Disappear So said Google CEO Erik Schmidt in 2015. He was talking about the rise of AI, wearable tech and many other emerging technologies that will transform how we access data. For Schmidt, the Internet will simply fade into these technologies to the point that it will be unrecognizable. I agree. But being primarily a web librarian, I’m mostly concerned with how new technologies will translate in the library context. What will VR mean for library websites, online catalogs, eBooks, databases and the social networking aspects of libraries. So after trying out VR, I was already thinking about all this. Here are some brief thoughts: Visiting the library stacks in VR could transform the online catalog experience Library programming could break out of the physical world (virtual speakers, virtual locations) VR book discussions could incorporate virtual tours of topics/locations touched on in books Collections of VR experiences could become a new source for local collections VR maker spaces and tools for creatives to create VR experiences/objects Year Zero? Still, VR makes your eyes tired. It’s not perfect. It has a long way to go. But based on my experience sharing this technology with others, it’s addictive. People love trying it. They can’t stop talking about it afterward. So, while it may be some time before the VR revolution disrupts the Internet (and virtual library services with it), it sure feels imminent. Posted in innovation, librarianship, technology | Tagged gear vr, internet, oculus, samsung, virtual reality, vr | Leave a comment W3C’s CSS Framework Review Posted on May 10, 2016 by mryanhess I’m a longtime Bootstrap fan, but recently I cheated on my old framework. Now I’m all excited by the W3C’s new framework. Like Bootstrap, the W3C’s framework comes with lots of nifty utilities and plug and play classes and UI features. Even if you have a good CMS, you’ll find many of their code libraries quite handy. And if you’re CMS-deficient, this framework will save you time and headaches! Why a Framework? Frameworks are great for saving time. You don’t have to reinvent the wheel for standard UI chunks like navigation, image positioning, responsive design, etc. All you need to do is reference the framework in your code and you can start calling the classes to make your site pop. 
And this is really great since not all well-meaning web teams have an eye for good design. Most quality frameworks look really nice, and they get updated periodically to keep up with design trends. And coming from this well-known standards body, you can also be assured that the W3C’s framework complies with all the nitty-gritty standards all websites should aspire to. Things to Love Some of the things I fell in love with include: CSS-driven navigation menus. There’s really no good reason to rely on JavaScript for a responsive, interactive navigation menu. The W3C agrees. Icon support. This framework allows you to choose from three popular icon sets to bring icons right into your interface. Image support: Lots of great image styling including circular cropping, shadowing, etc. Cards. Gotta love cards in your websites and this framework has some very nice looking card designs for you to use. Built-in colors. Nuff sed. Animations. There are plenty of other nice touches like buttons that lift off the screen, elements that drop into place and much more. I give it a big thumbs up! Check it out at the W3C.org.     Posted in reviews | Tagged css, frameworks, w3c, web design | 1 Comment AI First Posted on May 2, 2016 by mryanhess Looking to the future, the next big step will be for the very concept of the “device” to fade away. Over time, the computer itself—whatever its form factor—will be an intelligent assistant helping you through your day. We will move from mobile first to an AI first world. Google Founder’s Letter, April 2016 My Library recently finalized a Vision Document for our virtual library presence. Happily, our vision was aligned with the long-term direction of technology as understood by movers and shakers like Google. As I’ve written previously, the Library Website will disappear. But this is because the Internet (as we currently understand it) will also disappear. In its place, a new mode of information retrieval and creation will move us away from the paper-based metaphor of web pages. Information will be more ubiquitous. It will be more free-form, more adaptable, more contextualized, more interactive. Part of this is already underway. For example, people are becoming a data set. And other apps are learning about you and changing how they work based on who you are. Your personal data set contains location data, patterns in speech and movement around the world, consumer history, keywords particular to your interests, associations based on your social networks, etc. AI Emerging All of this information makes it possible for emerging AI systems like Siri and Cortana to better serve you. Soon, it will allow AI to control the flow of information based on your mood and other factors to help you be more productive. And like a good friend that knows you very, very well, AI will even be able to alert you to serendipitous events or inconveniences so that you can navigate life more happily. People’s expectations are already being set for this kind of experience. Perhaps you’ve noticed yourself getting annoyed when your personal assistant just fetches a Wikipedia article when you ask it something. You’re left wanting. What we want is that kernel of gold we asked about. But what we get right now, is something too general to be useful. But soon, that will all change. Nascent AI will soon be able to provide exactly the piece of information that you really want rather than a generalized web page. 
This is what Google means when they make statements like “AI First” or “the Web will die.” They’re talking about a world where information is not only presented as article-like web pages, but broken down into actual kernels of information that are both discrete and yet interconnected. AI First in the Library Library discussions often focus on building better web pages or navigation menus or providing responsive websites. But the conversation we need to have is about pulling our data out of siloed systems and websites and making it available to all modes like AI, apps and basic data harvesters. You hear this conversation in bits and pieces. The ongoing linked data project is part of this long-term strategy. So too with next-gen OPACs. But on the ground, in our local strategy meetings, we need to tie every big project we do to this emerging reality where web browsers are increasingly no longer relevant. We need to think AI First. Posted in librarianship, society, tech industry | Tagged artificial intelligence, google, internet, libraries, linked data | Leave a comment Google Analytics and Privacy Posted on April 27, 2016 by mryanhess Collecting web usage data through services like Google Analytics is a top priority for any library. But what about user privacy? Most libraries (and websites for that matter) lean on Google Analytics to measure website usage and learn about how people access their online content. It’s a great tool. You can learn about where people are coming from (the geolocation of their IP addresses anyway), what devices, browsers and operating systems they are using. You can learn about how big their screen is. You can identify your top pages and much much more. Google Analytics is really indispensable for any organization with an online presence. But then there’s the privacy issue. Is Google Analytics a Privacy Concern? The question is often asked, what personal information is Google Analytics actually collecting? And then, how does this data collection jive with our organization’s privacy policies. It turns out, as a user of Google Analytics, you’ve already agreed to publish a privacy document on your site outlining the why and what of your analytics program. So if you haven’t done so, you probably should if only for the sake of transparency. Personally Identifiable Data Fact is, if someone really wanted to learn about a particular person, it’s not entirely outside the realm of possibility that they could glean a limited set of personal attributes from the generally anonymized data Google Analytics collects. IP addresses can be loosely linked to people. If you wanted to, you could set up filters in Google Analytics that look at a single IP. Of course, on the Google side, any user that is logged into their Gmail, YouTube or other Google account, is already being tracked and identified by Google. This is a broadly underappreciated fact. And it’s a critical one when it comes to how approach the question of dealing with the privacy issue. In both the case of what your organization collects with Google Analytics and what all those web trackers, including Google’s trackers, collect, the onus falls entirely on the user. The Internet is Public Over the years, the Internet has become a public space and users of the Web should understand it as such. Everything you do, is recorded and seen. Companies like Google, Facebook, Mircosoft, Yahoo! and many, many others are all in the data mining business. Carriers and Internet Service Providers are also in this game. 
They deploy technologies in websites that identify you and then sell your interests, shopping habits, web searches and other activities to companies interested in selling to you. They’ve made billions on selling your data. Ever done a search on Google and then seen ads all over the Web trying to sell you that thing you searched last week? That’s the tracking at work. Only You Can Prevent Data Fires The good news is that with little effort, individuals can stop most (but not all) of the data collection. Browsers like Chrome and Firefox have plugins like Ghostery, Avast and many others that will block trackers. Google Analytics can be stopped cold by these plugins. But that won’t solve all the problems. Users also need to set up their browsers to delete the cookies websites save to their browsers. And moving off of the “free” accounts provided by data mining companies, like Facebook accounts, Gmail and Google.com, can also help. But you’ll never be completely anonymous. Super cookies are a thing and are very difficult to stop without breaking websites. And some trackers are required in order to load content. So sometimes you need to pay with your data to play. Policies for Privacy Conscious Libraries All of this means that libraries wishing to be transparent and honest about their data collection need to also contextualize the information in the broader data mining debate. First and foremost, we need to educate our users on what it means to go online. We need to let them know it’s their responsibility alone to control their own data. And we need to provide instructions on doing so. Unfortunately, this isn’t an opt-in model. That’s too bad. It actually would be great if the world worked that way. But don’t expect the moneyed interests involved in data mining to allow the US Congress to pass anything that cuts into their bottom line. This ain’t Germany, after all. There are ways, with a little JavaScript, to create a temporary opt-in/opt-out feature for your site (a sketch follows at the end of this post). This will toggle tags added by Google Tag Manager on and off with a single click. But let’s be honest. Most people will ignore it. And if they do opt out, it will be very easy for them to overlook it every time without much more robust opt-in/opt-out functionality baked into your site. But for most sites and users, this is asking a lot. Meanwhile, it diverts attention from the real solution: users concerned about privacy need to protect themselves and not take a given website’s word for it. We actually do our users a service by going with the opt-out model. This underlines the larger privacy problems on the Wild Wild Web, which our sites are a part of. Posted in online security & privacy, society | Tagged data mining, google analytics, online security & privacy | 2 Comments
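The opt-in/opt-out toggle described above can be sketched in a few lines of TypeScript. This is a minimal sketch under stated assumptions, not the post's own implementation: the property ID UA-XXXXXX-Y is a placeholder, and it relies on the documented analytics.js behavior of honoring a window['ga-disable-<PROPERTY_ID>'] flag set before the tracker runs; a Google Tag Manager setup would instead gate its Analytics tag on a consent variable.

```typescript
// Hypothetical property ID -- replace with your own Google Analytics tracker ID.
const GA_PROPERTY = 'UA-XXXXXX-Y';
const DISABLE_KEY = `ga-disable-${GA_PROPERTY}`;
const PREF_KEY = 'analytics-opt-out';

// Re-apply a stored opt-out choice on every page load, before analytics runs.
function applyStoredPreference(): void {
  const optedOut = window.localStorage.getItem(PREF_KEY) === 'true';
  (window as any)[DISABLE_KEY] = optedOut;
}

// Record the visitor's choice and flip the flag immediately.
function toggleAnalytics(optOut: boolean): void {
  window.localStorage.setItem(PREF_KEY, String(optOut));
  (window as any)[DISABLE_KEY] = optOut;
}

applyStoredPreference();

// Example wiring: <input type="checkbox" id="ga-opt-out"> in the page footer.
document.getElementById('ga-opt-out')?.addEventListener('change', (event) => {
  toggleAnalytics((event.target as HTMLInputElement).checked);
});
```

As the post argues, most visitors will never touch such a control, which is why a plainly written privacy page and user education matter at least as much as the widget itself.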
The L Word Posted on March 21, 2016 by mryanhess I’ve been working with my team on a vision document for what we want our future digital library platform to look like. This exercise keeps bringing us back to defining the library of the future. And that means addressing the very use of the term, ‘Library.’ When I first exited my library (and information science) program, I was hired by Adobe Systems to work in a team of other librarians. My manager warned us against using the word ‘Librarian’ among our non-librarian colleagues. I think the gist was: too much baggage there. So, we used the word ‘Information Specialist.’ Fast forward a few years to my time in an academic environment at DePaul University Library, and this topic came up in the context of services the library provided. Faculty and students thought of the library in very traditional ways: as a quiet, book-filled space. But the way they used the library was changing despite the lag in their semantic understanding. The space and the virtual tools we put in place online helped users not only find and evaluate information, but also create, organize and share information. A case in point was our adoption of digital publishing tools like Bepress and Omeka, but also the Scholar’s Lab. I’m seeing a similar contradiction in the public library space. Say library and people think books. Walk into a public library and people do games, meetings, trainings and any number of online tasks. This disconnect between what the word ‘Library’ evokes in the mind’s eye and what it means in practice is telling. We’ve got a problem with our brand. In fact, we may need a new word. Taken literally, a library has been a word for a physical collection of written materials. The Library of Alexandria held scrolls, for example. Even code developers rely on ‘libraries’ today, which are collections of materials. In every case, the emphasis is on the collection of things. Now, I’m not suggesting that we move away from books. Books are vessels for ideas, and libraries will always be about ideas. In fact, this focus on ideas rather than any one mode for transmitting ideas is key. In today’s libraries, people not only read about ideas, they meet to discuss ideas, they brainstorm ideas. I don’t pretend to have the magic word. In fact, maybe it’s taking so long for us to drop ‘Library’ because there is not a good word in existence. Maybe we need to create a new one. One tactic that comes to mind as we navigate this terminological evolution is to retain the library, but subsume it inside something new. I’ve seen this done to various degrees in other libraries. For example, Loyola University in Chicago built an entirely new building adjacent to the book-filled library. Administratively, the building is run by the library, but it is called the Klarchek Information Commons. In that rather marvelous space looking out over Lake Michigan, you’ll find the modern ‘library’ in all its glory. Computers, collaboration booths, etc. I like this model for fixing our identity problem, and I think it would work without throwing the baby out with the bathwater. However it’s done, one thing is for sure. Our users have moved on from ‘the library’ and are left with no accurate way to describe that place they love to go to when they want to engage with ideas. Let’s put our thinking caps on and put a word on their lips that does justice to what the old library has become. Let’s get past the L Word. Posted in librarianship | Tagged branding, information commons | Leave a comment Locking Down Windows Posted on March 10, 2016 by mryanhess I’ve recently moved back to Windows for my desktop computing. But Windows 10 comes with enormous privacy and security issues that people need to take into account…and get under a semblance of control. Here’s how I did it. There has been much written on this subject, so what I’m including here is more of a digest of what I’ve found elsewhere, with perspective on how it worked out for me over time. 
Windows Tweaker This is a pretty good tool that does what Windows should do out of the box: give you one-stop access to all of Windows’ settings. As it is, Windows 10 has spread out many settings, including those for privacy, across the Settings screen as well as the Registry Editor and Group Policy Editor. There are dozens of look-and-feel tweaks, including an easy way to force Windows to use the hidden Dark Theme. The Privacy tab, however, is the single most important. There, you can easily turn off all the nasty privacy holes in Windows 10, such as how the OS sends things like keystrokes (that’s right!) back to Microsoft. The list of holes it will close is long: Telemetry, Biometrics, Advertising ID, Cortana, etc. Cortana Speaking of Cortana, I was really excited that this kind of virtual assistant was embedded in Windows 10. I looked forward to trying it out. But then I read the fine print. Cortana is a privacy nightmare. She can’t be trusted. She’s a blabbermouth and repeats back everything you tell her not just to Microsoft, but indirectly to all of their advertising partners. And who knows where all that data goes and how secure it is in the long run. Yuck! Turn her off. Pull the plug. Zero her out. The easiest way to disable her is to set up a Local Account. But there’s more info out there, including this at PC World. Local Account When you first install Windows 10, unplug the Ethernet cable and shut down Wi-Fi. Then, when you’re certain that all of MSFT’s listeners can’t communicate with your machine, go through the installation setup process, and when asked to create or log in to your Microsoft Account, don’t. Instead, use the Local Account option. The downsides of going this route are that you can’t sync your experience, accounts and apps across devices. You also won’t be able to use Cortana. The upside is that using a Local Account means you will be far more secure and private in whatever you do with your computer (as long as you maintain the many other privacy settings). Reduce Risk and Streamline Your PC Windows 10 comes crammed with many programs you may not want. Some of these may even be tracking and sharing, so if you don’t actually use them, why not lighten the load on your system and remove them? You can do this the slow way, one app at a time, or you can use the PowerShell nuclear option and kill them all at once. I did this and haven’t regretted it one bit. So fire away… Privacy Settings I won’t go into all of this. There is plenty of solid advice on reducing your exposure on other sites (like at PC World) and in some lengthy YouTube videos which you can easily find. But it is critical that you go into the Settings panel and turn everything off at the very least. That’s my feeling. Some tell you that you even need to set up IP blocks to keep your machine from reporting back to Microsoft and its advertising partners. Others say this is somewhat overblown, and not unique to Windows, like over at LifeHacker, so I’ll leave it to you to decide. Conclusion It’s really too bad that operating systems have gone down this road. Our PCs should be tools for us and not the other way around. Imagine if everything that happened on your device stayed private. Imagine if it was all encrypted and nobody could hack into your PC or Microsoft’s servers or their advertisers’ databases and learn all kinds of things about you, your family, your work, your finances, your secrets. And yet, the opposite is precisely what Microsoft (and the makers of iOS, Android and the rest) built, intentionally. 
Frankly, I think it’s bordering on criminal negligence, but good luck suing when your data gets exploited. Better safe than sorry…that’s my take. Do a little work and lock down your computer. Good luck out there… Posted in online security & privacy, technology | Tagged microsoft, online security & privacy, security, Windows | Leave a comment Killer Apps & Hacks for Windows 10 Posted on March 3, 2016 by mryanhess Did the UX people at Microsoft ever test Windows 10? Here are some must-have apps and hacks I’ve found to make life on Windows 10 quick and easy. Set Hotkeys for Apps Sometimes you just want to launch an app from your keyboard. Using a method on Laptopmag.com, you can do this for most any program. I use this in combination with macros like those noted below. Quick Switch to VPN If you’re a smart and secure Internet user, you probably already use a VPN service to encrypt the data and web requests you send over the Internet (especially while on public Wi-Fi networks). But Windows 10 makes connecting to your VPN service a bit of a chore (I use Private Internet Access, by the way). It’s weird because Windows actually placed the Connect to VPN option in the Communications Center, but you still need to click into that, then click the VPN you want and then click Connect…that’s 3 clicks if you’re counting. I’ve tried two methods to make this at least a little easier. One caveat on all of this: if you log in with an administrator account (which I don’t, because I’m concerned about security after all!), you could have your VPN client launch at start, but you’d still need to click the connect button, and anytime you put the machine to sleep, it would disconnect (why they do that is beyond me). With both methods, you need to manually add a VPN account to Windows’ built-in VPN feature. Anyway, here are my two methods: Macro Method You can record actions as a “macro” and then save it as an executable program. You can then save the program to your desktop, Start menu or taskbar. It’s a bit of a chore, and in the end, the best you get is two-click access to your VPN connection…not the one-click you would get on a Mac. If my memory serves, this method only works if you log in with an administrator account. Otherwise, you’ll be prompted for an administrator password each time…and who wants that? The steps: create a shortcut to the Settings page; add a hotkey to the shortcut; create a macro using something like JitBit that uses the new hotkey; save it as an executable; create a shortcut on the desktop and pin it to Start; and, optionally, change the icon to look pretty. The other method is to pin the Communicator VPN app to your Start pane. This is actually how I ended up going in the end. To do this, you need to ‘hack’ a shortcut that points to your VPN settings panel (where the Connect button resides). On your desktop, right-click and select New > Shortcut; when the shortcut wizard opens, paste ms-settings:network-vpn into the form. Now pin the shortcut to your Start menu and you have quick access to the Connect dialog for your VPN. Switch between Audio Devices Sometimes I want to jump between my speakers and my headphones, and because I hate clicking and loathe jumping out of Windows 10’s Metro design into the old-school-looking Audio Device Controller, I followed the advice from The Windows Club. Their solution uses freeware called Audio Switcher to assign a hotkey to different audio devices. I added Audio Switcher to my startup to make this a little more automated. 
Unfortunately, because I normally work in a non-administrator account on Windows 10, I get asked for an admin password to launch this app at startup. Egads! In my case, I can now click the F1 (headphones) and F2 (speakers) keys to switch playback devices for sound. Overcoming the Windows Education or Windows Pro watermark Windows embeds a horrible little Windows Education or Windows Pro watermark over the lower right corner of your desktop if you use one of those versions. There are two solutions to removing this remarkably distracting bit of text: use a white background to “disappear” the white text, or have an app sit over that space. I use MusicBee (recommended by LifeHacker) and position the mini-version over that spot. Supposedly there’s a Regex trick where you delete the text, but that’s a bit too much work for me for such a slight annoyance. Other Tricks There are a couple of other tricks that I’ve used to clean up Windows. Removing Metro Apps. This allows you to remove all the built-in apps that are there simply to confound your privacy and peddle your identity to Microsoft’s advertising partners. Remove them. Removing default folders from Explorer. If you’re like me and want better performance, you use a separate hard disk drive for your music, video and images and another drive (probably an SSD) for your OS and programs. Windows 10 is confusing for people with this kind of setup because it places folders in File Explorer that point to your Images, Documents, etc. on your C drive. In my case, that’s not the right drive. So I used the method linked above to remove those from Explorer. Posted in technology | Tagged life hacks, macros, vpn, windows 10 | Leave a comment
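The "Quick Switch to VPN" section above can also be scripted rather than clicked. The sketch below is an illustration, not the post's own method: it assumes a VPN profile has already been added to Windows' built-in VPN feature (the name "MyVPN" is a placeholder) and drives it with the stock rasdial command, falling back to opening the same ms-settings:network-vpn page the post pins a shortcut to.

```typescript
// vpn.ts -- one-command connect/disconnect for a built-in Windows VPN profile.
// Usage: ts-node vpn.ts connect | disconnect | settings
import { exec, execFile } from 'child_process';

const PROFILE = 'MyVPN'; // placeholder: the name of your Windows VPN entry

function run(cmd: string, args: string[]): void {
  execFile(cmd, args, (err, stdout, stderr) => {
    if (err) console.error(stderr || err.message);
    else console.log(stdout.trim());
  });
}

switch (process.argv[2]) {
  case 'connect':
    run('rasdial', [PROFILE]); // dials the stored VPN profile
    break;
  case 'disconnect':
    run('rasdial', [PROFILE, '/disconnect']);
    break;
  default:
    // Fall back to the VPN settings page mentioned in the post.
    exec('start ms-settings:network-vpn');
}
```

Bound to a hotkey with the JitBit approach described above, this gets closer to the one-click connect the post was after.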
fc18-ifca-ai-5055 ---- A Quantitative Analysis of the Impact of Arbitrary Blockchain Content on Bitcoin Roman Matzutt (1), Jens Hiller (1), Martin Henze (1), Jan Henrik Ziegeldorf (1), Dirk Müllmann (2), Oliver Hohlfeld (1), and Klaus Wehrle (1) (1) Communication and Distributed Systems, RWTH Aachen University, Germany, {matzutt,hiller,henze,ziegeldorf,hohlfeld,wehrle}@comsys.rwth-aachen.de (2) Data Protection Research Institute, Goethe University, Frankfurt/Main, muellmann@jur.uni-frankfurt.de Abstract. Blockchains primarily enable credible accounting of digital events, e.g., money transfers in cryptocurrencies. However, beyond this original purpose, blockchains also irrevocably record arbitrary data, ranging from short messages to pictures. This does not come without risk for users as each participant has to locally replicate the complete blockchain, particularly including potentially harmful content. We provide the first systematic analysis of the benefits and threats of arbitrary blockchain content. Our analysis shows that certain content, e.g., illegal pornography, can render the mere possession of a blockchain illegal. Based on these insights, we conduct a thorough quantitative and qualitative analysis of unintended content on Bitcoin's blockchain. Although most data originates from benign extensions to Bitcoin's protocol, our analysis reveals more than 1600 files on the blockchain, over 99 % of which are texts or images. Among these files there is clearly objectionable content such as links to child pornography, which is distributed to all Bitcoin participants. With our analysis, we thus highlight the importance for future blockchain designs to address the possibility of unintended data insertion and protect blockchain users accordingly. 1 Introduction Bitcoin [45] was the first completely distributed digital currency and remains the most popular and widely accepted of its kind with a market price of ∼4750 USD per bitcoin as of August 31st, 2017 [14]. The enabler and key innovation of Bitcoin is the blockchain, a public append-only and tamper-proof log of all transactions ever issued. These properties establish trust in an otherwise trustless, completely distributed environment, enabling a wide range of new applications, up to distributed general-purpose data management systems [69] and purely digital data-sharing markets [41]. In this work, we focus on the arbitrary, non-financial data on Bitcoin's famous blockchain, which primarily stores financial transactions. This non-financial data fuels, e.g., digital notary services [50], secure releases of cryptographic commitments [16], or non-equivocation schemes [62]. However, since all Bitcoin participants maintain a complete local copy of the blockchain (e.g., to ensure correctness of blockchain updates and to bootstrap new users), these desired and vital features put all users at risk when objectionable content is irrevocably stored on the blockchain. This risk potential is exemplified by the (mis)use of Bitcoin's blockchain as an anonymous and irrevocable content store [40,56,35]. In this paper, we systematically analyse non-financial content on Bitcoin's blockchain. While most of this content is harmless, there is also content to be considered objectionable in many jurisdictions, e.g., the depiction of nudity of a young woman or hundreds of links to child pornography.
As a result, it could become illegal (or even already is today) to possess the blockchain, which is required to participate in Bitcoin. Hence, objectionable content can jeopardize the currently popular multi-billion dollar blockchain systems. These observations raise the question whether or not unintended content is ultimately beneficial or destructive for blockchain-based systems. To address this question, we provide the first comprehensive and systematic study of unintended content on Bitcoin’s blockchain. We first survey and explain methods to store arbitrary, non-financial content on Bitcoin’s blockchain and discuss potential benefits as well as threats, most notably w.r.t. content considered illegal in different jurisdictions. Subsequently and in contrast to related work [56,40,12], we quantify and discuss unintended blockchain content w.r.t. the wide range of insertion methods. We believe that objectionable blockchain content is a pressuring issue despite potential benefits and hope to stimulate research to mitigate the resulting risks for novel as well as existing systems such as Bitcoin. This paper is organized as follows. We survey methods to insert arbitrary data into Bitcoin’s blockchain in Section 2 and discuss their benefits and risks in Section 3. In Section 4, we systematically analyze non-financial content in Bitcoin’s blockchain and assess resulting consequences. We discuss related work in Section 5 and conclude this paper in Section 6.
2 Data Insertion Methods for Bitcoin
Beyond intended recording of financial transactions, Bitcoin’s blockchain also allows for injection of non-financial data, either short messages via special transaction types or even complete files by encoding arbitrary data as standard transactions. We first briefly introduce Bitcoin transactions and subsequently survey methods available to store arbitrary content on the blockchain via transactions.
Bitcoin transactions transfer funds between a payer (sender) and a payee (receiver), who are identified by public-private key pairs. Payers announce their transactions to the Bitcoin network. The miners then publish these transactions in new blocks using their computational power in exchange for a fee. These fees vary, but averaged at 215 satoshi per Byte during August 2017 [4] (1 satoshi = 10⁻⁸ bitcoin). Each transaction consists of several input scripts, which unlock funds of previous transactions, and of several output scripts, which specify who receives these funds. To unlock funds, input scripts contain a signature for the previous transaction generated by the owner of the funds. To prevent malicious scripts from causing excessive transaction verification overheads, Bitcoin uses transaction script templates and expects peers to discard non-compliant scripts.
[Fig. 1: Bitcoin data insertion methods; the content insertion services shown in the figure are CryptoGraffiti, Satoshi Uploader, Apertus, and P2SH Injectors]
Table 1: Payload, costs, and efficiency of low-level data insertion methods
Method        Payload    Costs/B          Efficiency
OP RETURN     80 B       3.18–173.55 ct   poor
Coinbase      96 B       —                poor
Non-St. Out.  99 044 B   1.03–198.05 ct   poor
Non-St. In.   n/a        n/a              med.
P2PK          85 345 B   1.24–207.79 ct   high
P2PKH         58 720 B   1.87–197.58 ct   high
P2MS          92 625 B   1.11–234.33 ct   high
P2SH Out.     62 400 B   1.77–195.54 ct   high
P2SH In.      99 018 B   1.03–225.61 ct   high
Figure 1 shows the insertion methods for non-financial data we identified in Bitcoin.
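As a rough illustration of where Table 1's minimum cost-per-byte figures come from, the sketch below recomputes them from the fee and price assumptions stated in the comparison methodology (215 satoshi per transaction byte, a 546-satoshi dust burn per unspendable output, and 4748.25 USD per bitcoin). This is not the authors' code; the per-output script sizes are my own assumptions based on the standard Bitcoin script templates, so the printed values only approximate the table.

```python
# Rough reproduction of the minimum cost-per-byte (CpB) figures in Table 1.
# Assumptions (not from the paper's tooling): standard output-script sizes,
# an 8-byte value field plus a 1-byte script-length field per output, and a
# 546-satoshi dust burn for every unspendable data-carrying output.

FEE_PER_TX_BYTE = 215          # satoshi per transaction byte (August 2017 average)
DUST_BURN = 546                # satoshi burned per unspendable output
USD_PER_SATOSHI = 4748.25e-8   # market price of 4748.25 USD per BTC

# (payload bytes per output, output-script size in bytes) for P2X templates
TEMPLATES = {
    "P2PK":     (65, 67),    # 65-byte fake public key + OP_CHECKSIG
    "P2PKH":    (20, 25),    # 20-byte fake key hash inside the standard script
    "P2MS 1-3": (195, 201),  # three 65-byte fake keys in a 1-of-3 multisig
    "P2SH out": (20, 23),    # 20-byte fake script hash
}

def min_cost_per_payload_byte(payload, script_len):
    output_size = 8 + 1 + script_len       # value + length + script bytes
    fee = output_size * FEE_PER_TX_BYTE    # fee attributable to this output
    satoshi_per_byte = (fee + DUST_BURN) / payload
    return satoshi_per_byte * USD_PER_SATOSHI * 100  # US cents per payload byte

for name, (payload, script_len) in TEMPLATES.items():
    print(f"{name:8s} ~{min_cost_per_payload_byte(payload, script_len):.2f} ct/B")
```

The printed values land within a few hundredths of a cent of the table's minimum CpB column (e.g., ~1.87 ct for P2PKH and ~1.11 ct for P2MS).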
We distinguish low-level data insertion methods inserting small data chunks and content insertion services, which systematically utilize the low-level methods to insert larger chunks of data. In the following, we refer to non-financial blockchain data as content if it has a self-contained structure, e.g., a file or read- able text, or as data otherwise, e.g., fragments inserted via a low-level method. 2.1 Low-level Data Insertion Methods We first survey the efficiency of the low-level data insertion methods w.r.t. to in- sertable payload and costs per transaction (Table 1). To this end, we first explain our comparison methodology, before we detail i) intended data insertion meth- ods (OP RETURN and coinbase), ii) utilization of non-standard transactions, and iii) manipulation of standard transactions to insert arbitrary data. Comparison Methodology. We measure the payload per transaction (PpT), i.e., the number of non-financial Bytes that can be added to a single standard- sized transaction (≤ 100 000 B). Costs are given as the minimum and maximum costs per Byte (CpB) for the longest data chunk a transaction can hold, and for inserting 1 B. Costs are inflicted by paying transaction fees and possibly burning currency (at least 546 satoshi per output script), i.e., making it unspendable. For our cost analysis we assume Bitcoin’s market price of 4748.25 USD as of August 31st, 2017 [14] and the average fees of 215 satoshi per Byte as of August 2017 [4]. Note that high variation of market price and fees results in frequent changes of presented absolute costs per Byte. Finally, we rate the overall efficiency of an approach w.r.t. insertion of arbitrary-length content. Intuitively, a method is efficient if it allows for easy insertion of large payloads at low costs. OP RETURN. This special transaction template allows attaching one small data chunk to a transaction and thus provides a controlled channel to an- notate transactions without negative side effects. E.g., in typical implementa- tions peers increase performance by caching spendable transaction outputs and OP RETURN outputs can safely be excluded from this cache. However, data chunk sizes are limited to 80 B per transaction. Coinbase. In Bitcoin, each block contains exactly one coinbase transaction, which introduces new currency into the system to incentivize miners to dedi- cate their computational power to maintain the blockchain. The input script of coinbase transactions is up to 100 B long and consists of a variable-length field encoding the new block’s position in the blockchain [9]. Stating a larger size than the overall script length allows placing arbitrary data in the resulting gap. This method is inefficient as only active miners can insert only small data chunks. Non-standard Transactions. Transactions can deviate from the approved transaction templates [48] via their output scripts as well as input scripts. In the- ory, such transactions can carry arbitrarily encoded data chunks. Transactions using non-standard output scripts can carry up to 96.72 KiB at comparably low costs. However, they are inefficient as miners ignore them with high probability. Yet, non-standard output scripts occasionally enter the blockchain if miners in- sufficiently check them (cf. Section 4.2). Contrarily, non-standard input scripts are only required to match their respective output script. Hence, input scripts can be altered to carry arbitrary data if their semantics are not changed, e.g., by using dead conditional branches. 
This makes non-standard input scripts slightly better suited for large-scale content insertion than non-standard output scripts. Standard Financial Transactions. Even standard financial transactions can be (mis)used to insert data using mutable values of output scripts. There are four approved templates for standard financial transactions: Pay to public-key (P2PK) and pay to public-key hash (P2PKH) transactions send currency to a dedicated receiver, identified by an address derived from her private key, which is required to spend any funds received [48]. Similarly, multi-signature (P2MS) transactions require m out of n private keys to authorize payments. Pay to script hash (P2SH) transactions refer to a script instead of keys to enable complex spending conditions [48], e.g., to replace P2MS [10]. The respective public keys (P2PK, P2MS) and script hash values (P2PKH, P2SH) can be replaced with ar- bitrary data as Bitcoin peers can not verify their correctness before they are ref- erenced by a subsequent input script. While this method can store large amounts of content, it involves significant costs: In addition to transaction fees, the user must burn bitcoins as she replaces valid receiver identifiers with arbitrary data (i.e., invalid receiver identities), making the output unspendable. Using multi- ple outputs enables PpTs ranging from 57.34 KiB (P2PKH) to 96.70 KiB (P2SH inputs) at CpBs from 1.03 ct to 1.87 ct. As they behave similarly w.r.t. data in- sertion, we collectively refer to all standard financial transactions as P2X in the following. P2SH scripts also allow for efficient data insertion into input scripts as P2SH input scripts are published with their redeem script. Due to miners’ verification of P2SH transactions, transaction are not discarded if the redeem script is not template-compliant (but the overall P2SH transaction is). We now survey different services that systematically leverage the discussed data insertion methods to add larger amounts of content to the blockchain. 2.2 Content Insertion Services Content insertion services rely on the low-level data insertion methods to add content, i.e., files such as documents or images, to the blockchain. We identify four conceptually different content insertion services and present their protocols. CryptoGraffiti. This web-based service [30] reads and writes messages and files from and to Bitcoin’s blockchain. It adds content via multiple P2PKH output scripts within a single transaction, storing up to 60 KiB of content. To retrieve previously added content, CryptoGraffiti scans for transactions that either con- sist of at least 90 % printable characters or contain an image file. Satoshi Uploader. The Satoshi Uploader [56] inserts content using a single transaction with multiple P2X outputs. The inserted data is stored together with a length field and a CRC32 checksum to ease decoding of the content. P2SH Injectors. Several services [35] insert content via slightly varying P2SH input scripts. They store chunks of a file in P2SH input scripts. To ensure file integrity, the P2SH redeem scripts contain and verify hash values of each chunk. Apertus. This service [29] allows fragmenting content over multiple transac- tions using an arbitrary number of P2PKH output scripts. Subsequently, these fragments are referenced in an archive stored on the blockchain, which is used to retrieve and reassemble the fragments. The chosen encoding optionally allows augmenting content with a comment, file name, or digital signature. 
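The Satoshi Uploader framing described above (a length field plus a CRC32 checksum in front of the payload) is straightforward to sketch. The 8-byte header layout below follows that description, but the little-endian byte order and the helper names are illustrative assumptions rather than the service's documented wire format.

```python
import struct
import zlib

# Minimal sketch of a "length + CRC32" framing in the style of the Satoshi
# Uploader. The 4-byte length and 4-byte CRC32 header matches the paper's
# description; the little-endian layout is an assumption for illustration.

def frame_payload(payload: bytes) -> bytes:
    """Prepend a length/CRC32 header to a payload before chunking it into outputs."""
    return struct.pack("<II", len(payload), zlib.crc32(payload)) + payload

def try_unframe(blob: bytes):
    """Return the payload if the first 8 bytes are a valid length/CRC32 header."""
    if len(blob) < 8:
        return None
    length, checksum = struct.unpack("<II", blob[:8])
    payload = blob[8:8 + length]
    if len(payload) == length and zlib.crc32(payload) == checksum:
        return payload
    return None

framed = frame_payload(b"hello, blockchain")
assert try_unframe(framed) == b"hello, blockchain"
```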
To conclude, Bitcoin offers various options to insert arbitrary, non-financial data. These options range from small-scale data insertion methods exclusive to active miners to services that allow any user to store files of arbitrary length. This wide spectrum of options for data insertion raises the question which benefits and risks arise from storing content on Bitcoin’s blockchain. 3 Benefits and Risks of Arbitrary Blockchain Content Bitcoin’s design includes several methods to insert arbitrary, non-financial data into its blockchain in both intended and unintended ways. In this section, we discuss potential benefits of engraving arbitrary data into Bitcoin’s blockchain as well as risks of (mis)using these channels for content insertion. 3.1 Benefits of Arbitrary Blockchain Content Besides the manipulation of standard financial transactions, Bitcoin offers coin- base and OP RETURN transactions as explicit channels to irrevocably insert small chunks of non-financial data into its blockchain (cf. Section 2). As we discuss in the following, each insertion method has distinguishing benefits: OP RETURN. Augmenting transactions with short pieces of arbitrary data is beneficial for a wide area of applications [40,12,62]. Different services use OP RETURN to link non-financial assets, e.g., vouchers, to Bitcoin’s block- chain [40,12], to attest the existence of digital documents at a certain point of time as a digital notary service [58,50,12], to realize distributed digital rights management [70,12], or to create non-equivocation logs [62,8]. Coinbase. Coinbase transactions differ from OP RETURN as only miners, who dedicate significant computational resources to maintain the blockchain, can use them to add extra chunks of data to their newly mined blocks. Beyond advertisements or short text messages [40], coinbase transactions can aid the mining process. Adding random bytes to the coinbase transactions allows miners to increase entropy when repeatedly testing random nonces to solve the proof- of-work puzzle [48]. Furthermore, adding identifiable voting flags to transactions enables miners to vote on proposed features, e.g., the adoption of P2SH [10]. Large-scale Data Insertion. Engraving large amounts of data into the block- chain creates a long-term non-manipulable file storage. This enables, e.g., the archiving of historical data or censorship-resistant publication, which helps pro- tecting whistleblowers or critical journalists [66]. However, their content is repli- cated to all users, who do not have a choice to reject storing it. Hence, non-financial data on the blockchain enables new applications that leverage Bitcoin’s security guarantees. In the following, we discuss threats of forcing honest users to download copies of all blockchain content. 3.2 Risks of Arbitrary Blockchain Content Despite potential benefits of data in the blockchain, insertion of objectionable content can put all participants of the Bitcoin network at risk [43,11,40], as such unwanted content is unchangeable and locally replicated by each peer of the Bitcoin network as benign data. To underpin this threat, we first derive an extensive catalog of content that poses high risks if possessed by individuals and subsequently argue that objectionable blockchain content is able to harm honest users. In the following, we identify five categories of objectionable content: Copyright Violations. With the advent of file-sharing networks, pirated data has become a huge challenge for copyright holders. 
To tackle this problem, copy- right holders predominantly target users that actively distribute pirated data. E.g., German law firms sue users who distribute copyright-protected content via file-sharing networks for fines on behalf of the copyright holders [28]. In re- cent years, prosecutors also convicted downloaders of pirated data. For instance, France temporarily suspended users’ Internet access and subsequently switched to issuing high fines [36]. As users distribute their blockchain copy to new peers, copyright-protected material on the blockchain can thus provoke legal disputes about copyright infringement. Malware. Another threat is to download malware [20,42], which could poten- tially be spread via blockchains [31]. Malware has serious consequences as it can destroy sensitive documents, make devices inoperable, or cause financial losses [34]. Furthermore, blockchain malware can irritate users as it causes an- tivirus software to deny access to important blockchain files. E.g., Microsoft’s antivirus software detected a non-functional virus signature from 1987 on the blockchain, which had to be fixed manually [68]. Privacy Violations. By disclosing sensitive personal data, individuals can harm their own privacy and that of others. This threat peaks when individuals deliberately violate the privacy of others, e.g., by blackmailing victims under the threat of disclosing sensitive data about them on the blockchain. Real-world manifestations of these threats are well-known, e.g., non-consensually releasing private nude photos or videos [54] or fully disclosing an individual’s identity to the public with malicious intents [21]. Jurisdictions such as the whole European Union begin to actively prosecute the unauthorized disclosure and forwarding of private information in social networks to counter this novel threat [5]. Politically Sensitive Content. Governments have concerns regarding the leakage of classified information such as state secrets or information that other- wise harms national security, e.g., propaganda. Although whistleblowers reveal nuisances such as corruption, they force all blockchain users to keep a copy of leaked material. Depending on the jurisdiction, the intentional disclosure or the mere possession of such content may be illegal. While, e.g., the US government usually tends to prosecute intentional theft or disclosure of state secrets [63], in China the mere possession of state secrets can result in longtime prison sen- tences [49]. Furthermore, China’s definition of state secrets is vague [49] and covers, e.g., “activities for safeguarding state security” [60]. Such vague allega- tions w.r.t. state secrets have been applied to critical news in the past [18,24]. Illegal and Condemned Content. Some categories of content are virtually universally condemned and prosecuted. Most notably, possession of child pornog- raphy is illegal at least in the 112 countries [64] that ratified an optional protocol to the Convention on the Rights of the Child [65]. Religious content such as cer- tain symbols, prayers, or sacred texts can be objectionable in extremely religious countries that forbid other religions and under oppressive regimes that forbid re- ligion in general. As an example, possession of items associated with an objected religion, e.g., Bibles in Islamist countries, or blasphemy have proven risky and were sometimes even punished by death [13,38]. In conclusion, a wide range of objectionable content can cause direct harm if possessed by users. 
In contrast to systems such as social media platforms, file-sharing networks, or online storage systems, such content can be stored on blockchains anonymously and irrevocably. Since all blockchain data is down- loaded and persistently stored by users, they are liable for any objectionable content added to the blockchain by others. Consequently, it would be illegal to participate in a blockchain-based systems as soon as it contains illegal content. While this risk has previously been acknowledged [43], definitive answers re- quire court rulings yet to come. However, considering legal texts we anticipate a high potential for illegal blockchain content to jeopardize blockchain-based sys- tem such as Bitcoin in the future. Our belief stems from the fact that, w.r.t. child pornography as an extreme case of illegal content, legal texts from countries such as the USA [47], England [3], Ireland [32] deem all data illegal that can be con- verted into a visual representation of illegal content. As we stated in Section 2, it is easily possible to locate and reassemble such content on the blockchain. Hence, even though convertibility usually covers creating a visual representation by, e.g., decoding an image file, we expect that the term can be interpreted to include blockchain data in the future. For instance, this is already covered implicitly by German law, as a person is culpable for possession of illegal content if she knowingly possesses an accessible document holding said content [2]. It is criti- cal here that German law perceives the hard disk holding the blockchain as an document [1] and that users can easily reassemble any illegal content within the blockchain. Furthermore, users can be assumed to knowingly maintain control over such illegal content w.r.t. German law if sufficient media coverage causes the content’s existence to become public knowledge among Bitcoin users [61], as has been attempted by Interpol [31]. We thus believe that legislators will speak law w.r.t. non-financial blockchain content and that this has the potential to jeopardize systems such as Bitcoin if they hold illegal content. 4 Blockchain Content Landscape To understand the landscape of non-financial blockchain data and assess its potentials and risks, we thoroughly analyze Bitcoin’s blockchain as it is the most widely used blockchain today. Especially, we are interested in i) the degree of utilization of data and content insertion methods, ii) the temporal evolution of data insertion, and iii) the types of content on Bitcoin’s blockchain, especially w.r.t. objectionable content. In the following, we first outline our measurement methodology before we present an overview and the evolution of non-financial data on Bitcoin’s blockchain. Finally, we analyze files stored on the blockchain to derive if any objectionable content is already present on the blockchain. 4.1 Methodology We detect data-holding transactions recorded on Bitcoin’s blockchain based on our study of data insertion methods and content insertion services (cf. Section 2). We distinguish detectors for data insertion methods and detectors for content insertion services. To reduce false positives, e.g., due to public-key hash values that resemble text, we exclude all standard transaction outputs that include already-spent funds from analysis. This is sensible as data-holding transactions replace public keys or hashes such that spending requires computing correspond- ing private keys or pre-images, which is assumed to be infeasible. 
Contrarily, even though we thoroughly analyzed possible insertion methods, there is still a chance that we do not exhaustively detect all non-financial data. Nevertheless, our content type analysis establishes a solid lower bound as we only consider readable files retrieved from Bitcoin’s blockchain. In the following, we explain the key characteristics of the two classes of our blockchain content detectors.
Low-level Insertion Method Detectors. The first class of detectors is tailored to match individual transactions that are likely to contain non-financial data (cf. Section 2.1). These detectors detect manipulated financial transactions as well as OP RETURN, non-standard, and coinbase transactions. Our text detector scans P2X output scripts for mutable values containing ≥ 90 % printable ASCII characters (to avoid false positives). The detector returns the concatenation of all output scripts of the same transaction that contain text. Finally, we consider all coinbase and OP RETURN transactions as well as non-standard output scripts. We detect coinbase transactions based on the length field mismatch described in Section 2.1. OP RETURN scripts are detectable as they always begin with an OP RETURN operation. Non-standard output scripts comprise all output scripts which are not template-conform.
[Fig. 2: Cumulative numbers of detected transactions per data insertion method]
[Fig. 3: Ratio of transactions that utilize data insertion methods]
Service Detectors. We implemented detectors specific to the content insertion services we identified in Section 2.2. These service-specific detectors enable us to detect and extract files based on the services’ protocols. These detectors also track the data insertion method used in service-created transactions. The CryptoGraffiti detector matches transactions with an output that sends a tip to a public-key hash controlled by its provider. For such a transaction, we concatenate all mutable values of output scripts that spend fewer than 10 000 satoshi and store them in a file. This threshold is used to ignore non-manipulated output scripts, e.g., the service provider spending their earnings. To detect a Satoshi Uploader transaction, we concatenate all of its mutable values that spend the same small amount of bitcoins. If we find the first eight bytes to contain a valid combination of length and CRC32 checksum for the transaction’s payload, we store the payload as an individual file. We detect P2SH Injector content based on redeem scripts containing more than one hash operation (standard transactions use at most one). We then extract the concatenation of the second inputs of all redeem scripts (the first one contains a signature) of a transaction as one file. Finally, the Apertus detector recursively scans the blockchain for Apertus archives, i.e., Apertus-encoded lists of previous transaction identifiers. Once a referred Apertus payload does not constitute another archive, we retrieve its payload file and optional comment by parsing the Apertus protocol.
Suspicious Transaction Detector. To account for less wide-spread insertion services, we finally analyze standard transactions that likely carry non-financial data but are not detected otherwise. We only consider transactions with at least 50 suspicious outputs, i.e., roughly 1 KiB of content.
We consider a set of outputs suspicious if all outputs i) spend the same small amount (< 10 000 satoshi) and ii) are unspent. This detector trades off detection rate against false-positive rate. Due to overlaps with service detectors, we exclude matches of this detector from our quantitative analysis, but discuss individual findings in Section 4.3.
4.2 Utilization of Data Insertion Methods
Data and content insertion in Bitcoin has evolved over time, transitioning from single miners exploiting coinbase transactions to sophisticated services that enable the insertion of whole files into the blockchain. We study this evolution in terms of used data insertion methods as well as content insertion services and quantify the amount of blockchain data using our developed detectors. Our key insights are that OP RETURN constitutes a well-accepted success story while content insertion services are currently only infrequently utilized. However, the introduction of OP RETURN did not shut down other insertion methods, e.g., P2X manipulation, which enable single users to insert objectionable content.
[Fig. 4: Number of files inserted via content insertion services per month]
[Fig. 5: Cumulative sizes of transactions from content insertion services]
Our measurements are based on Bitcoin’s complete blockchain as of August 31st, 2017, containing 482 870 blocks and 250 845 217 transactions with a total disk size of 122.64 GiB. We first analyze the popularity of different data insertion methods and subsequently turn towards the utilization of content insertion services to assess how non-financial data enters the blockchain.
Data Insertion Methods. As described in Section 2.1, OP RETURN and coinbase transactions constitute intended data insertion methods, whereas P2X and non-standard P2SH inputs manipulate legitimate transaction templates to contain arbitrary data. Figure 2 shows the cumulative number of transactions containing non-financial data on a logarithmic scale. In total, our detectors found 3 535 855 transactions carrying a total payload of 118.53 MiB, i.e., only 1.4 % of Bitcoin transactions contain non-financial data. However, we strive to further understand the characteristics of non-financial blockchain content as even a single instance of objectionable content can potentially jeopardize the overall system. The vast majority of extracted transactions are OP RETURN (86.8 % of all matches) and coinbase (13.13 %) transactions. Combined, they constitute 95.90 MiB (80.91 % of all extracted data). Out of all blocks, 96.15 % have content-holding coinbase transactions. While only 0.26 % of these contain ≥ 90 % printable text, 33.49 % of them contain ≥ 15 consecutive printable ASCII characters (mostly surrounded by data without obvious structure). Of these short messages, 14.39 % contain voting flags for new features (cf. Section 3.1). Apart from this, miners often advertise themselves or leave short messages, e.g., prayer verses. OP RETURN transactions were introduced in 2013 to offer a benign way to augment single transactions with non-financial data. This feature is widely used, as shown by Figure 3. Among all methods, OP RETURN is the only one to be present with a rising tendency, with currently 1.2 % of all transactions containing OP RETURN outputs.
These transactions predominantly manage off-blockchain assets or originate from notary services [12]. While P2X transactions are contin- uously being manipulated, they make up only 0.02 % of all transactions; P2SH inputs are virtually irrelevant. Hence, short non-financial data chunks are well- accepted, viable extensions to the Bitcoin system (cf. Section 3.1). P2X transactions are asymmetric w.r.t. the number and sizes of data-carrying transactions. Although constituting only 1.6 % of all detector hits, they make up 9.08 % of non-financial data (10.76 MiB). This again highlights the high content- insertion efficiency of P2X transactions (cf. Section 2.1). Finally, we discuss non-standard transactions and non-standard P2SH in- put scripts. In total, we found 1703 transactions containing non-standard out- puts. The three first non-standard transactions (July 2010) repeatedly used the OP CHECKSIG operation. We dedicate this to an attempted DoS attack that tar- gets to cause high verification times. Furthermore, we found 23 P2PKH transac- tions from October 2011 that contained OP 0 instead of a hash value. The steady increase of non-standard transactions in 2012 is due to scripts that consist of 32 seemingly random bytes. Contrarily, P2SH input scripts sporadically carry non- standard redeem scripts and are then often used to insert larger data chunks (as they are used by P2SH Injectors). This is due to P2SH scripts not being checked for template conformity. We found 888 such transactions holding 8.37 MiB of data. Although peers should reject such transactions [48], they still often man- age to enter the blockchain. Non-standard P2SH scripts even carry a substantial amount of data (7.07 % of the total data originate from P2SH Injectors). Content Insertion Services. We now investigate to which extent content insertion services are used to store content on Bitcoin’s blockchain. Figure 4 shows utilization patterns for each service and Figure 5 shows the cumulative size of non-financial data inserted via the respective service. Notably, only few users are likely responsible for the majority of service-inserted content. In total, content insertion services account for 16.12 MiB of non-financial data. More than a half of this content (8.37 MiB) originates from P2SH In- jectors. The remainder was mostly inserted using Apertus (21.70 % of service- inserted data) and Satoshi Uploader (21.24 %). Finally, CryptoGraffiti accounts for 0.82 MiB (5.10 %) of content related to content insertion services. In the following, we study how the individual services have been used over time. Our key observation is that both CryptoGraffiti and P2SH Injectors are in- frequently but steadily used; since 2016 we recognize on average 23.65 data items being added per month using these services. Contrarily, Apertus has been used only 26 times since 2016, while the Satoshi Uploader has not been used at all. In fact, the Satoshi Uploader was effectively used only during a brief period: 92.73 % of all transactions emerged in April 2013. During this time, the service was used to upload four archives, six backup text files, and a PDF file. Although Apertus and the Satoshi Uploader have been used only infrequently, together they constitute 64.32 % of all P2X data we detected. This stems from the utilization of those services to engrave files into the blockchain, e.g., archives or documents (Satoshi Uploader), or images (Apertus). 
Similarly, P2SH Injectors are used to back up conversations regarding development of the Bitcoin client, especially online chat logs, forum threads, and emails, with a significant peak utilization between May and June 2015 (76.46 % of P2SH Injector matches). Especially Apertus is well-suited for this task as files are spread over multiple transactions. Based on the median, the average Apertus file has a size of 17.15 KiB and is spread over 10 transactions, including all overheads. The largest Apertus file is 310.72 KiB large (including overheads), i.e., three times the size of a standard transaction, and is spread over 96 transactions. The most heavily fragmented Apertus file is even spread over 664 transactions. Contrarily, 95.7 % of CryptoGraffiti matches are short text messages with a median length of 80 Byte. In conclusion, content insertion services are only infrequently used with varying intentions, and large portions of content were uploaded in bursts, indicating that only few users are likely responsible for the majority of service-inserted blockchain content. While CryptoGraffiti is mostly used to insert short text messages that also fit into one OP RETURN transaction, other services are predominantly used to store, e.g., images or documents. As such files can constitute objectionable content, we further investigate them in the following.
4.3 Investigating Blockchain Files
After quantifying basic content insertion in Bitcoin, we now focus on readable files that are extractable from the blockchain. We refer to files as findings of our content-insertion-service or suspicious-transaction detectors that are viewable using appropriate standard software. We reassemble fragmented files only if this is unambiguously possible, e.g., via an Apertus archive. Out of the 22.63 MiB of blockchain data not originating from coinbase or OP RETURN transactions, we can extract and analyze 1557 files with meaningful content. In addition to these, we could extract 59 files using our suspicious-transaction detector (92.25 % text). Table 2 summarizes the different file types of the analyzed files. The vast majority are text-based files and images (99.34 %).
Table 2: Distribution of blockchain file types according to our content-insertion-service and suspicious-transactions detectors.
File Type     Via Service (yes / no)   Overall Portion
Text          1353 / 54                87.07 %
Images        144 / 2                  9.03 %
HTML          45 / 0                   2.78 %
Source Code   7 / 3                    0.62 %
Archive       4 / 0                    0.25 %
Audio         2 / 0                    0.12 %
PDF           2 / 0                    0.12 %
Total         1557 / 59                100.0 %
In the following, we discuss our findings with respect to objectionable content. We manually evaluated all readable files with respect to the problematic categories we identified in Section 3.2. This analysis reveals that content from all those categories already exists in Bitcoin’s blockchain today. For each of these categories, we discuss the most severe examples. To protect the safety and privacy of individuals, we omit personally identifiable information and refrain from providing exact information on the location of critical content in the blockchain.
Copyright Violations. We found seven files that publish (intellectual) property and showcase Bitcoin’s potential to aid copyright violations. Engraved are the text of a book, a copy of the original Bitcoin paper [45,56], and two short textual white papers. Furthermore, we found two leaked cryptographic keys: one RSA private key and a firmware secret key.
Finally, the blockchain contains a so-called illegal prime, encoding software to break the copy protection of DVDs [56]. Malware. We could not find actual malware in Bitcoin’s blockchain. How- ever, an individual non-standard transaction contains a non-malicious cross-site scripting detector. A security researcher inserted this small piece of code which, if interpreted by an online blockchain parser, notifies the author about the vul- nerability. Such malicious code could become a threat for users as most websites offering an online blockchain parser also offer online Bitcoin accounts. Privacy Violations. Users store memorable private moments on the block- chain. We extracted six wedding-related images and one image showing a group of people, labeled with their online pseudonyms. Furthermore, 609 transactions contain online public chat logs, emails, and forum posts discussing Bitcoin, in- cluding topics such as money laundering. Storing private chat logs on the block- chain can, e.g., leak single user’s private information irrevocably. Moreover, third parties can release information without knowledge nor consent of affected users. Most notably, we found at least two instances of doxing, i.e., the complete dis- closure of another individual’s personal information. This data includes phone numbers, addresses, bank accounts, passwords, and multiple online identities. Recently, jurisdictions such as the European Union began to punish such serious privacy violations, including the distribution of doxing data [5]. Again, carrying out such assaults via blockchains fortifies the problem due to their immutability. Politically Sensitive Content. The blockchain has been used by whistleblow- ers as a censorship-resistant permanent storage for leaked information. We found backups of the WikiLeaks Cablegate data [37] as well as an online news arti- cle concerning pro-democracy demonstrations in Hong Kong in 2014 [25]. As stated in Section 3.2, restrictive governments are known to prosecute the pos- session of such content. For example, state-critical media coverage has already put individuals in China [18] or Turkey [24] at the risk of prosecution. Illegal and Condemned Content. Bitcoin’s blockchain contains at least eight files with sexual content. While five files only show, describe, or link to mildly pornographic content, we consider the remaining three instances objectionable for almost all jurisdictions: Two of them are backups of link lists to child pornog- raphy, containing 274 links to websites, 142 of which refer to Tor hidden services. The remaining instance is an image depicting mild nudity of a young woman. In an online forum this image is claimed to show child pornography, albeit this claim cannot be verified (due to ethical concerns we refrain from providing a ci- tation). Notably, two of the explicit images were only detected by our suspicious- transaction detector, i.e., they were not inserted via known services. While largely harmless, potentially objectionable blockchain content is infre- quently inserted, e.g., links to alleged child pornography or privacy violations. We thus believe that future blockchain designs must proactively cope with objec- tionable content. Peers can, e.g., filter incoming transactions or revert content- holding transactions [11,51], but this must be scalable and transparent. 
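As a rough sketch of what such transaction filtering could look like, the snippet below reuses the paper's own heuristics from Section 4.1: outputs that spend a small amount (under 10 000 satoshi) and whose mutable values are mostly printable ASCII. The tuple representation of outputs, the thresholds, and the function names are illustrative assumptions, not Bitcoin Core APIs or the authors' detector code.

```python
import string

# Illustrative incoming-transaction filter built on the paper's Section 4.1
# heuristics: flag outputs that burn a small amount (< 10,000 satoshi) and
# whose mutable value looks like printable text (>= 90% printable ASCII).

PRINTABLE = set(string.printable.encode())
SMALL_AMOUNT = 10_000  # satoshi

def looks_like_text(data: bytes, threshold: float = 0.9) -> bool:
    """Heuristic from the paper's text detector: >= 90% printable ASCII bytes."""
    if not data:
        return False
    printable = sum(1 for b in data if b in PRINTABLE)
    return printable / len(data) >= threshold

def is_suspicious_output(value_satoshi: int, mutable_bytes: bytes) -> bool:
    """Flag outputs that burn little value but carry text-like data."""
    return value_satoshi < SMALL_AMOUNT and looks_like_text(mutable_bytes)

def flag_transaction(outputs) -> bool:
    """outputs: iterable of (value_satoshi, mutable_bytes) pairs."""
    flagged = [o for o in outputs if is_suspicious_output(*o)]
    return len(flagged) >= 2  # e.g., require several data-like outputs

# Example: two 546-satoshi outputs whose "hashes" are readable text.
print(flag_transaction([(546, b"Hello from the block"), (546, b"chain, dear reader!!")]))
```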
5 Related Work Previous work related to ours comprises i) mitigating the distribution of objec- tionable content in file-sharing peer-to-peer networks, ii) studies on Bitcoin’s blockchain, iii) reports on Bitcoin’s susceptibility for content insertion, and iv) approaches to retrospectively remove blockchain content. The trade-off between enabling open systems for data distribution and risking that unwanted or even illegal content is being shared is already known from peer-to-peer networks. Peer-to-peer-based file-sharing protocols typically limit the spreading of objectionable public content by tracking the reputation of users offering files [6,26,55,73] or assigning a reputation to files themselves [19,67]. This way, users can reject objectionable content or content from untrustworthy sources. Contrarily, distributed content stores usually resort to encrypt private files before outsourcing them to other peers [17,7]. By storing only encrypted files, users can plausibly deny possessing any content of others and can thus obliviously store it on their hard disk. Unfortunately, these protection mechanisms are not applicable to blockchains, as content cannot be deleted once it has been added to the blockchain and the utilization of encryption cannot be enforced reliably. Bitcoin’s blockchain was analyzed w.r.t. different aspects by numerous stud- ies. In a first step, multiple research groups [53,33,71,72,39] studied the currency flows in Bitcoin, e.g., to perform wealth analyses. From a different line of re- search, several approaches focused on user privacy and investigated the identities used in Bitcoin [52,46,44,59,23]. These works analyzed to which extent users can be de-anonymized by clustering identities [52,46,44,59,23] and augmenting these clusters with side-channel information [52,44,59,23]. Finally, the blockchain was analyzed w.r.t. the use cases of OP RETURN transactions [12]. While this work is very close to ours, we provide a first comprehensive study of the complete landscape of non-financial data on Bitcoin’s blockchain. The seriousness of objectionable content stored on public blockchains has been motivated by multiple works [56,57,43,11,40,51]. These works, however, fo- cus on reporting individual incidents or consist of preliminary analyses of the distribution and general utilization of content insertion. To the best of our knowl- edge, this paper gives the first comprehensive analysis of this problem space, including a categorization of objectionable content and a survey of potential risks for users if such content enters the blockchain. In contrast to previously considered attacks on Bitcoin’s ecosystem [22,27], illegal content can be inserted instantly at comparably low costs and can put all participants at risk. The utilization of chameleon hash functions [15] to chain blocks recently opened up a potential approach to mitigate unwanted or illegal blockchain con- tent [11]. Here, a single blockchain maintainer or a small group of maintainers can retrospectively revert single transactions, e.g., due to illegal content. To overcome arising trust issues, µchain [51] leverages the consensus approach of traditional blockchains to vote on alterations of the blockchain history. As these approaches tackle unwanted content for newly designed blockchains, we seek to motivate a discussion on countermeasures also for existing systems, e.g., Bitcoin. 6 Conclusion The possibility to store non-financial data on cryptocurrency blockchains is both beneficial and threating for its users. 
Although controlled channels to insert non-financial data at small rates open up a field of new applications such as digital notary services, rights management, or non-equivocation systems, objectionable or even illegal content has the potential to jeopardize a whole cryptocurrency. Although court rulings do not yet exist, legislative texts from countries such as Germany, the UK, or the USA suggest that illegal content such as child pornography can make the blockchain illegal to possess for all users. As we have shown in this paper, a plethora of fundamentally different methods to store non-financial (potentially objectionable) content on the blockchain exists in Bitcoin. As of now, this can affect at least 112 countries in which possessing content such as child pornography is illegal. This especially endangers the multi-billion dollar markets powering cryptocurrencies such as Bitcoin. To assess this problem’s severity, we comprehensively analyzed the quantity and quality of non-financial blockchain data in Bitcoin today. Our quantitative analysis shows that 1.4 % of the roughly 251 million transactions in Bitcoin’s blockchain carry arbitrary data. We could retrieve over 1600 files, with new content infrequently being added. Despite a majority of arguably harmless content, we also identify different categories of objectionable content. The harmful potential of single instances of objectionable blockchain content is already showcased by findings such as links to illegal pornography or serious privacy violations.
Acknowledgements
This work has been funded by the German Federal Ministry of Education and Research (BMBF) under funding reference number 16KIS0443. The responsibility for the content of this publication lies with the authors.
References
1. German Criminal Code, Section 11 (2013) 2. German Criminal Code, Sections 184b and 184c (2013) 3. Protection of Children Act, Chapter 37, Section 7 (2015) 4. Bitcoin Transaction Fees. https://bitcoinfees.info (2016) Accessed 09/23/2017. 5. General Data Protection Regulation, Section 24 (2016) 6. Aberer, K., Despotovic, Z.: Managing Trust in a Peer-2-Peer Information System. In: ACM CIKM. (2001) pp. 310–317 7. Adya, A., Bolosky, W.J., Castro, M., Cermak, G., Chaiken, R., Douceur, J.R., Howell, J., Lorch, J.R., Theimer, M., Wattenhofer, R.P.: FARSITE: Federated, Available, and Reliable Storage for an Incompletely Trusted Environment. SIGOPS Oper. Syst. Rev. 36(SI) (2002) pp. 1–14 8. Ali, M., Shea, R., Nelson, J., Freedman, M.J.: Blockstack: A New Decentralized Internet. (2017) Accessed 09/23/2017. 9. Andresen, G.: Block v2 (Height in Coinbase). https://github.com/bitcoin/bips/blob/master/bip-0034.mediawiki (2012) Accessed 09/23/2017. 10. Andresen, G.: Pay to Script Hash. https://github.com/bitcoin/bips/blob/master/bip-0016.mediawiki (2012) Accessed 09/23/2017. 11. Ateniese, G., Magri, B., Venturi, D., Andrade, E.: Redactable Blockchain – or – Rewriting History in Bitcoin and Friends. In: IEEE EuroS&P. (2017) pp. 111–126 12. Bartoletti, M., Pompianu, L.: An analysis of Bitcoin OP RETURN metadata. In: FC Bitcoin Workshop. (2017) 13. Bellinger, J., Hussain, M.: Freedom of Speech: The Great Divide and the Common Ground between the United States and the Rest of the World. Islamic Law and International Human Rights Law: Searching for Common Ground? (2012) pp. 168–180 14. Blockchain.info: Bitcoin Charts. https://blockchain.info/charts (2011) Accessed 09/23/2017. 15.
Camenisch, J., Derler, D., Krenn, S., Pöhls, H.C., Samelin, K., Slamanig, D.: Chameleon-Hashes with Ephemeral Trapdoors. In: PKC ’17. (2017) pp. 152–182 16. Clark, J., Essex, A.: CommitCoin: Carbon Dating Commitments with Bitcoin. In: FC. (2012) pp. 390–398 17. Clarke, I., Sandberg, O., Wiley, B., Hong, T.W.: Freenet: A Distributed Anonymous Information Storage and Retrieval System. In: Designing Privacy Enhancing Technologies: Workshop on Design Issues in Anonymity and Unobservability. (2001) pp. 46–66 18. Committee to Protect Journalists: Chinese journalist accused of illegally acquiring state secrets. https://cpj.org/x/660d (2015) Accessed 09/23/2017. 19. Damiani, E., di Vimercati, D.C., Paraboschi, S., Samarati, P., Violante, F.: A Reputation-based Approach for Choosing Reliable Resources in Peer-to-peer Networks. In: ACM CCS. (2002) pp. 207–216 20. Dell Security: Annual Threat Report. (2016) Accessed 09/23/2017. 21. Douglas, D.M.: Doxing: a conceptual analysis. Ethics and Information Technology 18(3) (2016) pp. 199–210 22. Eyal, I., Sirer, E.G.: Majority Is Not Enough: Bitcoin Mining Is Vulnerable. In: FC. (2014) pp. 436–454 23. Fleder, M., Kester, M., Sudeep, P.: Bitcoin Transaction Graph Analysis. (2015) 24. Freedom House: Turkey Freedom of the Press Report. https://freedomhouse.org/report/freedom-press/2016/turkey (2016) Accessed 09/23/2017. 25. Gracie, C.: Hong Kong stages huge National Day democracy protests. http://www.bbc.com/news/world-asia-china-29430229 (2014) Accessed 09/23/2017. 26. Gupta, M., Judge, P., Ammar, M.: A Reputation System for Peer-to-peer Networks. In: ACM NOSSDAV. (2003) pp. 144–152 27. Heilman, E., Kendler, A., Zohar, A., Goldberg, S.: Eclipse Attacks on Bitcoin’s Peer-to-Peer Network. In: USENIX Security. (2015) pp. 129–144 28. Herald Union: Copyright infringement by illegal file sharing in Germany. http://www.herald-union.com/copyright-infringement-by-illegal-file-sharing-in-germany (2015) Accessed 09/23/2017. 29. HugPuddle: Apertus – Archive data on your favorite blockchains. http://apertus.io (2013) Accessed 09/23/2017. 30. “Hyena”: Cryptograffiti.info. http://cryptograffiti.info Accessed 09/23/2017. 31. Interpol: INTERPOL cyber research identifies malware threat to virtual currencies. https://www.interpol.int/News-and-media/News/2015/N2015-033 (2015) Accessed 09/23/2017. 32. Irish Office of the Attorney General: Child Trafficking and Pornography Act, Section 2. Irish Statute Book (1998) pp. 44–61 33. Kondor, D., Pósfai, M., Csabai, I., Vattay, G.: Do the Rich Get Richer? An Empirical Analysis of the Bitcoin Transaction Network. PLOS ONE 9(2) (02 2014) pp. 1–10 34. Labs, F.S.: Ransomware: How to Predict, Prevent, Detect & Respond.
Threat Response (2016) Accessed 09/23/2017. 35. Le Calvez, A.: Non-standard P2SH scripts. https://medium.com/@alcio/non-standard-p2sh-scripts-508fa6292df5 (2015) Accessed 09/23/2017. 36. Lee, D.: France ends three-strikes internet piracy ban policy. http://www.bbc.com/news/technology-23252515 (2013) Accessed 12/12/2017. 37. Lynch, L.: The Leak Heard Round the World? Cablegate in the Evolving Global Mediascape. In Brevini, B., Hintz, A., McCurdy, P., eds.: Beyond WikiLeaks: Implications for the Future of Communications, Journalism and Society. Palgrave Macmillan UK (2013) pp. 56–77 38. Lyons, K., Blight, G.: Where in the world is the worst place to be a Christian? (2015) Accessed 09/23/2017. 39. Maesa, D.D.F., Marino, A., Ricci, L.: Uncovering the Bitcoin Blockchain: An Analysis of the Full Users Graph. In: IEEE DSAA. (2016) pp. 537–546 40. Matzutt, R., Hohlfeld, O., Henze, M., Rawiel, R., Ziegeldorf, J.H., Wehrle, K.: POSTER: I Don’t Want That Content! On the Risks of Exploiting Bitcoin’s Blockchain as a Content Store. In: ACM CCS. (2016) 41. Matzutt, R., Müllmann, D., Zeissig, E.M., Horst, C., Kasugai, K., Lidynia, S., Wieninger, S., Ziegeldorf, J.H., Gudergan, G., Spiecker gen. Döhmann, I., Wehrle, K., Ziefle, M.: myneData: Towards a Trusted and User-controlled Ecosystem for Sharing Personal Data. In Eibl, M., Gaedke, M., eds.: INFORMATIK, Gesellschaft für Informatik, Bonn (2017) pp. 1073–1084 42. McAfee Labs: Threats Report (December 2016). (2016) Accessed 09/23/2017. 43. McReynolds, E., Lerner, A., Scott, W., Roesner, F., Kohno, T.: Cryptographic currencies from a tech-policy perspective: Policy issues and technical directions. In: Springer LNCS. Volume 8976. (2015) pp. 94–111 44. Meiklejohn, S., Pomarole, M., Jordan, G., Levchenko, K., McCoy, D., Voelker, G.M., Savage, S.: A Fistful of Bitcoins: Characterizing Payments Among Men with No Names. In: IMC. (2013) pp. 127–140 45. Nakamoto, S.: Bitcoin: A Peer-to-Peer Electronic Cash System. (2008) https://bitcoin.org/bitcoin.pdf. 46. Ober, M., Katzenbeisser, S., Hamacher, K.: Structure and Anonymity of the Bitcoin Transaction Graph. Future Internet 5(2) (2013) pp. 237–250 47. Office of the Law Revision Counsel of the United States House of Representatives: U.S. Code, Title 18, Chapter 110, § 2256 (2017) 48. Okupski, K.: Bitcoin Developer Reference. Technical report (2014) 49. Peerenboom, R.P.: Assessing Human Rights in China: Why the Double Standard. (2005) Accessed 09/23/2017. 50. PoEx Co., Ltd: Proof of Existence. https://proofofexistence.com (2015) Accessed 09/23/2017. 51. Puddu, I., Dmitrienko, A., Capkun, S.: µchain: How to forget without hard forks. IACR Cryptology ePrint Archive 2017/106 (2017) Accessed 09/23/2017. 52. Reid, F., Harrigan, M.: An Analysis of Anonymity in the Bitcoin System. In: Security and Privacy in Social Networks. (2013) pp. 197–223 53. Ron, D., Shamir, A.: Quantitative Analysis of the Full Bitcoin Transaction Graph. In: FC. (2013) pp. 6–24 54. Scheller, S.H.: A Picture Is Worth a Thousand Words: The Legal Implications of Revenge Porn. North Carolina Law Review 93(2) (2015) pp. 551–595 55. Selcuk, A.A., Uzun, E., Pariente, M.R.: A Reputation-based Trust Management System for P2P Networks.
In: IEEE CCGrid. (2004) pp. 251–258 56. Shirriff, K.: Hidden surprises in the Bitcoin blockchain and how they are stored: Nelson Mandela, Wikileaks, photos, and Python software. http://www. righto.com/2014/02/ascii-bernanke-wikileaks-photographs.html (2014) Ac- cessed 09/23/2017. 57. Sleiman, M.D., Lauf, A.P., Yampolskiy, R.: Bitcoin message: Data insertion on a proof-of-work cryptocurrency system. In: ACM CW. (2015) pp. 332–336 58. Snow, P., Deery, B., Lu, J., Johnston, D., Kirby, P.: Factom: Business Processes Secured by Immutable Audit Trails on the Blockchain. https://www.factom.com/ devs/docs/guide/factom-white-paper-1-0 (2014) Accessed 09/23/2017. 59. Spagnuolo, M., Maggi, F., Zanero, S.: BitIodine: Extracting Intelligence from the Bitcoin Network. In: FC. (2014) pp. 457–468 60. Standing Committee of the National People’s Congress: Law of the People’s Re- public of China on Guarding State Secrets. (1989) Accessed 09/23/2017. 61. Taylor, G.: Concepts of Intention in German Criminal Law. Oxford Journal of Legal Studies 24(1) (2004) pp. 99–127 62. Tomescu, A., Devadas, S.: Catena: Efficient non-equivocation via bitcoin. In: IEEE S&P. (2017) pp. 393–409 63. Tucker, E.: A Look at Federal Cases on Handling Classified In- formation. http://www.military.com/daily-news/2016/01/30/a-look-at- federal-cases-on-handling-classified-information.html (2016) Accessed 09/23/2017. 64. United Nations: Appendix to the Optional protocols to the Convention on the Rights of the Child on the involvement of children in armed conflict and on the sale of children, child prostitution and child pornography (2000) 65. United Nations: Optional protocols to the Convention on the Rights of the Child on the involvement of children in armed conflict and on the sale of children, child prostitution and child pornography. 2171 (2000) pp. 247–254 66. Waldman, M., Rubin, A.D., Cranor, L.: Publius: A Robust, Tamper-Evident, Censorship-Resistant and Source-Anonymous Web Publishing System. In: USENIX Security. (2000) pp. 59–72 67. Walsh, K., Sirer, E.G.: Experience with an Object Reputation System for Peer- to-peer Filesharing. In: NSDI. (2006) 68. Wei, W.: Ancient ’STONED’ Virus Signatures found in Bitcoin Block- chain. https://thehackernews.com/2014/05/microsoft-security-essential- found.html (2014) Accessed 09/23/2017. 69. Wood, G.: Ethereum: A Secure Decentralised Generalised Transaction Ledger. Ethereum Project Yellow Paper (2016) Accessed 09/23/2017. 70. Zeilinger, M.: Digital art as ‘monetised graphics’: Enforcing intellectual property on the blockchain. Philosophy & Technology (2016) 71. Ziegeldorf, J.H., Grossmann, F., Henze, M., Inden, N., Wehrle, K.: CoinParty: Secure Multi-Party Mixing of Bitcoins. In: ACM CODASPY. (2015) pp. 75–86 72. Ziegeldorf, J.H., Matzutt, R., Henze, M., Grossmann, F., Wehrle, K.: Secure and Anonymous Decentralized Bitcoin Mixing. FGCS 80 (3 2018) 448–466 73. Zimmermann, T., Rüth, J., Wirtz, H., Wehrle, K.: Maintaining Integrity and Reputation in Content Offloading. In: IEEE/IFIP WONS. (2016) pp. 
1–8
feeds-dltj-org-5409 ---- Disruptive Library Technology Jester We're Disrupted, We're Librarians, and We're Not Going to Take It Anymore More Thoughts on Pre-recording Conference Talks Over the weekend, I posted an article here about pre-recording conference talks and sent a tweet about the idea on Monday. I hoped to generate discussion about recording talks to fill in gaps—positive and negative—about the concept, and I was not disappointed. I’m particularly thankful to Lisa Janicke Hinchliffe and Andromeda Yelton along with Jason Griffey, Junior Tidal, and Edward Lim Junhao for generously sharing their thoughts. I added to the previous article’s bullet points and am expanding on some of the issues here. I’m inviting everyone mentioned to let me know if I’m mischaracterizing their thoughts, and I will correct this post if I hear from them. (I haven’t found a good comments system to hook into this static site blog.) Pre-recorded Talks Limit Presentation Format Lisa Janicke Hinchliffe made this point early in the feedback: @DataG For me downside is it forces every session into being a lecture. For two decades CfPs have emphasized how will this season be engaging/not just a talking head? I was required to turn workshops into talks this year. Even tho tech can do more. Not at all best pedagogy for learning— Lisa Janicke Hinchliffe (@lisalibrarian) April 5, 2021 Jason described the “flipped classroom” model that he had in mind as the NISOplus2021 program was being developed. The flipped classroom model is one where students do the work of reading material and watching lectures, then come to the interactive time with the instructors ready with questions and comments about the material. Rather than the instructor lecturing during class time, the class time becomes a discussion about the material. For NISOplus, “the recording is the material the speaker and attendees are discussing” during the live Zoom meetings. In the previous post, I described how the speaker could respond in text chat while the recording replay is beneficial. Lisa went on to say: @DataG Q+A is useful but isn't an interactive session. To me, interactive = participants are co-creating the session, not watching then commenting on it.— Lisa Janicke Hinchliffe (@lisalibrarian) April 5, 2021 She described an example: the SSP preconference she ran at CHS. I’m paraphrasing her tweets in this paragraph. The preconference had a short keynote and an “Oprah-style” panel discussion (not pre-prepared talks). This was done live; nothing was recorded. After the panel, people worked in small groups using Zoom and a set of Google Slides to guide the group work. The small groups reported their discussions back to all participants.
Andromeda points out (paraphrasing twitter-speak): “Presenters will need much more— and more specialized—skills to pull it off, and it takes a lot more work.” And Lisa adds: “Just so there is no confusion … I don’t think being online makes it harder to do interactive. It’s the pre-recording. Interactive means participants co-create the session. A pause to chat isn’t going to shape what comes next on the recording.” Increased Technical Burden on Speakers and Organizers @ThatAndromeda @DataG Totally agree on this. I had to pre-record a conference presentation recently and it was a terrible experience, logistically. I feel like it forces presenters to become video/sound editors, which is obviously another thing to worry about on top of content and accessibility.— Junior Tidal (@JuniorTidal) April 5, 2021 Andromeda also agreed with this: “I will say one of the things I appreciated about NISO is that @griffey did ALL the video editing, so I was not forced to learn how that works.” She continued, “everyone has different requirements for prerecording, and in [Code4Lib’s] case they were extensive and kept changing.” And later added: “Part of the challenge is that every conference has its own tech stack/requirements. If as a presenter I have to learn that for every conference, it’s not reducing my workload.” It is hard not to agree with this; a high-quality (stylistically and technically) recording is not easy to do with today’s tools. This is also a technical burden for meeting organizers. The presenters will put a lot of work into talks—including making sure the recordings look good; whatever playback mechanism is used has to honor the fidelity of that recording. For instance, presenters who have gone through the effort to ensure the accessibility of the presentation color scheme want the conference platform to display the talk “as I created it.” The previous post noted that recorded talks also allow for the creation of better, non-real-time transcriptions. Lisa points out that presenters will want to review that transcription for accuracy, which Jason noted adds to the length of time needed before the start of a conference to complete the preparations. Increased Logistical Burden on Presenters @ThatAndromeda @DataG @griffey Even if prep is no more than the time it would take to deliver live (which has yet to be case for me and I'm good at this stuff), it is still double the time if you are expected to also show up live to watch along with everyone else.— Lisa Janicke Hinchliffe (@lisalibrarian) April 5, 2021 This is a consideration I hadn’t thought through—that presenters have to devote more clock time to the presentation because first they have to record it and then they have to watch it. (Or, as Andromeda added, “significantly more than twice the time for some people, if they are recording a bunch in order to get it right and/or doing editing.”) No. Audience. Reaction. @DataG @griffey 3) No. Audience. Reaction. I give a joke and no one laughs. Was it funny? Was it not funny? Talks are a *performance* and a *relationship*; I'm getting energy off the audience, I'm switching stuff on the fly to meet their vibe. Prerecorded/webinar is dead. Feels like I'm bombing.— Andromeda Yelton (@ThatAndromeda) April 5, 2021 Wow, yes. I imagine it would take a bit of imagination to get in the right mindset to give a talk to a small camera instead of an audience. I wonder how stand-up comedians are dealing with this as they try to put on virtual shows. 
Andromeda summed this up: @DataG @griffey oh and I mean 5) I don't get tenure or anything for speaking at conferences and goodness knows I don't get paid. So the ENTIRE benefit to me is that I enjoy doing the talk and connect to people around it. prerecorded talk + f2f conf removes one of these; online removes both.— Andromeda Yelton (@ThatAndromeda) April 5, 2021 Also in this heading could be “No Speaker Reaction”—or the inability for subsequent speakers at a conference to build on something that someone said earlier. In the Code4Lib Slack team, Daniel S noted: “One thing comes to mind on the pre-recording [is] the issue that prerecorded talks lose the ‘conversation’ aspect where some later talks at a conference will address or comment on earlier talks.” Kate Deibel added: “Exactly. Talks don’t get to spontaneously build off of each other or from other conversations that happen at the conference.” Currency of information Lisa points out that pre-recording talks before an event means there is a delay between the recording and the playback. In the example she pointed out, there was a talk at RLUK that, had it been pre-recorded, would have been about the University of California working on an Open Access deal with Elsevier; live, it was able to be “the deal we announced earlier this week”. Conclusions? Near the end of the discussion, Lisa added: @DataG @griffey @ThatAndromeda I also recommend going forward that the details re what is required of presenters be in the CfP. It was one thing for conferences that pivoted (huge effort!) but if you write the CfP since the pivot it should say if pre-record, platform used, etc.— Lisa Janicke Hinchliffe (@lisalibrarian) April 5, 2021 …and Andromeda added: “Strong agree here. I understand that this year everyone was making it up as they went along, but going forward it’d be great to know that in advance.” That means conferences will need to take these needs into account well before the Call for Proposals (CfP) is published. A conference that is thinking now about pre-recording their talks must work through these issues and set expectations with presenters early. As I hoped, the Twitter replies tempered my eagerness for the all-recorded style with some real-world experience. There could be possibilities here, but adapting face-to-face meetings to a world with less travel won’t be simple and will take significant thought beyond the issues of technology platforms. Edward Lim Junhao summarized this nicely: “I favor unpacking what makes up our prof conferences. I’m interested in recreating that shared experience, the networking, & the serendipity of learning sth you didn’t know. I feel in-person conferences now have to offer more in order to justify people traveling to attend them.” Related, Andromeda said: “Also, for a conf that ultimately puts its talks online, it’s critical that it have SOMEthing beyond content delivery during the actual conference to make it worth registering rather than just waiting for youtube. realtime interaction with the speaker is a pretty solid option.” If you have something to add, reach out to me on Twitter. Given enough responses, I’ll create another summary. Let’s keep talking about what that looks like and sharing discoveries with each other. The Tree of Tweets It was a great discussion, and I think I pulled in the major ideas in the summary above. With some guidance from Ed Summers, I’m going to embed the Twitter threads below using Treeverse by Paul Butler.
We might be stretching the boundaries of what is possible, so no guarantees that this will be viewable for the long term. Should All Conference Talks be Pre-recorded? The Code4Lib conference was last week. That meeting used all pre-recorded talks, and we saw the benefits of pre-recording for attendees, presenters, and conference organizers. Should all talks be pre-recorded, even when we are back face-to-face? Note! After I posted a link to this article on Twitter, there was a great response of thoughtful comments. I've included new bullet points below and summarized the responses in another blog post. As an entirely virtual conference, I think we can call Code4Lib 2021 a success. Success ≠ Perfect, of course, and last week the conference coordinating team got together on a Zoom call for a debriefing session. We had a lengthy discussion about what we learned and what we wanted to take forward to the 2022 conference, which we’re anticipating will be something with a face-to-face component. That last sentence was tough to compose: “…will be face-to-face”? “…will be both face-to-face and virtual”? (Or another fully virtual event?) Truth be told, I don’t think we know yet. I think we know with some certainty that the COVID pandemic will become much more manageable by this time next year—at least in North America and Europe. (Code4Lib draws from primarily North American library technologists with a few guests from other parts of the world.) I’m hearing from higher education institutions, though, that travel is going to be severely curtailed…if not for health risk reasons, then because budgets have been slashed. So one has to wonder what a conference will look like next year. I’ve been to two online conferences this year: NISOplus21 and Code4Lib. Both meetings recorded talks in advance and started playback of the recordings at a fixed point in time. This was beneficial for a couple of reasons. For organizers and presenters, pre-recording allowed technical glitches to be worked through without the pressure of a live event happening. Technology is not nearly perfect enough or ubiquitously spread to count on it working in real-time. 1 NISOplus21 also used the recordings to get transcribed text for the videos. (Code4Lib used live transcriptions on the synchronous playback.) Attendees and presenters benefited from pre-recording because the presenters could be in the text chat channel to answer questions and provide insights. Having the presenter free during the playback offers new possibilities for making talks more engaging: responding in real-time to polls, getting forehand knowledge of topics for subsequent real-time question/answer sessions, and so forth. The synchronous playback time meant that there was a point when (almost) everyone was together watching the same talk—just as in face-to-face sessions. During the Code4Lib conference coordinating debrief call, I asked the question: “If we saw so many benefits to pre-recording talks, do we want to pre-record them all next year?” In addition to the reasons above, pre-recorded talks benefit those who are not comfortable speaking English or are first-time presenters. (They have a chance to re-do their talk as many times as they need in a much less stressful environment.) “Live” demos are much smoother because a recording can be restarted if something goes wrong. Each year, at least one presenter needs to use their own machine (custom software, local development environment, etc.), and swapping out presenter computers in real-time is risky. 
And it is undoubtedly easier to impose time requirements with recorded sessions. So why not pre-record all of the talks? I get it—it would be different to sit in a ballroom watching a recording play on big screens at the front of the room while the podium is empty. But is it so different as to dramatically change the experience of watching a speaker at a podium? In many respects, we had a dry-run of this during Code4Lib 2020. It was at the early stages of the coming lockdowns when institutions started barring employee travel, and we had to bring in many presenters remotely. I wrote a blog post describing the setup we used for remote presenters, and at the end, I said: I had a few people comment that they were taken aback when they realized that there was no one standing at the podium during the presentation. Some attendees, at least, quickly adjusted to this format. For those with the means and privilege of traveling, there can still be face-to-face discussions in the hall, over meals, and social activities. For those that can’t travel (due to risks of traveling, family/personal responsibilities, or budget cuts), the attendee experience is a little more level—everyone is watching the same playback and in the same text backchannels during the talk. I can imagine a conference tool capable of segmenting chat sessions during the talk playback to “tables” where you and close colleagues can exchange ideas and then promote the best ones to a conference-wide chat room. Something like that would be beneficial as attendance grows for events with an online component, and it would be a new form of engagement that isn’t practical now. There are undoubtedly reasons not to pre-record all session talks (beyond the feels-weird-to-stare-at-an-unoccupied-ballroom-podium reasons). During the debriefing session, one person brought up that having all pre-recorded talks erodes the justification for in-person attendance. I can see a manager saying, “All of the talks are online…just watch it from your desk. Even your own presentation is pre-recorded, so there is no need for you to fly to the meeting.” That’s legitimate. So if you like bullet points, here’s how it lays out. Pre-recording all talks is better for: Accessibility: better transcriptions for recorded audio versus real-time transcription (and probably at a lower cost, too) Engagement: the speaker can be in the text chat during playback, and there could be new options for backchannel discussions Better quality: speakers can re-record their talk as many times as needed Closer equality: in-person attendees are having much the same experience during the talk as remote attendees Downsides for pre-recording all talks: Feels weird: yeah, it would be different Erodes justification: indeed a problem, especially for those for whom giving a speech is the only path to getting the networking benefits of face-to-face interaction Limits presentation format: it forces every session into being a lecture. For two decades CfPs have emphasized how will this season be engaging/not just a talking head? 
(Lisa Janicke Hinchliffe) Increased Technical Burden on Speakers and Organizers: conference organizers asking presenters to do their own pre-recording is a barrier (Junior Tidal), and organizers have added new requirements for themselves No Audience Feedback: pre-recording forces the presenter into an unnatural state relative to the audience (Andromeda Yelton) Currency of information: pre-recording talks before an event naturally introduces a delay between the recording and the playback. (Lisa Janicke Hinchliffe) I’m curious to hear of other reasons, for and against. Reach out to me on Twitter if you have some. The COVID-19 pandemic has changed our society and will undoubtedly transform it in ways that we can’t even anticipate. Is the way that we hold professional conferences one of them? Can we just pause for a moment and consider the decades of work and layers of technology that make a modern teleconference call happen? For you younger folks, there was a time when one couldn’t assume the network to be there. As in: the operating system on your computer couldn’t be counted on to have a network stack built into it. In the earliest years of my career, we were tickled pink to have Macintoshes at the forefront of connectivity through GatorBoxes. Go read the first paragraph of that Wikipedia article on GatorBoxes…TCP/IP was tunneled through LocalTalk running over PhoneNet on unshielded twisted pairs no faster than about 200 kbit/second. (And we loved it!) Now the network is expected; needing to know about TCP/IP is pushed so far down the stack as to be forgotten…assumed. Sure, the software on top now is buggy and bloated—is my Zoom client working? has Zoom’s service gone down?—but the network…we take that for granted. ↩ User Behavior Access Controls at a Library Proxy Server are Okay Earlier this month, my Twitter timeline lit up with mentions of a half-day webinar called Cybersecurity Landscape - Protecting the Scholarly Infrastructure. What had riled up the people I follow on Twitter was the first presentation: “Security Collaboration for Library Resource Access” by Cory Roach, the chief information security officer at the University of Utah. Many of the tweets and articles linked in tweets were about a proposal for a new round of privacy-invading technology coming from content providers as a condition of libraries subscribing to publisher content. One of the voices that I trust was urging caution: I highly recommend you listen to the talk, which was given by a university CIO, and judge if this is a correct representation. FWIW, I attended the event and it is not what I took away.— Lisa Janicke Hinchliffe (@lisalibrarian) November 14, 2020 As near as I can tell, much of the debate traces back to this article: Scientific publishers propose installing spyware in university libraries to protect copyrights - Coda Story https://t.co/rtCokIukBf— Open Access Tracking Project (@oatp) November 14, 2020 The article describes Cory’s presentation this way: One speaker proposed a novel tactic publishers could take to protect their intellectual property rights against data theft: introducing spyware into the proxy servers academic libraries use to allow access to their online services, such as publishers’ databases. The “spyware” moniker is quite scary. It is what made me want to seek out the recording from the webinar and hear the context around that proposal. My understanding (after watching the presentation) is that the proposal is not nearly as concerning.
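Before getting into the details, it helps to see the flavor of what is actually being proposed: the university mines its own web proxy and identity-provider logs for suspicious per-account patterns. As a toy illustration (the file name and log layout are assumptions on my part, an Apache-style log with the client IP in the first field and the authenticated username in the third, and the threshold is invented), flagging accounts that show up from an unusually large number of client addresses could be as simple as:

    # Count distinct client IPs per authenticated user in a day of proxy logs,
    # then print the accounts that exceed an arbitrary threshold.
    awk '
      $3 != "-" { seen[$3 SUBSEP $1] = 1 }
      END {
        for (k in seen) { split(k, pair, SUBSEP); ips[pair[1]]++ }
        for (user in ips) if (ips[user] > 5) print user, ips[user], "distinct client IPs"
      }
    ' ezproxy.log

A real deployment along the lines Cory describes would also pull in the identity provider's logs and automate the per-account blocking; the point of the sketch is just that this kind of analysis can run entirely on campus systems.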
Although there is one problematic area—the correlation of patron identity with requested URLs—overall, what is described is a sound and common practice for securing web applications. To the extent that it is necessary to determine a user’s identity before allowing access to licensed content (an unfortunate necessity because of the state of scholarly publishing), this is an acceptable proposal. (Through the university communications office, Cory published a statement about the reaction to his talk.) In case you didn’t know, a web proxy server ensures the patron is part of the community of licensed users, and the publisher trusts requests that come through the web proxy server. The point of Cory’s presentation is that the username/password checking at the web proxy server is a weak form of access control that is subject to four problems: phishing (sending email to trick a user into giving up their username/password) social engineering (non-email ways of tricking a user into giving up their username/password) credential reuse (systems that are vulnerable because the user used the same password in more than one place) hacktivism (users that intentionally give out their username/password so others can access resources) Right after listing these four problems, Cory says: “But anyway we look at it, we can safely say that this is primarily a people problem and the technology alone is not going to solve that problem. Technology can help us take reasonable precautions… So long as the business model involves allowing access to the data that we’re providing and also trying to protect that same data, we’re unlikely to stop theft entirely.” His proposal is to place “reasonable precautions” in the web proxy server as it relates to the campus identity management system. This is a slide from his presentation: Slide from presentation by Cory Roach I find this layout (and lack of labels) somewhat confusing, so I re-imagined the diagram as this: Revised 'Modern Library Design' The core of Cory’s presentation is to add predictive analytics and per-user blocking automation to the analysis of the log files from the web proxy server and the identity management server. By doing so, the university can react quicker to compromised usernames and passwords. In fact, it could probably do so more quickly than the publisher could do with its own log analysis and reporting back to the university. Where Cory runs into trouble is this slide: Slide from presentation by Cory Roach In this part of the presentation, Cory describes the kinds of patron-identifying data that the university could-or-would collect and analyze to further the security effort. In search engine optimization, these sorts of data points are called “signals” and are used to improve the relevance of search results; perhaps there is an equivalent term in access control technology. But for now, I’ll just call them “signals”. There are some problems in gathering these signals—most notably the correlation between user identity and “URLs Requested”. In the presentation, he says: “You can also move over to behavioral stuff. So it could be, you know, why is a pharmacy major suddenly looking up a lot of material on astrophysics or why is a medical professional and a hospital suddenly interested in internal combustion.
Things that just don’t line up and we can identify fishy behavior.” It is core to the library ethos that we make our best effort to not track what a user is interested in—to not build a profile of a user’s research unless they have explicitly opted into such data collection. As librarians, we need to gracefully describe this professional ethos and work that into the design of the systems used on campus (and at the publishers). Still, there is much to be said for using some of the other signals to analyze whether a particular request is from an authorized community member. For instance, Cory says: “We commonly see this user coming in from the US and today it’s coming in from Botswana. You know, has there been enough time that they could have traveled from the US to Botswana and actually be there? Have they ever access resources from that country before is there residents on record in that country?” The best part of what Cory is proposing is that the signals’ storage and processing is at the university and not at the publisher. I’m not sure if Cory knew this, but a recent version of EZProxy added a UsageLimit directive that builds in some of these capabilities. It can set per-user limits based on the number of page requests or the amount of downloaded information over a specified interval. One wonders if somewhere in OCLC’s development queue is the ability to detect IP addresses from multiple networks (geographic detection) and browser differences across a specified interval. Still, pushing this up to the university’s identity provider allows for a campus-wide view of the signals…not just the ones coming through the library. Also, in designing the system, there needs to be clarity about how the signals are analyzed and used. I think Cory knew this as well: “we do have to be careful about not building bias into the algorithms.” Yeah, the need for this technology sucks. Although it was the tweet to the Coda Story about the presentation that blew up, the thread of the story goes through TechDirt to a tangential paragraph from Netzpolitik in an article about Germany’s licensing struggle with Elsevier. With this heritage, any review of the webinar’s ideas are automatically tainted by the disdain the library community in general has towards Elsevier. It is reality—an unfortunate reality, in my opinion—that the traditional scholarly journal model has publishers exerting strong copyright protection on research and ideas behind paywalls. (Wouldn’t it be better if we poured the anti-piracy effort into improving scholarly communication tools in an Open Access world? Yes, but that isn’t the world we live in.) Almost every library deals with this friction by employing a web proxy server as an agent between the patron and the publisher’s content. The Netzpolitik article says: …but relies on spyware in the fight against „cybercrime“ Of Course, Sci-Hub and other shadow libraries are a thorn in Elsevier’s side. Since they have existed, libraries at universities and research institutions have been much less susceptible to blackmail. Their staff can continue their research even without a contract with Elsevier. Instead of offering transparent open access contracts with fair conditions, however, Elsevier has adopted a different strategy in the fight against shadow libraries. These are to be fought as „cybercrime“, if necessary also with technological means. 
Within the framework of the „Scholarly Networks Security Initiative (SNSI)“, which was founded together with other large publishers, Elsevier is campaigning for libraries to be upgraded with security technology. In a SNSI webinar entitled „Cybersecurity Landscape – Protecting the Scholarly Infrastructure“*, hosted by two high-ranking Elsevier managers, one speaker recommended that publishers develop their own proxy or a proxy plug-in for libraries to access more (usage) data („develop or subsidize a low cost proxy or a plug-in to existing proxies“). With the help of an „analysis engine“, not only could the location of access be better narrowed down, but biometric data (e.g. typing speed) or conspicuous usage patterns (e.g. a pharmacy student suddenly interested in astrophysics) could also be recorded. Any doubts that this software could also be used—if not primarily—against shadow libraries were dispelled by the next speaker. An ex-FBI analyst and IT security consultant spoke about the security risks associated with the use of Sci-Hub. The other commentary that I saw was along similar lines: [Is the SNSI the new PRISM? bjoern.brembs.blog](http://bjoern.brembs.net/2020/10/is-the-snsi-the-new-prism/) [Academics band together with publishers because access to research is a cybercrime chorasimilarity](https://chorasimilarity.wordpress.com/2020/11/14/academics-band-together-with-publishers-because-access-to-research-is-a-cybercrime/) [WHOIS behind SNSI & GetFTR? Motley Marginalia](https://csulb.edu/~ggardner/2020/11/16/snsi-getftr/) Let’s face it: any friction beyond follow-link-to-see-PDF is more friction than a researcher deserves. I doubt we would design a scholarly communication system this way were we to start from scratch. But the system is built on centuries of evolving practice, organizations, and companies. It really would be a better world if we didn’t have to spend time and money on scholarly publisher paywalls. And I’m grateful for the Open Access efforts that are pivoting scholarly communications into an open-to-all paradigm. That doesn’t negate the need to provide better options for content that must exist behind a paywall. So what is this SNSI thing? The webinar where Cory presented was the first mention I’d seen of a new group called the Scholarly Networks Security Initiative (SNSI). SNSI is the latest in a series of publisher-driven initiatives to reduce the paywall’s friction for paying users or library patrons coming from licensing institutions. GetFTR (my thoughts) and Seamless Access (my thoughts). (Disclosure: I’m serving on two working groups for Seamless Access that are focused on making it possible for libraries to sensibly and sanely integrate the goals of Seamless Access into campus technology and licensing contracts.) Interestingly, while the Seamless Access initiative is driven by a desire to eliminate web proxy servers, this SNSI presentation upgrades a library’s web proxy server and makes it a more central tool between the patron and the content. One might argue that all access on campus should come through the proxy server to benefit from this kind of access control approach. It kinda makes one wonder about the coordination of these efforts. Still, SNSI is on my radar now, and I think it will be interesting to see what the next events and publications are from this group. As a Cog in the Election System: Reflections on My Role as a Precinct Election Official I may nod off several times in composing this post the day after election day. 
Hopefully, in reading it, you won’t. It is a story about one corner of democracy. It is a journal entry about how it felt to be a citizen doing what I could do to make other citizens’ voices be heard. It needed to be written down before the memories and emotions are erased by time and naps. Yesterday I was a precinct election officer (PEO—a poll worker) for Franklin County—home of Columbus, Ohio. It was my third election as a PEO. The first was last November, and the second was the election aborted by the onset of the coronavirus in March. (Not sure that second one counts.) It was my first as a Voting Location Manager (VLM), so I felt the stakes were high to get it right. Would there be protests at the polling location? Would I have to deal with people wearing candidate T-shirts and hats or not wearing masks? Would there be a crash of election observers, whether official (scrutinizing our every move) or unofficial (that I would have to remove)? It turns out the answer to all three questions was “no”—and it was a fantastic day of civic engagement by PEOs and voters. There were well-engineered processes and policies, happy and patient enthusiasm, and good fortune along the way. This story is going to turn out okay, but it could have been much worse. Because of the complexity of the election day voting process, last year Franklin County started allowing PEOs to do some early setup on Monday evenings. The early setup started at 6 o’clock. I was so anxious to get it right that the day before I took the printout of the polling room dimensions from my VLM packet, scanned it into OmniGraffle on my computer, and designed a to-scale diagram of what I thought the best layout would be. The real thing only vaguely looked like this, but it got us started. What I imagined our polling place would look like We could set up tables, unpack equipment, hang signs, and other tasks that don’t involve turning on machines or breaking open packets of ballots. One of the early setup tasks was updating the voters’ roster on the electronic poll pads. As happened around the country, there was a lot of early voting activity in Franklin County, so the update file must have been massive. The electronic poll pads couldn’t handle the update; they hung at step 8-of-9 for over an hour. I called the Board of Elections and got ahold of someone in the equipment warehouse. We tried some of the simple troubleshooting steps, and he gave me his cell phone number to call back if it wasn’t resolved. By 7:30, everything was done except for the poll pad updates, and the other PEOs were wandering around. I think it was 8 o’clock when I said everyone could go home while the two Voting Location Deputies and I tried to get the poll pads working. I called the equipment warehouse and we hung out on the phone for hours…retrying the updates based on the advice of the technicians called in to troubleshoot. I even “went rogue” towards the end. I searched the web for the messages on the screen to see if anyone else had seen the same problem with the poll pads. The electronic poll pad is an iPad with a single, dedicated application, so I even tried some iPad reset options to clear the device cache and perform a hard reboot. Nothing worked—still stuck at step 8-of-9. The election office people sent us home at 10 o’clock. Even on the way out the door, I tried a rogue option: I hooked a portable battery to one of the electronic polling pads to see if the update would complete overnight and be ready for us the next day. It didn’t, and it wasn’t. 
Text from Board of Elections Polling locations in Ohio open at 6:30 in the morning, and PEOs must report to their sites by 5:30. So I was up at 4:30 for a quick shower and packing up stuff for the day. Early in the setup process, the Board of Elections sent a text that the electronic poll pads were not going to be used and to break out the “BUMPer Packets” to determine a voter’s eligibility to vote. At some point, someone told me what “BUMPer” stood for. I can’t remember, but I can imagine it is Back-Up-something-something. “Never had to use that,” the trainers told me, but it is there in case something goes wrong. Well, it is the year 2020, so was something going to go wrong? Fortunately, the roster judges and one of the voting location deputies tore into the BUMPer Packet and got up to speed on how to use it. It is an old fashioned process: the voter states their name and address, the PEO compares that with the details on the paper ledger, and then asks the voter to sign beside their name. With an actual pen…old fashioned, right? The roster judges had the process down to a science. They kept the queue of verified voters full waiting to use the ballot marker machines. The roster judges were one of my highlights of the day. And boy did the voters come. By the time our polling location opened at 6:30 in the morning, they were wrapped around two sides of the building. We were moving them quickly through the process: three roster tables for checking in, eight ballot-marking machines, and one ballot counter. At our peak capacity, I think we were doing 80 to 90 voters an hour. As good as we were doing, the line never seemed to end. The Franklin County Board of Elections received a grant to cover the costs of two greeters outside that helped keep the line orderly. They did their job with a welcoming smile, as did our inside greeter that offered masks and a squirt of hand sanitizer. Still, the voters kept back-filling that line, and we didn’t see a break until 12:30. The PEOs serving as machine judges were excellent. This was the first time that many voters had seen the new ballot equipment that Franklin County put in place last year. I like this new equipment: the ballot marker prints your choices on a card that it spits out. You can see and verify your choices on the card before you slide it into a separate ballot counter. That is reassuring for me, and I think for most voters, too. But it is new, and it takes a few extra moments to explain. The machine judges got the voters comfortable with the new process. And some of the best parts of the day were when they announced to the room that a first-time voter had just put their card into the ballot counter. We would all pause and cheer. The third group of PEOs at our location were the paper table judges. They handle all of the exceptions. Someone wants to vote with a pre-printed paper ballot rather than using a machine? To the paper table! The roster shows that someone requested an absentee ballot? That voter needs to vote a “provisional” ballot that will be counted at the Board of Elections office if the absentee ballot isn’t received in the mail. The paper table judges explain that with kindness and grace. In the wrong location? The paper table judges would find the correct place. The two paper table PEOs clearly had experience helping voters with the nuances of election processes. Rounding out the team were two voting location deputies (VLD). 
By law, a polling location can’t have a VLD and a voting location manager (VLM) of the same political party. That is part of the checks and balances built into the system. One VLD had been a VLM at this location, and she had a wealth of history and wisdom about running a smooth polling location. For the other VLD, this was his first experience as a precinct election officer, and he jumped in with both feet to do the visible and not-so-visible things that made for a smooth operation. He reminded me a bit of myself a year ago. My first PEO position was as a voting location deputy last November. The pair handled a challenging curbside voter situation where it wasn’t entirely clear if one of the voters in the car was sick. I’d be so lucky to work with them again. The last two hours of the open polls yesterday were dreadfully dull. After the excitement of the morning, we may have averaged a voter every 10 minutes for those last two hours. Everyone was ready to pack it in early and go home. (Polls in Ohio close at 7:30, so counting the hour early for setup and the half an hour for tear down, this was going to be a 14 to 15 hour day.) Over the last hour, I gave the PEOs little tasks to do. At one point, I said they could collect the barcode scanners attached to the ballot markers. We weren’t using them anyway because the electronic poll pads were not functional. Then, in stages (as it became evident that there was no final rush of voters), they could pack up one or two machines and put away tables. Our second to last voter was someone in medical scrubs that just got off their shift. I scared our last voter because she walked up to the roster table at 7:29:30. Thirty seconds later, I called out that the polls are closed (as I think a VLM is required to do), and she looked at me startled. (She got to vote, of course; that’s the rule.) She was our last voter; 799 voters in our precinct that day. Then our team packed everything up as efficiently as they had worked all day. We had put away the equipment and signs, done our final counts, closed out the ballot counter, and sealed the ballot bin. At 8:00, we were done and waving goodbye to our host facility’s office manager. One of the VLD rode along with me to the board of elections to drop off the ballots, and she told me of a shortcut to get there. We were among the first reporting results for Franklin County. I was home again by a quarter of 10—exhausted but proud. I’m so happy that I had something to do yesterday. After weeks of concern and anxiety for how the election was going to turn out, it was a welcome bit of activity to ensure the election was held safely and that voters got to have their say. It was certainly more productive than continually reloading news and election results pages. The anxiety of being put in charge of a polling location was set at ease, too. I’m proud of our polling place team and that the voters in our charge seemed pleased and confident about the process. Maybe you will find inspiration here. If you voted, hopefully it felt good (whether or not the result turned out as you wanted). If you voted for the first time, congratulations and welcome to the club (be on the look-out for the next voting opportunity…likely in the spring). If being a poll worker sounded like fun, get in touch with your local board of elections (here is information about being a poll worker in Franklin County). Democracy is participatory. You’ve got to tune in and show up to make it happen. 
Certificate of Appreciation Running an All-Online Conference with Zoom [post removed] This is an article draft that was accidentally published. I hope to work on a final version soon. If you really want to see it, I saved a copy on the Internet Archive Wayback Machine. With Gratitude for the NISO Ann Marie Cunningham Service Award During the inaugural NISO Plus meeting at the end of February, I was surprised and proud to receive the Ann Marie Cunningham Service award. Todd Carpenter, NISO’s executive director, let me know by tweet as I was not able to attend the conference. Pictured in that tweet is my co-recipient, Christine Stohn, who serves NISO with me as the co-chair of the Information Delivery and Interchange Topic Committee. This got me thinking about what NISO has meant to me. As I think back on it, my activity in NISO spans at least four employers and many hours of standard working group meetings, committee meetings, presentations, and ballot reviews. NISO Ann Marie Cunningham Service Award I did not know Ms Cunningham, the award’s namesake. My first job started when she was the NFAIS executive director in the early 1990s, and I hadn’t been active in the profession yet. I read her brief biography on the NISO website: The Ann Marie Cunningham Service award was established in 1994 to honor NFAIS members who routinely went above and beyond the normal call of duty to serve the organization. It is named after Ann Marie Cunningham who, while working with abstracting and information services such as Biological Abstracts and the Institute for Scientific Information (both now part of NISO-member Clarivate Analytics), worked tirelessly as an dedicated NFAIS volunteer. She ultimately served as the NFAIS Executive Director from 1991 to 1994 when she died unexpectedly. NISO is pleased to continue to present this award to honor a NISO volunteer who has shown the same sort of commitment to serving our organization. As I searched the internet for her name, I came across the proceedings of the 1993 NFAIS meeting, in which Ms Cunningham wrote the introduction with Wendy Wicks. These first sentences from some of the paragraphs of that introduction are as true today as they were then: In an era of rapidly expanding network access, time and distance no longer separate people from information. Much has been said about the global promise of the Internet and the emerging concept of linking information highways, to some people, “free” ways. What many in the networking community, however, seem to take for granted is the availability of vital information flowing on these high-speed links. I wonder what Ms Cunningham of 1993 would think of the information landscape today? Hypertext linking has certainly taken off, if not taken over, the networked information landscape. How that interconnectedness has improved with the adaptation of print-oriented standards and the creation of new standards that match the native capabilities of the network. In just one corner of that space, we have the adoption of PDF as a faithful print replica and HTML as a common tool for displaying information. In another corner, MARC has morphed into a communication format that far exceeds its original purpose of encoding catalog cards; we have an explosion of purpose-built metadata schemas and always the challenge of finding common ground in tools like Dublin Core and Schema.org. We’ve seen several generations of tools and protocols for encoding, distributing, and combining data in new ways to reach users. 
And still we strive to make it better…to more easily deliver a paper to its reader—a dataset to its next experimenter—an idea to be built upon by the next generation. It is that communal effort to make a better common space for ideas that drives me forward. To work in a community at the intersection of libraries, publishers, and service providers is an exciting and fulfilling place to be. I’m grateful to my employers that have given me the ability to participate while bringing the benefits of that connectedness to my organizations. I was not able to be at NISO Plus to accept the award in person, but I was so happy to be handed it by Jason Griffey of NISO about a week later during the Code4lib conference in Pittsburgh. What made that even more special was to learn that Jason created it on his own 3D printer. Thank you to the new NFAIS-joined-with-NISO community for honoring me with this service award. Tethering a Ubiquiti Network to a Mobile Hotspot I saw it happen. The cable-chewing device The contractor in the neighbor’s back yard with the Ditch Witch trencher burying a cable. I was working outside at the patio table and just about to go into a Zoom meeting. Then the internet dropped out. Suddenly, and with a wrenching feeling in my gut, I remembered where the feed line was buried between the house and the cable company’s pedestal in the right-of-way between the properties. Yup, he had just cut it. To be fair, the utility locator service did not mark my cable’s location, and he was working for a different cable provider than the one we use. (There are three providers in our neighborhood.) It did mean, though, that our broadband internet would be out until my provider could come and run another line. It took an hour of moping about the situation to figure out a solution, then another couple of hours to put it in place: an iPhone tethered to a Raspberry Pi that acted as a network bridge to my home network’s UniFi Security Gateway 3P. Network diagram with tethered iPhone A few years ago I was tired of dealing with spotty consumer internet routers and upgraded the house to UniFi gear from Ubiquiti. Rob Pickering, a college comrade, had written about his experience with the gear and I was impressed. It wasn’t a cheap upgrade, but it was well worth it. (Especially now with four people in the household working and schooling from home during the COVID-19 outbreak.) The UniFi Security Gateway has three network ports, and I was using two: one for the uplink to my cable internet provider (WAN) and one for the local area network (LAN) in the house. The third port can be configured as another WAN uplink or as another LAN port. And you can tell the Security Gateway to use the second WAN as a failover for the first WAN (or as load balancing the first WAN). So that is straightforward enough, but how do I get the Personal Hotspot on the iPhone to the second WAN port? That is where the Raspberry Pi comes in. The Raspberry Pi is a small computer with USB, ethernet, HDMI, and audio ports. The version I had laying around is a Raspberry Pi 2—an older model, but plenty powerful enough to be the network bridge between the iPhone and the home network. The toughest part was bootstrapping the operating system packages onto the Pi with only the iPhone Personal Hotspot as the network. That is what I’m documenting here for future reference. Bootstrapping the Raspberry Pi The Raspberry Pi runs its own operating system called Raspbian (a Debian/Linux derivative) as well as more mainstream operating systems.
I chose to use the Ubuntu Server for Raspberry Pi instead of Raspbian because I’m more familiar with Ubuntu. I tethered my MacBook Pro to the iPhone to download the Ubuntu 18.04.4 LTS image and follow the instructions for copying that disk image to the Pi’s microSD card. That allows me to boot the Pi with Ubuntu and a basic set of operating system packages. The Challenge: Getting the required networking packages onto the Pi It would have been really nice to plug the iPhone into the Pi with a USB-Lightning cable and have it find the tethered network. That doesn’t work, though. Ubuntu needs at least the usbmuxd package in order to see the tethered iPhone as a network device. That package isn’t a part of the disk image download. And of course I can’t plug my Pi into the home network to download it (see first paragraph of this post). My only choice was to tether the Pi to the iPhone over WiFi with a USB network adapter. And that was a bit of Ubuntu voodoo. Fortunately, I found instructions on configuring Ubuntu to use a WPA-protected wireless network (like the one that the iPhone Personal Hotspot is providing). In brief:

    sudo -i
    cd /root
    wpa_passphrase my_ssid my_ssid_passphrase > wpa.conf
    screen -q
    wpa_supplicant -Dwext -iwlan0 -c/root/wpa.conf
    <control-a> c
    dhclient -r
    dhclient wlan0

Explanation of lines:
1. Use sudo to get a root shell.
2. Change directory to root’s home.
3. Use the wpa_passphrase command to create a wpa.conf file. Replace my_ssid with the wireless network name provided by the iPhone (your iPhone’s name) and my_ssid_passphrase with the wireless network passphrase (see the “Wi-Fi Password” field in Settings -> Personal Hotspot).
4. Start the screen program (quietly) so we can have multiple pseudo terminals.
5. Run the wpa_supplicant command to connect to the iPhone wifi hotspot. We run this in the foreground so we can see the status/error messages; this program must continue running to stay connected to the wifi network.
6. Use the screen hotkey to create a new pseudo terminal. This is control-a followed by the letter c.
7. Use dhclient to clear out any DHCP network parameters.
8. Use dhclient to get an IP address from the iPhone over the wireless network.

Now I was at the point where I could install Ubuntu packages. (I ran ping www.google.com to verify network connectivity.) To install the usbmuxd and network bridge packages (and their prerequisites):

    apt-get install usbmuxd bridge-utils

If your experience is like mine, you’ll get an error back:

    couldn't get lock /var/lib/dpkg/lock-frontend

The Ubuntu Pi machine is now on the network, and the automatic process to install security updates is running. That locks the Ubuntu package registry until it finishes. That took about 30 minutes for me. (I imagine this varies based on the capacity of your tethered network and the number of security updates that need to be downloaded.) I monitored the progress of the automated process with the htop command and tried the apt-get command when it finished. If you are following along, now would be a good time to skip ahead to Configuring the UniFi Security Gateway if you haven’t already set that up. Turning the Raspberry Pi into a Network Bridge With all of the software packages installed, I restarted the Pi to complete the update:

    shutdown -r now

While it was rebooting, I pulled out the USB wireless adapter from the Pi and plugged in the iPhone’s USB cable. The Pi now saw the iPhone as eth1, but the network did not start until I went to the iPhone to say that I “Trust” the computer that it is plugged into.
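If you want a quick sanity check that the phone is really visible before setting up the bridge, something like the following should do it (the eth1 name is what my Pi used and may differ on yours; lsusb comes from the usbutils package):

    lsusb | grep -i apple      # the iPhone should be listed on the USB bus
    dmesg | grep -i ipheth     # the ipheth driver handles iPhone USB tethering
    ip link show eth1          # and the phone shows up as a network interface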
Once the iPhone was trusted, I ran these commands on the Ubuntu Pi:

    dhclient eth1
    brctl addbr iphonetether
    brctl addif iphonetether eth0 eth1
    brctl stp iphonetether on
    ifconfig iphonetether up

Explanation of lines:
1. Get an IP address from the iPhone over the USB interface.
2. Add a network bridge (the iphonetether is an arbitrary string; some instructions simply use br0 for the zero-ith bridge).
3. Add the two ethernet interfaces to the network bridge.
4. Turn on the Spanning Tree Protocol (I don’t think this is actually necessary, but it does no harm).
5. Bring up the bridge interface.

The bridge is now live! Thanks to Amitkumar Pal for the hints about using the Pi as a network bridge. More details about the bridge networking software are on the Debian Wiki. Note! I'm using a hardwired keyboard/monitor to set up the Raspberry Pi. I've heard from someone that was using SSH to run these commands, and the SSH connection would break off at brctl addif iphonetether eth0 eth1
Configuring the UniFi Security Gateway I have a UniFi Cloud Key, so I could change the configuration of the UniFi network with a browser. (You’ll need to know the IP address of the Cloud Key; hopefully you have that somewhere.) I connected to my Cloud Key at https://192.168.1.58:8443/ and clicked through the self-signed certificate warning. First I set up a second Wide Area Network (WAN—your uplink to the internet) for the iPhone Personal Hotspot: Settings -> Internet -> WAN Networks. Select “Create a New Network”:

    Network Name: Backup WAN
    IPv4 Connection Type: Use DHCP
    IPv6 Connection Types: Use DHCPv6
    DNS Server: 1.1.1.1 and 1.0.0.1 (CloudFlare’s DNS servers)
    Load Balancing: Failover only

The last selection is key…I wanted the gateway to only use this WAN interface as a backup to the main broadband interface. If the broadband comes back up, I want to stop using the tethered iPhone! Second, assign the Backup WAN to the LAN2/WAN2 port on the Security Gateway (Devices -> Gateway -> Ports -> Configure interfaces):

    Port WAN2/LAN2
    Network: WAN2
    Speed/Duplex: Autonegotiate

Apply the changes to provision the Security Gateway. After about 45 seconds, the Security Gateway failed over from “WAN iface eth0” (my broadband connection) to “WAN iface eth2” (my tethered iPhone through the Pi bridge). These showed up as alerts in the UniFi interface. Performance and Results So I’m pretty happy with this setup. The family has been running simultaneous Zoom calls and web browsing on the home network, and the performance has been mostly normal. Web pages do take a little longer to load, but whatever Zoom is using to dynamically adjust its bandwidth usage is doing quite well. This is chewing through the mobile data quota pretty fast, so it isn’t something I want to do every day. Knowing that this is possible, though, is a big relief. As a bonus, the iPhone is staying charged via the 1 amp power coming through the Pi. Managing Remote Conference Presenters with Zoom Bringing remote presenters into a face-to-face conference is challenging and fraught with peril. In this post, I describe a scheme using Zoom that had in-person attendees forgetting that the presenter was remote! The Code4Lib conference was this week, and with the COVID-19 pandemic breaking through, many individuals and institutions made decisions to not travel to Pittsburgh for the meeting. We had an unprecedented nine presentations that were brought into the conference via Zoom.
I was chairing the livestream committee for the conference (as I have done for several years—skipping last year), so it made the most sense for me to arrange a scheme for remote presenters. With the help of the on-site A/V contractor, we were able to pull this off with minimal requirements for the remote presenter. List of Requirements 2 Zoom Pro accounts 1 PC/Mac with video output, as if you were connecting an external monitor (the “Receiving Zoom” computer) 1 PC/Mac (the “Coordinator Zoom” computer) 1 USB audio interface Hardwired network connection for the Receiving Zoom computer (recommended) The Pro-level Zoom accounts were required because we needed to run a group call for longer than 40 minutes (to include setup time). And two were needed: one for the Coordinator Zoom machine and one for the dedicated Receiving Zoom machine. It would have been possible to consolidate the two Zoom Pro accounts and the two PC/Mac machines into one, but we had back-to-back presenters at Code4Lib, and I wanted to be able to help one remote presenter get ready while another was presenting. In addition to this equipment, the A/V contractor was indispensable in making the connection work. We fed the remote presenter’s video and audio from the Receiving Zoom computer to the contractor’s A/V switch through HDMI, and the contractor put the video on the ballroom projectors and audio through the ballroom speakers. The contractor gave us a selective audio feed of the program audio minus the remote presenter’s audio (so they wouldn’t hear themselves come back through the Zoom meeting). This becomes a little clearer in the diagram below. Physical Connections and Setup This diagram shows the physical connections between machines. The Audio Mixer and Video Switch were provided and run by the A/V contractor. The Receiving Zoom machine was the one that is connected to the A/V contractor’s Video Switch via an HDMI cable coming off the computer’s external monitor connection. In the Receiving Zoom computer’s control panel, we set the external monitor to mirror what was on the main monitor. The audio and video from the computer (i.e., the Zoom call) went out the HDMI cable to the A/V contractor’s Video Switch. The A/V contractor took the audio from the Receiving Zoom computer through the Video Switch and added it to the Audio Mixer as an input channel. From there, the audio was sent out to the ballroom speakers the same way audio from the podium microphone was amplified to the audience. We asked the A/V contractor to create an audio mix that includes all of the audio sources except the Receiving Zoom computer (e.g., in-room microphones) and plugged that into the USB Audio interface. That way, the remote presenter could hear the sounds from the ballroom—ambient laughter, questions from the audience, etc.—in their Zoom call. (Note that it was important to remove the remote presenter’s own speaking voice from this audio mix; there was a significant, distracting delay between the time the presenter spoke and the audio was returned to them through the Zoom call.) We used a hardwired network connection to the internet, and I would recommend that—particularly with tech-heavy conferences that might overflow the venue wi-fi. (You don’t want your remote presenter’s Zoom to have to compete with what attendees are doing.) Be aware that the hardwired network connection will cost more from the venue, and may take some time to get functioning since this doesn’t seem to be something that hotels often do. 
In the Zoom meeting, we unmuted the microphone and selected the USB Audio interface as the microphone input. As the Zoom meeting was connected, we made the meeting window full-screen so the remote presenter’s face and/or presentation were at the maximum size on the ballroom projectors. Setting Up the Zoom Meetings The two Zoom accounts came from the Open Library Foundation. (Thank you!) As mentioned in the requirements section above, these were Pro-level accounts. The two accounts were olf_host2@openlibraryfoundation.org and olf_host3@openlibraryfoundation.org. The olf_host2 account was used for the Receiving Zoom computer, and the olf_host3 account was used for the Coordinator Zoom computer. The Zoom meeting edit page looked like this: This is for the “Code4Lib 2020 Remote Presenter A” meeting with the primary host as olf_host2@openlibraryfoundation.org. Note these settings: A recurring meeting that ran from 8:00am to 6:00pm each day of the conference. Enable join before host is checked in case the remote presenter got on the meeting before I did. Record the meeting automatically in the cloud to use as a backup in case something goes wrong. Alternative Hosts is olf_host3@openlibraryfoundation.org The “Code4Lib 2020 Remote Presenter B” meeting was exactly the same except the primary host was olf_host3, and olf_host2 was added as an alternative host. The meetings were set up with each other as the alternative host so that the Coordinator Zoom computer could start the meeting, seamlessly hand it off to the Receiving Zoom computer, then disconnect. Preparing the Remote Presenter Remote presenters were given this information: Code4Lib will be using Zoom for remote presenters. In addition to the software, having the proper audio setup is vital for a successful presentation. Microphone: The best option is a headset or earbuds so a microphone is close to your mouth. Built-in laptop microphones are okay, but using them will make it harder for the audience to hear you. Speaker: A headset or earbuds are required. Do not use your computer’s built-in speakers. The echo cancellation software is designed for small rooms and cannot handle the delay caused by large ballrooms. You can test your setup with a test Zoom call. Be sure your microphone and speakers are set correctly in Zoom. Also, try sharing your screen on the test call so you understand how to start and stop screen sharing. The audience will see everything on your screen, so quit/disable/turn-off notifications that come from chat programs, email clients, and similar tools. Plan to connect to the Zoom meeting 30 minutes before your talk to work out any connection or setup issues. At the 30-minute mark before the remote presentation, I went to the ballroom lobby and connected to the designated Zoom meeting for the remote presenter using the Coordinator Zoom computer. I used this checklist with each presenter: Check presenter’s microphone level and sound quality (make sure headset/earbud microphone is being used!) Check presenter’s speakers and ensure there is no echo Test screen-sharing (start and stop) with presenter Remind presenter to turn off notifications from chat programs, email clients, etc. Remind the presenter that they need to keep track of their own time; there is no way for us to give them cues about timing other than interrupting them when their time is up The critical item was making sure the audio worked (that their computer was set to use the headset/earbud microphone and audio output). 
The result of all this preparation was excellent sound quality for the audience.

When the remote presenter was set on the Zoom meeting, I returned to the A/V table and asked a livestream helper to connect the Receiving Zoom computer to the remote presenter's Zoom meeting. At this point, the remote presenter could hear the ballroom audio of the speaker before them coming through the Receiving Zoom computer. I would then lock the Zoom meeting to prevent others from joining and interrupting the presenter (from the Zoom Participants panel, select More, then Lock Meeting). I hung out on the remote presenter's meeting on the Coordinator Zoom computer in case they had any last-minute questions. As the speaker in the ballroom was finishing up, I wished the remote presenter well and disconnected the Coordinator Zoom computer from the meeting. (I always selected Leave Meeting rather than End Meeting for All so that the Zoom meeting continued with the remote presenter and the Receiving Zoom computer.) As the remote presenter was being introduced—and the presenter would know because they could hear it in their Zoom meeting—the A/V contractor switched the video source for the ballroom projectors to the Receiving Zoom computer and unmuted the Receiving Zoom computer's channel on the Audio Mixer. At that point, the remote speaker was off and running!

Last Thoughts

This worked really well. Surprisingly well. So well that a few people commented that they were taken aback when they realized there was no one standing at the podium during the presentation.

I'm glad I had set up the two Zoom meetings. We had two cases where remote presenters were back-to-back. I was able to get the first remote presenter set up and ready on one Zoom meeting while preparing the second remote presenter on the other Zoom meeting. The most stressful part came when we disconnected the first presenter's Zoom meeting and quickly connected to the second presenter's. This was slightly awkward for the second remote presenter because they didn't hear their full introduction as it happened and had to jump right into their presentation. This could be solved by setting up a second Receiving Zoom computer, but that added complexity seemed too much for the benefit gained.

I would definitely recommend making this setup a part of the typical A/V preparations for future Code4Lib conferences. We don't know when an individual's circumstances (much less a worldwide pandemic) might cause a last-minute request for remote presentation capability, and the overhead of the setup is pretty minimal.

What is known about GetFTR at the end of 2019

In early December 2019, a group of publishers announced Get-Full-Text-Research, or GetFTR for short. There was a heck of a response on social media, and the response was—on the whole—not positive from my librarian-dominated corner of Twitter. For my early take on GetFTR, see my December 3rd blog post "Publishers going-it-alone (for now?) with GetFTR." As that post title suggests, I took the five founding GetFTR publishers to task on their take-it-or-leave-it approach. I think that is still a problem. To get you caught up, here is a list of other commentary.
- Roger Schonfeld's December 3rd "Publishers Announce a Major New Service to Plug Leakage" piece in The Scholarly Kitchen
- A tweet from Herbert Van de Sompel, the lead author of the OpenURL spec, on solving the appropriate copy problem
- The December 5th post "Get To Fulltext Ourselves, Not GetFTR." on the Open Access Button blog
- A Twitter thread on December 7th between @cshillum and @lisalibrarian on the positioning of GetFTR in relation to link resolvers, and an unanswered question about how GetFTR aligns with library interests
- A Twitter thread started by @TAC_NISO on December 9th looking for more information, with a link to an STM Association presentation added by @aarontay
- A tree of tweets starting from @mrgunn's "[I don't trust publishers to decide] is the crux of the whole thing." In particular, threads off that tweet include Jason Griffey of NISO saying he knew nothing about GetFTR and Bernhard Mittermaier's point about hidden motivations behind GetFTR
- A Twitter thread started by @aarontay on December 7th saying "GetFTR is bad for researchers/readers and librarians. It only benefits publishers, change my mind."
- Lisa Janicke Hinchliffe's December 10th "Why are Librarians Concerned about GetFTR?" in The Scholarly Kitchen; take note of the follow-up discussion in the comments
- A Twitter thread between @alison_mudditt and @lisalibrarian clarifying that PLOS is not on the Advisory Board, with some replies from @TAC_NISO as well
- Ian Mulvany's December 11th "thoughts on GetFTR" on ScholCommsProd
- GetFTR's December 11th "Updating the community" post on their website
- The Spanish Federation of Associations of Archivists, Librarians, Archaeologists, Museologists and Documentalists (ANABAD)'s December 12th "GetFTR: new publishers service to speed up access to research articles" (original in Spanish, Google Translate to English)
- A December 20th news entry from eContent Pro titled "What GetFTR Means for Journal Article Access"; my only quarrel is with this sentence: "Thus, GetFTR is a service where Academic articles are found and provided to you at absolutely no cost." No—if you are in academia, the cost is borne by your library even if you don't see it. But this seems to be a third-party service that isn't directly related to publishers or libraries, so perhaps they can be forgiven for missing that nuance.
- Wiley's Chemistry Views news post on December 26th, titled simply "Get Full Text Research (GetFTR)", is perhaps only notable for the sentence "Growing leakage has steadily eroded the ability of the publishers to monetize the value they create."

If you are looking for a short list of what to look at, I recommend these posts.

GetFTR's Community Update

On December 11—after the two posts I list below—an "Updating the Community" web page was posted to the GetFTR website. From a public relations perspective, it was…interesting.

"We are committed to being open and transparent"

This section goes on to say, "If the community feels we need to add librarians to our advisory group we will certainly do so and we will explore ways to ensure we engage with as many of our librarian stakeholders as possible." If the GetFTR leadership didn't get the indication between December 3 and December 12 that librarians feel strongly about being at the table, then I don't know what will convince them. And it isn't about being on the advisory group; it is about being seen and appreciated as important stakeholders in the research discovery process. I'm not sure who the "community" is in this section, but it is clear that librarians are—at best—an afterthought.
That is not the kind of "open and transparent" that is welcoming. Later on, in the Questions about library link resolvers section, is this sentence:

"We have, or are planning to, consult with existing library advisory boards that participating publishers have, as this enables us to gather views from a significant number of librarians from all over the globe, at a range of different institutions."

As I said in my previous post, I don't know why GetFTR is not engaging with existing cross-community (publisher/technology-supplier/library) organizations to have this discussion. It feels intentional, which colors the perception of what the publishers are trying to accomplish. To be honest, I don't think the publishers are using GetFTR to drive a wedge between library technology service providers (who are needed to make GetFTR a reality for libraries) and libraries themselves. But I can see how that interpretation could be made.

"Understandably, we have been asked about privacy."

I punted on privacy in my previous post, so let's talk about it here. It remains to be seen what is included in the GetFTR API request between the browser and the publisher site. Sure, it needs to include the DOI and a token that identifies the patron's institution. We can inspect that API request to ensure nothing else is included. But the fact that the design of GetFTR has the browser making the call to the publisher site means that the publisher site knows the IP address of the patron's browser, and the IP address can be considered personally identifiable information. This issue could be fixed by having the link resolver or the discovery layer software make the API request, and according to the Questions about library link resolvers section of the community update, this may be under consideration. So, yes, an auditable privacy policy and implementation is key for GetFTR.

"GetFTR is fully committed to supporting third-party aggregators"

This is good to hear. I would love to see more information published about this, including how discipline-specific repositories and institutional repositories can have their holdings represented in GetFTR responses.

My Take-a-ways

In the second-to-last paragraph: "Researchers should have easy, seamless pathways to research, on whatever platform they are using, wherever they are." That is a statement that I think every library could sign onto. This Updating the Community is a good start, but the project has dug a deep hole of trust and it hasn't reached level ground yet.

Lisa Janicke Hinchliffe's "Why are Librarians Concerned about GetFTR?"

Posted on December 10th in The Scholarly Kitchen, Lisa's piece outlines a series of concerns from a librarian perspective. I agree with some of these; others are not an issue in my opinion.

Librarian Concern: The Connection to Seamless Access

Many librarians have expressed a concern about how patron information can leak to the publisher through ill-considered settings at an institution's identity provider. Seamless Access can ease access control because it leverages a campus's single sign-on solution—something that a library patron is likely to be familiar with. If the institution's identity provider is overly permissive in the attributes about a patron that get transmitted to the publisher, then there is a serious risk of tying a user's research activity to their identity, and the bad things that come from that (patrons self-censoring their research paths, commoditization of patron activity, etc.).
I'm serving on a Seamless Access task force that is addressing this issue, and I think there are technical, policy, and education solutions to this concern. In particular, I think some sort of intermediate display of the attributes being transmitted to the publisher is most appropriate.

Librarian Concern: The Limited User Base Enabled

As Lisa points out, the population of institutions that can take advantage of Seamless Access, a prerequisite for GetFTR, is very small and weighted heavily towards well-resourced institutions. To the extent that projects like Seamless Access (spurred on by a desire for GetFTR-like functionality) help with the adoption of SAML-based infrastructure like Shibboleth, the whole academic community benefits from a shared authentication/identity layer that can be assumed to exist.

Librarian Concern: The Insertion of New Stumbling Blocks

Of the issues Lisa mentions here, I'm not concerned about users being redirected to their campus single sign-on system in multiple browsers on multiple machines. This is something we should be training users about: there is a single website to put your username/password into for whatever you are accessing at the institution. That a user might already be logged into the institution's single sign-on system in the course of doing other school work, and so never see a logon screen, is an attractive benefit of this system.

That said, it would be useful for an API call from a library's discovery layer to a publisher's GetFTR endpoint to be able to say, "This is my user. Trust me when I say that they are from this institution." If that were possible, then the Seamless Access Where-Are-You-From service could be bypassed for the GetFTR purpose of determining whether a user's institution has access to an article on the publisher's site. It would sure be nice if librarians were involved in the specification of the underlying protocols early on so these use cases could be offered.

Update

Lisa reached out on Twitter to say (in part): "Issue is GetFTR doesn't redirect and SA doesnt when you are IPauthenticated. Hence user ends up w mishmash of experience." I went back to read her Scholarly Kitchen post and realized I did not fully understand her point. If GetFTR is relying on a Seamless Access token to know which institution a user is coming from, then that token must get into the user's browser. The details we have seen about GetFTR don't address how that Seamless Access institution token is put in the user's browser if the user has not been to the Seamless Access select-your-institution portal. One such case is when the user is coming from an IP-address-authenticated computer on a campus network. Do the GetFTR indicators appear even when the Seamless Access institution token is not stored in the browser? If, at the publisher site, the GetFTR response also uses the institution's IP address table to determine entitlements, what does a user see when they have neither the Seamless Access institution token nor an institution IP address? And, to Lisa's point, how does one explain this disparity to users? Is the situation better if the GetFTR determination is made in the link resolver rather than in the user's browser?

Librarian Concern: Exclusion from Advisory Committee

See the previous section. That librarians are not at the table offering use cases and technical advice means that the developers are likely closing off options that meet library needs. Addressing those needs would ease the acceptance of the GetFTR project as mutually beneficial.
So an emphatic "AGREE!" with Lisa on her points in this section. Publishers—what were you thinking?

Librarian Concern: GetFTR Replacing the Library Link Resolver

Libraries and library technology companies are making significant investments in tools that ease the path from discovery to delivery. Would the library's link resolver benefit from a real-time API call to a publisher's service that determines the direct URL for a specific DOI? Oh, yes—that would be mighty beneficial. The library could put that link right at the top of a series of options that include a link to a version of the article in a Green Open Access repository, redirection to a content aggregator, one-click access to an interlibrary loan form, or even an option where the library purchases a copy of the article on behalf of the patron. (More likely, the link resolver would take the patron right to the article URL supplied by GetFTR, but the library link resolver needs to be in the loop to be able to offer the other options.)

My Take-a-ways

The patron is affiliated with the institution, and the institution (through the library) is subscribing to services from the publisher. The institution's library knows best what options are available to the patron (see the section above). Want to know why librarians are concerned? Because the publishers are inserting themselves as the arbiter of access to content, whether it is in the patron's best interest or not.

It is also useful to reinforce Lisa's closing paragraph:

"Whether GetFTR will act to remediate these concerns remains to be seen. In some cases, I would expect that they will. In others, they may not. Publishers' interests are not always aligned with library interests and they may accept a fraying relationship with the library community as the price to pay to pursue their strategic goals."

Ian Mulvany's "thoughts on GetFTR"

Ian's entire post from December 11th on ScholCommsProd is worth reading. I think it is an insightful look at the technology and its implications. Here are some specific comments:

Clarifying the relation between SeamlessAccess and GetFTR

There are a couple of things that I disagree with:

"OK, so what is the difference, for the user, between seamlessaccess and GetFTR? I think that the difference is the following - with seamless access you the user have to log in to the publisher site. With GetFTR if you are providing pages that contain DOIs (like on a discovery service) to your researchers, you can give them links they can click on that have been setup to get those users direct access to the content. That means as a researcher, so long as the discovery service has you as an authenticated user, you don't need to even think about logins, or publisher access credentials."

To the best of my understanding, this is incorrect. With SeamlessAccess, the user is not "logging into the publisher site." If the publisher site doesn't know who a user is, the user is bounced back to their institution's single sign-on service to authenticate. If the publisher site doesn't know where a user is from, it invokes the SeamlessAccess Where-Are-You-From service to learn which institution's single sign-on service is appropriate for the user. If a user follows a GetFTR-supplied link to a publisher site but doesn't have the necessary authentication token from the institution's single sign-on service, then they will be bounced back for the username/password and redirected to the publisher's site.
GetFTR signaling that an institution is entitled to view an article does not mean the user can get it without proving that they are a member of that institution.

What does this mean for Green Open Access

A key point that Ian raises is this:

"One example of how this could suck, lets imagine that there is a very usable green OA version of an article, but the publisher wants to push me to using some 'e-reader limited functionality version' that requires an account registration, or god forbid a browser exertion, or desktop app. If the publisher shows only this limited utility version, and not the green version, well that sucks."

Oh, yeah…that does suck, and it is because the library—not the publisher of record—is better positioned to know what is best for a particular user.

Will GetFTR be adopted?

Ian asks, "Will google scholar implement this, will other discovery services do so?" I do wonder if GetFTR is big enough to attract the attention of Google Scholar and Microsoft Research. My gut tells me "no": I don't think Google and Microsoft are going to add GetFTR buttons to their search results screens unless they are paid a lot. As for Google Scholar, it is more likely that Google would build something like GetFTR to get the analytics rather than rely on a publisher's version. I'm even more doubtful that the companies pushing GetFTR can convince discovery layer makers to embed GetFTR into their software. Since the two widely adopted discovery layers (in North America, at least) are also aggregators of journal content, I don't see the discovery-layer/aggregator companies devaluing their product by actively pushing users off their site.

My Take-a-ways

It is also useful to reinforce Ian's closing paragraph:

"I have two other recommendations for the GetFTR team. Both relate to building trust. First up, don't list orgs as being on an advisory board, when they are not. Secondly it would be great to learn about the team behind the creation of the Service. At the moment its all very anonymous."

Where Do We Stand?

Wow, I didn't set out to write 2,500 words on this topic. At the start, I was just taking some time to review everything that happened since this was announced at the start of December and see what sense I could make of it. It turned into a literature review of sorts. While GetFTR has some powerful backers, it also has some pretty big blockers:

- Can GetFTR help spur adoption of Seamless Access enough to convince big and small institutions to invest in identity provider infrastructure and single sign-on systems?
- Will GetFTR grab the interest of Google, Google Scholar, and Microsoft Research (where admittedly a lot of article discovery is already happening)?
- Will developers of discovery layers and link resolvers prioritize GetFTR implementation in their services?
- Will libraries find enough value in GetFTR to enable it in their discovery layers and link resolvers? Would libraries argue against GetFTR in learning management systems, faculty profile systems, and other campus systems if their own services cannot be included in GetFTR displays?

I don't know, but I think it is up to the principals behind GetFTR to make more inclusive decisions. The next step is theirs.

Publishers going-it-alone (for now?) with GetFTR

In early December 2019, a group of publishers announced Get-Full-Text-Research, or GetFTR for short. I read about this first in Roger Schonfeld's "Publishers Announce a Major New Service to Plug Leakage" piece in The Scholarly Kitchen, via Jeff Pooley's Twitter thread and blog post.
Details about how this works are thin, so I'm leaning heavily on Roger's description. I'm not as negative about this as Jeff, and I'm probably a little more opinionated than Roger. This is an interesting move by publishers, and—as the title of this post suggests—I am critical of the publishers' "go-it-alone" approach.

First, some disclosure might be in order. My background has me thinking of this in the context of how it impacts libraries and library consortia. For the past four years, I've been co-chair of the NISO Information Discovery and Interchange topic committee (and its predecessor, the "Discovery to Delivery" topic committee), so this is squarely within what I've been thinking about in the broader library-publisher professional space. I also traced the early development of RA21 and more recently am volunteering on the SeamlessAccess Entity Category and Attribute Bundles Working Group; that'll become more important a little further down this post. I was nodding along with Roger's narrative until I stopped short here:

"The five major publishing houses that are the driving forces behind GetFTR are not pursuing this initiative through one of the major industry collaborative bodies. All five are leading members of the STM Association, NISO, ORCID, Crossref, and CHORUS, to name several major industry groups. But rather than working through one of these existing groups, the houses plan instead to launch a new legal entity. While [Vice President of Product Strategy & Partnerships for Wiley Todd] Toler and [Senior Director, Technology Strategy & Partnerships for the American Chemical Society Ralph] Youngen were too politic to go deeply into the details of why this might be, it is clear that the leadership of the large houses have felt a major sense of mismatch between their business priorities on the one hand and the capabilities of these existing industry bodies. At recent industry events, publishing house CEOs have voiced extensive concerns about the lack of cooperation-driven innovation in the sector. For example, Judy Verses from Wiley spoke to this issue in spring 2018, and several executives did so at Frankfurt this fall. In both cases, long standing members of the scholarly publishing sector questioned if these executives perhaps did not realize the extensive collaborations driven through Crossref and ORCID, among others. It is now clear to me that the issue is not a lack of knowledge but rather a concern at the executive level about the perceived inability of existing collaborative vehicles to enable the new strategic directions that publishers feel they must pursue."

This is the publishers going it alone. To hear Roger describe it, they are going to create this web service that allows publishers to determine the appropriate copy for a patron and do it without input from the libraries. Librarians will just be expected to put this web service widget into their discovery services to get "colored buttons indicating that the link will take [patrons] to the version of record, an alternative pathway, or (presumably in rare cases) no access at all." (Let's set aside for the moment the privacy implications of having a fourth-party web service recording all of the individual articles that come up in a patron's search results.)
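To make that contrast concrete, here is a rough sketch of the architecture I would rather see: the library's link resolver or discovery layer makes the entitlement lookup server-side, so the publisher never sees the patron's IP address or search activity, and the library assembles its own ordered list of delivery options. GetFTR has not published an API specification, so the endpoint, parameters, response fields, and helper functions below are all hypothetical placeholders, not GetFTR's actual interface.

```python
# Hypothetical sketch only: the endpoint, parameters, and response shape are invented
# for illustration. The point is architectural: the entitlement lookup happens from the
# library's server, not from a publisher-run script in the patron's browser.
import requests

LOOKUP_ENDPOINT = "https://example.org/entitlement-lookup"  # hypothetical publisher endpoint
INSTITUTION_TOKEN = "example-institution-token"             # identifies the institution, not the patron


def lookup_green_oa(doi):
    """Stub: query an open-access index for a Green OA copy of the article."""
    return []


def lookup_aggregator(doi):
    """Stub: check licensed aggregator holdings in the library's knowledge base."""
    return []


def ill_form_url(doi):
    """Stub: a pre-filled interlibrary loan request form."""
    return f"https://library.example.edu/ill?doi={doi}"


def delivery_options(doi):
    """Return an ordered, library-controlled list of delivery options for a DOI."""
    options = []

    # Server-side entitlement check: the publisher sees the library's server, not the patron.
    resp = requests.get(
        LOOKUP_ENDPOINT,
        params={"doi": doi, "institution": INSTITUTION_TOKEN},
        timeout=3,
    )
    if resp.ok and resp.json().get("entitled"):
        options.append({"label": "Full text from publisher", "url": resp.json().get("url")})

    # The library layers in the alternatives it knows about: options a publisher-run
    # widget has no reason to surface.
    options.extend(lookup_green_oa(doi))
    options.extend(lookup_aggregator(doi))
    options.append({"label": "Request via interlibrary loan", "url": ill_form_url(doi)})
    return options
```

The key design choice is that the library service, not a publisher-controlled button, decides what the patron sees and in what order.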
Librarians will not get to decide the "alternative pathway" that is appropriate for the patron: "Some publishers might choose to provide access to a preprint or a read-only version, perhaps in some cases on some kind of metered basis." (Roger goes on to say that he "expect[s] publishers will typically enable some alternative version for their content, in which case the vast majority of scholarly content will be freely available through publishers even if it is not open access in terms of licensing." I'm not so confident.)

No, thank you. If publishers want to engage in technical work to enable libraries and others to build web services that determine the direct link to an article based on a DOI, then great. Libraries can build a tool that consumes that information as well as takes into account information about preprint services, open access versions, interlibrary loan, and other methods of access. But to ask libraries to accept this publisher-controlled access button in their discovery layers, their learning management systems, their scholarly profile services, and their other tools? That sounds destined for disappointment.

I am only somewhat encouraged by the fact that RA21 started out as a small, isolated collaboration of publishers before they brought in NISO and invited libraries to join the discussion. Did it mean that it slowed down deployment of RA21? Undoubtedly yes. Did persnickety librarians demand transparent discussions and decisions about privacy-related concerns, like what attributes the publisher would get about the patron in the Shibboleth-powered backchannel? Yes, but that was because the patrons weren't there to advocate for themselves. Will it likely mean wider adoption? I'd like to think so. Have publishers learned that forcing these kinds of technologies onto users without consultation is a bad idea? At the moment it would appear not.

Some of what publishers are seeking with GetFTR can be implemented with straight-up OpenURL or—at the very least—limited-scope additions to OpenURL (the Z39.88 open standard!); a sketch of such a link appears at the end of this post. That they didn't start with OpenURL, a robust existing standard, is both concerning and annoying. I'll be watching and listening for points of engagement, so I remain hopeful.

A few words about Jeff Pooley's five-step "laughably creaky and friction-filled effort" that is SeamlessAccess. Many of the steps Jeff describes are invisible and well-established technical protocols. What Jeff fails to take into account is the very visible and friction-filled effect of patrons accessing content beyond the boundaries of campus-recognized internet network addresses. Those patrons get stopped at step two with a "pay $35 please" message. I'm all for removing that barrier entirely by making all published content open access. It is folly to think, though, that researchers and readers can enforce an open access business model on all publishers, so solutions like SeamlessAccess will have a place. (Which is to say nothing of the benefit of inter-institutional resource collaboration opened up by a more widely deployed Shibboleth infrastructure powered by SeamlessAccess.)
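As promised above, here is a small illustration of the OpenURL point: a plain Z39.88-2004 key/encoded-value OpenURL that hands a DOI from a discovery layer to the library's own link resolver, which then decides the appropriate copy for the patron. The resolver base URL and source identifier are placeholders, not any particular institution's.

```python
# A plain Z39.88-2004 (KEV) OpenURL built from a DOI. No new publisher-run service
# is required; the resolver base URL and rfr_id below are placeholders.
from urllib.parse import urlencode

RESOLVER_BASE = "https://resolver.example.edu/openurl"  # placeholder link resolver


def openurl_for_doi(doi, referrer="info:sid/discovery.example.edu"):
    """Build a key/encoded-value OpenURL that passes a DOI to the link resolver."""
    params = {
        "url_ver": "Z39.88-2004",                     # OpenURL 1.0
        "url_ctx_fmt": "info:ofi/fmt:kev:mtx:ctx",    # KEV context object format
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.genre": "article",
        "rft_id": f"info:doi/{doi}",                  # the referent, identified by DOI
        "rfr_id": referrer,                           # which service generated the link
    }
    return f"{RESOLVER_BASE}?{urlencode(params)}"


# Example with an illustrative DOI
print(openurl_for_doi("10.1000/xyz123"))
```

The link resolver receiving this request is where the library can layer in Green Open Access copies, aggregator holdings, interlibrary loan, and any entitlement signal a publisher cares to provide.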
feeds-feedburner-com-1059 ---- None feeds-feedburner-com-1102 ---- None feeds-feedburner-com-1128 ---- None feeds-feedburner-com-1353 ---- HubLog by Alf Eatonsearch Continuous deployment of a web service on Cloud Run March 14, 2021 Creating and deploying a web service using Cloud Run's continuous deployment, GitHub integration and Cloud Build's buildpacks cloud runnode.jsgithubBuilding amd64 Docker images with arm64 (M1) macOS March 7, 2021 Using docker buildx bake to build Docker images for different system architectures docker"git scraping" data from the Office for National Statistics API March 7, 2021 Fetching and publishing regularly-updated data as a web service with GitHub Actions and Datasette github actiondatasettecsvsqliteDocker on a Raspberry Pi 400 December 14, 2020 Using armv8 Docker images on a Raspberry Pi 400 DockerRaspberry PiARMAn Express app as a web service in Cloud Functions July 6, 2020 Deploying a simple web service to Cloud Functions node.jscloud functionsAn Express app as a web service in Cloud Run July 5, 2020 Deploying a simple web service to Cloud Run node.jscloud runA single-author web app hosted on Cloud Run June 13, 2020 Developing, building and deploying a single-author web app blogjavascriptexpressnode.jscloud rungithubSending a raw HTTPS request May 1, 2020 Storing, editing and sending a multipart/form-data request over HTTPS Converting PDF to PNG or JPEG September 13, 2019 Tools and services for converting a page of a PDF to an image How to build a user interface April 4, 2019 The 5 steps of designing a software product May 20, 2018 Designing a user interface for moving data from one state to another OpenID Connect March 13, 2018 A summary of the OpenID Connect protocol and its usage for authentication in an SPA Serving a web application over HTTPS February 16, 2018 Using nginx and LetsEncrypt to serve a web application over HTTPS JANICE: a prototype re-implementation of JANE, using the Semantic Scholar Open Research Corpus January 19, 2018 Formatting a LaCie external drive for Time Machine January 18, 2018 Indexing Semantic Scholar's Open Research Corpus in Elasticsearch January 4, 2018 Building an Elasticsearch index of Semantic Scholar's Open Research Corpus dataset A single-user blog October 22, 2017 Building a simple blog using React and Firebase Recovering from a failed macOS High Sierra upgrade October 18, 2017 OAuth in a Chrome extension October 16, 2017 ES6 export/import August 14, 2017 Exporting/importing/re-exporting ES6 modules Styling and theming React Components August 10, 2017 Using CSS in JS to style and theme React components async is more than await April 20, 2017 Symfony Forms March 23, 2017 Symfony is best at allowing users to apply mutations to resources via HTML forms Polymer + Firebase Makefile October 17, 2016 A Makefile for deploying Polymer apps to Firebase Distributed Consensus April 1, 2016 What Aaron understood September 11, 2015 What colour is a tree? 
September 11, 2015 Collections of items in time and space Fetching Web Resources September 10, 2015 Using Resource and Collection interfaces to retrieve data from the web Quantifying journals September 10, 2015 Metrics for scoring and ranking journals It's a shame about Google Plus September 10, 2015 URLs for people Distributed Asynchronous Composable Resources September 10, 2015 Filling out data tables using promises and computed properties Access-Control-Allow-Origin: * April 21, 2015 Add the Access-Control-Allow-Origin: * header to the data you publish No More Documents April 19, 2015 Client-side XML validation in JavaScript April 18, 2015 Using an Emscripten port of xmllint to validate XML against a DTD in a web browser. Organising, building and deploying static web sites/applications March 1, 2015 Using Jekyll (remote or local) or Yeoman (local) to build, serve and deploy a GitHub Pages site or application Visualising political donations February 15, 2015 Using Tableau Public to visualise donations to UK political parties Force-directed tag clouds February 15, 2015 Using artists as the dark matter in a graph of tags, to visualise the thematic content of radio shows Exploring a personal Twitter network January 25, 2015 Using Gephi to create a network graph showing the most highly-connected Twitter friends of those I follow. Searching for mergeable tables January 12, 2015 Finding tabular data sets that can be merged, using URLs for data types UK Prospective Parliamentary Candidates January 4, 2015 The people who will be standing as candidates in the 2015 General Election Creating a map of Grade I listed buildings January 4, 2015 Filtering an Environment Agency Shapefile to create a custom map UK parliamentary constituencies January 3, 2015 Boundaries, names and codes of the UK's parliamentary constituencies The trouble with scientific software December 31, 2014 Scientific software is often opaque, and difficult to obtain and cite Archiving and displaying tweets with dat September 18, 2014 Don't just publish JSON-LD June 19, 2014 Publish plain, simple JSON, with a linked context document for consumers that want it vege-table: the data table that grows, with leaves May 16, 2014 The easiest, most resourceful way to harvest, explore and publish a collection of data. 
Line-oriented data formats February 26, 2014 Iterating Arrays February 20, 2014 JavaScript methods for iterating arrays Publishing research on the web January 27, 2014 Two examples of publishing code, data and a human-readable report jQuery Microdata January 13, 2014 A jQuery plugin for working with HTML Microdata Creating printable cards with HTML and CSS December 22, 2013 Use HTML and CSS to fill a printed card with content Post-humanist technology December 19, 2013 If you can't tell why a technology would be useful to you, it's for the robots Collecting article metrics with OpenRefine December 16, 2013 Using OpenRefine to collect article metrics data JSON templates December 16, 2013 Using JSON templates to describe objects and query by example JSON-LD December 12, 2013 Using context documents to map local property names to shared URLs CSV on the Web, with PHP December 12, 2013 Fetching, parsing and publishing CSV Publishing, Versioning and Persistence December 12, 2013 Some rules for publishing a resource online SELECT * FROM WEB December 11, 2013 OK Guha Describing Objects December 11, 2013 Using names and classes as shorthand for object properties Switching off HubMed's RSS and Atom feeds August 14, 2013 HubMed's RSS and Atom feeds are discontinued Web Components July 7, 2013 Using Web Components to define custom HTML elements Internet Surveillance June 10, 2013 Methods of gathering information from the internet. Citing Articles Within Articles March 2, 2013 HTML markup for inline citations in scholarly articles Open, Social, Academic Bookmarking: Save to App.net February 5, 2013 Using App.net's File API to create an open, personal reading library. HTML metadata for journal articles November 28, 2012 A summary of ontologies for describing journal articles Ten years of HubMed November 28, 2012 An overview of the ten years since HubMed was created Publishing a podcast using Google Drive (in theory) September 20, 2012 Generate a podcast feed for audio files stored on Google Drive, using Apps Script and Yahoo Pipes Publishing Articles Using Gists September 5, 2012 Introducing macrodocs.org, a client-side renderer for articles stored in Gists Music Seeds and More Like These August 17, 2012 Sources for music recommendation; querying by example Querying Data Sets using Google BigQuery August 17, 2012 Using Google Fusion Tables to provide an API to data files August 15, 2012 Resourceful Web Interfaces August 2, 2012 Classlessness June 26, 2012 A Resourceful Alternative to OAI-PMH June 4, 2012 Adding Files to Google Drive using PHP May 1, 2012 Working with the Harvard Library Bibliographic Dataset April 27, 2012 BBC Radio -> XSPF Bookmarklet March 12, 2012 How To Text Mine Open Access Documents February 22, 2012 Open Access Author Manuscripts in PubMed Central February 20, 2012 ISSN(L)s And Serial Title Abbreviations February 9, 2012 Extracting Text From A PDF Using Only Javascript November 18, 2011 Open Graph wins the Semantic Web September 29, 2011 Citing With URIs in Google Docs September 16, 2011 Client-Side PubMed Searching July 23, 2011 Capturing a manipulated web page with PhantomJS March 25, 2011 This Weblog In (Some) URLs March 6, 2011 A Modular System for Automatic Entity Extraction and Manual Annotation of Academic Papers February 3, 2011 Getting and Sending Binary Files with XMLHttpRequest December 15, 2010 AOTY 2010 November 18, 2010 ReCo: a music recommender October 18, 2010 Artists October 7, 2010 Creating a single file, lossless rip of a DVD chapter in Ubuntu August 22, 2010 
London Cycle Hire data/apps August 7, 2010 Writing Firefox Add-ons with the JetPack SDK July 31, 2010 UK Fuel Consumption for Energy Use July 1, 2010 Current UK Reservoir Stocks July 1, 2010 eCryptfs in Ubuntu (Lucid) June 27, 2010 Using STIX fonts with @font-face June 10, 2010 Inline annotations/formatting in HTML May 20, 2010 Command line Twitter authentication using the PECL OAuth library May 20, 2010 Automatically mounting a remote directory in Ubuntu using autofs + sshfs May 15, 2010 A Simple Hit Counter with Node.js and Redis May 13, 2010 Voting Correlation (UK General Election 2010) May 9, 2010 UK General Election 2010 May 8, 2010 Installing PHP 5.3 etc on Ubuntu Karmic (9.10) May 5, 2010 Maps at the British Library, and on the BBC May 4, 2010 mapstvBillions April 15, 2010 Archiving Timestamped Copies of Bookmarked Web Content March 30, 2010 A WSDL 2.0 description of the EUtils EFetch web service March 28, 2010 phpschemaxmlREST Web Services, XML and Data Typing March 27, 2010 phpxmlGoogle Bookmarks Lists March 24, 2010 googlelistsmapsA Solr index of Wikipedia on EC2/EBS March 17, 2010 ec2lucenesolrMapping XML Named Character References to Unicode Characters March 16, 2010 A Pipe for New Episodes in a BBC series March 8, 2010 bbce4xjavascriptpipesrdfxmlyahooyqlIndependent UK Record Labels on Spotify March 4, 2010 Adding Spotify links to BBC Radio playlists, via RDFa, using Greasemonkey and rdfQuery March 2, 2010 Indexing JSON data in MongoDB using PHP February 23, 2010 Showing Delicious bookmarks of pages within a domain February 19, 2010 ElasticSearch in PHP February 16, 2010 Describing REST APIs with HTML5 forms February 2, 2010 The Top Google Search Result for each Unicode Character January 22, 2010 Listing Unicode Characters January 22, 2010 Spotify Playlist: The Hype Machine Top 1000 Albums of 2009 January 21, 2010 On A Bus updated January 19, 2010 Using the Bing Maps Web Services in PHP January 19, 2010 Publishing Files using a Public Folder in Google Docs January 12, 2010 Web Applications January 11, 2010 Installing platform-specific applications January 11, 2010 Operating Systems and Application Launching January 11, 2010 An OS X Single Site Browser with HTML5 Storage Support? January 6, 2010 A Basic Web App With A Settings Page, Using jQTouch and PHP January 5, 2010 OpenURL + OpenSearch January 4, 2010 Map Overlays January 2, 2010 mapsThird-Party Cookies December 23, 2009 Spotify lookup and Playdar in AOTY December 10, 2009 aotyplaydarspotifyOpenSearch + YQL December 10, 2009 Importing GeoPlanet data into MySQL December 10, 2009 Semantic Assistants November 16, 2009 Text Mining November 16, 2009 SoyLatte: Java 1.6 for 32-bit OS X November 11, 2009 javaosxTransforming XML files with XSLT 2.0 and Saxon-HE on OS X, using an XML catalog October 26, 2009 xmlLatest NPG articles in PubMed Central October 22, 2009 Exploring PubChem via SPARQL October 7, 2009 Bacode October 7, 2009 Yahoo! 
APIs Terms of Use changed October 2, 2009 Using PubMed's autocomplete data in JQuery September 30, 2009 HTML template September 30, 2009 htmlSheevaPlug as a Torrent Seed Box September 19, 2009 Graphing weather time series data with Timetric September 10, 2009 dataweatherConverting PDF to PNG using ImageMagick or Ghostscript August 20, 2009 The Music Industry (version) August 11, 2009 QR code testing on the iPhone August 7, 2009 Embedding chemical structure information in image files August 6, 2009 applescriptchemistryUsing the Tesco API with PHP August 5, 2009 apiphpTopic Modelling with MALLET August 3, 2009 Travel with an iPhone August 2, 2009 iphonetravelMarking up a bibliographic reference with RDFa July 30, 2009 Entities in Scientific News Stories June 18, 2009 onabus.com June 16, 2009 Annotation of Scientific Articles June 14, 2009 annotationNow Playing in Songbird June 14, 2009 musicnow-playingsongbirdA Private Radio Archive June 14, 2009 notuberadioDealing with election results data June 11, 2009 Adding Bing search results to Google June 4, 2009 Extracting keyphrases from documents using MeSH terms and KEA June 1, 2009 Scraping with YQL Execute June 1, 2009 scrapingClustering documents with CLUTO May 28, 2009 Exploring an OAI-PMH repository May 26, 2009 oaiYahoo! PlaceMaker May 21, 2009 apilocationyahooFetching article citation counts from Web of Science May 21, 2009 apiPHP, DOM, DTDs and named entities May 20, 2009 phpxmlPHP, DOM and XML encodings May 20, 2009 phpxmlRecording video from a webcam in Ubuntu May 17, 2009 ubuntuvideoQuerying BBC programmes in a Talis data store May 15, 2009 bbcrdfuriOAI, YQL and JSON May 14, 2009 phpyqlWhat's the Unicode character for "irony"? May 7, 2009 Updating local copies of databases and ontologies May 6, 2009 Server-side DOM scraping with Javascript: options April 29, 2009 domjavascriptSolr/Lucene on EC2/EBS April 20, 2009 ec2lucenesolrInstalling CouchDB from source on OS X April 17, 2009 Everything? April 7, 2009 Playdar as an OpenURL resolver? 
April 3, 2009 audiocoinsopenurlplaydarresolutionGraph of new albums added to Spotify April 1, 2009 Analysing 'science' bookmarks in Delicious March 29, 2009 deliciousPosting shared items from Google Reader to Delicious March 29, 2009 deliciousphpResolving URLs with PHP March 29, 2009 phpFinding all occurrences of a UTF-8-encoded needle in a UTF-8-encoded haystack March 25, 2009 phpUsing YQL and Pipes to make a screensaver of The Big Picture March 25, 2009 pipesyqlPages tagged as 'science' on Delicious, by co-tags March 21, 2009 delicioussciencePopular pages tagged as 'science' on Delicious March 21, 2009 deliciousscienceSelecting Wikipedia articles by InChI March 20, 2009 chemistryinchirdfContent Hashing March 19, 2009 similarityYQL Open Data Tables March 16, 2009 scrapingyqlFestive 50 Spotify Playlists March 15, 2009 playlistsradioxspfTfL feeds March 13, 2009 Semantic/Scientific Authoring Add-ins for Microsoft Word March 13, 2009 publishingsemanticData, Science and Stories March 12, 2009 dataMusic Recipe March 12, 2009 musicDelicious Network Meme Tracker March 12, 2009 deliciousComparing similar articles and categorisation with Wikipedia March 12, 2009 Fetching articles from the NY Times API March 11, 2009 apiGuardian + Lucene = Similar Articles + Categorisation March 10, 2009 Guardian Open Platform March 10, 2009 apiCloudMade February 23, 2009 mapsAn open question to authors of text mining tools February 22, 2009 text-miningHTML + WMV -> XSPF + MP4 February 21, 2009 phpvideoAnalysing the ticTOCs collection of journal TOC feeds February 18, 2009 Freebase: Types, Topics, Timelines and Mentions February 7, 2009 freebaseontologyGoogle, jQuery and plugin loading February 3, 2009 googlejavascriptjqueryYouMomus February 2, 2009 BigMaps with Modest Maps January 31, 2009 mapsQuestion for a map January 31, 2009 mapsBigMaps with CutyCapt and Xvfb January 30, 2009 ecsstract: Scraping in XULRunner with JSON/CSS selectors January 29, 2009 Generating Standard Chemical Identifiers (Standard InChI) January 22, 2009 PubMed XML in eXist on OS X January 21, 2009 Difficult Album Titles of 2008 January 19, 2009 musicPrivacy online: prevent tracking using Adblock Plus' site-specific filters January 19, 2009 adblockprivacyExtracting a certificate/key pair from a Java keystore January 19, 2009 Spotified SXSW Catalog January 19, 2009 greasemonkeyspotifyDefining scraper mappings using CSS selectors January 19, 2009 An Annotated Timeline of U.S. Public Debt, using Google Spreadsheet and Google Calendar January 16, 2009 dataGenerative art in Second Life January 13, 2009 Installing an independent PHP 5.3 to run from the command line January 13, 2009 phpNotes on using the Ubuntu EC2 AMI January 13, 2009 Events! 
January 11, 2009 Radio Now January 6, 2009 iplayerradioUbuntu on EC2 January 5, 2009 Displaying new episodes from BBC iPlayer January 4, 2009 bbciplayertvZemanta API January 2, 2009 Making a Lucene index of Wikipedia for MoreLikeThis queries January 2, 2009 lucenephpwikipediaAlbums of the Year collages January 1, 2009 musicEnd-of-year TV (UK only) January 1, 2009 tvSkyrails December 19, 2008 graphnetworkvisualisationSpotification December 19, 2008 greasemonkeyUniProt / RDF / SPARQL December 19, 2008 rdfuniprotGetting a visitor's location (city) December 15, 2008 Browse My Privates December 10, 2008 Firefox 3.1, maxVersion for extensions December 10, 2008 firefoxAlbums of the Year 2008 December 10, 2008 nokeepalive December 9, 2008 POTAtoo December 6, 2008 greasemonkeySongbird links and bookmarks December 3, 2008 Libxml2, PHP and UTF-8 December 2, 2008 The 16 Most Interesting Regions in Second Life November 24, 2008 secondlifeOn A Bus November 21, 2008 iphonemapstransportSecond Life person pseudo-APIs November 17, 2008 Second Life region APIs November 13, 2008 secondlifeSecond Life BigMap November 11, 2008 Mouse coordinates bookmarklet November 11, 2008 bookmarkletEncoding AAC/MP4 audio files on OS X November 11, 2008 audioosxInline Wikipedia History, updated November 11, 2008 greasemonkeywikipediaIntrepid vs NVIDIA November 9, 2008 ubuntuNational Public Transport Data Repository data November 9, 2008 transportRoyal Mail PAF data November 9, 2008 dataTransport Direct API November 9, 2008 apiphptransportGetting Started in Second Life November 8, 2008 secondlifeJSONP, Google Spreadsheet security October 29, 2008 securityUIMA October 28, 2008 Minimal PHP script for downloading PubMed XML October 23, 2008 phppubmedMinimal PHP script for downloading PubMed XML (with error checking) October 23, 2008 phppubmedHuffduffer October 22, 2008 Who Cares About Open Access October 21, 2008 publishingscienceSecond Life: "Teleport to Camera Position" October 9, 2008 second lifeVideo Encoding Recommendations October 2, 2008 videoMaximise OS X windows with a keyboard shortcut September 26, 2008 Query Parameters in URIs September 26, 2008 Logout/Login CSRF September 24, 2008 Pure Data September 22, 2008 audiopuredataPlaylist Builder using Freebase Suggest September 22, 2008 freebasemetadataWeb Playlist Tool September 22, 2008 mediaplaylistsPreprints and Categorisation September 18, 2008 Creating a Freebase data view September 18, 2008 datafreebaseRasmus Lerdorf on PHP performance September 14, 2008 phpUbiquity PubMed search September 12, 2008 Audacity September 12, 2008 PHP, SimpleXML, XPath and namespaced attributes September 11, 2008 Removing 'for each' from Javascript examples September 10, 2008 PubMed JSON API September 8, 2008 apijavascriptjsonpubmedLinux music players: compilations and watching folders September 7, 2008 audiolinuxBBC AOD Filter Pipe September 4, 2008 audiobbcpipesGoPubMed export API September 4, 2008 Ubiquity commands September 2, 2008 National Rail bus service sparklines September 1, 2008 businfographictransportGmail MenuExtra SSB in Fluid August 29, 2008 Veodia August 28, 2008 second lifevideoPulseAudio resampling August 28, 2008 audioprojectM-pulseaudio August 27, 2008 audioubuntu,visualisationLondon: cycling and walking route maps August 26, 2008 London: Visitors bus map, mobile TFL August 26, 2008 PulseAudio voodoo August 24, 2008 audioubuntuUKPA negotiates a licence for commercial music podcasting August 23, 2008 podcastGeoNames NearbyWikipedia API August 21, 2008 
apijavascriptjquerylocationwikipediaAideRSS PostRank API August 21, 2008 apijqueryUK Postcode -> Bus Stop prototype August 20, 2008 maptransport400,000 bus stops August 20, 2008 locationphpFull-text feeds as a route around censorship August 20, 2008 feedsEPUB and Stanza August 20, 2008 epubliteraturepdfMendeley August 19, 2008 bibliographypdfCreating MapTube maps with Neighbourhood Statistics data August 19, 2008 mapsAn Amazon Wishlist Competition/Contest August 18, 2008 amazoncompetitionwishlistListen Later updated August 18, 2008 audiobbcextensionfirefoxMobile bus departures August 18, 2008 locationtransportUpload to Google Docs bookmarklet August 18, 2008 bookmarkletgoogleGrowl Alerts for Gmail Messages from Address Book August 18, 2008 applescriptgmailRadio 4 Comedy Feeds August 18, 2008 bbcradioSending a URL from Safari to Firefox August 18, 2008 applescriptosxWriting an Atom feed in PHP 5 August 13, 2008 atomphpHow to Share a Social Network August 12, 2008 portabilityprivacyLocatory August 11, 2008 Britain From Above August 10, 2008 bbctvMeta-TV August 8, 2008 tviPhone reader for Google Reader Starred Items August 8, 2008 feedsgoogleiphoneFree August 6, 2008 iphoneCOUNT/DISTINCT queries August 5, 2008 mysqlrdfxqueryManipulating Forms in Google Spreadsheets August 4, 2008 googlespreadsheetsTiddlyWiki zoomable interface August 4, 2008 tiddlywikiiPhone interface for Delicious Network August 4, 2008 deliciousiphoneMapping Statistics mini-presentation at BarCamb August 4, 2008 mapsDS Game Classics July 29, 2008 dsgamesLondon Age Distribution Maps Part 2 July 29, 2008 mapsLondon Age Distribution Maps July 29, 2008 mapsdata URI for a Google search box July 26, 2008 googleiphoneSecurity Email Addresses that are Black Holes July 26, 2008 securityBeware of the App July 26, 2008 iphonesecurityXMPP comments July 26, 2008 xmppUpcoming API (PHP5) July 24, 2008 apiphpSend PDFs from Skim to Gmail July 24, 2008 applescriptgmailpdfListen Direct /programmes July 17, 2008 bbcgreasemonkeyradioInstalling Java Advanced Imaging in Ubuntu Hardy July 12, 2008 javaubuntuConvert AMR files to WAV in Ubuntu Hardy July 9, 2008 audiolinuxubuntuNeighbourhood Statistics API July 4, 2008 apiphpsoapOrdnance Survey-based BigMap of the UK July 2, 2008 mapArtist -> BBC Radio Shows lookup June 24, 2008 bbcmusicphpradioSoul Bubbles June 22, 2008 gamesEssential Add-ons for Firefox 3 June 19, 2008 extensionsfirefoxChris Wetherell on Google Reader June 18, 2008 feedsgooglePod-U-Like June 17, 2008 app-enginepodcastsUsing Google to Fetch All of a Feed's Items June 17, 2008 apifeedsgooglephpFirefox, OpenSearch and Autocomplete June 14, 2008 firefoxopensearchOpenCalais API June 11, 2008 apiSkim: Open All With Papers June 9, 2008 applescriptosxpdfTumblr Auto-Pager June 9, 2008 greasemonkeyWebClipCountUpDown June 7, 2008 csssafariCreate a calendar from del.icio.us bookmarks June 6, 2008 calendardel.icio.usPod News June 5, 2008 podcastsBringing a publisher's content to the Life Science researcher May 29, 2008 presentationpublishingslidestext-miningThe rules of Web 3.0 May 28, 2008 Publications May 28, 2008 app-enginemedlineopensocialpublicationspythonOn the Rain-Slick Precipice of Darkness, from Penny Arcade May 25, 2008 gameOpenSocial terminology May 24, 2008 opensocialMy Speediest Gatherers updated May 21, 2008 del.icio.usphpUpgrading to Gmail May 11, 2008 emailWith or Without UIDs May 8, 2008 metadatapresentationsearchslidesxtechI'm Feeling Unlucky May 8, 2008 googlegreasemonkeyMeta Latest May 7, 2008 searchDealing with corrupt 
[HubLog post archive listing: titles, dates, and tags for the blog's entries, running from May 2008 back to January 2003. Recurring topics include HubMed and PubMed tools, citation and bibliographic metadata (OpenURL, unAPI, COinS, BibTeX, Zotero), Drupal, Firefox extensions and Greasemonkey scripts, bookmarklets, Atom/RSS feeds, XMPP, music tools (Amarok, Last.fm, Audioscrobbler, MusicBrainz, playlists, podcasts), OS X, Ubuntu and Linux, del.icio.us, TouchGraph visualisations, web security, and open access publishing.]
January 14, 2003 biomedicalknowledgeliteraturemanagementSafari, TouchGraph update January 11, 2003 touchgraphSFX Lookup bookmarklet January 10, 2003 bookmarkletsfx03-01-08: Perl scripts for organising PDFs January 9, 2003 acrobatperl03-01-03: Library Lookup ISSN bookmarklet January 9, 2003 bookmarkletlibrarylookup03-01-03: Experimental links January 9, 2003 citationdoi02-12-20: Gnutella P2P January 9, 2003 gnutellamagnetp2p02-12-16: BibTex output January 9, 2003 bibtexpubmed02-12-08: Endnote and RIS import filters January 9, 2003 endnoteexportpubmed02-12-04: Related Articles algorithm January 9, 2003 articlespubmedrelated02-12-03: LinkOut URLs January 9, 2003 fulltextlinkoutpubmed02-12-02: PubMed Javascript January 9, 2003 javascriptpubmed02-11-25: HubMed online. January 9, 2003 perlpubmedutilitiesxml feeds-feedburner-com-1367 ---- None feeds-feedburner-com-1383 ---- None feeds-feedburner-com-1396 ---- None feeds-feedburner-com-139 ---- None feeds-feedburner-com-1472 ---- None feeds-feedburner-com-1551 ---- None feeds-feedburner-com-1654 ---- None feeds-feedburner-com-1780 ---- None feeds-feedburner-com-1820 ---- None feeds-feedburner-com-1864 ---- None feeds-feedburner-com-2000 ---- None feeds-feedburner-com-2010 ---- None feeds-feedburner-com-2109 ---- None feeds-feedburner-com-2265 ---- None feeds-feedburner-com-2282 ---- None feeds-feedburner-com-2544 ---- None feeds-feedburner-com-2558 ---- None feeds-feedburner-com-2565 ---- None feeds-feedburner-com-2587 ---- None feeds-feedburner-com-2638 ---- None feeds-feedburner-com-2805 ---- None feeds-feedburner-com-2898 ---- None feeds-feedburner-com-2907 ---- None feeds-feedburner-com-2947 ---- None feeds-feedburner-com-2968 ---- None feeds-feedburner-com-3096 ---- None feeds-feedburner-com-31 ---- None feeds-feedburner-com-3212 ---- None feeds-feedburner-com-3225 ---- None feeds-feedburner-com-3374 ---- None feeds-feedburner-com-3463 ---- None feeds-feedburner-com-34 ---- None feeds-feedburner-com-3542 ---- None feeds-feedburner-com-3635 ---- None feeds-feedburner-com-3711 ---- None feeds-feedburner-com-389 ---- None feeds-feedburner-com-4034 ---- None feeds-feedburner-com-405 ---- What I Learned Today… What I Learned Today… Taking a Break I’m sure those of you who are still reading have noticed that I haven’t been updating this site much in the past few years. I was sharing my links with you all but now Delicious has started adding ads to that. I’m going to rethink how I can use this site effectively going forward. For […] Bookmarks for May 3, 2016 Today I found the following resources and bookmarked them on Delicious. Start A Fire Grow and expand your audience by recommending your content within any link you share Digest powered by RSS Digest Bookmarks for April 4, 2016 Today I found the following resources and bookmarked them on Delicious. Mattermost Mattermost is an open source, self-hosted Slack-alternative mBlock Program your app, Arduino projects and robots by dragging & dropping Fidus Writer Fidus Writer is an online collaborative editor especially made for academics who need to use citations and/or formulas. Beek Social network for […] Bookmarks for February 25, 2016 Today I found the following resources and bookmarked them on Delicious. Connfa Open Source iOS & Android App for Conferences & Events Paperless Scan, index, and archive all of your paper documents Foss2Serve Foss2serve promotes student learning via participation in humanitarian Free and Open Source Software (FOSS) projects. 
Disk Inventory X Disk Inventory X is […] Bookmarks for January 9, 2016 Today I found the following resources and bookmarked them on Delicious. Superpowers The open source, extensible, collaborative HTML5 2D+3D game maker Sequel Pro Sequel Pro is a fast, easy-to-use Mac database management application for working with MySQL databases. Digest powered by RSS Digest Bookmarks for December 11, 2015 Today I found the following resources and bookmarked them on Delicious. Open Broadcaster Software Free, open source software for live streaming and recording Digest powered by RSS Digest Bookmarks for November 22, 2015 Today I found the following resources and bookmarked them on Delicious. NumFOCUS Foundation NumFOCUS promotes and supports the ongoing research and development of open-source computing tools through educational, community, and public channels. Digest powered by RSS Digest Bookmarks for November 16, 2015 Today I found the following resources and bookmarked them on Delicious. Smore Smore makes it easy to design beautiful and effective online flyers and newsletters. Ninite Install and Update All Your Programs at Once Digest powered by RSS Digest Bookmarks for November 13, 2015 Today I found the following resources and bookmarked them on Delicious. VIM Adventures Learning VIM while playing a game Digest powered by RSS Digest Bookmarks for November 10, 2015 Today I found the following resources and bookmarked them on Delicious. Star Wars: Building a Galaxy with Code Digest powered by RSS Digest Bookmarks for October 31, 2015 Today I found the following resources and bookmarked them on Delicious. Open Food Facts Open Food Facts gathers information and data on food products from around the world. Digest powered by RSS Digest Bookmarks for October 27, 2015 Today I found the following resources and bookmarked them on Delicious. VersionPress WordPress meets Git, properly. Undo anything (including database changes), clone & merge your sites, maintain efficient backups, all with unmatched simplicity. Digest powered by RSS Digest Bookmarks for October 20, 2015 Today I found the following resources and bookmarked them on Delicious. SOGo Share your calendars, address books and mails in your community with a completely free and open source solution. Let your Mozilla Thunderbird/Lightning, Microsoft Outlook, Android, Apple iCal/iPhone and BlackBerry users collaborate using a modern platform. GitBook GitBook is a modern publishing toolchain. Making […] Bookmarks for October 19, 2015 Today I found the following resources and bookmarked them on Delicious. Discourse Discourse is the 100% open source discussion platform built for the next decade of the Internet. It works as a mailing list, a discussion forum, and a long-form chat room Digest powered by RSS Digest Bookmarks for September 28, 2015 Today I found the following resources and bookmarked them on Delicious. Zulip A group chat application optimized for software development teams Digest powered by RSS Digest Bookmarks for September 25, 2015 Today I found the following resources and bookmarked them on Delicious. iDoneThis Reply to an evening email reminder with what you did that day. The next day, get a digest with what everyone on the team got done. Digest powered by RSS Digest Bookmarks for September 22, 2015 Today I found the following resources and bookmarked them on Delicious. Vector Vector is a new, fully open source communication and collaboration tool we’ve developed that’s open, secure and interoperable. 
Based on the concept of rooms and participants, it combines a great user interface with all core functions we need (chat, file transfer, VoIP and […] Bookmarks for September 11, 2015 Today I found the following resources and bookmarked them on Delicious. Roundcube Free and Open Source Webmail Software Bolt Bolt is an open source Content Management Tool, which strives to be as simple and straightforward as possible. It is quick to set up, easy to configure, uses elegant templates, and above all: It’s a joy […] Bookmarks for September 10, 2015 Today I found the following resources and bookmarked them on Delicious. MadEye MadEye is a collaborative web editor backed by your filesystem. Digest powered by RSS Digest Bookmarks for September 6, 2015 Today I found the following resources and bookmarked them on Delicious. Gimlet Your library’s questions and answers put to their best use. Know when your desk will be busy. Everyone on your staff can find answers to difficult questions. Digest powered by RSS Digest Bookmarks for September 2, 2015 Today I found the following resources and bookmarked them on Delicious. Thimble by Mozilla Thimble is an online code editor that makes it easy to create and publish your own web pages while learning HTML, CSS & JavaScript. Google Coder a simple way to make web stuff on Raspberry Pi Digest powered by RSS Digest Bookmarks for August 23, 2015 Today I found the following resources and bookmarked them on Delicious. MediaGoblin MediaGoblin is a free software media publishing platform that anyone can run. You can think of it as a decentralized alternative to Flickr, YouTube, SoundCloud, etc. The Architecture of Open Source Applications A web whiteboard A Web Whiteboard is touch-friendly online whiteboard app […] Bookmarks for August 6, 2015 Today I found the following resources and bookmarked them on Delicious. Computer Science Learning Opportunities We have developed a range of resources, programs, scholarships, and grant opportunities to engage students and educators around the world interested in computer science. Digest powered by RSS Digest Bookmarks for August 3, 2015 Today I found the following resources and bookmarked them on Delicious. Pydio The mature open source alternative to Dropbox and box.net Digest powered by RSS Digest Bookmarks for July 23, 2015 Today I found the following resources and bookmarked them on Delicious. 
hylafax The world’s most advanced open source fax server Digest powered by RSS Digest feeds-feedburner-com-4092 ---- None feeds-feedburner-com-4179 ---- None feeds-feedburner-com-4232 ---- None feeds-feedburner-com-4330 ---- None feeds-feedburner-com-4354 ---- None feeds-feedburner-com-4356 ---- None feeds-feedburner-com-4386 ---- None feeds-feedburner-com-4399 ---- None feeds-feedburner-com-4413 ---- None feeds-feedburner-com-4502 ---- None feeds-feedburner-com-4552 ---- None feeds-feedburner-com-4684 ---- None feeds-feedburner-com-4718 ---- None feeds-feedburner-com-4815 ---- None feeds-feedburner-com-4819 ---- None feeds-feedburner-com-4909 ---- None feeds-feedburner-com-4915 ---- None feeds-feedburner-com-4919 ---- None feeds-feedburner-com-5344 ---- None feeds-feedburner-com-5357 ---- None feeds-feedburner-com-5408 ---- None feeds-feedburner-com-5455 ---- None feeds-feedburner-com-5456 ---- None feeds-feedburner-com-5594 ---- None feeds-feedburner-com-5610 ---- None feeds-feedburner-com-5675 ---- None feeds-feedburner-com-5786 ---- None feeds-feedburner-com-5871 ---- None feeds-feedburner-com-5993 ---- None feeds-feedburner-com-6084 ---- None feeds-feedburner-com-6133 ---- None feeds-feedburner-com-6161 ---- None feeds-feedburner-com-6178 ---- None feeds-feedburner-com-6184 ---- commonplace.net commonplace.net Data. The final frontier. Infrastructure for heritage institutions – ARK PID’s In the Digital Infrastructure program at the Library of the University of Amsterdam we have reached a first milestone. In my previous post in the Infrastructure for heritage institutions series, “Change of course“, I mentioned the coming implementation of ARK persistent identifiers for our collection objects. Since November 3, 2020, ARK PID’s are available for our university library Alma catalogue through the Primo user interface. Implementation of ARK PID’s for the other collection description systems […] Infrastructure for heritage institutions – change of course In July 2019 I published the first post about our planning to realise a “coherent and future proof digital infrastructure” for the Library of the University of Amsterdam. In February I reported on the first results. As frequently happens, since then the conditions have changed, and naturally we had to adapt the direction we are following to achieve our goals. In other words: a change of course, of course.  Projects  I will leave aside the […] Infrastructure for heritage institutions – first results In July 2019 I published the post Infrastructure for heritage institutions in which I described our planning to realise a “coherent and future proof digital infrastructure” for the Library of the University of Amsterdam. Time to look back: how far have we come? And time to look forward: what’s in store for the near future? Ongoing activities I mentioned three “currently ongoing activities”:  Monitoring and advising on infrastructural aspects of new projects Maintaining a structured dynamic overview […] Infrastructure for heritage institutions During my vacation I saw this tweet by LIBER about topics to address, as suggested by the participants of the LIBER 2019 conference in Dublin: It shows a word cloud (yes, a word cloud) containing a large number of terms. 
I list the ones I can read without zooming in (so the most suggested ones, I guess), more or less grouped thematically: Open science, Open data, Open access, Licensing, Copyrights, Linked open data, Open education, Citizen science, Scholarly communication, Digital humanities/DH, Digital scholarship, Research assessment, Research […] Ten years linked open data This post is the English translation of my original article in Dutch, published in META (2016-3), the Flemish journal for information professionals. Ten years after the term “linked data” was introduced by Tim Berners-Lee it appears to be time to take stock of the impact of linked data for libraries and other heritage institutions in the past and in the future. I will do this from a personal historical perspective, as a library technology professional, […] Maps, dictionaries and guidebooks Interoperability in heterogeneous library data landscapes Libraries have to deal with a highly opaque landscape of heterogeneous data sources, data types, data formats, data flows, data transformations and data redundancies, which I have earlier characterized as a “data maze”. The level and magnitude of this opacity and heterogeneity varies with the amount of content types and the number of services that the library is responsible for. Academic and national libraries are possibly dealing with more […] Standard deviations in data modeling, mapping and manipulation Or: Anything goes. What are we thinking? An impression of ELAG 2015 This year’s ELAG conference in Stockholm was one of many questions. Not only the usual questions following each presentation (always elicited in the form of yet another question: “Any questions?”). But also philosophical ones (Why? What?). And practical ones (What time? Where? How? How much?). And there were some answers too, fortunately. This is my rather personal impression of the event. For a […] Analysing library data flows for efficient innovation In my work at the Library of the University of Amsterdam I am currently taking a step forward by actually taking a step back from a number of forefront activities in discovery, linked open data and integrated research information towards a more hidden, but also more fundamental enterprise in the area of data infrastructure and information architecture. All for a good cause, for in the end a good data infrastructure is essential for delivering high […] Looking for data tricks in Libraryland IFLA 2014 Annual World Library and Information Congress Lyon – Libraries, Citizens, Societies: Confluence for Knowledge After attending the IFLA 2014 Library Linked Data Satellite Meeting in Paris I travelled to Lyon for the first three days (August 17-19) of the IFLA 2014 Annual World Library and Information Congress. This year’s theme “Libraries, Citizens, Societies: Confluence for Knowledge” was named after the confluence or convergence of the rivers Rhône and Saône where the city of […] Library Linked Data Happening On August 14 the IFLA 2014 Satellite Meeting ‘Linked Data in Libraries: Let’s make it happen!’ took place at the National Library of France in Paris. Rurik Greenall (who also wrote a very readable conference report) and I had the opportunity to present our paper ‘An unbroken chain: approaches to implementing Linked Open Data in libraries; comparing local, open-source, collaborative and commercial systems’.
In this paper we do not go into reasons for libraries to […] feeds-feedburner-com-6309 ---- None feeds-feedburner-com-6319 ---- None feeds-feedburner-com-6462 ---- None feeds-feedburner-com-6496 ---- None feeds-feedburner-com-6549 ---- Dan Cohen Dan Cohen Vice Provost, Dean, and Professor at Northeastern University When We Look Back on 2020, What Will We See? It is far too early to understand what happened in this historic year of 2020, but not too soon to grasp what we will write that history from: data—really big data, gathered from our devices and ourselves. Sometimes a new technology provides an important lens through which a historical event is recorded, viewed, and remembered. […] More than THAT “Less talk, more grok.” That was one of our early mottos at THATCamp, The Humanities and Technology Camp, which started at the Roy Rosenzweig Center for History and New Media at George Mason University in 2008. It was a riff on “Less talk, more rock,” the motto of WAAF, the hard rock station in Worcester, Massachusetts. And […] Humane Ingenuity: My New Newsletter With the start of this academic year, I’m launching a new newsletter to explore technology that helps rather than hurts human understanding, and human understanding that helps us create better technology. It’s called Humane Ingenuity, and you can subscribe here. (It’s free, just drop your email address into that link.) Subscribers to this blog know […] Engagement Is the Enemy of Serendipity Whenever I’m grumpy about an update to a technology I use, I try to perform a self-audit examining why I’m unhappy about this change. It’s a helpful exercise since we are all by nature resistant to even minor alterations to the technologies we use every day (which is why website redesign is now a synonym […] On the Response to My Atlantic Essay on the Decline in the Use of Print Books in Universities I was not expecting—but was gratified to see—an enormous response to my latest piece in The Atlantic, “The Books of College Libraries Are Turning Into Wallpaper,” on the seemingly inexorable decline in the circulation of print books on campus. I’m not sure that I’ve ever written anything that has generated as much feedback, commentary, and […] What’s New Season 2 Wrap-up With the end of the academic year at Northeastern University, the library wraps up our What’s New podcast, an interview series with researchers who help us understand, in plainspoken ways, some of the latest discoveries and ideas about our world. This year’s slate of podcasts, like last year’s, was extraordinarily diverse, ranging from the threat […] When a Presidential Library Is Digital I’ve got a new piece over at The Atlantic on Barack Obama’s prospective presidential library, which will be digital rather than physical. This has caused some consternation. We need to realize, however, that the Obama library is already largely digital: The vast majority of the record his presidency left behind consists not of evocative handwritten […] Robin Sloan’s Fusion of Technology and Humanity When Roy Rosenzweig and I wrote Digital History 15 years ago, we spent a lot of time thinking about the overall tone and approach of the book. 
It seemed to us that there were, on the one hand, a lot of our colleagues in professional history who were adamantly opposed to the use of digital […] Presidential Libraries and the Digitization of Our Lives Buried in the recent debates (New York Times, Chicago Tribune, The Public Historian) about the nature, objectives, and location of the Obama Presidential Center is the inexorable move toward a world in which virtually all of the documentation about our lives is digital. To make this decades-long shift—now almost complete—clear, I made the following infographic […] Kathleen Fitzpatrick’s Generous Thinking Generosity and thoughtfulness are not in abundance right now, and so Kathleen Fitzpatrick‘s important new book, Generous Thinking: A Radical Approach to Saving the University, is wholeheartedly welcome. The generosity Kathleen seeks relates to lost virtues, such as listening to others and deconstructing barriers between groups. As such, Generous Thinking can be helpfully read alongside […] feeds-feedburner-com-6560 ---- None feeds-feedburner-com-6640 ---- None feeds-feedburner-com-6661 ---- None feeds-feedburner-com-6753 ---- Library Hat Library Hat http://www.bohyunkim.net/blog/ Blockchain: Merits, Issues, and Suggestions for Compelling Use Cases * This post was also published in ACRL TechConnect.*** Blockchain holds a great potential for both innovation and disruption. The adoption of blockchain also poses certain risks, and those risks will need to be addressed and mitigated before blockchain becomes mainstream. A lot of people have heard of blockchain at this point. But many are […] Taking Diversity to the Next Level ** This post was also published in ACRL TechConnect on Dec. 18, 2017.*** Getting Minorities on Board I recently moderated a panel discussion program titled “Building Bridges in a Divisive Climate: Diversity in Libraries, Archives, and Museums.”1 Participating in organizing this program was an interesting experience. During the whole time, I experienced my perspective constantly shifting […] From Need to Want: How to Maximize Social Impact for Libraries, Archives, and Museums At the NDP at Three event organized by IMLS yesterday, Sayeed Choudhury on the “Open Scholarly Communications” panel suggested that libraries think about return on impact in addition to return on investment (ROI). He further elaborated on this point by proposing a possible description of such impact. His description was that when an object or […] How to Price 3D Printing Service Fees ** This post was originally published in ACRL TechConnect on May. 22, 2017.*** Many libraries today provide 3D printing service. But not all of them can afford to do so for free. While free 3D printing may be ideal, it can jeopardize the sustainability of the service over time. Nevertheless, many libraries tend to worry […] Post-Election Statements and Messages that Reaffirm Diversity These are statements and messages sent out publicly or internally to re-affirm diversity, equity, and inclusion by libraries or higher ed institutions. I have collected these – some myself and many others through my fellow librarians. Some of them were listed on my blog post, “Finding the Right Words in Post-Election Libraries and Higher Ed.” […] Finding the Right Words in Post-Election Libraries and Higher Ed ** This post was originally published in ACRL TechConnect on Nov. 15, 2016.*** This year’s election result has presented a huge challenge to all of us who work in higher education and libraries.
Usually, libraries, universities, and colleges do not comment on presidential election result and we refrain from talking about politics at work. But […] Say It Out Loud – Diversity, Equity, and Inclusion I usually and mostly talk about technology. But technology is so far away from my thought right now. I don’t feel that I can afford to worry about Internet surveillance or how to protect privacy at this moment. Not that they are unimportant. Such a worry is real and deserves our attention and investigation. But […] Cybersecurity, Usability, Online Privacy, and Digital Surveillance ** This post was originally published in ACRL TechConnect on May. 9, 2016.*** Cybersecurity is an interesting and important topic, one closely connected to those of online privacy and digital surveillance. Many of us know that it is difficult to keep things private on the Internet. The Internet was invented to share things with others […] Three Recent Talks of Mine on UX, Data Visualization, and IT Management I have been swamped at work and pretty quiet here in my blog. But I gave a few talks recently. So I wanted to share those at least. I presented about how to turn the traditional library IT department and its operation that is usually behind the scene into a more patron-facing unit at the recent American Library Association Midwinter […] Near Us and Libraries, Robots Have Arrived ** This post was originally published in ACRL TechConnect on Oct. 12, 2015.*** The movie, Robot and Frank, describes the future in which the elderly have a robot as their companion and also as a helper. The robot monitors various activities that relate to both mental and physical health and helps Frank with various house chores. […] feeds-feedburner-com-6828 ---- None feeds-feedburner-com-6846 ---- None feeds-feedburner-com-7031 ---- None feeds-feedburner-com-7110 ---- None feeds-feedburner-com-7189 ---- None feeds-feedburner-com-7223 ---- None feeds-feedburner-com-7283 ---- The Code4Lib Journal The Code4Lib Journal Editorial Resuming our publication schedule Managing an institutional repository workflow with GitLab and a folder-based deposit system Institutional Repositories (IR) exist in a variety of configurations and in various states of development across the country. Each organization with an IR has a workflow that can range from explicitly documented and codified sets of software and human workflows, to ad hoc assortments of methods for working with faculty to acquire, process and load items into a repository. The University of North Texas (UNT) Libraries has managed an IR called UNT Scholarly Works for the past decade but has until recently relied on ad hoc workflows. Over the past six months, we have worked to improve our processes in a way that is extensible and flexible while also providing a clear workflow for our staff to process submitted and harvested content. Our approach makes use of GitLab and its associated tools to track and communicate priorities for a multi-user team processing resources. We paired this Web-based management with a folder-based system for moving the deposited resources through a sequential set of processes that are necessary to describe, upload, and preserve the resource. This strategy can be used in a number of different applications and can serve as a set of building blocks that can be configured in different ways. This article will discuss which components of GitLab are used together as tools for tracking deposits from faculty as they move through different steps in the workflow. 
Likewise, the folder-based workflow queue will be presented and described as implemented at UNT, and examples for how we have used it in different situations will be presented. Customizing Alma and Primo for Home & Locker Delivery Like many Ex Libris libraries in Fall 2020, our library at California State University, Northridge (CSUN) was not physically open to the public during the 2020-2021 academic year, but we wanted to continue to support the research and study needs of our over 38,000 university students and 4,000 faculty and staff. This article will explain our Alma and Primo implementation to allow for home mail delivery of physical items, including policy decisions, workflow changes, customization of request forms through labels and delivery skins, customization of Alma letters, a Python solution to add the “home” address type to patron addresses to make it all work, and will include relevant code samples in Python, XSL, CSS, XML, and JSON. In Spring 2021, we will add the on-site locker delivery option in addition to home delivery, and this article will include new system changes made for that option. GaNCH: Using Linked Open Data for Georgia’s Natural, Cultural and Historic Organizations’ Disaster Response In June 2019, the Atlanta University Center Robert W. Woodruff Library received a LYRASIS Catalyst Fund grant to support the creation of a publicly editable directory of Georgia’s Natural, Cultural and Historical Organizations (NCHs), allowing for quick retrieval of location and contact information for disaster response. By the end of the project, over 1,900 entries for NCH organizations in Georgia were compiled, updated, and uploaded to Wikidata, the linked open data database from the Wikimedia Foundation. These entries included directory contact information and GIS coordinates that appear on a map presented on the GaNCH project website (https://ganch.auctr.edu/), allowing emergency responders to quickly search for NCHs by region and county in the event of a disaster. In this article we discuss the design principles, methods, and challenges encountered in building and implementing this tool, including the impact the tool has had on statewide disaster response after implementation. Archive This Moment D.C.: A Case Study of Participatory Collecting During COVID-19 When the COVID-19 pandemic brought life in Washington, D.C. to a standstill in March 2020, staff at DC Public Library began looking for ways to document how this historic event was affecting everyday life. Recognizing the value of first-person accounts for historical research, staff launched Archive This Moment D.C. to preserve the story of daily life in the District during the stay-at-home order. Materials were collected from public Instagram and Twitter posts submitted through the hashtag #archivethismomentdc. In addition to social media, creators also submitted materials using an Airtable webform set up for the project and through email. Over 2,000 digital files were collected. This article will discuss the planning, professional collaboration, promotion, selection, access, and lessons learned from the project; as well as the technical setup, collection strategies, and metadata requirements. In particular, this article will include a discussion of the evolving collection scope of the project and the need for clear ethical guidelines surrounding privacy when collecting materials in real-time. 
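The first abstract in this Code4Lib Journal listing describes a folder-based deposit queue in which each submitted item moves through a sequence of processing folders. A minimal sketch of that general pattern, assuming hypothetical stage names and paths rather than UNT's actual configuration, might look like this:

```python
"""Minimal sketch of a folder-based deposit queue, loosely modeled on the
workflow described in the UNT institutional repository abstract above.
Stage names and paths are illustrative assumptions, not UNT's setup."""
from pathlib import Path
import shutil

# Hypothetical sequential processing stages for a submitted item.
STAGES = ["01_submitted", "02_described", "03_uploaded", "04_preserved"]
QUEUE_ROOT = Path("ir-queue")


def init_queue() -> None:
    """Create one directory per stage so staff can see the whole pipeline."""
    for stage in STAGES:
        (QUEUE_ROOT / stage).mkdir(parents=True, exist_ok=True)


def advance(item_name: str) -> Path:
    """Move a deposit folder from its current stage to the next one."""
    for current, nxt in zip(STAGES, STAGES[1:]):
        src = QUEUE_ROOT / current / item_name
        if src.exists():
            dest = QUEUE_ROOT / nxt / item_name
            shutil.move(str(src), str(dest))
            return dest
    raise FileNotFoundError(f"{item_name} not found in any non-final stage")


if __name__ == "__main__":
    init_queue()
    # Simulate a new faculty deposit arriving in the first stage.
    (QUEUE_ROOT / STAGES[0] / "deposit-0001").mkdir(exist_ok=True)
    print(advance("deposit-0001"))  # -> ir-queue/02_described/deposit-0001
```

Because the queue is just directories on disk, staff can also move items by hand, which is part of the appeal of the folder-based approach the abstract describes.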
Advancing ARKs in the Historical Ontology Space This paper presents the application of Archival Resource Keys (ARKs) for persistent identification and resolution of concepts in historical ontologies. Our use case is the 1910 Library of Congress Subject Headings (LCSH), which we have converted to the Simple Knowledge Organization System (SKOS) format and will use for representing a corpus of historical Encyclopedia Britannica articles. We report on the steps taken to assign ARKs in support of the Nineteenth-Century Knowledge Project, where we are using the HIVE vocabulary tool to automatically assign subject metadata from both the 1910 LCSH and the contemporary LCSH faceted, topical vocabulary to enable the study of the evolution of knowledge. Considered Content: a Design System for Equity, Accessibility, and Sustainability The University of Minnesota Libraries developed and applied a principles-based design system to their Health Sciences Library website. With the design system at its center, the revised site was able to achieve accessible, ethical, inclusive, sustainable, responsible, and universal design. The final site was built with elegantly accessible semantic HTML-focused code on Drupal 8 with highly curated and considered content, meeting and exceeding WCAG 2.1 AA guidance and addressing cognitive and learning considerations through the use of plain language, templated pages for consistent page-level organization, and no hidden content. As a result, the site better supports all users regardless of their abilities, attention level, mental status, reading level, and reliability of their internet connection, all of which are especially critical now as an elevated number of people experience crises, anxieties, and depression. Robustifying Links To Combat Reference Rot Links to web resources frequently break, and linked content can change at unpredictable rates. These dynamics of the Web are detrimental when references to web resources provide evidence or supporting information. In this paper, we highlight the significance of reference rot, provide an overview of existing techniques and their characteristics to address it, and introduce our Robust Links approach, including its web service and underlying API. Robustifying links offers a proactive, uniform, and machine-actionable way to combat reference rot. In addition, we discuss our reasoning and approach aimed at keeping the approach functional for the long term. To showcase our approach, we have robustified all links in this article. Machine Learning Based Chat Analysis The BYU library implemented a Machine Learning-based tool to perform various text analysis tasks on transcripts of chat-based interactions between patrons and librarians. These text analysis tasks included estimating patron satisfaction and classifying queries into various categories such as Research/Reference, Directional, Tech/Troubleshooting, Policy/Procedure, and others. An accuracy of 78% or better was achieved for each category. This paper details the implementation details and explores potential applications for the text analysis tool. Always Be Migrating At the University of California, Los Angeles, the Digital Library Program is in the midst of a large, multi-faceted migration project. This article presents a narrative of migration and a new mindset for technology and library staff in their ever-changing infrastructure and systems. 
This article posits that migration from system to system should be integrated into normal activities so that it is not a singular event or major project, but so that it is a process built into the core activities of a unit. Editorial: For Pandemic Times Such as This A pandemic changes the world and changes libraries. Open Source Tools for Scaling Data Curation at QDR This paper describes the development of services and tools for scaling data curation services at the Qualitative Data Repository (QDR). Through a set of open-source tools, semi-automated workflows, and extensions to the Dataverse platform, our team has built services for curators to efficiently and effectively publish collections of qualitatively derived data. The contributions we seek to make in this paper are as follows: 1. We describe ‘human-in-the-loop’ curation and the tools that facilitate this model at QDR; 2. We provide an in-depth discussion of the design and implementation of these tools, including applications specific to the Dataverse software repository, as well as standalone archiving tools written in R; and 3. We highlight the role of providing a service layer for data discovery and accessibility of qualitative data. Keywords: Data curation; open-source; qualitative data From Text to Map: Combining Named Entity Recognition and Geographic Information Systems This tutorial shows readers how to leverage the power of named entity recognition (NER) and geographic information systems (GIS) to extract place names from text, geocode them, and create a public-facing map. This process is highly useful across disciplines. For example, it can be used to generate maps from historical primary sources, works of literature set in the real world, and corpora of academic scholarship. In order to lead the reader through this process, the authors work with a 500 article sample of the COVID-19 Open Research Dataset Challenge (CORD-19) dataset. As of the date of writing, CORD-19 includes 45,000 full-text articles with metadata. Using this sample, the authors demonstrate how to extract locations from the full-text with the spaCy library in Python, highlight methods to clean up the extracted data with the Pandas library, and finally teach the reader how to create an interactive map of the places using ArcGIS Online. The processes and code are described in a manner that is reusable for any corpus of text. Using Integrated Library Systems and Open Data to Analyze Library Cardholders The Harrison Public Library in Westchester County, New York operates two library buildings in Harrison: The Richard E. Halperin Memorial Library Building (the library’s main building, located in downtown Harrison) and a West Harrison branch location. As part of its latest three-year strategic plan, the library sought to use existing resources to improve understanding of its cardholders at both locations. To do so, we needed to link the circulation data in our integrated library system, Evergreen, to geographic data and demographic data. We decided to build a geodemographic heatmap that incorporated all three aforementioned types of data. Using Evergreen, American Community Survey (ACS) data, and Google Maps, we plotted each cardholder’s residence on a map, added census boundaries (called tracts) and our town’s borders to the map, and produced summary statistics for each tract detailing its demographics and the library card usage of its residents. In this article, we describe how we acquired the necessary data and built the heatmap.
We also touch on how we safeguarded the data while building the heatmap, which is an internal tool available only to select authorized staff members. Finally, we discuss what we learned from the heatmap and how libraries can use open data to benefit their communities. Update OCLC Holdings Without Paying Additional Fees: A Patchwork Approach Accurate OCLC holdings are vital for interlibrary loan transactions. However, over time weeding projects, replacing lost or damaged materials, and human error can leave a library with a catalog that is no longer reflected through OCLC. While OCLC offers reclamation services to bring poorly maintained collections up-to-date, the associated fee may be cost prohibitive for libraries with limited budgets. This article will describe the process used at Austin Peay State University to identify, isolate, and update holdings using OCLC Collection Manager queries, MarcEdit, Excel, and Python. Some portions of this process are completed using basic coding; however, troubleshooting techniques will be included for those with limited previous experience. Data reuse in linked data projects: a comparison of Alma and Share-VDE BIBFRAME networks This article presents an analysis of the enrichment, transformation, and clustering used by vendors Casalini Libri/@CULT and Ex Libris for their respective conversions of MARC data to BIBFRAME. The analysis considers the source MARC21 data used by Alma and then the enrichment and transformation of MARC21 data from Share-VDE partner libraries. The clustering of linked data into a BIBFRAME network is a key outcome of data reuse in linked data projects and fundamental to the improvement of the discovery of library collections on the web and within search systems. CollectionBuilder-CONTENTdm: Developing a Static Web ‘Skin’ for CONTENTdm-based Digital Collections Unsatisfied with customization options for CONTENTdm, librarians at the University of Idaho Library have been using a modern static web approach to creating digital exhibit websites that sit in front of the digital repository. This "skin" is designed to provide users with new pathways to discover and explore collection content and context. This article describes the concepts behind the approach and how it has developed into an open source, data-driven tool called CollectionBuilder-CONTENTdm. The authors outline the design decisions and principles guiding the development of CollectionBuilder, and detail how a version is used at the University of Idaho Library to collaboratively build digital collections and digital scholarship projects. Automated Collections Workflows in GOBI: Using Python to Scrape for Purchase Options The NC State University Libraries has developed a tool for querying GOBI, our print and ebook ordering vendor platform, to automate monthly collections reports. These reports detail purchase options for missing or long-overdue items, as well as popular items with multiple holds. GOBI does not offer an API, forcing staff to conduct manual title-by-title searches that previously took up to 15 hours per month. To make this process more efficient, we wrote a Python script that automates title searches and the extraction of key data (price, date of publication, binding type) from GOBI. This tool can gather data for hundreds of titles in half an hour or less, freeing up time for other projects. This article will describe the process of creating this script, as well as how it finds and selects data in GOBI.
It will also discuss how these results are paired with NC State’s holdings data to create reports for collection managers. Lastly, the article will examine obstacles that were experienced in the creation of the tool and offer recommendations for other organizations seeking to automate collections workflows. Testing remote access to e-resources with CodeceptJS At the Badische Landesbibliothek Karlsruhe (BLB) we offer a variety of e-resources with different access requirements. On the one hand, there is free access to open access material, no matter where you are. On the other hand, there are e-resources that you can only access when you are in the rooms of the BLB. We also offer e-resources that you can access from anywhere, but you must have a library account for authentication to gain access. To test the functionality of these access methods, we have created a project to automatically test the entire process from searching our catalogue, selecting a hit, logging in to the provider's site and checking the results. For this we use the End 2 End Testing Framework CodeceptJS. Editorial An abundance of information sharing. Leveraging Google Drive for Digital Library Object Storage This article will describe a process at the University of Kentucky Libraries for utilizing an unlimited Google Drive for Education account for digital library object storage. For a number of recent digital library projects, we have used Google Drive for both archival file storage and web derivative file storage. As a part of the process, a Google Drive API script is deployed in order to automate the gathering of Google Drive object identifiers. Also, a custom Omeka plugin was developed to allow for referencing web deliverable files within a web publishing platform via object linking and embedding. For a number of new digital library projects, we have moved toward a small VM approach to digital library management where the VM serves as a web front end but not a storage node. This has necessitated alternative approaches to storing web addressable digital library objects. One option is the use of Google Drive for storing digital objects. An overview of our approach is included in this article as well as links to open source code we adopted and more open source code we produced. Building a Library Search Infrastructure with Elasticsearch This article discusses our implementation of an Elastic cluster to address our search, search administration and indexing needs, how it integrates in our technology infrastructure, and finally takes a close look at the way that we built a reusable, dynamic search engine that powers our digital repository search. We cover the lessons learned with our early implementations and how to address them to lay the groundwork for a scalable, networked search environment that can also be applied to alternative search engines such as Solr. How to Use an API Management platform to Easily Build Local Web Apps Setting up an API management platform like DreamFactory can open up a lot of possibilities for potential projects within your library. With an automatically generated restful API, the University Libraries at Virginia Tech have been able to create applications for gathering walk-in data and reference questions, public polling apps, feedback systems for service points, data dashboards and more. This article will describe what an API management platform is, why you might want one, and the types of potential projects that can quickly be put together by your local web developer.
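The Elasticsearch abstract above describes building a reusable search layer that powers a digital repository. A minimal sketch of indexing and querying records over Elasticsearch's REST API, assuming a local node at http://localhost:9200 and a hypothetical index name rather than the authors' actual cluster, might look like this:

```python
"""Minimal sketch of indexing and searching repository records in
Elasticsearch via its REST API. The node URL, index name, and fields
are assumptions for illustration, not the article's configuration."""
import requests

ES = "http://localhost:9200"   # assumed local Elasticsearch node
INDEX = "digital-repository"   # hypothetical index name


def index_record(doc_id: str, record: dict) -> None:
    """Add or replace one document; refresh so it is searchable immediately."""
    r = requests.put(f"{ES}/{INDEX}/_doc/{doc_id}",
                     params={"refresh": "true"}, json=record)
    r.raise_for_status()


def search(text: str) -> list:
    """Run a simple match query against the title field and return the hits."""
    query = {"query": {"match": {"title": text}}}
    r = requests.post(f"{ES}/{INDEX}/_search", json=query)
    r.raise_for_status()
    return r.json()["hits"]["hits"]


if __name__ == "__main__":
    index_record("001", {"title": "Sanborn fire insurance maps", "year": 1912})
    print([hit["_source"]["title"] for hit in search("maps")])
```

The same query DSL can be sent from any client, which is one reason a thin REST-based layer like this ports reasonably well to other engines such as Solr.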
Git and GitLab in Library Website Change Management Workflows Library websites can benefit from a separate development environment and a robust change management workflow, especially when there are multiple authors. This article details how the Oakland University William Beaumont School of Medicine Library uses Git and GitLab in a change management workflow with a serverless development environment for their website development team. Git tracks changes to the code, allowing changes to be made and tested in a separate branch before being merged back into the website. GitLab adds features such as issue tracking and discussion threads to Git to facilitate communication and planning. Adoption of these tools and this workflow has dramatically improved the organization and efficiency of the OUWB Medical Library web development team, and it is the hope of the authors that by sharing our experience with them, others may benefit as well. Experimenting with a Machine Generated Annotations Pipeline The UCLA Library reorganized its software developers into focused subteams with one, the Labs Team, dedicated to conducting experiments. In this article we describe our first attempt at conducting a software development experiment, in which we attempted to improve our digital library’s search results with metadata from cloud-based image tagging services. We explore the findings and discuss the lessons learned from our first attempt at running an experiment. Leveraging the RBMS/BSC Latin Place Names File with Python To answer the relatively straightforward question “Which rare materials in my library catalog were published in Venice?” requires an advanced knowledge of geography, language, orthography, alphabet graphical changes, cataloging standards, transcription practices, and data analysis. The imprint statements of rare materials transcribe place names more faithfully as they appear on the piece itself, such as Venetus or Venetiae, rather than a recognizable and contemporary form of place name, such as Venice, Italy. Rare materials catalogers recognize this geographic discoverability and selection issue and solve it with a standardized solution. To add consistency and normalization to imprint locations, rare materials catalogers utilize hierarchical place names to create a special imprint index. However, this normalized and contemporary form of place name is often missing from legacy bibliographic records. This article demonstrates using a traditional rare materials cataloging aid, the RBMS/BSC Latin Place Names File, with programming tools, Jupyter Notebook and Python, to retrospectively populate a special imprint index for 17th-century rare materials. This methodology enriched 1,487 MAchine Readable Cataloging (MARC) bibliographic records with hierarchical place names (MARC 752 fields) as part of a small pilot project. This article details a partially automated solution to this geographic discoverability and selection issue; however, a human component is still ultimately required to fully optimize the bibliographic data. Tweeting Tennessee’s Collections: A Case Study of a Digital Collections Twitterbot Implementation This article demonstrates how a Twitterbot can be used as an inclusive outreach initiative that breaks down the barriers between the web and the reading room to share materials with the public. These resources include postcards, music manuscripts, photographs, cartoons and any other digitized materials.
Once in place, Twitterbots allow physical materials to converge with the technical and social space of the Web. Twitterbots are ideal for busy professionals because they allow librarians to make meaningful impressions on users without requiring a large time investment. This article covers the recent implementation of a digital collections bot (@UTKDigCollBot) at the University of Tennessee, Knoxville (UTK), and provides documentation and advice on how you might develop a bot to highlight materials at your own institution. Building Strong User Experiences in LibGuides with Bootstrapr and Reviewr With nearly fifty subject librarians creating LibGuides, the LibGuides Management Team at Notre Dame needed a way to both empower guide authors to take advantage of the powerful functionality afforded by the Bootstrap framework native to LibGuides, and to ensure new and extant library guides conformed to brand/identity standards and the best practices of user experience (UX) design. To accomplish this, we developed an online handbook to teach processes and enforce styles; a web app to create Twitter Bootstrap components for use in guides (Bootstrapr); and a web app to radically speed the review and remediation of guides, as well as better communicate our changes to guide authors (Reviewr). This article describes our use of these three applications to balance empowering guide authors against usefully constraining them to organizational standards for user experience. We offer all of these tools as FOSS under an MIT license so that others may freely adapt them for use in their own organization. IIIF by the Numbers The UCLA Library began work on building a suite of services to support IIIF for their digital collections. The services perform image transformations and delivery as well as manifest generation and delivery. The team was unsure about whether they should use local or cloud-based infrastructure for these services, so they conducted some experiments on multiple infrastructure configurations and tested them in scenarios with varying dimensions. Trust, But Verify: Auditing Vendor-Supplied Accessibility Claims Despite a long-overdue push to improve the accessibility of our libraries’ online presences, much of what we offer to our patrons comes from third party vendors: discovery layers, OPACs, subscription databases, and so on. We can’t directly affect the accessibility of the content on these platforms, but rely on vendors to design and test their systems and report on their accessibility through Voluntary Product Accessibility Templates (VPATS). But VPATs are self-reported. What if we want to verify our vendors’ claims? We can’t thoroughly test the accessibility of hundreds of vendor systems, can we? In this paper, we propose a simple methodology for spot-checking VPATs. Since most websites struggle with the same accessibility issues, spot checking particular success criteria in a library vendor VPAT can tip us off to whether the VPAT as a whole can be trusted. Our methodology combines automated and manual checking, and can be done without any expensive software or complex training. What’s more, we are creating a repository to share VPAT audit results with others, so that we needn’t all audit the VPATs of all our systems. 
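The "Trust, But Verify" abstract above proposes spot-checking a handful of success criteria rather than auditing an entire VPAT. A minimal sketch of the automated half of such a spot check, assuming a placeholder URL and only two illustrative checks (images without alt text and unlabelled form inputs), might look like this; a real audit would pair checks like these with manual testing:

```python
"""Minimal sketch of an automated accessibility spot check in the spirit of
the VPAT-auditing abstract above. The URL is a placeholder and the two checks
are illustrative, not a full WCAG evaluation."""
import requests
from bs4 import BeautifulSoup


def spot_check(url: str) -> dict:
    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")

    # WCAG 1.1.1: every <img> should carry an alt attribute (even if empty).
    missing_alt = [img for img in soup.find_all("img") if img.get("alt") is None]

    # WCAG 1.3.1 / 3.3.2: visible inputs should have a <label for=...> or aria-label.
    labelled_ids = {lab.get("for") for lab in soup.find_all("label")}
    unlabelled = [
        inp for inp in soup.find_all("input")
        if inp.get("type") not in ("hidden", "submit", "button")
        and inp.get("id") not in labelled_ids
        and not inp.get("aria-label")
    ]
    return {"images_missing_alt": len(missing_alt),
            "unlabelled_inputs": len(unlabelled)}


if __name__ == "__main__":
    print(spot_check("https://example.org/vendor-search"))  # placeholder URL
```

If a vendor's VPAT claims full support for these criteria and even a quick check like this finds violations, that is a signal the rest of the VPAT deserves closer scrutiny.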
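Similarly, the "From Text to Map" tutorial earlier in this listing extracts place names with spaCy before geocoding them. A minimal sketch of that first step, assuming the standard en_core_web_sm model (installed with `python -m spacy download en_core_web_sm`) and an invented sample sentence rather than the CORD-19 corpus the authors use, might look like this:

```python
"""Minimal sketch of the named entity recognition step from the
"From Text to Map" tutorial: pull place names out of free text with spaCy
so they can later be geocoded and mapped. Sample text is invented."""
from collections import Counter

import spacy

nlp = spacy.load("en_core_web_sm")


def extract_places(text: str) -> Counter:
    """Count GPE (countries, cities, states) and LOC entities in the text."""
    doc = nlp(text)
    return Counter(ent.text for ent in doc.ents if ent.label_ in {"GPE", "LOC"})


if __name__ == "__main__":
    sample = ("Samples were collected in Wuhan and later analysed by teams "
              "in Atlanta and Geneva.")
    print(extract_places(sample))  # e.g. Counter({'Wuhan': 1, 'Atlanta': 1, 'Geneva': 1})
```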
feeds-feedburner-com-7338 ---- None feeds-feedburner-com-7357 ---- None feeds-feedburner-com-7360 ---- None feeds-feedburner-com-7426 ---- None feeds-feedburner-com-7445 ---- None feeds-feedburner-com-7453 ---- Zotero Zotero Collect, organize, cite, and share your research Move Zotero Citations Between Google Docs, Word, and LibreOffice Last year, we added Google Docs integration to Zotero, bringing to Google Docs the same powerful citation functionality — with support for over 9,000 citation styles — that Zotero offers in Word and LibreOffice. Today we’re adding a feature that lets you move documents between Google Docs and Word or LibreOffice while preserving active Zotero citations. […] Retracted item notifications with Retraction Watch integration Zotero can now help you avoid relying on retracted publications in your research by automatically checking your database and documents for works that have been retracted. We’re providing this service in partnership with Retraction Watch, which maintains the largest database of retractions available, and we’re proud to help sustain their important work. How It Works […] Scan Books into Zotero from Your iPhone or iPad Zotero makes it easy to collect research materials with a single click as you browse the web, but what do you do when you want to add a real, physical book to your Zotero library? If you have an iPhone or iPad running iOS 12, you can now save a book to Zotero just by […] Zotero Comes to Google Docs We’re excited to announce the availability of Zotero integration with Google Docs, joining Zotero’s existing support for Microsoft Word and LibreOffice. The same powerful functionality that Zotero has long offered for traditional word processors is now available for Google Docs. You can quickly search for items in your Zotero library, add page numbers and other […] Improved PDF retrieval with Unpaywall integration As an organization dedicated to developing free and open-source research tools, we care deeply about open access to scholarship. With the latest version of Zotero, we’re excited to make it easier than ever to find PDFs for the items in your Zotero library. While Zotero has always been able to download PDFs automatically as you […] Introducing ZoteroBib: Perfect bibliographies in minutes We think Zotero is the best tool for almost anyone doing serious research, but we know that a lot of people — including many students — don’t need all of Zotero’s power just to create the occasional bibliography. Today, we’re introducing ZoteroBib, a free service to help people quickly create perfect bibliographies. Powered by the same technology […] Zotero 5.0.36: New PDF features, faster citing in large documents, and more The latest version of Zotero introduces some major improvements for PDF-based workflows, a new citing mode that can greatly speed up the use of the word processor plugin in large documents, and various other improvements and bug fixes. New PDF features Improved PDF metadata retrieval While the “Save to Zotero” button in the Zotero Connector […] Zotero 5.0 and Firefox: Frequently Asked Questions In A Unified Zotero Experience, we explained the changes introduced in Zotero 5.0 that affect Zotero for Firefox users. See that post for a full explanation of the change, and read on for some additional answers. What’s changing? 
Zotero 5.0 is available only as a standalone program, and Zotero 4.0 for Firefox is being replaced […] New Features for Chrome and Safari Connectors We are excited to announce major improvements to the Zotero Connectors for Chrome and Safari. Chrome The Zotero Connector for Chrome now includes functionality that was previously available only in Zotero for Firefox. Automatic Institutional Proxy Detection Many institutions provide a way to access electronic resources while you are off-campus by signing in to a […] A Unified Zotero Experience Since the introduction of Zotero Standalone in 2011, Zotero users have had two versions to choose from: the original Firefox extension, Zotero for Firefox, which provides deep integration into the Firefox user interface, and Zotero Standalone, which runs as a separate program and can be used with any browser. Starting with the release of Zotero […] feeds-feedburner-com-7472 ---- None feeds-feedburner-com-7538 ---- None feeds-feedburner-com-7642 ---- None feeds-feedburner-com-7745 ---- None feeds-feedburner-com-7753 ---- None feeds-feedburner-com-7770 ---- None feeds-feedburner-com-7775 ---- None feeds-feedburner-com-7879 ---- None feeds-feedburner-com-7884 ---- None feeds-feedburner-com-7912 ---- None feeds-feedburner-com-7967 ---- None feeds-feedburner-com-796 ---- None feeds-feedburner-com-8173 ---- None feeds-feedburner-com-8217 ---- None feeds-feedburner-com-8311 ---- None feeds-feedburner-com-8326 ---- None feeds-feedburner-com-8419 ---- None feeds-feedburner-com-8459 ---- None feeds-feedburner-com-8480 ---- Free Range Librarian Free Range Librarian K.G. Schneider's blog on librarianship, writing, and everything else (Dis)Association I have been reflecting on the future of a national association I belong to that has struggled with relevancy and with closing the distance between itself and its members, has distinct factions that differ on fundamental matters of values, faces declining national and chapter membership, needs to catch up on the technology curve, has sometimes […] I have measured out my life in Doodle polls You know that song? The one you really liked the first time you heard it? And even the fifth or fifteenth? But now your skin crawls when you hear it? That’s me and Doodle. In the last three months I have filled out at least a dozen Doodle polls for various meetings outside my organization. […] Memento DMV This morning I spent 40 minutes in the appointment line at the Santa Rosa DMV to get my license renewed and converted to REAL ID, but was told I was “too early” to renew my license, which expires in September, so I have to return after I receive my renewal notice. I could have converted […] An Old-Skool Blog Post I get up early these days and get stuff done — banking and other elder-care tasks for my mother, leftover work from the previous day, association or service work. A lot of this is writing, but it’s not writing. I have a half-dozen unfinished blog posts in WordPress, and even more in my mind. I […] Keeping Council Editorial note: Over half of this post was composed in July 2017. At the time, this post could have been seen as politically neutral (where ALA is the political landscape I’m referring to) but tilted toward change and reform. Since then, Events Have Transpired. I revised this post in November, but at the time hesitated […] What burns away We are among the lucky ones. We did not lose our home. We did not spend day after day evacuated, waiting to learn the fate of where we live. We never lost power or Internet. 
We had three or four days where we were mildly inconvenienced because PG&E wisely turned off gas to many neighborhoods, […] Neutrality is anything but “We watch people dragged away and sucker-punched at rallies as they clumsily try to be an early-warning system for what they fear lies ahead.” — Unwittingly prophetic me, March, 2016. Sometime after last November, I realized something very strange was happening with my clothes. My slacks had suddenly shrunk, even if I hadn’t washed them. After […] MPOW in the here and now I have coined a few biblioneologisms in my day, but the one that has had the longest legs is MPOW (My Place of Work), a convenient, mildly-masking shorthand for one’s institution. For the last four years I haven’t had the bandwidth to coin neologisms, let alone write about MPOW*. This silence could be misconstrued. I […] Questions I have been asked about doctoral programs About six months ago I was visiting another institution when someone said to me, “Oh, I used to read your blog, BACK IN THE DAY.” Ah yes, back in the day, that Pleistocene era when I wasn’t working on a PhD while holding down a big job and dealing with the rest of life’s shenanigans. […] A scholar’s pool of tears, Part 2: The pre in preprint means not done yet Note, for two more days, January 10 and 11, you (as in all of you) have free access to my article, To be real: Antecedents and consequences of sexual identity disclosure by academic library directors. Then it drops behind a paywall and sits there for a year. When I wrote Part 1 of this blog […] feeds-feedburner-com-8611 ---- None feeds-feedburner-com-861 ---- None feeds-feedburner-com-8646 ---- None feeds-feedburner-com-8690 ---- None feeds-feedburner-com-8727 ---- None feeds-feedburner-com-8744 ---- None feeds-feedburner-com-8789 ---- Hanging Together Hanging Together the OCLC Research blog Dutch round table session on next generation metadata: Think bigger than NACO and WorldCat Thanks to Ellen Hartman, OCLC, for translating the original English-language blog post. On March 8, 2021, a Dutch-language round table discussion was organized as part of the OCLC … The post Dutch round table session on next generation metadata: Think bigger than NACO and WorldCat appeared first on Hanging Together. Recognizing bias in research data – and research data management As the COVID pandemic grinds on, vaccinations are top of mind. A recent article published in JAMA Network Open examined whether vaccination clinical trials over the last decade adequately represented … The post Recognizing bias in research data – and research data management appeared first on Hanging Together. Accomplishments and priorities for the OCLC Research Library Partnership With 2021 well underway, the OCLC Research Library Partnership is as active as ever. We are heartened by the positive feedback and engagement our Partners have provided in response to … The post Accomplishments and priorities for the OCLC Research Library Partnership appeared first on Hanging Together. Dutch round table on next generation metadata: think bigger than NACO and WorldCat As part of the OCLC Research Discussion Series on Next Generation Metadata, this blog post reports back from the Dutch language round table discussion held on March 8, 2021. (A Dutch … The post Dutch round table on next generation metadata: think bigger than NACO and WorldCat appeared first on Hanging Together.
Third English round table on next generation metadata: investing in the utility of authorities and identifiers Thanks to George Bingham, UK Account Manager at OCLC, for contributing this post as part of the Metadata Series blog posts. As part of the OCLC Research Discussion Series on Next Generation Metadata, this blog post reports back … The post Third English round table on next generation metadata: investing in the utility of authorities and identifiers appeared first on Hanging Together. Spanish-language round table on next generation metadata: managing researcher identities is the top priority Many thanks to Francesc García Grimau, OCLC, for translating this blog post, which was originally in English. As part of the OCLC Research Discussion Series … The post Spanish-language round table on next generation metadata: managing researcher identities is the top priority appeared first on Hanging Together. Making strategic choices about library collaboration in RDM Academic libraries are responding to a host of disruptions – emerging technologies, changing user expectations, evolving research and learning practices, economic pressures, and of course, the COVID-19 pandemic. While these … The post Making strategic choices about library collaboration in RDM appeared first on Hanging Together. Spanish round table on next generation metadata: managing researcher identities is top of mind As part of the OCLC Research Discussion Series on Next Generation Metadata, this blog post reports back from the Spanish language round table discussion held on March 8, 2021. (A Spanish … The post Spanish round table on next generation metadata: managing researcher identities is top of mind appeared first on Hanging Together. French round table on next generation metadata: the challenge is managing multiple scales in concert Thanks to Arnaud Delivet, OCLC, for translating the original English-language article. This blog post reports back on the French-language round table organized by OCLC Research … The post French round table on next generation metadata: the challenge is managing multiple scales in concert appeared first on Hanging Together. German-language round table on next generation metadata: formats, contexts, and gaps Many thanks to Petra Löffel, OCLC, for translating this originally English-language blog post. As part of the discussion series on next generation metadata, this blog post reports back from the German round table … The post German-language round table on next generation metadata: formats, contexts, and gaps appeared first on Hanging Together.
feeds-feedburner-com-8820 ---- None feeds-feedburner-com-8867 ---- None feeds-feedburner-com-8983 ---- None feeds-feedburner-com-9113 ---- None feeds-feedburner-com-9292 ---- None feeds-feedburner-com-9338 ---- None feeds-feedburner-com-9428 ---- None feeds-feedburner-com-9453 ---- None feeds-feedburner-com-9570 ---- None feeds-feedburner-com-9678 ---- None feeds-feedburner-com-9885 ---- None feeds-feedburner-com-9931 ---- None feeds-feedburner-com-9997 ---- None feeds-fiander-info-4620 ---- Rapid Communications Rapid Communications Rapid, but irregular, communications from the frontiers of Library Technology Mac OS vs Emacs: Getting on the right (exec) PATH Finding ISBNs in the the digits of π Software Upgrades and The Parable of the Windows Using QR Codes in the Library A Manifesto for the Library I'm a Shover and Maker! LITA Tears Down the Walls A (Half) Year in Books The Desk Set Drinking Game July Book a Month Challenge: Independence June Book a Month Challenge: Knowledge Anthony Hope and the Triumph of the Public Domain May Book a Month Challenge: Mother Eric S. Raymond on Proprietary ILSs One Big Library Unconference in Toronto April Book A Month Challenge: Beauty Thinking About Dates on To-Do List Web Sites The Most Important Programming Language I've Learned Building Systems that Support Librarians Book A Month Challenge for March: Craft Social Aggregators On Keeping a Reading Journal BAM Challenge: Heart Where the Users Are My Top Technology Trends Slides feeds-pinboard-in-8651 ---- Pinboard (items tagged code4lib) https://pinboard.in/t:code4lib/ (400) https://twitter.com/rudokemper/status/1371454887721119748/photo/1 2021-03-23T15:36:56+00:00 https://twitter.com/rudokemper/status/1371454887721119748/photo/1 bsscdt RT @rudokemper: Floored and honored to have been invited to give a keynote for the #c4l21 #code4lib conference next Monday. I can't wait to share about our work building open-source tech for communities to map oral histories, and how my journey started in the library + archive space! @code4lib c4l21 code4lib https://twitter.com/ https://pinboard.in/u:bsscdt/b:393a9fefac65/ Untitled (https://d1keuthy5s86c8.cloudfront.net/static/ems/upload/files/code4lib21_discogs_blacklight.pdf) 2021-03-23T05:00:37+00:00 https://d1keuthy5s86c8.cloudfront.net/static/ems/upload/files/code4lib21_discogs_blacklight.pdf rybesh RT @sf433: Really happy to share, “Dynamic Integration of Discogs Data within a Blacklight Catalog” From now on I’m going to ask myself, “Can this talk be a poster?” #code4lib code4lib https://twitter.com/ https://pinboard.in/u:rybesh/b:731426d5f14f/ The Code4Lib Journal – Advancing ARKs in the Historical Ontology Space 2021-03-10T18:19:28+00:00 https://journal.code4lib.org/articles/15608 geephroh code4lib digitallibraries digitalpreservation data ontology identifiers digitalhumanities ark computationalarchivalscience cas archives journalarticle https://pinboard.in/ https://pinboard.in/u:geephroh/b:60093e26caf8/ The Code4Lib Journal – Managing an institutional repository workflow with GitLab and a folder-based deposit system 2021-02-16T00:56:37+00:00 https://journal.code4lib.org/articles/15650 aarontay Managing an institutional repository workflow with GitLab and a folder-based deposit system by Whitney R. Johnson-Freeman, @vphill, and Kristy K. Phillips #code4lib Journal issue 50. 
code4lib https://twitter.com/ https://pinboard.in/u:aarontay/b:95dfc9c36cda/ LISTSERV 16.5 - CODE4LIB Archives 2020-09-29T12:04:57+00:00 https://lists.clir.org/cgi-bin/wa?A2=CODE4LIB;e2bc9365.2009 miaridge RT @kiru: I forgot to post the call earlier: The Code4Lib Journal () is looking for volunteers to join its editorial committee. Deadline: 12 Oct. #code4lib code4lib https://twitter.com/ https://pinboard.in/u:miaridge/b:e26e92731fb6/ 20 - C4L [5] Future Role of Libraries in Researcher Workflows - Google Slides 2020-03-11T00:13:42+00:00 https://t.co/JCoE2mVhD5 elibtronic research-lifecycle code4lib publish scholarly-communication https://pinboard.in/u:elibtronic/b:7282952b4f7a/ Twitter 2020-02-18T09:20:53+00:00 https://twitter.com/i/web/status/1229697282284625920 aarontay New issue of the The #Code4Lib Journal published. Some terrific looking papers, including a review of PIDs for heri… Code4Lib https://twitter.com/ https://pinboard.in/u:aarontay/b:8525b50b475d/ (500) https://journal.code4lib.org/ 2020-02-18T08:24:34+00:00 https://journal.code4lib.org/ miaridge RT @kiru: I am very happy to announce the publication of the @Code4Lib Journal issue #47: webscraping… 47 code4lib https://twitter.com/ https://pinboard.in/u:miaridge/b:8f5c33d4d11c/ The Code4Lib Journal – COLUMN: We Love Open Source Software. No, You Can’t Have Our Code 2019-12-09T23:24:08+00:00 https://journal.code4lib.org/articles/527 pfhyper Librarians are among the strongest proponents of open source software. Paradoxically, libraries are also among the least likely to actively contribute their code to open source projects. This article identifies and discusses six main reasons this dichotomy exists and offers ways to get around them. Code4Lib library LIBT opensource finalproject https://pinboard.in/ https://pinboard.in/u:pfhyper/b:4da9d5a48b61/ The Code4Lib Journal – Barriers to Initiation of Open Source Software Projects in Libraries 2019-12-09T23:20:43+00:00 https://journal.code4lib.org/articles/10665 pfhyper Libraries share a number of core values with the Open Source Software (OSS) movement, suggesting there should be a natural tendency toward library participation in OSS projects. However Dale Askey’s 2008 Code4Lib column entitled “We Love Open Source Software. No, You Can’t Have Our Code,” claims that while libraries are strong proponents of OSS, they are unlikely to actually contribute to OSS projects. He identifies, but does not empirically substantiate, six barriers that he believes contribute to this apparent inconsistency. In this study we empirically investigate not only Askey’s central claim but also the six barriers he proposes. In contrast to Askey’s assertion, we find that initiation of and contribution to OSS projects are, in fact, common practices in libraries. However, we also find that these practices are far from ubiquitous; as Askey suggests, many libraries do have opportunities to initiate OSS projects, but choose not to do so. Further, we find support for only four of Askey’s six OSS barriers. Thus, our results confirm many, but not all, of Askey’s assertions. Code4Lib library LIBT opensource finalproject https://pinboard.in/ https://pinboard.in/u:pfhyper/b:74f337d2e129/ Twitter 2019-11-07T05:59:14+00:00 https://twitter.com/i/web/status/1191993029948780545 jbfink RT @kiru: The #Code4Lib Journal's issue 46 (2019/4) has been just published: . 
Worldcat Search API, Go… Code4Lib https://twitter.com/ https://pinboard.in/u:jbfink/b:d0cd0f6754e5/ Twitter 2019-11-01T15:40:51+00:00 https://twitter.com/i/web/status/1190292574008987648 jbfink RT @mjingle: Who's excited for the next #code4lib conference?! It will be in Pittsburgh, PA from March 8-11. Is your org interes… code4lib https://twitter.com/ https://pinboard.in/u:jbfink/b:14defc6eb027/ Attempto Project 2019-09-13T09:31:25+00:00 http://attempto.ifi.uzh.ch/site/ blebo nlp basic cnl computationalLinguistics controlledLanguage controlled_language code4lib compsci english knowledgeRepresentation https://pinboard.in/u:blebo/b:5a5b84f3a2fd/ Twitter 2019-08-22T22:54:45+00:00 https://twitter.com/i/web/status/1164566585371066368 danbri When our grandchildren ask about the Great #code4lib IRC Battle of the Tisane, we will serve them both tea and coff… code4lib https://twitter.com/ https://pinboard.in/u:danbri/b:3ce9a224628e/ Code4Lib 2019 Recap – bloggERS! 2019-07-23T17:38:41+00:00 https://saaers.wordpress.com/2019/04/02/code4lib-2019-recap/ geephroh code4lib digitallibraries research saa archives https://pinboard.in/ https://pinboard.in/u:geephroh/b:232421afd001/ Digital Technologies Development Librarian | NC State University Libraries 2019-07-09T15:54:39+00:00 https://www.lib.ncsu.edu/jobs/ehra/dtdl2019 cdmorris We're hiring a Digital Technologies Development Librarian @ncsulibraries ! #job #libjobs #code4lib #dlf #libtech dlf libtech code4lib job libjobs https://twitter.com/ https://pinboard.in/u:cdmorris/b:cf25e0f15239/ Twitter 2019-07-03T13:01:26+00:00 https://twitter.com/i/web/status/1146403575649787904 jbfink 3) All the men who want to preserve the idea of a #Code4Lib discussion space as one that's free of such topics as s… Code4Lib https://twitter.com/ https://pinboard.in/u:jbfink/b:d2f274738572/ Google Refine cheat sheet (code4lib) 2019-05-31T23:23:19+00:00 https://code4libtoronto.github.io/2018-10-12-access/GoogleRefineCheatSheets.pdf Psammead openRefine code4lib how-to cheatsheet https://pinboard.in/ https://pinboard.in/u:Psammead/b:d34452c7d709/ Untitled (https://www.youtube.com/watch?v=ICbLVnCHpnw) 2019-05-31T19:41:08+00:00 https://www.youtube.com/watch?v=ICbLVnCHpnw cdmorris Code4Lib Southeast happening today! Live stream starting at 9:30am eastern. 
#code4libse2019 #code4lib code4libse2019 code4lib https://twitter.com/ https://pinboard.in/u:cdmorris/b:d06090cf849c/ Twitter 2019-04-12T16:27:34+00:00 https://twitter.com/i/web/status/1116739648724897792 lbjay It occurs to me the #code4lib statement of support for Chris Bourg, , offers a better model… code4lib https://twitter.com/ https://pinboard.in/u:lbjay/b:d8424d01c06f/ GitHub - code4lib/c4l18-keynote-statement: Code4Lib Community Statement in Support of Chris Bourg 2019-04-12T16:27:34+00:00 https://github.com/code4lib/c4l18-keynote-statement lbjay It occurs to me the #code4lib statement of support for Chris Bourg, , offers a better model… code4lib https://twitter.com/ https://pinboard.in/u:lbjay/b:80b4ef487c08/ Twitter 2019-03-01T18:42:32+00:00 https://twitter.com/i/web/status/1101553322773770240 jbfink Now that the #code4lib Discord is up & running, I'm contemplating leaving Slack overall, with exception for plannin… code4lib https://twitter.com/ https://pinboard.in/u:jbfink/b:c5d0f0ddd90d/ (429) https://twitter.com/palcilibraries/status/1098658932589965312/photo/1 2019-02-22T03:01:16+00:00 https://twitter.com/palcilibraries/status/1098658932589965312/photo/1 cdmorris Talking privacy and RA21 at #c4l19 with Dave Lacy from @TempleLibraries #code4lib c4l19 code4lib https://twitter.com/ https://pinboard.in/u:cdmorris/b:9f144c1c99f8/ SCOPE: An access interface for DIPs from Archivematica 2019-02-21T00:55:32+00:00 https://github.com/CCA-Public/dip-access-interface sdellis archives code4lib https://pinboard.in/ https://pinboard.in/u:sdellis/b:1489ef99d5c6/ Review, Appraisal and Triage of Mail (RATOM) 2019-02-21T00:48:05+00:00 http://ratom.web.unc.edu/ sdellis archives code4lib https://pinboard.in/ https://pinboard.in/u:sdellis/b:5cdd23154090/ National Web Privacy Forum - MSU Library | Montana State University 2019-02-20T21:36:08+00:00 http://www.lib.montana.edu/privacy-forum/ sdellis privacy analytics code4lib https://pinboard.in/ https://pinboard.in/u:sdellis/b:0b1957db96e2/ The Code4Lib Journal 2019-01-16T14:25:26+00:00 https://journal.code4lib.org/ ratledge Code4lib Library_Technology Journal Journals_Code4Lib https://pinboard.in/ https://pinboard.in/u:ratledge/b:8a9f4c764b97/ Code4Lib | We are developers and technologists for libraries, museums, and archives who are dedicated to being a diverse and inclusive community, seeking to share ideas and build collaboration. 2018-12-05T14:35:01+00:00 https://code4lib.org/ ratledge Code4lib https://pinboard.in/ https://pinboard.in/u:ratledge/b:113cfc93ccb3/ Twitter 2018-11-15T09:00:52+00:00 https://twitter.com/i/web/status/1062993826913099781 verwinv Ne'er had the pleasure to attend #Code4lib myself ... but if you're thinking about it but can't afford to go - ther… Code4lib https://twitter.com/ https://pinboard.in/u:verwinv/b:f42046813ceb/ Twitter 2018-07-26T23:19:49+00:00 https://twitter.com/justindlc/status/1022612508979355649/photo/1 LibrariesVal RT @justindlc: Pre-conference meetup at Ormsby's for Code4Lib Southeast 2018! #code4libse2018 #code4lib code4lib code4libse2018 https://twitter.com/ https://pinboard.in/u:LibrariesVal/b:465c39ad24b0/ Twitter 2018-05-26T00:10:08+00:00 https://twitter.com/i/web/status/1000167164471529477 jbfink Thanks @lydia_zv @redlibrarian and Jolene (are you on Twitter, I can find you?) for a great #code4lib day! 
It was… code4lib https://twitter.com/ https://pinboard.in/u:jbfink/b:64faa19e4bad/ Twitter 2018-05-11T14:55:10+00:00 https://twitter.com/i/web/status/994954070707273728 jbfink My slides and speakers notes from #code4lib #c4ln18 on Ursula Franklin's "Real World of Technology" (which I really… code4lib c4ln18 https://twitter.com/ https://pinboard.in/u:jbfink/b:a2ed9a40fc54/ Twitter 2018-05-10T09:27:02+00:00 https://twitter.com/i/web/status/994509105006956544 jbfink In an unfortunate timing, it appears the code4lib wiki is down the first day of #code4lib North - there's a cache o… code4lib https://twitter.com/ https://pinboard.in/u:jbfink/b:099edcfb623c/ Twitter 2018-05-08T11:57:24+00:00 https://twitter.com/i/web/status/993775574291230720 jbfink RT @kiru: Just off the (word)press: the #Code4Lib Journal issue 40 is available: . Great articles writ… Code4Lib https://twitter.com/ https://pinboard.in/u:jbfink/b:db3c0bb083a8/ The Code4Lib Journal 2018-05-08T11:57:24+00:00 http://journal.code4lib.org/ jbfink RT @kiru: Just off the (word)press: the #Code4Lib Journal issue 40 is available: . Great articles writ… Code4Lib https://twitter.com/ https://pinboard.in/u:jbfink/b:9be441405213/ Twitter 2018-03-22T18:28:45+00:00 https://twitter.com/GitWishes/status/976754075164438528 lbjay this is all of #code4lib working on @bot4lib circa 2012. code4lib https://twitter.com/ https://pinboard.in/u:lbjay/b:3e91697b52b4/ Twitter 2018-03-19T19:07:46+00:00 https://twitter.com/gmcharlt/status/975810223842713601 danbri This is fabulous news for the cultural heritage open source world. Big ups to @code4lib and @CLIRDLF! #code4lib code4lib https://twitter.com/ https://pinboard.in/u:danbri/b:8cbe22ff2f58/ Twitter 2018-03-11T20:39:02+00:00 https://twitter.com/i/web/status/972894218237743105 miaridge RT @achdotorg: We too co-sign the #code4lib Community Statement in Support of @mchris4duke. We continue to admire an honor our col… code4lib https://twitter.com/ https://pinboard.in/u:miaridge/b:cf4f6d5494e3/ code4lib/c4l18-keynote-statement: Code4Lib Community Statement in Support of Chris Bourg 2018-03-10T00:33:33+00:00 https://github.com/code4lib/c4l18-keynote-statement jbfink code4lib github https://pinboard.in/ https://pinboard.in/u:jbfink/b:12610b3f6bd6/ Code4Lib Community Statement in Support of Chris Bourg | c4l18-keynote-statement 2018-03-09T22:50:57+00:00 https://code4lib.github.io/c4l18-keynote-statement/ wragge RT @CLIRDLF: We’re proud to stand with the #code4lib community in support of #c4l18 keynoter @mchris4duke: code4lib c4l18 https://twitter.com/ https://pinboard.in/u:wragge/b:d81e2b3e7158/ Matthew Reidsma : Auditing Algorithms 2018-02-20T16:41:34+00:00 https://matthew.reidsrow.com/talks/206 malantonio
    Talks about libraries, technology, and the Web by Matthew Reidsma.
    algorithms bias search libraries technology code4lib code4lib-2018 https://pinboard.in/u:malantonio/b:7dd04c469f56/ For the love of baby unicorns: My Code4Lib 2018 Keynote | Feral Librarian 2018-02-19T17:49:48+00:00 https://chrisbourg.wordpress.com/2018/02/14/for-the-love-of-baby-unicorns-my-code4lib-2018-keynote/ petej code4lib diversity technology libraries inclusion mansplaining https://pinboard.in/ https://pinboard.in/u:petej/b:18d1e6f30875/ JIRA for archives - Google Slides 2018-02-15T14:37:38+00:00 https://docs.google.com/presentation/d/1uwYWg04-nT6Qjm-j5HAAvsoH88iKzUCAX0eFBNLcy34/edit#slide=id.g306a7ccaec_0_0 malantonio see https://youtu.be/4cNo3SERnXI?t=1h45m28s for presentation code4lib code4lib-2018 libraries work-life https://pinboard.in/u:malantonio/b:5fc7b215e268/ Twitter 2018-02-07T10:53:30+00:00 https://twitter.com/justin_littman/status/960859481914605568/photo/1 aarontay RT @justin_littman: Peer review of my #code4lib poster on "Where to get Twitter data for academic research." code4lib https://twitter.com/ https://pinboard.in/u:aarontay/b:c54955c97e7d/ Availability Calendar - Kalorama Guest House 2018-01-16T17:55:55+00:00 https://secure.rezovation.com/Reservations/AvailabilityCalendar.aspx?s=UT57fw2WiD skorasaurus KALORAMA GUEST HOUSE CODE4LIB https://pinboard.in/ https://pinboard.in/u:skorasaurus/b:10f300ea6594/ (429) https://twitter.com/i/web/status/941746243352563712 2017-12-20T21:22:21+00:00 https://twitter.com/i/web/status/941746243352563712 DocDre RT @nowviskie: ICYMI: #Code4Lib 2018 registration is open! @mmsubram & @mchris4duke to keynote, reception in the Great Hall… Code4Lib https://twitter.com/ https://pinboard.in/u:DocDre/b:9e19136f92cb/ (429) https://twitter.com/freethefiles/status/938843684572889090/photo/1 2017-12-07T18:52:31+00:00 https://twitter.com/freethefiles/status/938843684572889090/photo/1 verwinv Yay! I'm presenting at #code4lib. And I can say hello to Walter Forsberg, @hbmcd4 and @cristalyze! code4lib https://twitter.com/ https://pinboard.in/u:verwinv/b:86bf904d3371/ (429) https://twitter.com/i/web/status/938488557911576576 2017-12-06T19:21:23+00:00 https://twitter.com/i/web/status/938488557911576576 verwinv Registration for #code4lib is now open! And its being held in #WashingtonDC where our #MemoryLab is - so come visit… WashingtonDC code4lib MemoryLab https://twitter.com/ https://pinboard.in/u:verwinv/b:19769bc2fa8c/ code4lib 2018 - Washington, D.C. 2017-11-13T23:02:58+00:00 http://2018.code4lib.org/ verwinv Last day to vote #code4lib 2018 program! don't forget 😓! code4lib https://twitter.com/ https://pinboard.in/u:verwinv/b:1efcaa1db5a7/ 2018 Presentation Voting Survey 2017-10-23T19:49:45+00:00 https://www.surveymonkey.com/r/c4l2018-presentations verwinv vote #code4lib proposals rather than the presenters. new anonymity feature! check it: Got until 11/13 code4lib https://twitter.com/ https://pinboard.in/u:verwinv/b:81a15e672b49/ LODLAM Challenge Winners 2017-06-29T14:06:06+00:00 https://summit2017.lodlam.net/2017/06/29/lodlam-challenge-winners/ miaridge RT @LODLAM: #LODLAM Challenge prize winners congrats to DIVE+ (Grand) & WarSampo (Open data) teams #DH #musetech #code4lib DH musetech LODLAM code4lib https://twitter.com/ https://pinboard.in/u:miaridge/b:c6429902bd26/ JobBoard 2017-05-11T16:33:41+00:00 https://jobs.code4lib.org/ lbjay Some heroes don't wear capes, y'all. 
back online and and better than ever thanks to @ryanwick and @_cb_ #code4lib code4lib https://twitter.com/ https://pinboard.in/u:lbjay/b:a7f06f02b03e/ Digital Technologies Development Librarian | NCSU Libraries 2017-05-08T12:56:52+00:00 https://www.lib.ncsu.edu/jobs/ehra/digital-technologies-development-librarian jbfink RT @ronallo: Job opening: Digital Technologies Development Librarian @ncsulibraries #code4lib #libtechwomen Know someone? libtechwomen code4lib https://twitter.com/ https://pinboard.in/u:jbfink/b:3a5951bff6fd/ Who's Using IPFS in Libraries, Archives and Museums - Communities / Libraries, Archives and Museums - discuss.ipfs.io 2017-04-19T20:32:44+00:00 https://discuss.ipfs.io/t/whos-using-ipfs-in-libraries-archives-and-museums/130 sdellis career ipfs libraries code4lib https://pinboard.in/ https://pinboard.in/u:sdellis/b:df848f7bc65b/ Scott W. H. Young on Twitter: "Slides for my talk on participatory design with underrepresented populations. Thank you, #c4l17 :) https://t.co/rVS2Zdv25u" 2017-04-02T17:47:47+00:00 https://twitter.com/hei_scott/status/839523334744236033 brainwane refers to my Code4Lib keynote on empathy & UX yay Code4Lib https://pinboard.in/ https://pinboard.in/u:brainwane/b:4c2ef624cde4/ Twitter 2017-03-23T12:45:58+00:00 https://twitter.com/i/web/status/844892979965890560 lbjay Have not read the full report but based on the abstract seems useful to those involved in the #code4lib incorporati… code4lib https://twitter.com/ https://pinboard.in/u:lbjay/b:96f82f0b17b3/ ResistanceIsFertile - Google Drive 2017-03-09T18:06:41+00:00 https://drive.google.com/drive/folders/0B74oOQcTdnHjMy1WN003ZW5HTXc pmhswe code4lib harlow keynote https://pinboard.in/u:pmhswe/b:1760658453c2/ ResistanceIsFertile - Google Drive 2017-03-09T17:22:43+00:00 https://drive.google.com/drive/folders/0B74oOQcTdnHjMy1WN003ZW5HTXc markpbaggett code4lib harlow keynote https://pinboard.in/ https://pinboard.in/u:markpbaggett/b:cffeeb1e58e6/ Google Drive CMS 2017-03-09T16:08:39+00:00 https://www.drivecms.xyz/ jju webdev programming tech 2017 Code4Lib https://pinboard.in/u:jju/b:f9af0e34a8a0/ Code4Lib | Docker Presentation - Google Slides 2017-03-08T19:48:57+00:00 https://docs.google.com/presentation/d/12P1pR3p67dXIKXJWE5_sHa-RSktax-hzquo-Ffz-TH0/edit#slide=id.p markpbaggett code4lib docker https://pinboard.in/ https://pinboard.in/u:markpbaggett/b:bd340aec487e/ Best Catalog Results Page Ever 2017-03-08T19:05:25+00:00 https://www.dropbox.com/s/jbxe4jpbdck874z/deibel-c4l17-best-ever.pptx markpbaggett code4lib accessibility presentation https://pinboard.in/ https://pinboard.in/u:markpbaggett/b:56f2b0fea47a/ Participatory User Experience Design with Underrepresented Populations: A Model for Disciplined Empathy 2017-03-08T18:09:13+00:00 http://2017.code4lib.org/talks/Participatory-User-Experience-Design-with-Underrepresented-Populations-A-Model-for-Disciplined-Empathy brainwane Am honored & humbled to see #c4l17 Glad my talk/article was helpful! Wish I were at #code4lib to thank you in person c4l17 code4lib https://twitter.com/ https://pinboard.in/u:brainwane/b:9bf7ebd61d5d/ Twitter 2017-02-01T14:02:05+00:00 https://twitter.com/i/web/status/826792743464792065 bsscdt Why don't you join us in the #libux slack? 
Sign yourself up: #litaux #ux #code4lib… ux libux litaux code4lib https://twitter.com/ https://pinboard.in/u:bsscdt/b:961f3bd08a75/ Untitled (http://libux.co/slack?utm_content=buffer0f822&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer) 2017-02-01T14:02:05+00:00 http://libux.co/slack bsscdt Why don't you join us in the #libux slack? Sign yourself up: #litaux #ux #code4lib… ux libux litaux code4lib https://twitter.com/ https://pinboard.in/u:bsscdt/b:839a04bf9612/ Twitter 2016-12-23T18:11:41+00:00 https://twitter.com/jschneider/status/812360040082456576/photo/1 jcarletonoh Ten Principles for User Protection: #code4lib #privacy #ISCHOOLUI ISCHOOLUI privacy code4lib https://twitter.com/ https://pinboard.in/u:jcarletonoh/b:3bf57dea160b/ Technology in Hostile States: Ten Principles for User Protection | The Tor Blog 2016-12-23T18:11:41+00:00 https://blog.torproject.org/blog/technology-hostile-states-ten-principles-user-protection jcarletonoh Ten Principles for User Protection: #code4lib #privacy #ISCHOOLUI ISCHOOLUI privacy code4lib https://twitter.com/ https://pinboard.in/u:jcarletonoh/b:729712aebf8a/ Analyzing MARC with MicroXPath, part 1 - U. Ogbuji on the 1s & 2sies 2016-11-03T18:45:01+00:00 http://uogbuji.tumblr.com/post/152693143951/analyzing-marc-with-microxpath-part-1#_=_ uche Analyzing MARC with MicroXPath, part 1 #XML #XPath #libraries #code4lib XPath XML libraries code4lib https://twitter.com/ https://pinboard.in/u:uche/b:18f29fca2a83/ Library Technology Jobs 2016-11-03T12:34:08+00:00 http://librarytechnology.org/jobs/ jbfink RT @yo_bj: 2/2 For the #code4lib, #lita, and #mashcat crowds, keep an eye out on for #libtech jobs. libtech mashcat lita code4lib https://twitter.com/ https://pinboard.in/u:jbfink/b:ad699235bba2/ 2017 Keynote Speakers Nominations - Code4Lib 2016-10-11T16:10:19+00:00 http://wiki.code4lib.org/2017_Keynote_Speakers_Nominations verwinv Do you know who should keynote #Code4Lib 2017? Help us out: #c4l17 Code4Lib c4l17 https://twitter.com/ https://pinboard.in/u:verwinv/b:fe39b74c928e/ Library of Congress LCCN Permalink sh2016001442 2016-09-15T13:02:28+00:00 https://lccn.loc.gov/sh2016001442 anneheathen RT @JulieSwierczek: #code4lib #c4l16 - "Black Lives Matter movement" is now a SUBJECT HEADING. . Catalogers, make sure you USE IT! c4l16 code4lib https://twitter.com/ https://pinboard.in/u:anneheathen/b:3de84d0358ff/ femsom-org-3527 ---- Hawa Feminist Coalition – Coalition of Young Feminists in Somalia Skip to content News Home     |     Hawa Feminist Coalition Coalition of Young Feminists in Somalia EMAIL info@femsom.org CALL NOW 907 483965 Donate Menu Home About Us Our Vision Our Mission Our Team Our Members Our Governance Structure Our Work Advocacy & Awareness Rising Leadership Development & Empowerment Collective Action and Feminist Movement Building Publications Blog Join Us Contact Us Coalition of young feminists working to promote the safety, equality, justice, rights and dignity of girls and young women in Somalia. Join Us! We strive to providing a brighter future for our sisters in Somalia. Join Us! We do mobilize collective action and meaningful ways of working with each other…. Join Us! 
Home Who We Are About Us Hawa Feminist Coalition was founded by young feminists, all under the age of 35, in 2018 with the aim of promoting the safety, equality, justice, rights and dignity of girls and young women in Somalia, where women and girls bear an unequal brunt of hardships exacerbated by poverty, conflict, and religious and cultural limitations that promote strict male authority. Read More Our Vision Somalia where gender equality is achieved and women and girls enjoy all their rights and live in dignity. Read More Our Mission Mobilisation of Somali young women and girls for the achievement of gender equality and the realisation of women’s and girls’ rights at all levels, so that they enjoy all their rights and live in dignity. Read More We are a coalition of young feminists, all under the age of 35, standing for the promotion of the safety, equality, justice, rights and dignity of girls and young women in Somalia. JOIN US! If you are interested in being part of a collective feminist movement, join us! What We Do Collective Action & Feminist Movement Building We mobilize and strengthen feminist-based collective actions and practice meaningful ways of working with each other to be visible, strong and diverse enough to result in concrete and sustainable change in achieving gender equality in Somalia. Read More Leadership Development & Empowerment We provide capacity development and empowerment for our members and feminist grassroots groups to build stronger grassroots movements with the confidence, information, skills and strategies they need to bring dramatic changes in norms, laws, policies and practices toward achieving gender equality in Somalia. Read More Advocacy & Awareness Rising We use the influence of art, music, culture, poetry, social media, and feminist activism to promote the safety, equality, justice, rights and dignity of girls, young women and other marginalized groups. Read More latest news Statement: Hawa Feminist Coalition condemns the killing of two sisters in Mogadishu Hawa Feminist Coalition condemns the deaths of Fahdi Adow Abdi and Faiza Adow Abdi in Mogadishu on the night of April 22, 2021, after a mortar landed in their house ... Read More Hawa Feminist Coalition advocates the promotion of sex-disaggregated data in commemoration of Open Data Day 2021 In commemoration of Open Data Day, an annual celebration held all over the world in the first week of March every year, Hawa Feminist Coalition organized an online ... Read More Join us in promoting sex-disaggregated data in commemoration of Open Data Day 2021 Open Data Day is an annual celebration of open data all over the world. Groups from around the world create local events on the day where they will use open ... Read More Awareness raising on the rise of domestic violence amid the COVID-19 health crisis in Puntland Gender-based violence (GBV) increases during every type of emergency – whether economic crises, conflict or disease outbreaks. Pre-existing toxic social norms and gender inequalities, economic and social stress caused by ... Read More STATEMENT: Hawa Feminist Coalition condemns arbitrary arrest of Daljir journalists for reporting on GBV cases We understand that reporting on these topics is a difficult task, and we appreciate the media’s commitment to doing so with integrity. We strongly condemn the arbitrary arrest of these journalists ... Read More STATEMENT: Call for Immediate Action on the Rape and Murder of an Innocent Girl in Bosaso Very Sad!
The body of a young girl who had been horrifically murdered was found lying on a Bosaso street. The young girl appears to have been raped and then murdered, her face beaten terribly, ... Read More Our Team Jawahir A. Mohamed Executive Director Mariam M. Hussein Head of Operations Kowsar Abdisalam Guled Head of Programs Linda S. Mohamed Membership Officer Learn More Our Partners Who we are Coalition of Young Feminists in Somalia, all under the age of 35, working to promote the safety, equality, justice, rights and dignity of girls and young women in Somalia. Our Office Hawa Feminist Coalition HQ Laanta Hawada, Airport Road 500 Bosaso Puntland Somalia Email: info@femsom.org Tel: +252 907 483965 Join Us! If you are a young girl under the age of 35 interested in being part of a collective feminist movement working to promote the safety, equality, justice, rights and dignity of girls and young women in Somalia, join us. forum2021-diglib-org-4568 ---- Home - DLF Forum 2021 Skip to content Home About Code of Conduct CoC Reporting Form Thank You Resources News Affiliated Events Learn@DLF NDSA’s #DigiPres21 Sponsors Sponsorship Opportunities Registration CFP A world-class marketplace of ideas for digital GLAM practitioners since 1999 What's the DLF Forum? DLF programs stretch year-round, but we are perhaps best known for our signature event, the annual DLF Forum. The DLF Forum welcomes digital library, archives, and museum practitioners from member institutions and beyond—for whom it serves as a meeting place, marketplace, and congress. Learn about the event and plan to attend Attend our Affiliated Events! NDSA's Digital Preservation 2021 November 4 Digital Preservation is the annual conference of the National Digital Stewardship Alliance. DigiPres is expected to be a crucial venue for intellectual exchange, community-building, development of best practices, and national-level agenda-setting in the field. Learn@DLF November 8-10 Now in its fourth year, Learn@DLF returns in 2021 with engaging, hands-on sessions where attendees will gain experience with new tools and resources, exchange ideas, and develop and share expertise with fellow community members, as well as short tutorials about specific tools, techniques, workflows, or concepts. Make an Impact! Sponsor the DLF Forum and NDSA's #DigiPres21 What makes the DLF Forum great? After nearly 14 years of academic library experience and subsequently participating in no less than 25 conferences, I can say that the DLF Forum was the most progressive and enlightening conference that I have ever attended. It was downright empowering. Ana Ndumu 2017 DLF Forum Fellow The thoughtful way the experience was designed was due to the efforts of the organizers...As a first-time participant, I am grateful to have been able to participate in this year’s virtual Forum and look forward to continuing to learn from the DLF community! Betsy Yoon 2020 DLF Forum Community Journalist Forum Updates 2021 DLF Forum, DigiPres, and Learn@DLF Calls for Proposals April 8, 2021 We’re delighted to share that it’s CFP season for CLIR’s annual events. Based on community feedback, we’ve made the decision… Read more Want Forum news?
Subscribe to our newsletter to stay informed! forum2021-diglib-org-5902 ---- Call for Proposals - DLF Forum 2021 Call for Proposals 2021 DLF Forum & Learn@DLF Call for Proposals CLIR’s Digital Library Federation invites proposals for the 2021 DLF Forum (November 1-3) and Learn@DLF (November 8-10), our workshop series, both held online this year. A separate call will be issued for Digital Preservation 2021, the annual conference of the NDSA (November 4). The Forum is a meeting place, a marketplace, and a congress for digital library practitioners from DLF member institutions and the broader community. Now that our events will take place virtually for a second time, we look forward to new and better ways to come together—as always, with community at the center. Therefore, our guiding focus for this year’s Forum is sustaining our community. Relentless innovation, disruptive change, and constant demands on our time and energy rarely allow for a pause to assess how we got here. Sustenance comes in many forms and while it allows for growth, it is also an end in itself. How can we then shift our focus to prioritize the sustaining and nurturing of ourselves and our communities while still pushing for greater openness and inclusivity? Pervasive racism persists and contributes to wrenching inequalities in the United States, especially among our Black, Indigenous, and People of Color (BIPOC) communities. CLIR has long recognized this inequity; diversity, social justice, and broad access to cultural heritage have been integral to our mission. In 2021, we reaffirm our commitment to pursuing greater equity and justice throughout the DLF Forum, working with our entire community toward an inclusivity that prizes the chorus of diverse voices needed for systemic change. As such, the planning committee will again prioritize submissions from BIPOC people and people working at Historically Black Colleges and Universities (HBCUs) and other BIPOC-centered libraries, archives, and museums. We therefore have self-identification options in the proposal submission form. For all events, we encourage proposals from DLF members and non-members; regulars and newcomers; digital library practitioners and those in adjacent fields such as institutional research and educational technology; and students, early-career professionals and senior staff alike. Proposals to more than one event are permitted, though please submit different proposals for each. Our Events The DLF Forum will take place Monday, November 1 through Wednesday, November 3, 2021. Digital Preservation 2021: Embracing Digitality will take place on Thursday, November 4, 2021.
More information on that event can be found here: https://ndsa.org/conference/ Learn@DLF is a series of workshops offered the week after the DLF Forum, November 8-10, 2021. About Presenting Accepted presentations and panels will be delivered via pre-recorded video. This format allows for flexible watch times and speeds, captioning, and avoids many technical challenges. Videos must be submitted by Wednesday, September 15. Presenters will receive support in the form of tutorials, resources, and individual assistance. Presenters will be expected to be in attendance and available during their presentation time for live Q&A (chat-based or video, format TBD). To make space for as many voices as possible, individuals may present only once on the Forum program. The DLF Forum is explicitly designed to enact and support the DLF community’s values, and we strive to create a safe, accessible, welcoming, and inclusive event that reflects our Code of Conduct. Submissions & Evaluation Based on community feedback and the work of our Program Committee, we welcome submissions geared toward a practitioner audience that: Clearly engage with DLF’s mission of advancing research, learning, social justice, and the public good through the creative design and wise application of digital library technologies Activate and inspire participants to think, make, and do Engage people from different backgrounds, experience levels, and disciplines Include clear take-aways that participants can implement in their own work Submission Formats Sessions are invited in the following lengths and formats: At the DLF Forum, November 1-3: 45-minute Panels: A panel discussion of three to four speakers on a unified topic, with an emphasis on the discussion. A maximum of four speakers is allowed per submission. Proposals with representative and inclusive speaker involvement will be favored by the committee, and all-male-identifying panels will not be accepted. The main goals of the panel format at the DLF Forum are to bring together diverse perspectives on a topic and to encourage a community discussion of panelists’ approaches or findings. 15-minute Presentations: A presentation by one to two speakers on a single topic or project. A maximum of two speakers is allowed per submission. Presentations will be grouped by the program committee based on overarching themes or ideas. 5-minute Lightning Talks: High-profile, high-energy lightning talks held in plenary, with the opportunity to point attendees to contact information and additional materials online. No more than two speakers are allowed per submission. 25-minute Birds of a Feather (BOAF) Sessions: Working on a project on which you’d like feedback? Have a question you want to ponder with other interested people? New this year, 25-minute BOAF sessions are live video discussion sections where folks can discuss a topic of the proposer’s choice. These are roundtables where ideas can be shared and questions can be asked in the spirit of shared knowledge.   At Learn@DLF, November 8-10: 90-minute Workshops: Live, in-depth, hands-on training sessions on specific tools, techniques, workflows, or concepts. All workshop organizers are asked to provide details on technology needed, participant proficiency level, and learning outcomes for participants. Workshops must be interactive and inclusive, and the strongest proposals will demonstrate this clearly. Interested in presenting something longer? Consider submitting a ‘part I’ (morning session) and ‘part II’ (afternoon session). 
10-15-minute Tutorials: Pre-recorded training sessions or demonstrations between 10 and 15 minutes in length about specific tools, techniques, workflows, or concepts. Proposal Requirements Proposal title Submission format and event: Varies by event First and last names, organizational affiliations, and email addresses for all authors / presenters Abstract (50 words max) Proposal (250 words max for all formats except for panels and workshops, up to 500 words) Five keywords for your proposal Submit using our online system: bit.ly/2021CLIRcfps. Submit your proposal THE DEADLINE FOR ALL PROPOSALS IS MONDAY, MAY 17, 2021, AT 11:59PM EASTERN TIME. As in previous years, all submissions will be peer reviewed. Broader DLF community input will also be solicited through an open community voting process, which will inform the Program Committee’s final decisions. Selected presenters will be notified over the summer and will have a minimum of four weeks to prepare their recordings. We are still looking for sponsors for this year’s events! If you or someone you know may be interested, check out our sponsorship opportunities or contact us. Questions? You can reach us at forum@diglib.org. forum2021-diglib-org-7292 ---- Learn@DLF - DLF Forum 2021 Join us for Learn@DLF November 8-10, 2021 To cultivate creative training and professional development opportunities stemming from our past three successful DLF Forum Pre-Conferences as well as our series of video tutorials from last year’s first-ever virtual DLF Forum, we are excited to host Learn@DLF the week immediately following the DLF Forum and NDSA’s Digital Preservation 2021 on Monday-Wednesday, November 8-10, 2021. Stay tuned for updates on Learn@DLF offerings. Share your experiences on Twitter with #LearnAtDLF! forum2021-diglib-org-9655 ---- Home - DLF Forum 2021
freerangelibrarian-com-7953 ---- Free Range Librarian › K.G. Schneider's blog on librarianship, writing, and everything else About Free Range Librarian Comment guidelines Writing: Clips & Samples (Dis)Association Monday, May 27, 2019 Walking two roses to their new home, where they would be planted in the front yard.
I have been reflecting on the future of a national association I belong to that has struggled with relevancy and with closing the distance between itself and its members, has distinct factions that differ on fundamental matters of values, faces declining national and chapter membership, needs to catch up on the technology curve, has sometimes problematic vendor relationships, struggles with member demographics and diversity,  and has an uneven and sometimes conflicting national message and an awkward at best relationship with modern communications; but represents something important that I believe in and has a spark of vitality that is the secret to its future. I am not, in fact, writing about the American Library Association, but the American Rose Society.  Most readers of Free Range Librarian associate me with libraries, but the rose connection may be less visible. I’ve grown roses in nine places I’ve lived in the last thirty-plus years, starting with roses planted in front of a rental house in Clovis, New Mexico, when I was stationed at Cannon Air Force Base in the 1980s, and continuing in pots or slices of garden plots as I moved around the world and later, the United States. Basically, if I had an outdoor spot to grow in, I grew roses, either in-ground or in pots, whether it was a slice of sunny backyard in Wayne, New Jersey, a tiny front garden area in Point Richmond, California, a sunny interior patio in our fake Eichler rental in Palo Alto, or a windy, none-too-sunny, and cold (but still much-appreciated) deck in our rental in San Francisco. When Sandy and I bought our sweet little house in Santa Rosa, part of the move involved rolling large garden pots on my Radio Flyer from our rental two blocks away. Some of you know I’m an association geek, an avocation that has waxed as the years have progressed. I join associations because I’m from a generation where that’s done, but another centripetal pull for staying and being involved is that associations, on their own, have always interested me. It’s highly likely that a long time ago, probably when I was stationed in New Mexico and, later, Germany (the two duty stations where I had the ability to grow roses), that I was a member of the American Rose Society for two or three years. I infer this because I accumulated, then later recycled, their house magazine, American Rose, and I also have vague memories of receiving the annual publication, Handbook for Selecting Roses. Early this year I joined the Redwood Empire Rose Society and a few weeks after that joined the American Rose Society. I joined the local society because I was eager to plant roses in our new home’s garden and thought this would be a way to tap local expertise, and was won over by the society’s programming, a range of monthly educational events that ranged from how to sharpen pruning shears to the habits and benefits of bees (a program where the audience puffed with pride, because roses--if grown without toxic chemical intervention–are highly beneficial bee-attracting pollen plants). I joined the national society less out of need than because I was curious about what ARS had to offer to people like me who are rose-lovers but average gardeners, and I was also inquisitive about how the society had (or had not) repositioned itself over the years. My own practices around rose gardening have gradually changed, reflecting broader societal trends. Thirty years ago, I was an unwitting cog in the agricultural-industrial rose complex. 
I planted roses that appealed to my senses — attractive, repeat-blooming, and fragrant — and then managed their ability to grow and produce flowers not only through providing the two things all roses need to grow– sun and water — but also through liberal applications of synthetic food and toxic pest and disease products. The roses I purchased were bred for the most part with little regard for their ability to thrive without toxic intervention or for their suitability for specific regions. Garden by garden, my behavior changed. I slowly adopted a “thrive or die” mantra. If a rose could not exist without toxic chemical interventions, then it did not belong in my garden, and I would, in rosarian parlance, “shovel-prune” it and replace it with a rose that could succeed with sun, water, good organic food and amendments, and an occasional but not over-fussy attention. Eventually, as I moved toward organic gardening and became more familiar with sustainability in general, I absorbed the message that roses are plants, and the soil they grow in is like the food I put in my body: it influences their health. So I had the garden soil tested this winter while I was moving and replacing plants, digging holes that were close to two feet wide and deep. Based on the test results, I adjusted the soil accordingly: I used organic soil sulphur to lower the ph, dug in slow-release nitrogen in the form of feathermeal, and bathed the plants in a weak solution of organic liquid manganese. As I now do every spring, when it warmed up a bit I also resumed my monthly treatment of fish fertilizer, and this year, based on local rose advice, in a folksier vein dressed all the bushes with organic worm castings and alfalfa, both known to have good fertilizing capabilities. Alfalfa also has a lot of trace nutrients we know less about but appear to be important. Princesse Charlene de Monaco, hybrid tea rose bred by Meilland Guess what? Science is real! Nearly all of the rose bushes are measurably larger and more vigorous. Carding Mill, a David Austin rose, went from a medium shrub to a flowering giant. New roses I planted this spring, such as Grand Dame and Pinkerbelle, are growing much more vigorously than last year’s new plantings. Some of this is due to the long, gloomy, wet winter, which gave roses opportunities to snake their long roots deeper into the good soil we have in Sonoma County; my friends are reporting great spring flushes this year. But roses planted even in the last six weeks, such as Princesse Charlene de Monaco and Sheila’s Perfume, are taking off like a rocket, so it’s not just the rain or the variety. (You do not need to do all this to grow roses that will please you and your garden visitors, including bees and other beneficial insects. I enjoy the process. The key thing is that nearly all of my roses are highly rated for disease resistance and nearly all are reported to grow well in our region.) Science–under attack in our national conversations–is also an area of conflict within the ARS. Presidents of the ARS have three-year terms, and the previous president, Pat Shanley, was an advocate of sustainable rose growing. She spoke and wrote about the value of organic gardening, and championed selecting varieties that do not require toxic intervention to thrive. The theme of the 2018 American Rose Annual was “Roses are for Everyone,” and this Annual is a fascinating look at the sustainable-gardening wing of the ARS. 
Most of the articles emphasized the value of what Paul Zimmerman, a rose evangelist, calls “garden roses,” flowers that everyday people like you and me can grow and enjoy. The message in this Annual is reinforced by recent books by longtime rose advocates and ARS members, such as Peter Kukielski’s Roses without Chemicals and Zimmerman’s Everyday Roses, books I highly recommend for library collections as well as personal use. (Roses without Chemicals is a book I use when I wake up at odd hours worried about things, because it is beautifully written and photographed and the roses are listed alphabetically.) Now the ARS has a new president, Bob Martin, a longtime exhibitor, who in editorials has promoted chemical intervention for roses. “And yes Virginia we do spray our roses,” he wrote in the March/April “First Word” editorial in American Rose, the house organ of the ARS. “As does nearly every serious rose exhibitor and those who want their rose bushes to sustainably produce the most beautiful blooms [emphasis mine].” American Rose does not appear to publish letters to the editor. There is no section listed for letters that I can find in any recent issue, and the masthead only lists a street address for “member and subscription correspondence.” Otherwise, I would write a short letter protesting the misuse of the term “sustainably,” as well as the general direction of this editorial. I am a rose amateur, and make no bones about it. But I know that equating chemical spraying with sustainability is, hands-down, fake news. It’s one thing to soak roses in toxins and call it a “health maintenance” program, as he does in this article. That’s close to the line but not over it, since he’s from the exhibitors’ wing of ARS. But it’s just plain junk science to claim that there is anything connected to sustainability about this approach. I also can’t imagine that this “toxins forever” message is attracting new ARS members or encouraging them to renew. It feels disconnected from what motivates average gardeners like me to grow roses today (to enjoy them in their gardens) and from how they want to grow them today (in a manner that honors the earth). Frankly, one of the happiest moments in my garden last year was not from personal enjoyment of the flowers or even the compliments of neighbors and passers-by, but when I saw bees doing barrel-rolls in the stamens of my roses, knowing that I was helping, not hurting, their survival. The vast majority of people buying and planting roses these days have no idea there is a single-plant society dedicated to this plant, much less that this society believes it understands their motivations for and interest in roses. My environmental scan of the literature and the quantities of roses provided by garden stores make me suspect that many people buy roses based on a mix of personal recommendations, marketing guidance (what the vendors are promoting), and what they remember from their family gardens. (I would love to learn there had been market research in this area; vendors may have taken this up.) For average gardeners, their memories include roses such as Peace and Mr. Lincoln, which were bred in the middle of the last century, when the focus was not on disease resistance but on producing the hourglass hybrid tea shape that became the de facto standard for exhibiting.
We can get sentimental about roses from the late 20th century, but many of these varieties also helped perpetuate the idea that roses are hard to grow, despite the many varieties that grew just fine for thousands of years (or in the case of Excellenz von Schubert, which I planted this year, 110 years and counting). Market persuasion continues today; vendors tempt buyers through savvy marketing plans such as the Downton Abbey rose series from Weeks or David Austin’s persistent messaging about “English” roses. Note — I own a lovely rose from the Downton Abbey line, Violet’s Pride, that is quite the garden champ, and have three David Austin roses (Carding Mill, Munstead Wood, and Gentle Hermione). I’m just noting market behavior. It is well-documented in rose literature that the rose that seems to have shaken the ARS to the core is the Knockout series, which introduced maintenance-free roses to a generation short on time and patience and increasingly invested in sustainable practices throughout their lives, including their gardens. Again, smart marketing was part of the formula, because there always have been sustainable roses, and some companies, such as Kordes, moved to disease-resistant hybridizing decades ago. But the Knockout roses were promoted as an amazing breakthrough. (It may help to know that new varieties of roses have 20-year patents during which propagation is legal only through license. I don’t begrudge hybridizers their income, given how much work–sometimes thousands of seedlings–goes into producing a single good rose, but this does factor into how and why roses are marketed.) You don’t need a certificate as a master gardener or membership in a rose society to grow Knockout roses or newer competitors such as the Oso Easy line. You don’t really need to know anything about roses at all, other than roses grow in sun, not shade, and appreciate water. You also don’t need to spray Knockout roses with powerful fungicides to prevent blackspot and mildew. Regardless of the public’s reaction to easy-to-grow roses, the rose world’s reception of the Knockout rose was mixed, to use an understatement. Though the Knockout rose was the 2004 ARS members’ choice rose, rumblings abounded, and Knockout was even blamed in popular literature as a vector for the rose rosette virus (RRV), though this was later debunked. Fifty years ago RRV was observed in a number of rose varieties, long before the Knockout rose appeared. (This mite-spread virus was deliberately spread in the United States to control a pest rose, Rosa multiflora, which was itself introduced before anyone realized what havoc it would wreak.) Again, I’m no scientist, but I would think the appearance of RRV in “domesticated” roses was inevitable, regardless of which rose variety was first identified by name as carrying this disease. Rose hybridizing is now catching up with the public’s interests and the wider need for roses with strong disease resistance. Rose companies prominently tout disease resistance and many new varieties can be grown toxin-free. I selected Princesse Charlene de Monaco in part because it medaled as best hybrid tea in the 2018 Biltmore International Rose Trials, for which roses must perform well in terms of vigor and disease resistance as well as aesthetic qualities. There were companies such as Kordes that walked this walk before it was fashionable, but in typical change-adoption fashion, other vendors are adapting their own practices, because the market is demanding it.
But association leadership is driven by different goals than that of for-profit companies. A colleague of mine, after sharing his support for my successful run for ALA Executive Board, commented that it takes expertise to run a $50 million organization–skills not everyone has in equal abundance. My further reflection is that the kind of leadership we need at any one time is also unique to that moment, though–with absolutely no aspersions on our current crop of excellent leaders in ALA–historically, we have not always selected leadership for either general expertise or current needs, an issue hardly unique to ARS or ALA. So I watch the ARS seesaw. As just one more example, recently I read, within the same ARS email newsletter, an article touting the value of lacewings for insect management, followed by an article about the value of chemical interventions that I know are toxic to beneficial insects. These aren’t just contradictory ideas; they are contradictory values, contradictory messages, and contradictory branding. And these conflicting messages are evident even before we look at the relationship between the national association and local societies (organized differently than ALA chapters but with similar intent). If I had to deduce the current priorities for ARS from its magazine, website, and email newsletters, the top priority would be the renovation of the ARS garden in Shreveport. The plan to update the 84-year-old “national rosarium” makes sense, if you like rose gardens, but it sounds more like a call to the passionate few than to the general public. It’s hard to infer other priorities when website sections such as “Cyber Rosarian” invite members to ask questions that then go unanswered for over a year. The section called “Endorsed Products” is its own conflicted mix of chemical interventions, artificial fertilizers, and organic rose food. The website section on rose preservation–a goal embedded in the ARS mission statement, “The American Rose Society exists to promote the culture, preservation and appreciation of the Rose”–is a blank page with a note it is under construction. A section with videos by Paul Zimmerman is useful, but the rose recommendations by district are incomplete, and also raise the issue that ARS districts are organized geopolitically, not by climate. A rose suited for the long dry summers of Sonoma County may not do as well in Maui. The ARS “Modern Roses” database has value, listing over 37,000 cultivars. But if I want insight into a specific rose, I use Helpmefind.com, which despite its generic name and rustic interface is the de facto go-to site for rose information, questions, and discussion, often in the context of region, climate, and approaches to sustainability. I pay a small annual fee for premium access, in part to get HMF’s extra goodies (advanced search and access to lineage information) but primarily because this site gives me value and I want to support their work. Though I couldn’t find data on the ARS website for membership numbers in national, district, or local societies, I intuit membership overall is declining. It is in our local society, where despite great programming in a region where many people grow roses, I am one of the younger members. Again, there are larger forces at work with association membership, but pointing to those forces and then doing business as usual is a recipe for slow death. Interestingly, the local rose society is aware of its challenges and interested in what it might mean to reposition itself for survival.
Most recently, we founded a Facebook group that anyone could join (look for Redwood Empire Rose Society). But the society doesn’t have very much time, and a Facebook group isn’t a magic bullet. To loop back to ALA for a moment: I can remember when the response to concerns about membership decline was that the library field was contracting as a whole and association membership was also less popular in general. But these days, ALA is invested in moving past these facts and asking, what then? ALA is willing to change to survive. And I believe that is why ALA will be around 100 years from now, assuming we continue to support human life on this continent. As I ponder all this, deep in my association geekiness, I’m left with these questions: if the ARS can’t save itself, who will be there for the roses? Will the ad hoc, de facto green-garden rosarians form a new society, will they simply soldier on as a loose federation, or will the vendors determine the future of roses? Have rose societies begun talking about strategic redirection, consolidation, and other new approaches? Does the ARS see itself as a change leader? Where does the ARS see itself in 25 years? Am I just a naive member in the field, totally missing the point, or is there something to what I’m observing, outside the palace walls? I’ve been writing this off and on for months. It’s Memorial Day and it’s now light enough outside to wander into our front yard, pruners and deadheading bucket in hand, iPhone in my pocket so I can share what bloomed while I slept. Over time I changed how I grow roses, but not why I grow roses. Somewhere in there is an insight, but it’s time to garden. Bookmark to: Filed in Uncategorized | | Comments Off on (Dis)Association I have measured out my life in Doodle polls Wednesday, April 10, 2019 You know that song? The one you really liked the first time you heard it? And even the fifth or fifteenth? But now your skin crawls when you hear it? That’s me and Doodle. In the last three months I have filled out at least a dozen Doodle polls for various meetings outside my organization. I complete these polls at work, where my two-monitor setup means I can review my Outlook calendar while scrolling through a Doodle poll with dozens of date and time options. I don’t like to inflict Doodle polls on our library admin because she has her hands full enough, including managing my real calendar. I have largely given up on earmarking dates on my calendar for these polls, and I just wait for the inevitable scheduling conflicts that come up. Some of these polls have so many options I would have absolutely no time left on my calendar for work meetings, many of which need to be scheduled on fairly short notice. Not only that, I gird my loins for the inevitable “we can’t find a date, we’re Doodling again” messages that mean once again, I’m going to spend 15 minutes checking my calendar against a Doodle poll. I understand the allure of Doodle; when I first “met” Doodle, I was in love. At last, a way to pick meeting dates without long, painful email threads! But we’re now deep into the Tragedy of the Doodle Commons, with no relief in sight. Here are some Doodle ideas–you may have your own to toss in. First, when possible, before Doodling, I ask for blackout dates. That narrows the available date/time combos and helps reduce the “we gotta Doodle again” scenarios. Second, if your poll requires more than a little right-scrolling, reconsider how many options you’re providing.
A poll with 40 options might as well be asking me to block out April. And I can’t do that. Third, I have taken exactly one poll where the pollster chose to suppress other people’s responses, and I hope to never see that again. There is a whole gaming side to Doodling in which early respondents get to drive the dates that are selected, and suppressing others’ responses eliminates that capability. Plus I want to know who has and hasn’t responded, and yes, I may further game things when I have that information. Also, if you don’t have to Doodle, just say no. Bookmark to: Filed in Uncategorized | | Comments (4) Memento DMV Saturday, March 30, 2019 This morning I spent 40 minutes in the appointment line at the Santa Rosa DMV to get my license renewed and converted to REAL ID, but was told I was “too early” to renew my license, which expires in September, so I have to return after I receive my renewal notice. I could have converted to REAL ID today, but I would still need to return to renew my license, at least as it was explained to me, and I do hope that was correct. [Image: Wellcome Collection, CC BY 4.0, https://wellcomecollection.org/works/m8wh2kmc] But–speaking as a librarian, and therefore from a profession steeped in resource management–I predict chaos in 2020 if DMV doesn’t rethink their workflow. We’re 18 months out from October 2020, the point at which people will not be able to board domestic flights if they don’t have a REAL ID or a valid passport, or another (and far less common) substitute. Then again, California DMV is already in chaos. Their longtime leader retired, the replacement lasted 32 days, and their new leader has been there ca. 60 days. Last year featured the license renewal debacle, which I suspect impacted the man standing behind me. He said he was there to apply for his license again because he never received the one he applied for last fall. And California is one of 10 states that still need a REAL ID extension because its DMV didn’t have it together on time. Indeed, I was on the appointment line, and nearly everyone in that line was on their second visit to DMV for the task they were trying to accomplish, and not for lack of preparation on their part. Some of that was due to various DMV crises, and some of it is baked into DMV processes. Based on how their current policies were explained to me today at Window 13, I should never have been on that line in the first place; somewhere, in the online appointment process, the DMV should have prevented me from completing that task. I needlessly took up staff time at DMV. But the bigger problem is a system that gets in its own way, like libraries that lock book drops during the day to force users to enter the libraries to return books. With me standing there at Window 13 with my online appointment, my license, and my four types of ID, the smart thing to do would be to complete the process and get me out of the pipeline of REAL ID applicants–or any other DMV activity. But that didn’t happen. And I suspect I’m just one drop in a big, and overflowing, bucket. I suppose an adroit side move is to ensure your passport is current, but I hope we don’t reach the point where we need a passport to travel in our own country. Bookmark to: Filed in Uncategorized | | Comments Off on Memento DMV An Old-Skool Blog Post Friday, March 29, 2019 I get up early these days and get stuff done — banking and other elder-care tasks for my mother, leftover work from the previous day, association or service work. A lot of this is writing, but it’s not writing.
I have a half-dozen unfinished blog posts in WordPress, and even more in my mind. I map them out and they are huge topics, so then I don’t write them. But looking back at the early days of this blog — 15 years ago! — I didn’t write long posts. I still wrote long-form for other media, but my blog posts were very much in the moment. So this is an old-skool post designed to ease me back in the writing habit. I’ll strive for twice a week, which is double the output of the original blogger, Samuel Johnson. I’ll post for 15 minutes and move on to other things. I am an association nerd, and I spend a lot of time thinking about associations of all kinds, particularly the American Library Association, the American Homebrewers Association, the American Rose Society, the Redwood Empire Rose Society, the local library advisory boards, my church, and our neighborhood association. Serving on the ALA Steering Committee on Organizational Effectiveness, I’m reminded of a few indelible truths. One is that during the change management process you need to continuously monitor the temperature of the association you’re trying to change and in the words of one management pundit, keep fiddling with the thermostat. An association didn’t get that big or bureaucratic overnight, and it’s not going to get agile overnight, either. Another is that the same people show up in each association, and–more interesting to me–stereotypes are not at play in determining who the change agents are. I had a great reminder of that 20 years ago, when I served as the library director for one of those tiny Barbie Dream libraries in upstate New York, and I led the migration from a card catalog to a shared system in a consortium. Too many people assumed that the library staff–like so many employees in these libraries, all female, and nearly all older women married to retired spouses–would be resistant to this change. In fact, they loved this change. They were fully on board with the relearning process and they were delighted and proud that they were now part of a larger system where they could not only request books from 30 other libraries but sometimes even lend books as well from our wee collection. There were changes they and the trustees resisted, and that was a good lesson too, but the truism of older women resisting technology was dashed against the rocks of reality. My 15 minutes are up. I am going in early today because I need to print things, not because I am an older woman who fears technology but because our home printer isn’t working and I can’t trust that I’ll have seatback room on my flight to Chicago to open my laptop and read the ALA Executive Board manual electronically, let alone annotate it or mark it up. I still remember the time I was on a flight, using my RPOD (Red Pen of Death, a fine-point red-ink Sharpie) to revise an essay, and the passenger next to me turned toward me wide-eyed and whispered, “Are you a TEACHER?” Such is the power of RPOD, an objective correlative that can immediately evoke the fear of correction from decades ago. Bookmark to: Filed in American Liberry Ass'n, Association Nerd | | Comments (1) Keeping Council Saturday, January 20, 2018 Editorial note: Over half of this post was composed in July 2017. At the time, this post could have been seen as politically neutral (where ALA is the political landscape I’m referring to) but tilted toward change and reform. Since then, Events Have Transpired. 
I revised this post in November, but at the time hesitated to post it because Events Were Still Transpiring. Today, in January 2018, I believe even more strongly in what I write here, but take note that the post didn’t have a hidden agenda when I wrote it, and, except where noted, it still reflects my thoughts from last July, regardless of ensuing events. My agendas tend to be fairly straightforward. — KGS   Original Post, in which Councilors are Urged to Council Edits in 2018 noted with bolding. As of July 2017, I am back on ALA Council for my fifth (non-consecutive) term since joining the American Library Association in 1991. In June I attended Council Orientation, and though it was excellent–the whole idea that Councilors would benefit from an introduction to the process is a beneficial concept that emerged over the last two decades–it did make me reflect on what I would add if there had been a follow-on conversation with sitting Councilors called “sharing the wisdom.” I was particularly alerted to that by comments during Orientation which pointed up a traditional view of the Council process where ALA’s largest governing body is largely inactive for over 350 days a year, only rousing when we prepare to meet face to face. Take or leave what I say here, or boldly contradict me, but it does come from an abundance of experience. You are a Councilor year-round Most newly-elected Councilors “take their seats” immediately after the annual conference following their election — a factoid with significance. Council, as a body, struggles with being a year-round entity that takes action twice a year during highly-condensed meetings during a conference with many other things happening. I have written about this before, in a dryly wonky post from 2012 that also addresses Council’s composition and the role of chapters. I proposed that Council meet four times a year, in a solstice-and-equinox model. Two of those meetings (the “solstice” meetings) could  be online. (As far back as 2007 I was hinting around about the overhead and carbon footprint of Midwinter.) I doubt Midwinter will go to an online format even within the next decade–it’s a moneymaker for ALA, if less so than before, and ALA’s change cycle is glacial–but the proposal was intended to get people thinking about how Council does, and doesn’t, operate. In lieu of any serious reconsideration of Council, here are some thoughts. First, think of yourself as a year-round Councilor, even if you do not represent a constituency such as a state chapter or a division that meets and takes action outside of ALA. Have at least a passing familiarity with the ALA Policy Manual. Bookmark it and be prepared to reference it. Get familiar with ALA’s financial model through the videos that explain things such as the operating agreement. Read and learn about ALA. Share news. Read the reports shared on the list, and post your thoughts and your questions. Think critically about what you’re reading. It’s possible to love your Association, believe with your heart that it has a bright future, and still raise your eyebrows about pat responses to budget questions, reassurances that membership figures and publishing revenue will rebound, and glib responses about the value of units such as the Planning and Budget Assembly. Come to Council prepared. Read everything you can in advance, speak with other Councilors, and apply solid reflection, and research if needed, before you finish packing for your trip. 
Preparation requires an awareness that you will be deluged with reading just as you are struggling to button up work at your library and preparing to be away for nearly a week, so skimming is essential. I focus on issues where I know I can share expertise, and provide input when I can. Also, I am proud we do memorial resolutions and other commemorations but I don’t dwell on them in advance unless I have helped write them or had close familiarity with the people involved. Fee, Fie, Foe, Forum Coming prepared to Council is one of those values Council has struggled with. When I looked at the Council list for the week prior to Annual 2017, the only conversation was a discussion about the relocation of the Council Forum meeting room from one hotel to another, complete with an inquiry asking if ALA could rent a special bus to tote Councilors to and from the Forum hotel. Council Forum is an informal convening that has taken place for decades to enable Council to discuss resolutions and other actions outside of the strictures of parliamentary procedure. It meets three times during ALA, in the evening, and though it is optional, I agree with the Councilor who noted that important work happens at this informal gathering. I am conflicted about Forum. It allows substantive discussion about key resolutions to happen outside of the constrictive frameworks of parliamentary procedure. Forum is also well-run, with volunteer Councilors managing the conversation. But Forum also appears to have morphed into a substitute for reading and conversation in advance. It also means that Councilors have to block out yet more time to do “the work of the Association,” which in turn takes us away from other opportunities during the few days we are together as an Association. I don’t say this to whine about the sacrifice of giving up dinners and networking with ALA colleagues, though those experiences are important to me, but rather to point out that Forum as a necessary-but-optional Council activity takes a silo–that Brobdingnagian body that is ALA Council–and further silos it. That can’t be good for ALA. As Councilors, we benefit from cross-pollination with the work of the Association. Resolved: To tread lightly with resolutions New Councilors, and I was one of them once, are eager to solve ALA’s problems by submitting resolutions. Indeed, there are new Councilors who see resolutions as the work of Council, and there have been round tables and other units that clearly saw their work as generating reams of lightly-edited, poorly-written resolutions just prior to and during the conference. There are at least three questions to ask before submitting a resolution (other than memorial and other commemorative resolutions): Can the resolution itself help solve a problem? Has it been coordinated with the units and people involved in the issue it addresses? Is it clear and well-written? There are other questions worth considering, such as, if the issue this resolution proposed to address cropped up a month after Council met, would you still push it online with your Council colleagues, or ask the ALA Executive Board to address it? Which is another way to ask, is it important? Tread lightly with Twitter Overall, since coming through the stress of living through the Santa Rosa fires, I’m feeling weary, and perhaps wary, of social media. Though I appreciate the occasional microbursts taking on idiots insulting libraries and so on, right now much of social media feels at once small and overwrought.
If I seem quieter on social media, that’s true. (But I have had more conversations with neighbors and area residents during and after the fires than I have since we moved to Santa Rosa in early 2015, and those convos are the real thing.) More problematically, as useful as Twitter can be for following real-world issues–including ALA–Twitter also serves as a place where people go to avoid the heavy lifting involved with crucial conversations. I find I like #alacouncil Twitter best when it is gently riffing on itself or amplifying action that the larger ALA body would benefit from hearing about. [the following, to the end of this post, is all new content] I like #alacouncil Twitter least when it is used as a substitute for authentic conversation, used to insult other Councilors, or otherwise deployed to undermine the discourse taking place in the meatware world. Twitter is also particularly good at the unthinking pile-on, and many people have vulnerabilities in this area that are easily exploited. Sometimes those pile-ons hit me close to home, as happened a little over a year ago. Other times these pile-ons serve only to amuse the minx in me, such as when a Famous Author (™) recently scolded me for “trafficking in respectability politics” because I was recommending a list of books written by writers from what our fearless leader calls “s–thole countries.” Guilty as charged! Indeed, I have conducted two studies where a major theme was “Do I look too gay?” I basically have a Ph.D. in respectability politics. And like all writers–including Famous Author (™)–I traffic in them. I chuckled and walked on by. Walking on by, on Twitter, takes different forms. As an administrator, I practice a certain pleasant-but-not-sugary facial expression that stays on my face regardless of what’s going on in my head. I’m not denying my emotions, which would be the sugary face; I’m managing them. It’s a kind of discipline that also helps me ford difficult conversations, in which the discipline of managing my face also helps me manage my brain. The equivalent of my Admin Face for #alacouncil Twitter is to exercise the mute button. I have found it invaluable. People don’t know they are muted (or unmuted). If only real life had mute buttons–can you imagine how much better some meetings would be if you could click a button and the person speaking would be silenced, unaware that you couldn’t hear them? Everyone wins. But that aside, I have yet to encounter a situation on Twitter when–for me–muting was the wrong call. It’s as if you stepped off the elevator and got away from that person smacking gum. Another car will be along momentarily. My last thought on this post has to do with adding the term “sitting” before Councilors in the first part of this post. When I was not on Council I tried very hard not to be “that” former Councilor who is always kibitzing behind the scenes, sending Councilors messages about how things should be and how, in the 1960s, ALA did something bad and therefore we can never vote online because nobody knows how to find ALA Connect and it’s all a nefarious plot hatched by the ALA President, his dimwitted sycophants, and the Executive Board, and why can’t MY division have more representation because after all we’re the 800-pound gorilla (ok, I just got political, but you’ll note I left out anything about what should or should not be required for a Very Special Job).
Yes, once in a while I sent a note if I thought it was helpful, the way some of my very ALA-astute friends will whisper in my ear about policy and process I may be unfamiliar with. Michael Golrick, a very connected ALA friend of mine, must have a third brain hemisphere devoted to the ALA policy manual and bylaws. And during a time when I was asking a lot of questions about the ALA budget (boiling down to one question: who do you think you’re fooling?), I was humbled by the pantheon of ALA luminaries whispering in my ear, providing encouragement as well as crucial guidance and information. But when I am no longer part of something, I am mindful that things can and should change and move on, and that I may not have enough information to inform that change. We don’t go to ALA in horse-and-buggies any more, but we conduct business as if we do, and when we try to change that, the fainting couches are rolled out and the smelling salts waved around as if we had, say, attempted to change the ALA motto, which is, I regret to inform you, “The best reading, for the largest number, at the least cost”–and yes, attempts to change that have been defeated. My perennial question is, if you were starting an association today, how would it function? If the answer is “as it did in 1893” (when that motto was adopted), perhaps your advice on a current situation is less salient than you fancy. You may succeed at what you’re doing, but that doesn’t make you right. And with that, I go off to Courthouse Square today to make exactly that point about events writ much, much larger, and of greater significance, than our fair association. But I believe how we govern makes a difference, and I believe in libraries and library workers, and I believe in ALA. Especially today. Bookmark to: Filed in American Liberry Ass'n, Librarianship | | Comments (2) What burns away Thursday, November 16, 2017 We are among the lucky ones. We did not lose our home. We did not spend day after day evacuated, waiting to learn the fate of where we live. We never lost power or Internet. We had three or four days where we were mildly inconvenienced because PG&E wisely turned off gas to many neighborhoods, but we showered at the YMCA and cooked on an electric range we had been planning to upgrade to gas later this fall (and just did, but thank you, humble Frigidaire electric range, for being there to let me cook out my anxiety). We kept our go-bags near the car, and then we kept our go-bags in the car, and then, when it seemed safe, we took them out again. That, and ten days of indoor living and wearing masks when we went out, was all we went through. But we all bear witness. The Foreshadowing It began with a five-year drought that crippled forests and baked plains, followed by a soaking-wet winter and a lush spring that crowded the hillsides with greenery. Summer temperatures hit records several times, and the hills dried out as they always do right before autumn, but this time unusually thick with parched foliage and growth. The air in Santa Rosa was hot and dry that weekend, an absence of humidity you could snap between your fingers. In the southwest section of the city, where we live, nothing seemed unusual. Like many homes in Santa Rosa our home does not have air conditioning, so for comfort’s sake I grilled our dinner, our 8-foot backyard fence buffering any hint of the winds gathering speed northeast of us. We watched TV and went to bed early.
Less than an hour later one of several major fires would be born just 15 miles east of where we slept. Reports vary, but accounts agree it was windy that Sunday night, with wind speeds ranging between 35 and 79 miles per hour, and a gust northwest of Santa Rosa reaching nearly 100 miles per hour. If the Diablo winds were not consistently hurricane-strength, they were exceptionally fast, hot, and dry, and they meant business. A time-lapse map of 911 calls shows the first reports of downed power lines and transformers coming in around 10 pm. The Tubbs fire was named for a road that was in turn named for a 19th-century winemaker who lived in a house in Calistoga that burned to the ground in an eerily similar fire in 1964. In three hours this fire sped 12 miles southwest, growing in size and intent as it gorged on hundreds and then thousands of homes in its way, breaching city limits and expeditiously laying waste to 600 homes in the Fountaingrove district before it tore through the Journey’s End mobile home park, then reared back on its haunches and leapt across a six-lane divided section of Highway 101, whereupon it gobbled up big-box stores and fast food restaurants flanking Cleveland Avenue, a business road parallel to the highway. Its swollen belly, fat with miles of fuel, dragged over the area and took out buildings in the random manner of fires. Kohl’s and Kmart were totaled and Trader Joe’s was badly damaged, while across the street from Kmart, JoAnn Fabrics was untouched. The fire demolished one Mexican restaurant, hopscotched over another, and feasted on a gun shop before turning its ravenous maw toward the quiet middle-class neighborhood of Coffey Park, making short work of thousands more homes. Santa Rosa proper is itself only 41 square miles, approximately 13 miles north-south and 9 miles east-west, including the long tail of homes flanking the Annadel mountains. By the time Kohl’s was collapsing, the “wildfire” was less than 4 miles from our home. I woke up around 2 am, which I tend to do a lot anyway. I walked outside and smelled smoke, saw people outside their homes looking around, and went on Twitter and Facebook. There I learned of a local fire, forgotten by most in the larger conflagration, but duly noted in brief by the Press Democrat: a large historic home at 6th and Pierson burned to the ground, possibly from a downed transformer, and the fire licked the edge of the Santa Rosa Creek Trail for another 100 feet. Others in the West End have reported the same experience of reading about the 6th Street house fire on social media and struggling to reconcile the reports of this fire with reports of panic and flight from areas north of us and videos of walls of flame. At 4 am I received a call that the university had activated its Emergency Operations Center and I asked if I should report in. I showered and dressed, packed a change of clothes in a tote bag, threw my bag of important documents in my purse, and drove south on my usual route to work, Petaluma Hill Road. The hills east of the road flickered with fire, the road itself was packed with fleeing drivers, and halfway to campus I braked at 55 mph when a massive buck sprang inches in front of my car, not running in that “oops, is this a road?” way deer usually cross lanes of traffic but yawing to and fro, its eyes wide. I still wonder, was it hurt or dying. As I drove onto campus I thought, the cleaning crew. I parked at the Library and walked through the building, already permeated with smoky air.
I walked as quietly as I could, so that if they were anywhere in the building I would hear them. As I walked through the silent building I wondered, is this the last time I will see these books? These computers? The new chairs I’m so proud of? I then went to the EOC and found the cleaning crew had been accounted for, which was a relief. At Least There Was Food And Beer A few hours later I went home. We had a good amount of food in the house, but like many of us who were part of this disaster but not immediately affected by it, I decided to stock up. The entire Santa Rosa Marketplace–Costco, Trader Joe’s, Target–on Santa Rosa Avenue was closed, and Oliver’s had a line of people outside waiting to get in. I went to the “G&G Safeway”–the one that took over a down-at-the-heels family market known as G&G and turned it into a spiffy market with a wine bar, no less–and it was without power, but open for business and, thanks to a backup system, able to take ATM cards. I had emergency cash on me but was loath to use it until I had to. Sweating through an N95 mask I donned to protect my lungs, I wheeled my cart through the dark store, selecting items that would provide protein and carbs if we had to stuff them in our go-bags, but also fresh fruit and vegetables, dairy and eggs–things I thought we might not see for a while, depending on how the disaster panned out. (Note, we do already have emergency food, water, and other supplies.) The cold case for beer was off-limits–Safeway was trying to retain the cold in its freezer and fridge cases in case it could save the food–but there was a pile of cases of Lagunitas Lil Sumpin Sumpin on sale, so that went home with me too, along with a couple of bottles of local wine. And with one wild interlude, for most of the rest of the time we stayed indoors with the windows closed. I sent out email updates and made phone calls, kept my phone charged and read every Nixle alert, and people at work checked in with one another. My little green library emergency contact card stayed in my back pocket the entire time. We watched TV and listened to the radio, including extraordinary local coverage by KSRO, the Little Station that Could; patrolled newspapers and social media; and rooted for Sheriff Rob, particularly after his swift smack-down of a bogus, Breitbart-fueled report that an undocumented person had started the fires. Our home was unoccupied for a long time before we moved in this September, possibly up to a decade, while it was slowly but carefully upgraded. The electric range was apparently an early purchase; it was a line long discontinued by Frigidaire, with humble electric coils. But it had been unused until we arrived, and was in perfect condition. If an electric range could express gratitude for finally being useful, this one did. I used it to cook homey meals: pork loin crusted with Smithfield bacon; green chili cornbread; and my sui generis meatloaf, so named because every time I make it, I grind and add meat scraps from the freezer for a portion of the meat mixture. (It would be several weeks before I felt comfortable grilling again.) We cooked. We stirred. We sauteed. We waited. On Wednesday, we had to run an errand. To be truthful, it was an Amazon delivery purchased that Saturday, when the world was normal, and sent to an Amazon locker at the capacious Whole Foods at Coddington Mall, a good place to send a package until the mall closes down because the northeast section of the city is out of power and threatened by a massive wildfire.
By Wednesday, Whole Foods had reopened, and after picking up my silly little order–a gadget that holds soda cans in the fridge–we drove past Russian River Brewing Company and saw it was doing business, so we had salad and beer for lunch, because it’s a luxury to have beer at lunch and the fires were raging and it’s so hard to get seating there nights and weekends, when I have time to go there, but there we were. We asked our waiter how he was doing, and he said he was fine but he motioned to the table across from ours, where a family was enjoying pizza and beer, and he said they had lost their homes. There were many people striving for routine during the fires, and to my surprise, even the city planning office returned correspondence regarding some work we have planned for our new home, offering helpful advice on the permitting process required for minor improvements for homes in historic districts. Because it turns out developers and engineers could serenely ignore local codes and build entire neighborhoods in Santa Rosa in areas known to be vulnerable to wildfire; but to replace bare dirt with a little white wooden picket fence, or to restore front windows from 1950s-style plate glass to double-hung wooden windows with mullions–projects intended to reinstate our house to its historic accuracy, and to make it more welcoming–requires a written justification of the project, accompanying photos, “Proposed Elevations (with Landscape Plan IF you are significantly altering landscape) (5 copies),” five copies of a paper form, a Neighborhood Context and Vicinity Map provided by the city, and a check for $346, followed by “8-12 weeks” before a decision is issued. The net result of this process is like the codes about not building on ridges, though much less dangerous; most people ignore the permitting process, so that the historic set piece that is presumably the goal is instead rife with anachronisms. And of course, first I had to bone up on the residential building code and the historic district guidelines, which contradict one another on key points, and because the permitting process is poorly documented I have an email traffic thread rivaling in word count Byron’s letters to his lovers. But the planning people are very pleasant, and we all seemed to take comfort in plodding through the administrivia of city bureaucracy as if we were not all sheltering in place, masks over our noses and mouths, go-bags in our cars, while fires raged just miles from their office and our home. The Wild Interlude, or, I Have Waited My Entire Career For This Moment Regarding the wild interlude, the first thing to know about my library career is that nearly everywhere I have gone where I have had the say-so to make things happen, I have implemented key management. That mishmosh of keys in  a drawer, the source of so much strife and arguments, becomes an orderly key locker with numbered labels. It doesn’t happen overnight, because keys are control and control is political and politics are what we tussle about in libraries because we don’t have that much money, but it happens. Sometimes I even succeed in convincing people to sign keys out so we know who has them. Other times I convince people to buy a locker with a keypad so we sidestep the question of where the key to the key locker is kept. But mostly, I leave behind the lockers, and, I hope, an appreciation for lockers. 
I realize it’s not quite as impressive as founding the Library of Alexandria, and it’s not what people bring up when I am introduced as a keynote speaker, and I have never had anyone ask for a tour of my key lockers nor have I ever been solicited to write a peer-reviewed article on key lockers. However unheralded, it’s a skill. My memory insists it was Tuesday, but the calendar says it was late Monday night when I received a call that the police could not access a door to an area of the library where we had high-value items. It would turn out that this was a rogue lock, installed sometime soon after the library opened in 2000, that unlike others did not have a master registered with the campus, an issue we have since rectified. But in any event, the powers that be had the tremendous good fortune to contact the person who has been waiting her entire working life to prove beyond doubt that KEY LOCKERS ARE IMPORTANT. After a brief internal conversation with myself, I silently nixed the idea of offering to walk someone through finding the key. I said I knew where the key was, and I could be there in twenty minutes to find it. I wasn’t entirely sure this was the case, because as obsessed as I am with key lockers, this year I have been preoccupied with things such as my deanly duties, my doctoral degree completion, national association work, our home purchase and household move, and the selection of geegaws like our new gas range (double oven! center griddle!). This means I had not spent a lot of time perusing this key locker’s manifest. So there was an outside chance I would have to find the other key, located somewhere in another department, which would require a few more phone calls. I was also in that liminal state between sleep and waking; I had been asleep for two hours after being up since 2 am, and I would have agreed to do just about anything. Within minutes I was dressed and again driving down Petaluma Hill Road, still busy with fleeing cars. The mountain ridges to the east of the road roiled with flames, and I gripped the steering wheel, watching for more animals bolting from fire. Once in the library, now sour with smoke, I ran up the stairs into my office suite and to the key locker, praying hard that the key I sought was in it. My hands shook. There it was, its location neatly labeled by the key czarina who with exquisite care had overseen the organization of the key locker. The me who lives in the here-and-now profusely thanked past me for my legacy of key management, with a grateful nod to the key czarina as well. What a joy it is to be able to count on people! Items were packed up, and off they rolled. After a brief check-in at the EOC, home I went, to a night of “fire sleep”–waking every 45 minutes to sniff the air and ask, is fire approaching?–a type of sleep I would have for the next ten days, and occasionally even now. How we speak to one another in the here and now Every time Sandy and I interact with people, we ask, how are you. Not, hey, how are ya, where the expected answer is “fine, thanks” even if you were just turned down for a mortgage or your mother died. But no, really, how are you. Like, fire-how-are-you. And people usually tell you, because everyone has a story.
Answers range from: I’m ok, I live in Petaluma or Sebastopol or Bodega Bay (in SoCo terms, far from the fire), to I’m ok but I opened my home to family/friends/people who evacuated or lost their homes; or, I’m ok but we evacuated for a week; or, as the guy from Home Depot said, I’m ok and so is my wife, my daughter, and our 3 cats, but we lost our home. Sometimes they tell you and they change the subject, and sometimes they stop and tell you the whole story: when they first smelled smoke, how they evacuated, how they learned they did or did not lose their home. Sometimes they have before-and-after photos they show you. Sometimes they slip it in between other things, like our cat sitter, who mentioned that she lost her apartment in Fountaingrove and her cat died in the fire but in a couple of weeks she would have a home and she’d be happy to cat-sit for us. Now, post-fire, we live in that tritest of phrases, a new normal. The Library opened that first half-day back, because I work with people who, like me, believe that during disasters libraries should be the first buildings open and the last to close. I am proud to report the Library also housed NomaCares, a resource center for those at our university affected by the fire. That first Friday back we held our Library Operations meeting, and we shared our stories, and that was hard but good. But we also resumed regular activity, and soon the study tables and study rooms were full of students, meetings were convened, work was resumed, and the gears of life turned. But the gears turned forward, not back. Because there is no way back. I am a city mouse, and part of moving to Santa Rosa was our decision to live in a highly citified section, which turned out to be a lucky call. But my mental model of city life has been forever twisted by this fire. I drive on 101 just four miles north of our home, and there is the unavoidable evidence of a fire boldly leaping into an unsuspecting city. I go to the fabric store, and I pass twisted blackened trees and a gun store totaled that first night. I drive to and from work with denuded hills to my east a constant reminder. But that’s as it should be. Even if we sometimes need respite from those reminders–people talk about taking new routes so they won’t see scorched hills and devastated neighborhoods–we cannot afford to forget. Sandy and I have moved around the country in our 25 years together, and we have seen clues everywhere that things are changing and we need to take heed. People like to lapse into the old normal, but it is not in our best interests to do so. All of our stories are different. But we share a collective loss of innocence, and we can never return to where we were. We can only move forward, changed by the fire, changed forever. Bookmark to: Filed in Santa Rosa Living | | Comments Off on What burns away Neutrality is anything but Saturday, August 19, 2017 “We watch people dragged away and sucker-punched at rallies as they clumsily try to be an early-warning system for what they fear lies ahead.” — Unwittingly prophetic me, March 2016. [Sheet cake photo by Flickr user Glane23, CC BY 2.0] Sometime after last November, I realized something very strange was happening with my clothes. My slacks had suddenly shrunk, even if I hadn’t washed them. After months of struggling to keep myself buttoned into my clothes, I gave up and purchased slacks and jeans one size larger. I call them my T***p Pants. This post is about two things.
It is about the lessons librarians are learning in this frightening era about the nuances and qualifications shadowing our deepest core values–an era so scary that quite a few of us, as Tina Fey observed, have acquired T***p Pants. And it’s also some advice, take it or leave it, on how to “be” in this era. I suspect many librarians have had the same thoughts I have been sharing with a close circle of colleagues. Most librarians take pride in our commitment to free speech. We see ourselves as open to all viewpoints. But in today’s new normal, we have seen that even we have limits. This week, the ACRL Board of Directors put out a statement condemning the violence in Charlottesville. That was the easy part. The Board then stated, “ACRL is unwavering in its long-standing commitment to free exchange of different viewpoints, but what happened in Charlottesville was not that; instead, it was terrorism masquerading as free expression.” You can look at what happened in Charlottesville and say there was violence “from many sides,” some of it committed by “very fine people” who just happen to be Nazis surrounded by their own private militia of heavily-armed white nationalists. Or you can look at Charlottesville and see terrorism masquerading as free expression, where triumphant hordes descended upon a small university town under the guise of protecting some lame-ass statue of an American traitor, erected sixty years after the end of the Civil War, not coincidentally during a very busy era for the Klan. Decent people know the real reason the Nazis were in Charlottesville: to tell us they are empowered and emboldened by our highest elected leader. There is no middle ground. You can’t look at Charlottesville and see everyday people innocently exercising First Amendment rights. As I and many others have argued for some time now, libraries are not neutral. Barbara Fister argues, “we stand for both intellectual freedom and against bigotry and hate, which means some freedoms are not countenanced.” She goes on to observe, “we don’t have all the answers, but some answers are wrong.” It follows that if some answers are wrong, so are some actions. In these extraordinary times, I found myself for the first time ever thinking the ACLU had gone too far; that there is a difference between an unpopular stand and a stand that is morally unjustifiable. So I was relieved when the national ACLU concurred with its three Northern California chapters that “if white supremacists march into our towns armed to the teeth and with the intent to harm people, they are not engaging in activity protected by the United States Constitution. The First Amendment should never be used as a shield or sword to justify violence.” But I was also sad, because once again, our innocence has been punctured and our values qualified. Every asterisk we put after “free speech” is painful. It may be necessary and important pain, but it is painful all the same. Many librarians are big-hearted people who like to think that our doors are open to everyone and that all viewpoints are welcome, and that enough good ideas, applied frequently, will change people. And that is actually very true, in many cases, and if I didn’t think it was true I would conclude I was in the wrong profession. But we can’t change people who don’t want to be changed. Listen to this edition of The Daily, a podcast from the New York Times, where American fascists plan their activities. These are not people who are open to reason.
As David Lankes wrote, “there are times when a community must face the fact that parts of that community are simply antithetical to the ultimate mission of a library.” We urgently need to be as one voice as a profession around these issues. I was around for–was part of–the “filtering wars” of the 1990s, when libraries grappled with the implications of the Internet bringing all kinds of content into libraries, which also challenged our core values. When you’re hand-selecting the materials you share with your users, you can pretend you’re open to all points of view. The Internet challenged that pretense, and we struggled and fought, and were sometimes divided by opportunistic outsiders. We are fortunate to have strong ALA leadership this year. The ALA Board and President came up swinging on Tuesday with an excellent presser that stated unequivocally that “the vile and racist actions and messages of the white supremacist and neo-Nazi groups in Charlottesville are in stark opposition to the ALA’s core values,” a statement that (in the tradition of ensuring chapters speak first) followed a strong statement from our Virginia state association.  ARL also chimed in with a stemwinder of a statement.  I’m sure we’ll see more. But ALA’s statement also describes the mammoth horns of the library dilemma. As I wrote colleagues, “My problem is I want to say I believe in free speech and yet every cell in my body resists the idea that we publicly support white supremacy by giving it space in our meeting rooms.” If you are in a library institution that has very little likelihood of exposure to this or similar crises, the answers can seem easy, and our work appears done. But for more vulnerable libraries, it is crucial that we are ready to speak with one voice, and that we be there for those libraries when they need us. How we get there is the big question. I opened this post with an anecdote about my T***p pants, and I’ll wrap it up with a concern. It is so easy on social media to leap in to condemn, criticize, and pick apart ideas. Take this white guy, in an Internet rag, the week after the election, chastising people for not doing enough.  You know what’s not enough? Sitting on Twitter bitching about other people not doing enough. This week, Siva Vaidhyanathan posted a spirited defense of a Tina Fey skit where she addressed the stress and anxiety of these political times.  Siva is in the center of the storm, which gives him the authority to state an opinion about a sketch about Charlottesville. I thought Fey’s skit was insightful on many fronts. It addressed the humming anxiety women have felt since last November (if not earlier). It was–repeatedly–slyly critical of inaction: “love is love, Colin.” It even had a Ru Paul joke. A lot of people thought it was funny, but then the usual critics came out to call it naive, racist, un-funny, un-woke, advocating passivity, whatever. We are in volatile times, and there are provocateurs from outside, but also from inside. Think. Breathe. Step away from the keyboard. Take a walk. Get to know the mute button in Twitter and the unfollow feature in Facebook. Pull yourself together and think about what you’re reading, and what you’re planning to say. Interrogate your thinking, your motives, your reactions. I’ve read posts by librarians deriding their peers for creating subject guides on Charlottesville, saying instead we should be punching Nazis. Get a grip. First off, in real life, that scenario is unlikely to transpire. 
You, buried in that back cubicle in that library department, behind three layers of doors, are not encountering a Nazi any time soon, and if you did, I recommend fleeing, because that wackdoodle is likely accompanied by a trigger-happy militiaman carrying a loaded gun. (There is an entire discussion to be had about whether violence to violence is the politically astute response, but that’s for another day.) Second, most librarians understand that their everyday responses to what is going on in the world are not in and of themselves going to defeat the rise of fascism in America. But we are information specialists and it’s totally wonderful and cool to respond to our modern crisis with information, and we need to be supportive and not go immediately into how we are all failing the world. Give people a positive framework for more action, not scoldings for not doing enough. In any volatile situation, we need to slow the eff down and ask how we’re being manipulated and to what end; that is a lesson the ACLU just learned the hard way. My colleague Michael Stephens is known for saying, “speak with a human voice.” I love his advice, and I would add, make it the best human voice you have. We need one another, more than we know.   Bookmark to: Filed in Intellectual Freedom, Librarianship | | Comments (2) MPOW in the here and now Sunday, April 9, 2017 Sometimes we have monsters and UFOs, but for the most part it’s a great place to work I have coined a few biblioneologisms in my day, but the one that has had the longest legs is MPOW (My Place of Work), a convenient, mildly-masking shorthand for one’s institution. For the last four years I haven’t had the bandwidth to coin neologisms, let alone write about MPOW*. This silence could be misconstrued. I love what I do, and I love where I am. I work with a great team on a beautiful campus for a university that is undergoing a lot of good change. We are just wrapping up the first phase of a visioning project to help our large, well-lit building serve its communities well for the decades to come. We’re getting ready to join the other 22 CSU libraries on OneSearch, our first-ever unified library management system. We have brought on some great hires, thrown some great events (the last one featured four Black Panthers talking about their life work — wow!). With a new dean (me) and a changing workforce, we are developing our own personality. It’s all good… and getting better The Library was doing well when I arrived, so my job was to revitalize and switch it up. As noted in one of the few posts about MPOW, the libraries in my system were undergoing their own reassessment, and that has absorbed a fair amount of our attention, but we continue to move forward. Sometimes it’s the little things. You may recall I am unreasonably proud of the automated table of contents I generated for my dissertation, and I also feel that way about MPOW’s slatwall book displays, which in ten areas beautifully market new materials in spaces once occupied by prison-industry bookcases or ugly carpet and unused phones (what were the phones for? Perhaps we will never know). The slatwall was a small project that was a combination of expertise I brought from other libraries, good teamwork at MPOW, and knowing folks. The central problem was answered quickly by an email to a colleague in my doctoral program (hi, Cindy!) who manages public libraries where I saw the displays I thought would be a good fit. 
The team selected the locations, a staff member with an eye for design recommended the color, everyone loves it, and the books fly off the shelves. If there is any complaining, it is that we need more slatwall. Installed slatwall needs to wait until we know if we are moving/removing walls as part of our building improvements. A bigger holdup is that we need to hire an Access Services Manager, and really, anything related to collections needs the insight of a collections librarian. People… who need people… But we had failed searches for both these positions… in the case of collections, twice. *cue mournful music* We have filled other positions with great people now doing great things, and are on track to fill more positions, but these two, replacing people who have retired, are frustrating us. The access services position is a managerial role, and the collections librarian is a tenure-track position. Both offer a lot of opportunity. We are relaunching both searches very soon (I’ll post a brief update when that happens), and here’s my pitch. If you think you might qualify for either position, please apply. Give yourself the benefit of the doubt. If you know someone who would be a good fit for either position, ask them to apply. I recently mentored someone who was worried about applying to a position. “Will that library hold it against me if I am not qualified?” The answer is of course not!  (And if they do, well, you dodged that bullet!) I have watched far too many people self-select out of positions they were qualified for (hrrrrmmmm particularly one gender…). Qualification means expertise + capacity + potential. We expect this to be a bit of a stretch to you. If a job is really good, most days will have a “fake it til you make it” quality. This is also not a “sink or swim” institution. If it ever was, those days are in the dim past, long before I arrived. The climate is positive. People do great things and we do our best to support them. I see our collective responsibility as an organization as to help one another succeed. Never mind me and my preoccupation with slatwall (think of it as something to keep the dean busy and happy, like a baby with a binky). We are a great team, a great library, on a great campus, and we’re a change-friendly group with a minimum of organizational issues, and I mean it. I have worked enough places to put my hand on a Bible and swear to that. It has typical organizational challenges, and it’s a work in progress… as are we all. The area is crazily expensive, but it’s also really beautiful and so convenient for any lifestyle. You like city? We got city. You like suburb, or ocean, or mountain, or lake? We got that! Anyway, that’s where I am with MPOW: I’m happy enough, and confident enough, to use this blog post to BEG YOU OH PLEASE HELP US FILL THESE POSITIONS. The people who join us will be glad you did. ### *   Sidebar: the real hilarity of coining neologisms is that quite often someone, generally of a gender I do not identify with, will heatedly object to the term, as happened in 2004 when I coined the term biblioblogosphere. Then, as I noted in that post from 2012, others will defend it. That leads me to believe that creating new words is the linguistic version of lifting one’s hind leg on a tree. 
Bookmark to: Filed in Uncategorized | | Comments (1) Questions I have been asked about doctoral programs Wednesday, March 29, 2017 About six months ago I was visiting another institution when someone said to me, “Oh, I used to read your blog, BACK IN THE DAY.” Ah yes, back in the day, that Pleistocene era when I wasn’t working on a PhD while holding down a big job and dealing with the rest of life’s shenanigans. So now the PhD is done–I watched my committee sign the signature page, two copies of it, even, before we broke out the champers and celebrated–and here I am again. Not blogging every day, as I did once upon a time, but still freer to put virtual pen to electronic paper as the spirit moves me. I have a lot to catch up on–for example, I understand there was an election last fall, and I hear it may not have gone my way–but the first order of business is to address the questions I have had from library folk interested in doctoral programs. Note that my advice is not directed at librarians whose goal is to become faculty in LIS programs. Dropping Back In One popular question comes from people who had dropped out of doctoral programs. Could they ever be accepted into a program again? I’m proof there is a patron saint for second chances. I spent one semester in a doctoral program in 1995 and dropped out for a variety of reasons–wrong time, wrong place, too many life events happening. At the time, I felt that dropping out was the academic equivalent of You’ll Never Eat Lunch In This Town Again, but part of higher education is a series of head games, and that was one of them. The second time around, I had a much clearer idea of what I wanted from a program and what kind of program would work for me, and I had the confluence of good timing and good luck. The advice Tom Galvin gave me in 1999, when Sandy and I were living in Albany and when Tom–a longtime ALA activist and former ALA Exec Director–was teaching at SUNY Albany, still seems sound: you can drop out of one program and still find your path back to a doctorate, just don’t drop out of two programs. I also have friends who suffered through a semester or two, then decided it wasn’t for them. When I started the program, I remember thinking “I need this Ph.D. because I could never get a job at, for example, X without it.” Then I watched as someone quite accomplished, with no interest in ever pursuing even a second masters, was hired at X. There is no shame in deciding the cost/benefit analysis isn’t there for you–though I learned, through this experience, that I was in the program for other, more sustainable reasons. Selecting Your Program I am also asked what program to attend. To that my answer is, unless you are very young and can afford to go into, and hopefully out of, significant amounts of debt, pick the program that is most affordable and allows you to continue working as a professional (though if you are at a point in life when you can afford to take a couple years off and get ‘er done, more power to you). That could be a degree offered by your institution or in cooperation with another institution, or otherwise at least partially subsidized. I remember pointing out to an astonished colleague that the Ed.D. he earned for free (plus many Saturdays of sweat equity) was easily worth $65,000, based on the tuition rate at his institution. Speaking of which, I get asked about Ph.D. versus Ed.D. This can be a touchy question. 
My take: follow the most practical and affordable path available to you that gets you the degree you will be satisfied with and that will be the most useful to you in your career. But whether Ed.D. or Ph.D., it’s still more letters after your name than you had before you started. Where Does It Hurt? What’s the hardest part of a doctoral program? For me, that was a two-way tie between the semester coursework and the comprehensive exams. The semester work was challenging because it couldn’t be set aside or compartmentalized. The five-day intensives were really seven days for me as I had to fly from the Left Coast to Boston. The coursework had deadlines that couldn’t be put aside during inevitable crises. The second semester was the hardest, for so many reasons, not the least of which is that once I had burned off the initial adrenaline, the finish line seemed impossibly far away; meanwhile, the tedium of balancing school and work was settling in, and I was floundering in alien subjects I was struggling to learn long-distance. Don’t get me wrong, the coursework was often excellent: managing in a political environment, strategic finance, human resources, and other very practical and interesting topics. But it was a bucket o’ work, and when I called a colleague with a question about chair manufacturers (as one does) and heard she was mired in her second semester, I immediately informed her This Too Shall Pass. Ah, the comprehensive exams. I would say I shall remember them always, except they destroyed so much of my frontal lobe, that will not be possible. The comps required memorizing piles of citations–authors and years, with salient points–to regurgitate during two four-hour closed-book tests.  I told myself afterwards that the comps helped me synthesize major concepts in grand theory, which is a dubious claim but at least made me feel better about the ordeal. A number of students in my program helped me with comps. My favorite memory is of colleague Gary Shaffer, who called me from what sounded like a windswept city corner to offer his advice. I kept hearing this crinkling sound. The crinkling became louder. “Always have your cards with you,” Gary said. He had brought a sound prop: the bag of index cards he used to constantly drill himself. I committed myself to continuous study until done, helped by partnering with my colleague Chuck in long-distance comps prep. We didn’t study together, but we compared timelines and kept one another apprised of our progress. You can survive a doctoral program without a study buddy, but whew, is it easier if you have one. Comps were an area where I started with old tech–good old paper index cards–and then asked myself, is this how it’s done these days? After research, I moved on to electronic flashcards through Quizlet. When I wasn’t flipping through text cards on my phone, iPad, or computer, I was listening to the cards on my phone during my run or while driving around running errands. Writing != Not Writing So about that dissertation. It was a humongous amount of work, but the qualifying paper that preceded it and the coursework and instruction in producing dissertation-quality research gave me the research design skills I needed to pull it off. Once I had the data gathered, it was just a lot of writing. This, I can do. Not everyone can. Writing is two things (well, writing is many things, but we’ll stick with two for now): it is a skill, and it is a discipline. If you do not have those two things, writing will be a third thing: impossible. 
Here is my method. It’s simple. You schedule yourself, you show up, and you write. You do not talk about how you are going to write, unless you are actually going to write. You do not tweet that you are writing (because then you are tweeting, not writing). You do not do other things and feel guilty because you are not writing. (If you do other things, embrace them fully.) I would write write write write write, at the same chair at the same desk (really, a CostCo folding table) facing the same wall with the same prompts secured to the wall with painter’s tape that on warm days would loosen, requiring me to crawl under my “desk” to retrieve the scattered papers, which on many days was pretty much my only form of exercise. Then I would write write write write write some more, on weekends, holiday breaks, and the occasional “dissercation day,” as I referred to vacation days set aside for this purpose. Dissercation Days had the added value that  I was very conscious I was using vacation time to write, so I didn’t procrastinate–though in general I find procrastinating at my desk a poor use of time; if I’m going to procrastinate, let me at least get some fresh air. People will advise you when and how to write. A couple weekends ago I was rereading Stephen King’s On Writing–now that I can read real books again–in which King recommends writing every day. If that works for you, great. What worked for me was using weekends, holidays, or vacation days; writing early in the day, often starting as early as 4 am; taking a short exercise break or powering through until mid-afternoon; and then stopping no later than 4 pm, many times more like 2 pm if I hadn’t stopped by then. When I tried to write on weekday mornings, work would distract me. Not actual tasks, but the thought of work. It would creep into my brain and then I would feel the urgent need to see if the building consultant had replied to my email or if I had the agenda ready for the program and marketing meeting. It also takes me about an hour to get into a writing groove, so by the time the words were flowing it was time to get ready for work. As for evenings, a friend of mine observed that I’m a lark, not an owl. The muse flees me by mid-afternoon. (This also meant I saved the more chore-like tasks of writing for the afternoon.) The key is to find your own groove and stick to it. If your groove isn’t working, maybe it’s not your groove after all. Do not take off too much time between writing sessions. I had to do that a couple of times for six to eight weeks each time, during life events such as household moves and so on, and it took some revisiting to reacquaint myself with my writing (which was Stephen King’s main, and excellent, point in his recommendation to write daily). Even when I was writing on a regular basis I often spent at least an hour at the start of the weekend rereading my writing from page 1 to ensure that my most recent writing had a coherent flow of reasoning and narrative and that the writing for that day would be its logical descendant. Another universal piece of advice is to turn off the technology. I see people tweeting “I’m writing my dissertation right now” and I think, no you aren’t. I used a Mac app called Howler timer to give me writing sieges of 45, 60, 75, or 90 minutes, depending on my degree of focus for that day, during which all interruptions–email, Facebook, Twitter, etc.–were turned off. Twitter and Facebook became snack breaks, though I timed those snacks as well. 
I had favorite Pandora stations to keep me company and drown out ambient noise, and many, many cups of herbal tea. Technology Will Save Us All A few technical notes about technology and doctoral programs. With the exception of the constant allure of social networks and work email, it’s a good thing. I used Khan Academy and online flash cards to study for the math portion of the GRE. As noted earlier, I used Quizlet for my comps, in part because this very inexpensive program not only allowed me to create digital flashcards but also read them aloud to me on my iPhone while I exercised or ran errands. I conducted interviews using FaceTime with an inexpensive plug-in, Call Recorder, that effortlessly produced digital recordings, from which the audio files could be easily split out. I then emailed the audio files to Valerie, my transcriptionist, who lives several thousand miles away but always felt as if she were in the next room, swiftly and flawlessly producing transcripts. I used Dedoose, a cloud-based analytical product, to mark up the narratives, and with the justifiable paranoia of any doctoral student, exported the output to multiple secure online locations. I dimly recall life before such technology, but cannot fathom operating in such a world again, or how much longer some of the tasks would have taken. I spent some solid coin on things like paying a transcriptionist, but when I watch friends struggling to transcribe their own recordings, I have no regrets. There are parts of my dissertation I am exceptionally proud of, but I admit particular pride for my automatically-generated table of contents, just one of many skills I learned through YouTube (spoiler alert: the challenge is not marking up the text, it’s changing the styles to match your requirements. Word could really use a style set called Just Times Roman Please). And of course, there were various library catalogs and databases, and hundreds of e-journals to plumb, activity I accomplished as far away from your typical “library discovery layer” as possible. You can take Google Scholar away from me when you pry it from my cold, dead hands. I also plowed through a lot of print books, and many times had to do backflips to get the book in that format. Journal articles work great in e-format (though I do have a leaning paper pillar of printed journal articles left over from comps review and classes). Books, not so much. I needed to have five to fifteen books simultaneously open during a writing session, something ebooks are lame at. I don’t get romantic about the smell of paper blah blah blah, but when I’m writing, I need my tools in the most immediately accessible format possible, and for me that is digital for articles and paper for books. Nothing Succeeds Like Success Your cohort can be very important, and indeed I remember all of them with fondness but one with particular gratitude. Nevertheless, you alone will cross the finish line. I was unnerved when one member of our cohort dropped out after the first semester, but I shouldn’t have been. Doctoral student attrition happens throughout the academy, no less so in LibraryLand. Like the military, or marriage, you really have no idea what it’s like until you’re in it, and it’s not for everyone. 
It should be noted that the program I graduated from has graduated, or will graduate, nearly all of the students who made it past the first two semesters, which in turn is most of the people who entered the program in its short but glorious life–another question you should investigate while looking at programs. It turned out that for a variety of reasons that made sense, the cohort I was in was the last for this particular doctoral program. That added a certain pressure since each class was the last one to ever be offered, but it also encouraged me to keep my eyes on the prize. I also, very significantly, had a very supportive committee, and most critically, I fully believed they wanted me to succeed. I also had a very supportive spouse, with whom I racked up an infinity of backlogged honey-dos and I-owe-you-for-this promises. Regarding success and failure, at the beginning of the program, I asked if anyone had ever failed out of the program. The answer was no, everyone who left self-selected. I later asked the same question regarding comps: had anyone failed comps? The answer was that a student or two had retaken a section of comps in order to pass, but no one had completely failed (and you got one do-over if that happened). These were crucial questions for me. It also helped me to reflect on students who had bigger jobs, or were also raising kids, or otherwise were generally worse off than me in the distraction department. If so-and-so, with the big Ivy League job, or so-and-so, with the tiny infant, could do it, couldn’t I? (There is a fallacy inherent here that more prestigious schools are harder to administer, but it is a fallacy that comforted me many a day.) Onward I am asked what I will “do” with my Ph.D. In higher education, a doctorate is the expected degree for administrators, and indeed, the news of my successful doctoral defense was met with comments such as “welcome to the club.” So, mission accomplished. Also, I have a job I love, but having better marketability is never a bad idea, particularly in a political moment that can best be described as volatile and unpredictable. I can consult. I can teach (yes, I already could teach, but now more fancy-pants). I could make a reservation at a swanky bistro under the name Dr. Oatmeal and only half of that would be a fabrication. The world is my oyster! Frankly, I did not enter the program with the idea that I would gain skills and develop the ability to conduct doctoral-quality research (I was really shooting for the fancy six-sided tam), but that happened and I am pondering what to do with this expertise. I already have the joy of being pedantic, if only quietly to myself. Don’t tell me you are writing a “case study” unless it has the elements of a case study, not to mention the components of any true research design. Otherwise it’s just anecdata. And of course, when it comes to owning the area of LGBTQ leadership in higher education, I am totally M.C. Hammer: u can’t touch this! I would not mind being part of the solution for addressing the dubious quality of so much LIS “research.” LibraryLand needs more programs such as the Institute for Research Design in Librarianship to address the sorry fact that basic knowledge of the fundamentals of producing industry-appropriate research is in most cases not required for a masters degree in library science, which at least for academic librarianship, given the student learning objectives we claim to support, is absurd. 
I also want to write a book, probably continuing the work I have been doing with documenting the working experiences of LGBTQ librarians. But first I need to sort and purge my home office, revisit places such as Hogwarts and Narnia, and catch up on some of those honey-dos and I-owe-you-for-this promises. And buy a six-sided tam. Bookmark to: Filed in Uncategorized | | Comments (2) A scholar’s pool of tears, Part 2: The pre in preprint means not done yet Tuesday, January 10, 2017 Note, for two more days, January 10 and 11, you (as in all of you) have free access to my article, To be real: Antecedents and consequences of sexual identity disclosure by academic library directors. Then it drops behind a paywall and sits there for a year. When I wrote Part 1 of this blog post in late September, I had keen ambitions of concluding this two-part series by discussing “the intricacies of navigating the liminal world of OA that is not born OA; the OA advocacy happening in my world; and the implications of the publishing environment scholars now work in.” Since then, the world, and my priorities, have changed. My goals are to prevent nuclear winter and lead our library to its first significant building upgrades since it opened close to 20 years ago. But at some point I said on Twitter, in response to a conversation about posting preprints, that I would explain why I won’t post a preprint of To be real. And the answer is very simple: because what qualifies as a preprint for Elsevier is a draft of the final product that presents my writing before I incorporated significant stylistic guidance from the second reviewer, and that’s not a version of the article I want people to read. In the pre-Elsevier draft, as noted before, my research is present, but it is overshadowed by clumsy style decisions that Reviewer 2 presented far more politely than the following summary suggests: quotations that were too brief; rushing into the next thought without adequately closing out the previous thought; failure to loop back to link the literature review to the discussion; overlooking a chance to address the underlying meaning of this research; and a boggy conclusion. A crucial piece of advice from Reviewer 2 was to use pseudonyms or labels to make the participants more real. All of this advice led to a final product, the one I have chosen to show the world. That’s really all there is to it. It would be better for the world if my article were in an open access publication, but regardless of where it is published, I as the author choose to share what I know is my best work, not my work in progress. The OA world–all sides of it, including those arguing against OA–has some loud, confident voices with plenty of “shoulds,” such as the guy (and so many loud OA voices are male) who on a discussion list excoriated an author who was selling self-published books on Amazon by saying “people who value open access should praise those scholars who do and scorn those scholars who don’t.” There’s an encouraging approach! Then there are the loud voices announcing the death of OA when a journal’s submissions drop, followed by the people who declare all repositories are Potemkin villages, and let’s not forget the fellow who curates a directory of predatory OA journals that is routinely cited as an example of what’s wrong with scholarly publishing. I keep saying, the scholarly-industrial complex is broken. 
I’m beyond proud that the Council of Library Deans for the California State University–my 22 peers–voted to encourage and advocate for open access publishing in the CSU system. I’m also excited that my library has its first scholarly communications librarian who is going to bat on open access and open educational resources and all other things open–a position that in consultation with the library faculty I prioritized as our first hire in a series of retirement/moving-on faculty hires. But none of that translates to sharing work I consider unfinished. We need to fix things in scholarly publishing and there is no easy, or single, path. And there are many other things happening in the world right now. I respect every author’s decision about what they will share with the world and when and how they will share it. As for my decision–you have it here. Bookmark to: Filed in Uncategorized | | Comments Off on A scholar’s pool of tears, Part 2: The pre in preprint means not done yet
futurearchives-blogspot-com-8781 ---- futureArch, or the future of archives... 
Monday, 5 September 2016 This blog is no longer being updated But you will find posts on some of our digital archives work here: http://blogs.bodleian.ox.ac.uk/archivesandmanuscripts/category/activity/digital-archives/ Posted by Susan Thomas at 17:33 No comments: Thursday, 31 October 2013 Born Digital: Guidance for Donors, Dealers, and Archival Repositories Today CLIR published a report which is designed to provide guidance on the acquisition of archives in a digital world. The report provides recommendations for donors and dealers, and for repository staff, based on the experiences of archivists and curators at ten repositories in the UK and US, including the Bodleian. You can read it here: http://www.clir.org/pubs/reports/pub159 Posted by Susan Thomas at 17:49 No comments: Labels: acquisitions, dealers, donors, guidance, scoping, sensitivity review, transfers Thursday, 31 January 2013 Digital Preservation: What I wish I knew before I started The Digital Preservation Coalition (DPC) and Archives and Records Association event ‘Digital Preservation: What I wish I knew before I started, 2013’ took place at Birkbeck College, London on 24 January 2013. A half-day conference, it brought together a group of leading specialists in the field to discuss the challenges of digital collecting. William Kilbride kicked off events with his presentation ‘What’s the problem with digital preservation’. He looked at the traditional, or in his words “bleak”, approach that is too often characterised by data loss. William suggested we need to create new approaches, such as understanding the actual potential and value of output; data loss is not the issue if there is no practical case for keeping or digitising material. Some key challenges facing digital archivists were also outlined and it was argued that impediments such as obsolescence issues and storage media failure are a problem bigger than one institution, and collaboration across the profession is paramount. Helen Hockx-Yu discussed how the British Library is collaborating with other institutions to archive websites of historical and cultural importance through the UK Web Archive. Interestingly, web archiving at the British Library is now a distinct business unit with a team of eight people. Like William, Helen also emphasised how useful it is to share experiences and work together, both internally and externally. Next, Dave Thompson, Digital Curator at the Wellcome Library, stepped up with a lively presentation entitled ‘So You Want to go Digital’. For Dave, it is “not all glamour, metadata and preservation events”, which he illustrated with an example of his diary for the week. He then looked at the planning side of digital preservation, arguing that if digital preservation is going to work, not only are we required to be creative, but we need to be sure what we are doing is sustainable. Dave highlighted some key lessons from his career thus far: 1. We must be willing to embrace change. 2. Data preservation is not solely an exercise in technology but requires engagement with data and consumers. 3. Little things we do every day in the workplace are essential to efficient digital preservation, including backup, planning, IT infrastructure, maintenance and virus checking. 4. It needs to be easy to do and within our control, otherwise the end product is not preservation. 5. Continued training is essential so we can make the right decisions in appraisal, arrangement, context, description and preservation. 6. 
We must understand copyright access. Patricia Sleeman, Digital Archivist at University of London Computer Centre then highlighted a selection of practical skills that should underpin how we move forward with digital preservation. For instance, she stressed that information without context is meaningless and has little value without the appropriate metadata. Like the other speakers, she suggested planning is paramount, and before we start a project we must look forward and learn about how we will finish it. As such, project management is an essential tool, including the ability to understand budgets. Adrian Brown from the Parliamentary Archives continued with his presentation 'A Day in the Life of a Digital Archivist'. His talk was a real eye-opener on just how busy and varied the role is. A typical day for Adrian might involve talking to information owners about possible transfers, ingesting and cataloguing new records into the digital repository, web archiving, providing demos to various groups, drafting preservation policies and developing future requirements such as building software, software testing and preservation planning. No room to be bored here! Like Dave Thompson, Adrian noted that while there are more routine tasks such as answering emails and endless meetings, the rewards from being involved in a new and emerging discipline far outweigh the more mundane moments. We then heard from Simon Rooks from the BBC Multi-Media Archive who described the varied roles at his work (I think some of the audience were feeling quite envious here!). In keeping with the theme of the day, Simon reflected on his career path. Originally trained as a librarian, he argued that he would have benefited immensely as a digital archivist if he had learnt the key functions of an archivist’s role early on. He emphasised how the same archival principles (intake, appraisal and selection, cataloguing, access etc.) underpin our practices, whether records are paper or digital, and whether we are in archives or records management. These basic functions help to manage many of the issues concerning digital content. Simon added that the OAIS functional model is an approach that has encouraged multi-disciplinary team-work amongst those working at the BBC. After some coffee there followed a Q&A session, which proved lively and engaging. A lot of ground was covered including how appropriate it is to distinguish 'digital archivists' from 'archivists'. We also looked at issues of cost modelling and it was suggested that while we need to articulate budgets better, we should perhaps be less obsessed with costs and focus on the actual benefits and return of investment from projects. There was then some debate about what students should expect from undertaking the professional course. Most agreed that it is simply not enough to have the professional qualification, and continually acquiring new skill sets is essential. A highly enjoyable afternoon then, with some thought-provoking presentations, which were less about the techie side of digital preservation, and more a valuable lesson on the planning and strategies involved in managing digital assets. Communications, continued learning and project planning were central themes of the day, and importantly, that we should be seeking to build something that will have value and worth. 
Posted by Anonymous at 10:42 No comments: Tuesday, 13 November 2012 Transcribe at the arcHIVE I do worry from time to time that textual analogue records will come to suffer from their lack of searchability when compared with their born-digital peers. For those records that have been digitised, crowd-sourcing transcription could be an answer. A rather neat example of just that is the arcHIVE platform from the National Archives of Australia. arHIVE is a pilot from NAA's labs which allows anyone to contribute to the transcription of records. To get started they have chosen a selection of records from their Brisbane office which are 'known to be popular'. Not too many of them just yet, but at this stage I guess they're just trying to prove the concept works. All the items have been OCR-ed, and users can choose to improve or overwrite the results from the OCR process. There are lots of nice features here, including the ability to choose documents by a difficulty rating (easy, medium or hard) or by type (a description of the series by the looks of it). The competitive may be inspired by the presence of a leader board, while the more collaborative may appreciate the ability to do as much as you can, and leave the transcription for someone else to finish up later. You can register for access to some features, but you don't have to either. Very nice. Posted by Susan Thomas at 09:37 No comments: Labels: crowdsourcing, searchability, transcription Friday, 19 October 2012 Atlas of digital damages An Atlas of digital damage has been created on Flickr, which will provide a handy resource for illustrating where digital preservation has failed. Perhaps 'failed' is a little strong. In some cases the imperfection may be an acceptable trade off. A nice, and useful, idea. Contribute here. Posted by Susan Thomas at 17:48 No comments: Labels: corruption, damage Saturday, 13 October 2012 DayOfDigitalArchives 2012 Yesterday was Day of Digital Archives 2012! (And yes, I'm a little late posting...) This 'Day' was initiated last year to encourage those working with digital archives to use social media to raise awareness of digital archives: "By collectively documenting what we do, we will be answering questions like: What are digital archives? Who uses them? How are they created and managed? Why are they important?" . So in that spirit, here is a whizz through my week. Coincidentally not only does this week include the Day of Digital Archives but it's also the week that the Digital Preservation Coalition (or DPC) celebrated its 10th birthday. On Monday afternoon I went to the reception at the House of Lords to celebrate that landmark anniversary. A lovely event, during which the shortlist for the three digital preservation awards was announced. It's great to see three award categories this time around, including one that takes a longer view: 'the most outstanding contribution to digital preservation in the last decade'. That's quite an accolade. On the train journey home from the awards I found some quiet time to review a guidance document on the subject of acquiring born-digital materials. There is something about being on a train that puts my brain in the right mode for this kind of work. Nearing its final form, this guidance is the result of a collaboration between colleagues from a handful of archive repositories. The document will be out for further review before too long, and if we've been successful in our work it should prove helpful to creators, donors, dealers and repositories. 
Part of Tuesday I spent reviewing oral history guidance drafted by a colleague to support the efforts of Oxford Medical Alumni in recording interviews with significant figures in the world of Oxford medicine. Oral histories come to us in both analogue and digital formats these days, and we try to digitise the former as and when we can. The development of the guidance is in the context of our Saving Oxford Medicine initiative to capture important sources for the recent history of medicine in Oxford. One of the core activities of this initiative is survey work, and it is notable that many archives surveyed include plenty of digital material. Web archiving is another element of the 'capturing' work that the Saving Oxford Medicine team has been doing, and you can see what has been archived to-date via Archive-It, our web archiving service provider. Much of Wednesday morning was given over to a meeting of our building committee, which had very little to do with digital archives! In the afternoon, however, we were pleased to welcome visitors from MIT - Nancy McGovern and Kari Smith. I find visits like these are one of the most important ways of sharing information, experiences and know-how, and as always I got a lot out of it. I hope Nancy and Kari did too! That same afternoon, colleagues returned from a trip to London to collect another tranche of a personal archive. I'm not sure if this instalment contains much in the way of digital material, but previous ones have included hundreds of floppies and optical media, some zip discs and two hard disks. Also arriving on Wednesday, some digital Library records courtesy of our newly retired Executive Secretary; these supplement materials uploaded to BEAM (our digital archives repository) last week. On Thursday, I found some time to work with developer Carl Wilson on our SPRUCE-funded project. Becky Nielsen (our recent trainee, now studying at Glasgow) kicked off this short project with Carl, following on from her collaboration with Peter May at a SPRUCE mashup in Glasgow. I'm picking up some of the latter stages of testing and feedback work now Becky's started her studies. The development process has been an agile one with lots of chat and testing. I've found this very productive - it's motivating to see things evolving, and to be able to provide feedback early and often. For now you can see what's going on at github here, but this link will likely change once we settle on a name that's more useful than 'spruce-beam' (doesn't tell you much, does it?! Something to do with trees...) One of the primary aims of this tool is to facilitate collection analysis, so we know better what our holdings are in terms of format and content. We expect that it will be useful to others, and there will be more info. on it available soon. Friday was more SPRUCE work with Carl, among other things. Also a few meetings today - one around funding and service models for digital archiving, and a meeting of the Bodleian's eLegal Deposit Group (where my special interest is web archiving). The curious can read more about e-legal deposit at the DCMS website.  One fun thing that came out of the day was that the Saving Oxford Medicine team decided to participate in a Women in Science wikipedia editathon. This will be hosted by the Radcliffe Science Library on 26 October as part of a series of 'Engage' events on social media organised by the Bodleian and the University's Computing Services. 
It's fascinating to contemplate how the range and content of Wikipedia articles change over time, something a web archive would facilitate perhaps.  For more on working with digital archives, go take a look at the great posts at the Day of Digital Archives blog! Posted by Susan Thomas at 19:45 No comments: Labels: acquisition, collection analysis, DayofDigArc, DODA12, dpc, mashup, SPRUCE, webarchiving Friday, 8 June 2012 Sprucing up the TikaFileIdentifier As it's International Archives Day tomorrow, I thought it would be nice to quickly share some news of a project we are working on, which should help us (and others!) to carry out digital preservation work a little bit more efficiently. Following the SPRUCE mashup I attended in April, we are very pleased to be one of the organizations granted a SPRUCE Project funding award, which will allow us to 'spruce' up the TikaFileIdentifier tool. (Paul has written more about these funding awards on the OPF site.) TikaFileIdentifier is the tool which was developed at the mashup to address a problem several of us were having extracting metadata from batches of files, in our case within ISO images. Due to the nature of the mashup event the tool is still a bit rough around the edges, and this funding will allow us to improve on it. We aim to create a user interface and a simpler install process, and carry out performance improvements. Plus, if resources allow, we hope to scope some further functionality improvements. This is really great news, as with the improvements that this funding allows us to make, the TikaFileIdentifier will provide us with better metadata for our digital files more efficiently than our current system of manually checking each file in a disk image. Hopefully the simpler user interface and other improvements means that other repositories will want to make use of it as well; I certainly think it will be very useful! Posted by Rebecca Nielsen at 17:18 No comments: Labels: metadata, SPRUCE, TikaFileIdentifier Friday, 20 April 2012 SPRUCE Mashup: 16th-18th April 2012 Earlier this week I attended a 3 day mashup event in Glasgow, organised as part of the SPRUCE project.  SPRUCE aims to enable Higher Education Institutions to address preservation gaps and articulate the business case of digital preservation, and the mashup serves as a way to bring practitioners and developers together to work on these problems. Practitioners took along a collection which they were having issues with, and were paired off with a developer who could work on a tool to provide a solution.  Day 1 After some short presentations on the purpose of SPRUCE and the aims of the mashup, the practitioners presented some lightning talks on our collections and problems. These included dealing with email attachments, preserving content off Facebook, software emulation, black areas in scanned images, and identifying file formats with incorrect extensions, amongst others. I took along some disk images, as we find it very time-consuming to find out date ranges, file types and content of the files in the disk image, and we wanted a more efficient way to get this metadata. More information on the collections and issues presented can be found at the wiki. After a short break for coffee (and excellent cakes and biscuits) we were sorted into small groups of collection owners and developers to discuss our issues in more detail. 
In my group this led to conversations about natural language processing, and the possibilities of using predefined subjects to identify files as being about a particular topic, which we thought could be really helpful, but somewhat impossible to create in a couple of days! We were then allocated our developers. As there were a few of us with problems with file identification, we were assigned to the same developer, Peter May from the BL. The day ended with a short presentation from William Kilbride on the value of digital collections and Neil Beagrie's benefits framework. Day 2 The developers were packed off to another room to work on coding, while we collection owners started to look into the business case for digital preservation. We used Beagrie’s framework to consider the three dimensions of benefits (direct or indirect, near- or long-term, and internal or external), as they apply to our institutions. When we reported back, it was interesting to see how different organisations benefit in different ways. We also looked at various stakeholders and how important or influential they are to digital preservation. Write ups of these sessions are also available at the wiki.   The developers came back at several points throughout the day to share their progress with us, and by lunchtime the first solution had been found! The first steps to solving our problem were being made; Peter had found a program, Apache Tika, which can parse a file and extract metadata (it can also identify the content type of files with incorrect extensions), and had written a script so that it could work through a directory of files, and output the information into a CSV spreadsheet. This was a really promising start, especially due to the amount of metadata that could potentially be extracted (provided it exists within the file), and the ability to identify file types with incorrect extensions. Day 3 We had another catch up with the developers and their overnight progress. Peter had written a script that took the information from the CSV file and summarised it into one row, so that it fits into the spreadsheets we use at BEAM. Unfortunately, mounting the ISO image to check it with Apache Tika was slightly more complicated than anticipated, so our disk images couldn't be checked this way without further work. While the developers set about finalizing their solutions, we continued to work on the business case, doing a skills gap analysis to consider whether our institutions had the skills and resources to carry out digital preservation. Reporting back, we had a very interesting discussion on skills gaps within the broader archives sector, and the need to provide digital preservation training to students as well as existing professionals. We then had to prepare an ‘elevator pitch’ for those occasions when we find ourselves in a lift with senior management, which neatly brought together all the things we had discussed, as we had to explain the specific benefits of digital preservation to our institution and our goals in about a minute.  To wrap up the developers presented their solutions, which solved many of the problems we had arrived with. A last minute breakthrough in mounting ISO images using  WinCDEmu and running scripts on them meant that we are able to use the Tika script on our disk images. However, because we were so short on time, there are still some small problems that need addressing. 
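To make that workflow concrete, here is a rough sketch of the kind of script described above: walk a directory of files (for a disk image, the mount point created with WinCDEmu or a loop mount), run Apache Tika over each file, write per-file metadata to a CSV, and finish with a one-row summary of file count, formats and date range. This is not the actual TikaFileIdentifier code; it is an illustration assuming the tika-python bindings (which need Java available), and the metadata keys and the /mnt/disk_image path are examples only.

import csv
import os
from collections import Counter

from tika import parser  # tika-python runs Apache Tika behind the scenes


def tika_report(root_dir, csv_path):
    """Walk root_dir, extract Tika metadata per file, write a CSV, return a summary row."""
    rows = []
    for dirpath, _dirs, filenames in os.walk(root_dir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            meta = parser.from_file(path).get("metadata") or {}
            rows.append({
                "path": path,
                # field names vary by format; these are illustrative
                "content_type": str(meta.get("Content-Type", "unknown")),
                "created": str(meta.get("dcterms:created", "")),
                "modified": str(meta.get("dcterms:modified", "")),
            })

    # one row per file, as in the first script described in the post
    with open(csv_path, "w", newline="") as out:
        writer = csv.DictWriter(out, fieldnames=["path", "content_type", "created", "modified"])
        writer.writeheader()
        writer.writerows(rows)

    # a single summary row of the kind used for collection spreadsheets:
    # how many files, which formats, and the date range they span
    types = Counter(r["content_type"] for r in rows)
    dates = sorted(d for r in rows for d in (r["created"], r["modified"]) if d)
    return {
        "files": len(rows),
        "formats": dict(types),
        "earliest": dates[0] if dates else "",
        "latest": dates[-1] if dates else "",
    }


print(tika_report("/mnt/disk_image", "tika_metadata.csv"))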
I'm really happy with our solution, and I was very impressed by all the developers and how much they were able to get done in such a short space of time. I felt that this event was a very useful way to get thinking about the business case for what we do, and to get to see what other people within the sector are doing and what problems they are facing. It was also really helpful as a non-techie to get to talk with developers and get an idea of what it is possible to build tools to do (and get them made!). I would definitely recommend this type of event – in fact, I’d love to go along again if I get the opportunity! Posted by Rebecca Nielsen at 15:52 2 comments: Monday, 26 March 2012 Media Recognition: DV part 3 DVCAM (encoding) Type: Digital videotape cassette encoding Introduced: 1996 Active: Yes, but few new camcorders are being produced. Cessation: - Capacity: 184 minutes (large), 40 minutes (MiniDV). Compatibility: DVCAM is an enhancement of the widely adopted DV format, and uses the same encoding. Cassettes recorded in DVCAM format can be played back in DVCAM VTRs (Video Tape Recorders), newer DV VTRs (made after the introduction of DVCAM), and DVCPRO VTRs, as long as the correct settings are specified (this resamples the signal to 4:1:1). DVCAM can also be played back in compatible HDV players. Users: Professional / Industrial. File Systems: - Common Manufacturers: Sony, Ikegami. DVCAM is Sony’s enhancement of the DV format for the professional market. DVCAM uses the same encoding as DV, although it records ‘locked’ rather than ‘unlocked’ audio. It also differs from DV as it has a track width of 15 microns and a tape speed of 28.215 mm/sec to make it more robust. Any DV cassette can contain DVCAM format video, but some are sold with DVCAM branding on them. Recognition DVCAM labelled cassettes come in large (125.1 x 78 x 14.6 mm) or MiniDV (66 x 48 x 12.2mm) sizes. Tape width is ¼”. Large cassettes are used in editing and recording decks, while the smaller cassettes are used in camcorders. They are marked with the DVCAM logo, usually in the upper-right hand corner.  HDV (encoding) Type: Digital videotape cassette encoding Introduced: 2003 Active: Yes, although industry experts do not expect many new HDV products. Cessation: - Capacity: 1 hour (MiniDV), up to 4.5 hours (large) Compatibility: Video is recorded in the popular MPEG-2 video format. Files can be transferred to computers without loss of quality using an IEEE 1394 connection. There are two types of HDV, HDV 720p and HDV 1080, which are not cross-compatible. HDV can be played back in HDV VTRs. These are often able to support other formats such as DV and DVCAM. Users: Amateur/Professional File Systems: - Common Manufacturers: Format developed by JVC, Sony, Canon and Sharp. Unlike the other DV enhancements, HDV uses MPEG-2 compression rather than DV encoding. Any DV cassette can contain HDV format video, but some are sold with HDV branding on them.  There are two different types of HDV: HDV 720p (HD1, made by JVC) and HDV 1080 (HD2, made by Sony and Canon). HDV 1080 devices are not generally compatible with HDV 720p devices. The type of HDV used is not always identified on the cassette itself, as it depends on the camcorder used rather than the cassette. Recognition  HDV is a tape only format which can be recorded on normal DV cassettes. Some MiniDV cassettes with lower dropout rates are indicated as being for HDV, either with text or the HDV logo. These are not essential for recording HDV video.  
Posted by Rebecca Nielsen at 14:52 No comments: Labels: digital video, DVCAM, HDV, media recoginition, video Media Recognition: DV part 2 DV (encoding) Type: Digital videotape cassette encoding Introduced: 1995 Active: Yes, but tapeless formats such as MPEG-1, MPEG-2 and MPEG-4 are becoming more popular. Cessation: - Capacity: MiniDV cassettes can hold up to 80/120 minutes SP/LP. Medium cassette size can hold up to 3.0/4.6 hrs SP/LP. Files sizes can be up to 1GB per 4 minutes of recording. Compatibility: DV format is widely adopted. Cassettes recorded in the DV format can be played back on DVCAM, DVCPRO and HDV replay devices. However, LP recordings cannot be played back in these machines. Users: DV is aimed at a consumer market – may also be used by ‘prosumer’ film makers. File Systems: - Common Manufacturers: A consortium of over 60 manufacturers including Sony, Panasonic, JVC, Canon, and Sharp. DV has a track width of 10 microns and a tape speed of 18.81mm/sec. It can be found on any type of DV cassette, regardless of branding, although most commonly it is the format used on MiniDV cassettes.  Recognition DV cassettes are usually found in the small size, known as MiniDV. Medium size (97.5 × 64.5 × 14.6 mm) DV cassettes are also available, although these are not as popular as MiniDV. DV cassettes are labelled with the DV logo. DVCPRO (encoding) Type: Digital videotape cassette encoding Introduced: 1995 (DVCPRO), 1997 (DVCPRO 50), 2000 (DVCPRO HD) Active: Yes, but few new camcorders are being produced. Cessation: - Capacity: 126 minutes (large), 66 minutes (medium). Compatibility: DVCPRO is an enhancement of the widely adopted DV format, and uses the same encoding. Cassettes recorded in DVCPRO format can be played back only in DVCPRO Video Tape Recorders (VTRs) and some DVCAM VTRs. Users: Professional / Industrial; designed for electronic news gathering File Systems: - Common Manufacturers: Panasonic, also Philips, Ikegami and Hitachi. DVCPRO is Panasonic’s enhancement of the DV format, which is aimed at a professional market. DVCPRO uses the same encoding as DV, but it features ‘locked’ audio, and uses 4:1:1 sampling instead of 4:2:0. It has an 18 micron track width, and a tape speed of 33.82 mm/sec which makes it more robust. DVCPRO uses Metal Particle (MP) tape rather than Metal Evaporate( ME) to improve durability. DVCPRO 50 and DVCPRO HD are further developments of DVCPRO, which use the equivalent of 2 or 4 DV codecs in parallel to increase the video data rate. Any DV cassette can contain DVCPRO format video, but some are sold with DVCPRO branding on them. Recognition DVCPRO branded cassettes come in medium (97.5 × 64.5 × 14.6mm) or large (125 × 78 × 14.6mm) cassette sizes. The medium size is for use in camcorders, and the large size in editing and recording decks. DVCPRO 50 and DVCPRO HD branded cassettes are extra-large cassettes (172 x 102 x 14.6mm). Tape width is ¼”. DVCPRO labelled cassettes have different coloured tape doors depending on their type; DVCPRO has a yellow tape door, DVCPRO50 has a blue tape door, and DVCPRO HD has a red tape door. Images of DVCPRO cassettes are available at the Panasonic website. Posted by Rebecca Nielsen at 14:31 No comments: Labels: digital video, DV, DVCPRO, media recoginition, video Media Recognition: DV part 1 DV can be used to refer to both a digital tape format, and a codec for digital video. DV tape usually carries video encoded with the DV codec, although it can hold any type of data. 
The DV format was developed in the mid 1990s by a consortium of video manufacturers, including Sony, JVC and Panasonic, and quickly became the de facto standard for home video production after introduction in 1995. Videos are recorded in .dv or .dif formats, or wrapped in an AVI, QuickTime or MXF container. These can be easily transferred to a computer with no loss of data over an IEEE 1394 (Fire Wire) connection. DV tape is ¼ inch (6.35mm) wide. DV cassettes come in four different sizes: Small, also known as MiniDV (66 x 48 x 12.2 mm), medium (97.5 × 64.5 × 14.6 mm), large (125.1 x 78 x 14.6 mm), and extra-large (172 x 102 x 14.6 mm). MiniDV is the most popular cassette size. DV cassettes can be encoded with one of four formats; DV, DVCAM, DVCPRO, or HDV. DV is the original encoding, and is used in consumer devices. DVCPRO and DVCAM were developed by Panasonic and Sony respectively as an enhancement of DV, and are aimed at a professional market. The basic encoding algorithm is the same as with DV, but a higher track width (18 and 15 microns versus DV’s 10 micron track width) and faster tape speed means that these formats are more robust and better suited to professional users. HDV is a high-definition variant, aimed at professionals and consumers, which uses MPEG-2 compression rather than the DV format. Depending on the recording device, any of the four DV encodings can be recorded on any size DV cassette. However, due to different recording speeds, the formats are not always backwards compatible. A cassette recorded in an enhanced format, such as HDV, DVCAM or DVCPRO, will not play back on a standard DV player. Also, as they are supported by different companies, there are some issues with playing back a DVCPRO cassette on DVCAM equipment, and vice versa. Although all DV cassette sizes can record any format of DV, some are marketed specifically as being of a certain type; e.g. DVCAM. The guide below looks at some of the most common varieties of DV cassette that might be encountered, and the encodings that may be used with them. It is important to remember that any type of encoding may be found on any kind of cassette, depending on what system the video was recorded on. MiniDV (cassette) Type: Digital videotape cassette Introduced: 1995 Active: Yes, but is being replaced in popularity by hard disk and flash memory recording. At the International Consumer Electronics Show 2011 no camcorders were presented which record on tape. Cessation: - Capacity: Up to 80 minutes SP / 120 minutes LP, depending on the tape used; 60/90 minutes SP/LP is standard. This can also depend on the encoding used (see further entries). Files sizes can be up to 1GB per 4 minutes of recording. Compatibility: DV file format is widely adopted. Requires Fire Wire (IEEE 1394) port for best transfer. Users: Consumer and ‘Prosumer’ film makers, some professionals. File Systems: - Common Manufacturers: A consortium of over 60 manufacturers including Sony, Panasonic, JVC, Canon, and Sharp MiniDV refers to the size of the cassette; as noted above, it can come with any encoding. As a consumer format they generally use DV encoding. DVCAM and HDV cassettes also come in MiniDV size. MiniDV is the most popular DV cassette, and is used for consumer and semi-professional (‘prosumer’) recordings due to its high quality. Recognition These cassettes are the small cassette size, measuring 66 x 48 x 12.2mm. Tape width is ¼”. 
They carry the MiniDV logo, as seen below: Posted by Rebecca Nielsen at 13:03 No comments: Labels: digital video, DV, media recoginition, MiniDV, video Monday, 30 January 2012 Digital Preservation: What I Wish I Knew Before I Started Tuesday 24th January, 2012 Last week I attended a student conference, hosted by the Digital Preservation Coalition, on what digital preservation professionals wished they had known before they started. The event covered a great deal of the challenges faced by those involved in digital preservation, and the skills required to deal with these challenges. The similarities between traditional archiving and digital preservation were highlighted at the beginning of the afternoon, when Sarah Higgins translated terms from the OAIS model into more traditional ‘archive speak’. Dave Thompson also emphasized this connection, arguing that digital data “is just a new kind of paper”, and that trained archivists already have 85-90% of the skills needed for digital preservation. Digital preservation was shown to be a human rather than a technical challenge. Adrian Brown argued that much of the preservation process (the "boring stuff") can be automated. Dave Thompson stated that many of the technical issues of digital preservation, such as migration, have been solved, and that the challenge we now face is to retain the context and significance of the data. The point made throughout the afternoon was that you don’t need to be a computer expert in order to carry out effective digital preservation. The urgency of intervention was another key lesson for the afternoon. As William Kilbride put it; digital preservation won’t do itself, won’t go away, and we shouldn't wait for perfection before we begin to act. Access to data in the future is not guaranteed without input now, and digital data is particularly intolerant to gaps in preservation. Andrew Fetherstone added to this argument, noting that doing something is (usually) better than doing nothing, and that even if you are not in a position to carry out the whole preservation process, it is better to follow the guidelines as far as you can, rather than wait and create a backlog. The scale of digital preservation was another point illustrated throughout the afternoon. William Kilbride suggested that the days of manual processing are over, due to the sheer amount of digital data being created (estimated to reach 35ZB by 2020!). He argued that the ability to process this data is more important to the future of digital preservation than the risks of obsolescence. The impossibility of preserving all of this data was illustrated by Helen Hockx-Yu, who offered the statistic the the UK Web Archive and National Archives Web Archive combined have archived less than 1% of UK websites. Adrian Brown also pointed out that as we move towards dynamic, individualised content on the web, we must decide exactly what the information is that we are trying to preserve. During the Q&A session, it was argued that the scale of digital data means that we have to accept that we can’t preserve everything, that not everything needs to be preserved, and that there will be data loss. The importance of collaboration was another theme which was repeated by many speakers. Collaboration between institutions on a local, national and even international level was encouraged, as by sharing solutions to problems and implementing common standards we can make the task of digital preservation easier. 
This is only a selection of the points covered in a very engaging afternoon of discussion. Overall, the event showed that, despite the scale of the task, digital preservation needn't be a frightening prospect, as archivists already have many of the necessary skills. The DPC have uploaded the slides used during the event, and the event was also live-tweeted, using the hashtag #dpc_wiwik, if you are interested in finding out more. Posted by Rebecca Nielsen at 09:41 1 comment: Labels: http://www.blogger.com/img/blank.gif Tuesday, 18 October 2011 What is ‘The Future of the Past of the Web’? ‘The Future of the Past of the Web’, Digital Preservation Coalition Workshop British Library, 7 October 2011 Chrissie Webb and Liz McCarthy In his keynote address to this event – organised by the Digital Preservation Coalition , the Joint Information Systems Committee and the British Library – Herbert van der Sompel described the purpose of web archiving as combating the internet’s ‘perpetual now’. Stressing the importance to researchers of establishing the ‘temporal context’ of publications and information, he explained how the framework of his Memento Project uses a ‘ timegate’ implemented via web plugins to show what a resource was like at a particular date in the past. There is a danger, however, that not enough is being archived to provide the temporal context; for instance, although DOIs provide stable documents, the resources they link to may disappear (‘link rot’). The Memento Project Firefox plugin uses a sliding timeline (here, just below the Google search box) to let users choose an archived date A session on using web archives picked up on the theme of web continuity in a presentation by The National Archives on the UK Government Web Archive, where a redirection solution using open source software helps tackle the problems that occur when content is moved or removed and broken links result. Current projects are looking at secure web archiving, capturing internal (e.g. intranet) sources, social media capture and a semantic search tool that helps to tag ‘unstructured’ material. In a presentation that reinforced the reason for the day’s ‘use and impact’ theme, Eric Meyer of the Oxford Internet Institute wondered whether web archives were in danger of becoming the ‘dusty archives’ of the future, contrasting their lack of use with the mass digitisation of older records to make them accessible. Is this due to a lack of engagement with researchers, their lack of confidence with the material or the lingering feeling that a URL is not a ‘real’ source? Archivists need to interrupt the momentum of ‘learned’ academic behaviour, engaging researchers with new online material and developing archival resources in ways that are relevant to real research – for instance, by helping set up mechanisms for researchers to trigger archiving activity around events or interests, or making more use of server logs to help them understand use of content and web traffic. One of the themes of the second session on emerging trends was the shift from a ‘page by page’ approach to the concept of ‘data mining’ and large scale data analysis. Some of the work being done in this area is key to addressing the concerns of Eric Meyer’s presentation; it has meant working with researchers to determine what kinds and sources of data they could really use in their work. Representatives of the UK Web Archive and the Internet Archive described their innovations in this field, including visualisation and interactive tools. 
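Going back to the Memento 'timegate' idea described above, the datetime negotiation it relies on can be seen with a few lines of Python. This is a minimal sketch only, using the requests library and the Internet Archive's public timegate as an example endpoint (my own choice of endpoint, not one discussed at the workshop).

import requests

# Memento datetime negotiation (RFC 7089): ask a TimeGate for the capture of a
# resource closest to the date given in the Accept-Datetime request header.
target = "http://www.bodleian.ox.ac.uk/"
timegate = "https://web.archive.org/web/" + target  # assumed public TimeGate

response = requests.get(
    timegate,
    headers={"Accept-Datetime": "Fri, 07 Oct 2011 12:00:00 GMT"},
)
print(response.url)                              # URL of the selected memento
print(response.headers.get("Memento-Datetime"))  # when that capture was made

A browser plugin like Memento's simply wraps this negotiation up behind a timeline slider.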
Archiving social networks was also a major theme, and Wim Peters outlined the challenges of the ARCOMEM project, a collaboration between Sheffield and Hanover Universities that is tackling the problems of archiving ‘community memory’ through the social web, confronting extremely diverse and volatile content of varying quality for which future demand is uncertain. Richard Davis of the University of London Computer Centre spoke about the BlogForever project, a multi-partner initiative to preserve blogs, while Mark Williamson of Hanzo Archives spoke about web archiving from a commercial perspective, noting that companies are very interested in preserving the research opportunities online information offers. The final panel session raised the issue of the changing face of the internet, as blogs replace personal websites and social media rather than discrete pages are used to create records of events. The notion of ‘web pages’ may eventually disappear, and web archivists must be prepared to manage the dispersed data that will take (and is taking) their place. Other points discussed included the need for advocacy and better articulation of the demand for web archiving (proposed campaign: ‘Preserve!: Are you saving your digital stuff?’), duplication and deduplication of content, the use of automated selection for archiving and the question of standards. Posted by lizrosemccarthy at 13:40 No comments: Labels: Future of the Past of the Web, webarchives, workshop Older Posts Home Subscribe to: Posts (Atom) What's the futureArch blog? A place for sharing items of interest to those curating hybrid archives & manuscripts. Legacy computer bits wanted! At Bodleian Electronic Archives and Manuscripts (BEAM) we are always on the lookout for older computers, disk drives, technical manuals and software that can help us recover digital archives. If you have any such stuff that you would be willing to donate, please contact susan.thomas@bodleian.ox.ac.uk. Examples of items in our wish list include: an Apple Mac Macintosh Classic II Computer, a Wang PC 200/300 series, as well as myriad legacy operating system and word-processing software. 
futurelab-mx-5627 ---- The future is now! | Future Lab. We are Future Lab, the community of the future. We develop technology and share knowledge. We work for the future we want to see. Follow us on Facebook. We develop science- and technology-based projects: whether we get hands-on with the development ourselves or contribute through mentoring, we love being able to innovate, to help develop new technologies, to take them to different parts of Mexico, and to present our projects at events with companies working in the field. We share knowledge and promote education in technology: through workshops, talks, conferences and event appearances we share technical knowledge and the culture around building technology. We connect people and build community: we love empowering our community, supporting the great minds of the future who come to us, and linking them with whoever can help them reach their potential. "Our vision at Future Lab is to build our future, share knowledge and create the connections that help our community." Rodolfo Ferro, Co-founder of Future Lab. Come and see everything we are doing! Future Lab on Facebook. © 2020 Future Lab. galencharlton-com-3162 ---- Meta Interchange – Libraries, computing, metadata, and more. Trading for images Posted: 23 February 2020 Categories: Libraries, Patron Privacy Let's search a Koha catalog for something that isn't at all controversial: What you search for in a library catalog ought to be only between you and the library — and that, only briefly, as the library should quickly forget. Of course, between "ought" and "is" lies the Devil and his details. Let's poke around with Chrome's DevTools:
Hit Control-Shift-I (on Windows)
Switch to the Network tab.
Hit Control-R to reload the page and get a list of the HTTP requests that the browser makes.
We get something like this: There's a lot to like here: every request was made using HTTPS rather than HTTP, and almost all of the requests were made to the Koha server. (If you can't trust the library catalog, who can you trust? Well… that doesn't have an answer as clear as we would like, but I won't tackle that question here.)
However, the two cover images on the result's page come from Amazon:

https://images-na.ssl-images-amazon.com/images/P/0974458902.01.TZZZZZZZ.jpg
https://images-na.ssl-images-amazon.com/images/P/1849350949.01.TZZZZZZZ.jpg

What did I trade in exchange for those two cover images? Let's click on the request and see:

:authority: images-na.ssl-images-amazon.com
:method: GET
:path: /images/P/0974458902.01.TZZZZZZZ.jpg
:scheme: https
accept: image/webp,image/apng,image/*,*/*;q=0.8
accept-encoding: gzip, deflate, br
accept-language: en-US,en;q=0.9
cache-control: no-cache
dnt: 1
pragma: no-cache
referer: https://catalog.libraryguardians.com/cgi-bin/koha/opac-search.pl?q=anarchist
sec-fetch-dest: image
sec-fetch-mode: no-cors
sec-fetch-site: cross-site
user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.116 Safari/537.36

Here's what was sent when I used Firefox:

Host: images-na.ssl-images-amazon.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:73.0) Gecko/20100101 Firefox/73.0
Accept: image/webp,*/*
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Connection: keep-alive
Referer: https://catalog.libraryguardians.com/cgi-bin/koha/opac-search.pl?q=anarchist
DNT: 1
Pragma: no-cache

Amazon also knows what my IP address is. With that, it doesn't take much to figure out that I am in Georgia and am clearly up to no good; after all, one look at the Referer header tells all. Let's switch over to using Google Book's cover images:

https://books.google.com/books/content?id=phzFwAEACAAJ&printsec=frontcover&img=1&zoom=5
https://books.google.com/books/content?id=wdgrJQAACAAJ&printsec=frontcover&img=1&zoom=5

This time, the request headers are in Chrome:

:authority: books.google.com
:method: GET
:path: /books/content?id=phzFwAEACAAJ&printsec=frontcover&img=1&zoom=5
:scheme: https
accept: image/webp,image/apng,image/*,*/*;q=0.8
accept-encoding: gzip, deflate, br
accept-language: en-US,en;q=0.9
cache-control: no-cache
dnt: 1
pragma: no-cache
referer: https://catalog.libraryguardians.com/
sec-fetch-dest: image
sec-fetch-mode: no-cors
sec-fetch-site: cross-site
user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.116 Safari/537.36
x-client-data: CKO1yQEIiLbJAQimtskBCMG2yQEIqZ3KAQi3qsoBCMuuygEIz6/KAQi8sMoBCJe1ygEI7bXKAQiNusoBGKukygEYvrrKAQ==

and in Firefox:

Host: books.google.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:73.0) Gecko/20100101 Firefox/73.0
Accept: image/webp,*/*
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Connection: keep-alive
Referer: https://catalog.libraryguardians.com/
DNT: 1
Pragma: no-cache
Cache-Control: no-cache

On the one hand… the Referer now contains only the base URL of the catalog. I believe this is due to a difference in how Koha figures out the correct image URL. When using Amazon for cover images, the ISBN of the title is normalized and used to construct a URL for an <img> tag. Koha doesn't currently set a Referrer-Policy, so the default of no-referrer-when-downgrade is used and the full referrer is sent. Google Book's cover image URLs cannot be directly constructed like that, so a bit of JavaScript queries a web service and gets back the image URLs, and for reasons that are unclear to me at the moment, doesn't send the full URL as the referrer. (Cover images from OpenLibrary are fetched in a similar way, but the full Referer header is sent.)
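For illustration, the Amazon cover URLs above are built from little more than a normalised ISBN. Here is a rough sketch of that construction (a simplification: Koha's actual normalisation also handles things like ISBN-13 to ISBN-10 conversion):

def amazon_cover_url(isbn):
    # Strip hyphens and spaces; the URL pattern comes from the requests shown above.
    normalized = isbn.replace("-", "").replace(" ", "")
    return ("https://images-na.ssl-images-amazon.com/images/P/%s.01.TZZZZZZZ.jpg"
            % normalized)

print(amazon_cover_url("0-9744589-0-2"))
# https://images-na.ssl-images-amazon.com/images/P/0974458902.01.TZZZZZZZ.jpg

Since that URL lands in a plain <img> tag on the results page, the browser happily attaches the full search URL as the Referer.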
As a side note, the x-client-data header sent by Chrome to books.google.com is… concerning. There are some relatively simple things that can be done to limit leaking the full referring URL to the likes of Google and Amazon, including Setting the Referrer-Policy header via web server configuration or meta tag to something like origin or origin-when-cross-origin. Setting referrerpolicy for
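As a sketch of the header-setting approach just mentioned, here is a tiny WSGI middleware that stamps a Referrer-Policy header onto every response. It is purely illustrative: Koha itself is a Perl application, so in practice you would set this in the front-end web server configuration or a meta tag rather than in Python.

class ReferrerPolicyMiddleware:
    # Add a Referrer-Policy header to every response passing through a WSGI app.
    def __init__(self, app, policy="origin-when-cross-origin"):
        self.app = app
        self.policy = policy

    def __call__(self, environ, start_response):
        def _start_response(status, headers, exc_info=None):
            headers = list(headers) + [("Referrer-Policy", self.policy)]
            return start_response(status, headers, exc_info)
        return self.app(environ, _start_response)

With that (or the equivalent one-line web server directive) in place, the cover image requests would carry at most the catalog's origin rather than the full search URL.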
    """ % graph_data open(output, "w").write(html) Copy lines Copy permalink View git blame Reference in new issue Go © 2021 GitHub, Inc. Terms Privacy Security Status Docs Contact GitHub Pricing API Training Blog About You can’t perform that action at this time. You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session. github-com-3751 ---- GitHub - lostRSEs/escape-room: Escape room: Translating between RSEs and Arts & Humanities Researchers Skip to content Sign up Sign up Why GitHub? Features → Mobile → Actions → Codespaces → Packages → Security → Code review → Project management → Integrations → GitHub Sponsors → Customer stories→ Team Enterprise Explore Explore GitHub → Learn and contribute Topics → Collections → Trending → Learning Lab → Open source guides → Connect with others The ReadME Project → Events → Community forum → GitHub Education → GitHub Stars program → Marketplace Pricing Plans → Compare plans → Contact Sales → Education → In this repository All GitHub ↵ Jump to ↵ No suggested jump to results In this repository All GitHub ↵ Jump to ↵ In this organization All GitHub ↵ Jump to ↵ In this repository All GitHub ↵ Jump to ↵ Sign in Sign up Sign up {{ message }} lostRSEs / escape-room Notifications Star 1 Fork 0 Escape room: Translating between RSEs and Arts & Humanities Researchers lostrses.github.io/escape-room/ CC-BY-4.0 License 1 star 0 forks Star Notifications Code Issues 18 Pull requests 0 Actions Projects 1 Security Insights More Code Issues Pull requests Actions Projects Security Insights main Switch branches/tags Branches Tags Nothing to show {{ refName }} default View all branches Nothing to show {{ refName }} default View all tags 7 branches 0 tags Go to file Code Clone HTTPS GitHub CLI Use Git or checkout with SVN using the web URL. Work fast with our official CLI. Learn more. Open with GitHub Desktop Download ZIP Launching GitHub Desktop If nothing happens, download GitHub Desktop and try again. Go back Launching GitHub Desktop If nothing happens, download GitHub Desktop and try again. Go back Launching Xcode If nothing happens, download Xcode and try again. Go back Launching Visual Studio If nothing happens, download the GitHub extension for Visual Studio and try again. Go back Latest commit   Git stats 115 commits Files Permalink Failed to load latest commit information. Type Name Latest commit message Commit time docs     CODE_OF_CONDUCT.md     CONTRIBUTING.md     LICENSE     README.md     View code AHA: An Arts and Humanities Adventure! Welcome! What is AHA? Problem Solution README.md AHA: An Arts and Humanities Adventure! Welcome! Welcome to ⭐AHA: An Arts and Humanities Adventure!⭐ What is AHA? AHA: An Arts and Humanities Adventure is an interactive game to help 'translate' concepts from computer science, for researchers in the arts and humanities. For researchers in the arts and humanities: this game aims to help you understand some of the ideas, concepts (and jargon) that your research software engineering colleagues have been using. For research software engineers: this will help you explain the ideas and concepts that you use in your work to people who do not have a computer science background. We hope that playing this game will help RSEs and arts and humanities reserachers work together better and build research software that helps advance research in artss and humanities! 
This project began at a hackday run as part of Software Sustainability Institute's Collaborations Workshop 2021. There is a proof-of-concept web version of the game now online! You can see the source for that website in the docs folder. Problem Researchers in the Arts & Humanities can benefit greatly from research software, but often don’t have the kind of background in formally-structured design that a physicist or engineer does. This can make developing research software for them challenging- particularly when A&H problems are often defined in ways that are very different from how computational problems are defined. We want to help researchers in A&H and RSEs to communicate better, so that they can collaborate on building research software more easily. Using gamified versions of boring and dry training materials for software development, we want to make learning about software development fun and accessible. Solution Virtual escape room: Solve a set of connected puzzles to escape the virtual game room. In the course of solving the puzzles, the participants will learn key concepts from research software development. Our pitch: develop the Part 1 of this escape room series: Theme: Gamified activities to learn the meaning of common jargon words. E.g. API, Object, function, Sprint, version, Agile, automation The escape room will be themed around learning to translate an alien language (Software development) expressed in an unusual way, so that the unfamiliar concepts can be understood in the context of our work. For example: which of these flow diagrams is the correct one? What analogy of a RSE concept can we find in humanities? Format: Online, can use existing websites or a GitHub repository with questions and clues to find information. Learning journey. Aim: The aim is to encourage participants to look for information and find out resources about software development practices and RSE related concepts themselves as they find answers to solve the puzzles. Outcome of the escape room activity: participants are familiar with 4 concepts/jargon words usually used by software developers. Participants are now in a better position to work/interact with Research Software Engineers- or to go on and learn to become digital humanities developers themselves. Potential topics and set of activities for escape rooms for part 2 onwards (not proposed for this pitch, but idea for future collaboration): Set a repo to teach GitHub / version control (create with long history, ask people to find who did what, and on what days) Give a project goal that required chunking down one goal into different tasks and create clues (Agile development) Create puzzles to teach reproducibility Use interesting data table to teach about dataframe and coding using pandas Use a visualization tool or shiny app to solve different puzzles About Escape room: Translating between RSEs and Arts & Humanities Researchers lostrses.github.io/escape-room/ Resources Readme License CC-BY-4.0 License Contributors 5 © 2021 GitHub, Inc. Terms Privacy Security Status Docs Contact GitHub Pricing API Training Blog About You can’t perform that action at this time. You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session. github-com-3945 ---- Pull requests · frictionlessdata/frictionless-py · GitHub Skip to content Sign up Sign up Why GitHub? 
Features → Mobile → Actions → Codespaces → Packages → Security → Code review → Project management → Integrations → GitHub Sponsors → Customer stories→ Team Enterprise Explore Explore GitHub → Learn and contribute Topics → Collections → Trending → Learning Lab → Open source guides → Connect with others The ReadME Project → Events → Community forum → GitHub Education → GitHub Stars program → Marketplace Pricing Plans → Compare plans → Contact Sales → Education → In this repository All GitHub ↵ Jump to ↵ No suggested jump to results In this repository All GitHub ↵ Jump to ↵ In this organization All GitHub ↵ Jump to ↵ In this repository All GitHub ↵ Jump to ↵ Sign in Sign up Sign up {{ message }} frictionlessdata / frictionless-py Notifications Star 393 Fork 74 Code Issues 68 Pull requests 0 Actions Projects 0 Security Insights More Code Issues Pull requests Actions Projects Security Insights Labels 9 Milestones 0 Labels 9 Milestones 0 New pull request New 0 Open 322 Closed 0 Open 322 Closed Author Filter by author author: Filter by this user Label Filter by label Use alt + click/return to exclude labels. Projects Filter by project Milestones Filter by milestone Reviews Filter by reviews No reviews Review required Approved review Changes requested Assignee Filter by who’s assigned Sort Sort by Newest Oldest Most commented Least commented Recently updated Least recently updated Most reactions 👍 👎 😄 🎉 😕 ❤️ 🚀 👀 There aren’t any open pull requests. You could search all of GitHub or try an advanced search. ProTip! Mix and match filters to narrow down what you’re looking for. © 2021 GitHub, Inc. Terms Privacy Security Status Docs Contact GitHub Pricing API Training Blog About You can’t perform that action at this time. You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session. github-com-4265 ---- twarc/urls.py at main · DocNow/twarc · GitHub Skip to content Sign up Sign up Why GitHub? Features → Mobile → Actions → Codespaces → Packages → Security → Code review → Project management → Integrations → GitHub Sponsors → Customer stories→ Team Enterprise Explore Explore GitHub → Learn and contribute Topics → Collections → Trending → Learning Lab → Open source guides → Connect with others The ReadME Project → Events → Community forum → GitHub Education → GitHub Stars program → Marketplace Pricing Plans → Compare plans → Contact Sales → Education → In this repository All GitHub ↵ Jump to ↵ No suggested jump to results In this repository All GitHub ↵ Jump to ↵ In this organization All GitHub ↵ Jump to ↵ In this repository All GitHub ↵ Jump to ↵ Sign in Sign up Sign up {{ message }} DocNow / twarc Notifications Star 1k Fork 214 Code Issues 53 Pull requests 0 Actions Projects 0 Wiki Security Insights More Code Issues Pull requests Actions Projects Wiki Security Insights Permalink main Switch branches/tags Branches Tags Nothing to show {{ refName }} default View all branches Nothing to show {{ refName }} default View all tags twarc/utils/urls.py / Jump to Code definitions No definitions found in this file. Code navigation not available for this commit Go to file Go to file T Go to line L Go to definition R Copy path Copy permalink     Cannot retrieve contributors at this time executable file 19 lines (16 sloc) 461 Bytes Raw Blame Open with Desktop View raw View blame #!/usr/bin/env python3 """ Print out the URLs in a tweet json stream. 
""" from __future__ import print_function import json import fileinput for line in fileinput.input(): tweet = json.loads(line) for url in tweet["entities"]["urls"]: if 'unshortened_url' in url: print(url['unshortened_url']) elif url.get('expanded_url'): print(url['expanded_url']) elif url.get('url'): print(url['url']) Copy lines Copy permalink View git blame Reference in new issue Go © 2021 GitHub, Inc. Terms Privacy Security Status Docs Contact GitHub Pricing API Training Blog About You can’t perform that action at this time. You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session. github-com-428 ---- twarc/wordcloud.py at main · DocNow/twarc · GitHub Skip to content Sign up Sign up Why GitHub? Features → Mobile → Actions → Codespaces → Packages → Security → Code review → Project management → Integrations → GitHub Sponsors → Customer stories→ Team Enterprise Explore Explore GitHub → Learn and contribute Topics → Collections → Trending → Learning Lab → Open source guides → Connect with others The ReadME Project → Events → Community forum → GitHub Education → GitHub Stars program → Marketplace Pricing Plans → Compare plans → Contact Sales → Education → In this repository All GitHub ↵ Jump to ↵ No suggested jump to results In this repository All GitHub ↵ Jump to ↵ In this organization All GitHub ↵ Jump to ↵ In this repository All GitHub ↵ Jump to ↵ Sign in Sign up Sign up {{ message }} DocNow / twarc Notifications Star 1k Fork 214 Code Issues 53 Pull requests 0 Actions Projects 0 Wiki Security Insights More Code Issues Pull requests Actions Projects Wiki Security Insights Permalink main Switch branches/tags Branches Tags Nothing to show {{ refName }} default View all branches Nothing to show {{ refName }} default View all tags twarc/utils/wordcloud.py / Jump to Code definitions No definitions found in this file. 
Code navigation not available for this commit Go to file Go to file T Go to line L Go to definition R Copy path Copy permalink     Cannot retrieve contributors at this time executable file 116 lines (98 sloc) 3.96 KB Raw Blame Open with Desktop View raw View blame #!/usr/bin/env python from __future__ import print_function import re import sys import json import fileinput def main(): try: from urllib import urlopen # Python 2 except ImportError: from urllib.request import urlopen # Python 3 MAX_WORDS = 100 word_counts = {} stop_words = set(["a","able","about","across","actually","after","against","agreed","all","almost","already","also","am","among","an","and","any","anyone","anyway","are","as","at","be","because","been","being","between","but","by","can","cannot","come","could","dear","did","do","does","either","else","ever","every","for","from","get","getting","got","had","has","have","he","her","here","hers","hey","hi","him","his","how","however","i","i'd","i'll","i'm","if","in","into","is","isnt","isn't","it","its","just","kind","last","latest","least","let","like","likely","look","make","may","me","might","more","most","must","my","neither","new","no","nor","not","now","of","off","often","on","only","or","other","our","out","over","own","part","piece","play","put","putting","rather","real","really","said","say","says","she","should","simply","since","so","some","than","thanks","that","that's","thats","the","their","them","then","there","these","they","they're","this","those","tis","to","too","try","twas","us","use","used","uses","via","wants","was","way","we","well","were","what","when","where","which","while","who","whom","why","will","with","would","yet","you","your","you're","youre"]) for line in fileinput.input(): try: tweet = json.loads(line) except: pass for word in text(tweet).split(' '): word = word.lower() word = word.replace(".", "") word = word.replace(",", "") word = word.replace("...", "") word = word.replace("'", "") word = word.replace(":", "") word = word.replace("(", "") word = word.replace(")", "") if len(word) < 3: continue if len(word) > 15: continue if word in stop_words: continue if word[0] in ["@", "#"]: continue if re.match('https?', word): continue if word.startswith("rt"): continue if not re.match('^[a-z]', word, re.IGNORECASE): continue word_counts[word] = word_counts.get(word, 0) + 1 sorted_words = list(word_counts.keys()) sorted_words.sort(key = lambda x: word_counts[x], reverse=True) top_words = sorted_words[0:MAX_WORDS] words = [] count_range = word_counts[top_words[0]] - word_counts[top_words[-1]] + 1 size_ratio = 100.0 / count_range for word in top_words: size = int(word_counts[word] * size_ratio) + 15 words.append({ "text": word, "size": size }) wordcloud_js = urlopen('https://raw.githubusercontent.com/jasondavies/d3-cloud/master/build/d3.layout.cloud.js').read() output = """ twarc wordcloud """ % (wordcloud_js.decode('utf8'), json.dumps(words, indent=2)) sys.stdout.write(output) def text(t): if 'full_text' in t: return t['full_text'] return t['text'] if __name__ == "__main__": main() Copy lines Copy permalink View git blame Reference in new issue Go © 2021 GitHub, Inc. Terms Privacy Security Status Docs Contact GitHub Pricing API Training Blog About You can’t perform that action at this time. You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session. github-com-4737 ---- GitHub - softwaresaved/habeas-corpus: A corpus of research software used in COVID-19 research. 
Skip to content Sign up Sign up Why GitHub? Features → Mobile → Actions → Codespaces → Packages → Security → Code review → Project management → Integrations → GitHub Sponsors → Customer stories→ Team Enterprise Explore Explore GitHub → Learn and contribute Topics → Collections → Trending → Learning Lab → Open source guides → Connect with others The ReadME Project → Events → Community forum → GitHub Education → GitHub Stars program → Marketplace Pricing Plans → Compare plans → Contact Sales → Education → In this repository All GitHub ↵ Jump to ↵ No suggested jump to results In this repository All GitHub ↵ Jump to ↵ In this organization All GitHub ↵ Jump to ↵ In this repository All GitHub ↵ Jump to ↵ Sign in Sign up Sign up {{ message }} softwaresaved / habeas-corpus Notifications Star 4 Fork 3 A corpus of research software used in COVID-19 research. MIT License 4 stars 3 forks Star Notifications Code Issues 12 Pull requests 0 Actions Projects 0 Security Insights More Code Issues Pull requests Actions Projects Security Insights main Switch branches/tags Branches Tags Nothing to show {{ refName }} default View all branches Nothing to show {{ refName }} default View all tags 6 branches 0 tags Go to file Code Clone HTTPS GitHub CLI Use Git or checkout with SVN using the web URL. Work fast with our official CLI. Learn more. Open with GitHub Desktop Download ZIP Launching GitHub Desktop If nothing happens, download GitHub Desktop and try again. Go back Launching GitHub Desktop If nothing happens, download GitHub Desktop and try again. Go back Launching Xcode If nothing happens, download Xcode and try again. Go back Launching Visual Studio If nothing happens, download the GitHub extension for Visual Studio and try again. Go back Latest commit   Git stats 73 commits Files Permalink Failed to load latest commit information. Type Name Latest commit message Commit time R     data     docs     notebooks     .gitignore     Habeas Corpus logo.png     LICENSE     README.md     postBuild     requirements.txt     View code Habeas Corpus Contributing ✏️ Project roadmap 🏁 Licensing Acknowledgements 👪 References 📚 README.md Habeas Corpus This is work done during the hack day at Collaborations Workshop 2021, to create a corpus of research software used for COVID-19 and coronavirus-related research that will be useful in a number of ways to the research software sustainability community around the Software Sustainability Institute. This is based on and extends the "CORD-19 Software Mentions" dataset published by the Chan Zuckerberg Institute (doi: https://doi.org/10.5061/dryad.vmcvdncs0). Contributing ✏️ Habeas Corpus is a collaborative project and we welcome suggestions and contributions. We hope one of the invitations below works for you, but if not, please let us know! 🏃 I'm busy, I only have 1 minute Tell a friend about the project! ⏳ I've got 5 minutes - tell me what I should do Suggest ideas for how you would like to use Habeas Corpus 💻 I've got a few hours to work on this Take a look at the issues and see if there are any you can contribute to Create an analysis using the data and let us know about it 🎉 I really want to help increase the community Organise a hackday to use or improve Habeas Corpus Please open a GitHub issue to suggest a new idea or let us know about bugs. Project roadmap 🏁 For tasks to work on in the near future, please see open Issues. 
For the bigger picture, please check and contribute to plan.md Licensing Software code and notebooks from this project are licensed under the open source MIT license. Project documentation and images are licensed under CC BY 4.0. Data produced by this project in the data/outputs directory is licensed under CC0. Other data included in this project from other sources remains licensed under its original license. Acknowledgements 👪 This project originated as part of the Collaborations Workshop 2021. It was based on an original idea by Neil Chue Hong (@npch) and Stephan Druskat (@sdruskat), incorporated ideas and feedback from Michelle Barker, Daniel S. Katz, Shoaib Sufi, Carina Haupt and Callum Rollo, and was developed by Alexander Konovalov (@alex-konovalov), Hao Ye (@ha0ye), Louise Chisholm (@LouiseChisholm), Mark Turner (@MarkLTurner), Neil Chue Hong (@npch), Sammie Buzzard (@sammiebuzzard), and Stephan Druskat (@sdruskat). The data is derived from the "CORD-19 Software Mentions" dataset published by Alex D Wade and Ivana Williams from the Chan Zuckerberg Initiative and released under a CC0 license. References 📚 Softcite dataset v1.0: Du, C., Cohoon, J., Lopez, P., & Howison, J. (forthcoming). Softcite Dataset: A Dataset of Software Mentions in Biomedical and Economic Research Publications. Journal of the Association for Information Science and Technology. DOI: 10.1002/asi.24454. CORD-19 Software Mentions Software in the Scientific Literature: Problems with Seeing, Finding, and Using Software Mentioned in the Biology Literature Introducing the PID Graph About A corpus of research software used in COVID-19 research. Topics research-software Resources Readme License MIT License Releases No releases published Packages 0 No packages published Contributors 8 Languages Jupyter Notebook 99.4% Other 0.6% © 2021 GitHub, Inc. Terms Privacy Security Status Docs Contact GitHub Pricing API Training Blog About You can’t perform that action at this time. You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session. github-com-5521 ---- GitHub - elichad/software-twilight: Software end of project plans Skip to content Sign up Sign up Why GitHub? Features → Mobile → Actions → Codespaces → Packages → Security → Code review → Project management → Integrations → GitHub Sponsors → Customer stories→ Team Enterprise Explore Explore GitHub → Learn and contribute Topics → Collections → Trending → Learning Lab → Open source guides → Connect with others The ReadME Project → Events → Community forum → GitHub Education → GitHub Stars program → Marketplace Pricing Plans → Compare plans → Contact Sales → Education → In this repository All GitHub ↵ Jump to ↵ No suggested jump to results In this repository All GitHub ↵ Jump to ↵ In this user All GitHub ↵ Jump to ↵ In this repository All GitHub ↵ Jump to ↵ Sign in Sign up Sign up {{ message }} elichad / software-twilight Notifications Star 0 Fork 0 Software end of project plans View license 0 stars 0 forks Star Notifications Code Issues 2 Pull requests 0 Actions Projects 0 Security Insights More Code Issues Pull requests Actions Projects Security Insights main Switch branches/tags Branches Tags Nothing to show {{ refName }} default View all branches Nothing to show {{ refName }} default View all tags 8 branches 0 tags Go to file Code Clone HTTPS GitHub CLI Use Git or checkout with SVN using the web URL. Work fast with our official CLI. Learn more. 
Open with GitHub Desktop Download ZIP Launching GitHub Desktop If nothing happens, download GitHub Desktop and try again. Go back Launching GitHub Desktop If nothing happens, download GitHub Desktop and try again. Go back Launching Xcode If nothing happens, download Xcode and try again. Go back Launching Visual Studio If nothing happens, download the GitHub extension for Visual Studio and try again. Go back Latest commit   Git stats 51 commits Files Permalink Failed to load latest commit information. Type Name Latest commit message Commit time __pycache__     .replit     CODEofCONDUCT.md     CONTRIBUTING.md     LICENSE.md     README.md     backend.py     decisions.py     environment.yml     index.ipynb     questionnaire.md     test_data.py     twilight_date_example.svg     twilight_plan_example.svg     View code software-twilight License Introduction Available badges Question themes Running Design Question format Customization of UI Further resources Known issues README.md This work is licensed under a Creative Commons Attribution 4.0 International License. software-twilight Software end of project plans License This project is licensed under the CC-BY license. You are free to: Share — copy and redistribute the material in any medium or format Adapt — remix, transform, and build upon the material for any purpose, even commercially. The licensor cannot revoke these freedoms as long as you follow the license terms. The full text of the license can be found here. Introduction Development of software under a fixed-term project should consider several aspects of ongoing support after the project's end. There are two main eventualities: the software's development abruptly ends; there is some end-user support, although there will be no new feature development. Each of these presents a problem. Ending support reduces the sustainability of the environment, while ongoing maintenance requires the dedication of further resources. Under the software twilight plan, the project's developer will be aware of necessary considerations. This repository is intended to be used to assess and guide a project maintainer in plans for the software's end of life. We provide a tool to be used, during the active development phase, by a project maintainer to assess and certify support plans for the project once it will no longer be actively developed. On completion of a short questionnaire the user is offered a badge to add to the repository to signal to the community when, and how, the software will go gentle into its good night. Available badges We have two badges, as examples, which look look like and mean the following: - we have a (good) plan - twilight is coming up at the specified time Question themes The tool covers a number of themes, including: potential funding for ongoing development required levels of future support deployment infrastructure required size of user community size of maintainer group status of ongoing contact with main developer(s)/development group Running Design The tool is designed in three parts: The front-end is designed with Jupyer Notebooks. It uses Jupyter Widgets, appmode package and mybinder.org to display automatically the notebook cells as a web app. The questions and answers are populated by the backend, that provides the appropriate next question based on the answer to the previous one, following a decision tree, until there are no more (relevant) questions to ask. 
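To make that flow concrete, here is a minimal runnable sketch of a decision-tree walk. The Question shape anticipates the decisions.py format described below, but the class, the example questions (drawn from the question themes above) and the loop are simplified stand-ins rather than the project's actual code.

class Question:
    # A question's text plus a mapping from answer text to the id of the next
    # question; None means a decision has been reached.
    def __init__(self, text, answers):
        self.text = text
        self.answers = answers

decision_tree = {
    1: Question("Is there potential funding for ongoing development?", {"Yes": 2, "No": 3}),
    2: Question("Is there an active maintainer group?", {"Yes": None, "No": 3}),
    3: Question("Will end users still need some level of support?", {"Yes": None, "No": None}),
}

def walk(tree, answer_fn, start=1):
    # Follow the tree from `start`, collecting (question, answer) pairs until
    # an answer maps to None.
    current, trail = start, []
    while current is not None:
        question = tree[current]
        answer = answer_fn(question)
        trail.append((question.text, answer))
        current = question.answers[answer]
    return trail

# Canned answers stand in for the interactive Jupyter widget front end.
canned = {
    "Is there potential funding for ongoing development?": "No",
    "Is there an active maintainer group?": "Yes",
    "Will end users still need some level of support?": "Yes",
}
print(walk(decision_tree, lambda q: canned[q.text]))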
Finally, all the answers are processed and one or more badges informing on the end-of-life status of the project are provided in the form of markdown text. A summary of the answers is also provided. This text can be easily pasted into the project README file. Question format The decision tree is populated from the file decisions.py. This file has quite customizable entries in the format described below. This is initially represented by a serialized Python dictionary. We have a Python object Question which has attributes for the question text and a dictionary for the answers (and links to each answer's follow-up question). Our input file is like: decision_tree = { 1: Question("Is this a question?", {"Yes": 2, "No", 3}), 2: Question("Is it a good question?", {"Yes": None, "No", 3}), 3: Question("Really!?", {"Yes": None, "No": None}) } decision_tree is an object with (contiguous, [1,n]?) numeric identifier and a Question object with question text and answer dictionary. The answer dictionary keys are answer text (diplayed) and the value the link to the question to follow. None is used to indicate that a decision will be reached with this answer. In this prototype there is no full decision tree. We indicate the path to follow by placing non-supported answers in parentheses. Customization of UI If the UI can be readily customized, we describe here that. Further resources Here we list related resources which may be of interest to the developer of a sustainable project. FAIRness, etc. Known issues This is a proof of concept. It is far from complete. We have a desire that the following features be implemented: Improved decision tree input (not deserialization) Complete decision tree Final badge choice and design About Software end of project plans Resources Readme License View license Releases No releases published Packages 0 No packages published Contributors 5 Languages Jupyter Notebook 56.1% Python 43.9% © 2021 GitHub, Inc. Terms Privacy Security Status Docs Contact GitHub Pricing API Training Blog About You can’t perform that action at this time. You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session. github-com-5855 ---- twarc/tags.py at main · DocNow/twarc · GitHub Skip to content Sign up Sign up Why GitHub? Features → Mobile → Actions → Codespaces → Packages → Security → Code review → Project management → Integrations → GitHub Sponsors → Customer stories→ Team Enterprise Explore Explore GitHub → Learn and contribute Topics → Collections → Trending → Learning Lab → Open source guides → Connect with others The ReadME Project → Events → Community forum → GitHub Education → GitHub Stars program → Marketplace Pricing Plans → Compare plans → Contact Sales → Education → In this repository All GitHub ↵ Jump to ↵ No suggested jump to results In this repository All GitHub ↵ Jump to ↵ In this organization All GitHub ↵ Jump to ↵ In this repository All GitHub ↵ Jump to ↵ Sign in Sign up Sign up {{ message }} DocNow / twarc Notifications Star 1k Fork 214 Code Issues 53 Pull requests 0 Actions Projects 0 Wiki Security Insights More Code Issues Pull requests Actions Projects Wiki Security Insights Permalink main Switch branches/tags Branches Tags Nothing to show {{ refName }} default View all branches Nothing to show {{ refName }} default View all tags twarc/utils/tags.py / Jump to Code definitions No definitions found in this file. 
Code navigation not available for this commit Go to file Go to file T Go to line L Go to definition R Copy path Copy permalink     Cannot retrieve contributors at this time executable file 16 lines (13 sloc) 378 Bytes Raw Blame Open with Desktop View raw View blame #!/usr/bin/env python from __future__ import print_function import json import fileinput import collections counts = collections.Counter() for line in fileinput.input(): tweet = json.loads(line) for tag in tweet['entities']['hashtags']: t = tag['text'].lower() counts[t] += 1 for tag, count in counts.most_common(): print("%5i %s" % (count, tag)) Copy lines Copy permalink View git blame Reference in new issue Go © 2021 GitHub, Inc. Terms Privacy Security Status Docs Contact GitHub Pricing API Training Blog About You can’t perform that action at this time. You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session. github-com-5998 ---- marcedit_xslt_files/homosaurus_xml.xsl at master · reeset/marcedit_xslt_files · GitHub Skip to content Sign up Sign up Why GitHub? Features → Mobile → Actions → Codespaces → Packages → Security → Code review → Project management → Integrations → GitHub Sponsors → Customer stories→ Team Enterprise Explore Explore GitHub → Learn and contribute Topics → Collections → Trending → Learning Lab → Open source guides → Connect with others The ReadME Project → Events → Community forum → GitHub Education → GitHub Stars program → Marketplace Pricing Plans → Compare plans → Contact Sales → Education → In this repository All GitHub ↵ Jump to ↵ No suggested jump to results In this repository All GitHub ↵ Jump to ↵ In this user All GitHub ↵ Jump to ↵ In this repository All GitHub ↵ Jump to ↵ Sign in Sign up Sign up {{ message }} reeset / marcedit_xslt_files Notifications Star 20 Fork 2 Code Issues 0 Pull requests 0 Actions Projects 0 Security Insights More Code Issues Pull requests Actions Projects Security Insights Permalink master Switch branches/tags Branches Tags Nothing to show {{ refName }} default View all branches Nothing to show {{ refName }} default View all tags marcedit_xslt_files/homosaurus_xml.xsl Go to file Go to file T Go to line L Copy path Copy permalink     Cannot retrieve contributors at this time 132 lines (119 sloc) 4.77 KB Raw Blame Open with Desktop View raw View blame 00596nz a2200217n 4500 210101 " |||anznnbab||||||||||||||a|||||||d homosaurus Copy lines Copy permalink View git blame Reference in new issue Go © 2021 GitHub, Inc. Terms Privacy Security Status Docs Contact GitHub Pricing API Training Blog About You can’t perform that action at this time. You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session. github-com-643 ---- Issues · DocNow/twarc · GitHub Skip to content Sign up Sign up Why GitHub? 
DocNow / twarc: Issues (53 open, 214 closed; 8 labels, 0 milestones)

Override keys #442, opened Apr 24, 2021 by edsu
Error Message after running Twarc command #441, opened Apr 22, 2021 by osemele (12)
Counts and basic statistics [plugins] #440, opened Apr 21, 2021 by igorbrigadir (2)
Progress bar [v2] #437, opened Apr 16, 2021 by igorbrigadir (2)
youtubedl.py #433, opened Apr 13, 2021 by ameliameyer (2)
tweets.py #429, opened Apr 8, 2021 by ameliameyer (8)
wall.py #419, opened Mar 30, 2021 by ameliameyer (6)
Plugin for ActivityStreams? [plugins] #412, opened Mar 24, 2021 by edsu (4)
Document Common V2 usecase: Crawl archive tweets, flatten, export to CSV [plugins, v2] #411, opened Mar 23, 2021 by igorbrigadir (11)
Thread [v2] #404, opened Mar 8, 2021 by edsu (8)
Retweets [v2] #403, opened Mar 8, 2021 by edsu (1)
Support Batch Compliance Endpoints [v2] #399, opened Mar 4, 2021 by igorbrigadir
foaf.py #392, opened Feb 24, 2021 by ameliameyer (1)
Make sure the rate limit decorator works appropriately for the new monthly tweet cap [v2] #391, opened Feb 23, 2021 by SamHames (3)
TWARC Utilities #387, opened Feb 19, 2021 by shamreeza (5)
sqlite schema [v2] #379, opened Feb 16, 2021 by edsu (1)
An example to run twarc as a Kafka producer #374, opened Feb 2, 2021 by rongpenl (3)
deleted.py #373, opened Feb 1, 2021 by ameliameyer (20)
How are accent marks handled? #366, opened Dec 2, 2020 by cgb37 (1)
Keep getting "Please run the command "twarc configure" to get started" after updating OS to Big Sur #364, opened Nov 24, 2020 by lalkulaib (2)
Got "MissingKeys" error using app-only auth #362, opened Nov 21, 2020 by JiA1996 (7)
Temporal and Spatial Query #361, opened Nov 6, 2020 by eo4929
Can't track hashtags with '#' in the 'filter' query #359, opened Oct 25, 2020 by glocalglocal (3)
Support for providing reply_count #356, opened Oct 22, 2020 by jasco (5)
UnicodeDecodeError when running utils in window #343, opened Sep 3, 2020 by juanulload
github-com-7303 ---- Home · DocNow/twarc Wiki · GitHub
DocNow / twarc Wiki: Home. Ed Summers edited this page Apr 7, 2021 · 7 revisions

🐦 🐍 💾 Welcome to the twarc wiki. We mostly use this space to organically share ideas for how to develop and use twarc. In practice this wiki is a place for documentation about the design and use of twarc that doesn't fit comfortably into a discrete issue ticket or the current documentation. Sometimes these pages graduate into the official documentation that is available on ReadTheDocs. However, there is no requirement for wiki pages to be written with the goal of integrating them into the official documentation.

Please feel empowered to add new pages, it's a wiki! You can send a pull request, or if you prefer, create an issue to request the ability to edit directly. If you'd like to have your page migrated into the official documentation, or think it warrants changes to the code, please open an issue to let us know.

Pages (4): Home; End to End Example Twitter Study; twarc2 Design; Working with v2 Tweet Formats

github-com-7750 ---- GitHub - KnowledgeCaptureAndDiscovery/somef-github-action
KnowledgeCaptureAndDiscovery / somef-github-action (Apache-2.0 License, 3 stars, 0 forks, 4 branches, 1 tag, 36 commits). Files: .github/workflows, Dockerfile, LICENSE, README.md, action.yml, entrypoint.sh.

SOMEF GitHub Action

This action uses SOMEF to generate a codemeta.json file and meet the recommendations from howfairis.

Basic usage

In its most basic usage, the GitHub action only uses SOMEF to generate a codemeta.json file:
on: [push]

jobs:
  somef_job:
    runs-on: ubuntu-latest
    name: Run SOMEF
    steps:
      # Checks-out your repository under $GITHUB_WORKSPACE, so your job can access it
      - name: Checkout repo
        uses: actions/checkout@v2
      # Use SOMEF to generate codemeta.json
      - name: Somef with repo-url input
        uses: KnowledgeCaptureAndDiscovery/somef-github-action@main
        with:
          repo-url: "https://github.com/${{ github.repository }}"

Advanced workflow

A more advanced workflow uses the howfairis and Create Pull Request actions to create a howfairis badge and send a pull request with the generated codemeta.json file if necessary:

on: [push]

jobs:
  somef_job:
    runs-on: ubuntu-latest
    name: Test somef
    steps:
      # Checks-out your repository under $GITHUB_WORKSPACE, so your job can access it
      - name: Checkout repo
        uses: actions/checkout@v2
      # Run howfairis
      - name: fair-software
        uses: fair-software/howfairis-github-action@0.1.0
        with:
          MY_REPO_URL: "https://github.com/${{ github.repository }}"
      # Use SOMEF to generate codemeta.json
      - name: Somef with repo-url input
        uses: KnowledgeCaptureAndDiscovery/somef-github-action@main
        with:
          repo-url: "https://github.com/${{ github.repository }}"
      # Create a PR
      - name: Create Pull Request
        uses: peter-evans/create-pull-request@v3.8.2
        with:
          title: Generating codemeta template
          commit-message: Add codemeta.json template
          committer: GitHub
          author: ${{ github.actor }} <${{ github.actor }}@users.noreply.github.com>
          labels: automated pr
          branch: add-codemeta

github-com-7851 ---- twarc/expansions.py at main · DocNow/twarc · GitHub
DocNow / twarc: twarc/twarc/expansions.py (207 lines; defines extract_includes, flatten and expand_payload):

"""
This module contains a list of the known Twitter V2+ API expansions and fields
for each expansion, and a function for "flattening" a result set, including
all expansions inline
"""

from collections import defaultdict

EXPANSIONS = [
    "author_id",
    "in_reply_to_user_id",
    "referenced_tweets.id",
    "referenced_tweets.id.author_id",
    "entities.mentions.username",
    "attachments.poll_ids",
    "attachments.media_keys",
    "geo.place_id",
]

USER_FIELDS = [
    "created_at",
    "description",
    "entities",
    "id",
    "location",
    "name",
    "pinned_tweet_id",
    "profile_image_url",
    "protected",
    "public_metrics",
    "url",
    "username",
    "verified",
    "withheld",
]

TWEET_FIELDS = [
    "attachments",
    "author_id",
    "context_annotations",
    "conversation_id",
    "created_at",
    "entities",
    "geo",
    "id",
    "in_reply_to_user_id",
    "lang",
    "public_metrics",
    # "non_public_metrics",  # private
    # "organic_metrics",  # private
    # "promoted_metrics",  # private
    "text",
    "possibly_sensitive",
    "referenced_tweets",
    "reply_settings",
    "source",
    "withheld",
]

MEDIA_FIELDS = [
    "duration_ms",
    "height",
    "media_key",
    "preview_image_url",
    "type",
    "url",
    "width",
    # "non_public_metrics",  # private
    # "organic_metrics",  # private
    # "promoted_metrics",  # private
    "public_metrics",
]

POLL_FIELDS = ["duration_minutes", "end_datetime", "id", "options", "voting_status"]

PLACE_FIELDS = [
    "contained_within",
    "country",
    "country_code",
    "full_name",
    "geo",
    "id",
    "name",
    "place_type",
]

EVERYTHING = {
    "expansions": ",".join(EXPANSIONS),
    "user.fields": ",".join(USER_FIELDS),
    "tweet.fields": ",".join(TWEET_FIELDS),
    "media.fields": ",".join(MEDIA_FIELDS),
    "poll.fields": ",".join(POLL_FIELDS),
    "place.fields": ",".join(PLACE_FIELDS),
}

# For endpoints focused on user objects such as looking up users and followers.
# Not all of the expansions are available for these endpoints.
USER_EVERYTHING = {
    "expansions": "pinned_tweet_id",
    "tweet.fields": ",".join(TWEET_FIELDS),
    "user.fields": ",".join(USER_FIELDS),
}


def extract_includes(response, expansion, _id="id"):
    if "includes" in response and expansion in response["includes"]:
        return defaultdict(
            lambda: {},
            {include[_id]: include for include in response["includes"][expansion]},
        )
    else:
        return defaultdict(lambda: {})


def flatten(response):
    """
    Flatten the response. Expects an entire page response from the API
    (data, includes, meta). Defaults: Return empty objects for things missing
    in includes. Doesn't modify tweets, only adds extra data.
    """
    # Users extracted both by id and by username for expanding mentions
    includes_users = defaultdict(
        lambda: {},
        {
            **extract_includes(response, "users", "id"),
            **extract_includes(response, "users", "username"),
        },
    )
    # Media is by media_key, not id
    includes_media = extract_includes(response, "media", "media_key")
    includes_polls = extract_includes(response, "polls")
    includes_places = extract_includes(response, "places")
    # Tweets in includes will themselves be expanded
    includes_tweets = extract_includes(response, "tweets")
    # Errors are returned but unused here for now
    includes_errors = extract_includes(response, "errors")

    def expand_payload(payload):
        """
        Recursively step through an object and sub objects and append extra data.
        Can be applied to any tweet, list of tweets, sub object of tweet etc.
        """
        # Don't try to expand on primitive values, return strings as is:
        if isinstance(payload, (str, bool, int, float)):
            return payload
        # expand list items individually:
        elif isinstance(payload, list):
            payload = [expand_payload(item) for item in payload]
            return payload
        # Try to expand on dicts within dicts:
        elif isinstance(payload, dict):
            for key, value in payload.items():
                payload[key] = expand_payload(value)

            if "author_id" in payload:
                payload["author"] = includes_users[payload["author_id"]]

            if "in_reply_to_user_id" in payload:
                payload["in_reply_to_user"] = includes_users[payload["in_reply_to_user_id"]]

            if "media_keys" in payload:
                payload["media"] = list(
                    includes_media[media_key] for media_key in payload["media_keys"]
                )

            if "poll_ids" in payload and len(payload["poll_ids"]) > 0:
                poll_id = payload["poll_ids"][-1]  # only ever 1 poll per tweet.
payload["poll"] = includes_polls[poll_id] if "geo" in payload and "place_id" in payload["geo"]: place_id = payload["geo"]["place_id"] payload["geo"] = {**payload["geo"], **includes_places[place_id]} if "mentions" in payload: payload["mentions"] = list( {**referenced_user, **includes_users[referenced_user["username"]]} for referenced_user in payload["mentions"] ) if "referenced_tweets" in payload: payload["referenced_tweets"] = list( {**referenced_tweet, **includes_tweets[referenced_tweet["id"]]} for referenced_tweet in payload["referenced_tweets"] ) if "pinned_tweet_id" in payload: payload["pinned_tweet"] = includes_tweets[payload["pinned_tweet_id"]] return payload # First, expand the included tweets, before processing actual result tweets: for included_id, included_tweet in extract_includes(response, "tweets").items(): includes_tweets[included_id] = expand_payload(included_tweet) # Now flatten the list of tweets or an individual tweet if "data" in response: response["data"] = expand_payload(response["data"]) # Add the __twarc metadata to each tweet if it's a result set if "__twarc" in response and isinstance(response["data"], list): for tweet in response["data"]: tweet["__twarc"] = response["__twarc"] return response Copy lines Copy permalink View git blame Reference in new issue Go © 2021 GitHub, Inc. Terms Privacy Security Status Docs Contact GitHub Pricing API Training Blog About You can’t perform that action at this time. You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session. github-com-8456 ---- GitHub - DocNow/twarc-ids: A plugin for twarc2 to extract tweet ids from tweet JSON. Skip to content Sign up Sign up Why GitHub? Features → Mobile → Actions → Codespaces → Packages → Security → Code review → Project management → Integrations → GitHub Sponsors → Customer stories→ Team Enterprise Explore Explore GitHub → Learn and contribute Topics → Collections → Trending → Learning Lab → Open source guides → Connect with others The ReadME Project → Events → Community forum → GitHub Education → GitHub Stars program → Marketplace Pricing Plans → Compare plans → Contact Sales → Education → In this repository All GitHub ↵ Jump to ↵ No suggested jump to results In this repository All GitHub ↵ Jump to ↵ In this organization All GitHub ↵ Jump to ↵ In this repository All GitHub ↵ Jump to ↵ Sign in Sign up Sign up {{ message }} DocNow / twarc-ids Notifications Star 1 Fork 0 A plugin for twarc2 to extract tweet ids from tweet JSON. MIT License 1 star 0 forks Star Notifications Code Issues 0 Pull requests 0 Actions Projects 0 Security Insights More Code Issues Pull requests Actions Projects Security Insights main Switch branches/tags Branches Tags Nothing to show {{ refName }} default View all branches Nothing to show {{ refName }} default View all tags 1 branch 0 tags Go to file Code Clone HTTPS GitHub CLI Use Git or checkout with SVN using the web URL. Work fast with our official CLI. Learn more. Open with GitHub Desktop Download ZIP Launching GitHub Desktop If nothing happens, download GitHub Desktop and try again. Go back Launching GitHub Desktop If nothing happens, download GitHub Desktop and try again. Go back Launching Xcode If nothing happens, download Xcode and try again. Go back Launching Visual Studio If nothing happens, download the GitHub extension for Visual Studio and try again. Go back Latest commit   Git stats 15 commits Files Permalink Failed to load latest commit information. 
github-com-8456 ---- GitHub - DocNow/twarc-ids: A plugin for twarc2 to extract tweet ids from tweet JSON.
DocNow / twarc-ids (MIT License, 1 star, 0 forks, 1 branch, 0 tags, 15 commits). Files: test-data, .gitignore, LICENSE, README.md, setup.cfg, setup.py, test_twarc_ids.py, twarc_ids.py.

twarc-ids

This module is a simple example of how to create a plugin for twarc. It uses click-plugins to extend the main twarc command and to manage the command line options.

First you need to install twarc and this plugin:

pip install twarc
pip install twarc-ids

Now you can collect data using the core twarc utility:

twarc search blacklivesmatter > tweets.jsonl

You now have a new subcommand, ids, supplied by twarc-ids:

twarc ids tweets.jsonl > ids.txt

It's good practice to include some tests for your module. See test_twarc_ids.py for an example. You can run it directly with pytest or using:

python setup.py test

When creating your setup.py make sure you don't forget the entry_points magic so that twarc will find your plugin when it is installed!
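The README describes the plugin mechanism but does not reproduce the plugin source here, so the following is a rough sketch of what a click-based ids command and its registration might look like. The command body and the "twarc.plugins" entry-point group name are assumptions for illustration, not code taken from the repository.

# Hypothetical sketch of a twarc2 plugin command built with click, in the
# spirit of twarc_ids.py described above.
import json
import click


@click.command()
@click.argument("infile", type=click.File("r"), default="-")
@click.argument("outfile", type=click.File("w"), default="-")
def ids(infile, outfile):
    """Print the id of each tweet in a file of line-oriented tweet JSON."""
    for line in infile:
        tweet = json.loads(line)
        click.echo(tweet["id"], file=outfile)

# In setup.py the command would be registered so twarc can discover it, e.g.
# entry_points={"twarc.plugins": ["ids = twarc_ids:ids"]}  (assumed group name).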
github-com-8970 ---- GitHub - robintw/CW-ideas: Hack day project from CW21 working on collating and analysing collaborative ideas and hack day projects from previous Collaborations Workshops
robintw / CW-ideas (MIT License, 1 star, 0 forks, 3 branches, 0 tags, 124 commits). Website: robintw.github.io/cw-ideas/. Files: .github/workflows, archetypes, content, static, themes/PaperMod, CONTRIBUTING.md, LICENSE, README.md, config.yml.

Exploring previous Collaborations Workshop ideas (CW-ideas)

This is the repo for a hack day project from Collaborations Workshop 2021 which aims to explore previous ideas from Collaborations Workshops and provide them in an easily browseable and searchable form. A live version of the website is hosted at https://robintw.github.io/CW-ideas/. The repo consists of markdown versions of the collaborative ideas and hackday pitches, plus code to host a website to view them. To contribute to the repository, either by adding new ideas from previous CWs or by contributing to the code to view the ideas, please see the contributing guide. This repository is licensed under the MIT license, and all the ideas themselves are CC-BY (this is mentioned at the bottom of each idea). The team creating this was Mario Antonioletti, Heather Turner and Robin Wilson.

Building locally

The repository is automatically built and deployed on every push, but if you want to build locally for testing or debugging purposes, follow the instructions below:
Install Hugo.
In the root of the repo, run hugo server.
The site will be built and served on localhost; see the command-line output for the full URL.

Task split during the hack day

Heather Turner: The brains behind the idea
Robin Wilson: The technical guru
Mario Antonioletti: The plodder with superpowers

Tasks divided orthogonally:
Conversion of past google doc proposals to markdown (Mario and Robin)
Configuring and setting up Hugo (Robin and Heather)
Provisioning a GitHub repo (Robin)

Hack day presentation: Available here
github-com-9494 ---- twarc/deletes.py at main · DocNow/twarc · GitHub
DocNow / twarc: twarc/utils/deletes.py (executable file, 187 lines):

#!/usr/bin/env python3

"""
This program assumes that you are feeding it tweet JSON data for tweets that
have been deleted. It will use the metadata and the API to analyze why each
tweet appears to have been deleted. Note that lookups are based on user id,
so may give different results than looking up a user by screen name.
"""

import json
import fileinput
import collections
import requests
import twarc
import argparse
import logging

USER_OK = "USER_OK"
USER_DELETED = "USER_DELETED"
USER_PROTECTED = "USER_PROTECTED"
USER_SUSPENDED = "USER_SUSPENDED"

TWEET_OK = "TWEET_OK"
TWEET_DELETED = "TWEET_DELETED"
# You have been blocked by the user.
TWEET_BLOCKED = "TWEET_BLOCKED"

RETWEET_DELETED = "RETWEET_DELETED"
ORIGINAL_TWEET_DELETED = "ORIGINAL_TWEET_DELETED"
ORIGINAL_TWEET_BLOCKED = "ORIGINAL_TWEET_BLOCKED"
ORIGINAL_USER_DELETED = "ORIGINAL_USER_DELETED"
ORIGINAL_USER_PROTECTED = "ORIGINAL_USER_PROTECTED"
ORIGINAL_USER_SUSPENDED = "ORIGINAL_USER_SUSPENDED"

t = twarc.Twarc()


def main(files, enhance_tweet=False, print_results=True):
    counts = collections.Counter()
    for count, line in enumerate(fileinput.input(files=files)):
        if count % 10000 == 0:
            logging.info("processed {:,} tweets".format(count))
        tweet = json.loads(line)
        result = examine(tweet)
        if enhance_tweet:
            tweet['delete_reason'] = result
            print(json.dumps(tweet))
        else:
            print(tweet_url(tweet), result)
        counts[result] += 1
    if print_results:
        for result, count in counts.most_common():
            print(result, count)


def examine(tweet):
    user_status = get_user_status(tweet)

    # Go with user status first (suspended, protected, deleted)
    if user_status != USER_OK:
        return user_status
    else:
        retweet = tweet.get('retweeted_status', None)
        tweet_status = get_tweet_status(tweet)

        # If not a retweet and tweet deleted, then tweet deleted.
        if tweet_status == TWEET_OK:
            return TWEET_OK
        elif retweet is None or tweet_status == TWEET_BLOCKED:
            return tweet_status
        else:
            rt_status = examine(retweet)
            if rt_status == USER_DELETED:
                return ORIGINAL_USER_DELETED
            elif rt_status == USER_PROTECTED:
                return ORIGINAL_USER_PROTECTED
            elif rt_status == USER_SUSPENDED:
                return ORIGINAL_USER_SUSPENDED
            elif rt_status == TWEET_DELETED:
                return ORIGINAL_TWEET_DELETED
            elif rt_status == TWEET_BLOCKED:
                return ORIGINAL_TWEET_BLOCKED
            elif rt_status == TWEET_OK:
                return RETWEET_DELETED
            else:
                raise Exception("Unexpected retweet status %s for %s" % (rt_status, tweet['id_str']))


users = {}


def get_user_status(tweet):
    user_id = tweet['user']['id_str']
    if user_id in users:
        return users[user_id]

    url = "https://api.twitter.com/1.1/users/show.json"
    params = {"user_id": user_id}

    # USER_DELETED: 404 and {"errors": [{"code": 50, "message": "User not found."}]}
    # USER_PROTECTED: 200 and user object with "protected": true
    # USER_SUSPENDED: 403 and {"errors":[{"code":63,"message":"User has been suspended."}]}

    result = USER_OK
    try:
        resp = t.get(url, params=params, allow_404=True)
        user = resp.json()
        if user['protected']:
            result = USER_PROTECTED
    except requests.exceptions.HTTPError as e:
        try:
            resp_json = e.response.json()
        except json.decoder.JSONDecodeError:
            raise e
        if e.response.status_code == 404 and has_error_code(resp_json, 50):
            result = USER_DELETED
        elif e.response.status_code == 403 and has_error_code(resp_json, 63):
            result = USER_SUSPENDED
        else:
            raise e

    users[user_id] = result
    return result


tweets = {}


def get_tweet_status(tweet):
    id = tweet['id_str']
    if id in tweets:
        return tweets[id]

    # USER_SUSPENDED: 403 and {"errors":[{"code":63,"message":"User has been suspended."}]}
    # USER_PROTECTED: 403 and {"errors":[{"code":179,"message":"Sorry, you are not authorized to see this status."}]}
    # TWEET_DELETED: 404 and {"errors":[{"code":144,"message":"No status found with that ID."}]}
    #                or {"errors":[{"code":34,"message":"Sorry, that page does not exist."}]}

    url = "https://api.twitter.com/1.1/statuses/show.json"
    params = {"id": id}

    result = TWEET_OK
    try:
        t.get(url, params=params, allow_404=True)
    except requests.exceptions.HTTPError as e:
        try:
            resp_json = e.response.json()
        except json.decoder.JSONDecodeError:
            raise e
        if e.response.status_code == 404 and has_error_code(resp_json, (34, 144)):
            result = TWEET_DELETED
        elif e.response.status_code == 403 and has_error_code(resp_json, 63):
            result = USER_SUSPENDED
        elif e.response.status_code == 403 and has_error_code(resp_json, 179):
            result = USER_PROTECTED
        elif e.response.status_code == 401 and has_error_code(resp_json, 136):
            result = TWEET_BLOCKED
        else:
            raise e

    tweets[id] = result
    return result


def tweet_url(tweet):
    return "https://twitter.com/%s/status/%s" % (
        tweet['user']['screen_name'], tweet['id_str'])


def has_error_code(resp, code):
    if isinstance(code, int):
        code = (code, )
    for error in resp['errors']:
        if error['code'] in code:
            return True
    return False


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('--enhance', action='store_true',
                        help='Enhance tweet with delete_reason and output enhanced tweet.')
    parser.add_argument('--skip-results', action='store_true',
                        help='Skip outputting delete reason summary')
    parser.add_argument('files', metavar='FILE', nargs='*',
                        help='files to read, if empty, stdin is used')
    args = parser.parse_args()
    main(args.files if len(args.files) > 0 else ('-',),
         enhance_tweet=args.enhance,
         print_results=not args.skip_results and not args.enhance)
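As a rough illustration, the reason codes above can also be used programmatically. This sketch assumes the utils directory is importable, that twarc v1.1 credentials are configured (examine() calls the API), and an invented input file name.

# Hypothetical direct use of examine() and tweet_url() from deletes.py.
# Assumes utils/ is on sys.path, twarc v1.1 credentials are configured,
# and "deleted_tweets.jsonl" is an invented file name.
import json
from deletes import examine, tweet_url

with open("deleted_tweets.jsonl") as f:
    for line in f:
        tweet = json.loads(line)
        print(tweet_url(tweet), examine(tweet))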
github-com-9542 ---- GitHub - DocNow/twarc: A command line tool (and Python library) for archiving Twitter JSON
DocNow / twarc (MIT License, 1k stars, 214 forks, 4 branches, 99 tags, 1,260 commits). Latest commit: edsu, "Merge pull request #443 from DocNow/install-docs-mac-clarifications" (Clarifications to Mac install instructions), 5ebd0ef, Apr 26, 2021.
Files:
.github/workflows (add a message to the slack notification, Apr 8, 2021)
docs (Update README.md, Apr 26, 2021)
twarc (Pagination fix, Apr 25, 2021)
utils (commit on every insert is slow when writing to a usb thumbdrive appar…, Feb 27, 2021)
.gitignore (Retweets changes, Jun 25, 2020)
.readthedocs.yaml (moving readthedocs back here; refs #421, Apr 7, 2021)
LICENSE (it easier this way I think, Apr 12, 2021)
MANIFEST.in (add docs/README.md to manifest so setup.py can read it, Apr 7, 2021)
README.md (instructions for running mkdocs locally, Apr 12, 2021)
mkdocs.yml (fix docs edit links, Apr 7, 2021)
requirements-mkdocs.txt (moving readthedocs back here; refs #421, Apr 7, 2021)
requirements.txt (moving readthedocs back here; refs #421, Apr 7, 2021)
setup.cfg (small fixes to tests for python3, and a new version, Sep 15, 2016)
setup.py (read version from version.py, Apr 7, 2021)
test_twarc.py (and then there 100, Mar 27, 2021)
test_twarc2.py (Pagination fix, Apr 25, 2021)

twarc

Collect data at the command line from the Twitter API (v1.1 and v2). Read the documentation. Ask questions in Slack or Matrix.

Contributing

Documentation

The documentation is managed at ReadTheDocs. If you would like to improve the documentation you can edit the Markdown files in docs or add new ones, then send a pull request and we can add it. To view your documentation locally you should be able to:

pip install -r requirements-mkdocs.txt
mkdocs serve
open http://127.0.0.1:8000/

If you prefer, you can create a page on the wiki to workshop the documentation, and then when/if you think it's ready to be merged with the documentation, create an issue. Please feel free to create whatever documentation is useful in the wiki area.

Code

If you are interested in adding functionality to twarc or fixing something that's broken, here are the steps to setting up your development environment:

git clone https://github.com/docnow/twarc
cd twarc
pip install -r requirements.txt

Create a .env file that includes Twitter App keys to use during testing:

BEARER_TOKEN=CHANGEME
CONSUMER_KEY=CHANGEME
CONSUMER_SECRET=CHANGEME
ACCESS_TOKEN=CHANGEME
ACCESS_TOKEN_SECRET=CHANGEME

Now run the tests:

python setup.py test

Add your code and some new tests, and send a pull request!

Releases: 99 (latest v2.0.8, Apr 25, 2021). Used by 146. Contributors: 51. Languages: Python 100.0%.
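Since the README describes twarc as a Python library as well as a command line tool, a minimal sketch of library use with the v1.1 client follows, mirroring how utils/deletes.py above instantiates twarc.Twarc(); the query string is only an example and credentials are assumed to be configured.

# Minimal sketch of using twarc as a library (v1.1 client), mirroring how
# utils/deletes.py above constructs twarc.Twarc(). Assumes credentials have
# been set up with `twarc configure`; the query is only an example.
import twarc

t = twarc.Twarc()
for tweet in t.search("blacklivesmatter"):
    print(tweet["id_str"])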
github-com-9574 ---- twarc/utils at main · DocNow/twarc · GitHub
DocNow / twarc: twarc/utils/ (latest commit: edsu, "commit on every insert is slow when writing to a usb thumbdrive apparently", 3dd7635, Feb 27, 2021). Files:
auth_timing.py, deduplicate.py, deleted.py, deleted_users.py, deletes.py, embeds.py, emojis.py, extractor.py, filter_date.py, filter_users.py, flakey.py, foaf.py, gender.py, geo.py, geofilter.py, geojson.py, json2csv.py, media2warc.py, media_urls.py, network.py, noretweets.py, oembeds.py, remove_limit.py, retweets.py, search.py, sensitive.py, sort_by_id.py, source.py, tags.py, times.py, twarc-archive.py, tweet.py, tweet_compliance.py, tweet_text.py, tweet_urls.py, tweetometer.py, tweets.py, unshrtn.py, urls.py, users.py, validate.py, wall.py, wayback.py, webarchives.py, wordcloud.py, youtubedl.py

github-com-9699 ---- GitHub - dokempf/credit-all
dokempf / credit-all (MIT License, 0 stars, 0 forks, 1 branch, 0 tags, 37 commits). Files: creditall, .all-contributorsrc, .gitignore, CodeOfConduct.md, Credit-all.odp, LICENSE.md, MANIFEST.in, README.md, Sandstrom2021.jpg, setup.py.

Welcome! Thanks for visiting Credit All! 😁

In this document you can find lots of information about this project. You can just scroll down or use the quick links below for each section: What is this project about and why is it important? The problem. The solution. Installation. Who are we? What does this project need? We need you! How can you get involved? Get in touch. Thank you.

What is this project about and why is it important?

There is no one size fits all system for capturing all of the contributions during different research projects. This could be a scientific research project, a software development project or an open-source community project. We think it is important that all contributions are recorded and therefore everyone is given credit for their work more fairly.

The problem

Current systems that attribute contributions to authors in academic outputs do not include all of the jobs/roles/tasks that are encompassed in research projects. The current problems include:
Capturing all roles on a project.
Capturing all tasks within those roles.
How to convert this into the actual authorship or contributions list that can be used for project outputs.
How this list can be presented.
The solution

This project takes inspiration from Malin Sandström's lightning talk at the Software Sustainability Institute's Collaborations Workshop 2021, in which she proposed combining the current contribution approaches.

[Slide from Malin Sandström's SSI talk]

In this project, we propose to:
Expand current lists to be more inclusive, using current systems such as CRediT, INRIA, BIDS Contributors.
Develop a tool to record these contributions during the project, such as within a GitHub repository; we have adapted the All Contributors bot for our tool.
Develop a way that this can be shown on academic papers: lists, table, cinema title page? (Look at e.g. the Brainhack paper with 100+ authors and Living with Machines.)

Installation

You can install the command line tool using pip:

python -m pip install git+git://github.com/dokempf/credit-all.git

Who are we?

In alphabetical order:
Daisy Perry (Writing a code of conduct, Curating data)
Dominic Kempf (Initial ideas of the project, Writing new code, Writing documentation about the code)
Emma Karoune (Initial ideas of the project, Curating data)
Malin Sandström (Initial ideas of the project, Curating data)

What does this project need? We need you!

Please review our list of tasks and tell us if something needs to be added. Spot a bug and tell us about it! Suggest new ways that our contributions list can be presented. If you have any feedback on the work that is going on, then please get in contact.

How can you get involved?

If you think you can help in any way or just want to suggest something currently not in the project, then please check out the contributor's guidelines. Please note that it's very important to maintain a positive and supportive environment for everyone who wants to participate. When you join as a collaborator, you must follow the code of conduct in all interactions both on and offline.

Get in touch

Please feel free to get in touch with our team: ekaroune@googlemail.com

Thank you

Thanks for taking the time to read this project page and do please get involved.

github-com-9789 ---- EIPs/eip-721.md at master · ethereum/EIPs · GitHub
ethereum / EIPs: EIPs/EIPS/eip-721.md (447 lines, 29.7 KB; 13 contributors). Latest commit: MicahZoltu, "Adds rule to EIP-1 that references to other EIPs must use relative path format and the first reference must be linked. (#2947)", 15f61ed, Sep 29, 2020.

Contents: Simple Summary, Abstract, Motivation, Specification, Caveats, Rationale, Backwards Compatibility, Test Cases, Implementations, References, Copyright.

eip: 721
title: ERC-721 Non-Fungible Token Standard
author: William Entriken, Dieter Shirley, Jacob Evans, Nastassia Sachs
discussions-to: https://github.com/ethereum/eips/issues/721
type: Standards Track
category: ERC
status: Final
created: 2018-01-24
requires: 165

Simple Summary

A standard interface for non-fungible tokens, also known as deeds.

Abstract

The following standard allows for the implementation of a standard API for NFTs within smart contracts. This standard provides basic functionality to track and transfer NFTs. We considered use cases of NFTs being owned and transacted by individuals as well as consignment to third party brokers/wallets/auctioneers ("operators"). NFTs can represent ownership over digital or physical assets. We considered a diverse universe of assets, and we know you will dream up many more:
Physical property: houses, unique artwork
Virtual collectables: unique pictures of kittens, collectable cards
"Negative value" assets: loans, burdens and other responsibilities

In general, all houses are distinct and no two kittens are alike. NFTs are distinguishable and you must track the ownership of each one separately.
Motivation A standard interface allows wallet/broker/auction applications to work with any NFT on Ethereum. We provide for simple ERC-721 smart contracts as well as contracts that track an arbitrarily large number of NFTs. Additional applications are discussed below. This standard is inspired by the ERC-20 token standard and builds on two years of experience since EIP-20 was created. EIP-20 is insufficient for tracking NFTs because each asset is distinct (non-fungible) whereas each of a quantity of tokens is identical (fungible). Differences between this standard and EIP-20 are examined below. Specification The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119. Every ERC-721 compliant contract must implement the ERC721 and ERC165 interfaces (subject to "caveats" below): pragma solidity ^0.4.20; /// @title ERC-721 Non-Fungible Token Standard /// @dev See https://eips.ethereum.org/EIPS/eip-721 /// Note: the ERC-165 identifier for this interface is 0x80ac58cd. interface ERC721 /* is ERC165 */ { /// @dev This emits when ownership of any NFT changes by any mechanism. /// This event emits when NFTs are created (`from` == 0) and destroyed /// (`to` == 0). Exception: during contract creation, any number of NFTs /// may be created and assigned without emitting Transfer. At the time of /// any transfer, the approved address for that NFT (if any) is reset to none. event Transfer(address indexed _from, address indexed _to, uint256 indexed _tokenId); /// @dev This emits when the approved address for an NFT is changed or /// reaffirmed. The zero address indicates there is no approved address. /// When a Transfer event emits, this also indicates that the approved /// address for that NFT (if any) is reset to none. event Approval(address indexed _owner, address indexed _approved, uint256 indexed _tokenId); /// @dev This emits when an operator is enabled or disabled for an owner. /// The operator can manage all NFTs of the owner. event ApprovalForAll(address indexed _owner, address indexed _operator, bool _approved); /// @notice Count all NFTs assigned to an owner /// @dev NFTs assigned to the zero address are considered invalid, and this /// function throws for queries about the zero address. /// @param _owner An address for whom to query the balance /// @return The number of NFTs owned by `_owner`, possibly zero function balanceOf(address _owner) external view returns (uint256); /// @notice Find the owner of an NFT /// @dev NFTs assigned to zero address are considered invalid, and queries /// about them do throw. /// @param _tokenId The identifier for an NFT /// @return The address of the owner of the NFT function ownerOf(uint256 _tokenId) external view returns (address); /// @notice Transfers the ownership of an NFT from one address to another address /// @dev Throws unless `msg.sender` is the current owner, an authorized /// operator, or the approved address for this NFT. Throws if `_from` is /// not the current owner. Throws if `_to` is the zero address. Throws if /// `_tokenId` is not a valid NFT. When transfer is complete, this function /// checks if `_to` is a smart contract (code size > 0). If so, it calls /// `onERC721Received` on `_to` and throws if the return value is not /// `bytes4(keccak256("onERC721Received(address,address,uint256,bytes)"))`. 
/// @param _from The current owner of the NFT /// @param _to The new owner /// @param _tokenId The NFT to transfer /// @param data Additional data with no specified format, sent in call to `_to` function safeTransferFrom(address _from, address _to, uint256 _tokenId, bytes data) external payable; /// @notice Transfers the ownership of an NFT from one address to another address /// @dev This works identically to the other function with an extra data parameter, /// except this function just sets data to "". /// @param _from The current owner of the NFT /// @param _to The new owner /// @param _tokenId The NFT to transfer function safeTransferFrom(address _from, address _to, uint256 _tokenId) external payable; /// @notice Transfer ownership of an NFT -- THE CALLER IS RESPONSIBLE /// TO CONFIRM THAT `_to` IS CAPABLE OF RECEIVING NFTS OR ELSE /// THEY MAY BE PERMANENTLY LOST /// @dev Throws unless `msg.sender` is the current owner, an authorized /// operator, or the approved address for this NFT. Throws if `_from` is /// not the current owner. Throws if `_to` is the zero address. Throws if /// `_tokenId` is not a valid NFT. /// @param _from The current owner of the NFT /// @param _to The new owner /// @param _tokenId The NFT to transfer function transferFrom(address _from, address _to, uint256 _tokenId) external payable; /// @notice Change or reaffirm the approved address for an NFT /// @dev The zero address indicates there is no approved address. /// Throws unless `msg.sender` is the current NFT owner, or an authorized /// operator of the current owner. /// @param _approved The new approved NFT controller /// @param _tokenId The NFT to approve function approve(address _approved, uint256 _tokenId) external payable; /// @notice Enable or disable approval for a third party ("operator") to manage /// all of `msg.sender`'s assets /// @dev Emits the ApprovalForAll event. The contract MUST allow /// multiple operators per owner. /// @param _operator Address to add to the set of authorized operators /// @param _approved True if the operator is approved, false to revoke approval function setApprovalForAll(address _operator, bool _approved) external; /// @notice Get the approved address for a single NFT /// @dev Throws if `_tokenId` is not a valid NFT. /// @param _tokenId The NFT to find the approved address for /// @return The approved address for this NFT, or the zero address if there is none function getApproved(uint256 _tokenId) external view returns (address); /// @notice Query if an address is an authorized operator for another address /// @param _owner The address that owns the NFTs /// @param _operator The address that acts on behalf of the owner /// @return True if `_operator` is an approved operator for `_owner`, false otherwise function isApprovedForAll(address _owner, address _operator) external view returns (bool); } interface ERC165 { /// @notice Query if a contract implements an interface /// @param interfaceID The interface identifier, as specified in ERC-165 /// @dev Interface identification is specified in ERC-165. This function /// uses less than 30,000 gas. /// @return `true` if the contract implements `interfaceID` and /// `interfaceID` is not 0xffffffff, `false` otherwise function supportsInterface(bytes4 interfaceID) external view returns (bool); } A wallet/broker/auction application MUST implement the wallet interface if it will accept safe transfers. /// @dev Note: the ERC-165 identifier for this interface is 0x150b7a02. 
A wallet/broker/auction application MUST implement the wallet interface if it will accept safe transfers.

/// @dev Note: the ERC-165 identifier for this interface is 0x150b7a02.
interface ERC721TokenReceiver {
    /// @notice Handle the receipt of an NFT
    /// @dev The ERC721 smart contract calls this function on the recipient
    ///  after a `transfer`. This function MAY throw to revert and reject the
    ///  transfer. Return of other than the magic value MUST result in the
    ///  transaction being reverted.
    ///  Note: the contract address is always the message sender.
    /// @param _operator The address which called `safeTransferFrom` function
    /// @param _from The address which previously owned the token
    /// @param _tokenId The NFT identifier which is being transferred
    /// @param _data Additional data with no specified format
    /// @return `bytes4(keccak256("onERC721Received(address,address,uint256,bytes)"))`
    ///  unless throwing
    function onERC721Received(address _operator, address _from, uint256 _tokenId, bytes _data) external returns(bytes4);
}

The metadata extension is OPTIONAL for ERC-721 smart contracts (see "caveats", below). This allows your smart contract to be interrogated for its name and for details about the assets which your NFTs represent.

/// @title ERC-721 Non-Fungible Token Standard, optional metadata extension
/// @dev See https://eips.ethereum.org/EIPS/eip-721
///  Note: the ERC-165 identifier for this interface is 0x5b5e139f.
interface ERC721Metadata /* is ERC721 */ {
    /// @notice A descriptive name for a collection of NFTs in this contract
    function name() external view returns (string _name);

    /// @notice An abbreviated name for NFTs in this contract
    function symbol() external view returns (string _symbol);

    /// @notice A distinct Uniform Resource Identifier (URI) for a given asset.
    /// @dev Throws if `_tokenId` is not a valid NFT. URIs are defined in RFC
    ///  3986. The URI may point to a JSON file that conforms to the "ERC721
    ///  Metadata JSON Schema".
    function tokenURI(uint256 _tokenId) external view returns (string);
}

This is the "ERC721 Metadata JSON Schema" referenced above.

{
    "title": "Asset Metadata",
    "type": "object",
    "properties": {
        "name": {
            "type": "string",
            "description": "Identifies the asset to which this NFT represents"
        },
        "description": {
            "type": "string",
            "description": "Describes the asset to which this NFT represents"
        },
        "image": {
            "type": "string",
            "description": "A URI pointing to a resource with mime type image/* representing the asset to which this NFT represents. Consider making any images at a width between 320 and 1080 pixels and aspect ratio between 1.91:1 and 4:5 inclusive."
        }
    }
}

The enumeration extension is OPTIONAL for ERC-721 smart contracts (see "caveats", below). This allows your contract to publish its full list of NFTs and make them discoverable.

/// @title ERC-721 Non-Fungible Token Standard, optional enumeration extension
/// @dev See https://eips.ethereum.org/EIPS/eip-721
///  Note: the ERC-165 identifier for this interface is 0x780e9d63.
interface ERC721Enumerable /* is ERC721 */ {
    /// @notice Count NFTs tracked by this contract
    /// @return A count of valid NFTs tracked by this contract, where each one of
    ///  them has an assigned and queryable owner not equal to the zero address
    function totalSupply() external view returns (uint256);

    /// @notice Enumerate valid NFTs
    /// @dev Throws if `_index` >= `totalSupply()`.
    /// @param _index A counter less than `totalSupply()`
    /// @return The token identifier for the `_index`th NFT,
    ///  (sort order not specified)
    function tokenByIndex(uint256 _index) external view returns (uint256);

    /// @notice Enumerate NFTs assigned to an owner
    /// @dev Throws if `_index` >= `balanceOf(_owner)` or if
    ///  `_owner` is the zero address, representing invalid NFTs.
    /// @param _owner An address where we are interested in NFTs owned by them
    /// @param _index A counter less than `balanceOf(_owner)`
    /// @return The token identifier for the `_index`th NFT assigned to `_owner`,
    ///  (sort order not specified)
    function tokenOfOwnerByIndex(address _owner, uint256 _index) external view returns (uint256);
}

Caveats

The 0.4.20 Solidity interface grammar is not expressive enough to document the ERC-721 standard. A contract which complies with ERC-721 MUST also abide by the following:

Solidity issue #3412: The above interfaces include explicit mutability guarantees for each function. Mutability guarantees are, in order from weak to strong: payable, implicit nonpayable, view, and pure. Your implementation MUST meet the mutability guarantee in this interface and you MAY meet a stronger guarantee. For example, a payable function in this interface may be implemented as nonpayable (no state mutability specified) in your contract. We expect a later Solidity release will allow your stricter contract to inherit from this interface, but a workaround for version 0.4.20 is that you can edit this interface to add stricter mutability before inheriting from your contract.

Solidity issue #3419: A contract that implements ERC721Metadata or ERC721Enumerable SHALL also implement ERC721. ERC-721 implements the requirements of interface ERC-165.

Solidity issue #2330: If a function is shown in this specification as external then a contract will be compliant if it uses public visibility. As a workaround for version 0.4.20, you can edit this interface to switch to public before inheriting from your contract.

Solidity issues #3494, #3544: Use of this.*.selector is marked as a warning by Solidity; a future version of Solidity will not mark this as an error.

If a newer version of Solidity allows the caveats to be expressed in code, then this EIP MAY be updated and the caveats removed; such an update will be equivalent to the original specification.
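To make the mutability and visibility caveats concrete, here is a hypothetical implementation fragment. The contract name, the storage layout, and the simplified checks are illustrative assumptions only; it is not a complete ERC-721 implementation. It declares functions that the interface marks `external payable` or `external view` with `public` visibility and stronger mutability, which the caveats above permit.

pragma solidity ^0.4.20;

/// Hypothetical fragment, not a complete implementation: it only illustrates
/// that `public` may replace `external` (Solidity issue #2330) and that a
/// stronger mutability guarantee is allowed (Solidity issue #3412).
contract CaveatsExample {
    mapping(uint256 => address) internal tokenOwner;
    mapping(uint256 => address) internal tokenApproval;

    /// Declared `external payable` in the ERC721 interface; implementing it as
    /// `public` and nonpayable is still compliant.
    function approve(address _approved, uint256 _tokenId) public {
        require(msg.sender == tokenOwner[_tokenId]);
        tokenApproval[_tokenId] = _approved;
        // a complete implementation would also emit the Approval event
    }

    /// Declared `external view` in the ERC721 interface; `public view` is compliant.
    function ownerOf(uint256 _tokenId) public view returns (address) {
        address owner = tokenOwner[_tokenId];
        require(owner != address(0));
        return owner;
    }
}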
Rationale

There are many proposed uses of Ethereum smart contracts that depend on tracking distinguishable assets. Examples of existing or planned NFTs are LAND in Decentraland, the eponymous punks in CryptoPunks, and in-game items using systems like DMarket or EnjinCoin. Future uses include tracking real-world assets, like real estate (as envisioned by companies like Ubitquity or Propy). It is critical in each of these cases that these items are not "lumped together" as numbers in a ledger, but instead each asset must have its ownership individually and atomically tracked. Regardless of the nature of these assets, the ecosystem will be stronger if we have a standardized interface that allows for cross-functional asset management and sales platforms.

"NFT" Word Choice

"NFT" was satisfactory to nearly everyone surveyed and is widely applicable to a broad universe of distinguishable digital assets. We recognize that "deed" is very descriptive for certain applications of this standard (notably, physical property).

Alternatives considered: distinguishable asset, title, token, asset, equity, ticket

NFT Identifiers

Every NFT is identified by a unique uint256 ID inside the ERC-721 smart contract. This identifying number SHALL NOT change for the life of the contract. The pair (contract address, uint256 tokenId) will then be a globally unique and fully-qualified identifier for a specific asset on an Ethereum chain. While some ERC-721 smart contracts may find it convenient to start with ID 0 and simply increment by one for each new NFT, callers SHALL NOT assume that ID numbers have any specific pattern to them, and MUST treat the ID as a "black box". Also note that an NFT MAY become invalid (be destroyed). Please see the enumeration functions for a supported enumeration interface.

The choice of uint256 allows a wide variety of applications because UUIDs and sha3 hashes are directly convertible to uint256.

Transfer Mechanism

ERC-721 standardizes a safe transfer function safeTransferFrom (overloaded with and without a bytes parameter) and an unsafe function transferFrom. Transfers may be initiated by:

The owner of an NFT
The approved address of an NFT
An authorized operator of the current owner of an NFT

Additionally, an authorized operator may set the approved address for an NFT. This provides a powerful set of tools for wallet, broker and auction applications to quickly use a large number of NFTs.

The transfer and accept functions' documentation only specifies conditions when the transaction MUST throw. Your implementation MAY also throw in other situations. This allows implementations to achieve interesting results:

Disallow transfers if the contract is paused — prior art, CryptoKitties deployed contract, line 611 (a sketch of this pattern appears after this section)
Blacklist certain addresses from receiving NFTs — prior art, CryptoKitties deployed contract, lines 565, 566
Disallow unsafe transfers — transferFrom throws unless _to equals msg.sender or countOf(_to) is non-zero or was non-zero previously (because such cases are safe)
Charge a fee to both parties of a transaction — require payment when calling approve with a non-zero _approved if it was previously the zero address, refund payment if calling approve with the zero address if it was previously a non-zero address, require payment when calling any transfer function, require transfer parameter _to to equal msg.sender, require transfer parameter _to to be the approved address for the NFT
Read only NFT registry — always throw from safeTransferFrom, transferFrom, approve and setApprovalForAll

Failed transactions will throw, a best practice identified in ERC-223, ERC-677, ERC-827 and OpenZeppelin's implementation of SafeERC20.sol. ERC-20 defined an allowance feature; this caused a problem when an allowance was set and then later modified to a different amount, as in OpenZeppelin issue #438. In ERC-721, there is no allowance because every NFT is unique; the quantity is none or one. Therefore we receive the benefits of ERC-20's original design without the problems that were discovered later.

Creation of NFTs ("minting") and destruction of NFTs ("burning") are not included in the specification. Your contract may implement these by other means. Please see the event documentation for your responsibilities when creating or destroying NFTs.

We questioned if the operator parameter on onERC721Received was necessary. In all cases we could imagine, if the operator was important then that operator could transfer the token to themself and then send it -- then they would be the from address. This seems contrived because we consider the operator to be a temporary owner of the token (and transferring to themself is redundant). When the operator sends the token, it is the operator acting of their own accord, NOT the operator acting on behalf of the token holder. This is why the operator and the previous token owner are both significant to the token recipient.

Alternatives considered: only allow two-step ERC-20 style transaction, require that transfer functions never throw, require all functions to return a boolean indicating the success of the operation.
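As an illustration of the "MAY also throw in other situations" latitude, here is a hypothetical sketch of the paused-transfer pattern mentioned in the list above. The contract name, the `paused` flag, and the administrator logic are illustrative assumptions; only the transferFrom signature comes from the standard, and the approval and operator checks are omitted to keep the sketch short.

pragma solidity ^0.4.20;

/// Hypothetical sketch, not a complete ERC-721 implementation: it shows only
/// how an implementation MAY add an extra throw condition (here, a pause flag)
/// on top of the conditions the standard requires.
contract PausedTransferExample {
    address public admin;                        // illustrative administrator
    bool public paused;                          // when true, transfers revert
    mapping(uint256 => address) internal tokenOwner;

    modifier whenNotPaused() {
        require(!paused);
        _;
    }

    function PausedTransferExample() public {
        admin = msg.sender;
    }

    function setPaused(bool _paused) external {
        require(msg.sender == admin);
        paused = _paused;
    }

    /// Same signature as the standard's transferFrom; simplified here to
    /// owner-only transfers (no approvals or operators).
    function transferFrom(address _from, address _to, uint256 _tokenId) external payable whenNotPaused {
        require(tokenOwner[_tokenId] == _from);
        require(msg.sender == _from);
        require(_to != address(0));
        tokenOwner[_tokenId] = _to;
        // a complete implementation would also clear approvals and emit Transfer
    }
}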
ERC-165 Interface

We chose Standard Interface Detection (ERC-165) to expose the interfaces that an ERC-721 smart contract supports.

A future EIP may create a global registry of interfaces for contracts. We strongly support such an EIP and it would allow your ERC-721 implementation to implement ERC721Enumerable, ERC721Metadata, or other interfaces by delegating to a separate contract.

Gas and Complexity (regarding the enumeration extension)

This specification contemplates implementations that manage a few NFTs as well as arbitrarily large numbers of NFTs. If your application is able to grow then avoid using for/while loops in your code (see CryptoKitties bounty issue #4). These indicate your contract may be unable to scale and gas costs will rise over time without bound.

We have deployed a contract, XXXXERC721, to Testnet which instantiates and tracks 340282366920938463463374607431768211456 different deeds (2^128). That's enough to assign every IPv6 address to an Ethereum account owner, or to track ownership of nanobots a few microns in size and in aggregate totalling half the size of Earth. You can query it from the blockchain. And every function takes less gas than querying the ENS. This illustration makes clear: the ERC-721 standard scales.

Alternatives considered: remove the asset enumeration function if it requires a for-loop, return a Solidity array type from enumeration functions.

Privacy

Wallets/brokers/auctioneers identified in the motivation section have a strong need to identify which NFTs an owner owns. It may be interesting to consider a use case where NFTs are not enumerable, such as a private registry of property ownership, or a partially-private registry. However, privacy cannot be attained because an attacker can simply (!) call ownerOf for every possible tokenId.

Metadata Choices (metadata extension)

We have required name and symbol functions in the metadata extension. Every token EIP and draft we reviewed (ERC-20, ERC-223, ERC-677, ERC-777, ERC-827) included these functions. We remind implementation authors that the empty string is a valid response to name and symbol if you protest the usage of this mechanism. We also remind everyone that any smart contract can use the same name and symbol as your contract. How a client may determine which ERC-721 smart contracts are well-known (canonical) is outside the scope of this standard.

A mechanism is provided to associate NFTs with URIs. We expect that many implementations will take advantage of this to provide metadata for each NFT. The image size recommendation is taken from Instagram; they probably know much about image usability. The URI MAY be mutable (i.e. it changes from time to time). We considered an NFT representing ownership of a house; in this case metadata about the house (image, occupants, etc.) can naturally change.

Metadata is returned as a string value. Currently this is only usable by calling from web3, not from other contracts. This is acceptable because we have not considered a use case where an on-blockchain application would query such information.

Alternatives considered: put all metadata for each asset on the blockchain (too expensive), use URL templates to query metadata parts (URL templates do not work with all URL schemes, especially P2P URLs), multiaddr network address (not mature enough)
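As a purely hypothetical illustration of the URI mechanism discussed above, an implementation might simply store one URI per token and allow it to be updated, since the URI MAY be mutable. The contract name, the `tokenURIs` mapping, and the `setTokenURI` helper below are illustrative assumptions, not part of the standard.

pragma solidity ^0.4.20;

/// Hypothetical sketch of per-token URI storage for the optional metadata
/// extension; only tokenURI comes from the standard, the rest is illustrative.
contract TokenURIExample {
    mapping(uint256 => address) internal tokenOwner;
    mapping(uint256 => string) internal tokenURIs;

    /// @notice A distinct Uniform Resource Identifier (URI) for a given asset
    /// @dev Throws if `_tokenId` is not a valid NFT.
    function tokenURI(uint256 _tokenId) external view returns (string) {
        require(tokenOwner[_tokenId] != address(0));
        return tokenURIs[_tokenId];
    }

    /// Illustrative setter: lets the current owner point the token at updated
    /// metadata (for example, a new JSON document conforming to the schema above).
    function setTokenURI(uint256 _tokenId, string _uri) external {
        require(msg.sender == tokenOwner[_tokenId]);
        tokenURIs[_tokenId] = _uri;
    }
}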
Community Consensus

A significant amount of discussion occurred on the original ERC-721 issue; additionally we held a first live meeting on Gitter that had good representation and was well advertised (on Reddit, in the Gitter #ERC channel, and the original ERC-721 issue). Thank you to the participants:

@ImAllInNow Rob from DEC Gaming / Presenting Michigan Ethereum Meetup Feb 7
@Arachnid Nick Johnson
@jadhavajay Ajay Jadhav from AyanWorks
@superphly Cody Marx Bailey - XRAM Capital / Sharing at hackathon Jan 20 / UN Future of Finance Hackathon
@fulldecent William Entriken

A second event was held at ETHDenver 2018 to discuss distinguishable asset standards (notes to be published).

We have been very inclusive in this process and invite anyone with questions or contributions into our discussion. However, this standard is written only to support the identified use cases which are listed herein.

Backwards Compatibility

We have adopted balanceOf, totalSupply, name and symbol semantics from the ERC-20 specification. An implementation may also include a function decimals that returns uint8(0) if its goal is to be more compatible with ERC-20 while supporting this standard. However, we find it contrived to require all ERC-721 implementations to support the decimals function.

Example NFT implementations as of February 2018:

CryptoKitties -- Compatible with an earlier version of this standard.
CryptoPunks -- Partially ERC-20 compatible, but not easily generalizable because it includes auction functionality directly in the contract and uses function names that explicitly refer to the assets as "punks".
Auctionhouse Asset Interface -- The author needed a generic interface for the Auctionhouse ÐApp (currently ice-boxed). His "Asset" contract is very simple, but is missing ERC-20 compatibility, approve() functionality, and metadata. This effort is referenced in the discussion for EIP-173.

Note: "Limited edition, collectible tokens" like Curio Cards and Rare Pepe are not distinguishable assets. They're actually a collection of individual fungible tokens, each of which is tracked by its own smart contract with its own total supply (which may be 1 in extreme cases).

The onERC721Received function specifically works around old deployed contracts which may inadvertently return 1 (true) in certain circumstances even if they don't implement a function (see Solidity DelegateCallReturnValue bug). By returning and checking for a magic value, we are able to distinguish actual affirmative responses versus these vacuous trues.

Test Cases

0xcert ERC-721 Token includes test cases written using Truffle.
Implementations

0xcert ERC721 -- a reference implementation
  MIT licensed, so you can freely use it for your projects
  Includes test cases
  Active bug bounty, you will be paid if you find errors
Su Squares -- an advertising platform where you can rent space and place images
  Complete the Su Squares Bug Bounty Program to seek problems with this standard or its implementation
  Implements the complete standard and all optional interfaces
ERC721ExampleDeed -- an example implementation
  Implements using the OpenZeppelin project format
XXXXERC721, by William Entriken -- a scalable example implementation
  Deployed on testnet with 1 billion assets and supporting all lookups with the metadata extension. This demonstrates that scaling is NOT a problem.

References

Standards
ERC-20 Token Standard.
ERC-165 Standard Interface Detection.
ERC-173 Owned Standard.
ERC-223 Token Standard.
ERC-677 transferAndCall Token Standard.
ERC-827 Token Standard.
Ethereum Name Service (ENS). https://ens.domains
Instagram -- What's the Image Resolution? https://help.instagram.com/1631821640426723
JSON Schema. https://json-schema.org/
Multiaddr. https://github.com/multiformats/multiaddr
RFC 2119 Key words for use in RFCs to Indicate Requirement Levels. https://www.ietf.org/rfc/rfc2119.txt

Issues
The Original ERC-721 Issue. https://github.com/ethereum/eips/issues/721
Solidity Issue #2330 -- Interface Functions are External. https://github.com/ethereum/solidity/issues/2330
Solidity Issue #3412 -- Implement Interface: Allow Stricter Mutability. https://github.com/ethereum/solidity/issues/3412
Solidity Issue #3419 -- Interfaces Can't Inherit. https://github.com/ethereum/solidity/issues/3419
Solidity Issue #3494 -- Compiler Incorrectly Reasons About the selector Function. https://github.com/ethereum/solidity/issues/3494
Solidity Issue #3544 -- Cannot Calculate Selector of Function Named transfer. https://github.com/ethereum/solidity/issues/3544
CryptoKitties Bounty Issue #4 -- Listing all Kitties Owned by a User is O(n^2). https://github.com/axiomzen/cryptokitties-bounty/issues/4
OpenZeppelin Issue #438 -- Implementation of approve method violates ERC20 standard. https://github.com/OpenZeppelin/zeppelin-solidity/issues/438
Solidity DelegateCallReturnValue Bug. https://solidity.readthedocs.io/en/develop/bugs.html#DelegateCallReturnValue

Discussions
Reddit (announcement of first live discussion). https://www.reddit.com/r/ethereum/comments/7r2ena/friday_119_live_discussion_on_erc_nonfungible/
Gitter #EIPs (announcement of first live discussion). https://gitter.im/ethereum/EIPs?at=5a5f823fb48e8c3566f0a5e7
ERC-721 (announcement of first live discussion). https://github.com/ethereum/eips/issues/721#issuecomment-358369377
ETHDenver 2018. https://ethdenver.com

NFT Implementations and Other Projects
CryptoKitties. https://www.cryptokitties.co
0xcert ERC-721 Token. https://github.com/0xcert/ethereum-erc721
Su Squares. https://tenthousandsu.com
Decentraland. https://decentraland.org
CryptoPunks. https://www.larvalabs.com/cryptopunks
DMarket. https://www.dmarket.io
Enjin Coin. https://enjincoin.io
Ubitquity. https://www.ubitquity.io
Propy. https://tokensale.propy.com
CryptoKitties Deployed Contract. https://etherscan.io/address/0x06012c8cf97bead5deae237070f9587f8e7a266d#code
Su Squares Bug Bounty Program. https://github.com/fulldecent/su-squares-bounty
XXXXERC721. https://github.com/fulldecent/erc721-example
ERC721ExampleDeed. https://github.com/nastassiasachs/ERC721ExampleDeed
Curio Cards. https://mycuriocards.com
Rare Pepe. https://rarepepewallet.com
Auctionhouse Asset Interface. https://github.com/dob/auctionhouse/blob/master/contracts/Asset.sol
OpenZeppelin SafeERC20.sol Implementation. https://github.com/OpenZeppelin/zeppelin-solidity/blob/master/contracts/token/ERC20/SafeERC20.sol

Copyright

Copyright and related rights waived via CC0.

github-com-997 ---- GitHub - hughrun/yawp: command line app for publishing social media posts
hughrun / yawp (AGPL-3.0 License)

yawp

A command line (CLI) app for publishing social media posts.

In brief

yawp takes some text as an argument and publishes it to the social media accounts of your choice. No need to read the comments, just send your yawp and move on with your day. Current options are Twitter and Mastodon; it's possible more will be added in future (or not).
yawp is specifically designed to fit within a broader toolchain: in general terms it tries to follow "the Unix philosophy":

can take input from stdin (e.g. redirected from a file or another process)
outputs the message as plaintext to stdout (i.e. the output is the input)
takes all configuration from environment (ENV) values to enable flexibility

Installation

MacOS or Linux

Download the relevant binary file from the latest release. Save it somewhere in your PATH, e.g. in /usr/local/bin/. Alternatively you can symlink it from wherever you want to save it, like this:

ln -s /my/awesome/directory/yawp /usr/local/bin/

From source

If you're using another platform or don't trust my binaries you can build your own from source:

git clone or download the repository as a zip.
cargo build --release

Usage:

yawp [FLAGS] [OPTIONS]

Flags:

-h, --help       Prints help information
-m, --mastodon   Send toot
-q, --quiet      Suppress output (error messages will still be sent to stderr)
-t, --twitter    Send tweet
-V, --version    Prints version information

Options:

-e, --env        path to env file

Args:

Message (post) to send. If using stdin you must provide a hyphen (-) as the argument. However if you do this and are not redirecting stdin from somewhere, yawp will hang your shell unless you supply EOF by pressing Ctrl + D. (See example 5 below).

Environment variables

yawp requires some environment variables in order to actually publish your message. You can set these in a number of ways depending on your operating system. yawp also allows you to call them in from a file. See example 6 for using a file or example 7 for setting environment values at the same time you call yawp. An example environment variables file is provided at example.env. The possible values are:

Mastodon

For Mastodon you need the base url of your instance (server), and an API access token.

MASTODON_ACCESS_TOKEN - You can create a token at settings - applications in your Mastodon account. You require write:statuses permission.
MASTODON_BASE_URL - This is the base URL of your server, e.g. https://mastodon.social

Twitter

For Twitter you need the four tokens provided when you create an app at https://developer.twitter.com/en/apps.

TWITTER_CONSUMER_KEY
TWITTER_CONSUMER_SECRET
TWITTER_ACCESS_TOKEN
TWITTER_ACCESS_SECRET

Examples

Provide message on command line:

yawp 'Hello, World!' -t
# Output: Hello, World!
# Tweets: Hello, World!

Pipe in message:

echo 'Hello again, World!' | yawp - -m
# Output: Hello again, World!
# Toots: Hello again, World!

Read from file:

# create a file
(echo Hello fronds; echo " It's me"; echo ...a tree 🌳) > message.txt
# run yawp and direct the file content into it
yawp - < message.txt > output.txt
# the message.txt and output.txt files are now identical.

Read from user input:

This is not really recommended, but you may find yourself facing a user input prompt if you use a hyphen without providing any redirected input, i.e. if you do this:

yawp -
# machine awaits further user input from the command line

Don't panic, you can provide the message text by typing it in at the command prompt. There is a catch, however, in that yawp will wait for further input until it reaches EOF (End of File). This will not happen when you press Enter but can usually be provided by pressing Ctrl + D:

yawp -t -
# machine awaits further user input from the command line
Awoo!
[Ctrl + D]
# Output: Awoo!
# Tweets: Awoo!

Provide environment variables from file:

In some situations (e.g. when using Docker Compose) you may have already set the environment variables needed by yawp.
If not, you can call them in from a file by providing the filepath using -e or --env:

yawp -m --env 'yawp.env' 'I love to toot!'

Provide environment variables on command line:

You could also set ENV settings manually when you call yawp:

MASTODON_BASE_URL=https://ausglam.space MASTODON_ACCESS_TOKEN=abcd1234 yawp -m '🎺 I am tooting!'

go-to-hellman-blogspot-com-1227 ---- Go To Hellman If you wanna end war and stuff, you gotta sing loud!

Monday, February 22, 2021

Open Access for Backlist Books, Part II: The All-Stars

Libraries know that a big fraction of their book collections never circulate, even once. The flip side of this fact is that a small fraction of a library's collection accounts for most of the circulation. This is often referred to as Zipf's law; as a physicist I prefer to think of it as another manifestation of log-normal statistics resulting from a preferential attachment mechanism for reading. (English translation: "word-of-mouth".)

In my post about the value of Open Access for books, I suggested that usage statistics (circulation, downloads, etc.) are a useful proxy for the value that books generate for their readers. The logical conclusion is that the largest amount of value that can be generated from opening the backlist comes from the books that are most used, the "all-stars" of the library, not the discount rack or the discards. If libraries are to provide funding for Open Access backlist books, shouldn't they focus their resources on the books that create the most value?

The question, of course, is how the library community would ever convince publishers, who have monopolies on these books as a consequence of international copyright laws, to convert these books to Open Access. Although some sort of statutory licensing or fair-use carve-outs could eventually do the trick, I believe that Open Access for a significant number of "backlist All-Stars" can be achieved today by pushing ALL the buttons available to supporters of Open Access. Here's where Open Access can learn from the game (and business) of baseball.

"Baseball", Henry Sandham, L. Prang & Co. (1861), from Digital Commonwealth

Baseball's best player, Mike Trout, should earn $33.25 million this year, a bit over $205,000 per regular season game. If he's chosen for the All-Star game, he won't get even a penny extra to play unless he's named MVP, in which case he earns a $50,000 bonus. So why would he bother to play for free? It turns out there are lots of reasons. The most important have everything to do with the recognition and honor of being named as an All-Star, and with having respect for his fans. But being an All-Star is not without financial benefits, considering endorsement contracts and earning potential outside of baseball. Playing in the All-Star game is an all-around no-brainer for Mike Trout.

Open Access should be an All-Star game for backlist books. We need to create community-based award programs that recognize and reward backlist conversions to OA.
If the world's libraries want to spend $50,000 on backlist physics books, for example, isn't it better to spend it on the Mike Trout of physics books than on a team full of discount-rack replacement-level players?

Competent publishers would line up in droves for major-league all-star backlist OA programs. They know that publicity will drive demand for their print versions (especially if NC licenses are used). They know that awards will boost their prestige, and if they're trying to build Open Access publication programs, prestige and quality are a publisher's most important selling points.

The Newbery Medal

Over a hundred backlist books have been converted to open access already this year. Can you name one of them? Probably not, because the publicity value of existing OA conversion programs is negligible. To relicense an All-Star book, you need an all-star publicity program.

You've heard of the Newbery Medal, right? You've seen the Newbery medal sticker on children's books, maybe even special sections for them in bookstores. That prize, awarded by the American Library Association every year to honor the most distinguished contributions to American literature for children, is a powerful driver of sales. The winners get feted in a gala banquet and party (at least they did in the before-times). That's the sort of publicity we need to create for open access books.

If you doubt that "All-Star Open Access" could work, don't discount the fact that it's also the right thing to do. Authors of All-Star backlist books want their books to be used, cherished and remembered. Libraries want books that measurably benefit the communities they serve. Foundations and governmental agencies want to make a difference. Even publishers who look only at their bottom lines can structure a rights conversion as a charitable donation to reduce their tax bills. And did I mention that there could be Gala Award Celebrations? We need more celebrations, don't you think?

If your community is interested in creating an Open-Access program for backlist books, don't hesitate to contact me at the Free Ebook Foundation!

Notes

I've written about the statistics of book usage here, here and here.

This is the third in a series of posts about creating value with Open Access books. The first two are: Creating Value with Open Access Books; Open Access for Backlist Books, Part I: The Slush Pile

Posted by Eric at 9:49 PM. Labels: Baseball, Book Use, Open Access, Ungluing Ebooks

Tuesday, February 16, 2021

Open Access for Backlist Books, Part I: The Slush Pile

"Kale emerging from a slush pile" (CC BY, Eric Hellman)

Book publishers hate their "slush pile": books submitted for publication unsolicited, rarely with literary merit and unlikely to make money for the publisher if accepted. In contrast, book publishers love their backlist; a strong backlist is what allows a book publisher to remain consistently profitable even when most of their newly published books fail to turn a profit. A publisher's backlist typically consists of a large number of "slushy" books that generate negligible income and a few steady "evergreen" earners. Publishers don't talk much about the backlist slush pile, maybe because it reminds them of their inability to predict a book's commercial success.

With the advent of digital books have come new possibilities for generating value from the backlist slush pile.
Digital books can be kept "in print" at essentially no cost (printed books need warehouse space), which has allowed publishers to avoid rights reversion in many cases. Some types of books can be bundled in ebook aggregations that can be offered on a subscription basis. This is reminiscent of the way investment bankers created valuable securities by packaging junk bonds with opaque derivatives.

Open access is a more broadly beneficial way to generate value from the backlist slush pile. There is a reason that libraries keep large numbers of books on their shelves even when they don't circulate for years. The myriad ways that books can create value don't have to be tied to book sales, as I wrote in my previous post.

Those of us who want to promote Open Access for backlist ebooks have a number of strategies at our disposal. The most basic strategy is to promote the visibility of these books. Libraries can add listings for these ebooks in their catalogs. Aggregators can make these books easier to find.

Switching backlist books to Open Access licenses can be expensive and difficult. While the cost of digitization has dropped dramatically over the past decade, quality control is still a significant conversion expense. Licensing-related expenses are sometimes large. Unlike journals and journal articles, academic books are typically covered by publishing agreements that give authors royalties on sales and licensing, and give authors control over derivative works such as translations. No publisher would consent to OA relicensing without the consent and support of the author. For older books, a publisher may not even have electronic rights (in the US, the Tasini decision established that electronic rights are separate from print rights), or may need to have a lawyer interpret the language of the original publishing contract. While most scholarly publishers obtain worldwide rights to the books they publish, rights for trade books are very often divided among markets. Open-access licenses such as the Creative Commons licenses are not limited to markets, so a license conversion would require the participation of every rights holder worldwide.

The CC BY license can be problematic for books containing illustrations or figures used by permission from third party rights holders. "All Rights Reserved" illustrations are often included in Open Access Books, but they are carved out of the license by separate rights statements, and to be safe, publishers use the CC BY-ND or CC BY-NC-ND license for the complete book, as the permissions do not cover derivative works. Since the CC BY license allows derivative works, it cannot be used in cases where translation rights have been sold (without also buying out the translation rights). A publisher cannot use a CC BY license for a translated work without also having rights to the original work.

The bottom line is that converting a backlist book to OA often requires economic motivations quite apart from any lost sales. Luckily, there's evidence that opening access can lead to increased sales. Nagaraj and Reimers found that digitization and exposure through Google Books increased sales of print editions by 35% for books in the Public Domain. In addition, a publisher's commercial position and prestige can be enhanced by the attribution requirement in Creative Commons licenses.
Additional motivation for OA conversion of the backlist slush pile has been supplied by programs such as the one used by Knowledge Unlatched, where libraries contribute to a fund used for "unlatching" backlist books. (Knowledge Unlatched has programs for front list books as well.) While such programs can in principle be applied to the "evergreen" backlist, the incentives currently in place result in the unlatching of books in the "slush pile" backlist. While value for society is being gained this way, the willingness of publishers to "unlatch" hundreds of these books poses the question of how much library funding for Open Access should be allocated to the discount bin, as opposed to the backlist books most used in libraries. That's the topic of my next post!

Notes

This is the second in a series of posts about creating value with Open Access books. The others are: Creating Value with Open Access Books; Open Access for Backlist Books, Part II: The All-Stars

Posted by Eric at 9:32 PM. Labels: Creative Commons, ebooks, Open Access, Ungluing Ebooks

Friday, February 12, 2021

Creating Value with Open Access Books

Can a book be more valuable if it's free? How valuable? To whom? How do we unlock this value? I've been wrestling with these questions for over ten years now. And for each of these questions, the answer is... it depends. A truism of the bookselling business is that "Every book is different" and the same is true of the book freeing "business".

Recently there's been increased interest in academic communities around Open Access book publishing and in academic book relicensing (adding an Open Access License to an already published book). Both endeavors have been struggling with the central question of how to value an open access book. The uncertainty in OA book valuation has led to many rookie mistakes among OA stakeholders. For example, when we first started Unglue.it, we assumed that reader interest would accelerate the relicensing process for older books whose sales had declined. But the opposite turned out to be true. Evidence of reader interest let rights holders know that these backlist titles were much more valuable than sales would indicate, thus precluding any notion of making them Open Access. Pro tip: if you want to pay a publisher to make a book free, don't publish your list of incredibly valuable books!

Instead of a strictly transactional approach, it's more useful to consider the myriad ways that academic books create value. Each of these value mechanisms offers buttons that we can push to promote open access, and points to new structures for markets where participants join together to create mutual value.

First, consider the book's reader. The value created is the reader's increased knowledge, understanding and sometimes, sheer enjoyment. The fact of open access does not itself create the value, but removes some of the barriers which might suppress this value. It's almost impossible to quantify the understanding and enjoyment from books; but "hours spent reading" might be a useful proxy for it.

Next consider a book's creator. While a small number of creators derive an income stream from their books, most academic authors benefit primarily from the development and dissemination of their ideas. In many fields of inquiry, publishing a book is the academic's path to tenure. Educators (and their students!) similarly benefit.
In principle, you might assess a textbook's value by measuring student performance.

The value of a book to a publisher can be more than just direct sales revenue. A widely distributed book can be a marketing tool for a publisher's entire business. In the world of Open Access, we can see new revenue models emerging - publication charges, events, sponsorships, even grants and memberships.

The value of a book to society as a whole can be enormous. In areas of research, a book might lead to technological advances, healthier living, or a more equitable society. Or a book might create outrage, civil strife, and misinformation. That's another issue entirely!

Books can be valuable to secondary distributors as well. Both used book resellers and libraries add value to physical books by increasing their usage. This is much harder to accomplish for paywalled ebooks! Since academic libraries are often considered potential funding sources for Open Access publishing, it's worth noting that the value of an open access ebook to a library is entirely indirect. When a library acts as an Open Access funding source, it's acting as a proxy for the community it serves.

This brings us to communities. The vast majority of books create value for specific communities, not societies as a whole. I believe that community-based funding is the most sustainable path for support of Open Access Books. Community-supported OA article publishing has already had plenty of support. Communities organized by discipline have been particularly successful: consider the success that ArXiv has had in promoting Open Access in physics, both at the preprint level and for journals in high-energy physics. A similar story can be told for biomedicine, PubMed and PubMed Central. A different sort of community success story has been SciELO, which has used Open Access to address challenges faced by scholars in Latin America.

So far, however, sustainable Open Access has proven to be challenging for scholarly ebooks. My next few posts will discuss the challenges and ways forward for support of ebook relicensing and for OA ebook creation: Open Access for Backlist Books, Part I: The Slush Pile; Open Access for Backlist Books, Part II: The All-Stars

Posted by Eric at 12:31 PM. Labels: Open Access, Ungluing Ebooks

Tuesday, December 29, 2020

Infra-infrastructure, inter-infrastructure and para-infrastructure

No one is against "Investing in Infrastructure". No one wants bridges to collapse, investing is always more popular than spending, and it's even alliterative! What's more, since infrastructure is almost invisible by definition, it's politically safe to support investing in infrastructure because no one will see when you don't follow through on your commitment!

Ponte Morandi collapse - Michele Ferraris, CC BY-SA 4.0 via Wikimedia Commons

Geoffrey Bilder gives a talk where he asks us to think of Crossref and similar services as "information infrastructure" akin to "plumbing", where the implication is that since we, as a society, are accustomed to paying plumbers and bridge builders lots of money, we should also pony up for "information infrastructure", which is obvious once you say it.

What qualifies as infrastructure, anyway? If I invest in a new laptop, is that infrastructure for the Go-to-Hellman blog? Blogspot is Google-owned blogging infrastructure for sure.
It's certainly not open infrastructure, but it works, and I haven't had to do much maintenance on it.

There's a lot of infrastructure used to make Unglue.it, which supports distribution of open-access ebooks. It uses Django, which is open-source software originally developed to support newspaper websites. Unglue.it also uses modules that extend Django that were made possible by Django's Open license. It works really well, but I've had to put a fair amount of work into updating my code to keep up with new versions of Django. Ironically, most of this work has been in fixing the extensions that have not updated along with Django.

I deploy Unglue.it on AWS, which is DEFINITELY infrastructure. I have a love/hate relationship with AWS because it works so well, but every time I need to change something, I have to spend 2 hours with documentation to find the one-line incantation that makes it work. But every few months, the cost of using AWS goes down, which I like, but the money goes to Amazon, which is ironic because they really don't care for the free ebooks we distribute.

Aside from AWS and Django, the infrastructure I use to deliver Ebook Foundation services includes Python, Docker, Travis-CI, GitHub, git, Ubuntu Linux, MySQL, Postgres, Ansible, Requests, Beautiful Soup, and many others. The Unglue.it database relies on infrastructure services from DOAB, OAPEN, LibraryThing, Project Gutenberg, OpenLibrary and Google Books. My development environment relies heavily on BBEdit and Jupyter. We depend on Crossref and Internet Archive to resolve some links; we use subject vocabulary from Library of Congress and BISAC.

You can imagine why I was interested in "JROST 2020", which turns out to stand for "Joint Roadmap for Open Science Tools 2020", a meeting organized by a relatively new non-profit, "Invest in Open Infrastructure" (IOI). The meeting was open and free, and despite the challenges associated with such a meeting in our difficult times, it managed to present a provocative program along with a compelling vision.

If you think a bit about how to address the infrastructure needs of open science and open scholarship in general, you come up with at least 3 questions:

How do you identify the "leaky pipes" that need fixing so as to avoid systemic collapse?
How do you bolster healthy infrastructure so that it won't need repair?
How do you build new infrastructure that will be valuable and thrive?

If it were up to me, my first steps would be to:

Get people with a stake in open infrastructure to talk to each other. Break them out of their silos and figure out how their solutions can help solve problems in other communities.
Create a "venture fund" for new needed infrastructure. Work on solving the problems that no one wants to tackle on their own.

Invest in Open Infrastructure is already doing this! Kaitlin Thaney, who's been Executive Director of IOI for less than a year, seems to be pressing all the right buttons. The JROST 2020 meeting was a great start on #1 and #2 is the initial direction of the "JROST Rapid Response Fund", whose first round of awards was announced at the meeting.

Among the first awardees of the JROST Rapid Response Fund announced at JROST2020 was an organization that ties into the infrastructure that I use, 2i2c. It's a great example of much-needed infrastructure for scientific computing, education, digital humanities and data science. 2i2c aims to create hosted interactive computing environments that run in the cloud and are powered by entirely open-source technology (Jupyter).
As I'm a Jupyter user and enthusiast, this makes me happy. But while 2i2c is the awardee, it's being built on top of Jupyter. Is Jupyter also infrastructure? It needs investment too, doesn't it? There's a lot of overlap between the Jupyter team and the 2i2c team, so investment in one could be investment in the other. In fact, Chris Holdgraf, Executive Director of 2i2c, told me that "we see 2i2c as a way to both increase the impact of Jupyter in the research/education community, and a way to more sustainably drive resources back into the Jupyter community."

Open Science Infrastructure Interdependency (from "Scoping the Open Science Infrastructure Landscape in Europe", https://doi.org/10.5281/zenodo.4153809)

Where does Jupyter fit in the infrastructure landscape? It's nowhere to be seen on the neat "interdependency map" presented by SPARC EU at JROST. If 2i2c is an example of investment-worthy infrastructure, maybe the best way to think of Jupyter is "infra-infrastructure" - the open information infrastructure needed to build open information infrastructure. "Trickle-down" investment in this sort of infrastructure may be the best way to support projects like Jupyter so they stay open and are widely used.

But wait... Jupyter is built on top of Python, right? Python needs people investing in it. Is Python infra-infra-infrastructure? And Python is built on top of C (I won't even mention Jython or PyJS), right?? Turtles all the way down. Will 2i2c eventually get buried under other layers of infrastructure, be forgotten and underinvested in, only to be one day excavated and studied by technology archeologists?

Looking carefully at the interdependency map, I don't see a lot of layers. I see a network with lots of loops. And many of the nodes are connectors themselves. Orcid and CrossRef resemble roads, bridges and plumbing not because they're hidden underneath, but because they're visible and in-between. They exist because the entities they connect cooperate to make the connection robust instead of incidental. They're not infra-infrastructure, they're inter-infrastructure. Trickle-down investment probably wouldn't work for inter-infrastructure. Instead, investments need to come from the communities that benefit so that the communities can decide how to manage access to the inter-infrastructure to maximize the community benefit.

There's another type of infrastructure that needs investment. I work in ebooks, and a lot of overlapping communities have tackled their own special ebook problems. But the textbook people don't talk to the public domain people don't talk to the monograph people don't talk to the library people. (A slight exaggeration.) There are lots of "almost" solutions that work well for specific tasks. But with the total amount of effort being expended, we could do some really amazing things... if only we were better at collaborating. For example, the Jupyter folks have gotten funding from Sloan for the "Executable Book Project". This is really cool. Similarly, there's Bookdown, which comes out of the R community. And there are other efforts to give ebooks the functionality that a website could have. Gitbook is a commercial open-source effort targeting a similar space; Rebus, a non-profit, is using Pressbooks to gain traction in the textbook space, while MIT Press's PubPub has similar goals. I'll call these overlapping efforts "para-infrastructure." Should investors in open infrastructure target investment in "rolling up" or merging these efforts?
When private equity investors have done this to library automation companies, the results have not benefited the user communities, so I'd say "NO!" but what's the alternative? I've observed that the folks who are doing the best job of just making stuff work rarely have the time or resources to go off to conferences or workshops. Typically, these folks have no incentive to do the work to make their tools work for slightly different problems. That can be time consuming! But it's still easier than taking someone else's work and modifying it to solve your own special problem. I think the best way to invest in open para-infrastructure is to get lots of these folks together and give them the time and incentive to talk and to share solutions (and maybe code). It's hard work, but making the web of open infrastructure stronger and more resilient is what investment in open infrastructure is all about.

Different types of open infrastructure benefit from different styles of investment; I'm hoping that IOI will build on the directions exhibited by its Rapid Response Fund and invest effectively in infra-infrastructure, inter-infrastructure, and para-infrastructure.

Notes

1. Geoff Bilder and Cameron Neylon have a nice discussion of many of the issues in this post: "Bilder G, Lin J, Neylon C (2016) Where are the pipes? Building Foundational Infrastructures for Future Services, retrieved [date], http://cameronneylon.net/blog/where-are-the-pipes-building-foundational-infrastructures-for-future-services/"

2. "Trickle-down" has a negative connotation in economics, but that's how you feed a tree, right?

Posted by Eric at 1:17 PM. Labels: Crossref, ebooks, Infrastructure, Open Source

Monday, October 19, 2020

We should regulate virality

It turns out that virality on internet platforms is a social hazard!

Living in the age of the Covid pandemic, we see around us what happens when we let things grow exponentially. The reason that the novel coronavirus has changed our lives is not that it's often lethal - it's that it found a way to jump from one infected person to several others on average, leading to exponential growth. We are infected with a virus without regard to the lethality of the virus, but only its reproduction rate.

For years, websites have been built to optimize virality of content. What we see on Facebook or Twitter is not shown to us for its relevance to our lives, its educational value, or even its entertainment value. It's shown to us because it maximizes our "engagement" - our tendency to interact and spread it. The more we interact with a website, the more money it makes, and so a generation of minds has been employed in the pursuit of more engagement. Sometimes it's cat videos that delight us, but more often these days it's content that enrages and divides us.

Our dissatisfaction with what the internet has become has led to calls to regulate the giants of the internet. A lot of the political discourse has focused on Section 230 (https://en.wikipedia.org/wiki/Section_230), a part of US law that gives interactive platforms such as Facebook a set of rules that result in legal immunity for content posted by users. As might be expected, many of the proposals for reform have sounded attractive, but the details are typically unworkable in the real world, and often would have effects opposite of what is intended.
I'd like to argue that the only workable approaches to regulating internet platforms should target their virality. Our society has no problem with regulations that force restaurants, food preparation facilities, and even barbershops to prevent the spread of disease, and no one ever complains that the regulations affect "good" bacteria too. These regulations are a component of our society's immune system, and they are necessary for its healthy functioning.

You might think that platform virality is too technical to be amenable to regulation, but it's not. That's because of the statistical characteristics of exponential growth. My study of free ebook usage has made me aware of the pervasiveness of exponential statistics on the internet. Sometimes labeled the 80-20 rule, the Pareto principle, or log-normal statistics, it's the natural result of processes that grow at a rate proportional to their size. As a result, it's possible to regulate the virality of platforms because only a very small amount of content is viral enough to dominate the platform. Regulate that tiny amount of super-viral content, and you create an incentive to moderate the virality of platforms. The beauty of doing this is that a huge majority of content is untouched by regulation.

How might this work? Imagine a law that removed a platform's immunity for content that it shows to a million people (or maybe 10 million - I'm not sure what the cutoff should be). This makes sense, too; if a platform promotes illegal content in such a way that a million people see it, the platform shouldn't get immunity just because "algorithms"! It also makes it practical for platforms to curate the content for harmlessness - it won't kill off the cat videos! The Facebooks and Twitters of the world will complain, but they'll be able to add antibodies and T-cells to their platforms, and the platforms will be healthier for it. Smaller sites will be free to innovate, without too much worry, but to get funding they'll need to have plans for virality limits.

So we really do have a choice: healthy platforms with diverse content, or cesspools of viral content. Doesn't seem like such a hard decision!

Techdirt has excellent coverage of Section 230.

Posted by Eric at 9:29 PM. Labels: social networks
go-to-hellman-blogspot-com-7799 ---- Go To Hellman - If you wanna end war and stuff, you gotta sing loud! Open Access for Backlist Books, Part II: The All-Stars Open Access for Backlist Books, Part I: The Slush Pile Creating Value with Open Access Books Infra-infrastructure, inter-infrastructure and para-infrastructure We should regulate virality Notes on work-from-home teams Your Identity, Your Library Four-Leaf Clovers Responding to Critical Reviews RA21: Technology is not the problem. RA21 doesn't address the yet-another-WAYF problem. Radical inclusiveness would. RA21's recommended technical approach is broken by emerging browser privacy features RA21 Draft RP session timeout recommendation considered harmful RA21 RP does not require secure protocols. It should. Fudge, and open access ebook download statistics On the Surveillance Techno-state Towards Impact-based OA Funding A Milestone for GITenberg eBook DRM and Blockchain play CryptoKitty and Mouse. And the Winner is... My Face is Personally Identifiable Information The Vast Potential for Blockchain in Libraries The Shocking Truth About RA21: It's Made of People! Choose Privacy Week: Your Library Organization Is Watching You Everything* You Always Wanted To Know About Voodoo (But Were Afraid To Ask) Holtzbrinck has attacked Project Gutenberg in a new front in the War of Copyright Maximization
groups-google-com-1246 ---- Research Software Alliance (ReSA) - Google Groups. You don't have permission to access this content. For access, try logging in. If you are subscribed to this group and have noticed abuse, report abusive group.
groups-google-com-2597 ---- Google Groups. To use Google Groups Discussions, please enable JavaScript in your browser settings, and then refresh this page.
guides-library-ucsc-edu-6046 ---- Home - Research Data Management - Library Guides at University of California, Santa Cruz. Research Data Management: Home, Create Your Plan (DMPTool), Preserve & Publish (Dryad), Find Data For ReUse, Best Practices, Tools, Video Tutorials, Researcher to Researcher. We can help you: Create a Data Management Plan - Easily create a data management plan for your next grant proposal using the DMPTool. Preserve & Publish Your Data - Publish your data in Dryad for preservation and discovery. Manage your paper or data set with a unique persistent identifier; request a DOI (Digital Object Identifier). Manage Your Data - Check out these best practices for file naming, file organization, file formats, archival data storage, metadata creation and data sharing options. Find Data for Reuse - Locate an appropriate data repository. Research Data Management Lifecycle. Our Goal: To assist UCSC faculty, staff and students with strategies and tools for organizing, managing and preserving research data throughout the research data life cycle. Request a Data Consultation: Scholarly Communication & eResearch Team, email: research@library.ucsc.edu. UCSC Campus Services: UCSC ITS provides a range of research support services, including data backup. Next: Create Your Plan (DMPTool) >>
hangingtogether-org-1218 ---- None hangingtogether-org-261 ---- None hangingtogether-org-3299 ---- None hangingtogether-org-3574 ---- None hangingtogether-org-4228 ---- None hangingtogether-org-4999 ---- None hangingtogether-org-5859 ---- None hangingtogether-org-7556 ---- None hangingtogether-org-8658 ---- None
hbr-org-5273 ---- Strategies for Learning from Failure. We are programmed at an early age to think that failure is bad. That belief prevents organizations from effectively learning from their missteps. by Amy C. Edmondson. From the Magazine (April 2011). Summary. Reprint: R1104B Many executives believe that all failure is bad (although it usually provides lessons) and that learning from it is pretty straightforward. The author, a professor at Harvard Business School, thinks both beliefs are misguided. In organizational life, she says, some failures are inevitable and some are even good. And successful learning from failure is not simple: It requires context-specific strategies. But first leaders must understand how the blame game gets in the way and work to create an organizational culture in which employees feel safe admitting or reporting on failure. Failures fall into three categories: preventable ones in predictable operations, which usually involve deviations from spec; unavoidable ones in complex systems, which may arise from unique combinations of needs, people, and problems; and intelligent ones at the frontier, where “good” failures occur quickly and on a small scale, providing the most valuable information. Strong leadership can build a learning culture—one in which failures large and small are consistently reported and deeply analyzed, and opportunities to experiment are proactively sought. Executives commonly and understandably worry that taking a sympathetic stance toward failure will create an “anything goes” work environment. They should instead recognize that failure is inevitable in today’s complex work organizations. The wisdom of learning from failure is incontrovertible. Yet organizations that do it well are extraordinarily rare. This gap is not due to a lack of commitment to learning.
Managers in the vast majority of enterprises that I have studied over the past 20 years—pharmaceutical, financial services, product design, telecommunications, and construction companies; hospitals; and NASA’s space shuttle program, among others—genuinely wanted to help their organizations learn from failures to improve future performance. In some cases they and their teams had devoted many hours to after-action reviews, postmortems, and the like. But time after time I saw that these painstaking efforts led to no real change. The reason: Those managers were thinking about failure the wrong way. Most executives I’ve talked to believe that failure is bad (of course!). They also believe that learning from it is pretty straightforward: Ask people to reflect on what they did wrong and exhort them to avoid similar mistakes in the future—or, better yet, assign a team to review and write a report on what happened and then distribute it throughout the organization. These widely held beliefs are misguided. First, failure is not always bad. In organizational life it is sometimes bad, sometimes inevitable, and sometimes even good. Second, learning from organizational failures is anything but straightforward. The attitudes and activities required to effectively detect and analyze failures are in short supply in most companies, and the need for context-specific learning strategies is underappreciated. Organizations need new and better ways to go beyond lessons that are superficial (“Procedures weren’t followed”) or self-serving (“The market just wasn’t ready for our great new product”). That means jettisoning old cultural beliefs and stereotypical notions of success and embracing failure’s lessons. Leaders can begin by understanding how the blame game gets in the way. The Blame Game Failure and fault are virtually inseparable in most households, organizations, and cultures. Every child learns at some point that admitting failure means taking the blame. That is why so few organizations have shifted to a culture of psychological safety in which the rewards of learning from failure can be fully realized. Executives I’ve interviewed in organizations as different as hospitals and investment banks admit to being torn: How can they respond constructively to failures without giving rise to an anything-goes attitude? If people aren’t blamed for failures, what will ensure that they try as hard as possible to do their best work? This concern is based on a false dichotomy. In actuality, a culture that makes it safe to admit and report on failure can—and in some organizational contexts must—coexist with high standards for performance. To understand why, look at the exhibit “A Spectrum of Reasons for Failure,” which lists causes ranging from deliberate deviation to thoughtful experimentation. Which of these causes involve blameworthy actions? Deliberate deviance, first on the list, obviously warrants blame. But inattention might not. If it results from a lack of effort, perhaps it’s blameworthy. But if it results from fatigue near the end of an overly long shift, the manager who assigned the shift is more at fault than the employee. As we go down the list, it gets more and more difficult to find blameworthy acts. In fact, a failure resulting from thoughtful experimentation that generates valuable information may actually be praiseworthy. 
When I ask executives to consider this spectrum and then to estimate how many of the failures in their organizations are truly blameworthy, their answers are usually in single digits—perhaps 2% to 5%. But when I ask how many are treated as blameworthy, they say (after a pause or a laugh) 70% to 90%. The unfortunate consequence is that many failures go unreported and their lessons are lost. Not All Failures Are Created Equal A sophisticated understanding of failure’s causes and contexts will help to avoid the blame game and institute an effective strategy for learning from failure. Although an infinite number of things can go wrong in organizations, mistakes fall into three broad categories: preventable, complexity-related, and intelligent. Preventable failures in predictable operations. Most failures in this category can indeed be considered “bad.” They usually involve deviations from spec in the closely defined processes of high-volume or routine operations in manufacturing and services. With proper training and support, employees can follow those processes consistently. When they don’t, deviance, inattention, or lack of ability is usually the reason. But in such cases, the causes can be readily identified and solutions developed. Checklists (as in the Harvard surgeon Atul Gawande’s recent best seller The Checklist Manifesto) are one solution. Another is the vaunted Toyota Production System, which builds continual learning from tiny failures (small process deviations) into its approach to improvement. As most students of operations know well, a team member on a Toyota assembly line who spots a problem or even a potential problem is encouraged to pull a rope called the andon cord, which immediately initiates a diagnostic and problem-solving process. Production continues unimpeded if the problem can be remedied in less than a minute. Otherwise, production is halted—despite the loss of revenue entailed—until the failure is understood and resolved. Unavoidable failures in complex systems. A large number of organizational failures are due to the inherent uncertainty of work: A particular combination of needs, people, and problems may have never occurred before. Triaging patients in a hospital emergency room, responding to enemy actions on the battlefield, and running a fast-growing start-up all occur in unpredictable situations. And in complex organizations like aircraft carriers and nuclear power plants, system failure is a perpetual risk. Although serious failures can be averted by following best practices for safety and risk management, including a thorough analysis of any such events that do occur, small process failures are inevitable. To consider them bad is not just a misunderstanding of how complex systems work; it is counterproductive. Avoiding consequential failures means rapidly identifying and correcting small failures. Most accidents in hospitals result from a series of small failures that went unnoticed and unfortunately lined up in just the wrong way. Intelligent failures at the frontier. Failures in this category can rightly be considered “good,” because they provide valuable new knowledge that can help an organization leap ahead of the competition and ensure its future growth—which is why the Duke University professor of management Sim Sitkin calls them intelligent failures. They occur when experimentation is necessary: when answers are not knowable in advance because this exact situation hasn’t been encountered before and perhaps never will be again. 
Discovering new drugs, creating a radically new business, designing an innovative product, and testing customer reactions in a brand-new market are tasks that require intelligent failures. “Trial and error” is a common term for the kind of experimentation needed in these settings, but it is a misnomer, because “error” implies that there was a “right” outcome in the first place. At the frontier, the right kind of experimentation produces good failures quickly. Managers who practice it can avoid the unintelligent failure of conducting experiments at a larger scale than necessary. Leaders of the product design firm IDEO understood this when they launched a new innovation-strategy service. Rather than help clients design new products within their existing lines—a process IDEO had all but perfected—the service would help them create new lines that would take them in novel strategic directions. Knowing that it hadn’t yet figured out how to deliver the service effectively, the company started a small project with a mattress company and didn’t publicly announce the launch of a new business. Although the project failed—the client did not change its product strategy—IDEO learned from it and figured out what had to be done differently. For instance, it hired team members with MBAs who could better help clients create new businesses and made some of the clients’ managers part of the team. Today strategic innovation services account for more than a third of IDEO’s revenues. Tolerating unavoidable process failures in complex systems and intelligent failures at the frontiers of knowledge won’t promote mediocrity. Indeed, tolerance is essential for any organization that wishes to extract the knowledge such failures provide. But failure is still inherently emotionally charged; getting an organization to accept it takes leadership. Building a Learning Culture Only leaders can create and reinforce a culture that counteracts the blame game and makes people feel both comfortable with and responsible for surfacing and learning from failures. (See the sidebar “How Leaders Can Build a Psychologically Safe Environment.”) They should insist that their organizations develop a clear understanding of what happened—not of “who did it”—when things go wrong. This requires consistently reporting failures, small and large; systematically analyzing them; and proactively searching for opportunities to experiment. How Leaders Can Build a Psychologically Safe Environment If an organization’s employees are to help spot existing and pending failures and to learn from them, their leaders must make it safe to speak up. Julie Morath, the chief operating officer of Children’s Hospital and Clinics of Minnesota from 1999 to 2009, did just that when she led a highly successful effort to reduce medical errors. Here are five practices I’ve identified in my research, with examples of how Morath employed them to build a psychologically safe environment. Frame the Work Accurately People need a shared understanding of the kinds of failures that can be expected to occur in a given work context (routine production, complex operations, or innovation) and why openness and collaboration are important for surfacing and learning from them. Accurate framing detoxifies failure. In a complex operation like a hospital, many consequential failures are the result of a series of small events. To heighten awareness of this system complexity, Morath presented data on U.S. 
medical error rates, organized discussion groups, and built a team of key influencers from throughout the organization to help spread knowledge and understanding of the challenge. Embrace Messengers Those who come forward with bad news, questions, concerns, or mistakes should be rewarded rather than shot. Celebrate the value of the news first and then figure out how to fix the failure and learn from it. Morath implemented “blameless reporting”—an approach that encouraged employees to reveal medical errors and near misses anonymously. Her team created a new patient safety report, which expanded on the previous version by asking employees to describe incidents in their own words and to comment on the possible causes. Soon after the new system was implemented, the rate of reported failures shot up. Morath encouraged her people to view the data as good news, because the hospital could learn from failures—and made sure that teams were assigned to analyze every incident. Acknowledge Limits Being open about what you don’t know, mistakes you’ve made, and what you can’t get done alone will encourage others to do the same. As soon as she joined the hospital, Morath explained her passion for patient safety and acknowledged that as a newcomer, she had only limited knowledge of how things worked at Children’s. In group presentations and one-on-one discussions, she made clear that she would need everyone’s help to reduce errors. Invite Participation Ask for observations and ideas and create opportunities for people to detect and analyze failures and promote intelligent experiments. Inviting participation helps defuse resistance and defensiveness. Morath set up cross-disciplinary teams to analyze failures and personally asked thoughtful questions of employees at all levels. Early on, she invited people to reflect on their recent experiences in caring for patients: Was everything as safe as they would have wanted it to be? This helped them recognize that the hospital had room for improvement. Suddenly, people were lining up to help. Set Boundaries and Hold People Accountable Paradoxically, people feel psychologically safer when leaders are clear about what acts are blameworthy. And there must be consequences. But if someone is punished or fired, tell those directly and indirectly affected what happened and why it warranted blame. When she instituted blameless reporting, Morath explained to employees that although reporting would not be punished, specific behaviors (such as reckless conduct, conscious violation of standards, failing to ask for help when over one’s head) would. If someone makes the same mistake three times and is then laid off, coworkers usually express relief, along with sadness and concern—they understand that patients were at risk and that extra vigilance was required from others to counterbalance the person’s shortcomings. Leaders should also send the right message about the nature of the work, such as reminding people in R&D, “We’re in the discovery business, and the faster we fail, the faster we’ll succeed.” I have found that managers often don’t understand or appreciate this subtle but crucial point. They also may approach failure in a way that is inappropriate for the context. For example, statistical process control, which uses data analysis to assess unwarranted variances, is not good for catching and correcting random invisible glitches such as software bugs. Nor does it help in the development of creative new products. 
Conversely, though great scientists intuitively adhere to IDEO’s slogan, “Fail often in order to succeed sooner,” it would hardly promote success in a manufacturing plant. The slogan “Fail often in order to succeed sooner” would hardly promote success in a manufacturing plant. Often one context or one kind of work dominates the culture of an enterprise and shapes how it treats failure. For instance, automotive companies, with their predictable, high-volume operations, understandably tend to view failure as something that can and should be prevented. But most organizations engage in all three kinds of work discussed above—routine, complex, and frontier. Leaders must ensure that the right approach to learning from failure is applied in each. All organizations learn from failure through three essential activities: detection, analysis, and experimentation. Detecting Failure Spotting big, painful, expensive failures is easy. But in many organizations any failure that can be hidden is hidden as long as it’s unlikely to cause immediate or obvious harm. The goal should be to surface it early, before it has mushroomed into disaster. Shortly after arriving from Boeing to take the reins at Ford, in September 2006, Alan Mulally instituted a new system for detecting failures. He asked managers to color code their reports green for good, yellow for caution, or red for problems—a common management technique. According to a 2009 story in Fortune, at his first few meetings all the managers coded their operations green, to Mulally’s frustration. Reminding them that the company had lost several billion dollars the previous year, he asked straight out, “Isn’t anything not going well?” After one tentative yellow report was made about a serious product defect that would probably delay a launch, Mulally responded to the deathly silence that ensued with applause. After that, the weekly staff meetings were full of color. That story illustrates a pervasive and fundamental problem: Although many methods of surfacing current and pending failures exist, they are grossly underutilized. Total Quality Management and soliciting feedback from customers are well-known techniques for bringing to light failures in routine operations. High-reliability-organization (HRO) practices help prevent catastrophic failures in complex systems like nuclear power plants through early detection. Electricité de France, which operates 58 nuclear power plants, has been an exemplar in this area: It goes beyond regulatory requirements and religiously tracks each plant for anything even slightly out of the ordinary, immediately investigates whatever turns up, and informs all its other plants of any anomalies. Such methods are not more widely employed because all too many messengers—even the most senior executives—remain reluctant to convey bad news to bosses and colleagues. One senior executive I know in a large consumer products company had grave reservations about a takeover that was already in the works when he joined the management team. But, overly conscious of his newcomer status, he was silent during discussions in which all the other executives seemed enthusiastic about the plan. Many months later, when the takeover had clearly failed, the team gathered to review what had happened. Aided by a consultant, each executive considered what he or she might have done to contribute to the failure. 
The newcomer, openly apologetic about his past silence, explained that others’ enthusiasm had made him unwilling to be “the skunk at the picnic.” In researching errors and other failures in hospitals, I discovered substantial differences across patient-care units in nurses’ willingness to speak up about them. It turned out that the behavior of midlevel managers—how they responded to failures and whether they encouraged open discussion of them, welcomed questions, and displayed humility and curiosity—was the cause. I have seen the same pattern in a wide range of organizations. A horrific case in point, which I studied for more than two years, is the 2003 explosion of the Columbia space shuttle, which killed seven astronauts (see “Facing Ambiguous Threats,” by Michael A. Roberto, Richard M.J. Bohmer, and Amy C. Edmondson, HBR November 2006). NASA managers spent some two weeks downplaying the seriousness of a piece of foam’s having broken off the left side of the shuttle at launch. They rejected engineers’ requests to resolve the ambiguity (which could have been done by having a satellite photograph the shuttle or asking the astronauts to conduct a space walk to inspect the area in question), and the major failure went largely undetected until its fatal consequences 16 days later. Ironically, a shared but unsubstantiated belief among program managers that there was little they could do contributed to their inability to detect the failure. Postevent analyses suggested that they might indeed have taken fruitful action. But clearly leaders hadn’t established the necessary culture, systems, and procedures. One challenge is teaching people in an organization when to declare defeat in an experimental course of action. The human tendency to hope for the best and try to avoid failure at all costs gets in the way, and organizational hierarchies exacerbate it. As a result, failing R&D projects are often kept going much longer than is scientifically rational or economically prudent. We throw good money after bad, praying that we’ll pull a rabbit out of a hat. Intuition may tell engineers or scientists that a project has fatal flaws, but the formal decision to call it a failure may be delayed for months. Again, the remedy—which does not necessarily involve much time and expense—is to reduce the stigma of failure. Eli Lilly has done this since the early 1990s by holding “failure parties” to honor intelligent, high-quality scientific experiments that fail to achieve the desired results. The parties don’t cost much, and redeploying valuable resources—particularly scientists—to new projects earlier rather than later can save hundreds of thousands of dollars, not to mention kickstart potential new discoveries. Analyzing Failure Once a failure has been detected, it’s essential to go beyond the obvious and superficial reasons for it to understand the root causes. This requires the discipline—better yet, the enthusiasm—to use sophisticated analysis to ensure that the right lessons are learned and the right remedies are employed. The job of leaders is to see that their organizations don’t just move on after a failure but stop to dig in and discover the wisdom contained in it. Why is failure analysis often shortchanged? Because examining our failures in depth is emotionally unpleasant and can chip away at our self-esteem. Left to our own devices, most of us will speed through or avoid failure analysis altogether. 
Another reason is that analyzing organizational failures requires inquiry and openness, patience, and a tolerance for causal ambiguity. Yet managers typically admire and are rewarded for decisiveness, efficiency, and action—not thoughtful reflection. That is why the right culture is so important. The challenge is more than emotional; it’s cognitive, too. Even without meaning to, we all favor evidence that supports our existing beliefs rather than alternative explanations. We also tend to downplay our responsibility and place undue blame on external or situational factors when we fail, only to do the reverse when assessing the failures of others—a psychological trap known as fundamental attribution error. My research has shown that failure analysis is often limited and ineffective—even in complex organizations like hospitals, where human lives are at stake. Few hospitals systematically analyze medical errors or process flaws in order to capture failure’s lessons. Recent research in North Carolina hospitals, published in November 2010 in the New England Journal of Medicine, found that despite a dozen years of heightened awareness that medical errors result in thousands of deaths each year, hospitals have not become safer. Fortunately, there are shining exceptions to this pattern, which continue to provide hope that organizational learning is possible. At Intermountain Healthcare, a system of 23 hospitals that serves Utah and southeastern Idaho, physicians’ deviations from medical protocols are routinely analyzed for opportunities to improve the protocols. Allowing deviations and sharing the data on whether they actually produce a better outcome encourages physicians to buy into this program. (See “Fixing Health Care on the Front Lines,” by Richard M.J. Bohmer, HBR April 2010.) Motivating people to go beyond first-order reasons (procedures weren’t followed) to understanding the second- and third-order reasons can be a major challenge. One way to do this is to use interdisciplinary teams with diverse skills and perspectives. Complex failures in particular are the result of multiple events that occurred in different departments or disciplines or at different levels of the organization. Understanding what happened and how to prevent it from happening again requires detailed, team-based discussion and analysis. A team of leading physicists, engineers, aviation experts, naval leaders, and even astronauts devoted months to an analysis of the Columbia disaster. They conclusively established not only the first-order cause—a piece of foam had hit the shuttle’s leading edge during launch—but also second-order causes: A rigid hierarchy and schedule-obsessed culture at NASA made it especially difficult for engineers to speak up about anything but the most rock-solid concerns. Promoting Experimentation The third critical activity for effective learning is strategically producing failures—in the right places, at the right times—through systematic experimentation. Researchers in basic science know that although the experiments they conduct will occasionally result in a spectacular success, a large percentage of them (70% or higher in some fields) will fail. How do these people get out of bed in the morning? First, they know that failure is not optional in their work; it’s part of being at the leading edge of scientific discovery. Second, far more than most of us, they understand that every failure conveys valuable information, and they’re eager to get it before the competition does. 
In contrast, managers in charge of piloting a new product or service—a classic example of experimentation in business—typically do whatever they can to make sure that the pilot is perfect right out of the starting gate. Ironically, this hunger to succeed can later inhibit the success of the official launch. Too often, managers in charge of pilots design optimal conditions rather than representative ones. Thus the pilot doesn’t produce knowledge about what won’t work. Too often, pilots are conducted under optimal conditions rather than representative ones. Thus they can’t show what won’t work. In the very early days of DSL, a major telecommunications company I’ll call Telco did a full-scale launch of that high-speed technology to consumer households in a major urban market. It was an unmitigated customer-service disaster. The company missed 75% of its commitments and found itself confronted with a staggering 12,000 late orders. Customers were frustrated and upset, and service reps couldn’t even begin to answer all their calls. Employee morale suffered. How could this happen to a leading company with high satisfaction ratings and a brand that had long stood for excellence? A small and extremely successful suburban pilot had lulled Telco executives into a misguided confidence. The problem was that the pilot did not resemble real service conditions: It was staffed with unusually personable, expert service reps and took place in a community of educated, tech-savvy customers. But DSL was a brand-new technology and, unlike traditional telephony, had to interface with customers’ highly variable home computers and technical skills. This added complexity and unpredictability to the service-delivery challenge in ways that Telco had not fully appreciated before the launch. A more useful pilot at Telco would have tested the technology with limited support, unsophisticated customers, and old computers. It would have been designed to discover everything that could go wrong—instead of proving that under the best of conditions everything would go right. (See the sidebar “Designing Successful Failures.”) Of course, the managers in charge would have to have understood that they were going to be rewarded not for success but, rather, for producing intelligent failures as quickly as possible. Designing Successful Failures Perhaps unsurprisingly, pilot projects are usually designed to succeed rather than to produce intelligent failures—those that generate valuable information. To know if you’ve designed a genuinely useful pilot, consider whether your managers can answer yes to the following questions: Is the pilot being tested under typical circumstances (rather than optimal conditions)? Do the employees, customers, and resources represent the firm’s real operating environment? Is the goal of the pilot to learn as much as possible (rather than to demonstrate the value of the proposed offering)? Is the goal of learning well understood by all employees and managers? Is it clear that compensation and performance reviews are not based on a successful outcome for the pilot? Were explicit changes made as a result of the pilot test? In short, exceptional organizations are those that go beyond detecting and analyzing failures and try to generate intelligent ones for the express purpose of learning and innovating. It’s not that managers in these organizations enjoy failure. But they recognize it as a necessary by-product of experimentation. 
They also realize that they don’t have to do dramatic experiments with large budgets. Often a small pilot, a dry run of a new technique, or a simulation will suffice. The courage to confront our own and others’ imperfections is crucial to solving the apparent contradiction of wanting neither to discourage the reporting of problems nor to create an environment in which anything goes. This means that managers must ask employees to be brave and speak up—and must not respond by expressing anger or strong disapproval of what may at first appear to be incompetence. More often than we realize, complex systems are at work behind organizational failures, and their lessons and improvement opportunities are lost when conversation is stifled. Savvy managers understand the risks of unbridled toughness. They know that their ability to find out about and help resolve problems depends on their ability to learn about them. But most managers I’ve encountered in my research, teaching, and consulting work are far more sensitive to a different risk—that an understanding response to failures will simply create a lax work environment in which mistakes multiply. This common worry should be replaced by a new paradigm—one that recognizes the inevitability of failure in today’s complex work organizations. Those that catch, correct, and learn from failure before others do will succeed. Those that wallow in the blame game will not. A version of this article appeared in the April 2011 issue of Harvard Business Review. Amy C. Edmondson is the Novartis Professor of Leadership and Management at Harvard Business School. She is the author of The Fearless Organization: Creating Psychological Safety in the Workplace for Learning, Innovation, and Growth (Wiley, 2019).
hecticpace-com-3498 ---- Hectic Pace - A view on libraries, the library business, and the business of libraries. My Pre-Covid Things. Author's note: these parodies are always about libraries and always based on Christmas songs, stories, or poems.
2020 being what it is, this year is an exception to both…that's right, I'm siding with my family and admitting that My Favorite Things is not a Christmas song. (sung to the tune of "My Favorite Things") [Click the YouTube link to listen while you sing along.] Eating in restaurants and movies on big screens People who don't doubt the virtue of vaccines. Inspiring leaders who don't act like kings. These were a few of my pre-Covid things. Live music venues and in-person classes. No masks or ... Sitting in the Reading Room All Day (sung to the tune of "Walking in a Winter Wonderland") [Click the YouTube link to listen while you sing along.] People shhhhhh, are you listening? In the stacks, laptops glistening The reading light's bright The library's right For sitting in the reading room all day. Gone away are the book stacks Here to stay, the only town's fax. We share all our books Without judgy looks. Sitting in the reading room all day. In the lobby we could build a book tree. Readers Guide is green and they stack well. I'll say 'Do we have 'em?' You'll say, 'Yeah man.' ... It's the Best Library Time of the Year (sung to the tune of "It's the Most Wonderful Time of the Year") Press play to sing along with the instrumental track! It's the best library time of the year With no more children yelling And no one is telling you "get it in gear!" It's the best library time of the year It's the qui-quietest season at school Only smile-filled greetings and no more dull meetings Where bosses are cruel It's the qui-quietest season at school There'll be books for re-stocking Vendor end-of-year-hawking And overdue fine cash for beer Send the word out to pre-schools Drag queen visit ... Maybe It's Books We Need [I figured this was a song in desperate need of some new lyrics. Sung to the tune of Baby It's Cold Outside. You're gonna want to grab a singing partner and use the instrumental track for this one!] (Listen to the track while you sing!) I really must binge (But maybe it's books we need) You mustn't infringe (It's definitely books we need) This season has been (Reading will make you grin) So fun to watch (I'll hold the remote, you hold my scotch) My Netflix queue scrolls forever (Mystery, poems, whichever) And Stranger Things won't just watch itself (Grab ... Being a Better Ally: First, Believe Warning: I might make you uncomfortable. I'm uncomfortable. But it comes from an earnest place. I was recently lucky enough to participate with my OCLC Membership & Research Division colleagues in DeEtta Jones & Associates' Cultural Competency Training. This day-long session has a firm spot in the top 5 of my professional development experiences. (Not coincidentally, one of the others in that top 5 was DeEtta's management training I took part in when she was with the Association of Research Libraries). A week later, I'm still processing this incredible experience. And I'm very grateful to OCLC for sponsoring the workshop! ... Fake News Forever! Librarians were among the first to join the call to arms and combat the onslaught of fake news that has permeated our political discussions for the last several months. Frankly, it seems hard for anyone to be on the other side of this issue. But is it? Not long after the effort to stop fake news in its tracks, a group of librarians began to consider the long-term implications of eradicating an entire body of content from history. Thus began a concerted effort to preserve all the fake news that a vigilant group of librarians could gather up. Building on ...
How will you be remembered? My grandfather had a sizable library when he passed away, and his son (my father) would wind up with roughly half of it. I remember shelves and shelves of books of quotations. He was a criminal lawyer with a love of quotes. I either inherited this love or caught it through the osmosis of being surrounded by these books throughout my childhood. Most of the books were ruined over the years by mold and silverfish and a dose of neglect. But I managed to save a few handfuls of eclectic titles. Their smell still transports me to the basement of ... Seeking Certainty "Uncertain times" is a phrase you hear a lot these days. It was actually in the title of the ALA Town Hall that took place in Atlanta last month (ALA Town Hall: Library Advocacy and Core Values in Uncertain Times). Political turmoil, uncertainty, divisiveness, and vitriol have so many of us feeling a bit unhinged. When I feel rudderless, adrift, even completely lost at sea, I tend to seek a safer port. I've exercised this method personally, geographically, and professionally and it has always served me well. For example, the stability and solid foundation provided by my family gives me solace ... No Not Google Search Box, Just You (to the tune of "All I want for Christmas is You") (if you need a karaoke track, try this one) I don't need a lot for freedom, Peace, or love, democracy, and I Don't care about the Congress or their failed bureaucracy I just want a li-brar-y Filled with places just for me A librarian or two No not Google search box, just you I don't want a lot of features Search results are too grotesque I don't care about the systems Back behind your reference desk I don't need to download e-books On the de-vice of my choice Noisy ... We are ALA I've been thinking a lot about governance lately. That said, I will avoid the topic of the recent U.S. election as much as possible, even though it is a factor in what makes me think about governance. Instead, I will focus on library governance and what makes it work and not work. Spoiler alert: active participation. I am an admitted governance junky, an unapologetic lover of Robert's Rules of Order, and someone who tries to find beauty in bureaucratic process. I blame my heritage. I come from a long line of federal government employees, all of us born in the ...
hecticpace-com-5840 ---- Hectic Pace – A view on libraries, the library business, and the business of libraries. My Pre-Covid Things - Posted On Dec 21 2020 by Andrew K. Pace. Author's note: these parodies are always about libraries and always based on Christmas songs, stories, or poems. 2020 being what it is, this year is an exception to both…that's right, I'm siding with my family and admitting that My Favorite Things is not a Christmas song. (sung to the tune of "My Favorite Things") [Click the YouTube link to listen while you sing along.] Eating in restaurants and movies on big screens People who don't doubt the virtue of vaccines. Inspiring leaders who don't act like kings. These were a few of my pre-Covid things. Live music venues and in-person classes. No masks or … Category: Christmas Parody. Sitting in the Reading Room All Day - Posted On Dec 17 2019 by Andrew K. Pace (sung to the tune of "Walking in a Winter Wonderland") [Click the YouTube link to listen while you sing along.] People shhhhhh, are you listening? In the stacks, laptops glistening The reading light's bright The library's right For sitting in the reading room all day.
Gone away are the book stacks Here to stay, the only town's fax. We share all our books Without judgy looks. Sitting in the reading room all day. In the lobby we could build a book tree. Readers Guide is green and they stack well. I'll say 'Do we have 'em?' You'll say, 'Yeah man.' … Category: Christmas Parody. It's the Best Library Time of the Year - Posted On Dec 20 2018 by Andrew K. Pace (sung to the tune of "It's the Most Wonderful Time of the Year") Press play to sing along with the instrumental track! It's the best library time of the year With no more children yelling And no one is telling you "get it in gear!" It's the best library time of the year It's the qui-quietest season at school Only smile-filled greetings and no more dull meetings Where bosses are cruel It's the qui-quietest season at school There'll be books for re-stocking Vendor end-of-year-hawking And overdue fine cash for beer Send the word out to pre-schools Drag queen visit … Category: Christmas Parody.
Library Dude Thoughts from Carl Grant Search WorldCat Enter title, subject, person, or keyword Hectic Pace RSS feeds Entries RSS Comments RSS © 2006–2021. All Rights Reserved, Hectic Pace help-twitter-com-1683 ---- How to Tweet – what is a Tweet, keyboard shortcuts, and sources Open menu Help Center Help topics Using Twitter Managing your account Safety and security Rules and policies Guides New user FAQ Glossary A safer Twitter Our rules My privacy Getting Started Guide Contact us Provide Feedback Search Go to Twitter Sign out Sign in Search this site Search goglobalwithtwitterbanner Tweets Search Using Twitter Tweets Adding content to your Tweet Search and trends Following and unfollowing Blocking and muting Direct Messages Twitter on your device Website and app integrations Using Periscope Twitter Voices Fleets Managing your account Login and password Username, email, and phone Account settings Notifications Verified accounts Suspended accounts Deactivate and reactivate accounts Safety and security Security and hacked accounts Privacy Spam and fake accounts Sensitive content Abuse Rules and policies Twitter Rules and policies General guidelines and policies Law enforcement guidelines Research and experiments Help Center Tweets How to Tweet How to Tweet A Tweet may contain photos, GIFs, videos, links, and text. Looking for information on how to Tweet at someone? Check out our article about how to post replies and mentions on Twitter. View instructions for: How to Tweet Tap the Tweet compose icon  Compose your message (up to 280 characters) and tap Tweet. How to Tweet Tap on the Tweet compose icon  Enter your message (up to 280 characters), and then tap Tweet. A notification will appear in the status bar on your device and will go away once the Tweet successfully sends. How to Tweet Type your Tweet (up to 280 characters) into the compose box at the top of your Home timeline, or click the Tweet button in the navigation bar. You can include up to 4 photos, a GIF, or a video in your Tweet. Click the Tweet button to post the Tweet to your profile. To save a draft of your Tweet, click the X icon in the top left corner of the compose box, then click Save. To schedule your Tweet to be sent at a later date/time, click on the calendar icon at the bottom of the compose box and make your schedule selections, then click Confirm. To access your drafts and scheduled Tweets, click on Unsent Tweets from the Tweet compose box.   Tweet source labels Tweet source labels help you better understand how a Tweet was posted. This additional information provides context about the Tweet and its author. If you don’t recognize the source, you may want to learn more to determine how much you trust the content.   Click on a Tweet to go to the Tweet details page. At the bottom of the Tweet, you’ll see the label for the source of the account’s Tweet. For example, Twitter for iPhone, Twitter for Android, or Twitter for Web. Tweets containing the Twitter for Advertisers label indicate they are created through the Twitter Ads Composer and not whether they are paid content or not. Paid content contains a Promoted badge across all ad formats. In some cases you may see a third-party client name, which indicates the Tweet came from a non-Twitter application. Authors sometimes use third-party client applications to manage their Tweets, manage marketing campaigns, measure advertising performance, provide customer support, and to target certain groups of people to advertise to. 
Third-party clients are software tools used by authors and therefore are not affiliated with, nor do they reflect the views of, the Tweet content. Tweets and campaigns can be directly created by humans or, in some circumstances, automated by an application. Visit our partners page for a list of common third-party sources. Deleting Tweets Read about how to delete a Tweet. Note that you can only delete your own Tweets. You cannot delete Tweets which were posted by other accounts. Instead, you can unfollow, block or mute accounts whose Tweets you do not want to receive. Read about how to delete or undo a Retweet. Keyboard shortcuts  The following are a list of keyboard shortcuts to use on twitter.com. Actions n  =  new Tweet l  =  like r  =  reply t  =  Retweet m  =  Direct Message u  =  mute account b  =  block account enter  =  open Tweet details o   =  expand photo /  =  search cmd-enter | ctrl-enter  =  send Tweet Navigation ?  =  full keyboard menu j  =  next Tweet k  =  previous Tweet space  =  page down .  =  load new Tweets Timelines g and h  =  Home timeline g and o  =  Moments g and n  =  Notifications tab g and r  =  Mentions g and p  =  profile  g and l  =  likes tab g and i  =  lists tab g and m  =  Direct Messages g and s  =  Settings and privacy g and u  =  go to someone’s profile Bookmark or share this article Scroll to top Twitter platform Twitter.com Status Card validator Privacy Center Transparency Center Twitter, Inc. About the company Twitter for Good Company news Brand toolkit Jobs and internships Investors Help Help Center Using Twitter Twitter Media Ads Help Center Managing your account Safety and security Rules and policies Contact us Developer resources Developer home Documentation Forums Communities Developer blog Engineering blog Developer terms Business resources Advertise Twitter for business Resources and guides Twitter for marketers Marketing insights Brand inspiration Twitter Data Twitter Flight School © 2021 Twitter, Inc. Cookies Privacy Terms and conditions English Help Center English Español 日本語 한국어 Português Deutsch Türkçe Français Italiano العربيّة Nederlands Bahasa Indonesia Русский हिंदी সহায়তা কেন্দ্র मदत केंद्र સહાયતા કેન્દ્ર உதவி மையம் ಸಹಾಯ ಕೇಂದ್ರ By using Twitter’s services you agree to our Cookies Use. We use cookies for purposes including analytics, personalisation, and ads. OK homosaurus-org-167 ---- Homosaurus Vocabulary Site homosaurus.org Toggle navigation Home Vocabulary Search Releases About Contact Welcome to the Homosaurus! The Homosaurus is an international linked data vocabulary of Lesbian, Gay, Bisexual, Transgender, and Queer (LGBTQ) terms. This vocabulary is intended to function as a companion to broad subject term vocabularies, such as the Library of Congress Subject Headings. Libraries, archives, museums, and other institutions are encouraged to use the Homosaurus to support LGBTQ research by enhancing the discoverability of their LGBTQ resources. If you are using the Homosaurus, we want to hear from you! Please contact us to let us know how you are using this vocabulary and share any feedback you might have. Homosaurus.org is a linked data service maintained by the Digital Transgender Archive Loading... homosaurus-org-6206 ---- Homosaurus Vocabulary Site homosaurus.org Toggle navigation Home Vocabulary Search Releases About Contact Welcome to the Homosaurus! The Homosaurus is an international linked data vocabulary of Lesbian, Gay, Bisexual, Transgender, and Queer (LGBTQ) terms. 
This vocabulary is intended to function as a companion to broad subject term vocabularies, such as the Library of Congress Subject Headings. Libraries, archives, museums, and other institutions are encouraged to use the Homosaurus to support LGBTQ research by enhancing the discoverability of their LGBTQ resources. If you are using the Homosaurus, we want to hear from you! Please contact us to let us know how you are using this vocabulary and share any feedback you might have. Homosaurus.org is a linked data service maintained by the Digital Transgender Archive Loading... hopeforgirlsandwomen-com-6443 ---- Hope for girls & women Skip to content Facebook Instagram Twitter LinkedIn Search for: Hope for girls & women Menu News News from Hope Upcoming events About Us About Rhobi About Hope About FGM in Tanzania Background Updates from Rhobi Our Supporters Awards & Articles Contact Challenges Team members Marketing material COVID-19 What We Do Safe Houses Sponsor a girl Sponsored girls Community Road Shows Alternative Rites of Passage Film Screenings: In the Name of your Daughter Digital Champions Mapping Re-educating cutters Donate We provide a safe environment for girls escaping Female Genital Mutilation (FGM) Girls often arrive at Hope’s safe houses late at night with just the clothes they have run away in. Those arriving on foot have to navigate from remote, rural areas in the dark. We also work with local police teams to rescue girls when we are alerted that FGM is going to take place. We provide girls with safety, education and hope. Donate to hope Sponsor a girl According to the United Nations, in the Mara region of Tanzania, 32% of women aged between 15 and 49 report having undergone FGM. Hope for Girls and Women was founded by the Tanzanian activist Rhobi Samwelly in 2017. Rhobi’s personal experience of being forced to undergo female genital mutilation (FGM) as a child inspired her lifelong commitment to fight for the rights of girls and women. Our organisation runs two safe houses in the Butiama and Serengeti Districts of the Mara Region of Tanzania, which shelter and support those fleeing FGM, child marriage, and other forms of gender based violence. Read more here. Find out more about our important work to provide Alternative Rites of Passage ceremonies here. We’re continually working on raising awareness locally and globally, whilst also raising funds for our safe houses. Watch our new film here Subscribe here to follow our updates: Email Address: Sign Up Share this: Twitter Facebook Recent Posts 14/02/2021 beccadash Human Rights Detecting pests in Maize and Cassava with the PlantNuru app 20/12/202020/12/2020 beccadash Event reports Rhobi Participates in Women’s Health Talk 13/12/2020 beccadash Event reports Debating Gender-Based Violence with male villagers in Northern Tanzania 13/12/202020/12/2020 hopeforgirlsandwomen Event reports Fighting FGM with Maps 02/12/202006/12/2020 beccadash Human Rights How mapping is helping Tanzanian villages source water More Posts→ Create a website or blog at WordPress.com Email (Required) Name (Required) Website   Loading Comments... 
Comment × i0-wp-com-7448 ---- None i0-wp-com-7651 ---- None i0-wp-com-8298 ---- None i0-wp-com-8704 ---- None i1-wp-com-5169 ---- None i1-wp-com-5773 ---- None i1-wp-com-8412 ---- None i1-wp-com-8783 ---- None i2-wp-com-1556 ---- None i2-wp-com-240 ---- None i2-wp-com-2606 ---- None i2-wp-com-3644 ---- None i2-wp-com-4290 ---- None i2-wp-com-5138 ---- None i2-wp-com-5720 ---- None i2-wp-com-7223 ---- None i2-wp-com-8043 ---- None i2-wp-com-9284 ---- None idatosabiertos-org-8870 ---- Home - ILDA About About ILDA Transparency Report 2020 Strategic areas Community Gender and inclusion Developing technologies Transparency and governance Projects Femicide data standard Artificial Intelligence Regional Open Data Barometer Global Data Barometer Resources Papers Reports Tools Blog Contact  Español  English  Português do Brasil We work towards an open, equal and data-driven region Featured projects Proyectos Status: active ILDA: The Next Generation Proyectos Status: active Empatía Proyectos Status: active Femicide Data Standardization Proyectos Status: active Global Data Barometer Proyectos Status: active Regional Open Data Barometer Proyectos Status: active Data+Art News Posts 21/08/2020 Open data standards design behind closed doors? Recursos 06/10/2020 Data for development – a road ahead Recursos 15/09/2020 Flow to identify femicides Dirección Legal Rincon 477/803 Montevideo - Uruguay Impact Hub Av.12, entre calle 35 y 37, San Pedro San José - Costa Rica Home Researches Projects Blog Contacto Suscribite a nuestro newsletter: Leave this field empty if you're human: Contactanos Seguinos Seguinos en: Apoyan: inkdroid-org-1054 ---- inkdroid inkdroid Paper or Plastic 856 Coincidence? twarc2 This post was originally published on Medium but I spent time writing it so I wanted to have it here too. TL;DR twarc has been redesigned from the ground up to work with the new Twitter v2 API and their Academic Research track. Many thanks for the code and design contributions of Betsy Alpert, Igor Brigadir, Sam Hames, Jeff Sauer, and Daniel Verdeer that have made twarc2 possible, as well as early feedback from Dan Kerchner, Shane Lin, Miles McCain, 李荣蓬, David Thiel, Melanie Walsh and Laura Wrubel. Extra special thanks to the Institute for Future Environments at Queensland University of Technology for supporting Betsy and Sam in their work, and for the continued support of the Mellon Foundation. Back in August of last year Twitter announced early access to their new v2 API, and their plans to sunset the v1.1 API that has been active for almost the last 10 years. Over the lifetime of their v1.1 API Twitter has become deeply embedded in the media landscape. As magazines, newspapers and television have moved onto the web they have increasingly adopted tweets as a mechanism for citing politicians, celebrities and organizations, while also using them to document current events, generate leads and gather feedback for evolving stories. As a result Twitter has also become a popular object of study for humanities and social science researchers looking to understand the world as reflected, refracted and distorted by/in social media. On the surface the v2 API update seems pretty insignificant since the shape of a tweet, its parts, properties and affordances, aren’t changing at all. Tweets with 280 characters of text, images and video will continue to be posted, retweeted and quoted. 
However behind the scenes the representation of a tweet as data, and the quotas that control the rates at which this data can flow between apps and other third party services will be greatly transformed. Needless to say, v2 represents a big change for the Documenting the Now project. Along with community members we’ve developed and maintained open source tools like twarc that talk directly to the Twitter API to help users to search for and collect live tweets that match criteria like hashtags, names and geographic locations. Today we’re excited to announce the release of twarc v2 which has been designed from the ground up to work with the v2 API and Twitter’s new Academic Research track. Clearly it’s extremely problematic having a multi-national corporation act as a gatekeeper for who counts as an academic researcher, and what constitutes academic research. We need look no further than the recent experiences of Timnit Gebru and Margaret Mitchell at Google for an example of what happens when research questions run up against the business objectives of capital. We only know their stories because Gebru and Mitchell bravely took a principled approach, where many researchers would have knowingly or unknowingly shaped their research to better fit the needs of the company. So it is important for us that twarc still be usable by people with and without access to the Academic Research Track. But we have heard from many users that the Academic Research Track presents new opportunities for Twitter data collection that are essential for researchers interested in the observability of social media platforms. Twitter is making a good faith effort to work with the academic research community, and we thought twarc should support it, even if big challenges lie ahead. So why are people interested in the Academic Research Track? Once your application has been approved you are able to collect data from the full history of Tweets, at no cost. This is a massive improvement over the v1.1 access, which was limited to a one week window and required researchers to pay for access. Access to the full archive means it’s now possible to study events that have happened in the past back to the beginning of Twitter in 2006. If you do create any historical datasets we’d love for you to share the tweet identifier datasets in The Catalog. However this opening up of access on the one hand comes with a simultaneous contraction in terms of how much data can be collected at one time. The remainder of this post describes some of the details and the design decisions we have made with twarc2 to address them. If you would prefer to watch a quick introduction to using twarc v2 please check out this short video: Installation If you are familiar with installing twarc nothing is changed. You still install (or upgrade) with pip as you did before: $ pip install --upgrade twarc In fact you will still have full access to the v1.1 API just as you did before. So the old commands will continue to work as they did [1]: $ twarc search blacklivesmatter > tweets.jsonl twarc2 was designed to let you continue to use Twitter’s v1.1 API undisturbed until it is finally turned off by Twitter, at which point the functionality will be removed from twarc. All the support for the v2 API is mediated by a new command line utility twarc2.
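If this is your first time using twarc you will also need to give the tool your Twitter API credentials. I'm assuming here that the configure subcommand carries over from the v1.1 client, where it prompts for your keys and stores them in a config file for later use:

$ twarc2 configure

With the credentials in place, the search and stream examples that follow should work as written.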
For example to search for blacklivesmatter tweets and write them to a file tweets.jsonl: $ twarc2 search blacklivesmatter > tweets.jsonl All the usual twarc functionality such as searching for tweets, collecting live tweets from the streaming API endpoint, requesting user timelines and user metadata are all still there, twarc2 --help gives you the details. But while the interface looks the same there’s quite a bit different going on behind the scenes. Representation Truth be told, there is no shortage of open source libraries and tools for interacting with the Twitter API. In the past twarc has made a bit of a name for itself by catering to a niche group of users who want a reliable, programmable way to collect the canonical JSON representation of a tweet. JavaScript Object Notation (JSON) is the language of Web APIs, and Twitter has kept its JSON representation of a tweet relatively stable over the years. Rather than making lots of decisions about the many ways you might want to collect, model and analyze tweets twarc has tried to do one thing and do it well (data collection) and get out of the way so that you can use (or create) the tools for putting this data to use. But the JSON representation of a tweet in the Twitter v2 API is completely burst apart. The v2 base representation of a tweet is extremely lean and minimal, and just includes the text of the tweet its identifier and a handful of other things. All the details about the user who created the tweet, embedded media, and more are not included. Fortunately this information is still available, but the user needs to craft their API request to request tweets using a set of expansions that tell the Twitter API what additional entities to include. In addition for each expansion there are a set of field options to include that control what of these expansions is returned. So rather than there being a single JSON representation of a tweet API users now have the ability to shape the data based on what they need, much like how GraphQL APIs work. This kind of makes you wonder why Twitter didn’t make their GraphQL API available. For specific use cases this customizability is very useful, but the mutability of the representation of a tweet presents challenges when collecting data for future use. If you didn’t request the right expansions or fields when collecting the data then you won’t be able to analyze that data later when doing your research. To solve for this twarc2 has been designed to collect the richest possible representation for a tweet, by requesting all possible expansions and field combinations for tweets. See the expansions module for the details if you are interested. This takes a significant burden off of users to digest the API documentation, and craft the correct API requests themselves. In addition the twarc community will be monitoring the Twitter API documentation going forward to incorporate new expansions and fields as they will inevitably be added in the future. Flattening This is diving into the weeds a little bit, but it’s worth noting here that Twitter’s introduction of expansions allows data that was once duplicated across multiple tweets (such as user information, media, retweets, etc) to be included once per response from the API. This means that instead of seeing information about the user who created a tweet in the context of their tweet the user will be referenced using an identifier, and this identifier will map to user metadata in the outer envelope of the response. 
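To make the by-reference layout concrete, here is a minimal sketch, in plain Python, of how a response like that can be stitched back together so each tweet carries its user again, which is essentially what the flattening described below does. This is my own illustration, not twarc's actual code (see the expansions module mentioned above for that), and the "author" key is just my naming choice for where the referenced user object gets attached:

def flatten(response):
    # index the expanded user objects from the outer envelope by their id
    users = {u["id"]: u for u in response.get("includes", {}).get("users", [])}
    for tweet in response.get("data", []):
        # re-attach the referenced user so each tweet is self-contained again
        tweet["author"] = users.get(tweet.get("author_id"))
        yield tweet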
It makes sense why Twitter has introduced expansions since it means in a set of 100 tweets from a given user the user information will just be included once rather than repeated 100 times, which means less data, less network traffic and less money. It’s even more significant when you consider the large number of possible expansions. However this pass-by-reference rather than pass-by-value approach presents some challenges for stream-based processing which expects each tweet to be self-contained. For this reason we’ve introduced the idea of flattening the response data when persisting the JSON to disk. This means that tools and data pipelines that expect to operate on a stream of tweets can continue to do so. Since the representation of a tweet is so dependent on how data is requested we’ve taken the opportunity to introduce a small stanza of twarc-specific metadata using the __twarc prefix. This metadata records what API endpoint the data was requested from, and when. This information is critically important when interpreting the data, because some information about a tweet, like its retweet and quote counts, is constantly changing. Data Flows As mentioned above you can still collect tweets from the search and streaming API endpoints in a way that seems quite similar to the v1 API. The big changes however are the quotas associated with these endpoints which govern how much can be collected. These quotas control how many requests can be sent to Twitter in 15 minute intervals. In fact these quotas are not much changed, but what’s new are app wide quotas that constrain how many tweets a given application (app) can collect every month. An app in this context is a piece of software (e.g. your twarc software) identified by unique API keys set up in the Twitter Developer Portal. The standard API access sets a 500,000 tweet per month limit. This is a huge change considering there were no monthly app limits before. If you get approved for the Academic Research track your app quota is increased to 10 million per month. This is markedly better but the achievable data volume is still nothing like the v1.1 API, as these graphs attempt to illustrate: twarc2 will still observe the same rate limits, but once you’ve collected your portion for the month there’s not much that can be done, for that app at least. Apart from the quotas Twitter’s streaming endpoint in v2 is substantially changed, which impacts how users interact with twarc. Previously twarc users would be able to create up to two connections to the filter stream API. This could be done by simply: twarc filter obama > obama.jsonl However in the Twitter v2 API only apps can connect to the filter stream, and they can only connect once. At first this seems like a major limitation but rather than creating a connection per query the v2 API allows you to build a set of rules for tweets to match, which in turn controls what tweets are included in the stream. This means you can collect for multiple types of queries at the same time, and the tweets will come back with a piece of metadata indicating what rule caused their inclusion. This translates into a markedly different set of interactions at the command line for collecting from the stream, where you first need to set your stream rules and then open a connection to fetch it.
twarc2 stream-rules add blacklivesmatter twarc2 stream > tweets.jsonl One useful side effect of this is that you can update the stream (add and remove rules) while the stream is in motion: twarc2 stream-rules add blm While you are limited by the API quota in terms of how many tweets you can collect, tweets are not “dropped on the floor” when the volume gets too high. Once upon a time the v1.1 filter stream was rumored to be rate limited when your stream exceeded 1% of the total volume of new tweets. Plugins In addition to twarc helping you collect tweets the GitHub repository has also been a place to collect a set of utilities for working with the data. For example there are scripts for extracting and unshortening urls, identifying suspended/deleted content, extracting videos, building wordclouds, putting tweets on maps, displaying network graph visualizations, counting hashtags, and more. These utilities all work like Unix filters where the input is a stream of tweets and the output varies depending on what the utility is doing, e.g. a Gephi file for a network visualization, or a folder of mp4 files for video extraction. While this has worked well in general the kitchen sink approach has been difficult to manage from a configuration management perspective. Users have to download these scripts manually from GitHub or by cloning the repository. For some users this is fine, but it’s a bit of a barrier to entry for users who have just installed twarc with pip. Furthermore these plugins often have their own dependencies which twarc itself does not. This lets twarc stay pretty lean, and things like youtube_dl, NetworkX or Pandas can be installed by people that want to use utilities that need them. But since there is no way to install the utilities there isn’t a way to ensure that the dependencies are installed, which can lead to users needing to diagnose missing libraries themselves. Finally the plugins have typically lacked their own tests. twarc’s test suite has really helped us track changes to the Twitter API and to make sure that it continues to operate properly as new functionality has been added. But nothing like this has existed for the utilities. We’ve noticed that over time some of them need updating. Also their command line arguments have drifted over time which can lead to some inconsistencies in how they are used. So with twarc2 we’ve introduced the idea of plugins which extend the functionality of the twarc2 command, are distributed on PyPI separately from twarc, and exist in their own GitHub repositories where they can be developed and tested independently of twarc itself. This is all achieved through twarc2’s use of the click library and specifically click-plugins. So now if you would like to convert your collected tweets to CSV you can install the twarc-csv plugin: $ pip install twarc-csv $ twarc2 search covid19 > covid19.jsonl $ twarc2 csv covid19.jsonl > covid19.csv Or if you want to extract embedded and referenced videos from tweets you can install twarc-videos which will write all the videos to a directory: $ pip install twarc-videos $ twarc2 videos covid19.jsonl --download-dir covid19-videos You can write these plugins yourself and release them as needed. Check out the plugin reference implementation tweet-ids for a simple example to adapt. We’re still in the process of porting some of the most useful utilities over and would love to see ideas for new plugins. Check out the current list of twarc2 plugins and use the twarc issue tracker on GitHub to join the discussion.
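If you want to try writing a plugin yourself, here is a minimal sketch modeled loosely on the tweet-ids reference implementation. The command is just a click command that reads line-oriented JSON tweets; the module name here is made up, and the entry point group name below is an assumption on my part, so check tweet-ids for the exact packaging details:

# hypothetical module twarc_tweet_text.py
import json
import click

@click.command("tweet-text")
@click.argument("infile", type=click.File("r"), default="-")
def tweet_text(infile):
    """Print the text of each tweet in a file of line-oriented JSON tweets."""
    for line in infile:
        tweet = json.loads(line)
        click.echo(tweet.get("text", ""))

Packaged up with an entry point along the lines of entry_points={"twarc.plugins": ["tweet-text = twarc_tweet_text:tweet_text"]} in setup.py (assuming that is the group name twarc2 registers with click-plugins), a pip install would make the command show up as twarc2 tweet-text.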
You may notice from the list of plugins that twarc now (finally) has documentation on ReadTheDocs external from the documentation that was previously only available on GitHub. We got by with GitHub’s rendering of Markdown documents for a while, but GitHub’s boilerplate designed for developers can prove to be quite confusing for users who aren’t used to selectively ignoring it. ReadTheDocs allows us to manage the command line and API documentation for twarc, and to showcase the work that has gone into the Spanish, Japanese, Portuguese, Swedish, Swahili and Chinese translations. Feedback Thanks for reading this far! We hope you will give twarc2 a try. Let us know what you think either in comments here, in the DocNow Slack or over on GitHub. ✨ ✨ Happy twarcing! ✨ ✨ ✨ Windows users will want to indicate the output file using a second argument rather than redirecting output with >. See this page for details.↩ $ j You may have noticed that I try to use this static website as a journal. But, you know, not everything I want to write down is really ready (or appropriate) to put here. Some of these things end up in actual physical notebooks–there’s no beating the tactile experience of writing on paper for some kind of thinking. But I also spend a lot of time on my laptop, and at the command line in some form or another. So I have a directory of time stamped Markdown files stored on Dropbox, for example: ... /home/ed/Dropbox/Journal/2019-08-25.md /home/ed/Dropbox/Journal/2020-01-27.md /home/ed/Dropbox/Journal/2020-05-24.md /home/ed/Dropbox/Journal/2020-05-25.md /home/ed/Dropbox/Journal/2020-05-31.md ... Sometimes these notes migrate into a blog post or some other writing I’m doing. I used this technique quite a bit when writing my dissertation when I wanted to jot down things on my phone when an idea arrived. I’ve tried a few different apps for editing Markdown on my phone, but mostly settled on iA Writer which mostly just gets out of the way. But when editing on my laptop I tend to use my favorite text editor Vim with the vim-pencil plugin for making Markdown fun and easy. If Vim isn’t your thing and you use another text editor keep reading since this will work for you too. The only trick to this method of journaling is that I just need to open the right file. With command completion on the command line this isn’t so much of a chore. But it does take a moment to remember the date, and craft the right path. Today while reflecting on how nice it is to still be using Unix, it occurred to me that I could create a little shell script to open my journal for that day (or a previous day). So I put this little file j in my PATH: #!/bin/zsh journal_dir="/home/ed/Dropbox/Journal" if [ "$1" ]; then date=$1 else date=`date +%Y-%m-%d` fi vim "$journal_dir/$date.md" So now when I’m in the middle of something else and want to jot a note in my journal I just type j. Unix, still crazy after all these years. Strengths and Weaknesses Quoting Macey (2019), quoting Foucault, quoting Nietzsche: One thing is needful. – To ‘give style’ to one’s character – a great and rare art! It is practised by those who survey all the strengths and weaknesses that their nature has to offer and then fit them into an artistic plan until each appears as art and reason and even weaknesses delight the eye. Nietzsche, Williams, Nauckhoff, & Del Caro (2001), p. 290 This is a generous and lively image of what art does when it is working. Art is not perfection. Macey, D. (2019). The lives of Michel Foucault: A biography. Verso. Nietzsche, F. 
W., Williams, B., Nauckhoff, J., & Del Caro, A. (2001). The gay science: with a prelude in German rhymes and an appendix of songs. Cambridge, U.K. ; New York: Cambridge University Press. Data Speculation I’ve taken the ill-advised approach of using the Coronavirus as a topic to frame the exercises in my computer programming class this semester. I say “ill-advised” because given the impact that COVID has been having on students I’ve been thinking they probably need a way to escape news of the virus by way of writing code, rather than diving into it more. It’s late in the semester to modulate things but I think we will shift gears to look at programming through another lens after spring break. That being said, one of the interesting things we’ve been doing is looking at vaccination data that is being released by the Maryland Department of Health through their ESRI ArcGIS Hub. Note: this dataset has since been removed from the web because it has been superseded by a new dataset that includes single dose vaccinations. I guess it’s good that students get a feel for how ephemeral data on the web is, even when it is published by the government. We noticed that this dataset recorded a small number of vaccinations as happening as early as the 1930s up until December 11, 2020 when vaccines were approved for use. I asked students to apply what we have been learning about Python (files, strings, loops, and sets) to identify the Maryland counties that were responsible for generating this anomalous data. I thought this exercise provided a good demonstration using real, live data that critical thinking about the provenance of data is always important because there is no such thing as raw data (Gitelman, 2013). While we were working with the data to count the number of anomalous vaccinations per county one of my sharp eyed students noticed that the results we were seeing with my version of the dataset (downloaded on February 28) were different from what we saw with his (downloaded on March 4). We expected to see new rows in the later one because new vaccination data seem to be reported daily–which is cool in itself. But we were surprised to find new vaccination records for dates earlier than December 11, 2020. Why would new vaccinations for these erroneous older dates still be entering the system? For example the second dataset downloaded March 4 acquired 6 new rows:

Object ID | Vaccination Date | County | Daily First Dose | Cumulative First Dose | Daily Second Dose | Cumulative Second Dose
4 | 1972/10/13 | Allegany | 1 | 1 | 0 | 0
5 | 1972/12/16 | Baltimore | 1 | 1 | 0 | 0
6 | 2012/02/03 | Baltimore | 1 | 2 | 0 | 0
28 | 2020/02/24 | Baltimore City | 1 | 2 | 0 | 0
34 | 2020/08/24 | Baltimore | 1 | 4 | 0 | 0
64 | 2020/12/10 | Prince George’s | 1 | 3 | 0 | 0

And these rows present in the February 28 version were deleted in the March 4 version:

Object ID | Vaccination Date | County | Daily First Dose | Cumulative First Dose | Daily Second Dose | Cumulative Second Dose
4 | 2019/12/26 | Frederick | 1 | 1 | 0 | 0
15 | 2020/01/25 | Talbot | 1 | 1 | 0 | 0
19 | 2020/01/28 | Baltimore | 1 | 1 | 0 | 0
20 | 2020/01/30 | Caroline | 1 | 1 | 0 | 0
28 | 2020/02/12 | Prince George’s | 1 | 1 | 0 | 0
30 | 2020/02/20 | Anne Arundel | 1 | 6 | 0 | 0
56 | 2020/10/16 | Frederick | 1 | 7 | 0 | 4
59 | 2020/11/01 | Wicomico | 1 | 1 | 0 | 0
60 | 2020/11/04 | Frederick | 1 | 8 | 0 | 4

I found these additions perplexing at first, because I assumed these outliers were part of an initial load. But it appears that the anomalies are still being generated? The deletions suggest that perhaps the anomalous data is being identified and scrubbed in a live system that is then dumping out the data?
Or maybe the code that is being used to update the dataset in ArcGIS Hub itself is malfunctioning in some way? If you are interested in toying around with the code and data it is up on GitHub. I was interested to learn about pandas.DataFrame.merge which is useful for diffing tables when you use indicator=True. At any rate, having students notice, measure and document anomalies like this seems pretty useful. I also asked them to speculate about what kinds of activities could generate these errors. I meant speculate in the speculative fiction sense of imagining a specific scenario that caused it. I think this made some students scratch their head a bit, because I wasn’t asking them for the cause, but to invent a possible cause. Based on the results so far I’d like to incorporate more of these speculative exercises concerned with the functioning of code and data representations into my teaching. I want to encourage students to think creatively about data processing as they learn about the nuts and bolts of how code operates. For example the treatments in How to Run a City Like Amazon, and Other Fables which use sci-fi to test ideas about how information technologies are deployed in society. Another model is the Speculative Ethics Book Club which also uses sci-fi to explore the ethical and social consequences of technology. I feel like I need to read up on specualtive research more generally before doing this though (Michael & Wilkie, 2020). I’d also like to focus the speculation down at the level of the code or data processing, rather than at the macro super-system level. But that has its place too. Another difference is that I was asking students to engage in speculation about the past rather than the future. How did the data end up this way? Perhaps this is more of a genealogical approach, of winding things backwards, and tracing what is known. Maybe it’s more Mystery than Sci-Fi. The speculative element is important because (in this case) operations at the MD Dept of Health, and their ArcGIS Hub setup are mostly opaque to us. But even when access isn’t a problem these systems they can feel opaque, because rather than there being a dearth of information you are drowning in it. Speculation is a useful abductive approach to hypothesis generation and, hopefully, understanding. Update 2021-03-17: Over in the fediverse David Benque recommended I take a look at Matthew Stanley’s chapter in (Gitelman, 2013) “Where Is That Moon, Anyway? The Problem of Interpreting Historical Solar Eclipse Observations” for the connection to Mystery. For the connection to Peirce and abduction he also pointed to Luciana Parisi’s chapter “Speculation: A method for the unattainable” in Lury & Wakeford (2012). Definitely things to follow up on! References Gitelman, L. (Ed.). (2013). “Raw data” is an oxymoron. MIT Press. Lury, C., & Wakeford, N. (2012). Inventive methods: The happening of the social. Routledge. Michael, M., & Wilkie, A. (2020). Speculative research. In The Palgrave encyclopedia of the possible (pp. 1–8). Cham: Springer International Publishing. Retrieved from https://doi.org/10.1007/978-3-319-98390-5_118-1 Recovering Foucault I’ve been enjoying reading David Macey’s biography of Michel Foucault, that was republished in 2019 by Verso. Macey himself is an interesting figure, both a scholar and an activist who took leave from academia to do translation work and to write this biography and others of Lacan and Fanon. 
One thing that struck me as I’m nearing the end of Macey’s book is the relationship between Foucault and archives. I think Foucault has become emblematic of a certain brand of literary analysis of “the archive” that is far removed from the research literature of archival studies, while using “the archive” as a metaphor (Caswell, 2016). I’ve spent much of my life working in libraries and digital preservation, and now studying and teaching about them from the perspective of practice, so I am very sympathetic to this critique. It is perhaps ironic that the disconnect between these two bodies of research is a difference in discourse which Foucault himself brought attention to. At any rate, the thing that has struck me while reading this biography is how much time Foucault himself spent working in libraries and archives. Here’s Foucault in his own words talking about his thesis: In Histoire de la folie à l’âge classique I wished to determine what could be known about mental illness in a given epoch … An object took shape for me: the knowledge invested in complex systems of institutions. And a method became imperative: rather than perusing … only the library of scientific books, it was necessary to consult a body of archives comprising decrees, rules hospital and prison registers, and acts of jurisprudence. It was in the Arsenal or the Archives Nationales that I undertook the analysis of a knowledge whose visible body is neither scientific nor theoretical discourse, nor literature, but a daily and regulated practice. (Macey, 2019, p. 94) Foucault didn’t simply use archives for his research: understanding the processes and practices of archives were integral to his method. Even though the theory and practice of libraries and archives are quite different given their different functions and materials, they are often lumped together as a convenience in the same buildings. Macey blurs them a little bit, in sections like this where he talks about how important libraries were to Foucault’s work: Foucault required access to Paris for a variety of reasons, not least because he was also teaching part-time at ENS. The putative thesis he had begun at the Fondation Thiers – and which he now described to Polin as being on the philosophy of psychology – meant that he had to work at the Bibliothèque Nationale and he had already become one of its habitues. For the next thirty years, Henri Labrouste’s great building in the rue de Richelieu, with its elegant pillars and arches of cast iron, would be his primary place of work. His favourite seat was in the hemicycle, the small, raised section directly opposite the entrance, sheltered from the main reading room, where a central aisle separates rows of long tables subdivided into individual reading desks. The hemicycle affords slighty more quiet and privacy. For thirty years, Foucault pursued his research here almost daily, with occasional forays to the manuscript department and to other libraries, and contended with the Byzantine cataloguing system: two incomplete and dated printed catalogues supplemented by cabinets containing countless index cards, many of them inscribed with copperplate handwriting. Libraries were to become Foucault’s natural habitat: ‘those greenish institutions where books accumulate and where there grows the dense vegetation of their knowledge’ There’s a metaphor for you: libraries as vegetation :) It kind of reminds me of some recent work looking at decentralized web technologies in terms of mushrooms. But I digress. 
I really just wanted to note here that the erasure of archival studies from humanities research about “the archive” shouldn’t really be attributed to Foucault, whose own practice centered the work of libraries and archives. Foucault wasn’t just writing about an abstract archive, he was practically living out of them. As someone who has worked in libraries and archives I can appreciate how power users (pun intended) often knew aspects of the holdings and intricacies of their their management better than I did. Archives, when they are working, are always collaborative endeavours, and the important thing is to recognize and attribute the various sides of that collaboration. PS. Writing this blog post led me to dig up a few things I want to read (Eliassen, 2010; Radford, Radford, & Lingel, 2015 ). References Caswell, M. (2016). The archive is not an archives: On acknowledging the intellectual contributions of archival studies. Reconstruction, 16(1). Retrieved from http://reconstruction.eserver.org/Issues/161/Caswell.shtml Eliassen, K. (2010). Archives of Michel Foucualt. In E. Røssaak (Ed.), The archive in motion, new conceptions of the archive in contemporary thought and new media practices. Novus Press. Macey, D. (2019). The lives of Michel Foucault: A biography. Verso. Radford, G. P., Radford, M. L., & Lingel, J. (2015). The library as heterotopia: Michel Foucault and the experience of library space. Journal of Documentation, 71(4), 773–751. Teaching OOP in the Time of COVID I’ve been teaching a section of the Introduction to Object Oriented Programming at the UMD College for Information Studies this semester. It’s difficult for me, and for the students, because we are remote due to the Coronavirus pandemic. The class is largely asynchronous, but every week I’ve been holding two synchronous live coding sessions in Zoom to discuss the material and the exercises. These have been fun because the students are sharp, and haven’t been shy about sharing their screen and their VSCode session to work on the details. But students need quite a bit of self-discipline to move through the material, and probably only about 1/4 of the students take advantage of these live sessions. I’m quite lucky because I’m working with a set of lectures, slides and exercises that have been developed over the past couple of years by other instructors: Josh Westgard, Aric Bills and Gabriel Cruz. You can see some of the public facing materials here. Having this backdrop of content combined with Severance’s excellent (and free) Python for Everybody has allowed me to focus more on my live sessions, on responsive grading, and to also spend some time crafting additional exercises that are geared to this particular moment. This class is in the College for Information Studies and not in the Computer Science Department, so it’s important for the students to not only learn how to use a programming language, but to understand programming as a social activity, with real political and material effects in the world. Being able to read, understand, critique and talk about code and its documentation is just as important as being able to write it. In practice, out in the “real world” of open source software I think these aspects are arguably more important. 
One way I’ve been trying to do this in the first few weeks of class is to craft a sequence of exercises that form a narrative around Coronavirus testing and data collection to help remind the students of the basics of programming: variables, expressions, conditionals, loops, functions, files. In the first exercise we imagined a very simple data entry program that needed to record results of Real-time polymerase chain reaction tests (RT-PCR). I gave them the program and described how it was supposed to work, and asked them describe (in English) any problems that they noticed and to submit a version of the program with problems fixed. I also asked them to reflect on a request from their boss about adding the collection of race, gender and income information. The goal here was to test their ability to read the program and write English about it while also demonstrating a facility for modifying the program. Most importantly I wanted them to think about how inputs such as race or gender have questions about categories and standards behind them, and weren’t simply a matter of syntax. The second exercise builds on the first by asking them to adjust the revised program to be able to save the data in a very particular format. Yes, in the first exercise the data is stored in memory and printed to the screen in aggregate at the end. The scenario here is that the Department of Health and Human Services has assumed the responsibility for COVID test data collection from the Centers for Disease Control. Of course this really happened, but the data format I chose was completely made up (maybe we will be working with some real data at the end of the semester if I continue with this theme). The goal in this exercise was to demonstrate their ability to read another program and fit a function into it. The students were given a working program that had a save_results() function stubbed out. In addition to submitting their revised code I asked them to reflect on some limitations of the data format chosen, and the data processing pipeline that it was a part of. And in the third exercise I asked them to imagine that this lab they were working in had a scientist who discovered a problem with some of the thresholds for acceptable testing, which required an update to the program from Exercise 2, and also a test suite to make sure the program was behaving properly. In addition to writing the tests I asked them to reflect on what functionality was not being tested that probably should be. This alternation between writing code and writing prose is something I started doing as part of a Digital Curation class. I don’t know if this dialogical or perhaps dialectical, approach is something others have tried. I should probably do some research to see. In my last class I alternated week by week: one week reading and writing code, the next week reading and writing prose. But this semester I’ve stayed focused on code, but required the reading and writing of code as well as prose about code in the same week. I hope to write more about how this goes, and these exercises as I go. I’m not sure if I will continue with the Coronavirus data examples. One thing I’m sensitive to is that my students themselves are experiencing the effects of the Coronavirus, and may want to escape it just for a bit in their school work. Just writing in the open about it here, in addition to the weekly meetings I’ve had with Aric, Josh and Gabriel has been very useful. Speaking of those meetings. 
I learned today from Aric that tomorrow (February 20th, 2021) is the 30th anniversary of Python’s first public release! You can see this reflected in this timeline. This v0.9.1 release was the first release Guido van Rossum made outside of CWI and was made on the Usenet newsgroup alt.sources where it is split out into chunks that need to be reassembled. Back in 2009 Andrew Dalke located and repackaged these sources in Google Groups, which acquired alt.sources as part of DejaNews in 2001. But if you look at the time stamp on the first part of the release you can see that it was made February 19, 1991 (not February 20). So I’m not sure if the birthday is actually today. I sent this little note out to my students with this wonderful two part oral history that the Computer History Museum did with Guido van Rossum a couple years ago. It turns out both of his parents were atheists and pacifists. His dad went to jail because he refused to be conscripted into the military. That and many more details of his background and thoughts about the evolution of Python can be found in these delightful interviews: Happy Birthday Python! GPT-3 Jam One of the joys of pandemic academic life has been a true feast of online events to attend, on a wide variety of topics, some of which are delightfully narrow and esoteric. Case in point was today’s Reflecting on Power and AI: The Case of GPT-3 which lived up to its title. I’ll try to keep an eye out for when the video posts, and update here. The workshop was largely organized around an exploration of whether GPT-3, the largest known machine learning language model, changes anything for media studies theory, or if it amounts to just more of the same. So the discussion wasn’t focused so much on what games could be played with GPT-3, but rather if GPT-3 changes the rules of the game for media theory, at all. I’m not sure there was a conclusive answer at the end, but it sounded like the consensus was that current theorization around media is adequate for understanding GPT-3, but it matters greatly what theory or theories are deployed. The online discussion after the presentations indicated that attendees didn’t see this as merely a theoretical issue, but one that has direct social and political impacts on our lives. James Steinhoff looked at GPT-3 using a Marxist media theory perspective where he told the story of GPT-3 as a project of OpenAI and as a project of capital. OpenAI started with much fanfare in 2015 as a non-profit initiative where the technology, algorithms and models developed would be kept openly licensed and freely available so that the world could understand the benefits and risks of AI technology. Steinhoff described how in 2019 the project’s needs for capital (compute power and staff) transitioned it from a non-profit into a capped-profit company, which is now owned, or at least controlled, by Microsoft. The code for generating the model as well as the model itself are gated behind a token-driven Web API run by Microsoft. You can get on a waiting list to use it, but apparently a lot of people have been waiting a while, so … Being a Microsoft employee probably helps. I grabbed a screenshot of the pricing page that Steinhoff shared during his presentation:
I did find Shreya Shankar’s gpt3-sandbox project for interacting with the API in your browser (mostly for iteratively crafting text input in order to generate desired output). It depends on the openai Python package created by OpenAI themselves. The docs for openai then point at a page on the openai.com website which is behind a login. You can create an account, but you need to be pre-approved (made it through the waitlist) to be able to see the docs. There’s probably some sense that can be made from examining the python client though. All of the presentations in some form or another touched on the 175 billion parameters that were used to generate the model. But the API to the model doesn’t have that many parameters. It allows you to enter text and get text back. But the API surface that the GPT-3 service provides could be interesting to examine a bit more closely, especially to track how it changes over time. In terms of how this model mediates knowledge and understanding it’ll be important watch. Steinhoff’s message seemed to be that, despite the best of intentions, GPT-3 functions in the service of very large corporations with very particular interests. One dimension that he didn’t explore perhaps because of time, is how the GPT-3 model itself is fed massive amounts of content from the web, or the commons. Indeed 60% of the data came from the CommonCrawl project. GPT-3 is an example of an extraction project that has been underway at large Internet companies for some time. I think the critique of these corporations has often been confined to seeing them in terms of surveillance capitalism rather than in terms of raw resource extraction, or the primitive accumulation of capital. The behavioral indicators of who clicked on what are certainly valuable, but GPT-3 and sister projects like CommonCrawl shows just the accumulation of data with modest amounts of metadata can be extremely valuable. This discussion really hit home for me since I’ve been working with Jess Ogden and Shawn Walker using CommonCrawl as a dataset for talking about the use of web archives, while also reflecting on the use of web archives as data. CommonCrawl provides a unique glimpse into some of the data operations that are at work in the accumulation of web archives. I worry that the window is closing and the CommonCrawl itself will be absorbed into Microsoft. Following Steinhoff Olya Kudina and Bas de Boer jointly presented some compelling thoughts about how its important to understand GPT-3 in terms of sociotechnical theory, using ideas drawn from Foucault and Arendt. I actually want to watch their presentation again because it followed a very specific path that I can’t do justice to here. But their main argument seemed to be that GPT-3 is an expression of power and that where there is power there is always resistance to power. GPT-3 can and will be subverted and used to achieve particular political ends of our own choosing. Because of my own dissertation research I’m partial to Foucault’s idea of governmentality, especially as it relates to ideas of legibility (Scott, 1998)–the who, what and why of legibility projects, aka archives. GPT-3 presents some interesting challenges in terms of legibility because the model is so complex, the results it generates defy deductive logic and auditing. 
In some ways GPT-3 obscures more than it makes a population legible, as Foucault moved from disciplinary analysis of the subject, to the ways in which populations are described and governed through the practices of pastoral power, of open datasets. Again the significance of CommonCrawl as an archival project, as a web legibility project, jumps to the fore. I’m not as up on Arendt as I should be, so one outcome of their presentation is that I’m going to read her The Human Condition which they had in a slide. I’m long overdue. References Scott, J. C. (1998). Seeing like a state: How certain schemes to improve the human condition have failed. Yale University Press. mimetypes Today I learned that Python has a mimetypes module, and has ever since Guido van Rossum added it in 1997. Honestly I’m just a bit sheepish to admit this discovery, as someone who has been using Python for digital preservation work for about 15 years. But maybe there’s a good reason for that. Since the entire version history for Python is available on GitHub (which is a beautiful thing in itself) you can see that the mimetypes module started as a guess_type() function built around a pretty simple hard coded mapping of file extensions to mimetypes. The module also includes a little bit of code to look for, and parse, mimetype registries that might be available on the host operating system. The initial mimetype registries used included one from the venerable Apache httpd web server, and the Netscape web browser, which was about three years old at the time. It makes sense why this function to look up a mimetype for a filename would be useful at that time, since Python was being used to serve up files on the nascent web and for sending email, and whatnot. Today the module looks much the same, but has a few new functions and about twice as many mimetypes in its internal list. Some of the new mimetypes include text/csv, audio/mpeg, application/vnd.ms-powerpoint, application/x-shockwave-flash, application/xml, and application/json. Comparing the first commit to the most recent provides a thumbnail sketch of 25 years of web format evolution. I’ll admit, this is a bit of an esoteric thing to be writing a blog post about. So I should explain. At work I’ve been helping out on a community archiving project which has accumulated a significant amount of photographs, scans, documents of various kinds, audio files and videos. Some of these files are embedded in web applications like Omeka, some are in cloud storage like Google Drive, or on the office network attached storage, and others are on scattered storage devices in people’s desk drawers and closets. We’ve also created new files during community digitization events, and oral history interviews. As part of this work we’ve wanted to start building a place on the web where all these materials live. This has required not only describing the files, but also putting all the files in one place so that access can be provided. In principle this sounds simple. But it turns out that collecting the files from all these diverse locations poses significant challenges, because their context matters. The filenames, and the directories they are found in, are sometimes the only descriptive metadata that exists for this data. In short, the original order matters. But putting this content on the web means that the files need to be brought together and connected with their metadata programmatically. This is how I stumbled across the mimetypes module.
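For reference, here is a quick sketch of the two functions that come up here and in what follows. The exact return values can vary a little between Python versions, since they depend on the module's internal table and whatever mimetype registries it finds on the host system:

import mimetypes

print(mimetypes.guess_type("interview.mp3"))          # e.g. ('audio/mpeg', None)
print(mimetypes.guess_type("scan.tiff"))              # e.g. ('image/tiff', None)
print(mimetypes.guess_extension("image/jpeg"))        # e.g. '.jpg'
print(mimetypes.guess_extension("application/json"))  # e.g. '.json'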
I’ve been writing some throwaway code to collect the files together into the same directory structure while preserving their original filenames and locations in an Airtable database. I’ve been using the magic module to identify the format of the file, which is used to copy the file into a Dropbox storage location. The extension is important because we are expecting this to be a static site serving up the content and we want the files to also be browsable using the Dropbox drive. It turns out the mimetypes.guess_extension function is pretty useful for turning a mediatype into a file extension. I’m kind of surprised that it took me this long to discover mimetypes, but I’m glad I did. As an aside I think this highlights for me how important Git can be as an archive and research method for software studies work.
Northwest Branch Cairn
Here is a short recording and a couple photos from my morning walk along the Northwest Branch trail with Penny. I can’t go every day but at 7 months old she has tons of energy, so it’s generally a good idea for all concerned to go at least every other morning. And it’s a good thing, because the walk is surprisingly peaceful, and it’s such a joy to see her run through the woods. After walking about 30 minutes there is this little cairn that is a reminder for me to turn around. After seeing it grow in size I was sad to see it knocked down one day. But, ever so slowly, it is getting built back up again.
inkdroid-org-180 ---- inkdroid
twarc2
This post was originally published on Medium but I spent time writing it so I wanted to have it here too. TL;DR twarc has been redesigned from the ground up to work with the new Twitter v2 API and their Academic Research track. Many thanks for the code and design contributions of Betsy Alpert, Igor Brigadir, Sam Hames, Jeff Sauer, and Daniel Verdeer that have made twarc2 possible, as well as early feedback from Dan Kerchner, Shane Lin, Miles McCain, 李荣蓬, David Thiel, Melanie Walsh and Laura Wrubel. Extra special thanks to the Institute for Future Environments at Queensland University of Technology for supporting Betsy and Sam in their work, and for the continued support of the Mellon Foundation. Back in August of last year Twitter announced early access to their new v2 API, and their plans to sunset the v1.1 API that has been active for almost the last 10 years. Over the lifetime of their v1.1 API Twitter has become deeply embedded in the media landscape. As magazines, newspapers and television have moved onto the web they have increasingly adopted tweets as a mechanism for citing politicians, celebrities and organizations, while also using them to document current events, generate leads and gather feedback for evolving stories. As a result Twitter has also become a popular object of study for humanities and social science researchers looking to understand the world as reflected, refracted and distorted by/in social media. On the surface the v2 API update seems pretty insignificant since the shape of a tweet, its parts, properties and affordances, aren’t changing at all. Tweets with 280 characters of text, images and video will continue to be posted, retweeted and quoted. However behind the scenes the representation of a tweet as data, and the quotas that control the rates at which this data can flow between apps and other third party services will be greatly transformed. Needless to say, v2 represents a big change for the Documenting the Now project.
Along with community members we’ve developed and maintained open source tools like twarc that talk directly to the Twitter API to help users search for and collect live tweets that match criteria like hashtags, names and geographic locations. Today we’re excited to announce the release of twarc v2 which has been designed from the ground up to work with the v2 API and Twitter’s new Academic Research track. Clearly it’s extremely problematic having a multi-national corporation act as a gatekeeper for who counts as an academic researcher, and what constitutes academic research. We need look no further than the recent experiences of Timnit Gebru and Margaret Mitchell at Google for an example of what happens when research questions run up against the business objectives of capital. We only know their stories because Gebru and Mitchell bravely took a principled approach, where many researchers would have knowingly or unknowingly shaped their research to better fit the needs of the company. So it is important for us that twarc still be usable by people with and without access to the Academic Research Track. But we have heard from many users that the Academic Research Track presents new opportunities for Twitter data collection that are essential for researchers interested in the observability of social media platforms. Twitter is making a good faith effort to work with the academic research community, and we thought twarc should support it, even if big challenges lie ahead. So why are people interested in the Academic Research Track? Once your application has been approved you are able to collect data from the full history of Tweets, at no cost. This is a massive improvement over the v1.1 access which was limited to a one week window and researchers had to pay for access. Access to the full archive means it’s now possible to study events that have happened in the past back to the beginning of Twitter in 2006. If you do create any historical datasets we’d love for you to share the tweet identifier datasets in The Catalog. However this opening up of access comes with a simultaneous contraction in terms of how much data can be collected at one time. The remainder of this post describes some of the details and the design decisions we have made with twarc2 to address them. If you would prefer to watch a quick introduction to using twarc v2 please check out this short video:
Installation
If you are familiar with installing twarc nothing is changed. You still install (or upgrade) with pip as you did before:
$ pip install --upgrade twarc
In fact you will still have full access to the v1.1 API just as you did before. So the old commands will continue to work as they did.1
$ twarc search blacklivesmatter > tweets.jsonl
twarc2 was designed to let you continue to use Twitter’s v1.1 API undisturbed until it is finally turned off by Twitter, at which point the functionality will be removed from twarc. All the support for the v2 API is mediated by a new command line utility twarc2. For example to search for blacklivesmatter tweets and write them to a file tweets.jsonl:
$ twarc2 search blacklivesmatter > tweets.jsonl
All the usual twarc functionality such as searching for tweets, collecting live tweets from the streaming API endpoint, requesting user timelines and user metadata are all still there, twarc2 --help gives you the details. But while the interface looks the same there’s quite a bit different going on behind the scenes.
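The command line is only one way in; twarc can also be used as a Python library. Very roughly, and assuming an app’s bearer token from the Developer Portal (this is a sketch from my reading of the code rather than a tutorial, so check the twarc documentation for the exact class and method names):

from twarc import Twarc2

# assumes a bearer token for your app from the Twitter Developer Portal
client = Twarc2(bearer_token="REPLACE_ME")

# search_recent yields pages of API responses, each with "data" and "includes",
# rather than individual tweets
for page in client.search_recent("blacklivesmatter"):
    for tweet in page.get("data", []):
        print(tweet["id"], tweet["text"])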
Representation
Truth be told, there is no shortage of open source libraries and tools for interacting with the Twitter API. In the past twarc has made a bit of a name for itself by catering to a niche group of users who want a reliable, programmable way to collect the canonical JSON representation of a tweet. JavaScript Object Notation (JSON) is the language of Web APIs, and Twitter has kept its JSON representation of a tweet relatively stable over the years. Rather than making lots of decisions about the many ways you might want to collect, model and analyze tweets, twarc has tried to do one thing and do it well (data collection) and get out of the way so that you can use (or create) the tools for putting this data to use. But the JSON representation of a tweet in the Twitter v2 API is completely burst apart. The v2 base representation of a tweet is extremely lean and minimal, and just includes the text of the tweet, its identifier, and a handful of other things. All the details about the user who created the tweet, embedded media, and more are not included. Fortunately this information is still available, but the user needs to craft their API request to request tweets using a set of expansions that tell the Twitter API what additional entities to include. In addition, for each expansion there is a set of field options that control which parts of these expansions are returned. So rather than there being a single JSON representation of a tweet, API users now have the ability to shape the data based on what they need, much like how GraphQL APIs work. This kind of makes you wonder why Twitter didn’t make their GraphQL API available. For specific use cases this customizability is very useful, but the mutability of the representation of a tweet presents challenges when collecting data for future use. If you didn’t request the right expansions or fields when collecting the data then you won’t be able to analyze that data later when doing your research. To solve this, twarc2 has been designed to collect the richest possible representation for a tweet, by requesting all possible expansions and field combinations for tweets. See the expansions module for the details if you are interested. This takes a significant burden off of users to digest the API documentation, and craft the correct API requests themselves. In addition the twarc community will be monitoring the Twitter API documentation going forward to incorporate new expansions and fields as they will inevitably be added in the future.
Flattening
This is diving into the weeds a little bit, but it’s worth noting here that Twitter’s introduction of expansions allows data that was once duplicated across multiple tweets (such as user information, media, retweets, etc) to be included once per response from the API. This means that instead of seeing information about the user who created a tweet in the context of their tweet the user will be referenced using an identifier, and this identifier will map to user metadata in the outer envelope of the response. It makes sense why Twitter have introduced expansions since it means in a set of 100 tweets from a given user the user information will just be included once rather than repeated 100 times, which means less data, less network traffic and less money. It’s even more significant when you consider the large number of possible expansions. However this pass-by-reference rather than by-value approach presents some challenges for stream based processing which expects each tweet to be self-contained. For this reason we’ve introduced the idea of flattening the response data when persisting the JSON to disk. This means that tools and data pipelines that expect to operate on a stream of tweets can continue to do so.
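To make that a bit more concrete, here is a rough conceptual sketch of what flattening involves (this is not twarc’s actual implementation, just an illustration of the idea using the v2 response layout of data plus includes):

# a conceptual sketch of flattening, not twarc's actual code
def flatten(response):
    # index the expanded user objects in the response envelope by their id
    users = {u["id"]: u for u in response.get("includes", {}).get("users", [])}
    tweets = []
    for tweet in response.get("data", []):
        # re-attach the referenced user so each tweet is self-contained again
        tweet["author"] = users.get(tweet.get("author_id"))
        tweets.append(tweet)
    return tweets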
Since the representation of a tweet is so dependent on how data is requested we’ve taken the opportunity to introduce a small stanza of twarc-specific metadata using the __twarc prefix. This metadata records what API endpoint the data was requested from, and when. This information is critically important when interpreting the data, because some information about a tweet like its retweet and quote counts are constantly changing.
Data Flows
As mentioned above you can still collect tweets from the search and streaming API endpoints in a way that seems quite similar to the v1 API. The big changes however are the quotas associated with these endpoints which govern how much can be collected. These quotas control how many requests can be sent to Twitter in 15 minute intervals. In fact these quotas are not much changed, but what’s new are app wide quotas that constrain how many tweets a given application (app) can collect every month. An app in this context is a piece of software (e.g. your twarc software) identified by unique API keys set up in the Twitter Developer Portal. The standard API access sets a 500,000 tweet per month limit. This is a huge change considering there were no monthly app limits before. If you get approved for the Academic Research track your app quota is increased to 10 million per month. This is markedly better but the achievable data volume is still nothing like the v1.1 API, as these graphs attempt to illustrate: twarc2 will still observe the same rate limits, but once you’ve collected your portion for the month there’s not much that can be done, for that app at least. Apart from the quotas Twitter’s streaming endpoint in v2 is substantially changed which impacts how users interact with twarc. Previously twarc users would be able to create up to two connections to the filter stream API. This could be done by simply:
twarc filter obama > obama.jsonl
However in the Twitter v2 API only apps can connect to the filter stream, and they can only connect once. At first this seems like a major limitation but rather than creating a connection per query the v2 API allows you to build a set of rules for tweets to match, which in turn controls what tweets are included in the stream. This means you can collect for multiple types of queries at the same time, and the tweets will come back with a piece of metadata indicating what rule caused their inclusion. This translates into a markedly different set of interactions at the command line for collecting from the stream where you first need to set your stream rules and then open a connection to fetch it.
twarc2 stream-rules add blacklivesmatter
twarc2 stream > tweets.jsonl
One useful side effect of this is that you can update the stream (add and remove rules) while the stream is in motion:
twarc2 stream-rules add blm
While you are limited by the API quota in terms of how many tweets you can collect, tweets are not “dropped on the floor” when the volume gets too high. Once upon a time the v1.1 filter stream was rumored to be rate limited when your stream exceeds 1% of the total volume of new tweets.
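To put those monthly app quotas in perspective, a bit of back-of-the-envelope arithmetic (the tweets-per-second rate here is just an assumption for illustration):

# rough illustration of how quickly a stream can use up the monthly app quota
tweets_per_second = 5          # an assumed, fairly modest collection rate
standard_quota = 500_000       # standard access: tweets per month
academic_quota = 10_000_000    # Academic Research track: tweets per month

per_day = tweets_per_second * 60 * 60 * 24
print(per_day)                        # 432000 tweets a day
print(standard_quota / per_day)       # ~1.2 days until the standard quota is used up
print(academic_quota / per_day)       # ~23 days until the academic quota is used up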
Plugins
In addition to twarc helping you collect tweets the GitHub repository has also been a place to collect a set of utilities for working with the data. For example there are scripts for extracting and unshortening urls, identifying suspended/deleted content, extracting videos, building wordclouds, putting tweets on maps, displaying network graph visualizations, counting hashtags, and more. These utilities all work like Unix filters where the input is a stream of tweets and the output varies depending on what the utility is doing, e.g. a Gephi file for a network visualization, or a folder of mp4 files for video extraction. While this has worked well in general the kitchen sink approach has been difficult to manage from a configuration management perspective. Users have to download these scripts manually from GitHub or by cloning the repository. For some users this is fine, but it’s a bit of a barrier to entry for users who have just installed twarc with pip. Furthermore these plugins often have their own dependencies which twarc itself does not. This lets twarc stay pretty lean, and things like youtube_dl, NetworkX or Pandas can be installed by people that want to use utilities that need them. But since there is no way to install the utilities there isn’t a way to ensure that the dependencies are installed, which can lead to users needing to diagnose missing libraries themselves. Finally the plugins have typically lacked their own tests. twarc’s test suite has really helped us track changes to the Twitter API and to make sure that it continues to operate properly as new functionality has been added. But nothing like this has existed for the utilities. We’ve noticed that over time some of them need updating. Also their command line arguments have drifted over time which can lead to some inconsistencies in how they are used. So with twarc2 we’ve introduced the idea of plugins which extend the functionality of the twarc2 command, are distributed on PyPI separately from twarc, and exist in their own GitHub repositories where they can be developed and tested independently of twarc itself. This is all achieved through twarc2’s use of the click library and specifically click-plugins. So now if you would like to convert your collected tweets to CSV you can install the twarc-csv:
$ pip install twarc-csv
$ twarc2 search covid19 > covid19.jsonl
$ twarc2 csv covid19.jsonl > covid19.csv
Or if you want to extract embedded and referenced videos from tweets you can install twarc-videos which will write all the videos to a directory:
$ pip install twarc-videos
$ twarc2 videos covid19.jsonl --download-dir covid19-videos
You can write these plugins yourself and release them as needed. Check out the plugin reference implementation tweet-ids for a simple example to adapt. We’re still in the process of porting some of the most useful utilities over and would love to see ideas for new plugins. Check out the current list of twarc2 plugins and use the twarc issue tracker on GitHub to join the discussion. You may notice from the list of plugins that twarc now (finally) has documentation on ReadTheDocs external from the documentation that was previously only available on GitHub. We got by with GitHub’s rendering of Markdown documents for a while, but GitHub’s boilerplate designed for developers can prove to be quite confusing for users who aren’t used to selectively ignoring it. ReadTheDocs allows us to manage the command line and API documentation for twarc, and to showcase the work that has gone into the Spanish, Japanese, Portuguese, Swedish, Swahili and Chinese translations.
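If you are curious what a plugin amounts to, it is mostly just a click command published with a Python entry point that twarc2 discovers when it starts up. A hypothetical sketch (the module, command and entry point group names here are my assumptions, so copy whatever the tweet-ids reference implementation does):

# twarc_hello.py -- a minimal, hypothetical twarc2 plugin
import click

@click.command()
@click.argument("infile", type=click.File("r"))
def hello(infile):
    # count the lines of collected tweet JSON in a file
    count = sum(1 for line in infile if line.strip())
    click.echo(f"{count} lines of tweet JSON")

# in setup.py the command gets advertised to twarc2 with an entry point, e.g.
# entry_points={"twarc.plugins": ["hello = twarc_hello:hello"]}
# (the "twarc.plugins" group name is an assumption, not taken from the docs)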
Feedback
Thanks for reading this far! We hope you will give twarc2 a try. Let us know what you think either in comments here, in the DocNow Slack or over on GitHub. ✨ ✨ Happy twarcing! ✨ ✨ ✨
1. Windows users will want to indicate the output file using a second argument rather than redirecting output with >. See this page for details.↩
$ j
You may have noticed that I try to use this static website as a journal. But, you know, not everything I want to write down is really ready (or appropriate) to put here. Some of these things end up in actual physical notebooks–there’s no beating the tactile experience of writing on paper for some kinds of thinking. But I also spend a lot of time on my laptop, and at the command line in some form or another. So I have a directory of time stamped Markdown files stored on Dropbox, for example:
...
/home/ed/Dropbox/Journal/2019-08-25.md
/home/ed/Dropbox/Journal/2020-01-27.md
/home/ed/Dropbox/Journal/2020-05-24.md
/home/ed/Dropbox/Journal/2020-05-25.md
/home/ed/Dropbox/Journal/2020-05-31.md
...
Sometimes these notes migrate into a blog post or some other writing I’m doing. I used this technique quite a bit when writing my dissertation, when I wanted to jot things down on my phone as an idea arrived. I’ve tried a few different apps for editing Markdown on my phone, but mostly settled on iA Writer which mostly just gets out of the way. But when editing on my laptop I tend to use my favorite text editor Vim with the vim-pencil plugin for making Markdown fun and easy. If Vim isn’t your thing and you use another text editor keep reading since this will work for you too. The only trick to this method of journaling is that I just need to open the right file. With command completion on the command line this isn’t so much of a chore. But it does take a moment to remember the date, and craft the right path. Today while reflecting on how nice it is to still be using Unix, it occurred to me that I could create a little shell script to open my journal for that day (or a previous day). So I put this little file j in my PATH:
#!/bin/zsh
journal_dir="/home/ed/Dropbox/Journal"
if [ "$1" ]; then
    date=$1
else
    date=`date +%Y-%m-%d`
fi
vim "$journal_dir/$date.md"
So now when I’m in the middle of something else and want to jot a note in my journal I just type j. Unix, still crazy after all these years.
Strengths and Weaknesses
Quoting Macey (2019), quoting Foucault, quoting Nietzsche:
One thing is needful. – To ‘give style’ to one’s character – a great and rare art! It is practised by those who survey all the strengths and weaknesses that their nature has to offer and then fit them into an artistic plan until each appears as art and reason and even weaknesses delight the eye. Nietzsche, Williams, Nauckhoff, & Del Caro (2001), p. 290
This is a generous and lively image of what art does when it is working. Art is not perfection.
Macey, D. (2019). The lives of Michel Foucault: A biography. Verso.
Nietzsche, F. W., Williams, B., Nauckhoff, J., & Del Caro, A. (2001). The gay science: with a prelude in German rhymes and an appendix of songs. Cambridge, U.K.; New York: Cambridge University Press.
Data Speculation
I’ve taken the ill-advised approach of using the Coronavirus as a topic to frame the exercises in my computer programming class this semester. I say “ill-advised” because given the impact that COVID has been having on students I’ve been thinking they probably need a way to escape news of the virus by way of writing code, rather than diving into it more.
It’s late in the semester to modulate things but I think we will shift gears to look at programming through another lens after spring break. That being said, one of the interesting things we’ve been doing is looking at vaccination data that is being released by the Maryland Department of Health through their ESRI ArcGIS Hub. Note: this dataset has since been removed from the web because it has been superseded by a new dataset that includes single dose vaccinations. I guess it’s good that students get a feel for how ephemeral data on the web is, even when it is published by the government. We noticed that this dataset recorded a small number of vaccinations as happening as early as the 1930s up until December 11, 2020 when vaccines were approved for use. I asked students to apply what we have been learning about Python (files, strings, loops, and sets) to identify the Maryland counties that were responsible for generating this anomalous data. I thought this exercise provided a good demonstration using real, live data that critical thinking about the provenance of data is always important because there is no such thing as raw data (Gitelman, 2013). While we were working with the data to count the number of anomalous vaccinations per county one of my sharp eyed students noticed that the results we were seeing with my version of the dataset (downloaded on February 28) were different from what we saw with his (downloaded on March 4). We expected to see new rows in the later one because new vaccination data seem to be reported daily–which is cool in itself. But we were surprised to find new vaccination records for dates earlier than December 11, 2020. Why would new vaccinations for these erroneous older dates still be entering the system? For example the second dataset downloaded March 4 acquired 6 new rows:
Object ID | Vaccination Date | County | Daily First Dose | Cumulative First Dose | Daily Second Dose | Cumulative Second Dose
4 | 1972/10/13 | Allegany | 1 | 1 | 0 | 0
5 | 1972/12/16 | Baltimore | 1 | 1 | 0 | 0
6 | 2012/02/03 | Baltimore | 1 | 2 | 0 | 0
28 | 2020/02/24 | Baltimore City | 1 | 2 | 0 | 0
34 | 2020/08/24 | Baltimore | 1 | 4 | 0 | 0
64 | 2020/12/10 | Prince George’s | 1 | 3 | 0 | 0
And these rows present in the February 28 version were deleted in the March 4 version:
Object ID | Vaccination Date | County | Daily First Dose | Cumulative First Dose | Daily Second Dose | Cumulative Second Dose
4 | 2019/12/26 | Frederick | 1 | 1 | 0 | 0
15 | 2020/01/25 | Talbot | 1 | 1 | 0 | 0
19 | 2020/01/28 | Baltimore | 1 | 1 | 0 | 0
20 | 2020/01/30 | Caroline | 1 | 1 | 0 | 0
28 | 2020/02/12 | Prince George’s | 1 | 1 | 0 | 0
30 | 2020/02/20 | Anne Arundel | 1 | 6 | 0 | 0
56 | 2020/10/16 | Frederick | 1 | 7 | 0 | 4
59 | 2020/11/01 | Wicomico | 1 | 1 | 0 | 0
60 | 2020/11/04 | Frederick | 1 | 8 | 0 | 4
I found these additions perplexing at first, because I assumed these outliers were part of an initial load. But it appears that the anomalies are still being generated? The deletions suggest that perhaps the anomalous data is being identified and scrubbed in a live system that is then dumping out the data? Or maybe the code that is being used to update the dataset in ArcGIS Hub itself is malfunctioning in some way? If you are interested in toying around with the code and data it is up on GitHub. I was interested to learn about pandas.DataFrame.merge which is useful for diffing tables when you use indicator=True.
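Roughly what that diff looks like in code (a sketch: the file names here are made up, and the columns follow the tables above):

import pandas as pd

# two snapshots of the same ArcGIS Hub dataset, downloaded on different days
feb28 = pd.read_csv("md-vaccinations-2021-02-28.csv")
mar04 = pd.read_csv("md-vaccinations-2021-03-04.csv")

# an outer merge with indicator=True labels each row left_only, right_only or both
diff = feb28.merge(mar04, how="outer", indicator=True)

deleted = diff[diff["_merge"] == "left_only"]   # rows only in the Feb 28 download
added = diff[diff["_merge"] == "right_only"]    # rows only in the Mar 4 download
print(len(added), len(deleted))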
At any rate, having students notice, measure and document anomalies like this seems pretty useful. I also asked them to speculate about what kinds of activities could generate these errors. I meant speculate in the speculative fiction sense of imagining a specific scenario that caused it. I think this made some students scratch their heads a bit, because I wasn’t asking them for the cause, but to invent a possible cause. Based on the results so far I’d like to incorporate more of these speculative exercises concerned with the functioning of code and data representations into my teaching. I want to encourage students to think creatively about data processing as they learn about the nuts and bolts of how code operates. For example, the treatments in How to Run a City Like Amazon, and Other Fables use sci-fi to test ideas about how information technologies are deployed in society. Another model is the Speculative Ethics Book Club which also uses sci-fi to explore the ethical and social consequences of technology. I feel like I need to read up on speculative research more generally before doing this though (Michael & Wilkie, 2020). I’d also like to focus the speculation down at the level of the code or data processing, rather than at the macro super-system level. But that has its place too. Another difference is that I was asking students to engage in speculation about the past rather than the future. How did the data end up this way? Perhaps this is more of a genealogical approach, of winding things backwards, and tracing what is known. Maybe it’s more Mystery than Sci-Fi. The speculative element is important because (in this case) operations at the MD Dept of Health, and their ArcGIS Hub setup, are mostly opaque to us. But even when access isn’t a problem these systems can feel opaque, because rather than there being a dearth of information you are drowning in it. Speculation is a useful abductive approach to hypothesis generation and, hopefully, understanding. Update 2021-03-17: Over in the fediverse David Benque recommended I take a look at Matthew Stanley’s chapter in (Gitelman, 2013) “Where Is That Moon, Anyway? The Problem of Interpreting Historical Solar Eclipse Observations” for the connection to Mystery. For the connection to Peirce and abduction he also pointed to Luciana Parisi’s chapter “Speculation: A method for the unattainable” in Lury & Wakeford (2012). Definitely things to follow up on!
References
Gitelman, L. (Ed.). (2013). “Raw data” is an oxymoron. MIT Press.
Lury, C., & Wakeford, N. (2012). Inventive methods: The happening of the social. Routledge.
Michael, M., & Wilkie, A. (2020). Speculative research. In The Palgrave encyclopedia of the possible (pp. 1–8). Cham: Springer International Publishing. Retrieved from https://doi.org/10.1007/978-3-319-98390-5_118-1
Recovering Foucault
I’ve been enjoying reading David Macey’s biography of Michel Foucault, which was republished in 2019 by Verso. Macey himself is an interesting figure, both a scholar and an activist who took leave from academia to do translation work and to write this biography and others of Lacan and Fanon. One thing that struck me as I’m nearing the end of Macey’s book is the relationship between Foucault and archives. I think Foucault has become emblematic of a certain brand of literary analysis of “the archive” that is far removed from the research literature of archival studies, while using “the archive” as a metaphor (Caswell, 2016). I’ve spent much of my life working in libraries and digital preservation, and now studying and teaching about them from the perspective of practice, so I am very sympathetic to this critique. It is perhaps ironic that the disconnect between these two bodies of research is a difference in discourse which Foucault himself brought attention to.
At any rate, the thing that has struck me while reading this biography is how much time Foucault himself spent working in libraries and archives. Here’s Foucault in his own words talking about his thesis:
In Histoire de la folie à l’âge classique I wished to determine what could be known about mental illness in a given epoch … An object took shape for me: the knowledge invested in complex systems of institutions. And a method became imperative: rather than perusing … only the library of scientific books, it was necessary to consult a body of archives comprising decrees, rules, hospital and prison registers, and acts of jurisprudence. It was in the Arsenal or the Archives Nationales that I undertook the analysis of a knowledge whose visible body is neither scientific nor theoretical discourse, nor literature, but a daily and regulated practice. (Macey, 2019, p. 94)
Foucault didn’t simply use archives for his research: understanding the processes and practices of archives was integral to his method. Even though the theory and practice of libraries and archives are quite different given their different functions and materials, they are often lumped together as a convenience in the same buildings. Macey blurs them a little bit, in sections like this where he talks about how important libraries were to Foucault’s work:
Foucault required access to Paris for a variety of reasons, not least because he was also teaching part-time at ENS. The putative thesis he had begun at the Fondation Thiers – and which he now described to Polin as being on the philosophy of psychology – meant that he had to work at the Bibliothèque Nationale and he had already become one of its habitues. For the next thirty years, Henri Labrouste’s great building in the rue de Richelieu, with its elegant pillars and arches of cast iron, would be his primary place of work. His favourite seat was in the hemicycle, the small, raised section directly opposite the entrance, sheltered from the main reading room, where a central aisle separates rows of long tables subdivided into individual reading desks. The hemicycle affords slightly more quiet and privacy. For thirty years, Foucault pursued his research here almost daily, with occasional forays to the manuscript department and to other libraries, and contended with the Byzantine cataloguing system: two incomplete and dated printed catalogues supplemented by cabinets containing countless index cards, many of them inscribed with copperplate handwriting. Libraries were to become Foucault’s natural habitat: ‘those greenish institutions where books accumulate and where there grows the dense vegetation of their knowledge’
There’s a metaphor for you: libraries as vegetation :) It kind of reminds me of some recent work looking at decentralized web technologies in terms of mushrooms. But I digress. I really just wanted to note here that the erasure of archival studies from humanities research about “the archive” shouldn’t really be attributed to Foucault, whose own practice centered the work of libraries and archives. Foucault wasn’t just writing about an abstract archive, he was practically living out of them. As someone who has worked in libraries and archives I can appreciate how power users (pun intended) often knew aspects of the holdings and intricacies of their management better than I did. Archives, when they are working, are always collaborative endeavours, and the important thing is to recognize and attribute the various sides of that collaboration. PS.
Writing this blog post led me to dig up a few things I want to read (Eliassen, 2010; Radford, Radford, & Lingel, 2015).
References
Caswell, M. (2016). The archive is not an archives: On acknowledging the intellectual contributions of archival studies. Reconstruction, 16(1). Retrieved from http://reconstruction.eserver.org/Issues/161/Caswell.shtml
Eliassen, K. (2010). Archives of Michel Foucault. In E. Røssaak (Ed.), The archive in motion, new conceptions of the archive in contemporary thought and new media practices. Novus Press.
Macey, D. (2019). The lives of Michel Foucault: A biography. Verso.
Radford, G. P., Radford, M. L., & Lingel, J. (2015). The library as heterotopia: Michel Foucault and the experience of library space. Journal of Documentation, 71(4), 773–751.
Teaching OOP in the Time of COVID
I’ve been teaching a section of the Introduction to Object Oriented Programming at the UMD College for Information Studies this semester. It’s difficult for me, and for the students, because we are remote due to the Coronavirus pandemic. The class is largely asynchronous, but every week I’ve been holding two synchronous live coding sessions in Zoom to discuss the material and the exercises. These have been fun because the students are sharp, and haven’t been shy about sharing their screen and their VSCode session to work on the details. But students need quite a bit of self-discipline to move through the material, and probably only about 1/4 of the students take advantage of these live sessions. I’m quite lucky because I’m working with a set of lectures, slides and exercises that have been developed over the past couple of years by other instructors: Josh Westgard, Aric Bills and Gabriel Cruz. You can see some of the public facing materials here. Having this backdrop of content combined with Severance’s excellent (and free) Python for Everybody has allowed me to focus more on my live sessions, on responsive grading, and to also spend some time crafting additional exercises that are geared to this particular moment. This class is in the College for Information Studies and not in the Computer Science Department, so it’s important for the students to not only learn how to use a programming language, but to understand programming as a social activity, with real political and material effects in the world. Being able to read, understand, critique and talk about code and its documentation is just as important as being able to write it. In practice, out in the “real world” of open source software I think these aspects are arguably more important. One way I’ve been trying to do this in the first few weeks of class is to craft a sequence of exercises that form a narrative around Coronavirus testing and data collection to help remind the students of the basics of programming: variables, expressions, conditionals, loops, functions, files. In the first exercise we imagined a very simple data entry program that needed to record results of Real-time polymerase chain reaction tests (RT-PCR). I gave them the program and described how it was supposed to work, and asked them to describe (in English) any problems that they noticed and to submit a version of the program with problems fixed. I also asked them to reflect on a request from their boss about adding the collection of race, gender and income information. The goal here was to test their ability to read the program and write English about it while also demonstrating a facility for modifying the program. Most importantly I wanted them to think about how inputs such as race or gender have questions about categories and standards behind them, and weren’t simply a matter of syntax.
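To give a flavor of the exercise, the handout looked very roughly like this (a reconstruction purely for illustration: the function names, threshold and prompts are made up and are not the actual assignment):

# a made-up sketch of a simple RT-PCR data entry program, for illustration only
def interpret(ct_value):
    # hypothetical cycle threshold cutoff; the real exercise's rules differed
    if ct_value <= 0:
        return "invalid"
    elif ct_value < 40:
        return "positive"
    else:
        return "negative"

results = []
while True:
    value = input("Enter Ct value (or q to quit): ")
    if value == "q":
        break
    results.append(interpret(float(value)))

# print the results in aggregate at the end
for outcome in ["positive", "negative", "invalid"]:
    print(outcome, results.count(outcome))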
The second exercise builds on the first by asking them to adjust the revised program to be able to save the data in a very particular format. In the first exercise the data was simply stored in memory and printed to the screen in aggregate at the end. The scenario here is that the Department of Health and Human Services has assumed the responsibility for COVID test data collection from the Centers for Disease Control. Of course this really happened, but the data format I chose was completely made up (maybe we will be working with some real data at the end of the semester if I continue with this theme). The goal in this exercise was to demonstrate their ability to read another program and fit a function into it. The students were given a working program that had a save_results() function stubbed out. In addition to submitting their revised code I asked them to reflect on some limitations of the data format chosen, and the data processing pipeline that it was a part of. And in the third exercise I asked them to imagine that this lab they were working in had a scientist who discovered a problem with some of the thresholds for acceptable testing, which required an update to the program from Exercise 2, and also a test suite to make sure the program was behaving properly. In addition to writing the tests I asked them to reflect on what functionality was not being tested that probably should be. This alternation between writing code and writing prose is something I started doing as part of a Digital Curation class. I don’t know if this dialogical, or perhaps dialectical, approach is something others have tried. I should probably do some research to see. In my last class I alternated week by week: one week reading and writing code, the next week reading and writing prose. But this semester I’ve stayed focused on code, but required the reading and writing of code as well as prose about code in the same week. I hope to write more about how this goes, and these exercises as I go. I’m not sure if I will continue with the Coronavirus data examples. One thing I’m sensitive to is that my students themselves are experiencing the effects of the Coronavirus, and may want to escape it just for a bit in their school work. Just writing in the open about it here, in addition to the weekly meetings I’ve had with Aric, Josh and Gabriel, has been very useful. Speaking of those meetings: I learned today from Aric that tomorrow (February 20th, 2021) is the 30th anniversary of Python’s first public release! You can see this reflected in this timeline. This v0.9.1 release was the first release Guido van Rossum made outside of CWI and was made on the Usenet newsgroup alt.sources where it is split out into chunks that need to be reassembled. Back in 2009 Andrew Dalke located and repackaged these sources in Google Groups, which acquired alt.sources as part of DejaNews in 2001. But if you look at the time stamp on the first part of the release you can see that it was made February 19, 1991 (not February 20). So I’m not sure if the birthday is actually today. I sent this little note out to my students with this wonderful two part oral history that the Computer History Museum did with Guido van Rossum a couple years ago. It turns out both of his parents were atheists and pacifists.
His dad went to jail because he refused to be conscripted into the military. That and many more details of his background and thoughts about the evolution of Python can be found in these delightful interviews: Happy Birthday Python!
GPT-3 Jam
One of the joys of pandemic academic life has been a true feast of online events to attend, on a wide variety of topics, some of which are delightfully narrow and esoteric. Case in point was today’s Reflecting on Power and AI: The Case of GPT-3 which lived up to its title. I’ll try to keep an eye out for when the video posts, and update here. The workshop was largely organized around an exploration of whether GPT-3, the largest known machine learning language model, changes anything for media studies theory, or if it amounts to just more of the same. So the discussion wasn’t focused so much on what games could be played with GPT-3, but rather if GPT-3 changes the rules of the game for media theory, at all. I’m not sure there was a conclusive answer at the end, but it sounded like the consensus was that current theorization around media is adequate for understanding GPT-3, but it matters greatly what theory or theories are deployed. The online discussion after the presentations indicated that attendees didn’t see this as merely a theoretical issue, but one that has direct social and political impacts on our lives. James Steinhoff looked at GPT-3 using a Marxist media theory perspective where he told the story of GPT-3 as a project of OpenAI and as a project of capital. OpenAI started with much fanfare in 2015 as a non-profit initiative where the technology, algorithms and models developed would be kept openly licensed and freely available so that the world could understand the benefits and risks of AI technology. Steinhoff described how in 2019 the project’s needs for capital (compute power and staff) transitioned it from a non-profit into a capped-profit company, which is now owned, or at least controlled, by Microsoft. The code for generating the model as well as the model itself are gated behind a token driven Web API run by Microsoft. You can get on a waiting list to use it, but apparently a lot of people have been waiting a while, so … Being a Microsoft employee probably helps. I grabbed a screenshot of the pricing page that Steinhoff shared during his presentation: I’d be interested to hear more about how these tokens operate. Are they per-request, or are they measured according to something else? I googled around a bit during the presentation to try to find some documentation for the Web API, and came up empty handed.
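From what I could piece together from the openai Python package mentioned earlier, the basic interaction is a prompt in and a completion out, roughly along these lines (a sketch from memory: the engine name and parameters here are my assumptions, not anything taken from the gated docs):

import openai  # the client package published by OpenAI

openai.api_key = "REPLACE_ME"  # you still need an approved account for this to work

# the engine name and parameters are assumptions, for illustration only
response = openai.Completion.create(
    engine="davinci",
    prompt="The role of the archive in machine learning is",
    max_tokens=50,
)
print(response["choices"][0]["text"])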
inkdroid-org-2856 ---- twarc2
inkdroid-org-4236 ---- inkdroid twarc2 This post was originally published on Medium but I spent time writing it so I wanted to have it here too. TL;DR twarc has been redesigned from the ground up to work with the new Twitter v2 API and their Academic Research track.
Many thanks for the code and design contributions of Betsy Alpert, Igor Brigadir, Sam Hames, Jeff Sauer, and Daniel Verdeer that have made twarc2 possible, as well as early feedback from Dan Kerchner, Shane Lin, Miles McCain, 李荣蓬, David Thiel, Melanie Walsh and Laura Wrubel. Extra special thanks to the Institute for Future Environments at Queensland University of Technology for supporting Betsy and Sam in their work, and to the Mellon Foundation for its continued support. Back in August of last year Twitter announced early access to their new v2 API, and their plans to sunset the v1.1 API that has been active for almost ten years. Over the lifetime of the v1.1 API Twitter has become deeply embedded in the media landscape. As magazines, newspapers and television have moved onto the web they have increasingly adopted tweets as a mechanism for citing politicians, celebrities and organizations, while also using them to document current events, generate leads and gather feedback for evolving stories. As a result Twitter has also become a popular object of study for humanities and social science researchers looking to understand the world as reflected, refracted and distorted by/in social media. On the surface the v2 API update seems pretty insignificant, since the shape of a tweet, its parts, properties and affordances, isn't changing at all. Tweets with 280 characters of text, images and video will continue to be posted, retweeted and quoted. Behind the scenes, however, the representation of a tweet as data, and the quotas that control the rates at which this data can flow between apps and other third-party services, will be greatly transformed. Needless to say, v2 represents a big change for the Documenting the Now project. Along with community members we've developed and maintained open source tools like twarc that talk directly to the Twitter API to help users search for and collect live tweets that match criteria like hashtags, names and geographic locations. Today we're excited to announce the release of twarc v2, which has been designed from the ground up to work with the v2 API and Twitter's new Academic Research track. Clearly it's extremely problematic having a multi-national corporation act as a gatekeeper for who counts as an academic researcher, and what constitutes academic research. We need look no further than the recent experiences of Timnit Gebru and Margaret Mitchell at Google for an example of what happens when research questions run up against the business objectives of capital. We only know their stories because Gebru and Mitchell bravely took a principled approach, where many researchers would have knowingly or unknowingly shaped their research to better fit the needs of the company. So it is important for us that twarc still be usable by people with and without access to the Academic Research Track. But we have heard from many users that the Academic Research Track presents new opportunities for Twitter data collection that are essential for researchers interested in the observability of social media platforms. Twitter is making a good faith effort to work with the academic research community, and we thought twarc should support it, even if big challenges lie ahead. So why are people interested in the Academic Research Track? Once your application has been approved you are able to collect data from the full history of Tweets, at no cost.
This is a massive improvement over v1.1 access, where search was limited to a one-week window and researchers had to pay for anything more. Access to the full archive means it's now possible to study events that have happened in the past, back to the beginning of Twitter in 2006. If you do create any historical datasets we'd love for you to share the tweet identifier datasets in The Catalog. However, this opening up of access comes with a simultaneous contraction in how much data can be collected at one time. The remainder of this post describes some of the details and the design decisions we have made with twarc2 to address them. If you would prefer to watch a quick introduction to using twarc v2 please check out this short video: Installation If you are familiar with installing twarc nothing has changed. You still install (or upgrade) with pip as you did before: $ pip install --upgrade twarc In fact you will still have full access to the v1.1 API just as you did before, so the old commands will continue to work as they did:1 $ twarc search blacklivesmatter > tweets.jsonl twarc2 was designed to let you continue to use Twitter's v1.1 API undisturbed until it is finally turned off by Twitter, at which point the functionality will be removed from twarc. All the support for the v2 API is mediated by a new command-line utility, twarc2. For example, to search for blacklivesmatter tweets and write them to a file tweets.jsonl: $ twarc2 search blacklivesmatter > tweets.jsonl All the usual twarc functionality, such as searching for tweets, collecting live tweets from the streaming API endpoint, and requesting user timelines and user metadata, is still there; twarc2 --help gives you the details. But while the interface looks the same there's quite a bit different going on behind the scenes. Representation Truth be told, there is no shortage of open source libraries and tools for interacting with the Twitter API. In the past twarc has made a bit of a name for itself by catering to a niche group of users who want a reliable, programmable way to collect the canonical JSON representation of a tweet. JavaScript Object Notation (JSON) is the language of Web APIs, and Twitter has kept its JSON representation of a tweet relatively stable over the years. Rather than making lots of decisions about the many ways you might want to collect, model and analyze tweets, twarc has tried to do one thing and do it well (data collection) and get out of the way so that you can use (or create) the tools for putting this data to use. But the JSON representation of a tweet in the Twitter v2 API is completely burst apart. The v2 base representation of a tweet is extremely lean and minimal: it includes just the text of the tweet, its identifier, and a handful of other things. All the details about the user who created the tweet, embedded media, and more are not included. Fortunately this information is still available, but users need to craft their API request with a set of expansions that tell the Twitter API what additional entities to include. In addition, for each expansion there is a set of field options that control which attributes of the expanded objects are returned. So rather than there being a single JSON representation of a tweet, API users now have the ability to shape the data based on what they need, much like how GraphQL APIs work. This kind of makes you wonder why Twitter didn't make their GraphQL API available.
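To make that shape-shifting a bit more concrete, here is a minimal sketch of what a request for a richer representation can look like against Twitter's v2 recent search endpoint. It is a sketch only, not twarc2's internal code: the endpoint and parameter names follow Twitter's v2 API documentation as I understand it, the bearer token is assumed to come from your own developer app, and twarc2 itself asks for a much longer list of expansions and fields than shown here.

import requests

BEARER_TOKEN = "..."  # from your own app in the Twitter Developer Portal

params = {
    "query": "blacklivesmatter",
    "max_results": 100,
    # expansions pull in referenced objects: authors, media, retweeted/quoted tweets
    "expansions": "author_id,attachments.media_keys,referenced_tweets.id",
    # field parameters control which attributes of each object are returned
    "tweet.fields": "created_at,public_metrics,entities,lang",
    "user.fields": "username,description,created_at",
    "media.fields": "type,url,preview_image_url",
}

resp = requests.get(
    "https://api.twitter.com/2/tweets/search/recent",
    headers={"Authorization": f"Bearer {BEARER_TOKEN}"},
    params=params,
)
resp.raise_for_status()
page = resp.json()  # {"data": [...], "includes": {...}, "meta": {...}}
print(len(page.get("data", [])), "tweets in this page")

Forget one expansion here and the corresponding objects simply never make it to disk, which is exactly the problem twarc2's request-everything approach is meant to avoid.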
For specific use cases this customizability is very useful, but the mutability of the representation of a tweet presents challenges when collecting data for future use. If you didn't request the right expansions or fields when collecting the data then you won't be able to analyze that data later when doing your research. To address this, twarc2 has been designed to collect the richest possible representation of a tweet by requesting all possible expansions and field combinations. See the expansions module for the details if you are interested. This takes a significant burden off of users, who would otherwise have to digest the API documentation and craft the correct API requests themselves. In addition the twarc community will be monitoring the Twitter API documentation going forward to incorporate new expansions and fields as they are inevitably added in the future. Flattening This is diving into the weeds a little bit, but it's worth noting here that Twitter's introduction of expansions allows data that was once duplicated across multiple tweets (such as user information, media, retweets, etc.) to be included once per response from the API. This means that instead of seeing information about the user who created a tweet in the context of their tweet, the user will be referenced using an identifier, and this identifier will map to user metadata in the outer envelope of the response. It makes sense why Twitter introduced expansions: in a set of 100 tweets from a given user, the user information is included just once rather than repeated 100 times, which means less data, less network traffic and less money. It's even more significant when you consider the large number of possible expansions. However this pass-by-reference rather than pass-by-value approach presents some challenges for stream-based processing, which expects each tweet to be self-contained. For this reason we've introduced the idea of flattening the response data when persisting the JSON to disk. This means that tools and data pipelines that expect to operate on a stream of tweets can continue to do so. Since the representation of a tweet is so dependent on how data is requested, we've taken the opportunity to introduce a small stanza of twarc-specific metadata using the __twarc prefix. This metadata records what API endpoint the data was requested from, and when. This information is critically important when interpreting the data, because some information about a tweet, like its retweet and quote counts, is constantly changing.
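To illustrate what flattening means in practice, here is a toy version of the idea, assuming a saved v2 response envelope like the one sketched earlier (the file names are hypothetical). This is illustrative only: twarc2's own expansions module does a much more complete job, handling media, polls, places and referenced tweets as well as users.

import json

def flatten_page(page):
    """Copy user objects from the response's includes back into each tweet."""
    users = {u["id"]: u for u in page.get("includes", {}).get("users", [])}
    for tweet in page.get("data", []):
        # each record becomes self-contained once the referenced user is inlined
        tweet["author"] = users.get(tweet.get("author_id"))
        yield tweet

# e.g. turn one saved API response into one self-contained tweet per line,
# which is roughly how twarc2 persists data to disk
with open("response.json") as fh:
    page = json.load(fh)
with open("tweets.jsonl", "a") as fh:
    for tweet in flatten_page(page):
        fh.write(json.dumps(tweet) + "\n")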
This is markedly better but the achievable data volume is still nothing like the v1.1 API, as these graphs attempt to illustrate: twarc2 will still observe the same rate limits, but once you’ve collected your portion for the month there’s not much that can be done, for that app at least. Apart from the quotas Twitter’s streaming endpoint in v2 is substantially changed which impacts how users interact with twarc. Previously twarc users would be able to create up to to two connections to the filter stream API. This could be done by simply: twarc filter obama > obama.jsonl However in the Twitter v2 API only apps can connect to the filter stream, and they can only connect once. At first this seems like a major limitation but rather than creating a connection per query the v2 API allows you to build a set of rules for tweets to match, which in turns controls what tweets are included in the stream. This means you can collect for multiple types of queries at the same time, and the tweets will come back with a piece of metadata indicating what rule caused its inclusion. This translates into a markedly different set of interactions at the command line for collecting from the stream where you first need to set your stream rules and then open a connection to fetch it. twarc2 stream-rules add blacklivesmatter twarc2 stream > tweets.jsonl One useful side effect of this is that you can update the stream (add and remove rules) while the stream is in motion: twarc2 stream-rules add blm While you are limited by the API quota in terms of how many tweets you can collect, tweets are not “dropped on the floor” when the volume gets too high. Once upon a time the v1.1 filter stream was rumored to be rate limited when your stream exceeds 1% of the total volume of new tweets. Plugins In addition to twarc helping you collect tweets the GitHub repository has also been a place to collect a set of utilities for working with the data. For example there are scripts for extracting and unshortening urls, identifying suspended/deleted content, extracting videos, buiding wordclouds, putting tweets on maps, displaying network graph visualizations, counting hashtags, and more. These utilities all work like Unix filters where the input is a stream of tweets and the output varies depending on what the utility is doing, e.g. a Gephi file for a network visualization, or a folder of mp4 files for video extraction. While this has worked well in general the kitchen sink approach has been difficult to manage from a configuration management perspective. Users have to download these scripts manually from GitHub or by cloning the repository. For some users this is fine, but it’s a bit of a barrier to entry for users who have just installed twarc with pip. Furthermore these plugins often have their own dependencies which twarc itself does not. This lets twarc can stay pretty lean, and things like youtube_dl, NetworkX or Pandas can be installed by people that want to use utilities that need them. But since there is no way to install the utilities there isn’t a way to ensure that the dependencies are installed, which can lead to users needing to diagnose missing libraries themselves. Finally the plugins have typically lacked their own tests. twarc’s test suite has really helped us track changes to the Twitter API and to make sure that it continues to operate properly as new functionality has been added. But nothing like this has existed for the utilities. We’ve noticed that over time some of them need updating. 
Plugins In addition to twarc helping you collect tweets, the GitHub repository has also been a place to collect a set of utilities for working with the data. For example there are scripts for extracting and unshortening URLs, identifying suspended/deleted content, extracting videos, building wordclouds, putting tweets on maps, displaying network graph visualizations, counting hashtags, and more. These utilities all work like Unix filters, where the input is a stream of tweets and the output varies depending on what the utility is doing, e.g. a Gephi file for a network visualization, or a folder of mp4 files for video extraction. While this has worked well in general, the kitchen sink approach has been difficult to manage from a configuration management perspective. Users have to download these scripts manually from GitHub or by cloning the repository. For some users this is fine, but it's a bit of a barrier to entry for users who have just installed twarc with pip. Furthermore these utilities often have their own dependencies which twarc itself does not. This lets twarc stay pretty lean, and things like youtube_dl, NetworkX or Pandas can be installed by people that want to use the utilities that need them. But since there is no way to install the utilities there isn't a way to ensure that their dependencies are installed, which can lead to users needing to diagnose missing libraries themselves. Finally the utilities have typically lacked their own tests. twarc's test suite has really helped us track changes to the Twitter API and to make sure that it continues to operate properly as new functionality has been added. But nothing like this has existed for the utilities. We've noticed that over time some of them need updating. Also their command-line arguments have drifted over time, which has led to some inconsistencies in how they are used. So with twarc2 we've introduced the idea of plugins, which extend the functionality of the twarc2 command, are distributed on PyPI separately from twarc, and exist in their own GitHub repositories where they can be developed and tested independently of twarc itself. This is all achieved through twarc2's use of the click library, and specifically click-plugins. So now if you would like to convert your collected tweets to CSV you can install twarc-csv: $ pip install twarc-csv $ twarc2 search covid19 > covid19.jsonl $ twarc2 csv covid19.jsonl > covid19.csv Or if you want to extract embedded and referenced videos from tweets you can install twarc-videos, which will write all the videos to a directory: $ pip install twarc-videos $ twarc2 videos covid19.jsonl --download-dir covid19-videos You can write these plugins yourself and release them as needed. Check out the plugin reference implementation tweet-ids for a simple example to adapt. We're still in the process of porting some of the most useful utilities over and would love to see ideas for new plugins. Check out the current list of twarc2 plugins and use the twarc issue tracker on GitHub to join the discussion.
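For a sense of what a plugin can look like, here is a hypothetical sketch built on the click mechanism described above. The command itself is ordinary click usage; the entry point group name in the final comment is an assumption on my part, so check the tweet-ids reference implementation for the real registration details before copying it.

# twarc_hashtags.py -- a hypothetical twarc2 plugin sketch
import json
from collections import Counter

import click

@click.command("hashtags")
@click.argument("infile", type=click.File("r"), default="-")
def hashtags(infile):
    """Count hashtags in a file of flattened tweets (JSONL)."""
    counts = Counter()
    for line in infile:
        tweet = json.loads(line)
        for tag in tweet.get("entities", {}).get("hashtags", []):
            counts[tag.get("tag", "").lower()] += 1
    for tag, n in counts.most_common():
        click.echo(f"{n}\t#{tag}")

# The package's setup.py would then register the command so that
# `twarc2 hashtags` works once it is pip installed, e.g. (assumed group name):
# entry_points={"twarc.plugins": ["hashtags = twarc_hashtags:hashtags"]}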
You may notice from the list of plugins that twarc now (finally) has documentation on ReadTheDocs, separate from the documentation that was previously only available on GitHub. We got by with GitHub's rendering of Markdown documents for a while, but GitHub's boilerplate designed for developers can prove to be quite confusing for users who aren't used to selectively ignoring it. ReadTheDocs allows us to manage the command line and API documentation for twarc, and to showcase the work that has gone into the Spanish, Japanese, Portuguese, Swedish, Swahili and Chinese translations. Feedback Thanks for reading this far! We hope you will give twarc2 a try. Let us know what you think either in comments here, in the DocNow Slack or over on GitHub. ✨ ✨ Happy twarcing! ✨ ✨ ✨ 1. Windows users will want to indicate the output file using a second argument rather than redirecting output with >. See this page for details.↩ $ j You may have noticed that I try to use this static website as a journal. But, you know, not everything I want to write down is really ready (or appropriate) to put here. Some of these things end up in actual physical notebooks–there's no beating the tactile experience of writing on paper for some kinds of thinking. But I also spend a lot of time on my laptop, and at the command line in some form or another. So I have a directory of time-stamped Markdown files stored on Dropbox, for example:
...
/home/ed/Dropbox/Journal/2019-08-25.md
/home/ed/Dropbox/Journal/2020-01-27.md
/home/ed/Dropbox/Journal/2020-05-24.md
/home/ed/Dropbox/Journal/2020-05-25.md
/home/ed/Dropbox/Journal/2020-05-31.md
...
Sometimes these notes migrate into a blog post or some other writing I'm doing. I used this technique quite a bit when writing my dissertation, when I wanted to jot down things on my phone as an idea arrived. I've tried a few different apps for editing Markdown on my phone, but mostly settled on iA Writer, which mostly just gets out of the way. But when editing on my laptop I tend to use my favorite text editor Vim with the vim-pencil plugin for making Markdown fun and easy. If Vim isn't your thing and you use another text editor keep reading, since this will work for you too. The only trick to this method of journaling is that I just need to open the right file. With command completion on the command line this isn't so much of a chore. But it does take a moment to remember the date, and craft the right path. Today while reflecting on how nice it is to still be using Unix, it occurred to me that I could create a little shell script to open my journal for that day (or a previous day). So I put this little file j in my PATH:
#!/bin/zsh
journal_dir="/home/ed/Dropbox/Journal"
if [ "$1" ]; then
    date=$1
else
    date=`date +%Y-%m-%d`
fi
vim "$journal_dir/$date.md"
So now when I'm in the middle of something else and want to jot a note in my journal I just type j. Unix, still crazy after all these years. Strengths and Weaknesses Quoting Macey (2019), quoting Foucault, quoting Nietzsche: One thing is needful. – To 'give style' to one's character – a great and rare art! It is practised by those who survey all the strengths and weaknesses that their nature has to offer and then fit them into an artistic plan until each appears as art and reason and even weaknesses delight the eye. Nietzsche, Williams, Nauckhoff, & Del Caro (2001), p. 290 This is a generous and lively image of what art does when it is working. Art is not perfection. Macey, D. (2019). The lives of Michel Foucault: A biography. Verso. Nietzsche, F. W., Williams, B., Nauckhoff, J., & Del Caro, A. (2001). The gay science: With a prelude in German rhymes and an appendix of songs. Cambridge, U.K.; New York: Cambridge University Press. Data Speculation I've taken the ill-advised approach of using the Coronavirus as a topic to frame the exercises in my computer programming class this semester. I say "ill-advised" because, given the impact that COVID has been having on students, I've been thinking they probably need a way to escape news of the virus by way of writing code, rather than diving into it more. It's late in the semester to modulate things but I think we will shift gears to look at programming through another lens after spring break. That being said, one of the interesting things we've been doing is looking at vaccination data that is being released by the Maryland Department of Health through their ESRI ArcGIS Hub. Note: this dataset has since been removed from the web because it has been superseded by a new dataset that includes single dose vaccinations. I guess it's good that students get a feel for how ephemeral data on the web is, even when it is published by the government. We noticed that this dataset recorded a small number of vaccinations as happening as early as the 1930s, up until December 11, 2020 when vaccines were approved for use. I asked students to apply what we have been learning about Python (files, strings, loops, and sets) to identify the Maryland counties that were responsible for generating this anomalous data. I thought this exercise provided a good demonstration, using real, live data, that critical thinking about the provenance of data is always important, because there is no such thing as raw data (Gitelman, 2013). While we were working with the data to count the number of anomalous vaccinations per county, one of my sharp-eyed students noticed that the results we were seeing with my version of the dataset (downloaded on February 28) were different from what we saw with his (downloaded on March 4). We expected to see new rows in the later one because new vaccination data seem to be reported daily–which is cool in itself.
But we were surprised to find new vaccination records for dates earlier than December 11, 2020. Why would new vaccinations for these erroneous older dates still be entering the system? For example the second dataset, downloaded March 4, acquired 6 new rows:
Object ID | Vaccination Date | County | Daily First Dose | Cumulative First Dose | Daily Second Dose | Cumulative Second Dose
4 | 1972/10/13 | Allegany | 1 | 1 | 0 | 0
5 | 1972/12/16 | Baltimore | 1 | 1 | 0 | 0
6 | 2012/02/03 | Baltimore | 1 | 2 | 0 | 0
28 | 2020/02/24 | Baltimore City | 1 | 2 | 0 | 0
34 | 2020/08/24 | Baltimore | 1 | 4 | 0 | 0
64 | 2020/12/10 | Prince George's | 1 | 3 | 0 | 0
And these rows present in the February 28 version were deleted in the March 4 version:
Object ID | Vaccination Date | County | Daily First Dose | Cumulative First Dose | Daily Second Dose | Cumulative Second Dose
4 | 2019/12/26 | Frederick | 1 | 1 | 0 | 0
15 | 2020/01/25 | Talbot | 1 | 1 | 0 | 0
19 | 2020/01/28 | Baltimore | 1 | 1 | 0 | 0
20 | 2020/01/30 | Caroline | 1 | 1 | 0 | 0
28 | 2020/02/12 | Prince George's | 1 | 1 | 0 | 0
30 | 2020/02/20 | Anne Arundel | 1 | 6 | 0 | 0
56 | 2020/10/16 | Frederick | 1 | 7 | 0 | 4
59 | 2020/11/01 | Wicomico | 1 | 1 | 0 | 0
60 | 2020/11/04 | Frederick | 1 | 8 | 0 | 4
I found these additions perplexing at first, because I assumed these outliers were part of an initial load. But it appears that the anomalies are still being generated? The deletions suggest that perhaps the anomalous data is being identified and scrubbed in a live system that is then dumping out the data? Or maybe the code that is being used to update the dataset in ArcGIS Hub itself is malfunctioning in some way? If you are interested in toying around with the code and data it is up on GitHub. I was interested to learn about pandas.DataFrame.merge, which is useful for diffing tables when you use indicator=True.
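For anyone who wants to reproduce that kind of diff, here is a rough sketch of the pandas.DataFrame.merge approach mentioned above. The CSV file names are hypothetical and the columns follow the tables shown here:

import pandas as pd

old = pd.read_csv("md-vaccinations-2021-02-28.csv")
new = pd.read_csv("md-vaccinations-2021-03-04.csv")

cols = ["Vaccination Date", "County", "Daily First Dose",
        "Cumulative First Dose", "Daily Second Dose", "Cumulative Second Dose"]

# an outer merge on all shared columns, with indicator=True, labels each row
# as present in the left file, the right file, or both
diff = old[cols].merge(new[cols], how="outer", indicator=True)

added = diff[diff["_merge"] == "right_only"]   # only in the March 4 download
deleted = diff[diff["_merge"] == "left_only"]  # only in the February 28 download
print(len(added), "rows added,", len(deleted), "rows deleted")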
At any rate, having students notice, measure and document anomalies like this seems pretty useful. I also asked them to speculate about what kinds of activities could generate these errors. I meant speculate in the speculative fiction sense of imagining a specific scenario that caused it. I think this made some students scratch their heads a bit, because I wasn't asking them for the cause, but to invent a possible cause. Based on the results so far I'd like to incorporate more of these speculative exercises concerned with the functioning of code and data representations into my teaching. I want to encourage students to think creatively about data processing as they learn about the nuts and bolts of how code operates. For example the treatments in How to Run a City Like Amazon, and Other Fables use sci-fi to test ideas about how information technologies are deployed in society. Another model is the Speculative Ethics Book Club, which also uses sci-fi to explore the ethical and social consequences of technology. I feel like I need to read up on speculative research more generally before doing this though (Michael & Wilkie, 2020). I'd also like to focus the speculation down at the level of the code or data processing, rather than at the macro super-system level. But that has its place too. Another difference is that I was asking students to engage in speculation about the past rather than the future. How did the data end up this way? Perhaps this is more of a genealogical approach, of winding things backwards, and tracing what is known. Maybe it's more Mystery than Sci-Fi. The speculative element is important because (in this case) operations at the MD Dept of Health, and their ArcGIS Hub setup, are mostly opaque to us. But even when access isn't a problem these systems can feel opaque, because rather than there being a dearth of information you are drowning in it. Speculation is a useful abductive approach to hypothesis generation and, hopefully, understanding. Update 2021-03-17: Over in the fediverse David Benque recommended I take a look at Matthew Stanley's chapter in (Gitelman, 2013), "Where Is That Moon, Anyway? The Problem of Interpreting Historical Solar Eclipse Observations," for the connection to Mystery. For the connection to Peirce and abduction he also pointed to Luciana Parisi's chapter "Speculation: A method for the unattainable" in Lury & Wakeford (2012). Definitely things to follow up on! References Gitelman, L. (Ed.). (2013). "Raw data" is an oxymoron. MIT Press. Lury, C., & Wakeford, N. (2012). Inventive methods: The happening of the social. Routledge. Michael, M., & Wilkie, A. (2020). Speculative research. In The Palgrave encyclopedia of the possible (pp. 1–8). Cham: Springer International Publishing. Retrieved from https://doi.org/10.1007/978-3-319-98390-5_118-1 Recovering Foucault I've been enjoying reading David Macey's biography of Michel Foucault, which was republished in 2019 by Verso. Macey himself is an interesting figure, both a scholar and an activist who took leave from academia to do translation work and to write this biography and others of Lacan and Fanon. One thing that struck me as I'm nearing the end of Macey's book is the relationship between Foucault and archives. I think Foucault has become emblematic of a certain brand of literary analysis of "the archive" that is far removed from the research literature of archival studies, while using "the archive" as a metaphor (Caswell, 2016). I've spent much of my life working in libraries and digital preservation, and now studying and teaching about them from the perspective of practice, so I am very sympathetic to this critique. It is perhaps ironic that the disconnect between these two bodies of research is a difference in discourse, which Foucault himself brought attention to. At any rate, the thing that has struck me while reading this biography is how much time Foucault himself spent working in libraries and archives. Here's Foucault in his own words talking about his thesis: In Histoire de la folie à l'âge classique I wished to determine what could be known about mental illness in a given epoch … An object took shape for me: the knowledge invested in complex systems of institutions. And a method became imperative: rather than perusing … only the library of scientific books, it was necessary to consult a body of archives comprising decrees, rules, hospital and prison registers, and acts of jurisprudence. It was in the Arsenal or the Archives Nationales that I undertook the analysis of a knowledge whose visible body is neither scientific nor theoretical discourse, nor literature, but a daily and regulated practice. (Macey, 2019, p. 94) Foucault didn't simply use archives for his research: understanding the processes and practices of archives was integral to his method. Even though the theory and practice of libraries and archives are quite different, given their different functions and materials, they are often lumped together as a convenience in the same buildings. Macey blurs them a little bit, in sections like this where he talks about how important libraries were to Foucault's work: Foucault required access to Paris for a variety of reasons, not least because he was also teaching part-time at ENS.
The putative thesis he had begun at the Fondation Thiers – and which he now described to Polin as being on the philosophy of psychology – meant that he had to work at the Bibliothèque Nationale, and he had already become one of its habitués. For the next thirty years, Henri Labrouste's great building in the rue de Richelieu, with its elegant pillars and arches of cast iron, would be his primary place of work. His favourite seat was in the hemicycle, the small, raised section directly opposite the entrance, sheltered from the main reading room, where a central aisle separates rows of long tables subdivided into individual reading desks. The hemicycle affords slightly more quiet and privacy. For thirty years, Foucault pursued his research here almost daily, with occasional forays to the manuscript department and to other libraries, and contended with the Byzantine cataloguing system: two incomplete and dated printed catalogues supplemented by cabinets containing countless index cards, many of them inscribed with copperplate handwriting. Libraries were to become Foucault's natural habitat: 'those greenish institutions where books accumulate and where there grows the dense vegetation of their knowledge' There's a metaphor for you: libraries as vegetation :) It kind of reminds me of some recent work looking at decentralized web technologies in terms of mushrooms. But I digress. I really just wanted to note here that the erasure of archival studies from humanities research about "the archive" shouldn't really be attributed to Foucault, whose own practice centered the work of libraries and archives. Foucault wasn't just writing about an abstract archive; he was practically living out of real ones. As someone who has worked in libraries and archives I can appreciate how power users (pun intended) often knew aspects of the holdings and the intricacies of their management better than I did. Archives, when they are working, are always collaborative endeavours, and the important thing is to recognize and attribute the various sides of that collaboration. PS. Writing this blog post led me to dig up a few things I want to read (Eliassen, 2010; Radford, Radford, & Lingel, 2015). References Caswell, M. (2016). The archive is not an archives: On acknowledging the intellectual contributions of archival studies. Reconstruction, 16(1). Retrieved from http://reconstruction.eserver.org/Issues/161/Caswell.shtml Eliassen, K. (2010). Archives of Michel Foucault. In E. Røssaak (Ed.), The archive in motion: New conceptions of the archive in contemporary thought and new media practices. Novus Press. Macey, D. (2019). The lives of Michel Foucault: A biography. Verso. Radford, G. P., Radford, M. L., & Lingel, J. (2015). The library as heterotopia: Michel Foucault and the experience of library space. Journal of Documentation, 71(4), 733–751. Teaching OOP in the Time of COVID I've been teaching a section of the Introduction to Object Oriented Programming at the UMD College for Information Studies this semester. It's difficult for me, and for the students, because we are remote due to the Coronavirus pandemic. The class is largely asynchronous, but every week I've been holding two synchronous live coding sessions in Zoom to discuss the material and the exercises. These have been fun because the students are sharp, and haven't been shy about sharing their screen and their VSCode session to work on the details.
But students need quite a bit of self-discipline to move through the material, and probably only about 1/4 of the students take advantage of these live sessions. I'm quite lucky because I'm working with a set of lectures, slides and exercises that have been developed over the past couple of years by other instructors: Josh Westgard, Aric Bills and Gabriel Cruz. You can see some of the public-facing materials here. Having this backdrop of content, combined with Severance's excellent (and free) Python for Everybody, has allowed me to focus more on my live sessions, on responsive grading, and also to spend some time crafting additional exercises that are geared to this particular moment. This class is in the College for Information Studies and not in the Computer Science Department, so it's important for the students to not only learn how to use a programming language, but to understand programming as a social activity, with real political and material effects in the world. Being able to read, understand, critique and talk about code and its documentation is just as important as being able to write it. In practice, out in the "real world" of open source software, I think these aspects are arguably more important. One way I've been trying to do this in the first few weeks of class is to craft a sequence of exercises that form a narrative around Coronavirus testing and data collection to help remind the students of the basics of programming: variables, expressions, conditionals, loops, functions, files. In the first exercise we imagined a very simple data entry program that needed to record results of real-time polymerase chain reaction (RT-PCR) tests. I gave them the program and described how it was supposed to work, and asked them to describe (in English) any problems that they noticed and to submit a version of the program with the problems fixed. I also asked them to reflect on a request from their boss about adding the collection of race, gender and income information. The goal here was to test their ability to read the program and write English about it while also demonstrating a facility for modifying the program. Most importantly I wanted them to think about how inputs such as race or gender have questions about categories and standards behind them, and aren't simply a matter of syntax. The second exercise builds on the first by asking them to adjust the revised program to be able to save the data in a very particular format (in the first exercise the data was simply stored in memory and printed to the screen in aggregate at the end). The scenario here is that the Department of Health and Human Services has assumed the responsibility for COVID test data collection from the Centers for Disease Control. Of course this really happened, but the data format I chose was completely made up (maybe we will be working with some real data at the end of the semester if I continue with this theme). The goal in this exercise was to demonstrate their ability to read another program and fit a function into it. The students were given a working program that had a save_results() function stubbed out. In addition to submitting their revised code I asked them to reflect on some limitations of the data format chosen, and the data processing pipeline that it was a part of.
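The post doesn't spell out the made-up HHS format, so purely as an illustration of the shape of the exercise, a stubbed-out save_results() might get filled in with something like this (the field names here are hypothetical, not the ones used in class):

import csv

def save_results(results, path="rt_pcr_results.csv"):
    """Write a list of test result dicts out in the (made-up) reporting format."""
    fields = ["patient_id", "collection_date", "ct_value", "result"]
    with open(path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=fields)
        writer.writeheader()
        for row in results:
            writer.writerow({field: row.get(field, "") for field in fields})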
And in the third exercise I asked them to imagine that the lab they were working in had a scientist who discovered a problem with some of the thresholds for acceptable testing, which required an update to the program from Exercise 2, and also a test suite to make sure the program was behaving properly. In addition to writing the tests I asked them to reflect on what functionality was not being tested that probably should be. This alternation between writing code and writing prose is something I started doing as part of a Digital Curation class. I don't know if this dialogical, or perhaps dialectical, approach is something others have tried. I should probably do some research to see. In my last class I alternated week by week: one week reading and writing code, the next week reading and writing prose. This semester I've stayed focused on code, but required the reading and writing of code as well as prose about code in the same week. I hope to write more about how this goes, and about these exercises, as I go. I'm not sure if I will continue with the Coronavirus data examples. One thing I'm sensitive to is that my students themselves are experiencing the effects of the Coronavirus, and may want to escape it just for a bit in their school work. Just writing in the open about it here, in addition to the weekly meetings I've had with Aric, Josh and Gabriel, has been very useful. Speaking of those meetings: I learned today from Aric that tomorrow (February 20th, 2021) is the 30th anniversary of Python's first public release! You can see this reflected in this timeline. This v0.9.1 release was the first release Guido van Rossum made outside of CWI and was made on the Usenet newsgroup alt.sources, where it was split out into chunks that need to be reassembled. Back in 2009 Andrew Dalke located and repackaged these sources in Google Groups, which acquired alt.sources as part of DejaNews in 2001. But if you look at the time stamp on the first part of the release you can see that it was made February 19, 1991 (not February 20). So I'm not sure if the birthday is actually today. I sent this little note out to my students with this wonderful two-part oral history that the Computer History Museum did with Guido van Rossum a couple years ago. It turns out both of his parents were atheists and pacifists. His dad went to jail because he refused to be conscripted into the military. That and many more details of his background and thoughts about the evolution of Python can be found in these delightful interviews: Happy Birthday Python! GPT-3 Jam One of the joys of pandemic academic life has been a true feast of online events to attend, on a wide variety of topics, some of which are delightfully narrow and esoteric. Case in point was today's Reflecting on Power and AI: The Case of GPT-3, which lived up to its title. I'll try to keep an eye out for when the video posts, and update here. The workshop was largely organized around an exploration of whether GPT-3, the largest known machine learning language model, changes anything for media studies theory, or if it amounts to just more of the same. So the discussion wasn't focused so much on what games could be played with GPT-3, but rather on whether GPT-3 changes the rules of the game for media theory at all. I'm not sure there was a conclusive answer at the end, but it sounded like the consensus was that current theorization around media is adequate for understanding GPT-3, but it matters greatly what theory or theories are deployed.
The online discussion after the presentations indicated that attendees didn't see this as merely a theoretical issue, but one that has direct social and political impacts on our lives. James Steinhoff looked at GPT-3 from a Marxist media theory perspective, where he told the story of GPT-3 as a project of OpenAI and as a project of capital. OpenAI started with much fanfare in 2015 as a non-profit initiative where the technology, algorithms and models developed would be kept openly licensed and freely available so that the world could understand the benefits and risks of AI technology. Steinhoff described how in 2019 the project's needs for capital (compute power and staff) transitioned it from a non-profit into a capped-profit company, which is now owned, or at least controlled, by Microsoft. The code for generating the model, as well as the model itself, is gated behind a token-driven Web API run by Microsoft. You can get on a waiting list to use it, but apparently a lot of people have been waiting a while, so … Being a Microsoft employee probably helps. I grabbed a screenshot of the pricing page that Steinhoff shared during his presentation: I'd be interested to hear more about how these tokens operate. Are they per-request, or are they measured according to something else? I googled around a bit during the presentation to try to find some documentation for the Web API, and came up empty-handed. I did find Shreya Shankar's gpt3-sandbox project for interacting with the API in your browser (mostly for iteratively crafting text input in order to generate desired output). It depends on the openai Python package created by OpenAI themselves. The docs for openai then point at a page on the openai.com website which is behind a login. You can create an account, but you need to be pre-approved (made it through the waitlist) to be able to see the docs. There's probably some sense that can be made from examining the Python client though. All of the presentations in some form or another touched on the 175 billion parameters that were used to generate the model. But the API to the model doesn't have that many parameters: it allows you to enter text and get text back. Still, the API surface that the GPT-3 service provides could be interesting to examine a bit more closely, especially to track how it changes over time. In terms of how this model mediates knowledge and understanding it'll be important to watch. Steinhoff's message seemed to be that, despite the best of intentions, GPT-3 functions in the service of very large corporations with very particular interests. One dimension that he didn't explore, perhaps because of time, is how the GPT-3 model itself is fed massive amounts of content from the web, or the commons. Indeed 60% of the data came from the CommonCrawl project. GPT-3 is an example of an extraction project that has been underway at large Internet companies for some time. I think the critique of these corporations has often been confined to seeing them in terms of surveillance capitalism rather than in terms of raw resource extraction, or the primitive accumulation of capital. The behavioral indicators of who clicked on what are certainly valuable, but GPT-3 and sister projects like CommonCrawl show that just the accumulation of data with modest amounts of metadata can be extremely valuable.
This discussion really hit home for me since I've been working with Jess Ogden and Shawn Walker using CommonCrawl as a dataset for talking about the use of web archives, while also reflecting on the use of web archives as data. CommonCrawl provides a unique glimpse into some of the data operations that are at work in the accumulation of web archives. I worry that the window is closing and that CommonCrawl itself will be absorbed into Microsoft. Following Steinhoff, Olya Kudina and Bas de Boer jointly presented some compelling thoughts about how it's important to understand GPT-3 in terms of sociotechnical theory, using ideas drawn from Foucault and Arendt. I actually want to watch their presentation again because it followed a very specific path that I can't do justice to here. But their main argument seemed to be that GPT-3 is an expression of power, and that where there is power there is always resistance to power. GPT-3 can and will be subverted and used to achieve particular political ends of our own choosing. Because of my own dissertation research I'm partial to Foucault's idea of governmentality, especially as it relates to ideas of legibility (Scott, 1998)–the who, what and why of legibility projects, aka archives. GPT-3 presents some interesting challenges in terms of legibility because the model is so complex that the results it generates defy deductive logic and auditing. In some ways GPT-3 obscures more than it makes a population legible, as Foucault moved from disciplinary analysis of the subject to the ways in which populations are described and governed through the practices of pastoral power, of open datasets. Again the significance of CommonCrawl as an archival project, as a web legibility project, jumps to the fore. I'm not as up on Arendt as I should be, so one outcome of their presentation is that I'm going to read her The Human Condition, which they had in a slide. I'm long overdue. References Scott, J. C. (1998). Seeing like a state: How certain schemes to improve the human condition have failed. Yale University Press. mimetypes Today I learned that Python has a mimetypes module, and has ever since Guido van Rossum added it in 1997. Honestly I'm just a bit sheepish to admit this discovery, as someone who has been using Python for digital preservation work for about 15 years. But maybe there's a good reason for that. Since the entire version history for Python is available on GitHub (which is a beautiful thing in itself) you can see that the mimetypes module started as a guess_type() function built around a pretty simple hard-coded mapping of file extensions to mimetypes. The module also includes a little bit of code to look for, and parse, mimetype registries that might be available on the host operating system. The initial mimetype registries used included one from the venerable Apache httpd web server, and one from the Netscape web browser, which was about three years old at the time. It makes sense why this function to look up a mimetype for a filename would be useful at that time, since Python was being used to serve up files on the nascent web and for sending email, and whatnot. Today the module looks much the same, but has a few new functions and about twice as many mimetypes in its internal list. Some of the new mimetypes include text/csv, audio/mpeg, application/vnd.ms-powerpoint, application/x-shockwave-flash, application/xml, and application/json. Comparing the first commit to the latest provides a thumbnail sketch of 25 years of web format evolution.
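For a quick feel for the module discussed above, the standard library calls look like this (the exact return values can vary a little between Python versions):

import mimetypes

print(mimetypes.guess_type("report.csv"))             # e.g. ('text/csv', None)
print(mimetypes.guess_type("backup.tar.gz"))          # e.g. ('application/x-tar', 'gzip')
print(mimetypes.guess_extension("application/json"))  # e.g. '.json'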
I'll admit, this is a bit of an esoteric thing to be writing a blog post about. So I should explain. At work I've been helping out on a community archiving project which has accumulated a significant number of photographs, scans, documents of various kinds, audio files and videos. Some of these files are embedded in web applications like Omeka, some are in cloud storage like Google Drive, or on the office network-attached storage, and others are on scattered storage devices in people's desk drawers and closets. We've also created new files during community digitization events, and oral history interviews. As part of this work we've wanted to start building a place on the web where all these materials live. This has required not only describing the files, but also putting all the files in one place so that access can be provided. In principle this sounds simple. But it turns out that collecting the files from all these diverse locations poses significant challenges, because their context matters. The filenames, and the directories they are found in, are sometimes the only descriptive metadata that exists for this data. In short, the original order matters. But putting this content on the web means that the files need to be brought together and connected with their metadata programmatically. This is how I stumbled across the mimetypes module. I've been writing some throwaway code to collect the files together into the same directory structure while preserving their original filenames and locations in an Airtable database. I've been using the magic module to identify the format of each file, which is used to copy the file into a Dropbox storage location. The extension is important because we are expecting this to be a static site serving up the content, and we want the files to also be browsable using the Dropbox drive. It turns out mimetypes.guess_extension() is pretty useful for turning a mediatype into a file extension. I'm kind of surprised that it took me this long to discover mimetypes, but I'm glad I did. As an aside, I think this highlights for me how important Git can be as an archive and research method for software studies work. Northwest Branch Cairn Here is a short recording and a couple photos from my morning walk along the Northwest Branch trail with Penny. I can't go every day but at 7 months old she has tons of energy, so it's generally a good idea for all concerned to go at least every other morning. And it's a good thing, because the walk is surprisingly peaceful, and it's such a joy to see her run through the woods. After walking about 30 minutes there is this little cairn that is a reminder for me to turn around. After seeing it grow in size I was sad to see it knocked down one day. But, ever so slowly, it is getting built back up again. inkdroid-org-4669 ---- None inkdroid-org-616 ---- 856 Toggle Navigation inkdroid About Bookmarks Photos Music Software Social Talks 856 April 27, 2021 metadata Coincidence?
Unless otherwise noted all the content here is licensed CC-BY inkdroid-org-798 ---- None inkdroid-org-8502 ---- None inkdroid-org-8885 ---- None inkdroid-org-9563 ---- inkdroid Toggle Navigation inkdroid About Bookmarks Photos Music Software Social Talks 2021-04-27 ~ 856 2021-04-07 ~ twarc2 2021-03-27 ~ $ j 2021-03-19 ~ Strengths and Weaknesses 2021-03-16 ~ Data Speculation 2021-02-26 ~ Recovering Foucault 2021-02-19 ~ Teaching OOP in the Time of COVID 2021-02-18 ~ GPT-3 Jam 2021-02-13 ~ mimetypes 2021-02-12 ~ Northwest Branch Cairn 2021-02-11 ~ Blow back derelict wind 2021-02-04 ~ Outgoing 2021-01-21 ~ Trump's Tweets 2020-12-31 ~ noarchive 2020-12-23 ~ What's the diff? 2020-12-07 ~ 25 for 2020 2020-12-05 ~ Diss Music 2020-12-02 ~ 25 Years of robots.txt 2020-12-01 ~ Curation Communities 2020-11-30 ~ Mystery File! 2020-11-27 ~ Kettle 2020-11-24 ~ Static-Dynamic 2020-11-08 ~ Dark Reading 2020-10-28 ~ Seeing Software 2020-10-16 ~ Curating Corpora 2020-10-14 ~ Fuzzy 2020-10-11 ~ Penny 2020-10-09 ~ Fuzzy File Formats 2020-09-26 ~ Pandoc 2020-09-26 ~ Fuzzy Matching 2020-09-22 ~ Less is (sometimes) More 2020-09-20 ~ Teaching Digital Curation 2020-09-08 ~ RSS 2020-09-05 ~ Organizations on Twitter 2020-09-03 ~ BibDesk, Zotero and JabRef 2020-09-02 ~ Disinformation Metadata 2020-08-30 ~ Equipment 2020-08-27 ~ Twitter 2020-08-26 ~ Music for Hard Times 2020-08-23 ~ Digital Curation 2020-08-22 ~ Dependency Hell 2020-08-14 ~ Keyboard 2020-08-06 ~ Tech Tree 2020-07-02 ~ Appraisal Talk in Web Archives 2020-06-16 ~ Talk Talk 2020-06-07 ~ Original Voice 2020-06-02 ~ Write It Down 2020-05-15 ~ Sun and Moon 2020-05-07 ~ First Thought 2020-04-23 ~ Studying the COVID-19 Web « Prev 1 2 3 4 5 6 7 8 9 10 11 12 13 Next » Unless otherwise noted all the content here is licensed CC-BY inkdroid-org-9635 ---- None invidious-xyz-1352 ---- Collaborations Workshop 2021 - Keynotes Live Stream - Invidious true Invidious Log in Collaborations Workshop 2021 - Keynotes Live Stream Video unavailable. Watch on YouTube Show annotations Download is disabled. 190 0 0 Genre: Family friendly? No Wilson score: 0.0 Rating: 0.0 / 5 Engagement: 0.0% SoftwareSaved Subscribe | - Shared March 30, 2021 Hi! Looks like you have JavaScript turned off. Click here to view comments, keep in mind they may take a bit longer to load. 
invidious-xyz-3206 ---- Collaborations Workshop 2021 - Panel Live Stream - Invidious
Collaborations Workshop 2021 - Panel Live Stream, SoftwareSaved, shared March 31, 2021
Current version: 0.20.1-99ba987 @ master ipfs-io-5542 ---- IPFS Powers the Distributed Web IPFS About Install Docs Team Blog Help IPFS powers the Distributed Web A peer-to-peer hypermedia protocol designed to make the web faster, safer, and more open. Get started How it works View more Disable animation The web of tomorrow needs IPFS today IPFS aims to surpass HTTP in order to build a better web for all of us. Today's web is inefficient and expensive HTTP downloads files from one computer at a time instead of getting pieces from multiple computers simultaneously. Peer-to-peer IPFS saves big on bandwidth — up to 60% for video — making it possible to efficiently distribute high volumes of data without duplication. Today's web can't preserve humanity's history The average lifespan of a web page is 100 days before it's gone forever. It's not good enough for the primary medium of our era to be this fragile. IPFS keeps every version of your files and makes it simple to set up resilient networks for mirroring data. Today's web is centralized, limiting opportunity The Internet has turbocharged innovation by being one of the great equalizers in human history — but increasing consolidation of control threatens that progress. IPFS stays true to the original vision of an open, flat web by delivering technology to make that vision a reality. Today's web is addicted to the backbone IPFS powers the creation of diversely resilient networks that enable persistent availability — with or without Internet backbone connectivity. This means better connectivity for the developing world, during natural disasters, or just when you're on flaky coffee shop wi-fi. Install IPFS Join the future of the web right now — just choose the option that's right for you. Store and share files IPFS Desktop IPFS for everyone The desktop app offers menubar/tray shortcuts and an easy interface for adding, pinning, and sharing files — plus a full IPFS node ready for heavy-duty hosting and development too. A great choice for devs and non-devs alike. Get IPFS Desktop Command-line install All IPFS, no frills Just want IPFS in your terminal? Get step-by-step instructions for getting up and running on the command line using the Go implementation of IPFS. Includes directions for Windows, macOS, and Linux. Get the CLI IPFS Companion Add IPFS to your browser Get ipfs:// URL support and much more in your web browser with this extension. Get Companion IPFS Cluster For servers or big data Automatically allocate, replicate, and track your data as pinsets across multiple IPFS nodes. Get Cluster Build with IPFS Go implementation The original IPFS, with core implementation, daemon server, CLI tooling, and more. Get go-ipfs JS implementation Written entirely in JavaScript for a world of possibilities in browser implementations. Get js-ipfs Here's how IPFS works Take a look at what happens when you add a file to IPFS. Your file, and all of the blocks within it, is given a unique fingerprint called a cryptographic hash. IPFS removes duplications across the network. Each network node stores only content it is interested in, plus some indexing information that helps figure out which node is storing what. When you look up a file to view or download, you're asking the network to find the nodes that are storing the content behind that file's hash. You don't need to remember the hash, though — every file can be found by human-readable names using a decentralized naming system called IPNS. Take a closer look Want to dig in? 
Check out the docs Hands-on learner? Explore ProtoSchool Curious where it all began? Read the whitepaper IPFS can help here and now No matter what you do with the web, IPFS helps make it better today. Archivists IPFS provides deduplication, high performance, and clustered persistence — empowering you to store the world's information for future generations. Service providers Providing large amounts of data to users? IPFS offers secure, peer-to-peer content delivery — an approach that could save you millions in bandwidth costs. Researchers If you're working with or distributing large data sets, IPFS can help provide fast performance and decentralized archiving. Developing world High-latency networks are a big barrier for those with poor internet infrastructure. IPFS provides resilient access to data independent of latency or backbone connectivity. Blockchains With IPFS, you can address large amounts of data and put immutable, permanent links in transactions — timestamping and securing content without having to put the data itself on-chain. Content creators IPFS brings the freedom and independent spirit of the web in full force — and can help you deliver your content at a much lower cost. Who's already using IPFS? Companies and organizations worldwide are already building amazing things on IPFS. See the list News and more IPFS blog 07 April 2021 Welcome to IPFS Weekly 130 07 April 2021 Meet the New IPFS Blog & News 05 April 2021 Storing NFTs on IPFS 31 March 2021 Welcome to IPFS Weekly 129 In the media TechCrunch Why The Internet Needs IPFS Before It's Too Late Motherboard IPFS Wants to Create a Permanent Web MakeUseOf Faster, Safer, Decentralized Internet With IPFS Videos Why IPFS? Developers Speak: Building on IPFS More videos Stay on top of the latest Sign up for the IPFS Weekly newsletter to get project updates, community news, event details, and more. In your inbox, each Tuesday. © Protocol Labs | Except as noted, content licensed CC-BY 3.0.

islandora-ca-1879 ---- Islandora Open Meeting: April 27, 2021 | Islandora
Islandora Open Meeting: April 27, 2021
We are happy to announce the date of our next Open Meeting! Join us on April 27, 2021 any time between 10:00-2:00pm EDT. The Open Meetings are drop-in style sessions where users of all levels and abilities gather to ask questions, share use cases and get updates on Islandora. There will be experienced Islandora 8 users on hand to answer questions or give demos. We would love for you to join us any time during the 4-hour window, so feel free to pop by any time! More details about the Open Meeting, and the Zoom link to join, are in this Google doc. Registration is not required. If you would like a calendar invite as a reminder, please let us know at community@islandora.ca. Submitted by agriffith on Tue, 04/13/2021 - 16:11
islandora-ca-4289 ---- Islandorans Unite! It's Release Time | Islandora
Islandorans Unite! It's Release Time
It's that time again everyone! Our amazing community contributors have made all sorts of improvements and upgrades to Islandora. Some have been merged, but some are still hanging out, waiting for the love they need to make it into the code base. We're calling on you - yes you! - to help us get things merged, tested, documented, and released to the world. I would like to kick off this release cycle with a sprint to mop up some of the amazing improvements that still have unmerged pull requests. Did you know that we have pull requests for an advanced search module and a basic batch ingest form just lounging around? And that's not all. There are all kinds of great improvements that just need some time and attention. A little code review and some basic testing by others are all that is needed before we freeze the code and start turning the crank on the release process. Here's a rough timetable for the release:
April 19 - 30th: Code Sprint
May 3rd: Code Freeze
May 3rd - 14th: Testing, bug fixing, responding to feedback
May 17th - 28th: Documentation sprint
May 31st - June 18th: More testing, bug fixing, and responding to feedback
June 21st - July 2nd: Testing sprint
Release!
This is, of course, an optimistic plan. If major issues are discovered we will take the time to address them, which can affect the timeline. I also plan on liaising with the Documentation Interest Group and folks from the Users' Call / Open Meetings for the documentation and testing sprints, and their availabilities may nudge things a week in either direction. An open and transparent release process is one of the hallmarks of our amazing community. If you or your organization have any interest in helping out, please feel free to reach out or sign up for any of the upcoming sprints. There are plenty of opportunities to contribute regardless of your skill set or level of experience with Islandora. There's something for everyone! We'll make further announcements for the other sprints, but you can sign up for the code sprint now using our sign up sheet. Hope to see you there! Submitted by dlamb on Mon, 03/29/2021 - 19:07

islandora-ca-5670 ---- Islandora
What's New, manez, Tue, 05/12/2020 - 14:05
Our website has been overhauled in a big way. We have moved to Drupal 8, changed our look, and shifted content around to make it easier to find the Islandora information and resources that you need. Can't find something you expect from the old site? Let us know and we'll get it fixed.
Islandora Open Meeting: April 27, 2021, agriffith, Tue, 04/13/2021 - 16:11
Upcoming DIG Sprint, agriffith, Thu, 04/08/2021 - 20:03
Community Announcement, agriffith, Wed, 03/31/2021 - 16:49
As you know, the Islandora Foundation has recently updated its governance structure to remain compliant with Canadian non-profit regulations. Islandora Foundation members approved these changes at the Annual General Meeting in early March. A summary of these changes is provided here, as well as our emerging roadmap for moving forward. A newly formed "Leadership Group", composed of representatives from our Partner-level member organizations, replaces the pre-existing Board of Directors, and a smaller Board of Directors remains responsible for Islandora's administrative and fiscal responsibilities. This Leadership Group met for the first time on Friday, March 26th to begin to discuss their goals going forward, and the ways the Leadership Group will interact with the other governance structures of the Islandora community. The Leadership Group immediately affirmed their commitment to transparent communication and collaboration with the vibrant, robust Islandora community and will be creating a Terms of Reference over the next month. The Terms of Reference will be written with agility and transformation in mind, as we work together to secure a strong future for both the community and codebase. In the meantime, please let us know if you have any questions regarding the formation of the Leadership Group, and stay tuned to hear more about the initial goals of this group.
Islandorans Unite! It's Release Time, dlamb, Mon, 03/29/2021 - 19:07
Islandora Open Meeting: March 30, 2021, agriffith, Wed, 03/24/2021 - 16:26
We will be holding our next Open Meeting on Tuesday, March 30 from 10:00 AM to 2:00 PM Eastern. Full details, and the Zoom link to join, are in this Google doc. The meeting is drop-in and will be free form, with experienced Islandora 8 users on hand to answer questions or give demos on request. We would love for you to join us any time during the 4-hour window, so feel free to pop by any time. Registration is not required. If you would like a calendar invite as a reminder, please let us know at community@islandora.ca.
ISLE: Now with Islandora 8, dlamb, Tue, 03/23/2021 - 20:12
The Islandora Foundation is pleased to announce that ISLE for Islandora 8 has gone alpha and is now available! What is ISLE? ISLE (short for ISLandora Enterprise) is "Dockerized" Islandora, and seeks to create community managed infrastructure, streamlining the installation and maintenance of an Islandora repository. With ISLE, the bulk of your repository's infrastructure is managed for you, and updates are as easy as pulling in new Docker images. System administrators are only responsible for maintaining and updating their Drupal site, and can rely on ISLE to handle Fedora, Solr, the triplestore, and all the other services we use to run a digital repository. The project began as a Mellon grant funded initiative by the Islandora Collaboration Group back in 2017 for Islandora 7.
Then in January 2020, the ICG, Born Digital, Lyrasis, CWRC, and the Islandora Foundation got together and started working on a version for Islandora 8.  This version would be a full community project, worked on in the open and residing in the Islandora-Devops Github organization. What are the benefits of using ISLE? On top of being easier to install, run, and update, there are many awesome reasons to use ISLE for running Islandora.  First and foremost: speed. Simply put, ISLE is fast! Installation time is simply the amount of time it takes to download the images from Dockerhub.  For those who are building the images themselves, ISLE takes advantage of Docker's buildkit feature for blazing fast builds.  A complete rebuild of the entire stack consistently takes less than ten minutes on my laptop.  And for small tweaks to the environment, builds often take seconds to make a change. Compared to our Ansible playbook, which usually takes around 45 minutes for me, this is a significant boost to productivity when testing/deploying changes! Because it's so quick, it lends itself well to automation using CI/CD tools like Github Actions and Gitlab. The Islandora Foundation is "dogfooding" with ISLE, putting it at the center of its deployment strategy for future.islandora.ca and release testing. ISLE is also cross-platform. It is the first and only community supported way to run Islandora on a Windows machine. Any Windows computer with WSL2 can build and run ISLE.  ISLE also supports ARM builds, and can be run on cheaper cloud resources, newer Macs with M1 chips, and even (theoretically) Raspberry Pis. How can I get ISLE? Docker images for Islandora 8 are automatically pushed to Dockerhub and are available here. If you want to run them using docker-compose, you can use isle-dc to build yourself a sandbox or a local development environment.  Upcoming Sprint: Metadata Upcoming Sprint: Metadata dlamb Wed, 02/24/2021 - 16:12 Body Our very own Metadata Interest Group is running a sprint from March 8th to the 19th, and everyone's invited to participate.  We'll be auditing the default metadata fields that we ship with and comparing them to the excellent metadata profile the MIG has worked so hard to create for us. The goal of the sprint is just to find out where the gaps are so we know the full scope of work needed to implement their recommendations.  If you can navigate the Drupal fields UI (or just want to learn!), contributing is easy and would be super helpful to us. NO PROGRAMMING REQUIRED. And if you don't have an Islandora 8 instance to work on (or are having a hard time installing one), we're making a fresh sandbox just for the sprint. Also, Islandora Foundation staff (a.k.a. me) and representatives from the MIG will be on hand to help out and answer any questions you may have. You can sign up for the sprint here, and choose a metadata field to audit in this spreadsheet.  As always, commit to as much or as little as you like.  It only takes a couple minutes to check out a field and its settings to see if they line up with the recommendations. If we get enough folks to sign up, then many hands will make light work of this task! This is yet another sign of the strength of our awesome community.  An interest group is taking it upon themselves to run a sprint to help achieve their goals, and the Islandora Foundation couldn't be happier to help. 
If you're a member of an interest group and want help engaging the community to make your goals happen, please feel free to reach out on Slack or email me (dlamb@islandora.ca).
Islandora Open Meeting: February 23, 2021, manez, Wed, 02/03/2021 - 19:09
We will be holding another open drop-in session on Tuesday, February 23 from 10:00 AM to 2:00 PM Eastern. Full details, and the Zoom link to join, are in this Google doc. The meeting is free form, with experienced Islandora 8 users on hand to answer questions or give demos on request. Please drop in at any time during the four-hour window. Registration is not required. If you would like a calendar invite as a reminder, please let us know at community@islandora.ca.
Islandora Open Meeting: January 28, 2021, manez, Thu, 01/14/2021 - 15:55
We will be holding another open drop-in session on January 28th from 10:00 AM to 2:00 PM Eastern. Full details, and the Zoom link to join, are in this Google doc. The meeting is free form, with experienced Islandora 8 users on hand to answer questions or give demos on request. Please drop in at any time during the four-hour window. Registration is not required. If you would like a calendar invite as a reminder, please let us know at community@islandora.ca.

islandora-ca-7106 ---- Upcoming DIG Sprint | Islandora
Upcoming DIG Sprint
The Islandora Documentation Interest Group is holding a sprint! To support the upcoming release of Islandora, the DIG has planned a 2-week documentation, writing-and-updating sprint to occur as part of the release process. To prepare for that effort, we're going to spend April 19 – 30th on an Auditing Sprint, where volunteers will review existing documentation and complete this spreadsheet, providing a solid overview of the current status of our docs so we know where to best deploy our efforts during the release. This sprint will run alongside the upcoming Pre-Release Code Sprint, so if you're not up for coding, auditing docs is a great way to contribute during sprint season! We are looking for volunteers to sign up to take on two sprint roles:
Auditor: Review a page of documentation and fill out a row in the spreadsheet indicating things like the current status ('Good Enough' or 'Needs Work'), the goal for that particular page (e.g., "Explain how to create an object," or "Compare Islandora 7 concepts to Islandora 8 concepts"), and the intended audience (beginners, developers, etc.).
Reviewer: Read through a page that has been audited and indicate if you agree with the auditor's assessment, adding additional notes or suggestions as needed; basically, give a second set of eyes on each page.
You can sign up for the sprint here, and sign up for individual pages here. Submitted by agriffith on Thu, 04/08/2021 - 20:03

isni-org-9658 ---- ISNI | Home Page
ABOUT ISNI
ISNI is the ISO certified global standard number for identifying the millions of contributors to creative works and those active in their distribution, including researchers, inventors, writers, artists, visual creators, performers, producers, publishers, aggregators, and more. As ISO 27729, it is part of a family of international standard identifiers that includes identifiers of works, recordings, products and right holders in all repertoires, e.g. DOI, ISAN, ISBN, ISRC, ISSN, and ISWC. The mission of the ISNI International Agency (ISNI-IA) is to assign to the public name(s) of a researcher, inventor, writer, artist, performer, publisher, etc. a persistent unique identifying number in order to resolve the problem of name ambiguity in search and discovery, and to diffuse each assigned ISNI across all repertoires in the global supply chain so that every published work can be unambiguously attributed to its creator wherever that work is described. By achieving these goals, the ISNI will act as a bridge identifier across multiple domains and become a critical component in Linked Data and Semantic Web applications.
KEY STATISTICS
12.22 million: ISNI holds public records of more than 12 million identities
11.10 million: ISNI holds public records of over 11.10 million individuals (of which 2.93 million are researchers)
1.11 million: ISNI holds public records of 1,119,480 organizations
104 sources: The ISNI database is a cross-domain resource with direct contributions from 104 sources
NEWS
The British Library launches its ISNI Portal: A Brand New, Online Service for ISNI Users. We are delighted and privileged to announce that the British Library has now launched its online, all-in-one service for the International Standard Name...
BDS Builds a New Website for ISNI. BDSDigital, the web services and IT arm of BDS, has built a new ISNI website, which went live in June 2020. BDS also transferred existing content from the...
Music Industry ISNI Registrations Now Free and Automated. Sound Credit music credit cloud profile system offers world's only free and automated ISNI registration service. MEMPHIS, TENN., OCTOBER 23, 2020 – Every creative work...
ISNI International Agency (ISNI-IA) LIMITED. Registered address: c/o EDItEUR, United House, North Road, London, N7 9DP, UK. Company registration number: 07476425

iwatchafrica-org-721 ---- iWatch Home - iWatch Africa
Latest Articles: Transforming climate finance for debt-distressed economies during COVID-19; EC proposed Carbon Border Adjustment mechanism: Key considerations for Least Developed Countries; iWatch Africa marks 2021 Open Data Day with focus on women safety online; How Big Tech's Content Moderation Policies Could Jeopardize Users in Authoritarian Regimes; iWatch Africa launches its 2021 Policy Dialogue Series; Where women journalists in Ghana go to 'die'; Predictions for 2021: Digital Rights, Global Security, Climate Change & Expectations of the Biden Administration – Part 1; Stolen at sea: An investigation into illegal Chinese transhipment activities in Ghana and Nigeria; On the other side of Saiko; Value your personal and public integrity – Co-founder, iWatch Africa
Transforming climate finance for debt-distressed economies during COVID-19: One year after the World Health Organisation declared the COVID-19 disease as a global pandemic, many…
EC proposed Carbon Border Adjustment mechanism: Key considerations for Least Developed Countries: Although most nations recognise the need to transition to a decarbonised world, carbon tax policies…
iWatch Africa marks 2021 Open Data Day with focus on women safety online: iWatch Africa marked the 2021 Open Data Day last Saturday with a virtual event focused…
Ocean & Climate Action: Transforming climate finance for debt-distressed economies during COVID-19. One year after the World Health Organisation declared the COVID-19 disease as a global pandemic, many emerging markets and developing economies…
Ocean & Climate Action: EC proposed Carbon Border Adjustment mechanism: Key considerations for Least Developed Countries. Although most nations recognise the need to transition to a decarbonised world, carbon tax policies have usually encountered significant roadblocks,…
Digital Rights: iWatch Africa marks 2021 Open Data Day with focus on women safety online. iWatch Africa marked the 2021 Open Data Day last Saturday with a virtual event focused on leveraging data to promote…
Digital Rights: How Big Tech's Content Moderation Policies Could Jeopardize Users in Authoritarian Regimes. Social media advocates have historically lauded its ability to facilitate democratic progress by connecting people over space and time, enabling…
News: iWatch Africa launches its 2021 Policy Dialogue Series. iWatch Africa has launched its 2021 Policy Dialogue Series which seeks to bring diverse experts and stakeholders across the world…
Most Read: Transforming climate finance for debt-distressed economies during COVID-19 17 seconds ago EC proposed Carbon Border Adjustment mechanism: Key considerations for Least Developed Countries 2 weeks ago iWatch Africa marks 2021 Open Data Day with focus on women safety online March 11, 2021 Watch Video: iWatch Africa Open Data Day Event 2021 March 6, 2021 Open Data Day 2021: iWatch Africa to focus on safety of women journalists & equal development online
March 1, 2021
iWatch Video Playlist (20 videos): 1 Inside iWatch Africa's Digital Rights Campaign 01:11 2 iWatch Digital Rights Campaign. What is doxxing and the effects of doxxing? 00:40 3 Gideon Sarpong, Policy and News Director at iWatch Africa interviewed on Plus TV Africa, Nigeria 06:07 4 iWatch Africa campaign against domestic violence. 00:29 5 iWatch Africa investigation into use of corporal punishment in Ghana 00:50 6 iWatch Africa campaign against online trolling and impersonation (cyber-stalking) 00:46 7 iWatch Africa's video highlighting the negative impact of abuse on journalists 00:40 8 iWatch Africa campaign against cyberstalking and its impact on journalists and rights activists 00:36 9 Together Against Corruption iWatch Africa, Socioserve & JMK 01:52 10 Budget Tracking: Key educational commitments to be tracked in 2018 01:22 11 Budget Tracking: iWatch Africa Budget Tracking 2018 Health 01:14 12 iWatch Africa: Ministry of Finance released close to GH¢ 30 million to the Electoral Commission 02:59 13 iWatch Africa assessment of GoG commitment in education 2017 03:21 14 iWatch Africa third-quarter assessment of GoG commitments - Health Sector, 2017 02:40 15 iWatch Africa assessment of Planting for Food and Jobs Program 2017 04:02 16 A new age for Data Journalism | Nana Boakye-Yiadom 10:06 17 Government promise to distribute school uniforms and sandals yet to take off 02:04 18 iWatch Progress Report: One District, One Factory Initiative (1D1F) 02:12 19 iWatch Africa: Over GH¢600 million in off shore accounts at risk of abuse & recovery 03:35 20 iWatch Review: How Assemblies in Ghana mismanaged their Common Fund-Ranking 04:00
Popular Posts: Watch Africa Missing Gold: How Ghana lost over $6 billion in gold export revenue to major trading partners May 29, 2018 iWatch Africa joins the World Economic Forum 1 Trillion Trees Initiative as part of our Climate Action March 9, 2020 Full List: Volta Region ranked 1st for mismanagement of Assemblies' Common Fund August 23, 2017 Third Quarter Assessment of the 'One Village One Dam' Promise October 16, 2017 Parents with wards in Class A schools must be allowed to pay fees - Ass. Headmaster Mfantsipim August 24, 2017
Most Commented: 17 seconds ago Transforming climate finance for debt-distressed economies during COVID-19 August 10, 2017 Everybody is affected by climate change [Infographic] August 11, 2017 Reasons Journalists should use data to improve their stories August 11, 2017 Ghana's shameful record in child marriages [Infographics] August 12, 2017 Meet Maukeni Padiki Kodjo, an iWatch Africa Transparency Launch facilitator August 12, 2017 Meet Sandister Tei, an iWatch Africa Transparency Project facilitator

jakoblog-de-6672 ---- Jakoblog — Das Weblog von Jakob Voß
The first explicit design of a digital library (1959), 18 March 2018 at 23:38, 3 comments
I have been researching digital libraries (once again) and wondered when the term was first used.
According to Google Books, "digital library" first turns up in 1959 (after weeding out false positives) in a report for the US State Department. I have entered the bibliographic data in Wikidata. The report, "The Need for Fundamental Research in Seismology", was produced to investigate how seismic waves could be used to detect nuclear weapons tests. In Appendix 19, John Gerrard, one of fourteen scientists involved in the study, laid out on two pages the case for a computing center with an IBM 704. Since this US government document is in the public domain, here are the relevant pages:

The planned digital library is a collection of research data together with scientific software for gaining new insights from that data:

The following facilities should be available:
A computer equivalent to the IBM 704 series, plus necessary peripheral equipment.
Facilities for converting standard seismograms into digital form.
A library of records of earthquakes and explosions in form suitable for machine analysis.
A (growing) library of basic programs which have proven useful in investigations of seismic disturbances and related phenomena. …

Sounds rather topical, doesn't it? I also liked the description of the computing center as an "open shop" and the remark that "nothing can dampen enthusiasm for new ideas quite as effectively as long periods of waiting time". In the text the term "digital library" refers primarily to the collection of digitized seismograms. At the end of the recommendation the term "digitized library" is used instead, which suggests that the two terms were used synonymously. Interestingly, "library" also refers to the collection of computer programs. Unfortunately I could not find out whether the recommended computing center with its digital library was ever built (probably not).

About the author: I know little more about Dr. John Gerrard than that in 1957 he worked as Director of Data Systems and Earth Science Research at Texas Instruments (TI). TI was founded in 1930 as "Geophysical Service Incorporated" for the seismic exploration of oil deposits, and in 1965 it received the government contract to monitor nuclear weapons tests (Project Vela Uniform). A former colleague remembers Gerrard in this interview:

John Gerrard: into digital seismology, and he could see a little bit of the future of digital processing and he talked about how that could be effective in seismology, he was right that this would be important in seismology

There is a geologist of the same name in Birmingham, but he was not born until 1944. I suspect that Gerrard was involved at TI in the development of the Texas Instruments Automatic Computer (TIAC), which was built specifically for the digital processing of seismic data.

Incidentally, the use of computers in traditional libraries only came with the next generation of machines: the MARC format was developed in the 1960s on the IBM System/360 (by Henriette Avram, who had previously worked with the IBM 701 at the NSA). Before that there was the fictional library computer EMMARAC (a nod to ENIAC and UNIVAC) in "Desk Set" ("Eine Frau, die alles weiß"), with Katharine Hepburn as the librarian and Spencer Tracy as the computer salesman. Up to the end of the 1980s, by the way, the term "digital library" appears only sporadically in Google Books.

Tags: digital library, history. 3 comments.

Data models age like parents, 15
March 2018 at 21:51, no comments

Denny Vrandečić, employed as an ontologist at Google, noticed that all six of the six linked data applications linked to eight years ago (IWB, Tabulator, Disko, Marbles, rdfbrowser2, and Zitgist) have disappeared or changed their calling syntax. This reminded me of a proverb about software and data: software ages like fish, data ages like wine. The original form of this saying seems to come from James Governor (@monkchips), who in 2007 derived it from an earlier phrase: Hardware is like fish, operating systems are like wine. The analogy of fishy applications and delightful data has been repeated and explained and criticized several times. I fully agree with the part about software rot, but I doubt that data actually ages like wine (I'd prefer whisky anyway). A more accurate simile may be "data ages like things you put into your crowded cellar and then forget about". Thinking a lot about data I found that data is less interesting than the structures and rules that shape and restrict it: data models, ontologies, schemas, forms etc. How do they age compared with software and data? I soon realized: data models age like parents. First they guide you, give good advice, and support you as best as they can. But at some point data begin to rebel against their models. Sooner or later parents become uncool, disconnected from current trends, outdated or even embarrassing. Eventually you have to accept their quaint peculiarities and live your own life. That's how standards proliferate. Both ontologies and parents ultimately become weaker and need support. And in the end you have to let them go, sadly looking back. (The analogy could be extended further, for instance data models might be frustrated when confronted by how actual data compares to their ideals, but that's another story.)

Tags: Data Modeling. No comments.

In memoriam Ingetraut Dahlberg, 28 October 2017 at 09:24, 3 comments

The information scientist Ingetraut Dahlberg, known among other things as the founder of the International Society for Knowledge Organization (ISKO), died last week at the age of 91. My first reaction, after an appropriate moment of regret, was to enter the date of death in Wikipedia and Wikidata, but others had already taken care of that. So I browsed her biography a bit and instead created Wikidata items for the McLuhan Institute for Digital Culture and Knowledge Organization, to which Dahlberg had already bequeathed her library during her lifetime, but which was closed back in 2004. The former director Kim Veltman still runs a website about the institute and in his memoirs mentions Ingetraut Dahlberg, Douglas Engelbart, Ted Nelson and Tim Berners-Lee in the same breath. That alone should be reason enough for me to engage with her. To be honest, though, my relationship to Ingetraut Dahlberg was rather one of distanced ignorance. I knew of her importance in the knowledge organization scene, to which I inevitably belong as well, but I only met her once or twice at ISKO conferences and never had any interest in engaging with her more closely. To me as a "young wild one" she always seemed like a person whose time had long passed and whose contributions were hopelessly outdated.
That old ideas are by no means uninteresting or irrelevant within knowledge organization should really have been clear to me from my engagement with Ted Nelson and Paul Otlet; somehow, though, I have never found a point of connection to Dahlberg's work. Looking back, the trigger for my ignorance must lie in my first encounter with representatives of knowledge organization at an ISKO conference in the early 2000s: I was still a fresh student of library and information science with a computer science background, and everywhere I found exciting topics such as Wikipedia, social tagging and ontologies, all of which in principle had something to do with knowledge organization. At ISKO, by contrast, I found none of that. The Internet, at any rate, still seemed very far away. What I found alarming was not so much the lack of substantive engagement with what were then the newest developments on the net, but the formal strangeness: as I recall, several of the scientists involved did not even have an email address. People who dealt with information and knowledge in the early 2000s without email were people I simply could not take seriously. So in my ignorance ISKO long remained a relic that, much like the International Federation for Information and Documentation (FID, why did the two never join forces, anyway?), had been tragically overtaken by technical development. And for me Ingetraut Dahlberg stood as the very example of this failure of a profession. By now I see things in a more nuanced way and am glad to be part of this small but fine community (and once ISKO finally switches to open access I will also give up my publication boycott). In any case I did Ingetraut Dahlberg an injustice, and I hope for more nuanced engagements with her work.

Tags: obituary. 3 comments.

Wikidata documentation on the 2017 Hackathon in Vienna, 21 May 2017 at 15:21, 2 comments

At Wikimedia Hackathon 2017, a couple of volunteers sat together to work on the help pages of Wikidata. As part of that Wikidata documentation sprint, Ziko and I took a look at the Wikidata glossary. We identified several shortcomings and made a list of rules for how the glossary should look. The result is the glossary guidelines. Where the old glossary partly replicated Wikidata:Introduction, the new version aims to allow quick lookup of concepts. We have already rewritten some entries of the glossary according to these guidelines, but several entries are outdated and still need to be improved. We changed the structure of the glossary into a sortable table so it can be displayed as an alphabetical list in all languages. The entries can still be translated with the translation system (it took some time to get familiar with this feature). We also created some missing help pages such as Help:Wikimedia and Help:Wikibase to explain general concepts with regard to Wikidata. Some of these concepts are already explained elsewhere, but Wikidata needs at least short introductions written especially for Wikidata users. Image taken by Andrew Lih (CC-BY-SA)

Tags: Wikidata, wmhack. 2 comments.

Introduction to Phabricator at Wikimedia Hackathon, 20 May 2017 at 09:44, 1 comment

This weekend I am participating in the Wikimedia Hackathon in Vienna. I am mostly contributing to Wikidata-related events and practicing the phrase "long time no see", but I am also looking into some introductory talks.
In the late afternoon of day one I attended an introduction to the Phabricator project management tool given by André Klapper. Phabricator was introduced at the Wikimedia Foundation about three years ago to replace and unify Bugzilla and several other management tools. Phabricator is much more than an issue tracker for software projects (although it is mainly used for this purpose by Wikimedia developers). In summary there are tasks, projects, and teams. Tasks can be tagged, assigned, followed, discussed, and organized with milestones and workboards. The latter are Kanban boards like those I know from Trello, waffle, and GitHub project boards. Phabricator is open source, so you can self-host it and add your own user management without having to pay for each new user and feature (I am looking at you, JIRA). Internally I would like to use Phabricator, but for fully open projects I don't see enough benefit compared to using GitHub. P.S.: Wikimedia Hackathon is also organized with Phabricator. There is also a task for blogging about the event.

Tags: Wikimedia, wmhack. 1 comment.

Some thoughts on IIIF and Metadata, 5 May 2017 at 22:40, no comments

Yesterday at the DINI AG KIM Workshop 2017, Martin Baumgartner and Stefanie Rühle gave an introduction to the International Image Interoperability Framework (IIIF) with a focus on metadata. I already knew that IIIF is a great technology for providing access to (especially large) images, but I had not had a detailed look yet. The main part of IIIF is its Image API and I hope that all major media repositories (I am looking at you, Wikimedia Commons) will implement it. In addition the IIIF community has defined a "Presentation API", a "Search API", and an "Authentication API". I understand the need for such additional APIs within the IIIF community, but I doubt that solving the underlying problems with their own standards (instead of reusing existing standards) is the right way to go. Standards should rather "Do One Thing and Do It Well" (Unix philosophy). If images are the "One Thing" of IIIF, then search and authentication are a different matter. In the workshop we only looked at parts of the Presentation API to see where metadata (creator, dates, places, provenance etc. and structural metadata such as lists and hierarchies) could be integrated into IIIF. Such metadata is already expressed in many other formats such as METS/MODS and TEI, so the question is not whether to use IIIF or other metadata standards but how to connect IIIF with existing metadata standards. A quick look at the Presentation API revealed, to my surprise, that the metadata element is explicitly not intended for additional metadata but only "to be displayed to the user". The element contains an ordered list of key-value pairs that "might be used to convey the author of the work, information about its creation, a brief physical description, or ownership information, amongst other use cases". At the same time the standard emphasizes that "there are no semantics conveyed by this information". Hello, McFly? Without semantics conveyed it isn't information! In particular there is no such thing as structured data (e.g. a list of key-value pairs) without semantics. I think the design of field metadata in IIIF is based on a common misconception about the nature of (meta)data, which I already wrote about elsewhere (sorry, German article – some background in my PhD and found by Ballsun-Stanton). In a short discussion on Twitter, Rob Sanderson (Getty) pointed out that the data format of the IIIF Presentation API for describing intellectual works (called a manifest) is expressed in JSON-LD, so it can be extended by other RDF statements. For instance the field "license" is already defined with dcterms:rights. Adding a field "author" for dcterms:creator only requires defining this field in the JSON-LD @context of a manifest. After some experimenting I found a possible way to connect the "meaningless" metadata field with JSON-LD fields:

{
  "@context": [
    "http://iiif.io/api/presentation/2/context.json",
    {
      "author": "http://purl.org/dc/terms/creator",
      "bibo": "http://purl.org/ontology/bibo/"
    }
  ],
  "@id": "http://example.org/iiif/book1/manifest",
  "@type": ["sc:Manifest", "bibo:book"],
  "metadata": [
    {
      "label": "Author",
      "property": "http://purl.org/dc/terms/creator",
      "value": "Allen Smithee"
    },
    {
      "label": "License",
      "property": "http://purl.org/dc/terms/license",
      "value": "CC-BY 4.0"
    }
  ],
  "license": "http://creativecommons.org/licenses/by/4.0/",
  "author": {
    "@id": "http://www.wikidata.org/entity/Q734916",
    "label": "Allen Smithee"
  }
}

This solution requires an additional element property in the IIIF specification to connect a metadata field with its meaning. IIIF applications could then enrich the display of metadata fields, for instance with links or additional translations. In JSON-LD some names such as "CC-BY 4.0" and "Allen Smithee" need to be given twice, but this is ok because normal names (in contrast to field names such as "Author" and "License") don't have semantics.

Tags: iiif, Metadata. No comments.
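A minimal sketch of how a client could consume the proposed property key, assuming a manifest like the one above has been saved to a local file; the helper name and the file name are made up for illustration and are not part of any IIIF library.

    import json

    def semantic_metadata(manifest: dict) -> dict:
        """Collect the display-oriented metadata pairs, keyed by the
        proposed 'property' URI where present, otherwise by the label."""
        pairs = {}
        for entry in manifest.get("metadata", []):
            key = entry.get("property", entry["label"])
            pairs[key] = entry["value"]
        return pairs

    with open("manifest.json") as f:   # assumed local copy of the manifest shown above
        manifest = json.load(f)

    print(semantic_metadata(manifest))
    # {'http://purl.org/dc/terms/creator': 'Allen Smithee',
    #  'http://purl.org/dc/terms/license': 'CC-BY 4.0'}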
Spare parts from the 3D printer, 30 December 2014 at 10:43, 2 comments

Crash, bang, boom! The window blind is down. A small plastic part broke off, and that would be a perfect use case for a 3D printer, wouldn't it? For quite a while I have been toying with the idea of getting a 3D printer, but I can't really say what for. Producing spare parts with a 3D printer strikes me as a promise rather like the intelligent fridge: great in theory, but not really practical. It would probably take me hours to find the right part on platforms like Thingiverse or to construct it myself in CAD. Without reliable 3D models even the best 3D printer is of no use, so the devices are only one part of the solution for producing spare parts. I very much doubt that manufacturers will offer 3D models of their products for download any time soon, unless they are open hardware. Apart from electronics tinkering projects, though, the range of open hardware products for household use is still very limited. Nevertheless I think that open hardware, that is, products whose plans are freely licensed and available at no cost, together with standardized components, is the only right approach for using 3D printers in the home. For now I will tackle the problem with the broken blind using analog technology and see what suitable materials and tools I have lying around. Maybe gaffer tape will help?

Tags: 3D printer, maker, open hardware. 2 comments.

The simplest project homepage on GitHub, 24 September 2014 at 09:57, 1 comment

The simplest form of a project homepage on GitHub Pages consists of a start page that merely points to the repository. Locally such a page can be set up as follows: 1.
Create the new, empty branch gh-pages:
git checkout --orphan gh-pages
git rm -rf .
2. Create the file index.md with the following content:
---
---
# {{site.github.project_title}}
[{{site.github.repository_url}}]({{site.github.repository_url}}#readme).
3. Add the file and push it to GitHub:
git add index.md
git commit -m "homepage"
git push origin gh-pages

Tags: github. 1 comment.

Abbreviated URIs with rdfns, 9 September 2014 at 11:26, 4 comments

Working with RDF and URIs can be annoying because URIs such as "http://purl.org/dc/elements/1.1/title" are long and difficult to remember and type. Most RDF serializations make use of namespace prefixes to abbreviate URIs, for instance "dc" is frequently used to abbreviate "http://purl.org/dc/elements/1.1/", so "http://purl.org/dc/elements/1.1/title" can be written as the qualified name "dc:title". This simplifies working with URIs, but someone still has to remember the mappings between prefixes and namespaces. Luckily there is a registry of common mappings at prefix.cc. A few years ago I created the simple command line tool rdfns and a Perl library to look up URI namespace/prefix mappings. Meanwhile the program is also available as the Debian and Ubuntu package librdf-ns-perl. The newest version (not included in Debian yet) also supports reverse lookup to abbreviate a URI to a qualified name. Features of rdfns include:

look up namespaces (as RDF/Turtle, RDF/XML, SPARQL…):
$ rdfns foaf.ttl foaf.xmlns dbpedia.sparql foaf.json
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
xmlns:foaf="http://xmlns.com/foaf/0.1/"
PREFIX dbpedia:
"foaf": "http://xmlns.com/foaf/0.1/"

expand a qualified name:
$ rdfns dc:title
http://purl.org/dc/elements/1.1/title

look up a preferred prefix:
$ rdfns http://www.w3.org/2003/01/geo/wgs84_pos#
geo

create a short qualified name for a URL:
$ rdfns http://purl.org/dc/elements/1.1/title
dc:title

I use RDF-NS for all RDF processing to improve readability and to avoid typing long URIs. For instance Catmandu::RDF can be used to parse RDF into a very concise data structure:
$ catmandu convert RDF --file rdfdata.ttl to YAML

Tags: Perl, rdf. 4 comments.
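The core idea of rdfns is easy to illustrate outside of Perl. Here is a small Python sketch in which a hard-coded prefix map stands in for the prefix.cc registry that rdfns actually consults; the function names are mine and not part of the tool.

    # Hard-coded prefix/namespace pairs stand in for a prefix.cc lookup.
    PREFIXES = {
        "dc": "http://purl.org/dc/elements/1.1/",
        "foaf": "http://xmlns.com/foaf/0.1/",
        "geo": "http://www.w3.org/2003/01/geo/wgs84_pos#",
    }

    def expand(qname: str) -> str:
        """Expand a qualified name such as 'dc:title' to a full URI."""
        prefix, local = qname.split(":", 1)
        return PREFIXES[prefix] + local

    def abbreviate(uri: str) -> str:
        """Abbreviate a full URI to a qualified name if a namespace matches."""
        for prefix, namespace in PREFIXES.items():
            if uri.startswith(namespace):
                return prefix + ":" + uri[len(namespace):]
        return uri

    print(expand("dc:title"))                            # http://purl.org/dc/elements/1.1/title
    print(abbreviate("http://xmlns.com/foaf/0.1/name"))  # foaf:name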
Even with access to Wikipedia the number is different for every person, since not all content is available in all languages and since much content is incomprehensible without prior knowledge and thus practically inaccessible. The numbers for the individual accessibility of the world's knowledge can now be plotted, sorted, in a diagram that lists all people from left (maximum knowledge) to right (no knowledge accessible). As Denny illustrates with the following image, the Wikimedia community can approach its goal in different ways: (1) Expanding many articles in a complex specialist field or a small language benefits only a few people. (2) Alternatively, the most important articles and topics could be improved and extended in languages that are understood by many people. (3) Finally, Wikimedia can also make sure that more people get access to Wikimedia content at all, for example through initiatives such as Wikipedia Zero. I consider the representation proposed by Denny helpful for getting beyond simply counting Wikipedia articles. As he himself admits, however, there are numerous open questions, since the actual numbers for the availability of knowledge cannot easily be determined. In my opinion, a fundamental problem is that knowledge, and especially the entire knowledge of humanity, cannot be quantified. It is also misleading to assume that the Wikimedia products collect or contain knowledge. Perhaps this error does not matter for the metric, but it does matter for what is actually supposed to be measured (the accessibility of the world's knowledge). If Wikimedia is interested in an unobstructed view of the question of how much of humanity's knowledge is made accessible to people through its offerings, it might help to ask a few philosophers. Seriously. It may be (and with my abandoned philosophy studies I suspect as much) that in the end it only becomes clear why the whole Wikimedia project cannot be realized; but even insights into the possible reasons for this failure would be helpful. Presumably, though, it is too frowned upon to seriously ask philosophers for advice, or the remaining philosophers prefer to occupy themselves with other questions. P.S.: Another discipline relevant for answering the question of how much of the world's knowledge is made accessible to humanity by Wikipedia & Co is pedagogy, but I know even less about that than about philosophy.

Tags: Freie Inhalte, Wikipedia, Wissensordnung 2 comments
jakoblog-de-7417 ----
    Warning: "continue" targeting switch is equivalent to "break". Did you mean to use "continue 2"? in /kunden/116716_10965/jakoblog.de/wp/wp-content/plugins/mendeleyplugin/wp-mendeley.php on line 548

    Warning: Cannot modify header information - headers already sent by (output started at /kunden/116716_10965/jakoblog.de/wp/wp-content/plugins/mendeleyplugin/wp-mendeley.php:548) in /kunden/116716_10965/jakoblog.de/wp/wp-includes/feed-atom.php on line 8
en – Jakoblog Das Weblog von Jakob Voß 2018-03-18T21:52:22Z http://jakoblog.de/feed/atom/ WordPress jakob <![CDATA[Data models age like parents]]> http://jakoblog.de/?p=1499 2018-03-15T19:51:45Z 2018-03-15T19:51:45Z Denny Vrandečić, employed as an ontologist at Google, noticed that all six of the six linked data applications linked to 8 years ago (IWB, Tabulator, Disko, Marbles, rdfbrowser2, and Zitgist) have disappeared or changed their calling syntax. This reminded me of a proverb about software and data:

    software ages like fish, data ages like wine.


The original form of this saying seems to come from James Governor (@monkchips), who derived it in 2007 from an earlier phrase:

    Hardware is like fish, operating systems are like wine.

    The analogy of fishy applications and delightful data has been repeated and explained and criticized several times. I fully agree with the part about software rot but I doubt that data actually ages like wine (I’d prefer Whisky anyway). A more accurate simile may be „data ages like things you put into your crowded cellar and then forget about“.

    Thinking a lot about data I found that data is less interesting than the structures and rules that shape and restrict data: data models, ontologies, schemas, forms etc. How do they age compared with software and data? I soon realized:

    data models age like parents.

First they guide you, give good advice, and support you as best they can. But at some point data begin to rebel against their models. Sooner or later parents become uncool, disconnected from current trends, outdated or even embarrassing. Eventually you have to accept their quaint peculiarities and live your own life. That’s how standards proliferate. Both ontologies and parents ultimately become weaker and need support. And in the end you have to let them go, sadly looking back.

(The analogy could be extended further; for instance, data models might be frustrated when confronted with how actual data compares to their ideals, but that’s another story.)

    ]]>
    0
jakob <![CDATA[Wikidata documentation on the 2017 Hackathon in Vienna]]> http://jakoblog.de/?p=1490 2017-05-21T13:47:47Z 2017-05-21T13:21:39Z At Wikimedia Hackathon 2017, a couple of volunteers sat together to work on the help pages of Wikidata. As part of that Wikidata documentation sprint, Ziko and I took a look at the Wikidata glossary. We identified several shortcomings and made a list of rules for how the glossary should look. The result is the glossary guidelines. Where the old glossary partly replicated Wikidata:Introduction, the new version aims to allow quick lookup of concepts. We already rewrote some entries of the glossary according to these guidelines, but several entries are outdated and still need to be improved. We changed the structure of the glossary into a sortable table so it can be displayed as an alphabetical list in all languages. The entries can still be translated with the translation system (it took some time to get familiar with this feature).

    We also created some missing help pages such as Help:Wikimedia and Help:Wikibase to explain general concepts with regard to Wikidata. Some of these concepts are already explained elsewhere but Wikidata needs at least short introductions especially written for Wikidata users.

    Image taken by Andrew Lih (CC-BY-SA)

    ]]>
    2
jakob <![CDATA[Introduction to Phabricator at Wikimedia Hackathon]]> http://jakoblog.de/?p=1484 2017-05-20T07:47:48Z 2017-05-20T07:44:30Z This weekend I participate in the Wikimedia Hackathon in Vienna. I mostly contribute to Wikidata-related events and practice the phrase "long time no see", but I also look into some introductory talks.

In the late afternoon of day one I attended an introduction to the Phabricator project management tool given by André Klapper. Phabricator was introduced at the Wikimedia Foundation about three years ago to replace and unify Bugzilla and several other management tools.

Phabricator is much more than an issue tracker for software projects (although it is mainly used for this purpose by Wikimedia developers). In summary there are tasks, projects, and teams. Tasks can be tagged, assigned, followed, discussed, and organized with milestones and workboards. The latter are Kanban boards like those I know from Trello, Waffle, and GitHub project boards.

    Phabricator is Open Source so you can self-host it and add your own user management without having to pay for each new user and feature (I am looking at you, JIRA). Internally I would like to use Phabricator but for fully open projects I don’t see enough benefit compared to using GitHub.

    P.S.: Wikimedia Hackathon is also organized with Phabricator. There is also a task for blogging about the event.

    ]]>
    1
jakob <![CDATA[Some thoughts on IIIF and Metadata]]> http://jakoblog.de/?p=1476 2017-05-05T20:40:59Z 2017-05-05T20:40:59Z Yesterday at the DINI AG KIM Workshop 2017, Martin Baumgartner and Stefanie Rühle gave an introduction to the International Image Interoperability Framework (IIIF) with a focus on metadata. I already knew that IIIF is a great technology for providing access to (especially large) images, but I had not had a detailed look yet. The main part of IIIF is its Image API and I hope that all major media repositories (I am looking at you, Wikimedia Commons) will implement it. In addition the IIIF community has defined a „Presentation API“, a „Search API“, and an „Authentication API“. I understand the need for such additional APIs within the IIIF community, but I doubt that solving the underlying problems with their own standards (instead of reusing existing standards) is the right way to go. Standards should better „Do One Thing and Do It Well“ (Unix philosophy). If images are the „One Thing“ of IIIF, then search and authentication are a different matter.

In the workshop we only looked at parts of the Presentation API to see where metadata (creator, dates, places, provenance etc. and structural metadata such as lists and hierarchies) could be integrated into IIIF. Such metadata is already expressed in many other formats such as METS/MODS and TEI, so the question is not whether to use IIIF or other metadata standards but how to connect IIIF with existing metadata standards. Taking a quick look at the Presentation API, I was surprised to find out that the metadata element is explicitly not intended for additional metadata but only „to be displayed to the user“. The element contains an ordered list of key-value pairs that „might be used to convey the author of the work, information about its creation, a brief physical description, or ownership information, amongst other use cases“. At the same time the standard emphasizes that „there are no semantics conveyed by this information“. Hello, McFly? Without semantics conveyed it isn’t information! In particular there is no such thing as structured data (e.g. a list of key-value pairs) without semantics.

I think the design of the metadata field in IIIF is based on a common misconception about the nature of (meta)data, which I have already written about elsewhere (sorry, German article; some background can be found in my PhD thesis and in work by Ballsun-Stanton).

In a short discussion on Twitter, Rob Sanderson (Getty) pointed out that the data format of the IIIF Presentation API for describing intellectual works (called a manifest) is expressed in JSON-LD, so it can be extended with other RDF statements. For instance, the field „license“ is already defined with dcterms:rights. Adding a field „author“ for dcterms:creator only requires defining this field in the JSON-LD @context of a manifest. After some experimenting I found a possible way to connect the „meaningless“ metadata field with JSON-LD fields:

     { "@context": [ "http://iiif.io/api/presentation/2/context.json", { "author": "http://purl.org/dc/terms/creator", "bibo": "http://purl.org/ontology/bibo/" } ], "@id": "http://example.org/iiif/book1/manifest", "@type": ["sc:Manifest", "bibo:book"], "metadata": [ { "label": "Author", "property": "http://purl.org/dc/terms/creator", "value": "Allen Smithee" }, { "label": "License", "property": "http://purl.org/dc/terms/license", "value": "CC-BY 4.0" } ], "license": "http://creativecommons.org/licenses/by/4.0/", "author": { "@id": "http://www.wikidata.org/entity/Q734916", "label": "Allen Smithee" } } 

    This solution requires an additional element property in the IIIF specification to connect a metadata field with its meaning. IIIF applications could then enrich the display of metadata fields for instance with links or additional translations. In JSON-LD some names such as „CC-BY 4.0“ and „Allen Smithee“ need to be given twice, but this is ok because normal names (in contrast to field names such as „Author“ and „License“) don’t have semantics.
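The following short Python sketch illustrates how a viewer could make use of such a key. Note that the property key is only the extension proposed above, not part of the current IIIF Presentation API, and the manifest is assumed to have been parsed from JSON already.

# Illustrative sketch only: "property" is the extension proposed in this post,
# not a field of the official IIIF Presentation API.
import json

manifest_json = """
{
  "metadata": [
    {"label": "Author", "value": "Allen Smithee",
     "property": "http://purl.org/dc/terms/creator"},
    {"label": "License", "value": "CC-BY 4.0",
     "property": "http://purl.org/dc/terms/license"}
  ]
}
"""

manifest = json.loads(manifest_json)

# Render each metadata field; if a property URI is given, a viewer could turn
# the label into a link or look up a translation for it.
for field in manifest.get("metadata", []):
    label, value = field.get("label"), field.get("value")
    prop = field.get("property")
    line = f"{label}: {value}"
    if prop:
        line += f" ({prop})"
    print(line)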

    ]]>
    0
jakob <![CDATA[Abbreviated URIs with rdfns]]> http://jakoblog.de/?p=1459 2014-09-09T09:26:13Z 2014-09-09T09:26:13Z Working with RDF and URIs can be annoying because URIs such as „http://purl.org/dc/elements/1.1/title“ are long and difficult to remember and type. Most RDF serializations make use of namespace prefixes to abbreviate URIs; for instance „dc“ is frequently used to abbreviate „http://purl.org/dc/elements/1.1/“, so „http://purl.org/dc/elements/1.1/title“ can be written as the qualified name „dc:title“. This simplifies working with URIs, but someone still has to remember the mappings between prefixes and namespaces. Luckily there is a registry of common mappings at prefix.cc.

A few years ago I created the simple command line tool rdfns and a Perl library to look up URI namespace/prefix mappings. Meanwhile the program is also available as the Debian and Ubuntu package librdf-ns-perl. The newest version (not included in Debian yet) also supports reverse lookup to abbreviate a URI to a qualified name. Features of rdfns include:

    look up namespaces (as RDF/Turtle, RDF/XML, SPARQL…)

$ rdfns foaf.ttl foaf.xmlns dbpedia.sparql foaf.json
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
xmlns:foaf="http://xmlns.com/foaf/0.1/"
PREFIX dbpedia: <http://dbpedia.org/resource/>
"foaf": "http://xmlns.com/foaf/0.1/"

    expand a qualified name

     $ rdfns dc:title http://purl.org/dc/elements/1.1/title 

    lookup a preferred prefix

     $ rdfns http://www.w3.org/2003/01/geo/wgs84_pos# geo 

    create a short qualified name of an URL

     $ rdfns http://purl.org/dc/elements/1.1/title dc:title 

    I use RDF-NS for all RDF processing to improve readability and to avoid typing long URIs. For instance Catmandu::RDF can be used to parse RDF into a very concise data structure:

     $ catmandu convert RDF --file rdfdata.ttl to YAML 
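The underlying idea of such a lookup is simple. Here is a minimal sketch in Python, purely illustrative and not the actual RDF-NS API, using a small hand-maintained mapping instead of the full prefix.cc registry:

# Illustrative sketch of prefix expansion and reverse lookup; the real rdfns
# tool uses the prefix.cc registry instead of this hard-coded mapping.
NAMESPACES = {
    "dc":   "http://purl.org/dc/elements/1.1/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "geo":  "http://www.w3.org/2003/01/geo/wgs84_pos#",
}

def expand(qname):
    # dc:title -> http://purl.org/dc/elements/1.1/title
    prefix, local = qname.split(":", 1)
    return NAMESPACES[prefix] + local

def abbreviate(uri):
    # reverse lookup: pick the longest matching namespace
    matches = [(p, ns) for p, ns in NAMESPACES.items() if uri.startswith(ns)]
    if not matches:
        return uri
    prefix, ns = max(matches, key=lambda m: len(m[1]))
    return prefix + ":" + uri[len(ns):]

print(expand("dc:title"))
print(abbreviate("http://purl.org/dc/elements/1.1/title"))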
    ]]>
    4
    jakob <![CDATA[Testing command line apps with App::Cmd]]> http://jakoblog.de/?p=1435 2013-11-01T08:49:19Z 2013-11-01T08:49:19Z This posting has also been published at blogs.perl.org.

    Ricardo Signes‘ App::Cmd has been praised a lot so I gave it a try for my recent command line app. In summary, the module is great although I missed some minor features and documentation (reminder to all: if you miss some feature in a CPAN module, don’t create yet another module but try to improve the existing one!). One feature I like a lot is how App::Cmd facilitates writing tests for command line apps. After having written a short wrapper around App::Cmd::Tester my formerly ugly unit tests look very simple and clean. Have a look at this example:

use Test::More;
use App::PAIA::Tester;

new_paia_test;

# an empty configuration prints an empty JSON object
paia qw(config);
is stdout, "{}\n";
is error, undef;

# a missing config file results in an error and a non-zero exit code
paia qw(config -c x.json --verbose);
is error, "failed to open config file x.json\n";
ok exit_code;

# setting values creates and updates the config file
paia qw(config --config x.json --verbose foo bar);
is output, "# saved config file x.json\n";

paia qw(config foo bar);
paia qw(config base http://example.org/);
is exit_code, 0;
is output, '';

# read back the full configuration
paia qw(config);
is_deeply stdout_json, {
    base => 'http://example.org/',
    foo  => 'bar',
}, "get full config";

done_paia_test;

The application is called paia – that’s how it is called at the command line and that’s how it is simply called as a function in the tests. The wrapper class (here: App::PAIA::Tester) creates a singleton App::Cmd::Tester::Result object and exports its methods (stdout, stderr, exit_code…). This alone makes the test much more readable. The wrapper further exports two methods to set up a testing environment (new_paia_test) and to finish testing (done_paia_test). In my case the setup creates an empty temporary directory; other applications might clean up environment variables etc. Depending on your application you might also add some handy functions like stdout_json to parse the app’s output in a form that can better be tested.

    ]]>
    0
    jakob <![CDATA[My PhD thesis about data]]> http://jakoblog.de/?p=1422 2013-09-23T11:22:22Z 2013-09-23T07:03:55Z

    I have finally received paper copies of my PhD thesis „Describing Data Patterns“, published and printed via CreateSpace. The full PDF has already been archived as CC-BY-SA, but a paper print may still be nice and more handy (it’s printed as small paperback instead of the large A4-PDF). You can get a copy for 12.80€ or 12.24€ via Amazon (ISBN 1-4909-3186-4).

    I also set up a little website at aboutdata.org. The site contains an HTML view of the pattern language that I developed as one result of the thesis.

    I am sorry for not having written the thesis in Pandoc Markdown but in LaTeX (source code available at GitHub), so there is no EPUB/HTML version.

    ]]>
    3
    jakob <![CDATA[On the way to a library ontology]]> http://jakoblog.de/?p=1379 2013-04-11T13:02:50Z 2013-04-11T13:02:50Z I have been working for some years on specification and implementation of several APIs and exchange formats for data used in, and provided by libraries. Unfortunately most existing library standards are either fuzzy, complex, and misused (such as MARC21), or limited to bibliographic data or authority data, or both. Libraries, however, are much more than bibliographic data – they involve library patrons, library buildings, library services, library holdings, library databases etc.

During the work on formats and APIs for these parts of the library world, the Patrons Account Information API (PAIA) being the newest piece, I found myself more and more on the way to a whole library ontology. The idea of a library ontology started in 2009 (now moved to this location), but designing such a broad data model from the bottom up would surely have led to yet another complex, impractical and unused library standard. Meanwhile there are several smaller ontologies for parts of the library world, to be combined and used as Linked Open Data.

In my opinion, ontologies, RDF, Semantic Web, Linked Data and all the buzz is overrated, but it includes some opportunities for clean data modeling and data integration, which one rarely finds in library data. For this reason I try to design all APIs and formats to be at least compatible with RDF. For instance the Document Availability Information API (DAIA), created in 2008 (and now being slightly redesigned for version 1.0), can be accessed in XML and in JSON format, and both can fully be mapped to RDF. Other micro-ontologies include:

    • Document Service Ontology (DSO) defines typical document-related services such as loan, presentation, and digitization
• Simple Service Status Ontology (SSSO) defines a service instance as a kind of event that connects a service provider (e.g. a library) with a service consumer (e.g. a library patron). SSSO further defines typical service statuses (e.g. reserved, prepared, executed…) and limitations of a service (e.g. a waiting queue or a delay).
    • Patrons Account Information API (PAIA) will include a mapping to RDF to express basic patron information, fees, and a list of current services in a patron account, based on SSSO and DSO.
    • Document Availability Information API (DAIA) includes a mapping to RDF to express the current availability of library holdings for selected services. See here for the current draft.
    • A holdings ontology should define properties to relate holdings (or parts of holdings) to abstract documents and editions and to holding institutions.
    • GBV Ontology contains several concepts and relations used in GBV library network that do not fit into other ontologies (yet).
• One might further create a database ontology to describe library databases with their provider, extent, APIs etc. – right now we use the GBV ontology for this purpose. Is there anything to reuse instead of creating just another ontology?!
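To make the goal of RDF compatibility a bit more concrete, here is a small illustrative sketch with rdflib. The namespaces and property names are placeholders for whatever terms the micro-ontologies above actually define, so do not read them as the real DAIA, DSO or SSSO vocabulary.

# Illustrative sketch only: namespaces and property names below are
# placeholders, not the actual terms defined by DAIA, DSO or SSSO.
from rdflib import Graph, Namespace, URIRef

DAIA = Namespace("http://example.org/ns/daia#")  # placeholder namespace
DSO = Namespace("http://example.org/ns/dso#")    # placeholder namespace

g = Graph()
item = URIRef("http://example.org/item/123")
library = URIRef("http://example.org/library/1")

g.add((item, DAIA.heldBy, library))          # placeholder property
g.add((item, DAIA.availableFor, DSO.Loan))   # placeholder property and class

print(g.serialize(format="turtle"))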

The next step will probably be the creation of a small holdings ontology that nicely fits the other micro-ontologies. This ontology should be aligned or compatible with the BIBFRAME initiative, other ontologies such as Schema.org, and existing holdings formats, without becoming too complex. The German initiative DINI-KIM has just launched a working group to define such a holdings format or ontology.

    ]]>
    1
    jakob <![CDATA[Dead End Electronic Resource Citation (ERC)]]> http://jakoblog.de/?p=1376 2013-03-29T09:51:26Z 2013-03-29T09:51:26Z Tidying up my PhD notes, I found this short rant about „Electronic Resource Citation“. I have not used it anywhere, so I publish it here, licensed under CC-BY-SA.

    Electronic Resource Citation (ERC) was introduced by John Kunze with a presentation at the International Conference on Dublin Core and Metadata Applications 2001 and with a paper in the Journal of Digital Information, Vol. 2, No 2 (2002). Kunze cited his paper in a call for an ERC Interest Group within the Dublin Core Metadata Initiative (DCMI) at the PERL4LIB mailing list, giving the following example of an ERC:

     erc: Kunze, John A. | A Metadata Kernel for Electronic Permanence | 20011106 | http://jodi.ecs.soton.ac.uk/Articles/v02/i02/Kunze/ 

An ERC is a minimal „kernel“ metadata record that consists of four elements: who, what, when and where. In the given example they are:

who:   Kunze, John A.
what:  A Metadata Kernel for Electronic Permanence
when:  20011106
where: http://jodi.ecs.soton.ac.uk/Articles/v02/i02/Kunze/
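Just to make the structure concrete, a few lines of Python (a quick sketch, not an official ERC parser) can split the pipe-delimited example above into its four kernel elements:

# Quick sketch, not an official ERC parser: split the pipe-delimited record
# from the example above into its four kernel elements.
record = ("erc: Kunze, John A. | A Metadata Kernel for Electronic Permanence"
          " | 20011106 | http://jodi.ecs.soton.ac.uk/Articles/v02/i02/Kunze/")

label, _, body = record.partition(":")
who, what, when, where = (part.strip() for part in body.split("|"))
print({"who": who, "what": what, "when": when, "where": where})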

Ironically the given URL is obsolete: the host ‚jodi.ecs.soton.ac.uk‘ does not even exist anymore. The ERC is pretty useless if it just uses a fragile URL to cite a resource. How about some value that does not change over time, e.g.:

     where: Journal of Digital Information, Volume 2 Issue 2 

    As ERC is defined as „a location or machine-oriented identifier“, one could also use stable identifiers:

     where: ISSN 1368-7506, Article No. 81 

Both the ISSN and the article number are much better identifiers than URLs. Citing a URL is more like

     where: at the desk in the little reading room of my library 

    By the way the current location is http://www.rice.edu/perl4lib/archives/2002-09/msg00017.html – but who knows whether Texas A&M University will still host the journal at this URL in 20 years?

There are some interesting ideas in the original ERC proposal (different kinds of missing values, TEMPER date values, the four questions etc.), but its specification and implementation are just ridiculous and missing references to current technology (you know that you are doing something wrong in a specification if you start to define your own encodings for characters, dates etc. instead of concentrating on your core subject and referring to existing specifications for the rest). The current draft (2010) is a typical example of badly mixing modeling and encoding issues and of losing touch with existing, established data standards.

    In addition to problems at the „low level“ of encoding, the „high level“ of conceptual modeling lacks appropriate references. What about the relation of ERC concepts to models such as FRBR and CIDOC-CRM? Why are ‚who‘, ‚when‘, ‚where‘, ‚what‘ the important metadata fields (in many cases the most interesting question is ‚why‘)? How about Ranganathan’s colon classification with personality, matter, energy, space, and time?

    In summary the motivation behind ERC contains some good ideas, but its form is misdirected.

    ]]>
    0
    jakob <![CDATA[Access to library accounts for better user experience]]> http://jakoblog.de/?p=1362 2013-02-08T09:10:03Z 2013-02-08T09:10:03Z I just stumbled upon ReadersFirst, a coalition of (public) libraries that call for a better user experience for library patrons, especially to access e-books. The libraries regret that

    the products currently offered by e-content distributors, the middlemen from whom libraries buy e-books, create a fragmented, disjointed and cumbersome user experience.

    One of the explicit goals of ReadersFirst is to urge providers of e-content and integrated library systems for systems that allow users to

    Place holds, check-out items, view availability, manage fines and receive communications within individual library catalogs or in the venue the library believes will serve them best, without having to visit separate websites.

In a summary of the first ReadersFirst meeting on January 28, the president of Queens Library (NY) is quoted with the following request:

    The reader should be able to look at their library account and see what they have borrowed regardless of the vendor that supplied the ebook.

This goal matches well with my activity at GBV: as part of a project to implement a mobile library app, I designed an API to access library accounts. The Patrons Account Information API (PAIA) is currently being implemented and tested by two independent developers. It will also be used to provide a better user experience in VuFind discovery interfaces.

During the research for PAIA I was surprised by the lack of existing methods to access library patron accounts. Some library systems do not even provide an internal API to connect to the loan system – not to speak of a public API that could directly be used by patrons and third parties. The only example I could find was York University Libraries with a simple, XML-based, read-only API. This lack of public APIs to library patron accounts is disappointing, given that it’s almost ten years after the buzz around Web 2.0, service-oriented architecture, and mashups. All major providers of web applications (Google, Twitter, Facebook, StackExchange, GitHub etc.) support access to user accounts via APIs.

The Patrons Account Information API will hopefully fill this gap with defined methods to place holds and to view checked-out items and fines. PAIA is agnostic to specific library systems, aligned with similar APIs as listed above, and designed with RDF in mind (without any need to bother with RDF, apart from the requirement to use URIs as identifiers). Feedback and implementations are very welcome!
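As a rough illustration of what client access to such an API could look like, here is a hypothetical Python sketch; the base URL, path and response fields are placeholders, so consult the actual PAIA specification for the real methods and parameters.

# Hypothetical sketch of a client for a PAIA-style patron account API.
# Base URL, path and response fields are placeholders, not the official spec.
import requests

BASE = "https://example.org/paia/core"   # placeholder base URL
TOKEN = "secret-access-token"            # placeholder access token

def list_loans(patron_id):
    """Fetch the list of items currently on loan for a patron."""
    response = requests.get(
        f"{BASE}/{patron_id}/items",     # placeholder path
        headers={"Authorization": f"Bearer {TOKEN}"},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()

# Example call (would require a running service):
# for item in list_loans("alice").get("doc", []):
#     print(item)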

    ]]>
    5
    jamanetwork-com-2664 ---- None jodischneider-com-1630 ---- jodischneider.com/blog reading, technology, stray thoughts Blog About Categories argumentative discussions books and reading computer science Firefox future of publishing higher education information ecosystem Information Quality Lab news intellectual freedom iOS: iPad, iPhone, etc. library and information science math old newspapers PhD diary programming random thoughts reviews scholarly communication semantic web social semantic web social web Uncategorized Search Paid graduate hourly research position at UIUC for Spring 2021 December 3rd, 2020 by jodi Jodi Schneider’s Information Quality Lab (http://infoqualitylab.org) seeks a graduate hourly student for a research project on bias in citation networks. Biased citation benefits authors in the short-term by bolstering grants and papers, making them more easily accepted. However, it can have severe negative consequences for scientific inquiry. Our goal is to find quantitative measures of network structure that can indicate the existence of citation bias.  This job starts January 4, 2021. Pay depending on experience (Master’s students start at $18/hour). Optionally, the student can also take a graduate independent study course (generally 1-2 credits IS 589 or INFO 597). Apply on Handshake Responsibilities will include: Assist in the development of algorithms to simulate an unbiased network Carry out statistical significance tests for candidate network structure measures Attend weekly meetings Assist with manuscript and grant preparation Required Skills Proficiency in Python or R Demonstrated ability to systematically approach a simulation or modeling problem Statistical knowledge, such as developed in a course on mathematical statistics and probability (e.g. STAT400 Statistics and Probability I https://courses.illinois.edu/schedule/2021/spring/STAT/400 ) Preferred Skills Knowledge of stochastic processes Experience with simulation Knowledge of random variate generation and selection of input probability distribution Knowledge of network analysis May have taken classes such as STAT433 Stochastic Processes (https://courses.illinois.edu/schedule/2021/spring/STAT/433) or IE410 Advanced Topics in Stochastic Processes & Applications (https://courses.illinois.edu/schedule/2020/fall/IE/410) MORE INFORMATION: https://ischool.illinois.edu/people/jodi-schneider http://infoqualitylab.org APPLICATION DEADLINE: Monday December 14th. Apply on Handshake with the following APPLICATION MATERIALS: Resume Transcript – Such as free University of Illinois academic history from Banner self-service (https://apps.uillinois.edu, click “Registration & Records”, “Student Records and Transcripts”, “View Academic History”, choose “Web Academic History”) Cover letter: Just provide short answers to the following two questions: 1) Why are you interested in this particular project? 2) What past experience do you have that is related to this project?  Tags: citation bias, jobs, network analysis, statistical modeling Posted in Information Quality Lab news | Comments (0) Avoiding long-haul air travel during the COVID-19 pandemic October 28th, 2020 by jodi I would not recommend long-haul air travel at this time. An epidemiological study of a 7.5 hour flight from the Middle East to Ireland concluded that 4 groups (13 people), traveling from 3 continents in four groups, who used separate airport lounges, were likely infected in flight. 
The flight had 17% occupancy (49 passengers/283 seats; 12 crew) and took place in summer 2020. (Note: I am not an epidemiologist.) The study (published open access): Murphy Nicola, Boland Máirín, Bambury Niamh, Fitzgerald Margaret, Comerford Liz, Dever Niamh, O’Sullivan Margaret B, Petty-Saphon Naomi, Kiernan Regina, Jensen Mette, O’Connor Lois. A large national outbreak of COVID-19 linked to air travel, Ireland, summer 2020. Euro Surveill. 2020;25(42):pii=2001624. https://doi.org/10.2807/1560-7917.ES.2020.25.42.2001624 Irish news sites including RTE and the Irish Times also covered the paper. Figure 2 from “A large national outbreak of COVID-19 linked to air travel, Ireland, summer 2020” https://doi.org/10.2807/1560-7917.ES.2020.25.42.2001624 Caption in original “Passenger seating diagram on flight, Ireland, summer 2020 (n=49 passengers)” “Numbers on the seats indicate the Flight Groups 1–4.” The age of the 13 flight cases ranged from 1 to 65 years with a median age of 23 years. Twelve of 13 flight cases and almost three quarters (34/46) of the non-flight cases were symptomatic. After the flight, the earliest onset of symptoms occurred 2 days after arrival, and the latest case in the entire outbreak occurred 17 days after the flight. Of 12 symptomatic flight cases, symptoms reported included cough (n = 7), coryza (n = 7), fever (n = 6) and sore throat (n = 5), and six reported loss of taste or smell. No symptoms were reported for one flight case. A mask was worn during the flight by nine flight cases, not worn by one (a child), and unknown for three. Murphy Nicola, Boland Máirín, Bambury Niamh, Fitzgerald Margaret, Comerford Liz, Dever Niamh, O’Sullivan Margaret B, Petty-Saphon Naomi, Kiernan Regina, Jensen Mette, O’Connor Lois. A large national outbreak of COVID-19 linked to air travel, Ireland, summer 2020. Euro Surveill. 2020;25(42):pii=2001624. https://doi.org/10.2807/1560-7917.ES.2020.25.42.2001624 (Notes to Figure 1 Caption) “It is interesting that four of the flight cases were not seated next to any other positive case, had no contact in the transit lounge, wore face masks in-flight and would not be deemed close contacts under current guidance from the European Centre for Disease Prevention and Control (ECDC) [1].” Murphy Nicola, Boland Máirín, Bambury Niamh, Fitzgerald Margaret, Comerford Liz, Dever Niamh, O’Sullivan Margaret B, Petty-Saphon Naomi, Kiernan Regina, Jensen Mette, O’Connor Lois. A large national outbreak of COVID-19 linked to air travel, Ireland, summer 2020. Euro Surveill. 2020;25(42):pii=2001624. https://doi.org/10.2807/1560-7917.ES.2020.25.42.2001624 “The source case is not known. The first two cases in Group 1 became symptomatic within 48 h of the flight, and COVID-19 was confirmed in three, including an asymptomatic case from this Group in Region A within 5 days of the flight. Thirteen secondary cases and one tertiary case were later linked to these cases. Two cases from Flight Group 2 were notified separately in Region A with one subsequent secondary family case, followed by three further flight cases notified from Region B in two separate family units (Flight Groups 1 and 2). These eight cases had commenced their journey from the same continent and had some social contact before the flight. The close family member of a Group 2 case seated next to the case had tested positive abroad 3 weeks before, and negative after the flight. Flight Group 3 was a household group of which three cases were notified in Region C and one case in Region D. 
These cases had no social or airport lounge link with Groups 1 or 2 pre-flight and were not seated within two rows of them. Their journey origin was from a different continent. A further case (Flight Group 4) had started the journey from a third continent, had no social or lounge association with other cases and was eated in the same row as passengers from Group 1. Three household contacts and a visitor of Flight Group 4 became confirmed cases. One affected contact travelled to Region E, staying in shared accommodation with 34 others; 25 of these 34 became cases (attack rate 73%) notified in regions A, B, C, D, E and F, with two cases of quaternary spread.” Murphy Nicola, Boland Máirín, Bambury Niamh, Fitzgerald Margaret, Comerford Liz, Dever Niamh, O’Sullivan Margaret B, Petty-Saphon Naomi, Kiernan Regina, Jensen Mette, O’Connor Lois. A large national outbreak of COVID-19 linked to air travel, Ireland, summer 2020. Euro Surveill. 2020;25(42):pii=2001624. https://doi.org/10.2807/1560-7917.ES.2020.25.42.2001624 “In-flight transmission is a plausible exposure for cases in Group 1 and Group 2 given seating arrangements and onset dates. One case could hypothetically have acquired the virus as a close household contact of a previous positive case, with confirmed case onset date less than two incubation periods before the flight, and symptom onset in the flight case was 48 h after the flight. In-flight transmission was the only common exposure for four other cases (Flight Groups 3 and 4) with date of onset within four days of the flight in all but the possible tertiary case. This case from Group 3 developed symptoms nine days after the flight and so may have acquired the infection in-flight or possibly after the flight through transmission within the household.” Murphy Nicola, Boland Máirín, Bambury Niamh, Fitzgerald Margaret, Comerford Liz, Dever Niamh, O’Sullivan Margaret B, Petty-Saphon Naomi, Kiernan Regina, Jensen Mette, O’Connor Lois. A large national outbreak of COVID-19 linked to air travel, Ireland, summer 2020. Euro Surveill. 2020;25(42):pii=2001624. https://doi.org/10.2807/1560-7917.ES.2020.25.42.2001624 “Genomic sequencing for cases travelling from three different continents strongly supports the epidemiological transmission hypothesis of a point source for this outbreak. The ability of genomics to resolve transmission events may increase as the virus evolves and accumulates greater diversity [23].” Murphy Nicola, Boland Máirín, Bambury Niamh, Fitzgerald Margaret, Comerford Liz, Dever Niamh, O’Sullivan Margaret B, Petty-Saphon Naomi, Kiernan Regina, Jensen Mette, O’Connor Lois. A large national outbreak of COVID-19 linked to air travel, Ireland, summer 2020. Euro Surveill. 2020;25(42):pii=2001624. https://doi.org/10.2807/1560-7917.ES.2020.25.42.2001624 Authors note that a large percentage of the flight passengers were infected: “We calculated high attack rates, ranging plausibly from 9.8 % to 17.8% despite low flight occupancy and lack of passenger proximity on-board.” Murphy Nicola, Boland Máirín, Bambury Niamh, Fitzgerald Margaret, Comerford Liz, Dever Niamh, O’Sullivan Margaret B, Petty-Saphon Naomi, Kiernan Regina, Jensen Mette, O’Connor Lois. A large national outbreak of COVID-19 linked to air travel, Ireland, summer 2020. Euro Surveill. 2020;25(42):pii=2001624. 
https://doi.org/10.2807/1560-7917.ES.2020.25.42.2001624 Among the reasons for the uncertainty of this range is that “11 flight passengers could not be contacted and were consequently not tested.” (A twelfth passenger “declined testing”.) There is also some inherent uncertainty due to incubation period and possibility of “transmission within the household”, especially after the flight; authors note that “Exposure possibilities for flight cases include in-flight, during overnight transfer/pre-flight or unknown acquisition before the flight.” Beyond the 13 people on the flight, cases spread to several social groups, across “six of the eight different health regions (Regions A–H) throughout the Republic of Ireland”. Flight groups 1 and 2 started their travel from one continent; Flight group 3 from another; Flight group 4 from a third continent. Figure 3 from “A large national outbreak of COVID-19 linked to air travel, Ireland, summer 2020” https://doi.org/10.2807/1560-7917.ES.2020.25.42.2001624 caption in original: “Diagram of chains of transmission, flight-related COVID-19 cases, Ireland, summer 2020 (n=59)” Tags: air travel, attack rate, COVID-19, COVID19, epidemiology, flights, flying, Ireland, Middle East, pandemic Posted in random thoughts | Comments (0) Paid Undergraduate Research position at UIUC for Fall & Spring 2020 August 18th, 2020 by jodi University of Illinois undergraduates are encouraged to apply for a position in my lab. I particularly welcome applications from students in the new iSchool BS/IS degree or in the university-wide informatics minor. While I only have 1 paid position open, I also supervise unpaid independent study projects. Dr. Jodi Schneider and the Information Quality Lab seek undergraduate research assistants for 100% REMOTE WORK. Past students have published research articles, presented posters, earned independent study credit, James Scholar research credit, etc. One paid position in news analytics/data science for Assessing the Impact of Media Polarization on Public Health Emergencies, funded by the Cline Center for Advanced Research in the Social Sciences. (8hrs/week at $12.50/hour + possible independent study – 100% REMOTE WORK). COVID-19 news analytics: We seek to understand how public health emergencies are reported and to assess the polarization and politicization of the U.S. news coverage. You will be responsible for testing and improving search parameters, investigating contextual information such as media bias and media circulation, using text mining and data science, and close reading of sample texts. You will work closely with a student who has worked on the opioid crisis – see the past work following poster (try the link TWICE – you have to log in with an Illinois NetID): https://compass2g.illinois.edu/webapps/discussionboard/do/message?action=list_messages&course_id=_50281_1&nav=discussion_board&conf_id=_247818_1&forum_id=_417427_1&message_id=_6264991_1 Applications should be submitted here: https://forms.illinois.edu/sec/742264484 DEADLINE: 5 pm Central Time SUNDAY AUGUST 30, 2020 Tags: COVID19, data science, health controversies, jobs, media polarization, news analytics, research experiences for undergraduates, undergraduate research Posted in Information Quality Lab news | Comments (0) #ShutDownSTEM #strike4blacklives #ShutDownAcademia June 10th, 2020 by jodi I greatly appreciated receiving messages from senior people about their participation in the June 10th #ShutDownSTEM #strike4blacklives #ShutDownAcademia. 
In that spirit, I am sharing my email bounce message for tomorrow, and the message I sent to my research lab. Email bounce: I am not available by email today:  This June 10th is a day of action about understanding and addressing racism, and its impact on the academy, and on STEM.  -Jodi Email to my research lab Wednesday is a day of action about understanding and addressing racism, and its impact on the academy, and on STEM. I strongly encourage you to use tomorrow for this purpose. Specifically, I invite you to think about what undoing racism – moving towards antiracism – means, and what you can do. One single day, by itself, will not cure racism; but identifying what we can do on an ongoing basis, and taking those actions day after day – that can and will have an impact. And, if racism is vivid in your daily life, make #ShutDownSTEM a day of rest. If tomorrow doesn’t suit, I encourage you to reserve a day over the course of the next week, to replace your everyday duties. What does taking this time actually mean? It means scheduling a dedicated block of time to learn more; rescheduling meetings; shutting down your email; reading books and articles and watching videos; and taking time to reflect on recent events and the stress that they cause every single person in our community. What am I doing personally? I’ve cancelled meetings tomorrow, and set an email bounce. I will spend part of the day to think more seriously about what real antiracist action looks like from my position, as a white female academic. This week I will also be using time to re-read White Fragility, to finish Dreamland Burning (a YA novel about the 1921 Tulsa race riot), and to investigate how to bring bystander training to the iSchool. I will also be thinking about the relationship of racism to other forms of oppression – classism, sexism, homophobia, transphobia, xenophobia. If you are looking for readings of your own, I can point to a list curated by an Anti-Racism Task Force: https://idea.illinois.edu/education For basic information, #ShutDownSTEM #strike4blacklives #ShutDownAcademia website: https://www.shutdownstem.com Physicists’ Particles for Justice: https://www.particlesforjustice.org -Jodi Tags: #ShutDownAcademia, #ShutDownSTEM, #strike4blacklives, email Posted in random thoughts | Comments (0) QOTD: Storytelling in protest and politics March 16th, 2020 by jodi I recently read Francesca Polletta‘s book It was like a fever: Storytelling in protest and politics (2006, University of Chicago Press). I recommend it! It will appeal to researchers interested in topics such as narrative, strategic communication, (narrative) argumentation, or epistemology (here, of narrative). Parts may also interest activists. The book’s case studies are drawn from the Student Nonviolent Coordinating Committee (SNCC) (Chapters 2 & 3); online deliberation about the 9/11 memorial (Listening to the City, summer 2002) (Chapter 4); women’s stories in law (including, powerfully, battered women who had killed their abusers, and the challenges in making their stories understandable) (Chapter 5); references to Martin Luther King by African American Congressmen (in the Congressional Record) and by “leading back political figures who were not serving as elected or appointed officials” (Chapter 6). Several are extended from work Polletta previously published from 1998 through 2005 (see page xiii for citations). 
The conclusion—”Conclusion: Folk Wisdom and Scholarly Tales” (pages 166-187)—takes up several topics, starting with canonicity, interpretability, ambivalence. I especially plan to go back to the last two sections: “Scholars Telling Stories” (pages 179-184)—about narrative and storytelling in analysts’ telling of events—and “Towards a Sociology of Discursive Forms” (pages 185-187)—about investigating the beliefs and conventions of narrative and its institutional conventions (and relating those to conventions of other “discursive forms” such as interviews). These set forward a research agenda likely useful to other scholars interested in digging in further. These are foreshadowed a bit in the introduction (“Why Stories Matter”) which, among other things, sets out the goal of developing “A Sociology of Storytelling”. A few quotes I noted—may give you the flavor of the book: page 141: “But telling stories also carries risks. People with unfamiliar experiences have found those experiences assimilated to canonical plot lines and misheard as a result. Conventional expectations about how stories work, when they are true, and when they are appropriate have also operated to diminish the impact of otherwise potent political stories. For the abused women whom juries disbelieved because their stories had changed in small details since their first traumatized [p142] call to police, storytelling has not been especially effective. Nor was it effective for the citizen forum participants who did not say what it was like to search fruitlessly for affordable housing because discussions of housing were seen as the wrong place in which to tell stories.” pages 166-167: “So which is it? Is narrative fundamentally subversive or hegemonic? Both. As a rhetorical form, narrative is equipped to puncture reigning verities and to uphold them. At times, it seems as if most of the stories in circulation are subtly or not so subtly defying authorities; at others as if the most effective storytelling is done by authorities. To make it more complicated, sometimes authorities unintentionally undercut their own authority when they tell stories. And even more paradoxically, undercutting their authority by way of a titillating but politically inconsequential story may actually strengthen it. Dissenters, for their part, may find their stories misread in ways that support the very institutions that are challenging….”For those interested in the relations between storytelling, protest, and politics, this all suggests two analytical tasks. One is to identify the features of narrative that allow it to [p167] achieve certain rhetorical effects. The other is to identify the social conditions in which those rhetorical effects are likely to be politically consequential. The surprise is that scholars of political processes have devoted so little attention to either task.” pages 177-8 – “So institutional conventions of storytelling influence what people can do strategically with stories. In the previously pages, I have described the narrative conventions that operate in legal adjudication, media reporting, television talk shows, congressional debate, and public deliberation. Sociolinguists have documented such conventions in other settings: in medical intake interviews, for example, parole hearings, and jury deliberations. One could certainly generate a catalogue of the institutional conventions of storytelling. To some extent, those conventions reflect the peculiarities of the institution as it has developed historically. 
They also serve practical functions; some explicit, others less so. I have argued that the lines institutions draw between suitable and unsuitable occasions for storytelling or for certain kinds of stories serve to legitimate the institution.” [specific examples follow] ….”As these examples suggest, while institutions have different conventions of storytelling, storytelling does some of the same work in many institutions. It does so because of broadly shared assumptions about narrative’s epistemological status. Stories are generally thought to be more affecting by less authoritative than analysis, in part because narrative is associated with women rather than men, the private sphere rather than the public one, and custom rather than law. Of course, conventions of storytelling and the symbolic associations behind them are neither unitary nor fixed. Nor are they likely to be uniformly advantageous for those in power and disadvantageous for those without it. Narrative’s alignment [179] along the oppositions I noted is complex. For example, as I showed in chapter 5, Americans’ skepticism of expert authority gives those telling stories clout. In other words, we may contrast science with folklore (with science seen as much more credible), but we may also contrast it with common sense (with science seen as less credible). Contrary to the lamentation of some media critics and activists, when disadvantaged groups have told personal stories to the press and on television talk shows, they have been able to draw attention not only to their own victimization but to the social forces responsible for it.“ Tags: Congressional Record, Francesca Polletta, Listening to the City, Martin Luther King, narrative, QOTD, SNCC, storytelling, strategic communication, Student Nonviolent Coordinating Committee Posted in argumentative discussions, books and reading | Comments (0) Knowledge Graphs: An Aggregation of Definitions March 3rd, 2019 by jodi I am not aware of a consensus definition of knowledge graph. I’ve been discussing this for awhile with Liliana Giusti Serra, and the topic came up again with my fellow organizers of the knowledge graph session at US2TS as we prepare for a panel. I’ve proposed the following main features: RDF-compatible, has a defined schema (usually an OWL ontology) items are linked internally may be a private enterprise dataset (e.g. not necessarily openly available for external linking) or publicly available covers one or more domains Below are some quotes. I’d be curious to hear of other definitions, especially if you think there’s a consensus definition I’m just not aware of. “A knowledge graph consists of a set of interconnected typed entities and their attributes.” Jose Manuel Gomez-Perez, Jeff Z. Pan, Guido Vetere and Honghan Wu. “Enterprise Knowledge Graph: An Introduction.”  In Exploiting Linked Data and Knowledge Graphs in Large Organisations. Springer. Part of the whole book: http://link.springer.com/10.1007/978-3-319-45654-6 “A knowledge graph is a structured dataset that is compatible with the RDF data model and has an (OWL) ontology as its schema. A knowledge graph is not necessarily linked to external knowledge graphs; however, entities in the knowledge graph usually have type information, defined in its ontology, which is useful for providing contextual information about such entities. 
Knowledge graphs are expected to be reliable, of high quality, of high accessibility and providing end user oriented information services.” Boris Villazon-Terrazas, Nuria Garcia-Santa, Yuan Ren, Alessandro Faraotti, Honghan Wu, Yuting Zhao, Guido Vetere and Jeff Z. Pan. “Knowledge graphs: Foundations”. In Exploiting Linked Data and Knowledge Graphs in Large Organisations. Springer. Part of the whole book: http://link.springer.com/10.1007/978-3-319-45654-6

“The term Knowledge Graph was coined by Google in 2012, referring to their use of semantic knowledge in Web Search (“Things, not strings”), and is recently also used to refer to Semantic Web knowledge bases such as DBpedia or YAGO. From a broader perspective, any graph-based representation of some knowledge could be considered a knowledge graph (this would include any kind of RDF dataset, as well as description logic ontologies). However, there is no common definition about what a knowledge graph is and what it is not. Instead of attempting a formal definition of what a knowledge graph is, we restrict ourselves to a minimum set of characteristics of knowledge graphs, which we use to tell knowledge graphs from other collections of knowledge which we would not consider as knowledge graphs. A knowledge graph mainly describes real world entities and their interrelations, organized in a graph. defines possible classes and relations of entities in a schema. allows for potentially interrelating arbitrary entities with each other. covers various topical domains.” Paulheim, H. (2017). Knowledge graph refinement: A survey of approaches and evaluation methods. Semantic Web, 8(3), 489-508. http://www.semantic-web-journal.net/system/files/swj1167.pdf

“ISI’s Center on Knowledge Graphs research group combines artificial intelligence, the semantic web, and database integration techniques to solve complex information integration problems. We leverage general research techniques across information-intensive disciplines, including medical informatics, geospatial data integration and the social Web.” http://usc-isi-i2.github.io/home/

Just as I was “finalizing” my list to send to colleagues, I found a poster all about definitions: Ehrlinger, L., & Wöß, W. (2016). Towards a Definition of Knowledge Graphs. SEMANTiCS (Posters, Demos, SuCCESS), 48. http://ceur-ws.org/Vol-1695/paper4.pdf

Its Table 1: Selected definitions of knowledge graph has the following definitions (for citations see that paper):

“A knowledge graph (i) mainly describes real world entities and their interrelations, organized in a graph, (ii) defines possible classes and relations of entities in a schema, (iii) allows for potentially interrelating arbitrary entities with each other and (iv) covers various topical domains.” Paulheim [16]

“Knowledge graphs are large networks of entities, their semantic types, properties, and relationships between entities.” Journal of Web Semantics [12]

“Knowledge graphs could be envisaged as a network of all kind things which are relevant to a specific domain or to an organization. They are not limited to abstract concepts and relations but can also contain instances of things like documents and datasets.” Semantic Web Company [3]

“We define a Knowledge Graph as an RDF graph. An RDF graph consists of a set of RDF triples where each RDF triple (s, p, o) is an ordered set of the following RDF terms: a subject s ∈ U ∪ B, a predicate p ∈ U, and an object o ∈ U ∪ B ∪ L. An RDF term is either a URI u ∈ U, a blank node b ∈ B, or a literal l ∈ L.” Färber et al.
[7] “[…] systems exist, […], which use a variety of techniques to extract new knowledge, in the form of facts, from the web. These facts are interrelated, and hence, recently this extracted knowledge has been referred to as a knowledge graph.” Pujara et al. [17] “A knowledge graph is a graph that models semantic knowledge, where each node is a real-world concept, and each edge represents a relationship between two concepts” Fang, Y., Kuan, K., Lin, J., Tan, C., & Chandrasekhar, V. (2017). Object detection meets knowledge graphs. https://oar.a-star.edu.sg/jspui/handle/123456789/2147 “things not strings” – Google https://googleblog.blogspot.com/2012/05/introducing-knowledge-graph-things-not.html Tags: knowledge graph, knowledge representation, quotations Posted in information ecosystem, semantic web | Comments (0) QOTD: Doing more requires thinking less December 1st, 2018 by jodi by the aid of symbolism, we can make transitions in reasoning almost mechanically by the eye which would otherwise call into play the higher faculties of the brain. …Civilization advances by extending the number of important operations that we can perform without thinking about them. Operations of thought are like cavalry charges in a battle — they are strictly limited in number, they require fresh horses, and must only be made at decisive moments. One very important property for symbolism to possess is that it should be concise, so as to be visible at one glance of the eye and be rapidly written. – Whitehead, A.N. (1911). An introduction to mathematics, Chapter 5, “The Symbolism of Mathematics” (page 61 in this version) HT to Santiago Nuñez-Corrales (Illinois page for Santiago Nuñez-Corrales, LinkedIn for Santiago Núñez-Corrales) who used part of this quote in a Conceptual Foundations Group talk, Nov 29. From my point of view, this is why memorizing multiplication tables is not now irrelevant; why new words for concepts are important; and underlies a lot of scientific advancement. Tags: cavalry, modes of thought, QOTD, symbolism Posted in information ecosystem, random thoughts | Comments (0) QOTD: Sally Jackson on how disagreement makes arguments more explicit June 19th, 2018 by jodi Sally Jackson explicates the notion of the “disagreement space” in a new Topoi article: “a position that remains in doubt remains in need of defense”1   “The most important theoretical consequence of seeing argumentation as a system for management of disagreement is a reversal of perspective on what arguments accomplish. Are arguments the means by which conclusions are built up from established premises? Or are they the means by which participants drill down from disagreements to locate how it is that they and others have arrived at incompatible positions? A view of argumentation as a process of drilling down from disagreements suggests that arguers themselves do not simply point to the reasons they hold for a particular standpoint, but sometimes discover where their own beliefs come from, under questioning by others who do not share their beliefs. A logical analysis of another’s argument nearly always involves first making the argument more explicit, attributing more to the author than was actually said. This is a familiar enough problem for analysts; my point is that it is also a pervasive problem for participants, who may feel intuitively that something is seriously wrong in what someone else has said but need a way to pinpoint exactly what. 
Getting beliefs externalized is not a precondition for argument, but one of its possible outcomes.”2 From Sally Jackson’s Reason-Giving and the Natural Normativity of Argumentation.3 The original treatment of disagreement space is cited to a book chapter revising an ISSA 1992 paper4, somewhat harder to get one’s hands on. p 12, Sally Jackson. Reason-Giving and the Natural Normativity of Argumentation. Topoi. 2018 Online First. http://doi.org/10.1007/s11245-018-9553-5 [↩] p 10, Sally Jackson. Reason-Giving and the Natural Normativity of Argumentation. Topoi. 2018 Online First. http://doi.org/10.1007/s11245-018-9553-5 [↩] Sally Jackson. Reason-Giving and the Natural Normativity of Argumentation. Topoi. 2018 Online First. http://doi.org/10.1007/s11245-018-9553-5 [↩] Jackson S (1992) “Virtual standpoints” and the pragmatics of conversational argument. In: van Eemeren FH, Grootendorst R, Blair JA, Willard CA (eds) Argument illuminated. International Centre for the Study of Argumentation, Amsterdam, pp. 260–226 [↩] Tags: argumentation, argumentation norms, disagreement space Posted in argumentative discussions | Comments (0) QOTD: Working out scientific insights on paper, Lavoisier case study July 12th, 2017 by jodi …language does do much of our thinking for us, even in the sciences, and rather than being an unfortunate contamination, its influence has been productive historically, helping individual thinkers generate concepts and theories that can then be put to the test. The case made here for the constitutive power of figures [of speech] per se supports the general point made by F.L. Holmes in a lecture addressed to the History of Science Society in 1987. A distinguished historian of medicine and chemistry, Holmes based his study of Antoine Lavoisier on the French chemist’s laboratory notebooks. He later examined drafts of Lavoisier’s published papers and discovered that Lavoisier wrote many versions of his papers and in the course of careful revisions gradually worked out the positions he eventually made public (Holmes, 221). Holmes, whose goal as a historian is to reconstruct the careful pathways and fine structure of scientific insights, concluded from his study of Lavoisier’s drafts We cannot always tell whether a thought that led him to modify a passage, recast an argument, or develop an alternative interpretation occurred while he was still engaged in writing what he subsequently altered, or immediately afterward, or after some interval during which he occupied himself with something else; but the timing is, I believe, less significant than the fact that the new developments were consequences of the effort to express ideas and marshall supporting information on paper (225). – page xi of Rhetorical Figures in Science by Jeanne Fahnestock, Oxford University Press, 1999. She is quoting Frederich L. Holmes. 1987. Scientific writing and scientific discovery. Isis 78:220-235. DOI:10.1086/354391 As Moore summarizes, Lavoisier wrote at least six drafts of the paper over a period of at least six months. However, his theory of respiration did not appear until the fifth draft. Clearly, Lavoisier’s writing helped him refine and understand his ideas. Moore, Randy. Language—A Force that Shapes Science. Journal of College Science Teaching 28.6 (1999): 366. 
http://www.jstor.org/stable/42990615 (which I quoted in a review I wrote recently) Fahnestock adds: “…Holmes’s general point [is that] there are subtle interactions ‘between writing, thought, and operations in creative scientific activity’ (226).” Tags: Lavoisier, revision, rhetoric of science, scientific communication, scientific writing Posted in future of publishing, information ecosystem, scholarly communication | Comments (0) David Liebovitz: Achieving Care transformation by Infusing Electronic Health Records with Wisdom May 1st, 2017 by jodi Today I am at the Health Data Analytics summit. The title of the keynote talk is Achieving Care transformation by Infusing Electronic Health Records with Wisdom. It’s a delight to hear from a medical informaticist: David M. Liebovitz (publications in Google Scholar), MD, FACP, Chief Medical Information Officer, The University of Chicago. He graduated from University of Illinois in electrical engineering, making this a timely talk as the engineering-focused Carle Illinois College of Medicine gets going. David Liebovitz started with a discussion of the data problems — problem lists, medication lists, family history, rules, results, notes — which will be familiar to anyone using EHRs or working with EHR data. He draws attention also to the human problems — both in terms of provider “readiness” (e.g. their vision for population-level health) as well as about “current expectations”. (An example of such an expectation is a “main clinician satisfier” he closed with: U Chicago is about to turn on outbound faxing from the EHR!) He mentioned also the importance of resilience. He mentioned customizing systems as a risk when the vendor makes upstream changes (this is not unique to healthcare but a threat to innovation and experimentation with information systems in other industries.) Still, in managing the EHR, there is continual optimization, scored based on a number of factors. He mentioned: Safety Quality/patient experience Regulatory/legal Financial Usability/productivity Availability of alternative solutions As well as weighting for old requests. He emphasized the complexity of healthcare in several ways: “Nobody knew that healthcare could be so complicated.” – POTUS Showing the Medicare readmissions adjustment factors Pharmacy pricing, an image (showing kickbacks among other things) from “Prices That Are Too High”, Chapter 5, The Healthcare Imperative: Lowering Costs and Improving Outcomes: Workshop Series Summary (2010)  National Academies Press doi:10.17226/12750 An image from “Prices That Are Too High”, Chapter 5, The Healthcare Imperative: Lowering Costs and Improving Outcomes: Workshop Series Summary (2010) Icosystem’s diagram of the complexity of the healthcare system Icosystem – complexity of the healthcare system Another complexity is the modest impact of medical care compared to other factors such as the impact of socioeconomic and political context on equity in health and well-being (see the WHO image below). For instance, there is a large impact of health behaviors, which “happen in larger social contexts.” (See the Relative Contribution of Multiple Determinants to Health, August 21, 2014, Health Policy Briefs) Solar O, Irwin A. A conceptual framework for action on the social determinants of health. Social Determinants of Health Discussion Paper 2 (Policy and Practice). 
Given this complexity, David Liebovitz stresses that we need to start with the right model, “simultaneously improving population health, improving the patient experience of care, and reducing per capita cost”. (See Stiefel M, Nolan K. A Guide to Measuring the Triple Aim: Population Health, Experience of Care, and Per Capita Cost. IHI Innovation Series white paper. Cambridge, Massachusetts: Institute for Healthcare Improvement; 2012). Table 1 from Stiefel M, Nolan K. A Guide to Measuring the Triple Aim: Population Health, Experience of Care, and Per Capita Cost. IHI Innovation Series white paper. Cambridge, Massachusetts: Institute for Healthcare Improvement; 2012. Given the modest impact of medical care, and of data, he suggests that we should choose the right outcomes. David Liebovitz says that “not enough attention has been paid to usability”; I completely agree and suggest that information scientists, human factors engineers, and cognitive ergonomists help mainstream medical informaticists fill this gap. He put up Jakob Nielsen’s 10 usability heuristics for user interface design. A vivid example is whether a patient’s resuscitation preferences are shown (which seems to depend on the particular EHR screen): the system doesn’t highlight where we are in the system. For providers, he says user control and freedom are very important. He suggests that there are only a few key tasks. A provider should be able to do ANY of these things wherever they are in the chart: put a note, order something, or send a message. Similarly, EHR should support recognition (“how do I admit a patient again?”) rather than requiring recall. Meanwhile, on the decision support side he highlights the (well-known) problems around interruptions by saying that speed is everything and changing direction is much easier than stopping. Here he draws on some of his own work, describing what he calls a “diagnostic process aware workflow”: David Liebovitz. Next steps for electronic health records to improve the diagnostic process. Diagnosis 2015 2(2) 111-116. doi:10.1515/dx-2014-0070 Can we predict X better? Yes, he says (for instance pointing to Table 3 of “Can machine-learning improve cardiovascular risk prediction using routine clinical data?” and its machine learning analysis of over 300,000 patients, based on variables chosen from previous guidelines and expert-informed selection–generating further support for aspects such as aloneness, access to resources, socio-economic status). But what’s really needed, he says, is to: Predict the best next medical step, iteratively Predict the best next lifestyle step, iteratively (And what to do about genes and epigenetic measures?) He shows an image of “All of our planes in the air” from flightaware, drawing the analogy that we want to work on “optimal patient trajectories” — predicting what are the “turbulent events” to avoid. This is not without challenges. He points to three: Data privacy (He suggests Google DeepMind and healthcare in an age of algorithms. Powles, J. & Hodson, H. Health Technol. (2017). doi:10.1007/s12553-017-0179-1) Two sorts of mismatches between the current situation and where we want to go: For instance the source of data being from finance Certain basic current clinician needs (e.g. that a main clinician satisfier is that UChicago is soon to turn on outbound faxing from their EHR — and that an ongoing source of dissatisfaction: managing volume of inbound faxes.)
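To make the flavor of that prediction work concrete, here is a minimal sketch of the kind of risk model the cited study describes: a classifier trained on routine clinical variables. It is an illustration only, not the paper's actual pipeline; the file name and column names are hypothetical placeholders.

# A minimal sketch (not the cited study's method) of predicting a
# cardiovascular event from routine clinical variables with scikit-learn.
# The CSV file and column names below are hypothetical placeholders.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score

df = pd.read_csv("routine_clinical_data.csv")  # hypothetical EHR extract
features = ["age", "systolic_bp", "total_cholesterol", "smoker",
            "diabetes", "socioeconomic_index"]  # guideline-style variables
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["cv_event_10yr"], test_size=0.2, random_state=0)

model = GradientBoostingClassifier().fit(X_train, y_train)
print("Held-out AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))

The point is only that such a model scores risk from data already in the record; the harder question raised in the talk is what to do with that score, one next step at a time.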
He closes suggesting that we: Finish the basics Address key slices of the spectrum Descriptive/prescriptive Begin the prescriptive journey: impact one trajectory at a time. Tags: data analytics, electronic health records, healthcare systems, medical informatics Posted in information ecosystem | Comments (0) « Older Entries Recent Posts Paid graduate hourly research position at UIUC for Spring 2021 Avoiding long-haul air travel during the COVID-19 pandemic Paid Undergraduate Research position at UIUC for Fall & Spring 2020 #ShutDownSTEM #strike4blacklives #ShutDownAcademia QOTD: Storytelling in protest and politics Monthly December 2020 October 2020 August 2020 June 2020 March 2020 Meta Log in Valid XHTML XFN WordPress Wordpress powers jodischneider.com/blog. Layers theme Designed by Jai Pandya. jodischneider-com-7291 ---- jodischneider.com/blog jodischneider.com/blog reading, technology, stray thoughts Paid graduate hourly research position at UIUC for Spring 2021 Jodi Schneider’s Information Quality Lab (http://infoqualitylab.org) seeks a graduate hourly student for a research project on bias in citation networks. Biased citation benefits authors in the short-term by bolstering grants and papers, making them more easily accepted. However, it can have severe negative consequences for scientific inquiry. Our goal is to find quantitative measures of […] Avoiding long-haul air travel during the COVID-19 pandemic I would not recommend long-haul air travel at this time. An epidemiological study of a 7.5 hour flight from the Middle East to Ireland concluded that 4 groups (13 people), traveling from 3 continents in four groups, who used separate airport lounges, were likely infected in flight. The flight had 17% occupancy (49 passengers/283 seats; […] Paid Undergraduate Research position at UIUC for Fall & Spring 2020 University of Illinois undergraduates are encouraged to apply for a position in my lab. I particularly welcome applications from students in the new iSchool BS/IS degree or in the university-wide informatics minor. While I only have 1 paid position open, I also supervise unpaid independent study projects. Dr. Jodi Schneider and the Information Quality Lab <https://infoqualitylab.org> seek […] #ShutDownSTEM #strike4blacklives #ShutDownAcademia I greatly appreciated receiving messages from senior people about their participation in the June 10th #ShutDownSTEM #strike4blacklives #ShutDownAcademia. In that spirit, I am sharing my email bounce message for tomorrow, and the message I sent to my research lab. Email bounce: I am not available by email today: This June 10th is a day of action […] QOTD: Storytelling in protest and politics I recently read Francesca Polletta‘s book It was like a fever: Storytelling in protest and politics (2006, University of Chicago Press). I recommend it! It will appeal to researchers interested in topics such as narrative, strategic communication, (narrative) argumentation, or epistemology (here, of narrative). Parts may also interest activists. The book’s case studies are drawn from the […] Knowledge Graphs: An Aggregation of Definitions I am not aware of a consensus definition of knowledge graph. I’ve been discussing this for awhile with Liliana Giusti Serra, and the topic came up again with my fellow organizers of the knowledge graph session at US2TS as we prepare for a panel. 
I’ve proposed the following main features: RDF-compatible, has a defined schema (usually an […] QOTD: Doing more requires thinking less by the aid of symbolism, we can make transitions in reasoning almost mechanically by the eye which would otherwise call into play the higher faculties of the brain. …Civilization advances by extending the number of important operations that we can perform without thinking about them. Operations of thought are like cavalry charges in a battle […] QOTD: Sally Jackson on how disagreement makes arguments more explicit Sally Jackson explicates the notion of the “disagreement space” in a new Topoi article: “a position that remains in doubt remains in need of defense”1   “The most important theoretical consequence of seeing argumentation as a system for management of disagreement is a reversal of perspective on what arguments accomplish. Are arguments the means by […] QOTD: Working out scientific insights on paper, Lavoisier case study …language does do much of our thinking for us, even in the sciences, and rather than being an unfortunate contamination, its influence has been productive historically, helping individual thinkers generate concepts and theories that can then be put to the test. The case made here for the constitutive power of figures [of speech] per se […] David Liebovitz: Achieving Care transformation by Infusing Electronic Health Records with Wisdom Today I am at the Health Data Analytics summit. The title of the keynote talk is Achieving Care transformation by Infusing Electronic Health Records with Wisdom. It’s a delight to hear from a medical informaticist: David M. Liebovitz (publications in Google Scholar), MD, FACP, Chief Medical Information Officer, The University of Chicago. He graduated from […] joinpeertube-org-5198 ---- JoinPeerTube developed by Home Create an account News Help Contribute Git Languages English Français Deutsch Español Esperanto Italiano Polski Português русский svenska magyar galego 日本語 繁體中文(台灣) Translate Free software to take back control of your videos image/svg+xml PeerTube, developed by Framasoft, is the free and decentralized alternative to video platforms, providing you over 400,000 videos published by 60,000 users and viewed over 15 million times What is PeerTube? See the instances listDiscover our content selection The Hackers War Watch the video The Hackers War The Hacker Wars (2014) is a film about the targeting of hacktivists and journalists by the US government. The film follows the information warriors who are fighting back, and it depicts the dangerous battle in which (h)ac(k)tivists fight for information freedom. Hacktivists impact the world in a new way by using the government's information against itself to call out those in power. The Hacker Wars takes you to the front lines of the high-stakes battle over the fate of the Internet, freedom and privacy. #information #freedom #privacy Gaby Weber Documentaries Discover the channel Gaby Weber Documentaries Gabriele "Gaby" Weber is a German journalist. She has been reporting from South America since the mid-eighties, mainly for ARD. Her focal points are international politics, human rights and the history of German-Latin American relations. On this channel, you can discover her documentary movies in german, english and spanish languages. 
#documentary #geopolitical #journalism Blender Go on the instance Blender The Official Blender Foundation PeerTube instance gives you access to videos presenting the evolutions of the 3D creation software, tutorials and animated films supported by the Blender Foundation. All videos published on this instance are under a Creative Commons Attribution licence. #Blender Browse contents What is? PeerTube aspires to be a decentralized and free/libre alternative to video broadcasting services. Our aim is not to replace them, but rather to simultaneously offer something else, with different values. A federation of interconnected hosting services PeerTube is not meant to become a huge platform that would centralize videos from all around the world. Rather, it is a network of small, inter-connected video hosters. Anyone with a modicum of technical skills can host a PeerTube server, aka an instance. Each instance hosts its users and their videos. In this way, every instance is created, moderated and maintained independently by various administrators. Discover PeerTube instances You can still watch videos hosted by other instances from your account, though, if the administrator of your instance has previously connected it with other instances. This is just how a federation works! And there's more! PeerTube uses ActivityPub, a federating protocol that allows you to interact with other software, provided they also use this protocol. For example, PeerTube and Mastodon (a Twitter alternative) are connected: you can follow a PeerTube user from Mastodon (the latest videos from the PeerTube account you follow will appear in your feed), and even comment on a PeerTube-hosted video directly from your Mastodon account. Open-source, free/libre license code Mainstream online video broadcasting services make money off of your data by analyzing your interactions so that they can then bombard you with targeted advertising. PeerTube is not subject to any corporate monopoly, does not rely on ads and does not track you. Most importantly, you are a person to PeerTube, not a product in need of profiling so as to be stuck in video loops. For example, PeerTube doesn't use any biased recommendation algorithms to keep you online for hours on end. All of this is made possible by PeerTube's free/libre license (GNU-AGPL). Its code is a digital "common", that belongs to everybody, instead of a secret formula that belongs to Google (in the case of Youtube) or to Vivendi/Bolloré (Dailymotion). This free/libre license guarantees our fundamental freedoms as users and allows many contributors to offer evolutions and new features. Are you a video maker? With PeerTube, choose your hosting company and the rules you believe in. YouTube has clearly gone astray: its hoster, Google-Alphabet, can enforce its ContentID system (the infamous "Robocopyright") or its video recommendation system, all of which appear to be as obscure as they are unfair. Direct contact with a human-scale hoster allows for two things: you no longer are the client of a huge tech company, and you can nurture a special relationship with your hoster, who distributes your data. With PeerTube, you get to choose your hosting provider according to their terms of use, such as their disk space limit per user, their moderation policy, who they choose to federate with... You are not speaking with a huge tech company, so you can talk it out in case of any issue, need, desire...
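As a rough illustration of the federation described above, the following is a minimal sketch of the kind of ActivityPub "Follow" activity exchanged when a Mastodon account follows a PeerTube account. The instance domains and actor URLs are hypothetical placeholders, not real servers, and real implementations add HTTP signatures and delivery details omitted here.

# Sketch of an ActivityPub "Follow" activity (ActivityStreams 2.0 JSON),
# of the kind sent when a Mastodon user follows a PeerTube account.
# The domains and URLs below are hypothetical placeholders.
import json

follow_activity = {
    "@context": "https://www.w3.org/ns/activitystreams",
    "id": "https://mastodon.example/users/alice/follows/42",
    "type": "Follow",
    "actor": "https://mastodon.example/users/alice",             # the follower
    "object": "https://peertube.example/accounts/some_channel",  # the PeerTube account
}

print(json.dumps(follow_activity, indent=2))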
Browse/discover PeerTube instances About peer-to-peer broadcasting and watching The PeerTube software can, whenever necessary, use a peer-to-peer protocol (P2P) to broadcast viral videos, lowering the load on their hosts. In this way, when you watch a video, your computer contributes to its broadcast. If a lot of people are watching the same video at the same time, their browsers automatically send small pieces of the video to the other viewers. The server resources are not over-exploited: the stream is split, the network optimized. It might not look like it, but thanks to peer-to-peer broadcasting, popular video makers and their videos are no longer forced to be hosted by big companies, whose infrastructure can stand thousands of views at the same time... or to pay for a robust but extremely expensive independent video host. Your move! Browse contents Sign up Enjoy every feature: history, subscriptions, playlists, notifications... Who is behind? PeerTube is free/libre software funded by a French non-profit organization: Framasoft Our organization started in 2004, and now devotes itself to popular education about digital technology issues. We are a small structure of less than 40 members and under 10 employees, well-known for the De-google-ify Internet project, when we offered 34 ethical and alternative online tools. As a public interest organization, over 90% of our funding comes from donations (tax deductible for French taxpayers). Thanks to our crowdfunding (from March to July 2018), Framasoft was able to employ PeerTube's main developer. After a beta release in March 2018, release 1 came out in November 2018. Since then, several intermediary releases have brought many features along. Several collectives have already created PeerTube hosts, laying the foundation for the federation. The more people use, support, and contribute to PeerTube, the quicker it will become a concrete alternative to platforms like YouTube. Donate to Framasoft Legal notices Contact Newsletter Forum Press kit JoinPeerTube Git PeerTube Git Website developed by Framasoft and designed by Maiwann Illustrations from What is PeerTube video, created by LILA - ZeMarmot Team PeerTube mascot created by David Revoy PeerTube news! content licensed under CC-BY-SA journal-code4lib-org-790 ---- The Code4Lib Journal Mission Editorial Committee Process and Structure Code4Lib Issue 50, 2021-02-10 Editorial Eric Hanson Resuming our publication schedule Managing an institutional repository workflow with GitLab and a folder-based deposit system Whitney R. Johnson-Freeman, Mark E. Phillips, and Kristy K. Phillips Institutional Repositories (IR) exist in a variety of configurations and in various states of development across the country. Each organization with an IR has a workflow that can range from explicitly documented and codified sets of software and human workflows, to ad hoc assortments of methods for working with faculty to acquire, process and load items into a repository. The University of North Texas (UNT) Libraries has managed an IR called UNT Scholarly Works for the past decade but has until recently relied on ad hoc workflows. Over the past six months, we have worked to improve our processes in a way that is extensible and flexible while also providing a clear workflow for our staff to process submitted and harvested content. Our approach makes use of GitLab and its associated tools to track and communicate priorities for a multi-user team processing resources.
We paired this Web-based management with a folder-based system for moving the deposited resources through a sequential set of processes that are necessary to describe, upload, and preserve the resource. This strategy can be used in a number of different applications and can serve as a set of building blocks that can be configured in different ways. This article will discuss which components of GitLab are used together as tools for tracking deposits from faculty as they move through different steps in the workflow. Likewise, the folder-based workflow queue will be presented and described as implemented at UNT, and examples for how we have used it in different situations will be presented. Customizing Alma and Primo for Home & Locker Delivery Christina L. Hennessey Like many Ex Libris libraries in Fall 2020, our library at California State University, Northridge (CSUN) was not physically open to the public during the 2020-2021 academic year, but we wanted to continue to support the research and study needs of our over 38,000 university students and 4,000 faculty and staff. This article will explain our Alma and Primo implementation to allow for home mail delivery of physical items, including policy decisions, workflow changes, customization of request forms through labels and delivery skins, customization of Alma letters, a Python solution to add the “home” address type to patron addresses to make it all work, and will include relevant code samples in Python, XSL, CSS, XML, and JSON. In Spring 2021, we will add the on-site locker delivery option in addition to home delivery, and this article will include new system changes made for that option. GaNCH: Using Linked Open Data for Georgia’s Natural, Cultural and Historic Organizations’ Disaster Response Cliff Landis, Christine Wiseman, Allyson F. Smith, Matthew Stephens In June 2019, the Atlanta University Center Robert W. Woodruff Library received a LYRASIS Catalyst Fund grant to support the creation of a publicly editable directory of Georgia’s Natural, Cultural and Historical Organizations (NCHs), allowing for quick retrieval of location and contact information for disaster response. By the end of the project, over 1,900 entries for NCH organizations in Georgia were compiled, updated, and uploaded to Wikidata, the linked open data database from the Wikimedia Foundation. These entries included directory contact information and GIS coordinates that appear on a map presented on the GaNCH project website (https://ganch.auctr.edu/), allowing emergency responders to quickly search for NCHs by region and county in the event of a disaster. In this article we discuss the design principles, methods, and challenges encountered in building and implementing this tool, including the impact the tool has had on statewide disaster response after implementation. Archive This Moment D.C.: A Case Study of Participatory Collecting During COVID-19 Julie Burns, Laura Farley, Siobhan C. Hagan, Paul Kelly, and Lisa Warwick When the COVID-19 pandemic brought life in Washington, D.C. to a standstill in March 2020, staff at DC Public Library began looking for ways to document how this historic event was affecting everyday life. Recognizing the value of first-person accounts for historical research, staff launched Archive This Moment D.C. to preserve the story of daily life in the District during the stay-at-home order. Materials were collected from public Instagram and Twitter posts submitted through the hashtag #archivethismomentdc. 
In addition to social media, creators also submitted materials using an Airtable webform set up for the project and through email. Over 2,000 digital files were collected. This article will discuss the planning, professional collaboration, promotion, selection, access, and lessons learned from the project; as well as the technical setup, collection strategies, and metadata requirements. In particular, this article will include a discussion of the evolving collection scope of the project and the need for clear ethical guidelines surrounding privacy when collecting materials in real-time. Advancing ARKs in the Historical Ontology Space Mat Kelly, Christopher B. Rauch, Jane Greenberg, Sam Grabus, Joan Boone, John Kunze and Peter M. Logan This paper presents the application of Archival Resource Keys (ARKs) for persistent identification and resolution of concepts in historical ontologies. Our use case is the 1910 Library of Congress Subject Headings (LCSH), which we have converted to the Simple Knowledge Organization System (SKOS) format and will use for representing a corpus of historical Encyclopedia Britannica articles. We report on the steps taken to assign ARKs in support of the Nineteenth-Century Knowledge Project, where we are using the HIVE vocabulary tool to automatically assign subject metadata from both the 1910 LCSH and the contemporary LCSH faceted, topical vocabulary to enable the study of the evolution of knowledge. Considered Content: a Design System for Equity, Accessibility, and Sustainability Erinn Aspinall, Amy Drayer, Gabe Ormsby, and Jen Neveau The University of Minnesota Libraries developed and applied a principles-based design system to their Health Sciences Library website. With the design system at its center, the revised site was able to achieve accessible, ethical, inclusive, sustainable, responsible, and universal design. The final site was built with elegantly accessible semantic HTML-focused code on Drupal 8 with highly curated and considered content, meeting and exceeding WCAG 2.1 AA guidance and addressing cognitive and learning considerations through the use of plain language, templated pages for consistent page-level organization, and no hidden content. As a result, the site better supports all users regardless of their abilities, attention level, mental status, reading level, and reliability of their internet connection, all of which are especially critical now as an elevated number of people experience crises, anxieties, and depression. Robustifying Links To Combat Reference Rot Shawn Jones, Martin Klein, and Herbert Van de Sompel Links to web resources frequently break, and linked content can change at unpredictable rates. These dynamics of the Web are detrimental when references to web resources provide evidence or supporting information. In this paper, we highlight the significance of reference rot, provide an overview of existing techniques and their characteristics to address it, and introduce our Robust Links approach, including its web service and underlying API. Robustifying links offers a proactive, uniform, and machine-actionable way to combat reference rot. In addition, we discuss our reasoning and approach aimed at keeping the approach functional for the long term. To showcase our approach, we have robustified all links in this article. 
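To give a concrete sense of what the robustified links described in that abstract can look like, here is a minimal sketch of decorating an HTML anchor with the original URL, a dated archival snapshot, and the snapshot date. The data attribute names follow the Robust Links convention as I understand it, and the snapshot URL and date are hypothetical examples, not output of the authors' web service.

# Sketch: a "robustified" link that carries a dated archival snapshot
# alongside the visible href. The attribute names follow the Robust Links
# convention as I understand it; the snapshot URL and date are hypothetical.
original_url = "https://example.org/report.pdf"
snapshot_url = "https://archive.example/web/20210110120000/https://example.org/report.pdf"
version_date = "2021-01-10"

robust_anchor = (
    f'<a href="{original_url}" '
    f'data-versionurl="{snapshot_url}" '
    f'data-versiondate="{version_date}">the cited report</a>'
)
print(robust_anchor)

Because the original URL, the snapshot, and the date all travel with the link, a reader or a machine can fall back to the archived copy if the live page rots or drifts, which is the "proactive, uniform, and machine-actionable" behavior the abstract describes.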
Machine Learning Based Chat Analysis Christopher Brousseau, Justin Johnson, Curtis Thacker The BYU library implemented a Machine Learning-based tool to perform various text analysis tasks on transcripts of chat-based interactions between patrons and librarians. These text analysis tasks included estimating patron satisfaction and classifying queries into various categories such as Research/Reference, Directional, Tech/Troubleshooting, Policy/Procedure, and others. An accuracy of 78% or better was achieved for each category. This paper details the implementation details and explores potential applications for the text analysis tool. Always Be Migrating Elizabeth McAulay At the University of California, Los Angeles, the Digital Library Program is in the midst of a large, multi-faceted migration project. This article presents a narrative of migration and a new mindset for technology and library staff in their ever-changing infrastructure and systems. This article posits that migration from system to system should be integrated into normal activities so that it is not a singular event or major project, but so that it is a process built into the core activities of a unit. ISSN 1940-5758 Current Issue Issue 50, 2021-02-10 Previous Issues Issue 49, 2020-08-10 Issue 48, 2020-05-11 Issue 47, 2020-02-17 Issue 46, 2019-11-05 Older Issues For Authors Call for Submissions Article Guidelines Log in This work is licensed under a Creative Commons Attribution 3.0 United States License. kcoyle-blogspot-com-8471 ---- Coyle's InFormation Coyle's InFormation Comments on the digital age, which, as we all know, is 42. Monday, March 01, 2021 Digitization Wars, Redux  (NB: IANAL)   Because this is long, you can download it as a PDF here. From 2004 to 2016 the book world (authors, publishers, libraries, and booksellers) was involved in the complex and legally fraught activities around Google’s book digitization project. Once known as “Google Book Search,” the company claimed that it was digitizing books to be able to provide search services across the print corpus, much as it provides search capabilities over texts and other media that are hosted throughout the Internet.  Both the US Authors Guild and the Association of American Publishers sued Google (both separately and together) for violation of copyright. These suits took a number of turns including proposals for settlements that were arcane in their complexity and that ultimately failed. Finally, in 2016 the legal question was decided: digitizing to create an index is fair use as long as only minor portions of the original text are shown to users in the form of context-specific snippets.  We now have another question about book digitization: can books be digitized for the purpose of substituting remote lending in the place of the lending of a physical copy? This has been referred to as “Controlled Digital Lending (CDL),” a term developed by the Internet Archive for its online book lending services. The Archive has considerable experience with both digitization and providing online access to materials in various formats, and its Open Library site has been providing digital downloads of out of copyright books for more than a decade. Controlled digital lending applies solely to works that are presumed to be in copyright.  Controlled digital lending works like this: the Archive obtains and retains a physical copy of a book. The book is digitized and added to the Open Library catalog of works. 
Users can borrow the book for a limited time (2 weeks) after which the book “returns” to the Open Library. While the book is checked out to a user no other user can borrow that “copy.” The digital copy is linked one-to-one with a physical copy, so if more than one copy of the physical book is owned then there is one digital loan available for each physical copy.  The Archive is not alone in experimenting with lending of digitized copies: some libraries have partnered with the Archive’s digitization and lending service to provide digital lending for library-owned materials. In the case of the Archive the physical books are not available for lending. Physical libraries that are experimenting with CDL face the added step of making sure that the physical book is removed from circulation while the digitized book is on loan, and reversing that on return of the digital book.  Although CDL has an air of legality due to limiting lending to one user at a time, authors and publishers associations had raised objections to the practice. [nwu] However, in March of 2020 the Archive took a daring step that pushed their version of the CDL into litigation: using the closing of many physical libraries due to the COVID pandemic as its rationale, the Archive renamed its lending service the National Emergency Library [nel] and eliminated the one-to-one link between physical and digital copies. Ironically this meant that the Archive was then actually doing what the book industry had accused it of (either out of misunderstanding or as an exaggeration of the threat posed): it was making and lending digital copies beyond its physical holdings. The Archive stated that the National Emergency Library would last only until June of 2020, presumably because by then the COVID danger would have passed and libraries would have re-opened. In June the Archive’s book lending service returned to the one-to-one model. Also in June a suit was filed by four publishers (Hachette, HarperCollins, Penguin Random House, and Wiley) in the US District Court of the Southern District of New York. [suit]  The Controlled Digital Lending, like the Google Books project, holds many interesting questions about the nature of “digital vs physical,” not only in a legal sense but in a sense of what it means to read and to be a reader today. The lawsuit not only does not further our understanding of this fascinating question; it sinks immediately into hyperbole, fear-mongering, and either mis-information or mis-direction. That is, admittedly, the nature of a lawsuit. What follows here is not that analysis but gives a few of the questions that are foremost in my mind.  Apples and Oranges   Each of the players in this drama has admirable reasons for their actions. The publishers explain in their suit that they are acting in support of authors, in particular to protect the income of authors so that they may continue to write. The Authors’ Guild provides some data on author income, and by their estimate the average full-time author earns less than $20,000 per year, putting them at poverty level.[aghard] (If that average includes the earnings of highly paid best selling authors, then the actual earnings of many authors is quite a bit less than that.)  The Internet Archive is motivated to provide democratic access to the content of books to anyone who needs or wants it. Even before the pandemic caused many libraries to close the collection housed at the Archive contained some works that are available only in a few research libraries. 
This is because many of the books were digitized during the Google Books project which digitized books from a small number of very large research libraries whose collections differ significantly from those of the public libraries available to most citizens.  Where the pronouncements of both parties fail is in making a false equivalence between some authors and all authors, and between some books and all books, and the result is that this is a lawsuit pitting apples against oranges. We saw in the lawsuits against Google that some academic authors, who may gain status based on their publications but very little if any income, did not see themselves as among those harmed by the book digitization project. Notably the authors in this current suit, as listed in the bibliography of pirated books in the appendix to the lawsuit, are ones whose works would be characterized best as “popular” and “commercial,” not academic: James Patterson, J. D. Salinger, Malcolm Gladwell, Toni Morrison, Laura Ingalls Wilder, and others. Not only do the living authors here earn above the poverty level, all of them provide significant revenue for the publishers themselves. And all of the books listed are in print and available in the marketplace. No mention is made of out-of-print books, no academic publishers seem to be involved.  On the part of the Archive, they state that their digitized books fill an educational purpose, and that their collection includes books that are not available in digital format from publishers: “ While Overdrive, Hoopla, and other streaming services provide patrons access to latest best sellers and popular titles,  the long tail of reading and research materials available deep within a library’s print collection are often not available through these large commercial services.  What this means is that when libraries face closures in times of crisis, patrons are left with access to only a fraction of the materials that the library holds in its collection.”[cdl-blog] This is undoubtedly true for some of the digitized books, but the main thesis of the lawsuit points out that the Archive has digitized and is also lending current popular titles. The list of books included in the appendix of the lawsuit shows that there are in-copyright and most likely in-print books of a popular reading nature that have been part of the CDL. These titles are available in print and may also be available as ebooks from the publishers. Thus while the publishers are arguing that current, popular books should not be digitized and loaned (apples), the Archive is arguing that they are providing access to items not available elsewhere, and for educational purposes (oranges).  The Law  The suit states that publishers are not questioning copyright law, only violations of the law. “For the avoidance of doubt, this lawsuit is not about the occasional transmission of a title under appropriately limited circumstances, nor about anything permissioned or in the public domain. On the contrary, it is about IA’s purposeful collection of truckloads of in-copyright books to scan, reproduce, and then distribute digital bootleg versions online.” ([Suit] Page 3). This brings up a whole range of legal issues in regard to distributing digital copies of copyrighted works. 
There have been lengthy arguments about whether copyright law could permit first sale rights for digital items, and the answer has generally been no; some copyright holders have made the argument that since transfer of a digital file is necessarily the making of a copy there can be no first sale rights for those files. [1stSale] [ag1] Some ebook systems, such as the Kindle, have allowed time-limited person-to-person lending for some ebooks. This is governed by license terms between Amazon and the publishers, not by the first sale rights of the analog world. Section 108 of the copyright law does allow libraries and archives to make a limited number of copies. The first point of section 108 states that libraries can make a single copy of a work as long as 1) it is not for commercial advantage, 2) the collection is open to the public and 3) the reproduction includes the copyright notice from the original. This sounds like what the Archive is doing. However, the next two sections (b and c) provide limitations on that first section that appear to put the Archive in legal jeopardy: section “b” clarifies that copies may be made for preservation or security; section “c” states that the copies can be made if the original item is deteriorating and a replacement can no longer be purchased. Neither of these applies to the Archive’s lending. In addition to its lending program, the Archive provides downloads of scanned books in DAISY format for those who are certified as visually impaired by the National Library Service for the Blind and Physically Handicapped in the US. This is covered in section 121A of the copyright law, Title 17, which allows the distribution of copyrighted works in accessible formats. This service could possibly be cited as a justification of the scanning of in-copyright works at the Archive, although without mitigating the complaints about lending those copies to others. This is a laudable service of the Archive if scans are usable by the visually impaired, but the DAISY-compatible files are based on the OCR’d text, which can be quite dirty. Without data on downloads under this program it is hard to know the extent to which this program benefits visually impaired readers. Lending Most likely as part of the strategy of the lawsuit, very little mention is made of “lending.” Instead the suit uses terms like “download” and “distribution” which imply that the user of the Archive’s service is given a permanent copy of the book: “With just a few clicks, any Internet-connected user can download complete digital copies of in-copyright books from Defendant.” ([suit] Page 2). “... distributing the resulting illegal bootleg copies for free over the Internet to individuals worldwide.” ([suit] Page 14). Publishers were reluctant to allow the creation of ebooks for many years until they saw that DRM would protect the digital copies. It then was another couple of years before they could feel confident about lending - and by lending I mean lending by libraries. It appears that Overdrive, the main library lending platform for ebooks, worked closely with publishers to gain their trust. The lawsuit questions whether the lending technology created by the Archive can be trusted. “...Plaintiffs have legitimate fears regarding the security of their works both as stored by IA on its servers” ([suit] Page 47). In essence, the suit accuses IA of a lack of transparency about its lending operation.
Of course, any collaboration between IA and publishers around the technology is not possible because the two are entirely at odds and the publishers would reasonably not cooperate with folks they see as engaged in piracy of their property.  Even if the Archive’s lending technology were proven to be secure, lending alone is not the issue: the Archive copied the publishers’ books without permission prior to lending. In other words, they were lending content that they neither owned (in digital form) nor had licensed for digital distribution. Libraries pay, and pay dearly, for the ebook lending service that they provide to their users. The restrictions on ebooks may seem to be a money-grab on the part of publishers, but from their point of view it is a revenue stream that CDL threatens.  Is it About the Money? “... IA rakes in money from its infringing services…” ([suit] Page 40). (Note: publishers earn, IA “rakes in”) “Moreover, while Defendant promotes its non-profit status, it is in fact a highly commercial enterprise with millions of dollars of annual revenues, including financial schemes that provide funding for IA’s infringing activities. ([suit] Page 4). These arguments directly address section (a)(1) of Title 17, section 108: “(1) the reproduction or distribution is made without any purpose of direct or indirect commercial advantage”.  At various points in the suit there are references to the Archive’s income, both for its scanning services and donations, as well as an unveiled show of envy at the over $100 million that Brewster Kahle and his wife have in their jointly owned foundation. This is an attempt to show that the Archive derives “direct or indirect commercial advantage” from CDL. Non-profit organizations do indeed have income, otherwise they could not function, and “non-profit” does not mean a lack of a revenue stream, it means returning revenue to the organization instead of taking it as profit. The argument relating to income is weakened by the fact that the Archive is not charging for the books it lends. However, much depends on how the courts will interpret “indirect commercial advantage.” The suit argues that the Archive benefits generally from the scanned books because this enhances the Archive’s reputation which possibly results in more donations. There is a section in the suit relating to the “sponsor a book” program where someone can donate a specific amount to the Archive to digitize a book. How many of us have not gotten a solicitation from a non-profit that makes statements like: “$10 will feed a child for a day; $100 will buy seed for a farmer, etc.”? The attempt to correlate free use of materials with income may be hard to prove.  Reading  Decades ago, when the service Questia was just being launched (Questia ceased operation December 21, 2020), Questia sales people assured a group of us that their books were for “research, not reading.” Google used a similar argument to support its scanning operation, something like “search, not reading.” The court decision in Google’s case decided that Google’s scanning was fair use (and transformative) because the books were not available for reading, as Google was not presenting the full text of the book to its audience.[suit-g]  The Archive has taken the opposite approach, a “books are for reading” view. Beginning with public domain books, many from the Google books project, and then with in-copyright books, the Archive has promoted reading. 
It developed its own in-browser reading software to facilitate reading of the books online. [reader] (*See note below) Although the publishers sued Google for its scanning, they lost due to the “search, not reading” aspect of that project. The Archive has been very clear about its support of reading, which takes the Google justification off the table. “Moreover, IA’s massive book digitization business has no new purpose that is fundamentally different than that of the Publishers: both distribute entire books for reading.” ([suit] Page 5). However, the Archive's statistics on loaned books show that a large proportion of the books are used for 30 minutes or less. “Patrons may be using the checked-out book for fact checking or research, but we suspect a large number of people are browsing the book in a way similar to browsing library shelves.” [ia1] In its article on the CDL, the Center for Democracy and Technology notes that “the majority of books borrowed through NEL were used for less than 30 minutes, suggesting that CDL’s primary use is for fact-checking and research, a purpose that courts deem favorable in a finding of fair use.” [cdt] The complication is that the same service seems to be used both for reading of entire books and as a place to browse or to check individual facts (the facts themselves cannot be copyrighted). These may involve different sets of books, once again making it difficult to characterize the entire set of digitized books under a single legal claim. The publishers claim that the Archive is competing with them using pirated versions of their own products. That leads us to the question of whether the Archive’s books, presented for reading, are effectively substitutes for those of the publishers. Although the Archive offers actual copies, those copies are significantly inferior to the original. However, the question of quality did not change the judgment in the lawsuit against copying of texts by Kinko’s [kinkos], which produced mediocre photocopies from printed and bound publications. It seems unlikely that the quality differential will serve to absolve the Archive from copyright infringement even though the poor quality of some of the books interferes with their readability. Digital is Different Publishers have found a way to monetize digital versions, in spite of some risks, by taking advantage of the ability to control digital files with technology and by licensing, not selling, those files to individuals and to libraries. It’s a “new product” that gets around First Sale because, as it is argued, every transfer of a digital file makes a copy, and it is the making of copies that is covered by copyright law. [1stSale] The upshot of this is that because a digital resource is licensed, not sold, the right to pass along, lend, or re-sell a copy (as per Title 17 section 109) does not apply even though technology solutions that would delete the sender’s copy as the file safely reaches the recipient are not only plausible but have been developed.
[resale]  “Like other copyright sectors that license education technology or entertainment software, publishers either license ebooks to consumers or sell them pursuant to special agreements or terms.” ([suit] Page 15) “When an ebook customer obtains access to the title in a digital format, there are set terms that determine what the user can or cannot do with the underlying file.”([suit] Page 16) This control goes beyond the copyright holder’s rights in law: DRM can exercise controls over the actual use of a file, limiting it to specific formats or devices, allowing or not allowing text-to-speech capabilities, even limiting copying to the clipboard. Publishers and Libraries  The suit claims that publishers and libraries have reached an agreement, an equilibrium. “To Plaintiffs, libraries are not just customers but allies in a shared mission to make books available to those who have a desire to read, including, especially, those who lack the financial means to purchase their own copies.” ([suit] Page 17). In the suit, publishers contrast the Archive’s operation with the relationship that publishers have with libraries. In contrast with the Archive’s lending program, libraries are the “good guys.” “... the Publishers have established independent and distinct distribution models for ebooks, including a market for lending ebooks through libraries, which are governed by different terms and expectations than print books.”([suit] Page 6). These “different terms” include charging much higher prices to libraries for ebooks, limiting the number of times an ebook can be loaned. [pricing1] [pricing2] “Legitimate libraries, on the other hand, license ebooks from publishers for limited periods of time or a limited number of loans; or at much higher prices than the ebooks available for individual purchase.” [agol] The equilibrium of which publishers speak looks less equal from the library side of the equation: library literature is replete with stories about the avarice of publishers in relation to library lending of ebooks. Some authors/publishers even speak out against library lending of physical books, claiming that this cuts into sales. (This same argument has been made for physical books.) “If, as Macmillan has determined, 45% of ebook reads are occurring through libraries and that percentage is only growing, it means that we are training readers to read ebooks for free through libraries instead of buying them. With author earnings down to new lows, we cannot tolerate ever-decreasing book sales that result in even lower author earnings.” [agliblend][ag42] The ease of access to digital books has become a boon for book sales, and ebook sales are now rising while hard copy sales fall. This economic factor is a motivator for any of those engaged with the book market. The Archive’s CDL is a direct affront to the revenue stream that publishers have carved out for specific digital products. There are indications that the ease of borrowing of ebooks - not even needing to go to the physical library to borrow a book - is seen as a threat by publishers. This has already played out in other media, from music to movies.  It would be hard to argue that access to the Archive’s digitized books is merely a substitute for library access. Many people do not have actual physical library access to the books that the Archive lends, especially those digitized from the collections of academic libraries. 
This is particularly true when you consider that the Archive’s materials are available to anyone in the world with access to the Internet. If you don’t have an economic interest in book sales, and especially if you are an educator or researcher, this expanded access could feel long overdue. We need numbers We really do not know much about the uses of the Archive’s book collection. The lawsuit cites some statistics of “views” to show that the infringement has taken place, but the page in question does not explain what is meant by a “view”. Archive pages for downloadable files of metadata records also report “views” which most likely reflect views of that web page, since there is nothing viewable other than the page itself. Open Library book pages give “currently reading” and “have read” stats, but these are tags that users can manually add to the page for the work. To compound things, the 127 books cited in the suit have been removed from the lending service (and are identified in the Archive as being in the collection “litigation works”). Although numbers may not affect the legality of the controlled digital lending, the social impact of the Archive’s contribution to reading and research would be clearer if we had this information. Although the Archive has provided a small number of testimonials, proof of use in educational settings would bolster the claims of social benefit, which in turn could strengthen a fair use defense. Notes (*) The NWU has a slide show [nwu2] that explains what it calls Controlled Digital Lending at the Archive. Unfortunately this document conflates the Archive's book Reader with CDL and therefore muddies the water. It muddies it because it does not distinguish between sending files to dedicated devices (which is what the Kindle is) or to dedicated software like Libby, which libraries use, and the Archive's use of a web-based reader. It is not beyond reason to suppose that the Archive's Reader software does not fully secure loaned items. The NWU claims that files are left in the browser cache that represent all book pages viewed: "There’s no attempt whatsoever to restrict how long any user retains these images". (I cannot reproduce this. In my minor experiments those files disappear at the end of the lending period, but this requires more concerted study.) However, this is not a fault of CDL but a fault of the Reader software. The reader is software that works within a browser window. In general, electronic files that require secure and limited use are not used within browsers, which are general purpose programs. Conflating the Archive's Reader software with Controlled Digital Lending will only hinder understanding. Already CDL has multiple components: 1) digitization of in-copyright materials, and 2) lending of digital copies of in-copyright materials that are owned by the library in a 1-to-1 relation to physical copies (this one-to-one model is sketched below). We can add #3, the leakage of page copies via the browser cache, but I maintain that poorly functioning software does not automatically moot points 1 and 2. I would prefer that we take each point on its own in order to get a clear idea of the issues. The NWU slides also refer to the Archive's API which allows linking to individual pages within books. This is an interesting legal area because it may be determined to be fair use regardless of the legality of the underlying copy. This becomes yet another issue to be discussed by the legal teams, but it is separate from the question of controlled digital lending. Let's stay focused.
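To pin down what the one-to-one model in point 2 amounts to, here is a toy sketch of the lending invariant: concurrent digital loans never exceed the owned physical copies, and each loan expires after the two-week period mentioned earlier. This is an illustration of the concept only, not the Internet Archive's actual implementation; the class and method names are hypothetical.

# Toy model of the controlled digital lending invariant described above
# (point 2): concurrent digital loans never exceed owned physical copies.
# This is an illustration only, not the Internet Archive's implementation.
from datetime import date, timedelta

class CDLTitle:
    LOAN_DAYS = 14  # the two-week loan period mentioned in the post

    def __init__(self, owned_physical_copies: int):
        self.owned = owned_physical_copies
        self.loans = {}  # borrower -> due date

    def checkout(self, borrower: str) -> date:
        self._expire_overdue()
        if borrower in self.loans:
            raise ValueError("this borrower already has the title")
        if len(self.loans) >= self.owned:
            raise ValueError("all owned copies are currently on digital loan")
        due = date.today() + timedelta(days=self.LOAN_DAYS)
        self.loans[borrower] = due
        return due

    def checkin(self, borrower: str) -> None:
        self.loans.pop(borrower, None)

    def _expire_overdue(self) -> None:
        today = date.today()
        self.loans = {b: d for b, d in self.loans.items() if d >= today}

title = CDLTitle(owned_physical_copies=1)
print(title.checkout("reader_a"))   # succeeds: one copy owned, none on loan
try:
    title.checkout("reader_b")      # fails: the single copy is checked out
except ValueError as err:
    print(err)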
Citations [1stSale] https://abovethelaw.com/2017/11/a-digital-take-on-the-first-sale-doctrine/ [ag1] https://www.authorsguild.org/industry-advocacy/reselling-a-digital-file-infringes-copyright/ [ag42] https://www.authorsguild.org/industry-advocacy/authors-guild-survey-shows-drastic-42-percent-decline-in-authors-earnings-in-last-decade/ [aghard] https://www.authorsguild.org/the-writing-life/why-is-it-so-goddamned-hard-to-make-a-living-as-a-writer-today/ [aglibend] https://www.authorsguild.org/industry-advocacy/macmillan-announces-new-library-lending-terms-for-ebooks/ [agol] https://www.authorsguild.org/industry-advocacy/update-open-library/ [cdl-blog] https://blog.archive.org/2020/03/09/controlled-digital-lending-and-open-libraries-helping-libraries-and-readers-in-times-of-crisis/ [cdt] https://cdt.org/insights/up-next-controlled-digital-lendings-first-legal-battle-as-publishers-take-on-the-internet-archive/ [kinkos] https://law.justia.com/cases/federal/district-courts/FSupp/758/1522/1809457 [nel] http://blog.archive.org/national-emergency-library/ [nwu] "Appeal from the victims of Controlled Digital Lending (CDL)". (Retrieved 2021-01-10) [nwu2] "What is the Internet Archive doing with our books?" https://nwu.org/wp-content/uploads/2020/04/NWU-Internet-Archive-webinar-27APR2020.pdf [pricing1] https://www.authorsguild.org/industry-advocacy/e-book-library-pricing-the-game-changes-again/ [pricing2] https://americanlibrariesmagazine.org/blogs/e-content/ebook-pricing-wars-publishers-perspective/ [reader] Bookreader [resale] https://www.hollywoodreporter.com/thr-esq/appeals-court-weighs-resale-digital-files-1168577 [suit] https://www.courtlistener.com/recap/gov.uscourts.nysd.537900/gov.uscourts.nysd.537900.1.0.pdf [suit-g] https://cases.justia.com/federal/appellate-courts/ca2/13-4829/13-4829-2015-10-16.pdf?ts=1445005805 Posted by Karen Coyle at 11:54 AM No comments: Labels: ebooks books digitization Internet Archive Open Library Controlled digital lending Thursday, June 25, 2020 Women designing Those of us in the library community are generally aware of our premier "designing woman," the so-called "Mother of MARC," Henriette Avram. Avram designed the MAchine Readable Cataloging (MARC) record in the mid-1960's, a record format that is still being used today. MARC was way ahead of its time, using variable-length data fields and a unique character set that was sufficient for most European languages, all thanks to Avram's vision and skill. I'd like to introduce you here to some of the designing women of the University of California library automation project, the project that created one of the first online catalogs in the beginning of the 1980's, MELVYL. Briefly, MELVYL was a union catalog that combined data from the libraries of the nine (at that time) University of California campuses. It was first brought up as a test system in 1980 and went "live" to the campuses in 1982. Work on the catalog began in or around 1980, and various designs were put forward and tested. Key designers were Linda Gallaher-Brown, who had one of the first master's degrees in computer science from UCLA, and Kathy Klemperer, who like many of us was a librarian turned systems designer. We were struggling with how to create a functional relational database of bibliographic data (as defined by the MARC record) with computing resources that today would seem laughable but were "cutting edge" for that time.
I remember Linda remarking that during one of her school terms she returned to her studies to learn that the newer generation of computers would have this thing called an "operating system," and she thought "why would you need one?" By the time of this photo she had come to appreciate what an operating system could do for you. The one we used at the time was IBM's OS 360/370. Kathy Klemperer was the creator of the database design diagrams that were so distinctive we called them "Klemperer-grams." Here's one from 1985: MELVYL database design Klemperer-gram, 1985 Drawn and lettered by hand, not only did these describe a workable database design, they were impressively beautiful. Note that this not only predates the proposed 2009 RDA "database scenario" for a relational bibliographic design by 24 years, it provides a more detailed and most likely more accurate such design. RDA "Scenario 1" data design, 2009 In the early days of the catalog we had a separate file and interface for the cataloged serials, based on a statewide project (including the California State Universities). Although it was possible to catalog serials in the MARC format, the detailed information about which issues the libraries held was stored in serials control databases that were separate from the library catalog, and many serials were represented by crusty cards that had been created decades before library automation. The group below developed and managed the CALLS (California Academic Library List of Serials). Four of those pictured were programmers, two were serials data specialists, and four had library degrees. Obviously, these are overlapping sets. The project heads were Barbara Radke (right) and Theresa Montgomery (front, second from right). At one point while I was still working on the MELVYL project, probably around the very late 1990's or early 2000's, I gathered up some organization charts that had been issued over the years and quickly calculated that during its history the project's technical staff, the people who had created this early marvel, had varied from 3/4 to 2/3 female. I did some talks at various conferences in which I called MELVYL a system "created by women." At my retirement in 2003 I said the same thing in front of the entire current staff, and it was not well-received by all. In that audience was one well-known member of the profession who later declared that he felt women needed more mentoring in technology because he had always worked primarily with men, even though he had in fact worked in an organization with a predominantly female technical staff, and another colleague who was incredulous when I stated once that women are not a minority but over 50% of the world's population. He just couldn't believe it. While outright discrimination and harassment of women are issues that need to be addressed, the invisibility of women in the eyes of their colleagues and institutions is horribly damaging. There are many interesting projects, not the least of them the Wikipedia Women in Red project, that aim to show that there is no lack of accomplished women in the world; it's the acknowledgment of their accomplishments that falls short. In the library profession we have many women whose stories are worth telling. Please, let's make sure that future generations know that they have foremothers to look to for inspiration.
Posted by Karen Coyle at 9:47 AM 1 comment: Labels: library catalogs, library history, open data, women and technology Monday, May 25, 2020 1982 I've been trying to capture what I remember about the early days of library automation. Mostly my memory is about fun discoveries in my particular area (processing MARC records into the online catalog). I did run into an offprint of some articles in ITAL from 1982 (*) which provide very specific information about the technical environment, and I thought some folks might find that interesting. This refers to the University of California MELVYL union catalog, which at the time had about 800,000 records. Operating system: IBM OS 360/370. Programming language: PL/I. Memory: 24 megabytes. Storage: 22 disk drives, ~10 gigabytes. DBMS: ADABAS. The disk drives were each about the size of an industrial washing machine. In fact, we referred to the room that held them as "the laundromat." Telecommunications was a big deal because there was no telecommunications network linking the libraries of the University of California. There wasn't even one connecting the campuses at all. The article talks about the various possibilities, from an X.25 network to the new TCP/IP protocol that allows "internetwork communication." The first network was a set of dedicated lines leased from the phone company that could transmit 120 characters per second (character = byte) to about 8 ASCII terminals at each campus over a 9600 baud line. There was a hope to be able to double the number of terminals. In the speculation about the future, there was doubt that it would be possible to open up the library system to folks outside of the UC campuses, much less internationally. (MELVYL was one of the early library catalogs to be open access worldwide over the Internet, just a few years later.) It was also thought that libraries would charge other libraries to view their catalogs, kind of like an inter-library loan. And for anyone who has an interest in Z39.50, one section of the article by David Shaughnessy and Clifford Lynch on telecommunications outlines a need for catalog-to-catalog communication which sounds very much like the first glimmer of that protocol. ----- (*) Various authors in a special edition: (1982). In-Depth: University of California MELVYL. Information Technology and Libraries, 1(4). I wish I could give a better citation but my offprint does not have page numbers and I can't find this indexed anywhere. (Cue here the usual irony that libraries are terrible at preserving their own story.) Posted by Karen Coyle at 6:25 AM No comments: Labels: library catalogs, library history Monday, April 27, 2020 Ceci n'est pas une Bibliothèque On March 24, 2020, the Internet Archive announced that it would "suspend waitlists for the 1.4 million (and growing) books in our lending library," a service it then named the National Emergency Library. These books were previously available for lending on a one-to-one basis with the physical book owned by the Archive, and as with physical books, users would have to wait for the book to be returned before they could borrow it. Worded as a suspension of waitlists due to the closure of schools and libraries caused by the COVID-19 pandemic, this announcement essentially eliminated the one-to-one nature of the Archive's Controlled Digital Lending program. Publishers were already making threatening noises about the digital lending when it adhered to lending limitations, and will surely be even more incensed about this unrestricted lending.
I am not going to comment on the legality of the Internet Archive's lending practices. Legal minds, perhaps motivated by future lawsuits, will weigh in on that. I do, however, have much to say on the use of the term "library" for this set of books. It's a topic worthy of a lengthy treatment, but I'll give only a brief account here. LIBRARY … BIBLIOTHÈQUE … BIBLIOTEK The roots “LIBR…” and “BIBLIO…” both come down to us from ancient words for trees and tree bark. It is presumed that said bark was the surface for early writings. “LIBR…”, from the Latin word liber meaning “book,” in many languages is a prefix that indicates a bookseller’s shop, while in English it has come to mean a collection of books and from that also the room or building where books are kept. “BIBLIO…” derives instead from the Greek biblion (one book) and biblia (books, plural). We get the word Bible through the Greek root, which leaked into old Latin and meant The Book. Therefore it is no wonder that in the minds of many people, books = library.  In fact, most libraries are large collections of books, but that does not mean that every large collection of books is a library. Amazon has a large number of books, but is not a library; it is a store where books are sold. Google has quite a few books in its "book search" and even allows you to view portions of the books without payment, but it is also not a library, it's a search engine. The Internet Archive, Amazon, and Google all have catalogs of metadata for the books they are offering, some of it taken from actual library catalogs, but a catalog does not make a quantity of books into a library. After all, Home Depot has a catalog, Walmart has a catalog; in essence, any business with an inventory has a catalog. "...most libraries are large collections of books, but that does not mean that every large collection of books is a library." The Library Test First, I want to note that the Internet Archive has met the State of California test to be defined as a library, and this has made it possible for the Archive to apply for library-related grants for some of its projects. That is a Good Thing because it has surely strengthened the Archive and its activities. However, it must be said that the State of California requirements are pretty minimal, and seem to be limited to a non-profit organization making materials available to the general public without discrimination. There doesn't seem to be a distinction between "library" and "archive" in the state legal code, although librarians and archivists would not generally consider them easily lumped together as equivalent services. The Collection The Archive's blog post says "the Internet Archive currently lends about as many as a US library that serves a population of about 30,000." As a comparison, I found in the statistics gathered by the California State Library those of the Benicia Public Library in Benicia California. Benicia is a city with a population of 31,000; the library has about 88,000 books. Well, you might say, that's not as good as over one million books at the Internet Archive. But, here's the thing: those are not 88,000 random books, they are books chosen to be, as far as the librarians could know, the best books for that small city. If Benicia residents were, for example, primarily Chinese-speaking, the library would surely have many books in Chinese. If the city had a large number of young families then the children's section would get particular attention. 
The users of the Internet Archive's books are a self-selected (and currently un-defined) set of Internet users. Equally difficult to define is the collection that is available to them: This library brings together all the books from Phillips Academy Andover and Marygrove College, and much of Trent University’s collections, along with over a million other books donated from other libraries to readers worldwide that are locked out of their libraries. Each of these is (or was, in the case of Marygrove, which has closed) a collection tailored to the didactic needs of that institution. How one translates that, if one can, to the larger Internet population is unknown. That a collection has served a specific set of users does not mean that it can serve all users equally well. Then there is that other million books, which are a complete black box. Library science I've argued before against dumping a large and undistinguished set of books on a populace, regardless of the good intentions of those doing so. Why not give the library users of a small city these one million books? The main reason is the ability of the library to fulfill the 5 Laws of Library Science: Books are for use. Every reader his or her book. Every book its reader. Save the time of the reader. The library is a growing organism. [0] The online collection of the Internet Archive nicely fulfills laws 1 and 5: the digital books are designed for use, and the library can grow somewhat indefinitely. The other three laws are unfortunately hindered by the somewhat haphazard nature of the set of books, combined with the lack of user services. Of the goals of librarianship, matching readers to books is the most difficult. Let's start with law 3, "every book its reader." When you follow the URL to the National Emergency Library, you see something like this: The lack of cover art is not the problem here. Look at what books you find: two meeting reports, one journal publication, and a book about hand surgery, all from 1925. Scroll down for a bit and you will find it hard to locate items that are less obscure than this, although undoubtedly there are some good reads in this collection. These are not the books whose readers will likely be found in our hypothetical small city. These are books that even some higher education institutions would probably choose not to have in their collections. While these make the total number of available books large, they may not make the total number of useful books large. Winnowing this set to one or more (probably more) wheat-filled collections could greatly increase the usability of this set of books. "While these make the total number of available books large, they may not make the total number of useful books large." A large "anything goes" set of documents is a real challenge for laws 2 and 4: every reader his or her book, and save the time of the reader. The more chaff you have the harder it is for a library user to find the wheat they are seeking. The larger the collection the more of the burden is placed on the user to formulate a targeted search query and to have the background to know which items to skip over. The larger the retrieved set, the less likely that any user will scroll through the entire display to find the best book for their purposes. This is the case for any large library catalog, but these libraries have built their collection around a particular set of goals. Those goals matter. 
Goals are developed to address a number of factors, like: What are the topics of interest to my readers and my institution? How representative must my collection be in each topic area? What are the essential works in each topic area? What depth of coverage is needed for each topic? [1] If we assume (and we absolutely must assume this) that the user entering the library is seeking information that he or she lacks, then we cannot expect users to approach the library as experts in the topic being researched. Although anyone can type in a simple query, fewer can assess the validity and the scope of the results. A search on "California history" in the National Emergency Library yields some interesting-looking books, but are these the best books on the topic? Are any key titles missing? These are the questions that librarians answer when developing collections. The creation of a well-rounded collection is a difficult task. There are actual measurements that can be run against library collections to determine if they have the coverage that can be expected compared to similar libraries. I don't know if any such statistical packages can look beyond quantitative measures to judge the quality of the collection; the ones I'm aware of look at call number ranges, not individual titles.  The Library Service The Archive's own documentation states that "The Internet Archive focuses on preservation and providing access to digital cultural artifacts. For assistance with research or appraisal, you are bound to find the information you seek elsewhere on the internet." It then advises people to get help through their local public library. Helping users find materials suited to their needs is a key service provided by libraries. When I began working in libraries in the dark ages of the 1960's, users generally entered the library and went directly to the reference desk to state the question that brought them to the institution. This changed when catalogs went online and were searchable by keyword, but prior to then the catalog in a public library was primarily a tool for librarians to use when helping patrons. Still, libraries have real or virtual reference desks because users are not expected to have the knowledge of libraries or of topics that would allow them to function entirely on their own. And while this is true for libraries it is also true, perhaps even more so, for archives whose collections can be difficult to navigate without specialized information. Admitting that you give no help to users seeking materials makes the use of the term "library" ... unfortunate. What is to be done? There are undoubtedly a lot of useful materials among the digital books at the Internet Archive. However, someone needing materials has no idea whether they can expect to find what they need in this amalgamation. The burden of determining whether the Archive's collection might suit their needs is left entirely up to the members of this very fuzzy set called "Internet users." That the collection lends at the rate of a public library serving a population of 30,000 shows that it is most likely under-utilized. Because the nature of the collection is unknown one can't approach, say, a teacher of middle-school biology and say: "they've got what you need." Yet the Archive cannot implement a policy to complete areas of the collection unless it knows what it has as compared to known needs. "...
these warehouses of potentially readable text will remain under-utilized until we can discover a way to make them useful in the ways that libraries have proved to be useful." I wish I could say that a solution would be simple - but it would not. For example, it would be great to extract from this collection works that are commonly held in specific topic areas in small, medium and large libraries. The statistical packages that analyze library holdings are all, AFAIK, proprietary. (If anyone knows of an open source package that does this, please shout it out!) It would also be great to be able to connect library collections of analog books to their digital equivalents. That too is more complex than one would expect, and would have to be much simpler to be offered openly. [2] While some organizations move forward with digitizing books and other hard copy materials, these warehouses of potentially readable text will remain under-utilized until we can discover a way to make them useful in the ways that libraries have proved to be useful. This will mean taking seriously what modern librarianship has developed over its roughly two centuries, and in particular those 5 laws that give us a philosophy to guide our vision of service to the users of libraries. ----- [0] Even if you are familiar with the 5 laws you may not know that Ranganathan was not as succinct as this short list may imply. The book in which he introduces these concepts is over 450 pages long, with extended definitions and many homey anecdotes and stories. [1] A search on "collection development policy" will yield many pages of policies that you can peruse. To make this a "one click" here are a few *non-representative* policies that you can take a peek at: Hennepin County (public) Lansing Community College (community college) Stanford University, Science Library (research library) [2] Dan Scott and I did a project of this nature with a Bay Area public library and it took a huge amount of human intervention to determine whether the items matched were really "equivalent". That's a discussion for another time, but, man, books are more complicated than they appear. Posted by Karen Coyle at 8:08 AM No comments: Labels: books, Digital libraries, OpenLibrary Monday, February 03, 2020 Use the Leader, Luke! If you learned the MARC format "on the job" or in some other library context you may have learned that the record is structured as fields with 3-digit tags, each with two numeric indicators, and that subfields have a subfield delimiter (often shown as "$" because it is a non-printable character) and a single character subfield code (a-z, 0-9). That is all true for the MARC records that libraries create and process, but the MAchine Readable Cataloging standard (Z39.2 or ISO 2709) has other possibilities that we are not using. Our "MARC" (currently MARC21) is a single selection from among those possibilities, in essence an application profile of the MARC standard. The key to the possibilities afforded by MARC is in the MARC Leader, and in particular in two positions that our systems generally ignore because they always contain the same values in our data: Leader byte 10 -- Indicator count Leader byte 11 -- Subfield code length In MARC21 records, Leader byte 10 is always "2" meaning that fields have 2-byte indicators, and Leader byte 11 is always "2" because the subfield code is always two characters in length.
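The post describes what these Leader bytes mean but not what "honoring" them would look like in code. The following is only a minimal sketch, not the way any existing library system works: an ISO 2709 reader that takes the indicator count, subfield code length, and directory entry map from the Leader instead of hard-coding the MARC21 values. It assumes UTF-8 data and skips error handling; a real application would more likely use an existing library such as pymarc.

SUBFIELD_DELIMITER = b"\x1f"

def parse_variable_field(field_bytes, indicator_count, subfield_code_length):
    """Split one variable field into its indicators and (code, value) subfields."""
    indicators = field_bytes[:indicator_count].decode("utf-8")
    subfields = []
    for chunk in field_bytes[indicator_count:].split(SUBFIELD_DELIMITER):
        if not chunk:
            continue
        # the subfield code length counted in the Leader includes the delimiter itself
        code = chunk[: subfield_code_length - 1].decode("utf-8")
        value = chunk[subfield_code_length - 1 :].decode("utf-8")
        subfields.append((code, value))
    return indicators, subfields

def parse_record(record_bytes):
    leader = record_bytes[:24]
    indicator_count = int(chr(leader[10]))        # MARC21: always 2
    subfield_code_length = int(chr(leader[11]))   # MARC21: always 2 (delimiter + 1 character)
    base_address = int(leader[12:17])
    len_of_length = int(chr(leader[20]))          # MARC21: 4, hence the 9999-byte field limit
    len_of_start = int(chr(leader[21]))           # MARC21: 5
    entry_size = 3 + len_of_length + len_of_start
    directory = record_bytes[24 : base_address - 1]   # the directory ends with a field terminator
    fields = []
    for i in range(0, len(directory), entry_size):
        entry = directory[i : i + entry_size]
        tag = entry[0:3].decode("ascii")
        length = int(entry[3 : 3 + len_of_length])
        start = int(entry[3 + len_of_length :])
        data = record_bytes[base_address + start : base_address + start + length - 1]
        if tag < "010":   # control fields have no indicators or subfields
            fields.append((tag, data.decode("utf-8")))
        else:
            fields.append((tag, parse_variable_field(data, indicator_count, subfield_code_length)))
    return fields

A parser written this way would handle a hypothetical record with, say, four indicator bytes or three-character subfield codes without any change to the code, which is exactly the flexibility the Leader was designed to signal.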
That was a decision made early on in the life of MARC records in libraries, and it's easy to forget that there were other options that were not taken. Let's take a short look at the possibilities the record format affords beyond our choice. Both of these Leader positions are single bytes that can take values from 0 to 9. An application could use the MARC record format and have zero indicators. It isn't hard to imagine an application that has no need of indicators or that has determined to make use of subfields in their stead. As an example, the provenance of vocabulary data for thesauri like LCSH or the Art and Architecture Thesaurus could always be coded in a subfield rather than in an indicator: 650 $a Religion and science $2 LCSH Another common use of indicators in MARC21 is to give a byte count for the non-filing initial articles on title strings. Instead of using an indicator value for this, some libraries outside of the US developed a non-printing code to mark the beginning and end of the non-filing portion. I'll use backslashes to represent these codes in this example: 245 $a \The \Birds of North America I am not saying that all indicators in MARC21 should or even could be eliminated, but that we shouldn't assume that our current practice is the only way to code data. In the other direction, what if you could have more than two indicators? The MARC record would allow you to have as many as nine. In addition, there is nothing to say that each byte in the indicator has to be a separate data element; you could have nine indicator positions that were defined as two data elements (4 + 5), or some other number (1 + 2 + 6). Expanding the number of indicators, or beginning with a larger number, could have prevented the split in provenance codes for subject vocabularies between one indicator value and the overflow subfield, $2, when the number exceeded the capability of a single numerical byte. Having three or four bytes for those codes in the indicator and expanding the values to include a-z would have been enough to include the full list of authorities for the data in the indicators. (Although I would still prefer putting them all in $2 using the mnemonic codes for ease of input.) In the first University of California union catalog in the early 1980's we expanded the MARC indicators to hold an additional two bytes (or was it four?) so that we could record, for each MARC field, which library had contributed it. Our union catalog record was a composite MARC record with fields from any and all of the over 300 libraries across the University of California system that contributed to the union catalog as a dozen or so separate record feeds from OCLC and RLIN. We treated the added indicator bytes as sets of bits, turning on bits to represent the catalog feeds from the libraries. If two or more libraries submitted exactly the same MARC field we stored the field once and turned on a bit for each separate library feed. If a library submitted a field that was new to the record, we added the field and turned on the appropriate bit. When we created a user display we selected fields from only one of the libraries. (The rules for that selection process were something of a secret so as not to hurt anyone's feelings, but there was a "best" record for display.) It was a multi-library MARC record, made possible by the ability to use more than two indicators.
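The bit-flag trick described above is easy to miss in prose, so here is a toy illustration of the idea only; it is not the actual UC implementation, and the feed names and numbering are invented for the example. Each distinct field value is stored once, with a bitmask recording which catalog feeds contributed it.

FEEDS = {"ucb_oclc": 0, "ucla_rlin": 1, "ucsd_oclc": 2}   # hypothetical feed names

def add_field(record, tag, field_value, feed_name):
    """Store a field once and turn on a bit for each feed that contributed it."""
    bit = 1 << FEEDS[feed_name]
    fields = record.setdefault(tag, {})          # maps field value -> bitmask of feeds
    fields[field_value] = fields.get(field_value, 0) | bit
    return record

def feeds_for(record, tag, field_value):
    """Which feeds contributed this exact field?"""
    mask = record.get(tag, {}).get(field_value, 0)
    return [name for name, pos in FEEDS.items() if mask & (1 << pos)]

record = {}
add_field(record, "245", "$a \\The \\Birds of North America", "ucb_oclc")
add_field(record, "245", "$a \\The \\Birds of North America", "ucla_rlin")
print(feeds_for(record, "245", "$a \\The \\Birds of North America"))
# ['ucb_oclc', 'ucla_rlin']

In the real union catalog the bitmask lived in the expanded indicator bytes of the stored MARC field rather than in a separate structure, but the logic of "store once, flag each contributor" is the same.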
Now on to the subfield code. The rule for MARC21 is that there is a single subfield code and that it is a lower case a-z or 0-9. The numeric codes have special meaning and do not vary by field; the alphabetic codes are a bit more flexible. That gives us 26 possible alphabetic subfields per tag, plus the 10 pre-defined numeric ones. The MARC21 standard has chosen to limit the alphabetic subfield codes to lower case characters. As the fields reached the limits of the available subfield codes (and many did over time), you might think that the easiest solution would be to allow upper case letters as subfield codes. Although the subfield code limitation was reached decades ago for some fields, I can personally attest to the fact that suggesting the expansion of subfield codes to upper case letters was met with horrified glares at the MARC standards meeting. While clearly in 1968 the range of a-z seemed ample, that has not been the case for nearly half of the life-span of MARC. The MARC Leader allows one to define up to 9 characters total for subfield codes. The value in this Leader position includes the subfield delimiter, so this means that you can have a subfield delimiter and up to 8 characters to encode a subfield. Even expanding from a-z to aa-zz provides vastly more possibilities, and allowing upper case as well gives you a dizzying array of choices. The other thing to mention is that there is no prescription that field tags must be numeric. They are limited to three characters in the MARC standard, but those could be a-z, A-Z, 0-9, not just 0-9, greatly expanding the possibilities for adding new tags. In fact, if you have been in the position to view internal system records in your vendor system you may have been able to see that non-numeric tags have been used for internal system purposes, like noting who made each edit, whether functions like automated authority control have been performed on the record, etc. Many of the "violations" of the MARC21 rules listed here have been exploited internally -- and since the early days of library systems. There are other modifiable Leader values, in particular the one that determines the maximum length of a field, Leader 20. MARC21 has Leader 20 set at "4", meaning that fields cannot be longer than 9999 bytes. That could be longer, although the record size itself is expressed in only 5 bytes, so a record cannot be longer than 99999 bytes. However, one could limit fields to 999 (Leader value 20 set at "3") for an application that does less pre-composing of data compared to MARC21 and therefore comfortably fits within a shorter field length.  The reason that has been given, over time, why none of these changes were made was always: it's too late, we can't change our systems now. This is, as Caesar might have said, cacas tauri. Systems have been able to absorb some pretty intense changes to the record format and its contents, and a change like adding more subfield codes would not be impossible. The problem is not really with the MARC21 record but with our inability (or refusal) to plan and execute the changes needed to evolve our systems. We could sit down today and develop a plan and a timeline. If you are skeptical, here's an example of how one could manage a change in length to the subfield codes. When a MARC21 record is retrieved for editing: read Leader 11 of the record; if the value is "2" and you need to add a new subfield that uses the subfield code plus two characters, convert all of the subfield codes in the record ($a becomes $aa, $b becomes $ba, etc.; $0 becomes $01, $1 becomes $11, etc.) and change the Leader 11 value to "3" (alternatively, convert all records opened for editing). When a MARC21 record is retrieved for display: read Leader 11 of the record; if the value is "2", use the internal table of subfield codes for records with the value "2"; if the value is "3", use the internal table of subfield codes for records with the value "3".
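As a rough sketch of the conversion step described above (assuming the application handles records as simple in-memory structures rather than raw ISO 2709, and using the alphabetic-plus-"a", numeric-plus-"1" scheme from the example), the record-level change is only a few lines:

def widen_subfield_codes(record):
    """
    Convert a record using 1-character subfield codes (Leader 11 = "2")
    to 2-character codes (Leader 11 = "3"): alphabetic codes get an "a"
    appended, numeric codes get a "1" appended.
    `record` is assumed to be a dict with a "leader" string and a "fields"
    list of (tag, indicators, subfields) entries -- an application-level
    structure invented for this sketch.
    """
    leader = record["leader"]
    if leader[11] != "2":
        return record   # already uses the wider codes
    for tag, indicators, subfields in record["fields"]:
        for i, (code, value) in enumerate(subfields):
            suffix = "a" if code.isalpha() else "1"
            subfields[i] = (code + suffix, value)
    record["leader"] = leader[:11] + "3" + leader[12:]
    return record

The display side is just a lookup keyed on the Leader 11 value, choosing the one-character or two-character subfield table accordingly.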
Sounds impossible? We moved from AACR to AACR2, and now from AACR2 to RDA, without going back and converting all of our records to the new content. We have added new fields to our records, such as the 336, 337, 338 for RDA values, without converting all of the earlier records in our files to have these fields. The same with new subfields, like $0, which has only been added in recent years. Our files have been using mixed record types for at least a couple of generations -- generations of systems and generations of catalogers. Alas, the time to make these kinds of changes was many years ago. Would it be worth doing today? That depends on whether we anticipate a change to BIBFRAME (or some other data format) in the near future. Changes do continue to be made to the MARC21 record; perhaps it would have a longer future if we could broach the subject of fixing some of the errors that were introduced in the past, in particular those that arose because of the limitations of MARC21 that could be rectified with an expansion of that record standard. That may also help us not carry over some of the problems in MARC21 that are caused by these limitations to a new record format that does not need to be limited in these ways. Epilogue Although the MARC record was incredibly advanced compared to other data formats of its time (the mid-1960's), it has some limitations that cannot be overcome within the standard itself. One obvious one is the limitation of the record length to 5 digits, a maximum of 99999 bytes. Another is the fact that there are only two levels of nesting of data: the field and the subfield. There are times when a sub-subfield would be useful, such as when adding information that relates to only one subfield, not the entire field (provenance, external URL link). I can't advocate for continuing the data format that is often called "binary MARC" simply because it has limitations that require work-arounds. MARCXML, as defined as a standard, gets around the field and record length limitations, but it is not allowed to vary from the MARC21 limitations on field and subfield coding. It would be incredibly logical to move to a "non-binary" record format (XML, JSON, etc.) beginning with the existing MARC21 and to allow expansions where needed. It is the stubborn adherence to the ISO 2709 format that has really limited library data, and it is all the more puzzling because other solutions that can keep the data itself intact have been available for many decades. Posted by Karen Coyle at 6:59 AM No comments: Labels: MARC Tuesday, January 28, 2020 Pamflets I was always a bit confused about the inclusion of "pamflets" in the subtitle of the Decimal System, such as this title page from the 1922 edition: Did libraries at the time collect numerous pamphlets? For them to be the second-named type of material after books was especially puzzling. I may have discovered an answer to my puzzlement, if not THE answer, in Andrea Costadoro's 1856 work: A "pamphlet" in 1856 was not (necessarily) what I had in mind, which was a flimsy publication of the type given out by businesses, tourist destinations, or public health offices.
In the 1800's it appears that a pamphlet was a literary type, not a physical format. Costadoro says: "It has been a matter of discussion what books should be considered pamphlets and what not. If this appellation is intended merely to refer to the SIZE of the book, the question can be scarecely worth considering ; but if it is meant to refer to the NATURE of a work, it may be considered to be of the same class and to stand in the same connexion with the word Treatise as the words Tract ; Hints ; Remarks ; &c, when these terms are descriptive of the nature of the books to which they are affixed." (p. 42) To be on the shelves of libraries, and cataloged, it is possible that these pamphlets were indeed bound, perhaps by the library itself.  The Library of Congress genre list today has a cross-reference from "pamphlet" to "Tract (ephemera)". While Costadoro's definition doesn't give any particular subject content to the type of work, LC's definition says that these are often issued by religious or political groups for proselytizing. So these are pamphlets in the sense of the political pamphlets of our revolutionary war. Today they would be blog posts, or articles in Buzzfeed or Slate or any one of hundreds of online sites that post such content. Churches I have visited often have short publications available near the entrance, and there is always the Watchtower, distributed by Jehovah's Witnesses at key locations throughout the world, and which is something between a pamphlet (in the modern sense) and a journal issue. These are probably not gathered in most libraries today. In Dewey's time the printing (and collecting by libraries) of sermons was quite common. In a world where many people either were not literate or did not have access to much reading material, the Sunday sermon was a "long form" work, read by a pastor who was probably not as eloquent as the published "stars" of the Sunday gatherings. Some sermons were brought together into collections and published, others were published (and seemingly bound) on their own.  Dewey is often criticized for the bias in his classification, but what you find in the early editions serves as a brief overview of the printed materials that the US (and mostly East Coast) culture of that time valued.  What now puzzles me is what took the place of these tracts between the time of Dewey and the Web. I can find archives of political and cultural pamphlets in various countries and they all seem to end around the 1920's-30's, although some specific collections, such as the Samizdat publications in the Soviet Union, exist in other time periods. Of course the other question now is: how many of today's tracts and treatises will survive if they are not published in book form? Posted by Karen Coyle at 1:15 PM No comments: Labels: classification, library history Saturday, November 23, 2019 The Work The word "work" generally means something brought about by human effort, and at times implies that this effort involves some level of creativity. We talk about "works of art" referring to paintings hanging on walls. The "works" of Beethoven are a large number of musical pieces that we may have heard. The "works" of Shakespeare are plays, in printed form but also performed. In these statements the "work" encompasses the whole of the thing referred to, from the intellectual content to the final presentation. This is not the same use of the term as is found in the Library Reference Model (LRM). 
If you are unfamiliar with the LRM, it is the successor to FRBR (which I am assuming you have heard of) and it includes the basic concepts of work, expression, manifestation and item that were first introduced in that previous study. "Work," as used in the LRM, is a concept designed for use in library cataloging data. It is narrower than the common use of the term illustrated in the previous paragraph and is defined thus: Class: Work Definition: An abstract notion of an artistic or intellectual creation. In this definition the term only includes the idea of a non-corporeal conceptual entity, not the totality that would be implied in the phrase "the works of Shakespeare." That totality is described when the work is realized through an LRM-defined "expression" which in turn is produced in an LRM-defined "manifestation" with an LRM-defined "item" as its instance.* These four entities are generally referred to as a group with the acronym WEMI. Because many in the library world are very familiar with the LRM definition of work, we have to use caution when using the word outside the specific LRM environment. In particular, we must not impose the LRM definition on uses of the word that do not intend that meaning. One should expect that the use of the LRM definition of work would rarely be found in any conversation that is not about the library cataloging model for which it was defined. However, it is harder to distinguish such uses within the library world, where one might expect usage to adhere to the LRM. To show this, I want to propose a particular use case. Let's say that a very large bibliographic database has many records of bibliographic description. The use case is that it is deemed to be easier for users to navigate that large database if they could get search results that cluster works rather than getting long lists of similar or nearly identical bibliographic items. Logically the cluster looks like this: [figure omitted: a work node linking a group of bibliographic records] In data design, it will have a form something like this: [figure omitted; a data sketch appears below] This is a great idea, and it does appear to have a similarity to the LRM definition of work: it is gathering those bibliographic entries that are judged to represent the same intellectual content. However, there are reasons why the LRM-defined work could not be used in this instance. The first is that there is only one WEMI relationship for work, and that is from LRM work to LRM expression. Clearly the bibliographic records in this large library catalog are not LRM expressions; they are full bibliographic descriptions including, potentially, all of the entities defined in the LRM. To this you might say: but there is expression data in the bibliographic record, so we can think of this work as linking to the expression data in that record. That leads us to the second reason: the entities of WEMI are defined as being disjoint. That means that no single "thing" can be more than one of those entities; nothing can be simultaneously a work and an expression, or any other combination of WEMI entities. So if the only link we have available in the model is from work to expression, unless we can somehow convince ourselves that the bibliographic record ONLY represents the expression (which it clearly does not since it has data elements from at least three of the LRM entities) any such link will violate the rule of disjointness. Therefore, the work in our library system can have much in common with the conceptual definition of the LRM work, but it is not the same work entity as is defined in that model.
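The diagrams for the clustering use case did not survive the page conversion, so here is a minimal sketch of the kind of data design the text describes: a cluster identifier that groups bibliographic record identifiers judged to represent the same intellectual content. The table and column names are hypothetical, not taken from any actual catalog system.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- hypothetical names; the shared cluster_id plays the role of the 'work'
    CREATE TABLE work_cluster (
        cluster_id      TEXT PRIMARY KEY,
        preferred_title TEXT
    );
    CREATE TABLE bib_record (
        bib_id     TEXT PRIMARY KEY,
        cluster_id TEXT REFERENCES work_cluster(cluster_id)
    );
""")
conn.execute("INSERT INTO work_cluster VALUES ('w1', 'Moby Dick')")
conn.executemany("INSERT INTO bib_record VALUES (?, 'w1')",
                 [("bib001",), ("bib002",), ("bib003",)])

# all bibliographic records clustered under the same 'work'
rows = conn.execute("""
    SELECT b.bib_id FROM bib_record b
    JOIN work_cluster w ON w.cluster_id = b.cluster_id
    WHERE w.cluster_id = 'w1'
""").fetchall()
print([r[0] for r in rows])   # ['bib001', 'bib002', 'bib003']

Note that nothing in this structure claims the bibliographic records are LRM expressions; the cluster simply groups whole records, which is exactly why the post argues the LRM-defined work, with its single work-to-expression link and disjointness rule, does not fit.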
This brings me back to my earlier blog post with a proposal for a generalized definition of WEMI-like entities for created works.  The WEMI concepts are useful in practice, but the LRM model has some constraints that prevent some desirable uses of those entities. Providing unconstrained entities would expand the utility of the WEMI concepts both within the library community, as evidenced by the use case here, and in the non-library communities that I highlight in that previous blog post and in a slide presentation. To be clear, "unconstrained" refers not only to the removal of the disjointness between entities, but also to allow the creation of links between the WEMI entities and non-WEMI entities, something that is not anticipated in the LRM. The work cluster of bibliographic records would need a general relationship, perhaps, as in the case of VIAF, linked through a shared cluster identifier and an entity type identifying the cluster as representing an unconstrained work. ---- * The other terms are defined in the LRM as: Class: Expression Definition: A realization of a single work usually in a physical form. Class: Manifestation Definition: The physical embodiment of one or more expressions. Class: Item Definition: An exemplar of a single manifestation. Posted by Karen Coyle at 4:13 AM No comments: Labels: FRBR, library catalogs, LRM, metadata Coyle's InFormation by Karen Coyle is licensed under a Creative Commons Attribution-Noncommercial 3.0 United States License.
About Me Karen Coyle BERKELEY, CA, United States Librarian, techie, social commentator, once called "public intellectual" by someone who couldn't think of a better title. labiks-org-3226 ---- Home - LABIKS LATIN AMERICAN BIKE KNOWLEDGE SHARING Bike Sharing Systems in Latin America: 2019 Report. Data Dashboard: The Latin American Systems, 2019. Latin American Map of Bike Sharing Systems. LABIKS was created with the mission of gathering, sharing, and amplifying knowledge about Latin America's public bike share systems. We truly believe in the value and contribution of research for achieving more sustainable cities and communities. Accordingly, we work for greater transparency and government accountability in Latin America. GET TO KNOW LABIKS OUR CHALLENGES Turning Knowledge into Action JOIN LABIKS! For more cities to benefit from quality bike sharing systems, it is important to build the capacity of all actors around the trends and good practices applied to the planning, financing, management, and monitoring of these systems. LABIKS therefore invites researchers, governments, industry, funders, and anyone else interested to become partners in this initiative. Join us!
Latin American Bike Knowledge Sharing. Site made by Liquefeito. lawlesst-github-io-5080 ---- Ted Lawless Ted Lawless Work notebook Datasette hosting costs I've been hosting a Datasette (https://baseballdb.lawlesst.net, aka baseballdb) of historical baseball data for a few years, and for the last year or so it has been hosted on Google Cloud Run. I thought I would share my hosting costs for 2020 as a point of reference for others who might be interested in running a Datasette but aren't sure how much it may cost. The total hosting cost on Google Cloud Run for 2020 for the baseballdb was $51.31, or a monthly average of about $4.28 USD. The monthly bill did vary a fair amount, from as high as $13 in May to as low as $2 in March. Since I did no deployments or updates to the site during this time, I assume the variation in costs is related to the number of queries the Datasette was serving. I don't have a good sense of how many total queries per month this instance is serving since I'm not using Google Analytics or similar. Google does report that it is subtracting $49.28 in credits for the year, but I don't expect those credits/promotions to expire anytime soon since my projected cost for 2021 is $59. This cost information is somewhat incomplete without knowing the number of queries served per month, but it is a benchmark. Connecting Python's RDFLib to AWS Neptune I've written previously about using Python's RDFLib to connect to various triple stores. For a current project, I'm using Amazon Neptune as a triple store, and the RDFLib SPARQLStore implementation did not work out of the box, so I thought I would share my solution. The problem Neptune returns ntriples by default, and RDFLib, by default in version 4.2.2, is expecting CONSTRUCT queries to return RDF/XML. The solution is to override RDFLib's SPARQLStore to explicitly request RDF/XML from Neptune via HTTP content negotiation. Once this is in place, you can query and update Neptune via SPARQL with RDFLib the same way that you would with other triple stores. Code If you are interested in working with Neptune using RDFLib, here's a "NeptuneStore" and "NeptuneUpdateStore" implementation that you can use. Usable sample researcher profile data I've published a small set of web harvesting scripts to fetch information about researchers and their activities from the NIH Intramural Research Program website. On various projects I've been involved with, it has been difficult to acquire usable sample, or test, data about researchers and their activities. You either need access to an HR system and a research information system (for the activities) or you have to create mock data. Mock, or fake, data doesn't work well when you want to start integrating information across systems or developing tools to find new publications. It's hard to build a publication harvesting tool without real author names and research interests. To that end, the scripts I've published crawl the NIH Intramural Research Program website and pull out profile information for the thousand or so researchers who are members of the program, including a name, email, photo, short biography, research interests, and the PubMed IDs for selected publications. A second script harvests the organizational structure of the program. Both types of data are outputted to a simple JSON structure that can then be mapped to your destination system.
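The "NeptuneStore" code referenced in the Neptune note above did not survive the page conversion, so the following is only a hedged sketch of the underlying idea: ask the SPARQL endpoint for RDF/XML explicitly via content negotiation so that RDFLib can parse the CONSTRUCT result. It uses requests and rdflib directly rather than subclassing SPARQLStore, and the endpoint URL is a placeholder.

import requests
from rdflib import Graph

# placeholder; a real Neptune cluster SPARQL endpoint would go here
NEPTUNE_SPARQL = "https://your-neptune-cluster:8182/sparql"

def construct_graph(query):
    """Run a CONSTRUCT query and request RDF/XML via the Accept header."""
    resp = requests.post(
        NEPTUNE_SPARQL,
        data={"query": query},
        headers={"Accept": "application/rdf+xml"},
    )
    resp.raise_for_status()
    g = Graph()
    g.parse(data=resp.text, format="xml")   # RDF/XML is what RDFLib expects here
    return g

g = construct_graph("CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o } LIMIT 10")
print(len(g))

A subclass of RDFLib's SPARQLStore that sets the same Accept header would accomplish the same thing while keeping the usual Graph-backed query and update workflow.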
Exploring 10 years of the New Yorker Fiction Podcast with Wikidata Note: The online Datasette that supported the sample queries below is no longer available. The raw data is at: https://github.com/lawlesst/new-yorker-fiction-podcast-data. The New Yorker Fiction Podcast recently celebrated its ten-year anniversary. For those of you not familiar, this is a monthly podcast hosted by New Yorker fiction editor Deborah Treisman where a writer who has published a short story in the New Yorker selects a favorite story from the magazine's archive and reads and discusses it on the podcast with Treisman.1 I've been a regular listener to the podcast since it started in 2007 and thought it would be fun to look a little deeper at who has been invited to read and which authors they selected to read and discuss. The New Yorker posts all episodes of the Fiction podcast on their website in nice, clean, browseable HTML pages. I wrote a Python script to step through the pages and pull out the basic details about each episode: title, url, summary, date published, writer, and reader. The reader and the writer for each story are embedded in the title, so a bit of text processing was required to cleanly identify each reader and writer. I also had to manually reconcile a few episodes that didn't follow the same pattern as the others. All code used here and the harvested data are available on Github. Matching to Wikidata I then took each of the writers and readers and matched them to Wikidata using the searchentities API. With the Wikidata ID, I'm able to retrieve many attributes for each reader and writer by querying the Wikidata SPARQL endpoint, such as gender, date of birth, awards received, Library of Congress identifier, etc. Publishing with Datasette I saved this harvested data to two CSV files - episodes.csv and people.csv - and then built a sqlite database to publish with Datasette using the built-in integration with Zeit Now.
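The matching step above relies on Wikidata's search API. As an illustration only (not the author's script), this is roughly what a name lookup against the public wbsearchentities endpoint looks like; the example name is arbitrary.

import requests

def wikidata_search(name):
    """Return candidate (QID, description) pairs for a name via wbsearchentities."""
    resp = requests.get(
        "https://www.wikidata.org/w/api.php",
        params={
            "action": "wbsearchentities",
            "search": name,
            "language": "en",
            "type": "item",
            "format": "json",
        },
        headers={"User-Agent": "example-script/0.1"},   # Wikimedia asks clients to identify themselves
    )
    resp.raise_for_status()
    return [(hit["id"], hit.get("description", "")) for hit in resp.json().get("search", [])]

print(wikidata_search("Deborah Treisman"))   # prints a list of (QID, description) candidates

With a QID in hand, the attributes mentioned in the post (gender, date of birth, awards, Library of Congress identifier) can then be pulled from the Wikidata SPARQL endpoint.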
For those of us interested in open data, an exciting new tool was released this month It's by Simon Willison and called Datasette Datasette allows you to very quickly convert CSV files to a sqlite database and publish on the web with an API Head over to Simon's site for more details SPARQL to Pandas Dataframes Update: See this Python module for converting SPARQL query results into Pandas dataframes. Using Pandas to explore data SPARQL Pandas is a Python based power tool for munging and analyzing data While working with data from SPARQL endpoints, you may prefer to explore and analyze it with pandas given its full feature set, strong documentation and large community of users. The code below is an example of issuing a query to the Wikidata SPARQL endpoint and loading the data into a pandas dataframe and running basic operations on the returned data. This is a modified version of code from Su Labs Here we remove the types returned by the SPARQL endpoint since they add noise and we will prefer to handle datatypes with Pandas. {% notebook sparql_dataframe.ipynb %} With a few lines of code, we can connect data stored in SPARQL endpoints with pandas, the powerful Python data munging and analysis library. See the Su Labs tutorial for more examples. You can also download the examples from this post as a Jupyter notebook. Querying Wikidata to Identify Globally Famous Baseball Players Earlier this year I had the pleasure of attending a lecture by Cesar Hidalgo of MIT's Media Lab One of the projects Hidalgo discussed was Pantheon Pantheon is a website and dataset that ranks "globally famous individuals" based on a metric the team created called the Historical Popularity Index (HPI) A key component of HPI is the number of Wikipedia pages an individual has in in various languages For a complete description of the project, see: Yu, A Python ETL and JSON-LD I've written an extension to petl, a Python ETL library, that applies JSON-LD contexts to data tables for transformation into RDF. The problem Converting existing data to RDF, such as for VIVO, often involves taking tabular data exported from a system of record, transforming or augmenting it in some way, and then mapping it to RDF for ingest into the platform The W3C maintains an extensive list of tools designed to map tabular data to RDF. General purpose CSV to RDF tools, however, almost always require some advanced preparation or cleaning of the data This means that developers and data wranglers often have to write custom code This code can quickly become verbose and difficult to maintain Using an ETL toolkit can help with this. ETL with Python One such ETL tool that I'm having good results with is petl, Python ETL OrgRef data as RDF Summary: Notes on mapping OrgRef to DBPedia and publishing with Linked Data Fragments . This past fall, Data Salon, a UK-based data services company, released an open dataset about academic and research organizations called OrgRef The data is available as a CSV and contains basic information about over 30,000 organizations. OrgRef was created with publishers in mind, and so its main focus is on institutions involved with academic content: universities, colleges, schools, hospitals, government agencies and companies involved in research. 
This announcement caught our attention at my place of work because we are compiling information about educational organizations in multiple systems, including a VIVO instance, and are looking for manageable ways to consume Linked Data that will enrich or augment our local systems. Since the OrgRef data has been curated and focuses on a useful subset of data that we are interested in, it seemed to be a good candidate for investigation, even though it isn't published as RDF. Due to its size, it is also easier to work with than attempting to consume and process something like VIAF or DBPedia itself.
Process
We downloaded the OrgRef CSV dataset and used the ever-helpful csvkit tool to get a handle on what data elements exist.
$ csvstat --unique orgref.csv
1. Name: 31149
2.
lawlesst-github-io-574 ---- Ted Lawless
I'm Ted Lawless, an application developer based in Ann Arbor, MI, working in higher education. I post brief articles or technical notes from time to time about working with metadata, Web APIs and data management tools. See the list below. I've also compiled a list of presentations and projects that I've been involved with. If any of this is of interest to you, please feel free to contact me via email (lawlesst at gmail), Github, LinkedIn, or Twitter.
Posts
Datasette hosting costs 1-16-21
Connecting Python's RDFLib to AWS Neptune 03-15-19
Usable sample researcher profile data 05-19-18
Exploring 10 years of the New Yorker Fiction Podcast with Wikidata 02-06-18
Now Publishing Complete Lahman Baseball Database with Datasette 12-03-17
Publishing the Lahman Baseball Database with Datasette 11-20-17
SPARQL to Pandas Dataframes 10-26-17
Querying Wikidata to Identify Globally Famous Baseball Players 10-18-16
Python ETL and JSON-LD 12-05-15
OrgRef data as RDF 01-10-2015
See a full list of posts or the RSS feed. Ted Lawless, 2021. lawlesst at gmail, Github, LinkedIn, Twitter.
legalhackers-org-6697 ---- Global Chapters | Legal Hackers Our Story Blog Global Chapters Press Videos Governance 2019 Summit Search
Global Chapters
Want to be a part of the largest grassroots legal innovation movement in the world? Join us! Legal Hackers is an open and collaborative community of individuals passionate about exploring and building creative solutions to some of the most pressing issues at the intersection of law and technology. Since 2012, we have used the hashtag #legalhack to share the activities of the global Legal Hackers community.
Legal Hackers Online Communities
Twitter: @legalhackers
Facebook: www.facebook.com/groups/legalhackers
Slack: legalhackers.slack.com (invitation here)
LinkedIn: https://www.linkedin.com/groups/4782208
Hashtag: #legalhack
Legal Hackers Local Chapters
Legal Hackers is the largest grassroots legal innovation community in the world, with chapters in many major cities. Check out the list below to find your local Legal Hackers community. If you don't see a community near you below, apply to start your own by clicking the appropriate link below:
I want to start a traditional Legal Hackers Chapter for my city or region
I want to start a student-only Legal Hackers Student Group for my university
Questions? Read more about the differences between Chapters and Student Groups here, or email us at: info [at] legalhackers [dot] org.
North AmericaEuropeAfricaAsiaAustralia & New ZealandLATAMStudent Groups Atlanta, Georgia Baltimore, Maryland Boston, Massachusetts Chicago, Illinois Cleveland, Ohio DFW (Dallas-Fort Worth), Texas Denver, Colorado Detroit, Michigan Houston, Texas Kansas City, Missouri London, Ontario Miami, Florida Minneapolis-St. Paul, Minnesota Montreal, Québec Nashville, Tennessee New Orleans, Louisiana New York, New York North Carolina Orlando, Florida Ottawa, Ontario Philadelphia, Pennsylvania Portland, Oregon Puerto Rico Salt Lake City, Utah San Diego, California San Francisco, California Seattle, Washington Toronto, Ontario Tulsa, Oklahoma Vancouver, British Columbia Washington, D.C. Amsterdam, Netherlands Asturias, Spain Athens, Greece Barcelona, Spain Bari, Italy Belfast, Northern Ireland Belgrade, Serbia Berlin, Germany Bern, Switzerland Bilbao, Spain Bologna, Italy Bristol, England Brno, Czech Republic Brussels, Belgium Bucharest, Romania Chișinău, Moldova Cologne/Bonn, Germany Copenhagen, Denmark Dublin, Ireland Estonia Firenze, Italy Frankfurt, Germany Geneva, Switzerland Genova, Italy Ghent, Belgium The Hague, Netherlands Hamburg, Germany Helsinki, Finland Istanbul, Turkey Kyiv, Ukraine Limassol, Cyprus Lisbon, Portugal Ljubljana, Slovenia London, England Luxembourg Lviv, Ukraine Madrid, Spain Malaga, Spain Manchester, England Mantova, Italy Milan, Italy Moscow, Russia Munich, Germany Napoli-Campania, Italy Novi Sad, Serbia Nürnberg, Germany Padova, Italy Paris, France Perugia, Italy Pescara, Italy Pisa, Italy Porto, Portugal Preston, England Roma, Italy Rijeka, Croatia Scotland Sheffield, England Skopje, Macedonia Sofia, Bulgaria St. Petersburg, Russia Stockholm, Sweden Timisoara, Romania Torino, Italy Toulouse, France Trieste, Italy Valencia, Spain Venezia, Italy Verona, Italy Vienna, Austria Vilnius, Lithuania Warsaw, Poland Zagreb, Croatia Zurich, Switzerland Abuja, Nigeria Accra, Ghana Alexandria, Egypt Cape Town, South Africa Casablanca, Morocco Douala, Cameroon Enugu, Nigeria Harare, Zimbabwe Imo, Nigeria Kampala, Uganda Lagos, Nigeria Luanda, Angola Nairobi, Kenya Almaty, Kazakhstan Ankara, Turkey Bhopal, India Chandigarh, India Delhi, India Goa, India Hong Kong Jakarta, Indonesia Jeddah, Saudi Arabia Kuala Lumpur, Malaysia Lahore, Pakistan Lucknow, India Manila, Philippines Patna, India Pune, India Seoul, South Korea Singapore Tokyo, Japan Melbourne, Australia Perth, Australia Sydney, Australia Wellington, New Zealand Aguascalientes, Mexico Arequipa, Peru Baja, Mexico Barranquilla, Colombia Belém, Brazil Belo Horizonte, Brazil Bogota, Colombia Brasília, Brazil Buenos Aires, Argentina Campinas, Brazil Cuiabá, Brazil Curitiba, Brazil Cusco, Peru Fortaleza, Brazil Goiânia, Brazil Guadalajara, Mexico Guatemala City, Guatemala Guayaquil, Ecuador Imperatriz. 
Brazil Jaraguá do Sul, Brazil Lavras, Brazil Lima, Peru Manaus, Brazil Manizales, Colombia Maringá, Brazil Medellin, Colombia Mexico City, Mexico Mogi das Cruzes, Brazil Monterrey, Mexico Montevideo, Uruguay Natal, Brazil Panama City, Panama Passo Fundo, Brazil Pereira, Colombia Petrolina, Brazil Porto Alegre, Brazil Porto Velho, Brazil Puebla, Mexico Querétaro, Mexico Quito, Ecuador Recife, Brazil Rio de Janiero, Brazil Salvador, Brazil Santa Cruz, Bolivia Santo André, Brazil São Paulo, Brazil San Salvador, El Salvador Sete Lagoas, Brazil Tegucigalpa, Hondouras Tepic, Mexico Kansas, USA – Kansas University New Brunswick, Canada – University of New Brunswick New York, USA – Brooklyn Law School North Carolina, USA – Wake Forest University School of Law Tennessee, USA – University of Tennessee College of Law Toronto, Canada – University of Toronto Coventry, England – University of Warwick Kyiv, Ukraine – National University of Kyiv-Mohyla Academy Kyiv, Ukraine – Taras Shevchenko National University of Kyiv London, England – University College London Sheffield, England – University of Sheffield Tarragona, Spain – Universitat Rovira i Virgili Kharagpur, India – Indian Institute of Technology Quito, Ecuador – Pontificia Universidad Católica del Ecuador (PUCE) Tweets by LegalHackers Our Story Blog Global Chapters Press Videos Governance 2019 Summit Type and Press “enter” to Search Our Story Global Chapters Governance Press librarian-aedileworks-com-5940 ---- Librarian of Things Skip to content Librarian of Things Weeknote 9 (2021) §1 Zotero PDF Reader A new look and functionality for Zotero’s PDF Reader is still in beta. I can’t wait for this version to be unleashed! §2 MIT D2O Earlier this week, MIT Press announced a new Open Access Monograph program. It appears that the transition of scholarly ebooks to another form of subscription product is continuing. §3 AI but Canadian I’m glad to see that the Federal Government has an Advisory Council on AI and I hope they are going to meaningfully fulfill their mandate. We are already late of the gate on this front. The city where I live is already trialing software that will suggest where road safety investments should be made based on an AI’s recommendations. §4 Discovering Science Misconduct via Image Integrity Not new but new to me. I’ve recently started following Elisabeth Bik on Twitter and it has been an eye-opening experience. Bik, a microbiologist from the Netherlands who moved to the United States almost two decades ago, is a widely lauded super-spotter of duplicated images in the scientific literature. On a typical day, she’ll scan dozens of biomedical papers by eye, looking for instances in which images are reused and reported as results from different experiments, or where parts of images are cloned, flipped, shifted or rotated to create ‘new’ data… Her skill and doggedness have earned her a worldwide following. “She has an uncommon ability to detect even the most complicated manipulation,” says Enrico Bucci, co-founder of the research-integrity firm Resis in Samone, Italy. Not every issue means a paper is fraudulent or wrong. But some do, which causes deep concern for many researchers. “It’s a terrible problem that we can’t rely on some aspects of the scientific literature,” says Ferric Fang, a microbiologist at the University of Washington, Seattle, who worked on a study with Bik in which she analysed more than 20,000 biomedical papers, finding problematic duplications in roughly 4% of them (E. M. Bik et al. mBio 7, e00809-16; 2016). 
“You have postdocs and students wasting months or years chasing things which turn out to not be valid,” he says. Nature 581, 132-136 (2020), doi: https://doi.org/10.1038/d41586-020-01363-z §5 And here’s one thing I did this week! Author Mita WilliamsPosted on March 5, 2021Categories weeknotesLeave a comment on Weeknote 9 (2021) Weeknote 8 (late) 2021 Last week I had a week that was more taxing than normal and I had nothing in the tank by Friday. So I’m putting together last week’s weeknotes today. Also, going forward each section heading has been anchor tagged for your link sharing needs. e.g. §1 §2 §3 §4 §5 and §6. I say this recognizing that the weeknote format resists social sharing which I consider a feature not a bug. §1 We Are Here From Library and Archives Canada: Over the past three years, We Are Here: Sharing Stories has digitized and described over 590,000 images of archival and published materials related to First Nations, Inuit and the Métis Nation. Digitized and described content includes textual documents, photographs, artworks and maps as well as numerous language publications. All items are searchable and linked in our Collection Search or Aurora databases. In order to make it easier to locate recently digitized Indigenous heritage content at LAC, we have created a searchable list of the collections and introduced a Google map feature – allowing users to browse archival materials by geographic region! Visit the We Are Here: Sharing Stories page to pick your destination and start your research! Those who know me, know that I’ve been advocating for more means of discovery via maps and location for a while now. While my own mapping has slowed down, I still bookmarked Georeferencing in QGIS 2.0 from The Programming Historian today. If used appropriately, maps hold a great deal of potential as a means to discover works related to indigenous peoples. Some forms of Indigenous Knowledge Organization such as the X̱wi7x̱wa Classification Scheme emphasize geographic grouping over alphabetical grouping. §2 Bookfeedme, Seymour! * Not every author has a newsletter that you can subscribe to in order to be informed when they have a new book out. You would think it would be easier to be notified otherwise, but with the mothballing of Amazon Alerts, the only other way I know to be notified is through Bookfeed.io which uses the Google Books API at its core. If don’t have a familiarity with RSS, see About Feeds for more help. * musical reference §3 Best article title in librarianship for 2021 Ain’t no party like a LibGuides Party / ’cause a LibGuides Party is mandatory ** ** musical reference §4 This is the time and this is the record of the time *** ScholComm librarians ask: Do we want a Version of Record or Record of Versions? *** musical reference §5 The 5000 Fingers of Dr. T **** A Hand With Many Fingers is a first-person investigative thriller. While searching through a dusty CIA archive you uncover a real Cold War conspiracy. Every document you find has new leads to research. 
But the archive might not be as empty as you think…   – Slowly unravel a thrilling historical conspiracy – Discover new clues through careful archival research – Assemble your theories using corkboard and twine – Experience a story of creeping paranoia **** musical reference / movie reference Hat tip: Errant Signal’s Bad Bosses, Beautiful Vistas, and Baffling Mysteries: Blips Episode 8 §6 Citational politics bibliography I’m not entirely sure how this bibliography on the politics of citation and references crossed my twitter stream, but I immediately bookmarked it. The bibliography is from a working group of CLEAR from Memorial University: Civic Laboratory for Environmental Action Research (CLEAR) is an interdisciplinary natural and social science lab space dedicated to good land relations directed by Dr. Max Liboiron at Memorial University, Canada. Equal parts research space, methods incubator, and social collective, CLEAR’s ways of doing things, from environmental monitoring of plastic pollution to how we run lab meetings, are based on values of humility, accountability, and anti-colonial research relations. We specialize in community-based and citizen science monitoring of plastic pollution, particularly in wild food webs, and the creation and use of anti-colonial research methodologies. To change science and research from its colonial, macho, and elitist norms, CLEAR works at the level of protocol. Rather than lead with good intentions, we work to ensure that every step of research and every moment of laboratory life exemplifies our values and commitments. To see more of how we do this, see the CLEAR Lab Book, our methodologies, and media coverage of the lab. About CLEAR I have no musical reference for this. Author Mita WilliamsPosted on March 1, 2021Categories citations, weeknotesLeave a comment on Weeknote 8 (late) 2021 Weeknote 7 (2021) Today the library is closed as is my place of work’s tradition on the last day of Reading Week. But as I have three events (helping in a workshop, giving a presentation, participating in a focus group) in my calendar, I’m just going to work the day and bank the time for later. §1 Barbara Fister in The Atlantic! We are experiencing a moment that is exposing a schism between two groups: those who have faith that there is a way to arrive at truth using epistemological practices that originated during the Enlightenment, and those who believe that events and experiences are portents to be interpreted in ways that align with their personal values. As the sociologist and media scholar Francesca Tripodi has demonstrated, many conservatives read the news using techniques learned through Bible study, shunning secular interpretations of events as biased and inconsistent with their exegesis of primary texts such as presidential speeches and the Constitution. The faithful can even acquire anthologies of Donald Trump’s infamous tweets to aid in their study of coded messages. While people using these literacy practices are not unaware of mainstream media narratives, they distrust them in favor of their own research, which is tied to personal experience and a high level of skepticism toward secular institutions of knowledge. This opens up opportunities for conservative and extremist political actors to exploit the strong ties between the Republican Party and white evangelical Christians. The conspiracy theory known as QAnon is a perfect—and worrisome—example of how this works. After all, QAnon is something of a syncretic religion. 
But its influence doesn't stop with religious communities. While at its core it's a 21st-century reboot of a medieval anti-Semitic trope (blood libel), it has shed some of its Christian vestments to gain significant traction among non-evangelical audiences. §2 New to me: Andromeda Yelton's course reading list dedicated to AI in the Library. Hat-tip to Beck Tench. §3 I recently suggested that MPOW's next Journal Club should deviate from looking at the library literature and reflect on personal knowledge management. I'm not sure how much uptake there will be on the topic, but I love reading about how other people deliberately set up systems to help them learn. Case in point: Cecily Walker's Thoughts Like A Runaway Train: Notes on Information Management with Zettelkasten. Fun fact: I first learned of Zettelkasten from Beck Tench. Author Mita Williams. Posted on February 19, 2021. Categories: weeknotes. 2 Comments on Weeknote 7 (2021)
Weeknote 6 (2021)
Another week in which I was doing a lot of behind-the-scenes work. §1 Duly noted: Here's the article in full. §2 Years ago, I gave a keynote called Libraries are for use. And by use, I mean copying, which featured the short and sad story of a person who was unable to donate their ebook to their local library. I thought of this slide this week when I learned that the DPLA is now offering an ebook creation service that allows a library to create an ebook collection — albeit of openly licensed or public domain works. I downloaded the SimplyE app for my iPad and I found it simple and well-designed. Having access to a good set of public domain works is great, although I was slightly disappointed that it wasn't possible to import my own collection of ebooks into the app. But if I was a library, that's what I could do. §3 I'm not ready to share my thoughts on this next matter yet, but I've recently been re-considering how much of our knowledge is socially constructed. As such, I am still mulling over Harold Jarche's Subject Matter Networks. It begins, We live in a networked world. Is it even possible for one person to have sufficient expertise to understand a complex situation such as this pandemic? So do we rely on one subject matter expert or rather a subject matter network? Author Mita Williams. Posted on February 12, 2021. Categories: weeknotes. Leave a comment on Weeknote 6 (2021)
Weeknote 5 (2021)
§1 Last Friday I was interviewed for the podcast The Grasscast — a game-themed podcast named after the book, The Grasshopper: Games, Life, and Utopia. I ramble a little bit in the episode as I tried to be more open and conversational than concise and correct. But I also spoke that way because, for some of the questions, no pat answer came immediately to mind. There was one question that stumped me, but in trying to answer it, I think I found something I had not considered before. The question was, What is one bad thing about games? And I tried to convey that, unlike video games where you can play with strangers, most tabletop games are generally constrained by the preferences of your social circles. In order to convince others to spend time on a game that they might think is too complicated for them or not for them, you need to be a successful evangelist. Also, the episode drifts into chatter about libraries, copyright and ebooks.
§2 This week, I reviewed and published another batch of works for our institutional repository from our department of History that was prepared by our library assistants at Leddy. At this point, we have reviewed and uploaded the works of half the faculty from this department. I'm hoping to finish the rest this month, but I think I have some outstanding H5P work that might push the end of this project until March. §3 This morning I assisted with an online workshop called Data Analysis and Visualization in R for Ecologists that was being led by a colleague of mine. R Version 4.0.3 ("Bunny-Wunnies Freak Out") was released on 2020-10-10. The release of R 4.0.4 ("Lost Library Book") is scheduled for Monday 2021-02-15. §4 On Sunday, I published a short response to "Windsor Works – An Economic Development Strategy", which is going to City Council on Monday. Why am I writing about this document here? I mention it here because the proposed strategy (L.I.F.T.) lists the following as a potential metric for measuring the strategy's success… Take it from me, someone who knows quite a bit about citations — the city should use another metric — perhaps one pertaining to local unemployment levels instead. §5 A viral post from 2019 resurfaced on my FB feed this week and, unlike most of the posts I read there, this one did spark joy: And it struck me how much I loved that the anti-prom was being held at the library. So I started doing some research! It appears to me that some anti-proms are technically better described as alternative proms. These proms have been established as an explicitly safe place where LGBTQ young people can enjoy prom. Other anti-proms are true morps. I now wonder what other anti-traditions should find a home at the public library. Author Mita Williams. Posted on February 5, 2021. Categories: weeknotes. Leave a comment on Weeknote 5 (2021)
Weeknote 4 (2021)
I don't have much that I can report in this week's note. You are just going to have to take my word that this week, a large amount of my time was spent at meetings pertaining to my library department, my union, and anti-black racism work. §1 Last year, around this same time, some colleagues from the University and I organized a speaking event called Safer Communities in a 'Smart Tech' World: We need to talk about Amazon Ring in Windsor. Windsor's Mayor proposes we be the first city in Canada to buy into the Ring Network. As residents of Windsor, we have concerns with this potential project. Seeing no venue for residents of Windsor to share their fears of surveillance and loss of privacy through this private partnership, we hosted an evening of talks on January 22nd, 2020 at The Performance Hall at the University of Windsor's School of Creative Arts Windsor Armories Building. Our keynote speaker was Chris Gilliard, heard recently on CBC's Spark. Since that evening, we have been in the media raising our concerns, asking questions, and encouraging others to do the same. The City of Windsor has yet to enter an agreement with Amazon Ring. This is good news. This week, the City of Windsor announced that it has entered a one-year partnership with Ford Mobility Canada to share data and insights via Ford's Safety Insights platform. I don't think this is good news, for reasons outlined in this post called Safety Insights, Data Privacy, and Spatial Justice. §2 This week I learned a neat Tweetdeck hack.
If set up a search as column, you can limit the results for that term using the number of ‘engagements’: §3 §4 I haven’t read this but I have it bookmarked for potential future reference: The weaponization of web archives: Data craft and COVID-19 publics: An unprecedented volume of harmful health misinformation linked to the coronavirus pandemic has led to the appearance of misinformation tactics that leverage web archives in order to evade content moderation on social media platforms. Here we present newly identified manipulation techniques designed to maximize the value, longevity, and spread of harmful and non-factual content across social media using provenance information from web archives and social media analytics. After identifying conspiracy content that has been archived by human actors with the Wayback Machine, we report on user patterns of “screensampling,” where images of archived misinformation are spread via social platforms. We argue that archived web resources from the Internet Archive’s Wayback Machine and subsequent screenshots contribute to the COVID-19 “misinfodemic” in platforms. Understanding these manipulation tactics that use sources from web archives reveals something vexing about information practices during pandemics—the desire to access reliable information even after it has been moderated and fact-checked, for some individuals, will give health misinformation and conspiracy theories more traction because it has been labeled as specious content by platforms. §5 I’m going to leave this tweet here because I might pick up this thread in the future: This reminds me of a talk given in 2018 by Data & Society Founder and President, danah boyd called You Think You Want Media Literacy… Do You? This essay still haunts me, largely because we still don’t have good answers for the questions that Dr. Boyd asks of us and the stakes have only gotten higher. Author Mita WilliamsPosted on January 29, 2021January 29, 2021Categories weeknotesLeave a comment on Weeknote 4 (2021) Weeknote 3 (2021) Hey. I missed last week’s weeknote. But we are here now. §1 This week I gave a class on searching scientific literature to a group of biology masters students. While I was making my slides comparing the Advanced Search capabilities of Web of Science and Scopus, I discovered this weird behaviour of Google Scholar: a phrase search generated more hits than not. I understand that Google Scholar performs ‘stemming’ instead of truncation in generating search results but this still makes no sense to me. §2 New to me: if you belong to an organization that is already a member of CrossRef, you are eligible to use a Similarity Check of documents for an additional fee. Perhaps this is a service we could provide to our OJS editors. §3 I’m still working through the Canadian Journal of Academic Librarianship special issue on Academic Libraries and the Irrational. Long time readers know that I have a fondness for the study of organizational culture and so it should not be too surprising that the first piece I wanted to read was The Digital Disease in Academic Libraries. It begins…. THOUGH several recent books and articles have been written about change and adaptation in contemporary academic libraries (Mossop 2013; Eden 2015; Lewis 2016), there are few critical examinations of change practices at the organizational level. 
One example, from which this paper draws its title, is Braden Cannon’s (2013) The Canadian Disease, where the term disease is used to explore the trend of amalgamating libraries, archives, and museums into monolithic organizations. Though it is centered on the impact of institutional convergence, Cannon’s analysis uses an ethical lens to critique the bureaucratic absurdity of combined library-archive-museum structures. This article follows in Cannon’s steps, using observations from organizational de-sign and management literature to critique a current trend in the strategic planning processes and structures of contemporary academic libraries. My target is our field’s ongoing obsession with digital transformation beyond the shift from paper-based to electronic resources, examined in a North American context and framed here as The Digital Disease. I don’t want to spoil the article but I do want to include this zinger of a symptom which is the first of several: If your library’s organizational chart highlights digital forms of existing functions, you might have The Digital Disease. Kris Joseph, The Digital Disease in Academic Libraries, Canadian Journal of Academic Librarianship, Vol 6 (2020) Ouch. That truth hurts almost as much as this tweet did: Author Mita WilliamsPosted on January 22, 2021January 22, 2021Categories weeknotesLeave a comment on Weeknote 3 (2021) Weeknote 1 (2021) This week’s post is not going to capture my ability to be productive while white supremacists appeared to be ushered in and out of the US Capitol building by complicit police and COVID-19 continued to ravage my community because our provincial government doesn’t want to spend money on the most vulnerable. Instead, I’m just going to share what I’ve learned this week that might prove useful to others. This week I added works to three faculty member’s ORCiD profiles using ORCiD’s trusted individual functionality. One of these professors was works in the field of Psychology and I found the most works for that researcher using BASE (Bielefeld Academic Search Engine) including APA datasets not found elsewhere. Similarly, I found obscure ERIC documents using The Lens.org. Unfortunately, you can’t directly import records into The Lens into an ORCiD profile unless you create a Lens profile for yourself. I’ve added The Lens to my list of free resources to consult when looking for research. This list already includes Google Scholar and Dimensions.ai. fin Author Mita WilliamsPosted on January 8, 2021February 4, 2021Categories weeknotesLeave a comment on Weeknote 1 (2021) Weeknote 50 (2020) §1 It looks like Andromeda Yelton is sharing weeknotes (“This week in AI“). I can’t wait to see what she shares with us all in 2021. §2 Earlier this fall, Clarivate Analytics announced that it was moving toward a future that calculated the Journal Impact Factor (JIF) based on the date of electronic publication and not the date of print publication… This discrepancy between how Clarivate treated traditional print versus online-only journals aroused skepticism among scientists, some of whom… cynically suggested that editors may be purposefully extending their lag in an attempt to artificially raise their scores. 
Changes to Journal Impact Factor Announced for 2021, Scholarly Kitchen, Phil Davis, Dec 7, 2020. I don't think there is anything cynical about the observation that journal publishers picked up a trick from those booksellers who actively engage in promoting pre-publication book sales, because those weeks of sales are accumulated and counted in the first week of publication, which results in a better chance of landing on the New York Times Bestseller list. §3 In 2020, a team at Georgia State University compiled a report on virtual learning best practices. While evidence in the field is "sparse" and "inconsistent," the report noted that logistical issues like accessing materials—and not content-specific problems like failures of comprehension—were often among the most significant obstacles to online learning. It wasn't that students didn't understand photosynthesis in a virtual setting, in other words—it was that they didn't find (or simply didn't access) the lesson on photosynthesis at all. That basic insight echoed a 2019 study that highlighted the crucial need to organize virtual classrooms even more intentionally than physical ones. Remote teachers should use a single, dedicated hub for important documents like assignments… The 10 Most Significant Education Studies of 2020, Edutopia, by Youki Terada and Stephen Merrill, December 4, 2020. §4 I'm pleased to say that, with some much-appreciated assistance, our OJS instances are now able to connect authors with their ORCiD profiles. This means that all authors who have articles accepted by these journals will receive an email asking if they would like to connect to ORCiD. I was curious how many authors from one of our existing journals had existing ORCiD profiles, so I did a quick check. This is how I did it. First, I used OJS's export function to download all the metadata available at an article level. Next, I used the information from that .csv file to create a new spreadsheet of full names. I then opened this file using OpenRefine. Then, through the generosity of Jeff Chiu, I was able to check these names against the ORCiD API using the OpenRefine Reconciliation Service and Chiu's SmartName server: http://refine.codefork.com/reconcile/orcid/smartnames. Using the smart name integration, I can limit the list to those names very likely to match. With this set of likely suspects in hand, I can locate the authors in the OJS backend and then send invitations from the OJS server from their author profile (via the published article's metadata page): §5 I can't wait to properly tuck into this issue of The Canadian Journal of Academic Librarianship with its Special Focus on Academic Libraries and the Irrational. §6 Happy Solstice, everyone. Author Mita Williams. Posted on December 21, 2020. Categories: weeknotes. Leave a comment on Weeknote 50 (2020)
Weeknote 49 (2020)
§1 I don't have much to report in regards to the work I've been doing this week. I tried to get our ORCiD-OJS plugin to work, but there is some small strange bug that needs to be squished. Luckily, next week I will have the benefit of assistance from the good people of CRKN and ORCiD-CA. What else? I uploaded a bunch of files into our IR. I set up a site for an online-only conference being planned for next year. And I finally got around to trying to update a manuscript for potential publication. But this writing has been very difficult, as my attention has been sent elsewhere many times this week.
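Related to the ORCiD name-checking workflow in Weeknote 50 above: the post used OpenRefine with Jeff Chiu's SmartNames reconciliation service, but a similar quick check can be scripted against ORCiD's public search API. The endpoint, query syntax, and field names below are my assumptions about that public API rather than anything taken from the post, and the author names are hypothetical.
import requests

ORCID_SEARCH = "https://pub.orcid.org/v3.0/search/"

def count_orcid_matches(given_names, family_name):
    """Return how many public ORCiD records match a given/family name pair.
    A count of exactly 1 is a reasonable signal of a likely match; 0 or
    many usually needs human review, as in the OpenRefine workflow."""
    query = f'given-names:"{given_names}" AND family-name:"{family_name}"'
    response = requests.get(
        ORCID_SEARCH,
        params={"q": query},
        headers={"Accept": "application/json"},
        timeout=30,
    )
    response.raise_for_status()
    return response.json().get("num-found", 0)

# Hypothetical rows from the spreadsheet of full names exported from OJS.
authors = [("Ada", "Lovelace"), ("Grace", "Hopper")]
for given, family in authors:
    print(given, family, count_orcid_matches(given, family))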
§2 Unfortunately I wasn’t able to catch the live Teach-In #AgainstSurveillance on Tuesday but luckily the talks have been captured and made available at http://againstsurveillance.net/ So many of our platforms are designed to extract user data. But not all of them are. Our institutions of higher education could choose to invest in free range ed-tech instead. §3 Bonus links! Making a hash out of knitting with data shannon_mattern’s Library | Zotero Mystery File! Author Mita WilliamsPosted on December 4, 2020December 4, 2020Categories weeknotesLeave a comment on Weeknote 49 (2020) Posts navigation Page 1 Page 2 … Page 6 Next page About me Librarian of Things is a blog by me, Mita Williams, who used to blog at New Jack Librarian until Blogger.com finally gave up the ghost. If you don’t have an RSS reader, you can subscribe for email delivery through mailchimp. You can learn more about my work at aedileworks.com as well as my other blogs and my weekly newsletter. If you are an editor of a scholarly journal and think that a post could be expanded into a more academic form, please let me know. Search for: Search Recent Posts Weeknote 9 (2021) Weeknote 8 (late) 2021 Weeknote 7 (2021) Weeknote 6 (2021) Weeknote 5 (2021) Archives March 2021 February 2021 January 2021 December 2020 November 2020 October 2020 September 2020 June 2020 May 2020 April 2020 October 2019 June 2019 May 2019 April 2019 March 2019 January 2019 July 2018 June 2018 May 2018 April 2018 December 2017 June 2017 May 2017 April 2017 November 2016 August 2016 July 2016 Meta Log in Entries feed Comments feed WordPress.org Librarian of Things Proudly powered by WordPress librarian-aedileworks-com-650 ---- Librarian of Things Librarian of Things Weeknote 9 (2021) §1 Zotero PDF Reader A new look and functionality for Zotero’s PDF Reader is still in beta. I can’t wait for this version to be unleashed! §2 MIT D2O Earlier this week, MIT Press announced a new Open Access Monograph program. It appears that the transition of scholarly ebooks to another form of subscription product … Continue reading "Weeknote 9 (2021)" Weeknote 8 (late) 2021 Last week I had a week that was more taxing than normal and I had nothing in the tank by Friday. So I’m putting together last week’s weeknotes today. Also, going forward each section heading has been anchor tagged for your link sharing needs. e.g. §1 §2 §3 §4 §5 and §6. I say this … Continue reading "Weeknote 8 (late) 2021" Weeknote 7 (2021) Today the library is closed as is my place of work’s tradition on the last day of Reading Week. But as I have three events (helping in a workshop, giving a presentation, participating in a focus group) in my calendar, I’m just going to work the day and bank the time for later. §1 Barbara … Continue reading "Weeknote 7 (2021)" Weeknote 6 (2021) Another week in which I was doing a lot of behind the scenes work. §1 Duly noted: Here’s the article in full. §2 Years ago, I gave a keynote called Libraries are for use. And by use, I mean copying that featured the short and sad story of a person who was unable to donate … Continue reading "Weeknote 6 (2021)" Weeknote 5 (2021) §1 Last Friday I was interviewed for the podcast The Grasscast — a game-themed podcast named after the book, The Grasshopper: Games, Life, and Utopia. I ramble a little bit in the episode as I tried to be more open and conversational than concise and correct. 
But I also spoke that way because for some … Continue reading "Weeknote 5 (2021)" Weeknote 4 (2021) I don’t have much that I can report in this week’s note. You are just going to have to take my word that this week, a large amount of my time was spent at meetings pertaining to my library department, my union, and anti-black racism work. §1 Last year, around this same time, some colleagues … Continue reading "Weeknote 4 (2021)" Weeknote 3 (2021) Hey. I missed last week’s weeknote. But we are here now. §1 This week I gave a class on searching scientific literature to a group of biology masters students. While I was making my slides comparing the Advanced Search capabilities of Web of Science and Scopus, I discovered this weird behaviour of Google Scholar: a … Continue reading "Weeknote 3 (2021)" Weeknote 1 (2021) This week’s post is not going to capture my ability to be productive while white supremacists appeared to be ushered in and out of the US Capitol building by complicit police and COVID-19 continued to ravage my community because our provincial government doesn’t want to spend money on the most vulnerable. Instead, I’m just going … Continue reading "Weeknote 1 (2021)" Weeknote 50 (2020) §1 It looks like Andromeda Yelton is sharing weeknotes (“This week in AI“). I can’t wait to see what she shares with us all in 2021. §2 Earlier this fall, Clarivate Analytics announced that it was moving toward a future that calculated the Journal Impact Factor (JIF) based on the date of electronic publication and not … Continue reading "Weeknote 50 (2020)" Weeknote 49 (2020) §1 I don’t have much to report in regards to the work I’ve been doing this week. I tried to get our ORCiD-OJS plugin to work but there is some small strange bug that needs to be squished. Luckily, next week I will have the benefit of assistance from the good people of CRKN and … Continue reading "Weeknote 49 (2020)" libraries-uc-edu-2824 ---- Libraries | University Of Cincinnati Skip to main content Use the form to search UC's web site for pages, programs, directory profiles and more. Libraries Online Library For Faculty For Graduate Students For Undergraduates For Staff Libraries Archives and Rare Books About the Archives and Rare Books Library Annual Summary Research Policies Staff FAQs Desiderata Collections Urban Studies Rare Books University Archives German Americana Local Government Records Search ARB Collections Records Management Disposal Submission Form Online Exhibits Special Projects Services Genealogy Research Image Reproduction and Use Archives and Rare Books Teaching Support Internship Program CCM CCM Catalog Search CCM Research About the CCM Library CCM Staff Directory CCML FAQs CCM Services CCM Special-Collections CEAS About CEAS Library History Floor Plan Guide for New Faculty Course Research Guides Research Resources Ask a Librarian Tutorial Videos Senior Design Reports Special Collections The Armstrong Collection The Cooperative Engineer The Strauss Collection Contact Us Ask a Librarian Reserve a Room CECH About Faculty & Staff Our Collections Borrowing Guidelines Find Us Services Poster Printing Reserves Instruction Resources MakerLab Technology for Checkout Study Spaces Info Commons Chemistry-Biology About Staff Oesper History of Chemistry Collection Services Getting Around the Library Ask a Librarian Classics About the Classics Library Classics Library Guide Snapshot of the Classics Collections Highlights of Classics Books Classics Collection Development Policy Why a Classics Library? 
Classics Book of the Month Classics Library's Open Access Link of the Month Classics Library Book Desiderata Virtual Tour of the Classics Library Usage Statistics Staff Directory Recent Book Acquisitions Classicizing Cincinnati Classics Library Policies Classics Library Collections German Classics Dissertations Modern Greek Journal Collection Classics Map Collection Greek Rare Book Collection Latin Rare Book Collection Classics Books with Author Signatures UC Department of Classics Archive Classics Library Services Group Study Room in Classics Scanners, Printers, Copiers in Classics Tours and Drop-Ins Classics Library Picture Gallery DAAP Collections Architecture Drawings Related Regional Libraries Exhibits Instruction Services Study Rooms Contact Us DAAP Library COVID-19 Updates Geology-Mathematics-Physics About the Library History Getting around the Library Help Ask-a-Librarian Help for Faculty Help for Students Help for Undergraduate Students Services New Books Special Collections Rare Book Collection Willis G. Meyer Map Collection Guidebook Collection Health Sciences Services Membership Room Reservations Borrow HSL-IT Research Help HSL History HSL Directions HSL Staff Directory Winkler Center About Cecil Striker Society & Lecture Resources Services Langsam Law UC Blue Ash About the UCBA Library UCBA Library Faculty & Staff UCBA Library Policies Vision and Core Values Annual Reports Student Employment at UCBA Library COVID-19 Services FAQs Borrowing Materials at UCBA Library Borrowing & Returning Reserves Equipment Lending Study Spaces Resources for UCBA Faculty and Staff Course Reserve Guidelines Collections Library Liaison Program Space Reservations Teaching Support Ask the UCBA Library UC Clermont Library About UC Clermont Library Student Employment Support the Library Collection Development Contact Us 2020-2021 FAQs Borrow Materials Technology & Equipment Textbook Reserves Study Spaces Policies and Guidelines Teaching Support Information Literacy Course Materials Course Reserves Ask UC Clermont Winkler Center Other Area Libraries Ask Find, Request, Borrow Search for Materials Call Number Locator (Langsam Library) Borrow Materials Borrow Equipment Renew Materials Request Materials Reserves E-Reserves Faculty Guidelines Traditional Course Reserves Reserves - Contacts Copyright Resources Textbook Affordability Help Finding and Using Materials Interlibrary Loan Special Collections FAQ Digital Collections Research & Teaching Support Research Data Services Lab Spaces Workshops and Education Meet the Team UC Data Day Data & Computational Science Series Data Tools Testimonials Data Visualization Showcase Digital Scholarship Center Citing Sources Copyright Repositories Subject Librarians UC Press Teaching Support Workshops & Trainings Ask a Librarian Online Reference Shelf Library Materials for Online Teaching Spaces & Tech Room Reservations Adaptive Technologies Library Media Space Student Technology Resources Center Borrow Equipment About Covid-19 Click & Collect Health and Safety Protocols Hours and Locations Contact Us Employment ohiolink-luminaries Staff Directory Giving Adopt-a-book Funding Donors Strategic Plan Tenets Pillars Ten Initiatives News and Events Policies Acceptable Use Gift Policy Source Library Faculty Resources for Library Faculty Library Faculty Directory Dean's Welcome Core Beliefs Login Off Campus Access Affiliate and Guest Access Help and Troubleshooting Tools VPN Interlibrary Loan Interlibrary Lending Policies My Library Record Pay Fines Fine Appeal Form Articles 
Books Journals Databases Search Summon to find articles, books, and more Advanced Summon Search | Find by DOI or PMID | More Search Options |Help Search the Library Catalog for books and more Advanced Catalog Search | Guest Access |  More Search Options | Help Find E-Journals or Print Journals E-Journals | Print Journals | Browzine | More Search Options | Help Search the A-Z Indexes databases list Browse Databases | Top 10 Databases | Academic Search Complete | More Search Options UC Online Library Whether onsite or online, we continue to connect students, faculty, researchers and scholars to dynamic data, information and resources. UC Online Library Service Updates Off Campus Access Contact Us Interlibrary Loan Research Guides Browse all guides Spring 2021 Return to Campus As we step into Spring Term 2021, our motto “Strength in Unity” continues to take on added meaning. Health and safety remain a top priority in an environment featuring virtual, hybrid, HyFlex and in-person classes, testing as a critical component toward a safer community as well as remote work options. Visit UC's Public Health Site Online Library Searching for a resource, have a question or simply browsing for fun? We've brought all online resources together in one place. Online Library Digital Technologies & Innovation UC Libraries creates and utilizes learning tools and research platforms that transform the user experience and the creation of new knowledge. Special Collections UC Libraries preserves and provides access to special collections and the scholarly and historical record of the university, including archival as well as born-digital content and datasets. View   With 4.3 million volumes and access to thousands of electronic resources available 24/7 through our online library catalog, UC Libraries' virtual and physical locations offer resources for everyone. UC Libraries includes the Walter C. Langsam Library, the Archives and Rare Books Library, the Donald C. Harrison Health Sciences Library, and eight college and departmental libraries serving constituents in applied science, architecture, art, biology, chemistry, classics, design, education, engineering, geology, mathematics, music, physics and planning.   Give to UC Libraries Library News "Off the Shelf and into the Lab" webinar May 6 April 14, 2021 Event: May 6, 2021 7:00 PM Join the Henry R. Winkler Center for the History of the Health Professions and the Cecil Striker Society for the History of Medicine at 7 p.m., Wednesday, May 6, for the third lecture in the Cecil Striker Webinar series. Faculty Awards 2021: Arlene Johnson April 6, 2021 Through her many roles in her 20 years at the University of Cincinnati, Arlene Johnson has served students, faculty and staff in the pursuit of knowledge — fitting for the recipient of the Faculty Senate Exemplary Service to the University Award. ‘CAN UC my mask’ canned food sculpture temporarily installed in... March 23, 2021 The masked Bearcat is showing school pride while reminding everyone to stay safe by wearing a mask. Debug Query for this More News Library Blog News from the Library Blog UCBA Library Needs You!  Now Hiring for Summer Semester  Fri, 23 Apr 2021 UCBA Library Needs You!  Now Hiring for Summer Semester  ARE YOU…  Friendly and welcoming?   Eager to help students, staff and faculty?   If so, consider joining the UCBA Library Team!   
Apply:  https://libraries.uc.edu/libraries/ucba/about/employment.html    April 20 Service Note: Access to library resources is currently down Tue, 20 Apr 2021 UPDATE: All access has been restored. ________________________________ All access to library resources through the proxy server is currently down. OCLC is working on the issue and we expect a resolution shortly. We apologize for the inconvenience. If you know the resource URL you are attempting to access, try this page: https://libapps.libraries.uc.edu/proxy/proxygoto.php. The URL for the […] The Preservation Lab celebrates Preservation Week 2021: Preservation in Action Mon, 19 Apr 2021 Join The Preservation Lab April 26-30 as they celebrate the American Library Association’s (ALA) Preservation Week, “Preservation in Action.” More information, including a schedule of the week’s events, is available on the Preservation’s blog. “Off the Shelf and into the Lab” May 6th webinar to highlight medical history, preservation and the UC Libraries’ Adopt-A-Book program Wed, 14 Apr 2021 Join the Henry R. Winkler Center for the History of the Health Professions and the Cecil Striker Society for the History of Medicine, Thursday, May 6 at 7:00 p.m. for the 3rd lecture in the Cecil Striker Webinar series. Off the Shelf and into the Lab: Medical History, Preservation and the University of Cincinnati Libraries’ […] Ending the HIV Epidemic, a panel discussion April 21 Mon, 12 Apr 2021 Join UC Libraries online Wednesday, April 21, 1:00 p.m. for “Ending the HIV Epidemic,” a panel discussion. Learn from various Cincinnati area HIV/AIDS service providers about how long-standing HIV prevention efforts combined with education on treatment, viral load suppression and concerted efforts by multiple agencies are being utilized to make HIV infection a thing of […] University of Cincinnati Libraries PO Box 210033 Cincinnati, Ohio 45221-0033 Contact Us | Staff Directory UC Tools Canopy & Canvas One Stop Email Catalyst Shuttle Tracker IT Help UC VPN Bearcats Landing About Us Maps & Directions Jobs News Diversity Governance & Policies Directory Events Calendar University of Cincinnati | 2600 Clifton Ave. | Cincinnati, OH 45221 | ph: 513-556-6000 Alerts | Clery and HEOA Notice | Notice of Non-Discrimination | eAccessibility Concern | Privacy Statement | Copyright Information © 2020 University of Cincinnati University of Cincinnati Libraries PO Box 210033 Cincinnati, Ohio 45221-0033 Contact Us | Staff Directory © 2020 University of Cincinnati chat loading... library-brown-edu-7056 ---- Brown University Library Digital Technologies Skip to content Find & Borrow Articles, Journals, & Databases Subject Support Hours & Locations Ask a Question Now Off-Campus Access Library A-Z Brown University Library Digital Technologies Menu and widgets Home BDR Blacklight Digital Preservation Drupal Josiah OCRA ORCID Researchers@Brown Web WordPress Search the DT Blog Search for: Authors Ben Cail (22) Hector Correa (9) Jean Rainwater (9) Kerri Hicks (6) Kevin Powell (6) Ted Lawless (3) Adam Bradley (2) Birkin Diana (1) Bundler 2.1.4 and homeless accounts This week we upgraded a couple of our applications to Ruby 2.7 and Bundler 2.1.4 and one of the changes that we noticed was that Bundler was complaining about not being able to write to the /opt/local directory. Turns out this problem shows up because the account that we use to run our application is a system account that does not have a home folder. 
This is how the problem shows up:
$ su - system_account
$ pwd
/opt/local
$ mkdir test_app
$ cd test_app
$ pwd
/opt/local/test_app
$ gem install bundler -v 2.1.4
$ bundler --version
`/opt/local` is not writable.
Bundler will use `/tmp/bundler20200731-59360-174h3lz59360' as your home directory temporarily.
Bundler version 2.1.4
Notice that Bundler complains about the /opt/local directory not being writable; that's because we don't have a home for this user. In fact, env $HOME outputs /opt/local rather than the typical /home/username. Although Bundler is smart enough to use a temporary folder instead and continue, the net result of this is that if we set a configuration value for Bundler in one execution and try to use that configuration value in the next execution, Bundler won't be able to find the value that we set in the first execution (my guess is because the value was saved in a temporary folder.) Below is an example of this. Notice how we set the path value to vendor/bundle in the first command, but then when we inspect the configuration in the second command the configuration does not report the value that we just set:
# First - set the path value
$ bundle config set path 'vendor/bundle'
`/opt/local` is not writable.
Bundler will use `/tmp/bundler20200731-60203-16okmcg60203' as your home directory temporarily.
# Then - inspect the configuration
$ bundle config
`/opt/local` is not writable.
Bundler will use `/tmp/bundler20200731-60292-1r50oed60292' as your home directory temporarily.
Settings are listed in order of priority. The top value will be used.
Ideally the call to bundle config would report the vendor/bundle path that we set, but it does not in this case. In fact, if we run bundle install next, Bundler will install the gems in $GEM_PATH rather than using the custom vendor/bundle directory that we indicated.
Working around the issue
One way to work around this issue is to tell Bundler that the HOME directory is the one from where we are running bundler (i.e. /opt/local/test_app in our case).
# First - set the path value
# (no warning is reported)
$ HOME=/opt/local/test_app/ bundle config set path 'vendor/bundle'
# Then - inspect the configuration
$ bundle config
`/opt/local` is not writable.
Bundler will use `/tmp/bundler20200731-63230-11dmgcb63230' as your home directory temporarily.
Settings are listed in order of priority. The top value will be used.
path
Set for your local app (/opt/local/test_app/.bundle/config): "vendor/bundle"
Notice that we didn't get a warning in the first command (since we indicated a HOME directory) and then, even though we didn't pass a HOME directory to the second command, our value was picked up and shows the correct value for the path setting (vendor/bundle). So it seems to me that when HOME is set to a non-writable directory (/opt/local in our case) Bundler picks up the values from ./bundle/config if it is available, even as it complains about /opt/local not being writable. If we were to run bundle install now, it would install the gems in our local vendor/bundle directory. This is good for us: Bundler is using the value that we configured for the path setting (even though it still complains that it cannot write to /opt/local.) We could avoid the warning in the second command if we pass the HOME value here too:
$ HOME=/opt/local/test-app/ bundle config
Settings are listed in order of priority. The top value will be used.
path
Set for your local app (/opt/local/test-app/.bundle/config): "vendor/bundle"
But the fact that Bundler picks up the correct values from ./bundle/config when HOME is set to a non-writable directory was important for us, because it meant that when the app runs under Apache/Passenger it will also work. This is more or less how the configuration for our apps in http.conf looks; notice that we are not setting the HOME value:
PassengerBaseURI /test-app
PassengerUser system_account
PassengerRuby /opt/local/rubies/ruby-2.7.1/bin/ruby
PassengerAppRoot /opt/local/test-app
SetEnv GEM_PATH /opt/local/.gem/ruby/2.7.1/
Some final thoughts
Perhaps a better solution would be to set a HOME directory for our system_account, but we have not tried that; we didn't want to make such a wide-reaching change to our environment just to please Bundler. Plus, this might be problematic in our development servers, where we share the same system_account for multiple applications (this is not a problem in our production servers). We have no idea when this change took effect in Bundler. We went from Bundler 1.17.1 (released in October 2018) to Bundler 2.1.4 (released in January 2020) and there were many releases in between. Perhaps this was documented somewhere and we missed it. In our particular situation we noticed this issue because one of our gems needed very specific parameters to be built during bundle install. We set those values via a call to bundle config build.mysql2 --with-mysql-dir=xxx mysql-lib=yyy and those values were lost by the time we ran bundle install and the installation kept failing. Luckily we found a workaround and were able to install the gem with the specific parameters.
Posted on July 31, 2020; updated February 2, 2021. Author hcorrea. Categories: Programming.
Upgrading from Solr 4 to Solr 7
A few weeks ago we upgraded the version of Solr that we use in our Discovery layer; we went from Solr 4.9 to Solr 7.5. Although we have been using Solr 7.x in other areas of the library, this was a significant upgrade for us because searching is the raison d'être of our Discovery layer and we wanted to make sure that the search results did not change in unexpected ways with the new field and server configurations in Solr. All in all the process went smoothly for our users. This blog post elaborates on some of the things that we had to do in order to upgrade.
Managed Schema
This is the first Solr instance that we set up to use the managed-schema feature in Solr. This allows us to define field types and fields via the Schema API rather than by editing XML files. All in all this was a good decision and it allows us to recreate our Solr instances by running a shell script rather than by copying XML files. This feature was very handy during testing when we needed to recreate our Solr core multiple times. You can see the script that we use to recreate our Solr core on GitHub. We are still tweaking how we manage updates to our schema. For now we are using a low-tech approach in which we create small scripts to add fields to the schema, conceptually similar to what Rails does with database migrations, but our approach is still very manual.
Default Field Definitions
The default field definitions in Solr 7 are different from the default field definitions in Solr 4. This is not surprising given that we skipped two major versions of Solr, but it was one of the hardest things to reconcile.
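Returning to the managed-schema workflow above: a minimal sketch of the kind of migration-like script that adds a field through the Schema API might look like the following. The core URL, field name, attributes, and use of Python's requests library are assumptions for illustration, not the library's actual scripts.
import requests

SOLR_CORE_URL = "http://localhost:8983/solr/mycore"  # assumed core URL

def add_field(name, field_type, multi_valued=False):
    """Add a field to the managed schema via Solr's Schema API."""
    command = {
        "add-field": {
            "name": name,
            "type": field_type,
            "stored": True,
            "indexed": True,
            "multiValued": multi_valued,
        }
    }
    # POSTing JSON to <core>/schema applies the change to the managed schema.
    response = requests.post(f"{SOLR_CORE_URL}/schema", json=command, timeout=30)
    response.raise_for_status()
    return response.json()

# Example "migration": add a multi-valued title search field.
print(add_field("title_search", "text_general", multi_valued=True))
Keeping each schema change in a small, re-runnable script like this is what makes it possible to rebuild the core from scratch with a shell script, as described above.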
Default Field Definitions
The default field definitions in Solr 7 are different from the default field definitions in Solr 4. This is not surprising given that we skipped two major versions of Solr, but it was one of the hardest things to reconcile. Our Solr 4 was set up and configured many years ago and the upgrade forced us to look very closely into exactly what kind of transformations we were doing to our data and decide what should be modified in Solr 7 to support the Solr 4 behavior versus what should be updated to use new Solr 7 features.

Our first approach was to manually inspect the "schema.xml" in Solr 4 and compare it with the "managed-schema" file in Solr 7, which is also an XML file. We soon found that this was too cumbersome and error prone. But we found the output of the LukeRequestHandler to be much more concise and easier to compare between the versions of Solr, and lucky for us, the output of the LukeRequestHandler is identical in both versions of Solr! Using the LukeRequestHandler we dumped our Solr schema to XML files and compared those files with a traditional file compare tool; we used the built-in file compare option in VS Code but any file compare tool would do. These are the commands that we used to dump the schema to XML files:

curl http://solr-4-url/admin/luke?numTerms=0 > luke4.xml
curl http://solr-7-url/admin/luke?numTerms=0 > luke7.xml

The output of the LukeRequestHandler includes both the type of each field (e.g. string) and its schema definition (single value vs multi-value, indexed, tokenized, et cetera), for example: string --SD------------l

Another benefit of using the LukeRequestHandler instead of going by the fields defined in schema.xml is that the LukeRequestHandler only outputs fields that are indeed used in the Solr core, whereas schema.xml lists fields that were used at one point even if we don't use them anymore.
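If you would rather script that comparison than eyeball a diff, a small sketch along these lines can report fields that exist in only one core or whose type changed. It is only an illustration: it assumes the JSON form of the Luke output (wt=json) and reuses the placeholder URLs from the curl commands above.

import json
import urllib.request

def luke_fields(solr_url):
    # Map each field name to its type as reported by the LukeRequestHandler.
    url = solr_url + "/admin/luke?numTerms=0&wt=json"
    with urllib.request.urlopen(url) as response:
        data = json.loads(response.read().decode("utf8"))
    return {name: info.get("type") for name, info in data["fields"].items()}

solr4 = luke_fields("http://solr-4-url")   # placeholder URLs, as above
solr7 = luke_fields("http://solr-7-url")

print("only in Solr 4:", sorted(set(solr4) - set(solr7)))
print("only in Solr 7:", sorted(set(solr7) - set(solr4)))
print("type changed:  ", sorted(f for f in set(solr4) & set(solr7) if solr4[f] != solr7[f]))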
ICUFoldingFilter
In Solr 4 a few of the default field types used the ICUFoldingFilter, which handles diacritics so that a word like "México" is equivalent to "Mexico". This filter used to be available by default in a Solr 4 installation but that is not the case anymore. In Solr 7 ICUFoldingFilter is not enabled by default and you must edit your solrconfig.xml as indicated in the documentation to enable it (see previous link), and then you can use it in a field type by adding it as a filter:

curl -X POST -H 'Content-type:application/json' --data-binary '{
  "add-field-type" : {
    "name":"text_search",
    "class":"solr.TextField",
    "analyzer" : {
      "tokenizer":{"class":"solr.StandardTokenizerFactory"},
      "filters":[
        {"class":"solr.ICUFoldingFilterFactory"},
        ...
      ]
    }
  }
}' $SOLR_CORE_URL/schema

Handle Select
HandleSelect is a parameter that is defined in the solrconfig.xml; in previous versions of Solr it used to default to true but starting in Solr 7 it defaults to false. The version of Blacklight that we are using (5.19) expects this value to be true. This parameter is what allows Blacklight to use a request handler like "search" (without a leading slash) instead of "/search". Enabling handleSelect is easy: just edit the requestDispatcher setting in the solrconfig.xml.

LocalParams and Dereferencing
Our current version of Blacklight uses LocalParams and Dereferencing heavily and support for these two features changed drastically in Solr 7.2. This is a good enhancement in Solr but it caught us by surprise. The gist of the problem is that if the solrconfig.xml sets the query parser to DisMax or eDisMax then Solr will not recognize a query like this:

{!qf=$title_qf}

We tried several workarounds and settled on setting the default parser (defType) in solrconfig.xml to Lucene and requesting eDisMax explicitly from the client application:

{!type=dismax qf=$title_qff}Coffee&df=id

It's worth noting that passing defType as a normal query string parameter to change the parser did not work for us for queries using LocalParams and Dereferencing.

Stop words
One of the settings that we changed in our new field definitions was the use of stop words. We are now not using stop words when indexing title fields. This was one of the benefits of doing a full review of each one of our field types and tweaking them during the upgrade. The result is that now searches for titles that are only stop words (like "There there") return the expected results.

Validating Results
To validate that our new field definitions and server-side configuration in Solr 7 were compatible with what we had in Solr 4 we did several kinds of tests, some of them manual and others automated. We have a small suite of unit tests that Jeanette Norris and Ted Lawless created years ago and that we still use to validate some well-known scenarios that we want to support. You can see those "relevancy" tests in our GitHub repository. We also captured thousands of live searches from our Discovery layer using Solr 4 and replayed them with Solr 7 to make sure that the results of both systems were compatible. To determine that results were compatible we counted how many of the top 10 results, top 5, and top 1 were included in the results of both Solr instances. The following picture shows an example of what the results look like. The code that we used to run the searches on both Solr instances and generate the table is on our GitHub repo.
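The comparison itself boils down to a small calculation like the following sketch (the result ids are made up; the real script also runs the searches against both Solr instances):

def overlap(ids_a, ids_b, k):
    # How many of the top-k ids from one system also show up in the top-k of the other.
    top_b = set(ids_b[:k])
    return sum(1 for doc_id in ids_a[:k] if doc_id in top_b)

solr4_ids = ["b100", "b200", "b300", "b400", "b500"]  # hypothetical result ids
solr7_ids = ["b200", "b100", "b900", "b300", "b600"]

for k in (1, 5, 10):
    print("top %d overlap: %d" % (k, overlap(solr4_ids, solr7_ids, k)))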
CJK Searches
The main reason for us to upgrade from Solr 4 to Solr 7 was to add support for Chinese, Japanese, and Korean (CJK) searches. The way our Solr 4 index was created, we did not support searches in these languages. In our Solr 7 core we are using the built-in CJK field definitions and our results are much better. This will be the subject of a future blog post. Stay tuned.

Posted on January 30, 2020 / February 2, 2021 | Author: hcorrea | Categories: Solr | Tags: Blacklight, Josiah, Solr

PyPI packages
Recently, we published two Python packages to PyPI: bdrxml and bdrcmodels. No one else is using those packages, as far as I know, and it takes some effort to put them up there, but there are benefits from publishing them. Putting a package on PyPI makes it easier for other code we package up to depend on bdrxml. For our indexing package, we can switch from this:

'bdrxml @ https://github.com/Brown-University-Library/bdrxml/archive/v1.0a1.zip#sha1=5802ed82ee80a9627657cbb222fe9c056f73ad2c',

to this:

'bdrxml>=1.0',

in setup.py, which is simpler. This also lets us use Python's package version checking to not pin bdrxml to just one version, which is helpful when we embed the indexing package in another project that may use a different version of bdrxml. Publishing these first two packages also gave us experience, which will help if we publish more packages to PyPI.

Posted on June 12, 2019 | Author: Ben Cail | Categories: Uncategorized

New RIAMCO website
A few days ago we released a new version of the Rhode Island Archival and Manuscript Collections Online (RIAMCO) website. The new version is a brand new codebase. This post describes a few of the new features that we implemented as part of the rewrite and how we designed the system to support them. The RIAMCO website hosts information about archival and manuscript collections in Rhode Island. These collections (also known as finding aids) are stored as XML files using the Encoded Archival Description (EAD) standard and indexed into Solr to allow for full text searching and filtering.

Look and feel
The overall look and feel of the RIAMCO site is heavily influenced by the work that the folks at the NYU Libraries did on their site. Like NYU's site and Brown's Discovery tool, the RIAMCO site uses the typical facets-on-the-left, content-on-the-right style that is common in many library and archive websites. Below is a screenshot of how the main search page looks:

Architecture
Our previous site was put together over many years and it involved many separate applications written in different languages: the frontend was written in PHP, the indexer in Java, and the admin tool in Python/Django. During this rewrite we bundled the code for the frontend and the indexer into a single application written in Ruby on Rails. [As of September 13th, 2019 the Rails application also provides the admin interface.] You can view a diagram of this architecture and a few more notes about it in this document.

Indexing
Like the previous version of the site, we are using Solr to power the search feature of the site. However, in the previous version each collection was indexed as a single Solr document, whereas in the new version we are splitting each collection into many Solr documents: one document to store the main collection information (scope, biographical info, call number, et cetera), plus one document for each item in the inventory of the collection. This new indexing strategy significantly increased the number of Solr documents that we store. We went from 1100+ Solr documents (one for each collection) to 300,000+ Solr documents (one for each item in the inventory of those collections). The advantage of this approach is that now we can search and find items at a much more granular level than we did before. For example, we can tell a user that we found a match on "Box HE-4 Folder 354" of the Harris Ephemera collection for their search on "blue moon" rather than just telling them that there is a match somewhere in the 25 boxes (3,000 folders) in the "Harris Ephemera" collection.

In order to keep the relationship between all the Solr documents for a given collection we are using an extra ead_id_s field to store the id of the collection that each document belongs to. If we have a collection "A" with three items in the inventory they will have the following information in Solr:

{id: "A", ead_id_s: "A"}     // the main collection record
{id: "A-1", ead_id_s: "A"}   // item 1 in the inventory
{id: "A-2", ead_id_s: "A"}   // item 2 in the inventory
{id: "A-3", ead_id_s: "A"}   // item 3 in the inventory

This structure allows us to use the Result Grouping feature in Solr to group results from a search into the appropriate collection. With this structure in place we can then show the results grouped by collection as you can see in the previous screenshot. The code to index our EAD files into Solr is on the Ead class. We had to add some extra logic to handle cases when a match is found only on a Solr document for an inventory item (but not on the main collection) so that we can also display the main collection information alongside the inventory information in the search results.
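For reference, a grouped query against this structure might look like the sketch below; the search term and core name are hypothetical, and the parameters are the standard Solr Result Grouping ones:

import json
import urllib.parse
import urllib.request

params = urllib.parse.urlencode({
    "q": "blue moon",           # hypothetical search
    "group": "true",            # turn on Result Grouping
    "group.field": "ead_id_s",  # one group per collection
    "group.limit": 3,           # show a few inventory matches per collection
    "wt": "json",
})
url = "http://localhost:8983/solr/riamco/select?" + params  # hypothetical core name
with urllib.request.urlopen(url) as response:
    grouped = json.loads(response.read().decode("utf8"))["grouped"]["ead_id_s"]
for group in grouped["groups"]:
    print(group["groupValue"], group["doclist"]["numFound"])

The application still needs the extra logic described above to pull in the parent collection record when only an inventory item matches a search.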
The code for this is on the search_grouped() function of the Search class. Hit highlighting Another feature that we implemented on the new site is hit highlighting. Although this is a feature that Solr supports out of the box there is some extra coding that we had to do to structure the information in a way that makes sense to our users. In particular things get tricky when the hit was found in a multi value field or when Solr only returns a snippet of the original value in the highlights results. The logic that we wrote to handle this is on the SearchItem class. Advanced Search We also did an overhaul to the Advanced Search feature. The layout of the page is very typical (it follows the style used in most Blacklight applications) but the code behind it allows us to implement several new features. For example, we allow the user to select any value from the facets (not only one of the first 10 values for that facet) and to select more than one value from those facets. We also added a “Check” button to show the user what kind of Boolean expression would be generated for the query that they have entered. Below is a screenshot of the results of the check syntax for a sample query. There are several tweaks and optimizations that we would like to do on this page, for example, opening the facet by Format is quite slow and it could be optimized. Also, the code to parse the expression could be written to use a more standard Tokenizer/Parser structure. We’ll get to that later on… hopefully : ) Individual finding aids Like on the previous version of the site, the rendering of individual finding aids is done by applying XSLT transformations to the XML with the finding aid data. We made a few tweaks to the XSLT to integrate them on the new site but the vast majority of the transformations came as-is from the previous site. You can see the XSLT files in our GitHub repo. It’s interesting that GitHub reports that half of the code for the new site is XSLT: 49% XSLT, 24% HTML, and 24% Ruby. Keep in mind that these numbers do not take into account the Ruby on Rails code (which is massive.) Source code The source code for the new application is available in GitHub. Acknowledgements Although I wrote the code for the new site, there were plenty of people that helped me along the way in this implementation, in particular Karen Eberhart and Joe Mancino. Karen provided the specs for the new site, answered my many questions about the structure of EAD files, and suggested many improvements and tweaks to make the site better. Joe helped me find the code for the original site and indexer, and setup the environment for the new one. Posted on June 5, 2019February 2, 2021Author hcorreaCategories Programming, RIAMCO, SolrTags Solr Deploying with shiv I recently watched a talk called “Containerless Django – Deploying without Docker”, by Peter Baumgartner. Peter lists some benefits of Docker: that it gives you a pipeline for getting code tested and deployed, the container adds some security to the app, state can be isolated in the container, and it lets you run the exact same code in development and production. Peter also lists some drawbacks to Docker: it’s a lot of code that could slow things down or have bugs, docker artifacts can be relatively large, and it adds extra abstractions to the system (eg. filesystem, network). He argues that an ideal deployment would include downloading a binary, creating a configuration file, and running it (like one can do with compiled C or Go programs). 
Peter describes a process of deploying Django apps by creating a zipapp using shiv and goodconf, and deploying it with systemd constraints that add to the security. He argues that this process achieves most of the benefits of Docker, but more simply, and that there's a sweet spot for application size where this type of deploy is a good solution.

I decided to try using shiv with our image server Loris. I ran the shiv command "shiv -o loris.pyz .", and I got the following error:

User "loris" and or group "loris" do(es) not exist. Please create this user, e.g.: `useradd -d /var/www/loris -s /sbin/false loris`

The issue is that in the Loris setup.py file, the install process not only checks for the loris user as shown in the error, but it also sets up directories on the filesystem (including setting the owner and permissions, which requires root permissions). I submitted a PR to remove the filesystem setup from the Python package installation (and put it in a script the user can run), and hopefully in the future it will be easier to package up Loris and deploy it in different ways.

Posted on May 17, 2019 | Author: Ben Cail | Categories: BDR, Programming

Checksums
In the BDR, we calculate checksums automatically on ingest (Fedora 3 provides that functionality for us), so all new content binaries going into the BDR get a checksum, which we can go back and check later as needed. We can also pass checksums into the BDR API, and then we verify that Fedora calculates the same checksum for the ingested file, which shows that the content wasn't modified since the first checksum was calculated. We have only been able to use MD5 checksums, but we want to be able to use more checksum types. This isn't a problem for Fedora, which can calculate multiple checksum types, such as MD5, SHA1, SHA256, and SHA512. However, there is a complicating factor – if Fedora gets a checksum mismatch, by default it returns a 500 response code with no message, so we can't tell whether it was a checksum mismatch or some other server error. Thanks to Ben Armintor, though, we found that we can update our Fedora configuration so it returns the Checksum Mismatch information.

Another issue in this process is that we use eulfedora (which doesn't seem to be maintained anymore). If a checksum mismatch happens, it raises a DigitalObjectSaveFailure, but we want to know that there was a checksum mismatch. We forked eulfedora and exposed the checksum mismatch information. Now we can remove some extra code that we had in our APIs, since more functionality is handled in Fedora/eulfedora, and we can use multiple checksum types.
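As a simple illustration of the client side of this, computing several checksum types for a file before handing it to an API could look like the sketch below (the file name is made up; this is not the BDR code itself):

import hashlib

def file_checksums(path, algorithms=("md5", "sha1", "sha256", "sha512")):
    # Read the file once and update every hash object as we go.
    hashes = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            for h in hashes.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashes.items()}

print(file_checksums("example.tif"))  # hypothetical file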
Posted on March 29, 2019 | Author: Ben Cail | Categories: BDR

Exporting Django data
We recently had a couple cases where we wanted to dump the data out of a Django database. In the first case ("tracker"), we were shutting down a legacy application, but needed to preserve the data in a different form for users. In the second case ("deposits"), we were backing up some obsolete data before removing it from the database. We handled the processes in two different ways.

Tracker
For the tracker, we used an export script to extract the data. Here's a modified version of the script:

import datetime
import os
# `models` is the Django app's models module, imported elsewhere in the script.

def export_data():
    now = datetime.datetime.now()
    dir_name = 'data_%s_%s_%s' % (now.year, now.month, now.day)
    d = os.mkdir(dir_name)
    file_name = os.path.join(dir_name, 'tracker_items.dat')
    with open(file_name, 'wb') as f:
        f.write(u'\u241f'.join([
            'project name',
            'container identifier',
            'container name',
            'identifier',
            'name',
            'dimensions',
            'note',
            'create digital surrogate',
            'qc digital surrogate',
            'create metadata record',
            'qc metadata record',
            'create submission package']).encode('utf8'))
        f.write('\u241e'.encode('utf8'))
        for project in models.Project.objects.all():
            for container in project.container_set.all():
                print(container)
                for item in container.item_set.all():
                    data = u'\u241f'.join([
                        project.name.strip(),
                        container.identifier.strip(),
                        container.name.strip(),
                        item.identifier.strip(),
                        item.name.strip(),
                        item.dimensions.strip(),
                        item.note.strip()
                    ])
                    item_actions = u'\u241f'.join([str(item_action) for item_action in item.itemaction_set.all().order_by('id')])
                    line_data = u'%s\u241f%s\u241e' % (data, item_actions)
                    f.write(line_data.encode('utf8'))

As you can see, we looped through different Django models and pulled out fields, writing everything to a file. We used the Unicode Record and Unit Separators as delimiters. One advantage of using those is that your data can have commas, tabs, newlines, … and it won't matter. You still don't have to quote or escape anything. Then we converted the data to a spreadsheet that users can view and search:

import openpyxl

workbook = openpyxl.Workbook()
worksheet = workbook.active
with open('tracker_items.dat', 'rb') as f:
    data = f.read()
lines = data.decode('utf8').split('\u241e')
print(len(lines))
print(lines[0])
print(lines[-1])
for line in lines:
    fields = line.split('\u241f')
    worksheet.append(fields)
workbook.save('tracker_items.xlsx')

Deposits
For the deposits project, we just used the built-in Django dumpdata command:

python manage.py dumpdata -o data_20180727.dat

That output file could be used to load data back into a database if needed.

Posted on March 25, 2019 | Author: Ben Cail | Categories: Uncategorized

Searching for hierarchical data in Solr
Recently I had to index a dataset into Solr in which the original items had a hierarchical relationship among them. In processing this data I took some time to look into the ancestor_path and descendent_path features that Solr provides out of the box and see if and how they could help to issue searches based on the hierarchy of the data. This post elaborates on what I learned in the process.

Let's start with some sample hierarchical data to illustrate the kind of relationship that I am describing in this post. Below is a short list of databases and programming languages organized by type.

Databases
├─ Relational
│  ├─ MySQL
│  └─ PostgreSQL
└─ Document
   ├─ Solr
   └─ MongoDB

Programming Languages
└─ Object Oriented
   ├─ Ruby
   └─ Python

For the purposes of this post I am going to index each individual item shown in the hierarchy, not just the children items. In other words I am going to create 11 Solr documents: one for "Databases", another for "Relational", another for "MySQL", and so on. Each document is saved with an id, a title, and a path.
For example, the document for “Databases” is saved as: { "id": "001", "title_s": "Databases", "x_ancestor_path": "db", "x_descendent_path": "db" } and the one for “MySQL” is saved as: { "id": "003", "title_s": "MySQL", "x_ancestor_path": "db/rel/mysql", "x_descendent_path": "db/rel/mysql" } The x_ancestor_path and x_descendent_path fields in the JSON data represent the path for each of these documents in the hierarcy. For example, the top level “Databases” document uses the path “db” where the lowest level document “MySQL” uses “db/rel/mysql”. I am storing the exact same value on both fields so that later on we can see how each of them provides different features and addresses different use cases. ancestor_path and descendent_path The ancestor_path and descendent_path field types come predefined in Solr. Below is the definition of the descendent_path in a standard Solr 7 core: $ curl http://localhost:8983/solr/your-core/schema/fieldtypes/descendent_path { ... "indexAnalyzer":{ "tokenizer":{ "class":"solr.PathHierarchyTokenizerFactory", "delimiter":"/"}}, "queryAnalyzer":{ "tokenizer":{ "class":"solr.KeywordTokenizerFactory"}}}} Notice how it uses the PathHierarchyTokenizerFactory tokenizer when indexing values of this type and that it sets the delimiter property to /. This means that when values are indexed they will be split into individual tokens by this delimiter. For example the value “db/rel/mysql” will be split into “db”, “db/rel”, and “db/rel/mysql”. You can validate this in the Analysis Screen in the Solr Admin tool. The ancestor_path field is the exact opposite, it uses the PathHierarchyTokenizerFactory at query time and the KeywordTokenizerFactory at index time. There are also two dynamic field definitions *_descendent_path and *_ancestor_path that automatically create fields with these types. Hence the wonky x_descendent_path and x_ancestor_path field names that I am using in this demo. Finding descendants The descendent_path field definition in Solr can be used to find all the descendant documents in the hierarchy for a given path. For example, if I query for all documents where the descendant path is “db” (q=x_descendent_path:db) I should get all document in the “Databases” hierarchy, but not the ones under “Programming Languages”. For example: $ curl "http://localhost:8983/solr/your-core/select?q=x_descendent_path:db&fl=id,title_s,x_descendent_path" { ... "response":{"numFound":7,"start":0,"docs":[ { "id":"001", "title_s":"Databases", "x_descendent_path":"db"}, { "id":"002", "title_s":"Relational", "x_descendent_path":"db/rel"}, { "id":"003", "title_s":"MySQL", "x_descendent_path":"db/rel/mysql"}, { "id":"004", "title_s":"PostgreSQL", "x_descendent_path":"db/rel/pg"}, { "id":"005", "title_s":"Document", "x_descendent_path":"db/doc"}, { "id":"006", "title_s":"MongoDB", "x_descendent_path":"db/doc/mongo"}, { "id":"007", "title_s":"Solr", "x_descendent_path":"db/doc/solr"}] }} Finding ancestors The ancestor_path not surprisingly can be used to achieve the reverse. Given the path of a given document we can query Solr to find all its ancestors in the hierarchy. For example if I query Solr for the documents where x_ancestor_path is “db/doc/solr” (q=x_ancestor_path:db/doc/solr) I should get “Databases”, “Document”, and “Solr” as shown below: $ curl "http://localhost:8983/solr/your-core/select?q=x_ancestor_path:db/doc/solr&fl=id,title_s,x_ancestor_path" { ... 
"response":{"numFound":3,"start":0,"docs":[ { "id":"001", "title_s":"Databases", "x_ancestor_path":"db"}, { "id":"005", "title_s":"Document", "x_ancestor_path":"db/doc"}, { "id":"007", "title_s":"Solr", "x_ancestor_path":"db/doc/solr"}] }} If you are curious how this works internally, you could issue a query with debugQuery=true and look at how the query value “db/doc/solr” was parsed. Notice how Solr splits the query value by the / delimiter and uses something called SynonymQuery() to handle the individual values as synonyms: $ curl "http://localhost:8983/solr/your-core/select?q=x_ancestor_path:db/doc/solr&debugQuery=true" { ... "debug":{ "rawquerystring":"x_ancestor_path:db/doc/solr", "parsedquery":"SynonymQuery(Synonym(x_ancestor_path:db x_ancestor_path:db/doc x_ancestor_path:db/doc/solr))", ... } One little gotcha Given that Solr is splitting the path values by the / delimiter and that we can see those values in the Analysis Screen (or when passing debugQuery=true) we might expect to be able to fetch those values from the document somehow. But that is not the case. The individual tokens are not stored in a way that you can fetch them, i.e. there is no way for us to fetch the individual “db”, “db/doc”, and “db/doc/solr” values when fetching document id “007”. In hindsight this is standard Solr behavior but something that threw me off initially. Posted on January 10, 2019February 2, 2021Author hcorreaCategories Programming, Solr Monitoring Passenger’s Requests in Queue over time As I mentioned in a previous post we use Phusion Passenger as the application server to host our Ruby applications. A while ago upon the recommendation of my coworker Ben Cail I created a cron job that calls passenger-status every 5 minutes to log the status of Passenger in our servers.  Below is a sample of the passenger-status output: Version : 5.1.12 Date : Mon Jul 30 10:42:54 -0400 2018 Instance: 8x6dq9uX (Apache/2.2.15 (Unix) DAV/2 Phusion_Passenger/5.1.12) ----------- General information ----------- Max pool size : 6 App groups : 1 Processes : 6 Requests in top-level queue : 0 ----------- Application groups ----------- /path/to/our/app: App root: /path/to/our/app Requests in queue: 3 * PID: 43810 Sessions: 1 Processed: 20472 Uptime: 1d 7h 31m 25s CPU: 0% Memory : 249M Last used: 1s ag * PID: 2628 Sessions: 1 Processed: 1059 Uptime: 4h 34m 39s CPU: 0% Memory : 138M Last used: 1s ago * PID: 2838 Sessions: 1 Processed: 634 Uptime: 4h 30m 47s CPU: 0% Memory : 134M Last used: 1s ago * PID: 16836 Sessions: 1 Processed: 262 Uptime: 2h 14m 46s CPU: 0% Memory : 160M Last used: 1s ago * PID: 27431 Sessions: 1 Processed: 49 Uptime: 25m 27s CPU: 0% Memory : 119M Last used: 0s ago * PID: 27476 Sessions: 1 Processed: 37 Uptime: 25m 0s CPU: 0% Memory : 117M Last used: 0s ago Our cron job to log this information over time is something like this: /path/to/.gem/gems/passenger-5.1.12/bin/passenger-status >> ./logs/passenger_status.log Last week we had some issues in which our production server was experiencing short outages. Upon review we noticed that we were having a unusual amount of traffic coming to our server (most of it from crawlers submitting bad requests.) One of the tools that we used to validate the status of our server was the passenger_status.log file created via the aforementioned cron job. The key piece of information that we use is the “Requests in queue” value highlighted above. We parsed this value of out the passenger_status.log file to see how it changed in the last 30 days. 
The result showed that although we have had a couple of outages recently the number of “requests in queue” dramatically increased about two weeks ago and it had stayed high ever since. The graph below shows what we found. Notice how after August 19th the value of “requests in queue” has been constantly high, whereas before August 19th it was almost always zero or below 10. We looked closely to our Apache and Rails logs and determined the traffic that was causing the problem. We took a few steps to handle it and now our servers are behaving as normal again. Notice how we are back to zero requests in queue on August 31st in the graph above. The Ruby code that we use to parse our passenger_status.log file is pretty simple, it just grabs the line with the date and the line with the number of requests in queue, parses their values, and outputs the result to a tab delimited file that then we can use to create a graph in Excel or RAWGraphs. Below is the Ruby code: require "date" log_file = "passenger_status.log" excel_date = true def date_from_line(line, excel_date) index = line.index(":") return nil if index == nil date_as_text = line[index+2..-1].strip # Thu Aug 30 14:00:01 -0400 2018 datetime = DateTime.parse(date_as_text).to_s # 2018-08-30T14:00:01-04:00 if excel_date return datetime[0..9] + " " + datetime[11..15] # 2018-08-30 14:00 end datetime end def count_from_line(line) return line.gsub("Requests in queue:", "").to_i end puts "timestamp\trequest_in_queue" date = "N/A" File.readlines(log_file).each do |line| if line.start_with?("Date ") date = date_from_line(line, excel_date) elsif line.include?("Requests in queue:") request_count = count_from_line(line) puts "\"#{date}\"\t#{request_count}" end end In this particular case the number of requests in queue was caused by bad/unwanted traffic. If the increase in traffic had been legitimate we would have taken a different route, like adding more processes to our Passenger instance to handle the traffic. Posted on September 4, 2018February 2, 2021Author hcorreaCategories Blacklight, Josiah, Programming, Web Looking at the Oxford Common Filesystem Layout (OCFL) Currently, the BDR contains about 34TB of content. The storage layer is Fedora 3, and the data is stored internally by Fedora (instead of being stored externally). However, Fedora 3 is end-of-life. This means that we either maintain it ourselves, or migrate to something else. However, we don’t want to migrate 34TB, and then have to migrate it again if we change software again. We’d like to be able to change our software, without migrating all our data. This is where the Oxford Common Filesystem Layout (OCFL) work is interesting. OCFL is an effort to define how repository objects should be laid out on the filesystem. OCFL is still very much a work-in-progress, but the “Need” section of the specification speaks directly to what I described above. If we set up our data using OCFL, hopefully we can upgrade and change our software as necessary without having to move all the data around. Another benefit of the OCFL effort is that it’s work being done by people from multiple institutions, building on other work and experience in this area, to define a good, well-thought-out layout for repository objects. Finally, using a common specification for the filesystem layout of our repository means that there’s a better chance that other software will understand how to interact with our files on disk. 
The more people using the same filesystem layout, the more potential collaborators and applications for implementing the OCFL specification – safely creating, updating, and serving out content for the repository.
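To give a sense of what that layout looks like, an OCFL object is essentially a directory of immutable version folders plus an inventory that records digests and logical paths for every file. A simplified sketch of a single object (illustrative only, based on the OCFL draft rather than on our own storage) is:

object_root/
├─ 0=ocfl_object_1.0          (Namaste file declaring the object and spec version)
├─ inventory.json             (ids, version history, digests, logical file paths)
├─ inventory.json.sha512
├─ v1/
│  ├─ inventory.json
│  ├─ inventory.json.sha512
│  └─ content/
│     └─ image.tif
└─ v2/
   └─ content/
      └─ metadata.xml

New versions only add content under a new vN/content directory, so earlier versions are never modified in place.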
Posted on July 24, 2018 | Author: Ben Cail | Categories: BDR

library-brown-edu-879 ---- Brown University Library Digital Technologies

libraryservices-jiscinvolve-org-1117 ---- What is "Plan M"? – Library services
Providing libraries with shared services to save time and money

What is "Plan M"?
By Neil Grindley, 17 December 2019

Plan M is a very wide-ranging discussion that has been going on throughout the second half of 2019 involving many different stakeholders across the library community. The 'M' stands for 'metadata'. In a nutshell, the way that metadata for academic and specialist libraries is created, sold, licensed, shared and re-used in the UK needs a re-think. Plan M is an initiative that is being facilitated by Jisc but is really a conversation between libraries, suppliers and intermediary organisations to streamline the metadata marketplace in the UK so that it is more coherent, transparent, robust and sustainable. The catalyst for this conversation has been a focus on aggregating and sharing library data at a new ambitious scale via the National Bibliographic Knowledgebase (NBK) and through Jisc Library Hub services.

Three new resources are available:
A concise description of Plan M objectives and next steps as of December 2019: a slide deck (7 slides) providing a Plan M summary
A fuller description of Plan M providing more context and definition: a word document (4 pages) – Plan M – Definition and Direction
A synthesis of discussions relating to Plan M during the period May – October 2019: a word document (7 pages) – Plan M – Review of Stakeholder Input Final

We will be in touch with all stakeholders in the New Year to take forward this plan and look forward to working with everybody.
Needless to say, if anyone has comments or queries about Plan M then drop us a line at nbk@jisc.ac.uk. Best wishes and a merry Xmas, The NBK Team.

Tags: NBK | By Neil Grindley, Director of Content & Discovery Services at Jisc, with oversight of Jisc's library and archival discovery services and content solutions for HE and FE.
librecatproject-wordpress-com-7211 ---- Catmandu

May 22, 2019 – Catmandu 1.20
On May 21st, 2019, Nicolas Steenlant (our main developer and guru of Catmandu) released version 1.20 of our Catmandu toolkit with some very interesting new features. The main addition is a brand new way to implement Catmandu Fix-es using the new Catmandu::Path implementation. This coding by Nicolas will make it much easier and more straightforward to implement any kind of fixes in Perl.

In the previous versions of Catmandu there were only two options to create new fixes:
Create a Perl package in the Catmandu::Fix namespace which implements a fix method. This was very easy: update the $data hash you got as first argument, return the updated $data and you were done. The disadvantage was that accessing fields in a deeply nested record was tricky and slow to code.
Create a Perl package in the Catmandu::Fix namespace which implemented emit functions. These were functions that generate Perl code on the fly. Using emit functions it was easier to get fast access to deeply nested data. But creating these Fix packages was pretty complex.

In Catmandu 1.20 there is now support for a third and easy way to create new Fixes using the Catmandu::Fix::Builder and Catmandu::Fix::Path class. Let me give a simple example of a skeleton Fix that does nothing:

package Catmandu::Fix::rot13;

use Catmandu::Sane;
use Moo;
use Catmandu::Util::Path qw(as_path);
use Catmandu::Fix::Has;

with 'Catmandu::Fix::Builder';

has path => (fix_arg => 1);

sub _build_fixer {
    my ($self) = @_;
    sub {
        my $data = $_[0];
        # ..do some magic here ...
        $data;
    }
}

1;

In the code above we start implementing a rot13(path) Fix that should read a string on a JSON path and encrypt it using the ROT13 algorithm. This Fix is only the skeleton which doesn't do anything. What we have is:
We import the as_path method to be able to easily access data on JSON paths.
We import Catmandu::Fix::Has to be able to use has path constructs to read in arguments for our Fix.
We import Catmandu::Fix::Builder to use the new Catmandu 1.20 builder class, which provides a _build_fixer method.

The builder is nothing more than a closure that reads the data, does some action on the data and returns the data. We can use this skeleton builder to implement our ROT13 algorithm. Add these lines instead of the # do some magic part:

# On the path update the string value...
as_path($self->path)->updater(
    if_string => sub {
        my $value = shift;
        $value =~ tr{N-ZA-Mn-za-m}{A-Za-z};
        $value;
    },
)->($data);

The as_path method receives a JSON path string and creates an object which you can use to manipulate data on that path. One can update the values found with the updater method, or read data at that path with the getter method, or create a new path with the creator method. In our example, we update the string found at the JSON path using the if_string condition.
The updaterhas many conditions: if_string needs a closure what should happen when a string is found on the JSON path. if_array_ref needs a closure what should happen when an array is found on the JSON path. if_hash_refneeds a closure what should happen when a hash is found on the JSON path. In our case we are only interested in transforming strings using our rot13(path) fix. The ROT13 algorithm is very easy and only switched the order of some characters. When we execute this fix on some sample data we get this result: $ catmandu -I lib convert Null to YAML --fix 'add_field(demo,hello);rot13v2(demo)' --- demo: uryyb ... In this case the Fix can be written much shorter when we know that every Catmandu::Path method return a closure (hint: look at the ->($data) in the code. The complete Fix can look like: package Catmandu::Fix::rot13; use Catmandu::Sane; use Moo; use Catmandu::Util::Path qw(as_path); use Catmandu::Fix::Has; with 'Catmandu::Fix::Builder'; has path => (fix_arg => 1); sub _build_fixer { my ($self) = @_; # On the path update the string value... as_path($self->path)->updater( if_string => sub { my $value = shift; $value =~ tr{N-ZA-Mn-za-m}{A-Za-z}; $value; }, ); } 1; This is as easy as it can get to manipulate deeply nested data with your own Perl tools. All the code is in Perl, there is no limit on the number of external CPAN packages one can include in these Builder fixes. We can’t wait what Catmandu extensions you will create. Written by hochstenbach Leave a comment Posted in Advanced, Updates Tagged with catmandu, fix language, perl April 8, 2019 LPW 2018: “Contrarian Perl” – Tom Hukins At 09:10, Tom Hukins shares his enthusiasm for Catmandu! Written by hochstenbach Leave a comment Posted in Uncategorized June 22, 2017 Introducing FileStores Catmandu is always our tool of choice when working with structured data. Using the Elasticsearch or MongoDB Catmandu::Store-s it is quite trivial to store and retrieve metadata records. Storing and retrieving a YAML, JSON (and by extension XML, MARC, CSV,…) files can be as easy as the commands below: $ catmandu import YAML to database < input.yml $ catmandu import JSON to database < input.json $ catmandu import MARC to database < marc.data $ catmandu export database to YAML > output.yml A catmandu.yml  configuration file is required with the connection parameters to the database: $ cat catmandu.yml --- store: database: package: ElasticSearch options: client: '1_0::Direct' index_name: catmandu ... Given these tools to import and export and even transform structured data, can this be extended to unstructured data? In institutional repositories like LibreCat we would like to manage metadata records and binary content (for example PDF files related to the metadata).  Catmandu 1.06 introduces the Catmandu::FileStore as an extension to the already existing Catmandu::Store to manage binary content. A Catmandu::FileStore is a Catmandu::Store where each Catmandu::Bag acts as a “container” or a “folder” that can contain zero or more records describing File content. The files records themselves contain pointers to a backend storage implementation capable of serialising and streaming binary files. Out of the box, one Catmandu::FileStore implementation is available Catmandu::Store::File::Simple, or short File::Simple, which stores files in a directory. Some examples. 
To add a file to a FileStore, the stream command needs to be executed: $ catmandu stream /tmp/myfile.pdf to File::Simple --root /data --bag 1234 --id myfile.pdf In the command above: /tmp/myfile.pdf is the file up be uploaded to the File::Store. File::Simple is the name of the File::Store implementation which requires one mandatory parameter, --root /data which is the root directory where all files are stored.  The--bag 1234 is the “container” or “folder” which contains the uploaded files (with a numeric identifier 1234). And the --id myfile.pdf is the identifier for the new created file record. To download the file from the File::Store, the stream command needs to be executed in opposite order: $ catmandu stream File::Simple --root /data --bag 1234 --id myfile.pdf to /tmp/file.pdf or $ catmandu stream File::Simple --root /data --bag 1234 --id myfile.pdf > /tmp/file.pdf On the file system the files are stored in some deep nested structure to be able to spread out the File::Store over many disks: /data `--/000 `--/001 `--/234 `--/myfile.pdf A listing of all “containers” can be retreived by requesting an export of the default (index) bag of the File::Store: $ catmandu export File::Simple --root /data to YAML _id: 1234 ... A listing of all files in the container “1234” can be done by adding the bag name to the export command: $ catmandu export File::Simple --root /data --bag 1234 to YAML _id: myfile.pdf _stream: !!perl/code '{ "DUMMY" }' content_type: application/pdf created: 1498125394 md5: '' modified: 1498125394 size: 883202 ... Each File::Store implementation supports at least the fields presented above: _id: the name of the file _stream: a callback function to retrieve the content of the file (requires an IO::Handle as input) content_type: the MIME-Type of the file created: a timestamp when the file was created modified: a timestamp when the file was last modified size: the byte length of the file md5: optional a MD5 checksum We envision in Catmandu that many implementations of FileStores can be created to be able to store files in GitHub, BagIts, Fedora Commons and more backends. Using the Catmandu::Plugin::SideCar  Catmandu::FileStore-s and Catmandu::Store-s can be combined as one endpoint. Using Catmandu::Store::Multi and Catmandu::Store::File::Multi many different implementations of Stores and FileStores can be combined. This is a short introduction, but I hope you will experiment a bit with the new functionality and provide feedback to our project. Written by hochstenbach Leave a comment Posted in Uncategorized March 24, 2017 Catmandu 1.04 Catmandu 1.04 has been released to with some nice new features. There are some new Fix routines that were asked by our community: error The “error” fix stops immediately the execution of the Fix script and throws an error. Use this to abort the processing of a data stream: $ cat myfix.fix unless exists(id)     error("no id found?!") end $ catmandu convert JSON --fix myfix.fix < data.json valid The “valid” fix condition can be used to validate a record (or part of a record) against a JSONSchema. For instance we can select only the valid records from a stream: $ catmandu convert JSON --fix 'select valid('', JSONSchema, schema:myschema.json)' < data.json Or, create some logging: $ cat myfix.fix unless valid(author, JSONSchema, schema:authors.json) log("errors in the author field") end $ catmandu convert JSON --fix myfix.fix < data.json rename The “rename” fix can be used to recursively change the names of fields in your documents. 
For example, when you have this JSON input: { "foo.bar": "123", "my.name": "Patrick" } you can transform all periods (.) in the key names to underscores with this fix: rename('','\.','_') The first parameter is the fields “rename” should work on (in our case it is an empty string, meaning the complete record). The second and third parameters are the regex search and replace parameters. The result of this fix is: { "foo_bar": "123", "my_name": "Patrick" } The “rename” fix will only work on the keys of JSON paths. For example, given the following path: my.deep.path.x.y.z The keys are: my deep path x y z The second and third argument search and replaces these seperate keys. When you want to change the paths as a whole take a look at the “collapse()” and “expand()” fixes in combination with the “rename” fix: collapse() rename('',"my\.deep","my.very.very.deep") expand() Now the generated path will be: my.very.very.deep.path.x.y.z Of course the example above could be written more simple as “move_field(my.deep,my.very.very.deep)”, but it serves as an example  that powerful renaming is possible. import_from_string This Fix is a generalisation of the “from_json” Fix. It can transform a serialised string field in your data into an array of data. For instance, take the following YAML record: --- foo: '{"name":"patrick"}' ... The field ‘foo’ contains a JSON fragment. You can transform this JSON into real data using the following fix: import_from_string(foo,JSON) Which creates a ‘foo’ array containing the deserialised JSON: --- foo: - name: patrick The “import_from_string” look very much like the “from_json” string, but you can use any Catmandu::Importer. It always created an array of hashes. For instance, given the following YAML record: --- foo: "name;hobby\nnicolas;drawing\npatrick;music" You can transform the CSV fragment in the ‘foo’ field into data by using this fix: import_from_string(foo,CSV,sep_char:";") Which gives as result: --- foo: - hobby: drawing name: nicolas - hobby: music name: patrick ... I the same way it can process MARC, XML, RDF, YAML or any other format supported by Catmandu. export_to_string The fix “export_to_string” is the opposite of “import_from_string” and is the generalisation of the “to_json” fix. Given the YAML from the previous example: --- foo: - hobby: drawing name: nicolas - hobby: music name: patrick ... You can create a CSV fragment in the ‘foo’ field with the following fix: export_to_string(foo,CSV,sep_char:";") Which gives as result: --- foo: "name;hobby\nnicolas;drawing\npatrick;music" search_in_store The fix “search_in_store” is a generalisation of the “lookup_in_store” fix. The latter is used to query the “_id” field in a Catmandu::Store and return the first hit. The former, “search_in_store” can query any field in a store and return all (or a subset) of the results. For instance, given the YAML record: --- foo: "(title:ABC OR author:dave) AND NOT year:2013" ... then the following fix will replace the ‘foo’ field with the result of the query in a Solr index: search_in_store('foo', store:Solr, url: 'http://localhost:8983/solr/catalog') As a result, the document will be updated like: --- foo: start: 0, limit: 0, hits: [...], total: 1000 ... where start: the starting index of the search result limit: the number of result per page hits: an array containing the data from the result page total: the total number of search results Every Catmandu::Solr can have another layout of the result page. 
Look at the documentation of the Catmandu::Solr implementations for the specific details. Thanks for all your support for Catmandu and keep on data converting 🙂 Written by hochstenbach Leave a comment Posted in Uncategorized June 16, 2016 Metadata Analysis at the Command-Line I was last week at the ELAG  2016 conference in Copenhagen and attended the excellent workshop by Christina Harlow  of Cornell University on migrating digital collections metadata to RDF and Fedora4. One of the important steps required to migrate and model data to RDF is understanding what your data is about. Probably old systems need to be converted for which little or no documentation is available. Instead of manually processing large XML or MARC dumps, tools like metadata breakers can be used to find out which fields are available in the legacy system and how they are used. Mark Phillips of the University of North Texas wrote recently in Code4Lib a very inspiring article how this could be done in Python. In this blog post I’ll demonstrate how this can be done using a new Catmandu tool: Catmandu::Breaker. To follow the examples below, you need to have a system with Catmandu installed. The Catmandu::Breaker tools can then be installed with the command: $ sudo cpan Catmandu::Breaker A breaker is a command that transforms data into a line format that can be easily processed with Unix command line tools such as grep, sort, uniq, cut and many more. If you need an introduction into Unix tools for data processing please follow the examples Johan Rolschewski of Berlin State Library and I presented as an ELAG bootcamp. As a simple example lets create a YAML file and demonstrate how this file can be analysed using Catmandu::Breaker: $ cat test.yaml --- name: John colors: - black - yellow - red institution: name: Acme years: - 1949 - 1950 - 1951 - 1952 This example has a combination of simple name/value pairs a list of colors and a deeply nested field. To transform this data into the breaker format execute the command: $ catmandu convert YAML to Breaker < test.yaml 1 colors[] black 1 colors[] yellow 1 colors[] red 1 institution.name Acme 1 institution.years[] 1949 1 institution.years[] 1950 1 institution.years[] 1951 1 institution.years[] 1952 1 name John The breaker format is a tab-delimited output with three columns: An record identifier: read from the _id field in the input data, or a counter when no such field is present. A field name. Nested fields are seperated by dots (.) and list are indicated by the square brackets ([]) A field value When you have a very large JSON or YAML field and need to find all the values of a deeply nested field you could do something like: $ catmandu convert YAML to Breaker < data.yaml | grep "institution.years" Using Catmandu you can do this analysis on input formats such as JSON, YAML, XML, CSV, XLS (Excell). Just replace the YAML by any of these formats and run the breaker command. Catmandu can also connect to OAI-PMH, Z39.50 or databases such as MongoDB, ElasticSearch, Solr or even relational databases such as MySQL, Postgres and Oracle. For instance to get a breaker format for an OAI-PMH repository issue a command like: $ catmandu convert OAI --url http://lib.ugent.be/oai to Breaker If your data is in a database you could issue an SQL query like: $ catmandu convert DBI --dsn 'dbi:Oracle' --query 'SELECT * from TABLE WHERE ...' --user 'user/password' to Breaker Some formats, such as MARC, doesn’t provide a great breaker format. In Catmandu, MARC files are parsed into a list of list. 
Running a breaker on a MARC input you get this: $ catmandu convert MARC to Breaker < t/camel.usmarc | head fol05731351 record[][] LDR fol05731351 record[][] _ fol05731351 record[][] 00755cam 22002414a 4500 fol05731351 record[][] 001 fol05731351 record[][] _ fol05731351 record[][] fol05731351 fol05731351 record[][] 082 fol05731351 record[][] 0 fol05731351 record[][] 0 fol05731351 record[][] a The MARC fields are part of the data, not part of the field name. This can be fixed by adding a special ‘marc’ handler to the breaker command: $ catmandu convert MARC to Breaker --handler marc < t/camel.usmarc | head fol05731351 LDR 00755cam 22002414a 4500 fol05731351 001 fol05731351 fol05731351 003 IMchF fol05731351 005 20000613133448.0 fol05731351 008 000107s2000 nyua 001 0 eng fol05731351 010a 00020737 fol05731351 020a 0471383147 (paper/cd-rom : alk. paper) fol05731351 040a DLC fol05731351 040c DLC fol05731351 040d DLC Now all the MARC subfields are visible in the output. You can use this format to find, for instance, all unique values in a MARC file. Lets try to find all unique 008 values: $ catmandu convert MARC to Breaker --handler marc < camel.usmarc | grep "\t008" | cut -f 3 | sort -u 000107s2000 nyua 001 0 eng 000203s2000 mau 001 0 eng 000315s1999 njua 001 0 eng 000318s1999 cau b 001 0 eng 000318s1999 caua 001 0 eng 000518s2000 mau 001 0 eng 000612s2000 mau 000 0 eng 000612s2000 mau 100 0 eng 000614s2000 mau 000 0 eng 000630s2000 cau 001 0 eng 00801nam 22002778a 4500 Catmandu::Breaker doesn’t only break input data in a easy format for command line processing, it can also do a statistical analysis on the breaker output. First process some data into the breaker format and save the result in a file: $ catmandu convert MARC to Breaker --handler marc < t/camel.usmarc > result.breaker Now, use this file as input for the ‘catmandu breaker’ command: $ catmandu breaker result.breaker | name | count | zeros | zeros% | min | max | mean | median | mode | variance | stdev | uniq | entropy | |------|-------|-------|--------|-----|-----|------|--------|--------|----------|-------|------|---------| | 001 | 10 | 0 | 0.0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 10 | 3.3/3.3 | | 003 | 10 | 0 | 0.0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0.0/3.3 | | 005 | 10 | 0 | 0.0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 10 | 3.3/3.3 | | 008 | 10 | 0 | 0.0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 10 | 3.3/3.3 | | 010a | 10 | 0 | 0.0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 10 | 3.3/3.3 | | 020a | 9 | 1 | 10.0 | 0 | 1 | 0.9 | 1 | 1 | 0.09 | 0.3 | 9 | 3.3/3.3 | | 040a | 10 | 0 | 0.0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0.0/3.3 | | 040c | 10 | 0 | 0.0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0.0/3.3 | | 040d | 5 | 5 | 50.0 | 0 | 1 | 0.5 | 0.5 | [0, 1] | 0.25 | 0.5 | 1 | 1.0/3.3 | | 042a | 10 | 0 | 0.0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0.0/3.3 | | 050a | 10 | 0 | 0.0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0.0/3.3 | | 050b | 10 | 0 | 0.0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 10 | 3.3/3.3 | | 0822 | 10 | 0 | 0.0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0.0/3.3 | | 082a | 10 | 0 | 0.0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 3 | 0.9/3.3 | | 100a | 9 | 1 | 10.0 | 0 | 1 | 0.9 | 1 | 1 | 0.09 | 0.3 | 8 | 3.1/3.3 | | 100d | 1 | 9 | 90.0 | 0 | 1 | 0.1 | 0 | 0 | 0.09 | 0.3 | 1 | 0.5/3.3 | | 100q | 1 | 9 | 90.0 | 0 | 1 | 0.1 | 0 | 0 | 0.09 | 0.3 | 1 | 0.5/3.3 | | 111a | 1 | 9 | 90.0 | 0 | 1 | 0.1 | 0 | 0 | 0.09 | 0.3 | 1 | 0.5/3.3 | | 111c | 1 | 9 | 90.0 | 0 | 1 | 0.1 | 0 | 0 | 0.09 | 0.3 | 1 | 0.5/3.3 | | 111d | 1 | 9 | 90.0 | 0 | 1 | 0.1 | 0 | 0 | 0.09 | 0.3 | 1 | 0.5/3.3 | | 245a | 10 | 0 | 0.0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 9 
| 3.1/3.3 | | 245b | 3 | 7 | 70.0 | 0 | 1 | 0.3 | 0 | 0 | 0.21 | 0.46 | 3 | 1.4/3.3 | | 245c | 9 | 1 | 10.0 | 0 | 1 | 0.9 | 1 | 1 | 0.09 | 0.3 | 8 | 3.1/3.3 | | 250a | 3 | 7 | 70.0 | 0 | 1 | 0.3 | 0 | 0 | 0.21 | 0.46 | 3 | 1.4/3.3 | | 260a | 10 | 0 | 0.0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 6 | 2.3/3.3 | | 260b | 10 | 0 | 0.0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 5 | 2.0/3.3 | | 260c | 10 | 0 | 0.0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 2 | 0.9/3.3 | | 263a | 6 | 4 | 40.0 | 0 | 1 | 0.6 | 1 | 1 | 0.24 | 0.49 | 4 | 2.0/3.3 | | 300a | 10 | 0 | 0.0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 5 | 1.8/3.3 | | 300b | 3 | 7 | 70.0 | 0 | 1 | 0.3 | 0 | 0 | 0.21 | 0.46 | 1 | 0.9/3.3 | | 300c | 4 | 6 | 60.0 | 0 | 1 | 0.4 | 0 | 0 | 0.24 | 0.49 | 4 | 1.8/3.3 | | 300e | 1 | 9 | 90.0 | 0 | 1 | 0.1 | 0 | 0 | 0.09 | 0.3 | 1 | 0.5/3.3 | | 500a | 2 | 8 | 80.0 | 0 | 1 | 0.2 | 0 | 0 | 0.16 | 0.4 | 2 | 0.9/3.3 | | 504a | 1 | 9 | 90.0 | 0 | 1 | 0.1 | 0 | 0 | 0.09 | 0.3 | 1 | 0.5/3.3 | | 630a | 2 | 9 | 90.0 | 0 | 2 | 0.2 | 0 | 0 | 0.36 | 0.6 | 2 | 0.9/3.5 | | 650a | 15 | 0 | 0.0 | 1 | 3 | 1.5 | 1 | 1 | 0.65 | 0.81 | 6 | 1.7/3.9 | | 650v | 1 | 9 | 90.0 | 0 | 1 | 0.1 | 0 | 0 | 0.09 | 0.3 | 1 | 0.5/3.3 | | 700a | 5 | 7 | 70.0 | 0 | 2 | 0.5 | 0 | 0 | 0.65 | 0.81 | 5 | 1.9/3.6 | | LDR | 10 | 0 | 0.0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 10 | 3.3/3.3 As a result you get a table listing the usage of subfields in all the input records. From this output we can learn: The ‘001’ field is available in 10 records (see: count) One record doesn’t contain a ‘020a’ subfield (see: zeros) The ‘650a’ is available in all records at least once at most 3 times (see: min, max) Only 8 out of 10 ‘100a’ subfields have unique values (see: uniq) The last column ‘entropy’ provides a number how interesting the field is for search engines. The higher the entropy, the more uniq content can be found. I hope this tools are of some use in your projects! Written by hochstenbach 8 Comments Posted in Uncategorized May 10, 2016 Catmandu 1.01 Catmandu 1.01 has been released today. There has been some speed improvements processing fixes due to switching from the Data::Util to the Ref::Util package which has better a support on many Perl platforms. For the command line there is now support for preprocessing  Fix scripts. This means, one can read in variables from the command line into a Fix script. For instance, when processing data you might want to keep some provenance data about your data sources in the output. This can be done with the following commands: $ catmandu convert MARC --fix myfixes.fix --var source=Publisher1 --var date=2014-2015 < data.mrc with a myfixes.fix like: add_field(my_source,{{source}}) add_field(my_data,{{date}}) marc_field(245,title) marc_field(022,issn) . . . etc . . Your JSON output will now contain the clean ‘title’ and ‘issn’ fields but also for each record a ‘my_source’ with value ‘Publisher1’ and a ‘my_date’ with value ‘2014-2015’. By using the Text::Hogan compiler full support of the mustache language is available. In this new Catmandu version there have been also some new fix functions you might want to try out, see our Fixes Cheat Sheet for a full overview.   Written by hochstenbach Leave a comment Posted in Updates April 20, 2016 Parallel Processing with Catmandu In this blog post I’ll show a technique to scale out your data processing with Catmandu. All catmandu scripts use a single process, in a single thread. This means that if you need to process 2 times as much data , you need 2 times at much time. 
Running a catmandu convert command with the -v option will show you the speed of a typical conversion: $ catmandu convert -v MARC to JSON --fix heavy_load.fix < input.marc > output.json added 100 (55/sec) added 200 (76/sec) added 300 (87/sec) added 400 (92/sec) added 500 (90/sec) added 600 (94/sec) added 700 (97/sec) added 800 (97/sec) added 900 (96/sec) added 1000 (97/sec) In the example above we process an ‘input.marc’ MARC file into an ‘output.json’ JSON file with some heavy data cleaning in the ‘heavy_load.fix’ Fix script. Using a single process we can reach about 97 records per second. It would take 2.8 hours to process one million records and 28 hours to process ten million records. Can we make this any faster? Every computer you buy nowadays is equipped with multiple processors. Using a single process, only one of these processors is used for calculations. One would get much more ‘bang for the buck’ if all the processors could be used. One technique to do that is called ‘parallel processing’. To check the number of processors available on your Linux machine, look at the file ‘/proc/cpuinfo’: $ cat /proc/cpuinfo | grep processor processor : 0 processor : 1 The example above shows two lines: I have two cores available to do processing on my laptop. In my library we have servers which contain 4, 8, 16 or more processors. This means that if we could do our calculations in a smart way then our processing could be 2, 4, 8 or 16 times as fast (in principle). To check if your computer is using all that calculating power, use the ‘uptime’ command: $ uptime 11:15:21 up 622 days, 1:53, 2 users, load average: 1.23, 1.70, 1.95 In the example above I ran ‘uptime’ on one of our servers with 4 processors. It shows a load average of about 1.23 to 1.95. This means that in the last 15 minutes between 1 and 2 processors were being used and the other two did nothing. If the load average is less than the number of cores (4 in our case) it means: the server is waiting for input. If the load average is equal to the number of cores it means: the server is using all the CPU power available. If the load is bigger than the number of cores, then there is more work available than can be executed by the machine, and some processes need to wait. Now that you know some Unix commands, we can start using the processing power available on your machine. In my examples I’m going to use a Unix tool called ‘GNU parallel’ to run Catmandu scripts on all the processors in my machine in the most efficient way possible. To do this you need to install GNU parallel: sudo yum install parallel The second ingredient we need is a way to cut our input data into many parts. For instance, if we have a 4-processor machine we would like to create 4 equal chunks of data to process in parallel. There are many ways to cut your data into parts. I’ll show you a trick we use at Ghent University Library with the help of a MongoDB installation. First install MongoDB and the MongoDB catmandu plugins (these examples are taken from our CentOS documentation): $ sudo cat > /etc/yum.repos.d/mongodb.repo < part1 $ catmandu export MongoDB --database_name -q '{"part.rand2":1}' > part2 We are going to use these catmandu commands in a Bash script which makes use of GNU parallel to run many conversions simultaneously. #!/bin/bash # file: parallel.sh CPU=$1 if [ "${CPU}" == "" ]; then /usr/bin/parallel -u $0 {} < result.${CPU}.json fi The example script above shows how a conversion process could run on a 2-processor machine.
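Written out in full, such a script could look roughly like the sketch below. This is only an illustration of the idea, not the exact script from the post: the database name ('catalog'), the partition field ('part.rand2') and the Fix script ('heavy_load.fix') are placeholders.

#!/bin/bash
# file: parallel.sh -- minimal sketch; database name, partition field and
# Fix script are placeholders
CPU=$1
if [ "${CPU}" == "" ]; then
    # no argument given: let GNU parallel call this script once per chunk number
    echo -e "0\n1" | /usr/bin/parallel -u $0 {}
else
    # called with a chunk number: export only that chunk from MongoDB and process it
    catmandu export MongoDB --database_name catalog -q "{\"part.rand2\":${CPU}}" | \
      catmandu convert JSON --fix heavy_load.fix to JSON > result.${CPU}.json
fi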
The lines with ‘/usr/bin/parallel’ show how GNU parallel is used to call this script with two arguments ‘0’ and ‘1’ (for the 2-processor example). The lines with ‘catmandu export’ show how chunks of data are read from the database and processed with the ‘heavy_load.fix’ Fix script. If you have a 32-processor machine, you would need to provide parallel with an input that contains the numbers 0 to 31 and change the query to ‘part.rand32’. GNU parallel is a very powerful command. It gives the opportunity to run many processes in parallel and even to spread out the load over many machines if you have a cluster. When all these machines have access to your MongoDB database then all of them can receive chunks of data to be processed. The only task left is to combine all results, which can be as easy as a simple ‘cat’ command: $ cat result.*.json > final_result.json Written by hochstenbach 4 Comments Posted in Advanced Tagged with catmandu, JSON Path, library, Linux, marc, parallel processing, perl February 25, 2016 Catmandu 1.00 After 4 years of programming and 88 minor releases we are finally there: the release of Catmandu 1.00! We have pushed the test coverage of the code to 93.97% and added and cleaned a lot of our documentation. For the new features read our Changes file. A few important changes should be noted. By default Catmandu will read and write valid JSON files. In previous versions the default input format was (new)line delimited JSON records as in: {"record":"1"} {"record":"2"} {"record":"3"} instead of the valid JSON array format: [{"record":"1"},{"record":"2"},{"record":"3"}] The old format can still be used as input but will be read much faster when using the --line_delimited option on the command line. Thus, write: # fast $ catmandu convert JSON --line_delimited 1 < lines.json.txt instead of: # slow $ catmandu convert JSON < lines.json.txt By default Catmandu will export in the valid JSON-array format. If you still need to use the old format, then provide the --line_delimited option on the command line: $ catmandu convert YAML to JSON --line_delimited 1 < data.yaml We thank all contributors for these wonderful four years of open source coding and we wish you all four new hacking years. Our thanks go to: Nicolas Steenlant Christian Pietsch Dave Sherohman Dries Moreels Friedrich Summann Jakob Voss Johann Rolschewski Jonas Smedegaard Jörgen Eriksson Magnus Enger Maria Hedberg Mathias Lösch Najko Jahn Nicolas Franck Patrick Hochstenbach Petra Kohorst Robin Sheat Snorri Briem Upasana Shukla Vitali Peil Deutsche Forschungsgemeinschaft for providing us the travel funds Lund University Library, Ghent University Library and Bielefeld University Library for providing us a very welcoming environment for open source collaboration. Written by hochstenbach Leave a comment Posted in Uncategorized June 19, 2015 Catmandu Chat On Friday June 26 2015 16:00 CEST, we’ll provide a one-hour introduction/demo on processing data with Catmandu. If you are interested, join us on the event page: https://plus.google.com/hangouts/_/event/c6jcknos8egjlthk658m1btha9o More instructions on the exact Google Hangout coordinates for this chat will follow on this web page on Friday June 26 at 15:45.
To enter the chat session, a working version of the Catmandu VirtualBox needs to be running on your system: https://librecatproject.wordpress.com/get-catmandu/ Written by hochstenbach Leave a comment Posted in Events June 3, 2015 Matching authors against VIAF identities At Ghent University Library we enrich catalog records with VIAF identities to enhance the search experience in the catalog. When searching for all the books about ‘Chekov’ we want to match all name variants of this author. Consult VIAF http://viaf.org/viaf/95216565/#Chekhov,_Anton_Pavlovich,_1860-1904 and you will see many of them. Chekhov Čehov Tsjechof Txékhov etc Any of these name variants can be present in the catalog data if authority control is not in place (or not maintained). Searching for any of these names should return results for all the variants. In the past it was a labor-intensive, manual job for catalogers to maintain an authority file. Using results from Linked Data Fragments research by Ruben Verborgh (iMinds) and the Catmandu-RDF tools created by Jakob Voss (GBV) and RDF-LDF by Patrick Hochstenbach, Ghent University started an experiment to automatically enrich authors with VIAF identities. In this blog post we will report on the setup and results of this experiment, which will also be reported at ELAG2015. Context Three ingredients are needed to create a web of data: A scalable way to produce data. The infrastructure to publish data. Clients accessing the data and reusing them in new contexts. On the production side there doesn’t seem to be any problem with libraries creating huge datasets. Any transformation of library data to linked data will quickly generate an enormous number of RDF triples. We see this in the size of publicly available datasets: UGent Academic Bibliography: 12.000.000 triples Libris catalog: 50.000.000 triples Gallica: 72.000.000 triples DBPedia: 500.000.000 triples VIAF: 600.000.000 triples Europeana: 900.000.000 triples The European Library: 3.500.000.000 triples PubChem: 60.000.000.000 triples Also for accessing data, from a consumer's perspective the “easy” part seems to be covered. Instead of thousands of available APIs and many document formats for every dataset, SPARQL and RDF provide the programmer with a single protocol and document model. The claim of the Linked Data Fragments researchers is that on the publication side, reliable queryable access to public Linked Data datasets largely remains problematic due to the low availability percentages of public SPARQL endpoints [Ref]. This is confirmed by a 2013 study by researchers from the Pontificia Universidad Católica in Chile and the National University of Ireland, in which more than half of the public SPARQL endpoints were found to be offline 1.5 days per month. This gives an availability rate of less than 95% [Ref]. The source of this high rate of unavailability can be traced back to the service model of Linked Data, where two extremes exist to publish data (see image below). From: http://www.slideshare.net/RubenVerborgh/dbpedias-triple-pattern-fragments On one side, data dumps (or dereferencing of URLs) can be made available, which requires a simple HTTP server and lots of processing power on the client side. On the other side, an open SPARQL endpoint can be provided, which requires a lot of processing power (hence, hardware investment) on the server side. With SPARQL endpoints, clients can demand the execution of arbitrarily complicated queries.
Furthermore, since each client requests unique, highly specific queries, regular caching mechanisms are ineffective, since they can only be optimized for repeated identical requests. This situation can be compared with providing end users either a database SQL dump or an open database connection on which any possible SQL statement can be executed. To a lesser extent, libraries are well aware of the different modes of operation between running OAI-PMH services and Z39.50/SRU services. The Linked Data Fragments researchers provide a third way, Triple Pattern Fragments, to publish data, which tries to provide the best of both worlds: access to a full dump of the dataset combined with a queryable and cacheable interface. For more information on the scalability of this solution I refer to the report presented at the 5th International USEWOD Workshop. The experiment VIAF doesn’t provide a public SPARQL endpoint, but a complete dump of the data is available at http://viaf.org/viaf/data/. In our experiments we used the VIAF (Virtual International Authority File) dump, which is made available under the ODC Attribution License. From this dump we created a HDT database. HDT provides a very efficient format to compress RDF data while maintaining browse and search functionality. Using command line tools, RDF/XML, Turtle and NTriples can be compressed into a HDT file with an index. This standalone file can be used to query huge datasets without the need for a database. A VIAF conversion to HDT results in a 7 GB file and a 4 GB index. Using the Linked Data Fragments server by Ruben Verborgh, available at https://github.com/LinkedDataFragments/Server.js, this HDT file can be published as a NodeJS application. For a demonstration of this server visit the iMinds experimental setup at: http://data.linkeddatafragments.org/viaf Using Triple Pattern Fragments, a simple REST protocol is available to query this dataset. For instance, it is possible to download the complete dataset using this query: $ curl -H "Accept: text/turtle" http://data.linkeddatafragments.org/viaf If we only want the triples concerning Chekhov (http://viaf.org/viaf/95216565) we can provide a query parameter: $ curl -H "Accept: text/turtle" http://data.linkeddatafragments.org/viaf?subject=http://viaf.org/viaf/95216565 Likewise, using the predicate and object parameters any combination of triples can be requested from the server. $ curl -H "Accept: text/turtle" http://data.linkeddatafragments.org/viaf?object="Chekhov" The memory requirements of this server are small enough to run a copy of the VIAF database on a MacBook Air laptop with 8GB RAM. Using specialised Triple Pattern Fragments clients, SPARQL queries can be executed against this server. For the Catmandu project we created a Perl client RDF::LDF which is integrated into Catmandu-RDF. To request all triples from the endpoint use: $ catmandu convert RDF --url http://data.linkeddatafragments.org/viaf --sparql 'SELECT * {?s ?p ?o}' Or, only those triples that are about “Chekhov”: $ catmandu convert RDF --url http://data.linkeddatafragments.org/viaf --sparql 'SELECT * {?s ?p "Chekhov"}' In the Ghent University experiment a more direct approach was taken to match authors to VIAF. First, a MARC dump from the catalog is streamed into a Perl program using a Catmandu iterator. Then we extract the 100 and 700 fields, which contain $a (name) and $d (date) subfields.
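In Catmandu Fix terms, that extraction step could look roughly like the sketch below. This is only an illustration, not the exact code from the gist mentioned further on; the 'authors' field name is arbitrary.

# collect 100/700 $a and $d as "Name, dates" strings in an 'authors' array
marc_map('100ad', 'authors.$append', join: ' ')
marc_map('700ad', 'authors.$append', join: ' ')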
These two fields are combined in a search query, as if we would search: Chekhov, Anton Pavlovich, 1860-1904 If there is exactly one hit in our local VIAF copy, then the result is reported. A complete script to process MARC files this way is available as a GitHub gist. To run the program against a MARC dump execute the import_viaf.pl command: $ ./import_viaf.pl --type USMARC file.mrc 000000089-2 7001 L $$aEdwards, Everett Eugene,$$d1900- http://viaf.org/viaf/110156902 000000122-8 1001 L $$aClelland, Marjorie Bolton,$$d1912- http://viaf.org/viaf/24253418 000000124-4 7001 L $$aSchein, Edgar H. 000000124-4 7001 L $$aKilbridge, Maurice D.,$$d1920- http://viaf.org/viaf/29125668 000000124-4 7001 L $$aWiseman, Frederick. 000000221-6 1001 L $$aMiller, Wilhelm,$$d1869- http://viaf.org/viaf/104464511 000000256-9 1001 L $$aHazlett, Thomas C.,$$d1928- http://viaf.org/viaf/65541341 [edit: 2017-05-18 an updated version of the code is available as a Git project https://github.com/LibreCat/MARC2RDF ] All the authors in the MARC dump will be exported. If there is exactly one match against VIAF, it will be added to the author field. We ran this command for one night in a single thread against 338.426 authors containing a date and found 135.257 exact matches in VIAF (=40%). In a quite recent follow-up of our experiments, we investigated how LDF clients can be used in a federated setup. When the LDF algorithm combines the triple results from many LDF servers, one SPARQL query can be run over many machines. These results are demonstrated at the iMinds demo site, where a single SPARQL query can be executed over the combined VIAF and DBPedia datasets. A Perl implementation of this federated search is available in the latest version of RDF-LDF at GitHub. We strongly believe in the success of this setup and the scalability of this solution, as demonstrated by Ruben Verborgh at the USEWOD Workshop. Using Linked Data Fragments, a range of solutions is available to publish data on the web. From simple data dumps to a full SPARQL endpoint, any service level can be provided given the available resources. For more than half a year DBPedia has been running an LDF server with 99.9994% availability on an 8 CPU, 15 GB RAM Amazon server, serving 4.5 million requests. Scaling out, services such as the LOD Laundromat clean 650.000 datasets and provide access to them using a single fat LDF server (256 GB RAM). For more information on federated searches with Linked Data Fragments, visit the blog post of Ruben Verborgh at: http://ruben.verborgh.org/blog/2015/06/09/federated-sparql-queries-in-your-browser/ Written by hochstenbach Leave a comment Posted in Advanced Tagged with LDF, Linked Data, marc, perl, RDF, SPARQL, Triple Pattern Fragments, VIAF
librecatproject-wordpress-com-7844 ---- Catmandu Catmandu 1.20 On May 21th 2019, Nicolas Steenlant (our main developer and guru of Catmandu) released version 1.20 of our Catmandu toolkit with some very interesting new features. The main addition is a brand new way how Catmandu Fix-es can be implemented using the new Catmandu::Path implementation. This coding by Nicolas will make it much easier and […] LPW 2018: “Contrarian Perl” – Tom Hukins At 09:10, Tom Hukins shares his enthusiasm for Catmandu! Introducing FileStores Catmandu is always our tool of choice when working with structured data. Using the Elasticsearch or MongoDB Catmandu::Store-s it is quite trivial to store and retrieve metadata records. Storing and retrieving a YAML, JSON (and by extension XML, MARC, CSV,…) files can be as easy as the commands below: $ catmandu import YAML to database […] Catmandu 1.04 Catmandu 1.04 has been released with some nice new features. There are some new Fix routines that were asked by our community: error The “error” fix stops immediately the execution of the Fix script and throws an error. Use this to abort the processing of a data stream: $ cat myfix.fix unless exists(id)     error("no […]
lisletters-fiander-info-7514 ---- Rapid Communications Rapid, but irregular, communications from the frontiers of Library Technology Wednesday, April 20, 2016 Mac OS vs Emacs: Getting on the right (exec) PATH One of the minor annoyances about using Emacs on Mac OS is that the PATH environment variable isn't set properly when you launch Emacs from the GUI (that is, the way we always do it). This is because the Mac OS GUI doesn't really care about the shell as a way to launch things, but if you are using brew, or other packages that install command line tools, you do. Apple has changed the way that the PATH is set over the years, and the old environment.plist method doesn't actually work anymore, for security reasons. For the past few releases, the official way to properly set up the PATH is to use the path_helper utility program. But again, that only really works if your shell profile or rc file is run before you launch Emacs. So, we need to put a bit of code into Emacs' site-start.el file to get things set up for us: (when (file-executable-p "/usr/libexec/path_helper") (let ((path (shell-command-to-string "eval `/usr/libexec/path_helper -s`; echo -n \"$PATH\""))) (setenv "PATH" path) (setq exec-path (append (parse-colon-path path) (list exec-directory))))) This code runs the path_helper utility, saves the output into a string, and then uses the string to set both the PATH environment variable and the Emacs exec-path lisp variable, which Emacs uses to run subprocesses when it doesn't need to launch a shell. If you are using the brew version of Emacs, put this code in /usr/local/share/emacs/site-lisp/site-start.el and restart Emacs. Posted by David J. Fiander at 9:35 am No comments: Tuesday, January 20, 2015 Finding ISBNs in the digits of π For some reason, a blog post from 2010 about searching for ISBNs in the first fifty million digits of π suddenly became popular on the net again at the end of last week (mid-January 2015). The only problem is that Geoff, the author, only looks for ISBN-13s, which all start with the sequence "978". There aren't many occurrences of "978" in even the first fifty million digits of π, so it's not hard to check them all to see if they are the beginning of a potential ISBN, and then find out if that potential ISBN was ever assigned to a book. But he completely ignores all of the ISBN-10s that might be hidden in π. So, since I already have code to validate ISBN checksums and to look up ISBNs in OCLC WorldCat, I decided to check for ISBN-10s myself.
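(A check-digit validator of the kind this post describes is only a few lines of Perl. The sketch below is my own illustration, not the author's actual code: an ISBN-10 is valid when the weighted sum 1*d1 + 2*d2 + ... + 10*d10 is divisible by 11, with a trailing 'X' counting as 10.)

#!/usr/bin/env perl
use strict;
use warnings;

# Return true if the string is a valid ISBN-10.
sub is_valid_isbn10 {
    my ($isbn) = @_;
    $isbn =~ s/[-\s]//g;                      # drop dashes and spaces
    return 0 unless $isbn =~ /^\d{9}[\dXx]$/; # nine digits plus a digit or X
    my @chars = split //, uc $isbn;
    my $sum = 0;
    for my $i (0 .. 9) {
        my $digit = $chars[$i] eq 'X' ? 10 : $chars[$i];
        $sum += ($i + 1) * $digit;            # position-weighted sum
    }
    return $sum % 11 == 0;
}

print is_valid_isbn10('0-13-152414-3') ? "valid\n" : "invalid\n";  # valid
print is_valid_isbn10('0-13-125414-3') ? "valid\n" : "invalid\n";  # invalid (swapped digits)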
I don't have easy access to the first fifty million digits of π, but I did manage to find the first million digits online without too much difficulty. An ISBN-10 is a ten character long string that uniquely identifies a book. An example is "0-13-152414-3". The dashes are optional and exist mostly to make it easier for humans, just like the dashes in a phone number. The first character of an ISBN-10 indicate the language in which the book is published: 0 and 1 are for English, 2 is for French, and so on. The last character of the ISBN is a "check digit", which is supposed to help systems figure out if the ISBN is correct or not. It will catch many common types of errors, like swapping two characters in the ISBN: "0-13-125414-3" is invalid. Here are the first one hundred digits of π: 3.141592653589793238462643383279502884197169399375 105820974944592307816406286208998628034825342117067 To search for "potential (English) ISBN-10s", all one needs to do is search for every 0 or 1 in the first 999,990 digits of π (there is a "1" three digits from the end, but then there aren't enough digits left over to find a full ISBN, so we can stop early) and check to see if the ten digit sequence of characters starting with that 0 or 1 has a valid check digit at the end. The sequence "1415926535", highlighted in red, fails the test, because "5" is not the correct check digit; but the sequence "0781640628" highlighted in green is a potential ISBN. There are approximately 200,000 zeros and ones in the first million digits of π, but "only" 18,273 of them appear at the beginning of a potential ISBN-10. Checking those 18,273 potentials against the WorldCat bibliographic database results in 1,168 valid ISBNs. The first one is at position 3,102: ISBN 0306803844, for the book The evolution of weapons and warfare by Trevor N. Dupuy. The last one is at position 996,919: ISBN 0415597234 for the book Exploring language assessment and testing : language in action by Anthony Green. Here's the full dataset. Posted by David J. Fiander at 5:27 pm 4 comments: Saturday, March 10, 2012 Software Upgrades and The Parable of the Windows A librarian friend of mine recently expressed some surprise at the fact that a library system would spend almost $140,000 to upgrade their ILS software, when the vendor is known to be hostile to its customers and not actually very good with new development on their products. The short answer is that it's easier to upgrade than to think. Especially when an "upgrade" will be seen as easier than a "migration" to a different vendor's system (note: open ILS platforms like Evergreen and Koha may be read as being different vendors for the sake of convenience). In fact, when an ILS vendor discontinues support for a product and tells its customers that they have to migrate to another product if they want to continue to purchase support, it is the rare library that will take this opportunity to re-examine all its options and decide to migrate to a different vendor's product. A simple demonstration of this thinking, on a scale that most of us can imagine, is what happened when my partner and I decided that it was time to replace the windows in our house several years ago. There are a couple of things you need to know about replacing the windows in your house, if you've never done this before: Most normal folks replace the windows in their house over the course of several years, doing two or three windows every year or two. 
If one is replacing the huge bay window in the living room, then that might be the only window that one does that year. Windows are expensive enough that one can't really afford to do them all at once. Windows are fungible. For the most part, one company's windows look exactly like another company's. Unless you're working hard at getting a particular colour of flashing on the outside of the window, nobody looking at your house from the sidewalk would notice that the master bedroom window and the livingroom window were made by different companies. Like any responsible homeowners, we called several local window places, got quotations from three or four of them for the windows we wanted replaced that year, made our decision about which vendor we were going to use for the first round of window replacements, and placed an order. A month or so later, on a day that the weather was going to be good, a crew from the company arrived, knocked big holes in the front of our house to take out the old windows and install the new ones. A couple of years went by, and we decided it was time to do the next couple of windows, so my partner, who was always far more organized about this sort of thing that me, called three or four window companies and asked them to come out to get quotations for the work. At least one of the vendors declined, and another vendor did come out and give us a quote but he was very surprised that we were going through this process again, because normally, once a householder has gone through the process once, they tend to use the same window company for all the windows, even if several years have passed, or if the type of work is very different from the earlier work (such as replacing the living room bay window after a couple of rounds of replacing bedroom windows). In general, once a decision has been made, people tend to stick with that plan. I think it's a matter of, "Well, I made this decision last year, and at the time, this company was good, so they're probably still good," combined, perhaps, with a bit of thinking that changing vendors in mid-stream implies that I didn't make a good decision earlier. And there is, of course, always the thought that it's better to stick with the devil you know that the one you don't. Posted by David J. Fiander at 5:21 pm 3 comments: Sunday, January 02, 2011 Using QR Codes in the Library This started out as a set of internal guidelines for the staff at MPOW, but some friends expressed interest in it, and it seems to have struck a nerve, so I'm posting it here, so it is easier for people to find and to link to. Using QR Codes in the Library QR codes are new to North American, but have been around for a while in Japan, where they originated, and where everybody has a cellphone that can read the codes. They make it simpler to take information from the real world and load it into your phone. As such, they should only be used when the information will be useful for somebody on the go, and shouldn't normally be used if the person accessing the information will probably be on a computer to begin with. Do Use QR Codes: On posters and display projectors to guide users to mobile-friendly websites. To share your contact information on posters, display projectors, or your business card. This makes it simpler for users to add you to their addressbook without having to type it all in. In display cabinets or art exhibits to link to supplementary information about the items on display. 
Don't use QR Codes: to record your contact information in your email signature. Somebody reading your email can easily copy the information from your signature to their addressbook. to share URLs for rich, or full-sized, websites. The only URLs you should be sharing via QR codes for are mobile-friendly sites. When Using QR Codes: Make sure to include a human readable URL, preferably one that's easy to remember, near the QR code for people without QR Code scanners to use. Posted by David J. Fiander at 7:07 pm No comments: Monday, April 06, 2009 A Manifesto for the Library Last week John Blyberg, Kathryn Greenhill, and Cindi Trainor spent some time together thinking about what the library is for and what its future might hold. The result of that deep thinking has now been published on John's blog under the title "The Darien Statements on the Library and Librarians." Opening with the ringing statement that The purpose of the Library is to preserve the integrity of civilization they then provide their own gloss on what this means for individual libraries, and for librarians. There is a lively discussion going on in the comments on John's blog, as well as less thoughtful sniping going on in more "annoying" blogs. I think that this is something that will engender quite a bit of conversation in the month's to come. Posted by David J. Fiander at 6:13 pm No comments: Sunday, April 05, 2009 I'm a Shover and Maker! Since only a few people can be named "Movers and Shakers" by Library Journal, Joshua Neff and Steven Lawson created the "Shovers and Makers" awards "for the rest of us," under the auspices of the not entirely serious Library Society of the World. I'm very pleased to report that I have been named a 2009 Shover and Maker (by myself, as are all the winners). The Shovers and Makers awards are a fun way to share what we've done over the past year or two and they're definitely a lot simpler than writing the annual performance review that HR wants. Think of this as practice for writing the speaker's bio for the conference keynote you dream of being invited to give. Posted by David J. Fiander at 8:22 am No comments: Sunday, January 25, 2009 LITA Tears Down the Walls At ALA Midwinter 2009, Jason Griffey and the LITA folks took advantage of the conference center's wireless network to provide quick and easy access to the Top Tech Trends panel for those of us that couldn't be there in person. The low-bandwidth option was a CoverItLive live-blogging feed of comments from attending that also included photos by Cindi Trainor, and a feed of twitters from attendees. The high-bandwidth option was a live (and recorded) video stream of the event that Jason captured using the webcam built into his laptop. Aside from the LITA planned events, the fact that we could all sit in meant that there were lots of virtual conversations in chat rooms and other forums that sprung up as people joined in from afar. Unfortunately, because my Sunday morning is filled with laundry and other domestic pleasures, I wasn't able to join in on the "live" chatter going on in parallel with the video or livebloggin. Owing to funding constraints and my own priorities, my participation at ALA is limited. I've been to LITA Forum once, and might go again, but I focus more on the OLA other regional events. This virtual option from LITA let me get a peek at what's going on and hear what the "big thinkers" at LITA have to say. 
I hope they can keep it up, and will definitely be talking to local folks about how we might be able to emulate LITA in our own events. Posted by David J. Fiander at 12:34 pm No comments: About Me David J. Fiander I'm a former software developer who's now the web services librarian at a university. The great thing about that job title is that nobody knows what I do. This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 2.5 Canada License. litablog-org-4795 ---- LITA Blog Empowering libraries through technology Jobs in Information Technology: August 25, 2020 New This Week Coordinator of Digital Scholarship and Programs, Marquette University Libraries, Milwaukee WI Digital Scholarship Coordinator, UNC Charlotte, Charlotte, NC Visit the LITA Jobs Site for additional job openings and information on submitting your own job posting. Jobs in Information Technology: August 13, 2020 New This Week Information Systems Manager (PDF), The Community Library Association, Ketchum, ID Children’s Librarian, Buhl Public Library, Buhl, ID Technology Integration Librarian, Drexel University Libraries, Philadelphia, PA Visit the LITA Jobs Site for additional job openings and information on submitting your own job posting. Your Core Community Update Much has been happening behind-the-scenes to prepare for Core’s upcoming launch on September 1st, so we want to update you on the progress we’ve made. At the 2020 ALA Virtual Conference Council meetings, the ALA Council approved the creation of Core, so we’re official! It’s been a difficult summer for everyone given the global situation, but this was a milestone we’re excited to reach. What We’ve Been Doing In May, the Core Transition Committee (the 9 division presidents plus senior staff) formed 11 working groups of members from all 3 divisions to make recommendations about how to proceed with our awards/scholarships, budget/finance, committees, communications, conference programming, continuing education, fundraising/sponsorships, interest groups, member engagement, nominations for 2021 president-elect, publications, and standards. These groups have done an amazing amount of work in a very short time period, and we’re grateful to these members for their commitment and effort. We’re happy to report...
In this free 90-minute presentation, our presenters will share tips that might be helpful to other librarians before they reopen. The presenters will also talk about the evolution of the phased plan from the establishment of a temporary computer lab in the library as COVID-19 began to spread in March 2020, to the current phased approach for gradual reopening. Justin will also offer insight into managed access, technology and services, workflows, messaging,... Jobs in Information Technology: July 29, 2020 New This Week Library Director, Walpole Town Library, Walpole, NH Visit the LITA Jobs Site for additional job openings and information on submitting your own job posting. Core Call for 2021 ALA Annual Program Proposals Submit an ALA 2021 Annual Conference program proposal for ALA’s newest division, Core: Leadership, Infrastructure, Futures, which will begin on September 1, 2020. Proposals are due September 30, 2020, and you don’t need to be a Core member to submit a proposal. Submit your idea using this proposal form. Core welcomes topics of interest to a wide range of library professionals in many different areas, including… 1. Access and Equity Advocacy in areas such as copyright, equity of access, open access, net neutrality, and privacy Preservation Week Equity, diversity, and inclusion, both within the division and the profession, as related to Core’s subject areas 2. Assessment Emphasizing the role of assessment in demonstrating the impacts of libraries or library services Assessment tools, methods, guidelines, standards, and policies and procedures 3. Leadership and Management Developing leaders at every level Best practices for inclusion by using an equity lens to examine leadership... Core Call for Webinar Proposals Submit a webinar proposal for ALA’s newest division, Core: Leadership, Infrastructure, Futures, which will begin on September 1, 2020. Proposals are due September 1, 2020, and you don’t need to be a Core member to submit a proposal. Early submissions are encouraged and will be considered for September and October presentations. Submit your idea using this proposal form. Core webinars reach a wide range of library professionals in many different areas, including… 1. Access and Equity Advocacy in areas such as copyright, equity of access, open access, net neutrality, and privacy Preservation Week Equity, diversity, and inclusion, both within the division and the profession, as related to Core’s subject areas 2. Assessment Emphasizing the role of assessment in demonstrating the impacts of libraries or library services Assessment tools, methods, guidelines, standards, and policies and procedures 3. Leadership Developing leaders at every level Best practices for inclusion by using an equity lens to examine... Core Virtual Forum is excited to announce our 2020 Keynote Speakers! Core Virtual Forum welcomes our 2020 Keynote speakers, Dr. Meredith D. Clark and Sofia Leung! Both speakers embody our theme in leading through their ideas and are catalysts for change to empower our community and move the library profession forward. Dr. Clark is a journalist and Assistant Professor in Media Studies at the University of Virginia. She is Academic Lead for Documenting the Now II, funded by the Andrew W. Mellon Foundation. Dr. Clark develops new scholarship on teaching students about digital archiving and community-based archives from a media studies perspective. She will be a 2020-2021 fellow with Data & Society. 
She is a faculty affiliate at the Center on Digital Culture and Society at the University of Pennsylvania. And, she sits on the advisory boards for Project Information Literacy, and for the Center for Critical Race and Digital Studies at New York University. Clark is an in-demand media consultant... Catch up on the June 2020 Issue of Information Technology and Libraries The June 2020 issue of Information Technology and Libraries (ITAL) was published on June 15. Editor Ken Varnum and LITA President Emily Morton-Owens reflect on the past three months in their Letter from the Editor, A Blank Page, and LITA President’s Message, A Framework for Member Success, respectively. Kevin Ford is the author of this issue’s “Editorial Board Thoughts” column, Seeing through Vocabularies. Rounding out our editorial section, the June “Public Libraries Leading the Way” section offers two items. Chuck McAndrew of the Lebanon (New Hampshire) Public Libraries describes his leadership in the IMLS-funded LibraryVPN project. Melody Friedenthal, of the Worcester (Massachusetts) Public Library talks about how she approached and teaches an Intro to Coding Using Python course. Peer-reviewed Content Virtual Reality as a Tool for Student Orientation in Distance Education Programs: A Study of New Library and Information Science Students Dr. Sandra Valenti, Brady Lund, Ting Wang Virtual reality... Jobs in Information Technology: July 8, 2020 New This Week Dean of Libraries, San Jose State University, San Jose, CA Deputy Library Director, City of Carlsbad, Carlsbad, CA Visit the LITA Jobs Site for additional job openings and information on submitting your own job posting. Jobs in Information Technology: July 2, 2020 New This Week Web Services Librarian, Chester Fritz Library, University of North Dakota, Grand Forks, ND Visit the LITA Jobs Site for additional job openings and information on submitting your own job posting. Jobs in Information Technology: June 26, 2020 New This Week Metadata Librarian, Librarian I or II, University of Northern British Columbia, Prince George, British Columbia, Canada Visit the LITA Jobs Site for additional job openings and information on submitting your own job posting. Jobs in Information Technology: June 19, 2020 New This Week Information Technology Librarian, University of Maryland, Baltimore County, Baltimore, MD Associate University Librarian for Research and Learning, Columbia University Libraries, New York, NY Library Technology/Programmer Analyst III, Virginia Beach Public Library, Virginia Beach, VA Visit the LITA Jobs Site for additional job openings and information on submitting your own job posting. Core Virtual Happy Hour Social ~ June 26 Our Joint Happy Hour social at Midwinter was such a success that next week we’re bringing Happy Hour to you online—and registration is free! We invite members of ALCTS, LITA, and LLAMA to join us on Friday, June 26, 5:00-7:00 pm Central Time for Virtual Happy Hour networking and/or play with your peers in a game of Scattergories. Wear your favorite pop culture T-shirt, bring your best Zoom background, grab a beverage, and meet us online for a great time! Attendees will automatically be entered to win free registration to attend the Core Virtual Forum. Winner must be present to redeem prize. Registration is required. 
Register now at: bit.ly/2NeNprH Michael Carroll Awarded 2020 LITA/Christian Larew Memorial Scholarship Michael Carroll has been selected to receive the 2020 LITA/Christian Larew Memorial Scholarship ($3,000) sponsored by the Library and Information Technology Association (LITA) and Baker & Taylor. This scholarship is for master’s level study, with an emphasis on library technology and/or automation, at a library school program accredited by the American Library Association. Criteria for the Scholarship includes previous academic excellence, evidence of leadership potential, and a commitment to a career in library automation and information technology. The Larew Scholarship Committee was impressed by what Michael has already accomplished and look forward to seeing what he will accomplish after graduation in 2021. Michael has already shown a strong interest in digitization projects. He currently manages a team of students working with digitization. Previously, he has scanned and cataloged many collections. He has also assisted the Presbyterian Historical Society in creating sustainable processes for digitization. Michael has also shown his willingness and ability to work with a wide variety of projects and technologies that span both technical and non-technical including... We are back on Twitter Friday for #LITAchat The fourth in this series of #LITAchats will start on Friday, June 12 from 12-1 Central Standard Time on Twitter. We will be asking you to chat with us about self-care. What are you doing to take care of yourselves during this time? How do you unplug without feeling guilty?  We hope you’ll join us for #LITAchat and chat about self-care techniques and figuring out how to better take care of ourselves during these tough times. We’re looking forward to hearing from you! Join LITA on Twitter Catch up on the last #LITAchat Join us for ALCTS/LITA/LLAMA e-Forum! Please join us for a joint ALCTS/LITA/LLAMA e-forum discussion. It’s free and open to everyone! Registration information is at the end of the message, along with subscription management options for existing listserv members. Continuing to Manage the Impact of COVID-19 on Libraries June 9-10, 2020 Moderated by Alyse Jordan, Steven Pryor, Nicole Lewis and Rebecca Uhl Please join us for an e-forum discussion. It’s free and open to everyone! Registration information is at the end of the message. Each day, discussion begins and ends at: Pacific: 7 a.m. – 3 p.m. Mountain: 8 a.m. – 4 p.m. Central: 9 a.m. – 5 p.m. Eastern: 10 a.m. – 6 p.m. Over the past several months, COVID-19 has significantly impacted libraries and library technical service units and departments, including requiring staff to work remotely and determining what services they can provide. As states begin to reopen, libraries face challenges as they determine... Together Against Racism ALA and Core are committed to dismantling racism and white supremacy. Along with the ALA Executive Board, we endorse the Black Caucus of the American Library Association (BCALA)’s May 28 statement condemning the brutal murder of George Floyd at the hands of Minneapolis Police Department officers. In their statement, BCALA cites Floyd’s death as “the latest in a long line of recent and historical violence against Black people in the United States.” Not only does Core support the sentiments of BCALA, we vow to align our values regarding equity, diversity, and inclusion with those of BCALA and other organizations that represent marginalized communities within ALA. 
We also stand strong with the Asian/Pacific American community, which has been the target of xenophobia and racism in the wake of the outbreak of COVID-19, and support the Asian/Pacific American Librarians Association (APALA) and their statement that, “There is no excuse for discriminatory sentiments and actions towards Asians... We are back on Twitter tomorrow for #LITAchat Are you ready for the next Twitter #LITAchat? Join the discussion on Friday, May 22, from 12-1pm Central Time. We will be asking you to tell us about challenges with working from home. Are there things you can’t do and wish you could? Are there issues with your home setup in general? Anne Pepitone will lead the discussion. We invite you to join us tomorrow to share your experiences and chat with your colleagues. Follow LITA on Twitter Catch up on the last #LITAchat We’re looking forward to hearing from you! -The LITA Membership Development Committee LITA Job Board Analysis Report – Laura Costello (Chair, Assessment & Research) LITA Assessment & Research and Diversity & Inclusion Committees Background & Data This report comes from a joint analysis conducted by LITA’s Assessment & Research and Diversity & Inclusion committees in Fall 2019. The analysis focused on the new and emerging trends in skills in library technology jobs and the types of positions that are currently in demand. It also touches on trends in diversity and inclusion in job postings and best practices for writing job ads that attract a diverse and talented candidate pool.  The committees were provided with a list of 678 job postings from the LITA job board between 2015-2019. Data included the employer information, the position title, the location (city/state) the posting date. Some postings also included a short description. The Assessment & Research Committee augmented the dataset with job description, responsibilities, qualifications, and salary information for a 25% sample of the postings from each year using archival job posting information. Committee members also assigned... Congratulations to Dr. Jian Qin, winner of the 2020 LITA/OCLC Kilgour Research Award Dr. Jian Qin has been selected as the recipient of the 2020 Frederick G. Kilgour Award for Research in Library and Information Technology, sponsored by OCLC and the Library and Information Technology Association (LITA). She is the Professor and Director at the iSchool, Syracuse University.  The Kilgour Award honors research relevant to the development of information technologies, especially work which shows promise of having a positive and substantive impact on any aspect(s) of the publication, storage, retrieval and dissemination of information, or the processes by which information and data are manipulated and managed. It recognizes a body of work probably spanning years, if not the majority of a career. The winner receives $2,000, and a citation. Dr. Qin’s recent research projects include metadata modeling for gravitational wave research data management and big metadata analytics using GenBank metadata records for DNA sequences, both with funding from NSF. She also collaborated with a colleague to develop a Capability Maturity Model... LITA/ALA Survey of Library Response to COVID-19 The Library and Information Technology Association (LITA) and its ALA partners are seeking a new round of feedback about the work of libraries as they respond to the COVID-19 crisis, releasing a survey and requesting feedback by 11:59 p.m. CDT, Monday, May 18, 2020. 
Please complete the survey by clicking on the following link: https://www.surveymonkey.com/r/libraries-respond-to-covid-19-may-2020. LITA and its ALA partners know that libraries across the United States are taking unprecedented steps to answer the needs of their communities, and this survey will help build a better understanding of those efforts. LITA and its ALA partners will use the results to advocate on behalf of libraries at the national level, communicate aggregated results with the public and media, create content and professional development opportunities to address library staff needs, and share some raw, anonymized data elements with state-level staff and library support organizations for their own advocacy needs. Additional information about... #CoreForum2020 is now a Virtual Event! Join your ALA colleagues from across divisions for the 2020 Forum, which is now a virtual event! WHERE: In light of the COVID-19 public health crisis, leadership within LITA, ALCTS, and LLAMA made the decision to move the conference online to create a safe, interactive environment accessible for all. WHAT: The call for proposals has been extended to Friday, June 12, 2020. WHEN: The Forum is scheduled for November 18 and 20, 2020. HOW: Share your ideas and experiences with library projects by submitting a talk for the inaugural event for Core: https://forum.lita.org/call-for-proposals For more information about the LITA, ALCTS, LLAMA (Core) Forum, please visit https://forum.lita.org Jobs in Information Technology: May 6, 2020 New This Week Web Services Librarian, Fairfield University, Fairfield, CT Visit the LITA Jobs Site for additional job openings and information on submitting your own job posting. WFH? Boost your skill set with LITA CE! Reserve your spot and learn new skills to enhance your career with LITA online continuing education offerings. Buying Strategies 101 for Information Technology, Wednesday, May 27, 2020, 1:00-2:30 pm Central Time. Presenter: Michael Rodriguez, Collections Strategist at the University of Connecticut. In this 90-minute webinar, you’ll learn best practices, terminology, and concepts for effectively negotiating contracts for the purchase of information technology (IT) products and services. View details and Register here. Using Images from the Internet in a Webpage: How to Find and Cite, Wednesday, June 3, 2020, 2:00-3:30 pm Central Time. Presenter: Lauren Bryant, Priority Associate Librarian of Ray W. Howard Library. In this 90-minute webinar, you’ll learn practical ways to quickly find and filter Creative Commons-licensed images online, how to hyperlink a citation for a website, how to use Creative Commons images for thumbnails in videos, and how to cite an image in unconventional situations like these. View details and Register here. Troublesome Technology Trends: Bridging the Learning Divide, Wednesday, June 17, 2020, 1:00-2:30 pm... May 5/1 Twitter #LITAchat Last week, Anne Pepitone kicked off the discussion with Zoom Virtual Backgrounds, shared her favorites, and provided tips on how to use them. The next Twitter #LITAchat will be on Friday, May 1, from 12-1pm Central Time, when we’ll talk about apps that help you work from home. What do you use to help with project management, time management, deadlines, or to just stay focused? We invite you to join us tomorrow to share, learn, and chat about it with your colleagues. Follow LITA on Twitter. We’re looking forward to hearing from you!
-The LITA Membership Development Committee Jobs in Information Technology: April 29, 2020 New This Week Two Associate Dean Positions, James Madison University Libraries, Harrisonburg, VA Visit the LITA Jobs Site for additional job openings and information on submitting your own job posting. Data Privacy While Working From Home Today’s guest post is brought to you by our recent presenter, Becky Yoose. Special thanks to Becky for being willing to answer the questions we didn’t have time for during our webinar! Hello everyone from your friendly neighborhood library data privacy consultant! We covered a lot of material earlier this month in “A Crash Course in Protecting Library Data While Working From Home,” co-sponsored by LITA and OIF. We had a number of questions during the webinar, some of which were left unanswered at the end. Below are three questions in particular that we didn’t get to in the webinar. Enjoy! Working from home without a web-based ILS We don’t have a web-based version of our ILS and our County-based IT department says they can’t set up remote desktop (something to do with their firewall)… do you have any recommendations on how to advocate for remote desktop? If I have... Strategies for Surviving a Staffing Crisis Library staff are no strangers to budget and staffing reductions. Most of us have way too much experience doing more with less, covering unfilled positions, and rigging solutions out of the digital equivalent of chewing gum and baling wire, because we can’t afford to buy all the tools we need. In the last two years, my department at Northern Arizona University’s Cline Library operated with roughly half the usual number of staff. In this post, I’ll share a few strategies that helped us get through this challenging time. First, a quick introduction. My department, Content, Discovery & Delivery services, includes the digital services unit (formerly library technology services) as well as collection management (including electronic resources management), acquisitions, cataloging, physical processing, interlibrary loan and document delivery, and course reserves. We are a technology-intensive department, both as users and implementers/supporters of technology. Here are some of the strategies we used to... April 4/24 Twitter #LITAchat A lot has changed since our last Twitter #LITAchat: Core passed, and then COVID-19 happened. We are all navigating new territory in our jobs and life overall. So we wanted to bring you a weekly set of #LITAchats discussing our shared experiences during these strange times. The first in this series of #LITAchats will start on Friday, April 24, from 12-1pm Central Standard Time. We will be asking you to show us your Zoom Virtual Backgrounds! We know that Zoom conferencing has become popular in many workplaces, so we thought: what could be better than showcasing some of the creative backgrounds everyone has been using? If you don’t have a background, no worries; you can share the best backgrounds you have seen from colleagues. Don’t know how to turn on Zoom Virtual Backgrounds? We will cover that too! We hope you’ll join us on Twitter for...
Congratulations to Samantha Grabus, winner of the 2020 LITA/Ex Libris Student Writing Award Samantha Grabus has been selected as the winner of the 2020 Student Writing Award sponsored by Ex Libris Group and the Library and Information Technology Association (LITA) for her paper titled “Evaluating the Impact of the Long S upon 18th-Century Encyclopedia Britannica Automatic Subject Metadata Generation Results.” Grabus is a Research Assistant and PhD student at the Drexel University Metadata Research Center. “This valuable work of original research helps to quantify the scope of a problem that is of interest not only in the field of library and information science, but that also, as Grabus notes in her conclusion, could affect research in fields from the digital humanities to the sciences,” said Julia Bauder, the Chair of this year’s selection committee. When notified she had won, Grabus remarked, “I am thrilled and honored to receive the 2020 LITA/Ex Libris Student Writing Award. I would like to extend my gratitude to the award committee... Jobs in Information Technology: April 15, 2020 New This Week Web and Digital Scholarship Technologies Librarian, Marquette University Libraries, Milwaukee, WI CEO / Library Director, Orange County Library System, Orlando, FL Visit the LITA Jobs Site for additional job openings and information on submitting your own job posting. ALA LITA Emerging Leaders: Inventing a Sustainable Division In January 2020, the latest cohort of Emerging Leaders met at ALA Midwinter to begin their projects. LITA sponsored two Emerging Leaders this year: Kelsey Flynn, Adult Services Specialist at White Oak Library, and Paige Walker, Digital Collections & Preservation Librarian at Boston College. Kelsey and Paige are part of Emerging Leaders Group G, “Inventing a Sustainable Division,” in which they’ve been charged with identifying measures that LITA can take to improve its fiscal and environmental sustainability. As a first step in their assessment, the group distributed a survey to LITA members that will quantify interest in sustainable measures such as virtual conferences and webinars. Want to help? Complete the survey to give feedback that may shape the direction of our chapter. Group G is fortunate to have several other talented library workers on its team: Kristen Cooper, Plant Sciences Librarian at the University of Minnesota; Tonya Ferrell, OER Coordinator at... Latest in LITA eLearning So much has changed since COVID-19. Online learning is in greater demand, and we are working hard to provide you with resources and more professional development opportunities that strengthen the library community. We hope you are well and staying safe. There’s a seat waiting for you. Register today! Digital Inception: Building a digital scholarship/humanities curriculum as a subject librarian Wednesday, April 22, 2020 1:00 – 2:30 p.m. Central Time Presenter: Marcela Isuster, Education and Humanities Librarian, McGill University This presentation will guide attendees in building a digital scholarship curriculum from a subject librarian position. It will explore how to identify opportunities, reach out to faculty, and advertise your services. It will also showcase activities, lesson plans, and free tools for digital publication, data mining, text analysis, and mapping, along with a section on finding training opportunities and strategies to support colleagues and create capacity in your institutions. In this 90-minute webinar, you’ll learn:... Join us this Fall for #CoreForum2020 – Proposal Deadline Extended!
The call for proposals has now been extended to Friday, May 22, 2020. Share your ideas and experiences about library technology, leadership, collections, preservation, assessment, and metadata at the inaugural meeting of Core, a joining of LITA/ALCTS/LLAMA. We welcome your session proposal. For more information about the call for proposals and our theme of exploring ideas and making them reality, visit the 2020 Forum website: https://forum.lita.org Event Details November 19-21, 2020 Baltimore, MD Renaissance Baltimore Harborplace Hotel COVID-19 Planning The 2020 LITA/ALCTS/LLAMA Forum Planning Committee is currently evaluating a contingency plan, should the COVID-19 public health crisis impact the Forum in November. Core Is Approved! We’re thrilled to announce that Core: Leadership, Infrastructure, Futures is moving forward, thanks to our members. The three existing divisions’ members all voted to approve the bylaws change that will unite ALCTS, LITA, and LLAMA to form Core: ALCTS, 91% yes; LITA, 96% yes; LLAMA, 96% yes. The presidents of the three divisions, Jennifer Bowen (ALCTS), Emily Morton-Owens (LITA), and Anne Cooper Moore (LLAMA), shared the following statement: “We first want to thank our members for supporting Core. Their belief in this vision, that we can accomplish more together than we can separately, has inspired us, and we look forward to working with all members to build this new and sustainable ALA division. We also want to thank the Core Steering Committee, and all the members who were part of project teams, town halls and focus groups. We would not have reached this moment without their incredible work.” ALA Executive... Free LITA Webinar: Protect Library Data While Working From Home A Crash Course in Protecting Library Data While Working From Home Presenter: Becky Yoose, Founder / Library Data Privacy Consultant, LDH Consulting Services Thursday, April 9, 2020 1:00 – 2:00 pm Central Time There’s a seat waiting for you… Register for this free LITA webinar today! Libraries across the U.S. rapidly closed their doors to both public and staff in the last two weeks, leaving many staff to work from home. Several library workers might be working from home for the first time in their current positions, while many others were not fully prepared to switch over to remote work in a matter of days, or even hours, before the library closed. In the rush to migrate library workers to remote work and to migrate physical library programs and services to online, data privacy and security sometimes get lost in the mix. Unfamiliar settings, new routines, and increased reliance on vendor... Jobs in Information Technology: March 25, 2020 New This Week Head of Library Technology Services, East Carolina University, Greenville, NC Visit the LITA Jobs Site for additional job openings and information on submitting your own job posting. March 2020 ITAL Issue Now Available The March 2020 issue of Information Technology and Libraries (ITAL) is available now. In this issue, ITAL Editor Ken Varnum shares his support of LITA, ALCTS, and LLAMA merging to form a new ALA division, Core. Our content includes a message from LITA President Emily Morton-Owens. In “A Framework for Member Success,” Morton-Owens discusses the current challenges of LITA as a membership organization and argues that reinvention is the key to survival.
Also in this edition, Laurie Willis discusses the pros and cons of handling major projects in-house versus hiring a vendor in “Tackling Big Projects.” Sheryl Cormicle Knox and Trenton Smiley discuss using digital tactics as a cost-effective way to increase marketing reach in “Google Us!” Featured Articles: “User Experience Methods and Maturity in Academic Libraries,” Scott W. H. Young, Zoe Chao, and Adam Chandler This article presents a mixed-methods study of the methods and maturity of user experience (UX) practice in... Learn How to Build your own Digital Scholarship/Humanities Curriculum with this LITA webinar Are you a subject librarian interested in building digital scholarships? Join us for the upcoming webinar “Digital Inception: Building a digital scholarship/humanities curriculum as a subject librarian,” on Wednesday, April 22, from 1:00 – 2:30 pm CST.  Digital scholarship is gaining momentum in academia. What started as a humanities movement is now present in most disciplines. Introducing digital scholarship to students can benefit them in multiple ways: it helps them interact with new trends in scholarship, appeals to different kinds of learners, helps them develop new and emerging literacies, and gives them the opportunity to be creative. This 90-minute presentation will guide attendees in building a digital scholarship curriculum from a subject librarian position. It will explore how to identify opportunities, reach out to faculty, and advertise your services. It will also showcase activities, lesson plans, and free tools for digital publication, data mining, text analysis, mapping, etc. Finally, the presentation will... Jobs in Information Technology: March 11, 2020 New this week Project Manager for Resource Sharing Initiatives, Harvard University, Cambridge, MA Research Data Services Librarian, University of Kentucky Libraries, Lexington, KY Digital Archivist, Rice University, Fondren Library, Houston, TX Associate Director, Technical Services, Yale University, New Haven, CT Visit the LITA Jobs Site for additional job listings and information on submitting your own job posting. Congratulations to Alison Macrina, winner of the 2020 LITA/Library Hi Tech Award The LITA/Library Hi Tech Awards Committee is pleased to select Alison Macrina as the 2020 recipient of the LITA/Library Hi-Tech Award. Macrina led the Tor Relay Initiative in New Hampshire, is the founder and executive director of the Library Freedom Project, and has written and taught extensively in the areas of digital privacy, surveillance, and user anonymity in the context of libraries and librarianship. In this role, Macrina was instrumental in creating the Library Freedom Institute, which trained its first cohort in 2018 and will train its third cohort in 2020. Macrina has also spoken on digital privacy and the work of the Library Freedom Project across the United States and published Anonymity, the first book in ALA’s Library Futures Series, in 2019. The committee was fortunate to receive several outstanding nominations for the 2020 award. Macrina stood out in this strong pool of candidates for the broad reach and impact... Nominate yourself or someone you know for the next LITA Top Tech Trends panel of speakers LITA is looking for dynamic speakers with knowledge about the top trends in technology and how they intersect with information security and privacy. Library technology is quickly evolving with trends such as VR, cloud computing and AI. 
As library technology continues to impact our profession and those that we serve, security and privacy are quickly becoming top concerns. We hope this panel will provide insight and information about these technology trends for you to discuss within your own organization. If you or someone you know would be a great fit for this exciting panel, please submit your nomination today. Submit your nominations – the deadline is April 17, 2020. The session is planned for Sunday, June 28, 2020, 2:30 – 3:30 pm, at the 2020 ALA Annual Conference in Chicago, IL. A moderator and several panelists will each discuss trends impacting libraries, ideas for use cases, and practical approaches for... Jobs in Information Technology: March 4, 2020 New this week Wilson Distinguished Professorship, University of North Carolina at Chapel Hill, Chapel Hill, NC Coordinator of Library Technical Services, Berea College, Berea, KY UI/UX Designer, University of Rochester Libraries, Rochester, NY Technical Support and Hardware Specialist – 2 Openings, St. Lawrence University, Canton, NY Software Engineer, Library Systems, Stanford Health Care, Palo Alto, CA Visit the LITA Jobs Site for additional job listings and information on submitting your own job posting. Hebah Emara is our 2019-20 LITA/OCLC Spectrum Scholar LITA and OCLC are funding Hebah Emara’s participation in the ALA Spectrum Scholars program as part of their commitment to help diversify the library technology field. Emara is a second-year distance student in the University of Missouri – Columbia School of Information Science and Learning Technologies MLIS program. She is interested in the ways libraries and technology intersect. Her background in IT and love of learning about technology, computers, and programming drew her to working in library technology. Libraries’ ability to bridge the digital divide and their use of technology to provide opportunities to their communities and solve problems are also of particular interest to Emara. Her decision to apply to the Spectrum Scholarship was fueled by a desire to learn from a community of peers and mentors. Emara is currently the co-chair of a Tech UnConference to be held in April 2020 and organized by MentorNJ in collaboration with the... Share your ideas and library projects by submitting a session proposal for the 2020 Forum! 2020 Forum Call for Proposals Submission Deadline: March 30, 2020 November 19-21, 2020 Baltimore, Maryland Renaissance Baltimore Harborplace Hotel Do you have an idea or project that you would like to share? Does your library have a creative or inventive solution to a common problem? Submit a proposal for the 2020 LITA/ALCTS/LLAMA Forum! Submission deadline is March 30th. Our library community is rich in ideas and shared experiences. The 2020 Forum Theme embodies our purpose to share knowledge and gain new insights by exploring ideas through an interactive, hands-on experience. We hope that this Forum can be an inspiration to share, finish, and be a catalyst to implement ideas… together. We invite those who choose to lead through their ideas to submit proposals for sessions or preconference workshops, as well as nominate keynote speakers. This is an opportunity to share your ideas or unfinished work, inciting collaboration and advancing the library profession... Early-bird Registration for the Exchange Ends in Three Days! The March 1 early-bird registration deadline for the Exchange is approaching. Register today and save!
There’s still time to register for the Exchange at a discount, with early-bird registration rates at $199 for ALCTS, LITA, and LLAMA members; $255 for ALA individual members; $289 for non-members; $79 for student/retired members; $475 for groups; and $795 for institutions. Early-bird registration ends March 1. Taking place May 4, 6, and 8, the Exchange will engage a wide range of presenters and participants, facilitating enriching conversations and learning opportunities in a three-day, fully online, virtual forum. Programming includes keynote presentations from Emily Drabinski and Rebekkah Smith Aldrich, and sessions focusing on leadership and change management, continuity and sustainability, and collaborations and cooperative endeavors. In addition to these sessions, the Exchange will offer lightning rounds and virtual poster sessions. For up-to-date details on sessions, be sure to check the Exchange website as new information... Jobs in Information Technology: February 26, 2020 New this week Back End Drupal Web Developer, Multnomah County Library, Portland, OR Distance Education & Outreach Librarian, Winona State University, Winona, MN Senior Systems Specialist, PrairieCat, Library Consortium, Coal Valley, IL Training and Outreach Coordinator, PrairieCat, Library Consortium, Coal Valley, IL Visit the LITA Jobs Site for additional job listings and information on submitting your own job posting. Deadline Extended to March 15 – Submit a Proposal to Teach for LITA The deadline to submit LITA education proposals has been extended to March 15th. We’re seeking instructors passionate about library technology topics to share their expertise and teach a webinar, webinar series, or online course for LITA this year. Instructors receive a $500 honorarium for an online course or $150 for a webinar, split among instructors. Check out our list of current and past course offerings to see what topics have been covered recently. Be part of another slate of compelling and useful online education programs this year! Submit your LITA education proposal today! For questions or comments related to teaching for LITA, contact us at lita@ala.org or (312) 280-4268. The 2020 Census Starts in Two Weeks — Are Your Computers Ready? Post courtesy of Gavin Baker, ALA Office of Public Policy and Advocacy, Deputy Director, Public Policy and Government Relations On March 12, millions of American households will begin receiving mailings inviting them to respond to the 2020 Census. To get an accurate count, everyone has to respond – if they don’t, our libraries and communities will lose needed funding. As the mailings arrive, patrons may come to your library with questions – and, with a new option to respond online, to complete the questionnaire using the library’s computers or internet. To help you prepare, ALA has a new, two-page tip sheet, “Libraries and the 2020 Census: Responding to the Census,” that provides key dates, options for responding, and advice for libraries preparing for the 2020 Census. For instance, the tip sheet explains these important facts: Ways to Respond: Households can respond to the Census online, by phone, or by mail... News Regarding the Future of LITA after the Core Vote Dear LITA members, We’re writing about the implications of LITA’s budget for the upcoming 2020-21 fiscal year, which starts September 1, 2020. We have reviewed the budget and affirmed that LITA will need to disband if the Core vote does not succeed. 
Since the Great Recession, membership in professional organizations has been declining consistently. LITA has followed the same pattern and as a result, has been running at a deficit for a number of years. Each year, LITA spends more on staff, events, equipment, software, and supplies than it takes in through memberships and event registrations. We were previously able to close our budgets through the use of our net asset balance which is, in effect, like a nest egg for the division. Of course, that could not continue indefinitely. Our path towards sustainability has culminated in the proposal to form Core: Leadership, Infrastructure, Futures. The new division would come with... Boards of ALCTS, LITA and LLAMA put Core on March 2020 ballot The Boards of the Association for Library Collections & Technical Services (ALCTS), Library Information Technology Association (LITA) and the Library Leadership & Management Association (LLAMA) have all voted unanimously to send to members their recommendation that the divisions form a new division, Core: Leadership, Infrastructure, Futures.  ALCTS, LITA and LLAMA will vote on the recommendation during the upcoming American Library Association (ALA) election. If approved by all three memberships, and the ALA Council, the three long-time divisions will end operations on August 31, 2020, and merge into Core on September 1. Members of the three Boards emphasized that Core will continue to support the groups in which members currently find their professional homes while also creating new opportunities to work across traditional division lines. It is also envisioned that Core would strengthen member engagement efforts and provide new career-support services. If one or more of the division memberships do not... Jobs in Information Technology: February 19, 2020 New this week Librarian (Emphasis in User Experience and Technology), Chabot College, Hayward, CA Librarian II (ILS Admin & Tech Services), Duluth Public Library, Duluth, MN Distance Education & Outreach Librarian, Winona State University, Winona, MN Head, Digital Initiatives – Tisch Library, Tufts University, Medford, MA Online Learning and User Experience Librarian, Ast or Asc Professor, SIU Edwardsville, Edwardsville, IL Discovery and Systems Librarian, Hamilton College, Clinton, NY Visit the LITA Jobs Site for additional job listings and information on submitting your own job posting. Early-bird registration ends March 1st for the Exchange With stimulating programming, including discussion forums and virtual poster sessions, the Exchange will engage a wide range of presenters and participants, facilitating enriching conversations and learning opportunities in a three-day, fully online, virtual forum. Programming includes keynote presentations from Emily Drabinski and Rebekkah Smith Aldrich, and sessions focusing on leadership and change management, continuity and sustainability, and collaborations and cooperative endeavors. The Exchange will take place May 4, 6, and 8. In addition to these sessions, the Exchange will offer lightning rounds and virtual poster sessions. For up-to-date details on sessions, be sure to check the Exchange website as new information is being added regularly. Early-bird registration rates are $199 for ALCTS, LITA, and LLAMA members, $255 for ALA individual members, $289 for non-members, $79 for student members, $475 for groups, and $795 for institutions. Early-bird registration ends March 1. Want to register your group or institution? Groups watching the... 
Jobs in Information Technology: February 13, 2020 New this week Upper School Librarian (PDF), St. Christopher’s School, Richmond, VA Diversity and Engagement Librarian, Ast or Asc Professor, SIU Edwardsville, Edwardsville, IL Repository Services Manager, Washington University, Saint Louis, MO Information Technology Librarian, Albin O. Kuhn Library & Gallery (UMBC), Baltimore, MD Visit the LITA Jobs Site for additional job listings and information on submitting your own job posting. LITA Blog Call for Contributors We’re looking for new contributors for the LITA Blog! Do you have just a single idea for a post or a series of posts? No problem! We’re always looking for guest contributors with new ideas. Do you have thoughts and ideas about technology in libraries that you’d like to share with LITA members? Apply to be a regular contributor! If you’re a member of LITA, consider either becoming a regular contributor for the next year or submitting a post or two as a guest. Apply today! Learn the latest in Library UX with this LITA Webinar There’s a seat waiting for you… Register for this LITA webinar today! How to Talk About Library UX – Redux Presenter: Michael Schofield Librarian / Director of Engineering, WhereBy.Us Wednesday, March 11, 2020 12:00 – 1:00 pm Central Time The last time we did this webinar was in 2016 – and a lot’s changed. The goal then was to help establish some practical benchmarks for how to think about the user experience and UX design in libraries, which suffered from a lack of useful vocabulary and concepts: while we might be able to evangelize the importance of UX, LibUXers struggled with translating their championship into the kinds of bureaucratic goals that unlocked real budget for our initiatives. It’s one thing to say, “the patron experience is critical!” It’s another thing to say, “the experience is critical – so pay for OptimalWorkshop, or hire a UX Librarian, or give me a... Joint Working Group on eBooks and Digital Content in Libraries John Klima, the LITA Representative to the Working Group on eBooks and Digital Content, recently agreed to an interview about the latest update from ALA Midwinter 2020. Watch the blog for more updates from John about the Working Group in the coming months! What is the mission and purpose of the Working Group on eBooks and Digital Content? Quoting from the minutes of the ALA Executive Board Fall meeting in October of 2019: [The purpose of this working group is] to address library concerns with publishers and content providers specifically to develop a variety of digital content license models that will allow libraries to provide content more effectively, allowing options to choose between one-at-a-time, metered, and other options to be made at point of sale; to make all content available in print and for which digital variants have been created to make the digital content equally available to libraries without... 2020 Forum Call for Proposals LITA, ALCTS and LLAMA are now accepting proposals for the 2020 Forum, November 19-21 at the Renaissance Baltimore Harborplace Hotel in Baltimore, MD. Intention and Serendipity: Exploration of Ideas through Purposeful and Chance Connections Submission Deadline: March 30, 2020 Our library community is rich in ideas and shared experiences. The 2020 Forum Theme embodies our purpose to share knowledge and gain new insights by exploring ideas through an interactive, hands-on experience. We hope that this Forum can be an inspiration to share, finish, and be a catalyst to implement ideas…together. 
We invite those who choose to lead through their ideas to submit proposals for sessions or preconference workshops, as well as nominate keynote speakers. This is an opportunity to share your ideas or unfinished work, inciting collaboration and advancing the library profession through meaningful dialogue. We encourage diversity in presenters from a wide range of backgrounds, libraries, and experiences. We deliberately... LITA announces the 2020 Excellence in Children’s and Young Adult Science Fiction Notable Lists The LITA Committee Recognizing Excellence in Children’s and Young Adult Science Fiction presents the 2020 Excellence in Children’s and Young Adult Science Fiction Notable Lists. The lists are composed of notable children’s and young adult science fiction published between November 2018 and October 2019 and organized into three age-appropriate categories. The annotated lists will be posted on the website at www.sfnotables.org. The Golden Duck Notable Picture Books List is selected from books intended for pre-school children and very early readers, up to 6 years old. Recognition is given to the author and the illustrator: Field Trip to the Moon by John Hare (Margaret Ferguson Books); Hello by Aiko Ikegami (Creston Books); How to be on the Moon by Viviane Schwarz (Candlewick Press); Out There by Tom Sullivan (Balzer + Bray); The Babysitter From Another Planet by Stephen Savage (Neal Porter Books); The Space Walk by Brian Biggs (Dial Books for Young... Jobs in Information Technology: February 5, 2020 New this week (Tenure-Track) Senior Assistant Librarian, Sonoma State University, Rohnert Park, CA Data Services Librarian for the Sciences, Harvard University, Cambridge, MA Visit the LITA Jobs Site for additional job listings and information on submitting your own job posting. Teach for LITA: Submit Proposals by February 16 Reminder: The deadline to submit LITA education proposals is February 16th. Please share our CFP with your colleagues. We are seeking instructors passionate about library technology topics to share their expertise and teach a webinar, webinar series, or online course for LITA this year. All topics related to the intersection of technology and libraries are welcomed, including: Machine Learning; IT Project Management; Data Visualization; Javascript, including jquery, json, d3.js; Library-related APIs; Change management in technology; Big Data, High Performance Computing; Python, R, GitHub, OpenRefine, and other programming/coding topics in a library context; Supporting Digital Scholarship/Humanities; Virtual and Augmented Reality; Linked Data; Implementation or Participation in Open Source Technologies or Communities; Open Educational Resources, Creating and Providing Access to Open Ebooks and Other Educational Materials; Managing Technology Training; Diversity/Inclusion and Technology; Accessibility Issues and Library Technology; Technology in Special Libraries; Ethics of Library Technology (e.g., Privacy Concerns, Social Justice Implications); Library/Learning Management... Jobs in Information Technology: January 29, 2020 New this week STEM, Instruction, and Assessment Librarian, McDaniel College, Westminster, MD Data Science/Analysis Research Librarian, Hamilton College, Clinton, NY Electronic Resources Librarian, Brown University, Providence, RI Systems Librarian, Brown University, Providence, RI Head, Technical Services, Brown University, Providence, RI Network and Systems Administrator, St.
Lawrence University, Canton, NY Visit the LITA Jobs Site for additional job listings and information on submitting your own job posting. Emily Drabinski, Rebekkah Smith Aldrich to deliver keynotes at the Exchange Virtual Forum The Association for Library Collections and Technical Services (ALCTS), the Library Information Technology Association (LITA) and the Library Leadership and Management Association (LLAMA) have announced that Emily Drabinski and Rebekkah Smith Aldrich will deliver keynote addresses at the Exchange Virtual Forum. The theme for the Exchange is “Building the Future Together,” and it will take place on the afternoons of May 4, 6 and 8. Each day has a different focus, with day 1 exploring leadership and change management; day 2 examining continuity and sustainability; and day 3 focusing on collaborations and cooperative endeavors. Drabinski’s keynote will be on May 4, and Smith Aldrich’s will be on May 8.  Emily Drabinski is the Critical Pedagogy Librarian at Mina Rees Library, Graduate Center, City University of New York (CUNY). She is also the liaison to the School of Labor and Urban Studies and other CUNY masters and doctoral programs. Drabinski’s research includes... Jobs in Information Technology: January 22, 2020 New this week Information Technology and Web Services (ITWS) Department Head, Auraria Library, Denver, CO Visit the LITA Jobs Site for additional job listings and information on submitting your own job posting. Advice for the New Systems Librarian – Building Relationships 2.0 Advice for the New Systems Librarian – Building Relationships, Part 2 Previous articles in this series: Building Relationships, Helpful Resources, A Day in the Life I am at the two-year mark of being in my role as systems librarian at Jacksonville University, and I continue to love what I do. I am working on larger-scale projects and continuing to learn new things every week. There has not been a challenge or new skill to learn yet that I have been afraid of. My first post in this series highlighted groups and departments that may be helpful in learning your new role. Now that I’m a little more seasoned, I have had the opportunity to work with even more departments and individuals at my institution on various projects. Some of these departments may be unique to me, but I would imagine you would find counterparts where you work. The Academic Technology... Jobs in Information Technology: January 15, 2020 New this week Performing and Visual Arts Librarian, Butler University, Indianapolis, IN Librarian, The College of Lake County, Grayslake, IL User Experience (UX) Librarian, UNC Charlotte, J. Murrey Atkins Library, Charlotte, NC Southeast Asia Digital Librarian, Cornell University, Ithaca, NY Head of Digital Infrastructure Services at UConn Library, University of Connecticut, Storrs, CT Visit the LITA Jobs Site for additional job listings and information on submitting your own job posting. LITA Education Call for Proposals for 2020 What library technology topics are you passionate about? Have something you can help others learn? LITA invites you to share your expertise with an international audience! Our courses and webinars are based on topics of interest to library technology workers and technology managers at all levels in all types of libraries. Taught by experts, they reach beyond physical conferences to bring high quality continuing education to the library world. 
We deliberately seek and strongly encourage submissions from underrepresented groups, such as women, people of color, the LGBTQA+ community, and people with disabilities. Submit a proposal by February 16th to teach a webinar, webinar series, or online course for Winter/Spring/Summer/Fall 2020. All topics related to the intersection of technology and libraries are welcomed, including: Machine Learning; IT Project Management; Data Visualization; Javascript, including jquery, json, d3.js; Library-related APIs; Change management in technology; Big Data, High Performance Computing; Python, R, GitHub, OpenRefine,... Jobs in Information Technology: January 8, 2020 New this week Web Services & Discovery Manager, American University Library, Washington, DC Senior Research Librarian, Finnegan, Washington, DC Electronic Resources and Discovery Librarian, Auburn University, AL Discovery & Systems Librarian, California State University, Dominguez Hills, Carson, CA Visit the LITA Jobs Site for additional job listings and information on submitting your own job posting. UX “don’ts” we still need from Erika Hall The second edition of Erika Hall’s Just Enough Research dropped in October 2019; although this excellent volume was previously unknown to me, I am taking the opportunity now to consume, embody, and evangelize Hall’s approach to user research. Or, as Hall might put it, I’m a willing convert to the gospel of “Enoughening”. Hall is a seasoned design consultant and co-founder of Mule Design Studio, but her commercial approach is tempered by a no-nonsense attitude that makes her solutions and suggestions palatable to a small UX team such as my own at Indiana University Bloomington Libraries. Rather than conduct a formulaic book review of Just Enough Research, I want to highlight some specific things Hall tells the reader not to do in their UX research. This list of five “don’ts” summarizes Hall’s tone, style, and approach. It will also highlight the thesis of the second edition’s brand new chapter on surveys.... Jobs in Information Technology: December 18, 2019 New this week Vice Provost and University Librarian, University of Oregon, Eugene, OR Data Migration Specialist (Telecommuting position), Bywater Solutions, Remote Position Research Librarian, Oak Ridge National Laboratory, Oak Ridge, TN Visit the LITA Jobs Site for additional job listings and information on submitting your own job posting. Announcing the new LITA eLearning Coordinator We are proud to announce that Kira Litvin will be the new LITA eLearning Coordinator. Litvin has been the Continuing Education Coordinator at the Colorado School of Public Health for the past six months. She provides distance/online learning library services and instruction and works regularly with other librarians, instructional designers, faculty, and educators to collaborate on instructional delivery projects. “I am passionate about being a librarian and working with people in an online environment! For the past nine years I have worked with libraries that are exclusively online. My roles include administering and managing electronic library systems, including Springshare products, and providing virtual reference and instruction to students, faculty and staff. More recently I have transitioned to working as an eLearning Instructional Designer, which means I design and develop instructional content available for asynchronous learning and professional development. As online learning continues to grow, I believe that libraries need to...
Submit a Nomination for 2020 Awards and Scholarships Hugh C. Atkinson Memorial Award The award honors the life and accomplishments of Hugh C. Atkinson by soliciting nominations and recognizing the outstanding accomplishments of an academic librarian who has worked in the areas of library automation or library management and has made contributions (including risk-taking) toward the improvement of library services or to library development or research. Nomination deadline: January 9, 2020 The winner receives a cash award and a plaque. Learn more about the requirements for the Atkinson Memorial Award. Ex Libris Student Writing Award The LITA/Ex Libris Student Writing Award is given for the best unpublished manuscript on a topic in the area of libraries and information technology written by a student or students enrolled in an ALA-accredited library and information studies graduate program. Application deadline: February 28, 2020 The winner receives a $1,000 cash prize and a plaque. Learn more about the requirements for the Ex Libris Student... Submit a Nomination for the Hugh C. Atkinson Memorial Award LITA, ACRL, ALCTS, and LLAMA invite nominations for the 2020 Hugh C. Atkinson Memorial Award. Please submit your nominations by January 9, 2020. The award honors the life and accomplishments of Hugh C. Atkinson by recognizing the outstanding accomplishments of an academic librarian who has worked in the areas of library automation or library management and has made contributions (including risk-taking) toward the improvement of library services or to library development or research. Winners receive a cash award and a plaque. This award is funded by an endowment created by divisional, individual, and vendor contributions given in memory of Hugh C. Atkinson. The nominee must be a librarian employed in one of the following during the year prior to application for this award: a university, college, or community college library; or a non-profit consortium, or a consortium comprised of non-profits, that provides resources/services/support to academic libraries. The nominee must have a minimum... Core Update – 12/12/2019 Greetings again from the Steering Committee of Core: Leadership, Infrastructure, Futures, a proposed division of ALA. Coming up this Friday, December 13, is the last of four town halls we are holding this fall to share information and elicit your input. Please join us! Register for Town Hall 4 today. ALCTS, LITA, and LLAMA division staff will lead this town hall with a focus on Core’s mission, vision, and values; organizational benefits; benefits to members; and opportunities in the future. Our speakers will be Jenny Levine (LITA Executive Director), Julie Reese (ALCTS Deputy Executive Director), and Kerry Ward (LLAMA Executive Director and interim ALCTS Executive Director). We’re excited to share an updated Core proposal document for ALA member feedback and review, strengthened by your input. We invite further comments on this updated proposal through Sunday, December 15. Meanwhile, division staff will incorporate your comments and finalize this proposal document for... Jobs in Information Technology: December 11, 2019 New this week Senior Specialist – Makerspace, Middle Tennessee State University, Walker Library, Murfreesboro, TN User Experience Librarian, Auburn University, Auburn University, AL Visit the LITA Jobs Site for additional job listings and information on submitting your own job posting.
Announcing the new LITA Blog Editor We are proud to announce that Jessica Gilbert Redman will be the new editor of the LITA Blog.  Gilbert Redman has been the web services librarian at the University of North Dakota for the past three years. She coordinates and writes for the library blog and maintains the library website. She has completed a post-graduate certificate in user experience and always seeks to ensure that end users are able to easily find the information they need to complete their research. Additionally, she realizes communication is the key component in any relationship, be it between libraries and their users or between colleagues, and she always strives to make communication easier for all involved. “I am excited to become more involved in LITA, and I think the position of LITA Blog Editor is an excellent way to meet more people within LITA and ALA, and to maintain a finger on the pulse of new... Jobs in Information Technology: December 4, 2019 New this week Digital Discovery Librarian/Assistant Librarian, Miami University, Oxford, OH Visit the LITA Jobs Site for additional job listings and information on submitting your own job posting. Jobs in Information Technology: November 27, 2019 New this week Web and Digital Scholarship Technologies Librarian, Marquette University Libraries, Milwaukee, WI Digital Access and Metadata Librarian, Marquette University Libraries, Milwaukee, WI Librarian (San Ramon Campus), Contra Costa Community College District, San Ramon, CA Visit the LITA Jobs Site for additional job listings and information on submitting your own job posting. Support LITA Scholarships this #GivingTuesday It’s almost #GivingTuesday, so we’re highlighting the difference that LITA scholarships can make, and inviting you to join us in increasing access to LITA events by donating to our scholarship fund today. You can help us to provide more scholarships to events like AvramCamp and LITA Forum, as well as sponsor Emerging Leaders, with your donation today! Your donation of $25 could open up untold opportunities for other library technology professionals. “The LITA scholarship afforded me the opportunity to present at the 2019 AvramCamp and ALA conference. It was an incredible opportunity to network with dozens of information professionals, build connections with people in the field, ask them all of my questions and exchange our technical acumen and job experiences. As a result, I have been offered two interviewing opportunities that were an incredibly valuable experience for my career development. I am very grateful to LITA for the opportunity to... Jobs in Information Technology: November 22, 2019 New This Week Metadata Specialist III, Metadata Services, The New York Public Library, New York, NY eResources Librarian, University of Maryland, Baltimore County, Baltimore, MD Multiple Librarian Positions, George Washington University, Washington DC INFORMATION TECHNOLOGY ANALYST, San Mateo County Libraries, San Mateo County, CA Visit the LITA Jobs Site for additional job listings and information on submitting your own job posting. Call for Blog Coordinator for the Exchange: An ALCTS/LITA/LLAMA Collaboration The Exchange: An ALCTS/LITA/LLAMA Collaboration brings together experiences, ideas, expertise, and individuals from the three ALA divisions. Broadly organized around the theme of “Building the Future Together,” the Exchange will examine the topic in relation to collections, leadership, technology, innovation, sustainability, and collaborations. 
Participants from diverse areas of librarianship will find the three days of presentations, panels, and lightning rounds both thought-provoking and highly relevant to their current and future career paths. The Exchange will engage a wide range of presenters and participants, facilitating enriching conversations and learning opportunities. Divisional members and non-members alike are encouraged to register and bring their questions, experiences, and perspectives to the events. As part of the conference experience, the Exchange plans to host regular blog posts in advance of the conference. Blog posts will serve multiple purposes: generate excitement and interest in content, encourage participation outside of simply watching presentations, and provide an avenue... The Exchange Call for Proposals and Informational Webinar ALCTS, LITA, and LLAMA are now accepting proposals for the Exchange: Building the Future Together, a virtual forum scheduled for May 4, 6, and 8, 2020. The twelve hour virtual event will take place over three afternoons, featuring the following themes and topics: Day 1 – Leadership and Change Management Day 2 – Continuity and Sustainability Day 3 – Collaborations and Cooperative Endeavors Session Formats The Exchange will feature the following session formats: Full-session Proposals Presenters prepare content for a 35-minute session, with an additional 10-minute Q&A period for all presenters. Full-session proposals may include multiple presentations with content that is topically related. Lightning Round Each participant is given five minutes to give a presentation. At the end of the lightning round, there will be a 10-15-minute Q&A period for all presenters in the session. Topics for lightning rounds related to innovative projects or research are encouraged. Proposals will be... Registration is Now Open for the Exchange In May 2020, join ALCTS, LITA, and LLAMA for an exciting and engaging virtual forum. Registration is now open!   The Exchange: An ALCTS/LITA/LLAMA Collaboration brings together experiences, ideas, expertise, and individuals from the three ALA divisions. Broadly organized around the theme of “Building the Future Together,” the Exchange will examine the topic in relation to collections, leadership, technology, innovation, sustainability, and collaborations. Participants from diverse areas of librarianship will find the three days of presentations, panels, and lightning rounds both thought-provoking and highly relevant to their current and future career paths. The Exchange will engage a wide range of presenters and participants, facilitating enriching conversations and learning opportunities. Divisional members and non-members alike are encouraged to register and bring their questions, experiences, and perspectives to the events. “Building on the rich educational traditions of the three divisions, the Exchange provides the opportunity to break down silos and explore synergies... Core Call for Comment Greetings again from the Steering Committee of Core: Leadership, Infrastructure, Futures, a proposed division of ALA. The Steering Committee welcomes comments on the draft division proposal documentation through November 25th. Please join the conversation! Your perspectives and input are shaping the identity and priorities of the proposed division. We’re asking for you to respond to the documents with key questions in mind, including: Does this make sense to someone new to ALCTS/ LITA/ LLAMA? 
Does this piece of the plan reflect how members want the new division to function? Are there any points that are cause for concern? If you’re interested in helping us in the review process or other work ahead, please consider volunteering for Core. We’re eager to collaborate with you! We’re working hard to ensure everyone can participate in the Core conversation, so please let us know what could make Core a compelling and worthy division home for you. Keep the feedback and input coming! Full details for all our upcoming events are... LIS Students: Apply for the 2020 Larew Scholarship for Tuition Help The Library and Information Technology Association (LITA) and Baker & Taylor are accepting applications for the LITA/Christian (Chris) Larew Memorial Scholarship for those who plan to follow a career in library and information technology, demonstrate potential leadership, and hold a strong commitment to library automation. The winner will receive a $3,000 check and a citation. The application form is open through March 1, 2020. Criteria for the Scholarship include previous academic excellence, evidence of leadership potential, and a commitment to a career in library automation and information technology. Candidates should illustrate their qualifications for the scholarship with a statement indicating the nature of their library experience, letters of reference, and a personal statement of their view of what they can bring to the profession. Winners must have been accepted to a Master of Library Science (MLS) program recognized by the American Library Association. References, transcripts, and other documents must be postmarked no... Jobs in Information Technology: November 13, 2019 New This Week Full Time Faculty – Non Tenure Track, SJSU School of Information, San Jose, CA Digital Collections Librarian, Union College, Schenectady, NY Web Services Librarian, University of Oregon Libraries, Eugene, OR GALILEO Programmer/Analyst, University of Georgia Libraries, Athens, GA Visit the LITA Jobs Site for additional job listings and information on submitting your own job posting. LITA Opens Call for Innovative LIS Student Writing Award for 2020 The Library and Information Technology Association (LITA), a division of the American Library Association (ALA), is pleased to offer an award for the best unpublished manuscript submitted by a student or students enrolled in an ALA-accredited graduate program. Sponsored by LITA and Ex Libris, the award consists of $1,000, publication in LITA’s refereed journal, Information Technology and Libraries (ITAL), and a certificate. The deadline for submission of the manuscript is February 28, 2020. The award recognizes superior student writing and is intended to enhance the professional development of students. The manuscript can be written on any aspect of libraries and information technology. Examples include, but are not limited to, digital libraries, metadata, authorization and authentication, electronic journals and electronic publishing, open source software, distributed systems and networks, computer security, intellectual property rights, technical standards, desktop applications, online catalogs and bibliographic systems, universal access to technology, and library consortia. To be eligible, applicants must follow these guidelines and fill out the application form (PDF)....
Jobs in Information Technology: November 6, 2019 New This Week Open Educational Resources Production Manager, Oregon State University – Ecampus, Corvallis, OR User Experience Librarian, Northwestern University, Evanston, IL Institute for Clinical and Translational Research (ICTR) Librarian, University of Maryland, Baltimore, Baltimore, MD Director of Collections & Access, Wheaton College, Norton, MA Visit the LITA Jobs Site for additional job listings and information on submitting your own job posting. Nominate a Colleague Doing Cutting Edge Work in Tech Education for the LITA Library Hi Tech Award Nominations are open for the 2020 LITA/Library Hi Tech Award, which is given each year to an individual or institution for outstanding achievement in educating the profession about cutting edge technology within the field of library and information technology. Sponsored by the Library and Information Technology Association (LITA) and Library Hi Tech, the award includes a citation of merit and a $1,000 stipend provided by Emerald Publishing, publishers of Library Hi Tech. The deadline for nominations is December 31, 2019. The award, given to either a living individual or an institution, may recognize a single seminal work or a body of work created during or continuing into the five years immediately preceding the award year. The body of work need not be limited to published texts but can include course plans or actual courses and/or non-print publications such as visual media. Awards are intended to recognize living persons rather than to honor the deceased; therefore,... Propose a Topic for the ITAL “Public Libraries Leading the Way” Column Information Technology and Libraries (ITAL), the quarterly open-access journal published by ALA’s Library and Information Technology Association, is looking for contributors for its regular “Public Libraries Leading the Way” column. This column highlights a technology-based innovation or approach to problem solving from a public library perspective. Topics we are interested in include the following, but proposals on any other technology topic are welcome: 3-D printing and makerspaces; civic technology; drones; diversity, equity, and inclusion and technology; privacy and cyber-security; virtual and augmented reality; artificial intelligence; big data; the internet of things; robotics; geographic information systems and mapping; library analytics and data-driven services; and anything else related to public libraries and innovations in technology. To propose a topic, use this brief form, which will ask you for three pieces of information: your name; your email address; and a brief (75-150 word) summary of your proposed column that describes your library, the technology you wish to... ALCTS, LITA and LLAMA collaborate for virtual forum The Association for Library Collections & Technical Services (ALCTS), the Library and Information Technology Association (LITA) and the Library Leadership & Management Association (LLAMA) have collaborated to create The Exchange, an interactive, virtual forum designed to bring together experiences, ideas, expertise and individuals from these American Library Association (ALA) divisions. Modeled after the 2017 ALCTS Exchange, the Exchange will be held May 4, May 6 and May 8 in 2020 with the theme “Building the Future Together.” As a fully online interactive forum, the Exchange will give participants the opportunity to share the latest research, trends and developments in collections, leadership, technology, innovation, sustainability and collaborations.
Participants from diverse areas of librarianship will find the three days of presentations, panels and activities both thought-provoking and highly relevant to their current and future career paths. The Exchange will engage an array of presenters and participants, facilitating enriching conversations and learning opportunities.... Submit your 2020 Annual Meeting Request by Feb 7 The LITA meeting request form is now open for the 2020 ALA Annual Conference in Chicago, IL. All LITA committee and interest group chairs should use it to let us know if you plan to meet at Annual. We’re looking forward to seeing what you have planned. The deadline to submit your meeting request is Friday, February 7, 2020. We’re going to change how we’ve listed meetings in the past. If you do NOT submit this form, your group will NOT be included in the list of LITA session on our website, the online scheduler, or the print program. While we’ll still hold the Joint Chairs meeting on Saturday from 8:30-10:00am and use that same room for committee and IG meetings from 10:30-11:30am, your group will only be listed if you submit this form. You should also use it if you want to request a meeting on a different day... Submit a Nomination for the Prestigious Kilgour Technology Research Award LITA and OCLC invite nominations for the 2020 Frederick G. Kilgour Award for Research in Library and Information Technology. Submit your nomination no later than December 31, 2019. The Kilgour Research Award recognizes research relevant to the development of information technologies, in particular research showing promise of having a positive and substantive impact on any aspect of the publication, storage, retrieval, and dissemination of information or how information and data are manipulated and managed. The winner receives $2,000 cash, an award citation, and an expense-paid trip (airfare and two nights lodging) to the 2020 ALA Annual Conference in Chicago, IL. Nominations will be accepted from any member of the American Library Association. Nominating letters must address how the research is relevant to libraries; is creative in its design or methodology; builds on existing research or enhances potential for future exploration; and/or solves an important current problem in the delivery of... Core Update – October 23, 2019 Greetings again from the Steering Committee of Core: Leadership, Infrastructure, Futures, a proposed division of ALA. Thank you for all of your questions and feedback about the proposed new division! The Steering Committee has been revising Core documents based on what we’ve heard from you so far in order to share draft bylaws and other information with you soon. We want you to know that we are continuing to listen and incorporate the feedback you’re providing via Town Halls, Twitter Chats, the Core feedback form, and more.  In our next Steering Committee meeting, we will be discussing how we can support the operational involvement of interested volunteers. If you have ideas on how members should be involved, please share them with us through the feedback form.  We’re working hard to ensure everyone can participate in the Core conversation, so please let us know what could make Core a compelling and worthy division home for... 
Jobs in Information Technology: October 23, 2019 New This Week Metadata & Research Support Specialist, Open Society Research Services, Open Society Foundations, New York, NY Head of Public Services in The Daniel Library, The Citadel, The Military College of South Carolina, Charleston, SC Engineering and Science Liaison, MIT, Cambridge, MA Head of Technical Services – Library, The Citadel, The Military College of South Carolina, Charleston, SC Analyst Programmer 3, Oregon State University Libraries and Press, Corvallis, OR Collection Information Specialist, Isabella Stewart Gardner Museum, Boston, MA Visit the LITA Jobs Site for additional job listings and information on submitting your own job posting. Jobs in Information Technology: October 16, 2019 New This Week Metadata Librarian for Distinctive Collections, MIT, Cambridge, MA Electronic Access Librarian, University of Rochester, Rochester, NY Dean, University Libraries, University of Northern Colorado, Greeley, CO Administrative/Metadata Specialist, ASR International Corp., Monterey, CA Core Systems Librarian, University of Oregon Libraries, Eugene, OR Visit the LITA Jobs Site for additional job listings and information on submitting your own job posting. September 2019 ITAL Issue Now Available The September 2019 issue of Information Technology and Libraries (ITAL) is available now. In this issue, ITAL Editor Ken Varnum announces six new members of the ITAL Editorial Board. Our content includes a recap of Emily Morton-Owens’ President’s Inaugural Message, “Sustaining LITA“, discussing the many ways LITA strives to provide a sustainable member organization. In this edition of our “Public Libraries Leading the Way” series, Thomas Lamanna discusses ways libraries can utilize their current resources and provide ideas on how to maximize effectiveness and roll new technologies into operations in “On Educating Patrons on Privacy and Maximizing Library Resources.“ Featured Articles: “Library-Authored Web Content and the Need for Content Strategy,” Courtney McDonald and Heidi Burkhardt Increasingly sophisticated content management systems (CMS) allow librarians to publish content via the web and within the private domain of institutional learning management systems. “Libraries as publishers”may bring to mind roles in scholarly communication and... Jobs in Information Technology: October 9, 2019 New This Week Information Research Specialist, Harvard Business School, Boston, MA 2020-2021 Library Residency Program (Provost’s Postdoctoral Fellowship), New York University, Division of Libraries, New York, NY Executive Director, Library Connection, Inc, Windsor, CT Associate University Librarian, Cornell University, Ithaca, NY Visit the LITA Jobs Site for additional job listings and information on submitting your own job posting. New vacancy listings are posted on Wednesday afternoons. Latest LITA Learnings There’s a seat waiting for you… Register today for a LITA webinar! Guiding Students through Digital Citizenship Presenter: Casey Davis Instructional Designer (IT), Arizona State University Wednesday, October 16, 2019 12:00 – 1:30 pm Central Time As academic librarians, we help build our students into digital citizens. It’s our duty to make sure students have the tools and resources to be savvy tech users, become information literate, and understand the permanence of their digital actions. 
In this 90-minute webinar, you’ll learn research-based best practices you can implement using the framework of the hero’s journey without creating an additional burden on faculty, staff, and students. Learning objectives for this program include: • An expanded understanding of digital citizenship within the context of college/university life • Examining areas where increased awareness and practice is needed within the college/university community • Creating authentic training for increasing digital citizenship within the college/university community View details and Register here. In-House vs.... litablog-org-6683 ---- LITA Blog – Empowering libraries through technology LITA Blog Empowering libraries through technology Toggle navigation About Regular Contributors Get Involved! Join LITA LITA Jobs Jobs in Information Technology: August 25, 2020 August 25, 2020August 28, 2020| Jenny Levine New This Week Coordinator of Digital Scholarship and Programs, Marquette University Libraries, Milwaukee WI Digital Scholarship Coordinator, UNC Charlotte, Charlotte, NC Visit the LITA Jobs Site for additional job openings and information on submitting your own job posting. Continue Reading LITA Jobs Jobs in Information Technology: August 13, 2020 August 13, 2020August 13, 2020| Jenny Levine New This Week Information Systems Manager (PDF), The Community Library Association, Ketchum, ID Children’s Librarian, Buhl Public Library, Buhl, ID Technology Integration Librarian, Drexel University Libraries, Philadelphia, PA Visit the LITA Jobs Site for additional job openings and information on submitting your own job posting. Continue Reading Core Update Your Core Community Update August 4, 2020| Chrishelle Thomas Much has been happening behind-the-scenes to prepare for Core’s upcoming launch on September 1st, so we want to update you on the progress we’ve made. At the 2020 ALA Virtual Conference Council meetings, the ALA Council approved the creation of Core, so we’re official! It’s been a difficult summer for everyone given the global situation, but this was a milestone we’re excited to reach. What We’ve Been Doing In May, the Core Transition Committee (the 9 division presidents plus senior staff) formed 11 working groups of members from all 3 divisions to make recommendations about how to proceed with our awards/scholarships, budget/finance, committees, communications, conference programming, continuing education, fundraising/sponsorships, interest groups, member engagement, nominations for 2021 president-elect, publications, and standards. These groups have done an amazing amount of work in a very short time period, and we’re grateful to these members for their commitment and effort. We’re happy to report… Continue Reading Education Free LITA Webinar ~ Library Tech Response to Covid-19 ~ August 5th July 31, 2020| Chrishelle Thomas Sign up for this free LITA webinar: Library Tech Response to Covid-19 Libraries are taking the necessary precautions to create a safe environment during the pandemic. Social distancing isn’t the only solution, but providing access to loanable technologies, including handling and quarantine of equipment, cleaning, and other safety and health concerns are just some of the measures put in place. With the ongoing disruption to library services caused by COVID-19, what reopening planning policies should be considered for usage? In this free 90-minute presentation, our presenters will share tips that might be helpful to other librarians before they reopen. 
The presenters will also talk about the evolution of the phased plan from the establishment of a temporary computer lab in the library as COVID-19 began to spread in March 2020, to the current phased approach for gradual reopening. Justin will also offer insight into managed access, technology and services, workflows, messaging,… Continue Reading LITA Jobs Jobs in Information Technology: July 29, 2020 July 29, 2020| Jenny Levine New This Week Library Director, Walpole Town Library, Walpole, NH Visit the LITA Jobs Site for additional job openings and information on submitting your own job posting. Continue Reading Program Planning Core Call for 2021 ALA Annual Program Proposals July 24, 2020| Chrishelle Thomas Submit an ALA 2021 Annual Conference program proposal for ALA’s newest division, Core: Leadership, Infrastructure, Futures, which will begin on September 1, 2020. Proposals are due September 30, 2020, and you don’t need to be a Core member to submit a proposal. Submit your idea using this proposal form. Core welcomes topics of interest to a wide range of library professionals in many different areas, including… 1. Access and Equity Advocacy in areas such as copyright, equity of access, open access, net neutrality, and privacy Preservation Week Equity, diversity, and inclusion, both within the division and the profession, as related to Core’s subject areas 2. Assessment Emphasizing the role of assessment in demonstrating the impacts of libraries or library services Assessment tools, methods, guidelines, standards, and policies and procedures 3. Leadership and Management Developing leaders at every level Best practices for inclusion by using an equity lens to examine leadership… Continue Reading Education Core Call for Webinar Proposals July 16, 2020| Chrishelle Thomas Submit a webinar proposal for ALA’s newest division, Core: Leadership, Infrastructure, Futures, which will begin on September 1, 2020. Proposals are due September 1, 2020, and you don’t need to be a Core member to submit a proposal. Early submissions are encouraged and will be considered for September and October presentations. Submit your idea using this proposal form. Core webinars reach a wide range of library professionals in many different areas, including… 1. Access and Equity Advocacy in areas such as copyright, equity of access, open access, net neutrality, and privacy Preservation Week Equity, diversity, and inclusion, both within the division and the profession, as related to Core’s subject areas 2. Assessment Emphasizing the role of assessment in demonstrating the impacts of libraries or library services Assessment tools, methods, guidelines, standards, and policies and procedures 3. Leadership Developing leaders at every level Best practices for inclusion by using an equity lens to examine… Continue Reading Core Virtual Forum Core Virtual Forum is excited to announce our 2020 Keynote Speakers! July 13, 2020July 13, 2020| Chrishelle Thomas Core Virtual Forum welcomes our 2020 Keynote speakers, Dr. Meredith D. Clark and Sofia Leung! Both speakers embody our theme in leading through their ideas and are catalysts for change to empower our community and move the library profession forward. Dr. Clark is a journalist and Assistant Professor in Media Studies at the University of Virginia. She is Academic Lead for Documenting the Now II, funded by the Andrew W. Mellon Foundation. Dr. 
Clark develops new scholarship on teaching students about digital archiving and community-based archives from a media studies perspective. She will be a 2020-2021 fellow with Data & Society. She is a faculty affiliate at the Center on Digital Culture and Society at the University of Pennsylvania. And, she sits on the advisory boards for Project Information Literacy, and for the Center for Critical Race and Digital Studies at New York University. Clark is an in-demand media consultant… Continue Reading ITAL Catch up on the June 2020 Issue of Information Technology and Libraries July 8, 2020| Chrishelle Thomas The June 2020 issue of Information Technology and Libraries (ITAL) was published on June 15. Editor Ken Varnum and LITA President Emily Morton-Owens reflect on the past three months in their Letter from the Editor, A Blank Page, and LITA President’s Message, A Framework for Member Success, respectively. Kevin Ford is the author of this issue’s “Editorial Board Thoughts” column, Seeing through Vocabularies. Rounding out our editorial section, the June “Public Libraries Leading the Way” section offers two items. Chuck McAndrew of the Lebanon (New Hampshire) Public Libraries describes his leadership in the IMLS-funded LibraryVPN project. Melody Friedenthal, of the Worcester (Massachusetts) Public Library talks about how she approached and teaches an Intro to Coding Using Python course. Peer-reviewed Content Virtual Reality as a Tool for Student Orientation in Distance Education Programs: A Study of New Library and Information Science Students Dr. Sandra Valenti, Brady Lund, Ting Wang Virtual reality… Continue Reading LITA Jobs Jobs in Information Technology: July 8, 2020 July 8, 2020July 9, 2020| Jenny Levine New This Week Dean of Libraries, San Jose State University, San Jose, CA Deputy Library Director, City of Carlsbad, Carlsbad, CA Visit the LITA Jobs Site for additional job openings and information on submitting your own job posting. Continue Reading Posts navigation Older posts Upcoming Events Bibliometrics for Librarians Presented by Phillip Doehle and Clarke Lakovakis on July 9, 2020 – July 30, 2020 Virtual Reality, Augmented Reality, Mixed Reality and the Academic Library Presenters: Dr. Plamen Miltenoff and Mark Gill Offered: August 6, 2020 – August 27, 2020 Core Virtual Forum Visit our website for the latest updates on the Core Virtual Forum in Fall 2020. 
literarymachin-es-37 ---- literary machines - digital libraries, books, archives 05 Jul 2020 Archiviiify A short guide to downloading digitized books from the Internet Archive and rehosting them on your own infrastructure using IIIF with full-text search. 31 Jan 2018 pywb 2.0 - docker quickstart Four years have passed since I first wrote about pywb: it was a young tool at the time, but already usable and extremely simple to deploy. Since then a lot of work has been done by Ilya Kreymer (and others), resulting in all the new features available with the 2.0 release. Also, some very big web archiving initiatives have moved to and used pywb over these years: Webrecorder itself, Rhizome, Perma, Arquivo PT in Portugal, the Italian National Library in Florence (Italy), and others I’m missing. 05 Oct 2017 Anonymous webarchiving Web archiving activities, like any other activity where an HTTP client is involved, leave marks of their steps: the web server you are visiting or crawling will save your IP address in its logs (or, even worse, it can decide to ban your IP). This is usually not a problem; there are plenty of good reasons for a web server to keep logs of its visitors. But sometimes you may need to protect your own identity when you are visiting or saving something from a website, and there are a lot of sensitive professions that need this protection: activists, journalists, political dissidents. TOR was invented for this, and today it offers good protection for browsing the web anonymously. Can we also archive the web through TOR? 03 Sep 2016 Open BNI On 30 May 2016 the free release of the Bibliografia Nazionale Italiana (BNI) was announced. The opening of this catalogue is welcome (even with the limitation of PDF-only files), and as a layman in library science I also ask a question about the actual use case of the BNI. On 30 August 2016 the release of the 2015 and 2016 volumes in UNIMARC and MARCXML formats was announced as well. Intrigued by the catalogue, I start exploring it, to think about possible transformations (RDF triples) or enrichments with/towards other data (Wikidata). 03 Mar 2015 Epub linkrot Linkrot also affects epub files (who would have thought! :)). How to check the health of external links in epub books (required tools: a shell, atool, pup, GNU parallel). 26 Feb 2015 SKOS Nuovo Soggettario, API and autocomplete How to build an API for an autocomplete form using the terms of the Nuovo Soggettario, with Redis Sorted Sets and Nginx+Lua (see the sketch below). 23 Nov 2014 Serve deepzoom images from a zip archive with openseadragon vips is a fast image processing system. Versions higher than 7.40 can generate static tiles of big images in DeepZoom format, saving them directly into a zip archive.
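A minimal pyvips sketch of that tiles-into-a-zip step, assuming libvips 7.40 or newer with its Python binding installed; the file names are illustrative, not taken from the post:

```python
# sketch only: generate DeepZoom tiles and write them straight into a
# zip container rather than thousands of loose tile files on disk
import pyvips

image = pyvips.Image.new_from_file("scan.tif", access="sequential")
image.dzsave("scan_tiles", container="zip", suffix=".jpg")
```

How the zipped tiles are then served to OpenSeadragon is what the post itself goes on to cover.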
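Similarly, a minimal redis-py sketch of the Sorted Set prefix lookup behind the autocomplete entry a couple of items above; it illustrates only the Redis side (not the post's Nginx+Lua API), and the sample terms are hypothetical:

```python
# sketch only: load vocabulary terms into a sorted set with score 0,
# then use a lexicographic range query as a prefix search
import redis

r = redis.Redis()
r.zadd("soggettario:terms", {"biblioteca": 0, "biblioteconomia": 0, "bibliografia": 0})

def autocomplete(prefix: str, limit: int = 10):
    lo = b"[" + prefix.encode("utf-8")
    hi = b"[" + prefix.encode("utf-8") + b"\xff"  # 0xff sorts after any valid UTF-8 byte
    return r.zrangebylex("soggettario:terms", lo, hi, start=0, num=limit)

print(autocomplete("bibliot"))  # -> [b'biblioteca', b'biblioteconomia']
```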
23 Oct 2014 a wayback machine (pywb) on a cheap, shared host For a long time the only free implementation of web archival replay software (I’m unaware of commercial ones) has been the Wayback Machine (now OpenWayback). It’s a stable and mature piece of software, with a strong community behind it. To use it you need to be comfortable deploying a Java web application; not so difficult, and the documentation is exhaustive. But there is a new player in the game, pywb, developed by Ilya Kreymer, a former Internet Archive developer. Built in Python, relatively simpler than Wayback, and now used in a professional archiving project at Rhizome. 22 Sep 2014 Opendata dell’Anagrafe Biblioteche How to use the open data of the Anagrafe delle Biblioteche Italiane and plot library addresses on a web map. 05 Sep 2014 api json dell’opac SBN A few months ago ICCU released a mobile app for searching the OPAC SBN. Although not very attractive graphically, the app works well, and I find the ability to search for a book by scanning its barcode with the phone camera, and to bookmark favourites, very useful. Curious about how it works, I decided to analyse its HTTP traffic. literarymachin-es-7470 ---- literary machines - digital libraries, books, archives
lostrses-github-io-4979 ---- AHA! | An Arts & Humanities Adventure You are a researcher in the Classics department. As part of your current research project, you have become interested in the life of a woman called Fabrica Collaborare, who lived in Roman Britain. There’s not much written specifically about Fabrica, but you have seen her name mentioned in several texts from that time. You are not looking forward to the task of having to look at lots more texts to find out where Fabrica - and the Collaborare family - are mentioned. On your way out of the library to get a cup of coffee, you meet your colleague Priya, and tell her about your problem. She tells you about a group at the university who might be able to help. You haven’t heard of the RSE team before: Priya tells you that ‘RSE’ stands for Research Software Engineering, and that their office is in room 20.21. Go to room 20.21 maisonbisson-com-2871 ---- MaisonBisson a bunch of stuff I would have emailed you about Every journalist Ryu Spaeth on the dirty job of journalism: [E]very journalist […] at some point will have to face the morally indefensible way we go about our business: namely, using other people to tell a story about the world. Not everyone dupes their subjects into trusting them, but absolutely everyone robs other people of their stories to tell their own. Every journalist knows this flushed feeling, a mix of triumph and guilt, of securing the story that will redound glory unto them, not the subject. Some subjects who have no outlet, who are voiceless, approve of this arrangement, since they have no other way of getting their story heard. But even they will not wholly recognize their own depiction in the newspaper, by virtue of the fact that it was told by someone else with their own agenda. This is what Jonathan Franzen has called the “inescapable shame of being a storyteller”—that it involves stealing from another person, much in the way some people believe a photograph steals a bit of the sitter’s soul.
Casey Bisson on #journalism, #reporting, #storytelling, 1 Dec 2020The three tribes of the internet Authors Primavera De Filippi, Juan Ortiz Freuler, and Joshua Tan outline three competing narratives that have shaped the internet: libertarian, corporate, and nationalist. “[These narratives] emerged from a community of shared interests; each calls for a set of institutional arrangements; each endures in today’s politics.” » about 400 words Casey Bisson on #Internet, #Hyperspace, #Law, #Governance, #Libertarian, #Corporate, #Nationalist, #Berkman Klein Center, #Harvard Berkman Center, 30 Nov 2020Happy D.B. Cooper Day D.B. Cooper day is celebrated on this day, the Saturday following Thanksgiving, every year. Casey Bisson on #Agent Smith, #Aircraft hijacking, #Aviation accidents and incidents, #D.B. Cooper, #FBI, #Federal Bureau of Investigation, #festival, #Hijackers, #hijacking, #mysteries, #skyjacking, 28 Nov 2020Vitaminwater's #nophoneforayear contest Back in the before times, Vitaminwater invited applicants to a contest to go a full year without a smartphone or tablet. It was partly in response to rising concerns over the effect of all those alerts on our brains. Over 100,000 people clamored for the chance, but author Elana A. Mugdan’s entry stood out with an amusing video, and in February 2019 the company took away her iPhone 5s and handed her a Kyocera flip phone. » about 600 words Casey Bisson on #Vitaminwater, #nophoneforayear, #scrollfreeforayear, #smartphones, #ethical technology, #humane technology, 22 Nov 2020Membership-driven news media From The Membership Guide’s handbook/manifesto: Journalism is facing both a trust crisis and a sustainability crisis. Membership answers to both. It is a social contract between a news organization and its members in which members give their time, money, energy, expertise, and connections to support a cause that they believe in. In exchange, the news organization offers transparency and opportunities to meaningfully contribute to both the sustainability and impact of the organization. Elsewhere it continues: Membership is not subscription by another name, nor a brand campaign that can be toggled on and off. …and: Memberful routines are workflows that connect audience members to journalism and the people producing it. Routines are the basis for a strong membership strategy. Notice that audience members are specified here, which is likely a wider group than your members. Casey Bisson on #membership, #journalism, #monetization, #publishers, #news organizations, #media, 23 Oct 2020Political bias in social media algorithms and media monetization models New reports reveal yet more structural political biases in consumption and monetization models. » about 300 wordsCasey Bisson on #Politics, #Media, #Algorithms, #Monetization, #Bias, #Journalism, #Social media, #News organizations, 22 Oct 2020Media monetization vs. internet advertising Media face structural, regulatory, and technical hurdles to effectively monetizing with ads on the internet, but there are some solutions that are working. 
» about 1000 words Casey Bisson on #advertising, #ads, #media monetization, #monetization models, #media, #journalism, #news organizations, 14 Aug 2020The argument against likes: aim for deeper, more genuine interactions Sweet Pea on the state of social media and dating apps: “We are not creating a healthy society when we’re telling millions of young people that the key to happy relationships is photo worthy of an impulsive right swipe.” » about 800 words Casey Bisson on #likes, #social media, #dating apps, #social software, #signal, 8 Aug 2020Paid reactions: virtual awards and tipping Reddit and Twitch both allow members to pay for the privilege of reacting to other member's content with special awards or stickers. » about 600 words Casey Bisson on #social media, #reactions, #paid reactions, #virtual awards, #tipping, #revenue, #Reddit, #Twitch, 7 Aug 2020Reactions Facebook introduced reactions with an emphasis on both the nuance they enabled and the mobile convenience: “[I]f you are sharing something that is sad [...] it might not feel comfortable to Like that post.” Later: “Commenting might afford nuanced responses, but composing those responses on a [mobile] keypad takes too much time.” » about 800 words Casey Bisson on #reactions, #likes, #social media, #Facebook, #Instagram, 6 Aug 2020“Likes” vs. “Faves” Twitter switched from Faves to Likes in 2015. “You might like a lot of things, but not everything can be your *favorite*,” they explained. Weeks after the change, liking activity for existing users was up 6% and 9% for new users. » about 500 words Casey Bisson on #Likes, #Faves, #social media, #Twitter, #Facebook, #microcopy, 5 Aug 2020Honey cocktails: eau de lavender Liquor.com’s recipe for eau de lavender, from a larger collection of cocktails with honey. They all look and sound delightful, but I can vouch for the eau de lavender. Ingredients 1 1/2 oz Tequila 3/4 oz Fresh lemon juice 3/4 oz Honey syrup1 1 Egg white 1 dash Scrappy’s lavender bitters Garnish: Lavender sprig Steps Add all ingredients into a shaker and dry-shake (without ice). Add ice and shake again to emulsify thoroughly. Strain into a chilled coupe glass. Garnish with a lavender sprig. Honey syrup: Add 1/2 cup honey and 1/2 cup water to a small saucepan over medium heat. (You can experiment and decide how much of a honey flavor you want in your syrup. The more honey you use, the thicker the syrup and stronger in flavor it will be.) Stir until blended. Strain into a jar and seal tightly with a lid. Will keep for 1 month in the refrigerator. ↩︎ Casey Bisson on #cocktails, #mixology, #honey, 12 May 2020Satellite tracking If you’re not reading Skyriddles blog, then you’re not tracking the sky above. And you might have missed the re-discovery of a satellite launched in 1967 and lost for nearly 50 years. As it turns out, there’s a lot of stuff that’s been forgotten up there, and quite a bit that some are trying to hide. The blog is an entertaining view into the world satellites, including communication, spy, weather, research, and the occasional probe going further afield. Casey Bisson on #satellite tracking, #space, 19 Apr 2020I'm missing restaurants now @nakedlunchsf was notable for having both a strong contender for the best burger in the city, _and_... Casey Bisson on #photo, #photoblog, #stayhome, #supportlocalbusiness, 24 Mar 2020When unzip fails on macOS with UTF8 unzip can fail on macOS when UTF-8 chars are in the archive. The solution is to use ditto. 
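A minimal Python sketch of that workaround, wrapping the same ditto invocation the post quotes from a GitHub issue just below; the archive and destination names are the post's own placeholders:

```python
# sketch only: shell out to macOS's ditto, which handles UTF-8 member
# names that the stock unzip rejects
import subprocess

def extract_zip(archive: str, destination: str) -> None:
    subprocess.run(
        ["ditto", "-V", "-x", "-k", "--sequesterRsrc", "--rsrc",
         archive, destination],
        check=True,  # raise CalledProcessError if ditto fails
    )

extract_zip("FILENAME.ZIP", "DESTINATIONDIRECTORY")
```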
Via a Github issue: ditto -V -x -k --sequesterRsrc --rsrc FILENAME.ZIP DESTINATIONDIRECTORY Casey Bisson on #zip, #unzip, #macOS, #utf8, 4 Feb 2020TikTok vs. Instagram Zuckerberg describes TikTok as “almost like the Explore Tab that we have on Instagram,” but Connie Chan suggests he's missing the deeper value of AI, and TechCrunch's Josh Constantine suggests Zuck is missing the bigger difference in intent on TikTok. » about 400 words Casey Bisson on #TikTok, #Instagram, #social media, #social software, #social networks, #social signals, #artificial intelligence, #AI, 3 Jan 2020Swipegram template Benjamin Lee’s instructions and downloadable template to make panoramic carousel Instagrams (AKA #swipegram), as illustrated via his animation above. » about 100 words Casey Bisson on #instagram, #template, #swipegram, 29 Dec 2019“It is clear that the books owned the shop... “It is clear that the books owned the shop rather than the other way about. Everywhere they... Casey Bisson on #photo, #photoblog, #lovemaine, #portlandmaine, #mustbevancouver, #penderstreet, #downtownvancouver, 1 Dec 2019“Life is like riding a bicycle... “Life is like riding a bicycle. To keep your balance, you must keep moving.” —wisdom by Albert... Casey Bisson on #photo, #photoblog, #forahappymoment, #voreskbh, #visitcopenhagen, #buyfilmnotmegapixels, #ig_denmark, #fujipro400h, #ishootfilm, #travelog, #filmisnotdead, #visitdenmark, #mytinyatlas, #pro400h, #fuji, #believeinfilm, #københavn, #analoguepeople, #instapassport, #staybrokeshootfilm, #hasselblad, #igerscopenhagen, #flashesofdelight, #exploringtheglobe, 8 Nov 2019Notes about Spotify creator features Spotify often gets bashed by top creators. The service pays just $0.00397 per stream, but with 108 million users listening to an average of 25 hours per month, those streams can add up for creators who can get the listener’s attention. Spotify verifies artists who then get additional benefits on the platform. Some artists find success the traditional route, some optimize their work for the system, others work the system…and some really work it. Relevance to other network/aggregation platforms: tiny payments add up, and given a platform, creators will find a way to get and maximize value from it. The critical component is customers. Casey Bisson on #Spotify, #creators, #social networks, #revenue, #aggregation, 3 Nov 2019ExifTool examples I use for encoding analog camera details I’m a stickler for detail and love to add exif metadata for my film cameras to my scanned images. These are my notes to self about the data I use most often. I only wish exif had fields to record the film details too. » about 400 wordsCasey Bisson on #exiftool, #photography, #exif, #metadata, 3 Nov 2019Random notes on Instagram Delete your old photos, rebrand your page, and delete it entirely are all common advice. Plus some tools and traps to be aware of. » about 600 words Casey Bisson on #Instagram, #social media, #photography, 17 Oct 2019Every media has its tastemakers and influencers Every media, network, or platform has would-be influencers or promoters who can help connect consumers with creators. Don’t mistake the value of these tastemakers, and be sure to find a place for them to create new value for your platform. 
» about 400 wordsCasey Bisson on #Spotify, #Instagram, #social media, #social networks, #influencers, #tastemakers, 15 Oct 2019Storehouse: the most wonderful story sharing flop ever Storehouse shuttered in summer 2016, just a couple years after they launched, but the app and website introduced or made beautiful a few features that remain interesting now. » about 400 wordsCasey Bisson on #Storehouse, #photo sharing, #story sharing, #microblogging, #blogging, #social media, #user-generated content, #ugc, 13 Oct 2019Page 1 of 112 Older Posts →MaisonBisson managemetadata-com-4642 ---- Metadata Matters Metadata Matters It's all about the services It’s not just me that’s getting old Having just celebrated (?) another birthday at the tail end of 2015, the topics of age and change have been even more on my mind than usual. And then two events converged. First I had a chat with Ted Fons in a hallway at Midwinter, and he asked about using an older article I’d published […] Denying the Non-English Speaking World Not long ago I encountered the analysis of BibFrame published by Rob Sanderson with contributions by a group of well-known librarians. It’s a pretty impressive document–well organized and clearly referenced. But in fact there’s also a significant amount of personal opinion in it, the nature of which is somewhat masked by the references to others […] Review of: DRAFT Principles for Evaluating Metadata Standards Metadata standards is a huge topic and evaluation a difficult task, one I’ve been involved in for quite a while. So I was pretty excited when I saw the link for “DRAFT Principles for Evaluating Metadata Standards”, but after reading it? Not so much. If we’re talking about “principles” in the sense of ‘stating-the-obvious-as-a-first-step’, well, […] The Jane-athons continue! The Jane-athon series is alive, well, and expanding its original vision. I wrote about the first ‘official’ Jane-athon earlier this year, after the first event at Midwinter 2015. Since then the excitement generated at the first one has spawned others: the Ag-athon in the UK in May 2015, sponsored by CILIP the Maurice Dance in […] Separating ideology, politics and utility Those of you who pay attention to politics (no matter where you are) are very likely to be shaking your head over candidates, results or policy. It’s a never ending source of frustration and/or entertainment here in the U.S., and I’ve noticed that the commentators seem to be focusing in on issues of ideology and […] Semantic Versioning and Vocabularies A decade ago, when the Open Metadata Registry (OMR) was just being developed as the NSDL Registry, the vocabulary world was a very different place than it is today. At that point we were tightly focussed on SKOS (not fully cooked at that point, but Jon was on the WG that was developing it, so […] Five Star Vocabulary Use Most of us in the library and cultural heritage communities interested in metadata are well aware of Tim Berners-Lee’s five star ratings for linked open data (in fact, some of us actually have the mug). The five star rating for LOD, intended to encourage us to follow five basic rules for linked data is useful, […] What do we mean when we talk about ‘meaning’? Over the past weekend I participated in a Twitter conversation on the topic of meaning, data, transformation and packaging. The conversation is too long to repost here, but looking from July 11-12 for @metadata_maven should pick most of it up. 
Aside from my usual frustration at the message limitations in Twitter, there seemed to be […] Fresh From ALA, What’s New? In the old days, when I was on MARBI as liaison for AALL, I used to write a fairly detailed report, and after that wrote it up for my Cornell colleagues. The gist of those reports was to describe what happened, and if there might be implications to consider from the decisions. I don’t propose […] What’s up with this Jane-athon stuff? The RDA Development Team started talking about developing training for the ‘new’ RDA, with a focus on the vocabularies, in the fall of 2014. We had some notion of what we didn’t want to do: we didn’t want yet another ‘sage on the stage’ event, we wanted to re-purpose the ‘hackathon’ model from a software […] managemetadata-com-643 ---- Metadata Matters | It's all about the services Pagetitle: Metadata Matters It's all about the services Blog About Archives Log in Schnellnavigation: Jump to start of page | Jump to posts | Jump to navigation It’s not just me that’s getting old Having just celebrated (?) another birthday at the tail end of 2015, the topics of age and change have been even more on my mind than usual. And then two events converged. First I had a chat with Ted Fons in a hallway at Midwinter, and he asked about using an older article I’d published with Karen Coyle way back in early 2007 (“Resource Description and Access (RDA): Cataloging Rules for the 20th Century”). The second thing was a message from Research Gate that reported that the article in question was easily the most popular thing I’d ever published. My big worry in terms of having Ted use that article was that RDA had experienced several sea changes in the nine (!) years since the article was published (Jan./Feb. 2007), so I cautioned Ted about using it. Then I decided I needed to reread the article and see whether I had spoken too soon. The historic rationale holds up very well, but it’s important to note that at the time that article was written, the JSC (now the RSC) was foundering, reluctant to make the needed changes to cut ties to AACR2. The quotes from the CC:DA illustrate how deep the frustration was at that time. There was a real turning point looming for RDA, and I’d like to believe that the article pushed a lot of people to be less conservative and more emboldened to look beyond the cataloger tradition. In April of 2007, a mere few months from when this article came out, ALA Publishing arranged for the famous “London Meeting” that changed the course of RDA. Gordon Dunsire and I were at that meeting–in fact it was the first time we met. I didn’t even know much about him aside from his article in the same DLIB issue. As it turns out, the RDA article was elevated to the top spot, thus stealing some of his thunder, so he wasn’t very happy with me. The decision made in London to allow DCMI to participate by building the vocabularies was a game changer, and Gordon and I were named co-chairs of a Task Group to manage that process. So as I re-read the article, I realized that the most important bits at the time are probably mostly of historical interest at this point. I think the most important takeaway is that RDA has come a very long way since 2007, and in some significant ways is now leading the pack in terms of its model and vocabulary management policies (more about that to come). And I still like the title! …even though it’s no longer a true description of the 21st Century RDA. 
By Diane Hillmann, February 9, 2016, 9:19 am (UTC-5) RDA, Uncategorized Post a comment Denying the Non-English Speaking World Not long ago I encountered the analysis of BibFrame published by Rob Sanderson with contributions by a group of well-known librarians. It’s a pretty impressive document–well organized and clearly referenced. But in fact there’s also a significant amount of personal opinion in it, the nature of which is somewhat masked by the references to others holding the same opinion. I have a real concern about some of those points where an assertion of ‘best practices’ are particularly arguable. The one that sticks in my craw particularly shows up in section 2.2.5: 2.2.5 Use Natural Keys in URIs References: [manning], [ldbook], [gld-bp], [cooluris] Although the client must treat URIs as opaque strings, it is good practice to construct URIs in a systematic and human readable fashion for both instances and ontology terms. A natural key is one that appears in the information about the resource, such as some unique identifier for the resource, or the label of the property for ontology terms. While the machine does not care about structure, memorability or readability of URIs, the developers that write the code do. Completely random URIs introduce difficult to detect semantic and algorithmic errors in both publication and consumption of the data. Analysis: The use of natural keys is a strength of BIBFRAME, compared to similarly scoped efforts in similar communities such as the RDA and CIDOC-CRM vocabularies which use completely opaque numbers such as P10001 (hasRespondent) or E33 (Linguistic Entity). RDA further misses the target in this area by going on to define multiple URIs for each term with language tagged labels in the URI, such as rda:hasRespondent.en mapping to P10001. This is a different predicate from the numerical version, and using owl:sameAs to connect the two just makes everyone’s lives more difficult unnecessarily. In general, labels for the predicates and classes should be provided in the ontology document, along with thorough and understandable descriptions in multiple languages, not in the URI structure. This sounds fine so long as you accept the idea that ‘natural’ means English, because, of course, all developers, no matter their first language, must be fluent enough in English to work with English-only standards and applications. This mis-use of ‘natural’ reminds me of other problematic usages, such as the former practice in the adoption community (of which I have been a part for 40 years) where ‘natural’ was routinely used to refer to birth parents, thus relegating adoptive parents to the ‘un-natural’ realm. So in this case, if ‘natural’ means English, are all other languages inherently un-natural in the world of development? The library world has been dominated by the ‘Anglo-American’ notions of standard practice for a very long time, and happily, RDA is leading away from that, both in governance and in development of vocabularies and tools. The Multilingual strategy adopted by RDA is based on the following points: More than a decade of managing vocabularies has convinced us that opaque identifiers are extremely valuable for managing URIs, because they need not be changed as labels change (only as definitions change). The kinds of ‘churn’ we saw in the original version of RDA (2008-2013) convinced us that label-based URIs were a significant problem (and cost) that became worse as the vocabularies grew over time. 
We get the argument that opaque URIs are often difficult for humans to use–but the tools we’re building (the RDA Registry as case in point) are intended to give human developers what they want for their tasks (human readable URIs, in a variety of languages) but ensure that the URIs for properties and values are set up based on what machines need. In this way, changes in the lexical URIs (human-readable) can be maintained properly without costly change in the canonical URIs that travel with the data content itself. The multiple language translations (and distributed translation management by language communities) also enable humans to build discovery and display mechanisms for users that are speakers of a variety of languages. This has been a particularly important value for national libraries outside the US, but also potentially for libraries in the US meeting the needs of non-English language communities closer to home. It’s too easy for the English-first library development community to insist that URIs be readable in English and to turn a blind eye to the degree that this imposes understanding of the English language and Anglo-American library culture on the rest of the world. This is not automatically the intellectual gift that the distributors of that culture assume it to be. It shouldn’t be necessary for non-Anglo-American catalogers to learn and understand Anglo-American language and culture in order to express metadata for a non-Anglo audience. This is the rough equivalent of the Philadelphia cheese steak vendor who put up a sign reading “This is America. When ordering speak in English”. We understand that for English-speaking developers bibframe.org/vocab/title is initially easier to use than rdaregistry.info/Elements/w/P10088 or even (heaven forefend!) “130_0#$a” (in RDF: marc21rdf.info/elements/1XX/M1300_a). That’s why RDA provides rdaregistry.info/Elements/w/titleOfTheWork.en but also, eventually, rdaregistry.info/Elements/w/拥有该作品的标题.ch and rdaregistry.info/Elements/w/tieneTítuloDeLaObra.es, et al (you do understand Latin of course). These ‘unnatural’ Lexical Aliases will be provided by the ‘native’ language speakers of their respective national library communities. As one of the many thousands of librarians who ‘speak’ MARC to one another–despite our language differences–I am loathe to give up that international language to an English-only world. That seems like a step backwards. By Diane Hillmann, January 3, 2016, 5:05 pm (UTC-5) BibFrame, Linked data, RDA, Vocabularies 1 Comment (Show inline) Review of: DRAFT Principles for Evaluating Metadata Standards Metadata standards is a huge topic and evaluation a difficult task, one I’ve been involved in for quite a while. So I was pretty excited when I saw the link for “DRAFT Principles for Evaluating Metadata Standards”, but after reading it? Not so much. If we’re talking about “principles” in the sense of ‘stating-the-obvious-as-a-first-step’, well, okay—but I’m still not very excited. I do note that the earlier version link uses the title ‘draft checklist’, and I certainly think that’s a bit more real than ‘draft principles’ for this effort. But even taken as a draft, the text manages to use lots of terms without defining them—not a good thing in an environment where semantics is so important. Let’s start with a review of the document itself, then maybe I can suggest some alternative paths forward. 
First off, I have a problem with the preamble: “These principles are intended for use by libraries, archives and museum (LAM) communities for the development, maintenance, governance, selection, use and assessment of metadata standards. They apply to metadata structures (field lists, property definitions, etc.), but can also be used with content standards and value vocabularies”. Those tasks (“development, maintenance, governance, selection, use and assessment” are pretty all encompassing, but yet the connection between those tasks and the overall “evaluation” is unclear. And, of course, without definitions, it’s difficult to understand how ‘evaluation’ relates to ‘assessment’ in this context—are they they same thing? Moving on to the second part about what kind of metadata standards that might be evaluated, we have a very general term, ‘metadata structures’, with what look to be examples of such structures (field lists, property definitions, etc.). Some would argue (including me) that a field list is not a structure without a notion of connections between the fields; and although property definitions may be part of a ‘structure’ (as I understand it, at least), they are not a structure, per se. And what is meant by the term ‘content standards’, and how is that different from ‘metadata structures’? The term ’value vocabularies’ goes by many names, and is not something that can go without a definition. I say this as an author/co-author of a lot of papers that use this term, and we always define it within the context of the paper for just that reason. There are many more places in the text where fuzziness in terminology is a problem (maybe not a problem for a checklist, but certainly for principles). Some examples: 1. What is meant by ’network’? There are many different kinds, and if you mean to refer to the Internet, for goodness sakes say so. ‘Things’ rather than ‘strings’ is good, but it will take a while to make it happen in legacy data, which we’ll be dealing with for some time, most likely forever. Prospectively created data is a bit easier, but still not a cakewalk — if the ‘network’ is the global Internet, then “leveraging ‘by-reference’ models” present yet-to-be-solved problems of network latency, caching, provenance, security, persistence, and most importantly: stability. Metadata models for both properties and controlled values are an essential part of LAM systems and simply saying that metadata is “most efficient when connected with the broader network” doesn’t necessarily make it so. 2. ‘Open’ can mean many things. Are we talking specific kinds of licenses, or the lack of a license? What kind of re-use are you talking about? Extension? Wholesale adoption with namespace substitution? How does semantic mapping fit into this? (In lieu of a definition, see the paper at (1) below) 3. This principle seems to imply that “metadata creation” is the sole province of human practitioners and seriously muddies the meaning of the word creation by drawing a distinction between passive system-created metadata and human-created metadata. Metadata is metadata and standards apply regardless. What do you mean by ‘benefit user communities’? Whose communities? Please define what is meant by ‘value’ in this context? How would metadata practitioners ‘dictate the level of description provided based on the situation at hand’? 4. As an evaluative ‘principle’ this seems overly vague. How would you evaluate a metadata standard’s ability to ‘easily’ support ‘emerging’ research? 
What is meant by ‘exchange/access methods’ and what do they have to do with metadata standards for new kinds of research? 5. I agree totally with the sentence “Metadata standards are only as valuable and current as their communities of practice,” but the one following makes little sense to me. “ … metadata in LAM institutions have been very stable over the last 40 years …” Really? It could easily be argued that the reason for that perceived stability is the continual inability of implementers to “be a driving force for change” within a governance model that has at the same time been resistant to change. The existence of the DCMI usage board, MARBI, the various boards advising the RDA Steering Committee, all speak to the involvement of ‘implementers’. Yet there’s an implication in this ‘principle’ that stability is liable to no longer be the case and that implementers ‘driving’ will somehow make that inevitable lack of stability palatable. I would submit that stability of the standard should be the guiding principle rather than the democracy of its governance. 6. “Extensible, embeddable, and interoperable” sounds good, but each is more complex than this triumvirate seems. Interoperability in particular is something that we should all keep in mind, but although admirable, interoperability rarely succeeds in practice because of the practical incompatibility of different models. DC, MARC21, BibFrame, RDA, and Schema.org are examples of this — despite their ‘modularity’ they generally can’t simply be used as ‘modules’ because of differences in the thinking behind the model and their respective audiences. I would also argue that ‘lite style implementations’ make sense only if ‘lite’ means a dumbed-down core that can be mapped to by more detailed metadata. But stressing the ‘lite implementations’ as a specified part of an overall standard gives too much power to the creator of the standard, rather than the creator of the data. Instead we should encourage the use of application profiles, so that the particular choices and usages of the creating entity are well documented, and others can use the data in full or in part according to their needs. I predict that lossy data transfer will be less acceptable in the reality than it is in the abstract, and would suggest that dumb data is more expensive over the longer term (and certainly doesn’t support ‘new research methods’ at all). “Incorporation into local systems” really can only be accomplished by building local systems that adhere to their own local metadata model and are able to map that model in/out to more global models. Extensible and embeddable are very different from interoperable in that context. 7. The last section, after the inarguable first sentence, describes what the DCMI ‘dumb-down’ principle defined nearly twenty years ago, and that strategy still makes sense in a lot of situations. But ‘graceful degradation’ and ‘supporting new and unexpected uses’ requires smart data to start with. ‘Lite’ implementation choices (as in #6 above) preclude either of those options, IMO, and ‘adding value’ of any kind (much less by using ‘ontological inferencing’) is in no way easily achievable. I intend to be present at the session in Boston [9:00-10:00 Boston Conference and Exhibition Center, 107AB] and since I’ve asked most of my questions here I intend not to talk much. Let’s see how successful I can be at that! 
It may well be that a document this short and generalized isn’t yet ready to be a useful tool for metadata practitioners (especially without definitions!). That doesn’t mean that the topics that it’s trying to address aren’t important, just that the comprehensive goals in the preamble are not yet being met in this document. There are efforts going on in other arenas–the NISO Bibliography Roadmap work, for instance–that should have an important impact on many of these issues, which suggests that it might be wise for the Committee to pause and take another look around. Maybe a good glossary would be an important step? Dunsire, Gordon, et al. “A Reconsideration of Mapping in a Semantic World”, paper presented at the International Conference on Dublin Core and Metadata Applications, The Hague, 2011. Available at: dcpapers.dublincore.org/pubs/article/view/3622/1848 By Diane Hillmann, December 14, 2015, 4:59 pm (UTC-5) ALA Conferences, Systems, Vocabularies 1 Comment (Show inline) The Jane-athons continue! The Jane-athon series is alive, well, and expanding its original vision. I wrote about the first ‘official’ Jane-athon earlier this year, after the first event at Midwinter 2015. Since then the excitement generated at the first one has spawned others: * the Ag-athon in the UK (May 2015), sponsored by CILIP * the Maurice Dance in New Zealand (October 16, 2015 at the National Library of New Zealand in Wellington, focused on Maurice Gee) * the Jane-in (at ALA Annual 2015 in San Francisco) * the RLS-athon (November 9, 2015, Edinburgh, Scotland), following the JSC meeting there and focused on Robert Louis Stevenson Like good librarians we have an archive of the Jane-athon materials, for use by anyone who wants to look at or use the presentations or the data created at the Jane-athons. We’re still at it: the next Jane-athon in the series will be the Boston Thing-athon at Harvard University on January 7, 2016. Looking at the list of topics gives a good idea about how the Jane-athons are morphing to a broader focus than that of a creator, while training folks to create data with RIMMF. The first three topics (which may change–watch this space) focus not on specific creators, but on moving forward on topics identified as of interest to a broader community. * Strings vs things. A focus on replacing strings in metadata with URIs for things (a short sketch of the idea appears just below). * Institutional repositories, archives and scholarly communication. A focus on issues in relating and linking data in institutional repositories and archives with library catalogs. * Rare materials and RDA. A continuing discussion on the development of RDA and DCRM2 begun at the JSC meeting and the international seminar on RDA and rare materials held in November 2015. For beginners new to RDA and RIMMF but with an interest in creating data, we offer: * Digitization. A focus on how RDA relates metadata for digitized resources to the metadata for original resources, and how RIMMF can be used to improve the quality of MARC 21 records during digitization projects. * Undergraduate editions. A focus on issues of multiple editions that have little or no change in content vs. significant changes in content, and how RDA accommodates the different scenarios. Further on the horizon is a recently approved Jane-athon for the AALL conference in July 2016, focusing on Hugo Grotius (inevitably, a Hugo-athon, but there’s no link yet).
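The “strings vs things” topic is easiest to see in a few lines of data. Here is a minimal sketch (mine, not Thing-athon material) using Python’s rdflib: the same creator statement made first as a text string and then as a URI for the person. The work URI is hypothetical and the VIAF URI is only illustrative.

```python
# Minimal sketch of "strings vs things" (not from the Thing-athon materials):
# the same statement made with a name string, then with a URI for the person.
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DCTERMS

g = Graph()
work = URIRef("http://example.org/work/sense-and-sensibility")  # hypothetical local URI

# String version: a literal only humans can reliably interpret or match.
g.add((work, DCTERMS.creator, Literal("Austen, Jane, 1775-1817")))

# Thing version: an identifier other datasets can link to and machines can follow.
g.add((work, DCTERMS.creator, URIRef("http://viaf.org/viaf/102333412")))  # illustrative VIAF URI

print(g.serialize(format="turtle"))
```

The point of the exercise is that the second statement can be joined with anyone else’s data about the same person without string matching.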
NOTE: The Thing-a-thon coming up at ALA Midwinter is being held on Thursday rather than the traditional Friday to open the attendance to those who have other commitments on Friday. Another new wrinkle is the venue–an actual library away from the conference center! Whether you’re a cataloger or not-a-cataloger, there will be plenty of activities and discussions that should pique your interest. Do yourself a favor and register for a fun and informative day at the Thing-athon to begin your Midwinter experience! Instructions for registering (whether or not you plan to register for MW) can be found on the Toolkit Blog. By Diane Hillmann, December 7, 2015, 11:19 am (UTC-5) Uncategorized Post a comment Separating ideology, politics and utility Those of you who pay attention to politics (no matter where you are) are very likely to be shaking your head over candidates, results or policy. It’s a never ending source of frustration and/or entertainment here in the U.S., and I’ve noticed that the commentators seem to be focusing in on issues of ideology and faith, particularly where it bumps up against politics. The visit of Pope Francis seemed to be taking everyone’s attention while he was here, but though this event has added some ‘green’ to the discussion, it hasn’t pushed much off the political plate. Politics and faith bump up against each other in the metadata world, too. What with traditionalists still thinking in MARC tags and AACR2, to the technical types rolling their eyes at any mention of MARC and trying to push the conversation towards RDA, RDF, BibFrame, schema.org, etc., there are plenty of metadata politics available to flavor the discussion. The good news for us is that the conflicts and differences we confront in the metadata world are much more amenable to useful solution than the politics crowding our news feeds. I remember well the days when the choice of metadata schema was critical to projects and libraries. Unfortunately, we’re all still behaving as if the proliferation of ‘new’ schemas makes the whole business more complicated–that’s because we’re still thinking we need to choose one or another, ignoring the commonality at the core of the new metadata effort. But times have changed, and we don’t all need to use the same schema to be interoperable (just like we don’t all need to speak English or Esperanto to communicate). But what we do need to think about is what the needs of our organization are at all stages of the workflow: from creating, publishing, consuming, through integrating our metadata to make it useful in the various efforts in which we engage. One thing we do need to consider as we talk about creating new metadata is whether it will need to work with other data that already exists in our institution. If MARC is what we have, then one requirement may be to be able to maintain the level of richness we’ve built up in the past and still move that rich data forward with us. This suggests to me that RDA, which RIMMF has demonstrated can be losslessly mapped to and from MARC, might be the best choice for the creation of new metadata. Back in the day, when Dublin Core was the shiny new thing, the notion of ‘dumb-down’ was hatched, and though not an elegantly named principle, it still works. It says that rich metadata can be mapped fairly easily into a less-rich schema (‘dumbed down’), but once transformed in a lossy way, it can’t easily be ‘smartened up’. 
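For anyone who hasn’t bumped into ‘dumb-down’ before, here is a toy sketch of the idea in Python; the field names are invented for illustration and aren’t drawn from any particular standard.

```python
# Toy illustration of 'dumb-down': a record with typed relationships flattened
# into generic simple-DC-style elements. Field names here are invented.

rich_record = {
    "title_of_work": "Sense and Sensibility",
    "author_of_work": ["Austen, Jane"],
    "illustrator_of_expression": ["Thomson, Hugh"],
    "publisher_of_manifestation": ["Macmillan"],
}

def dumb_down(record):
    """Map specific roles onto generic elements -- easy, but lossy."""
    return {
        "title": record["title_of_work"],
        "creator": list(record["author_of_work"]),
        # the illustrator's specific role disappears into a generic bucket here
        "contributor": list(record["illustrator_of_expression"]),
        "publisher": list(record["publisher_of_manifestation"]),
    }

simple_dc = dumb_down(rich_record)
# There is no reliable dumb_up(): nothing in simple_dc records *how* Thomson
# contributed, so the richer description can only be restored from the source.
```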
But in a world of many publishers of linked data, and many consumers of that data, the notion of transforming rich metadata into any number of other schemas and letting the consumer chose what they want, is fairly straightforward, and does not require firm knowledge (or correct assumptions) of what the consumers actually need. This is a strategy well-tested with OAI-PMH which established a floor of Simple Dublin Core but encouraged the provision of any number of other formats as well, including MARC. As consumers, libraries and other cultural institutions are also better served by choices. Depending on the services they’re trying to support, they can choose what flavor of data meets their needs best, instead of being offered only what the provider assumes they want. This strategy leaves open the possibility of serving MARC as one of the choices, allowing those institutions still nursing an aged ILS to continue to participate. Of course, the consumers of data need to think about how they aggregate and integrate the data they consume, how to improve that data, and how to make their data services coherent. That’s the part of the new create, publish, consume, integrate cycle that scares many librarians, but it shouldn’t–really! So, it’s not about choosing the ‘right’ metadata format, it’s about having a fuller and more expansive notion about sharing data and learning some new skills. Let’s kiss the politics goodbye, and get on with it. By Diane Hillmann, October 12, 2015, 10:08 am (UTC-5) Linked data, RDA, Vocabularies 1 Comment (Show inline) Semantic Versioning and Vocabularies A decade ago, when the Open Metadata Registry (OMR) was just being developed as the NSDL Registry, the vocabulary world was a very different place than it is today. At that point we were tightly focussed on SKOS (not fully cooked at that point, but Jon was on the WG that was developing it, so we felt pretty secure diving in). But we were thinking about versioning in the Open World of RDF even then. The NSDL Registry kept careful track of all changes to a vocabulary (who, what, when) and the only way to get data in was through the user interface. We ran an early experiment in making versions based on dynamic, timestamp-based snapshots (we called them ‘time slices’, Git calls them ‘commit snapshots’) available for value vocabularies, but this failed to gain any traction. This seemed to be partly because, well, it was a decade ago for one, and while it attempted to solve an Open World problem with versioned URIs, it created a new set of problems for Closed World experimenters. Ultimately, we left the versions issue to sit and stew for a bit (6 years!). All that started to change in 2008 as we started working with RDA, and needed to move past value vocabularies into properties and classes, and beyond that into issues around uploading data into the OMR. Lately, Git and GitHub have started taking off and provide a way for us to make some important jumps in functionality that have culminated in the OMR/GitHub-based RDA Registry. Sounds easy and intuitive now, but it sure wasn’t at the time, and what most people don’t know is that the OMR is still where RDA/RDF data originates — it wasn’t supplanted by Git/Github, but is chugging along in the background. The OMR’s RDF CMS is still visible and usable by all, but folks managing larger vocabularies now have more options. One important aspect of the use of Git and GitHub was the ability to rethink versioning. 
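To make the ‘rethink versioning’ point a bit more concrete, here is a rough sketch of how vocabulary changes might be classified into semantic-versioning-style bumps. The change categories and their severities are my assumptions for illustration, not the rules proposed in the paper discussed next.

```python
# Rough sketch of semantic versioning applied to a vocabulary change log.
# The change categories and severities are assumptions for illustration only.

BREAKING = {"property removed", "definition changed", "domain or range changed"}
ADDITIVE = {"property added", "label added", "translation added"}

def bump(version, changes):
    """version is a (major, minor, patch) tuple; changes is a list of strings."""
    major, minor, patch = version
    if any(c in BREAKING for c in changes):
        return (major + 1, 0, 0)      # existing data may no longer mean the same thing
    if any(c in ADDITIVE for c in changes):
        return (major, minor + 1, 0)  # old data still valid, new capability available
    return (major, minor, patch + 1)  # editorial tidying only

print(bump((2, 4, 1), ["translation added"]))   # -> (2, 5, 0)
print(bump((2, 5, 0), ["definition changed"]))  # -> (3, 0, 0)
```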
Just about a year ago our paper on this topic (Versioning Vocabularies in a Linked Data World, by Diane Hillmann, Gordon Dunsire and Jon Phipps) was presented to the IFLA Satellite meeting in Paris. We used as our model the way software on our various devices and systems is updated–more and more these changes happen without much (if any) interaction with us. In the world of vocabularies defining the properties and values in linked data, most updating is still very manual (if done at all), and the important information about what has changed and when is often hidden behind web pages or downloadable files that provide no machine-understandable connections identifying changes. And just solving the change management issue does little to solve the inevitable ‘vocabulary rot’ that can make published ‘linked data’ less and less meaningful, accurate, and useful over time. Building stable change management practices is a very critical missing piece of the linked data publishing puzzle. The problem will grow exponentially as language versions and inter-vocabulary mappings start to show up as well — and it won’t be too long before that happens. Please take a look at the paper and join in the conversation! By Diane Hillmann, September 20, 2015, 6:41 pm (UTC-5) RDA, Tools, Vocabularies Post a comment Five Star Vocabulary Use Most of us in the library and cultural heritage communities interested in metadata are well aware of Tim Berners-Lee’s five star ratings for linked open data (in fact, some of us actually have the mug). The five star rating for LOD, intended to encourage us to follow five basic rules for linked data is useful, but, as we’ve discussed it over the years, a basic question rises up: What good is linked data without (property) vocabularies? Vocabulary manager types like me and my peeps are always thinking like this, and recently we came across solid evidence that we are not alone in the universe. Check out: “Five Stars of Linked Data Vocabulary Use”, published last year as part of the Semantic Web Journal. The five authors posit that TBL’s five star linked data is just the precondition to what we really need: vocabularies. They point out that the original 5 star rating says nothing about vocabularies, but that Linked Data without vocabularies is not useful at all: “Just converting a CSV file to a set of RDF triples and linking them to another set of triples does not necessarily make the data more (re)usable to humans or machines.” Needless to say, we share this viewpoint! I’m not going to steal their thunder and list here all five star categories–you really should read the article (it’s short), but only note that the lowest level is a zero star rating that covers LD with no vocabularies. The five star rating is reserved for vocabularies that are linked to other vocabularies, which is pretty cool, and not easy to accomplish by the original publisher as a soloist. These five star ratings are a terrific start to good practices documentation for vocabularies used in LOD, which we’ve had in our minds for some time. Stay tuned. By Diane Hillmann, August 7, 2015, 1:50 pm (UTC-5) Linked data, Vocabularies Post a comment What do we mean when we talk about ‘meaning’? Over the past weekend I participated in a Twitter conversation on the topic of meaning, data, transformation and packaging. The conversation is too long to repost here, but looking from July 11-12 for @metadata_maven should pick most of it up. 
Aside from my usual frustration at the message limitations in Twitter, there seemed to be a lot of confusion about what exactly we mean about ‘meaning’ and how it gets expressed in data. I had a skype conversation with @jonphipps about it, and thought I could reproduce that here, in a way that could add to the original conversation, perhaps clarifying a few things. [Probably good to read the Twitter conversation ahead of reading the rest of this.] Jon Phipps: I think the problem that the people in that conversation are trying to address is that MARC has done triple duty as a local and global serialization (format) for storage, supporting indexing and display; a global data interchange format; and a focal point for creating agreement about the rules everyone is expected to follow to populate the data (AACR2, RDA). If you walk away from that, even if you don’t kill it, nothing else is going to be able to serve that particular set of functions. But that’s the way everyone chooses to discuss bibframe, or schema.org, or any other ‘marc replacement’. Diane Hillmann: Yeah, but how does ‘meaning’ merely expressed on a wiki page help in any way? Isn’t the idea to have meaning expressed with the data itself? Jon Phipps: It depends on whether you see RDF as a meaning transport mechanism or a data transport mechanism. That’s the difference between semantic data and linked data. Diane Hillmann: It’s both, don’t you think? Jon Phipps: Semantic data is the smart subset of linked data. Diane Hillmann: Nice tagline Jon Phipps: Zepheira, and now DC, seem to be increasingly looking at RDF as merely linked data. I should say a transport mechanism for ‘linked’ data. Diane Hillmann: It’s easier that way. Jon Phipps: Exactly. Basically what they’re saying is that meaning is up to the receiver’s system to determine. Dc:title of ‘Mr.’ is fine in that world–it even validates according to the ‘new’ AP thinking. It’s all easier for the data producers if they don’t have to care about vocabularies. But the value of RDF is that it’s brilliantly designed to transport knowledge, not just data. RDF data is intended to live in a world where any Thing can be described by any Thing, and all of those descriptions can be aggregated over time to form a more complete description of the Thing Being Described. Knowledge transfer really benefits from Semantic Web concepts like inferences and entailments and even truthiness (in addition to just validation). If you discount and even reject those concepts in a linked data world than you might as well ship your data around as CSV or even SQL files and be done with it. One of the things about MARC is that it’s incredibly semantically rich (marc21rdf.info) and has also been brilliantly designed by a lot of people over a lot of years to convey an equally rich body of bibliographic knowledge. But throwing away even a small portion of that knowledge in pursuit of a far dumber linked data holy grail is a lot like saying that since most people only use a relatively limited number of words (especially when they’re texting) we have no need for a 50,000 word, or even a 5,000 word, dictionary. MARC makes knowledge transfer look relatively easy because the knowledge is embedded in a vocabulary every cataloger learns and speaks fairly fluently. It looks like it’s just a (truly limiting) data format so it’s easy to think that replacing it is just a matter of coming up with a fresh new format, like RDF. 
But it’s going to be a lot harder than that, which is tacitly acknowledged by the many-faceted effort to permanently dumb-down bibliographic metadata, and it’s one of the reasons why I think bibframe.org, bibfra.me, and schema.org might end up being very destructive, given the way they’re being promoted (be sure to Park Your MARC somewhere). [That’s why we’re so focused on the RDA data model (which can actually be semantically richer than MARC), why we helped create marc21rdf.info, and why we’re working at building out our RDF vocabulary management services.] Diane Hillmann: This would be a great conversation to record for a podcast 😉 Jon Phipps: I’m not saying proper vocabulary management is easy. Look at us for instance, we haven’t bothered to publish the OMR vocabs and only one person has noticed (so far). But they’re in active use in every OMR-generated vocab. The point I was making was that we were no better, as publishers of theoretically semantic metadata, at making sure the data was ‘meaningful’ by making sure that the vocabs resolved, had definitions, etc. [P.S. We’re now working on publishing our registry vocabularies.] By Diane Hillmann, July 16, 2015, 9:35 pm (UTC-5) Linked data, RDA, Vocabularies 1 Comment (Show inline) Fresh From ALA, What’s New? In the old days, when I was on MARBI as liaison for AALL, I used to write a fairly detailed report, and after that wrote it up for my Cornell colleagues. The gist of those reports was to describe what happened, and whether there might be implications to consider from the decisions. I don’t propose to do that here, but it does feel as if I’m acting in a familiar ‘reporting’ mode. In an early Saturday presentation sponsored by the Linked Library Data IG, we heard about BibFrame and VIVO. I was very interested to see how VIVO has grown (having seen it as an infant), but was puzzled by the suggestion that it or FOAF could substitute for the functionality embedded in authority records. For one thing, auth records are about disambiguating names, and not describing people–much as some believe that’s where authority control should be going. Even when we stop using text strings as identifiers, we’ll still need that function and should be thinking carefully about whether adding other functions makes good sense. Later on Saturday, at the Cataloging Norms IG meeting, Nancy Fallgren spoke on the NLM collaboration with Zepheira, GW, and others on BibFrame Lite. They’re now testing the Kuali OLE cataloging module for use with BF Lite, which will include a triple store. An important quote from Nancy: “Legacy data should not drive development.” So true, but neither should we be starting over, or discarding data, just to simplify data creation, thus losing the ability to respond to the more complex needs in cataloging, which aren’t going away (a point demonstrated usefully in the recent Jane-athons). I was the last speaker on that program, and spoke on the topic of “What Can We Do About Our Legacy Data?” I was primarily asking questions and discussing options, not providing answers. The one thing I am adamant about is that nobody should be throwing away their MARC records. I even came up with a simple rule: “Park the MARC”. After all, storage is cheap, and nobody really knows how the current situation will settle out. Data is easy to dumb down, but not so easy to smarten up, and there may be do-overs in store for some down the road, after the experimentation is done and the tradeoffs clearer.
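“Parking the MARC” needs nothing fancier than a copy you refuse to touch. A minimal sketch, assuming pymarc is available and using hypothetical file names:

```python
# "Park the MARC," sketched with pymarc: copy the records into an archive file
# untouched, and keep a count, so the originals outlive later experiments.
# The file names are hypothetical.
from pymarc import MARCReader, MARCWriter

count = 0
with open("catalog_export.mrc", "rb") as src, open("parked_marc.mrc", "wb") as parked:
    writer = MARCWriter(parked)
    for record in MARCReader(src):
        if record is None:        # skip anything the reader could not parse (defensive)
            continue
        writer.write(record)      # no edits, no mapping, no 'dumb-down' on the way through
        count += 1
    writer.close()

print(f"parked {count} MARC records")
```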
I also attended the BibFrame Update, and noted that there’s still no open discussion about the ‘classic’ (as in ‘Classic Coke’) BibFrame version used by LC, and the ‘new’ (as in ‘New Coke’) BibFrame Lite version being developed by Zepheira, which is apparently the vocabulary they’re using in their projects and training. It seems like it could be a useful discussion, but somebody’s got to start it. It’s not gonna be me. The most interesting part of that update from my point of view was hearing Sally McCallum talk about the testing of BibFrame by LC’s catalogers. The tool they’re planning on using (in development, I believe) will use RDA labels and include rule numbers from the RDA Toolkit. Now, there’s a test I really want to hear about at Midwinter! But of course all of that RDA ‘testing’ they insisted on several years ago to determine if the RDA rules could be applied to MARC21 doesn’t (can’t) apply to BibFrame Classic so … Will there be a new round of much publicized and eagerly anticipated shared institutional testing of this new tool and its assumptions? Just askin’. By Diane Hillmann, July 10, 2015, 10:10 am (UTC-5) ALA Conferences, BibFrame, RDA, Vocabularies Post a comment What’s up with this Jane-athon stuff? The RDA Development Team started talking about developing training for the ‘new’ RDA, with a focus on the vocabularies, in the fall of 2014. We had some notion of what we didn’t want to do: we didn’t want yet another ‘sage on the stage’ event, we wanted to re-purpose the ‘hackathon’ model from a software focus to data creation (including a major hands-on aspect), and we wanted to demonstrate what RDA looked like (and could do) in a native RDA environment, without reference to MARC. This was a tall order. Using RIMMF for the data creation was a no-brainer: the developers had been using the RDA Registry to feed new vocabulary elements into their software (effectively becoming the RDA Registry’s first client), and were fully committed to FRBR. Deborah Fritz had been training librarians and others on RIMMF for years, gathering feedback and building enthusiasm. It was Deborah who came up with the Jane-athon idea, and the RDA Development group took it and ran with it. Using the Jane Austen theme was a brilliant part of Deborah’s idea. Everybody knows about JA, and the number of spin-offs, rip-offs and re-tellings of the novels (in many media formats) made her work a natural for examining why RDA and FRBR make sense. One goal stated everywhere in the marketing materials for our first Jane outing was that we wanted people to have fun. All of us have been part of the audience and on the dais for many information sessions, for RDA and other issues, and neither position has ever been much fun, useful as the sessions might have been. The same goes for webinars, which, as they’ve developed in library-land, tend to be dry, boring, and completely bereft of human interaction. And there was a lot of fun at that first Jane-athon–I venture to say that 90% of the folks in the room left with smiles and thanks. We got an amazing response to our evaluation survey, and the preponderance of responses were expansive, positive, and clearly designed to help the organizers to do better the next time. The various folks from ALA Publishing who stood at the back and watched the fun were absolutely amazed at the noise, the laughter, and the collaboration in evidence.
No small part of the success of Jane-athon 1 rested with the team leaders at each table, and the coaches going from table to table helping out with puzzling issues, ensuring that participants were able to create data using RIMMF that could be aggregated for examination later in the day. From the beginning we thought of Jane 1 as the first of many. In the first flush of success as participants signed up and enthusiasm built, we talked publicly about making it possible to do local Jane-athons, but we realized that our small group would have difficulty doing smaller events with less expertise on site to the same standard we set at Jane-athon 1. We had to do a better job in thinking through the local expansion and how to ensure that local participants get the same (or similar) value from the experience before responding to requests. As a step in that direction CILIP in the UK is planning an Ag-athon on May 22, 2015 which will add much to the collective experience as well as to the data store that began with the first Jane-athon and will be an increasingly important factor as we work through the issues of sharing data. The collection and storage of the Jane-athon data was envisioned prior to the first event, and the R-Balls site was designed as a place to store and share RIMMF-based information. Though a valuable step towards shareable RDA data, rballs have their limits. The data itself can be curated by human experts or available with warts, depending on the needs of the user of the data. For the longer term, RIMMF can output RDF statements based on the rball info, and a triple store is in development for experimentation and exploration. There are plans to improve the visualization of this data and demonstrate its use at Jane-athon 2 in San Francisco, which will include more about RDA and linked data, as well as what the created data can be used for, in particular, for new and improved services. So, what are the implications of the first Jane-athon’s success for libraries interested in linked data? One of the biggest misunderstandings floating around libraryland in linked data conversations is that it’s necessary to make one and only one choice of format, and eschew all others (kind of like saying that everyone has to speak English to participate in LOD). This is not just incorrect, it’s also dangerous. In the MARC era, there was truly no choice for libraries–to participate in record sharing they had to use MARC. But the technology has changed, and rapidly evolving semantic mapping strategies [see: dcpapers.dublincore.org/pubs/article/view/3622] will enable libraries to use the most appropriate schemas and tools for creating data to be used in their local context, and others for distributing that data to partners, collaborators, or the larger world. Another widely circulated meme is that RDA/FRBR is ‘too complicated’ for what libraries need; we’re encouraged to ‘simplify, simplify’ and assured that we’ll still be able to do what we need. Hmm, well, simplification is an attractive idea, until one remembers that the environment we work in, with evolving carriers, versions, and creative ideas for marketing materials to libraries is getting more complex than ever. Without the specificity to describe what we have (or have access to), we push the problem out to our users to figure out on their own. Libraries have always tried to be smarter than that, and that requires “smart” , not “dumb”, metadata. 
Of course the corollary to the ‘too complicated’ argument lies the notion that a) we’re not smart enough to figure out how to do RDA and FRBR right, and b) complex means more expensive. I refuse to give space to a), but b) is an important consideration. I urge you to take a look at the Jane-athon data and consider the fact that Jane Austen wrote very few novels, but they’ve been re-published with various editions, versions and commentaries for almost two centuries. Once you add the ‘based on’, ‘inspired by’ and the enormous trail created by those trying to use Jane’s popularity to sell stuff (“Sense and Sensibility and Sea Monsters” is a favorite of mine), you can see the problem. Think of a pyramid with a very expansive base, and a very sharp point, and consider that the works that everything at the bottom wants to link to don’t require repeating the description of each novel every time in RDA. And we’re not adding notes to descriptions that are based on the outdated notion that the only use for information about the relationship between “Sense and Sensibility and Sea Monsters” and Jane’s “Sense and Sensibility” is a human being who looks far enough into the description to read the note. One of the big revelations for most Jane-athon participants was to see how well RIMMF translated legacy MARC records into RDA, with links between the WEM levels and others to the named agents in the record. It’s very slick, and most importantly, not lossy. Consider that RIMMF also outputs in both MARC and RDF–and you see something of a missing link (if not the Golden Gate Bridge :-). Not to say there aren’t issues to be considered with RDA as with other options. There are certainly those, and they’ll be discussed at the Jane-In in San Francisco as well as at the RDA Forum on the following day, which will focus on current RDA upgrades and the future of RDA and cataloging. (More detailed information on the Forum will be available shortly). Don’t miss the fun, take a look at the details and then go ahead and register. And catalogers, try your best to entice your developers to come too. We’ll set up a table for them, and you’ll improve the conversation level at home considerably! 
By Diane Hillmann, May 18, 2015, 10:13 am (UTC-5) Linked data, RDA, Uncategorized 1 Comment (Show inline) marcedit-reeset-net-8985 ---- 7.5.27 Updated: 4/26/2021 * Enhancement: MarcEditor: Added a button to provide quick access to the available task list. * Enhancement: MarcEditor: code is in place to begin allowing users to show/hide menu/toolbar buttons. This should be available in a near term update. 7.5.25 Updated: 4/25/2021 * Bug Fix: Internet Archive => HathiTrust plugin updates to correct debug link generation. * Update: File Assoc. updates * Update: Installer - file extensions will now assign to 7.5.x 7.5.20 Updated: 4/19/2021 * Enhancement: Z39.50 -- users can add more than 2 criteria. * Update: Plugin -- Internet Archive => HathiTrust Plugin updated to allow for multiple date type searches. * Update: Z39.50 UI changes to make it easier to prevent data from being hidden on high zoom * Update: In the Preferences, the Task location can now allow Environment variables in the file path (example: %APPDATA%) * Update: Updated JSON/RDF Components * Bug Fix: Validate Headings window was freezing when using some of the new linked data rule options. * Enhancement: Custom Reports -- added a UI validation to ensure required data is provided (this wasn't previously the case). * Enhancement: MarcValidator -- added some updated language in the error changing. * Bug Fix: MarcValidator -- make sure that all file handles are closed (there was a case where one of the handles was remaining opened and could, potentially, result in a locked process).
7.5.8 Updated: 4/3/2021 * Enhancement: MarcEditor global Edit functions -- a new Preview option has been added (Replace All, Add Field, Delete Field, Copy Field, Edit Indicators, Edit Field, Edit Subfield, Swap Field) * Enhancement: UI enhancement to ensure that a status message is present so users know the process is running (Replace All, Add Field, Delete Field, Copy Field, Edit Indicators, Edit Field, Edit Subfield, Swap Field) * Enhancement: MarcEngine -- added JSON => XML translation * Enhancement: XML/JSON Profile Wizard - added support for JSON-LD formatted data. * Enhancement: XSLT -- including XSLT for the Homosaurus vocabulary * Enhancement: OCLC API -- surfacing more debugging information to make it easier to see when an issue is occuring * Bug Fix: MarcValidator -- Ensured all file handles are closing and released * Behavior Change: KBART 2 MARC Plugin - tool will preference ISBN 13 if present (currently, it selects the last ISBN if multiples of the same type are present) * Bug Fix: Installer -- cleaned up some old files * Behavior Change: OCLC has discontinued providing work id information in worldcat.org. I've shifted to using the classify api till a better option is found. * Clean-up: UI Clean up in the migration wizard * Clean-up: UI clean up of the main window * Bug Fix/Clean-up: Corrected UI to add back missing icons (for example, in the Extract Selected Records form) 7.5.2 Updated: 2/7/2021 * Enhancement: Updated Plugin Manager * Enhancement: OCLC Connexion Plugin Added/Converted * Enhancement: Internet Archive => HathiTrust Packager Added/Converted * Enhancement: MARC => KBART Converter Added/Converted * Enhancement: Make Check Digit Added/Converted * Enhancement: Microlif => Mnemonic Converted Added/Converted * RIS => MARC Plugin Added/Converted * Enhancement: Installer evaluates for the 64-bit Access Database Engine (2016) on 64 bit systems * Enhancement: Installer evaluates for the 2015 C++ Runtime required by the Access Database Engine on 64 bit systems * Behavior Change: Restart as 32-bit program has been hidden * Enhancement: MARC SQL Explorer has been folded into the primary MarcEdit Application [results in a reduction of dependencies] * Bug Fix: Clustering Tools -- Beta build wasn't allowing the clustering tools to function correctly. * Enhancement: OCLC Search -- Batch Searching has been allowed * Enhancement: OCLC Integration -- New Session Diagnostics option added for debugging processes * Bug Fix: Integration Settings Import -- If no settings have ever been set and the initial file hasn’t been created, import will say it’s completed, but it won’t. * Bug Fix: OCLC Integration -- if the expires_at element is null or fails to parse, it can throw an error. This is now trapped and will attempt to reauthorize. * Bug Fix: Console: Added process to consume event processing for validate and split tasks. 7.5.1 Updated: 2/2/2021 * Bug Fix: Installer throws an error when attempting to install per user * Bug Fix: MarcEditor -- MarcEdit will be deprecating legacy page loading. This option is now ignored if set and will be removed entirely in future builds. 7.5.0 Updated: 2/1/2021 * Change: Allow OS to manage supported supported Security Protocol types. 
* Change: Remove com.sun dependency related to dns and httpserver * Change: Changed AppData Path * Change: First install automatically imports settings from MarcEdit 7.0-3.x * Change: Field Count - simplify UI (consolidate elements) * Change: 008 Windows -- update help urls to oclc * Change: Generate FAST Headings -- update help urls * Change: .NET changes thread stats queuing. Updating thread processing on forms: * Generate FAST Headings * Batch Process Records * Build Links * Main Window * RDA Helper * Delete Selected Records * MARC Tools * Check URL Tools * MARCValidator * MARCEngine * task manager * Z39.50 * ILS Integration Processing * Character Conversions * Format Handing (delimited text, openrefine, etc.) * Change: XML Function List -- update process for opening URLs * Change: Z39.50 Preferences Window - update process for opening URLs * Change: About Windows -- new information, updated how version information is calculated. * Change: Catalog Calculator Window -- update process for opening URLs * Change: Generate Call Numbers -- update process for opening URLs * Change: Generate Material Formats -- update process for opening URLs * Change: Tab Delimiter -- remove context windows * Change: Tab Delimiter -- new options UI * Change: Tab Delimiter -- normalization changes * Change: Remove Old Help HTML Page * Change: Remove old Hex Editor Page * Change: Updated Hex Editor to integrate into main program * Change: Main Window -- remove custom scheduler dependency * Change: UI Update to allow more items * Change: Main Window -- new icon * Change: Main Window -- update process for opening URLs * Change: Main Window -- removed context menus * Change: Main Window -- Upgrade changes to new executable name * Change: Main Window -- Updated the following menu Items: * Edit Linked Data Tools * Removed old help menu item * Added new application shortcut * Change: OCLC Bulk Downloader -- new UI elements to correspond to new OCLC API * Change: OCLC Search Page -- new UI elements to correspond to new OCLC API * Change: Preferences -- Updates related to various preference changes: * Hex Editor * Integrations * Editor * Other * Change: RDA Helper -- update process for opening URLs * Change: RDA Helper -- Opening files for editing * Change: Removed the Script Maker * Change: Templates for Perl and vbscripts includes * Change: Removed Find/Search XML in the XML Editor and consolidated in existing windows * Change: Delete Selected Records: Exposed the form and controls to the MarcEditor * Change: Sparql Browser -- update process for opening URLs * Change: Sparql Browser -- removed context menus * Change: TroubleShooting Wizard -- Added more error codes and kb information to the Wizard * Change: UNIMARC Utility -- controls change, configurable transform selections * Change: MARC Utilities -- removed the context menu * Change: First Run Wizard -- new options, new agent images * Change: XML Editor -- Delete Block Addition * Change: XML Editor -- XQuery transform support * Change: XML Profile Wizard -- option to process attributes * Change: MarcEditor -- Status Bar control doesn't exist in NET 5.0. Control has changed. 
* Change: MarcEditor -- Improved Page Loading * Change: MarcEditor -- File Tracking updated to handle times when the file opened is a temp record * Change: MarcEditor -- removed ~7k of old code * Change: MarcEditor -- Added Delete Selected Records Option * Change: Removed helper code used by Installer * Change: Removed Office2007 menu formatting code * Change: Consolidated Extensions into new class (removed 3 files) * Change: Removed calls Marshalled to the Windows API -- replaced with Managed Code * Change: OpenRefine Format handler updated to capture changes between OpenRefine versions * Change: MarcEngine -- namespace update to 75 * Change: Wizard -- missing unicode font options more obvious * Change: Wizard install puts font in program directory so that additional users can simply copy (not download) the font on use * Change: checkurls: removed support for insecure crypto-types * Change: checkurls: additional heuristics to respond dynamically to http status codes * Change: All Components -- .NET 5.0 includes a new codepages library that allows for extended codepage support beyond the default framework. Added across the project. * Change: MarcValidator -- new rules process that attempts to determine if records are too long for processing when validating rules or structure. * Change: Command-line -- batch process switch has been added to the tasks processing function * Change: Options -- Allow user path to be reset. * Bug Fix: Main Window -- corrects process for determining version for update * Bug Fix: Main Window -- Updated image * Bug Fix: When doing first run, wizard not showing in some cases. * Bug Fix: Main Window -- Last Tool used sometimes shows duplicates * Bug Fix: RDA Helper -- $e processing * Bug Fix: RDA Helper -- punctuation in the $e * Bug Fix: XML Profile Wizard -- When the top element is selected, it's not viewed for processing (which means not seeing element data or attribute data) * Bug Fix: MarcEditor -- Page Processing corrected to handle invalidly formatted data better * Bug Fix: Installation Wizard -- if a unicode font was installed during the first run process, it wouldn't be recognized. * Bug Fix: MarcValidator fails when attempting to process a .mrk file from outside the MarcEditor * Bug Fix: Linked Data Processing: When processing services with multiple redirects -- process may stop prematurely. (Example: LC's id.loc.gov 3xx processing) * Bug Fix: Edit Field -- Find fields with just spaces are trimmed, causing the field data to process improperly. * Bug Fix: RDA Helper will fail if LDR length is incorrect when attempting to determine character encoding mashable-com-9799 ---- The 10 Founding Fathers of the Web
By Christina Warren, 2010-07-04 14:17:28 UTC While the phrase "founding fathers" is often used in conjunction with men like Benjamin Franklin, Thomas Jefferson and George Washington, we wanted to think about the phrase on the global level. And what is more global than the world wide web? Thus, this holiday, we're taking a look at 10 individuals who have been instrumental in helping to shape the world wide web and the culture of the Internet as we know it today. Check out our round up below to learn about some of the most influential people in the creation and development of the ideas and technologies that have led to today's web experience. Let us know in the comments if you think we've missed anyone! 1. Tim Berners-Lee Why He Matters: Tim Berners-Lee is credited as the inventor of the World Wide Web. A physicist, Berners-Lee and his team built the world's very first web browser, WorldWideWeb, the first web server and the HyperText-based markup language HTML. Berners-Lee founded and is the current director of the World Wide Web Consortium (W3C), a standards body that oversees the development of the web as a whole. While the Internet itself dates back to 1969, it was Berners-Lee who was able to bring together the concept of the Internet and hypertext, which set the foundation for the Internet as we know it today. Because CERN (the European Organization for Nuclear Research) didn't make the World Wide Web proprietary and never charged for dues, its protocols were widely adopted. 2. Marc Andreessen Why He Matters: Marc Andreessen co-authored Mosaic, the first widely-used web browser, and he founded Netscape Communications. While Mosaic wasn't the first graphical web browser, it was the first to garner significant attention. It was also the first browser to display images inline with text. After designing and programming Mosaic, Andreessen went on to co-found Netscape Communications.
Netscape's flagship product, Netscape Navigator, had an enormous impact, by helping to bring the web to mainstream users. In 1998, Netscape released the code base for Netscape Communicator under an open source license. That project, known as Mozilla, became the basis of what we now know as Firefox. 3. Brian Behlendorf Why He Matters: Brian Behlendorf was the primary developer of the Apache Web Server and one of the founding members of the Apache Group. While working as the webmaster for Wired Magazine's HotWired web site, Behlendorf found himself making changes and patches to the HTTP server first developed at NCSA at the University of Illinois at Urbana-Champaign. After realizing that others were also adding their own patches, he put together an electronic mailing list to help coordinate the work. By February 1995, the project had been given a name - Apache - and the entire codebase from the original NCSA server was rewritten and re-optimized. The real genius with Apache, other than its free and open source nature, was that it was built to be extensible. That meant that ISPs could easily add their own extensions or plugins to better optimize the server, allowing hundreds of sites to be hosted from just one computer server. Apache remains the most popular web server on the Internet. 4, 5, 6. Rasmus Lerdorf, Andi Gutmans and Zeev Suraski Why They Matter: Lerdorf, Gutmans and Suraski are all responsible for what we know as PHP, the scripting language that remains one of the most used web languages for creating dynamic web pages. Rasmus Lerdorf first created PHP in 1995 and he was the main developer of the project for its first two versions. In 1997, Gutmans and Suraski decided to extend PHP, rewriting the parser and creating what became known as PHP 3. The two then went on to rewrite the core of PHP, naming it the Zend Engine, and using that to power PHP 4. Gutmans and Suraski further went on to found Zend Technologies, which continues to do much of the development of PHP. While Larry Wall's Perl was one of the first general-purpose scripting languages to really take off on the web, the ease of use and embeddability of PHP is what has made it take over as the de facto "P" in the LAMP stack (LAMP being a default set of components on which many web applications are based). 7. Brad Fitzpatrick Why He Matters: Creator of LiveJournal, in many ways the proto-social network, the original author of memcached and the original authentication protocol for OpenID. Fitzpatrick created LiveJournal in college, as a way for him and his friends to keep one another up to date with what they were doing. It evolved into a larger blogging community and implemented many features, like Friends Lists, the ability to create user polls, support for blog clients, the ability to send text messages to users, the ability to post by phone, post by e-mail, create group blogs and more that have become a standard part of communities like Facebook, Tumblr, MySpace, WordPress.com and Posterous today. As LiveJournal grew and started to use more and more resources, Fitzpatrick started the memcached project as a way to speed up dynamic web applications and alleviate database load. It does this by pooling together the free memory from across your web servers and then allocating it out as needed. This makes it easy for large projects to scale. Memcached is in use by Wikipedia, Flickr, Facebook, WordPress, Twitter, Craigslist and more.
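The pattern Fitzpatrick built is easy to sketch. The snippet below is a generic cache-aside example using the pymemcache client, not LiveJournal's code; the server addresses and the database lookup are stand-ins, and in practice a serializer would be configured so values round-trip as the same type.

```python
# Generic cache-aside sketch with pymemcache (illustrative, not LiveJournal's code).
# Several memcached nodes are pooled and consulted before hitting the database.
from pymemcache.client.hash import HashClient

cache = HashClient([("10.0.0.1", 11211), ("10.0.0.2", 11211)])  # hypothetical nodes

def expensive_database_query(user_id):
    # stand-in for the real lookup against the relational database
    return f"profile-record-for-{user_id}"

def fetch_profile(user_id):
    key = f"profile:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached                         # served from pooled RAM, database untouched
    profile = expensive_database_query(user_id)
    cache.set(key, profile, expire=300)       # keep it hot for five minutes
    return profile
```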
8. Brendan Eich Why He Matters: He created JavaScript and now serves as the CTO of the Mozilla Corporation. Eich created JavaScript while at Netscape, first under the name Mocha, then under the name LiveScript, and finally as JavaScript. JavaScript made its official debut in December of 1995. JavaScript quickly became one of the most popular web programming languages, even if its use cases in the early days were often visual abominations. However, as time has progressed, the advent of JavaScript libraries and frameworks, coupled with the power of Ajax, has made JavaScript an integral part of the standards-based web. 9. John Resig Why He Matters: John Resig is the creator and lead developer of jQuery, the most popular JavaScript library on the web. While other JavaScript libraries, such as Sam Stephenson's Prototype, preceded jQuery, jQuery's goal of being compatible across web browsers is what really sets it apart. In the last two years especially, the momentum around jQuery has exploded and it is now reportedly in use by 31% of the top 10,000 most visited websites. Its extensibility and the jQuery UI toolkit have also made it a popular adoption target in enterprise application development. Any JavaScript library that can make the leap from web developers to enterprise app builders is the real deal. JavaScript continues to be one of the big forces within the standards-based web and jQuery is helping to lead the charge. 10. Jonathan Gay Why He Matters: He co-founded FutureWave Software and for more than a decade was the main programmer and visionary behind Flash. While not everyone is a fan of Adobe Flash, it's important to remember how influential and instrumental the technology has been over the course of the last 15 years. Gay wrote a vector drawing program called SmartSketch back in 1993 for the PenPoint operating system, and after PenPoint was discontinued, the technology in SmartSketch was repurposed as a tool that could create animation that could be played back on web pages. This product, FutureSplash Animator, was acquired by Macromedia in 1996 and renamed Flash. After the acquisition, Gay became Vice President of Engineering at Macromedia and he led the Flash engineering team. Over the years, his team implemented new elements to Flash, like ActionScript. However, perhaps Gay's pinnacle achievement with Flash was in the team he spearheaded to create what was then known as the Flash Communication Server (it's now the Flash Media Server), which let Flash Player use the RTMP protocol to stream audio and video over the web. In essence, this technology is what allowed YouTube to be, well, YouTube. More development and design resources from Mashable: - Top 10 Resources for Design Inspiration - HOW TO: Get Up-to-Date on WordPress 3.0 - 7 Hackathons Around the World and the Web - 10 Web Design Bloggers You Should Follow - Top 10 Beautiful Minimalist Icon Sets [img credits: European Parliament, Marc Andreessen, Ilya Schurov, chrys/Sebastian Bergmann, crucially, jsconf, badubadu] Topics: brendan eich, Dev & Design, founding fathers, john resig, marc andreessen, rasmus lerdorf, Social Media, web development, World Wide Web
matienzo-org-2040 ---- Posts | Mark A. Matienzo IAH Forecast - Disquiet Junto Project 0476 Publish date: February 15, 2021 Tags: music black tent by Mark A. Matienzo An experiment with recording a new single using VCV Rack and REAPER based on a compositional prompt. I ended up recording two tracks. Perfecting a favorite: oatmeal chocolate chip cookies Publish date: November 29, 2020 Tags: recipes food by Mark A. Matienzo I have a horrible sweet tooth, and I absolutely love oatmeal chocolate chip cookies. I tend to bake as a means to cope with stress, and of course, more often than not, that means making these cookies. After making many iterations, I’ve settled upon this recipe as the ultimate version to which all compare. (Read more …) In Memoriam and Appreciation of Rob Casson (1974-2020) Publish date: October 1, 2020 Tags: code4lib Personal by Mark A. Matienzo The world lost one of its brightest and most charming lights earlier this week, Rob Casson. Many of us knew Rob through the Code4Lib community and conferences and his work at Miami University Libraries. We miss his generosity, patience, sense of humor, and genuine kindness. Those of us who got the chance to socialize with him also remember his passion for music, and some of us were even lucky to see live shows in the evenings between conference sessions and other social activities. On Sunday, October 4 at 1:30 PM Pacific/4:30 PM Eastern, those of us who knew him through Code4Lib and the world of libraries are encouraged to gather to share our memories of him and to appreciate his life and work. Please join me and my co-organizers, Mike Giarlo and Declan Fleming on Zoom (registration required). Robert Casson (robcaSSon), 30 Jan 1974 - 29 Sep 2020. Photo: Declan Fleming. (Read more …) First SOTA activation Publish date: September 29, 2020 Tags: ham radio by Mark A. Matienzo About a month ago, I got my ham radio license, and soon after I got pretty curious about Summits on the Air (SOTA), an award scheme focused on safe and low impact portable operation from mountaintops. While I like to hike, I’m arguably a pretty casual hiker, and living in California provides a surprising number of options within 45 minutes driving time for SOTA newbies. (Read more …) Optimizing friction Publish date: August 10, 2020 Tags: indieweb music plan9 food by Mark A. Matienzo Over and in response to the last few months, I’ve been reflecting about intentionality, and how I spend my time creating things. I have tried to improve the indiewebbiness of my site, and understanding what it means to “scratch my own itch”. This resonates particularly lately because it’s leading me to mull over which parts should be hard and easy. Unsurprisingly, much of that is personal preference, and figuring out how I want to optimize from the perspective of user experience. Friction in UX can be a powerful tool, part of what I’m trying to find is where I want to retain friction as it helps me remain intentional.
(Read more …) A Hugo shortcode for embedding Mirador Publish date: July 25, 2020 Tags: iiif hugo by Mark A. Matienzo I spent a little time over the last day or so trying to bodge together a shortcode for Hugo to embed an instance of Mirador. While it’s not quite as simple (or full-featured) as I’d like, it’s nonetheless a starting point. The shortcode generates a snippet of HTML that gets loaded into Hugo pages, but (unfortunately) most of the heavy lifting is done by a separate static page that gets included as an