Code that Lasts: Sustainable And Usable Open Source Code

A presentation I gave at the online conference Code4Lib 2021, on Monday, March 22.

I have realized that the open source projects I am most proud of are a few that have existed for years now, increasing in popularity, with very little maintenance required, including traject and bento_search. While community aspects matter for open source sustainability, the task gets so much easier when the code requires less effort to keep alive, for maintainers and users alike. Using these projects as examples, can we as developers identify what makes code “inexpensive” to use and maintain over the long haul with little “churn”, and how to do it?

Slides on Google Docs

Rough transcript (really the script I wrote for myself)

Hi, I’m Jonathan Rochkind, and this is “Code that Lasts: Sustainable and Usable Open Source Code”

So, who am I? I have been developing open source library software since 2006, mainly in ruby and Rails. 

Over that time, I have participated in a variety of open source projects meant to be used by multiple institutions, and I’ve often seen us having challenges with long-term maintenance sustainability and usability of our software. This includes projects I was instrumental in creating myself; we’ve all been there!

We’re used to thinking of this problem in terms of needing more maintainers.

But let’s first think more about what the situation looks like, before we assume what causes it. In addition to features or changes people want not getting done, it can also look like, for instance:

Being stuck using out-of-date dependencies like old, even end-of-lifed, versions of Rails or ruby.

A reduction in software “polish” over time. 

What do I mean by “polish”?

Engineer Richard Schneeman writes: [quote] “When we say something is “polished” it means that it is free from sharp edges, even the small ones. I view polished software to be ones that are mostly free from frustration. They do what you expect them to and are consistent.” 

I have noticed that software can start out very well polished, but over time lose that polish. 

This usually goes along with decreasing “cohesion” in software over time, a feeling that the different parts of the software no longer tell the developer a consistent story together.

While there can be an element of truth in needing more maintainers in some cases – zero maintainers is obviously too few — there are also ways that increasing the number of committers or maintainers can result in diminishing returns and additional challenges.

One of the theses of Fred Brooks’ famous 1975 book “The Mythical Man-Month” is sometimes called “Brooks’ Law”: “under certain conditions, an incremental person when added to a project makes the project take more, not less time.”

Why? One of the main reasons Brooks discusses is the additional time taken for communication and coordination between more people – with every person you add, the number of connections between people goes up combinatorially.

That may explain the phenomenon we sometimes see with so-called “design by committee”, where “too many cooks in the kitchen” can produce inconsistency or excessive complexity.

Cohesion and polish require a unified design vision — that’s not incompatible with increasing numbers of maintainers, but it does make it more challenging, because it takes more time to get everyone on the same page and to iterate while maintaining a unifying vision. (There’s also more to be said here about the difference between just a bunch of committers committing PRs, and the maintainer’s role of maintaining historical context and design vision for how all the parts fit together.)

Instead of assuming adding more committers or maintainers is the solution, can there instead be ways to reduce the amount of maintenance required?

I started thinking about this when I noticed a couple projects of mine which had become more widely successful than I had any right to expect, considering how little maintenance was being put into them.

Bento_search is a toolkit for searching different external search engines in a consistent way. It’s especially, but not exclusively, for displaying multiple search results in “bento box” style, which is what Tito Sierra from NCSU first called these little side-by-side search results.

I wrote bento_search for use at a former job in 2012. 55% of all commits to the project were made in 2012, and 95% of all commits in 2016 or earlier. (I gave it a bit of attention for a contracting project in 2016.)

But bento_search has never gotten a lot of maintenance, and I don’t use it anymore myself. It’s not in wide use, but I found it kind of amazing when I saw people giving me credit for the gem in conference presentations (thanks!), when I didn’t even know they were using it and hadn’t been paying it any attention at all! It’s still used by a handful of institutions, for whom it just works with little attention from maintainers. (The screenshot is from Cornell University Libraries.)

Traject is a MARC-to-Solr indexing tool written in ruby (or, more generally, a general-purpose extract-transform-load tool) that I wrote with Bill Dueber from the University of Michigan in 2013.

We hoped it would catch on in the Blacklight community, but for the first couple years, its uptake was slow.

However, since then it has come to be pretty popular in the Blacklight and Samvera communities, and in a few other library technology uses. You can see the spikes of commit activity in the graph for a 2.0 release in 2015 and a 3.0 release in 2018 – but for the most part, at other times, nobody has really been spending much time on maintaining traject. Every once in a while a community member submits a minor pull request, and it’s usually me who reviews it. Bill and I remain the only maintainers.

And yet traject just keeps plugging along, picking up adoption and working well for adopters.  

So, this made me start thinking, based on what I’ve seen in my career, what are some of the things that might make open source projects both low-maintenance and successful in their adoption and ease-of-use for developers?

One thing both of these projects did was take backwards compatibility very seriously. 

The first step there is following “semantic versioning”, a set of rules whose main point is that releases can’t include backwards-incompatible changes unless they are a new major version, like going from 1.x to 2.0.

This is important, but it’s not alone enough to minimize backwards incompatible changes that add maintenance burden to the ecosystem. If the real goal is preventing the pain of backwards incompatibility, we also need to limit the number of major version releases, and limit the number and scope of backwards breaking changes in each major release!
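
As a concrete illustration of what that discipline buys consumers (the version numbers here are just examples), a pessimistic version constraint in a Gemfile or gemspec can float freely within a major version, picking up fixes and features without fear of a surprise breaking change:

gem "traject", "~> 3.0"                       # any 3.x release, never an incompatible 4.0
spec.add_dependency "bento_search", "~> 1.0"  # the same idea, declared in a gemspec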

The Bento_search gem has only had one major release; it’s never had a 2.0 release, and it’s still backwards compatible to its initial release.

Traject is on a 3.x release after 8 years, but the major releases of traject have had extremely few backwards-breaking changes; most people could upgrade through major versions changing very little, or most often nothing, in their projects.

So OK, sure, everyone wants to minimize backwards incompatibility, but that’s easy to say – how do you DO it? Well, it helps to have less code overall, that changes less often overall – OK, again, great, but how do you do THAT?

Parsimony is a word in general English that means “The quality of economy or frugality in the use of resources.”

In terms of software architecture, it means having as few moving parts as possible inside your code: fewer classes, types, components, entities, whatever. Or most fundamentally, I like to think of it in terms of minimizing the concepts in the mental model a programmer needs to grasp how the code works and what parts do what.

The goal of architectural design is to find the smallest possible architecture we can create to make [quote] “simple things simple and complex things possible”, as computer scientist Alan Kay described the goal of software design.

We can see this in bento_search, which has very few internal architectural concepts.

The main thing bento_search does is provide a standard API for querying a search engine and representing the results of a search. These are consistent across different search engines, with a common metadata vocabulary for what results look like. This makes search engines interchangeable to calling code. And then it includes half a dozen or so search engine implementations for services I needed or wanted to evaluate when I wrote it.
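
To give a sense of what that looks like in calling code, here is a rough sketch from memory; the exact class and method names may not match the gem’s documented API, so treat it as illustrative:

engine  = BentoSearch::GoogleBooksEngine.new(api_key: "my_api_key")
results = engine.search("orange")   # a standard results object, whatever the engine

results.each do |item|
  puts item.title   # common metadata vocabulary across engines
  puts item.link
end

# Swap in a different engine class and the calling code stays the same.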

This search engine API at the ruby level can be used all by itself, even without the next part: the actual “bento style”, which is built-in support for displaying search engine results in boxes on a page of your choice in a Rails app, writing very little boilerplate code.

Traject has an architecture which basically has just three parts at the top.

There is a reader which sends objects into the pipeline. 

There are some indexing rules, which are transformation steps that take the source object and build an output Hash object.

And then a writer, which translates the Hash object to write to some store, such as Solr.

The reader, transformation steps, and writer are all independent and uncaring about each other, and can be mixed and matched.  

That’s MOST of traject right there. It seems simple and obvious once you have it, but it can take a lot of work to end up with what’s simple and obvious in retrospect! 
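
Roughly, a minimal traject configuration file puts those three parts right next to each other: pick a reader and a writer in the settings, and write indexing rules in between. The setting and macro names below are as I recall them from traject’s documentation, so treat the details as illustrative:

# my_config.rb, run with: traject -c my_config.rb some_records.mrc

settings do
  provide "reader_class_name", "Traject::MarcReader"      # the reader
  provide "writer_class_name", "Traject::SolrJsonWriter"  # the writer
  provide "solr.url", "http://localhost:8983/solr/my_core"
end

# Indexing rules: transformation steps from source record to output Hash
to_field "id",      extract_marc("001", first: true)
to_field "title_t", extract_marc("245ab", trim_punctuation: true)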

When designing code I’m often reminded of the apocryphal quote: “I would have written a shorter letter, but I did not have the time”

And, to be fair, there’s a lot of complexity within that “indexing rules” step in traject, but its design was approached the same way. We had use cases about supporting configuration settings in a file or on the command line, or about allowing re-usable custom transformation logic – what’s the simplest possible architecture we can come up with to support those cases?

OK, again, that sounds nice, but how do you do it? I don’t have a paint-by-numbers answer, but I can say that for both these projects I took some time – a few weeks even – at the beginning to work out these architectures: lots of diagramming, some prototyping I was prepared to throw out, and in some cases “documentation-driven design”, where I wrote some docs for code I hadn’t written yet. For traject it was invaluable to have Bill Dueber at the University of Michigan also interested in spending some design time up front, bouncing ideas back and forth – to actually, intentionally go through an architectural design phase before the implementation.

Figuring out a good parsimonious architecture takes domain knowledge: What things your “industry” – other potential institutions — are going to want to do in this area, and specifically what developers are going to want to do with your tool. 


We’re maybe used to thinking of “use cases” in terms of end-users, but it can be useful at the architectural design stage to formalize this in terms of developer use cases. What is a developer going to want to do, and how can I come up with a small number of software pieces she can use to assemble together to do those things?

When we say “make simple things simple and complex things possible”, we can say that domain analysis and use cases are about identifying which things we’re going to put in either, or neither, of those categories.

The “simple thing” for bento_search, for instance, is just “do a simple keyword search in a search engine, and display the results, without having the calling code need to know anything about the specifics of that search engine.”

Another way to get a head start on solid domain knowledge is to start with another tool you have experience with, that you want to create a replacement for. Before traject, I and other users used a tool written in Java called SolrMarc — I knew how we had used it, and where we had hit roadblocks or things that we found harder or more complicated than we’d like, so I knew my goals were to make those things simpler.

We’re used to hearing arguments about avoiding rewrites, but like most things in software engineering, there can be pitfalls at either extreme.

I was amused to notice, Fred Brooks in the previously mentioned Mythical Man Month makes some arguments in both directions. 

Brooks famously warns about a “second-system effect”, the [quote] “tendency of small, elegant, and successful systems to be succeeded by over-engineered, bloated systems, due to inflated expectations and overconfidence” – one reason to be cautious of a rewrite. 

But Brooks in the very same book ALSO writes [quote] “In most projects, the first system built is barely usable….Hence plan to throw one away; you will, anyhow.”

It’s up to us to figure out when we’re in which case. I personally think an application is more likely to be bitten by the “second-system effect” danger of a rewrite, while a shared re-usable library is more likely to benefit from a rewrite (in part because a reusable library is harder to change in place without disruption!).

We could sum up a lot of different principles as variations of “keep it small”.

Both traject and bento_search are tools that developers can use to build something. Bento_search just puts search results in a box on a page; the developer is responsible for the page and an overall app. 

Yes, this means that you have to be a ruby developer to use it. Does this limit its audience? While we might aspire to make tools that even not-really-developers can just use out of the box, my experience has been that our open source attempts at shrinkwrapped “solutions” often end up still needing development expertise to successfully deploy. Keeping our tools simple and small, and not trying to supply a complete app, can actually leave more time for these developers to focus on meeting local needs, instead of fighting with a complicated framework that doesn’t do quite what they need.

It also means we can limit interactions with any external dependencies. Traject was developed for use with a Blacklight project, but traject code does not refer to Blacklight or even Rails at all, which means new releases of Blacklight or Rails can’t possibly break traject. 

Bento_search, by doing one thing and not caring about the details of its host application, has kept working from Rails 3.2 all the way up to the current Rails 6.1, with pretty much no changes needed except to the test suite setup.

Sometimes when people try to have lots of small tools working together, it can turn into a nightmare where you get a pile of cascading software breakages every time one piece changes. Keeping assumptions and couplings down is what lets us avoid this maintenance nightmare. 

And another way of keeping it small is: don’t be afraid to say “no” to features when you can’t figure out how to fit them in without serious harm to the parsimony of your architecture. Your domain knowledge is what lets you take an educated guess as to which features are core to your audience and need to be accommodated, and which are edge cases that can be fulfilled by extension points, or sometimes not at all.

By extension points we mean that we prefer opportunities for developer-users to write their own code that works with your tools, rather than trying to build less commonly needed features in as configurable features.

As an example, traject does include some built-in logic, but one of its extension point use cases is making sure it’s simple to add whatever transformation logic a developer-user wants, and have it look just as “built-in” as what came with traject. And since traject makes it easy to write your own reader or writer, its built-in readers and writers don’t need to include every possible feature – we plan for developers writing their own if they need something else.
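
For instance, custom transformation logic drops right into the same configuration file and looks just like the built-in macros. A sketch (the field names and the example macro are made up for illustration):

# An inline rule: any block taking (record, accumulator) works
to_field "local_id" do |record, accumulator|
  accumulator << "local-#{record['001'].value}" if record['001']
end

# Re-usable logic packaged as a macro: a method that returns a lambda with
# the same signature, extended into the configuration.
module MyMacros
  def upcased_marc(tag)
    lambda do |record, accumulator|
      field = record[tag]
      accumulator << field.value.upcase if field
    end
  end
end
extend MyMacros

to_field "shouty_id", upcased_marc("001")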

Looking at bento_search, it makes it easy to write your own search engine adapter that will be usable interchangeably with the built-in ones. Also, bento_search provides a standard way to add custom search arguments specific to a particular adapter – these won’t be directly interchangeable with other adapters, but they are provided for in the architecture, and won’t break in future bento_search releases – it’s another form of extension point.
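
Writing such an adapter looks roughly like the sketch below, from memory; the module and method names may not exactly match the gem’s documented API, so treat this as the shape of the thing rather than a recipe:

class MyLocalCatalogEngine
  include BentoSearch::SearchEngine

  # Receives normalized search arguments, returns a results object filled
  # with items using the common metadata vocabulary.
  def search_implementation(args)
    results = BentoSearch::Results.new
    # ... call the local service with args[:query], then map each hit ...
    item = BentoSearch::ResultItem.new
    item.title = "An example hit"
    results << item
    results
  end
end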

These extension points are the second half of “simple things simple, complex things possible” – the complex things possible. Planning for them is part of understanding your developer use cases, and designing an architecture that can easily handle them. Ideally, it takes no extra layers of abstraction to handle them: you are using the exact architectural join points the out-of-the-box code is using, just supplying custom components.

So here’s an example of how these things worked out in practice with traject, pretty well I think.

Stanford ended up writing a package of extensions to traject called TrajectPlus, to take care of some features they needed that traject didn’t provide. Commit history suggests it was written in 2017, which was Traject 2.0 days.  

I can’t recall, but I’d guess they approached me with change requests to traject at that time and I put them off because I couldn’t figure out how to fit them in parsimoniously, or didn’t have time to figure it out. 

But the fact that they were *able* to extend traject in this way I consider a validation of traject’s architecture, that they could make it do what they needed, without much coordination with me, and use it in many projects (I think beyond just Stanford). 

Much of the 3.0 release of traject was back-porting some features that TrajectPlus had implemented, including out-of-the-box support for XML sources. But I didn’t always do them with the same implementation or API as TrajectPlus – this is another example of being able to use a second go at it to figure out how to do something even more parsimoniously, sometimes figuring out small changes to traject’s architecture to support flexibility in the right dimensions.

When traject 3.0 came out, the TrajectPlus users didn’t necessarily want to retrofit all their code to the new traject way of doing it. But TrajectPlus could still be used with traject 3.0 with few or possibly no changes, doing things the old way; they weren’t forced to upgrade to the new way. This is a huge win for traject’s backwards compatibility – everyone was able to do what they needed to do, even taking separate paths, with relatively minimized maintenance work.

As I think about these things philosophically, one of my takeaways is that software engineering is still a craft – and software design is a serious thing to be studied and engaged in. Especially for shared libraries rather than local apps, it’s not always to be dismissed as so-called “bike-shedding”.

It’s worth it to take time to think about design, self-reflectively and with your peers, instead of just rushing to put out fires or deliver features; it will reduce maintenance costs and increase value over the long term.

And I want to just briefly plug “kithe”, a project of mine which tries to be guided by these design goals to create a small focused toolkit for building Digital Collections applications in Rails. 

I could easily talk about all of this for another twenty minutes, but that’s our time! I’m always happy to talk more; find me on slack or IRC or email.

This last slide has some sources mentioned in the talk. Thanks for your time! 

Product management

In my career working in the academic sector, I have realized that one thing that is often missing from in-house software development is “product management.”

But what does that mean exactly? You don’t know it’s missing if you don’t even realize it’s a thing, and people can use different terms to mean different roles/responsibilities.

Basically, deciding what the software should do. This is not about colors on screen or margins (which our stakeholders often enjoy micro-managing) — I’d consider those still the how of doing it, rather than the what to do. The what is often at a much higher level, about what features or components to develop at all.

When done right, it is going to be based on both knowledge of end-users’ needs and preferences (user research), and knowledge of internal stakeholders’ desires and preferences (overall organizational strategy, but also just practically what is going to make the right people happy to keep us resourced). Also knowledge of the local capacity: what pieces do we need to put in place to get these things developed? When done seriously, it will necessarily involve prioritization — there are many things we could possibly do, some subset of them we very well may do eventually, but which ones should we do now?

My experience tells me it is a very big mistake to try to have a developer doing this kind of product management. Not because a developer can’t have the right skillset to do it, but because having the same person leading development and product management is a mistake. The developer is too close to the development lens, and there’s just a clarification that happens when these roles are separate.

My experience also tells me that it’s a mistake to have a committee doing these things, much as that is popular in the academic sector. Because, well, just of course it is.

But okay this is all still pretty abstract. Things might become more clear if we get more specific about the actual tasks and work of this kind of product management role.

I found Damilola Ajiboye’s blog post on “Product Manager vs Product Marketing Manager vs Product Owner” very clear and helpful here. It is written so as to distinguish between three different product-management-related roles, but Ajiboye also acknowledges that in a smaller organization “a product manager is often tasked with the duty of these 3 roles.”

Regardless of whether the responsibilities are to be done by one or two or three people, Ajiboye’s post serves as a concise listing of the work to be done in managing a product — deciding the what of the product, in an ongoing, iterative, and collaborative manner, so that developers and designers can get to the how and to implementation.

I recommend reading the whole article, and I’ll excerpt much of it here, slightly rearranged.

The Product Manager

These individuals are often referred to as mini CEOs of a product. They conduct customer surveys to figure out the customer’s pain and build solutions to address it. The PM also prioritizes what features are to be built next and prepares and manages a cohesive and digital product roadmap and strategy.

The Product Manager will interface with the users through user interviews/feedback surveys or other means to hear directly from the users. They will come up with hypotheses alongside the team and validate them through prototyping and user testing. They will then create a strategy on the feature and align the team and stakeholders around it. The PM who is also the chief custodian of the entire product roadmap will, therefore, be tasked with the duty of prioritization. Before going ahead to carry out research and strategy, they will have to convince the stakeholders if it is a good choice to build the feature in context at that particular time or wait a bit longer based on the content of the roadmap.

The Product Marketing Manager
The PMM communicates vital product value — the “why”, “what” and “when” of a product to intending buyers. He manages the go-to-market strategy/roadmap and also oversees the pricing model of the product. The primary goal of a PMM is to create demand for the products through effective messaging and marketing programs so that the product has a shorter sales cycle and higher revenue.

The product marketing manager is tasked with market feasibility and discovering if the features being built align with the company’s sales and revenue plan for the period. They also make research on how sought-after the feature is being anticipated and how it will impact the budget. They communicate the values of the feature; the why, what, and when to potential buyers — In this case users in countries with poor internet connection.

[While expressed in terms of a for-profit enterprise selling something, I think it’s not hard to translate this to a non-profit or academic environment. You still have an audience whose uptake you need in order to be successful, whether internal or external. — jrochkind]

The Product Owner
A product owner (PO) maximizes the value of a product through the creation and management of the product backlog, creation of user stories for the development team. The product owner is the customer’s representative to the development team. He addresses customer’s pain points by managing and prioritizing a visible product backlog. The PO is the first point of call when the development team needs clarity about interpreting a product feature to be implemented.

The product owner will first have to prioritize the backlog to see if there are no important tasks to be executed and if this new feature is worth leaving whatever is being built currently. They will also consider the development effort required to build the feature i.e the time, tools, and skill set that will be required. They will be the one to tell if the expertise of the current developers is enough or if more engineers or designers are needed to be able to deliver at the scheduled time. The product owner is also armed with the task of interpreting the product/feature requirements for the development team. They serve as the interface between the stakeholders and the development team.

When you have someone(s) doing these roles well, it ensures that the development team is actually spending time on things that meet user and business needs. I have found that it makes things so much less stressful and more rewarding for everyone involved.

When you have nobody doing these roles, or someone doing it in a cursory or unintentional way not recognized as part of their core job responsibilities, or a lead developer trying to do it on top of development, I find it leads to feelings of: spinning wheels, everything-is-an-emergency, lack of appreciation, miscommunication and lack of shared understanding between stakeholders and developers, general burnout and dissatisfaction — and at the root, a product that is not meeting user or business needs well, leading to these inter-personal and personal problems.

Rails auto-scaling on Heroku

We are investigating moving our medium-small-ish Rails app to heroku.

We looked at both the Rails Autoscale add-on available on the heroku marketplace, and the hirefire.io service, which is not listed on the heroku marketplace and which I almost didn’t realize existed.

I guess hirefire.io doesn’t have any kind of partnership with heroku, but it still uses the heroku API to provide an autoscale service. hirefire.io ended up looking more fully-featured and lower-priced than Rails Autoscale; so the main purpose of this post is just to increase the visibility of hirefire.io, and therefore competition in the field, which benefits us consumers.

Background: Interest in auto-scaling Rails background jobs

At first I didn’t realize there was such a thing as “auto-scaling” on heroku, but once I did, I realized it could indeed save us lots of money.

I am more interested in scaling Rails background workers than I am in web workers, though — our background workers are busiest when we are doing “ingests” into our digital collections/digital asset management system, so the work is highly variable. Auto-scaling up to more workers when there is ingest work piling up can give us really nice ingest throughput while keeping costs low.

On the other hand, our web traffic is fairly low and probably isn’t going to go up by an order of magnitude (non-profit cultural institution here). And after discovering that a “standard” dyno is just too slow, we will likely be running a performance-m or performance-l anyway — which can likely handle all anticipated traffic on its own. If we have an auto-scaling solution, we might configure it for web dynos, but we are especially interested in good features for background scaling.

There is a heroku built-in autoscale feature, but it only works for performance dynos, and won’t do anything for Rails background job dynos, so that was right out.

One option that could work for Rails bg jobs is the Rails Autoscale add-on on the heroku marketplace; and then we found hirefire.io.

Pricing: Pretty different

hirefire

As of now (January 2021), hirefire.io has pretty simple and affordable pricing: $15/month per heroku application, auto-scaling as many dynos and process types as you like.

hirefire.io by default can only check your app’s metrics to decide whether a scaling event should occur once per minute. If you want more frequent checks than that (up to once every 15 seconds), you have to pay an additional $10/month, for $25/month per heroku application.

Even though it is not a heroku add-on, hirefire does advertise that it bills pro-rated to the second, just like heroku and heroku add-ons.

Rails autoscale

Rails Autoscale has a more tiered approach to pricing, based on the number and type of dynos you are scaling. Starting at $9/month for 1-3 standard dynos, the next tier up is $39 for up to 9 standard dynos, all the way up to $279 (!) for 1 to 99 dynos. If you have performance dynos involved, it’s from $39/month for 1-3 performance dynos, up to $599/month for up to 99 performance dynos.

For our anticipated uses… if we only scale bg dynos, I might want to scale from (low) 1 or 2 to (high) 5 or 6 standard dynos, so we’d be at $39/month. Our web dynos are likely to be performance dynos, and I wouldn’t want/need to scale more than probably 2, but that puts us into the performance dyno tier, so we’re looking at $99/month.

This is of course significantly more expensive than hirefire.io’s flat rate.

Metric Resolution

Since hirefire has an additional charge for finer-than-1-minute resolution on checks for autoscaling, we’ll discuss resolution here in this section too. Rails Autoscale has the same resolution for all tiers, and I think it’s generally 10 seconds, so approximately the same as hirefire if you pay the extra $10 for increased resolution.

Configuration

Let’s look at configuration screens to get a sense of feature-sets.

Rails Autoscale

web dynos

To configure web dynos, here’s what you get, with default values:

The metric Rails Autoscale uses for scaling web dynos is time in heroku routing queue, which seems right to me — when things are spending longer in heroku routing queue before getting to a dyno, it means scale up.

worker dynos

For scaling worker dynos, Rails Autoscale can scale the dyno type named “worker” — it can understand the ruby queuing libraries Sidekiq, Resque, Delayed Job, or Que. I’m not certain if there are options for writing custom adapter code for other backends.

Here’s what the configuration options are — sorry these aren’t the defaults, I’ve already customized them and lost track of what defaults are.

You can see that worker dynos are scaled based on the metric “number of jobs queued”, and you can tell it to only pay attention to certain queues if you want.

Hirefire

Hirefire has far more options for customization than Rails Autoscale, which can make it a bit overwhelming, but also potentially more powerful.

web dynos

You can actually configure autoscaling for as many Heroku process types as you have, not just the ones named “web” and “worker”. And for each, you have your choice of several metrics to be used as scaling triggers.

For web, I think Queue Time (percentile, average) matches what Rails Autoscale does, configured to percentile, 95, and is probably the best to use unless you have a reason to use another. (“Rails Autoscale tracks the 95th percentile queue time, which for most applications will hover well below the default threshold of 100ms.“)

Here’s what configuration Hirefire makes available if you are scaling on “queue time” like Rails Autoscale; the configuration may vary for other metrics.

I think if you fill in the right numbers, you can configure to work equivalently to Rails Autoscale.

worker dynos

If you have more than one heroku process type for workers — say, working on different queues — Hirefire can scale them independently, with entirely separate configuration. This is pretty handy, and I don’t think Rails Autoscale offers this. (Update: I may be wrong, Rails Autoscale says they do support this, so check on it yourself if it matters to you.)

For worker dynos, you could choose to scale based on actual “dyno load”, but I think this is probably mostly for types of processes where there isn’t the ability to look at “number of jobs”. A “number of jobs in queue” like Rails Autoscale does makes a lot more sense to me as an effective metric for scaling queue-based bg workers.

Hirefire’s metric is slightly different than Rails Autoscale’s “jobs in queue”. For recognized ruby queue systems (a larger list than Rails Autoscale’s, and you can write your own custom adapter for whatever you like), it actually measures jobs in queue plus workers currently busy. So queued plus in-progress, rather than Rails Autoscale’s just queued. I actually have a bit of trouble wrapping my head around the implications of this, but basically, it means that Hirefire’s “jobs in queue” metric strategy is intended to try to scale all the way to emptying your queue, or reaching your max scale limit, whichever comes first. I think this may make sense and work out at least as well or perhaps better than Rails Autoscale’s approach?

Here’s what configuration Hirefire makes available for worker dynos scaling on “job queue” metric.

Since the metric isn’t the same as Rails Autoscale’s, we can’t configure this to work identically. But there are a whole bunch of configuration options, some similar to Rails Autoscale’s.

The most important thing here is that “Ratio” configuration. It may not be obvious, but with the way the hirefire metric works, you are basically meant to configure this to equal the number of workers/threads you have on each dyno. I have it configured to 3 because my heroku worker processes use resque, with resque_pool configured to run 3 resque workers on each dyno. If you use sidekiq, set the ratio to your configured concurrency — or if you are running more than one sidekiq process, processes * concurrency. Basically, how many jobs your dyno can work concurrently is what you should normally set for ‘ratio’.
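
Here is some illustrative arithmetic (my own sketch, not hirefire’s actual code) for how the queued-plus-working metric and the ratio interact:

jobs_queued  = 12                           # waiting in the queue
jobs_working = 3                            # currently being worked
metric       = jobs_queued + jobs_working   # hirefire-style metric, 15

resque_workers_per_dyno = 3                 # e.g. resque-pool running 3 workers
ratio = resque_workers_per_dyno             # jobs one dyno can work at once

# Scaling toward emptying the queue means wanting roughly:
target_dynos = (metric.to_f / ratio).ceil   # 5
# With sidekiq, ratio would be processes * concurrency instead.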

Hirefire not a heroku plugin

Hirefire isn’t actually a heroku plugin. In addition to that meaning separate invoicing, there can be some other inconveniences.

Since hirefire can only interact with the heroku API, for some metrics (including the “queue time” metric that is probably optimal for web dyno scaling) you have to configure your app to log regular statistics to heroku’s “Logplex” system. This can add a lot of noise to your log, and for heroku logging add-ons that are tiered based on number of log lines or bytes, it can push you up to higher pricing tiers.

If you use papertrail, I think you should be able to use its log filtering feature to solve this, keeping that noise out of your logs and avoiding an impact on log data transfer limits. However, if you ever have cause to look at heroku’s raw logs, that noise will still be there.

Support and Docs

I asked a couple questions of both Hirefire and Rails Autoscale as part of my evaluation, and got back well-informed and easy-to-understand answers quickly from both. Support for both seems to be great.

I would say the documentation is decent-but-not-exhaustive for both products. Hirefire may have slightly more complete documentation.

Other Features?

There are other things you might want to compare, various kinds of observability (bar chart or graph of dynos or observed metrics) and notification. I don’t have time to get into the details (and didn’t actually spend much time exploring them to evaluate), but they seem to offer roughly similar features.

Conclusion

Rails Autoscale is quite a bit more expensive than hirefire.io’s flat rate, once you get past Rails Autoscale’s most basic tier (scaling no more than 3 standard dynos).

It’s true that autoscaling saves you money over not autoscaling, so even an expensive price could be considered a ‘cut’ of that, and possibly for many ecommerce sites even $99 a month might be a drop in the bucket (!)…. but this price difference is so significant with hirefire (which has a flat rate regardless of dynos) that it seems to me it would take a lot of additional features/value to justify.

And it’s not clear that Rails Autoscale has any feature advantage. In general, hirefire.io seems to have more features and flexibility.

Until 2021, hirefire.io could only analyze metrics with 1-minute resolution, so perhaps Rails Autoscale’s finer resolution was a “killer feature”?

Honestly I wonder if this price difference is sustained by Rails Autoscale only because most customers aren’t aware of hirefire.io, it not being listed on the heroku marketplace? Single-invoice billing is handy, but probably not worth $80+ a month. I guess hirefire’s logplex noise is a bit inconvenient?

Or is there something else I’m missing? Pricing competition is good for the consumer.

And are there any other heroku autoscale solutions, that can handle Rails bg job dynos, that I still don’t know about?

Update, a day after writing: djcp on a reddit thread writes:

I used to be a principal engineer for the heroku add-ons program.

One issue with hirefire is they request account level oauth tokens that essentially give them ability to do anything with your apps, where Rails Autoscaling worked with us to create a partnership and integrate with our “official” add-on APIs that limits security concerns and are scoped to the application that’s being scaled.

Part of the reason for hirefire working the way it does is historical, but we’ve supported the endpoints they need to scale for “official” partners for years now.

A lot of heroku customers use hirefire so please don’t think I’m spreading FUD, but you should be aware you’re giving a third party very broad rights to do things to your apps. They probably won’t, of course, but what if there’s a compromise?

“Official” add-on providers are given limited scoped tokens to (mostly) only the actions / endpoints they need, minimizing blast radius if they do get compromised.

You can read some more discussion at that thread.

Managed Solr SaaS Options

I was recently looking for managed Solr “software-as-a-service” (SaaS) options, and had trouble figuring out what was out there. So I figured I’d share what I learned, even though my knowledge here is far from exhaustive and I have only looked seriously at one of the options I found.

The only managed Solr options I found were: WebSolr; SearchStax; and OpenSolr.

Of these, I think WebSolr and SearchStax are more well-known; I couldn’t find anyone with experience with OpenSolr, which is perhaps newer.

Of them all, SearchStax is the only one I actually took for a test drive, so I will have the most to say about it.

Why we were looking

We run a fairly small-scale app, whose infrastructure is currently 4 self-managed AWS EC2 instances, running respectively: 1) A rails web app 2) Bg workers for the rails web app 3) Postgres, and 4) Solr.

Oh yeah, there’s also a redis running on one of those servers — on #3 with pg or #4 with solr, I forget.

Currently we manage this all ourselves, right on the EC2. But we’re looking to move as much as we can into “managed” servers. Perhaps we’ll move to Heroku. Perhaps we’ll use hatchbox. Or if we do stay on AWS resources we manage directly, we’d look at things like using an AWS RDS Postgres instead of installing it on an EC2 ourselves, an AWS ElastiCache for Redis, maybe look into Elastic Beanstalk, etc.

But no matter what we do, we need a Solr, and we’d like to get it managed. Hatchbox has no special Solr support, AWS doesn’t have a Solr service, and Heroku does have a solr add-on — but you can also use any Solr with heroku, and we’ll get to that later.

Our current Solr use is pretty small scale. We don’t run “SolrCloud mode”, just legacy ordinary Solr. We only have around 10,000 documents in there (tiny for Solr), and our index size is only 70MB. Our traffic is pretty low — when I tried to figure out how low, it turned out we don’t have sufficient logging turned on to answer that specifically, but using proxy metrics to guess, I’d say 20K-40K requests a day, query as well as add.

This is a pretty small Solr installation, although it is used centrally for the primary functions of the (fairly low-traffic) app. It currently runs on an EC2 t3a.small, which is a “burstable” EC2 type with only 2G of RAM. It does have two vCPUs (that is, one core with ‘hyperthreading’). The t3a.small EC2 instance only costs $14/month at the on-demand price! We know we’ll be paying more for managed Solr, but we want to get out of the business of managing servers — we no longer really have the staff for it.

WebSolr (didn’t actually try out)

WebSolr is the only managed Solr currently listed as a Heroku add-on. It is also available as a managed Solr independent of heroku.

The pricing in the heroku plans vs the independent plans seems about the same. As a heroku add-on there is a $20 “staging” plan that doesn’t exist in the independent plans. (Unlike some other heroku add-ons, no time-limited free plan is available for WebSolr). But once we go up from there, the plans seem to line up.

Starting at: $59/month for:

  • 1 million document limit
  • 40K requests/day
  • 1 index
  • 954MB storage
  • 5 concurrent requests limit (this limit is not mentioned on the independent pricing page?)

Next level up is $189/month for:

  • 5 million document limit
  • 150K requests/day
  • 4.6GB storage
  • 10 concurrent request limit (again concurrent request limits aren’t mentioned on independent pricing page)

As you can see, WebSolr has their plans metered by usage.

$59/month is around the price range we were hoping for (we’ll need two, one for staging and one for production). Our small solr is well under 1 million documents and ~1GB storage, and we do only use one index at present. However, the 40K requests/day limit I’m not sure about; even if we fit under it, we might be pushing up against it.

And the “concurrent request” limit simply isn’t one I’m even used to thinking about. On a self-managed Solr it hasn’t really come up. What does “concurrent” mean exactly in this case, and how is it measured? With 10 puma web workers and sometimes a possibly multi-threaded batch index going on, could we exceed that limit? Seems plausible. What happens when it is exceeded? Your Solr request results in an HTTP 429 error!

Do I need to now write the app to rescue those gracefully, or use connection pooling to try to avoid them, or something? Having to rewrite the way our app functions for a particular managed solr is the last thing we want to do. (Although it’s not entirely clear if those connection limits exist on the non-heroku-plugin plans, I suspect they do?).

And in general, I’m not thrilled with the way the pricing works here, or with the price points. I am positive that for a lot of (eg) heroku customers an additional $189*2=$378/month is peanuts, not even worth accounting for; but for us, a small non-profit whose app’s traffic does not scale with revenue, that starts to be real money.

It is not clear to me if WebSolr installations (at “standard” plans) are set up in “SolrCloud mode” or not; I’m not sure what APIs exist for uploading your custom schema.xml (which we’d need to do), or if they expect you to do this only manually through a web UI (that would not be good); and I’m not sure if you can upload custom solrconfig.xml settings (this may be running on a shared solr instance with a standard solrconfig.xml?).

Basically, all of this made WebSolr not the first one we looked at.

Does it matter if we’re on heroku using a managed Solr that’s not a Heroku plugin?

I don’t think so.

In some cases, you can get a better price from a Heroku plug-in than you could get from that same vendor off heroku, or from other competitors. But that doesn’t seem to be the case here, and other than that, does it matter?

Well, all heroku plug-ins are required to bill you by-the-minute, which is nice but not really crucial, other forms of billing could also be okay at the right price.

With a heroku add-on, your billing is combined into one heroku invoice, no need to give a credit card to anyone else, and it can be tracked using heroku tools. Which is certainly convenient and a plus, but not essential if the best tool for the job is not a heroku add-on.

And as a heroku add-on, WebSolr provides a WEBSOLR_URL heroku config/env variable automatically to code running on heroku. OK, that’s kind of nice, but it’s not a big deal to set a SOLR_URL heroku config manually referencing the appropriate address. I suppose as a heroku add-on, WebSolr also takes care of securing and authenticating connections between the heroku dynos and the solr, so we need to make sure we have a reasonable way to do this from any alternative.

SearchStax (did take it for a spin)

SearchStax’s pricing tiers are not based on metering usage. There are no limits based on requests/day or concurrent connections. SearchStax runs on dedicated-to-you individual Solr instances (I would guess running on dedicated-to-you individual (eg) EC2, but I’m not sure). Instead the pricing is based on size of host running Solr.

You can choose to run on instances deployed to AWS, Google Cloud, or Azure. We’ll be sticking to AWS (the others, I think, have a slight price premium).

While SearchStax gives you pricing pages that look like the “new-way-of-doing-things” transparent pricing, in fact there isn’t really enough info on the public pages to see all the price points and understand what you’re getting; there is still a kind of “talk to a salesperson who has a price sheet” thing going on.

What I think I have figured out from talking to a salesperson and support is that the “Silver” plans (“Starting at $19 a month”, although we’ll say more about that in a bit) are basically: we give you a Solr, we don’t provide any technical support for Solr.

While the “Gold” plans “from $549/month” are actually about paying for Solr consultants to set up and tune your schema/index etc. That is not something we need, and $549+/month is way more than the price range we are looking for.

While the SearchStax pricing/plan pages kind of imply the “Silver” plan is not suitable for production, in fact there is no real reason not to use it for production I think, and the salesperson I talked to confirmed that — just reaffirming that you are on your own managing the Solr configuration/setup. That’s fine, that’s what we want; we just don’t want to manage the OS or set up the Solr or upgrade it, etc. The Silver plans have no SLA, but as far as I can tell their uptime is just fine. The Silver plans only guarantee a 72-hour support response time — but for the couple support tickets I filed asking questions while under a free 14-day trial (oh yeah, that’s available), I got prompt same-day responses, and knowledgeable responses that answered my questions.

So a “silver” plan is what we are interested in, but the pricing is not actually transparent.

$19/month is for the smallest instance available, and only IF you prepay/contract for a year. They call that small instance an NDN1, and it has 1GB of RAM and 8GB of storage. If you pay as you go instead of contracting for a year, that already jumps to $40/month. (That price is available on the trial page.)

When you are paying-as-you-go, you are actually billed per-day, which might not be as nice as heroku’s per-minute, but it’s pretty okay, and useful if you need to bring up a temporary solr instance as part of a migration/upgrade or something like that.

The next step up is an “NDN2”, which has 2G of RAM and 16GB of storage, at ~$80/month pay-as-you-go — you can find that price if you sign up for a free trial. The price for an annual contract is a discount similar to the NDN1’s 50%, at $40/month — that price I got only from a salesperson, so I don’t know if it’s always stable.

It only occurs to me now that they don’t tell you how many CPUs are available.

I’m not sure if I can fit our Solr in the 1G NDN1, but I am sure I can fit it in the 2G NDN2 with some headroom, so I didn’t look at plans above that — but they are available, still under “silver”, with prices going up accordingly.

All SearchStax solr instances run in “SolrCloud” mode — these NDN1 and NDN2 ones we’re looking at just run one node with one zookeeper, but still in cloud mode. There are also “silver” plans available with more than one node in a “high availability” configuration, but the prices start going up steeply, and we weren’t really interested in that.

Because it’s SolrCloud mode though, you can use the standard Solr API for uploading your configuration. It’s just Solr! So no arbitrary usage limits, no features disabled.
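
For instance, a custom configset can be uploaded with Solr’s standard Configset API. Here is a sketch using plain net/http; the host, credentials, configset name, and file name are made-up placeholders:

require "net/http"
require "uri"

uri = URI("https://solr.example.com/solr/admin/configs?action=UPLOAD&name=my_config")
request = Net::HTTP::Post.new(uri)
request.basic_auth("solr_user", "solr_password")       # HTTP Basic Auth, as SearchStax supports
request["Content-Type"] = "application/octet-stream"
request.body = File.binread("my_config.zip")           # zip containing solrconfig.xml, schema, etc.

response = Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(request) }
puts response.code   # expect 200 on success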

The SearchStax web console seems competently implemented; it lets you create and delete individual Solr “deployments”, manage accounts to log in to the console (on the “silver” plan you only get two, or you can pay $10/month/account for more — nah), and set up auth for a solr deployment. They support IP-based authentication or HTTP Basic Auth to the Solr (with no limit to how many Solr Basic Auth accounts you can create). HTTP Basic Auth is great for us, because trying to do IP-based auth from somewhere like heroku isn’t going to work. All Solrs are available over HTTPS/SSL — great!

SearchStax also has its own proprietary HTTP API that lets you do most anything, including creating/destroying deployments and managing Solr basic auth users — basically everything. There is some API that duplicates the SolrCloud API for adding configsets; I don’t think there’s a good reason to use it instead of the standard SolrCloud API, although their docs try to point you to it. There are even webhooks of some kind for alerts! (Which I haven’t really explored.)

Basically, SearchStax just seems to be a sane and rational managed Solr option; it has all the features you’d expect/need/want for dealing with such a service. The prices seem reasonable-ish, and generally more affordable than WebSolr, especially if you stay in “silver” and “one node”.

At present, we plan to move forward with it.

OpenSolr (didn’t look at it much)

I have the least to say about this one, having spent the least time with it, after spending time with SearchStax and seeing that it met our needs. But I wanted to make sure to mention it, because it’s the only other managed Solr I am even aware of. Definitely curious to hear from any users.

Here is the pricing page.

The prices seem pretty decent, perhaps even cheaper than SearchStax, although it’s unclear to me what you get. Does “0 Solr Clusters” mean that it’s not SolrCloud mode? After seeing how useful SolrCloud APIs are for management (and having this confirmed by many of my peers in other libraries/museums/archives who choose to run SolrCloud), I wouldn’t want to do without it. So I guess that pushes us to “executive” tier? Which at $50/month (billed yearly!) is still just fine, around the same as SearchStax.

But they do limit you to one solr index; I prefer SearchStax’s model of just giving you certain host resources and letting you do what you want with them. It does say “shared infrastructure”.

Might be worth investigating, curious to hear more from anyone who did.

Now, what about ElasticSearch?

We’re using Solr mostly because that’s what various collaborative and open source projects in the library/museum/archive world have been doing for years, since before ElasticSearch even existed. So there are various open source libraries and toolsets available that we’re using.

But for whatever reason, there seem to be SO MANY MORE managed ElasticSearch SaaS options available, at possibly much cheaper price points. Is this because the ElasticSearch market is just bigger? Or is ElasticSearch easier/cheaper to run in a SaaS environment? Or what? I don’t know.

But there’s the controversial AWS ElasticSearch Service; there’s Elastic Cloud, “from the creators of ElasticSearch”. On Heroku, which lists one Solr add-on, there are THREE ElasticSearch add-ons listed: ElasticCloud, Bonsai ElasticSearch, and SearchBox ElasticSearch.

If you just google “managed ElasticSearch” you immediately see 3 or 4 other names.

I don’t know enough about ElasticSearch to evaluate them. On first glance at pricing pages they seem more affordable, but I may not know what I’m comparing, and may be looking at tiers that aren’t actually usable for anything or that have hidden fees.

But I know there are definitely many more managed ElasticSearch SaaS than Solr.

I think ElasticSearch probably does everything our app needs. If I were to start from scratch, I would definitely consider ElasticSearch over Solr just based on how many more SaaS options there are. While it would require some knowledge-building (I have developed a lot of knowledge of Solr and zero of ElasticSearch) and rewriting some parts of our stack, I might still consider switching to ES in the future; we don’t do anything too too complicated with Solr that would be too too hard to switch to ES, probably.

Gem authors, check your release sizes

Most gems should probably be a couple hundred kb at most. I’m talking about the package actually stored in and downloaded from rubygems by an app using the gem.

After all, source code is just text, and it doesn’t take up much space. OK, maybe some gems have a couple images in there.

But if you are looking at your gem in rubygems and realize that it’s 10MB or bigger… and that it seems to be getting bigger with every release… something is probably wrong and worth looking into.

One way to look into it is to look at the actual gem package. If you use the handy bundler rake task to release your gem (and I recommend it), you have a ./pkg directory in your source you last released from. Inside it are “.gem” files for each release you’ve made from there, unless you’ve cleaned it up recently.

.gem files are just .tar files, it turns out, that have more tar and gz files inside them, etc. We can go into one, extract the contents, and use the handy unix utility du -sh to see what is taking up all the space.

How I found the bytes

jrochkind-chf kithe (master ?) $ cd pkg

jrochkind-chf pkg (master ?) $ ls
kithe-2.0.0.beta1.gem        kithe-2.0.0.pre.rc1.gem
kithe-2.0.0.gem            kithe-2.0.1.gem
kithe-2.0.0.pre.beta1.gem    kithe-2.0.2.gem

jrochkind-chf pkg (master ?) $ mkdir exploded

jrochkind-chf pkg (master ?) $ cp kithe-2.0.0.gem exploded/kithe-2.0.0.tar

jrochkind-chf pkg (master ?) $ cd exploded

jrochkind-chf exploded (master ?) $ tar -xvf kithe-2.0.0.tar
 x metadata.gz
 x data.tar.gz
 x checksums.yaml.gz

jrochkind-chf exploded (master ?) $  mkdir unpacked_data_tar

jrochkind-chf exploded (master ?) $ tar -xvf data.tar.gz -C unpacked_data_tar/

jrochkind-chf exploded (master ?) $ cd unpacked_data_tar/
/Users/jrochkind/code/kithe/pkg/exploded/unpacked_data_tar

jrochkind-chf unpacked_data_tar (master ?) $ du -sh *
 4.0K    MIT-LICENSE
  12K    README.md
 4.0K    Rakefile
 160K    app
 8.0K    config
  32K    db
 100K    lib
 300M    spec

jrochkind-chf unpacked_data_tar (master ?) $ cd spec

jrochkind-chf spec (master ?) $ du -sh *
 8.0K    derivative_transformers
 300M    dummy
  12K    factories
  24K    indexing
  72K    models
 4.0K    rails_helper.rb
  44K    shrine
  12K    simple_form_enhancements
 8.0K    spec_helper.rb
 188K    test_support
 4.0K    validators

jrochkind-chf spec (master ?) $ cd dummy/

jrochkind-chf dummy (master ?) $ du -sh *
 4.0K    Rakefile
  56K    app
  24K    bin
 124K    config
 4.0K    config.ru
 8.0K    db
 300M    log
 4.0K    package.json
  12K    public
 4.0K    tmp

Doh! In this particular gem, I have a dummy rails app, and it has 300MB of logs, because I haven’t bothered trimming them in a while, and they are winding up included in the gem release package distributed to rubygems and downloaded by all consumers! Even if they were small, I don’t want these in the released gem package at all!

That’s not good! It only turns into 12MB instead of 300MB, because log files are so compressible and there is compression involved in assembling the rubygems package. But I have no idea how much space it’s actually taking up on consuming applications’ machines. This is very irresponsible!

What controls what files are included in the gem package?

Your .gemspec file, of course. The line s.files = is set to an array of every file to include in the gem package. Well, plus s.test_files is another array of files that aren’t supposed to be necessary to run the gem, but are needed to test it.

(Rubygems was set up to allow automated *testing* of gems after download, which is why test files are included in the release package. I am not sure how useful this is, or who, if anyone, does it; although I believe some linux distro packagers try to make use of it, for better or worse.)

But nobody wants to list every file in their gem individually, manually editing the array every time you add, remove, or move one. Fortunately, gemspec files are executable ruby code, so you can use ruby as a shortcut.

I have seen two main ways of doing this, with different “gem skeleton generators” taking one of two approaches.

Sometimes a shell out to git is used — the idea is that everything you have checked into your git repo should be in the gem release package, no more and no less. For instance, one of my gems has this in it; I’m not sure where it came from or who/what generated it.

spec.files = `git ls-files -z`.split("\x0").reject do |f|
 f.match(%r{^(test|spec|features)/})
end

In that case, it wouldn’t have included anything in ./spec already, so this obviously isn’t actually the gem we were looking at before.

But with this approach, in addition to being able to use ruby logic to manipulate the results, nothing excluded by your .gitignore file will end up included in your gem package. Great!

In kithe, which we were looking at before, those log files were in the .gitignore (they weren’t in my repo!), so if I had been using that git-shellout technique, they wouldn’t have been included in the gem release package.

But… I wasn’t. Instead this gem has a gemspec that looks like:

s.test_files = Dir["spec/*/"]

Just include every single file inside ./spec in the test_files list. Oops. Then I get all those log files!

One way to fix

I don’t really know whether the git-shellout approach or the dir-glob approach is to be preferred. I suspect it is the subject of historical religious wars in rubydom, from when there were still more people around to argue about such things. Any opinions? Or another approach?

Without being in the mood to restructure this gemspec in any way, I just did the simplest thing to keep those log files out…

Dir["spec/*/"].delete_if {|a| a =~ %r{/dummy/log/}}

Build the package without releasing, with the handy bundler-supplied rake build task… and my gem release package size goes from 12MB to 64K (which actually kind of sounds like a minimum block size or something, right?).

Phew! That’s a big difference! Sorry for anyone using previous versions and winding up downloading all that cruft! (Actually this particular gem is mostly a proof of concept at this point and I don’t think anyone else is using it).
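
For comparison, here’s a rough sketch of what a gemspec that uses the dir-glob approach but excludes build/development artifacts up front might look like. The directory names happen to match the kithe layout in the du output above, but the gem name and exact patterns here are just illustrative, not kithe’s actual gemspec:

Gem::Specification.new do |s|
  s.name    = "my_gem"   # illustrative
  s.version = "1.0.0"
  s.summary = "example"
  s.authors = ["me"]

  # runtime files only
  s.files = Dir["{app,config,db,lib}/**/*", "MIT-LICENSE", "Rakefile", "README.md"]

  # test files, but skip the dummy app's generated artifacts
  s.test_files = Dir["spec/**/*"].reject do |f|
    f =~ %r{^spec/dummy/(log|tmp)/} || f.end_with?(".sqlite3")
  end
end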

Check your gem sizes!

I’d be willing to bet there are lots of released gems with heavily bloated release packages like this. This isn’t the first one I’ve realized was my fault. Because who pays attention to gem sizes anyway? Apparently not many!

But rubygems does list them, so it’s pretty easy to see. Are your gem release packages multiple megs, when there’s no good reason for them to be? Do they get bigger every release by far more than the bytes of code you think were added? At some point in gem history was there a big jump from hundreds of KB to multiple MB, when nothing in the gem’s logic actually changed to account for it?

All hints that you might be including things you didn’t mean to include, possibly things that grow each release.

You don’t need to have a dummy rails app in your repo to accidentally do this (I accidentally did it once with a gem that had nothing to do with rails). There could be other kinds of log files. Or test coverage or performance metric files, or any other artifacts of your build or your development, especially ones that grow over time — things that aren’t actually meant to be, or needed as, part of the gem release package!

It’s good to sanity check your gem release packages now and then. In most cases, your gem release package should be hundreds of KB at most, not MBs. Help keep your users’ installs and builds faster and slimmer!
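
One quick way to do that sanity check, from your gem’s source directory, is a little ruby one-off like the following. Gem::Specification.load and the files/test_files attributes are standard rubygems API; the gemspec filename is whatever yours is:

require "rubygems"

spec = Gem::Specification.load("my_gem.gemspec") # your gemspec filename here
packaged = (spec.files + spec.test_files).uniq.select { |f| File.file?(f) }

# print the 20 largest files that would go into the release package
packaged.sort_by { |f| -File.size(f) }.first(20).each do |f|
  puts format("%10d  %s", File.size(f), f)
end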

Updating SolrCloud configuration in ruby

We have an app that uses Solr. We currently run a Solr in legacy “not cloud” mode. Our solr configuration directory is on disk on the Solr server, and it’s up to our processes to get our desired solr configuration there, and to update it when it changes.

We are in the process of moving to a Solr in “SolrCloud mode“, probably via the SearchStax managed Solr service. Our Solr “Cloud” might only have one node, but “SolrCloud mode” gives us access to additional API’s for managing our solr configuration, as opposed to writing it directly to disk (which may not be possible at all in SolrCloud mode, and certainly isn’t when using managed SearchStax).

That is, the Solr ConfigSets API, although you might also want to use a few pieces of the Collection Management API for associating a configset with a Solr collection.

Basically, you are taking your desired solr config directory, zipping it up, and uploading it to Solr as a “config set” [or “configset”] with a certain name. Then you can create collections using this config set, or reassign which named configset an existing collection uses.
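
For concreteness, the raw API call underneath that is roughly: zip the config directory and POST it to Solr’s /admin/configs endpoint with action=UPLOAD. Here’s a hedged sketch of just that step in ruby; the helper name is mine, the rubyzip gem is an assumption (any zip library would do), and basic auth and careful error handling are omitted:

require "net/http"
require "uri"
require "zip"  # rubyzip gem; an assumption, any zip library would do

def upload_configset(solr_url:, conf_dir:, name:)
  # zip up the config directory in memory
  zip_io = Zip::OutputStream.write_buffer do |zip|
    Dir.glob("#{conf_dir}/**/*").sort.each do |path|
      next unless File.file?(path)
      zip.put_next_entry(path.delete_prefix("#{conf_dir}/"))
      zip.write(File.binread(path))
    end
  end

  uri = URI("#{solr_url}/admin/configs?action=UPLOAD&name=#{name}")
  response = Net::HTTP.post(uri, zip_io.string, "Content-Type" => "application/octet-stream")
  raise "Solr error: #{response.code} #{response.body}" unless response.is_a?(Net::HTTPSuccess)
end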

I wasn’t able to find any existing ruby gems for interacting with these Solr API’s. RSolr is a “ruby client for interacting with solr”, but it was written before most of these administrative API’s existed for Solr, and doesn’t seem to have been updated to deal with them (unless I missed it); RSolr seems to be mostly/only about querying solr, plus some limited indexing.

But no worries, it’s not too hard to wrap the specific API I want to use in some ruby. That seemed far better to me than writing out the specific HTTP requests each time (and making sure to deal with errors, etc.!). (And yes, I will share the code with you.)

I decided I wanted an object that was bound to a particular solr collection at a particular solr instance; and was backed by a particular local directory with solr config. That worked well for my use case, and I wound up with an API that looks like this:

updater = SolrConfigsetUpdater.new(
  solr_url: "https://example.com/solr",
  conf_dir: "./solr/conf",
  collection_name: "myCollection"
)

# will zip up ./solr/conf and upload it as named MyConfigset:
updater.upload("myConfigset")

updater.list #=> ["myConfigSet"]
updater.config_name # what configset name is MyCollection currently configured to use?
# => "oldConfigSet"

# what if we try to delete the one it's using?
updater.delete("oldConfigSet")
# => raises SolrConfigsetUpdater::SolrError with message:
# "Can not delete ConfigSet as it is currently being used by collection [myConfigset]"

# okay let's change it to use the new one and delete the old one

updater.update_config_name("myConfigset")
# now MyCollection uses this new configset, although we possibly
# need to reload the collection to make that so
updater.reload
# now let's delete the one we're not using
updater.delete("oldConfigSet")

OK, great. There were some tricks in there for catching the apparently multiple ways Solr can report different kinds of errors, to make sure Solr-reported errors turn into exceptions, ideally with good error messages.
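
Just to illustrate the kind of thing I mean, here is a sketch, not the actual code: the exact JSON keys Solr uses differ between APIs and versions, so the real thing checks more cases, but you end up with a helper that looks in several places in the response body and raises the one consistent SolrConfigsetUpdater::SolrError class shown above.

require "json"
require "net/http"

class SolrConfigsetUpdater
  class SolrError < StandardError; end

  # Normalize Solr's various error formats into one exception (illustrative key names).
  def raise_on_solr_error!(response)
    body = begin
      JSON.parse(response.body)
    rescue JSON::ParserError
      {}
    end

    message = body.dig("error", "msg") ||
              body.dig("exception", "msg") ||
              body["failure"]&.to_s

    if message || !response.is_a?(Net::HTTPSuccess)
      raise SolrError, (message || "Solr returned HTTP #{response.code}")
    end
  end
end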

Now, in addition to uploading a configset initially for a collection you are creating to use, the main use case I have is wanting to UPDATE the configuration to new values in an existing collection. Sure, this often requires a reindex afterwards.

If you have the recently released Solr 8.7, it will let you overwrite an existing configset, so this can be done pretty easily.

updater.upload(updater.config_name, overwrite: true)
updater.reload

But prior to Solr 8.7 you can not overwrite an existing configset. And SearchStax doesn’t yet have Solr 8.7. So one way or another, we need to do a dance where we upload the configset under a new name, then switch the collection to use it.

Having this updater object that lets us easily execute the relevant Solr API calls also lets us easily experiment with different logic flows for this. For instance, in a Solr listserv thread, Alex Halovnic suggests a somewhat complicated 8-step workaround, which we can implement like so:

current_name = updater.config_name
temp_name = "#{current_name}_temp"

updater.create(from: current_name, to: temp_name)
updater.change_config_name(temp_name)
updater.reload
updater.delete(current_name)
updater.upload(configset_name: current_name)
updater.change_config_name(current_name)
updater.reload
updater.delete(temp_name)

That works. But when I talked to Dann Bohn at Penn State University, he shared a different algorithm, which goes like this:

  • Make a cryptographic digest hash of the entire solr directory, which we’re going to use in the configset name.
  • Check if the collection is already using a configset named $name_$digest, which if it already is, you’re done, no change needed.
  • Otherwise, upload the configset with the fingerprint-based name, switch the collection to use it, reload, and delete the configset the collection used to use. (A sketch of this follows below.)
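
Here’s a sketch of that algorithm using the updater object from above. The helper name and exact digest recipe are mine; the updater method names are the ones shown in the earlier examples:

require "digest"

def sync_configset_if_changed(updater, conf_dir:, base_name:)
  # digest the config directory contents (file names and contents, in stable order)
  digest = Digest::SHA256.new
  Dir.glob("#{conf_dir}/**/*").sort.each do |path|
    next unless File.file?(path)
    digest.update(path)
    digest.update(File.binread(path))
  end
  new_name = "#{base_name}_#{digest.hexdigest[0, 7]}"

  old_name = updater.config_name
  return if old_name == new_name  # config unchanged; nothing to do

  updater.upload(new_name)
  updater.change_config_name(new_name)
  updater.reload
  updater.delete(old_name)
end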

At first this seemed like overkill to me, but after thinking and experimenting with it, I like it! It is really quick to make a digest of a handful of files; that’s not a big deal. (I use the first 7 chars of the hex SHA256.) And even if we had Solr 8.7, I like that we can avoid doing any operation on solr at all if there have been no changes — I really want to use this operation much like a Rails db:migrate, running it on every deploy to make sure the solr schema matches the one in the repo for that deploy.

Dann also shared his open source code with me, which was helpful for seeing how to make the digest, how to make a Zip file in ruby, etc. Thanks Dann!

Sharing my code

So I also wrote some methods to implement those variant updating strategies: Dann’s, Alex Halovnic’s from the listserv, etc.

I thought about wrapping this all up as a gem, but I didn’t really have the time to make it good enough for that. My API is a little bit janky; I didn’t spend the extra time to think it out really well to minimize the need for future backwards-incompatible changes, like I would if it were a gem. I also couldn’t figure out a great way to write automated tests for this that I would find particularly useful; so in my code base it’s actually not currently test-covered (shhhhh), but in a gem I’d want to solve that somehow.

But I did try to write the code general purpose/flexible so other people could use it for their use cases; I tried to document it to my highest standards; and I put it all in one file which actually might not be the best OO abstraction/design, but makes it easier for you to copy and paste the single file for your own use. :)

So you can find my code here; it is apache-licensed; and you are welcome to copy and paste it and do whatever you like with it, including making a gem yourself if you want. Maybe I’ll get around to making it a gem in the future myself, I dunno, curious if there’s interest.

The SearchStax proprietary API’s

SearchStax has its own API’s that can, I think, be used for updating configsets and setting collections to use certain configsets, etc. When I started exploring them, they aren’t the worst vendor API’s I’ve seen, but I did find them a bit cumbersome to work with. The auth system involves a lot of steps (why can’t you just create an API Key from the SearchStax Web GUI?).

Overall I found them harder to use than just the standard Solr Cloud API’s, which worked fine in the SearchStax deployment, and have the added bonus of being transferable to any SolrCloud deployment instead of being SearchStax-specific. While the SearchStax docs and support try to steer you to the SearchStax specific API’s, I don’t think there’s really any good reason for this. (Perhaps the custom SearchStax API’s were written long ago when Solr API’s weren’t as complete?)

SearchStax support suggested that the SearchStax APIs were somehow more secure; but my SearchStax Solr API’s are protected behind HTTP basic auth, and if I’ve created basic auth credentials (or an IP addr allowlist), those API’s will be available to anyone with auth to access Solr whether I use em or not! Support also suggested that SearchStax API use would be logged, whereas my direct Solr API use would not be, which seems to be true at least in the default setup; I can probably configure solr logging differently, but it just isn’t that important to me for these particular functions.

So after some initial exploration with SearchStax API, I realized that SolrCloud API (which I had never used before) could do everything I need and was more straightforward and transferable to use, and I’m happy with my decision to go with that.

Are you talking to Heroku redis in cleartext or SSL?

In “typical” Redis installation, you might be talking to redis on localhost or on a private network, and clients typically talk to redis in cleartext. Redis doesn’t even natively support communications over SSL. (Or maybe it does now with redis6?)

However, the Heroku redis add-on (the one from Heroku itself) supports SSL connections via “Stunnel”, a tool popular with other redis users for getting SSL redis connections too. (Or maybe via native redis with redis6? Not sure if you’d know the difference, or if it matters.)

There are heroku docs on all of this which say:

While you can connect to Heroku Redis without the Stunnel buildpack, it is not recommended. The data traveling over the wire will be unencrypted.

Perhaps especially because on heroku your app does not talk to redis via localhost or on a private network, but on a public network.

But I think I’ve worked on heroku apps before that missed this advice and are still talking to heroku redis in the clear. I just happened to run across it when I got curious about the REDIS_TLS_URL env/config variable I noticed heroku setting.

Which brings us to another thing: that heroku doc is out of date, it doesn’t mention the REDIS_TLS_URL config variable, just the REDIS_URL one. The difference? The TLS version will be a url beginning with rediss:// instead of redis:// (note the extra s), which many redis clients use as a convention for “SSL connection to redis, probably via stunnel, since redis itself doesn’t support it”. The heroku docs provide ruby and go examples which instead use REDIS_URL and write code to swap the redis:// for rediss://, and even hard-code port number adjustments, which is silly!

(While I continue to be very impressed with heroku as a product, I keep running into weird things like this outdated documentation, that does not match my experience/impression of heroku’s all-around technical excellence, and makes me worry if heroku is slipping…).

The docs also mention a weird driver: ruby argument for initializing the Redis client; I’m not sure what it’s for, and it doesn’t seem necessary.

The docs are correct that you have to tell the ruby Redis client not to try to verify SSL keys against trusted root certs, because this implementation uses a self-signed cert. Otherwise you will get an error that looks like: OpenSSL::SSL::SSLError: SSL_connect returned=1 errno=0 state=error: certificate verify failed (self signed certificate in certificate chain)

So, can be as simple as:

redis_client = Redis.new(url: ENV['REDIS_TLS_URL'], ssl_params: { verify_mode: OpenSSL::SSL::VERIFY_NONE })

$redis = redis_client
# and/or
Resque.redis = redis_client

I don’t use sidekiq on this project currently, but to get the SSL connection with VERIFY_NONE, looking at the sidekiq docs, you might have to do something like(?):

redis_conn = proc {
  Redis.new(url: ENV['REDIS_TLS_URL'], ssl_params: { verify_mode: OpenSSL::SSL::VERIFY_NONE })
}

Sidekiq.configure_client do |config|
  config.redis = ConnectionPool.new(size: 5, &redis_conn)
end

Sidekiq.configure_server do |config|
  config.redis = ConnectionPool.new(size: 25, &redis_conn)
end

(Not sure what values you should pick for connection pool size).

While the sidekiq docs mention heroku in passing, they don’t mention the need for SSL connections — I think awareness of this heroku feature and their recommendation that you use it may not actually be common!

Update: Beware REDIS_URL can also be rediss

On one of my apps I saw a REDIS_URL which used redis: and a REDIS_TLS_URL which uses (secure) rediss:.

But on another app, it provides *only* a REDIS_URL, which is rediss — meaning you have to set the verify_mode: OpenSSL::SSL::VERIFY_NONE when passing it to the ruby redis client. So you have to be prepared to do this with REDIS_URL values too — I think it shouldn’t hurt to set the ssl_params option even if you pass it a non-ssl redis: url, so just set it all the time?

This second app was heroku-20 stack, and the first was heroku-18 stack, is that the difference? No idea.

Documented anywhere? I doubt it. Definitely seems sloppy for what I expect of heroku, making me get a bit suspicious of whether heroku is sticking to the really impressive level of technical excellence and documentation I expect from them.

So, your best bet is to check for both REDIS_TLS_URL and REDIS_URL, preferring the TLS one if present, and realizing that REDIS_URL can have a rediss:// value in it too.
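
So in practice, something like this (per the hedge above, I believe passing ssl_params along with a plain redis:// url is harmless, but I haven’t exhaustively verified that):

require "redis"
require "openssl"

# Prefer REDIS_TLS_URL if present, fall back to REDIS_URL (which may itself be rediss://)
redis_url = ENV["REDIS_TLS_URL"] || ENV["REDIS_URL"]

redis_client = Redis.new(
  url: redis_url,
  ssl_params: { verify_mode: OpenSSL::SSL::VERIFY_NONE }
)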

The heroku docs also say you don’t get a secure TLS redis connection on “hobby” plans, but I’m not sure that’s actually true anymore on heroku-20? Not trusting the docs is not a good sign.

Comparing performance of a Rails app on different Heroku formations

I develop a “digital collections” or “asset management” app, which manages and makes digitized historical objects and their descriptions available to the public, from the collections here at the Science History Institute.

The app receives a relatively low level of traffic (according to Google Analytics, around 25K pageviews a month), although we want it to be able to handle spikes without falling down. It is not the most performance-optimized app; it does have some relatively slow responses and can be RAM-hungry. But it works adequately on our current infrastructure: web traffic is handled on a single AWS EC2 t2.medium instance, with 10 passenger processes (free version of passenger, so no multi-threading).

We are currently investigating the possibility of moving our infrastructure to heroku. After realizing that heroku standard dynos did not seem to have the performance characteristics I had expected, I decided to approach performance testing more methodically, to compare different heroku dyno formations to each other and to our current infrastructure. Our basic research question is probably: what heroku formation do we need to have similar performance to our existing infrastructure?

I am not an expert at doing this — I did some research, read some blog posts, did some thinking, and embarked on this. I am going to lead you through how I approached this and what I found. Feedback or suggestions are welcome. The most surprising result I found was much poorer performance from heroku standard dynos than I expected, and specifically that standard dynos would not match performance of present infrastructure.

What URLs to use in test

Some older load-testing tools only support testing one URL over and over. I decided I wanted to test a larger sample list of URLs — to be a more “realistic” load, and also because repeatedly requesting only one URL might accidentally use caches in ways you aren’t expecting, giving you unrepresentative results. (Our app does not currently use fragment caching, but caches you might not even be thinking about include postgres’s built-in automatic caches, or passenger’s automatic turbocache (which I don’t think we have turned on).)

My initial thought was to get a list of such URLs from our already-in-production app’s production logs, to get a sample of what real traffic looks like. There were a couple of barriers to using production logs for this:

  1. Some of those URLs might require authentication, or be POST requests. The bulk of our app’s traffic is GET requests available without authentication, and I didn’t feel like the added complexity of setting up anything else in a load test was worthwhile.
  2. Our app on heroku isn’t fully functional yet. Without having connected it to a Solr or background job workers, only certain URLs are available.

In fact, a large portion of our traffic is an “item” or “work” detail page like this one. Additionally, those are the pages that can be the biggest performance challenge, since the current implementation includes a thumbnail for every scanned page or other image, so response time unfortunately scales with number of pages in an item.

So I decided a good list of URLs was simply a representative sample of those “work detail” pages. In fact, rather than a completely random sample, I took the 50 largest/slowest work pages, and then added in another 150 randomly chosen from our current ~8K pages. Then I gave them all a randomly shuffled order.

In our app, every time a browser requests a work detail page, the JS on that page makes an additional request for a JSON document that powers our page viewer. So for each of those 200 work detail pages, I added the JSON request URL as well, for a more “realistic” load and 400 total URLs.
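
If it’s useful, here’s roughly how such a list might be assembled. This is only a sketch: the Work model, the member_count attribute, and the /works/… and viewer.json paths are hypothetical stand-ins for whatever your app actually has; only the sample_works.txt filename matches the wrk commands shown later.

# a sketch, run in a Rails console; Work, member_count, and the URL paths are stand-ins
slowest = Work.order(member_count: :desc).limit(50).to_a
random  = Work.where.not(id: slowest.map(&:id)).order(Arel.sql("RANDOM()")).limit(150).to_a

paths = (slowest + random).flat_map do |work|
  [
    "/works/#{work.to_param}",             # HTML detail page
    "/works/#{work.to_param}/viewer.json"  # JSON doc the page viewer requests
  ]
end.shuffle

File.write("sample_works.txt", paths.join("\n"))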

Performance: “base speed” vs “throughput under load”

Thinking about it, I realized there were two kinds of “performance” or “speed” to think about.

You might just have a really slow app; to exaggerate, let’s say typical responses are 5 seconds. That’s under low or no traffic: a single browser is the only thing interacting with the app; it makes a single request, and has to wait 5 seconds for a response.

That number might be changed by optimizations or performance regressions in your code (including your dependencies). It might also be changed by moving or changing hardware or virtualization environment — including giving your database more CPU/RAM resources, etc.

But that number will not change by horizontally scaling your deployment — adding more puma or passenger processes or threads, scaling out hosts with a load balancer or heroku dynos. None of that will change this base speed, because it’s just how long the app takes to prepare a response when not under load: how slow it is in a test with only one web worker, where adding web workers won’t matter because they won’t be used.

Then there’s what happens to the app actually under load by multiple users at once. The base speed is kind of a lower bound on throughput under load — page response time is never going to get better than 5s for our hypothetical very slow app (without changing the underlying base speed). But it can get a lot worse if it’s hammered by traffic. This throughput under load can be affected not only by changing base speed, but also by various forms of horizontal scaling — how many puma or passenger processes you have with how many threads each, and how many CPUs they have access to, as well as the number of heroku dynos or other hosts behind a load balancer.

(I had been thinking about this distinction already, but Nate Berkopec’s great blog post on scaling Rails apps gave me the “speed” vs “throughput” terminology to use.)

In our case, we are not changing the code at all. But we are changing the host architecture from a manual EC2 t2.medium to heroku dynos (of various possible types) in a way that could affect base speed, and we’re also changing our scaling architecture in a way that could change throughput under load on top of that — from one t2.medium with 10 passenger processes to possibly multiple heroku dynos behind heroku’s load balancer, and also (for Reasons) switching from free passenger to trying puma with multiple threads per process. (We are running puma 5 with its new experimental performance features turned on.)

So we’ll want to get a sense of base speed of the various host choices, and also look at how throughput under load changes based on various choices.

Benchmarking tool: wrk

We’re going to use wrk.

There are LOTS of choices for HTTP benchmarking/load testing, with really varying complexity and from different eras of web history. I got a bit overwhelmed by it, but settled on wrk. Some other choices didn’t have all the features we need (some way to test a list of URLs, with at least some limited percentile distribution reporting). Others were much more flexible and complicated and I had trouble even figuring out how to use them!

wrk does need a custom lua script in order to handle a list of URLs. I found a nice script here, and modified it slightly to take filename from an ENV variable, and not randomly shuffle input list.

It’s a bit confusing understanding the meaning of “threads” vs “connections” in wrk arguments. This blog post from appfolio clears it up a bit. I decided to leave threads set to 1, and vary connections for load — so -c1 -t1 is a “one URL at a time” setting we can use to test “base speed”, and we can benchmark throughput under load by increasing connections.

We want to make sure we run the test for long enough to touch all 400 URLs in our list at least once, even in the slower setups, to have a good comparison — ideally it would go through the list more than once, but for my own ergonomics I had to get through a lot of tests, so I ended up with less than ideal. (Should I have put fewer than 400 URLs in? Not sure.)

Conclusions in advance

As benchmarking posts go (especially when I’m the one writing them), I’m about to drop a lot of words and data on you. So to maximize the audience that sees the conclusions (because they surprise me, and I want feedback/pushback on them), I’m going to give you some conclusions up front.

Our current infrastructure has the web app on a single EC2 t2.medium, which is a burstable EC2 type — our relatively low-traffic app does not exhaust its burst credits. Measuring base speed (just one concurrent request at a time), we found that performance dynos seem to have about the CPU speed of a bursting t2.medium (just a hair slower).

But standard dynos are as a rule 2 to 3 times slower; additionally they are highly variable, and that variability can be over hours/days. A 3 minute period can have measured response times 2 or more times slower than another 3 minute period a couple hours later. But they seem to typically be 2-3x slower than our current infrastructure.

Under load, they scale about how you’d expect if you knew how many CPUs are present, no real surprises. Our existing t2.medium has two CPUs, so can handle 2 simultaneous requests as fast as 1, and after that degrades linearly.

A single performance-L ($500/month) has 4 CPUs (8 hyperthreads), so scales under load much better than our current infrastructure.

A single performance-M ($250/month) has only 1 CPU (!), so scales pretty terribly under load.

Testing scaling with 4 standard-2x’s ($200/month total), we see that it scales relatively evenly, although lumpily because of variability; and it starts out performing so much worse that even as it scales “evenly” it’s still out-performed by all the other architectures. :( (At these relatively fast median response times you might say it’s still fast enough, who cares; but in our fat tail of slower pages it gets more distressing.)

Now we’ll give you lots of measurements, or you can skip all that to my summary discussion or conclusions for our own project at the end.

Let’s compare base speed

OK, let’s get to actual measurements! For “base speed” measurements, we’ll be telling wrk to use only one connection and one thread.

Existing t2.medium: base speed

Our current infrastructure is one EC2 t2.medium. This EC2 instance type has two vCPUs and 4GB of RAM. On that single EC2 instance, we run passenger (free not enterprise) set to have 10 passenger processes, although the base speed test with only one connection should only touch one of the workers. The t2 is a “burstable” type, and we do always have burst credits (this is not a high traffic app; verified we never exhausted burst credits in these tests), so our test load may be taking advantage of burst cpu.

$ URLS=./sample_works.txt  wrk -c 1 -t 1 -d 3m --timeout 20s --latency -s load_test/multiplepaths.lua.txt https://[current staging server]
 multiplepaths: Found 400 paths
 multiplepaths: Found 400 paths
 Running 3m test @ https://staging-digital.sciencehistory.org
   1 threads and 1 connections
   Thread Stats   Avg      Stdev     Max   +/- Stdev
     Latency   311.00ms  388.11ms   2.37s    86.45%
     Req/Sec    11.89      8.96    40.00     69.95%
   Latency Distribution
      50%   90.99ms
      75%  453.40ms
      90%  868.81ms
      99%    1.72s
   966 requests in 3.00m, 177.43MB read
 Requests/sec:      5.37
 Transfer/sec:      0.99MB

I’m actually feeling pretty good about those numbers on our current infrastructure! 90ms median, not bad, and even 453ms 75th percentile is not too bad. Now, our test load involves some JSON responses that are quicker to deliver than the corresponding HTML pages, but still, pretty good. The 90th, 99th, and max request (2.37s) aren’t great, but I knew I had some slow pages; this matches my previous understanding of how slow they are in our current infrastructure.

90th percentile is ~9 times the 50th percentile.

I don’t have an understanding of why the two different Req/Sec and Requests/sec values are so different, and don’t totally understand what to do with the Stdev and +/- Stdev values, so I’m just going to stick to looking at the latency percentiles; I think “latency” could also be called “response time” here.

But ok, this is our baseline for this workload. And doing this 3 minute test at various points over the past few days, I can say it’s nicely regular and consistent; occasionally I got a slower run, but the 50th percentile was usually 90ms–105ms, right around there.

Heroku standard-2x: base speed

From previous mucking about, I learned I can only reliably fit one puma worker in a standard-1x, and heroku says “we typically recommend a minimum of 2 processes, if possible” (for routing algorithmic reasons when scaled to multiple dynos), so I am just starting at a standard-2x with two puma workers each with 5 threads, matching heroku recommendations for a standard-2x dyno.
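
(For reference, a puma config along the lines of the heroku puma guide, driven by the WEB_CONCURRENCY and RAILS_MAX_THREADS config vars that show up in the transcripts below, looks something like this. It’s a sketch of the standard pattern, not necessarily our exact config/puma.rb.)

# config/puma.rb -- the standard heroku-guide-style setup, driven by env vars
workers Integer(ENV.fetch("WEB_CONCURRENCY", 2))

max_threads = Integer(ENV.fetch("RAILS_MAX_THREADS", 5))
threads max_threads, max_threads

preload_app!

port        ENV.fetch("PORT", 3000)
environment ENV.fetch("RACK_ENV", "development")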

So one thing I discovered is that benchmarks from a heroku standard dyno are really variable, but here are typical ones:

$ heroku dyno:resize
 type     size         qty  cost/mo
 ───────  ───────────  ───  ───────
 web      Standard-2X  1    50

$ heroku config:get --shell WEB_CONCURRENCY RAILS_MAX_THREADS
 WEB_CONCURRENCY=2
 RAILS_MAX_THREADS=5

$ URLS=./sample_works.txt  wrk -c 1 -t 1 -d 3m --timeout 20s --latency -s load_test/multiplepaths.lua.txt https://scihist-digicoll.herokuapp.com/
 multiplepaths: Found 400 paths
 multiplepaths: Found 400 paths
 Running 3m test @ https://scihist-digicoll.herokuapp.com/
   1 threads and 1 connections
   Thread Stats   Avg      Stdev     Max   +/- Stdev
     Latency   645.08ms  768.94ms   4.41s    85.52%
     Req/Sec     5.78      4.36    20.00     72.73%
   Latency Distribution
      50%  271.39ms
      75%  948.00ms
      90%    1.74s
      99%    3.50s
   427 requests in 3.00m, 74.51MB read
 Requests/sec:      2.37
 Transfer/sec:    423.67KB

I had heard that heroku standard dynos would have variable performance, because they are shared multi-tenant resources. I had been thinking of this like during a 3 minute test I might see around the same median with more standard deviation — but instead, what it looks like to me is that running this benchmark on Monday at 9am might give very different results than at 9:50am or Tuesday at 2pm. The variability is over a way longer timeframe than my 3 minute test — so that’s something learned.

Running this here and there over the past week, the above results seem to me typical of what I saw. (To get better than “seem typical” on this resource, you’d have to run a test, over several days or a week I think, probably not hammering the server the whole time, to get a sense of actual statistical distribution of the variability).

I sometimes saw tests that were quite a bit slower than this, up to a 500ms median. I rarely if ever saw results too much faster than this on a standard-2x. 90th percentile is ~6x median, less than on my current infrastructure, but that still gets up there to 1.74s instead of 868ms.

This typical run is quite a bit slower than our current infrastructure: the median response time is 3x ours, with the 90th percentile and max being around 2x. This was worse than I expected.

Heroku performance-m: base speed

Although we might be able to fit more puma workers in RAM, we’re running a single-connection base speed test, so it shouldn’t matter, and we won’t adjust it.

$ heroku dyno:resize
 type     size           qty  cost/mo
 ───────  ─────────────  ───  ───────
 web      Performance-M  1    250

$ heroku config:get --shell WEB_CONCURRENCY RAILS_MAX_THREADS
 WEB_CONCURRENCY=2
 RAILS_MAX_THREADS=5

$ URLS=./sample_works.txt  wrk -c 1 -t 1 -d 3m --timeout 20s --latency -s load_test/multiplepaths.lua.txt https://scihist-digicoll.herokuapp.com/
 multiplepaths: Found 400 paths
 multiplepaths: Found 400 paths
 Running 3m test @ https://scihist-digicoll.herokuapp.com/
   1 threads and 1 connections
   Thread Stats   Avg      Stdev     Max   +/- Stdev
     Latency   377.88ms  481.96ms   3.33s    86.57%
     Req/Sec    10.36      7.78    30.00     37.03%
   Latency Distribution
      50%  117.62ms
      75%  528.68ms
      90%    1.02s
      99%    2.19s
   793 requests in 3.00m, 145.70MB read
 Requests/sec:      4.40
 Transfer/sec:    828.70KB

This is a lot closer to the ballpark of our current infrastructure. It’s a bit slower (117ms median instead of 90ms median), but in running this now and then over the past week it was remarkably, thankfully, consistent. Median and 99th percentile are both 28% slower (it makes me feel comforted that those numbers are the same in these two runs!); that doesn’t bother me so much if it’s predictable and regular, which it appears to be. The max still appears to me a little bit less regular on heroku for some reason; since performance dynos are supposed to be non-shared AWS resources, you wouldn’t expect that, but slow requests are slow, ok.

90th percentile is ~9x median, about the same as my current infrastructure.

heroku performance-l: base speed

$ heroku dyno:resize
 type     size           qty  cost/mo
 ───────  ─────────────  ───  ───────
 web      Performance-L  1    500

$ heroku config:get --shell WEB_CONCURRENCY RAILS_MAX_THREADS
 WEB_CONCURRENCY=2
 RAILS_MAX_THREADS=5

URLS=./sample_works.txt  wrk -c 1 -t 1 -d 3m --timeout 20s --latency -s load_test/multiplepaths.lua.txt https://scihist-digicoll.herokuapp.com/
 multiplepaths: Found 400 paths
 multiplepaths: Found 400 paths
 Running 3m test @ https://scihist-digicoll.herokuapp.com/
   1 threads and 1 connections
   Thread Stats   Avg      Stdev     Max   +/- Stdev
     Latency   471.29ms  658.35ms   5.15s    87.98%
     Req/Sec    10.18      7.78    30.00     36.20%
   Latency Distribution
      50%  123.08ms
      75%  635.00ms
      90%    1.30s
      99%    2.86s
   704 requests in 3.00m, 130.43MB read
 Requests/sec:      3.91
 Transfer/sec:    741.94KB

No news is good news, it looks very much like performance-m, which is exactly what we expected, because this isn’t a load test. It tells us that performance-m and performance-l seem to have similar CPU speeds and similar predictable non-variable regularity, which is what I find running this test periodically over a week.

90th percentile is ~10x median, about the same as current infrastructure.

The higher max is just evidence of what I mentioned: the speed of the slowest request did seem to vary more than on our manual t2.medium; I can’t really explain why.

Summary: Base speed

Not sure how helpful this visualization is, charting 50th, 75th, and 90th percentile responses across architectures.

But basically: performance dynos perform similarly to my (bursting) t2.medium. Can’t explain why performance-l seems slightly slower than performance-m, might be just incidental variation when I ran the tests.

The standard-2x is about twice as slow as my (bursting) t2.medium. Again, recall that standard-2x results varied a lot every time I ran them; the one I reported seems “typical” to me. That’s not super scientific, admittedly, but I’m confident that standard-2x dynos are a lot slower in median response times than my current infrastructure.

Throughput under load

Ok, now we’re going to test throughput under load, by telling wrk to use more connections. In fact, I’ll test each setup with various numbers of connections, and graph the results, to get a sense of how each formation handles throughput under load. (This means a lot of minutes to get all these results, at 3 minutes per connection-count test, per formation!)
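
One way to keep that manageable is a little driver script that loops over connection counts. This is a hedged sketch; the particular connection counts are just examples, and the wrk invocation matches the ones shown above:

# a sketch of a driver loop: run wrk at several connection counts against one formation
[1, 2, 3, 4, 6, 8, 10, 12].each do |connections|
  puts "=== #{connections} connections ==="
  system(
    { "URLS" => "./sample_works.txt" },
    "wrk", "-c", connections.to_s, "-t", "1", "-d", "3m",
    "--timeout", "20s", "--latency",
    "-s", "load_test/multiplepaths.lua.txt",
    "https://scihist-digicoll.herokuapp.com/"
  )
end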

An additional thing we can learn from this test, on heroku we can look at how much RAM is being used after a load test, to get a sense of the app’s RAM usage under traffic to understand the maximum number of puma workers we might be able to fit in a given dyno.

Existing t2.medium: Under load

A t2.medium has 4G of RAM and 2 CPUs. We run 10 passenger workers (no multi-threading, since we are free, rather than enterprise, passenger). So what do we expect? With 2 CPUs and more than 2 workers, I’d expect it to handle 2 simultaneous streams of requests almost as well as 1; 3-10 should be quite a bit slower because they are competing for the 2 CPUs. Over 10, performance will probably become catastrophic.

2 connections are exactly flat with 1, as expected for our two CPUs, hooray!

Then it goes up at a strikingly even line. Going over 10 (to 12) simultaneous connections doesn’t matter, even though we’ve exhausted our workers, I guess at this point there’s so much competition for the two CPUs already.

The slope of this curve is really nice too, actually. Without load, our median response time is 100ms, but even at a totally overloaded 12 connections, it’s only 550ms, which actually isn’t too bad.

We can make a graph that in addition to median also has 75th, 90th, and 99th percentile response time on it:

It doesn’t tell us too much; it tells us the upper percentiles rise at about the same rate as the median. At 1 simultaneous connection 90th percentile of 846ms is about 9 times the median of 93ms; at 10 requests the 90th percentile of 3.6 seconds is about 8 times the median of 471ms.

This does remind us that under load, when things get slow, it has more of a disastrous effect on already slow requests than on fast requests. When not under load, even our 90th percentile was kind of sort of barely acceptable at 846ms, but under load at 3.6 seconds it really isn’t.

Single Standard-2X dyno: Under load

A standard-2X dyno has 1G of RAM. The (amazing, excellent, thanks schneems) heroku puma guide suggests running two puma workers with 5 threads each. At first I wanted to try running three workers, which seemed to fit into available RAM — but under heavy load-testing I was getting Heroku R14 Memory Quota Exceeded errors, so we’ll just stick with the heroku docs recommendations. Two workers with 5 threads each fit with plenty of headroom.

A standard-2x dyno runs on shared (multi-tenant) underlying Amazon virtual hardware. So while it is running on hardware with 4 CPUs (each of which can run two “hyperthreads“), the puma doc suggests “it is best to assume only one process can execute at a time” on standard dynos.

What do we expect? Well, if it really only had one CPU, it would immediately start getting bad at 2 simultaneous connections, and just get worse from there. When we exceed the two-worker count, will it get even worse? What about when we exceed the 10-thread (2 workers * 5 threads) count?

You’d never run just one dyno if you were expecting this much traffic, you’d always horizontally scale. This very artificial test is just to get a sense of its characteristics.

Also, we remember that standard-2x’s are just really variable; I could get much worse or better runs than this, but graphed numbers from a run that seemed typical.

Well, it really does act like 1 CPU, 2 simultaneous connections is immediately a lot worse than 1.

The line isn’t quite as straight as in our existing t2.medium, but it’s still pretty straight; I’d attribute the slight lumpiness to just the variability of shared-architecture standard dyno, and figure it would get perfectly straight with more data.

It degrades at about the same rate of our baseline t2.medium, but when you start out slower, that’s more disastrous. Our t2.medium at an overloaded 10 simultaneous requests is 473ms (pretty tolerable actually), 5 times the median at one request only. This standard-2x has a median response time of 273 ms at only one simultaneous request, and at an overloaded 10 requests has a median response time also about 5x worse, but that becomes a less tolerable 1480ms.

Does also graphing the 75th, 90th, and 99th percentile tell us much?

Eh, I think the lumpiness is still just standard shared-architecture variability.

The rate of “getting worse” as we add more overloaded connections is actually a bit better than it was on our t2.medium, but since it already starts out so much slower, we’ll just call it a wash. (On t2.medium, 90th percentile without load is 846ms and under an overloaded 10 connections 3.6s. On this single standard-2x, it’s 1.8s and 5.2s).

I’m not sure how much these charts with various percentiles on them tell us, so I won’t include them for every architecture from here on.

standard-2x, 4 dynos: Under load

OK, realistically we already know you shouldn’t have just one standard-2x dyno under that kind of load. You’d scale out, either manually or perhaps using something like the neat Rails Autoscale add-on.

Let’s measure with 4 dynos. Each is still running 2 puma workers, with 5 threads each.

What do we expect? Hm, treating each dyno as if it has only one CPU, we’d expect it to be able to handle traffic pretty levelly up to 4 simultaneous connections, distributed across 4 dynos. It’s going to do worse after that, but up to 8 there is still one puma worker per connection, so maybe it gets even worse after 8?

Well… I think that actually is relatively flat from 1 to 4 simultaneous connections, except for lumpiness from variability. But lumpiness from variability is huge! We’re talking 250ms median measured at 1 connection, up to 369ms measured median at 2, down to 274ms at 3.

And then maybe yeah, a fairly shallow slope up to 8 simultaneous connections, then steeper.

But it’s all a fairly shallow slope compared to our base t2.medium. At 8 connections (after which we pretty much max out), the standard-2x median of 464ms is only 1.8 times the median at 1 connection. Compare that to the t2.medium’s increase of 3.7 times.

As we’d expect, scaling out to 4 dynos (with four cpus/8 hyperthreads) helps us scale well — the problem is the baseline is so slow to begin (with very high bounds of variability making it regularly even slower).

performance-m: Under load

A performance-m has 2.5 GB of memory. It only has one physical CPU, although two “vCPUs” (two hyperthreads) — and these are all your app’s; it is not shared.

By testing under load, I demonstrated I could actually fit 12 workers on there without any memory limit errors. But is there any point to doing that with only 1 CPU/2 hyperthreads? Under a bit of testing, it appeared not.

The heroku puma docs recommend only 2 processes with 5 threads. You could do a whole little mini-experiment just trying to measure/optimize process/thread count on performance-m! We’ve already got too much data here, but in some experimentation it looked to me like 5 processes with 2 threads each performed better (and certainly no worse) than 2 processes with 5 threads — if you’ve got the RAM just sitting there anyway (as we do), why not?

I actually tested with 6 puma processes with 2 threads each. There is still a large amount of RAM headroom we aren’t going to use even under load.

What do we expect? Well, with the 2 “hyperthreads” perhaps it can handle 2 simultaneous requests nearly as well as 1 (or not?); after that, we expect it to degrade quickly same as our original t2.medium did.

It can handle 2 connections slightly better than you’d expect if there really was only 1 CPU, so I guess a hyperthread does give you something. Then the slope picks up, as you’d expect; and it looks like it does get steeper after 4 simultaneous connections, yup.

performance-l: Under load

A performance-l ($500/month) costs twice as much as a performance-m ($250/month), but has far more than twice as much resources. performance-l has a whopping 14GB of RAM compared to performance-m’s 2.5GB; and performance-l has 4 real CPUs (8 hyperthreads) available to use (visible using the nproc technique in the heroku puma article).

Because we have plenty of RAM to do so, we’re going to run 10 worker processes to match our original t2.medium’s. We still ran with 2 threads, just cause it seems like maybe you should never run a puma worker with only one thread? But who knows, maybe 10 workers with 1 thread each would perform better; plenty of room (but not plenty of my energy) for yet more experimentation.

What do we expect? The graph should be pretty flat up to 4 simultaneous connections, then it should start getting worse, pretty evenly as simultaneous connections rise all the way up to 12.

It is indeed pretty flat up to 4 simultaneous connections. Then up to 8 it’s still not too bad — the median at 8 is only ~1.5x the median at 1(!). Then it gets worse after 8 (oh yeah, 8 hyperthreads?).

But the slope is wonderfully shallow all the way. Even at 12 simultaneous connections, the median response time of 266ms is only 2.5x what it was at one connection. (In our original t2.medium, at 12 simultaneous connections median response time was over 5x what it was at 1 connection).

This thing is indeed a monster.

Summary Comparison: Under load

We showed a lot of graphs that look similar, but they all had different scales on the y-axis. Let’s plot median response times under load of all architectures on the same graph, and see what we’re really dealing with.

The blue t2.medium is our baseline, what we have now. We can see that there isn’t really a similar heroku option, we have our choice of better or worse.

The performance-l is just plain better than what we have now. It starts out performing about the same as what we have now for 1 or 2 simultaneous connections, but then scales so much flatter.

The performance-m also starts out about the same, but scales so much worse than even what we have now. (It’s that 1 real CPU instead of 2, I guess?)

The standard-2x scaled to 4 dynos… has its own characteristics. Its baseline is pretty terrible; it’s 2 to 3 times as slow as what we have now even when not under load. But then it scales pretty well; since it’s 4 dynos after all, it doesn’t get worse as fast as performance-m does. But it started out so bad that it remains far worse than our original t2.medium even under load. Adding more dynos to standard-2x will help it remain steady under even higher load, but won’t help its underlying problem: it’s just slower than everyone else.

Discussion: Thoughts and Surprises

  • I had been thinking of a t2.medium (even with burst) as “typical” (it is after all much slower than my 2015 Macbook), and had been assuming (in retrospect with no particular basis) that a heroku standard dyno would perform similarly.
    • Most discussion and heroku docs, as well as the naming itself, suggest that a ‘standard’ dyno is, well, standard, and performance dynos are for “super scale, high traffic apps”, which is not me.
    • But in fact, heroku standard dynos are much slower and more variable in performance than a bursting t2.medium. I suspect they are slower than other options you might consider non-heroku “typical” options.



  • My conclusion is honestly that “standard” dynos are really “for very fast, well-optimized apps that can handle slow and variable CPU” and “performance” dynos are really “standard, matching the CPU speeds you’d get from a typical non-heroku option”. But this is not how they are documented or usually talked about. Are other people having really different experiences/conclusions than me? If so, why, or where have I gone wrong?
    • This of course has implications for estimating your heroku budget if considering switching over. :(
    • If you have a well-optimized fast app, say even 95th percentile is 200ms (on a bursting t2.medium), then you can handle standard dyno slowness — so what if your 95th percentile is now 600ms (and during some time periods even much slower, 1s or worse, due to variability)? That’s not so bad for a 95th percentile.
    • One way to get a very fast app is of course caching. There is lots of discussion of using caching in Rails; sometimes the message (explicit or implicit) is “you have to use lots of caching to get reasonable performance cause Rails is so slow.” What if many of these people are on heroku, and it’s really “you have to use lots of caching to get reasonable performance on a heroku standard dyno”??
    • I personally don’t think caching is maintenance free; in my experience, properly doing cache invalidation, and dealing with the significant processing spikes needed when you choose to invalidate your entire cache (cause cached HTML needs to change), leads to real maintenance/development cost. I have not needed caching to meet my performance goals on our present architecture.
    • Everyone doesn’t necessarily have the same performance goals/requirements. Mine, for a low-traffic non-commercial site, are maybe more modest; I just need users not to be super annoyed. But whatever your performance goals, you’re going to have to spend more time on optimization on a heroku standard dyno than on something with a much faster CPU — like a standard affordable mid-tier EC2. Am I wrong?


  • One significant factor on heroku standard dyno performance is that they use shared/multi-tenant infrastructure. I wonder if they’ve actually gotten lower performance over time, as many customers (who you may be sharing with) have gotten better at maximizing their utilization, so the shared CPUs are typically more busy? Like a frog boiling, maybe nobody noticed that standard dynos have become lower performance? I dunno, brainstorming.
    • Or maybe there are so many apps that start on heroku instead of switching from somewhere else that people just don’t realize standard dynos are much slower than other low/mid-tier options?
    • I was expecting to pay a premium for heroku — but even standard-2x’s are a significant premium over paying for t2.medium EC2 yourself, one I found quite reasonable…. performance dynos are of course even more premium.


  • I had a sort of baked-in premise that most Rails apps are “IO-bound”, that they spend more time waiting on IO than using CPU. I don’t know where I got that idea; I heard it once a long time ago and it became part of my mental model. I now do not believe this is true of my app, and I do not in fact believe it is true of most Rails apps in 2020. I would hypothesize that most Rails apps today are in fact CPU-bound.

  • The performance-m dyno only has one CPU. I had somehow also been assuming that it would have two CPUs — I’m not sure why, maybe just because at that price! It would be a much better deal with two CPUs.
    • Instead we have a huge jump from $250 performance-m to $500 performance-l that has 4x the CPUs and ~5x the RAM.
    • So it doesn’t make financial sense to have more than one performance-m dyno; you might as well go to performance-l. But this really complicates auto-scaling, whether using Heroku’s feature or the awesome Rails Autoscale add-on. I am not sure I can afford a performance-l all the time, and a performance-m might be sufficient most of the time. But if 20% of the time I’m going to need more (or even 5%, or even unexpectedly-mentioned-in-national-media), it would be nice to set things up to autoscale up…. I guess to a financially irrational 2 or more performance-m’s? :(

  • The performance-l is a very big machine, significantly beefier than my current infrastructure, and it has far more RAM than I need/can use with only 4 physical cores. If I consider standard dynos to be pretty effectively low tier (as I do), heroku to me is kind of missing mid-tier options. A 2 CPU option at 2.5G or 5G of RAM would make a lot of sense to me, and actually be exactly what I need… really, I think performance-m would make more sense with 2 CPUs at its existing already-premium price point, to be worth being called a “performance” dyno. Maybe heroku is intentionally trying to set options to funnel people to the highest-priced performance-l.

Conclusion: What are we going to do?

In my investigations of heroku, my opinion of the developer UX and general service quality only increases. It’s a great product, that would increase our operational capacity and reliability, and substitute for so many person-hours of sysadmin/operational time if we were self-managing (even on cloud architecture like EC2).

But I had originally been figuring we’d use standard dynos (even more affordably, possibly auto-scaled with Rails Autoscale plugin), and am disappointed that they end up looking so much lower performance than our current infrastructure.

Could we use them anyway? Response time going from 100ms to 300ms — hey, 300ms is still fine, even if I’m sad to lose those really nice numbers I got from a bit of optimization. But this app has a wide long tail; our 75th percentile going from 450ms to 1s, our 90th percentile going from 860ms to 1.74s, and our 99th going from 2.3s to 4.4s — that’s a lot harder to swallow. Especially when we know that, due to standard dyno variability, a slow-ish page that on my present architecture is reliably 1.5s could really be anywhere from 3 to 9 seconds(!) on heroku.

I would anticipate having to spend a lot more developer time on optimization on heroku standard dynos — or, in this small over-burdened non-commercial shop, not prioritizing that (or not having the skills for it), and having our performance just get bad.

So I’m really reluctant to suggest moving our app to heroku with standard dynos.

A performance-l dyno is going to let us not have to think about performance any more than we do now, while scaling under high-traffic better than we do now — I suspect we’d never need to scale to more than one performance-l dyno. But it’s pricey for us.

A performance-m dyno has a base-speed that’s fine, but scales very poorly and unaffordably. Doesn’t handle an increase in load very well as one dyno, and to get more CPUs you have to pay far too much (especially compared to standard dynos I had been assuming I’d use).

So I don’t really like any of my options. If we do heroku, maybe we’ll try a performance-m, and “hope” our traffic is light enough that a single one will do? Maybe with Rails autoscale for traffic spikes, even though 2 performance-m dynos isn’t financially efficient? If we are scaling to 2 (or more!) performance-m’s more than very occasionally, switch to performance-l, which means we need to make sure we have the budget for it?

Deep Dive: Moving ruby projects from Travis to Github Actions for CI

So this is one of my super wordy posts, if that’s not your thing abort now, but some people like them. We’ll start with a bit of context, then get to some detailed looks at Github Actions features I used to replace my travis builds, with example config files and examination of options available.

For me, by “Continuous Integration” (CI), I mostly mean “Running automated tests automatically, on your code repo, as you develop”, on every PR and sometimes with scheduled runs. Other people may mean more expansive things by “CI”.

For a lot of us, our first experience with CI was when Travis-ci started to become well-known, maybe 8 years ago or so. Travis was free for open source, and so darn easy to set up and use — especially for Rails projects; it was a time when it still felt like most services focused on docs and a smooth fit for ruby and Rails specifically. I had heard of doing CI, but as a developer in a very small and non-profit shop, I wanted to spend time writing code, not setting up infrastructure, and would have had to get any for-cost service approved up the chain from our limited budget. But it felt like I could almost just flip a switch and have Travis on ruby or rails projects working — and for free!

Free for open source wasn’t entirely selfless, I think it’s part of what helped Travis literally define the market. (Btw, I think they were the first to invent the idea of a “badge” URL for a github readme?) Along with an amazing Developer UX (which is today still a paragon), it just gave you no reason not to use it. And then once using it, it started to seem insane to not have CI testing, nobody would ever again want to develop software without the build status on every PR before merge.

Travis really set a high bar for ease of use in a developer tool: you didn’t need to think about it much, it just did what you needed, and told you what you needed to know in its read-outs. I think it’s an impressive engineering product. But then.

End of an era

Travis will no longer be supporting open source projects with free CI.

The free open source travis projects originally ran on travis-ci.org, with paid commercial projects on travis-ci.com. In May 2018, they announced they’d be unifying these on travis-ci.com only, but with no announced plan that the policy for free open source would change. This migration seemed to proceed very slowly though.

Perhaps because it was part of preparing the company for a sale, in Jan 2019 it was announced private equity firm Idera had bought travis. At the time the announcement said “We will continue to maintain a free, hosted service for open source projects,” but knowing what “private equity” usually means, some were concerned for the future. (HN discussion).

While the FAQ on the migration to travis-ci.com still says that travis-ci.org should remain reliable until projects are fully migrated, in fact over the past few months travis-ci.org projects largely stopped building, as travis apparently significantly reduced resources on the platform. Some people began manually migrating their free open source projects to travis-ci.com where builds still worked. But, while the FAQ also still says “Will Travis CI be getting rid of free users? Travis CI will continue to offer a free tier for public or open-source repositories on travis-ci.com” — in fact, travis announced that they are ending the free service for open source. The “free tier” is a limited trial (available not just to open source), and when it expires, you can pay, or apply to a special program for an extension, over and over again.

They are contradicting themselves enough that, while I’m not sure exactly what is going to happen, I no longer trust them as a service.

Enter Github Actions

I work mostly on ruby and Rails projects. They are all open source, and almost all of them use travis. So while (once moved to travis-ci.com) they are all currently working, it’s time to start moving them somewhere else, before I have dozens of projects with broken CI and still don’t know how to move them. And the new service needs to be free — many of these projects are zero-budget old-school “volunteer” or “informal multi-institutional collaboration” open source.

There might be several other options, but the one I chose is Github Actions — my sense is that it has gotten mature enough to start approaching travis’ level of polish, all of my projects are github-hosted, and Github Actions is free for unlimited use for open source (pricing page; Aug 2019 announcement of free for open source). And we are really fortunate that it became mature and stable in time for travis to withdraw open source support (if travis had been a year earlier, we’d be in trouble).

Github Actions is really powerful. It is built to do probably WAY MORE than travis does, definitely way beyond “automated testing”: various flows for deployment and artifact release, really just about any kind of process for managing your project you want. The logic you can write is almost unlimited, all running on github’s machines.

As a result, though… I found it a bit overwhelming to get started. The Github Actions docs are just overwhelmingly abstract; there is so much there, you can do almost anything — but I don’t actually want to learn a new platform, I just want to get automated test CI for my ruby project working! There are some language/project-specific Guides available, for node.js, python, a few different Java setups — but not for ruby or Rails! My how Rails has fallen, from when most services like this would be focusing on Rails use cases first. :(

There are some third-party guides available that might focus on ruby/rails, but one of the problems is that Actions has been evolving for a few years with some pivots, so it’s easy to find outdated instructions. One helpful orientation I found was this Drifting Ruby screencast. The screencast showed me there is a kind of limited web UI with an integrated docs searcher — but I didn’t end up using it, I just created the text config file by hand, same as I would have for travis. Github provides templates for “ruby” or “ruby gem”, but the Drifting Ruby screencast said “these won’t really work for our ruby on rails application so we’ll have to set up one manually”, so that’s what I did too. ¯\_(ツ)_/¯

But the cost of all the power github Actions provides is… there are a lot more switches and dials to understand and get right (and maintain over time and across multiple projects). I’m not someone who likes copy-paste without understanding it, so I spent some time trying to understand the relevant options and alternatives; in the process I found some things I might have otherwise copy-pasted from other people’s examples that could be improved. So I give you the results of my investigations, to hopefully save you some time, if wordy comprehensive reports are up your alley.

A Simple Test Workflow: ruby gem, test with multiple ruby versions

Here’s a file for a fairly simple test workflow. You can see it’s in the repo at .github/workflows. The name of the file doesn’t matter — while this one is called ruby.yml, I’ve since moved over to naming the file to match the name: key in the workflow for easier traceability, so I would have called it ci.yml instead.

Triggers

You can see we say that this workflow should be run on any push to the master branch, and also for any pull_request at all. Many other examples I’ve seen define pull_request: branches: ["main"], which seems to mean only run on Pull Requests with main as the base. While that’s most of my PR’s, if there is ever a PR that uses another branch as a base for whatever reason, I still want to run CI! While hypothetically you should be able to leave branches out to mean “any branch”, I only got it to work by explicitly saying branches: ["**"]
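For reference, a trigger section matching that description might look something like this (a sketch, assuming master is the default branch as in my projects):

on:
  push:
    branches: [ master ]
  pull_request:
    branches: [ "**" ]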

Matrix

For this gem, we want to run CI on multiple ruby versions. You can see we define them here. This works similarly to travis matrixes. If you have more than one matrix variable defined, the workflow will run for every combination of variables (hence the name “matrix”).

      matrix:
        ruby: [ '2.4.4', '2.5.1', '2.6.1', '2.7.0', 'jruby-9.1.17.0', 'jruby-9.2.9.0' ]

In a given run, the current value of the matrix variables is available in the github actions “context”, which you can access as eg ${{ matrix.ruby }}. You can see how I use that in the name, so that the job will show up with its ruby version in it.

    name: Ruby ${{ matrix.ruby }}

Ruby install

While Github itself provides an action for ruby install, it seems most people are using this third-party action. Which we reference as `ruby/setup-ruby@v1`.

You can see we use the matrix.ruby context to tell the setup-ruby action what version of ruby to install, which works because our matrix values are the correct values recognized by the action. Which are documented in the README, but note that values like jruby-head are also supported.
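The relevant steps look roughly like this (the checkout step is standard boilerplate; exact action versions may differ in your project):

    steps:
      - uses: actions/checkout@v2
      - name: Set up Ruby
        uses: ruby/setup-ruby@v1
        with:
          ruby-version: ${{ matrix.ruby }}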

Note, although it isn’t clearly documented, you can say 2.4 to mean “latest available 2.4.x” (rather than it meaning “2.4.0”), which is hugely useful, and I’ve switched to doing that. I don’t believe that was available via travis/rvm ruby install feature.

For a project that isn’t testing under multiple rubies, if we left out the with: ruby-version, the action will conveniently use a .ruby-version file present in the repo.

Note that you don’t need to put a gem install bundler into your workflow yourself. While I’m not sure it’s clearly documented, I found the ruby/setup-ruby action would do this for you (installing the latest available bundler, instead of whatever was packaged with your ruby version), regardless of whether you are using the bundler-cache feature (see below).

Note on How Matrix Jobs Show Up to Github

With travis, testing for multiple ruby or rails versions with a matrix, we got one (or, well, actually two) jobs showing up on the Github PR:

Each of those lines summarizes a collection of matrix jobs (eg different ruby versions). If any of the individual jobs within the matrix failed, the whole build would show up as failed. Success or failure, you could click on “Details” to see each job and its status:

I thought this worked pretty well — especially for “green” builds I really don’t need to see the details on the PR, the summary is great, and if I want to see the details I can click through, great.

With Github Actions, each matrix job shows up directly on the PR. If you have a large matrix, it can be… a lot. Some of my projects have way more than 6. On PR:

Maybe it’s just because I was used to it, but I preferred the Travis way. (This also makes me think maybe I should change the name key in my workflow to say eg CI: Ruby 2.4.4 to be more clear? Oops, tried that, it just looks even weirder in other GH contexts, not sure.)

Oh, also, that travis way of doing the build twice, once for “pr” and once for “push”? Github Actions doesn’t seem to do that, it just does one, I think corresponding to travis “push”. While the travis feature seemed technically smart, I’m not sure I ever actually saw one of these builds pass while the other failed in any of my projects, so I probably won’t miss it.

Badge

Did you have a README badge for travis? Don’t forget to swap it for equivalent in Github Actions.

The image url looks like: https://github.com/$OWNER/$REPOSITORY/workflows/$WORKFLOW_NAME/badge.svg?branch=master, where $WORKFLOW_NAME of course has to be URL-escaped if it contains spaces etc.

The github page at https://github.com/owner/repo/actions, if you select a particular workflow/branch, does, like travis, give you a badge URL/markdown you can copy/paste if you click on the three-dots and then “Create status badge”. Unlike travis, what it gives you to copy/paste is just image markdown, it doesn’t include a link.

But I definitely want the badge to link to viewing the results of the last build in the UI. So I do it manually: limit to the specific workflow and branch that you made the badge for in the UI, then just copy and paste the URL from the browser. The markdown is a bit confusing to construct manually; here’s what it ended up looking like for me:
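(Roughly, with OWNER, REPO, and a workflow named CI standing in as placeholders:)

    [![CI Status](https://github.com/OWNER/REPO/workflows/CI/badge.svg?branch=master)](https://github.com/OWNER/REPO/actions?query=workflow%3ACI+branch%3Amaster)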

I copy and paste that from an existing project when I need it in a new one. :shrug:

Require CI to merge PR?

However, that difference in how jobs show up to Github, the way each matrix job shows up separately now, has an even more negative impact on requiring CI success to merge a PR.

If you want to require that CI passes before merging a PR, you configure that at https://github.com/acct/project/settings/branches under “Branch protection rules”. When you click “Add Rule”, you can/must choose WHICH jobs are “required”.

For travis, that’d be those two “master” jobs, but for the new system, every matrix job shows up separately — in fact, if you’ve been messing with job names trying to get them right, as I have, the list includes any job name that was used in the last 7 days, and they don’t have the Github workflow name appended to them or anything (another reason to put the github workflow name in the job name?).

But the really problematic part is that if you edit your list of jobs in the matrix — adding or removing ruby versions as one does, or even just changing the name that shows up for a job — you have to go back to this screen to add or remove jobs as a “required status check”.

That seems really unworkable to me, I’m not sure how it hasn’t been a major problem already for users. It would be better if we could configure “all the checks in the WORKFLOW, whatever they may be”, or perhaps best of all if we could configure a check as required in the workflow YML file, the same place we’re defining it, just a required_before_merge key you could set to true or use a matrix context to define or whatever.

I’m currently not requiring status checks for merge on most of my projects (even though i did with travis), because I was finding it unmanageable to keep the job names sync’d, especially as I get used to Github Actions and kept tweaking things in a way that would change job names. So that’s a bit annoying.

fail-fast: false

By default, if one of the matrix jobs fails, Github Actions will cancel all remaining jobs, not bothering to run them at all. After all, you know the build is going to fail if one job fails, so what do you need those others for?

Well, for my use case, it is pretty annoying to be told, say, “Job for ruby 2.7.0 failed, we can’t tell you whether the other ruby versions would have passed or failed” — the first thing I want to know is whether it failed on all ruby versions or just 2.7.0, so now I’d have to spend extra time figuring that out manually? No thanks.

So I set `fail-fast: false` on all of my workflows, to disable this behavior.
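In the workflow file that key lives under strategy, next to the matrix; something like this (the ruby versions are just an example):

    strategy:
      fail-fast: false
      matrix:
        ruby: [ '2.6', '2.7' ]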

Note that travis had a similar (opt-in) fast_finish feature, which worked subtly differently: Travis would report failure to Github on first failure (and notify, I think), but would actually keep running all jobs. So when I saw a failure, I could click through to ‘details’ to see which (eg) ruby versions passed, from the whole matrix. That worked for me, so I chose to opt in to that travis feature. Unfortunately, the subtle difference in Github Actions’ behavior makes it undesirable to me.

Note: You may see some people referencing a Github Actions continue-on-error feature. I found the docs confusing, but after experimentation what this really does is mark a job as successful even when it fails. It shows up in all GH UI as succeeded even when it failed; the only way to know it failed would be to click through to the actual build log to see the failure in the logged console. I think “continue on error” is a weird name for this; it is not useful to me with regard to fine-tuning fail-fast, or honestly in any other use case I can think of.

Bundle cache?

bundle install can take 60+ seconds, and be a significant drag on your build (not to mention a lot of load on rubygems servers from all these builds). So when travis introduced a feature to cache: bundler: true, it was very popular.

True to form, Github Actions gives you a generic caching feature you can try to configure for your particular case (npm, bundler, whatever), instead of an out-of-the-box “we’ll just do the right thing for bundler” feature; with Github Actions, you figure it out yourself.

The ruby/setup-ruby third-party action has a built-in feature to cache bundler installs for you, but I found that it does not work right if you do not have a Gemfile.lock checked into the repo (i.e., for most any gem project, as opposed to an app). It will end up re-using cached dependencies even if there are new releases of some of your dependencies, which is a big problem for how I use CI for a gem — I expect it to always be building with the latest releases of dependencies, so I can find out if one breaks the build. This may get fixed in the action.

If you have an app (rather than gem) with a Gemfile.lock checked into repo, the bundler-cache: true feature should be just fine.
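For that case, it’s just one more key on the setup-ruby step (a sketch):

      - uses: ruby/setup-ruby@v1
        with:
          ruby-version: '2.7'
          bundler-cache: true   # runs `bundle install` and caches installed gems for you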

Otherwise, Github has some suggestions for using the generic cache feature for ruby bundler (search for “ruby – bundler” on this page) — but I actually don’t believe they will work right without a Gemfile.lock checked into the repo either.

Starting from that example, and using the restore-keys feature, I think it should be possible to design a use that works much like travis’s bundler cache did, and works fine without a checked-in Gemfile.lock. We’d want it to use a cache from the most recent previous (similar job), and then run bundle install anyway, and then cache the results again at the end always to be available for the next run.
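An untested sketch of that approach, using the generic actions/cache action (the path, key names, and bundle path here are my assumptions, not something I’ve verified):

      - name: Cache bundler gems
        uses: actions/cache@v2
        with:
          path: vendor/bundle
          # a key unique to each run, so the cache is re-saved at the end of every build...
          key: bundle-${{ matrix.ruby }}-${{ github.run_id }}
          # ...while restore-keys falls back to the most recent previous similar job's cache
          restore-keys: |
            bundle-${{ matrix.ruby }}-
      - name: Bundle install
        run: |
          bundle config set path vendor/bundle
          bundle install --jobs 4 --retry 3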

But I haven’t had time to work that out, so for now my gem builds are simply not using bundler caching. (my gem builds tend to take around 60 seconds to do a bundle install, so that’s in every build now, could be worse).

update nov 27: The ruby/setup-ruby action should now be fixed to properly cache-bust when you don’t have a Gemfile.lock checked in. If you are using a matrix of gemfiles, as below, you must select the gemfile by setting the BUNDLE_GEMFILE env variable rather than the way we did it below, and there is a certain way Github Actions requires/provides you to do that, it’s not just export. See the issue in the ruby/setup-ruby project.

Notifications: Not great

Travis has really nice defaults for notifications: The person submitting the PR would get an email generally only on status changes (from pass to fail or fail to pass) rather than on every build. And travis would even figure out what email to send to based on what email you used in your git commits. (Originally perhaps a workaround to lack of Github API at travis’ origin, I found it a nice feature). And then travis has sophisticated notification customization available on a per-repo basis.

Github notifications are unfortunately much more basic and limited. The only notification settings available are for your entire account at https://github.com/settings/notifications, under “GitHub Actions”. So they apply to all github workflows in all projects; there are no workflow- or project-specific settings. You can choose to receive notification via web push or email or both or neither. You can receive notifications for all builds or only failed builds. That’s it.

The author of a PR is the one who receives the notifications, same as in travis. You will get notifications for every single build, even repeated successes or failures in a series.

I’m not super happy with the notification options. I may end up just turning off Github Actions notifications entirely for my account.

Hypothetically, someone could probably write a custom Github action to give you notifications exactly how travis offered — after all, travis was using a public GH API that should be available to any other author, and I think should be usable from within an action. But when I started to think it through, while it seemed an interesting project, I realized it was definitely beyond the “spare hobby time” I was inclined to give it at present, especially not being much of a JS developer (the language of custom GH actions, generally). (While you can list third-party actions on the github “marketplace”, I don’t think there’s a way to charge for them.)

There are custom third-party actions available to do things like notify slack for build completion; I haven’t looked too much into any of them, beyond seeing that I didn’t see any that would be “like travis defaults”.

A more complicated gem: postgres, and Rails matrix

Let’s move to a different example workflow file, in a different gem. You can see I called this one ci.yml, matching its name: CI, to have less friction for a developer (including future me) trying to figure out what’s going on.

This gem does have rails as a dependency and does test against it, but isn’t actually a Rails engine as it happens. It also needs to test against Postgres, not just sqlite3.

Scheduled Builds

At one point travis introduced a feature for scheduling (eg) weekly builds even when no PR/commit had been made. I enthusiastically adopted this for my gem projects. Why?

Gem releases are meant to work on a variety of different ruby versions and different exact versions of dependencies (including Rails). Sometimes a new release of ruby or rails will break the build, and you want to know about that so you can fix it. With CI builds happening only on new code, you find out about this via some random new code change that is unlikely to be related to the failure; and you only find out on the next commit that triggers a build after the dependency release, which on a mature and stable gem could be a long time after the release that actually broke it.

So scheduled builds for gems! (I have no purpose for scheduled test runs on apps).

Github Actions does have this feature. Hooray. One problem is that you will receive no notification of the result of the scheduled build, success or failure. :( I suppose you could include a third-party action to notify a fixed email address or Slack or something else; I’m not sure how you’d configure that to apply only to the scheduled builds and not the commit/PR-triggered builds, if that’s what you wanted. (Or make a custom action to file a GH issue on failure??? But make sure it doesn’t spam you with issues on repeated failures.) I haven’t had the time to investigate this yet.
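The trigger itself is just another key under on: (the cron expression here is an arbitrary example meaning weekly):

    on:
      push:
        branches: [ master ]
      pull_request:
      schedule:
        - cron: '0 4 * * 1'   # weekly, Mondays at 04:00 UTC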

Also oops just noticed this: “In a public repository, scheduled workflows are automatically disabled when no repository activity has occurred in 60 days.” Which poses some challenges for relying on scheduled builds to make sure a stable slow-moving gem isn’t broken by dependency updates. I definitely am committer on gems that are still in wide use and can go 6-12+ months without a commit, because they are mature/done.

I still have it configured in my workflow; I guess even without notifications it will affect the “badge” on the README, and… maybe I’ll notice? Very far from ideal, work in progress. :(

Rails Matrix

OK, this one needs to test against various ruby versions AND various Rails versions. A while ago I realized that an actual matrix of every ruby combined with every rails was far too many builds. Fortunately, Github Actions supports the same kind of matrix/include syntax as travis, which I use.

     matrix:
        include:
          - gemfile: rails_5_0
            ruby: 2.4

          - gemfile: rails_6_0
            ruby: 2.7

I use the appraisal gem to handle setting up testing under multiple rails versions, which I highly recommend. You could use it for testing variant versions of any dependencies, I use it mostly for varying Rails. Appraisal results in a separate Gemfile committed to your repo for each (in my case) rails version, eg ./gemfiles/rails_5_0.gemfile. So those values I use for my gemfile matrix key are actually portions of the Gemfile path I’m going to want to use for each job.

Then we just need to tell bundler, in a given matrix job, to use the gemfile we specified in the matrix. The old-school way to do this is with the BUNDLE_GEMFILE environmental variable, but I found it error-prone to make sure it stayed consistently set in each workflow step. I found that the newer (although not that new!) bundle config set gemfile worked swimmingly! I just set it before the bundle install, it stays set for the rest of the run including the actual test run.

steps:
    # [...]
    - name: Bundle install
      run: |
        bundle config set gemfile "${GITHUB_WORKSPACE}/gemfiles/${{ matrix.gemfile }}.gemfile"
        bundle install --jobs 4 --retry 3

Note that single braces are used for ordinary bash syntax to reference the ENV variable ${GITHUB_WORKSPACE}, but double braces for the github actions context value interpolation ${{ matrix.gemfile }}.

Works great! Oh, note how we set the name of the job to include both ruby and rails matrix values, important for it showing up legibly in Github UI: name: ${{ matrix.gemfile }}, ruby ${{ matrix.ruby }}. Because of how we constructed our gemfile matrix, that shows up with job names rails_5_0, ruby 2.7.

Still not using bundler caching in this workflow. As before, we’re concerned about the ruby/setup-ruby built-in bundler-cache feature not working as desired without a Gemfile.lock in the repo. This time, I’m also not sure how to get that feature to play nicely with the variant gemfiles and bundle config set gemfile. Github Actions makes you put a lot more pieces together yourself compared to travis; there are still things I’ve just postponed figuring out for now.

update jan 11: the ruby/setup-ruby action now includes a matrix-of-gemfiles example in its README: https://github.com/ruby/setup-ruby#matrix-of-gemfiles. It does require you to use the BUNDLE_GEMFILE env variable, rather than the bundle config set gemfile command I used here. This should ordinarily be fine, but is something to watch out for in case other instructions you are following try to use bundle config set gemfile instead, for whatever reason.
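As best I can tell from that README example, the env-variable approach looks roughly like this (a sketch; see their README for the authoritative version):

    env:
      # set at the job level, so it applies to setup-ruby's bundler-cache and to all later steps
      BUNDLE_GEMFILE: ${{ github.workspace }}/gemfiles/${{ matrix.gemfile }}.gemfile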

Postgres

This project needs to build against a real postgres. That is relatively easy to set up in Github Actions.

Postgres normally by default allows connections on localhost without a username/password set, and my past builds (in travis or locally) took advantage of this to not bother setting one, which then the app didn’t have to know about. But the postgres image used for Github Actions doesn’t allow this, you have to set a username/password. So the section of the workflow that sets up postgres looks like:

jobs:
   tests:
     services:
       db:
         image: postgres:9.4
         env:
           POSTGRES_USER: postgres
           POSTGRES_PASSWORD: postgres
         ports: ['5432:5432']

5432 is the default postgres port; we need to set it and map it so it will be available as expected. Note you can also specify whatever version of postgres you want; this one is intentionally testing on a somewhat old one.

OK, now our Rails app that will be executed under rspec needs to know that username and password to use in its postgres connection, whereas before it connected without one. That env under the postgres service image is not actually available to the job steps. I didn’t find any way to DRY the username/password into one place; I had to repeat it in another env block, which I put at the top level of the workflow so it would apply to all steps.
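That repeated top-level env block looks something like this (the variable names here are my own hypothetical choices; use whatever your database.yml will reference):

env:
  DATABASE_USERNAME: postgres
  DATABASE_PASSWORD: postgres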

And then I had to alter my database.yml to use those ENV variables, in the test environment. On a local dev machine, if your postgres doesn’t have a username/password requirement and you don’t set the ENV variables, it keeps working as before.

I also needed to add host: localhost to the database.yml; before, the absence of the host key meant it used a unix-domain socket (filesystem-located) to connect to postgres, but that won’t work in the Github Actions containerized environment.
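Putting those together, the test section of database.yml ends up looking roughly like this (again with my hypothetical env variable names; the database name is just an example):

    test:
      adapter: postgresql
      host: localhost
      username: <%= ENV["DATABASE_USERNAME"] %>
      password: <%= ENV["DATABASE_PASSWORD"] %>
      database: my_app_test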

Note, there are things you might see in other examples that I don’t believe you need:

  • No need for an apt-get of pg dev libraries. I think everything you need is on the default GH Actions images now.
  • Some examples I’ve seen do a thing with options: --health-cmd pg_isready, my builds seem to be working just fine without it, and less code is less code to maintain.

allow_failures

In travis, I took advantage of the travis allow_failures key in most of my gems.

Why? I am testing against various ruby and Rails versions; I want to test against *future* (pre-release, edge) ruby and rails versions too, because it’s useful to know if I’m already passing on them with no effort, and I’d like to keep passing on them — but I don’t want to mandate it, or prevent PR merges if the build fails on a pre-release dependency. (After all, it could very well be a bug in the dependency too!)

There is no great equivalent to allow_failures in Github Actions. (Note again, continue-on-error just makes failed jobs look identical to successful jobs, and isn’t very helpful here).

I investigated some alternatives, which I may go into more detail on in a future post, but on one project I am trying a separate workflow just for “future ruby/rails allowed failures” which only checks master commits (not PRs), and has a separate badge on README (which is actually pretty nice for advertising to potential users “Yeah, we ALREADY work on rails edge/6.1.rc1!”). Main downside there is having to copy/paste synchronize what’s really the same workflow in two files.

A Rails app

Many more of the projects I’m a committer on are gems, but I spend more of my time on apps, one app in particular.

So here’s an example Github Actions CI workflow for a Rails app.

It mostly remixes the features we’ve already seen. It doesn’t need any matrix. It does need a postgres.

It does need some “OS-level” dependencies — the app does some shell-out to media utilities like vips and ffmpeg, and there are integration tests that utilize this. Easy enough to just install those with apt-get, works swimmingly.

        - name: Install apt dependencies
          run: |
            sudo apt-get -y install libvips-tools ffmpeg mediainfo

Update 25 Nov: My apt-get, which had worked for a couple weeks, started failing for some reason on trying to install a libpulse0 dependency of one of those packages; the solution was doing a sudo apt-get update before the sudo apt-get install. I guess this is always good practice? (That forum post also uses apt install and apt update instead of apt-get install and apt-get update, which I can’t tell you much about, I’m really not a linux admin).
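With that fix, the step looks like:

        - name: Install apt dependencies
          run: |
            sudo apt-get update
            sudo apt-get -y install libvips-tools ffmpeg mediainfo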

In addition to the bundle install, a modern Rails app using webpacker needs yarn install. This just worked for me — no need to include lines for installing npm itself or yarn or any yarn dependencies, although some examples I find online have them. (My yarn installs seem to happen in ~20 seconds, so I’m not motivated to try to figure out caching for yarn).

And we need to create the test database in the postgres, which I do with RAILS_ENV=test bundle exec rails db:create — typical Rails test setup will then automatically run migrations if needed. There might be other (better?) ways to prep the database, but I was having trouble getting rake db:prepare to work, and didn’t spend the time to debug it, just went with something that worked.

    - name: Set up app
      run: |
        RAILS_ENV=test bundle exec rails db:create
        yarn install

Rails test setup usually ends up running migrations automatically, which is why I think this worked on its own; but you could also throw in a RAILS_ENV=test bundle exec rake db:schema:load if you wanted.

Under travis I had to install chrome with addons: chrome: stable to have it available to use with capybara via the webdrivers gem. No need for installing chrome in Github Actions, some (recent-ish?) version of it is already there as part of the standard Github Actions build image.

In this workflow, you can also see a custom use of the github “cache” action to cache a Solr install that the test setup automatically downloads and sets up. In this case the cache doesn’t actually save us any build time, but is kinder on the apache foundation servers we are downloading from with every build otherwise (and have gotten throttled from in the past).
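The cache step is just the generic actions/cache action again, roughly like this, with a hypothetical path and key (the real ones depend on where your test setup downloads and unpacks Solr):

      - name: Cache Solr download
        uses: actions/cache@v2
        with:
          path: tmp/solr_dist     # hypothetical: wherever the test setup unpacks Solr
          key: solr_dist-8.7.0    # hypothetical: bump when you change Solr versions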

Conclusion

Github Actions is a really impressively powerful product. And it’s totally going to work to replace travis for me.

It’s also probably going to take more of my time to maintain. The trade-off of more power/flexibility and focusing on almost limitless use cases is more things the individual project has to get right for its use case. For instance, figuring out the right configuration to get caching for bundler or yarn right, instead of just writing cache: { yarn: true, bundler: true }. And when you have to figure it out yourself, you can get it wrong, which when you are working on many projects at once means you have a bunch of places to fix.

The reliance on a third-party action “marketplace” means you have to figure out the right action to use (the third-party ruby/setup-ruby instead of the vendor’s actions/setup-ruby), and again if you change your mind about that, you have a bunch of projects to update.

Anyway, it is what it is — and I’m grateful to have such a powerful and in fact relatively easy to use service available for free! I could not really live without CI anymore, and won’t have to!

Oh, and Github Actions is giving me way more (free) simultaneous parallel workers than travis ever did, for my many-job builds!