systemd timers vs cron

Chris Siebenmann recently posted an article on his excellent blog titled “Systemd timer units have the unfortunate practical effect of hiding errors”.

I posted a comment in reply:

Switching from cron to systemd timers is definitely an operational change.

The emphasis on emails feels like status quo bias, though. Imagine the situation was reversed: that everything was using systemd timers and then someone wrote cron and people started switching to that. In that case, there is a similar operational change. You’d switch from having a centralized status (e.g. systemctl list-units --failed) and centralized logging (the journal, which also defaults to forwarding to syslog) to crond sending emails. Is that an improvement, a step backwards, neither or both? Either way, I’d say the most important thing is that you need to integrate the new tool into your environment.

FWIW, at my work, we are in the process of converting all of our cron jobs into systemd service and timer pairs. One of the big reasons for that is that we already have systemd failed units monitored by Icinga, so this eliminates a separate way of monitoring things (emails to root) in favor of our unified alarming system. Also, emails are not great if an “every minute” or “every five minutes” cron job starts failing.

We also expect other advantages. For one, service units are easier to develop & debug, as you can just start them with systemctl, without having to fiddle with the cron definition to run it at “the next minute” and then remember to change it back to the production timings when you’re done. Also, systemd timers can be randomized to spread the load rather than having every system wake up at the same moment and start running e.g. daily jobs. (I’m aware that cronie has RANDOM_DELAY, but Debian & Ubuntu still use Debian’s vixie-cron which does not.)

Time will tell if this was a good idea or not. Assuming this goes well for us, the next phase will be to switch from crond to systemd-cron, a (third-party) systemd generator that creates service and timer units from crontabs. This will dynamically convert any package cron jobs.

If emails are what you want, systemd timers are definitely a step backwards in that regard. Emails can be done, and systemd-cron has a setup for them for the units it converts, but it is additional work. And for timer-triggered services that are provided by distro packages (i.e. not you), while you can use drop-in config files to add the relevant configuration, you have to do that per-service. This is extra work, and more importantly, you have to know about all such units you have installed, which does not scale.

Another, more general, option would be to wire up something to check for failed units and send an email based on that.
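As a rough illustration of that last option, here is a minimal Python sketch that lists failed units and mails them to root. It assumes `systemctl list-units --state=failed --plain --no-legend` prints one unit per line with the unit name in the first column, and that a local SMTP server is listening on localhost. The function names, the sender address, and the recipient are all made up for the example.

```python
import smtplib
import socket
import subprocess
from email.message import EmailMessage


def parse_failed_units(listing: str) -> list[str]:
    """Extract unit names from `systemctl list-units --plain --no-legend` output."""
    units = []
    for line in listing.splitlines():
        fields = line.split()
        # Defensive: some systemd versions prefix failed units with a bullet.
        if fields and fields[0] == "●":
            fields = fields[1:]
        if fields:
            units.append(fields[0])
    return units


def build_report(units: list[str], host: str,
                 recipient: str = "root@localhost") -> EmailMessage:
    """Build a plain-text mail listing the failed units (addresses are assumptions)."""
    msg = EmailMessage()
    msg["Subject"] = f"{len(units)} failed systemd unit(s) on {host}"
    msg["From"] = f"systemd-check@{host}"
    msg["To"] = recipient
    msg.set_content("\n".join(units) + "\n")
    return msg


def report_failed_units() -> None:
    """Query systemd and send one mail if anything has failed."""
    result = subprocess.run(
        ["systemctl", "list-units", "--state=failed", "--plain", "--no-legend"],
        capture_output=True, text=True, check=False,
    )
    units = parse_failed_units(result.stdout)
    if units:
        # Assumes a local MTA accepts mail on localhost:25.
        with smtplib.SMTP("localhost") as smtp:
            smtp.send_message(build_report(units, socket.gethostname()))
```

Calling `report_failed_units()` from its own timer-triggered service would cover every unit on the system, including ones from distro packages, without needing per-service drop-ins. A job that fails every minute would still produce one mail per check, so some deduplication would be needed in practice.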

FreeBSD Code of Conduct

Slashdot linked to the FreeBSD Code of Conduct. The article claims there is some controversy, so that’s what the comments focus on. I wrote:

Having just read the Code of Conduct, it seems generally fine. Some of my concerns are that the rules are too broad, and some are that they are too narrow.

The “Comments that reinforce systemic oppression related to” wording seems super vague. This portion has the highest potential for abusive use. To be clear, I’m fine with all the protected criteria that come in that rule. I’d much prefer replacing that with “Harassing comments related to”.

The “unwelcome comments” thing is pretty broad. If someone says to me on IRC, “I’m tired all the time.” and I say, “You should stop eating so much junk food and get some exercise.”, I’m now in trouble if they feel that comment is unwelcome. With this rule, the only option for me is to never engage in such a conversation. Is that helpful or harmful to building relationships and living fulfilling lives? I think it’s more harmful than helpful. Now, I agree that continually nagging that person to eat healthy is inappropriate. If this was limited to “repeated”, “after being asked to stop”, or similar, it would be better.

I have some concerns about the “dead” names thing. I get and agree with the point: use the names people pick for themselves. As long as this isn’t enforced robotically, it should be fine. There are some legitimate reasons to use names that were in use in times in the past. For example, I think citations to publications should use the name of the author at the time it was published, because the point of the citation is to help you find the publication. This is supported by, for example, an APA Style Blog post. The issue of whether to change one’s name is complicated for the individual and has implications for the wider community.

For another example, yesterday I was considering replying to a years-old mailing list comment, and quoting some text. The author of the quoted text is trans and has changed names. Am I required to edit the “On DATE, NAME wrote:” line? To be clear, in new text, I would address this person using their new name (and have actually done so). I said in a follow up comment: I actually struggled with this for several minutes before ultimately deciding to just drop the “On DATE, NAME” bit. I ultimately determined the answer to my own question, so I dropped the email before sending it.

I personally don’t see a problem with person A saying “*hugs*” to person B without (advance) consent. Though, this is situational. If someone says, “Sorry for the delay on this bug, I’ve been distracted. My dog died.”, I see no problem with “Sorry to hear about your dog. *hugs*”. On the other hand, something like “You’re such a special snowflake. *hugs*” is an improper ad hominem attack. Even in the first example, I do have a problem if they keep doing it after being told by person B to stop, so that rule is fine. On the other hand, saying “*backrub*” out of the blue does seem across the line. I’m struggling to think of an example where that would be unambiguously appropriate.

I’m not sure why the “as necessary to protect vulnerable people from intentional abuse” exception exists to the “outing” rule. Why would it be necessary or acceptable to out someone to protect them? I said in a follow up comment: In terms of the exception to the “outing” rule, I was assuming that the person being outed was the vulnerable person. I see my error now, and this makes sense.

“Publication of non-harassing private communication without consent.” is problematic as a blanket rule. If someone says something important publicly which is materially contradicted by private statements, that might be necessary (albeit tacky) to share, even if those private statements are non-harassing.

“Knowingly making harmful false claims about a person.” I would strike harmful. Why is it necessary that the false claims be harmful?

ISP Traffic Prioritization

This was originally posted as a Slashdot comment. It discusses the idea of prioritizing traffic in an ISP environment, ideally using markings generated by the customers.

I do network engineering at an ISP. We are small, though I have discussed these things with my peers at larger networks.

Once you scale above a very small network (like your home connection), allowing congestion isn’t really okay in practice, even with QoS. When I say it’s not “okay” here, I’m speaking purely technically.

It might be possible to let networks congest somewhat if you had a large amount of elastic traffic that you could reliably identify. Netflix, for example, could meet these criteria. But that’s not okay politically; that’s an example of why net neutrality is good!

QoS in carrier networks is only useful for priority (de-)queuing of traffic to reduce latency and jitter. For example, real-time voice or video traffic could benefit. This is where it’d be nice to actually be able to honor user traffic markings.

It’s not (currently at least) practical to make the decisions on a flow-by-flow basis in the core of the network (which is what your proposal would require). This is a hardware scaling issue. To be clear, tracking flows statistically is okay at scale. ISPs do plenty with NetFlow/sFlow. But taking an incoming packet, assigning it to a flow, and marking it appropriately, for every packet, in real time is the scaling challenge.

The following approach would scale perfectly in trusted CPE (ONT/cable modem) or reasonably well in a DSLAM (for DSL). Give each user (for example) two queues. Honor the incoming DSCP markings. Put a small, but reasonable, limit on the size of the priority queue; overflowing traffic gets remarked and placed into the non-priority queue. Then, honor markings through the rest of the network.
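The per-user behaviour described above can be sketched in a few lines of Python. This is a toy model, not how any real CPE or DSLAM implements it: it assumes DSCP 46 (Expedited Forwarding) is the priority marking, caps the priority queue by packet count rather than bytes or rate, and uses strict-priority dequeuing. The class and field names are invented for the illustration.

```python
from collections import deque
from dataclasses import dataclass, field

EF = 46          # DSCP Expedited Forwarding, treated here as "priority"
BEST_EFFORT = 0  # default DSCP


@dataclass
class Packet:
    flow: str
    dscp: int


@dataclass
class SubscriberPort:
    priority_limit: int = 2  # hypothetical cap on queued priority packets
    priority: deque = field(default_factory=deque)
    normal: deque = field(default_factory=deque)

    def enqueue(self, pkt: Packet) -> None:
        # Honor the customer's marking while the priority queue has room.
        if pkt.dscp == EF and len(self.priority) < self.priority_limit:
            self.priority.append(pkt)
            return
        # Overflowing priority traffic is remarked, not dropped.
        if pkt.dscp == EF:
            pkt.dscp = BEST_EFFORT
        self.normal.append(pkt)

    def dequeue(self):
        # Strict priority: drain the priority queue first.
        if self.priority:
            return self.priority.popleft()
        if self.normal:
            return self.normal.popleft()
        return None
```

Because the overflow is remarked at the edge, the rest of the network can trust the marking it sees and only needs to look at the DSCP field, not track flows.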

There are a few problems with even this approach. First off, there are going to be users who legitimately create more high priority traffic than any limit that’s acceptable across the board. Is it okay to charge them for a higher limit? If not, how do you avoid gaming the system? If yes, won’t that incentivize ISPs to set the limit to zero and charge for all priority? Is that okay? If so, what fraction of people will request and pay for priority in that world? Will that be enough to encourage application developers to mark traffic appropriately? Or does this just degrade into our current zero-priority Internet?

Second, this only gets you one direction (upload). To handle the download direction, you’d need to honor priority bits on your upstream and peering links. But there, you can’t trust the markings (unless it’s a 1:1 peering link and you are guaranteed your peer implements a compatible policy at their incoming edge), at least without policing. Policing the queues there is easy, but gives you terrible results in real life. If the limit is exceeded with traffic that “should not have been” marked priority, it will destroy the prioritization of “legitimate” priority flows by forcing some fraction of their packets into the non-priority queue. If you accept all (or just a high enough fraction of) incoming traffic as priority traffic, then you have destroyed the prioritization yourself. If you try to mark flows per IP/customer, we’re back to that scaling problem.
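The policing failure mode can be put in rough numbers. The sketch below assumes an aggregate policer that cannot tell flows apart, so excess packets are demoted in proportion to how much marked traffic arrives, regardless of who sent it. The function name and the rates are made up for the illustration.

```python
def demoted_fraction(policer_limit: float, marked_rate: float) -> float:
    """Fraction of priority-marked traffic pushed into the non-priority queue."""
    if marked_rate <= policer_limit:
        return 0.0
    return 1.0 - policer_limit / marked_rate


# Hypothetical rates in Mbit/s on one peering link.
legit = 80.0    # traffic that "should" be priority; fits under the limit alone
bogus = 320.0   # traffic marked priority that "should not have been"
limit = 100.0   # policer limit for the priority queue

# Share of every marked flow's packets, legitimate or not, that gets demoted.
loss = demoted_fraction(limit, legit + bogus)
```

With these numbers `loss` is 0.75: the legitimate flows would have fit under the limit by themselves, yet three quarters of their packets land in the non-priority queue anyway.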

It might be possible to do something that involves tracking flows at the customer edge and using the incoming markings for the downstream direction. But this is only prioritizing in the last mile. At best, this is a lot of work for very little benefit.

“Past Due”

Dear banks: if someone has automatic payments set up, it is impossible for them to ever be past due, unless a payment bounces. Please get your programmers to ensure that your websites never show such inaccuracies. I’m really sick of things showing as “past due” randomly and then clearing up and “customer service” treating it like it is okay that it ever showed that way because it cleared up in the end.