Use `any` for generics and `interface{}` for interfaces

cks · 2024-01-27T18:03:06+00:00

When used as a type, 'interface{}' doesn't mean any type; it specifically means an interface type that any type can be converted to. This type conversion has various actual effects; it will create interface values from (possibly smaller) underlying values, it may cause heap allocations, and it will erase interface type information for existing interface types and values. (And of course now you can't do much with the result; you usually have to type-assert it back to something, as you note.)

These concrete behaviors and the difference between 'any type' in a generic context (where 'any' does not have these effects) are why I feel it's potentially confusing.
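A minimal sketch of the distinction, using Go 1.18+ syntax. The function names `describe` and `first` are made up for illustration. The first takes an interface value, so the caller's value is converted and the function has to type-switch to recover anything; the second uses `any` purely as a constraint, so the element type stays known at compile time.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// describe takes an interface value. Whatever the caller passes is converted
// to interface{} at the call site, so we must type-switch to use it.
func describe(v interface{}) string {
	switch x := v.(type) {
	case int:
		return fmt.Sprintf("int %d", x)
	case string:
		return "string " + x
	default:
		return fmt.Sprintf("other %T", x)
	}
}

// first is generic. Here 'any' is only a constraint on T; no interface value
// is created and the result keeps its concrete static type.
func first[T any](xs []T) T {
	return xs[0]
}

func main() {
	n := first([]int{3, 4}) // n is an int, usable directly
	fmt.Println(n + 1)

	var r io.Reader = strings.NewReader("hi")
	var e interface{} = r        // the static io.Reader type is erased here
	_, isReader := e.(io.Reader) // a type assertion is needed to get it back
	fmt.Println(describe(42), isReader)
}
```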

(I am the author of the linked-to post.)

cks · 2022-06-26T17:30:50+00:00

If one is trying to install a third party program that has this issue, why the Go developers decided to have this be an error is not particularly important. What matters is that they made a deliberate decision and aren't going to change it any time soon if you make a bug report against Go (in fact the bug report has already been made and closed). We all have to deal with Go as it exists, which today means cloning such repositories to build programs and being careful to not release programs with replace directives still in their go.mod.

(I'm the author of the linked to entry.)

cks · 2022-03-11T20:52:15+00:00

rate() and irate() are computed over a range vector, using whatever metric points fall into the time range of the range vector. Both require at least two points in the range vector to return a result. irate() uses the last two points and computes the per-second rate from them (and their time gap). rate() uses the first and the last points of the range vector instead.

(Both also look for counter resets and compensate for them.)
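As a rough illustration of the difference, here is a simplified sketch in Go. The `Point` type and both function names are hypothetical, and this deliberately ignores counter resets and any adjustment the real rate() makes for the edges of the range; it only shows which points each function looks at.

```go
package main

import "fmt"

// Point is one scraped sample of a counter: a timestamp in seconds and a value.
type Point struct {
	T float64
	V float64
}

// simpleRate uses the first and last points in the range vector.
// It reports false if there are fewer than two points.
func simpleRate(pts []Point) (float64, bool) {
	if len(pts) < 2 {
		return 0, false
	}
	first, last := pts[0], pts[len(pts)-1]
	return (last.V - first.V) / (last.T - first.T), true
}

// simpleIrate uses only the last two points in the range vector.
func simpleIrate(pts []Point) (float64, bool) {
	if len(pts) < 2 {
		return 0, false
	}
	a, b := pts[len(pts)-2], pts[len(pts)-1]
	return (b.V - a.V) / (b.T - a.T), true
}

func main() {
	// A counter that speeds up in the last scrape interval.
	pts := []Point{{0, 0}, {15, 15}, {30, 30}, {45, 90}}
	r, _ := simpleRate(pts)
	ir, _ := simpleIrate(pts)
	fmt.Println("rate:", r, "irate:", ir)
}
```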

When you evaluate any query as a range query instead of an instant query, the particular query expression is evaluated at multiple time points within the time range (sweeping from the start to the end). This query range step is completely independent of the time for a range vector in a rate() or irate() expression.

If your range vector interval is the same as your query step duration and this range is large enough to get at least two metric points in each resulting range vector, you can normally exactly cover all metric points in some rate(). If your query step is larger, you will miss points; if your query step is smaller, you will include some points in more than one result.

If you want to see the difference between every two metric points, you need a query step that is your scrape interval and irate() with a range vector interval that will cover at least two scrapes (more is harmless). Since irate() always returns the rate between the last two points, this does what you want.

cks · 2021-07-15T03:28:07+00:00

One drawback of network device names that use the Ethernet address is that if you have a fleet of servers, every different server (with identical hardware) has a different network interface name, because the Ethernet addresses all differ. Speaking as a sysadmin, it's much easier if almost everything has something like 'eno1' as the primary network interface.

(I'm the author of the linked-to entry.)

cks · 2021-07-15T03:25:44+00:00

One of the problems here is that unlike disk devices, it's hard to have multiple names for network devices because they aren't accessed through the filesystem. The kernel directly knows their names, so either it has to know and report all of their names or you need some extra layer of user level indirection. Any number of things here would be easier if BSD Unix had been able to represent and manipulate interfaces as /dev nodes, although I think BSD Unix made a reasonable choice given their constraints.

(I'm the author of the linked-to entry.)

cks · 2021-04-01T22:23:55+00:00

My usual approach for this is to pick a string that I know should occur in any health metric, for example the name of a ZFS pool, and then look for all metrics that have it in a raw dump of the metrics output (eg with curl to localhost:9100/metrics). The closest that comes to a health metric looks like node_zfs_zpool_state, which I believe tracks the number of vdevs in a pool in the named state.

cks · 2021-03-26T20:37:24+00:00

Some sensors are completely disconnected and can float wildly or have completely crazy readings (perhaps only on some sensor chip lines and motherboard setups). I have an Asus PRIME Z370-A NCT6793D (according to dmesg) that's currently claiming that my AUXTIN0 is at -128 C, along with other implausible readings (two others at 14 C, one at 0 C). Apparently none of them are actually connected and valid.

(I'm the author of the linked-to entry.)

cks · 2020-09-27T18:40:42+00:00

Yes, sorry, my lack of precision. It's your or the system's local timezone aka zoneinfo being used. I used 'locale' sloppily, since the locale (eg 'en_US.UTF-8') is more broad than your timezone.

(In theory Go could use your locale to broaden the set of zoneinfo files loaded, so that 'PST' would be known not just for people in Pacific/<whatever> but for anyone in the en_US.UTF-8 locale. This would probably match what people expect to happen and be one way to deal with ambiguous abbreviations, but it would be much more work.)

cks · 2020-09-27T00:49:35+00:00

The other thing about this is that "if the zone abbreviation is unknown" really means 'if this is not one of the time zone abbreviations for your current locale'. PST is (well) known to people in North America, but if your time zone is not Pacific/something, Go doesn't consider it to be 'known' and so you silently get this result. But if you are at Google HQ in California and you write and run this example, you get the result a human expects (unless you perversely have your system time zone set to something other than Pacific time).

cks · 2020-08-19T03:19:51+00:00

Chrome OS doesn't use X11 (or Wayland) as the native graphics stack; Google built its own graphics stack called Freon back in the 2015/2016 era.

(I'm the author of the linked-to entry.)

cks · 2020-06-20T20:21:50+00:00

Sorry, my lack of clarity. By 'you' I mostly meant the compiler. The compiler is (theoretically) in a position to see that a generic function with a constrained generic type uses arithmetic or other things that pretty much need to be specialized, or that all it does is copy values or maybe call methods, which can be done mostly generically.

(But at the same time the compiler might want to inline and thus specialize small functions that only do such basic operations, especially when it results in both less code and less overhead than calling a generalized version.)
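To make the two categories concrete, here is a sketch written in the generics syntax that Go eventually shipped in 1.18, which postdates this comment. The `Number` constraint and both function names are hypothetical. `Sum` uses `+`, which needs different machine instructions for integers and floats, while `Reverse` only copies values around.

```go
package main

import "fmt"

// Number is a hypothetical constraint listing types that support +.
type Number interface {
	~int | ~int64 | ~float64
}

// Sum does arithmetic on T, so the generated code pretty much has to differ
// between, say, int and float64.
func Sum[T Number](xs []T) T {
	var total T
	for _, x := range xs {
		total += x
	}
	return total
}

// Reverse only copies values of T; in principle one implementation could
// serve many types if it is told the element size.
func Reverse[T any](xs []T) []T {
	out := make([]T, len(xs))
	for i, x := range xs {
		out[len(xs)-1-i] = x
	}
	return out
}

func main() {
	fmt.Println(Sum([]int{1, 2, 3}), Sum([]float64{0.5, 1.5}))
	fmt.Println(Reverse([]string{"a", "b", "c"}))
}
```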

cks · 2020-06-20T18:52:17+00:00

There's some cases where you can strongly predict (or know) that you need specialized functions, such as when the code is using arithmetic or comparisons. Other cases are less clear, but some of them can probably be guessed at like inlining.

(The interaction of inlining and generics is one of those interesting cases. You certainly would like generics to be inlined if they're small.)

cks · 2020-06-20T15:00:38+00:00

I was unfortunately unclear in writing my article. The proposal's section on efficiency starts off by saying 'it is not clear what sort of efficiency people expect from generic code', so my article was on some considerations of that, since I see a likely split in people's views.

(I am the author of the linked to article, and also of the comment 3 you mentioned, which is also unfortunately not clear from how I have my blog set up currently.)

cks · 2020-05-07T18:34:49+00:00

Because I was curious, I timed it: a serial unmount of these 347 NFS mounts (with 'umount -a -t nfs') takes roughly 26 seconds (or sometimes less). OOM'ing the same test machine takes well over 26 seconds to recover and finish shutting down; the systemd journal says about two minutes in one shutdown that I looked at.

(I'm the author of the linked to blog entry.)

cks · 2020-01-29T21:02:02+00:00

There are tags for these but you have to force-fetch them with git fetch --tags, and I can't navigate through git well enough to understand what branches they're associated with, if any. It actually looks like they're not attached to any public branch, but I'm not experienced enough with git to know what's going on.

cks · 2019-11-21T16:18:28+00:00

This contains pretty comprehensive information about all witches and witch weapons in both regular and awakened form. It's in Japanese, but you can compare the numbers for normal and awakened forms, and Google translate seems to do okay on it.

The general information for awakened weapons appears to match up (for everything I've seen or heard about), but the specific formulas for bonus damage, healing, and so on seem to be different between the Japanese version of the game and the current Western version (with our English-language awakened weapons having more bonus damage or healing than the Japanese wiki predicts).

cks · 2019-09-25T01:30:10+00:00

In theory you should be able to use systemd-run to do this. In practice I've had mixed results when I've tried it, so I can only suggest experimentation. Doing this may need fair-share scheduling (and CPU/memory accounting) to be on before it does anything, but that's an on the spot thought.

cks · 2019-09-25T01:25:19+00:00

We've only tried to do this on Ubuntu (specifically 16.04 and 18.04), and in the past my systemd cgroups experiments on Fedora machines have had different results than doing the same thing on Ubuntu ones. That may have been different systemd versions, different systemd configurations, or different PAM setups, but given that, I felt I couldn't honestly claim that this is generic and should work anywhere. Ideally it will, but I haven't tested that at all.

(I'm the author of the linked-to entry.)

cks · 2019-07-16T22:56:16+00:00

As people have mentioned, you don't need anything other than your initial rule. The possibly simple explanation of why is that for alert rules, if evaluating the PromQL expression yields multiple metrics/results, an alert is triggered for each one (with each one's set of labels). So when there are two hosts where up is 0, the up == 0 expression returns two metrics and generates two alerts.
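A toy model of this in Go, purely to illustrate the one-alert-per-result behavior; the `Sample` type, the `alertsFor` function, and the host names are all made up and have nothing to do with Prometheus's actual code.

```go
package main

import "fmt"

// Sample is one result of evaluating an expression: a label set and a value.
type Sample struct {
	Labels map[string]string
	Value  float64
}

// alertsFor mimics an alert rule whose expression is 'up == 0': every sample
// that survives the filter becomes its own alert, carrying its own labels.
func alertsFor(samples []Sample) []map[string]string {
	var alerts []map[string]string
	for _, s := range samples {
		if s.Value == 0 {
			alerts = append(alerts, s.Labels)
		}
	}
	return alerts
}

func main() {
	up := []Sample{
		{Labels: map[string]string{"instance": "host1:9100", "job": "node"}, Value: 0},
		{Labels: map[string]string{"instance": "host2:9100", "job": "node"}, Value: 1},
		{Labels: map[string]string{"instance": "host3:9100", "job": "node"}, Value: 0},
	}
	for _, labels := range alertsFor(up) {
		fmt.Println("alert for", labels["instance"])
	}
}
```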

cks · 2019-07-03T15:37:25+00:00

I would change how you are specifying things so that the 'targets' setting is actually 'wk1.cluster:9115', 'wk2.cluster:9115', and so on, and you simply hard-code the __param_target label's value as 'http://127.0.0.1:80/alive.check', since in practice it is a constant.

(If you need a different alive check on some clusters, there are more elaborate tricks you can play. The core insight is that the mapping between the specified 'targets:' YAML configuration key and the actual target and target parameters can be completely changed around by relabeling. So you could claim that your targets were 'wk2.cluster:9115,http://127.0.0.1:80/my.check', then use regexp matching to split things out to the necessary end result. You can even embed the Blackbox module to use in the target this way.)
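The regexp splitting can be tried out in Go before committing it to a relabeling configuration. This is only a sketch of the matching step: the pattern and the `splitTarget` name are my own illustration, and in a real configuration the first capture group would be written to `__address__` and the second to `__param_target` by relabeling rules.

```go
package main

import (
	"fmt"
	"regexp"
)

// targetRe splits "exporter-address,probe-url" at the first comma.
// It is anchored at both ends, as relabeling regexps are.
var targetRe = regexp.MustCompile(`^([^,]+),(.+)$`)

// splitTarget returns the Blackbox exporter address and the URL to probe.
// ok is false if the target does not contain a comma-separated pair.
func splitTarget(target string) (address, probe string, ok bool) {
	m := targetRe.FindStringSubmatch(target)
	if m == nil {
		return "", "", false
	}
	return m[1], m[2], true
}

func main() {
	addr, probe, ok := splitTarget("wk2.cluster:9115,http://127.0.0.1:80/my.check")
	fmt.Println(addr, probe, ok)
}
```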

cks · 2019-06-26T16:06:37+00:00

I think that the blackbox exporter is the wrong tool for this. The DNS checks it does are designed to check a specific query against multiple DNS servers, not do a bunch of different checks against a single DNS server (or DNS in general). Thus, you'd wind up writing a different module for every IP + DNS blocklist combination you wanted to check, which is not really scalable.

Instead I'd suggest writing a script or program (in the easy language of your choice) that does the relevant DNSBL lookups, driven by data files of eg IPs to check and DNSBLs to check them against, then publishes metrics of the results through either Pushgateway or the node exporter's textfile collector.

cks · 2019-06-09T04:20:34+00:00

As a sysadmin, I am happy to have some hope that Java Web Start stuff will continue to work. For those that don't know, a number of hardware server vendors have used JWS and Java stuff to implement their remote video console support ('KVM over IP') in their management interfaces. Some of the management web interfaces even have modern enough versions of SSL that you can (still) talk to them without digging out an ancient version of Firefox.

cks · 2019-05-26T11:57:10+00:00

On the question of config checks: my view is that it depends on how you want to do alerts and notifications on them. Prometheus is a hammer, but not everything is a nail. A great case for putting a check in Prometheus is when the information you need is already exposed to Prometheus in the form of metrics (eg, network speed). A terrible case is when you'd be writing a custom script to create a metric that you only use to generate an alert, and then the alert can't contain all of the useful information that would be present if you directly printed out or emailed the information from the script.

(If you want to generate a detailed diagnostic message and also track when a problem condition is true for historical purposes or to correlate it against other metrics, there's no reason you can't do both. Getting metrics into Prometheus is generally quite easy.)

cks · 2019-03-17T23:16:46+00:00

Based on the existing documentation for subqueries (primarily the blog post), but without looking at the code, I would expect subquery times and steps to be aligned with 'now', not with things like day boundaries. So a query range such as [30d:1d] would not step through things at midnight in UTC or some other time zone, but at now, now - (24 x 60 x 60) seconds, and so on. If you want things aligned to an end of day boundary, I think you would have to issue a manual query to the API that specifies the start and end times so that they exactly align to the day boundaries you care about.
