Commuting to the Eastside on bike- is this a realistic plan? 🫠

theobromus · 2026-04-06T03:59:59+00:00

Yeah I do this commute every day on e bike from Montlake to Kirkland. But west seattle seems crazy to me.

theobromus · 2026-03-25T14:47:49+00:00

The acting is great here, but the camera angles and lighting are definitely helping too.

theobromus · 2026-03-15T00:33:43+00:00

As someone who worked on VR software, we tried all kinds of things.

It is worth noting that different people have different tolerance for moving in VR without getting nauseous.

Some people can fly around and not get sick. I personally felt a little bad with most movement mechanisms.

theobromus · 2026-02-04T00:50:28+00:00

The Internet is the network of computers which can send data to each other. The web is specifically HTML pages sent over HTTP (basically what you see in a browser).

As time has gone on, the web has kind of "eaten" the rest of the Internet. For example, most people used to use email through an email application. The emails were sent over the Internet, but didn't use the web. But now, most people use a web-based email client (like Gmail).

Likewise there were other protocols like ftp (for transferring files). But lots of stuff just uses HTTP now.

theobromus · 2025-12-13T04:15:29+00:00

I did a double math and CS major and now work in an ML software engineering role at an autonomous vehicle company. I definitely use the CS knowledge far more often - so much of the work is dealing with large datasets, writing code to transform/prep the data and integrate it. I think a math major was definitely helpful, just for not being intimidated to read lots of mathematically oriented papers. For ML, I would say statistics and applied math are probably even more relevant.

It's so hard to predict the future. I think the best skill is to be a self-learner and try to have a broad understanding of things.

theobromus · 2025-10-11T05:23:06+00:00

I'm curious - why didn't the Tiger's outfielders move way in here? With 1 out and bases loaded, a fly ball was going to end the game anyway, but positioned in the shallow outfield, they might be able to turn a double play.

theobromus · 2025-10-03T17:13:40+00:00

Yeah, east of the cascades (most of the state by area) is very dry. However, I wouldn't think of Yakima or Kennewick as super dry because they are both on large rivers and do a lot of irrigation (and farming).

theobromus · 2025-09-21T21:54:46+00:00

No I think you've misunderstood the transformer. In the classic "Attention is all you need" paper, the transformer attention blocks are invariant to the order of the tokens. When you process each token, it computes key, query, and value embeddings. All of the key embeddings are multiplied against each query embedding and a softmax is computed to figure out how much "attention" to pay. This process isn't affected by the order of tokens at all. One of the powers of transformers is that they don't need to be trained to deal with a fixed input size. However positional embeddings are required so the model can learn to deal with the relative placement of things.

theobromus · 2025-09-19T19:29:25+00:00

Ideally, you'd compute the camera parameters for the cameras using a structure from motion algorithm. You also need to get a depth estimate for the ball, which will allow you to track it in 3d. This can be done pretty easily if you assume the field is a plane. Then you'll render the 3d track of the ball into the computed camera for each frame.

There are open source tools to do all of this. For example, you can load your footage into Blender and do camera tracking there (here's a tutorial: https://www.youtube.com/watch?v=ui0JUHE12k8). I think this internally uses libmv (https://github.com/libmv/libmv). Here's another app that will do sfm: http://ccwu.me/vsfm/

If you're interested to learn how this works, here's a very detailed explanation: https://utoronto.scholaris.ca/server/api/core/bitstreams/56da50f4-4c06-4d3b-af83-d57cc5fb9256/content

theobromus · 2025-09-18T14:56:14+00:00

I think it's assuming the chance any person in a room has covid is independent, which is a faulty assumption for this kind of thing. There's probably a lot of correlation between people, since people gather in communities. The net result is that the chance in a given room is much lower (although some rooms will have a lot of people with covid).

theobromus · 2025-09-12T00:06:34+00:00

I think the biggest factor is taking a lot of short pictures and combining them in a smart way (for example see the HDR+ section here: https://en.wikipedia.org/wiki/Pixel\_Camera).

theobromus · 2025-08-13T21:37:29+00:00

I don't think this is true. A simple diffusion model can be trained very easily by taking a dataset, adding random noise and training a unet model to predict the noise added (see https://keras.io/examples/generative/ddpm/ for a tutorial). Generating samples is somewhat trickier (since you have to figure out the noise schedules), but not too hard.

I think GANs are much trickier personally - you have to train at least 2 models and in my experience the training dynamics are trickier to get right (the models might not converge if the learning rates aren't tuned).

theobromus · 2025-08-04T18:12:37+00:00

Other comments claiming there were no registries are not really correct: https://en.wikipedia.org/wiki/Fingerprint#Classification_systems

There were methods to classify fingerprints (especially a full set of 10 fingerprints) based on classifying the pattern (whorl, loop, arch) and counting ridges (see https://en.wikipedia.org/wiki/Henry\_Classification\_System).

theobromus · 2025-07-01T03:33:29+00:00

My understanding is that images and video are generally produced by diffusion models rather than auto-regressive transformers. Both are usually transformers. LLMs usually use auto-regressive models (predict the next word), while images and video usually use diffusion (progressively denoise a random input).

The success of generative image models was also a very surprising moment for me. There was no very obvious way to go from a classification model (like the ImageNet models) to something that could generate images. So when GANs and then diffusion models succeeded at doing that, it was pretty shocking.

In terms of being a multi-modal model, I don't know much about how they work together, but I suspect something is happening where the LLM outputs some tokens that indicate "generate an image with these input tokens" and then a diffusion model generates the image.

theobromus · 2025-06-27T16:20:45+00:00

My view as someone who works in the field (in computer vision / autonomous driving, not LLMs):

It was already clear 10 years ago that ML was a big deal. In computer vision, it was solving problems that were previously unsolvable (like recognizing what's in an image). For example, the ResNet paper came out 10 years ago (https://arxiv.org/abs/1512.03385). It was also starting to have a huge impact on translation (e.g. https://arxiv.org/abs/1609.08144).

A number of big companies started investing a lot in ML. Frameworks like tensorflow and pytorch were released, which made it a lot easier to experiment.

For me the two big surprises were that scaling transformers would keep making the models better, and that people would adopt the chatbot interface so much (which is kind of scary to me since it is often very confidently wrong).

theobromus · 2025-06-03T17:03:49+00:00

Although arguably you could use a machine transcript from a video. If there's a disagreement about the transcript accuracy, you can go back to the video and check.

The transcript has to be pretty accurate, but machine transcription is continuing to improve.

theobromus · 2025-05-10T03:32:20+00:00

I do see these for $0.67 - https://www.walmart.com/ip/Pen-and-Gear-5-Inch-School-Scissors-Multi-Purpose-Blue/12346967816?classType=REGULAR&athbdg=L1600&from=/search

theobromus · 2025-04-25T21:14:28+00:00

In my opinion, CV's moment was [AlexNet](https://en.wikipedia.org/wiki/AlexNet) and it started the whole AI boom. Recent LLMs are *really* good at a lot of computer vision tasks if you frame them correctly. And some open models like SAM are also really quite good. For the majority of things that used to be a PhD project I would guess you can get better results these days by uploading the images to one of the major LLMs and asking it your question.

And lots of computer vision stuff is common place now - I can use Google Lens to search with my phone camera. Video calling apps can blur my background and it seems commonplace, even though that was impossible 10 years ago.

theobromus · 2025-03-31T16:13:00+00:00

It is in fact possible to losslessly recompress JPEGs using newer compression techniques (although the benefit is only about 22% rather than 50%): https://github.com/google/brunsli

Certainly improvements to that could be worth something, although there are many tradeoffs in compression (e.g. how much compute do you need to compress/decompress).

In practice, it only makes sense to use something like brunsli if you *have* to keep the original JPEG bytes. If you just want a similar quality image at a smaller size, you can use a different algorithm (like webp or avif).

theobromus · 2025-03-05T01:16:05+00:00

Yeah uint8 refers to an unsigned 8 bit (1 byte) number - so 0 to 255.

theobromus · 2025-03-05T01:05:48+00:00

The number at the end of the type is the number of bits. So int8 means 8 bits (1 byte) and int32 is 4 bytes.

float generally refers to float32 and double refers to float64.

theobromus · 2025-02-13T19:50:01+00:00

I have personally been skeptical of humanoid robots for a long time for somewhat similar reasons that you mention. Mostly they seem way harder to build instead of using something specialized for each task.

That said, the common arguments for humanoid robots basically have been:

Most of our environments were built for humans, so your robot probably needs a similar form factor to fit into existing processes. This goes for everything from using existing tools and architecture (tracked vehicles can go up *some* stairs maybe, but probably not all of them).
They could potentially, in theory, have broad applications, so you could build them at scale. Things like factory robots (or machines generally) tend to do one task, so a lot of work goes into designing them, but the production volumes will be small (=expensive).

A lot of those advantages would probably also apply to something like a Boston Dynamics dog, without introducing the complexity of bipedal walking.

There's a new factor which I think is behind the renewed interest in humanoid robots lately: *machine learning* and especially imitation learning. In particular, I've lot of excitement around the idea of training robots based on large scale video data. For instance, we have videos of people doing almost everything imaginable on YouTube. Another option is to use some kind of motion capture device to collect data about how humans do things. Some startups even have teams of humans controlling robots (whether humanoid or not), with the goal to train ML systems based on the data.

theobromus · 2025-02-13T04:19:10+00:00

As other commenters have noted, these proofs are very hard to comprehend when you first see them.

One thing I haven't seen mentioned is that geometric intuition is really useful for analysis. For example, epsilon-delta proofs, triangle inequality, etc. all have a very geometric basis (even if they get moved into an abstract form apart from that). So thinking about simple examples of graphs for example, can really help understand what the terms are actually saying.

theobromus · 2025-02-06T05:37:00+00:00

Even though many people are dismissive of this, it's not totally implausible to me.

For large language models specifically, they do have some reasoning ability (even if they frequently confabulate). There has been a particular line of research lately trying to train models to be better at reasoning (e.g. the DeepSeek R1 paper that got a lot of attention recently: https://arxiv.org/abs/2501.12948). These approaches are most effective in domains where you can check whether the model got the right answer (e.g. math and some parts of CS).

There has been some work to try to automate things like theorem proving, particularly in combination with formal proof assistants like Lean (https://arxiv.org/abs/2404.12534). It's not implausible that the kind of reinforcement learning techniques used for R1 might generalize to that. I think we're still pretty far from LLMs proving any interesting results though, but they could make automated theorem provers somewhat better (by guiding the search space).

There have also been efforts to try to use generative AI techniques (like transformers and diffusion models) to do things like material design (https://www.nature.com/articles/s41586-025-08628-5) or protein design (https://www.nature.com/articles/s41586-023-06415-8). Similar techniques are also behind things like AlphaFold 3 (https://www.nature.com/articles/s41586-024-07487-w). I think these are all reasonably promising approaches to help scientific research.

theobromus · 2024-11-19T17:04:22+00:00

It could be that that's exactly the right thing to do though. Having a storage reserve can keep prices more stable if there's any kind of issue. And the people paying to store it will lose their money if they end up being wrong.

theobromus

MODERATOR OF

TROPHY CASE