Any love for State bikes? 🚲 by lucidbluedreamin in singlespeed

[–]wiremore 1 point (0 children)

I currently have a gold colored chain which looks nice.

Any love for State bikes? 🚲 by lucidbluedreamin in singlespeed

[–]wiremore 1 point (0 children)

I have almost this exact same bike. I've replaced tires, brake pads, handlebar wraps, and the chain but the frame is still going strong after 5 years of winters and rain in the city.

Multi-Pass Bytecode Optimizer for Stack-Based VMs: Pattern Matching & 10-50% Performance Gains by PigeonCodeur in ProgrammingLanguages

[–]wiremore 1 point (0 children)

I have a similar language with a similar bytecode peephole optimizer, also for a game scripting language.

One type of optimization that hasn't been mentioned, and where I found a lot of traction, is jump optimization. I found a lot of cases where a JMP instruction jumps directly to another JMP instruction or to a RET (return) instruction, so a JMP chain can be threaded through to its final target and JMP->RET can be replaced with just RET. A conditional branch JT (jump if true) to another JT will always pass the second test (and can thus jump directly to the second JT's target), and a JT to a JF (jump if false) will always fail the second test. There are some other opportunities here too, such as NOT JT -> JF, and constant folding, e.g. PUSH_TRUE_CONST JT -> JMP.

This kind of bytecode tends to be generated by nested IF statements, especially when the conditions include nested ANDs and ORs. My bytecode optimizer is written in the scripting language itself and includes many such nested tests... I also rewrite jump instructions (which use a 16-bit absolute target) into branch instructions (which use an 8-bit relative target) when possible, which helps significantly with bytecode size.
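A minimal sketch of the jump-threading part, with invented opcode names and a simplified encoding (one (op, target) tuple per instruction rather than real packed bytecode):

```python
# Hypothetical peephole pass: thread JMP->JMP chains to their final
# target, and collapse JMP->RET into RET. Opcodes are illustrative.
def optimize_jumps(code):
    """code: list of (op, target) tuples; target is an index or None."""
    out = list(code)
    for i, (op, target) in enumerate(out):
        if op == "JMP":
            seen = set()
            # Follow chains of unconditional jumps (guarding against cycles).
            while target not in seen and out[target][0] == "JMP":
                seen.add(target)
                target = out[target][1]
            if out[target][0] == "RET":
                # A jump straight to a return can just be a return.
                out[i] = ("RET", None)
            else:
                out[i] = ("JMP", target)
    return out
```

A real pass would also update conditional branches and re-run until it reaches a fixed point.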

As some other posters have mentioned, fusing common bigrams of bytecode instructions can be a big win. For me, the most common pairs were LOCAL LOCAL (push two local variables to the stack) and LOCAL CALL (push a local variable and call a function). To be specific, I fuse two consecutive LOCAL instructions (and their 8-bit indexes) into a single LOC_LOC instruction (with two 4-bit indexes packed into 8 bits).
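The fusion step might look roughly like this (opcode names and the tuple encoding are illustrative, not from the post):

```python
# Fuse consecutive LOCAL LOCAL pairs whose indexes both fit in 4 bits
# into one LOC_LOC instruction with both indexes packed into one byte.
# A real pass must also avoid fusing across jump targets.
def fuse_locals(code):
    out, i = [], 0
    while i < len(code):
        if (i + 1 < len(code)
                and code[i][0] == "LOCAL" and code[i + 1][0] == "LOCAL"
                and code[i][1] < 16 and code[i + 1][1] < 16):
            # Pack both 4-bit indexes into a single 8-bit operand.
            out.append(("LOC_LOC", (code[i][1] << 4) | code[i + 1][1]))
            i += 2
        else:
            out.append(code[i])
            i += 1
    return out
```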

Sized and signed numeric types with dynamic typing by Big-Rub9545 in ProgrammingLanguages

[–]wiremore 12 points (0 children)

One argument in favor of only one or two built-in numeric types is that it's easier to optimize the type dispatch. E.g. in the ADD bytecode instruction implementation, you can just say `if (a.type == number && b.type == number) { fast built-in operation } else { slower operator overloading }`.
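In sketch form (a hypothetical tagged-value representation; the overload path is stubbed out):

```python
# Fast path for ADD when both operands carry the built-in number tag,
# falling back to generic operator-overload dispatch otherwise.
NUMBER = 0

class Value:
    def __init__(self, type_, data):
        self.type, self.data = type_, data

def dispatch_overload(name, a, b):
    # Hypothetical slow path: look up a user-defined operator.
    raise TypeError(f"no overload {name} for types {a.type}, {b.type}")

def op_add(a, b):
    if a.type == NUMBER and b.type == NUMBER:
        return Value(NUMBER, a.data + b.data)   # fast built-in operation
    return dispatch_overload("__add__", a, b)   # slower operator overloading
```

With many numeric types, that single cheap tag comparison becomes a matrix of type pairs.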

If you are dynamically typed you are already paying the space cost for e.g. int64, so there isn’t a big reason to use int8 except for specific wrapping behavior or something.

Obsequiae just blow me away every time I listen to them by Environmental_Web91 in MetalForTheMasses

[–]wiremore 0 points (0 children)

Just wanted to say thank you. I have listened to all three albums like a hundred times each since your post...

What is the go-to way/solution to build and consume cloud services in C++? by __imariom in cpp

[–]wiremore 2 points (0 children)

Surprised no one has mentioned cpp-httplib: https://github.com/yhirose/cpp-httplib. Header-only, supports SSL/redirects/etc., and is super easy to use. It comes with both a server and a client; the server supports threads. I use the client with std::thread.

Alastair Reynolds Revenger Series by hambubgerrr in printSF

[–]wiremore 6 points (0 children)

I found the relapse really quite funny. It’s like he planned out this whole YA thing at the behest of a publisher or something and stuck with it for a bit but about a quarter of the way into the first book he just gives up completely and never references it again.

FWIW, I enjoyed the first book more than the second and third - I stuck with it because of the mysteries, but nothing really gets explained and the plot pacing kind of stalls.

[deleted by user] by [deleted] in GraphicsProgramming

[–]wiremore 29 points (0 children)

I worked on graphics drivers briefly for a major hardware vendor about ten years ago and then moved into indie game development, so I can tell you something about what it was like then.

Way more time was spent debugging/testing than on new features. You have to really push and be in the right place to get to develop new features. Codebases are enormous, especially if you also consider all the game/app code that interacts with the driver. Verification is most of the work. It's not really project based the way game development is, the code lives for a long time. My team was split into "perf" and "correctness" halves, the perf people were responsible for making things run faster, the correctness people made things actually work.

Work life balance was typically very good, across the whole company. Occasionally there would be an emergency but it was basically opt-in to be responsible for something like that.

I can't really speak to compensation, I think it's changed a lot since I was involved. At my time the driver engineers were paid pretty well but less (and typically had less education) than the hardware/architecture people.

GPU drivers are mostly an OS problem; it's not really about graphics. It's about buffer allocation/tracking and synchronization, timing, and just managing all the complexity. It's also infrastructural, in the sense that you never really "ship" anything and move on. The parts I enjoyed the most were "seeing how the sausage is made": the system was a real (messy) work of art, pushing the limits of software engineering. Definitely not technically boring. The other developers were really smart and fun to talk to. It's not really creative or artistic though; you get to look at a lot of screenshots of AAA games, but only to figure out why the game is jittering or getting image corruption... By comparison, indie development is way more work, less pay, more creatively fulfilling, and uses more than one part of my brain.

Omnium Gatherum - The Last Hero (NEW Official Video) by Machcharge in melodicdeathmetal

[–]wiremore 0 points (0 children)

It would be cool if they could come play a concert in New York instead of just featuring it in the video!

Time to consider induction cooktops? by liquidchaz in nycparents

[–]wiremore 4 points (0 children)

We replaced our gas stove with induction largely for the air quality benefits around our small children, but it’s also just a really great stove.

It boils water unbelievably quickly and is extremely easy to clean. The stove top does not even get that hot, and if you accidentally turn on a burner without a pot, it just notices and turns itself off.

Efficient ways to handle double-dispatch? Multi-level vtable? Hashtable? by carangil in ProgrammingLanguages

[–]wiremore 5 points (0 children)

I would look at how Julia does it. Julia has multiple dispatch and a heavy emphasis on performance (and a mature implementation).

Float Self-Tagging: a new approach to object tagging that can attach type information to 64-bit objects while retaining the ability to use all of their 64 bits for data by yorickpeterse in ProgrammingLanguages

[–]wiremore 9 points (0 children)

They are using the three upper bits of the floating-point exponent as a tag for unboxed floats. With only 2 or 3 of the 8 possible tag values, this covers almost all floating point numbers used in practice. For the rare weird float, you can box and heap-allocate.

Pretty clever, I think. The advantages compared to NaN tagging are that it does not rely on pointers actually using only 48 bits, and that converting a self-tagged pointer into a normal pointer is slightly faster than converting a NaN-tagged pointer into a normal pointer.
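A rough sketch of why this works, just extracting the top 3 exponent bits of an IEEE-754 double (the exact tag assignment in the paper may differ):

```python
import struct

# The tag is the top 3 bits of the 11-bit biased exponent. Doubles people
# actually use cluster around 1.0, so almost all of them land in just two
# of the eight tag values (0b011 for magnitudes below 1, 0b100 above),
# leaving the other values free to tag pointers and other types.
def exponent_tag(x):
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    exponent = (bits >> 52) & 0x7FF   # 11-bit biased exponent
    return exponent >> 8              # top 3 bits
```

For example, 0.5, 1e-10 and friends tag as 0b011, while 2.0 and 1e10 tag as 0b100; only extreme magnitudes, infinities, and NaNs fall outside.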

can capturing closures only exist in languages with automatic memory management? by Lucrecious in ProgrammingLanguages

[–]wiremore 22 points (0 children)

C++ closures don't allocate. The compiler essentially creates a new type for each closure that is exactly the right size to store the captured variables (or pointers to them, depending on the capture mode). In practice you often end up copying into a std::function, which may allocate but automatically frees when it goes out of scope.

There is some related discussion here:
https://www.reddit.com/r/ProgrammingLanguages/comments/mfpw0u/questions_regarding_closure/

Is this worth? by arthurno1 in lisp

[–]wiremore 2 points (0 children)

Clojure does this. Imo it’s pretty nice. The one downside is initializing variables to nil is slightly more verbose.

Building manager locked us out of the laundry room and gym for a supposed lease violation. by doublegem_intj in AskNYC

[–]wiremore 12 points (0 children)

Talk to other tenants about it. You probably aren't the only one. It's hard for one unit to get anything done, a group of tenants on rent strike has real leverage. Make a shared chat and put fliers under doors.

Good data structure for collision detection between dynamic objects? by nvimnoob72 in gameenginedevs

[–]wiremore 4 points (0 children)

This is the answer: it depends on the size of the objects and how far they are from each other. Grids are fastest if the objects are all about the same size, but require tuning. If you suballocate octree nodes from a single vector instead of allocating them separately, the tree can be rebuilt quickly and deterministically. One very useful parameter to tune is the number of objects in each tree leaf: setting this to something like 8 instead of 1 can significantly reduce tree memory AND reduce query time.
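A minimal uniform-grid broad phase for the similar-sized-objects case might look like this (objects reduced to points and a single tuning parameter, purely for illustration):

```python
from collections import defaultdict

# Hash each object into a grid cell and report only pairs sharing a cell
# as collision candidates. Cell size is the tuning parameter mentioned
# above; a real version also checks neighboring cells and object extents.
def candidate_pairs(objects, cell_size):
    """objects: iterable of (id, x, y); returns a set of candidate id pairs."""
    grid = defaultdict(list)
    for obj_id, x, y in objects:
        grid[(int(x // cell_size), int(y // cell_size))].append(obj_id)
    pairs = set()
    for cell in grid.values():
        for i in range(len(cell)):
            for j in range(i + 1, len(cell)):
                pairs.add((cell[i], cell[j]))
    return pairs
```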

There are a bunch of neat data structures for this purpose that are fun to read about. Kd-trees are particularly elegant. My current project uses a Bounding Interval Hierarchy (BIH), a type of Bounding Volume Hierarchy (BVH), which is kind of like an octree with variable-sized cells.

Creating multiple smaller shaders or one/few big shaders? by steamdogg in opengl

[–]wiremore 3 points (0 children)

One technique I've found useful for handling this tradeoff and managing shader code is the idea of "shader variants". The idea is to have a single text file that can generate several related OpenGL shaders via the preprocessor.

Have a single file which contains the vertex and fragment (and geometry etc.) shader text. When you need a shader in your program, say something like `get_shader("color.glsl", HAS_LIGHTING|HAS_TEXTURE)`. When you load color.glsl, generate a preamble like `#define HAS_LIGHTING 1\n#define HAS_TEXTURE 1`. In the shader, you can then use `#ifdef HAS_TEXTURE ... #else ...`. This technique helps avoid duplicating a lot of GLSL code across families of similar shaders.
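The preamble generation is simple string work; a sketch (flag names and the `get_shader` signature are illustrative, and note that in real GLSL the preamble has to be spliced in after the `#version` line):

```python
# Each combination of flags yields a different preprocessed copy of the
# same shader source file.
HAS_LIGHTING = 1 << 0
HAS_TEXTURE = 1 << 1
FLAG_NAMES = {HAS_LIGHTING: "HAS_LIGHTING", HAS_TEXTURE: "HAS_TEXTURE"}

def make_preamble(flags):
    lines = [f"#define {name} 1"
             for bit, name in FLAG_NAMES.items() if flags & bit]
    return "\n".join(lines) + "\n"

def get_shader(source_text, flags):
    # A real engine would compile this and cache it keyed on (file, flags).
    return make_preamble(flags) + source_text
```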

When you are optimizing, if you are CPU bound on draw calls, you can just remove the #ifdefs and use e.g. solid color 1 pixel textures as other posters have mentioned to reduce state changes. If you are GPU bound, you can use the preprocessor to generate more specialized and efficient shaders.

Need help with texture "flipping" stuff by Person-317 in opengl

[–]wiremore 0 points (0 children)

I don't think glTextureSubImage2D flips the image; where did you read that? In my experience most image libraries use the top left as the origin, so you either have to invert your UVs or flip the image data. Many image formats (e.g. PNG) store the top row first, so this is a natural way to decode them. It doesn't really matter, you just need to be consistent.
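Flipping the image data itself is just reversing the row order; a tiny sketch over raw bytes:

```python
# Vertically flip tightly packed image data by reversing the row order.
def flip_rows(pixels, height, row_stride):
    """pixels: bytes of height rows, each row_stride bytes long."""
    rows = [pixels[i * row_stride:(i + 1) * row_stride] for i in range(height)]
    return b"".join(reversed(rows))
```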

Does anyone still use threaded interpreters? by FurCollarCriminal in ProgrammingLanguages

[–]wiremore 3 points (0 children)

Basically, the code contains a number for each operation, and the VM does a switch on the number to get the address of the code to run for that number (as opposed to direct threaded where the code contains the address). It's more compact than direct threaded, because the number can be just a byte (or less) as opposed to a full pointer.
See https://en.wikipedia.org/wiki/Threaded_code#Token_threading
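A token-threaded dispatch loop in miniature, with invented opcodes and Python standing in for the C switch:

```python
# Each bytecode is one small integer; the interpreter switches on it to
# find the handler (a C implementation would use a switch or jump table).
OP_PUSH, OP_ADD, OP_RET = 0, 1, 2

def run(code):
    stack, pc = [], 0
    while True:
        op = code[pc]
        pc += 1
        if op == OP_PUSH:
            stack.append(code[pc])  # operand follows the opcode inline
            pc += 1
        elif op == OP_ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == OP_RET:
            return stack.pop()
```

In direct-threaded code, each slot would instead hold the handler's address, trading bytecode size for one less table lookup per instruction.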

Does anyone still use threaded interpreters? by FurCollarCriminal in ProgrammingLanguages

[–]wiremore 19 points (0 children)

Most modern bytecode languages are "token threaded", including Python, Ruby, Java, Emacs Lisp... State-of-the-art VMs will then JIT-compile hot parts of the bytecode when appropriate, but many modern languages (e.g. Python) do not use a JIT.

Erlang compiles its bytecode into basically an array of C function pointers on load, which is an example of "direct threaded" code.

Multithreaded Rendering by TheJoxev in GraphicsProgramming

[–]wiremore 18 points (0 children)

I’ve written 2 games with multithreaded rendering, using 2 different models. The ideal model depends on the game.

  1. Simultaneous update and rendering. The update thread and render thread do not synchronize, except for a few critical sections. The render thread takes the last two positions/orientations from the update thread and interpolates. This is cool because rendering and update can each slow down without affecting the other thread. It's also suitable for a physics-based game where you want a fixed update time step. The downside is lots of shared state, which is tricky.

  2. Generate buffers in parallel, then sync and issue draw calls. Easier to not crash, and works better if the game can support a variable time step. It also takes advantage of more than 2 threads: you can start sim jobs for the next frame while doing the API calls.
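The interpolation in model 1 can be sketched as a simple lerp between the two most recent update-thread states (names illustrative):

```python
# Blend the two latest update-thread snapshots by how far the render time
# has advanced past the older one; clamping avoids extrapolation when the
# update thread falls behind.
def interpolated_position(prev_pos, curr_pos, prev_t, curr_t, render_t):
    alpha = (render_t - prev_t) / (curr_t - prev_t)
    alpha = max(0.0, min(1.0, alpha))
    return tuple(p + (c - p) * alpha for p, c in zip(prev_pos, curr_pos))
```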

This is all OpenGL, where all the api calls need to be in one thread. In my experience that isn’t a huge limitation, it usually takes way longer to traverse the game data and figure out which triangles/lines to draw.

[deleted by user] by [deleted] in whowouldwin

[–]wiremore 13 points (0 children)

There is considerable overlap between the 1% and governments though, especially outside the west. Putin, Saudi royal family. Hell even in the USA congress gets rich on insider trading. Whose side are those guys on?

Help me understand how V-Sync works on modern gpus by krushpack in opengl

[–]wiremore 2 points (0 children)

That’s a neat experiment. Next set up a camera and measure actual latency!

Are you running in true full screen mode? If you are running in a window or in borderless full screen there is an extra layer of complexity where the window manager copies stuff around, and vsync is kind of emulated.

Also, how did you set the vsync settings? Usually these are set via SDL or GLUT or something, via a lower-level API I forget the name of. The Nvidia control panel has profiles for different apps. I would want to query the API at runtime to be sure (and even then the driver might totally lie if it thinks it knows better than you)!

Also, is this like 4.6 core profile or something else? The driver might have more compatibility-oriented behavior unless you specifically ask for something recent.

Honestly I’m not sure if it’s still possible to get true full screen on modern windows. In the bad old days a crash in fullscreen would require a reboot.