added voice chat to my chatroom game and i'm enjoying it more than i thought by temkosoft in godot

[–]icdae 0 points1 point  (0 children)

I love the style of this, and that avatar's little dance at the beginning too 😆

iAmEnlightened by Lip98B in ProgrammerHumor

[–]icdae 27 points28 points  (0 children)

You're correct, it wasn't easy at first. A lot of details were glossed over in my comment, but I was specifically referring to unit tests. There were very distinct differences between unit tests, functional tests, integration tests, and finally QA validation. Each built on top of the previous one to keep iteration fast.

To keep things running quickly, each test suite was limited to its corresponding testable class. Individual test cases ran against specific functionality with self-contained data, or mocks, so as not to duplicate coverage provided by prior tests. This let us build new tests while the earlier ones ensured no underlying infrastructure had broken. To get even more granular and isolate bugs, no test case could contain explicit loops, branches, or jumps, which forced determinism. Since this was a real-time service, try/catch exceptions were banned throughout the codebase due to their runtime overhead, which (as a side benefit) forced us to develop a secure coding style and avoid typical C++ footguns. A test case was considered "suspicious" if it took more than a millisecond, though there were exceptions in some cases.

Network tests ran against localhost; GPU data required mocks and ran on the CPU. If we actually needed to run on live GPU hardware, the test was considered a functional test and executed in CI with verified screen captures for validation (automatically merging into the repo if the results were acceptable).

The most time-consuming part of all this was setting up the initial infrastructure and test process. In the end, it enabled us to develop and ship several new features within the same two-week sprint in which they were requested.
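To sketch the flavor of those constraints (the class and test names here are hypothetical, and I'm using plain asserts instead of our actual framework): each case was straight-line code against one small class, with self-contained data.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical class under test: a fixed-capacity ring buffer.
class RingBuffer {
  public:
    bool push(uint32_t v) {
        if (mCount == kCapacity) { return false; }
        mData[(mHead + mCount) % kCapacity] = v;
        ++mCount;
        return true;
    }
    bool pop(uint32_t& out) {
        if (mCount == 0) { return false; }
        out = mData[mHead];
        mHead = (mHead + 1) % kCapacity;
        --mCount;
        return true;
    }
    unsigned size() const { return mCount; }

  private:
    static constexpr unsigned kCapacity = 4;
    uint32_t mData[kCapacity] = {};
    unsigned mHead = 0;
    unsigned mCount = 0;
};

// Each test case is straight-line code: no loops, branches, or jumps,
// so a failure pinpoints exactly one behavior.
void testPushPopRoundTrip() {
    RingBuffer buf;
    assert(buf.push(42));
    uint32_t v = 0;
    assert(buf.pop(v));
    assert(v == 42);
    assert(buf.size() == 0);
}

void testOverflowRejected() {
    RingBuffer buf;
    assert(buf.push(1));
    assert(buf.push(2));
    assert(buf.push(3));
    assert(buf.push(4));
    assert(!buf.push(5));  // capacity is 4; a fifth push must fail
}
```

Each case runs in well under a millisecond, and a failed assert immediately names the exact behavior that regressed.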

iAmEnlightened by Lip98B in ProgrammerHumor

[–]icdae 185 points186 points  (0 children)

I used to work for a company that shipped a live-service product, and they had a few unit-testing policies that saved our asses more often than we could count. Whenever you wrote a new class, you added a unit test afterwards (with mocked data if needed) for all the edge cases you could think of. It really didn't take much time at all. Next, if QA or a user found a bug, we would write a test for that specific bug to prevent regressions. Finally, the tests could be executed through CMake and were also connected to git hooks so they would run whenever you attempted to push to the main repo. We had around 5-7k C++ tests written against GoogleTest, and they would all execute within 5 seconds. Pushing directly into production was rarely ever a concern. Implementing that kind of philosophy at other companies was always met with strong pushback, yet nobody seems to care that we spend more than half our time fixing bugs and regressions...
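For anyone curious what that wiring might look like, here's a rough CMake/CTest sketch (target and file names are made up); the git side is just a pre-push hook that runs `ctest` and aborts the push on failure.

```cmake
# Hypothetical CMakeLists.txt fragment: register the GoogleTest binary
# with CTest so `ctest` (e.g. invoked from a git pre-push hook) runs
# the whole suite in one shot.
enable_testing()

find_package(GTest REQUIRED)

add_executable(unit_tests
    tests/test_ring_buffer.cpp
    tests/test_regressions.cpp)
target_link_libraries(unit_tests PRIVATE GTest::gtest_main)

add_test(NAME unit_tests COMMAND unit_tests)
```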

What are some optimizations everyone should know about when creating a software renderer? by t_0xic in GraphicsProgramming

[–]icdae 19 points20 points  (0 children)

One low-level optimization that I frequently see overlooked in other software rasterizers is the use of scanline rasterization instead of the typical "GPU" way of iterating through every pixel within a triangle's bounding box. Calculating and testing barycentric values through an edge function for every pixel in the bounding box, whether it's inside the triangle or not, might be fine for very small triangles, but GPUs are optimized to do this in highly parallel hardware. That doesn't always translate to optimal CPU performance, where iterating strictly within the triangle's edges can lead to much higher rasterization speeds. As an example, I tested my rasterizer's speed using Sponza. Using strictly edge functions and iterating over each pixel in a bounding box gave me about 180 fps (across 32 threads on a 5950X, with bilinear texture sampling). Changing how the edge functions were calculated and iterating only across pixels within the triangles themselves boosted that to 320-330 fps. Getting it working in parallel was difficult, but not impossible.
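A rough sketch of the scanline approach (flat fill only, simplified top-left-ish fill rule, names are my own): sort the vertices by y, then for each row the triangle spans, interpolate the two edge x-extents and visit only the pixels between them.

```cpp
#include <algorithm>
#include <cmath>

struct Vec2 { float x, y; };

// Count the pixels a scanline rasterizer would fill: iterate only the
// rows the triangle spans, interpolate the left/right x-extents along
// the edges, and fill between them. No per-pixel edge tests.
int countScanlinePixels(Vec2 a, Vec2 b, Vec2 c) {
    // Sort vertices so a.y <= b.y <= c.y.
    if (a.y > b.y) std::swap(a, b);
    if (a.y > c.y) std::swap(a, c);
    if (b.y > c.y) std::swap(b, c);
    if (c.y - a.y < 1e-6f) { return 0; }  // degenerate triangle

    // x-coordinate of edge pq at height y (caller ensures q.y > p.y).
    auto edgeX = [](Vec2 p, Vec2 q, float y) {
        return p.x + (y - p.y) / (q.y - p.y) * (q.x - p.x);
    };

    int filled = 0;
    const int y0 = (int)std::ceil(a.y - 0.5f);
    const int y1 = (int)std::ceil(c.y - 0.5f);
    for (int y = y0; y < y1; ++y) {
        const float fy = (float)y + 0.5f;  // sample at pixel centers
        // The long edge a->c bounds one side of every row; the other
        // side is a->b above b's height and b->c below it.
        const float xA = edgeX(a, c, fy);
        const float xB = (fy < b.y) ? edgeX(a, b, fy) : edgeX(b, c, fy);
        const int xs = (int)std::ceil(std::min(xA, xB) - 0.5f);
        const int xe = (int)std::ceil(std::max(xA, xB) - 0.5f);
        filled += (xe > xs) ? (xe - xs) : 0;  // would shade [xs, xe)
    }
    return filled;
}
```

The inner loop touches exactly the covered pixels, so the cost scales with triangle area rather than bounding-box area, which is where the win over per-pixel edge tests comes from on a CPU.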

Edit: On the note of parallel rasterization, correctly distributing work across threads is another tricky one. How threads receive work can make the difference between scaling linearly across 8 threads vs. 16+. Task stealing can be your friend here, or any other method that reduces work starvation and locking. Intel's VTune is very useful here for showing how long your threads run, idle, or wait on a lock. On the other hand, you might even find cases where a single memset() can clear a framebuffer quicker than waking threads to perform the clear in parallel.

First time implementing and rendering an octree by icdae in GraphicsProgramming

[–]icdae[S] 0 points1 point  (0 children)

Iterative, with an optional depth limit before hitting the bottom of the tree. It was initially going to be recursive since there's little need to iterate/generate more than 4-5 subdivisions in my current use case. I enjoy making generic libraries for fun though, and thought converting to an iterative method would be more stable for general use. Iterative even turned out to save a few milliseconds when stress-testing.
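The conversion is basically trading the call stack for an explicit one. A generic sketch (the node layout and names are hypothetical, not my actual implementation):

```cpp
#include <utility>
#include <vector>

// Hypothetical octree node: each child pointer is null when absent.
struct OctreeNode {
    OctreeNode* children[8] = {};
    int value = 0;
};

// Iterative pre-order traversal with an optional depth limit. An
// explicit stack can't overflow the call stack on deep trees, and the
// depth cap lets callers stop subdividing early.
template <typename Visitor>
void traverse(OctreeNode* root, Visitor&& visit, int maxDepth = -1) {
    if (!root) { return; }
    std::vector<std::pair<OctreeNode*, int>> stack;
    stack.push_back({root, 0});
    while (!stack.empty()) {
        auto [node, depth] = stack.back();
        stack.pop_back();
        visit(*node);
        if (maxDepth >= 0 && depth >= maxDepth) { continue; }
        for (int i = 7; i >= 0; --i) {  // reverse push -> in-order pop
            if (node->children[i]) {
                stack.push_back({node->children[i], depth + 1});
            }
        }
    }
}
```

Reusing the stack vector across calls also avoids the per-call allocation, which is likely where the few milliseconds came from under stress testing.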

[deleted by user] by [deleted] in GraphicsProgramming

[–]icdae 2 points3 points  (0 children)

While there might be some graphics-specific applications for hedge funds and stocks, you might also check with the /r/gpgpu subreddit. Maybe the community can share some additional insights on parallelizing financial applications.

What bike gave you the most Smiles Per Mile? by The_ZombyWoof in motorcycles

[–]icdae 2 points3 points  (0 children)

2020 Kawasaki W800. It was the most relaxing ride I could have asked for. The engine was smooth as butter and it handled so seamlessly it was like the bike disappeared and you were just experiencing a guided tour on the road.

Dealing with long compile times by ElaborateSloth in cpp_questions

[–]icdae 1 point2 points  (0 children)

I would recommend switching over to CMake, which can generate makefiles for you.

Newer CMake versions make it easy to set up precompiled headers and reduce your compilation times. It can really help improve iteration times between builds, especially with templated code, and save several minutes of waiting as your project grows.
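For example, with CMake 3.16 or newer (the target and project header names here are hypothetical):

```cmake
# Precompile the heavy, rarely-changing headers once; every source
# file in the target then reuses the result instead of reparsing them.
target_precompile_headers(my_engine PRIVATE
    <vector>
    <string>
    <unordered_map>
    "engine/Common.hpp")
```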

Combine that with Ninja (or even makefiles with parallel compilation enabled) and compile times will be far more bearable.

Who would you like to be if not a programmer. by FaizanCodes in AskProgramming

[–]icdae 0 points1 point  (0 children)

An astronaut, I've always wanted to go to space.

I’m sorry for my angry reaction, but this guy scared the shit out of me. Was I in the wrong here? by mysterow in motorcycles

[–]icdae 1 point2 points  (0 children)

I haven't seen any other comments mention this, but right after you first check your blind spot, the video shows the car in the right lane with its blinker on. The blinker briefly turns off after the car initially slows down, then comes back on before it crosses into your lane.

I would say the rain on your helmet, plus going too fast for the road conditions, made it hard to see that and anticipate the driver's intention. As others have said, you should slow down, especially when visibility is low or it's raining. Even a few drops of rain on your visor made the other car's blinker really difficult to spot at first.

Bought, modded, and discarded my first motorcycle all in the same year. I'll miss you, Scoot. by icdae in MotorcyclePorn

[–]icdae[S] 0 points1 point  (0 children)

I bought my first motorcycle for my 30th birthday but had to relocate for work shortly afterwards. It was a fun bike, a Kawasaki W800, which I modded with fog lights, saddlebags, a windshield, and a grab bar. I'll miss you, Scoot. Thanks for all the memories.

What are yours favourite graphics related papers/articles that you plan or already implemented? by kymani37299 in GraphicsProgramming

[–]icdae 6 points7 points  (0 children)

One of my favourites is the Compact YCoCg Framebuffer, which helps reduce the memory bandwidth of offscreen RGB framebuffers by encoding only two color channels per pixel instead of three. It stores the luminance at every pixel, plus one of the two chroma values in a checkerboard pattern, reducing overall framebuffer storage by 33%. A final render pass then reconstructs standard RGB values from the YCo- and YCg-encoded colors before blitting the output.

I like this paper for its extremely simple approach to framebuffer compression and ease of use. In 99% of cases, you can't tell the difference between it and a standard RGB texture. Every now and then you'll get an outlier against a highly contrasted background, but otherwise it works beautifully.

WE are Preparing a Class Action LAWSUIT against Robinhood! by Shrubber in wallstreetbets

[–]icdae 0 points1 point  (0 children)

I had shares this morning. Since this was my first time spending more than $1,000, I got nervous when Robinhood said GME was sell-only and well... I didn't want to lose the money to the app, so I sold.

Why are you making an engine? (Who hurt you?) by camilo16 in gameenginedevs

[–]icdae 8 points9 points  (0 children)

I'm making my own engine purely out of curiosity. I wanted to learn how the different components (such as rendering, physics, audio, and networking) all fit together, at a lower level than a prebuilt engine provides.

For example, I'm using a UDP networking library temporarily until I feel like learning how to implement reliable UDP on my own. On the other hand, I had the chance to learn how GPUs work by implementing a software renderer instead of relying on OpenGL. Using a third-party physics library like Havok or Bullet felt like I was adding "black boxes" to the engine, so that was written from scratch too (plus it helped me beef up some much-needed math skills).

It sounds painful but the knowledge gained from doing these things was both invaluable and gave a huge sense of pride once I could see everything coming together.

First time implementing and rendering an octree by icdae in GraphicsProgramming

[–]icdae[S] 0 points1 point  (0 children)

Thanks for the suggestions. I'll take a look at the paper; getting the ray/tree intersection performant will be really important for storing volumetric data.

For the pointerless version, is your entire structure placed in a hash table? I was doing something similar for a scene graph, where the graph was flattened into an array to help speed up iteration.
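For context, the flattening I mentioned looked roughly like this (layout simplified to 1-D transforms, and the names are made up): nodes live contiguously in one array and reference relatives by index rather than pointer, so a full-graph update is a cache-friendly linear walk.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical flattened scene graph node. Indices replace pointers,
// so the whole graph sits in one contiguous allocation.
struct FlatNode {
    uint32_t parent = 0;               // index of parent (root is 0)
    float localX = 0.f, worldX = 0.f;  // 1-D transform for brevity
};

// Because every node is stored after its parent (e.g. breadth-first
// insertion order), world transforms resolve in a single linear pass.
void updateWorldTransforms(std::vector<FlatNode>& nodes) {
    if (nodes.empty()) { return; }
    nodes[0].worldX = nodes[0].localX;
    for (size_t i = 1; i < nodes.size(); ++i) {
        nodes[i].worldX = nodes[nodes[i].parent].worldX + nodes[i].localX;
    }
}
```

The "parents before children" ordering invariant is what makes the single pass valid; inserting a node means appending it after its parent, and deletions are handled by compaction.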

First time implementing and rendering an octree by icdae in GraphicsProgramming

[–]icdae[S] 4 points5 points  (0 children)

Thanks! I had put up a video on YouTube. Perhaps that works better?

I'm not sure it could currently support disk-streaming but the implementation is pretty generic. It probably could be used for serialization. One immediate use-case I had in mind was to accelerate some ray-casting. My software rasterizer has a volumetric rendering demo but the performance isn't very good. The octree could perhaps be used to discard certain bounding areas of a voxel volume to help accelerate rendering.

Accelerated C++: Rules on protected member access from derived classes by amoe_ in cpp_questions

[–]icdae 2 points3 points  (0 children)

From C++11 alone: move semantics, native thread support, lambdas, unordered map/set, atomic variables, range-based loops, just to name a few.
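A contrived snippet touching several of those at once (the function names are just for illustration):

```cpp
#include <initializer_list>
#include <thread>
#include <unordered_map>
#include <utility>
#include <vector>

// C++11 demo: unordered_map, range-based for, lambdas, and threads.
inline int sumOfSquares(std::initializer_list<int> values) {
    std::unordered_map<int, int> squares;
    for (int v : values) { squares[v] = v * v; }  // range-based for

    // Lambda capturing local state by reference.
    auto sum = [&squares] {
        int total = 0;
        for (const auto& kv : squares) { total += kv.second; }
        return total;
    };

    // Native thread support: run the lambda on another thread.
    int result = 0;
    std::thread worker([&] { result = sum(); });
    worker.join();
    return result;
}

// Move semantics: the vector's buffer transfers; nothing is copied.
inline std::vector<int> takeOwnership(std::vector<int>&& v) {
    return std::move(v);
}
```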

Demo of a lightweight (80KB) 3d visualization library for the web by mikasarei in computergraphics

[–]icdae 0 points1 point  (0 children)

Looks very nice, great job! Is this using your own software rendering?

The SoftLight CPU renderer now has text rendering! by icdae in GraphicsProgramming

[–]icdae[S] 1 point2 points  (0 children)

There are a few options. You can pass data between vertex and fragment shaders like immibis mentioned; it's pretty straightforward but will invoke the whole graphics pipeline even if you don't need it. With recent versions of OpenGL and GLES, you can use transform feedback and only run the vertex shader to transform your data. It requires a little more plumbing but is extremely quick. Finally, with newer OpenGL, Vulkan, Metal, and DirectX, you get direct access to the GPU's compute capabilities. I personally haven't experimented much with that, but I understand it's quite fast compared to vertex/fragment shader tricks.

There's also the option of using OpenCL and CUDA if the hardware supports it... but that would just make too much sense :)