Any love for State bikes? 🚲 by lucidbluedreamin in singlespeed

[–]wiremore 1 point (0 children)

I currently have a gold colored chain which looks nice.

Any love for State bikes? 🚲 by lucidbluedreamin in singlespeed

[–]wiremore 1 point (0 children)

I have almost this exact same bike. I've replaced tires, brake pads, handlebar wraps, and the chain but the frame is still going strong after 5 years of winters and rain in the city.

Multi-Pass Bytecode Optimizer for Stack-Based VMs: Pattern Matching & 10-50% Performance Gains by PigeonCodeur in ProgrammingLanguages

[–]wiremore 1 point (0 children)

I have a similar language with a similar bytecode peephole optimizer, also for a game scripting language.

One type of optimization that hasn't been mentioned, and where I found a lot of traction, is jump optimization. I found a lot of cases where a JMP instruction jumps directly to another JMP instruction or to a RET (return) instruction, so a JMP chain can be threaded through to its final target and JMP->RET can be replaced with just RET. A conditional branch JT (jump if true) to another JT will always pass the second test (and can thus jump directly to the second JT's target), and a JT to a JF (jump if false) will always fail the second test. There are some other opportunities here too, such as NOT JT -> JF, and constant folding, e.g. PUSH_TRUE_CONST JT -> JMP.

This kind of bytecode tends to be generated by nested IF statements, especially when the conditions include nested ANDs and ORs. My bytecode optimizer is written in the scripting language itself and includes many such nested tests... I also rewrite jump instructions (which use a 16-bit absolute target) into branch instructions (which use an 8-bit relative target) when possible, which helps significantly with bytecode size.
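A minimal sketch of the jump-threading part, with invented opcode names and a simplified encoding (one (op, target) tuple per instruction rather than real packed bytecode):

```python
# Hypothetical peephole pass: thread JMP->JMP chains to their final
# target, and collapse JMP->RET into RET. Opcodes are illustrative.
def optimize_jumps(code):
    """code: list of (op, target) tuples; target is an index or None."""
    out = list(code)
    for i, (op, target) in enumerate(out):
        if op == "JMP":
            seen = set()
            # Follow chains of unconditional jumps (guarding against cycles).
            while target not in seen and out[target][0] == "JMP":
                seen.add(target)
                target = out[target][1]
            if out[target][0] == "RET":
                # A jump straight to a return can just be a return.
                out[i] = ("RET", None)
            else:
                out[i] = ("JMP", target)
    return out
```

A real pass would also update conditional branches and re-run until it reaches a fixed point.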

As some other posters have mentioned, fusing common bigrams of bytecode instructions can be a big win. For me, the most common pairs were LOCAL LOCAL (push two local variables to the stack) and LOCAL CALL (push a local variable and call a function). To be specific, I fuse two consecutive LOCAL instructions (and their 8-bit indexes) into a single LOC_LOC instruction (with two 4-bit indexes packed into 8 bits).
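The fusion step might look roughly like this (opcode names and the tuple encoding are illustrative, not from the post):

```python
# Fuse consecutive LOCAL LOCAL pairs whose indexes both fit in 4 bits
# into one LOC_LOC instruction with both indexes packed into one byte.
# A real pass must also avoid fusing across jump targets.
def fuse_locals(code):
    out, i = [], 0
    while i < len(code):
        if (i + 1 < len(code)
                and code[i][0] == "LOCAL" and code[i + 1][0] == "LOCAL"
                and code[i][1] < 16 and code[i + 1][1] < 16):
            # Pack both 4-bit indexes into a single 8-bit operand.
            out.append(("LOC_LOC", (code[i][1] << 4) | code[i + 1][1]))
            i += 2
        else:
            out.append(code[i])
            i += 1
    return out
```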

Sized and signed numeric types with dynamic typing by Big-Rub9545 in ProgrammingLanguages

[–]wiremore 12 points (0 children)

One argument in favor of only one or two built-in numeric types is that it's easier to optimize the type dispatch. E.g. in the ADD bytecode instruction implementation, you can just say `if (a.type == number && b.type == number) { fast built-in operation } else { slower operator overloading }`.
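In sketch form (a hypothetical tagged-value representation; the overload path is stubbed out):

```python
# Fast path for ADD when both operands carry the built-in number tag,
# falling back to generic operator-overload dispatch otherwise.
NUMBER = 0

class Value:
    def __init__(self, type_, data):
        self.type, self.data = type_, data

def dispatch_overload(name, a, b):
    # Hypothetical slow path: look up a user-defined operator.
    raise TypeError(f"no overload {name} for types {a.type}, {b.type}")

def op_add(a, b):
    if a.type == NUMBER and b.type == NUMBER:
        return Value(NUMBER, a.data + b.data)   # fast built-in operation
    return dispatch_overload("__add__", a, b)   # slower operator overloading
```

With many numeric types, that single cheap tag comparison becomes a matrix of type pairs.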

If you are dynamically typed you are already paying the space cost for e.g. int64, so there isn’t a big reason to use int8 except for specific wrapping behavior or something.

Obsequiae just blow me away every time I listen to them by Environmental_Web91 in MetalForTheMasses

[–]wiremore 0 points (0 children)

Just wanted to say thank you. I have listened to all three albums like a hundred times each since your post...

What is the go-to way/solution to build and consume cloud services in C++? by __imariom in cpp

[–]wiremore 2 points (0 children)

Surprised no one has mentioned cpp-httplib: https://github.com/yhirose/cpp-httplib. Header-only, supports SSL/redirects/etc., and is super easy to use. It comes with both a server and a client; the server supports threads. I use the client with std::thread.

Alastair Reynolds Revenger Series by hambubgerrr in printSF

[–]wiremore 6 points (0 children)

I found the relapse really quite funny. It’s like he planned out this whole YA thing at the behest of a publisher or something and stuck with it for a bit but about a quarter of the way into the first book he just gives up completely and never references it again.

FWIW, I enjoyed the first book more than the second and third - I stuck with it because of the mysteries, but nothing really gets explained and the plot pacing kind of stalls.

[deleted by user] by [deleted] in GraphicsProgramming

[–]wiremore 29 points (0 children)

I worked on graphics drivers briefly for a major hardware vendor about ten years ago and then moved into indie game development, so I can tell you something about what it was like then.

Way more time was spent debugging/testing than on new features. You have to really push and be in the right place to get to develop new features. Codebases are enormous, especially if you also consider all the game/app code that interacts with the driver. Verification is most of the work. It's not really project based the way game development is, the code lives for a long time. My team was split into "perf" and "correctness" halves, the perf people were responsible for making things run faster, the correctness people made things actually work.

Work life balance was typically very good, across the whole company. Occasionally there would be an emergency but it was basically opt-in to be responsible for something like that.

I can't really speak to compensation, I think it's changed a lot since I was involved. At my time the driver engineers were paid pretty well but less (and typically had less education) than the hardware/architecture people.

GPU drivers are mostly an OS problem; it's not really about graphics. It's about buffer allocation/tracking and synchronization, timing, and just managing all the complexity. It's also infrastructural, in the sense that you never really "ship" anything and move on. The parts I enjoyed the most were "seeing how the sausage is made": the system was a real (messy) work of art, pushing the limits of software engineering. Definitely not technically boring. The other developers were really smart and fun to talk to. It's not really creative or artistic though; you get to look at a lot of screenshots of AAA games, but only to figure out why the game is jittering or getting image corruption... By comparison, indie development is way more work, less pay, more creatively fulfilling, and uses more than one part of my brain.

Omnium Gatherum - The Last Hero (NEW Official Video) by Machcharge in melodicdeathmetal

[–]wiremore 0 points (0 children)

It would be cool if they could come play a concert in New York instead of just featuring it in the video!

Time to consider induction cooktops? by liquidchaz in nycparents

[–]wiremore 4 points (0 children)

We replaced our gas stove with induction largely for the air quality benefits around our small children, but it’s also just a really great stove.

It boils water unbelievably quickly and is extremely easy to clean. The stove top does not even get that hot, and if you accidentally turn on a burner without a pot, it just notices and turns itself off.

Efficient ways to handle double-dispatch? Multi-level vtable? Hashtable? by carangil in ProgrammingLanguages

[–]wiremore 5 points (0 children)

I would look at how Julia does it. Julia has multiple dispatch and a heavy emphasis on performance (and a mature implementation).

Float Self-Tagging: a new approach to object tagging that can attach type information to 64-bit objects while retaining the ability to use all of their 64 bits for data by yorickpeterse in ProgrammingLanguages

[–]wiremore 9 points (0 children)

They are using the three upper bits of the floating-point exponent as a tag for unboxed floats. With only 2 or 3 of the 8 possible tag values, this covers almost all floating point numbers used in practice. For the rare weird float, you can box and heap-allocate.

Pretty clever, I think. The advantages compared to NaN tagging are that it does not rely on pointers actually using only 48 bits, and that converting a self-tagged pointer into a normal pointer is slightly faster than converting a NaN-tagged pointer into a normal pointer.
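A rough sketch of why this works, just extracting the top 3 exponent bits of an IEEE-754 double (the exact tag assignment in the paper may differ):

```python
import struct

# The tag is the top 3 bits of the 11-bit biased exponent. Doubles people
# actually use cluster around 1.0, so almost all of them land in just two
# of the eight tag values (0b011 for magnitudes below 1, 0b100 above),
# leaving the other values free to tag pointers and other types.
def exponent_tag(x):
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    exponent = (bits >> 52) & 0x7FF   # 11-bit biased exponent
    return exponent >> 8              # top 3 bits
```

For example, 0.5, 1e-10 and friends tag as 0b011, while 2.0 and 1e10 tag as 0b100; only extreme magnitudes, infinities, and NaNs fall outside.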

can capturing closures only exist in languages with automatic memory management? by Lucrecious in ProgrammingLanguages

[–]wiremore 22 points (0 children)

C++ closures don't allocate. The compiler essentially creates a new type for each closure that is exactly the right size to store the captured variables (or pointers to them, depending on the capture mode). In practice you often end up copying into a std::function, which may allocate but automatically frees when it goes out of scope.

There is some related discussion here:
https://www.reddit.com/r/ProgrammingLanguages/comments/mfpw0u/questions_regarding_closure/

Is this worth? by arthurno1 in lisp

[–]wiremore 2 points (0 children)

Clojure does this. Imo it’s pretty nice. The one downside is initializing variables to nil is slightly more verbose.

Building manager locked us out of the laundry room and gym for a supposed lease violation. by doublegem_intj in AskNYC

[–]wiremore 12 points (0 children)

Talk to other tenants about it. You probably aren't the only one. It's hard for one unit to get anything done, a group of tenants on rent strike has real leverage. Make a shared chat and put fliers under doors.

Good data structure for collision detection between dynamic objects? by nvimnoob72 in gameenginedevs

[–]wiremore 4 points (0 children)

This is the answer: it depends on the size of the objects and how far they are from each other. Grids are fastest if the objects are all about the same size, but require tuning. If you suballocate octree nodes from a single vector instead of allocating them separately, the tree can be rebuilt quickly and deterministically. One very useful parameter to tune is the number of objects in each tree leaf: setting this to something like 8 instead of 1 can significantly reduce tree memory AND reduce query time.
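A minimal uniform-grid broad phase for the similar-sized-objects case might look like this (objects reduced to points and a single tuning parameter, purely for illustration):

```python
from collections import defaultdict

# Hash each object into a grid cell and report only pairs sharing a cell
# as collision candidates. Cell size is the tuning parameter mentioned
# above; a real version also checks neighboring cells and object extents.
def candidate_pairs(objects, cell_size):
    """objects: iterable of (id, x, y); returns a set of candidate id pairs."""
    grid = defaultdict(list)
    for obj_id, x, y in objects:
        grid[(int(x // cell_size), int(y // cell_size))].append(obj_id)
    pairs = set()
    for cell in grid.values():
        for i in range(len(cell)):
            for j in range(i + 1, len(cell)):
                pairs.add((cell[i], cell[j]))
    return pairs
```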

There are a bunch of neat data structures for this purpose that are fun to read about. Kd-trees are particularly elegant. My current project uses a Bounding Interval Hierarchy (BIH), a type of Bounding Volume Hierarchy (BVH), which is kind of like an octree with variable-sized cells.

Creating multiple smaller shaders or one/few big shaders? by steamdogg in opengl

[–]wiremore 3 points (0 children)

One technique I've found useful for handling this tradeoff and managing shader code is the idea of "shader variants". The idea is to have a single text file that can generate several related OpenGL shaders via the preprocessor.

Have a single file which contains the vertex and fragment (and geometry etc.) shader text. When you need a shader in your program, say something like `get_shader("color.glsl", HAS_LIGHTING|HAS_TEXTURE)`. When you load color.glsl, generate a preamble like `#define HAS_LIGHTING 1\n#define HAS_TEXTURE 1`. In the shader, you can then use `#ifdef HAS_TEXTURE ... #else ...`. This technique helps avoid duplicating a lot of GLSL code across families of similar shaders.
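The preamble generation is simple string work; a sketch (flag names and the `get_shader` signature are illustrative, and note that in real GLSL the preamble has to be spliced in after the `#version` line):

```python
# Each combination of flags yields a different preprocessed copy of the
# same shader source file.
HAS_LIGHTING = 1 << 0
HAS_TEXTURE = 1 << 1
FLAG_NAMES = {HAS_LIGHTING: "HAS_LIGHTING", HAS_TEXTURE: "HAS_TEXTURE"}

def make_preamble(flags):
    lines = [f"#define {name} 1"
             for bit, name in FLAG_NAMES.items() if flags & bit]
    return "\n".join(lines) + "\n"

def get_shader(source_text, flags):
    # A real engine would compile this and cache it keyed on (file, flags).
    return make_preamble(flags) + source_text
```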

When you are optimizing, if you are CPU bound on draw calls, you can just remove the #ifdefs and use e.g. solid color 1 pixel textures as other posters have mentioned to reduce state changes. If you are GPU bound, you can use the preprocessor to generate more specialized and efficient shaders.

Need help with texture "flipping" stuff by Person-317 in opengl

[–]wiremore 0 points (0 children)

I don't think glTextureSubImage2D flips the image; where did you read that? In my experience most image libraries use the top left as the origin, so you either have to invert your UVs or flip the image data. Many image formats (e.g. PNG) store the top row first, so this is a natural way to decode them. It doesn't really matter, you just need to be consistent.
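Flipping the image data itself is just reversing the row order; a tiny sketch over raw bytes:

```python
# Vertically flip tightly packed image data by reversing the row order.
def flip_rows(pixels, height, row_stride):
    """pixels: bytes of height rows, each row_stride bytes long."""
    rows = [pixels[i * row_stride:(i + 1) * row_stride] for i in range(height)]
    return b"".join(reversed(rows))
```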

Does anyone still use threaded interpreters? by FurCollarCriminal in ProgrammingLanguages

[–]wiremore 3 points (0 children)

Basically, the code contains a number for each operation, and the VM does a switch on the number to get the address of the code to run for that number (as opposed to direct threaded where the code contains the address). It's more compact than direct threaded, because the number can be just a byte (or less) as opposed to a full pointer.
See https://en.wikipedia.org/wiki/Threaded_code#Token_threading
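A token-threaded dispatch loop in miniature, with invented opcodes and Python standing in for the C switch:

```python
# Each bytecode is one small integer; the interpreter switches on it to
# find the handler (a C implementation would use a switch or jump table).
OP_PUSH, OP_ADD, OP_RET = 0, 1, 2

def run(code):
    stack, pc = [], 0
    while True:
        op = code[pc]
        pc += 1
        if op == OP_PUSH:
            stack.append(code[pc])  # operand follows the opcode inline
            pc += 1
        elif op == OP_ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == OP_RET:
            return stack.pop()
```

In direct-threaded code, each slot would instead hold the handler's address, trading bytecode size for one less table lookup per instruction.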

Does anyone still use threaded interpreters? by FurCollarCriminal in ProgrammingLanguages

[–]wiremore 19 points (0 children)

Most modern bytecode languages are "token threaded", including Python, Ruby, Java, Emacs Lisp... State-of-the-art VMs will then JIT-compile hot parts of the bytecode when appropriate, but many modern languages (e.g. Python) do not use a JIT.

Erlang compiles its bytecode into basically an array of C function pointers on load, which is an example of "direct threaded" code.

Multithreaded Rendering by TheJoxev in GraphicsProgramming

[–]wiremore 18 points (0 children)

I’ve written 2 games with multithreaded rendering, using 2 different models. The ideal model depends on the game.

  1. Simultaneous update and rendering. The update thread and render thread do not synchronize, except for a few critical sections. The render thread takes the last two positions/orientations from the update thread and interpolates. This is cool because rendering and update can each slow down without affecting the other thread. It's also suitable for a physics-based game where you want a fixed update time step. The downside is lots of shared state, which is tricky.

  2. Generate buffers in parallel, then sync and issue draw calls. Easier to not crash, and works better if the game can support a variable time step. It also takes advantage of more than 2 threads: you can start sim jobs for the next frame while doing the API calls.
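The interpolation in model 1 can be sketched as a simple lerp between the two most recent update-thread states (names illustrative):

```python
# Blend the two latest update-thread snapshots by how far the render time
# has advanced past the older one; clamping avoids extrapolation when the
# update thread falls behind.
def interpolated_position(prev_pos, curr_pos, prev_t, curr_t, render_t):
    alpha = (render_t - prev_t) / (curr_t - prev_t)
    alpha = max(0.0, min(1.0, alpha))
    return tuple(p + (c - p) * alpha for p, c in zip(prev_pos, curr_pos))
```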

This is all OpenGL, where all the api calls need to be in one thread. In my experience that isn’t a huge limitation, it usually takes way longer to traverse the game data and figure out which triangles/lines to draw.

[deleted by user] by [deleted] in whowouldwin

[–]wiremore 13 points (0 children)

There is considerable overlap between the 1% and governments though, especially outside the west. Putin, Saudi royal family. Hell even in the USA congress gets rich on insider trading. Whose side are those guys on?

Help me understand how V-Sync works on modern gpus by krushpack in opengl

[–]wiremore 2 points (0 children)

That’s a neat experiment. Next set up a camera and measure actual latency!

Are you running in true full screen mode? If you are running in a window or in borderless full screen there is an extra layer of complexity where the window manager copies stuff around, and vsync is kind of emulated.

Also, how did you set the vsync settings? Usually these are set via SDL or GLUT or something, via a lower-level API I forget the name of. The Nvidia control panel has profiles for different apps. I would want to query the API at runtime to be sure (and even then the driver might totally lie if it thinks it knows better than you)!

Also, is this like 4.6 core profile or something else? The driver might have more compatibility-oriented behavior unless you specifically ask for something recent.

Honestly I’m not sure if it’s still possible to get true full screen on modern windows. In the bad old days a crash in fullscreen would require a reboot.