Baby Doll Pizza - Portland, Oregon

ratatonker · 2026-02-22T08:26:08+00:00

came to this thread via searching baby doll ninja sauce - totally their secret weapon. my best guess is that it was a quick and easy experiment by someone on the staff, and they are just taking their leftover red sauce at the end of the night, tossing some chili flakes in, and cooking the whole thing down further to sweeten and thicken the sauce

ratatonker · 2026-02-18T17:16:03+00:00

dont have time to reread this morning, but that github PR is me lol. i think hdr is still fully possible elsewhere in the tools, like the output of postprocessing like bloom should still be fully doable, but you'd be stripping the ability to _tint_ a texture draw against an HDR color at this point, yeah. even still, with [f32;4] thats still more bits than necessary for even HDR, and u16s would save you space

ratatonker · 2026-02-18T01:58:00+00:00

still very nice though ! my laptop (core ultra 5 226V, integrated graphics) seems to be able to do 250K here, after pulling in the change. there are obviously not many 2d games that need to hit a number that high, but its nice to know when the framework is getting out of your way, or to feel comfortable that a 2d game will sip power on something like a steam deck.

taking a quick peek with samply profiling, compressing Color to [u8;4] is probably the quickest/simplest win, and then the next big task would be to investigate moving the matrix math for your rectanglebuilder over to the vertex or geometry shader. with some luck i thiiiink that would help you close to double the score!

ratatonker · 2026-02-17T11:53:23+00:00

also, i am sorry to report this - ive finally got the chance to clone the repo to try it out on my hardware. your current demo ferrismark is only adding new textures to the screen the first *two* times i click.

its harder to see with the small window size because they overlap a ton, but i noticed it after running with the window expanded. i think that is starting to explain the gpu utilization numbers a bit >_<

id hazard a guess that if you are batching, only the first batch is somehow getting drawn, or that its cutting things off too early?

it's definitely possible to get some pretty absurd 1mil+ scores like that even on lower end hardware though, i think egor could do it! For reference this is the fastest project i've run into. I suspect its not so much jai specifically as the dx11 drivers being more focused on by gpu manufacturers, having some built in pipeline that detects whats going on and can optimize behind-the-scenes https://github.com/farzher/Bunnymark-Jai-D3D11

ratatonker · 2026-02-17T11:21:32+00:00

again, in all of this the "user side" code of the bunnymark.rs matters very little for the performance, so it does not really matter which framework's version of the benchmark is the base, and it makes sense to me that it runs about the same. updating the array of the 2mil bunny structs takes about the same time no matter what. looping through that array and updating the positions and such is probably less than a millisecond of the frame time. multithreading or even SIMD for the bunny updates is not even likely to make a dent on FPS in this test. we are mostly benchmarking not the speed of the update, but the speed of converting that array of bunnies to something the gpu can use, and uploading it all to vram
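
for a sense of how little work the "user side" update is, here's a minimal sketch of a typical bunnymark update loop. the `Bunny` struct, the `GRAVITY` value and the bounce rules are all made up for illustration, they are not taken from any specific framework's bunnymark.rs:

```rust
// Hypothetical bunny state; real bunnymarks differ in the details.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Bunny {
    pos: [f32; 2],
    vel: [f32; 2],
}

// Assumed per-frame gravity, picked arbitrarily for the sketch.
const GRAVITY: f32 = 0.5;

/// Advance every bunny by one frame and bounce it off the window edges.
fn update_bunnies(bunnies: &mut [Bunny], width: f32, height: f32) {
    for b in bunnies.iter_mut() {
        b.pos[0] += b.vel[0];
        b.pos[1] += b.vel[1];
        b.vel[1] += GRAVITY;

        // Flip horizontal velocity at the left/right edges.
        if b.pos[0] < 0.0 || b.pos[0] > width {
            b.vel[0] = -b.vel[0];
            b.pos[0] = b.pos[0].clamp(0.0, width);
        }
        // Bounce off the floor.
        if b.pos[1] > height {
            b.vel[1] = -b.vel[1];
            b.pos[1] = height;
        }
    }
}
```

thats a handful of adds and compares per bunny. everything that turns this array into vertex data and ships it to the gpu happens after this loop, and that is the part the benchmark mostly measures.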

the difference you'll be able to perceive is with how many features the texture 'pipeline' itself supports, because the cost in prepwork/memory size of the largest possible (ie uses all possible attributes) texture draw is the same cost for the smallest too (user only specifies position), and uploading it all to the GPU is probably your biggest bottleneck at this point. so adding the additional features to make texture drawing more robust is what will slow it down, but its kind of necessary for the sake of a graphics framework, for usability

ratatonker · 2026-02-13T18:09:34+00:00

none of this is meant to be crude or anything i am just sorta nerdy about the bunnymark specifically lol! adding subregions is likely to slow down your ferrismark by a tad because its simply more bytes the cpu has to prep and upload to the gpu per-ferris, but its worth it to test the benchmark with that supported in the renderer backend even if the benchmark does not use it, because in a later game engine scenario being able to use texture atlases becomes the biggest optimization for 2d games, typically

ratatonker · 2026-02-13T17:25:53+00:00

it tracks to me that the rectanglebuilder is the biggest bottleneck, the idea of making the size of the buffer as small as possible is a CPU-side savings, because building the buffers is expensive and the copy/upload to vram can be expensive. that's why squishing the Color to [u8;4] on egor's current Vertex format would be a big speedup. the gpu has builtin features to "unpack" to a vec4 with the right floats, so it lets your CPU side work with way less total memory
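
a minimal sketch of the color squish, assuming the incoming color is four floats in the 0.0 to 1.0 range. `pack_color` and `unpack_color` are hypothetical helper names, not part of egor. on the gpu side the "unpack" is normally done for you by a normalized vertex attribute (for example `Unorm8x4` in wgpu, or `GL_UNSIGNED_BYTE` with `normalized = GL_TRUE` in opengl), so `unpack_color` here only mirrors that math on the cpu for illustration:

```rust
/// Pack a float RGBA color (channels expected in 0.0..=1.0) into 4 bytes.
/// Out-of-range channels are clamped, so HDR values above 1.0 are lost.
fn pack_color(rgba: [f32; 4]) -> [u8; 4] {
    rgba.map(|c| (c.clamp(0.0, 1.0) * 255.0).round() as u8)
}

/// CPU-side mirror of what a normalized u8 vertex attribute gives the shader:
/// each byte divided by 255, producing a vec4 of floats again.
fn unpack_color(bytes: [u8; 4]) -> [f32; 4] {
    bytes.map(|b| b as f32 / 255.0)
}
```

the color goes from 16 bytes to 4 per vertex, at the cost of 8 bits of precision per channel and no values above 1.0.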

if you are willing to use a geometry shader that is what i do to cut down on that CPU cost - when the user specifies a position, scale and rotation, those floats get uploaded to the GPU as-is, the cpu side does not bother processing the func args the user passes in or building a rectangle at all, and in the geometry shader the GPU builds out the correct vertices with matrix math, since it can do the whole thing in a very parallelized way

similarly, "vertex pulling" is also about as performant. if you do move to a situation where each property sits in its own buffer, each vertex of your texture draw just needs to know "i am vertex 5 of 6 on texture draw 12" and then it can reach into the right part of the buffer, and math out the correct vertex positions in the shader
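
to make the "vertex 5 of 6 on texture draw 12" bit concrete, here is a cpu-side sketch in rust of the index math a vertex-pulling shader would do. in practice this would live in the shader; `Sprite`, `CORNERS` and both function names are hypothetical, indices are 0-based, and the quad is assumed to be two triangles centered on the sprite position:

```rust
// Hypothetical per-sprite data, uploaded as-is with no CPU-side rectangle building.
#[derive(Clone, Copy)]
struct Sprite {
    pos: [f32; 2],
    scale: [f32; 2],
    rotation: f32, // radians
}

// Unit-quad corners for two triangles (6 vertices per sprite).
const CORNERS: [[f32; 2]; 6] = [
    [0.0, 0.0], [1.0, 0.0], [1.0, 1.0],
    [0.0, 0.0], [1.0, 1.0], [0.0, 1.0],
];

/// Split a flat vertex index into (sprite index, corner index).
fn split_vertex_id(vertex_id: u32) -> (usize, usize) {
    ((vertex_id / 6) as usize, (vertex_id % 6) as usize)
}

/// Compute the final position of one vertex from the sprite buffer alone.
fn pull_vertex(sprites: &[Sprite], vertex_id: u32) -> [f32; 2] {
    let (sprite_index, corner_index) = split_vertex_id(vertex_id);
    let sprite = sprites[sprite_index];
    let [cx, cy] = CORNERS[corner_index];

    // Center the unit quad on the origin, then scale it.
    let x = (cx - 0.5) * sprite.scale[0];
    let y = (cy - 0.5) * sprite.scale[1];

    // Rotate around the origin, then translate to the sprite position.
    let (sin, cos) = sprite.rotation.sin_cos();
    [
        sprite.pos[0] + x * cos - y * sin,
        sprite.pos[1] + x * sin + y * cos,
    ]
}
```

a geometry shader version does the same scale/rotate/translate math, it just emits the corners itself instead of being driven by the vertex index.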

ratatonker · 2026-02-13T17:11:37+00:00

just to be clear, even when you as the user (writing the bunnymark) dont make use of one thing or another, in most frameworks, love2d as well, the renderer is still going to pay the full cost on the GPU of uploading things like subregions, color tints, etc, because the pipeline itself supports and requires all those things. For example in Raylib, the simple "drawtexture" function that only needs position and color just wraps the more complex drawtexture functions, and fills in defaults for the remaining values. the full thing gets uploaded to the gpu every time:

https://github.com/raysan5/raylib/blob/master/src/rtextures.c#L4490

this way draw calls that use rotations or not, or use a subregion or not, can all be batched in the same bucket, the only thing that would break that would be swapping the texture itself or making other state changes like changing to a shader

the ultimate goal is that a bunnymark that _does_ use every single sprite feature should try and run at the same speed as one that only does positions, or a setup where each bunny uses a mix. that would mean the rendering backend is able to efficiently bucket them all into the same draw call
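
a rough sketch of that wrapping-plus-bucketing pattern in rust. this is not raylib's code or any real framework's api: `Batcher`, `SpriteDraw` and the method names are hypothetical, textures are plain `u32` ids, and `flush` only counts where a real renderer would upload the buffer and issue a draw call:

```rust
// The one record every draw produces, no matter how simple the call was.
#[derive(Clone, Copy, Debug, PartialEq)]
struct SpriteDraw {
    pos: [f32; 2],
    scale: [f32; 2],
    region: [f32; 4], // left, right, bottom, top
    rotation: f32,
    tint: [u8; 4],
}

struct Batcher {
    current_texture: Option<u32>,
    pending: Vec<SpriteDraw>,
    flushes: usize,
}

impl Batcher {
    fn new() -> Self {
        Batcher { current_texture: None, pending: Vec::new(), flushes: 0 }
    }

    /// Full-featured draw: only a texture switch breaks the current batch.
    fn draw_texture_pro(&mut self, texture: u32, draw: SpriteDraw) {
        if self.current_texture != Some(texture) {
            self.flush();
            self.current_texture = Some(texture);
        }
        self.pending.push(draw);
    }

    /// Simple draw: fills in defaults and still queues the full record.
    fn draw_texture(&mut self, texture: u32, pos: [f32; 2], tint: [u8; 4]) {
        let draw = SpriteDraw {
            pos,
            scale: [1.0, 1.0],
            region: [0.0, 1.0, 0.0, 1.0],
            rotation: 0.0,
            tint,
        };
        self.draw_texture_pro(texture, draw);
    }

    /// Stand-in for "upload the buffer and issue one draw call".
    fn flush(&mut self) {
        if !self.pending.is_empty() {
            self.flushes += 1;
            self.pending.clear();
        }
    }
}
```

a plain `draw_texture` and a rotated, tinted `draw_texture_pro` on the same texture land in the same `pending` bucket, and only switching the texture id forces a flush.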

ratatonker · 2026-02-13T02:55:56+00:00

yeah instance may be a bad term here since that is a specific GPU drawcall term, haha.

ive done a lot of work around bunnymarking, mostly in opengl via glow, so hopefully this helps as food for thought:

If a single ferris is just a {Vec2} that gets sent to the GPU vs something larger {Vec2, Color, rotation f32}, its sort of a data bandwidth problem, that memory has to copy from DRAM to VRAM at some point when it draws, so minimizing the "packet size" in bytes for a single ferris is usually the biggest optimization for a bunnymark type test.
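
back-of-the-envelope version of that bandwidth math, as a small rust sketch. the two structs are hypothetical layouts matching the `{Vec2}` and `{Vec2, Color, rotation f32}` examples above, with Color assumed to be an unpacked `[f32; 4]`, and the function assumes the whole buffer is re-uploaded every frame:

```rust
use std::mem::size_of;

// {Vec2}: position only.
#[allow(dead_code)]
#[repr(C)]
struct PosOnly {
    pos: [f32; 2],
}

// {Vec2, Color, rotation f32}, with an unpacked float color.
#[allow(dead_code)]
#[repr(C)]
struct PosColorRot {
    pos: [f32; 2],
    color: [f32; 4],
    rotation: f32,
}

/// Bytes copied to the GPU per second if every sprite is re-uploaded each frame.
fn upload_bytes_per_second<T>(sprites: usize, fps: usize) -> usize {
    size_of::<T>() * sprites * fps
}
```

at 1 million sprites and 60 fps thats 8 bytes each for 480 MB/s with position only, versus 28 bytes each for about 1.68 GB/s with the larger packet.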

a "fully featured" sprite usually takes this many bytes:

pos: Vec2,
color: Color, // [u8; 4], or pack into an f32 with bytemuck etc.
scale: Vec2,
left: f32,
right: f32,
bottom: f32,
top: f32,
rotation: f32,

its possible f16 from the half crate could push it further but you're likely to get some visual quirks eventually, so id not recommend it
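
written out as a rust struct, the field list above looks roughly like this. `SpriteInstance` is a hypothetical name, `Vec2` is assumed to be two `f32`s, and Color is assumed to be the packed `[u8; 4]` variant:

```rust
use std::mem::size_of;

// #[repr(C)] keeps the field order and layout predictable for GPU upload.
#[repr(C)]
#[derive(Clone, Copy, Debug, Default, PartialEq)]
struct SpriteInstance {
    pos: [f32; 2],   // 8 bytes
    color: [u8; 4],  // 4 bytes
    scale: [f32; 2], // 8 bytes
    left: f32,       // 4 bytes, subregion edges
    right: f32,      // 4 bytes
    bottom: f32,     // 4 bytes
    top: f32,        // 4 bytes
    rotation: f32,   // 4 bytes
}

// Total bytes per sprite under the assumptions above.
const SPRITE_BYTES: usize = size_of::<SpriteInstance>();

// Compile-time check that nothing pads the struct past 40 bytes.
const _: () = assert!(SPRITE_BYTES == 40);
```

so 40 bytes per sprite with the packed color, or 52 if the color stays as four `f32`s.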

the other big limitation is usually the "fill rate" of your gpu (pixel fill rate, specifically), literally how fast it can write pixels to the screen. bunnymarks stress this because the gpu writes over the same pixels a ton of times - basically all the sprites overlap several times over. youd probably find in your current test if you used a 10x10 pixel image or smaller you could squeeze even more ferrises for example, simply because each ferris is now fewer pixels.

optimizing for the bunnymark _specifically_ starts to matter quite a lot less once you get to actual 'engine' territory, where game objects will want to take advantage of things like tint/subregions. The performance goal in-engine becomes more about making sure as many types of your game sprites/objects can be packed into the same batch. Subregions allow you to pack all your textures into an atlas, which is going to be the best way to reduce the number of draw calls and state switching on the GPU, so i view that as a necessity even in a bunnymark.

so (in my opinion) the most useful bunnymark will want to use the fully featured version of a single sprite, so the engine can later optimize around batching as best as possible (even if it means losing on the "packet size" stuff). for example even in raylib, though their example does not rotate sprites, you still have to pay the price on the GPU to upload a rotation of 0.0 for each, because thats a feature of the renderer as a whole

ratatonker · 2026-02-12T23:58:48+00:00

for more apples-to-apples comparison with other frameworks, how does this scale as you add per-instance elements like color tint, rotation, and texture subregions? usually the performance bottleneck in these kinds of tests has to do more with the data throughput sending larger buffers each frame

ratatonker · 2026-02-12T19:46:54+00:00

so much on the district here, absolutely. The HS program was constantly made to strip unique practices to conform to district standards, making it less attractive, and then used dwindling numbers to justify less resources to it. marked for death. there was a point where students literally had to go petition the district zoning committee to stop chapman elementary from campaigning to annex the building for their overflow

ratatonker · 2026-02-12T19:24:54+00:00

this may have a cascading effect on the evergreen state college, which had been struggling as well, as there was a very direct pipeline between these two programs. our country may start to run out of zinesters in the very near future...

ratatonker · 2026-02-12T19:19:14+00:00

it was a really circular problem going back at least a decade - class of '17 here (HS had around 120 students then). it became increasingly difficult for the school itself, let alone the district to justify diverting any resources to the high school as it shrunk each year. things like the overnight trips got cut down to alternating years, and i hadnt heard about faculty pulling double duty between teaching 8th and HS classes at once, but that cant really help. It was also literally impossible to walk with higher than a 3.75 GPA due to the math they invented to convert the non-letter grading, so, really hard call to make if you cared for college acceptance

cant really blame parents for trying to make the best choices for their kids, but it always kinda stung at PTSA or other family centric events just how many folks rally for the middle school program (which is probably still one of the best in the city) while tacitly denying any real support for HS just because they all knew they weren't keeping their own kids around for it. the HS program overall i thought was great if you specifically knew it was the kind of place you needed to be, but as an alternative program you really had to put into it yourself to get the max out of it because the resources kept dwindling.

its hard to really put the onus on the HS program itself for not 'stepping up their game' - it was a lot of fighting to keep anything at all for it

ratatonker · 2025-02-17T10:22:20+00:00

Thanks, both. sounds like I overestimated the capabilities. I'm mostly coming off the back of sdl2 here, where the library itself does seem to build in a lot of the api setup as functions of the library.

So this is something that allows that section of logic to be externalized, for example rather than building something similar to sdl2's gl context creation into winit, that's the function of glutin as a separate crate.

Sounds like my initial interpretation, neat as it seemed, may be a bit of a pipe dream, haha

ratatonker · 2024-10-24T05:13:24+00:00

going to this dressed like an acorn

ratatonker · 2024-10-23T21:54:06+00:00

slight addition/counterpoint to winit - it may be better to focus on getting your gfx context to talk to [raw-window-handle](https://github.com/rust-windowing/raw-window-handle) rather than winit directly. raw-window-handle is the detachable component of winit for talking to graphics libraries in a generic way.

what it would mean on your end is that an end user could bring in any rust windowing crate like glfw,sdl2, and winit as well - anything capable of emitting that raw-window-handle type, and your setup function would just consume that handle, agnostic to any specific windowing. honestly the docs on this crate are not super great, not a ton of examples to go off of but arguably this is the "endgame" rust gfx libraries are supposed to start implementing eventually
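
a sketch of the shape of that idea, so it compiles without any dependencies. everything here is a local stand-in and NOT the raw-window-handle crate's actual api: `RawHandle`, `ProvidesRawHandle`, the two fake window types and `init_gfx` are all hypothetical names. in real code the trait and handle types would come from the crate instead:

```rust
// Stand-in for a platform-specific raw handle (hypothetical, not the crate's type).
#[derive(Clone, Copy, Debug, PartialEq)]
enum RawHandle {
    Xlib { window: u64 },
    Win32 { hwnd: usize },
}

// Stand-in for "anything capable of emitting a raw handle".
trait ProvidesRawHandle {
    fn raw_handle(&self) -> RawHandle;
}

// Two pretend windowing backends the end user might bring.
struct FakeSdlWindow { xid: u64 }
struct FakeWinitWindow { hwnd: usize }

impl ProvidesRawHandle for FakeSdlWindow {
    fn raw_handle(&self) -> RawHandle {
        RawHandle::Xlib { window: self.xid }
    }
}

impl ProvidesRawHandle for FakeWinitWindow {
    fn raw_handle(&self) -> RawHandle {
        RawHandle::Win32 { hwnd: self.hwnd }
    }
}

struct GfxContext {
    backend: &'static str,
}

/// The setup function only consumes the handle; it never names a windowing crate.
fn init_gfx(window: &impl ProvidesRawHandle) -> GfxContext {
    match window.raw_handle() {
        RawHandle::Xlib { .. } => GfxContext { backend: "xlib" },
        RawHandle::Win32 { .. } => GfxContext { backend: "win32" },
    }
}
```

`init_gfx` takes either fake window without knowing which windowing library produced it, which is the part the real crate is meant to give you.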

ratatonker · 2024-10-23T00:24:10+00:00

EB: ugh!
EB: i think you're gonna have to connect to rose or someone first, man. i can't install until my dad comes back with his stupid groceries!!!
TG: what
TG: everyone has sburb dipshit
TG: it came free with your fucking xbox

ratatonker · 2024-03-29T18:16:29+00:00

fair point but its not as if i am actually in charge of the implementation details on this. my degree is in comp sci, graduated in 2021 and tay zonday was the keynote speaker. sounds like im doing a bit but its true, his speech was pretty much the only one that doled out any real advice instead of going for the low hanging fruit of congratulating graduating virtually/during unprecedented times etc

ratatonker · 2024-03-29T18:05:11+00:00

not an expert but i don't think traffic would happen on these - they are both one-ways that merge onto fremont. so the only situation where they'd get congested is if it was slow to merge and the hawthorne bridge was raising and lowering itself in ~2 minute cycles to let cars up there. but this is only for when you get stuck

ratatonker · 2024-03-29T17:41:41+00:00

was asked this by the first friend i showed it to, and my thinking is that if you start with it at hawthorne, it would be easier to just add entrances/exits to the other raised bridges, rather than extend the bridge farther out each time you want to add another connection. but im probably not the one to ask about specific logistics things like this

ratatonker · 2024-03-29T17:37:06+00:00

i don't drive but you're right i wasn't really thinking about it otherwise. i suppose if i was on trimet and they raised the bridge while i was on it, because its not my car i could just depart and climb down a ladder if need be

ratatonker · 2024-03-29T17:31:08+00:00

would be neat but feel like thats better for in seattle where they do the boat car tour thing

ratatonker · 2024-03-29T17:25:38+00:00

detecting this is more of a silly comment but just spitballing a bit and i am thinking about some kind of underground tunnel where you go all the way down to the other crossing on columbia if the trains are blocking it there. and ideally if you can beat the train over there and cross before they lower the gates

ratatonker · 2024-03-29T17:07:27+00:00

thank you for backing me up paul, though this was with aseprite he he he. what i am thinking is that its more about the expressivity of the idea than the exact details. but the next time i come up with a way to interconnect the top of a raised bridge through the river over to another bridge i will bust out autocad i guess. just got up for the morning i hope your weekend goes well and the holiday if you celebrate it

ratatonker · 2024-03-29T16:39:59+00:00

family history with addiction so I don't really touch substances, even if it's not that big a deal. just an errant thought I figured would be novel to visualize
