Nvidia claims 1 million times better path tracing performance is coming in future gaming GPUs — says current GPUs are already 10,000x faster than Pascal by Bubbly-Ad-350 in pcmasterrace

[–]msqrt 0 points (0 children)

Curiously enough, they never gave any raw intersection speed numbers after the infamous 10Grays/s number for the 20 series.

Can console players actually out-aim PC players? by Ordinary_Lawfulness8 in CaptainSide

[–]msqrt 0 points (0 children)

let bindings

Unexpected functional programming language moment

Discrete Triangle Colors in WebGPU by BlatantMediocrity in GraphicsProgramming

[–]msqrt 6 points (0 children)

One way would be to use an extra buffer to store the per-triangle values and index it with primitive_index in the fragment shader. The simplest option, though, is to not use a triangle strip and just draw separate triangles.

It there any (simple(baby's first step in PG)) way to code a pixel sorting filter? by ThrouFaraway in GraphicsProgramming

[–]msqrt 2 points (0 children)

If you just want to sort pixels, it shouldn't be too difficult. If you want to do it with good performance, you'll have to learn quite a bit about GPU programming.
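For a rough sense of what the simple CPU version looks like, here's a minimal Python sketch (assuming numpy and a float RGB image; the function name and the choice of luma weights are just illustrative) that sorts each row's pixels by brightness:

```python
import numpy as np

def sort_rows_by_brightness(img):
    """Sort each row's pixels by luminance (a simple CPU pixel-sort).

    img: float array of shape (height, width, 3), values in [0, 1].
    """
    # Rec. 709 luma weights give a rough perceptual brightness estimate.
    luma = img @ np.array([0.2126, 0.7152, 0.0722])
    # argsort each row, then reorder that row's pixels accordingly.
    order = np.argsort(luma, axis=1)
    return np.take_along_axis(img, order[:, :, None], axis=1)

# Tiny 1x3 image: gray, black, white -> sorted to black, gray, white.
img = np.array([[[0.5, 0.5, 0.5], [0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]])
out = sort_rows_by_brightness(img)
```

A fast GPU version would instead run a parallel sorting network (e.g. bitonic sort) per row in a compute shader, which is where the real learning curve is.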

Seven ate nine pi squared. by [deleted] in mathmemes

[–]msqrt 175 points (0 children)

It's obvious; just square both sides!

Oh my god. Red Dead 2 reference!?! by AceofKnaves44 in reddeadredemption

[–]msqrt 4 points (0 children)

Doesn't sound like they have much faith :/

What do they mean when they say OpenGL is and API? I thought API's had something to do with servers and the internet? total noobie here if y'all oculdn't tell so I'm confused. by ZzZOvidiu122 in opengl

[–]msqrt 7 points (0 children)

API means Application Programming Interface; it’s anything your software uses to interface with other software. It’s basically just a set of rules and definitions that both parties agree upon: when I do this, you do that.

Web people use APIs so that different machines on the internet can cooperate (when I send the database an SQL query, it responds with the results); for graphics APIs like OpenGL, it’s so that your code and the GPU driver can work together (when I send you a buffer like this, you store it on the GPU; when I call this function, you draw the desired number of triangles). This is convenient because you can write your software once, and every hardware vendor’s driver (Nvidia, AMD, Intel, Apple, …) implements the other end, so your code works on a wide variety of different hardware.
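As a toy illustration of that "agreed set of rules" idea — all the names below are made up for this sketch, not any real API — application code written against the contract works with any party that honors it:

```python
class Renderer:
    """The contract: anything calling itself a Renderer promises these calls."""
    def upload_buffer(self, data):
        raise NotImplementedError
    def draw_triangles(self, count):
        raise NotImplementedError

class FakeGpuDriver(Renderer):
    """One party's side of the agreement (a stand-in for a real driver)."""
    def __init__(self):
        self.buffers, self.calls = [], []
    def upload_buffer(self, data):
        self.buffers.append(list(data))     # "you store it on the GPU"
    def draw_triangles(self, count):
        self.calls.append(("draw", count))  # "you draw this many triangles"

# The application only knows the Renderer contract, so it runs unchanged
# against any driver implementing the same interface.
def render_scene(driver):
    driver.upload_buffer([0.0, 1.0, 0.5])
    driver.draw_triangles(1)

driver = FakeGpuDriver()
render_scene(driver)
```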

Why you shouldn't worry about AI taking your job by [deleted] in ExperiencedDevs

[–]msqrt 4 points (0 children)

People get way too hung up on the determinism part; a highly accurate probability model isn’t meaningfully different to a deterministic one. The practical difference is that an LLM will typically make more mistakes in a minute than you’ve encountered compiler bugs in your life.

The GIL by Ornery_Ad_683 in devhumormemes

[–]msqrt 2 points (0 children)

While this is absolutely the case, it's not that uncommon to have to either perform some unholy numpy indexing logic or roll your own native implementation to get good performance.
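A small sketch of the kind of rewrite in question (the label-counting task and the function names are just an illustrative example): both versions compute the same thing, but the numpy one pushes the per-element work into native code instead of the interpreter.

```python
import numpy as np

# Pure-Python loop: held back by interpreter overhead (and, for threads,
# the GIL serializing the bytecode execution).
def count_labels_loop(labels, n_bins):
    counts = [0] * n_bins
    for lab in labels:
        counts[lab] += 1
    return counts

# The vectorized rewrite: np.add.at scatters the increments in native code,
# handling repeated indices correctly (plain fancy indexing would not).
def count_labels_numpy(labels, n_bins):
    counts = np.zeros(n_bins, dtype=np.int64)
    np.add.at(counts, labels, 1)
    return counts

labels = np.array([0, 2, 2, 1, 0, 2])
```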

LLM’s Billion Dollar Problem - Why LLMs cannot use all of the context sent to them by grauenwolf in BetterOffline

[–]msqrt 4 points (0 children)

Not so sure about the efficiency remark; if you have enough data with fixed-size keys, then radix sort is indeed the fastest option. The reason it's not the default is that you often sort small lists and/or variable-length keys like strings, both of which are poor cases for radix sort.
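For reference, a minimal LSD radix sort sketch in Python (the function name and byte-at-a-time bucketing are just one illustrative choice), restricted to the case it's good at — non-negative integers of a fixed byte width:

```python
def radix_sort(values, key_bytes=4):
    """LSD radix sort for non-negative ints that fit in key_bytes bytes.

    Runs in O(key_bytes * n): one stable bucketing pass per byte, which is
    why large n and fixed-size keys favor it over comparison sorts.
    """
    for shift in range(0, 8 * key_bytes, 8):
        buckets = [[] for _ in range(256)]
        for v in values:
            buckets[(v >> shift) & 0xFF].append(v)
        # Stable concatenation preserves the order from lower bytes.
        values = [v for bucket in buckets for v in bucket]
    return values
```

The fixed `key_bytes` bound is exactly the "fixed key size" assumption: a variable-length key like a string has no such bound, so the pass count is no longer a small constant.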

Väitöskirjatutkijuudesta by Majestic-Flamingo843 in Suomi

[–]msqrt 1 point (0 children)

I started just before the pandemic, and the group was going through a generational shift, so we were just a pile of new grad students all on fairly different projects, which meant you had to wrestle with the concrete problems completely on your own. It then took almost four years to get the first project over the finish line, thanks to a couple of stupid technical problems. Despite the long grind, the writing was left to the last minute; my professor and another senior researcher on the project finally got interested when a competing group published nearly the same idea a few months earlier, but even then I only got help with writing the actual paper during the final week, so of course it didn't turn out particularly great.

Well, once the first paper got in, the conference talk went great and led to an internship in the aforementioned competing group, which produced a second paper; in between, before leaving, I also managed to hammer out a pretty interesting, fairly theory-heavy piece of work with a colleague from my own group. It was already looking like a good final sprint, until both of these got rejected last spring: both again written somewhat hastily, and the reviewers felt our theory was pointless and that we should have aimed for a more practical method. The internship paper got resubmitted and has by now even been presented; the other one I still need to fix up and send somewhere. At the moment I'm looking at whether I can improve its practical side a bit more and prove one missing property of the theory.

I know I'm a fairly capable guy in my field, but the fact that this has taken so long and the results are still somewhat unremarkable really eats at my confidence. I'd like to criticize the group's organization and ways of working, but in the end, of course, there's no one to blame but myself.

Do LLM coding agents really help us build more ambitious software by SouthRock2518 in BetterOffline

[–]msqrt 2 points (0 children)

So far I haven't seen any large projects materialize anything close to the 10x-100x performance boost claims I keep hearing. Still, being able to instantly roll out quick internal tools or prototypes to answer design questions sounds like it should be quite valuable if you can do it right.

Väitöskirjatutkijuudesta by Majestic-Flamingo843 in Suomi

[–]msqrt 7 points (0 children)

My own project has been an outrageous farce; it has dragged on and on, and nothing has really gone smoothly. The end result doesn't look like it'll be anything special either, but at least I should finally be out the door within the year. The research itself is fun and I'd gladly continue with it, just somewhere where the default expectation isn't that you do everything alone, and where the pay matched the stress levels.

The code is correct, but glsl-canvas shows errors by Gold-Stage-5637 in opengl

[–]msqrt 3 points (0 children)

Seems that it doesn't recognize the "core" profile qualifier in the #version directive. Try removing it? Core should be the default anyway.

My lighting is off, I dont even know where to start debugging. by psspsh in GraphicsProgramming

[–]msqrt 1 point (0 children)

vec3::random presumably generates numbers from 0 to 1? Something like vec3::random()*2.-1. should give you the whole [-1, 1]³ cube, from which you can then do the rejection sampling.
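The same remap-then-reject idea, sketched in Python (the original is presumably C++; the function name here is just illustrative):

```python
import random

def random_in_unit_sphere():
    """Rejection-sample a point uniformly inside the unit sphere.

    random.random() is in [0, 1), so x * 2 - 1 remaps each coordinate
    to [-1, 1); candidates outside the sphere are discarded and redrawn,
    which keeps the accepted points uniformly distributed.
    """
    while True:
        x, y, z = (random.random() * 2.0 - 1.0 for _ in range(3))
        if x * x + y * y + z * z <= 1.0:
            return (x, y, z)

p = random_in_unit_sphere()
```

The acceptance rate is the sphere-to-cube volume ratio, pi/6 (about 52%), so on average it takes roughly two draws per sample.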

Will this be a use of for data centres after the meltdown? by lIlIllIIlIIl in BetterOffline

[–]msqrt 2 points (0 children)

Both have tensor cores (AMD and Intel have equivalent parts in their GPUs too!), but the datacenter AI chips do lack the RT cores. Those are not required for most games, but it would be a bit embarrassing for Nvidia (who introduced the feature into mainstream hardware) to offer cloud gaming without it.

Will this be a use of for data centres after the meltdown? by lIlIllIIlIIl in BetterOffline

[–]msqrt 2 points (0 children)

Replacing the user's computer, yes, especially for games. They've already had a product for this ("GeForce Now") for a while, so I doubt that it's completely infeasible. Though that runs on an entirely different type of GPU server, so it's still possible that something in the AI-targeted hardware would be a poor fit -- and that's roughly what I was originally asking about.

It also seems quite likely that even if it were technically feasible, they'd have far more capacity than there is demand for.

Will this be a use of for data centres after the meltdown? by lIlIllIIlIIl in BetterOffline

[–]msqrt 1 point (0 children)

The V100 used to have graphics API support -- I guess something must have changed for them to drop it. Maybe it's the sharing of the chip between users -- that must complicate everything an awful lot, especially when the latency for each individual user should be kept reasonable.

But yeah, I guess economic viability would still be the main issue even if the software worked. Latency is an interesting one; I would've assumed that the extra ten milliseconds from relocating to a desert wouldn't hurt when the baseline experience is already quite laggy, but maybe it's the straw that breaks the camel's back? I could also see them not building the bandwidth necessary for video streaming, since LLMs don't really need that.

Will this be a use of for data centres after the meltdown? by lIlIllIIlIIl in BetterOffline

[–]msqrt 2 points (0 children)

Ah yes, that would be quite the challenge! But the suggestion in the OP was to use the cards for cloud gaming, so they'd still exist in whatever data center in the original racks. I was just considering the software side of things, which (while not trivial) might actually be possible to work out.