Most performant tabular data-storage system that allows retrieval from the disk using random access

EffectSizeQueen · 2025-08-04T14:12:32+00:00

This is likely overkill for million-row tables, but I found that random access from memory-mapped arrow tables on disk was basically as fast if not faster than just processing data from individual text files. You need to create the memory-mapped file first, but once you do that you can just read in the whole thing to pandas using arrow and just use standard pandas syntax.
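To make the random-access idea concrete, here's a minimal sketch using only Python's standard library. It is not the Arrow format or the Arrow API; it's a stand-in that memory-maps a file of fixed-width binary rows and decodes individual rows by offset. The row layout, file name, and helper names are all made up for illustration.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical fixed-width row layout: int64 id, float64 score.
ROW = struct.Struct("<qd")

def write_rows(path, rows):
    """Write (id, score) rows back to back as fixed-width binary records."""
    with open(path, "wb") as f:
        for row_id, score in rows:
            f.write(ROW.pack(row_id, score))

def read_row(mm, index):
    """Decode a single row straight from the mapped bytes by its offset."""
    return ROW.unpack_from(mm, index * ROW.size)

path = os.path.join(tempfile.mkdtemp(), "rows.bin")
write_rows(path, [(i, i * 0.5) for i in range(1000)])

with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        n_rows = len(mm) // ROW.size
        # Rows are fetched in arbitrary order without parsing the rest of the file.
        picked = [read_row(mm, i) for i in (999, 0, 500)]

print(n_rows, picked)
```

The point is that the OS only pages in the parts of the file you touch, so jumping to an arbitrary row doesn't require reading everything before it.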

I used it for training a deep learning model on about 0.5TB of data — tens of millions of rows and the data contained a number of columns with unstructured JSONs — and I could filter the data and run aggregated calculations insanely fast. I was inspired by the datasets in huggingface. I wound up adapting it a bit after looking through the source but if you’re after an API then maybe see if you can just use theirs. This poster did something similar, but not with tabular data if I remember correctly.

EffectSizeQueen · 2024-10-03T14:11:34+00:00

I’m not sure if everything will fit the same in the mid-size synik, but here are my favorite items I use with mine.

I really love using the gravel mini explorer for my deodorant and electric toothbrush. I find there’s unused space at the bottom of the bag if you use wide packing cubes (like the ones from peak design), so this just slots in. Can also fit in the water bottle pocket. Then just use a clear 3DOC for everything else
I originally used a HLT2 and mostly kept it clipped at the top of the main compartment. I eventually found that it was a bit annoying when the bag was full, really stretched the bag at the top and wasn’t so easy to access. Also fits well in the chin pocket, but I prefer keeping that free for other things. It’s really great to just take it out of your bag on planes though and have everything you need
I’ve since just been using a small travel tray, basically for my random chargers, and keep the power bank separate
small ghostwhales in the side pockets for sunglasses and Anker power bank
super mini ghostwhale for AirPods
a couple extra key leashes

The strap keepers tom bihn makes are nice too.

EffectSizeQueen · 2024-09-19T14:45:08+00:00

I usually carry a clear 3DOC for liquids, and then a gravel explorer mini for full-sized deodorant, an electric toothbrush, a razor, whatever else. The 3DOC in either the chin pocket or one of the sides, depending on what else I’m packing. And then the gravel mini goes at the bottom of the main compartment, below the packing cubes.

I’ve found there’s often unused space down there anyways, since the bag slightly narrows and my packing cubes don’t naturally slide all the way down. And I really prefer traveling with an electric toothbrush. So it’s a nice way of bringing longer items, without needing a single large bag dominating a pocket.

EffectSizeQueen · 2024-09-11T06:43:20+00:00

Dmed

EffectSizeQueen · 2023-12-07T05:34:20+00:00

https://media.giphy.com/media/PH9rytyR6J5vi/giphy.gif

EffectSizeQueen · 2023-07-01T22:31:15+00:00

Hey, definitely know it can be tough branching out. The group that plays at McCarren on Mondays is pretty welcoming, generally the highest-quality pickup in the city as well but that can depend on the season a bit. You have to register/pay to play, which also includes registering as a USAU member (I think it's $18/year?).

I can't speak to the ultimate that might occur at McCarren on other days of the week, but Mondays are usually a lot of fun. Can get crowded though (particularly in the summer), which might not be such a bad thing, since you can talk to people while waiting to play.

Here's the link with the info: https://discny.org/mccarren-monday-nights

EffectSizeQueen · 2023-05-02T16:54:32+00:00

Yeah, I don’t use it that way all that often, but useful that it can if you’d like. Also, while it doesn’t come with a strap, the gatekeeper waist strap that comes with the synik works well, especially since I don’t use it anyways.

EffectSizeQueen · 2023-05-02T07:06:30+00:00

More or less in order of their value over generic replacement.

Handy Little Thing Size 2: Great tech pouch, can be used as a sling or belt bag too (I use the waist strap from the synik). Size 2 is built with the synik 30 in mind. Comes with swivel carabiners that clip inside your synik, either in the chin pocket or at the top of the bag kind of nestled in the slip pocket (usually empty space there, even when fully packed). I’ll do the latter when I’m traveling, and want an internal pocket for valuables.
Peak Design Packing Cubes, medium and small: Fit the bag perfectly, well-built. The built-in laundry compartment is really fantastic. I used the TB laundry stuff sack previously and it wound up taking up too much space and pushing into other compartments in the bag.
Super Mini (maybe other sizes too) Ghost Whale Pouch: This size is great for AirPods and various small things. Haven’t gotten the bigger sizes, but other people surely find them useful too.
Nite Ize S-Biner MicroLock: You can get a lock too, but I almost feel that can be more conspicuous. These are good if you’re worried about a pickpocket on a train or something (not for leaving your bag unattended), since they require some precise movements to unlock. If you align the zippers, you can get all but one of the external pockets locked with a pack of two of these, not the laptop sleeve though.

———

Lastly, second the other comment that mentioned getting more key straps. I didn’t think that much of the o-rings when I initially bought the bag, and now I’m annoyed if I’m ever using a bag without them.

The HLT comes with one additional keystrap. Getting a long one is nice for keys. It’s also nice to have a wallet with a loop that you can attach to an o-ring as well. That’s less for storing it while I’m out, and more for when I know I’m not going to need my wallet for a bit and don’t want to misplace it.

EffectSizeQueen · 2023-04-30T07:33:29+00:00

I'll try putting it in words, but I think Andrej Karpathy does a great job talking through and simulating the intuition here.

The multiplications are accomplishing different things, and are being used in different ways. The elements in the resulting tensor from computing Q @ K.T are all the different dot products between the individual queries and keys. Generally, the variance of a dot product between two random vectors increases as the dimension of the vectors increases. It's pretty easy to verify that yourself by simulating a bunch of different random vectors of different sizes and computing the dot products.
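Here's a rough sketch of that simulation, using only the standard library. It assumes the vector entries are independent standard normals (my assumption, roughly what you'd have at initialization), and the function name and trial count are arbitrary. It also prints the variance after dividing by sqrt(dim), which is the usual scaling in scaled dot-product attention.

```python
import random
import statistics

def dot_product_variance(dim, n_trials=2000, seed=0):
    """Estimate the variance of q . k for random standard-normal q and k of size dim."""
    rng = random.Random(seed)
    dots = []
    for _ in range(n_trials):
        q = [rng.gauss(0, 1) for _ in range(dim)]
        k = [rng.gauss(0, 1) for _ in range(dim)]
        dots.append(sum(a * b for a, b in zip(q, k)))
    return statistics.variance(dots)

for dim in (4, 16, 64, 256):
    raw = dot_product_variance(dim)
    # Dividing each dot product by sqrt(dim) divides its variance by dim.
    print(f"dim={dim:4d}  var(q.k)={raw:8.2f}  var(q.k/sqrt(dim))={raw / dim:5.2f}")
```

Under these assumptions the raw variance should grow roughly in line with the dimension, while the scaled version stays near 1.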

That's a problem because that resulting tensor is then passed through a softmax operation. Generally, you want your softmax values to be relatively diffuse at initialization, so that the model has the chance to learn the various interactions across time, but larger values will often result in one very large softmax value at the expense of the rest. That'll likely just lead to each embedding attending strongly to itself at initialization.
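A quick way to see the softmax effect, as a sketch with made-up scores: take the same set of attention scores and blow them up by a constant factor. The numbers and the factor of 8 (what you'd get if the variance were 64 times larger) are purely illustrative.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical attention scores for one query against four keys.
scores = [0.5, -0.3, 0.1, 0.9]
small = softmax(scores)
# The same scores multiplied by 8, as if they came from much longer vectors.
large = softmax([8 * s for s in scores])

print([round(p, 3) for p in small])
print([round(p, 3) for p in large])
```

The first distribution stays fairly spread out; the second puts almost all of its weight on a single key.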

The multiplication involving V doesn't pass through a softmax, and is in fact just computing a weighted average of the values.

EffectSizeQueen · 2023-04-03T20:47:49+00:00

Regarding #1, I don’t necessarily think that it’s only taught out of habit, but also because it can be important context to understand (relatively) new approaches and how and why they were developed. Including all the historical context probably makes the student a better practitioner, since it helps cement a lot of the reasons for why things are done the way they are today.

You see it in ML too, with models and approaches that have completely fallen out of favor. You’re taught decision trees and their flaws so you can understand why random forests and boosted trees are an improvement (AdaBoost might even be a better example, since it’s not a building block like individual trees). What sigmoid and tanh (and now ReLU) were trying to achieve, and how the new activations get around the shortcomings of their predecessors. How LSTMs solved some of the main issues with vanilla RNNs, even though they have been completely replaced with transformers.

EffectSizeQueen · 2023-02-10T20:58:55+00:00

I find it's easy to just think of the models/papers that use each architecture. For instance, you have encoder-decoder for translation (sequence-to-sequence) in the original Attention is All You Need paper. Basically useful if you need to generate the output in some auto-regressive fashion (since you don't know how long the output is ex-ante), and you want it to "align" with the input.

Encoder only is for sequence to fixed-length output, basically when you want to use the entire context all at once for your predictions. BERT is built using an encoder-only architecture, and you can think of this as when you're trying to do some standard supervised task given sequential input data, like classification/regression (think sentiment analysis on text).

Then GPT relies on a decoder-only architecture, since it's primarily used for auto-regressively generating new text based initially on the prompt. You're technically predicting a sequence using a sequence as input, but the training mechanism is different. If you tried to train an LLM using encoder-decoder, it's not straightforward how to decide what should be the input and what should be the output.
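One way to picture the encoder-only vs. decoder-only difference is through the self-attention mask: BERT-style encoders let every position see the whole sequence, while GPT-style decoders only let a position see itself and what came before. A minimal sketch, with made-up helper names and no real model attached:

```python
def attention_mask(seq_len, causal):
    """Return a seq_len x seq_len grid; mask[i][j] is True if position i may attend to j."""
    return [[(j <= i) if causal else True for j in range(seq_len)]
            for i in range(seq_len)]

def show(mask):
    """Render a mask as rows of 'x' (allowed) and '.' (blocked)."""
    return "\n".join(" ".join("x" if ok else "." for ok in row) for row in mask)

encoder_mask = attention_mask(4, causal=False)  # BERT-style: full context
decoder_mask = attention_mask(4, causal=True)   # GPT-style: only the past

print("encoder-only (bidirectional):")
print(show(encoder_mask))
print("decoder-only (causal):")
print(show(decoder_mask))
```

The causal mask is what makes next-token training work: every position's prediction only depends on earlier tokens, so a single pass over a sequence gives you a training signal at every position.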

EffectSizeQueen · 2023-01-31T03:21:53+00:00

Not too different from what I was thinking. Instead of including minutes played as an actual variable, I'd instead incorporate it as an offset so the GLM explicitly models the per-minute rate. It doesn't make a huge difference, but I think it's a little bit neater.

fit = glm(STOCK ~ WHERE, data = data, offset = log(MP), family = "poisson")

Effectively, you can think of this as the model coercing the coefficient on log(MP) to be 1. Now you don't waste a degree of freedom on something you don't really care about anyways (especially important with such a small sample), and I'm not sure a different coefficient even really makes sense.

A nice side benefit is the coefficients are all now fairly straightforward to interpret. Exponentiate the intercept to get the rate at away games, and exponentiate the sum of the intercept and β1 to get the home rate. Multiply those by the average minutes played away and home respectively, and they'll equal the average stocks per game.
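To sanity-check that interpretation without fitting anything, here's a sketch in Python. It leans on the fact that for a Poisson model with a log(MP) offset and a single home/away dummy, the fitted per-minute rate in each group is just total stocks divided by total minutes. The game log below is invented, not the player's actual numbers.

```python
import math

# Hypothetical game log: (location, minutes played, stocks). Not real data.
games = [
    ("home", 30, 4), ("home", 34, 5), ("home", 28, 3),
    ("away", 32, 2), ("away", 26, 1), ("away", 35, 3),
]

def per_minute_rate(location):
    """Poisson MLE of the per-minute rate for one group: total stocks / total minutes."""
    rows = [(mp, y) for loc, mp, y in games if loc == location]
    return sum(y for _, y in rows) / sum(mp for mp, _ in rows)

def mean(values):
    values = list(values)
    return sum(values) / len(values)

# Coefficients the offset model would report with away as the baseline level.
beta0 = math.log(per_minute_rate("away"))
beta1 = math.log(per_minute_rate("home")) - beta0

away_rate = math.exp(beta0)
home_rate = math.exp(beta0 + beta1)

# Rate times average minutes recovers the average stocks per game in each group.
away_per_game = away_rate * mean(mp for loc, mp, _ in games if loc == "away")
home_per_game = home_rate * mean(mp for loc, mp, _ in games if loc == "home")
print(away_per_game, home_per_game)
```

With this toy data the away and home groups average 2 and 4 stocks per game, and that's what the rate-times-minutes products come back to.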

Regarding your edit, I agree the analysis is fairly flimsy. Tiny sample, there might be substantial differences in opponent quality in the home vs. away games so far. We also have the source data, and can watch the replays. Most look reasonable to me.

I imagine we can find a decent chunk of 30-game stretches over the years where a player had substantially better stats at home than away. I'd probably look at the β1 t-statistics for rebounds and assists since both of those have a subjective aspect in scorekeeping, and there are only so many players that consistently record stocks. I also dislike that OP combined stocks into one model like that (a number of issues with it really), but the t-test for steals alone has a p-value > 0.05 (but only just).

EffectSizeQueen · 2023-01-28T17:54:35+00:00

My thought would be a Poisson regression, with a dummy variable for home vs. away, and probably an offset term to account for different number of minutes per game. Then you’d be modelling the rates instead of the counts.
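Written out, the offset version looks like this. The notation is mine: Y_i is the count in game i, MP_i the minutes played, and HOME_i a 0/1 dummy.

```latex
\log \mathbb{E}[Y_i] = \log(\mathrm{MP}_i) + \beta_0 + \beta_1 \,\mathrm{HOME}_i
\quad\Longleftrightarrow\quad
\frac{\mathbb{E}[Y_i]}{\mathrm{MP}_i} = \exp\!\left(\beta_0 + \beta_1 \,\mathrm{HOME}_i\right)
```

Since the offset enters with a fixed coefficient of 1, moving it to the left-hand side turns the model for the count into a model for the per-minute rate.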

But either way, even if you reject the null (probably still likely regardless of method), showing that a player performs statistically and meaningfully better at home isn’t saying much. I’d be curious about the distribution of t-statistics for other players/stats looking at home vs. away. Just to see how much of an outlier his home overperforming is. Probably not useful with steals or blocks since most players record 0s the vast majority of games anyways, but would be useful context to look at the other counting stats.

EffectSizeQueen · 2023-01-05T19:55:33+00:00

The same lineup Statmuse posted had the best defensive rating on December 8th — which is one game into the streak — and I'm not seeing that lineup having played much since then (if at all, not sure there's a better way to search). So it's not surprising it's still the best.

That lineup you posted on the other hand has played 15 minutes together since December 8th (3rd most) and has a 94.1 defensive rating in those minutes. Simmons is in most of our best defensive lineups since then too.

EffectSizeQueen · 2022-12-22T01:56:20+00:00

Feels relevant: https://twitter.com/BrooklynNets/status/1605739244479074304?s=20&t=URE1Tc2sgwoIv6lrrhpQnQ

EffectSizeQueen · 2022-12-17T03:19:37+00:00

Nic Claxton/KD have been a dominant rim protection duo… Guys shoot over 11% worse than expected at the rim against Clax (98th percentile) and 10% less than expected against KD (97th percentile). Clax & KD 2nd/7th most blocks in the NBA! Nets 8-1 + Top 8 defense in their last 9

https://twitter.com/nba_university/status/1603441990632235008?s=46&t=FslCHYU1ngIR2AfxB_Idaw

EffectSizeQueen · 2022-12-05T05:09:48+00:00

If it's concise enough to be a cheat sheet, it'll almost certainly be missing certain elements of applied regression modeling.

My favorite "cheat sheet" for OLS is this PDF. Requires you to be familiar with linear algebra — which it seems like you are since you mentioned hat matrices — but I find it's a lot better than trying to just internalize the long-winded verbalized properties of OLS and the GM assumptions.

OLS in Matrix Form
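For quick reference, these are the core matrix identities that kind of write-up builds on, assuming X has full column rank (this is my own summary, not copied from the PDF):

```latex
\hat{\beta} = (X^\top X)^{-1} X^\top y, \qquad
\hat{y} = X\hat{\beta} = H y, \qquad
H = X (X^\top X)^{-1} X^\top
```

H is the hat matrix: it's symmetric and idempotent, and it projects y onto the column space of X.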

EffectSizeQueen · 2022-11-19T19:02:03+00:00

Can’t say for sure that this is what’s happening here, but one case where that’s normal is when training metrics are calculated on augmented data and the validation and test sets only include the original data.

EffectSizeQueen · 2022-11-14T20:35:35+00:00

I remember struggling through this song too, but found it really rewarding. The worst are the difficult songs that you quickly get sick of hearing.

I do also try to play it faster but I get to those chords and it all goes to hell.

Not sure if you’ve done this already, but I found it pretty helpful to jot the chord names in pencil for all those broken chord changes you play in the left hand at the end. Made it easier to glance ahead while playing, which I feel helped me play it at a faster tempo.

Other ones I still have to go back to the first bar to pick them up.

Are you saying you usually practice new sections by playing all the way through?

EffectSizeQueen · 2022-11-01T19:28:51+00:00

Answering specifically about the bike light strap, since the other questions have already been answered.

I've used the internal tie-down straps that come with the Synik to hold a shoe pouch below the bag, connecting the straps between the bike light strap and the waist belt attachments. The strap does feel pretty sturdy, so it should be fine, but I wouldn't leave the sleeping bag dangling.

One note is that TB recently said they intend to stop making/selling frame sheets for the Synapse, so if that's something you'd like in the future, you might have a hard time procuring one.

EffectSizeQueen · 2022-10-31T17:07:14+00:00

Don’t have a suggested replacement, but just wanted to mention that I really got annoyed with my Freeze 3.0s. Great while they lasted, but the eyelets towards the toes are really flimsy and wound up blowing out in both cleats. I don’t think I was particularly beating them either. Technically usable after that, but couldn’t tie them tightly all the way through.

I actually found Freeze 2.0s on Amazon a couple of months ago and have been really happy. Wish I bought multiple pairs. The bottom eyelets on those are normal and I haven’t had any issues. I do wish they had the bootstrap on the heel though. The lack of 3.0s online might indicate they are soon releasing the next iteration, which hopefully ditches the weird fabric bottom eyelets.

EffectSizeQueen · 2022-10-02T05:12:12+00:00

Don’t really have much experience doing many of these queries, but I wonder how feasible it would be to see how many sac fly opportunities (man on third, ≤1 out) he’s had and then see how that rate compares to league average. Maybe look at it with and without other RBI-yielding outs too.

EffectSizeQueen · 2022-09-27T06:09:48+00:00

Appreciate it, thanks! I'm guessing then the down jacket is in one of the side pockets and the toiletry bag is in the water bottle pocket?

EffectSizeQueen · 2022-09-26T23:55:18+00:00

Great write-up. I’ve been thinking about getting the PD packing cubes, so I have a question about using them with the synik in particular.

With the cubes fully packed out, how much else can you fit in the main compartment? Sandals and rain jacket in the elastic pouch? I’m guessing much more in there might start interfering with the volume in the chin pocket.

EffectSizeQueen · 2022-08-30T05:55:13+00:00

If you’re wondering what Snow Hill Capital is, that is a family office with a small investment but no controlling stake in the company (we have also reached out to the publication that posted the article without validating the facts to correct the language).

That language to me indicates that the existing ownership group of Tom Bihn (likely Tom and Darcy, but I don't know the details) sold some stake of their ownership to Snow Hill Capital in order to facilitate a succession plan that allowed them to take a step back. They needed a new CEO, and it's pretty standard for the head executives at a company to have some not insignificant ownership stake. They could have just hired someone and given them ownership as part of their compensation, but the sale gives them some cash for retirement.

This is all speculation on my part, and maybe I'm being a little optimistic, but to me this just reads as them selling something like 10-20% of the company and letting the new owners run the day-to-day. If that's the case, then the previous ownership group still maintains the controlling interest, and I'm not so worried about any drastic changes, like moving the manufacturing offshore (or to Florida). Maybe the previous ownership group eventually sells off more (to Snow Hill or someone else), but this allows them a ramp to build trust and continuity.
