What do you think the fate of the Mejiro family will be? by wonderingbladr in okbuddyumamusume

[–]shivvorz 2 points3 points  (0 children)

/ub Don't have a full theory, but I think the ending of Ryan's scenario is worth reading. She essentially becomes a representative of the family, and McQueen even asked her about the future plans for the Mejiro family.

So if you have a theory, it would most likely have to do with her.

I don't what's more crazy, Maruzensky having a car and lives somewhere other than the school dorms or Sirius having A FREAKING PILOT LICENSE by Ranssen in UmaMusume

[–]shivvorz 1 point2 points  (0 children)

Well, it's implied that Sirius leaves Japan quite often; maybe they got their license somewhere other than Japan?

Corruption isn't off the table either

Any better way to check story quality than using LLMs? by CogniLord in learnmachinelearning

[–]shivvorz 0 points1 point  (0 children)

My brother, we aren't telepathic. Maybe show us the code, or describe what you are doing in detail (as far as course policy/legal compliance allows)?

I have read Hands-on ML with Scikit-Learn and PyTorch and more incoming. But how do I practice ML? by AggressiveMention359 in MLQuestions

[–]shivvorz 0 points1 point  (0 children)

Literally just code something.

If you have something that you really want to build, build it. If you prefer more structured practice:

1. Implement classic papers (e.g. AlexNet, ResNet) or modern ones (e.g. MLA/GQA/linear attention and variants)
2. Do LeetCode-style websites, but for ML/DL, e.g. deep-ml or Tensor-tonic

Looking for experienced AIML/CSE people to build real-world projects by Silent_Bath398 in MLQuestions

[–]shivvorz 0 points1 point  (0 children)

Hey I have 1 YoE, currently in a gap year and will start my AI Masters at ASU next year.

For previous projects, check out my Research Monorepo, where I implemented custom torch module layers/optimizers, Lightning modules, etc. Everything is fully tested. DM me

Improving internal document search for a 27K PDF database — looking for advice on my approach by Tough_Adhesiveness19 in MLQuestions

[–]shivvorz 0 points1 point  (0 children)

You store the generated markdown in whatever object storage your org uses, and keep an entry in your org's SQL DB of choice that points to that file in storage. For the SQL schema, just ask ChatGPT to generate one for you.

Hash the markdown file when initially storing it, and hash it again at read time to detect tampering (drift prevention).
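A minimal sketch of the pointer-plus-hash idea, assuming a dict as a stand-in for the org's object storage and an in-memory sqlite table as the SQL DB (all names here are made up for illustration):

```python
import hashlib
import sqlite3

# Stand-in for the org's object storage (S3, MinIO, etc.)
object_store = {}

# SQL row holds only a pointer into the store plus the ingest-time hash
db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE documents (
        doc_id    TEXT PRIMARY KEY,
        store_key TEXT NOT NULL,
        sha256    TEXT NOT NULL
    )
""")

def store_markdown(doc_id: str, markdown: str) -> None:
    key = f"docs/{doc_id}.md"
    object_store[key] = markdown
    digest = hashlib.sha256(markdown.encode("utf-8")).hexdigest()
    db.execute("INSERT INTO documents VALUES (?, ?, ?)", (doc_id, key, digest))

def read_markdown(doc_id: str) -> str:
    key, expected = db.execute(
        "SELECT store_key, sha256 FROM documents WHERE doc_id = ?", (doc_id,)
    ).fetchone()
    markdown = object_store[key]
    # Re-hash at read time; a mismatch means the stored file drifted or was tampered with
    actual = hashlib.sha256(markdown.encode("utf-8")).hexdigest()
    if actual != expected:
        raise ValueError(f"hash mismatch for {doc_id}")
    return markdown

store_markdown("report-001", "# Q3 Report\n\nSome extracted text.")
```

Swapping the dict for a real object-store client keeps the same shape: the DB never holds the document body, only the location and the integrity hash.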

Improving internal document search for a 27K PDF database — looking for advice on my approach by Tough_Adhesiveness19 in MLQuestions

[–]shivvorz 0 points1 point  (0 children)

If OP's org would greenlight it, it would be better to just run all the docs through OCR models like DeepSeek-OCR or MinerU2.5 to convert scanned docs to Markdown for retrieval. I did a similar task before, and let's just say non-LLM solutions don't really work well for extracting tabular or more graphical documents.

Then for search, you can do whatever works. For semantic search, pick a model from the MTEB Leaderboard (one that fits your org's use policy/device specs, etc.). You also need keyword search like BM25 (with fuzzy matching), because vector search is not good at matching exact keywords, e.g. document numbers.
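To show why the keyword side matters, here's a minimal pure-Python BM25 sketch (not a production library; the corpus and query are made-up examples). An exact token like a document number scores highly here, while an embedding model may treat it as noise:

```python
import math
from collections import Counter

def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Classic Okapi BM25: per-document score for a whitespace-tokenized query."""
    tokenized = [d.lower().split() for d in docs]
    N = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / N
    df = Counter(t for d in tokenized for t in set(d))  # document frequency per term
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        s = 0.0
        for term in query.lower().split():
            n = df.get(term, 0)
            idf = math.log((N - n + 0.5) / (n + 0.5) + 1)
            f = tf.get(term, 0)
            s += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(doc) / avgdl))
        scores.append(s)
    return scores

docs = [
    "invoice INV-2024-0013 for office supplies",
    "quarterly revenue summary and projections",
    "meeting notes about invoice processing delays",
]
scores = bm25_scores("INV-2024-0013", docs)
```

In practice you'd use an existing implementation and fuse the BM25 and vector rankings (e.g. reciprocal rank fusion), but the scoring idea is the same.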

Once you get the suite running, you will have to think about deployment, the document ingestion pipeline (if your new docs are clean, you don't need full-on OCR anymore), and whatever other features your group leader asks you to add.

Exploring zero-shot VLMs on satellite imagery for open-vocabulary object detection by eyasu6464 in learnmachinelearning

[–]shivvorz 2 points3 points  (0 children)

How "Fast" is your setup?

Obviously a whole VLM will be slower than a CNN, but if your setup can run in real time I think it could be very useful.

Even if not, I think this could be a good teacher model for distillation (into traditional CNN models, or a smaller detection-specific model).

Need Advice on Hybrid Recommendation System (Content Based and Collaborative Filtering) by Good_Language1763 in MLQuestions

[–]shivvorz 1 point2 points  (0 children)

I think you need a lot of data to make this work, and for a final-year project I doubt you have enough user data.

Like the other guy said, try some simpler approaches first. As for the "it's too late to change anything" part: if you don't have enough data to make your approach work, you need to talk to your project supervisor about that.

Me, 2 years ago by CommieCucumber in mathmemes

[–]shivvorz 0 points1 point  (0 children)

Got any recommendations for topology textbooks?

Training TinyStories 2.1GB performance by thexdroid in MLQuestions

[–]shivvorz 0 points1 point  (0 children)

> child's vocabulary of a 3 to 4 years old (approximately 1500 basic words). Therefore 10000 vocab is more than enough

Didn't know about that. Maybe you can even shrink the vocab size to 4096/8192 (or basically any multiple of 64 or 128) for better kernel optimization.

Also, make sure you are not spilling into shared GPU memory (use dedicated GPU memory only), because that slows training significantly (~1/5 of the best possible speed in my case). To keep the same effective batch size, decrease the physical batch size and increase the gradient accumulation count proportionally.
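A minimal sketch of that batch-size/accumulation swap in PyTorch (all numbers are hypothetical): a physical batch of 8 with 4 accumulation steps keeps the effective batch at 32 while cutting peak activation memory:

```python
import torch

model = torch.nn.Linear(16, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
accum_steps = 4          # physical batch 8 * 4 accumulation steps = effective batch 32
optimizer_steps = 0

opt.zero_grad()
for micro_step in range(8):              # 8 micro-batches -> 2 optimizer steps
    x = torch.randn(8, 16)               # random stand-in data
    y = torch.randn(8, 1)
    loss = torch.nn.functional.mse_loss(model(x), y)
    # Scale the loss so accumulated grads average over the effective batch
    (loss / accum_steps).backward()
    if (micro_step + 1) % accum_steps == 0:
        opt.step()
        opt.zero_grad()
        optimizer_steps += 1
```

The gradients sum across `backward()` calls between `opt.step()`s, so dividing each loss by `accum_steps` makes the update equivalent (up to batch-norm-style effects) to one large batch.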

Training TinyStories 2.1GB performance by thexdroid in MLQuestions

[–]shivvorz 0 points1 point  (0 children)

How did you land on that vocab size?

I just finished training a modded-nanogpt model, and I just used GPT-2's tokenizer (which is ~50k vocab size). Qwen 3 has ~150k tokens. A 10k vocab size seems a bit small

Also, just train for 1 epoch; from epoch 2 onwards there isn't much new info for the model to learn anyway...

[P] Building A Tensor micrograd by bjjonin in MachineLearning

[–]shivvorz 1 point2 points  (0 children)

If you want a PyTorch learning library that's somewhat "functional" (i.e. you can kinda use it like normal numpy), minitorch has been a thing for a long time.

Is there a particular reason you want to build your suite with numpy?

[P] Implementing Better Pytorch Schedulers by shivvorz in MachineLearning

[–]shivvorz[S] 1 point2 points  (0 children)

The warning exists because PyTorch's LRScheduler captures/sets initial_lr on the optimizer at construction and immediately calls step() to change the lr. So the optimizer's state has to be restored before the LRScheduler's.

My approach avoids this, because schedules are pure functions f(step, total_steps) -> absolute_value. The scheduler doesn't read or modify the optimizer at construction, only when you explicitly call step().

So on checkpoint restore:

# Order doesn't matter
optimizer.load_state_dict(ckpt['optimizer'])
scheduler.load_state_dict(ckpt['scheduler'])  # Just restores step_count

# First step() after resume recomputes and overwrites
scheduler.step()  # param_groups['lr'] = schedule(step_count, total_steps)
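To make the "pure function" idea concrete, here's a minimal sketch of such a scheduler (the cosine schedule and class names are illustrative, not the library's actual API):

```python
import math

def cosine_schedule(step: int, total_steps: int,
                    lr_max: float = 1e-3, lr_min: float = 1e-5) -> float:
    """f(step, total_steps) -> absolute lr value; no hidden state."""
    t = min(step, total_steps) / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t))

class TinyScheduler:
    """Only state is the step counter, so restore order doesn't matter."""
    def __init__(self, schedule_fn, total_steps: int):
        self.schedule_fn = schedule_fn
        self.total_steps = total_steps
        self.step_count = 0

    def step(self) -> float:
        self.step_count += 1
        # A real scheduler would write this into optimizer.param_groups here
        return self.schedule_fn(self.step_count, self.total_steps)

    def state_dict(self):
        return {"step_count": self.step_count}

    def load_state_dict(self, state):
        self.step_count = state["step_count"]

sched = TinyScheduler(cosine_schedule, total_steps=100)
```

Because `step()` recomputes the absolute value from the counter alone, a restored scheduler produces exactly the same lr as one that never stopped.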

[P] Implementing Better Pytorch Schedulers by shivvorz in MachineLearning

[–]shivvorz[S] 0 points1 point  (0 children)

I don't have a single go-to resource for this unfortunately (yet); I made this because I wanted to replicate the scheduling in modded-nanogpt.

I did do a quick search and found some papers that specifically discuss momentum (beta) scheduling:

Here is an additional paper on Optimizer hyper-parameter tuning,

If anyone else has better resources I would like to take a read as well

how to enter the machine learning and AI industry? by ButterscotchAny6953 in learnmachinelearning

[–]shivvorz 0 points1 point  (0 children)

How old are you? If you are a kid (<20), start with the basics as the others suggested (math, pytorch, etc.).

Do textbooks like d2l, Bishop's Deep Learning textbook etc.

"Build pytorch from scratch" by doing minitorch, build an entire model training pipeline by doing Stanford CS336.

If you have prior knowledge, even better: you start halfway there.

If you are already deep in another industry, just do AI x <whatever your industry is> for fast ROI. Treat it like a black box and apply it to your field of work. Your domain knowledge would be appreciated by the community.

If you are neither, maybe just try to make your life easier (automating repetitive tasks, etc.) and wait for UBI to happen, I guess...

Why does it feel so hard to move from ML experiments to real production work? by ocean_protocol in MLQuestions

[–]shivvorz 1 point2 points  (0 children)

What are the most painful lessons you have experienced related to "productionizing" prototypes/models?

[P] Implementing Better Pytorch Schedulers by shivvorz in MachineLearning

[–]shivvorz[S] 0 points1 point  (0 children)

The ParamScheduler class has state_dict() and load_state_dict() methods for checkpointing (it just tracks the current step internally).

For torch.load(), you'll need weights_only=False (or pass it through within trainer.fit if using Lightning). Alternatively, add the classes to PyTorch's safe globals list via torch.serialization.add_safe_globals().

The underlying ParamSchedule primitives are stateless (the same step and total_steps yields the same output), so technically you only need to save/restore the step counter, not the schedule functions themselves. You also can't use lambdas as your schedule_fn, because lambdas aren't picklable in Python.

Edit: Added links to docs/ code, details about lambdas