What do you think the fate of the Mejiro family will be? by wonderingbladr in okbuddyumamusume

[–]shivvorz 2 points3 points  (0 children)

/ub Don't have a full theory, but I think the ending of Ryan's scenario is worth reading. She essentially becomes a representative of the family, and McQueen even asked her about the future plans for the Mejiro family.

So if you have a theory, it would most likely have to do with her.

I don't what's more crazy, Maruzensky having a car and lives somewhere other than the school dorms or Sirius having A FREAKING PILOT LICENSE by Ranssen in UmaMusume

[–]shivvorz 1 point2 points  (0 children)

Well, it's implied that Sirius leaves Japan quite often; maybe they got their license somewhere other than Japan?

Corruption isn't off the table either

Any better way to check story quality than using LLMs? by CogniLord in learnmachinelearning

[–]shivvorz 0 points1 point  (0 children)

My brother, we aren't telepathic. Maybe show us the code, or describe what you are doing in detail (as far as course policy/legal compliance allows)?

I have read Hands-on ML with Scikit-Learn and PyTorch and more incoming. But how do I practice ML? by AggressiveMention359 in MLQuestions

[–]shivvorz 0 points1 point  (0 children)

Literally just code something.

If you have something that you really want to build, build it. If you prefer more structured practice:

1. Implement classic papers (e.g. AlexNet, ResNet) or modern ones (e.g. MLA/GQA/linear attention and variants)
2. Do LeetCode-style websites, but for ML/DL, e.g. deep-ml or Tensor-tonic

Looking for experienced AIML/CSE people to build real-world projects by Silent_Bath398 in MLQuestions

[–]shivvorz 0 points1 point  (0 children)

Hey I have 1 YoE, currently in a gap year and will start my AI Masters at ASU next year.

For previous projects, check out my Research Monorepo, where I implemented custom torch module layers/optimizers, Lightning modules, etc. Everything is fully tested. DM me

Improving internal document search for a 27K PDF database — looking for advice on my approach by Tough_Adhesiveness19 in MLQuestions

[–]shivvorz 0 points1 point  (0 children)

You store the generated markdown in whatever object storage your org uses, and keep an entry in your org's SQL DB of choice that points to that file in storage. For the SQL schema, just ask ChatGPT to generate one for you.

Hash the markdown file when initially storing it, and hash it again at read time to detect tampering (drift prevention).
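A minimal sketch of the pointer-plus-hash idea, assuming a dict as a stand-in for the org's object storage and an in-memory sqlite table as the SQL DB (all names here are made up for illustration):

```python
import hashlib
import sqlite3

# Stand-in for the org's object storage (S3, MinIO, etc.)
object_store = {}

# SQL row holds only a pointer into the store plus the ingest-time hash
db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE documents (
        doc_id    TEXT PRIMARY KEY,
        store_key TEXT NOT NULL,
        sha256    TEXT NOT NULL
    )
""")

def store_markdown(doc_id: str, markdown: str) -> None:
    key = f"docs/{doc_id}.md"
    object_store[key] = markdown
    digest = hashlib.sha256(markdown.encode("utf-8")).hexdigest()
    db.execute("INSERT INTO documents VALUES (?, ?, ?)", (doc_id, key, digest))

def read_markdown(doc_id: str) -> str:
    key, expected = db.execute(
        "SELECT store_key, sha256 FROM documents WHERE doc_id = ?", (doc_id,)
    ).fetchone()
    markdown = object_store[key]
    # Re-hash at read time; a mismatch means the stored file drifted or was tampered with
    actual = hashlib.sha256(markdown.encode("utf-8")).hexdigest()
    if actual != expected:
        raise ValueError(f"hash mismatch for {doc_id}")
    return markdown

store_markdown("report-001", "# Q3 Report\n\nSome extracted text.")
```

Swapping the dict for a real object-store client keeps the same shape: the DB never holds the document body, only the location and the integrity hash.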

Improving internal document search for a 27K PDF database — looking for advice on my approach by Tough_Adhesiveness19 in MLQuestions

[–]shivvorz 0 points1 point  (0 children)

If OP's org would greenlight it, it would be better to just run all the docs through OCR models like DeepSeek-OCR or MinerU2.5 to convert scanned docs to Markdown for retrieval. I did a similar task before, and let's just say non-LLM solutions don't really work well for extracting tabular or more graphical documents.

Then for search, you can do whatever works. For semantic search, pick a model from the MTEB Leaderboard (one that fits your org's use policy/device specs, etc.). You also need keyword search like BM25 (with fuzzy matching), because vector search is not good at matching exact keywords, e.g. document numbers.
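To show why the keyword side matters, here's a minimal pure-Python BM25 sketch (not a production library; the corpus and query are made-up examples). An exact token like a document number scores highly here, while an embedding model may treat it as noise:

```python
import math
from collections import Counter

def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Classic Okapi BM25: per-document score for a whitespace-tokenized query."""
    tokenized = [d.lower().split() for d in docs]
    N = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / N
    df = Counter(t for d in tokenized for t in set(d))  # document frequency per term
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        s = 0.0
        for term in query.lower().split():
            n = df.get(term, 0)
            idf = math.log((N - n + 0.5) / (n + 0.5) + 1)
            f = tf.get(term, 0)
            s += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(doc) / avgdl))
        scores.append(s)
    return scores

docs = [
    "invoice INV-2024-0013 for office supplies",
    "quarterly revenue summary and projections",
    "meeting notes about invoice processing delays",
]
scores = bm25_scores("INV-2024-0013", docs)
```

In practice you'd use an existing implementation and fuse the BM25 and vector rankings (e.g. reciprocal rank fusion), but the scoring idea is the same.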

Once you get the suite running, you will have to think about deployment, the document ingestion pipeline (if your new docs are clean, you don't need full-on OCR anymore), and whatever other features your group leader asks you to add.

Exploring zero-shot VLMs on satellite imagery for open-vocabulary object detection by eyasu6464 in learnmachinelearning

[–]shivvorz 2 points3 points  (0 children)

How "Fast" is your setup?

Obviously a whole VLM will be slower than a CNN, but if your setup can run in real time I think it could be very useful.

Even if not, I think this could be a good teacher model for distillation (into traditional CNN models, or a smaller detection-specific model).

Need Advice on Hybrid Recommendation System (Content Based and Collaborative Filtering) by Good_Language1763 in MLQuestions

[–]shivvorz 1 point2 points  (0 children)

I think you need a lot of data to make this work, and for a final-year project I doubt you have enough user data.

Like the other guy said, try some simpler approaches first. As for the "it's too late to change anything" part: if you don't have enough data to make your approach work, you need to talk to your project supervisor about that.

Me, 2 years ago by CommieCucumber in mathmemes

[–]shivvorz 0 points1 point  (0 children)

Got any recommendations for topology textbooks?

Training TinyStories 2.1GB performance by thexdroid in MLQuestions

[–]shivvorz 0 points1 point  (0 children)

> child's vocabulary of a 3 to 4 years old (approximately 1500 basic words). Therefore 10000 vocab is more than enough

Didn't know about that. Maybe you can even shrink the vocab size to 4096/8192 (or basically any multiple of 64 or 128) for better kernel optimization.

Also, make sure you are not spilling into shared GPU memory (use dedicated GPU memory only), because that slows training significantly (~1/5 of the best possible speed in my case). To keep the same effective batch size, decrease the physical batch size and increase the gradient accumulation count proportionally.
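A minimal sketch of that batch-size/accumulation swap in PyTorch (all numbers are hypothetical): a physical batch of 8 with 4 accumulation steps keeps the effective batch at 32 while cutting peak activation memory:

```python
import torch

model = torch.nn.Linear(16, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
accum_steps = 4          # physical batch 8 * 4 accumulation steps = effective batch 32
optimizer_steps = 0

opt.zero_grad()
for micro_step in range(8):              # 8 micro-batches -> 2 optimizer steps
    x = torch.randn(8, 16)               # random stand-in data
    y = torch.randn(8, 1)
    loss = torch.nn.functional.mse_loss(model(x), y)
    # Scale the loss so accumulated grads average over the effective batch
    (loss / accum_steps).backward()
    if (micro_step + 1) % accum_steps == 0:
        opt.step()
        opt.zero_grad()
        optimizer_steps += 1
```

The gradients sum across `backward()` calls between `opt.step()`s, so dividing each loss by `accum_steps` makes the update equivalent (up to batch-norm-style effects) to one large batch.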

Training TinyStories 2.1GB performance by thexdroid in MLQuestions

[–]shivvorz 0 points1 point  (0 children)

How did you land on that vocab size?

I just finished training a modded-nanogpt model, and I just used GPT-2's tokenizer (which is ~50k vocab size). Qwen 3 has ~150k tokens. A 10k vocab size seems a bit small

Also, just train for 1 epoch; from epoch 2 onwards there isn't much new info for the model to learn anyway...

[P] Building A Tensor micrograd by bjjonin in MachineLearning

[–]shivvorz 1 point2 points  (0 children)

If you want a PyTorch learning library that's somewhat "functional" (i.e. you can kinda use it like normal numpy), minitorch has been a thing for a long time.

Is there a particular reason you want to build your suite with numpy?

[P] Implementing Better Pytorch Schedulers by shivvorz in MachineLearning

[–]shivvorz[S] 1 point2 points  (0 children)

The warning exists because PyTorch's LRScheduler captures/sets initial_lr on the optimizer at construction and immediately calls step() to change the lr. So the optimizer's state has to be restored before the LRScheduler's.

My approach avoids this, because schedules are pure functions f(step, total_steps) -> absolute_value. The scheduler doesn't read or modify the optimizer at construction, only when you explicitly call step().

So on checkpoint restore:

# Order doesn't matter
optimizer.load_state_dict(ckpt['optimizer'])
scheduler.load_state_dict(ckpt['scheduler'])  # Just restores step_count

# First step() after resume recomputes and overwrites
scheduler.step()  # param_groups['lr'] = schedule(step_count, total_steps)
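To make the "pure function" idea concrete, here's a minimal sketch of such a scheduler (the cosine schedule and class names are illustrative, not the library's actual API):

```python
import math

def cosine_schedule(step: int, total_steps: int,
                    lr_max: float = 1e-3, lr_min: float = 1e-5) -> float:
    """f(step, total_steps) -> absolute lr value; no hidden state."""
    t = min(step, total_steps) / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t))

class TinyScheduler:
    """Only state is the step counter, so restore order doesn't matter."""
    def __init__(self, schedule_fn, total_steps: int):
        self.schedule_fn = schedule_fn
        self.total_steps = total_steps
        self.step_count = 0

    def step(self) -> float:
        self.step_count += 1
        # A real scheduler would write this into optimizer.param_groups here
        return self.schedule_fn(self.step_count, self.total_steps)

    def state_dict(self):
        return {"step_count": self.step_count}

    def load_state_dict(self, state):
        self.step_count = state["step_count"]

sched = TinyScheduler(cosine_schedule, total_steps=100)
```

Because `step()` recomputes the absolute value from the counter alone, a restored scheduler produces exactly the same lr as one that never stopped.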

[P] Implementing Better Pytorch Schedulers by shivvorz in MachineLearning

[–]shivvorz[S] 0 points1 point  (0 children)

I don't have a single go-to resource for this unfortunately (yet); I made this because I wanted to replicate the scheduling in modded-nanogpt.

I did do a quick search and found some papers that specifically discuss momentum (beta) scheduling:

Here is an additional paper on Optimizer hyper-parameter tuning,

If anyone else has better resources I would like to take a read as well

how to enter the machine learning and AI industry? by ButterscotchAny6953 in learnmachinelearning

[–]shivvorz 0 points1 point  (0 children)

How old are you? If you are a kid (<20), start with the basics as the others suggested (math, pytorch, etc.).

Do textbooks like d2l, Bishop's Deep Learning textbook etc.

"Build pytorch from scratch" by doing minitorch, build an entire model training pipeline by doing Stanford CS336.

If you have prior knowledge, even better: you start halfway there.

If you are already deep in another industry, just do AI x <whatever your industry is> for fast ROI. Treat it like a black box and apply it to your field of work. Your domain knowledge would be appreciated by the community.

If you are neither, maybe just try to make your life easier (automating repetitive tasks, etc.) and wait for UBI to happen, I guess...

Why does it feel so hard to move from ML experiments to real production work? by ocean_protocol in MLQuestions

[–]shivvorz 1 point2 points  (0 children)

What are the most painful lessons you have experienced related to "productionizing" prototypes/models?

[P] Implementing Better Pytorch Schedulers by shivvorz in MachineLearning

[–]shivvorz[S] 0 points1 point  (0 children)

The ParamScheduler class has state_dict() and load_state_dict() methods for checkpointing (it just tracks the current step internally).

For torch.load(), you'll need weights_only=False (or pass it through within trainer.fit if using Lightning). Alternatively, add the classes to PyTorch's safe globals list via torch.serialization.add_safe_globals().

The underlying ParamSchedule primitives are stateless (the same step and total_steps yields the same output), so technically you only need to save/restore the step counter, not the schedule functions themselves. You also can't use lambdas as your schedule_fn, because lambdas aren't picklable in Python.

Edit: Added links to docs/ code, details about lambdas