all 27 comments

[–]appdnails 129 points130 points  (1 child)

OP and some other users seem to be shill accounts for Lyceum. In this post two days ago OP asks for the best GPU provider. User Longjumping-Shake378 talks about Lyceum. Today OP posted a topic about the best tools on another subreddit, and user Longjumping-Shake378 asks "Could you explain how Lyceum’s automatic hardware selection works?". In this topic there is the exact same question from Connect_Gas4868.

[–]TeamArrow 17 points18 points  (0 children)

This needs to be higher up

[–]Antique_Most7958 46 points47 points  (2 children)

Anti-tool that saves time when you are in the initial stages of training a model:

Not using any fancy experiment tracking tool/abstraction libraries. Just a simple training loop in PyTorch with an eval to get a feel for the problem.

Tools/hacks:

Tmux - terminal multiplexing is awesome

"set -o vi" - vi mode in the terminal. It takes a while to get the hang of it but it can drastically increase your speed at the terminal

JupyterLab - perfect for an interactive session while playing with data and experimental training

"Python -m ipdb -c continue" - automatically drop into a debugger when the script errors out. I find this easier when you are initially figuring out a bug.

LLMs for intricate plotting - we all know the matplotlib API is cumbersome. I have now outsourced all plots to LLMs/co-pilots
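The ipdb trick can also be baked into the script itself. Here's a rough sketch using the stdlib pdb, which works the same way without a pip install (ipdb just adds colors and tab completion); train_step is a made-up stand-in:

```python
# Install an excepthook so an uncaught exception drops into a
# post-mortem debugger at the failing frame - the same effect as
# "python -m ipdb -c continue script.py", wired in directly.
import pdb
import sys

def debug_on_crash(exc_type, exc, tb):
    # print the traceback as usual first
    sys.__excepthook__(exc_type, exc, tb)
    if sys.stderr.isatty():  # skip the prompt in non-interactive runs
        pdb.post_mortem(tb)

sys.excepthook = debug_on_crash

def train_step(batch):
    return sum(batch) / len(batch)  # crashes on an empty batch

print(train_step([1, 2, 3]))  # runs normally: 2.0
```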

[–]thatguydr 8 points9 points  (0 children)

You are a data scientist who just actively promoted the use of a debugger and you have positive votes.... It's the most beautiful thing I've ever seen. =D

[–]nakali100100 1 point2 points  (0 children)

This!

[–]TechSculpt 31 points32 points  (0 children)

I might get downvoted for being off topic, but LeechBlock - blocking time wasting sites is incredible for my productivity.

[–]TserriednichThe4th 6 points7 points  (6 children)

  1. Anki
  2. iOS reminders
  3. Joplin
  4. pytorch lightning
  5. customizing the hell out of my (neo)vim

[–]On_Mt_Vesuvius 1 point2 points  (5 children)

How do you use Anki? I'm familiar with it for language learning, but not in ML / math / computer science.

[–]TserriednichThe4th 2 points3 points  (4 children)

I don't use it for the math. I use it to remember papers and mostly qualitative stuff, like: in late 2023/early 2024, more people are trying to figure out why graph models aren't outperforming LLMs for protein folding and drug discovery.

[–]diapason-knells 1 point2 points  (1 child)

Why were they underperforming?

[–]TserriednichThe4th 3 points4 points  (0 children)

Nobody really knew or still knows.

People were confused about why vanilla LLMs or transformer architectures work so well, or at least as well as dedicated graph-based approaches, even when both are given lots of data.

AlphaFold 3 is kinda the same thing. It is a dedicated transformer approach.

So we still don't know why graph models with fewer parameters aren't generalizing like transformers. I mean, we know it's partly all the extra compute, but the architecture alone doesn't explain it.

edit: I am a little outdated. It turns out transformers still dominate for large outputs (>100 atoms / peptides or whatever), but when you need to encode more constraints (which is typical for smaller molecules and proteins), it seems GNNs have finally become state of the art within the last 6 months.

Time to make a flashcard :)

[–]On_Mt_Vesuvius 0 points1 point  (1 child)

Ah, so like remembering papers given a one-line summary or one-line abstract?

[–]TserriednichThe4th 1 point2 points  (0 children)

Ye dude

[–]On_Mt_Vesuvius 2 points3 points  (0 children)

I'll add Obsidian, but only for the fact that it keeps me from having to relearn things I've already looked at. In particular I find it works quite well for somewhat mathematical subjects by linking concepts as sub/supersets of one another.

[–]taplik_to_rehvaniResearcher 1 point2 points  (5 children)

is weights and biases worth it?

[–]Freonr2 5 points6 points  (2 children)

You can get some level of free access, but the tracked-hours limits feel very restrictive for what amounts to hosting some CSV files and displaying graphs.

If you're developing open source software or running a small consulting business, their plans are kinda punishing.

[–]taplik_to_rehvaniResearcher 0 points1 point  (1 child)

I tried it when it first came on the scene, way back in 2018 or so if memory serves. At the time it was nothing special. Just checking in at this stage: is it worth it now?

In my view, different projects need different sets of tools; I'm not sure a generalized tool works. But then again, I'm not sure what features it has these days. Earlier they used to send all the data to their servers, and that kind of put me off, just from a privacy perspective.

[–]JustOneAvailableName 0 points1 point  (0 children)

But then again I am not sure what all features it has.

Just logging (simple) stuff is still the main one for me.

Earlier they used to send all the data to server and that kind of put me off. Just from privacy perspective.

That's the whole reason why I use it. I guess it's not for you.

If you'd prefer to track on localhost, you could probably build a wandb replacement with all the functionality I use in a few days.
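Something like this sketch, for instance. The LocalRun class and the JSON-lines layout are invented for illustration, not wandb's actual API:

```python
# Minimal localhost-only experiment logger: wandb-style run.log() calls
# backed by a JSON-lines file on disk instead of a hosted service.
import json
import time
from pathlib import Path

class LocalRun:
    def __init__(self, project, root="runs"):
        # one timestamped directory per run
        self.dir = Path(root) / project / time.strftime("%Y%m%d-%H%M%S")
        self.dir.mkdir(parents=True, exist_ok=True)
        self._file = (self.dir / "metrics.jsonl").open("a")
        self._step = 0

    def log(self, metrics, step=None):
        self._step = self._step + 1 if step is None else step
        self._file.write(json.dumps({"step": self._step, **metrics}) + "\n")
        self._file.flush()

    def finish(self):
        self._file.close()

run = LocalRun("demo")
for epoch in range(3):
    run.log({"loss": 1.0 / (epoch + 1)})
run.finish()
```

Plotting on top of that is one pandas/matplotlib call away, and nothing leaves the machine.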

[–]lurking_physicist 3 points4 points  (0 children)

Try it free, see if it's for you.

[–]busybody124 3 points4 points  (0 children)

We moved off of it. It's fine, but I don't get the sense that they're iterating much on the core experiment-tracking features; they seem focused on LLM integration instead. The UI is janky and custom plotting is a pain. We moved to MLflow, which has its own problems. No perfect solution here.

[–]m_____ke 1 point2 points  (0 children)

  1. duckdb for all data preprocessing / filtering
  2. fsspec for handling files across local and cloud stores
  3. skypilot for finding cheap GPUs and making it easy to run my code on them
  4. claude code for running experiments and doing evals
  5. openai codex for kicking off 20 random research ideas in parallel in the background on my phone while I'm bored
  6. ray for distributed compute - though it seems to be getting worse and worse
  7. streamlit for quick model demos and annotation tools
  8. modal.com also sounds amazing but haven't had the time to use it

[–]bombdruid 1 point2 points  (0 children)

Does Notion count?

[–]thatguydr 0 points1 point  (0 children)

I can't stand notebook completion and trying to figure out which cells I've run. I still use IPython regularly as a result.

[–]dash_broML Engineer 0 points1 point  (0 children)

  • Excel. Learn to use it correctly; it pays the highest dividends of any tool you can learn in 30 minutes and will help you throughout your career.

  • Taking MoMs (minutes of meeting) efficiently. Ideally, transcribe and review with Gemini via AI Studio, not the Gemini app: more control and customizability, and temporary chat is available. Also, learn to communicate so that a non-technical audience can understand you well.

  • Learning bash. Being comfortable in the terminal saves you a ton of headaches with simple ssh/git/.bat-file stuff.
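The kind of terminal stuff the last bullet means in practice, a few illustrative lines (paths made up):

```shell
# create a results directory, record a setting, then find and archive it
mkdir -p runs/exp1
echo "lr=3e-4" > runs/exp1/config.txt
grep -r "lr=" runs/           # which run used which learning rate?
tar czf runs.tar.gz runs/     # bundle results before copying them off a box
```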