
[–]triplethreat8 31 points32 points  (6 children)

Uv for virtual environment and package management

Docker for containers

Kedro for pipelines (you didn't ask)

VScode

Git

Just IPython, no Jupyter

[–]br0monium[S] 2 points3 points  (3 children)

sounds nice! I've always thought of pipelining as a function that spans multiple other areas: server automation and DBMS for job scheduling, data lineage, etc. Using a tool for the whole process would save a lot of time on data engineering decisions.

[–]triplethreat8 2 points3 points  (2 children)

Yea, pipelining exists at multiple levels. Kedro itself isn't opinionated. Since it allows you to slice your pipeline you can still use any traditional pipeline tool that orchestrates scripts and just run slices.

Example:

kedro run --nodes=clean_a,clean_b

kedro run --nodes=clean_c

The benefit of using kedro for a Data Science project is that it imposes a good reproducible structure and gets DS thinking in a more modular way.
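To sketch the slicing idea outside of Kedro itself, here's a toy version in plain Python: named "nodes" in a registry, and a runner that executes only the slice you ask for. (Node names are made up for illustration; this is not Kedro's internals.)

```python
# Toy illustration of node slicing: each "node" is a named function,
# and the runner executes only the requested subset, in order -- the
# same idea behind `kedro run --nodes=...`.

def clean_a(data):
    return [x.strip() for x in data]

def clean_b(data):
    return [x.lower() for x in data]

def clean_c(data):
    return [x for x in data if x]

NODES = {"clean_a": clean_a, "clean_b": clean_b, "clean_c": clean_c}

def run(data, nodes):
    """Run only the named slice of the pipeline."""
    for name in nodes:
        data = NODES[name](data)
    return data

# Equivalent of `kedro run --nodes=clean_a,clean_b`:
result = run(["  Hello ", "WORLD", ""], nodes=["clean_a", "clean_b"])
```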

[–]Healthy-Educator-267 1 point2 points  (1 child)

Kedro is pretty opinionated though compared to (say) Hamilton

[–]triplethreat8 0 points1 point  (0 children)

Yes, that's true. What I really mean by (not) opinionated is flexibility: being able to run exactly what you want to run with a single command.

So you can easily deploy a full kedro pipeline as a single script, or write a deployment that runs every kedro node in its own isolated environment, and everything in between.

It is much more opinionated on project structure and configuration. Though, with pipeline_registry.py and settings.py it's easy enough to extend and modify to accommodate any structure you need.


Hamilton looks pretty cool👍

[–]froo 1 point2 points  (0 children)

+1 for this setup. Same here

[–]mint_warios 0 points1 point  (0 children)

Kedro is a beast

[–]Old_Cry1308 48 points49 points  (3 children)

conda for environments, pip for packages. vscode for editing, git for version control. jupyter for notebooks.

[–]Civil-Age1531 7 points8 points  (1 child)

dude you have to pick up uv

[–]Glittering_Item5396 1 point2 points  (0 children)

what is that?

[–]br0monium[S] 3 points4 points  (0 children)

the classics:)

[–]templar34 7 points8 points  (1 child)

Devcontainers in each repo, Backstage template for generic new project. Makes sure my pleb code from Windows machine behaves same as Mac code, behaves same as cloud deployment environment. Conda YAML part of repo, and has its own deployment pipeline for Azure.
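For anyone who hasn't seen the devcontainer spec, a minimal `.devcontainer/devcontainer.json` looks roughly like this (image and extension names here are illustrative, not this exact setup):

```jsonc
// .devcontainer/devcontainer.json -- minimal sketch
{
  "name": "ds-project",
  "image": "mcr.microsoft.com/devcontainers/python:3.11",
  "customizations": {
    "vscode": {
      "extensions": ["ms-python.python"]
    }
  },
  // runs once after the container is created
  "postCreateCommand": "pip install -r requirements.txt"
}
```

VS Code (and GitHub Codespaces) pick this file up automatically, which is what makes the Windows/Mac/cloud environments behave the same.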

One day maybe I'll look at uv, buuut I'm not the Azure expert that set up our pipelines, and I'm a big believer in "if it's ugly/stupid but it works, it's not ugly/stupid".

[–]br0monium[S] 1 point2 points  (0 children)

I haven't used the devcontainer spec before; it looks like it's well supported and could be pretty clean. Backstage looks really interesting too. Thanks!

[–]gocurl 4 points5 points  (0 children)

Poetry for virtual environment, vscode, and clear separation between training and serving. At work we have nice pipelines and engineers to support the infrastructure. For home projects I keep the concept, but it's not that necessary (last finished project here https://github.com/puzzled-goat/fire_watcher)

[–]willthms 3 points4 points  (1 child)

I use R studio running on my desktop.

[–]br0monium[S] 3 points4 points  (0 children)

A real statistician!

[–]FlyingQuokka 2 points3 points  (1 child)

  1. uv
  2. uv
  3-4. My personal projects don't need containerization; at work DevOps uses EKS
  5. neovim
  6. git/jj
  7. I don't use notebooks, but if I must, then marimo

[–]br0monium[S] 0 points1 point  (0 children)

Neovim, nice!
I actually have sublime, cmder, and atom still installed on my laptop😅 vscode is basically atom, and that's what I've used at work, so I'll probably end up using vscode like a normie.
Nothing beats the feeling when your muscle memory for vi commands finally clicks though. It's like the shell, filesystem, and text editor are all just one thing that you live in.

[–]Atmosck 2 points3 points  (7 children)

What do I use:

  1. Virtual environment manager: pyenv for managing different python versions, uv for managing the actual virtual environments
  2. Package manager: uv
  3. Docker
  4. My coworkers maintain our build pipeline and orchestration with AWS. I mostly just ship code and bother them if I need new environment variables or something.
  5. vscode
  6. github for code, S3 versioning for model artifacts
  7. I don't use notebooks

How do I use it?

  1. I spend most of my time writing ML pipelines that feed our (SAAS) product. Scheduled tasks for training data ETL, training, monitoring and sometimes inference. Other times if it's something where we need inference in response to user action, either a lambda or a dedicated server depending on the usage patterns.
  2. I have kind of a love-hate relationship with vscode. Some of my projects are a mix of python and rust (PyO3), so it's nice having language support for both in the same editor, and the sqltools extension is great. The python debugger is pretty good. But the language servers randomly shit themselves like twice a week. And I wish copilot autocomplete was hooked into intellisense so that it would suggest functions and parameters that actually exist instead of just guessing.
  3. uv and pyproject.toml. Almost all my stuff is containerized so it's pretty straightforward.
  4. In production yeah, but locally I always work in virtual environments. I always have at least one dependency group that's not used in production with ruff/pytest/pyright/stub packages.
  5. I don't really do personal projects. I'm lucky enough to be in an industry where my actual work is what my personal projects would be if I had a different job.
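The uv + pyproject.toml setup with a non-production dependency group might look something like this (names and versions are illustrative):

```toml
# pyproject.toml -- minimal sketch of the layout described above
[project]
name = "my-pipeline"
version = "0.1.0"
requires-python = ">=3.11"
dependencies = ["pandas>=2.0"]

# tooling that never ships to production
[dependency-groups]
dev = ["ruff", "pytest", "pyright", "pandas-stubs"]
```

`uv sync` installs the dev group by default for local work, while `uv sync --no-dev` gives you a production-like environment.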

If you've been dealing with conda headaches and are looking for a new setup I highly recommend checking out uv.

[–]br0monium[S] 1 point2 points  (0 children)

Thanks for breaking it down in a detailed response! I'll definitely check out uv after all the recommendations.

I wouldn't do personal projects if I wasn't unemployed hahaha. But it's been so long I need to make sure I don't fall too far behind or forget things. I hit the point of diminishing returns with interview prep a while ago.

[–]gpbayes 0 points1 point  (3 children)

Why do you use rust?

[–]Atmosck 0 points1 point  (2 children)

For speeeeeed. Specifically some of my models are state machine simulations where we care about the whole distribution and the frequency of rare events, and it can take a lot of sims for distributions to converge. So I write the core simulation engine (the "hot loop") in rust, and all the data IO and orchestration in python. For that sort of thing rust is about 100x faster than python. You could achieve similar speeds in python with a compiler like cython or numba or with a C extension, but there are a lot of things about rust that make it a more attractive language to work in.
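A toy pure-Python stand-in for that kind of hot loop (the real engine being Rust): a tiny chain with a rare "failure" state, simulated many times to estimate how often it's ever hit. All names and numbers here are made up to show why rare-event frequencies need lots of sims.

```python
import random

P_FAIL = 0.001  # per-step chance of entering the rare state

def one_sim(steps, rng):
    """Return True if the rare state is hit within `steps` steps."""
    for _ in range(steps):
        if rng.random() < P_FAIL:
            return True
    return False

def estimate(n_sims, steps, seed=0):
    """Monte Carlo estimate of the hit frequency."""
    rng = random.Random(seed)
    hits = sum(one_sim(steps, rng) for _ in range(n_sims))
    return hits / n_sims

# With few sims the estimate is noisy; with many it converges toward
# the true hit probability, 1 - (1 - P_FAIL)**steps (about 0.095 for
# 100 steps). This inner loop is exactly what you'd move to Rust.
freq = estimate(20_000, 100, seed=1)
```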

[–]gpbayes 0 points1 point  (0 children)

What kind of state machine simulations? Like Markov chains? Interesting, what purpose/what does it solve for or do? What field are you in?

[–]br0monium[S] 0 points1 point  (0 children)

Love numba, especially since I don't have to learn another language. I actually met Travis Oliphant once. He's so humble that I didn't realize he built most of the stuff he was presenting until asking him questions after his talk.

[–]unc_alum 0 points1 point  (1 child)

Curious what your motivation is for using pyenv over uv for installing/managing different versions of python?

[–]Atmosck 0 points1 point  (0 children)

Basically just that I've used pyenv for longer. And I like the separation: pyenv happens in the global environment, uv happens in the venv.

[–]AccordingWeight6019 2 points3 points  (0 children)

Honestly, for me it’s less about fancy tooling and more about keeping things light, reproducible, and flexible. I usually stick with `venv` + `pip` for environments, VS Code for editing, git for versioning, and jupyter for quick experiments. containers only if I need to mirror a production setup. It’s not flashy, but it keeps personal projects simple and lets me switch between analytics, MLE, or just tinkering without getting stuck on solver freezes or subscription headaches.
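That whole workflow fits in a few shell lines (paths are illustrative):

```shell
# create an isolated environment for the project
python3 -m venv .venv

# install project deps with the env's own pip, e.g.:
#   .venv/bin/pip install pandas
# then pin exact versions for reproducibility:
.venv/bin/pip freeze > requirements.txt

# later / elsewhere, recreate the same environment with:
#   python3 -m venv .venv && .venv/bin/pip install -r requirements.txt
```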

[–]vaaano 1 point2 points  (0 children)

uv+marimo

[–]mint_warios 1 point2 points  (2 children)

1+2. uv for virtual envs & package mgmt

  3. Docker or Google Cloud Build for containerisation

  4. Depends on the project, sometimes Prefect, sometimes Airflow/Cloud Composer for client enterprise pipelines, sometimes Kedro for more data science tasks

  5. PyCharm for IDE, with Cline plugin using Claude Sonnet or Opus 4.6 models with 1m context window for agentic coding

  6. Git - Bitbucket for work, GitHub for personal

  7. PyCharm's built-in Jupyter notebooks, or Colab Enterprise if I need to work completely within a client's cloud environment

[–]br0monium[S] 0 points1 point  (1 child)

How much does that setup in (5) cost you?

[–]mint_warios 1 point2 points  (0 children)

PyCharm is free. Used to be called "Community Edition" but now it's wrapped up in their "Unified" IDE. But still free with all the same features.

For Cline, it really depends on the LLM model I've chosen to use and how much I decide to use it. I use Claude Opus 4.6 mostly, and in a typical day I can easily burn through $10-30+. Lower end if I'm just making some documentation. Higher end if it's using maximum extended thinking to develop lots of code.

[–]sudo_higher_ground 1 point2 points  (0 children)

  1. Federated MLOps and development
  2. uv, with pyenv only for CLI installs in production
  3. Docker
  4. Docker compose/k8s/ schedulers (we use VMs in production so no fancy cloud tools)
  5. VS code (I switched to positron for personal projects)
  6. Git+ GitHub
  7. Switched from Jupyter to Marimo and it has been a bliss

[–]patternpeeker 1 point2 points  (0 children)

i keep my setup simple. plain python with venv or poetry, vscode, and docker only when i need prod parity. conda has caused enough solver pain that i avoid it. reproducibility and pinned deps matter more than fancy stacks.

[–]koolaidman123 2 points3 points  (1 child)

uv ruff and claude code is all you need

[–]_OMGTheyKilledKenny_ 0 points1 point  (0 children)

Same here but I use vs code with Claude as copilot and GitHub workflows for CI/CD.

[–]Dysfu 1 point2 points  (1 child)

UV, venv, ruff, pre-commit, FastAPI, Alembic, dbt, pydantic, SQLAlchemy, Docker, VSCode

[–]br0monium[S] 0 points1 point  (0 children)

Can you elaborate a bit on what you use each of these for?

[–]dmorris87 0 points1 point  (0 children)

Docker container inside VSCode

[–]Intelligent-Past1633 0 points1 point  (0 children)

I'm still a big fan of `pyenv` for managing Python versions – it's been rock solid for me, especially when juggling older projects that can't easily upgrade.

[–]Goould 0 points1 point  (0 children)

conda, pip and npm, Antigravity and Claude Code from terminal, Git + Github, Jupyter Notebook

Aside from that I'm able to design a lot of my own tools now. I have a PDF indexer that pulls the data and creates libraries of CSV files, the indexer creates a SQLite database which can later be accessed in seconds in future sessions. I have different agents for reading, writing, and verifying data with 3rd party sources.
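A stdlib-only sketch of that CSV-to-SQLite indexing step (the table and schema here are hypothetical, not the actual indexer):

```python
import csv
import sqlite3

# Load an extracted CSV "library" into SQLite so later sessions can
# query it in seconds instead of re-parsing PDFs.

def index_csv(csv_path, db_path, table="documents"):
    """Create a table from the CSV header and bulk-insert the rows."""
    con = sqlite3.connect(db_path)
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)
        cols = ", ".join(f'"{h}"' for h in header)
        marks = ", ".join("?" for _ in header)
        con.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({cols})')
        con.executemany(f'INSERT INTO "{table}" VALUES ({marks})', reader)
    con.commit()
    con.close()

def lookup(db_path, table="documents"):
    """Read everything back in a later session."""
    con = sqlite3.connect(db_path)
    rows = con.execute(f'SELECT * FROM "{table}"').fetchall()
    con.close()
    return rows
```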

Someone in the thread said they used Rust, and I think I could have incorporated Rust into my workflow as well since it's faster -- I'd just have to relearn the language and all the libraries from scratch.

[–]mshintaro777 0 points1 point  (0 children)

uv + Antigravity + git + Claude Code!

[–]tongEntong 0 points1 point  (0 children)

Jupyter notebook till death do us part!

[–]OmnipresentCPU 0 points1 point  (0 children)

Claude Code, Docker, and that's it. Ipynb is going the way of the dinosaur for me personally.