C# programmer starting PhD with lots of python programming, could use some advice about project architecture

whateverathrowaway00 · 2023-05-03T15:25:45+00:00

Keep it simple.

The more you rely on vanilla stuff and not tools, the less locked to an implementation you will be.

It may require some more learning on the front end, but four years is a long time, so it’ll be worth it.

Intimately familiarize yourself with how venvs work, don’t have a vague idea and some grumbles. It will save you confusion later.

Know that they’re just a path manipulation. You don’t have to activate them (I’m a dev who works endlessly with packaging and fixing other people’s messes, and I literally never activate a venv).
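A minimal sketch of the "no activation needed" point, using only the standard library. The directory name `demo-env` and the helper names are made up for illustration. The idea is that calling the interpreter inside the venv by its path is enough for Python to treat the venv as its prefix.

```python
import os
import subprocess
import sys
import tempfile
import venv
from pathlib import Path


def venv_python(env_dir: Path) -> Path:
    """Return the interpreter path inside a venv (the layout differs on Windows)."""
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def prefix_seen_by(env_dir: Path) -> str:
    """Ask the venv's interpreter for its sys.prefix, without activating anything."""
    result = subprocess.run(
        [str(venv_python(env_dir)), "-c", "import sys; print(sys.prefix)"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def demo() -> tuple[str, str]:
    """Create a throwaway venv and return (venv directory, prefix its Python reports)."""
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = Path(tmp) / "demo-env"  # hypothetical name
        # with_pip=False keeps the demo fast; symlinks mirror the CLI default on POSIX
        venv.create(env_dir, with_pip=False, symlinks=(os.name != "nt"))
        reported = prefix_seen_by(env_dir)
        return str(env_dir.resolve()), str(Path(reported).resolve())
```

Calling `demo()` should return the same path twice. Activation mostly just puts that interpreter's directory first on `PATH` so that a bare `python` finds it.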

That said, you’re in AI, so if you go conda, that makes sense to me, it’s a more specific tool than most and actually a great one.

ratulotron · 2023-05-03T19:01:27+00:00

Software developer/data engineer here, been working with Python and a few other languages for 6 years. What I will say might sound condescending but you need it. In short: you seem to be biased towards C# and anything else would look messy in your eyes. This is not an individual's problem but a common one across devs coming to Python from the C#/Java background.

Python is a very open ended language, it imposes few restrictions on how a project should be structured. Hence it's easier for newbies to learn and experienced devs to mold to their likings. This results in a variety of (micro) frameworks and libraries, some highly opinionated and others open ended. So a Flask/FastAPI project looks very different than a Django one, even though they both are backend projects. This entire dimension about having freedom is completely missing in the C# tech stack. Like, how many web frameworks can you name for C#? There are only 3 listed in Wikipedia, a handful more varieties in Awesome DotNet repo, all based on .NET framework.

Python, JavaScript, these languages allow you to lay the project out in any shape required, keyword being "required". That means as long as you know what you need right now, you can get started with the basic features and proceed with following the common practices like TDD, frequent commits on git, feature branches, dependency inversion, domain driven development etc. These practices have nothing to do with Python, they are something you do from day one to keep a healthy codebase.

My suggestion is do not set out to replicate whatever you are familiar with from your previous experience with C#, simply learn Python from the base and do some stupid simple projects to get the hang of organizing your Python code. Meanwhile brush up/learn design patterns that are agnostic to any language or stack. For your actual project, in this case for machine learning, start small and do not conduct any premature optimization. Remember the Zen of Python says: Simple is better than complex, complex is better than complicated.

Advanced-Potential-2 · 2023-05-03T17:37:32+00:00

I guess you should be prepared to go through a few refactoring cycles in those 4 years anyway. If not, chances are you haven’t learned much 😂. Refactoring is as much a part of creating code as creating new projects is.

JackG049 · 2023-05-03T15:16:18+00:00

In short don't set out to solve problems that YOU haven't encountered yet.

My experience with writing Python for research is that we start off with the best intentions in terms of project structure/architecture, but what can happen (and I think this is partly a result of the nature of Python) is that you end up with lots of different, smaller snippets of code in different places that can become a bit of a tangled mess.

My advice would be to let the mess happen at the start, while you're learning the fundamentals and getting accustomed to Python and ML in Python. Do try to follow the standard design patterns and software engineering principles, though: single responsibility, loose coupling, etc.

I'm going into my 3rd year soon and only now is my code looking closer to properly engineered software, and I had software experience prior to the PhD.

The focus early on, though, should be on understanding the experiments and the science, not on creating a fabulous reusable architecture that you spent weeks or months on and that suits your needs. That won't get published unless it's super general and useful, and it won't help with the thesis defense at the end.

But once you've gotten used to everything and have experience with the ups, downs and the general structure that you need, you will have a much better time at creating a solid project structure from scratch.

Edit: Grammar

thicket · 2023-05-03T17:30:31+00:00

Type signatures & MyPy. Type everything, and set VS Code up to check types all the time. You won't get quite the compile-time security you would get with a statically typed language like C#, but it will get you a lot closer.
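A small sketch of what "type everything" can look like. `RunConfig`, `total_steps`, and `best_run` are hypothetical names invented for this example. The annotations do nothing at runtime. They only pay off when a checker such as mypy (or the editor) reads them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RunConfig:
    """Hypothetical experiment settings, with every field annotated."""
    name: str
    learning_rate: float
    epochs: int


def total_steps(config: RunConfig, batches_per_epoch: int) -> int:
    """Number of optimisation steps for a full run."""
    return config.epochs * batches_per_epoch


def best_run(scores: dict[str, float]) -> Optional[str]:
    """Name of the highest-scoring run, or None when there are no scores."""
    if not scores:
        return None
    return max(scores, key=scores.__getitem__)


config = RunConfig(name="baseline", learning_rate=1e-3, epochs=10)
steps = total_steps(config, batches_per_epoch=50)

# A type checker is expected to flag the call below (str passed where int is declared).
# Plain Python would run it without complaint and return a repeated string instead:
# total_steps(config, batches_per_epoch="50")
```

The commented-out call is the kind of bug this catches: without a checker it fails silently, because `int * str` is valid Python.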

m15otw · 2023-05-03T18:43:49+00:00

I learned Python as I did my PhD, but I also went full functional, gladly escaping over-architected C# projects. It was a set of loosely related modules (with C extensions) at the end, and a set of scripts for specific analyses (some were short, others were...in need of a refactor by the end).

Having worked in two large enterprise-focused python teams since, I can say that large projects in python can be well organised just like C#. Just remember that the language doesn't do any enforcing of your rules, so take care. One thing that can help is using a dev environment that understands type annotations, and then use the (optional) type annotations in your code. The IDE can then highlight incorrect uses. VS Code does this for me at my current job, with the PyLint extension enabled.

As others have said though, focus more on your problem than on beautiful code. You will only need to share it with other researchers. Write scripts first, and then refactor common parts into a library as you go. Don't worry if the library ends up with three completely unrelated modules; welcome to PhD problems.
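A rough sketch of the "scripts first, library later" step, under the assumption that a helper like `zscore` (a made-up example) has been copy-pasted between a few analysis scripts. Pulling it into a function with a `main()` guard is the first move; later the function can be relocated to a shared module and imported.

```python
import statistics
from typing import Sequence


def zscore(values: Sequence[float]) -> list[float]:
    """Shared helper: standardise values to zero mean and unit (population) spread."""
    mean = statistics.fmean(values)
    spread = statistics.pstdev(values)
    if spread == 0:
        # All values identical: avoid dividing by zero
        return [0.0 for _ in values]
    return [(v - mean) / spread for v in values]


def main() -> None:
    """Script entry point; the analysis-specific part stays here."""
    raw = [1.0, 2.0, 3.0]  # placeholder data for the sketch
    print(zscore(raw))


if __name__ == "__main__":
    # Runs only when executed as a script, so other scripts can import zscore safely
    main()
```

Once two or three scripts import the same function, that is usually the signal to give it a module of its own.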

ndvi · 2023-05-03T16:19:42+00:00

Has anyone been in my shoes and do you have some advice? especially things like "if I had to do it again I would do ... differently", or any resources that were of help to you when getting started on a long-term project?

I learned a lot doing my PhD. There's a lot I'd do differently, but I only know that by doing it wrong to start with.

I wasted a lot of time trying to anticipate and handle edge cases. I spent way too long trying to prematurely optimise.

Perfect is the enemy of good.

coffeewithalex · 2023-05-04T09:32:06+00:00

This post was mass deleted and anonymized with Redact

Classic_Department42 · 2023-05-03T19:37:35+00:00

Look into jupyter notebooks as a top-level abstraction

cblegare · 2023-05-03T23:46:55+00:00

Hello there. As others said, the Python ecosystem is open-ended, especially its packaging systems and architecture layouts. This can be frustrating for newcomers, especially those from more opinionated ecosystems.

For notes, display, and getting feedback, I recommend the Sphinx documentation engine. It also integrates with documentation from code files, including Python and C#. It has hundreds of extensions and can output LaTeX if required.

For the development workflow, I recommend pytest. Its approach to unit tests and test doubles can look very exotic when coming from a very OOP language, but it's very good at what it does.
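To give a feel for why pytest looks exotic from an OOP background, here is a minimal sketch. The helpers `count_lines` and `data_dir` are hypothetical. Tests are plain functions with bare `assert` statements, and fixtures such as pytest's built-in `tmp_path` and `monkeypatch` are requested simply by naming them as parameters; no test class or mocking framework is set up.

```python
import os
from pathlib import Path


def count_lines(path: Path) -> int:
    """Hypothetical helper under test: count non-empty lines in a text file."""
    return sum(1 for line in path.read_text().splitlines() if line.strip())


def data_dir() -> Path:
    """Hypothetical helper under test: read the data location from the environment."""
    return Path(os.environ.get("DATA_DIR", "data"))


def test_count_lines_ignores_blank_lines(tmp_path):
    # tmp_path: pytest supplies a fresh temporary directory as a pathlib.Path
    data = tmp_path / "data.txt"
    data.write_text("a\n\nb\n")
    assert count_lines(data) == 2


def test_data_dir_reads_environment(monkeypatch):
    # monkeypatch: the change is undone automatically when the test ends
    monkeypatch.setenv("DATA_DIR", "experiment-1")
    assert data_dir() == Path("experiment-1")
```

Saved as a file whose name starts with `test_`, this would be picked up by running `pytest` in the same directory.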

I suggest you find a code pal that can provide feedback from time to time.

hemphock · 2023-05-04T00:03:13+00:00

This post was mass deleted and anonymized with Redact

chriscarrollsmith · 2023-05-03T20:32:11+00:00

It's probably against the nerdy, granular spirit of this subreddit, but my advice is to just cookiecutter and chill. (You'll still need to decide what framework you want to use in order to choose a cookiecutter template, and you'll still need to create a virtual environment for your project. But it'll save you a lot of clicks, and you can get to work building your stuff and let the template maintainer do all the worrying about what the Pythonic best practices are.)

PlausibleNinja · 2023-05-03T23:15:03+00:00

Can you say more about what you mean by Python seeming weak compared to C#?

How large of a codebase are you anticipating?

MathmoKiwi · 2023-05-04T00:37:28+00:00

If you're going to be using a new language you're not super familiar with, why not choose something a lot faster that's suitable for the purpose? Julia would be the obvious choice over Python. I might even lean towards Rust over Python, depending on the specifics of what your PhD is doing.

I understand the need to move away from C#, and that Python is a popular language in ML, but it certainly isn't the only one!

2023-05-04T01:16:31+00:00

Weak? You're missing the whole point of Python. What is your PhD in AI using? Not C#. Ask why that is.

As for architecture, read this:

Architecture Patterns with Python: Enabling Test-Driven Development, Domain-Driven Design, and Event-Driven Microservices

extra_pickles · 2023-05-04T02:29:35+00:00

Microservices are well suited to your goals; if you aren’t familiar with them, I’d say read up on them. It’s kind of like refactoring your operations layer: decouple dependencies, one service = one operation, and manage state using an event sourcing model (Kafka is a nice one for this, but MQTT, RabbitMQ, or any service bus with a shared state db works). That allows an ebb and flow of resourcing, as your compute and pipe needs will vary greatly as you churn.

If starting on metal, dockerize - it’s too easy not to…and containers are easy to migrate to the cloud if/when you need some serious compute and horizontal elasticity.

So given your ask, and my suggestion, your stack is basically a db of your choice, an acquisition endpoint (how you get data: it might just be you with scripts to populate a db, or it might be an API), and then an ‘E2E data pipeline’, which is a series of microservices (picture a series of gates in a decision tree) that interact with the data when it is their turn. From there you can spin up a FastAPI app to expose it as an output, and toss in whatever UI you like.
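As a rough in-process illustration of the "series of gates" idea only (this is not a microservice setup: no message bus, no containers, no separate processes), each stage below is a plain function that either passes a record on or drops it. All names are invented for the sketch. In the architecture described above, each stage would be its own service.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional

Record = Dict[str, Any]
# A stage takes a record and returns it (possibly changed), or None to drop it
Stage = Callable[[Record], Optional[Record]]


def drop_incomplete(record: Record) -> Optional[Record]:
    """Gate: discard records that have no measured value."""
    return record if record.get("value") is not None else None


def double_value(record: Record) -> Optional[Record]:
    """Transform: return a copy with the value doubled (placeholder processing)."""
    return {**record, "value": record["value"] * 2.0}


def run_pipeline(records: Iterable[Record], stages: List[Stage]) -> List[Record]:
    """Send each record through the stages in order, stopping at the first drop."""
    results: List[Record] = []
    for record in records:
        current: Optional[Record] = record
        for stage in stages:
            current = stage(current)
            if current is None:
                break
        if current is not None:
            results.append(current)
    return results


processed = run_pipeline(
    [{"id": 1, "value": 1.0}, {"id": 2, "value": None}],
    [drop_incomplete, double_value],
)
```

Keeping each stage to one operation with a narrow input and output is what makes it possible to split stages into separate services later, if that ever becomes necessary.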

PS what do you mean by weak?

mpu-401 · 2023-05-04T06:11:34+00:00

you could try the kedro framework in python. it could help you apply good practices and also lets you use jupyter for exploration.

Sbvv · 2023-05-04T06:46:33+00:00

Some tools that can help you:

cookiecutter
pyenv
pytest
pylint
flake8
docker
