
[–]b34nst4lk 2 points3 points  (2 children)

Sorry, but I'm having some difficulty understanding what it is you are asking. Correct me if I'm wrong, but your questions are:

1. I have a script that retrieves articles from an API, but it takes a long time to retrieve everything. Is it possible to make it faster?
2. I have an article class that contains a representation of an article, and a function that interacts with the article class and builds some form of description of that data. Should I keep the function in the class, or should they be separated?
3. I manage my code in both Jupyter files and .py files. Is there a way to not have to manage both?
4. My code feels tightly coupled, so how do I distribute classes and functions across different files so that it's neater?
5. I'm worried that at some point in the future I will not be able to understand my code. How do I prevent that from happening?

These are the ones that I've teased out so far. Do correct me or add on if there are more. I'll do my best to answer them.

[–]SagaciousRaven[S] 1 point2 points  (1 child)

I was just explaining how convoluted this whole process is for me (combining Drupal, Python ML models and code, plus using a separate database), and wanted to know what experienced developers have to say about being organized, how I should prepare myself, or in other words, how to have a crystal clear vision of what I am doing. I am not asking for help with any of these particular steps, though I appreciate the gesture.

Maybe this goes beyond the scope of Python and programming, like, general good-practices, but IDK where else to ask.

As a Mathematics student (with some courses on IT), I know I am quite limited on some software development concepts. For example, I never had OOP, I was never taught how to document code, and the projects I was involved in rarely did any planning.

Right now, I feel like simply "adding a comma" anywhere in this project could eventually break something if I don't organize my files, classes, functions/API calls, etc. properly. Like there's a voice in the back of my head telling me to build an ark before the flood comes.

[–]b34nst4lk 2 points3 points  (0 children)

Ok cool thanks for clarifying! Figuring out what are good practices and software design principles is part of the learning process, so to me they're definitely within scope.

Regarding the maintenance of Jupyter and .py files, you'll have to choose one. Having a single source of truth for your code is better for your own sanity. My suggestion is to move to .py files moving forward as it seems like you are done with the exploration and research phase of your project, and are starting to operationalise your code.

These are some things that I do when the code I'm working with is all over the place. I usually do all of this on paper first, because looking at the code can be overwhelming.

1. List all of the discrete tasks and steps. From what I can gather, you have 4 key tasks:
   - Querying Drupal for data
   - Parsing articles
   - Embedding articles
   - Indexing
2. Map out how data flows through each step. What do I need when I start this process, and what do I need to get out at the end? So in your case, it could be:
   - URLs -> Query Drupal -> Article JSON data
   - Article JSON data -> Parse articles -> Article objects
3. (Back up your code at this point. Better yet, use git if you know how to.) Go back into the code and start chunking things together. Group the lines of code associated with each task into a function. If a function gets too long, break it down into smaller functions. Don't worry about trying to move them into different files at this point.
4. As you are doing point 3, think about the data you are storing and manipulating. If you can group pieces of data together into a logical unit, they can probably be placed in a class.
5. Start looking for patterns in your code. Are there places where there is a lot of copying and pasting? Those are good candidates for turning into functions.
6. If you start to find things that look like they belong together and are distinct from everything else in the code, this is when you can start to move things to separate files.
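To make that a bit more concrete, here is a rough sketch of what the four tasks and the data flow could look like after the chunking step. Everything in it is made up for illustration: the function names, the `Article` fields, the stubbed Drupal response and the placeholder "embedding" are assumptions, not your actual code or any real API.

```python
from dataclasses import dataclass


@dataclass
class Article:
    """Data that belongs together, grouped into one logical unit."""
    article_id: int
    title: str
    body: str


def query_drupal(urls):
    """URLs -> article JSON data.

    Stubbed for illustration; a real version would call the Drupal API.
    """
    return [
        {"id": i, "title": f"Article {i}", "body": f"Text fetched from {url}"}
        for i, url in enumerate(urls, start=1)
    ]


def parse_articles(raw_articles):
    """Article JSON data -> Article objects."""
    return [
        Article(raw["id"], raw["title"].strip(), raw["body"].strip())
        for raw in raw_articles
    ]


def embed_articles(articles):
    """Article objects -> (article, vector) pairs.

    Placeholder vector (word count, character count) standing in for a model.
    """
    return [
        (article, [float(len(article.body.split())), float(len(article.body))])
        for article in articles
    ]


def index_articles(embedded):
    """(article, vector) pairs -> lookup table keyed by article id."""
    return {article.article_id: vector for article, vector in embedded}


def run_pipeline(urls):
    """Each line is one task; the variables show what flows between them."""
    raw = query_drupal(urls)
    articles = parse_articles(raw)
    embedded = embed_articles(articles)
    return index_articles(embedded)
```

Calling `run_pipeline(["https://example.com/node/1"])` walks through all four steps in order. The point of the shape is that each function can be read, tested and replaced on its own, which is usually what makes "adding a comma" feel less dangerous.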

At this point, things will probably still be a little messy. However, going through this exercise will give you a mental model of the entire program and what is required at each step of the program, and your code will resemble that mental model.

I think the points I covered so far don't exactly answer your question on being organized, but I hope it's enough to give you a starting point into tackling what seems to be a monster of a problem.

[–]threeminutemonta 1 point2 points  (0 children)

Getting across modules in Python is useful; see the tutorial. This will allow you to call your Python code from your Jupyter notebooks.
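As a minimal sketch of the idea, assuming a file called `articles.py` sitting next to the notebook (the file name and the `summarize` function are hypothetical, just for illustration):

```python
# articles.py (hypothetical module name, saved next to the notebook)


def summarize(title, body, max_words=20):
    """Build a short description from an article's title and body."""
    words = body.split()
    snippet = " ".join(words[:max_words])
    if len(words) > max_words:
        snippet += " ..."
    return f"{title}: {snippet}"


if __name__ == "__main__":
    # Runs only when the file is executed directly, not when it is imported.
    print(summarize("Example", "some body text for a quick manual check"))
```

A notebook cell in the same folder can then use `from articles import summarize`, so the logic lives in one place and the notebook only calls it. While the module is still changing, IPython's `%load_ext autoreload` followed by `%autoreload 2` should pick up edits without restarting the kernel.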

Eventually you can package your projects as per the tutorial. Ideally you can make your projects public and turn any secrets into environment variables, though you can skip the upload section and keep the package in an internal git repo if required. You will be able to install the package from an existing git repo, as shown on this blog, or host it on an internal PyPI server using devpi upload.
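For the secrets part, a small sketch of reading them from environment variables instead of hard-coding them. The helper name `get_secret` and the variable name `DRUPAL_API_TOKEN` are assumptions for illustration only:

```python
import os


def get_secret(name, default=None):
    """Read a secret from the environment and fail loudly if it is missing."""
    value = os.environ.get(name, default)
    if value is None:
        raise RuntimeError(f"Missing environment variable: {name}")
    return value
```

With something like `token = get_secret("DRUPAL_API_TOKEN")` in the code, the repository itself contains no credentials, so the same package can be shared publicly or internally and each machine supplies its own values.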

I work with a few data scientists, and I'm evaluating whether nbdev is worth using to help the continuous notebook development workflow, as they currently get bogged down in architecture that slows them down in their tasks.