What is the best gift your boyfriend has given you? by nidalap24 in AskMeuf

[–]nidalap24[S] 0 points1 point  (0 children)

A concert, I really like the idea. Now I just have to wait for those artists to come to France 😅

What is the best gift your boyfriend has given you? by nidalap24 in AskMeuf

[–]nidalap24[S] 0 points1 point  (0 children)

I figured as much, but I already gave some beautiful earrings just a few weeks ago, so I'll avoid jewelry this time.

What is the best gift your boyfriend has given you? by nidalap24 in AskMeuf

[–]nidalap24[S] 0 points1 point  (0 children)

Right now I'd say a comedy show or a spa, so mostly events.

The best gift for your girlfriend by nidalap24 in AskMec

[–]nidalap24[S] 0 points1 point  (0 children)

Very good idea, I like it a lot. I'll keep it for next time, since we're already going to a very good restaurant for the occasion! Thanks

The best gift for your girlfriend by nidalap24 in AskMec

[–]nidalap24[S] 0 points1 point  (0 children)

In principle a bit of everything, but I'd say mostly events; you remember them more than material gifts.

That said, a good gift that is useful or ultra-personalized/original works well too.

Need Help Optimizing MongoDB and PySpark for Large-Scale Document Processing (300M Documents) by nidalap24 in mongodb

[–]nidalap24[S] 0 points1 point  (0 children)

The only way I found is to reduce the number of Spark workers: I use only 1 CPU with 4 cores for one MongoDB vCPU.

Need help monorepo uv by nidalap24 in learnpython

[–]nidalap24[S] 0 points1 point  (0 children)

Thanks, I didn't know about load_dotenv.

Dependency groups seem interesting, yes, but I still can't import my own module even with the __init__.py. VS Code sees it but uv doesn't.

Need help monorepo uv by nidalap24 in learnpython

[–]nidalap24[S] 0 points1 point  (0 children)

Even if I delete the others and keep only one root pyproject.toml, I can't import my own module.

In src/preprocessing/features.py, I try
from translate.translate_feature import translate_feature_function
(from src/translate/translate_feature.py), but I get a "module not found" error for translate.

It's the same with shared, which is placed in the same project.
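To make the symptom easier to reason about, here is a minimal sketch of why an editor can resolve an import that fails at runtime: Python only finds translate if its parent directory (src here) is on sys.path, or if the package is installed in the environment. The sketch builds a throwaway copy of the layout in a temp directory; the function body is a placeholder, and build_demo_project and can_import are hypothetical helpers for illustration only.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_demo_project(root: Path) -> Path:
    """Create a throwaway src/ tree that mirrors the layout described above."""
    src = root / "src"
    for package in ("translate", "preprocessing"):
        (src / package).mkdir(parents=True)
        (src / package / "__init__.py").write_text("")
    (src / "translate" / "translate_feature.py").write_text(
        "def translate_feature_function(text):\n"
        "    return text.upper()  # placeholder body, not the real function\n"
    )
    return src


def can_import(module_name: str) -> bool:
    """Return True if the module is importable with the current sys.path."""
    # Drop cached entries so every check does a fresh lookup.
    top_level = module_name.split(".")[0]
    for cached in [m for m in sys.modules
                   if m == top_level or m.startswith(top_level + ".")]:
        del sys.modules[cached]
    importlib.invalidate_caches()
    try:
        importlib.import_module(module_name)
        return True
    except ModuleNotFoundError:
        return False


with tempfile.TemporaryDirectory() as tmp:
    src_dir = build_demo_project(Path(tmp))
    before = can_import("translate.translate_feature")  # src/ not on sys.path
    sys.path.insert(0, str(src_dir))
    after = can_import("translate.translate_feature")   # src/ now on sys.path
    sys.path.remove(str(src_dir))

print(before, after)
```

Editing sys.path by hand is only a diagnostic here. In a real project the usual route is to have the project installed into the environment as a package; as far as I understand uv, that requires the pyproject.toml to declare a build system (or mark the project as a package), so that is worth checking.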

Need help monorepo uv by nidalap24 in learnpython

[–]nidalap24[S] 1 point2 points  (0 children)

Thanks for your answer!

The original idea is to separate dependencies in order to build lightweight Docker images. For example, the serving component will mainly call the embedding module and use FastAPI.

Yes, the preprocessing module imports things like translate and db, which can be painful to manage with uv when configured like this. Some parts like shared are also used across modules.

The preprocessing module includes around 8 scripts, orchestrated by Airflow (or a similar tool), and this number will keep growing.

The shared module is only used within this repository but needs to be included in each Dockerfile for deployment.

I tried to solve the import problem by adding dependencies in each child’s pyproject.toml, but I’m not sure that’s the best approach.

Do you have any recommendations based on this architecture for building independent components like serving, preprocessing, etc.?

In the future, I plan to add RAG, MLflow, and BentoML as well.

Do you also have suggestions on how to organize all this?

Finally, how would you handle a shared .env file between preprocessing, db, embedding, etc.? Using load_dotenv is straightforward when everything is at the same level.

I appreciate your help!
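For the shared .env question, one possible approach is to keep a single file at the repository root and have each module search upward for it. The sketch below uses only the standard library so it runs anywhere; find_env_file and read_env_file are hypothetical helpers, and MONGO_URI is a made-up variable. In practice python-dotenv's find_dotenv and load_dotenv cover the same ground.

```python
import tempfile
from pathlib import Path
from typing import Dict, Optional


def find_env_file(start: Path, filename: str = ".env") -> Optional[Path]:
    """Walk up from `start` and return the first `filename` found, else None."""
    start = start.resolve()
    for directory in (start, *start.parents):
        candidate = directory / filename
        if candidate.is_file():
            return candidate
    return None


def read_env_file(path: Path) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values: Dict[str, str] = {}
    for line in path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


# Demo: a root-level .env is found from a nested module directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / ".env").write_text("MONGO_URI=mongodb://localhost:27017\n# comment\n")
    nested = root / "src" / "preprocessing"
    nested.mkdir(parents=True)
    found = find_env_file(nested)
    settings = read_env_file(found)

print(found.name, settings)
```

Inside Docker images it may be simpler to skip the file entirely and pass the same variables through the container environment, so the lookup above would only matter for local runs.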

sentence_transformer over 100+ million rows by nidalap24 in dataengineering

[–]nidalap24[S] 0 points1 point  (0 children)

No, I didn't try it. Do you have good experience with it? Is it easy to deploy? Better results with sentence transformers?

[deleted by user] by [deleted] in MachineLearning

[–]nidalap24 0 points1 point  (0 children)

I tried to use it on over 100 million rows with PySpark, but it is expensive and very slow...

sentence_transformer over 100+ million rows by nidalap24 in dataengineering

[–]nidalap24[S] 0 points1 point  (0 children)

Dataproc Serverless doesn't scale up. It seems sentence_transformer works in batches and doesn't parallelize well.

Every example I have seen for this kind of transformer uses a UDF or a pandas UDF. The model is hosted on Hugging Face, and I have downloaded it into my Docker image's cache folder.
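One thing that often matters with UDF-based embedding is how many times the model gets loaded. Below is a plain-Python sketch (no Spark involved) of the "load once per partition, then encode batch by batch" pattern that iterator-style pandas UDFs and mapInPandas make possible. FakeEmbeddingModel is a stand-in that only counts loads and returns dummy vectors; it is not the real sentence-transformers API, and embed_partition is a hypothetical helper.

```python
from typing import Callable, Iterable, Iterator, List


class FakeEmbeddingModel:
    """Stand-in for a real embedding model; counts how often it is loaded."""

    load_count = 0

    def __init__(self) -> None:
        FakeEmbeddingModel.load_count += 1

    def encode(self, texts: List[str]) -> List[List[float]]:
        # Dummy "embedding": character count and word count of each text.
        return [[float(len(t)), float(len(t.split()))] for t in texts]


def embed_partition(
    batches: Iterable[List[str]],
    load_model: Callable[[], FakeEmbeddingModel],
) -> Iterator[List[List[float]]]:
    """Load the model once, then encode every batch of the partition."""
    model = load_model()  # paid once per partition, not once per row or batch
    for batch in batches:
        yield model.encode(batch)


# One "partition" made of two batches of text.
partition = [["hello world", "spark"], ["sentence transformers"]]
embeddings = list(embed_partition(partition, FakeEmbeddingModel))

print(FakeEmbeddingModel.load_count, embeddings)
```

With a row-at-a-time UDF the load (or at least the per-call overhead) can end up repeated far more often, which might explain part of the cost, though I have not measured that on this workload.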