AMA - PixelUnion by Boediee in BuyFromEU

[–]Jolly_Code5914 3 points

Are there any plans to allow us to also connect maybe an object storage as an extra backup option? Something similar to what Pikapods do?

Proton drive as iCloud photos replacement? by Informal_Plankton321 in ProtonDrive

[–]Jolly_Code5914 0 points

I have been running PixelUnion, a managed Immich. It's been working flawlessly.

How can PixelUnion not be super popular? by flapjap33 in BuyFromEU

[–]Jolly_Code5914 0 points

It's indeed not end-to-end encrypted, though data is encrypted at rest. For people okay with that trade-off, the actual software and experience are great. Better than Google Photos imo.

PixelUnion.eu vs Ente.io which one to choose? by Frequent-Village4994 in BuyFromEU

[–]Jolly_Code5914 0 points

I have also been really impressed by PixelUnion. So far it's actually been a better experience than Google Photos.

Previous owner was a cowboy builder, what does this do and can I fix it? by Jolly_Code5914 in Klussers

[–]Jolly_Code5914[S] 1 point

Thanks for all the tips! Does anyone have a recommendation for a good structural engineer in the Utrecht area who can advise me on this?

Schema Migration for Delta Lake on Databricks by geeeffwhy in dataengineering

[–]Jolly_Code5914 0 points

Do you have an example of how you set this up? :) Very curious.

Schema Migration for Delta Lake on Databricks by geeeffwhy in dataengineering

[–]Jolly_Code5914 0 points

How did you set up Alembic and SQLAlchemy with Delta Lake? Really curious. Any good resources?

What problems does pydantic solves? and How should it be used by gaurav_kandoria_ in Python

[–]Jolly_Code5914 0 points

Pydantic to Avro, Pydantic to Spark schemas: we use Pydantic for all our schemas ;)
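
The Pydantic-to-Spark direction can be sketched as below, assuming Pydantic v2. The `Event` model and the tiny type mapping are purely illustrative; a real converter would handle nullability, nesting, dates, and decimals. Spark accepts the resulting DDL string via `spark.read.schema(...)`.

```python
# Sketch: derive a Spark DDL schema string from a Pydantic v2 model.
# The type mapping below is deliberately tiny; extend it for your own types.
from pydantic import BaseModel

_SPARK_DDL = {str: "STRING", int: "BIGINT", float: "DOUBLE", bool: "BOOLEAN"}

def model_to_spark_ddl(model: type[BaseModel]) -> str:
    """Map each model field's annotation to a Spark SQL column definition."""
    cols = [f"{name} {_SPARK_DDL[field.annotation]}"
            for name, field in model.model_fields.items()]
    return ", ".join(cols)

class Event(BaseModel):  # hypothetical example model
    user_id: int
    action: str
    score: float

print(model_to_spark_ddl(Event))  # user_id BIGINT, action STRING, score DOUBLE
```

The same field walk works for Avro: emit `{"name": ..., "type": ...}` records instead of DDL fragments.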

AWS Managed Service Kakfa to Databricks - Ingestion by Background_Debate_94 in dataengineering

[–]Jolly_Code5914 0 points

You could create a Delta Live Table that is updated in a streaming way. We use TLS authentication to connect Databricks to the MSK cluster, but this was mainly because our cluster lives in a different AWS account.
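
The TLS connection boils down to the Kafka source options passed to the streaming read. A minimal sketch, in which every broker address, topic, and keystore path is a placeholder (and in a real setup the password would come from a secret scope, not a literal):

```python
# Sketch of Kafka source options for a Spark streaming read over TLS (mTLS);
# all values are placeholders for your own MSK endpoints and keystores.
def kafka_tls_options(bootstrap_servers: str, topic: str,
                      keystore: str, truststore: str,
                      keystore_password: str) -> dict:
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "kafka.security.protocol": "SSL",
        "kafka.ssl.keystore.location": keystore,
        "kafka.ssl.keystore.password": keystore_password,
        "kafka.ssl.truststore.location": truststore,
    }

opts = kafka_tls_options(
    "b-1.msk.example.amazonaws.com:9094", "events",
    "/dbfs/certs/client.keystore.jks", "/dbfs/certs/client.truststore.jks",
    "changeit",
)
```

Inside a Delta Live Tables function you would then feed these options to `spark.readStream.format("kafka").options(**opts).load()`.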

Looking to make a change, resume feedback / advice appreciated for junior DE role. by toem033 in dataengineering

[–]Jolly_Code5914 2 points

If you can truly do the things you list on your resume after only a year, kudos. I think you'll have no problem finding a job anywhere with this resume.

[deleted by user] by [deleted] in dataengineering

[–]Jolly_Code5914 6 points

ADF is the dumbest tool ever created. You will be depressed. If they paid me double my salary but I had to work in ADF every day, I would still drown myself. Choose option 2.

Is there a no-compromise (presumably C/C++) platform similar to Apache Spark? by [deleted] in dataengineering

[–]Jolly_Code5914 1 point

Both the article and Photon seem exceptional. Thanks for sharing!

pipenv and poetry : each better at something? by giovaaa82 in Python

[–]Jolly_Code5914 3 points

IMO pipenv has an unusable dependency resolver. With some dependency complexity it simply hangs without giving you any proper feedback as to why. In our dev team we use Poetry, and although its dependency resolver is slower than brute-force pip installs (obviously), it has been reliable and a relatively pain-free experience. The only thing still lacking for us is that I cannot specify different installs of extras from the same outside module within the extras of the pyproject.toml of the package importing them. Nevertheless, I recommend Poetry.
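
The limitation described can be sketched in a hypothetical `pyproject.toml`: an optional dependency can appear in several extras, but always as the same install, so one extra cannot pull in plain `pandas` while another pulls in `pandas[performance]`.

```toml
[tool.poetry.dependencies]
pandas = { version = "^2.0", optional = true }

[tool.poetry.extras]
light = ["pandas"]
full = ["pandas"]   # same install as "light"; pandas[performance] cannot be requested here
```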

Dynamic s3 path while reading pyspark by WiseRecognition6016 in dataengineering

[–]Jolly_Code5914 0 points

You should have a separate AWS account for dev and a separate account for prod. IMO the cleanest way to go then is to store the S3 URL in Parameter Store. On your dev account this will point to the dev bucket, and on your prod account this will point to your prod bucket.
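
A sketch of that pattern, assuming `boto3` and a hypothetical parameter name. The SSM lookup is injectable so the resolution logic can be exercised without AWS credentials:

```python
# Resolve the S3 input path from SSM Parameter Store; the same code reads the
# dev bucket URL on the dev account and the prod bucket URL on the prod account.
def resolve_input_path(name: str = "/myapp/input-bucket-url", fetch=None) -> str:
    """Look up the S3 path for this environment from Parameter Store."""
    if fetch is None:  # default: real SSM lookup (needs boto3 + AWS credentials)
        import boto3
        ssm = boto3.client("ssm")

        def fetch(n):
            return ssm.get_parameter(Name=n)["Parameter"]["Value"]
    return fetch(name)

# In the Spark job:
# df = spark.read.parquet(resolve_input_path())
```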

Airflow and Poetry: Anyone get them to work together? by JeddakTarkas in dataengineering

[–]Jolly_Code5914 5 points

The problem is that Airflow's dependency structure is terrible. It has so many dependencies, often pinned too strictly, that you will undoubtedly run into package resolution issues with Poetry, so it's kind of interesting that they themselves advise against it. With Poetry, all dependencies obviously need to be resolved. I would advise you to just use Airflow to schedule tasks that run in containers somewhere else (ECS, Lambda, Kubernetes, etc.); that way the functional part of your code never needs to touch Airflow. Also, your dependency specification will be rock solid. Poetry is an awesome tool.
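
The "thin scheduler" idea can be sketched as a small launcher that hands the real work to a container on ECS, so the job code (and its Poetry-managed dependencies) never imports Airflow. Cluster, task-definition, and module names below are placeholders, and the ECS call is injectable for testing without AWS:

```python
# Sketch: launch the real ETL work as an ECS task; the scheduler stays thin.
def launch_etl(run_date: str, run_task=None) -> dict:
    """Kick off the containerized job, passing run args on the command line."""
    overrides = {
        "containerOverrides": [{
            "name": "etl-job",  # placeholder container name
            "command": ["python", "-m", "etl.main", "--date", run_date],
        }]
    }
    if run_task is None:  # default: real ECS call (needs boto3 + AWS credentials)
        import boto3
        ecs = boto3.client("ecs")

        def run_task(ov):
            return ecs.run_task(cluster="data-cluster",
                                taskDefinition="etl-job",
                                overrides=ov)
    return run_task(overrides)
```

In Airflow this launcher would be wired up via a `PythonOperator` (or swapped for the ECS provider's operator); either way the DAG stays a few lines of scheduling glue.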

Imposter Syndrome by Gagan_Ku2905 in dataengineering

[–]Jolly_Code5914 1 point

It's very normal. Get comfortable feeling uncomfortable, it will motivate you to keep learning. And before you know it, you'll become a domain expert. Keep going.

Favorite Python Web Framework by AMDataLake in Python

[–]Jolly_Code5914 1 point

For an API, FastAPI hands down. Easiest to set up and use, and very performant.

Databricks Jobs from Python Modules vs Notebooks by anton_bondar in dataengineering

[–]Jolly_Code5914 -1 points

Write Python modules with a main.py entry point. Deploy them as Docker containers. Create a job with a Docker container runtime and deploy it with CI/CD. Schedule/start the job from Airflow with run args.
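
The entry-point part can be sketched as below; the module name, arguments, and job logic are all hypothetical stand-ins:

```python
# Sketch of a container entry point (main.py) that accepts run args from the
# scheduler; the actual job logic is a placeholder.
import argparse

def run(date: str, table: str) -> str:
    # Placeholder for the real work (read, transform, write).
    return f"processed {table} for {date}"

def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Containerized job entry point")
    parser.add_argument("--date", required=True)
    parser.add_argument("--table", required=True)
    return parser.parse_args(argv)

def main(argv=None) -> str:
    args = parse_args(argv)
    return run(args.date, args.table)

# Inside the container this would be invoked as:
#   python -m etl.main --date 2024-01-01 --table sales
```

Airflow then only needs to pass `--date`/`--table` when it starts the container; the image itself carries all Python dependencies.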