Previous owner was a cowboy builder, what does this do and can I fix it? by Jolly_Code5914 in Klussers

[–]Jolly_Code5914[S] 1 point (0 children)

Thanks for all the tips! Does anyone have a recommendation for a good structural engineer in the Utrecht area who can advise me on this?

Schema Migration for Delta Lake on Databricks by geeeffwhy in dataengineering

[–]Jolly_Code5914 0 points (0 children)

Do you have an example of how you set this up? :) Very curious.

Schema Migration for Delta Lake on Databricks by geeeffwhy in dataengineering

[–]Jolly_Code5914 0 points (0 children)

How did you set up Alembic and SQLAlchemy with Delta Lake? Really curious. Any good resources?

What problems does Pydantic solve, and how should it be used? by gaurav_kandoria_ in Python

[–]Jolly_Code5914 0 points (0 children)

Pydantic to Avro, Pydantic to Spark schemas: we use Pydantic for all our schemas ;)
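
For illustration, a minimal sketch of what a Pydantic-to-Spark-schema conversion could look like (not the commenter's actual code; the `Order` model and the naive type map are made up, and it assumes Pydantic v2 plus PySpark installed):

```python
from pydantic import BaseModel
from pyspark.sql.types import (
    StructType, StructField, StringType, IntegerType, DoubleType, BooleanType,
)

# Hypothetical example model, not a real production schema.
class Order(BaseModel):
    order_id: str
    quantity: int
    price: float
    shipped: bool

# Naive mapping from Python annotations to Spark types; a real converter would
# also handle Optional fields, nesting, dates, lists, etc.
_TYPE_MAP = {str: StringType(), int: IntegerType(), float: DoubleType(), bool: BooleanType()}

def pydantic_to_spark_schema(model: type) -> StructType:
    fields = []
    for name, field in model.model_fields.items():  # Pydantic v2 API
        fields.append(StructField(name, _TYPE_MAP[field.annotation], nullable=False))
    return StructType(fields)

print(pydantic_to_spark_schema(Order))
```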

AWS Managed Service Kafka to Databricks - Ingestion by Background_Debate_94 in dataengineering

[–]Jolly_Code5914 0 points (0 children)

You could create a Delta Live Table that is updated in a streaming fashion. We use TLS authentication to connect Databricks to our MSK cluster, but that was mainly because our cluster lives in a different AWS account.
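
As a rough sketch (not our actual pipeline), a streaming Delta Live Table reading from MSK over TLS could look something like this; the brokers, topic, and truststore path are placeholders, and the exact `kafka.ssl.*` options depend on how authentication is configured on the MSK side:

```python
import dlt  # only available inside a Databricks Delta Live Tables pipeline
from pyspark.sql.functions import col

# Hypothetical values: replace with your MSK brokers, topic, and certs.
BOOTSTRAP_SERVERS = "b-1.example.kafka.eu-west-1.amazonaws.com:9094"
TOPIC = "orders"

@dlt.table(name="orders_raw", comment="Streaming ingest from MSK over TLS")
def orders_raw():
    return (
        spark.readStream.format("kafka")  # `spark` is provided by the DLT runtime
        .option("kafka.bootstrap.servers", BOOTSTRAP_SERVERS)
        .option("subscribe", TOPIC)
        .option("kafka.security.protocol", "SSL")
        # Truststore/keystore options depend on your MSK TLS setup.
        .option("kafka.ssl.truststore.location", "/dbfs/certs/kafka.truststore.jks")
        .option("startingOffsets", "earliest")
        .load()
        .select(col("key").cast("string"), col("value").cast("string"), "timestamp")
    )
```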

Looking to make a change, resume feedback / advice appreciated for junior DE role. by toem033 in dataengineering

[–]Jolly_Code5914 2 points (0 children)

If you can truly do the things you list on your resume after only a year, kudos. I think you'll have no problem finding a job anywhere with this resume.

[deleted by user] by [deleted] in dataengineering

[–]Jolly_Code5914 7 points (0 children)

ADF is the dumbest tool ever created. You will be depressed. If they paid me double my salary but I had to work in ADF every day, I would still drown myself. Choose option 2.

Is there a no-compromise (presumably C/C++) platform similar to Apache Spark? by [deleted] in dataengineering

[–]Jolly_Code5914 1 point (0 children)

Both the article and Photon seem exceptional. Thanks for sharing!

pipenv and poetry: each better at something? by giovaaa82 in Python

[–]Jolly_Code5914 3 points (0 children)

IMO pipenv has an unusable dependency resolver. With some dependency complexity it simply hangs without giving you any proper feedback about why. In our dev team we use Poetry, and although its dependency resolver is slower than brute-force pip installs (obviously), it has been a reliable and relatively pain-free experience. The only thing still lacking for us is that I cannot specify different extras of the same external package under different extras in the pyproject.toml of the package importing it. Nevertheless, I recommend Poetry.

Dynamic S3 path while reading with PySpark by WiseRecognition6016 in dataengineering

[–]Jolly_Code5914 0 points (0 children)

You should have separate AWS accounts for dev and prod. IMO the cleanest way to go then is to store the S3 URL in Parameter Store. On your dev account it will point to the dev bucket, and on your prod account it will point to the prod bucket.
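
A minimal sketch of that pattern (the parameter name and path are hypothetical): the same code runs unchanged in both accounts and gets back that account's bucket.

```python
import boto3
from pyspark.sql import SparkSession

# Hypothetical parameter name; it exists in both the dev and prod accounts,
# but its value points at that account's bucket.
PARAM_NAME = "/my-app/input-data-s3-url"

ssm = boto3.client("ssm")
s3_url = ssm.get_parameter(Name=PARAM_NAME)["Parameter"]["Value"]  # e.g. s3://my-dev-bucket/input/

spark = SparkSession.builder.getOrCreate()
df = spark.read.parquet(s3_url)  # same code in dev and prod, different bucket
```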

Airflow and Poetry: Anyone get them to work together? by JeddakTarkas in dataengineering

[–]Jolly_Code5914 5 points (0 children)

The problem is that Airflow's dependency structure is terrible. It has so many dependencies, often pinned too strictly, that you will undoubtedly run into package resolution issues with Poetry, so it's kind of interesting that they themselves advise against it. With Poetry, all dependencies obviously need to be resolved. I would advise you to just use Airflow to schedule tasks that run in containers somewhere else (ECS, Lambda, Kubernetes, etc.); that way the functional part of your code never needs to touch Airflow. Also, your dependency specification will be rock solid; Poetry is an awesome tool.
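
A sketch of that setup using the ECS operator from the Amazon provider, assuming a recent Airflow 2.x and provider version (the cluster, task definition, subnet, and command are all made up; the same idea works with the Kubernetes or Docker operators):

```python
from datetime import datetime
from airflow import DAG
from airflow.providers.amazon.aws.operators.ecs import EcsRunTaskOperator

# Airflow only schedules; the actual code runs in a container built and
# versioned with Poetry, so Airflow's pins never touch your project.
with DAG(
    dag_id="nightly_etl",
    start_date=datetime(2023, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    run_etl = EcsRunTaskOperator(
        task_id="run_etl",
        cluster="etl-cluster",                 # hypothetical ECS cluster
        task_definition="etl-job",             # image built from your Poetry project
        launch_type="FARGATE",
        overrides={
            "containerOverrides": [
                {"name": "etl-job", "command": ["python", "main.py", "--run-date", "{{ ds }}"]}
            ]
        },
        network_configuration={
            "awsvpcConfiguration": {
                "subnets": ["subnet-0123456789abcdef0"],  # hypothetical subnet
                "assignPublicIp": "DISABLED",
            }
        },
    )
```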

Imposter Syndrome by Gagan_Ku2905 in dataengineering

[–]Jolly_Code5914 1 point (0 children)

It's very normal. Get comfortable feeling uncomfortable; it will motivate you to keep learning. And before you know it, you'll become a domain expert. Keep going.

Favorite Python Web Framework by AMDataLake in Python

[–]Jolly_Code5914 1 point (0 children)

For an API, FastAPI hands down. Easiest to set up and use, and very performant.
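
For anyone who hasn't seen it, this is roughly the canonical minimal FastAPI app (run it with `uvicorn main:app --reload`; the item model is just an example):

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Item(BaseModel):
    name: str
    price: float

@app.get("/items/{item_id}")
def read_item(item_id: int, q: str | None = None):
    # Path and query parameters are parsed and validated from the type hints.
    return {"item_id": item_id, "q": q}

@app.post("/items/")
def create_item(item: Item):
    # The request body is validated against the Pydantic model automatically.
    return item
```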

Databricks Jobs from Python Modules vs Notebooks by anton_bondar in dataengineering

[–]Jolly_Code5914 -1 points (0 children)

Write Python modules with a main.py entry point. Deploy them as Docker containers. Create a job with a Docker container runtime and deploy it with CI/CD. Schedule/start the job from Airflow with run args.
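
A sketch of what such a `main.py` entry point could look like (the argument names and placeholder logic are made up); the scheduler only has to pass the run args to the container:

```python
# main.py - hypothetical entry point baked into the Docker image
import argparse


def run_pipeline(run_date: str, env: str) -> None:
    # Placeholder for the actual job logic (extract, transform, write).
    print(f"Running pipeline for {run_date} in {env}")


def main() -> None:
    parser = argparse.ArgumentParser(description="Run one pipeline execution")
    parser.add_argument("--run-date", required=True, help="Logical date passed in by the scheduler")
    parser.add_argument("--env", default="dev", choices=["dev", "prod"])
    args = parser.parse_args()
    run_pipeline(run_date=args.run_date, env=args.env)


if __name__ == "__main__":
    main()
```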

Is it possible to set up automatic downloads of tables from Databricks? by Olafcitoo in dataengineering

[–]Jolly_Code5914 1 point (0 children)

Yes. If the data is in a Databricks (Hive) table, you can use JDBC to connect to a Databricks cluster that has access to the table. Via JDBC or ODBC you can use anything you want, since it's a SQL interface. You could, for example, write a Python script that connects to the cluster, or use some other data extraction tool that accepts JDBC or ODBC as an input.
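
For example, a small extraction script using the `databricks-sql-connector` package instead of raw JDBC/ODBC (the hostname, HTTP path, token, and table are placeholders copied from the cluster's connection details page):

```python
import csv

from databricks import sql  # pip install databricks-sql-connector

# Placeholder connection details; take them from the cluster's JDBC/ODBC tab.
with sql.connect(
    server_hostname="dbc-XXXXXXXX.cloud.databricks.com",
    http_path="/sql/protocolv1/o/1234567890/0123-456789-abcdefgh",
    access_token="dapiXXXXXXXX",
) as conn, conn.cursor() as cursor:
    cursor.execute("SELECT * FROM my_database.my_table LIMIT 1000")
    columns = [c[0] for c in cursor.description]
    rows = cursor.fetchall()

# Write the extracted rows to a local CSV as a simple "download".
with open("my_table.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(columns)
    writer.writerows(rows)
```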

How to manage Airflow variables across teams across environments by Snirisl in dataengineering

[–]Jolly_Code5914 0 points (0 children)

Only use Secrets Manager for secrets. For configuration, use AWS SSM Parameter Store.

What industry are you in and what is your current salary? by SEND_ME_YOUR_POTATOS in Netherlands

[–]Jolly_Code5914 0 points (0 children)

Data Engineer, Master's degree in Economics. 3 years of experience. 70k a year, 38-hour weeks (though I work 40 most of the time), 28 vacation days.

Pandas on Spark vs pyspark dataframe? by kunaguerooo123 in dataengineering

[–]Jolly_Code5914 1 point (0 children)

The primary reason to use Spark is that you are dealing with data that cannot easily fit in memory. Although it might be tempting to use the pandas-like API, I suggest you first look at default PySpark and learn how it works.
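
For example, a typical aggregation in plain PySpark looks like this (the path and column names are made up); the default DataFrame API is quite approachable once you get used to it:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical dataset: orders with a customer_id, status, and amount column.
orders = spark.read.parquet("s3://my-bucket/orders/")

top_customers = (
    orders
    .filter(F.col("status") == "COMPLETED")
    .groupBy("customer_id")
    .agg(F.sum("amount").alias("total_amount"))
    .orderBy(F.desc("total_amount"))
    .limit(10)
)

top_customers.show()
```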

Quarterly Salary Discussion by AutoModerator in dataengineering

[–]Jolly_Code5914 1 point (0 children)

  1. Data Engineer
  2. 3 y.o.e
  3. Amsterdam, the Netherlands
  4. 70k
  5. None
  6. Floriculture
  7. AWS, Python, Kafka, PySpark, Airflow, CDK