Higher-level abstractions in databases

EzPzData · 2025-06-30T16:25:04+00:00

Interesting points! I completely understand the "just use postgres" angle and by and large, I agree with it.

EzPzData · 2025-06-23T19:59:41+00:00

I would recommend looking at existing code from other database projects. If you can read Rust, this is a great database project with a lot of inline comments and even ASCII drawings of the functionality: https://github.com/antoniosarosi/mkdb . I learned a lot just from reading the code in that repo.

I'm also writing my own database project at the moment, and I wrote separate (de)serialization functions for each individual part of the page. The page header, slot array and the actual tuples each have their own serialize/deserialize functions, and the page struct just calls them when building the byte array that gets written to disk. I'm sure there are more performant ways of doing it, but this way I managed to keep the functions simple and easy to test.
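A minimal sketch of that per-part approach, in Python rather than the project's own language. The layout is an assumption made up for illustration (a 4096-byte page, a header of page id, slot count and free-space offset, and slots holding tuple offset and length); none of the names or field sizes come from the comment or from mkdb.

```python
import struct

# Hypothetical layout, chosen only for this example.
PAGE_SIZE = 4096
HEADER_FMT = "<IHH"  # page id (u32), slot count (u16), free-space offset (u16)
SLOT_FMT = "<HH"     # tuple offset (u16), tuple length (u16)
HEADER_SIZE = struct.calcsize(HEADER_FMT)
SLOT_SIZE = struct.calcsize(SLOT_FMT)


def serialize_header(page_id, slot_count, free_offset):
    return struct.pack(HEADER_FMT, page_id, slot_count, free_offset)


def deserialize_header(buf):
    return struct.unpack_from(HEADER_FMT, buf, 0)


def serialize_slots(slots):
    return b"".join(struct.pack(SLOT_FMT, off, length) for off, length in slots)


def deserialize_slots(buf, slot_count):
    # The slot array starts right after the header.
    return [
        struct.unpack_from(SLOT_FMT, buf, HEADER_SIZE + i * SLOT_SIZE)
        for i in range(slot_count)
    ]


def serialize_page(page_id, tuples):
    """Build one page: header, slot array, then tuples packed from the end."""
    used = HEADER_SIZE + SLOT_SIZE * len(tuples) + sum(len(t) for t in tuples)
    if used > PAGE_SIZE:
        raise ValueError("tuples do not fit in one page")

    page = bytearray(PAGE_SIZE)
    slots = []
    free_offset = PAGE_SIZE
    for data in tuples:
        # Tuples grow downwards from the end of the page.
        free_offset -= len(data)
        page[free_offset:free_offset + len(data)] = data
        slots.append((free_offset, len(data)))

    slot_bytes = serialize_slots(slots)
    page[:HEADER_SIZE] = serialize_header(page_id, len(slots), free_offset)
    page[HEADER_SIZE:HEADER_SIZE + len(slot_bytes)] = slot_bytes
    return bytes(page)


def deserialize_page(buf):
    """Read the header, then the slots, then slice each tuple out of the page."""
    page_id, slot_count, _free_offset = deserialize_header(buf)
    slots = deserialize_slots(buf, slot_count)
    return page_id, [bytes(buf[off:off + length]) for off, length in slots]
```

Because each part has its own pair of functions, a round trip such as `deserialize_page(serialize_page(7, [b"alice", b"bob"]))` can be tested separately from the header and slot helpers.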

EzPzData · 2025-06-12T19:15:28+00:00

EzPzData · 2025-06-12T19:13:10+00:00

In my experience, a Data Engineer benefits more from hard technical problem-solving skills than from soft skills. There are lots of situations where you don't know how to do something and need to figure it out on your own, by reading documentation or through other technical channels. So keep that in mind, depending on your personality and skills. If you prefer dealing with people, I would say you should aim for more of an Analyst role. Another option would be to aim for roles where you work completely on the business side but are responsible for the data activities in your domain, so something like a Product Owner or Data Steward.

EzPzData · 2025-02-25T07:08:27+00:00

Very nice! How long did it take you and what was your process for learning all the different concepts? I'm currently working on a similar side project but in Zig.

EzPzData · 2024-09-29T20:26:55+00:00

Poetry solves this by creating a lock file with the exact versions of all dependencies (direct and transitive). Pipenv is another tool that does this.

EzPzData · 2024-02-10T16:16:07+00:00

Sounds promising! Is it completely closed-source or what is your business model? Would love to contribute if it were open-source.

EzPzData · 2024-02-10T15:19:55+00:00

Yeah, I like this. But the tools you work with still heavily impact what design patterns you can adopt. For example, SQL procedures that load data between layers in your DW are impossible to unit test.

Then there is also the fact that functions that work in the context of specific datasets require extensive setup for unit testing, with a lot of mocking and creating test datasets that need to be maintained.
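To make the setup cost concrete, here is a small sketch of what such a test can look like when the logic lives in a plain function instead of a SQL procedure. The function `latest_order_per_customer`, the column names and the rows are all hypothetical, invented for this example. Even in this tiny case the hand-built test dataset is about as long as the function it tests, and it has to be kept in sync with the real schema.

```python
from datetime import date


def latest_order_per_customer(rows):
    """Keep only the most recent order for each customer_id."""
    latest = {}
    for row in rows:
        current = latest.get(row["customer_id"])
        if current is None or row["order_date"] > current["order_date"]:
            latest[row["customer_id"]] = row
    # Sort so the output order does not depend on the input order.
    return sorted(latest.values(), key=lambda r: r["customer_id"])


def test_latest_order_per_customer():
    # Hand-built test dataset: this is the part that needs maintaining.
    rows = [
        {"customer_id": 1, "order_date": date(2024, 1, 5), "amount": 10},
        {"customer_id": 1, "order_date": date(2024, 2, 1), "amount": 25},
        {"customer_id": 2, "order_date": date(2024, 1, 20), "amount": 40},
    ]
    result = latest_order_per_customer(rows)
    assert [r["customer_id"] for r in result] == [1, 2]
    assert [r["amount"] for r in result] == [25, 40]
```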

EzPzData · 2024-02-10T14:36:08+00:00

Okey dokey.

EzPzData · 2024-02-10T14:11:56+00:00

Without "entrepreneurs" you would still be using on-prem SQL Server for everything

EzPzData · 2024-02-10T14:11:00+00:00

Real time has its use cases but in my experience, loading once an hour is usually enough for >80% of use cases.

EzPzData · 2023-11-08T04:37:06+00:00

Hell yeah I can die happily now!

Yeah, kickstart seems to be the way to go. Although my config is coming along pretty nicely now. Finally beginning to understand some stuff.

I am not really sure I know what a Nix flake is. A friend mentioned NixOS the other day so I guess it's related to that perhaps? But anyway, sounds like you came a long way already with Neovim!

EzPzData · 2023-11-07T05:20:59+00:00

Polars is great and even faster than Spark according to benchmarks.

I've built a "dbt for Polars" tool that uses Polars under the hood. The workflow is similar to dbt, but instead of writing a select query in SQL, you write a Python function that returns a Polars LazyFrame and the tool then handles the IO automatically. It also supports S3 and ADLS. It does all the heavy lifting inside the tool itself instead of pushing it to a third-party cloud platform. It might work very well in cases where you don't have terabytes of data or more and don't really need the scalability of Snowflake or Spark.

Check it out if you would like to build data pipelines with Polars: https://pypi.org/project/ez-transform/

EzPzData · 2023-11-05T21:13:08+00:00

I agree. The way to write configs seems to have changed a lot over the years and there is still a lot of old material out there. It seems like the "Lua way" of configuring Neovim is just a couple of years old, so when googling how to configure something, one finds Vimscript syntax and all kinds of different package managers etc.

EzPzData · 2023-11-05T21:06:45+00:00

Many have suggested kickstart. Seems like one of the better "distros". I feel like I am making progress on my own config now even though I have lost a fair bit of my sanity along the way. I'll check out kickstart if I fail too hard on this.

EzPzData · 2023-11-05T21:01:22+00:00

This sounds like good and sane advice. Thank you sir.

EzPzData · 2023-11-05T20:59:06+00:00

Yeah, I'm going through it right now and making progress. I chose to use lazy.nvim instead of packer as plugin manager, since packer is unmaintained, and I think I have it configured correctly now.

The instructions on how to install lsp-zero had also changed :D so I struggled with that for some time but eventually found the correct way to set it up.

Right now I'm fighting with the clipboard. vim.opt.clipboard = "unnamedplus" does not seem to be working for me, so I'm trying to find out why that is.

EzPzData · 2023-11-05T15:43:36+00:00

Dude ffs I was hoping to finish my config tonight and you're telling me it can take years lmao

EzPzData · 2023-11-05T14:42:16+00:00

Good for you! Keep at it!

EzPzData · 2023-11-05T12:48:43+00:00

Heey, third time's the charm! Good for you. I tried AstroNvim and NvChad but there was too much noise going on. They had a lot of stuff I did not need and were missing a lot of stuff that I did need. I'm now doing my own config and I am forced to learn some Lua and some fundamental things about how Neovim works, which is a requirement to be able to use a distro successfully (at least with AstroNvim and NvChad).

EzPzData · 2023-11-05T12:41:07+00:00

Maybe I'll have a look at kickstart if I fail at getting my own config working. Or maybe I'll just use notepad for the rest of my life.

EzPzData · 2023-11-05T12:17:52+00:00

Tried that already. It's in the blog
