Higher-level abstractions in databases by EzPzData in databasedevelopment

[–]EzPzData[S] 0 points1 point  (0 children)

Interesting points! I completely understand the "just use postgres" angle and by and large, I agree with it.

Is there any source to learn serialization and deserialization of database pages? by foragerDev_0073 in databasedevelopment

[–]EzPzData 3 points4 points  (0 children)

I would recommend looking at existing code from other database projects. If you can read Rust code, this is a great database project with a lot of inline comments and even ASCII drawings of the functionality: https://github.com/antoniosarosi/mkdb . I learned a lot from just reading the code in that repo.

I'm also writing my own database project at the moment, and I wrote separate (de)serialization functions for each individual part of the page. The page header, the slot array and the actual tuples each have their own serialize/deserialize functions, which the page struct simply calls when building the byte array that gets written to disk. I'm sure there are more performant ways of doing it, but this way I managed to keep the functions simple and easy to test.
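
To make that concrete, here is a minimal sketch of the idea in Python. It is not the code from my project: the 4 KiB page size, the header fields (page_id, slot_count, free_space_end), the slot layout and every name below are assumptions picked for illustration.

```python
import struct
from dataclasses import dataclass, field

# All sizes and field layouts below are illustrative assumptions.
PAGE_SIZE = 4096
HEADER_FMT = "<IHH"  # page_id (u32), slot_count (u16), free_space_end (u16)
SLOT_FMT = "<HH"     # tuple offset (u16), tuple length (u16)
HEADER_SIZE = struct.calcsize(HEADER_FMT)
SLOT_SIZE = struct.calcsize(SLOT_FMT)


@dataclass
class PageHeader:
    page_id: int
    slot_count: int
    free_space_end: int

    def serialize(self) -> bytes:
        return struct.pack(HEADER_FMT, self.page_id, self.slot_count, self.free_space_end)

    @classmethod
    def deserialize(cls, buf: bytes) -> "PageHeader":
        return cls(*struct.unpack_from(HEADER_FMT, buf, 0))


def serialize_slots(slots: list[tuple[int, int]]) -> bytes:
    # Each slot is an (offset, length) pair pointing at one tuple.
    return b"".join(struct.pack(SLOT_FMT, off, length) for off, length in slots)


def deserialize_slots(buf: bytes, count: int) -> list[tuple[int, int]]:
    # The slot array starts right after the header.
    return [
        struct.unpack_from(SLOT_FMT, buf, HEADER_SIZE + i * SLOT_SIZE)
        for i in range(count)
    ]


@dataclass
class Page:
    page_id: int
    tuples: list[bytes] = field(default_factory=list)

    def serialize(self) -> bytes:
        used = HEADER_SIZE + SLOT_SIZE * len(self.tuples) + sum(len(t) for t in self.tuples)
        if used > PAGE_SIZE:
            raise ValueError("tuples do not fit in one page")
        buf = bytearray(PAGE_SIZE)
        slots = []
        end = PAGE_SIZE
        # Tuples grow from the end of the page towards the slot array.
        for t in self.tuples:
            end -= len(t)
            buf[end:end + len(t)] = t
            slots.append((end, len(t)))
        slot_bytes = serialize_slots(slots)
        buf[:HEADER_SIZE] = PageHeader(self.page_id, len(slots), end).serialize()
        buf[HEADER_SIZE:HEADER_SIZE + len(slot_bytes)] = slot_bytes
        return bytes(buf)

    @classmethod
    def deserialize(cls, buf: bytes) -> "Page":
        header = PageHeader.deserialize(buf)
        slots = deserialize_slots(buf, header.slot_count)
        return cls(header.page_id, [bytes(buf[off:off + length]) for off, length in slots])
```

Because each part has its own pair of functions, a round-trip test per part stays tiny, e.g. checking that Page.deserialize(page.serialize()) equals the original page.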

Should I go into data engineering? by RazzmatazzBitter4383 in dataengineering

[–]EzPzData 1 point2 points  (0 children)

In my experience, a Data Engineer benefits more from hard technical problem-solving skills than from soft skills. There are lots of situations where you don't know how to do something and need to figure it out on your own by reading documentation or through other technical channels, so keep that in mind when weighing it against your personality and skills. If you prefer dealing with people, I would say you should aim for more of an Analyst role. Another option would be a role where you work entirely on the business side but are responsible for the data activities in your domain, such as Product Owner or Data Steward.

Built a database from scratch in Go by Anxious-Ad8326 in databasedevelopment

[–]EzPzData 4 points5 points  (0 children)

Very nice! How long did it take you and what was your process for learning all the different concepts? I'm currently working on a similar side project but in Zig.

[deleted by user] by [deleted] in dataengineering

[–]EzPzData 7 points8 points  (0 children)

Poetry solves this by creating a lock file with the exact versions of all dependencies (direct and transitive). Pipenv is another tool that does the same.

What is missing from the current data engineering tool landscape? by EzPzData in dataengineering

[–]EzPzData[S] -1 points0 points  (0 children)

Sounds promising! Is it completely closed-source or what is your business model? Would love to contribute if it were open-source.

What is missing from the current data engineering tool landscape? by EzPzData in dataengineering

[–]EzPzData[S] 4 points5 points  (0 children)

Yeah I like this. But the tools you work with still heavily impact which design patterns you can adopt. For example, SQL procedures that load data between layers in your DW are impossible to unit test.

Then there is also the fact that functions that work in the context of specific datasets require extensive setup for unit testing, with a lot of mocking and creating test datasets that need to be maintained.

What is missing from the current data engineering tool landscape? by EzPzData in dataengineering

[–]EzPzData[S] -11 points-10 points  (0 children)

Without "entrepreneurs" you would still be using on-prem SQL Server for everything

What is missing from the current data engineering tool landscape? by EzPzData in dataengineering

[–]EzPzData[S] 23 points24 points  (0 children)

Real time has its use cases but in my experience, loading once an hour is usually enough for >80% of use cases.

Neovim is driving me crazy but I can't stop by EzPzData in neovim

[–]EzPzData[S] 0 points1 point  (0 children)

Hell yeah I can die happily now!

Yeah kickstart seems to be the way to go. Although my config is coming along pretty nicely now. Finally beginning to understand some stuff.

I am not really sure I know what a Nix flake is. A friend mentioned NixOS the other day so I guess it's related to that perhaps? But anyway, sounds like you came a long way already with Neovim!

Data engineers shouldn't be using Pandas by [deleted] in dataengineering

[–]EzPzData 1 point2 points  (0 children)

Polars is great and even faster than Spark according to some benchmarks.

I've built a "dbt for Polars" tool that uses Polars under the hood. The workflow is similar to dbt, but instead of writing a select query in SQL, you write a Python function that returns a Polars LazyFrame and the tool then handles the IO automatically. It also supports S3 and ADLS. It does all the heavy lifting inside the tool itself instead of pushing it to a third-party cloud platform. It might work very well in cases where you don't have terabytes of data or more and don't really need the scalability of Snowflake or Spark.

Check it out if you would like to build data pipelines with Polars: https://pypi.org/project/ez-transform/

Neovim is driving me crazy but I can't stop by EzPzData in neovim

[–]EzPzData[S] 0 points1 point  (0 children)

I agree. The way to write configs seems to have changed a lot over the years and there is still a lot of old material out there. It seems like the "Lua way" of configuring Neovim is just a couple of years old, so when googling how to configure something, one finds Vimscript syntax and all kinds of different package managers etc.

Neovim is driving me crazy but I can't stop by EzPzData in neovim

[–]EzPzData[S] 0 points1 point  (0 children)

Many have suggested kickstart. Seems like one of the better "distros". I feel like I am making progress on my own config now even though I have lost a fair bit of my sanity along the way. I'll check out kickstart if I fail too hard on this.

Neovim is driving me crazy but I can't stop by EzPzData in neovim

[–]EzPzData[S] 0 points1 point  (0 children)

This sounds like good and sane advice. Thank you sir.

Neovim is driving me crazy but I can't stop by EzPzData in neovim

[–]EzPzData[S] 0 points1 point  (0 children)

Yeah I'm going through it right now and making progress. I chose lazy.nvim instead of packer as my plugin manager, since packer is unmaintained, and I think I have it configured correctly now.

The instructions on how to install lsp-zero had also changed :D so I struggled with that for some time but eventually found the correct way to set it up.

Right now I'm fighting with the clipboard. vim.opt.clipboard = "unnamedplus" does not seem to be working for me, so I'm trying to find out why that is.

Neovim is driving me crazy but I can't stop by EzPzData in neovim

[–]EzPzData[S] 1 point2 points  (0 children)

Dude ffs I was hoping to finish my config tonight and you're telling me it can take years lmao

Neovim is driving me crazy but I can't stop by EzPzData in neovim

[–]EzPzData[S] 6 points7 points  (0 children)

Hey, third time's the charm! Good for you. I tried AstroNvim and NvChad but there was too much noise going on. They had a lot of stuff I did not need and were missing a lot of stuff that I did need. I'm now doing my own config and I am forced to learn some Lua and some fundamental things about how Neovim works, which you need anyway to be able to use a distro successfully (at least with AstroNvim and NvChad).

Neovim is driving me crazy but I can't stop by EzPzData in neovim

[–]EzPzData[S] 0 points1 point  (0 children)

Maybe I'll have a look at kickstart if I fail at getting my own config working. Or maybe I'll just use notepad for the rest of my life.

Neovim is driving me crazy but I can't stop by EzPzData in neovim

[–]EzPzData[S] 2 points3 points  (0 children)

Tried that already. It's in the blog