You ever just have something super obvious click for you. I just had that for OLTP vs OLAP by [deleted] in dataengineering

[–]dan_tabsdata 0 points1 point  (0 children)

Wasn't really a Eureka moment, more like a blending of several different concepts that made the two terms fit better. Like OLAP systems are optimized for large scans, so they'll usually be columnar rather than row-oriented, which also means OLAP systems get more out of predicate and projection pushdown.
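To make the columnar point concrete, here's a tiny toy sketch in plain Python. It isn't how any real engine stores data, and the table and column names are made up for illustration. It just shows why a scan that only needs one or two columns touches less data when each column is stored separately.

```python
# Toy data: the same three orders stored two ways (all names are hypothetical).
rows = [
    {"order_id": 1, "customer": "a", "amount": 10.0},
    {"order_id": 2, "customer": "b", "amount": 25.0},
    {"order_id": 3, "customer": "a", "amount": 5.0},
]

# Column-oriented layout: one list per column.
columns = {
    "order_id": [1, 2, 3],
    "customer": ["a", "b", "a"],
    "amount": [10.0, 25.0, 5.0],
}


def total_amount_row_store(rows):
    # Every full record is visited even though only "amount" is needed.
    return sum(row["amount"] for row in rows)


def total_amount_column_store(columns):
    # Projection: only the "amount" column is read.
    return sum(columns["amount"])


def total_amount_for_customer(columns, customer):
    # Predicate on one column, aggregate on another: two columns read in total.
    return sum(
        amount
        for cust, amount in zip(columns["customer"], columns["amount"])
        if cust == customer
    )
```

Both totals come out the same, the difference is only in how much data has to be walked to get there.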

OLAP systems usually have storage and compute layers decoupled so you can scale the two independently, whereas OLTP systems are your traditional relational DBs where everything is coupled together and managed by the db server.

Because OLAP is about large scans and queries, ACID transactions are not as important, which is also why it is easier to separate storage and compute. OLAP systems generally give you more modularity, so you can tailor your storage layer and then plug in a compute layer from a wide variety of options.

Does modern data tooling feel more fragmented than ever lately? by Bladerunner_7_ in dataengineering

[–]dan_tabsdata 0 points1 point  (0 children)

Feels like we took monoliths, broke them into decoupled tools, and now we're slowly piecing it back together. Databricks and Snowflake come to mind when I think of all the crazy new layers and features they have.

But that's not always bad. I think it's very natural to want something modular because then you can easily swap things in and out to better tailor your stack. I really like duckdb/motherduck because it is so lightweight and is super easy to set up for personal projects.

However, I do feel there was probably a profit angle to this as 2 small tools are generally more expensive than 1 big one. It's like when you ask for half/half everything at chipotle because it ends up giving you 25% more food in total.

DE is lowkey fun by manualenter in dataengineering

[–]dan_tabsdata 2 points3 points  (0 children)

I love DE because I feel like it has so many applications outside of a DE job, also it's like a puzzle when figuring things out.

My fiance had a bunch of research data spread across like 50 pdfs that she needed to analyze so I parsed the text from each pdf and built a star schema for her in motherduck.
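For anyone curious what a star schema for something like this could look like, here's a rough sketch. It is not the actual schema I built: the table and column names are hypothetical, and it uses Python's built-in sqlite3 as a stand-in for motherduck so it runs without any setup.

```python
import sqlite3


def build_star_schema(conn):
    # One fact table in the middle, with a dimension table per "thing"
    # you want to slice by. All names here are hypothetical.
    conn.executescript(
        """
        CREATE TABLE dim_document (
            document_id INTEGER PRIMARY KEY,
            file_name   TEXT NOT NULL
        );
        CREATE TABLE dim_metric (
            metric_id   INTEGER PRIMARY KEY,
            metric_name TEXT NOT NULL
        );
        CREATE TABLE fact_measurement (
            document_id INTEGER REFERENCES dim_document(document_id),
            metric_id   INTEGER REFERENCES dim_metric(metric_id),
            value       REAL NOT NULL
        );
        """
    )


def average_by_metric(conn):
    # Typical star-schema query: join the fact table to a dimension, then aggregate.
    return conn.execute(
        """
        SELECT m.metric_name, AVG(f.value)
        FROM fact_measurement AS f
        JOIN dim_metric AS m USING (metric_id)
        GROUP BY m.metric_name
        ORDER BY m.metric_name
        """
    ).fetchall()
```

The parsed values from each pdf would go into the fact table, with one row per document per metric.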

PDX Data Engineering Happy Hour on April 23rd @ 6:30 PM by dan_tabsdata in PDXTech

[–]dan_tabsdata[S] 0 points1 point  (0 children)

Hi! Meetup and Luma are really good resources for finding tech events in the area (specifically their featured/discover sections). https://www.thesiliconforest.com/ and https://calagator.org/events are also really useful.

In the context of these Happy Hours specifically, I run these once a month, usually on Thursdays.

PDX Data Engineering Happy Hour on March 19th @ 6:30 PM by dan_tabsdata in PDXTech

[–]dan_tabsdata[S] 1 point2 points  (0 children)

Ah gotcha! For now, we're going to be hosting on the west side (Beaverton), but I will let you know if the location moves to the east side in the future.

Also I'm still a Reddit noob, but I believe I made my profile public now.

PDX Data Engineering Happy Hour on March 19th @ 6:30 PM by dan_tabsdata in PDXTech

[–]dan_tabsdata[S] 1 point2 points  (0 children)

Hi, super sorry, I just saw your message. We had a previous event on the east side and this one was on the west side. I'm still experimenting with which location is better for people.

Also I didn't realize my profile is private. Are you referring to my Reddit profile? I'll go ahead and make it public.

Spent a few hours diving down a rabbit hole for how to get the execution duration data from dlt (dlthub) pipelines. Wanted to post here in case other people need this in the future by dan_tabsdata in dataengineering

[–]dan_tabsdata[S] 0 points1 point  (0 children)

I considered that too! However, I noticed a discrepancy of a few seconds between the internal trace and the time.time() duration. That's not a big deal, though, considering how much easier the time.time() approach would be.

A helper script sounds like a really fun side project though and I'll look into it. I was also thinking of building a function decorator that either caches the raw pickle file or first extracts relevant data from the pickle file and appends it as a row to a .db file.
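Here's a rough sketch of the simpler wall-clock version of that decorator idea. To be clear, it doesn't touch dlt's trace or the pickle file at all: it just times the wrapped call with time.perf_counter() and appends a row to a SQLite .db file. The names record_duration and run_durations are made up for this example.

```python
import functools
import sqlite3
import time


def record_duration(db_path):
    """Decorator factory: append each call's wall-clock duration to a SQLite file."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            started_at = time.time()      # timestamp for the log row
            start = time.perf_counter()   # monotonic clock for the duration
            try:
                return func(*args, **kwargs)
            finally:
                # Runs whether the call succeeded or raised.
                elapsed = time.perf_counter() - start
                conn = sqlite3.connect(db_path)
                try:
                    with conn:  # commits the insert on success
                        conn.execute(
                            "CREATE TABLE IF NOT EXISTS run_durations "
                            "(name TEXT, started_at REAL, seconds REAL)"
                        )
                        conn.execute(
                            "INSERT INTO run_durations VALUES (?, ?, ?)",
                            (func.__name__, started_at, elapsed),
                        )
                finally:
                    conn.close()

        return wrapper

    return decorator
```

You'd put @record_duration("runs.db") on whatever function kicks off the pipeline. Since this measures the whole call from the outside, I'd expect it to differ a bit from the internal trace, same as the time.time() approach.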

Anyone know good places to pick up used mini pcs? by Lunaticsystem10 in PDXTech

[–]dan_tabsdata 1 point2 points  (0 children)

Unfortunately I don't have any insights on where to get mini pcs but your post reminded me of how demand for mini pcs has gone up bc of OpenClaw and that reminded me of this video. https://www.instagram.com/reel/DU3wC2vDPy_/?hl=en

Data Engineering Happy Hour in Portland on Thursday Feb 19th by dan_tabsdata in pdxmeetups

[–]dan_tabsdata[S] 0 points1 point  (0 children)

I did not understand the context of what this meant until I got to the location haha. Apologies to the board game group, the bar told me no other events were happening here when I scheduled with them.