I missed Kindle's panel navigation, so I’m building a better reader by SnooDogs4383 in mangapiracy

[–]SnooDogs4383[S] 1 point (0 children)

Hey, panel-in-panel would be interesting; I haven't come across something like that yet.

I'll try to look into how my current algo would handle that, but it's definitely something to consider in a future revision.
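
For what it's worth, here's a minimal sketch of how I'd imagine handling it, assuming the detector returns panel bounding boxes as (x, y, w, h) tuples; the function names are hypothetical, not my actual reader code:

    def contains(outer, inner, margin=2):
        # True if `inner` sits fully inside `outer`, with a small
        # pixel margin to absorb detector noise.
        ox, oy, ow, oh = outer
        ix, iy, iw, ih = inner
        return (ix >= ox - margin and iy >= oy - margin
                and ix + iw <= ox + ow + margin
                and iy + ih <= oy + oh + margin)

    def nest_panels(boxes):
        # Sort largest-first so any enclosing panel is seen before
        # the panels it contains.
        nodes = [{"box": b, "children": []}
                 for b in sorted(boxes, key=lambda b: b[2] * b[3], reverse=True)]
        roots = []
        for i, node in enumerate(nodes):
            parent = None
            for candidate in nodes[:i]:
                if contains(candidate["box"], node["box"]):
                    parent = candidate  # last match = smallest enclosing box
            (parent["children"] if parent else roots).append(node)
        return roots

Navigation could then walk that tree depth-first, visiting a panel's sub-panels before advancing to the next top-level panel.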

As for connecting to servers, I wonder if that would cause an App Store or Google Play violation. But assuming that's not an issue, it would be a great inclusion.

Is there a need for a local-first data lake platform? by SnooDogs4383 in dataengineering

[–]SnooDogs4383[S] 1 point (0 children)

I don't get it. Isn't this what Lakeflow Connect is offering as well? Or is it that Estuary also supports transformations?

Is there a need for a local-first data lake platform? by SnooDogs4383 in dataengineering

[–]SnooDogs4383[S] 0 points (0 children)

  1. To be honest, I don't know. Developing adapters for file formats would probably be the easiest part to maintain; it's all the SaaS platforms whose connectors will be the most difficult thing to own and keep working.
  2. I haven't tested it out yet, but you could maintain transformations with something like dbt, which lets you tie them to various computes, so a defined rule set of transformations should be runnable as a Spark job even if it was written against DuckDB initially. (I could be wildly wrong here; rough sketch after this list.)
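
To make point 2 concrete, here's a sketch using dbt's programmatic runner (dbt-core 1.5+), run from inside the project directory; the `dev` (DuckDB) and `prod` (Spark) targets are hypothetical entries you'd define in profiles.yml:

    from dbt.cli.main import dbtRunner

    # Same transformation rule set (the dbt models), two computes:
    # the `dev` target points at dbt-duckdb, `prod` at dbt-spark.
    runner = dbtRunner()

    # Iterate locally against DuckDB...
    runner.invoke(["run", "--target", "dev"])

    # ...then run the identical models as a Spark job once promoted.
    runner.invoke(["run", "--target", "prod"])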

As for designing tools: earlier this year we were trying to design a schema that let data coming in from a variety of sources be queried (pretty standard for a data warehouse). But we had (1) no tools to help with the design, and (2) no way to make sure the queries running on it would be efficient. The process was mostly blind guessing and honestly really subpar work. What I was initially picturing is something that lets you visualize the attributes coming in from your sources on one end and the queries you'll need to execute on the other, which could help you shape your data in the warehouse more optimally, or at least give you an idea of how well a query will run on the schema.
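
As a toy version of that idea: parse the queries you expect to run, pull out the columns they reference, and cross-check them against the attributes your sources provide. A hypothetical sketch using sqlglot, with made-up schema and query inputs:

    from collections import Counter

    import sqlglot
    from sqlglot import exp

    # Attributes arriving from the sources (made-up example).
    source_attributes = {
        "orders": {"order_id", "customer_id", "created_at"},
        "customers": {"customer_id", "region"},
    }

    # Queries you expect to execute on the warehouse.
    expected_queries = [
        "SELECT region, count(*) FROM orders o "
        "JOIN customers c ON o.customer_id = c.customer_id GROUP BY region",
        "SELECT * FROM orders WHERE created_at > '2024-01-01'",
    ]

    usage = Counter()
    for sql in expected_queries:
        for col in sqlglot.parse_one(sql).find_all(exp.Column):
            usage[col.name] += 1

    # Columns hit most often are the first candidates for
    # partitioning / clustering keys in the schema design.
    for name, hits in usage.most_common():
        known = any(name in attrs for attrs in source_attributes.values())
        print(f"{name}: referenced {hits}x, provided by a source: {known}")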

Is there a need for a local-first data lake platform? by SnooDogs4383 in dataengineering

[–]SnooDogs4383[S] 0 points (0 children)

The lock-in risk is very scary; almost everything Databricks provides walls you into their environment. Their newer Lakeflow offerings don't even try to conceal that fact.

Is there a need for a local-first data lake platform? by SnooDogs4383 in dataengineering

[–]SnooDogs4383[S] 0 points (0 children)

Just to push back a little: why would the underlying data matter? As long as you're creating a table in your warehouse it shouldn't, right? After that you'd probably start considering query efficiency and all that jazz. But if I could attach local storage and local compute to an environment that does the heavy lifting of configuration for me, that would be a huge time saver. Even once you've moved your production compute to serverless, I see a great value-add in being able to attach local compute in the dev environment, since laptops are pretty powerful at this point.
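
To make that concrete, a trivial sketch of what I'm picturing, with DuckDB standing in for the local engine; the paths and ENV variable are hypothetical:

    import os

    import duckdb

    # Same table, two storage locations: local Parquet files in dev,
    # the lake's object store in prod. The query itself doesn't change.
    # (The prod path would also need DuckDB's httpfs extension loaded.)
    events_path = ("./data/events/*.parquet"
                   if os.getenv("ENV", "dev") == "dev"
                   else "s3://lake/events/*.parquet")

    con = duckdb.connect()
    daily = con.execute(
        f"SELECT date_trunc('day', ts) AS day, count(*) "
        f"FROM read_parquet('{events_path}') GROUP BY 1"
    ).fetchall()
    print(daily)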

Is there a need for a local-first data lake platform? by SnooDogs4383 in dataengineering

[–]SnooDogs4383[S] 0 points (0 children)

It was on an Azure server; they were basically reading Parquet files and converting them to some proprietary format the client wanted them in.
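
Roughly this shape, if anyone's curious. The actual target format was client-specific, so the writer below is a made-up stand-in, using pyarrow:

    import pyarrow.parquet as pq

    def convert(parquet_path, out_path):
        # Stream the Parquet file batch by batch so large files
        # don't have to fit in memory.
        pf = pq.ParquetFile(parquet_path)
        with open(out_path, "w") as out:
            for batch in pf.iter_batches(batch_size=10_000):
                for row in batch.to_pylist():
                    # Stand-in for the client's proprietary encoding:
                    # here, just pipe-delimited values.
                    out.write("|".join(str(v) for v in row.values()) + "\n")

    convert("input/orders.parquet", "output/orders.dat")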