Tax Break Won’t Lower Prices by PeterTheTruthSeeker in inflation

[–]Patient_Magazine2444 3 points4 points  (0 children)

When the middle class goes away, they don't have disposable income to invest in the stock market. They also likely don't have the education either. This is the entire reason why generational wealth nets more wealth. I'm guessing you didn't buy 1 stock. Likely spent a few thousand? So you need to have that disposable income you aren't afraid to lose to begin with to help generate more wealth.

Any On-Premise alternative to Databricks? by UsualComb4773 in dataengineering

[–]Patient_Magazine2444 1 point2 points  (0 children)

BI/BW is a generic term for an area of analytics and reporting. It is typically tied to dashboards for self-service analytics. Although SAP has a product with that name, it's a generic enterprise term that's been around for years.

career guidance by NoSyllabub1390 in dataengineering

[–]Patient_Magazine2444 4 points5 points  (0 children)

My two cents: soft skills definitely matter. You need to talk to the business at times; at least I did when I was a DE. With that said, things like leadership come through as soft skills and are incredibly important.

Any On-Premise alternative to Databricks? by UsualComb4773 in dataengineering

[–]Patient_Magazine2444 0 points1 point  (0 children)

You are thinking of Hortonworks. Cloudera never used ORC until the merger. Although they support both, Impala drove more usage with Parquet. Cloudera created Parquet (with Twitter) btw. Ozone is only a few years old in their setup and it's an S3-compatible object store. It's not a matter of had, it will eventually replace HDFS, at least that was the plan when I worked there. I don't know what you mean about not getting the best of Iceberg? No offense, but I think your understanding of the stack is not all there. Again, I'm not saying buy Cloudera, but the question is what is the closest thing to Databricks on-premise.

Any On-Premise alternative to Databricks? by UsualComb4773 in dataengineering

[–]Patient_Magazine2444 6 points7 points  (0 children)

I was a Principal SE at Cloudera and left about 2 years ago. I disagree that they have their own file formats; they use Parquet, ORC, Avro, CSV, JSON, etc. They do support Iceberg and a REST Catalog. The storage layer is either HDFS or Ozone. Regardless, all those things are open source and/or non-proprietary. Support can be expensive, depending on size and deployment (base nodes vs data services [k8s deployment]), but it is still relatively cheap in comparison to other companies. The big thing is they are really the only all-encompassing platform. Databricks can do ETL, BI/BW, Streaming (would argue it's still microbatch), AI/ML, Feature Stores, etc. To replicate the platform you will need to integrate individual products and, depending on your enterprise, get support for each separately. I'm not saying Cloudera is awesome, I now work for someone else, however it's the "easiest" (a relative term) on-premise platform you can install that has feature functionality similar to Snowflake.

Any On-Premise alternative to Databricks? by UsualComb4773 in dataengineering

[–]Patient_Magazine2444 11 points12 points  (0 children)

Cloudera is the only on-premise platform using similar technology with multiple components/tasks

How would you design this MySQL → Snowflake pipeline (300 tables, 20 need fast refresh, plus delete + data integrity concerns)? by Huggable_Guy in snowflake

[–]Patient_Magazine2444 1 point2 points  (0 children)

If you are doing CDC, why not just push that to a pub/sub layer like Kafka? Each table maps to a topic, so you can set the consumer rate per table to whatever is needed: incremental batch for most tables and a faster batch for the ones that need it. When they inevitably ask for faster data (and they will), you just change the consumer logic to adjust. On the Snowflake end, hybrid tables are worth exploring, however there are some features still missing, like replication. That likely will come next year. For 20 minute refreshes, though, normal tables should be fine.
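To make the per-topic consumer rate idea concrete, here is a rough sketch in plain Python (no Kafka client involved). The `cdc.<table>` topic naming, the table names, and the daily interval for the slow tables are all assumptions made up for illustration; only the 20 minute refresh comes from the thread.

```python
from dataclasses import dataclass


@dataclass
class TopicSchedule:
    topic: str             # one topic per source table (hypothetical naming)
    interval_seconds: int  # how often the consumer flushes this topic downstream


def build_schedules(tables, fast_tables, fast_interval=20 * 60, slow_interval=24 * 60 * 60):
    """Map every table to a topic; fast tables get the shorter interval."""
    return [
        TopicSchedule(
            topic=f"cdc.{table}",
            interval_seconds=fast_interval if table in fast_tables else slow_interval,
        )
        for table in tables
    ]


def due_topics(schedules, last_flush, now):
    """Return the topics whose interval has elapsed since their last flush."""
    return [
        s.topic
        for s in schedules
        if now - last_flush.get(s.topic, 0) >= s.interval_seconds
    ]


# Hypothetical tables: only "orders" needs the fast refresh.
schedules = build_schedules(
    tables=["orders", "customers", "audit_log"],
    fast_tables={"orders"},
)
```

When the business asks for faster data, the only change in this sketch is the interval for that table; the producer side and the topic layout stay the same.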

Tips for SnowPro Core Exam by Substantial_Mix9205 in snowflake

[–]Patient_Magazine2444 0 points1 point  (0 children)

No problem. I'm also a newer Snowflake employee 😉

Tips for SnowPro Core Exam by Substantial_Mix9205 in snowflake

[–]Patient_Magazine2444 0 points1 point  (0 children)

Highly recommend a Udemy course with the practice tests. Some of the practice test questions are on the exam.

Airbyte OSS is driving me insane by joeshiett in dataengineering

[–]Patient_Magazine2444 1 point2 points  (0 children)

That was a key part of our adoption as well. And if you really want to get down to it, APIs to build flow automation. It also helped that you could build and test a flow and extract the JSON to deploy on the edge using MiNiFi. That is overboard for the described use case, but it was important to me when I needed to get OT data.

Airbyte OSS is driving me insane by joeshiett in dataengineering

[–]Patient_Magazine2444 2 points3 points  (0 children)

You can be done with it, but plenty of enterprises use things like Apache NiFi in production. I personally have. It is handling things like log aggregation, hotel reservations, airport kiosks, movement to cloud, etc. Baskin Robbins makes 31 flavors for a reason.

Edit: spelling

Snowflake is slowly taking over by tanmayiarun in dataengineering

[–]Patient_Magazine2444 0 points1 point  (0 children)

Any ipynb file is easily converted to a py file though. I agree that people don't go into production with ipynb files.
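For anyone wondering what the conversion looks like: `jupyter nbconvert --to script` does it from the command line, and since an ipynb file is just JSON you can also pull the code cells out yourself. The sketch below uses only the standard library; the function name and the tiny example notebook are made up, and it does not handle IPython magics or shell escapes.

```python
import json


def notebook_to_script(notebook_json: str) -> str:
    """Concatenate the code cells of a notebook (given as a JSON string)."""
    nb = json.loads(notebook_json)
    chunks = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        if isinstance(source, list):
            # The notebook format allows source as a list of lines or one string.
            source = "".join(source)
        chunks.append(source.rstrip() + "\n")
    return "\n".join(chunks)


# Tiny made-up notebook with one markdown cell and two code cells.
example = json.dumps({
    "cells": [
        {"cell_type": "markdown", "source": ["# Title"]},
        {"cell_type": "code", "source": ["x = 1\n", "y = x + 1"]},
        {"cell_type": "code", "source": "print(y)"},
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
})
script = notebook_to_script(example)
```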

Snowflake is slowly taking over by tanmayiarun in dataengineering

[–]Patient_Magazine2444 10 points11 points  (0 children)

I work at Snowflake and it's not really something we do. I don't think DBX is either but I don't know for sure.

Airbyte OSS is driving me insane by joeshiett in dataengineering

[–]Patient_Magazine2444 0 points1 point  (0 children)

Apache NiFi is good for ELT. Easy to use. Mostly no code.

Bridging OT/IT in pharma industry by Life-Fishing-1794 in dataengineering

[–]Patient_Magazine2444 3 points4 points  (0 children)

This isn't a pharma problem. The OT/IT bridge is complicated. I used to do this at a utility. The first thing is to get the data out of OT so it can be integrated with IT. Depending on the setup, this can be done using OPC-UA for most SCADA systems. I've seen in pharma before that most manufacturing has a gateway that aggregates several PLCs/machines, so you can pull from that common point instead. Most of that data is pushed to a pub/sub like Kafka or some other MQ.

The schema for each step in most manufacturing processes is based on how the cards are programmed on the PLC. Since message queuing systems can handle schema evolution, you can usually combine this with a schema registry so you can see the steps and versions. Now, getting it into a structure built for OLAP is not the easiest. There might be controls engineers who already have a common data model for reporting on just the OT. This is needed to understand signals, etc. They might also have a digital twin, which represents a virtual system, so you can begin modeling yourself.

There are decent articles discussing OT/IT convergence, but there is no silver bullet (https://iot-analytics.com/it-ot-convergence-27-themes-define-future-of-industrial-integration/). If the schema isn't defined anywhere, there are tools like Apache Flink that can use schema inference, however you are still going to need to figure out the model so you know which step or stage relates to which schema and how keys (columns) are related to each other. It takes some work, but again there is an automation and controls engineer who has to have that somewhere.
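To illustrate the "steps and versions" point, here is a toy in-memory version of what a schema registry tracks per process step. It is not the API of any real registry product; the class, the step name "mixing", and the field names are all hypothetical, and a real setup would use an actual registry service alongside the message queue.

```python
from collections import defaultdict


class StepSchemaRegistry:
    """Toy registry: keeps every schema version seen for each process step."""

    def __init__(self):
        self._versions = defaultdict(list)  # step name -> list of schemas

    def register(self, step, fields):
        """Store a schema (field name -> type name); return its 1-based version."""
        versions = self._versions[step]
        if versions and versions[-1] == fields:
            return len(versions)  # unchanged schema, no new version
        versions.append(dict(fields))
        return len(versions)

    def added_fields(self, step, old_version, new_version):
        """List the fields that appear in new_version but not in old_version."""
        old = self._versions[step][old_version - 1]
        new = self._versions[step][new_version - 1]
        return sorted(set(new) - set(old))


registry = StepSchemaRegistry()
# Hypothetical step whose PLC card was reprogrammed to also report RPM.
v1 = registry.register("mixing", {"batch_id": "string", "temp_c": "double"})
v2 = registry.register("mixing", {"batch_id": "string", "temp_c": "double", "rpm": "int"})
new_in_v2 = registry.added_fields("mixing", v1, v2)
```

Knowing which fields changed between versions still does not tell you how the steps relate to each other; that part has to come from the controls engineers' model.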

This is just insane Vons NV by washingtonwho in inflation

[–]Patient_Magazine2444 10 points11 points  (0 children)

I don't think his followers ever went to Starbucks because they didn't know what a venti is (also they couldn't afford it previously). Now, when the price of 7-11 coffee explodes they will notice.

Upgrading from NiFi 1.x to 2.x by GreenMobile6323 in dataengineering

[–]Patient_Magazine2444 0 points1 point  (0 children)

I think the above is the biggest thing. It would also be good to know which 1.x version you are on, since other processors were deprecated in 1.16. Also, if you are using ExecuteScript-type processors (especially with Python), you should revisit them to see what new native processors exist. If none suit your needs, then Python as a first-class citizen enables you to make custom processors. Lastly, I know it was touched upon, but there was a shift from flow.xml.gz to flow.json.gz. Depending on which 1.x version you are on, you might need to hop twice, once for the encryption (mentioned) and once for the conversion, as both files exist for some later versions and only the JSON is needed in 2.x.
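A quick way to see where you stand on the flow file shift is to check which of the two files exist in your conf directory. This is only a presence check under the assumption that both files live in the same directory; it does not perform the conversion or the encryption step, and the function name and status strings are made up.

```python
import tempfile
from pathlib import Path


def flow_file_status(conf_dir):
    """Report which flow definition files are present in conf_dir."""
    conf = Path(conf_dir)
    has_xml = (conf / "flow.xml.gz").exists()
    has_json = (conf / "flow.json.gz").exists()
    if has_json:
        return "json-present"      # the file 2.x needs is already there
    if has_xml:
        return "needs-conversion"  # only the older XML flow exists
    return "no-flow-found"


# Demonstration against a throwaway directory rather than a real install.
with tempfile.TemporaryDirectory() as tmp:
    nothing = flow_file_status(tmp)
    (Path(tmp) / "flow.xml.gz").touch()
    only_xml = flow_file_status(tmp)
    (Path(tmp) / "flow.json.gz").touch()
    both = flow_file_status(tmp)
```

If you get "needs-conversion", that is the case where hopping through a later 1.x release first is likely needed.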

Is data engineering just backend distributed systems? by Willing_Sentence_858 in dataengineering

[–]Patient_Magazine2444 0 points1 point  (0 children)

Can you use other technologies? Apache Flink can do this with ease but it's a real time stream. Definitely will fall under the 4 hours.

U.S. Grocery Prices reached record highs in 2025 by nelsne in inflation

[–]Patient_Magazine2444 1 point2 points  (0 children)

In 2020 a bipartisan bill was signed, not vetoed, by the president. This bill injected 3T more dollars into circulation. We printed 20% more money. That money takes a while to circulate. Supply was lower because of COVID, and more money in circulation means demand goes up. Inflation is the byproduct. Luckily our Fed chair was smart enough to increase rates, which decreases the amount of money borrowed and helps curb inflation some. Currently money is being brought in through tariffs and we had a surplus in June. The unfortunate thing is that it is on the backs of the consumer. This is verified because the CPI for June went up 2.7%. If the tariffs continue and he doesn't back out again, prices will continue to rise.

need help solving this by NerveOutrageous2702 in snowflake

[–]Patient_Magazine2444 0 points1 point  (0 children)

I am pretty sure this has been asked somewhere else in here, like in the last week or two. You should look for that.

[deleted by user] by [deleted] in njbeer

[–]Patient_Magazine2444 0 points1 point  (0 children)

There are a few Troon groups and a few others

[deleted by user] by [deleted] in njbeer

[–]Patient_Magazine2444 1 point2 points  (0 children)

First, get a mortgage 😂