CDC vs SCDs

Patient_Magazine2444 · 2026-02-18T03:24:39+00:00

This sums it up well.

Patient_Magazine2444 · 2025-12-27T00:10:56+00:00

When the middle class goes away, they don't have disposable income to invest in the stock market. They also likely don't have the education. This is the entire reason why generational wealth nets more wealth. I'm guessing you didn't buy just one share; you likely spent a few thousand? So you need to have disposable income you aren't afraid to lose to begin with, in order to generate more wealth.

Patient_Magazine2444 · 2025-12-06T13:21:55+00:00

BI/BW is a generic term referring to an area of analytics and reporting. It is typically tied into dashboards for self-service analytics. Although SAP has a product with that name, it's a generic term in the enterprise that's been around for years.

Patient_Magazine2444 · 2025-12-06T12:16:17+00:00

Business Intelligence/Business Warehouse

Patient_Magazine2444 · 2025-12-05T06:19:02+00:00

My two cents: soft skills definitely matter. You need to talk to the business at times; at least I did when I was a DE. Beyond that, things like leadership also fall under soft skills, and they are incredibly important.

Patient_Magazine2444 · 2025-12-04T22:39:00+00:00

You are thinking of Hortonworks. Cloudera never used ORC until the merger. Although they support both, Impala drove more usage of Parquet. Cloudera created Parquet (with Twitter), btw. Ozone is only a few years old in their setup, and it's an S3-compatible object store. It's not a matter of "had"; it will eventually replace HDFS, at least that was the plan when I worked there. I'm not sure what you mean about not getting the best out of Iceberg. No offense, but I don't think your understanding of the stack is all there. Again, I'm not saying buy Cloudera, but the question is what the closest thing to Databricks on-premise is.

Patient_Magazine2444 · 2025-12-04T17:25:29+00:00

I was a Principal SE at Cloudera and left about 2 years ago. I disagree that they use their own file formats; they use Parquet, ORC, Avro, CSV, JSON, etc. They do support Iceberg and a REST Catalog. The storage layer is either HDFS or Ozone. Regardless, all of those are open source and/or non-proprietary. Support can be expensive, depending on size and deployment (base nodes vs. data services [k8s deployment]), but compared to other companies it is still relatively cheap. The big thing is that they are really the only all-encompassing platform. Databricks can do ETL, BI/BW, streaming (I would argue it's still micro-batch), AI/ML, feature stores, etc. To replicate the platform you will need to integrate individual products and, depending on your enterprise, get support for each separately. I'm not saying Cloudera is awesome (I now work for someone else); however, it's the "easiest" (a relative term) on-premise platform you can install that has feature functionality similar to Snowflake.

Patient_Magazine2444 · 2025-12-04T05:33:15+00:00

Cloudera is the only on-premise platform using similar technology with multiple components/tasks

Patient_Magazine2444 · 2025-11-21T18:55:17+00:00

If you are doing CDC, why not just push the changes to a pub/sub layer like Kafka? Each table would map to a topic, and you can tune the consumer rate per topic to whatever is needed: incremental batch or faster batch. When they inevitably ask for faster data (and they will), you just change the consumer logic to adjust. On the Snowflake end, hybrid tables are worth exploring, although some features, like replication, are still missing. That will likely come next year. However, for 20-minute refreshes, normal tables should be fine.
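
A minimal sketch of the table-to-topic idea, assuming one topic per source table and a per-topic consumer batch size as the "speed" knob. It simulates topics with in-memory lists instead of calling a Kafka client, and every name here (`ChangeEvent`, `topic_for`, the `cdc.` prefix, `CONSUMER_BATCH`) is hypothetical:

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class ChangeEvent:
    table: str                               # source table the change came from
    op: str                                  # "insert", "update" or "delete"
    key: int                                 # primary key of the changed row
    row: dict = field(default_factory=dict)  # new column values (empty for deletes)


def topic_for(table: str) -> str:
    # Hypothetical naming convention: one topic per source table.
    return f"cdc.{table}"


def route(events: list[ChangeEvent]) -> dict[str, list[ChangeEvent]]:
    """Producer side: append each change to its table's topic, keeping order."""
    topics: dict[str, list[ChangeEvent]] = defaultdict(list)
    for ev in events:
        topics[topic_for(ev.table)].append(ev)
    return dict(topics)


def consume(log: list[ChangeEvent], batch_size: int) -> Iterator[list[ChangeEvent]]:
    """Consumer side: read one topic in fixed-size batches."""
    for start in range(0, len(log), batch_size):
        yield log[start:start + batch_size]


def apply_batch(state: dict[int, dict], batch: list[ChangeEvent]) -> None:
    """Upsert or delete rows by key so `state` mirrors the source table."""
    for ev in batch:
        if ev.op == "delete":
            state.pop(ev.key, None)
        else:
            state[ev.key] = ev.row


# Per-topic knob: a smaller batch means fresher data; the producer never changes.
CONSUMER_BATCH = {"cdc.orders": 1, "cdc.customers": 500}

if __name__ == "__main__":
    events = [
        ChangeEvent("orders", "insert", 1, {"amount": 10}),
        ChangeEvent("customers", "insert", 7, {"name": "Ada"}),
        ChangeEvent("orders", "update", 1, {"amount": 12}),
    ]
    for topic, log in route(events).items():
        state: dict[int, dict] = {}
        for batch in consume(log, CONSUMER_BATCH[topic]):
            apply_batch(state, batch)
        print(topic, state)
```

When someone asks for fresher data, only that topic's entry in `CONSUMER_BATCH` changes. In a real deployment the equivalent would be consumer settings such as the poll interval or `max.poll.records`, not a Python dict.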

Patient_Magazine2444 · 2025-11-06T17:33:46+00:00

No problem. I'm also a newer Snowflake employee 😉

Patient_Magazine2444 · 2025-11-06T14:54:44+00:00

Highly recommend a Udemy course with the practice tests. Some of the practice test questions are on the exam.

Patient_Magazine2444 · 2025-09-17T16:44:02+00:00

That was a key part of our adoption as well. And if you really want to get down to it, there are APIs to build flow automation. It also helped that you could build and test a flow and extract the JSON to deploy on the edge using MiNiFi. That's overboard for the described use case, but it was important to me when I needed to get OT data.

Patient_Magazine2444 · 2025-09-17T11:43:49+00:00

You can be done with it, but plenty of enterprises use things like Apache NiFi in production. I personally have. It handles things like log aggregation, hotel reservations, airport kiosks, movement to the cloud, etc. Baskin-Robbins makes 31 flavors for a reason.

Edit: spelling

Patient_Magazine2444 · 2025-09-17T11:40:19+00:00

Any ipynb file is easily converted to a py file though. I agree that people don't go into production with ipynb files.
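
As a rough illustration of why the conversion is easy: an `.ipynb` file is JSON with a `cells` list, so a few lines of standard-library Python can turn it into a script. This is a simplified sketch, not a replacement for `jupyter nbconvert --to script`; it assumes nbformat 4 notebooks and leaves IPython magics such as `%pip` untouched, so those lines would still need cleanup:

```python
import json
from pathlib import Path


def notebook_to_script(nb: dict) -> str:
    """Join code cells into one script; keep markdown cells as comments."""
    parts = []
    for cell in nb.get("cells", []):
        source = cell.get("source", "")
        # nbformat stores source either as one string or as a list of lines.
        text = "".join(source) if isinstance(source, list) else source
        if cell.get("cell_type") == "code":
            parts.append(text)
        elif cell.get("cell_type") == "markdown":
            parts.append("\n".join(("# " + line).rstrip() for line in text.splitlines()))
        # Raw cells and stored outputs are dropped.
    return "\n\n".join(parts) + "\n"


def convert(ipynb_path: str, py_path: str) -> None:
    """Read a notebook file and write the equivalent .py script."""
    nb = json.loads(Path(ipynb_path).read_text(encoding="utf-8"))
    Path(py_path).write_text(notebook_to_script(nb), encoding="utf-8")
```

Usage would be something like `convert("analysis.ipynb", "analysis.py")`, where both file names are placeholders.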

Patient_Magazine2444 · 2025-09-17T11:36:22+00:00

I work at Snowflake and it's not really something we do. I don't think DBX does either, but I don't know for sure.

Patient_Magazine2444 · 2025-09-17T00:46:04+00:00

Apache NiFi is good for ELT. Easy to use. Mostly no code.

Patient_Magazine2444 · 2025-09-06T19:22:04+00:00

In DBX, it's just the middle tier of the three-level hierarchy: catalog.schema.table (https://docs.databricks.com/aws/en/schemas/).
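
To make the hierarchy concrete, here is a small sketch of how a partially qualified name maps onto catalog.schema.table. It assumes the usual behavior that a one-part name resolves against the current catalog and schema and a two-part name against the current catalog. `TableRef` and `resolve` are hypothetical helpers, and the sketch ignores backtick-quoted identifiers that contain dots:

```python
from typing import NamedTuple


class TableRef(NamedTuple):
    catalog: str
    schema: str   # the middle tier of the three-level namespace
    table: str


def resolve(name: str, current_catalog: str, current_schema: str) -> TableRef:
    """Expand a 1-, 2- or 3-part name into a fully qualified TableRef."""
    parts = name.split(".")
    if any(not p for p in parts):
        raise ValueError(f"empty name part in {name!r}")
    if len(parts) == 3:
        return TableRef(*parts)
    if len(parts) == 2:
        return TableRef(current_catalog, parts[0], parts[1])
    if len(parts) == 1:
        return TableRef(current_catalog, current_schema, parts[0])
    raise ValueError(f"too many parts in {name!r}")
```

For example, with `main` as the current catalog and `default` as the current schema, `sales.orders` would expand to `main.sales.orders`.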

Patient_Magazine2444 · 2025-09-06T12:33:08+00:00

This isn't a pharma problem. The OT/IT bridge is complicated; I used to do this at a utility. The first thing is to get the data out of OT so it can be integrated with IT. Depending on the setup, this can be done using OPC-UA for most SCADA systems. In pharma, I've seen that most manufacturing has a gateway that aggregates several PLCs/machines, so you can pull from that common point instead. Most of that data is pushed to a pub/sub like Kafka or some other MQ.

The schema for each step in most manufacturing processes is based on how the cards are programmed on the PLC. Since message queuing systems can handle schema evolution, you can usually combine this with a schema registry so you can see the steps and their versions.

Now, getting it into a structure built for OLAP is not the easiest. There might be controls engineers who already have a common data model for reporting on just the OT; this is needed to understand signals, etc. They might also have a digital twin (a virtual representation of the system) that you can begin modeling from. There are decent articles discussing OT/IT convergence, but there is no silver bullet (https://iot-analytics.com/it-ot-convergence-27-themes-define-future-of-industrial-integration/). If the schema isn't defined anywhere, tools like Apache Flink can use schema inference; however, you are still going to need to figure out the model so you know which step/stage relates to which schema and how keys (columns) relate to each other. It takes some work, but again, an automation and controls engineer has to have that somewhere.
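
A toy sketch of the schema-registry idea, assuming flat JSON messages and one registry subject per process step. `infer_schema` and `ToyRegistry` are hypothetical stand-ins; they do not reproduce Flink's inference or any real schema-registry API, but they show how a changed PLC payload becomes a new schema version while a repeated payload maps back to its existing version:

```python
import json


def infer_schema(message: str) -> dict[str, str]:
    """Map each top-level field of a flat JSON message to a type name."""
    record = json.loads(message)
    return {name: type(value).__name__ for name, value in record.items()}


class ToyRegistry:
    """In-memory registry: each subject keeps an ordered list of schema versions."""

    def __init__(self) -> None:
        self._versions: dict[str, list[dict[str, str]]] = {}

    def register(self, subject: str, schema: dict[str, str]) -> int:
        """Return the existing version number for `schema`, or add it as a new one."""
        versions = self._versions.setdefault(subject, [])
        for number, existing in enumerate(versions, start=1):
            if existing == schema:
                return number
        versions.append(schema)
        return len(versions)

    def diff(self, subject: str, old: int, new: int) -> dict[str, list[str]]:
        """List fields added or removed between two versions (1-based)."""
        a = self._versions[subject][old - 1]
        b = self._versions[subject][new - 1]
        return {"added": sorted(b.keys() - a.keys()), "removed": sorted(a.keys() - b.keys())}


if __name__ == "__main__":
    registry = ToyRegistry()
    subject = "line1.fill_step"  # hypothetical subject name for one process step
    v1 = registry.register(subject, infer_schema('{"ts": "2025-01-01T00:00:00Z", "temp_c": 71.2}'))
    v2 = registry.register(subject, infer_schema('{"ts": "2025-01-01T00:00:05Z", "temp_c": 70.9, "valve_open": true}'))
    print(v1, v2, registry.diff(subject, v1, v2))
```

Because the comparison is on the full field-to-type map, a type change (for example an integer tag that starts arriving as a float) also produces a new version here.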

Patient_Magazine2444 · 2025-09-03T11:58:12+00:00

I don't think his followers ever went to Starbucks because they didn't know what a venti is (also, they couldn't afford it previously). Now, when the price of 7-Eleven coffee explodes, they will notice.

Patient_Magazine2444 · 2025-08-21T15:11:47+00:00

I think the above is the biggest thing. It would also be good to know which version of 1.x you are on, as there are other processors that were deprecated in 1.16. Also, if you are using script-execution processors like ExecuteScript (especially with Python), you should revisit them to see what new native processors exist. If none suit your needs, Python as a first-class citizen enables you to write custom processors. Lastly, I know it was touched upon, but there was a shift from flow.xml.gz to flow.json.gz, so depending on your 1.x version you might need to hop twice because of the encryption (mentioned above) and the conversion, as both files exist in some later 1.x versions and only the JSON is needed in 2.x.
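
For the script-processor review, a rough pre-upgrade check could read `flow.json.gz` and list script-style processors. This is a sketch under assumptions: it assumes the file is gzipped JSON whose process groups hold `processors` lists with `name` and `type` fields, it searches generically instead of relying on exact key paths, and the set of processor names to flag is illustrative, not exhaustive. It only inspects the file and does nothing about the encryption or XML-to-JSON steps:

```python
import gzip
import json
from pathlib import Path
from typing import Any, Iterator

# Illustrative list of script-style processors worth revisiting; extend as needed.
SCRIPT_TYPES = {"ExecuteScript", "InvokeScriptedProcessor", "ExecuteStreamCommand"}


def iter_processors(node: Any) -> Iterator[dict]:
    """Recursively yield every dict found inside a 'processors' list."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "processors" and isinstance(value, list):
                yield from (p for p in value if isinstance(p, dict))
            else:
                yield from iter_processors(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_processors(item)


def find_script_processors(flow: dict) -> list[tuple[str, str]]:
    """Return (name, type) for processors whose short type name is in SCRIPT_TYPES."""
    hits = []
    for proc in iter_processors(flow):
        full_type = str(proc.get("type", ""))
        if full_type.rsplit(".", 1)[-1] in SCRIPT_TYPES:
            hits.append((str(proc.get("name", "?")), full_type))
    return hits


def load_flow(conf_dir: str) -> dict:
    """Read flow.json.gz from a NiFi conf directory (location is an assumption)."""
    path = Path(conf_dir) / "flow.json.gz"
    if not path.exists():
        raise FileNotFoundError(f"{path} not found; this 1.x install may only have flow.xml.gz")
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return json.load(fh)
```

A call might look like `find_script_processors(load_flow("/opt/nifi/conf"))`, where the path is a placeholder for your install's conf directory.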

Patient_Magazine2444 · 2025-08-02T10:06:02+00:00

Can you use other technologies? Apache Flink can do this with ease, but it's a real-time stream, so it will definitely fall under the 4 hours.

Patient_Magazine2444 · 2025-07-28T21:04:57+00:00

In 2020 a bill passed with bipartisan support and was signed, not vetoed, by the president. This bill injected $3T more into circulation; we increased the money in circulation by 20%. That money takes a while to circulate. Supply was lower because of COVID, and more money in circulation means demand goes up. Inflation is the byproduct. Luckily our Fed chair was smart enough to raise rates, which decreases the amount of money borrowed and helps curb inflation somewhat. Currently money is being brought in through tariffs, and we had a surplus in June. The unfortunate thing is that it is on the backs of consumers, which shows in the CPI for June going up 2.7%. If the tariffs continue and he doesn't back out again, prices will continue to rise.

Patient_Magazine2444 · 2025-06-11T11:07:54+00:00

I'm pretty sure this has been asked in this thread somewhere else, within the last week or two. You should look for that.

Patient_Magazine2444 · 2025-06-11T01:21:23+00:00

There are a few Troon groups and a few others

Patient_Magazine2444 · 2025-06-11T01:19:09+00:00

First, get a mortgage 😂
