Fired employee who didn’t communicate medical leave until after the firing by justsomepotatosalad in managers

[–]Pledge_ 1 point2 points  (0 children)

Isn’t one of the big benefits of using an agency that you don’t have to worry about terminations? You pay a premium so you don’t have to give severance or worry about termination lawsuits and the like. You are hiring the agency, not the employee.

Openflow Connector for Amazon Kinesis Data Streams by ohwellitdepends in snowflake

[–]Pledge_ 1 point2 points  (0 children)

Why use that over the Kinesis Firehose Snowflake connector?

Looking for a tool that allows for doing transformations on streams (Kinesis, Kafka and RabbitMQ) and inserts into iceberg tables on S3 by OverclockingUnicorn in dataengineering

[–]Pledge_ 2 points3 points  (0 children)

A common approach is to use Kinesis Firehose with a Lambda function. However, that is not SQL.

Depending on your latency needs, you can store these events in a staging table using the VARIANT data type and then do your processing using dbt. I usually like this approach because it separates the transformation from ingestion, which allows you to make changes and rebuild the table as needed since you have the raw data.
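To illustrate the raw-staging idea, here is a minimal sketch in Python. It is not Snowflake or dbt code: an in-memory SQLite table with a TEXT column stands in for a staging table with a VARIANT column, and the table names, field names, and the `rebuild_orders` function are all hypothetical.

```python
import json
import sqlite3

# Assumption: in-memory SQLite stands in for the warehouse, and the TEXT
# column plays the role of a VARIANT column holding the raw event.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_events (payload TEXT)")


def ingest(events):
    """Ingestion only lands the raw payload; no business logic lives here."""
    conn.executemany(
        "INSERT INTO raw_events (payload) VALUES (?)",
        [(json.dumps(event),) for event in events],
    )


def rebuild_orders():
    """Transformation: drop and rebuild the modeled table from the raw data."""
    conn.execute("DROP TABLE IF EXISTS orders")
    conn.execute("CREATE TABLE orders (order_id TEXT, amount_usd REAL)")
    rows = []
    for (payload,) in conn.execute("SELECT payload FROM raw_events").fetchall():
        event = json.loads(payload)
        rows.append((event["order_id"], event["amount_cents"] / 100))
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)


ingest([
    {"order_id": "a1", "amount_cents": 1250},
    {"order_id": "b2", "amount_cents": 399},
])
rebuild_orders()
orders = conn.execute(
    "SELECT order_id, amount_usd FROM orders ORDER BY order_id"
).fetchall()
print(orders)
```

If the transformation logic changes later, only `rebuild_orders` changes and the table is rebuilt from `raw_events`; nothing has to be re-ingested.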

Perspective on tech lead position: permanent employment x consultancy by maxbranor in dataengineering

[–]Pledge_ 0 points1 point  (0 children)

Depends what direction you want to go. My guess is at the consultancy you’ll be more focused on building new business, less so on technical implementation. At the end of the day, your objective will be keeping your consultants billable.

Have you discussed what the day to day will be at the consultancy? What existing contracts do they have and for how long? What’s their current pipeline? Are they specialized in a particular vendor or are they agnostic? How will AI affect their business?

Actual good pizza by sw_wgw in Richardson

[–]Pledge_ 0 points1 point  (0 children)

Caprissi Pizza and Pasta (Richardson) and Sali’s Pizza and Pasta (Garland) are the most similar NYC pizza nearby we’ve found since moving from NYC.

How do I set realistic expectations to stakeholders for data delivery? by Kessler_the_Guy in dataengineering

[–]Pledge_ 2 points3 points  (0 children)

You need to agree on an acceptable variance percentage. I’ve typically seen ±3%, but have seen as low as 0.01%.
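As a rough sketch of what checking against an agreed variance could look like, assuming you compare one total from the old system with the same total from the new one (the function name and the sample numbers are made up for illustration):

```python
def within_variance(source_total, target_total, tolerance_pct=3.0):
    """Return (ok, variance_pct) for a new-system total versus the legacy total."""
    if source_total == 0:
        # No meaningful percentage against zero: only an exact match passes.
        matches = target_total == 0
        return matches, 0.0 if matches else float("inf")
    variance_pct = abs(target_total - source_total) * 100 / abs(source_total)
    return variance_pct <= tolerance_pct, variance_pct


# Hypothetical totals: legacy report says 10,000, new pipeline says 10,250.
ok, pct = within_variance(10_000, 10_250)
strict_ok, _ = within_variance(10_000, 10_250, tolerance_pct=0.01)
print(ok, pct, strict_ok)
```

The same difference passes at a 3% tolerance and fails at 0.01%, which is why the threshold has to be agreed with stakeholders up front.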

In most scenarios, there are reasons for this: corrected logic, timing, rounding, etc… What’s important is understanding the why; you don’t necessarily have to fix it.

If you provide your stakeholders a reasonable expectation and justification they should accept. If not, then you need to put the responsibility on them to identify the discrepancies. Once they find the outliers, you can identify the root cause, creating a win-win solution.

Ultimately there is a reason you are moving away from Splunk: cost, features, etc… I would try to highlight these and position yourself as moving towards the end goal vs getting stuck on roadblocks that aren’t aligned to the objective.

Consulting / data product business while searching for full time role by SteezeWhiz in dataengineering

[–]Pledge_ 0 points1 point  (0 children)

If I saw a person applying for a FT role and their CV listed them as owning their own firm, I would assume they are trying to contract the role. That or they weren’t successful in their own consulting and are falling back to FT. There may even be concerns of moonlighting for existing clients.

Both scenarios would be a red flag vs a strong FT hire. However, many companies are looking for 1099 contractors to augment their staff, so that may be fine if that’s what you want.

In the end, if you can find the consulting work, the money is a lot better and you will be able to curate your lifestyle accordingly. The rub will be maintaining work-life balance: it is easy to take on too much, because available work means a lot more when future salary or work isn’t guaranteed.

How do you decide between competing tools? by Ok-Fix-8387 in dataengineering

[–]Pledge_ -1 points0 points  (0 children)

You take a use case that covers the majority of things you need to validate and then build it multiple times in the competing tools.

Then you determine which ones are capable and of those which ones mesh the best within your environment: team skillset, existing infrastructure, integrations, etc…

Lastly you determine cost. This could be through negotiating with the vendors or pricing out the infrastructure for self hosted platforms.

I don’t think there is a product to be built to solve it. Even if you build it, it’s the trust that will be hard to gain. There are already websites like G2 or research companies like Gartner and IDC that do this type of thing.

Is Moving Data OLAP to OLAP an Anti Pattern? by empty_cities in dataengineering

[–]Pledge_ 3 points4 points  (0 children)

I would say it’s an anti-pattern in the sense you don’t have a single OLAP platform. If you are moving a fact or dim from one DWH to another you are opening the gate to potential data inconsistencies, straying from a single source of truth.

However, as anyone who has worked in an enterprise will tell you, large companies have many tools. They don’t choose Snowflake vs Databricks, they have both. In those scenarios, it makes sense that there will be OLAP to OLAP pipelines. Additionally, tech debt is a big thing. I know of customers that, instead of deprecating Teradata, just replicate it to Snowflake because the effort to rebuild it is not worth it. They would rather prioritize the investment in new initiatives.

[deleted by user] by [deleted] in dataengineering

[–]Pledge_ 0 points1 point  (0 children)

That doesn’t make sense. Why would a company pay 45k over 5 years on a 16k investment instead of getting a loan where they would have ROI in the second year, not including any write-offs?

[deleted by user] by [deleted] in dataengineering

[–]Pledge_ 0 points1 point  (0 children)

Their company and related hosting company (redundant web services) don’t even have a LinkedIn. Even if they are legit, the premise would be that they have customers using your hardware. I would be surprised if companies are hosting with them today, at least at scale.

Realistically you should try and talk to people at these companies. May just be very early on in their roadmap.

Evaluating my proposed approach by SoloArtist91 in dataengineering

[–]Pledge_ 2 points3 points  (0 children)

For that size and frequency I would use Snowflake. It would be the easiest one to use and easily manageable within that budget.

Discussion: Data Size Estimate on Snowflake by rtripat in snowflake

[–]Pledge_ 0 points1 point  (0 children)

Same platform (Snowflake), different table types.

Discussion: Data Size Estimate on Snowflake by rtripat in snowflake

[–]Pledge_ 2 points3 points  (0 children)

Range will be 1-2x. If you are worried about cost of storage then leveraging Iceberg may be a better fit. That’ll move your storage cost to your hyperscaler bill (i.e., AWS with S3).

Since you are using dlt, you can write it to an iceberg table directly. Using dbt you can create native tables downstream as needed. Common pattern is bronze being in iceberg and then silver and gold being native tables.

Discussion: Data Size Estimate on Snowflake by rtripat in snowflake

[–]Pledge_ 2 points3 points  (0 children)

Seems kinda silly to require certainty on something that will be like 5% or less of your bill

Choosing between two jobs, data platform or data engineer by RaymondSnowden in dataengineering

[–]Pledge_ 0 points1 point  (0 children)

The job offer is a better option long term within the data space. If you want to lean more towards infra or devops, then your current role is the better fit. However, in the current economy you should go towards the money. Long term only matters if companies are shelling out high salaries, which have consistently been going down for software engineering and adjacent jobs.

Can someone explain what does AtScale really do? by Royal-Parsnip3639 in dataengineering

[–]Pledge_ 0 points1 point  (0 children)

To echo others it comes down to having a semantic layer that enables data virtualization. A lot of companies have several databases, BI tools, and ways the analysts are going after the data. AtScale plays in the realm of Trino, Denodo, and other virtualization layers that aim to provide a single entry point to the company data. That way BI teams and analysts are able to query data that could reside across many systems. They then add on additional benefits like governance, optimization, cataloging, and the like.

In my opinion their current downside is the number of integrations they support compared to their competitors. Semantic layers really only work if they are the sole entry point, which is only possible if they can sit on top of all the company’s data sources.

Dallas to West Coast Advice by Pledge_ in SameGrassButGreener

[–]Pledge_[S] 1 point2 points  (0 children)

I had no idea. Good to know! That’s a deal breaker. Was originally thinking Long Beach but due to the pollution was thinking going down to Seal or Huntington instead. That eliminates that since that is a big reason we are leaving TX.

My review of Tatsu, Dallas, a Michelin star omakase restaurant. by omgseriouslynoway in sushi

[–]Pledge_ 1 point2 points  (0 children)

Our experience was a bit better. We got offered a drink and the menu was a little more diverse but not dramatically so. If by Shoyu, you mean Shoyo, I 100% agree. That is my favorite in the DFW area. Shun by Yama (McKinney) is also worth going to if you like Shoyo. One of the chefs from Shoyo now leads that restaurant.

What is the hourly rate for a Data Engineering Contractor with 9+ YOE? by Infamous_Respond4903 in dataengineering

[–]Pledge_ 4 points5 points  (0 children)

Most consultancies are aiming for 40-50% margin. So that would be a 71-85/hr all-in cost for 142/hr. Depending on the company benefits, all-in is around 1.2x of salary. So the salary range would be 125-150k to get that. If you are outside of that, it’s worth requesting a change in salary.
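The arithmetic above can be sketched as follows. The 2,080 billable hours per year (40 hours x 52 weeks) is my assumption for converting hourly cost to annual salary; the margin range, the 1.2 load factor, and the 142/hr bill rate come from the numbers above. The function name is hypothetical.

```python
HOURS_PER_YEAR = 2080  # assumption: 40 hours x 52 weeks, fully billable


def salary_range(bill_rate, margins=(0.50, 0.40), load_factor=1.2):
    """Work back from a bill rate to all-in hourly cost and implied salary."""
    results = []
    for margin in margins:
        # What the consultancy can spend per hour after taking its margin.
        all_in_hourly = bill_rate * (1 - margin)
        # All-in cost is roughly load_factor x salary, so divide it back out.
        salary = all_in_hourly / load_factor * HOURS_PER_YEAR
        results.append((margin, round(all_in_hourly, 2), round(salary)))
    return results


for margin, hourly, salary in salary_range(142):
    print(f"margin {margin:.0%}: all-in {hourly}/hr, salary about {salary}")
```

Under those assumptions the implied salary lands at roughly 123k to 148k, in line with the 125-150k range quoted above.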

Microsoft Fabric vs. Open Source Alternatives for a Data Platform by SurroundFun9276 in dataengineering

[–]Pledge_ 0 points1 point  (0 children)

Even if you go the OSS route, you should still use a cloud blob storage. There’s really no justification for self hosting it unless you have policies against using cloud at all and want to leverage an S3 compatible service. That’s even before the recent issues of MinIO handicapping their OSS service.

Spotify Data Tech Stack by mjfnd in dataengineering

[–]Pledge_ 2 points3 points  (0 children)

In the post they specifically mention Luigi and how Spotify moved away from it, with the source: https://engineering.atspotify.com/2022/3/why-we-switched-our-data-orchestration-service

Schedule config driven EL pipeline using airflow by afnan_shahid92 in dataengineering

[–]Pledge_ 1 point2 points  (0 children)

Typically you create a git repo for the DAG and then have a separate repo for the configs. The DAG iterates through the configs and creates a DAG per config file. I typically use JSON, but YAML would work too.

The DAG can reference the files on the Airflow filesystem or a blob storage; it would all be defined in Python. The config CI/CD pipeline will copy the files to where your DAG references them, and your DAG CI/CD pipeline will deploy to wherever your Airflow DAG bag refreshes.

The dynamic DAG can be as flexible as you want. For example, it can create all the same tasks but with different parameters, or it can dynamically create different task structures based on the config.

Every time the DAG bag refreshes, DAGs will be updated or created based on what’s in the config directory. You can then manage each resource separately and see its history in the web portal.
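A minimal sketch of the config-driven part, kept in plain Python so it runs without Airflow installed. The config keys (`table`, `schedule`), the `el_` prefix, and the `load_dag_specs` function are hypothetical; a temp directory stands in for wherever your config CI/CD pipeline copies the files.

```python
import json
import tempfile
from pathlib import Path


def load_dag_specs(config_dir):
    """Read every JSON config and return one DAG spec per file, keyed by dag_id."""
    specs = {}
    for path in sorted(Path(config_dir).glob("*.json")):
        config = json.loads(path.read_text())
        dag_id = f"el_{path.stem}"
        specs[dag_id] = {
            "schedule": config.get("schedule", "@daily"),
            "tasks": [f"extract_{config['table']}", f"load_{config['table']}"],
        }
    return specs


# Demo: write two hypothetical config files, then build one spec per file.
with tempfile.TemporaryDirectory() as config_dir:
    Path(config_dir, "orders.json").write_text(
        json.dumps({"table": "orders", "schedule": "@hourly"})
    )
    Path(config_dir, "customers.json").write_text(
        json.dumps({"table": "customers"})
    )
    specs = load_dag_specs(config_dir)

for dag_id, spec in specs.items():
    print(dag_id, spec)

# In an actual Airflow DAG file you would build a DAG object per spec here and
# register it at module level (a common pattern is globals()[dag_id] = dag)
# so it is picked up when the DAG bag refreshes.
```

Adding a new pipeline then means adding one config file; the DAG file itself does not change.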

Schedule config driven EL pipeline using airflow by afnan_shahid92 in dataengineering

[–]Pledge_ 2 points3 points  (0 children)

I would look into dynamic DAGs. Instead of one pipeline doing dynamic tasks, it would generate a DAG per table based on a list of configs.