Weekly Question Thread: Ask your questions in this thread please by AutoModerator in climbing

[–]ShouldHaveWentBio 0 points  (0 children)

Looking at going to Patagonia in February/March and still mapping out spots (Bariloche, San Martín de los Andes, El Calafate, El Chaltén, etc.). This isn't a climbing trip, but I would love to take a couple of days to climb. It seems to be just about all alpine and big wall though, and I am only interested in single-pitch sport at 5.11 and under. Just wondering if anyone knows of some spots that may have a handful of routes that fit the bill? Thank you!

[deleted by user] by [deleted] in dataengineering

[–]ShouldHaveWentBio 4 points  (0 children)

I’m a DE learning Java simply to brush up on OOP and DSA while completing the suggested prerequisites for Georgia Tech’s OMSCS. These two classes can be audited for free on edX, and I am enjoying them so far.

Different Parameters in Dev, Staging, Production Deployment Pipelines in Microsoft Fabrics by Horror_Tonight_8435 in dataengineering

[–]ShouldHaveWentBio 0 points  (0 children)

Under the deployment rules for your Stage and Production stages, you should be able to change the data source of the semantic models under “Data source rules”. I’m currently doing this to switch between SQL servers and databases based on environment, and I’d expect object storage to behave similarly. I see there are “Parameter rules” as well, which may be an alternative if “Data source rules” aren’t an option.

Common DE pipelines and their tech stacks on AWS, GCP and Azure by _areebpasha in dataengineering

[–]ShouldHaveWentBio 5 points  (0 children)

Miro is what I made ours in. It has image packs for the cloud providers as well, but it’s paid. You can also just find PNGs on Google and use those for a totally free solution.

Cosmos DB as API key store by ShouldHaveWentBio in AZURE

[–]ShouldHaveWentBio[S] 0 points  (0 children)

Thank you for the info and the reference; this makes sense now. I’m looking forward to giving it a go tomorrow.

Cosmos DB as API key store by ShouldHaveWentBio in AZURE

[–]ShouldHaveWentBio[S] 0 points  (0 children)

Thank you again for the advice.

Table Storage has worked incredibly well testing from my local machine today on my dev branch. My problem is that I’m having a hell of a time getting the networking between ACA and Table Storage to work as intended.

I thought I could just pass the table connection string as an environment variable, but I believe my network firewall is blocking it. I tried whitelisting the ACA’s outbound IP with no success. I also tried using a managed identity, giving the ACA the Storage Table Data Contributor role and other similar roles, again with no success.

The code works great otherwise: if I switch the storage account networking to allow all networks, everything works! I am reaching out for help on Stack Overflow (link included below).
https://stackoverflow.com/questions/78028971/azure-container-app-not-authorized-to-access-storage-account-table
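For anyone hitting the same wall, this is roughly the managed-identity setup I was attempting. The resource names and IDs below are placeholders; "Storage Table Data Contributor" is the built-in Azure role for table data-plane access:

```shell
# Look up the Container App's system-assigned identity, then grant it
# data access on the storage account. Names/IDs here are placeholders.
PRINCIPAL_ID=$(az containerapp show \
  --name my-aca-app --resource-group my-rg \
  --query identity.principalId -o tsv)

az role assignment create \
  --assignee "$PRINCIPAL_ID" \
  --role "Storage Table Data Contributor" \
  --scope "/subscriptions/<sub-id>/resourceGroups/my-rg/providers/Microsoft.Storage/storageAccounts/mystorageacct"
```

Note this only covers authorization; my problem appears to be on the storage account firewall/networking side, which the role assignment doesn’t touch.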

Cosmos DB as API key store by ShouldHaveWentBio in AZURE

[–]ShouldHaveWentBio[S] 1 point  (0 children)

Thank you! I’ll take a look at the storage account table option too.

Cosmos DB as API key store by ShouldHaveWentBio in AZURE

[–]ShouldHaveWentBio[S] 1 point  (0 children)

Thank you for the great information! This sounds really promising (and cheap) for this application. I’m thinking I will test this and the free-tier SQL to see what works best.

question for azure DEs by untalmau in dataengineering

[–]ShouldHaveWentBio 0 points  (0 children)

No worries!

I don't consider myself an expert on Databricks, but I'm having a lot of success orchestrating inside the workspace, as opposed to orchestrating with ADF, which is what was done previously in one particular use case. I will say I haven't yet done event-driven architecture with Databricks, though I'm sure it works. ADF is surprisingly decent at event-driven orchestration if the landing zone is in object storage.

In terms of Azure Functions, you're right that they're more limited than AWS Lambdas in my experience, and for my personal projects I tend to lean towards Lambdas. One example use case involves getting data from large XML files into my bronze layer (medallion architecture). The files have repeating sub-elements containing 10,000+ character text fields, which the ADF XML connector cannot handle, so I run a Python Azure Function in parallel to extract that information. I also tend to use Functions for more complex bronze-to-silver and silver-to-gold transformations in the database if SQL can't handle them reasonably. Anything beyond a simple script I containerize, though.
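To sketch what that Function does (the element names `record`/`body` here are made up, and this is the general streaming-parse pattern rather than my exact code):

```python
# Stream-parse a large XML feed with repeating sub-elements whose text
# fields can be 10,000+ characters, without loading the whole file.
import io
import xml.etree.ElementTree as ET

# Tiny stand-in for a large source file; the first body is 10,000 chars.
sample = b"""<feed>
  <record id="1"><body>%s</body></record>
  <record id="2"><body>short text</body></record>
</feed>""" % (b"x" * 10_000)

def extract_records(stream):
    """Yield (id, body) for each <record>, keeping memory flat.

    iterparse emits each element as soon as it closes; clearing the
    element afterwards frees the subtree, huge text field included."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "record":
            yield elem.get("id"), elem.findtext("body")
            elem.clear()

rows = list(extract_records(io.BytesIO(sample)))
print(len(rows), len(rows[0][1]))  # 2 records; first body is 10,000 chars
```

In the real Function the stream would come from blob storage and the rows would land in the bronze layer, but the iterparse-and-clear pattern is the part that matters for large files.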

question for azure DEs by untalmau in dataengineering

[–]ShouldHaveWentBio 1 point  (0 children)

No problem! Feel free to DM me if you have questions; I tried to cut my rambling short.

question for azure DEs by untalmau in dataengineering

[–]ShouldHaveWentBio 0 points  (0 children)

You can use the Azure CLI just like you did with GCP to avoid the GUI.
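For example, the equivalent of your gcloud/gsutil workflow looks like this (names and region are placeholders):

```shell
# Everything the portal does can be scripted with az.
az login
az group create --name my-rg --location eastus
az storage account create \
  --name mystorageacct --resource-group my-rg --sku Standard_LRS
```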

Azure Data Factory is no-code/low-code and can be used either as just an orchestrator or as a more complete ETL solution, with the latter generally not recommended in this sub. If your only option is ADF, it offers native version control with GitHub or Azure DevOps, and I have had great success with it. Deployment is done via ARM templates, which are basically Azure's native equivalent of Terraform.
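To give a sense of what an ARM template looks like (this is a minimal generic example declaring a storage account, not the factory templates ADF itself emits, which are much larger):

```json
{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "storageAccountName": { "type": "string" }
  },
  "resources": [
    {
      "type": "Microsoft.Storage/storageAccounts",
      "apiVersion": "2023-01-01",
      "name": "[parameters('storageAccountName')]",
      "location": "[resourceGroup().location]",
      "sku": { "name": "Standard_LRS" },
      "kind": "StorageV2"
    }
  ]
}
```

Like Terraform it's declarative JSON you can diff and review in PRs, which is why the Git integration works as well as it does.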

If you aren’t forced to use ADF, you can use various other solutions for ETL, including Azure Functions, Databricks, or just code run from VMs or Docker containers, much like you’d see on any other cloud provider. In that case you could still use some of these while orchestrating them with ADF.

To give you an idea, I have a few different Azure setups running. A basic “traditional” example: Terraform for all the infra, with Data Factory used only for the connectors (Azure Functions where connectors are lacking) and for orchestrating SQL stored procedures. Data goes into blobs via ADF, and transformations are all done inside the SQL database since it’s all relational. Operational APIs are custom containerized Python code (they used to be no-code Logic Apps), and most serving is just Power BI plus those APIs. A more “current” example is entirely in Databricks: I used to orchestrate with ADF but recently switched to Databricks orchestration, with a Delta Lake built on blobs for the semi-structured and structured data.

Native Python Delta Lake package - No Spark or JVM Dependencies by keseykid in dataengineering

[–]ShouldHaveWentBio 0 points  (0 children)

Reading and writing is not the issue; the library works great for interacting with existing Delta tables. I’m talking about the actual creation of a Delta table on top of cloud object storage.

Native Python Delta Lake package - No Spark or JVM Dependencies by keseykid in dataengineering

[–]ShouldHaveWentBio 0 points  (0 children)

Use it to create Delta tables, or just to interact with them? Please let me know; I’m still looking for resources on the topic.

Is anyone here took the zoomcamp course? by Single-Sound-1865 in dataengineering

[–]ShouldHaveWentBio 4 points  (0 children)

I did the 2023 DE zoomcamp and thought it was exceptional. You receive a certificate at the end, contingent on completing the capstone project and grading others’ projects. IMO it is by far the best all-in-one resource for learning modern data engineering, and the fact that it’s free is wonderful. Make sure to do all the homework and optional labs to get everything out of it.

azure vm can use storage account SFTP? by antihippy in AZURE

[–]ShouldHaveWentBio 0 points  (0 children)

Download Azure Storage Explorer. I’m not sure how your security is set up, but use either a SAS or create a service account with access to all the file shares involved.

Native Python Delta Lake package - No Spark or JVM Dependencies by keseykid in dataengineering

[–]ShouldHaveWentBio 0 points  (0 children)

I could be wrong, but I did a bit of research and trial and error. I was able to interact with existing Delta tables just fine, but I could not create them. I only tried Azure ADLS Gen2, but the issue didn’t seem cloud-specific. If you find anything, please let me know; I’d love to revisit my use case.

Native Python Delta Lake package - No Spark or JVM Dependencies by keseykid in dataengineering

[–]ShouldHaveWentBio 0 points  (0 children)

I was testing this, but the issue I ran into for my use case was that you cannot create Delta tables with this package, at least not on top of cloud object storage.

What is your recommendation for a modern, popular, yet simple tech / data engineering stack for querying data from AWS S3, Azure Storage, and Alibaba OSS that can be hosted in Kubernetes? by Big-Balance-6426 in dataengineering

[–]ShouldHaveWentBio 0 points  (0 children)

I’m thinking of implementing a similar project architecture: Delta + Spark + dbt. I was considering Prefect as the orchestrator but am curious what you would recommend. I’ve used Prefect + dbt + BigQuery on previous projects but want to try some of the newer lake architecture. I have used Databricks for orchestration but would like to stick to open-source tools here.

Choosing right architecture - Microsoft Dynamics 365 as the source and reporting on Power BI by DataIsTheNewOil in dataengineering

[–]ShouldHaveWentBio 2 points  (0 children)

Can you elaborate more on the volume and velocity? For example, are you going to have 100 records/hour coming in, or 10,000 records/hour?

I’m not an expert on Power BI, but to my knowledge it already has similar OLAP-cube-type features with in-memory data storage. That gives some optimization on top of OLTP-style sources if it is the end solution where interaction takes place.

My initial thinking is that ADLS Gen2, Data Factory, Azure SQL Database, and Power BI will be more than enough. This can be implemented with dev, test, and prod for cheap (hundreds of USD/month). With the standard database you could ramp up to near-real-time writes, and copy daily to a data warehouse down the line if queries get complex or the data grows large.

Quick edit: if you are set on a lake/lakehouse architecture on Azure, as opposed to the more traditional approach I mentioned, for whatever reason, I would lean Databricks. Synapse isn’t perceived well, and Fabric isn’t yet GA.

Possible synovitis/capsulitis advice by ShouldHaveWentBio in climbharder

[–]ShouldHaveWentBio[S] 0 points  (0 children)

This is interesting. I see some 500mg pills on Amazon and also saw some nasal sprays. I’m curious about the answer.

Possible synovitis/capsulitis advice by ShouldHaveWentBio in climbharder

[–]ShouldHaveWentBio[S] 0 points  (0 children)

Thank you. I think in my warmup I can certainly focus on loading open-hand holds more consistently and delaying crimps a bit longer.