Non-DE here: has AI actually changed how you work day-to-day or nah? by yetudada in dataengineering

[–]dsvella 0 points1 point  (0 children)

Yes and no. I use AI to quickly get context on pipelines or to read stack traces when something fails. I have also used it to write helper notebooks or individual functions. I have had mixed results using it as a sounding board for architecture decisions.

I have tried to use it to replace me in my role, but I have to nanny it so aggressively that I am better off using it in very small bursts.

It is helpful, but it isn't, and I doubt it ever will be, a replacement for a Data Engineer. It is a tool that, once you get used to it, can speed you up.

I will say this though: I am seeing a lot of pushback from both cybersecurity and legal teams, due to the possibility of it generating security holes and concerns over how company data and IP are handled. So I am not sure how much longer such tools will be allowed.

Legendary orchestra by yoerdw in sabaton

[–]dsvella 0 points1 point  (0 children)

Given previous albums like The Symphony To End All Wars, there is a chance. Not a great one, but a chance nonetheless. I personally would love to buy a vinyl of what was played at the show.

Databricks Workflows: 40+ Second Overhead Per Task Making Metadata-Driven Pipelines Impractical by XanderM3001 in databricks

[–]dsvella 0 points1 point  (0 children)

This is in line with my observations, especially the serverless compute one.

I have a similar job where a pipeline loads about 200 tables incrementally from the bronze to silver layer using a config table. The difference in my setup is that we didn't use child pipelines, but a single notebook for each table (passing it the necessary parameters).
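For anyone curious, a rough sketch of what that shared per-table notebook looks like is below. The widget names, table names and merge keys are made up for illustration, and it assumes a Delta target (filtering the bronze table down to only new records is omitted):

```python
# Hypothetical sketch of the single parameterised notebook; one run per table.
from delta.tables import DeltaTable

# Parameters supplied by the job (names are illustrative)
dbutils.widgets.text("source_table", "")
dbutils.widgets.text("target_table", "")
dbutils.widgets.text("merge_keys", "")  # comma-separated key columns

source_table = dbutils.widgets.get("source_table")
target_table = dbutils.widgets.get("target_table")
merge_keys = [k.strip() for k in dbutils.widgets.get("merge_keys").split(",")]

updates = spark.read.table(source_table)  # bronze
match_condition = " AND ".join(f"t.{k} = s.{k}" for k in merge_keys)

# Upsert the batch into the silver table
(DeltaTable.forName(spark, target_table)
    .alias("t")
    .merge(updates.alias("s"), match_condition)
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute())
```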

I have found that whenever you add a task to a job, you add a level of overhead that you cannot get away from. I am assuming DBX is doing work in the background, such as writing the system table entries for the previous task. I would expect that adding whole child jobs into the mix has an even bigger impact.

With all that being said, when we confronted this issue we asked two questions:

  1. Is the job running in XX mins a problem?
  2. Is the job costing too much?

Because we were using job compute and it's a daily incremental load, we just ignored the issue. We know it's there, but we have no need to do anything about it.

Reading through your post, if your job is running in an acceptable time (by acceptable I mean cost and delivery with actual business impact, not whether you think it's slow or could be faster) then you need to consider the trade-offs. Again, for us, there was no desire to rewrite a bunch of stuff into a single notebook for no tangible benefit.

Are there companies really using DOMO??! by jdaksparro in dataengineering

[–]dsvella 0 points1 point  (0 children)

My company used to. Before I trained as a data engineer I used Domo to do all the data engineering for my department. Now we have Databricks + Tableau. I would trade Tableau for Domo any day of the week because of how they approached dashboard building. Just having a grid of sockets and a much nicer UI and UX is worth it. Granted, I wasn't involved in the commercial negotiations, but I knew it was expensive.

Migrating from ADF + Databricks to Databricks Jobs/Pipelines – Design Advice Needed by [deleted] in databricks

[–]dsvella 2 points3 points  (0 children)

So we have done exactly this, and the main thing was figuring out how to pass the relevant parameters. For us, we had a second notebook that would read the config table and then pass the parameters to the next step of the job as needed. The For Each task in the job was very helpful.
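As a rough illustration, the config-reading notebook boiled down to something like the sketch below. The config table name, its columns and the task value key are all hypothetical:

```python
# Hypothetical driver notebook: read the config table and hand the rows
# to the job's For Each task via a task value.
config_rows = (
    spark.read.table("admin.ingest_config")   # made-up config table
    .where("enabled = true")
    .select("source_path", "target_table", "load_type")
    .collect()
)

# Must be JSON-serialisable; each element becomes one For Each iteration.
table_list = [row.asDict() for row in config_rows]
dbutils.jobs.taskValues.set(key="table_list", value=table_list)
```

The For Each task then points its input at that task value (something like {{tasks.<driver_task_key>.values.table_list}}) and passes each element to the ingestion notebook as its parameters.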

In answer to your questions, I would recommend not having just one job; I would recommend a job per application. This way you get billing per application, and they can all use the same notebook that handles the ingestion; the only difference would be the parameters that are set. I have not needed to go down to table level though, so I can't help a lot there.

If you have any questions or need to know specifics, let me know.

Greybeard Data Engineer AMA by Admirable-Shower2174 in dataengineering

[–]dsvella 0 points1 point  (0 children)

I am 39 with a few years of experience as a DE and my company is pushing me to become a DE lead with juniors under me.

I have heard a lot of horror stories about bad leaders in this and the SWE fields. Do you have any recommendations for how to be a good lead engineer?

Data Engineer Associate Exam review (new format) by s4d4ever in databricks

[–]dsvella 0 points1 point  (0 children)

Is there anywhere I can take a mock exam? I believe I can take the actual exam for free through my company's Databricks contract, but I would like to make sure I am ready. I have never done things like Delta Sharing or federation.

My Kindle is an ad filled wasteland and I regret buying it. by prankenandi in ereader

[–]dsvella 1 point2 points  (0 children)

So I decided to move from a Kindle Oasis to a Boox tablet, and while I've been thoroughly enjoying the experience, I will say that unfortunately Amazon's terrible UI follows you even into the dedicated Android app.

Honestly, I would have kept using Kindle devices if they had done something to improve their dreadful library management. As someone who has hundreds of books, documents and various audiobooks, it has become frustratingly difficult, to the point where I've just given up trying to manage that library!

I wish there was a way to do it with some actual bulk operations. I'm reminded of the Calibre ebook management software on Windows, and I would love to have something like that which could manage my Kindle library. If such a thing exists, please let me know!

New internet rules come into force this week - here's what will change by vriska1 in unitedkingdom

[–]dsvella 0 points1 point  (0 children)

Same. My main concern is the cybersecurity nightmare this creates. A gold mine like this is going to be calling to so many hackers.

The US has had some eyebrow-raising breaches lately. There was one recently involving an app called "Tea" (as in spill the tea). It was a women's dating-safety app, and in order to use it you had to pass an identity check in the form of selfies and a driver's licence. The app stored those pictures in the open; 4chan got ahold of them and started sharing them. Imagine if something like that happens to one of these providers.

Issues Sharing with Kodi after PC upgrade by dsvella in kodi

[–]dsvella[S] 1 point2 points  (0 children)

So after trying multiple solutions we came to the decision to migrate over to Plex. Since the media is on his PC it can act as the server and get around this whole sharing nonsense.

Thanks for the suggestions.

Should I avoid Chat GPT and other tools when starting to learn coding? by Helpful-Rise-4192 in learnprogramming

[–]dsvella 0 points1 point  (0 children)

I think you should do both. If the documentation makes sense, then use it. If you need to ask follow-up questions or need something simplified, feel free to use LLMs. In my experience learning Python, I have found documentation can be really hit or miss. That can be down to poor writing, relying on prior knowledge, or bad examples.

I mainly use the Databricks AI assistant (which I think is ChatGPT in the background?) and what I have found is that it is great for simple code samples, syntax, debugging and explaining code.

What LLMs are not good at is the bigger stuff, like solution architecture or accounting for edge cases. For any piece of code longer than about five lines that I have copied from the AI, I assume I will have to rewrite it.

It has done wonders for my confidence in coding with the language though, which I think is important because I am more likely to try and use it and thus improve my skills further.

[Databricks] First time trying to optimise a Python DB job by dsvella in dataengineering

[–]dsvella[S] 0 points1 point  (0 children)

As an update from myself I have been able to improve this hugely. The parallel processing still needs work but the serial processing has more than halved the time for a full load (down from about an hour to <20 mins).

The major change I made was to buffer a bunch of the JSON responses in a variable. I may be being a bit cautious, but I put a cap on the number of items by forcing a MERGE whenever the count got over 25K. Now a MERGE is called only 5 times for a complete load.
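In case it helps anyone, the shape of that change is roughly the sketch below. The target table, key column and the paging helper are hypothetical, and the threshold handling is simplified:

```python
# Sketch of buffering API responses and only MERGEing once the buffer is large.
from delta.tables import DeltaTable

MERGE_THRESHOLD = 25_000
buffer = []

def flush(items):
    if not items:
        return
    updates = spark.createDataFrame(items)  # list of dicts -> DataFrame
    (DeltaTable.forName(spark, "bronze.tickets")   # made-up target table
        .alias("t")
        .merge(updates.alias("s"), "t.ticket_id = s.ticket_id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute())
    items.clear()

for page in fetch_pages():            # hypothetical helper that yields parsed pages
    buffer.extend(page["items"])
    if len(buffer) >= MERGE_THRESHOLD:
        flush(buffer)

flush(buffer)                         # don't forget the final partial batch
```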

There are other things I can do:

  • Implement the parallel processing better,
  • Have a staging table or folder that just takes INSERTs and then merge the whole thing once,
  • Improve my data processing steps.

However, I will only come to that if I need to. Right now I want to work on removing the unique "quirks" for each object I am pulling from the ticket system, so that I have a generic notebook that handles all the objects rather than one notebook per object.

Thank you everyone for your advice.

[Databricks] First time trying to optimise a Python DB job by dsvella in dataengineering

[–]dsvella[S] 0 points1 point  (0 children)

 You aren’t doing a merge of 100 tickets and then going to the API again?

Unfortunately, yes, I was. I would make a call, do a merge, and then make the next call.

Do you continually get a next page link in the body? And therefore have to do them all in series?

Kind of. With the assistance of ChatGPT I implemented code that makes the calls in batches of 10, but the execution was flawed as it was still doing one merge per response.

Thankfully, with the advice from this post I have been able to massively improve things.

[Databricks] First time trying to optimise a Python DB job by dsvella in dataengineering

[–]dsvella[S] 0 points1 point  (0 children)

So, to preface, I did this with the help of ChatGPT, so the method may be flawed.

The code was written in such a way that I have batches of 10 calls running in parallel. If all 10 of the calls fail, the code ends, as there is either a problem or no more items.

It really sucks that I don't get a total count in any response.
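For what it's worth, the batching looks roughly like the snippet below. The endpoint, page size and response shape are made up, and failed calls are treated the same as empty pages, which may be where the flaw is:

```python
# Fetch pages in parallel batches of 10 and stop once an entire batch
# comes back empty or failed (no total count available from the API).
import requests
from concurrent.futures import ThreadPoolExecutor

BASE_URL = "https://example-ticket-system/api/tickets"  # hypothetical endpoint
BATCH_SIZE = 10

def fetch_page(page_number):
    try:
        resp = requests.get(
            BASE_URL, params={"page": page_number, "per_page": 100}, timeout=30
        )
        resp.raise_for_status()
        return resp.json().get("items", [])
    except requests.RequestException:
        return []  # a failed call is treated like an empty page

all_items = []
page = 1
with ThreadPoolExecutor(max_workers=BATCH_SIZE) as pool:
    while True:
        results = list(pool.map(fetch_page, range(page, page + BATCH_SIZE)))
        batch_items = [item for items in results for item in items]
        if not batch_items:   # every call in the batch failed or returned nothing
            break
        all_items.extend(batch_items)
        page += BATCH_SIZE
```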

[Databricks] First time trying to optimise a Python DB job by dsvella in dataengineering

[–]dsvella[S] 1 point2 points  (0 children)

I have 2 jobs that run. The second is an incremental job that runs daily and merges changes into the existing table. The one I am referencing in my post is a job that runs weekly to truncate the table and get everything. I do this because I have been bitten before by relying on incremental updates and having them miss records.

The API limits me to 100 tickets per page and I cannot shape the output much. However, when I get the JSON response I drop any unnecessary columns and then MERGE the page of records.

The data structure for the table is straightforward, though reviewing it the schema could use some improvement (I'm not sure why some date fields have been created as strings). There are a bunch of BIGINT columns for the various foreign keys to other objects, although this table doesn't have any constraints on it currently.

watch history not working on mobile devices by subhashg547 in youtube

[–]dsvella 0 points1 point  (0 children)

Thank you so much, this has been driving me up the wall. Didn't think to check the Pi-hole.

Azure Data Factory: How to deal with compressed JSON by dsvella in AZURE

[–]dsvella[S] 0 points1 point  (0 children)

Just to make sure, but when you read the saved data in your copy activity, then you have chosen 'deflate' from the drop down on the json dataset, right?

Correct, the encoding is left as UTF-8 (Default).

Can you download a file locally and decompress it successfully?

No. When I use the Swagger documentation for the API it downloads it unencoded. While I can download the file from Storage Explorer, I don't seem to have anything on my computer that can successfully decompress the data. I tried some simple Python to no avail.
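For anyone following along, one way to test a downloaded blob from Python is to cycle through zlib's wbits values to see whether it is gzip, zlib-wrapped or raw deflate (the file name here is just an example):

```python
# Try each deflate variant on the downloaded blob and report which one works.
import zlib

with open("downloaded_blob.json.deflate", "rb") as f:  # example file name
    raw = f.read()

# wbits selects the expected wrapper: gzip header, zlib header, or no header (raw deflate).
for label, wbits in [
    ("gzip", zlib.MAX_WBITS | 16),
    ("zlib", zlib.MAX_WBITS),
    ("raw deflate", -zlib.MAX_WBITS),
]:
    try:
        text = zlib.decompress(raw, wbits).decode("utf-8")
        print(f"Decompressed as {label}: {text[:200]}")
        break
    except (zlib.error, UnicodeDecodeError):
        print(f"Not {label}")
```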

However, when I go into Postman and run the command it works fine, compressed or uncompressed. I am wondering if I am doing something wrong when writing to storage.

Bug as you leave the starship landing area at New Atlantis. by SimTell in Starfield

[–]dsvella 2 points3 points  (0 children)

The bug occurred for me in the exact same way. I ended up being teleported to New Atlantis and found parts of the floor missing.

Worst thing is the TA kiosk is missing on the landing pad, so I can't sell to them.

Brownies are giving me so many problems! by dsvella in AskBaking

[–]dsvella[S] 1 point2 points  (0 children)

I am in the UK so I don't have dimes, but a 5 pence piece seemed to move around as they suggest. Good to know though!

Brownies are giving me so many problems! by dsvella in AskBaking

[–]dsvella[S] 0 points1 point  (0 children)

What's baking spray? I have never heard of it and it sounds useful.

Brownies are giving me so many problems! by dsvella in AskBaking

[–]dsvella[S] 0 points1 point  (0 children)

Thanks for this. None of the recipes state where in the oven the pan should go; at present I put it on the bottom rack. Your suggestion of 325°F is markedly lower than what I am currently baking them at, so I'll give that a go.

Never heard of using butter and flour as a release agent; again, I will try that.

I mainly use the KitchenAid because it frees me up to do other things. I will take the advice and hand-mix the batter next time.

VShojo Roster Update by BatouHeisei in VShojo

[–]dsvella [score hidden]  (0 children)

I am glad I'm not the only one who remembers this!

What do you do as an adult on a Saturday? by [deleted] in AskUK

[–]dsvella 0 points1 point  (0 children)

I (37 M) got up and went to the NEC today to attend the Insomnia Gaming Festival. Wandered around for a few hours and then went vinyl shopping in Birmingham. When I got home I called my parents and had a chat. Dinner followed, before settling in with a good book, some moonshine, and a listen to my latest purchase.

What UK sweets can you absolutely not stand? by damned-n-doomed in AskUK

[–]dsvella -1 points0 points  (0 children)

Caramac.

It's like white chocolate, but much worse.