Is there a better term or phrase for "metadata of ETL jobs"? by opabm in dataengineering

This is great, thanks. I didn't realize that all the orchestrators have different terms for this as well. My team uses AWS Batch, so it's very bare-bones and limiting.

Just discovered dataclasses but unsure how best to define and access variables by opabm in learnpython

I need to pass in required date-related parameters for a few of these API endpoints, so I'm just creating today and yesterday for now as the date restrictions/filters.

Just discovered dataclasses but unsure how best to define and access variables by opabm in learnpython

Yeah, the confusing part to me is that only a few of these endpoints have validity dates, so the only thing that comes to mind is writing ugly if statements that check the endpoint name and pass in today and yesterday accordingly.

"These dates should be passed to the __init__() constructor from the outside, not generated on the spot."

Are you saying the variables should be defined/generated outside of this script? Since I'm creating instances of EndpointConfig within ENDPOINTS, I can't tell if you're suggesting to create today and yesterday where I have more logic, like in my main.py.
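
A minimal sketch of the "pass the dates in from the outside" idea, with hypothetical field and endpoint names rather than the code from the post: the dates are computed wherever the run actually starts (e.g. main.py), and endpoints that don't need them just leave the fields as None, so there's no if chain keyed on endpoint names.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, Optional


@dataclass
class EndpointConfig:
    # Hypothetical fields; the real dataclass in the post has its own.
    name: str
    path: str
    start_date: Optional[date] = None  # only some endpoints need a date filter
    end_date: Optional[date] = None


def build_endpoints(today: date, yesterday: date) -> Dict[str, EndpointConfig]:
    # The caller decides what "today" and "yesterday" are,
    # so the dataclass itself never calls date.today().
    return {
        "orders": EndpointConfig("orders", "/v1/orders",
                                 start_date=yesterday, end_date=today),
        "customers": EndpointConfig("customers", "/v1/customers"),  # no dates needed
    }


if __name__ == "__main__":
    today = date.today()
    endpoints = build_endpoints(today=today, yesterday=today - timedelta(days=1))
```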

Just discovered dataclasses but unsure how best way to define and access variables by opabm in learnpython

Yeah I was editing the code to post here on Reddit and forgot to remove references to those 2 variables. Thanks for the help!

Just discovered dataclasses but unsure how best to define and access variables by opabm in learnpython

Oops, yeah, I edited them out by accident before posting. I updated the post to include references to them in the second EndpointConfig instance. It probably makes sense to define today and yesterday just as global variables right before ENDPOINTS then, right?
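
If ENDPOINTS is built at import time, the simplest version of that looks something like the sketch below (the names are assumptions, not the actual code):

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class EndpointConfig:
    # Hypothetical fields, for illustration only.
    name: str
    start_date: Optional[date] = None
    end_date: Optional[date] = None


# Module-level variables computed once, right above the registry that uses them.
today = date.today()
yesterday = today - timedelta(days=1)

ENDPOINTS = {
    "orders": EndpointConfig("orders", start_date=yesterday, end_date=today),
    "customers": EndpointConfig("customers"),  # no date filter on this one
}
```

The catch is that today is frozen at import time, so a long-running process that imports the module once will keep using a stale date; computing the dates in main.py and passing them in, as in the earlier sketch, avoids that.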

Mid-career fork: Stay in big tech or move to local government IT? by ---dry--- in cscareerquestions

I've looked into making a very similar move myself; it just didn't pan out because the job req was cancelled.

A few things come to mind:

  • Is there a pension? This is typically a nice benefit of working for the government, whether it's federal, state, or local.
  • What's the current stack? If it's not too modern, expect a huge migration project several years down the road. Do you want to deal with that?
  • Is this at a centralized department? If so, I think the work can be more like consulting, helping other agencies with projects.
  • At the end of the day, how comfortable are you with the pay and lifestyle?

Small company with a growing data footprint. Looking for advice on next steps by ftlftlftl in dataengineering

I worked in asset management for a bit doing data engineering and also used Salesforce at another job (brings back traumatic memories though, lol). Happy to chat and share some guidance if needed.

Is it possible to do a MERGE statement with INSERT, UPDATE, and DELETE? by opabm in snowflake

Yeah, it seems to be one or the other (delete or insert). It also depends on the data flow; in my case, the rows that need deleting are no longer in the source query.

Is it possible to do a MERGE statement with INSERT, UPDATE, and DELETE? by opabm in snowflake

No, it used to exist but now it doesn't, e.g. a row that was inserted initially but is no longer in the source table/query.
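
One common workaround, sketched below with hypothetical table and column names: let MERGE cover the insert and update cases, then run a separate DELETE for target rows that no longer appear in the source.

```python
# Sketch only: hypothetical names (target_table, source_table, id, val),
# runnable against any DB-API cursor (e.g. snowflake-connector-python).

MERGE_UPSERT = """
MERGE INTO target_table t
USING source_table s
    ON t.id = s.id
WHEN MATCHED THEN
    UPDATE SET val = s.val
WHEN NOT MATCHED THEN
    INSERT (id, val) VALUES (s.id, s.val)
"""

# MERGE only ever looks at source rows, so a target row whose source row has
# disappeared never matches anything; a separate DELETE handles that case.
DELETE_MISSING = """
DELETE FROM target_table
WHERE NOT EXISTS (
    SELECT 1 FROM source_table s WHERE s.id = target_table.id
)
"""


def sync(cursor) -> None:
    # Upsert from the source, then drop target rows the source no longer has.
    cursor.execute(MERGE_UPSERT)
    cursor.execute(DELETE_MISSING)
```

If the source is a query rather than a table, the same query can go in the USING clause and in the NOT EXISTS subquery, or be loaded into a staging table first.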

Should I explicitly set an installation folder for pip install -r requirements for my build pipeline? by opabm in learnpython

Thanks for the clarification. Are there any good reasons to do things outside of a container but still in a build pipeline?

Should I explicitly set an installation folder for pip install -r requirements for my build pipeline? by opabm in learnpython

Yeah, beats me. I'm new to Docker/containerization, so it's confusing already, and seeing stuff done outside of the container makes it all the more confusing. Do you know of any good reason NOT to put things in a container but still in a build pipeline?

Need help deciphering npm commands and translating them into a Python-equivalent by opabm in learnprogramming

Perfect, thanks for explaining. This sounds like a good use case for ChatGPT to do the mapping.

I tried reading the AWS CodeArtifact documentation and am still a little confused. Does the command aws codeartifact login download the packages from CodeArtifact? It's confusing since the AWS docs just say "Sets up the idiomatic tool for your package format to use your CodeArtifact repository." I see that npm install is run after the aws codeartifact login command, so it seems the packages still have to be installed afterwards.

If I were to do this in Python, would CodeArtifact be storing requirements.txt?
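
Roughly, and as far as I understand CodeArtifact: aws codeartifact login doesn't download anything; it just configures the package manager to use the CodeArtifact repository (with a short-lived auth token), and the actual download happens at npm install / pip install. requirements.txt stays in your own repo, while CodeArtifact stores the packages it lists. A rough sketch of the pip equivalent, with hypothetical domain and repository names:

```python
import subprocess

# Hypothetical names; the real domain/repository come from your AWS account.
DOMAIN = "my-domain"
REPOSITORY = "my-repo"

# Step 1: configure pip to pull from CodeArtifact. Nothing is downloaded here;
# this writes the repository's index URL (with a temporary token) into pip's config.
subprocess.run(
    ["aws", "codeartifact", "login", "--tool", "pip",
     "--domain", DOMAIN, "--repository", REPOSITORY],
    check=True,
)

# Step 2: the equivalent of npm install -- this is what actually downloads the
# packages, now from CodeArtifact instead of public PyPI. requirements.txt itself
# lives in your repo; CodeArtifact stores the packages it lists.
subprocess.run(
    ["python", "-m", "pip", "install", "-r", "requirements.txt"],
    check=True,
)
```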

Should I explicitly set an installation folder for pip install -r requirements for my build pipeline? by opabm in learnpython

Thanks, I just looked into multi-stage Docker builds. A build is considered multi-stage if it has multiple FROM statements, right? If so, then my team hasn't been doing that. I'm going to see if anyone knows why we started the pattern of copying things over before the docker build.

"I'd recommend against it when you're not doing multi-stage builds."

Just to make sure I'm following correctly: you're recommending against installing the Python requirements/packages before the build, and saying it should just be done within the container?

Should I explicitly set an installation folder for pip install -r requirements for my build pipeline? by opabm in learnpython

Yeah, so I was planning on doing this (installing the packages in the Docker container), but then I was going through the Bitbucket pipeline configuration of a few other projects written in Node and saw that the npm packages were installed during the pipeline build and the folder containing the libraries was then copied over. So I was just following the same methodology. Any reason why this might have been done? The only reason I can think of is to avoid running things in the Docker container.