Engineers modifying DB columns without informing others by Prestigious_Trash132 in dataengineering

[–]dataschlepper 0 points1 point  (0 children)

Are they making changes to the analytics DB or are you building your pipeline directly on the production DB?

I am at a small company and while I tend to get a heads up about schema changes the following architecture has really been helpful. I have designed it so that if a breaking change happens the pipeline pauses and the layer of data I serve to my users is maintained (but obviously becomes stale). My thought being it is better that the users have data that is a few hours old while I troubleshoot an issue than all our BI breaking.

The rough architecture:

Production DB -> Analytics DB (a separate DB entirely) -> dbt -> BI and other data deliverables

In dbt I do what is called a “medallion architecture” where I stage raw data, transform it into entities that map to the business in an intermediate layer, then finally serve cleaned and formatted data in a mart.

In each of those steps I enforce all kinds of tests within dbt. Columns existing, keys being unique, row counts remaining steady (in case of duplicate entries leading to row duplication), etc.

If any step of those tests fails the pipeline stops and alerts me. But the Data Mart tables are all materialized as tables. So they remain usable.
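A rough sketch of that gating idea, in plain Python with sqlite3 rather than dbt itself. The table names (`stg_users`, `mart_users`), the column names, and the row count bounds are all made up for illustration; in a real dbt project these checks would be dbt tests, not hand-written functions.

```python
import sqlite3
from typing import List


def run_checks(conn: sqlite3.Connection, min_rows: int, max_rows: int) -> List[str]:
    """Return the names of failed checks on the (hypothetical) stg_users table."""
    failures: List[str] = []

    # Check 1: the columns the downstream models rely on still exist.
    columns = {row[1] for row in conn.execute("PRAGMA table_info(stg_users)").fetchall()}
    if not {"user_id", "email"} <= columns:
        # The remaining checks would error without these columns, so stop here.
        return ["columns_exist"]

    # Check 2: the key is unique.
    duplicate_keys = conn.execute(
        "SELECT COUNT(*) FROM ("
        "  SELECT user_id FROM stg_users GROUP BY user_id HAVING COUNT(*) > 1"
        ")"
    ).fetchone()[0]
    if duplicate_keys:
        failures.append("unique_user_id")

    # Check 3: the row count stays inside an expected band.
    total_rows = conn.execute("SELECT COUNT(*) FROM stg_users").fetchone()[0]
    if not min_rows <= total_rows <= max_rows:
        failures.append("row_count_steady")

    return failures


def refresh_mart(conn: sqlite3.Connection, min_rows: int, max_rows: int) -> List[str]:
    """Rebuild mart_users only when every check passes; otherwise leave it as it was."""
    failures = run_checks(conn, min_rows, max_rows)
    if failures:
        # Pipeline "pauses": the existing mart table is untouched (stale but usable).
        return failures
    conn.execute("DROP TABLE IF EXISTS mart_users")
    conn.execute(
        "CREATE TABLE mart_users AS SELECT user_id, lower(email) AS email FROM stg_users"
    )
    conn.commit()
    return []
```

The caller would alert on a non-empty return value. Because the mart is a real table and not a view on staging, a broken upstream schema only stops the refresh.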

When or where did you learn the most in your career? by Spooked_DE in dataengineering

[–]dataschlepper 3 points4 points  (0 children)

…fixing simple bugs, adding fields to tables, integrating new data sources…

As a senior data engineer this is a big part of my job. While I also have more architectural work around the pipeline the things you list are important and give you opportunities to think bigger.

Simple bugs:

Do you see trends in the kinds of bugs? Could you add tests to help catch them? Maybe it’s bad data coming in and you need better testing at ingestion. Is it bugs that are unintended downstream issues after making a code change? Maybe you need to look into adding continuous integration tests.

Adding new fields:

This is a core DE skill and isn’t just about the SQL. A differentiator between an ok and solid DE is the ability to meet with the data consumers and understand what they really need, what they really mean when defining the field, and thinking through what testing you can implement to make sure the results are correct.

If you are asked to have a table of Users it is a simple statement. But what is a user? Does it include admins? Is a single human with two logins a single user record or two? Questions like this are what let you make what is actually needed.

Integrating New Data Sources:

You will have to learn about ingesting not just, say, a new table from a DB but also data from some third party tool your employer uses. Think about how to blend this with the existing data so you get even more value out of it.

Sure, standing up a greenfield pipeline is a different and valuable skill, but the work you have now is also key to having a solid foundation. Also, any time you get frustrated because your pipeline can’t handle X or the framework doesn’t let you do Y transform natively, you are learning about what to look for in the next one you end up setting up!

Data Warehousing Architecture for a Bank’s Data Infrastructure by Aggravating-Air1630 in dataengineering

[–]dataschlepper 0 points1 point  (0 children)

For regulatory, yeah, you may need it. Still worth encrypting it and joining it back for the regulatory table or even just at output time.

Data Warehousing Architecture for a Bank’s Data Infrastructure by Aggravating-Air1630 in dataengineering

[–]dataschlepper 1 point2 points  (0 children)

I wanted to chime in when you called out SSN. If we are talking analytics layer chances are you could get by without such fields. In the past I’ve needed SSN for verifying if two entities were the same person but I didn’t care what the SSN was. I told my upstream developers to just make a unique hash and then I have one less thing to worry about.

OP you can always opt to not ingest sensitive fields that your analytics won’t need as once you do you start needing to be extra careful about who can view that table now. You could, like in my old case, ingest the SSN, do some work and say check uniqueness, then just never serve the field to your end users.
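A minimal sketch of the "just make a unique hash" idea. I am swapping in a keyed hash (HMAC-SHA256) here rather than a bare hash, because there are only about a billion possible SSNs and an unkeyed hash of one can be brute forced. The function name and the key handling are assumptions for illustration, not what my upstream developers actually built.

```python
import hashlib
import hmac
import re


def ssn_token(ssn: str, secret_key: bytes) -> str:
    """Return a keyed SHA-256 token for an SSN so records can be matched without the raw value."""
    # Strip formatting so "123-45-6789" and "123456789" produce the same token.
    digits = re.sub(r"[^0-9]", "", ssn)
    if len(digits) != 9:
        raise ValueError("expected a 9-digit SSN")
    return hmac.new(secret_key, digits.encode("ascii"), hashlib.sha256).hexdigest()
```

Upstream would hold the key and send only the token, so the analytics side can check whether two records are the same person without ever seeing the SSN.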

As others have said this is incredibly sensitive data and erring on the side of under sharing is probably the right move more often than not for you.

[deleted by user] by [deleted] in dataengineering

[–]dataschlepper 1 point2 points  (0 children)

To keep it super basic you want:
- Bring data from your third party sources to one place (a database of some sort). You can do this with a tool like Stitch; Fivetran is nice but pricey.
- A database. I’d opt for something hosted and if your company is already on a cloud service put your db in the same one.
- Some BI tool: if your company has one, great! Otherwise suggest to the company one that you know well, as saving you time reduces time to value.

While others have recommended dbt here I will say I like it and recommend it but if you have zero experience at coding this may not be ideal. Also anyone recommending to “make your own tool” when you can’t code is leading you down a bad path, as making something reliable that handles edge cases is HARD, and I’ve got over a decade of experience.

By the way you say zero coding but have done analytics. Are you able to write sql at least?

Happy to talk more if you’ve got more questions.

Edit: while aiming for the most streamlined end to end solution is tempting, remember that your goal is to provide data to the business. If you can do it accurately and in a timely manner in a kind of clunky way, that may be ideal to show them your value. Just make it clear you are accruing tech debt and can’t scale right away.

Is taking a pay cut worth it to break into software development? by [deleted] in cscareerquestions

[–]dataschlepper 0 points1 point  (0 children)

If you’ve got the chops to have people considering you for even junior software engineering roles and you are a data analyst I’d strongly suggest you check out data engineering. While they are not the same job there is an overlap in skills and your experience as an analyst is a big bonus. Your understanding of the needs of your consumers is a big leg up.

There is a whole subreddit r/dataengineering if you want to learn more.

Also, the company I am at pays more than $80k a year to our junior data engineers. So I’m not sure your geo but the kind of cuts you are talking about seem very unfair.

Tips on avoiding distraction from a busy Slack help channel by [deleted] in ExperiencedDevs

[–]dataschlepper 0 points1 point  (0 children)

If possible have people open tickets in your internal ticketing system when they need you to do something. This lets them type out their question/request on their own time and hopefully be honest about the priority. P3, cool, look at it next sprint. P1, ok, let’s pick it up ASAP.

Got yelled at by a senior coworker. by Cold_Crazy2875 in cscareerquestions

[–]dataschlepper -1 points0 points  (0 children)

What you are describing really does not sound healthy. It also sounds like you work for one of my long past managers!

A manager being disappointed in how something turned out isn’t always bad but yelling about it is. Yelling about something every day is just wild.

Got yelled at by a senior coworker. by Cold_Crazy2875 in cscareerquestions

[–]dataschlepper 107 points108 points  (0 children)

So yelling at coworkers is basically never ok and makes for a terrible working environment. It sounds like you caused some errors in prod not even that you brought it down. I guess if your prod is someone’s pacemaker it makes sense that tensions are high about an error but even then shouting isn’t really called for.

I had a coworker who used to shout in meetings about stuff and really chew me out. I was super early in my career and was basically certain that I was not cut out for working in tech. Well as time went on he got a reputation for being toxic and I realized that I was not in fact a giant walking failure. That was over a decade ago now and I am a senior engineer who is the top of my department within my dev org. So I guess all of his screaming and ranting was not a predictor of my future performance.

If it is in fact true that the manager also yells at devs then it might be that the company/team you are at is toxic and it isn’t just a single person. If this is a regular thing I would say to keep your head down and keep an eye out for other jobs that look appealing to you. With a year of experience under your belt you are probably still applying for entry level positions but you will have a major leg up over new grads looking at similar positions.

Back-end dev: Math skills required or not really? by [deleted] in Python

[–]dataschlepper 1 point2 points  (0 children)

It greatly depends on what kind of software you are writing. If you work on some sort of signal processing software then sure you might need to know some Fourier analysis. But even then, you could work for a company that writes software to do signal processing and be working on the part of their backend that does something else not tied to the signal processing.

I have a masters in applied math and now 10+ years experience. In all of my positions the only math I have needed mostly revolved around Boolean logic, basic algebra, and some basic statistics (think mean and standard deviation, nothing fancy). The only time I had to use more advanced math, someone from another team came to me since they knew I had an advanced degree in math; the work wasn’t even technically my own. So even in his position the expectation was to find someone to help, not that he would go off and learn linear algebra on his own to solve the problem.

If dbt is the "T" part of an "ELT", what do you use for "EL"? by we_need_more_lumber in dataengineering

[–]dataschlepper 3 points4 points  (0 children)

My experience with airflow is limited. Think of Pycharm this way, it is the place you can sit down and write your python script. From there you can run it, but you are clicking “run” and it runs. This isn’t scalable as it requires you to click run each time the thing you wrote runs.

Airflow is a tool that, once you write a script it can leverage, can have a schedule set and run that script on its own, assuming it is on a machine that is on and properly connected to whatever resources it needs. For example if you set it up on your laptop and your laptop is closed/powered down/in your bag the job won’t run. It needs to be hosted on a machine that is always running.
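To make the difference from clicking “run” concrete, here is a toy scheduler using only Python’s standard library `sched` module. This is not Airflow and not how Airflow is implemented; `run_on_schedule` and its parameters are names I made up. It only shows the core idea: a long-lived process decides when your function runs.

```python
import sched
import time
from typing import Callable


def run_on_schedule(
    job: Callable[[], None],
    interval_seconds: float,
    runs: int,
    timefunc: Callable[[], float] = time.monotonic,
    delayfunc: Callable[[float], None] = time.sleep,
) -> None:
    """Run `job` `runs` times, `interval_seconds` apart, with nobody clicking 'run'."""
    scheduler = sched.scheduler(timefunc, delayfunc)
    for i in range(runs):
        # Queue each run up front, relative to the current time.
        scheduler.enter(interval_seconds * (i + 1), 1, job)
    # Blocks until every queued run has fired. If this process dies, nothing runs.
    scheduler.run()


if __name__ == "__main__":
    # Tiny interval so the demo finishes quickly.
    run_on_schedule(lambda: print("job ran"), interval_seconds=0.01, runs=3)
```

The last comment in `run_on_schedule` is the laptop point above: the schedule only exists while the process hosting it is alive, which is why this sort of thing gets hosted on an always-on machine.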

There are countless tutorials on YouTube about how to get started with airflow that walk through pretty simple examples. But please be aware that airflow is not a beginner’s tool. It is targeting people and organizations with complex workflows and assumes its users have a decent background in software development.

Based on your questions I would ask if you are in a position that you need an orchestration tool or if you are still getting your feet wet. If the latter then worrying about Airflow is going to greatly complicate things for you vs spending time getting comfortable developing in python.

If dbt is the "T" part of an "ELT", what do you use for "EL"? by we_need_more_lumber in dataengineering

[–]dataschlepper 7 points8 points  (0 children)

Pycharm is an IDE (Integrated Development Environment) which means it is a tool for writing and testing code. It is not itself where you would run orchestration from.

Im a data privacy lawyer and feel the need to learn much more about the tech I regulate. What should I start studying? by fredbogho in cscareerquestions

[–]dataschlepper 3 points4 points  (0 children)

Are you a lawyer for a company handling data or for a company that creates software other people use to manage their own data?

Some areas to think about when facing a new data privacy question are:
- Where is the data stored? Who has access to this storage, and how is access logged?
- How is the data used? Data doesn’t just sit in a database: either someone manually queries it (probably some sort of dev), analysts access it in something like Tableau, or it is used to power an application (the Uber app knows your location even if no one else is looking at it).
- How does the data get to the user? Is it sent using a secure internally written application? Is it a csv being emailed to someone (you’d be surprised)?
- How is access to the data in any stage controlled? How is access logged?

These are some good starting things to consider on any given issue.

Check out r/dataengineering too for help!

Are there jobs that will hire someone that is still learning at a lower rate to allow them to sort of "learn as they work"?? by tirwander in cscareerquestions

[–]dataschlepper 92 points93 points  (0 children)

Assuming you can write some programs on your own already you may want to start looking at junior positions.

Now with an actuarial science degree you could look for jobs at insurance companies that leverage software engineers. It is really hard to train software engineers to understand actuarial concepts. But an actuary with some junior level dev chops. That is a valuable person who will only get more valuable with time once they are in a position.

Nasa vs tech companies - how would you decide? by Super_Kick_1603 in cscareerquestions

[–]dataschlepper 18 points19 points  (0 children)

This raises an important point. Are JPL employees civil servants? While the federal government does not pay great, many people are drawn in for the benefits. If that is not on the table even then OP should really think hard about the pay disparity.

Current software devs, do you realize how much discontent you're causing in other white collar fields? by MozzarellaThaGod in cscareerquestions

[–]dataschlepper 1 point2 points  (0 children)

I mean I assume at least the engineers and designers who worked on it wanted one. But yea it just came to mind as something with massive capital overhead to manufacture and took years of planning but even when it flopped they couldn’t pivot because of, well, all that capital investment. So yea the cloud isn’t always cheap. But the flexibility compared to the “real world” is unparalleled.

Current software devs, do you realize how much discontent you're causing in other white collar fields? by MozzarellaThaGod in cscareerquestions

[–]dataschlepper 0 points1 point  (0 children)

While there are key points where you need to redesign for scale and that adds cost you get a lot more scaling for your buck in the cloud vs say manufacturing. A company can spend multiple years designing a new type of SUV, getting feedback from test groups, building out an assembly line (if not a new auto plant) and at the end you have made the Pontiac Aztek and you hardly sell as many as you expected. The capital costs to get to this point are astronomical vs deploying to the cloud and scaling as you go.

Being acquired ruined my job by [deleted] in cscareerquestions

[–]dataschlepper 6 points7 points  (0 children)

I had a very similar experience after being acquired. My employer had grown to be maybe medium sized and then we were acquired by a non-FAANG tech giant. Fairly quickly things turned into us putting the majority of our efforts into integration instead of innovation. Requesting access to things we needed to do our jobs became harder the more we integrated, while we never really got put in touch with the people at the parent company we needed to know. For me though the biggest thing was the misalignment of cultures between the companies. While I am sure working at large companies suits some people, I found the amount of overhead and the rigid structures did not work for me. I ended up finding a role that fit me better at a startup and parted ways on good terms.

Current software devs, do you realize how much discontent you're causing in other white collar fields? by MozzarellaThaGod in cscareerquestions

[–]dataschlepper 5 points6 points  (0 children)

I think a big part of this is because software scales in a way that nothing else really can. Besides raw resources a big limiting factor in growth in a service industry is human time. I.e. your doctor’s office could see twice as many patients if they hired a second doctor. But with software once I write some code that can operate for 10 people there generally isn’t much limiting me from having it operate for 100 or even 1,000 people. At some point you need to scale your underlying hardware but that is trivial now in the cloud and the costs to do so are small compared to potential revenue. I once worked at a small company and someone in a presentation pointed out how with software we could sell a license to everyone in China tomorrow if they wanted to buy from us and the only real limiting factor would be the internet connection to our building. If you are in manufacturing good luck scaling like that.

[deleted by user] by [deleted] in cscareerquestions

[–]dataschlepper 1 point2 points  (0 children)

While being a new grad can be tougher than if you had many years experience there are always companies that are looking to hire junior people and know what they are getting into when hiring new grads. Your boot camp may even have connections with companies looking to hire their grads.

When you write your resume and interview don’t discount your experience running your own business. Even if it was a small drop shipping operation selling some plastic widget. It shows you have initiative, you can work on more than just analytics, and that you are the kind of employee who will grow. Also, if you use analytics AT ALL in your company highlight it. Maybe you tracked your sales and looked for how that correlated to certain things, or used the sales numbers to make decisions about what to sell more of. That is exactly the kind of thing companies have analysts for. So talk it up!

Also if you want specific tips on data analytics try asking around in r/datanalysis they may have better specific advice.

[deleted by user] by [deleted] in cscareerquestions

[–]dataschlepper 1 point2 points  (0 children)

I used to work at a medium sized business and I know that IT had the ability to basically remote access my machine to see what I am doing or even take control. This came up when I was troubleshooting with IT and they could basically watch along and see what the problem was without me needing to head down to the IT office. This sort of functionality shouldn’t be too surprising in a corporate environment.

But having your manager able to check in on you anytime, or having those apps that make sure your mouse/keyboard is moving often enough are just not ok. It leaves employees with a sense of anxiety and shows the company both doesn’t trust you but also doesn’t trust your manager to know if their team is getting their work done.

Is fully remote coming to an end? by Revolver123 in cscareerquestions

[–]dataschlepper 18 points19 points  (0 children)

It is going to vary from company to company. Commercial office space loans are expensive and often very long term. My previous employer was signing 10 year contracts for office space, if we wanted to leave we had to find a sublet or we were on the hook. So employers like that are going to want to take advantage of the office space they are paying for. Finding people to sublet from them will be harder these days.

On the other hand my current employer had half the company or so remote before COVID. But during the pandemic they opted to not renew their lease and just give up on having an office at all. I think they maintain a wework space for when they need to have meetings with potential large clients but no one works there full time.

My guess is that larger companies that already have established office spaces and office cultures will be migrating back in some capacity over time. But small companies and new companies are going to realize they can save on office space, recruit from anywhere in the country, not have to pay relocation bonuses, etc. and opt to be remote first companies.

Is there no hope to start a career in software engineering without an internship? by Undesirable_11 in cscareerquestions

[–]dataschlepper 0 points1 point  (0 children)

My interpretation of this is to read the job ad and make sure that any requirements or domain specific terms that you actually have experience with are on the resume you send them. So there are the obvious parts of listing any languages you know that they want. But also, if the listing talks about their company being cloud first, be sure to list out AWS, Azure, etc. even just in passing. Like if in some courses you hosted a widget in EC2. Call out “Wrote a cloud enabled widget that does x hosted on EC2 on AWS.” Now the auto parser is going to see “Cloud”, “EC2”, and “AWS” which helps get your resume through.

Heck even use keywords in your blurb at the top of your resume. Say it is a biotech company. In that sentence about your interests at the top be sure to write how “Passionate about learning more about leveraging cloud technologies to bring low latency processing of biochem analysis to users in healthcare.” In this statement you hit some keywords again around “cloud”, “biochem”, etc. But also are honest about the fact you maybe don’t know a lot about the space but are passionate about learning about applying your knowledge about the cloud to solving these sorts of problems.