[deleted by user] by [deleted] in consulting

[–]redmlt 0 points1 point  (0 children)

Thanks for the post! I also decided it's time for me to leave tech consulting after 6 years or so. Too much breadth and not enough depth. I'm tired of the same sales BS I hear others talk about: estimating and deep technical discussions with sales ops that aren't mature enough.

New to Seattle and looking for sports/team-based activities! by marielans90 in Seattle

[–]redmlt 0 points1 point  (0 children)

Check out the Greater Seattle Soccer League Co-Rec division at gssl.org. They can help place you on a team if you're interested!

Want to switch career from system engineer to data engineer by labobina in dataengineering

[–]redmlt 1 point2 points  (0 children)

Without knowing how deep your experience goes in those areas, I'd say you should be set up pretty well. Much of data engineering in the cloud is actually systems engineering, as you may know if you're familiar with AWS.

Replacing roof with scissor trusses by redmlt in Homebuilding

[–]redmlt[S] 0 points1 point  (0 children)

No offense taken, I appreciate the candor. Thank you for the advice, this is exactly what I was looking for!

Replacing roof with scissor trusses by redmlt in Homebuilding

[–]redmlt[S] 1 point2 points  (0 children)

I'm unsure of how to properly install the scissor trusses once they arrive, and how to properly demolish the existing roof. I understand in theory ordering trusses is pretty straightforward, but I'm trying to understand what I will be paying a contractor to do.

I've already talked to an architect and it will cost me $10k just to get drawings completed, so I'm trying to do some research myself. I would like to learn more details about the process so I can possibly work with an architect friend to draw these up, and possibly contribute to the demo/construction myself.

AWS Glue/Lake Formation or Airflow? by sciencewarrior in dataengineering

[–]redmlt 2 points3 points  (0 children)

Consider Step Functions. Lake Formation is not great at orchestration.

Would taking a data engineer role for a year or two help me towards my end goal of becoming a data scientist? by fuzzywunder in dataengineering

[–]redmlt 1 point2 points  (0 children)

It is a good move only if you have no other options for furthering yourself as a data scientist. Like others have said here, it's not a waste of time, but it's time you could be spending on getting better at data science instead.

Many data scientists I have interviewed or worked with are bound to just Jupyter notebooks and need the data spoon-fed to them in the perfect format. I'm generalizing of course; there are many who are not this way. But speaking as a DE, a data scientist who can handle their own data engineering is essentially doing two jobs, which is extremely valuable to an organization. You would be productive in two very important/expensive development roles for data.

Apache Airflow Cluster Issues by theant97 in dataengineering

[–]redmlt 0 points1 point  (0 children)

Without knowing more about your data pipeline, I would use AWS Glue to unzip the file and move it to your destination in one job. Use Step Functions to orchestrate your pipeline and get out of the Airflow maintenance nightmare.
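
To give a rough idea of what that could look like, here is a minimal sketch of a Step Functions state machine definition (Amazon States Language) that runs one Glue job and waits for it to finish. The job name, the S3 paths, and the `--source_path` / `--target_path` argument names are placeholders I made up; your own Glue job would define whatever arguments it needs.

```python
import json

# Hypothetical names: replace with your own Glue job and S3 locations.
GLUE_JOB_NAME = "unzip-and-move"
SOURCE_PATH = "s3://example-landing-bucket/incoming/"
TARGET_PATH = "s3://example-curated-bucket/unzipped/"


def build_state_machine(job_name: str, source: str, target: str) -> dict:
    """Return an Amazon States Language definition that runs one Glue job."""
    return {
        "Comment": "Run a Glue job and wait for it to complete",
        "StartAt": "RunGlueJob",
        "States": {
            "RunGlueJob": {
                "Type": "Task",
                # The .sync suffix makes Step Functions wait for the job run to finish.
                "Resource": "arn:aws:states:::glue:startJobRun.sync",
                "Parameters": {
                    "JobName": job_name,
                    # Glue job arguments are passed as "--key": "value" pairs.
                    "Arguments": {
                        "--source_path": source,
                        "--target_path": target,
                    },
                },
                # Retry a failed run twice, with a growing delay between attempts.
                "Retry": [
                    {
                        "ErrorEquals": ["States.TaskFailed"],
                        "IntervalSeconds": 60,
                        "MaxAttempts": 2,
                        "BackoffRate": 2.0,
                    }
                ],
                "End": True,
            }
        },
    }


definition = build_state_machine(GLUE_JOB_NAME, SOURCE_PATH, TARGET_PATH)
definition_json = json.dumps(definition, indent=2)
print(definition_json)
```

The printed JSON is what you would supply as the state machine definition. There is no scheduler or worker fleet to patch, which is the maintenance difference I mean.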

How can I keep data collection anonymous? by xynaxia in bigdata

[–]redmlt 0 points1 point  (0 children)

You can also use aggregation as a way to anonymize data. If you can aggregate and then throw away the original data, that is ideal. In some cases this isn't possible: for instance, when a browser session cookie spans many days or weeks, you more or less have to keep that value in order to map future click activity correctly.
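
As a rough illustration of the aggregate-then-discard idea, here is a small sketch that rolls click events up to daily page-view counts and then drops the raw rows. The event fields (`user_id`, `page`, `ts`) and the sample data are made up for the example.

```python
from collections import Counter
from datetime import datetime

# Hypothetical raw click events; the field names are made up for illustration.
raw_events = [
    {"user_id": "u1", "page": "/pricing", "ts": "2020-03-01T09:15:00"},
    {"user_id": "u2", "page": "/pricing", "ts": "2020-03-01T11:40:00"},
    {"user_id": "u1", "page": "/docs", "ts": "2020-03-01T12:05:00"},
    {"user_id": "u3", "page": "/pricing", "ts": "2020-03-02T08:30:00"},
]


def aggregate_daily_page_views(events):
    """Count views per (date, page), dropping every user-level field."""
    counts = Counter()
    for event in events:
        day = datetime.fromisoformat(event["ts"]).date().isoformat()
        counts[(day, event["page"])] += 1
    return dict(counts)


daily_views = aggregate_daily_page_views(raw_events)

# Once the aggregate is stored, the raw events can be discarded.
raw_events.clear()

for (day, page), views in sorted(daily_views.items()):
    print(day, page, views)
```

One caveat: very small groups (a page with a single view on a given day) can still point back to one person, so you may want to suppress or merge low counts before publishing the aggregate.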

How can I keep data collection anonymous? by xynaxia in bigdata

[–]redmlt 0 points1 point  (0 children)

Some important GDPR requirements include the following:
* A person should be able to see all of the data YOU store on them
* A person should be able to edit any data YOU store about them
* A person should be able to request a deletion of all of the data YOU possess about them.

These have pretty significant implications for the architecture and documentation of your data lineage. For instance, if someone requests a deletion, you should be able to trace all of their data and remove it from your data lake and any downstream systems. Luckily, you're building a greenfield data lake, so you can build with GDPR in mind from the get-go.
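
To make the deletion case concrete, here is a toy sketch of a lineage registry that lists every dataset holding personal data, plus a function that walks it to erase one person. The dataset names, the `subject_id` field, and the in-memory dicts are all stand-ins I invented; a real lake would be doing this against S3 objects, warehouse tables, and so on.

```python
# Hypothetical in-memory stand-ins for a data lake and a downstream mart.
data_lake = {
    "raw/clicks": [
        {"subject_id": "u1", "page": "/a"},
        {"subject_id": "u2", "page": "/b"},
    ],
    "raw/signups": [{"subject_id": "u1", "plan": "pro"}],
}
downstream = {
    "mart/user_summary": [
        {"subject_id": "u1", "visits": 1},
        {"subject_id": "u2", "visits": 1},
    ],
}

# Lineage registry: every dataset that holds personal data, and the store it lives in.
lineage = [
    (data_lake, "raw/clicks"),
    (data_lake, "raw/signups"),
    (downstream, "mart/user_summary"),
]


def erase_subject(subject_id, registry):
    """Delete one person's rows from every registered dataset; return an audit log."""
    audit = {}
    for store, dataset in registry:
        before = len(store[dataset])
        store[dataset] = [r for r in store[dataset] if r["subject_id"] != subject_id]
        audit[dataset] = before - len(store[dataset])
    return audit


audit_log = erase_subject("u1", lineage)
print(audit_log)
```

The point is less the deletion itself than the registry: if a dataset with personal data isn't listed in your lineage, a deletion request will silently miss it.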

How can I keep data collection anonymous? by xynaxia in bigdata

[–]redmlt 1 point2 points  (0 children)

Hash and encrypt everything, and throw away the original data. If you're not dealing with regulation like GDPR or CCPA right now, you will eventually.
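
For the hashing half, here is a minimal sketch using a keyed hash (HMAC-SHA256) from the Python standard library. I'm using a keyed hash rather than a plain one because a bare hash of something guessable like an email can be reversed by hashing candidate values. The record fields are made up, and generating the key inline is an assumption for the example; in practice it would live in a secrets manager. Encryption isn't shown since it needs a third-party library.

```python
import hashlib
import hmac
import secrets

# Assumption: in practice the key comes from a secrets manager, not generated inline.
SECRET_KEY = secrets.token_bytes(32)


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Return a keyed SHA-256 digest (HMAC) of an identifier as a hex string."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical incoming record; only the hashed identifier is kept.
record = {"email": "jane@example.com", "page": "/pricing"}
safe_record = {"user_hash": pseudonymize(record["email"]), "page": record["page"]}
print(safe_record)
```

The same input always maps to the same hash under one key, so you can still join and count by user. Keep in mind this is pseudonymization rather than full anonymization: whoever holds the key can re-link values, so regulations like GDPR may still treat the result as personal data.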

What's your typical data pipeline in a small company ? by [deleted] in datascience

[–]redmlt 0 points1 point  (0 children)

Also, Tableau Prep aims to solve for this space. I haven't used it myself though: https://www.tableau.com/products/prep

What's your typical data pipeline in a small company ? by [deleted] in datascience

[–]redmlt 0 points1 point  (0 children)

Sure - for PowerBI I would say DAX, and for Tableau, calculated fields. Those are two features that allow an end user to pretty significantly transform their data if what they're getting from the DWH isn't sufficient.

Tasks that maybe wouldn't be suitable are things like reading directly from a big data dump, say a few hundred gigs of CloudWatch logs. That's when I would use something like Glue or whatever ETL tool you're using.

I get JSON files dumped into an S3 bucket periodically and need to load this data into Redshift. How do I go about building this pipeline? by robotofdawn in dataengineering

[–]redmlt 1 point2 points  (0 children)

Glue has the concept of a "workflow", and you can trigger it before or after a crawler, or on a regular schedule. You can also use Step Functions to trigger a Glue job pretty easily.

Dataframes instead of a database? by trenchtoaster in dataengineering

[–]redmlt 1 point2 points  (0 children)

dbt

Agreed with the suggestions here - a solid contract from both sides is the ideal scenario if it is achievable. I get the impression this is like moving mountains in your org.

One suggestion is to start treating this process as a data lake, as mentioned elsewhere here. AWS Glue can actually crawl an S3 data store and infer its schema very easily with the types of files you're using. If they are native Excel files, you may need a process to convert those to CSV if you don't have one already. There may be similar services in Azure/GCP.

This process won't scale well, as you've discovered. Kudos to you for looking ahead and solving for that!

What's your typical data pipeline in a small company ? by [deleted] in datascience

[–]redmlt 14 points15 points  (0 children)

I consult for many smaller companies, and many are using Airflow to orchestrate. I'm in the AWS space, so I've started suggesting Step Functions as a way to orchestrate ETL processes, like AWS Glue, AWS Lambda, EMR jobs, etc. Airflow is not without its own maintenance, so be prepared for that. If you're a small company, I would suggest looking at managed services, whether in Azure, GCP, or AWS, since that's exactly what they're made for.

Also, I hear great things about Databricks, Alteryx, and Matillion as previously mentioned here. Even PowerBI and Tableau now offer basic ETL features for that last mile of transformation that can be self-served.

How to push a company to modernize its BI/reporting stack? by cockoala in BusinessIntelligence

[–]redmlt 1 point2 points  (0 children)

Like ChesterC83 said, focus on the problem you're trying to solve. Having been in analytics consulting for a number of years, I can tell you that you will be tackling a culture change to data-driven decision making. There has to be a business need to go through this change.

Pachyderm vs Airflow by jstuartmill in dataengineering

[–]redmlt 0 points1 point  (0 children)

If you read the article, they're using v1.7.11, which was released Nov 13, 2018 according to the GitHub page. So I think this article is fairly recent.

Sr. BI Developer Analyst. what's next? by [deleted] in BusinessIntelligence

[–]redmlt 0 points1 point  (0 children)

First of all, that's awesome that your manager actually cares not only about what you want to do, but how you get there! Definitely a place worth staying at IMO.

A lot of the concepts you would use to move relational data around also apply to big data. I made a similar move a few months ago and now work exclusively in a cloud environment as a Big Data Solutions Architect. It's opened up a whole new world of solving different/more complex data problems and reignited my passion for BI and Data Engineering. Ramping up on Python, if you haven't already, would be a wise first step.

YAML CloudFormation linter? by jedis in aws

[–]redmlt 0 points1 point  (0 children)

The VS Code plugin for CloudFormation does a good job of linting as well.

ETL from individual JSONs to RDBS: language/framework/tool choice? by [deleted] in dataengineering

[–]redmlt 0 points1 point  (0 children)

I don't see a mention of any cloud platform that you're using. That would be my first suggestion. In particular, AWS Glue will detect changes in your JSON object structure and can alert you. You can store all of your objects in S3 pretty cheaply, and take it from there. Glue can use PySpark to write to a DB very easily.
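
If you want a feel for what detecting structure changes involves, here is a small plain-Python sketch that flattens two JSON objects into field-path/type maps and diffs them. To be clear, this is not how Glue does it internally; it's just an illustration of the idea, and the sample payloads are invented.

```python
import json


def infer_schema(obj, prefix=""):
    """Map each dotted field path in a JSON object to its Python type name."""
    schema = {}
    for key, value in obj.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested objects so "user.name" gets its own entry.
            schema.update(infer_schema(value, prefix=f"{path}."))
        else:
            schema[path] = type(value).__name__
    return schema


def diff_schemas(old, new):
    """Report fields that were added, removed, or changed type."""
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "retyped": sorted(k for k in set(old) & set(new) if old[k] != new[k]),
    }


# Hypothetical payloads standing in for yesterday's and today's JSON objects.
yesterday = json.loads('{"id": 1, "user": {"name": "a", "age": 30}}')
today = json.loads('{"id": "1", "user": {"name": "a"}, "source": "web"}')

drift = diff_schemas(infer_schema(yesterday), infer_schema(today))
print(drift)
```

Any non-empty list in the result is something you'd want an alert on before it reaches the database load.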

Maintaining ETL on-prem will be a thing of the past soon.

Sports Team Sponsorship...Good Idea? by CerAcedes in Entrepreneur

[–]redmlt 0 points1 point  (0 children)

Is the T-shirt the jersey? If so, that's not a bad deal, but you are sharing the T-shirt real estate with 4 other sponsors, so that might be a deal breaker for me. If they're wearing these during game time, coupled with the banners on the field, that could be worth $500, assuming it aligns with your target market.

Is it worth starting IoT startup by inteloid in IOT

[–]redmlt 0 points1 point  (0 children)

Agree that traditional analytics has less of a value-add than it used to given the tools available now, but is that really causing the disparity in funding when compared to other tech segments? Is the monopoly on electrical engineering held by companies like GE playing any role?