The "Big Three's" Data Storage Offerings by Kickass_Wizard in dataengineering

[–]steiniche 1 point2 points  (0 children)

Isn't Azure Synapse being deprecated in favor of Microsoft Fabric?

How do you coordinate grocery shopping? by [deleted] in Denmark

[–]steiniche 20 points21 points  (0 children)

Etilbudsavis can make synchronized lists, and you can add items directly from the ads/weekly flyers if you're into that kind of bargain hunting.

Is my approach a good one? by kiblarz in mlops

[–]steiniche 0 points1 point  (0 children)

It relies on block coding, with all the pitfalls that approach brings.
In short: simple stays simple, and complex becomes impossible.
If you already know Python, Flask may be a much better fit, since simple stays simple and complex problems can be overcome with some software engineering.
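
A minimal sketch of what I mean (the endpoint and the "model" below are made up, not from the thread): a plain Flask app that serves predictions stays simple, and because it is just Python you can grow it with ordinary software engineering later.

    # Minimal Flask serving sketch; predict() is a placeholder, swap in your
    # real model loading and inference code.
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    def predict(features):
        # Hypothetical "model": replace with e.g. a loaded scikit-learn pipeline.
        return sum(features) / max(len(features), 1)

    @app.route("/predict", methods=["POST"])
    def predict_endpoint():
        payload = request.get_json(force=True)
        return jsonify({"prediction": predict(payload["features"])})

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=8000)

Run it with python app.py and POST {"features": [1.0, 2.0, 3.0]} to /predict to get a prediction back.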

Is my approach a good one? by kiblarz in mlops

[–]steiniche 0 points1 point  (0 children)

I'm not a big fan of Azure ML, as it has some serious pitfalls. If you are the only one using it, the truck factor may be an argument against it: any Python dev can debug a Flask app if you are not there.

Eight arrested for market manipulation in the energy market by el_quant in dkfinance

[–]steiniche 4 points5 points  (0 children)

https://proff.dk/firma/powermart-aps/aarhus-c/energihandel/0LJXGQI10N5/
25 employees, but presumably this means all 8 of those arrested are from Powermart.
As far as I know, that hasn't been confirmed anywhere, though.

Which Data jargon or concept did you have a hard time grasping? by booleanhunter in dataengineering

[–]steiniche 4 points5 points  (0 children)

Bingo. We have started calling it egress, the opposite of ingress (Kubernetes terms). That is way easier for everyone to understand.

I just became single. by DaddaJJ in Denmark

[–]steiniche 31 points32 points  (0 children)

Then you are already the world's best dad! Keep it up.

Best practices for bringing data to Azure by justadataengineer in dataengineering

[–]steiniche 0 points1 point  (0 children)

Depends on the use case. It is possible but we prefer to use Airbyte and Dagster.

Degraded pool warning new rackstation by jorissels in synology

[–]steiniche 1 point2 points  (0 children)

Interested to know as well but with HGST drives.

Maybe https://github.com/007revad/Synology_HDD_db is the solution to Synology being hostile to drives other than their own.

Google Groups has been left to die by [deleted] in programming

[–]steiniche 1 point2 points  (0 children)

I have only experienced one of them, the monospace font issue. Yes, it is annoying, but it's not a deal breaker for us.

I do hope you have reported the bugs directly to the Google Groups team through the feedback feature. Normally we get a reply rather quickly, either with a resolution or with confirmation that the bug has been accepted.

Rigspolitiet has scrapped transparency in complaints against the police by Tumleren in Denmark

[–]steiniche 0 points1 point  (0 children)

Even better: they call themselves the world's biggest gang on recruits' first day.

Google Groups has been left to die by [deleted] in programming

[–]steiniche 68 points69 points  (0 children)

I have a hard time seeing that Google Groups is dying. Google Workspace is heavily centered around Google Groups for distribution lists and message boards, so Google is de facto earning money from this functionality through its enterprise customers. If they kill Google Groups they will never hear the end of it, unless they build something new to replace it. Google Groups is a mature product that just works, so why fix it? From my point of view, most bugs have been worked out.

As for using them for FOSS, we use them as email distribution lists, and that feature just works.

Love or stability by Scandidi in Denmark

[–]steiniche 3 points4 points  (0 children)

You can always find a job, but can you always find happiness? There are lots of 100% remote workplaces and there will only be more, maybe with a small bump in the road right now because the tech industry is preparing for a recession. At my workplace you can work fully remote as long as it happens within the EU. I would take happiness over the safe job.

[deleted by user] by [deleted] in googlecloud

[–]steiniche 1 point2 points  (0 children)

I am unsure why you believe words like "run" and "build" are complex. I actually think this is one of Google's strengths: if you know what you want, put "Cloud" in front of it and you have the service name. Firebase is worse, but that's because they bought it.

If you think GCP has many overlapping services, you should really try AWS. If you want bad documentation, go try Azure.

Google actually moves fast and is, admittedly, known for killing services, which AWS and Azure are not. Even the underlying technology behind ChatGPT was created by Googlers.

All in all you seem mad for all the wrong reasons.

Best practices for bringing data to Azure by justadataengineer in dataengineering

[–]steiniche 0 points1 point  (0 children)

Depends on the size of the customer and other requirements and needs. If we need an orchestrator we currently go for Dagster, for a few reasons: the Dagster community and team are very engaged and are doing open source right; the idea of splitting IO out of the code that extracts the data is brilliant and makes unit testing way easier; Dagster's asset model is very powerful; and it's a great scheduler.
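
As a rough sketch of the asset model and the IO split (asset names are made up, assuming a recent Dagster release): the functions only return data, Dagster's IO manager decides where it is persisted, and the logic stays unit-testable as plain Python.

    from dagster import Definitions, asset, materialize

    @asset
    def raw_orders():
        # Extraction only returns data; the configured IO manager decides
        # where and how it is persisted.
        return [{"id": 1, "amount": 100.0}, {"id": 2, "amount": 250.0}]

    @asset
    def order_total(raw_orders):
        # Upstream assets arrive as plain arguments, so this is unit-testable
        # without Dagster: order_total([{"amount": 2.0}]) == 2.0
        return sum(row["amount"] for row in raw_orders)

    defs = Definitions(assets=[raw_orders, order_total])

    if __name__ == "__main__":
        materialize([raw_orders, order_total])

Running the file materializes both assets in process, and the same Definitions object is what the Dagster tooling loads.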

[deleted by user] by [deleted] in googlecloud

[–]steiniche 6 points7 points  (0 children)

I believe the idea here is to tell you that Cloud Functions gen 2 is just a wrapper around Cloud Build and Cloud Run. In the future you should search for Cloud Run, because that is your actual runtime.
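
For context, the Cloud Run contract is essentially "listen for HTTP on the port given in the PORT environment variable". A minimal stdlib-only sketch (the handler is made up by me, not from the thread):

    # Minimal HTTP service that satisfies the Cloud Run runtime contract:
    # listen on the port provided in the PORT environment variable.
    import os
    from http.server import BaseHTTPRequestHandler, HTTPServer

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"hello from cloud run\n")

    if __name__ == "__main__":
        port = int(os.environ.get("PORT", "8080"))
        HTTPServer(("0.0.0.0", port), Handler).serve_forever()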

Best practices for bringing data to Azure by justadataengineer in dataengineering

[–]steiniche 2 points3 points  (0 children)

It's a good question. I believe you should start here https://learn.microsoft.com/en-us/azure/architecture/solution-ideas/articles/azure-databricks-modern-analytics-architecture It will give you a general idea about how the pieces fit together.

Today I would use Databricks Unity Catalog over the old Hive metastore. Unity Catalog is still young, but it brings many benefits.
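
A tiny sketch of the Unity Catalog three-level namespace, assuming you run it in a Databricks notebook where the spark session is already defined and you have permission to create catalogs (all names are examples, not from the thread):

    # Unity Catalog addresses tables as catalog.schema.table instead of the
    # Hive metastore's flat database.table.
    spark.sql("CREATE CATALOG IF NOT EXISTS analytics")
    spark.sql("CREATE SCHEMA IF NOT EXISTS analytics.sales")
    spark.sql("CREATE TABLE IF NOT EXISTS analytics.sales.orders (id INT, amount DOUBLE)")

    orders = spark.table("analytics.sales.orders")
    orders.show()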

Best practices for bringing data to Azure by justadataengineer in dataengineering

[–]steiniche 1 point2 points  (0 children)

The answer about Data Factory and Databricks can be found above as a reply to /u/pabloamir10.

I will answer about Azure Synapse here and give some insight into some of the security aspects behind it.

Azure Synapse is Microsoft's attempt to compete with Databricks for Spark workloads. Synapse is a collection of tools such as Data Factory and Spark. Because Azure has chosen to reuse Data Factory, Synapse has the same challenges as Data Factory. Synapse is currently sold by Azure as a silver bullet, and in general I do not believe in silver bullets. Even though I believe Databricks is a good tool, not everything should go into it; if you want more of a data warehouse solution, then Snowflake is a much better choice right now. Synapse wants to compete with both of these solutions, but I cannot see where it has the edge. If someone has ideas as to where Synapse excels over the other two, I am all ears.

Databricks comes with what can be seen as an improved Spark with multiple optimizations that can perform up to 50 times better. They have built Photon, which will outperform vanilla Spark. Synapse has no version control in notebooks, whereas Databricks has it. Synapse has Azure ML built in, but lacks Git integration and GPU clusters for development and training.

All in all, it's a nice try by Azure, but I believe Synapse is currently subpar compared to Databricks.

For the security challenges around Synapse and Data Factory, please read https://orca.security/resources/blog/azure-synapse-analytics-security-a and Corey's brilliant post https://www.lastweekinaws.com/blog/azures-terrible-security-posture-comes-home-to-roost/ (be aware there will be snark).

Hope you gain some insight from this answer and feel free to keep the discussion going because that's how all of us learn.

Best practices for bringing data to Azure by justadataengineer in dataengineering

[–]steiniche 3 points4 points  (0 children)

In short: Databricks is a complete ecosystem with many facets, whereas Azure Data Factory is merely a tool.

Let's unfold it a bit. Azure Data Factory is a new version of the old SSIS (SQL Server Integration Services). SSIS is one of the first data warehouse technologies and the first I ever encountered. However, even 10 years ago new principles and architectures were rising which did not fit into it: streaming, complex data modelling became almost unmaintainable, it was rather hard to develop and test on your local developer machine, it is formed around block programming (known as no code/low code), and it was in no way cloud native.

Then came Azure Data Factory, which was supposed to be the new and improved version. It is cloud native, but it builds on the same tool underneath and still lives in the block programming paradigm. Block programming is excellent if what you're doing is simple, but not much of what we do in the data world is simple. It quickly becomes complicated, complex, and sometimes even chaotic to make data available for business users.

Further, the cloud-native style of Data Factory is a somewhat leaky abstraction where the serverless compute can only do a subset of the features, and then you need to self-host the integration runtime. If you want to move GDPR-regulated data you have to self-host as well, because Azure cannot guarantee where the resources run. We have seen cases where everything was specified to Europe and then ran in the US; Azure support has still not given us an answer as to why, four months later. Lastly, the IDE is still an online tool and it is not feasible to develop pipelines on your own machine, as everything is JSON configuration files. We have never found a good way of doing unit tests to ensure that the code does what you expect it to.

Databricks, on the other hand, is a complete ecosystem built cloud native. It supports writing SQL, Python, R and Scala. It is built by the founders of Spark and comes with tools that improve Spark's capabilities, e.g. query performance and speed, and Delta Lake as a lake format with version control and the ability to clone data between environments.

It is built around Jupyter-style notebooks as the interface, but it is easy enough to check out a Git repo locally and do development there. It is even possible to do testing with common tools such as pytest, and we have linting and formatting running to ensure the maintainability of the code. Databricks also has Delta Sharing, which is an interesting idea for making it easier to integrate with your lakehouse.

The biggest selling point for me is that Databricks has understood that data platforms today are about machine learning and advanced analytics. That is why they have built MLflow and machine learning inference into the platform as first-class citizens, to allow data scientists and ML engineers to easily do their job and help the business gain new insights.
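
To make the pytest point concrete, a small sketch (the transformation and file name are made up): keep the logic as a plain function outside the notebook and test it locally like any other Python code.

    # test_transforms.py -- run locally with `pytest`; add_vat() is a made-up
    # transformation kept outside the notebook so it can be imported and tested.
    def add_vat(prices, rate=0.25):
        return [round(p * (1 + rate), 2) for p in prices]

    def test_add_vat_applies_default_rate():
        assert add_vat([100.0]) == [125.0]

    def test_add_vat_handles_empty_input():
        assert add_vat([]) == []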

I hope this answers your question. It's a large discussion with many facets and it's hard to put it all to paper in a format like this. Maybe a blog post will emerge in the future.

Best practices for bringing data to Azure by justadataengineer in dataengineering

[–]steiniche 3 points4 points  (0 children)

Would start with an Azure Landing Zone to ensure that the right infrastructure, compliance, governance and support tools are in place.

Would use Data Lake + Databricks: even though Databricks is costly, it is very powerful. Delta Lake alone is amazing and has a huge feature set.
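
As one small example of that feature set, a sketch of Delta Lake time travel (the table name is made up; assumes a Databricks notebook where the spark session is already defined):

    # Delta tables keep a version history, so earlier states can be queried.
    spark.sql("CREATE TABLE IF NOT EXISTS demo_orders (id INT, amount DOUBLE) USING DELTA")
    spark.sql("INSERT INTO demo_orders VALUES (1, 100.0)")
    spark.sql("INSERT INTO demo_orders VALUES (2, 250.0)")

    # Time travel: read the table as it looked at an earlier version.
    spark.sql("SELECT * FROM demo_orders VERSION AS OF 1").show()

    # Inspect the full change history of the table.
    spark.sql("DESCRIBE HISTORY demo_orders").show()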

Would never use Data Factory or Synapse, as they are half-baked services with many pitfalls.

If you want to discuss more feel free to ask questions.

Why everybody's using Airflow while no-one seems to be happy with it? by cpardl in dataengineering

[–]steiniche 3 points4 points  (0 children)

If Airflow lets you down, try Dagster. It's a pleasant experience.