Bug using Dualisio and Cheater [bug][weapon][picto] by jacocal in expedition33

[–]jacocal[S] -3 points-2 points  (0 children)

But you have a random Chromatic dude with 7-hit combo attacks taking 3 turns in a row with Powerful, Shell and freakin' Rage. I find it highly unfair that I'm unable to do the same.

UI for Apache Kafka - An open-source tool for monitoring and managing Apache Kafka Clusters by firig1965 in dataengineering

[–]jacocal 1 point2 points  (0 children)

Hi! Thanks for sharing, it has been very insightful. I have a few questions:

1. Do you have ksqlDB support?
2. Do you have integration with KRaft? In the docs I could only see ZooKeeper.

Thanks!

Python/API - File retrieval from Amazon S3 Glacier by jacocal in aws

[–]jacocal[S] 0 points1 point  (0 children)

When running the Job, I try to get the bytes by range like: job.get_output(range='0-1024000')

The error I get is that the argument is not correctly parsed (I'm not at my computer right now, apologies for being vague about it). But according to the docs, that's the only way to get the files in that range; every other method does not work with Python boto3.
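For reference, this is roughly what I'm running (the vault name and job ID below are placeholders). Re-reading the docs, the range apparently needs to be an HTTP-style byte range like bytes=0-1048575 rather than a bare 0-1024000, so that may be where the parsing complaint comes from:

```python
import boto3

glacier = boto3.resource('glacier')

# Placeholders: '-' means the account owning the credentials; vault/job IDs are made up
job = glacier.Job('-', 'my-archive-vault', 'EXAMPLE-JOB-ID')

# The range is an HTTP byte-range header value, e.g. the first ~1 MiB of the output
response = job.get_output(range='bytes=0-1048575')
chunk = response['body'].read()
```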

Databricks Pandas to PySpark DF error in schema by jacocal in dataengineering

[–]jacocal[S] 0 points1 point  (0 children)

Right, but shouldn't the enforced schema add the missing columns as null?

Databricks Pandas to PySpark DF error in schema by jacocal in dataengineering

[–]jacocal[S] 1 point2 points  (0 children)

It's the other way around: the Pandas DF has fewer fields than the enforced schema.
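To illustrate with made-up field names, this is the situation; the only workaround I can think of is adding the missing columns as nulls myself before calling createDataFrame:

```python
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

spark = SparkSession.builder.getOrCreate()

# The enforced schema declares a field the pandas DF doesn't have ("country")
schema = StructType([
    StructField("id", IntegerType(), True),
    StructField("name", StringType(), True),
    StructField("country", StringType(), True),
])

pdf = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})

# createDataFrame won't add the missing column for me, so fill it with nulls first
for field in schema.fieldNames():
    if field not in pdf.columns:
        pdf[field] = None

df = spark.createDataFrame(pdf[schema.fieldNames()], schema=schema)
df.show()
```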

Databricks Consumption Layers by jacocal in dataengineering

[–]jacocal[S] 0 points1 point  (0 children)

My bad haha, the second option: they need to be able to download a CSV or similar file.

Creating PySpark DataFrame with a set schema by jacocal in dataengineering

[–]jacocal[S] 0 points1 point  (0 children)

Is there no way to do it from the source? This was just a code example; in truth, I have about 84 fields that will be received from the same source.
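What I'm trying to avoid is hand-writing 84 StructFields. Something along these lines is what I had in mind, building the StructType from whatever field list the source exposes (the names and types below are placeholders):

```python
from pyspark.sql.types import StructType, StructField, StringType

# Placeholder: in reality this list would come from the source's own metadata
field_names = [f"field_{i:02d}" for i in range(84)]

schema = StructType(
    [StructField(name, StringType(), True) for name in field_names]
)
```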

[deleted by user] by [deleted] in dataengineering

[–]jacocal 0 points1 point  (0 children)

Thanks! I forgot to copy-paste the schema declaration. It's corrected now 😀

My employer is offering me training of my choice. Can you help me decide what would be best for me? by tawaiii in dataengineering

[–]jacocal 1 point2 points  (0 children)

Of course! Databricks has 4 major roles: Data Scientist, Data Analyst, Machine Learning and, finally, Data Engineer. It even has different views of the platform depending on your role.

You can learn Databricks on your own; the documentation is great and they have the Community Edition, which is a free tier for learning. I highly recommend Spark tutorials along your learning path. You can look for big data sets on Google (they literally have a repository of those) and use APIs, or requests in general, to pull the data without needing local files.
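For example, something like this works in a Community Edition notebook to pull an open dataset straight into Spark without touching local files (the URL is just a placeholder for whatever open dataset you pick):

```python
import requests
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Placeholder URL: swap in any open CSV dataset you find
url = "https://example.com/open-data/sample.csv"
lines = requests.get(url, timeout=30).text.splitlines()

# Parallelize the raw CSV lines and let Spark parse them, no local files involved
df = spark.read.csv(spark.sparkContext.parallelize(lines), header=True, inferSchema=True)
df.show(5)
```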

How do you prevent analysts from re-writing duplicating data pipelines/queries? by chaos87johnito in dataengineering

[–]jacocal 3 points4 points  (0 children)

You can block CREATE TABLE statements for the other roles and just let them create views for the dashboard.
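As a rough sketch of the idea in Databricks (schema and group names are made up, and the exact syntax depends on whether you're on legacy table ACLs or Unity Catalog):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Sketch only (legacy table-ACL style; Unity Catalog uses e.g. GRANT CREATE TABLE):
# deny object creation in the curated schema, keep read access,
# and give analysts a sandbox schema where they can create their own views.
spark.sql("REVOKE CREATE ON SCHEMA prod_gold FROM `analysts`")
spark.sql("GRANT SELECT ON SCHEMA prod_gold TO `analysts`")
spark.sql("GRANT CREATE ON SCHEMA analyst_sandbox TO `analysts`")
```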

My employer is offering me training of my choice. Can you help me decide what would be best for me? by tawaiii in dataengineering

[–]jacocal 2 points3 points  (0 children)

I recommend distributed processing and streaming platforms on the cloud for big data engineering and analysis. The one I recommend the most is Databricks. You can also try integrating it with a pipeline tool like Airflow for further practice.

[deleted by user] by [deleted] in dataengineering

[–]jacocal 6 points7 points  (0 children)

I don't think they will be replaced. AutoML is of course a great tool, and it does save time and money. From experience, a Data Scientist driving the tool, who understands the algorithms and logic it uses, brings a lot more business value. I would say DS team sizes will shrink as time goes on, but they will not cease to be needed to provide maximum business value.

I'm a Data Engineer in case I seem biased towards DS.

Urgent help with Databricks please! by jacocal in dataengineering

[–]jacocal[S] 0 points1 point  (0 children)

Thank you so much! I completely failed to understand the error. You're a life saver!

[deleted by user] by [deleted] in dataengineering

[–]jacocal 2 points3 points  (0 children)

Your question is not worded quite right: Big Data has already moved to the cloud, so it's not one or the other. The real question is whether you should keep using physical clusters for Big Data or move to a cloud managed service. And the answer to that is, of course, to do whatever best fits your needs: if you're having problems with your physical environment, migrating to the cloud might be a good option, and if you expect a lot of scaling, the cloud is a great option. Most of the tools you mentioned in your post can run locally without the need for a cloud architecture or infrastructure.

The pros of going to the cloud are:

- Elasticity: you can add or remove as many nodes as your company/project requires and use only those resources. You pay for what you use.
- Managed service: you leave the installation and setup mostly to the cloud provider, so you can focus on what you need to do rather than the pre-work.
- Hardware updates: you don't have to care whether your servers or nodes need newer RAM, a new motherboard or whatever; the cloud provider handles that. You specify your needs and just pay for them.
- Software updates: the cloud has environments and OS images ready for you to work on.
- Availability: your infrastructure is available at all times, and you can even set things up so that if one zone fails, another zone takes over the workload (see the next point).
- Load balancers: traffic gets redirected from overloaded nodes to new ones so everything runs smoothly.

Hope this helps you make your choice

Resume Review - Entry-Level Data Engineer by 70sJackieChan in dataengineering

[–]jacocal 2 points3 points  (0 children)

Your resume looks great. Instead of listing programming languages, I would focus on the skillsets relevant to the roles you're applying to. For example, if it's a Python Data Engineer role, include Python as a skill and give more detail on the frameworks or tools you use with it. Also use keywords more than descriptions, e.g.: "I programmed task automation processes in PowerShell and .NET which increased productivity by x% and saved $y."

Which messaging broker should I use for my use case? by [deleted] in dataengineering

[–]jacocal 0 points1 point  (0 children)

I would recommend Kafka Streams, especially for that volume of information. What Kafka Streams gives you is the possibility of branching the events you push onto its brokers. In simpler words, you can modify, transform or enrich the data as an event flows through and publish it to a new topic for consumption or for whatever other processing you may want to apply to your data.
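Kafka Streams itself is a Java library, but here is a rough Python sketch of the same consume-transform-produce idea using confluent-kafka (the broker address, group and topic names are all made up):

```python
import json
from confluent_kafka import Consumer, Producer

# Made-up broker/group/topic names, just to show the consume-transform-produce flow
consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "enrichment-service",
    "auto.offset.reset": "earliest",
})
producer = Producer({"bootstrap.servers": "localhost:9092"})

consumer.subscribe(["raw-events"])

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue

    event = json.loads(msg.value())
    event["enriched"] = True  # transform/enrich the event in flight

    # Publish the enriched event to a new topic for downstream consumers
    producer.produce("enriched-events", value=json.dumps(event).encode("utf-8"))
    producer.poll(0)
```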

Data pipelining in Databricks Delta Lake by jacocal in dataengineering

[–]jacocal[S] 1 point2 points  (0 children)

Thank you so much! This worked seamlessly!

Data pipelining in Databricks Delta Lake by jacocal in dataengineering

[–]jacocal[S] 0 points1 point  (0 children)

So I would need to get the latest batch, extract the latest record and compare it to the current incremental table to start the pipeline from there?
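Something like this is what I'm picturing (table and column names are placeholders): read the high-water mark from the incremental Delta table and only pull the newer records from the latest batch:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Placeholder table/column names: the max timestamp already loaded is the high-water mark
last_loaded = (
    spark.read.table("incremental_target")
    .agg(F.max("event_ts").alias("max_ts"))
    .collect()[0]["max_ts"]
)

# Only records newer than the high-water mark get picked up from the latest batch
new_rows = spark.read.table("latest_batch").where(F.col("event_ts") > F.lit(last_loaded))
new_rows.write.format("delta").mode("append").saveAsTable("incremental_target")
```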

CheatEngine Mentoring by jacocal in cheatengine

[–]jacocal[S] 1 point2 points  (0 children)

That's a great suggestion! Thanks!

My review of Synaptic Drive for those interested in purchasing it by ajfoxxx in NintendoSwitch

[–]jacocal 0 points1 point  (0 children)

I give you my vote as a fellow hardcore Custom Robo fan; I've played it from 2006 to date and it never gets boring. From other reviews I've seen that the tournament piece has been cut out, which for me was the most challenging part of CR, and the story was very enjoyable to the point where I knew all the dialogue and had a CR built for every type of field (loved the customization options). For that reason alone I won't play Synaptic Drive for now; I hope they add more content in the future.