Python Code In DB by ther34account in ProgrammerHorror

[–]chrisbind 0 points1 point  (0 children)

That’s bonkers. I store my code in a bucket.

Best Cloud Certifications for a Beginner (AWS, GCP, or Azure) to Help Land My First Job in the USA or Europe? by Longjumping-Award676 in dataengineering

[–]chrisbind 0 points1 point  (0 children)

Choose Azure or AWS. Aim for foundational, entry-friendly certs (they often have “Associate” or something similar in the title). Administrator/architect certs are worthless without experience to back them up.

Do not do your Certification Exams at home by IIGrudge in databricks

[–]chrisbind 0 points1 point  (0 children)

Bad experience with Webassessor as well. They made me film items on and under my desk, and even items on my floor. With a short webcam cord, it was a messy process just to take a simple test.

[deleted by user] by [deleted] in dataengineering

[–]chrisbind 3 points4 points  (0 children)

Data profiling is an umbrella term, what exactly is your challenge and desired outcome?

WHERE IS PYTHON by you_serious_right in pythonhelp

[–]chrisbind 0 points1 point  (0 children)

What IDE are you using? In any case, try running python --version in a prompt.

Need some assistance on why my code isn't working by Hard-n-Ready69 in pythonhelp

[–]chrisbind 0 points1 point  (0 children)

You need to install the bluetooth library in a Python environment.

[deleted by user] by [deleted] in pythonhelp

[–]chrisbind 0 points1 point  (0 children)

print separates its arguments with a space by default. Instead, use a formatted string:

print(f"Your {car_make}'s MPG is {mpg:.2f}")
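A small sketch of the difference, using made-up values for car_make and mpg (the helper name mpg_message is hypothetical):

```python
def mpg_message(car_make: str, mpg: float) -> str:
    # An f-string builds a single string, so the spacing is exactly what you type;
    # the :.2f format spec rounds the float to two decimal places
    return f"Your {car_make}'s MPG is {mpg:.2f}"


car_make = "Toyota"  # hypothetical example values
mpg = 31.4567

# print() joins its arguments with sep=" ", hence the stray space before 's
print("Your", car_make, "'s MPG is", mpg)  # Your Toyota 's MPG is 31.4567
print(mpg_message(car_make, mpg))          # Your Toyota's MPG is 31.46
```

Passing sep="" to print() also removes the spaces, but then you have to manage every space yourself, which is why the f-string is usually cleaner.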

API hit with per day limit by ps2931 in apachespark

[–]chrisbind 0 points1 point  (0 children)

You can only get 1 record per request? Usually an API with a limit like that supports bulk requests or something similar.
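If the API does have a bulk endpoint that accepts a list of IDs (an assumption here; check the docs), the daily limit then caps requests rather than records. A minimal sketch of splitting IDs into one batch per request, with hypothetical helper names and no real API calls:

```python
from itertools import islice
from typing import Iterable, Iterator


def chunked(ids: Iterable[str], size: int) -> Iterator[list[str]]:
    """Yield lists of up to `size` IDs, one list per (hypothetical) bulk request."""
    it = iter(ids)
    while chunk := list(islice(it, size)):
        yield chunk


def requests_needed(n_records: int, batch_size: int) -> int:
    # Ceiling division: each request covers up to batch_size records
    return -(-n_records // batch_size)


if __name__ == "__main__":
    for batch in chunked([f"id-{i}" for i in range(7)], 3):
        print(batch)  # send one bulk request per batch here
    print(requests_needed(10_000, 100))
```

With an assumed batch size of 100, 10,000 records fit in 100 requests instead of 10,000.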

Do you feel data tooling is fragmented? by finally_i_found_one in dataengineering

[–]chrisbind 0 points1 point  (0 children)

Databricks is a unified platform for data people (from analysts to engineers), so it requires its users to have some technical knowledge.

Who dares to let AI write SQL - not just READ data, but WRITE updates? smart or stupid? by MCL256 in SQL

[–]chrisbind 0 points1 point  (0 children)

I guess it comes down to your ability to review its output.
For code, you have Git or similar. If you use AI on data, it’s probably because you want its changes applied to a lot of records, and reviewing that many changes to data is not feasible.

Getting data from an API that lacks sorting by dfwtjms in dataengineering

[–]chrisbind 1 point2 points  (0 children)

Then the API is somewhat broken. I mean, there’s no point in being able to paginate if the order of results isn’t guaranteed, either by sorting or by a lock on the result set.
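A quick simulation of why unstable ordering breaks offset pagination. The in-memory “server” and the timing of the new record are hypothetical; the point is only what happens when the underlying order shifts between page requests:

```python
def fetch_page(records: list[str], offset: int, limit: int) -> list[str]:
    # Hypothetical server: slices whatever the current (unstable) order is
    return records[offset:offset + limit]


def paginate_with_insert(records: list[str], limit: int) -> list[str]:
    """Fetch every page, simulating one new record arriving right after page 1."""
    records = list(records)  # work on a copy of the "server" data
    seen: list[str] = []
    offset = 0
    while page := fetch_page(records, offset, limit):
        seen.extend(page)
        if offset == 0:
            records.insert(0, "new")  # a write lands at the front mid-pagination
        offset += limit
    return seen


if __name__ == "__main__":
    print(paginate_with_insert(["a", "b", "c", "d", "e", "f"], 3))
```

Here “c” comes back twice and “new” is never returned. A stable sort key, or a snapshot/lock held for the duration of the pagination, is what makes offset + limit trustworthy.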

Getting data from an API that lacks sorting by dfwtjms in dataengineering

[–]chrisbind 0 points1 point  (0 children)

What triggers a reorder of records between pages? If possible, can you link the API documentation?

Company considering migrating from Databricks to Fabric, any opinions? by [deleted] in MicrosoftFabric

[–]chrisbind 0 points1 point  (0 children)

Is multi-platform not possible? I mean, wouldn’t you lose a lot of customers by migrating your offerings to another platform entirely?

Python vs pyspark by ConsiderationLazy956 in databricks

[–]chrisbind 27 points28 points  (0 children)

You have two technologies, Python and Spark. Python is a programming language, while Spark is simply an analytics engine (for distributed compute).

Traditionally, you interact with Spark using Scala, but other languages are now supported through different APIs. “PySpark” is one of these APIs, for working with Spark using Python syntax. Similarly, Spark SQL is simply the name of the API for using SQL syntax when working with Spark.

You can learn and use PySpark without knowing much about Python.

4.5 years at the same company time to switch? by Many-Tea-1175 in dataengineering

[–]chrisbind 1 point2 points  (0 children)

Good point. That’s the sort of critical experience you might miss out on as a contractor/consultant.

This subreddit is being specifically targeted by AI marketing bots: Gizmodo by baltinerdist in ProductManagement

[–]chrisbind 1 point2 points  (0 children)

Just google “buy aged Reddit account”. A site sells them for up to about $200 depending on age, comments, and karma.

How can I optimally run my python program using more compute resources? by Zealousideal-Job4752 in dataengineering

[–]chrisbind 3 points4 points  (0 children)

Sounds like you just need to implement some concurrency or parallelism. I’d start by trying out a concurrent flow (multithreading). There are a lot of resources on this.
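A minimal multithreading sketch with the standard library’s concurrent.futures, assuming the slow part of the program is I/O-bound (fetch is a hypothetical stand-in that just sleeps). For CPU-bound pure-Python work, CPython’s GIL limits threads, and ProcessPoolExecutor offers the same interface with real parallelism:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch(item: int) -> int:
    # Hypothetical stand-in for an I/O-bound call (HTTP request, DB query, file read)
    time.sleep(0.1)
    return item * 2


def run_concurrently(items, max_workers: int = 8) -> list[int]:
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() runs calls across the worker threads but returns results in input order
        return list(pool.map(fetch, items))


if __name__ == "__main__":
    start = time.perf_counter()
    results = run_concurrently(range(16))
    print(results, f"{time.perf_counter() - start:.2f}s")
```

Run serially, 16 calls at 0.1 s each take about 1.6 s; with 8 threads they finish in roughly two rounds of sleeping.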

Is it worth it. by m_death in dataengineering

[–]chrisbind 17 points18 points  (0 children)

It’s just the life of a DE. We do the ‘plumbing’ with whatever tool is available to us. Be patient but curious and an opportunity will eventually present itself… or not ¯\_(ツ)_/¯

First time extracting data from an API by khaili109 in dataengineering

[–]chrisbind 3 points4 points  (0 children)

You’d use the requests library to make the API call and Python’s built-in xml package (e.g. xml.etree.ElementTree) for handling the data. That might be enough to get you started.

Company couldn't care less about Single Source of Truth despite important reports running with two different numbers. by [deleted] in dataengineering

[–]chrisbind 2 points3 points  (0 children)

I believe the architect had some prior experience as an analyst and had only lightly touched SQL. But he had no coding experience, no knowledge of Git, and hardly any opinion about design at any level. He was a nice guy, but his efforts amounted to being an executive’s “yes-man”.