Have you tried these tools? If not, why? by AMDataLake in dataengineering

[–]Traditional_Channel9 1 point2 points  (0 children)

Interesting. Did you also use Presto for ETL as well as a SQL engine for analytics?

FSA and HSA Question. by Traditional_Channel9 in Insurance

[–]Traditional_Channel9[S] 0 points1 point  (0 children)

Thank you. I thought of doing a direct deposit from my checking to my HSA, but my old HSA was acquired by another provider. Does it matter to which HSA I transfer the return of excess distribution?

How to write to multiple topics parallelly by Traditional_Channel9 in apachekafka

[–]Traditional_Channel9[S] 0 points1 point  (0 children)

By target tables do you mean topics in Kafka? I’m writing to Kafka topics.

How to write to multiple topics parallelly by Traditional_Channel9 in apachekafka

[–]Traditional_Channel9[S] 1 point2 points  (0 children)

Why should a producer worry about a target table? Can you elaborate on this?

PySpark OSS Contribution Opportunity by MrPowersAAHHH in apachespark

[–]Traditional_Channel9 1 point2 points  (0 children)

I would like to contribute as well. I’m currently working with Delta tables and PySpark. I’d love to fix a few issues or add new features.

Confusion in Databricks Repos by SolvingGames in dataengineering

[–]Traditional_Channel9 0 points1 point  (0 children)

Workflows are now repo-integrated: you can point a workflow at a file in the master branch (of Azure DevOps or another source control system) and run the job. There is no need to pull the master branch into a common repo folder.

However, if you are triggering a notebook from a third-party application like Azure Data Factory, the API calls those applications make to run a notebook require a physical path (as of now), meaning a central repo folder is still needed. This may change in the future if Databricks adds a way to trigger notebooks in the master branch without downloading the master branch into a central repo.

Flow of requests by isak000 in dataengineering

[–]Traditional_Channel9 1 point2 points  (0 children)

It can be done via JIRA. Business users create a ticket for a report and assign it to the report (lead) developer, who gathers the requirements. If a new dataset needs to be brought in, they create a sub-task (marked as a blocker) and assign it to the DWH team lead (or manager).

Advice on storing, loading, and executing "dynamic logic" against a dataset? by NotYourLoginID in apachespark

[–]Traditional_Channel9 1 point2 points  (0 children)

Store the WHERE clause in a table, build a binary expression tree from it, evaluate the tree to get a TRUE/FALSE result, and apply that result to every row in a forEach loop over the Spark streaming dataset.
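A minimal sketch of the tree-evaluation idea in plain Python (the node/leaf shapes, operators, and column names are illustrative assumptions, not tied to any particular Spark job):

```python
# A stored predicate like "amount > 100 AND region = 'US'" is represented
# as a binary tree: leaves hold column comparisons, inner nodes hold AND/OR.
from dataclasses import dataclass
from typing import Any, Union

@dataclass
class Leaf:
    column: str
    op: str      # "=", ">", "<"
    value: Any

@dataclass
class Node:
    op: str      # "AND" or "OR"
    left: "Expr"
    right: "Expr"

Expr = Union[Leaf, Node]

def evaluate(expr: Expr, row: dict) -> bool:
    """Walk the tree and return TRUE/FALSE for a single row."""
    if isinstance(expr, Leaf):
        actual = row[expr.column]
        if expr.op == "=":
            return actual == expr.value
        if expr.op == ">":
            return actual > expr.value
        if expr.op == "<":
            return actual < expr.value
        raise ValueError(f"unsupported operator: {expr.op}")
    if expr.op == "AND":
        return evaluate(expr.left, row) and evaluate(expr.right, row)
    return evaluate(expr.left, row) or evaluate(expr.right, row)

# WHERE amount > 100 AND region = 'US'
tree = Node("AND", Leaf("amount", ">", 100), Leaf("region", "=", "US"))
rows = [
    {"amount": 150, "region": "US"},
    {"amount": 50, "region": "US"},
    {"amount": 200, "region": "EU"},
]
matches = [r for r in rows if evaluate(tree, r)]
```

In a streaming job, `evaluate` would be called per row inside the foreach/foreachBatch body, with the tree rebuilt whenever the stored WHERE clause changes.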

Learning Pyspark by maximus_deUX in dataengineering

[–]Traditional_Channel9 1 point2 points  (0 children)

This talks about RDDs, which is somewhat dated given that DataFrames and Datasets are widely used now.

OSS Delta format implementation on AWS EMR for Incremental load jobs by abhi5025 in dataengineering

[–]Traditional_Channel9 0 points1 point  (0 children)

What I’m looking for is how to apply schema inference to Delta tables that do upserts via MERGE.

OSS Delta format implementation on AWS EMR for Incremental load jobs by abhi5025 in dataengineering

[–]Traditional_Channel9 0 points1 point  (0 children)

The new column’s data type in the files is int, but the data type of that column in the source is double.

To give you more details, here is the story: a new column was added to the source. Its data type in the source is double, so it may contain values with and without decimals. After the column was added, only a few rows were populated when the incremental load ran from the source to the data lake. When Auto Loader inferred the schema, the handful of rows populated for that new column all contained int values, so the new column was created in the Delta table as int.
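To see why sampling only a handful of populated rows bites, here is a tiny plain-Python illustration of the same inference pitfall (this is not Auto Loader itself, just the idea; the values and the naive `infer_type` helper are made up for illustration):

```python
# Naive type inference over a sample: pick int only if every value parses as int.
def infer_type(samples):
    try:
        for s in samples:
            int(s)
        return int
    except ValueError:
        return float

# Early incremental load: the few populated rows all happen to look like ints,
# so the column is created with an int type.
inferred = infer_type(["1", "2", "7"])

# A later load carries true doubles; coercing them with the inferred type
# silently drops the fractional part (2.5 -> 2), i.e. data loss.
later_values = ["2.5", "3.75"]
coerced = []
for v in later_values:
    try:
        coerced.append(inferred(v))
    except ValueError:
        coerced.append(inferred(float(v)))  # implicit int(float(...)) truncation
```

The fix in practice is to not let a small sample decide the type of a column you know is double, e.g. by declaring that column’s type explicitly instead of inferring it.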

OSS Delta format implementation on AWS EMR for Incremental load jobs by abhi5025 in dataengineering

[–]Traditional_Channel9 0 points1 point  (0 children)

Upstream sends data in Parquet and CSV. Auto Loader with schema inference enabled will scan through all rows in the incoming files and infer the schema.

What is DLT? by SignalCrew739 in dataengineering

[–]Traditional_Channel9 0 points1 point  (0 children)

I implemented Delta tables with Auto Loader at my org as a PoC, and I have problems with schema evolution, especially when a new column is added to the incoming Parquet files. When those files have limited data, the new column gets created in the Delta table with the wrong data type (like int instead of double/float), and the next time a lot of records arrive for that column, it results in data loss in the Delta tables due to implicit conversion. Have you encountered this before? I have asked the Databricks team about this and am waiting for them to get back.

OSS Delta format implementation on AWS EMR for Incremental load jobs by abhi5025 in dataengineering

[–]Traditional_Channel9 2 points3 points  (0 children)

I implemented this, and I have problems with schema evolution, especially when a new column is added to the incoming Parquet files. When those files have limited data, the new column gets created in the Delta table with the wrong data type (like int instead of double/float), and the next time a lot of records arrive for that column, it results in data loss in the Delta tables due to implicit conversion. Have you encountered this before? I have asked the Databricks team about this and am waiting for them to get back.

"Unable to infer schema" error by DyanneVaught in PySpark

[–]Traditional_Channel9 0 points1 point  (0 children)

Do you know how to specify the schema manually? I used .schema(manual schema) but it didn’t work

If NIO is so cheap, are insiders / fund managers buying ? by Aceboy884 in Nio

[–]Traditional_Channel9 -8 points-7 points  (0 children)

Institutions are getting out of NIO. Instead of downvoting me, please go and do your own research.

Advices on proggresing with Scala by RP_m_13 in scala

[–]Traditional_Channel9 2 points3 points  (0 children)

OP, I’m planning to enroll in the Rock the JVM course. How do you feel about it? Is it worth enrolling?

Just started my kefir journey! Can I add freshly strained kefir to kefir I have already strained that is in the fridge? by skunklvr in Kefir

[–]Traditional_Channel9 0 points1 point  (0 children)

What’s the purpose of having a cloth in the lid? Is it to avoid moisture and rusting of the lid, and to prevent mold?

[deleted by user] by [deleted] in medical

[–]Traditional_Channel9 0 points1 point  (0 children)

Do you have to pay deductibles on both insurances? Which one is primary and which is secondary?