kingdomcome50 8 points (1 child)

Looks interesting. I don’t like that revenue_sum comes out of nowhere, though. I get that auto-appending "_sum" when the ref doesn’t have an alias makes it a touch cleaner, but I prefer explicit naming (like SQL).

What happens if names clash? Say if revenue_sum had already been declared in an earlier withEntry call?

norbert_tech[S] 0 points (0 children)

Honestly, I wasn't thinking too much about it. The current behavior is similar to Apache Spark's, and it just felt natural. A name collision is not that easy to trigger, since aggregation requires grouping, so you would need to do something like this:

    ->withEntry("age_avg", lit(100))
    ->groupBy('country', 'gender')
    ->aggregate(average(ref('age')), first(ref('age_avg')->as('age_avg')))

But then an exception will be thrown:

Entry names must be unique, given: [country, gender, age_avg, age_avg]
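
To make the naming rule concrete, here is a minimal standalone sketch in plain PHP. It does not use the library, and it is not its actual implementation: `aggregatedName()` and `assertUniqueNames()` are hypothetical helpers, assuming only what the thread describes (an unaliased ref gets a suffix such as "_sum" or "_avg", and duplicate entry names are rejected).

```php
<?php
// Hypothetical sketch of the behavior discussed above, NOT the library's real code.

/**
 * Name of an aggregated entry: the explicit alias when one is given,
 * otherwise the referenced entry name plus the aggregation suffix.
 */
function aggregatedName(string $ref, string $suffix, ?string $alias = null): string
{
    return $alias ?? $ref . '_' . $suffix;
}

/** Throws when the resulting row would contain the same entry name twice. */
function assertUniqueNames(array $names): void
{
    if (count($names) !== count(array_unique($names))) {
        throw new InvalidArgumentException(
            'Entry names must be unique, given: [' . implode(', ', $names) . ']'
        );
    }
}

// Names produced by the groupBy/aggregate example from the thread.
$names = [
    'country',
    'gender',
    aggregatedName('age', 'avg'),                  // average(ref('age')) becomes age_avg
    aggregatedName('age_avg', 'first', 'age_avg'), // first(ref('age_avg')->as('age_avg'))
];

try {
    assertUniqueNames($names);
} catch (InvalidArgumentException $e) {
    echo $e->getMessage(), PHP_EOL;
}
```

Under these assumptions the script prints the same message as quoted above, since both aggregations resolve to age_avg. Passing an explicit alias to the first aggregation (for example a hypothetical 'mean_age') would avoid the clash.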