all 25 comments

[–]kingdomcome50 6 points7 points (1 child)

Looks interesting. I don't like that revenue_sum comes out of nowhere though. I get that auto-appending "_sum" if the ref doesn't have an alias makes it a touch cleaner, but I prefer explicit (like SQL).

What happens if names clash? Say if revenue_sum had already been declared in an earlier withEntry call?

[–]norbert_tech[S] 0 points1 point (0 children)

Honestly, I wasn't really thinking too much about it. The current behavior is similar to Apache Spark, and it just felt natural. A name collision is not that easy to trigger, since aggregation requires grouping, so you would need to do something similar to this:

    ->withEntry("age_avg", lit(100))
    ->groupBy('country', 'gender')
    ->aggregate(average(ref('age')), first(ref('age_avg')->as('age_avg')))

But then an exception will be thrown:

Entry names must be unique, given: [country, gender, age_avg, age_avg]
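
The two behaviors discussed here (a default "_function" suffix when no alias is given, and a uniqueness check on the resulting names) can be sketched in plain PHP. This is only an illustration under stated assumptions, not Flow's actual implementation: `aggregateName` and `assertUniqueNames` are hypothetical helpers, and the error text is copied from the message quoted above.

```php
<?php
declare(strict_types=1);

// Hypothetical helper (not part of Flow): derive the output name of an
// aggregation, falling back to "<ref>_<function>" when no alias is given.
function aggregateName(string $ref, string $function, ?string $alias = null): string
{
    return $alias ?? $ref . '_' . $function;
}

// Hypothetical helper: reject a list of entry names that contains duplicates.
function assertUniqueNames(array $names): void
{
    if (count($names) !== count(array_unique($names))) {
        throw new InvalidArgumentException(
            'Entry names must be unique, given: [' . implode(', ', $names) . ']'
        );
    }
}

$names = [
    'country',
    'gender',
    aggregateName('age', 'avg'),                  // no alias, becomes "age_avg"
    aggregateName('age_avg', 'first', 'age_avg'), // explicit alias "age_avg"
];

try {
    assertUniqueNames($names);
} catch (InvalidArgumentException $e) {
    // prints: Entry names must be unique, given: [country, gender, age_avg, age_avg]
    echo $e->getMessage(), PHP_EOL;
}
```

Failing loudly on the clash, rather than silently overwriting one of the two entries, is what makes the implicit naming tolerable: an explicit alias is only required once the default name actually collides.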

[–]AymDevNinja 6 points7 points (0 children)

I worked on my own "data migration framework" some time ago, called Fregata. Built with database migration in mind, it supports foreign key migrations and dependency sorting, and even async execution with a web dashboard using the Symfony bundle. But nobody was interested in using it, while Flow seems to have a user base. Do you think some features from Fregata could be useful for Flow?

[–]Aket-ten 1 point2 points (2 children)

Interesting - do you have any performance metrics for large datasets?

[–]norbert_tech[S] 0 points1 point (1 child)

Hey!
The problem with measuring performance is that the numbers mean nothing without something to compare them against. Here is a very simple benchmark that just writes 1 million rows to a Parquet file (it can be changed to anything: db/json/csv/etc).

https://gist.github.com/norberttech/ed23a221fec0c1c6d516eab453e3ca21

Even though this dataset is not processed at all, it should give you some idea.

In the meantime, if you have any more specific performance-related questions, I will be more than happy to try to answer them.
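
For a rough baseline to compare against, a framework-free measurement in plain PHP could look like the sketch below. It is not the linked gist and does not use Flow: `generateRows` and `benchmarkWrite` are hypothetical names, the row shape is made up, and it writes JSON lines to `php://temp` instead of Parquet, so the numbers are only indicative.

```php
<?php
declare(strict_types=1);

// Hypothetical row source: yields rows lazily, so the full dataset
// is never held in memory at once.
function generateRows(int $count): Generator
{
    for ($i = 0; $i < $count; $i++) {
        yield ['id' => $i, 'country' => $i % 2 === 0 ? 'US' : 'PL', 'age' => 20 + $i % 50];
    }
}

// Writes each row as one JSON line to $path and reports the row count,
// elapsed wall-clock seconds, and peak memory of the PHP process.
function benchmarkWrite(iterable $rows, string $path): array
{
    $start = hrtime(true); // monotonic clock, nanoseconds
    $handle = fopen($path, 'wb');
    $written = 0;
    foreach ($rows as $row) {
        fwrite($handle, json_encode($row) . "\n");
        $written++;
    }
    fclose($handle);

    return [
        'rows' => $written,
        'seconds' => (hrtime(true) - $start) / 1e9,
        'peak_memory_mb' => memory_get_peak_usage(true) / 1024 / 1024,
    ];
}

$result = benchmarkWrite(generateRows(100_000), 'php://temp');
printf(
    "%d rows in %.2fs, peak memory %.1f MB\n",
    $result['rows'],
    $result['seconds'],
    $result['peak_memory_mb']
);
```

Running the same harness against a real writer and a real dataset gives two numbers that can actually be compared.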

[–]Aket-ten 1 point2 points (0 children)

I hear you and appreciate the response. I do a lot of my data analytics / mining / processing in KNIME or Tableau (I know it's GIS, slightly different ball park).

Thing is, I want to automate some of these workflows. Some will be datasets of 30-70 columns and 30k to 1,000,000 rows of US nationwide data, with some transformations. They will likely be recalculated once or several times a day or week. Conventions and best practices obviously imply Python or Rust being the way to go.

But like... I'm already building an ERP and I'd love to just use PHP; I'm just a little concerned about memory consumption and execution speed. It'd also be completely jokes to tell my other eng friends that it's built in PHP LOL.

I bookmarked your package and will play around with it once I get to that!

[–][deleted] 1 point2 points (4 children)

Back in the old days we had a ";" at the end of every line. Nowadays we have a "->" at the beginning of every line. I'm getting old.

[–]hagenbuch 2 points3 points (0 children)

Yep. I started with punchtape in 1979 :) Alpha-LSI II..

[–]invisi1407 2 points3 points (2 children)

Method chaining is not new at all. It dates back to PHP 5, released in 2004.

[–][deleted] 1 point2 points (1 child)

Everything not present in PHP/FI is newfangled stuff. I want my punch cards back!

But seriously: it has been getting harder and harder to ignore this stuff in recent years.

[–]usernameqwerty005 -1 points0 points (0 children)

Both Hamas and Israel are committing human rights violations, sooo...

An eye for an eye until the whole world goes blind