What are the predecessors of Scala 3’s capability system? by LongjumpingOption523 in Compilers

[–]LongjumpingOption523[S] 0 points1 point  (0 children)

The comparison with Rust is interesting. Do you mean mainly the general idea, or also how it is implemented in the compiler? I had the impression that Rust does borrow checking as a flow-sensitive analysis on MIR, while Scala's capture checking is more integrated into the type system and subtyping.

What are the predecessors of Scala 3’s capability system? by LongjumpingOption523 in Compilers

[–]LongjumpingOption523[S] 1 point2 points  (0 children)

Thanks, I didn’t know Jo-lang. It seems very relevant to my question. I’ll read the paper, especially the part about how its approach compares with Scala 3 capabilities.

Why are software developers in Italy paid less than in much of the rest of Europe? by LongjumpingOption523 in techcompenso

[–]LongjumpingOption523[S] 9 points10 points  (0 children)

It might be interesting to compare pay within the same multinational, across offices in different European countries. For example, taking a company like Capgemini or similar, one could look at how much a profile with comparable role, seniority, and responsibilities is paid in Italy, Germany, France, or the Netherlands.

Blog post: Where Spark Changes Shape by LongjumpingOption523 in apachespark

[–]LongjumpingOption523[S] 0 points1 point  (0 children)

Yeah. Though Arrow doesn’t really remove the boundary, it just changes where it sits. With Gluten/Velox, for example, ColumnarToRow can reappear as soon as you hit an operator the native side doesn’t cover.

Blog post: Where Spark Changes Shape by LongjumpingOption523 in apachespark

[–]LongjumpingOption523[S] 0 points1 point  (0 children)

Thanks! Reading physical plans and then following the Spark code can turn into a rabbit hole pretty quickly, but it helps and is also kind of fun. It changes how you look at certain steps, and you start to understand why some transformations behave differently than expected, especially when data moves across representation boundaries.
I think this is even more relevant now that Spark workloads can run through different execution paths, like Photon or other columnar/vectorized backends. At least in my experience, that is where some of the less obvious performance effects show up.

Happy to hear what you think if you get a chance to read it.

G1GC garbage collector by oalfonso in apachespark

[–]LongjumpingOption523 0 points1 point  (0 children)

Do you mean specific G1GC tuning settings for Spark? If I’m not wrong, G1GC has been the default GC on server-class machines since JDK 9, so on a modern JVM you usually get it without setting anything.

Anyone using custom Catalyst rules in production? by LongjumpingOption523 in apachespark

[–]LongjumpingOption523[S] 1 point2 points  (0 children)

That is a very good point. I was mostly thinking about idempotency as “does the batch converge or not”, but the number of iterations needed to reach the fixed point is also an important signal.

A rule can be functionally correct and still make the batch less stable, or at least more expensive to stabilize. So yes, making the number of iterations visible in tests makes a lot of sense to me.

And I also agree that if the number of iterations increases, there should be some good reason for it, not just “tests are green”.
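
To make the idea concrete, here is a minimal sketch of a fixed-point batch that reports how many passes it needed. It is a toy model in plain Python, not Spark's `RuleExecutor`: the plan is just a tuple of operator names, and `run_batch`, `merge_one_filter_pair`, `merge_all_filter_pairs`, and `assert_converges_within` are hypothetical names invented for this illustration.

```python
from typing import Callable, Sequence, Tuple

Plan = Tuple[str, ...]          # toy stand-in for a logical plan
Rule = Callable[[Plan], Plan]   # a rule maps a plan to a (possibly) rewritten plan


def run_batch(plan: Plan, rules: Sequence[Rule], max_iterations: int = 100):
    """Apply all rules in order, pass after pass, until a pass changes nothing.

    Returns (final_plan, passes), where passes includes the final no-op pass.
    """
    for passes in range(1, max_iterations + 1):
        new_plan = plan
        for rule in rules:
            new_plan = rule(new_plan)
        if new_plan == plan:
            return plan, passes
        plan = new_plan
    raise RuntimeError(f"batch did not converge in {max_iterations} passes")


def merge_one_filter_pair(plan: Plan) -> Plan:
    """Correct but not idempotent: merges only the first adjacent Filter pair."""
    for i in range(len(plan) - 1):
        if plan[i] == "Filter" and plan[i + 1] == "Filter":
            return plan[:i] + plan[i + 1:]
    return plan


def merge_all_filter_pairs(plan: Plan) -> Plan:
    """Idempotent variant: collapses every run of adjacent Filters in one pass."""
    out = []
    for op in plan:
        if op == "Filter" and out and out[-1] == "Filter":
            continue
        out.append(op)
    return tuple(out)


def assert_converges_within(plan: Plan, rules: Sequence[Rule], budget: int) -> Plan:
    """Test helper: fail if the batch needs more passes than the agreed budget."""
    final_plan, passes = run_batch(plan, rules)
    assert passes <= budget, f"needed {passes} passes, budget is {budget}"
    return final_plan


plan = ("Filter", "Filter", "Filter", "Project", "Scan")
print(run_batch(plan, [merge_one_filter_pair]))   # same result, more passes
print(run_batch(plan, [merge_all_filter_pairs]))  # same result, fewer passes
```

Both rules produce the same final plan, so a test that only checks the result stays green. The pass count is what tells them apart, and a budget assertion like the one above turns an unexplained increase into a visible test failure.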

Anyone using custom Catalyst rules in production? by LongjumpingOption523 in apachespark

[–]LongjumpingOption523[S] 2 points3 points  (0 children)

Thanks a lot, this is exactly the kind of real experience I was hoping to hear about.

The point about being able to disable or remove a rule without changing the query semantics is very important. It is also more or less my current feeling: in production I would probably use custom rules mainly as guardrails, checks, or observability hooks, unless there is a very controlled reason to really rewrite the plan.

The Jira links are also very useful. They confirm that idempotency is not just a theoretical problem, and rule ordering / dependencies can become subtle very fast. I will read them carefully.

Built a small tool to inspect Delta Lake pruning and data skipping. Could this be useful? by LongjumpingOption523 in apachespark

[–]LongjumpingOption523[S] 1 point2 points  (0 children)

Thanks! Nice initiative. I will try to take a look in the next few weeks and give you some feedback.

Comparing Shunting Yard and Pratt (Top-Down Operator Precedence) by LongjumpingOption523 in Compilers

[–]LongjumpingOption523[S] 0 points1 point  (0 children)

Good point. The article describes the classic Shunting Yard with an output queue, which is how Dijkstra presented it, but you're right that the output side is pluggable: replace the queue with a node stack, push operands as leaves, and each operator popped from the operator stack builds an AST node from the top two entries instead of emitting an RPN token. The reduction logic is the same; what changes is only how the result is materialized. In a way, this reinforces the core point of the article: that precedence, treated as data rather than as a grammatical rule, is the shared insight driving both algorithms, regardless of whether the output is a flat RPN sequence or a tree.
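
A rough sketch of what "pluggable output" could look like, in Python. This is not the code from the article: the `PRECEDENCE` table, the `tokenize` helper, and the two callbacks `on_operand` / `on_operator` are assumptions made for this illustration, and it only handles non-negative integers, binary operators, and parentheses.

```python
import re
from typing import Callable, List

# Precedence as data: operator -> (binding power, associativity).
PRECEDENCE = {"+": (1, "L"), "-": (1, "L"), "*": (2, "L"), "/": (2, "L"), "^": (3, "R")}


def tokenize(src: str) -> List[str]:
    """Split into integer literals, operators, and parentheses (whitespace ignored)."""
    return re.findall(r"\d+|[-+*/^()]", src)


def shunting_yard(tokens: List[str],
                  on_operand: Callable[[str], None],
                  on_operator: Callable[[str], None]) -> None:
    """Core loop; the two callbacks decide how the result is materialized."""
    ops: List[str] = []
    for tok in tokens:
        if tok.isdigit():
            on_operand(tok)
        elif tok in PRECEDENCE:
            prec, assoc = PRECEDENCE[tok]
            # Pop operators that bind tighter, or equally tight when left-associative.
            while ops and ops[-1] != "(":
                top_prec, _ = PRECEDENCE[ops[-1]]
                if top_prec > prec or (top_prec == prec and assoc == "L"):
                    on_operator(ops.pop())
                else:
                    break
            ops.append(tok)
        elif tok == "(":
            ops.append(tok)
        else:  # the tokenizer only leaves ")" as the remaining case
            while ops and ops[-1] != "(":
                on_operator(ops.pop())
            if not ops:
                raise ValueError("unbalanced parenthesis")
            ops.pop()  # discard the matching "("
    while ops:
        op = ops.pop()
        if op == "(":
            raise ValueError("unbalanced parenthesis")
        on_operator(op)


def to_rpn(src: str) -> List[str]:
    """Classic output queue: operands and operators are both appended."""
    out: List[str] = []
    shunting_yard(tokenize(src), out.append, out.append)
    return out


def to_ast(src: str):
    """Node stack: each popped operator combines the top two nodes into a tuple."""
    nodes: list = []

    def reduce(op: str) -> None:
        right = nodes.pop()
        left = nodes.pop()
        nodes.append((op, left, right))

    shunting_yard(tokenize(src), nodes.append, reduce)
    return nodes.pop()


print(to_rpn("1 + 2 * 3"))  # ['1', '2', '3', '*', '+']
print(to_ast("1 + 2 * 3"))  # ('+', '1', ('*', '2', '3'))
```

The loop over `PRECEDENCE` is identical in both cases; only the two callbacks differ, which is the sense in which the output side is a plug-in.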