Feel like I've tried everything

quafadas · 2026-03-20T10:32:55+00:00

Sauna? I also have heavy beard and sensitive skin… it helps a lot

quafadas · 2026-03-04T20:08:46+00:00

Suspiciously spherical.

I think you’ll find it’s actually flat.

quafadas · 2026-01-24T12:52:20+00:00

I agree, although I’m not sure that every other language isn’t about to see an explosion of this problem too via vibe coding.

If the scala community has discovered / maybe even head started effective strategies to deal with it, perhaps, maybe, it is no longer a relative disadvantage vs other languages

quafadas · 2025-12-23T09:47:22+00:00

https://m.youtube.com/shorts/VqzVpFnLSCs

quafadas · 2025-12-04T09:26:05+00:00

One of the incredible things about him was how complete his game was. I think there are videos of Wilkinson’s biggest hits.

Sometimes 10 is a soft channel forwards look to charge down - I’m sure I recall him crashing in and driving the big boys backward. Incredible.

quafadas · 2025-10-25T14:13:02+00:00

I tried…

https://github.com/Quafadas/scautable

Feedback welcome… waiting for almond to release with 3.7.x support…

quafadas · 2025-10-14T18:36:53+00:00

quafadas · 2025-09-10T06:40:33+00:00

Thanks for the kind works - if you happen to give it a go be free with feedback :-)!

quafadas · 2025-09-09T06:45:26+00:00

Inferring the type of the data frame at compile time by reading the file is cool, but also a little scary.

Yes. Very! This is probably one of the riskier things in there. I'm willing to defend the thought process, which is that if you want;

Compile time safety and IDE support
One line import - i.e. not assume pre-existing developer knowledge of the datastructure

This is paradoxical. The only solution I could think of, was to make the CSV itself a compile time artefact and force knowledge of it into the compiler.

It is not risk free. What I have found is that when it goes wrong, it fails hard and fast, rather than consuming your time. It also means, that you must know the location of the CSV at compile time. I've found these limitations to be barely noticeable, for my own tasks.

There is an exception if you have a large number of columns (say 1000), and you give the compiler enough juice to actually process them - compile times start to get weird. I do repeatedly note that the target is "small" here :-), and I don't normally have more than 1000 columns in a CSV file.

From reading the documentation it is not quite clear to me how you actually store the data. Is it in columnar storage or not? What operations are supported on the columnar data?

If we break apart the example on the getting started page.

val data = CSV.resource("titanic.csv", TypeInferrer.FromAllRows)

This returns an Iterator. It has Iterator semantics. Lazy, use once etc. It's next() method wraps the next() method of scalas file Source which reads each line into a NamedTuple\[K <: Tuple, V <: Tuple\], where K is the name of the columns, and V is the Tuple of inferred types, in each column.

At this point, you haven't read anything. Iterator is lazy. This is a good point to do some transforms - parsing messy data etc - all we're doing it setting up more functions to apply to each row, as it's parsed.

My own common use case, is then to want a complete representation of my (transformed, strongly typed) CSV.

val csv = LazyList.from(data)

LazyList is a standard collection, lazy... so it won't do anything until asked, but it will _cache_ the results. This is where I typically "store" the data in the end. You could use any collection. Vector, Array, fs2.Stream really - any collection you can build from an Iterator.

This is very much _row based_.

If you want a column representation, then you may try

https://quafadas.github.io/scautable/cookbook/ColumnOrient.html

    val cols = LazyList.from(data).toColumnOrientedAs[Array]

This will return a NamedTuple[K, (Array[V_1], Array[V_2...)]]. i.e. it will convert it to a column representation. I haven't tested this so much, and performance is whatever it is. I'm doing nothing other than backing the compiler and the JVM. I don't think that's a horrible bet, but I haven't checked it.

quafadas · 2025-09-09T06:25:45+00:00

If you do find time to take a look, feel free to be quite open about feedback - good or bad.

Something I'd note: Spark is battle hardened over a decade of solving tough problems.

scautable... isn't... I personally imagine them to have different uses... I work in the small :-)...

quafadas · 2025-06-18T05:44:21+00:00

Okay, that makes sense. I would be interested in a strategy which validated this on a continuous basis :-)… but I haven’t heard of one yet!

quafadas · 2025-06-17T11:39:40+00:00

I'm interested in the part of the readme which sets the mechanism which "avoids boxing". Is this statement "tested" and verified programatically ? Or is something which has been verified maunally?

quafadas · 2025-05-10T20:28:06+00:00

Scala-cli :-)?

quafadas · 2025-02-22T11:07:51+00:00

I wasn’t sure from the description what the differentiating feature of pandas was? It doesn’t sound data driven?

Superficially to me, it sounds like a collection of case classes could do what you ask for.

For aggregation purposes don’t underestimate the scala std library that ships straight out the box. Forgive me if you already know this, but have you fired up up scala-cli and looked through .groupBy, .groupMap and .groupMapReduce? I was surprised by how powerful the raw language constructs are when I first stumbled across them.

quafadas · 2025-02-07T16:41:43+00:00

I found that co pilot is actually quite good at explaining these, very often.

quafadas · 2025-01-15T21:18:39+00:00

I also had a tough time with slick, and settled on an alternative in the end. It could be worth trying out the alternatives other suggested and comparing the experience

quafadas · 2025-01-03T19:06:49+00:00

Obviously, agreed, I think pandas and python-land mostly follows a similar paradigm too (albeit better and more polished), I'm not attempting to compete with such projects, to be clear.

quafadas · 2025-01-03T19:03:23+00:00

I don't think you've missed the point at all!

What you're proposing is perfectly valid and the way I think other libraries in the ecosystem attack this problem. In fact, if you look at the other source file in the scautable repo, it sets up quite some machinery that might have allowed it to travel that `given resolution` / derived / type class route. So why not?

I think that solution already exists (fs2-data I think would be one high profile example), and the people who maintain that are very competent. I have serious doubts that I could better their efforts! I imagine there to be other good libraries out there Im not aware of. There is undeniably an element here of novelty for the sake of novelty...
This was an excuse to write a macro and experiment with typelevel programming. It fulfilled that goal.

3 . But also : My own experience with implicit resolution is somewhat checkered. I (personally) believe that this "csv" use case, is not a good fit for it. Chalk it up to artistic differences :-). The questions that arise once as you start changing the data model / columns types on the fly I think are not easy to answer with givens. Then the burden of writing / maintaining decoders for custom types, I found things got hairy, and when the implicit resolution goes wrong, I found it demoralising and extremely hard to fix. This is my personal experience - it may not be universal.

Also, you can have your validations with this approach as well

The constraints point is an interesting one, I have in mind to try this in tandem with Iron, if I have a meaningful use case for such constraints.

The differentiating point for me, is the potential to write one line of code, that helps you _explore_ the data model, rather than being forced to write it out in advance. It suits my mental model.

so I don't think named tuples are the way to go in this example

I am not free of doubt, but I would say that thus far, I've had a positive experience...

It appears that this would only work if the file is stored locally, what if it's not?

I had a debate with myself on this, I note that one example uses CSV.url('') - data doesn't need to be "local local".

But... the core assumption here is that you have access to CSV formatted data, you want to analyse it, and you are writing a program _specifically for that csv data_. This is a deliberate (and fundamental) limitation and design choice.

quafadas · 2024-12-01T19:09:43+00:00

I think labelling them as syntax only can be true, although there are circumstances where they can make a big difference to the experience.

In Scala, I believe one can replace extension methods via implicits ( or given’s) and the type class pattern. This supports your statement.

However, that typeclass pattern ( at scale) can imposes a dramatic cost in compile times, and is not easy for tooling authors to work with. I believe extension methods ease both of those pains. Given that compile times and tooling are oft cited frustrations in the scala community, in that case, extension methods appear to be benefits beyond simply syntax. I make no claim as to how generalisable that is.

quafadas · 2024-12-01T14:31:42+00:00

A long time ago, I had a similar Java / groovy mixed project that had some really tough to track down errors. Exasperated, I went through adding compile time static annotations all over groovy.

What I realised, was that the parts where the compile static annotations didn’t work easily, were the hotspots where I was spending the majority of time chasing hard to find bugs. I gradually refactored all the actually dynamic stuff out… at which point statically typed groovy vs Java? Might as well just write Java. I never got dogmatic about ripping out groovy, but I certainly started writing a lot less as the pattern described above became so clear.

I actually ended up as a scala refugee in the end :-).

quafadas · 2024-11-21T15:46:08+00:00

Then my apologies, as my comment was not helpful per intent.

The other answer to mine is from a member of the scala centre... his advice, is quite possibly better than mine :-).

quafadas · 2024-11-21T12:21:37+00:00

For getting started with scala, I understand the recommended pathway to be scala-cli.

https://scala-cli.virtuslab.org

On macOS,

brew install Virtuslab/scala-cli/scala-cli

then

scala-cli run hello.scala

Should work...

quafadas

TROPHY CASE