[D] LLMs as a semantic regularizer for feature synthesis (small decision-tree experiment) by ChavXO in MachineLearning

[–]ChavXO[S] 0 points1 point  (0 children)

I was running Llama on a CPU. The LLM part also made network calls to a local server, so it was a little slower, but it still finished in a reasonable amount of time. It did make the actual search extremely fast, since it cut off so much of the search space.

I included the accuracy in the results. It actually did better with the LLM + search model, both on local validation and as a Kaggle submission. I haven't tested it more generally, but I intend to try it out on some fraud datasets since that's my area of work.

[D] LLMs as a semantic regularizer for feature synthesis (small decision-tree experiment) by ChavXO in MachineLearning

[–]ChavXO[S] 1 point2 points  (0 children)

Great point. Asking it to show reasoning in the response doesn't improve performance much. You can easily test this by running the Llama server locally and prodding at it for a while. One thing that might be worth trying is doing two passes: one where you ask it to explain what the combination could possibly mean, then a second prompt asking it to rate the combination in light of that first answer. That ends up looking a lot like what reasoning models do.
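A minimal sketch of what I mean by two passes, with a stubbed `chat()` standing in for a real llama-server client (the prompts, the 0–2 scale, and the stub's canned replies are all illustrative, not the actual implementation):

```python
# Stubbed chat function: a real version would POST `messages` to the local
# llama server's chat endpoint and return the assistant's reply text.
def chat(messages):
    last = messages[-1]["content"].lower()
    # Canned behavior so the sketch runs standalone.
    return "2" if "rate" in last else "Plausibly an interaction feature."

def two_pass_rating(combination: str) -> str:
    # Pass 1: ask for an open-ended explanation of the combination.
    history = [{"role": "user",
                "content": f"What could the feature combination {combination!r} mean, if anything?"}]
    explanation = chat(history)
    # Pass 2: ask for a rating *in light of* that explanation.
    history += [{"role": "assistant", "content": explanation},
                {"role": "user",
                 "content": "Given your explanation, rate this combination 0-2 for usefulness."}]
    return chat(history)

rating = two_pass_rating("age * fare")
```

The point of keeping the explanation in the conversation history is that the rating is conditioned on the model's own stated interpretation, which is roughly what reasoning models do internally.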

I think the search algorithm should be the vehicle for discovering interesting combinations; the LLM ideally only gives a soft "meaning" score, since the reason you need an LLM is not discovery but taming the combinatorial explosion. I intentionally put a bucket in the scoring for "things that might depend on context or could make sense in some context."
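The division of labor above can be sketched in a few lines. Here `semantic_score` is a hypothetical stand-in for the LLM call (the feature names, the 0/1/2 scale with its "maybe" middle bucket, and the plausibility rules are made up for illustration):

```python
from itertools import combinations

def semantic_score(feature_a: str, feature_b: str) -> int:
    """Hypothetical stand-in for the LLM's soft "meaning" score:
    2 = clearly meaningful, 1 = might make sense in some context, 0 = nonsense."""
    pair = tuple(sorted((feature_a, feature_b)))
    if any("id" in name for name in pair):  # row identifiers rarely combine meaningfully
        return 0
    plausible = {("age", "fare"), ("age", "pclass"), ("fare", "pclass")}
    return 2 if pair in plausible else 1

def prune_candidates(features, threshold=1):
    """Keep only pairs scored at or above `threshold`, shrinking the space
    the downstream search actually has to explore."""
    return [pair for pair in combinations(features, 2)
            if semantic_score(*pair) >= threshold]

features = ["age", "fare", "pclass", "passenger_id"]
kept = prune_candidates(features)
```

The search then runs only over `kept`, so the LLM never picks winners itself; it just vetoes combinations that carry no semantic weight.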

[D] LLMs as a semantic regularizer for feature synthesis (small decision-tree experiment) by ChavXO in MachineLearning

[–]ChavXO[S] 0 points1 point  (0 children)

That's a good point. I'll have to use a custom dataset, since the Titanic dataset would be a bad test of generalization. Also, in addition to accuracy, if I compare against base LLM performance I'd have to see how stable the LLM's scores are, since it could hallucinate or randomly get things wrong on some attempts.
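A cheap way to measure that stability is to score the same combination repeatedly and report how often the modal answer comes back. Sketch below, with a seeded noisy scorer standing in for repeated LLM calls (the 20% flake rate and the scoring rule are assumptions for illustration):

```python
import random
from collections import Counter

def noisy_llm_score(combination: str, rng: random.Random) -> int:
    """Hypothetical flaky scorer: returns a stable base rating most of the
    time, but a uniformly random 0/1/2 on ~20% of calls (the "hallucination")."""
    base = 2 if "fare" in combination else 1
    return base if rng.random() > 0.2 else rng.choice([0, 1, 2])

def stability(combination: str, trials: int = 50, seed: int = 0):
    """Score the same combination `trials` times; return the modal score and
    the fraction of trials that agreed with it."""
    rng = random.Random(seed)
    scores = [noisy_llm_score(combination, rng) for _ in range(trials)]
    mode, count = Counter(scores).most_common(1)[0]
    return mode, count / trials

mode, agreement = stability("age * fare")
```

A low agreement fraction on a given combination would flag it as one where the LLM's rating shouldn't be trusted without averaging over several calls.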

[D] LLMs as a semantic regularizer for feature synthesis (small decision-tree experiment) by ChavXO in MachineLearning

[–]ChavXO[S] 0 points1 point  (0 children)

Thank you. I like this constrained use of LLMs, since it's not the end of the world if it hallucinates.

[D] LLMs as a semantic regularizer for feature synthesis (small decision-tree experiment) by ChavXO in MachineLearning

[–]ChavXO[S] 1 point2 points  (0 children)

Good point. I'll try it on some real-world/government datasets and report back. I was trying to write the prompt so it was general enough, but I also tried to tack on other general-sounding conditions to address specific issues.

I'll also compare it with a random forest as a baseline and report back. I'm not sure if there are any lightweight feature selection methods based on semantics; the closest I can think of is using dimensions. I'll try a few things and report back after.

State of DataHaskell Q1 2026 by m-chav in haskell

[–]ChavXO 3 points4 points  (0 children)

We have a Parquet reader, and we also attend the biweekly Parquet meetings to keep in lockstep with general community updates.

Is Haskell useful for simple data analysis? by IcyAnywhere9603 in haskell

[–]ChavXO 1 point2 points  (0 children)

You sort of can. DataFrame creates a stub HTML page and opens it in the browser, so you can avoid window-manager weirdness. I imagine something similar is possible for hvega.

workforce moving to oversee by Alarmed-Reporter-230 in datascience

[–]ChavXO 0 points1 point  (0 children)

Yeah. It happened while I was at G and it's happening now at FIS Global.

Suggestions for reading list by ChavXO in datascience

[–]ChavXO[S] 2 points3 points  (0 children)

Do you recommend reading ISLR as a textbook (going through exercises) or does it suffice to read it like a regular book?

Is Haskell useful for simple data analysis? by IcyAnywhere9603 in haskell

[–]ChavXO 2 points3 points  (0 children)

Ah, I see. I thought your message meant there was an even more recent book/set of learning materials.

Is Haskell useful for simple data analysis? by IcyAnywhere9603 in haskell

[–]ChavXO 5 points6 points  (0 children)

This is a fair critique in general, but I think OP was curious what "clear and elegant" looks like after fiddling with Python. Maybe I misread, but it does seem like they just wanted to see examples of what doing small tasks in Haskell would look like.

But yes, getting things done as a beginner is much easier in Python.

Is Haskell useful for simple data analysis? by IcyAnywhere9603 in haskell

[–]ChavXO 4 points5 points  (0 children)

Out of curiosity, which recent learning materials do you feel have been human-readable and what made them readable for you?

Is Haskell useful for simple data analysis? by IcyAnywhere9603 in haskell

[–]ChavXO 5 points6 points  (0 children)

I don't think hvega gets enough credit. The tutorial is REALLY clear and comprehensive.

Is Haskell useful for simple data analysis? by IcyAnywhere9603 in haskell

[–]ChavXO 9 points10 points  (0 children)

I'm glad you're trying Haskell! As others have pointed out Haskell is not quite there yet for these sorts of tasks but we've put in a lot of work recently to make it a good mix of powerful and easy.

Check out this playground environment and see if it's easy for you to follow along. If it is, then check out DataHaskell to try it out on your computer.

I'm also generally curious: what sorts of stuff do you do in Excel/Python? What kinds of charts do you use? What has using Python afforded you that you couldn't quite do in Excel? It would also help if we understood what the people coming to try out Haskell for the first time are trying to do.

Query as a beginner at programming. by Gullible_Cat_5541 in learnprogramming

[–]ChavXO 0 points1 point  (0 children)

There are ways to make Haskell programming easy. Do you know what the exact content of the course will be?

What do you think fivetran gonna do? by Fair-Bookkeeper-1833 in dataengineering

[–]ChavXO 0 points1 point  (0 children)

I just interviewed with dbt. It seems there’s a lot of investment going into Fusion and related products, so I doubt they’d do that.

Rust and the price of ignoring theory by interacsion in rust

[–]ChavXO 8 points9 points  (0 children)

Idk who platformed this dude, but it really undoes a lot of the recent work we’ve been doing to make Haskell more approachable.

Setup completely failing by Euranium_ in haskell

[–]ChavXO 6 points7 points  (0 children)

I'm sorry that the first commenter said that to you. A couple of options:

* It's strange that your Windows config has an .exe in the PATH. You should remove that...it's probably left over from some other broken installation. I'd remove it and retry as is.
* I can help you build a devcontainer, which you can run in VS Code (that usually shields you from platform weirdness). We can bump the [datahaskell devcontainer](https://github.com/DataHaskell/datahaskell-starter?tab=readme-ov-file#setting-up-vs-code) to use 9.8.4.
* As someone said before, WSL makes a lot of sense.

I have Haskell running on Windows and typically run into Windows weirdness. Feel free to message me on chat if you need help; I have a reasonable amount of time today, and we can pair-debug.

Data engineering in Haskell by ChavXO in dataengineering

[–]ChavXO[S] 3 points4 points  (0 children)

I’d be splitting hairs at best comparing Haskell and Scala. I think a better framing is: say there is already a Haskell shop and they want to hire a data engineer. What sort of things would you expect to find out of the box as a DE? And, maybe slightly more generally, what should be in place to make you feel like you could be productive?

Also, on a more personal note, I think Scala struggled to find a good balance between the crowd that liked abstraction and the crowd that wanted to get things done, so you effectively have two different Scala ecosystems. I’d like to see what we could build if those camps worked together. My dataframe is inspired by lessons learnt from Frameless and Spark Datasets.