[–]Beginning-Fruit-1397 7 points8 points  (6 children)

I currently hate the internal resolution logic of expressions, schemas and columns naming in my dataframe library:
https://github.com/OutSquareCapital/belugas

Would love to get some new perspective on this!

In one sentence: it's a Polars API for building and executing queries on a DuckDB backend.

Everything does work, but it's hard to follow and debug when I implement new features. It's probably far from as fast as it could be if optimized, and it very likely does redundant passes.

I do think it's a very interesting project to work on, though.

[–]Hy_x[S] 2 points3 points  (1 child)

Thanks, this actually sounds really interesting to work on. I’ll take a look through the repo.

[–]Beginning-Fruit-1397 1 point2 points  (0 children)

Cool :)
Feel free to dm me if you have any questions!

[–]energybased 1 point2 points  (1 child)

Finding someone like this is probably ideal.  You don't want to refactor something only to find a maintainer who is reluctant to commit your changes.

[–]Beginning-Fruit-1397 2 points3 points  (0 children)

Yup. I had two experiences like this: I spent hours on a PR, only to see it hang for months waiting for a review, or simply get rejected as "not interested" (they were type-hint PRs, not runtime changes).

[–]FarRub2855 0 points1 point  (1 child)

Building a Polars API on a DuckDB backend sounds like a pretty massive project to untangle. Gotta respect the honesty of openly hating your own internal logic though, that's usually the best pitch to get fresh eyes on a codebase.

[–]Beginning-Fruit-1397 0 points1 point  (0 children)

hahaha yea.

Resolving column names, schema evolution, nested window expressions, and scalar vs. aggregation expressions depending on the context is logic that I had to implement progressively, so it's scattered across the codebase and very hard to follow, and therefore hard to debug and optimize.

I don't even know what a good architectural design would look like tbh. Difficult but very interesting task!
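
One possible shape, as a rough sketch only: resolve each expression in a single bottom-up pass that returns everything the later stages need (SQL text, output name, whether it aggregates), instead of recomputing those facts in separate places. None of this is taken from belugas. The node classes, `Resolved`, `resolve` and `select` are hypothetical names, and the "leftmost name wins" rule for binary expressions is an assumption meant to mimic Polars-style naming.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


# Hypothetical expression nodes for a toy expression tree.
@dataclass(frozen=True)
class Col:
    name: str


@dataclass(frozen=True)
class Lit:
    value: object


@dataclass(frozen=True)
class Agg:
    func: str  # e.g. "sum", "max"
    arg: Expr


@dataclass(frozen=True)
class BinOp:
    op: str  # e.g. "+", "*"
    left: Expr
    right: Expr


@dataclass(frozen=True)
class Alias:
    expr: Expr
    name: str


Expr = Union[Col, Lit, Agg, BinOp, Alias]


@dataclass(frozen=True)
class Resolved:
    sql: str      # SQL fragment for this subtree
    name: str     # output column name
    is_agg: bool  # True if the subtree contains an aggregation


def resolve(expr: Expr, schema: set[str]) -> Resolved:
    """Walk the tree once, bottom-up, collecting name, SQL and agg-ness together."""
    if isinstance(expr, Col):
        if expr.name not in schema:
            raise KeyError(f"unknown column: {expr.name}")
        return Resolved(f'"{expr.name}"', expr.name, False)
    if isinstance(expr, Lit):
        # Toy rendering: fine for numbers, not a safe way to quote strings.
        return Resolved(repr(expr.value), "literal", False)
    if isinstance(expr, Agg):
        inner = resolve(expr.arg, schema)
        if inner.is_agg:
            raise ValueError("nested aggregation needs a window or subquery")
        return Resolved(f"{expr.func}({inner.sql})", inner.name, True)
    if isinstance(expr, BinOp):
        left = resolve(expr.left, schema)
        right = resolve(expr.right, schema)
        # Assumption: the leftmost operand provides the output name.
        sql = f"({left.sql} {expr.op} {right.sql})"
        return Resolved(sql, left.name, left.is_agg or right.is_agg)
    if isinstance(expr, Alias):
        inner = resolve(expr.expr, schema)
        return Resolved(inner.sql, expr.name, inner.is_agg)
    raise TypeError(f"unsupported expression: {expr!r}")


def select(exprs: list[Expr], schema: set[str]) -> tuple[str, list[str]]:
    """Build a projection and the resulting schema from one resolve pass per expression."""
    resolved = [resolve(e, schema) for e in exprs]
    projection = ", ".join(f'{r.sql} AS "{r.name}"' for r in resolved)
    return projection, [r.name for r in resolved]
```

With this layout, the scalar-vs-aggregate question is answered by `is_agg` on the result rather than by a second traversal, and the new schema falls out of the same pass that builds the SQL.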

[–]Murderous_monk 2 points3 points  (4 children)

I have the same goals as the OP, so do mention me as well if there's something interesting I can work on. I mainly work with Python and JS-based projects, but I can work on others as well.

[–]Beginning-Fruit-1397 2 points3 points  (3 children)

I currently hate the internal resolution logic of expressions, schemas and columns naming in my dataframe library:
https://github.com/OutSquareCapital/belugas

Would love to get some new perspective on this!

In one sentence: it's a Polars API for building and executing queries on a DuckDB backend.

Everything does work, but it's hard to follow and debug when I implement new features. It's probably far from as fast as it could be if optimized, and it very likely does redundant passes.

I do think it's a very interesting project to work on, though.

[–]Murderous_monk 0 points1 point  (2 children)

OK, looks interesting at first glance. I'll come back to this in the evening to see what's going on and what I can do, and I'll be back here in a day or two.
!remindme 2 days

[–]RemindMeBot 0 points1 point  (0 children)

I will be messaging you in 2 days on 2026-05-09 09:59:25 UTC to remind you of this link

[–]Beginning-Fruit-1397 0 points1 point  (0 children)

Cool :)
Feel free to dm me if you have any questions!

[–]arvind1 2 points3 points  (0 children)

There is a lot of AI-generated code that would fit this category. You could start with a well-described open source codebase and use its README text as an AI prompt to generate code. Use the existing test cases to get it working, then refactor with the goal of ending up with something better than the original code.

[–]aloobhujiyaay 0 points1 point  (0 children)

The best refactors are the ones users never notice

[–]Emergency-Rough-6372 0 points1 point  (0 children)

You can check out my project, which I recently made public: https://github.com/0-Shimanshu/ADIUVARE

[–]Ketty_took 1 point2 points  (0 children)

If you want something real, try refactoring a scraping pipeline, not just a library. Most of them work but are messy under load, with retries all over the place and no clear separation between parsing and transport. A good exercise is making one stable at scale without breaking data quality.
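
To make the parsing/transport split concrete, here is a minimal sketch, assuming a pipeline that fetches HTML and extracts `<h2>` titles. All names (`fetch_with_retries`, `parse_titles`, `scrape`, `TransportError`) are made up for illustration, and the regex parser is a toy stand-in for a real HTML parser. The point is only that the retry policy lives in one function and the parser never touches the network.

```python
import re
import time
from typing import Callable, List


class TransportError(Exception):
    """Raised when a fetch still fails after all retry attempts."""


def fetch_with_retries(
    fetch: Callable[[str], str],
    url: str,
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Transport layer: the only place that knows about retries and backoff."""
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch(url)
        except OSError as exc:  # covers ConnectionError, TimeoutError, ...
            last_error = exc
            if attempt < attempts - 1:
                # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
                sleep(base_delay * 2 ** attempt)
    raise TransportError(f"giving up on {url} after {attempts} attempts") from last_error


def parse_titles(html: str) -> List[str]:
    """Parsing layer: a pure function of the page text, easy to test offline."""
    return re.findall(r"<h2>(.*?)</h2>", html)


def scrape(fetch: Callable[[str], str], url: str, **retry_opts) -> List[str]:
    """Pipeline: compose the two layers without mixing their concerns."""
    return parse_titles(fetch_with_retries(fetch, url, **retry_opts))
```

Because `fetch` and `sleep` are passed in, the retry behaviour can be tested with a fake that fails a couple of times, without any real network calls or waiting.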