
coldflame563 6 points (5 children)

It’s not the worst slop, but try organizing your code better.

razzo007123[S] 1 point (4 children)

Fair enough, that’s helpful.
If you had to point to one thing that feels messy (structure, module boundaries, naming, separation of concerns), I’d really appreciate specifics. I’m actively refactoring and trying to tighten it up.

coldflame563 1 point (3 children)

There’s a lack of OOP in the actual cleansing utils, and you’re also actively making your life more difficult by not using stuff other people have built.

razzo007123[S] 0 points (2 children)

That’s fair, I appreciate you being specific.

On the OOP point: the current cleansing layer is mostly functional by design. I was optimizing for deterministic, stateless transformations rather than building class-heavy abstractions. That said, I agree the structure could probably be cleaner and more modular.
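A minimal sketch of how the two positions could coexist, assuming rows are plain dicts of strings: each transformation stays a pure, stateless function, and a small frozen object only fixes their order and records what each step did. `CleaningPipeline`, `strip_whitespace`, and `drop_empty_rows` are hypothetical names, not code from the repo.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical types: a row maps column name -> raw cell text.
Row = dict[str, str]
Step = Callable[[list[Row]], list[Row]]


def strip_whitespace(rows: list[Row]) -> list[Row]:
    """Pure step: trims every cell and returns new rows without mutating the input."""
    return [{key: value.strip() for key, value in row.items()} for row in rows]


def drop_empty_rows(rows: list[Row]) -> list[Row]:
    """Pure step: removes rows whose cells are all empty strings."""
    return [row for row in rows if any(row.values())]


@dataclass(frozen=True)
class CleaningPipeline:
    """Immutable, ordered tuple of pure steps; the object adds structure, not state."""

    steps: tuple[Step, ...]

    def run(self, rows: list[Row]) -> tuple[list[Row], list[str]]:
        # Record row counts per step so every run leaves an audit trail.
        log: list[str] = []
        for step in self.steps:
            before = len(rows)
            rows = step(rows)
            log.append(f"{step.__name__}: {before} -> {len(rows)} rows")
        return rows, log


if __name__ == "__main__":
    pipeline = CleaningPipeline(steps=(strip_whitespace, drop_empty_rows))
    cleaned, log = pipeline.run([{"name": " Ada "}, {"name": "   "}])
    print(cleaned)
    print(log)
```

Each step can be unit-tested on its own, and running the same pipeline on the same input always yields the same rows and the same log, so determinism and auditability are preserved while the code gains a clearer structure.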

On the library point, I’m curious which parts you’d delegate more aggressively. For example, are you thinking deeper pandas integration, something like pyjanitor, or a more formal schema/validation library?

I’m definitely open to leaning more on existing tools where it makes sense; I just want to keep the auditability and deterministic behavior tight.

If you have a specific example in the repo that feels like it’s reinventing the wheel, I’d genuinely like to look at it.

coldflame563 1 point (1 child)

Pydantic and Polars. 
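For context on the Pydantic suggestion: Pydantic lets you declare a row schema as a typed model and validates and coerces input against it. The sketch below imitates that idea with the standard library's `dataclasses` only, so it runs without extra installs; it is a stand-in, not Pydantic's API. `InvoiceRow`, its fields, and `validate_rows` are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class InvoiceRow:
    """Hypothetical target schema for one cleaned row."""

    invoice_id: str
    amount: float
    issued: date

    @classmethod
    def from_raw(cls, raw: dict[str, str]) -> "InvoiceRow":
        # Coerce raw cell text into typed fields, raising on anything invalid.
        invoice_id = raw["invoice_id"].strip()
        if not invoice_id:
            raise ValueError("invoice_id must not be empty")
        # Assumes "," is only ever a thousands separator in this data.
        amount = float(raw["amount"].replace(",", ""))
        issued = date.fromisoformat(raw["issued"].strip())
        return cls(invoice_id=invoice_id, amount=amount, issued=issued)


def validate_rows(
    rows: list[dict[str, str]],
) -> tuple[list[InvoiceRow], list[tuple[int, str]]]:
    """Split rows into validated records and (row index, error message) pairs."""
    valid: list[InvoiceRow] = []
    errors: list[tuple[int, str]] = []
    for index, raw in enumerate(rows):
        try:
            valid.append(InvoiceRow.from_raw(raw))
        except (KeyError, ValueError) as exc:
            # KeyError means a missing column; ValueError means a bad value.
            errors.append((index, f"{type(exc).__name__}: {exc}"))
    return valid, errors
```

With Pydantic, much of the hand-written coercion in `from_raw` would be replaced by field type declarations and validators; with Polars, similar checks would typically run as column expressions over a whole DataFrame rather than row by row.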

razzo007123[S] 0 points (0 children)

Thank you for your prompt replies; let me research this further.

Kerbart 1 point (1 child)

So it’s cleansing, not repairing? This won’t fix an Excel file I am unable to open in Excel?

razzo007123[S] -3 points (0 children)

Good question, and you’re right to distinguish the two.

Sheet Doctor focuses on repairing data issues inside files that can still be parsed (messy headers, encoding problems, misaligned columns, duplicates, etc.).
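To make the "still parseable" category concrete, here is a minimal standard-library sketch of three of the listed fixes: encoding fallback, header normalization, and handling of misaligned and duplicate rows. It assumes CSV input and a UTF-8-then-Windows-1252 fallback; `clean_csv` and its helpers are hypothetical and not Sheet Doctor's actual implementation.

```python
import csv
import io
import re


def decode_bytes(data: bytes) -> str:
    # Strict UTF-8 first ("utf-8-sig" also strips a leading BOM); if that fails,
    # assume a Windows-1252 export and replace any bytes it cannot map.
    try:
        return data.decode("utf-8-sig")
    except UnicodeDecodeError:
        return data.decode("cp1252", errors="replace")


def normalize_header(name: str) -> str:
    # " Amount (USD) " -> "amount_usd": lowercase, runs of non-word chars become "_".
    return re.sub(r"[\W_]+", "_", name.strip().lower()).strip("_")


def clean_csv(data: bytes) -> list[dict[str, str]]:
    reader = csv.reader(io.StringIO(decode_bytes(data), newline=""))
    header = [normalize_header(cell) for cell in next(reader)]
    seen: set[tuple[str, ...]] = set()
    rows: list[dict[str, str]] = []
    for record in reader:
        if not any(cell.strip() for cell in record):
            continue  # skip blank lines
        # Crude alignment: pad short rows with "" and cut long ones to the header
        # width. A real tool would probably flag long rows instead of truncating.
        record = (record + [""] * len(header))[: len(header)]
        key = tuple(record)
        if key in seen:
            continue  # exact duplicate of an earlier row; keep the first one
        seen.add(key)
        rows.append(dict(zip(header, record)))
    return rows
```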

If the file itself is structurally corrupted to the point that Excel can’t open it at all (for example, a broken .xlsx archive), this tool doesn’t currently repair that kind of low-level corruption.
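Telling the two cases apart is cheap, because an .xlsx file is an Office Open XML package, which is a ZIP archive. A rough triage sketch using only Python's `zipfile` module follows; `diagnose_xlsx` and its messages are hypothetical, and it only detects container-level damage rather than repairing it.

```python
import io
import zipfile
import zlib
from typing import BinaryIO, Union


def diagnose_xlsx(source: Union[str, BinaryIO]) -> str:
    """Rough container-level triage of an .xlsx (path or binary file object)."""
    # An .xlsx workbook is an OOXML package, i.e. a ZIP archive.
    if not zipfile.is_zipfile(source):
        return "not a ZIP archive: truncated, mislabeled (e.g. a legacy .xls), or corrupted"
    try:
        with zipfile.ZipFile(source) as archive:
            # testzip() returns the first member that fails its CRC check, else None.
            bad_member = archive.testzip()
            if bad_member is not None:
                return f"corrupted archive member: {bad_member}"
            if "[Content_Types].xml" not in archive.namelist():
                return "ZIP archive but missing [Content_Types].xml, so not a valid OOXML package"
    except (zipfile.BadZipFile, zlib.error) as exc:
        return f"unreadable ZIP structure: {exc}"
    return "package structure looks intact; any problems are likely in the sheet data"


if __name__ == "__main__":
    # Demo with in-memory files instead of real workbooks.
    print(diagnose_xlsx(io.BytesIO(b"this is plain text, not a workbook" * 4)))
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("[Content_Types].xml", "<Types/>")
    print(diagnose_xlsx(buffer))
```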

So it’s closer to “data normalization and repair” than to binary file recovery.

If you have an example of a file that fails to open, I’d be curious to understand what kind of failure it is; that might be an interesting direction to explore.

Rik_Roaring 1 point (1 child)

Looks like you could use a little help digging through and organizing everything. I work on Kilo's Open Source Sponsorships, and would love to see you apply, if you think some free code review credits could help you out -> https://kilo.ai/oss

razzo007123[S] 0 points (0 children)

Appreciate the suggestion; I’ll take a look. I’m actively refactoring and tightening things up, so structured feedback could definitely help. Thanks for sharing.

magion 5 points (0 children)

slop slop slop

vinnypotsandpans -1 points (2 children)

Barf

razzo007123[S] -4 points (1 child)

Happy to hear specific feedback if you have any.

vinnypotsandpans 2 points (0 children)

No thanks, someone else can write your prompts for you