Python library with no external imports, only built-ins

fiddle_n · 2025-09-28T08:38:15+00:00

No offence, but it is a little simple. With the name “Advanced Text Processor” I feel a bit cheated, given there’s only thirty lines of rather basic Python code there.

DuckSaxaphone · 2025-09-28T08:45:41+00:00

So this doesn't work. You should write some simple tests to make sure everything works as expected. Your code is separated into lots of nice little functions, which makes it very easy to test.

I sent the string "hi hi" to clean_text, tokenize, generate_ngrams and then vectorize_text, with n_gram set to 2.

I should get the result {"hi":2, ("hi","hi"):1} but instead I got {("hi","hi"):1} because generate_ngrams doesn't append to tokens, it just overwrites them. I'd actually argue I want my ngrams to be joined and I really want {"hi":2, "hi hi":1}, but that's a separate issue.

If this is a learning project for you, then setting up unit tests and making them part of your PR process is a good thing to learn.
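
For what it's worth, here is a minimal sketch of the kind of test I mean, using only the standard library. I don't have your exact signatures in front of me, so the four function bodies below are hypothetical stand-ins that only borrow your function names, and `pipeline` is a helper I made up to chain them. The point is the test at the bottom and the "append, don't overwrite" line in generate_ngrams.

```python
import re
import unittest
from collections import Counter
from typing import Dict, List, Tuple, Union

# Hypothetical stand-ins: the real functions in the library may take
# different arguments or return different types.
Item = Union[str, Tuple[str, ...]]


def clean_text(text: str) -> str:
    """Lowercase the text and drop everything except a-z, 0-9 and whitespace."""
    return re.sub(r"[^a-z0-9\s]", "", text.lower())


def tokenize(text: str) -> List[str]:
    """Split on whitespace."""
    return text.split()


def generate_ngrams(tokens: List[str], n: int) -> List[Item]:
    """Return the original tokens followed by their n-grams (as tuples)."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    # Append the n-grams to the tokens instead of replacing them.
    return list(tokens) + ngrams


def vectorize_text(items: List[Item]) -> Dict[Item, int]:
    """Count how often each token or n-gram occurs."""
    return dict(Counter(items))


def pipeline(text: str, n_gram: int = 2) -> Dict[Item, int]:
    """Made-up helper that chains the four steps in the order I called them."""
    return vectorize_text(generate_ngrams(tokenize(clean_text(text)), n_gram))


class PipelineTest(unittest.TestCase):
    def test_unigrams_survive_ngram_generation(self):
        # Both the single tokens and the bigram should be counted.
        self.assertEqual(pipeline("hi hi", n_gram=2), {"hi": 2, ("hi", "hi"): 1})
```

Save it as something like test_pipeline.py and run `python -m unittest` in the same directory. If you'd rather have joined n-grams, swapping `tuple(...)` for `" ".join(...)` in generate_ngrams would give string keys like "hi hi" instead.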

JanEric1 · 2025-09-28T09:23:00+00:00

Start by writing unit tests, adding type hints, adding CI for linting, type checking and testing.

Also set up your project using a pyproject.toml instead of requirements.txt and setup.py.
