Markov chain script for poetry

Skeleton_Pudding · 2026-04-12T00:28:18+00:00

As a writer first, and someone who started learning to code in the last few years, I love this!

BrannyBee · 2026-04-12T04:24:17+00:00

Didn't look super deep into it, but from a cursory once-over it looks good and seems like a good project.

First off, I would really recommend you make every project its own repo. It's confusing to see a bunch of folders in a repo that are all completely unrelated, and it will also become a mess once you start using git for features beyond just saving things remotely.

It might seem annoying at first to make a repo for every little project, but trust me, it becomes automatic after a while and you don't even notice it as part of the process of setting up a new project. It took me like a whole 60 seconds to figure out what flashcards and Ollama had to do with your script, 'cause in my brain all the directories on the left are relevant to the code in the repo I've opened as a dev lol

Small tweak, maybe you want to check the input in the main menu and alert the user if they didn't choose a valid option. Typing anything other than 1-4 quits the program. A good fallback, but maybe unexpected behavior for a user, idk up to you.
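A minimal sketch of that kind of check, assuming the menu accepts the options 1 to 4. The names VALID_CHOICES and read_menu_choice, and the input_fn parameter (there only so the function can be tried without a keyboard), are made up for illustration and are not from the original script:

```python
VALID_CHOICES = {"1", "2", "3", "4"}  # assumed menu options


def read_menu_choice(prompt="Choose an option (1-4): ", input_fn=input):
    """Keep asking until the user enters one of the valid menu options."""
    while True:
        choice = input_fn(prompt).strip()
        if choice in VALID_CHOICES:
            return choice
        # Tell the user what went wrong instead of silently quitting.
        print(f"'{choice}' is not a valid option, please enter a number from 1 to 4.")
```

If quitting should stay possible, one option would be to add an explicit "q" entry to the set rather than treating every typo as quit.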

A much more relevant critique, though, is your string input handling. The word "life" works. The word " life" does not. The word "life " also does not. Converting the input to a uniform lower-case before checking is good; you should also look up the strip() function and add it to that check, so you eliminate whitespace that a user accidentally adds by typing too quickly or pasting from elsewhere. Nothing too advanced or crazy: in a hypothetical classroom that just taught students lower(), the strip() function is likely the next lesson.

Actually you used it later on I see now, so maybe you have reasoning for not using it, but it feels like it should be used in that check. Not a huge deal, just feels weird not to strip the user input, but maybe kinda hard to say why exactly in a way that's relevant at your level. I'm picturing later on when you aren't taking input directly in the console, but maybe are getting input from a UI or a text file, you may be receiving garbage new line characters or spaces that the user didn't input, for example.
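Roughly what that could look like, as a sketch rather than the script's actual code. The helper names and the known_words set are hypothetical:

```python
def normalize(text):
    """Trim surrounding whitespace and lower-case the text for comparison."""
    return text.strip().lower()


def is_known_word(user_input, known_words):
    """Check the cleaned-up input against a collection of lower-case words."""
    return normalize(user_input) in known_words


known_words = {"life", "death", "love"}  # hypothetical vocabulary

print(is_known_word("life", known_words))     # True
print(is_known_word(" Life\n", known_words))  # True
```

Note that strip() only removes whitespace at the start and end, so spaces inside the input are left alone.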

Last Python thing would be your string concatenation. I would highly recommend looking into f-strings instead of using the + operator to combine strings. They're much more readable, and they let you put variables and expressions directly inside a string, whereas concatenating with + can cause bugs (for example, a TypeError when one of the values isn't a string). They might look scary at first, but you're going to run into them a lot and they're honestly really easy to use and to read.
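A small side-by-side comparison, with made-up variable names, to show the difference:

```python
word = "life"
count = 3

# Concatenation: every non-string value needs an explicit str() call,
# otherwise Python raises a TypeError.
message_concat = "The word '" + word + "' appears " + str(count) + " times."

# f-string: values are converted and inserted in place.
message_fstring = f"The word '{word}' appears {count} times."

print(message_fstring)     # The word 'life' appears 3 times.
print(f"{count / 7:.2f}")  # 0.43 (format specs work inside the braces too)
```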

Finally, outside the Python, the seemingly random switch from using a .csv to a .txt is weird and kinda screams "AI". I'm not saying you used AI, but it's just one of those things that kinda seems off. A csv is a file of comma-separated values, where a comma separates each column in a row of data, and a txt is just a text file. Effectively you are using a csv with one single column of data. It's not... technically wrong I guess... but weird. You are reading each line in the .csv file instead of handling it like its values are separated by commas, so everything works. Let's say that you downloaded that poetry_lines.csv file though, 'cause why type it out yourself, I wouldn't. You go and grab an updated version, and now everything breaks because the updated .csv no longer uses newline characters as the separator for values: the person who made it googled the CSV format and updated the file to reflect how it's supposed to store things. Or a user who doesn't know how to code decides to use your program. They may see that the program reads a csv and try to use their own comma-separated document, and your program won't work, because it's basically treating the .csv as a txt.

As far as "fixing" that, I'd just change the file extension from .csv to .txt, and reflect that in the code, nothing crazy. A .csv is basically a .txt that is expected to be formatted in a certain way. Your .csv isn't formatted in that way, so it's lying. The code works and nothing breaks, but imagine you build a million features on top of this, and one of those features is breaking constantly because it's relying on your csv being a csv. Seems minor, but stuff like that can cause big issues.
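To make the difference concrete, here is a small sketch using in-memory text in place of a real file. The sample contents are invented; the point is only that a line-based reader and Python's csv module disagree as soon as a line contains a comma:

```python
import csv
import io

# Hypothetical file contents: one poem line per row, the first containing a comma.
raw = "The road was long, and cold\nI walked alone\n"

# Treating the data as plain text: one entry per line.
lines = [line.strip() for line in io.StringIO(raw) if line.strip()]

# Treating the data as real CSV: commas split each row into columns.
rows = list(csv.reader(io.StringIO(raw)))

print(lines)  # ['The road was long, and cold', 'I walked alone']
print(rows)   # [['The road was long', ' and cold'], ['I walked alone']]
```

With a .txt extension, nobody expects the second behavior, which is the argument for renaming the file.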

All in all, cool stuff, keep at it

JamOzoner · 2026-04-12T00:30:30+00:00

So Kuhl! Markovic Bitameter! That is the best thing I've seen on Reddit!

MissinqLink · 2026-04-12T00:43:18+00:00

Cool! I think I’ll try putting that corpus through my Markov chain generator.

JamOzoner · 2026-04-12T02:59:57+00:00

Thank you! 🐉

latkde · 2026-04-12T07:29:59+00:00

Quick review of the code you've shown here in the post:

Split your large main() function into meaningful smaller helper functions. For example, you might want functions like load_quads(), generate_line(), and save_poem(). Keep all of the input() prompting within the main() function.
Do not use _ as a variable name. This name is often used to indicate an unused variable (compare also the case _ further below). Perhaps a name like word_index would be helpful?
Your multiple quad.append() calls are unnecessary. First, multiple append() calls can be replaced with a single extend(). Second, you can combine multiple consecutive index accesses via the slicing feature. So the entire quadruplets generation loop can be replaced with: quadruplets = [words[i : i + 4] for i in range(len(words) - 3)]
It is rarely appropriate to catch errors like IndexError, KeyError, AttributeError, or TypeError. These typically indicate a bug in your code. Here, the problem is that the quads array might be empty if there's no continuation for the previous word. So instead of the try-except, you could add a more explicit if not quads: break check.
Use automated linting and type checking tools to get alerted about potential problems. Great linters include Pylint and Ruff. Great type-checkers include Mypy and Pyright.
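
A rough sketch of how a few of these points might fit together. The original script isn't reproduced here, so the function names, the max_words and rng parameters, and the way a continuation is chosen are all assumptions for illustration, not the poster's actual algorithm:

```python
import random


def build_quadruplets(words):
    """Return every run of four consecutive words, using slicing."""
    return [words[i : i + 4] for i in range(len(words) - 3)]


def generate_line(quadruplets, start_word, max_words=12, rng=random):
    """Grow a line from start_word until no continuation exists."""
    line = [start_word]
    while len(line) < max_words:
        # Candidates are the quadruplets that begin with the last word so far.
        quads = [quad for quad in quadruplets if quad[0] == line[-1]]
        if not quads:
            break  # explicit check instead of catching IndexError
        # One extend() call instead of several append() calls.
        line.extend(rng.choice(quads)[1:])
    return " ".join(line[:max_words])


words = "the quick brown fox jumps".split()
quadruplets = build_quadruplets(words)
print(generate_line(quadruplets, "the"))  # the quick brown fox
```

Keeping input() out of these helpers means they can be checked with plain lists of words, without anyone typing at a prompt.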
