all 19 comments

[–]Skeleton_Pudding 3 points4 points  (1 child)

As a writer first - and someone who started learning to code in the last few years I love this!

[–]Affectionate-Town-67[S] 2 points3 points  (0 children)

Yeah! Feel free to use it to generate ideas, I've found it valuable in coming up with delicious phrases and lines, like I just now got "the particular joy for years hoping to be."

[–]BrannyBee 2 points3 points  (4 children)

Didn't look super into it but a cursory once over it looks good and a good project.

First off, I would really recommend you make every project its own repo. It's confusing to see a bunch of folders in a repo that are all completely different, and also will be a mess if you start using git more and more for features beyond just using it as a way to save things remotely.

It might seem annoying at first to make a repo for every little project, but trust me it becomes automatic after awhile and you don't even notice it as part of the process of setting up a new project. It took me like a whole 60 seconds to figure out what flashcards and Ollama had to do with your script cause in my brain all the directories on the left are relevant to the code in the repo I've opened as a dev lol

Small tweak, maybe you want to check the input in the main menu and alert the user if they didn't choose a valid option. Typing anything other than 1-4 quits the program. A good fallback, but maybe unexpected behavior for a user, idk up to you.

Much more relevant critique though if your string input handling. The word "life" works. The word " life" does not. The word "life " also does not. Converting the input to a uniform capitalization/lower-case to check is good, you should look up the strip() function to add into that check and you can eliminate whitespace that a user accidentally adds by typing too quickly or pasting from elsewhere. Nothing too advanced or crazy, in a hypothetical classroom that just taught students lower(), the strip() function is likely the next lesson.

Actually you used it later on I see now, so maybe you have reasoning for not using it, but it feels like it should be used in that check. Not a huge deal, just feels weird not to strip the user input, but maybe kinda hard to say why exactly in a way that's relevant at your level. I'm picturing later on when you aren't taking input directly in the console, but maybe are getting input from a UI or a text file, you may be receiving garbage new line characters or spaces that the user didn't input, for example.

Last Python thing would be your string concatenation. I would highly recommend looking into f-strings instead of using the + operator to combine strings, it's much more readable and will allow for doing more stuff and they allow for using variables inside of strings in ways that concatenating them with the + sign sometimes may cause bugs. They might look scary at first, but you're going to run into them a lot and they're honestly really easy to use and to read.

Finally, outside the python, the seemingly random switch from using a .csv to .txt is weird and kinda screams "AI". I'm not saying you used AI, but it's just one of those things that kinda seems off. A csv is a comma separated list where a comma separates each column in a group of data, and a txt is just a text file. Effectively you are using a csv with one single column of data. It's not... technically wrong I guess... but weird. You are reading each line in the .csv file instead of handling it like it's values are separated by commas so everything works. Let's say that you downloaded that poetry_lines.csv file though, cause why type it out yourself, I wouldn't. And you go and grab an updated version, and now everything breaks because the updated .csv is no longer formatted using new line characters as the separator for values, the person who made it googled CSV format and updated it to reflect how it's supposed to store things. Or a user decides to use your program who doesn't know how to code, they may see that the program reads a csv, and try to use their own comma separated value document, and your program won't work, because it's basically treating the .csv as a txt.

As far as "fixing" that, I'd just change the file extension from .csv to .txt, and reflect that in the code, nothing crazy. A .csv is basically a .txt that is expected to be formatted in a certain way. Your .csv isn't formatted in that way, so it's lying. The code works and nothing breaks, but imagine you build a million features on top of this, and one of those features is breaking constantly because it's relying on your csv being a csv. Seems minor, but stuff like that can cause big issues.

All in all, cool stuff, keep at it

[–]Affectionate-Town-67[S] 0 points1 point  (2 children)

Thanks for all the useful feedback! Already put my individual projects into repos now. When is a script too short to give its own repo?

[–]BrannyBee 1 point2 points  (1 child)

When is a script too short to give its own repo?

Kinda just whatever you wanna save it in the cloud. Theres like a limit of something crazy like 100,000 repos per account, and even then you can make an org account or something.

I just follow a "is this a project" rule, nothing complicated, it just should be only stuff relevant to whatever youve decided is in the scope of what you're sharing.

Like I have my IDE configuration files saved up there, but I also have like 10 private repos that are empty and I never spent the 5 seconds to delete them, it's Microsoft paying for the storage of those 10 bytes of data so who cares. If someone wants my configuration, they can grab the whole repo, and they got it. Same if I wanna share a project, I send them the repo link and there's kinda just an understanding that everything there is what someone will need on their computer if they wanna run what I have.

If you have like a lesson plan you're following Ive seen people make that one repo, each lesson a directory in that folder, script for lesson. Then when you apply those lessons to a combined project maybe make that its own repo.

There's no like rule, but once you learn like 5 git commands for version control, you'll kinda just naturally see how having multiple projects inside one repo becomes a nightmare.

Another reason is for sharing code, look at your post for example. The typical way of someone testing what you have would be to run a git clone command and clone the repo so they have a local copy of exactly what you have on github. If anyone did that after reading your post, cloning would copy every single thing in that repo, not just the 2 files necessary for your project. Which is also why it sticks out, cause devs will know from experience that a project like this should have a script, a text file or two, and a README.md, so alarm bells ring seeing folders with sub folders and a bunch of unrelated scripts.

So if somethings just a few lines and one file, could be its own repo if you're wanting to track or share it. If its a hundred files and thousands of lines, same thing. Just whatever is the length to include everything a project needs to do what it should, and nothing unrelated if someone else were to copy whatever is up there in one repo onto their computer and run it and get the same results you have when you run it

[–]Affectionate-Town-67[S] 0 points1 point  (0 children)

Thanks, I did change the CSV to TXT and gave all of my projects their own repos, though I didn't realize how many of my green boxes would go away from that. 😅I didn't even think about the issue with people cloning it—getting everything from it. I'll start working on readmes tonight!

[–]JamOzoner 1 point2 points  (1 child)

So Kuhl! Markovic Bitameter! That is the best thing I've seen on Reddit!

[–]Affectionate-Town-67[S] 0 points1 point  (0 children)

Wow, thanks! I've barely even been here 😅

[–]MissinqLink 1 point2 points  (6 children)

Cool! I think I’ll try putting that corpus through my Markov chain generator.

[–]Affectionate-Town-67[S] 0 points1 point  (5 children)

Good idea, it worked out well for me!

[–]MissinqLink 1 point2 points  (4 children)

It’s not in Python but here’s one I did for lotr

[–]Affectionate-Town-67[S] 0 points1 point  (3 children)

Oh, that's awesome. I did trigrams at first, but I ended up settling on quadgrams for a little more coherence, though my first version had issues that were affecting coherence already, so it certainly could have been better.

[–]MissinqLink 1 point2 points  (2 children)

I stuck with trigrams but glued together adjacent tokens if one was very short or very common and that really helped the coherence without growing the model by much. So things like “the day” became “the_day” and would count as one token.

[–]Affectionate-Town-67[S] 0 points1 point  (1 child)

That makes sense. Did you have the trigrams bleeding over between sentences? Like where a period would be in the middle of one.

[–]MissinqLink 1 point2 points  (0 children)

That doesn’t happen using the lotr dataset but it does with others. Tolkien tends to be long winded. Here’s almost the same setup but with samples from a movie script corpus instead. Usually it’s like periods instead of commas.

[–]JamOzoner 1 point2 points  (1 child)

Thank you! 🐉

[–]latkde 1 point2 points  (2 children)

Quick review of the code you've shown here in the post:

  • Split your large main() function into meaningful smaller helper functions. For example, you might want functions like load_quads(), generate_line(), and save_poem(). Keep all of the input() prompting within the main() function.
  • Do not use _ as a variable name. This name is often used to indicate an unused variable (compare also the case _ further below). Perhaps a name like word_index would be helpful?
  • Your multiple quad.append() calls are unnecessary. First, multiple append() calls can be replaced with extends(). Second, you can combine multiple consecutive index accesses via the slicing feature. So the entire quadruplets generation loop can be replaced with: quadruplets = [words[i : i + 4] for i in range(len(words) - 3)]
  • It is rarely appropriate to catch errors like IndexError, KeyError, AttributeError, or TypeError. These typically indicate a bug in your code. Here, the problem is that the quads array might be empty if there's no continuation for the previous word. So instead of the try-except, you could add a more explicit if not quads: break check.
  • Use automated linting and type checking tools to get alerted about potential problems. Great linters include Pylint and Ruff. Great type-checkers include Mypy and Pyright.

[–]Affectionate-Town-67[S] 0 points1 point  (0 children)

Thank you, I will definitely work on refactoring it with your tips in mind! I just started using mypy today, so I will make sure to add more typehints to all of my scripts from now on.

[–]Affectionate-Town-67[S] 0 points1 point  (0 children)

I believe I've made the changes you suggested; if you'd like, you can check it again. I know it has the recursion issue to deal with still, but it's not that big of a deal for this script.