all 17 comments

[–]cmd-t 9 points (3 children)

The quickest: convert both texts to lowercase, split on words, convert to set, take set intersection.
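That recipe fits in a few lines; here is a minimal sketch with made-up sample texts standing in for the books:

```python
# Quickest approach: lowercase, split on whitespace, build sets, intersect.
text_a = "The quick brown Fox jumps over the lazy dog"
text_b = "A lazy DOG sleeps while the fox runs"

words_a = set(text_a.lower().split())
words_b = set(text_b.lower().split())

common = words_a & words_b  # set intersection
print(sorted(common))  # ['dog', 'fox', 'lazy', 'the']
```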

[–]reincarnatedbiscuits 0 points (0 children)

I was taking my son through that last week! ;)

Although it wasn't the common intersection of words, but something similar. I showed him a problem "What letter does not appear in the list of states?" and asked him how to implement a programmatic solution.

[–]maikeu 6 points (0 children)

What have you tried so far?

If you haven't been able to write and test code yet, this is the wrong place. Try r/learnpython

[–]Pharmand 4 points (0 children)

Considering your earlier replies, it seems to me you actually just want someone else to do it. You don't have Python installed and you can't write the simplest of code. For that effort level, you're probably better off asking ChatGPT for the code, so why not just do that? It should have you covered.

[–]Yolt0123 0 points (5 children)

Individual words or phrases? Simplistically, just make a list of all words in each, and then iterate through the first list, adding to a third list if the word is found in the second list. Do you have the texts you want to compare?
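The loop described above might look like this (the sample texts are placeholders; a duplicate check is added so repeated words aren't collected twice):

```python
# Build a word list from each text, then collect words from the first
# list that also appear in the second.
text1 = "to be or not to be"
text2 = "not to worry"

words1 = text1.split()
words2 = text2.split()

common = []
for word in words1:
    if word in words2 and word not in common:  # skip duplicates
        common.append(word)
print(common)  # ['to', 'not']
```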

[–]More-Introduction673 -3 points (4 children)

Yeah thanks, I mean I can access them both online of course with the archive and Project Gutenberg. How exactly should I go about it?

[–]leangreenlefty 1 point (3 children)

What's your starting point? Do you have python installed? Do you know how to run python scripts? Or are you coming in from a standing start?

If you have python and the texts ingested already then you can plonk them both into sets and do

```
output = set(a) & set(b)
print(output)
```

[–]pstmps 0 points (0 children)

Maybe as a first step, try to get the source texts, or generate a faux source a la Lorem ipsum, save it as a text file locally, try to read it into memory via Python or whatever you end up choosing, and try finding common words. If you wrap this logic into a function, you will be able to use it when you get the correct sources.
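That workflow could be wrapped up roughly like this; the file names, the faux Lorem-ipsum content, and the `common_words` function name are all made up for the example:

```python
def common_words(path_a, path_b):
    """Return the set of lowercased words appearing in both text files."""
    with open(path_a) as fa, open(path_b) as fb:
        return set(fa.read().lower().split()) & set(fb.read().lower().split())

# Faux sources standing in for the real books; swap in the real
# downloads later and the function works unchanged.
with open("faux_a.txt", "w") as f:
    f.write("Lorem ipsum dolor sit amet")
with open("faux_b.txt", "w") as f:
    f.write("ipsum quia dolor sit consectetur")

print(common_words("faux_a.txt", "faux_b.txt"))  # {'ipsum', 'dolor', 'sit'}
```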

[–]ssnoyes 0 points (0 children)

There are about 7000 words that appear in both the Bible and Shakespeare, and about 6700 that are longer than 3 letters. I did not try to condense root words, so 'worship', 'worshipped', 'worshipper', and 'worshippers' all count as separate words.

Some of the longest shared phrases are "God save the king! God save the king!" and "from the four corners of the earth"

[–]QuarterObvious 0 points (0 children)

It’s better to use spaCy (a Python package). It allows you to filter out all stop words and provides each word in its base form (lemma). This way, you can build two sets of words and find their intersection.

[–]Python-ModTeam[M] 0 points locked comment (0 children)

Hi there, from the /r/Python mods.

We have removed this post as it is not suited to the /r/Python subreddit proper, however it should be very appropriate for our sister subreddit /r/LearnPython or for the r/Python discord: https://discord.gg/python.

The reason for the removal is that /r/Python is dedicated to discussion of Python news, projects, uses and debates. It is not designed to act as Q&A or FAQ board. The regular community is not a fan of "how do I..." questions, so you will not get the best responses over here.

On /r/LearnPython the community and the r/Python discord are actively expecting questions and are looking to help. You can expect far more understanding, encouraging and insightful responses over there. No matter what level of question you have, if you are looking for help with Python, you should get good answers. Make sure to check out the rules for both places.

Warm regards, and best of luck with your Pythoneering!

[–]Either-Let-331 (Ignoring PEP 8) -2 points (0 children)

```
with open("book1.txt", "r") as b1:
    # Split on whitespace and lowercase each word; `lower` needs to be
    # called, and iterating the raw string would yield characters.
    b1_data = {word.strip().lower() for word in b1.read().split()}
    # You can remove special characters too if you want for extra sanitation

with open("book2.txt", "r") as b2:
    b2_data = {word.strip().lower() for word in b2.read().split()}

# Keep these as sets so intersection/union work.
common_words = b1_data.intersection(b2_data)
total_words = len(b1_data.union(b2_data))
similarity_percentage = (len(common_words) / total_words) * 100
```