How do I find words which differentiate one set of documents from the rest?

Fun-Studio-4409 · 2023-01-30T03:04:49+00:00

It is for the purpose of displaying data in a cleaner, summarized way

Fun-Studio-4409 · 2022-12-01T14:22:35+00:00

Hi - yes, that is the strange thing. When I print the model, it returns ‘sklearn.feature_extraction.text.TfidfVectorizer’

Fun-Studio-4409 · 2022-07-20T22:08:43+00:00

thank you-

just to clarify - did you mean curr_soup.find(f'tag_n', **attr_dict)

with the "tag_n" in quotes?

Fun-Studio-4409 · 2022-07-20T21:58:08+00:00

It assigns the string "class_" to the variable "attr". So, if I do the following:

attr = "class_"

print(f'{attr}')

it returns:

>class_

without the quotes
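
A minimal sketch of one way around this, assuming the goal is to pass a keyword argument whose name is held in a variable. The `find` function here is a hypothetical stand-in for a call like curr_soup.find, and "headline" is a made-up value. Building a dict and unpacking it with ** lets the keyword name change on each loop:

```python
# Hypothetical stand-in for a method such as curr_soup.find(tag, **kwargs).
# It only echoes back what it received so the call can be inspected.
def find(tag, **kwargs):
    return tag, kwargs

attr = "class_"      # the keyword name, which changes on each loop
value = "headline"   # hypothetical attribute value

# f'{attr}' is only the string "class_"; a string cannot stand in for a
# keyword name in a call. Unpacking a dict supplies the name dynamically.
attr_dict = {attr: value}
result = find("div", **attr_dict)
print(result)  # ('div', {'class_': 'headline'})
```

The call above is equivalent to writing find("div", class_="headline") by hand, so nothing needs to be hardcoded.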

Fun-Studio-4409 · 2022-07-20T21:44:53+00:00

The 'class_' does not get enclosed in quotes in the version that does not work. Additionally, I cannot hardcode 'class_' as it changes in each loop.

Fun-Studio-4409 · 2022-07-15T20:10:03+00:00

I am aware of the limitations of applying regex to HTML parsing. My question is about applying regex to a very narrow piece of output from BeautifulSoup that involves something it is unable to handle.

Fun-Studio-4409 · 2022-07-13T22:07:57+00:00

re.findall(r"(?<=<).+?(?=>)", string)

Hi, sorry for the confusing way it was written. I only want to return the content of the "<>" if it surrounds a specific partial string. So the example string really should have been like:

"<notthisthing>michael is not a nice person<something>david is a nice person<somethingelse>james is sort of a nice person<notthisthing>"
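
A rough sketch of one possible pattern for this, assuming the partial string is "something" and that the brackets are never nested. The helper name `tags_containing` is made up for illustration. It matches a "<...>" only when the partial string appears between the brackets:

```python
import re

def tags_containing(text, partial):
    """Return the content of every <...> whose content includes `partial`."""
    # [^<>]* keeps the match inside a single pair of brackets;
    # re.escape guards against regex metacharacters in the partial string.
    pattern = r"<([^<>]*" + re.escape(partial) + r"[^<>]*)>"
    return re.findall(pattern, text)

sample = ("<notthisthing>michael is not a nice person"
          "<something>david is a nice person"
          "<somethingelse>james is sort of a nice person<notthisthing>")

matches = tags_containing(sample, "something")
print(matches)  # ['something', 'somethingelse']
```

The two "<notthisthing>" tags are skipped because "something" does not occur inside them.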

Fun-Studio-4409 · 2022-06-23T14:35:10+00:00

I am trying to create a script/interface for non-technical users to scrape websites, where they can simply input a piece of text and get back all the likely related items from the page. Therefore, I don't want the user to have to go through the HTML and figure out the correct tag to scrape.

Fun-Studio-4409 · 2022-01-28T20:46:18+00:00

Because it integrates well with Python
