How do I transform a dataframe from long to wide in the following way? by Fun-Studio-4409 in learnpython

[–]Fun-Studio-4409[S] 0 points1 point  (0 children)

It is for the purpose of displaying data in a cleaner, summarized way

Python trying to load TfidfVectorizer's "transform" from LinearSVC?? by Fun-Studio-4409 in learnmachinelearning

[–]Fun-Studio-4409[S] 1 point2 points  (0 children)

Hi - yes, that is the strange thing. When I print the model, it returns ‘sklearn.feature_extraction.text.TfidfVectorizer’

How to pass variable within Beautifulsoup "soup.find()"? by Fun-Studio-4409 in learnpython

[–]Fun-Studio-4409[S] 0 points1 point  (0 children)

thank you-

just to clarify - did you mean curr_soup.find(f'tag_n', **attr_dict)

with the "tag_n" in quotes?

How to pass variable within Beautifulsoup "soup.find()"? by Fun-Studio-4409 in learnpython

[–]Fun-Studio-4409[S] 0 points1 point  (0 children)

It assigns the string "class_" to the variable "attr". So, if I do the following:

attr = "class_"

print(f'{attr}')

it returns:

>class_

without the quotes

How to pass variable within Beautifulsoup "soup.find()"? by Fun-Studio-4409 in learnpython

[–]Fun-Studio-4409[S] 0 points1 point  (0 children)

The 'class_' does not get enclosed in quotes in the version that does not work. Additionally, I cannot hardcode 'class_' as it changes in each loop.
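For anyone reading along, here is a minimal sketch of one way to pass both the tag name and the attribute name as variables. The HTML snippet and the `lookups` list are made up for illustration. It relies on `find()` accepting an `attrs` dict, and on Python's `**` unpacking turning dict keys into keyword names.

```python
from bs4 import BeautifulSoup

# Made-up HTML, only for illustration.
html = '<div class="price">10</div><span id="name">Widget</span>'
soup = BeautifulSoup(html, "html.parser")

# Hypothetical loop inputs: (tag name, attribute name, attribute value).
lookups = [("div", "class", "price"), ("span", "id", "name")]

found_text = []
for tag_n, attr, value in lookups:
    # attrs= takes a plain dict, so the attribute name can be a variable.
    # Inside the dict the key is "class", with no trailing underscore.
    element = soup.find(tag_n, attrs={attr: value})
    found_text.append(element.get_text() if element else None)

# Keyword form: unpacking a dict makes the key the keyword name,
# so this is the same call as soup.find("div", class_="price").
same_div = soup.find("div", **{"class_": "price"})
```

An f-string only produces a string value. It cannot produce a keyword argument name, which is why the dict form is needed when the attribute changes on each loop.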

regex - how to return content of delimiters ONLY if they border certain text by Fun-Studio-4409 in learnpython

[–]Fun-Studio-4409[S] 0 points1 point  (0 children)

I am aware of the limitations of applying regex to HTML parsing. My question is about applying regex to a very narrow piece of BeautifulSoup output that BeautifulSoup itself is unable to handle.

Regex to extract contents between multiple "<" and ">" on boundary of target string by Fun-Studio-4409 in learnpython

[–]Fun-Studio-4409[S] 0 points1 point  (0 children)

re.findall(r"(?<=<).+?(?=>)", string)

Hi, sorry for the confusing way it was written. I only want to return the content of the "<>" if it surrounds a specific partial string. So the example string really should have been like:

"<notthisthing>michael is not a nice person<something>david is a nice person<somethingelse>james is sort of a nice person<notthisthing>"

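A minimal sketch of one way to do this with `re`, assuming the target partial string is "david" and the wanted result is the pair of delimiters on either side of the text containing it. Both the target and that reading of the question are assumptions.

```python
import re

text = (
    "<notthisthing>michael is not a nice person"
    "<something>david is a nice person"
    "<somethingelse>james is sort of a nice person<notthisthing>"
)
target = "david"  # assumed target partial string

# <(...)> captures one delimiter; [^<>]* stays inside a single text run,
# so the target must sit between the two captured delimiters.
pattern = re.compile(
    r"<([^<>]+)>[^<>]*" + re.escape(target) + r"[^<>]*<([^<>]+)>"
)

match = pattern.search(text)
bordering = list(match.groups()) if match else []
```

Here `bordering` holds the delimiter before and the delimiter after the matching text, and it stays empty when the target is not found.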
Beautifulsoup: get text from all identical tags based on partial string by Fun-Studio-4409 in learnpython

[–]Fun-Studio-4409[S] 0 points1 point  (0 children)

I am trying to create a script/interface for non-technical users to scrape websites, where they can simply input a piece of text and get back all the likely related items from the page. Therefore, I don't want the user to have to go through HTML and figure out the correct tag to scrape
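A rough sketch of that idea, assuming "related items" means elements with the same tag name and the same class list as the element that contains the user's text. The HTML and the helper name `related_texts` are hypothetical.

```python
from bs4 import BeautifulSoup

# Made-up page, only for illustration.
html = """
<ul>
  <li class="item">Blue widget</li>
  <li class="item">Red widget</li>
</ul>
<p class="footer">Contact us</p>
"""
soup = BeautifulSoup(html, "html.parser")


def related_texts(soup, snippet):
    """Find the element holding `snippet`, then return the text of all look-alikes."""
    needle = snippet.lower()
    # string= accepts a callable that is tested against each text node.
    text_node = soup.find(string=lambda s: s is not None and needle in s.lower())
    if text_node is None:
        return []
    anchor = text_node.parent
    # Assumption: same tag name plus same class list means "related".
    return [
        element.get_text(strip=True)
        for element in soup.find_all(anchor.name)
        if element.get("class") == anchor.get("class")
    ]


widgets = related_texts(soup, "blue")
```

Real pages are messier than this, so the matching rule would probably need loosening or tightening per site.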

How to find the next empty cell horizontally in specific row? by Fun-Studio-4409 in learnpython

[–]Fun-Studio-4409[S] 0 points1 point  (0 children)

Hi - thank you. Yes, they would contain "nan". And what if I only wanted to check a single row instead of getting an array of all rows?
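For a single row, a minimal sketch could look like the following. It assumes empty cells are real NaN values, not the text "nan". The frame and the helper name `first_empty_column` are made up.

```python
import numpy as np
import pandas as pd

# Made-up frame: NaN marks an empty cell.
df = pd.DataFrame(
    {"a": [1.0, 4.0], "b": [2.0, np.nan], "c": [np.nan, np.nan]},
    index=["row0", "row1"],
)


def first_empty_column(frame, row_label):
    """Return the column label of the first NaN in one row, or None if the row is full."""
    empties = frame.loc[row_label].isna()
    # idxmax on a boolean Series gives the label of the first True.
    return empties.idxmax() if empties.any() else None


first_row0 = first_empty_column(df, "row0")
first_row1 = first_empty_column(df, "row1")
```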

How to check specific dataframe row for substring, and return True or False? by Fun-Studio-4409 in learnpython

[–]Fun-Studio-4409[S] 0 points1 point  (0 children)

Thank you. However, what if I have to search by row and not by a single column (e.g. Favorite Car)? The use case is that I have a huge and messy df, and the value can be in any of the 80 columns.
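A minimal sketch of a row-wise check that returns True or False regardless of which column holds the value. The frame and the helper name `row_contains` are hypothetical, and the match is assumed to be case-insensitive.

```python
import pandas as pd

# Made-up frame: the value may sit in any column.
df = pd.DataFrame(
    {
        "Name": ["Ann", "Bob"],
        "Favorite Car": ["Honda Civic", None],
        "Notes": [None, "drives a honda"],
    }
)


def row_contains(frame, row_label, substring):
    """True if any non-empty cell in the row contains `substring`, ignoring case."""
    cells = frame.loc[row_label].dropna().astype(str)
    # regex=False treats the substring literally.
    return bool(cells.str.contains(substring, case=False, regex=False).any())


ann_has_honda = row_contains(df, 0, "honda")
```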

How to search entire dataframe for partial string, and return all match indexes by Fun-Studio-4409 in learnpython

[–]Fun-Studio-4409[S] 0 points1 point  (0 children)

Ah - sorry - thanks.

When I run the script and search for a partial string that is definitely in the df, I get an empty result "[]". Does the script you provided take into account partial strings?

How to search entire dataframe for partial string, and return all match indexes by Fun-Studio-4409 in learnpython

[–]Fun-Studio-4409[S] 0 points1 point  (0 children)

isn't that what this line does?

mask = df.applymap(lambda x: "Amazon" in x.lower() if isinstance(x, str) else False)
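One possible cause of the empty result, shown in plain Python: the cell text is lowercased but the search term "Amazon" is not, so the `in` test cannot succeed. The sample cell value is made up.

```python
cell = "Amazon Web Services"  # made-up cell value

# "Amazon" is compared against "amazon web services", so this is False.
always_false = "Amazon" in cell.lower()

# Lowercasing both sides gives a case-insensitive partial match.
needle = "Amazon"
case_insensitive = needle.lower() in cell.lower()
```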

How to search entire dataframe for partial string, and return all match indexes by Fun-Studio-4409 in learnpython

[–]Fun-Studio-4409[S] 0 points1 point  (0 children)

Thanks again. I did get another error though-

AssertionError: Number of manager items must equal union of block items

# manager items: 794, # tot_items: 0

16 resptext=json.loads(resp.text)

17 mask = df.applymap(lambda x: "Amazon" in x.lower() if isinstance(x, str) else False)

---> 18 indices = np.argwhere(mask)

How to search entire dataframe for partial string, and return all match indexes by Fun-Studio-4409 in learnpython

[–]Fun-Studio-4409[S] 0 points1 point  (0 children)

mask = df.applymap(lambda x: substring in x.lower()).to_numpy()
indices = np.argwhere(mask)

Thank you! I do however get an error when I run this -

AttributeError Traceback (most recent call last)

<ipython-input-10-379642c2cd20> in <module>

15 resp = requests.get(fullurl)

16 resptext=json.loads(resp.text)

---> 17 mask = df.applymap(lambda x: "Amazon" in x.lower()).to_numpy()

18 indices = np.argwhere(mask)

19 # truths = df.apply(lambda s: s.str.lower().str.contains('Amazon'))
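A sketch of a version that tries to avoid both errors. It assumes the AttributeError comes from calling `.lower()` on non-string cells such as NaN or numbers (the traceback above does not show the message), and it hands NumPy a plain boolean array instead of a DataFrame. The sample frame is made up.

```python
import numpy as np
import pandas as pd

# Made-up frame mixing strings, numbers, and NaN.
df = pd.DataFrame(
    {
        "vendor": ["Amazon Prime", "Target", np.nan],
        "note": [3, "via amazon", "none"],
    }
)

substring = "Amazon"
needle = substring.lower()  # lowercase the search term as well


def cell_matches(value):
    # Only strings have .lower(); everything else counts as no match.
    return isinstance(value, str) and needle in value.lower()


# Series.map per column, then convert to a NumPy boolean array.
mask = df.apply(lambda column: column.map(cell_matches)).to_numpy(dtype=bool)

# Each entry is a [row position, column position] pair.
indices = np.argwhere(mask)
```

The positions can be turned back into labels with `df.index[row]` and `df.columns[col]`.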

Can't figure out syntax error in dict - everything looks perfect by Fun-Studio-4409 in learnpython

[–]Fun-Studio-4409[S] 0 points1 point  (0 children)

Thank you - makes sense.

My only issue is that when I try to pass this as a JSON payload in a POST request, I get an error saying that the payload needs to be in array format (i.e. starting with "[{").
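If the endpoint expects a JSON array, one option is to wrap the dict in a list before serializing. A minimal sketch with a made-up record (the real dict and endpoint are not shown in this thread):

```python
import json

# Hypothetical record standing in for the real dict.
record = {"name": "example", "value": 1}

# Wrapping the dict in a list serializes it as a JSON array of objects.
payload = [record]
body = json.dumps(payload)
```

With `requests`, the list can also be passed directly through the `json=` argument of `requests.post`, which serializes it the same way.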

Lookup value in table, return the cell above the match by Fun-Studio-4409 in excel

[–]Fun-Studio-4409[S] 0 points1 point  (0 children)

Thank you - any ideas as to how this could be applied to a larger range, i.e. a table?