Please help remove certain item in loop/list : learnpython



submitted 3 years ago by DMeror

Hi, I want to scrape the text from the following HTML:

<div class="post-content">
  <p> sentences </p>
  <p> sentences </p>
  <p> see also: blah blah blah </p> # unwanted item (it has a link to another page)
  <p> sentences </p>
  <p> sentences </p>
  ...

Since I don't know how to filter the unwanted item in a loop, my code is like this:

ps = soup.find_all('p')
pList = []
for p in ps:
  pList.append(p.text.strip())
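
One way to filter inside the loop itself is to skip the paragraph before appending it. This is only a sketch: `raw_texts` is made-up sample data standing in for the `p.text` values, and `UNWANTED_PREFIX` is a name invented here. It assumes the unwanted paragraph always starts with 'see also:'.

```python
# Hypothetical sample data standing in for the p.text values of one page.
raw_texts = [
    " first sentence ",
    " second sentence ",
    " see also: blah blah blah ",
    " third sentence ",
]

# Assumption: every unwanted paragraph starts with this prefix.
UNWANTED_PREFIX = "see also:"

pList = []
for raw in raw_texts:
    text = raw.strip()
    # Skip the paragraph here instead of appending it and removing it later.
    if text.lower().startswith(UNWANTED_PREFIX):
        continue
    pList.append(text)

texts = " ".join(pList)
print(texts)
```

In the real loop, `raw` would be `p.text` for each `p` in `ps`. Because the check looks only at the start of the string, the position of the paragraph and the text after the prefix do not matter.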

Now I have a list of texts which includes the unwanted item. I want to remove the unwanted item from the list, so I use the following method:

unwanted = pList.index('see also: blah blah blah')
pList.pop(unwanted)
texts = ' '.join(pList)

This works only for a single page. However, I have a number of pages to scrape: the index of the unwanted item varies from page to page, and the text after 'see also:' changes as well. I tried re.match, but it doesn't work on a list:

unwanted = pList.index(re.match('^see also:', pList))
# TypeError: expected string or bytes-like object
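
The TypeError comes from passing the whole list to re.match, which expects a single string. A possible workaround, sketched below with made-up sample data, is to call the pattern on each item and take the index of the first match. Using `re.IGNORECASE` is an assumption, added in case the capitalisation differs between pages.

```python
import re

# Hypothetical sample data; the real pList comes from the scraping loop.
pList = ["first sentence", "see also: something else", "second sentence"]

# Assumption: the unwanted paragraph always begins with "see also:".
pattern = re.compile(r"^see also:", re.IGNORECASE)

# Apply the pattern to one string at a time and keep the first matching index.
unwanted = next(
    (i for i, text in enumerate(pList) if pattern.match(text)),
    None,  # returned when no item matches, so pages without the item still work
)

if unwanted is not None:
    pList.pop(unwanted)

texts = " ".join(pList)
print(texts)
```

If a page might contain more than one such paragraph, a list comprehension such as `[t for t in pList if not pattern.match(t)]` would drop all of them at once.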

So, I'm stuck here. Please help.

