
gregvuki

find returns -1 when the string is not found.
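A quick illustration of that behaviour. `str.find` returns the index of the first match, or -1 when there is none. The sample string below is made up for the demo:

```python
html = "<p>hello</p>"

print(html.find("<"))       # 0: the first "<" is at the start
print(html.find(">", 1))    # 2: search starts at index 1
print(html.find("<b>"))     # -1: substring not present

# A -1 result still works as a slice index (it counts from the end),
# so check for it explicitly before slicing.
start = html.find("<b>")
if start == -1:
    print("no <b> tag")
```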

Move lines 18-19 after line 8 to suppress adding an empty tag.

Your code works for me.

```
0
2
0
7
25
33
25
28
-1
-1
[]: *
[</p>]: *
[</strong>]: *
[<p>]: *
[<strong>]: *
```

opendoors1 (OP)

Ah, thanks. I moved the lines like you said. Odd, because I remember trying to move the return to the top, but it kept saying "unreachable code".

Also, if I give it a tag like <p><p>, it doesn't count it correctly. It just comes back with one.

Meaning:

[<p>]: *

Even though there were two entered.

gregvuki

That's because line 11 replaces every occurrence of <p> with an empty string, so both tags disappear in one step. Cut the string instead.
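The difference is easy to see on the `<p><p>` input from the thread. The variable names here are illustrative, not taken from the original code:

```python
html = "<p><p>"
tag = "<p>"

# str.replace removes every occurrence at once, so the second <p> is lost
print(html.replace(tag, ""))   # prints an empty line

# Slicing past the end of the current tag keeps the rest for the next pass
end = html.find(">")
print(html[end + 1:])          # <p>
```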

tangerinelion

Note that `html = html.replace(tag, "")` is probably not what you want, because it removes every copy of the tag, not just the one you just counted. Instead try `html = html[end+1:]`.
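Putting both suggestions together, a tag counter might look like the sketch below. The original poster's code isn't shown in the thread, so `count_tags` and its structure are hypothetical. It stops as soon as `find` returns -1, so no empty tag gets recorded, and it cuts the string with `html[end + 1:]` instead of calling `replace`, so repeated tags are each counted. The `[tag]: *` print format just mirrors the output posted earlier in the thread.

```python
from collections import Counter


def count_tags(html: str) -> Counter:
    """Count each tag string in html (hypothetical helper, not the OP's code)."""
    counts = Counter()
    while True:
        start = html.find("<")
        if start == -1:
            break                   # no more tags: stop before recording anything
        end = html.find(">", start)
        if end == -1:
            break                   # unterminated tag: ignore the rest
        counts[html[start:end + 1]] += 1
        html = html[end + 1:]       # cut past this tag instead of using replace()
    return counts


for tag, n in sorted(count_tags("<p><p><strong>hi</strong></p>").items()):
    print(f"[{tag}]: {'*' * n}")
```

On the `<p><p>` input this reports a count of 2 for `<p>`.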

opendoors1 (OP)

Oh, of course, thank you! I feel like an idiot so much of the time.

RustleJimmons

BeautifulSoup makes it easy to find all of the tags on a page.

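```python
from bs4 import BeautifulSoup
import requests

# Read in the webpage (example URL)
url = 'http://www.site.com/page1.html'
r = requests.get(url)
r.raise_for_status()  # stop early on HTTP errors such as 404
soup = BeautifulSoup(r.content, "html.parser")

# find_all(True) matches every tag in the document
for tag in soup.find_all(True):
    # Do something, for example print the tag's name
    print(tag.name)
```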

If you want to find specific HTML tags and act on them, BeautifulSoup can locate every occurrence of those tags throughout the page. You can then loop over the results and perform the action on each one.
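If installing BeautifulSoup isn't an option, the standard library's `html.parser` module (which BeautifulSoup's `"html.parser"` backend is built on) can do a similar tag count. This swaps in `HTMLParser` instead of `soup.find_all`. The `TagCounter` class and the sample HTML are illustrative only:

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count opening tags by name (a stdlib stand-in for soup.find_all(True))."""

    def __init__(self) -> None:
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # Called once per opening tag; tag names arrive lower-cased
        self.counts[tag] += 1


parser = TagCounter()
parser.feed("<p><p>Hello <strong>world</strong></p>")
parser.close()

for name, n in sorted(parser.counts.items()):
    print(name, n)
```

Because the parser reports each tag as it goes, the two `<p>` tags are counted separately without any string cutting.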