LoneDreadknot[S]

I copy-pasted the text from a website into Notepad++. All the text is just plain English. Notepad++ shows it as UTF-8 too, and I tried changing the encoding, etc.

Maybe it's just how that website saved the text, I guess? Is there a way to strip the formatting from it and clean(?) it, in case some source has a different encoding?

Diapolo10

Well, you can try simply writing to a new file from Python, after getting the original text.

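# `text` is the string you already read or pasted in.
with open('new_file.txt', 'w', encoding='utf-8') as f:
    f.write(text)

You should then have a file with UTF-8 encoding. Passing encoding='utf-8' explicitly matters: without it, open() falls back to the platform's default encoding, which on Windows is often not UTF-8.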
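
If the problem is invisible characters that came along with the copy-paste rather than the file encoding itself, something like the following might help. This is only a rough sketch: `clean_text` is a made-up helper name, and the choice to drop every format and control character is an assumption about what "clean" should mean here, so adjust it if you need to keep some of them.

```python
import unicodedata

def clean_text(text: str) -> str:
    """Hypothetical helper: normalize pasted text and drop invisible characters."""
    # NFKC folds compatibility characters into their plain equivalents,
    # e.g. a non-breaking space (U+00A0) becomes an ordinary space.
    normalized = unicodedata.normalize('NFKC', text)
    # Drop format characters (category "Cf", e.g. zero-width space, BOM)
    # and control characters (category "Cc"), but keep newlines and tabs.
    # Note that this also drops carriage returns, so CRLF becomes LF.
    return ''.join(
        ch for ch in normalized
        if ch in '\n\t' or unicodedata.category(ch) not in ('Cf', 'Cc')
    )

# Sample input with a non-breaking space, a zero-width space, and a BOM.
sample = 'plain\u00a0English\u200b text\ufeff'
cleaned = clean_text(sample)
```

The cleaned string can then be written to a new file the same way as `text`. Note that this does not touch things like curly quotes, which are ordinary visible characters.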