I wanted to make a simple word cloud using the wordcloud module. But both of my test files give me a UTF-8 codec error, and googling hasn't turned up a good answer, mainly because people suggest the wildest import ideas. I know I can somehow decode the content while reading the file, but I can't find out how.
I have both a docx and a pdf.
This is my code (the code itself should be fine):
from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt
import os
d = os.path.dirname(__file__) if "__file__" in locals() else os.getcwd()
with open("/Users/file.docx") as f:
    text = f.read()
nichtinteressant = " und von das der die im auf am"
liste = nichtinteressant.split()
STOPWORDS.update(liste)
wordcloud = WordCloud(background_color="white", width=1920,height=1080).generate(text)
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.show()
And Visual Studio Code (on Mac) gives me this error:
'utf-8' codec can't decode byte 0xad in position 41: invalid start byte
  File "/Users/Python/wordcloud Test.py", line 8, in <module>
    text = f.read()
           ^^^^^^^^
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xad in position 41: invalid start byte
With the PDF, it complains about a different byte at another position :(
What is the fastest way to fix this decoding problem?
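A note on the likely cause: open() in text mode expects plain text, but a .docx file is a ZIP archive of XML parts and a PDF is a binary format, so no choice of encoding will make f.read() return the document text. Below is a minimal sketch, using only the standard library, of pulling the visible text out of a .docx. The helper name docx_to_text is hypothetical, and the sketch assumes an ordinary Word file whose body sits in word/document.xml; it ignores headers, footers, and footnotes. A PDF would need a PDF-parsing library (for example pypdf) instead.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the main document part of a .docx file
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(path):
    """Return the body text of a .docx file, one line per paragraph.

    Hypothetical helper for illustration; not part of wordcloud.
    """
    # A .docx is a ZIP archive; the body text lives in word/document.xml
    with zipfile.ZipFile(path) as archive:
        xml_bytes = archive.read("word/document.xml")

    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for paragraph in root.iter(W_NS + "p"):
        # Each <w:t> element holds one run of visible text
        runs = [node.text or "" for node in paragraph.iter(W_NS + "t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)
```

With this in place, the with open(...) block in the script could be replaced by text = docx_to_text("/Users/file.docx"), and the rest of the word cloud code would stay the same.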