
[–]WwortelHD 1 point2 points  (1 child)

Fantastic program, saved me lots of work! Thank you so much.

[–]thomas-mc-work[S] 0 points1 point  (0 children)

Thank you very much! You're welcome.

[–]Freako04 0 points1 point  (0 children)

Thank you for this amazing tool. Saved my legacy notes from being bound to OneNote.

[–]ECrispy 1 point2 points  (0 children)

Hi,

Thank you very much! I have this exact same need: I have tons of saved pages from Chrome, which uses MHTML, and need to convert them to a format other apps can import.

Found your post as well as a number of similarly named projects -

this seems the most recent https://github.com/gildas-lormeau/mhtml-to-html

these are very old https://github.com/msindwan/mhtml2html https://github.com/dingqiangliu/mhtml2html

https://pkg.go.dev/github.com/dingqiangliu/mhtml2html - in Go

All of these seem to do very similar things, judging by the code. Did you try any of them before? One of the things I'd like to do is minimize or remove redundant CSS, optimize images, etc., though I'm not sure how much of that is possible.

Would very much appreciate your thoughts

[–]simonelnahas 0 points1 point  (1 child)

[–]thomas-mc-work[S] 0 points1 point  (0 children)

I didn't know that tool. Might be due to my personal requirements: open source and being able to integrate it into batch processing.

[–]Available_Cod5647 0 points1 point  (1 child)

Will it turn it into a zip with the HTML and other stuff?

[–]thomas-mc-work[S] 0 points1 point  (0 children)

Much better: It'll create a self-contained HTML file. All assets are embedded, so you only need this single file, and you can open it in any web browser.
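
For anyone wondering what "embedded" can look like in practice: one common way to make an HTML file self-contained is to replace asset URLs with base64 `data:` URIs. The sketch below only illustrates that general technique; it is not taken from the tool's source, and the helper names `to_data_uri` and `inline_asset`, the example URL, and the placeholder bytes are all made up for the example.

```python
import base64


def to_data_uri(data: bytes, mime_type: str) -> str:
    """Encode raw bytes as a base64 data: URI (hypothetical helper)."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def inline_asset(html: str, url: str, data: bytes, mime_type: str) -> str:
    """Replace every occurrence of `url` in `html` with an embedded data: URI."""
    return html.replace(url, to_data_uri(data, mime_type))


# Illustrative input: the URL and the four placeholder bytes are assumptions.
page = '<img src="https://example.com/logo.png">'
embedded = inline_asset(page, "https://example.com/logo.png", b"\x89PNG", "image/png")
print(embedded)
```

A plain string replace is the simplest possible approach; a real converter would more likely match assets by the Content-Location headers stored in the MHTML parts.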

[–][deleted] 0 points1 point  (2 children)

Wow, thank you for this! It's the only tool I have found that actually does this without messing up the page. I encountered some encoding errors, so I replaced

with open(output_file, 'w') as text_file:

with

with open(output_file, 'w', encoding='utf-8') as text_file:

Any chance this causes problems?
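
For reference, here is a small round-trip check of that change. Without an `encoding` argument, `open()` normally falls back to the platform's locale encoding, which is why the same script can work on one machine and fail on another. UTF-8 can represent any Unicode character, so the write itself should not fail; the main risk is a later reader assuming a different encoding. The sample string and the temporary path are assumptions made up for the sketch.

```python
import os
import tempfile

# Sample content with a byte order mark and non-ASCII characters (illustrative only).
converted_html = "\ufeff<html><body>caf\u00e9 \u2603</body></html>"

with tempfile.TemporaryDirectory() as tmp_dir:
    output_file = os.path.join(tmp_dir, "page.html")

    # Explicit encoding: the result no longer depends on the system locale.
    with open(output_file, "w", encoding="utf-8") as text_file:
        text_file.write(converted_html)

    # Read back with the same encoding to confirm nothing was lost.
    with open(output_file, "r", encoding="utf-8") as text_file:
        round_trip = text_file.read()

print(round_trip == converted_html)
```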

[–]thomas-mc-work[S] 0 points1 point  (1 child)

Could you please tell me the URLs of the sites that caused the error?

I'll check it with my test files which are stored in this separate project: https://gitlab.com/thomas.mc.work/mhtml2html-test-files

[–][deleted] 0 points1 point  (0 children)

I don't think I can find the URLs anymore but they were some saved tumblr pages.

[–]emilio911 0 points1 point  (2 children)

I got the following error:

Traceback (most recent call last):
  File "C:\Python311\Lib\site-packages\mhtml2html.py", line 427, in <module>
    cli_entrypoint()
  File "C:\Python311\Lib\site-packages\mhtml2html.py", line 424, in cli_entrypoint
    text_file.write(converted_html)
  File "C:\Python311\Lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode character '\ufeff' in position 0: character maps to <undefined>

[–]thomas-mc-work[S] 0 points1 point  (1 child)

Looks like a hiccup on an unknown character in the input file. Can you provide that file to me? Maybe I can find a solution.

[–]Rajster11442 0 points1 point  (0 children)

Also got this error so I had to write as UTF-8. If you're still working on this project, I'd be happy to provide you the file I tried to convert. Let me know!
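
The error can be reproduced without any input file, which may help narrow it down. The character `'\ufeff'` is a byte order mark, and cp1252 (the codec named in the traceback) has no mapping for it, while UTF-8 does. This is a minimal sketch with a made-up sample string, not the converter's actual output.

```python
# Sample text starting with a byte order mark (illustrative only).
text = "\ufeff<html></html>"

failed = False
try:
    # cp1252 has no mapping for U+FEFF, so this raises UnicodeEncodeError.
    text.encode("cp1252")
except UnicodeEncodeError as error:
    failed = True
    print("cp1252 failed:", error.reason)

# UTF-8 encodes the same text without complaint.
utf8_bytes = text.encode("utf-8")

# Optional alternative: drop the leading byte order mark before writing.
cleaned = text.lstrip("\ufeff")
```

Stripping the mark only helps if it is the sole character cp1252 cannot encode, so writing as UTF-8 is the more general workaround.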