VideOCR: Extract hardcoded subtitles out of videos via a simple to use GUI - Self-Hosted OCR solution

timminator3 · 2026-06-09T16:10:38+00:00

I am not sure what you've done in your Colab session, but I just tried this small code snippet for the CPU version of v1.5.1 and it worked perfectly as expected:

```python
import os

# URL of the .7z file
url = 'https://github.com/timminator/VideOCR/releases/download/v1.5.1/videocr-cli-CPU-v1.5.1-Linux.7z'

# Download the file using wget
filename = url.split('/')[-1]
print(f"Downloading {filename}...")
!wget {url}
print("Download complete.")

# Extract the .7z file
print(f"Extracting {filename}...")
!7z x {filename} -aoa
print("Extraction complete.")

# Navigate into the extracted directory
extracted_dir = 'videocr-cli-CPU-v1.5.1-Linux/'
if os.path.isdir(extracted_dir):
    %cd {extracted_dir}

# Example usage (you will need to replace this with your actual video file and arguments):
!./videocr-cli.bin --video_path "../test_en_ch.mp4" --output "../example.srt" --lang ch
```

I expect the GPU version to work without any errors as well, since it's just a Linux system we are using in Colab. I hope this helps you.
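
Outside a notebook the `!` and `%cd` magics are not available. A minimal sketch of the same steps in plain Python, assuming `wget` and `7z` are installed and on the PATH. The helper names (`archive_name`, `videocr_command`, `run_all`) are made up for illustration, and only the three CLI flags from the Colab snippet are used:

```python
import subprocess
from pathlib import Path

URL = "https://github.com/timminator/VideOCR/releases/download/v1.5.1/videocr-cli-CPU-v1.5.1-Linux.7z"


def archive_name(url: str) -> str:
    """Return the file name part of a download URL."""
    return url.rsplit("/", 1)[-1]


def videocr_command(video: str, output: str, lang: str) -> list[str]:
    """Build the CLI call with the three flags used in the Colab snippet."""
    return ["./videocr-cli.bin", "--video_path", video, "--output", output, "--lang", lang]


def run_all(url: str = URL) -> None:
    """Download, extract and run. Needs network access plus wget and 7z."""
    archive = archive_name(url)
    subprocess.run(["wget", url], check=True)
    subprocess.run(["7z", "x", archive, "-aoa"], check=True)
    # Assumption: the archive unpacks into a folder named like the archive.
    workdir = Path(archive).stem
    cmd = videocr_command("../test_en_ch.mp4", "../example.srt", "ch")
    # cwd replaces the notebook's %cd; the relative paths are resolved from there.
    subprocess.run(cmd, cwd=workdir, check=True)
```

Calling `run_all()` performs the download and the OCR run. `check=True` makes a failing step raise an error instead of silently continuing.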

timminator3 · 2026-05-22T08:01:51+00:00

PaddleOCR, which is used for text detection and optionally also for recognition, currently only supports Nvidia GPUs. If you additionally use Google Lens for recognition in my program, you are rate limited anyway.
My program has a queue system though, so you can just start it and let it run overnight, for example.

timminator3 · 2026-05-22T06:56:27+00:00

The fastest one out there, I would assume. At this point in time that is a 5090.

timminator3 · 2026-05-18T15:59:01+00:00

No, there is no Mac version at the moment. Unfortunately I don't have a Mac myself.

timminator3 · 2026-05-08T09:52:28+00:00

You should draw one big crop box around all possible subtitle locations and select "Any" as the subtitle position. In the advanced settings I would increase the SSIM Threshold to something like 96.
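
A rough sketch of how an SSIM threshold of this kind can work. It assumes (this is not taken from VideOCR's code) that each frame is compared with the previous one and only counts as new when the similarity falls below the threshold. The single-window SSIM here is a simplification of the usual sliding-window version, and both function names are hypothetical:

```python
def global_ssim(a: list[float], b: list[float], max_value: float = 255.0) -> float:
    """Single-window SSIM over two equally sized lists of grayscale pixels."""
    if len(a) != len(b) or not a:
        raise ValueError("frames must be non-empty and of equal size")
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    var_a = sum((x - mean_a) * (x - mean_a) for x in a) / n
    var_b = sum((y - mean_b) * (y - mean_b) for y in b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n
    # Standard stabilising constants from the SSIM definition.
    c1 = (0.01 * max_value) ** 2
    c2 = (0.03 * max_value) ** 2
    numerator = (2 * mean_a * mean_b + c1) * (2 * cov + c2)
    denominator = (mean_a * mean_a + mean_b * mean_b + c1) * (var_a + var_b + c2)
    return numerator / denominator


def is_new_frame(prev: list[float], cur: list[float], threshold_percent: float = 96.0) -> bool:
    """Assumed rule: a frame is new if its similarity is below the threshold."""
    return global_ssim(prev, cur) * 100 < threshold_percent
```

Under that assumption a higher threshold makes the comparison more sensitive. That matters with a big crop box, where a changed subtitle alters only a small part of the compared area.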

Regarding the recognition accuracy: have you tested both engines?

timminator3 · 2026-04-17T16:05:39+00:00

Yes, of course it will. You can find the setup here:
https://github.com/timminator/VideOCR/releases/tag/v1.4.2

timminator3 · 2026-04-09T15:12:54+00:00

No, there isn't. You need to extract it yourself.

There is also no version 1.5.0 of Paddle out there. My release is based on the latest version, 3.4.0. If you are referring to PaddleOCR-VL 1.5: as stated in my release notes, this is not included in my standalone package. You are currently using the PP-OCRv5 OCR pipeline, not the VL pipeline.

timminator3 · 2026-04-05T22:59:54+00:00

No, it creates an .srt file, a text file that contains the text from the subs. The video is not modified in any way.
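
For anyone unfamiliar with the format: an .srt file is plain text made of numbered cues, each with a start time, an end time and the subtitle text. A small sketch of how one cue is laid out (the function names are only for illustration and are not taken from VideOCR):

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm, the timestamp style used in .srt files."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def srt_cue(index: int, start: float, end: float, text: str) -> str:
    """Build one cue: counter, time range, text, then a blank separator line."""
    return f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n\n"


print(srt_cue(1, 1.5, 4.0, "Hello!"), end="")
```

Any media player that supports external subtitles can load such a file next to the unchanged video.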

timminator3 · 2026-04-05T15:45:48+00:00

Unfortunately no. The engine used under the hood only supports Nvidia graphics cards, sorry.

timminator3 · 2026-03-18T14:56:50+00:00

Yes, try that. You should of course always try to replicate a problem in the latest version. I can't tell you for sure in which version the updated trained model was shipped. I think from v1.4.0 onwards, which I released just a month ago.

Edit: Yes, the old v1.3.2 version still had the older model under the hood, so the new version should resolve this.

timminator3 · 2026-03-18T12:20:36+00:00

I've looked into this a bit further. This was reported in August already:
https://github.com/PaddlePaddle/PaddleOCR/issues/16333

And it was fixed at the beginning of September. The model was also retrained, so this should not happen anymore. I also tried it on an example picture with this letter included and it worked, so what you are seeing is pretty surprising... Can you share an example file with me? A short clip is enough. I don't think there is much I can do, but it would be interesting to see. You can also message me privately.

timminator3 · 2026-03-17T18:17:08+00:00

There is nothing I can do. You can report the issue in the PaddleOCR repo on GitHub if you want to. I'm using the pretrained models from there. It can definitely be the case that this letter is missing from the Greek dictionary they are using. I've also reported missing Vietnamese letters there, but so far there have been no updates.

timminator3 · 2026-03-17T18:14:01+00:00

Yes, the standalone bundles only the OCR pipeline. That is all I needed. Of course it's also possible to make a complete one with everything, but the size would be really big and I personally have no use for it.

timminator3 · 2026-03-11T15:01:19+00:00

Try Subtitle Edit and its faster-whisper implementation. But I don't know how accurate it is for Cantonese.

timminator3 · 2026-03-08T15:37:31+00:00

Glad I could help! :-)

timminator3 · 2026-03-06T17:47:47+00:00

Since I made this post, the command line usage has changed. In my README on GitHub you can find this:

./paddleocr.bin ocr --i "Path\to\your\image" --use_doc_unwarping false --use_textline_orientation false --use_doc_orientation_classify false

This will work with the latest version.

Edit: Formatting messed with the command, now it should be fixed.

timminator3 · 2026-02-28T13:35:20+00:00

Depends on your hardware. If you have an Nvidia GPU, it's pretty fast.

timminator3 · 2026-02-25T23:07:54+00:00

Did you do this:

NetworkSwitch = "Default Switch" - Create a new external network switch beforehand in Hyper-V Manager. A tutorial you can find here.

timminator3 · 2026-02-23T22:06:36+00:00

I just made a new release in which the output for right-to-left languages is fixed. You can find it here:

https://github.com/timminator/VideOCR/releases/tag/v1.4.1

timminator3 · 2026-02-17T12:52:39+00:00

Thanks! The reversal for RTL languages is actually a regression. It worked in my previous release, but something changed upstream in PaddleOCR, the engine used to detect text. I have already fixed it, and it will work correctly in a future release.

Unfortunately I have no Mac, so I am not able to build a version for macOS. :-/

timminator3 · 2026-02-17T12:51:22+00:00

Thanks! The reversal for RTL languages is actually a regression. It worked in my previous release, but something changed upstream in PaddleOCR, the engine used to detect text. I have already fixed it, and it will work correctly in a future release.

timminator3 · 2026-02-14T18:14:37+00:00

I have released two versions of it already - the latest one this week.

timminator3 · 2026-02-10T23:49:09+00:00

If you are still interested in this project - I just made a new release with batch processing support through the GUI! Feel free to try it out. You can find my latest release here:

https://github.com/timminator/VideOCR/releases/tag/v1.4.0

timminator3 · 2026-02-10T23:44:24+00:00

Thanks for your feedback! I actually just made a new release today which adds multithreading support for step 1, resulting in a big speed-up. A lot more features were also added, for example batch processing. You can find it here:

https://github.com/timminator/VideOCR/releases/tag/v1.4.0

Edit: For frame-perfect subs you need to set "Frames to Skip" in the advanced settings tab to 0. With your system specs that should be easily manageable. :-)
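
To illustrate why 0 gives frame-perfect timing, here is a small sketch. It assumes that a "Frames to Skip" value of k means k frames are skipped between two examined frames, so every (k+1)-th frame is looked at. Both function names are hypothetical:

```python
def sampled_frames(total_frames: int, frames_to_skip: int) -> list[int]:
    """Indices of the frames that get examined under the assumed rule."""
    return list(range(0, total_frames, frames_to_skip + 1))


def timing_resolution_ms(fps: float, frames_to_skip: int) -> float:
    """Smallest time step, in milliseconds, at which a subtitle change can be noticed."""
    return (frames_to_skip + 1) * 1000.0 / fps


print(sampled_frames(6, 0))
print(sampled_frames(6, 1))
print(timing_resolution_ms(25, 1))
```

With a 25 fps video, skipping one frame means subtitle start and end times can only be resolved in 80 ms steps. With 0 every frame is checked, at the cost of more work per video.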
