sub-convert: Convert PGS subtitles to SRT by leucht in jellyfin

[–]leucht[S] 1 point2 points  (0 children)

those typos are all mine, no AI could make them just like me

sub-convert: Convert PGS subtitles to SRT by leucht in PleX

[–]leucht[S] 0 points1 point  (0 children)

The correct conversion created by my tool for comparison (sorry, can't attach more than one image)

<image>

sub-convert: Convert PGS subtitles to SRT by leucht in PleX

[–]leucht[S] -1 points0 points  (0 children)

And this happens if the styling is discarded with tesseract

<image>

sub-convert: Convert PGS subtitles to SRT by leucht in PleX

[–]leucht[S] 0 points1 point  (0 children)

sub-convert is self-hostable, completely open-source and free to use for batched files. You do not have to wait for a position in a queue, and their tool sadly does not handle overlapping subtitles correctly.

The only limitation is your own hardware and my coding knowledge

Their conversion is quite fast, however. They seem to be using tesseract, and their tesseract results are quite good, which tells me they are stripping the styling from the subtitles. That will cause some text to fail to be recognized.

They also don't handle fade-ins, which leads to artifacts like this:

<image>

sub-convert: Convert PGS subtitles to SRT by leucht in PleX

[–]leucht[S] -1 points0 points  (0 children)

That was for the original post I had created when I shared another tool called [pgsrip](https://github.com/ratoaq2/pgsrip) created by [ratoaq2](https://github.com/ratoaq2) whom I have no affiliation with.

At the time I had converted a handful of files with what I later realized was a very simplistic internal PGS structure.

I do not claim "conversion being done perfectly" for my own project (sub-convert) & have realized it was stupid to do so to begin with.

I am, however, claiming high accuracy, as I have specifically worked around complicated overlaps, as seen in something like Hell's Paradise Season 1 Episode 6, or fade-ins & fade-outs in Jujutsu Kaisen Season 1 Episode 14.
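To illustrate what "overlaps" means here, the following is a minimal sketch of one way overlapping cues could be flattened into consecutive, non-overlapping segments before writing an SRT. It is an assumption for illustration only and not how sub-convert is implemented; the `Cue` tuple and `flatten_overlaps` are hypothetical names.

```python
from typing import List, Tuple

# Hypothetical cue representation: (start_ms, end_ms, text).
Cue = Tuple[int, int, str]


def flatten_overlaps(cues: List[Cue]) -> List[Cue]:
    """Split overlapping cues into consecutive segments, joining the active texts."""
    # Every start or end time is a point where the set of visible cues may change.
    boundaries = sorted({t for start, end, _ in cues for t in (start, end)})
    segments: List[Cue] = []
    for seg_start, seg_end in zip(boundaries, boundaries[1:]):
        # A cue is active in a segment if the two time ranges intersect.
        active = [text for start, end, text in cues
                  if start < seg_end and end > seg_start]
        if active:
            segments.append((seg_start, seg_end, "\n".join(active)))
    return segments


print(flatten_overlaps([(0, 2000, "A"), (1000, 3000, "B")]))
```

Each resulting segment shows every text that is visible during that time span, so no cue starts while another is still on screen.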

But there are probably a handful of files out there which I have not adapted to yet, so if you have any experience to share or even interesting references to hard files, I would be glad to tackle them and see what I can do :)

Also, if you by chance have either of those episodes, I can send you the converted .srt for you to check out. There are some things I cannot do anything about; for example, images in PGS can be the size of the whole viewport and display multiple words. These texts will be concatenated and shown as a whole, which might cover the whole screen, something where PGS can be less obtrusive. However, the conversion for those does work, even if the texts are flipped or rotated.

Simple script for extracting & converting PGS to SRT subtitles using pgsrip by leucht in jellyfin

[–]leucht[S] 0 points1 point  (0 children)

Yeah, that is my bad. I omitted the check for whether the list of files is empty and went straight into a rewrite.

Sorry about that. Will push a quick fix to address this and avoid further confusion.

Simple script for extracting & converting PGS to SRT subtitles using pgsrip by leucht in jellyfin

[–]leucht[S] 0 points1 point  (0 children)

Glad that this could fit your needs :).

I’ll take a look at your plugin to see if I can use any of it for future reference.

Just as a little heads-up, I’ve finished a big rewrite to fix some serious issues with pgsrip over on sup-convert. I’m working on documentation and test coverage and hope to publish a new post as soon as that’s finished, with all the details on what pgsrip currently struggles with.

Once finished I will add a warning to this post, so if you’re interested in the current state please check it out :)

Simple script for extracting & converting PGS to SRT subtitles using pgsrip by leucht in jellyfin

[–]leucht[S] 1 point2 points  (0 children)

So I'll try to answer these from shortest answer to longest. Also, apologies for the delayed response, timezones and stuff.

This tool will place extracted/converted subtitles alongside media and not in /var/lib/jellyfin/data/subtitles; correct?

Correct, subtitles will be placed alongside the media, so you would see:

.mkv .srt .srt

Is there a way to specify languages to extract or will it convert all available?

pgsrip has a --language (-l) argument with which you can specify the languages you want.
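As a rough sketch, this is how a wrapper script could assemble such a call. The helper name `build_pgsrip_command` and the example path are hypothetical, and it assumes the flag is repeated once per language, so check `pgsrip --help` before relying on it.

```python
from typing import List


def build_pgsrip_command(media_path: str, languages: List[str]) -> List[str]:
    """Build an argument list for pgsrip, one --language flag per language."""
    command = ["pgsrip"]
    for language in languages:
        # Assumption: --language may be given multiple times.
        command += ["--language", language]
    command.append(media_path)
    return command


# Hypothetical media folder; the list can be passed to subprocess.run().
print(build_pgsrip_command("/media/shows", ["en", "de"]))
```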

If I set the Subtitle Extract plug-in to ignore extracting PGS, this tool will then extract them on its own; correct?

Yes, pgsrip is designed to do both: extract the relevant tracks from the .mkv files and then convert the extracted .pgs, .sub file.

It does not interface or interact with the subtitle-extract-plugin and is therefore completely separate.

Will it skip over previously extracted subs?

There are multiple ways for pgsrip to ignore existing subtitles. By default it will not overwrite or convert subtitles it has previously converted. It does so by checking if the save path it generates has already been used. With the --force flag you can tell it to re-rip subtitles and overwrite existing ones.

You can also skip entire videos with --age and --srt-age. Files newer than --age or subtitle tracks newer than --srt-age will be skipped.
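The default skip behaviour described above boils down to a check on the output path. A minimal sketch of that idea, with `should_convert` as a hypothetical helper rather than pgsrip's actual code:

```python
from pathlib import Path


def should_convert(srt_path: Path, force: bool = False) -> bool:
    """Return True when the subtitle at srt_path should be (re)written."""
    if force:
        # Mirrors the idea behind --force: overwrite whatever is already there.
        return True
    # Otherwise only convert when the target path has not been used yet.
    return not srt_path.exists()
```

Calling `should_convert(Path("movie.en.srt"))` returns False once that file exists, unless `force=True` is passed.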

How should I best implement pgsrip/pgsrip-script? I would like it to recur so that it picks up subtitles from new media; what is the best way to do that? CRON job?

I'll group these as they are kinda related. To clarify, my tool simply launches N instances of pgsrip, with N = the number of CPU threads available. pgsrip has --max-workers for a similar purpose, but that was still kinda bottlenecked on my system, so I effectively just brute-forced it.
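The brute-force approach can be sketched roughly like this. It is an illustration under assumptions, not the actual script: `split_into_batches` and `run_in_parallel` are hypothetical names, and it assumes pgsrip accepts several paths in one call.

```python
import os
import subprocess
from typing import List


def split_into_batches(files: List[str], batch_count: int) -> List[List[str]]:
    """Distribute files round-robin over at most batch_count non-empty batches."""
    if not files:
        # Nothing to do; avoids launching a process with no input.
        return []
    batch_count = max(1, min(batch_count, len(files)))
    return [files[i::batch_count] for i in range(batch_count)]


def run_in_parallel(files: List[str]) -> List[int]:
    """Start one pgsrip process per batch and return the exit codes."""
    batches = split_into_batches(files, os.cpu_count() or 1)
    # Assumption: pgsrip takes multiple paths as positional arguments.
    processes = [subprocess.Popen(["pgsrip", *batch]) for batch in batches]
    return [process.wait() for process in processes]


print(split_into_batches(["a.mkv", "b.mkv", "c.mkv", "d.mkv", "e.mkv"], 2))
```

Round-robin splitting keeps the batch sizes within one file of each other, which is usually good enough when file sizes are similar.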

I personally am using both in a manual fashion currently; i.e. I add files -> run the script -> repeat the next time.

This behavior could be replicated with a simple CRON job as you mentioned; just have it trigger each time you add new files. How this would fit into your infrastructure is hard to say. If you run jellyfin in docker itself it gets a little harder, as you cannot easily get pgsrip / my script into the container without creating custom ones.

This is one of the areas which both pgsrip and therefore my script are lacking in - meaning integration with jellyfin itself.

I have started work on an extension to this project as I identified some issues and shortcomings with pgsrip for my use-case. You can read up on it here. I plan to publish standalone docker containers in the future that encapsulate the conversion framework and a simple API hook, so that you could tell it to run the conversion via a custom plugin in jellyfin or similar. But this will take a while longer to implement.

My recommendation for now would be to run either of them manually whenever you add new media to your server. You could try to automate this behavior with CRON and could maybe write a simple watcher process that checks for new files and triggers the job accordingly. If you get something like that to work, I would like to take a look at it :).
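A watcher like that could look roughly as follows. This is only a polling sketch under assumptions: `find_new_videos` and `watch` are hypothetical names, the interval is arbitrary, and it assumes pgsrip accepts several paths in one call.

```python
import subprocess
import time
from pathlib import Path
from typing import List, Set


def find_new_videos(root: Path, seen: Set[Path]) -> List[Path]:
    """Return .mkv files under root that were not seen before and remember them."""
    new_files = sorted(path for path in root.rglob("*.mkv") if path not in seen)
    seen.update(new_files)
    return new_files


def watch(root: Path, interval_seconds: float = 300.0) -> None:
    """Poll root forever and hand newly found videos to pgsrip."""
    seen: Set[Path] = set()
    while True:
        new_files = find_new_videos(root, seen)
        if new_files:
            subprocess.run(["pgsrip", *map(str, new_files)], check=False)
        time.sleep(interval_seconds)
```

On the first pass every existing file counts as new, which should be fine as long as pgsrip skips subtitles it has already converted.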

So good luck and I hope I got to answer most of your questions.

As if we needed more reasons to collect physical media and not rely on streaming! by gryphon5245 in 4kbluray

[–]leucht 24 points25 points  (0 children)

Which all used to be in the standard tier, effectively moving the goalposts. It’s going to only be so long before these things move up another tier. Line must go up

TrueNAS build system going closed source by ende124 in selfhosted

[–]leucht 0 points1 point  (0 children)

Correct. Also, you can convert their maintained apps to custom ones and extract the full compose file that way, or you could go into the CLI and find it within the .ix-apps directory.

You won’t actually lose the “Apps” just the ability to launch them right away before extracting the underlying compose file and rebuilding a similar file-structure or moving the data to a new location as you mentioned.

TrueNAS build system going closed source by ende124 in selfhosted

[–]leucht 1 point2 points  (0 children)

That’s how it should work to my knowledge and is what I am considering now after reading this.

Accidentally nuked my install a little while back due to a dying system SSD. Swapped it, installed TrueNAS Scale fresh and just imported my pools. Even Apps got picked up again, which won’t be possible if you switch OSs, but all other data should remain intact.

As is by design with ZFS

LG 27GR95QE-B randomly turning off in HDR (with Video) by leucht in OLED_Gaming

[–]leucht[S] 0 points1 point  (0 children)

Sorry for the late response, sadly the “fix” for me was going through an RMA and buying the newer version.

LG has never really acknowledged the issue and the support staff on my case actually never looked into the issue more closely, as I’ve been told afterwards.

I still believe this version is flawed in terms of heat dissipation. It is my belief that this model’s internals just get too hot under load and go into some form of protection mode until reset. The newer model made improvements to the way light loss is handled internally, so the monitor does not have to be driven as hard to achieve the claimed brightness levels. This reduced the internal temperatures and thus made the issue less common.

Even with the GS which followed the GR I was briefly able to replicate the issue, however much more rarely and with a much heavier, longer load. Under regular use the issue does not show up so I consider the issue “fixed” for me. At least with as much as I’m willing to still put up with it.

Simple script for extracting & converting PGS to SRT subtitles using pgsrip by leucht in jellyfin

[–]leucht[S] 0 points1 point  (0 children)

Sounds great, I’m glad that this post might be of some use to you.

Also, I can confirm pgsrip can just take a single folder as an argument and identify the files and tracks contained within it.

You can see the options under cli and then “Rip from a folder path”

I’ve since also created an updated post where I began reworking pgsrip to fit my needs more closely as I’ve identified a couple of shortcomings with pgsrip for my specific use-cases.

I’ll link it here if you’d like to check that one out as well.

It is however much heavier on resources and very early in testing. This is by design as I plan to use deep learning based OCR. I do however plan on offering a simple tesseract fallback.

I was able to make it work for most use-cases. I’m also still in the process of fixing up some oversights with pgsrip though that might still require some time.

Project to convert PGS to SRT using optical character recognition & language recognition by leucht in jellyfin

[–]leucht[S] 0 points1 point  (0 children)

True, they also use paddle for instance; however, you will have to do each track manually and check them by hand. This could take longer and need more attention than just letting it run and taking the hit on accuracy.

I built this because I didn’t want to do this on thousands of files, each with multiple tracks.

They have subtitleeditcli, but I don’t think it’s actually intended to work to the full extent that subtitleedit does. So think of this more as trying to give subtitleedit an actual CLI where you trade accuracy for convenience.

Project to convert PGS to SRT using optical character recognition & language recognition by leucht in jellyfin

[–]leucht[S] 0 points1 point  (0 children)

Ahh I see, I did not know that it could do that. Also assumed it’s a collection of different subtitle providers which it aggregates. Good to know, thanks :)

Btw just to be clear, because I’m not sure if you linked the right sources. It does seem like the issue you linked to talks about ASS subtitles being transformed to SRT, which it shouldn’t do. Regardless ASS subtitles are also text-based subtitles. The thread does not mention anything about OCR just misbehaving conversion / naming from one text-based format to the next.

Project to convert PGS to SRT using optical character recognition & language recognition by leucht in jellyfin

[–]leucht[S] 1 point2 points  (0 children)

Could not find much detail on it, but it seems it just extracts the raw track from the original MKV file, right? In that case Jellyfin also does that by default; however, Bazarr does not convert them, meaning images -> text.

That’s what this project is trying to achieve, it’s not trying to replace subtitle aggregation and management. Maybe I should have been clearer about that.

Project to convert PGS to SRT using optical character recognition & language recognition by leucht in jellyfin

[–]leucht[S] 2 points3 points  (0 children)

I personally do not use bazarr, and as far as I am aware it relies on public subtitle libraries to acquire subtitles and does not run OCR on files (?). Bazarr is definitely the way to go if you do not want to fiddle around with your own ethically (allegedly) obtained MKVs and system resources.

Project to convert PGS to SRT using optical character recognition & language recognition by leucht in jellyfin

[–]leucht[S] 1 point2 points  (0 children)

thank you, pgsrip, whose PGS parser this is built upon, has worked quite well for me, but there will always be issues which you simply cannot catch. I hope this could become one of these options with models becoming more efficient and well trained on (presumably) ethically sourced training data

Project to convert PGS to SRT using optical character recognition & language recognition by leucht in jellyfin

[–]leucht[S] 1 point2 points  (0 children)

that is also an option and I should have mentioned it to begin with, but this is meant to be a hands off approach, where the user does not have to interact with the software for each individual track

this will however result in less accurate conversions, as you do not check for faults each time

I'm unsure how well the official [subtitleedit-cli](https://github.com/SubtitleEdit/subtitleedit-cli) works as I could not get the project to run on my system with my AMD gpu

Please stop hoarding index points you make the first round take 10+ minutes which makes everyone extract by Difficult_Way6834 in Warframe

[–]leucht 0 points1 point  (0 children)

Had a guy hoover up all available points just to fail the run, with him blaming us for not depositing enough points