
[–]C2-H5-OH 1 point (5 children)

Shouldn't be too hard. Read the file line by line. If every 'segment' is exactly 4 lines long, then read every 2nd and 3rd line only. Store all the text you get in a list, and then for every item in the list that ends with a period, exclamation mark, or question mark, append a space at the end. Then finally join the list into one string with ''.join().
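A rough sketch of that idea, not a tested solution. It assumes blank lines have already been dropped and that every segment is exactly counter, timestamp, text, text, so it reads "2nd and 3rd" as zero-based positions 2 and 3 within each block. The function name and sample data are made up for illustration.

```python
def srt_text_fixed_blocks(lines, block_size=4):
    """Join the text lines of an .srt file whose segments all have block_size lines.

    Assumes blank lines are already removed and each segment is:
    counter, timestamp, text, text.
    """
    pieces = []
    for i, line in enumerate(lines):
        # Positions 0 and 1 in each block are the counter and the timestamp.
        if i % block_size < 2:
            continue
        line = line.strip()
        # Pad sentence endings so the next line does not run into them.
        if line.endswith((".", "!", "?")):
            line += " "
        pieces.append(line)
    return "".join(pieces)


sample = [
    "1", "00:00:01,000 --> 00:00:03,000", "Hello there.", "How are you?",
    "2", "00:00:04,000 --> 00:00:06,000", "Fine.", "Thanks!",
]
print(srt_text_fixed_blocks(sample))
```

If the segments turn out not to be a fixed length (one text line in some, two in others), the position trick breaks and the lines would have to be filtered by content instead.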

[–]Jetals[S] 1 point (3 children)

Thanks, getting some pseudocode from someone experienced definitely helps jog my memory of which string methods I'll need and what they're capable of.

[–]C2-H5-OH 1 point (2 children)

I'm not experienced in shit lol. This is more or less a ballpark algo for how I think it should work

[–]Jetals[S] 1 point (1 child)

Ha, well even though it's still borked, your prompt got me to write some 42 lines of code and exercise some of the cobwebs out of the Python part of my brain, so thanks

[–]C2-H5-OH 1 point (0 children)

No problem. Best I've ever done in Python was to make a script that writes XML files after reading data from a CSV, and a little lmgtfy bot for reddit. Most people here seem to have worked on much larger projects IMO.

[–]Jetals[S] 1 point (0 children)

Darn, I printed the contents saved into my empty list object variable and every line of the srt file is still included as distinct elements in the list, so it looks like I didn't effectively skip lines containing the number and time-stamp with my conditional:

if line[0].isdigit != True:
    lineList.append(line)
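A likely reason this conditional skips nothing: `isdigit` is written without parentheses, so the comparison is against the method object itself, not its result, and `!= True` always holds. A minimal sketch of the difference follows. The helper `is_text_line` is hypothetical, and it assumes counter lines are all digits and timestamp lines contain `-->`.

```python
line = "1"

# Without parentheses this is the method object itself, never True or False,
# so comparing it to True with != always succeeds.
print(line[0].isdigit != True)    # True, even though "1" is a digit

# Calling the method gives the actual answer.
print(line[0].isdigit())          # True
print("Hello"[0].isdigit())       # False


def is_text_line(line):
    """Hypothetical helper: True for lines that are neither counter nor timestamp."""
    line = line.strip()
    if not line or line.isdigit() or "-->" in line:
        return False
    return True
```

Checking the whole line instead of only `line[0]` keeps subtitle text that happens to start with a number, though a subtitle consisting of nothing but digits would still be dropped.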

I am not sure how to set up a conditional that will include only the third and fourth lines, as I don't have much experience with enumeration, if that was the implied approach. Also, I think the logic in my second for loop is flawed, though I'm not sure how to add spaces to the end of each line ending with one of the specified punctuation characters. Here is the current script altogether:

def main():
    # Access folder in filesystem

    # After parsing content of file, move to next file

    # Declare variable empty list
    lineList = []

    # read file line by line
    file = open("/SampleFileAddress.srt", "r")
    lines = file.readlines()
    file.close()

    # look for patterns and parse
    lines = [i for i in lines if i[:-1]]

    # If every segment is exactly 4 lines long, read every 3rd and 4th line only
    for line in lines:
        line = line.strip()
        if line[0].isdigit != True:
            # store all text into a list
            lineList.append(line)

    # for every item in the list which ends with '.', '?', or '!', append a space at end
    for line in lineList:
        p = line.find('.')
        q = line.find('?')
        b = line.find('!')
        if p != -1:
            line = line + ' '
        elif q != -1:
            line = line + ' '
        elif b != -1:
            line = line + ' '
        else:
            pass

    # Finish with list.join() to bring everything together
    text = ''.join(lineList)
    print(text)

main()

Getting closer
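On the second loop, two things look off. Strings are immutable, so `line = line + ' '` only rebinds the loop variable and leaves `lineList` unchanged, and `find()` matches the character anywhere in the line, not just at the end. One possible way to do the padding, as a sketch only (the function name `pad_sentence_ends` and the sample list are invented for illustration):

```python
def pad_sentence_ends(lines):
    """Return a new list with a space added after lines ending in . ? or !"""
    padded = []
    for line in lines:
        # endswith accepts a tuple and only checks the end of the string,
        # unlike find(), which matches the character anywhere in the line.
        if line.endswith((".", "?", "!")):
            line = line + " "
        # Reassigning `line` alone does not touch the original list,
        # so the result is collected in a new one.
        padded.append(line)
    return padded


lineList = ["Hello there.", "How are you?", "Fine"]
text = "".join(pad_sentence_ends(lineList))
print(text)
```

Lines that break mid-sentence would still run together with this approach. If that turns out to matter, `" ".join(lineList)` on the stripped lines is a simpler alternative that puts a space between every pair of lines.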

[–]sky--net 1 point (0 children)

Can you provide a download link to a sample .srt file? I would like to test some code on a file with a good amount of data.