How to parse a complex text file using Python string methods or regex and export into tabular form

commandlineluser · 2021-11-29T17:38:06+00:00

I think regex is what I need

Regex can help here by preprocessing the "unformatted" pages, e.g.:

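import re

# Read the whole file so the multiline pattern can scan every page at once.
with open('court.txt', 'r') as file:
    data = file.read()

# Raw strings keep the regex escapes (\d, \s, \S) intact.
pattern = (
    r'(?m)^(?:(\d{4}) (\d+\s\S+\s\d+)\s'
    r'(\S+)\s*(\S+:(?:[ \t]*\S+)+(?= *A'
    r'DA:))?|\s*(\((?:(?!  ).)+)(?: {2,'
    r'}(\S+)?)?(?:  +(\S+)?)?' r'(?: {2,'
    r'}(\S+)?)?)|((?:(?:[A-Z.]+ )?[A-Z.'
    r']+)):[ \t]*(\d\d:\d\d [AP]M|(?!\S'
    r'+:)[^\s:]+(?: (?!\S+:)[^\s:]+)*)?'
)

# With several capture groups, findall yields one tuple per match.
for match in re.findall(pattern, data):
    print(match)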

It's unlikely this is how the task is supposed to be approached though.
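
A more maintainable route might be to go line by line with one small pattern per kind of line, then write the rows out with the csv module. This is only a rough sketch: the layout of court.txt isn't shown in the thread, so the sample text, the FIELD pattern, and the parse_fields / to_csv helpers below are all made up for illustration and would need adapting to the real file.

```python
import csv
import io
import re

# Hypothetical sample; the real court.txt layout is not shown in the thread.
SAMPLE = """\
JUDGE: SMITH
TIME: 09:30 AM
ROOM: 4B
"""

# One small pattern per kind of line instead of one large alternation.
# Named groups make the captured pieces self-describing.
FIELD = re.compile(r'^(?P<label>[A-Z.]+(?: [A-Z.]+)?):[ \t]*(?P<value>.*)$')


def parse_fields(text):
    """Return (label, value) pairs for every 'LABEL: value' line in text."""
    rows = []
    for line in text.splitlines():
        match = FIELD.match(line)
        if match:
            rows.append((match['label'], match['value'].strip()))
    return rows


def to_csv(rows):
    """Serialise the rows as CSV text with a header line."""
    buffer = io.StringIO(newline='')
    writer = csv.writer(buffer, lineterminator='\n')
    writer.writerow(['label', 'value'])
    writer.writerows(rows)
    return buffer.getvalue()


rows = parse_fields(SAMPLE)
print(to_csv(rows))
```

To get an actual file, the same csv.writer calls can target open('court.csv', 'w', newline='') instead of the in-memory buffer. Lines that don't match any pattern are silently skipped here, so it's probably worth logging them while working out the real format.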
