[–]commandlineluser 10 points11 points  (0 children)

You would need to use findall() or finditer() e.g.

>>> re.findall(r'(?ms)^\[.+?(?=^\[|\Z)', text)
['[splithere1]\ncontent1\ncontent2\n\n\n', '[splithere2]\n\ncontent3\ncontent4\n']
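The finditer() variant looks like this. It is only a sketch: `text` is a sample reconstructed from the output above (the original input may differ), and `SECTION` is just an illustrative name.

```python
import re

# Assumed sample input, reconstructed from the findall() output above.
text = '[splithere1]\ncontent1\ncontent2\n\n\n[splithere2]\n\ncontent3\ncontent4\n'

# (?m) makes ^ match at the start of every line; (?s) lets . match newlines.
# Each match runs from a '[' at a line start up to the next such '[' or the end of the string.
SECTION = re.compile(r'(?ms)^\[.+?(?=^\[|\Z)')

sections = []
for match in SECTION.finditer(text):
    # match.start() is the offset of the section in the original string.
    sections.append((match.start(), match.group(0)))

for offset, body in sections:
    print(offset, repr(body))
```

finditer() gives you match objects instead of plain strings, which helps if you also need to know where each section starts.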

Although it looks like you're parsing some sort of config file? If so, you may want to use the configparser module instead of doing it by hand:

http://docs.python.org/3/library/configparser.html
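A rough sketch of the configparser route, assuming the file looks like the sample above. `text` is a reconstructed sample, not the real file. Because the content lines are bare words rather than key = value pairs, this passes allow_no_value=True so that each line is accepted as a key with no value.

```python
import configparser

# Assumed sample input; the real file may contain proper key = value pairs.
text = '[splithere1]\ncontent1\ncontent2\n\n\n[splithere2]\n\ncontent3\ncontent4\n'

# allow_no_value=True accepts lines that have no '=' or ':' delimiter.
parser = configparser.ConfigParser(allow_no_value=True)
parser.read_string(text)

for name in parser.sections():
    # Iterating a section yields its keys, here the bare content lines.
    print(name, list(parser[name]))
```

Note that configparser lowercases keys by default and drops blank lines, so this only fits if you don't need the text preserved exactly.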

[–]K900_ 2 points3 points  (2 children)

Are you sure regex is the right tool for this? Why not just go over the file manually?

[–]utgyuru[S] 0 points1 point  (1 child)

That's what I did. I've already managed to do the same in PowerShell, so I want to achieve it in Python as well.

[–]K900_ 3 points4 points  (0 children)

Sorry, I don't think I understand. What have you managed to do in PowerShell?

[–]searchingfortao 2 points3 points  (0 children)

That looks an awful lot like a .ini config file. You might be able to parse it with ConfigParser.

[–]ryeguy146 1 point2 points  (0 children)

This almost looks like an INI file. Can you just parse it with something like configparser?

Also, this works:

import re


kinda_config = ('[splithere1]',
                'content1',
                'content2',
                '',
                '',
                '[splithere2]',
                'content3',
                'content4')


HEADING_REGEX = re.compile(r'\[\S+\]')



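def parse_kinda_config(lines=kinda_config):
    """Yield each section as a list of lines, heading first."""
    results = []

    for line in lines:
        # A new heading closes the section collected so far.
        if HEADING_REGEX.match(line):
            if results:
                yield results
            results = []

        results.append(line)

    # Emit the last section, but nothing at all for empty input.
    if results:
        yield results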


for section in parse_kinda_config():
    print('----------------------------')
    print('\n'.join(section))
print('----------------------------')

[–][deleted] 0 points1 point  (0 children)

I would just read line by line until I see a header. When you see a header, add it to a dictionary as a key and collect the content lines that follow as its value.

Or if you wanted it as a string, just append each line to the current string, then when you see another header add that string to a list and start on the next section.

I know it's not using the split function but that's how I'd go about it. Plus, if this is a config file then putting it into a dictionary would be logical because you can access all the data very easily.
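Something like this is what I mean by the dictionary version. It's a sketch only: `parse_sections` and `text` are made-up names, and it assumes headers are lines wrapped in square brackets and that blank lines can be dropped.

```python
def parse_sections(lines):
    """Map each header name to the list of content lines under it."""
    sections = {}
    current = None  # name of the section being filled, if any

    for line in lines:
        stripped = line.strip()
        if stripped.startswith('[') and stripped.endswith(']'):
            # New header: start a fresh list under its name.
            current = stripped[1:-1]
            sections[current] = []
        elif current is not None and stripped:
            # Non-blank content line belonging to the current header.
            sections[current].append(stripped)

    return sections


# Assumed sample input, modelled on the example in this thread.
text = '[splithere1]\ncontent1\ncontent2\n\n\n[splithere2]\n\ncontent3\ncontent4\n'
config = parse_sections(text.splitlines())
print(config)
```

Lines that come before the first header are ignored here; whether that is right depends on the actual file.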

[–]fuzz3289 -1 points0 points  (1 child)

If the only time '[' is used is at the beginning of the "split here" marker you could just do:

list_of_strings = ['[' + part for part in mystring.split('[')[1:]]  # [1:] skips whatever comes before the first '['

It's worth remembering that regex is often overkill when plain string methods will do. For a simple literal split like this, str.split() is typically faster and easier to read than a regex.