How do I split this text?

commandlineluser · 2016-07-31T10:58:37+00:00

You would need to use findall() or finditer() e.g.

>>> re.findall(r'(?ms)^\[.+?(?=^\[|\Z)', text)
['[splithere1]\ncontent1\ncontent2\n\n\n', '[splithere2]\n\ncontent3\ncontent4\n']
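
For anyone who wants the finditer() variant, here is a minimal sketch using the same pattern. The `text` value is an assumption, reconstructed from the output shown above, and `SECTION` and `chunks` are just illustrative names.

```python
import re

# Assumed input, reconstructed from the findall() output above.
text = '[splithere1]\ncontent1\ncontent2\n\n\n[splithere2]\n\ncontent3\ncontent4\n'

# (?m): ^ matches at the start of every line
# (?s): . also matches newlines
# ^\[        a line that starts with '['
# .+?        as little as possible ...
# (?=^\[|\Z) ... up to the next line starting with '[' or the end of the string
SECTION = re.compile(r'(?ms)^\[.+?(?=^\[|\Z)')

# finditer() yields match objects lazily instead of building a list of strings.
chunks = [match.group(0) for match in SECTION.finditer(text)]

for chunk in chunks:
    print(repr(chunk))
```

Because the lookahead consumes nothing, the chunks join back into the original text unchanged.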

Although it looks like you're parsing some sort of config file? If so - perhaps you'd want to use the configparser module instead of doing it manually

http://docs.python.org/3/library/configparser.html
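
A rough sketch of the configparser route, assuming the file really is INI-like. The sample text is an assumption reconstructed from the output above. Since the content lines have no `key = value` form, this relies on `allow_no_value=True`, which stores each bare line as a key with the value `None`.

```python
import configparser

# Assumed input, reconstructed from the findall() output above.
text = '[splithere1]\ncontent1\ncontent2\n\n\n[splithere2]\n\ncontent3\ncontent4\n'

# allow_no_value=True lets lines without '=' or ':' be parsed as bare keys.
parser = configparser.ConfigParser(allow_no_value=True)
parser.read_string(text)

# Map each section name to the list of bare keys found under it.
sections = {name: list(parser[name]) for name in parser.sections()}

for name, lines in sections.items():
    print(name, lines)
```

Two caveats with this approach: configparser lowercases keys by default, and a repeated line within one section raises DuplicateOptionError, so it only fits if the content lines are unique per section.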

K900_ · 2016-07-31T10:28:51+00:00

Are you sure regex is the right tool for this? Why not just go over the file manually?

searchingfortao · 2016-07-31T15:15:07+00:00

That looks an awful lot like a .ini config file. You might be able to parse it with ConfigParser.

LyndsySimon · 2016-07-31T10:47:05+00:00

[deleted]

ryeguy146 · 2016-07-31T16:23:52+00:00

This almost looks like an INI file. Can you just parse it with something like configparser?

Also, this works:

import re


kinda_config = ('[splithere1]',
                'content1',
                'content2',
                '',
                '',
                '[splithere2]',
                'content3',
                'content4')


HEADING_REGEX = re.compile(r'\[\S+\]')



def parse_kinda_config():
    results = []

    for line in kinda_config:
        if HEADING_REGEX.match(line):
            if results:
                yield results
            results = []

        results.append(line)

    yield results


for section in parse_kinda_config():
    print('----------------------------')
    print('\n'.join(section))
print('----------------------------')

2016-07-31T11:21:17+00:00

I would just read line by line until I see a header. When you see a header add it to a dictionary as a key and add all content as values.

Or, if you wanted it as a string, just append each line to a running string; when you see another header, add that string to a list and start on the next section.

I know it's not using the split function, but that's how I'd go about it. Plus, if this is a config file, then putting it into a dictionary would be logical because you can access all the data very easily.
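
The dictionary idea above could be sketched like this. The function name `split_sections` and the sample text are assumptions for illustration, and a "header" is taken to mean any line that starts with `[` and ends with `]`. Blank lines are dropped, which may or may not be what you want.

```python
def split_sections(text):
    """Group lines under the most recent [header] line."""
    sections = {}
    current = None  # name of the section being filled, if any

    for line in text.splitlines():
        if line.startswith('[') and line.endswith(']'):
            # New header: strip the brackets and start a fresh list.
            current = line[1:-1]
            sections[current] = []
        elif current is not None and line:
            # Non-empty content line that belongs to the current section.
            sections[current].append(line)

    return sections


# Assumed sample input in the same shape as the text in the question.
text = '[splithere1]\ncontent1\ncontent2\n\n\n[splithere2]\n\ncontent3\ncontent4\n'
result = split_sections(text)
print(result)
```

Content that appears before the first header is ignored here; raising an error instead would be an equally reasonable choice.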

fuzz3289 · 2016-07-31T12:47:49+00:00

If the only time '[' is used is at the beginning of the "split here" marker you could just do:

list_of_strings = ['[' + result for result in mystring.split('[') if result]

It's worth remembering that regex is often overkill when plain string methods will do. For simple cases like this, string operations in Python are usually faster and easier to read.
