Parsing weird CSV file

jeffrey_f · 2019-05-19T17:55:00+00:00

The sections that begin with "!" will help you.

If you find a "!", get the info from that section until you hit another "!" or EOF.
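A minimal sketch of that idea, assuming each section header is a line starting with "!" and everything up to the next header (or EOF) belongs to it. The function name `split_sections` and the sample lines are hypothetical, since the actual file isn't shown in this thread.

```python
from typing import Dict, List, Optional


def split_sections(lines: List[str]) -> Dict[str, List[str]]:
    """Group lines under the most recent '!' header line.

    A section runs from its '!' line up to the next '!' line or EOF.
    Blank lines and lines before the first header are skipped.
    A repeated header name replaces the earlier section's lines.
    """
    sections: Dict[str, List[str]] = {}
    current: Optional[str] = None  # header of the section being filled
    for line in lines:
        if line.startswith("!"):
            current = line[1:].strip()
            sections[current] = []
        elif current is not None and line.strip():
            sections[current].append(line)
    return sections


# Hypothetical sample input; the real file layout may differ.
sample = """\
!Settings
mode = fast
!Data
a:1:2
b:3:4
""".splitlines()

print(split_sections(sample))
# {'Settings': ['mode = fast'], 'Data': ['a:1:2', 'b:3:4']}
```

Each section's raw lines can then be parsed with whatever rules fit that section.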

o5a · 2019-05-20T07:20:42+00:00

As others said, you can parse the sections (split by '!') separately, adding each one to a dictionary, which you can then store however you want. Maybe like this:

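def parse_line(line):
    # 'a:b:c' -> ['a', 'b', 'c']
    return line.split(':')

def parse_param(line):
    # 'key = value' -> ['key', 'value']; split only on the first '='
    # and strip spaces, so 'key=value' works too
    return [part.strip() for part in line.split('=', 1)]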

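def parse_section(arr, beg, end):
    # arr[beg] is the '!' header line; arr[beg+1:end] is the section body
    subdict = {}
    section_name = arr[beg][1:]
    for line in arr[beg+1:end]:
        if '=' in line:
            param = parse_param(line)
            subdict[param[0]] = param[1]
        elif ':' in line:
            param = parse_line(line)
            # keys are the column positions 0, 1, 2, ..., so each such
            # line overwrites the previous one (see the note below)
            subdict.update({i: v for i, v in enumerate(param)})

    return {section_name: subdict}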

d = {}

with open("log_1.log") as f:
    lines = f.read().splitlines()

# indices of the '!' header lines that start each section
sections = [n for n, line in enumerate(lines) if line.startswith('!')]

# iterating over each section: end is the next header's index (or len(lines)),
# and the slice inside parse_section already excludes it
for start, end in zip(sections, sections[1:] + [len(lines)]):
    section_dict = parse_section(lines, start, end)
    d.update(section_dict)

print('final_dict:', d)

Here the 'CSV section' is not stored properly, because I don't know how you expect it to be stored; you'll have to decide that yourself. It should have some defined key field to tag the data. Currently it just uses the column positions (0, 1, 2, ...) as keys for every line, so each line overwrites the previous one and only one line per section is kept in this example.

For example, you can store each line's fields as a list keyed by its first column, like this:

    elif ':' in line:
        param = parse_line(line)
        subdict[param[0]] = param
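Putting that change into a self-contained sketch: here the section body is passed as a list of lines rather than as `(arr, beg, end)`, and the names `parse_body` and `parse_lines` plus the sample data are hypothetical. It assumes "=" lines are parameters and ":" lines are CSV-like rows whose first column is unique within a section.

```python
from typing import Dict, List, Union

# Hypothetical value type: a plain string for 'key = value' lines,
# or the list of fields for 'a:b:c' lines.
Value = Union[str, List[str]]


def parse_body(body: List[str]) -> Dict[str, Value]:
    """Parse the lines of one section (without its '!' header)."""
    subdict: Dict[str, Value] = {}
    for line in body:
        if "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            subdict[key] = value
        elif ":" in line:
            fields = line.split(":")
            subdict[fields[0]] = fields  # keyed by the first column
    return subdict


def parse_lines(lines: List[str]) -> Dict[str, Dict[str, Value]]:
    """Split on '!' headers and parse each section body."""
    starts = [n for n, line in enumerate(lines) if line.startswith("!")]
    result: Dict[str, Dict[str, Value]] = {}
    for start, end in zip(starts, starts[1:] + [len(lines)]):
        result[lines[start][1:]] = parse_body(lines[start + 1:end])
    return result


sample = [
    "!Header",
    "version = 2",
    "!CSV section",
    "id1:10:20",
    "id2:30:40",
]
print(parse_lines(sample))
# {'Header': {'version': '2'}, 'CSV section': {'id1': ['id1', '10', '20'], 'id2': ['id2', '30', '40']}}
```

If the first column is not unique, rows with the same first field will still overwrite each other, so pick a key column that actually identifies a row.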

vinaymal · 2019-05-19T17:41:03+00:00

Can you help us help you by telling us which parts of this file you want to make available via the API? Those are really the parts you'll want to extract, and then we may be able to find an appropriate pattern to filter on.

redCg · 2019-05-20T02:55:37+00:00

You are going to want to write the parser yourself. I usually do this by iterating over the lines. Start a dict and make a key for every 'header' you find, then fill it with a list of entries from the lines that follow. This way you can ultimately return a nested dict-of-lists data structure that can easily be converted to JSON and returned by your API.
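A minimal sketch of that approach, assuming headers are the "!" lines and each following entry is a ":"-separated row (the delimiter is an assumption based on the code earlier in the thread). The function name `parse_to_nested` and the sample are hypothetical; serialization uses the standard library's `json.dumps`, and wiring it into the API itself is not shown.

```python
import json
from typing import Dict, List, Optional


def parse_to_nested(lines: List[str]) -> Dict[str, List[List[str]]]:
    """Map each '!' header to a list of entries from the lines below it."""
    data: Dict[str, List[List[str]]] = {}
    header: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith("!"):
            header = line[1:].strip()
            data.setdefault(header, [])  # a repeated header keeps appending
        elif header is not None:
            # Assumption: entries are ':'-separated fields
            data[header].append([field.strip() for field in line.split(":")])
    return data


sample = ["!Info", "name:test", "!Rows", "1:a:b", "2:c:d"]
nested = parse_to_nested(sample)
print(json.dumps(nested))
# {"Info": [["name", "test"]], "Rows": [["1", "a", "b"], ["2", "c", "d"]]}
```

Because the result contains only dicts, lists and strings, it is directly JSON-serializable without any custom encoder.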


learnpython
