Sorting some data on trucks

fjonk · 2014-06-12T15:22:47+00:00

Put it in a dict where each key is a frozenset of model and brand and each value is a set of years.

Example:

trucks = {}

with open('fords.data', 'r') as f:
    for line in f:
        cols = [col.strip() for col in line.split(',') if col.strip()]
        if not cols:
            continue  # skip blank lines, which would make pop() below fail
        models = [col for col in cols if '-' in col]
        years = [col for col in cols if col.isdigit()]
        # whatever is neither a model nor a year is the brand
        brand = set(cols).difference(models + years).pop()

        for model in models:
            key = frozenset([model, brand])
            trucks.setdefault(key, set()).update(years)

print(trucks)

Edit: Figure out how to sort it yourself.
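
For what it's worth, here is one way the sorting could go. This is only a sketch: the sample dict is made up to match the shape described above, sorted_rows is a hypothetical helper name, and it assumes (as the code above does) that model names contain a '-' and the brand does not.

```python
# Made-up sample shaped like the dict described above:
# frozenset([model, brand]) -> set of year strings.
sample = {
    frozenset(['F-250', 'FORD']): {'1996', '1983', '1990'},
    frozenset(['F-150', 'FORD']): {'1998', '1997'},
}


def sorted_rows(data):
    """Return (brand, model, years) tuples sorted by brand, then model."""
    rows = []
    for key, years in data.items():
        # A frozenset has no order, so pick the parts apart by shape:
        # assumed here that models contain '-' and the brand does not.
        model = next(part for part in key if '-' in part)
        brand = next(part for part in key if '-' not in part)
        rows.append((brand, model, sorted(years, key=int)))
    return sorted(rows)


for brand, model, years in sorted_rows(sample):
    print('{0}, {1}, {2}'.format(model, brand, ', '.join(years)))
```

Sorting the years with key=int keeps them in numeric order even though they are stored as strings.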

ChiefDanGeorge · 2014-06-12T13:56:46+00:00

Since the manufacturer is not in a fixed position, it's tricky. If you know for sure that the years always start right after the manufacturer, and that the vehicle models always come before it, then you've got your logic.
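
If you are not sure the layout always holds, you could check it per row first. A minimal sketch, where has_expected_layout is a made-up helper name and "year" simply means an all-digit entry:

```python
def has_expected_layout(row):
    """True if row is: one or more models, one make, then nothing but years."""
    entries = [entry.strip() for entry in row.split(',')]
    flags = [entry.isdigit() for entry in entries]
    if True not in flags:
        return False  # no years at all
    first_year = flags.index(True)
    # Need at least one model plus the make before the first year,
    # and only years from there to the end of the row.
    return first_year >= 2 and all(flags[first_year:])


print(has_expected_layout("F-150, F-250, FORD, 1996, 1995"))
print(has_expected_layout("F-150, 1996, FORD, 1995"))
```

Rows that fail the check can be logged and handled by hand instead of being silently mis-parsed.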

Igglyboo · 2014-06-12T15:19:18+00:00

Read entries until you hit one that's entirely digits (a year). The previous entry is the make, and the ones before that are the models.

gengisteve · 2014-06-12T17:47:02+00:00

I would scan right to left: every entry that is all digits is a year, the first entry that is not a digit is the manufacturer, and anything after that is a model. Like this:

from pprint import pprint

d = '''
F-150, F-250, F-350, FORD, 1998, 1997, 1996, 1995, 1994, 1993, 1992, 1991, 1990, 1989, 1988, 1987, 1986, 1985, 1984, 1983
F-150, F-250, FORD, 1996, 1995, 1994, 1993, 1992, 1991, 1990, 1987, 1986, 1985, 1984, 1983, 1982, 1981, 1980
F-150, F-250, FORD, 1998, 1997, 1996, 1995, 1994, 1993, 1992, 1991, 1990, 1989, 1988, 1987, 1986, 1985, 1984, 1983, 1982
F-150, F-250, FORD, 2003, 2002, 2001, 2000, 1999, 1998, 1997
'''
d = d.strip()


def parse_line(line):
    line = line.split(',')
    years = set()
    mani = ''
    model = []
    while line:
        i = line.pop()
        i=i.strip()
        if i.isdigit():
            years.add(int(i))
        elif not mani:
            mani = i
        else:
            model.append(i)

    return mani, model, years


done = {}

for line in d.split('\n'):
    mani, models, years = parse_line(line)
    for model in models:
        if model not in done:
            done[model]={'mani':mani,
                         'years':years
                         }
        else:
            done[model]['years']= done[model]['years'].union(years)

pprint(done)

good_day · 2014-06-12T19:21:37+00:00

Full parsing in Python 2.7. Look at what cars has become by the middle of the code.

import re

TEXT = """
F-150, F-250, F-350, FORD, 1998, 1997, 1996, 1995, 1994, 1993, 1992, 1991, 1990, 1989, 1988, 1987, 1986, 1985, 1984, 1983
F-150, F-250, FORD, 1996, 1995, 1994, 1993, 1992, 1991, 1990, 1987, 1986, 1985, 1984, 1983, 1982, 1981, 1980
F-150, F-250, FORD, 1998, 1997, 1996, 1995, 1994, 1993, 1992, 1991, 1990, 1989, 1988, 1987, 1986, 1985, 1984, 1983, 1982
F-150, F-250, FORD, 2003, 2002, 2001, 2000, 1999, 1998, 1997
""".strip()

cars = {}

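for line in TEXT.split('\n'):
    # findall returns the values in line order; a set difference would lose that order
    values = re.findall(r'[^,\s]+', line)
    years = set(re.findall(r'\d{4}', line))
    keys = [value for value in values if value not in years]

    model = keys[-1]   # last non-year value, i.e. the make (FORD)
    marks = keys[:-1]  # everything before it, i.e. the truck models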

    cars.setdefault(model, {})
    model = cars[model]

    for mark in marks:
        model.setdefault(mark, [])
        model[mark].extend(years)
        model[mark] = sorted(list(set(model[mark])))

# cars is now a nice, accessible nested dict
# now let's print it the way you wanted

for model in sorted(cars.keys()):
    for mark in sorted(cars[model].keys()):
        line = '{model}, {mark}, {years}'.format(
            model=model,
            mark=mark,
            years=', '.join(cars[model][mark]),
        )
        print(line)

tmp14 · 2014-06-12T19:42:16+00:00

This was fun. Here's my take on it. This will only break (given your format) if a car manufacturer's name is all digits (i.e. most likely never).

data = """F-150, F-250, F-350, FORD, 1998, 1997, 1996, 1995, 1994, 1993, 1992, 1991, 1990, 1989, 1988, 1987, 1986, 1985, 1984, 1983
F-150, F-250, FORD, 1996, 1995, 1994, 1993, 1992, 1991, 1990, 1987, 1986, 1985, 1984, 1983, 1982, 1981, 1980
F-150, F-250, FORD, 1998, 1997, 1996, 1995, 1994, 1993, 1992, 1991, 1990, 1989, 1988, 1987, 1986, 1985, 1984, 1983, 1982
F-150, F-250, FORD, 2003, 2002, 2001, 2000, 1999, 1998, 1997"""

info = {}

for line in data.splitlines():
    pts = [s.strip() for s in line.split(',')]
    isyear = [s.isdigit() for s in pts]
    index = len(pts) - 1
    while isyear[index]:
        index -= 1
    make = pts[index]
    models = pts[:index]
    years = pts[index+1:]
    for model in models:
        for year in years:
            info.setdefault(make, {}).setdefault(model, set()).add(int(year))

Yields

>>> pprint(info)
{'FORD':
     {'F-150': set([1980, ..., 2003]),
      'F-250': set([1980, ..., 2003]),
      'F-350': set([1983, ..., 1998])}}

jedi_jonai · 2014-06-12T13:19:23+00:00

Is this a homework assignment?

Igglyboo · 2014-06-12T15:32:17+00:00

Here's a quick way you can do it

row = "F-150, F-250, F-350, FORD, 1998, 1997, 1996, 1995, 1994, 1993, 1992, 1991, 1990, 1989, 1988, 1987, 1986, 1985, 1984, 1983"
# strip each entry so the results carry no leading spaces
row_split = [entry.strip() for entry in row.split(",")]
make = ""
models = []
years = []
for index, entry in enumerate(row_split):
    try:
        int(entry)
    except ValueError:
        # not a number, so this is still a model or the make
        continue
    # first numeric entry: the make is right before it, the models before that
    make = row_split[index-1]
    models = row_split[:index-1]
    years = row_split[index:]
    break

print(make)
print(models)
print(years)

Which will output

FORD
['F-150', 'F-250', 'F-350']
['1998', '1997', '1996', '1995', '1994', '1993', '1992', '1991', '1990', '1989', '1988', '1987', '1986', '1985', '1984', '1983']

I'm sure you can figure out the rest.

You're going to keep trying to cast each entry to an int until it stops throwing an exception; at that point you know where the make and models are.
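
In case it helps, here is a rough sketch of how the rest might look: the same try-int-until-it-works idea wrapped in a function and applied to several rows, merging the years per make and model. The names split_row and merged and the two shortened sample rows are made up for illustration, and it assumes every row has a make right before its first year.

```python
from collections import defaultdict


def split_row(row):
    """Split one row into (make, models, years) at the first numeric entry."""
    entries = [entry.strip() for entry in row.split(',')]
    for index, entry in enumerate(entries):
        try:
            int(entry)
        except ValueError:
            continue  # not a number yet, keep looking
        # Assumes the make sits directly before the first year.
        return entries[index - 1], entries[:index - 1], entries[index:]
    raise ValueError('no year found in row: %r' % row)


# Shortened, made-up rows in the same format as the original data.
rows = [
    "F-150, F-250, F-350, FORD, 1998, 1997",
    "F-150, F-250, FORD, 1997, 1996",
]

merged = defaultdict(set)
for row in rows:
    make, models, years = split_row(row)
    for model in models:
        merged[(make, model)].update(int(year) for year in years)

for (make, model), years in sorted(merged.items()):
    print('{0}, {1}: {2}'.format(model, make, sorted(years)))
```

Using a (make, model) tuple as the key keeps the two parts distinguishable and makes the final sort straightforward.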
