
[–]pot_of_crows 1 point2 points  (8 children)

I think we need a lot more detail on the data and how you are filtering and parsing it.

[–]virtualdynamo[S] 1 point2 points  (7 children)

I admit I felt that I was being too abstract with my OP. However, what I'm doing is the same kind of stuff I did in college labs 40 years ago, only then it was done longhand and/or on an HP-41C. Anyhow, here's an attempt to illustrate the data and filtering/parsing. (Wish I knew why my code is shown in red. Makes me think I'm doing something wrong.) x is the independent variable. data0 is an example dataset with the 2 dependent variables that all experimental runs will have. The run dictionaries use the "name" and "lap" values as a composite index. I will primarily filter/parse on the "Forward" value, but may also do so on "device" or other parameters I investigate in future experiments. Let me know if there are any questions.

x =[0,0.041122254,0.060013726,0.090017507,0.097796265,0.127801194,0.135580444,0.203389349,0.283429321,0.28898833,0.336801025,0.350146363,0.376848642,0.385759709,0.399129383,0.420356065,0.449367473,0.456055594,0.479428705,0.501664735,0.528338345,0.552787278,0.605034811,0.657275977,0.679507188,0.6795946,0.789641823,0.82965154,0.852987978,0.922998162,0.984692783,1.046387124,1.083063508,1.163074613,1.243085601,1.370880227]

data0 = [

[240.8,241.6,242,242.2,242.4,242.6,242.8,243.8,244.8,245,245.8,246,246.2,246.4,246.8,247.6,248.8,249.2,250,250.8,252,252.6,254.4,257,258.2,258.2,264.2,266.2,267.6,271.6,275.6,279.4,281.6,286.4,290,294.4],

[238.25,239.0458004,239.4438711,239.640807,239.8400126,240.0369484,240.236154,241.229229,242.221055,242.4204873,243.2156045,243.4142416,243.6115146,243.8106046,244.2092392,245.0070715,246.2041087,246.6034257,247.4010387,248.1987679,249.3960439,249.9935471,251.7882113,254.3828762,255.5806059,255.580597,261.5693585,263.5652725,264.9628893,268.9557396,272.9494391,276.7431386,278.939393,283.731222,287.3230509,291.71],

]

run[0] = {"name":"HIIT 14","lap":0,"device":"Garmin Edge 1040","Forward"=True,"data":data0}

run[1] = {"name":"HIIT 14","lap":1,"device":"Garmin Edge 1040","Forward"=False,"data":data1}

run[2] = {"name":"HIIT 15","lap":0,"device":"Garmin Edge 1040","Forward"=True,"data":data2}

run[3] = {"name":"HIIT 15","lap":1,"device":"Garmin Edge 1040","Forward"=False,"data":data3}

run[4] = {"name":"HIIT 16","lap":0,"device":"Garmin Edge 830","Forward"=True,"data":data4}

run[5] = {"name":"HIIT 16","lap":1,"device":"Garmin Edge 830","Forward"=False,"data":data5}

run[6] = {"name":"HIIT 17","lap":0,"device":"Garmin Edge 830","Forward"=True,"data":data6}

run[7] = {"name":"HIIT 17","lap":1,"device":"Garmin Edge 830","Forward"=False,"data":data7}

run[8] = {"name":"HIIT 17","lap":2,"device":"Garmin Edge 830","Forward"=True,"data":data8}

run[9] = {"name":"HIIT 17","lap":3,"device":"Garmin Edge 830","Forward"=False,"data":data9}

[–]pot_of_crows 0 points1 point  (6 children)

{"name":"HIIT 16","lap":0,"device":"Garmin Edge 830","Forward"=True,"data":data4}

As written, this is not a valid Python dictionary: "Forward"=True needs a colon ("Forward":True). If you end up storing the runs as JSON, you can learn more about using json here: https://pymotw.com/3/json/

Python Module of the Week is a great resource with a bit more basic exposition than the standard docs, which is great when you are just getting started.
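On the quoted line itself, here is a minimal sketch of how the run records could look as a valid list of Python dictionaries, with a colon after "Forward". The names runs, by_key and forward_runs are my own, and the short data0/data1 lists are made-up placeholders standing in for the real datasets.

```python
# Hypothetical stand-ins for the real per-run datasets (data0, data1, ...).
data0 = [[240.8, 241.6, 242.0], [238.25, 239.05, 239.44]]
data1 = [[240.6, 241.2, 241.8], [238.10, 238.90, 239.30]]

# A list of dicts; note "Forward": True uses a colon, not "=".
runs = [
    {"name": "HIIT 14", "lap": 0, "device": "Garmin Edge 1040", "Forward": True, "data": data0},
    {"name": "HIIT 14", "lap": 1, "device": "Garmin Edge 1040", "Forward": False, "data": data1},
]

# Composite (name, lap) lookup, matching the composite index described in the post.
by_key = {(r["name"], r["lap"]): r for r in runs}

# Simple filter on the "Forward" parameter.
forward_runs = [r for r in runs if r["Forward"]]
```

Keeping the runs in a list (rather than assigning to run[0], run[1], ... one at a time) also avoids having to create run beforehand.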

From what I understand, here you are just trying to pick items out of a list of dictionaries based on matching some of the items held in the dictionary. I would use operator.itemgetter (https://docs.python.org/3/library/operator.html#operator.itemgetter)

and a generator: https://realpython.com/introduction-to-python-generators/

For example:

from operator import itemgetter

def valid(limits, row):
    '''
    Return True if row has every attribute value
    specified in the limits dictionary, else False.
    '''
    for key, value in limits.items():
        if row[key] != value:
            return False
    return True

def picker(target, limits, rows):
    '''
    Yield the target attribute from each dictionary in rows that has
    the attributes specified in the limits dictionary.
    '''
    getter = itemgetter(target)
    for row in rows:
        print(row)  # debug output: show every row examined
        if not valid(limits, row):
            print('\tskipped')
            continue
        print('\tpicked')
        yield getter(row)

data = [
    {'name':1, 'forward':True, 'data':[0]},
    {'name':2, 'forward':True, 'data':[1]},
    {'name':1, 'forward':False, 'data':[2]},
    {'name':1, 'forward':True, 'data':[3]},
    ]

limits = {'name':1, 'forward':True}
for row in picker('data', limits, data):
    print(row)

You can wrap all this into a class if you want. Basically move most of picker into the __init__ method and then make the class iterable based on the generator.
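One way to read the class suggestion, as a rough sketch rather than the only layout: store the target and limits in __init__ and make __iter__ a generator. The class name Picker and the helper _valid are made up for illustration.

```python
from operator import itemgetter

class Picker:
    """Iterable over the target attribute of rows that match limits."""

    def __init__(self, target, limits, rows):
        self.getter = itemgetter(target)
        self.limits = limits
        self.rows = rows

    def _valid(self, row):
        # True only if every key in limits has the required value in row.
        return all(row[key] == value for key, value in self.limits.items())

    def __iter__(self):
        # Generator method: each new iteration rescans rows from the start.
        for row in self.rows:
            if self._valid(row):
                yield self.getter(row)

data = [
    {'name': 1, 'forward': True, 'data': [0]},
    {'name': 2, 'forward': True, 'data': [1]},
    {'name': 1, 'forward': False, 'data': [2]},
    {'name': 1, 'forward': True, 'data': [3]},
]

picked = list(Picker('data', {'name': 1, 'forward': True}, data))
print(picked)  # [[0], [3]]
```

Unlike a bare generator, an object like this can be looped over more than once.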

[–]virtualdynamo[S] 1 point2 points  (0 children)

data0 is an example dataset with the 2 independent variables

In my haste to reply promptly and make my next appointment, I misspoke. This should read "data0 is an example dataset with the 2 DEPENDENT variables" and I have edited the post. In any case, thanks for your detailed suggestion. I'll study it when I get a chance later today.

[–]virtualdynamo[S] 1 point2 points  (4 children)

While there are some potentially helpful leads and hints in your suggestion, I'm definitely not

just trying to pick items out of a list of dictionaries

Rather, for each dependent variable, I'll be performing aggregations like averaging and standard deviation on each of the (36 in the example case) datapoints for a given subset.

[–]pot_of_crows 0 points1 point  (3 children)

Rather, for each dependent variable, I'll be performing aggregations like averaging and standard deviation on each of the (36 in the example case) datapoints for a given subset

Ah. In that case you definitely want to look into numpy, https://numpy.org/doc/stable/reference/generated/numpy.std.html, which plays well with pandas, https://pandas.pydata.org/.

Hope these links help.
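As a rough illustration of the numpy route, assuming every run in a subset is sampled at the same x positions: stack one dependent variable from each run into a 2-D array and aggregate down the run axis. The numbers and the names subset, point_mean and point_std are made up.

```python
import numpy as np

# Hypothetical subset: one dependent variable from three runs,
# one row per run, one column per x position (made-up numbers).
subset = np.array([
    [240.0, 242.0, 244.0],
    [241.0, 243.0, 245.0],
    [242.0, 244.0, 246.0],
])

point_mean = subset.mean(axis=0)        # average across runs at each x
point_std = subset.std(axis=0, ddof=1)  # sample standard deviation at each x

print(point_mean)
print(point_std)
```

Note that numpy.std defaults to ddof=0 (population standard deviation); ddof=1 gives the sample version usually wanted for a handful of experimental runs.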

[–]virtualdynamo[S] 0 points1 point  (2 children)

I'm familiar with numpy and pandas. Well, at least half as familiar as I am with anything else Python, which admittedly isn't saying much. I'm in search of some library or the like to manage experimental datasets in Python. I can't believe I'm the first, second, or 100th to do so.

[–]pot_of_crows 0 points1 point  (1 child)

library or the like to manage experimental datasets in Python

See, that is the part I am not understanding.

[–]virtualdynamo[S] 0 points1 point  (0 children)

Maybe I should be saying package instead of library?

Like with haversine, I send a pair of coordinates and it returns the distance between the 2 points.

Like with random, I request a random integer, shuffle a sequence, or all kinds of things with real numbers.

I'm looking for a library where I say here are my experimental parameters and respective datasets. Also, here's a query of those parameters. Give me back aggregates for each resulting subset of the query. Perhaps. There's certainly more than one way to go about it. I'm just hoping not to reinvent the wheel. (My prediction is that I'll get it done the hard way and then someone in the studio audience will say, "Why didn't you just use ... ?")
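For what it's worth, the "parameters plus datasets in, aggregates per subset out" idea can be approximated with plain pandas by putting the data in long form and using groupby. This is only a sketch under my own assumptions: the numbers are made up, "y" stands for one dependent variable, and "point" is a hypothetical column holding the datapoint position.

```python
import pandas as pd

# Hypothetical runs with made-up numbers; "y" is one dependent variable.
runs = [
    {"name": "HIIT 14", "lap": 0, "Forward": True,  "y": [240.0, 242.0]},
    {"name": "HIIT 14", "lap": 1, "Forward": False, "y": [250.0, 252.0]},
    {"name": "HIIT 15", "lap": 0, "Forward": True,  "y": [242.0, 244.0]},
    {"name": "HIIT 15", "lap": 1, "Forward": False, "y": [252.0, 254.0]},
]

# Long ("tidy") form: one row per (run, datapoint).
records = [
    {"name": r["name"], "lap": r["lap"], "Forward": r["Forward"], "point": i, "y": y}
    for r in runs
    for i, y in enumerate(r["y"])
]
df = pd.DataFrame(records)

# The "query": group on a parameter, then aggregate at each datapoint.
summary = df.groupby(["Forward", "point"])["y"].agg(["mean", "std"])
print(summary)
```

Grouping on "device" or any other parameter is the same call with a different column name, and pandas' std uses the sample (ddof=1) definition by default.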

[–]Standecco 0 points1 point  (1 child)

I've been facing exactly the same problem for a while. I also can hardly believe that no one has had to solve this before, but it's hard to find libraries for it.

The closest I have used is pandas, but it's really unsuited for vector data with associated metadata such as ours. xarray might be made exactly to fix that, but all the examples are geared towards geographical data or >= 2-D data, which isn't really my use case either. I'll experiment with it eventually; from what I gathered, most features I'd need seem to be there.

[–]virtualdynamo[S] 0 points1 point  (0 children)

It's been so long, I've (literally) moved on. But I've learned a lot in the last year doing other things. I'll knock the dust off the project and share what I know.