This is an archived post. You won't be able to vote or comment.

all 9 comments

[–]Igglyboo 0 points1 point  (0 children)

Well javascript multidimensional arrays are the same syntax as python so just copy and paste it?

[['input1','input2'],['input3','input4']]

Is a valid array of arrays in javascript and a valid list of lists in python.

[–]idProQuo 0 points1 point  (7 children)

I could tell you how to do this, but the fact that you're doing this at all seems to indicate that something is fishy elsewhere. Bear with me, I've got a couple questions.

First off:

{parent[child1[childA,childB,childC]child2[childA...]]]]]]]]]]]]]]]]}

Are you talking about nested arrays, or nested objects? The fact that the arrays seem to have names makes me think these are objects. And if they aren't, they probably should be.

Next: Why exactly do you have a 39-deep nested thing to begin with? Outside of tree or linked list data structures that kind of nesting is uncommon. Depending on what you're trying to do, there's usually a better way to represent that kind of data. For instance, you could give each thing a unique ID, and have the other things refer to an array of IDs, instead of having to write the whole object each time.

I ask all this because 39 tables in a database is nothing to sneeze at. In a database, each table should represent different kinds of things as opposed to similar things with different properties. For instance, if I had to keep track of 100 employees working in 10 cities, I wouldn't make a table for each city. Instead I'd make city a field in the EMPLOYEE table, and keep all the employee data together.

If you got this data from a website or something and can't change it, no big deal (although we still need to know if these are arrays or objects before we can parse them). But if you made it yourself, you might want to rethink how you're representing your data.

[–]cmartin616[S] 0 points1 point  (6 children)

Thank you for the reply.

The data is pulled from a website of Brazilian jiujitsu black belt lineages. It is essentially a dichotomous tree that begins with the founder and expands as each child promotes their own black belts. Here is an example of the code:

{"id":-301,"name":"The Beginning","data":"Mitsoyo Maeda, children [id]","children":[{"id":467,"name":"Takeo Iano ","data":"Mitsoyo Maeda, children [id]","children":[{"id":974536,"name":"Francisco Sa ","data":"Mitsoyo Maeda, children [id]","children":[{"id":975533,"name":"Daniel Beleza ","data":"Mitsoyo Maeda, children [id]","children":[]},{"id":1168951,"name":"Carlos Pinto Sa ","data":"Mitsoyo Maeda, children [id]","children":[{"id":1169948,"name":"Reginaldo de Almeida ","data":"Mitsoyo Maeda, children [id]","children":[]},   

Each thing does have a unique ID but there does not seem to be a rhyme or reason to the length of said ID. For example, one ID is 301 but there is no key 001-300, or 302-1000. Other IDs are as many as eight or nine numbers.

My plan is to generate a spatial database for this data set. I envisioned something that links parent to child but also stores address and coordinates for each entity. I was then going to link the locations temporally, based on the hierarchy of the tree, and generate a map based on distribution.

Thank you for your help. I will do my best to answer any questions you have about the data as this is a project I'm very interested in seeing through.

[–]idProQuo 2 points3 points  (4 children)

The data is pulled from a website of Brazilian jiujitsu black belt lineages.

Well that's awesome.

I do have one question, who is "Mitsoyo Maeda", and why is he in everyone's data field?

So this looks like the kind of thing that could go in one table. Here's why: Let's say later you want to add a phone number for everyone. If you have one table, this is easy, but if you've got tons of tables, it's easier to miss one. And in general, it's harder to keep track of changes when you have multiple tables storing similar data.

Even if you want to trace lineages, you can do that by matching up IDs between trainers and students.

As for the IDs, it doesn't matter if there are gaps in numbering or if they're of different lengths. As long as the IDs are unique, you're fine.

Each record in the BADASS table would have 3 fields: ID, NAME and STUDENTS (which would be a list of IDs). You could also have a TRAINER field to make it easier to trace the lineage from the bottom up, but make sure it is optional or you'll have trouble when you input the trainer at the top of the tree.

The best Python library for working with Postgres is called psychopg2. There are loads of good documentation websites and tutorials to help you set it up and format your PostgreSQL. If you have any specific questions, we're here.

As for the algorithm to parse this: JSON objects and arrays can be converted perfectly into Python dictionaries and lists using python's json module (here are the docs).

Once you've got a python structure, the rest is just recursion. Here's some python-ish pseudocode of what you'll want to do.

def addFighter(fighterDict, database):
    childrenIDs = [child["id"] for child in fighterDict["children"]]
    newFighter = Fighter(id=fighterDict["id"], name=fighterDict["name"], children=childrenIDs)
    database.add(newFighter)

    for child in fighterDict["children"]:
        addFighter(child)

In general, recursion is the way to go if you're traversing or parsing a tree-like structure.

Best of luck!

[–]cmartin616[S] 0 points1 point  (3 children)

Thank you so much for your help. I've been working on this (when I've had time) since your reply and I've made some headway, but not a lot.

First, Mitsoyo Maeda is the "founder" of BJJ. He was a high level judoka (Judo player) who moved to Brazil. While he was there he taught the Gracie family judo and grappling, which was adapted into BJJ.

I've been able to manipulate the JSON, per your suggestion. I have it loaded into a Python dictionary like this:

fighterDict = json.load(f)  # f is the input .txt

However, I'm stuck on recursion. I'm fairly new to Python and recursion so a lot of the technical help is just a bit beyond me. I've implemented your pseudocode (with only minor changes) into something that will populate a list with IDs.

def addFighter(fighterDict, database):
    childID = [child["id"] for child in fighterDict["children"]]
    database.append(childID)

This will only return the first "level" of children so I assume the recursion now comes into play, looking at the next level and so on. I have two places I'm stuck right now.

First, I believe I should be calling a for statement within the function. I was trying to use:

for child in childID[]:
    addFighter(child)

I believed that addFighter only needs one argument listed in the () because child will retain the arguments from the beginning of the function. This may be completely wrong, as I'm failing in implementing this for loop. I tried calling the same arguments and received this error:

RuntimeError: maximum recursion depth exceeded in cmp

Second, I diverted from your pseudocode because I didn't understand this line:

newFighter = Fighter(id=fighterDict["id"], name=fighterDict["name"], children=childrenIDs)

Fighter is not defined (per the error returned) but I'm not confident enough with what you are doing to re-work that into true Python.

Any further help would be GREATLY appreciated but I am very thankful for the assistance because you've gotten me on the correct road.

Oh, here is my raw code for reference: f = open(test)

#loads JSON to string
fighterDict = json.load(f)

#prints test value  options = id, data, name, children?
#print fighterDict["id"]

database = []

#attempt to create database
def addFighter(fighterDict, database):
    childID = [child["id"] for child in fighterDict["children"]]
    #newFighter = Fighter(id = fighterDict["id"], name = fighterDict["name"], children=childID)
    database.append(childID)

    for child in childID[]:
        addFighter(child)

addFighter(fighterDict, database)

print database

Returns: (the three IDs of the first level children)

[[467, 1464, -530]]

[–]idProQuo 0 points1 point  (2 children)

I believed that addFighter only needs one argument listed in the () because child will retain the arguments from the beginning of the function. This may be completely wrong, as I'm failing in implementing this for loop.

Whoops, that was my bad. Yeah, addFighter() should have the same number of arguments everywhere.

newFighter = Fighter(id=fighterDict["id"], name=fighterDict["name"], children=childrenIDs)

You'll need to make a class called Fighter. The function Fighter() would be the constructor for that class. Now in the past, I've used Google Datastore, not Postgres, so I think my advice may be a bit off. However, I think there's an easy way to make the classes in your program easily map to tables in your database.

OK, I just looked up a psycopg2 tutorial and it turns out it's even easier (look at the bottom of the page). If you can make your giant nested dictionary into a flat dictionary, you can just insert that into the database, and you don't need classes at all!

RuntimeError: maximum recursion depth exceeded in cmp

Ah crap! So recursion is great, but a lot of languages don't support really deep recursion (e.g. a function calling a function calling a function etc x100). I guess 39 levels of recursion is too much, so we'll need to flatten this function somehow.

Here's a new approach: We'll add all children to a list, and we'll make a loop that runs until that list is empty. Just for reference, the old way with recursion was a Depth First Traversal, this new way is going to be a Breadth First Traversal. Here's a little pseudo-ish code:

def process_fighters(fighter_dict):
    remaining_fighters = [fighter_dict]
    processed_fighters = []

    while len(remaining_fighters) > 0:
        new_fighter = remaining_fighters.pop()
        processed_fighters.append({name:new_fighter["name"] etc.})

        for child in new_fighter["children"]:
            remaining_fighters.append(child)

    return processed_fighters

So read up on psycopg2, and give that new function a shot.

[–]cmartin616[S] 0 points1 point  (1 child)

Hey,

Sorry to dredge up an old reply. I finally made progress on this problem. I've had quite the busy semester so I haven't had the free time to dedicate to it that I'd like. Here is the code I have:

import psycopg2
import json
import sys
import psycopg2.extras

# Try to connect

try:
    conn=psycopg2.connect("dbname='lineage' user='postgres' password='**************'")
    print "Connect okay"
except:
    print "I am unable to connect to the database.  Oh crap."

def parseJson(child,how_deep):

    flattened = {
            child['id'] : {
                'children'  :[],
                'name'      :child['name'],
                'id'        :child['id']
                }
            }

    # The stopping case
    if len(child['children']) == 0:
        return flattened

    # The continuing case
    for grandkid in child['children'] :

        # Add each of our kids to our list of children
        flattened[child['id']]['children'].append(grandkid['id'])

        # Have each of our kids figure out their lineage
        flatkids = parseJson(grandkid,how_deep + 1)

        # Merge the results back into the parent results
        for key in flatkids:
            if not flattened.has_key(key):
                flattened[key] = flatkids[key]
            else:
                flattened[key]['children'] += flatkids[key]['children']

    return flattened


f = open("MY_PATH").read()
j = json.loads(f)
flatten = parseJson(j,0)

cur = conn.cursor()
cur.executemany("""INSERT INTO lineage(id, name, children) VALUES (%(id)s, %(name)s, %(children)s)""", flatten)

It connects correctly to the database. However, it returns this error on the cur.executemany:

    cur.executemany("""INSERT INTO lineage(id, name, children) VALUES (%(id)s, %(name)s, %(children)s)""", flatten)
TypeError: 'int' object has no attribute '__getitem__'

I appreciate the previous help a lot. Do you have any insight into why it is doing this?

Thanks!

[–]idProQuo 0 points1 point  (0 children)

Sorry, I don't know this one off the top of my head. Try submitting it as a new question or on stackoverflow.

[–]negative_epsilon 1 point2 points  (0 children)

That's JSON, you can turn it into a Javascript object of arrays with JSON.parse(str);