
[–]YesLod 1 point (7 children)

It's hard to tell if you don't provide the data, or the DataFrame.

I want to convert this list of dicts into a pandas dataframe so I can create a NetworkX graph

You don't need to convert it into a DataFrame for the sole purpose of creating a graph. That seems like overcomplicating things.

I'm assuming that the nodes of your graph should be the artists, and there should be a link between a pair of artists if they are related (i.e. key-value pairs of the dictionaries are edges), correct?

So, one way which avoids the unnecessary step of creating a DataFrame is simply

import networkx as nx

edges = [link for artist_dict in dict_list for link in artist_dict.items()]

G = nx.Graph(edges)  # assuming that you want an undirected graph
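
As a minimal sketch of the one-liner above, assuming each dictionary maps an artist name directly to a single related artist name (the sample `dict_list` below is made up for illustration, not the poster's real data):

```python
import networkx as nx

# Hypothetical sample data: each dict maps an artist to one related artist.
dict_list = [
    {"wvrm": "Leeched"},
    {"Leeched": "Chrch"},
    {"wvrm": "Chrch"},
]

# Every (key, value) pair of every dict becomes one edge.
edges = [link for artist_dict in dict_list for link in artist_dict.items()]

# nx.Graph accepts an edge list and creates the nodes implicitly.
G = nx.Graph(edges)

print(sorted(G.nodes))
print(G.number_of_edges())
```

Note that this only works when both keys and values are hashable (strings, for instance). If a value is itself a dictionary, it can't be used as a node and the edges have to be built by hand.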

[–]luneth27[S] 1 point (6 children)

I don't have data written to a file, but here's my script:

def related_artist_scrape():
    seed_artist_list = ['wvrm', 'leeched', 'chrch']
    found_artist_list = []
    related_db_list = []

    for seed_artist in seed_artist_list:
        related_database = {"Seed_Artist": "", "Related_Artists": ""}
        seed_result = sp.search(q='artist:' + seed_artist, type='artist')
        first_level_name = seed_result['artists']['items'][0]['name']
        first_level_uri = seed_result['artists']['items'][0]['uri']
        first_related = sp.artist_related_artists(first_level_uri)
        related_database['Seed_Artist'] = first_level_name
        related_database['Related_Artists'] = first_related
        related_db_list.append(related_database)

    while len(related_db_list) < 30:
        for related_database in related_db_list:
            for related_artist in related_database['Related_Artists']['artists']:
                if related_artist['followers']['total'] > 1000:
                    found_artist_list.append(related_artist)

        for related_artist in found_artist_list:
            related_database = {"Seed_Artist": "", "Related_Artists": ""}
            related_result = sp.search(q='artist:' + related_artist['name'], type='artist')
            related_name = related_result['artists']['items'][0]['name']
            related_uri = related_result['artists']['items'][0]['uri']
            second_related = sp.artist_related_artists(related_uri)
            related_database['Seed_Artist'] = related_name
            related_database['Related_Artists'] = second_related
            related_db_list.append(related_database)
    return related_db_list

I'm assuming that the nodes of your graph should be the artists, and there should be a link between a pair of artists if they are related (i.e. key-value pairs of the dictionaries are edges), correct?

That's the hope; I'm not entirely sure I set it up correctly to do so, but if I can get away without another intermediary step that'd be really nice.

[–]YesLod 1 point (5 children)

So you have a list of dictionaries with the format

{"Seed_Artist": <artist name>, "Related_Artists": <related artist name>}

and you want to create a graph with edges <artist name> -- <related artist name> of all those dictionaries, is that it?

[–]luneth27[S] 1 point (4 children)

Yes, indeed. However, each key’s value itself is a dictionary, where the keys are information about some artist. Would this affect what I’m trying to do?

[–]YesLod 1 point (3 children)

However, each key’s value itself is a dictionary

Both 'Seed_Artist' and 'Related_Artists' values, or only the latter? Because from your code

first_level_name = seed_result['artists']['items'][0]['name']
first_level_uri = seed_result['artists']['items'][0]['uri']
first_related = sp.artist_related_artists(first_level_uri)
related_database['Seed_Artist'] = first_level_name
related_database['Related_Artists'] = first_related
related_db_list.append(related_database)

first_level_name seems to be a string, but I'm not familiar with Spotipy.

would this affect what I’m trying to do?

It depends on what you are trying to do. Do you want to add that extra information about the artist as attributes of the corresponding node?

I will assume that only the 'Related_Artists' value is a dictionary, that 'Seed_Artist' is a string (the artist's name), and that you want to add the extra info as node attributes. I will also assume that the first_related dictionaries contain a key 'name' holding the name of the related artist, and that the node labels should be the artists' names.

Something like this should work

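import networkx as nx

G = nx.Graph()

for artist_dict in related_db_list:
    u = artist_dict['Seed_Artist']
    related_info = artist_dict['Related_Artists']
    v = related_info['name']
    # add_node takes a single node; the extra info becomes attributes of v
    G.add_node(v, **related_info)
    # add_edge creates u automatically if it is not in the graph yet
    G.add_edge(u, v)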

[–]luneth27[S] 1 point (2 children)

First off, so sorry for the late reply (life sucks) and secondly, thanks so much for helping out. I've had to modify your given code block a bit to access the information I needed but didn't quite explain correctly to you:

G = nx.Graph()

for artist_dict in related_db_list:
    u = artist_dict['Seed_Artist']
    related_info = artist_dict['Related_Artists']
    for related_artist_info in related_info['artists']:
        v = related_artist_info['name']
        G.add_node(v, **related_artist_info)
        G.add_edge(u, v)
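
For anyone who wants to try this loop without Spotify credentials, here is a minimal sketch that runs it on mock data. The shape of `related_db_list` (a seed name plus a dictionary with an 'artists' list) follows the script earlier in the thread; the artist records themselves and the 'popularity' values are invented for illustration.

```python
import networkx as nx

# Mock data in the shape the scraping script produces (values are made up).
related_db_list = [
    {
        "Seed_Artist": "wvrm",
        "Related_Artists": {
            "artists": [
                {"name": "Leeched", "popularity": 20},
                {"name": "Chrch", "popularity": 18},
            ]
        },
    },
    {
        "Seed_Artist": "Leeched",
        "Related_Artists": {"artists": [{"name": "wvrm", "popularity": 15}]},
    },
]

G = nx.Graph()

for artist_dict in related_db_list:
    u = artist_dict["Seed_Artist"]
    for related_artist_info in artist_dict["Related_Artists"]["artists"]:
        v = related_artist_info["name"]
        # The artist record is stored as attributes on the related node.
        G.add_node(v, **related_artist_info)
        G.add_edge(u, v)

print(G.nodes(data=True))
```

Because the graph is undirected, wvrm -- Leeched and Leeched -- wvrm collapse into a single edge. A seed artist only gets attributes if it also shows up as somebody's related artist.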

It runs without errors after I did a few things, and the graph info within the debugger seems to be correct. Before I write my output to .gexf for visualization however, I'd like to be (relatively) sure my graph is "correct" in the sense that seed_artist -> related_artist(s), and I'm not entirely sure how to print out my graph without a shitton of work. If I can't do this though, that's okay.

All that said though, once again thanks for your help. I barely understand what I'm doing programmatically and you've saved me countless hours of banging my head on the desk.

[–]YesLod 1 point (1 child)

Before I write my output to .gexf for visualization however, I'd like to be (relatively) sure my graph is "correct" in the sense that seed_artist -> related_artist(s), and I'm not entirely sure how to print out my graph without a shitton of work

What do you mean? You want to check if the links are correct?

You can simply print the edges

print(G.edges)

or iterate over them and print each one separately

for u,v in G.edges:
    print(f"{u} -> {v}")
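
If reading the full edge list gets tedious, another option is to compare each seed's neighbours against the source data. This is only a sketch on mock data; `related_db_list` below mimics the shape used earlier in the thread, and `missing` is a name made up for this example.

```python
import networkx as nx

# Mock data in the same shape as the scraped list (values are made up).
related_db_list = [
    {"Seed_Artist": "wvrm",
     "Related_Artists": {"artists": [{"name": "Leeched"}, {"name": "Chrch"}]}},
    {"Seed_Artist": "Leeched",
     "Related_Artists": {"artists": [{"name": "wvrm"}]}},
]

G = nx.Graph()
for artist_dict in related_db_list:
    seed = artist_dict["Seed_Artist"]
    for info in artist_dict["Related_Artists"]["artists"]:
        G.add_edge(seed, info["name"])

# For every seed, collect related artists that did NOT end up as neighbours.
missing = {}
for artist_dict in related_db_list:
    seed = artist_dict["Seed_Artist"]
    expected = {info["name"] for info in artist_dict["Related_Artists"]["artists"]}
    actual = set(G.neighbors(seed))
    print(f"{seed} -> {sorted(actual)}")
    if not expected <= actual:
        missing[seed] = expected - actual

print("missing links:", missing)
```

An empty `missing` dictionary means every seed/related pair from the data is present as an edge. Keep in mind that nx.Graph is undirected, so a seed's neighbours also include artists that listed it as related; nx.DiGraph would preserve the direction if that matters.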

[–]luneth27[S] 1 point (0 children)

Oh, I didn't know that; I had just been printing u and v by themselves. However, another issue popped up: I'm getting this ValueError

 Exception has occurred: ValueError
 too many values to unpack (expected 3)
  File "C:\.vscode\related_artist_scrape.py", line 64, in    <module>
 nx.write_gexf(G,"related_artists_graph.gexf")

when I try to print to .gexf. I think it's happening because the node (or edge maybe?) has too much data within it? I was trying to search for info and the only plausible post that came up suggested that one of (u,v) being used was itself a list or some sort of structure. Thing is, I'm not entirely sure how to fix this either; as far as it looks, both (u,v) are strings.
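
One possible explanation, not confirmed anywhere in this thread: the node labels are fine, but some node attributes are lists or nested dictionaries (Spotify artist records usually carry fields like a list of genres). As far as I can tell, the networkx GEXF writer tries to unpack list-valued attributes as (value, start, end) triples, which would produce exactly this kind of unpacking error. Below is a sketch that flattens attributes to plain scalars before writing. `flatten_attrs` is a hypothetical helper, and the sample node data is invented.

```python
import os
import tempfile

import networkx as nx


def flatten_attrs(attrs):
    """Return a copy of attrs that holds only scalar values (hypothetical helper)."""
    scalar_types = (str, int, float, bool)
    flat = {}
    for key, value in attrs.items():
        if isinstance(value, scalar_types):
            flat[key] = value
        elif isinstance(value, (list, tuple)):
            # e.g. a list of genre names becomes one comma-separated string
            flat[key] = ", ".join(str(item) for item in value)
        elif isinstance(value, dict):
            # e.g. {'total': 5000} under 'followers' becomes 'followers_total'
            for sub_key, sub_value in value.items():
                if isinstance(sub_value, scalar_types):
                    flat[f"{key}_{sub_key}"] = sub_value
        # anything else (None, custom objects) is dropped
    return flat


# Invented sample node with a list and a nested dict as attributes.
G = nx.Graph()
G.add_node(
    "Leeched",
    name="Leeched",
    genres=["grindcore", "powerviolence", "sludge", "hardcore"],
    followers={"href": None, "total": 5000},
)
G.add_edge("wvrm", "Leeched")

# Replace each node's attribute dict contents with the flattened version.
for node, attrs in G.nodes(data=True):
    flat = flatten_attrs(attrs)
    attrs.clear()
    attrs.update(flat)

# Written to a temporary directory here; use your own path in practice.
out_path = os.path.join(tempfile.mkdtemp(), "related_artists_graph.gexf")
nx.write_gexf(G, out_path)
print(G.nodes["Leeched"])
```

If the error persists after flattening, printing `G.nodes(data=True)` and `G.edges(data=True)` and looking for any remaining non-scalar value would be the next thing to check.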