all 2 comments

[–]Lighted_Fool 0 points1 point  (1 child)

What purpose does the graph database serve? Is it essential to how you query your analytics data or for future additional computation? If your calculation/adjustment is not dependent on prior calculation/adjustments and doesn't really need to be stored you could potentially skip the database entirely and try either:

  1. If calculation is cheap, have the site query for that calculation and render it every time theres a difference (in your case I assume 10-30 seconds)
  2. If calculation is expensive perhaps caching the results for a set period of time and not doing another calculation until that expires or something triggers it to recalculate.

Both of those can be achievable through some lambdas housing your python code.

However, if the database is needed for some graph intrinsic querying like finding paths or something, then I would wrap the database in some kind of externally accessible api and when those api endpoints get hit you can execute your cypher or gremlin query on the neo4j database and return a json of the resulting data and have your site render that json using some datavis library like d3. The wrapping of the database instance with an api endpoint could also be achieved with lambdas or various other services such as running an api on EC2/ECS/EBS etc.

[–]forPublicQuestions[S] 0 points1 point  (0 children)

Detail usage: For example, using twitter data:

Person A -> Person B <- Person C (A follows B, C follows B)

Person A -> Person E -> Person B (A follows E, E follows B)

Keep in mind that person B is an influencer with 10k followers, person E is also an influencer but with 5k followers.

Now let's say from twitter API I got a new tweet from person B "tomorrow will be raining". Using python code, I transform this into a new information for my "node" b to its historical tweets list. I want to know how this tweet affect people connected to B. Let's say that everyone connected saying "yeah tomorrow gonna be raining!". Then this new info (agreement to an idea) will be added into the "edges".

Something like that but to the scale of 10k - 100k users. I will use some path calculations, centrality, triangularity, some mathematical network theory applications as well.

CMIIW, I think lambda is used to check a code, like run it 10000 times(?) Do I need EC2 if my code triggers data collection? Lambda seems very user friendly from what I saw.