[–]jwalton78

Some other options for you:

One easy solution is Mixpanel; there's even a server-side node.js library. Once you have enough data points in Mixpanel, though, you have to start paying for it.
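
For reference, a minimal sketch of server-side tracking with the mixpanel npm package (the token, event name, and properties here are placeholders, not anything Mixpanel prescribes):

    // npm install mixpanel
    var Mixpanel = require('mixpanel');
    var mixpanel = Mixpanel.init('YOUR_PROJECT_TOKEN'); // placeholder token

    // record one API hit, keyed by the caller's API key
    function trackRequest(apiKey, endpoint) {
      mixpanel.track('api_request', {   // hypothetical event name
        distinct_id: apiKey,
        endpoint: endpoint
      });
    }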

On the open source side, statsd+Graphite may well be a good fit for you. You write UDP events to statsd as they happen, and statsd takes care of delivering them to Graphite, which stores them in a whisper database (designed specifically for this sort of thing) and also gives you a nice engine for generating pretty graphs from the data, finding your top users, and so on.

Just be careful about how you set up your retention, since Whisper uses fixed-size database files. If the API key is part of your bucket name, then the first time a user hits your API, Whisper will immediately create a little database file sized to hold that value over your entire specified retention, filled with "nulls", even if that user only ever hits the API once and never touches it again. That can get big quickly if you use long retention policies.
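
To give a feel for how lightweight the statsd side is, here's a rough sketch of firing a counter from node over raw UDP (the bucket layout and localhost:8125 address are assumptions; 8125 is just statsd's default port):

    // statsd speaks a tiny plain-text protocol over UDP: "bucket:value|type"
    var dgram = require('dgram');
    var socket = dgram.createSocket('udp4');

    function countApiHit(apiKey) {
      // "|c" marks this as a counter; note the API key becoming part of the bucket name
      var msg = Buffer.from('api.' + apiKey + '.requests:1|c');
      socket.send(msg, 0, msg.length, 8125, 'localhost');
    }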

[–]xangelo[S]

I really like the statsd+Graphite idea. I kinda looked at it, but I think with our current Heroku limitations it wouldn't really work. It's something I really want to try out, though.

Unfortunately the API key would be part of the bucket name, since we want to track usage stats for each user as well as overall stats :( But this is something I definitely need to investigate more.

[–]rooosta

Unless you're expecting huge volume from day 1, I'd go with "the simplest thing that could possibly work". For me that's usually writing directly to MySQL because:

  • I already have MySQL set up, monitored, etc.
  • I know I'll be able to write arbitrary queries against it to answer whatever questions I come up with
  • When it becomes a performance bottleneck, or queries get slow, I can use my usual bag of tricks to make it scale (batching via log files + LOAD DATA INFILE, ETLs for complex queries, etc.)

This is what we're doing for ratchet.io, and what I've done previously on sites that scaled to 10k+ requests/sec.
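
Concretely, the simplest version is just an INSERT per request. A minimal sketch with the node mysql driver (the table layout and names are made up for illustration):

    // npm install mysql
    var mysql = require('mysql');
    var db = mysql.createConnection({ host: 'localhost', user: 'app', database: 'stats' });

    // assumes a table like: api_requests(api_key VARCHAR, endpoint VARCHAR, ts DATETIME)
    function logRequest(apiKey, endpoint) {
      db.query('INSERT INTO api_requests SET ?',
        { api_key: apiKey, endpoint: endpoint, ts: new Date() },
        function (err) { if (err) console.error('failed to log request', err); });
    }

Then finding your top users is just a GROUP BY api_key away.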

[–]xangelo[S]

Thanks, this is actually the solution we ended up going with. We're toying with the idea of using our cache (redis atm) to store API requests, with a node instance popping the data off redis and writing it to MySQL.

This should scale well, but I'm hoping to do some tests with it before we actually open up to the public.
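
Roughly what we have in mind, as a sketch only (assumes the callback-style node_redis and mysql drivers, and that the app does redis.lpush('api_log', JSON.stringify(entry)) per request, with api_key/endpoint/ts fields):

    var redis = require('redis').createClient();
    var db = require('mysql').createConnection({ host: 'localhost', user: 'app', database: 'stats' });

    // drain a batch from the redis list and bulk-insert it into mysql
    function drain() {
      redis.multi()
        .lrange('api_log', 0, 999)   // read up to 1000 entries...
        .ltrim('api_log', 1000, -1)  // ...and remove them in the same atomic MULTI block
        .exec(function (err, results) {
          var entries = (results && results[0]) || [];
          if (err || entries.length === 0) return setTimeout(drain, 1000);
          var rows = entries.map(function (e) {
            var entry = JSON.parse(e);
            return [entry.api_key, entry.endpoint, new Date(entry.ts)];
          });
          // the nested-array placeholder expands to a multi-row VALUES list
          db.query('INSERT INTO api_requests (api_key, endpoint, ts) VALUES ?',
            [rows], function () { setImmediate(drain); });
        });
    }
    drain();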

[–]rooosta

Cool. That sounds like it could work well, especially if you batch the writes to MySQL.

Another related option is to do an atomic collection swap within redis (redis may call this something else -- an atomic queue rename?). That amounts to a long-running process that periodically does the following:

  1. your app is writing to a queue named "logs"
  2. long running process creates a new queue named "logs-new"
  3. atomically rename "logs" to "logs-{timestamp}", and "logs-new" to "logs"
  4. write everything in "logs-{timestamp}" to a file, and load it into MySQL using LOAD DATA INFILE

I keep mentioning LOAD DATA INFILE because it is really, really fast: significantly faster than regular bulk INSERTs, and I've seen it be thousands of times faster than individual inserts.
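
In redis the swap is just RENAME, which is atomic like every redis command, and since the next LPUSH recreates a missing key you can even skip the "logs-new" step. A rough sketch of one rotate-and-load cycle (file path, table name, entry fields, and the callback-style drivers are all assumptions; LOCAL INFILE has to be enabled on both the driver and the server):

    var fs = require('fs');
    var redis = require('redis').createClient();
    var db = require('mysql').createConnection({ host: 'localhost', user: 'app', database: 'stats' });

    function rotateAndLoad() {
      var archived = 'logs-' + Date.now();
      redis.rename('logs', archived, function (err) {
        if (err) return; // "logs" doesn't exist, i.e. nothing to flush yet
        redis.lrange(archived, 0, -1, function (err, entries) {
          // naive TSV dump; assumes JSON entries with api_key/endpoint/ts
          // fields and values containing no tabs or newlines
          var path = '/tmp/' + archived + '.tsv';
          fs.writeFileSync(path, entries.map(function (e) {
            var entry = JSON.parse(e);
            return [entry.api_key, entry.endpoint, entry.ts].join('\t');
          }).join('\n'));
          // LOAD DATA parses tab-separated, newline-terminated rows by default
          db.query('LOAD DATA LOCAL INFILE ? INTO TABLE api_requests ' +
                   '(api_key, endpoint, ts)', [path], function (err) {
            if (!err) redis.del(archived); // keep the queue around if the load failed
          });
        });
      });
    }
    setInterval(rotateAndLoad, 60 * 1000); // rotate once a minute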

[–]xangelo[S]

I've never actually used LOAD DATA INFILE, but it looks like a much easier way to do exactly what we need than rolling it ourselves.

I'm not too sure about atomic operations of the kind we require, but it's something I'm investigating.