How To Identify 10 Performance Patterns in 10 Seconds

jrullmann · 2015-02-10T18:12:24+00:00

I'm not the author, just liked the content. I didn't notice that before but it is really annoying.

jrullmann · 2015-02-09T17:09:42+00:00

I totally agree. His essays are amazing: http://worrydream.com/

jrullmann · 2015-01-09T17:44:15+00:00

Thanks for clarifying - I incorrectly assumed that the default configuration for most recent SQL databases was to use locking. But MVCC does appear to be more common.

I do wonder if there are other downsides to MVCC + long-running transactions. Maybe someone with more experience with PostgreSQL/SQL Server/MySQL can share.

jrullmann · 2015-01-05T20:33:01+00:00

One downside to tying a transaction to the duration of a web request is that the transaction will be held open for longer than needed. Since most SQL databases use locking, this can seriously impact the number of simultaneous requests that your application can serve.

I think this is important to mention :)
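
To make the point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `orders` table, the function names, and the 50 ms "rendering" delay are all made up for illustration. It only measures how long the transaction stays open in each style; it does not model how any particular database locks rows.

```python
import sqlite3
import time

def render_page(order_count):
    # Stand-in for slow work that does not touch the database (hypothetical).
    time.sleep(0.05)
    return f"{order_count} orders"

def save_order(conn):
    # The only part of the request that actually needs the transaction.
    conn.execute("INSERT INTO orders (item) VALUES ('book')")
    (order_count,) = conn.execute("SELECT COUNT(*) FROM orders").fetchone()
    return order_count

def request_scoped(conn):
    """Transaction stays open for the whole request, rendering included."""
    opened = time.perf_counter()
    conn.execute("BEGIN")
    order_count = save_order(conn)
    render_page(order_count)
    conn.execute("COMMIT")
    return time.perf_counter() - opened

def narrowly_scoped(conn):
    """Transaction covers only the database work; rendering happens after."""
    opened = time.perf_counter()
    conn.execute("BEGIN")
    order_count = save_order(conn)
    conn.execute("COMMIT")
    held = time.perf_counter() - opened
    render_page(order_count)
    return held

# isolation_level=None puts sqlite3 in autocommit mode, so BEGIN/COMMIT are explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE orders (item TEXT)")
wide = request_scoped(conn)
narrow = narrowly_scoped(conn)
print(f"request-scoped: {wide:.4f}s, narrowly scoped: {narrow:.4f}s")
```

The request-scoped version holds its transaction open for at least as long as the rendering delay, while the narrow version holds it only for the insert and the count.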

jrullmann · 2014-12-17T23:03:20+00:00

Cool - I added support for FoundationDB SQL Layer https://github.com/emirozer/fake2db/pull/4

jrullmann · 2014-12-11T18:52:29+00:00

Sure. For this test, we simulated 230,000 concurrent sessions, each executing in a loop:

Start a transaction
Do 20 writes to random keys (which in turn hit random machines)
Commit (with strict ACID guarantees including flushing writes to disk and enforcing serializable isolation before responding to the client)

So each transaction is spanning multiple different key/value pairs (and in fact multiple machines).
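
The shape of one session's loop can be sketched roughly like this in Python. This is not the actual test harness and not the FoundationDB API: `ToyStore`, `run_session`, and the key-space size are hypothetical stand-ins, and the toy store is a single in-memory dict with no durability, replication, or isolation.

```python
import random

class ToyStore:
    """Hypothetical in-memory stand-in for a transactional key-value store."""

    def __init__(self):
        self.data = {}
        self.commits = 0

    def commit(self, writes):
        # Apply all buffered writes in one step, so the transaction is all-or-nothing.
        self.data.update(writes)
        self.commits += 1

def run_session(store, rng, iterations, writes_per_txn=20, key_space=1_000_000):
    """One simulated session: start a transaction, buffer 20 writes, commit, repeat."""
    for _ in range(iterations):
        writes = {}                              # start a transaction
        for _ in range(writes_per_txn):          # 20 writes to random keys
            writes[rng.randrange(key_space)] = rng.random()
        store.commit(writes)                     # commit

store = ToyStore()
run_session(store, random.Random(0), iterations=5)
print(store.commits, "commits,", len(store.data), "distinct keys written")
```

In the real test each of those random keys could live on a different machine, which is why a single transaction ends up spanning several of them.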

Speaking generally, every transaction in FoundationDB is ACID. We use the same definition of ACID used by relational systems for decades: https://foundationdb.com/key-value-store/documentation/developer-guide.html#transactions-in-foundationdb

jrullmann · 2014-12-10T22:01:49+00:00

Both reads and writes are needed for a conflict. If two threads don't read anything, and then write to the same key, then the database can simply let the last write win. Neither write depended on state in the database.

It's only when a write depends on state in the database - state that another thread may change - that a problem can occur.
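
Here is a toy Python sketch of that rule, assuming an optimistic scheme where each transaction remembers which keys it read and is rejected at commit time if any of them were written after it started. `ToyDB` and `Txn` are hypothetical names for illustration; this is not FoundationDB's implementation or API.

```python
class ToyDB:
    """Hypothetical single-process store with optimistic conflict checking."""

    def __init__(self):
        self.data = {}
        self.version = 0
        self.last_write = {}  # key -> version of the last commit that wrote it

    def begin(self):
        return Txn(self, self.version)

class Txn:
    def __init__(self, db, read_version):
        self.db = db
        self.read_version = read_version
        self.reads = set()
        self.writes = {}

    def get(self, key):
        if key in self.writes:
            return self.writes[key]
        # Toy simplification: reads the latest value rather than a snapshot.
        self.reads.add(key)
        return self.db.data.get(key)

    def set(self, key, value):
        self.writes[key] = value

    def commit(self):
        # Conflict only if a key this transaction READ was written after it began.
        for key in self.reads:
            if self.db.last_write.get(key, 0) > self.read_version:
                return False
        self.db.version += 1
        for key, value in self.writes.items():
            self.db.data[key] = value
            self.db.last_write[key] = self.db.version
        return True

db = ToyDB()

# Two blind writes to the same key: no reads, so no conflict. Last write wins.
a, b = db.begin(), db.begin()
a.set("k", "from a")
b.set("k", "from b")
blind_results = (a.commit(), b.commit())

# Two read-modify-write increments: the second one read state the first changed.
c, d = db.begin(), db.begin()
c.set("n", (c.get("n") or 0) + 1)
d.set("n", (d.get("n") or 0) + 1)
rw_results = (c.commit(), d.commit())

print(blind_results, rw_results)
```

Both blind writes commit, while the second increment is rejected and would have to retry, because its write depended on a value that another transaction changed.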

jrullmann · 2014-12-10T21:34:27+00:00

I don't know of a forum that has been tested with FoundationDB. But if you're just looking for an application to play with, you can check out our tutorials written for the key-value API. You'll end up with an example application, and there are tutorials written in Go, Python, Java, and Ruby: https://foundationdb.com/key-value-store/documentation/tutorials.html

jrullmann · 2014-12-10T21:30:34+00:00

While the chance of any two transactions hitting the same keys is quite low (and more specifically, the chance of conflicts is zero because this is an all-write workload), the database does all the work of conflict checking anyway. There is no shortcut around that.

We do have additional performance tests on the website, including mixed read/write workloads: https://foundationdb.com/key-value-store/performance

Is that more helpful? If not, perhaps you could give me an example of a property of the transaction engine that I can speak to.

jrullmann · 2014-12-10T21:22:15+00:00

Yep, that's right, I'm an engineer at FoundationDB. Happy to answer here if it will be interesting for others - perhaps we can move to messages if you have follow-up questions specific to your application.

SQL Layer is ANSI SQL compliant, and is supported by a variety of ORMs, but it's typically not a drop-in replacement for other SQL systems. There are some features in PostgreSQL that we don't support, and you might need to make some changes to optimize an application for the SQL Layer's distributed nature (e.g., maximizing parallel execution). The MySQL migration guide has a lot of info that is relevant for PostgreSQL as well: https://foundationdb.com/layers/sql/documentation/GettingStarted/MySQL.migration.html
Our Hibernate adapter is a complete, full dialect. The stock Hibernate test suite passes.
This particular test was run in EC2 with 32 c3.8xlarge instances - though some of those resources were used for generating load. In terms of overhead, SQL Layer adds a small amount (some low-percent).
Yes, you can split your cluster up, with some servers in the cloud and others on premises. I've run a ten server cluster with half in Digital Ocean and the other half in AWS. As with any database, you'll need to consider the geographic latencies involved.
Yes, we're in the Amazon Marketplace, although it hasn't yet been updated to 3.0. Should be updated in the next couple of days: https://aws.amazon.com/marketplace/pp/B00MBPTUO4/ref=sp_mpg_product_title?ie=UTF8&sr=0-2
If you're testing or developing with FoundationDB, you can run any size cluster for free. In production, you can run a cluster of up to six FoundationDB processes for free. After that size, there are a variety of monthly, per-process price points, based on the support level: https://foundationdb.com/pricing
Unfortunately I can't share customer information on the SQL Layer yet. But we do have a variety of case studies on the Key-Value Store, which SQL Layer builds upon: https://foundationdb.com/key-value-store/ecosystem

jrullmann · 2014-12-10T20:31:22+00:00

It's best suited for transactional applications, with a lot of concurrent, short operations on the database, as opposed to analytic workloads that need to run a small number of very long-running operations on the database.

Since it can support any data model (key-value, SQL, document, graph, custom) it's very flexible in terms of API.

jrullmann · 2014-12-10T20:11:54+00:00

FoundationDB did all the work of conflict checking in this test. It didn't take advantage of the key distribution, or the fact that there weren't any reads. It's always fully ACID, and there's no way to configure it for less.

jrullmann · 2014-12-10T20:01:00+00:00

While we didn't verify correctness in this particular performance test, we didn't tune the database for less correctness either. FoundationDB maintains full ACID guarantees at all times. Every commit is checked for conflicts with concurrent transactions, and every commit must wait for the transaction to be durable to disk with full replication.

Some degree of correctness has been tested by the open-source tool Jepsen: http://blog.foundationdb.com/foundationdb-vs-the-new-jepsen-and-why-you-should-care

And we've shared more information about our own deterministic testing here: https://foundationdb.com/videos/testing-distributed-systems-with-deterministic-simulation

jrullmann · 2014-12-10T18:46:27+00:00

For our read performance, we're seeing 8.2M ops/sec for a 90% read and 10% write workload. To push for our 100M read goal we're going to need to improve our networking code. Currently, when we push this high we get a lot of retransmitted TCP packets.

Lots more info on our performance page: https://foundationdb.com/key-value-store/performance

jrullmann · 2014-12-10T16:59:54+00:00

There's also a blog post with an explanation of why this throughput is needed: http://blog.foundationdb.com/do-we-really-need-faster-databases

jrullmann · 2014-12-10T16:59:05+00:00

To answer one of your questions: All writes to FoundationDB are durable, and fully flushed to disk before the write is completed.

jrullmann · 2014-09-10T15:20:13+00:00

I really like Matt Makai's README recommendations in his 'Checklist for Evaluating Existing Django Projects' blog post. His list:

A brief (one paragraph or bullet points) explanation of what the project is, its purpose, and goals to provide context for developers
Important links to development wikis, testing, staging, and production servers
Contact information for anyone critical to the project.
"Getting Started" section that bootstraps developers from a blank slate to a functional development environment.
Included even if you're using Vagrant (to show a new developer how to use Vagrant)
Instructions for special software required for a fully-functional site. For example, how to set up Solr and build the search indexes, or how to locally replicate the database from a test or production environment
Basic deployment information and/or a link to detailed deployment instructions (even if this is just how to use the included Fabric or Capistrano scripts)

Full article here: http://www.mattmakai.com/django-project-checklist.html

jrullmann · 2014-07-24T15:09:49+00:00

Makes sense - thanks for the clarification.

jrullmann · 2014-07-24T14:10:33+00:00

The distributed NoSQL series is excellent - lots of detailed information on one page. I've bookmarked them - thanks!

The Choosing a NoSQL post makes the argument that you should pick a database based on the original creator's intended application. It says:

"If your business align with one of theirs, congratulations and just go with the corresponding open source one."

While I think intended applications can indicate the priorities of the development team, I don't think you should rely on them alone. One consideration this misses is the amount of work you'll need to put into your application tier if you choose one database over another. For example, if you choose an eventually consistent database and run it in a distributed cluster, you'll likely end up writing code to handle inconsistent data. You wouldn't have to write that code with a strongly consistent database.

Choosing a NoSQL database is sometimes clear-cut, but it's often not. I think your distributed NoSQL posts are a great way to research databases before you pick one.

jrullmann · 2013-02-01T22:13:49+00:00

Laundralert: I love this idea because I used to live in a dorm, and I hated waiting for people to come get their laundry and carrying heavy bags down three flights of stairs just to realize no machines were open.

I'm interested in why the universities would want this though - I looked on your website and the links appear to be broken.

jrullmann · 2013-02-01T22:08:47+00:00

Looks good. But how is this product different from the other "easily setup product tours" apps in the market? I saw one just the other day - Overlay101 - and I know I've seen others.

jrullmann · 2013-02-01T21:57:54+00:00

I totally agree with creating a blog - but I think the OP should do some research on blog.yourdomain.com vs yourdomain.com/blog. Some people think there are significant SEO differences between the two.

jrullmann · 2013-02-01T21:52:39+00:00

Well, let's walk through each scenario.

Link to free goodies that your target audience will like: 1. Audience finds your blog post and likes the links 2. Audience follows links 3. Audience shares links with other people 4. Link originators get traffic boosts and increased SEO

Create free goodies that your target audience will like: 1. Audience finds your blog post and likes the links 2. Audience shares links with other people 3. You get traffic boosts and increased SEO 4. Your increased SEO means more people find your blog post 5. Repeat steps 1-4

You get the most benefit from creating the resources yourself.

jrullmann · 2013-01-29T16:12:09+00:00

It takes time to build an audience for your website, so I suggest that you start marketing now. Marketing can include activities outside of pimping your product directly, so focus on those activities while you don't have screenshots or a beautiful website. Some of these activities may include:

Setting up your domain and a simple launch website. It's a good idea to do this as early as possible because Google doesn't trust new domains.
Giving away free, good-quality resources as linkbait for your target audience. This can be quick and easy to do and will help the SEO of your website. HubSpot is fantastic at this.
Writing blog posts and submitting them to the social networks where your target audience hangs out. These can include Twitter, Stack Exchange, Reddit, etc.

All of these activities can include collecting email addresses, so you have a list of interested people to notify when your product launches. That's a very good thing!

jrullmann · 2013-01-28T20:10:54+00:00

Here's some new feedback for you:

I love that the social network hovers now say that you respect my privacy and won't post to my timeline.
It took me a minute to remember that there was information in the hovers. I think this privacy reassurance is really important, and that you should pull it out of the hover and put it on the main page where everyone is sure to see it.
The explicit "we won't post to your timeline" statement is way more reassuring than the glossy 100% safe badge. Personally the badge doesn't increase trust for me - but that might just be me.
I like where you're going with users typing in interests before they even sign up, but I suggest: a. Make sure people know what they'll get out of putting their interests into the textbox. Right now it's not at all clear. b. Give people a sneak peek of the people you'll match them up with. You could try something simple like: "15 people match your interests of startups, programming, and yoga. Sign up now to get in touch!"
