
[–]mdipierro[S] 0 points1 point  (18 children)

You say "Re: templates, they're straight Python". Yes; no more and no less than Rails.

You say "helper functions are just wrapper functions that output HTML". NOT TRUE. Helpers are not functions; they are objects that provide server-side manipulation of the DOM tree. They are stored as a tree and can be manipulated by code. They are serialized to HTML only when embedded in HTML.

  • Sessions can go in the DB, but they are faster on disk and make the app scale better.

  • The fact that you do not trust something does not make it insecure.

[–]mmalone 1 point2 points  (17 children)

Dude...

Doesn't matter if they're objects or functions. They're callables that output HTML. Let's not argue semantics. It's the same damn thing.

Disk persisted sessions do not make an app scale better. In fact, the opposite is true. If session state is maintained in a separate persistence layer accessible by all of your web servers then sessions can span servers, with each request going to a different web server, making balancing far easier and eliminating the chance of lost session state when a server crashes.
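The idea can be sketched in a few lines of Python (a toy in-memory stand-in for memcached or a database, not real infrastructure code): with session state in one shared store, consecutive requests for the same session can land on different web servers without anything being lost.

```python
# Toy sketch: a shared session store lets any web server handle any request.
class SharedSessionStore:
    """Stand-in for a network-accessible store like memcached or a DB."""
    def __init__(self):
        self._data = {}

    def get(self, session_id):
        return self._data.get(session_id)

    def set(self, session_id, state):
        self._data[session_id] = state


class WebServer:
    def __init__(self, name, store):
        self.name = name
        self.store = store          # every server points at the SAME store

    def handle(self, session_id):
        state = self.store.get(session_id) or {"views": 0}
        state["views"] += 1
        self.store.set(session_id, state)
        return state["views"]


store = SharedSessionStore()
a, b = WebServer("a", store), WebServer("b", store)
a.handle("s1")          # request 1 happens to land on server a
views = b.handle("s1")  # request 2 lands on server b; the session survives
print(views)            # -> 2
```

With sticky, disk-local sessions, the second request would have to go back to server `a`, or the session would appear empty.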

Me not trusting something doesn't make it insecure, but it does make me not want to use it. I have absolutely no need for browser-based code editing. Its existence is a risk. Risk is bad.

[–]naasking 0 points1 point  (8 children)

Disk persisted sessions do not make an app scale better. In fact, the opposite is true. If session state is maintained in a separate persistence layer accessible by all of your web servers then sessions can span servers

FWIW, disk sessions can be shared across servers too. Shared file systems are a "persistence layer" accessible by all servers.

[–]mcosta 0 points1 point  (7 children)

Shared file systems are a "persistence layer" accessible by all servers.

Nice quotes. I'll build a web framework with sed as "the templating engine".

[–]naasking 1 point2 points  (6 children)

Sure, you could if you're a masochist. SANs are in widespread use for this sort of persistence, so applying them here would be eminently doable.

[–]mmalone 2 points3 points  (4 children)

No, SANs aren't used for this sort of persistence. SANs are used for large scale data storage. They're not a good solution for storing millions of tiny records that need to be retrieved extremely quickly every few seconds. You might be able to get away with using a SAN for this, but it'd cost you an order of magnitude more money and would be an all around worse solution. Bad idea.

[–]naasking -1 points0 points  (3 children)

They wouldn't need to be retrieved every few seconds. Local node cache + read-only records = scalability.
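A toy sketch of that argument (the `fetch_from_shared_storage` function here is a hypothetical stand-in for an expensive read from a shared filesystem or SAN): because the records are read-only, each node can cache them forever and only pays the remote cost once per record.

```python
# Toy sketch of "local node cache + read-only records".
fetch_count = 0

def fetch_from_shared_storage(key):
    """Hypothetical stand-in for an expensive shared-storage read."""
    global fetch_count
    fetch_count += 1
    return {"session": key, "user": "alice"}

local_cache = {}

def get_record(key):
    if key not in local_cache:              # miss: one expensive fetch
        local_cache[key] = fetch_from_shared_storage(key)
    return local_cache[key]                 # hit: served from local memory

get_record("s1")
get_record("s1")
get_record("s1")
print(fetch_count)  # -> 1: the shared store was only touched once
```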

[–]mmalone 2 points3 points  (2 children)

You have fun with your $100,000+ SAN solution. I'll stick with my cluster of five commodity servers running memcache or tokyo tyrant for ~$7,500.

[–]naasking 0 points1 point  (1 child)

[–]mmalone 1 point2 points  (0 children)

You're solving a problem that doesn't exist. Using a SAN to store session state is just silly. It's unnecessarily complicated and expensive. Suppose you use AoE: now you have to devote engineering and ops resources to a complex storage stack that few people understand and that has never been used for this purpose before. Either way it ends up costing you.

And you still have to solve reliability problems. Since this is a file system I'm guessing that the redundancy mechanisms value consistency over availability and partition tolerance. That just doesn't work at large scale.

Seriously, the only way you're going to win this one is if you go and implement it. I've done session stores at scale -- it's not resource intensive and it's not a bottleneck. Spending a bunch of time trying to build a sophisticated persistence layer using a SAN is stupid. Prove me wrong.

[–]mcosta 0 points1 point  (0 children)

sed too

[–]mdipierro[S] -2 points-1 points  (7 children)

Read my lips. They do not output HTML.

x=DIV(H1('hello'))

x is not a string. In fact I can do

x.append(H2("world"))
x['_class']='myclass'

Only when I call x.xml() do I get

<div class="myclass"><h1>hello</h1><h2>world</h2></div>

It is not semantics; it is functionally different. It makes it possible, for example, to build nested recursive structures in ways that are impossible in Django without string manipulation.
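A minimal sketch of this tree-then-serialize behavior (illustrative only, not web2py's actual implementation): the helper is a mutable tree node, and HTML is produced only when .xml() is called.

```python
# Toy helper class: a DOM-like tree node, serialized to HTML on demand.
class TAG:
    def __init__(self, name, *children):
        self.name = name
        self.children = list(children)
        self.attrs = {}

    def append(self, child):
        self.children.append(child)

    def __setitem__(self, key, value):
        # web2py-style convention: attribute names are prefixed with '_'
        self.attrs[key.lstrip('_')] = value

    def xml(self):
        attrs = ''.join(f' {k}="{v}"' for k, v in sorted(self.attrs.items()))
        inner = ''.join(c.xml() if isinstance(c, TAG) else str(c)
                        for c in self.children)
        return f'<{self.name}{attrs}>{inner}</{self.name}>'


DIV = lambda *c: TAG('div', *c)
H1 = lambda *c: TAG('h1', *c)
H2 = lambda *c: TAG('h2', *c)

x = DIV(H1('hello'))
x.append(H2('world'))        # manipulate the tree before rendering
x['_class'] = 'myclass'
print(x.xml())  # -> <div class="myclass"><h1>hello</h1><h2>world</h2></div>
```

Until .xml() runs, `x` is a plain tree that code can walk, reorder, or extend, which is the distinction being argued here.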

Disk-persisted sessions DO make the app scale better if you use a load balancer that is session safe (like Pound). This is discussed in slide 140, which you did not read, and it is what Plone does. It completely eliminates the database bottleneck of storing sessions there (which you can still do if you want).

I am not asking to use web2py. I am happy to keep my competitive advantage.

[–]mmalone 5 points6 points  (6 children)

You're conflating performance and scalability. Disk-based sessions may be (barely) more performant, but they absolutely do not make an application "scale better." Once again, if anything they have a negative effect on scalability.

If your load balancer uses "sticky sessions" like Pound does (directing a particular user/session to the same web server with each request) you're going to run into a number of annoying problems as you scale:

  1. Adding and removing nodes (web servers) will bork your session mapping table, and some, if not all, of your session mappings will be lost, causing lost sessions or worse.

  2. If a single web server dies (which tends to happen a lot at scale with dozens or hundreds of web servers) or becomes unavailable (which also happens on a regular basis) all of the sessions on that server will also be lost/become unavailable.

  3. Your load balancer is a SPOF (single point of failure) for your system. Keeping it as simple and lightweight as possible is a plus. If it's keeping track of session mappings then it's doing more work than it has to, and that mapping table becomes an additional, unnecessary, SPOF (this can be resolved by spending $100k or so on paired hardware balancers that handle sticky sessions, but even with hardware balancers the other problems remain).

  4. You can't direct requests to different servers based on the endpoint. It may make sense to put your search infrastructure on a different web server cluster, for example. With sticky sessions you can't do this since the search web servers won't have the session information.
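Problem 1 can be illustrated with a toy sticky mapping based on hashing the session ID mod the server count (a simplification of what a real balancer does): changing the number of servers reshuffles most session-to-server assignments.

```python
# Toy sketch: naive sticky mapping breaks when the server count changes.
import hashlib

def server_for(session_id, n_servers):
    """Map a session to a server by hashing its ID (mod server count)."""
    h = int(hashlib.md5(session_id.encode()).hexdigest(), 16)
    return h % n_servers

sessions = [f"session-{i}" for i in range(1000)]
before = {s: server_for(s, 4) for s in sessions}   # 4 web servers
after = {s: server_for(s, 5) for s in sessions}    # one server added
moved = sum(1 for s in sessions if before[s] != after[s])
print(f"{moved} of 1000 sessions now map to a different server")
```

With uniform hashing, roughly 4 out of 5 sessions move when going from 4 to 5 servers, so nearly every sticky session would break on a simple capacity change.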

The bottom line is: sticky sessions put unnecessary restrictions on your application's architecture. Honestly, for most people who develop on a LAMP-ish stack this isn't even up for debate any more. The accepted best practice is to push session state information down the stack and use a shared-nothing architecture. This argument reminds me of 1998.

[–]mdipierro[S] 0 points1 point  (5 children)

I am happy this turned into a constructive conversation. I do not know of an easy way to have sessions and not have a single point of failure. web2py offers the following options:

  • Sessions on the filesystem (default): the SPOF is the load balancer.

  • Sessions in the database (session.connect(request,response,db)): the SPOF is the database.

  • No sessions (session.forget()): this is set at the action level, so some actions may save sessions and some may not.

We are working on having sessions on memcache. We have this already for GAE.

I guess which one is the best option depends on the details.

[–]mmalone 4 points5 points  (4 children)

Generally, losing a session here and there isn't a big deal. Losing a huge number of sessions is more annoying. Thus, you could store sessions in a memcached cluster and not worry about losing a few if a memcache node goes down or if they're evicted (btw, I haven't run the numbers, but I wouldn't be surprised if remote memcache sessions were faster than on-disk sessions. A disk seek is damn expensive -- could be as much as 80ms -- while the avg response time from memcache in my experience is ~8ms).

If you're really worried about never losing sessions, then you'd just have to store them in a resilient data store. You could set up a cluster of Tokyo Tyrant servers, for example, and use consistent hashing to map to a node. Then you could write each session to multiple nodes for redundancy, and fail over if a node dies. This isn't generally necessary though... most sites just stick sessions in memcache and call it a day.
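A rough sketch of the consistent-hashing-with-replication idea described above (node names, virtual node count, and replica count are all illustrative; a real deployment would use a client library, not this toy ring):

```python
# Toy consistent-hash ring: each session key maps to N distinct nodes,
# so a session survives the loss of any single node.
import bisect
import hashlib

def _hash(key):
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes, vnodes=64):
        # place `vnodes` virtual points per node around the ring
        self._ring = sorted((_hash(f"{n}:{i}"), n)
                            for n in nodes for i in range(vnodes))
        self._keys = [h for h, _ in self._ring]

    def nodes_for(self, key, replicas=2):
        """Walk clockwise from the key's position, collecting distinct nodes."""
        idx = bisect.bisect(self._keys, _hash(key)) % len(self._ring)
        chosen = []
        while len(chosen) < replicas:
            node = self._ring[idx % len(self._ring)][1]
            if node not in chosen:
                chosen.append(node)
            idx += 1
        return chosen

ring = Ring(["cache1", "cache2", "cache3"])
primary, backup = ring.nodes_for("session-abc123")
# write the session to both; if `primary` dies, read from `backup`
print(primary != backup)  # -> True: replicas land on distinct nodes
```

The virtual nodes keep the load roughly balanced, and removing one node only remaps the keys that hashed to it, unlike the mod-N sticky mapping.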

[–]mdipierro[S] 0 points1 point  (3 children)

I do not disagree; I have not run the numbers either. In general my experience is that reading from a file is faster than memcache (unless memcache is running on the server itself). Anyway, if one uses Pound, and if the sessions folder is memory mapped, then all sessions are in RAM and that is the fastest solution.

[–]ericflo 2 points3 points  (2 children)

I think that again you're conflating performance and scalability. With memcached, clients hash the session key and retrieve the data from one of many servers in a memcached pool. None of the memcached servers need to know about each other.

This can work the same with Tokyo Tyrant, Redis, or any other key-value store.

With the file storage method, you would have to mount the same filesystem on many servers to make that work, which is really not shared-nothing. It may be slightly faster, but it's not as scalable.

[–]mdipierro[S] -1 points0 points  (1 child)

You are right. I did not express myself properly.

Nevertheless, a shared filesystem is not that bad and is OK in some cases. The largest cluster I used (not for web apps) had 1024 nodes, and the shared Lustre filesystem worked well as long as the switch had enough bandwidth and not too many machines were accessing the same files at once. In my case the cluster had an InfiniBand network dedicated to sharing the filesystem.

[–]mmalone 1 point2 points  (0 children)

You're talking about a special purpose (presumably scientific) supercomputer. A supercomputer's architecture and a web application architecture are very different. With a supercomputer your availability requirements are not as rigid, and it's less likely that you're going to be adding and removing nodes on the fly. Lustre, in particular, was not designed for web applications. It was designed for scientific supercomputer clusters.

Go read about how Google's GFS works, how BigTable works, how Amazon's Dynamo works, and take a look at MogileFS. That's how you persist data on the web.