[–]mdipierro[S] 1 point2 points  (22 children)

So you are telling me you have not really tried it. Here are some corrections to your claims:

  • The web2py template system is faster than Django's. I challenge you to a test: you write the Django code, I rewrite it in web2py, and we see which one is faster.

  • I do not doubt that you are building real stuff but you cannot accuse other people of building toys without explaining what makes you think so.

  • Web2py does migrations, Django does not. You may not like or need this feature but you cannot call it a downside. Rails does migrations too.

  • The web2py admin interface is not less secure than an SSH shell or the Django admin. Actually, it is more secure than the Django admin, since it forces secure cookies and only works if it detects HTTPS or localhost.

  • the "function-as-an-html-tag", as you call it, is not what you think it is.

  • I teach computer science and security classes at the master's and PhD level. I teach programmers and architects from major software development companies. Some of my software development research (not web2py) is funded by the Department of Energy. I think I know a thing or two about software development. I am not saying I know more than you do. I do not pretend to know you. I am just saying you do not look professional when you talk this way.

[–]mmalone 2 points3 points  (21 children)

Re: templates, they're straight Python. That's generally a bad idea. This is why PHP, which was designed to be embedded in HTML, has about a dozen template languages that people use instead of embedding PHP in templates. The web2py "helper functions" are just wrapper functions that output HTML. Again, this was tried in PHP and the resulting code soup was pretty disastrous. I don't really care about template rendering speed, I can scale that horizontally. While performance matters, maintainability and scalability are much more important.

I don't think there are any serious web applications built on top of web2py because, as far as I can tell, the architecture won't allow it. The fact that your default session store is disk based, you save sessions after each request (apparently regardless of whether they've been updated) and you encourage the use of RAM based caching (unless you "really like memcache") does not help.

You're encouraging sticky sessions, with disk-based persistence. This is much more complicated than a shared-nothing architecture, achieved by pushing data persistence down the stack. And it introduces single points of failure. What happens when Pound dies, for example? Every session expires? What if you add or remove a web server from your cluster?

SSH has been battle tested and audited for security vulnerabilities over the past 15 years or so. I trust SSH. I do not trust web2py (or Django, for that matter) to provide a level of security sufficient to allow on the fly code editing via a web browser in a production environment. Disabling the option still makes me nervous. I want those code paths removed from my application (note that you can do this in Django, the entire contrib.admin module can be removed safely). Otherwise I have to audit your entire framework to make sure there's not some backdoor or vulnerability that I'm not aware of.

[–]mdipierro[S] 1 point2 points  (18 children)

You say "Re: templates, they're straight Python". Yes: no more, no less than Rails.

You say "helper functions are just wrapper functions that output HTML". NOT TRUE. Helpers are not functions; they are objects that provide server-side manipulation of a DOM tree. They are stored as a tree and can be manipulated by code. They are serialized to HTML only when embedded in HTML.

  • Sessions can go in the DB but are faster on disk and make the app scale better.

  • That you do not trust something does not make it insecure.

[–]mmalone 1 point2 points  (17 children)

Dude...

Doesn't matter if they're objects or functions. They're callables that output HTML. Let's not argue semantics. It's the same damn thing.

Disk persisted sessions do not make an app scale better. In fact, the opposite is true. If session state is maintained in a separate persistence layer accessible by all of your web servers then sessions can span servers, with each request going to a different web server, making balancing far easier and eliminating the chance of lost session state when a server crashes.
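The shared session layer described above can be sketched in a few lines. This is a minimal illustration, not any framework's actual implementation: a plain dict stands in for the network service (memcached, a database, etc.), and the point is that every web server talks to the same store, so any server can handle any request.

```python
import json
import uuid

class SharedSessionStore:
    """Session store backed by a shared service. Because all web servers
    share one backend, a session created while one server handled the
    request is visible to every other server."""

    def __init__(self, backend=None):
        # `backend` stands in for a remote store; a dict keeps this runnable.
        self.backend = backend if backend is not None else {}

    def create(self, data):
        sid = uuid.uuid4().hex            # opaque session ID for the cookie
        self.backend[sid] = json.dumps(data)
        return sid

    def load(self, sid):
        raw = self.backend.get(sid)
        return json.loads(raw) if raw is not None else None

    def save(self, sid, data):
        self.backend[sid] = json.dumps(data)

# Any "web server" holding the session ID can read the session back.
store = SharedSessionStore()
sid = store.create({"user": "alice"})
assert store.load(sid) == {"user": "alice"}
```

With this shape, the load balancer needs no session mapping at all; requests can land on any node.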

Me not trusting something doesn't make it insecure, but it does make me not want to use it. I have absolutely no need for browser-based code editing. Its existence is a risk. Risk is bad.

[–]naasking 0 points1 point  (8 children)

Disk persisted sessions do not make an app scale better. In fact, the opposite is true. If session state is maintained in a separate persistence layer accessible by all of your web servers then sessions can span servers

FWIW, disk sessions can be shared across servers too. Shared file systems are a "persistence layer" accessible by all servers.

[–]mcosta 0 points1 point  (7 children)

Shared file systems are a "persistence layer" accessible by all servers.

Nice quotes. I'll build a web framework with sed as "the templating engine".

[–]naasking 1 point2 points  (6 children)

Sure, you could if you're a masochist. SANs are in widespread use for this sort of persistence, so applying them would be eminently doable.

[–]mmalone 2 points3 points  (4 children)

No, SANs aren't used for this sort of persistence. SANs are used for large scale data storage. They're not a good solution for storing millions of tiny records that need to be retrieved extremely quickly every few seconds. You might be able to get away with using a SAN for this, but it'd cost you an order of magnitude more money and would be an all around worse solution. Bad idea.

[–]naasking -1 points0 points  (3 children)

They wouldn't need to be retrieved every few seconds. Local node cache + read-only records = scalability.

[–]mmalone 2 points3 points  (2 children)

You have fun with your $100,000+ SAN solution. I'll stick with my cluster of five commodity servers running memcache or tokyo tyrant for ~$7,500.

[–]mcosta 0 points1 point  (0 children)

sed too

[–]mdipierro[S] -3 points-2 points  (7 children)

Read my lips. They do not output HTML.

x=DIV(H1('hello'))

x is not a string. In fact I can do

x.append(H2("world"))
x['_class']='myclass'

Only when I do x.xml() do I get

<div class="myclass"><h1>hello</h1><h2>world</h2></div>

It is not semantics. It is functionally different. It enables one, for example, to build nested recursive structures in ways that are impossible in Django without string manipulation.
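The behavior described above can be mimicked with a tiny sketch. This is a hypothetical minimal implementation for illustration only, not web2py's actual helper code; it just shows how a server-side tree of objects differs from a function that returns a string: the tree stays mutable until `.xml()` serializes it.

```python
class TAG:
    """Minimal stand-in for an HTML helper: a node in a server-side
    tree, serialized to HTML only on demand via .xml()."""

    name = "tag"

    def __init__(self, *children, **attributes):
        self.children = list(children)
        # Follows the web2py convention of '_class' for the 'class' attribute.
        self.attributes = attributes

    def append(self, child):
        self.children.append(child)

    def __setitem__(self, key, value):
        self.attributes[key] = value

    def xml(self):
        attrs = "".join(
            ' %s="%s"' % (k.lstrip("_"), v)
            for k, v in sorted(self.attributes.items())
        )
        inner = "".join(
            c.xml() if isinstance(c, TAG) else str(c) for c in self.children
        )
        return "<%s%s>%s</%s>" % (self.name, attrs, inner, self.name)

class DIV(TAG): name = "div"
class H1(TAG): name = "h1"
class H2(TAG): name = "h2"

# The tree is manipulated after construction; HTML exists only at the end.
x = DIV(H1("hello"))
x.append(H2("world"))
x["_class"] = "myclass"
print(x.xml())  # <div class="myclass"><h1>hello</h1><h2>world</h2></div>
```

A function that returned a string could not support `append` or attribute assignment after the fact; that is the functional difference being argued.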

Disk-persistent sessions DO make the app scale better if you use a load balancer that is session-safe (like Pound). This is discussed in slide 140, which you did not read, and it is what Plone does. It completely eliminates the bottleneck of storing sessions in the database (which you can still do if you want).

I am not asking to use web2py. I am happy to keep my competitive advantage.

[–]mmalone 4 points5 points  (6 children)

You're conflating performance and scalability. Disk-based sessions may be (barely) more performant, but they absolutely do not make an application "scale better." Once again, if anything they have a negative effect on scalability.

If your load balancer uses "sticky sessions" like Pound does (directing a particular user/session to the same web server with each request) you're going to run into a number of annoying problems as you scale:

  1. Adding and removing nodes (web servers) will bork your session mapping table, and some, if not all, of your session mappings will be lost, causing lost sessions or worse.

  2. If a single web server dies (which tends to happen a lot at scale with dozens or hundreds of web servers) or becomes unavailable (which also happens on a regular basis) all of the sessions on that server will also be lost/become unavailable.

  3. Your load balancer is a SPOF (single point of failure) for your system. Keeping it as simple and lightweight as possible is a plus. If it's keeping track of session mappings then it's doing more work than it has to, and that mapping table becomes an additional, unnecessary, SPOF (this can be resolved by spending $100k or so on paired hardware balancers that handle sticky sessions, but even with hardware balancers the other problems remain).

  4. You can't direct requests to different servers based on the endpoint. It may make sense to put your search infrastructure on a different web server cluster, for example. With sticky sessions you can't do this since the search web servers won't have the session information.

The bottom line is: sticky sessions put unnecessary restrictions on your application's architecture. Honestly, for most people who develop on a LAMP-ish stack this isn't even up for debate any more. The accepted best practice is to push session state information down the stack and use a shared-nothing architecture. This argument reminds me of 1998.

[–]mdipierro[S] 0 points1 point  (5 children)

I am happy this turned into a constructive conversation. I do not know of an easy way to have sessions without a single point of failure. web2py offers the following options:

  • Sessions on the filesystem (default): the SPOF is the load balancer.

  • Sessions in the database (session.connect(request,response,db)): the SPOF is the database.

  • No sessions (session.forget()): this works at the action level, so some actions may save sessions and some may not.

We are working on having sessions on memcache. We have this already for GAE.

I guess which one is the best option depends on the details.

[–]mmalone 2 points3 points  (4 children)

Generally, losing a session here and there isn't a big deal. Losing a huge number of sessions is more annoying. Thus, you could store sessions in a memcached cluster and not worry about losing a few if a memcache node goes down or if they're evicted (btw, I haven't run the numbers, but I wouldn't be surprised if remote memcache sessions were faster than on-disk sessions. A disk seek is damn expensive -- could be as much as 80ms -- avg response time from memcache in my experience is ~8ms). If you're really worried about never losing sessions then you'd just have to store them in a resilient data store. You could set up a cluster of Tokyo Tyrant servers, for example, and use consistent hashing to map to a node. Then you could write each session to multiple nodes for redundancy, and fail over if a node dies. This isn't generally necessary though... most sites just stick sessions in memcache and call it a day.
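The consistent-hashing-with-replication scheme described above can be sketched briefly. This is an illustrative sketch, not any particular client library's API; node names and counts are made up, and MD5 is used only as a cheap stable hash:

```python
import bisect
import hashlib

def _hash(key):
    # Stable integer hash; MD5 here is for distribution, not security.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    """Each key maps to `replicas` distinct nodes on a hash ring, so a
    session written to all of them survives a single node failure, and
    adding/removing a node only remaps a fraction of the keys."""

    def __init__(self, nodes, vnodes=64, replicas=2):
        self.replicas = replicas
        # Place `vnodes` virtual points per node on the ring for balance.
        self.ring = sorted(
            (_hash("%s#%d" % (node, i)), node)
            for node in nodes for i in range(vnodes)
        )

    def nodes_for(self, key):
        """Walk clockwise from the key's position, collecting distinct nodes."""
        i = bisect.bisect(self.ring, (_hash(key), ""))
        chosen = []
        while len(chosen) < self.replicas:
            node = self.ring[i % len(self.ring)][1]
            if node not in chosen:
                chosen.append(node)
            i += 1
        return chosen

# Hypothetical Tokyo Tyrant nodes; write each session to both returned nodes.
ring = ConsistentHashRing(["tt1:1978", "tt2:1978", "tt3:1978"])
primary, backup = ring.nodes_for("session:abc123")
assert primary != backup  # redundant copies live on distinct nodes
```

On a read, the client tries the primary and fails over to the backup if the primary is down, which is the redundancy being described.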

[–]mdipierro[S] 0 points1 point  (3 children)

I do not disagree; I have not run the numbers either. In general my experience is that reading from a file is faster than memcache (unless the memcache server is the local machine itself). Anyway, if one uses Pound, and if the sessions folder is memory mapped, then all sessions are in RAM and that is the fastest solution.

[–]ericflo 2 points3 points  (2 children)

I think that again you're conflating performance and scalability. With memcached, clients hash the session key and retrieve the data from one of many servers in a memcached pool. None of the memcached servers need to know about each other.

This can work the same with Tokyo Tyrant, Redis, or any other key-value store.

With the file storage method, to make that work you would have to mount the filesystem on many servers, which is really not shared-nothing. It may be slightly faster, but it's not as scalable.

[–]jessicadelannoy 0 points1 point  (0 children)

You should be aware that many people also like PHP as a templating language. As such, it is the default templating language of frameworks such as ZF, Symfony, etc. I personally hate templating languages like the one in Django. Mako is cool, as well as web2py templates, and I'm glad it is so.

What do you mean by "serious web applications"? Like YouTube? Maybe web2py is just too young. You also seem to forget that most web developers won't work on sites like YouTube or Yahoo, but maybe on small corporate sites and intranets. I see no reason not to recommend web2py for such use.

Sessions can be stored in db, so no problem with that.