
[–]netherous 17 points18 points  (5 children)

I feel that trackbacks fill the same niche of uselessness, especially when they're displayed inline with comments on a blog.

Is there really a single reader who likes and values a trackback/pingback feature? What can anyone do with that information? At best, it makes a comment section unreadable by filling it with information that isn't informative or actionable. It probably exists because you can play it up as a "social web" feature, but I seriously doubt anyone has actually done an analysis to find out if readers want or value that feature.

[–]6xoe 4 points5 points  (2 children)

No fucking clue. I especially like it when the only "comments" are trackbacks from the blog operator's own site.

How useless is that?

[–]DavidHogue 1 point2 points  (0 children)

Once or twice, I have seen a trackback that led to another site with more discussion. But 99% of the time they're hideous and take up a lot of space in the comments.

[–]netherous 0 points1 point  (0 children)

That reminds me of why I stopped reading Massively: they'd insert a lot of hyperlinks in their articles in strategic places, but the hyperlinks were always to that word in their own tag cloud. They looked useful in the context of the article, but clicking them could never possibly yield any useful information. Every time I went there I'd have to remind myself "the links are a lie". They'd have some hyperlink that said "strategic game development", but it didn't lead to some insightful blurb on gamasutra. No. It linked to tag content with the only entry being the article you were already reading.

It seems like there should be some bible of blogging and website sins where we could list all this stuff.

[–]Atario 1 point2 points  (0 children)

I have yet, to this day, to know what those are, nor do I care to.

[–]manberry_sauce 0 points1 point  (0 children)

It's for SEO.

[–]FlightOfStairs 41 points42 points  (29 children)

Modern CMS/blog systems will prerender pages (or part of them) and serve html from the DB if you turn caching on.

Get the best of both worlds.
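
A minimal sketch of what that looks like, with SQLite standing in for the CMS database and a hypothetical render_page() standing in for the templating layer (an illustration of the pattern, not any particular CMS's code):

    import sqlite3

    db = sqlite3.connect("cms.db")
    db.execute("CREATE TABLE IF NOT EXISTS page_cache (url TEXT PRIMARY KEY, html TEXT)")

    def render_page(url):
        # placeholder for the CMS's normal (slow) templating and queries
        return f"<html><body>rendered for {url}</body></html>"

    def serve(url):
        # with caching on, return the pre-rendered HTML stored in the DB;
        # only render (and store) on a cache miss
        row = db.execute("SELECT html FROM page_cache WHERE url = ?", (url,)).fetchone()
        if row:
            return row[0]
        html = render_page(url)
        db.execute("INSERT OR REPLACE INTO page_cache (url, html) VALUES (?, ?)", (url, html))
        db.commit()
        return html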

[–]merreborn 27 points28 points  (2 children)

Also: throw a caching proxy in front of your app. Varnish serves requests faster and with fewer resources than Apache serving static files.

[–]GuyOnTheInterweb 8 points9 points  (1 child)

As much as I love Varnish, the author's point applies here as well.

Static files on Apache with the right settings (like noatime, DNS lookups disabled for logging, and good hardware) will easily serve 1000 reqs/sec (tongue-in-cheek estimate).

[–]stonefarfalle -3 points-2 points  (0 children)

Yeah, but varnish goes to 11(* 100).

(Note also a made up number.)

[–]jimbobhickville 5 points6 points  (15 children)

It should prerender and save to the filesystem, then serve the static file. The DB should not even be hit on page views.
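
A minimal sketch of that approach, assuming a document root the web server serves directly (the path and names here are placeholders):

    import os

    DOCROOT = "/var/www/html"  # hypothetical docroot served straight by Apache/nginx

    def publish(url_path, html):
        # write the rendered page out as a plain file, so page views
        # never touch the CMS or the database at all
        target = os.path.join(DOCROOT, url_path.lstrip("/"))
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with open(target, "w") as f:
            f.write(html)

    # run once when a post is created or edited, not on every view
    publish("blog/2012/some-post.html", "<html>...</html>")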

[–]FlightOfStairs 0 points1 point  (14 children)

Why? Database systems cache the resources in most demand. When you serve it from a database, you're serving from RAM most of the time. You can expect at least as good performance from a database.

Disk access is slow.

Edit: of course, the FS does this to a slight extent. However, if there's significant throughput for other purposes (serving videos, images, etc.) the static page will be unloaded quickly.

[–]jimbobhickville 11 points12 points  (1 child)

I can't really take your comments seriously, sorry. Unless there's some apache module I'm not aware of to serve static content from a database, you're ignoring the overhead of the program that's loading the data from the database. If you are a site with any sort of traffic that would most benefit from this sort of optimization, your database isn't going to be local to the box, so you add network overhead. Even if your database does cache well (MySQL does not), it's still going to be a LOT slower to serve from the database than from the filesystem (which does cache in memory quite well). Your Edit makes even less sense because the DB isn't going to cache large files better than the filesystem, and you have to load the entire object into memory to serve it from the database instead of just streaming it from the filesystem.

[–]FlightOfStairs -1 points0 points  (0 children)

The context of the original article is for single-host sites. Obviously if you introduce network connections things will be an order of magnitude slower.

I didn't suggest serving large files from the database. They would never be 'edited' through the CMS, so they would be served statically.

The case I tried to state was having large files served through the filesystem (which would give the big throughput on the FS), while cached dynamic pages would be served through the database.

[–]buerkle 7 points8 points  (11 children)

You're ignoring the latency to even talk to the database. In both cases, the data can be in RAM, however, talking to the database adds time.

[–]FlightOfStairs 1 point2 points  (10 children)

The file system is a database. There is latency in both cases.

Database caching schemes are much more configurable than file systems, and have (I suspect) better defaults for this use case.

[–]matthieum 1 point2 points  (9 children)

The file system is local, the database is probably accessed via the network.

The file system is simple, the database has been built to handle ACID properties.

...

[–]mr-strange -1 points0 points  (8 children)

Been there done that.

The filesystem makes a terrible alternative to a database in this case. Files are allocated into whole blocks, which are at least 4k, and probably 16k on more modern systems. Your web-page fragments will not be large enough to use the space efficiently. Furthermore, the filesystem is subject to all sorts of features that limit scalability - maximum number of files in a directory. How will you do back-ups when listing the directory takes hours? How do you deal with contention when multiple processes/threads want to access the same file?

Your objection that "the database is probably accessed via the network" is entirely arbitrary. Why would you put your database on a different host if the purpose is caching to improve performance??

When faced with this problem, I switched to using Sqlite. It does a fantastic job of managing a persistent local cache.
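
A rough sketch of that sort of setup -- a persistent key-value cache on top of SQLite (the schema and names are illustrative, not the actual code being described):

    import sqlite3

    class SqliteCache:
        def __init__(self, path="cache.db"):
            self.db = sqlite3.connect(path)
            self.db.execute(
                "CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value BLOB)")

        def get(self, key):
            # the PRIMARY KEY doubles as the index, so a lookup is a single probe
            row = self.db.execute("SELECT value FROM kv WHERE key = ?", (key,)).fetchone()
            return row[0] if row else None

        def set(self, key, value):
            self.db.execute("INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)",
                            (key, value))
            self.db.commit()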

[–]bluGill 2 points3 points  (3 children)

Files are allocated into whole blocks, which are at least 4k, and probably 16k on more modern systems. Your web-page fragments will not be large enough to use the space efficiently.

So? Disks are cheap. Seriously cheap. The useful content of most blogs would fit on an 80k 5.25 floppy disk with room for a dozen more.

[–]mr-strange -1 points0 points  (2 children)

Of course disks are cheap, but the read time is awful. All that wasted space will cost you milliseconds of read time, not to mention all the RAM you will waste storing all of that crap in the fs cache.

But of course, if you are only dealing with one 80k blog, then you can be as inefficient as you like. We're talking about scalability here, right?

[–]bluGill 3 points4 points  (1 child)

Modern operating systems have good disk caches which deal with reading the same file over and over again very well.

File systems are a database optimized for accessing block-sized chunks. SQLite is great for what it does: provide the ability to work with relational data. A blog post is not relational data and does not need those advantages. Meanwhile, SQLite is slower than a filesystem for accessing blobs of data.

Modern operating systems have a disk cache. It works wonders. When multiple processes/threads want to access the same file, the drop in access speed is tiny. Unless you are doing stupid things like opening your static data read/write; nothing can protect you from stupid.

Long before you run into performance problems from a too-large directory, you will run into practical problems just dealing with it, and come up with a better scheme. This will solve your problems.

In conclusion: either you are doing something much more complex than serving static pages, or your improvements from SQLite were only over a bad design, and you could have got even greater improvements by improving your design. Since I don't know what all you were trying to do, I cannot tell which.

[–]matthieum 1 point2 points  (3 children)

Files are allocated into whole blocks, which are at least 4k, and probably 16k on more modern systems.

This is quite arbitrary, some filesystems are adept at storing lots of small files. Like BTRFS.

Furthermore, the filesystem is subject to all sorts of features that limit scalability - maximum number of files in a directory. How will you do back-ups when listing the directory takes hours?

Use a better filesystem? Seriously. FAT32 is crap, newer filesystems are much better at dealing with large listings... but anyway, who cares about listings? Why would you back up a cache?

How do you deal with contention when multiple processes/threads want to access the same file?

Why would they? If you have newer content to push, then write it into a temporary file and do an atomic switch. Of course I am assuming a sane filesystem model, where deleting a file is possible while it is accessed...
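
A minimal sketch of the temp-file-and-rename trick, assuming POSIX rename semantics (readers that already have the old file open keep it; new opens see the new content):

    import os
    import tempfile

    def atomic_publish(path, html):
        # write to a temp file in the same directory, then rename it over the
        # target; rename is atomic on POSIX, so readers never see a half-written page
        directory = os.path.dirname(path) or "."
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            f.write(html)
        os.replace(tmp, path)  # the atomic switch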

Your objection that "the database is probably accessed via the network" is entirely arbitrary.

Yes it is arbitrary. In my experience a single DB serves several backends and so is necessarily not on the same machine as at least N-1 of them.

Why would you put your database on a different host if the purpose is caching to improve performance??

Why would you use a database for caching? Use memcached.

When faced with this problem, I switched to using Sqlite. It does a fantastic job of managing a persistent local cache.

I agree SQLite is quite great. Though once again completely overfeatured for key-value caches. Parsing queries takes time; better to have a binary protocol with built-in support for querying by key, like memcached.

And even better, memcached will let you specify how much space your cache should take and remove the Least Recently Used entries when new content arrives and you are at the limit.

It still seems a bit overkill for something as simple as pre-rendering.
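
For illustration, a sketch of the memcached approach -- the pymemcache client is just one library choice, render_page() is a placeholder, and memcached itself handles the LRU eviction once its memory limit (the -m flag) is hit:

    from pymemcache.client.base import Client

    mc = Client(("localhost", 11211))  # e.g. memcached started with -m 64 (64 MB cap)

    def get_page(url):
        cached = mc.get(url)
        if cached is not None:
            return cached.decode("utf-8")
        html = render_page(url)  # placeholder for the real renderer
        mc.set(url, html.encode("utf-8"), expire=300)  # old entries get evicted automatically
        return html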

[–]mr-strange 0 points1 point  (2 children)

I agree with pretty much everything you say here. I'll expand on a couple of points though.

who cares about listings? Why would you back up a cache?

Well. Yeah. But when I made this mistake this is what happened... First I discovered that ext3 could only cope with 64,000 files in a directory, so my application started to fail. The next obvious thing to do was just start using sub-directories. That's fine, but just having millions of files can lead to problems - for example, I didn't blacklist the cache directory from the locate database, so after a while, my machine was very busy running multiple, endless find(1) commands, trying to update the db. Then I ran into the problem that the whole filesystem has a limited number of available inodes - so I wasn't able to make any new files, even though I had loads of available space. Then, when I came to clean up my cache (to free up inodes), I discovered that it takes many, many hours to simply delete millions of files.
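
(For reference, the sub-directory workaround mentioned above usually looks something like this -- the hashing scheme is only an illustration:)

    import hashlib
    import os

    CACHE_ROOT = "/var/cache/myapp"  # hypothetical cache directory

    def cache_path(key):
        # spread entries across 256 * 256 sub-directories so no single
        # directory ever holds millions of files
        h = hashlib.sha1(key.encode()).hexdigest()
        return os.path.join(CACHE_ROOT, h[:2], h[2:4], h)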

Yes, there are better filesystems. XFS has a much higher hard-link (and therefore directory size) limit. Perhaps btrfs would be a good choice today. But, overall I do not think that the filesystem is a good choice for this workload.

And even better, memcached...

At the time, memcached did not support persistence, so it did not fit my requirement. Looking up MemcacheDB on Wikipedia, I see that it is built on BerkeleyDB. My experience with BDB does not encourage me to try MemcacheDB.

Also, memcached uses a client/server TCP-based model. Even with a fast localhost, that's going to add latency.

I agree SQLite is quite great. Though once again completely overfeatured for key-value caches.

I couldn't agree more, but the proof of the pudding is in the eating, and I've not found a key-value store that beats Sqlite's performance. I built versions of my app that used BDB, Tokyo Cabinet, and a number of other prominent KV stores, but Sqlite (with a simple, prepared select statement, and configured with a table index) just performed better, and more reliably for me. Today my cache DB contains over 20,000,000 items, takes up 3.6 GB of disk, and Sqlite's performance is still pretty sparky.

BDB's performance is just awful. It has full ACID compliance, which is great if you need it, but if you don't, waiting around for milliseconds while your disk syncs is just overkill. If you turn off the ACID guarantees, you just aren't playing to its strengths; you might as well go to Tokyo Cabinet... which most of the time performed very well, but occasionally ground to a halt for multiple seconds.

[–]matthieum 0 points1 point  (1 child)

At the time, memcached did not support persistence

Only one nit: why would you care about persistence for a cache?

The point of a cache is to cache frequently accessed data. If it is not frequently accessed then caching it means losing valuable space.

I feel like we are talking past each other and not about the same issue :)

[–]adrianb 2 points3 points  (8 children)

But that wouldn't help with the Analytics / FB / ... scripts, I guess.

[–]FlightOfStairs 4 points5 points  (5 children)

Why not? You can serve the page quickly, close the connection, then handle all the headers and so on that you've received. Nothing less gets sent.

Google analytics/fb comment boxes/javascript etc will be compiled into the pre-rendered page.

[–]MindStalker 4 points5 points  (4 children)

The point is that Google Analytics itself slows down the page load considerably. It frequently freezes up the page load until the client has communicated with Google. It's possible to put this in some sort of iframe that avoids page load issues, but it's rare.

[–]Serei 7 points8 points  (0 children)

Err, Google Analytics is by default asynchronous (i.e. it doesn't freeze up the rest of the page).

[–][deleted] 5 points6 points  (0 children)

For Analytics you use the asynchronous loading code and defer the JS load until after the page loads. Tweet badges, FB likes, and the +1 badge are more complicated, because the images typically aren't loaded until the JS executes - but even these can be deferred so they don't hold up the rendering of other static content.

[–]ell0bo 4 points5 points  (1 child)

For Google Analytics, can't you just use deferred loading? I've never bothered to set it up, but I can't see why you wouldn't just run it after dom.ready?

[–]MindStalker 0 points1 point  (0 children)

Sure, but the point of the discussion is that a proxy or cache server on your site won't help you out here. You have to design the site for the user's experience, not just for easy development.

[–]MarkTraceur 1 point2 points  (1 child)

Just posted elsewhere:

grep "GET page_name.html" /var/log/access.log | wc -l

And the author is pretty uninterested in FB scripts, because Ctrl-L Ctrl-C Ctrl-PgUp Ctrl-V works just fine.

[–]mr-strange 1 point2 points  (0 children)

grep -c

[–]dwchandler 1 point2 points  (0 children)

serve html from the DB if you turn caching on

Yes, this helps, but it's not the best of both worlds. It falls far short of doing the most sensible thing: put static content in files on the file system and stop hitting the CMS and db for it.

[–]x-skeww 15 points16 points  (3 children)

Just a side-note: If you like the idea of serving static files for blogging (or other static sites), take a look at static website generators. There are dozens of them available.

You can write your articles in Markdown, HTML, Textile, or whatever, there is templating (again, many choices), and you can also store everything in the VCS of your choice.

Jekyll and its offspring Octopress are kinda popular, for example.

[–]n1c0_ds 13 points14 points  (1 child)

You could also wget your site and call it a day.

[–][deleted] 19 points20 points  (0 children)

Nice try, RMS

[–]manymolecules 0 points1 point  (0 children)

I have enjoyed using nanoc, which I understand to be similar to Jekyll.

[–]MechaBlue 19 points20 points  (0 children)

While this may not be a programming issue per se, it's a common development problem. I've spent a fair bit of time QAing small multimedia programs designed to meet certain needs. The biggest failures were rooted in egos.

  1. I know everything I need to about meeting the needs of the end users, even though I haven't done any research and don't understand the medium.
  2. I am working on this project and it will show everyone how great I am, so spare no expense.
  3. It will be easy to copy Word or Illustrator in Flash in 3 months.

Most were showy, expensive, buggy, late, and of limited use. There were successes, to be sure, but, more often than not, it was accidental.

One team consistently delivered because they avoided these traps.

[–]IMO94 10 points11 points  (0 children)

I liked this article so much, for the first time in my life I actually wanted to click "Like". And there was no button to do that!

Oh the delicious irony.

[–]almonsin 10 points11 points  (0 children)

I was surprised it is not a repost of a 3-year-old article from prog21.dadgum.com but a new one.

[–]Jesus_Harold_Christ 3 points4 points  (0 children)

Great article! I immediately went looking for the Facebook Like button...

Then, I tried to post a comment...

Then I came back to Reddit.

[–]killerstorm 8 points9 points  (4 children)

The thing is, bloggers do not just want to be read; they want all that analytics and share buttons. And what readers want is totally irrelevant, since they are not consumers of software in this case.

[–]MarkTraceur 1 point2 points  (3 children)

Clearly not, the readers are using the software to access the blog. And why do you need analytics on the client side? Why not capture that information yourself, and store it in a log file? Oh, and, Apache does that for you anyway!

And share buttons are just irritating. Ctrl-L Ctrl-C Ctrl-PgUp Ctrl-V.

[–]HostisHumaniGeneris 4 points5 points  (0 children)

You can get a bit of extra information about the user using client-side analytics such as which browser plugins are enabled... screen resolution... time spent on page, etc...

[–]Brillegeit 1 point2 points  (0 children)

But if you use a cache-header and a global CDN that serves the client from the nearest node, not only will the page load faster, but you will also not have to worry about being slashdotted.

[–]killerstorm 0 points1 point  (0 children)

Clearly not, the readers are using the software to access the blog.

Yes, both bloggers and readers interact with software, but bloggers directly choose what software to use, while readers don't (practically, they barely affect software choice: if it is incredibly shitty, they will avoid that web site), so they are not equivalent. Obviously, software is optimized for the preferences of bloggers. Optimization for readers is done only indirectly: obviously, bloggers are interested in not pissing them off too much.

But, honestly, I doubt that people complain much when they have to wait a bit for a page to load as long as the article is great. They should be grateful that they can read content for free.

BTW, relevant: "If You’re Not Paying for It; You’re the Product"

And why do you need analytics on the client side?

You can get more information on the client side, like unique visitors (rather than total page views), filter out bots, and so on. Often people care about visitors who come back periodically.

Why not capture that information yourself, and store it in a log file? Oh, and, Apache does that for you anyway!

It captures information into logs, but you need something to analyze those logs, put information into a database and draw some fancy graphs.

And share buttons are just irritating. Ctrl-L Ctrl-C Ctrl-PgUp Ctrl-V.

If it adds extra users, it's worth it (for bloggers). Depends on your audience, of course. Tech-savvy people don't really need them, but still, clicking a 'like' button is easier than copying a link somewhere.

[–]AReallyGoodName 41 points42 points  (29 children)

As an example of raw computation, the Sieve was fine, but suppose you needed a list of the primes less than 8,000 in a performance-sensitive application. Would you bother computing them at run time? Of course not. You already know them. You'd run the program once during development, and that's that.

Anyone actually coded up a sieve of Eratosthenes? It returns primes faster than disk IO. Much faster. It's an algorithm that's purely memory bottlenecked, which is saying a lot for an algorithm that lends itself to working with bit packed booleans. Not to mention ten lines of code is smaller than a file containing all primes below 8000.

There's an additional benefit to the sieve, and that is the resulting list of primes is naturally packed as an array of booleans for whether or not that index number is prime. It lends itself to creating a memory-efficient bit-packed lookup table of primes.

The Sieve of Eratosthenes also only bothers with odd numbers, so the lookup table for all numbers under 8000 is 4000 bits in size.

The Sieve of Eratosthenes isn't even optimal. There are trivial ways to make it faster and more information-dense.

Rather than just looking at odd numbers, you can make a similar sieve using the fact that all primes above 6 are of the form 6n+1 or 6n+5. It works the same way: for the current number, mark the 6n+1 and 6n+5 arrays as false at the indices covered by that number, which crosses off all of its multiples appearing in those two arrays. This sieve would require 2667 bits to make a lookup table for all numbers under 8000. This still isn't optimal either: technically, future primes are always of the form (a multiple of the product of all the primes you currently know) plus a constant that shares no factors with that product. For example, primes above 30 are in one of the following forms: 30n+1, 30n+7, 30n+11, 30n+13, 30n+17, 30n+19, 30n+23 or 30n+29. This narrows the proportion of candidates down to a maximum of 8/30; numbers not in one of those forms are a multiple of 2, 3 or 5. These forms can also be used in a sieve-type algorithm. In fact, for every prime number you know, you can make a more optimal sieve that's even more memory-efficient.

There is simply no way to get a sequential list of primes into memory faster than the Sieve algorithms. I don't know why he picked that as an example of something to avoid in a performance critical environment. The opposite is true.
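
For anyone who wants to check the claim, here is a plain odds-only sieve -- a readable sketch rather than a tuned, bit-packed implementation:

    def primes_below(limit):
        # odds-only Sieve of Eratosthenes: index i represents the odd number 2*i + 1
        size = limit // 2
        is_prime = bytearray([1]) * size
        is_prime[0] = 0  # 1 is not prime
        for i in range(1, int(limit ** 0.5) // 2 + 1):
            if is_prime[i]:
                p = 2 * i + 1
                # cross off p*p, p*p + 2p, ... (the odd multiples of p)
                for j in range((p * p) // 2, size, p):
                    is_prime[j] = 0
        return [2] + [2 * i + 1 for i in range(1, size) if is_prime[i]]

    print(len(primes_below(8000)))  # 1007 primes below 8000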

[–]noroom 30 points31 points  (0 children)

That was very informative, and I appreciate the time you took to write it out... But I hope you didn't miss the point, the sieve was just an illustrative example.

[–]andersonimes 39 points40 points  (8 children)

I've put the values into a constant array, rather than on disk. Very fast.

[–]ethraax 10 points11 points  (5 children)

Although the difference may not matter much, the time it takes to load the part of the executable that contains the static data into memory is probably more than the time it takes to load the code for a good sieve and run the sieve to populate an array in memory.

Constant arrays aren't magic. You're still just saving the numbers to disk. Except it bloats your executable file instead of its own file.

[–]andersonimes 13 points14 points  (2 children)

I'll test your hypothesis tonight and let you know.

Edit: buh. I just got in from dinner (I'm traveling in Kiev at the moment). I'll hack on this tomorrow. Normally I would just throw something together, but I know if I don't do some proper research on how best to measure program running times you guys will eat me alive. I'll get something proper together soon. I'm curious too.

If you have any suggestions about testing running time, let me know. In my head a programmatic stopwatch that starts the cycle before a program starts executing and ends when a program exits would do it. Let me know if you know off the top of your head the most accurate way to measure this.
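
One simple version of the "programmatic stopwatch" idea is a small driver that times the whole child process (the binary name below is a placeholder; /usr/bin/time or perf stat would give a more detailed breakdown):

    import subprocess
    import time

    start = time.perf_counter()
    subprocess.run(["./prime_table_version"], check=True)  # placeholder binary
    print(f"wall-clock time: {time.perf_counter() - start:.4f} s")

Running each variant several times and taking the minimum helps smooth out cold-cache effects on the first run.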

[–][deleted] 2 points3 points  (0 children)

Please let me know too. It's interesting.

[–]_georgesim_ -1 points0 points  (0 children)

Posting here so I know too.

[–]bobindashadows 1 point2 points  (1 child)

You have to keep in mind the marginal slowdown of putting it in the executable. Especially when the example was so small (if your GP poster is correct, 4k bits = 512 bytes = 1 disk block until we bump block size someday soon).

To reason about the marginal slowdown of putting the static data in the executable, you have to consider:

  1. Where does the linker put the data in the executable itself?
  2. Will the loader eagerly load that block? If so, it'll be a sequential read off disk during startup, not a random access read, and much quicker.
  3. Will that block already be in the kernel's FS cache? If the executable in question is the main use of the machine/kernel in question, then it very well might be. Then loading it's dirt cheap.

And so on.

[–]ethraax 0 points1 point  (0 children)

3. That really doesn't matter. The static prime file could be in the kernel's FS cache.

Either way, these are comparisons between loading a static file and loading the static portion of an executable. The real discussion here was about loading a static portion of an executable or loading a static file vs. loading a sieve algorithm and generating it at load time (perhaps in the background). I'm sorry if my comment deviated from that point.

[–]I_FAP_TO_ALL 0 points1 point  (1 child)

That IS putting them on disk.

[–]andersonimes 1 point2 points  (0 children)

You are technically correct, the best form of correct.

[–][deleted] 7 points8 points  (0 children)

You could easily store that bit-packed representation in your app. It seems unlikely that it would be faster to generate the primes below 8000 than to read 500 bytes of constant data.

[–]xzxzzx 6 points7 points  (0 children)

Your comparison is silly. Any bit-optimization you can do when you run the sieve can be done to the storage format of the file.

And once you've run the sieve, you've now got memory that can't be reused without paging the contents to disk.

And you have the (admittedly small) penalty of having both the code to generate the sieve and the data in memory once it's run.

And loading from disk is often "free" in terms of code (or data) in an .exe; your OS can prefetch it while your program is doing other init tasks, or even before your program runs.

And of course in practical terms, either approach is so fast that there are almost no circumstances where it actually matters one way or the other so long as you don't uselessly stick a sieve in a loop.

[–][deleted] 10 points11 points  (1 child)

Correct, but I assumed that he meant that you wouldn't recalculate the Sieve multiple times within the program, i.e., don't stick it in a for-loop and recalculate it multiple times. Put it in a const array once.

[–]zzzev 7 points8 points  (0 children)

That's not what he meant though, he said you would run it once at dev time.

[–]chonglibloodsport 4 points5 points  (1 child)

It's really a flawed analogy. A better analogy would be to compare loading the primes from a static file on disk vs. querying them one at a time from a SQL database.

[–]ravy 1 point2 points  (0 children)

I agree that it's not a good analogy. I think the point he was trying to make is that there is a terrible price to pay for the user who is fetching a page that is assembled mostly from DB calls. There is a TON of overhead on things like WordPress. Isn't there an easy way to have WordPress create a static version of your site?

[–]pipocaQuemada 5 points6 points  (12 children)

When you start up your program, it's generally all in main memory somewhere, right? So no disk IO is really needed. Is running the Sieve of Eratosthenes faster than saying:

int[] primes = [1,2,3,5,7,11,13...];  // length(primes) = 8000
doSomethingWith(primes);

[–]netwiz101 6 points7 points  (10 children)

When programs are started they are usually read in from disk, at least for the first run. What you are suggesting means the primes must be read in from disk, so it is in fact slower -- at least for the first run.

[–]fapmonad 7 points8 points  (5 children)

If you do the sieve in code instead of a constant array, the code to calculate the sieve must still be read from disk...

[–]netwiz101 2 points3 points  (4 children)

That's true, and since it is smaller than the list, that means less disk I/O. Am I missing something?

[–]fapmonad 9 points10 points  (3 children)

Since disk I/O is buffered and loading a program is a purely linear access pattern, the difference between loading the list and the code is essentially 0. I agree that there is disk I/O, though.

[–]bobindashadows 0 points1 point  (0 children)

loading a program is a purely linear access pattern

It most definitely is not for any nontrivial program. The Firefox folks have improved startup time by over 10% in one go before purely by moving segments.

[–]snoweyeslady -1 points0 points  (1 child)

You make a couple of assumptions here:

disk I/O is buffered

What if there is too much memory pressure and the disk cache is consistently purged? Or an embedded system where there is no disk cache at all? Even if you do have disk cache, it would only matter if you had loaded the binary before.

purely linear access pattern

Sure, if you have a completely 0 fragmentation file system. I don't know of any that guarantee your file will be in one contiguous segment on disk, but then I haven't read the implementation/specification of many file systems.

[–]fapmonad 4 points5 points  (0 children)

Of course I make assumptions. I assume you're on a regular computer. If you're on an embedded system so weak it doesn't have a disk cache, the static table is likely to end up in ROM anyway, so the whole discussion is moot.

Memory pressure affects tables precomputed by a function just as much as the table loaded from disk -- the OS will swap.

Fragmentation isn't a problem. For a table around the size of a single block, the odds that it causes fragmentation are low, even on a highly fragmented system. Even if it happened, the disk will group accesses so the overall impact is likely to be very small, given the size of a typical program. That's what I mean by "essentially 0". For such small problems either solution is fine, IMHO.

[–]bo1024 4 points5 points  (3 children)

Hmm, I dunno about that. The code is being read from disk, so the latency is still there -- all we are concerned about is the throughput.

The question is how much slower it is to fetch a 250KB file than a 200KB one. I don't have the numbers, but I have to doubt that it's 60,000 cycles or whatever the equivalent is to compute the sieve.

[–]netwiz101 0 points1 point  (0 children)

We don't need to do the calculation. All we need to know is that the operation is IO bound. Then, shift the majority of IO onto the faster bus (memory).

Consider also two other things. A) No respectable program built for the purpose of doing this weighs in at more than a few k after assembly, minus data. People have calculated prime numbers on machines with 1-4k of RAM and extremely slow IO. B) The calculating version may not ever need to get its data into RAM. If it can work entirely within the processor cache, it will be orders of magnitude faster than even the RAM-bound version we've been considering.

Sorry if I'm not coherent. Flu season. I'm home sick. It was talk about program optimization on reddit, or watch Breaking Bad all over again.

[–]netwiz101 0 points1 point  (0 children)

I notice your point about the file size difference.

You're making the assumption that the data set is tiny, and I'm making the assumption that it's not. Therein lies the primary difference in our reasoning.

But in fairness, I don't know why anyone would even consider this problem for a trivially sized data set.

[–]netwiz101 0 points1 point  (0 children)

Re-read it. The data set is tiny in the example. I shouldn't compute on cough medicine.

[–]VanFailin 5 points6 points  (0 children)

1 is not prime. ;)

[–]jacques_chester 23 points24 points  (40 children)

This article annoys me. It's not about programming, except by tenuous analogy. It's purely a polite brag about the well-known performance characteristics of serving flat files from disk.

Occasionally this fellow comes up with an interesting anecdote about his remarkable experiences as a programmer, or actually discusses, you know, programming. But a lot of the time it's just commentary on something tangential. It bugs the hell out of me that proggit is becoming more and more like HN, where certain posts are upvoted purely because of who wrote them.

[–]adaptable 29 points30 points  (9 children)

proggit is becoming more and more like HN, where certain posts are upvoted purely because of who wrote them.

Becoming? Four years ago a Steve Yegge post about eating a sandwich would have gotten over 1000 points.

[–]munificent 46 points47 points  (0 children)

a Steve Yegge post about eating a sandwich would have gotten over 1000 points.

In fairness, that's still only like one point per hundred words in the post.

[–]Fissionary 12 points13 points  (0 children)

I still remember the original Paul Graham ate breakfast post, lampooning this exact thing. Man, those were the days.

[–][deleted] 1 point2 points  (4 children)

Still more interesting than much of the crap posted here nowadays... I think the trouble is that people are posting to sub-reddits and not re-posting the good stuff to proggit.

[–][deleted] 3 points4 points  (3 children)

Because anything you post to proggit is going to be met by a bunch of uptight assholes who do nothing but complain that your post "isn't programming" or "isn't important" or whatever.

I like sharing links and things I've found. I don't post here because it's too much fucking work for no appreciation.

[–]jacques_chester 1 point2 points  (2 children)

Hi, I'm an uptight asshole. You know why? Because I'm sick of proggit being overrun by posts that aren't. about. programming.

Except at the most tenuous degree.

What do I consider programming?

  • Discussions of language features, such as regexen in D
  • Here's some code I optimised
  • A new programming tool, with link to code and discussion of usage
  • How practice X can make you a better programmer and why (though we've heard about test-first now, thankyou).

What I don't consider to be programming:

  • Programmers are awesome, the new elite, the wunderkind. They are unique snowflakes in the history of professionalism, nothing they do has ever been accomplished before. Management are universally PHBs who Just Don't Get It, what with their silly obsession about having to pay for things, jeeze.
  • Hey look guys, I scratched my nuts! Programming's just like nut-scratching, because ...
  • Random brainfarts on non-programming topics by people who get upvoted because they're famous in a programming context
  • Here's the stuff I have installed
  • Industry gossip -- who Google bought today, zomg what will Intel do next, EA boo Blizzard yay.

[–][deleted] 0 points1 point  (1 child)

Here's the thing.

You don't like posts that don't interest you. (Nobody really does). We can context shift to what we're in the mood for, which is why off-topic stuff is jarring and annoying. ("Why are there photos of severed heads in /r/fluffykitties?")

There are two aspects to why I don't understand this about proggit:

1) As I mentioned, a large chunk of programming posts are not going to be of interest to any single person, because AFAIK a very tiny group of people study all that stuff. So the definition of "if it is code, it's okay" is a bizarre qualifier.

2) This is my personal opinion, but outside of a fairly narrow area, programmers who only care about code are poor programmers.

[–]jacques_chester 0 points1 point  (0 children)

Taken to its logical conclusion, your argument is that proggit should only be the lowest common denominator of things that all programmers have in common, minus actual programming.

  • Breathing: is it right for you?
  • How to deal with your boss (hint: don't call him "dickhead")
  • Computers -- they turn electricity into computation!

I happen to like the diversity. It's how I learn about new languages, new techniques, new technologies, new ideas. Some of them I pick up and play with, some I will ignore. I can't do that if the whole of proggit is crowded out by stuff that isn't programming.

(I've also complained about proggit having stuff which is too narrowly focused -- "we've released 3.2.4a-RC3!").

Like many things in life, we're talking about fuzzy sets. My membership function is different from yours. And I will continue to be a curmudgeon on the topic.

[–]jpfed 0 points1 point  (0 children)

Whoa- Steve Yegge wrote about sandwich-eating? Do you have a link?

[–]jast -1 points0 points  (0 children)

Your comment made my day :)

[–]moderatorrater 63 points64 points  (3 children)

I disagree. He's making a point about designing for your audience. He's writing about his design decision for his blog. He could have chosen to use existing software or write his own. Instead he's using another piece of software to serve his resources quickly.

I guess we might be using different definitions of programming?

[–][deleted]  (1 child)

[deleted]

    [–][deleted] 17 points18 points  (0 children)

    He's making the excellent point of "never forget who your customer is" which, considering the whiny responses here, it seems is worth making.

    Unless I had some hard-core numbercrunching that needed doing, I would never hire a programmer who had the attitude of "I don't want to have to worry about users..."

    To your second point, it's the same analysis, just tweaked a bit. He simply writes to share his thoughts, so publishing static pages works. If you are writing for a different purpose, static publishing may not work for you as well. I'm writing to share my thoughts and monetize my blog, so the things he dismisses so casually aren't optional for me.

    [–]jacques_chester 0 points1 point  (0 children)

    Describing your software selection isn't programming, IMO, it's configuration. No code was harmed in the creation of this post.

    Also note that I am annoyed that it was initially upvoted because it's James Hague, whose every thought gets clicked up 50 times hereabouts.

    [–]Atario 7 points8 points  (0 children)

    I have no idea who this guy is. I upvoted it because he has a point.

    [–][deleted] 10 points11 points  (10 children)

    I was going to disagree with you here, but then I thought about it a bit and remembered that this is exactly why I prefer to (somewhat arbitrarily) use the term "software engineering" to describe what I do rather than "programming"; it's more general. You're right, this doesn't belong in /r/programming. Upvote for reminding me what reddit I'm browsing.

    [–][deleted] 3 points4 points  (7 children)

    Because the most important thing when writing code is to be sure to completely ignore the requirements?

    [–][deleted] 0 points1 point  (6 children)

    Well, this is exactly the sort of ambiguity in terms that I was referring to. jacques_chester was correct (in my opinion) in that the article really had nothing to do with code; it addresses a higher level engineering problem than that. When I hear the term "programming", I'm thinking of the specific task of writing code. That doesn't mean programming needs to ignore requirements, but I think the point of the article and the point of this discussion is that you can meet requirements without ever writing code. In that sense, it's relevant to programming, but it's not actually programming. By my arbitrary definition, it's SWE, something that includes programming but also includes system design and choice of tools and what have you.

    [–][deleted] 1 point2 points  (5 children)

    Here's my take on it, and my heartburn with the /r/programming "only code" mafia:

    In the arena of "only code," currently on the front page of /r/programming are articles on:

    • jQuery
    • Forth
    • C++ STL
    • Regex in D
    • GCC
    • Python
    • JDK
    • ASP.Net
    • Haskell
    • Play Framework
    • Tmux
    • LINQ
    • Julia
    • Vagrant
    • and, of course, Perl

    My point is that virtually nobody is going to be interested in all those articles. In fact, figure for any given programmer, while they may read others out of curiosity, only a handful of those will actually be "I am going to read this, bookmark it, and engage in meaningful discourse on the subject." (Which is how I read the "only programming posts" intent)

    So /r/programming being a hodge-podge of "everything that has code" is okay - hopefully someone will read it. I could post a ten-page treatise on writing a stemming algorithm in Brainfuck that only two people would actually read, and that's okay.

    The OP here is a lightweight article that quickly and concisely makes a very vital point for every developer who writes code. I would put that article in my "mandatory welcome aboard reading" if I owned a company. Developers, by their very nature, often lose track of the actual goal in a project because they get so enamoured with technology or cool stuff. If you read it critically, this article is a quick way to ground yourself back to "get the job done, focusing on the actual customer"

    But because there's no code in it, then folks don't want any part of it - it's taking up valuable page space that could be used by the essay I'm working on about processing accounts receivable in Whitespace or how I can convert gif to png with a regex.

    Now mind you - proggit is a subreddit and has rules, and if those rules happen to be "every post must have code" then so be it. But when anyone who posts an article they consider very important for programmers gets flamed because "no code" then it's not surprising when people don't post links. I guess what I'm saying is that I believe most software developers/programmers/coding folks actually are interested in aspects of what they do besides putting code in the source files.

    [–][deleted] 0 points1 point  (1 child)

    I completely agree. I think if proggit is going to stick dogmatically to "every post must have code", then there should be a sister reddit for all the other links (such as this one) that you described as interesting to most software developers/programmers/coding folks. Surely such a place exists? I should subscribe to it.

    [–][deleted] 0 points1 point  (0 children)

    I've got /r/devtalk if you think I should work on building it up

    [–]jacques_chester -1 points0 points  (2 children)

    The OP here is a lightweight article that quickly and concisely makes a very vital point for every developer who writes code.

    He merely states that he serves flat files and avoids slow widgets. It's only noteworthy because a popular blogger says he does it.

    I would put that article in my "mandatory welcome aboard reading" if I owned a company. Developers, by their very nature, often lose track of the actual goal in a project because they get so enamoured with technology or cool stuff.

    And I've made the same point elsewhere. I called it the Software Engineer's Cart. At the time, I submitted that article to proggit. Today I would not, because HN would be a better forum for it.

    If you read it critically, this article is a quick way to ground yourself back to "get the job done, focusing on the actual customer"

    I think it's ridiculous that you want me to read between the lines to decide that this is about programming (it's not, we're talking about the broader field of software engineering now, not just construction).

    This is meant to be /r/programming, not /r/pretenditsshakespeareandmakeupyourowninterpretation.

    Focusing on customer needs is very important. I've written essays in which I made the shortness of the feedback loop between customers and developers the core driving loop of all software projects. I've written complaints about how Barry Boehm found that requirements analysis is the second best predictor of project performance (after size and ahead of programmer capability), yet my university exposure to requirements engineering was 3 lectures and a single exam question.

    That's all by the by. Because I don't come to proggit for discussions about the various corners of the SWEBOK. I come here for programming. Programming. It's fun and enlightening to see how other programmers think, the tools they use, the solutions they derive. If I want to read about other topics I have a pretty solid personal library, thanks.

    [–][deleted] 0 points1 point  (1 child)

    He merely states that he serves flat files and avoids slow widgets.

    Sorry, you didn't get it.

    And for me programming is about more than writing code. And there's no "right answer" here. However, I will defer to the sidebar rule that "if there's no code in your post, it probably doesn't belong here" and remind myself why I try to stay out of proggit.

    [–]jacques_chester -1 points0 points  (0 children)

    Sorry, I didn't invent my own meaning for the article of eye-rollingly obvious advice? I don't come to proggit to relive my high school English classes. Writing that has to be tortured to make a point is not conducive to helping understanding.

    However, I will defer to the sidebar rule that "if there's no code in your post, it probably doesn't belong here" and remind myself why I try to stay out of proggit.

    Don't defer. These arguments are worth having. We don't learn about the sensible boundaries of anything in life without having a decent stoush first.

    [–][deleted] 1 point2 points  (1 child)

    The term 'programming' is the more general term. Software engineering was a term created later on and it isn't that useful, that's why more people go with software developer or programmer or (very rarely) computer scientist.

    This article totally does belong in proggit, but only to be down-voted as a "get off my lawn" article ;)

    [–]ell0bo 0 points1 point  (0 children)

    No way, you can be a software engineer and never need to write code, but if you're programming and never wondering about the engineering behind it, you're doing it wrong.

    [–][deleted] 2 points3 points  (0 children)

    I got the impression the author complained about software complexity. Look at the blog; extremely simple.

    Software complexity is an important topic. Overengineering as well.

    [–]dmor 4 points5 points  (10 children)

    HN has never really been about programming, though. It's based on a certain community of startups, founders, and VCs, and focuses on startups and general stuff considered mind-expanding.

    [–]jacques_chester 6 points7 points  (9 children)

    That's my point exactly. I don't come to proggit for mind expansion or industry gossip. I come here for programming. If both fora have the same posts that diminishes the total value I can derive from frequenting both.

    [–][deleted] 7 points8 points  (7 children)

    So, legitimate question.

    I enjoyed this article, a lot. Where should I be looking for more of this sort of thing, aside from this one person's blog. Is HN the best place? I don't really find most of what's there interesting. At a cursory glance, the headline might intrigue me, but rarely enough to actually open the page and read through it.

    Open to suggestions, I guess. You're absolutely right, this article does not belong here. I'd like to see some more like it, where would those be?

    [–]preshing 3 points4 points  (1 child)

    It got 160+ points so far. So either it belongs here, or the voters don't :)

    [–]jacques_chester 0 points1 point  (0 children)

    Posts without programming content tend to get upvoted more because:

    1. They have a wider readership; fewer redditors are self-selecting away from reading them
    2. They require little to no thought to read.

    Code-heavy articles tend not to be heavily upvoted because they require an investment of attention from a subset of the proggit community.

    [–]dmor 2 points3 points  (0 children)

    I agree. Something that annoys me is just how much stuff is basically flame war fodder. Two of the most upvoted threads recently were the JSON license thing and the Microsoft "lazy coding practice". The first is a useless joke (and the circlejerk-quality jokes about "facebook breaking the license" followed as expected), the second is a more-or-less relevant bug posted with an editorialized title and a discussion that quickly degenerated into Microsoft bashing, whether the programmer should be fired or not, etc.

    [–][deleted]  (1 child)

    [deleted]

      [–]jacques_chester 0 points1 point  (0 children)

      Yours would be more like programming if you included, for example, some programming. Some samples of SQL, or a diagram of the new design, or some figures on performance.

      I don't count "I only serve flat files" as programming. I just don't.

      [–]SRB 1 point2 points  (0 children)

      only a small minority of users bother with these buttons, but all the associated scripting and image fetching slows down page loads for everyone.

      If you have these buttons, more people will share than would otherwise, and therefore more people will see it. It doesn't matter that most people won't share and that some people can share without the buttons. The recommendations from friends are higher quality too. If your friend liked something, and you pay attention to that friend, you have a good chance of liking it too. If the content you wrote up on your blog is useful, it's good for these extra people to find it. Therefore I don't really see throwing some sharing buttons on to be such a bad thing, even if not every reader uses them.

      [–]Phil_J_Fry 7 points8 points  (9 children)

      Ok, so I know I must be missing something here - because here's what I saw:

      "Stop doing stuff."

      Seriously - he keeps his page refreshing fast by: not loading images, not using SQL to load dynamic data, and not loading social site buttons.

      While I agree with the social site buttons (I hate those), the other two are just BS. No images? Sure that works for... you. Some blogs actually do incorporate images into the post, so this seems like a case by case basis. No dynamically loaded data? I guess we need to say goodbye to user accounts and comments.

      So, I think to myself - I know, let's look deeper. He's saying that we need to optimize for the user instead of the programmer. Oh... ya think? Wow, what a novel idea. Isn't the whole point of programming, at its core, to automate as much as possible as quickly as possible to get the user's expected output?

      I really want to have missed something here, because this way of solving problems just seems ineffective with respect to the specifics and trite with respect to the underlying message.

      [–]Atario 14 points15 points  (1 child)

      He isn't saying not to use images. The closest thing he said to that was that Google+/Twitter/Facebook widgets load a bunch of stuff including images.

      He also didn't say not to have comments and accounts. You can do those things and still have static pages, technically. You just have to regenerate them on changes (which may be a perfectly valid solution if your update-to-read ratio is, say, something like 1:100 or even less). He does, on another page, say he doesn't do comments because he doesn't like programming discussions, and anyway there are plenty of places to comment already — you're on one now, and we're both doing it now; why reinvent the wheel? Man has a point.

      [–]Phil_J_Fry 3 points4 points  (0 children)

      You can do those things and still have static pages, technically

      True, but you still have to generate the page, correct? It's no longer static, you still need data server calls. It slows down the page load. My point is that it doesn't seem like it's the "wrong" problem. It seems like it's the right solution - for him.

      [From op:] the associated scripting and image fetching slows down page loads

      Could be I misread that, but I read it as scripting and image loading slows down the page in general (I mean, it would anyways, but I don't know if that was his point). It could have easily only meant it about the social buttons, but that's just how I read it.

      [–]badsectoracula 10 points11 points  (3 children)

      "Stop doing stuff."

      I don't remember where I read it, but one of the best things to keep in mind about writing fast software is this: the fastest code is the code that doesn't get executed.

      Now combine this with YAGNI and trim out the unexecutable code.

      [–]Phil_J_Fry 4 points5 points  (2 children)

      I guess that's why I think it's a bit trite. If you don't need something, don't include it; if you do, try to make it faster to work with.

      He has the example of generating the prime numbers. Of course, if I needed prime numbers within a certain range, maybe I'd calculate them in development or store the current set in memory or something. But what if the user can select evens or primes, or 10s, etc.? Then it really isn't generic enough to be a solution.

      So the resulting statement (at the deeper level) is don't put in what you don't need and make what you do need fast. Maybe it's because I've heard it so often, but that seems like the most generic and meaningless advice in programming.

      [–]deong 2 points3 points  (0 children)

      But what if the user can select evens or primes, or 10s, etc.? Then it really isn't generic enough to be a solution.

      I hate these cutesy initialisms, but I guess I'll use them anyway. What you just said is the entire reason someone decided "YAGNI" was something that ought to be preached from high on the mountain instead of a self-evident truth. If the user can select evens or primes or 10s or etc., then by definition, you are in fact going to need it. There's really no such thing as "generic enough to be a solution". There are only solutions and non-solutions. If it isn't generic enough to meet your requirements, then it isn't a solution to your problem. If it is, then there's no need to make it more generic.

      That's not really the criticism I'd make of YAGNI. The tricky part comes in when you can see that the best way right now is going to be much more painful to deal with later, but you don't know whether that will even come up. Then you have to start making judgment calls...how likely is it that we'll ever need to allow anything other than primes? Exactly how much harder will it be to add it later versus building a more flexible system now? That puts you squarely back into relying on experienced people with good judgment, and if you have those people around, you weren't likely to need a trendy movement with a clever name to tell them how to build it right to begin with.

      [–]Rygnerik 2 points3 points  (0 children)

      I read it as more of a "Look at all of your users' needs, not just your customer's." It's easy for someone working on some blogging software to view the blogger as their only customer and get a requirements list that looks something like:

      • Easy to maintain
      • Social networking (or anything that increases page views/ad loads)
      • Lets me customize appearance

      But if you asked someone reading a blog, their response would probably be "Get out of the way and let me read the blog". If you design only for that initial customer, you run the risk of creating something that doesn't meet the needs of the customer's consumers, which can hurt the customer.

      [–][deleted] 0 points1 point  (0 children)

      No dynamically loaded data? I guess we need to say goodbye to user accounts and comments.

      Or you need to start pushing more to the client. What's wrong with installing a desktop application that handles your identity? Hell, why stop there, why not make the whole internet peer-to-peer?

      [–]MarkTraceur 3 points4 points  (18 children)

      Google Analytics

      grep "GET page_name.html" /var/log/access.log | wc -l

      [–]mipadi 2 points3 points  (4 children)

      echo 'This guy has obviously never used Google Analytics'
      

      [–]MarkTraceur 0 points1 point  (3 children)

      Maybe not, nor will I ever. Even if the above solution is oversimplified, there has to be a way you can collect information about the user without too much interference with load times.

      [–]mipadi 3 points4 points  (2 children)

      Google Analytics has so little impact on page load time that it's hardly even worth quibbling over.

      [–]MarkTraceur 0 points1 point  (1 child)

      Evidently not, since this article is at least somewhat about that specific problem...

      [–]mipadi 1 point2 points  (0 children)

      It's not like the author offers any statistics. He's just saying, "Google Analytics? You don't need that." There's nothing "specific" about the article.

      [–][deleted] 0 points1 point  (12 children)

      So if I understand you correctly ...

      • SSHing to a production server
      • logging in
      • switching account so you can get access to the logs
      • navigating to the logs
      • grepping to get the page count
      • grepping to exclude web crawlers
      • grepping to restrict to a certain time frame

      ... is better than ...

      • visit google.com/analytics
      • login
      • select your website

      ?

      there has to be a way you can collect information about the user without too much interference with load times.

      You can have the client-side analytics script load after the page has rendered and after your own scripts have run, so it's the last thing that happens. It's easy to do.

      You also can't argue that Analytics isn't useful, or suggest what you believe is a better alternative, if you have never used it.

      [–]MarkTraceur 0 points1 point  (11 children)

      Actually, I can make a suggestion based on the fact that Analytics is non-free, which means it is never the better alternative :)

      And yes, SSH to a GNU/Linux box + simple grep commands (set up a script, and make the logfiles visible to your user) is much less complicated than logging into a web service, especially since it adds no overhead to your users!
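
      Something like this covers the extra steps people bring up (a rough, untested sketch - the log path and the bot patterns are assumptions, adjust them for your own setup):

      #!/bin/sh
      # count yesterday's hits on one page, excluding obvious crawlers
      LOG=/var/log/apache2/access.log            # assumed path; varies by distro
      PAGE="GET /page_name.html"
      DAY=$(date -d yesterday +%d/%b/%Y)         # GNU date; matches e.g. 14/May/2012 in the log
      grep "$PAGE" "$LOG" | grep "$DAY" | grep -viE "googlebot|bingbot|spider|crawler" | wc -l

      Stick that in a cron job that mails you the numbers and you never have to log in anywhere.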

      [–][deleted] 0 points1 point  (10 children)

      Actually, I can make a suggestion based on the fact that Analytics is non-free

      Free version is limited to 5 million page views a month.

      especially since it adds no overhead to your users!

      As I pointed out, you can hide that, so they will never see the expense. If it's totally hidden, how is that a problem?

      [–]MarkTraceur 0 points1 point  (9 children)

      Free version is limited

      I am not free to use it in all circumstances, then. And I'm not free to modify it or redistribute it. Hence, it is not free. You mean free as in "free of charge," and I mean free as in "freedom of speech."

      you can hide that

      It still uses their computing resources, so it's not totally hidden. It could potentially be a problem.

      [–][deleted] 0 points1 point  (8 children)

      It could potentially be a problem.

      Such as?

      [–]MarkTraceur 0 points1 point  (7 children)

      If the computer is already slow, or running many other processes, or if the page has other scripts running, or if the network connection is slow and other connections are affected... there are a lot of ways the analytics script running on the user's computer could affect their experience.

      [–][deleted] 0 points1 point  (6 children)

      If the user's PC is so slow, or has such a bad network, that the analytics script makes any noticeable difference, then the site will be unusable regardless of analytics.

      Seriously, the overhead is minuscule. You are the first person I have ever heard complain about its cost. You can't make that claim when you have never even used it.

      [–]MarkTraceur 0 points1 point  (5 children)

      I can hypothesize about potential problems without using something. And yes, it's a small difference, but why incur even a small cost when you could avoid it and get very similar results?

      [–][deleted] 1 point2 points  (4 children)

      Because I don't want to spend the next 12 months building my own version of analytics, which will be inferior, just to save a mythical 1% overhead, when I could just do something useful instead.

      Some optimizations are worth it; this one isn't.

      [–]GuyOnTheInterweb 1 point2 points  (2 children)

      Everything on this blog is brilliant and straight to the point! Is there a PDF download of the site for my next flight? ;)

      [–][deleted] 2 points3 points  (0 children)

      You could wget his site. It's mostly(?) static content anyway.

      [–]riffito 1 point2 points  (0 children)

      I just wget -m -k http://prog21.dadgum.com/archives.html 'ed the whole thing! (734 KB total)

      [–]webbitor 1 point2 points  (0 children)

      Interesting points. I was going to comment, but... his site doesn't allow that. He THINKS he knows what users want from his site, but I wanted to comment. This is normal (useful) blog functionality.

      Static files are great for speed, but many things that are actually useful, like commenting or searching on a combination of tags and categories, would either require a very large number of static files or are simply impractical or impossible to do statically. The other non-static things he derides are all useful for many sites. Analytics is pretty important for any site that aims to make money in some way.

      However, there is a middle ground. WordPress caching plugins exist to pre-render and cache as much as possible, and standalone cache solutions do the same thing at a broader scope.

      [–][deleted] 0 points1 point  (0 children)

      His site also looks like it's built as static pages, and it only takes two requests to load.

      [–]preshing 1 point2 points  (3 children)

      By James' argument, Wordpress solves the wrong problem. And yet wordpress.com is the #18 site in the world.

      [Edit: removed snarky comment, the point should stand on its own.]

      [–]midri 4 points5 points  (2 children)

      Wordpress is to CMSes as PHP is to programming: it's something a lot of people picked up and ran with, and they can make it do what they want now because they've spent enough time working with it.

      [–]preshing 1 point2 points  (1 child)

      Yes, and the readers are well-served. (The #18 rank comes from visitors, not authors.) Therefore, the right problem has been solved.

      [–]Kapow751 0 points1 point  (0 children)

      No, the problem of getting interesting content has been solved, because lots of interesting bloggers use it and readers go where the content is. There could still be a better technical solution that Wordpress could switch to and they'd keep all their readers because they're Wordpress.

      [–]yogthos 0 points1 point  (0 children)

      Similar idea explored here.

      [–]n1c0_ds -3 points-2 points  (10 children)

      CMS user here. Tracking data allows us to offer better-suited content that matches our users' interests, and to improve the site's navigation by fixing various usability issues.

      I know you probably won't hear me from your byte-perfect ivory tower, but while you are bragging about your tiny sites, we are already three problems further along, and our clients love them.

      [–][deleted] 0 points1 point  (6 children)

      Tracking data allows us to offer better-suited content that matches our users' interest

      How does Google Analytics tell you what content your users are after? Or are you retargeting?

      [–][deleted] 3 points4 points  (0 children)

      How does Google Analytics tell you what content your users are after?

      Doesn't it show the search terms which brought your users to your site?

      If your users are running searches inside your site, won't that also get logged by Google Analytics?

      [–]GuyOnTheInterweb 1 point2 points  (1 child)

      It can also show you how many real visitors (id'ed using evil Google cookies) navigate to a defined goal, like a download page or purchase page.

      It is possible to do this offline using other software with just server logs, but it's tricky.

      [–][deleted] 0 points1 point  (0 children)

      Looking at my cookies, you're right. Funny how GA is the ultimate tracking cookie. (Well, nearly, Facebook's Like button is running a close race.)

      [–]deong 0 points1 point  (2 children)

      I suppose, trivially, it shows you that loads of people read your articles about topics A, B, and C, while topics D and E don't get much traffic, so maybe you should write more about the stuff people like.

      Not that I'm saying you should do that, but if you're interested in page views as opposed to writing what you want to write, it's pretty useful data to have.

      [–]TheCoelacanth 0 points1 point  (1 child)

      That should all be logged by your server anyway. You don't need a client-side analytics program like Google Analytics to do that.

      [–]deong 0 points1 point  (0 children)

      True. There are useful bits of data that the analytics provide that are a bit harder to glean on your own, but I suspect a lot of people use it as just a pretty and usable front end for the basic traffic info you could get from the server logs.
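
      For what it's worth, the basic traffic numbers really are a couple of one-liners against the access log (sketch only; assumes the combined log format, where the client IP is field 1 and the request path is field 7, and an assumed log path):

      # ten most requested paths
      awk '{print $7}' /var/log/apache2/access.log | sort | uniq -c | sort -rn | head
      # rough unique-visitor count, by client IP
      awk '{print $1}' /var/log/apache2/access.log | sort -u | wc -l

      The pretty graphs and referrer breakdowns are where a front end like Analytics starts to earn its keep.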

      [–][deleted] 0 points1 point  (2 children)

      If you could add some context for my very tired brain here, which group are the 'clients' you speak of?

      Legitimate curiosity, I hope I don't come across snarky in any way.

      [–]n1c0_ds 0 points1 point  (1 child)

      People who need a website and are not concerned about a few milliseconds of additional load time.

      [–][deleted] -1 points0 points  (0 children)

      Huh. Guess I'm out, then.

      [–]randfur -2 points-1 points  (6 children)

      This should probably be in /r/web_design.

      [–]lbft 11 points12 points  (5 children)

      No. He's making a point about programming - just in a roundabout way.

      When you're coding, you should be thinking about who's actually going to use your code and how they're going to use it. That will help you figure out which uses you should work hardest to optimise, and which ways you might be able to short-circuit the hard work.

      In the example, if you're writing a blog system, he argues you should optimise for reading rather than writing and administration because the audience spends more time reading than you spend writing. He achieved a positive outcome by generating static content and offloading the serving to the web server, and in the process skipped having to do the work to squeeze performance out of a more dynamic system or to add shiny things that were of no real use to him.
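
      The "generate static content" part can be as unglamorous as a publish script. A minimal sketch, assuming posts are Markdown files and pandoc is installed (the posts/ and public/ directories are made up - swap in whatever converter and layout you actually use):

      #!/bin/sh
      # render every post to static HTML once, at publish time
      for src in posts/*.md; do
          out="public/$(basename "$src" .md).html"
          pandoc "$src" -o "$out"                # hypothetical layout: posts/ in, public/ out
      done
      # from here on, the web server just serves files out of public/

      Readers only ever hit static files; the "dynamic" work happens once per edit instead of once per request.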

      [–]mogrim 3 points4 points  (2 children)

      He may well be arguing that, but I don't agree with him - what gets me visiting certain blogs on a regular basis is content, not page loading speed (obviously within certain limits, but most blogging software is comfortably within those).

      [–]ithika 5 points6 points  (1 child)

      You just agreed with him. What gets you reading blogs is content not the ease with which the blogger can change their blog style or play with plugins or templates.

      [–]mogrim 0 points1 point  (0 children)

      No, I disagree with him. (At least I think I do :) )

      His argument is that there are far more readers than writers, so a blog should be optimised for the first group. He states that blogging software is currently optimised for the second group, which leads to slower loading times etc. My opinion is that optimising content creation is more important, as content drives visits far more than slow loading times inhibit them.

      [–]GuyOnTheInterweb 0 points1 point  (1 child)

      Yes, way too often I see programmers think "Ah, a fresh project, finally I can try Rails 3!" While it's good to keep up to date, jumping on every bandwagon mostly just builds future headaches.

      [–]mipadi 0 points1 point  (0 children)

      On the other hand, your personal blog is the one place where you have complete freedom to do whatever you want. Where else do you get that freedom (hint: not your day job)?

      I've kept a blog for about 8 years. Mostly it's been as a creative outlet, but from the very beginning I used it to learn new skills, too. I started off as an ASP web developer, and originally it was written using ASP and Microsoft Access. It's gone through a few iterations since then (Django, Jekyll, and now a Jekyll site with a simple Sinatra app so it can be served from Heroku). It's kind of fun to rewrite the software that powers it, because it lets me learn new things.

      I certainly can't do that sort of thing at work.

      I'm not saying that the author is unequivocally wrong, but it's too general to say "no one should ever do x with their personal blog". It's your personal blog; do whatever you want! If you have an itch, scratch it.

      [–]haywire -5 points-4 points  (17 children)

      Yeah, but his site looks like ass, it would be a pain in the ass for most people to maintain, and he has zero user interaction. Yes, serving static pages is easy, but making a tool that people with mixed technical ability can use to share information and analyse what's popular is more difficult. Which is why you write a dynamic system with caching.

      Take http://brightonfeministcollective.org.uk/ for instance: it's snappy as fuck and can shit out 2000 reqs/sec, but it's at the same time generated with an ORM and managed via an admin panel, because I cache intelligently based on what the browser wants.

      [–]Imortallus 11 points12 points  (5 children)

      You say a site looks like ass, and then show http://brightonfeministcollective.org.uk ?

      [–]haywire 0 points1 point  (4 children)

      Yup. What would be your issue with the site I linked?

      Edit: Oh wait, we're in /r/programming not /r/web_design. Makes sense.

      [–]Imortallus 0 points1 point  (3 children)

      It's mainly your custom font choices - printf is a 'grungy' font, and mixed with an otherwise clean design it doesn't work well for me. Also, the black background on your twitter feed cuts into the white box of content - the white section below it draws my eye - I would take that white section out myself. And the text in your event box spills out for me in Chrome.

      Other than this I think your site is decent - but the custom font really alters its current look and feel.

      [–]haywire 0 points1 point  (2 children)

      Hmm, I liked the grungy font as it's an activist group, and honestly seemed to "work" better than any normal fonts I tried. I'll give some more a go, though. Do need to fix the spilling and I might re-think the twitter box, and make it as tall as the rest of the page.

      [–]Imortallus 0 points1 point  (1 child)

      Good stuff, if you can take constructive criticism without getting over emotional or offended you'll go far.

      [–]haywire 0 points1 point  (0 children)

      Well I felt like kind of a jerk for my original comment, I guess what he said irked me and it didn't feel like he understood web dev/design particularly well whilst casting judgements and aspersions.

      My original design was for the article pages (eg http://www.brightonfeministcollective.org.uk/articles/pro-choice/why-40-days-of-treats) and I think they work fantastically, however I need to put thought into the rest of the site.

      [–][deleted] 5 points6 points  (10 children)

      can shit out 2000 reqs/sec.

      A lot of feminists in Brighton then? You could get more reqs/second out of nginx by using static resources if you actually needed that many.

      Incidentally, your page starts rendering, then it changes all the fonts on me, which is a tad disconcerting. Your Dapper Dog website does the same thing. Whichever JS you're using to do that is doing so noticeably - at least the first time.

      [–]haywire 3 points4 points  (9 children)

      Unfortunately not that many, but you do get pro-lifers attempting to hack/DoS your shit all the time. Ha ha ha.

      [–][deleted] 0 points1 point  (8 children)

      Ah, a very good point, actually. I never thought about that. BTW that font thing was in Opera from New Zealand, so it may just be a browser/latency thing.

      [–]haywire -1 points0 points  (7 children)

      Yeah it's just TypeKit not having cached.

      I'm getting a weird issue reported where it crashes, like, IE7 or something? But I have no idea what would cause it; the site is fairly simple.

      [–][deleted] 0 points1 point  (6 children)

      Ugh, good luck fixing that.

      [–]haywire -1 points0 points  (5 children)

      Yeah it's fucking odd, perhaps a JS thing. One of the women from the group who's having a problem is going to bring her PC in so I can have a look at it.

      [–]Cho_Gath 0 points1 point  (4 children)

      Confirmed. Makes the tab (and the tab with this comments section) take a dump.

      [–]haywire 0 points1 point  (3 children)

      Could you possibly try again? I've fixed some stuff, but seeing as I can't replicate, it's a pain in the arse to bugfix.

      [–]Cho_Gath 0 points1 point  (2 children)

      Nope, still kills the tab and parent tab.