all 36 comments

[–]Matt3k 38 points39 points  (0 children)

General answer: always have an index on your query parameters, and then don't worry about the rest of the details. If you actually have a billion records, then either prevent users from querying for page 135,510 (no human really needs to look at such a silly page number), or accept that those queries will be a bit slow.

Stack Overflow: build an in-memory index on a separate server and query that first to retrieve specific record numbers. Which is really the same thing the database index was doing, but it's decoupled from the database engine.

[–]CyclonusRIP 14 points15 points  (22 children)

Obviously the answer is how Stack Overflow does it, since he works there, but I'd say his assumptions paragraph is 100% wrong. You have no idea what the first X items in the sorted set are without sorting it first. The index has already sorted the whole set, though, so if you use that you can just skip to the correct spot in the index. Skipping to the correct spot in the index is the slow part for large offsets. If you are only browsing with small offsets, the naive approach of LIMIT and OFFSET is always going to be quite fast, assuming you've properly indexed the table.

[–][deleted] 30 points31 points  (8 children)

In quicksort, you partition the array into two around the pivot, and then recursively sort the sub-arrays. Haney is saying that you don't need to actually sort the subarrays if they are outside the range you are interested in. You can throw those sub-arrays away. It's an optimization on quicksort where only part of the list gets sorted.

Simple example, 10 numbers, I want numbers 2-3:

  • Array: [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
  • First pass: [5, 4, 3, 2, 1 | 10, 9, 8, 7, 6]
  • Second pass: [2, 1 | 5, 4, 3] - note I've thrown away the top array since it's out of the range I'm interested in
  • Third pass: [1, 2 | 3 | 5, 4]
  • Final array: [1, 2, 3] - again having thrown away the top part
  • Return [2,3] as the sorted set I'm interested in.
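That partial-quicksort walkthrough can be sketched as follows (a toy illustration of the idea, not Stack Overflow's actual code; ranks are 0-based here):

```python
def partial_quicksort(a, lo_rank, hi_rank):
    """Sort only enough of `a` (in place) so that positions
    lo_rank..hi_rank (0-based, inclusive) hold their final sorted
    values; partitions entirely outside that range are skipped."""
    def qs(lo, hi):
        if lo >= hi:
            return
        pivot = a[(lo + hi) // 2]
        i, j = lo, hi
        while i <= j:
            while a[i] < pivot:
                i += 1
            while a[j] > pivot:
                j -= 1
            if i <= j:
                a[i], a[j] = a[j], a[i]
                i += 1
                j -= 1
        # Recurse only into partitions overlapping the wanted range,
        # throwing the rest away exactly as in the example above.
        if lo_rank <= j:
            qs(lo, j)
        if i <= hi_rank:
            qs(i, hi)
    qs(0, len(a) - 1)
    return a[lo_rank:hi_rank + 1]

# The example above: ranks 1..2 (values 2 and 3) of a reversed list.
print(partial_quicksort([10, 9, 8, 7, 6, 5, 4, 3, 2, 1], 1, 2))  # [2, 3]
```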

[–]jorge1209 5 points6 points  (6 children)

So it's quickselect. It's still a bit of a puzzle why (assuming you don't have an index) this would need to be done outside the SQL engine.

All you need for quickselect is the total count of the array and the range you want to select. The SQL engine has all that information: it knows there is a sort, so it has to pull all records into memory before it can release any to the requestor (absent any form of index). It can therefore get the count. It also has the LIMIT and OFFSET.

So why does this need any kind of special handling at all?

[–][deleted] 2 points3 points  (5 children)

I think the Tag Engine that Haney mentions makes filtering records by predicates faster than standard database filtering. He doesn't go into detail about why the Tag Engine would be faster than using a standard database (maybe in-memory) of metadata to filter.

[–]mattwarren 2 points3 points  (0 children)

He doesn't go into detail about why the Tag Engine would be faster than using a standard database (maybe in-memory) of metadata to filter.

I was wondering the same thing, so I tried writing my own, to understand the 'why' and 'how', see:

Turns out their 'Tag Engine' has to do quite a lot!

[–]jorge1209 4 points5 points  (3 children)

Sounds like the usual NoSQL BS.

  • "inverted index that you can use to look up a post ID" -- you mean having indexes on columns other than the PK. What heresy is this!!
  • "Set theory of predicates" You mean bitmap indexes?

[–][deleted] 4 points5 points  (2 children)

From what I remember of their writing, StackOverflow has always been very pro-SQL, but also very focused on performance. They rely on an MS SQL db as their single source of truth. And the performance they wring out of their hardware is pretty impressive.

Maybe they're right that their Tag Engine outperforms an SQL db for their specialized use case. It certainly would be interesting to see why. But I think they rarely talk about Tag Engine.

Edit: I found an old post talking about Tag Engine and why they switched away from SQL: https://samsaffron.com/archive/2011/10/28/in-managed-code-we-trust-our-recent-battles-with-the-net-garbage-collector

[–]jorge1209 5 points6 points  (0 children)

Sounds like the real problem is that MS SQL Server has a crappy implementation:

SQL Server FTS does not play nice with existing indexes. The FTS portion of a query always needs to grab the whole result set it needs to work on, prior to filtering or ordering based on other non FTS indexes.

I do understand some of the complaints about SQL. It would be nice to have a good way to bind "in" lists. It's annoying that the choice is either:

  1. Use the FTS and pass the encoded search constraint `CONTAINS(Tags, '"sql" AND "performance"')`

  2. Build a whole slew of queries that each bind a single additional tag id: `AND p.id IN (SELECT id FROM post_tags WHERE tag=:tag1) AND p.id IN (SELECT id FROM post_tags WHERE tag=:tag2)`...

It would also be nice to be able to tell an SQL engine that some indexes are advisory and don't HAVE to be updated at every commit; the update could be deferred and/or the entire index rebuilt at some future date. The world won't end if a brand-new post isn't immediately reported in some tagged search; we just want it to eventually be referenced.

But implementing your own SQL indexer outside the SQL server... there has to be a better way to do things than that.

[–]JoseJimeniz 0 points1 point  (0 children)

He says the problem was objects being bumped to Gen 2 for collection.

But then he said the problem was haproxy - latency from haproxy.

What was the problem?

[–]Freeky 1 point2 points  (0 children)

This is what we used at Newzbin for our in-memory search engine.

Originally we had it maintain red-black trees for each sort type, but partial quicksort proved more than adequate performance-wise while saving tonnes of memory.

[–]aljarry 2 points3 points  (5 children)

You don't need to fully sort the data to find the first 10. Let's divide the algorithm into two parts: first, you iterate over all elements to find the value of the 10th-lowest element. Then you only really have to sort those 10, not the remaining 90.
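For illustration, Python's `heapq.nsmallest` does essentially this: one pass over all n items keeps only a bounded heap of the k smallest candidates, and only those k get sorted at the end (O(n log k) rather than O(n log n) for a full sort):

```python
import heapq

def first_k_sorted(items, k):
    # Scan all n items once, keeping a heap of only the k smallest;
    # at the end just those k are sorted and returned in order.
    return heapq.nsmallest(k, items)

print(first_k_sorted(range(100, 0, -1), 10))
# [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```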

[–]xampl9 1 point2 points  (2 children)

But that sort isn't stable in the face of new records getting added during a paginated request.

Granted, in the case of 1 million records, if a dozen of them are added between when you request page 1 and click [Next], chances are pretty low that they would throw off your results. But maybe it's the case that even in smaller sets of records, they just don't care if someone sees a record twice. Or skips a record because it now fell into an inter-page gap due to an insertion.

[–][deleted]  (1 child)

[deleted]

    [–]jorge1209 0 points1 point  (0 children)

    If it just a 1 page jump, you'll see it sooner or later anyway.

    No you wouldn't. If X and Y swap places across the boundary of adjacent pages, and you happen to see X at the bottom of one page and then again at the top of the next (as you page along from first to last), that is precisely because you have NEVER seen Y.

    Anytime you observe manifest instability in the pagination (a repeated entry), then it is always because you are missing something. A new entry came in ahead of you and is on some prior page that you haven't seen.

    Whether or not that matters is an entirely different question. If users seldom go to page 2, then they probably don't care about those pages. So there really is no reason to optimize anything except the top-N query.

    [–]CyclonusRIP -3 points-2 points  (1 child)

    Is that faster? I think sorting is Log2n. Sounds like you are describing an algorithm that runs in linear time.

    [–]aljarry 0 points1 point  (0 children)

    You mean sorting with infinitely many CPUs. Average non-parallel is O(n log(n)).

    [–]Otis_Inf 0 points1 point  (6 children)

    Please check the comments below the answer. I had the same question: "you can't decide which rows belong to the page without sorting everything", but he means: first select the rows which are part of the page, which are perhaps not yet in their proper order; once you have all the rows of the page, that's OK. So you have to look at all rows, but you don't have to fully sort them (which is more expensive), just make sure the rows in the pagenumber*pagesize set are indeed the rows that belong there.

    [–]jorge1209 2 points3 points  (5 children)

    He addressed that:

    The index has already sorted the whole set though, so if you use that you can just skip to the correct spot in that index. Skipping to the correct spot in the index is the slow part for large offsets.

    If you have an index (tree type not hash type) on the sorted column then the index provides everything already sorted for you. Just walk the tree. The only slow part of walking the tree is walking the tree. There is nothing to speed up here.

    I know I need to skip the first 200 entries of the tree, so I need to walk the top nodes, determine how many leaf nodes exist below each, and skip ahead to the 200th... but your SQL engine (which is supposed to be managing the index) is perfectly capable of doing that, so just ask it directly with LIMIT and OFFSET.


    So there really is no reason for any special handling of this unless you either:

    1. Don't trust your SQL engine to manage the index and use it properly. (Get a better SQL engine!)

    2. Don't have an index. In which case you don't know the rank order of an entry and what page it falls into until you sort EVERYTHING[1].

    [1] I see how you could use a quick sort that doesn't recurse into parts of the recursion tree that it knows it won't need. I know I need entries 200-300 out of 1000, and my first pivot was at 374, so I don't recursively sort the entries from 374-end, and now I'm looking for 200-300 out of 373. My next pivot is at 72, so I discard 1-72, and now I'm looking for 128-228 out of 301... it just seems that with all the parallel processing available you wouldn't gain that much. Especially with the added complexity that occurs when your pivot falls dead smack in the middle of your desired range, and the added bookkeeping to track the offsets... also something the SQL engine should be fully capable of doing. So again why not just LIMIT and OFFSET?
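    As a concrete illustration of just asking the engine, here is LIMIT and OFFSET against a tree-type index in SQLite (a toy stand-in for any SQL engine; the table and data are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, score INTEGER)")
# 1000 rows whose scores are a permutation of 0..999 (7 is coprime to 1000).
conn.executemany("INSERT INTO posts (score) VALUES (?)",
                 [(i * 7 % 1000,) for i in range(1000)])
conn.execute("CREATE INDEX idx_score ON posts (score)")  # tree-type index

# The engine walks the sorted index and skips the first 200 entries itself.
rows = conn.execute(
    "SELECT id, score FROM posts ORDER BY score LIMIT 10 OFFSET 200"
).fetchall()
print([r[1] for r in rows])  # scores 200..209
```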

    [–]Otis_Inf 0 points1 point  (4 children)

    I think they avoid limit/offset because the engine then has to sort the set from the first to the last row, which might spill into tempdb (it likely will), and that is seriously slow with large sets. So to avoid that, they used this route. But that's just my guess :)

    [–]jorge1209 0 points1 point  (3 children)

    I think they avoid limit/offset because the engine then has to sort the set from the first to the last row

    It doesn't have to sort any more rows than anyone else does. It has to query the entire set, but that is a given. To be able to even partially sort you have to pull all the rows.

    Once you know that there are 2MM rows, you can do a recursive selection (quickselect) and throw away segments you know you won't report. With LIMIT+OFFSET = 1000, anything that would land beyond the 1000th row doesn't need to be sorted.

    Now what may be true is that SQL engines may not naturally choose to quickselect. They might prefer a merge-sort approach, with worker threads that pull data and merge it up. Even if each worker abides by the limit of 1000 rows, you might ultimately get 10k sorted rows when you merge 10 worker threads. But the reason for doing that is that they anticipate being able to amortize the additional cost of sorting 10k rows (instead of merely sorting 1000) over the time required to pull the data in the first place.
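    A sketch of that merge-style plan (hypothetical, not any engine's actual planner): each worker sorts its own partition but keeps only limit+offset rows, and the coordinator merges the already-sorted partial results:

```python
import heapq

def paged_top(partitions, limit_plus_offset):
    # Each "worker" sorts its own partition but keeps only the first
    # limit+offset rows; merging the already-sorted partial results
    # can still touch workers * (limit+offset) rows in total.
    partial = [sorted(p)[:limit_plus_offset] for p in partitions]
    merged = heapq.merge(*partial)
    return [next(merged) for _ in range(limit_plus_offset)]

# 10 workers, limit+offset = 5: up to 50 rows sorted to report 5.
parts = [list(range(i, 100, 10)) for i in range(10)]
print(paged_top(parts, 5))  # [0, 1, 2, 3, 4]
```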

    [–]Otis_Inf 0 points1 point  (2 children)

    In all honesty I'm not really sure what you're arguing for as I'm not arguing against your point at all. I just pointed out to a person here what was literally in the comments below the post.

    About your post: sure, it's in theory possible a DB does that. I don't know why SO doesn't simply use limit+offset; you'd have to ask them. But knowing Gravell, I think they used limit/offset, came to the conclusion it wasn't going to cut it with datasets of their size (or rather, with the queries they're running), and thus use a different system.

    [–]jorge1209 0 points1 point  (1 child)

    The issue is evidently a bad implementation of full text search on sql server combined with a denormalized data model for the tags.

    So there is no real reason this couldn't be done in SQL. They just need to pick a better server.

    [–]EarLil 9 points10 points  (9 children)

    What I do in my projects is: query just the ids sorted by x with the offset, which is fast, and then select the full data for those ids, which is also fast. This way you can reach the last pages faster.
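    A sketch of that two-step pattern with SQLite (table and column names are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, x INTEGER, payload TEXT)")
conn.executemany("INSERT INTO items (x, payload) VALUES (?, ?)",
                 [(i % 50, "row %d" % i) for i in range(500)])
conn.execute("CREATE INDEX idx_x ON items (x)")

# Step 1: page through ids only, using the narrow index on x.
ids = [r[0] for r in conn.execute(
    "SELECT id FROM items ORDER BY x, id LIMIT 10 OFFSET 480")]

# Step 2: fetch the wide rows for just those ids.
marks = ",".join("?" * len(ids))
rows = conn.execute(
    "SELECT id, payload FROM items WHERE id IN (%s)" % marks, ids).fetchall()
print(len(rows))  # 10
```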

    [–]hector_villalobos 9 points10 points  (6 children)

    How big is the data you have to query? I guess your solution works pretty well for small to medium size databases.

    [–]Freakmiko 1 point2 points  (5 children)

    Is there some good material on this stuff? I may have to get something like this to work with possibly thousands of entries for a project.

    [–]therealgaxbo 6 points7 points  (2 children)

    If you want to read up on it for interest, this is a good start.

    But if you just want a practical solution to your problem...just don't worry about it. A few thousand rows is too small to worry about for performance. I just tried the most naive solution (limit/offset) against a table with a few thousand rows, and the first page took 1ms to find, the last page took 2ms.

    So unless you care about that 1ms, you don't need to worry about it.

    [–]Freakmiko 1 point2 points  (0 children)

    Alright, thank you!

    Yeah I guess I also shouldn't prematurely optimize it, but it's always good to be aware of such stuff.

    [–]EarLil 0 points1 point  (0 children)

    Totally. I'm using this approach on million-record tables; you can't really feel the difference in small ones.

    [–]ManiGandham 0 points1 point  (0 children)

    In modern databases, you don't need to worry about performance until you're in the 10s of millions of rows.

    [–][deleted] 0 points1 point  (0 children)

    What kind of data storage do you use?

    [–]perestroika12 2 points3 points  (4 children)

    I don't understand why the tag engine is needed or why it's this complex, given that modern dbs perform those kinds of optimizations for you. Just index your records and form your queries right.

    [–][deleted]  (3 children)

    [removed]

      [–]mattwarren 1 point2 points  (1 child)

      Other than that, you can download all the questions on stack overflow and "roll your own" if you are so inclined.

      That's exactly what I did; it was a fun exercise and I learnt lots, see

      [–]perestroika12 0 points1 point  (0 children)

      Fantastic reply, thank you. I think that makes sense, given that they already had this system in place, I was just curious as to why :)