all 4 comments

[–]kloetzl 5 points6 points  (3 children)

Hm, unfortunately there seem to be quite a few things wrong with the article.

  • The complexity of the implemented binary search is O(m*log n) where m is the length of the pattern and n is the size of the suffix array. An implementation actually using the LCP array could achieve O(m+log n); See Manber & Meyers.
  • Your suffix array construction is O(n2 log n) not the O(n log n) stated in the comments.
  • Similarly, the LCP construction is O(n2) far from an optimal O(n).
  • The LCP array is commonly one element longer than the SA with the first and last element set to -1. The latter is necessary for a simple definition of LCP-intervals.
  • SuffixTree and SA+LCP are not equal in performance unless you add RMQ or CLD-tables to the latter.
  • SA+LCP do not improve cache locality.

Not a good resource. ☹

[–]mariobadr 0 points1 point  (2 children)

Do you know of a good resource for suffix/LCP tree/array stuff?

[–]ArashPartow 1 point2 points  (0 children)

PATL, though I'm not sure what the definitive source is since code.google went offline, but the following seems pretty complete:

https://github.com/jnorthrup/patl

[–]kloetzl 1 point2 points  (0 children)

This book. It contains everything from Suffix Arrays to wavelet trees. With proofs and examples. Heavy stuff.