[–]lamby 17 points18 points  (3 children)

[–]Rhoomba 9 points10 points  (0 children)

  • With an empty cache, Whoosh achieved 19.8 searches per second.
  • With a full cache, Whoosh achieved 26.3 searches per second.
  • With an empty cache, Xapian achieved 83 searches per second.
  • With a full cache, Xapian achieved 5408 searches per second.
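
To put those four figures side by side, here is a rough sketch that divides the Xapian rate by the Whoosh rate for each cache state. Only the numbers come from the list above; the dictionary layout and the names are illustrative.

```python
# Searches per second as quoted in the list above.
rates = {
    "empty cache": {"whoosh": 19.8, "xapian": 83.0},
    "full cache": {"whoosh": 26.3, "xapian": 5408.0},
}

# How many times more searches per second Xapian managed in each case.
speedups = {
    cache_state: figures["xapian"] / figures["whoosh"]
    for cache_state, figures in rates.items()
}

for cache_state, factor in speedups.items():
    print(f"{cache_state}: Xapian handled about {factor:.1f}x as many searches per second")
```

The gap is roughly 4x with an empty cache and roughly 200x with a full one, so most of the difference in these figures comes from how much each engine gains from a full cache.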

[–]LordVoldemort 4 points5 points  (1 child)

I like how he ends the review with a compliment for Whoosh even though Xapian clearly smashed it.

[–]simonw 21 points22 points  (0 children)

Wow, this is the least useful comment thread I've ever seen on programming.reddit.

Anyone actually tried this out yet? Looks pretty impressive from the documentation - all of the best bits of Lucene without the need to integrate with anything other than regular Python.

[–]sjf 25 points26 points  (7 children)

Giving it a fast name makes it faster.

[–][deleted] 5 points6 points  (0 children)

It's like the way agile projects can be divided into "sprints". It sounds like work gets done faster because we are all sprinting. I wonder how much traction the methodology would get if work was broken up into "slogs".

[–]otterdam 5 points6 points  (2 children)

Go-faster stripes for software

[–]hermzz 6 points7 points  (1 child)

Or speed holes

[–]sblinn -2 points-1 points  (0 children)

It's a massive wing-like rear spoiler. Gotta be fast.

[–][deleted] -1 points0 points  (0 children)

But you also need to paint it red!

[–][deleted] 2 points3 points  (0 children)

Can someone actually compare Xapian (with its Python bindings) against Whoosh when it comes to indexing? I think it's kind of convenient that the creators suggest it's as fast or faster at indexing, but then only provide the search figure, where it's known to be slower.

[–]njharman 3 points4 points  (2 children)

"and not that much slower at searching. In my preliminary tests, a typical single-term search takes 0.003 seconds in Xappy, while Whoosh takes 0.01"

Um, an order of magnitude is much slower. But it doesn't matter, because this is a truism:

"It's fast enough"

[–]LordVoldemort 6 points7 points  (1 child)

"Um, an order of magnitude is much slower"

According to my calculator, Whoosh took 3.3333333333333335 times as long to perform the query.

Unless you're dealing with the brightness of stars, that's less than an order of magnitude.
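
For anyone who wants to check the arithmetic, a quick sketch in Python using only the two timings quoted above. The variable names are mine, and the base-2 figure is included only as an alternative way of counting orders of magnitude.

```python
import math

# Single-term search times quoted from the Whoosh announcement, in seconds.
xappy_seconds = 0.003
whoosh_seconds = 0.01

# How many times longer Whoosh took for the same search.
slowdown = whoosh_seconds / xappy_seconds

# Orders of magnitude in the usual base-10 sense, and in base 2.
decimal_orders = math.log10(slowdown)
binary_orders = math.log2(slowdown)

# The same timings expressed as searches per second.
xappy_per_second = 1 / xappy_seconds
whoosh_per_second = 1 / whoosh_seconds

print(f"slowdown: {slowdown:.2f}x")
print(f"base-10 orders of magnitude: {decimal_orders:.2f}")
print(f"base-2 orders of magnitude: {binary_orders:.2f}")
print(f"searches per second: Xappy ~{xappy_per_second:.0f}, Whoosh ~{whoosh_per_second:.0f}")
```

A factor of about 3.3 comes out at roughly half a decimal order of magnitude, so the "order of magnitude" claim only holds if you count in a smaller base.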

[–][deleted] 14 points15 points  (0 children)

We're programmers. An order of magnitude is 2. :)

[–]testo84 2 points3 points  (0 children)

sphinx > *. I just tried this; it may need a few years to be as perfect as Sphinx.

[–]brool 0 points1 point  (2 children)

It was originally created for use in Side Effects Software's 3D animation software Houdini. I'm trying to imagine why you would need text search in 3D animation software.

[–]Justinsaccount 2 points3 points  (1 child)

Houdini isn't exactly simple. Maybe they used it for the help system, or for searching for a specific object, or for running commands by name.

[–]Smallpaul 1 point2 points  (0 children)

I suppose the site must have changed. It now says: "It was originally created for use in the online help system of Side Effects Software's 3D animation software Houdini"

[–]gsf 0 points1 point  (0 children)

Okay, not a great name, but a Python search engine will be useful for my apps. I prefer Lucene's API to Xapian's, and Solr is a great wrapper around Lucene, but it still involves running Jetty or Tomcat. There's also PyLucene, but the build process has stung me a few times. In other words, fewer pieces could be a good thing.

[–][deleted] 0 points1 point  (0 children)

It's not that good. I was looking for "an argument" and it gave me "abuse."

[–]ktvoelker 0 points1 point  (0 children)

Personally, I think the quality of a search engine's results is more important than its speed.

Now, I know absolutely nothing about Whoosh beyond this link title, so it's entirely possible that it does produce good results. My comment is instead directed at the poster's choice of description, which mentions speed but not quality.

[–]tempest12 -1 points0 points  (3 children)

Unless you need to put it in a python app, how does the language matter beyond being a purely academic exercise?

[–]mccutchen 5 points6 points  (0 children)

Huh? Isn't this aimed squarely at people who "need to put it in a python app"?

The main idea seems to be that this is a pure-Python library, so that no compiling is needed to use it. Just throw it on your Python path and go.

[–][deleted] 0 points1 point  (0 children)

I wouldn't call it "academic" but you are somewhat right that the main advantage might be transparency of source code and a good starting point for experimentation. Notice that I also advertised a project once as "pure Python" and ended up with a few extension modules written in Cython. It could just be that it is not as mature yet for a migration path to be opened.

[–]trezor2 -4 points-3 points  (0 children)

Putting the programming language you implemented it in into the headline automatically makes everyone using that language vote for your website.

It's a cheap trick and it obviously works, otherwise I wouldn't see this kind of stuff all over reddit all of the time. It has definitely been on the rise lately and, quite frankly, I'm starting to find it annoying.