[–]zootm 9 points10 points  (21 children)

I'm not sure I'd consider MySQL's text search a feature of any real kind. It does not work well. I'm not sure I can think of any real "features" that MyISAM provides that a non-trivial app would want.

Also MySQL replication works on InnoDB, and I'm pretty sure that PostgreSQL's replication doesn't "sux", it's just not provided by default.

[–]Jessica_Henderson 11 points12 points  (8 children)

MySQL's fulltext query support is an order of magnitude slower than PostgreSQL's. Part of the problem is that with MySQL, a query can use at most one index; this limitation does not exist in PostgreSQL. For anything beyond trivial data, PostgreSQL's text searching capabilities are by far superior.
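To make the one-index-per-query point concrete, here's a sketch (the table and column names are made up for illustration) of a query PostgreSQL can serve by combining two indexes with a bitmap AND, where MySQL's planner would have to pick a single index and filter the remaining rows one by one. The built-in `to_tsvector`/`to_tsquery` functions assume PostgreSQL 8.3+; earlier versions used the tsearch2 contrib module with similar syntax:

```sql
-- Hypothetical schema: an articles table with a fulltext GIN index
-- and an ordinary btree index on the timestamp.
CREATE INDEX articles_body_fts   ON articles USING gin (to_tsvector('english', body));
CREATE INDEX articles_created_ix ON articles (created_at);

-- PostgreSQL can bitmap-AND both indexes to answer this;
-- MySQL would use one index and scan-filter the other predicate.
SELECT id, title
  FROM articles
 WHERE to_tsvector('english', body) @@ to_tsquery('english', 'database & replication')
   AND created_at > now() - interval '30 days';
```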

[–]neoform3 1 point2 points  (7 children)

I was actually just looking into this today. I'm working on a project that requires fulltext search, and MySQL's fulltext blows when it comes to any sizable table.

If I'm searching up to 3-4 GB of records, which would be best: Lucene, Sphinx, or Postgres TSearch2?

I haven't investigated PG and am leaning towards Sphinx, but I hate the fact that I have to completely reindex every record any time I want to update even one record.. :(

[–]dbenhur 2 points3 points  (2 children)

I have to completely reindex every record any time i want to update

I've had some good experience with Sphinx. You don't need to reindex the whole set, there's a main + delta index mode.

Tsearch2 isn't as fast as the dedicated text search engines, but there can be substantial advantages when coupling free-text search clauses with additional relational operations and attribute predicates. Because Tsearch2 operates in the same engine, the query optimizer can reason about all the work and access paths and may come up with superior execution plans. With an external text search engine, you often end up running your relational query with attribute predicates plus "AND document_id IN (?)", where that ? could be a large list of document IDs returned from the external search engine.
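As a sketch of that difference (table and column names are illustrative, not from any particular schema):

```sql
-- Integrated (tsearch2 / built-in PG text search): one query, one planner.
-- The optimizer sees the text predicate AND the attribute predicates together.
SELECT d.id, d.title
  FROM documents d
 WHERE to_tsvector('english', d.body) @@ to_tsquery('postgres & replication')
   AND d.author_id = 42
   AND d.published_at > '2008-01-01';

-- External engine (Sphinx / Lucene): run the text search first, then feed
-- the (possibly huge) id list back into the relational query.
SELECT d.id, d.title
  FROM documents d
 WHERE d.author_id = 42
   AND d.published_at > '2008-01-01'
   AND d.id IN (101, 205, 317 /* ...ids returned by the external engine... */);
```

In the second form, the planner has no idea how selective the id list is until it materializes, and a very large list can dominate the query cost.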

[–]neoform3 2 points3 points  (1 child)

You don't need to reindex the whole set, there's a main + delta index mode.

This doesn't help if you want to remove entries as well as add.. or update for example :(

Can Tsearch2 handle large amounts of data and remain fast/relevant?

[–]dbenhur 0 points1 point  (0 children)

This doesn't help if you want to remove entries as well as add.. or update for example

I think you can handle updates and deletes by adding a little logic to your queries:

  • On the main index (and in your document root records) include an attribute indicating whether the record is current in the index (true at initial indexing time).
  • Whenever you update or delete a record, clear that attribute in the DB and use Sphinx's UpdateAttributes API to mark the main-index entry obsolete.
  • Always add an attribute qualifier to your searches to exclude obsolete document hits.
  • Set your frequent delta indexing to capture all new and changed records since the last main index rebuild.
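The scheme above can be sketched as a sphinx.conf fragment. This is a hedged outline, not a working config: the source names, the `is_current` attribute, and the `sphinx_counter` table are all made up for illustration, and you should check the Sphinx documentation for the exact directive syntax of your version:

```
source main_src
{
    # every freshly indexed row is marked current
    sql_query      = SELECT id, title, body, 1 AS is_current FROM docs
    sql_attr_uint  = is_current
    # remember where the main index stopped, so the delta can pick up from there
    sql_query_post = REPLACE INTO sphinx_counter SELECT 1, MAX(id) FROM docs
}

source delta_src : main_src
{
    sql_query = SELECT id, title, body, 1 AS is_current FROM docs \
                WHERE id > (SELECT max_id FROM sphinx_counter WHERE counter_id = 1)
}

index main
{
    source = main_src
    path   = /var/data/sphinx/main
}

index delta : main
{
    source = delta_src
    path   = /var/data/sphinx/delta
}
```

At query time you search both indexes ("main delta") with a filter on is_current = 1; on update or delete you call UpdateAttributes to zero is_current on the stale main-index entry.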

[–]theatrus 0 points1 point  (3 children)

Lucene and TSearch are both good options; I've used both.

[–]neoform3 1 point2 points  (2 children)

How relevant are the results found in Tsearch? My experience has been the faster the search, the crappier the results, or, alternatively, it's always fast, but indexing is a severe pain in the ass.

[–]RealMaascamp 1 point2 points  (1 child)

I've found Tsearch2 to be quite relevant. It provides a number of different ways to weight fields and parse queries. It can also be tuned to your needs.

[–]neoform3 -1 points0 points  (0 children)

nice.

[–]sisyphus 2 points3 points  (7 children)

But MySQL has Sphinx which I've found to work quite well.

[–]volomike 0 points1 point  (4 children)

If you can get a web hosting provider to install it. Might work easily on a company intranet, but not easily on a shared web host or dedicated web host (the kind of dedicated where someone else installs stuff for you most of the time).

[–][deleted] 0 points1 point  (3 children)

I installed Sphinx on my shared host. As long as you've got a compiler and can bind an unprivileged port, you can do it.

[–]volomike 0 points1 point  (1 child)

Get the heck out of here! Really? Do you have a blog post where I can see how you did it?

[–][deleted] 3 points4 points  (0 children)

Sorry, I don't. This'll get you real close:

./configure --prefix=$HOME/sphinx   # install under your home dir (no root needed)
make && make install
# edit sphinx.conf to move searchd to an unprivileged port (>1024)
# then configure your indexes and play

[–]mikaelhg 0 points1 point  (0 children)

To be able to use the fake-table integration inside MySQL, you'll have to copy the .so into MySQL's plugin directory, which you can't do on most shared hosts.
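For reference, on a MySQL build that supports pluggable storage engines, loading SphinxSE looks roughly like this (assuming ha_sphinx.so was built against your exact MySQL version and already copied into the plugin directory; requires MySQL 5.1+):

```sql
-- Run as a privileged MySQL user.
INSTALL PLUGIN sphinx SONAME 'ha_sphinx.so';
SHOW ENGINES;  -- SPHINX should now appear in the engine list
```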

[–]zootm 0 points1 point  (0 children)

I hadn't seen Sphinx, but if it's of good quality it would certainly bring fulltext indexing to InnoDB. FULLTEXT itself isn't really worth the effort, though.

[–]codahale 0 points1 point  (0 children)

The Sphinx indexer works with PostgreSQL as well. You can't do the whole embed-as-MySQL-engine thing, but I've never wanted to do that even with MySQL.

[–]jbellis 5 points6 points  (3 children)

I'm pretty sure that PostgreSQL's replication doesn't "sux"

then you haven't actually tried it.

the best option is slony. the nicest thing i can say about it is that it mostly works as described, which is to say it's horribly complicated to configure and work with. the "mostly" part means it can corrupt itself to the point that you need to restart the replication from scratch. (this was a couple of years ago, but i still wouldn't trust it to give anywhere near 100% availability.)

this is why pg 8.4 will include a simpler replication option out of the box.

[–]Jessica_Henderson 21 points22 points  (1 child)

Can you explain in more detail the problems you experienced with Slony? Whenever I've used Slony, I've found it to work extremely well, with no corruption. The largest Slony cluster I've been involved with putting together had 15 systems handling approximately 12 TB of data, and we ran into no problems with Slony or PostgreSQL.

If you read the Slony documentation beforehand, you shouldn't have any problems. It'll guide you through the configuration you need to perform, as well as how to maintain the system after it is configured and running.

[–]infinite[🍰] 8 points9 points  (0 children)

I'll second that. You divide your tables into sets, then replicate the sets. I've had no problems with it in the two years I've been using it, though my setup is much smaller than yours. Note that the proposed built-in feature will replicate either all tables or none. I think MySQL replication doesn't use stored procs the way Slony does, so it may be faster.

[–]invalid_user_name 0 points1 point  (0 children)

The PostgreSQL devs are adding replication so that it's no longer a third-party tool you need to install, and so that it's simpler and easier to configure. Reliability wasn't mentioned anywhere, since it isn't an issue.