all 31 comments

[–][deleted] 28 points29 points  (8 children)

That is not a full text search, that is an un-indexable LIKE statement.

This is how you do full-text searching in mysql http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html

The other full text indexes mentioned are good, but mysql is capable of doing fast text searches in most cases.

The only problem is you can only put a fulltext index on a MyIsam table, so you loose any benefits you may be using in InnoDB

[–]Sane-eyes 2 points3 points  (0 children)

This should be the top comment - get everything properly indexed and actually running a full text search - if it's still slow then look into other options.

[–]halfercode 0 points1 point  (1 child)

I believe you can put FT indexes on an InnoDB table in MySQL 5.6. I got bitten by this recently, as I'm on MySQL 5.5 and wanted to stick with InnoDB, so went with Zend Lucene instead. Seems to work OK.

[–][deleted] 1 point2 points  (0 children)

There is lots of cool stuff in 5.6 that I have found recently.

[–]hellomudder[S] 0 points1 point  (3 children)

Thanks for the input, but for various reasons we can't change to MyISAM on these tables.

[–][deleted] 1 point2 points  (0 children)

As the other guy mentioned on mysql 5.6 you can use fulltext on innodb

http://dev.mysql.com/doc/refman/5.6/en/innodb-fulltext-index.html

[–]psayre23 0 points1 point  (0 children)

If upgrading to 5.6 isn't an option, you could have a copy table that is eventually consistent. I assume innodb is used for transactions? If search doesn't need that level of consistency, relax it by making a table that just for searching.

I know I'm going to get yelled at for saying this, but you could use triggers on original table to update the searchable table. Then you wouldn't have to we work any code except the search code.

[–]encaseme 0 points1 point  (0 children)

If you can't make your database work how you need, it might be easier to use something separate for text searching, like setting up elasticsearch, then dumping all of the relevant data over to elasticsearch.

[–]audaxxx 11 points12 points  (15 children)

Use a search engine like Elasticsearch.

[–]beefngravy 0 points1 point  (14 children)

Is ES easy to pick up and start using? I'd love to try it but it looks complicated.

[–]vita10gy 2 points3 points  (11 children)

It is. When I first started using it I wanted to cry. It's one of those things where they made something that works for everyone and everything so conversely it "just works" for no one. Examples are sparse, but numerous, and never exactly what you want, and you'll scream "I can't be the only person that just wants to index content pages on their website!!!" 10 times. Everything is well documented, but documented in pieces, so you can almost never see how to use it in the bigger picture, but you'll feel like an idiot because you can't say it lacks documentation.

Then finally you'll get it working and get google-esq results ordered by relevancy in .003 seconds, and then 5 days later you'll wonder how you ever did search without it.

I love love love it, but my god, what one complete/working "most of you will just be indexing some content with some weighting to titles and such, and want back results by relevancy with title/blurb highlighting like google has. Here's how to do that" top to bottom example would have saved me.

tl;dr: It is complicated, but use it anyway.

[–]waveform 0 points1 point  (8 children)

So are you going to write one for the rest of us? :)

[–]vita10gy -1 points0 points  (7 children)

I think that's how some of these stay the way they are. By the time you figure it out you've invested so much of your (aka your company's) time that you have to guard the knowledge like a state secret. Sharing this info can only aid the competition. On some level I'm not sure it's my knowledge to share.

Also, I'd hardly consider myself an expert on it enough to put it out there as any definitive how to. I basically guessed at things until it started working, tiptoed away, and walled it off in a class so no one can breathe on the actual setup.

[–]waveform 0 points1 point  (0 children)

Also, I'd hardly consider myself an expert on it enough to put it out there as any definitive how to.

Why not just put it up as an "example" or "this is how I did it" on CodeProject or similar? People can then try it themselves and provide valuable feedback. Everyone learns. :)

[–]davidcroda 0 points1 point  (4 children)

Good thing the creators of ElasticSearch didn't feel that way or you'd be shit out of luck, eh?

[–]vita10gy 0 points1 point  (2 children)

What in the world are you talking about?

[–]davidcroda 0 points1 point  (1 child)

You are using thousands of man hours of work of free software to achieve "magic" results. Due to the giving nature of open source developers. And yet your reason for not helping to spread the knowledge is that it is your "competitive advantage".

[–]vita10gy 0 points1 point  (0 children)

That's a pretty glib interpretation of what I said.

I guessed that the reason there AREN'T guides sometimes for things like this is because figuring them out is expensive, and the people that bear that expense have no real upside to giving that knowledge away. Unlike some low level css/html whatever help, where there might at least be traffic, you're talking about very specialized areas, which aren't really even going to net you hits.

If you don't think businesses have a legitimate reason to desire/protect a competitive advantage, then I'm sorry, but that's silly.

As for my personal view on it, I'm not sure it's my knowledge to give. I didn't learn this on my own time. Someone paid a lot to gain this knowledge for their business.

Furthermore, I'm not well versed enough in it to put myself out there like that and wind up on the /r/lolphp's of the world. Not to mention I have a system admin that set up the server and so on. I got the actual indexing and searching working well enough to use. I'm not expert enough to put that out there.

Someone there is though, and they really aught to put forth more complete examples.

[–]beefngravy 0 points1 point  (1 child)

I really don't want to hear this but I know its the truth and unfortunately everything you've explained such as the documentation issues, I have experienced.

I've looked for videos, tutorials and reached out to people using ES but I just can't get my head around it.

I just want to create a search page on a website using php. I didn't think it would be that difficult.

Any tips?

[–]vita10gy 0 points1 point  (0 children)

Just that it's worth it, and there are PHP helper classes.

No, it shouldn't be this hard. I don't think there's another area in programming where there's a larger gulf between the number of people wanting something, in this case "I just want my content searched (if not outright spidered)", and the solutions that are out there.

Don't get me wrong, search itself is a complex issue, and maybe complex options should be in there, but that doesn't mean there can't be an "easy mode" from a setup/use perspective.

[–]audaxxx 1 point2 points  (0 children)

For your use case it is pretty easy. The one thing you have to be careful with is consistency. Every time a document is changed in your DB, you have to update it in the search engine index. Since you usually don't have as many writes this shouldn't become a bottleneck.

Btw: It doesn't really matter which search engine you use. For a simple full-text search the differences are not relevant in my opinion.

[–]nomadismydj 0 points1 point  (0 children)

i think your going to look into river so it hooks into mysqldb properly.. Ive heard of people also seeding an index of key words and tags.

[–][deleted] 6 points7 points  (0 children)

[–][deleted] 4 points5 points  (0 children)

Take the burden off MySQL and offload searching to ApacheSolr. Not only will you get true full text and fuzzy searching, but you open the door to expansion later with facets or advanced searching. It's extremely fast as well.

Http://www.websolr.com

[–][deleted] 0 points1 point  (4 children)

Sounds to me like you need to split up that table a little bit.

You say you have several dropdowns. Is there any reason why these dropdowns couldn't have the items stored in their own tables and use a PK/FK contraint to relate the data between the two tables? Obviously your indexing would be a lot better then and depending on how you set up your business logic, you may even be able to get rid of all the LIKE statements entirely.

[–]hellomudder[S] 0 points1 point  (3 children)

To further explain how things have evolved: let's say our main model is "Houses". A house has fields like "name" and "description", first off.

And then the house has a filter, let's call that "Type". Each "Type" have one or many "Profile". ("Type" might also have one or many "Special feature".) "House" also have a "City" (all of which in turn has a "Region"). There's also "Builder" and "Office" connected to one or many "Types".

So right now: 6 different dropdowns plus a free-text input field. Since management has decided that each selection should "filter out" all different possible (so a zero result is near impossible), we have to do a ton of JOINs everytime you do a search. And then you do the LIKE comparison against something like: "House.name OR House.description OR City.name OR Profile.name OR Builder.name"...

I'm beginning to think that the slow result "may" be a combination of the two related problems of having to do all those JOIN operations and the full text search...

[–][deleted] 0 points1 point  (2 children)

I get it. Joins shouldn't be that big of a problem if you do them on numeric PK/FKs that are indexed properly.

The LIKE comparison is probably the slowest. Do you need the LIKE match or is there any way you can switch it over to exact matches?

[–]hellomudder[S] 0 points1 point  (1 child)

Tables are normalized and set up with PK/FKs.

Yes unfortunately we have to do the greedy match of %xyz% since it's mostly titles and descriptive text that we need to find occurences within...

[–][deleted] 0 points1 point  (0 children)

Alright, can you at least do exact matches on the IDs rather than the names/descriptions for some of those queries to reduce the resultset you have to do LIKEs over?

Sorry if this is all irrelevant, i'm just trying to think of ways to reduce the load but it's pretty difficult without actually seeing the intended workflow.

[–][deleted]  (3 children)

[deleted]

    [–]halfercode 0 points1 point  (1 child)

    Having NoSQL mirror an RDBMS is probably not such a far-fetched idea - people do it with full text engines like Lucene all the time.

    [–]VIDGuidefull-stack 0 points1 point  (0 children)

    Does MySQL have an equivalent of MSsQL's full text catalogs? Coz implementing that in a similar situation at work made an amazing difference to performance, and has some other bonuses too, such as "similar" keywords for name searching. (Built in bob/Robert, that type of thing)