I’ve been having problems with the search functionality on the PoshCOde repository, and I just thought I’d throw this up here because I just now solved the biggest problem: ranking. Up until now, the results have not been returned in order of relevance — this is because the search works using MySQL’s FULLTEXT BOOLEAN search, which doesn’t return in relevance order, nor does it return an extra ‘score’ column.
I’ve fixed that, and weighted the search so that words in the title count more than words in the code by creating a relevance column by hand:
Incidentally, the FULLTEXT index means that words shorter than 4 characters don’t count (I’m going to try to get this changed, but it’s an option for MySQL, so it has to be changed in the config file) in the meantime you can search for words using the wildcard character, like: SQL* and it sort-of works. The PoshCode cmdlet actually was adding *‘s to the query (although I’ve just decided that’s not a good idea, because it means that queries from the cmdlet appear to have different results than queries on the website.
MySQL’s FULLTEXT BOOLEAN search has all sorts of features (and limitations): there is a stopword list, maximum and minimum word lengths, and all sorts of operators for setting word precedence or negating words, or weighting them negatively … to REQUIRE that a word be present, it must have a + in front, and in order to mark a word as more important, you have to put > in front, not just put it first… I’ve been thinking about trying to apply a few of those tricks myself (eg: put * on words under four characters, and put > on the first 30% of words and < on the last 30% to try to simulate weighting them …) but my original feeling was that the search is more powerful if you just know that it’s a fulltext boolean search and can write your queries accordingly.
If anyone has any ideas for how to improve search in MySQL … or opinions on whether I should try to apply boolean operators to queries which don’t already have them … please let me know.
one trick i find use to find out the occurance of something in a text field. is to do a replace(@searchterm,’‘) and then substract that from the length the of the original text and divide it by the length of the searchterm.