Posts Tagged ‘Searching’

Someone asked the question in #PowerShell (on irc.Freenode.net):

How do I find an item (in an array) based on one of it’s properties?

Actually, the question was rather more complicated than that. They were importing a bunch of users from a csv file, and wanted to sort them and search them based on a specific column. There are many ways to skin this cat. Imagine that you have a CSV file, and have imported it, like so:


Set-Content users.csv @"
LastName, FirstName, UserName, Url
Bennett, Joel, Jaykul, http://HuddledMasses.org
Rottenberg, Hal, HalR9000, http://halr9000.com
Hicks, Jeffrey, SapienScripter, http://blog.sapien.com/
"
@

$users = Import-Csv .\users.csv

Now, imagine that the CSV file has thousands of users in it, and that you need to not only sort the data by first name or last name on demand, but you also need to pull users from the list (by name) on demand.

These are trivial tasks in PowerShell: Read the rest of this entry »

I’ve been having problems with the search functionality on the PoshCOde repository, and I just thought I’d throw this up here because I just now solved the biggest problem: ranking. Up until now, the results have not been returned in order of relevance — this is because the search works using MySQL’s FULLTEXT BOOLEAN search, which doesn’t return in relevance order, nor does it return an extra ‘score’ column.

I’ve fixed that, and weighted the search so that words in the title count more than words in the code by creating a relevance column by hand:


SELECT *,
((1.3 * (MATCH(posttitle) AGAINST ('keywords' IN BOOLEAN MODE)))
+ (0.8 * (MATCH(description) AGAINST ('keywords' IN BOOLEAN MODE)))
+ (1.0 * (MATCH(code) AGAINST ('keywords' IN BOOLEAN MODE)))) AS relevance
FROM pastebin WHERE MATCH (posttitle,description,code) AGAINST ('keywords' IN BOOLEAN MODE)
ORDER BY relevance DESC LIMIT 25
 

Incidentally, the FULLTEXT index means that words shorter than 4 characters don’t count (I’m going to try to get this changed, but it’s an option for MySQL, so it has to be changed in the config file) in the meantime you can search for words using the wildcard character, like: SQL* and it sort-of works. The PoshCode cmdlet actually was adding *‘s to the query (although I’ve just decided that’s not a good idea, because it means that queries from the cmdlet appear to have different results than queries on the website.

MySQL’s FULLTEXT BOOLEAN search has all sorts of features (and limitations): there is a stopword list, maximum and minimum word lengths, and all sorts of operators for setting word precedence or negating words, or weighting them negatively … to REQUIRE that a word be present, it must have a + in front, and in order to mark a word as more important, you have to put > in front, not just put it first… I’ve been thinking about trying to apply a few of those tricks myself (eg: put * on words under four characters, and put > on the first 30% of words and < on the last 30% to try to simulate weighting them …) but my original feeling was that the search is more powerful if you just know that it’s a fulltext boolean search and can write your queries accordingly.

If anyone has any ideas for how to improve search in MySQL … or opinions on whether I should try to apply boolean operators to queries which don’t already have them … please let me know.

Search My Content