Posts Tagged ‘Recommender’

Well, I feel like I left some of you hanging (those who read my blog but don’t follow me on Twitter or Facebook or FriendFeed or … something) ...

I aced my Master’s Defense [groupwoot]

Which is to say, I got the highest grade possible for my project (yes, RIT grades your Master’s project when you defend it) and am (unofficially, since I’m still waiting for the paperwork to clear) done with my Master’s degree.

Here are the slides from my presentation (not very exciting without all the stuff I said … maybe I should scan in my 3.5” cards), along with the source .tex including my final paper, the proposal and the original research I did. The source code for the project is available, as well as compiled binaries, and if you actually want to work on it, I’ve got my database, which I could make available.

I took the weekend off


I intended to spend last week celebrating my defense by releasing a new PoshCode build, but stuff happened™ and before I knew it, I had spent the whole weekend hanging out with my kids, setting up a new MythTV box, and then upgrading my main development box to 64-bit Vista (I’ve been frustrated about missing out on some of my RAM because I was running 32-bit Vista).

That’s pretty much completed now (except for the hanging out with the kids part :D ), so tonight, or tomorrow at the latest, I will get back to the work I had left for PoshCode and try to get that done in time to use it in my presentation next week.

I have several backlogged blog posts waiting…

  • I’ve got a few more ideas about doing REST from PowerShell, and about setting up DekiWiki for the PowerShell Community
  • I wrote a little script to do compiled C# in PowerShell v1, and a couple functions to do custom IComparer implementations with it.
  • Visifire now supports WPF (which means you can use their charts from PowerShell)
  • I wrote an extensible IRC bot in under 40 lines of code using SmartIRC4Net from PowerShell

And on top of that, I’ve got a ton of work at work now, mostly involving writing Ruby ...


Well, my project report has been accepted and I’m defending my Master’s on December 2nd, 2008 at 4:30pm. You’re all invited to come and see how slick my Self-Organizing Maps Recommender is, from database to PowerShell cmdlets and all.

Report Abstract

This project comprises designing and implementing a hybrid recommender system for web pages which uses data from a social tagging system to recommend interesting items to users. For this implementation, the tagging data comes from del.icio.us, the oldest and largest public social bookmarking or tagging system for web pages. The system clusters items using a pair of self-organizing map (SOM) networks and allows users to see the system’s evaluation of their region of interest, or set their own regions.

The focus of this project was the web-scraper for gathering tagged URLs from del.icio.us, and the recommender system. The SOM networks are built using the GHSOM implementation, and several recommenders were built to compare results: one using a single map for URLs and users and the other using separate maps to compare the relative quality of the recommendations.


The last month went rather badly. First I got distracted by another project for two weeks … and then when I finally got back to working on the recommender, I found a bunch of problems with the way my database code was working (or rather, not working).

Essentially, I made a mistake when I wrote the tests: I mostly tested my DataSet logic, and not the round-trip to the database, or even the individual queries to the database. Even though my unit tests were passing, the data wasn’t being stored in the database correctly. I’ve added a few tests which actually run SQL queries against the back-end database after exercising the interface, and I’ll add a few more in the next week or so as I actually try running the scraper, but the couple that are there now will keep the same bug from coming back.
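To make the difference concrete, here is a minimal sketch of a round-trip test, written in Python with an in-memory SQLite database rather than the project's actual C# and database code. The table layout and the `save_bookmark` function are hypothetical stand-ins for the real data-access layer; the point is only that the assertion reads back from the database with plain SQL instead of inspecting in-memory objects.

```python
import sqlite3


def make_database():
    """Create an in-memory database with a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE urls (id INTEGER PRIMARY KEY, url TEXT UNIQUE NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE tags ("
        "url_id INTEGER NOT NULL REFERENCES urls(id), tag TEXT NOT NULL)"
    )
    return conn


def save_bookmark(conn, url, tags):
    """Hypothetical data-access function: store a URL and its tags."""
    cur = conn.execute("INSERT INTO urls (url) VALUES (?)", (url,))
    url_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO tags (url_id, tag) VALUES (?, ?)",
        [(url_id, tag) for tag in tags],
    )
    conn.commit()
    return url_id


def test_round_trip():
    conn = make_database()
    save_bookmark(conn, "http://example.com/", ["som", "recommender"])
    # Query the back end directly instead of trusting in-memory state.
    rows = conn.execute(
        "SELECT u.url, t.tag FROM urls u "
        "JOIN tags t ON t.url_id = u.id ORDER BY t.tag"
    ).fetchall()
    assert rows == [
        ("http://example.com/", "recommender"),
        ("http://example.com/", "som"),
    ]
    return rows
```

A test like this fails as soon as the write path stops persisting data, even if every in-memory check still passes.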

At any rate, I’m clearly at least three weeks behind schedule. I expect I’ll be able to get back on track, but it will be partly at the cost of leaving the SOM code in C++ for now, or using an automatic converter to do most of the conversion from C++ to C#. Overall, that’s not really a huge problem, as long as I can hook it in without resorting to a command-line interface.

The inaptly named Windows RSS Platform is actually part of IE7, not part of Windows, and therefore is available on Windows XP if IE 7 has been installed, as well as on Windows Vista (where IE 7 is included originally). However, having said that, it isn’t just for IE: it includes a complete COM API which is usable from script or the .Net Framework, and the header files are part of the Windows Platform SDK and usable from C/C++.

The RSS Platform is intended to introduce a unified approach to RSS for Windows applications, where all applications use the same RSS feed store, and a service handles downloading the RSS feeds — including enclosures if requested — and normalizes them so applications need not handle parsing all the different feed formats (that is, you only need to parse the Microsoft-normalized RSS 2.0 with extensions).

As a platform for building RSS-based applications, it’s very well done, and well thought out. It’s now ridiculously easy to create an RSS reader, since the platform removes all need to parse XML except in the weirdest situations, and allows all applications to be instantly integrated on the same list of RSS feeds … let me show you …

Well, I’ve decided to refer to my SOM Recommender by the initialism SOMR (which for the sake of the argument, I pronounce “sommer” like “summer” but with an o), and I’ve been working on it for about a month. It’s been a busy month outside of working on this project, with end-of-year stuff at work, and of course, the Christmas holiday with family, but I’m basically tracking correctly on my schedule despite that.

After some extensive research, I decided to go the project route instead of the thesis route at RIT, primarily because I’m not immediately going to be working toward a doctorate. But also because I’m still a bit more interested in the code part of the research, and the project defense is rumored to be easier than the thesis, without the requirement to prove that my idea is original. At the end of the day, I’m still working full time, and have a family, so my primary concern right now is to get my degree completed as soon as I can.

My project proposal was accepted, and I’ve started working on code and databases. I’ll be posting regular updates here, along with a link to my subversion server, but I wanted to start by posting a short summary of my project.

The project comprises designing and implementing a hybrid recommender system for web pages which uses data from a social tagging system to recommend interesting items to users. For the initial implementation, the tagging data will come from del.icio.us, one of the oldest and largest public social bookmarking systems. The system will cluster items using a self-organizing map (SOM) network and will include a new SOM visualizer that allows users to see and modify the system’s evaluation of their regions of interest.

The focus of the programming project will be a scraper for gathering tagged URLs from del.icio.us, a visualizer, and the recommender. The SOM network code will be based on existing implementations, and two recommenders will be built, to compare the relative quality of the recommendations: with one using a single map for URLs and users and the other using separate maps.

For those who are interested, the full project proposal is here with a short specification and design document, as well as my proposed schedule — that I’m not too far off of, so far ;) .

My wife and I went on vacation for most of last week, so I didn’t get a whole lot done on my research.

I did some searching online for projects and papers related to SOM classifiers, and found a couple whose methods show promise for my project, but nothing that goes beyond using the SOM itself as the classifier. All of the SOM articles I found analyze the full text of documents using TF/IDF, except one with an interesting approach that uses the SOM itself to generate weights, though it still analyzes the full text of the document.

I also found a couple of very good background papers, including an article on what folksonomies are and a detailed analysis of their strengths and weaknesses.

This was a slow week, but I’m ramping up for next week. I have several other papers to read, and I’m starting to look at how I can best use the SOM for strict classification, and whether or not applying a Bayes Net on top of it is the best approach. Hopefully I’ll get a start on some code for that this week.

As a side note: I did get to see the Tony Award winning Monty Python’s Spamalot in DC last week. It was excellent, and I highly recommend it only to those of you who enjoyed the movie Monty Python and the Holy Grail, from which it is lovingly ripped off.

I’m also updating my CiteULike page with all of the articles I’ve been reading, nicely tagged and with my comments on each, in case you want to follow along.

Well, I’ve started taking the “MS Thesis and Project Seminar” course at RIT, which is the first step towards registering my Thesis (or project) with the school and finishing off my degree!

I’m currently nursing two main ideas for projects related to Artificial Intelligence, and this past week I met with Jessica Bayliss to discuss the second idea and started work on an independent study focusing on refining this second idea and determining whether it’s Thesis material or not. The majority of what I did this week, and plan to do over the next week, is researching the existing research in the area of Self-Organizing Maps and classifiers, including reviewing my previous research to refresh my memory.

I’ve also created draft documentation of both ideas as posts on my blog. I’ve linked to them below, but these posts will change a lot over the next couple of months, and are marked “private” and thus not accessible to the general public … sign up on the front page and drop me a line if you’re really interested.

  1. The first is a project idea for a Learning Notification System which would give more control over computer notifications and alerts to the end user. The primary product is a system akin to GROWL for the Mac, but with multiple levels of alerts, and a learning classification algorithm which allows the computer to intelligently avoid interrupting the user.
  2. The second idea is for a full-blown Thesis, which I’m not really sure I want to attempt. However, this idea for an SOM-based classifier could also be done as a project, and in either case would actually be fairly interesting. It revolves around the idea of building a classification system based on SOM algorithms. Currently there are many parts to this idea, and it’s really possible that this could be fodder for several project-length experiments.

The basic idea here is to build a classification system based on SOM algorithms which can be used to pick “interesting” articles from sites like delicious, diigo, magnolia, and lilisto (I have a partial list of possible sites here).

There are currently several parts to this idea, and it’s really possible that this could be fodder for several project-length experiments.

The first question

Can I build a classifier which rates documents on how closely they match your interests, based on placing them in a self-organizing map which uses keywords to position the document? I have already built an algorithm which applies GHSOM to the relationship between keywords which were applied to documents, so the task here is mainly to see how useful this information is for mapping additional documents and for classifying them by interest.

The next steps:

  1. Apply the algorithm to individual documents and see where they are placed
  2. Determine the area of the map that represents the user’s interests (either by inference from having them rate documents, or by directly “circling” on the map their area(s) of interest)
  3. Rate documents by their (multi-dimensional) proximity to these areas.
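These three steps can be sketched roughly as follows. This is a toy illustration in Python, not the GHSOM code: it assumes an already-trained flat map given as a grid of weight vectors over a small keyword vocabulary, and it measures proximity as plain two-dimensional grid distance to the nearest cell the user has marked as interesting. The function names and the scoring formula are my own assumptions for the example.

```python
import math


def keyword_vector(keywords, vocabulary):
    """Encode a document's keywords as a 0/1 vector over the vocabulary."""
    return [1.0 if word in keywords else 0.0 for word in vocabulary]


def best_matching_unit(weights, vector):
    """Step 1: return the (row, col) of the cell closest to the vector."""
    best, best_dist = None, math.inf
    for r, row in enumerate(weights):
        for c, cell in enumerate(row):
            d = math.dist(cell, vector)
            if d < best_dist:
                best, best_dist = (r, c), d
    return best


def interest_score(position, interest_cells):
    """Step 3: 1.0 inside a region of interest, falling off with distance.

    interest_cells is the output of step 2: the cells the user "circled",
    or cells inferred from documents the user rated highly.
    """
    nearest = min(math.dist(position, cell) for cell in interest_cells)
    return 1.0 / (1.0 + nearest)


# A hypothetical trained 2x2 map over a three-word vocabulary.
vocabulary = ["python", "som", "cooking"]
weights = [
    [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]],
    [[0.0, 0.0, 1.0], [0.0, 1.0, 0.0]],
]
interest_cells = [(0, 1)]  # the user circled the "python + som" cell

som_doc = best_matching_unit(weights, keyword_vector({"python", "som"}, vocabulary))
cooking_doc = best_matching_unit(weights, keyword_vector({"cooking"}, vocabulary))
```

With this map, the document tagged "python" and "som" lands on the circled cell and gets the maximum score, while the cooking document lands on the opposite corner and scores lower.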

The second question

Is this method more effective when using keywords generated by actual people than when it uses machine-generated keywords? There are many existing document summarizing and keyword extraction algorithms, and even commercial products (e.g., brevity intellexer). One or more of these could be run on the document to extract keywords instead of using the human-generated keywords available on delicious et al. This would make the algorithm more capable of analyzing “any” documents, and would reduce dependency on the websites mentioned earlier (although this seems important, it may be of limited use, since the intent is to classify interesting documents from an incoming “stream” of documents, and currently my “stream” comes from these same sites where the keywords come from).

The next steps:

  1. Create a collection of documents with their human-generated keywords
  2. Run machine summarizing algorithms on these documents
  3. Compare the resulting mappings for relevancy (what is the metric here?)

Additional questions

The most important open question (to me) is whether this idea is original enough to work as a thesis at RIT (as opposed to becoming a project). If it’s not, I’m leaning toward working on a different project which is somewhat more interesting to me.

However, there are several other open questions:

  • Is GHSOM better than a non-hierarchical growing SOM, or even a simple SOM algorithm for this task? (Instinctively, it seems that the key requirement is that the map size must be inferred, and thus that a growing algorithm is required, but the hierarchy may be unnecessary.)
  • How does this system using free keywords (anything can be a keyword, including the user-name of the person who creates the keywords) compare to a system which has set categories? It seems that the classification would have much less adapting to do in a situation where categories are limited, since in the current system new keywords are constantly being added to the database and the algorithm must infer a user’s interest in these new keywords.

There are many opportunities for the use of AI in improving mainstream user interfaces. One particular area where little has been done is in the way our computers “alert” us of problems or information that we may need to know. Current operating systems provide little or no way for the user to control this, making no attempts to filter the information presented to the user (intelligently or otherwise), leaving the burden of optimizing notifications and alerts to the individual applications.

Individual application developers, on the other hand, only have control over the way their specific application interacts with the user, and perceive that providing fewer alerts than other apps, or providing them in a way that is less intrusive than the norm, will be seen by users as inconsistent, and could be considered annoying or even “buggy” by end users. Furthermore, providing sophisticated means for the user to determine what alerts or notifications they see is of little use on a per-application basis for most applications, so few developers have made any efforts in this regard, other than developers of email, instant messaging, or news reader applications.

To illustrate the current state of the art, and show where it is lacking, take Outlook 2003, an example many will be familiar with. Outlook allows you to have sounds play or alerts pop up based on rules you create which are applied to incoming email, but it is not context aware, so it is not able, for instance, to avoid playing sounds when you are in a meeting, nor can it choose to use sound alerts instead of popups when the screensaver is engaged. Outlook also lacks any form of learning or intelligence, so it can only perform these notifications based on specific rules the user has created: if you have a rule to be notified when you receive an email from your manager, it’s based on the email address, and will not activate if, for instance, your manager emails you from home.

We propose a way to give more control to the end user by building a notification system with multiple types of alerts, and allowing the user to control which applications, and indeed which specific notifications, are allowed to use which types of alerts. Further, we propose that the operating system itself could ease the burden on end users by building “smart” filters based on machine learning algorithms, which can adapt to individual users’ preferences and help create adaptive rules for rating the importance and urgency of incoming messages to determine which notifications a user wants to see. Lastly, we propose a contextual awareness system which will modify the ratings of messages based on information about the current state of the user, whether that be in a meeting, playing a game, away from the computer but still in the room, or even out of the office, but available by phone.

We intend to provide a baseline implementation of this as a replacement for the Windows “system notification tray.” This will be implemented using the Windows Shell Exchanger toolkit which allows multiple system tray applications to run at the same time, and thus allows our app to run over Explorer, GeoShell or other third-party shells. We will, of course, implement additional APIs for developers of third party apps (beyond that of the system tray) which would include additional information for the filters … but we will endeavor to allow users to create rules for any application that currently uses the default Windows system tray notification area. Our implementation will provide multiple levels of alerts, starting with no notification at all, and progressing through just changing icons, and through sound alerts, and “toast” windows to modal popups and even email, SMS, and pager notification.
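As a rough illustration of how ordered alert levels, per-user limits, and context could fit together, here is a small Python sketch. It is not the proposed Windows implementation: the level names follow the list above, but the context names ("meeting", "away", and so on) and the specific rules in `choose_alert` are assumptions made up for the example.

```python
from enum import IntEnum


class AlertLevel(IntEnum):
    """Alert types ordered from least to most intrusive."""
    NONE = 0
    ICON = 1    # just change the tray icon
    SOUND = 2
    TOAST = 3
    MODAL = 4
    REMOTE = 5  # email, SMS, or pager


def choose_alert(requested, allowed_max, context):
    """Pick the alert actually shown for one notification.

    requested   -- level the application (or a learned filter) asks for
    allowed_max -- highest level the user permits for this application
    context     -- hypothetical user state: "at_desk", "meeting", or "away"
    """
    level = min(requested, allowed_max)
    if level == AlertLevel.NONE:
        return AlertLevel.NONE
    if context == "away":
        # Nobody is watching the screen: escalate only the most urgent.
        return AlertLevel.REMOTE if level >= AlertLevel.MODAL else AlertLevel.NONE
    if context == "meeting":
        # Stay silent and unobtrusive while the user is presenting or talking.
        return min(level, AlertLevel.ICON)
    # At the desk there is no reason to send email or SMS.
    return min(level, AlertLevel.MODAL)
```

For example, `choose_alert(AlertLevel.MODAL, AlertLevel.REMOTE, "meeting")` is reduced to an icon change, while the same request in the "away" context is escalated to a remote notification.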

Our implementation will also include Bayesian learning filters based on an implementation from POPFile, which allows us to automatically categorize different alert or notification messages based on content, source, and source-assigned alert levels. While this learning system is expected to have only rudimentary success on current system tray applications, we provide a sample application in the form of a calendar app which uses multiple “levels” of alerts as provided in our system’s API, and will do user testing which we expect will prove that when the system has this full information, it is able to substantially reduce user irritation by prompting the user less frequently and less intrusively than current systems.
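For readers unfamiliar with this kind of filter, the following is a minimal naive Bayes text classifier in Python. It is a generic textbook sketch with add-one smoothing, not POPFile's code, and the category names and training messages are invented for the example. The source and source-assigned alert level could be fed in as extra tokens alongside the message words.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesFilter:
    """Word-count naive Bayes classifier with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.message_counts = Counter()          # category -> messages seen
        self.vocabulary = set()

    def train(self, text, category):
        words = text.lower().split()
        self.word_counts[category].update(words)
        self.message_counts[category] += 1
        self.vocabulary.update(words)

    def classify(self, text):
        """Return the most probable category, or None if untrained."""
        words = text.lower().split()
        total_messages = sum(self.message_counts.values())
        best, best_score = None, -math.inf
        for category, count in self.message_counts.items():
            # Work in log space to avoid underflow on long messages.
            score = math.log(count / total_messages)
            total_words = sum(self.word_counts[category].values())
            denominator = total_words + len(self.vocabulary)
            for word in words:
                seen = self.word_counts[category][word]
                score += math.log((seen + 1) / denominator)
            if score > best_score:
                best, best_score = category, score
        return best


# Hypothetical training data: the user has labeled four notifications.
alerts = NaiveBayesFilter()
alerts.train("meeting with manager in five minutes", "urgent")
alerts.train("build server down production", "urgent")
alerts.train("update available for media player", "ignore")
alerts.train("new version available download", "ignore")
```

After this training, `alerts.classify("manager meeting moved")` returns `"urgent"` and `alerts.classify("update available")` returns `"ignore"`.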
