
SOM Recommender Progress

By Joel 'Jaykul' Bennett on 06-Jan-2007

Well, I’ve decided to refer to my SOM Recommender by the initialism SOMR (which, for the sake of argument, I pronounce “sommer”: like “summer” but with an o), and I’ve been working on it for about a month. It’s been a busy month outside of this project, with end-of-year stuff at work and, of course, the Christmas holiday with family, but I’m basically on schedule despite that.

A Better Project Site

Rather than just keep everything here in my blog, I created a Trac website so that those of you who are interested can easily track my progress. It’s at somr.jaykul.org. I’m hosting that site on my home server (on residential cable), so if you have problems accessing it, it’s probably just momentarily off-line. Trac has a timeline which lets you easily see the Subversion check-ins, and it tracks my progress against the milestones that I set out in my project proposal.

The somr site also lets you browse the source in the Subversion tree, and it has a wiki, which I’ll be updating more frequently with progress and ideas that come to me as I’m working. I’ll still post here once a month or so for those who are less interested in the details.

Current Progress

The WebDownloader and scraper are done, and I’ve created the database schema and saved a bunch of SQL scripts so I can recreate it later as part of the installer.

I also created a SomrDataSet class to handle the interface to the database storage, and a small set of tests for each of these items to validate that they work.

Problems with tests

At this point, most of these tests serve more as examples of how to use the classes than as comprehensive tests, so I have in mind to get some coverage analysis going as well, to make sure that I have 100% code coverage going forward. That will be awfully difficult with SomrDataSet.Designer.cs, which has quite a lot of generated code in it that I may not really need.

The Windows RSS Platform

I’ve also discovered the new Windows RSS Platform, which is part of IE7 and, as a result, is built into Vista. I’ve created a simple test case to see how it works, and it’s pretty simple and fairly slick. It seems like the best way to get the recent feed, because it will continuously download the feed in the background even when SOMR isn’t running.

Using the RSS Platform would mean that SOMR itself would never have to download the feed for recent items at startup, and would be guaranteed a larger number of items to evaluate, since the RSS Platform service can retrieve the feed as often as every five minutes even when SOMR isn’t running.

However, it’s probably not the best way to handle downloading user or URL pages, as downloading through the RSS Platform seems to require adding the feed to the collection and then invoking the download method. Considering the number of feeds we’d be processing (a feed for each URL we find in the recent feed), it seems like a bad idea to add them to the RSS Platform collection, since we don’t want to be downloading thousands of feeds on a regular basis.

Although it might be simplest to parse all the feeds through the Windows RSS “normalizer,” I’m not entirely convinced. There are basically two feeds I have to deal with on del.icio.us: the recent feed and the URL feeds ... even if I’m getting the recent feed through the platform, it might be worth handling the URL feeds myself.
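To give a feel for what “handling the URL feeds myself” could involve, here is a minimal sketch of pulling the link and tags out of each feed item by hand. It is written in Python for brevity and is not SOMR’s code. The function names, the sample feed, and the guess that tags live in `category` or `dc:subject` elements as whitespace-separated words are all assumptions made for illustration; the real del.icio.us feed layout would need to be checked.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace prefix, e.g. '{ns}item' -> 'item'."""
    return tag.rsplit("}", 1)[-1]


def parse_feed_items(feed_xml):
    """Return a list of (link, tags) pairs, one per <item> in the feed."""
    root = ET.fromstring(feed_xml)
    items = []
    for element in root.iter():
        if local_name(element.tag) != "item":
            continue
        link = ""
        tags = []
        for child in element:
            name = local_name(child.tag)
            text = (child.text or "").strip()
            if name == "link":
                link = text
            elif name in ("category", "subject"):
                # Assumption: tags arrive as whitespace-separated words.
                tags.extend(text.split())
        items.append((link, tags))
    return items


# Made-up sample data, not a real del.icio.us feed.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>made-up sample</title>
    <item>
      <title>First</title>
      <link>http://example.com/a</link>
      <dc:subject>python recommender</dc:subject>
    </item>
    <item>
      <title>Second</title>
      <link>http://example.com/b</link>
      <category>som</category>
    </item>
  </channel>
</rss>"""

if __name__ == "__main__":
    for link, tags in parse_feed_items(SAMPLE_FEED):
        print(link, tags)
```

Matching on local element names keeps the sketch indifferent to whether a feed is RSS 2.0 or a namespaced RDF variant, which is roughly the job the platform’s normalizer would otherwise be doing.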

Issues:

How much data do I need to validly map a URL?

Can I really tell if a URL is interesting when it’s only been bookmarked by 2 people (unless I knew those people were “like me”)?

Is it possible it might be interesting later?

That is, if I test a URL initially when it’s only been bookmarked by 2 people, and it fails to be interesting based on keyword tags, should I retest it when it’s been bookmarked by 5 or 10 people? How about when it’s been bookmarked by 25 or 50?
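One possible answer to the retesting question is to re-evaluate a URL only when its bookmark count crosses the next threshold. The sketch below (in Python, for illustration only, not SOMR’s code) uses the counts floated above as thresholds; the names `RETEST_THRESHOLDS`, `thresholds_passed`, and `should_retest` are hypothetical, and whether these are the right cut-offs is exactly the open question.

```python
import bisect

# Bookmark counts at which a URL is (re)tested. These are just the numbers
# mentioned in the post, not tuned values.
RETEST_THRESHOLDS = [2, 5, 10, 25, 50]


def thresholds_passed(bookmark_count):
    """Count how many thresholds are less than or equal to bookmark_count."""
    return bisect.bisect_right(RETEST_THRESHOLDS, bookmark_count)


def should_retest(count_at_last_test, current_count):
    """True if the URL has crossed at least one new threshold since its last test."""
    return thresholds_passed(current_count) > thresholds_passed(count_at_last_test)


if __name__ == "__main__":
    # Hypothetical history: a URL first tested at 2 bookmarks, checked as it grows.
    for current in (3, 5, 9, 10, 60):
        print(current, should_retest(2, current))
```

Storing only the bookmark count at the last test is enough to drive this, and it caps the number of evaluations per URL at the number of thresholds.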


Posted in Recommender | Tagged Development, Recommender


Copyright © 2012 Joel Bennett.
