Thursday, December 23, 2004


Ultra Recall

Kinook software have a new product called Ultra Recall, which in their words is "a personal information / knowledge / document management application for Microsoft Windows. It helps you capture, organize, and recall all of your electronic documents and information across all the applications that you use."

As far as a quick look goes, this seems to be a fairly direct competitor for Onfolio. I note a few immediate problems:
Perhaps if I get time to use the app a bit more, I might find that it has its uses - Angela Booth is certainly very happy with it, according to her blog.

The Onfolio team have picked up on my earlier post about the inability to back up data (noting I make a "very good point about our lack of support for incremental back-ups of collections") - let's hope they can reconsider their reliance on the "one monster file approach" to address this. (I know that Joe Cheng, who designed this part of the app, covered why they went with .CFS in the first place, but the analysis he gives does not consider whether you can do backups, and is also factually wrong in saying that Windows has a 260 character limit for file paths - that was probably true in the bad old days of DOS based file systems, but has not been the case on NTFS for many years now).


Search as you type named as software innovation of 2004

vbRad has come up with a list of their top 10 software innovations for 2004.

Top of the list is Search as you type in Firefox. As MozillaZine notes, this was actually available earlier in the Mozilla suite, and in fact its implementation there is better as far as I'm concerned - it's one of the main reasons I stick with Mozilla rather than the dumbed down Firefox. (For example, the feature only works on the main web page view in Firefox - it does not work on the "view source" view).

I'd agree that a number of other items on the top 10 list are useful features, but I'd be hard pressed to say they were the top 10 innovations.

Wednesday, December 22, 2004


IFilters galore

The MSN Search team have a wiki for both Web search and Desktop search.

One of the pages lists the many available IFilters that MSN Desktop Search can use. Some of these are shipped with the desktop search download, some come with other Microsoft products, some with third party products, or free downloads, and some can be purchased. For a company that has just had a huge new market opened up to it, the IFilterShop seems to be very slow off the mark - its website does not even mention MSN Desktop Search.

Onfolio also makes use of the IFilter interface to allow it to index Office documents, though their developers point out that it's actually not a sufficiently rich interface to allow many other types of document to be properly indexed - the API does not allow "deep linking" to the relevant data.


Google Print restricting results for UK IP addresses

Aaron Swartz did a thorough review of the updated Google Print / Google Library service, which he helpfully peppered with deep links. However, when I tried out the links, the page it took me to was basically empty - it certainly did not show the data Aaron was trying to link to.

It appears that this behaviour is due to the fact that I am UK based, so Google detects my geolocation, and "censors" the results - presumably because they have not sorted out the copyright issues, where UK and US copyright laws differ.

One way around this is to get a US based proxy server to access the page for me - and it so happens that Google have such a service in the form of their translation service.

Thus a direct link to Darwin and After Darwin gives the result "Your search - - did not match any documents." whereas a link proxied via the translate service shows the scanned pages.

Can anyone confirm if they see the same restrictions to the results from other geolocations?


Google Desktop Search new version

Google Desktop Search has rolled out a new version, via its inbuilt update mechanism. This version was essentially to fix a security problem that was discovered, but the release notes page also refers to "Miscellaneous fixes and improvements".

GDSPlus, a third party addition which allows you to edit what filetypes are indexed, has been revised to work with this new release.

Friday, December 17, 2004


Google Suggest Dissected... for your education

Google Suggest Dissected... has produced an unobfuscated version of Google's JavaScript that powers the Google Suggest feature. I guess this dissection falls under fair use for education purposes - but making use of that code in another website is going to cross a line somewhere.


Implementing search as you type using Google results does not work

Google Suggest is certainly serving to popularize XMLHttpRequest and as Joel says "it's going to teach web users to expect highly responsive user interfaces" of which this mapping server is a worthy example (which ironically does not use XMLHttpRequest).

However, one example that does not show the technology to good effect is this Live Google Results which tries to give a "search as you type" effect in Google. The problem is that Google's search does not allow prefix searching, so the searches you get as you type extra characters are unrelated - you are not refining the search with each letter added, but rather getting a completely different set of results each time.


Onfolio 2.0 Beta - with Firefox Support

I've installed the Onfolio 2.0 beta which comes with a whole raft of new features.

Top of the list so far for me is Firefox integration - though am I being ungrateful by wishing that it would support Mozilla itself as well, rather than just Firefox, which I find to be just a little too dumbed down for my needs? Anyway, the integration does appear to duplicate within Firefox the power of this tool, which was initially IE specific, so well done there. (Note that although it does use a Firefox extension, and extensions are cross platform, the extension itself calls into an Onfolio XPCOM component, which is a Windows specific dll, so this does not extend Onfolio to new platforms).

Also new is the Feed Reader. This was the feature that made Scoble pronounce this application as awesome. (Note that the reader comes configured to offer Scoble's feed by default!) I agree that the feed reader generally beats the other stand alone feed readers I've used - however its arrangement is very similar to how Bloglines presents feeds, and for the moment the advantages of the online solution are compelling.

The invite to take part in this beta offered the following sensible advice: "Before using this version of Onfolio, it is a good idea to back up your Onfolio Collection files, and then back up your collection files regularly during your use of the Onfolio Beta." Unfortunately, that is one of the weaknesses of the product - it produces huge files that are impossible to back up. Essentially what it has done is to use Microsoft's support for producing a file system within a file, rather than storing the data directly in the host file system. This means that the collection file soon becomes too big to back up to either CD or DVD, and any incremental backup strategy is defeated by the fact that a single character change to one item in the collection means that the whole collection needs to be backed up once again.
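To put a number on that last point, here is a minimal Python sketch. It assumes a file-level incremental backup that recopies any file whose hash has changed; the item names, sizes and the collection.cfs file name are made up for illustration, and this is not how Onfolio or any particular backup tool actually works.

```python
import hashlib


def changed_bytes(previous, current):
    """Bytes a file-level incremental backup would have to copy.

    Both arguments map a file name to its content (bytes). A file is
    copied in full if it is new or its hash differs from last time.
    """
    total = 0
    for name, data in current.items():
        old = previous.get(name)
        if old is None or hashlib.sha256(old).digest() != hashlib.sha256(data).digest():
            total += len(data)
    return total


def pack(files):
    """Concatenate every item into one container blob (stand-in for a single collection file)."""
    return b"".join(files[name] for name in sorted(files))


# Hypothetical collection: 100 items of 1000 bytes each.
items = {f"item{i}.html": bytes(1000) for i in range(100)}

# Change a single character in a single item.
edited = dict(items)
edited["item42.html"] = b"x" + items["item42.html"][1:]

# Items stored as separate files: only the edited item is copied again.
per_file_cost = changed_bytes(items, edited)

# Items stored inside one container file: the whole container is copied again.
container_cost = changed_bytes(
    {"collection.cfs": pack(items)},
    {"collection.cfs": pack(edited)},
)

print(per_file_cost, container_cost)
```

With separate files the backup cost tracks the size of the change; with one container it tracks the size of the whole collection.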

Wednesday, December 15, 2004


Ask Jeeves why their app crashes

I installed the Ask Jeeves Desktop Search - the install itself is a good experience: it's a small download, and it actually asks where I want to place the software.

On first running, it offered me its option panels, which allowed me to select custom indexing, but they are simply not custom enough - the choice is between just the default Windows file locations (which I never use if I have a choice), and "all my files". The email is likewise all or nothing - so I can't exclude things like my deleted mail folder.

The results screen does offer a preview panel, which is its one plus point, but also its undoing - an image search gave results that the preview panel not only failed to display (which I can probably live with), but which took the whole program down with it. (The image in question was a jpg file in the browser cache - which was corrupted by being truncated - a regular occurrence when downloading files. The standard Windows XP image view can show the file fine - with the missing blocks just "chunkier" than the rest).

Not only did the crash kill the Jeeves.exe program, but since it went down whilst still holding handles to my Outlook email, Outlook was unusable until I rebooted my machine.

I'm not the only one to find this program crashes on them - InsideGoogle has a similar experience.

Since I don't expect to be keeping this program on my machine much longer, I'll quickly see how it stacks up against my review checklist of a couple of days ago. Lots of don't knows, since I'm not spending time getting too deep into this program - I already know it's not for me.


Copyright Myths of the Online World

Obviously the Google Print project to scan printed books from major libraries has tremendous copyright implications, which they have carefully considered.

Not all websites and bloggers are so considerate, so it's good to see a very readable page "Debunking Eight Copyright Myths of the Online World". I think there is just one major omission from this advice - the page needs to state "This summary considers copyright law as it applies in the USA. Websites are global and different copyright practices may apply in other parts of the world."

Tuesday, December 14, 2004


How MSN Desktop Search triggers my virus detector

I left MSN Desktop Search indexing my machine overnight, and returned to my desk this morning to find that our IT department were more than a little concerned that my machine was now apparently riddled with viruses - since they had been receiving alerts throughout the night of viruses which kept appearing on my machine.

Piecing together what had happened showed the following:
None of the other desktop search products has given me this problem. (Copernic in particular makes it very easy to choose which folders within my Outlook email it should index). Net result is that I can't afford to incur the wrath of the IT department, and will be disabling MSN Desktop Search. (I rather expect an edict from the IT guys as well that prohibits others in the company installing this product).


Google Library, and Google Print

John Battelle has the best coverage and comment so far on Google Library which is the announcement that Google are teaming up with the university libraries at Stanford, the University of Michigan, Harvard, Oxford, and also the New York Public Library to digitize substantial numbers of books from their collections.

The results of this collaboration will most likely appear in the Google Print format.

Google Print is currently rather a strange addition to Google results, in that you mostly have to stumble across the results, rather than being able to deliberately search for them. There isn't even a search box on the Google Print home page!

There are a couple of tricks however to target Google Print results:

Monday, December 13, 2004


MSN Toolbar Suite

Well, as widely expected, the MSN Toolbar Suite beta was made available today. It's a hefty download, at nearly 5MB - and for that you get 3 toolbars - one for IE and Windows Explorer, one for Outlook, and one for the desktop.

The support for indexing PDF files is an additional download, and for this Microsoft point you at the Adobe site, where there is another 4MB download needed. It's most interesting to note that this is a standard download that Adobe have been making available for many years, not something they specifically produced for this MSN Desktop Search. The download simply implements Microsoft's standard IFilter interface, so it's possible that other new file types can be similarly added by third parties. It's also interesting that Microsoft point you at the old version 5.0 of this download, rather than the current PDF Version 6 one, which weighs in at an even more massive 9.7MB.

Of most interest to me was the Deskbar, which I'm pleased to see implements "search as you type", though since the results are presented in a tiny list window, it's a far from perfect implementation (and if you have the Windows Taskbar set to auto hide, then when you move from the taskbar to the results, the window suddenly jumps position as the taskbar hides, and you find the mouse is now over a different result than you were aiming at). Realizing the limitations of the tiny list, the deskbar also puts up its results in an Explorer window when you press enter. Again there seems to be a problem with the window that it uses - I always use maximized windows, yet this Explorer window insists on coming up at some smaller size, and I have to click on the maximize button to be able to see the results.

The results seem to be of a fixed format, though you can vary the column widths. The results include a few sentences from the start of the file text - not really all that useful - this really needs to change to be a relevant bit of the file, with the search term shown in context.
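To illustrate what I mean by showing the search term in context, here is a minimal Python sketch. The function name and the width parameter are my own, and this says nothing about how MSN Desktop Search builds its summaries; it simply cuts an excerpt around the first match, and falls back to the start of the text when there is no match.

```python
def snippet_in_context(text, term, width=30):
    """Return a short excerpt of text centred on the first match of term."""
    # Case-insensitive search for the first occurrence of the term.
    pos = text.lower().find(term.lower())
    if pos < 0:
        # No match in the body: fall back to the start of the text.
        return text[:2 * width]
    start = max(0, pos - width)
    end = min(len(text), pos + len(term) + width)
    # Mark the excerpt as truncated wherever text was cut off.
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return prefix + text[start:end] + suffix


sample = "a" * 100 + "needle" + "b" * 100
print(snippet_in_context(sample, "NEEDLE", width=5))
```

A real implementation would want to break on word boundaries and highlight the term, but even this much is more useful than the opening sentences of the file.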

The Outlook search box also presents its results in the same Windows Explorer window - so the advantage of integrating search into the email client is lost, since you are switched out of the program to view the results.

On the positive side, there do appear to be a good set of tools in the advanced query syntax, including the ability to target specific "fields" of the results such as the author field. I guess that Jake Zukowski had a hand in this feature of the search!

So, although I'm still playing with it, I'll round off this first look with a run down of how it measures up against the list of questions I posted earlier this morning:


Firefox GSuggest Extension

Firefox has always had a close integration with Google search, allowing you to easily do Google searches.

There is now a Firefox extension, GSuggest, which integrates the Google Suggest feature into the browser as well. A few rough edges at the moment, but no doubt it will improve.


Blog LiveSearch drop down list of matches

In the same spirit as Google Suggest, bITFLUX also offers a LiveSearch feature where they come up with pages that match your search query as you type it, and offer them in a drop down list attached to the entry box. (Unlike Google, this is giving you the actual search matches, rather than simply a list of search terms - but the UI idea is the same).

This feature, explained in the bITFLUX wiki, went live before Google's. It's available for use on other sites, under an Apache license. There's a list on the wiki page of 8 such sites that are using the code.


Desktop Search Reviewers Guide additions

Scoble has a well thought out Desktop Search Reviewers Guide which covers much of what you might want to consider when choosing a desktop search product.

I think his 16 items list covers most things, but one area he doesn't cover well is "what are its actual search capabilities?"

To go into more detail on this point, I'm looking to see:
He also doesn't raise too many questions about how the results are presented. Again, I'm looking to see:


The return of case sensitive searching?

Once upon a time, there was a pretty good search engine called AltaVista...

One of the features it offered was case sensitive searching. If the search term you entered contained capital letters, then the only matches you got were those that contained capitals where you put them. This certainly came in useful when searching for product names that also happened to be generic words (consider the difference between Windows and windows, or Word and word). The alternative, of entering a search term using only lower case characters would return matches in any case. This worked well, but has been dropped in subsequent reincarnations of the search engine.
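The rule is simple enough to write down. This is a minimal Python sketch of the behaviour as I have described it, comparing a single query word against a single word from a document; the function name is mine, and it is not a claim about how AltaVista implemented it.

```python
def matches(query, document_word):
    """Case rule described above: capitals in the query force an exact-case match."""
    if any(ch.isupper() for ch in query):
        # The query contains capitals: only the same capitalisation matches.
        return query == document_word
    # The query is all lower case: match the word in any case.
    return query == document_word.lower()


print(matches("Windows", "Windows"))   # exact case
print(matches("Windows", "windows"))   # capitals in the query, lower case in the document
print(matches("windows", "Windows"))   # lower case query matches any case
```

So a query for Windows finds only the product name, while a query for windows finds both the product and the things you look through.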

If you read the help for Accoona then it states that this new search engine follows the same case sensitive rules as AV used to. The example they give is searching for NeXT - the operating system. However, this is a lie - a search for NeXT currently returns just the same as a search for next. What a shame - case sensitivity is a feature that it would be great to have back in one of the major search engines.

Sunday, December 12, 2004


LookAhead News Index

Whilst it's not nearly as good an implementation as Google Suggest, Surfwax offers a similar "suggest as you type" feature they call "LookAhead News Index".

Unlike the Google variant, this only offers matches after at least 2 characters have been typed, and the suggestions are all exact matches from the index, offered in alphabetical order, rather than ranked by likelihood of selection. The suggestions appear in a list given in a separate control further down the webpage, not in a dropdown of the text entry box.
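The difference between the two approaches can be sketched in a few lines of Python. The index contents and popularity scores below are invented for illustration, and neither Surfwax nor Google has published how their services actually work; the sketch only shows a minimum prefix length and the two orderings.

```python
def suggest(prefix, index, min_chars=1, ranked=True, limit=10):
    """Suggest index terms starting with prefix.

    index maps each term to a popularity score (hypothetical numbers).
    """
    if len(prefix) < min_chars:
        # Too little typed so far: offer nothing yet.
        return []
    hits = [term for term in index if term.startswith(prefix)]
    if ranked:
        # Most popular first, ties broken alphabetically.
        hits.sort(key=lambda term: (-index[term], term))
    else:
        # Plain alphabetical order.
        hits.sort()
    return hits[:limit]


index = {"firefox": 90, "fire": 40, "first aid": 60, "games": 80}

print(suggest("fi", index))                              # ranked by score
print(suggest("fi", index, min_chars=2, ranked=False))   # alphabetical
print(suggest("f", index, min_chars=2, ranked=False))    # below the minimum length
```

With only a handful of matches the ordering hardly matters, but once a prefix matches hundreds of terms, ranking is what puts the likely choice within reach.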

There is some quick analysis of the Surfwax implementation.


Google Suggest

As mentioned in the previous post, Google is now offering a beta service called Google Suggest which provides suggested completions of your search term as you are still typing it in.

This is implemented as a small dynamic request sent to the Google server as each character is typed (using XMLHttpRequest), which returns a compact block of javascript which describes the suggestions for the current partial search term. The request is of the form

where the qu attribute is the current partial term.

Predictably, people have started to work out how to use this request form for other uses - after all it's effectively just a web service that, given a (partial) word, returns a list of associated words and phrases - where these results may include completions of the word, phrases that are specializations of the word, and indeed misspellings of the word.

Such uses include:
For the record (as no doubt the list will change over time), the suggestions after just one character has been given are as follows:

A - Amazon
B - Best Buy
D - Dictionary
E - Ebay
F - Firefox
G - Games
H - Hotmail
I - Ikea
J - Jokes
K - Kazaa
L - Lyrics
M - Mapquest
N - News
O - Online dictionary
P - Paris Hilton
Q - Quotes
R - Recipes
S - Spybot
T - Tara Reid
V - Verizon
W - Weather
X - Xbox
Y - Yahoo
Z - Zip Codes

0 - 02
1 - 1
2 - 2004 Election
3 - 3m
4 - 411
5 - 50 cent
6 - 60 minutes
7 - 7th heaven
8 -
9 - 911

. - .com
£ - £
_ - _
¬ - ¬

The symbols are a strange set of results - it's very interesting to see that this service thinks there are over 2 billion results for £, even though actually entering that as your search term gives you a blank set of results.

As Scoble notes, the results of this service are at least partially censored (or "safe search enabled" if you want to put it another way), though like much censorship, it's not completely successful.


Search as you type

I find "Search as you type" (where the searching starts when you start typing, rather than waiting for you to complete a search term and hit enter) to be such a useful way of working that I'm always looking for more apps that implement the feature.

Of the apps I use daily, the following implement the feature:
Of course, the first two of these have the easier task - they are simply searching within a block of text, and the result of the search is to move the displayed portion to include the searched for term.

On the other hand, CDS is returning a sorted or ranked list of matches for the text you are typing - which both requires a lot more processing, and requires that the index used to pull up such results is designed in such a way that it allows prefix matching, not just full word matching.
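As a rough illustration of what a prefix-friendly index means, here is a minimal Python sketch that keeps the indexed words in sorted order, so that all words sharing a prefix sit next to each other and can be found with a binary search. The class and method names are my own, and this says nothing about how CDS actually stores its index.

```python
import bisect


class PrefixIndex:
    """Sorted word list supporting both full word and prefix lookups."""

    def __init__(self, words):
        self._words = sorted(set(words))

    def full_word(self, term):
        """True only if term is itself an indexed word."""
        i = bisect.bisect_left(self._words, term)
        return i < len(self._words) and self._words[i] == term

    def prefix(self, term):
        """All indexed words that start with term, in sorted order."""
        # Binary search to the first candidate, then walk the contiguous run.
        i = bisect.bisect_left(self._words, term)
        found = []
        while i < len(self._words) and self._words[i].startswith(term):
            found.append(self._words[i])
            i += 1
        return found


idx = PrefixIndex(["search", "sea", "seat", "type", "typing"])

# Each extra character typed narrows the previous set of matches.
for typed in ("s", "se", "sea", "sear"):
    print(typed, idx.prefix(typed))
```

An index that only supports full_word lookups finds nothing for "sear" until the whole word has been typed, which is why the results cannot be refined letter by letter.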

This probably explains why no major search engine offers "search as you type" - few of their indexes are set up for efficient prefix matching, and the need to come up with an entirely new set of results with each keypress is more load than the servers want.

It's very interesting therefore to see Google Suggest which, whilst not providing a full "search as you type" feature, does go part way there in that it provides a list of suggested completions of your search term as you type.