Wednesday, May 25, 2005
Piggy Bank brings the semantic web here and now
Piggy Bank is the rather strangely titled Firefox extension that tries to turn the existing web into the semantic web - that is, it aims to extract structured information from the (on the surface) unstructured presentation that makes up most web pages, and then to make use of that structured data.
The extension, a substantial 4.8MB download, is the result of a joint project of the W3C, MIT, and HP. Many other open source projects have also been used in producing it.
It provides a mechanism to capture information from the web, to save it as structured data, to tag it, and to search through and combine that information in many ways. In particular, it integrates the Google Maps mashup mechanism, allowing the data to be seen in its geographical context. Data can also be shared by publishing it, perhaps to a central "semantic bank".
The collection of data either uses some built-in collectors (for existing structured data such as RSS feeds or RDF data), or relies on specially written screen scrapers which can extract structured data from particular web sites.
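To make that distinction concrete, here is the sort of RDF snippet a built-in collector can pick up directly, with no scraping needed. The resource URL and the choice of Dublin Core properties are made up for illustration, not taken from Piggy Bank's own samples:

    <?xml version="1.0"?>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dc="http://purl.org/dc/elements/1.1/">
      <!-- one resource, described with plain Dublin Core properties -->
      <rdf:Description rdf:about="http://example.org/2005/05/piggy-bank-post">
        <dc:title>Piggy Bank brings the semantic web here and now</dc:title>
        <dc:date>2005-05-25</dc:date>
        <dc:subject>semantic web</dc:subject>
      </rdf:Description>
    </rdf:RDF>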
Unfortunately the use of screen scrapers is often fragile - despite being launched just a couple of days ago, I could not get any of the sample screen scrapers to work; the web pages they target seem to have changed in ways that break them. Instructions are given for writing new screen scrapers - they are written in either XSLT or JavaScript, and their job is to take (generally) HTML and extract the structured data from it into RDF form.
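To give a feel for what writing one involves, here is a minimal sketch of such a scraper in XSLT. It is not one of the shipped scrapers, and the input markup it expects (an XHTML page listing events in div/span elements with class attributes, already cleaned up into well-formed XML) is an assumption of mine purely for illustration; its whole job is to walk the HTML and emit RDF/XML:

    <?xml version="1.0"?>
    <!-- Illustrative sketch only: turns a hypothetical XHTML event listing
         (div class="event" wrapping a link plus name/date spans) into RDF/XML -->
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:h="http://www.w3.org/1999/xhtml">

      <xsl:output method="xml" indent="yes"/>

      <xsl:template match="/">
        <rdf:RDF>
          <!-- each event div becomes one RDF resource -->
          <xsl:for-each select="//h:div[@class='event']">
            <rdf:Description rdf:about="{h:a/@href}">
              <dc:title><xsl:value-of select="h:span[@class='name']"/></dc:title>
              <dc:date><xsl:value-of select="h:span[@class='date']"/></dc:date>
            </rdf:Description>
          </xsl:for-each>
        </rdf:RDF>
      </xsl:template>
    </xsl:stylesheet>

The fragility complained about above is easy to see from a sketch like this: the paths are tied to the exact class names and element structure of the target page, so any redesign of that page silently breaks the scraper.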
All in all a very interesting project, pulling together a lot of smart concepts - I will be following this closely.