Wednesday, December 10, 2008
Playing with Mashups - Part I : Using Dapper for Content Mapping
This is the first post in what I plan to be a series sharing my experiences with various online mashup-building applications.
Over the last couple of days, I’ve been playing with the data mapping and mashup functionality provided by Dapper. Overall, my experience has been rather pleasant, and I would definitely suggest the service to someone who’s just getting started with mashups. It should be noted, however, that at the outset Dapper is more of a data mapping service than a mashup tool. Its main features allow the end-user to extract pieces of information from websites, then label and organize that information in a logical structure. The resulting information structure can then be output in various formats including XML, HTML, and RSS feeds, or it can be rendered in specific applications such as a Google Gadget or a Flash widget. It’s pretty cool when you try all the available output options for the first time.
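To give a feel for what consuming one of these outputs could look like, here is a minimal Python sketch that reads titles and links out of an RSS feed. The feed text is a made-up sample, not real Dapper output; the only thing it assumes is the standard RSS 2.0 layout of item, title and link elements. Which of your fields end up in the title and link depends on how you set up the feed, so treat the names here as placeholders.

```python
import xml.etree.ElementTree as ET

# Made-up sample feed for illustration only (not actual Dapper output).
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example dapp feed</title>
    <item>
      <title>First publication title</title>
      <link>http://example.com/paper-1</link>
    </item>
    <item>
      <title>Second publication title</title>
      <link>http://example.com/paper-2</link>
    </item>
  </channel>
</rss>
"""


def read_items(rss_text):
    """Return a list of {'title', 'link'} dicts, one per <item> in the feed."""
    root = ET.fromstring(rss_text)
    items = []
    for item in root.iter("item"):
        items.append({
            # findtext falls back to "" when the child element is missing
            "title": item.findtext("title", default="").strip(),
            "link": item.findtext("link", default="").strip(),
        })
    return items


for entry in read_items(SAMPLE_RSS):
    print(entry["title"], "->", entry["link"])
```

In practice you would fetch the feed text from your own dapp's RSS address instead of using the inline sample.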
So to test out the basic functionality of Dapper, I set out on a simple task: query the Google Scholar search engine and fetch the list of my research publications. I thought this might result in something I would want to use in the future… and I was right. Google Scholar still does not index many academic repositories, so the results are not comprehensive, but I did get some results which I was able to turn into a dapp. The resulting dapp was then rendered as a Flash widget (shown below).
Here is a summary of the procedure to get to the results as displayed in the Flash widget above.
1. Log in to Dapper.
2. In the section titled “Get any content from the Web”, click on “create a new Dapp”.
3. Enter the Google Scholar URL pointing to the search results you want to extract from.
In my case, I used: http://scholar.google.ca/scholar?q=author:"Umar+Ruhi"&num=100&scoring=r
My name is specified in the author parameter. The num=100 parameter specifies the number of search results to show, and scoring=r specifies that I would like “r” (recent) articles to be listed first.
Under the form address bar, you can leave the format as Dapp XML. The output format can be customized later.
4. Once you see the Google search results in Dapper’s virtual browser, you can select “Add to Basket”.
5. At this point, it is recommended that you also give Dapper a couple of other similar pages (e.g. search results for other authors), so that Dapper can make better sense of the webpage structure. I didn’t really feel the need to do so in this case. If you want Dapper to analyze the structure of more pages, you can enter their URLs and select “Add to Basket” for each. Once done, you can click on the Next Step button.
6. With the page(s) in the basket, you can start selecting areas of the page that you would like to save into your information structure for output. For example, I clicked on the title of one search result, which highlighted all the titles on the page. Sometimes Dapper will automatically select other, unrelated areas, and you will need to manually de-select these by clicking the Clear link for the item(s) in the Preview selected content pane.
7. If you’re satisfied with the selection of information items, you can click on the Save Field button and provide a label for the field.
8. Steps 6 and 7 can be repeated to gather more information items from the webpage in the basket. With all the content fields identified, you can click on the Next Step button.
9. On the next screen, you will group the content fields and provide group name(s). These group names will be used to structure the resulting dapps. With all the groups specified, click on the Next Step button to proceed to the last screen.
10. On the last screen, you will be asked to save the dapp and provide a description. You will also have the option of making the dapp private, i.e. only you or someone who knows the exact address of the dapp or its output feeds will be able to access it. I use the private option for most of my dapps, which I know have little utility for anyone else.
The input variables are meant to provide a mechanism for allowing an end-user to customize how the dapp is called. This is an optional feature, and you can click on the help link to learn more. Click Save when you’re done entering all the details of the dapp.
11. Now that the dapp is created, you will be presented with a variety of output options which you can play around with to see which option works best for you. As shown above, I selected the Flash widget to present the results of my dapp.
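If you want to repeat step 3 for other authors (for instance, to give Dapper the extra sample pages mentioned in step 5), the search URL can be put together programmatically. Below is a minimal Python sketch using only the standard library. The function name and its arguments are my own for illustration; the q, num and scoring parameters are the ones from the URL in step 3.

```python
from urllib.parse import urlencode


def scholar_url(author, num=100, recent_first=True, host="scholar.google.ca"):
    """Build a Google Scholar search URL like the one used in step 3."""
    params = {
        "q": f'author:"{author}"',  # exact-phrase author search
        "num": num,                  # number of results to show
    }
    if recent_first:
        params["scoring"] = "r"      # list recent articles first
    # urlencode escapes the colon and quotes and turns spaces into '+'
    return f"http://{host}/scholar?{urlencode(params)}"


print(scholar_url("Umar Ruhi"))
```

The printed URL has the colon and the quotation marks percent-encoded (%3A and %22), which is equivalent to the hand-typed address in step 3.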
That’s about it in terms of how the Google Scholar dapp was created. For everyone who’s interested in exploring Dapper, I also hope this mini-tutorial comes in handy for learning about other basic features of Dapper.