The Informatician: Blog by Umar Ruhi: 2008

Monday, December 29, 2008

Exploring Mashups – Part III: A Geo-coded Mashup using Dapper, Yahoo Pipes & Google Maps

This is the third installment in my series of tutorials on building mashups using tools such as Dapper for data mapping and Yahoo! Pipes for data processing.

Sunday, December 14, 2008

Playing with Mashups - Part II : Integrating Dapper with Popfly

Update: The Popfly site was discontinued in August 2009, and this article is kept only for archival purposes.

This blog post follows from a previous one where I experimented with and mulled over the content mapping and mashup functionality provided by Dapper. In my first trial run, I created a mashup for an online discussion forum in which I participate on a regular basis. The mashup was pretty simple, yet it was the result of composing several dapps together in an iterative manner using a variety of information output formats. In this post, I summarize how I took the final dapp created in Part I and used it as an input to Microsoft’s Popfly mashup creator.

The data flow schematic and the Popfly screenshot below show the configuration between the dapp and the Popfly mashup. The dapp outputs an RSS feed, which is then captured by the RSS block in Popfly. The specific items from the RSS feed are then rendered as a table.

Dataflow Schematic - Dapper & Popfly Integration


Popfly Screenshot - Dapper & Popfly Integration


After reading the above, you’d probably wonder whether this new configuration is really a “mashup” after all, in its strictest definition. The simple answer is “probably not”, since the data in this case is being acquired from a single source, i.e. the dapp. However, the dapp itself was constructed in such a way that it fetches and formats data from multiple sources, so the original dapp itself was a mashup! That being said, the integration between Dapper and Popfly goes to show two important aspects of mashups: composability and reuse. The original dapp has now been composed with a different output stream, i.e. RSS, and it has been configured with a new presentation format, i.e. a table display in Popfly. The end result is shown below:

Dapper gathers feeds from the online forum and combines those feeds with the last posters’ avatars into a data object;
Popfly acquires the new data object and displays the data.

Mashup - Dapper & Popfly Integration

*Embedded Content Removed - Popfly website is no longer available*
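
Since the embedded result is gone, here is a rough Python sketch of what the RSS-to-table step amounts to. It is not Popfly's implementation, and the sample feed, its item values, and the function names are all made up for illustration; the actual fields in my dapp's feed are not reproduced here.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the RSS feed a dapp might emit.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Forum feed (sample)</title>
    <item>
      <title>First thread</title>
      <link>http://example.com/thread/1</link>
      <description>Last post by alice</description>
    </item>
    <item>
      <title>Second thread</title>
      <link>http://example.com/thread/2</link>
      <description>Last post by bob</description>
    </item>
  </channel>
</rss>"""

# The item fields to show as table columns (an assumption for this sketch).
FIELDS = ("title", "link", "description")


def rss_items(rss_text):
    """Return one dict per <item>, keeping only the fields listed in FIELDS."""
    root = ET.fromstring(rss_text)
    return [
        {name: (item.findtext(name) or "").strip() for name in FIELDS}
        for item in root.iter("item")
    ]


def render_table(rows):
    """Render the rows as a plain-text table: header, rule, one line per item."""
    widths = [max([len(name)] + [len(row[name]) for row in rows]) for name in FIELDS]

    def line(cells):
        return " | ".join(cell.ljust(width) for cell, width in zip(cells, widths))

    lines = [line(FIELDS), "-+-".join("-" * width for width in widths)]
    lines += [line([row[name] for name in FIELDS]) for row in rows]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_table(rss_items(SAMPLE_RSS)))
```

The point of the sketch is the reuse: once the dapp's output is plain RSS, any consumer that can read RSS can pick its own presentation.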

Wednesday, December 10, 2008

Playing with Mashups - Part I : Using Dapper for Content Mapping

This is the first blog post in what I plan to be a series of posts to share my experiences with various mashup building applications online.

Over the last couple of days, I’ve been playing with the data mapping and mashup functionality provided by Dapper. Overall, my experience has been rather pleasant, and I would definitely suggest the service to someone who’s just getting started with mashups. It should be noted, however, that at the outset, Dapper is more of a data mapping service than a mashup service. The main features and functions of Dapper allow the end-user to extract pieces of information from websites and to label and organize that information in a logical structure. The resulting information structure can then be output in various formats including XML, HTML, and RSS feeds, or it can be rendered in specific applications such as a Google Gadget or a Flash widget. It’s pretty cool when you try all the available output options for the first time.

So to test out the basic functionality of Dapper, I set out on a simple task: query the Google Scholar search engine and fetch the list of my research publications. I thought this might result in something I would want to use in the future… and I was right. Google Scholar still does not index many academic repositories, so the results are not comprehensive, but I did get some results which I was able to formulate into a dapp. The resulting dapp was then rendered as a Flash widget (shown below).

Here is a summary of the procedure I followed to get the results displayed in the Flash widget above.

1. Log in to Dapper.

2. In the section titled “Get any content from the Web”, click on “create a new Dapp”.

3. Enter the Google Scholar URL pointing to the search results you want to extract from.
In my case, I used: http://scholar.google.ca/scholar?q=author:"Umar+Ruhi"&num=100&scoring=r
My name is specified in the author parameter. The num=100 parameter specifies the number of search results to show, and scoring=r specifies that I would like recent (“r”) articles to be listed first.
Under the form address bar, you can leave the format as Dapp XML. The output format can be customized later.

4. Once you see the Google search results in Dapper’s virtual browser, you can select “Add to Basket”.

5. Ideally, at this point, it is recommended that you also give Dapper a couple of other similar pages (e.g. search results for other authors), so that Dapper can make some sense of the webpage structures. I didn’t really feel the need to do so in this case. If you want Dapper to just analyze the webpage structure of more pages, you can enter their URLs and “Add to Basket”. Once done, you can click on the Next Step button.

6. With the page(s) in the basket, you can start selecting areas of the page that you would like to save into your information structure for output. For example, I clicked on the titles of the search results which highlighted all the titles on the page. Sometimes, Dapper will select other non-related areas automatically, and you will need to manually de-select these by clicking the Clear link for the item(s) in the Preview selected content pane.

7. If you’re satisfied with the selection of information items, you can click on the Save Field button and provide a label for the field.

8. Steps 6 and 7 can be repeated to gather more information items from the webpage in the basket. With all the content fields identified, you can click on the Next Step button.

9. On the next screen, you will group the content fields and provide group name(s). These group names will be used to structure the resulting dapps. With all the groups specified, click on the Next Step button to proceed to the last screen.

10. On the last screen, you will be asked to save the dapp and provide a description. You will also have the option of making the dapp private, i.e. only you or someone who knows the exact address of the dapp or its output feeds will be able to access it. I use the private option for most of my dapps which I know have little utility for anyone else.
The input variables are meant to provide a mechanism for allowing an end-user to customize how the dapp is called. It is an optional feature, and you can click on the help link to learn more. Click Save when you’re done entering all the details of the dapp.

11. Now that the dapp is created, you will be presented with a variety of output options which you can play around with to see which option works best for you. As shown above, I selected the Flash widget to present the results of my dapp.
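
If you want to generate the search URL from step 3 for other authors (for example, the extra sample pages suggested in step 5), a small Python sketch like the one below would do. The function name is my own invention, and I am only assuming that the three parameters behave as described in step 3.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Base address taken from the URL used in step 3.
BASE_URL = "http://scholar.google.ca/scholar"


def scholar_search_url(author, num=100, scoring="r"):
    """Build a search URL from the three parameters described in step 3."""
    params = {
        "q": f'author:"{author}"',  # author name, quoted as a phrase
        "num": num,                 # number of search results to show
        "scoring": scoring,         # "r" asks for recent articles first
    }
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    url = scholar_search_url("Umar Ruhi")
    print(url)
    # Decode the query string again to check what the server would receive.
    print(parse_qs(urlsplit(url).query))
```

Note that urlencode percent-encodes the colon and the quotation marks, so the generated string will not look identical to the hand-typed URL, but it decodes to the same three parameters.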

That’s about it in terms of how the Google Scholar dapp was created. For everyone who’s interested in exploring Dapper, I also hope this mini-tutorial comes in handy for learning about other basic features of Dapper.

Friday, September 19, 2008

Security Authentication & Authorization: The Case of Sarah Palin's Email Account Breach

So I take it that many of you have now heard that Republican Vice Presidential candidate Sarah Palin's Yahoo! email account was broken into a couple of days ago. I’ve now read through a few news stories and also gone over the actual chronology of the incident by reading the online post of the person who has claimed responsibility. The guy claiming responsibility posted his story on an online forum under the handle “Rubico”.

This case makes a rather interesting discussion piece for a course I teach on Internet Security… and reminds me of the many conversations I’ve had with security colleagues in the past when I was involved with the implementation and institutionalization of authentication credentials, access control procedures, authorization mechanisms, and single-sign-on (SSO) implementations.

The gist of the story about the hacking of Sarah Palin’s Yahoo! email is that the intruder gained access via the forgotten password mechanism on the Yahoo! webmail interface. By following the typical verification prompts and challenge questions, the intruder was able to reset the profile's password. All that was needed were a few pieces of information that were easily acquired through Google searches and Wikipedia entries about the Alaska Governor. Overall, this was a hack of a knowledge-based security mechanism, and it is also a classic illustration of the importance of proper password management. It demonstrates how our common password habits can expose us to a rather simple security hack. That being said, the consequences of this simple hack can be quite destructive, such as the disclosure of confidential information.

Relevant details and a quick chronology of this incident are as follows:

The person (let's call him Rubico) who hacked into Sarah Palin’s Yahoo! account used an “anonymizer” service called CTunnel (http://www.CTunnel.com) to browse to the Yahoo! email service site.

Rubico used Sarah Palin’s email address on the login screen and clicked on the "Forgot your ID or password?" link. He knew Palin’s correct email address from several online websites of government institutions, groups and committees that Palin had served on.
Rubico was then prompted to either supply an alternative email address for resetting the password, or to choose the option to reset without access to a registered email account. To me, this option to bypass the alternative email mechanism is an obvious red flag. Rubico, as can be expected, chose the latter option.

At this point, Rubico was asked to answer a "secret" security question. This so-called knowledge-based authentication is supposed to add an additional layer of validation in a single-factor authentication scheme, whereby the end-user has to provide something else he/she knows. In Palin’s case, her “secret” question was “Where did you meet your spouse?”, and Rubico had found the information to answer this question through public sources. Apparently, she had revealed in her TV interviews that she and her husband were high school sweethearts. So Rubico tried a couple of variations of the name of the local high school where she grew up and eventually got a hit on “Wasilla High”.

Finally, the Yahoo! account profile verification prompts asked him to verify the ZIP code and country. Once again, for public figures such as Palin, this information is generally publicly available.

Once Rubico entered the correct answer to that single “secret” question and completed the profile verification prompts, he was immediately allowed to change Palin’s password, and he promptly changed it to “popcorn”. At this point, if you’re like me, you’re probably wondering why Yahoo! didn’t make the password reset mechanism and the validation workflow a little more challenging.

The only somewhat positive aspect of the validation workflow is its last step, where the end-user is notified that his/her account is now "up to date", and that he/she will also receive an email on the alternative email account with a notification of the changes that were made to the account. This is probably how Palin would have discovered that her Yahoo! email account was compromised, but alas, that news was already out in the open through numerous chat channels and public discussion forums.

So, it’s pretty apparent in this situation that the authentication mechanism is only as strong as its weakest component, and the fact that there is an option to reset the password without ever having to leave the browser window is a critical issue in the validation workflow. Why not include an out-of-band multi-factor authentication component to authorize security-related changes such as password resets? Why not use something similar to MyOpenID (http://www.myopenid.com) or Passpack (http://www.passpack.com) to complement or strengthen the current procedures?
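
To make the out-of-band idea a little more concrete, here is a minimal Python sketch of a one-time reset token that would have to be delivered through a separate channel (a registered phone or alternative email, say) before a password change is allowed. This is purely illustrative and says nothing about how Yahoo! or any other provider works; the class name, the 15-minute lifetime and the in-memory storage are all my own assumptions.

```python
import hashlib
import hmac
import secrets
import time

# Assumed token lifetime for this sketch.
TOKEN_TTL_SECONDS = 15 * 60


class ResetTokenStore:
    """Hypothetical in-memory store of pending password-reset tokens."""

    def __init__(self, clock=time.time):
        self._pending = {}  # account -> (sha256 of token, expiry timestamp)
        self._clock = clock

    def issue(self, account):
        """Create a one-time token; the caller must deliver it out of band."""
        token = secrets.token_urlsafe(32)
        digest = hashlib.sha256(token.encode()).hexdigest()
        self._pending[account] = (digest, self._clock() + TOKEN_TTL_SECONDS)
        return token

    def redeem(self, account, token):
        """Return True only for a matching, unexpired token; any attempt consumes it."""
        entry = self._pending.pop(account, None)
        if entry is None:
            return False
        digest, expires_at = entry
        if self._clock() > expires_at:
            return False
        candidate = hashlib.sha256(token.encode()).hexdigest()
        # Constant-time comparison to avoid leaking how much of the token matched.
        return hmac.compare_digest(candidate, digest)


if __name__ == "__main__":
    store = ResetTokenStore()
    token = store.issue("user@example.com")  # would be sent via SMS or similar
    print(store.redeem("user@example.com", token))
```

In this sketch a wrong guess also consumes the token, so an intruder gets a single try per issued token, and the token itself is random rather than something that can be looked up on Wikipedia.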

Sunday, September 7, 2008

The Informatician - Blog Overview

Key Definitions:

Informatician: someone who practices informatics

Informatics: the art, science and human dimensions of information technology; the interdisciplinary study of the design, application, use and impact of information technology

Subject Matter of the Blog:

My intention in starting this blog is to discuss and learn about current and upcoming information and communication technologies (ICTs) and their applications for public end-users as well as organizations.

By virtue of my own professional and academic pursuits, I am particularly interested in discussing challenges and opportunities in the areas of Virtual Communities; Web Engineering; Web 2.0 Technologies; Web Services; Social Media; Information Security; and Electronic Human Resources Management.

Orientation of the Blog Writer:

My philosophy as an informatician is fundamentally based on my conviction that the design and deployment of technologies is more than just a technical pursuit, and as informaticians, we strive to bring attention to the intricate relationship between technologies and their social and organizational context of use in real-world settings.

My own predilection as an informatician is at the intersection of management and sociology, and most of my work as an academic and an industry consultant has been in the areas of social informatics and business informatics.

Below is a visual depiction of how I think about informatics. Please note that this is purely my own viewpoint - you might have a different position on the same subject.
