Web Information Organization

UNC SILS, INLS 490-186, Spring 2013

Web Archaeology

Due January 22.

Pre-Web Hypertext & Hypermedia (5 points)
- Do a little research to identify 2 pre-Web hypertext or hypermedia systems other than the ones we discussed in class. For each system, state when and where it was developed, and identify at least two features it had that the Web does not have. (For example, the Xanadu hypertext system was designed to include a “royalty” mechanism so that authors could be paid for their work.) State whether you think these “missing” features ought to be part of the Web, and explain why. Cite your sources!
TBL’s Proposal (5 points)
- In Tim Berners-Lee’s “A Proposal,” he described some issues CERN had with managing their information. What are some of the issues that they faced?
- TBL also described three possible ways to deal with their information management process: trees, keywords, and hypertext. Describe the pros and cons of each of these methods.
- What are the most significant differences between TBL’s initial proposal and the Web we have today? Explain your reasoning.

Submit your answers as a zipped plain text file or PDF.

Submit this assignment.

URIs, Resources & Representations

Due February 5.

Choose a Web site or Web application you use frequently. Identify a possible resource that the site or application does not make addressable and that you think would be useful if it were. Note that this is not a question about what additional information or functionality the site or application might provide. Rather, it is a question about how the existing information or functionality might be made addressable.

Provide:
- the “main” URI for the site or application
- an explanation of the new resource you think should exist, and why
- the URI you would use to identify this new resource
Open this page in the Chrome Web browser. Clear the browser’s cache, then open the Developer Tools. Open the Network Panel by clicking on the Network icon at the top of the Developer Tools window. Now click this link (you may want to copy the questions below someplace else first so you can refer to them).

Now answer the following questions:
- How many HTTP requests did following this link result in? How many resources were requested?
- Were all the requests successful? How do you know?
- How many different types of representations were returned? List the different types.
Now click the Back button, returning to this page, and follow the link above again.
- Do you see any differences in the Network panel this time? What are they?
Who owns the URI http://ils.unc.edu/ilssa/, and why?
How can you determine whether two different URIs refer to the same resource?
DBpedia is a project that publishes on the Web structured data extracted from Wikipedia. For this question you will use cURL to explore a DBpedia resource, its related resources, and their representations. Mainly, you’ll be using cURL to request URIs and to look at the headers of HTTP responses.

Quick cURL tutorial:

To request the resource identified by the URI http://example.org/, simply type:
```
curl http://example.org/
```
To look at the headers of the response, type:
```
curl -I http://example.org/
```
The -I option (long form: --head) means ‘make an HTTP HEAD request,’ which will only request the HTTP headers for the resource, not the representation data, and then print out those headers (by default cURL only shows the representation data, not the metadata in the headers). Avoid writing -X HEAD for this: that only changes the method name cURL sends, so cURL still waits for representation data that never arrives.

To do content negotiation, you need to add an HTTP header to your request, specifying what kind of content you want. For example, if you wanted to request a representation in plain text format, you could type:
```
curl -H 'Accept: text/plain' http://example.org/
```
Of course, just because you request a certain type of representation doesn’t mean that that type of representation is actually available.

Finally, you can combine the options shown above:
```
curl -I -H 'Accept: text/plain' http://example.org/
```
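If it helps to see what the server does with an Accept header, here is a rough sketch of server-side content negotiation in Python. It is a simplification, not how any particular server implements it: the function names (parse_accept, matches, negotiate) are made up for illustration, media type parameters other than q are ignored, and the precedence rules for more specific media ranges are left out.

```python
def parse_accept(header):
    """Parse an Accept header into (media_range, q) pairs, highest q first."""
    entries = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        q = 1.0  # a media range without a q parameter has quality 1
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        entries.append((media_range, q))
    # sorted() is stable, so ties keep the order the client listed them in.
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


def matches(media_range, media_type):
    """True if a media range such as text/* or */* covers the media type."""
    if media_range == "*/*":
        return True
    if media_range.endswith("/*"):
        return media_type.startswith(media_range[:-1])
    return media_range == media_type


def negotiate(accept_header, available):
    """Pick one of the available media types, or None if nothing is acceptable."""
    for media_range, q in parse_accept(accept_header):
        if q <= 0:
            continue  # q=0 means "not acceptable"
        for media_type in available:
            if matches(media_range, media_type):
                return media_type
    return None


if __name__ == "__main__":
    print(negotiate("text/plain", ["text/html", "text/plain"]))
    print(negotiate("application/json", ["text/html", "text/plain"]))
```

When negotiate returns None, a server has to decide what to do: it might respond with an error status or fall back to a default representation.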
Now, use cURL to request the following resource: http://dbpedia.org/resource/University_of_North_Carolina_at_Chapel_Hill
- Does this resource have any representations? Why or why not?
Examine the headers returned when you request this resource to find another resource related to this one.
- What is the URI of this second resource?
- What is the relationship between these two resources?
- Investigate this second resource. Does it have a representation?
Look at the headers returned by a request for this second resource. You should see information about a number of related resources, with associated media types. Choose one of these alternate resources, and note the media type. Now make a request for the original resource (http://dbpedia.org/resource/University_of_North_Carolina_at_Chapel_Hill), specifying that you want that media type.
- What was the media type you requested?
- How does specifying a media type change the response you get?

Submit this assignment.

Designing with the Uniform Interface

Due February 14.

This assignment is based on Chapter 5 of Richardson and Ruby’s RESTful Web Services.

For this assignment you will try your hand at designing an information service that follows the principles of the uniform interface. You’ll do the following:

Decide what your information or “data” is
Figure out how to split up your information into resources
Give your resources URIs
Decide which HTTP methods your resources will support
Think about what representations your resources will have

Part 1: Choosing your data

Unlike the buckets service, you’ll want your service to be something “real,” i.e. potentially useful. But that doesn’t mean it needs to be complex.

Some things to think about as you make your decision:

What are your primary units of information? For example: books.
How can you identify your units? For example: books have ISBNs.
Do your units have some structure to them? For example: books can be divided into chapters and pages.
How might your units be classified into types? For example: books can be classified by subject.
How might people want to find your units? For example: books could be looked up by title, by author, by subject, by similarity to some other book, or some combination of these.

Deliverable #1: write a few paragraphs explaining what kind of data your service deals with, addressing the points above and whatever else you think is relevant. Do not get into specifics such as data formats, unless it’s absolutely necessary for a high-level description of your service.

Part 2: Split your data into resources

Now you have to decide how to divide your data into resources. Remember that a resource is anything you might want to link to.

To start off, you’ll probably have some sort of “root” resource that exists just to provide access to the other resources. For example, in the buckets service, the root resource is http://webinfo-buckets.herokuapp.com/. Its purpose is just to list the existing buckets and to create new buckets.

Next, you’ll want to have resources that more or less map directly to the units and sub-units you identified in Part 1. In the buckets service, these are the buckets themselves: http://webinfo-buckets.herokuapp.com/A, http://webinfo-buckets.herokuapp.com/B, etc.

Finally, you probably want some resources that allow you to filter, select, or compose other resources. For example, we might add search functionality to the buckets service by creating resources like http://webinfo-buckets.herokuapp.com/find?q=cat, the value of which would be a list of buckets that contain the sequence of letters cat. Or we might add resources like http://webinfo-buckets.herokuapp.com/B+F, the value of which would be the contents of both bucket B and bucket F.

Deliverable #2: list the various kinds of resources your service will provide access to. Do not worry about giving them URIs yet.

For example, for the buckets service I might list the following kinds of resources:

The list of buckets
A bucket identified by a letter
A list of buckets that contain some sequence of letters
A concatenation of two or more buckets’ contents

Part 3: Name your resources

Now it’s time to identify your resources by giving them URIs. You’re not going to be actually hosting this service anywhere yet, so don’t worry about the host part of your URIs, just specify what the path should be.

For example, in the buckets service the resources are named as follows:

The list of buckets: /
A bucket identified by a letter: /{bucket-id}
A list of buckets that contain some sequence of letters: /find?q={query}
A concatenation of two or more buckets’ contents: /{bucket-id}{+bucket-id*}

A few things to note about how I specified my URIs above:

For the root resource, I just gave the path, since there is only one.
Rather than list each bucket URI individually, I used a URI template. This is just a shorthand for indicating how to construct a set of similar URIs. The convention is to use curly brackets to indicate a variable that should be replaced with some value in order to obtain an actual URI. In this case, the bucket-id variable in the template /{bucket-id} would be replaced by a bucket’s identifier (e.g. the letter C) to obtain that bucket’s URI: /C.
Similarly, the query variable in the template /find?q={query} could be replaced by any sequence of letters, giving us a potentially infinite number of URIs.
For the last category of resources, I’ve used another special character, *, to indicate that this part of the template can be repeated any number of times. So, following this template we can construct URIs like /A+C, /A+C+G, /A+C+G+B, etc.
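To make the template shorthand concrete, here is a small Python sketch that fills in each of the templates above. The function names are invented for this example, and the percent-encoding choices are my assumptions rather than anything the buckets service specifies.

```python
from urllib.parse import quote, urlencode


def bucket_uri(bucket_id):
    """Expand /{bucket-id} for a single bucket."""
    return "/" + quote(bucket_id, safe="")


def find_uri(query):
    """Expand /find?q={query}; urlencode escapes the query value."""
    return "/find?" + urlencode({"q": query})


def concatenation_uri(bucket_ids):
    """Expand /{bucket-id}{+bucket-id*}: the first id, then +id repeated."""
    if len(bucket_ids) < 2:
        raise ValueError("a concatenation needs at least two buckets")
    return "/" + "+".join(quote(b, safe="") for b in bucket_ids)


if __name__ == "__main__":
    print(bucket_uri("C"))                       # /C
    print(find_uri("cat"))                       # /find?q=cat
    print(concatenation_uri(["A", "C", "G"]))    # /A+C+G
```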

One kind of URI we don’t have in the buckets service is one that indicates hierarchy. If you have resources that are “parts of” other resources, you will probably want to give them URIs that indicate that relationship. So, for example, if your resources are books and chapters, you might have URIs that look like this:

Book: /{isbn}
Chapter of a book: /{isbn}/ch{chapter-number}

Note how the / character is used to divide levels of the hierarchy.
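One nice consequence of hierarchical URIs is that the relationship can be read straight off the path. The Python sketch below shows one way to build and take apart the book and chapter URIs above; the function names, the regular expressions, and the sample ISBN-like value are all illustrative assumptions.

```python
import re

# Patterns corresponding to the templates /{isbn}/ch{chapter-number} and /{isbn}.
CHAPTER_PATH = re.compile(r"^/(?P<isbn>[^/]+)/ch(?P<chapter>\d+)$")
BOOK_PATH = re.compile(r"^/(?P<isbn>[^/]+)$")


def chapter_uri(isbn, chapter_number):
    """Build the URI path of a chapter from its book's ISBN."""
    return f"/{isbn}/ch{chapter_number}"


def parent_uri(path):
    """Return the path one level up the hierarchy, or None at the top level."""
    head, _, _ = path.rstrip("/").rpartition("/")
    return head or None


def parse_path(path):
    """Classify a path as a book or a chapter, returning (kind, isbn, chapter)."""
    match = CHAPTER_PATH.match(path)
    if match:
        return ("chapter", match["isbn"], int(match["chapter"]))
    match = BOOK_PATH.match(path)
    if match:
        return ("book", match["isbn"], None)
    return None


if __name__ == "__main__":
    uri = chapter_uri("9780596529260", 5)
    print(uri)              # /9780596529260/ch5
    print(parent_uri(uri))  # /9780596529260
    print(parse_path(uri))  # ('chapter', '9780596529260', 5)
```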

Deliverable #3: Specify a URI or URI template for each kind of resource you identified in Part 2.

Part 4: Pick your methods

Now, for each of your resources, decide which HTTP methods it will support, and explain what the effect of the method is.

For example, in the buckets service:

/ supports GET and POST. GET returns a list of buckets. POST creates a new bucket.
/{bucket-id} supports GET, PUT, and DELETE. GET returns the bucket’s contents. PUT replaces the bucket’s contents with some new contents. DELETE destroys the bucket.
/find?q={query} supports GET, which returns a (possibly empty) list of buckets.
/{bucket-id}{+bucket-id*} supports GET, which returns the concatenated contents of two or more buckets.
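A list like this translates fairly directly into a lookup table. The following Python sketch is one possible way to write it down, not the actual buckets implementation: it assumes bucket identifiers are single uppercase letters, it matches on the path only (the query string is left out), and it answers an unsupported method with 405 Method Not Allowed plus an Allow header, which is standard HTTP behavior but not something the list above mentions.

```python
import re

# (path pattern, supported methods), checked in order.
ROUTES = [
    (re.compile(r"^/$"), {"GET", "POST"}),
    (re.compile(r"^/find$"), {"GET"}),
    (re.compile(r"^/[A-Z](\+[A-Z])+$"), {"GET"}),
    (re.compile(r"^/[A-Z]$"), {"GET", "PUT", "DELETE"}),
]


def allowed_methods(path):
    """Return the set of methods a path supports, or None if no resource matches."""
    for pattern, methods in ROUTES:
        if pattern.match(path):
            return methods
    return None


def check(method, path):
    """Return (status, headers) saying whether the method may be used on the path."""
    methods = allowed_methods(path)
    if methods is None:
        return 404, {}
    if method not in methods:
        return 405, {"Allow": ", ".join(sorted(methods))}
    # Placeholder: a real service would hand off to the resource's handler here.
    return 200, {}


if __name__ == "__main__":
    print(check("GET", "/"))
    print(check("DELETE", "/"))
    print(check("PUT", "/B"))
```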

Deliverable #4: For each of your resources, list the supported methods and explain what they do.

Part 5: Think about your representations

We’ll get deeper into designing representations after this assignment is due. For now, just think about what kind of data needs to be included in the representations of your resources. Consider both representations included in requests to your service (i.e. in PUT or POST requests) and representations included in responses from your service. Don’t worry about specific media types for now, just think about what data is needed. Another important thing to consider is what status codes your service will possibly return. This means you need to think not only about successful requests for your resources, but unsuccessful things as well.

For example, in the buckets service we might document the following (note that this is incomplete):

GET to / returns either a comma-separated list of bucket names, or the message No buckets have been created. In either case the status code is 200 OK.
POST to / returns the message Created bucket {bucket-id} with the URI of the new bucket in the Location header, and a 201 Created status code. But if no more buckets can be created, it returns the message The maximum number of buckets has been created with a 403 Forbidden status code.
PUT to /{bucket-id} requires a representation in the form of a key-value pair where the key is data and the value is any sequence of characters. If this representation is missing, the response will be the message No data was specified with a 400 Bad Request status code. If the bucket identified by bucket-id doesn’t exist yet, the response will be the message No such bucket exists with a 404 Not Found status code. Otherwise the response will be the message Added {data} to bucket {bucket-id} with a 200 OK status code.

Note that I didn’t bother including 500 Internal Server Error responses, since we assume that any resource can potentially return these.
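Writing the documented behavior out as code can help you spot cases you have missed. Here is a minimal in-memory Python sketch of the three interactions documented above, where each method returns a (status code, headers, body) tuple. Several details are assumptions on my part: the class and method names, the limit of 26 buckets, naming new buckets with the next unused letter, a relative URI in the Location header, and checking for missing data before checking whether the bucket exists.

```python
import string

MAX_BUCKETS = 26  # assumption: one bucket per letter A-Z


class BucketService:
    """Toy model of the documented responses; nothing here touches the network."""

    def __init__(self):
        self.buckets = {}  # bucket id -> contents

    def get_root(self):
        if not self.buckets:
            return 200, {}, "No buckets have been created"
        return 200, {}, ",".join(sorted(self.buckets))

    def post_root(self):
        unused = [c for c in string.ascii_uppercase[:MAX_BUCKETS]
                  if c not in self.buckets]
        if not unused:
            return 403, {}, "The maximum number of buckets has been created"
        bucket_id = unused[0]
        self.buckets[bucket_id] = ""
        return 201, {"Location": "/" + bucket_id}, f"Created bucket {bucket_id}"

    def put_bucket(self, bucket_id, form):
        if "data" not in form:
            return 400, {}, "No data was specified"
        if bucket_id not in self.buckets:
            return 404, {}, "No such bucket exists"
        data = form["data"]
        # Part 4 says PUT replaces the contents; the message text is from the example.
        self.buckets[bucket_id] = data
        return 200, {}, f"Added {data} to bucket {bucket_id}"


if __name__ == "__main__":
    service = BucketService()
    print(service.get_root())
    print(service.post_root())
    print(service.put_bucket("A", {"data": "cat"}))
    print(service.put_bucket("Z", {"data": "cat"}))
```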

Deliverable #5: For each method supported by each resource in your service, specify what kinds of representation (if any) it requires, and what kinds of representations, including status codes, it might return. Be sure to consider possible error conditions.

All five of your deliverables should be combined into a single text file, zipped and uploaded via the link below.

Submit this assignment.

Designing a Hypermedia Type

Due February 26.

For this assignment, you will continue designing the information service you began developing in the previous assignment. Specifically, you will be designing representations of the resources you identified in the previous assignment. While your resources may lend themselves to any number of different representations, here you are asked to focus on designing hypermedia that not only represents the data and metadata about the data, but also uses links to represent metadata about your service and the ways it can be interacted with.

It is possible to design hypermedia types using many different data formats, but for the purposes of this assignment you are asked to use HTML. By using HTML as your base format, you will not need to design your own hypermedia controls (i.e. syntax for creating links) since HTML has already defined these for you. So your design effort will focus on expressing the semantics of your information service using the existing elements and attributes of HTML.

Getting set up

You will be using GitHub to manage and submit your work for this assignment. If you’re already a GitHub user, you’ll just need to create a new repository for this assignment. If you’re not, read on. (You should have already done some of these steps in class on 2/19.)

First, sign up for a free GitHub account.
Next, install and set up the Git version control software on your computer.
Now you’re ready to create a “repository” for your assignment. Follow the instructions at GitHub to create a repository. Note that these instructions assume you are creating a repository named Hello-World. Don’t name your repository Hello World. Give your repository a short but meaningful name related to the information service you are designing. So, don’t name it assignment4 either.

At this point, if you’ve followed all the instructions linked above, you should now have a public GitHub repository for your assignment.

Using Git

Git is powerful and complex software. However, the way we’ll be using it is rather simple and should be straightforward.

These instructions assume that you’re using Git from the command line. That means you’ll be using Terminal.app if you’re on a Mac, or Git Bash (which you should have installed as part of the setup process above) on Windows. (If you’re on Linux I assume you already know how to operate a shell.)

To add files to your GitHub repository:

Create and save a file (for example, all-buckets.html) in the directory (folder) you created as part of the process of creating a repository.
Open your command line (Terminal or Git Bash), and move to your repository directory using the cd command. For example, if your repository was named buckets, you should be able to get there using the following command:
```
$ cd ~/buckets
```
Verify that you can see your file using the ls command:
```
$ ls
README           all-buckets.html
```

The git status command should show you that your new file has not yet been added to your repository:

```
$ git status
# On branch master
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#   all-buckets.html
nothing added to commit but untracked files present (use "git add" to track)
```

Now use git add to add the new file:
```
$ git add all-buckets.html
```

Using git status again will show you that the file has been added:

```
$ git status
# On branch master
# Changes to be committed:
#   (use "git rm --cached <file>..." to unstage)
#
#   new file:   all-buckets.html
#
```

At this point you’ve told Git that you want it to keep track of the new file, but you haven’t actually saved it to the repository yet. Do that with the git commit command, adding an informative note:
```
$ git commit -m 'Added example representation of a list of buckets.' all-buckets.html
```
Now you’ve saved the file to your local repository, but you haven’t yet “pushed” it to the public repository on GitHub. Do that as follows:
```
$ git push origin master
```
Now you should be able to see your file in your public repository on GitHub. If you make changes to your file and save it, then git status should notify you that the file has changed:
```
$ git status
# On branch master
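# Changes not staged for commit:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#   modified:   all-buckets.html
#
no changes added to commit (use "git add" and/or "git commit -a")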
```
Using git add and git commit again, you can save these changes to your local repository, and using git push you can push them to your public repository.

Designing your hypermedia type

To design the hypermedia representations of your resources, you’ll need to think about:

How to represent the data provided by your various resources in HTML
Common patterns (“blocks”) that appear in your representations, and how these will be identified
How you will use outbound links to link representations to related resources and indicate possible process flows
How you will use templated query links for search actions
How you will use non-idempotent update links to create and update resources

To show your answers to these questions, you will create a set of HTML files that are example representations of your resources.

Your templated query links and non-idempotent update links (i.e. HTML forms) should have action attribute values and input elements that correspond to the URI templates you designed in the previous assignment.

Your outbound links (i.e. anchor elements) in each HTML file should have href values that link them to the other HTML files, so that you can open one HTML file in a web browser and click on links to get to the other files. So, for example, in a “real” buckets service each element in the HTML representation of my “all buckets” list would link to the specific URI for that bucket. But in these example HTML files, I would just have each element in the list link to bucket.html (the example HTML representation of a single bucket resource).

Note that because HTML does not support idempotent update links, you can use non-idempotent update links to handle idempotent updates. Later in the semester we will look at techniques for adding support for idempotent updates to HTML.
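To give a rough idea of how the three kinds of links fit together in one representation, here is a Python sketch that generates something like all-buckets.html. You will be writing your HTML files by hand, so treat this only as an illustration. The id, class, and rel values (buckets, bucket-list, bucket, bucket-search, bucket-create) are hypothetical names coined for this example, as is the function name.

```python
from html import escape


def all_buckets_html(bucket_ids):
    """Return an example HTML representation of the list of all buckets."""
    # Outbound links: in the example files every bucket links to bucket.html.
    items = "\n".join(
        f'      <li><a rel="bucket" href="bucket.html">{escape(b)}</a></li>'
        for b in bucket_ids
    )
    return f"""<!DOCTYPE html>
<html>
  <head><title>All buckets</title></head>
  <body>
    <ul id="buckets" class="bucket-list">
{items}
    </ul>
    <!-- Templated query link: corresponds to /find?q={{query}} -->
    <form class="bucket-search" action="/find" method="get">
      <input type="text" name="q">
      <input type="submit" value="Find buckets">
    </form>
    <!-- Non-idempotent update link: POST to / creates a new bucket -->
    <form class="bucket-create" action="/" method="post">
      <input type="submit" value="Create a bucket">
    </form>
  </body>
</html>
"""


if __name__ == "__main__":
    print(all_buckets_html(["A", "B", "C"]))
```

Notice that the name attribute of the text input (q) is what ties the search form back to the query variable in the URI template.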

Deliverables

To complete this assignment, your GitHub repository ought to contain:

A plain-text README file documenting the id attribute values, class attribute values, name attribute values, and rel attribute values you coined to describe your service and the data it provides. Use pages 105-110 of Hypermedia APIs with HTML5 & Node as an example of how to document your attribute values. Note that you can use Markdown syntax in your README file for formatting such as italics, lists, etc.
One example HTML file for each GETtable kind of resource your service provides access to. For example, the buckets service used as an example in the last assignment would have four example files:
1. all-buckets.html, demonstrating how the list of all buckets (the resource /) would be represented.
2. bucket.html, demonstrating how the contents of a single bucket (a resource with a URI like /{bucket-id}) would be represented.
3. find-buckets.html, demonstrating how a list of buckets matching a query (a resource with a URI like /find?q={query}) would be represented. (This would probably look very similar to all-buckets.html.)
4. concatenated-buckets.html, demonstrating how two concatenated buckets (a resource with a URI like /{bucket-id}{+bucket-id*}) would be represented. (This might look very similar to bucket.html.)

Once you’ve pushed all your files to your public GitHub repository, please submit this assignment by posting a message to Piazza with the URI of your repository.

Midterm Exam

Due March 8.

The exam is a take-home exam. You may consult any resources that you desire while you are working on the exam.

You may download the exam at any time. You must submit the exam within 24 hours of the time that you downloaded it. Submit your exam no later than 5PM on Friday, March 8. If for some reason you need to submit it later than 5PM on Friday, please contact me. But be sure you do not download the exam more than 24 hours before you plan to submit it.

When you are finished with the exam, please put your name in the header, zip it, and submit it using the link below.

Submit this assignment.

Final Project

Due May 7.

For your final project, you will take the design work you did for the last two assignments, and turn it into a working Web information service.

Implementing your service

Your service must provide access to at least two kinds of resources that have some kind of relationship to one another. Clients should be able to access the two kinds of resources directly, and they should also be able to access “collection” resources that list all the resources of a particular kind. It should be possible to create and update at least one of the kinds of resources through your service.

For example, the election information service provides access to two kinds of resources: political parties and candidates. Candidates belong to one party at a time, although that affiliation may change over time. The service provides resources that list existing parties and candidates, the latter of which is filterable. Both parties and candidates can be created and updated through the service.

Your service should provide (at least) HTML representations for all resources. These representations must include metadata that describe the application (how to transition from one state to another) and the data being provided.

Describing your application flow

Your HTML representations must include the proper hypermedia controls for linking representations to one another, creating query URIs from templates, and updating resources both idempotently and non-idempotently. Your HTML controls must have appropriate id attribute values, class attribute values, name attribute values, and rel attribute values that describe their meaning and purpose (this was the work you did for the Designing a Hypermedia Type assignment).

Describing your data

Furthermore, the data in your HTML representations must be described using one of the two standards for embedding metadata in HTML: RDFa or microdata.

If you choose to use microdata, you should describe your data and relationships using appropriate types and properties from schema.org. If the types or properties at schema.org are too generic for your data, you may consider extending them. If there are no appropriate types or properties for your data at schema.org, you might consider using RDFa instead.

If you choose to use RDFa, you should describe your data and relationships using types and properties from some RDF-compatible vocabulary such as the DBpedia Ontology, the Bibliographic Ontology, or schema.org. You can search for appropriate vocabularies at Schemapedia. The Library Linked Data Incubator Group also has a good list of RDF vocabularies.

Once you’ve got your application running, either in Cloud9 or deployed to Heroku, you can use Google’s Rich Snippets Testing Tool or the omnipotentdatatranslator to check your microdata, or one of the two RDFa Distillers to check your RDFa.
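As a small illustration of the microdata option, here is a Python sketch that marks up a candidate and the party they belong to. It assumes the schema.org Person and Organization types and the name, url, and memberOf properties are a reasonable fit; whether they suit your data is for you to judge. The function name and the sample values are made up.

```python
from html import escape


def candidate_microdata(name, party_name, party_href):
    """Return an HTML fragment describing a candidate with schema.org microdata."""
    return (
        '<div itemscope itemtype="http://schema.org/Person">\n'
        f'  <span itemprop="name">{escape(name)}</span>\n'
        # The nested item describes the party the candidate is a member of.
        '  <span itemprop="memberOf" itemscope '
        'itemtype="http://schema.org/Organization">\n'
        f'    <a itemprop="url" href="{escape(party_href, quote=True)}">'
        f'<span itemprop="name">{escape(party_name)}</span></a>\n'
        '  </span>\n'
        '</div>\n'
    )


if __name__ == "__main__":
    print(candidate_microdata("Jane Doe", "Example Party", "/parties/example"))
```

The same fragment could be expressed in RDFa by swapping the microdata attributes for their RDFa counterparts, so the choice between the two does not change how you structure your HTML very much.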

Working in groups

You may work in groups of up to three people. If you choose to work in a group, select one group member’s designs to implement. (If two or more group members developed similar or related designs, you may choose to merge their designs.) The guidelines given above reflect my minimum expectations for someone working alone; larger groups will have correspondingly higher expectations.

If you decide to work in a group, email me the GitHub usernames of your group members and a short but clever name for your group. This will enable me to set up a shared GitHub repository for your group.

The example service

I’ve created an example service that provides access to the election information described above. See the README for details on how to use this as a starting point for implementing your own service.

Deliverables

Your final deliverables for the project are:

The URI of your service running on Heroku.
The URI of a single GitHub repository containing the complete source code for your service.
A Readme.md file in the GitHub repository documenting:
- the attribute values used to describe your application flow, and
- the types and properties used to describe your data.
(The .md suffix indicates a plain text file that uses Markdown syntax. This allows you to produce something more nicely formatted than plain text alone, and GitHub will automatically display it as HTML.)

You may simply post to Piazza your two URIs by 12 midnight on Tuesday, May 7th.