Web Information Organization

UNC SILS, INLS 690-186, Fall 2014

Probes

Throughout the semester (approximately once a week, or slightly less often) I will give you “probe” questions to answer online before coming to class. These questions will “probe” your understanding of the material you’ve just read. They serve 2 purposes:

They show me that you’ve done the reading, and
they highlight areas that you may be having trouble with, so I can spend more time on them in class.

Because the probes will involve concepts that we have not yet discussed, you will be graded mainly on the level of effort you put into trying to answer, rather than the correctness of your answer (and some questions may not have a clear correct answer anyway). So:

If you don’t answer the probe at all, you get zero points.
If you try to answer the probe, but make no reference to the readings or anything else we’ve covered in class, you get one point.
If you answer the probe and refer to concepts from the readings or from class, but use them incorrectly or don’t fully answer the question, you get two points.
If you answer the probe completely and correctly, making reference to the readings, class notes, or outside materials, you get the full three points.

The questions will be posted 24 hours before they are due, and they will be due at 7AM, 2.5 hours before we meet.

Designing a State Machine

Due September 16.

For this assignment you will conceptualize interactions with an information service in terms of “state machines,” and think about how these state machines could be mapped to the uniform interface of HTTP.

You’ll do the following:

Decide what interactions the service needs to support, and the kinds of resources involved
Draw diagrams of the “state machines” for these interactions
Show how your state machines could be implemented using HTTP

You should work on this assignment on your own. You are welcome to ask your classmates general questions about concepts relevant to the assignment, but don’t design your state machines collaboratively.

Part 1: Thinking about service interactions

In class I will give you a brief, high level description of the service for which you will be designing state machines. Read the description carefully, then try to answer the following questions:

What are the different user roles for this service? In the Starbucks example that we discussed in class there were two roles: Customer and Barista. You will design one state machine per role. You should have at least two roles, but probably won’t need more than three.
What kinds of interactions does the service need to support for each role? In the Starbucks example, the customer needed to be able to: order a drink, change her order, pay for her order, and receive her drink. The barista needed to be able to: see what drinks he needs to make, check to see if a drink has been paid for, and cross drinks received by customers off his list.
What are the different kinds of resources involved in the interactions? In the Starbucks example there were the following kinds of resources: Order, Payment, and Drink. There was also a resource that was a queue of Orders.
What kinds of dependencies are there among the steps of the various interactions? In the Starbucks example, the customer could not change her order once the barista started making it; and the customer could not receive her drink until she had paid.

Deliverable #1: write a few paragraphs addressing the points above and whatever else you think is relevant. Do not get into specifics such as URLs or data formats. Keep things as simple as you can.

Part 2: Draw your state machine diagrams

Now you will take what you decided upon in Part 1 and draw state diagrams. You can use drawing software, or draw your diagrams by hand. I don’t really care as long as they are readable and understandable.

Draw one diagram per user role. Each diagram should have a “start” node, an “end” node, and a set of state nodes. The state nodes should be given appropriate names; in the Starbucks example these were names like Order placed, Drink made, etc. Arrows between nodes indicate how the user moves from one state to another. The arrows should be labeled to indicate the action taken to move from one state to another. For example, in the Starbucks Customer state diagram, the pay action took the customer from the Order placed state to the Paid state. There may be multiple arrows between the same two states; for example, either accept update or reject update took a customer from Order change requested to Order placed in the Starbucks example.
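Each arrow in a diagram like this is really a rule of the form “in this state, this action leads to that state.” As a minimal sketch (you don’t need to write code for this assignment), here is a Customer machine loosely based on the Starbucks example, written as a shell function. The pay, accept update, and reject update transitions come from the example above; the order, change order, and receive drink actions and the function name customer_next are assumptions for illustration.

```shell
# Hypothetical sketch of a Customer state machine, loosely based on the
# Starbucks example. Each case is one labeled arrow: "STATE|ACTION") NEXT.
# Usage: customer_next STATE ACTION  (prints the next state, or fails)
customer_next() {
  case "$1|$2" in
    "Start|order")                          echo "Order placed" ;;
    "Order placed|change order")            echo "Order change requested" ;;
    "Order change requested|accept update") echo "Order placed" ;;
    "Order change requested|reject update") echo "Order placed" ;;
    "Order placed|pay")                     echo "Paid" ;;
    "Paid|receive drink")                   echo "End" ;;
    # Any other pair is not an arrow in the diagram, so it is not allowed.
    *) echo "No transition '$2' from state '$1'" >&2; return 1 ;;
  esac
}
```

In this sketch, any (state, action) pair missing from the table, such as trying to change an order after paying, is simply not a legal move; gaps like these correspond to the dependencies you identified in Part 1.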

Deliverable #2: At least two but not more than three state machine diagrams, one per user role.

Part 3: Show how your interactions could be implemented using HTTP

Now you will show how your state machines could be implemented using HTTP. Create new versions of your state diagrams in which your nodes are resources and your arrows are HTTP requests or responses. If an arrow represents an HTTP request, it should be labeled with the request’s HTTP method. If it represents a response, it should be labeled with an HTTP status code. So, for example, in the Starbucks case the Order placed state became the Orders queue resource, and moving from the start node to Order placed by the order action became sending a POST request to the Orders queue resource.

You do not need to give your resources URLs. Just give them meaningful names. Often you will have a resource that is a specific instance of a kind of resource. In this case, ensure that your resource name makes this clear. For example, we named the order resource in the Starbucks example Order #123 to indicate that it is one of many Order resources.

You don’t need to indicate the response to every HTTP request unless it is particularly critical to the interaction. For example, in the Starbucks example the response to a PUT request to an Order resource indicated whether the update succeeded or not, which is a pretty critical part of the interaction of changing an order.

Deliverable #3: New versions of your state machine diagrams, showing how they could be implemented using HTTP.

Now look over all your deliverables, and correct any inconsistencies that might have arisen. If, in the process of making your state diagrams, you changed your mind about how to model your resources or interactions, update your answers to part 1 to indicate this.

All three of your deliverables should be printed and turned in at the beginning of class on the date the assignment is due.

Resources and Representations

Due September 23.

Choose a Web site or Web application you use frequently. Identify a potential resource that the site or application does not make addressable and that you think would be useful if it were. Note that this is not a question about what additional information or functionality the site or application might provide. Rather, it is a question about how the existing information or functionality might be made better addressable.

Provide:
- the “main” URL for the site or application, i.e. the URL of the “starting” resource in a typical interaction with the site or application
- an explanation of the new resource you think should exist, and why
- the URLs (or URL templates) of 2 existing (kinds of) resources, from which you think your new resource should be directly reachable (e.g. via a link), with short explanations why
Open this page (the one you are reading now) in the Chrome Web browser. Clear the browser’s cache, then open the Developer Tools. Open the Network Panel by clicking on the Network tab at the top of the Developer Tools window. Now click this link (you may want to copy the questions below someplace else first so you can refer to them).

Now answer the following questions:
- How many HTTP requests did following this link result in? How many resources were requested?
- Were all the requests successful? How do you know?
- How many different types of representations were returned? List the different types.
Now click the Back button, returning to this page, and follow the link above again.
- Do you see any differences in the Network panel this time? What are they?
Who owns the URL http://ils.unc.edu/ilssa/, and why?
How can you determine whether two different URLs refer to the same resource?
DBpedia is a project that extracts structured data from Wikipedia and publishes it on the Web. For this question you will use cURL to explore a DBpedia resource, its related resources, and their representations. Mainly, you’ll be using cURL to request URLs and to look at the headers of HTTP responses.

Quick cURL tutorial:

To request the resource identified by the URL http://example.org/, simply type:
```
curl http://example.org/
```
To look at just the headers of the response, type:
```
curl -I http://example.org/
```
The -I option means ‘make an HTTP HEAD request and print the response headers,’ so you get only the HTTP headers for the resource, not the representation data (by default cURL shows only the representation data, not the metadata in the headers). Use -I rather than -X HEAD: -X only changes the method name in the request, so cURL may keep waiting for a response body that a HEAD response never sends.

To do content negotiation, you need to add an HTTP header to your request specifying what kind of content you want. For example, if you wanted to request a representation in plain text format, you could type:
```
curl -H 'Accept: text/plain' http://example.org/
```
Of course, just because you request a certain type of representation doesn’t mean that that type of representation is actually available.

Finally, you can combine the options shown above:
```
curl -I -H 'Accept: text/plain' http://example.org/
```
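DBpedia responses include quite a few headers. As a minimal sketch, here is a helper that keeps only the lines most relevant to the questions below: the status line and the Content-Type, Location, and Link headers. The function name key_headers is hypothetical, and it assumes a shell with grep; the usage line is left as a comment because it needs network access.

```shell
# Hypothetical helper: read HTTP response headers on standard input and keep
# only the status line plus the Content-Type, Location, and Link headers.
key_headers() {
  grep -i -E '^(HTTP/|content-type:|location:|link:)'
}

# Example (needs network access): ask for Turtle and see how the server responds.
# curl -s -I -H 'Accept: text/turtle' http://dbpedia.org/resource/University_of_North_Carolina_at_Chapel_Hill | key_headers
```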
Now, use cURL to request the following resource: http://dbpedia.org/resource/University_of_North_Carolina_at_Chapel_Hill
- Does this resource have any representations? Why or why not?
Examine the headers returned when you request this resource. Find another resource related to this one.
- What is the URL of this second resource?
- What is the relationship between these two resources?
- Investigate this second resource. Does it have a representation?
Look at the headers returned by a request for this second resource. You should see information about a number of related resources, with associated media types. Choose one of these alternate resources, and note the media type. Now make a request for the original resource (http://dbpedia.org/resource/University_of_North_Carolina_at_Chapel_Hill), specifying that you want that media type.
- What was the media type you requested?
- How does specifying a media type change the response you get?

Designing Representations

Due October 7.

For this assignment, you will continue designing the information service you began developing in assignment 1. Specifically, you will design representations of the resources you identified in the previous assignment. While your resources may lend themselves to any number of different representations, I want you to focus on designing hypermedia that not only represents the data and metadata about the data, but also uses links to represent metadata about your service and the ways it can be interacted with.

It is possible to design hypermedia types using many different data formats, but for the purposes of this assignment you are asked to use HTML. By using HTML as your base format, you will not need to design your own hypermedia controls (i.e. syntax for creating links) since HTML has already defined these for you. So your design effort will focus on expressing the semantics of your information service using the existing elements and attributes of HTML.

Getting set up

You will be using GitHub to manage and submit your work for this assignment. If you’re already a GitHub user, you’ll just need to create a new repository for this assignment. If you’re not, read on.

First, sign up for a free GitHub account.
Next, install and set up the Git version control software on your computer.
Now you’re ready to create a “repository” for your assignment. Follow the instructions at GitHub to create a repository. Note that these instructions assume you are creating a repository named hello-world. Don’t name your repository hello-world. Give your repository a short but meaningful name related to the information service you are designing. So, don’t name it assignment3 either.

At this point, if you’ve followed all the instructions linked above, you should now have a public GitHub repository for your assignment.

Using Git

Git is powerful and complex software. However, the way we’ll be using it is rather simple and should be straightforward.

These instructions assume that you’re using Git from the command line. That means you’ll be using Terminal.app if you’re on a Mac, or Git Shell (which you should have installed as part of the setup process above) on Windows. (If you’re on Linux I assume you already know how to operate a shell.)

Instead of using the command line, you can use a GitHub client like GitHub for Windows or GitHub for Mac. I don’t provide instructions for these, because I’ve never used them. I assume their documentation is better than what I could provide anyway.

To use Git from the command line, read on.

To add files to your GitHub repository:

Create and save a file (for example, menu.html or businesses.html) in the directory (folder) you created as part of the process of creating a repository.
Open your command line (Terminal or Git Shell), and move to your repository directory using the cd command. For example, if your repository was named visitors-bureau, you should be able to get there using the following command:
```
$ cd ~/visitors-bureau
```
Verify that you can see your file using the ls command:
```
$ ls
README           businesses.html
```

The git status command should show you that your new file has not yet been added to your repository:

```
$ git status
# On branch master
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#   businesses.html
nothing added to commit but untracked files present (use "git add" to track)
```

Now use git add to add the new file:
```
$ git add businesses.html
```

Using git status again will show you that the file has been added:

```
$ git status
# On branch master
# Changes to be committed:
#   (use "git rm --cached <file>..." to unstage)
#
#   new file:   businesses.html
#
```

At this point you’ve told Git that you want it to keep track of the new file, but you haven’t actually saved it to the repository yet. Do that with the git commit command, adding an informative note:
```
$ git commit -m 'Added example representation of a list of businesses.' businesses.html
```
Now you’ve saved the file to your local repository, but you haven’t yet “pushed” it to the public repository on GitHub. Do that as follows:
```
$ git push origin master
```
Now you should be able to see your file in your public repository on GitHub. If you make changes to your file and save it, then git status should notify you that the file has changed:
```
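$ git status
# On branch master
# Changes not staged for commit:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#   modified:   businesses.html
#
no changes added to commit (use "git add" and/or "git commit -a")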
```
Using git add and git commit again, you can save these changes to your local repository, and using git push you can push them to your public repository.
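Once you’re comfortable with these steps, the review, add, commit, and push cycle can be wrapped in a small shell function. This is a minimal sketch; the function name publish_file is hypothetical, and it assumes your GitHub remote is named origin and your branch is master, as in the commands above.

```shell
# Hypothetical helper: review, stage, commit, and push changes to one file.
# Usage: publish_file businesses.html 'Added contact links to businesses.'
publish_file() {
  local file="$1" message="$2"
  # Show what changed since the last commit, so you can review it first.
  git diff -- "$file"
  # Stage, commit, and push; stop at the first step that fails.
  git add "$file" &&
    git commit -m "$message" -- "$file" &&
    git push origin master
}
```

Chaining the last three commands with && means that if the commit fails (for example, because nothing changed), nothing is pushed.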

Think about your representations

Think about what kind of data needs to be included in the representations of your resources. Consider both representations included in requests to your service (i.e. in PUT or POST requests) and representations included in responses from your service. Don’t worry about specific media types for now, just think about what data is needed. Another important thing to consider is what status codes your service will possibly return. This means you need to think not only about successful requests for your resources, but unsuccessful ones as well.

For example, in the farmer’s market service we might document the following (note that this is incomplete):

GET to the Farmers Market resource returns either a list of links to individual Farm resources, or the message No farms yet. In either case the status code is 200 OK.

POST to Farmers Market returns the message Created farm {farm-name} with the URI of the new Farm in the Location HTTP header, and a 201 Created status code. If the POSTed data is missing some required information (e.g. the farm’s name), it returns the message Farm's name is required with a 400 Bad Request status code.

PUT to a Farm resource requires a representation including (at least) the farm’s name and URL. If either of these is missing, the response will be the message Farm's {property} is required (repeated once for each missing property) with a 400 Bad Request status code. If the Farm doesn’t exist yet, the response will be the message No such farm exists with a 404 Not Found status code. Otherwise the response will be the message Updated farm {farm-name} with a 200 OK status code.

Note that I didn’t bother including 500 Internal Server Error responses, since we assume that any resource can potentially return these.
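One way to check that a specification like this is complete and unambiguous is to write its rules out as explicit decision logic. Here is a minimal sketch of the PUT rules above as a shell function; put_farm and its arguments are hypothetical, and it assumes the required-property checks happen before the existence check (the description above doesn’t say which comes first).

```shell
# Hypothetical sketch of the PUT rules for a Farm resource.
# Arguments: whether the farm exists (yes/no), the farm's name, its URL.
# Prints the status line, then the message(s) the service would return.
put_farm() {
  local exists="$1" name="$2" url="$3" missing="" property
  if [ -z "$name" ]; then missing="$missing name"; fi
  if [ -z "$url" ]; then missing="$missing URL"; fi
  if [ -n "$missing" ]; then
    echo "400 Bad Request"
    for property in $missing; do   # one message per missing property
      echo "Farm's $property is required"
    done
  elif [ "$exists" != "yes" ]; then
    echo "404 Not Found"
    echo "No such farm exists"
  else
    echo "200 OK"
    echo "Updated farm $name"
  fi
}
```

For instance, put_farm no 'Sunny Acres' '' prints 400 Bad Request followed by Farm's URL is required, because in this sketch the missing-property check runs first.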

Deliverable #1: For each method supported by each resource in your service, specify what kinds of representation (if any) it requires, and what kinds of representations, including status codes, it might return. Be sure to consider possible error conditions. Add this specification to your repository.

Designing your hypermedia type

To design the hypermedia representations of your resources, you’ll need to think about:

How to represent the data provided by your various resources in HTML
Common patterns (“blocks”) that appear in your representations, and how these will be identified
How you will use outbound links to link representations to related resources and indicate possible process flows
How you will use templated query links for search actions
How you will use update links to create and update resources

To show your answers to these questions, you will create a set of HTML files that are example representations of your resources. For each kind of resource you have, you will create one example HTML file. This file should be very simple, with just the minimum HTML markup necessary to present the data. Don’t spend any time worrying about the styling of the file; focus only on the structure. So, for example, think about whether you need a list of things, a table, a paragraph of text, etc.

Now, give your HTML elements class attributes where necessary to describe the specific kind of content they hold. For example, in an HTML list representing a list of Farms, each list item element might be given a class attribute with the value farm.

Once you’re satisfied with your example representations, link them to one another.

Your outbound links (i.e. anchor elements) in each HTML file should have href values that link them to the other HTML files, so that you can open one HTML file in a web browser and click on links to get to the other files. So, for example, in a “real” farmer’s market service each farm in the HTML representation of my “all farms” list would link to the specific URL for that Farm resource. But in these example HTML files, I would just have each element in the list link to farm.html (the example HTML representation of a single Farm resource).

Likewise, a templated query link (i.e. an HTML GET form) should have an action attribute value that refers to the example HTML representation of the search results for that query.

HTML does not support idempotent update links. But for the purposes of this assignment, you can pretend that it does. Just indicate, as the value of the form’s method attribute, the method you intend for the form to use to make the request. Later in the semester we will look at techniques for adding true support for idempotent updates to HTML.

Your anchor elements should have rel attributes that describe why a user agent might want to activate them. Check the registry of link relations and try to find an appropriate one, or make up your own.

You can describe your HTML forms using class attributes, as explained on pages 119–122 of RESTful Web APIs.
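As a sketch of how these pieces fit together, here is one way to create two interlinked example files for the farmer’s market service from the command line. Everything in it is hypothetical: the file name farms.html, the class values, and search-results.html (standing in for the example representation of search results) are assumptions. item and collection are registered link relations, but you should pick relations that fit your own service.

```shell
# Hypothetical sketch: two interlinked example representations for a
# farmer's market service. Class values are illustrative only.
cat > farms.html <<'EOF'
<!DOCTYPE html>
<html>
  <head><title>Farms</title></head>
  <body>
    <ul class="farms">
      <li class="farm"><a rel="item" href="farm.html">Sunny Acres</a></li>
      <li class="farm"><a rel="item" href="farm.html">Hillside Dairy</a></li>
    </ul>
    <!-- Templated query link: a GET form whose action is the example search results -->
    <form class="search-farms" method="GET" action="search-results.html">
      <input type="text" name="q">
      <input type="submit" value="Search farms">
    </form>
  </body>
</html>
EOF

cat > farm.html <<'EOF'
<!DOCTYPE html>
<html>
  <head><title>Sunny Acres</title></head>
  <body>
    <div class="farm">
      <h1 class="farm-name">Sunny Acres</h1>
      <a rel="collection" href="farms.html">All farms</a>
    </div>
    <!-- Idempotent update link: real HTML forms only support GET and POST,
         so method="PUT" is the pretend notation described above -->
    <form class="update-farm" method="PUT" action="farm.html">
      <input type="text" name="name" value="Sunny Acres">
      <input type="url" name="url" value="http://example.org/sunny-acres">
      <input type="submit" value="Update farm">
    </form>
  </body>
</html>
EOF
```

Opening farms.html in a browser and clicking either farm should take you to farm.html, and the All farms link leads back.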

Deliverable #2: An interlinked set of HTML files, one to represent each kind of resource in your service.

Documentation

To complete this assignment, your GitHub repository ought to contain, in addition to deliverables #1 and #2 above:

Deliverable #3: A plain-text README file documenting the class and rel attribute values you used to describe your service and the data it provides. Use pages 114–115 and 121–122 of RESTful Web APIs as an example of how to document your attribute values. (Note that you can use Markdown syntax in your README file for formatting, such as italics and lists. If you do this, save your file as README.md.)

Once you’ve pushed all your files to your public GitHub repository, please submit this assignment by emailing me the URL of your repository.

Final Project

Due December 9.

For your final project, you will take the design work you did for the last two assignments, and turn it into a working Web information service.

Implementing your service

Your service must provide access to at least two kinds of resources (e.g. farmers and products, or dishes and ingredients, or businesses and events) that have some kind of relationship to one another. Clients should be able to access the two kinds of resources directly, and they should also be able to access “collection” resources that list all the resources of a particular kind. It should be possible to create and update at least one of the kinds of resources through your service.

For example, the help desk service provides access to one kind of resource: help requests. The service provides a resource that lists existing help requests, and this resource is filterable. Help requests can be created and updated through the service.

Your service should provide (at least) HTML representations for all resources. These representations must include metadata that describe the application (how to transition from one state to another) and the data being provided.

Describing your application flow

Your HTML representations must include the proper hypermedia controls for linking representations to one another, creating query URIs from templates, and updating resources both idempotently and non-idempotently. Your HTML controls must have appropriate class attribute values and rel attribute values that describe their meaning and purpose (this was the work you did for the Designing Representations assignment).

Providing machine-readable access to your data

You have three options for providing machine-readable access to your data:

Option #1: HTML+Microdata

If you choose to use microdata, you should describe your data and relationships using appropriate types and properties from schema.org. If the types or properties at schema.org are too generic for your data, you may consider extending them. If there are no appropriate types or properties for your data at schema.org, you might consider using RDFa instead.

You can use Google’s Structured Data Testing Tool or the omnipotentdatatranslator to check your microdata.

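A minimal sketch of what microdata markup for a single Farm might look like, written out with a shell here-document. LocalBusiness, name, and url are schema.org terms; whether LocalBusiness is the best type for your data is a judgment call, and the farm details and file name are made up for illustration.

```shell
# Hypothetical sketch: one Farm described with schema.org microdata.
# itemscope/itemtype start an item; each itemprop names one of its properties.
cat > farm-microdata.html <<'EOF'
<div class="farm" itemscope itemtype="http://schema.org/LocalBusiness">
  <h1 itemprop="name">Sunny Acres</h1>
  <!-- On an <a> element, the itemprop value is taken from href -->
  <a itemprop="url" rel="external" href="http://example.org/sunny-acres">Farm website</a>
</div>
EOF
```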
Option #2: HTML+RDFa

If you choose to use RDFa, you should describe your data and relationships using types and properties from some RDF-compatible vocabulary such as the DBpedia Ontology or schema.org. You can search for appropriate vocabularies at Schemapedia.

You can use Google’s Structured Data Testing Tool, or one of the two RDFa Distillers to check your RDFa.

Option #3: JSON-LD

Instead of using microdata or RDFa to describe the data in your HTML representations, you can provide JSON-LD representations. Your types and properties should come from some RDF-compatible vocabulary such as the DBpedia Ontology or schema.org. You can search for appropriate vocabularies at Schemapedia.
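A minimal sketch of a JSON-LD representation of a single Farm, using the same schema.org terms. The @id URL assumes your service runs locally on port 3000 and, like the farm details, is hypothetical.

```shell
# Hypothetical sketch: a JSON-LD representation of one Farm.
# @context maps plain keys like "name" to schema.org properties.
cat > farm.jsonld <<'EOF'
{
  "@context": "http://schema.org",
  "@type": "LocalBusiness",
  "@id": "http://localhost:3000/farms/sunny-acres",
  "name": "Sunny Acres",
  "url": "http://example.org/sunny-acres"
}
EOF
```

JSON-LD representations are normally served with the media type application/ld+json, so a client can ask for one using an Accept header, as in the cURL examples earlier.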

Inter-operability

Finally, your service must take advantage of the machine-readable data provided by one of the other groups’ services to enhance its own data. This means that your representations of one of your kinds of resources must include not only data from your JSON database, but also data from another service.

So, for example:

The restaurant service can use the farmers’ market service to indicate which farms provided ingredients for its dishes
The visitors’ bureau service can use the restaurant service to show current menus for local restaurants
The farmers’ market service can use the visitors’ bureau service to show special events taking place at the farmers’ market

I will provide the code for parsing microdata, RDFa, and/or JSON-LD.

Note that, in order to test interoperability, you will need to download and run locally the service you will be reading data from.

Deliverables

Your final deliverables for the project are:

The URL of a single GitHub repository containing the complete source code for your service.
A Readme.md file in the GitHub repository documenting:
- the attribute values used to describe your application flow, and
- the types and properties used to describe your data.
(The .md suffix indicates a plain text file that uses Markdown syntax. This allows you to produce something more nicely formatted than plain text alone, and GitHub will automatically display it as HTML.)

You may simply email me your two URLs by 8am on Tuesday, December 9th.