Due August 31.
Issues of information organization pervade our daily lives. Once you learn to notice them you will see them everywhere. For this assignment you will:
Find a recent (published since January 2011) news item (article, video, whatever) that illustrates how an issue related to information organization impacts our lives as individuals, as members or employees of organizations, as citizens, and so on.
Using the class blog, write a brief blog entry about your story, summarizing it and highlighting how it relates to issues of information organization. Write at least two paragraphs and no more than four that contextualize the story—that is, why is this story related to the organization of information? Also, see if you can answer the five questions about organizing systems in Glushko’s “Foundations for Organizing Systems.”
Due September 21.
In this assignment, you will:
You must submit your work by uploading your zip archive of assignment files before 12:30PM on Wednesday, September 21, 2011. Late assignments will not be accepted unless you have an exceptionally good excuse. Even though you have nine days for this assignment, you’ll want to start early, as you’ll need to swap vocabularies with a classmate in order to successfully complete all parts of the assignment.
You will need to use a text editor for this assignment. You may choose to use an XML editor as well, but it isn’t really necessary.
You will submit a total of 3 text files (zipped).
The first file should be named report.txt. It should have the following sections:
The second file will be a sample resource described in XML using your vocabulary. Name this file mine.xml.
The third file will be a sample resource described in XML using your classmate’s vocabulary. Name this file swapped.xml.
Your task is to develop a vocabulary for describing some kind of resource, for the purpose of organizing a collection of that resource. Creating a vocabulary involves scoping the problem, making decisions about key trade-offs, choosing meaningful names and descriptions, and being aware of any biases you might be bringing to the vocabulary. This assignment will take you through that process.
The first thing you need to decide is: what kind of resource are you organizing? Remember, a resource can be virtually anything. It could be something tangible, like action figures. It could be something intangible, like restaurant dining experiences. It’s totally up to you. But choose carefully! Most of the rest of your assignments for the semester will involve different approaches to organizing the kind of resource you choose for this assignment. So choose something you’re interested in, and won’t get bored of. Ideally, it will be something you could imagine yourself organizing professionally in the future. But don’t feel that you have to choose something down-to-earth; it’s OK to be creative.
Next, you must settle on a scope for the type of experience you intend to model. Suppose you’re creating a catalog recording people’s descriptions of their experiences dining out at restaurants. Are you looking at five-star restaurants or fast food? Or do you want to try to model any dining experience? At this point, you need to be thinking about why you would be organizing this resource. In the “real world” there would be someone paying you to organize it, and you could talk to them to help you answer the five questions that would help you design the organizing system. But for the purposes of this assignment, just make up your own answers to help you define the scope of your system.
In your report.txt document, under the section title “Scoping the Vocabulary,” write one or two sentences that define the scope of the vocabulary you intend to design. For example, if you were developing a vocabulary for describing course syllabi, you might define its scope by saying, “The Syllabus Vocabulary is a vocabulary for describing a syllabus for a university course. It will describe the topics, readings, assignments, and exams or other important events and the dates associated with each.”
Be sure to carefully consider not only what falls within your scope but also what is outside of it. The Syllabus Vocabulary might want to include some notion of date or time—but making the vocabulary fully compatible with Google Calendar or another calendar format would fall outside the scope.
Identify and define the terms (XML tags and attributes) your vocabulary requires to describe resources within your scope. For example, some possible tags in the Syllabus Vocabulary might be Topic, Reading, Date, Instructor, etc.
For each tag, write a short definition. These definitions are your instructions for people using your vocabulary to describe an instance (as your classmate will do later in this assignment), so strive for precision. Be sure to specify if a tag is meant to be nested within another tag. Also, if you decide to use XML attributes, be sure to specify which tags have attributes. Finally, you may need to specify what kind of content a tag or attribute can take: a number? a date? anything?
All of the above should go in the “Defining the Terms” section of your report.txt document.
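One way to test whether your definitions are precise is to write the nesting rules down as data and check a description against them. The sketch below does this in Python for a few Syllabus Vocabulary tags. The Syllabus root tag, the nesting rules, and the sample content are all invented for illustration; your own vocabulary will differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical nesting rules for a Syllabus Vocabulary.
# Each tag maps to the set of parent tags it may appear inside
# (None means the tag may be the root element).
ALLOWED_PARENTS = {
    "Syllabus": {None},
    "Instructor": {"Syllabus"},
    "Topic": {"Syllabus"},
    "Reading": {"Topic"},
    "Date": {"Topic"},
}


def undefined_or_misplaced(xml_text):
    """Return (tag, parent) pairs that break the rules above."""
    root = ET.fromstring(xml_text)
    problems = []

    def visit(element, parent_tag):
        allowed = ALLOWED_PARENTS.get(element.tag)
        # An unknown tag has no entry; a misplaced tag has the wrong parent.
        if allowed is None or parent_tag not in allowed:
            problems.append((element.tag, parent_tag))
        for child in element:
            visit(child, element.tag)

    visit(root, None)
    return problems


SAMPLE = """
<Syllabus>
  <Instructor>Example Instructor</Instructor>
  <Topic>
    <Date>2011-09-21</Date>
    <Reading>Example reading</Reading>
  </Topic>
</Syllabus>
"""

print(undefined_or_misplaced(SAMPLE))
```

An empty list means every tag in the sample is defined and sits where the rules say it may. A check like this covers only tag names and nesting, not whether the content of each tag matches your written definition.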
Remember the tradeoffs we’ve discussed so far in the semester. They’ll help you think through some key issues, such as:
You might have developed your vocabulary by thinking of a specific resource. This step will make that connection explicit, testing your vocabulary by having you create a description of a particular resource.
Create an XML description of your resource, using your vocabulary, in the file mine.xml. Make sure the resource falls within the scope you defined in step 1, and that you followed all the definitions you created in step 2.
As you do this step, you may find you want to change some of your tags, make things less (or more) granular, remove (or add) levels of hierarchy, or alter your vocabulary in other ways. That’s OK! You should revisit steps 1-3 and revise your scope, tags, definitions, and description as needed. Make some notes on any changes you make – you’ll want them for your reflection later.
When you’re finished, make sure your XML is well-formed (i.e., is actually XML) by uploading your file to the W3C Markup Validation Service. You should get the message “This document was successfully checked as well-formed XML!” If you get the message “Errors found while checking this document as XML!” look at the list of errors and try to determine what is wrong. (Don’t worry about the warnings.)
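If you would like a quick local check before uploading, any XML parser will tell you whether a document is well-formed. Here is a minimal sketch using Python's standard library. The function name and the sample markup are made up for illustration, and this does not replace the W3C check described above.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(xml_text):
    """Return None if xml_text is well-formed XML, else the parser's message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as error:
        return str(error)
    return None


good = "<Syllabus><Topic>Classification</Topic></Syllabus>"
bad = "<Syllabus><Topic>Classification</Syllabus>"  # <Topic> is never closed

print(well_formedness_error(good))
print(well_formedness_error(bad))
```

The parser's message for the second string includes a line and column number, which is usually enough to find the unclosed or mismatched tag.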
Now comes the fun part: You’ll be swapping vocabularies with a classmate.
For this part of the assignment, you’ll be sharing parts 1 and 2 of your report.txt. DO NOT share the description you created using your vocabulary, just the scope statement and the definition of terms.
Once you have your classmate’s vocabulary, attempt to create an XML description using only the scope and definitions they provide. Call your instance swapped.xml. Again, make sure your instance is well-formed. Send this file back to your classmate.
Write a few paragraphs reflecting on your vocabulary modeling experience in your report.txt document under the section “Reflection.” This doesn’t need to be longer than a few hundred words, but it does need to include, at minimum, the following elements:
Your thoughts on the challenges of creating your own vocabulary, what it was like to scope and create your terms, and what changes—if any—you needed to make after you tried to encode your own instance.
What it was like to try to create a description using your classmate’s vocabulary. Did any terms confuse you? Was anything particularly clear? What did looking at someone else’s terms and definitions teach you about creating a vocabulary?
What was it like to see the description your classmate created using your vocabulary? Did it match the expectations you had for your vocabulary? And what does that tell you about the success of your scope, terms, and definitions?
Assignments are meant to challenge you intellectually in some way, not to see how much “busy work” you can put up with. So I’ll never intentionally ask you to do something that takes time but that doesn’t give you more insights. Put another way, if you find yourself doing busy work in an assignment, re-read the instructions to see if they advise against doing what you’re doing. If that doesn’t end the busy work, ask me if you’re doing what is expected.
In this assignment there are ample opportunities to do “busy work,” so be careful not to do it. For example, a “Syllabus Vocabulary” would probably have some notion of “Topic,” and a description of a syllabus might have many of them. In part 3 of this assignment your task is to demonstrate how your vocabulary works by creating a description. If you were creating a syllabus description it would be sufficient to include just two or three topics, not 29.
Due October 5.
In this assignment you will:
You must submit your work by uploading your facet map file to the FacetMap website (see below) and uploading your zip archive of assignment files to the course website before 12:30PM on Wednesday, October 5th, 2011. Late assignments will not be accepted unless you have an exceptionally good excuse.
You will submit a zip archive containing a copy of your facet map file and a report text file, as detailed in the assignment instructions below. You will also need to make sure that your facet map file has successfully uploaded to the FacetMap website, and that it is browsable online.
In the last assignment you created a vocabulary for describing some resource, and you and your partner created two instances (XML files) that used your vocabulary. For this assignment you will need to create five more instances. Note: you are free to make minor revisions to your vocabulary from what you turned in for the first assignment, but do not radically change the kind of resource you are describing.
Your partner from the last assignment will also create an additional five instances (using your vocabulary), but will not give them to you yet. If you’ve made changes to your vocabulary, be sure to communicate them to your partner.
So, at this point, you should have a total of seven instances (the original two plus the new five you created), each describing a different resource using your vocabulary.
Don’t spend too much time creating the instances: the point is just to give you a reasonable set of resources to classify, not to richly describe each resource. But make sure that your resources are sufficiently different from one another, or else they will be more difficult to divide into classes. For example, if your resources are restaurants, don’t choose to make all your instances descriptions of barbecue joints (unless “barbecue joints” was the scope of your vocabulary).
Get acquainted with the FacetMap website at http://facetmap.com. In particular, look at the wine demo to see how the three facets of Varietals, Region, and Price combine to organize hundreds of wine instances. The demo will start up in the “Commentary Track” that explains what is going on as you specify facet values to select wines in the collection. Notice that there are three different user interface styles for the display of the facets and the items that are selected by different facet values. Think about the pros and cons of each style of display.
Once you have familiarized yourself with FacetMap, it’s time to start working on your classification. Review pages 12–16 in the book chapter on classification, noting Ranganathan’s dimensions, the set of general criteria for facet design, and the principles guiding facet ordering. You will use these to design your own faceted classification to organize your set of seven resources.
You can have as many facets as you want, but you must have at least three. You will need to be creative but make sure that anybody else could understand them without your help. They can be abstract or practical or a mix of both, so long as they classify the resources in a way that other people could understand. Also make sure that your facets are flexible enough to handle additions to the list of instances.
A good set of facets should be able to accommodate new instances without adjustment. Ask your partner to give you the five new instances they created. Revise your facets if necessary so that your system can classify all 12 (7 original plus 5 new) instances. Note: there have been reports of the UNC mail servers eating XML attachments, so zip your XML files before mailing them to one another.
Once you have a system that handles all your instances, encode it and all your instances using the Facetmap markup XML format (not the XFML format). Carefully read the documentation in the DTD and look at the example file.
You may run into quirks, bugs and limitations while using FacetMap. Have patience, and make note of any compromises or design changes you had to make because of limitations of FacetMap or facets in general. Send email to the class list if you are having difficulties that other students might encounter.
FacetMap enforces a strict “occurrence exclusivity” principle in assigning values in a facet. This means that if (for example) a library facet map had a “subject” facet that included both a “War” and a “Peace” heading, your copy of Tolstoy’s renowned work about both war and peace could not be listed under both. Keep this in mind when designing your facets and facet values. This principle is what distinguishes faceted classification from “tagging” systems where any tag can be assigned to any information resource or object regardless of other tags already associated with it.
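To see what this exclusivity means in practice, it can help to model a facet map as plain data. The sketch below is an illustration in Python, not FacetMap's actual format. The facet names come from the wine demo, but the values and instances are invented. Because each instance stores exactly one value per facet, it cannot be listed under two headings of the same facet.

```python
# Hypothetical facets and values for a small wine collection.
FACETS = {
    "Varietal": {"Merlot", "Riesling"},
    "Region": {"California", "Germany"},
    "Price": {"Under $20", "$20 and up"},
}

# Each instance maps every facet to exactly one value.
INSTANCES = {
    "wine01": {"Varietal": "Merlot", "Region": "California", "Price": "Under $20"},
    "wine02": {"Varietal": "Riesling", "Region": "Germany", "Price": "Under $20"},
    "wine03": {"Varietal": "Merlot", "Region": "California", "Price": "$20 and up"},
}


def check_classification(instances, facets):
    """Return problems: missing facets or values not defined for a facet."""
    problems = []
    for name, assignment in instances.items():
        for facet, values in facets.items():
            if facet not in assignment:
                problems.append(f"{name}: no value for {facet}")
            elif assignment[facet] not in values:
                problems.append(f"{name}: unknown {facet} value {assignment[facet]!r}")
    return problems


def select(instances, **chosen):
    """Return sorted names of instances matching every chosen facet value."""
    return sorted(
        name
        for name, assignment in instances.items()
        if all(assignment.get(facet) == value for facet, value in chosen.items())
    )


print(check_classification(INSTANCES, FACETS))
print(select(INSTANCES, Varietal="Merlot", Price="Under $20"))
```

Adding your partner's five new instances to a structure like this and re-running the check is a quick way to find out whether your facets need revising.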
Upload your facet map file to facetmap.com. Name your facet map sils2011_firstname_lastname. (For example, my facet map would be named sils2011_ryan_shaw.)
After you’ve uploaded, verify that your facet map is browsable at http://facetmap.com/browse/sils2011_firstname_lastname (again, substituting your actual names). Note that subsequent uploads will overwrite previous ones with the same name, so you can fix any problems you might notice after uploading.
Your uploaded facet map will be available for 7 days, but after that it may expire, so be sure to keep a copy of what you submitted. (And make sure you don’t upload your facet map file too early, as I won’t be looking at them until after the assignment is due.)
In addition to uploading your facet map file to the FacetMap website, you’ll need to upload it to the course website. Upload a zip archive including your facet map file (name it facetmap.xml), your 12 instances (name them instance01.xml … instance12.xml), and a short report (name it report.txt). The report should have two sections, and each should be a paragraph or two:
The first section should describe your use of the FacetMap program, especially any compromises you made in your design because of perceived limitations.
The second section should answer the following questions:
Due October 19.
In this assignment you will:
You must submit your work by uploading your zip archive of assignment files to the course website before 12:30PM on Wednesday, October 19th, 2011. Late assignments will not be accepted unless you have an exceptionally good excuse.
You will submit a zip archive containing two text files, one named reflection.txt and one named urls.txt. reflection.txt should contain your reflection on the assignment (see part 5). urls.txt should contain the published URLs for your Google Docs spreadsheet and drawing (see parts 1-4). To get your published URL for each document, select Publish as a web page from the Share menu in the upper right-hand corner of the Google Docs interface. For your spreadsheet, make sure you choose to publish All sheets. Copy and paste your two published URLs into the urls.txt file.
In this assignment, you’ll be returning to the set of 12 instances you and your partner created over the last two assignments. This time, you’ll be developing a hierarchical classification scheme—a taxonomy—rather than a faceted one.
The goal of this assignment is to give you more practice thinking about categories and category membership, abstraction, classification, and taxonomy. You’ll also learn a technique for naming and describing a system of categories so that you can clearly convey their meaning to others.
Round up your 12 instances from the last assignment. Create a Google Docs spreadsheet by making a copy of this template. (You can do this by selecting Make a copy ... under the File menu.) In the first sheet (“Instances”) of your spreadsheet, create a list of your instances. (Hint: You’ll need some way of identifying your instances in order to make this list. If you already have identifiers, use these. If not, come up with unique names for your instances so that you can list them.)
In this part of the assignment, you’re going to start generalizing away from the specific instances. For each instance, identify a class to which that instance belongs. For example, if your instances were musical instruments, and the specific instance you were trying to classify were a drum set, you might pick something like “rhythm instrument” as the class. Or you might choose something more granular or more abstract than that. Remember, as always, you’re making a choice about the level of abstraction you use.
One thing you don’t want to do here is make your classes so specific that they can’t describe anything but the specific instance you’re considering. (Continuing from the example above: if your instances were musical instruments, and you had only one drum set, a class called “drum sets” probably wouldn’t be too useful. On the other hand, if your instances were all rhythm instruments, including several different kinds of drum set, then a class called “drum sets” might be appropriate.) You should be able to think of some common features held by all members of your class, as well as some other instances that would fit into the class.
As you’re making your first pass through the instances, do not stress out too much about naming these classes. You’re likely to go back to them and revise them as you progress through the assignment. If it’s starting to make you feel crazy, my advice is to come up with something temporary and move on; new ideas might pop up once you’ve started to arrange your hierarchy.
Now that you’ve taken a crack at identifying classes for each of your instances, begin arranging them into a hierarchy. The top or “root” element of your hierarchy should come directly from the scope statement you wrote for your vocabulary: i.e., this is the general class covering all of your instances, something like “Photographs” or “Patient Histories.” The bottom level of your hierarchy will be your instances. When you created your classes in part 1, you added a second level to the hierarchy, more abstract than your instances, but less abstract than the “root” class including any instance describable using your vocabulary. What you’re doing now is adding one more level of abstraction, a new level between the classes you identified in part 1 and your root class. These are hypernyms or “super-classes.”
Think of this as a sorting task. (Sometimes it’s even helpful to write your class names down on pieces of paper or sticky notes and physically sort them.) As you sort, you may discover that some of your original classes are too narrow. You may also realize that they’re too broad and don’t leave you enough room to insert another level before getting to your root class. That’s OK! Revise your classes as many times as you need to and record them in your spreadsheet.
At this phase of the assignment, it’s important that you strive for a consistent level of abstraction among your “super-classes.” Again, try to think of some common characteristics that would be shared by all members of that super-class. For example, if we had “musical instruments” as our root element and our next level down included both “clarinets” and “stringed instruments,” that might be a sign that the classification wasn’t maintaining a consistent level of abstraction, because “stringed instruments” is more abstract than “clarinets.” A more consistent taxonomy would have “wind instruments” and “stringed instruments” on the same level.
When you’re satisfied with your assignment of classes to super-classes, record them in the second sheet (“Classes”) of your spreadsheet.
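A rough way to spot an uneven hierarchy is to record each class's parent and compare how far each bottom-level class sits from the root. The Python sketch below uses invented class names based on the musical instrument example. Depth is only a proxy for level of abstraction, so treat a mismatch as a prompt to look again rather than as proof of an error.

```python
# Hypothetical taxonomy: each class points to its parent; the root has None.
PARENT = {
    "musical instruments": None,
    "wind instruments": "musical instruments",
    "stringed instruments": "musical instruments",
    "clarinets": "wind instruments",
    "flutes": "wind instruments",
    "violins": "stringed instruments",
}


def depth(name, parent=PARENT):
    """Number of steps from a class up to the root."""
    steps = 0
    while parent[name] is not None:
        name = parent[name]
        steps += 1
    return steps


def leaf_depths(parent=PARENT):
    """Map each class that has no sub-classes to its depth."""
    has_children = set(parent.values())
    return {name: depth(name, parent) for name in parent if name not in has_children}


print(leaf_depths())
```

If “clarinets” were attached directly to “musical instruments,” it would sit one level above “flutes” and “violins,” which is the kind of inconsistency described above.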
Create a Google Docs drawing showing the structure of your classification hierarchy. This does not have to be fancy. Start with your root class at the top of your diagram, then your “super-classes,” then your classes. You don’t need to include your instances.
Now, you’re going to write definitions for your classes such that an ordinary person would be able to categorize new instances. You’ll be following this formula for definitions:
Hyponym = { adjective } hypernym { distinguishing clause }
For example, suppose you’re classifying instruments. “Instruments” is your root element.
Your first instance was a bass clarinet, and you assigned it to a class called “clarinets.” Then, as you created your taxonomy, you sorted clarinets, saxophones, and flutes together into a “super-class” called “woodwinds.”
Your definitions might then look something like this:
clarinets = { reeded } woodwinds { that are approximately cylindrical in shape and have numerous keys }
woodwinds = { non-brass } instruments { that produce sound when air is blown into them }
Remember that your definitions should reflect things that are true for all members of a class. A good sanity check at this stage is to make sure you can think of a hypothetical second instance for each class.
Record each definition in your spreadsheet. Put your class definitions in the second sheet (“Classes”) and your super-class definitions in the third sheet (“Super-classes”).
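The definition formula from this part can also be treated as a small data structure, which makes it easy to check that every hypernym you use is itself defined. The following Python sketch reuses the clarinet and woodwind examples. The Definition class and its field names are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Definition:
    hyponym: str
    adjective: str
    hypernym: str
    distinguishing_clause: str

    def render(self):
        """Format the definition following the formula in the assignment."""
        return (
            f"{self.hyponym} = {{ {self.adjective} }} {self.hypernym} "
            f"{{ {self.distinguishing_clause} }}"
        )


definitions = [
    Definition("clarinets", "reeded", "woodwinds",
               "that are approximately cylindrical in shape and have numerous keys"),
    Definition("woodwinds", "non-brass", "instruments",
               "that produce sound when air is blown into them"),
]

defined = {d.hyponym for d in definitions}
root = "instruments"

for d in definitions:
    print(d.render())

# Every hypernym should itself be defined, unless it is the root class.
dangling = [d.hypernym for d in definitions if d.hypernym not in (defined | {root})]
print(dangling)
```

An empty list at the end means each definition points to a class that exists in your hierarchy.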
In a text file called reflection.txt, write a paragraph or two about the approaches you used to identify classes and organize them into “super-classes.” Be sure to include your name at the top of the file.
Some questions to guide your reflection:
Due November 2.
In this assignment, you will:
You must submit your work by uploading your assignment before 12:30PM on Wednesday, November 2. Late assignments will not be accepted unless you have an exceptionally good excuse.
You will submit a (zipped) text file called report.txt. The file will include short answers (50-100 words) to the 8 reflections of this assignment.
You will use some text tools to analyze a collection of documents and you will be asked to answer some questions. Please include in your submission short answers (50-100 words) to every question marked as “Reflection” below.
Voyeur is a web-based text analysis tool designed to work on text collections.
Note: there are two versions of Voyeur, which can be found at the following URLs:
They are mostly the same, but annoyingly each has different bugs that affect this assignment. Please pay close attention to the instructions below regarding which version to use for which part of the assignment.
In the Add Texts section you can put the links to the documents you want to analyze. Paste the following links (one per row). You will probably recognize them.
Click on reveal.
Summary tab with a list of statistics. At the bottom of this list, look at Distinctive words (compared to the rest of the corpus). Take note of the five most distinctive words for each document. Why do you think they are distinctive? What do they tell you about the document? Which of them could be used as the “big concepts” of every chapter? Which of them look more accidental? What is the benefit of seeing the term frequency compared to other documents in the collection? (Reflection 1)
reveal.
Distinctive words using this version of the tool (now in the lower left corner). Ignore these; that feature appears to be broken in this version of the tool.
Words in the Entire Corpus bar in the lower left. You should see a list of all the words found in the texts, in descending order by frequency. Using the checkboxes, select the words classification and descriptions. (You will have to scroll down to find them in the list.) A new panel should appear on the far right, showing a graph. What does this graph tell you? (Reflection 2)
people and information. The graph will change. What does this graph tell you? (Reflection 3)
In this part of the assignment you will compare tagging and summarization by humans to tags and summaries automatically extracted by computational algorithms.
Upload button. When you are prompted to Select Source, select Link.
Paste in the following link, and click Upload.
Wait a bit, until you see a blue 1 new item message in the sidebar on the left.
Links in the sidebar. You should see an entry for “Folksonomies: Tidying up Tags?” Click on it.
Focus:)? (Reflection 6)
Related Concepts. If you click the index button in the side bar, you will see more of these “tags,” arranged in the form of a traditional back-of-the-book index. (Note that if you’ve added additional sources to Topicmarks, this index will contain entries for all of them.) On the Pinboard page, you can see a cloud of user-assigned tags on the right. Compare the tags and tag clouds generated by the two services. Do you see overlap between the extracted Topicmarks tags and the tags assigned by Pinboard users? What types of tags can be assigned automatically and what types cannot? (Reflection 8)
Have fun with the assignment and make sure you answer the 8 reflections in your assignment submission. I hope you’ll find these tools useful to play around with further. (I’m especially fond of Pinboard.)
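If you want to go beyond eyeballing the two tag lists for Reflection 8, set operations give a quick picture of the overlap. The Python sketch below assumes you have copied the tags out by hand. The example tags are invented and are not real output from either service.

```python
def normalize(tag):
    """Lowercase a tag and trim whitespace so trivial variants compare equal."""
    return tag.strip().lower()


def compare_tags(extracted, user_assigned):
    """Return (shared, only_extracted, only_user) as sorted lists."""
    a = {normalize(t) for t in extracted}
    b = {normalize(t) for t in user_assigned}
    return sorted(a & b), sorted(a - b), sorted(b - a)


# Invented example tags, not real output from either service.
extracted = ["Folksonomies", "tags", "controlled vocabulary"]
user_assigned = ["folksonomies", "toread", "tags"]

shared, only_extracted, only_user = compare_tags(extracted, user_assigned)
print(shared)
print(only_extracted)
print(only_user)
```

The tags that appear in only one list are usually the interesting ones for your reflection, since they hint at what each approach can and cannot capture.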
Due December 9.
See your branch materials for details about deliverables. Your deliverables must be uploaded by 7PM on Friday, December 9th.