Organization of Information

UNC SILS, INLS 520, Fall 2011

Organization of Information in the News

Due August 31.

Issues of information organization pervade our daily lives. Once you learn to notice them you will see them everywhere. For this assignment you will:

  1. Find a recent (published since January 2011) news item (article, video, whatever) that illustrates how an issue related to information organization impacts our lives as individuals, as members or employees of organizations, as citizens, and so on.

  2. Using the class blog, write a brief blog entry about your story, summarizing it and highlighting how it relates to issues of information organization. Write at least two paragraphs and no more than four that contextualize the story—that is, why is this story related to the organization of information? Also, see if you can answer the five questions about organizing systems in Glushko’s “Foundations for Organizing Systems.”

Creating a Vocabulary & Descriptions

Due September 21.

Assignment Overview

In this assignment, you will:

  1. Create an XML vocabulary for describing some kind of resource.
  2. Describe a specific instance of that resource using your vocabulary.
  3. Swap vocabularies with a classmate.
  4. Describe a resource using your classmate’s vocabulary.
  5. Reflect on your experience creating and using vocabularies.

Deadline

You must submit your work by uploading your zip archive of assignment files before 12:30PM on Wednesday, September 21, 2011. Late assignments will not be accepted unless you have an exceptionally good excuse. Even though you have nine days for this assignment, you’ll want to start early, as you’ll need to swap vocabularies with a classmate in order to successfully complete all parts of the assignment.

Submission Requirements

You will need to use a text editor for this assignment. You may choose to use an XML editor as well, but it isn’t really necessary.

You will submit a total of 3 text files (zipped).

The first file should be named report.txt. It should have the following sections:

  1. Scoping the Vocabulary
  2. Defining the Terms
  3. Reflection

The second file will be a sample resource described in XML using your vocabulary. Name this file mine.xml.

The third file will be a sample resource described in XML using your classmate’s vocabulary. Name this file swapped.xml.

Detailed Instructions

Your task is to develop a vocabulary for describing some kind of resource, for the purpose of organizing a collection of that resource. Creating a vocabulary involves scoping the problem, making decisions about key trade-offs, choosing meaningful names and descriptions, and being aware of any biases you might be bringing to the vocabulary. This assignment will take you through that process.

Part 1: Scoping the Vocabulary

The first thing you need to decide is: what kind of resource are you organizing? Remember, a resource can be virtually anything. It could be something tangible, like action figures. It could be something intangible, like restaurant dining experiences. It’s totally up to you. But choose carefully! Most of the rest of your assignments for the semester will involve different approaches to organizing the kind of resource you choose for this assignment. So choose something you’re interested in, and won’t get bored of. Ideally, it will be something you could imagine yourself organizing professionally in the future. But don’t feel that you have to choose something down-to-earth; it’s OK to be creative.

Next, you must settle on a scope for the type of resource you intend to model. Suppose you’re creating a catalog recording people’s descriptions of their experiences dining out at restaurants. Are you looking at five-star restaurants or fast food? Or do you want to try to model any dining experience? At this point, you need to be thinking about why you would be organizing this resource. In the “real world” there would be someone paying you to organize it, and you could talk to them to help you answer the five questions that would help you design the organizing system. But for the purposes of this assignment, just make up your own answers to help you define the scope of your system.

In your report.txt document, under the section title “Scoping the Vocabulary,” write one or two sentences that define the scope of the vocabulary you intend to design. For example, if you were developing a vocabulary for describing course syllabi, you might define its scope by saying, “The Syllabus Vocabulary is a vocabulary for describing a syllabus for a university course. It will describe the topics, readings, assignments, and exams or other important events and the dates associated with each.”

Be sure to carefully consider not only what falls within your scope but also what is outside of it. The Syllabus Vocabulary might want to include some notion of date or time—but making the vocabulary fully compatible with Google Calendar or another calendar format would fall outside the scope.

Part 2: Defining the Terms

Identify and define the terms—XML tags and attributes—your vocabulary requires to describe resources within your scope. For example, some possible tags in the Syllabus Vocabulary might be Topic, Reading, Date, Instructor, etc.

For each tag, write a short definition. These definitions are your instructions for people using your vocabulary to describe an instance (as your classmate will do later in this assignment), so strive for precision. Be sure to specify if a tag is meant to be nested within another tag. Also, if you decide to use XML attributes, be sure to specify which tags have attributes. Finally, you may need to specify what kind of content a tag or attribute can take: a number? a date? anything?

All of the above should go in the “Defining the Terms” section of your report.txt document.
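
One way to keep your definitions precise is to record the same details for every term: what it means, where it nests, what attributes it has, and what kind of content it takes. Here is a minimal sketch of that idea in Python. The specific definitions, the nesting, and the `number` attribute are invented for illustration; only the tag names come from the Syllabus Vocabulary example above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Term:
    """One tag in a vocabulary, with the details a describer needs."""
    name: str
    definition: str
    parent: Optional[str] = None   # tag this one nests inside, if any
    content: str = "text"          # kind of content: text, date, number, or tags
    attributes: List[str] = field(default_factory=list)


# Hypothetical terms for the Syllabus Vocabulary example.
TERMS = [
    Term("Syllabus", "The root tag; contains everything else.", content="tags"),
    Term("Instructor", "Name of the person teaching the course.", parent="Syllabus"),
    Term("Topic", "One subject covered in a class meeting.", parent="Syllabus",
         content="tags", attributes=["number"]),
    Term("Date", "Day the topic is covered, as YYYY-MM-DD.", parent="Topic",
         content="date"),
    Term("Reading", "A text assigned for the topic.", parent="Topic"),
]


def format_term(term: Term) -> str:
    """Render one term as a line you could paste into report.txt."""
    where = f"inside <{term.parent}>" if term.parent else "top level"
    attrs = ", ".join(term.attributes) if term.attributes else "none"
    return (f"{term.name} ({where}; content: {term.content}; "
            f"attributes: {attrs}): {term.definition}")


for term in TERMS:
    print(format_term(term))
```

You don’t need to write any code for this assignment; the point is only that every definition answers the same questions.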

Remember the tradeoffs we’ve discussed so far in the semester. They’ll help you think through some key issues, such as:

  • How many “levels” of tags do you want or need? Everything can be at the same hierarchical level, or you may have some tags that are “containers” for others. There’s no right answer; just think about the implications.
  • How many tags do you need to cover everything in your scope?
  • What are the benefits/drawbacks of a simpler vocabulary? A more complex one?
  • Are you being consistent with the levels of abstraction and granularity of your terms?
  • There’s no required upper or lower boundary on the number of tags that will be in your vocabulary. That said, in the past, we’ve found that a useful vocabulary can be developed with somewhere between 10 and 20 terms.

Part 3: Using Your Vocabulary

You might have developed your vocabulary by thinking of a specific resource. This step will make that connection explicit, testing your vocabulary by having you create a description of a particular resource.

Create an XML description of your resource, using your vocabulary, in the file mine.xml. Make sure the resource falls within the scope you defined in step 1, and that you followed all the definitions you created in step 2.
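
You can write mine.xml by hand in a text editor, and that is all this assignment expects. If you are curious what a description looks like when built programmatically, here is a small sketch using Python’s standard xml.etree.ElementTree module. The tag names follow the Syllabus Vocabulary example; the nesting, the `number` attribute, and all the values are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Build a tiny description with a hypothetical Syllabus Vocabulary.
syllabus = ET.Element("Syllabus")
ET.SubElement(syllabus, "Instructor").text = "Jane Doe"

# A container tag with an attribute and two nested tags.
topic = ET.SubElement(syllabus, "Topic", {"number": "1"})
ET.SubElement(topic, "Date").text = "2011-08-24"
ET.SubElement(topic, "Reading").text = "Foundations for Organizing Systems"

ET.indent(syllabus)  # pretty-print the tree; requires Python 3.9 or later
xml_text = ET.tostring(syllabus, encoding="unicode")
print(xml_text)
```

To save the result instead of printing it, `ET.ElementTree(syllabus).write("mine.xml", encoding="utf-8", xml_declaration=True)` would write the file.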

As you do this step, you may find you want to change some of your tags, make things less (or more) granular, remove (or add) levels of hierarchy, or alter your vocabulary in other ways. That’s OK! You should revisit steps 1-3 and revise your scope, tags, definitions, and description as needed. Make some notes on any changes you make – you’ll want them for your reflection later.

When you’re finished, make sure your XML is well-formed (i.e., is actually XML) by uploading your file to the W3C Markup Validation Service. You should get the message “This document was successfully checked as well-formed XML!” If you get the message “Errors found while checking this document as XML!” look at the list of errors and try to determine what is wrong. (Don’t worry about the warnings.)
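
If you want a quick local check before using the validation service, any XML parser will refuse a document that is not well-formed. The sketch below uses Python’s standard library; the two sample strings are invented, and the function name `is_well_formed` is just a label chosen for this example. It is a convenience, not a substitute for the check described above.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError as err:
        # The parser reports the line and column where it gave up.
        print(f"Not well-formed: {err}")
        return False


good = "<Syllabus><Topic>Facets</Topic></Syllabus>"
bad = "<Syllabus><Topic>Facets</Syllabus>"  # <Topic> is never closed

print(is_well_formed(good))
print(is_well_formed(bad))
```

To check a file rather than a string, `ET.parse("mine.xml")` raises the same `ParseError` on a document that is not well-formed.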

Part 4: Swapping Vocabularies

Now comes the fun part: You’ll be swapping vocabularies with a classmate.

For this part of the assignment, you’ll be sharing parts 1 and 2 of your report.txt. DO NOT share the description you created using your vocabulary—just the scope statement and the definition of terms.

Once you have your classmate’s vocabulary, attempt to create an XML description using only the scope and definitions they provide. Call your instance swapped.xml. Again, make sure your instance is well-formed. Send this file back to your classmate.

Part 5: Reflecting on the vocabulary modeling experience

Write a few paragraphs reflecting on your vocabulary modeling experience in your report.txt document under the section “Reflection.” This doesn’t need to be longer than a few hundred words, but it does need to include, at minimum, the following elements:

  1. Your thoughts on the challenges of creating your own vocabulary, what it was like to scope and create your terms, and what changes—if any—you needed to make after you tried to encode your own instance.

  2. What it was like to try to create a description using your classmate’s vocabulary. Did any terms confuse you? Was anything particularly clear? What did looking at someone else’s terms and definitions teach you about creating a vocabulary?

  3. What was it like to see the description your classmate created using your vocabulary? Did it match the expectations you had for your vocabulary? And what does that tell you about the success of your scope, terms, and definitions?

Note: The “No Busy Work” Principle

Assignments are meant to challenge you intellectually in some way, not to see how much “busy work” you can put up with. So I’ll never intentionally ask you to do something that takes time but that doesn’t give you more insights. Put another way, if you find yourself doing busy work in an assignment, re-read the instructions to see if they advise against doing what you’re doing. If that doesn’t end the busy work, ask me if you’re doing what is expected.

In this assignment there are ample opportunities to do “busy work,” so be careful not to do it. For example, a “Syllabus Vocabulary” would probably have some notion of “Topic,” and a description of a syllabus might have many of them. In part 3 of this assignment your task is to demonstrate how your vocabulary works by creating a description. If you were creating a syllabus description it would be sufficient to include just two or three topics, not 29.

Submit this assignment.

Classifying

Due October 5.

Assignment Overview

In this assignment you will:

  1. Familiarize yourself with an online tool for facet creation.
  2. Design a faceted classification for a set of resources.
  3. Adjust your classification given additional instances.
  4. Build and upload your faceted classification through the online tool.
  5. Reflect on your experience designing and iterating your classification.

Deadline

You must submit your work by uploading your facet map file to the FacetMap website (see below) and uploading your zip archive of assignment files to the course website before 12:30PM on Wednesday, October 5th, 2011. Late assignments will not be accepted unless you have an exceptionally good excuse.

Submission Requirements

You will submit a zip archive containing a copy of your facet map file and a report text file, as detailed in the assignment instructions below. You will also need to make sure that your facet map file has successfully uploaded to the FacetMap website, and that it is browsable online.

Part 1. Create More Instances

In the last assignment you created a vocabulary for describing some resource, and you and your partner created two instances (XML files) that used your vocabulary. For this assignment you will need to create five more instances. Note: you are free to make minor revisions to your vocabulary from what you turned in for the first assignment, but do not radically change the kind of resource you are describing.

Your partner from the last assignment will also create an additional five instances (using your vocabulary), but will not give them to you yet. If you’ve made changes to your vocabulary, be sure to communicate them to your partner.

So, at this point, you should have a total of seven instances (the original two plus the new five you created), each describing a different resource using your vocabulary.

Don’t spend too much time creating the instances: the point is just to give you a reasonable set of resources to classify, not to richly describe each resource. But make sure that your resources are sufficiently different from one another, or else they will be more difficult to divide into classes. For example, if your resources are restaurants, don’t choose to make all your instances descriptions of barbecue joints (unless “barbecue joints” was the scope of your vocabulary).

Part 2. Getting Acquainted with FacetMap

Get acquainted with the FacetMap website at http://facetmap.com. In particular, look at the wine demo to see how the three facets of Varietals, Region, and Price combine to organize hundreds of wine instances. The demo will start up in the “Commentary Track” that explains what is going on as you specify facet values to select wines in the collection. Notice that there are three different user interface styles for the display of the facets and the items that are selected by different facet values. Think about the pros and cons of each style of display.
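
To see why facets combine so well, it can help to think of each item as having one value per facet, and of browsing as filtering on whichever facet values you have chosen so far. The sketch below illustrates that idea only; it is not how FacetMap is implemented. The facet names come from the wine demo, but the four wines and the `select` function are hypothetical.

```python
# Hypothetical wines: the facet names follow the demo, the data do not.
WINES = [
    {"name": "Wine A", "Varietals": "Merlot", "Region": "California", "Price": "under $20"},
    {"name": "Wine B", "Varietals": "Merlot", "Region": "France", "Price": "$20 to $50"},
    {"name": "Wine C", "Varietals": "Riesling", "Region": "Germany", "Price": "under $20"},
    {"name": "Wine D", "Varietals": "Merlot", "Region": "California", "Price": "$20 to $50"},
]


def select(items, **chosen):
    """Return the items that match every chosen facet value."""
    return [
        item for item in items
        if all(item.get(facet) == value for facet, value in chosen.items())
    ]


# Each extra facet value narrows the selection further.
merlots = select(WINES, Varietals="Merlot")
california_merlots = select(WINES, Varietals="Merlot", Region="California")

print([wine["name"] for wine in merlots])
print([wine["name"] for wine in california_merlots])
```

Notice that the order in which you pick facet values does not change the result, which is one thing that sets facets apart from a single fixed hierarchy.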

Part 3. Design A Faceted Classification

Once you have familiarized yourself with FacetMap, it’s time to start working on your classification. Review pages 12–16 in the book chapter on classification, noting Ranganathan’s dimensions, the set of general criteria for facet design, and the principles guiding facet ordering. You will use these to design your own faceted classification to organize your set of seven resources.

You can have as many facets as you want, but you must have at least three. You will need to be creative but make sure that anybody else could understand them without your help. They can be abstract or practical or a mix of both, so long as they classify the resources in a way that other people could understand. Also make sure that your facets are flexible enough to handle additions to the list of instances.

Part 4. Testing Your Facets with New Instances

A good set of facets should be able to accommodate new instances without adjustment. Ask your partner to give you the five new instances they created. Revise your facets if necessary so that your system can classify all 12 (7 original plus 5 new) instances. Note: there have been reports of the UNC mail servers eating XML attachments, so zip your XML files before mailing them to one another.
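
One way to think about this test: for every instance, old or new, can you pick a heading under each facet? The sketch below makes that check mechanical, assuming each instance takes exactly one heading per facet (the constraint described under FacetMap’s Limitations below). The facets, headings, and instance names are invented, using the restaurant example from Part 1.

```python
# Hypothetical facets for a restaurant collection.
FACETS = {
    "Cuisine": {"Barbecue", "Sushi", "Pizza"},
    "Price": {"Cheap", "Moderate", "Expensive"},
    "Setting": {"Counter service", "Table service"},
}


def problems(instance_name, assignment, facets=FACETS):
    """List the reasons an instance does not fit the facets (empty if it fits)."""
    found = []
    for facet, headings in facets.items():
        value = assignment.get(facet)
        if value is None:
            found.append(f"{instance_name}: no value for facet {facet!r}")
        elif value not in headings:
            found.append(f"{instance_name}: {value!r} is not a heading in {facet!r}")
    return found


# An original instance that fits, and a new one that exposes two gaps.
original = {"Cuisine": "Barbecue", "Price": "Cheap", "Setting": "Counter service"}
new = {"Cuisine": "Ethiopian", "Price": "Moderate"}

print(problems("instance01", original))
print(problems("instance08", new))
```

Each problem reported is a prompt for a design decision: add a heading, broaden an existing one, or rethink the facet.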

Part 5. Encoding Your Work for FacetMap

Once you have a system that handles all your instances, encode it and all your instances using the FacetMap markup XML format (not the XFML format). Carefully read the documentation in the DTD and look at the example file.

FacetMap’s Limitations

You may run into quirks, bugs, and limitations while using FacetMap. Have patience, and make note of any compromises or design changes you had to make because of limitations of FacetMap or of facets in general. If you are having difficulties that other students might encounter, send email to the class list.

FacetMap enforces a strict “occurrence exclusivity” principle in assigning values in a facet. This means that if (for example) a library facet map had a “subject” facet that included both a “War” and a “Peace” heading, your copy of Tolstoy’s renowned work about both war and peace could not be listed under both. Keep this in mind when designing your facets and facet values. This principle is what distinguishes faceted classification from “tagging” systems where any tag can be assigned to any information resource or object regardless of other tags already associated with it.

Part 6. Submitting Your Work

Upload your facet map file to facetmap.com. Name your facet map sils2011_firstname_lastname. (For example, my facet map would be named sils2011_ryan_shaw.)

After you’ve uploaded, verify that your facet map is browsable at http://facetmap.com/browse/sils2011_firstname_lastname (again, substituting your actual names). Note that subsequent uploads will overwrite previous ones with the same name, so you can fix any problems you might notice after uploading.

Your uploaded facet map will be available for 7 days, but after that it may expire, so be sure to keep a copy of what you submitted. (And make sure you don’t upload your facet map file too early, as I won’t be looking at them until after the assignment is due.)

In addition to uploading your facet map file to the FacetMap website, you’ll need to upload it to the course website. Upload a zip archive including your facet map file (name it facetmap.xml), your 12 instances (name them instance01.xml through instance12.xml), and a short report (name it report.txt). The report should have two sections, and each should be a paragraph or two:

  1. The first section should describe your use of the FacetMap program, especially any compromises you made in your design because of perceived limitations.

  2. The second section should answer the following questions:

    • Were there sequence effects? That is, did you find that you had to work through the facets in a particular order to get to a classification you felt comfortable with?
    • Was your vocabulary useful for or a barrier to developing facets? How?
    • Did you use Ranganathan’s PMEST in conceiving your facets? Was it useful?
    • What were your biggest challenges in designing the classification?
    • How well did your initial facet design handle the additional instances? What changes did you need to make to accommodate the new instances?

Submit this assignment.

Building a Taxonomy

Due October 19.

Assignment Overview

In this assignment you will:

  1. Define classes for your 12 instances.
  2. Sort those classes into a taxonomy.
  3. Create a diagram of your taxonomy.
  4. Write definitions for each part of your taxonomy using hypernyms and hyponyms.
  5. Reflect on your experience.

Deadline

You must submit your work by uploading your zip archive of assignment files to the course website before 12:30PM on Wednesday, October 19th, 2011. Late assignments will not be accepted unless you have an exceptionally good excuse.

Submission Requirements

You will submit a zip archive containing two text files, one named reflection.txt and one named urls.txt. reflection.txt should contain your reflection on the assignment (see part 5). urls.txt should contain the published URLs for your Google Docs spreadsheet and drawing (see parts 1-4). To get your published URL for each document, select Publish as a web page from the Share menu in the upper right-hand corner of the Google Docs interface. For your spreadsheet, make sure you choose to publish All sheets. Copy and paste your two published URLs into the urls.txt file.

Detailed Instructions

In this assignment, you’ll be returning to the set of 12 instances you and your partner created over the last two assignments. This time, you’ll be developing a hierarchical classification scheme—a taxonomy—rather than a faceted one.

The goal of this assignment is to give you more practice thinking about categories and category membership, abstraction, classification, and taxonomy. You’ll also learn a technique for naming and describing a system of categories so that you can clearly convey their meaning to others.

Part 1. Identify your classes

Round up your 12 instances from the last assignment. Create a Google Docs spreadsheet by making a copy of this template. (You can do this by selecting Make a copy ... under the File menu.) In the first sheet (“Instances”) of your spreadsheet, create a list of your instances. (Hint: You’ll need some way of identifying your instances in order to make this list. If you already have identifiers, use these. If not, come up with unique names for your instances so that you can list them.)

In this part of the assignment, you’re going to start generalizing away from the specific instances. For each instance, identify a class to which that instance belongs. For example, if your instances were musical instruments, and the specific instance you were trying to classify were a drum set, you might pick something like “rhythm instrument” as the class. Or you might choose something more granular or more abstract than that. Remember, as always, you’re making a choice about the level of abstraction you use.

One thing you don’t want to do here is make your classes so specific that they can’t describe anything but the specific instance you’re considering. (Continuing from the example above: if your instances were musical instruments, and you had only one drum set, a class called “drum sets” probably wouldn’t be too useful. On the other hand, if your instances were all rhythm instruments, including several different kinds of drum set, then a class called “drum sets” might be appropriate.) You should be able to think of some common features held by all members of your class, as well as some other instances that would fit into the class.

As you’re making your first pass through the instances, do not stress out too much about naming these classes. You’re likely to go back to them and revise them as you progress through the assignment. If it’s starting to make you feel crazy, my advice is to come up with something temporary and move on; new ideas might pop up once you’ve started to arrange your hierarchy.

Part 2. Organize your classes into a hierarchy

Now that you’ve taken a crack at identifying classes for each of your instances, begin arranging them into a hierarchy. The top or “root” element of your hierarchy should come directly from the scope statement you wrote for your vocabulary: i.e., this is the general class covering all of your instances, something like “Photographs” or “Patient Histories.” The bottom level of your hierarchy will be your instances. When you created your classes in part 1, you added a second level to the hierarchy, more abstract than your instances, but less abstract than the “root” class including any instance describable using your vocabulary. What you’re doing now is adding one more level of abstraction, a new level between the classes you identified in part 1 and your root class. These are hypernyms or “super-classes.”

Think of this as a sorting task. (Sometimes it’s even helpful to write your class names down on pieces of paper or sticky notes and physically sort them.) As you sort, you may discover that some of your original classes are too narrow. You may also realize that they’re too broad and don’t leave you enough room to insert another level before getting to your root class. That’s OK! Revise your classes as many times as you need to and record them in your spreadsheet.

At this phase of the assignment, it’s important that you strive for a consistent level of abstraction among your “super-classes.” Again, try to think of some common characteristics that would be shared by all members of that super-class. For example, if we had “musical instruments” as our root element and our next level down included both “clarinets” and “stringed instruments,” that might be a sign that the classification wasn’t maintaining a consistent level of abstraction, because “stringed instruments” is more abstract than “clarinets.” A more consistent taxonomy would have “wind instruments” and “stringed instruments” on the same level.

When you’re satisfied with your assignment of classes to super-classes, record them in the second sheet (“Classes”) of your spreadsheet.
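
If it helps to see the four levels laid out, here is a small sketch that stores a taxonomy as a child-to-parent table and counts how many steps each entry sits below the root. The instrument names extend the examples above and are otherwise invented. Equal depth is only a rough, structural stand-in for a consistent level of abstraction: it can tell you that a class was attached directly to the root, but not whether two super-classes are really comparable. That judgment is still yours.

```python
ROOT = "instruments"

# child -> parent, covering super-classes, classes, and instances.
PARENT = {
    "woodwinds": ROOT,
    "stringed instruments": ROOT,
    "clarinets": "woodwinds",
    "flutes": "woodwinds",
    "violins": "stringed instruments",
    "bass clarinet": "clarinets",
    "piccolo": "flutes",
    "fiddle": "violins",
}


def depth(node):
    """Count the steps from node up to the root class."""
    steps = 0
    while node != ROOT:
        node = PARENT[node]
        steps += 1
    return steps


def instances():
    """Entries that are nobody's parent, i.e. the bottom of the hierarchy."""
    parents = set(PARENT.values())
    return [node for node in PARENT if node not in parents]


# With root, super-classes, classes, and instances, every instance
# should sit exactly three steps below the root.
for name in instances():
    print(name, depth(name))
```

If an instance reports a depth other than 3, some level of the hierarchy was skipped or doubled along its path.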

Part 3. Create a diagram of your hierarchy

Create a Google Docs drawing showing the structure of your classification hierarchy. This does not have to be fancy. Start with your root class at the top of your diagram, then your “super-classes,” then your classes. You don’t need to include your instances.

Part 4. Define your classes and “super-classes”

Now, you’re going to write definitions for your classes such that an ordinary person would be able to categorize new instances. You’ll be following this formula for definitions:

Hyponym = { adjective } hypernym { distinguishing clause }

For example, suppose you’re classifying instruments. “Instruments” is your root element.

Your first instance was a bass clarinet, and you assigned it to a class called “clarinets.” Then, as you created your taxonomy, you sorted clarinets, saxophones, and flutes together into a “super-class” called “woodwinds.”

Your definitions might then look something like this:

  • clarinets = { reeded } woodwinds { that are approximately cylindrical in shape and have numerous keys }
  • woodwinds = { non-brass } instruments { that produce sound when air is blown into them }

Remember that your definitions should reflect things that are true for all members of a class. A good sanity check at this stage is to make sure you can think of a hypothetical second instance for each class.

Record each definition in your spreadsheet. Put your class definitions in the second sheet (“Classes”) and your super-class definitions in the third sheet (“Super-classes”).
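
The formula is regular enough that you can treat each definition as four parts: hyponym, adjective, hypernym, and distinguishing clause. The sketch below fills in the pattern from those parts and checks that every hypernym you use is either the root or something you have defined yourself. The function name `define` and the way the definitions are stored are choices made for this example only.

```python
ROOT = "instruments"

# (hyponym, adjective, hypernym, distinguishing clause)
DEFINITIONS = [
    ("clarinets", "reeded", "woodwinds",
     "that are approximately cylindrical in shape and have numerous keys"),
    ("woodwinds", "non-brass", "instruments",
     "that produce sound when air is blown into them"),
]


def define(hyponym, adjective, hypernym, clause):
    """Fill in: hyponym = { adjective } hypernym { distinguishing clause }."""
    return f"{hyponym} = {{ {adjective} }} {hypernym} {{ {clause} }}"


for parts in DEFINITIONS:
    print(define(*parts))

# Every hypernym should be the root or have a definition of its own.
defined = {parts[0] for parts in DEFINITIONS} | {ROOT}
undefined = [parts[2] for parts in DEFINITIONS if parts[2] not in defined]
print("Hypernyms without a definition:", undefined)
```

A hypernym that shows up in the last line means a definition points at a class that is missing from your taxonomy.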

Part 5. Reflect on your experience

In a text file called reflection.txt, write a paragraph or two about the approaches you used to identify classes and organize them into “super-classes.” Be sure to include your name at the top of the file.

Some questions to guide your reflection:

  • What was your thought process like?
  • What specific tips from the readings or lectures did you draw on?
  • Were there any “outliers” that you had to work especially hard to fit in?
  • Were you able to keep your “super-classes” to a consistent level of abstraction, and how did you do so?
  • Was this conceptually harder or easier for you than the faceted classification assignment, and why?

Submit this assignment.

Computationally Representing Text

Due November 2.

Assignment Overview

In this assignment, you will:

  • Experiment with some tools for analyzing texts
  • Use the tools to analyze various texts related to the topics of the class
  • Reflect on your experiences

Deadline

You must submit your work by uploading your assignment before 12:30PM on Wednesday, November 2. Late assignments will not be accepted unless you have an exceptionally good excuse.

Submission Requirements

You will submit a (zipped) text file called report.txt. The file will include short answers (50-100 words) to the 8 reflections of this assignment.

Detailed Instructions

You will use some text tools to analyze a collection of documents and you will be asked to answer some questions. Please include in your submission short answers (50-100 words) to every question marked as “Reflection” below.

Part 1: Voyeur

Voyeur is a web-based text analysis tool designed to work on text collections.

Note: there are two versions of Voyeur, which can be found at the following URLs:

  • http://voyeur.hermeneuti.ca/
  • http://voyeurtools.org/

They are mostly the same, but annoyingly each has different bugs that affect this assignment. Please pay close attention to the instructions below regarding which version to use for which part of the assignment.

  1. Go to http://voyeur.hermeneuti.ca/
  2. In the Add Texts section you can put the links to the documents you want to analyze. Paste the following links (one per row). You will probably recognize them.

    • http://people.ischool.berkeley.edu/~glushko/IFIOIR/Chapter2-20100908.pdf
    • http://people.ischool.berkeley.edu/~glushko/IFIOIR/Chapter3-20100909.pdf
    • http://people.ischool.berkeley.edu/~glushko/IFIOIR/Chapter4-20100917.pdf
    • http://people.ischool.berkeley.edu/~glushko/IFIOIR/Chapter5-20100917.pdf
    • http://people.ischool.berkeley.edu/~glushko/IFIOIR/Chapter6-20100917.pdf
  3. Click on reveal.

  4. On the right, you should see a Summary tab with a list of statistics. At the bottom of this list, look at Distinctive words (compared to the rest of the corpus). Take note of the five most distinctive words for each document. Why do you think they are distinctive? What do they tell you about the document? Which of them could be used as the “big concepts” of every chapter? Which of them look more accidental? What is the benefit of seeing the term frequency compared to other documents in the collection? (Reflection 1)
  5. Go to http://voyeurtools.org/, add the same texts, and click reveal.
  6. You will see different Distinctive words using this version of the tool (now in the lower left corner). Ignore these; that feature appears to be broken in this version of the tool.
  7. Click the Words in the Entire Corpus bar in the lower left. You should see a list of all the words found in the texts, in descending order by frequency. Using the checkboxes, select the words classification and descriptions. (You will have to scroll down to find them in the list.) A new panel should appear on the far right, showing a graph. What does this graph tell you? (Reflection 2)
  8. Now select the words people and information. The graph will change. What does this graph tell you? (Reflection 3)
  9. Voyeur does not remove stop-words (unless you tell it to). What would be the effect of stop-word removal? (Reflection 4)
  10. Voyeur does not do stemming of words. What is the effect of this? (Reflection 5)
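
If the terms in steps 9 and 10 are unfamiliar, the sketch below shows the mechanics of counting words with and without stop-word removal and stemming. It is not how Voyeur works internally: the stop-word list is a tiny invented one, and `naive_stem` only strips a plural “s,” which is far cruder than a real stemmer. The sample sentence is made up. Try it on a paragraph of your own and compare the counts.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"the", "of", "a", "and", "to", "in", "is", "are"}


def tokenize(text):
    """Lowercase the text and split it into runs of letters."""
    return re.findall(r"[a-z]+", text.lower())


def naive_stem(word):
    """Strip a plural "s"; a crude stand-in for a real stemmer."""
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


def term_counts(text, remove_stop_words=False, stem=False):
    """Count word frequencies, optionally dropping stop words and stemming."""
    words = tokenize(text)
    if remove_stop_words:
        words = [w for w in words if w not in STOP_WORDS]
    if stem:
        words = [naive_stem(w) for w in words]
    return Counter(words)


doc = ("The description of a resource is a description. "
       "Descriptions of resources are organized in the collection.")

print(term_counts(doc).most_common(4))
print(term_counts(doc, remove_stop_words=True).most_common(4))
print(term_counts(doc, remove_stop_words=True, stem=True).most_common(4))
```
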

Part 2: Tagfight: Topicmarks vs. Pinboard

In this part of the assignment you will compare tagging and summarization by humans to tags and summaries automatically extracted by computational algorithms.

  1. Sign in to Topicmarks. You can use a Google or Yahoo! account to sign in, or you can register at Topicmarks.
  2. Once you’ve signed in, click the yellow Upload button. When you are prompted to Select Source, select Link.
  3. Paste in the following link, and click Upload.

  4. Wait a bit, until you see a blue 1 new item message in the sidebar on the left.

  5. Click on Links in the sidebar. You should see an entry for “Folksonomies: Tidying up Tags?” Click on it.
  6. You should see a summary of the article on the right. Read it. Does it make sense? Do you feel you can grasp what the article is about? Is the summary an adequate substitute for actually reading the article? How does the summary change (or not) if you follow the links in the blue bar above the summary (after where it says Focus:)? (Reflection 6)
  7. Now go to the page for this article at Pinboard.
  8. Read the descriptions on the left. How do these descriptions compare to the Topicmarks summary? Are they better, worse, or complementary? Be specific. (Reflection 7)
  9. Below the summary on the Topicmarks page is a tag cloud of Related Concepts. If you click the index button in the side bar, you will see more of these “tags,” arranged in the form of a traditional back-of-the-book index. (Note that if you’ve added additional sources to Topicmarks, this index will contain entries for all of them.) On the Pinboard page, you can see a cloud of user-assigned tags on the right. Compare the tags and tag clouds generated by the two services. Do you see overlap between the extracted Topicmarks tags and the tags assigned by Pinboard users? What types of tags can be assigned automatically and what types cannot? (Reflection 8)
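
For the comparison in step 9, you may find it useful to think of the two tag clouds as sets and ask which tags appear in both, and which appear in only one. A minimal sketch of that comparison follows. The tags listed are invented placeholders, not the actual output of either service, and `compare_tags` is a name chosen for this example.

```python
def normalize(tag):
    """Ignore case and surrounding spaces when comparing tags."""
    return tag.strip().lower()


def compare_tags(extracted, assigned):
    """Split two tag lists into shared tags and tags unique to each list."""
    a = {normalize(tag) for tag in extracted}
    b = {normalize(tag) for tag in assigned}
    return {
        "both": sorted(a & b),
        "only_extracted": sorted(a - b),
        "only_assigned": sorted(b - a),
    }


# Hypothetical tags, for illustration only.
extracted = ["Folksonomies", "tags", "controlled vocabulary", "users"]
assigned = ["folksonomies", "tags", "toread", "research"]

result = compare_tags(extracted, assigned)
for group, tags in result.items():
    print(group, tags)
```

The tags that land in only one group are usually the interesting ones to discuss in your reflection.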

Have fun with the assignment and make sure you answer the 8 reflections in your assignment submission. I hope you’ll find these tools useful to play around with further. (I’m especially fond of Pinboard.)

Submit this assignment.

Midterm Exam

Due November 9.

Submit this assignment.

Final Report

Due December 9.

See your branch materials for details about deliverables. Your deliverables must be uploaded by 7PM on Friday, December 9th.

Submit this assignment.