Categorize the data – The Humanities Now!

Close reading in digital humanity is a little bit different from the traditional reading process. From my own experience, it is a process of categorizing rather than information gathering. Elena Pierazzo values this categorizing process a lot in her report. “For print the choice of which features to include in the transcription is limited largely by the limits of the publishing technology. In contrast, the digital medium has proved to be much more permissive.” (Abstract, Pierazzo) She mentioned that the public version of transcription should contain as much information as it allows. However, in order to make the transcription a modernize publication, we have to change the original text. How to make the loss of information during the process in this changing the possibly smallest is a big challenge. In traditional way, there might be tons of footnotes at the bottom of the pages trying to explain and reproduce the original information. It is neither efficient nor reader-friendly.

A better way to do that is using digital method to tag words. Categorizing and tagging each word is a good way to explain the hidden information to the readers. More importantly, it will not affect readers’ reading experience a lot in digital publication form. The output of the project combines both simplicity and functionality. However, changing an original transcription to a complete XML compiled file is not very easy. The biggest challenge is to make decisions during the categorize process. Pierazzo mentions that “informed choices need to be made on what to include because it is relevant and what can be safely omitted.” For example, when we talk about the word “Indian”, it is hard to determine if it should be labeled a person or an object. Indian is a person because it represents actually a group of people. However, Indian is not a specific name, it is also a “concept” and we are not able to locate it to any specific character from the word itself. To resolve this kind of problem, we need to persuade each other in the class to make decisions on this categorising process. Finally, we create a new catagory as affiliation to describe a group of people.

Nature language is not like computer programming language. It is much more complicate and vague. Our work is to do as much as possible to translate the nature language into computer language without any information and emotion loss. We find out the information hiding in nature language and hide them inside our XML webpage. We present original diary as an elegant website to keep the beauty and original taste of nature language. At the beginning of this project, I don’t actually value this categorizing work much. But now I think our work is important. It expands traditional 2-D publication by adding a layer behind.

The upper part is the presenting webpage

Source code is at the bottom