
alexries606

Portfolio for English 606: Topics in Humanities Computing


Math for Humanists? (Week 10)

This week’s readings introduced the idea of topic modeling as a digital humanities tool. The concept of Latent Dirichlet Allocation (LDA), the primary example of topic modeling in the readings, is credited to David Blei, Andrew Ng, and Michael I. Jordan.

I felt that no one text provided a good definition of topic modeling. In “Words Alone: Dismantling Topic Models in the Humanities,” Benjamin Schmidt refers to topic models as “clustering algorithms that create groupings based on the distributional properties of words across documents.”

In the same edition of Journal of Digital Humanities, Andrew Goldstone and Ted Underwood call topic modeling a “technique that automatically identifies groups of words that tend to occur together in a large collection of documents.”
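
To see what “groups of words that tend to occur together” looks like in practice, here is a minimal sketch in Python using the gensim library (my choice of tool, not one named in the readings); the four-document toy corpus is invented and far too small for a real model:

```python
# A toy topic model: LDA groups words by their distribution across documents.
# gensim is one common implementation (not one named in the readings).
from gensim import corpora, models

docs = [
    "the whale swam the cold sea".split(),
    "the ship sailed the stormy sea".split(),
    "the court heard the legal case".split(),
    "the judge ruled on the case".split(),
]

dictionary = corpora.Dictionary(docs)             # word <-> id mapping
bow = [dictionary.doc2bow(doc) for doc in docs]   # bag-of-words counts
lda = models.LdaModel(bow, num_topics=2, id2word=dictionary,
                      passes=50, random_state=0)

# Each "topic" is just a weighted list of words, e.g. sea/whale/ship
# clustering apart from court/judge/case.
for topic_id, words in lda.print_topics(num_words=4):
    print(topic_id, words)
```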

The Maryland Institute for Technology in the Humanities’ overview of topic modeling offers attributes of topic modeling projects rather than a concrete definition (its five elements of a topic modeling project are corpus, technique, unit of analysis, post-processing, and visualization).

According to Schmidt, LDA was originally designed for information retrieval, not for exploring literary or historical corpora, and he expresses concern about the uncontextualized use of topic modeling in the digital humanities field.

He acknowledges that topics are easier to study than individual words when trying to understand a massive text corpus. However, he also warns that “simplifying topic models for humanists who will not (and should not) study the underlying algorithms creates an enormous potential for groundless–or even misleading–insights.”

His concerns primarily stem from two assumptions that are made when using a topic modeling approach: 1) topics are coherent, and 2) topics are stable. Schmidt then proposes contextualizing the topics in the word usage/frequency of the documents.
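
Schmidt’s stability worry can even be checked empirically: fit the same model twice with different random seeds and compare the top words. A minimal sketch (my illustration, not Schmidt’s code), reusing the toy corpus from above:

```python
# Checking Schmidt's "topics are stable" assumption: two runs of the same
# model, differing only in random seed, can produce different topics.
# Reuses `bow` and `dictionary` from the sketch above.
def top_words(model, topn=4):
    return [{w for w, _ in model.show_topic(t, topn=topn)}
            for t in range(model.num_topics)]

run_a = models.LdaModel(bow, num_topics=2, id2word=dictionary,
                        passes=50, random_state=1)
run_b = models.LdaModel(bow, num_topics=2, id2word=dictionary,
                        passes=50, random_state=2)

# If topics were perfectly stable, every topic in run A would have an
# identical twin in run B; on real corpora the overlap is often partial.
for words_a in top_words(run_a):
    overlap = max(len(words_a & words_b) for words_b in top_words(run_b))
    print(words_a, "-> best overlap with run B:", overlap, "of", len(words_a))
```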

Although Schmidt stays positive and realistic (he supports topic modeling; he just wants digital humanists to understand its limitations), the underlying point I took from the reading is that perhaps digital humanists are meddling in things they shouldn’t be (at least, not yet).

Schmidt hints that the people who can use topic modeling the most successfully are those who understand the algorithms, at least on a basic level. And this makes sense. That’s the reality for any tool.

This brought me back to the debates about whether or not digital humanists need to know how to code (I feel like I keep coming back to this topic). If we can’t agree that digital humanists need to know how to code, how can we agree or disagree that digital humanists need to be able to understand the algorithms of topic modeling?

The concept of topic modeling is mildly confusing, but still attainable. The algorithms, however, are straight up intimidating. The Wikipedia page for LDA shows a ton of variables and equations that would take more time and effort to understand than I am capable of giving.
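
For a flavor of what those equations encode, the core of LDA from Blei, Ng, and Jordan’s original paper fits on one line: a document’s topic mixture is drawn from a Dirichlet prior, then each word’s topic and each word are drawn in turn:

```latex
% Joint distribution of a topic mixture \theta, topic assignments z,
% and words w for one document (Blei, Ng, and Jordan 2003):
p(\theta, \mathbf{z}, \mathbf{w} \mid \alpha, \beta)
  = p(\theta \mid \alpha) \prod_{n=1}^{N} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta)
```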

Maybe if we discussed this in class, we would come to the same conclusion as we did with the need to code for digital humanists: they shouldn’t have to be experts, but they should know enough to talk about it with an expert. But who are the experts in topic modeling? Statisticians, perhaps?

I think that digital humanists who wish to conduct research across a large number of texts could benefit from studying statistics. I’m starting to realize just how many hats digital humanists must (or at least should) wear!


Describing Images with Images (Week 8)

In “How to Compare One Million Images” [UDH], Lev Manovich discusses the challenge the DH field faces in accounting for the enormous amount of visual data that exists and continues to grow. He introduces the software studies initiative’s key method for the analysis and visualization of large sets of images, video, and interactive visual media (251).

There are two parts of this approach: 1) “automatic digital image analysis that generates numerical descriptions of various visual characteristics of the images,” and 2) “visualizations that show the complete image set organized by these characteristics” (251).
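
As an example of what those two parts might look like in practice, here is a minimal Python sketch (my toy stand-in, not the software studies initiative’s actual tools) that computes two visual features per image and plots the whole set by them:

```python
# Step 1: "numerical descriptions of visual characteristics" (here, mean
# saturation and brightness). Step 2: a plot of the complete image set
# organized by those characteristics. A toy stand-in, not the software
# studies initiative's tools; "images/" is a hypothetical folder.
from pathlib import Path
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image

features = []
for path in Path("images").glob("*.jpg"):
    hsv = np.asarray(Image.open(path).convert("HSV"), dtype=float)
    features.append((hsv[..., 1].mean(),   # mean saturation
                     hsv[..., 2].mean()))  # mean brightness

xs, ys = zip(*features)
plt.scatter(xs, ys)
plt.xlabel("mean saturation")
plt.ylabel("mean brightness")
plt.show()
```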

His outlined approach addresses problems that DH researchers struggle with when using traditional approaches: scalability, registering subtle differences, and adequately describing visual characteristics. The approach also better accounts for entropy, the degree of uncertainty in the data.

For me, this idea of entropy echoes Johanna Drucker’s concern in “Humanities Approaches to Graphical Display” [DITDH] about the binary representations required by traditional scientific approaches to graphical display.

I think the connection lies in the separation Drucker describes between science’s realist approach and the humanities’ constructivist approach, and in the need for the DH field to forge its own path in statistical displays of capta.

Note: although I agree with Drucker’s characterization of data as capta (something that is taken and constructed rather than recorded and observed), I will use the term data throughout the rest of this post for simplicity.

I think Manovich’s approach for handling large data sets makes sense and is a viable option for the DH field, as long as researchers can afford the necessary software and have the necessary technical expertise. As Manovich explains, a project like comparing a million manga pages (or even 10,000) would be exceptionally difficult without computer software that can measure differences between images.

For example, tagging can be problematic because even with a closed vocabulary, tags can vary. As mentioned earlier, the human eye cannot account for the subtle differences among a large number of images.

Most DH projects utilize sampling (comparing 1,000 out of 100,000 images), but sampling data can be very problematic. When sampling from a large data set, there is always the possibility that the sample will not accurately represent the entire data set. This is something that every field, both in the sciences and humanities, has to deal with.
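
A quick simulation shows why. With invented numbers: suppose 30% of 100,000 images share some trait, and we estimate that share from samples of 1,000:

```python
# Simulating the sampling risk: repeated samples of 1,000 from a
# population of 100,000 give estimates that wobble around the truth.
# All numbers are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)
population = rng.random(100_000) < 0.30   # 30% of items have some trait

for _ in range(5):
    sample = rng.choice(population, size=1_000, replace=False)
    print(f"sample estimate: {sample.mean():.1%}  (true value: 30.0%)")
# Random samples land within a point or two of 30%; a sample chosen by
# convenience rather than at random can miss by far more.
```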

Manovich’s scatter plots, line graphs, and image plots are beautiful and interesting and I thought they were surprisingly simple to read and understand for being so nontraditional. Describing images with images just makes sense.

Find the pattern in this response. Is there one or is it apophenia? (Week 7)

Dan Dixon’s “Analysis Tool or Research Methodology” chapter in UDH introduced the psychological phenomenon of pattern recognition in the context of DH. He explains that the DH field has an affinity towards finding patterns, but the field (and most others) has ignored “the nature of what patterns are and their statuses as an epistemological object” (192).

After very briefly explaining the psychology of pattern recognition, describing the systems view of the world that all pattern-based approaches take, and validating patterns as an epistemic construct, he discusses the occurrence of abduction and apophenia, which was the section I found most interesting. As I read about apophenia, I thought about my studies and what I’ve learned so far about the DH field, and I wondered: doesn’t this happen a lot?

So when I read Dixon’s conclusion, I really took note of one of the questions he posed: “Are we designing patterns where none existed in the first place and is there an unavoidable tendency towards apophenia instead of pattern recognition?” (206).

I think this is a valid and important question that might not have a straightforward answer. Yes, the field does tend towards apophenia, but I think that tendency can be avoided, or alternatively, it may even be okay. I can easily see how one could drift into apophenia: it’s natural to predict an answer to a research question before the research begins.

I also think that there’s pressure for professionals to validate their research within their field, and this may cause apophenia or simply slight manipulation to reach the desired outcome, such as removing certain words from a word count.

My problem with apophenia, or at least Dixon’s definition of apophenia, is with the idea of “ascribing excessive meaning” (202). How do we know when someone has ascribed excessive meaning to a perceived pattern?

Dixon does get at how we determine whether a pattern is really there. He argues that pattern recognition, by itself, is not a valid method of enquiry and suggests using inductive and deductive reasoning to test abductive reasoning. Induction and especially deduction can invalidate a pattern. I agree with this, and I think it is the researcher’s responsibility to fully account for the valid patterns that appear and to recognize when apophenia has occurred.
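
One statistical way to let deduction check a perceived pattern is a permutation test: if an apparent difference between two groups of texts is no bigger than the differences produced by random shuffling, the “pattern” may be apophenia. A minimal sketch with invented numbers:

```python
# A permutation test as one concrete way induction/deduction can
# invalidate a pattern. The word-rate numbers are invented.
import numpy as np

rng = np.random.default_rng(0)
group_a = np.array([12.1, 13.4, 11.9, 14.2, 12.8])  # a word's rate in corpus A
group_b = np.array([11.8, 12.0, 12.5, 11.4, 12.2])  # the same rate in corpus B
observed = group_a.mean() - group_b.mean()

pooled = np.concatenate([group_a, group_b])
hits = 0
for _ in range(10_000):
    rng.shuffle(pooled)                       # break any real grouping
    if pooled[:5].mean() - pooled[5:].mean() >= observed:
        hits += 1
print("p-value:", hits / 10_000)  # large p => the "pattern" looks like chance
```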

However, I also think that even loosely developed patterns formed through apophenia can be important (as long as they are acknowledged as such). If the researcher can create a unique and productive discussion from a barely formed pattern, it shouldn’t be cast aside.

Furthermore, a single pattern can have several different meanings, depending on the researcher, the research question, the field of study, the context, etc. What may be unimportant in one field may be important to another.

Because the main topic of this week’s readings is digital archives, I want to quickly connect the Dixon reading to the Parikka and Rice & Rice readings. Patterns play a significant role in archives. They help archivists group and organize items. They influence the way items are tagged in an archive. They influence the software and interface of the archival system. And the way that items are grouped, organized, tagged, and retrieved can force patterns that may not emerge otherwise.

Teaching XML in the Digital Humanities (Week 6)

Birnbaum, in “What is XML and why should humanities scholars care,” addresses how we should teach XML. He suggests that the Text Encoding Initiative’s (TEI’s) “Gentle introduction to XML” is not gentle enough and proposes learning the syntax of XML after the introduction (although, under this stance, character entities could have been removed from the introduction as well).

Birnbaum’s gentle introduction was written for an undergraduate course called “Computational methods in the humanities.” The course was “designed specifically to address the knowledge and skills involved in quantitative and formal reasoning within the context of the interests and needs of students in the humanities” (taken from the class syllabus at http://dh.obdurodon.org/description.xhtml).

In his gentle introduction, Birnbaum takes the stand that digital humanities scholars will need to learn XML at some point, and this stand is even clearer in the syllabus. So how should we teach XML?

To help me explore that question, I try to relate it to how I’ve learned programming languages. How did I learn HTML? Mostly by reading online references like w3schools.com and practicing in Notepad. Every new element I read about, I tried to recreate on my local machine. It was very skill-based.

Yes, I wanted to be able to create a website, but I mostly wanted a skill to put on my resume. I didn’t think about design and functionality (other than, does the code do what it’s supposed to do). I didn’t think about why I, as an English student, should care or how HTML could be used in a context other than putting content online.

I’m currently learning JavaScript through an introductory web development course on Udemy, and so far, I (and the instructor) have been focused on building a skill. I partition my screen to display the online reference on the left and Notepad++ on the right. After I enter new code, I save and refresh my browser window to see if it worked.

The instructor likes to let the code’s output explain itself. He repeatedly says “this will make more sense later in the course.” Sometimes after successfully writing a section of code, I try to think of how it will be useful, and sometimes I can’t answer that.

The instructor essentially throws us in there with very little introduction, but I like that full immersion. HTML and Javascript are languages, and if immersion is an effective technique for learning French or German, why can’t it be an effective technique for learning programming languages?

It was hard for me to learn about XML from this introduction. It was especially hard to learn the terminology without seeing the terms in action. I actually felt that McDonough’s “XML, Interoperability and the Social Construction of Markup Languages: The Library Example” did a much better job of contextualizing the use of XML in the digital humanities, even though it was specific to digital libraries.
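
For the kind of terms-in-action demonstration I wanted, even a few lines help. Here is an invented snippet (mine, not Birnbaum’s) parsed with Python’s standard library, with the terminology labeled:

```python
# XML terminology in action: element, attribute, character entity.
# The markup is invented for illustration, not from Birnbaum's course.
import xml.etree.ElementTree as ET

doc = """<poem author="anonymous">
  <line n="1">Roses are red &amp; violets are blue</line>
  <line n="2">This example is invented too</line>
</poem>"""

root = ET.fromstring(doc)
print(root.tag, root.attrib)        # root element name and its attribute
for line in root.iter("line"):      # child elements
    # .get() reads an attribute; .text resolves &amp; to a literal "&"
    print(line.get("n"), line.text)
```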

Whether a digital humanist slowly learns XML or is thrown into the deep end probably depends on the person and the context. Regardless, I think it’s extremely beneficial to have XML (and other computer-based) classes specifically designed for digital humanists.

Those classes could fill in the gaps that occurred in, for example, my skill-based learning. The classes could include discussions of XML problems in the digital humanities, such as interoperability, a problem that would not be as urgent for a web developer creating a website for a business.

Research Methods in the Social Sciences (Week 3)

In the “Tactical and Strategic: Qualitative Approaches to the Digital Humanities” chapter of Rhetoric and the Digital Humanities, McNely and Teston discuss the importance of carefully choosing strategies, since different strategies afford or limit certain tactics. As examples, they describe a writing, activity, and genre research (WAGR) approach to exploring transmedia storytelling and a grounded theory (GT) approach to collecting and analyzing data.

I had trouble with this reading because my comfort level with methods and methodologies is low, and adding the concepts of strategies and tactics on top of that left me feeling that I did not fully understand the two approaches.

Despite my difficulty with this reading, research methods in the social sciences was a topic that really stood out to me this week. I was even a little surprised by Smagorinsky’s “The Method Section as Conceptual Epicenter in Constructing Social Science Research Reports.”

Clearly it’s important for people to understand how you conduct your research (validity!), but I guess I must have assumed that the humanities would kind of gloss over that section, whereas in the hard sciences, it sometimes seems like the method section is more important than the research question or findings.

Maybe I just had a brain lapse. Or maybe it’s because it’s not something that we normally talk about in our English classes.

It’s surprising that we do not talk much about methods, since we do apply research methods in most of the papers we write. We are often required to choose a sampling of readings from our field that will fit our research topic (annotated bibliography). We sometimes (carefully) use empirical evidence to back up our arguments. We use research methods, but the word method usually doesn’t come up.

The projects discussed or implied in the DH readings are often fairly different from the standard conference paper for a PWE grad class, but we still use research methods, and we don’t really refer to them as methods.

The only conference paper I wrote that required me to really think about my research methods as methods was for Catherine’s class last semester.

For my research, I collected a sample of images and text related to a popular meme of a fictional character on a TV show. I used Evernote to tag and organize my data and then I looked for patterns. I used Laurie Gries’ iconographic tracking method as a basis for my research, and although I employed a few methods, what would be my “methods section” was really weak, because I just don’t know the vocabulary.

Last semester in the grants class I took, Hawley asked us to pair up each week with another classmate to peer review sections of our grants. The week that I had drafted my project evaluation section, I was paired with a graduate student in a Sport Sciences program (I don’t remember which one specifically).

In the grant, I stated that after the project ended, I would write a paper to be published in an academic journal. My project evaluation section would influence my methods section of the paper more than any other part, because that section explains how I plan to collect data.

The initial draft had very simple statements. With the help of my very knowledgeable classmate (and a little help from socialresearchmethods.net), I was able to provide more specificity.

Research methods is an area that I get a little lost in, but I’m sure that will change through this class.

What’s in a name? (Week 2)

A continuous theme that I’ve noticed in many professional writing fields is that people get really hung up on definitions. ‘How do we define ourselves’ is a very prevalent debate in the tech comm field, and from the readings this week, it is clear that it is a very prevalent debate in the Digital Humanities (DH) field as well.

These debates are slightly annoying. Imagine what these scholars could accomplish with all the time they’ve spent writing and presenting their disagreements with others’ definitions! But when it comes to research, and especially when it comes to funding, the distinctions between the various similar fields become very important. From the definitions put forth in the readings, I was able to sense why it’s so difficult to define the DH field.

This may be completely incorrect, and is certainly an oversimplification, but I imagine a graph with humanities on the X axis and digital technology on the Y axis. A few people in the DH field are at (0.1, 10), a few others are at (10, 0.1), and everyone else is somewhere in between. To simplify, some people seem to lean more towards the “digital” side of DH and others lean more towards the “humanities” side of DH. Therefore, everyone is bringing a variety of skills and ideas to the DH field.
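
That imagined graph is easy enough to render as a toy plot (the scattered points are invented; only the two extremes come from my description above):

```python
# A toy rendering of the imagined humanities-vs-digital graph.
# The 30 in-between scholars are random invented points.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
between = rng.uniform(0.1, 10, size=(30, 2))        # everyone in between
plt.scatter(between[:, 0], between[:, 1], alpha=0.5)
plt.scatter([0.1, 10], [10, 0.1], color="red")      # the two extremes
plt.xlabel("humanities")
plt.ylabel("digital technology")
plt.show()
```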

Although Gold did not express his preferred definition of DH in “The Digital Humanities Moment,” his introduction to Debates in the Digital Humanities, he did note the tension after Ramsay’s “Who’s In and Who’s Out” talk. Ramsay describes DH as a field that builds/makes things (a sentiment that aligns with the STEM fields).

This idea of building/creating is echoed in the introduction to Rhetoric and the Digital Humanities (RDH). Ridolfo and Hart-Davidson describe DH as a term that largely functions tactically (to get things done). They suggest two political moves for the scholars in rhet studies, TPW, and tech comm: selectively redefining their digital projects under the DH umbrella and studying the DH job market. When reading their introduction, the word “practical” kept coming to mind.

In the first chapter of RDH, Reid argues that there is a problem with defining the DH field and tries to work out a definition by examining the fields doing DH work. As Reid points out, the obvious DH fields are those that employ computers to study traditional objects of humanities study (what used to be called humanities computing). Other fields he includes are media studies and rhet and comp.

When it comes to the challenge of defining the field as a whole, Reid seems to partially blame the troubled relationship between rhetoric and the humanities and the “correlationist view” both fields tend to take. Pulling in Latour, he suggests approaching rhetorical relations as relations with nonhumans, and he calls for recognizing how technology (the nonhuman) shapes those relations. Rather than building, he focuses on theorizing.

Going back to the graph, I guess the DH field needs to find a balance: narrowing the range without defining the field out of existence. It’s too early for me to settle on a definition, although I am certain that I lean a little more towards the “digital” side of DH.
