In “How to Compare One Million Images” [UDH], Lev Manovich discusses the challenge the DH field faces in accounting for the sheer volume of data that exists and continues to grow. He introduces the Software Studies Initiative’s key method for the analysis and visualization of large sets of images, video, and interactive visual media (251).

There are two parts of this approach: 1) “automatic digital image analysis that generates numerical descriptions of various visual characteristics of the images,” and 2) “visualizations that show the complete image set organized by these characteristics” (251).
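To make the two parts concrete for myself, here is a minimal sketch in Python. It is not the Software Studies Initiative's actual software: the tiny hand-made grayscale "pages," the function names `describe` and `organize`, and the choice of brightness and contrast as the measured characteristics are all my own assumptions for illustration.

```python
from statistics import mean, pstdev

def describe(image):
    """Part 1: reduce a grayscale image (rows of 0-255 values) to numbers."""
    pixels = [value for row in image for value in row]
    return {"brightness": mean(pixels), "contrast": pstdev(pixels)}

def organize(images, feature):
    """Part 2: order the complete set by one measured characteristic."""
    described = [(name, describe(img)) for name, img in images.items()]
    return sorted(described, key=lambda item: item[1][feature])

# Hypothetical 2x2 "pages"; real pages would be loaded from image files.
pages = {
    "dark_page": [[10, 20], [30, 40]],
    "light_page": [[200, 220], [240, 250]],
    "mixed_page": [[0, 255], [0, 255]],
}

# List every page, darkest first, alongside its numerical description.
for name, features in organize(pages, "brightness"):
    print(name, features)
```

Sorting by `"contrast"` instead of `"brightness"` reorders the same set, which is the point of the second part: the whole collection stays in view, and only the organizing characteristic changes.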

The approach he outlines addresses problems that DH researchers run into with traditional methods: scalability, registering subtle differences, and adequately describing visual characteristics. It also does a better job of accounting for entropy, the degree of uncertainty in the data.
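One way to put a number on that uncertainty is Shannon entropy. The sketch below assumes the measure is taken over the grayscale values of a single image; the function name `shannon_entropy` and the two toy "pages" are hypothetical and only meant to show how the number behaves.

```python
from collections import Counter
from math import log2

def shannon_entropy(pixels):
    """Shannon entropy, in bits, of a flat list of grayscale values."""
    counts = Counter(pixels)
    total = len(pixels)
    # Sum p * log2(1/p) over every distinct gray value that occurs.
    return sum((n / total) * log2(total / n) for n in counts.values())

flat_page = [128] * 8                             # a single gray tone
busy_page = [0, 32, 64, 96, 128, 160, 192, 224]   # eight equally frequent tones

print(shannon_entropy(flat_page))   # 0.0: no uncertainty about the next pixel
print(shannon_entropy(busy_page))   # 3.0: three bits needed to name each tone
```

A page with one flat tone scores zero, and the score rises as the tones become more varied and more evenly spread.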

For me, this idea of entropy echoes Johanna Drucker’s concern, in “Humanities Approaches to Graphical Display” [DITDH], with the binary representations that traditional scientific approaches to graphical display require.

I think the connection lies in the separation that Drucker describes between science’s realist approach and the humanities’ constructivist approach, and in the need for the DH field to forge its own path in statistical displays of capta.

Note: although I agree with Drucker’s characterization of data as capta (something that is taken and constructed rather than recorded and observed), I will use the term data throughout the rest of this post for simplicity.

I think Manovich’s approach for handling large sets of data makes sense and is a viable option for the DH field, as long as researchers can afford the necessary software and have the necessary technical expertise. As Manovich explains, a project like comparing a million manga pages (or even 10,000) would be exceptionally difficult without computer software that can measure differences between images.

For example, tagging can be problematic because even with a closed vocabulary, tags can vary. As mentioned earlier, the human eye cannot account for the subtle differences among a large number of images.

Most DH projects rely on sampling (for example, comparing 1,000 out of 100,000 images), but sampling can be very problematic. When sampling from a large data set, there is always the possibility that the sample will not accurately represent the entire data set. This is something that every field, in both the sciences and the humanities, has to deal with.
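A rough sketch of that sampling risk, using made-up numbers rather than real images: I assume each of 100,000 images has already been reduced to one brightness score, then draw a few random samples of 1,000 and compare each sample's average with the average of the full set. The variable names and the random scores are hypothetical.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical collection: one brightness score (0-255) per image.
collection = [random.randint(0, 255) for _ in range(100_000)]
full_mean = mean(collection)

# Draw several samples of 1,000 and see how far each drifts from the full set.
for trial in range(5):
    sample = random.sample(collection, 1_000)
    print(trial, round(mean(sample) - full_mean, 2))
```

Each sample gives a slightly different answer, and none of them has to match the full collection exactly. This sketch only covers simple random sampling; a sample picked by hand or by convenience could drift further.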

Manovich’s scatter plots, line graphs, and image plots are beautiful and interesting, and I found them surprisingly simple to read and understand given how nontraditional they are. Describing images with images just makes sense.