Web 2.0 and/or Semantic Web
by Clifford Tatum (first published 05 June 2011, at Digital Scholarship)
The eHumanities Enhanced Publication (EP) project is envisioned as a hybrid platform that leverages Web 2.0 participatory modes of scholarly communication combined with formalized content structures imposed by Semantic Web formats. Translating this vision into a database design and formalized object relationships brings into focus contemporary tensions related to scholarly communication in the digital era. Specifically, innovation in and diffusion of informal communication practices are occurring at a higher rate than change in formal modes of communication.
Asymmetric change in formal/informal communication practices seems to put pressure on their respective roles. Research shows increased acceptance of novel forms of scholarly communication on the Web, but an enduring preference to publish in traditional journals rather than in web-based venues (Harley et al. 2010). Crucially, this preference is tied to the academic system of professional assessment and reward (Procter et al. 2010). While the top-down Semantic Web and bottom-up Web 2.0 intertextual structures are not inherently incompatible, their differences have implications for the design, use, and diffusion of enhanced scholarly publications. Using this distinction as a starting point, it is useful to consider how each approach maps onto expectations about how EPs will be used.
The formalization of object relationship structures entailed in the Semantic Web approach is most compatible with the formal publication of research and scholarship. In comparison, Web 2.0 applications and practices are more oriented to informal modes of communication whereby published texts can be discussed, critiqued, and interpreted. However, this mapping suggests a false symmetry between Semantic Web and Web 2.0, and formal/informal modes of scholarly communication. With this in mind, following is an overview of the EP project, a discussion of particular structural mechanics underpinning the Semantic Web and Web 2.0, and the ways in which digital enhancements in this project are expected to facilitate scholarly communication.
As described elsewhere in more detail, the basic architecture for this project includes a database-driven website for each of four traditionally published books related to e-research, along with a central aggregation database that facilitates queries within and across the four books. Still an important mode of scholarly communication, particularly in the Humanities, the academic book format has seen relatively little enhancement from the affordances of digital media, networked content, and database technologies. Setting aside the tantalizing question of how the book could or should be redesigned to fully leverage digital media, there is potential for creating enhancements for the book in its present form. This EP project is conceived with the present book opportunities (and limitations) in mind.
In the project diagram above, each website is depicted with a local database, each of which is connected to a central database whose sole purpose is to provide query functionality across the collection of websites. In this way each book website retains an individual web presence with local content management (and storage). Simple content relationships are articulated through the mechanics of hyperlinks, which are composed of three components: context, situated meaning, and object.
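The arrangement of local databases and a central query database can be sketched roughly as follows. This is only an illustration under stated assumptions: the table names (posts, idx), the book labels, and the choice to copy content into a central index (rather than, say, querying each site live) are hypothetical and are not taken from the project's actual design. In-memory SQLite databases stand in for the real site databases.

```python
import sqlite3


def make_book_db(posts):
    """Create a stand-in local database for one book website (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE posts (title TEXT, body TEXT)")
    db.executemany("INSERT INTO posts VALUES (?, ?)", posts)
    db.commit()
    return db


def build_central_index(book_dbs):
    """Copy searchable fields from each local database into one central index."""
    central = sqlite3.connect(":memory:")
    central.execute("CREATE TABLE idx (book TEXT, title TEXT, body TEXT)")
    for book, db in book_dbs.items():
        rows = db.execute("SELECT title, body FROM posts").fetchall()
        central.executemany(
            "INSERT INTO idx VALUES (?, ?, ?)",
            [(book, title, body) for title, body in rows],
        )
    central.commit()
    return central


def search(central, term):
    """Return (book, title) pairs whose body text mentions the term."""
    cursor = central.execute(
        "SELECT book, title FROM idx WHERE body LIKE ? ORDER BY book, title",
        (f"%{term}%",),
    )
    return cursor.fetchall()


# Two stand-in book sites; the project itself describes four.
book_dbs = {
    "book-a": make_book_db([("Chapter 1", "Notes on visualization in e-research")]),
    "book-b": make_book_db([
        ("Chapter 4", "Data sharing and visualization"),
        ("Chapter 5", "Peer review practices"),
    ]),
}
central = build_central_index(book_dbs)
results = search(central, "visualization")
print(results)
```

Each local database stays the place where content is managed and stored; the central index exists only to answer queries that span the collection.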
Triples and Content Relationships
Both Web 2.0 and Semantic Web provide content structures that facilitate interoperability within and among knowledge domains. Both utilize a ‘triple’ construct that defines the individual object relationships within a domain that in aggregate comprise a significant dimension of content structure. However, the Semantic Web imposes a top-down structure based on formalized object relationships and Web 2.0 facilitates emerging structures through user contributions, primarily through ad-hoc interlinking of content.
The hyperlink defines a simple relationship between local text and a referenced resource. Selection of the local text can serve as the ‘physical’ linking mechanism and at the same time it specifies what the remote content is, both to the reader and to machine-based aggregation, e.g. search engine crawlers. The linked-text names the link and, along with the direction of the link, helps to define the relationship between the two linked objects.
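To make the three parts of a hyperlink concrete, the following minimal sketch reads a fragment of HTML and records each link as a (source page, link text, target URL) tuple, roughly as a crawler might. It uses Python's standard html.parser module; the class name, function name, and example.org URLs are hypothetical and chosen only for illustration.

```python
from html.parser import HTMLParser


class LinkTripleParser(HTMLParser):
    """Collect (source page, link text, target URL) for every <a href=...>."""

    def __init__(self, source_url):
        super().__init__()
        self.source_url = source_url
        self.triples = []
        self._href = None   # target of the link currently being read, if any
        self._text = []     # text fragments seen inside that link

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        # Only keep text that appears between <a ...> and </a>.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            link_text = "".join(self._text).strip()
            self.triples.append((self.source_url, link_text, self._href))
            self._href = None


def extract_link_triples(source_url, html):
    parser = LinkTripleParser(source_url)
    parser.feed(html)
    parser.close()
    return parser.triples


page = (
    '<p>See the <a href="http://example.org/report.pdf">annual report</a>, '
    'or find it <a href="http://example.org/report.pdf">here</a>.</p>'
)
triples = extract_link_triples("http://example.org/chapter-1", page)
for triple in triples:
    print(triple)
```

Both links point to the same target, but only the first link text says anything about what that target is; the second is named simply "here".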
Through hyperlinking, documents, collections of documents, and related audio and visual resources are structured across the web (Halavais 2008, 43). In this bottom-up, nonhierarchical fashion, features of Web 2.0 facilitate co-construction of intertextual discourses. However, content structures emerging from Web 2.0 practices, such as hyperlinking, suffer from some basic linguistic limitations, such as homonyms and synonyms (Vossen and Hagemann, 2007). In this way, the flexibility of Web 2.0 can also limit the precision of aggregated content.
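A small sketch may help show how homonyms and synonyms reduce precision when links are aggregated by their text alone. The link records below are invented for illustration (all URLs are hypothetical), and grouping by lower-cased link text is an assumed, deliberately naive aggregation strategy.

```python
from collections import defaultdict

# Hypothetical (source page, link text, target URL) records gathered from the Web.
links = [
    ("http://example.org/blog/a", "Java", "http://example.org/wiki/Java_(island)"),
    ("http://example.org/blog/b", "Java", "http://example.org/wiki/Java_(language)"),
    ("http://example.org/blog/c", "e-research", "http://example.org/wiki/E-research"),
    ("http://example.org/blog/d", "cyberinfrastructure", "http://example.org/wiki/E-research"),
]

targets_by_text = defaultdict(set)
texts_by_target = defaultdict(set)
for _source, text, target in links:
    targets_by_text[text.lower()].add(target)
    texts_by_target[target].add(text.lower())

# Homonyms: one link text that leads to several different resources.
homonyms = {text: sorted(t) for text, t in targets_by_text.items() if len(t) > 1}
# Synonyms: several link texts that lead to the same resource.
synonyms = {target: sorted(t) for target, t in texts_by_target.items() if len(t) > 1}

print(homonyms)
print(synonyms)
```

An aggregator that relies only on link text would merge the two unrelated "Java" targets and would miss that the last two links concern the same resource.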
Formalization of content relationships through Semantic Web approaches is intended to improve content interoperability. Whereas Web 2.0 applications and practices facilitate the emergence of content structures through user contributions, the Semantic Web approach imposes an ontology that defines a structure through object relationships.
The Semantic Web format is known as the Resource Description Framework (RDF), which refers to the structural mechanisms that define object relationships and data interchange specifications (W3C 2011). Of interest here are the object relationships, which are expressed in triples (i.e. subject – predicate – object) and are the building blocks for creating a localized ontology.
The RDF triple is different from the hyperlink in two important ways. First, it defines object relationships through simple facts about kinds of objects within a particular domain. Second, each part of the triple has a unique identifier, or Uniform Resource Identifier (URI). In comparison, the hyperlink connects two objects and in the process the relationship is named in ways that are not always relevant to the meaning of the relationship. For example, in a common link naming convention such as ‘you can find the report here,’ the word ‘here’ does very little to give meaning to the link.
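As an illustration, a triple can be modeled as three URIs in the standard RDF ordering of subject, predicate, and object. The sketch below uses plain Python rather than an RDF library. The predicates come from the Dublin Core terms vocabulary, while the example.org book and person URIs, the Triple class, and the helper functions are hypothetical and are not taken from the project's ontology.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


DCTERMS = "http://purl.org/dc/terms/"   # Dublin Core terms namespace
BOOK = "http://example.org/books/"      # hypothetical namespace
PEOPLE = "http://example.org/people/"   # hypothetical namespace

triples = [
    Triple(BOOK + "chapter-3", DCTERMS + "isPartOf", BOOK + "book-1"),
    Triple(BOOK + "chapter-3", DCTERMS + "creator", PEOPLE + "author-a"),
    Triple(BOOK + "chapter-3", DCTERMS + "references", BOOK + "chapter-7"),
]


def to_ntriples(triple):
    """Serialize one all-URI triple as a line of N-Triples."""
    return f"<{triple.subject}> <{triple.predicate}> <{triple.obj}> ."


def objects_of(subject, predicate, graph):
    """Return every object linked to the subject by the given predicate."""
    return [t.obj for t in graph if t.subject == subject and t.predicate == predicate]


for t in triples:
    print(to_ntriples(t))
print(objects_of(BOOK + "chapter-3", DCTERMS + "creator", triples))
```

Because the predicate is itself a URI, the relationship carries a name that is shared and unambiguous, which is what allows statements from different sources to be combined.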
Additionally, the objects are connected by a hyperlink, which itself does not have a unique URI as such. Instead, the HTML code that defines a hyperlink is embedded in the local content and the code construction includes the URL for the linked-to object. The three distinct parts of the object relationship are identifiable within the local HTML code, but the hyperlink is not an entity separate from the two linked objects. Functionally, the RDF triple and hyperlink are surprisingly similar. Nevertheless, implications of these seemingly minor differences are particularly relevant at the level of content aggregation.
Semantic web formats “gain their expressive power at the expense of increasingly complex design processes, in particular, when it comes to the design of an ontology” and “increased complexity of concepts, deductions and other computations” (Vossen and Hagemann, 2007: 335). The expected payoff, though, is increased content interoperability within and across different knowledge domains. Predefined triples, combined as a local content ontology, are the basis for the sorts of sophisticated content organization and machine readability envisioned by Semantic Web approaches.
In the face of transformative Internet technologies over the past couple of decades, the formal system of academic publishing has seen relatively little change. This is not surprising given the need to maintain integrity and organization of formal knowledge. Semantic Web structures are envisioned as a way to create more precise interoperability between concepts and terms within and across knowledge domains, while still retaining a rigorous grip on the accumulation of new knowledge. Meanwhile, a wide variety of new informal communication practices have emerged at a pace that more closely resembles popular use of digital media. Popularity in academic use of Web 2.0 applications, such as blogs, Twitter, video and image sharing sites, and related features of social networking services, indicates changes in the form of scholarly communication, in addition to the emergence of new communication practices.
A significant challenge in this project is the tension between our aim to expose book content to the kinds of intertextual discourse possible on the Web and the formal content structure facilitated by Semantic Web formats. Combining attributes serves to offset their respective weaknesses while also avoiding the sticky issue of distinguishing between formal and informal modes of communication. Whereas the Semantic Web triple establishes object relationships within a particular set of content, the Web 2.0 hyperlink is always a specific relationship between two content objects located on the Web. This fundamental conceptual difference creates a tension, which to some extent is indicative of the scholarly communication system in a state of uncertain change.
The juxtaposition of formal and informal communication with respect to Semantic Web and Web 2.0 provides an opportunity to reflect on normative roles of scholarly communication in relation to emerging new practices. In our hybrid approach, we expose book content to the construction of intertextual discourses occurring on the Web. In addition, the content is hyperlinked to cited references and related resources. In this way, books can be contextualized within the discourses and resources. This sort of situating of book content actively increases its exposure on the Web through increased access within a network and through increased visibility in search engine queries.
We also structure the book content through formal object relationships defined in a book-website ontology. Exposing book content to the burgeoning Semantic Web also increases its exposure, but in a more passive way and potentially in a more precise way. It is passive because access to Semantic Web aggregation is still somewhat limited to specialized repositories and machine aggregation that adds an additional layer of mediation between humans and content. While Semantic Web projects seem to be on the rise, expected contribution to scholarly communication more broadly is presently seen as a longer-term investment.
UPDATE 09 June 2011: See here for a diagram of our hybrid approach. It displays functional modules in three clusters: the WordPress software (upper left), community-developed WordPress plugins (lower left) and our custom plugins (right).
Halavais, Alexander. 2008. “The Hyperlink as Organizing Principle.” In The hyperlinked society: questioning connections in the digital age, eds. Joseph Turow and Lokman Tsui. University of Michigan Press.
Harley, Diane, Sophia Acord, Sarah Earl-Novell, Shannon Lawrence, and C. Judson King. 2010. Assessing the Future Landscape of Scholarly Communication: An Exploration of Faculty Values and Needs in Seven Disciplines. Center for Studies in Higher Education, UC Berkeley.
Procter, Rob et al. 2010. “Adoption and use of Web 2.0 in scholarly communications.” Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 368(1926): 4039-4056.
Vossen, Gottfried, and Stephan Hagemann. 2007. Unleashing Web 2.0: From Concepts to Creativity. Morgan Kaufmann.
W3C. “Resource Description Framework (RDF): Concepts and Abstract Syntax.” (Accessed June 5, 2011).