Our team is actively engaged in a variety of projects, ranging from well-established endeavors to those in the early stages of development. Highlights of select initiatives are presented below. Students interested in contributing to research efforts are encouraged to reach out to Dr. Kelly for more information and potential opportunities.
Advertisements have served as the basis for rich analysis by scholars in many fields, particularly in the humanities and social sciences. Arguably, the mores and norms of a time and place can be surmised to some extent by examining a body of ads; more narrowly, ads can at least provide interesting contextual information about the items they surround. Although print publications used to be the dominant format for distributing information, online access to information and publications is now the norm, and the threat of losing today’s digital material mirrors the known loss of print material from the past. The project's foundational work allowed for quantitative and qualitative analysis of the current shortcomings of archival copies of advertisements embedded in web pages. Through this project we began to determine how to improve software-based tools to capture specific types of ads. Our project explored the paradox of having an abundance of information online while digital artifacts are disappearing. In physical archives, what “gets into” an archive reflects the values of particular institutions (and funding availability), as well as the professional judgement of archivists. The project extended a vital, timely initiative that addresses both the technological shortcomings of and human negligence in archiving advertisements embedded in web pages. The project focused on personalized ads – those viewed on the live web by users and typically not surfaced for capture by archival crawlers.
Mat Kelly, Alexander H. Poole, Michele C. Weigle, Michael L. Nelson, Travis Reid, Christopher Rauch, and Hyung Wook Choi, “What You See No One Saw,” Presented at the International Internet Preservation Consortium (IIPC) Web Archiving Conference (WAC) 2025, Oslo, Norway, April 8–10, 2025.
Christopher Rauch, Alex H. Poole, Travis Reid, Michele C. Weigle, Michael L. Nelson, Faryaneh Poursardar, and Mat Kelly, “Archiving Digital Marketing: Examining Preservation of Dynamic Content on the Web Through the Lens of Online Advertisements,” In Proceedings of the 20th International Conference on Digital Preservation (iPRES) 2024, Ghent, Belgium, September 16–20, 2024.
Christopher Rauch, Mat Kelly, Alex H. Poole, Travis Reid, Faryaneh Poursardar, Michael L. Nelson, and Michele C. Weigle, “Contextual Archiving of Web Page Advertisements Using Persona-Based Tools,” Presented at the Society of American Archivists (SAA) Research Forum, Online, July 17, 2024.
Christopher Rauch, Mat Kelly, Alexander H. Poole, Michele C. Weigle, Michael L. Nelson, and Travis Reid, “Saving Ads: Assessing and Improving Web Archives’ Holdings of Online Advertisements,” Presented at the International Internet Preservation Consortium (IIPC) Web Archiving Conference (WAC) 2024, Paris, France, April 24–26, 2024.
(project description)
Christopher B. Rauch, Hyung Wook Choi, and Mat Kelly, “Beyond the ‘Idols of the Marketplace’: Managing Semantic Change in Research,” In Proceedings from the 2024 Annual Meeting of the Document Academy, Vol. 11, Issue 2, Article 3.
Hyung Wook Choi and Mat Kelly, “Documenting Semantic Shifts Across Domains with Version Control,” Presented at the Annual Meeting of the Document Academy (DOCAM ’24), Philadelphia, Pennsylvania, September 19–21, 2024.
Hyung Wook Choi and Mat Kelly, “On Identifying Points of Semantic Shift Across Domains,” Presented at the 17th International Conference on Metadata and Semantic Research (MTSR), Milan, Italy, October 25–27, 2023.
The project collected faculty affiliation data from 35 HBCUs with master's- or doctoral-level programs. It used the Internet Archive as the primary data source and LinkedIn, ORCID, and ProQuest as secondary data sources. The project linked large, heterogeneous corpora of faculty affiliation data, Carnegie Classification institution profile data, Web of Science publication and citation data, and survey and interview data. The linked data were used to conduct expansive, cross-domain examinations of the impact of academic moves on individual professors’ research activity and institutional human capital. The project employed statistical modeling and historical comparisons in combination with surveys and interviews. The combination of quantitative and qualitative results provided evidence concerning both the causes of institutional human capital change at HBCUs and the effect of moves on professors’ research activities. This project contributed new knowledge on academic mobility, particularly for minority-serving institutions (MSIs). The project designed an interactive visual dashboard to share project outputs broadly. The visual dashboard will be updated annually in September for three additional years beyond the conclusion of the project. The results of this project will provide insights for administrators and policy makers.
This project is funded by the National Science Foundation (NSF), Science of Science: Discovery, Communication, and Impact award #2122525.
IPARO is a decentralized version tracking system using the existing primitives of IPFS (InterPlanetary File System) and IPNS (InterPlanetary Name System). IPARO proposes the concept of IPMT (InterPlanetary Media Types) and namespacing so that it can be used in other applications that require versioning, such as a wiki or a collaborative code tracking system. The proposed system does not rely on any centralized server for archiving or replay of the content and continues to allow aggregators to play their role from which both large and small archives can benefit and flourish. This project entails creating an implementation of the proposed system and evaluating the decentralized version tracking system in the context of web archives.
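The general idea of tracking versions on top of content addressing and a mutable name can be illustrated with a small sketch. This is not the IPARO design: the project's linking strategy and record format are not described here, so the sketch assumes a simple linear chain in which each version links to its predecessor. The class names `ContentStore` and `VersionedName` are hypothetical, an in-memory dictionary stands in for IPFS, a SHA-256 digest stands in for a content identifier, and a single mutable attribute stands in for an IPNS record.

```python
import hashlib
import json


class ContentStore:
    """In-memory stand-in for a content-addressed store such as IPFS (assumption)."""

    def __init__(self):
        self._blocks = {}

    def put(self, obj):
        # The key is derived from the bytes themselves, so identical
        # records always map to the same key.
        data = json.dumps(obj, sort_keys=True).encode("utf-8")
        key = hashlib.sha256(data).hexdigest()
        self._blocks[key] = data
        return key

    def get(self, key):
        return json.loads(self._blocks[key].decode("utf-8"))


class VersionedName:
    """Hypothetical mutable name whose head points at the newest version record."""

    def __init__(self, store):
        self.store = store
        self.head = None  # stand-in for the value an IPNS name would resolve to

    def publish(self, content, timestamp):
        # Each record links back to the previous head, forming a chain.
        record = {"content": content, "timestamp": timestamp, "prev": self.head}
        self.head = self.store.put(record)
        return self.head

    def history(self):
        # Walk from the newest version back to the first one.
        key = self.head
        while key is not None:
            record = self.store.get(key)
            yield key, record
            key = record["prev"]


store = ContentStore()
page = VersionedName(store)
page.publish("<html>capture one</html>", "2024-01-01T00:00:00Z")
page.publish("<html>capture two</html>", "2024-02-01T00:00:00Z")
for key, record in page.history():
    print(key[:12], record["timestamp"])
```

Because every record is addressed by its own hash and names its predecessor, no central server is needed to enumerate versions; anyone who can resolve the name and fetch blocks can walk the chain. A linear chain makes reaching old versions slow, which is one reason a real system would likely use a different linking strategy.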
YAMZ (pronounced “yams”) is an open, crowdsourced metadata vocabulary and terminology platform: a “metadictionary” that spans all domains and types of metadata “speech,” including names, values, units, and relationships. It enables users to browse, add, import, tag, and link terms, obtain permanent identifiers (ARK permalinks), and iteratively refine terminology via community dialog and voting.
Jane Greenberg, Scott McClellan, Christopher Rauch, Xintong Zhao, Mat Kelly, Yuan An, John Kunze, Rachel Orenstein, Claire Porter, Vanessa Meschke, and Eric Toberer, “Building Community Consensus for Scientific Metadata with YAMZ,” Data Intelligence, 5(1), pp. 242–260, February 2023.
Christopher B. Rauch, Mat Kelly, John Kunze, and Jane Greenberg, “FAIR Metadata: A Community-driven Vocabulary Application,” In Proceedings of the 15th International Conference on Metadata and Semantic Research (MTSR), Online, pp. 187–198, November 29–December 3, 2021.
Mat Kelly, Christopher B. Rauch, John Kunze, Sam Grabus, Joan Boone, Peter M. Logan, and Jane Greenberg, “Archival Resource Keys for Collaborative Historical Ontology Publication,” In Proceedings of the International Conference on ICT enhanced Social Sciences and Humanities (ICTeSSH 2021), June 28–30, 2021.
Mat Kelly, Christopher B. Rauch, Jane Greenberg, Sam Grabus, Joan Boone, John Kunze, and Peter M. Logan, “Advancing ARKs in the Historical Ontology Space,” Code4Lib Journal, Issue 50, Feb 2021.
Mat Kelly, Jane Greenberg, Christopher B. Rauch, Joan P. Boone, John Kunze, and Peter Logan, “Of ARKs and Ontologies,” Presented at PIDapalooza 2021, Online Meeting, 27–28 January 2021.
Mat Kelly, Jane Greenberg, Christopher B. Rauch, Sam Grabus, Joan P. Boone, John A. Kunze, and Peter Melville Logan, “A Computational Approach to Historical Ontologies,” In Proceedings of the 5th Computational Archival Science Workshop at the 2020 IEEE International Conference on Big Data (IEEE BigData 2020), Atlanta, Georgia, pp. 1878–1883, December 2020.
MITRE’s Venom Project is developing a framework for archiving the dark web. One of the core challenges to our research is tracking the Uniform Resource Identifiers (URIs) – the .onion addresses – of dark web sites as they shift over time. The project team has developed a .onion canonicalization service to help monitor and track the .onions over time. The Contractor is expected to construct a prototype persistent identifier for dark web sites (and their .onion URLs). The prototype should work with dark web sites as well as personalized surface web sites.
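To make the notion of canonicalization concrete, the sketch below shows one way several spellings of the same .onion URL could be reduced to a single form. The rules used by the project's canonicalization service are not described here, so everything in this example is an assumption: the function name `canonicalize_onion` is hypothetical, and the choices to drop the scheme, port, and subdomain labels are illustrative only. The one fixed point is the address format: 16 (version 2) or 56 (version 3) base32 characters followed by `.onion`.

```python
import re
from urllib.parse import urlsplit

# 16 (v2) or 56 (v3) base32 characters followed by ".onion".
ONION_HOST = re.compile(r"^(?:[a-z2-7]{16}|[a-z2-7]{56})\.onion$")


def canonicalize_onion(url):
    """Hypothetical helper: reduce a .onion URL to 'address.onion/path'."""
    url = url.strip()
    if "://" not in url:
        # urlsplit only recognizes the host when a scheme is present.
        url = "http://" + url
    parts = urlsplit(url)
    # .hostname is already lowercased and has the port removed.
    host = parts.hostname or ""
    # Assumption: subdomain labels (e.g. "www.") name the same service,
    # so only the address label and "onion" are kept.
    base = ".".join(host.split(".")[-2:])
    if not ONION_HOST.match(base):
        raise ValueError("not a .onion address: " + repr(url))
    path = parts.path or "/"
    return base + path


address = "a" * 56  # placeholder with a valid shape, not a real service
print(canonicalize_onion("HTTP://WWW." + address.upper() + ".ONION:80"))
print(canonicalize_onion(address + ".onion/index.html"))
```

A canonical form like this only groups spellings of one address. Tracking a site as it moves to a new address is a separate problem, which is what the persistent identifier is meant to address.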
InterPlanetary Wayback (ipwb) RQs:
WARCreate MV3 RQs:
Web Archiving Integration Layer
Mink: MV3 Client-side MMA Temporal Visualization RQs:
Yet Another Metadata Zoo (YAMZ) - (Chris) #Crowdsourcing #Linked Data #Vocabularies #Term Evolution #Python RQs:
Forbidden Visualization (need diff name) #Web Archives #Visualization RQs:
Content Negotiation of Web Archives in Dimensions Beyond Time RQs:
TOR-based archiving/access regulation (PWAA) RQs:
Pre-Alpha (Conceptual, Needs Preliminary Investigation)
Alpha (Work Started, Previous Pubs)
Beta (Work Heavily Underway)