Projects

Our team is actively engaged in a variety of projects, ranging from well-established endeavors to those in the early stages of development. Highlights of select initiatives are presented below. Students interested in contributing to research efforts are encouraged to reach out to Dr. Kelly for more information and potential opportunities.

Projects Index

  • Modeling Semantic Shift
  • IPARO Simulation
  • Yet Another Metadata Zoo (YAMZ)
  • Saving Web Ads
  • HBCU Mobility
  • VENOM: Archiving the Dark Web

Ongoing Projects

Saving Ads: Assessing and Improving Web Archives’ Holdings of Online Advertisements & Preserving Personalized Advertisements for More Accurate Web Archives

Advertisements have served as the basis for rich analysis by scholars in many fields, particularly in the humanities and social sciences. Arguably, the mores and norms of a time and place can be surmised to some extent through examination of a body of ads or, more narrowly, ads can at least provide interesting contextual information about the items they surround. Although print publications used to be the dominant information distribution format, access to information and publications online is now the norm, and the threat of losing today’s digital material mirrors the known loss of print material from the past. This foundational work allowed for quantitative and qualitative analysis of the current shortcomings of archival copies of advertisements embedded in web pages. Through this project, we began to determine how to improve software-based tools to capture specific types of ads. Our project explored the paradox of having an abundance of information online at the same time we face the disappearance of digital artifacts. In physical archives, what “gets into” an archive reflects the values of particular institutions (and funding availability), as well as the professional judgement of archivists. The project extended a vital, timely initiative that addresses both the technological shortcomings of and human negligence in archiving advertisements embedded in web pages. The project focused on personalized ads – those viewed on the live web by users and typically not surfaced for capture by archival crawlers.

Collaborators

  • Mat Kelly (PI, Project Lead, Drexel University)
  • Alexander H. Poole (co-PI, Drexel University)
  • Michael L. Nelson (co-PI, Old Dominion University)
  • Michele C. Weigle (co-PI, Old Dominion University)
  • Christopher Rauch (PhD researcher, Drexel University)
  • Travis Reid (PhD researcher, Old Dominion University)
  • Hyung Wook Choi (PhD researcher, Drexel University)

Associated Publications

Mat Kelly, Alexander H. Poole, Michele C. Weigle, Michael L. Nelson, Travis Reid, Christopher Rauch, and Hyung Wook Choi, “What You See No One Saw,” Presented at the International Internet Preservation Consortium (IIPC) Web Archiving Conference (WAC) 2025, Oslo, Norway, April 8–10, 2025.

Christopher Rauch, Alex H. Poole, Travis Reid, Michele C. Weigle, Michael L. Nelson, Faryaneh Poursardar, and Mat Kelly, “Archiving Digital Marketing: Examining Preservation of Dynamic Content on the Web Through the Lens of Online Advertisements,” In Proceedings of the 20th International Conference on Digital Preservation (iPRES) 2024, Ghent, Belgium, September 16–20, 2024.

Christopher Rauch, Mat Kelly, Alex H. Poole, Travis Reid, Faryaneh Poursardar, Michael L. Nelson, and Michele C. Weigle, “Contextual Archiving of Web Page Advertisements Using Persona-Based Tools,” Presented at the Society of American Archivists (SAA) Research Forum, Online, July 17, 2024.

Christopher Rauch, Mat Kelly, Alexander H. Poole, Michele C. Weigle, Michael L. Nelson, and Travis Reid, “Saving Ads: Assessing and Improving Web Archives’ Holdings of Online Advertisements,” Presented at the International Internet Preservation Consortium (IIPC) Web Archiving Conference (WAC) 2024, Paris, France, April 24–26, 2024.

Modeling Semantic Shift Using Version Control Paradigms

(project description)

Collaborators

  • Mat Kelly (Project Lead, Drexel University)
  • Hyung Wook Choi (PhD Research Lead, Drexel University)

Associated Publications

Christopher B. Rauch, Hyung Wook Choi, and Mat Kelly, “Beyond the ‘Idols of the Marketplace’: Managing Semantic Change in Research,” In Proceedings from the 2024 Annual Meeting of the Document Academy, Vol. 11, Issue 2, Article 3.

Hyung Wook Choi and Mat Kelly, “Documenting Semantic Shifts Across Domains with Version Control,” Presented at the Annual Meeting of the Document Academy (DOCAM ’24), Philadelphia, Pennsylvania, September 19–21, 2024.

Hyung Wook Choi and Mat Kelly, “On Identifying Points of Semantic Shift Across Domains,” Presented at the 17th International Conference on Metadata and Semantic Research (MTSR), Milan, Italy, October 25–27, 2023.

HBCU Mobility

The project collected faculty affiliation data from 35 HBCUs with master’s- or doctoral-level programs. It used the Internet Archive as the primary data source and LinkedIn, ORCID, and ProQuest as secondary data sources. The project linked large, heterogeneous corpora of faculty affiliation data, Carnegie Classification institution profile data, Web of Science publication and citation data, and survey and interview data. The linked data was used to conduct expansive, cross-domain examinations of the impact of academic moves on individual professors’ research activity and institutional human capital. The project employed statistical modeling and historical comparisons in combination with surveys and interviews. The combination of quantitative and qualitative results provided evidence concerning both the causes of institutional human capital change at HBCUs and the effect of moves on professors’ research activities. This project contributed new knowledge on academic mobility, particularly for minority-serving institutions (MSIs). The project designed an interactive visual dashboard to share project outputs broadly. The visual dashboard will be updated annually in September for three additional years beyond the conclusion of the project. The results of this project will provide insights for administrators and policy makers.

  • https://hbcumobility.cci.drexel.edu/

Collaborators

  • Erjia Yan (Project Lead, Drexel University)
  • Robert Palmer (co-PI, Howard University)
  • Mat Kelly (co-PI, Drexel University)
  • Jiangen He (co-PI, University of Tennessee - Knoxville)
  • Chaoqun Ni (co-PI, University of Wisconsin - Madison)
  • Deanna Zarrillo (PhD Researcher, Drexel University)
  • Christopher Jackson (Student Researcher, University of Tennessee - Knoxville)
  • Noah Osman Azehaf (Student Researcher, University of Tennessee - Knoxville)
  • Nathelen Wanjiru (Student Researcher, University of Tennessee - Knoxville)
  • Baolu Yu (Student Researcher, University of Wisconsin - Madison)
  • Xi Hong (Student Researcher, University of Wisconsin - Madison)
  • Xiang Zheng (Student Researcher, University of Wisconsin - Madison)
  • Mungshu Shen (Student Researcher, University of Tennessee - Knoxville)

Support

This project is funded by the National Science Foundation (NSF), Science of Science: Discovery, Communication, and Impact award #2122525.

IPARO Simulation and Evaluation - https://seniorproject.cci.drexel.edu/project/f3447740-0655-4766-a836-935aaaa732b8/

IPARO is a decentralized version tracking system using the existing primitives of IPFS (InterPlanetary File System) and IPNS (InterPlanetary Name System). IPARO proposes the concept of IPMT (InterPlanetary Media Types) and namespacing so that it can be used in other applications that require versioning, such as a wiki or a collaborative code tracking system. The proposed system does not rely on any centralized server for archiving or replay of the content and continues to allow aggregators to play their role from which both large and small archives can benefit and flourish. This project entails creating an implementation of the proposed system and evaluating the decentralized version tracking system in the context of web archives.
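
The general idea of version tracking on top of content addressing and a mutable name can be sketched as follows. This is only an illustration under stated assumptions, not the IPARO design or its implementation: ContentStore and NameRegistry are hypothetical in-memory stand-ins for IPFS and IPNS, the record fields (content, timestamp, previous) are invented for the example, and IPMT and namespacing are not modeled. Each version is stored as a small record that links to the address of the previous version, and the name is repointed to the newest record.

```python
import hashlib
import json


class ContentStore:
    """Hypothetical in-memory stand-in for a content-addressed store (IPFS-like)."""

    def __init__(self):
        self._blocks = {}

    def put(self, data: bytes) -> str:
        # The address is derived from the bytes themselves (SHA-256 hex here,
        # which is an assumption for the sketch, not a real IPFS CID).
        address = hashlib.sha256(data).hexdigest()
        self._blocks[address] = data
        return address

    def get(self, address: str) -> bytes:
        return self._blocks[address]


class NameRegistry:
    """Hypothetical in-memory stand-in for a mutable naming layer (IPNS-like)."""

    def __init__(self):
        self._names = {}

    def publish(self, name: str, address: str) -> None:
        self._names[name] = address

    def resolve(self, name: str):
        # Returns None when the name has never been published.
        return self._names.get(name)


def add_version(store, names, name, content: bytes, timestamp: str) -> str:
    """Store a new version, link it to the prior one, and repoint the name."""
    record = {
        "content": store.put(content),
        "timestamp": timestamp,
        "previous": names.resolve(name),  # None for the first version
    }
    encoded = json.dumps(record, sort_keys=True).encode("utf-8")
    address = store.put(encoded)
    names.publish(name, address)
    return address


def history(store, names, name):
    """Yield (timestamp, content) pairs from the newest version to the oldest."""
    address = names.resolve(name)
    while address is not None:
        record = json.loads(store.get(address).decode("utf-8"))
        yield record["timestamp"], store.get(record["content"])
        address = record["previous"]
```

In this sketch, no central index of versions is kept: anyone who can resolve the name and fetch blocks by address can walk the chain backwards. Walking the chain takes one fetch per version, which is one of the trade-offs an evaluation of such a system might examine.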

Collaborators

  • Mat Kelly (Project Co-lead, Drexel University)
  • Sawood Alam (Project Co-lead, Internet Archive)
  • John Nguyen (Undergraduate Student Lead, Drexel University)
  • Patrick Le (Undergraduate Senior Project Lead, Drexel University)
  • Benji Bui (Undergraduate Senior Project Researcher, Drexel University)
  • Alexander Grigorian (Undergraduate Senior Project Researcher, Drexel University)
  • Thiyazan Qaissi (Undergraduate Senior Project Researcher, Drexel University)
  • Alexey Kuraev (Undergraduate Senior Project Researcher, Drexel University)

YAMZ - Yet Another Metadata Zoo

YAMZ (pronounced “yams”) is an open, crowdsourced metadata vocabulary and terminology platform: a “metadictionary” that spans all domains and types of metadata “speech,” including names, values, units, and relationships. It enables users to browse, add, import, tag, and link terms, obtain permanent identifiers (ARK permalinks), and iteratively refine terminology via community dialog and voting.

Collaborators

  • John Kunze (Project Lead)
  • Jane Greenberg (Project Collaborator, Drexel University)
  • Mat Kelly (Project Collaborator, Drexel University)
  • Christopher Rauch (PhD Researcher, Drexel University)
  • Hyung Wook Choi (PhD Researcher, Drexel University)
  • Scott McClellan (PhD Researcher, Drexel University)
  • Naima Sultana (MS Student Researcher, Drexel University)
  • Haard Doshi (STAR Scholar Undergraduate Researcher, Drexel University)

Associated Publications

Jane Greenberg, Scott McClellan, Christopher Rauch, Xintong Zhao, Mat Kelly, Yuan An, John Kunze, Rachel Orenstein, Claire Porter, Vanessa Meschke, and Eric Toberer, “Building Community Consensus for Scientific Metadata with YAMZ,” Data Intelligence, 5(1), pp. 242–260, February 2023.

Christopher B. Rauch, Mat Kelly, John Kunze, and Jane Greenberg, “FAIR Metadata: A Community-driven Vocabulary Application,” In Proceedings of the 15th International Conference on Metadata and Semantic Research (MTSR), Online, pp. 187–198, November 29–December 3, 2021.

Mat Kelly, Christopher B. Rauch, John Kunze, Sam Grabus, Joan Boone, Peter M. Logan, and Jane Greenberg, “Archival Resource Keys for Collaborative Historical Ontology Publication,” In Proceedings of the International Conference on ICT enhanced Social Sciences and Humanities (ICTeSSH 2021), June 28–30, 2021.

Mat Kelly, Christopher B. Rauch, Jane Greenberg, Sam Grabus, Joan Boone, John Kunze, and Peter M. Logan, “Advancing ARKs in the Historical Ontology Space,” Code4Lib Journal, Issue 50, February 2021.

Mat Kelly, Jane Greenberg, Christopher B. Rauch, Joan P. Boone, John Kunze, and Peter Logan, “Of ARKs and Ontologies,” Presented at PIDapalooza 2021, Online Meeting, 27–28 January 2021.

Mat Kelly, Jane Greenberg, Christopher B. Rauch, Sam Grabus, Joan P. Boone, John A. Kunze, and Peter Melville Logan, “A Computational Approach to Historical Ontologies,” In Proceedings of the 5th Computational Archival Science Workshop at the 2020 IEEE International Conference on Big Data (IEEE BigData 2020), Atlanta, Georgia, pp. 1878–1883, December 2020.

VENOM: Archiving the Dark Web

MITRE’s Venom Project is developing a framework for archiving the dark web. One of the core challenges of our research is tracking the Uniform Resource Identifiers (URIs) – the .onion addresses – of dark web sites as they shift over time. The project team has developed a .onion canonicalization service to help monitor and track .onion addresses over time. The Contractor is expected to construct a prototype persistent identifier for dark web sites (and their .onion URLs). The prototype should work with dark web sites as well as personalized surface web sites.

Collaborators

  • Justin F. Brunelle (Project Co-Lead, MITRE Corp)
  • Mat Kelly (Project Co-Lead, Drexel University)
  • Christopher Rauch (PhD researcher, Drexel University)

Projects Seeking Students

InterPlanetary Wayback (ipwb) RQs:

WARCreate MV3 RQs:

Web Archiving Integration Layer

  • UI/UX (Dhanashree)
  • Python -> native app (macOS) RQs:

Mink: MV3 Client-side MMA Temporal Visualization RQs:

Yet Another Metadata Zoo (YAMZ) - (Chris) #Crowdsourcing #Linked Data #Vocabularies #Term Evolution #Python RQs:

Forbidden Visualization (need diff name) #Web Archives #Visualization RQs:

Content Negotiation of Web Archives in Dimensions Beyond Time RQs:

TOR-based archiving/access regulation (PWAA) RQs:

  • Pre-Alpha (Conceptual, Needs Preliminary Investigation)
  • Alpha (Work Started, Previous Pubs)
  • Beta (Work Heavily Underway)