The digitization of the US Patent and Trademark Office’s (USPTO) backfile of six million patents, undertaken between 1951 and 2001, was a five-decade struggle that spanned several media transitions: from print and microfilm to CD-ROMs and, finally, the Web. Although this mass digitization project is similar in scale to Google Books and the Internet Archive, it is rarely discussed in critical digitization scholarship, and its significance as a tool for knowledge production is seldom examined. In this article, I focus on the digital and physical material form of the USPTO patent document and on how the current paradigm of access to and storage of the digital backfile emerged. Through this case study, I build upon Ian Milligan’s distinction between the ‘text’ and ‘platform’ layers of a digitization project to demonstrate how historical decisions regarding format and metadata continue to influence how users retrieve and interpret documents, such as patents, online.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.