A catalyst for continuing advancement

New discoveries are made each day at the intersection of data science and patent analysis. Patent Research Foundation and members of its Advisory Board actively participate in research and publish in business, law, economics, and computer science, providing a depth of expertise and transparency unique in the patent field. The following articles have been enabled by our data, and more research is underway:

Patent Citations Reexamined: New Data and Methods

Jeffrey M. Kuhn, University of California, Haas School of Business & Berkeley School of Law 
Kenneth A. Younge, École Polytechnique Fédérale de Lausanne
Alan Marco, Chief Economist, United States Patent and Trademark Office

Existing measures of innovation often rely on patent citations to indicate intellectual lineage and impact. We show that the data-generating process for patent citations has changed substantially since citation-based measures were validated a decade ago.

Available at SSRN

Patent-to-Patent Similarity: A Vector Space Model

Jeffrey M. Kuhn, University of North Carolina (UNC) at Chapel Hill – Kenan-Flagler Business School
Kenneth A. Younge, École Polytechnique Fédérale de Lausanne

Current measures of patent similarity rely on the manual classification of patents into taxonomies. In this project, we leverage information retrieval theory and Big Data methods to develop a machine-automated measure of patent-to-patent similarity. We validate the measure and demonstrate that it significantly improves upon existing patent classification systems.

Available at SSRN

Does Winning a Patent Race Lead to More Follow-On Innovation?

Neil Thompson, MIT – Sloan School of Management
Jeffrey M. Kuhn, University of North Carolina (UNC) at Chapel Hill – Kenan-Flagler Business School

Competition between firms to invent and patent an idea, or “patent racing,” has been much discussed in theory, but seldom analyzed empirically. This article introduces an empirical way to identify patent races, and provides the first broad-based view of them in the real world. It reveals that patent races are common, particularly in information-technology fields.

Available at SSRN

Property Rights and Frictions in the Sale of Patents

Jeffrey M. Kuhn, University of California, Haas School of Business & Berkeley School of Law

Patent scope is central to the sale of ideas, which can spur economic growth and provide significant gains from trade. Awarding an inventor a patent on a new idea partially solves a commitment problem that would otherwise prevent the inventor from selling the idea (Arrow, 1962). In the absence of a patent, a prospective buyer cannot credibly promise not to steal the idea should the inventor reveal it, while the inventor cannot credibly promise to reveal the idea…

Available at SSRN

Efficient Sparse Matrix-Matrix Multiplication on Multicore Architectures

Adam Lugowski, University of California Santa Barbara, Computer Science Department
John R. Gilbert, University of California Santa Barbara, Computer Science Department

We describe a new parallel sparse matrix-matrix multiplication algorithm in shared memory using a quadtree decomposition. Our preliminary implementation is nearly as fast as the best sequential method on one core, and scales well to multiple cores.

Available at UCSB

Datasets for future research

Our datasets are provided subject to the Creative Commons Attribution-NonCommercial-NoDerivatives license. No co‑authorship is required to use the data in academic research – please just cite the supporting article. If you would like to be notified of future data releases, please let us know at research@patrf.org.

Patent Citation Similarity Dataset
Kuhn-Younge-Marco_Patent_Citation_Similarity_2017-10-23.zip 819 MiB

Many studies of innovation rely on patent citations to measure intellectual lineage and impact. To create this dataset, we use a vector space model (VSM) of patent similarity to compute the technological similarity between each pair of citing-cited patents. The VSM analyzes the full text of each document to position it as a vector in a vector space that includes more than 700,000 dimensions and then calculates the angular distance between the two vectors. The dataset includes similarity values for all citations made by patents issued between 1976 and 2017 to issued patents or published patent applications.

Supporting Article: Patent Citations Reexamined: New Data and Methods
By Jeffrey Kuhn, Kenneth Younge, Alan Marco
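
The angular comparison described for the Patent Citation Similarity Dataset can be illustrated with a small sketch. This is not the authors' implementation: the tokenizer, the raw term-count weighting, and the function names text_to_vector, cosine_similarity, and angular_distance are all assumptions chosen for illustration. It only shows how two documents, once represented as sparse vectors, can be compared by the angle between them.

```python
import math
import re
from collections import Counter


def text_to_vector(text):
    """Map a document to a sparse term-count vector.

    Assumption: lowercase alphabetic tokens with raw counts as weights.
    The weighting used for the actual dataset is not specified here.
    """
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * v[term] for term, weight in u.items() if term in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty document is treated as unrelated to everything
    return dot / (norm_u * norm_v)


def angular_distance(u, v):
    """Angle in radians between two vectors (0 means identical direction)."""
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cos = max(-1.0, min(1.0, cosine_similarity(u, v)))
    return math.acos(cos)
```

With non-negative term weights the angle ranges from 0 (same direction) to pi/2 (no shared terms), so a smaller angle indicates a more similar citing-cited pair.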

Patent Families Dataset
Younge-Kuhn_Patent_Families_2017-09-25.zip 18 MiB

Patent applicants frequently file groups of patent applications linked together by priority claims. These priority claims create families of patent applications that share features such as inventors, priority dates, and technical descriptions. By analyzing these linkages, each patent can be assigned a family identifier that it shares with other patents in the same family. This dataset includes two levels of family identifiers (clone for near copies, and extended for more attenuated linkages) for each patent issued from 2005 to 2014.

Supporting Article: Patent-to-Patent Similarity: A Vector Space Model 
By Kenneth Younge, Jeffrey Kuhn
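
One way to picture how priority-claim linkages yield a shared family identifier is to treat each claim as an edge and label connected groups. The sketch below is an assumption-laden illustration, not the procedure used to build the Patent Families Dataset: the input format (pairs of identifiers), the function name assign_family_ids, and the choice of the smallest identifier as the family label are all hypothetical, and the clone versus extended distinction is not modeled.

```python
def assign_family_ids(patents, priority_links):
    """Give every patent the identifier of its connected group.

    patents: iterable of patent identifiers (hypothetical format).
    priority_links: iterable of (a, b) pairs, meaning a and b are linked
    by a priority claim. The family label is the smallest identifier in
    the group, which is an illustrative convention only.
    """
    parent = {p: p for p in patents}

    def find(p):
        # Follow parent pointers to the group root, shortening the path.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b in priority_links:
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller identifier as root so labels are deterministic.
            if root_b < root_a:
                root_a, root_b = root_b, root_a
            parent[root_b] = root_a

    return {p: find(p) for p in parent}
```

For example, with patents "A1", "A2", "A3", and "B1" and links ("A2", "A1") and ("A3", "A2"), the first three share the label "A1" while "B1" forms a family of its own.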

Patent Citation Timing and Source Dataset
Kuhn-Younge-Marco_Patent_Citation_Source_and_Timing_2017-09-25.zip 292 MiB

Innovation studies frequently distinguish between patent citations submitted by the patent examiner and those submitted by the patent applicant. However, publicly available citations data is often misleading, for instance by attributing a patent citation to the patent examiner when it was, in fact, first submitted by the patent applicant. This dataset uses internal USPTO data to identify the date on which each citation was first submitted as well as the party (examiner or applicant) who first submitted it.

Supporting Article: Patent Citations Reexamined: New Data and Methods
By Jeffrey Kuhn, Kenneth Younge, Alan Marco

Patent Scope and Examiner Toughness Dataset
Kuhn-Thompson_Patent_Scope_2017-10-23.zip 33 MiB

This dataset includes an easy-to-use measure of patent scope that is grounded both in patent law and in the practices of patent attorneys. Our measure counts the number of words in the patent’s first claim. The longer the first claim, the narrower the patent’s scope, because a longer claim contains more details, and every one of those details must be met for another invention to infringe. Hence, the more details the first claim contains, the greater the opportunities for others to invent around it. We validate our measure by showing both that patent attorneys’ subjective assessments of scope agree with our estimates and that the behavior of patenters is consistent with it. To facilitate causal inference with our measure, we show how it can be used to create an instrumental variable, patent examiner Scope Toughness, which we also validate.

Supporting Article: The Ways We’ve Been Measuring Patent Scope are Wrong: How to Measure and Draw Causal Inferences with Patent Scope
By Jeffrey Kuhn, Neil Thompson
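
The scope measure described above, the number of words in a patent's first claim, is simple enough to sketch. The version below rests on assumptions: claims arrive as an ordered list of strings, words are split on whitespace, and a leading claim number such as "1." is dropped before counting. The function name first_claim_word_count is hypothetical, and the dataset's exact tokenization may differ.

```python
import re


def first_claim_word_count(claims):
    """Count the words in the first claim of a patent.

    claims: list of claim texts in order (assumed input format).
    A higher count suggests a narrower patent under the measure above.
    """
    if not claims:
        raise ValueError("patent has no claims")
    # Drop a leading claim number such as "1." or "1)" (assumed formatting).
    text = re.sub(r"^\s*\d+\s*[.)]\s*", "", claims[0])
    # Whitespace tokenization is an illustrative simplification.
    return len(text.split())
```

Under this sketch, "1. A widget comprising a base and a lever." counts 8 words, while a first claim that adds a steel base and a spring attached to the lever counts more and would therefore be read as narrower.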