Scientific research is fundamental to our mission

PRF is at the forefront of scientific computing and big data technologies. Continuing academic exploration ensures that our analytic results are always based on the latest scientific developments rather than on third-party applications. It’s also our passion: as we continue to push the limits of what patent data can reveal, you benefit from our ongoing advancements. Scholars at PRF also research and teach at leading business schools, giving us a unique view on how to use our patent analytics to support your firm’s strategy.

Validation is paramount for trust

The Patent Research Foundation publishes its methodology; our analytics are open, scientifically validated, and peer-reviewed – giving you results that you can trust. Our team includes scholars publishing in business, law, economics, and computer science, providing a depth of expertise and transparency unique in the patent field.

PRF research grants foster collaboration

We provide Research Data Grants of $1,000 to $10,000 for cutting-edge research, as well as access to datasets on patent-to-patent similarity, patent metadata, and other computations from our system. If you would like to be considered for a grant, please email us a one-page research proposal. Proposals are reviewed every six months. We give preference to open research that contributes back to the community, and to data and calculations that can be posted back to the PRF Research page.

Our articles are a catalyst for continuing advancement

The PRF Vector Space Model has played an important role in supporting empirical research in innovation economics and business strategy. Patent-to-patent textual similarity data enables new insights into technological growth, changes in the patent system, and how firms profit from innovation. The following articles have been enabled by our data, and more research is underway:

Patent Citations Reexamined: New Data and Methods

Jeffrey M. Kuhn, University of California, Haas School of Business & Berkeley School of Law
Kenneth A. Younge, École Polytechnique Fédérale de Lausanne
Alan Marco, Chief Economist, United States Patent and Trademark Office

Existing measures of innovation often rely on patent citations to indicate intellectual lineage and impact. We show that the data generating process for patent citations has changed substantially since citation-based measures were validated a decade ago.

Available at SSRN

Patent-to-Patent Similarity: A Vector Space Model

Jeffrey M. Kuhn, University of North Carolina (UNC) at Chapel Hill – Kenan-Flagler Business School
Kenneth A. Younge, École Polytechnique Fédérale de Lausanne

Current measures of patent similarity rely on the manual classification of patents into taxonomies. In this project, we leverage information retrieval theory and Big Data methods to develop a machine-automated measure of patent-to-patent similarity. We validate the measure and demonstrate that it significantly improves upon existing patent classification systems.

Available at SSRN

Does Winning a Patent Race Lead to More Follow-On Innovation?

Neil Thompson, MIT – Sloan School of Management
Jeffrey M. Kuhn, University of North Carolina (UNC) at Chapel Hill – Kenan-Flagler Business School

Competition between firms to invent and patent an idea, or “patent racing,” has been much discussed in theory, but seldom analyzed empirically. This article introduces an empirical way to identify patent races, and provides the first broad-based view of them in the real world. It reveals that patent races are common, particularly in information-technology fields.

Available at SSRN

Property Rights and Frictions in the Sale of Patents

Jeffrey M. Kuhn, University of California, Haas School of Business & Berkeley School of Law

Patent scope is central to the sale of ideas, which can spur economic growth and provide significant gains from trade. Awarding an inventor a patent on a new idea partially solves a commitment problem that would otherwise prevent the inventor from selling the idea (Arrow, 1962). In the absence of a patent, a prospective buyer cannot credibly promise not to steal the idea should the inventor reveal it, while the inventor cannot credibly promise to reveal the idea…

Available at SSRN

Efficient Sparse Matrix-Matrix Multiplication on Multicore Architectures

Adam Lugowski, University of California Santa Barbara, Computer Science Department
John R. Gilbert, University of California Santa Barbara, Computer Science Department

We describe a new parallel sparse matrix-matrix multiplication algorithm in shared memory using a quadtree decomposition. Our preliminary implementation is nearly as fast as the best sequential method on one core, and scales well to multiple cores.

Available at UCSB

Our datasets are available for further research

Our datasets are provided subject to the Creative Commons Attribution-NonCommercial-NoDerivatives license. No co‑authorship is required to use the data in academic research – please just cite the supporting article. If you would like to be notified of future data releases, please let us know at research@patrf.org.

Patent Citation Similarity Dataset
Kuhn-Younge-Marco_Patent_Citation_Similarity_2017-10-23.zip 819 MiB

Many studies of innovation rely on patent citations to measure intellectual lineage and impact. To create this dataset, we use a vector space model (VSM) of patent similarity to compute the technological similarity between each pair of citing and cited patents. The VSM analyzes the full text of each document to position it as a vector in a space of more than 700,000 dimensions, and then calculates the angular distance between the two vectors. The dataset includes similarity values for all citations made by patents issued between 1976 and 2017 to issued patents or published patent applications.

Supporting Article: Patent Citations Reexamined: New Data and Methods
By Jeffrey Kuhn, Kenneth Younge, Alan Marco
(SSRN)
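
The similarity computation described above can be illustrated with a minimal sketch. It is not PRF's implementation: the TF-IDF term weighting, the whitespace tokenizer, and the function names `tfidf_vectors`, `cosine_similarity`, and `angular_distance` are all assumptions made for illustration, and the sample texts are invented. The sketch only shows the general idea of placing documents in a vector space and measuring the angle between two vectors.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn each document string into a sparse TF-IDF vector (term -> weight).

    Assumption: TF-IDF weighting and whitespace tokenization are illustrative
    choices; the source does not state how terms are weighted.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Smoothed inverse document frequency; rarer terms get larger weights.
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def angular_distance(u, v):
    """Angle in radians between two vectors (0 means identical direction)."""
    cos = max(-1.0, min(1.0, cosine_similarity(u, v)))  # guard rounding error
    return math.acos(cos)


# Invented sample texts standing in for full patent documents.
texts = [
    "rotor blade for a wind turbine",
    "wind turbine blade with a composite rotor",
    "method for encrypting network packets",
]
vecs = tfidf_vectors(texts)
print("similar pair:   ", cosine_similarity(vecs[0], vecs[1]))
print("dissimilar pair:", cosine_similarity(vecs[0], vecs[2]))
```

Storing each vector as a dictionary keeps only the non-zero terms, which matters when the space has hundreds of thousands of dimensions and any single document uses a small fraction of them.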

Patent Families Dataset
Younge-Kuhn_Patent_Families_2017-09-25.zip 18 MiB

Patent applicants frequently file groups of patent applications linked together by priority claims. These priority claims create families of patent applications that share features such as inventors, priority dates, and technical descriptions. By analyzing these linkages, each patent can be assigned a family identifier that it shares with other patents in the same family. This dataset includes two levels of family identifiers (clone for near copies, and extended for more attenuated linkages) for each patent issued from 2005 to 2014.

Supporting Article: Patent-to-Patent Similarity: A Vector Space Model 
By Kenneth Younge, Jeffrey Kuhn
(SSRN)
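
One way to picture family assignment is as a connected-components problem over priority claims. The sketch below is an assumption-laden illustration, not the method used to build the dataset: the function name `assign_families`, the `(child, parent)` input format, and the application identifiers are hypothetical, and it produces only a single level of family identifier. How the dataset separates clone families from extended families is not described here, so the sketch does not attempt it.

```python
def assign_families(priority_claims):
    """Group applications into families using union-find.

    priority_claims: iterable of (child, parent) pairs, meaning that `child`
    claims priority to `parent`. Both the input format and the choice of the
    smallest identifier as the family label are illustrative assumptions.
    Returns a dict mapping each application to its family identifier.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        root = x
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every node on the path directly at the root.
        while parent[x] != root:
            parent[x], x = root, parent[x]
        return root

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller identifier as root so labels are deterministic.
            if root_b < root_a:
                root_a, root_b = root_b, root_a
            parent[root_b] = root_a

    for child, claimed in priority_claims:
        union(child, claimed)
    return {app: find(app) for app in list(parent)}


# Hypothetical identifiers: B and C descend from A; E descends from D.
claims = [("B", "A"), ("C", "B"), ("E", "D")]
for app, family in sorted(assign_families(claims).items()):
    print(app, "-> family", family)
```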

Patent Citation Timing and Source Dataset
Kuhn-Younge-Marco_Patent_Citation_Source_and_Timing_2017-09-25.zip 292 MiB

Innovation studies frequently distinguish between patent citations submitted by the patent examiner and those submitted by the patent applicant. However, publicly available citations data is often misleading, for instance by attributing a patent citation to the patent examiner when it was, in fact, first submitted by the patent applicant. This dataset uses internal USPTO data to identify the date on which each citation was first submitted as well as the party (examiner or applicant) who first submitted it.

Supporting Article: Patent Citations Reexamined: New Data and Methods
By Jeffrey Kuhn, Kenneth Younge, Alan Marco
(SSRN)
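
The idea of recovering who first submitted a citation can be sketched as follows. The record layout `(citing, cited, submitted_on, party)` and the function name `first_submissions` are hypothetical; the actual structure of the underlying USPTO data is not described here, and the sample records are invented.

```python
from datetime import date


def first_submissions(records):
    """Return the earliest submission for each citing-cited pair.

    records: iterable of (citing, cited, submitted_on, party) tuples, where
    party is "examiner" or "applicant". This layout is an assumption made
    for illustration only.
    """
    first = {}
    for citing, cited, submitted_on, party in records:
        key = (citing, cited)
        # Keep a record only if it predates what we have seen for this pair.
        if key not in first or submitted_on < first[key][0]:
            first[key] = (submitted_on, party)
    return first


# Invented example: the examiner's later citation would hide the fact that
# the applicant submitted the same reference first.
records = [
    ("US-1", "US-0", date(2010, 5, 1), "examiner"),
    ("US-1", "US-0", date(2009, 3, 1), "applicant"),
    ("US-2", "US-0", date(2011, 1, 1), "examiner"),
]
for pair, (when, who) in first_submissions(records).items():
    print(pair, when.isoformat(), who)
```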

Patent Scope and Examiner Toughness Dataset
Kuhn-Thompson_Patent_Scope_2017-10-23.zip 33 MiB

This dataset includes an easy-to-use measure of patent scope that is grounded both in patent law and in the practices of patent attorneys. Our measure counts the number of words in a patent’s first claim. The longer the first claim, the narrower the patent’s scope, because a longer claim recites more details, and every one of those details must be met for another invention to infringe. Hence, the more details the claim contains, the more opportunities others have to invent around it. We validate our measure by showing both that patent attorneys’ subjective assessments of scope agree with our estimates and that the behavior of patenters is consistent with it. To facilitate causal inference with our measure, we show how it can be used to create an instrumental variable, patent examiner Scope Toughness, which we also validate.

Supporting Article: The Ways We’ve Been Measuring Patent Scope are Wrong: How to Measure and Draw Causal Inferences with Patent Scope
By Jeffrey Kuhn, Neil Thompson
(SSRN)
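
The word-count measure described above is simple enough to sketch directly. The function name `first_claim_word_count`, the whitespace-based definition of a word, and the stripping of a leading claim number are assumptions for illustration; the supporting article is the authority on how the published measure is computed. The sample claims are invented, and the examiner Scope Toughness instrument is not covered.

```python
import re


def first_claim_word_count(claims):
    """Count the words in the first claim of a patent.

    claims: list of claim texts in the order they appear in the patent.
    Assumption: a word is any whitespace-separated token, and a leading
    claim number such as "1." is not counted.
    """
    if not claims:
        raise ValueError("patent has no claims")
    # Drop a leading claim number like "1." if present.
    text = re.sub(r"^\s*\d+\s*\.\s*", "", claims[0])
    return len(text.split())


# Invented claims: the second adds limitations, so it is longer and, under
# this measure, narrower in scope.
broad = ["1. A fastener comprising a threaded shaft and a head."]
narrow = [
    "1. A fastener comprising a threaded shaft made of titanium, a hexagonal "
    "head with a recessed drive socket, and a nylon locking insert."
]
print("broad claim words: ", first_claim_word_count(broad))
print("narrow claim words:", first_claim_word_count(narrow))
```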

Research conferences

Sharing and validating our research with other academics is important for making sure we stay at the very forefront of the field. We have presented research and held seminars at the following events.

  • 2018 Intellectual Property Owner's European Practice Meeting | Presenter (Amsterdam, Netherlands)
  • 2017 American Law and Economics Association Annual Meeting | Presenter (New Haven, CT)
  • 2017 Boston University School of Law | Technology and Research Conference (Boston, MA)
  • 2016 Northwestern/Searle | 9th Annual Conference on Innovation Economics (Chicago, IL)
  • 2016 AOM Patent PDW | Organizer (Anaheim, CA)
  • 2016 AOM Paper Session | Chair (Anaheim, CA)
  • 2016 Munich Summer Institute | Max Planck Institute, LMU, ETH (Munich, Germany)
  • 2016 DRUID16 | 20th Anniversary Conference Paper Session (Copenhagen, Denmark)
  • 2015 National Bureau of Economic Research | Productivity Group Seminar (Boston, MA)
  • 2015 USPTO | Chief Economist Seminar Series (Washington, DC)
  • 2015 SKEMA Business School | KTO Seminar Series (Sophia Antipolis, France)
  • 2015 SKEMA | Spatial Evolution of Industries (Sophia Antipolis, France)
  • 2015 EPFL | Chair, 7th Annual Clean Tech Seminar (Lausanne, Switzerland)
  • 2014 AOM Patent PDW | Presenter (Vancouver, Canada)