Check Out Matt Marx's LinkedIn Stats (Last 30 Days)
Matt Marx
Bruce F. Failing, Sr., Chair and Faculty Director of Entrepreneurship at Cornell University. Research Associate at NBER. I like building datasets and startups.
AI Summary
Bridging science and business, I research how to commercialize technology more effectively. My work on non-compete agreements has influenced policy reforms. As a former startup executive, I bring real-world experience to academia. Passionate about open data, I've created resources used by thousands of researchers worldwide.
Follower Count: 2,160
Total Reactions: 1,043
Total Comments: 45
Total Reposts: 98
Posts (Last 30 Days): 0
Engagement Score: 54 / 100
Matt Marx's recent posts

Matt Marx
Interim PV update: I’ve posted the final release of PatentsView in the following Zenodo repositories:

Metadata for grants & applications (these are the files you usually use; also contains the data dictionaries): https://lnkd.in/gFYvizcD
Brief summary text for grants & applications: https://lnkd.in/g7qZ8e36
Full description text for grants & applications (this is the big one): https://lnkd.in/gC2KqBhU
Claims for grants & applications: https://lnkd.in/gAjKGDxP
Drawing descriptions for grants & applications: https://lnkd.in/gDFrqJGw

Tried my best to get these right, but please have a look and let me know *today* if you see problems. (Drove to D.C. last night & coming back tonight, so I should have time to find any missing files on Thursday.) Documentation tips also welcome.

Dror Shvadron (THANK YOU) has uploaded the metadata to the i3-nber BigQuery workspace and is working on the fulltext. I’m told these tables will be mirrored on Google Patents once we are done. Looking at long-term strategies for updating the data but no promises yet.

More soon, Matt

Matt Marx
Fellow innovation researchers - you may have heard that the beloved PatentsView is set to close down a week from tomorrow. (To be clear, this doesn't just mean that there won't be updates, but that the site will be gone.) We at the Innovation Information Initiative (i3) have downloaded all of the bulk data from the site, including both metadata and full-text, so you don't need to worry about files not being accessible. I've stored the metadata files in a permanent repository, and before the site goes dark we will upload everything including full-text files to our i3 BigQuery Workspace (https://lnkd.in/gxaB-fPK). more soon, Matt

Matt Marx
last Friday we had our first i3 Upskilling session re: BigQuery, led by Roger Masclans, Duke PhD candidate. notes attached, recording here (https://lnkd.in/gtFRpJER).

I tried this myself. needed to calculate the year-by-year share of newly-founded startups, by whether their patents cited scientific articles within 5 years of founding. Normally I'd download foundingpatents.com, relianceonscience.org, OpenAlex. OA is a 400G zip of hundreds of files that need to be linked, just to get the year of publication to ascertain whether the science cites were within 5 years. it would have taken a couple days, 1T+ of storage, and who knows how much memory.

Here was the SQL query I wrote (well, Grok3 wrote it), which ran in about 3 seconds (!!!!).

WITH assigneeatfiling as (
  select * from `nber-i3.founding_patents.ocpb_assigneeatfiling`
),
npls as (
  select * from `nber-i3.reliance_on_science.pcs_oa_v64`
),
papers as (
  SELECT publication_year, id
  -- SELECT REGEXP_EXTRACT(id, r'\d+') AS oaid
  -- id AS oaid
  FROM `nber-i3.openalex.works_241125`
),
sq1 AS (
  SELECT
    --t1.patent_id,
    t1.initassignee_id,
    t1.founding_year,
    t1.VC_backed_assignee,
    --t2.oaid,
    --t3.id,
    t3.publication_year,
    --t3.publication_year - t1.founding_year AS year_gap,
    CASE WHEN t3.publication_year - t1.founding_year <= 5 THEN 1 ELSE 0 END AS patcitewithin5yrs
  FROM assigneeatfiling AS t1
  LEFT JOIN npls t2
    ON t1.patent_id = CAST(REGEXP_EXTRACT(t2.patent, r'\d+') AS STRING)
  LEFT JOIN papers t3
    ON CAST(t2.oaid AS STRING) = CAST(REGEXP_EXTRACT(t3.id, r'\d+') AS STRING)
  WHERE t1.founding_year>1989
),
sq2 AS (
  SELECT
    initassignee_id,
    ANY_VALUE(founding_year) AS founding_year, -- Assumes founding_year is unique per initassignee_id
    ANY_VALUE(VC_backed_assignee) AS VC_backed_assignee, -- Assumes VC_backed_assignee is consistent
    MAX(patcitewithin5yrs) AS has_patcite_within_5yrs -- 1 if ever 1, 0 if never 1
  FROM sq1
  GROUP BY initassignee_id
),
final AS (
  SELECT
    founding_year,
    COUNT(CASE WHEN has_patcite_within_5yrs = 1 THEN 1 END) AS assignees_with_science_in_5,
    COUNT(CASE WHEN has_patcite_within_5yrs = 0 THEN 1 END) AS assignees_without_science_in_5,
    COUNT(CASE WHEN VC_backed_assignee = 1 THEN 1 END) AS vc_backed_assignees,
    COUNT(CASE WHEN VC_backed_assignee = 1 and has_patcite_within_5yrs = 1 THEN 1 END) AS vc_backed_assignees_with_science,
    COUNT(CASE WHEN VC_backed_assignee = 1 and has_patcite_within_5yrs = 0 THEN 1 END) AS vc_backed_assignees_without_science,
    COUNT(*) AS total_assignees
  FROM sq2
  GROUP BY founding_year
  ORDER BY founding_year DESC
)
SELECT * FROM final
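
For readers without access to the workspace, the three steps of the query (flag each startup-citation row, collapse to one row per startup, count by founding year) can be sketched in plain Python on toy data. This is only an illustration of the logic, not the author's code: the rows, the tuple layout, and the helper names `within_5`, `collapse_to_assignee`, and `share_by_year` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy rows standing in for the joined table (sq1 in the query):
# (initassignee_id, founding_year, vc_backed, cited paper's publication year or None)
rows = [
    ("a1", 1995, 1, 1997),
    ("a1", 1995, 1, 2004),
    ("a2", 1995, 0, None),   # startup whose patents cite no papers
    ("a3", 2001, 1, 2010),
    ("a4", 2001, 0, 2003),
    ("a5", 2001, 0, 2002),
]

def within_5(founding_year, publication_year):
    """Mirror the CASE expression: 1 if the gap is <= 5 years, else 0.

    A missing publication year (no citation) falls through to 0,
    just as a NULL comparison falls to ELSE 0 in SQL.
    """
    if publication_year is None:
        return 0
    return 1 if publication_year - founding_year <= 5 else 0

def collapse_to_assignee(rows):
    """Keep one record per startup with the max flag (the sq2 step)."""
    assignees = {}
    for aid, founding_year, vc_backed, pub_year in rows:
        flag = within_5(founding_year, pub_year)
        if aid not in assignees:
            assignees[aid] = {"founding_year": founding_year, "vc": vc_backed, "flag": flag}
        else:
            assignees[aid]["flag"] = max(assignees[aid]["flag"], flag)
    return assignees

def share_by_year(assignees):
    """Share of startups per founding year with at least one flagged citation."""
    totals = defaultdict(lambda: {"with_science": 0, "total": 0})
    for rec in assignees.values():
        bucket = totals[rec["founding_year"]]
        bucket["total"] += 1
        bucket["with_science"] += rec["flag"]
    return {year: b["with_science"] / b["total"] for year, b in sorted(totals.items())}

shares = share_by_year(collapse_to_assignee(rows))
```

As in the query, the flag compares the cited paper's publication year with the founding year, so papers published before founding also count as "within 5 years".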

Matt Marx
A week from today (Friday 2/21, 11am ET) the Innovation Information Initiative (i3) will host its first Upskilling session. These sessions are designed to equip researchers with techniques to save time/$/frustration with big-data projects. I previously mentioned our i3 BigQuery Workspace, where we host everything from OpenAlex to Paper Patent Pairs.

The focus of this first Upskilling session is BigQuery, including:
1. querying massive datasets efficiently
2. using SQL + Python for reproducible research
3. optimizing costs & avoiding common pitfalls

Roger Masclans (Duke) and Dror Shvadron (Toronto) will lead the session. Register here: https://lnkd.in/grvvRHYS

I'd love suggestions on the next topics we should cover. Among suggestions I've heard are a) wrangling the massive Revelio/CoreSignal data many people are buying; b) using LLMs without handing your entire research budget to Sam Altman. Of course, I am ready to hold a Perl masterclass at any time (Reliance on Science is written almost entirely in Perl, the Latin of computer science). That makes me a dinosaur who needs Upskilling...

Matt Marx
first week of the semester means not many manuscripts submitted, so I'll take a minute to summarize learnings from year 1 at the Entrepreneurship & Innovation desk of ManSci. (don't worry, won't bore you with counts of manuscripts processed, response time, rejection rates. we did achieve our stated goal of No New Initiatives, focusing instead on keeping the trains running)

1. as a referee I've usually had the mindset of keeping bad papers from getting published. now I think more about how to get good papers to publication. that said, we try to make judicious use of our volunteer referees' time, so we're desk-rejecting more. but, no more form letters; we provide feedback on every manuscript.

2. years ago, most manuscripts had great theoretical flourish but crashed on empirics. things seem to have reversed. I often recommend Ezra Zuckerman Sivan's "Tips to Article Writers" as a reality check. also, sometimes the contribution is there but not written clearly. as a data-monkey myself, I often fall into this trap of being exhausted after months of coding. I remind myself of when I did engineering sales support for Emil Michael (about to be Undersecretary of Defense for Research and Technology!), who said it's never the customer's fault if we lose the sale; it's OUR fault. I try to be charitable toward referees who "didn't understand the paper" - maybe I wrote it badly.

3. some editors don't read cover letters, but I do. if you're going to write one, no need to paste in the abstract (already saw it) or to disclaim that the article isn't under review elsewhere (you checked that box). this is your chance to highlight the contribution *and* why it's a fit for ManSci E&I (the latter might be hard to communicate explicitly in the manuscript). a great cover letter is a way to get past the desk. note that our new Editor in Chief Christoph Loch *really* cares about real-world relevance.

hope this is somehow helpful for 2025, and I'm happy to answer questions. I hope you'll send us manuscripts that you think are a fit for ManSci E&I.

Matt Marx
Releasing another open dataset this weekend. You may know Michaël Bikard's amazing dissertation where he developed "idea twins" (think CRISPR, or calculus) where the same discovery occurred simultaneously. A few years back, David Hsu and I took his algorithm and scaled it up to the entire Web of Science. We built a server farm in my basement and scraped PDFs from Google Scholar for 19 months so we could check whether papers are cited *in the same parenthesis*.

That paper isn't super-well known, but somehow people heard about the data. In fact I think we've had more requests for the data than we've had citations...anyway, we couldn't post it because of the proprietary WoS IDs. Since then I've ported the WoS twins to the Microsoft Academic Graph (same IDs are used for OpenAlex). I could have posted it but held off because Michaël Bikard posted his own twins file, based on PubMed, at the FIVES archive. But if you're studying fields outside life sciences you really need the twins David and I built.

Michaël Bikard graciously gave me the go-ahead to post these. THAT SAID, if you use these data please cite his SMJ article because the twins idea is his idea: https://lnkd.in/gaDvRHXc. (If you want to cite ours too, that's great https://lnkd.in/gf964xnF).

Here's the catch. The only way to get this file is from the new NBER-i3 BigQuery Workspace, where we have uploaded dozens of open-access entrepreneurship/innovation datasets including the latest OpenAlex. Instructions below.
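
To make the "cited in the same parenthesis" check concrete, here is a minimal Python sketch of the idea on a toy sentence. It is not the algorithm behind the twins data: the regular expressions, the author-year citation format, and the function name `cocited_pairs` are assumptions chosen purely for illustration.

```python
import re
from itertools import combinations

# Assumption: citations appear as author-year references inside parentheses,
# e.g. "(Smith et al. 2012; Jones and Lee 2012)". Real papers vary far more.
PAREN = re.compile(r"\(([^()]*)\)")
CITE = re.compile(r"([A-Z][A-Za-z\-]+(?: et al\.| and [A-Z][A-Za-z\-]+)?),? (\d{4})")

def cocited_pairs(text):
    """Return the set of citation pairs that share a single parenthesis."""
    pairs = set()
    for group in PAREN.findall(text):
        cites = sorted({f"{authors} {year}" for authors, year in CITE.findall(group)})
        # Every unordered pair within one parenthetical group counts as co-cited.
        pairs.update(combinations(cites, 2))
    return pairs

text = (
    "The result was reported independently (Smith et al. 2012; Jones and Lee 2012). "
    "Later work (Chen 2015) extended it."
)
pairs = cocited_pairs(text)
```

On this toy text, only the two 2012 papers form a pair; the lone citation in the second parenthesis produces none.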