OpenAlex Scholarly Works Scraper

Search 250M+ OpenAlex papers by keyword. Get titles, authors, venue, year, citations, concepts, OA links and full abstracts as structured JSON. No API key.

Developer & Research Tools

How it works

  1. Open it on Apify

    Hit Run on Apify. It opens the tool in the cloud, no install.

  2. Set the inputs

    Adjust query, sort, fromDate (sensible defaults are pre-filled).

  3. Click Run

    The tool runs on Apify’s cloud and collects the data for you.

  4. Export the results

    Download as JSON, CSV or Excel, or pipe straight into your app, Google Sheets, or an AI agent.

Inputs

| Field | What it does | Type |
| --- | --- | --- |
| query | Keywords to search OpenAlex works for (title, abstract and fulltext are searched), e.g. "machine learning", "crispr gene editing". Required. | string |
| sort | How to order results: Relevance (best match for the query), Citations (most-cited first), or Date (newest first). | string |
| fromDate | Optional. Only return works published on or after this date (YYYY-MM-DD, e.g. 2023-01-01). Adds a from_publication_date filter. | string |
| filter | Optional advanced filter passed straight to the OpenAlex API filter param. Comma-separated key:value pairs, e.g. "type:article,is_oa:true,from_publication_date:2023-01-01". See the OpenAlex docs for available filter keys. Merged with From publication date (fromDate). | string |
| maxItems | Maximum number of works to return. Cursor pagination fetches 50 per page until this many unique works are collected. | integer |
| notionConnector | Optional. Write each result as a page into your Notion when the run finishes. Authorize a Notion connector once in Settings → API & Integrations → MCP connectors, then pick it here. Leave empty to skip (default); results are always saved to the dataset regardless. | string |
| notionParentId | Optional. The Notion data source ID of the database to write into (only used if a Notion connector is set). Leave empty to create the pages privately in your workspace instead. | string |

What you get

A structured dataset — each result includes fields like:

abstract, authors, citations, concepts, doi, institutions, isOpenAccess, oaUrl, openalexId, publicationDate, title, type, url, venue

Export every run as JSON, CSV or Excel, or send it to your app, a database, Google Sheets, or an AI agent.

2 ready-to-run use cases

Latest CRISPR Papers from OpenAlex, Newest First

Track CRISPR gene-editing research on OpenAlex sorted newest first, with authors, journal, citation counts, and full abstracts for each work.

Top-Cited Deep Learning Papers from OpenAlex

The foundational deep learning reading list, ranked by citation count from OpenAlex with authors, venue, and abstracts for literature reviews and ML research.

OpenAlex Scholarly Works Scraper

Search the OpenAlex catalog of 250M+ scholarly works and get clean, structured records — no API key, no login. OpenAlex is a free, open index of scholarship (an open replacement for Microsoft Academic Graph / Scopus).

This actor calls the public OpenAlex works endpoint, walks results with cursor pagination (the reliable way to page deep into large result sets), reconstructs each abstract from its inverted index into readable text, and returns one flat row per work.
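To make the abstract reconstruction step concrete, here is a minimal sketch in Python. OpenAlex stores an abstract as abstract_inverted_index, a mapping from each word to the positions where it occurs. The function name rebuild_abstract is hypothetical, and this is one straightforward way to invert the index, not necessarily the actor's own code.

```python
from typing import Dict, List, Optional


def rebuild_abstract(inverted_index: Optional[Dict[str, List[int]]]) -> Optional[str]:
    """Turn an OpenAlex abstract_inverted_index into plain text.

    Returns None when no abstract is indexed.
    """
    if not inverted_index:
        return None
    # Flatten to (position, word) pairs, then order them by position.
    positioned = [
        (position, word)
        for word, positions in inverted_index.items()
        for position in positions
    ]
    positioned.sort()
    return " ".join(word for _, word in positioned)


example = {"Gene": [0], "editing": [1, 4], "with": [2], "CRISPR": [3]}
print(rebuild_abstract(example))  # Gene editing with CRISPR editing
```

A word that appears several times simply carries several positions, so it is emitted once per position.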

It is a polite API citizen: every request carries a contact mailto (both as a query param and in the User-Agent), which routes traffic to OpenAlex's faster, more reliable "polite pool".
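The cursor walk and the polite-pool contact described above can be sketched as follows. This is an illustration under stated assumptions, not the actor's source: the contact address, the User-Agent string, and the function names are hypothetical. The search, per-page, cursor and mailto query parameters and the meta.next_cursor response field are part of the public OpenAlex API.

```python
import json
import urllib.request
from typing import Any, Callable, Dict, Iterator, Tuple
from urllib.parse import urlencode

WORKS_URL = "https://api.openalex.org/works"
CONTACT = "you@example.com"  # hypothetical contact address, replace with your own

Fetch = Callable[[str, Dict[str, str]], Dict[str, Any]]


def build_request(query: str, cursor: str = "*", per_page: int = 50) -> Tuple[str, Dict[str, str]]:
    """Return (url, headers) for one page; the contact goes in both places."""
    params = {"search": query, "per-page": per_page, "cursor": cursor, "mailto": CONTACT}
    headers = {"User-Agent": f"openalex-example/0.1 (mailto:{CONTACT})"}
    return f"{WORKS_URL}?{urlencode(params)}", headers


def http_fetch(url: str, headers: Dict[str, str]) -> Dict[str, Any]:
    """Fetch one page over HTTP and decode the JSON body."""
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def iter_works(query: str, max_items: int, fetch: Fetch = http_fetch) -> Iterator[Dict[str, Any]]:
    """Yield up to max_items raw works, following meta.next_cursor page by page."""
    cursor, count = "*", 0  # "*" asks OpenAlex for the first page
    while cursor and count < max_items:
        url, headers = build_request(query, cursor)
        page = fetch(url, headers)
        results = page.get("results") or []
        if not results:
            break
        for work in results:
            if count >= max_items:
                return
            yield work
            count += 1
        cursor = (page.get("meta") or {}).get("next_cursor")
```

Passing the fetch function as a parameter keeps the paging logic testable without network access; list(iter_works("crispr", 120)) would perform real requests.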

Input

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| query | string | (required) | Keywords to search (title, abstract, fulltext), e.g. machine learning. |
| sort | string | relevance | relevance, citations (most cited first), or date (newest first). |
| fromDate | string | | Optional YYYY-MM-DD; only works published on/after this date. |
| filter | string | | Optional raw OpenAlex filter, e.g. type:article,is_oa:true. Merged with fromDate. |
| maxItems | integer | 100 | Max works to return (50 fetched per page via cursor). |
| proxyConfiguration | object | { "useApifyProxy": false } | Optional. Not needed; OpenAlex is a clean public API. |
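How fromDate and filter are merged is not spelled out here, so the following Python sketch shows one plausible reading: the date is appended as a from_publication_date clause unless the raw filter already sets one. That precedence rule, the function name merge_filter, and the SORT_PARAMS mapping of the three sort options onto OpenAlex sort keys are assumptions.

```python
from datetime import date
from typing import Optional

# Assumed mapping from the actor's sort option to an OpenAlex sort parameter.
SORT_PARAMS = {
    "relevance": "relevance_score:desc",
    "citations": "cited_by_count:desc",
    "date": "publication_date:desc",
}


def merge_filter(from_date: Optional[str], raw_filter: Optional[str]) -> str:
    """Combine the optional fromDate with the optional raw filter string."""
    parts = [p.strip() for p in (raw_filter or "").split(",") if p.strip()]
    if from_date:
        date.fromisoformat(from_date)  # raises ValueError on a malformed date
        already_set = any(p.startswith("from_publication_date:") for p in parts)
        if not already_set:  # assumption: an explicit raw filter wins
            parts.append(f"from_publication_date:{from_date}")
    return ",".join(parts)


print(merge_filter("2023-01-01", "type:article,is_oa:true"))
# type:article,is_oa:true,from_publication_date:2023-01-01
```

An empty result string means no filter parameter needs to be sent at all.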

Example input

{
  "query": "crispr",
  "sort": "citations",
  "fromDate": "2020-01-01",
  "maxItems": 120
}

Output

One row per work:

{
  "ok": true,
  "openalexId": "https://openalex.org/W...",
  "doi": "https://doi.org/10....",
  "title": "…",
  "authors": ["Jane Doe", "John Roe"],
  "institutions": ["Some University"],
  "year": 2021,
  "publicationDate": "2021-05-03",
  "type": "article",
  "venue": "Nature",
  "citations": 1234,
  "concepts": ["Biology", "Genetics"],
  "isOpenAccess": true,
  "oaUrl": "https://…pdf",
  "abstract": "Reconstructed abstract text…",
  "url": "https://openalex.org/W..."
}

abstract is rebuilt from OpenAlex's abstract_inverted_index; when no abstract is indexed it is null. Results are deduplicated by openalexId.
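As a rough illustration of how a raw OpenAlex work could be flattened into a row like the one above and then deduplicated, consider the Python sketch below. The raw field names (authorships, primary_location, cited_by_count, open_access and so on) come from the OpenAlex work object; which of them the actor actually reads, the collapsing of repeated institution names, and the function names to_row and dedupe are assumptions. The abstract field is left out here to keep the example short.

```python
from typing import Any, Dict, Iterable, List


def to_row(work: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten one raw OpenAlex work into a single flat row."""
    authorships = work.get("authorships") or []
    source = (work.get("primary_location") or {}).get("source") or {}
    open_access = work.get("open_access") or {}
    institutions: List[str] = []
    for authorship in authorships:
        for institution in authorship.get("institutions") or []:
            name = institution.get("display_name")
            if name and name not in institutions:  # keep each name once
                institutions.append(name)
    return {
        "ok": True,
        "openalexId": work.get("id"),
        "doi": work.get("doi"),
        "title": work.get("title"),
        "authors": [(a.get("author") or {}).get("display_name") for a in authorships],
        "institutions": institutions,
        "year": work.get("publication_year"),
        "publicationDate": work.get("publication_date"),
        "type": work.get("type"),
        "venue": source.get("display_name"),
        "citations": work.get("cited_by_count"),
        "concepts": [c.get("display_name") for c in work.get("concepts") or []],
        "isOpenAccess": open_access.get("is_oa"),
        "oaUrl": open_access.get("oa_url"),
        "url": work.get("id"),
    }


def dedupe(rows: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep the first row seen for each openalexId, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        key = row.get("openalexId")
        if key in seen:
            continue
        seen.add(key)
        unique.append(row)
    return unique
```

Missing nested objects fall back to empty values, so a sparse record yields None fields instead of raising.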

Diagnostics & billing

On failure or no results, the actor pushes a single diagnostic row (ok:false) with an errorCode (BAD_INPUT, NO_RESULTS, RATE_LIMITED, SERVER_ERROR, NETWORK) instead of failing silently. Only successful work rows are charged (one work unit each) — diagnostics and empty results are never billed.
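When consuming the dataset, the diagnostic rows can be separated from the work rows with a few lines of Python. This is a minimal sketch: it relies only on the ok and errorCode fields named above, and the choice of which error codes count as retryable is an assumption, as are the names split_rows and should_retry.

```python
from typing import Any, Dict, Iterable, List, Tuple

# Assumption: these codes look transient and worth a retry. The error codes
# themselves are listed above, but nothing there says which are transient.
RETRYABLE = {"RATE_LIMITED", "SERVER_ERROR", "NETWORK"}


def split_rows(items: Iterable[Dict[str, Any]]) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Separate successful work rows from diagnostic rows (ok is not True)."""
    works: List[Dict[str, Any]] = []
    diagnostics: List[Dict[str, Any]] = []
    for item in items:
        (works if item.get("ok") is True else diagnostics).append(item)
    return works, diagnostics


def should_retry(diagnostics: Iterable[Dict[str, Any]]) -> bool:
    """True when any diagnostic row carries a code treated as transient."""
    return any(d.get("errorCode") in RETRYABLE for d in diagnostics)


items = [
    {"ok": True, "openalexId": "https://openalex.org/W1", "title": "Example"},
    {"ok": False, "errorCode": "RATE_LIMITED"},
]
works, diagnostics = split_rows(items)
print(len(works), len(diagnostics), should_retry(diagnostics))  # 1 1 True
```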

Data source

Data comes from OpenAlex, released under CC0. Please cite OpenAlex when you use it.