
Crossref Scholarly Works Scraper

Search 150M+ scholarly works on Crossref and export DOI, authors, journal, citation count, and abstracts as JSON or CSV. Filter by type and date.


Developer & Research Tools

How it works

  1. Open it on Apify

     Click Run on Apify to open the tool in the cloud; there is nothing to install.

  2. Set the inputs

     Adjust query, filterType, and fromDate (sensible defaults are pre-filled).

  3. Click Run

     The tool runs on Apify’s cloud and collects the data for you.

  4. Export the results

     Download the results as JSON, CSV, or Excel, or pipe them straight into your app, Google Sheets, or an AI agent.

Inputs

| Field | What it does | Type |
|---|---|---|
| query | Keywords to search Crossref for across titles, authors, abstracts, and metadata (e.g. "deep learning", "CRISPR gene editing", "climate change adaptation"). | string |
| filterType | Only return works of this Crossref type. Leave empty for all types. "journal-article" is the most common type for research papers. | string |
| fromDate | Only return works published on or after this date, in YYYY-MM-DD format (e.g. 2020-01-01). Leave empty for no lower date bound. | string |
| sort | How to order results. "Relevance" ranks by how well each work matches the query; "Most cited" surfaces influential papers; "Newest first" sorts by publication date, descending. | string |
| maxItems | Maximum number of scholarly works to return. Uses deep cursor pagination to fetch more than 100 reliably. | integer |
| notionConnector | Optional. Writes each result as a page in your Notion workspace when the run finishes. Authorize a Notion connector once in Settings → API & Integrations → MCP connectors, then pick it here. Leave empty (the default) to skip; results are always saved to the dataset regardless. | string |
| notionParentId | Optional. The Notion data source ID of the database to write into (used only if a Notion connector is set). Leave empty to create the pages privately in your workspace instead. | string |

What you get

A structured dataset in which each result includes fields such as:

abstract, authors, citations, doi, issn, journal, publishedDate, publisher, subjects, title, type, url

Export every run as JSON, CSV or Excel, or send it to your app, a database, Google Sheets, or an AI agent.

2 ready-to-run use cases

Most-Cited CRISPR Papers Ranked by Citations | Crossref

Rank CRISPR gene-editing papers by citation count from Crossref's 150M-work index. DOIs, titles, authors, and journals for literature reviews.

Microplastics Literature Search: All Crossref Works

Every microplastics publication on Crossref in one dataset: journal articles, books, datasets, and preprints with DOIs for systematic reviews.

Crossref Scholarly Works Scraper

Search the Crossref catalog of 150M+ scholarly works (journal articles, preprints, books, datasets, and more) via its public REST API: no API key, no login, and no anti-bot protection.

The actor is a polite Crossref client: it identifies itself with a contact User-Agent and a mailto query parameter, so Crossref routes its requests to the dedicated "polite pool". It pages with deep cursor pagination (cursor=* on the first request, then the next-cursor value returned by each response), which is the approach Crossref recommends for large result sets because offset-based paging is capped.
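A minimal Python sketch of the polite, cursor-paged request loop described above, using only the standard library. The contact address, the `crossref-sketch/0.1` User-Agent string, the page size of 100, and the helper names `fetch_json` and `iter_works` are illustrative assumptions, not the actor's actual code. The fetch function is injectable so the loop can be exercised without network access.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator

API_URL = "https://api.crossref.org/works"
# Assumed contact details: replace with a real address you monitor.
CONTACT_EMAIL = "you@example.com"
USER_AGENT = f"crossref-sketch/0.1 (mailto:{CONTACT_EMAIL})"


def fetch_json(url: str) -> dict:
    """GET a URL with a contact User-Agent and decode the JSON body."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def iter_works(params: dict, max_items: int,
               fetch: Callable[[str], dict] = fetch_json,
               page_size: int = 100) -> Iterator[dict]:
    """Yield up to max_items raw Crossref work items using deep cursor paging."""
    cursor = "*"  # "*" asks Crossref to open a new cursor
    yielded = 0
    while yielded < max_items:
        query = dict(params,
                     rows=min(page_size, max_items - yielded),
                     cursor=cursor,
                     mailto=CONTACT_EMAIL)  # mailto identifies us for the polite pool
        message = fetch(f"{API_URL}?{urllib.parse.urlencode(query)}")["message"]
        items = message.get("items") or []
        if not items:
            break  # cursor exhausted: no more results
        for item in items:
            yield item
            yielded += 1
            if yielded >= max_items:
                return
        cursor = message.get("next-cursor")  # token for the following page
        if not cursor:
            break
```

For example, `for work in iter_works({"query": "deep learning"}, 250): print(work["DOI"])` would stream up to 250 items. Stopping on an empty `items` list, rather than trusting `total-results`, keeps the loop safe when the cursor runs out early.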

Input

| Field | Type | Default | Description |
|---|---|---|---|
| query | string (required) | deep learning | Keywords searched across titles, authors, abstracts, and metadata. |
| filterType | string | (all types) | Restrict to a Crossref work type, e.g. journal-article. |
| fromDate | string (YYYY-MM-DD) | (none) | Only works published on or after this date. |
| sort | enum | relevance | relevance, is-referenced-by-count (most cited), or published (newest first). |
| maxItems | integer | 100 | Maximum number of works to return (cursor pagination handles more than 100). |
| proxyConfiguration | object | (none) | Optional and off by default. Crossref is a public, no-key API with no anti-bot protection, so a proxy adds no benefit; enable one only if you hit IP-level rate limits. |
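The inputs in this table map naturally onto Crossref's `/works` query parameters: `query`, `filter` (with the `type:` and `from-pub-date:` filters), `sort`, and `order`. The sketch below assumes that mapping (the actor's exact translation is not documented here) and raises a hypothetical `BadInput` error for the cases the actor reports as BAD_INPUT: an empty query or a malformed fromDate.

```python
import datetime
import re

# Allowed sort values from the input table; all are requested in descending order.
SORTS = {"relevance", "is-referenced-by-count", "published"}


class BadInput(ValueError):
    """Hypothetical stand-in for the actor's BAD_INPUT diagnostic row."""


def build_params(query: str, filter_type: str = "", from_date: str = "",
                 sort: str = "relevance") -> dict:
    """Translate actor-style inputs into Crossref /works query parameters."""
    query = (query or "").strip()
    if not query:
        raise BadInput("query must not be empty")
    filters = []
    if filter_type:
        filters.append(f"type:{filter_type}")
    if from_date:
        # Require exactly YYYY-MM-DD, then check it is a real calendar date.
        if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", from_date):
            raise BadInput("fromDate must be YYYY-MM-DD")
        try:
            datetime.date.fromisoformat(from_date)
        except ValueError as exc:
            raise BadInput("fromDate is not a valid date") from exc
        filters.append(f"from-pub-date:{from_date}")
    if sort not in SORTS:
        raise BadInput(f"unknown sort: {sort}")
    params = {"query": query, "sort": sort, "order": "desc"}
    if filters:
        params["filter"] = ",".join(filters)  # Crossref joins filters with commas
    return params
```

Validating before any request is made means a bad input costs nothing, which matches the "never charged" behavior for BAD_INPUT rows.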

Output

Each successful row:

```json
{
  "ok": true,
  "doi": "10.1038/nature14539",
  "title": "Deep learning",
  "authors": ["Yann LeCun", "Yoshua Bengio", "Geoffrey Hinton"],
  "journal": "Nature",
  "publisher": "Springer Science and Business Media LLC",
  "type": "journal-article",
  "publishedDate": "2015-05-28",
  "citations": 70000,
  "subjects": ["Multidisciplinary"],
  "issn": ["0028-0836", "1476-4687"],
  "abstract": null,
  "url": "https://doi.org/10.1038/nature14539"
}
```
  • authors are formatted "Given Family" (organizational authors fall back to their name).
  • publishedDate is assembled from Crossref's date-parts (may be year-only or year-month for older records).
  • citations is Crossref's is-referenced-by-count.
  • abstract is the JATS-XML abstract stripped to plain text, or null when Crossref has none.
  • Nullable fields: title, journal, publisher, type, publishedDate, abstract, and url may be null, and authors, subjects, and issn may be empty arrays, depending on what the publisher deposited with Crossref. doi is always present (rows without a DOI are dropped). citations defaults to 0 when absent.
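One way to turn a raw Crossref item into the row shape above, following the rules in these bullets. This is a sketch: it assumes the date comes from the item's `published` field and that `html.unescape` is enough to clean entity references left in the abstract; all function names are hypothetical.

```python
import html
import re
from typing import Optional

_TAG = re.compile(r"<[^>]+>")  # matches any XML tag, e.g. <jats:p>


def _first(item: dict, key: str) -> Optional[str]:
    """Crossref stores title and container-title as lists; take the first entry."""
    values = item.get(key) or []
    return values[0] if values else None


def format_author(author: dict) -> Optional[str]:
    """'Given Family', falling back to an organizational author's 'name'."""
    full = " ".join(p for p in (author.get("given"), author.get("family")) if p)
    return full or author.get("name")


def date_from_parts(field: Optional[dict]) -> Optional[str]:
    """{'date-parts': [[2015, 5, 28]]} -> '2015-05-28'; year-only stays '2015'."""
    date_parts = (field or {}).get("date-parts") or [[]]
    parts = [int(p) for p in date_parts[0] if p is not None]
    if not parts:
        return None
    return "-".join([f"{parts[0]:04d}"] + [f"{p:02d}" for p in parts[1:3]])


def strip_jats(xml: Optional[str]) -> Optional[str]:
    """Drop JATS tags, collapse whitespace, and decode entities such as &amp;."""
    if not xml:
        return None
    text = html.unescape(" ".join(_TAG.sub(" ", xml).split()))
    return text or None


def normalize(item: dict) -> Optional[dict]:
    """Map one Crossref work to an output row, or None if it has no DOI."""
    doi = item.get("DOI")
    if not doi:
        return None  # rows without a DOI are dropped
    authors = (format_author(a) for a in item.get("author") or [])
    return {
        "ok": True,
        "doi": doi,
        "title": _first(item, "title"),
        "authors": [name for name in authors if name],
        "journal": _first(item, "container-title"),
        "publisher": item.get("publisher"),
        "type": item.get("type"),
        "publishedDate": date_from_parts(item.get("published")),
        "citations": item.get("is-referenced-by-count") or 0,
        "subjects": item.get("subject") or [],
        "issn": item.get("ISSN") or [],
        "abstract": strip_jats(item.get("abstract")),
        "url": item.get("URL"),
    }
```

Every lookup uses `.get` with a fallback, so a sparse deposit yields nulls and empty arrays rather than an exception.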

Results are deduplicated by DOI. Charging is per successful work (one work event per row). Diagnostic, empty, and blocked rows (ok: false with an errorCode) are never charged; this includes BAD_INPUT (empty query or malformed fromDate), NO_RESULTS, and any network or block error.
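A sketch of the deduplication and billing rule in the paragraph above: keep the first row per DOI and count only `ok` rows as billable. Comparing DOIs case-insensitively is an assumption of this sketch (DOI names are case-insensitive), and `dedupe_and_count` is a hypothetical name.

```python
from typing import Iterable


def dedupe_and_count(rows: Iterable[dict]) -> tuple[list[dict], int]:
    """Keep the first row per DOI; count only ok rows as billable work events."""
    seen: set[str] = set()
    kept: list[dict] = []
    billable = 0
    for row in rows:
        if not row.get("ok"):
            kept.append(row)  # diagnostic row (has errorCode): kept, never charged
            continue
        key = row["doi"].lower()  # DOI names are case-insensitive
        if key in seen:
            continue  # duplicate work: neither stored nor charged again
        seen.add(key)
        kept.append(row)
        billable += 1
    return kept, billable
```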

Troubleshooting

  • BAD_INPUT row, no results: you left query empty or fromDate isn't in YYYY-MM-DD format. Fix the input and re-run; you were not charged.
  • NO_RESULTS row: your query/filter combination matched nothing in Crossref. Try broader keywords or drop the type/date filters.
  • RATE_LIMITED / BLOCKED row: rare for Crossref. The actor already retries with backoff; if it persists, enable a proxy to use a different IP.
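The retry behavior mentioned in the last bullet is typically exponential backoff with jitter. The sketch below illustrates that technique for HTTP 429 and transient 5xx responses, honoring a numeric `Retry-After` header when present. The retryable status set, the delays, and `with_backoff` itself are assumptions, not the actor's documented settings.

```python
import random
import time
import urllib.error
from typing import Callable, TypeVar

T = TypeVar("T")
RETRYABLE = {429, 500, 502, 503, 504}  # assumed set of transient statuses


def with_backoff(call: Callable[[], T], attempts: int = 5, base: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run call(), retrying retryable HTTP errors with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except urllib.error.HTTPError as exc:
            if exc.code not in RETRYABLE or attempt == attempts - 1:
                raise  # permanent error, or out of retries
            retry_after = (exc.headers or {}).get("Retry-After")
            if retry_after and retry_after.isdigit():
                delay = float(retry_after)  # server-specified wait in seconds
            else:
                delay = base * 2 ** attempt  # 1s, 2s, 4s, ... with base=1.0
            sleep(delay + random.uniform(0, base))  # jitter spreads out retries
    raise AssertionError("unreachable: the last attempt returns or raises")
```

Passing `sleep` as a parameter keeps the helper testable; in production the default `time.sleep` applies.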

Notes

  • Powered entirely by the public Crossref REST API (https://api.crossref.org/works). Please be considerate of the shared, free service.
  • Citation counts and abstracts depend on what publishers deposit with Crossref; coverage varies by record.