Academic & Research Scrapers & Data Tools
5 ready-to-run Academic & Research data tools. Search arXiv, OpenAlex, Crossref, npm, PyPI and the Internet Archive. No code required: run them on the Apify cloud and export the results.
Academic & Research tools
arXiv Scraper
Search arXiv and get clean JSON: titles, abstracts, authors, categories, DOI, dates and PDF links. No API key. Sort by relevance or date; push to Notion.
OpenAlex Scholarly Works Scraper
Search 250M+ OpenAlex papers by keyword. Get titles, authors, venue, year, citations, concepts, OA links and full abstracts as structured JSON. No API key.
Crossref Scholarly Works Scraper
Search 150M+ scholarly works on Crossref and export DOI, authors, journal, citation count, and abstracts as JSON or CSV. Filter by type and date.
Package Registry Scraper (npm + PyPI)
Get npm and PyPI package metadata - version, license, repo, keywords, and npm download counts. Search by keyword or look up exact names. No API key needed.
Internet Archive Scraper
Search archive.org by keyword and export clean items (title, creator, year, downloads, item URL). Filter by media type, sort by popularity or date.
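For readers curious what a keyword search with "sort by relevance or date" looks like at the API level, here is a minimal sketch that builds a query URL for the public arXiv API. It is only an illustration: whether the arXiv Scraper calls this endpoint internally is an assumption, and the helper name `build_arxiv_query_url` and its parameters are hypothetical, not the tool's input schema.

```python
from urllib.parse import urlencode

# Public arXiv API endpoint (assumed here for illustration only).
ARXIV_API = "http://export.arxiv.org/api/query"

# Sort keys accepted by the arXiv API's sortBy parameter.
VALID_SORT_KEYS = ("relevance", "submittedDate", "lastUpdatedDate")


def build_arxiv_query_url(keywords, sort_by="relevance", max_results=10):
    """Return an arXiv API URL for a keyword search (hypothetical helper)."""
    if sort_by not in VALID_SORT_KEYS:
        raise ValueError(f"sort_by must be one of {VALID_SORT_KEYS}")
    params = {
        "search_query": f"all:{keywords}",  # search every metadata field
        "sortBy": sort_by,
        "sortOrder": "descending",          # newest or most relevant first
        "start": 0,
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_arxiv_query_url("large language models", "submittedDate", 5))
```

The response from this endpoint is an Atom feed, so turning it into the clean JSON described above would still need a parsing step that this sketch leaves out.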
Popular Academic & Research use cases
arXiv LLM Paper Scraper - Abstracts, Authors, PDFs
Run a keyword search across arXiv for large language model papers, ranked by relevance, with abstracts, author lists, and direct PDF download links.
Latest arXiv cs.CL Papers - Newest NLP Preprints
Track new cs.CL (Computation and Language) preprints as they appear on arXiv, sorted by submission date, so NLP researchers can keep up with every release.
Latest CRISPR Papers from OpenAlex, Newest First
Track CRISPR gene-editing research on OpenAlex sorted newest first, with authors, journal, citation counts, and full abstracts for each work.
Top-Cited Deep Learning Papers from OpenAlex
The foundational deep learning reading list, ranked by citation count from OpenAlex with authors, venue, and abstracts for literature reviews and ML research.
Most-Cited CRISPR Papers Ranked by Citations | Crossref
Rank CRISPR gene-editing papers by citation count from Crossref's 150M-work index. DOIs, titles, authors, and journals for literature reviews.
Microplastics Literature Search: All Crossref Works
Every microplastics publication on Crossref in one dataset: journal articles, books, datasets, and preprints with DOIs for systematic reviews.
npm Search Scraper: Downloads, License & Repo
Search the npm registry by keyword and compare each package's version, license, source repo, and monthly downloads side by side. Great for JS library research.
npm License Audit: Map package.json Deps
Feed in your package.json dependencies and get back the license and source repository for each npm package - a fast OSS compliance check for engineering teams.
Archive.org Book Search by Keyword to JSON
Find free public-domain books in archive.org's text collection by keyword, with author, publication year and item link for every title. Ideal for researchers.
Newest Archive.org Uploads for Any Search Term
Track recently added archive.org items for any topic, sorted newest first by upload date, each with its title, date and direct link. Great for monitoring.
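For the npm license audit use case above, the first step is turning a package.json into a list of package names to look up. The sketch below shows one way to do that with the Python standard library. The helper name `list_dependencies` and the sample manifest are hypothetical, and how the Package Registry Scraper actually expects its input is not stated here, so treat this as an assumption.

```python
import json


def list_dependencies(package_json_text, include_dev=True):
    """Return the sorted, de-duplicated npm package names in a package.json string."""
    manifest = json.loads(package_json_text)
    sections = ["dependencies"]
    if include_dev:
        sections.append("devDependencies")
    names = set()
    for section in sections:
        # A section may be missing or null; treat both as empty.
        names.update(manifest.get(section) or {})
    return sorted(names)


# Hypothetical sample manifest, for illustration only.
sample = (
    '{"dependencies": {"react": "^18.2.0", "lodash": "^4.17.21"},'
    ' "devDependencies": {"typescript": "^5.4.0"}}'
)

if __name__ == "__main__":
    print(list_dependencies(sample))  # prints ['lodash', 'react', 'typescript']
```

The resulting list of exact names is the kind of input an exact-name lookup would take; version ranges are dropped because the audit concerns the license and repository of each package, not a specific release.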