Wikipedia Scraper

Search Wikipedia by keyword or fetch clean page data (plain text, thumbnail, categories, URL) by title. No API key and no anti-bot measures to work around. Titles are fetched in batches of up to 50.

How it works

1. Open it on Apify. Hit Run on Apify to open the tool in the cloud; there is nothing to install.
2. Set the inputs. Adjust searchQuery, pageTitles, and fullText (sensible defaults are pre-filled).
3. Click Run. The tool runs on Apify's cloud and collects the data for you.
4. Export the results. Download as JSON, CSV, or Excel, or pipe straight into your app, Google Sheets, or an AI agent.

Inputs

Field	What it does	Type
`searchQuery`	Keywords to search Wikipedia for (e.g. "machine learning"). Returns matching pages with snippet, word count, and URL. Leave empty if you instead provide exact Page titles below.	string
`pageTitles`	Exact Wikipedia article titles to fetch full data for (plain-text extract, thumbnail, categories, URL). Batched 50 at a time. Use this OR a search query.	array
`fullText`	Only applies in Page titles mode. When on, returns the whole article as plain text instead of just the intro paragraph(s).	boolean
`language`	Wikipedia language edition code, e.g. en, fr, de, es, ja. Picks the host {lang}.wikipedia.org.	string
`maxItems`	Maximum number of pages to return. In search mode the actor paginates until it reaches this. In page-titles mode it caps how many titles are fetched.	integer
`notionConnector`	Optional. Writes each result as a page in your Notion workspace when the run finishes. Authorize a Notion connector once in Settings → API & Integrations → MCP connectors, then pick it here. Leave empty to skip (default). Results are always saved to the dataset regardless.	string
`notionParentId`	Optional. The Notion data source ID of the database to write into (only used if a Notion connector is set). Leave empty to create the pages privately in your workspace instead.	string

What you get

A structured dataset — each result includes fields like:

mode, pageid, size, snippet, timestamp, title, url, wordcount

Export every run as JSON, CSV or Excel, or send it to your app, a database, Google Sheets, or an AI agent.
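If you export a run as JSON and want a flat CSV with a fixed column order, a conversion along these lines may help. This is a minimal sketch that assumes the export is a JSON array of row objects; the function name rows_to_csv and the column order are illustrative choices, not part of the actor.

```python
import csv
import io
import json

# Column names taken from the field list above; the order is an arbitrary choice.
FIELDS = ["mode", "pageid", "title", "url", "snippet", "wordcount", "size", "timestamp"]


def rows_to_csv(json_text: str) -> str:
    """Convert a JSON array of dataset rows into CSV text.

    Assumes the export is a list of flat objects. Missing fields become
    empty cells, and fields not listed in FIELDS are ignored.
    """
    rows = json.loads(json_text)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, extrasaction="ignore", restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Page-mode rows carry other fields (for example extract or categories), so extend FIELDS if you need them in the CSV.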

3 ready-to-run use cases

Wikipedia Search API: Find Pages by Keyword

Search Wikipedia by keyword and rank matching pages by relevance, with snippets and word counts so researchers can decide what to read first.

Bulk Wikipedia Fetch: Get Page Data by Title

Pass up to 50 page titles and get each Wikipedia entry's intro, thumbnail, and categories back as clean JSON for datasets and enrichment.

Full Wikipedia Article Text Scraper (Plain Text)

Need the complete body of Wikipedia articles? This task returns full plain-text content by title, ready for NLP, summarization, and text analysis.

Wikipedia Scraper

Search Wikipedia by keyword, or fetch clean, structured page data for exact titles, straight from the official MediaWiki Action API. No API key, no login, and no anti-bot measures to work around.

Two modes

1. Search — set searchQuery. Returns matching articles with title, pageid, url, a plain-text snippet (the API's HTML is stripped for you), wordcount, size, and timestamp. The actor paginates automatically (50 per request) up to maxItems.
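To make the pagination and snippet cleanup concrete, here is a rough sketch of how search requests of this kind can be planned against the MediaWiki Action API. It is not the actor's source code: the helper names search_params and strip_snippet are hypothetical, and a real client would also stop early when the API reports no further results.

```python
import html
import re
from typing import Iterator

PAGE_SIZE = 50  # per-request page size described above


def search_params(query: str, max_items: int) -> Iterator[dict]:
    """Yield one Action API parameter dict per request until max_items is covered."""
    offset = 0
    while offset < max_items:
        yield {
            "action": "query",
            "list": "search",
            "srsearch": query,
            "srlimit": min(PAGE_SIZE, max_items - offset),
            "sroffset": offset,
            "format": "json",
        }
        offset += PAGE_SIZE


def strip_snippet(snippet_html: str) -> str:
    """Remove highlight tags from a search snippet and decode HTML entities."""
    return html.unescape(re.sub(r"<[^>]+>", "", snippet_html))
```

With maxItems set to 120, this plans three requests of 50, 50, and 20 results.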

2. Page data — set pageTitles (a list of exact article titles). Returns title, pageid, url, the plain-text extract, a thumbnail image URL, and categories. Titles are batched 50 at a time. Turn on fullText to get the whole article instead of just the intro.
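The batching in page-data mode can be pictured roughly as follows. This is a sketch under assumptions: the parameter names are standard Action API options for extracts, page images, and categories, but the exact set the actor sends is not documented here, and chunk_titles and page_params are hypothetical helpers. The API may also apply its own per-request limits on extracts, which a real client would handle by following continuation.

```python
from typing import Iterator, List

BATCH_SIZE = 50  # titles per request, as described above


def chunk_titles(titles: List[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Split the title list into consecutive batches of at most `size` titles."""
    for start in range(0, len(titles), size):
        yield titles[start:start + size]


def page_params(batch: List[str], full_text: bool = False) -> dict:
    """Build Action API parameters for one batch of exact titles."""
    params = {
        "action": "query",
        "prop": "extracts|pageimages|categories",
        "titles": "|".join(batch),  # multiple titles are pipe-separated
        "explaintext": 1,           # plain text instead of HTML
        "pithumbsize": 400,         # thumbnail width cap in pixels
        "clshow": "!hidden",        # skip hidden categories
        "format": "json",
    }
    if not full_text:
        params["exintro"] = 1       # intro section only
    return params
```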

(If both are provided, search mode wins. Provide one or the other.)
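The precedence rule can be expressed as a small function. This is only an illustration: resolve_mode is a hypothetical name, the "search" and "page" labels mirror the mode names used on this page, and raising an error when neither input is set is an assumption about reasonable behavior rather than something the actor is known to do.

```python
from typing import List, Optional


def resolve_mode(search_query: Optional[str], page_titles: Optional[List[str]]) -> str:
    """Pick the mode the way the note above describes: search wins if both are set."""
    if search_query and search_query.strip():
        return "search"
    if page_titles:
        return "page"
    # Assumption: with neither input there is nothing to run.
    raise ValueError("Provide searchQuery or pageTitles")
```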

What you get per row

Field	Mode	Notes
`title`	both	Article title.
`pageid`	both	Stable Wikipedia page id (used to dedupe).
`url`	both	Canonical article URL.
`snippet`	search	Plain-text match snippet (HTML stripped).
`wordcount`, `size`, `timestamp`	search	Article word count, byte size, last-edit time.
`extract`	page	Plain-text article text (intro, or full body with `fullText`).
`thumbnail`	page	Lead image URL (up to 400px), if the page has one.
`categories`	page	Visible category names (hidden categories excluded).

Input

Field	Notes
`searchQuery`	Keywords, e.g. `machine learning`. Leave empty if using titles.
`pageTitles`	List of exact titles, e.g. `["Apify", "Web scraping"]`.
`fullText`	Page mode only. Full article text vs. just the intro. Default off.
`language`	Wikipedia edition: `en`, `fr`, `de`, `es`, `ja`, … Default `en`.
`maxItems`	Cap on returned pages. Default 50.
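If you build inputs in code, a small helper can keep the defaults from the table in one place. The sketch below is illustrative only: the ScraperInput class is hypothetical, and it simply serializes the documented fields to the JSON shape the actor expects.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class ScraperInput:
    """Input fields with the defaults listed in the table above."""
    search_query: str = ""
    page_titles: List[str] = field(default_factory=list)
    full_text: bool = False  # page mode only, default off
    language: str = "en"
    max_items: int = 50

    def to_json(self) -> str:
        """Serialize to actor input JSON, leaving out empty optional fields."""
        payload = {"language": self.language, "maxItems": self.max_items}
        if self.search_query:
            payload["searchQuery"] = self.search_query
        if self.page_titles:
            payload["pageTitles"] = self.page_titles
            payload["fullText"] = self.full_text
        return json.dumps(payload)
```

For example, ScraperInput(search_query="machine learning", max_items=30).to_json() produces a search-mode input with language en.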

Output

Each page produces one dataset row with ok: true, and you are charged per page. An empty search or an unknown title returns a non-charged diagnostic row with an errorCode and a human-readable reason, rather than silently returning nothing.
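When consuming the dataset, it can help to separate page rows from diagnostic rows before further processing. A minimal sketch, assuming diagnostic rows either set ok to false or omit it (this page only states that page rows have ok: true); split_rows is a hypothetical helper name.

```python
from typing import Iterable, List, Tuple


def split_rows(rows: Iterable[dict]) -> Tuple[List[dict], List[dict]]:
    """Return (pages, diagnostics) based on the ok flag of each row."""
    pages: List[dict] = []
    diagnostics: List[dict] = []
    for row in rows:
        if row.get("ok") is True:
            pages.append(row)
        else:
            # Assumption: anything without ok: true is a diagnostic row.
            diagnostics.append(row)
    return pages, diagnostics
```

Logging the errorCode of each diagnostic row is a simple way to spot misspelled titles.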

Example

{ "searchQuery": "machine learning", "language": "en", "maxItems": 30 }

{ "pageTitles": ["Apify", "Web scraping"], "fullText": false, "language": "en" }

Notes

Uses https://{language}.wikipedia.org/w/api.php. Per Wikimedia's policy, the actor always sends a descriptive User-Agent that includes contact information. Results are deduped by pageid.
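If you call the same endpoint yourself or merge several runs, the notes above translate into something like the following. It is a sketch, not the actor's code: api_url and dedupe_by_pageid are hypothetical helpers, the language-code check is a deliberately loose assumption, and the User-Agent value is a placeholder to replace with your own tool name and contact.

```python
import re
from typing import Iterable, List

# Placeholder value; use your own tool name and contact details.
HEADERS = {"User-Agent": "example-wikipedia-client/0.1 (contact@example.com)"}


def api_url(language: str = "en") -> str:
    """Build the Action API endpoint for a Wikipedia language edition."""
    # Loose check (assumption): lowercase letters, digits, and hyphens only.
    if not re.fullmatch(r"[a-z][a-z0-9-]*", language):
        raise ValueError(f"Unexpected language code: {language!r}")
    return f"https://{language}.wikipedia.org/w/api.php"


def dedupe_by_pageid(rows: Iterable[dict]) -> List[dict]:
    """Keep the first row for each pageid; rows without a pageid pass through."""
    seen = set()
    unique: List[dict] = []
    for row in rows:
        pageid = row.get("pageid")
        if pageid is None:
            unique.append(row)
            continue
        if pageid not in seen:
            seen.add(pageid)
            unique.append(row)
    return unique
```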