Wikipedia Scraper
Search Wikipedia by keyword or fetch clean page data (plain text, thumbnail, categories, URL) by title. No API key required and no anti-bot measures to work around. Up to 50 titles per batch.
How it works
1. Open it on Apify. Hit Run on Apify; it opens the tool in the cloud, no install.
2. Set the inputs. Adjust searchQuery, pageTitles, and fullText (sensible defaults are pre-filled).
3. Click Run. The tool runs on Apify’s cloud and collects the data for you.
4. Export the results. Download as JSON, CSV or Excel, or pipe straight into your app, Google Sheets, or an AI agent.
Inputs
| Field | What it does | Type |
|---|---|---|
| searchQuery | Keywords to search Wikipedia for (e.g. "machine learning"). Returns matching pages with snippet, word count, and URL. Leave empty if you provide exact page titles instead. | string |
| pageTitles | Exact Wikipedia article titles to fetch full data for (plain-text extract, thumbnail, categories, URL). Batched 50 at a time. Use this OR a search query. | array |
| fullText | Only applies in page-titles mode. When on, returns the whole article as plain text instead of just the intro paragraph(s). | boolean |
| language | Wikipedia language edition code, e.g. en, fr, de, es, ja. Picks the host {lang}.wikipedia.org. | string |
| maxItems | Maximum number of pages to return. In search mode the actor paginates until it reaches this. In page-titles mode it caps how many titles are fetched. | integer |
| notionConnector | Optional. Writes each result as a page in your Notion workspace when the run finishes. Authorize a Notion connector once in Settings → API & Integrations → MCP connectors, then pick it here. Leave empty to skip (default); results are always saved to the dataset regardless. | string |
| notionParentId | Optional. The Notion data source ID of the database to write into (only used if a Notion connector is set). Leave empty to create the pages privately in your workspace instead. | string |
What you get
A structured dataset — each result includes fields like:
mode, pageid, size, snippet, timestamp, title, url, wordcount
Export every run as JSON, CSV or Excel, or send it to your app, a database, Google Sheets, or an AI agent.
3 ready-to-run use cases
Wikipedia Search API: Find Pages by Keyword
Search Wikipedia by keyword and rank matching pages by relevance, with snippets and word counts so researchers can decide what to read first.
Bulk Wikipedia Fetch: Get Page Data by Title
Pass up to 50 page titles and get each Wikipedia entry's intro, thumbnail, and categories back as clean JSON for datasets and enrichment.
Full Wikipedia Article Text Scraper (Plain Text)
Need the complete body of Wikipedia articles? This task returns full plain-text content by title, ready for NLP, summarization, and text analysis.
Wikipedia Scraper
Search Wikipedia by keyword, or fetch clean, structured page data for exact titles, straight from the official MediaWiki Action API. No API key, no login, and no anti-bot measures to work around.
Two modes
1. Search — set searchQuery. Returns matching articles with title, pageid, url, a plain-text snippet (the API's HTML is stripped for you), wordcount, size, and timestamp. The actor paginates automatically (50 per request) up to maxItems.
2. Page data — set pageTitles (a list of exact article titles). Returns title, pageid, url, the plain-text extract, a thumbnail image URL, and categories. Titles are batched 50 at a time. Turn on fullText to get the whole article instead of just the intro.
(If both are provided, search mode wins. Provide one or the other.)
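
The mode rule and the 50-title batching described above can be sketched in Python. This is an illustration under stated assumptions, not the actor's source: the names `pick_mode` and `batch_titles` are hypothetical, and treating a whitespace-only searchQuery as empty is an assumption.

```python
from typing import Iterator, List, Optional

BATCH_SIZE = 50  # titles per request, as stated in the text above


def pick_mode(search_query: Optional[str], page_titles: Optional[List[str]]) -> str:
    """Return "search" or "page"; search wins when both inputs are given."""
    if search_query and search_query.strip():
        return "search"
    if page_titles:
        return "page"
    raise ValueError("provide either searchQuery or pageTitles")


def batch_titles(titles: List[str], max_items: int,
                 size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Cap the title list at max_items, then yield it in chunks of `size`."""
    capped = titles[:max_items]
    for start in range(0, len(capped), size):
        yield capped[start:start + size]
```

With 120 titles and maxItems set to 110, this sketch would issue three batches of 50, 50, and 10 titles.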
What you get per row
| Field | Mode | Notes |
|---|---|---|
| title | both | Article title. |
| pageid | both | Stable Wikipedia page id (used to dedupe). |
| url | both | Canonical article URL. |
| snippet | search | Plain-text match snippet (HTML stripped). |
| wordcount, size, timestamp | search | Article word count, byte size, last-edit time. |
| extract | page | Plain-text article text (intro, or full body with fullText). |
| thumbnail | page | Lead image URL (up to 400px), if the page has one. |
| categories | page | Visible category names (hidden categories excluded). |
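
To show what "HTML stripped" means for the snippet field: the MediaWiki search API wraps matched terms in `<span class="searchmatch">` tags and escapes characters as HTML entities. A minimal sketch of one way to clean that up follows. The function name `strip_snippet` is hypothetical, and the actor's actual cleaning logic may differ.

```python
import html
import re

_TAG = re.compile(r"<[^>]+>")  # matches any HTML tag


def strip_snippet(raw: str) -> str:
    """Remove tags, decode HTML entities, and collapse whitespace."""
    text = _TAG.sub("", raw)
    text = html.unescape(text)
    return " ".join(text.split())
```

A regular expression is enough here because the snippets contain only simple inline markup; arbitrary HTML would call for a real parser.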
Input
| Field | Notes |
|---|---|
| searchQuery | Keywords, e.g. machine learning. Leave empty if using titles. |
| pageTitles | List of exact titles, e.g. ["Apify", "Web scraping"]. |
| fullText | Page mode only. Full article text vs. just the intro. Default off. |
| language | Wikipedia edition: en, fr, de, es, ja, … Default en. |
| maxItems | Cap on returned pages. Default 50. |
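
How maxItems interacts with search pagination can be sketched as below. The query parameters (`list=search`, `srsearch`, `srlimit`, `sroffset`) are the MediaWiki Action API's search parameters. The `fetch` callable is a hypothetical stand-in for the HTTP request, and the loop itself is an assumption about how such pagination could be written, not the actor's code.

```python
from typing import Callable, Dict, Iterator


def search_params(query: str, offset: int, limit: int) -> Dict[str, object]:
    """Build the query string parameters for one page of search results."""
    return {
        "action": "query",
        "list": "search",
        "srsearch": query,
        "srlimit": limit,
        "sroffset": offset,
        "format": "json",
    }


def iter_search(query: str, max_items: int,
                fetch: Callable[[Dict[str, object]], dict],
                page_size: int = 50) -> Iterator[dict]:
    """Yield search hits until max_items is reached or results run out."""
    offset = 0
    returned = 0
    while returned < max_items:
        limit = min(page_size, max_items - returned)
        data = fetch(search_params(query, offset, limit))
        hits = data.get("query", {}).get("search", [])
        for hit in hits:
            yield hit
            returned += 1
        # The API signals more results through continue.sroffset.
        next_offset = data.get("continue", {}).get("sroffset")
        if not hits or next_offset is None:
            break
        offset = next_offset
```

Passing `fetch` in as an argument keeps the loop independent of any HTTP library and makes it easy to exercise with canned responses.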
Output
One dataset row per page (ok: true). Charged per page. Empty searches or unknown titles return a non-charged diagnostic row with an errorCode and a human-readable reason instead of silently returning nothing.
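
When consuming the exported dataset, it can help to separate page rows from diagnostic rows. A minimal sketch follows; the `ok` and `errorCode` field names come from the description above, while the function name `split_rows`, the `reason` field name, and the sample error code are assumptions for illustration.

```python
from typing import List, Tuple


def split_rows(rows: List[dict]) -> Tuple[List[dict], List[dict]]:
    """Separate charged page rows (ok is True) from diagnostic rows."""
    pages = [r for r in rows if r.get("ok") is True]
    diagnostics = [r for r in rows if r.get("ok") is not True]
    return pages, diagnostics


# Hypothetical sample rows; real error codes and field names may differ.
sample = [
    {"ok": True, "title": "Apify", "pageid": 1},
    {"ok": False, "errorCode": "EXAMPLE_NOT_FOUND", "reason": "Unknown title"},
]
pages, diagnostics = split_rows(sample)
```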
Example
{ "searchQuery": "machine learning", "language": "en", "maxItems": 30 }
{ "pageTitles": ["Apify", "Web scraping"], "fullText": false, "language": "en" }
Notes
Uses https://{language}.wikipedia.org/w/api.php. Per Wikimedia's User-Agent policy, the actor always sends a descriptive User-Agent that includes contact information. Results are deduped by pageid.
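
The endpoint pattern and the pageid dedupe can be sketched as follows. The helper names and the User-Agent string are hypothetical placeholders, not what the actor actually sends, and keeping the first occurrence of each pageid is an assumption.

```python
from typing import Iterable, List


def api_url(language: str = "en") -> str:
    """Build the Action API endpoint for a Wikipedia language edition."""
    return f"https://{language}.wikipedia.org/w/api.php"


# Hypothetical example of a descriptive User-Agent with a contact address.
HEADERS = {"User-Agent": "ExampleWikiTool/0.1 (contact@example.com)"}


def dedupe_by_pageid(rows: Iterable[dict]) -> List[dict]:
    """Keep the first row seen for each pageid, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        pageid = row["pageid"]
        if pageid in seen:
            continue
        seen.add(pageid)
        unique.append(row)
    return unique
```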