Wikipedia Scraper

Search Wikipedia by keyword or fetch clean page data (plain text, thumbnail, categories, URL) by title. No API key required and no anti-bot measures to work around. Up to 50 titles per batch.

Developer & Research Tools

How it works

  1. Open it on Apify

     Hit Run on Apify. The tool opens in the cloud, with nothing to install.

  2. Set the inputs

     Adjust searchQuery, pageTitles, and fullText (sensible defaults are pre-filled).

  3. Click Run

     The tool runs on Apify’s cloud and collects the data for you.

  4. Export the results

     Download as JSON, CSV or Excel, or pipe straight into your app, Google Sheets, or an AI agent.

Inputs

| Field | What it does | Type |
| --- | --- | --- |
| searchQuery | Keywords to search Wikipedia for (e.g. "machine learning"). Returns matching pages with snippet, word count, and URL. Leave empty if you provide exact page titles instead. | string |
| pageTitles | Exact Wikipedia article titles to fetch full data for (plain-text extract, thumbnail, categories, URL). Batched 50 at a time. Use this OR a search query. | array |
| fullText | Only applies in page-titles mode. When on, returns the whole article as plain text instead of just the intro paragraph(s). | boolean |
| language | Wikipedia language edition code, e.g. en, fr, de, es, ja. Selects the host {lang}.wikipedia.org. | string |
| maxItems | Maximum number of pages to return. In search mode the actor paginates until it reaches this number. In page-titles mode it caps how many titles are fetched. | integer |
| notionConnector | Optional. Writes each result as a page in your Notion workspace when the run finishes. Authorize a Notion connector once in Settings → API & Integrations → MCP connectors, then pick it here. Leave empty to skip (default); results are always saved to the dataset regardless. | string |
| notionParentId | Optional. The Notion data source ID of the database to write into (only used if a Notion connector is set). Leave empty to create the pages privately in your workspace instead. | string |

What you get

A structured dataset — each result includes fields like:

mode, pageid, size, snippet, timestamp, title, url, wordcount

Export every run as JSON, CSV or Excel, or send it to your app, a database, Google Sheets, or an AI agent.

3 ready-to-run use cases

Wikipedia Search API: Find Pages by Keyword

Search Wikipedia by keyword and rank matching pages by relevance, with snippets and word counts so researchers can decide what to read first.

Bulk Wikipedia Fetch: Get Page Data by Title

Pass up to 50 page titles and get each Wikipedia entry's intro, thumbnail, and categories back as clean JSON for datasets and enrichment.

Full Wikipedia Article Text Scraper (Plain Text)

Need the complete body of Wikipedia articles? This task returns full plain-text content by title, ready for NLP, summarization, and text analysis.

Wikipedia Scraper

Search Wikipedia by keyword, or fetch clean, structured page data for exact titles, straight from the official MediaWiki Action API. No API key, no login, and no anti-bot measures to work around.

Two modes

1. Search — set searchQuery. Returns matching articles with title, pageid, url, a plain-text snippet (the API's HTML is stripped for you), wordcount, size, and timestamp. The actor paginates automatically (50 per request) up to maxItems.

2. Page data — set pageTitles (a list of exact article titles). Returns title, pageid, url, the plain-text extract, a thumbnail image URL, and categories. Titles are batched 50 at a time. Turn on fullText to get the whole article instead of just the intro.

(If both are provided, search mode wins. Provide one or the other.)
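The two modes correspond to two kinds of MediaWiki Action API query. The sketch below is an assumption about how such requests could be built, not the actor's actual code. The helper names (`resolve_mode`, `search_params`, `page_params`, `batches`) are hypothetical. The query parameters (`list=search`, `srsearch`, `srlimit`, `sroffset`, `srprop`, `prop=extracts|pageimages|categories`, `explaintext`, `exintro`, `pithumbsize`, `clshow`) are standard Action API parameters.

```python
from typing import Dict, List, Optional

BATCH_SIZE = 50  # results per search request and titles per page request, per the text above


def resolve_mode(search_query: Optional[str], page_titles: Optional[List[str]]) -> str:
    """Return "search" or "page". Search wins when both inputs are given."""
    if search_query and search_query.strip():
        return "search"
    if page_titles:
        return "page"
    raise ValueError("Provide either searchQuery or pageTitles")


def search_params(query: str, offset: int = 0, limit: int = BATCH_SIZE) -> Dict[str, str]:
    """Query-string parameters for one page of keyword search results."""
    return {
        "action": "query",
        "format": "json",
        "list": "search",
        "srsearch": query,
        "srlimit": str(limit),
        "sroffset": str(offset),  # advance by `limit` to paginate
        "srprop": "snippet|wordcount|size|timestamp",
    }


def page_params(titles: List[str], full_text: bool = False) -> Dict[str, str]:
    """Query-string parameters for one batch of exact titles."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "extracts|pageimages|categories",
        "titles": "|".join(titles),
        "explaintext": "1",      # plain text instead of HTML
        "piprop": "thumbnail",
        "pithumbsize": "400",    # thumbnail width cap in pixels
        "clshow": "!hidden",     # skip hidden categories
    }
    if not full_text:
        params["exintro"] = "1"  # intro section only
    # Assumption: continuation handling is omitted; the API may return
    # extracts for only part of a batch per response.
    return params


def batches(titles: List[str], size: int = BATCH_SIZE) -> List[List[str]]:
    """Split the title list into chunks of at most `size`."""
    return [titles[i:i + size] for i in range(0, len(titles), size)]
```

Each dictionary can be passed as the query string of a GET request to the edition's `api.php` endpoint. Building the parameters separately from sending them keeps the mode logic easy to test without network access.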

What you get per row

| Field | Mode | Notes |
| --- | --- | --- |
| title | both | Article title. |
| pageid | both | Stable Wikipedia page id (used to dedupe). |
| url | both | Canonical article URL. |
| snippet | search | Plain-text match snippet (HTML stripped). |
| wordcount, size, timestamp | search | Article word count, byte size, last-edit time. |
| extract | page | Plain-text article text (intro, or full body with fullText). |
| thumbnail | page | Lead image URL (up to 400px), if the page has one. |
| categories | page | Visible category names (hidden categories excluded). |
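The search API returns snippets as HTML, with matched terms wrapped in `<span class="searchmatch">` tags and special characters encoded as entities. A minimal sketch of how the snippet field could be reduced to plain text follows. The function name `strip_snippet` and the regex-based approach are assumptions for illustration, not necessarily what the actor does.

```python
import html
import re

# Matches any HTML tag, such as <span class="searchmatch"> or </span>.
_TAG_RE = re.compile(r"<[^>]+>")


def strip_snippet(raw: str) -> str:
    """Turn an HTML search snippet into plain text."""
    text = _TAG_RE.sub("", raw)   # drop the markup first
    text = html.unescape(text)    # then decode entities such as &amp;
    return " ".join(text.split())  # collapse runs of whitespace


example = (
    '<span class="searchmatch">Machine</span> '
    '<span class="searchmatch">learning</span> is a field &amp; more'
)
print(strip_snippet(example))
```

Tags are removed before entities are decoded so that escaped angle brackets in the article text survive as literal characters.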

Input

| Field | Notes |
| --- | --- |
| searchQuery | Keywords, e.g. machine learning. Leave empty if using titles. |
| pageTitles | List of exact titles, e.g. ["Apify", "Web scraping"]. |
| fullText | Page mode only. Full article text vs. just the intro. Default off. |
| language | Wikipedia edition: en, fr, de, es, ja, … Default en. |
| maxItems | Cap on returned pages. Default 50. |

Output

One dataset row per page (ok: true). Charged per page. Empty searches or unknown titles return a non-charged diagnostic row with an errorCode and a human-readable reason instead of silently returning nothing.
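Downstream code usually needs to separate real pages from diagnostic rows. The sketch below assumes each row is a dictionary with the `ok` and `errorCode` fields named above. The helper name `split_rows`, the `reason` key, and the sample `errorCode` value are hypothetical, since the exact names and values are not listed here.

```python
from typing import Any, Dict, Iterable, List, Tuple

Row = Dict[str, Any]


def split_rows(rows: Iterable[Row]) -> Tuple[List[Row], List[Row]]:
    """Partition dataset rows into (pages, diagnostics) by the `ok` flag."""
    pages: List[Row] = []
    diagnostics: List[Row] = []
    for row in rows:
        if row.get("ok") is True:
            pages.append(row)
        else:
            diagnostics.append(row)
    return pages, diagnostics


# Illustrative rows only; the errorCode value and "reason" key are made up.
sample = [
    {"ok": True, "title": "Web scraping", "pageid": 1},
    {"ok": False, "errorCode": "EXAMPLE_CODE", "reason": "illustrative message"},
]
pages, diagnostics = split_rows(sample)
```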

Example

{ "searchQuery": "machine learning", "language": "en", "maxItems": 30 }
{ "pageTitles": ["Apify", "Web scraping"], "fullText": false, "language": "en" }
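The two inputs above can also be submitted from code. The following is a minimal sketch, assuming a client object shaped like the one from the official `apify-client` Python package, which exposes `actor(id).call(run_input=...)` and `dataset(id).iterate_items()`. The helper `run_and_collect` is hypothetical, and the actor ID must be supplied by the caller because it is not stated here.

```python
from typing import Any, Dict, List

# The two example inputs from above, one per mode.
SEARCH_INPUT = {"searchQuery": "machine learning", "language": "en", "maxItems": 30}
TITLES_INPUT = {"pageTitles": ["Apify", "Web scraping"], "fullText": False, "language": "en"}


def run_and_collect(client: Any, actor_id: str, run_input: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Start a run, wait for it to finish, and return its dataset rows."""
    # `call` blocks until the run ends and returns the run's metadata.
    run = client.actor(actor_id).call(run_input=run_input)
    # Every run writes its results to a default dataset.
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

With the real package, `client` would be created as `ApifyClient("<YOUR_APIFY_TOKEN>")` after `from apify_client import ApifyClient`, and `actor_id` would be this actor's ID as shown on its Apify page.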

Notes

Uses https://{language}.wikipedia.org/w/api.php. Per Wikimedia's policy the actor always sends a descriptive User-Agent with a contact. Results are deduped by pageid.
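The notes above can be sketched in a few lines. This is an illustration under stated assumptions, not the actor's code: the helper names, the language-code check, and the User-Agent string (including its contact address) are all hypothetical.

```python
import re
from typing import Any, Dict, Iterable, List

# Hypothetical User-Agent; a real one should name your tool and a real contact.
HEADERS = {"User-Agent": "ExampleWikipediaTool/1.0 (contact@example.com)"}


def api_endpoint(language: str = "en") -> str:
    """Build the Action API URL for a Wikipedia language edition."""
    # Assumption: a simple sanity check so the code cannot produce a bogus host.
    if not re.fullmatch(r"[a-z]+(-[a-z0-9]+)*", language):
        raise ValueError(f"Unexpected language code: {language!r}")
    return f"https://{language}.wikipedia.org/w/api.php"


def dedupe_by_pageid(rows: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep the first row seen for each pageid, preserving order."""
    seen = set()
    unique: List[Dict[str, Any]] = []
    for row in rows:
        pageid = row.get("pageid")
        if pageid is not None:
            if pageid in seen:
                continue
            seen.add(pageid)
        unique.append(row)  # rows without a pageid are kept as-is
    return unique
```

Deduplicating on `pageid` rather than on `title` avoids treating a renamed or redirected article as two different pages.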