GDELT Worldwide News Scraper
Search global news with the GDELT 2.0 API. Get articles with title, URL, domain, country, language, and date. Filter by keyword, timespan, country. No API key.
How it works
- 1. Open it on Apify
  Hit Run on Apify. The tool opens in the cloud, with nothing to install.
- 2. Set the inputs
  Adjust query, maxItems, and sort (sensible defaults are pre-filled).
- 3. Click Run
  The tool runs on Apify's cloud and collects the data for you.
- 4. Export the results
  Download as JSON, CSV or Excel, or pipe straight into your app, Google Sheets, or an AI agent.
Inputs
| Field | What it does | Type |
|---|---|---|
| query | Keywords to search worldwide news for. Wrap phrases in quotes (e.g. "climate change"). Combine with OR, or operators like domain:reuters.com. Very short or very common single words may be rejected by GDELT; add a second word or quote a phrase. | string |
| maxItems | Maximum number of articles to return. GDELT hard-caps a single request at 250, so higher values are automatically capped at 250. | integer |
| sort | How to order results: DateDesc (newest first), DateAsc (oldest first), or HybridRel (by relevance to the query). | string |
| timespan | Only return articles from the last N units of time, e.g. 1d, 3d, 1w, 1m, 3m. Leave empty for GDELT's default window. GDELT only covers roughly the last 3 months. | string |
| sourceCountry | Limit to articles from a given source country, appended to the query as sourcecountry:{code}. Use GDELT country codes (e.g. US, UK, FR, DE, IN). Leave empty for all countries. | string |
| sourceLang | Limit to articles in a given language, appended to the query as sourcelang:{code} (e.g. english, french, spanish, german). Leave empty for all languages. | string |
| notionConnector | Optional. Write each article as a page into your Notion when the run finishes. Authorize a Notion connector once in Settings → API & Integrations → MCP connectors, then pick it here. Leave empty to skip (default); results are always saved to the dataset regardless. | string |
| notionParentId | Optional. The Notion data source ID of the database to write into (only used if a Notion connector is set). Leave empty to create the pages privately in your workspace instead. | string |
What you get
A structured dataset — each result includes fields like:
details, query, sort, sourceCountry, sourceLang, timespan
Export every run as JSON, CSV or Excel, or send it to your app, a database, Google Sheets, or an AI agent.
2 ready-to-run use cases
AI News Scraper Worldwide, Newest First | GDELT
Newest artificial intelligence headlines from global news outlets, sorted latest-first. Returns title, URL, source, country and date for AI trend tracking.
World Cup News Scraper, Global Coverage | GDELT
FIFA World Cup articles from every country and language GDELT indexes, ranked by relevance. Each result returns headline, link, source country and date.
GDELT Worldwide News Scraper
Search worldwide news through the GDELT 2.0 DOC API — a public, no-key, no-login JSON endpoint that indexes online news from across the planet in 65+ languages. No proxy or anti-bot handling needed.
What it does
Given a search query, the actor calls the GDELT DOC API in ArtList mode and returns clean, structured articles. You can filter by recency, source country, and source language, and sort by newest, oldest, or relevance.
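As a rough illustration of what an ArtList request looks like, the sketch below builds a DOC API URL from a query string. The endpoint URL, the helper name `build_artlist_url`, and the exact parameter set are assumptions for illustration, not the actor's actual code; check them against GDELT's own documentation before relying on them.

```python
from urllib.parse import urlencode

# Assumed endpoint for the GDELT 2.0 DOC API (verify against GDELT's docs).
GDELT_DOC_ENDPOINT = "https://api.gdeltproject.org/api/v2/doc/doc"


def build_artlist_url(query, max_items=100, sort="DateDesc", timespan=None):
    """Build a DOC API request URL in ArtList mode (hypothetical helper)."""
    params = {
        "query": query,
        "mode": "ArtList",   # return a list of matching articles
        "format": "json",    # ask for JSON rather than HTML
        "maxrecords": min(max(int(max_items), 1), 250),  # GDELT caps at 250
        "sort": sort,
    }
    if timespan:
        params["timespan"] = timespan  # e.g. "1d", "1w", "3m"
    # urlencode percent-encodes quotes and turns spaces into "+".
    return f"{GDELT_DOC_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_artlist_url('"climate change"', max_items=50, timespan="1w"))
```

The helper only builds the URL; fetching it and handling errors are separate concerns.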
Input
| Field | Type | Default | Notes |
|---|---|---|---|
| query | string | (none, required) | Keywords. Quote phrases: "climate change". Very short/common single words may be rejected by GDELT. |
| maxItems | integer | 100 | Capped at 250, because GDELT returns at most 250 articles per request. |
| sort | string | DateDesc | DateDesc (newest), DateAsc (oldest), HybridRel (relevance). |
| timespan | string | (none) | Recency window, e.g. 1d, 3d, 1w, 1m, 3m. GDELT covers roughly the last 3 months. |
| sourceCountry | string | (none) | Appended as sourcecountry:{code} (e.g. US, UK, FR). |
| sourceLang | string | (none) | Appended as sourcelang:{code} (e.g. english, french). |
| proxyConfiguration | object | (none) | Optional; not needed (public API). |
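The defaults and caps in the table can be pictured as a small normalization step. This is a minimal sketch under stated assumptions: `normalize_input` is a hypothetical helper, and the lower bound of 1 on maxItems is an assumption that the table does not state.

```python
def normalize_input(raw):
    """Apply the documented defaults and caps to a raw input dict (hypothetical)."""
    query = (raw.get("query") or "").strip()
    if not query:
        raise ValueError("query is required")

    # Country and language filters are appended to the query as operators.
    if raw.get("sourceCountry"):
        query += f" sourcecountry:{raw['sourceCountry'].strip()}"
    if raw.get("sourceLang"):
        query += f" sourcelang:{raw['sourceLang'].strip()}"

    max_items = raw.get("maxItems")
    max_items = 100 if max_items is None else int(max_items)  # default 100

    return {
        "query": query,
        "maxItems": max(1, min(max_items, 250)),  # 250 cap; floor of 1 is assumed
        "sort": raw.get("sort") or "DateDesc",
        "timespan": raw.get("timespan") or None,  # None means GDELT's default window
    }


if __name__ == "__main__":
    print(normalize_input({"query": '"climate change"', "sourceCountry": "FR", "maxItems": 1000}))
```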
Output
Each successful row:
{
  "ok": true,
  "title": "…",
  "url": "https://…",
  "domain": "example.com",
  "sourceCountry": "United States",
  "language": "English",
  "publishedAt": "2026-06-11T12:00:00.000Z",
  "socialImage": "https://…"
}
Results are de-duplicated by URL. Each ok:true article is billed one article charge unit. Diagnostic rows (ok:false) and empty/blocked runs are never charged.
Nullable fields: GDELT does not always populate every field. Any of title, url, domain, sourceCountry, language, publishedAt, and socialImage can be null for a given article (e.g. socialImage is often missing, and publishedAt is null when GDELT's seendate is unparseable). Rows with neither a url nor a title are dropped before charging.
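The rules above (nullable fields, de-duplication by URL, dropping rows with neither a url nor a title) could look roughly like the sketch below. The raw field names `sourcecountry` and `socialimage`, the compact `seendate` format, and both helper names are assumptions for illustration; the actor's real mapping may differ.

```python
from datetime import datetime, timezone


def parse_seendate(value):
    """Convert a compact UTC stamp like '20260611T120000Z' to ISO 8601, else None."""
    try:
        parsed = datetime.strptime(value, "%Y%m%dT%H%M%SZ")
    except (TypeError, ValueError):
        return None  # missing or unparseable seendate
    return parsed.replace(tzinfo=timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.000Z")


def to_rows(articles):
    """Map raw article dicts to output rows, de-duplicated by URL (hypothetical)."""
    rows, seen_urls = [], set()
    for art in articles:
        url = art.get("url") or None
        title = art.get("title") or None
        if url is None and title is None:
            continue  # nothing useful to return, so drop the row
        if url is not None:
            if url in seen_urls:
                continue  # duplicate URL
            seen_urls.add(url)
        rows.append({
            "ok": True,
            "title": title,
            "url": url,
            "domain": art.get("domain") or None,
            "sourceCountry": art.get("sourcecountry") or None,
            "language": art.get("language") or None,
            "publishedAt": parse_seendate(art.get("seendate")),
            "socialImage": art.get("socialimage") or None,
        })
    return rows
```

Empty strings are treated the same as missing values here, so both end up as null in the output.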
Diagnostics
The actor never fails silently. Instead it writes a single diagnostic row (ok:false) with an errorCode and never charges for it:
- BAD_INPUT: GDELT rejected the query (e.g. "query too short"). Quote phrases and avoid overly short/common terms.
- NO_RESULTS: the query was valid but matched nothing. Broaden it or widen the timespan.
- RATE_LIMITED / SERVER_ERROR / NETWORK: transient issues; the actor retried with backoff first.
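One way to picture the "retry with backoff, then emit a single diagnostic row" behaviour is the sketch below. The attempt count, the delays, and the `run_with_retries` helper are all assumptions; the source only says that transient errors are retried with backoff and that failures end in one ok:false row.

```python
import time

# Error codes treated as transient, per the list above.
TRANSIENT = {"RATE_LIMITED", "SERVER_ERROR", "NETWORK"}


def run_with_retries(attempt, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call attempt() until it succeeds or fails non-transiently (hypothetical).

    attempt() is assumed to return (rows, error_code), where error_code is
    None on success or one of the diagnostic codes on failure.
    """
    code = None
    for n in range(max_attempts):
        rows, code = attempt()
        if code is None:
            return rows
        if code not in TRANSIENT:
            break  # BAD_INPUT and NO_RESULTS will not improve on retry
        if n < max_attempts - 1:
            sleep(base_delay * 2 ** n)  # exponential backoff: 1s, 2s, ...
    # A single diagnostic row instead of a silent failure.
    return [{"ok": False, "errorCode": code}]
```

Passing `sleep` as a parameter keeps the function easy to test without real delays.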
Notes / quirks
- GDELT requires the query to be URL-encoded and phrases to be quoted — the actor handles both.
- On a malformed query GDELT may return a text/plain error string (sometimes with HTTP 200) or an empty body instead of JSON. The actor guards JSON.parse and surfaces a clear BAD_INPUT diagnostic.
- GDELT's index covers roughly the last 3 months of news.
- The actor rotates a real browser User-Agent per request attempt for retry resilience, and supports an optional proxy (proxyConfiguration). Neither is required, since GDELT is a public no-key API with no anti-bot protection, so leave the proxy unset unless you hit IP-level rate limits.
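The guarded parse described in the notes can be sketched in Python as follows (the actor itself guards JSON.parse). The `articles` key, the mapping of an empty article list to NO_RESULTS, and the helper name `parse_doc_response` are assumptions for illustration.

```python
import json


def parse_doc_response(body):
    """Return (articles, error_code) for a raw response body (hypothetical)."""
    text = (body or "").strip()
    if not text:
        return [], "BAD_INPUT"  # empty body instead of JSON
    try:
        payload = json.loads(text)
    except json.JSONDecodeError:
        # A plain-text error message, possibly delivered with HTTP 200.
        return [], "BAD_INPUT"
    # Assumed response shape: {"articles": [...]}.
    articles = payload.get("articles") if isinstance(payload, dict) else None
    if not articles:
        return [], "NO_RESULTS"
    return articles, None
```

Checking the body rather than the HTTP status matters here, because an error string can arrive with a 200 response.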