Bluesky Scraper
Scrape Bluesky posts by keyword or handle. Get post text, URLs, likes, reposts, and full profiles as clean JSON. No login or API key needed.
How it works
1. Open it on Apify. Hit Run on Apify to open the tool in the cloud; there is nothing to install.
2. Set the inputs. Adjust searchQuery, authorHandles, and maxItems (sensible defaults are pre-filled).
3. Click Run. The tool runs on Apify's cloud and collects the data for you.
4. Export the results. Download as JSON, CSV or Excel, or pipe straight into your app, Google Sheets, or an AI agent.
Inputs
| Field | What it does | Type |
|---|---|---|
| searchQuery | Keyword(s) to search Bluesky posts for (e.g. "artificial intelligence"). Leave empty if you instead want to scrape specific authors via Author handles. | string |
| authorHandles | Bluesky handles to scrape (e.g. bsky.app, jay.bsky.team). For each handle the actor returns the author's profile plus their recent posts. The leading @ is optional. Leave empty if using a Search query instead. | array |
| maxItems | Maximum number of posts to return per search query or per author handle. Pagination follows the API cursor until this limit is reached. | integer |
| notionConnector | Optional. Write each post as a page into your Notion workspace when the run finishes. Authorize a Notion connector once in Settings → API & Integrations → MCP connectors, then pick it here. Leave empty to skip (default); results are always saved to the dataset regardless. | string |
| notionParentId | Optional. The Notion data source ID of the database to write into (only used if a Notion connector is set). Leave empty to create the pages privately in your workspace instead. | string |
What you get
A structured dataset — each result includes fields like:
- authorHandles
- details
- searchQuery

Export every run as JSON, CSV or Excel, or send it to your app, a database, Google Sheets, or an AI agent.
3 ready-to-run use cases
Bluesky Brand Monitoring: Track Mentions & Engagement
Social teams can see who's posting about a brand on Bluesky, with each post's author, like count, and repost count for real-time mention tracking.
Scrape Multiple Bluesky Accounts: Posts + Profiles
Feed a list of Bluesky handles and export every account's recent posts and profile into one dataset, ready for a competitor or creator roundup.
Bluesky Dataset by Keyword for Sentiment & NLP
Researchers collect thousands of keyword-matched Bluesky posts as a clean dataset for sentiment analysis, text labeling, and NLP model training.
Bluesky Scraper
Scrape Bluesky through its public AT Protocol XRPC API: no login, no API key, and no anti-bot protection to work around. Two modes:
- Search: pass a searchQuery keyword and get matching public posts.
- Authors: pass authorHandles (e.g. bsky.app) and get each author's profile (followers, bio, post count) plus their recent posts.
Input
| Field | Type | Description |
|---|---|---|
| searchQuery | string | Keyword(s) to search posts for. Use this or authorHandles. |
| authorHandles | array of strings | Handles to scrape (the leading @ is optional). For each, returns the profile + recent posts. |
| maxItems | integer | Max posts per query / per author (default 100). Follows the API cursor until reached. |
| proxyConfiguration | object | Optional. The Bluesky public API has no anti-bot and needs no proxy, so this is off by default. Only enable it if you hit IP rate limits. |
Provide at least one of searchQuery or authorHandles.
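The input rules above can be checked before starting a run. The following is a minimal sketch, not the actor's own code: `normalize_input` is a hypothetical helper name, and the behavior for invalid `maxItems` values is an assumption.

```python
from typing import Any


def normalize_input(raw: dict[str, Any]) -> dict[str, Any]:
    """Validate and tidy a run input (hypothetical helper, not the actor's code)."""
    query = (raw.get("searchQuery") or "").strip()

    # The leading @ is optional, so strip it along with stray whitespace.
    handles = []
    for handle in raw.get("authorHandles") or []:
        cleaned = handle.strip().lstrip("@")
        if cleaned:
            handles.append(cleaned)

    if not query and not handles:
        raise ValueError("Provide at least one of searchQuery or authorHandles.")

    max_items = raw.get("maxItems")
    if max_items is None:
        max_items = 100  # documented default
    if not isinstance(max_items, int) or max_items < 1:
        # Assumption: the README does not say how invalid values are handled.
        raise ValueError("maxItems must be a positive integer.")

    return {"searchQuery": query, "authorHandles": handles, "maxItems": max_items}
```

For example, `normalize_input({"authorHandles": ["@bsky.app"]})` yields an input with the handle `bsky.app` and `maxItems` set to 100.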
Output
Each post row:
{
  "ok": true,
  "type": "post",
  "uri": "at://did:plc:.../app.bsky.feed.post/3kxyz...",
  "postUrl": "https://bsky.app/profile/bsky.app/post/3kxyz...",
  "authorHandle": "bsky.app",
  "authorName": "Bluesky",
  "authorDid": "did:plc:...",
  "text": "…",
  "createdAt": "2024-01-01T00:00:00.000Z",
  "likeCount": 0,
  "repostCount": 0,
  "replyCount": 0,
  "quoteCount": 0,
  "langs": ["en"]
}
In author mode a profile row (type: "profile") is also emitted per handle, with did, handle, displayName, description, followersCount, followsCount, postsCount, avatar, banner, createdAt, and profileUrl.
Posts are deduplicated by uri. The rkey used in postUrl is the last path segment of the post uri.
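As an illustration of the two rules above, here is a small sketch of deriving a post URL from the rkey and dropping duplicate uris. The function names are hypothetical, the URL pattern is inferred from the sample row, and the DID in the usage note is made up.

```python
def post_url(uri: str, handle: str) -> str:
    """Build a bsky.app link from a post uri; the rkey is the last path segment."""
    rkey = uri.rsplit("/", 1)[-1]
    return f"https://bsky.app/profile/{handle}/post/{rkey}"


def dedupe_by_uri(posts: list[dict]) -> list[dict]:
    """Keep the first row seen for each uri, preserving the original order."""
    seen: set[str] = set()
    unique = []
    for post in posts:
        uri = post.get("uri")
        if uri in seen:
            continue
        seen.add(uri)
        unique.append(post)
    return unique
```

Calling `post_url("at://did:plc:example/app.bsky.feed.post/3kxyz", "bsky.app")` gives `https://bsky.app/profile/bsky.app/post/3kxyz`.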
Nullable fields. Some fields can be null when the API omits them: on posts, postUrl, authorHandle, authorName, authorDid, and createdAt (counts default to 0, text to "", langs to []); on profiles, handle, displayName, description, avatar, banner, createdAt, and profileUrl (counts default to 0).
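Downstream code should therefore not assume these fields are strings. A hedged sketch of one way to read a post row defensively follows; `describe_post` and its fallback strings are illustrative choices, not part of the actor.

```python
def describe_post(row: dict) -> str:
    """Summarize a post row, tolerating the nullable fields listed above."""
    # Fall back from handle to DID, then to a placeholder.
    author = row.get("authorHandle") or row.get("authorDid") or "unknown author"
    # postUrl can be null; the at:// uri still identifies the post.
    link = row.get("postUrl") or row.get("uri") or "no link"
    created = row.get("createdAt")
    day = created[:10] if created else "unknown date"
    # Counts are documented to default to 0; `or 0` is extra caution.
    engagement = (row.get("likeCount") or 0) + (row.get("repostCount") or 0)
    return f"{day} {author} ({engagement} likes+reposts): {link}"
```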
Diagnostics
If the run fails or returns nothing, a single ok:false row is pushed with an errorCode (BAD_INPUT, NO_RESULTS, RATE_LIMITED, SERVER_ERROR, NETWORK, …) and a human-readable error message. Diagnostic rows are never charged.
Troubleshooting. If you get a BAD_INPUT row, set searchQuery to a keyword or add at least one handle to authorHandles. A NO_RESULTS row means the API answered but had nothing for that query/author — Bluesky's public index is smaller and sparser than Twitter/X, so broad keywords may return few posts. If you see RATE_LIMITED from many parallel runs, enable the optional proxy or lower the volume.
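When consuming a dataset, it can help to separate diagnostic rows from data rows first. The sketch below assumes the message lives under an `error` key (the README names only `errorCode` explicitly) and treats any row without `ok: false` as data; the helper names and the hint wording are illustrative.

```python
# Hints paraphrased from the troubleshooting notes above.
ADVICE = {
    "BAD_INPUT": "Set searchQuery to a keyword or add at least one handle to authorHandles.",
    "NO_RESULTS": "The API answered but had nothing for that query or author.",
    "RATE_LIMITED": "Enable the optional proxy or lower the volume.",
}


def split_rows(rows: list[dict]) -> tuple[list[dict], list[dict]]:
    """Return (data rows, diagnostic rows); only ok:false rows are diagnostics."""
    data = [r for r in rows if r.get("ok") is not False]
    diagnostics = [r for r in rows if r.get("ok") is False]
    return data, diagnostics


def explain(diagnostic: dict) -> str:
    """Turn a diagnostic row into a one-line hint."""
    code = diagnostic.get("errorCode", "UNKNOWN")
    # Assumption: the human-readable message is stored under "error".
    fallback = diagnostic.get("error") or "No further detail available."
    return f"{code}: {ADVICE.get(code, fallback)}"
```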
Billing
Charged per unique post returned (post event). Profile rows and diagnostic rows are not charged.
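To estimate how many rows of a finished run were billable, one can count unique post uris and ignore everything else. This is only a rough sketch under the rule stated above; `billable_posts` is a hypothetical helper, and the charge recorded by Apify remains the authoritative figure.

```python
def billable_posts(rows: list[dict]) -> int:
    """Count unique post rows; profile and diagnostic rows are not charged."""
    uris = {
        row["uri"]
        for row in rows
        if row.get("ok") is True and row.get("type") == "post" and row.get("uri")
    }
    return len(uris)
```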
API
Built on the public host https://api.bsky.app:
- app.bsky.feed.searchPosts
- app.bsky.feed.getAuthorFeed
- app.bsky.actor.getProfile
All are public, cursor-paginated GET/JSON endpoints. We hit api.bsky.app directly rather than the documented public.api.bsky.app alias: the alias is fronted by BunnyCDN, which intermittently returns 403 for searchPosts in some regions, whereas api.bsky.app is the same public AppView served directly and is more reliable.
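Cursor pagination against these endpoints can be sketched roughly as below. This is an illustration under assumptions rather than the actor's implementation: the `/xrpc/` path prefix, the `q`, `limit` and `cursor` parameters, the `posts` and `cursor` response keys, and the 100-per-page cap follow the public lexicon as I understand it, and the function names are hypothetical. The page fetcher is injectable so the loop can be exercised without network access.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Optional

BASE = "https://api.bsky.app/xrpc"


def fetch_search_page(query: str, cursor: Optional[str], limit: int) -> dict:
    """Fetch one page of app.bsky.feed.searchPosts (assumed request shape)."""
    params = {"q": query, "limit": limit}
    if cursor:
        params["cursor"] = cursor
    url = f"{BASE}/app.bsky.feed.searchPosts?{urllib.parse.urlencode(params)}"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def collect_posts(
    query: str,
    max_items: int,
    fetch_page: Callable[[str, Optional[str], int], dict] = fetch_search_page,
) -> list[dict]:
    """Follow the cursor until max_items posts are collected or pages run out."""
    posts: list[dict] = []
    cursor: Optional[str] = None
    while len(posts) < max_items:
        page_size = min(100, max_items - len(posts))  # assumed per-page cap
        page = fetch_page(query, cursor, page_size)
        batch = page.get("posts", [])
        posts.extend(batch)
        cursor = page.get("cursor")
        if not cursor or not batch:
            break  # no further pages
    return posts[:max_items]
```

Passing a stub as `fetch_page` keeps the loop testable offline; the default fetcher performs a real HTTP GET and may be subject to the rate limits described under Diagnostics.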