apify-sdk-integration
Integrate Apify into an existing JavaScript/TypeScript or Python application using the apify-client package. Use when adding web scraping, automation, or data extraction capabilities to an existing app via the Apify API.
Apify SDK Integration
Add Apify Actor execution to an existing application. This skill covers the apify-client package for JS/TS and Python, plus the REST API for other languages.
When to Use This Skill
- Adding web scraping or automation to an existing app
- Calling Apify Actors programmatically from application code
- Building a product that uses Apify as a backend service
- Integrating Actor results into a data pipeline
Critical: Package Naming
`apify-client` is the API client for calling Actors from your app. `apify` is the SDK for building Actors (the wrong package for this use case). Always install `apify-client`. Never install `apify` for integration work.
Prerequisites
The user needs an APIFY_TOKEN. Direct them to Console > Settings > Integrations at https://console.apify.com/settings/integrations to create one. If they don’t have an account: https://console.apify.com/sign-up (free, no credit card).
Store the token securely — environment variable or secrets manager, never hardcoded.
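A small guard for this rule reads the token from the environment and fails fast with a clear message when it is missing, instead of letting the first API call fail. This is a sketch. `requireApifyToken` is a hypothetical helper name, and the environment is passed in as a plain object so the function does not depend on Node typings.

```typescript
// Hypothetical helper: fail fast when APIFY_TOKEN is missing or blank,
// rather than letting the first API call fail with a 401.
function requireApifyToken(env: Record<string, string | undefined>): string {
  const token = env['APIFY_TOKEN']?.trim();
  if (!token) {
    throw new Error(
      'APIFY_TOKEN is not set. Create a token at https://console.apify.com/settings/integrations',
    );
  }
  return token;
}
```

In Node you would call it as `requireApifyToken(process.env)` and pass the result to `new ApifyClient({ token })`.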
Finding the Right Actor
Before writing integration code, find the Actor that fits the user’s needs. Use the MCP tools if available:
- `search-actors`: search the Apify Store by keyword
- `fetch-actor-details`: get the Actor’s input schema, output format, and pricing
Alternatively, browse https://apify.com/store. Append .md to any Actor’s Store URL to get its docs in markdown.
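The `.md` convention is easy to automate when a script needs an Actor's docs. The sketch below assumes Store pages live at `https://apify.com/<username>/<actor-name>`. `storeMarkdownUrl` is a hypothetical helper that only builds the URL, so you can hand it to `fetch` or any HTTP client.

```typescript
// Hypothetical helper: build the markdown docs URL for an Actor's Store page
// by appending ".md" to https://apify.com/<username>/<actor-name>.
function storeMarkdownUrl(actorName: string): string {
  // Expect exactly one slash separating username and Actor name, no whitespace.
  if (!/^[^/\s]+\/[^/\s]+$/.test(actorName)) {
    throw new Error(`Expected "username/actor-name", got "${actorName}"`);
  }
  return `https://apify.com/${actorName}.md`;
}
```

For example, `storeMarkdownUrl('apify/web-scraper')` yields `https://apify.com/apify/web-scraper.md`.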
JavaScript / TypeScript
Install
```bash
npm install apify-client
```
Synchronous Execution (wait for results)
```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

// Input fields are Actor-specific: check the Actor's input schema for required fields.
const run = await client.actor('apify/web-scraper').call({
  startUrls: [{ url: 'https://example.com' }],
  maxPagesPerCrawl: 10,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
```
`.call()` starts the run, waits for it to finish, and returns the run object. Without the `waitSecs` option it waits indefinitely, so use it for short-running Actors (under a few minutes).
Asynchronous Execution (start and poll/retrieve later)
```javascript
const run = await client.actor('apify/web-scraper').start({
  startUrls: [{ url: 'https://example.com' }],
});

// Poll for completion
const finishedRun = await client.run(run.id).waitForFinish();

// Retrieve results
const { items } = await client.dataset(finishedRun.defaultDatasetId).listItems();
```
Use .start() + .waitForFinish() for long-running Actors or when you need the run ID immediately.
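`waitForFinish()` does the polling for you. If you store the run ID and check on it later (for example from a job queue), the loop looks roughly like the sketch below. It assumes a hypothetical `getStatus` callback that you would back with `client.run(runId).get()`, reading the run's `status`. The terminal statuses are Apify's final run states; the interval and attempt limit are arbitrary choices.

```typescript
// Statuses after which an Apify run no longer changes.
const TERMINAL_STATUSES = new Set(['SUCCEEDED', 'FAILED', 'TIMED-OUT', 'ABORTED']);

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Poll until the run reaches a terminal status, or give up after maxAttempts.
// `getStatus` is a hypothetical callback, e.g.
//   async () => (await client.run(runId).get())?.status
async function pollRunStatus(
  getStatus: () => Promise<string | undefined>,
  intervalMs = 5000,
  maxAttempts = 120,
): Promise<string> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await getStatus();
    if (status !== undefined && TERMINAL_STATUSES.has(status)) return status;
    await sleep(intervalMs);
  }
  throw new Error(`Run did not finish after ${maxAttempts} polls`);
}
```

A returned status other than `SUCCEEDED` still needs the same handling as a failed `.call()`.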
Retrieving Results
```javascript
// Dataset items (structured data from pushData)
const { items } = await client.dataset(run.defaultDatasetId).listItems({
  limit: 100,
  offset: 0,
});

// Key-value store (files, screenshots, etc.); resolves to undefined if the key does not exist
const record = await client.keyValueStore(run.defaultKeyValueStoreId).getRecord('OUTPUT');
```
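To read a dataset larger than one page, advance `offset` by `limit` until a short page comes back. In this sketch, `fetchPage` is a hypothetical callback standing in for `client.dataset(id).listItems({ offset, limit })`, whose result carries the page in `items`. The page size of 1,000 is an arbitrary choice.

```typescript
interface Page<T> {
  items: T[];
}

// Collect every item by requesting fixed-size pages until one comes back short.
// `fetchPage` would wrap client.dataset(datasetId).listItems({ offset, limit }).
async function listAllItems<T>(
  fetchPage: (offset: number, limit: number) => Promise<Page<T>>,
  limit = 1000,
): Promise<T[]> {
  const all: T[] = [];
  for (let offset = 0; ; offset += limit) {
    const { items } = await fetchPage(offset, limit);
    all.push(...items);
    if (items.length < limit) return all; // short or empty page: no more data
  }
}
```

For very large datasets, processing each page as it arrives instead of accumulating everything keeps memory bounded.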
Error Handling
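```javascript
const input = { startUrls: [{ url: 'https://example.com' }] };

try {
  const run = await client.actor('apify/web-scraper').call(input);

  // call() resolves even when the run fails, so check the final status
  if (run.status !== 'SUCCEEDED') {
    const log = await client.log(run.id).get();
    throw new Error(`Actor run ${run.id} finished with status ${run.status}: ${log}`);
  }

  const { items } = await client.dataset(run.defaultDatasetId).listItems();
} catch (error) {
  // Branch on the HTTP status code: an invalid-token message can also contain "not found"
  if (error.statusCode === 404) {
    // Actor ID is wrong or the Actor was deleted
  } else if (error.statusCode === 401) {
    // Invalid or missing APIFY_TOKEN
  }
  throw error;
}
```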
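The two `catch` branches can be factored into a helper that turns an error into an actionable message. This sketch duck-types on the `statusCode` property carried by the client's API errors. `explainApifyError` is a hypothetical name, and the mapping covers only the two cases handled above.

```typescript
// Hypothetical helper: map an error thrown while calling Apify to a hint for the user.
function explainApifyError(error: unknown): string {
  const statusCode =
    typeof error === 'object' && error !== null && 'statusCode' in error
      ? (error as { statusCode?: unknown }).statusCode
      : undefined;
  switch (statusCode) {
    case 401:
      return 'Invalid or missing APIFY_TOKEN';
    case 404:
      return 'Resource not found (often a wrong Actor ID or name)';
    default:
      return error instanceof Error ? error.message : String(error);
  }
}
```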
Python
Install
```bash
pip install apify-client
```
Synchronous Execution
```python
import os

from apify_client import ApifyClient

client = ApifyClient(token=os.environ['APIFY_TOKEN'])

run = client.actor('apify/web-scraper').call(run_input={
    'startUrls': [{'url': 'https://example.com'}],
    'maxPagesPerCrawl': 10,
})

items = client.dataset(run['defaultDatasetId']).list_items().items
```
Asynchronous Execution
```python
run = client.actor('apify/web-scraper').start(run_input={
    'startUrls': [{'url': 'https://example.com'}],
})

# Poll for completion
finished_run = client.run(run['id']).wait_for_finish()
items = client.dataset(finished_run['defaultDatasetId']).list_items().items
```
Async Client (asyncio)
```python
import asyncio
import os

from apify_client import ApifyClientAsync

async def main():
    client = ApifyClientAsync(token=os.environ['APIFY_TOKEN'])
    run = await client.actor('apify/web-scraper').call(run_input={
        'startUrls': [{'url': 'https://example.com'}],
    })
    return (await client.dataset(run['defaultDatasetId']).list_items()).items

items = asyncio.run(main())
```
REST API (Any Language)
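For languages without an official client, call the REST API directly. In URL paths, write the Actor name with a tilde instead of a slash (for example `apify~web-scraper`) or use the Actor’s ID. The body of a start request is the Actor input as JSON.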
Start a Run
```http
POST https://api.apify.com/v2/acts/{actorId}/runs
Authorization: Bearer <APIFY_TOKEN>
Content-Type: application/json

{ "startUrls": [{ "url": "https://example.com" }] }
```
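For example, a TypeScript service that cannot use `apify-client` could assemble the start-run request as below. The sketch only builds the request (URL, method, headers, body), so it can be passed to `fetch` or any HTTP library. It assumes the Actor is referenced as `username/actor-name` and rewrites it to the tilde form used in API paths. `buildStartRunRequest` and `HttpRequest` are hypothetical names.

```typescript
interface HttpRequest {
  url: string;
  method: 'GET' | 'POST';
  headers: Record<string, string>;
  body?: string;
}

// Build (but do not send) the "start Actor run" request.
// API paths write an Actor name as username~actor-name, e.g. apify~web-scraper.
function buildStartRunRequest(actorName: string, token: string, input: unknown): HttpRequest {
  const actorPath = encodeURIComponent(actorName.replace('/', '~'));
  return {
    url: `https://api.apify.com/v2/acts/${actorPath}/runs`,
    method: 'POST',
    headers: {
      Authorization: `Bearer ${token}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify(input), // the request body is the Actor input
  };
}
```

With `fetch`, destructure `{ url, ...init }` and call `fetch(url, init)`. The run object in the JSON response sits under `data`, so its ID is `data.id`.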
Get Run Status
```http
GET https://api.apify.com/v2/acts/{actorId}/runs/{runId}
Authorization: Bearer <APIFY_TOKEN>
```
Get Dataset Items
```http
GET https://api.apify.com/v2/datasets/{datasetId}/items?format=json
Authorization: Bearer <APIFY_TOKEN>
```
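The two read endpoints can be sketched the same way. `runStatusUrl`, `datasetItemsUrl`, and `readRunStatus` are hypothetical helpers. `offset` and `limit` are the pagination query parameters of the dataset items endpoint. `readRunStatus` assumes the run endpoint's response envelope, where the run object sits under `data`. Each request still needs the `Authorization: Bearer` header.

```typescript
const API_BASE = 'https://api.apify.com/v2';

// URL for "Get run"; the Actor name is rewritten to username~actor-name.
function runStatusUrl(actorName: string, runId: string): string {
  const actorPath = encodeURIComponent(actorName.replace('/', '~'));
  return `${API_BASE}/acts/${actorPath}/runs/${encodeURIComponent(runId)}`;
}

// URL for one page of dataset items as JSON.
function datasetItemsUrl(datasetId: string, offset = 0, limit = 1000): string {
  const id = encodeURIComponent(datasetId);
  return `${API_BASE}/datasets/${id}/items?format=json&offset=${offset}&limit=${limit}`;
}

// Extract the status from a parsed "Get run" response body: { data: { status: ... } }.
function readRunStatus(body: unknown): string {
  const status = (body as { data?: { status?: unknown } } | null)?.data?.status;
  if (typeof status !== 'string') throw new Error('Response has no data.status field');
  return status;
}
```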
Full API reference: https://docs.apify.com/api/v2
Best Practices
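- Set timeouts: Pass a run timeout in the call options (`timeout`, in seconds, in JS; `timeout_secs` in Python) or use `waitSecs` (`wait_secs` in Python) on `.call()` to avoid indefinite waits. A run timeout stops the run itself; `waitSecs` only limits how long the client waits, so the returned run may still be `RUNNING`.
- Paginate large datasets: Use `limit` and `offset` when retrieving dataset items. Default limit is 250K items.
- Reuse clients: Create one `ApifyClient` instance and reuse it across calls.
- Handle Actor-specific input: Every Actor has its own input schema. Use the `fetch-actor-details` MCP tool or append `.md` to the Actor’s Store URL to get the schema before constructing input.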
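Once you have the input schema, a hand-written TypeScript type for the fields you actually send catches typos at compile time. The interface below is a hypothetical, partial description of `apify/web-scraper` input covering only the two fields used in the examples above. The real schema may have more fields, including required ones, so derive your own type from the schema you fetched.

```typescript
// Hypothetical partial type for apify/web-scraper input (written by hand, not generated).
interface WebScraperInput {
  startUrls: { url: string }[];
  maxPagesPerCrawl?: number;
}

// Build a typed input object; omit optional fields the caller did not set.
function buildWebScraperInput(urls: string[], maxPagesPerCrawl?: number): WebScraperInput {
  if (urls.length === 0) throw new Error('At least one start URL is required');
  const input: WebScraperInput = { startUrls: urls.map((url) => ({ url })) };
  if (maxPagesPerCrawl !== undefined) input.maxPagesPerCrawl = maxPagesPerCrawl;
  return input;
}
```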
Documentation
- Apify API client for JS: https://docs.apify.com/api/client/js
- Apify API client for Python: https://docs.apify.com/api/client/python
- REST API reference: https://docs.apify.com/api/v2
- Apify docs (LLM-friendly): https://docs.apify.com/llms.txt
- Apify docs (full): https://docs.apify.com/llms-full.txt
If the Apify MCP server is available, use search-apify-docs and fetch-apify-docs tools for contextual documentation lookups during development.