If you’re a Python developer who needs to automate web browsers, scrape dynamic pages, or manage multiple accounts across platforms, Pyppeteer has likely crossed your radar. This unofficial Python port of Google’s Puppeteer library brings the power of Chrome DevTools Protocol automation to Python, letting you control Chromium browsers with async/await syntax instead of switching to Node.js.
In 2026, Pyppeteer sits in an interesting position. The project entered maintenance mode around 2022, with community forks keeping it alive for Python 3.12 compatibility and ARM64 support. Despite newer alternatives like Playwright, many Python developers still rely on Pyppeteer for its lightweight footprint and familiar API—especially when they need to integrate browser automation into existing Python data pipelines.
This Pyppeteer tutorial covers everything you need to get productive: installing Pyppeteer on modern systems, core actions like navigation and form filling, taking screenshots and PDFs, handling cookies and iFrames, working with proxies, and understanding fingerprinting limitations. For those running multi-account operations, we’ll explore how pairing Pyppeteer with an anti-detect browser like Undetectable.io creates safer workflows with isolated profiles and realistic fingerprints.
What this article covers:
- Installation and setup for Python 3.10–3.12
- First scripts and core browser actions
- Screenshots, PDFs, cookies, and dynamic content
- Proxy configuration and basic anti-block techniques
- Browser fingerprinting limitations and anti-detect strategies
- Multi-account workflows for marketing automation
- Troubleshooting, best practices, and when to consider alternatives
What is Pyppeteer?
Pyppeteer emerged around 2017-2018 as a community-driven effort to bring Puppeteer’s capabilities to Python. It exposes a high-level API to control Chromium browsers via the Chrome DevTools Protocol, giving you programmatic access to everything from page navigation to network interception.
The pyppeteer library mirrors the Puppeteer API almost one-to-one. You can launch browsers, create pages, navigate URLs, interact with DOM elements, capture screenshots and PDFs, and intercept network requests, all using Python’s async/await syntax instead of JavaScript Promises.
Core features and environment support:
- Python 3.8+ required (3.10–3.12 common in 2026)
- Works on Windows 10/11, Ubuntu 20.04+/Debian, macOS 12+
- ARM quirks on Apple Silicon (M1/M2/M3) may require system Chrome
- Bundled Chromium ~150-170MB downloaded on first run
- MIT license for free commercial use
Current limitations:
- Largely maintenance mode since ~2022
- Bundled/downloaded Chromium revisions may lag behind current stable Chrome and can cause compatibility issues.
- No native multi-browser support (Chrome/Chromium only)
- Some community forks address Chromium freshness and Python 3.12 compatibility
- Cutting-edge CDP features may not work without manual updates
For production users needing long-term stability, evaluate whether the risk of outdated Chromium builds affecting TLS or WebGL compatibility is acceptable for your use case.
Why Use Pyppeteer? Key Use Cases
Unlike simple HTTP tools like requests + BeautifulSoup, Pyppeteer renders the full DOM after JavaScript execution. This matters because many modern websites use frameworks like React, Vue, Next.js, or SvelteKit that build content client-side.
Concrete use cases where Pyppeteer excels:
- Scraping infinite scroll product listings on e-commerce sites where items load via IntersectionObserver
- Automating signup and login flows with 2FA sliders or dynamic form validation
- Capturing full page screenshots of dashboards for A/B testing documentation
- Generating PDFs of invoices or reports from authenticated portals
- Running social media warm-up flows with browsing, liking, and commenting actions
- Monitoring ad platform metrics across rotating accounts
Why stay in Python instead of switching to Node.js for Puppeteer?
- Reuse existing scraping/data-processing pipelines (pandas, NumPy, SQLite)
- Asyncio integration for concurrent tab management
- Avoid context-switching pain and serialization overhead between languages
For safer multi-account workflows, pairing Pyppeteer with an anti-detect browser like Undetectable.io makes sense: Pyppeteer handles the automation logic while Undetectable.io provides hardened fingerprints and isolated profiles that reduce correlation risk across sessions.
Pyppeteer vs Puppeteer vs Selenium (and Where Undetectable.io Fits)
Choosing between these tools depends on your language preference, browser requirements, and maintenance expectations in 2026.
Language and runtime:
- Puppeteer: Node.js with Promises, official Google backing, weekly Chromium syncs
- Pyppeteer: Python with asyncio, unofficial Python wrapper, slower update velocity
- Selenium: Multi-language WebDriver (Python, Java, C#, etc.), driver-based architecture
Browser coverage:
- Pyppeteer: primarily Chrome/Chromium focused. Puppeteer: supports Chrome and Firefox.
- Selenium: Chrome, Firefox, Edge, Safari support through respective drivers
API style and capabilities:
- Pyppeteer/Puppeteer offer low-level DevTools access: network interception, fetch() mocking, performance tracing
- Selenium primarily uses WebDriver/WebDriver BiDi and supports request/response interception, though the workflow differs from Puppeteer-style APIs
- Pyppeteer can be faster than Selenium in some JavaScript-heavy scenarios, but performance depends on the site, browser setup, waits, and implementation details.
Maintenance reality:
- Puppeteer: highly active with 90k+ GitHub stars
- Selenium: stable and mature with broad ecosystem
- Pyppeteer has a smaller community footprint and a much slower release cadence than Puppeteer.
- Many Python users have moved to Playwright because it is more actively maintained.
Where Undetectable.io fits: All three tools can use proxies and custom headers, but none solve deep browser fingerprinting. Undetectable.io provides fingerprint isolation and unlimited local profiles on paid plans, customizing key browser fingerprint signals such as canvas behavior, WebGL-related data, and WebRTC exposure. For mass account workflows, this stack can reduce correlation risk compared with plain Chrome setups, but results depend on platform rules, behavior patterns, proxy quality, and session hygiene.
Installing and Setting Up Pyppeteer
Setup in 2026 starts with confirming your Python version and preparing a clean environment.
Prerequisites:
- Check your Python version: python --version or python3 --version. You need 3.8+ (ideally 3.10-3.12).
- On Linux, install the basic dependencies: apt install -y gconf-service libasound2 libatk1.0-0 libnss3 libgconf-2-4
- Windows 11 typically works out-of-the-box.
Create a virtual environment:
python -m venv venv
source venv/bin/activate # Linux/macOS
venv\Scripts\activate # Windows
Modern tools like uv or poetry also work well for dependency isolation.
Install Pyppeteer:
pip install pyppeteer
On first run, Pyppeteer downloads a Chromium build (roughly 150MB) to a platform-dependent data directory. To pre-download it, run:
pyppeteer-install
Apple Silicon notes: If the bundled Chromium fails on M1/M2/M3 Macs, use system Chrome instead:
browser = await launch(executablePath='/Applications/Google Chrome.app/Contents/MacOS/Google Chrome')
Common issues:
- Corporate proxies blocking download: set PYPPETEER_DOWNLOAD_HOST to a mirror
- Missing shared libraries on minimal Docker images: install libnss3, gtk3 dependencies
- To force a specific Chromium revision: set the PYPPETEER_CHROMIUM_REVISION environment variable
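Both environment variables have to be in place before the library is imported, because the download configuration is typically read at import time. A minimal sketch; the mirror URL and revision number below are illustrative placeholders, not recommendations:

```python
import os

# Set these BEFORE importing pyppeteer; the download settings are
# read when the library is first imported. Values are placeholders.
os.environ['PYPPETEER_DOWNLOAD_HOST'] = 'https://your-internal-mirror.example.com'
os.environ['PYPPETEER_CHROMIUM_REVISION'] = '1181205'  # hypothetical revision

# Only now import the library:
# from pyppeteer import launch
```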
Getting Started: First Pyppeteer Script
Here’s a minimal pyppeteer script that opens a web page, prints the page title, and closes cleanly.
import asyncio
from pyppeteer import launch

async def main():
    browser = await launch(headless=True)
    try:
        page = await browser.newPage()
        await page.goto('https://example.com', waitUntil='domcontentloaded')
        title = await page.title()
        print(f'Page title: {title}')
    finally:
        await browser.close()

asyncio.run(main())
Key elements explained:
- from pyppeteer import launch brings in the launcher function
- async def main() defines the async function containing all browser logic
- await browser and await page are used throughout for async operations
- launch(headless=True) starts Chrome in headless mode (no graphical user interface)
- await page.goto() waits for the page to load before continuing
- finally: await browser.close() ensures the browser instance closes even on errors
Running the script: Execute with asyncio.run(main()), the modern approach for Python 3.7+. This pattern forms the foundation for all pyppeteer examples that follow.
Useful launch options:
- headless=False for debugging with visible browser
- args=['--no-sandbox', '--disable-setuid-sandbox'] for Linux servers
- args=['--start-maximized'] for fullscreen debugging
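These flags can be bundled into a small helper so one script flips between a visible debugging browser and a hardened server run. A sketch; launch_options is a hypothetical helper name, not part of Pyppeteer:

```python
def launch_options(debug: bool = False, on_linux_server: bool = False) -> dict:
    """Build keyword arguments for pyppeteer's launch()."""
    args = []
    if on_linux_server:
        # Required on many containerized/root Linux environments
        args += ['--no-sandbox', '--disable-setuid-sandbox']
    if debug:
        args.append('--start-maximized')
    # Headless when not debugging; visible window otherwise
    return {'headless': not debug, 'args': args}

# Usage (inside an async function):
# browser = await launch(**launch_options(debug=True, on_linux_server=True))
```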
Core Browser Actions with Pyppeteer
This section covers daily operations: navigation, selection, clicking, typing, and waiting for elements.
Page navigation:
await page.goto('https://example.com', waitUntil='networkidle2')
The waitUntil option accepts: load, domcontentloaded, networkidle0 (no network in 500ms), or networkidle2 (2 or fewer connections). Adjust for SPAs or slow APIs.
Element selection: Pyppeteer uses J and JJ instead of $ and $$ because $ isn’t a valid identifier in Python:
element = await page.J('div.product') # CSS selector
elements = await page.JJ('.item') # All matching
xpath_els = await page.xpath('//button[@data-action="submit"]')  # returns a list of matches
User interactions:
await page.type('#email', 'user@example.com', delay=100) # Realistic typing
await page.click('#submit-button')
await page.hover('.menu-item')
await page.keyboard.press('Enter')
Waiting strategies: Avoid static sleep() calls. Use explicit waits:
await page.waitForSelector('.dashboard', visible=True, timeout=30000)
await page.waitForXPath('//div[contains(text(), "Welcome")]')
await page.waitForFunction('() => document.querySelectorAll(".item").length > 10')
Login flow example: The following code snippet demonstrates a simple login:
await page.goto('https://demo-site.com/login')
await page.type('#email', 'test@example.com', delay=50)
await page.type('#password', 'demo123', delay=50)
await page.click('#login-btn')
await page.waitForSelector('.user-dashboard', timeout=15000)
print('Login successful')
Capturing Screenshots and PDFs
Visual capture matters for documenting ad variations, A/B tests, and archiving invoices.
Basic screenshot:
await page.screenshot(path='page.png')
await page.screenshot(path='full.png', fullPage=True) # Entire scrollable area
await page.screenshot(path='quality.jpg', type='jpeg', quality=90)
Viewport settings for device emulation:
await page.setViewport({'width': 1920, 'height': 1080}) # Desktop
await page.setViewport({'width': 390, 'height': 844}) # iPhone 15
PDF generation:
await page.pdf(
    path='report.pdf',
    format='A4',
    printBackground=True,
    margin={'top': '1cm', 'bottom': '1cm'}
)
Combined workflow example:
await page.goto('https://dashboard.example.com')
await page.waitForSelector('.charts-loaded')
await page.screenshot(path='dashboard.png', fullPage=True)
await page.pdf(path='dashboard.pdf', format='A4', printBackground=True)
Both screenshot() and pdf() accept keyword arguments for customizing output quality and dimensions.
Handling Cookies, Sessions, and iFrames
Controlling cookies and frames preserves logins and handles embedded third-party widgets.
Cookie operations:
# Get all cookies
cookies = await page.cookies()

# Save to file
import json
with open('cookies.json', 'w') as f:
    json.dump(cookies, f)

# Restore cookies
with open('cookies.json', 'r') as f:
    saved_cookies = json.load(f)
await page.setCookie(*saved_cookies)

# Delete a specific cookie
await page.deleteCookie({'name': 'session_id'})
For multi-account work, maintain separate cookie files per identity. However, cookies alone don’t isolate fingerprints—combining with Undetectable.io profiles provides stronger separation.
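The per-identity pattern can be sketched as one JSON file per account ID. The helper names (cookie_path, save_cookies, load_cookies) and the cookies/ directory are assumptions for illustration:

```python
import json
from pathlib import Path

COOKIE_DIR = Path('cookies')  # hypothetical storage location

def cookie_path(account_id: str) -> Path:
    """One cookie file per account keeps sessions from bleeding together."""
    return COOKIE_DIR / f'{account_id}.json'

async def save_cookies(page, account_id: str) -> None:
    """Persist the current page's cookies for this account."""
    COOKIE_DIR.mkdir(exist_ok=True)
    cookies = await page.cookies()
    cookie_path(account_id).write_text(json.dumps(cookies))

async def load_cookies(page, account_id: str) -> None:
    """Restore a previously saved session, if one exists."""
    path = cookie_path(account_id)
    if path.exists():
        await page.setCookie(*json.loads(path.read_text()))
```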
iFrame handling:
# List all frames
frames = page.frames

# Find a frame by name (pyppeteer exposes page.frames as a list;
# there is no page.frame() lookup helper, so iterate and match)
payment_frame = next((f for f in frames if f.name == 'stripe-checkout'), None)

# Or by URL pattern
for frame in frames:
    if 'payment-provider.com' in frame.url:
        payment_frame = frame
        break

# Interact within the frame
confirm_btn = await payment_frame.J('#confirm-payment')
await confirm_btn.click()
The code above clicks inside an embedded payment modal: you must locate the frame and run your selectors in its context rather than on the top-level page.
Dynamic Content, Alerts, and Pop-ups
Modern SPAs render content after API calls, requiring explicit handling for infinite scroll and modal dialogs.
Infinite scroll pattern:
async def scroll_and_load(page, max_items=50):
    previous_count = 0
    while True:
        items = await page.JJ('.product-card')
        if len(items) >= max_items or len(items) == previous_count:
            break
        previous_count = len(items)
        await page.evaluate('window.scrollTo(0, document.body.scrollHeight)')
        await asyncio.sleep(2)  # pyppeteer lacks Puppeteer's waitForTimeout(); asyncio.sleep() works
    return await page.JJ('.product-card')
Cap item counts for performance and to avoid triggering anti-bot systems.
Handling JavaScript dialogs:
page.on('dialog', lambda dialog: asyncio.ensure_future(dialog.accept()))
Or dismiss:
page.on('dialog', lambda dialog: asyncio.ensure_future(dialog.dismiss()))
Pop-ups and new windows:
# Wait for the new tab/window to open
new_target = await browser.waitForTarget(
    lambda t: 'oauth' in t.url
)
new_page = await new_target.page()
await new_page.waitForSelector('#authorize-btn')
await new_page.click('#authorize-btn')
This handles OAuth flows that open a page in a new window. After configuring your automation stack, you can use tools like BrowserLeaks anonymity checks to verify that your IP, WebRTC, and DNS settings behave as expected.
Using Proxies and Evading Basic Blocks with Pyppeteer
Proxies are essential for large-scale web scraping and multi-accounting on platforms with aggressive rate limiting, and curated lists of the best proxy services for automation can help you choose stable providers.
Setting a proxy at launch:
browser = await launch(
    args=['--proxy-server=http://proxy-host:8080']
)
All traffic from this new browser instance routes through the proxy.
Authenticated proxies:
page = await browser.newPage()
await page.authenticate({'username': 'proxyuser', 'password': 'proxypass'})
await page.goto('https://target-site.com')
Call authenticate() before navigation on each new page.
Basic anti-bot hygiene:
- Vary user agents across sessions using await page.setUserAgent()
- Add randomized delays between actions, e.g. await asyncio.sleep(random.uniform(1, 3)) (Pyppeteer lacks Puppeteer's waitForTimeout())
- Rotate proxies based on the target site's sensitivity, your session design, and provider quality rather than on a fixed universal interval; reliable providers such as the PlainProxies professional proxy service help here
- Limit concurrency to 3-5 tabs per browser instance
- Block unnecessary resources (images, fonts) via request interception
Limitations: Pyppeteer handles proxies and headers but doesn’t address deeper fingerprinting signals such as canvas, WebGL, and fonts. For those scenarios, you need an anti-detect solution.
Pyppeteer and Browser Fingerprinting: Limitations and Anti-detect Strategies
Browser fingerprinting in 2026 combines many signals to uniquely identify sessions: user agents, screen size, timezone, fonts, canvas/WebGL hashes, WebRTC IPs, and hardware hints.
What Pyppeteer can adjust:
- User agent via setUserAgent()
- Viewport and screen dimensions
- Timezone via page.emulateTimezone('America/New_York') in newer forks (classic Pyppeteer builds, which track Puppeteer 1.x, may need the raw CDP Emulation.setTimezoneOverride command)
- Geolocation via page.setGeolocation()
- Language headers
- Some navigator properties via evaluateOnNewDocument(), which runs a JavaScript function before any page script loads
What Pyppeteer cannot easily mask:
- Native canvas fingerprints (HTMLCanvasElement.toDataURL hashing)
- WebGL renderer/vendor strings tied to actual GPU
- System font enumeration
- AudioContext fingerprinting
- Hardware concurrency values
Tools like CreepJS can still reveal fingerprint leaks after basic spoofing; results vary by site and setup, so no single efficacy percentage applies. Free tools like AmIUnique.org browser fingerprint checks help you see how identifiable your setup remains. For major platforms with sophisticated risk systems, incomplete spoofing can still leave enough signals for cross-session correlation.
Anti-detect browser integration: Operators managing many accounts pair automation logic with solutions like Undetectable.io to obtain separate, realistic fingerprints per profile. You can download Undetectable for Mac and Windows; it offers:
- Unlimited local profiles on paid plans
- Per-profile fingerprint randomization (50+ signals)
- Proxy assignment per profile
- Data stored locally for security
- Automation-friendly design for external orchestration
Recommended architecture: Rather than bending a single Pyppeteer instance to impersonate many identities, run multiple Undetectable.io profiles (each with unique proxy and fingerprint) and use local automation to trigger actions within each isolated context.
Multi-account Management and Marketing Automation Workflows
Ad arbitrage teams, affiliates, and marketplace sellers often run dozens to hundreds of accounts across platforms like Facebook Ads, TikTok, or Amazon.
Why naive Pyppeteer usage fails: Running every account from a single machine, a single Chrome build, and one shared fingerprint quickly triggers:
- Account challenges and verification requests
- Shadowbans and reduced reach
- Manual reviews and permanent bans
- Cross-account correlation leading to farm detection
Responsible multi-account workflow: If you’re still evaluating your stack, reviews of GoLogin alternatives for multi-accounting can clarify how different anti-detect browsers compare before you commit.
- Dedicated browser profile per account
- Separate proxy/geo per profile
- Isolated cookies, localStorage, and IndexedDB
- Staggered, human-like action scheduling (5-15 minute gaps)
- Randomized browsing patterns before core actions
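The 5-15 minute staggering guideline above can be sketched as a simple sequential scheduler; next_gap and run_staggered are hypothetical helper names:

```python
import asyncio
import random

def next_gap(min_minutes: float = 5, max_minutes: float = 15) -> float:
    """Random gap in seconds between account actions (5-15 min default)."""
    return random.uniform(min_minutes * 60, max_minutes * 60)

async def run_staggered(actions, gap=next_gap):
    """Run per-account async actions one by one, separated by random gaps."""
    for action in actions:
        await action()
        await asyncio.sleep(gap())
```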
Where tools fit: In some ad-buying setups, teams also add specialized cloaking services for campaigns on top of their proxy and anti-detect stack to filter unwanted traffic.
- Pyppeteer: scripting repetitive tasks (posting, stats collection, QA checks)
- Undetectable.io: per-profile fingerprint isolation, proxy assignment, and profile management (see its pricing plans for limits)
Workflow examples:
- Account warming: Browse 10-20 pages daily, like posts, leave comments with randomized timing
- Metrics monitoring: Login to ad dashboards, scrape CTR/CPM data, export to pandas for analysis
- Localization testing: Auto-test multiple language versions of stores, capture screenshots for QA
These workflows benefit from test automation principles: reliable waits, error recovery, and clean session handling.
Advanced Scraping Patterns with Pyppeteer
Complex scraping involves multi-page crawls, parsing dynamic pages, and integrating with Python data tools.
Paginated crawl pattern:
products = []
page_num = 1
while True:
    await page.goto(f'https://shop.example.com/category?page={page_num}')
    await page.waitForSelector('.product-grid')
    items = await page.evaluate('''() => {
        return Array.from(document.querySelectorAll('.product')).map(el => ({
            title: el.querySelector('h3').innerText,
            price: el.querySelector('.price').innerText
        }))
    }''')
    products.extend(items)
    next_disabled = await page.J('.pagination .next[disabled]')
    if next_disabled or page_num >= 10:
        break
    page_num += 1
This JavaScript expression extracts the data from each product card before moving to the next page.
Combining with BeautifulSoup:
from bs4 import BeautifulSoup
html = await page.content()
soup = BeautifulSoup(html, 'html.parser')
titles = [h.text for h in soup.select('.product h3')]
Interactive scraping: Type into search, wait for suggestions, extract results:
await page.type('#search', 'laptop', delay=100)
await page.waitForSelector('.suggestions')
suggestions = await page.JJ('.suggestion-item')
Basic concurrency:
import asyncio
async def scrape_url(url):
browser = await launch(headless=True)
# ... scrape logic
await browser.close()
urls = ['https://site.com/1', 'https://site.com/2', 'https://site.com/3']
await asyncio.gather(*(scrape_url(url) for url in urls))
Keep concurrency low (3-5 instances) on single machines to avoid memory issues and detection.
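That cap can be enforced with an asyncio.Semaphore instead of manual batching; bounded_gather is a hypothetical helper:

```python
import asyncio

MAX_CONCURRENT = 3  # per the 3-5 instance guideline above

async def bounded_gather(coros, limit=MAX_CONCURRENT):
    """Like asyncio.gather, but never runs more than `limit` at once."""
    sem = asyncio.Semaphore(limit)

    async def _run(coro):
        async with sem:
            return await coro

    return await asyncio.gather(*(_run(c) for c in coros))
```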
Common Errors and Troubleshooting Pyppeteer in 2026
A quick reference for frequent issues when running pyppeteer scripts.
“Browser closed unexpectedly” on Linux:
- Cause: Missing system dependencies
- Fix: apt install libnss3 libgconf-2-4 libasound2 libatk1.0-0
Chromium download failures:
- Cause: Corporate firewall blocking googleapis.com
- Fix: Set PYPPETEER_DOWNLOAD_HOST to a mirror, or manually place a Chromium build in Pyppeteer’s data directory
Timeouts on goto():
- Cause: Slow SPAs or network issues
- Fix: Increase timeout: await page.goto(url, timeout=60000), try waitUntil='load'
“Target closed” errors:
- Cause: Unhandled exceptions leaving browser in bad state
- Fix: Wrap in try/finally, ensure browser.close() always runs
Jupyter notebook issues:
- Cause: Already-running event loop conflicts
- Fix: Use nest_asyncio.apply() before running
Zombie Chrome processes:
- Cause: Scripts killed without cleanup
- Fix: ps aux | grep chrome then kill orphaned processes
Debugging:
- Enable verbose logs: launch(logLevel='debug')
- Listen to console: page.on('console', lambda msg: print(msg.text))
Best Practices for Stable and Low-Profile Pyppeteer Automation
These habits improve robustness and reduce detection risk.
Resource management:
- Always use try/finally with await browser.close() to prevent zombie processes
- Use asyncio.run() as entry point for clean event loop handling
- Close pages explicitly when done: await page.close()
Waiting strategies:
- Prefer waitForSelector(), waitForXPath(), waitForNavigation() over static sleep
- Use waitForFunction() for complex conditions based on JavaScript expression evaluation
- Set reasonable timeouts (10-30 seconds) based on the target site’s response times
Anti-detection hygiene:
- Limit concurrency: 3-5 tabs per IP
- Randomize delays: random.uniform(0.5, 2.5) seconds between actions
- Rotate proxies regularly (e.g., every 10-20 requests, tuned to the target site)
- Vary user agents across sessions
Code organization:
- Separate orchestration (scheduling, retries) from page workflows (login, scrape)
- Create reusable functions: async def login(page, credentials)
- Store credentials in environment variables or secrets managers
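A sketch of that separation: the workflow function takes a page plus credentials, and credentials come from the environment rather than source code. The selectors and the APP_ variable prefix are assumptions:

```python
import os

async def login(page, login_url: str, email: str, password: str) -> None:
    """Reusable login workflow; selectors are placeholders for your target."""
    await page.goto(login_url)
    await page.type('#email', email, delay=50)
    await page.type('#password', password, delay=50)
    await page.click('#login-btn')
    await page.waitForSelector('.user-dashboard', timeout=15000)

def credentials_from_env(prefix: str = 'APP') -> tuple:
    """Read credentials from environment variables, never from the script."""
    return os.environ[f'{prefix}_EMAIL'], os.environ[f'{prefix}_PASSWORD']
```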
Compliance:
- Check robots.txt before scraping
- Avoid collecting sensitive personal data
- Respect rate limits stated in site terms of service
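The robots.txt check can lean on the standard library's urllib.robotparser. In this sketch the robots.txt body is assumed to be fetched separately (e.g. with urllib.request) and passed in; allowed is a hypothetical helper:

```python
from urllib.robotparser import RobotFileParser

def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Evaluate an already-fetched robots.txt body for a given URL."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(user_agent, url)

# In practice, fetch robots.txt once per host and cache the parser
# so every page request doesn't re-download it.
```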
When to Use Alternatives (Playwright, APIs, and Anti-detect Browsers)
Pyppeteer makes sense for existing codebases, small projects, and learning browser automation. But when should you migrate?
Consider Playwright Python when:
- You need cross-browser coverage (Chrome, Firefox, WebKit)
- Mobile emulation is important
- You want active maintenance and current CDP feature support
- Auto-waiting and better debugging tools matter
Consider Selenium when:
- Legacy systems require WebDriver compatibility
- Safari testing is needed
- Team already has Selenium expertise
Consider hosted APIs when:
- Scale exceeds hundreds of thousands of requests
- You need managed proxy rotation and CAPTCHA handling
- Infrastructure management isn’t your focus
The key decision for anonymity-critical work: The choice isn’t just “Pyppeteer vs alternatives” but “plain browser vs anti-detect stack.” For multi-account operations at scale, Undetectable.io provides what automation tools alone cannot:
- Unlimited local profiles with unique fingerprints
- Per-profile proxy configuration
- Local data storage for security
- Designed for orchestration with external automation
Conclusion
Pyppeteer remains a practical choice for Python-based browser automation in 2026. As an unofficial Python port of Puppeteer, it brings Chrome DevTools Protocol control to browser automation without requiring a switch to Node.js. For scraping JavaScript-heavy websites, capturing screenshots, generating PDFs, and handling dynamic pages, it delivers results with a few lines of code and scales to moderate workloads.
The trade-offs are clear: maintenance lags behind Puppeteer, Chromium versions may be outdated, and deeper fingerprint spoofing requires additional tools. For routine scraping and QA tasks, these limitations rarely matter. For serious multi-account operations where detection means bans, combining Pyppeteer automation logic with Undetectable.io’s fingerprint isolation creates a more resilient workflow.
Start with a small, well-structured script following the patterns in this guide. Layer in explicit waits, error handling, and proxy rotation as you scale. When you’re ready to run multiple accounts safely, set up your first Undetectable.io profile alongside your Pyppeteer automation—and experience the difference proper fingerprint control makes.