
Pyppeteer: Browser Automation, Scraping, and Fingerprint Control in Python (2026 Guide)

The Ultimate Pyppeteer Guide: Get Started with Python Web Automation

If you’re a Python developer who needs to automate web browsers, scrape dynamic pages, or manage multiple accounts across platforms, Pyppeteer has likely crossed your radar. This unofficial Python port of Google’s Puppeteer library brings Chrome DevTools Protocol automation to Python, letting you control Chromium browsers with async/await syntax instead of switching to Node.js.

In 2026, Pyppeteer sits in an interesting position. The project entered maintenance mode around 2022, with community forks keeping it alive for Python 3.12 compatibility and ARM64 support. Despite newer alternatives like Playwright, many Python developers still rely on Pyppeteer for its lightweight footprint and familiar API—especially when they need to integrate browser automation into existing Python data pipelines.

This Pyppeteer tutorial covers everything you need to get productive: installing Pyppeteer on modern systems, core actions like navigation and form filling, taking screenshots and PDFs, handling cookies and iframes, working with proxies, and understanding fingerprinting limitations. For those running multi-account operations, we’ll explore how pairing Pyppeteer with an anti-detect browser like Undetectable.io creates safer workflows with isolated profiles and realistic fingerprints.

What this article covers:

  • Installation and setup for Python 3.10–3.12
  • First scripts and core browser actions
  • Screenshots, PDFs, cookies, and dynamic content
  • Proxy configuration and basic anti-block techniques
  • Browser fingerprinting limitations and anti-detect strategies
  • Multi-account workflows for marketing automation
  • Troubleshooting, best practices, and when to consider alternatives

The image depicts a developer's workspace featuring multiple screens, each displaying various browser windows, indicative of browser automation tasks. The setup suggests the use of tools like pyppeteer for web scraping and automating web browsers, with code snippets visible that may include commands for launching a new browser instance or running a pyppeteer script.

What is Pyppeteer?

Pyppeteer emerged around 2017-2018 as a community-driven effort to bring Puppeteer’s capabilities to Python. It exposes a high-level API to control Chromium browsers via the Chrome DevTools Protocol, giving you programmatic access to everything from page navigation to network interception.

The pyppeteer library mirrors the Puppeteer API almost one-to-one. You can launch browsers, create pages, navigate URLs, interact with DOM elements, capture screenshots and PDFs, and intercept network requests—all using Python’s async/await patterns instead of JavaScript Promises.

Core features and environment support:

  • Python 3.8+ required (3.10–3.12 common in 2026)
  • Works on Windows 10/11, Ubuntu 20.04+/Debian, macOS 12+
  • ARM quirks on Apple Silicon (M1/M2/M3) may require system Chrome
  • Bundled Chromium ~150-170MB downloaded on first run
  • MIT license for free commercial use

Current limitations:

  • Largely maintenance mode since ~2022
  • Bundled/downloaded Chromium revisions may lag behind current stable Chrome and can cause compatibility issues.
  • No native multi-browser support (Chrome/Chromium only)
  • Some community forks address Chromium freshness and Python 3.12 compatibility
  • Cutting-edge CDP features may not work without manual updates

For production users needing long-term stability, evaluate whether the risk of outdated Chromium builds affecting TLS or WebGL compatibility is acceptable for your use case.

Why Use Pyppeteer? Key Use Cases

Unlike simple HTTP tools like requests + BeautifulSoup, Pyppeteer renders the full DOM after JavaScript execution. This matters because many modern websites use frameworks like React, Vue, Next.js, or SvelteKit that build content client-side.

Concrete use cases where Pyppeteer excels:

  • Scraping infinite scroll product listings on e-commerce sites where items load via IntersectionObserver
  • Automating signup and login flows with 2FA sliders or dynamic form validation
  • Capturing full page screenshots of dashboards for A/B testing documentation
  • Generating PDFs of invoices or reports from authenticated portals
  • Running social media warm-up flows with browsing, liking, and commenting actions
  • Monitoring ad platform metrics across rotating accounts

Why stay in Python instead of switching to Node.js for Puppeteer?

  • Reuse existing scraping/data-processing pipelines (pandas, NumPy, SQLite)
  • Asyncio integration for concurrent tab management
  • Avoid context-switching pain and serialization overhead between languages

For safer multi-account workflows, pairing Pyppeteer with an anti-detect browser like Undetectable.io makes sense: Pyppeteer handles the automation logic while Undetectable.io provides hardened fingerprints and isolated profiles that reduce correlation risk across sessions.

Pyppeteer vs Puppeteer vs Selenium (and Where Undetectable.io Fits)

Choosing between these tools depends on your language preference, browser requirements, and maintenance expectations in 2026.

Language and runtime:

  • Puppeteer: Node.js with Promises, official Google backing, weekly Chromium syncs
  • Pyppeteer: Python with asyncio, unofficial python wrapper, slower update velocity
  • Selenium: Multi-language WebDriver (Python, Java, C#, etc.), driver-based architecture

Browser coverage:

  • Pyppeteer: primarily Chrome/Chromium focused. Puppeteer: supports Chrome and Firefox.
  • Selenium: Chrome, Firefox, Edge, Safari support through respective drivers

API style and capabilities:

  • Pyppeteer/Puppeteer offer low-level DevTools access: network interception, fetch() mocking, performance tracing
  • Selenium primarily uses WebDriver/WebDriver BiDi and supports request/response interception, though the workflow differs from Puppeteer-style APIs
  • Pyppeteer can be faster than Selenium in some JavaScript-heavy scenarios, but performance depends on the site, browser setup, waits, and implementation details.

Maintenance reality:

  • Puppeteer: highly active with 90k+ GitHub stars
  • Selenium: stable and mature with broad ecosystem
  • Pyppeteer has a smaller community footprint and a much slower release cadence than Puppeteer.
  • Many Python users have moved to Playwright because it is more actively maintained.

Where Undetectable.io fits: All three tools can use proxies and custom headers, but none solve deep browser fingerprinting. Undetectable.io provides fingerprint isolation and unlimited local profiles on paid plans, customizing key browser fingerprint signals such as canvas behavior, WebGL-related data, and WebRTC exposure. For mass account workflows, this stack can reduce correlation risk compared with plain Chrome setups, but results depend on platform rules, behavior patterns, proxy quality, and session hygiene.

Installing and Setting Up Pyppeteer

Setup in 2026 starts with confirming your Python version and preparing a clean environment.

Prerequisites:

  • Check your Python version with python --version (or python3 --version). You need 3.8+ (ideally 3.10–3.12).
  • On Linux, install the basic dependencies: apt install -y gconf-service libasound2 libatk1.0-0 libnss3 libgconf-2-4
  • Windows 11 typically works out of the box.

Create a virtual environment:

python -m venv venv

source venv/bin/activate # Linux/macOS

venv\Scripts\activate # Windows

Modern tools like uv or poetry also work well for dependency isolation.

Install Pyppeteer with pip:

pip install pyppeteer

On first run, Pyppeteer downloads a Chromium build (roughly 150MB) to a platform-dependent data directory. To pre-download it, run the bundled installer separately:

pyppeteer-install

Apple Silicon notes: If the bundled Chromium fails on M1/M2/M3 Macs, use system Chrome instead:

browser = await launch(executablePath='/Applications/Google Chrome.app/Contents/MacOS/Google Chrome')

Common issues:

  • Corporate proxies blocking download: set PYPPETEER_DOWNLOAD_HOST to a mirror
  • Missing shared libraries on minimal Docker images: install libnss3, gtk3 dependencies
  • To force pyppeteer to use a specific Chromium revision: set PYPPETEER_CHROMIUM_REVISION environment variable

Getting Started: First Pyppeteer Script

Here’s a minimal Pyppeteer script that opens a web page, prints the page title, and closes cleanly.

import asyncio

from pyppeteer import launch

async def main():
    browser = await launch(headless=True)
    try:
        page = await browser.newPage()
        await page.goto('https://example.com', waitUntil='domcontentloaded')
        title = await page.title()
        print(f'Page title: {title}')
    finally:
        await browser.close()

asyncio.run(main())

Key elements explained:

  • from pyppeteer import launch brings in the launcher
  • async def main() defines the async function containing all browser logic
  • await browser and await page are used throughout for async operations
  • launch(headless=True) starts Chrome in headless mode (no graphical user interface)
  • await page.goto() waits for the page to load before continuing
  • finally: await browser.close() ensures the browser instance closes even on errors

Running the script: Execute with asyncio.run(main()), the modern approach for Python 3.7+. This pattern forms the foundation for all pyppeteer examples that follow.

Useful launch options:

  • headless=False for debugging with visible browser
  • args=['--no-sandbox', '--disable-setuid-sandbox'] for Linux servers
  • args=['--start-maximized'] for fullscreen debugging

Core Browser Actions with Pyppeteer

This section covers daily operations: navigation, selection, clicking, typing, and waiting for elements.

Page navigation:

await page.goto('https://example.com', waitUntil='networkidle2')

The waitUntil option accepts: load, domcontentloaded, networkidle0 (no network in 500ms), or networkidle2 (2 or fewer connections). Adjust for SPAs or slow APIs.

Element selection: Pyppeteer uses J and JJ instead of $ and $$ because $ isn’t a valid identifier in Python:

element = await page.J('div.product') # CSS selector

elements = await page.JJ('.item') # All matching

xpath_els = await page.xpath('//button[@data-action="submit"]') # XPath queries return a list of matches

User interactions:

await page.type('#email', 'user@example.com', delay=100) # Realistic typing

await page.click('#submit-button')

await page.hover('.menu-item')

await page.keyboard.press('Enter')

Waiting strategies: Avoid static sleep() calls. Use explicit waits:

await page.waitForSelector('.dashboard', visible=True, timeout=30000)

await page.waitForXPath('//div[contains(text(), "Welcome")]')

await page.waitForFunction('() => document.querySelectorAll(".item").length > 10')

Login flow example: The following code snippet demonstrates a simple login:

await page.goto('https://demo-site.com/login')

await page.type('#email', 'test@example.com', delay=50)

await page.type('#password', 'demo123', delay=50)

await page.click('#login-btn')

await page.waitForSelector('.user-dashboard', timeout=15000)

print('Login successful')

Capturing Screenshots and PDFs

Visual capture matters for documenting ad variations, A/B tests, and archiving invoices.

Basic screenshot:

await page.screenshot(path='page.png')

await page.screenshot(path='full.png', fullPage=True) # Entire scrollable area

await page.screenshot(path='quality.jpg', type='jpeg', quality=90)

Viewport settings for device emulation:

await page.setViewport({'width': 1920, 'height': 1080}) # Desktop

await page.setViewport({'width': 390, 'height': 844}) # iPhone 15

PDF generation:

await page.pdf(
    path='report.pdf',
    format='A4',
    printBackground=True,
    margin={'top': '1cm', 'bottom': '1cm'}
)

Combined workflow example:

await page.goto('https://dashboard.example.com')

await page.waitForSelector('.charts-loaded')

await page.screenshot(path='dashboard.png', fullPage=True)

await page.pdf(path='dashboard.pdf', format='A4', printBackground=True)

Both screenshot() and pdf() accept keyword arguments for customizing output quality and dimensions.

Handling Cookies, Sessions, and iFrames

Controlling cookies and frames preserves logins and handles embedded third-party widgets.

Cookie operations:

import json

# Get all cookies
cookies = await page.cookies()

# Save them to a file
with open('cookies.json', 'w') as f:
    json.dump(cookies, f)

# Restore cookies
with open('cookies.json', 'r') as f:
    saved_cookies = json.load(f)
await page.setCookie(*saved_cookies)

# Delete a specific cookie
await page.deleteCookie({'name': 'session_id'})

For multi-account work, maintain separate cookie files per identity. However, cookies alone don’t isolate fingerprints—combining with Undetectable.io profiles provides stronger separation.
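To make per-identity separation concrete, here is a minimal sketch of per-account cookie persistence. The helper names (cookie_path, save_session, load_session) and the cookies/ directory are illustrative, not part of Pyppeteer’s API:

```python
import json
from pathlib import Path

def cookie_path(account_id, base_dir='cookies'):
    # One cookie file per account keeps sessions separated on disk.
    Path(base_dir).mkdir(exist_ok=True)
    return Path(base_dir) / f'{account_id}.json'

async def save_session(page, account_id):
    # Dump the page's current cookies to this account's file.
    cookies = await page.cookies()
    cookie_path(account_id).write_text(json.dumps(cookies))

async def load_session(page, account_id):
    # Restore cookies if a saved session exists; returns True on success.
    path = cookie_path(account_id)
    if not path.exists():
        return False
    await page.setCookie(*json.loads(path.read_text()))
    return True
```

Call load_session() right after newPage() and save_session() before closing; remember that cookies alone do not change the browser fingerprint.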

Iframe handling:

# List all frames
frames = page.frames

# Find a frame by name or URL
payment_frame = None
for frame in frames:
    if frame.name == 'stripe-checkout' or 'payment-provider.com' in frame.url:
        payment_frame = frame
        break

# Interact within the frame
confirm_btn = await payment_frame.J('#confirm-payment')
await confirm_btn.click()

The code above demonstrates clicking inside an embedded payment modal—you must locate the frame and work within its context rather than the top-level page.

Dynamic Content, Alerts, and Pop-ups

Modern SPAs render content after API calls, requiring explicit handling for infinite scroll and modal dialogs.

Infinite scroll pattern:

async def scroll_and_load(page, max_items=50):
    previous_count = 0
    while True:
        items = await page.JJ('.product-card')
        if len(items) >= max_items or len(items) == previous_count:
            break
        previous_count = len(items)
        await page.evaluate('window.scrollTo(0, document.body.scrollHeight)')
        # Pyppeteer has no waitForTimeout; waitFor accepts a timeout in ms
        await page.waitFor(2000)
    return await page.JJ('.product-card')

Cap item counts for performance and to avoid triggering anti-bot systems.

Handling JavaScript dialogs:

page.on('dialog', lambda dialog: asyncio.ensure_future(dialog.accept()))

Or dismiss:

page.on('dialog', lambda dialog: asyncio.ensure_future(dialog.dismiss()))

Pop-ups and new windows:

# Wait for the new tab/window to appear
new_target = await browser.waitForTarget(
    lambda t: 'oauth' in t.url
)
new_page = await new_target.page()
await new_page.waitForSelector('#authorize-btn')
await new_page.click('#authorize-btn')

This handles OAuth flows that open a web page in a new window. After configuring your automation stack, you can use tools like BrowserLeaks anonymity checks to verify that your IP, WebRTC, and DNS settings behave as expected.

Using Proxies and Evading Basic Blocks with Pyppeteer

Proxies are essential for large-scale web scraping and multi-accounting on platforms with aggressive rate limiting, and curated lists of the best proxy services for automation can help you choose stable providers.

Setting a proxy at launch:

browser = await launch(
    args=['--proxy-server=http://proxy-host:8080']
)

All traffic from this new browser instance routes through the proxy.

Authenticated proxies:

page = await browser.newPage()

await page.authenticate({'username': 'proxyuser', 'password': 'proxypass'})

await page.goto('https://target-site.com')

Call authenticate() before navigation on each new page.

Basic anti-bot hygiene:

  • Vary user agents across sessions using await page.setUserAgent()
  • Add randomized delays between actions: await page.waitFor(random.randint(1000, 3000))
  • Rotate proxies based on the target site’s sensitivity, your session design, and provider quality rather than on a fixed interval, using a reliable provider such as PlainProxies professional proxy service
  • Limit concurrency to 3-5 tabs per browser instance
  • Block unnecessary resources (images, fonts) via request interception
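The last bullet, blocking heavy resources, can be sketched with Pyppeteer’s request interception API. The should_block policy function and open_lightweight_page helper are illustrative names; setRequestInterception, request.abort(), and request.continue_() are the real Pyppeteer calls (the trailing underscore exists because continue is a Python keyword):

```python
import asyncio

BLOCKED_TYPES = {'image', 'font', 'stylesheet', 'media'}

def should_block(resource_type):
    # Pure decision function so the blocking policy is easy to test and tune.
    return resource_type in BLOCKED_TYPES

async def open_lightweight_page(browser, url):
    page = await browser.newPage()
    await page.setRequestInterception(True)

    async def handle(request):
        if should_block(request.resourceType):
            await request.abort()
        else:
            await request.continue_()

    # Handlers must not block the event loop, so schedule them as tasks.
    page.on('request', lambda req: asyncio.ensure_future(handle(req)))
    await page.goto(url, waitUntil='domcontentloaded')
    return page
```

Blocking images and fonts typically cuts page weight dramatically, which speeds up crawls and reduces proxy bandwidth costs.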

Limitations: Pyppeteer handles proxies and headers but doesn’t address deeper fingerprinting signals. For scenarios involving canvas, WebGL, and font fingerprinting, you need an anti-detect solution.

Pyppeteer and Browser Fingerprinting: Limitations and Anti-detect Strategies

Browser fingerprinting in 2026 combines many signals to uniquely identify sessions: user agents, screen size, timezone, fonts, canvas/WebGL hashes, WebRTC IPs, and hardware hints.

What Pyppeteer can adjust:

  • User agent via setUserAgent()
  • Viewport and screen dimensions
  • Timezone via the CDP Emulation.setTimezoneOverride command (exposed as page.emulateTimezone('America/New_York') in newer forks)
  • Geolocation via page.setGeolocation()
  • Language headers
  • Some navigator properties via evaluateOnNewDocument(), which runs a JavaScript snippet before any page script executes
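As a minimal sketch of that last item, the snippet below patches surface-level navigator properties before any page script runs. STEALTH_JS and apply_basic_stealth are illustrative names; this masks only the most obvious signals, such as navigator.webdriver, and will not defeat canvas or WebGL checks:

```python
# Illustrative patch: hide navigator.webdriver and align languages.
STEALTH_JS = """
() => {
    Object.defineProperty(navigator, 'webdriver', { get: () => undefined });
    Object.defineProperty(navigator, 'languages', { get: () => ['en-US', 'en'] });
}
"""

async def apply_basic_stealth(page):
    # Runs before every document in this page, including iframes.
    await page.evaluateOnNewDocument(STEALTH_JS)
    # Keep the user agent consistent with the spoofed properties.
    await page.setUserAgent(
        'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
        'AppleWebKit/537.36 (KHTML, like Gecko) '
        'Chrome/120.0.0.0 Safari/537.36'
    )
```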

What Pyppeteer cannot easily mask:

  • Native canvas fingerprints (HTMLCanvasElement.toDataURL hashing)
  • WebGL renderer/vendor strings tied to actual GPU
  • System font enumeration
  • AudioContext fingerprinting
  • Hardware concurrency values

Tools like CreepJS can still reveal fingerprint leaks after basic spoofing, and results vary too much by setup to reduce to a single efficacy percentage. Free tools like AmIUnique.org browser fingerprint checks help you see how identifiable your setup remains. For major platforms with sophisticated risk systems, incomplete spoofing can still leave enough signals for cross-session correlation.

Anti-detect browser integration: Operators managing many accounts pair their automation logic with solutions like Undetectable.io to obtain separate, realistic fingerprints per profile. You can download Undetectable for Mac and Windows; it offers:

  • Unlimited local profiles on paid plans
  • Per-profile fingerprint randomization (50+ signals)
  • Proxy assignment per profile
  • Data stored locally for security
  • Automation-friendly design for external orchestration

Recommended architecture: Rather than bending a single Pyppeteer instance to impersonate many identities, run multiple Undetectable.io profiles (each with unique proxy and fingerprint) and use local automation to trigger actions within each isolated context.

The image features multiple computer screens displaying various browser profiles, showcasing different web pages in headless mode. This setup illustrates browser automation using the pyppeteer library, highlighting the graphical user interface and the capability to manage multiple browser instances for tasks like web scraping and testing.

Multi-account Management and Marketing Automation Workflows

Ad arbitrage teams, affiliates, and marketplace sellers often run dozens to hundreds of accounts across platforms like Facebook Ads, TikTok, or Amazon.

Why naive Pyppeteer usage fails: Running everything from a single machine, a single Chrome build, and a shared fingerprint quickly triggers:

  • Account challenges and verification requests
  • Shadowbans and reduced reach
  • Manual reviews and permanent bans
  • Cross-account correlation leading to farm detection

Responsible multi-account workflow: If you’re still evaluating your stack, reviews of GoLogin alternatives for multi-accounting can clarify how different anti-detect browsers compare before you commit. The core principles:

  1. Dedicated browser profile per account
  2. Separate proxy/geo per profile
  3. Isolated cookies, localStorage, and IndexedDB
  4. Staggered, human-like action scheduling (5-15 minute gaps)
  5. Randomized browsing patterns before core actions

Where tools fit: In some ad-buying setups, teams also add specialized cloaking services for campaigns on top of their proxy and anti-detect stack to filter unwanted traffic.

Workflow examples:

  • Account warming: Browse 10-20 pages daily, like posts, leave comments with randomized timing
  • Metrics monitoring: Login to ad dashboards, scrape CTR/CPM data, export to pandas for analysis
  • Localization testing: Auto-test multiple language versions of stores, capture screenshots for QA

These workflows benefit from test automation principles: reliable waits, error recovery, and clean session handling.
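Those principles can be captured in a small, generic retry helper. The with_retries name and backoff parameters are illustrative:

```python
import asyncio
import random

async def with_retries(action, attempts=3, base_delay=1.0):
    # Retry a flaky async step (login, scrape) with backoff plus jitter,
    # so recovery attempts do not fire at mechanical intervals.
    for attempt in range(1, attempts + 1):
        try:
            return await action()
        except Exception:
            if attempt == attempts:
                raise
            await asyncio.sleep(base_delay * attempt + random.uniform(0, 0.5))
```

Usage might look like await with_retries(lambda: login(page, credentials)), where login is one of your own page workflow functions.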

Advanced Scraping Patterns with Pyppeteer

Complex scraping involves multi-page crawls, parsing dynamic pages, and integrating with Python data tools.

Paginated crawl pattern:

products = []
page_num = 1

while True:
    await page.goto(f'https://shop.example.com/category?page={page_num}')
    await page.waitForSelector('.product-grid')

    items = await page.evaluate('''() => {
        return Array.from(document.querySelectorAll('.product')).map(el => ({
            title: el.querySelector('h3').innerText,
            price: el.querySelector('.price').innerText
        }))
    }''')
    products.extend(items)

    next_disabled = await page.J('.pagination .next[disabled]')
    if next_disabled or page_num >= 10:
        break
    page_num += 1

This JavaScript expression extracts the data from each product card before moving to the next page.

Combining with BeautifulSoup:

from bs4 import BeautifulSoup

html = await page.content()

soup = BeautifulSoup(html, 'html.parser')

titles = [h.text for h in soup.select('.product h3')]

Interactive scraping: Type into search, wait for suggestions, extract results:

await page.type('#search', 'laptop', delay=100)

await page.waitForSelector('.suggestions')

suggestions = await page.JJ('.suggestion-item')

Basic concurrency:

import asyncio

from pyppeteer import launch

async def scrape_url(url):
    browser = await launch(headless=True)
    try:
        ...  # scrape logic goes here
    finally:
        await browser.close()

async def main():
    urls = ['https://site.com/1', 'https://site.com/2', 'https://site.com/3']
    await asyncio.gather(*(scrape_url(url) for url in urls))

asyncio.run(main())

Keep concurrency low (3-5 instances) on single machines to avoid memory issues and detection.
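A cleaner way to enforce that cap is an asyncio.Semaphore. In this sketch, scrape_url is a stub standing in for real browser work:

```python
import asyncio

async def scrape_url(url, sem):
    # The semaphore caps how many scrapers run at once (and thus how many
    # browser instances you launch at a time).
    async with sem:
        await asyncio.sleep(0.01)  # placeholder for launch/goto/scrape/close
        return url

async def scrape_all(urls, limit=3):
    sem = asyncio.Semaphore(limit)
    # gather() preserves input order in its results.
    return await asyncio.gather(*(scrape_url(u, sem) for u in urls))

urls = [f'https://site.com/{i}' for i in range(8)]
results = asyncio.run(scrape_all(urls))
```

With limit=3, at most three scrapers hold the semaphore at once, no matter how long the URL list grows.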

Common Errors and Troubleshooting Pyppeteer in 2026

A quick reference for frequent issues when running pyppeteer scripts.

“Browser closed unexpectedly” on Linux:

  • Cause: Missing system dependencies
  • Fix: apt install libnss3 libgconf-2-4 libasound2 libatk1.0-0

Chromium download failures:

  • Cause: Corporate firewall blocking googleapis.com
  • Fix: Set PYPPETEER_DOWNLOAD_HOST to a mirror, or download Chromium manually and point Pyppeteer at it with executablePath

Timeouts on goto():

  • Cause: Slow SPAs or network issues
  • Fix: Increase timeout: await page.goto(url, timeout=60000), try waitUntil='load'

“Target closed” errors:

  • Cause: Unhandled exceptions leaving browser in bad state
  • Fix: Wrap in try/finally, ensure browser.close() always runs

Jupyter notebook issues:

  • Cause: Already-running event loop conflicts
  • Fix: Use nest_asyncio.apply() before running

Zombie Chrome processes:

  • Cause: Scripts killed without cleanup
  • Fix: ps aux | grep chrome then kill orphaned processes

Debugging:

  • Enable verbose logs: launch(logLevel='debug')
  • Listen to console: page.on('console', lambda msg: print(msg.text))

Best Practices for Stable and Low-Profile Pyppeteer Automation

These habits improve robustness and reduce detection risk.

Resource management:

  • Always use try/finally with await browser.close() to prevent zombie processes
  • Use asyncio.run() as entry point for clean event loop handling
  • Close pages explicitly when done: await page.close()

Waiting strategies:

  • Prefer waitForSelector(), waitForXPath(), waitForNavigation() over static sleep
  • Use waitForFunction() for complex conditions based on javascript expression evaluation
  • Set reasonable timeouts (10-30 seconds) based on target url response times

Anti-detection hygiene:

  • Limit concurrency: 3-5 tabs per IP
  • Randomize delays: random.uniform(0.5, 2.5) seconds between actions
  • Rotate proxies every 10-20 requests
  • Vary user agents across sessions
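The delay bullets above can be wrapped in one tiny helper; human_pause is an illustrative name:

```python
import asyncio
import random

async def human_pause(min_s=0.5, max_s=2.5):
    # Sleep for a random interval so action timing looks less scripted.
    delay = random.uniform(min_s, max_s)
    await asyncio.sleep(delay)
    return delay
```

Sprinkle await human_pause() between clicks and form fills instead of fixed sleeps.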

Code organization:

  • Separate orchestration (scheduling, retries) from page workflows (login, scrape)
  • Create reusable functions: async def login(page, credentials)
  • Store credentials in environment variables or secrets managers

Compliance:

  • Check robots.txt before scraping
  • Avoid collecting sensitive personal data
  • Respect any rate limits stated in site terms

When to Use Alternatives (Playwright, APIs, and Anti-detect Browsers)

Pyppeteer makes sense for existing codebases, small projects, and learning browser automation. But when should you migrate?

Consider Playwright Python when:

  • You need cross-browser coverage (Chrome, Firefox, WebKit)
  • Mobile emulation is important
  • You want active maintenance and up-to-date CDP support
  • Auto-waiting and better debugging tools matter

Consider Selenium when:

  • Legacy systems require WebDriver compatibility
  • Safari testing is needed
  • Team already has Selenium expertise

Consider hosted APIs when:

  • Scale exceeds hundreds of thousands of requests
  • You need managed proxy rotation and CAPTCHA handling
  • Infrastructure management isn’t your focus

The key decision for anonymity-critical work: The choice isn’t just “Pyppeteer vs alternatives” but “plain browser vs anti-detect stack.” For multi-account operations at scale, Undetectable.io provides what automation tools alone cannot:

  • Unlimited local profiles with unique fingerprints
  • Per-profile proxy configuration
  • Local data storage for security
  • Designed for orchestration with external automation

Conclusion

The image shows a person focused on their computer, with a code snippet displayed on the screen, likely related to browser automation using the pyppeteer library. This scene captures the essence of web scraping and automating tasks with Python in a modern graphical user interface.

Pyppeteer remains a practical choice for Python-based browser automation in 2026. As an unofficial Python port of Puppeteer, it brings Chrome DevTools Protocol control to Python without requiring a switch to Node.js. For scraping JavaScript-heavy websites, capturing screenshots, generating PDFs, and handling dynamic pages, it delivers results with a few lines of code and scales to moderate workloads.

The trade-offs are clear: maintenance lags behind Puppeteer, Chromium versions may be outdated, and deeper fingerprint spoofing requires additional tools. For routine scraping and QA tasks, these limitations rarely matter. For serious multi-account operations where detection means bans, combining Pyppeteer automation logic with Undetectable.io’s fingerprint isolation creates a more resilient workflow.

Start with a small, well-structured script following the patterns in this guide. Layer in explicit waits, error handling, and proxy rotation as you scale. When you’re ready to run multiple accounts safely, set up your first Undetectable.io profile alongside your Pyppeteer automation—and experience the difference proper fingerprint control makes.

Undetectable Team, Anti-detection Experts