What is Puppeteer?
Puppeteer is a Node.js library that provides a high-level API to control Chrome or Chromium browsers over the DevTools Protocol. The DevTools Protocol is a set of APIs that lets you interact with these browsers programmatically: you can inspect, debug, profile, and otherwise control the browser.
We can install it with npm:
npm install puppeteer
Examples of tasks it can perform -
Web scraping
Automated testing
Generating screenshots and PDFs
Emulating different environments, and much more...
Basic functions -
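// Import the library (the calls below use await, so run them inside an async function)
const puppeteer = require('puppeteer');
// Launch a browser
const browser = await puppeteer.launch();
// Open a new page
const page = await browser.newPage();
// Navigate to a URL
await page.goto('https://example.com');
// Take a screenshot
await page.screenshot({ path: 'ss.png' });
// Close the browser
await browser.close();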
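Put together, these calls might look like the sketch below. The helper name `captureScreenshot` and its parameters are my own and not part of Puppeteer. It receives the `puppeteer` module as an argument and closes the browser in a `finally` block, so a failed navigation should not leave a browser process running.

```javascript
// Hypothetical helper (not part of Puppeteer): open a URL and save a screenshot.
// `puppeteer` is the module returned by require('puppeteer').
async function captureScreenshot(puppeteer, url, path) {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url);
    await page.screenshot({ path });
    return path;
  } finally {
    // Runs even if goto() or screenshot() throws.
    await browser.close();
  }
}

// Usage, assuming puppeteer is installed:
// captureScreenshot(require('puppeteer'), 'https://example.com', 'ss.png');
```

Passing the module in, rather than requiring it inside the function, is only a design choice that makes the helper easy to try with a stand-in object.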
Query selectors -
Query selectors are used to select and interact with elements on a web page. They are similar to the CSS selectors you would pass to document.querySelector() in JavaScript.
// selects the first element that matches the specified CSS selector
const element = await page.$('selector');
// selects all elements that match the specified CSS selector
const elements = await page.$$('selector');
// find child element
const innerElement = await element.$('inner-selector');
// find multiple child elements of same type
const innerElements = await element.$$('inner-selector');
// find a deep descendant with the >>> combinator, which also looks inside shadow roots
const deep = await element.$('div >>> a');
P elements
P elements are pseudo-elements with a -p vendor prefix. They let you enhance your selectors with Puppeteer-specific query engines such as XPath, text queries, and ARIA.
// match by text content
const txtEl = await element.$('div ::-p-text(hello)');
// match with an XPath expression
const xpathEl = await element.$('div ::-p-xpath(h1)');
// match by accessible (ARIA) name
const ariaEl = await element.$('::-p-aria(Submit)');
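The handles returned by `$` and `$$` live in the browser, so reading something like an element's text means evaluating a small function against each handle. Here is a minimal sketch of that; the helper name `readTexts` is hypothetical, while `$$`, `evaluate`, and `dispose` are the Puppeteer handle methods it relies on.

```javascript
// Hypothetical helper: return the trimmed text of every element matching `selector`.
async function readTexts(page, selector) {
  const handles = await page.$$(selector);
  const texts = [];
  for (const handle of handles) {
    // The callback runs in the page and receives the DOM element behind the handle.
    const text = await handle.evaluate((el) => el.textContent.trim());
    texts.push(text);
    // Release the handle once we no longer need it.
    await handle.dispose();
  }
  return texts;
}

// Usage, inside an async function with an open page:
// const headings = await readTexts(page, 'h2');
```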
Locators -
Locators automatically retry failed actions, which makes automation scripts more reliable and less flaky. They also come with built-in waiting and with actions such as clicking, filling, hovering, and scrolling.
// wait for the button to become available on the page
await page.locator('button').wait();
// clicking an element
await page.locator('button').click();
// fill value inside input
await page.locator('input').fill('value');
// hovering over an element
await page.locator('button').hover();
// scroll the matched element (here, back to the top)
await page.locator('div').scroll({
  scrollTop: 0,
});
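// listen for the action event, which is emitted just before the locator acts
// (LocatorEmittedEvents is imported from 'puppeteer'; newer releases may export it as LocatorEvent)
await page
  .locator('button')
  .on(LocatorEmittedEvents.Action, () => {
    console.log("about to click");
  })
  .click();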
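Locator calls can be chained, so a small form interaction fits in a couple of lines. The sketch below is one way to do it, not the only one: `fillAndSubmit` and its parameter names are hypothetical, and it uses the locator's `setTimeout` method to cap how long each action keeps retrying.

```javascript
// Hypothetical helper: type a value into an input, then click a button.
// Each locator retries until it succeeds or `timeoutMs` elapses.
async function fillAndSubmit(page, inputSelector, value, buttonSelector, timeoutMs = 5000) {
  await page.locator(inputSelector).setTimeout(timeoutMs).fill(value);
  await page.locator(buttonSelector).setTimeout(timeoutMs).click();
}

// Usage, inside an async function with an open page:
// await fillAndSubmit(page, '#search', 'puppeteer', 'button[type="submit"]');
```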
Evaluate JavaScript -
page.evaluate() lets us run JavaScript inside the page that Puppeteer has loaded and get the result back in our Node script.
// simple function
const sum = await page.evaluate(() => {
  return 1 + 2;
});
console.log(sum); // gives 3
// passing arguments: extra arguments to evaluate() are forwarded to the function
const total = await page.evaluate((a, b) => {
  return a + b;
}, 1, 1);
console.log(total); // gives 2
One example would be scraping some important information with the help of selectors and then running a function over it so that it is ready to store or display on your own webpage.
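A scrape-then-process flow of that kind could be sketched as follows. All three function names are hypothetical. `collectLinks` is handed to `page.evaluate()` and therefore runs in the browser, so it must return JSON-serializable data; `cleanLinks` runs in Node and tidies the result.

```javascript
// Runs in the browser: gather the raw text and URL of every link on the page.
function collectLinks() {
  return Array.from(document.querySelectorAll('a'), (a) => ({
    text: a.textContent,
    href: a.href,
  }));
}

// Runs in Node: collapse whitespace, drop empty labels and duplicate URLs.
function cleanLinks(rawLinks) {
  const seen = new Set();
  const result = [];
  for (const { text, href } of rawLinks) {
    const label = text.replace(/\s+/g, ' ').trim();
    if (!label || !href || seen.has(href)) continue;
    seen.add(href);
    result.push({ label, href });
  }
  return result;
}

// Ties the two together for an already opened Puppeteer page.
async function scrapeLinks(page) {
  const raw = await page.evaluate(collectLinks);
  return cleanLinks(raw);
}
```

Keeping the browser-side function separate from the Node-side clean-up means the clean-up can be tried on plain arrays without launching a browser.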
Puppeteer and Cheerio -
While Puppeteer allows for browser automation, interaction, and rendering of JavaScript-heavy sites, Cheerio offers a fast and lightweight way to parse and traverse the DOM with a jQuery-like syntax. We can use Puppeteer to load the websites we want to scrape, rather than requesting the page ourselves with a GET or POST request, and then use Cheerio to extract the important information easily.
If we use only Cheerio, we have to take care of every request we make, along with its payload and headers. That is not especially hard, but it can cost a beginner several debugging sessions.
Below is a simple example of their combined usage -
const puppeteer = require('puppeteer');
const cheerio = require('cheerio');
// await is not allowed at the top level of a CommonJS script, so wrap the steps in an async function
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com');
  // Get the rendered HTML from Puppeteer and hand it to Cheerio
  const content = await page.content();
  const $ = cheerio.load(content);
  $('selector').each((index, element) => {
    // Extract data using jQuery-like syntax
    console.log($(element).text());
  });
  await browser.close();
})();
Follow up -
If you have any questions, you can comment below. I will try to come up with more interesting things.