Published on

How to Create PDFs from Web Pages with Puppeteer

Generating PDFs from web pages can be incredibly useful for archiving content, creating reports, or sharing information in a portable format.

Puppeteer, a Node.js library that provides a high-level API to control Chrome or Chromium over the DevTools Protocol, makes this task straightforward.

TL;DR

Here’s a simple example of how to use Puppeteer to generate a PDF from a specified URL:

const puppeteer = require('puppeteer');

(async () => {
    // Launch a headless browser
    const browser = await puppeteer.launch();
    const page = await browser.newPage();

    await page.goto('http://archive.org/');

    // Generate PDF from the page
    const outputPath = './page.pdf';
    await page.pdf({
        path: outputPath,
    });

    console.log("PDF generated successfully at: " + outputPath);

    await browser.close();
})();

Here you can find all the options available for generating PDFs with Puppeteer: https://pptr.dev/api/puppeteer.pdfoptions

Setup

To get started, you’ll need to have Node.js installed on your machine. Then, you can create a new project and install Puppeteer:

mkdir pdf-generator
cd pdf-generator
npm init -y
npm install puppeteer

To run the script, save it to a file named generate-pdf.js and execute it using Node.js:

node generate-pdf.js

Generate PDF rom HTML Content

If you want to generate a PDF from raw HTML content instead of a URL, you can use the page.setContent() method:

const puppeteer = require('puppeteer');

(async () => {
    // Launch a headless browser
    const browser = await puppeteer.launch();
    const page = await browser.newPage();

    // Set the HTML content
    const htmlContent = `
      <!DOCTYPE html>
      <html>
      <head>
          <title>Sample PDF</title>
      </head>
      <body>
          <h1>Hello, PDF!</h1>
          <p>This PDF was generated from HTML content.</p>
      </body>
      </html>
    `;

    await page.setContent(htmlContent);

    // Generate PDF from the HTML content
    const outputPath = './page.pdf';
    await page.pdf({
        path: outputPath,
    });

    console.log("PDF generated successfully at: " + outputPath);

    await browser.close();
})();

Additional Options

You can customize the PDF generation by providing additional options to the page.pdf() method. Here are some commonly used options:

Include/Exclude Background

await page.pdf({
  path: outputPath,
  printBackground: true,
});

Set Landscape Orientation

await page.pdf({
  path: outputPath,
  landscape: true,
});

Set Margins

await page.pdf({
  path: outputPath,
  margin: {
    top: '20px',
    right: '20px',
    bottom: '20px',
    left: '20px'
  }
});

Set Page Format

await page.pdf({
    path: outputPath,
    format: 'Letter',
});

Here you can found a list of all supported formats: https://pptr.dev/api/puppeteer.paperformat#remarks

Set Custom Width and Height

await page.pdf({
    path: outputPath,
    width: '800px',
    height: '600px',
});

Set Scale

await page.pdf({
    path: outputPath,
    scale: 0.75,
});

Conclusion

Puppeteer provides a powerful and flexible way to generate PDFs from web pages or HTML content. By leveraging its API, you can customize the output to meet your specific needs, making it a valuable tool for developers looking to automate PDF generation tasks.

See also