With Puppeteer, we can easily create a PDF from HTML. You can find a previous post on this here: How to generate PDF from HTML?
What is the best practice for using Puppeteer to create PDFs?
Here are two approaches using Puppeteer:
- Always launch Puppeteer and create a new browser for each PDF. The browser is closed once the PDF has been created.
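const puppeteer = require('puppeteer')

;(async () => {
  const finalHtml = 'html content...'
  // Launch a fresh browser just for this PDF
  const browser = await puppeteer.launch()
  // pages() returns a Promise, so await it before taking the default tab
  const page = (await browser.pages())[0]
  await page.setContent(finalHtml)
  await page.pdf({ path: 'hn.pdf', format: 'A4' })
  // Closing the browser ends the whole Chrome process
  await browser.close()
})()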
- Keep one browser instance. Always create a new page for each PDF. The page is closed once the PDF has been created.
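const puppeteer = require('puppeteer')

;(async () => {
  // Launched once and reused for every PDF
  const browser = await puppeteer.launch()
  const finalHtml = 'html content...'
  // Each PDF gets its own fresh page (tab)
  const page = await browser.newPage()
  await page.setContent(finalHtml)
  await page.pdf({ path: 'hn.pdf', format: 'A4' })
  // Close only the page; the browser stays open for the next PDF
  await page.close()
})()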
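Both snippets above render a single PDF. In a service that renders many, each approach is usually wrapped in a function. A minimal sketch under a few assumptions: `htmlToPdfShortLived`, `createLongLivedRenderer` and the injected `launch` parameter are hypothetical names (in real use `launch` would be `() => puppeteer.launch()`), and `page.pdf()` is called without `path`, so it resolves to the PDF data instead of writing a file. The `try`/`finally` blocks close the browser (or tab) even when rendering throws, and the long-lived variant caches the launch *promise* rather than the browser, so requests that arrive before the first launch finishes share one browser instead of each launching their own.

```javascript
// Approach 1 (short-lived): one browser per PDF, closed even if rendering fails.
// `launch` is injected (e.g. () => puppeteer.launch()) so the helper can be tested without Chrome.
async function htmlToPdfShortLived(html, launch) {
  const browser = await launch()
  try {
    const page = await browser.newPage()
    await page.setContent(html)
    // No `path` option: pdf() resolves to the PDF data
    return await page.pdf({ format: 'A4' })
  } finally {
    await browser.close()
  }
}

// Approach 2 (long-lived): launch lazily on first use, share one browser, new tab per PDF.
function createLongLivedRenderer(launch) {
  // Cache the promise so concurrent first calls share a single launch
  let browserPromise = null
  const getBrowser = () => {
    if (!browserPromise) {
      browserPromise = launch().catch((err) => {
        // Forget a failed launch so the next call can retry
        browserPromise = null
        throw err
      })
    }
    return browserPromise
  }

  return async function htmlToPdfLongLived(html) {
    const page = await (await getBrowser()).newPage()
    try {
      await page.setContent(html)
      return await page.pdf({ format: 'A4' })
    } finally {
      // Close only the tab; the browser stays up for later calls
      await page.close()
    }
  }
}
```

For example, a server would call `const render = createLongLivedRenderer(() => puppeteer.launch())` once at startup and then `await render(html)` per request.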
Solution
- Short-lived browser (opening a new browser instance every time)
  Pros:
  - A new session opens each time, so one instance doesn't interfere with another.
  - Perfect for testing multiple credentials on the same or multiple websites.
  - Can use an instance-wide proxy.
  Cons:
  - You cannot easily share data between two instances (unless you use userDataDir or cookies).
  - Takes more time to open.
- Long-lived browser (sharing the same browser instance every time)
  Pros:
  - Opening a new tab takes far less time than launching a new Chrome with an empty profile.
  - Data is easily shared between tabs. Perfect for scraping/testing the same website with the same credentials.
  Cons:
  - You won't be able to use authentication and cookies on the same website with different credentials.
  - Cannot set a separate proxy per tab; the proxy applies to the whole instance (at this moment).
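Both the instance-wide proxy and the `userDataDir` workaround from the list above are configured when a browser is launched. A minimal sketch, assuming a hypothetical helper `shortLivedLaunchOptions` and a placeholder proxy URL and profile path: `userDataDir` is a Puppeteer launch option, and `--proxy-server` is a Chromium command-line flag passed through `args`.

```javascript
// Hypothetical helper: builds the options object for one short-lived
// puppeteer.launch() call. Both settings apply to the whole browser instance.
function shortLivedLaunchOptions({ proxyServer, userDataDir } = {}) {
  const options = { args: [] }
  if (proxyServer) {
    // Chromium flag: every tab in this browser goes through this proxy
    options.args.push(`--proxy-server=${proxyServer}`)
  }
  if (userDataDir) {
    // Reusing a profile directory lets separate launches share cookies and storage
    options.userDataDir = userDataDir
  }
  return options
}

// Usage (placeholder proxy URL and path, assumed):
// const browser = await puppeteer.launch(
//   shortLivedLaunchOptions({ proxyServer: 'http://proxy.example.com:8080', userDataDir: './profile-a' }),
// )
```

Chrome generally locks its profile directory while running, so a shared `userDataDir` suits launches that happen one after another rather than at the same time.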
Benchmark
Here is a benchmark that runs each approach only 100 times. The code:
const bench = require('@entrptaher/async-bench')
const puppeteer = require('puppeteer')

// 1. Short-lived: launch and close a whole browser for every PDF
const createNewBrowser = async function () {
  const finalHtml = 'html content...'
  const browser = await puppeteer.launch()
  const page = await browser.newPage()
  await page.setContent(finalHtml)
  await page.pdf({ path: 'hn_shortlived.pdf', format: 'A4' })
  await browser.close()
  return true
}

// 2. Long-lived browser, new tab per PDF (tab closed afterwards)
let longLivedBrowser
const useExisting = async function () {
  const finalHtml = 'html content...'
  if (!longLivedBrowser) {
    longLivedBrowser = await puppeteer.launch()
  }
  const page = await longLivedBrowser.newPage()
  await page.setContent(finalHtml)
  await page.pdf({ path: 'hn_longlived.pdf', format: 'A4' })
  await page.close()
  return true
}

// 3. Long-lived browser, always reusing its first tab (never closed)
let longLivedNoNewTab
const useExistingTab = async function () {
  const finalHtml = 'html content...'
  if (!longLivedNoNewTab) {
    longLivedNoNewTab = await puppeteer.launch()
  }
  const page = (await longLivedNoNewTab.pages())[0]
  await page.setContent(finalHtml)
  // Separate file name so functions 2 and 3 don't write the same file concurrently
  await page.pdf({ path: 'hn_longlived_tab.pdf', format: 'A4' })
  return true
}

const times = 100
Promise.all([bench(createNewBrowser, times), bench(useExisting, times), bench(useExistingTab, times)]).then(
  async (results) => {
    console.log(results)
    // Close the long-lived browsers so the Node process can exit
    await longLivedBrowser.close()
    await longLivedNoNewTab.close()
  },
)
The result:
[
{ meanExecTime: 277.3644104500115, execTime: 27736.44104500115, resultOfMethod: true },
{ meanExecTime: 36.89182792000472, execTime: 3689.1827920004725, resultOfMethod: true },
{ meanExecTime: 11.07780257999897, execTime: 1107.780257999897, resultOfMethod: true },
]
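Each entry (in the same order as the functions passed to Promise.all: new browser, new tab, existing tab) has the following fields:
- meanExecTime: average time per run, in milliseconds
- execTime: total time for all runs, in milliseconds
- resultOfMethod: the value returned by the benchmarked function, used only for identification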
The benchmark is incomplete because it doesn't include machine details, etc. But it clearly shows that launching a browser each time takes much longer, even over only 100 runs.
The second function also shows that opening a new tab takes time, so the third function, which reuses the existing tab instead of opening and closing a new one, takes even less time.
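The numbers can be checked directly: each execTime is 100 × meanExecTime (for example 277.36 ms × 100 ≈ 27,736 ms), and dividing the means gives the relative speedup. A small sketch that computes it from the mean values printed above; `speedupVsNewBrowser` is just an illustrative name:

```javascript
// Mean execution times in ms, copied from the benchmark output
// (same order as Promise.all: new browser, new tab, existing tab)
const meanMs = {
  newBrowser: 277.3644104500115,
  newTab: 36.89182792000472,
  existingTab: 11.07780257999897,
}

// How many times faster each approach is than launching a new browser,
// rounded to one decimal place
function speedupVsNewBrowser(means) {
  const speedups = {}
  for (const [name, ms] of Object.entries(means)) {
    speedups[name] = (means.newBrowser / ms).toFixed(1)
  }
  return speedups
}

console.log(speedupVsNewBrowser(meanMs))
// → { newBrowser: '1.0', newTab: '7.5', existingTab: '25.0' }
```

So, on whatever machine produced these numbers, a new tab in a running browser is roughly 7.5 times faster than a new browser, and reusing the existing tab roughly 25 times faster.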
Summary
- If you need performance (11 ms compared to 277 ms) and don't care about sessions, reuse an existing tab.
- If you want to run multiple tests in the same browser window in parallel, open a new tab for each.
- If you need sessions and persistence, launch a new browser instance.
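Reusing one tab is the fastest option, but a single page can only hold one document at a time, which is why parallel work calls for new tabs. If the existing-tab approach is used in a server anyway, concurrent requests have to take turns on that tab. A minimal sketch of a promise-based queue that enforces this; `createSerialQueue` is a hypothetical helper, not a Puppeteer API, and the commented usage assumes a `page` obtained as in the third benchmark function:

```javascript
// Hypothetical helper: runs async tasks strictly one after another, in call order.
function createSerialQueue() {
  // Promise for the end of the most recently queued task
  let tail = Promise.resolve()
  return function enqueue(task) {
    // Start this task only after everything queued before it has settled
    const run = tail.then(() => task())
    // Swallow errors on the internal chain so one failure doesn't block later tasks;
    // the caller still receives the rejection through `run`
    tail = run.catch(() => {})
    return run
  }
}

// Usage with one shared tab (assumed, not run here):
// const enqueue = createSerialQueue()
// const pdf = await enqueue(async () => {
//   await page.setContent(html)
//   return page.pdf({ format: 'A4' })
// })
```

The cost is that throughput is limited to one PDF at a time, so heavy parallel load is better served by the new-tab approach.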
More details are in this post on Stack Overflow.
Thanks to @Md. Abu Taher for answering my post.