We often encounter the need to create PDFs based on content. While there is no right or wrong way to generate PDFs, some approaches are more efficient and quicker to build than others.
Previously, we had to write all the boilerplate code to generate PDFs in our applications.
However, now we have many great libraries and tools that can help us quickly implement this feature.
The most important part of generating PDFs is the input data. The most common and useful approach is to generate PDFs from HTML content or based on a website URL.
In this article, we will look into some approaches that we can take to generate PDFs from HTML
Why generate PDF from HTML?
Before we move on to the libraries, first let’s see why we prefer HTML as input data for generating PDFs. Some of the reasons are as follows
- Open and Mature Technology: HTML is an open standard, which ensures that tools and technologies built around it are widely available and well-understood. Its maturity also means that most of the challenges and quirks are well-documented, making troubleshooting easier.
- Cost-effective: There are a plethora of tools, libraries, and APIs available (both free and paid) that can convert HTML to PDF, reducing the need for specialized software for PDF creation.
- Embed Multimedia: HTML supports the embedding of multimedia such as images, videos, and audio. Although not all of these can be directly translated into a PDF, having a source in HTML provides options for creating rich, multimedia-enhanced documents.
- Styling with CSS: Cascading Style Sheets (CSS) provide powerful styling options for HTML content, allowing for branding, theming, and visual consistency. These can then be reflected in the resulting PDF.
- Easy to Learn and Use: Learning the basics of HTML can be done quickly, making it accessible for many users to create content.
In summary, converting PDFs from HTML combines the best of both worlds: the flexibility, accessibility, and interactivity of HTML with the portability and Standardization of PDFs.
HTML to PDF using Python Libraries
There are many libraries available in Python that allow the generation of PDFs from HTML content, some of them are explained below.
When generating HTML to PDF in Python, we need libraries and solutions which does not compromise the formatting of the PDF. With the following Open Source libraries you don’t need to worry about losing formatting because all the below solutions take care of the formatting when generating HTML to PDF using Python.
i. Pyppeteer
Pyppeteer is a Python port of the Node library Puppeteer, which provides a high-level API over the Chrome DevTools Protocol. it’s like you are running a browser in your code that can do similar things that your browser can do. Puppeteer can be used to scrap data from websites, take screenshots for a website, and much more. Let’s see how we can utilize pyppeteer to generate PDFs from HTML.
First, we need to install pyppeteer with the following command:
pip install pyppeteer
Generate PDF from a website URL
import asyncio
from pyppeteer import launch
async def generate_pdf(url, pdf_path):
browser = await launch()
page = await browser.newPage()
await page.goto(url)
await page.pdf({'path': pdf_path, 'format': 'A4'})
await browser.close()
# Run the function
asyncio.get_event_loop().run_until_complete(generate_pdf('https://example.com', 'example.pdf'))
In the above code, if you see the generate_pdf
method, we are doing the following things
Launching a new headless browser instance
Opens a new tab or page in the headless browser and waits for it to be ready.
Navigate to the URL specified in the
url
argument and wait for the page to load.Generates a PDF of the webpage. The PDF is saved at the location specified in
pdf_path
, and the format is set toA4
.Closes the headless browser.
Generate PDF from Custom HTML content
import asyncio
from pyppeteer import launch
async def generate_pdf_from_html(html_content, pdf_path):
browser = await launch()
page = await browser.newPage()
await page.setContent(html_content)
await page.pdf({'path': pdf_path, 'format': 'A4'})
await browser.close()
# HTML content
html_content = '''
<!DOCTYPE html>
<html>
<head>
<title>PDF Example</title>
</head>
<body>
<h1>Hello, world!</h1>
</body>
</html>
'''
# Run the function
asyncio.get_event_loop().run_until_complete(generate_pdf_from_html(html_content, 'from_html.pdf'))
Above is another example using Pyppeteer on how we can use our own custom HTML content to generate PDFs. Let’s see what is happening in the method generate_pdf_from_html
- Launching a new headless browser instance
- Opens a new tab or page in the headless browser and waits for it to be ready.
- Now we are explicitly setting the content of the page to our HTML content
- Generates a PDF of the webpage. The PDF is saved at the location specified in
pdf_path
, and the format is set to ‘A4’. - Closes the headless browser.
ii. xhtml2pdf
xhtml2pdf is another Python library that lets you generate PDFs from HTML content. Let’s see xhtml2pdf in action.
The following command is to install xhtml2pdf:
pip install xhtml2pdf requests
To generate PDF from a website URL
Note that xhtml2pdf does not have an in-built feature to parse the URL, but we can use requests in Python to get the content from a URL.
from xhtml2pdf import pisa
import requests
def convert_url_to_pdf(url, pdf_path):
# Fetch the HTML content from the URL
response = requests.get(url)
if response.status_code != 200:
print(f"Failed to fetch URL: {url}")
return False
html_content = response.text
# Generate PDF
with open(pdf_path, "wb") as pdf_file:
pisa_status = pisa.CreatePDF(html_content, dest=pdf_file)
return not pisa_status.err
# URL to fetch
url_to_fetch = "https://google.com"
# PDF path to save
pdf_path = "google.pdf"
# Generate PDF
if convert_url_to_pdf(url_to_fetch, pdf_path):
print(f"PDF generated and saved at {pdf_path}")
else:
print("PDF generation failed")
In the above code, we are doing the following things in our method convert_url_to_pdf
First, we are using
requests
to get the webpage content from the URL.Once we get the content, we select the text part from the response using
response.text
Now the generating PDF part comes, we are using
pisa.CreatePDF
and pass our HTML content and PDF file name for the output.
Generating PDF from custom HTML content
from xhtml2pdf import pisa
def convert_html_to_pdf(html_string, pdf_path):
with open(pdf_path, "wb") as pdf_file:
pisa_status = pisa.CreatePDF(html_string, dest=pdf_file)
return not pisa_status.err
# HTML content
html_content = '''
<!DOCTYPE html>
<html>
<head>
<title>PDF Example</title>
</head>
<body>
<h1>Hello, world!</h1>
</body>
</html>
'''
# Generate PDF
pdf_path = "example.pdf"
if convert_html_to_pdf(html_content, pdf_path):
print(f"PDF generated and saved at {pdf_path}")
else:
print("PDF generation failed")
Generating PDF from custom HTML content is also similar to what we have done for the URL part, the only change here is, that we are passing the actual HTML content to our generating method. Now it will use our custom HTML content and generate PDF from it.
iii. python-pdfkit
python-pdfkit is a python wrapper for wkhtmltopdf utility to convert HTML to PDF using Webkit.
First, we need to install python-pdfkit with pip:
pip install pdfkit
To generate PDF from website URL
import pdfkit
def convert_url_to_pdf(url, pdf_path):
try:
pdfkit.from_url(url, pdf_path)
print(f"PDF generated and saved at {pdf_path}")
except Exception as e:
print(f"PDF generation failed: {e}")
# URL to fetch
url_to_fetch = 'https://example.com'
# PDF path to save
pdf_path = 'example_from_url.pdf'
# Generate PDF
convert_url_to_pdf(url_to_fetch, pdf_path)
pdfkit supports generating PDFs from website URLs out of the box just like Pyppeteer.
In the above code, as you can see, pdfkit is generating pdf just from one line code. pdfkit.from_url
is all you need to generate a PDF.
Generating PDF from custom HTML content
import pdfkit
def convert_html_to_pdf(html_content, pdf_path):
try:
pdfkit.from_string(html_content, pdf_path)
print(f"PDF generated and saved at {pdf_path}")
except Exception as e:
print(f"PDF generation failed: {e}")
# HTML content
html_content = '''
<!DOCTYPE html>
<html>
<head>
<title>PDF Example</title>
</head>
<body>
<h1>Hello, world!</h1>
</body>
</html>
'''
# PDF path to save
pdf_path = 'example_from_html.pdf'
# Generate PDF
convert_html_to_pdf(html_content, pdf_path)
For generating PDF from custom HTML content, we only need to use pdfkit.from_string and provide HTML content and a pdf file path.
We find out more about python-pdfkit, we have a detailed article in the following link:
iv. Playwright
Playwright is a modern and lightweight library for using a headless browser in your application. Playwright is primarily used in automation testing with its powerful offering of integrations with modern browsers. Currently, Playwright supports Firefox, Chromium, Edge & Safari. Playwright is Cross-platform, cross-browser, and cross-language.
In this article, we will look into how we can convert HTML to PDF in Python using Playwright without losing formatting and quality.
the first step is to install the Playwright library
pip install playwright
playwright install
playwright install command will make sure to install a headless browser in your system which will be used to convert HTML to PDF in Python.
Generate PDF from website URL
import asyncio
from playwright.async_api import async_playwright
async def url_to_pdf(url, output_path):
async with async_playwright() as p:
browser = await p.chromium.launch()
page = await browser.new_page()
await page.goto(url)
await page.pdf(path=output_path)
await browser.close()
# Example usage
url = 'https://google.com'
output_path = 'html-to-pdf-output.pdf'
asyncio.run(url_to_pdf(url, output_path))
In the above code, we have url_to_pdf method which takes the URL of the website and output path as input parameters. It creates an async function to run a headless browser.
we use chromium.launch() to launch a new instance of the browser and create a new page with browser.new_page()
once we have a new page ready, we then load our URL which will be the source for HTML to PDF in this case. Once everything is done, we call browser.close() to close the browser instance.
Generate PDF from custom HTML content
We can also use Playwright to generate custom HTML to PDF along with directly loading HTML content from the website URL. To generate HTML to PDF from your custom HTML, we will do the following
import asyncio
from playwright.async_api import async_playwright
async def html_to_pdf(html_content, output_path):
async with async_playwright() as p:
browser = await p.chromium.launch()
page = await browser.new_page()
await page.set_content(html_content)
await page.pdf(path=output_path)
await browser.close()
html_content = '''
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Sample HTML</title>
</head>
<body>
<h1>Hello, World!</h1>
<p>This is a sample HTML content to be converted to PDF.</p>
</body>
</html>
'''
output_path = 'custom-html-to-pdf-output.pdf'
asyncio.run(html_to_pdf(html_content, output_path))
In the above code snippet, we are passing our custom HTML content to html_to_pdf method. We use Playwright to load that custom HTML in a new tab of the headless browser. and then generate PDFs using our custom HTML.
v. WeasyPrint
WeasyPrint is a visual rendering engine that converts HTML and CSS into PDFs, focusing on adhering to web standards for printing. It is freely available under a BSD license.
Unlike some other libraries which rely on browsers such as Chrome or Firefox, it is built on several libraries but does not use a full rendering engine.
In this section, we will look into how we can use WeasyPrint to convert HTML to PDF using Python.
Install WeasyPrint
pip install WeasyPrint
Generate PDF from website URL
To generate a PDF from the website URL, we will use .HTML() method given by the WeasyPrint library.
from weasyprint import HTML
def url_to_pdf(url, output_path):
HTML(url).write_pdf(output_path)
# Example usage
url = 'https://google.com'
output_path = 'output_url.pdf'
url_to_pdf(url, output_path)
In the above code snippet, we are using the WeasyPrint .write_pdf() method which converts the Website URL to PDF in a simple step.
While using WeasyPrint, we don’t need to worry about setting up any other things and we can just call the method and get PDFs directly from the website URL.
Generate PDF from custom HTML content
from weasyprint import HTML
def html_to_pdf(html_content, output_path):
HTML(string=html_content).write_pdf(output_path)
html_content = '''
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Sample HTML</title>
</head>
<body>
<h1>Hello, World!</h1>
<p>This is a sample HTML content to be converted to PDF.</p>
</body>
</html>
'''
output_path = 'output_html.pdf'
html_to_pdf(html_content, output_path)
Same as generating PDFs from website URL, we can use .HTML() method to load the custom HTML and then use .write_pdf() method to create PDF from our custom HTML.
Comparison of All 5 Popular Libraries
While all of these tools serve the primary purpose of HTML to PDF conversion, each brings its unique features and approaches to the table.
Below is a detailed tabular comparison to help you choose the right tool for your needs.
Feature/Aspect | Pyppeteer | xhtml2pdf | python-pdfkit | Playwright | WeasyPrint |
---|---|---|---|---|---|
Nature | Browser automation library | HTML/CSS to PDF converter | Wrapper around wkhtmltopdf | Lightweight and Modern Browser automation library | Document factory to create PDF documents easily |
Based On | Puppeteer (Chromium headless) | ReportLab & html5lib | wkhtmltopdf | Playwright (supports Firefox, Chromium, Edge & Safari) | Written in Python and based on various libraries and not a full rendering engine |
Dependencies | Requires Chrome/Chromium | Python libraries | Requires wkhtmltopdf | Cross-browser support | Python libraries |
Language | Python | Python | Python | Python | Python |
Javascript Support | Yes (full browser environment) | No | Yes (limited, via wkhtmltopdf) | Yes (full browser environment) | No |
CSS Support | Full (as in Chrome) | Limited | Good (via wkhtmltopdf) | Full (as in browser) | Yes |
Performance | May be slower (full browser) | Moderate | Fast (native conversion) | Fast | Fast |
Ease of Setup | Moderate (need Chromium) | Easy (pure Python) | Moderate (requires wkhtmltopdf) | Easy to setup | Easy to Setup |
API Flexibility | High (full browser automation) | Moderate (focused on PDF) | Moderate (wrapper around tool) | One API for cross-browser support | Simple Python API |
Usage | Good for complex web content, SPA, dynamic JS content | Good for simpler HTML/CSS docs | Common for various HTML to PDF tasks | Good for complex automation testing with support of all modern browser | Good for creating PDF documents with rich styling |
In conclusion, the choice between Pyppeteer, Playwright, xhtml2pdf, WeasyPrint, and python-pdfkit depends on the specific requirements of your project.
While Pyppeteer excels at handling dynamic content due to its full browser automation capabilities, xhtml2pdf offers a straightforward, Python-centric solution for basic conversions.
Playwright is a serious alternative to Pyppeteer as Playwright solves the problems that developers face while using Pupeteer. With the Playwright you get to experience a lightweight and fast library that has support for all the modern browsers.
WeasyPrint is a developer-friendly Python library that does not let you sweat over configurations and setup. One thing to take note is that WeasyPrint is based on various libraries and not a full rendering engine like WebKit or Gecko, so it doesn’t support Javasacript.
Python-pdfkit, wrapping around wkhtmltopdf, stands as a versatile middle ground. Developers should weigh the features, setup complexities, and performance of each library against their project’s demands to determine the best fit.
In summary, if you need to render complex HTML, CSS, and JavaScript with full browser compatibility, consider using Pyppeteer, Playwright, or python-pdfkit. For simpler tasks, xhtml2pdf and WeasyPrint are more suitable options.
HTML to PDF using APITemplate.io
Above are some examples of how we can use libraries to convert HTML to PDF and web pages to PDF. but when it comes to generating PDFs using templates or keeping track of generated PDFs, we need to do a lot of extra things to handle all those.
We need to have our own generating pdfs tracker for tracking the files generated. Or if we want to use custom templates such as Invoice generators, we need to create and manage those templates.
APITemplate.io is an API-based platform for PDF generation, perfect for the use cases mentioned. Our PDF generation API uses a Chromium-based rendering engine, which fully supports JavaScript, CSS, and HTML.
Let’s see how we can utilize APITemplate.io to handle generating PDFs
i. Template-based PDF generation
APITemplate.io allows you to manage your templates. Go to Manage Templates from the dashboard
From Manage Template, You can create your own templates. Following is the sample Invoice template. There are lots of templates available that you can choose and customize based on your requirements.
To start using APITemplate.io APIs, You need to get your API Key which you can get from the API Integration
Tab
Now that you have your APITemplate account ready, let’s get to some actions and integrate it with our application. We will be using the template to generate PDFs.
import requests
import json
# Initialize HTTP client
client = requests.Session()
# API URL
url = "https://rest.apitemplate.io/v2/create-pdf?template_id=YOUR_TEMPLATE_ID"
# Payload data
payload = {
"date": "15/05/2022",
"invoice_no": "435568799",
"sender_address1": "3244 Jurong Drive",
"sender_address2": "Falmouth Maine 1703",
"sender_phone": "255-781-6789",
"sender_email": "[email protected]",
"rece_addess1": "2354 Lakeside Drive",
"rece_addess2": "New York 234562",
"rece_phone": "34333-84-223",
"rece_email": "[email protected]",
"items": [
{"item_name": "Oil", "unit": 1, "unit_price": 100, "total": 100},
{"item_name": "Rice", "unit": 2, "unit_price": 200, "total": 400},
{"item_name": "Mangoes", "unit": 3, "unit_price": 300, "total": 900},
{"item_name": "Cloth", "unit": 4, "unit_price": 400, "total": 1600},
{"item_name": "Orange", "unit": 7, "unit_price": 20, "total": 1400},
{"item_name": "Mobiles", "unit": 1, "unit_price": 500, "total": 500},
{"item_name": "Bags", "unit": 9, "unit_price": 60, "total": 5400},
{"item_name": "Shoes", "unit": 2, "unit_price": 30, "total": 60},
],
"total": "total",
"footer_email": "[email protected]",
}
# Serialize payload to JSON
json_payload = json.dumps(payload)
# Set headers
headers = {
"X-API-KEY": "YOUR_API_KEY",
"Content-Type": "application/json",
}
# Make the POST request
response = client.post(url, data=json_payload, headers=headers)
# Read the response
response_string = response.text
# Print the response
print(response_string)
and If we check response_string
we have the following
{
"download_url":"PDF_URL",
"transaction_ref":"8cd2aced-b2a2-40fb-bd45-392c777d6f6",
"status":"success",
"template_id":"YOUR_TEMPLATE_ID"
}
In the above code, it’s very easy to use APITemplate to convert html to pdf because we don’t need to install any other library. Just need to call one simple API and use our data as a request body and that’s it!
You can use the download_url
from the response to download or distribute the generated PDF.
ii. Generate PDF from the website URL
APITemplate also supports generating PDFs from website URLs.
import requests, json
def main():
api_key = "YOUR_API_KEY"
template_id = "YOUR_TEMPLATE_ID"
data = {
"url": "https://en.wikipedia.org/wiki/Sceloporus_malachiticus",
"settings": {
"paper_size": "A4",
"orientation": "1",
"header_font_size": "9px",
"margin_top": "40",
"margin_right": "10",
"margin_bottom": "40",
"margin_left": "10",
"print_background": "1",
"displayHeaderFooter": true,
"custom_header": "<style>#header, #footer { padding: 0 !important; }</style>\n<table style=\"width: 100%; padding: 0px 5px;margin: 0px!important;font-size: 15px\">\n <tr>\n <td style=\"text-align:left; width:30%!important;\"><span class=\"date\"></span></td>\n <td style=\"text-align:center; width:30%!important;\"><span class=\"pageNumber\"></span></td>\n <td style=\"text-align:right; width:30%!important;\"><span class=\"totalPages\"></span></td>\n </tr>\n</table>",
"custom_footer": "<style>#header, #footer { padding: 0 !important; }</style>\n<table style=\"width: 100%; padding: 0px 5px;margin: 0px!important;font-size: 15px\">\n <tr>\n <td style=\"text-align:left; width:30%!important;\"><span class=\"date\"></span></td>\n <td style=\"text-align:center; width:30%!important;\"><span class=\"pageNumber\"></span></td>\n <td style=\"text-align:right; width:30%!important;\"><span class=\"totalPages\"></span></td>\n </tr>\n</table>"
}
}
response = requests.post(
F"https://rest.apitemplate.io/v2/create-pdf-from-url",
headers = {"X-API-KEY": F"{api_key}"},
json= data
)
if __name__ == "__main__":
main()
In the above code, we can provide the URL in the request body along with the settings for the PDF. APITemplate will use this request body to generate a PDF and will return a download URL for your PDF.
iii. Generate PDF from custom HTML content
If you want to generate PDFs using your own custom HTML content, APITemplate supports that as well.
import requests, json
def main():
api_key = "YOUR_API_KEY"
template_id = "YOUR_TEMPLATE_ID"
data = {
"body": "<h1> hello world {{name}} </h1>",
"css": "<style>.bg{backgound: red};</style>",
"data": {
"name": "This is a title"
},
"settings": {
"paper_size": "A4",
"orientation": "1",
"header_font_size": "9px",
"margin_top": "40",
"margin_right": "10",
"margin_bottom": "40",
"margin_left": "10",
"print_background": "1",
"displayHeaderFooter": true,
"custom_header": "<style>#header, #footer { padding: 0 !important; }</style>\n<table style=\"width: 100%; padding: 0px 5px;margin: 0px!important;font-size: 15px\">\n <tr>\n <td style=\"text-align:left; width:30%!important;\"><span class=\"date\"></span></td>\n <td style=\"text-align:center; width:30%!important;\"><span class=\"pageNumber\"></span></td>\n <td style=\"text-align:right; width:30%!important;\"><span class=\"totalPages\"></span></td>\n </tr>\n</table>",
"custom_footer": "<style>#header, #footer { padding: 0 !important; }</style>\n<table style=\"width: 100%; padding: 0px 5px;margin: 0px!important;font-size: 15px\">\n <tr>\n <td style=\"text-align:left; width:30%!important;\"><span class=\"date\"></span></td>\n <td style=\"text-align:center; width:30%!important;\"><span class=\"pageNumber\"></span></td>\n <td style=\"text-align:right; width:30%!important;\"><span class=\"totalPages\"></span></td>\n </tr>\n</table>"
}
}
response = requests.post(
F"https://rest.apitemplate.io/v2/create-pdf-from-html",
headers = {"X-API-KEY": F"{api_key}"},
json= data
)
if __name__ == "__main__":
main()
Similar to generating PDF from a website URL, the above API request takes body and CSS as part of the payload and it is to generate PDF.
Conclusion
PDF generation is now part of every business application and it should not make developers sweat.
We have seen how we can use third-party libraries to generate PDFs if our use case is simple. But also if we have complex use cases such as maintaining templates, then APITemplate.io provides the solution just for that using simple API calls.
Sign up for a free account with us now and start automating your PDF generation.
Libraries: