1. Introduction
Python has become essential for everyday developer tasks. Even if Python is not your main language, you will likely need to know how to code with it.
Python is used for automation, testing, web development, and data analysis. HTML, on the other hand, is the primary markup language of web development and web-based applications.
One of Python's superpowers is that it can read data in almost any format and convert it to another. PDF is a portable format for viewing data consistently across devices, platforms, and operating systems.
In this article, we will talk about how to generate PDFs using Python, and we will introduce multiple libraries (FPDF, ReportLab, Pyppeteer, Playwright, Python-Wkhtmltopdf, PDFKit, and XHTML2PDF) and the differences between them.
Note: If you’re looking for a way to generate PDF documents from HTML, please visit our other blog post for a comprehensive guide: Convert HTML to PDF using Python with 5 Popular Libraries
2. Seven Popular Libraries for PDF Generation in Python
There are many Python libraries for working with PDFs. We will introduce some of the popular ones that make it easy to generate PDFs, including from HTML files.
i. FPDF
FPDF (Free-PDF) is a Python library, ported from PHP, for generating PDFs. It provides various ways to build a PDF, such as generating one from a text file or writing your own data into the document.
While FPDF supports HTML, it only understands basic tags and does not understand CSS. To render HTML at all, you need to mix HTMLMixin into your FPDF class, which adds the write_html method used below.
Installation and code sample
You can install FPDF with pip using the following command.
pip install fpdf==1.7.2
FPDF supports:
- Page formatting
- Images, links, colours
- Automatic line and page breaks
A code example:
from fpdf import FPDF, HTMLMixin

# creating a class inherited from both FPDF and HTMLMixin
class MyFPDF(FPDF, HTMLMixin):
    pass

# instantiating the class
pdf = MyFPDF()
# adding a page
pdf.add_page()
# opening the HTML file and reading its contents as a string
with open("file.html", "r") as file:
    data = file.read()
# HTMLMixin write_html method
pdf.write_html(data)
# saving the file as a PDF
pdf.output('Python_fpdf.pdf', 'F')
The previous example takes a file named file.html and converts it into a PDF file named Python_fpdf.pdf with the help of the HTMLMixin class.
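The text-file use case mentioned earlier can be sketched in a similar way. This is a minimal sketch, assuming the fpdf 1.7.2 package installed above and its built-in Arial font, which only covers Latin-1 characters. The helper names to_latin1 and text_file_to_pdf are hypothetical and not part of FPDF.

```python
def to_latin1(text):
    """Replace characters the built-in FPDF fonts cannot encode with '?'."""
    return text.encode("latin-1", "replace").decode("latin-1")


def text_file_to_pdf(input_path, output_path):
    """Write each line of a UTF-8 text file into a PDF (hypothetical helper)."""
    # Imported lazily so to_latin1 can be used without fpdf installed.
    from fpdf import FPDF

    pdf = FPDF()
    pdf.add_page()
    pdf.set_font("Arial", size=12)
    with open(input_path, encoding="utf-8") as source:
        for line in source:
            # multi_cell wraps long lines; a width of 0 extends to the right margin.
            # Blank lines are written as a single space to keep the line spacing.
            pdf.multi_cell(0, 8, to_latin1(line.rstrip("\n")) or " ")
    pdf.output(output_path, "F")
```

Calling text_file_to_pdf("notes.txt", "notes.pdf") would then produce one wrapped paragraph per input line, assuming notes.txt exists. If you need full Unicode output, a TrueType font would have to be registered instead of relying on the replacement step.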
You can find more about FPDF here
ii. ReportLab
ReportLab is a Python library that helps you create PDFs. It comes in an open-source version and a commercial version; the difference is that the commercial version supports Report Markup Language (RML). Both provide the following features:
- Supports dynamic web PDF generation
- Supports converting XML into PDF
- Supports vector graphics and inclusion of other PDF files
- Supports the creation of time charts and tables
Installation and code sample
You can install it using the following command:
pip install reportlab
ReportLab is a comprehensive tool that gives you fine-grained control over the format and style of your PDF. The simplest example looks like the following:
from reportlab.pdfgen import canvas
c = canvas.Canvas("reportlab_pdf.pdf")
c.drawString(100,100,"Hello World")
c.showPage()
c.save()
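The drawString(100, 100, ...) call above uses ReportLab's coordinate system, which is measured in points (1/72 inch) from the bottom-left corner of the page. As a rough sketch of how several lines could be placed from the top of an A4 page, the hypothetical helpers layout_lines and write_lines below convert millimetres to points and step downwards line by line. The margins and line spacing are assumptions chosen for illustration.

```python
MM_TO_PT = 72 / 25.4            # 1 inch = 72 points = 25.4 mm
A4_HEIGHT_PT = 297 * MM_TO_PT   # A4 is 210 mm x 297 mm


def layout_lines(lines, left_mm=20, top_mm=20, leading_pt=14,
                 page_height_pt=A4_HEIGHT_PT):
    """Return (x, y, text) tuples in points, measuring down from the top edge."""
    x = left_mm * MM_TO_PT
    first_y = page_height_pt - top_mm * MM_TO_PT
    return [(x, first_y - i * leading_pt, text) for i, text in enumerate(lines)]


def write_lines(path, lines):
    """Draw the lines onto a single A4 page (hypothetical helper)."""
    # Imported lazily so layout_lines can be used without reportlab installed.
    from reportlab.lib.pagesizes import A4
    from reportlab.pdfgen import canvas

    c = canvas.Canvas(path, pagesize=A4)
    for x, y, text in layout_lines(lines, page_height_pt=A4[1]):
        c.drawString(x, y, text)
    c.showPage()
    c.save()
```

Calling write_lines("reportlab_lines.pdf", ["First line", "Second line"]) would place the first line 20 mm below the top edge. This sketch does not handle page overflow or line wrapping.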
You can find more info about ReportLab here
iii. Pyppeteer
We talked about Puppeteer before, in our Generate a PDF with JavaScript article, and how it is a tool for automating the browser.
Pyppeteer is an unofficial Python port of Puppeteer, the automation library for the Chrome and Chromium browsers.
Main differences between Puppeteer and Pyppeteer
- Pyppeteer accepts both the dictionary input parameters and keyword arguments
- Python does not allow $ in method names, so Pyppeteer uses names such as Page.querySelector() instead
- Page.evaluate() and Page.querySelectorEval() may fail and require you to add the force_expr=True option to force input strings to be treated as expressions
Installation and code sample
Install it using the following command:
pip install pyppeteer
A code example:
import asyncio
from pyppeteer import launch

# defining an async function
async def main():
    # launching a browser session
    browser = await launch()
    # opening a new page
    page = await browser.newPage()
    # go to a specific address or file (placeholder: use the absolute path of your own HTML file)
    await page.goto('file:///path_to_html_file.html')
    # create a screenshot of the page
    await page.screenshot({'path': 'sample.png'})
    # render the page itself as a PDF
    await page.pdf({'path': 'pyppeteer_pdf.pdf'})
    # close the browser
    await browser.close()

# invocation of the async main function
asyncio.get_event_loop().run_until_complete(main())
You can read more about Pyppeteer here
iv. Playwright
When it comes to generating PDFs in Python, one powerful and flexible tool is Playwright. Originally designed for browser automation, Playwright can be used to create high-quality PDFs from web pages.
In this section, we’ll explore how you can leverage Playwright to generate PDFs, step-by-step.
What is Playwright?
Playwright is an open-source browser automation library developed by Microsoft. It enables developers to interact with browsers (like Chromium, Firefox, and WebKit) programmatically, making it a versatile tool for web scraping, UI testing, and, of course, PDF generation.
Why Use Playwright for PDF Generation?
Here are a few reasons why Playwright is a great option for generating PDFs:
- Cross-browser support: Playwright automates multiple browsers (Chromium, Firefox, and WebKit). Note that PDF generation through page.pdf() is currently only supported in headless Chromium.
- Headless execution: Generate PDFs in headless mode without launching a visible browser window.
- Precise page control: You can control the page size, orientation, margins, and more.
- Easy to use: Playwright’s simple API allows for straightforward PDF generation.
Prerequisites
Before you get started, make sure you have the following installed:
1. Python (at least version 3.7)
2. Playwright – Install it by running:
pip install playwright
playwright install
These commands will install Playwright and download the necessary browser binaries.
Step-by-Step Guide to Generate a PDF with Playwright
1. Install Playwright
If you have not done so already, install Playwright and its browser binaries with the two commands shown in the Prerequisites section above.
2. Write the PDF Generation Script
Here’s an example script to generate a PDF using Playwright:
import asyncio
from playwright.async_api import async_playwright

async def generate_pdf(url: str, output_file: str):
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()
        await page.goto(url)
        await page.pdf(
            path=output_file,
            format="A4",
            print_background=True,  # Ensures background colors and images are included
            margin={"top": "20px", "right": "20px", "bottom": "20px", "left": "20px"}
        )
        await browser.close()

# Usage Example
url = "https://example.com"
output_file = "output.pdf"
asyncio.run(generate_pdf(url, output_file))
Explanation of Key Parts:
- launch(): Launches a new browser instance.
- new_page(): Creates a new tab (or page) in the browser.
- goto(url): Navigates to the given URL.
- pdf(): Generates a PDF file with options like page format, margins, and whether to print the background.
- browser.close(): Closes the browser once the PDF is created.
3. Run the Script
Run the script by executing:
python script_name.py
The PDF will be saved as output.pdf in the directory where the script is executed.
Customizing the PDF
You can customize the PDF generation process with additional options:
- Page format: Change the format using format="A4" or other standard formats like "Letter" or "Legal".
- Orientation: Switch between portrait and landscape by setting landscape=True, or by controlling the width and height.
- Margins: Add margins using the margin option as seen in the example.
- Header and Footer: Add headers and footers to each page.
Example:
await page.pdf(
    path=output_file,
    format="A4",
    display_header_footer=True,
    header_template='<span class="title">Header Content</span>',
    footer_template='<span class="pageNumber"></span>/<span class="totalPages"></span>'
)
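The customization options above can be collected in one place and reused. The following is a minimal sketch, not a definitive implementation: pdf_options and html_string_to_pdf are hypothetical helper names, and the sketch assumes you want to render an HTML string with page.set_content() instead of navigating to a URL.

```python
def pdf_options(path, landscape=False, margin="20px", page_format="A4"):
    """Build keyword arguments for Playwright's page.pdf() (hypothetical helper)."""
    return {
        "path": path,
        "format": page_format,
        "landscape": landscape,
        "print_background": True,
        # Use the same margin on all four sides.
        "margin": {side: margin for side in ("top", "right", "bottom", "left")},
    }


async def html_string_to_pdf(html, path, **kwargs):
    """Render an HTML string to a PDF file with headless Chromium."""
    # Imported lazily so pdf_options can be used without Playwright installed.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()
        await page.set_content(html)
        await page.pdf(**pdf_options(path, **kwargs))
        await browser.close()
```

A call such as asyncio.run(html_string_to_pdf("<h1>Hello</h1>", "hello.pdf", landscape=True)) would then produce a landscape A4 page, after importing asyncio. Keeping the options in a plain dictionary makes them easy to inspect or log before the browser is launched.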
Playwright is a robust and feature-rich option for generating PDFs in Python. It provides cross-browser support, precise page control, and easy customization.
By following the steps outlined above, you can create high-quality PDFs from any webpage. Whether you need to convert invoices, reports, or web pages into PDF format, Playwright can get the job done efficiently.
v. Python-Wkhtmltopdf
wkhtmltopdf is a widely used command-line tool for generating PDFs from HTML files and URLs. Python-Wkhtmltopdf is a wrapper that lets you call this command-line tool from Python.
Installation and code sample
You can install it using the following command:
pip install py3-wkhtmltopdf==0.4.1
The usage is simple: you need to import the library and provide wkhtmltopdf API with the URL and the path for the output file.
from wkhtmltopdf import wkhtmltopdf
wkhtmltopdf(url='apitemplate.io', output_file='wkhtmltopdf.pdf')
You can find more information here
vi. PDFKit
PDFKit is a wrapper for wkhtmltopdf that makes it very easy to generate PDFs from various sources such as files, strings, and URLs.
Installation and code sample
You can install it using the following command:
pip install pdfkit
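Pdfkit supports features like:
- Converting URLs, local HTML files, and HTML strings through from_url, from_file, and from_string
- Passing wkhtmltopdf command-line options (page size, margins, encoding, and so on) as a Python dictionary
- Applying an external stylesheet through the css argument when converting a file or string
- Pointing to a specific wkhtmltopdf binary through pdfkit.configuration()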
An example of the generation of a PDF from a file is:
#importing pdfkit
import pdfkit
# calling the from file method to convert file to pdf
pdfkit.from_file('file.html', 'file.pdf')
It also supports generating PDFs from links by calling the from_url method.
pdfkit.from_url('https://apitemplate.io/', 'python.pdf')
You can also specify page and encoding settings like the following:
options = {
    'page-size': 'A4',
    'margin-top': '0.75in',
    'margin-right': '0.75in',
    'margin-bottom': '0.75in',
    'margin-left': '0.75in',
    'encoding': "UTF-8",
    'custom-header': [
        ('Accept-Encoding', 'gzip')
    ],
    'cookie': [
        ('cookie-empty-value', '""'),
        ('cookie-name1', 'cookie-value1'),
        ('cookie-name2', 'cookie-value2'),
    ],
    'no-outline': None
}
pdfkit.from_file('file.html', 'file.pdf', options=options)
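Because pdfkit can also convert strings, you can build the HTML in Python and pass it straight to from_string. The sketch below assumes a simple two-column table as the content. The helper names render_table_html and table_to_pdf are hypothetical, and the wkhtmltopdf binary is assumed to be installed and on your PATH.

```python
from html import escape


def render_table_html(title, rows):
    """Build a small HTML document from (label, value) pairs, escaping the data."""
    body = "".join(
        f"<tr><td>{escape(str(label))}</td><td>{escape(str(value))}</td></tr>"
        for label, value in rows
    )
    return (
        "<html><head><meta charset='utf-8'></head><body>"
        f"<h1>{escape(title)}</h1><table>{body}</table></body></html>"
    )


def table_to_pdf(title, rows, output_path):
    """Convert the generated HTML string to a PDF (hypothetical helper)."""
    # Imported lazily so render_table_html works without pdfkit installed.
    import pdfkit

    options = {"page-size": "A4", "encoding": "UTF-8"}
    pdfkit.from_string(render_table_html(title, rows), output_path, options=options)
```

For example, table_to_pdf("Invoice", [("Item", "Widget"), ("Total", "$10")], "invoice.pdf") would write a one-page invoice. Escaping the values first keeps characters such as < and & in your data from being interpreted as markup.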
You can learn more about pdfkit from here
vii. XHTML2PDF
XHTML2PDF, previously known as Pisa, is a powerful and user-friendly library that allows you to convert HTML and CSS into PDF documents. Unlike some other libraries, XHTML2PDF supports more advanced CSS styles, including page breaks, footers, headers, and more. It’s a great choice if you want to convert web-style content into a polished, printable PDF.
Why Use XHTML2PDF?
- HTML/CSS Support: Convert your existing HTML/CSS files directly to PDF.
- Advanced Styling: Full support for headers, footers, page breaks, and more.
- Ease of Use: Easy to install and quick to implement.
Installation
To get started with XHTML2PDF, you need to install the library. You can do so using pip:
pip install xhtml2pdf
This command installs the necessary files and dependencies.
Example 1: Convert an HTML String to PDF
One of the simplest use cases for XHTML2PDF is to convert a raw HTML string into a PDF. Here’s how you can do it in Python:
from xhtml2pdf import pisa

def convert_html_to_pdf(html_string, output_path):
    """ Convert a given HTML string to a PDF file. """
    with open(output_path, 'wb') as pdf_file:
        pisa_status = pisa.CreatePDF(html_string, dest=pdf_file)
    if pisa_status.err:
        print("An error occurred while creating the PDF.")
    else:
        print(f"PDF successfully created at {output_path}")

# Example HTML content
html_content = """
<html>
<head>
<style>
h1 { color: blue; }
p { font-size: 16px; }
</style>
</head>
<body>
<h1>Welcome to XHTML2PDF</h1>
<p>This is a simple paragraph demonstrating PDF creation using XHTML2PDF.</p>
</body>
</html>
"""

# Convert the HTML string to PDF
convert_html_to_pdf(html_content, 'output.pdf')
Explanation
- HTML String: The HTML content includes simple tags like <h1>, <p>, and a CSS style block.
- PDF Generation: The pisa.CreatePDF() method takes the HTML string and converts it to a PDF saved at output.pdf.
Run the script, and you’ll find output.pdf in your working directory.
Example 2: Convert an HTML File to PDF
Another common scenario is to convert an existing HTML file to PDF. Here’s how to achieve this:
from xhtml2pdf import pisa

def convert_html_file_to_pdf(input_path, output_path):
    """ Convert an existing HTML file to a PDF file. """
    with open(input_path, 'r', encoding='utf-8') as html_file:
        html_content = html_file.read()
    with open(output_path, 'wb') as pdf_file:
        pisa_status = pisa.CreatePDF(html_content, dest=pdf_file)
    if pisa_status.err:
        print("An error occurred while creating the PDF.")
    else:
        print(f"PDF successfully created at {output_path}")

# Path to the HTML file
html_file_path = 'sample.html'
output_pdf_path = 'output.pdf'

# Convert the HTML file to PDF
convert_html_file_to_pdf(html_file_path, output_pdf_path)
Explanation
- Read HTML File: The script reads an external sample.html file and loads its content as a string.
- PDF Generation: The pisa.CreatePDF() method converts the HTML content into a PDF and saves it as output.pdf.
This approach is especially useful if you’re working with existing HTML templates or web pages saved as HTML files.
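The page breaks and footers mentioned at the start of this section can be sketched as follows. This is a rough example under stated assumptions: it relies on xhtml2pdf's @frame rules inside @page and on its <pdf:nextpage /> and <pdf:pagenumber> tags, the frame coordinates are illustrative values for an A4 page, and build_paged_html and sections_to_pdf are hypothetical helper names.

```python
from html import escape

# Page layout: one frame for the flowing content, one static frame for the footer.
PAGE_CSS = """
@page {
    size: a4 portrait;
    @frame content_frame {
        left: 50pt; width: 512pt; top: 50pt; height: 692pt;
    }
    @frame footer_frame {
        -pdf-frame-content: footer_content;
        left: 50pt; width: 512pt; top: 772pt; height: 20pt;
    }
}
"""


def build_paged_html(sections):
    """Join (heading, text) sections, starting each one on a new page."""
    pages = [f"<h1>{escape(h)}</h1><p>{escape(t)}</p>" for h, t in sections]
    body = "<pdf:nextpage />".join(pages)
    return (
        f"<html><head><style>{PAGE_CSS}</style></head><body>"
        '<div id="footer_content">Page <pdf:pagenumber></div>'
        f"{body}</body></html>"
    )


def sections_to_pdf(sections, output_path):
    """Render the sections to a PDF and return True on success."""
    # Imported lazily so build_paged_html works without xhtml2pdf installed.
    from xhtml2pdf import pisa

    with open(output_path, "wb") as pdf_file:
        status = pisa.CreatePDF(build_paged_html(sections), dest=pdf_file)
    return not status.err
```

Calling sections_to_pdf([("Intro", "First page"), ("Details", "Second page")], "report.pdf") should give a two-page document with a page number in each footer. You may need to adjust the frame sizes for your own margins.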
3. Comparison of the Libraries
So we have a lot of options to choose from. The only remaining question is which one suits you best. I would say it depends on your application and what you actually need to do.
Here’s a detailed comparison table for these libraries:
Feature | ReportLab | Pyppeteer | Playwright | PDFKit | FPDF | python-wkhtmltopdf | XHTML2PDF |
---|---|---|---|---|---|---|---|
Primary Use Case | PDF generation with strong support for complex layouts and graphics | Web scraping and browser automation, can render pages as PDFs or screenshots | Web scraping, UI automation, and PDF generation from web pages | PDF generation from HTML using wkhtmltopdf as a backend | PDF generation focusing on ease of use without external dependencies | PDF generation from HTML, leveraging wkhtmltopdf capabilities | HTML/CSS to PDF with support for page breaks, headers, and footers |
Capabilities | High-quality PDFs, charts, graphics | Headless Chrome/Chromium browser automation | Cross-browser automation, full page control, precise PDF generation | Converts HTML to PDF, simple API | Simple PDF generation, customizable | Converts HTML to PDF with precise rendering of web pages | Full HTML/CSS support, including advanced layout features |
Syntax Ease | Moderate to complex, flexible API | Complex, requires understanding of async programming | Moderate, requires knowledge of async programming | Simple, minimal coding required | Very simple, easy to learn | Simple, acts as a wrapper around wkhtmltopdf | Simple, easy to learn and use |
Supported Platforms | Cross-platform (Windows, macOS, Linux) | Cross-platform (Windows, macOS, Linux) | Cross-platform (Windows, macOS, Linux) | Cross-platform (needs wkhtmltopdf installed) | Cross-platform (Windows, macOS, Linux) | Cross-platform (requires wkhtmltopdf) | Cross-platform (Windows, macOS, Linux) |
Installation Complexity | Requires Python and library installation | Requires Python and a Chromium build (downloaded automatically on first run) | Requires Python, playwright installation, and browser binaries | Requires wkhtmltopdf to be installed separately | Only needs Python and FPDF | Requires both Python and wkhtmltopdf installations | Requires Python and pip install xhtml2pdf |
Unique Features | Powerful layout engine, extensive documentation | Automates web interactions, generates PDFs/screenshots | Full page control, multi-browser support, precise rendering | Simple API, relies on robust wkhtmltopdf | Pure Python with no dependencies, supports plugins | Uses web rendering engines for accurate PDF creation | Support for headers, footers, page breaks, and CSS styling |
For example, you may want to build a PDF from scratch, simply convert HTML into a PDF, or fill in a particular template and convert it into a specific format.
If you want to convert HTML into PDF, I believe PDFKit, FPDF, and Wkhtmltopdf are the best options you have, with PDFKit being the most popular of them. On the other hand, if you want to render a page in a browser or draw a PDF programmatically, your options are Pyppeteer and ReportLab respectively.
We’ve put together an article that explains how to convert HTML to PDF using Python, which includes a section on using APITemplate.io’s REST API for the HTML to PDF conversion.
ReportLab's advantage is that it supports a wide variety of graphs, like line plots and bar charts, and can embed images. On the other hand, it doesn’t provide a method for creating footers and footnotes and can embed only JPEG images by default, although with the right Python extension you can extend this to 30 more formats. ReportLab is also more comprehensive, which makes it more difficult for beginners.
Pyppeteer, by contrast, provides better rendering and is easier if you are familiar with its JavaScript version, but it only supports specific browsers, namely Chrome and Chromium, which must be available on your machine for the tool to work.
Similar to Pyppeteer, Playwright uses a browser to render the PDF document. It is a robust and feature-rich option for generating PDFs in Python. It provides cross-browser automation, precise page control, and easy customization, with PDF output itself produced by Chromium.
Each library serves somewhat different needs, so the best choice depends on the specific requirements of your project.
4. Conclusion
This article talked about seven of the most popular Python libraries for generating PDFs.
We had a brief introduction to tools and libraries like FPDF, wkhtmltopdf, Pyppeteer, ReportLab, Playwright, XHTML2PDF, and PDFKit. We also compared them on properties like primary use case, capabilities, ease of syntax, supported platforms, installation complexity, and unique features.
Finally, if you want a single tool that covers these use cases, APITemplate.io offers PDF creation, HTML to PDF conversion, and compatibility with no-code/low-code tools, making it easy for businesses to generate PDF documents.
Sign up for a free account today and start automating your PDF generation process.