How to Generate PDFs from HTML with Python-PDFKit (HTML string, HTML File and URL)

Introduction

PDFs are one of the most commonly used formats, supported by all operating systems, and can be opened and viewed on any device without any compatibility issues.

HTML on the other hand, is the backbone for web content, it enables the creation and accessibility of interactive content across different platforms.

Generating PDFs from HTML can be useful for dynamic content like reports and invoices. Together you can benefit from the flexibility of HTML and reliability and security of the read-only format of the PDF.

In this article, we will introduce another tool that can be used to automate PDF generation from HTML in Python. PDFKit is another Python library that can be used to convert HTML pages into PDFs.

Python-PDFKit wraps the wkhtmltopdf command line tool. PDFKit is easy to use and can handle complex PDF generation tasks like tables and multi-page documents.

What’s Python-PDFKit

Python-PDFKit is a Python wrapper around the popular wkhtmltopdf command-line tool, which is used to convert HTML content into PDF files. It allows developers to generate PDFs from HTML content easily and programmatically, making it ideal for generating documents like invoices, reports, or printable pages for web applications.

Here’s a more detailed overview:

Key Features

  • HTML to PDF Conversion: PDFKit takes HTML content (which can be a URL, a local HTML file, or raw HTML string) and converts it to a PDF document.
  • wkhtmltopdf Backend: It relies on the wkhtmltopdf tool, which uses WebKit rendering engine to accurately render HTML content and convert it to PDF. This makes the output visually close to what you’d see in a web browser.
  • Customization Options: PDFKit allows for several customization options, such as:
    • Page size, margins, and orientation.
    • JavaScript execution delay (useful for rendering pages with dynamic content).
    • Adding headers and footers.
    • Customizing the output (e.g., page numbers, watermarks).

Installation and Setup

i. Installation of PDFKit

Installing PDFKit is simple as it’s a Python library. First, we need to create our Python virtual environment, the process is simple, you can find out more about it using this link.

You can install pdfkit using pip:

pip install pdfkit

ii. Installation of wkhtmltopdf

The second step is to install the dependencies, as PDFKit relies on wkhtmltopdf to perform HTML conversion to PD, We need to ensure that wkhtmltopdf is installed on your operating system.

On Mac, you can use the following command with Homebrew:

brew install --cask wkhtmltopdf

On Linux, use the following command:

sudo apt-get install wkhtmltopdf

On Windows, you can download the latest installer from here and set the path for your installation in the Windows PATH environment variable.

Now run the following command to check for the correct installation:

wkhtmltopdf --version

This should print the version of wkhtmltopdf installed.

Generate PDFs with PDFKit from String

Generating PDF from HTML is a simple task when dealing with PDFKit, Here is a basic example of using PDFKit to create a PDF from a string of HTML.

import pdfkit

# Convert HTML string to PDF using PDFKit
html_string = """
<html>
<head><h1>Hello, World!</h1></head>
<body>This PDF is generated using PDFKit</body>
</html>
"""
pdfkit.from_string(html_string, 'test.pdf')

This will generate a PDF file named “test.pdf” and contain the HTML translation of the text provided.

Congratulations, you have completed your first steps in generating PDF from an HTML using PDFKit.

Generate PDFs with PDFKit from HTML file and URL

Generating a PDF from HTML using PDFKit can be done in more than one way. In the previous section, we demonstrated how to create a PDF from an HTML string. This was only one way to use PDFKit. Another way to use PDFKit is to provide a file containing the HTML code you want to convert to a PDF.

Let’s create a simple HTML file containing the following HTML code and call it “test.html”

<html>
<head><h1>Hello, World!</h1></head>
<body>This PDF is generated using PDFKit from an HTML file.</body>
</html>

Now, go to your Python editor and write the following code:

import pdfkit
# Convert HTML file to PDF from a file
pdfkit.from_file('test.html', 'pdf_from_file.pdf')

With this simple one-line code, you can convert the “test.html” file into a PDF file named “pdf_from_file.pdf”.

Now for the fun part, we simply convert a URL page to a PDF with just another line of code.

import pdfkit
# Convert HTML file to PDF from a url
pdfkit.from_url('https://apitemplate.io/', 'pdf_from_url.pdf')

Just like that, We have converted the web page of “apitemplate.io” into a PDF file, you will also note that the links on the page still can be clicked through the PDF.

Now, we learned how to use PDFKit in its three simple use cases.

  • URL
  • String
  • HTML file

It’s time for you also to try these codes yourself.

Advanced Features of Python-PDFKit

In the previous section, we discussed the basic ways of using PDFKit. Now we are ready to move forward and discuss some of the advanced features.

i. Customizing PDF Output

PDFKit enables you also to customize the output. This can be done by providing different options along with the data that you share with the various APIs.

For example in the following code:

import pdfkit

options = {
    'page-size': 'A4',
    'margin-top': '10mm',
    'margin-left': '10mm',
    'encoding': 'UTF-8',
    'no-outline': None
}
# Convert HTML file to PDF from a url
pdfkit.from_url('https://apitemplate.io/', 'pdf_from_url.pdf', options=options)

We have used different options to modify and customize the generated PDF

The options are:

  • page-size: This will change the page size and let you choose between different page sizes like A4.
  • margin-top: change the size of the top margin, there is another option for the bottom one.
  • margin-left: change the size of the left margin, there is also a margin-right
  • Encoding: change the data encoding.
  • No-outline: if you used the previous example, you will notice that the outline (Bookmarks) generated is somehow messy, this option stops the generation of an outline.

You can find more options here.

Another way of customization is to add a custom CSS file or even a Java script.

Adding a custom CSS file:

options = { 'user-style-sheet': 'style.css' }

An example of javascript:

options = {
'javascript-delay': 1000, # Delay in milliseconds
'no-stop-slow-scripts': True, # Prevent stopping slow scripts 'debug-javascript': True # Enable JavaScript debugging
}

All these options and more can make it easy for you to customize the appearance and the generated PDF.

ii. Adding Headers and Footers

Another feature that cen be helpful is to add a custom header or footer to the generated PDF.

import pdfkit

options = {
    'header-html': 'header.html',
    'footer-html': 'footer.html',
    'margin-top': '20mm',
    'margin-bottom': '20mm'
}

pdfkit.from_file('test.html', 'output.pdf', options=options)

All that you need to create is two HTML files: header.html and footer.html, include them in your options and pass them to the generation API.

Example header.html:

<!DOCTYPE html>
<html>
<head>
    <meta charset="UTF-8">
    <title>Header</title>
    <style>
        /* Add CSS styles for header if needed */
        .header {
            text-align: center;
            font-size: 14px;
            color: #333;
        }
    </style>
</head>
<body>
    <div class="header">
        <p>Company Name</p>
        <p>Contact: [email protected]</p>
    </div>
</body>
</html>

Header example:

Footer example:

iii. Handling Multiple pages

One of the advanced features of PDFKit is the ability to handle multiple pages and add page breaks.

Using the same code used previously with adding a header and a footer, just by modifying the HTML CSS styles, we can force the content to be generated into a new page.

The new HTML code should look like this:

<html>
    <style>   
        .page-break {
            page-break-before: always; /* Always add page break before this element */
        }
    </style>
<head><h1>Hello, World!</h1></head>
<body>
<p>This PDF is generated using PDFKit from an HTML file.</p>

<div class="page-break">
    <h1>Section 1</h1>
    <p>This is content for section 1. This content should appear in a new page.</p>
</div>

</body>
</html>

The output will look like this:

Note that the header and footer also appear on both pages. 

we have another article that compares PDFKit and other Python libraries used for generating PDFs, you can read it here. You can also see a summary of the comparison here.

Common Challenges of Using PDFKit

In this section, we will be discussing some of the common issues that developers may encounter when working with Python-PDFKit.

  1. Dependency on wkhtmltopdf: The dependency between these two tools can introduce some issues like compatibility issues when using mismatched versions. Overcoming this mismatch can be another burden you must carry during setup.
  2. Performance: Wkhtmltopdf is a resource-intensive tool, that can consume a lot of memory and CPU. This can affect the performance of your application.
  3. Header and Footer Customization: we discussed how you can customize the header and the footer of the generated files, However, it sometimes can be tricky and requires a special script.
  4. JavaScript Support: if the HTML files contain a lot of JavasScript, this may affect the generated files as the execution can be inconsistent.

In summary, Python-PDFKit comes with some challenges and by understanding these challenges in advance, you can plan effectively and mitigate potential issues to make the most out of Python-PDFKit.

Other Popular Python libraries

We have an article about converting HTML to PDF using some of the popular libraries: Pyppeteer, Playwright, xhtml2pdf, WeasyPrint, and python-pdfkit.

You can find out more in the following link:

Introducing APITemplate.io for HTML to PDF Conversion

APITemplate.io is a powerful API-based platform for seamless PDF generation. Whether you need to create invoices, contracts, or any other PDF documents.

APITemplate.io offers an easy-to-use solution that leverages a Chromium-based rendering engine. This means full support for JavaScript, CSS, and HTML, giving you the flexibility to turn any web content into high-quality PDFs.

In addition, with our cloud-based API, there’s no need to install complex dependencies or worry about infrastructure.

APITemplate.io supports template-based PDF generation and can also generate PDFs from URLs. You can find more details in the following article about generating PDFs using different formats below:

In addition we offer a side-by-side WISIWYG editor and HTML editor for template editing.

With APITemplate.io, you can easily streamline your PDF generation needs, saving time and effort by automating the entire process.

Conclusion

In this article, We discussed PDFKit and How it wraps up the wkhtmltopdf tool. We also discussed the basic installation and usage of PDFKit, We introduced some of the advanced features supported by PDFKit with code examples and discussed some of the limitations and challenges faced when developing with PDFKit.

Finally, We have also discussed briefly the cloud-based API for PDF generation and How APITemplate.io supports three ways of HTML to PDF generation. Give APITemplate.io a try by signing up for a free account with us now and start automating your PDF generation.

Table of Contents

Share:

Facebook
Twitter
Pinterest
LinkedIn

Articles for Image Generation

Articles for PDF Generation

Copyright © 2024 APITemplate.io