1. Introduction
PDFs are one of the most commonly used formats, supported by all operating systems, and can be opened and viewed on any device without any compatibility issues. HTML on the other hand, is the backbone for web content, it enables the creation and accessibility of interactive content across different platforms.
Generating PDFs from HTML can be useful for dynamic content like reports and invoices. Together you can benefit from the flexibility of HTML and reliability and security of the read-only format of the PDF.
In this article, we will introduce another tool that can be used to automate PDF generation from HTML in Python. PDFKit is another Python library that can be used to convert HTML pages into PDFs. It wraps the wkhtmltopdf command line tool. PDFKit is easy to use and can handle complex PDF generation tasks like tables and multi-page documents.
2. Installation and Setup
Installing PDFKit is simple as it’s a Python library. First, we need to create our Python virtual environment, the process is simple, you can find out more about it using this link.
The second step is to install the dependencies, as PDFKit relies on wkhtmltopdf to perform HTML conversion to PD, We need to ensure that wkhtmltopdf is installed on your operating system.
On Mac, you can use the following command with Homebrew:
brew install --cask wkhtmltopdf
On Linux, use the following command:
sudo apt-get install wkhtmltopdf
On Windows, you can download the latest installer from here and set the path for your installation in the Windows PATH environment variable.
Now run the following command to check for the correct installation:
wkhtmltopdf --version
This should print the version of wkhtmltopdf installed.
3. Basic Usage of PDFKit (from String)
Generating PDF from HTML is a simple task when dealing with PDFKit, Here is a basic example of using PDFKit to create a PDF from a string of HTML.
import pdfkit
# Convert HTML string to PDF using PDFKit
html_string = """
<html>
<head><h1>Hello, World!</h1></head>
<body>This PDF is generated using PDFKit</body>
</html>
"""
pdfkit.from_string(html_string, 'test.pdf')
This will generate a PDF file named “test.pdf” and contain the HTML translation of the text provided.
Congratulations, you have completed your first steps in generating PDF from an HTML using PDFKit.
4. Ways to generate PDF Documents with PDFKit (HTML file and URL)
Generating a PDF from HTML using PDFKit can be done in more than one way. In the previous section, we demonstrated how to create a PDF from an HTML string. This was only one way to use PDFKit. Another way to use PDFKit is to provide a file containing the HTML code you want to convert to a PDF.
Let’s create a simple HTML file containing the following HTML code and call it “test.html”
<html>
<head><h1>Hello, World!</h1></head>
<body>This PDF is generated using PDFKit from an HTML file.</body>
</html>
Now, go to your Python editor and write the following code:
import pdfkit
# Convert HTML file to PDF from a file
pdfkit.from_file('test.html', 'pdf_from_file.pdf')
With this simple one-line code, you can convert the “test.html” file into a PDF file named “pdf_from_file.pdf”.
Now for the fun part, we simply convert a URL page to a PDF with just another line of code.
import pdfkit
# Convert HTML file to PDF from a url
pdfkit.from_url('https://apitemplate.io/', 'pdf_from_url.pdf')
Just like that, We have converted the web page of “apitemplate.io” into a PDF file, you will also note that the links on the page still can be clicked through the PDF.
Now, we learned how to use PDFKit in its three simple use cases.
- URL
- String
- HTML file
It’s time for you also to try these codes yourself.
5. Advanced Features
In the previous section, we discussed the basic ways of using PDFKit. Now we are ready to move forward and discuss some of the advanced features.
i. Customizing PDF Output
PDFKit enables you also to customize the output. This can be done by providing different options along with the data that you share with the various APIs.
For example in the following code:
import pdfkit
options = {
'page-size': 'A4',
'margin-top': '10mm',
'margin-left': '10mm',
'encoding': 'UTF-8',
'no-outline': None
}
# Convert HTML file to PDF from a url
pdfkit.from_url('https://apitemplate.io/', 'pdf_from_url.pdf', options=options)
We have used different options to modify and customize the generated PDF.
The options are:
- page-size: This will change the page size and let you choose between different page sizes like A4.
- margin-top: change the size of the top margin, there is another option for the bottom one.
- margin-left: change the size of the left margin, there is also a margin-right
- Encoding: change the data encoding.
- No-outline: if you used the previous example, you will notice that the outline (Bookmarks) generated is somehow messy, this option stops the generation of an outline.
You can find more options here.
Another way of customization is to add a custom CSS file or even a Java script.
Adding a custom CSS file:
options = { 'user-style-sheet': 'style.css' }
An example of javascript:
options = {
'javascript-delay': 1000, # Delay in milliseconds
'no-stop-slow-scripts': True, # Prevent stopping slow scripts 'debug-javascript': True # Enable JavaScript debugging
}
All these options and more can make it easy for you to customize the appearance and the generated PDF.
ii. Adding Headers and Footers
Another feature that cen be helpful is to add a custom header or footer to the generated PDF.
import pdfkit
options = {
'header-html': 'header.html',
'footer-html': 'footer.html',
'margin-top': '20mm',
'margin-bottom': '20mm'
}
pdfkit.from_file('test.html', 'output.pdf', options=options)
All that you need to create is two HTML files: header.html and footer.html, include them in your options and pass them to the generation API.
Example header.html:
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>Header</title>
<style>
/* Add CSS styles for header if needed */
.header {
text-align: center;
font-size: 14px;
color: #333;
}
</style>
</head>
<body>
<div class="header">
<p>Company Name</p>
<p>Contact: [email protected]</p>
</div>
</body>
</html>
Header example:
Footer example:
iii. Handling Multiple pages
One of the advanced features of PDFKit is the ability to handle multiple pages and add page breaks.
Using the same code used previously with adding a header and a footer, just by modifying the HTML CSS styles, we can force the content to be generated into a new page.
The new HTML code should look like this:
<html>
<style>
.page-break {
page-break-before: always; /* Always add page break before this element */
}
</style>
<head><h1>Hello, World!</h1></head>
<body>
<p>This PDF is generated using PDFKit from an HTML file.</p>
<div class="page-break">
<h1>Section 1</h1>
<p>This is content for section 1. This content should appear in a new page.</p>
</div>
</body>
</html>
The output will look like this:
Note that the header and footer also appear on both pages.
we have another article that compares PDFKit and other Python libraries used for generating PDFs, you can read it here. You can also see a summary of the comparison here.
vi. Common Challenges of Using PDFKit
(Please rewrite the following in your own words or you may change or remove any of them)
Based on how PDFKit works and the dependency between PDFKit and wkhtmltopdf, this may introduce some challenges while using the library, we will talk about some of these common challenges:
- Dependency on wkhtmltopdf: The dependency between these two tools can introduce some issues like compatibility issues when using mismatched versions. Overcoming this mismatch can be another burden you must carry during setup.
- Performance: Wkhtmltopdf is a resource-intensive tool, that can consume a lot of memory and CPU. This can affect the performance of your application.
- Header and Footer Customization: we discussed how you can customize the header and the footer of the generated files, However, it sometimes can be tricky and requires a special script.
- JavaScript Support: if the HTML files contain a lot of JavasScript, this may affect the generated files as the execution can be inconsistent.
Dealing with these limitations is important when using PDFKit to generate the output that you are looking for.
6. Introducing cloud-based API for PDF generation
Another useful way of generating PDFs from HTML is to use cloud-based platforms. It provides an easy and clean way of generating PDFs by using APIs. In addition to supporting generating PDFs files from different sources, it can also keep track of what was generated.
One of the best solutions available is APITemplate.io, it supports generating PDFs from HTML using the three sources mentioned earlier:
- URL
- String
- HTML File
In addition to that, it supports Template-based Generation,
you can view more details about how to generate PDFs using different formats from:
- Template-based PDF generation.
- Generate PDF from the website URL.
- Generate PDF from custom HTML content.
I suggest you have a look at these different features.
7. Conclusion
In this article, We discussed PDFKit and How it wraps up the wkhtmltopdf tool. We also discussed the basic installation and usage of PDFKit, We introduced some of the advanced features supported by PDFKit with code examples and discussed some of the limitations and challenges faced when developing with PDFKit.
Finally, We have also discussed briefly the cloud-based API for PDF generation and How APITemplate.io supports three ways of HTML to PDF generation. Give APITemplate.io a try by signing up for a free account with us now and start automating your PDF generation.