1. Introduction
Creating PDF documents programmatically is a common requirement in software development, especially when dealing with report generation, invoicing, or any content that requires a printable or easily distributable format.
Java, being one of the most widely used programming languages, offers several libraries to simplify this task. This article explains how to generate PDFs from HTML files in Java using OpenHTMLtoPDF, iTextPDF, and Flying Saucer, and how these libraries differ.
Each of these libraries brings unique features and capabilities to the table, from converting HTML content directly into PDF files to providing detailed control over the document’s appearance and layout.
Let’s get started now.
2. Generating PDFs in Java with 3 Libraries
i. OpenHTMLtoPDF
OpenHTMLtoPDF is an open-source Java library for converting XML/XHTML into PDFs or images.
It uses the Apache PDFBox library to generate the PDF after rendering the XHTML. Apache PDFBox is an open-source Java library that supports creating and converting PDF documents.
In this tutorial, we will use the PdfRendererBuilder class from the library, which provides different methods to generate the PDF:
- run(): Runs the XHTML/XML to PDF conversion.
- toStream(): Sets the output stream to which the resulting PDF is written.
- withUri(): Provides the URI (Uniform Resource Identifier) of the document to convert to PDF.
You can find more about these methods in the documentation here.
Code Example
The following example shows basic usage of OpenHTMLtoPDF: it points the builder at the HTML file through a URI, sets the output stream that will receive the PDF, and then runs the XML/XHTML to PDF conversion.
import java.io.FileOutputStream;
import java.io.OutputStream;
import com.openhtmltopdf.pdfboxout.PdfRendererBuilder;
public class SimpleUsage
{
    public static void main(String[] args) throws Exception {
        try (OutputStream os = new FileOutputStream("out.pdf")) {
            PdfRendererBuilder builder = new PdfRendererBuilder();
            builder.useFastMode();
            builder.withUri("file:in.htm");
            // Set the output stream that receives the PDF
            builder.toStream(os);
            // Run the XHTML/XML to PDF conversion
            builder.run();
            // Print a message once the PDF has been created
            System.out.println("PDF created");
        }
    }
}
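The example above passes the relative URI file:in.htm, which depends on the working directory the JVM was started in. As a minimal sketch using only the JDK, a local path can be turned into an absolute file URI before it is handed to withUri(). The FileUris class and its toFileUri method are hypothetical names introduced for illustration; they are not part of OpenHTMLtoPDF.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not part of OpenHTMLtoPDF): builds an absolute
// file URI string from a local path using only the JDK.
final class FileUris {

    static String toFileUri(String path) {
        // toAbsolutePath() resolves the path against the current working directory;
        // toUri() adds the file: scheme and percent-encodes characters such as spaces.
        Path absolute = Paths.get(path).toAbsolutePath().normalize();
        return absolute.toUri().toString();
    }

    public static void main(String[] args) {
        // Prints a file URI whose exact value depends on where the program is run.
        System.out.println(toFileUri("in.htm"));
    }
}
```

The returned string could then replace the literal in the example, for instance builder.withUri(FileUris.toFileUri("in.htm")).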
Maven Dependency
Maven is a build tool that standardizes the build process and automates most build tasks, including dependency management.
We need to add the following dependencies to the pom.xml file to get the above code working.
<dependencies>
<dependency>
<!-- ALWAYS required, usually included transitively. -->
<groupId>com.openhtmltopdf</groupId>
<artifactId>openhtmltopdf-core</artifactId>
<version>${openhtml.version}</version>
</dependency>
<dependency>
<!-- Required for PDF output. -->
<groupId>com.openhtmltopdf</groupId>
<artifactId>openhtmltopdf-pdfbox</artifactId>
<version>${openhtml.version}</version>
</dependency>
</dependencies>
You can find more information about OpenHTMLtoPDF here.
ii. iTextPDF
iTextPDF is a library that provides an API for creating PDF, RTF, and HTML documents.
iTextPDF has a hierarchical structure: it divides text into “Chunks”, and combining Chunks forms a “Phrase”. Phrase has subclasses such as “Paragraph”, which in turn has several subclasses of its own. In this tutorial, we will use the following iTextPDF classes:
PdfWriter: A DocWriter class for PDF; using this class, every element added to a document is written to the output stream.
XMLWorkerHelper: A helper class for parsing XHTML/CSS or XML flow to PDF.
You can find more about these classes in the documentation for PdfWriter and XMLWorkerHelper.
Code Example
The following code example demonstrates the simplest way to generate a PDF from an HTML file: it obtains the singleton instance of the XMLWorkerHelper class, parses the HTML file, and writes the parsed content through the PdfWriter instance to generate the PDF.
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.tool.xml.XMLWorkerHelper;
public class Html2Pdf {
    private static final String HTML = "html.html";
    public static void main(String[] args) {
        try {
            Document document = new Document();
            // this method is used to get an instance of the PdfWriter.
            PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream("html.pdf"));
            document.open();
            // Get a Singleton XMLWorkerHelper
            // parseXHtml: Parses the xml data in the given reader
            XMLWorkerHelper.getInstance().parseXHtml(writer, document, new FileInputStream(HTML));
            document.close();
        } catch (IOException | DocumentException e) {
            e.printStackTrace();
        }
    }
}
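The example above reports failures only through a stack trace and prints nothing on success. One inexpensive sanity check, sketched here with the JDK alone, is to confirm that the output file starts with the %PDF- signature that every PDF file begins with. PdfCheck and its methods are hypothetical names for illustration, and the check says nothing about whether the rest of the file is valid.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper: checks only the file signature, not the PDF structure.
final class PdfCheck {

    private static final byte[] SIGNATURE = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // True if the given leading bytes start with the "%PDF-" signature.
    static boolean hasPdfSignature(byte[] head) {
        return head.length >= SIGNATURE.length
                && Arrays.equals(Arrays.copyOf(head, SIGNATURE.length), SIGNATURE);
    }

    // Reads just the first few bytes of the file and checks the signature.
    static boolean looksLikePdf(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] head = new byte[SIGNATURE.length];
            int filled = 0;
            // read() may return fewer bytes than requested, so loop until full or EOF.
            while (filled < head.length) {
                int n = in.read(head, filled, head.length - filled);
                if (n < 0) {
                    break;
                }
                filled += n;
            }
            return hasPdfSignature(Arrays.copyOf(head, filled));
        }
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the output file name used in the example above.
        Path pdf = Paths.get(args.length > 0 ? args[0] : "html.pdf");
        boolean ok = Files.isRegularFile(pdf) && looksLikePdf(pdf);
        System.out.println(pdf + (ok ? " looks like a PDF" : " is missing or not a PDF"));
    }
}
```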
Maven Dependency
We need to add the following dependencies to the pom.xml file to get the above code working.
<dependency>
<groupId>com.itextpdf</groupId>
<artifactId>itextpdf</artifactId>
<version>${itextpdf.version}</version>
</dependency>
<dependency>
<groupId>com.itextpdf.tool</groupId>
<artifactId>xmlworker</artifactId>
<version>${xmlworker.version}</version>
</dependency>
You can find more information about iTextPDF here.
iii. Flying Saucer
Flying Saucer is a Java library for converting XML/XHTML into PDFs or images. It was built on top of iTextPDF.
Code Example
The following code demonstrates how to use Flying Saucer (the xhtmlrenderer library) together with the Jsoup library.
Jsoup is an open-source Java library for parsing, extracting, and manipulating data from HTML files. We first open the HTML file as a File object and pass it to Jsoup, which parses it into a Document; you can find more about this here.
After parsing the HTML file, we switch the output syntax to XML so that Jsoup serializes well-formed XHTML, and then pass that string to Flying Saucer to convert it into a PDF.
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.xhtmlrenderer.pdf.ITextRenderer;
public class Main {
    public static void main(String[] args) throws Exception {
        try (OutputStream os = new FileOutputStream("out.pdf")) {
            // Open the HTML file from its path
            File in = new File("html.html");
            // Parse the file; with a null charset, Jsoup reads it from the meta tag or falls back to UTF-8
            Document document = Jsoup.parse(in, null);
            // Serialize as XML so that the output is well-formed XHTML
            document.outputSettings().syntax(Document.OutputSettings.Syntax.xml);
            ITextRenderer iTextRenderer = new ITextRenderer();
            iTextRenderer.setDocumentFromString(document.html());
            iTextRenderer.layout();
            iTextRenderer.createPDF(os);
            System.out.println("PDF created");
        }
    }
}
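The XHTML conversion step matters because Flying Saucer expects well-formed XML, while ordinary HTML often is not well-formed (for example, an unclosed br tag). The sketch below uses the JDK's built-in XML parser to show the difference. WellFormedCheck is a hypothetical helper for illustration, and it assumes small fragments without a DOCTYPE or HTML-specific entities such as &nbsp;.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: reports whether a markup string parses as well-formed XML.
final class WellFormedCheck {

    static boolean isWellFormedXml(String markup) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            // The parser rejected the markup, so it is not well-formed.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("XML parser unavailable or input unreadable", e);
        }
    }

    public static void main(String[] args) {
        // Plain HTML with an unclosed <br> is not well-formed XML.
        System.out.println(isWellFormedXml("<p>line one<br>line two</p>"));   // prints false
        // The XHTML form with a self-closing <br/> is well-formed.
        System.out.println(isWellFormedXml("<p>line one<br/>line two</p>"));  // prints true
    }
}
```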
Maven Dependency
We need to add the following dependencies to the pom.xml file to get the above code working.
<dependency>
<groupId>org.jsoup</groupId>
<artifactId>jsoup</artifactId>
<version>1.14.3</version>
</dependency>
<dependency>
<groupId>org.xhtmlrenderer</groupId>
<artifactId>flying-saucer-core</artifactId>
<version>9.1.22</version>
</dependency>
<dependency>
<groupId>org.xhtmlrenderer</groupId>
<artifactId>flying-saucer-pdf-openpdf</artifactId>
<version>9.1.22</version>
</dependency>
You can find more information here.
3. Comparison of All Libraries
OpenHTMLtoPDF, iTextPDF, and Flying Saucer are three popular libraries used by developers to create PDF documents. These libraries vary in their approach, usability, and feature sets.
This comparison aims to provide a clear overview of each, helping developers choose the one that best suits their project requirements.
Feature | OpenHTMLtoPDF | iTextPDF | Flying Saucer |
---|---|---|---|
Rendering Engine | Uses its own PDFBox-based renderer | Own rendering engine | Uses iText as the rendering engine |
HTML/CSS Support | Good support for CSS 2.1 | Excellent HTML and CSS support including some HTML5 features | Limited to CSS 2.1 and XHTML |
Extensibility | Moderate | High, with extensive customization options | Moderate |
Performance | Good | Very good, optimized for performance | Good, but dependent on iText |
Community and Support | Growing community, responsive support | Large community, professional support available at a cost | Smaller community, limited updates |
Licensing | Apache License 2.0 | AGPL license for open source; commercial license needed for proprietary use | LGPL, requires iText which has AGPL |
Ease of Use | Easy to use for basic PDF generation | Steep learning curve but versatile | Relatively easy to integrate |
Special Features | Direct image rendering, web page to PDF | Advanced features like PDF manipulation, merging, splitting | Mainly focused on HTML to PDF conversion |
First, note that Flying Saucer is built on iText, so there are only minor differences between the two. Flying Saucer offers an easier path for simple HTML to PDF conversions, but its HTML and CSS support is more limited than that of the other two libraries.
OpenHTMLtoPDF, by contrast, is built on a different library, PDFBox. Apache PDFBox is a well-maintained, open-source library released under the Apache License 2.0, whereas iTextPDF is licensed under the AGPL.
OpenHTMLtoPDF is also generally considered faster than Flying Saucer. In addition, it is an excellent choice for straightforward PDF generation with good HTML and CSS support, and it benefits from an open and permissive license.
iTextPDF can be considered more resource-efficient than PDFBox because it processes text chunk by chunk; it also has an event-oriented architecture.
On the other hand, OpenHTMLtoPDF provides built-in plugins for SVG and MathML and better support for CSS3 transforms. One of its drawbacks is that it does not support OpenType fonts.
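Performance claims like the ones above depend heavily on the input document, so it can be worth timing the libraries on your own HTML. The following is a rough timing sketch using only the JDK, not a rigorous benchmark. ConversionTimer and averageMillis are hypothetical names, and the workload in main is a placeholder to be replaced with a call into one of the converters shown earlier.

```java
import java.util.concurrent.Callable;

// Hypothetical helper: crude wall-clock timing of a repeated task.
final class ConversionTimer {

    // Runs the task `warmups` times untimed, then `runs` times timed,
    // and returns the average duration of a timed run in milliseconds.
    static double averageMillis(Callable<?> task, int warmups, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        try {
            for (int i = 0; i < warmups; i++) {
                task.call();
            }
            long start = System.nanoTime();
            for (int i = 0; i < runs; i++) {
                task.call();
            }
            return (System.nanoTime() - start) / 1_000_000.0 / runs;
        } catch (Exception e) {
            throw new IllegalStateException("task failed during timing", e);
        }
    }

    public static void main(String[] args) {
        // Placeholder workload: sleeps instead of converting a document.
        double ms = averageMillis(() -> {
            Thread.sleep(5);
            return null;
        }, 2, 10);
        System.out.printf("average: %.2f ms%n", ms);
    }
}
```

Warm-up runs are included because the first conversions in a fresh JVM also pay for class loading and JIT compilation, which would otherwise distort the average.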
4. Conclusion
In this article, we looked at how to generate PDFs from HTML files in Java. We briefly introduced three libraries, OpenHTMLtoPDF, iTextPDF, and Flying Saucer, and compared them on properties such as rendering engine, HTML/CSS support, performance, licensing, ease of use, and special features.
Finally, if you want to have a tool with all the features of these libraries and more, in that case, I recommend that you check out APITemplate.io.
APITemplate.io is a tool that can help you generate PDFs quickly with PDF generating API over the cloud and is compatible with CSS, JavaScript, and Python. It also comes with predefined templates which you can reuse and edit.
Sign up for a free account with APITemplate.io now and start automating your PDF generation.