Generating PDFs in Java with 3 Popular Libraries

1. Introduction

Creating PDF documents programmatically is a common requirement in software development, especially when dealing with report generation, invoicing, or any content that requires a printable or easily distributable format.

Java, one of the most widely used programming languages, offers several libraries that simplify this task. This article explains how to generate PDFs from HTML files in Java using OpenHTMLtoPDF, iTextPDF, and Flying Saucer, and how the three libraries differ.

Each of these libraries brings unique features and capabilities to the table, from converting HTML content directly into PDF files to providing detailed control over the document’s appearance and layout.

Let’s get started now.

2. Generating PDFs in Java with 3 Libraries

i. OpenHTMLtoPDF

OpenHTMLtoPDF is an open-source Java library that converts XML/XHTML into PDFs or images.

It renders the XHTML and then uses Apache PDFBox, an open-source Java library for creating and converting PDF documents, to generate the PDF.

In this tutorial, we will use the PdfRendererBuilder class from the library, which provides different methods to generate the PDF:

  • run(): Runs the XHTML/XML to PDF conversion.
  • toStream(): Sets the output stream that receives the resulting PDF.
  • withUri(): Provides the URI (Uniform Resource Identifier) of the document to convert to PDF.

You can find more about these methods in the documentation here.

Code Example

The following example shows simple usage of OpenHTMLtoPDF: it passes the URI of the HTML file to the builder, sets the output stream that will receive the PDF, and then runs the XML/XHTML to PDF conversion.

import java.io.FileOutputStream;
import java.io.OutputStream;
import com.openhtmltopdf.pdfboxout.PdfRendererBuilder;

public class SimpleUsage
{
    public static void main(String[] args) throws Exception {
        try (OutputStream os = new FileOutputStream("out.pdf")) {
            PdfRendererBuilder builder = new PdfRendererBuilder();
            builder.useFastMode();
            // URI of the HTML file to convert
            builder.withUri("file:in.htm");
            // Write the resulting PDF to the output stream
            builder.toStream(os);
            // Run the XHTML/XML to PDF conversion
            builder.run();
            // Reached only if the conversion did not throw an exception
            System.out.println("PDF created");
        }
    }
}
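
The example above passes the relative URI "file:in.htm", which depends on the directory the program is started from. As a minimal sketch, assuming you would rather hand withUri() an absolute URI, the standard library can build one from a file path. The class name HtmlFileUri and the method toFileUri are hypothetical helpers introduced here for illustration; they are not part of OpenHTMLtoPDF.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not part of OpenHTMLtoPDF): builds an absolute
// file: URI string from a file path using only the Java standard library.
class HtmlFileUri {

    static String toFileUri(String htmlPath) {
        // Resolve the path against the current working directory
        Path absolute = Paths.get(htmlPath).toAbsolutePath().normalize();
        // toUri() produces a file: URI and percent-encodes characters such as spaces
        return absolute.toUri().toString();
    }

    public static void main(String[] args) {
        // The resulting string could be passed to builder.withUri(...)
        System.out.println(toFileUri("in.htm"));
    }
}
```

With this helper, the call in the example would become builder.withUri(HtmlFileUri.toFileUri("in.htm")).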

Maven Dependency

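Maven is a build tool that standardizes the build process and automates most build tasks, including dependency management.
Add the following dependencies to the pom.xml file to compile and run the code above; ${openhtml.version} is a property that you define in your pom.xml with the version you want to use.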

 <dependencies>
    <dependency>
        <!-- ALWAYS required, usually included transitively. -->
        <groupId>com.openhtmltopdf</groupId>
        <artifactId>openhtmltopdf-core</artifactId>
        <version>${openhtml.version}</version>
    </dependency>

    <dependency>
        <!-- Required for PDF output. -->
        <groupId>com.openhtmltopdf</groupId>
        <artifactId>openhtmltopdf-pdfbox</artifactId>
        <version>${openhtml.version}</version>
    </dependency>
 </dependencies>

You can find more information about OpenHTMLtoPDF here.

ii. iTextPDF

iTextPDF is a library that provides an API for creating PDF, RTF, and HTML documents.

iTextPDF structures text hierarchically: the smallest unit is a “Chunk”, and combining Chunks forms a “Phrase”. Phrase has subclasses such as “Paragraph”, which in turn has subclasses of its own. In this tutorial, we will use the following iTextPDF classes:

  • PdfWriter: A DocWriter class for PDF. When this class is used, every element added to the document is written to the output stream.
  • XMLWorkerHelper: A helper class for parsing XHTML/CSS or XML flow to PDF.

You can find more about these classes in the documentation for PdfWriter and XMLWorkerHelper.

Code Example

The following code example demonstrates the simplest way to generate a PDF from an HTML file: it creates a PdfWriter for the document, obtains the singleton XMLWorkerHelper instance, and uses it to parse the HTML file into the document through the writer.

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.tool.xml.XMLWorkerHelper;

public class Html2Pdf {
    private static final String HTML = "html.html";

    public static void main(String[] args) {
        // try-with-resources closes the HTML input stream when parsing is done
        try (FileInputStream html = new FileInputStream(HTML)) {
            Document document = new Document();
            // Get a PdfWriter that writes the document to html.pdf
            PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream("html.pdf"));
            document.open();
            // Get the singleton XMLWorkerHelper and parse the XHTML
            // from the input stream into the document
            XMLWorkerHelper.getInstance().parseXHtml(writer, document, html);
            document.close();
        } catch (IOException | DocumentException e) {
            e.printStackTrace();
        }
    }
}
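
Unlike the other examples, this one prints nothing on success. As a rough sanity check, assuming it is enough to confirm that the output file begins with the "%PDF-" marker that PDF files start with, something like the following could be run afterwards. The class PdfHeaderCheck and the method looksLikePdf are hypothetical names introduced for illustration; the check uses only the standard library and does not validate the rest of the file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper: checks only the leading "%PDF-" marker of a file.
class PdfHeaderCheck {

    static boolean looksLikePdf(Path file) throws IOException {
        byte[] expected = "%PDF-".getBytes(StandardCharsets.US_ASCII);
        byte[] actual = new byte[expected.length];
        int read = 0;
        try (InputStream in = Files.newInputStream(file)) {
            // Keep reading until the buffer is full or the file ends
            while (read < actual.length) {
                int n = in.read(actual, read, actual.length - read);
                if (n < 0) {
                    break;
                }
                read += n;
            }
        }
        return read == expected.length && Arrays.equals(expected, actual);
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the output file name used in the example above
        Path pdf = Paths.get(args.length > 0 ? args[0] : "html.pdf");
        if (Files.isRegularFile(pdf)) {
            System.out.println(pdf + " looks like a PDF: " + looksLikePdf(pdf));
        } else {
            System.out.println(pdf + " was not found");
        }
    }
}
```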

Maven Dependency

We need to add the following dependencies to the pom.xml file to compile and run the code above.

<dependency>
   <groupId>com.itextpdf</groupId>
   <artifactId>itextpdf</artifactId>
   <version>${itextpdf.version}</version>
</dependency>
<dependency>
   <groupId>com.itextpdf.tool</groupId>
   <artifactId>xmlworker</artifactId>
   <version>${xmlworker.version}</version>
</dependency>

You can find more information about iTextPDF here.

iii. Flying Saucer

Flying Saucer is a Java library for converting XML/XHTML into PDFs or images. It is built on top of iText.

Code Example

The following code demonstrates how to use Flying Saucer (the xhtmlrenderer library) together with the Jsoup library.

Jsoup is an open-source Java library for parsing, extracting, and manipulating data from HTML files. Here we open the HTML file as a File object and pass it to Jsoup, which parses it into a Document; you can find more about this here.

After parsing the HTML file, we serialize it as XHTML and pass the result to Flying Saucer, which converts it into a PDF.

import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import org.xhtmlrenderer.pdf.ITextRenderer;

public class Main {

   public static void main(String[] args) throws Exception {

       try (OutputStream os = new FileOutputStream("out.pdf")) {
           // Open the HTML file from its path
           File in = new File("html.html");
           // Parse the file; a null charset lets Jsoup detect the encoding
           // from the document, falling back to UTF-8
           Document document = Jsoup.parse(in, null);

           // Serialize the document as XHTML instead of HTML
           document.outputSettings().syntax(Document.OutputSettings.Syntax.xml);

           ITextRenderer iTextRenderer = new ITextRenderer();
           iTextRenderer.setDocumentFromString(document.html());
           iTextRenderer.layout();
           iTextRenderer.createPDF(os);
           System.out.println("PDF created");
       }
   }
}
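
The Jsoup step matters because Flying Saucer works on XML/XHTML, so ordinary HTML with unclosed tags such as <br> needs to be converted first. As a minimal sketch, assuming the markup carries no DOCTYPE that points to an external DTD (the parser might otherwise try to fetch it), the standard XML parser can tell whether a string is well-formed before it is handed to the renderer. The class XhtmlWellFormedCheck and the method isWellFormedXml are hypothetical names introduced here for illustration.

```java
import java.io.IOException;
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: reports whether a markup string is well-formed XML.
class XhtmlWellFormedCheck {

    static boolean isWellFormedXml(String markup) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler ignores warnings and rethrows fatal errors
            // instead of printing them to standard error
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            // The parser rejected the markup, for example an unclosed tag
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("Could not run the XML parser", e);
        }
    }

    public static void main(String[] args) {
        // HTML-style markup with an unclosed <br> tag
        System.out.println(isWellFormedXml("<html><body><p>Hi<br></p></body></html>"));
        // The same markup in XHTML syntax
        System.out.println(isWellFormedXml("<html><body><p>Hi<br/></p></body></html>"));
    }
}
```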

Maven Dependency

We need to add the following dependencies to the pom.xml file to compile and run the code above.

<dependency>
   <groupId>org.jsoup</groupId>
   <artifactId>jsoup</artifactId>
   <version>1.14.3</version>
</dependency>
<dependency>
   <groupId>org.xhtmlrenderer</groupId>
   <artifactId>flying-saucer-core</artifactId>
   <version>9.1.22</version>
</dependency>

<dependency>
   <groupId>org.xhtmlrenderer</groupId>
   <artifactId>flying-saucer-pdf-openpdf</artifactId>
   <version>9.1.22</version>
</dependency>

You can find more information about Flying Saucer here.

3. Comparison of All Libraries

OpenHTMLtoPDF, iTextPDF, and Flying Saucer are three popular libraries used by developers to create PDF documents. These libraries vary in their approach, usability, and feature sets.

This comparison aims to provide a clear overview of each, helping developers choose the one that best suits their project requirements.

| Feature | OpenHTMLtoPDF | iTextPDF | Flying Saucer |
| --- | --- | --- | --- |
| Rendering Engine | Uses its own PDFBox-based renderer | Own rendering engine | Uses iText as the rendering engine |
| HTML/CSS Support | Good support for CSS 2.1 | Excellent HTML and CSS support including some HTML5 features | Limited to CSS 2.1 and XHTML |
| Extensibility | Moderate | High, with extensive customization options | Moderate |
| Performance | Good | Very good, optimized for performance | Good, but dependent on iText |
| Community and Support | Growing community, responsive support | Large community, professional support available at a cost | Smaller community, limited updates |
| Licensing | Apache License 2.0 | AGPL license for open source; commercial license needed for proprietary use | LGPL, requires iText which has AGPL |
| Ease of Use | Easy to use for basic PDF generation | Steep learning curve but versatile | Relatively easy to integrate |
| Special Features | Direct image rendering, web page to PDF | Advanced features like PDF manipulation, merging, splitting | Mainly focused on HTML to PDF conversion |

First, note that Flying Saucer is based on iText, so the two differ only in minor ways. Flying Saucer offers an easier approach for simple HTML to PDF conversions, but with limited HTML and CSS capability compared to the other two.

OpenHTMLtoPDF, however, is based on a different library, Apache PDFBox. PDFBox is a well-maintained open-source library released under the Apache License 2.0, whereas iTextPDF is licensed under the AGPL.

OpenHTMLtoPDF is also considered faster than Flying Saucer. In addition, it is an excellent choice for straightforward PDF generation with good HTML and CSS support, and it benefits from an open and permissive license.

iTextPDF can be considered much more resource-efficient than PDFBox because it processes the text chunk by chunk; it also has an event-oriented architecture.

On the other hand, OpenHTMLtoPDF provides built-in plugins for SVG and MathML and better support for CSS3 transforms. One drawback of OpenHTMLtoPDF is that it does not support OpenType fonts.

4. Conclusion

In this article, we discussed how to generate PDFs from HTML files using Java. We briefly introduced three libraries, OpenHTMLtoPDF, iTextPDF, and Flying Saucer, and compared them on properties such as rendering engine, HTML/CSS support, performance, licensing, and ease of use.

Finally, if you want a tool with all the features of these libraries and more, I recommend checking out APITemplate.io.

APITemplate.io is a cloud-based PDF generation API that helps you generate PDFs quickly and is compatible with CSS, JavaScript, and Python. It also comes with predefined templates that you can reuse and edit.

Sign up for a free account with APITemplate.io now and start automating your PDF generation.

Copyright © 2024 APITemplate.io