A guide to generating PDFs in Java

Introduction

As we continue introducing libraries for generating PDFs from HTML in different languages, this article explains how to generate PDFs from HTML files in Java using openhtmltopdf, iText (itextpdf), and Flying Saucer, and how these libraries differ.

Libraries

openhtmltopdf

Openhtmltopdf is an open-source Java library for converting XML/XHTML into PDFs or images. After rendering the XHTML, it uses the open-source Apache PDFBox library to generate the PDF.

Apache PDFBox is an open-source Java library that supports creating and converting PDF documents.

In this tutorial, we will use the PdfRendererBuilder class from the library, which provides methods to configure and run the conversion, including:
run(): Runs the XHTML/XML to PDF conversion.
toStream(): Sets the output stream that the resulting PDF is written to.
withUri(): Sets the URI (Uniform Resource Identifier) of the document to convert.
You can find more about these methods in the documentation here.

Code Example

The following example shows a simple use of openhtmltopdf: it passes the URI of an HTML file to the builder, directs the output to a file stream, and then runs the XHTML/XML to PDF conversion.

import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import com.openhtmltopdf.pdfboxout.PdfRendererBuilder;

public class SimpleUsage
{
    public static void main(String[] args) throws Exception {
        try (OutputStream os = new FileOutputStream("out.pdf")) {
            PdfRendererBuilder builder = new PdfRendererBuilder();
            // use the library's faster rendering mode
            builder.useFastMode();
            // set the input document; File.toURI() builds a valid absolute file: URI
            builder.withUri(new File("in.htm").toURI().toString());
            // set the output stream that the PDF is written to
            builder.toStream(os);
            // run the XHTML/XML to PDF conversion
            builder.run();
            // print a message once the PDF has been created
            System.out.println("PDF created");
        }
    }
}
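The builder does not have to read from a file. As a minimal sketch, assuming you already hold the (well-formed) XHTML in a string, the builder's withHtmlContent(html, baseUri) method can take it directly, and toStream() can target an in-memory ByteArrayOutputStream, for example when a web service returns the PDF in a response. The class and method names (HtmlStringToPdf, render) are hypothetical, and using the working directory as the base URI is an assumption.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import com.openhtmltopdf.pdfboxout.PdfRendererBuilder;

// Hypothetical helper: renders an XHTML string to PDF bytes in memory
public class HtmlStringToPdf {

    public static byte[] render(String xhtml) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        PdfRendererBuilder builder = new PdfRendererBuilder();
        builder.useFastMode();
        // Assumption: relative links (images, CSS) resolve against the working directory
        String baseUri = new File(".").toURI().toString();
        builder.withHtmlContent(xhtml, baseUri);
        builder.toStream(out);
        builder.run();
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] pdf = render("<html><body><h1>Hello, PDF</h1></body></html>");
        // Every PDF file starts with the "%PDF-" header, so this prints %PDF-
        System.out.println(new String(pdf, 0, 5, StandardCharsets.US_ASCII));
    }
}
```

Keeping the result as a byte array avoids writing temporary files when the PDF is only sent over the network.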

Maven Dependency

Maven is a build automation tool that standardizes the build process and manages a project's dependencies.
To compile and run the code above, add the following dependencies to the project's pom.xml file, and define the openhtml.version property in its <properties> section.

<dependencies>
    <dependency>
        <!-- ALWAYS required, usually included transitively. -->
        <groupId>com.openhtmltopdf</groupId>
        <artifactId>openhtmltopdf-core</artifactId>
        <version>${openhtml.version}</version>
    </dependency>

    <dependency>
        <!-- Required for PDF output. -->
        <groupId>com.openhtmltopdf</groupId>
        <artifactId>openhtmltopdf-pdfbox</artifactId>
        <version>${openhtml.version}</version>
    </dependency>
</dependencies>

You can find more information about openhtmltopdf here.
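Openhtmltopdf expects well-formed XML/XHTML, so real-world HTML with unclosed tags such as <br> may fail to load through withUri(). One common workaround, sketched here under the assumption that the Jsoup dependency from the Flying Saucer section is also on the classpath, is to parse the file with Jsoup, convert it to a standard W3C DOM with Jsoup's W3CDom helper, and pass that DOM to the builder's withW3cDocument() method. The class name and file names are placeholders.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;

import org.jsoup.Jsoup;
import org.jsoup.helper.W3CDom;

import com.openhtmltopdf.pdfboxout.PdfRendererBuilder;

public class LooseHtmlToPdf {

    public static void main(String[] args) throws Exception {
        File in = new File("in.htm"); // placeholder input path

        // Jsoup tolerates malformed HTML (unclosed tags, missing </p>, ...)
        org.jsoup.nodes.Document jsoupDoc = Jsoup.parse(in, "UTF-8");

        // Convert Jsoup's tree into a standard org.w3c.dom.Document
        org.w3c.dom.Document w3cDoc = new W3CDom().fromJsoup(jsoupDoc);

        try (OutputStream os = new FileOutputStream("out.pdf")) {
            PdfRendererBuilder builder = new PdfRendererBuilder();
            builder.useFastMode();
            // The base URI resolves relative CSS and image links against the input file
            builder.withW3cDocument(w3cDoc, in.toURI().toString());
            builder.toStream(os);
            builder.run();
        }
    }
}
```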

iText (itextpdf)

iText is a library that provides an API for creating PDF documents (older releases could also produce RTF and HTML). iText has a hierarchical structure: it divides text into “Chunks”, and combining Chunks forms a “Phrase”. Phrase has subclasses such as “Paragraph”, which in turn has subclasses of its own, such as ListItem. In this tutorial, we will use the following iText classes:
PdfWriter: A DocWriter for PDF; every element added to a Document bound to this writer is written to the output stream.
XMLWorkerHelper: A helper class for parsing XHTML/CSS or XML flow to PDF.

You can find more about these classes in the PdfWriter and XMLWorkerHelper documentation.
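To make the Chunk, Phrase, and Paragraph hierarchy described above concrete, here is a minimal sketch that builds a PDF directly with iText 5 classes, without any HTML input. The class name and the chunks.pdf output file are illustrative.

```java
import java.io.FileOutputStream;

import com.itextpdf.text.Chunk;
import com.itextpdf.text.Document;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.Phrase;
import com.itextpdf.text.pdf.PdfWriter;

public class ChunkPhraseParagraph {

    public static void main(String[] args) throws Exception {
        Document document = new Document();
        // Every element added to the document is written to chunks.pdf
        PdfWriter.getInstance(document, new FileOutputStream("chunks.pdf"));
        document.open();
        try {
            // A Chunk is the smallest piece of text
            Chunk hello = new Chunk("Hello, ");
            Chunk world = new Chunk("iText!");

            // A Phrase is a sequence of Chunks
            Phrase phrase = new Phrase();
            phrase.add(hello);
            phrase.add(world);

            // A Paragraph is a Phrase that starts on a new line and supports
            // layout properties such as alignment, indentation, and spacing
            Paragraph paragraph = new Paragraph(phrase);
            paragraph.setSpacingAfter(12f);
            document.add(paragraph);
        } finally {
            // Closing the document finishes the PDF and closes the output stream
            document.close();
        }
    }
}
```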

Code Example

The following code example demonstrates the simplest way to generate a PDF from an HTML file: it binds a PdfWriter to a new Document, obtains the singleton XMLWorkerHelper instance, and has it parse the HTML file and add the resulting elements to the document through the writer.

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;

import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.tool.xml.XMLWorkerHelper;

public class Html2Pdf {
    private static final String HTML = "html.html";
    private static final String PDF = "html.pdf";

    public static void main(String[] args) {
        Document document = new Document();
        try (InputStream html = new FileInputStream(HTML)) {
            // Get a PdfWriter bound to the document; it writes to html.pdf
            PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(PDF));
            document.open();
            // Get the singleton XMLWorkerHelper and parse the XHTML input stream,
            // adding the resulting elements to the document through the writer
            XMLWorkerHelper.getInstance().parseXHtml(writer, document, html);
        } catch (IOException | DocumentException e) {
            e.printStackTrace();
        } finally {
            // Closing the document finishes the PDF and closes the output stream
            if (document.isOpen()) {
                document.close();
            }
        }
    }
}
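The three-argument parseXHtml() call does not let you state the input file's encoding. A minimal variation, assuming the XML Worker 5.x overload that additionally takes a java.nio.charset.Charset, reads the file explicitly as UTF-8 so that non-ASCII characters are decoded correctly. The class name and html-utf8.pdf output file are hypothetical; rendering non-Latin scripts may additionally require a font that contains those glyphs.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import com.itextpdf.text.Document;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.tool.xml.XMLWorkerHelper;

public class Html2PdfUtf8 {

    public static void main(String[] args) throws Exception {
        Document document = new Document();
        try (InputStream html = new FileInputStream("html.html")) {
            PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream("html-utf8.pdf"));
            document.open();
            // Same parsing step as before, but with the input encoding stated explicitly
            XMLWorkerHelper.getInstance().parseXHtml(writer, document, html, StandardCharsets.UTF_8);
        } finally {
            // Finish the PDF only if the document was actually opened
            if (document.isOpen()) {
                document.close();
            }
        }
    }
}
```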

Maven Dependency

To compile and run the code above, add the following dependencies to the project's pom.xml file, and define the itextpdf.version and xmlworker.version properties in its <properties> section.

<dependency>
   <groupId>com.itextpdf</groupId>
   <artifactId>itextpdf</artifactId>
   <version>${itextpdf.version}</version>
</dependency>
<dependency>
   <groupId>com.itextpdf.tool</groupId>
   <artifactId>xmlworker</artifactId>
   <version>${xmlworker.version}</version>
</dependency>

You can find more information about iText here.

Flying Saucer

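Flying Saucer is a Java library for rendering XML/XHTML with CSS into PDF or images. Its PDF output is built on iText (the flying-saucer-pdf module) or on OpenPDF, an open-source fork of iText (the flying-saucer-pdf-openpdf module used below).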

Code Example

The following code demonstrates how to use Flying Saucer (whose classes live in the org.xhtmlrenderer package) together with the Jsoup library.
Jsoup is an open-source Java library for parsing, extracting, and manipulating data from HTML files. Flying Saucer only accepts well-formed XHTML, so we first open the HTML file as a File object, pass it to Jsoup to parse, and then re-serialize the result using XML syntax; you can find more about this here.
After converting the HTML file, we pass the resulting XHTML string to Flying Saucer to render it as a PDF.

import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Entities;

import org.xhtmlrenderer.pdf.ITextRenderer;

public class Main {

    public static void main(String[] args) throws Exception {
        // open the HTML file from its path
        File in = new File("html.html");

        try (OutputStream os = new FileOutputStream("out.pdf")) {
            // parse the file; a null charset lets Jsoup detect it (falling back to UTF-8)
            Document document = Jsoup.parse(in, null);

            // serialize the parsed HTML as well-formed XHTML, which Flying Saucer requires
            document.outputSettings()
                    .syntax(Document.OutputSettings.Syntax.xml)
                    .escapeMode(Entities.EscapeMode.xhtml);

            ITextRenderer iTextRenderer = new ITextRenderer();
            // the base URL resolves relative CSS and image links against the input file
            iTextRenderer.setDocumentFromString(document.html(), in.toURI().toString());
            iTextRenderer.layout();
            iTextRenderer.createPDF(os);
            System.out.println("PDF created");
        }
    }
}

Maven Dependency

To compile and run the code above, add the following dependencies to the project's pom.xml file.

<dependency>
   <groupId>org.jsoup</groupId>
   <artifactId>jsoup</artifactId>
   <version>1.14.3</version>
</dependency>
<dependency>
   <groupId>org.xhtmlrenderer</groupId>
   <artifactId>flying-saucer-core</artifactId>
   <version>9.1.22</version>
</dependency>

<dependency>
   <groupId>org.xhtmlrenderer</groupId>
   <artifactId>flying-saucer-pdf-openpdf</artifactId>
   <version>9.1.22</version>
</dependency>

You can find more information about Flying Saucer here.

Comparison

After introducing each library, we need to know which one suits our application. First, note that Flying Saucer produces its PDFs through iText (or OpenPDF, a fork of iText), so the two share the same underlying PDF engine. Openhtmltopdf, on the other hand, is a fork of Flying Saucer that uses Apache PDFBox instead. PDFBox is a well-maintained, open-source library released under the Apache License 2.0 (openhtmltopdf itself is LGPL), while iText 5 and later are licensed under the AGPL (or a commercial license). Openhtmltopdf is also generally considered faster than Flying Saucer.

iText is often considered more resource-efficient than PDFBox because it processes content chunk by chunk, and it also has an event-oriented architecture. On the other hand, openhtmltopdf provides built-in plugins for SVG and MathML and better support for CSS3 transforms. One drawback of openhtmltopdf is that it does not support OpenType fonts.
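Because of the OpenType limitation, custom text in openhtmltopdf is usually rendered with TrueType (.ttf) fonts registered on the builder. A minimal sketch, assuming a TrueType file at the hypothetical path fonts/DejaVuSans.ttf, registers it with useFont(File, family) under a family name that the page's CSS then references. The class name and font.pdf output file are also illustrative.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;

import com.openhtmltopdf.pdfboxout.PdfRendererBuilder;

public class CustomFontPdf {

    public static void main(String[] args) throws Exception {
        // Hypothetical font path: any TrueType (.ttf) file can be used here
        File font = new File("fonts/DejaVuSans.ttf");
        if (!font.isFile()) {
            System.err.println("Font not found: " + font);
            return;
        }

        // The CSS font-family must match the family name passed to useFont()
        String xhtml = "<html><head><style>body { font-family: 'DejaVu Sans'; }</style></head>"
                + "<body><p>Custom font text</p></body></html>";

        try (OutputStream os = new FileOutputStream("font.pdf")) {
            PdfRendererBuilder builder = new PdfRendererBuilder();
            builder.useFastMode();
            // Register the TrueType file under the family name used in the CSS
            builder.useFont(font, "DejaVu Sans");
            builder.withHtmlContent(xhtml, new File(".").toURI().toString());
            builder.toStream(os);
            builder.run();
        }
    }
}
```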

Conclusion

In this article, we showed how to generate PDFs from HTML files in Java. We briefly introduced three libraries, openhtmltopdf, iText, and Flying Saucer, and compared them in terms of licensing, performance, resource usage, and features.

Finally, if you want a tool with all the features of these libraries and more, I recommend checking out APITemplate.io. APITemplate.io helps you generate PDFs quickly through a cloud-based PDF generation API, is compatible with CSS, JavaScript, and Python, and comes with predefined templates that you can reuse and edit.
