Generating PDFs from custom HTML content or directly from website URLs is a common requirement for many real-world applications. In the past, this required writing a lot of custom code and took a lot of time, but now there are many libraries and tools that can do this with just a few lines of code.
In this article, we will explore some approaches to generating PDFs from HTML using Java.
Prerequisites
This article will use Maven for dependency management. If you do not have Maven, you can visit the official website for instructions on how to get started: https://maven.apache.org/
Converting HTML to PDF using Java Libraries
iText
iText is a popular Java library that can be used to generate PDFs from HTML content or website URLs. It provides a simple API to control PDF generation and supports various features such as encryption, digital signatures, and watermarks.
The library is open-source and actively maintained, making it a reliable choice for PDF generation in Java applications.
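Add iText to your Maven project
The HtmlConverter class used below ships in iText's html2pdf (pdfHTML) add-on, which pulls in the iText 7 core modules transitively. Pick the html2pdf release that matches your iText core version; 2.1.6 is the release line built against iText 7.1.9.
<dependency>
<groupId>com.itextpdf</groupId>
<artifactId>html2pdf</artifactId>
<version>2.1.6</version>
</dependency>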
Generate PDF from a website URL
import com.itextpdf.html2pdf.HtmlConverter;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

public class Main {
    public static void main(String[] args) {
        String websiteUrl = "https://www.google.com";
        // try-with-resources closes the output file even if the conversion fails
        try (OutputStream out = new FileOutputStream("google.pdf")) {
            String htmlContent = fetchHtmlContentFromUrl(websiteUrl);
            HtmlConverter.convertToPdf(htmlContent, out);
            System.out.println("PDF generated from website URL.");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static String fetchHtmlContentFromUrl(String urlString) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) new URL(urlString).openConnection();
        connection.setRequestMethod("GET");
        // The "\\A" delimiter matches only at the start of input, so next() returns the
        // whole response with its whitespace intact (this assumes the page is UTF-8)
        try (Scanner scanner = new Scanner(connection.getInputStream(), StandardCharsets.UTF_8.name())) {
            scanner.useDelimiter("\\A");
            return scanner.hasNext() ? scanner.next() : "";
        } finally {
            connection.disconnect();
        }
    }
}
In the above code, we generate a PDF from a website URL. HtmlConverter works on HTML that you supply rather than on a URL, so we first download the page in fetchHtmlContentFromUrl and then pass the markup to HtmlConverter.convertToPdf, which writes the PDF to google.pdf. Because the converter only sees an HTML string, relative links to stylesheets and images will generally not resolve unless a base URI is configured.
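One detail the fetch step glosses over is character encoding. The sketch below uses only JDK classes to pick the charset from a Content-Type header value and decode the whole response body with it, falling back to UTF-8. The class and method names (HtmlDecoding, charsetFrom, decode) are illustrative assumptions and not part of iText.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper: decodes a fetched page using the charset the server declared.
final class HtmlDecoding {

    /** Returns the charset named in a Content-Type value, or UTF-8 if none is usable. */
    static Charset charsetFrom(String contentType) {
        if (contentType == null) {
            return StandardCharsets.UTF_8;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).replace("\"", "").trim();
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    return StandardCharsets.UTF_8; // unknown or malformed charset name
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    /** Reads the entire stream, keeping whitespace and line breaks intact. */
    static String decode(InputStream in, String contentType) {
        try {
            return new String(in.readAllBytes(), charsetFrom(contentType));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] body = "<p>caf\u00e9 au lait</p>".getBytes(StandardCharsets.ISO_8859_1);
        String html = decode(new ByteArrayInputStream(body), "text/html; charset=ISO-8859-1");
        System.out.println(html);
    }
}
```

With an HttpURLConnection, the call would look like decode(connection.getInputStream(), connection.getContentType()).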
Generate PDF from Custom HTML content
import com.itextpdf.html2pdf.HtmlConverter;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class Main {
    public static void main(String[] args) {
        String customHtmlContent = "<html><body><h1>Hello, World!</h1></body></html>";
        // try-with-resources closes the output file even if the conversion fails
        try (OutputStream out = new FileOutputStream("custom_output.pdf")) {
            HtmlConverter.convertToPdf(customHtmlContent, out);
            System.out.println("PDF generated from custom HTML content.");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
iText supports generating PDFs from custom HTML content. In the above code, we pass our custom HTML to HtmlConverter.convertToPdf, which writes the PDF to the custom_output.pdf file.
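In practice the custom HTML is usually assembled from application data, and any value that contains characters such as < or & has to be escaped before it is placed in the markup. The following is a minimal sketch of that step using only the JDK; HtmlBuilder, escapeHtml, and greetingPage are hypothetical names, and the resulting string is what would be handed to HtmlConverter.convertToPdf.

```java
// Hypothetical helper: builds the HTML string that is later converted to PDF.
final class HtmlBuilder {

    /** Replaces the five characters that are significant in HTML text and attributes. */
    static String escapeHtml(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&#39;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Wraps an escaped value in the same page structure as the example above. */
    static String greetingPage(String name) {
        return "<html><body><h1>Hello, " + escapeHtml(name) + "!</h1></body></html>";
    }

    public static void main(String[] args) {
        System.out.println(greetingPage("Tom & Jerry"));
    }
}
```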
Flying Saucer
Flying Saucer is a Java library that renders well-formed XHTML styled with CSS to PDF, using iText for the PDF output. It simplifies PDF generation in Java applications when you control the markup.
Add Flying Saucer to your Maven project
<dependency>
<groupId>org.xhtmlrenderer</groupId>
<artifactId>flying-saucer-pdf</artifactId>
<version>9.1.22</version>
</dependency>
Generate PDF from a website URL
import com.lowagie.text.DocumentException;
import org.xhtmlrenderer.pdf.ITextRenderer;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

public class Main {
    public static void main(String[] args) {
        String websiteUrl = "https://www.google.com";
        // try-with-resources closes the output file even if rendering fails
        try (OutputStream out = new FileOutputStream("google.pdf")) {
            String htmlContent = fetchHtmlContentFromUrl(websiteUrl);
            ITextRenderer renderer = new ITextRenderer();
            // The markup is parsed as XML, so it must be well-formed XHTML
            renderer.setDocumentFromString(htmlContent);
            renderer.layout();
            renderer.createPDF(out);
            System.out.println("PDF generated from website URL.");
        } catch (IOException | DocumentException e) {
            e.printStackTrace();
        }
    }

    public static String fetchHtmlContentFromUrl(String urlString) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) new URL(urlString).openConnection();
        connection.setRequestMethod("GET");
        // The "\\A" delimiter makes next() return the whole response, whitespace included
        try (Scanner scanner = new Scanner(connection.getInputStream(), StandardCharsets.UTF_8.name())) {
            scanner.useDelimiter("\\A");
            return scanner.hasNext() ? scanner.next() : "";
        } finally {
            connection.disconnect();
        }
    }
}
As with iText, this example fetches the page itself using HttpURLConnection and then passes the markup to ITextRenderer, which lays out the document and writes the PDF. Flying Saucer parses its input as XML, so the page must be well-formed XHTML. Many real-world pages are not and need to be cleaned up before they can be rendered.
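A quick way to find out whether fetched markup will get past Flying Saucer's XML parsing is to run it through the JDK's own XML parser first. The sketch below does only that; XhtmlCheck and isWellFormed are hypothetical names, and the check says nothing about whether the CSS is supported. One caveat: markup that declares a DOCTYPE with an external DTD may cause the JDK parser to try to download that DTD.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical pre-flight check run before handing markup to an XML-based renderer.
final class XhtmlCheck {

    /** Returns true if the markup parses as well-formed XML. */
    static boolean isWellFormed(String markup) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException("XML parser could not be used", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<html><body><p>ok</p></body></html>"));
        System.out.println(isWellFormed("<html><body><br></body></html>"));
    }
}
```

The second call fails the check because the br element is never closed, which is legal HTML but not legal XML.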
Generate PDF from Custom HTML content
import com.lowagie.text.DocumentException;
import org.xhtmlrenderer.pdf.ITextRenderer;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class Main {
    public static void main(String[] args) {
        // Flying Saucer parses its input as XML, so the markup must be well-formed
        String customHtmlContent = "<html><body><h1>Hello, World!</h1></body></html>";
        try (OutputStream out = new FileOutputStream("custom_output.pdf")) {
            ITextRenderer renderer = new ITextRenderer();
            renderer.setDocumentFromString(customHtmlContent);
            renderer.layout();
            renderer.createPDF(out);
            System.out.println("PDF generated from custom HTML content.");
        } catch (IOException | DocumentException e) {
            e.printStackTrace();
        }
    }
}
Generating a PDF from custom HTML is fairly straightforward. We use ITextRenderer to create the document from the custom HTML and then call createPDF to write the PDF to our output file.
Apache PDFBox
Apache PDFBox is a Java library that provides functionality for creating and manipulating PDF documents. It supports encryption, digital signatures, and various other features related to PDF documents, and its API makes it easy to work with PDF files. PDFBox does not include an HTML renderer, so the examples in this section write the fetched or supplied content onto a page as plain text rather than rendering the markup. It is open-source and actively maintained, making it a reliable choice for PDF generation in Java applications.
Add Apache PDFBox to your Maven project
<dependency>
<groupId>org.apache.pdfbox</groupId>
<artifactId>pdfbox</artifactId>
<version>2.0.24</version>
</dependency>
Generate PDF from a website URL
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.font.PDType1Font;
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

public class Main {
    public static void main(String[] args) {
        try {
            String websiteUrl = "https://www.google.com";
            String htmlContent = fetchHtmlContentFromUrl(websiteUrl);
            createPdfFromText("website_output.pdf", htmlContent);
            System.out.println("PDF generated from website URL.");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static String fetchHtmlContentFromUrl(String urlString) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) new URL(urlString).openConnection();
        connection.setRequestMethod("GET");
        // The "\\A" delimiter makes next() return the whole response, whitespace included
        try (Scanner scanner = new Scanner(connection.getInputStream(), StandardCharsets.UTF_8.name())) {
            scanner.useDelimiter("\\A");
            return scanner.hasNext() ? scanner.next() : "";
        } finally {
            connection.disconnect();
        }
    }

    public static void createPdfFromText(String fileName, String textContent) throws IOException {
        // showText rejects characters the font cannot encode (line breaks, tabs, most
        // non-Latin text), so reduce the input to printable ASCII first
        String printable = textContent.replaceAll("[^\\x20-\\x7E]", " ");
        try (PDDocument document = new PDDocument()) {
            PDPage page = new PDPage(PDRectangle.A4);
            document.addPage(page);
            try (PDPageContentStream contentStream = new PDPageContentStream(document, page)) {
                contentStream.setFont(PDType1Font.HELVETICA, 12);
                contentStream.beginText();
                contentStream.newLineAtOffset(50, 700);
                // Written as a single line: text wider than the page is cut off, not wrapped
                contentStream.showText(printable);
                contentStream.endText();
            }
            document.save(fileName);
        }
    }
}
To generate a PDF from a website URL, we first fetch the website content using HttpURLConnection. Once we have the content, we call createPdfFromText, which uses PDDocument and PDPageContentStream to create the PDF. Note that the HTML source is written to the page as literal text; PDFBox does not interpret the tags.
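Because showText writes exactly one line and does not wrap, longer text has to be split into lines before it is drawn. The sketch below shows one simple way to do that with the JDK alone: a greedy word wrap by character count. TextWrap and wrap are hypothetical names, and counting characters is only an approximation of real line width; measuring with the font's string width would be more accurate.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits text into lines that can be drawn one at a time.
final class TextWrap {

    /** Greedy word wrap: each returned line has at most maxChars characters. */
    static List<String> wrap(String text, int maxChars) {
        if (maxChars < 1) {
            throw new IllegalArgumentException("maxChars must be at least 1");
        }
        List<String> lines = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // happens only when the input is blank
            }
            // Hard-split any word that is longer than a whole line
            while (word.length() > maxChars) {
                if (current.length() > 0) {
                    lines.add(current.toString());
                    current.setLength(0);
                }
                lines.add(word.substring(0, maxChars));
                word = word.substring(maxChars);
            }
            // Start a new line if the word (plus a space) no longer fits
            if (current.length() > 0 && current.length() + 1 + word.length() > maxChars) {
                lines.add(current.toString());
                current.setLength(0);
            }
            if (current.length() > 0) {
                current.append(' ');
            }
            current.append(word);
        }
        if (current.length() > 0) {
            lines.add(current.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : wrap("the quick brown fox jumps over the lazy dog", 16)) {
            System.out.println(line);
        }
    }
}
```

Inside createPdfFromText, each returned line would be written with showText and followed by a newLineAtOffset call with a negative vertical offset to move down to the next line.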
Generate PDF from Custom HTML content
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.font.PDType1Font;
import java.io.IOException;

public class Main {
    public static void main(String[] args) {
        try {
            // Plain text: PDFBox would print any HTML tags literally
            String customHtmlContent = "Hello, World!";
            createPdfFromText("custom_output.pdf", customHtmlContent);
            System.out.println("PDF generated from custom content.");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void createPdfFromText(String fileName, String textContent) throws IOException {
        // try-with-resources closes the stream and the document even if writing fails
        try (PDDocument document = new PDDocument()) {
            PDPage page = new PDPage(PDRectangle.A4);
            document.addPage(page);
            try (PDPageContentStream contentStream = new PDPageContentStream(document, page)) {
                contentStream.setFont(PDType1Font.HELVETICA, 12);
                contentStream.beginText();
                contentStream.newLineAtOffset(50, 700);
                contentStream.showText(textContent);
                contentStream.endText();
            }
            document.save(fileName);
        }
    }
}
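Apache PDFBox does not convert HTML out of the box, which is why this example passes plain text instead of markup. We use PDDocument to create the document and PDPageContentStream to write the text onto the page. To render real HTML, you would need an HTML-to-PDF layer on top of PDFBox or one of the other libraries covered above.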
Comparison of the libraries
When it comes to PDF generation in Java, several prominent libraries emerge as popular choices among developers. Among these are iText, Flying Saucer, and Apache PDFBox. Each of these libraries serves distinct use cases and offers its own set of features.
iText is a versatile tool known for its comprehensive capabilities around PDF creation and manipulation. Flying Saucer, while leveraging an older version of iText, specifically focuses on converting HTML content into PDF. On the other hand, Apache PDFBox excels as a general-purpose PDF library with a suite of tools for various PDF operations.
Below is a detailed comparison table to further delineate their strengths and offerings.
Criterion | iText | Flying Saucer | Apache PDFBox |
---|---|---|---|
License | AGPL / Commercial License | LGPL | Apache License 2.0 |
Main Focus | General-purpose PDF library | HTML to PDF conversion (using iText) | General-purpose PDF library |
Programming Language | Java, .NET | Java | Java |
Main Features | – PDF creation & manipulation – Encryption, digital signatures, watermarks | – HTML/CSS to PDF conversion (based on iText 2.1.7) | – PDF generation & manipulation – PDF rendering |
Complexity/Ease of Use | Medium to high complexity depending on the use case | Relatively easy for HTML to PDF scenarios | Medium complexity |
Performance | High-performance, optimized for large-scale PDF operations | Good for simple HTML/CSS but can be slow for complex designs | Good performance for general PDF operations |
Documentation | Comprehensive documentation, many tutorials, and books available | Adequate documentation, mainly from the community | Comprehensive official documentation and active community support |
Extensibility | Highly extensible with plugins and add-ons | Limited to its primary use case | Extensible and customizable |
Maintainability | Actively maintained with a strong community | Less actively maintained compared to iText or PDFBox | Actively maintained by the Apache Software Foundation |
Use Cases | – Dynamic PDF generation from data – PDF manipulations like merging, splitting etc. – PDF forms and annotations | – Convert web pages or HTML content to PDFs | – PDF extraction and parsing – PDF rendering for previews |
Converting HTML to PDF using APITemplate.io
The examples above demonstrate how Java libraries can be used to convert HTML and web pages to PDF. However, generating PDFs using templates or keeping track of generated PDFs requires additional work. A PDF generator tracker is necessary to manage the files generated. If custom templates such as invoice generators are desired, they must also be created and managed.
APITemplate.io offers an HTML to PDF API that covers these needs. We use a Chromium-based PDF renderer that supports HTML and CSS.
Let’s see how we can use APITemplate.io to generate PDFs.
i. Generate PDF from Predefined Template
APITemplate.io allows you to manage your templates. Go to “Manage Templates” from the dashboard.
From Manage Template, you can create your own templates. The following is a sample invoice template. There are many templates available that you can choose from and customize based on your requirements.
To start using the APITemplate.io APIs, you need your API key, which you can find in the API Integration tab.
Now that you have your APITemplate account ready, let’s take some action and integrate it with our application. We will use the template to generate PDFs.
import okhttp3.*;
import java.io.IOException;
public class Main {
public static void main(String[] args) {
OkHttpClient client = new OkHttpClient().newBuilder()
.build();
MediaType mediaType = MediaType.parse("application/json");
RequestBody body = RequestBody.create(mediaType, "{\n \"date\": \"15/05/2022\",\n \"invoice_no\": \"435568799\",\n \"sender_address1\": \"3244 Jurong Drive\",\n \"sender_address2\": \"Falmouth Maine 1703\",\n \"sender_phone\": \"255-781-6789\",\n \"sender_email\": \"[email protected]\",\n \"rece_addess1\": \"2354 Lakeside Drive\",\n \"rece_addess2\": \"New York 234562\",\n \"rece_phone\": \"34333-84-223\",\n \"rece_email\": \"[email protected]\",\n \"items\": [\n {\"item_name\": \"Oil\", \"unit\": 1, \"unit_price\": 100, \"total\": 100},\n {\"item_name\": \"Rice\", \"unit\": 2, \"unit_price\": 200, \"total\": 400},\n {\"item_name\": \"Mangoes\", \"unit\": 3, \"unit_price\": 300, \"total\": 900},\n {\"item_name\": \"Cloth\", \"unit\": 4, \"unit_price\": 400, \"total\": 1600},\n {\"item_name\": \"Orange\", \"unit\": 7, \"unit_price\": 20, \"total\": 140},\n {\"item_name\": \"Mobiles\", \"unit\": 1, \"unit_price\": 500, \"total\": 500},\n {\"item_name\": \"Bags\", \"unit\": 9, \"unit_price\": 60, \"total\": 540},\n {\"item_name\": \"Shoes\", \"unit\": 2, \"unit_price\": 30, \"total\": 60}\n ],\n \"total\": \"total\",\n \"footer_email\": \"[email protected]\"\n}");
Request request = new Request.Builder()
.url("https://rest.apitemplate.io/v2/create-pdf?template_id=YOUR_TEMPLATE_ID")
.method("POST", body)
.addHeader("X-API-KEY", "YOUR_API_KEY")
.addHeader("Content-Type", "application/json")
.build();
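// try-with-resources closes the response; check the HTTP status before using the body
try (Response response = client.newCall(request).execute()) {
String responseBody = response.body().string();
if (response.isSuccessful()) {
System.out.println(responseBody);
} else {
System.err.println("Request failed with HTTP " + response.code() + ": " + responseBody);
}
} catch (IOException e) {
System.err.println("Error: " + e.getMessage());
}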
}
}
Printing the response body gives the following:
{
"download_url":"PDF_URL",
"transaction_ref":"8cd2aced-b2a2-40fb-bd45-392c777d6f6",
"status":"success",
"template_id":"YOUR_TEMPLATE_ID"
}
As the code above shows, converting HTML to PDF with APITemplate.io needs no PDF rendering library in the application. We call a single API endpoint with our data as the JSON request body, and that's it!
You can use the download_url from the response to download or distribute the generated PDF.
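To act on the response programmatically, the download_url value has to be pulled out of the JSON. The sketch below does this with a regular expression and only the JDK; it is a deliberately naive approach that assumes a flat response in the shape shown above with no escaped characters inside the URL. ResponseFields and downloadUrl are hypothetical names, and a JSON library is the more robust choice in real code.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts one string field from a flat JSON response.
final class ResponseFields {

    // Matches "download_url" : "<anything up to the next double quote>"
    private static final Pattern DOWNLOAD_URL =
            Pattern.compile("\"download_url\"\\s*:\\s*\"([^\"]*)\"");

    /** Returns the download_url value if the field is present. */
    static Optional<String> downloadUrl(String json) {
        Matcher m = DOWNLOAD_URL.matcher(json);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Placeholder values, in the same shape as the sample response
        String sample = "{\"download_url\":\"PDF_URL\",\"status\":\"success\"}";
        System.out.println(downloadUrl(sample).orElse("no download_url in response"));
    }
}
```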
ii. Generate PDF from Website URL
ApiTemplate also supports generating PDFs from website URLs.
import okhttp3.*;
import java.io.IOException;
public class Main {
public static void main(String[] args) {
OkHttpClient client = new OkHttpClient().newBuilder()
.build();
MediaType mediaType = MediaType.parse("application/json");
String customHeaderFooterStyle = "<style>#header, #footer { padding: 0 !important; }</style>" +
"<table style=\"width: 100%; padding: 0px 5px;margin: 0px!important;font-size: 15px\">" +
"<tr>" +
"<td style=\"text-align:left; width:30%!important;\"><span class=\"date\"></span></td>" +
"<td style=\"text-align:center; width:30%!important;\"><span class=\"pageNumber\"></span></td>" +
"<td style=\"text-align:right; width:30%!important;\"><span class=\"totalPages\"></span></td>" +
"</tr>" +
"</table>";
// The header/footer HTML contains double quotes, which must be escaped to keep the JSON valid
String headerFooterJson = customHeaderFooterStyle.replace("\"", "\\\"");
RequestBody body = RequestBody.create(mediaType, "{\n" +
" \"url\": \"https://en.wikipedia.org/wiki/Sceloporus_malachiticus\",\n" +
" \"settings\": {\n" +
" \"paper_size\": \"A4\",\n" +
" \"orientation\": \"1\",\n" +
" \"header_font_size\": \"9px\",\n" +
" \"margin_top\": \"40\",\n" +
" \"margin_right\": \"10\",\n" +
" \"margin_bottom\": \"40\",\n" +
" \"margin_left\": \"10\",\n" +
" \"print_background\": \"1\",\n" +
" \"displayHeaderFooter\": true,\n" +
" \"custom_header\": \"" + headerFooterJson + "\",\n" +
" \"custom_footer\": \"" + headerFooterJson + "\"\n" +
" }\n" +
"}");
Request request = new Request.Builder()
.url("https://rest.apitemplate.io/v2/create-pdf-from-url")
.method("POST", body)
.addHeader("X-API-KEY", "YOUR_API_KEY")
.build();
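// try-with-resources closes the response; check the HTTP status before reporting success
try (Response response = client.newCall(request).execute()) {
String responseBody = response.body().string();
if (response.isSuccessful()) {
System.out.println("PDF generated successfully: " + responseBody);
} else {
System.err.println("Request failed with HTTP " + response.code() + ": " + responseBody);
}
} catch (IOException e) {
System.err.println("Error: " + e.getMessage());
}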
}
}
In the above code, we can provide the URL in the request body along with the settings for the PDF. APITemplate will use this request body to generate a PDF and return a download URL for your PDF.
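Since the request body is assembled by string concatenation, any double quote or backslash inside a value, such as the HTML used for the header and footer, has to be escaped or the JSON becomes invalid. The sketch below is a minimal escaping helper built on the JDK alone; JsonText and quote are hypothetical names, and a JSON library such as Jackson or Gson would normally take care of this.

```java
// Hypothetical helper: turns a Java string into a quoted JSON string literal.
final class JsonText {

    /** Returns the value wrapped in double quotes with JSON escapes applied. */
    static String quote(String value) {
        StringBuilder sb = new StringBuilder(value.length() + 2).append('"');
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters need a \\uXXXX escape
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        String header = "<td style=\"text-align:left\"><span class=\"date\"></span></td>";
        System.out.println("{ \"custom_header\": " + quote(header) + " }");
    }
}
```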
iii. Generate PDF from Custom HTML Content
If you want to generate PDFs using your own custom HTML content, ApiTemplate also supports that.
import okhttp3.*;
import java.io.IOException;
public class Main {
public static void main(String[] args) {
OkHttpClient client = new OkHttpClient().newBuilder()
.build();
MediaType mediaType = MediaType.parse("application/json");
String customHeaderFooterStyle = "<style>#header, #footer { padding: 0 !important; }</style>" +
"<table style=\"width: 100%; padding: 0px 5px;margin: 0px!important;font-size: 15px\">" +
"<tr>" +
"<td style=\"text-align:left; width:30%!important;\"><span class=\"date\"></span></td>" +
"<td style=\"text-align:center; width:30%!important;\"><span class=\"pageNumber\"></span></td>" +
"<td style=\"text-align:right; width:30%!important;\"><span class=\"totalPages\"></span></td>" +
"</tr>" +
"</table>";
// The header/footer HTML contains double quotes, which must be escaped to keep the JSON valid
String headerFooterJson = customHeaderFooterStyle.replace("\"", "\\\"");
RequestBody body = RequestBody.create(mediaType, "{\n" +
" \"body\": \"<h1> hello world {{name}} </h1>\",\n" +
" \"css\": \"<style>.bg{background: red;}</style>\",\n" +
" \"data\": {\n" +
" \"name\": \"This is a title\"\n" +
" },\n" +
" \"settings\": {\n" +
" \"paper_size\": \"A4\",\n" +
" \"orientation\": \"1\",\n" +
" \"header_font_size\": \"9px\",\n" +
" \"margin_top\": \"40\",\n" +
" \"margin_right\": \"10\",\n" +
" \"margin_bottom\": \"40\",\n" +
" \"margin_left\": \"10\",\n" +
" \"print_background\": \"1\",\n" +
" \"displayHeaderFooter\": true,\n" +
" \"custom_header\": \"" + headerFooterJson + "\",\n" +
" \"custom_footer\": \"" + headerFooterJson + "\"\n" +
" }\n" +
"}");
Request request = new Request.Builder()
.url("https://rest.apitemplate.io/v2/create-pdf-from-html")
.method("POST", body)
.addHeader("X-API-KEY", "YOUR_API_KEY")
.build();
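// try-with-resources closes the response; check the HTTP status before reporting success
try (Response response = client.newCall(request).execute()) {
String responseBody = response.body().string();
if (response.isSuccessful()) {
System.out.println("PDF generated successfully: " + responseBody);
} else {
System.err.println("Request failed with HTTP " + response.code() + ": " + responseBody);
}
} catch (IOException e) {
System.err.println("Error: " + e.getMessage());
}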
}
}
Similar to generating a PDF from a website URL, the API request above takes the body and CSS as part of the payload to generate a PDF.
Performance Considerations
When generating PDFs from HTML content, using open-source third-party libraries can work well in most cases. However, when it comes to scaling and handling edge cases, you need to handle those issues yourself.
This is where APITemplate.io comes in handy. It eliminates the need to worry about performance or scaling issues as it manages them for you. Even error situations are handled by APITemplate.io.
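If you do run one of the libraries yourself, one common way to keep PDF generation from overwhelming the server is to cap how many documents are rendered at once. The sketch below does this with a fixed-size thread pool from the JDK. PdfBatch and renderAll are hypothetical names, and the renderer parameter is a stand-in for whichever library call you use to turn HTML into PDF bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper: renders many documents with a bounded number of worker threads.
final class PdfBatch {

    /** Renders every document, at most maxParallel at a time, preserving input order. */
    static List<byte[]> renderAll(List<String> htmlDocs,
                                  Function<String, byte[]> renderer,
                                  int maxParallel) {
        ExecutorService pool = Executors.newFixedThreadPool(maxParallel);
        try {
            List<Future<byte[]>> futures = new ArrayList<>();
            for (String html : htmlDocs) {
                Callable<byte[]> task = () -> renderer.apply(html);
                futures.add(pool.submit(task));
            }
            List<byte[]> results = new ArrayList<>();
            for (Future<byte[]> future : futures) {
                try {
                    results.add(future.get()); // blocks until this document is done
                } catch (ExecutionException e) {
                    throw new IllegalStateException("Rendering failed", e.getCause());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while rendering", e);
                }
            }
            return results;
        } finally {
            pool.shutdownNow(); // release the worker threads
        }
    }

    public static void main(String[] args) {
        // Stub renderer for demonstration; a real one would call a PDF library
        Function<String, byte[]> stub = html -> html.getBytes(StandardCharsets.UTF_8);
        List<byte[]> pdfs = renderAll(Arrays.asList("<p>a</p>", "<p>b</p>"), stub, 2);
        System.out.println(pdfs.size() + " documents rendered");
    }
}
```

A fixed pool bounds memory and CPU use but does nothing about retries, timeouts, or queue limits, which still have to be handled separately.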
Conclusion
Generating PDFs from HTML is a common requirement for real-world applications, and Java offers several ways to do it in a few lines of code.
This article showed how to use iText and Flying Saucer to convert HTML and web pages to PDF, and how to write text into a PDF with Apache PDFBox. It also introduced APITemplate.io as an API-based PDF generation platform that offers a solution for managing templates and generating PDFs through simple API calls.
Sign up for a free account with us now and start automating your PDF generation.