5 Advanced topics for HTML to PDF conversions

Introduction

The PDF file format is handy for downloading large amounts of data from a web service. It allows users to download dynamic material as a file for offline use. The HTML content is transformed into a PDF document and downloaded as a PDF file using the export to PDF functionality.

The advanced techniques discussed here let you customize the page-level rendering of a PDF. HTML itself has no concept of a page in the way PDF does, so page-level rendering has to be controlled through CSS rules written for print and paged media.

This article takes a detailed look at creating print-friendly web pages using HTML and CSS. You can easily convert print-friendly web pages into PDFs with our HTML to PDF API.
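
As a rough sketch of what page-level styling can look like in CSS, the rules below set a page size and margins and hide screen-only elements when the page is printed or converted. The class name ‘no-print’ and the chosen size and margin values are assumptions made for illustration, and support for ‘@page’ varies between rendering engines, so treat this as a starting point rather than a guaranteed recipe.

CSS

<style type="text/css">
  /* Page box: A4 with 20 mm margins (values chosen for illustration) */
  @page {
    size: A4;
    margin: 20mm;
  }

  /* Rules that apply only when the page is printed or converted */
  @media print {
    /* .no-print is a hypothetical class for screen-only elements */
    .no-print {
      display: none;
    }
  }
</style>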

1. How to force a page break?

Page breaks are used in paged media, such as printed books or documents. When a page breaks, the current page’s layout stops, and the remaining content is laid out on a new page. You can see this in PDF documents where a page ends with empty space and the text continues on the following page. If no page break rules are provided, the page may break inside chunks of content such as text, lists, code snippets, images, and so on.

Three properties influence each possible break point (in other words, each element boundary): the preceding element’s break-after value, the next element’s break-before value, and the containing element’s break-inside value.

The following rules are used to assess whether a break is required:

  • If any of the three concerned values is a forced break value (always, left, right, page, column, or region), it takes precedence. If more than one of them is a forced break, the value of the element that appears latest in the flow is used (i.e., the break-before value has precedence over the break-after value, which itself has precedence over the break-inside value).
  • If any of the three concerned values is an avoid break value (avoid, avoid-page, avoid-region, or avoid-column), no break will be applied at that point.

Once forced breaks have been applied, soft breaks may be added where needed, but not on element boundaries that resolve to a corresponding avoid value.
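
The rules above can be sketched with a few declarations. The class names ‘chapter’ and ‘summary’ are assumptions made for this illustration. Each rule pairs the legacy page-break-* property with its break-* equivalent so that older and newer engines both see a value.

CSS

<style type="text/css">
  /* Forced break: every .chapter starts on a new page */
  .chapter {
    page-break-before: always;
    break-before: page;
  }

  /* Avoid value: try not to break right after a heading */
  h2 {
    page-break-after: avoid;
    break-after: avoid;
  }

  /* Avoid value: try not to split a .summary block across pages */
  .summary {
    page-break-inside: avoid;
    break-inside: avoid;
  }
</style>

With these rules, an h2 immediately followed by a .chapter element should still get a break between them, because the forced value on .chapter takes precedence over the avoid value on the heading.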

Example:

Setting the ‘page-break-before: always’ and ‘page-break-after: always’ styles for an HTML element will insert page breaks before and after that element in the produced PDF document.

Add <p style="page-break-before: always"> at the point where a new printed page should start to force a page break. For example, the following markup produces three pages:

HTML

This is the text for page #1.

<p style="page-break-before: always">Page #2</p>

<p style="page-break-before: always">Page #3</p>

Output PDF

2. How to avoid page breaks inside elements?

The page may break in the middle of an element, such as an image, causing it to be split and span two pages. This is typically an unfavorable outcome.

The page-break-inside property in CSS controls whether a page break may occur inside the element to which it is applied when the document is printed. It cannot insert a break: with the value ‘avoid’ it asks the renderer not to break within the element, and with the default value ‘auto’ it leaves the decision to the renderer.

Browsers should treat page-break-inside as an alias of break-inside for compatibility reasons. This ensures that sites that use page-break-inside continue to function correctly.
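
A minimal sketch of this alias relationship: declaring both properties lets engines that only know the legacy name and engines that only know break-inside apply the same behavior. The selectors are assumptions chosen for illustration.

CSS

<style type="text/css">
  /* Keep figures, code blocks, and table rows on a single page where possible */
  figure,
  pre,
  tr {
    page-break-inside: avoid; /* legacy name */
    break-inside: avoid;      /* current name */
  }
</style>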

CSS

<style type="text/css">
  body {
    background: white;
  }

  li {
    background: grey;
    height: 100px;
    font-size: 50px;
    page-break-inside: avoid;
    list-style-type: none;
    padding: 2px 0 0 0;
    text-align: center;
  }

  li:nth-child(odd) {
    background: lightgrey;
  }
</style>

HTML

<h1>ExampleOne</h1>

<h2>CSS page-break-inside property</h2>

<br>

<ul>
    <li>Data Structure</li>
    <li>Algorithms</li>
    <li>C Programming</li>
    <li>C++ Programming</li>
    <li>Java Programming</li>
    <li>Python Programming</li>
    <li>PHP Programming</li>
    <li>Operating System</li>
    <li>Computer Networks</li>
    <li>DBMS</li>
    <li>SQL</li>
    <li>TOC</li>
</ul>

PDF Output (Comparison)

Without page-break-inside
With page-break-inside

3. How to set the background of a PDF?

You can set a block’s background to a supplied PDF. Except for the format of the background file, this works identically to the background-image property. If the PDF is not the same size as the element whose background is being set, it is stretched or squashed to fit.

For multi-page PDFs, the page to draw can be selected by appending “#page=page” to the URL. The shorter “#page” syntax is a comparable legacy form.
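
For comparison, the same stretch-to-fit behavior can be sketched with the standard background-image property and an ordinary image. The file name ‘letterhead.png’ and the class name ‘letterhead’ are placeholders, not names taken from this article.

CSS

<style type="text/css">
  /* .letterhead and letterhead.png are hypothetical */
  .letterhead {
    background-image: url("letterhead.png");
    background-repeat: no-repeat;
    /* Stretch or squash the image to the element's exact size */
    background-size: 100% 100%;
  }
</style>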

HTML

<div id="background"></div>

<h1>
  Setting the background color to yellow
</h1>

CSS

<style type="text/css">
  h1 {
    padding: 30px;
  }

  #background {
    background: yellow;
    position: fixed;
    top: 0;
    bottom: 0;
    left: 0;
    right: 0;
    z-index: -1;
  }
</style>

Output PDF

4. How to use emoji in the PDF?

Emojis have become a common feature of digital communication, and nearly all document and communication formats now support them in some form. This, of course, includes PDF.

Including emojis when converting HTML files to PDF documents is straightforward: we simply need to add a font that contains the emojis to a FontProvider so that it can be passed to the HtmlConverter during conversion.

Emojis are characters. Therefore, they can be copied, displayed, and resized like any HTML character.

You can use emojis in the PDF using both HTML and CSS.
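
As a small sketch of both routes, the markup below writes an emoji as a numeric character reference, and the style rule inserts one through generated content. The class name ‘done’ is an assumption; U+1F600 is the grinning face and U+2705 is the check mark emoji. Whether the glyphs appear in the PDF still depends on an emoji font being available to the converter.

HTML

<meta charset="utf-8">

<p>Grinning face: &#x1F600;</p>

<p class="done">Report generated</p>

CSS

<style type="text/css">
  /* .done is a hypothetical class. The first space after \2705 ends the
     escape sequence; the second one is rendered as a real space. */
  .done::before {
    content: "\2705  ";
  }
</style>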

HTML

<h1>Emojis</h1>

<p style="font-size:48px">
😀 😃 😄 😁 😆 😅 😂 🤣 🥲
</p>

CSS

<style>
  @font-face {
    font-family: 'Noto Color Emoji';
    src: url("https://raw.githack.com/googlefonts/noto-emoji/main/fonts/NotoColorEmoji.ttf");
  }

  body {
    font-family: 'Noto Color Emoji', Lusitana;
  }
</style>

Output PDF

5. How to repeat table headers on each page?

Some web pages contain large HTML tables that are split across multiple pages. The header of the HTML table can be repeated at the top of every PDF page that contains a part of that table.

This can be done by adding a CSS property to the HTML table’s thead element.

Suppose you have a 100-row HTML table. When a large table like this is shown in a browser, its content can extend as far as it needs to. In the browser you don’t have to worry about pages: which page your table rows land on, how many rows fit on a page, and so on.

However, if you save this table as a PDF, you’ll have to deal with page breaks. Rows will need to be divided up and shown across many pages if a table is longer than a single page or starts halfway down a page.
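
One way to sketch this in CSS, assuming the rendering engine follows the standard table display model: ‘table-header-group’ and ‘table-footer-group’ are already the default display values for thead and tfoot, so the first two rules only make the intent explicit, and the third asks the engine not to split a single row across two pages.

CSS

<style type="text/css">
  /* Header and footer groups may be repeated on each page the table spans */
  thead {
    display: table-header-group;
  }

  tfoot {
    display: table-footer-group;
  }

  /* Try to keep each row on one page */
  tr {
    page-break-inside: avoid;
    break-inside: avoid;
  }
</style>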

Example

The table headers should ideally be rendered on each page displaying the table’s rows so that the headers may be understood without going back to the previous page.

Table headers that repeat in this manner are supported and enabled by default, so all you have to do is make sure your header is inside a thead element.

When converted to PDF, the following HTML table will repeat the table heading at the top of each page:

<table>
  <thead>
    <tr>
      <th>Qty</th>
      <th>Description</th>
      <th>Price</th>
      <th>Subtotal</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>2</td>
      <td>Blue large widgets</td>
      <td>$15.00</td>
      <td>$30.00</td>
    </tr>
    <!-- ...Many more rows... -->
  </tbody>
</table>

The headers are then rendered again on pages that have overflow rows. When a tfoot element is used, repeating works with table footers as well:

<table>
  <thead>...</thead>
  <tbody>
    <tr>
      <td>2</td>
      <td>Blue large widgets</td>
      <td>$15.00</td>
      <td>$30.00</td>
    </tr>
    <!-- ...Many more rows... -->
  </tbody>
  <tfoot>
    <tr>
      <th>Table Footer</th>
    </tr>
  </tfoot>
</table>

Output PDF

Conclusion

Hopefully, these capabilities will give you complete control over making PDFs that appear precisely how you want them to.
