
How to generate PDF dynamically? PrinceXML cracked? LaTeX?

TibiCAM

Converting HTML to PDF has always been a big topic among web developers. There have been numerous tools over the years, and they come and go. One that still stands is PrinceXML, but its license is absurd: about $3800 USD for a single license. Absolute lunacy.

A lot of people resort to Puppeteer or headless Chrome. But that's resource-heavy and terrible at cold starts, especially if you're going to print a few thousand PDFs per day.

I need a way to dynamically generate PDFs in my web app, mainly for invoices but also for some other documents. I've been thinking of going the HTML-to-PDF route. So far I've tried Puppeteer, and I didn't like how many resources it uses. I've also tried wkhtmltopdf, which is deprecated and has known security issues. It works, but as I said, it's insecure.

So then I researched and all signs lead to PrinceXML. But you "may not" use their tool for commercial use (i.e. invoices). And their free version has a watermark (and probably metadata that they add). I wonder if there's anyone here who knows how to get it for free (i.e. remove watermark and any additional metadata)? Because let's be honest, nobody is paying $3800 as a small web dev to generate crisp PDFs from HTML.

I've also been thinking about LaTeX to PDF. My web app is in Node.js and all the data is there, so I could easily build a LaTeX file programmatically. I've never used LaTeX myself, but from my understanding it's quite straightforward to generate LaTeX files, and I've heard there is a CLI tool (unofficial or official?) to convert them to PDF.
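A minimal sketch of the LaTeX route, written in Bash so it can be called from Node.js as a shell command. The function names `latex_escape` and `build_receipt_tex` are made up for illustration, and the escaping relies on GNU sed (as shipped with Debian). The main pitfall when generating LaTeX from database values is escaping characters such as `&`, `%`, `_` and `#`, which LaTeX would otherwise interpret as markup.

Bash:
```shell
#!/bin/bash
set -euo pipefail

# Escape characters that LaTeX treats specially, so values taken from the
# database (customer names, product titles) can't break the document.
# A backslash is first swapped for a \x01 placeholder (GNU sed syntax) and
# restored as \textbackslash{} last, so its braces aren't escaped again.
latex_escape() {
  printf '%s' "$1" | sed \
    -e 's/\\/\x01/g' \
    -e 's/[&%$#_{}]/\\&/g' \
    -e 's/~/\\textasciitilde{}/g' \
    -e 's/\^/\\textasciicircum{}/g' \
    -e 's/\x01/\\textbackslash{}/g'
}

# Print a minimal receipt as LaTeX source on stdout (hypothetical helper).
# Arguments: customer name, then alternating product / amount pairs.
build_receipt_tex() {
  local customer
  customer=$(latex_escape "$1")
  shift
  printf '%s\n' '\documentclass{article}' '\begin{document}' '\section*{Receipt}'
  printf 'Customer: %s\n\n' "$customer"
  printf '%s\n' '\begin{tabular}{lr}' 'Product & Amount \\' '\hline'
  # One table row per product/amount pair; "\\\\" in the format prints "\\".
  while (( $# >= 2 )); do
    printf '%s & %s \\\\\n' "$(latex_escape "$1")" "$(latex_escape "$2")"
    shift 2
  done
  printf '%s\n' '\end{tabular}' '\end{document}'
}
```

For example, `build_receipt_tex "Jane Doe & Co." "Jacket (Green)" "20.00 EUR" > receipt.tex` writes the source file, and `pdflatex -interaction=nonstopmode -halt-on-error receipt.tex` (the command-line compiler that ships with TeX Live) turns it into `receipt.pdf`.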

So, if HTML to PDF isn't the best option, how would you dynamically generate PDFs from data in your web app or database?

Has anyone here used PrinceXML? I can't find any videos showing what it looks like; it seems their only customers are large businesses. And has anyone here used LaTeX to create PDFs? That would mean converting my data into LaTeX and then into PDF, all of which can be done from Node.js by running shell commands or a bash script.

Note: I don't need any fancy CSS. Just text, a few tables, and my logo (an image).
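Since only text, tables and a logo are needed, one way to keep the HTML fully self-contained is to embed the logo as a base64 `data:` URI instead of linking to a file or URL. A minimal sketch, assuming GNU coreutils `base64` (the `-w0` flag disables line wrapping); `logo_img_tag` is a hypothetical helper name:

Bash:
```shell
#!/bin/bash
set -euo pipefail

# Print an <img> tag whose src is a base64 data: URI, so the image is
# embedded in the HTML itself instead of being loaded from a file or URL.
# Arguments: image file path, optional MIME type (defaults to image/png).
logo_img_tag() {
  local file="$1" mime="${2:-image/png}"
  printf '<img src="data:%s;base64,%s" alt="Logo">' "$mime" "$(base64 -w0 "$file")"
}
```

`logo_img_tag logo.png` (with `logo.png` as a placeholder path) prints a complete `<img>` tag that can be dropped into the template.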
Post automatically merged:

Consider this receipt I made in HTML:
HTML:
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <meta name="author" content="">
    <meta name="title" content="">
    <meta name="description" content="">
    <meta name="keywords" content="">

    <title>Receipt</title>

    <style>
        *, *::after, *::before {
            box-sizing: border-box;
            margin: 0;
            padding: 0;
        }
     
        html, body {
            font-family: "Arial", Helvetica, sans-serif;
            line-height: 1.6;
            color: #333;
            background-color: #fff;
        }
     
        body {
            padding: 16px;
            max-width: 800px;
            margin: 0 auto;
            height: 100%;
            width: 100%;
            overflow: hidden;
        }
     
        /* Header */
        h1 {
            font-size: 24px;
            font-weight: 600;
            color: #222;
            margin-bottom: 10px;
        }
     
        .receipt-meta {
            color: #292929;
            margin-bottom: 30px;
        }
     
        .receipt-meta span {
            display: block;
            font-size: 14px;
            margin: 4px 0;
        }
     
        /* Table Styles */
        .table {
            width: 100%;
            border-collapse: collapse;
            margin-bottom: 20px;
        }
     
        th, td {
            padding: 5px 20px;
            text-align: left;
            font-size: 14px;
            border-bottom: 1px solid #ddd;
        }

        th {
            padding: 12px 20px;
        }
     
        th {
            background-color: #f8f8f8;
            font-weight: 600;
            color: #444;
        }
     
        td {
            color: #3b3b3b;
        }
     
        td span {
            font-size: 12px;
            color: #888;
        }
     
        .total {
            font-weight: 700;
        }
     
        .total td {
            font-size: 14px;
            padding: 10px 20px;
        }
     
        /* Footer */
        .company-info {
            margin-top: 40px;
            font-size: 14px;
            line-height: 1.6;
            color: #777;
        }
     
        .company-info p {
            margin-bottom: 10px;
        }
     
        .company-info p:last-child {
            margin-bottom: 0;
        }
     
        /* Utility Classes */
        .align-right {
            text-align: right;
        }
     
        .align-center {
            text-align: center;
        }
     
        .text-muted {
            color: #999;
        }
     
        .bold {
            font-weight: 700;
        }

        .shipping,
        .vat-included {
            border: none;
        }

        .shipping {
            border-top: 2px solid #333;
        }

        .receipt-footer {
            margin-top: 32px;
            color: #292929;
            font-size: 12px;
        }
    </style>
</head>
<body>
    <h1>Receipt from Example, LLC.</h1>
    <div class="receipt-meta">
        <span>Date: <strong>2025-07-25</strong></span>
        <span>Receipt number: <strong>1234567890</strong></span>
        <span>Order number: <strong>987654</strong></span>
        <span>Customer reference: <strong>[email protected]</strong></span>
        <span>Payment method: <strong>VISA ************1234</strong></span>
    </div>
 
    <table class="table">
        <thead>
            <tr>
                <th>Product</th>
                <th>Quantity</th>
                <th class="align-right">Amount</th>
            </tr>
        </thead>
        <tbody>
            <tr>
                <td>Jacket (Green)<br><span>Prod.No: 12345</span></td>
                <td>2</td>
                <td class="align-right">20.00 EUR</td>
            </tr>
            <tr>
                <td>Jacket (White)<br><span>Prod.No: 22300</span></td>
                <td>1</td>
                <td class="align-right">15.00 EUR</td>
            </tr>
            <tr class="shipping">
                <td colspan="2" class="align-right">Shipping:</td>
                <td class="align-right">8.00 EUR</td>
            </tr>
            <tr class="vat-included">
                <td colspan="2" class="align-right">Amount excl. VAT:</td>
                <td class="align-right">34.40 EUR</td>
            </tr>
            <tr class="vat-included">
                <td colspan="2" class="align-right">VAT amount (25.00%):</td>
                <td class="align-right">8.60 EUR</td>
            </tr>
            <tr class="total">
                <td colspan="2" class="align-right bold">Total incl. VAT:</td>
                <td class="align-right bold">43.00 EUR</td>
            </tr>
        </tbody>
    </table>
 
    <div class="company-info">
        <p>
            <span class="bold">VAT number:</span> 10001-0001<br>
            <span class="bold">Org. number:</span> 1234-5678<br>
            <span class="bold">Website:</span> www.example.com<br>
            <span class="bold">Contact:</span> [email protected]
        </p>
     
        <div class="company-address">
            <p>
                <span class="bold">Example, LLC.</span><br>
                123 Maple Street<br>
                90210 Beverly Hills<br>
                United States
            </p>
        </div>
    </div>

    <p class="receipt-footer">Your satisfaction is important to us. If you need to return your purchase, you have 14 calendar days from the date of receipt. Items must be unused, with all safety tags attached, and in their original packaging. Return shipping is at your expense. For full terms, visit www.example.com/terms. We recommend that you save or print this receipt. This receipt may be requested for returns or claims.</p>
</body>
</html>
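When the summary rows are generated from order data rather than typed by hand, it's safest to derive the net amount and VAT from the VAT-inclusive total, so the lines always add up. With a 25% rate, 43.00 EUR incl. VAT splits into 34.40 EUR net and 8.60 EUR VAT. A minimal sketch using integer cents, since Bash arithmetic has no floating point; `vat_breakdown` is a hypothetical name and it assumes non-negative amounts:

Bash:
```shell
#!/bin/bash
set -euo pipefail

# Split a VAT-inclusive total into net amount and VAT.
# Arguments: total in cents (4300 = 43.00), VAT rate in basis points (2500 = 25.00%).
# Prints "net vat" with two decimals each.
vat_breakdown() {
  local total_cents="$1" rate_bp="$2"
  local divisor=$(( 10000 + rate_bp ))
  # net = total / (1 + rate), rounded half up by adding half the divisor.
  local net_cents=$(( (total_cents * 10000 + divisor / 2) / divisor ))
  # VAT is the remainder, so net + VAT always equals the total exactly.
  local vat_cents=$(( total_cents - net_cents ))
  printf '%d.%02d %d.%02d\n' \
    $(( net_cents / 100 )) $(( net_cents % 100 )) \
    $(( vat_cents / 100 )) $(( vat_cents % 100 ))
}

vat_breakdown 4300 2500   # prints "34.40 8.60"
```

Computing VAT as the remainder rather than rounding it separately is what guarantees the three summary lines are consistent.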

I'm running this through "PrinceXML" CLI tool on Debian:

Bash:
cat index.html | prince --input=html --no-network --output="Receipt-987654.pdf" -
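Since the HTML is read from stdin here, the table rows can be produced by a script instead of being hard-coded. A minimal sketch, assuming the order lines arrive as pipe-separated `product|quantity|amount` text (a made-up format for illustration); every value is HTML-escaped so a product name containing `<` or `&` can't break the markup. The helper names are hypothetical:

Bash:
```shell
#!/bin/bash
set -euo pipefail

# Escape the characters that are special in HTML text and attribute values.
# "&" must be replaced first so the other entities aren't double-escaped.
html_escape() {
  printf '%s' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g' \
    -e 's/"/\&quot;/g' -e "s/'/\&#39;/g"
}

# Read "product|quantity|amount" lines from stdin and print one <tr> per line,
# using the same classes as the receipt template.
receipt_rows() {
  local product='' qty='' amount=''
  # "|| [[ -n ... ]]" also handles a last line without a trailing newline.
  while IFS='|' read -r product qty amount || [[ -n "$product" ]]; do
    printf '<tr><td>%s</td><td>%s</td><td class="align-right">%s</td></tr>\n' \
      "$(html_escape "$product")" "$(html_escape "$qty")" "$(html_escape "$amount")"
  done
}

# Example: two rows, e.g. exported from the database.
printf '%s\n' 'Jacket (Green)|2|20.00 EUR' 'Jacket (White)|1|15.00 EUR' | receipt_rows
```

The printed rows can be spliced into the `<tbody>` of the template and the finished document piped to the converter on stdin, as in the command above.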

If I check on this site for EXIF data it tells me it includes a lot of MetaData from PrinceXML: Check files for metadata info (https://www.metadata2go.com)

So I installed "exiftool" to delete any metadata, like this:

Bash:
sudo apt install libimage-exiftool-perl

And then I run it on my generated PDF:

Bash:
exiftool -all= -overwrite_original Receipt-987654.pdf

If I check again on that EXIF website, it now shows no more metadata from PrinceXML. That's fine.

However... they still have their watermark in the top-right corner. And if I check with grep I can still find some info:

Bash:
$ grep -ai "prince" Receipt-987654.pdf
<</Producer (Prince 16.1 \(www.princexml.com\))

So I need a way to:

1) Delete any extra information inside the file about "PrinceXML"
2) Delete their watermark on the PDF.

Any idea how to do this?

I downloaded PrinceXML from their website using the deb package to Debian 12:


I refuse to pay thousands of dollars to turn HTML into PDF.

I'm attaching the PDF here with removed metadata using exiftool.
 

I spent a long time tinkering with this and I've finally found a solution. I have now created a bash script that properly converts HTML to PDF. It's using PrinceXML, pdftk, qpdf and exiftool to remove any metadata, objects, annotations and ID added by PrinceXML. It's also linearizing the output PDF document - which means it gets optimized for web. The result is a light weight file.

You can input HTML files or raw HTML. You can specify the output file name (default is "document.pdf") and the document title (default is "Document"). Works flawlessly on Debian 12. Simply install PrinceXML, pdftk, qpdf and exiftool (instructions can be found in the source code).

This is much better than using Puppeteer, Gotenberg or other headless browser PDF conversion tools, because of PrinceXML's excellent conversion algorithms. So if you're in need of a way to easily convert HTML to PDF dynamically, this is for you. No need to pay for PrinceXML's license ($3800 USD/year per license).

Here's the full script:

Bash:
#!/bin/bash
set -euo pipefail

#************************************************************
#
# HTML to PDF using PrinceXML, pdftk, qpdf and exiftool.
# This ensures the generated PDF is linearized (optimized for web)
# and that it does not include any PrinceXML metadata, images or objects.
#
# Usage:
#  ./convert.sh {HTMLInput} {OutputPDF} {DocumentTitle}
# Examples:
#  ./convert.sh index.html document.pdf "Test Document"
#  ./convert.sh "<html><body><h1>Hello World</h1><p>This is a test</p></body></html>" document.pdf "HTML to PDF"
#
# Dependencies:
#  - PrinceXML: https://www.princexml.com/download/16/
#  - pdftk: sudo apt install pdftk
#  - qpdf: sudo apt install qpdf
#  - exiftool: sudo apt install libimage-exiftool-perl
#
#************************************************************


#************************************************************
#
# Ensure all dependencies are installed.
#
#************************************************************
command -v prince >/dev/null || { echo "PrinceXML is not installed."; exit 1; }
command -v pdftk >/dev/null || { echo "pdftk is not installed."; exit 1; }
command -v qpdf >/dev/null || { echo "qpdf is not installed."; exit 1; }
command -v exiftool >/dev/null || { echo "exiftool is not installed."; exit 1; }


#************************************************************
#
# Create a temporary directory.
#
#************************************************************
TEMP_DIR=$(mktemp -d)
trap 'rm -rf "$TEMP_DIR"' EXIT


#************************************************************
#
# File input and output.
#
#************************************************************
if [[ -z "${1:-}" ]]; then
  exit 1
fi

INPUT_HTML="$1"
OUTPUT_FILE="${2:-document.pdf}"
OUTPUT_TITLE="${3:-Document}"


#************************************************************
#
# Convert HTML to PDF using "PrinceXML".
#
#************************************************************
if [[ -f "$INPUT_HTML" ]]; then
  prince --input=html --no-network --output="$TEMP_DIR/temp_document.pdf" "$INPUT_HTML"
else
  echo "$INPUT_HTML" | prince --input=html --no-network --output="$TEMP_DIR/temp_document.pdf" -
fi


#************************************************************
#
# Uncompress for sed editing using "pdftk".
#
#************************************************************
pdftk "$TEMP_DIR/temp_document.pdf" output "$TEMP_DIR/uncompressed_document.pdf" uncompress


#************************************************************
#
# Remove annotations, tags, trailer ID and more objects.
#
#************************************************************
sed -r '
  /^\/Annots/d;
  s#(/[^ ]+ )\([^)]*[Pp][Rr][Ii][Nn][Cc][Ee][^)]*\)#\1()#g;
  s#\(\)\)#\(\)#g;
  s#^[[:space:]]*/pdftk_PageNum[[:space:]]+[0-9]+[[:space:]]*##I;
  s#^[[:space:]]*/ID[[:space:]]+\[[^]]+\][[:space:]]*##I;
' "$TEMP_DIR/uncompressed_document.pdf" > "$TEMP_DIR/cleaned_document.pdf"


#************************************************************
#
# Compress the cleaned file using "pdftk".
#
#************************************************************
pdftk "$TEMP_DIR/cleaned_document.pdf" output "$TEMP_DIR/compressed_document.pdf" compress


#************************************************************
#
# Remove metadata using "exiftool".
#
#************************************************************
exiftool -all= -Title="$OUTPUT_TITLE" -overwrite_original "$TEMP_DIR/compressed_document.pdf" >/dev/null 2>&1


#************************************************************
#
# Linearize and optimize file size and performance for web with "qpdf".
#
#************************************************************
qpdf --linearize "$TEMP_DIR/compressed_document.pdf" "$OUTPUT_FILE"


#************************************************************
#
# Cleanup and exit.
#
#************************************************************
rm -rf "$TEMP_DIR"
exit 0

Notice that I have added "--no-network" to the PrinceXML command. If you have external sources in your HTML file (such as images, fonts, CSS) then you need to remove this parameter. Personally, I do not have any external sources that I embed in my HTML file, so I will keep that parameter.

If you want to manually examine a PDF document output by this script, you need to uncompress the file. For example if we create an empty HTML file and convert that to PDF (this is barebones) you will see all the metadata and objects it includes.

Bash:
# Create an empty HTML document.
touch test.html

# Convert to PDF using the script.
./convert.sh test.html test.pdf "A Test File"

# Examine metadata.
exiftool test.pdf

# Uncompress and examine the file contents.
pdftk test.pdf output test_uncompressed.pdf uncompress
less test_uncompressed.pdf

# Verify there are no "PrinceXML" tags or objects.
grep -ai "prince" test_uncompressed.pdf
 
I have now tested this thoroughly and have used it to create 1000 PDF documents in a row, ranging from 1-500 pages in length, and it's incredibly fast! Can't even compare it with Puppeteer or Gotenberg. I'm really satisfied with the end result.
 
Nice work! In the past I used Gotenberg and it worked well for me. Why didn't you want to use it? Was it too slow for you?
 

Gotenberg uses a headless browser in the background. It scales very badly because of that. If you're in need of generating thousands of PDF documents per hour, Gotenberg and Puppeteer and other headless browser solutions will be your biggest bottleneck. I prefer to use PrinceXML because it's not only faster but has the best CSS support.

PrinceXML is made by the guy who invented CSS, and the guy who made the web, and a few other big people in the industry. They have perfected their software and it requires no headless browser.

Today I also learned about Apache FOP, which turns XSL-FO documents (basically XML files) into PDF. That could also be an alternative to headless-browser converters. It's not widely used, but I found that Hetzner uses it (I checked the metadata of one of their invoices). However, it doesn't use CSS at all: layout is described with XSL-FO's own formatting properties, so an existing HTML/CSS template has to be rewritten, and the result may not match what you designed exactly. For example, fonts may look smaller or bigger, positions may be slightly off, etc.
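For comparison, FOP's input looks quite different from HTML. A minimal XSL-FO sketch of the receipt header (page size, margins and font choices are arbitrary placeholders), rendered with FOP's command-line interface only if it is installed (Debian package `fop`):

Bash:
```shell
#!/bin/bash
set -euo pipefail

# Write a minimal XSL-FO document: one A4 page master and two text blocks.
cat > receipt.fo <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:layout-master-set>
    <fo:simple-page-master master-name="A4" page-width="210mm" page-height="297mm"
        margin-top="20mm" margin-bottom="20mm" margin-left="20mm" margin-right="20mm">
      <fo:region-body/>
    </fo:simple-page-master>
  </fo:layout-master-set>
  <fo:page-sequence master-reference="A4">
    <fo:flow flow-name="xsl-region-body">
      <fo:block font-family="Helvetica" font-size="16pt" font-weight="bold">Receipt from Example, LLC.</fo:block>
      <fo:block font-size="10pt" space-before="6pt">Total incl. VAT: 43.00 EUR</fo:block>
    </fo:flow>
  </fo:page-sequence>
</fo:root>
EOF

# Render to PDF with Apache FOP, if it is available on this machine.
if command -v fop >/dev/null; then
  fop -fo receipt.fo -pdf receipt.pdf
fi
```

In practice the `.fo` file is usually not written by hand: FOP can also apply an XSLT stylesheet to XML data directly (`fop -xml data.xml -xsl receipt.xsl -pdf receipt.pdf`).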

 
Is anyone using it? And if so, what do you think of it? Any ideas on how to improve it even further perhaps?
 

I actually had to create a PDF from HTML yesterday, and this worked incredibly well. Nice job!
 

Thank you. If you find any improvements, please let me know. The only issues I had were with large images, which I solved by setting max-width to 800px or thereabouts. If you run into any problems, please report them here. Thanks!
 
It works great for my webshop, and I haven't had any issues with it yet. It does spike the CPU quite a bit for about 1.5 seconds during each conversion. So far I've used it to create receipts and email them to customers, but I only generate around 10 receipts at a time, so CPU isn't an issue. I'm not sure how it scales beyond that, though. Great work!
 