DEV Community: IDRSolutions

iOS and HTML5: Gotcha with Absolute Positioning

IDRSolutions — Wed, 29 Jul 2026 08:52:21 +0000

Introduction

One of the aims of BuildVu and all of its various view modes (all 9 of them) was to make viewing of PDF files easy and platform-independent, where the user only needs a relatively modern web browser to view them.

And as we designed the output to be used by the browser, we also allow you to select and search the text using your browser’s default tools, and this free functionality normally works great in all web browsers, across all platforms, even Android devices, as you can see in the images in this post.

Issue on iOS Browsers

However, this sadly isn’t true in the current versions of Safari and Chrome on iOS. Neither supports the latest CSS to as great a degree as browsers on other mobile devices, and as a result of their bizarre selection engine, it’s very difficult and often impossible to select text on pages containing complicated CSS (explained below).

We recently had a customer query us about why they couldn’t select the text of our output on their iPad, which struck us as an odd question; the default mode for our output has had selectable text for as long as I can recall so my initial thought was that it may have just been a user unfamiliar with how to select text on an iPad. However, we still checked to be sure and were surprised to find that the text wasn’t selectable.

This was puzzling because, as I mentioned before, the text has always been selectable. It is, after all, just text within a div tag in the HTML, and we were sure it worked previously.

Investigating the Problem

After going over our current output, I found some older output that worked and had a look at the differences to the current version.

Visually they looked almost identical, with a few improvements in regard to character spacing in our current version and a different background colour.

Structurally, the newer version differs quite a lot from the older version. In our older versions, we placed the text within div tags under our parent jpedal tag with styling like so:

<body style="background-color: rgb(55,55,65);">
<div id="jpedal" style="position:relative; width: 984px; margin: 0 auto;">
<!-- Shared CSS values -->
<style type="text/css" >
.t {
position:absolute;
white-space:nowrap;
overflow:visible;
z-index:1;
}
.tr {
-webkit-transform-origin: left top;
-ms-transform-origin: left top;
-moz-transform-origin: left top;
-o-transform-origin: left top;
}
</style>
<!-- Inline CSS values -->
<style type="text/css" >
#t1_1 {
left:90px;
top:60px;
FONT-SIZE: 60px;
FONT-FAMILY: CataneoBT-Regular1;
color:rgb(0,85,149);
}
#t2_1 {
-webkit-transform:matrix(0.97,0,-0.2,0.97,114, 181);
-ms-transform:matrix(0.97,0,-0.2,0.97,114, 181);
-moz-transform:matrix(0.97,0,-0.2,0.97,114, 181);
-o-transform:matrix(0.97,0,-0.2,0.97,114, 181);
FONT-SIZE: 21px;
FONT-FAMILY: IGNACK-RaleighBT-Roman1;
color:rgb(35,32,32);
}
#t3_1 {
-webkit-transform:matrix(0.97,0,-0.2,0.97,350, 212);
-ms-transform:matrix(0.97,0,-0.2,0.97,350, 212);
-moz-transform:matrix(0.97,0,-0.2,0.97,350, 212);
-o-transform:matrix(0.97,0,-0.2,0.97,350, 212);
FONT-SIZE: 13px;
FONT-FAMILY: IGNACK-RaleighBT-Roman1;
color:rgb(35,32,32);
}
</style>
<!-- Any embedded fonts defined here -->
<style type="text/css" >
@font-face {
font-family: CataneoBT-Regular1;
src: url("01/fonts/CataneoBT-Regular.woff");
}
@font-face {
font-family: IGNACK-RaleighBT-Roman1;
src: url("01/fonts/IGNACK-RaleighBT-Roman.woff");
}
</style>
<!-- Text defined here and setup in CSS -->
<div id="t1_1" class="t">Some things never change</div>
<div id="t2_1" class="t tr">Never trust a dog to watch your food.</div>
<div id="t3_1" class="t tr">â��</div>

We simply apply the correct styling and letter spacing to each element via its class and ID attributes.

Introduction of Parent Divs

Our older output repeated class="t", a CSS class containing rules common to all of our text, on every element, and repeated other values in the CSS for each div’s ID. To reduce this repetition, we introduced several parent divs that reduce file size and make our CSS easier to understand.

Example of Current Output

Below you can see an example of the current output and its structure (note: as with the previous example, this is just a snippet of the relevant parts of our output):

<body style="background-color:#919191;">
<div id="jpedal" style="position:relative; width: 984px; height: 1179px; overflow: hidden; margin: 0 auto; box-shadow: 0 2px 6px rgba(100, 100, 100, 0.5);">
<!-- Begin shared CSS values -->
<!--[if lt IE 9]><style type="text/css">.text div div{zoom: 25%;}</style><![endif]-->
<style type="text/css" >
.text {
position: absolute;
-webkit-transform-origin: top left;
-moz-transform-origin: top left;
-o-transform-origin: top left;
-ms-transform-origin: top left;
-webkit-transform: scale(0.25);
-moz-transform: scale(0.25);
-o-transform: scale(0.25);
-ms-transform: scale(0.25);
z-index: 1;
}
.text div div {
position:absolute;
white-space:nowrap;
overflow:visible;
}
</style>
<!-- End shared CSS values -->
<!-- Begin inline CSS -->
<style type="text/css" >
#t1_1{left:360px;top:240px;}
#t2_1{-webkit-transform:matrix(0.97,0,-0.2,0.97,456, 724);-ms-transform:matrix(0.97,0,-0.2,0.97,456, 724);-moz-transform:matrix(0.97,0,-0.2,0.97,456, 724);-o-transform:matrix(0.97,0,-0.2,0.97,456, 724);}
#t3_1{-webkit-transform:matrix(0.97,0,-0.2,0.97,1400, 848);-ms-transform:matrix(0.97,0,-0.2,0.97,1400, 848);-moz-transform:matrix(0.97,0,-0.2,0.97,1400, 848);-o-transform:matrix(0.97,0,-0.2,0.97,1400, 848);}
#t4_1{left:1456px;top:848px;}
#t2_1,#t3_1 {
-webkit-transform-origin: left top;
-ms-transform-origin: left top;
-moz-transform-origin: left top;
-o-transform-origin: left top;
}
.s2_1{
FONT-SIZE: 84px;
FONT-FAMILY: IGNACK-RaleighBT-Roman1;
color: rgb(35,32,32);
}
.s1_1{
FONT-SIZE: 240px;
FONT-FAMILY: CataneoBT-Regular1;
color: rgb(0,85,149);
}
.s3_1{
FONT-SIZE: 52px;
FONT-FAMILY: IGNACK-RaleighBT-Roman1;
color: rgb(35,32,32);
}
</style>
<!-- End inline CSS -->
<!-- Begin embedded font definitions -->
<style type="text/css" >
@font-face {
font-family: CataneoBT-Regular1;
src: url("index/fonts/CataneoBT-Regular.woff");
}
@font-face {
font-family: IGNACK-RaleighBT-Roman1;
src: url("index/fonts/IGNACK-RaleighBT-Roman.woff");
}
</style>
<!-- End embedded font definitions -->
<!-- Begin text definitions (Positioned/styled in CSS) -->
<div class="text">
<div class="s1_1">
<div id="t1_1">Some things never change</div>
</div>
<div class="s2_1">
<div id="t2_1">Never trust a dog to watch your food.</div>
</div>
<div class="s3_1">
<div id="t3_1">â��</div>

This reduced our output length by a lot; not having to output the font-family per ID and the class=”t” per element adds up to a lot of saved characters in the output files, which consequently makes large converted files with a lot of similar text smaller.

However, nesting these absolutely positioned elements appears to be what the issue is in iOS. This probably isn’t intended behaviour and may well be a bug with iOS!

Workaround and Conclusion

One solution we’ve come up with for this is to change the output on the page when navigated to in iOS to something it can select the text of. Of course, this affects the performance of our output when looked at on iOS devices, which isn’t the best compromise.

My personal hope is that this issue is rectified within iOS itself so that other developers don’t have to encounter this oddity.

Have you had any difficulties with selecting text on iOS or other web browsers? We’d love to hear about them and how you solved them!

How to Convert CCITT data to TIFF image (Tutorial)

IDRSolutions — Mon, 27 Jul 2026 11:27:37 +0000

What is CCITT data?

CCITT is used to compress black and white image data. Using Huffman encoding, the data is squeezed into a much smaller compressed stream.

CCITT is also a compression format used in the TIFF file format. By adding some additional bytes to your raw CCITT data, and saving it in a file ending .tif, you can create a TIFF Image from raw CCITT data. My example is written in Java (but it should be easy to recode in any language). It will take the raw data and add the required bytes.

CCITT data in PDF files

CCITT is used as a compression format in PDF files for images in XObjects. You can manually extract the CCITT data and the Dictionary values (K, BlackIs1, etc.) from PDF files if you want to reuse the images.

If you have extracted the CCITT data from a PDF, there may be some differences between the raw image and the image in the PDF – remember this is the raw image which may be inverted, coloured, clipped, etc.

How to convert CCITT to a TIFF

Get the CCITT parameters
Create a metadata header
Append the raw CCITT data

and the Java code to write TIFF…

/*
 * default values (these may be set in a PDF DecodeParms dictionary)
 */ 
boolean isBlack = false;  //flag to show if default is black/white
int k = 0;
int w = -1;
int h = -1;

/*
 * build the image
 */ 
ByteArrayOutputStream bos = new ByteArrayOutputStream();

/*
 * tiff header (id, version, offset)
 */ 
String[] headerValues = {"4d", "4d", "00", "2a", "00", "00", "00", "08"};
for (int i = 0; i < headerValues.length; i++)
    bos.write(Integer.parseInt(headerValues[i], 16));

int tagCount = 9; // appears to be minimum needed
// writeWord and writeTag are convenience methods
// add the values as bytes to the stream

/* IFD – Image File Directory */ 
writeWord(String.valueOf(tagCount), bos); // num of entries
writeTag("256", "04", "01", String.valueOf(w), bos); // width
writeTag("257", "04", "01", String.valueOf(h), bos); // length

// BitsPerSample 258 – B&W 1 bit image
writeTag("258", "03", "01", "00010000h", bos);

if (k == 0)
    writeTag("259", "03", "01", "00030000h", bos); // compression
else if (k > 0)
    writeTag("259", "03", "01", "00020000h", bos); // compression
else if (k < 0)
    writeTag("259", "03", "01", "00040000h", bos); // compression

//photometricInterpretation
if (!isBlack)
   writeTag("262", "03", "01", "00000000h", bos);
else
   writeTag("262", "03", "01", "00010000h", bos);

//stripOffsets -start of data after tables
writeTag("273", "04", "1", "122", bos);

//samplesPerPixel
writeTag("277", "03", "01", "00010000h", bos);
//rowsPerStrip – uses height
writeTag("278", "04", "01", String.valueOf(h), bos);
//stripByteCount – 1 strip so all data
writeTag("279", "04", "1", String.valueOf(data.length), bos);
// write next IFD offset, zero as there is no other table
writeDWord("0", bos);

/*
 * write the CCITT image data at the end
 */ 
try {
   bos.write(data);
   bos.close();
} catch (IOException e) {
   LogWriter.writeLog("[PDF] Tiff exception  " + e);
}

/* save data as image */ 
// try-with-resources closes the file even if the write fails
try (FileOutputStream fos = new FileOutputStream(fileName)) {
   fos.write(bos.toByteArray());
} catch (IOException e1) {
   LogWriter.writeLog("[PDF] Tiff exception  " + e1);
}
}
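
The snippet above relies on the writeWord, writeTag and writeDWord convenience methods, which are not shown. As a rough, self-contained sketch of the same idea, the version below builds the 122-byte header with a ByteBuffer instead. The class name CcittToTiffSketch and the putTag and wrap methods are made up for this illustration and are not part of any library; the tag values and the mapping from K to the compression value simply mirror the code above.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for illustration only; mirrors the tag layout used above.
class CcittToTiffSketch {

    private static final int SHORT = 3; // TIFF field type: 16-bit unsigned
    private static final int LONG = 4;  // TIFF field type: 32-bit unsigned

    /** Writes one 12-byte IFD entry (tag, type, count = 1, value). */
    static void putTag(ByteBuffer buf, int tag, int type, int value) {
        buf.putShort((short) tag);
        buf.putShort((short) type);
        buf.putInt(1); // every tag here holds a single value
        if (type == SHORT) {
            // a SHORT sits in the first two bytes of the 4-byte value field
            buf.putShort((short) value);
            buf.putShort((short) 0);
        } else {
            buf.putInt(value);
        }
    }

    /** Wraps raw CCITT data in a single-strip, big-endian TIFF. */
    static byte[] wrap(byte[] data, int w, int h, int k, boolean isBlack) {
        final int tagCount = 9;
        // header (8) + entry count (2) + entries (9 * 12) + next IFD offset (4) = 122
        final int headerSize = 8 + 2 + tagCount * 12 + 4;

        ByteBuffer buf = ByteBuffer.allocate(headerSize + data.length)
                .order(ByteOrder.BIG_ENDIAN);

        buf.put((byte) 'M').put((byte) 'M'); // "MM" = big-endian byte order
        buf.putShort((short) 42);            // TIFF version
        buf.putInt(8);                       // offset of the first IFD

        buf.putShort((short) tagCount);
        putTag(buf, 256, LONG, w);                    // width
        putTag(buf, 257, LONG, h);                    // length
        putTag(buf, 258, SHORT, 1);                   // bitsPerSample
        int compression = (k == 0) ? 3 : (k > 0 ? 2 : 4); // same mapping as above
        putTag(buf, 259, SHORT, compression);         // compression
        putTag(buf, 262, SHORT, isBlack ? 1 : 0);     // photometricInterpretation
        putTag(buf, 273, LONG, headerSize);           // stripOffsets
        putTag(buf, 277, SHORT, 1);                   // samplesPerPixel
        putTag(buf, 278, LONG, h);                    // rowsPerStrip
        putTag(buf, 279, LONG, data.length);          // stripByteCount
        buf.putInt(0);                                // no further IFD

        buf.put(data);                                // raw CCITT data at the end
        return buf.array();
    }
}
```

The returned byte array can be written straight to a file ending .tif, in the same way as the FileOutputStream code above.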

In this tutorial you learned how to convert CCITT data to a TIFF image. We have many more blog posts for Java developers working with image technology, so please feel free to check them out.

As experienced Java developers, we help you work with images in Java and bring over a decade of hands-on experience with many image file formats.

How to extract clipped images from PDF file in Java

IDRSolutions — Wed, 22 Jul 2026 14:51:10 +0000

This tutorial shows you how to extract clipped images from a PDF file in 5 simple steps using the JPedal PDF library. JPedal is the best Java PDF library for developers. Clipped images are raw images that have had their formats edited; this includes cropping, flipping, resizing and more.

How to Extract clipped images from PDF files?

Add JPedal to your class or module path (download the trial jar).
Create a File handle, InputStream, or URL pointing to the PDF file
Include a password if file is password-protected
Open the PDF file
Iterate over the images on each page
Close the PDF file

and the Java code to extract clipped images…

File file = new File("/path/to/document.pdf");
ExtractClippedImages extract = new ExtractClippedImages(file);
//extract.setPassword("password");
if (extract.openPDFFile()) {
    int pageCount = extract.getPageCount();
    for (int page = 1; page <= pageCount; page++) {
        int imagesOnPageCount = extract.getImageCount(page);
        for (int image = 0; image < imagesOnPageCount; image++) {
            BufferedImage img = extract.getClippedImage(page, image, true);
        }
    }
}
extract.closePDFfile();

Why use a third-party library to handle PDF files?

PDF files are a very complex binary/text hybrid data structure. The image data, color information, clipping and scaling details are all stored separately in a compressed format and need to be extracted and combined together.

A third-party library handles all of this for you automatically. In this example, we will use our JPedal PDF library. This provides an easy-to-use Java PDF API so you can work with PDF files easily in Java.

Extract clipped images from a PDF file with JPedal

If you are looking to use JPedal to extract clipped images from PDF files, we recommend you start with these tutorials:

Why convert PDF magazines to HTML5?

IDRSolutions — Mon, 20 Jul 2026 09:40:11 +0000

In these articles, we talk about the advantages of converting your PDF documents to HTML5. Each point has a full supporting article with a more in-depth discussion on that point. Don’t forget to check back in the future, as we are continually adding to this list!

Gain control of your content – Many companies offer services where they will offer a viewer for your content on the basis that they host it for you. Not us; we let you take the credit for your content, along with the SEO that naturally comes with it.
SEO and the long tail – It is likely that you have a lot of back issues containing lots of well-written content full of all the right keywords. We let you convert this into a format that all search engines can understand, and allow you to take the credit for it.
The best ‘browser’ for your PDF content – Many services convert into a proprietary format (such as Flash) where you lose a lot of features (e.g. text selection). You can benefit by converting to a tried and tested format with billions of users.
Make it easy for foreign readers – Many services convert into a format with ‘fake’ (unchangeable) text. We convert to real text that can be altered, for example by the many web browsers that offer built-in web page translation.
Publish Everywhere – People are browsing content on an ever-increasing range of devices. Converting to HTML5 will give your content a consistent interface across a wide range of HTML5-capable devices.
Load quickly and save on bandwidth – One of the issues with PDF is that, regardless of how much of the PDF you view, the complete file has to be downloaded. By converting to HTML5, your magazine readers only have to download the pages they actually read, and you can even create bespoke versions at different quality or zoom levels to optimise for mobile devices.
Measure content performance with analytics – Analytics is vitally important to give you performance data about your content, as well as the opportunity to learn about the demographics of your content readers. By converting to HTML5, you can enable analytics and start improving your content’s performance.

Want to learn more about the PDF file format? We have been developing PDF software for over 20 years!

How Does CCITT Compress Image Data?

IDRSolutions — Thu, 16 Jul 2026 11:23:26 +0000

How does CCITT compression work?

CCITT encodes black-and-white data. It does this by encoding runs of black or white pixels. We can do this in various ways (G31D, G32D and G42D), which are also known as Group 3 and Group 4 compression. We explain how the most common type (G31D) works in detail below.

As most images contain more white than black, we assume that we start with white. For cases where we do not start with white, we add a marker at the start to show this.

If we encode black as value 1, we just set these bits in our decompressed data – we do not explicitly need to set white values (because it is binary, not setting a value to black means that it is white).

But sometimes, we find that there are more pixels that are black than white. Well, in this case, we can just invert the image (flipping bits is very fast) and then we get the best compression.

All we need is a flag (BlackIs1 in the PDF file format – its default value is false) to flag that the image data needs inversion to appear correctly.
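
As a small illustration of the inversion step, here is a minimal sketch that flips every bit in a block of 1-bit image data. The class and method names are made up for this example, and it assumes the data is packed eight pixels to a byte.

```java
// Hypothetical helper for illustration only.
class BitInverter {

    /** Returns a copy of the data with every bit flipped (black <-> white). */
    static byte[] invert(byte[] data) {
        byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = (byte) ~data[i]; // bitwise NOT flips all eight pixels at once
        }
        return out;
    }
}
```

Because the operation is a single bitwise NOT per byte, it costs very little compared with the compression gain from encoding the more common colour as the background.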

How does G31D compression work?

This is the simpler form of CCITT to decode. Firstly, here are some key terms that make it easier to understand how G31D works.

Key Terms

Pixel run – A block of pixels that are all the same. Each pixel is usually 1 bit: 1 for black and 0 for white.
Scan line – The width of data from one end of the page to the other.
Code words – These contain information about what the data does, e.g., make-up or terminating.
Run length – A block of either white or black bits to be decoded/encoded.
End of line (EOL) – Unique 12-bit code word used to determine the start and end of a scan line.
Return to control (RTC) – Six EOL code words occurring consecutively, which usually mark the end of the file. EOL and RTC will become clearer in later blog posts.

Overview of G31D

G31D CCITT is a variation of the Huffman coding scheme. Essentially, to decode G31D data, a scan line is read as runs of single-bit pixels, where each code word represents a number of white or black pixels.

The black and white run lengths alternate and vary in length, which lets them be uniquely identified when decoded. The maximum size of a run length is bounded by the maximum width of the scan line (page width).

More frequently occurring run lengths are assigned shorter code words, while less frequently occurring run lengths are assigned longer code words. This is particularly useful because a typical handwritten or printed document contains more short run lengths than long ones.

Encoding and Decoding Process

While still on the subject of pixel runs and run lengths, it is worth covering how pixel runs are encoded, which in turn makes decoding easier to follow.
Pixel runs between 0 and 63 pixels in length are encoded using a single terminating code, while runs between 64 and 2623 pixels are encoded by a single make-up code followed by a terminating code.
Runs longer than 2623 pixels are encoded using as many make-up codes as needed, followed by a single terminating code.
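
The rules above can be sketched as a small function that splits a run length into the values its code words stand for. This is only an illustration of the arithmetic: the class and method names are made up, it assumes make-up codes exist for every multiple of 64 up to 2560 (which is what the 2623 limit implies), and it does not look up the actual bit patterns from the code tables.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only.
class RunLengthSplit {

    /** Splits a run length into make-up values followed by one terminating value. */
    static List<Integer> split(int runLength) {
        List<Integer> parts = new ArrayList<>();
        int remaining = runLength;

        // very long runs: repeat the largest make-up code (2560) as often as needed
        while (remaining > 2623) {
            parts.add(2560);
            remaining -= 2560;
        }
        // 64..2623: one make-up code for the largest multiple of 64 that fits
        if (remaining >= 64) {
            int makeUp = (remaining / 64) * 64;
            parts.add(makeUp);
            remaining -= makeUp;
        }
        // 0..63: always finish with a terminating code, even when it is 0
        parts.add(remaining);
        return parts;
    }
}
```

For example, a run of 30 pixels needs only the terminating value 30, while a run of 2700 pixels splits into the make-up values 2560 and 128 followed by the terminating value 12.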

Firstly, a pre-calculated lookup table for both the black and white pixel runs has to be created against which the current data is compared. You want to be able to keep track of your current bit location in the scan line.

This is so that when a different bit is hit, be it black or white, the decoder can group the previous bits into a code word of either make-up (longer code words) or terminating (shorter code words) code words, which are then checked against the table and decoded as needed.

The make-up code words represent long run lengths, while the terminating code words represent short run lengths. The sum of the length values of each code word makes up the run length. The process is repeated as new EOLs are hit.

It is also worth mentioning that each scan line usually starts with a white run-length code word. But there are some unusual cases where it does not follow the norm, i.e. it begins with a black run length.

In this situation, the beginning of that scan is preceded by a zero-length white run-length code word. However, if 6 EOLs are hit consecutively, then this denotes the end of the file, i.e. RTC.

Advantages and Disadvantages

Advantages

Good compression of black and white data.

Disadvantages

Cannot optimise across lines or for multiple empty lines.
Takes a while to get to grips with the algorithm.

Do you need to read or write TIFF files in Java?

Our JDeli image library (the best enterprise-level Java image library for performance and efficiency) offers a range of advantages over ImageIO and alternatives for TIFF files, including:

prevents heap-related JVM crashes
reads 1-32 bit bilevel, grayscale, RGB, argb, cmyk, acmyk, ycbcr Colorspaces, and converts to sRGB BufferedImage
implements both Little and Big Endian Byte Ordering
decompresses uncompressed, CCITT group 3 and 4, Deflate/Adobe Deflate, LZW, Packbits
support for Single, Multi-file, Tiling, Planar (Chunky, Separated), Predictor, 16,32 bit floating samples
improve read performance
supports threading
superior image scaling algorithms

Learn more about JDeli, and try it yourself.


How to add a watermark to a PDF in Java (Tutorial)

IDRSolutions — Fri, 10 Jul 2026 08:59:37 +0000

What is a PDF watermark?

A watermark in a PDF file is a visual element placed behind or over the main content of the page. Watermarks are typically faint and translucent. The primary purpose of adding a watermark to a PDF is to convey document status (like “Confidential”) or company branding.

Why Add a Watermark to a PDF Using Java?

If you are building an automated document pipeline, you may want to watermark a PDF using Java to protect your intellectual property. You can programmatically stamp your name or logo so that you can be identified as the owner, or automatically mark unfinished documents with a “Draft” image before they are distributed.

Which Java PDF library should you choose?

When looking to add a watermark to a PDF in Java, there are several options to choose from, each with tradeoffs to consider:

Apache PDFBox and iText can both add watermarks to PDFs, but they often require low-level handling of content streams and rendering edge cases to work reliably across complex documents. PDFBox is a free Java API, but can struggle with performance and inconsistent rendering. iText provides a more powerful API, but introduces AGPL/commercial licensing constraints.
JPedal is often preferred for enterprise Java PDF watermarking because its high-fidelity and high-performance engine handles complex PDFs with ease and consistency. It is the perfect tool for batch-processing large volumes of documents.

How to watermark a PDF in Java using JPedal

First, download the JPedal JAR and then add it to your project.

To make edits to PDF files, you can use JPedal’s PdfManipulator class. You can learn more about this powerful tool here. To get started, we will create the basic structure for editing PDF files:

final PdfManipulator pdf = new PdfManipulator();
pdf.loadDocument(new File("inputFile.pdf"));
// insert operations here…
pdf.apply();
pdf.reset();
pdf.writeDocument(new File("outputFile.pdf"));
pdf.closeDocument();

Now we can add different operations depending on what kind of watermarks we want to add.

Add a Watermark to All PDF Pages

You should rebuild the list of pages each time you load a document; otherwise, a document with more pages than the previous one will not have the watermark applied to the additional pages.

final PageRanges pages = new PageRanges(1, pdf.getPageCount());

Add an Image Watermark to a PDF

To add an image watermark to a PDF:

final BufferedImage image = JDeli.read(new File("watermark.png"));
final float[] rect = new float[] {0, 0, 100, 100}; // X1, Y1, X2, Y2
pdf.addImage(pages, image, rect);

Images may be transparent or in different colour spaces.
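
The rect array above places the image at a fixed position. If you would rather centre the watermark, the coordinates can be worked out with a little arithmetic. The helper below is a hypothetical sketch, not part of JPedal, and it assumes you already know the page size in the same units as the rectangle (for example 595 x 842 points for A4).

```java
// Hypothetical helper for illustration only; not part of the JPedal API.
class WatermarkRect {

    /** Returns {x1, y1, x2, y2} for a mark of the given size centred on the page. */
    static float[] centred(float pageWidth, float pageHeight,
                           float markWidth, float markHeight) {
        final float x1 = (pageWidth - markWidth) / 2f;
        final float y1 = (pageHeight - markHeight) / 2f;
        return new float[] {x1, y1, x1 + markWidth, y1 + markHeight};
    }
}
```

The resulting array uses the same X1, Y1, X2, Y2 order as the rect shown above.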

Add a Text Watermark to a PDF

To add a text watermark to a PDF:

final float x = 10;
final float y = 10;
final int fontSize = 12;
final float[] color = {1, 0.3f, 0.2f, 1.0f}; // RGBA
pdf.addText(pages, "Hello World", x, y, BaseFont.HelveticaBold, fontSize, color[0], color[1], color[2], color[3]);

You can also draw text at an angle.

Add a Shape Watermark

To draw a shape onto your PDF:

final Shape shape = new Rectangle2D.Float(56.7f, 596.64f, 131.53f, 139.25f);
final DrawParameters params = new DrawParameters();
params.setStrokeColor(new float[] {1, 0, 0});
params.setFillRule(DrawParameters.STROKE);
pdf.addShape(pages, shape, params);

Using Annotations with Watermarks

Annotations by themselves are not suitable for watermarks, as users can easily remove them. However, you could use an annotation to create a clickable hyperlink over your PDF watermarks.

pdf.addAnnotation(pages, new Link(
    rect,
    Annotation.getFlagsValue(false, false, true, false, false, false, true, true, false, true),
    new float[3], // annotation color
    1.0f, // stroking opacity
    1.0f, // fill opacity
    "https://www.idrsolutions.com/"
));

Learn more

Looking for a pure Java PDF library to handle processing your documents? Check out JPedal.


How to extract JPG data from PDF

IDRSolutions — Wed, 08 Jul 2026 09:00:48 +0000

Overview

It is actually possible to extract some raw images from the PDF file. In general, images do not exist inside a PDF file – TIFFs and PNGs are ripped apart and the data stored in separate objects. The data is compressed using various compression formats (JBIG2, CCITT, FLATE, LZW).

However, one of the formats used for image data is the DCT format. This is actually a JPEG, and if you take the binary data out and save it in a file with a .jpg extension, you can open it. It includes not just the pixel data but also the JPEG header at the start – it is a complete file.

How is the JPEG data stored?

If you open a PDF file, the stored JPEG data will appear in the XObject image. Here is an example.

14 0 obj
<<
/Intent/RelativeColorimetric
/Type/XObject
/ColorSpace/DeviceGray
/Subtype/Image
/Name/X
/Width 2988
/BitsPerComponent 8
/Length 134030
/Height 2286
/Filter/DCTDecode
>>
stream (binary data) endstream

Key Indicators in the PDF Object

The /Type and /Subtype values show that this is an image XObject. The key entry is the /Filter value – DCTDecode indicates a JPEG. JPXDecode indicates a JPEG 2000 image, and the same approach also works for those.

The data is between stream and endstream. You need to extract the raw bytes for the JPEG file (cutting and pasting the text is unlikely to work). The /Length value shows how many bytes there are.
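
To make the extraction step concrete, here is a minimal sketch that copies the bytes following the stream keyword out of a PDF held in memory. It is an illustration under several assumptions: the class and method names are made up, you already know where the object starts and what its /Length value is, /Length is a direct number rather than a reference to another object, and the word "stream" does not appear inside the dictionary itself.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper for illustration only.
class DctStreamSketch {

    /** Finds needle in haystack at or after 'from'; returns -1 if absent. */
    static int indexOf(byte[] haystack, byte[] needle, int from) {
        for (int i = Math.max(from, 0); i <= haystack.length - needle.length; i++) {
            int j = 0;
            while (j < needle.length && haystack[i + j] == needle[j]) {
                j++;
            }
            if (j == needle.length) {
                return i;
            }
        }
        return -1;
    }

    /** Copies 'length' bytes of stream data from the object starting at objectStart. */
    static byte[] extract(byte[] pdf, int objectStart, int length) {
        byte[] keyword = "stream".getBytes(StandardCharsets.US_ASCII);
        int pos = indexOf(pdf, keyword, objectStart);
        if (pos < 0) {
            throw new IllegalArgumentException("no stream keyword found");
        }
        pos += keyword.length;
        // the keyword is followed by a line break (CR LF or LF) before the data
        if (pdf[pos] == '\r') pos++;
        if (pdf[pos] == '\n') pos++;
        return Arrays.copyOfRange(pdf, pos, pos + length);
    }

    /** A JPEG file starts with the two marker bytes FF D8. */
    static boolean looksLikeJpeg(byte[] data) {
        return data.length >= 2
                && (data[0] & 0xFF) == 0xFF
                && (data[1] & 0xFF) == 0xD8;
    }
}
```

Working on bytes rather than text matters here, because reading the file as a string would corrupt the binary JPEG data.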

Understanding the Colour Space

Lastly, the /ColorSpace is important because it shows the colour-coding used in the JPEG. If it is DeviceRGB, it will look exactly as it does in the PDF display. Not many viewers understand types like DeviceCMYK – you may need a heavyweight package like Photoshop to see it correctly.

Notes on Clipped Images

If the image is clipped, you may find you can see background details not in the PDF display and the image may also be a different size or even upside down. But you have extracted the raw image data!


How to redact PDF text with the JPedal Viewer

IDRSolutions — Fri, 03 Jul 2026 13:44:05 +0000

What is redaction and why should you use it?

Redaction is the process of removing sensitive information from a document so that it is suitable for publishing. It is commonly used in legal or government processes when documents are made available to the public while keeping certain details hidden.

If you want to publish parts of a document and have certain secrets remain secrets, then redaction is the right tool to use.

How Redaction in PDFs Works

Redaction typically consists of a black rectangle which covers up the text you want hidden. Traditionally, this was done by drawing over text with a black marker and then scanning it back in.

In the age of digital media, the concept is the same, but care must be taken to ensure that the content is actually removed and cannot be recovered through some sneaky techniques.

Pitfalls of Weak Digital Redaction Tools

One common mistake with low-quality redaction tools is drawing a black box over the text but leaving the text underneath. Other tools may then allow people to remove the boxes and see the text that was there, or large characters like j may simply poke out of the top or bottom of the black box.
Another mistake is replacing the text with empty characters of the same width. This is done to preserve the layout of subsequent characters; however, it is possible to reverse-engineer what text used to be there by looking at the widths of the blank characters and comparing them with the widths from the font. Ideally, any blank spacing should be accumulated so as not to leak any information.
Finally, a common pitfall to be aware of is that sometimes you can simply figure out what used to be there based on context, e.g. “Jane ■■■” appears in one place, but “Jane Doe” appears nearby.

How to redact text using the JPedal Viewer

The JPedal Viewer has a tools menu which contains various operations you can perform on the currently opened document.

The tools menu is hidden by default, so you will need to enable it by going to edit -> preferences -> menu, and selecting tools.

Now that the tools menu is visible, you can open a PDF document in the JPedal Viewer, navigate to the desired page, and select redact from the tools menu.

This will bring up a dialog box to confirm which page you want to draw over. Press OK to confirm.

You can now drag a rectangle over the area you want to redact. Any text that intersects this rectangle will be removed and a black box will take its place.
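
To illustrate what "intersects" means here, the sketch below picks out the text bounding boxes that overlap a redaction rectangle, using the standard java.awt.geom classes. It is only a rough model of the idea and not how the JPedal Viewer is implemented; the class name and the notion of one bounding box per piece of text are assumptions for this example.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not the JPedal implementation.
class RedactionHitTest {

    /** Returns the text bounding boxes that overlap the redaction rectangle. */
    static List<Rectangle2D> hits(List<Rectangle2D> textBounds, Rectangle2D redaction) {
        List<Rectangle2D> result = new ArrayList<>();
        for (Rectangle2D bounds : textBounds) {
            // partial overlap is enough: the text does not need to be fully covered
            if (bounds.intersects(redaction)) {
                result.add(bounds);
            }
        }
        return result;
    }
}
```

Treating a partial overlap as a hit avoids the pitfall described earlier, where part of a character pokes out from behind the box.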

Download JPedal

You can download a copy of the JPedal jar from our website and get started using the JPedal Viewer as a PDF redaction tool.

How FormVu Adds Signature Fields to Converted HTML Forms

IDRSolutions — Wed, 01 Jul 2026 09:15:19 +0000

Digital Signatures: PDF vs HTML

If you convert PDF forms to HTML, you’ve probably run into the signature problem. Every other form field type converts cleanly: text inputs, checkboxes, dropdowns, and radio buttons all have direct HTML equivalents, whereas signature fields don’t.

The PDF spec ties them to cryptographic infrastructure (certificate chains, byte ranges, PKCS#7 envelopes) that doesn’t exist in a browser. Implementing a digital signature in HTML has always meant bolting something on after the fact.

With FormVu’s new feature, users no longer need to download the raw PDF, sign it, and upload it back. FormVu’s May 2026 release adds a digital signature field to the HTML output. Signature fields in the source PDF now convert to a browser-based signing interface, just like every other form field.

How the HTML signature field works

When FormVu encounters a signature field in a source PDF and signing is enabled, the converted HTML includes an HTML electronic signature input at the same position in the form. The interface gives users three ways to sign:

Draw a signature directly in the browser
Upload an image of their signature
Type their name as text

No additional JavaScript libraries, no manual integration per form.

Enabling the signature field in HTML output

No need to write electronic signature HTML code yourself. The feature is controlled by a single JVM flag:

-Dorg.jpedal.pdf2html.useFormVuSigning=true

Or via the Java API:

FormViewerOptions options = new FormViewerOptions();
options.setUseFormVuSigning(true);

Set it to true and signature fields in the source PDF will convert to the signing interface. Set it to false (or omit it) and you get the previous behavior: signature fields render as empty space. You can find the Javadoc for FormVu digital signing here.
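
A -D flag on the command line becomes a Java system property, so the same value can also be set and checked from code. The sketch below shows only that general Java mechanism; the class name is made up, and it assumes (this is not confirmed here) that the property is read when a conversion starts, so it would need to be set before then.

```java
// Hypothetical helper for illustration only.
class SigningFlagSketch {

    // the property name from the JVM flag shown above
    static final String FLAG = "org.jpedal.pdf2html.useFormVuSigning";

    /** Equivalent to passing -Dorg.jpedal.pdf2html.useFormVuSigning=true to the JVM. */
    static void enable() {
        System.setProperty(FLAG, "true");
    }

    /** True only when the property is set to "true" (case-insensitive). */
    static boolean isEnabled() {
        return Boolean.getBoolean(FLAG);
    }
}
```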

What this is (and what it isn’t)

FormVu’s signature feature is visual image-based signing, not cryptographic signing. The user draws, uploads, or types a signature, and that image is placed on the signature field. There’s no certificate embedding, no hash validation, no PKCS#7 envelope generated in the browser.

This approach suits converted HTML forms, since you’re outside the PDF signing infrastructure when signing. The browser captures the signature input so that it can be stored and embedded back into a PDF. If you need cryptographic signatures applied to the original PDF document, use JPedal directly.

Why this matters for PDF-to-HTML conversion

Signature fields were a major form field type that FormVu couldn’t convert before. For teams in insurance, government, healthcare, or compliance, where forms almost always require a signature, this was the gap that kept them serving raw PDFs or maintaining custom workarounds for every converted form.

That constraint is gone. With useFormVuSigning enabled, every signature field in the HTML form works. One flag, no per-form integration work.

Version and compatibility

Available in the May 2026 release of FormVu. There are no changes needed to your existing conversion pipeline beyond setting the JVM flag. The generated signature interface works in all current browsers.

PDF to HTML5 conversion – No Even-Odd Winding Rule for filling shapes in HTML5? – Part 2

IDRSolutions — Fri, 26 Jun 2026 11:20:07 +0000

This is part 2 of this subject. If you haven’t already read it, please read part 1 to understand the winding problem.

So to recap quickly: HTML5 only supports the Non-Zero winding rule, whereas PDF supports both the Non-Zero and Even-Odd rules. This means that we need to do something with shapes filled using the Even-Odd rule, otherwise they may not display correctly.
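For readers who want to see the two rules disagree, here is a small sketch using Java's own java.awt.geom.Path2D rather than HTML5. The nested squares are an invented test shape, not the shape used in this article. Both squares are drawn in the same direction, so the centre is filled under Non-Zero but left empty under Even-Odd.

```java
import java.awt.geom.Path2D;

class WindingRuleDemo {
    /** Builds two nested squares, both drawn in the same direction, with the given winding rule. */
    static Path2D nestedSquares(int windingRule) {
        Path2D path = new Path2D.Double(windingRule);
        addSquare(path, 0, 0, 10); // outer square
        addSquare(path, 3, 3, 4);  // inner square, same direction as the outer one
        return path;
    }

    static void addSquare(Path2D path, double x, double y, double size) {
        path.moveTo(x, y);
        path.lineTo(x + size, y);
        path.lineTo(x + size, y + size);
        path.lineTo(x, y + size);
        path.closePath();
    }

    public static void main(String[] args) {
        Path2D nonZero = nestedSquares(Path2D.WIND_NON_ZERO);
        Path2D evenOdd = nestedSquares(Path2D.WIND_EVEN_ODD);
        // The point (5, 5) lies inside both squares.
        System.out.println("Non-Zero fills centre: " + nonZero.contains(5, 5)); // true
        System.out.println("Even-Odd fills centre: " + evenOdd.contains(5, 5)); // false
    }
}
```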

To demonstrate in this blog article, I will be using this example:

Just a red circle? The first time I saw it, that was my thought too. It’s only after changing the fill to a stroke that you realise what it actually is.

To help understand what’s going on, here it is with arrows showing the directions of the paths shown.

So our first thought was that we needed to make some kind of change to the way that we draw the shape. The obvious thing was to change the direction of some of the paths drawn in order to “convert” it from Even-Odd to Non-Zero.

Would alternating between clockwise and anti-clockwise work? Well, it was worth a try just to see what the effect was. And besides, it would only take a minute anyway.

Well, it turns out that it’s actually a bit of a headache. It’s fine if you are doing something simple, like just using lineTo to draw a box, but with bezier curves, it’s not quite so easy.

If you were to just reverse the order of the draw commands, you get something like this:

The order of the two control points makes a difference to the way that the curve is drawn. So what about reversing the order of the control points for the lines, too?

Oh, that doesn’t quite seem right either. The type of bezier curve used is a bezierCurveTo, which means that it takes our current position (where we ended up from the previous command), and draws a curve to a position that we specify using two control points.

What that means is that we need to step through the positions backwards as well, so that each reversed command ends where the command before it in the original path ended, and our control points draw the curve correctly.
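The reversal described above can be sketched in a few lines of Java. This is a minimal illustration, not the converter's actual code: BezierPathReverser and Curve are hypothetical names, and the path is assumed to be a single moveTo followed only by bezierCurveTo commands.

```java
import java.util.ArrayList;
import java.util.List;

class BezierPathReverser {
    /** One bezierCurveTo command: two control points, then the end point. */
    static final class Curve {
        final double c1x, c1y, c2x, c2y, endX, endY;

        Curve(double c1x, double c1y, double c2x, double c2y, double endX, double endY) {
            this.c1x = c1x; this.c1y = c1y;
            this.c2x = c2x; this.c2y = c2y;
            this.endX = endX; this.endY = endY;
        }
    }

    /**
     * Reverses a path that starts at (startX, startY) and continues with the given curves.
     * The caller should moveTo the end point of the last original curve before drawing the result.
     */
    static List<Curve> reverse(double startX, double startY, List<Curve> curves) {
        List<Curve> reversed = new ArrayList<>();
        for (int i = curves.size() - 1; i >= 0; i--) {
            Curve c = curves.get(i);
            // The original curve began where the previous command ended (or at the path start).
            double prevX = (i == 0) ? startX : curves.get(i - 1).endX;
            double prevY = (i == 0) ? startY : curves.get(i - 1).endY;
            // Swap the two control points and finish at the original starting position.
            reversed.add(new Curve(c.c2x, c.c2y, c.c1x, c.c1y, prevX, prevY));
        }
        return reversed;
    }
}
```

Simply reversing the command list without shifting the end points, or without swapping the control points, gives the distorted results described earlier.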

Good. So we can reverse the direction of the path so that our image draws correctly:

So are we done? No. While doing something like this may be OK for our no-entry sign, where we can control the order and direction of the shapes that get drawn, this approach is no good for a general case.

In the real world, we can’t just program for specific cases; we need to program for a general case so that our code works for any PDF file, not just the ones we’ve seen before.

And this means that doing something like alternating the direction between paths is not going to work. Here’s how it would look for our no-entry sign:

My next thought was that we could just fill the overlapping shapes with the background colour. But this wouldn’t work either because again, the order is important.

Even if we were to fill our two D’s with white, when we draw and fill the circle around them, it’s just going to cover up all of our hard work. It would also be no good for shapes that only partially overlap each other, because only the intersection of the two would need to be unfilled.

And it’s also important to remember that the unfilled pieces are just that – unfilled. By filling them with a background colour, you lose transparency, which means we wouldn’t be able to put anything behind our no entry sign to say you’re not allowed to do it.

So what now? If we can’t draw the shapes in a different order, and we can’t change the direction that some of the shapes are drawn, and we can’t fill in different pieces to create the illusion that we have fixed the problem, how else can we alter our shape so that it magically works using the Non-Zero winding rule?

We could get really clever and do some really complicated stuff and somehow cut our combined shape into lots of smaller shapes so that we can detect how it should be drawn, and then fill the smaller shapes accordingly.

A bit like this:

But that’s a little overkill for what we want. After all, it might not even matter that the shape we are drawing is drawn using Non-Zero rather than Even-Odd. And that’s a lot of computation time that will just be wasted, resulting in poor performance for our converter. And besides, I am nowhere near clever enough to be able to code that up.

Since I started playing with these shapes, there has been a niggling thought at the back of my mind. What if we were just to output these shapes as images and be done with it?

This would work just fine, but it is not optimal. Very rarely is there a shape drawn where the output from using either the Non-Zero or Even-Odd rule is actually different. So, outputting as an image every time a shape is drawn using the Even-Odd rule is going to output lots of images that are not actually required.

And this is no fun because shapes are nice. They scale beautifully, take no time at all to convert from PDF to HTML5, and take up barely any space at all compared to an image.

But what other option is there? Unfortunately, we are left with a compromise. The best we can do is create some criteria that the shape must pass before we output it as an image.
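The post does not list the criteria, so the following is only one possible check, sketched with java.awt.geom.Area. It asks whether the region filled under Even-Odd differs from the region filled under Non-Zero. Only when they differ would the shape need to fall back to an image. WindingRuleCheck and its methods are hypothetical names, and the nested squares are an invented test shape.

```java
import java.awt.geom.Area;
import java.awt.geom.Path2D;

class WindingRuleCheck {
    /** Returns true if Non-Zero would fill a different region than Even-Odd for this path. */
    static boolean rulesDiffer(Path2D path) {
        Path2D evenOdd = new Path2D.Double(path);
        evenOdd.setWindingRule(Path2D.WIND_EVEN_ODD);
        Path2D nonZero = new Path2D.Double(path);
        nonZero.setWindingRule(Path2D.WIND_NON_ZERO);
        // Area resolves each path into the region it fills under its own winding rule.
        return !new Area(evenOdd).equals(new Area(nonZero));
    }

    /** Two nested squares; the inner one is optionally drawn in the opposite direction. */
    static Path2D nestedSquares(boolean reverseInner) {
        Path2D path = new Path2D.Double(Path2D.WIND_EVEN_ODD);
        path.moveTo(0, 0); path.lineTo(10, 0); path.lineTo(10, 10); path.lineTo(0, 10);
        path.closePath();
        path.moveTo(3, 3);
        if (reverseInner) {
            path.lineTo(3, 7); path.lineTo(7, 7); path.lineTo(7, 3);
        } else {
            path.lineTo(7, 3); path.lineTo(7, 7); path.lineTo(3, 7);
        }
        path.closePath();
        return path;
    }

    public static void main(String[] args) {
        System.out.println("Same direction differs: " + rulesDiffer(nestedSquares(false)));
        System.out.println("Opposite direction differs: " + rulesDiffer(nestedSquares(true)));
    }
}
```

A full area comparison like this has its own cost, so a real converter might prefer cheaper tests first, such as skipping paths with a single subpath.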

How to Save Java images as Tifs with JAI

IDRSolutions — Wed, 24 Jun 2026 09:54:07 +0000

Java makes it very easy to create images as BufferedImages, which can then be saved out in standard image file formats. Here is the code to save an image as a Tif image using JAI (Java Advanced Imaging, a free library from Sun).

// Encode the BufferedImage as a Tif file using JAI
com.sun.media.jai.codec.TIFFEncodeParam params = new com.sun.media.jai.codec.TIFFEncodeParam();
FileOutputStream os = new FileOutputStream(outputDir + imageName + ".tif");

javax.media.jai.JAI.create("encode", image, os, "TIFF", params);
os.close(); // flush and release the file handle

This works very nicely but there are a number of extra tricks worth knowing.

Firstly, there is a compression option available to compress the image; use the modified code as shown below. There are several types of compression, but some of them produce Tif files which will not display under Windows or Mac. COMPRESSION_PACKBITS works well.

com.sun.media.jai.codec.TIFFEncodeParam params = new com.sun.media.jai.codec.TIFFEncodeParam();

params.setCompression(com.sun.media.jai.codec.TIFFEncodeParam.COMPRESSION_PACKBITS);

Secondly, the type of image you save out can have a huge effect on the size of the Tif image. If you save a grayscale image, it produces a much smaller file and also compresses much better than an RGB or ARGB image.

You can find out the image type by using image.getType(); the int values returned are all static constants in the BufferedImage class.
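Because getType() returns a bare int, a small lookup can make debugging output easier to read. A minimal sketch, where ImageTypeNames is a hypothetical helper that covers only a few of the constants:

```java
import java.awt.image.BufferedImage;

class ImageTypeNames {
    /** Maps a few common BufferedImage type constants to their names. */
    static String describe(BufferedImage image) {
        switch (image.getType()) {
            case BufferedImage.TYPE_INT_RGB:   return "TYPE_INT_RGB";
            case BufferedImage.TYPE_INT_ARGB:  return "TYPE_INT_ARGB";
            case BufferedImage.TYPE_3BYTE_BGR: return "TYPE_3BYTE_BGR";
            case BufferedImage.TYPE_BYTE_GRAY: return "TYPE_BYTE_GRAY";
            default:                           return "other (" + image.getType() + ")";
        }
    }

    public static void main(String[] args) {
        BufferedImage gray = new BufferedImage(4, 4, BufferedImage.TYPE_BYTE_GRAY);
        System.out.println(describe(gray)); // TYPE_BYTE_GRAY
    }
}
```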

You can convert the image to another format by creating a second BufferedImage in that format and drawing the original image onto it. Here is the code to make any image grayscale.

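// Create a grayscale image of the same size and draw the original onto it
BufferedImage image_to_save2 = new BufferedImage(image_to_save.getWidth(), image_to_save.getHeight(), BufferedImage.TYPE_BYTE_GRAY);
Graphics g = image_to_save2.getGraphics();
g.drawImage(image_to_save, 0, 0, null);
g.dispose(); // release the graphics context once drawing is done
image_to_save = image_to_save2;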

For our PDF library, we always generate images in ARGB (we need to because PDF files can have transparency, which only works in ARGB). But sometimes the image is only grayscale. Using both these tricks of converting to grayscale and compressing allowed us to reduce the size of the Tif files created from a sample PDF file from 1.4 MB to 52 KB, which is pretty impressive!
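Converting to grayscale is only safe when the ARGB image carries no colour or transparency. The post does not say how that is decided, so the following is a hypothetical check (GrayscaleCheck is an invented name) that scans every pixel using only the JDK:

```java
import java.awt.image.BufferedImage;

class GrayscaleCheck {
    /** Returns true only if every pixel is fully opaque and has equal red, green and blue values. */
    static boolean isOpaqueGrayscale(BufferedImage image) {
        for (int y = 0; y < image.getHeight(); y++) {
            for (int x = 0; x < image.getWidth(); x++) {
                int argb = image.getRGB(x, y);
                int alpha = argb >>> 24;
                int red = (argb >> 16) & 0xFF;
                int green = (argb >> 8) & 0xFF;
                int blue = argb & 0xFF;
                if (alpha != 0xFF || red != green || green != blue) {
                    return false; // colour or transparency found, keep the image as ARGB
                }
            }
        }
        return true;
    }
}
```

Scanning every pixel costs time on large images, so whether the check pays off depends on how often pages turn out to be grayscale.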

As experienced Java developers, we help you work with images in Java and bring over a decade of hands-on experience with many image file formats.

How to remove text from a PDF in Java using JPedal (Tutorial)

IDRSolutions — Fri, 19 Jun 2026 08:56:21 +0000

Why remove text from a PDF file?

Removing text from a PDF in Java is a common requirement when dealing with sensitive information such as names, email addresses, phone numbers, and other personally identifiable information.

Whether you are meeting GDPR redaction obligations, preparing documents for external sharing, or sanitising files before archiving, this tutorial explains how to do it using the JPedal PDF library.

What redaction actually means

Removing text from a PDF is a two-part problem. First, you find the text. Then you redact it, which itself has two layers:

Hide the text visually, usually done by drawing an opaque box over it
Remove it from the underlying content stream so it cannot be extracted by a PDF reader or copy-paste

Both steps are critical. Drawing a black box without editing the content stream is not true redaction. The text is still there, just invisible, and people will be able to copy and paste it. JPedal handles both steps, and together these are called redaction.

Choosing a Java PDF library for text removal

Most developers reach for Apache PDFBox first, but programmatically removing text from a PDF in Java, rather than just drawing over it, requires direct access to the content stream. JPedal exposes this through a clean API, handling both the search and the redaction in a few lines of code without manual stream manipulation.

Find, delete and redact text from a PDF in Java using JPedal

Open the PDF, scan each page for the target text, redact every match, then write out the modified document. The key methods are findTextOnPage() to locate matches and redact() to remove them. pdf.apply() commits the redaction operations to the document before writing.

Download JPedal trial jar.
Create a File handle to the PDF file
Include a password if the file is password protected
Open the PDF file
Scan the pages for text
Redact each match
Write the output and close

final File inputFile = new File("inputFile.pdf");
final FindTextInRectangle extract = new FindTextInRectangle(inputFile);
final PdfManipulator pdf = new PdfManipulator();
pdf.loadDocument(inputFile);
if (extract.openPDFFile()) {
    final int pageCount = extract.getPageCount();
    for (int page = 1; page <= pageCount; page++) {
        final float[] coords = extract.findTextOnPage(page, "the", SearchType.MUTLI_LINE_RESULTS);
        for (int val = 0; val < coords.length; val = val + 5) {
            pdf.redact(page, new float[] {coords[val], coords[val + 1], coords[val + 2], coords[val + 3]});
        }
    }
}
extract.closePDFfile();
//apply changes and write out
pdf.apply();
final File outputFile = new File("redactedFile.pdf");
pdf.writeDocument(outputFile);
pdf.closeDocument();

findTextOnPage() returns a flat float array of coordinates for each match: x1, y1, x2, y2, plus a fifth value (magic number documented here) at index 4, which is why the loop increments by 5. The output is a new PDF with every instance of the search term permanently removed from both the visual layer and the content stream.
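The stride-of-5 layout can be sketched without JPedal at all. MatchCoords below is a hypothetical helper that splits such a flat array into one four-value rectangle per match and drops the fifth value:

```java
import java.util.ArrayList;
import java.util.List;

class MatchCoords {
    /** Each match occupies five consecutive floats: x1, y1, x2, y2 and one extra value. */
    static final int VALUES_PER_MATCH = 5;

    /** Splits the flat array into one {x1, y1, x2, y2} rectangle per match. */
    static List<float[]> toRectangles(float[] coords) {
        List<float[]> rectangles = new ArrayList<>();
        // Stop before any incomplete trailing group rather than reading past the end.
        for (int i = 0; i + VALUES_PER_MATCH <= coords.length; i += VALUES_PER_MATCH) {
            rectangles.add(new float[] {coords[i], coords[i + 1], coords[i + 2], coords[i + 3]});
        }
        return rectangles;
    }

    public static void main(String[] args) {
        // Invented sample values standing in for two matches.
        float[] coords = {10f, 700f, 40f, 688f, 0f, 10f, 650f, 40f, 638f, 0f};
        System.out.println("Matches found: " + toRectangles(coords).size());
    }
}
```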

The original file is not modified unless you overwrite it. Add try-catch blocks around the file operations and PDF calls for production use. For other PDF text manipulation tasks in Java, extracting, searching, or modifying content programmatically, see the JPedal tutorials.

You can expand your understanding of the PDF format by reading our other articles. Similarly, if there is a specific term for PDF you would like to know more about, our PDF Glossary has an extensive list of common terms.