Error in pixReadFromTiffStream: old style jpeg format is not supported #4475

@milahu

Description

Current Behavior

tesseract 5.5.1 tries to read the thumbnail images embedded in TIFF files,
but it fails to open them and prints the following message:

Error in pixReadFromTiffStream: old style jpeg format is not supported

test image

helloworld-with-thumb.tiff

$ wget https://github.com/user-attachments/files/23533251/helloworld-with-thumb.tiff
$ tesseract -l eng helloworld-with-thumb.tiff -
Page 1
Hello world
Error in pixReadFromTiffStream: old style jpeg format is not supported

Reproduce

in GIMP: Export As -> test.tiff, with compression: JPEG and "Save thumbnail" enabled
run tesseract -l eng test.tiff -

Expected Behavior

tesseract should recognize that these directories are only thumbnail images
and silently ignore them

Suggested Fix

No response

tesseract version

$ tesseract -v
tesseract 5.5.1
 leptonica-1.85.0
  libgif 5.2.2 : libjpeg 6b (libjpeg-turbo 3.1.0) : libpng 1.6.47 : libtiff 4.7.0 : zlib 1.3.1 : libwebp 1.5.0 : libopenjp2 2.5.2

Operating System

No response

Other Operating System

No response

uname -a

No response

Compiler

No response

CPU

No response

Virtualization / Containers

No response

Other Information

the message comes from pixReadFromTiffStream in leptonica/src/tiffio.c:

        /* Old style jpeg is not supported.  We tried supporting 8 bpp.
         * TIFFReadScanline() fails on this format, so we used RGBA
         * reading, which generates a 4 spp image, and pulled out the
         * red component.  However, there were problems with double-frees
         * in cleanup.  For RGB, tiffbpl is exactly half the size that
         * you would expect for the raster data in a scanline, which
         * is 3 * w.  */
    TIFFGetFieldDefaulted(tif, TIFFTAG_COMPRESSION, &tiffcomp);
    if (tiffcomp == COMPRESSION_OJPEG) {
        L_ERROR("old style jpeg format is not supported\n", __func__);
        return NULL;
    }
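
To check whether the failing directory really is a thumbnail, the TIFF directory chain can be inspected without decoding any pixels. The following is a minimal, dependency-free sketch and not leptonica or tesseract code. It walks the IFDs of a classic (non-BigTIFF) file held in memory and records the NewSubfileType (tag 254) and Compression (tag 259) values from the TIFF 6.0 specification. The names `ifd_info`, `tiff_list_ifds`, `ifd_is_thumbnail` and `ifd_is_old_jpeg` are hypothetical. It also assumes that the writer flags the thumbnail with the reduced-resolution bit of NewSubfileType, which has not been verified for the attached file.

```c
#include <stddef.h>
#include <stdint.h>

/* Tag and value constants from the TIFF 6.0 specification. */
enum {
    TAG_NEW_SUBFILE_TYPE = 254, /* libtiff: TIFFTAG_SUBFILETYPE */
    TAG_COMPRESSION      = 259, /* libtiff: TIFFTAG_COMPRESSION */
    TYPE_SHORT           = 3,
    TYPE_LONG            = 4,
    REDUCED_IMAGE_BIT    = 1,   /* libtiff: FILETYPE_REDUCEDIMAGE */
    OLD_STYLE_JPEG       = 6    /* libtiff: COMPRESSION_OJPEG */
};

/* Hypothetical summary of one image file directory (IFD). */
typedef struct {
    uint32_t subfile_type; /* 0 when the tag is absent */
    uint16_t compression;  /* 1 (uncompressed) when the tag is absent */
} ifd_info;

static uint16_t rd16(const uint8_t *p, int big)
{
    return big ? (uint16_t)(p[0] << 8 | p[1]) : (uint16_t)(p[1] << 8 | p[0]);
}

static uint32_t rd32(const uint8_t *p, int big)
{
    return big ? ((uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
                  (uint32_t)p[2] << 8 | (uint32_t)p[3])
               : ((uint32_t)p[3] << 24 | (uint32_t)p[2] << 16 |
                  (uint32_t)p[1] << 8 | (uint32_t)p[0]);
}

/* Walk the IFD chain and fill out[0..max-1].
 * Returns the number of IFDs recorded, or -1 if the buffer is malformed.
 * The walk stops after max directories, so a cyclic chain cannot loop forever. */
int tiff_list_ifds(const uint8_t *buf, size_t len, ifd_info *out, int max)
{
    int big, n = 0;
    uint32_t off;

    if (len < 8) return -1;
    if (buf[0] == 'I' && buf[1] == 'I') big = 0;       /* little-endian */
    else if (buf[0] == 'M' && buf[1] == 'M') big = 1;  /* big-endian */
    else return -1;
    if (rd16(buf + 2, big) != 42) return -1;           /* classic TIFF magic */

    off = rd32(buf + 4, big);
    while (off != 0 && n < max) {
        unsigned i, count;
        ifd_info info = { 0, 1 };

        if (off > len || len - off < 2) return -1;
        count = rd16(buf + off, big);
        if (len - off < 2 + (size_t)count * 12 + 4) return -1;

        for (i = 0; i < count; i++) {
            const uint8_t *e = buf + off + 2 + (size_t)i * 12;
            uint16_t tag = rd16(e, big), type = rd16(e + 2, big);
            uint32_t val;

            if (rd32(e + 4, big) != 1) continue;       /* single values only */
            if (type == TYPE_SHORT) val = rd16(e + 8, big);
            else if (type == TYPE_LONG) val = rd32(e + 8, big);
            else continue;

            if (tag == TAG_NEW_SUBFILE_TYPE) info.subfile_type = val;
            else if (tag == TAG_COMPRESSION) info.compression = (uint16_t)val;
        }
        out[n++] = info;
        off = rd32(buf + off + 2 + (size_t)count * 12, big); /* next IFD */
    }
    return n;
}

/* Assumption: a thumbnail is marked as a reduced-resolution image. */
int ifd_is_thumbnail(const ifd_info *d)
{
    return (d->subfile_type & REDUCED_IMAGE_BIT) != 0;
}

/* Same condition as the COMPRESSION_OJPEG check quoted above. */
int ifd_is_old_jpeg(const ifd_info *d)
{
    return d->compression == OLD_STYLE_JPEG;
}
```

A caller could read the file into a buffer, call `tiff_list_ifds`, and skip every directory for which `ifd_is_thumbnail` is true instead of passing it to the decoder. If the thumbnail directory turns out not to carry the reduced-resolution bit, `ifd_is_old_jpeg` still identifies the directory that triggers the message.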

this continues #3008
