Tesseract fails to process image paths enclosed in quotes (e.g., from Windows "Copy as path") #4515

@eduardomozart

Current Behavior

Tesseract fails when the text file passed as input (a list of images to process) contains file paths enclosed in double quotes. The quotes appear to be treated as part of the file name, as the Leptonica errors show:

C:\Users\eduardo.oliveira\Downloads>tesseract C:\Users\eduardo.oliveira\Downloads\tesseract.txt output-prefix -l eng pdf
Leptonica Error in fopenReadStream: file not found: "10.png"
Leptonica Error in pixRead: image file not found: "10.png"
Image file "10.png" cannot be read!
Error during processing.

Failing Examples:

Absolute path (Fails):
"C:\Users\eduardo.oliveira\Downloads\10.png"

Relative path (Fails):
"10.png"

Working Examples:
Tesseract processes the files correctly if the quotes are manually removed, even if the path contains spaces.

Relative path without quotes (Works):
10.png

Absolute path without quotes (Works):
C:\Users\eduardo.oliveira\Downloads\10.png

Absolute path with spaces, without quotes (Works):
C:\Users\eduardo.oliveira\Downloads\1 0.png
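
Workaround sketch (Python):
Since the unquoted forms work, the manual quote removal can be scripted until Tesseract handles this itself. The following is only a minimal sketch of such a pre-processing step. The function name `clean_list_file` and the output file name `tesseract-clean.txt` are made up for illustration, and it assumes the list file is UTF-8 text with one path per line.

```python
from pathlib import Path


def clean_list_file(src: Path, dst: Path) -> int:
    """Copy an image list from src to dst without quotes around each path.

    Returns the number of paths written. Blank lines are dropped.
    Assumption: the list is UTF-8 text (with or without a BOM), one path per line.
    """
    cleaned = []
    # "utf-8-sig" also accepts files that start with a byte order mark.
    for raw in src.read_text(encoding="utf-8-sig").splitlines():
        # Trim surrounding whitespace first, then double quotes at either end.
        path = raw.strip().strip('"')
        if path:
            cleaned.append(path)
    dst.write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return len(cleaned)


if __name__ == "__main__":
    # Hypothetical file names; adjust to the actual list file.
    count = clean_list_file(Path("tesseract.txt"), Path("tesseract-clean.txt"))
    print(f"wrote {count} paths")
```

The cleaned file can then be passed to Tesseract in place of the original list, for example `tesseract tesseract-clean.txt output-prefix -l eng pdf`.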

Expected Behavior

Tesseract should accept quoted file paths in the input text file: it should strip or ignore the surrounding quotes and process the file. Wrapping paths in quotes is common practice at the OS level, particularly for paths that contain spaces and for paths copied with standard clipboard shortcuts.

Suggested Fix

It seems this might be a regression in Leptonica. However, as a robust workaround within Tesseract itself, it would be highly beneficial to automatically strip leading and trailing double quotes (") from image names/paths during the text file parsing phase before passing them along to Leptonica.
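
To make the suggestion concrete, here is a rough sketch of the stripping rule I have in mind. It is written in Python only to illustrate the logic; it is not Tesseract code, and the function name `strip_surrounding_quotes` is hypothetical. It assumes that only one matching pair of double quotes should be removed, and that trailing line terminators and whitespace are not part of the path.

```python
def strip_surrounding_quotes(entry: str) -> str:
    """Return entry without one matching pair of surrounding double quotes."""
    # Assumption: whitespace and line terminators around the entry are not
    # part of the path, so they are trimmed before looking at the quotes.
    entry = entry.strip()
    # Only unwrap when both ends are quoted; a lone quote is left untouched
    # so that the later "file not found" error still shows the real input.
    if len(entry) >= 2 and entry[0] == '"' and entry[-1] == '"':
        return entry[1:-1]
    return entry
```

Removing only a matched pair keeps the change narrow: unquoted entries, including those with spaces, pass through unchanged, so the currently working cases would not be affected.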

tesseract -v

tesseract v5.4.0.20240606
leptonica-1.84.1
libgif 5.2.1 : libjpeg 8d (libjpeg-turbo 3.0.1) : libpng 1.6.43 : libtiff 4.6.0 : zlib 1.3 : libwebp 1.4.0 : libopenjp2 2.5.2
Found AVX2
Found AVX
Found FMA
Found SSE4.1
Found libarchive 3.7.4 zlib/1.3.1 liblzma/5.6.1 bz2lib/1.0.8 liblz4/1.9.4 libzstd/1.5.6

Operating System

Windows 11

Other Operating System

No response

uname -a

No response

Compiler

No response

CPU

12th Gen Intel(R) Core(TM) i5-1235U (1.30 GHz)

Virtualization / Containers

No response

Other Information

This bug significantly impacts Windows users because selecting multiple files in Windows Explorer and clicking "Copy as path" automatically wraps the absolute paths in quotes. Manually removing these quotes from large lists of files is a tedious extra step.
