How to extract text from PDF?

    Jerry van de Bunt wrote (#1):

    Hi everyone,

    How can I extract text from a PDF file? 😁

      ChrisW67 wrote (#2):

      Welcome to the forum.

      Assuming you want to do this using Qt, there is no out-of-the-box way to achieve it.

      How you go about it depends on what you are doing this for, how you want to handle non-text content, how you want to handle layout, what platform you are on, ...

      You might get away with something like Ghostscript:
      gs -sDEVICE=txtwrite -o output.txt input.pdf
      (or the Windows equivalent, where the console executable is usually named gswin64c.exe or gswin32c.exe rather than gs)
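
A minimal sketch of wrapping the Ghostscript call above in a shell function. The function name `pdf_text_gs` and the `GS_BIN` override variable are hypothetical, introduced only for illustration. The added `-q` flag just silences the startup banner, and `-o` already implies `-dBATCH -dNOPAUSE`.

```shell
# Sketch only: pdf_text_gs and GS_BIN are illustrative names, not part of Ghostscript.
# Usage: pdf_text_gs input.pdf output.txt
pdf_text_gs() {
    if [ ! -f "$1" ]; then
        echo "pdf_text_gs: no such file: $1" >&2
        return 1
    fi
    # GS_BIN lets the caller pick another executable name
    # (for example gswin64c on Windows); it defaults to gs.
    # -q                 suppress the startup banner
    # -sDEVICE=txtwrite  write extracted text instead of rendering pages
    # -o FILE            output file; also implies -dBATCH -dNOPAUSE
    "${GS_BIN:-gs}" -q -sDEVICE=txtwrite -o "$2" "$1"
}
```

Called as `pdf_text_gs report.pdf report.txt` (file names are placeholders), this only recovers text that is actually stored in the PDF; pages that are scanned images yield little or nothing.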

        SGaist (Lifetime Qt Champion) wrote (#3):

        Hi and welcome to devnet,

        Another option is to convert your PDF to images and use something like Tesseract to run OCR on them. This also works for scanned PDFs that have no text layer.
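
The render-then-OCR approach could look roughly like the sketch below. It assumes `pdftoppm` (from poppler-utils) for rendering, which is one possible choice and not something the post prescribes, plus the `tesseract` command-line tool. The function name `pdf_text_ocr`, the 300 dpi resolution, and the default language `eng` are illustrative assumptions.

```shell
# Sketch only: pdf_text_ocr is an illustrative name; pdftoppm is an assumed renderer.
# Usage: pdf_text_ocr input.pdf output.txt [tesseract_language]
pdf_text_ocr() {
    if [ ! -f "$1" ]; then
        echo "pdf_text_ocr: no such file: $1" >&2
        return 1
    fi
    ocr_dir=$(mktemp -d) || return 1
    ocr_status=0
    # Render every page to a 300 dpi PNG named page-<n>.png.
    # pdftoppm zero-pads the page number, so the glob below keeps page order.
    if pdftoppm -r 300 -png "$1" "$ocr_dir/page"; then
        : > "$2"    # start with an empty output file
        for ocr_img in "$ocr_dir"/page-*.png; do
            [ -e "$ocr_img" ] || continue
            # "stdout" as the output base makes tesseract print the recognised text
            tesseract "$ocr_img" stdout -l "${3:-eng}" >> "$2" || ocr_status=1
        done
    else
        ocr_status=1
    fi
    rm -rf "$ocr_dir"
    return "$ocr_status"
}
```

OCR is slower and less exact than reading an embedded text layer, so it is probably best kept as a fallback for PDFs where direct extraction returns nothing useful.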

          hskoglund wrote (#4):

          Hi, on Ubuntu there's pdftotext (part of the poppler-utils package).

          There's also a QPdfDocument class with a getAllText() function, which takes a page index and returns a QPdfSelection holding that page's text. However, it looks like you have to compile/build QPdfDocument yourself, i.e. it's not included in the Qt installer.
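
A small sketch of the pdftotext route, assuming poppler-utils is installed (on Ubuntu: `sudo apt install poppler-utils`). The function name `pdf_text_poppler` is hypothetical; `-layout`, `-f`, and `-l` are standard pdftotext options for preserving the physical layout and choosing the first and last page.

```shell
# Sketch only: pdf_text_poppler is an illustrative name, not part of poppler-utils.
# Usage: pdf_text_poppler input.pdf output.txt [first_page last_page]
pdf_text_poppler() {
    if [ ! -f "$1" ]; then
        echo "pdf_text_poppler: no such file: $1" >&2
        return 1
    fi
    if [ "$#" -ge 4 ]; then
        # Restrict extraction to pages first_page..last_page (1-based)
        pdftotext -layout -f "$3" -l "$4" "$1" "$2"
    else
        # Extract the whole document, keeping the physical layout
        pdftotext -layout "$1" "$2"
    fi
}
```

Dropping `-layout` makes pdftotext emit text in reading order instead, which may suit multi-column documents better.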

