How to Read PDFs with OCR in C#

By Kannapat Udonpant

December 30, 2024

Updated June 2, 2025

In this tutorial, you'll learn how to extract text from PDF documents using IronOCR in C#. First, set up your environment: install the IronOcr package via NuGet, import the IronOcr namespace, and set your license key. The tutorial begins by initializing IronTesseract, the OCR engine, to read a PDF document such as 'Iron pdf.pdf'. By instantiating an OCR input object for the PDF and passing it to the engine, you can extract text from the entire document and print it to the console.

Advanced scenarios include extracting text from specific pages by specifying page indices or from specific areas like forms or tables by defining rectangle objects to represent these regions. This flexibility allows you to manage PDF data programmatically, whether it's an entire document, specific pages, or defined regions.

The tutorial concludes by demonstrating how IronOCR handles PDF data and encourages viewers to sign up for a trial on the Iron Software website to try the software firsthand. By following the steps outlined, developers can extract text from PDF documents efficiently.

using System;
using IronOcr;

class Program
{
    static void Main()
    {
        // Set the license key before running OCR (replace 'YOUR_LICENSE_KEY' with your actual license key)
        IronOcr.License.LicenseKey = "YOUR_LICENSE_KEY";

        // Initialize the IronTesseract OCR engine
        var Ocr = new IronTesseract();

        // Specify the PDF file to read
        string pdfPath = "Iron pdf.pdf";

        // Perform OCR on the PDF document to extract text
        using (var Input = new OcrInput(pdfPath))
        {
            // Optionally, specify pages or areas to extract text from
            // E.g., Input.SelectPages(1); // To select only the first page

            // Execute OCR and capture the result
            var Result = Ocr.Read(Input);

            // Output the extracted text to the console
            Console.WriteLine(Result.Text);
        }
    }
}
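
The page and region scenarios mentioned above can be sketched roughly as follows. This is a minimal, unverified sketch rather than the tutorial's own code: it assumes a recent IronOCR release that provides an OcrPdfInput class accepting named PageIndices and ContentAreaRects arguments, and a Rectangle type from IronSoftware.Drawing. Those names, the zero-based page numbering, and the rectangle coordinates are assumptions for illustration, so check them against the API reference for your installed version.

```csharp
using System;
using System.Collections.Generic;
using IronOcr;
using IronSoftware.Drawing; // Assumption: Rectangle type used for scan regions

class PageAndRegionExample
{
    static void Main()
    {
        var ocr = new IronTesseract();

        // Assumption: page indices are zero-based, so this selects pages 1 and 3
        var pageIndices = new List<int> { 0, 2 };

        // Assumption: OcrPdfInput accepts a named PageIndices argument
        using (var pagesInput = new OcrPdfInput("Iron pdf.pdf", PageIndices: pageIndices))
        {
            Console.WriteLine(ocr.Read(pagesInput).Text);
        }

        // Hypothetical region (x, y, width, height) covering a form or table
        var regions = new[] { new Rectangle(50, 100, 600, 300) };

        // Assumption: OcrPdfInput accepts a named ContentAreaRects argument
        using (var regionInput = new OcrPdfInput("Iron pdf.pdf", ContentAreaRects: regions))
        {
            Console.WriteLine(ocr.Read(regionInput).Text);
        }
    }
}
```

Limiting OCR to the pages or regions you actually need generally shortens processing, since the engine has less content to scan.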

Further Reading: How to Read PDFs

Kannapat Udonpant

Software Engineer

Before becoming a Software Engineer, Kannapat completed a PhD in Environmental Resources at Hokkaido University in Japan. While pursuing his degree, Kannapat also became a member of the Vehicle Robotics Laboratory, which is part of the Department of Bioproduction Engineering. In 2022, he leveraged his C# skills to join Iron Software's engineering team, where he focuses on IronPDF. Kannapat values his job because he learns directly from the developer who writes most of the code used in IronPDF. In addition to peer learning, Kannapat enjoys the social aspect of working at Iron Software. When he's not writing code or documentation, Kannapat can usually be found gaming on his PS5 or rewatching The Last of Us.