In this tutorial, you'll learn how to extract text from PDF documents using IronOCR in C#. First, set up your environment: install the IronOcr package via NuGet, import the IronOcr namespace, and apply your license key. The tutorial begins by initializing IronTesseract, the OCR engine, to read a PDF document such as 'Iron pdf.pdf'. By instantiating a PDF input object (OcrPdfInput), you can extract text from the entire PDF and print it to the console.
Advanced scenarios include extracting text from specific pages by passing page indices, or from specific areas such as forms or tables by defining rectangle objects that mark those regions. This lets you handle PDF data programmatically at whatever granularity you need: an entire document, selected pages, or defined regions.
The tutorial concludes by encouraging viewers to sign up for a trial on the Iron Software website and try the software firsthand. By following the steps outlined, developers can extract text from PDF documents with only a few lines of code.
using System;
using IronOcr;

class Program
{
    static void Main()
    {
        // Set the license key before using the engine
        // (replace 'YOUR_LICENSE_KEY' with your actual license key)
        IronOcr.License.LicenseKey = "YOUR_LICENSE_KEY";

        // Initialize the IronTesseract OCR engine
        var ocr = new IronTesseract();

        // Specify the PDF file to read
        string pdfPath = "Iron pdf.pdf";

        // Load the PDF as OCR input; the input is disposed when the block ends
        using (var input = new OcrPdfInput(pdfPath))
        {
            // To read only certain pages or regions, restrict the input
            // when constructing it instead of reading the whole document

            // Execute OCR and capture the result
            var result = ocr.Read(input);

            // Output the extracted text to the console
            Console.WriteLine(result.Text);
        }
    }
}
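The page-specific and region-specific extraction mentioned earlier can be sketched along the following lines. This is a minimal sketch, not the tutorial's own code: it assumes a recent IronOCR release in which the OcrPdfInput constructor accepts PageIndices and ContentAreas named arguments, that page indices are zero-based, and that Rectangle comes from the IronSoftware.Drawing namespace. The page numbers and rectangle coordinates are hypothetical placeholders, so check the signatures against the version you have installed.

```csharp
using System;
using System.Collections.Generic;
using IronOcr;
using IronSoftware.Drawing; // Assumption: Rectangle type used for content areas

class Program
{
    static void Main()
    {
        // Initialize the IronTesseract OCR engine
        var ocr = new IronTesseract();
        string pdfPath = "Iron pdf.pdf";

        // Read only selected pages.
        // Assumption: indices are zero-based, so 0 and 2 mean the first and third pages.
        var pageIndices = new List<int> { 0, 2 };
        using (var pagesInput = new OcrPdfInput(pdfPath, PageIndices: pageIndices))
        {
            OcrResult pagesResult = ocr.Read(pagesInput);
            Console.WriteLine(pagesResult.Text);
        }

        // Read only a rectangular region, such as a form field or a table.
        // Hypothetical coordinates: x = 50, y = 100, width = 400, height = 200 (pixels).
        var regions = new[] { new Rectangle(50, 100, 400, 200) };
        using (var regionInput = new OcrPdfInput(pdfPath, ContentAreas: regions))
        {
            OcrResult regionResult = ocr.Read(regionInput);
            Console.WriteLine(regionResult.Text);
        }
    }
}
```

Restricting the input to the pages or regions you actually need generally means less work for the OCR engine than reading the whole document and filtering the text afterwards.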
Further Reading: How to Read PDFs