Object Model

Every piece of data in a PDF is an object. LibPDF represents these objects as TypeScript classes that you can inspect and manipulate.

PDF Object Types

PDF defines eight basic object types (integers and reals are both numeric objects), plus indirect references:

PDF Type	TypeScript Class	PDF Syntax	Example
Boolean	`boolean`	`true` / `false`	`true`
Integer	`PdfNumber`	`42`	`42`
Real	`PdfNumber`	`3.14`	`3.14`
String	`PdfString`	`(Hello)` or `<48656C6C6F>`	`(Hello World)`
Name	`PdfName`	`/Type`	`/Page`
Array	`PdfArray`	`[1 2 3]`	`[0 0 612 792]`
Dictionary	`PdfDict`	`<< /Key /Value >>`	`<< /Type /Page >>`
Stream	`PdfStream`	dictionary + binary data	Page content, images
Null	`null`	`null`	`null`
Reference	`PdfRef`	`1 0 R`	`5 0 R`
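
To make the syntax column concrete, here is a minimal sketch that guesses the type of a single PDF token from its leading characters. `classifyToken` and `PdfKind` are hypothetical names for illustration, not LibPDF APIs, and the sketch ignores comments, nesting, and streams (which cannot be recognised from one token).

```typescript
type PdfKind =
  | "boolean" | "integer" | "real" | "string" | "name"
  | "array" | "dictionary" | "null" | "reference" | "unknown";

// Hypothetical helper (not a LibPDF API): classify one PDF token by its syntax.
function classifyToken(token: string): PdfKind {
  const t = token.trim();
  if (t === "true" || t === "false") return "boolean";
  if (t === "null") return "null";
  if (/^\d+ \d+ R$/.test(t)) return "reference";
  if (/^[+-]?\d+$/.test(t)) return "integer";
  if (/^[+-]?(\d+\.\d*|\.\d+)$/.test(t)) return "real";
  if (t.startsWith("<<")) return "dictionary"; // check before hex strings, which start with "<"
  if (t.startsWith("(") || t.startsWith("<")) return "string";
  if (t.startsWith("/")) return "name";
  if (t.startsWith("[")) return "array";
  return "unknown";
}

console.log(classifyToken("<48656C6C6F>")); // string
console.log(classifyToken("<< /Type /Page >>")); // dictionary
console.log(classifyToken("5 0 R")); // reference
```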

Working with Objects

PdfDict (Dictionary)

Dictionaries are key-value maps, the most common structure in PDFs. Every page, font, and image is a dictionary.

import { PdfArray, PdfDict, PdfName, PdfNumber, PdfString } from "@libpdf/core";

// Create a dictionary
const dict = PdfDict.of({
  Type: PdfName.of("Page"),
  MediaBox: new PdfArray([PdfNumber.of(0), PdfNumber.of(0), PdfNumber.of(612), PdfNumber.of(792)]),
});

// Read values with typed getters
const type = dict.getName("Type"); // PdfName | undefined
const box = dict.getArray("MediaBox"); // PdfArray | undefined
const count = dict.getNumber("Count"); // PdfNumber | undefined
const title = dict.getString("Title"); // PdfString | undefined
const ref = dict.getRef("Pages"); // PdfRef | undefined

// Generic get (returns PdfObject | undefined)
const value = dict.get("SomeKey");

// Check existence
if (dict.has("Resources")) {
  // ...
}

// Set values
dict.set("NewKey", PdfString.fromString("value"));

// Delete keys
dict.delete("OldKey");

// Iterate
for (const [name, value] of dict) {
  console.log(`${name.value}: ${value.type}`);
}

PdfArray

Arrays are ordered collections of objects.

import { PdfArray, PdfNumber } from "@libpdf/core";

// Create an array
const arr = PdfArray.of(PdfNumber.of(0), PdfNumber.of(0), PdfNumber.of(612), PdfNumber.of(792));

// Access by index
const first = arr.at(0); // PdfObject | undefined
const count = arr.length; // number

// Add items
arr.push(PdfNumber.of(100));

// Insert at index
arr.insert(0, PdfNumber.of(-10));

// Iterate
for (let i = 0; i < arr.length; i++) {
  const item = arr.at(i);
  console.log(item?.type);
}

PdfRef (Reference)

References point to indirect objects by number:

import { PdfRef } from "@libpdf/core";

// Create a reference
const ref = PdfRef.of(5, 0); // Object 5, generation 0

// References are value objects
ref.objectNumber; // 5
ref.generation; // 0

// Compare references (interned, so use ===)
const same = PdfRef.of(5, 0);
ref === same; // true (same instance due to interning)

// Resolve a reference to get the actual object
const obj = pdf.context.resolve(ref);
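
Interning is what makes `===` a valid comparison. The sketch below shows one common way to implement it, with a cache keyed by object number and generation. `InternedRef` is a hypothetical stand-in written for this example; it is not LibPDF's source and the real class may work differently.

```typescript
// Hypothetical stand-in for an interned reference class (not LibPDF's source).
class InternedRef {
  private static readonly cache = new Map<string, InternedRef>();

  private constructor(
    readonly objectNumber: number,
    readonly generation: number,
  ) {}

  // Return the single shared instance for each (objectNumber, generation) pair.
  static of(objectNumber: number, generation = 0): InternedRef {
    const key = `${objectNumber} ${generation}`;
    let ref = InternedRef.cache.get(key);
    if (!ref) {
      ref = new InternedRef(objectNumber, generation);
      InternedRef.cache.set(key, ref);
    }
    return ref;
  }

  toString(): string {
    return `${this.objectNumber} ${this.generation} R`;
  }
}

const refA = InternedRef.of(5, 0);
const refB = InternedRef.of(5, 0);
console.log(refA === refB); // true
console.log(String(refA)); // 5 0 R

// Because instances are shared, they also work directly as Map keys.
const pageLabels = new Map<InternedRef, string>([[refA, "first page"]]);
console.log(pageLabels.get(InternedRef.of(5))); // first page
```

The private constructor is the important design choice: callers can only obtain instances through `of`, so two distinct instances for the same pair can never exist.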

PdfName

Names are atomic symbols used as dictionary keys and type identifiers:

import { PdfName } from "@libpdf/core";

// Create a name
const name = PdfName.of("Type");

// Common predefined names
PdfName.Page; // /Page
PdfName.Type; // /Type
PdfName.Catalog; // /Catalog

// Get the string value
name.value; // "Type"

PdfString

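Strings hold text data. PDF has two ways to write them: literal strings such as `(Hello)` and hexadecimal strings such as `<48656C6C6F>`. `PdfString` covers both: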

import { PdfString } from "@libpdf/core";

// From a JavaScript string (auto-encodes)
const str = PdfString.fromString("Hello, World!");

// From raw bytes
const bytes = new Uint8Array([0x48, 0x65, 0x6c, 0x6c, 0x6f]);
const str2 = PdfString.fromBytes(bytes);

// Get as JavaScript string
str.asString(); // "Hello, World!"

// Get raw bytes
str.bytes; // Uint8Array
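
The two string syntaxes carry the same bytes and differ only in how those bytes are written. As a rough illustration, the sketch below writes a byte array both ways. `toHexSyntax` and `toLiteralSyntax` are hypothetical helpers, not LibPDF APIs, and the literal form assumes printable ASCII input.

```typescript
// Hypothetical helpers (not LibPDF APIs) that write bytes in the two PDF string syntaxes.

// Hexadecimal form: two uppercase hex digits per byte, wrapped in <...>.
function toHexSyntax(bytes: Uint8Array): string {
  let hex = "";
  for (let i = 0; i < bytes.length; i++) {
    const b = bytes[i] ?? 0;
    hex += (b < 16 ? "0" : "") + b.toString(16).toUpperCase();
  }
  return `<${hex}>`;
}

// Literal form: wrapped in (...), with backslashes and parentheses escaped.
// Assumes printable ASCII; other bytes would need octal escapes.
function toLiteralSyntax(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i++) {
    const ch = String.fromCharCode(bytes[i] ?? 0);
    out += ch === "\\" || ch === "(" || ch === ")" ? `\\${ch}` : ch;
  }
  return `(${out})`;
}

const helloBytes = new Uint8Array([0x48, 0x65, 0x6c, 0x6c, 0x6f]);
console.log(toHexSyntax(helloBytes)); // <48656C6C6F>
console.log(toLiteralSyntax(helloBytes)); // (Hello)
```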

PdfNumber

Numbers can be integers or reals:

import { PdfNumber } from "@libpdf/core";

const int = PdfNumber.of(42);
const real = PdfNumber.of(3.14159);

int.value; // 42
real.value; // 3.14159
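
In PDF syntax an integer has no decimal point, and a real is written in plain decimal notation; exponent forms such as `1e-7` are not allowed. The sketch below shows how a serializer might handle this. `formatPdfNumber` and its `maxDecimals` parameter are hypothetical and not part of LibPDF.

```typescript
// Hypothetical serializer (not a LibPDF API) for PDF numeric syntax.
function formatPdfNumber(n: number, maxDecimals = 6): string {
  if (!Number.isFinite(n)) throw new RangeError("PDF numbers must be finite");
  // Integers are written without a decimal point
  // (very large integers, 1e21 and above, are not handled here).
  if (Number.isInteger(n)) return String(n);
  // toFixed avoids exponent notation, then trailing zeros are trimmed.
  return n.toFixed(maxDecimals).replace(/\.?0+$/, "");
}

console.log(formatPdfNumber(42)); // 42
console.log(formatPdfNumber(3.14159)); // 3.14159
console.log(formatPdfNumber(0.5)); // 0.5
```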

PdfStream

Streams combine a dictionary with binary data (used for page content, images, fonts):

import { PdfStream, PdfName, PdfNumber } from "@libpdf/core";

// Create a stream
const stream = PdfStream.fromDict(
  {
    Type: PdfName.of("XObject"),
    Subtype: PdfName.of("Image"),
    Width: PdfNumber.of(100),
    Height: PdfNumber.of(100),
  },
  imageBytes,
);

// Access dictionary entries (PdfStream extends PdfDict)
const width = stream.getNumber("Width");

// Get decoded data (decompresses if filtered)
const data = stream.getDecodedData();

// Get raw (possibly compressed) data
const raw = stream.data;

Indirect Objects

Most objects in a PDF are indirect objects: they are stored with an ID and accessed via references.

5 0 obj
<< /Type /Page /MediaBox [0 0 612 792] >>
endobj

Here `5 0` is the object number (5) followed by the generation number (0). Other objects refer to this object as `5 0 R`.
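
The `N G obj` header and the `N G R` reference share the same pair of numbers, so one small parser can read both. The following is a sketch only; `parseObjectId` and `ObjectId` are hypothetical names and not LibPDF APIs.

```typescript
interface ObjectId {
  objectNumber: number;
  generation: number;
  kind: "definition" | "reference";
}

// Hypothetical parser (not a LibPDF API) for "N G obj" headers and "N G R" references.
function parseObjectId(text: string): ObjectId | undefined {
  const match = /^(\d+)\s+(\d+)\s+(obj|R)$/.exec(text.trim());
  if (!match) return undefined;
  return {
    objectNumber: Number(match[1]),
    generation: Number(match[2]),
    kind: match[3] === "obj" ? "definition" : "reference",
  };
}

console.log(parseObjectId("5 0 obj")); // { objectNumber: 5, generation: 0, kind: "definition" }
console.log(parseObjectId("5 0 R")); // { objectNumber: 5, generation: 0, kind: "reference" }
console.log(parseObjectId("/Page")); // undefined
```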

Why Indirect Objects?

Sharing: Multiple pages can reference the same font
Lazy loading: Only parse objects when needed
Updates: Replace objects without rewriting the file
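
Sharing and lazy loading fall out of the same mechanism: a reference is a key, and the object behind it is parsed at most once. The sketch below illustrates the idea with a hypothetical `LazyStore` class; it is not LibPDF's registry and uses plain string keys for brevity.

```typescript
// Hypothetical lazy store (not LibPDF's registry): objects are parsed on first use.
class LazyStore<T> {
  private readonly loaders = new Map<string, () => T>();
  private readonly loaded = new Map<string, T>();

  // Record how to parse an object without parsing it yet.
  add(key: string, loader: () => T): void {
    this.loaders.set(key, loader);
  }

  // Parse on first access, then reuse the cached result.
  resolve(key: string): T | undefined {
    if (this.loaded.has(key)) return this.loaded.get(key);
    const loader = this.loaders.get(key);
    if (!loader) return undefined;
    const value = loader();
    this.loaded.set(key, value);
    return value;
  }
}

let parseCount = 0;
const fontStore = new LazyStore<{ BaseFont: string }>();
fontStore.add("6 0", () => {
  parseCount++;
  return { BaseFont: "Helvetica" };
});

// Two pages share the same font reference; the font is parsed only once.
const fontForPage1 = fontStore.resolve("6 0");
const fontForPage2 = fontStore.resolve("6 0");
console.log(fontForPage1 === fontForPage2, parseCount); // true 1
```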

Working with References

import { PdfStream } from "@libpdf/core";

// Page dictionaries contain references
const page = pdf.getPage(0);
const contentsRef = page?.dict.getRef("Contents");

if (contentsRef) {
  // Resolve to get the actual stream
  const contents = pdf.context.resolve(contentsRef);

  if (contents instanceof PdfStream) {
    const data = contents.getDecodedData();
    console.log(new TextDecoder().decode(data));
  }
}

Registering New Objects

When creating new objects, register them to get a reference:

import { PdfDict, PdfName } from "@libpdf/core";

const newDict = PdfDict.of({
  Type: PdfName.of("XObject"),
  Subtype: PdfName.of("Form"),
});

// Register returns a reference
const ref = pdf.context.registry.register(newDict);

// Now you can use this reference elsewhere
page.dict.set("MyXObject", ref);
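
Conceptually, registering an object means assigning it the next free object number and remembering it under that number. The sketch below shows that idea with a hypothetical `MiniRegistry`; it is not LibPDF's implementation and returns a bare number where the real registry returns a `PdfRef`.

```typescript
// Hypothetical mini registry (not LibPDF's implementation).
class MiniRegistry<T> {
  private readonly objects = new Map<number, T>();
  private nextObjectNumber = 1;

  // Store the object and hand back its newly assigned object number.
  register(obj: T): number {
    const objectNumber = this.nextObjectNumber++;
    this.objects.set(objectNumber, obj);
    return objectNumber;
  }

  // Look an object up by the number that register() returned.
  resolve(objectNumber: number): T | undefined {
    return this.objects.get(objectNumber);
  }
}

const miniRegistry = new MiniRegistry<Record<string, string>>();
const formNumber = miniRegistry.register({ Type: "/XObject", Subtype: "/Form" });

// Another dictionary can now point at the new object by reference.
const pageEntries: Record<string, string> = { MyXObject: `${formNumber} 0 R` };
console.log(pageEntries["MyXObject"]); // 1 0 R
```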

Type Checking

Check object types at runtime:

import {
  PdfDict, PdfArray, PdfStream, PdfString,
  PdfName, PdfNumber, PdfRef
} from "@libpdf/core";

const obj = dict.get("SomeKey");

if (obj instanceof PdfDict) {
  // It's a dictionary
  const type = obj.getName("Type");
}

if (obj instanceof PdfArray) {
  // It's an array
  const first = obj.at(0);
}

if (obj instanceof PdfStream) {
  // It's a stream (has binary data)
  const data = obj.getDecodedData();
}

if (obj instanceof PdfRef) {
  // It's a reference (needs resolving)
  const resolved = pdf.context.resolve(obj);
}

// Use the type property
if (obj?.type === "dict") { /* ... */ }
if (obj?.type === "array") { /* ... */ }
if (obj?.type === "stream") { /* ... */ }
if (obj?.type === "ref") { /* ... */ }
if (obj?.type === "name") { /* ... */ }
if (obj?.type === "string") { /* ... */ }
if (obj?.type === "number") { /* ... */ }
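
A string `type` property pairs well with TypeScript's discriminated unions, because a `switch` on it narrows the object in each branch. The sketch below uses simplified, hypothetical object shapes (`SketchObject`) to show the pattern; they are not LibPDF's classes.

```typescript
// Hypothetical object shapes (not LibPDF's classes) with a `type` discriminant.
type SketchObject =
  | { type: "number"; value: number }
  | { type: "name"; value: string }
  | { type: "array"; items: SketchObject[] }
  | { type: "ref"; objectNumber: number; generation: number };

// Each case sees only the fields that belong to that variant.
function describeObject(obj: SketchObject): string {
  switch (obj.type) {
    case "number":
      return `number ${obj.value}`;
    case "name":
      return `name /${obj.value}`;
    case "array":
      return `array of ${obj.items.length} item(s)`;
    case "ref":
      return `reference ${obj.objectNumber} ${obj.generation} R`;
  }
}

console.log(describeObject({ type: "name", value: "Page" })); // name /Page
console.log(describeObject({ type: "ref", objectNumber: 5, generation: 0 })); // reference 5 0 R
```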

Common Object Patterns

Page Dictionary

{
  Type: /Page,
  Parent: 2 0 R,           // Reference to page tree
  MediaBox: [0 0 612 792], // Page dimensions
  Contents: 5 0 R,         // Page content stream
  Resources: {
    Font: { F1: 6 0 R },
    XObject: { Im1: 7 0 R },
  },
}
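
A `MediaBox` lists the lower-left and upper-right corners in points, where one point is 1/72 inch in default user space. The sketch below derives the page size from the four numbers; `boxSize` is a hypothetical helper, not a LibPDF API.

```typescript
// Hypothetical helper (not a LibPDF API): size of a [llx lly urx ury] rectangle.
function boxSize(box: [number, number, number, number]) {
  const [llx, lly, urx, ury] = box;
  const widthPt = Math.abs(urx - llx);
  const heightPt = Math.abs(ury - lly);
  // 72 points per inch in default user space.
  return { widthPt, heightPt, widthIn: widthPt / 72, heightIn: heightPt / 72 };
}

const letterPage = boxSize([0, 0, 612, 792]);
console.log(letterPage.widthPt, letterPage.heightPt); // 612 792
console.log(letterPage.widthIn, letterPage.heightIn); // 8.5 11 (US Letter)
```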

Font Dictionary

{
  Type: /Font,
  Subtype: /Type1,
  BaseFont: /Helvetica,
  Encoding: /WinAnsiEncoding,
}

Image XObject

{
  Type: /XObject,
  Subtype: /Image,
  Width: 100,
  Height: 100,
  ColorSpace: /DeviceRGB,
  BitsPerComponent: 8,
  Filter: /DCTDecode,  // JPEG compression
  // ... stream contains image data
}
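
The dictionary entries are enough to work out how large the decoded pixel data should be, which is a handy sanity check when inspecting images. The sketch below assumes one of the three device color spaces; `decodedImageSize` and `componentsPerPixel` are hypothetical names, not LibPDF APIs.

```typescript
// Components per pixel for the device color spaces.
const componentsPerPixel: Record<string, number> = {
  DeviceGray: 1,
  DeviceRGB: 3,
  DeviceCMYK: 4,
};

// Hypothetical helper (not a LibPDF API): expected size of the decoded samples.
function decodedImageSize(
  width: number,
  height: number,
  colorSpace: string,
  bitsPerComponent: number,
): number {
  const components = componentsPerPixel[colorSpace];
  if (components === undefined) {
    throw new Error(`Unsupported color space in this sketch: ${colorSpace}`);
  }
  // Each row of samples is padded to a whole number of bytes.
  const bytesPerRow = Math.ceil((width * components * bitsPerComponent) / 8);
  return bytesPerRow * height;
}

console.log(decodedImageSize(100, 100, "DeviceRGB", 8)); // 30000
```

With `/DCTDecode` the stream itself holds JPEG-compressed data, so the raw stream is normally much smaller than this figure.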

Low-Level Access

For advanced use cases, access the object registry directly:

const pdf = await PDF.load(bytes);

// Get the registry
const registry = pdf.context.registry;

// Resolve any reference
const obj = registry.resolve(PdfRef.of(5, 0));

// Get object synchronously (if already loaded)
const cached = registry.getObject(PdfRef.of(5, 0));

Warning: Low-level APIs may change between minor versions. Prefer high-level APIs when possible.

Summary

Class	Use Case
`PdfDict`	Structured data (pages, fonts, forms)
`PdfArray`	Ordered lists (MediaBox, colors)
`PdfStream`	Binary data with metadata (content, images)
`PdfRef`	Pointers to indirect objects
`PdfName`	Dictionary keys, type identifiers
`PdfString`	Text values
`PdfNumber`	Numeric values

Understanding the object model helps you:

Debug PDF issues by inspecting raw structures
Build custom features using low-level APIs
Understand why certain operations work the way they do
