codex-pdf
Beta: structured PDF extraction API that turns complex files into consistent JSON.
codex-pdf is a contract-first, read-only extraction engine. It emits canonical document facts as a schema-validated CodexDocument payload, so downstream systems no longer re-parse the same files in different ways.
- CodexDocument contract with published schemas
- Read-only extraction boundary by design
- CLI workflows: extract, probe, validate, parity
- Consumer-agnostic output for adapter layers
- AGPL open source with typed Python models
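The "consumer-agnostic output for adapter layers" idea can be sketched as a small downstream adapter that reads an already-extracted JSON payload instead of re-opening the PDF. This is a minimal illustration under stated assumptions. The payload shape and field names (`pages`, `blocks`, `number`, `type`, `text`) are hypothetical and are not the published CodexDocument schema. The names `SearchRecord` and `to_search_records` are also hypothetical. Consult the schemas shipped with codex-pdf for the real structure.

```python
import json
from dataclasses import dataclass

# Hypothetical payload for illustration only; the real CodexDocument
# schema is published by codex-pdf and will differ from this shape.
SAMPLE_PAYLOAD = """
{
  "pages": [
    {"number": 1, "blocks": [
      {"type": "heading", "text": "Invoice"},
      {"type": "paragraph", "text": "Total due: 42.00"}
    ]},
    {"number": 2, "blocks": [
      {"type": "paragraph", "text": "Thank you."}
    ]}
  ]
}
"""


@dataclass(frozen=True)
class SearchRecord:
    """One consumer's view of a text block (hypothetical adapter output)."""
    page: int
    kind: str
    text: str


def to_search_records(payload: dict) -> list[SearchRecord]:
    """Map canonical document facts to one consumer's needs.

    The adapter only reads the extracted JSON; it never re-parses the PDF,
    so every consumer starts from the same extraction result.
    """
    records = []
    for page in payload["pages"]:
        for block in page["blocks"]:
            records.append(
                SearchRecord(page=page["number"], kind=block["type"], text=block["text"])
            )
    return records


if __name__ == "__main__":
    doc = json.loads(SAMPLE_PAYLOAD)
    for record in to_search_records(doc):
        print(record)
```

Each downstream system can keep its own small adapter like this while sharing one extraction result, which keeps parsing logic in a single place.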