Method

Version 1 skeleton

This page is intentionally light on live content. Its main job is to establish route structure, layout, and visual language before the site begins consuming generated dataset exports.

Pipeline overview

  1. OCR

    Source transcription

    The website will describe how raw page images are turned into machine-readable text and where OCR uncertainty enters the workflow.

  2. Extraction

    Field structuring

    This section is reserved for the logic that converts biography text into structured fields such as names, occupations, places, and family relations.

  3. Validation

    Quality controls

    Validation steps, conflict handling, and manual review boundaries will be described here once the content phase begins.

  4. Normalization

    Standardized outputs

    This page will later explain how normalized exports, geocoding, and derived statistics are generated for downstream use.
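
Until the real descriptions are written, the four stages above can be pictured as one record passing through a chain of functions. The following is only a minimal sketch under stated assumptions: the BiographyRecord shape, the comma-separated input format, the 0.8 confidence threshold, and every function name are hypothetical and are not taken from the project's actual pipeline.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BiographyRecord:
    """Hypothetical record shape; all field names are illustrative."""
    raw_text: str                      # stage 1: text produced by OCR
    ocr_confidence: float              # 0.0-1.0, where OCR uncertainty enters
    name: Optional[str] = None
    occupation: Optional[str] = None
    place: Optional[str] = None
    issues: List[str] = field(default_factory=list)


def extract(record: BiographyRecord) -> BiographyRecord:
    # Stage 2 (toy rule): assume the text reads "name, occupation, place".
    parts = [p.strip() for p in record.raw_text.rstrip(".").split(",")]
    if len(parts) == 3:
        record.name, record.occupation, record.place = parts
    else:
        record.issues.append("could not split text into three fields")
    return record


def validate(record: BiographyRecord, min_confidence: float = 0.8) -> BiographyRecord:
    # Stage 3: flag records for manual review instead of silently fixing them.
    if record.ocr_confidence < min_confidence:
        record.issues.append("low OCR confidence; needs manual review")
    for field_name in ("name", "occupation", "place"):
        if not getattr(record, field_name):
            record.issues.append(f"missing {field_name}")
    return record


def normalize(record: BiographyRecord) -> BiographyRecord:
    # Stage 4: standardize casing and whitespace for downstream exports.
    if record.occupation:
        record.occupation = record.occupation.lower()
    if record.place:
        record.place = " ".join(record.place.split()).title()
    return record


def run_pipeline(raw_text: str, ocr_confidence: float) -> BiographyRecord:
    record = BiographyRecord(raw_text=raw_text, ocr_confidence=ocr_confidence)
    return normalize(validate(extract(record)))
```

Calling run_pipeline("Anna Berg, Teacher, oslo.", 0.95) on this invented input yields a record with no issues, while a low-confidence or unparseable input keeps its raw text and accumulates entries in issues, which is one way to make the manual review boundary explicit.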

Documentation plan

  • Add a concise process narrative before any deep technical appendix.
  • Separate methodological claims from implementation details.
  • Link future outputs on the stats and download pages back to the relevant processing stage.

Coming next

Live outputs will be connected after the shell is stable.

This placeholder makes the structure explicit now and leaves real dataset integration to a later phase.