Reproducible LLM Workflows in Economic History: Building a Dataset from Degener’s Wer Ist’s?

Degener Dataset

Reproducible LLM workflows in economic history

Degener's Wer Ist's?

A transparent pipeline for turning Degener's Wer Ist's? (1911) into a structured dataset with prompts, logs, validation steps, and reusable outputs.

The website accompanies my paper on reproducible historical data construction. For details, see below or inspect the workflow.

An Example Page

[Image: source page detail from Degener's Wer Ist's? showing the Max Weber biography entry]

Structured Data

Biography record

22 variables · Image01721_right:1

Name: WEBER
First names: Max
Titles/profession: Dr. jur., o. Hon.-Prof. Nat.-Ökon. Univ. Heidelberg.
Gender: male
Birth date: 21. April 1864
Birth place: Erfurt.
Address: Heidelberg, Ziegelhäuser Landstraße 17.
Education: Königliches Gymnasium Charlottenburg; Universität Heidelberg, Straßburg, Berlin, Göttingen; Schüler von L. Goldschmidt und A. Meitzen; philosophisch von H. Rickert beeinflusst; Doktor der Rechte 89.
Job / occupation: Dr. jur., o. Hon.-Prof. Nat.-Ökon. Univ. Heidelberg.
Career: Refer. 86; Dr. jur. 89; Assess. Berlin 90; Priv.-Doz. 92; a. o. Prof. Eidebrecht. Berlin 93; o. Prof. Nat.-Ök. Freiburg 94; Heidelberg 97; künd. Lehramt weg. hartnäck. Krankht. 03; sd. o. Hon.-Prof. das.
Father: Max W., Stadtr. Berlin, Rchstgsabgeordn.
Mother: Helene Fallenstein.
Ancestors: Vorfahren mütterlicherseits: Hugenotten.
Spouse: Verheiratet: mit Marianne Schnitger, Tochter des Sanitätsrats Doktor S., Lippe, bekannte Schriftstellerin auf dem Gebiet der Philosophie und Rechtsgeschichte der Ehe.
Children: unknown
Works/publications: Z. Gesch. d. Hdeisges. 89; Röm. Agrargesch. 92; Ostelb. Ldarbeiter. 93; Protestant. Ethik u. Geist d. Kapitalismus (Arch. f. Sozialwiss., Bd. XXI.); zahlr. methodol. u. Psychophysik d. gewerbl. Arb. betr. Abhdlgn. das.; Mithrsg. d. Arch. f. Soz.-Wissensch.
Specialization: Spez.: Rechtsgeschichte; Nationalökonomie; Soziologie; Methodologie.
Hobbies: unknown
Collections: unknown
Personal notes: unknown
Political party: unknown
Memberships: außerordentliches Mitglied Akademie der Wissenschaften Heidelberg.

Featured Paper

Working paper

Reproducible LLM Workflows in Economic History: Degener's Wer Ist's?

The paper develops a transparent, reproducible LLM workflow for turning dense historical print sources into structured data. Degener's Wer Ist's? (1911) is the demonstration case: roughly 16,000 biographies are processed through OCR, assembly, extraction, validation, and normalization.

The central argument is practical: LLM-based data construction becomes more useful when prompts, logs, model responses, and outputs remain linked to the original source.
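One way to keep these artifacts linked is an append-only log in which every model call stores the source identifier next to the prompt, the response, and a hash of the prompt. The sketch below is illustrative only: the `LogEntry` fields, the helper names, and the placeholder prompt are assumptions, not the paper's actual schema. The source identifier follows the `Image01721_right:1` pattern shown in the example record.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class LogEntry:
    """One model call, tied back to its source (hypothetical schema)."""
    source_id: str      # e.g. "Image01721_right:1", as in the example record
    stage: str          # e.g. "extraction"
    prompt: str
    response: str
    prompt_sha256: str  # lets a reader verify the exact prompt text used


def make_log_entry(source_id: str, stage: str, prompt: str, response: str) -> LogEntry:
    digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    return LogEntry(source_id, stage, prompt, response, digest)


def to_jsonl(entry: LogEntry) -> str:
    # One JSON object per line; ensure_ascii=False keeps umlauts readable.
    return json.dumps(asdict(entry), ensure_ascii=False, sort_keys=True)


entry = make_log_entry(
    source_id="Image01721_right:1",
    stage="extraction",
    prompt="Extract the birth place from this entry.",  # placeholder prompt
    response="Erfurt.",
)
line = to_jsonl(entry)
```

Because each line carries its own `source_id`, any value in the finished dataset can be traced back to the page image and the exact prompt that produced it.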


Paper focus

  • End-to-end traceability from page image to normalized row.
  • Modular tasks with explicit failure points and review flags.
  • Human verification embedded where model output is uncertain.
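The second and third points can be sketched as a validation stage that collects review flags instead of stopping at the first problem, so that flagged records can be routed to a human. This is a minimal sketch under stated assumptions: the `Record` class, the two checks, and the flag names are hypothetical and do not reproduce the checks used in the paper.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Record:
    source_id: str
    fields: dict[str, str]
    review_flags: list[str] = field(default_factory=list)


# A check returns a flag name, or None if the record passes.
Check = Callable[[Record], Optional[str]]


def check_birth_date_present(record: Record) -> Optional[str]:
    if record.fields.get("birth_date", "unknown") == "unknown":
        return "birth_date_missing"
    return None


def check_name_uppercase(record: Record) -> Optional[str]:
    # Assumption: surnames are stored in capitals, as in the example record.
    name = record.fields.get("name", "")
    if not name or name != name.upper():
        return "name_not_uppercase"
    return None


def validate(record: Record, checks: list[Check]) -> Record:
    """Run every check and collect all flags for later review."""
    for check in checks:
        flag = check(record)
        if flag is not None:
            record.review_flags.append(flag)
    return record


def needs_human_review(record: Record) -> bool:
    return bool(record.review_flags)


checks = [check_birth_date_present, check_name_uppercase]
weber = validate(
    Record("Image01721_right:1", {"name": "WEBER", "birth_date": "21. April 1864"}),
    checks,
)
incomplete = validate(Record("hypothetical:1", {"name": "Muster"}), checks)
```

Keeping each check as a small function makes the failure points explicit: a new rule is one more entry in `checks`, and every flag names the rule that raised it.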

Main results

Input and output

1,686 source pages to 16,001 biographies: linked, structured records in the finished release.

Variables

22 normalized fields capture names, occupations, addresses, family details, publications, and source-quality flags.

Quality check

4.49% semantic error rate from human verification. 1.07% character error rate (CER), estimated from a sample.
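For readers unfamiliar with the metric, a character error rate (CER) is conventionally the Levenshtein edit distance between the OCR output and a ground-truth transcription, divided by the length of the ground truth. The sketch below shows that standard definition. It is not necessarily the exact procedure behind the figure above, since details such as text normalization and sampling are not described here.

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# One wrong character in a ten-character word gives a CER of 10%.
cer = character_error_rate("Heidelberg", "Heidelburg")
```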

Efficiency

€159.24 total model cost. 66 min total processing time.
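Dividing these totals by the page and biography counts reported above gives rough per-unit figures. The arithmetic below uses only numbers from this page. The per-page time is an average over the whole run, not a latency, because it is not stated here whether requests ran in parallel.

```python
TOTAL_COST_EUR = 159.24
TOTAL_MINUTES = 66
SOURCE_PAGES = 1_686
BIOGRAPHIES = 16_001

cost_per_biography = TOTAL_COST_EUR / BIOGRAPHIES     # roughly 0.01 EUR
cost_per_page = TOTAL_COST_EUR / SOURCE_PAGES         # roughly 0.09 EUR
seconds_per_page = TOTAL_MINUTES * 60 / SOURCE_PAGES  # roughly 2.3 s
biographies_per_page = BIOGRAPHIES / SOURCE_PAGES     # roughly 9.5
```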


GitHub · 2026