Reproducible LLM Workflows in Economic History: Building a Dataset from Degener’s Wer Ist’s?

Download

Current public release

Structured biographies from Degener's Wer Ist's? (1911)

Download the dataset produced by the workflow. Each row describes one person from the source, and each field is one column of information about that person.

The 22 core fields are the main pieces of biography information the workflow tries to recover: names, birth details, address, education, occupation, career, family relations, publications, memberships, political affiliation, hobbies, collections, and personal notes.

16,001 people described in the current release.
22 fields: columns such as name, birthplace, occupation, and family.
About 13 known details per person on average.

What the Fields Mean

Person

Who is the entry about?

Name, first names, gender, title or profession, and basic birth information.

Work and education

What did they do?

Education, job, career, specialization, and works or publications.

Family and networks

Who were they connected to?

Parents, spouse, children, ancestors, memberships, and political affiliation.

Notes and uncertainty

What needs careful reading?

Hobbies, collections, personal notes, unknown values, and workflow review flags.

Download the Data

Primary data

Start here

These files contain the biography records themselves.

JSONL · Primary data

Normalized JSONL

Best for scripts and reproducible pipelines. Each line is one structured biography.

Download file View on GitHub

  • Path: data/05-openai/normalized.jsonl
  • Size: 24.8 MB
  • Updated: 2026-05-12 14:39
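
As a minimal sketch of reading this file in a script: the helper below assumes a local copy of the JSONL file and that every non-empty line is one JSON object, as described above. The function name iter_biographies is illustrative and not part of the project.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_biographies(path: str) -> Iterator[dict]:
    """Yield one biography record (a dict) per non-empty line of a JSONL file."""
    with Path(path).open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines, e.g. a trailing newline
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                # Report the offending line so a damaged download is easy to spot.
                raise ValueError(f"{path}:{line_number}: invalid JSON") from exc
```

Calling list(iter_biographies("data/05-openai/normalized.jsonl")) on a local copy should give one dict per person. Streaming line by line keeps memory use low if you only need a few fields.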

Excel · Primary data

Normalized Excel

Best for reading, filtering, and sharing the biography table without writing code.

Download file View on GitHub

  • Path: data/06-excel/normalized.xlsx
  • Size: 7.5 MB
  • Updated: 2026-05-12 14:39

Supplementary data

Derived outputs

These files support specific analysis tasks built from the biographies.

CSV · Supplementary data

Geocoded addresses CSV

Use this for maps and place-based analysis. It is derived from the normalized address field.

Download file View on GitHub

  • Path: data/08-addresses/addresses_geonames.csv
  • Size: 3.4 MB
  • Updated: 2026-05-12 14:39

Documentation and checks

Read before reuse

These files explain what is present, missing, or flagged for review.

Markdown · Documentation and checks

Biography stats report

Use this to inspect coverage, missing values, geography, occupation classes, and quality flags.

Download file View on GitHub

  • Path: data/07-stats/biography_stats.md
  • Size: 4.1 KB
  • Updated: 2026-05-12 14:39

Reproducibility Notes

Versioning

Pin the repository state when citing the data.

The buttons point to the current public files in the repository. For formal reuse, cite the repository together with the commit hash and download date.
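
One way to record this information alongside a downloaded file is sketched below. The repository URL and commit hash are placeholders that you supply yourself. The SHA-256 checksum is an extra safeguard added here for illustration, not something the project prescribes, and the function names are hypothetical.

```python
import hashlib
from datetime import date
from typing import Optional


def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def provenance_note(
    repo_url: str,      # placeholder: the repository you downloaded from
    commit: str,        # placeholder: the commit hash you pinned
    repo_path: str,     # path of the file inside the repository
    local_copy: str,    # where the downloaded file is stored locally
    downloaded_on: Optional[date] = None,
) -> str:
    """Build a short provenance note to keep next to the analysis."""
    day = (downloaded_on or date.today()).isoformat()
    return (
        f"Source: {repo_url} (commit {commit})\n"
        f"File: {repo_path}\n"
        f"Downloaded: {day}\n"
        f"SHA-256: {file_sha256(local_copy)}"
    )
```

Storing the note with your results lets a reader check that they are working from the same file state.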

Missing values

The value "unknown" is part of the data model.

Historical entries rarely contain every detail. An unknown value means the source or workflow did not provide a confident value for that field.
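
A minimal sketch of how coverage could be counted under this convention follows. It assumes that missing values are stored as the literal string "unknown" and that records are plain dicts. It also treats None, empty strings, and empty lists as missing, which is an assumption about the files rather than documented behavior.

```python
from collections import Counter
from typing import Iterable, Tuple

MISSING = "unknown"  # assumption: missing values are stored as this literal string


def is_known(value: object) -> bool:
    """Return False for None, empty or 'unknown' strings, and empty containers."""
    if value is None:
        return False
    if isinstance(value, str):
        return value.strip().lower() not in ("", MISSING)
    if isinstance(value, (list, dict)):
        return len(value) > 0
    return True  # numbers, booleans, etc. count as known


def coverage(records: Iterable[dict]) -> Tuple[Counter, float]:
    """Return per-field counts of known values and the mean known fields per record."""
    known_per_field: Counter = Counter()
    total_records = 0
    for record in records:
        total_records += 1
        for field, value in record.items():
            if is_known(value):
                known_per_field[field] += 1
    mean_known = sum(known_per_field.values()) / total_records if total_records else 0.0
    return known_per_field, mean_known
```

The mean computed this way counts every key in a record, so it matches the per-person average reported for the release only if the records contain just the 22 core fields.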


GitHub · 2026