Download

Current public release

Structured biographies from Degener's Wer Ist's? (1911)

Download the dataset produced by the workflow. Each row describes one person from the source, and each field is one column of information about that person.

The 22 core fields are the main pieces of biography information the workflow tries to recover: names, birth details, address, education, occupation, career, family relations, publications, memberships, political affiliation, hobbies, collections, and personal notes.

16,001 people described in the current release.

22 fields: columns like name, birthplace, occupation, and family.

About 13 known details per person on average.

What the Fields Mean

Person

Who is the entry about?

Name, first names, gender, title or profession, and basic birth information.

Work and education

What did they do?

Education, job, career, specialization, and works or publications.

Family and networks

Who were they connected to?

Parents, spouse, children, ancestors, memberships, and political affiliation.

Notes and uncertainty

What needs careful reading?

Hobbies, collections, personal notes, unknown values, and workflow review flags.

Download the Data

Primary data

Start here

These files contain the biography records themselves.

JSONL · Primary data

Normalized JSONL

Best for scripts and reproducible pipelines. Each line is one structured biography.

Download file · View on GitHub

Path: data/05-openai/normalized.jsonl
Size: 24.8 MB
Updated: 2026-05-12 14:39
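
Because each line of the JSONL file is one biography, the file can be read as a stream without loading all 24.8 MB at once. The following is a minimal sketch, assuming every non-empty line is a single JSON object and the file is UTF-8 encoded; the function name `iter_biographies` is illustrative and not part of the project.

```python
import json


def iter_biographies(path):
    """Yield one biography (a dict) per non-empty line of a JSONL file.

    Assumption: each line holds exactly one JSON object, encoded as UTF-8.
    """
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines, e.g. a trailing newline
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                # Report the offending line so a damaged download is easy to spot.
                raise ValueError(
                    f"{path}: invalid JSON on line {line_number}"
                ) from exc


# Example use, assuming the file was saved under its repository path:
# for biography in iter_biographies("data/05-openai/normalized.jsonl"):
#     print(sorted(biography.keys()))
#     break
```

Printing the keys of the first record is a quick way to see the actual field names before writing any analysis code, since they are not listed on this page.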

Excel · Primary data

Normalized Excel

Best for reading, filtering, and sharing the biography table without writing code.

Download file · View on GitHub

Path: data/06-excel/normalized.xlsx
Size: 7.5 MB
Updated: 2026-05-12 14:39

Supplementary data

Derived outputs

These files support specific analysis tasks built from the biographies.

CSV · Supplementary data

Geocoded addresses CSV

Use this for maps and place-based analysis. It is derived from the normalized address field.

Download file · View on GitHub

Path: data/08-addresses/addresses_geonames.csv
Size: 3.4 MB
Updated: 2026-05-12 14:39
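
The column names of the address file are not listed on this page, so it is safer to inspect the header before relying on any particular column. The sketch below assumes a comma-separated, UTF-8 file with a header row; `load_address_rows` is a hypothetical helper name.

```python
import csv


def load_address_rows(path):
    """Return (column_names, rows) from a CSV file that has a header row.

    Assumptions: comma-separated values, UTF-8 encoding, first line is the header.
    Each row is returned as a dict keyed by column name.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)
        # fieldnames is None for an empty file, so fall back to an empty list.
        columns = list(reader.fieldnames or [])
    return columns, rows


# Example use, assuming the file was saved under its repository path:
# columns, rows = load_address_rows("data/08-addresses/addresses_geonames.csv")
# print(columns)    # check which columns hold coordinates and identifiers
# print(len(rows))  # number of address rows
```

All values come back as strings, so any coordinate columns would need converting to `float` before mapping.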

Documentation and checks

Read before reuse

These files explain what is present, missing, or flagged for review.

Markdown · Documentation and checks

Biography stats report

Use this to inspect coverage, missing values, geography, occupation classes, and quality flags.

Download file · View on GitHub

Path: data/07-stats/biography_stats.md
Size: 4.1 KB
Updated: 2026-05-12 14:39

Reproducibility Notes

Versioning

Pin the repository state when citing the data.

The download buttons point to the current public files in the repository, so their contents can change when the repository is updated. For formal reuse, cite the repository together with the commit hash and your download date.
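
One way to keep the commit hash and download date together is to build the citation note in code at download time. This is only a sketch: the output format is an assumption rather than a citation style prescribed by the project, and the repository URL and commit hash in the example are placeholders.

```python
from datetime import date


def format_data_citation(repository, commit_hash, file_path, downloaded_on):
    """Build a one-line provenance note for a downloaded data file.

    The layout of the note is illustrative, not an official citation format.
    """
    # Accept abbreviated or full hashes, but reject obviously wrong input.
    is_hex = all(ch in "0123456789abcdef" for ch in commit_hash.lower())
    if len(commit_hash) < 7 or not is_hex:
        raise ValueError("commit_hash should be at least 7 hexadecimal characters")
    return (
        f"{repository}, {file_path}, "
        f"commit {commit_hash}, downloaded {downloaded_on.isoformat()}"
    )


# Example with placeholder values (not the real repository or commit):
# format_data_citation(
#     "https://example.org/hypothetical/repo",
#     "0123abc",
#     "data/05-openai/normalized.jsonl",
#     date.today(),
# )
```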

Missing values

`unknown` is part of the data model.

Historical entries rarely contain every detail. An `unknown` value means the source or workflow did not provide a confident value for that field.
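
When counting how many details are known for a person, `unknown` values should be excluded rather than treated as data. The sketch below assumes missing values are stored as the literal string `unknown`, that each record is a flat dict, and that `None` should also count as missing; the last point and both function names are assumptions of this example.

```python
def count_known(record, missing_marker="unknown"):
    """Count the fields of one record that carry a usable value.

    Assumption: missing values are the literal string "unknown";
    None is also treated as missing for robustness.
    """
    return sum(
        1
        for value in record.values()
        if value is not None and value != missing_marker
    )


def mean_known(records, missing_marker="unknown"):
    """Average number of known fields per record (0.0 for no records)."""
    counts = [count_known(record, missing_marker) for record in records]
    return sum(counts) / len(counts) if counts else 0.0


# Example with made-up records, not taken from the dataset:
# sample = [
#     {"name": "A", "birthplace": "unknown", "occupation": "Arzt"},
#     {"name": "B", "birthplace": "Wien", "occupation": "unknown"},
# ]
# mean_known(sample)
```

Lists or nested objects are counted as known here, so a field that holds an empty list would need its own rule if that case occurs in the data.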
