
Historical Biography Dataset

Degener Dataset

A public-facing home for a structured dataset derived from printed historical biographical entries, designed to support browsing, summary statistics, and downloadable research outputs.


What This Dataset Contains

This site turns a dense printed reference source into structured records that can be summarized, compared, and explored online.

The source material consists of compact biographical notices that bring together identity, titles, life events, family references, affiliations, and addresses in compressed prose. In raw form, those entries are readable, but they are difficult to search or analyze systematically at scale.

Here, those notices are transformed into 12,000 structured biographical records. The current normalized output tracks 22 core fields, with room for names, professions, birth details, addresses, education, career paths, family relations, political affiliations, memberships, and personal notes.

What one record can contain

A single record can combine several layers of information at once:

Identity

Name, first names, title or profession.

Life details

Birth date, birth place, and address.

Education and career

Education, job, career path, publications, or specialization.

Family

Parents, spouse, children, or ancestors.

Affiliations and notes

Political affiliation, memberships, hobbies, collections, and personal notes.
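The layers above can be pictured as one flat record with optional fields. The following is a minimal sketch only: the class name `BiographyRecord`, the field names, and the helper `filled_fields` are hypothetical and chosen to mirror the list above. They are not the dataset's actual schema or its 22 core fields, and the example values are placeholders rather than real data.

```python
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class BiographyRecord:
    """Hypothetical record layout; field names are illustrative assumptions."""

    # Identity
    name: Optional[str] = None
    first_names: Optional[str] = None
    title_or_profession: Optional[str] = None
    # Life details
    birth_date: Optional[str] = None
    birth_place: Optional[str] = None
    address: Optional[str] = None
    # Education and career
    education: Optional[str] = None
    job: Optional[str] = None
    career_path: Optional[str] = None
    publications: Optional[str] = None
    specialization: Optional[str] = None
    # Family
    parents: Optional[str] = None
    spouse: Optional[str] = None
    children: list[str] = field(default_factory=list)
    ancestors: Optional[str] = None
    # Affiliations and notes
    political_affiliation: Optional[str] = None
    memberships: list[str] = field(default_factory=list)
    hobbies: Optional[str] = None
    collections: Optional[str] = None
    personal_notes: Optional[str] = None


def filled_fields(record: BiographyRecord) -> list[str]:
    """Return the names of fields holding a non-empty value, in declaration order."""
    return [f.name for f in fields(record) if getattr(record, f.name)]


# Placeholder values for illustration only, not taken from the dataset.
example = BiographyRecord(
    name="Mustermann",
    first_names="Max",
    birth_place="Berlin",
    children=["Anna"],
)
print(filled_fields(example))
```

Keeping every field optional reflects the uneven coverage of the source: a notice that omits family or affiliation details simply leaves those fields empty.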

From source text to structure

Printed notices become analyzable records

Raw entries are converted into machine-readable fields that can be grouped, counted, mapped, and eventually browsed record by record.

Rich but uneven coverage

Depth and incompleteness coexist

The current release tracks 22 core fields and reaches 56.91% overall completeness. Family information is captured in 58.23% of biographies.

Geography and classification

Structured outputs enable comparison

In the current structured output, 74.90% of entries resolve to a geocoded country, and occupation classification covers 83.34% of biographies.

Current Snapshot

The metrics below provide a compact view of the current structured release and show how much of the dataset is already available for comparison and interpretation.

Dataset size

12,000

Biographies currently reflected in the structured dataset outputs.

Completeness

56.91%

150,241 of 264,000 tracked core slots filled.

Geocoded coverage

74.90%

8,988 entries resolved to a country.

Occupation coverage

83.34%

10,001 biographies classified so far.

Last refreshed: 2026-03-11 14:28
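The percentages in this snapshot follow directly from the published counts. The short sketch below recomputes them, assuming that a "slot" is one core field in one biography (12,000 biographies × 22 core fields = 264,000 slots). The variable names and the `percent` helper are illustrative and not part of the site's pipeline.

```python
# Counts as published in the snapshot above.
BIOGRAPHIES = 12_000
CORE_FIELDS = 22
FILLED_SLOTS = 150_241
GEOCODED_ENTRIES = 8_988
CLASSIFIED_BIOGRAPHIES = 10_001


def percent(part: int, whole: int) -> float:
    """Share of `part` in `whole`, as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


# Assumption: one slot per core field per biography.
total_slots = BIOGRAPHIES * CORE_FIELDS

completeness = percent(FILLED_SLOTS, total_slots)
geocoded_coverage = percent(GEOCODED_ENTRIES, BIOGRAPHIES)
occupation_coverage = percent(CLASSIFIED_BIOGRAPHIES, BIOGRAPHIES)

print(f"Tracked core slots:  {total_slots:,}")
print(f"Completeness:        {completeness:.2f}%")
print(f"Geocoded coverage:   {geocoded_coverage:.2f}%")
print(f"Occupation coverage: {occupation_coverage:.2f}%")
```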

What comes next

Planned additions

A lightweight browse index for quick name lookup and filtering.
Statistical summaries and map views sourced from generated site data.
Method documentation tied back to the extraction and normalization pipeline.
Download bundles, citation metadata, and release notes.

Current emphasis

The next additions will deepen interpretation rather than change the core structure: richer browse tools, fuller method documentation, and clearer public release guidance.

Page map

Browse

A space under development for searchable records and future record detail views.


Stats

Current summary metrics, classification outputs, and geographic views.


Method

Pipeline, validation, and limitations behind the structured dataset.


Download

Dataset files and supporting outputs available through the repository.
