Stats

Version 1 skeleton

This page is intentionally light on live content. Its main job is to establish route structure, layout, and visual language before the site begins consuming generated dataset exports.

Key Metrics

12,000

Biographies reflected in the current structured dataset.

56.91%

Overall completeness across 22 core fields.

74.90%

Entries with a resolved geocoded country.

83.34%

Biographies covered by occupation classification.

Highlights

Gender distribution

  • Male: 92.51% (11,101/12,000)
  • Female: 4.68% (562/12,000)
  • Unknown: 2.81% (337/12,000)

Quality and validation

  • Quality issues flagged: 3.32% (398/12,000)
  • Needs validation: 82.00% (9,840/12,000)
  • Cross references: 4.58% (549/12,000)

Family and classification

  • Any family information present: 58.23% (6,987/12,000)
  • Father present: 51.11% (6,133/12,000)
  • Mother present: 41.58% (4,990/12,000)
  • Occupation coverage: 83.34% (10,001/12,000); of the classified biographies, OpenAI accounts for 97.08% (9,709/10,001) and the fallback for 2.92% (292/10,001)
  • Mean classification confidence: 0.8600; low-confidence OpenAI rows: 9.95% (966/9,709)
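
The figures in these lists all follow the same share-of-total pattern, but the denominators differ: most use the full 12,000 biographies, the OpenAI/fallback split uses the 10,001 classified biographies, and the low-confidence figure uses the 9,709 OpenAI rows. The following is a minimal sketch of that arithmetic, not the site's actual code. The function name format_share is hypothetical, and round-half-up to two decimals is an assumption chosen because it reproduces the published values.

```python
from decimal import Decimal, ROUND_HALF_UP


def format_share(count: int, total: int) -> str:
    """Format a count as 'PP.PP% (count/total)' with thousands separators.

    Hypothetical helper. Rounding mode (half-up, two decimals) is an
    assumption, not something the page documents.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    # Decimal arithmetic avoids binary floating-point surprises on
    # values that sit exactly on a rounding boundary (e.g. 4.575).
    pct = (Decimal(count) * 100 / Decimal(total)).quantize(
        Decimal("0.01"), rounding=ROUND_HALF_UP
    )
    return f"{pct}% ({count:,}/{total:,})"


# Share of all biographies
print(format_share(11101, 12000))
# Share of classified biographies only (different denominator)
print(format_share(292, 10001))
```

Passing the denominator explicitly keeps it visible at every call site, which matters here because three different totals appear on the same page.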

Visuals

Occupation classification

Category distribution

Current distribution across the published occupation categories.

Address geocoding

Geographic coverage map

Resolved address locations staged from the latest geocoding output.

Integration note

This page will eventually be driven by lightweight JSON files in site/data/generated/, not by direct reads of the raw or normalized dataset in the browser.
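
One way the export side of this could look is sketched below, assuming a build step written in Python. The function name write_stats_export, the file name stats.json, and the key names in the payload are all hypothetical; only the target directory and the counts come from this page.

```python
import json
from pathlib import Path


def write_stats_export(output_dir, stats):
    """Write a small JSON summary for the stats page and return its path.

    Hypothetical sketch: the file name and payload shape are assumptions.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    target = out / "stats.json"  # hypothetical file name
    # Sorted keys and a trailing newline keep diffs between exports stable.
    payload = json.dumps(stats, indent=2, sort_keys=True) + "\n"
    target.write_text(payload, encoding="utf-8")
    return target


# Example payload built from counts shown on this page.
# Key names are illustrative only.
EXAMPLE_STATS = {
    "total_biographies": 12000,
    "gender": {"male": 11101, "female": 562, "unknown": 337},
}

if __name__ == "__main__":
    print(write_stats_export("site/data/generated", EXAMPLE_STATS))
```

Exporting only precomputed counts keeps the file small, so the browser never needs to touch the raw or normalized dataset.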