U.S. Presidential Speeches, 1775–2024
Tracking how the meanings of political concepts evolve across 250 years of American presidential discourse. Word embeddings are trained separately on each quarter-century corpus by applying SVD to PPMI matrices, with word and co-occurrence counts held in Count-Min Sketches for memory efficiency; the embeddings are then aligned via orthogonal Procrustes to measure semantic distance over time.
Cosine distance between aligned word vectors in consecutive quarter-century periods. Higher values indicate greater semantic shift.
Semantic shift magnitude for each word across period transitions. Darker cells indicate greater meaning change.
How far each word has drifted from its 1775–1799 meaning. Measured as cosine distance from the baseline embedding.
The most similar words for a given concept in each period, showing how its semantic context shifts over time.
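Assuming "most similar" means highest cosine similarity within a single period's embedding (the usual choice, though not stated here), a neighbor list can be computed as in this minimal sketch. The function name most_similar and the toy vectors are hypothetical, not the chronowords API.

```python
import numpy as np


def most_similar(word: str, vocab: list[str], emb: np.ndarray, topn: int = 5) -> list[tuple[str, float]]:
    """Return the topn nearest neighbors of `word` by cosine similarity, excluding the word itself."""
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)  # assumes no all-zero rows
    sims = unit @ unit[vocab.index(word)]                    # cosine similarity to every word
    ranked = [j for j in np.argsort(-sims) if vocab[j] != word]
    return [(vocab[j], float(sims[j])) for j in ranked[:topn]]


# Hypothetical toy 2-d vectors for one period.
vocab = ["liberty", "freedom", "tariff"]
emb = np.array([[1.0, 0.1], [0.9, 0.2], [0.0, 1.0]])
print(most_similar("liberty", vocab, emb, topn=1))  # "freedom" ranks first
```

Calling it once per period, with that period's vocabulary and embedding matrix, gives one neighbor list per period. Because only similarities within a period are compared, and cosine similarity is unchanged by rotation, this view does not need the Procrustes alignment.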
How the prominence of each topic evolves across 250 years of presidential speeches. Topics are extracted using NMF on a unified TF-IDF matrix.
Top words defining each topic within a selected period, showing how each topic manifests in that era's language.
Number of unique words per quarter-century period, counting only words of at least 3 characters that occur above the frequency threshold.
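A minimal sketch of this count, assuming pre-tokenized lemmas. The actual frequency threshold is not given here, so threshold is a hypothetical parameter; a word is kept if it has at least 3 characters and occurs strictly more than threshold times.

```python
from collections import Counter


def vocabulary_size(tokens: list[str], threshold: int, min_len: int = 3) -> int:
    """Count distinct words with at least `min_len` characters and frequency above `threshold`."""
    counts = Counter(tokens)
    return sum(1 for word, count in counts.items() if len(word) >= min_len and count > threshold)


# Hypothetical toy corpora keyed by period.
corpora = {
    "1775-1799": "the people the people the union of a nation".split(),
    "1800-1824": "the union and the states".split(),
}
sizes = {period: vocabulary_size(tokens, threshold=1) for period, tokens in corpora.items()}
print(sizes)  # {'1775-1799': 2, '1800-1824': 1}
```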
Each quarter-century corpus of lemmatized presidential speeches is processed into a word–context co-occurrence matrix using skipgrams (window=5). Word and skipgram frequencies are counted with Count-Min Sketch probabilistic data structures for memory efficiency.
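The counting step can be sketched in pure Python. This is a minimal illustration, assuming a symmetric window of 5 tokens on each side and a Count-Min Sketch whose width, depth, and hashing scheme (blake2b with a per-row prefix) are illustrative choices rather than chronowords' actual settings; the names CountMinSketch and skipgrams are hypothetical, not the library's API.

```python
import hashlib
from typing import Iterator


class CountMinSketch:
    """Approximate counter: estimates never undercount, but hash collisions can overcount."""

    def __init__(self, width: int = 2048, depth: int = 4) -> None:
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _columns(self, key: str) -> Iterator[int]:
        # One hash per row: prefixing the key with the row index makes each row hash differently.
        for row in range(self.depth):
            digest = hashlib.blake2b(f"{row}:{key}".encode("utf-8"), digest_size=8).digest()
            yield int.from_bytes(digest, "little") % self.width

    def add(self, key: str, count: int = 1) -> None:
        for row, col in enumerate(self._columns(key)):
            self.table[row][col] += count

    def query(self, key: str) -> int:
        # The minimum over rows is the least-inflated estimate.
        return min(self.table[row][col] for row, col in enumerate(self._columns(key)))


def skipgrams(tokens: list[str], window: int = 5) -> Iterator[tuple[str, str]]:
    """Yield (word, context) pairs for every token within `window` positions of the word."""
    for i, word in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                yield word, tokens[j]


# Toy lemmatized sentence standing in for one quarter-century corpus.
tokens = "we hold these truth to be self evident".split()
word_counts = CountMinSketch()
pair_counts = CountMinSketch()
for token in tokens:
    word_counts.add(token)
for word, context in skipgrams(tokens, window=5):
    pair_counts.add(f"{word}|{context}")  # joined string key for the (word, context) pair

# Both estimates are >= 1; they are exactly 1 unless every row collides.
print(word_counts.query("truth"), pair_counts.query("we|hold"))
```

The table has a fixed width × depth size however large the vocabulary grows, which is where the memory saving comes from. The trade-off is that a rare pair can be overestimated when it collides with frequent pairs in every row.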
The co-occurrence matrix is transformed into a Positive Pointwise Mutual Information (PPMI) matrix with context distribution smoothing (α=0.75), then reduced to 100-dimensional embeddings via truncated SVD.
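The two steps can be sketched with NumPy on a small dense count matrix, using PPMI(w, c) = max(0, log P(w, c) / (P(w) · P_α(c))), where P_α(c) is the context count raised to α and then normalized. Using U·Σ from the SVD as the word vectors is an assumption (implementations also use U or U·√Σ), and a 100-dimensional reduction over a real vocabulary would normally use a sparse truncated solver such as scipy.sparse.linalg.svds rather than the dense SVD shown here. The function names are hypothetical.

```python
import numpy as np


def ppmi(counts: np.ndarray, alpha: float = 0.75) -> np.ndarray:
    """PPMI with context distribution smoothing: context counts are raised to alpha before normalizing."""
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    p_wc = counts / total
    p_w = counts.sum(axis=1, keepdims=True) / total  # shape (n_words, 1)
    smoothed = counts.sum(axis=0) ** alpha            # smoothed context counts
    p_c = smoothed / smoothed.sum()                   # shape (n_contexts,)
    pmi = np.zeros_like(p_wc)
    observed = counts > 0                             # log(0) is undefined; unseen pairs stay 0
    pmi[observed] = np.log(p_wc[observed] / (p_w * p_c)[observed])
    return np.maximum(pmi, 0.0)                       # clip negative PMI to zero


def svd_embeddings(matrix: np.ndarray, dim: int = 100) -> np.ndarray:
    """Keep the top `dim` singular directions; each row is a word vector (U * sigma)."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)  # singular values sorted descending
    k = min(dim, s.size)
    return u[:, :k] * s[:k]


rng = np.random.default_rng(0)
toy_counts = rng.integers(0, 5, size=(6, 6))  # hypothetical word-by-context counts
vectors = svd_embeddings(ppmi(toy_counts), dim=3)
print(vectors.shape)  # (6, 3)
```

Raising context counts to α < 1 lifts the probability of rare contexts, which damps the inflated PMI scores that rare contexts would otherwise receive.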
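To compare embeddings across periods, we use orthogonal Procrustes alignment: finding the orthogonal matrix (a rotation, possibly combined with a reflection) that best maps one period's vectors for the shared vocabulary onto the other's, then measuring the cosine distance between each target word's vectors in the aligned space.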
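A minimal NumPy sketch of the alignment, assuming the earlier period is mapped onto the later one (the direction is not stated here) and that vectors are used as-is; length-normalizing or mean-centering them first is a common variant. The orthogonal R minimizing ‖A·R − B‖ is U·Vᵀ, where U·Σ·Vᵀ is the SVD of Aᵀ·B. All names and the toy data are hypothetical.

```python
import numpy as np


def shared_matrices(vocab_a, emb_a, vocab_b, emb_b):
    """Stack vectors for words present in both periods, in the same row order."""
    index_b = {w: j for j, w in enumerate(vocab_b)}
    pairs = [(i, index_b[w]) for i, w in enumerate(vocab_a) if w in index_b]
    shared = [vocab_a[i] for i, _ in pairs]
    return shared, emb_a[[i for i, _ in pairs]], emb_b[[j for _, j in pairs]]


def procrustes_rotation(source: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Orthogonal R minimizing ||source @ R - target||_F."""
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt


def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


rng = np.random.default_rng(1)
vocab_early = ["liberty", "union", "tariff", "war"]
vocab_late = ["war", "liberty", "union", "tariff", "economy"]
emb_early = rng.normal(size=(4, 3))
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))  # random orthogonal matrix
# Toy later period: the same geometry in an arbitrary rotation, plus one new word.
emb_late = np.vstack([emb_early[[3, 0, 1, 2]] @ q, rng.normal(size=(1, 3))])

shared, a, b = shared_matrices(vocab_early, emb_early, vocab_late, emb_late)
aligned = a @ procrustes_rotation(a, b)
shift = {w: cosine_distance(aligned[i], b[i]) for i, w in enumerate(shared)}
print(shift)  # all near 0: the two toy periods differ only by a rotation
```

Without alignment, independently trained SVD embeddings can differ by an arbitrary rotation, so raw cosine distances across periods would measure that rotation rather than any change in meaning.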
Topics are extracted using NMF (Non-negative Matrix Factorization) on a unified TF-IDF matrix built from all periods, enabling consistent topic tracking over time.
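A minimal NumPy sketch of topic extraction, assuming each row of the unified TF-IDF matrix is one speech tagged with its period, that topic prominence is the mean document-topic weight within a period, and that a topic's top words are the largest entries of its topic-word row. The idf smoothing matches scikit-learn's default formula; NMF is fit here with simple multiplicative updates for brevity, whereas a library implementation such as scikit-learn's NMF would normally be used. The data, names, and k=2 are hypothetical.

```python
import numpy as np
from collections import Counter, defaultdict


def tfidf(docs: list[list[str]]) -> tuple[np.ndarray, list[str]]:
    """Row-normalized TF-IDF with smoothed idf = ln((1 + n) / (1 + df)) + 1."""
    vocab = sorted({w for doc in docs for w in doc})
    column = {w: j for j, w in enumerate(vocab)}
    tf = np.zeros((len(docs), len(vocab)))
    for i, doc in enumerate(docs):
        for w, c in Counter(doc).items():
            tf[i, column[w]] = c
    df = (tf > 0).sum(axis=0)
    x = tf * (np.log((1 + len(docs)) / (1 + df)) + 1.0)
    return x / np.linalg.norm(x, axis=1, keepdims=True), vocab


def nmf(x: np.ndarray, k: int, iters: int = 500, seed: int = 0) -> tuple[np.ndarray, np.ndarray]:
    """Multiplicative updates for min ||x - w @ h||_F subject to w, h >= 0."""
    rng = np.random.default_rng(seed)
    w = rng.random((x.shape[0], k))
    h = rng.random((k, x.shape[1]))
    eps = 1e-10  # avoids division by zero
    for _ in range(iters):
        h *= (w.T @ x) / (w.T @ w @ h + eps)
        w *= (x @ h.T) / (w @ h @ h.T + eps)
    return w, h


# Hypothetical toy speeches tagged with their quarter-century period.
speeches = [
    ("1775-1799", "war independence army liberty war".split()),
    ("1775-1799", "liberty independence union army".split()),
    ("2000-2024", "economy jobs tax economy growth".split()),
    ("2000-2024", "jobs growth tax economy".split()),
]
x, vocab = tfidf([tokens for _, tokens in speeches])  # one matrix across all periods
w, h = nmf(x, k=2)

# Top words per topic: the largest entries in each row of h.
top_words = [[vocab[j] for j in np.argsort(-row)[:3]] for row in h]

# Topic prominence per period: mean document-topic weight within that period.
by_period = defaultdict(list)
for (period, _), weights in zip(speeches, w):
    by_period[period].append(weights)
prominence = {period: np.mean(rows, axis=0) for period, rows in by_period.items()}
print(top_words, prominence)
```

Fitting one factorization over all periods gives every period the same topic-word matrix h, so a topic's weight in 1775–1799 and in 2000–2024 refers to the same topic, which is what makes the prominence curves comparable over time.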
Built with chronowords. Install via: pip install chronowords