Writing

Training Dynamics of Transformer Attention Heads

15 minute read

Published: April 12, 2026

A time-dependent study of W_QK statistics across training checkpoints in the Pythia model suite: how spectral structure, stable rank, and head diversity evolve during pretraining.

Singular Value Structure of Transformer Attention Heads

20 minute read

Published: March 30, 2026

An empirical study of the singular value spectra of W_QK matrices across transformer architectures: spectral distributions, participation ratios, and what they reveal about learned attention geometry.

Inverse Scattering via Physics-Informed Neural Networks

23 minute read

Published: March 26, 2026

A physics-informed neural network (PINN) that solves the Korteweg-de Vries (KdV) equation, with boundary conditions specified via the inverse scattering transform.

Weight Distributions in Transformer Attention Heads

24 minute read

Published: January 16, 2026

An empirical study of the statistical distributions of W_Q, W_K, and W_QK weight matrices across transformer architectures.

Aaron Angerami
