Spectral Adaptation

Overview

Full implementation available at: https://github.com/mzhou02/Spectral-Adaptation

This project implements a vanilla transformer together with a parameter-efficient fine-tuning method based on Singular Value Decomposition (SVD). Adaptation rescales only the singular values of pretrained weight matrices, so large language models can be adapted to new domains while the singular vectors of the pretrained weights stay fixed.

Fine-Tuning Approach

Given a weight matrix \(W \in \mathbb{R}^{m \times n}\), SVD factorizes it as:

\[W = U \Sigma V^\top\]

where:

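\(U \in \mathbb{R}^{m \times r}\) and \(V \in \mathbb{R}^{n \times r}\) are semi-orthogonal matrices (their columns are orthonormal), with \(r \le \min(m, n)\)
\(\Sigma = \mathrm{diag}(\sigma_1, \sigma_2, \dots, \sigma_r)\) is the diagonal matrix of singular values, with \(\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r \ge 0\)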
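As a quick sanity check of the factorization, here is a minimal NumPy sketch. It is an illustration only, not code from the repository: the matrix size, the random seed, and the variable names are assumptions. It computes the thin SVD of a random matrix and measures how well the factors reconstruct it.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 6, 4                      # illustrative sizes, chosen arbitrarily
W = rng.standard_normal((m, n))

# full_matrices=False returns the thin SVD:
# U is (m, r), sigma is (r,), Vt is (r, n), with r = min(m, n).
U, sigma, Vt = np.linalg.svd(W, full_matrices=False)
r = sigma.shape[0]

# W should equal U diag(sigma) V^T up to floating-point error.
reconstruction_error = np.max(np.abs(U @ np.diag(sigma) @ Vt - W))

# The columns of U and V should be orthonormal.
u_orthonormality_error = np.max(np.abs(U.T @ U - np.eye(r)))
v_orthonormality_error = np.max(np.abs(Vt @ Vt.T - np.eye(r)))

print(U.shape, sigma.shape, Vt.shape)
print(reconstruction_error, u_orthonormality_error, v_orthonormality_error)
```

Note that `np.linalg.svd` returns \(V^\top\) directly (here `Vt`), and the singular values as a 1-D array in descending order rather than as a diagonal matrix.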

Instead of updating all \(mn\) entries of the weight matrix, we:

Freeze \(U\) and \(V\)
Train only a new set of scalar multipliers \(z \in \mathbb{R}^r\) applied to singular values:

\[\Sigma' = \mathrm{diag}(z_1 \sigma_1, z_2 \sigma_2, \dots, z_r \sigma_r)\]

Reconstruct the adapted weight matrix:

\[W' = U \Sigma' V^\top\]
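The three steps above can be sketched in NumPy as follows. This is a hedged illustration, not the repository's implementation: the helper names `decompose` and `adapted_weight` are hypothetical, and initializing \(z\) to all ones is an assumption made here so that the adapted matrix starts out equal to the pretrained one. No training loop is shown; the sketch covers only the reconstruction of \(W'\) from frozen factors and a given \(z\).

```python
import numpy as np

def decompose(W):
    """Thin SVD of W. The returned U, sigma, Vt are treated as frozen."""
    U, sigma, Vt = np.linalg.svd(W, full_matrices=False)
    return U, sigma, Vt

def adapted_weight(U, sigma, Vt, z):
    """Rebuild W' = U diag(z * sigma) V^T from frozen factors and multipliers z."""
    # Scaling the columns of U by (z * sigma) is equivalent to multiplying
    # by the diagonal matrix, without materializing it.
    return (U * (z * sigma)) @ Vt

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 5))      # stand-in for a pretrained weight matrix
U, sigma, Vt = decompose(W)

# Assumed initialization: z = 1 reproduces the pretrained weight exactly.
z = np.ones_like(sigma)
W_identity = adapted_weight(U, sigma, Vt, z)

# Changing one multiplier rescales a single singular direction.
z_scaled = z.copy()
z_scaled[0] = 0.5                    # halve the largest singular value
W_scaled = adapted_weight(U, sigma, Vt, z_scaled)

trainable_params = z.size            # r values per matrix
full_params = W.size                 # m * n values per matrix
print(trainable_params, full_params)
```

For this \(8 \times 5\) example the trainable vector has 5 entries, compared with 40 entries in the full matrix. In general the count drops from \(mn\) to \(r\) per adapted matrix.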