# Spectral Adaptation
SVD-Transformer implementation with efficient fine-tuning using Singular Value Decomposition. Trains vanilla transformers on Wiki40B and fine-tunes on conversational datasets.
## Overview
Full implementation available at: https://github.com/mzhou02/Spectral-Adaptation
This project implements a vanilla transformer architecture with efficient fine-tuning capabilities using Singular Value Decomposition (SVD). The approach enables parameter-efficient adaptation of large language models to new domains while preserving the structure of pretrained weights.
## Fine-Tuning Approach
Given a weight matrix \(W \in \mathbb{R}^{m \times n}\), SVD factorizes it as:
\[W = U \Sigma V^\top\]
where:
- \(U \in \mathbb{R}^{m \times r}\) and \(V \in \mathbb{R}^{n \times r}\) are (semi-)orthogonal matrices, with \(r = \min(m, n)\) for the thin SVD
- \(\Sigma = \mathrm{diag}(\sigma_1, \sigma_2, \dots, \sigma_r)\) is the diagonal matrix of singular values
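The factorization above can be computed directly with PyTorch; a minimal sketch (the matrix dimensions here are arbitrary placeholders):

```python
import torch

# Hypothetical dimensions; any weight matrix factorizes the same way.
m, n = 8, 6
W = torch.randn(m, n)

# Thin SVD: U is (m, r), S holds the r singular values, Vh is (r, n),
# where r = min(m, n).
U, S, Vh = torch.linalg.svd(W, full_matrices=False)

# Reconstruct W = U diag(sigma) V^T and verify the factorization
W_rec = U @ torch.diag(S) @ Vh
print(torch.allclose(W, W_rec, atol=1e-5))  # True
```

Note that `torch.linalg.svd` returns \(V^\top\) (as `Vh`) rather than \(V\) itself.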
Instead of updating the entire weight matrix, we:
- Freeze the \(U\) and \(V\) matrices
- Train only a new set of scalar multipliers \(z \in \mathbb{R}^r\) applied to the singular values: \(\sigma_i \mapsto z_i \sigma_i\)
- Reconstruct the adapted weight matrix: \[W' = U \,\mathrm{diag}(z_1 \sigma_1, \dots, z_r \sigma_r)\, V^\top\]

This reduces the number of trainable parameters per weight matrix from \(mn\) to \(r\), while the frozen \(U\) and \(V\) preserve the subspaces of the pretrained weights.
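The steps above can be sketched as a small PyTorch module. This is a minimal illustration, not the repository's actual implementation; the class name `SpectralAdapter` and its layout are hypothetical.

```python
import torch
import torch.nn as nn

class SpectralAdapter(nn.Module):
    """SVD-based fine-tuning sketch: freeze U, sigma, V; train only z.

    Hypothetical module -- the repository's actual code may differ.
    """
    def __init__(self, W: torch.Tensor):
        super().__init__()
        U, S, Vh = torch.linalg.svd(W, full_matrices=False)
        # Frozen factors from the pretrained weight (buffers, not parameters)
        self.register_buffer("U", U)
        self.register_buffer("S", S)
        self.register_buffer("Vh", Vh)
        # Trainable scalar multipliers z, one per singular value,
        # initialized to 1 so the adapted weight starts equal to W.
        self.z = nn.Parameter(torch.ones_like(S))

    def weight(self) -> torch.Tensor:
        # W' = U diag(z * sigma) V^T
        return self.U @ torch.diag(self.z * self.S) @ self.Vh

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = x W'^T, as in nn.Linear
        return x @ self.weight().T

adapter = SpectralAdapter(torch.randn(8, 6))
n_trainable = sum(p.numel() for p in adapter.parameters() if p.requires_grad)
print(n_trainable)  # 6: one scalar per singular value, vs. 48 in the full matrix
```

Because `z` is initialized to ones, the adapted weight exactly reproduces the pretrained matrix before fine-tuning begins, so adaptation starts from the pretrained model's behavior.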