When it comes to actually storing the numerical weights that power a large language model's underlying neural network, most modern AI models rely on the precision of 16- or 32-bit floating-point numbers. But that precision comes at a cost: memory footprints in the hundreds of gigabytes for the largest models (a rough calculation below illustrates the scale) and significant processing resources for the complex matrix multiplications used when responding to prompts. Now, researchers at Microsoft's General Artificial...
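
To put that footprint in perspective, here is a minimal back-of-envelope sketch; the parameter counts and the helper function are illustrative assumptions, not figures tied to any specific model, and the estimate covers the raw weights only:

```python
# Rough storage estimate for a model's weights alone
# (hypothetical parameter counts; ignores activations, KV cache, optimizer state, etc.)

def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate gigabytes needed to store the raw weights."""
    return num_params * bits_per_weight / 8 / 1e9

for params in (7e9, 70e9, 400e9):  # 7B, 70B, and 400B parameters
    fp32 = weight_memory_gb(params, 32)
    fp16 = weight_memory_gb(params, 16)
    print(f"{params / 1e9:.0f}B params: ~{fp32:.0f} GB at fp32, ~{fp16:.0f} GB at fp16")
```

Even at 16-bit precision, a model in the hundreds of billions of parameters needs hundreds of gigabytes just to hold its weights, which is the pressure driving interest in lower-precision formats.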
