L-Mul: An Energy-Efficient AI Training Approach
Finally, trimming AI energy consumption and carbon footprint may soon be a reality.
Large language models have shown us that, given enough data and sound ML algorithms, human-like text can be generated.
However, the cost of training and inference is staggering from both energy-consumption and data-collection standpoints.
Looking at how the big players in the LLM field have accumulated the data to train their proprietary solutions, it is clear that user privacy is not a priority.
That doesn’t concern or scare me, but the energy consumption and carbon emissions send a chill down my spine.
Did you know that in early 2023, powering ChatGPT required around 564 MWh of electricity? That is enough to supply a small town of nearly 18,000 families.
Other big players, such as Google Gemini and Mistral, operate at a similar scale.
It is now more important than ever to optimize model training for efficiency.
Why Does Model Training Require So Much Energy?
Machine learning models are bundles of millions, if not billions, of weights, biases, and other parameters.
Higher model complexity and data dimensionality directly impact training times and energy requirements.
Multi-dimensional tensors containing floating point values and complex model architectures require more operations per forward and backward training pass to complete one epoch.
GPUs and TPUs demand significantly more power to function optimally while executing complex arithmetic operations in parallel at scale.
For now, we cannot compromise on data volume or network complexity until better approaches emerge that let us train models on smaller datasets without sacrificing inference precision and accuracy. We can use dynamic computation graphs to skip computations based on the input data, but again, that is not very efficient for larger models.
We can use efficient model architectures like MobileNets to minimize computation, but the results might not be as promising.
Also, hardware optimizations are partially out of our hands, since we rely on vendors for GPUs/TPUs.
We are only left with manipulating the arithmetic or data handling operations to minimize energy consumption and yield robust results.
Notably, large neural networks spend much of their energy computing floating-point tensor multiplications.
On average, multiplying two 32-bit floating-point numbers (fp32) can require about four times more energy than simply adding two fp32 floats.
Existing Approaches
Recent research papers and solutions propose reducing the computational load through tensor I/O optimizations and neural network pruning.
In network pruning, we reduce the number of connections within the neural network.
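A minimal magnitude-pruning sketch in plain Python (real frameworks prune structured groups of weights in tensors; `magnitude_prune` is an illustrative name, not a library function):

```python
def magnitude_prune(weights, sparsity):
    # Zero out the smallest-magnitude fraction of weights.
    # `sparsity` is the target fraction in [0, 1]; ties at the threshold
    # may prune slightly more than requested in this simple sketch.
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [w if abs(w) > threshold else 0.0 for w in weights]

print(magnitude_prune([0.5, -0.1, 0.9, 0.02], 0.5))  # → [0.5, 0.0, 0.9, 0.0]
```

Zeroed connections can then be skipped during inference, which is where the energy saving comes from.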
In I/O-intensive optimizations, we optimize the data/tensor I/O by reducing data shuffle/movement between different hardware and logical components.
Early stopping is another option: we end model training once performance plateaus or hits a threshold, avoiding unnecessary computation and the energy it consumes.
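A patience-based early-stopping loop can be sketched in plain Python (the loss values and `patience` setting here are hypothetical):

```python
def train_with_early_stopping(val_losses, patience=3):
    # Stop when validation loss fails to improve for `patience` epochs.
    # `val_losses` stands in for per-epoch validation losses.
    best, wait, stopped_at = float("inf"), 0, len(val_losses)
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                stopped_at = epoch + 1
                break
    return best, stopped_at

# Training halts at epoch 6, never reaching the later epochs.
print(train_with_early_stopping([1.0, 0.8, 0.7, 0.71, 0.72, 0.73, 0.5]))
```

Every epoch skipped after the stopping point is compute, and therefore energy, that is never spent.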
We can also use mixed-precision training: instead of 64- or 32-bit floats, we use lower-precision floating-point values to speed up training.
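The precision trade-off is easy to demonstrate with Python's built-in IEEE-754 half-precision (`'e'`) struct format: a small update that survives in fp32 rounds away entirely in fp16, which is why mixed-precision schemes typically keep an fp32 master copy of the weights.

```python
import struct

def to_fp16(x: float) -> float:
    # Round-trip a value through IEEE-754 half precision (10-bit mantissa).
    return struct.unpack("e", struct.pack("e", x))[0]

weight, grad = 1.0, 1e-4
# The 1e-4 update falls below fp16's spacing around 1.0 (~0.000977)
# and vanishes after rounding.
print(to_fp16(weight + grad))  # → 1.0
```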
All these approaches have their merits but don't scale well to GPT-scale models.
The L-MUL Algorithm
L-Mul is a new approach that can drastically diminish the computational demands of neural networks.
L-Mul focuses on optimizing the arithmetic operations we discussed earlier.
Floating-point numbers represent decimal values, while integers represent whole numbers. L-Mul approximates the result of floating-point multiplication with a formula based on integer addition.
Generally, when we multiply two floating-point values, we multiply the significands, add the exponents, and finally round the result to fit the precision.
These are complex steps, and when applied to billions of values, they turn up the heat and consume significantly more energy.
Floating-Point Multiplication (Mul):
The standard floating-point multiplication of two numbers x and y can be represented as:
Mul(x, y) = (1 + xm) * 2^xe * (1 + ym) * 2^ye
= (1 + xm + ym + xm * ym) * 2^(xe + ye)
Here, xe and ye are the exponents of x and y, and xm and ym are their mantissas (fractional parts).
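For positive normal fp32 values, this identity can be checked directly by reading the exponent and mantissa out of the bit pattern. A quick Python sketch (`parts` is an illustrative helper name):

```python
import struct

def parts(x: float):
    # Extract (unbiased exponent, fractional mantissa) from a positive
    # float's IEEE-754 binary32 representation.
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return ((bits >> 23) & 0xFF) - 127, (bits & 0x7FFFFF) / 2**23

xe, xm = parts(3.0)  # 3.0 = (1 + 0.5) * 2^1
ye, ym = parts(5.0)  # 5.0 = (1 + 0.25) * 2^2
assert (1 + xm + ym + xm * ym) * 2 ** (xe + ye) == 3.0 * 5.0
```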
L-Mul focuses on simplifying the process by directly manipulating the floating point bits instead of performing full multiplication.
Linear-complexity Multiplication (L-Mul):
The L-Mul algorithm approximates floating-point multiplication with the following formula:
L-Mul(x, y) = (1 + xm + ym + 2^(-l(m))) * 2^(xe + ye)
Here, l(m) is an offset exponent that depends on the number of bits (m) used to represent the mantissa. The paper defines l(m) as: l(m) = m if m ≤ 3, l(m) = 3 if m = 4, and l(m) = 4 if m > 4.
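For fp32, whose mantissa is 23 bits wide, the offset works out to l(m) = 4. A minimal Python sketch of the approximation, assuming positive normal inputs (`decompose` and `l_mul` are illustrative names; a real implementation operates directly on integer bit patterns in hardware):

```python
import struct

def decompose(x: float):
    # Split a positive float into (unbiased exponent, fractional mantissa)
    # using its IEEE-754 binary32 bit pattern.
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    exponent = ((bits >> 23) & 0xFF) - 127
    mantissa = (bits & 0x7FFFFF) / 2**23
    return exponent, mantissa

def l_mul(x: float, y: float, l: int = 4) -> float:
    # L-Mul: replace the costly mantissa product xm * ym with a fixed
    # offset 2^(-l); l = 4 applies to mantissas wider than 4 bits,
    # which covers fp32's 23-bit mantissa.
    xe, xm = decompose(x)
    ye, ym = decompose(y)
    return (1 + xm + ym + 2**-l) * 2 ** (xe + ye)

print(l_mul(3.0, 5.0))  # → 14.5, approximating the true product 15.0
```

The approximation error stays small relative to the magnitude of the result, which is why the paper reports negligible accuracy loss across benchmarks.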
The upside of the L-Mul algorithm is that it can be easily integrated into existing models without extensive modifications or retraining.
It focuses on arithmetic optimizations, which also yield tensor I/O benefits without additional changes or effort.
Instead of adopting mixed-precision training, where we reduce float precision to 8 bits, L-Mul can enhance training efficiency and even improve model accuracy by leveraging integer addition.
Overall, L-Mul might significantly enhance model training while improving the energy efficiency of AI model deployments.
Final Words
The future is data- and AI-driven, where we will witness smart systems and robots taking over many of the tasks that involve decision-making or emotions.
With the rise of LLMs and generative AI, we are only crawling into the AI domain. Even in this initial leap, we can see that we require excessive energy to achieve mediocre LLMs that sometimes hallucinate and go off-track.
If these useful yet simpler models require so much energy, we can’t even fathom the amount of energy we would need to bring artificial general intelligence to life.
Approaches like L-Mul can be a step towards building better, more sustainable AI models.
Connect with Me
📰 Linkedin | 🐦Twitter | 📽️Youtube | 📖 Medium | 🪝 Gumroad | 🧑💻Github | 📷Instagram