Welcome to DU! The truly grassroots left-of-center political community where regular people, not algorithms, drive the discussions and set the standards. Join the community: Create a free account Support DU (and get rid of ads!): Become a Star Member Latest Breaking News Editorials & Other Articles General Discussion The DU Lounge All Forums Issue Forums Culture Forums Alliance Forums Region Forums Support Forums Help & Search

Jim__

(14,845 posts)
5. It sounds like they may be obverse problems.
Mon Oct 14, 2024, 07:59 AM
Oct 2024

This post is based on quick reads of the AI paper and the papers you referenced. So, it's sort of food for thought.

In the AI problem, the number of multiplications is O(n2) where n is roughly based on the number of elements in a tensor. They are working with neural networks here, so my current understanding (I've read through the paper, but I need to read it more thoroughly to get a better understanding) is that this is based on the number of nodes (synapses) in the neural net. From the
paper - my best attempt at copying this:

Multiplication operations are generally more complicated than additions, and FP operation are more costly than integers (Horowitz, 2014). Table 1 shows that multiplying two fp32 numbers consumes 37 times higher energy than adding two 32-bit integers. While the complexity of integer addition is O(n) where n is the number of bits used for representing the number, FP multiplication requires O(e) exponent addition, O(m2) mantissa multiplication, and rounding. Here e and m stand for the number of bits used for exponent and mantissa parts of the FP numbers.

Modern LLM training and inference involves a large number of FP calculations in tensor computa-
tion. Consider calculating the element-size and dot products of two 2-D tensors:

Y1 = A ◦ X, Y2 = A · XT ; A, X (elements of) R(N,k)


Calculating Y1 involves N2 FP multiplications (Mul). If A and X are both fp32 tensors, A ◦ X consumes 37 times higher energy than adding two int32 matrices of the save (SIC - s/b same?) size. Similarly, Calculating Y2 involves (m × n × k) FP Mul and the same number of FP additions (Add). When A and X are fp32 tensors, each Mul-Add operation for two numbers consumes 0.9 + 3.7 = 4.6 (pJ)energy. If we replace the fp32 Mul with int32 Add, the energy cost becomes
0.1 + 0.9 = 1.0 (pJ),only 21.7% of the original cost. Similarly, if the inference is conducted in fp16, replacing fp16 Mul with int16 Add result in a 1 - (0.05 + 0.4) / (1.1 + 0.4) = 70% energy saving.

...


We propose L-Mul, a FP multiplication algorithm with O(n) complexity, where n is the bit size
of its FP operands. Consider two FP numbers x, y, whose exponents and fractions are xe, ye and xm, ym respectively, the vanilla FP Mul result is

Mul(x, y) = (1 + xm) · 2xe · (1 + ym) · 2ye
= (1 + xm + ym + xm · ym) · 2xe+ye

plus an xor operation ( ⊕ ) to decide the sign of the result. Assume xm and ym are mantissas of m bits. The O(m2) mantissa multiplication operation is the complexity bottleneck of this calculation. We remove this operation and introduce a new multiplication algorithm that processes mantissas with a computational complexity of O(m):


L-Mul(x, y) = (1 + xm + ym + 2-l(m)) · 2xee+ym,
l(m) = | m if m =< 3,
.........| 3 if m = 4,
.........| 4 if m > 4.



And in the encryption and large prime searches, the O(n2) is based on the large number of digits in the very large numbers involved. From the link: https://theconversation.com/weve-found-a-quicker-way-to-multiply-really-big-numbers-114923 you referenced from The Conversation:

We’ve found a quicker way to multiply really big numbers


In 1960, Anatoly Karatsuba, a 23-year-old mathematics student in Russia, discovered a sneaky algebraic trick that reduces the number of multiplications needed.

For example, to multiply four-digit numbers, instead of needing 42 = 16 multiplications, Karatsuba’s method gets away with only nine. When using his method, twice as many digits means only three times as much work.

This stacks up to an impressive advantage as the numbers get bigger. For numbers with a thousand digits, Karatsuba’s method needs about 17 times fewer multiplications than long multiplication.


Can the solution to the AI problem have some impact on the solution to the problems of multiplying large numbers? In the long-term there may be some connections between the solutions. But, I don't think there is a currently recognized connection between them.

Recommendations

0 members have recommended this reply (displayed in chronological order):

Latest Discussions»Culture Forums»Science»Integer addition algorith...»Reply #5