Matrix Fast Fourier transform (MFFT)

Matrix multiplication lies at the heart of modern machine learning, powering everything from neural networks to transformer models. Its optimization can dramatically impact the performance of AI systems, potentially reducing training times from days to hours. Let's explore how we can make this fundamental operation more efficient.


(Wallpaper above: reference1)

The problem

There are great sources out there that explain the problem. For example, you can check out Discovering novel algorithms with AlphaTensor2 to grasp the amount of hard work that has gone into finding a more optimized matrix multiplication (matmul) operation.

In summary, if you follow the definition of matrix multiplication for two \(n\times n\) matrices, you will write code that performs almost \(n^3\) number multiplications to calculate the result. However, number multiplication itself is expensive. If you have two numbers with \(m\) digits each, their multiplication costs \(m\) times the cost of number addition, and adding two such numbers costs \(m\) times the cost of a single-digit operation. This means number multiplication costs \(m^2\) times the cost of a single-digit operation. Thus, the matrix multiplication above costs almost \(n^3\times m^2\) times the cost of a single-digit operation.

You will see people explain this with big O notation, saying the time complexity of matrix multiplication is \(m^2O(n^3)\), which is more precise and spares us from saying "almost" all the time. Notice that the reason we keep the \(m^2\) outside of the \(O\) is that \(m\) is not something we want to scale up. It's a constant depending on the CPU architecture, such as \(8\), \(16\), \(32\), or \(64\), and it's a power of \(2\), an extra condition that will turn out to be very useful in this method. In the end, knowing its impact matters when you want to optimize, and this method relies on that.

The \(3\) in \(m^2O(n^3)\) is what people have focused on to optimize this operation, so it has its own name, \(\omega\), where \(m^2O(n^\omega)\) is the complexity of the matmul operation. Based on the Computational complexity of matrix multiplication3 page on Wikipedia, the latest achievement today is \(\omega=2.371339\), which is awesome.

The goal here is to improve the \(m^2O(n^\omega)\) time complexity. This is a novel approach, and if you are an expert please bear with me while I prepare the basics; I think the idea is worth reading even by frontier scientists! Don't forget to cite if you like it!

To achieve that improvement we're going to use the Fast Fourier transform (FFT)4, which is known to reduce the complexity of computing the Discrete Fourier transform (DFT)5 from \(O(n^2)\), which arises if one simply applies the definition of the DFT, to \(O(n\log n)\), where \(n\) is the data size. To familiarize yourself with this concept I strongly suggest watching the following video.

Additionally, you can check out this Fast Fourier Transform6 lecture. The author of that lecture mentioned

The algorithm in this lecture, known since the time of Gauss but popularized mainly by Cooley and Tukey in the 1960s, is an example of the divide-and-conquer paradigm.

I can see someone writing something similar about this post in the future, because what I am going to show you is already known. It's only the connecting of the dots here that significantly improves the performance.

The FFT starts by asking how to improve the multiplication of two polynomials of degree \(n\), where \(n+1\) is \(2^k\). The answer is that we map the polynomials to another representation, called the value representation, compute the multiplication there, and finally map the result back into the original representation. The multiplication in the value representation is so efficient that even with the additional cost of mapping back and forth between representations, the overall algorithm decreases the cost of multiplying the two polynomials from \(O(n^2)\) to \(O(n\log n)\). The core idea in the FFT, as you can find in the video, is the tree graph used to compute the mapping of the polynomials. It's done through the recursive functions below:

\[ \begin{cases} P(\omega^j)=P_e(\omega^{2j})+\omega^j P_o(\omega^{2j}) \\ P(\omega^{j+n/2})=P_e(\omega^{2j})-\omega^j P_o(\omega^{2j}) \end{cases} \]

Where \(P_e\) and \(P_o\) are the sub-polynomials of \(P\) with the even and odd terms respectively. Additionally, \(\omega\) here refers to a primitive \(n\)th root of unity7. Don't confuse it with the previous \(\omega\) above; this is also the last time I use \(\omega\) for a root of unity in this post.
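
To make this recursion concrete, here is a minimal Python sketch, my own and not part of the method itself, of the textbook recursive evaluation step (the function name `fft_eval` is a hypothetical helper):

```python
import cmath

def fft_eval(coeffs):
    """Evaluate a polynomial at all n-th roots of unity (n = len(coeffs), a power of 2)."""
    n = len(coeffs)
    if n == 1:
        return coeffs[:]
    evens = fft_eval(coeffs[0::2])  # P_e at the squared roots
    odds = fft_eval(coeffs[1::2])   # P_o at the squared roots
    out = [0] * n
    for j in range(n // 2):
        w = cmath.exp(2j * cmath.pi * j / n)      # omega^j
        out[j] = evens[j] + w * odds[j]           # P(omega^j)
        out[j + n // 2] = evens[j] - w * odds[j]  # P(omega^(j+n/2))
    return out

# Degree-3 example: P(x) = 1 + 2x + 3x^2 + 4x^3 at the 4th roots of unity
print(fft_eval([1, 2, 3, 4]))
```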

MatMul with FFT

It starts by writing the matrix multiplication of two matrices of dimension \(n\times n\).

\[ AB = \left[\sum_i A_{ki}B_{il} \right]_{kl} \]

Here \(AB\) is the matrix product of the two matrices \(A = \left[A_{ki} \right]_{ki}\) and \(B = \left[B_{il} \right]_{il}\). Notice that in this notation the indices at the bottom right of the brackets show the indices you need to iterate over to build the matrix from its elements. The \(A_{ki}\) and \(B_{il}\) are the elements of the matrices \(A\) and \(B\) respectively, so they are numbers, and we can write them down as base-\(2\) numbers with a maximum precision of \(m\) digits. As discussed before, we have \(m=2^q\). Therefore, we have

\[ AB = \left[\sum_{u} A_{kiu} 2^u \right]_{ki} \left[\sum_{v} B_{ilv} 2^v \right]_{il} = \left[\sum_{iuv} A_{kiu}B_{ilv} 2^{u+v} \right]_{kl} \]

Where \(A_{kiu}\) and \(B_{ilv}\) are the digits of the numbers in the two matrices above. Someone can rewrite this as the following

\[ AB = \left(\sum_{u} \left[A_{kiu} \right]_{ki} 2^u\right) \left(\sum_{v}\left[ B_{ilv} \right]_{il} 2^v\right) \]

Now you can see the two polynomials whose coefficients are matrices of digits. To see them better, you can replace the \(2\) with an unknown variable \(x\); this doesn't change the result of the multiplication on the coefficients, which is what we're looking for.
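
To see this decomposition in action, here is a short NumPy sketch (my own illustration; the names are mine) that splits two integer matrices into digit planes, multiplies the matrix-coefficient polynomials by brute force, and evaluates the result back at \(x=2\):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 8, 4                               # 8-digit entries, 4x4 matrices
A = rng.integers(0, 2**m, size=(n, n))
B = rng.integers(0, 2**m, size=(n, n))

# Digit planes: A = sum_u A_u 2^u, with every A_u a 0/1 matrix
A_planes = [(A >> u) & 1 for u in range(m)]
B_planes = [(B >> v) & 1 for v in range(m)]

# Brute-force product of the two matrix-coefficient polynomials
C_planes = [np.zeros((n, n), dtype=np.int64) for _ in range(2 * m - 1)]
for u in range(m):
    for v in range(m):
        C_planes[u + v] += A_planes[u] @ B_planes[v]

# Evaluating the product polynomial at x = 2 recovers the ordinary matmul
C = sum(plane * 2**w for w, plane in enumerate(C_planes))
assert np.array_equal(C, A @ B)
```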

The task is to calculate the multiplication of two polynomials, just like in the ordinary FFT. In fact, someone could apply the FFT here to find the result, but dealing with roots of unity7 as complex numbers, with their infinite digits all over the place, would not help performance much. Here, we're going to introduce a novel algorithm, the Matrix Fast Fourier transform, or MFFT for short, that will do this computation with matrices using low-digit computations.

Roots of unity

They taught us that matrices8 are not numbers, but imaginary numbers9 are numbers! However, have you seen the calculation below?

\[ \begin{bmatrix} 0 & -1 \\ 1 & 0 \\ \end{bmatrix}^2= \begin{bmatrix} -1 & 0 \\ 0 & -1 \\ \end{bmatrix}=-1 \]

Where if we name this matrix \(i\)

\[ i= \begin{bmatrix} 0 & -1 \\ 1 & 0 \\ \end{bmatrix} \]

Then we have \(i^2=-1\), and it's not the end! Check out Euler's formula10

\[ e^{ix}=\cos x+i\sin x= \begin{bmatrix} \cos x & -\sin x \\ \sin x & \cos x \\ \end{bmatrix} \]

Which are the elements of the \(SO(2)\)11 group. I have understood the imaginary numbers through this group since I was in high school. Someone can say: of course, it's just a representation, and I'll respond yes, but it definitely makes the boundary between numbers and matrices very blurry, to the point that I can say any matrix representation of an Abelian12 group is also a number set. An Abelian group is just a commutative group. This is true for any matrix of the form \(a+bi\), which are called the complex numbers.

After establishing these concepts, now it's time to mention that \(i\) is the first root of the \(x^4=1\) equation, or someone may say it's the generator of the roots of unity for the \(x^4=1\) equation. In general, for \(x^n=1\) the roots of unity in the complex numbers are

\[ e^{2k\pi i/n} \]

Where \(k=0,1,...,n-1\). Someone can think of these as \(2\times 2\) matrices like the one we built above. But it gets more interesting for the \(x^{2^{s+1}}=1\) equations if you want to compute with only \(0\), \(1\), and \(-1\). For instance, what about the roots of \(x^8=1\)? Easy! Just increase the dimensions of the matrices. The generator for that equation, with dimension \(4\times 4\), is

\[ I_2= \begin{bmatrix} 0 & 0 & 0 & -1 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ \end{bmatrix} \]

This means someone can generate the roots of unity of \(x^8=1\) just by multiplying \(I_2\) by itself multiple times. So the \(k\)th root is just

\[ I_2^k \]

You may notice that I started from \(I_2\) because \(I_1\) is our old imaginary number, \(I_1=i\), and \(I_0\) is just \(I_0=-1\). Additionally, the generator of the roots of \(x^{16}=1\), which is an \(8\times 8\) matrix, is as follows.

\[ I_3= \begin{bmatrix} 0 & 0 & 0 & 0 & 0 & 0 & 0 & -1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ \end{bmatrix} \]
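
The block pattern of these generators is easy to code up. Below is a small NumPy sketch, my own and with a hypothetical helper name `gen`, that builds \(I_s\) recursively and checks the basic claims:

```python
import numpy as np

def gen(s):
    """I_s: the 2^s x 2^s generator of the roots of x^(2^(s+1)) = 1."""
    if s == 0:
        return np.array([[-1]])               # I_0 = -1
    prev = gen(s - 1)
    n = prev.shape[0]
    out = np.zeros((2 * n, 2 * n), dtype=int)
    out[:n, n:] = prev                        # top-right block: I_{s-1}
    out[n:, :n] = np.eye(n, dtype=int)        # bottom-left block: 1
    return out

I1, I2, I3 = gen(1), gen(2), gen(3)
assert np.array_equal(I1, [[0, -1], [1, 0]])                      # I_1 = i
assert np.array_equal(I1 @ I1, -np.eye(2, dtype=int))             # i^2 = -1
assert np.array_equal(np.linalg.matrix_power(I2, 4), -np.eye(4))  # I_2^(2^2) = -1
assert np.array_equal(np.linalg.matrix_power(I3, 16), np.eye(8))  # I_3^16 = 1
```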

In general, let me show you the recursive function that calculates \(I_s^k\), the roots of the \(x^{2^{s+1}}=1\) equation. Since it's recursive, its time complexity is \(O(2^s)\), but in the final matmul algorithm we can cache the results at each step and achieve \(O(1)\). Before jumping to the solution, first notice that each row and each column has exactly one non-zero element, which is either \(1\) or \(-1\). So to generate the elements of \(I_s^k\) we just need a function from row index to column index, plus a way to distinguish \(1\) from \(-1\). The way I define this function, let's call it \(H\), is to give a negative sign to the column index if its value is \(-1\). For instance, for \(I_1\) we have

\[ \begin{cases} s=1, k=1: H_{11} = [-2,1] \\ s=1, k=2: H_{12} = [-1,-2] \\ s=1, k=3: H_{13} = [2,-1] \\ s=1, k=4: H_{14} = [1,2] \\ \end{cases} \]

Notice, \([1,2]\) is just the \(1\) matrix, and \([-2,1]\) is our old imaginary friend, \(i\)! Also, I need to mention that the row and column indices in these lists and matrices start from \(1\), not \(0\), to avoid writing something like \(-0\)! Let's provide more samples.

\[ \begin{cases} s=2, k=1: H_{21} = [-4,3,1,2] \\ s=2, k=2: H_{22} = [-2,1,-4,3] \\ s=2, k=3: H_{23} = [-3,-4,-2,1] \\ s=2, k=4: H_{24} = [-1,-2,-3,-4] \\ s=2, k=5: H_{25} = [4,-3,-1,-2] \\ s=2, k=6: H_{26} = [2,-1,4,-3] \\ s=2, k=7: H_{27} = [3,4,2,-1] \\ s=2, k=8: H_{28} = [1,2,3,4] \\ s=3, k=1: H_{31} = [-8,7,5,6,1,2,3,4] \\ s=3, k=2: H_{32} = [-4,3,1,2,-8,7,5,6] \\ s=3, k=3: H_{33} = [-6,5,-8,7,-4,3,1,2] \\ s=3, k=4: H_{34} = [-2,1,-4,3,-6,5,-8,7] \\ s=3, k=5: H_{35} = [-7,-8,-6,5,-2,1,-4,3] \\ s=3, k=6: H_{36} = [-3,-4,-2,1,-7,-8,-6,5] \\ s=3, k=7: H_{37} = [-5,-6,-7,-8,-3,-4,-2,1] \\ s=3, k=8: H_{38} = [-1,-2,-3,-4,-5,-6,-7,-8] \\ ... \end{cases} \]

While these look like passwords, someone can see the pattern. Yes, I mean this

\[ H_{sk}[j] = \begin{cases} - H_{s,k-1}[2^s] & \text{if } j = 1 \\ H_{s,k-1}[2^s - 1] & \text{if } j = 2 \\ H_{s,k-1}[2^{s-1}+j - 2] & \text{if } 2 < j \leq 2^{s-1} \\ H_{s,k-1}[j-2^{s-1}] & \text{if } 2^{s-1} < j \end{cases} \]

Of course

\[ H_{s2^{s+1}}=H_{s0}=[1,2,3,...,2^s] \]

Which is the identity matrix13. Someone may ask: now that we have these roots of unity, what's their relation to the \(2\times 2\) roots, or to put it more clearly, their relation to the complex numbers? For instance, can we write the following?

\[ e^{i\pi/4}=\cos (\pi/4)+i\sin(\pi/4)= \frac{1}{\sqrt{2}}(1+i)=I_2 \]

Absolutely yes! Just replace

\[ \begin{bmatrix} 0 & -1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & -1 \\ 0 & 0 & 1 & 0 \end{bmatrix} =\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \otimes\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \]

with \(i\) and

\[ \begin{bmatrix} 0 & 0 & 1 & -1 \\ 0 & 0 & 1 & 1 \\ 1 & 1 & 0 & 0 \\ -1 & 1 & 0 & 0 \end{bmatrix} \]

with \(\sqrt{2}\). You may have noticed the block multiplication and the tensor product14, \(\otimes\), used in these calculations. Even though the \(I_s^k\) build an Abelian group, someone may say: but they are not commutative with other matrices, so how are they numbers? The answer is they don't need to be, just like \(i\) doesn't need that. The trick is to use the tensor product14, \(\otimes\), to keep them independent of, and commutative with, the other computations.
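
As a quick sanity check of this replacement, here is a NumPy sketch of mine; note that NumPy's `np.kron` stacks blocks in the opposite order to the \(\otimes\) convention used in this post, which doesn't affect the check:

```python
import numpy as np

i = np.array([[0, -1], [1, 0]])
one = np.eye(2, dtype=int)

i_big = np.kron(one, i)                   # the post's  i (x) 1
sqrt2_big = np.array([[ 0, 0, 1, -1],
                      [ 0, 0, 1,  1],
                      [ 1, 1, 0,  0],
                      [-1, 1, 0,  0]])
I2 = np.array([[0, 0, 0, -1],
               [0, 0, 1,  0],
               [1, 0, 0,  0],
               [0, 1, 0,  0]])

# "sqrt(2)" squared is 2, so its inverse is itself divided by 2
assert np.array_equal(sqrt2_big @ sqrt2_big, 2 * np.eye(4, dtype=int))
# (1/sqrt(2)) (1 + i) = I_2, checked without any floating point
assert np.array_equal(sqrt2_big @ (np.eye(4, dtype=int) + i_big), 2 * I2)
```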

The most important thing that we need in MFFT is the following property of the roots of unity.

\[ I_s^{2^s}=-1 \]

From which someone can deduce \( I_s^k=-I_s^{k+2^s} \), to be used in the same way we used complex numbers in the FFT.

If you are convinced by the sample arrays above, or you are not interested in the sketchy proofs below, just jump to the next section, MatMul optimization.

As mentioned, I am going to prove some of the basics here, because we don't want to scale this method up only to find out it doesn't work! However, the proofs are not rigorous, and mathematicians would rightly complain. I hear you!

Theorem 1: \(I_s\) and \(e^{i\pi/2^s}\) are isomorphic15.

If we show that the \(I_s\) matrices have the same algebraic properties as \(e^{i\pi/2^s}\), then I would argue these are two different representations of the same algebra, and therefore they are isomorphic.

For the \(I_s\) matrices to be able to build a field16, \(I_s\) must be commutative and its determinant17 must be non-zero, both of which are true because of the block structure of \(I_s\), since the \(1\) and \(i\) matrices are the building blocks of \(I_s\). Now we only need to focus on the algebraic properties of \(e^{i\pi/2^s}\). We have the following in the complex numbers.

\[ \sqrt{e^{\frac{i\pi}{2^s}}}=e^{\frac{i\pi}{2^{s + 1}}} \]

The similar relationship for \(I_s\) would be

\[ \sqrt{I_s}=I_{s+1} \]

Which we need to investigate. By the way, this shows that if \(I_s\) generates a group, then \(I_{s+1}\) acts like half a step of \(I_s\), so it will also generate a group. For the simplest case, \(s=0\), we have

\[ \sqrt{-1}= \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \]

Which is exactly \(i\), as we discussed before. By applying induction18, we just need to show we can build \(I_{s+1}\) on top of \(I_s\); that is, we want to show \(\sqrt{I_s}=I_{s+1}\). If we write \(I_{s+1}\) like this

\[ I_{s+1}= \begin{bmatrix} 0 & I_s \\ 1 & 0 \end{bmatrix} \]

Then we have

\[ I_{s+1}^2= \begin{bmatrix} I_s & 0 \\ 0 & I_s \end{bmatrix} =I_s\otimes \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} =I_s \]

Which is another way to write \(\sqrt{I_s}=I_{s+1}\), so Q.E.D.

Theorem 2: The following recursive function calculates \(I_s^k\) \[ H_{sk}[j] = \begin{cases} - H_{s,k-1}[2^s] & \text{if } j = 1 \\ H_{s,k-1}[2^s - 1] & \text{if } j = 2 \\ H_{s,k-1}[2^{s-1}+j - 2] & \text{if } 2 < j \leq 2^{s-1} \\ H_{s,k-1}[j-2^{s-1}] & \text{if } 2^{s-1} < j \end{cases} \]

To show this recursive function is correct, we need to recall the notation we had before. First, notice that the following are the generators in this notation, respectively \(I_1\), \(I_2\), and \(I_3\).

\[ \begin{cases} s=1, k=1: H_{11} = [-2,1] \\ s=2, k=1: H_{21} = [-4,3,1,2] \\ s=3, k=1: H_{31} = [-8,7,5,6,1,2,3,4] \end{cases} \]

Second, notice that this notation is just a representation of a permutation group19 with extra instructions for handling the negative signs, where the matrix forms of \(I_s\) are the matrix representation of this group. Therefore, if we always handle the negative signs as if we're multiplying, then we can say the composition of the \(H\)'s and the multiplication of the \(I\)'s form a homomorphism20. The composition looks like this

\[ H_{sk}[j]=H_{s,1}[abs(H_{s,k-1}[j])]\times sign(H_{s,k-1}[j]) \]

Where \(abs\) and \(sign\) are respectively the absolute value and sign functions. Notice that the indices in these arrays start from \(1\), not \(0\). The rest of the proof is easy: just check that \(H\)'s recursive function is exactly the above composition, so Q.E.D.
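
To make Theorem 2 concrete, here is a minimal Python sketch of the recursion (the helper names `h_base`, `h_next`, and `h` are mine), checked against the sample arrays above:

```python
def h_base(s):
    # H_{s,0} = [1, 2, ..., 2^s]: the identity permutation
    return list(range(1, 2**s + 1))

def h_next(s, prev):
    # One step of the recursion: H_{s,k} from H_{s,k-1}. The 1-based
    # indices from the text become prev[index - 1] here.
    n, half = 2**s, 2**(s - 1)
    out = []
    for j in range(1, n + 1):
        if j == 1:
            out.append(-prev[n - 1])
        elif j == 2:
            out.append(prev[n - 2])
        elif j <= half:
            out.append(prev[half + j - 3])
        else:
            out.append(prev[j - half - 1])
    return out

def h(s, k):
    cur = h_base(s)
    for _ in range(k):
        cur = h_next(s, cur)
    return cur

assert h(1, 2) == [-1, -2]                       # i^2 = -1
assert h(2, 1) == [-4, 3, 1, 2]
assert h(3, 5) == [-7, -8, -6, 5, -2, 1, -4, 3]
```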

One last question! In the FFT we used the Fundamental theorem of algebra21, but here we will use matrices as \(x\) in

\[ P_A(x) = \sum_{u} x^u\otimes \left[A_{kiu} \right]_{ki} \]

Does it work for matrices? The answer is yes, due to the independent spaces that the left and right sides of \(\otimes\) compute on, so it doesn't matter whether the coefficients are numbers or matrices in this case. By the way, we don't need such a general theorem to move forward with the FFT. In our case, we just need to show that there are mappings between \(P_A(x)\)'s two representations. The magical mappings are possible because, just as in the FFT, the two following matrices exist, and they are inverses of each other.

\[ \begin{bmatrix} 1 & 1 & 1 & ... & 1 \\ 1 & I_s & I_s^2 & ... & I_s^{2^{s+1}-1} \\ ... & ... & ... & ... & ... \\ 1 & I_s^{2^{s+1}-1} & I_s^{2(2^{s+1}-1)} & ... & I_s^{{\left(2^{s+1}-1\right)}^2} \\ \end{bmatrix} \]

and its inverse

\[ \frac{1}{2^{s+1}} \begin{bmatrix} 1 & 1 & 1 & ... & 1 \\ 1 & I_s^{-1} & I_s^{-2} & ... & I_s^{-(2^{s+1}-1)} \\ ... & ... & ... & ... & ... \\ 1 & I_s^{-(2^{s+1}-1)} & I_s^{-2(2^{s+1}-1)} & ... & I_s^{-{\left(2^{s+1}-1\right)}^2} \\ \end{bmatrix} \]
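
Here is a sketch, reusing the `gen` helper from the earlier snippet, that builds both block matrices for a small \(s\) and verifies they are inverses of each other:

```python
import numpy as np
# gen(s) is the I_s builder from the earlier sketch

def block_dft(s, inverse=False):
    Is = gen(s)
    d = Is.shape[0]                  # 2^s
    npts = 2 * d                     # 2^(s+1) roots of unity; I_s^npts = 1
    sign = -1 if inverse else 1
    rows = [np.hstack([np.linalg.matrix_power(Is, (sign * j * k) % npts)
                       for k in range(npts)])
            for j in range(npts)]
    F = np.vstack(rows).astype(float)
    return F / npts if inverse else F

F, Finv = block_dft(2), block_dft(2, inverse=True)
assert np.allclose(F @ Finv, np.eye(F.shape[0]))   # they are inverses
```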

MatMul optimization

This is where we'll use the results above. Let's restate the last equation we had and start from there.

\[ AB = \left(\sum_{u} \left[A_{kiu} \right]_{ki} 2^u\right) \left(\sum_{v}\left[ B_{ilv} \right]_{il} 2^v\right) \]

We don't need to track the computation of both \(A\) and \(B\), since the mapping is similar for both of them, so let's focus on one of the matrices, for example \(A\).

\[ A = \sum_{u} \left[A_{kiu} \right]_{ki} 2^u \]

Now make it a polynomial, as described before, by replacing \(2\) with \(x\), keeping its computation independent of the matrix coefficients

\[ P_A(x) = \sum_{u} x^u\otimes \left[A_{kiu} \right]_{ki} \]

Now, based on the FFT's instructions, we need to compute its value at the roots of unity. Remember, \(A\) is an \(n\times n\) matrix, and \(I_s\) is a \(2^s\times 2^s\) matrix. Recall that the numbers have \(m=2^q\) digits, which means the degree of the polynomials is \(m-1=2^q-1\), so we need to take this into account too. We need \(m\) roots, therefore we should have \(2^q \leq 2^{s+1}\), where obviously we choose the minimum for \(s\), which is \(s=q-1\). This means \(I_s\) is an \(\frac{m}{2}\times\frac{m}{2}\) matrix. We just need to plug the roots of unity into this polynomial; then, due to the fact that

\[ I_{s}^j=-I_{s}^{j+2^s} \]

We can separate the even and odd sub-polynomials from each other and build the computation tree we need.

\[ \begin{cases} P_A(I_s^j)= P_{Ae}(I_s^{2j})+I_s^j P_{Ao}(I_s^{2j})\\ P_A(I_s^{j+2^s})= P_{Ae}(I_s^{2j})-I_s^j P_{Ao}(I_s^{2j})\\ P_B(I_s^j)= P_{Be}(I_s^{2j})+I_s^j P_{Bo}(I_s^{2j})\\ P_B(I_s^{j+2^s})= P_{Be}(I_s^{2j})-I_s^j P_{Bo}(I_s^{2j}) \end{cases} \]
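
As a numeric sanity check of these butterfly identities, the sketch below (my own, for \(s=1\) and \(n=2\)) evaluates the polynomial directly instead of recursively; the tensor product is written with `np.kron`, whose block order may differ from the post's convention without affecting the identity:

```python
import numpy as np

I1 = np.array([[0, -1], [1, 0]])     # I_s for s = 1; I_1^4 = 1
n, order = 2, 4

def ipow(t):
    return np.linalg.matrix_power(I1, t % order)

rng = np.random.default_rng(1)
coeffs = [rng.integers(0, 2, size=(n, n)) for _ in range(4)]   # 0/1 digit matrices

def evaluate(cs, j):
    # Direct evaluation: P(I^j) = sum_u I^(j u) (tensor) C_u
    return sum(np.kron(ipow(j * u), c) for u, c in enumerate(cs))

even, odd = coeffs[0::2], coeffs[1::2]
for j in range(2):                   # 2^s = 2 butterflies cover all 4 roots
    twiddle = np.kron(ipow(j), np.eye(n, dtype=int))
    lo = evaluate(even, 2 * j) + twiddle @ evaluate(odd, 2 * j)
    hi = evaluate(even, 2 * j) - twiddle @ evaluate(odd, 2 * j)
    assert np.array_equal(lo, evaluate(coeffs, j))          # P(I^j)
    assert np.array_equal(hi, evaluate(coeffs, j + 2))      # P(I^(j+2^s))
```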

Notice that there's no matrix multiplication going on in these recursive functions, since \(I_s^j\) has only one non-zero element in each row, and we found how to calculate the corresponding column earlier. Hence, to calculate something like \(I_s^j P_{Ao}(I_s^{2j})\), we only need to iterate over the rows, find the corresponding column for each, and then only add, or subtract, elements of \(P_{Ao}(I_s^{2j})\). This means the computational complexity of calculating \(I_s^j P_{Ao}(I_s^{2j})\) is \(\frac{1}{2}mO(n^2)=\frac{1}{2}mO(2^{2s})\). Notice that we kept \(m\) out of the big O notation since it's not a scaling variable; we don't expect it to grow. It's a constant like \(16\), \(32\), etc., so the half matters here.

In the recursive functions above we have only \(\frac{1}{2}mq=\frac{1}{2}m\log m\) additions/subtractions. Also, the elements of the coefficient matrices are only zero or one; hence, the numbers in the \(P_A(I_{s}^j)\) and \(P_B(I_{s}^j)\) matrices have an upper bound, and it's \(\frac{1}{2}m\log m\). Therefore, only \(\log(\frac{1}{2}m\log m)\) digits are needed for each element's calculation, and since we counted \(\frac{1}{2}m\log m\) additions, or subtractions, the time complexity is proportional to \(\frac{1}{2}m\log m\log(\frac{1}{2}m\log m)\). All this means that calculating \(P_A(I_{s}^j)\) for all \(0\leq j <2^{s+1}\) will cost us \(\frac{1}{4}m^2\log m\log(\frac{1}{2}m\log m)O(n^2)\) times the cost of a single-digit operation. Then we can do the following in the value representation.

\[ P_A(I_{s}^j)P_B(I_{s}^j) \]

After doing these multiplications in the value representation, we can map the polynomials back to their original representation. Also notice that after that step, we need to redistribute the elements of the result matrix so they are again only \(0\) and \(1\), which has a time complexity of \(\frac{1}{2}mO(n^2)\).
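
The redistribution step is just per-element carry propagation across the digit planes. A minimal sketch, with `redistribute` as a hypothetical helper:

```python
import numpy as np

def redistribute(planes):
    """Carry-propagate so that every digit plane is 0/1 again.
    planes[w] holds the integer coefficient of 2^w for each matrix entry."""
    planes = [p.copy() for p in planes]
    w = 0
    while w < len(planes):
        carry = planes[w] // 2
        planes[w] = planes[w] % 2
        if carry.any():
            if w + 1 == len(planes):
                planes.append(np.zeros_like(carry))
            planes[w + 1] += carry
        w += 1
    return planes

# Example: the plane stack [3, 1] encodes 3 + 1*2 = 5, whose bits are [1, 0, 1]
out = redistribute([np.array([3]), np.array([1])])
assert [p.tolist() for p in out] == [[1], [0], [1]]
```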

Unlike the FFT, here the most time-consuming step is the calculation in the value representation.

\[ P_A(I_{s}^j)P_B(I_{s}^j) \]

Which is another matrix multiplication with \(O(n^{\omega})\) complexity, so we failed to optimize matmul!! Kidding! The overall time complexity is \(\frac{1}{2}m\log^2(\frac{1}{2}m\log m)O(n^\omega )\), assuming we can ignore the time complexity of the mapping, which means we presumed the following.

\[ \frac{m\log m}{\log(\frac{1}{2}m\log m)} \ll n^{\omega-2} \]

Comparing this to the original time complexity, \(m^2O(n^\omega)\), someone can deduce the following decrease in cost.

\[ 1-\frac{\log^2 (\frac{1}{2}m\log m )}{2m} \]

For instance, \(16\)-digit numbers, which are ordinary numbers in machine learning matrices, would see a \(1-25/32\approx 22\%\) decrease in computation cost, which I consider a good result. The good news is that this method is orthogonal to the other methods, so you can use them to optimize \(\omega\) further.

It's important to mention that we took a trade-off here. The elements of the matrices in MFFT must be integers. I couldn't make it work with floating points!

Conclusion

We invented a method, MFFT, that can reduce the time complexity of matrix multiplication down to \(\frac{1}{2}m\log^2(\frac{1}{2}m\log m)O(n^\omega )\).

Next

I will probably implement it for my c-numbers library. Feel free to send me your thoughts, publicly or privately.


References

Cite

If you found this work useful, please consider citing:

@misc{hadilq2024MatMul,
    author = {{Hadi Lashkari Ghouchani}},
    note = {Published electronically at \url{https://hadilq.com/posts/matrix-fast-fourier-transform/}},
    gitlab = {Gitlab source at \url{https://gitlab.com/hadilq/hadilq.gitlab.io/-/blob/main/content/posts/2024-11-05-matrix-fast-fourier-transform/index.md}},
    title = {Matrix Fast Fourier transform (MFFT)},
    year = {2024},
}