Fast Exact Integer Roots – Kevin Ventullo


Suppose you knew that 9,273,284,218,074,431 was a perfect 7th power. How would you compute the 7th root?

This is a long overdue sequel to the previous post, in which the author promised to derive an efficient algorithm for computing exact k-th roots of integers. That is, computing the k-th root of an integer assumed to be a perfect k-th power. This post will assume familiarity with that post, and use many of the same conventions. In particular, {\lg} refers to the usual base-2 logarithm, while {\log_2} and {\exp_2} refer to the 2-adic logarithm and exponential.

First, fix an integer {k>1}, and suppose {x} is an integer perfect k-th power, i.e. there exists an integer {y} such that {y^k=x}. Our goal is to compute {y}. Define {n} to be the minimum integer for which {x < 2^{nk}}, i.e. for which {x} can be expressed in at most {nk} bits. This implies that {y < 2^{n}}, i.e. {y} is expressible in at most {n} bits.

Classic Approach

The classical Newton’s method approach to calculating {y} works by iteratively computing the values

{y_i = \frac{1}{k} \left[(k-1)y_{i-1} + \frac{x}{y_{i-1}^{k-1}}\right]}.

This works even when doing integer division, and will compute the rounded answer even for non-perfect powers. A ballpark complexity analysis goes as follows:

The rate of convergence is quadratic in the size of the error. In other words, the number of digits of accuracy doubles on each iteration. So, if we know the final answer is an {n}-bit number, we would expect this method to take about {\lg(n)} iterations to converge. Moreover, each step requires an {nk}-bit division as well as {\lg(k)} smaller multiplications to compute the {(k-1)}-th power. So, without getting into the weeds, the overall complexity is something like {O(nk*\lg(nk)^c)} for some constant {c}.1
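For reference, here is a minimal integer version of this iteration in Python. This is an illustrative sketch, not the benchmarked code linked below; the initial guess and stopping rule are one standard choice for the integer variant.

```python
def newton_kth_root(x: int, k: int) -> int:
    """Integer k-th root of x >= 1 by Newton's method.

    Iterates y <- ((k-1)*y + x // y**(k-1)) // k from a guess that
    overshoots the true root, and stops once the iterates stop
    decreasing; for a perfect k-th power this lands exactly on y.
    """
    if x < 2:
        return x
    # Initial guess: 2^(ceil(bits/k)) is always >= the true root.
    y = 1 << -(-x.bit_length() // k)
    while True:
        y_next = ((k - 1) * y + x // y ** (k - 1)) // k
        if y_next >= y:
            return y
        y = y_next
```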

On the other hand, what happens if we try to use ideas from 2-adic analysis?

2-adic approach; k is odd

When k is odd, there is a straightforward extension of the previous post to solve this problem. Namely, one first factors {x} into its even and odd parts:

{\displaystyle x = 2^{v_2(x)}*x_{odd}},

and calculates the k-th roots for each part separately. Since {x} is a k-th power, we know that {v_2(x)} is divisible by k, so the k-th root of the even part is simply {y_{even} = 2^{\frac{v_2(x)}{k}}}. For the odd part, we can compute the k-th root as

{\text{Root Equation}}
{y_{odd} = \sqrt[k]{x_{odd}} = x_{odd}^{k^{-1}} = \exp_2\left(\frac{1}{k}\log_2(x_{odd})\right)}.

Here, {k^{-1}} refers to the 2-adic or modular inverse of k. Now, we’ve been a little fast and loose with the exact domains we’re working in, but when k is odd the above equation is true whether we’re working in the full characteristic zero ring {\mathbb{Z}_2} or any of the quotient rings {\mathbb{Z}/2^{whatever}}.

So, here’s the really amazing part.

We know that {x_{odd}} is potentially {nk} bits long, while {y_{odd}} is at most {n} bits. So, if we want to compute the exact value of {y_{odd}}, it is sufficient to run the above calculation inside {\mathbb{Z}/2^n}. But wait, isn’t that losing a ton of information about {x_{odd}}? Surprisingly, the answer is no! That’s right, for k odd, if you want to compute the k-th root of a perfect k-th power integer that is nk bits long, you actually only need to know its first n bits!

In effect, this removes a dependency on k from the complexity analysis, since all calculations can be performed inside {\mathbb{Z}/2^n}. The algorithm is fairly simple: compute the modular inverse of k, and then run the exponentiation algorithm from the previous post. In that post, we showed that the exponentiation algorithm could be done with {O(\sqrt{n})} multiplications, and it is well known that modular inverses can be done in {O(\lg(n))} multiplications via Newton’s method. Putting this together, at a very coarse level, the complexity is something like {O(n*\sqrt{n}*\lg(n)^c)} for some constant {c}.
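Concretely, the whole odd-k algorithm fits in a few lines of Python. In this sketch (my own illustration, not the benchmarked code), the 2-adic inverse map {x_{odd} \mapsto x_{odd}^{k^{-1}}} is realized with built-in modular exponentiation rather than the {\exp_2}/{\log_2} routine from the previous post; the two compute the same thing, although pow does not achieve the {O(\sqrt{n})} multiplication count of the series method.

```python
def kth_root_odd_k(x: int, k: int) -> int:
    """Exact k-th root of a perfect k-th power x > 0, for odd k > 1.

    Only the low n bits of the odd part are ever used, where n is
    the bit length of the root.
    """
    v = (x & -x).bit_length() - 1     # v_2(x), divisible by k by assumption
    x_odd = x >> v
    n = -(-x_odd.bit_length() // k)   # ceiling division: the odd root has n bits
    mod = 1 << n
    d = pow(k, -1, mod)               # k is odd, so invertible mod 2^n
    y_odd = pow(x_odd, d, mod)        # x_odd^(k^{-1}) in Z/2^n
    return y_odd << (v // k)
```

On the opening example, kth_root_odd_k(9273284218074431, 7) returns 191.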

So how do these compare? Ignoring the log terms, we might expect the 2-adic approach to work better whenever {k > \sqrt{n}}. To put it another way, if we hold the number of bits of the input argument fixed, say {M=nk}, then we would expect the 2-adic approach to work better as soon as {k > \sqrt[3]{M}}. For 64-bit arguments, this means we would expect the 2-adic approach to be faster at extracting k-th roots for any {k > 4}. Again, this is a lot of napkin math, and we’ll run concrete benchmarks at the end. For now, let’s move on to handling even k.

2-adic approach; k is even

First, the 2-part of x goes through unchanged; the number of trailing zeros is still divisible by k, so just divide that number by k. For notational simplicity, we will assume for the remainder of this section that x is odd.

When k is even, it turns out that Root Equation is still valid in the characteristic 0 ring {\mathbb{Z}_2}, but the tricky part is in correctly handling division by 2 in the finite quotient rings. The basic observation is that division by 2 can be well-defined in the quotient rings, but only in a specific sense:

{2\mathbb{Z}/2^{whatever} \rightarrow \mathbb{Z}/2^{whatever-1}}
{2a \ (\text{mod } 2^{whatever}) \mapsto a \ (\text{mod } 2^{whatever-1})}.

Let {\kappa = v_2(k)}, which by assumption is a positive integer. Then we can write

{k = 2^{\kappa}*k_{odd}}.

Given that x is odd and a perfect k-th power, we can actually make a much stronger claim about x:

{v_2(x-1) \geq \kappa + 2},

i.e.

{x \equiv 1 \ (\text{mod } 4*2^{\kappa})},

i.e. the first {\kappa + 2} bits of x are all 0 except for the final 1. This can be proven by induction on {\kappa}: writing {x = z^{2^{\kappa}}} with {z = y^{k_{odd}}} odd, the base case is {v_2(z^2 - 1) \geq 3}, and since {z^{2^j}+1 \equiv 2 \ (\text{mod } 4)}, each further squaring adds at least one more factor of 2. Note that this also implies

{\log_2(x) \equiv 0 \ (\text{mod } 4*2^{\kappa})}.
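A quick spot-check of this claim (a throwaway snippet; it suffices to test 2-power exponents, since the odd part of k can be absorbed into y):

```python
# v_2(y^(2^kappa) - 1) >= kappa + 2 for every odd y.
for y in (3, 5, 7, 11, 15):
    for kappa in (1, 2, 3, 4):
        x = y ** (1 << kappa)
        v = ((x - 1) & -(x - 1)).bit_length() - 1   # v_2(x - 1)
        assert v >= kappa + 2, (y, kappa, v)
```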

With this in mind, while it would appear that we are losing {\kappa} bits of information when we perform division by k in Root Equation, it turns out those bits are all 0’s anyway. So, the division by {2^{\kappa}} in Root Equation can in fact be thought of as a well-defined mapping

{2^{\kappa+2}\mathbb{Z}/2^{\kappa+n} \rightarrow 4\mathbb{Z}/2^{n}}.

Great, so what does this mean for our computation? It means that when computing the 2-adic log of x, we do need to keep track of the first {\kappa+n} bits, not just the first n bits as we did when k is odd. This is counterbalanced by the fact that the larger {\kappa} gets, the more 2-divisible {x-1} necessarily is, and hence, as we saw in the original post, the faster the Taylor series for {\log_2} converges. In any case, {\kappa} is naturally bounded by {\lg(k)}, and so can be absorbed by the log terms in our coarse estimates.

There is one last complication to deal with, but it is minor. Namely, we have implicitly been assuming throughout that our final answer will be positive. However, when dealing with even roots, there is always going to be a {\pm} ambiguity in the “correct” answer. Fortunately for us, there are no higher order ambiguities, as {\pm 1} are the only roots of unity inside {\mathbb{Z}_2}. Nevertheless, both solutions are algebraically valid.

We can resolve this ambiguity as follows: the positive solution starts with an infinite string of leading zeroes, while the negative solution starts with an infinite string of leading ones. In order to determine which one we are looking at in the final answer, we just need one extra bit of precision. Given that, the value of the leading bit will tell us whether we’re looking at the negative or positive solution. To summarize: the logarithm will need to be computed to {\kappa+n+1} digits, the division will remove {\kappa} digits, and the exponential will need to be computed to {n+1} digits. If the leading digit is a 1, the entire solution should be negated, and the remaining n bits are the final answer.
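To make the bookkeeping concrete, here is a hypothetical end-to-end sketch in Python (my own illustration, not the benchmarked code). It follows the same precision accounting, working to {\kappa+n+1} bits, losing {\kappa} of them along the way, and resolving the sign from the leading bit, but it swaps the {\exp_2}/{\log_2} pipeline for an equivalent route: a modular inverse removes the odd part of k, and then {\kappa} successive 2-adic square roots are taken by bit-at-a-time Hensel lifting.

```python
def kth_root_even_k(x: int, k: int) -> int:
    """Exact k-th root of a perfect k-th power x > 0, for even k.

    Works to n + kappa + 1 bits: each square root costs one bit of
    precision (mirroring the kappa bits lost to the division in the
    Root Equation), and one extra bit resolves the +/- ambiguity.
    """
    v = (x & -x).bit_length() - 1      # v_2(x); divisible by k
    x_odd = x >> v
    kappa = (k & -k).bit_length() - 1  # kappa = v_2(k) >= 1
    k_odd = k >> kappa
    n = -(-x_odd.bit_length() // k)    # the odd root has n bits
    prec = n + kappa + 1
    mod = 1 << prec
    # Undo the odd part of the exponent, as in the odd-k case.
    y = pow(x_odd, pow(k_odd, -1, mod), mod)
    # kappa square roots by Hensel lifting, always choosing the
    # branch congruent to 1 mod 4.
    for _ in range(kappa):
        r = 1
        for m in range(3, prec):
            if (r * r - y) & ((1 << (m + 1)) - 1):
                r += 1 << (m - 1)
        y = r
    y &= (1 << (n + 1)) - 1            # keep n bits plus one sign bit
    if y >> n:                         # leading bit 1: negative branch,
        y = (1 << (n + 1)) - y         # so negate modulo 2^(n+1)
    return y << (v // k)
```

For example, kth_root_even_k(1296, 4) reduces to x_odd = 81, reaches 13 ≡ -3 (mod 16) after two square roots, flips the sign to get 3, and returns 6.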

2-adic approach; k is even (Alternative)

This is an alternative to computing perfect roots via the Root Equation. In short, it exploits Euler’s identity:

{\exp(x) = \lim_{n\rightarrow \infty} \left(1 + \frac{x}{n}\right)^n}

As you can probably guess, this identity holds 2-adically as well, given the right definitions. The manner in which we will use it is

{\exp_2(x) = \lim_{t\rightarrow 0} \left(1 + tx\right)^{\frac{1}{t}}}

Here, {\lim_{t\rightarrow 0}} can be interpreted as meaning t is divisible by larger and larger powers of 2. Now, suppose we are trying to compute some k-th root of an odd number:

{\left(1 + x\right)^{\frac{1}{k}}}

For any n, we have the identity

{\left(1 + x\right)^{\frac{1}{k}} = \left(\left(1 + x\right)^{2^n}\right)^{\frac{1}{2^n k}}}

As n gets larger, according to Euler’s identity we have 

{\left(\left(1 + x\right)^{2^n}\right)^{\frac{1}{2^n k}} \approx \exp_2\left(\frac{1}{2^n k}\left(\left(1 + x\right)^{2^n}-1\right)\right)}

Each time we increment n, the expression {\left(1 + x\right)^{2^n}} gains an additional leading zero after the initial one, but the approximation gains two bits of accuracy. Thus, if we have an a priori bound for how many bits are needed to represent the answer, we can choose a sufficiently large n and compute the right hand side to the appropriate precision.
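As a toy demonstration (my own sketch with hypothetical names, using exact rational arithmetic via fractions.Fraction, and nothing optimized), the right hand side can be evaluated by summing the Taylor series for {\exp_2} over the rationals and reducing modulo a 2-power only at the end:

```python
from fractions import Fraction

def exp2_mod(z: Fraction, prec: int) -> int:
    """Sum the Taylor series of exp_2(z) exactly, then reduce mod
    2^prec. Assumes v_2(z) >= 2, so the i-th term has 2-adic
    valuation at least i + 1 and the reduced denominator is odd."""
    total, term = Fraction(1), Fraction(1)
    for i in range(1, prec):     # omitted terms vanish mod 2^prec
        term *= z / i
        total += term
    num, den = total.numerator, total.denominator
    return num * pow(den, -1, 1 << prec) % (1 << prec)

def root_via_euler(u: int, k: int, n: int, prec: int) -> int:
    """Approximate the k-th root of an odd perfect k-th power
    u = 1 + x as exp_2((u^(2^n) - 1) / (2^n * k))."""
    w = pow(u, 1 << n)           # exact integer power; fine for a demo
    return exp2_mod(Fraction(w - 1, (1 << n) * k), prec)

# Example: u = 81, k = 4. With n = 4 and prec = 8 this should return
# the branch congruent to 1 mod 4, namely -3 ≡ 253 (mod 256).
```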

Benchmarks

As before, the 2-adic algorithms are tested only on 64-bit integers, and they are not fully optimized to calculate just the needed bits of precision; some operations compute all 64 bits. This makes them less performant than they ought to be. Nevertheless, we will see that the 2-adic approach does outperform the Newton approach as k increases, and the critical threshold for our implementation occurs around {k=4}, matching our prediction closely.

The benchmark generates, for each k between 2 and 18, one million perfect k-th powers less than {2^{64}}. For large k, there are fewer than one million such numbers, so repeats are used. Then each of the 2-adic and Newton approaches is run in a loop, and the overall timing is measured. The code can be found here.2

Evidently, Newton’s method does perform worse as k grows, due to the expense of repeatedly computing k-th powers in a loop. Conversely, the 2-adic approach tends to get better as k grows, due to the smaller number of bits of precision needed to compute the final answer. The performance of the 2-adic approach for odd k is essentially constant, as we simply used the exponentiation function from the previous post at a full 64 bits of precision. This could be modified to compute fewer bits in order to improve performance for odd k. For even k, we implemented an optimization to the Euler formula method by keeping track of bits of precision to avoid some computations. However, this has not been fully exploited, so there is still low-hanging fruit in the even case. In both cases, the estimates for the bits of precision are based solely on the value of k and the knowledge that the input is at most 64 bits; we could also use the exact number of bits needed to represent the input to save even more work. Nevertheless, the 2-adic approach is already quite fast.

Further Exploration

While writing this post, the author was reminded of an inversion of Euler’s identity above which computes the 2-adic logarithm. Namely, we have the 2-adic identity

{\log_2(x) = \lim_{k\rightarrow\infty} \frac{x^{2^k}-1}{2^k}}.

For finite k, this approximation should be valid to about k bits, but note that the numerator of the right hand side will need to be computed to 2k bits of precision (the trailing k+2 bits will all be zero). This appears to be a much simpler formula for the 2-adic logarithm than the Taylor series, and one for which the number of bits of precision can be finely tuned. Calculating bits of precision using the Taylor series is trickier due to the wildly varying 2-adic valuations of the coefficients.
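A quick numeric illustration (a throwaway sketch): successive approximations should agree on roughly k low bits, with the numerator carried to k extra bits as just noted.

```python
x, B = 11, 40                    # approximate log_2(11) to ~B low bits
prev = None
for k in range(3, 13):
    # The numerator needs B + k bits; its low k + 2 bits are all zero.
    num = pow(x, 1 << k, 1 << (B + k)) - 1
    approx = (num >> k) % (1 << B)
    if prev is not None:
        d = approx ^ prev
        agree = ((d & -d).bit_length() - 1) if d else B
        print(f"k = {k:2}: agrees with k = {k - 1} on {agree} low bits")
    prev = approx
```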

In a related vein, there is yet another Taylor series which calculates roots directly:

{(1+x)^{\frac{s}{t}} = \sum_{n=0}^{\infty} \frac{\prod_{k=0}^{n-1} (s-kt)}{n! \, t^n} x^n},

which should work just as well 2-adically. Note that the radius of convergence of this series depends on the 2-adic valuation of t. Although it appears complicated, each summand can be computed from the previous with a constant number of operations. It seems plausible that this could be used to build an even more efficient algorithm, but that will have to wait for future work.
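As a rough illustration (my own sketch; it assumes t odd and {v_2(x) \geq 2}, so the series converges and every partial sum, as a reduced fraction, has odd denominator):

```python
from fractions import Fraction

def binomial_root(x: int, s: int, t: int, prec: int, terms: int) -> int:
    """Partial sum of the binomial series for (1+x)^(s/t), reduced
    mod 2^prec. With t odd and v_2(x) >= 2, the n-th summand has
    2-adic valuation at least n + 1."""
    total, term = Fraction(1), Fraction(1)
    for n in range(1, terms):
        # One multiply-divide per summand, as noted above.
        term *= Fraction((s - (n - 1) * t) * x, n * t)
        total += term
    num, den = total.numerator, total.denominator
    return num * pow(den, -1, 1 << prec) % (1 << prec)

# Example: 125 = 1 + 124 with v_2(124) = 2, and the unique cube root
# of 125 in Z_2 is 5, so this should print 5.
print(binomial_root(124, 1, 3, 8, 16))
```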

Thanks to Jason Sundram for feedback on an earlier version of this post.
