7
$\begingroup$

Let us consider these Q&As:

In the first, the focus is on machine reals. The second question concerns both reals and integers, but it was marked as a duplicate of the first. After playing around with the second question, I began to wonder exactly how or why the behavior of division with integers seemed different than the behavior with machine reals. As noted in answers to both questions, a/b is computed as Times[a, Power[b, -1]]. In an answer to the first Q&A, it is noted that Divide[a, b] is one FLOP (times the length of the arrays), whereas a/b is two FLOPs. Hence a/b takes twice as long, roughly, when both divisions are vectorized.

But the timing on integer arrays does not differ by a factor of 2; rather, it's much greater, on the order of 20 for me (Mac M4 Max, V14.3):

a = RandomInteger[{1, 100}, 100000];
b = RandomInteger[{1, 100}, 100000];
resA = a/b; // RepeatedTiming
resB = Divide[a, b]; // RepeatedTiming
resA === resB (* equivalent results *)
(*
{0.231458, Null}
{0.00976378, Null}
True
*)

The difference in the computation is the construction of an array of rationals (Divide[a, b]) versus constructing an array of rationals (Power[b, -1]) and multiplying it by an array of integers. Since Rational is not a machine type, it seems impossible that vectorization is a factor. Unpacking a and b does not affect the timing significantly; unpacking takes about 0.1 ms.

Why is there a difference? What are good practices for efficient computation with integers, especially with respect to division? Are there other functions that are more efficient?

$\endgroup$

1 Answer 1

8
$\begingroup$

Outline of issues

Timings related to division and fractions are more interesting than the other arithmetic operations, and they will get a little more attention. Some boundaries between performance jumps have been mentioned; others may be obvious:

  • packed versus unpacked arrays:
    • in input.
    • in output.
  • machine integers versus bigints ($|n| > 2^{63}$ or $n = -2^{63}$).
  • vectorized versus unvectorized operations (on packed arrays).
  • one operation in Divide[a,b] versus two operations in a/b.

Of course the timings on bigints grow with their size. The bigints used in the timings below are small, around 2^64, to give an idea of the initial jump between them and machine integers.

Power[]

Power[] is the main reason a/b = Times[a, Power[b, -1]] is so much slower on machine-integer arrays than Divide[a, b]. The table below gives an overview of the performance of Power[] on large arrays. There is a step down in speed when the output is not packable. Such is the case for Power[b, -1] and other negative exponents, and it is quite slow. There is another step down in speed of Power[] when the integers in the output are bigints. The cases when power is 0 or 1 tend to be special cases that are not particularly important. A surprise is that raising a bigint array to a power is faster than raising a packed array to a power when the result contains quite a few bigints (5,000 or more in an array of length 100,000).

Overview of Power[]:

input array power output array timing (s)
mach. int packed > 1 machine packed $5.5\times 10^{-5}$
unpacked ≥ 1 machine packed* $3.3\times 10^{-4}$
any ≥ 1 bigint unpacked $6.3\times 10^{-1}$
any = -1 machine unpacked $3.3\times 10^{-1}$
any < -1 machine unpacked $3.4\times 10^{-1}$
any < -1 bigint unpacked $6.5\times 10^{-1}$
bigint unpacked = -1 bigint unpacked $7.9\times 10^{-3}$
unpacked other bigint unpacked $1.{-}\,4.\times 10^{-2}$
mach. int singleton any bigint singleton $4.1\times 10^{-6}$
bigint singleton any bigint singleton $2.1\times 10^{-6}$

Table 1: Power[] on integer arrays. Other than the last two rows, which time a single integer, timings were on arrays of length 10^5 on a MacBook Pro M4 Max. When the length is ≥ 251 and the power is nonnegative, a machine-int array will be packed, and a vectorized Power[] will be used. By "machine" output for negative powers, I mean the integers in Rational[n, d] are machine integers. When the output is unpacked, the effect of packed vs. unpacked input is negligible compared to the variability in timings. The timing on machine input to bigint output is roughly proportional to how many bigints are produced.

Comparison with other arithmetic operations:

f packed unpacked big,small big,big
Plus $1.2434\times 10^{-5}$ $2.5219\times 10^{-4}$ $\bf 1.0446\times 10^{-2}$ $\bf 1.3225\times 10^{-2}$
Times $1.6838\times 10^{-5}$ $2.5162\times 10^{-4}$ $\bf 9.5097\times 10^{-3}$ $\bf 1.1018\times 10^{-2}$
Divide $\bf 1.0785\times 10^{-2}$ $\bf 1.0838\times 10^{-2}$ $\bf 1.3317\times 10^{-2}$ $\bf 2.5694\times 10^{-2}$
Quotient $4.8694\times 10^{-5}$ $3.2230\times 10^{-4}$ $\bf 1.3558\times 10^{-2}$ $\bf 1.8172\times 10^{-2}$
Mod $3.4405\times 10^{-5}$ $3.0939\times 10^{-4}$ $\bf 9.5853\times 10^{-3}$ $\bf 1.9692\times 10^{-2}$
Power $1.0667\times 10^{-4}$ $3.6628\times 10^{-4}$ $\bf 3.0156\times 10^{-2}$ -

Table 2. Comparison of arithmetic operations f[a, b]. Timings were on arrays of length 10^5 on a MacBook Pro M4 Max. The "small" array is an array of small machine integers, since otherwise Power[a, b] would lead to Overflow[]; for the same there is no timing for when both are bigints. Bold timings indicate the output was not packed.

Vectorized operations

We can see from the overview of Power[a, k] that it is vectorized on a large arrays a of machine integers with nonnegative powers k, provided no integer overflow that falls back on bigint computations. Here is a summary of vectorized division-related operations, provided input and output are machine integers:

  • Vectorized: Quotient[], Mod[].
  • Not vectorized: Divide[], QuotientRemainder[], Power[] with negative powers, Internal`Reciprocal[].
  • Like Power[] with nonnegative power, Quotient[] and Mod[] pack array inputs of length $\ge 251$ as do other vectorizable basic arithmetic operations like Plus[], Times[], and Subtract[]. The length is a system parameter controlled by the option "ListableAutoPackLength" in SystemOptions["PackedArrayOptions"].
  • For large arrays, Transpose[{Quotient[a, b], Mod[a, b]}] is much faster than the equivalent QuotientRemainder[a, b].

Division

Famously, one cannot divide most pairs of integers and get an integer result. Hence Quotient[] and Mod[] are the performant machine-integer operations for division-related computation. Fractions involve other considerations. A significant computational expense in computing a/b by whatever code is the reduction of the fraction to least terms. Such computations are not vectorized. Another consideration is that an array of fractions is not represented as an efficient packed array.

f packed unpacked bigint
Divide $1.4557\times 10^{-2}$ $1.3384\times 10^{-2}$ $3.6033\times 10^{-2}$
slash $2.5502\times 10^{-1}$ $2.5825\times 10^{-1}$ $4.5660\times 10^{-2}$
reciprocal $1.5873\times 10^{-2}$ $1.6426\times 10^{-2}$ $4.3957\times 10^{-2}$
mapRational $1.3904\times 10^{-2}$ $1.2133\times 10^{-2}$ $3.3932\times 10^{-2}$
threadRational $7.4402\times 10^{-3}$ $8.3068\times 10^{-3}$ $6.1329\times 10^{-3}$

Table 3. Timings of integer-to-rational division. Timings were on arrays of length 10^5 on a MacBook Pro M4 Max. Here are the codes for the composite operations:

slash[a_, b_] := a/b;
reciprocal[a_, b_] := a*Internal`Reciprocal[b];
mapRational[a_, b_] := MapThread[Rational, {a, b}];
threadRational[a_, b_] := Block[{Rational}, Thread[Rational[a, b]]];

The arithmetic of fractions

There are fewer cases to consider with inputs that are fractions. The main difference in speed is whether the terms of the fractions are machine integers or bigints. On an array of length $10^5$ of machine-integer fractions, the code a/b and a * Internal`Reciprocal[b] take about the same amount of time (0.033±0.02s); Divide[a, b] is somewhat faster (0.022±0.03s). Plus[a, b] and Times[a, b] take the same time as Divide[a, b]. The timings on fractions with bigint terms are about three to four times as long and grow as the bigints get bigger; the time for Plus seems to grow more slowly.

Sample codes

Here are examples of the arrays used:

(* arrays *)
packed = RandomInteger[{1, 1000}, 100000]; 
unpacked = Developer`FromPackedArray@packed;
small = RandomInteger[{2,6}, 100000]; 
bigint = RandomInteger[{2^63, 2^64}, 100000];
bigger = RandomInteger[{2^123, 2^124}, 100000];
fraction = bigger/bigint;

Timings were done with either RepeatedTiming[f[a,b];] or on longer running trials with just AbsoluteTiming[f[a,b];], with different arrays for a and b.

$\endgroup$

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.