Int32 vs Int64 performance

Having compared int to float arithmetic, one has to wonder: is there a similar difference between Int32 and Int64? Let's find out!

Benchmarks

As usual, let's get right to the numbers.

Addition:
Benchmark                Mode  Cnt     Score   Error  Units
Int32vs64.sumInt32       avgt   10   220.515 ± 0.365  us/op
Int32vs64.sumInt64       avgt   10   224.636 ± 0.771  us/op

Subtraction:
Benchmark                Mode  Cnt     Score   Error  Units
Int32vs64.minusInt32     avgt   10   219.931 ± 0.314  us/op
Int32vs64.minusInt64     avgt   10   223.050 ± 1.612  us/op

Multiplication:
Benchmark                Mode  Cnt     Score   Error  Units
Int32vs64.multiplyInt32  avgt   10   654.286 ± 1.963  us/op
Int32vs64.multiplyInt64  avgt   10   649.246 ± 1.391  us/op


Division:
Benchmark                Mode  Cnt     Score   Error  Units
Int32vs64.divideInt32    avgt   10  1971.968 ± 3.969  us/op
Int32vs64.divideInt64    avgt   10  1945.585 ± 4.307  us/op

Benchmarks were made with JMH; the full code is below.
CPU: AMD Ryzen 9 5900X
OS: Ubuntu 21.10 64-bit
JVM: OpenJDK 17 64-bit
JMH: version 1.34
Score: the average time it takes to execute the operation over 1 million values
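To put the scores in perspective: roughly 220 us for 1,000,000 additions works out to about 0.22 ns per addition, while roughly 1,970 us for 1,000,000 divisions is about 2 ns per division, and the figures are essentially the same for 32-bit and 64-bit operands.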

Conclusions

As we can see, there is no measurable difference in arithmetic performance between the int and long data types. The reason is fairly simple: most modern systems use 64-bit registers for both types. Historically that wasn't the case: the size of an int depended on the CPU and ranged from 4 bits (Intel 4004) to 64 bits (IBM 7030). In fact, it comes down to the notion of a word: int usually represents a single word, while long represents a double word.

A word is the number of bits held by most of a CPU's registers. The Intel 8080, for example, had 8-bit registers. All modern mainstream CPU architectures (x86-64, ARMv8, ARMv9, z/Architecture, etc.) use 64-bit words. You may still encounter 32-bit CPUs, either in older servers or in the embedded space, running older architectures such as ARMv7 and ARMv6 (early Raspberry Pi models, the Apple M7 coprocessor) or 32-bit PowerPC.

Another important factor to consider is the OS data model. Common data models include ILP32, LLP64, LP64, ILP64 and SILP64; the table below lists the type sizes in bits for each.

Model   short  int  long  long long  pointer  OS                                                   Name origin
ILP32      16   32    32         64       32  Embedded systems                                     Int, Long, Pointer = 32
LLP64      16   32    32         64       64  Windows                                              Long Long, Pointer = 64
LP64       16   32    64         64       64  Linux, macOS, BSD, Solaris, z/OS, Windows + Cygwin   Long, Pointer = 64
ILP64      16   64    64         64       64  HAL Computer Systems                                 Int, Long, Pointer = 64
SILP64     64   64    64         64       64  UNICOS                                               Short, Int, Long, Pointer = 64

Either way, the vast majority of modern CPUs (both ARM and x86-64) use the same 64-bit registers for int and long arithmetic.

To ensure portability, most languages and compilers pin the sizes of their integer types by specification instead of relying solely on the OS data model. For example, C and C++ provide the fixed-width int32_t and int64_t types in <stdint.h>, and Java's int and long are always 32 and 64 bits wide. When in doubt, check the specification of your compiler or language.
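As a quick illustration (a minimal sketch, not part of the benchmark), Java's fixed widths can be confirmed at runtime through the SIZE constants on the wrapper classes:

public class TypeSizes {
    public static void main(String[] args) {
        // The Java Language Specification fixes these widths on every platform,
        // regardless of the underlying OS data model.
        System.out.println("int:  " + Integer.SIZE + " bits"); // prints 32
        System.out.println("long: " + Long.SIZE + " bits");    // prints 64
    }
}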

Pointer size can also differ between compilers and even compiler settings. For example, passing -XX:+UseCompressedOops to the JVM enables compressed ordinary object pointers (oops), which take 32 bits instead of 64. In fact, the JVM can be configured to work in ILP32, LP64 or ILP64 mode.
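If you want to check whether pointer size affects these particular benchmarks, one way to do it is to fork the same JMH run with and without compressed oops via jvmArgsAppend. This is only a sketch with an illustrative class name, not something measured above; since the benchmarks touch only primitive arrays, the flag should make little to no difference here.

import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public class CompressedOopsComparison {
    public static void main(String[] args) throws RunnerException {
        // Run the Int32vs64 benchmarks twice: once with 32-bit compressed
        // object pointers, once with full 64-bit pointers.
        for (String flag : new String[]{"-XX:+UseCompressedOops", "-XX:-UseCompressedOops"}) {
            Options opt = new OptionsBuilder()
                    .include(Int32vs64.class.getSimpleName())
                    .jvmArgsAppend(flag)
                    .forks(1)
                    .build();
            new Runner(opt).run();
        }
    }
}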

Code:

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;
import org.openjdk.jmh.runner.options.VerboseMode;

import java.util.Random;
import java.util.concurrent.TimeUnit;

public class Int32vs64 {

    @State(Scope.Thread)
    public static class ExecutionPlan {

        int len = 1_000_000;

        // Both arrays are filled with the same random values, so the int and
        // long benchmarks operate on identical data.
        int[] int32arr = new int[len];
        long[] int64arr = new long[len];

        Random rnd = new Random();

        @Setup(Level.Trial)
        public void setUp() {
            for (int i = 0; i < len; i++) {
                int val = rnd.nextInt();
                // do not generate 0 to ensure no division by zero
                if (val == 0) val = 1;
                int32arr[i] = val;
                int64arr[i] = val;
            }
        }
    }

    //<editor-fold desc="Division">
    @Benchmark
    @OutputTimeUnit(TimeUnit.MICROSECONDS)
    @BenchmarkMode(Mode.AverageTime)
    public int divideInt32(ExecutionPlan plan) {
        int result = Integer.MAX_VALUE;
        for (int i = 0; i < plan.len; i++){
            result /= plan.int32arr[i];
        }
        return result;
    }

    @Benchmark
    @OutputTimeUnit(TimeUnit.MICROSECONDS)
    @BenchmarkMode(Mode.AverageTime)
    public long divideInt64(ExecutionPlan plan) {
        long result = Integer.MAX_VALUE;
        for (int i = 0; i < plan.len; i++){
            result /= plan.int64arr[i];
        }
        return result;
    }
    //</editor-fold>

    //<editor-fold desc="Multiply">
    @Benchmark
    @OutputTimeUnit(TimeUnit.MICROSECONDS)
    @BenchmarkMode(Mode.AverageTime)
    public int multiplyInt32(ExecutionPlan plan) {
        int result = 1;
        for (int i = 0; i < plan.len; i++){
            result *= plan.int32arr[i];
        }
        return result;
    }

    @Benchmark
    @OutputTimeUnit(TimeUnit.MICROSECONDS)
    @BenchmarkMode(Mode.AverageTime)
    public long multiplyInt64(ExecutionPlan plan) {
        long result = 1;
        for (int i = 0; i < plan.len; i++){
            result *= plan.int64arr[i];
        }
        return result;
    }
    //</editor-fold>

    //<editor-fold desc="Subtract">
    @Benchmark
    @OutputTimeUnit(TimeUnit.MICROSECONDS)
    @BenchmarkMode(Mode.AverageTime)
    public int minusInt32(ExecutionPlan plan) {
        int result = 0;
        for (int i = 0; i < plan.len; i++){
            result -= plan.int32arr[i];
        }
        return result;
    }

    @Benchmark
    @OutputTimeUnit(TimeUnit.MICROSECONDS)
    @BenchmarkMode(Mode.AverageTime)
    public long minusInt64(ExecutionPlan plan) {
        long result = 0;
        for (int i = 0; i < plan.len; i++){
            result -= plan.int64arr[i];
        }
        return result;
    }
    //</editor-fold>


    //<editor-fold desc="Add">
    @Benchmark
    @OutputTimeUnit(TimeUnit.MICROSECONDS)
    @BenchmarkMode(Mode.AverageTime)
    public int sumInt32(ExecutionPlan plan) {
        int result = 0;
        for (int i = 0; i < plan.len; i++){
            result += plan.int32arr[i];
        }
        return result;
    }

    @Benchmark
    @OutputTimeUnit(TimeUnit.MICROSECONDS)
    @BenchmarkMode(Mode.AverageTime)
    public long sumInt64(ExecutionPlan plan) {
        long result = 0;
        for (int i = 0; i < plan.len; i++){
            result += plan.int64arr[i];
        }
        return result;
    }
    //</editor-fold>

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(Int32vs64.class.getSimpleName())
                .warmupIterations(1)
                .measurementIterations(10)
                .threads(2)
                .forks(1)
                .verbosity(VerboseMode.EXTRA)
                .build();

        new Runner(opt).run();
    }

}

Sources

Intel: Using the ILP64 Interface vs. LP64 Interface
UNIX: 64-Bit Programming Models: Why LP64?
IBM: LP64 application performance and program size
Wikipedia: 64-bit computing
Wikipedia: x86 instruction listings
OpenJDK Wiki: CompressedOops by John Rose

05 Apr 2022 - Hasan Al-Ammori