64-bit ARM

introduction to porting

Written by Riku Voipio

What is Aarch64?

64 Bit Instruction set introduced in ARMv8

Overview

  • 64-Bit pointer and registers
  • fixed length (32bit) instructions
  • load/store architecture
  • little endian (big endian possible)
  • 31 general purpose registers and zero register
  • unaligned access ok (some exceptions)

Traditional ARM features gone

  • conditional execution of most instructions
  • "free shifts" in arithmetic instructions
  • open access to PC register
  • co-processor concept
  • load/store multiple instructions
  • floating point support is now mandatory

Traditional ARM features still here

  • VFP - mostly the same
  • AdvSIMD is based on NEON but with major changes
  • weakly ordered memory
  • basic arithmetic instructions usually same

New features

  • load-acquire and store-release atomics
  • crypto (AES and SHA) instructions
  • larger PC-relative addressing and branching
  • AdvSIMD usable for general purpose float math
  • nontemporal (cache skipping) load/store

Registers

64 Bit integer registers:
X0    X1    X2    X3    X4    X5    X6    X7
X8    X9    X10   X11   X12   X13   X14   X15
X16   X17   X18   X19   X20   X21   X22   X23
X24   X25   X26   X27   X28   X29   X30/LR SP/ZERO

the only register with special semantics is 31, which acts as either the stack pointer or the zero register, depending on the instruction

the bottom 32 bits of the registers are referred to as W0 .. W30

Scalar/SIMD Registers

SIMD and Scalar share register bank

  • 32 bit float registers: S0 ... S31
  • 64 bit double registers: D0 ... D31
  • 128 bit SIMD registers: V0 ... V31

S0 is bottom 32 bits of D0 which is the bottom 64 bits of V0

Threads

Some programmers when confronted with a problem say: I know, I'll use threads to solve the problem. Programmer has now two problems.
- Internet folklore

On ARM MP systems, the thread-using programmer also has to deal with the weak memory model.

Weakly ordered memory model

Unlike on x86, but like AArch32 and PowerPC, the order of writes to memory isn't guaranteed. Deal with it:

  • use mutexes!
  • barrier instructions DMB, DSB, ISB
  • ARMv8: Load-Acquire/Store-Release instructions: LDAR, STLR
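
In portable C, the acquire/release pairing above can be expressed with C11 atomics instead of hand-written barriers. The following is a minimal sketch, not a complete synchronization scheme: the names `payload`, `ready`, `publish` and `try_consume` are made up for illustration, and the claim that compilers typically lower these operations to store-release and load-acquire instructions on AArch64 is an expectation, not a guarantee.

```c
#include <stdatomic.h>

/* Hypothetical shared state: plain data published through an atomic flag. */
int payload;
atomic_int ready;   /* 0 = nothing published, 1 = payload is valid */

/* Writer side: store the data first, then set the flag with release
 * semantics so the data write cannot become visible after the flag. */
void publish(int value)
{
    payload = value;
    atomic_store_explicit(&ready, 1, memory_order_release);
}

/* Reader side: load the flag with acquire semantics; if it is set,
 * the earlier payload write is guaranteed to be visible.
 * Returns 1 and fills *out on success, 0 if nothing is published yet. */
int try_consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return 0;
    *out = payload;
    return 1;
}
```

With plain (non-atomic) accesses to `ready`, the same code may appear to work on x86 and fail occasionally on ARM, which is why mutexes remain the simplest option.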

GNU/Linux porting issues

Good News

Most typical C/C++ OSS software compiles just fine - except:

  • when code assumes endianness or struct sizes
  • or calls kernel system call directly
  • or has assembler code or a JIT
  • or uses autoconf ^_^
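
The endianness case from the list above usually comes from casting a byte buffer to an integer pointer. A minimal sketch of the portable alternative, assuming the data is defined as little endian on the wire (the function name `read_le32` is only an example):

```c
#include <stdint.h>

/* Assemble a 32-bit little-endian value byte by byte.
 * The result is the same on little- and big-endian hosts,
 * and the buffer does not need to be aligned. */
uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

Compilers can usually turn this into a single load where that is safe, so there is little reason to keep the pointer cast.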

Most common porting problem

checking build system type... x86_64-pc-linux-gnu
checking host system type... Invalid configuration `aarch64-oe-linux': machine `aarch64-oe' not recognized
configure: error: /bin/sh config.sub aarch64-oe-linux failed

Please run autoreconf against autotools-dev 20120210.1 or later, and make a release of your software.

Available defines

aarch64-oe-linux-cpp -dM -E - < /dev/null|sort
...
#define __aarch64__ 1
#define __BYTE_ORDER__ __ORDER_LITTLE_ENDIAN__
#define __CHAR_UNSIGNED__ 1
#define __SIZEOF_POINTER__ 8
...

but this is gcc specific!

alternative: use autoconf

AC_INIT(fooapp, 0.1)
AC_CHECK_SIZEOF(void *)
AC_C_BIGENDIAN()

autoconf does have its place

test features, not platform

Works but not portable

#if defined (__alpha__) || defined(__aarch64__)
// assume 64-bit pointers
#elif ...

Instead

#if __SIZEOF_POINTER__ == 8 
// assume 64-bit pointers
#elif ...
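
Since `__SIZEOF_POINTER__` is a gcc-style predefine, a compiler-neutral variant of the same feature test can be built on `<stdint.h>`. This is a sketch that assumes the toolchain provides `uintptr_t` (optional in the C standard, but present on common platforms); `POINTER_BITS` and `pointer_bits` are illustrative names.

```c
#include <stdint.h>

/* Derive the pointer width from the range of uintptr_t
 * instead of from an architecture or compiler macro. */
#if UINTPTR_MAX == UINT64_MAX
#define POINTER_BITS 64
#elif UINTPTR_MAX == UINT32_MAX
#define POINTER_BITS 32
#else
#error "unexpected pointer width"
#endif

/* Small accessor so the value can be checked at run time too. */
int pointer_bits(void)
{
    return POINTER_BITS;
}
```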

Aarch64 call convention

Arguments and return values in registers

  • X0 - X7 arguments and return value
  • X8 indirect result (struct) location
  • X9 - X15 temporary registers
  • X16 - X17 intra-call-use registers (PLT, linker)
  • X18 platform specific use (TLS)
  • X19 - X28 callee-saved registers
  • X29 frame pointer
  • X30 link register
  • SP stack pointer (XZR)
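
To make the register list concrete, here is a small sketch of ordinary C functions, with comments noting where a compiler following this convention would be expected to place the values. The function and struct names are invented for the example, and the register assignments in the comments are read off the list above rather than verified against compiler output.

```c
#include <stdint.h>

/* 24 bytes: too large to come back in X0/X1, so it is returned
 * through memory whose address the caller passes in X8. */
struct big {
    int64_t a, b, c;
};

/* a arrives in X0, b in X1, c in X2; the sum is returned in X0. */
int64_t add3(int64_t a, int64_t b, int64_t c)
{
    return a + b + c;
}

/* seed arrives in X0; the result is written to the buffer at X8. */
struct big make_big(int64_t seed)
{
    struct big r = { seed, seed + 1, seed + 2 };
    return r;
}
```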

Aarch64 call convention floats

VFP/SIMD mandatory - no soft float ABI

  • V0 - V7 arguments and return value
  • D8 - D15 callee saved registers
  • V16 - V31 temporary registers

Only the bottom 64 bits (D8 - D15) are callee-saved; bits 64-127 of V8 - V15 are not preserved

System calls

Generic syscall numbers come from the kernel header "asm-generic/unistd.h", which is also used by other new architectures (tile, hexagon, openrisc, unicore32, c6x and score). Since the architectures are new, some legacy support has been removed

look down for details

deprecated system calls are not available:
alarm      -> setitimer   bdflush   -> gone!
epoll_wait -> epoll_pwait fork      -> clone
futimesat  -> utimensat   getdents  -> getdents64
getpgrp    -> getpgid     oldumount -> umount2
pause      -> ?           poll      -> ppoll
recv       -> recvfrom    select    -> pselect6
send       -> sendto      sysctl    -> use /proc/sys
time       -> ?           uselib    -> gone!
ustat      -> statfs      utime     -> utimensat

pre-at system calls are not available:
open    -> openat      link     -> linkat
unlink  -> unlinkat    mknod    -> mknodat
chmod   -> fchmodat    chown    -> fchownat
mkdir   -> mkdirat     rmdir    -> unlinkat (AT_REMOVEDIR)
lchown  -> fchownat    access   -> faccessat
rename  -> renameat    readlink -> readlinkat
symlink -> symlinkat   utimes   -> utimensat

system calls without flags parameter:
pipe         -> pipe2
dup2         -> dup3
epoll_create -> epoll_create1
inotify_init -> inotify_init1
eventfd      -> eventfd2
signalfd     -> signalfd4
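
For code that really has to issue raw system calls, the replacements in the tables above are drop-in for the common cases. A minimal sketch, assuming Linux with glibc's `syscall()` wrapper; `raw_open` and `raw_pipe` are hypothetical helper names, not existing APIs.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <fcntl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* open(path, flags, mode) expressed through openat: AT_FDCWD makes
 * relative paths resolve against the current working directory. */
int raw_open(const char *path, int flags, mode_t mode)
{
    return (int)syscall(SYS_openat, AT_FDCWD, path, flags, mode);
}

/* pipe(fds) expressed through pipe2 with no flags set. */
int raw_pipe(int fds[2])
{
    return (int)syscall(SYS_pipe2, fds, 0);
}
```

Both calls also exist on older architectures, so the same code keeps working on x86 and 32-bit ARM. Where possible, calling `open()` and `pipe()` from libc avoids the issue entirely.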

Syscall numbers

RHEL5 ships kernel 2.6.18 (2007) and Debian Lenny 2.6.26 (2009), so any syscall at least that old can safely be used with: #include <sys/syscall.h>. The last syscalls defined in 2.6.18 are:

#define __NR_vmsplice 316
#define __NR_move_pages 317

Also consider whether your software can now move from direct system calls to the glibc wrappers.

More information
