V8 System Emulation -- required work and features (draft)

Milestones:
 * aarch64 kernel boots enough to start printing kernel log info
 * aarch64 kernel boots to trying to start userspace
 * aarch64 kernel and userspace work enough to get to shell prompt
 * aarch64 kernel can boot SMP
 * aarch32 userspace processes can run
 * minor missing features added

Properly define what PSTATE should look like in QEMU's CPU state

Identify what CPU we're emulating
 -- A57, or a "generic v8 CPU" like the foundation model?

Reset
 -- reset vector is impdef, readable by register
 -- v8-only things need resetting

GDBstub
 -- fix bugs, make sure it works in aarch64 system mode
    (this will be highly useful for debugging everything else)

TLB walk code
 -- need to handle 64 bit VA/PA, v8 changes to the system registers
    which control page table format, etc

Exception entry
 -- new vector table format, vector base addresses, etc

ELR/SPSR changes

Exception return
 -- return differs a bit

Events
 -- SEV, WFE, etc
 -- Linux makes heavy use of these
 -- default "nop" implementation may work, or may not...

System registers

Many system registers for AArch64 are the same as for v7, or only slightly different.
The general approach here will be to retain and augment the existing CPU state structure and hashtable entries for system registers:
 * registers that are the same can just be flagged as "accessible in AArch64"; the MSR/MRS instructions will only look for flagged registers
 * registers which don't exist in AArch64 won't be flagged and so are effectively non-existent to the guest
 * registers with minor changes can have their code adapted
 * registers which are extended to 64 bits can have the extra state stored in a hi/lo pair of fields or by extending a uint32_t field to uint64_t
 * for VM state save/restore of a 64 bit CPU we only save/restore the state marked as AArch64-accessible
 * genuinely new-for-AArch64 registers can have their definitions added only for v8 CPUs (in the same way we handle new-in-v7 and new-in-v6 regs). Their actual implementation code will be new work, obviously.

The following is a complete categorised list of AArch64 system registers (excluding perf monitor, debug, implementation specific and EL2/EL3 only registers):

Registers which are unchanged from v7 and just need to be accessible via the AArch64 MRS/MSR instructions:
  MIDR_EL1        unchanged
  CTR_EL0         unchanged
  REVIDR_EL1      unchanged (copy of MIDR, effectively)
  ID_PFR*, ID_DFR*, ID_AFR*, ID_MMFR*, ID_ISAR*, MVFR*
                  unchanged (constant feature-bit regs)
  CCSIDR_EL1      unchanged
  CLIDR_EL1       unchanged
  AIDR_EL1        unchanged
  CSSELR_EL1      unchanged
  ACTLR_EL1       unchanged
  ISR_EL1         unchanged
  AFSR0_EL1, AFSR1_EL1
                  are IMPDEF and can be considered unchanged
  CONTEXTIDR_EL1  unchanged
  TEECR32_EL1, TEEHBR32_EL1
                  essentially unchanged
  CNTFRQ_EL0, CNTPCT_EL0, CNTVCT_EL0, CNTKCTL_EL1,
  CNTP_TVAL_EL0, CNTP_CTL_EL0, CNTP_CVAL_EL0,
  CNTV_TVAL_EL0, CNTV_CTL_EL0, CNTV_CVAL_EL0
                  generic timer registers, unchanged

Registers which have minor differences (like extension from 32 to 64 bits) but are broadly unchanged and will need only minor tweaks:
  MPIDR_EL1       now 64 bits
  TTBR0_EL1, TTBR1_EL1
                  now 64 bits with 16 bit ASID, otherwise unchanged
  PAR_EL1         format changes but like existing version
  MAIR_EL1        concatenation of 2 existing registers
  AMAIR_EL1       like existing registers
  VBAR_EL1        extended to 64 bits, otherwise broadly unchanged
  TPIDR_EL0, TPIDRRO_EL0, TPIDR_EL1
                  extended to 64 bits, otherwise unchanged

Registers which are new or different:
  DCZID_EL0       new reg, reads as constant value
  ID_AA64*        new AArch64 feature bit regs
  SCTLR_EL1       some new control bits; these are all enables for "should we trap this operation?" and are not too hard to implement
     .UCI      -- enable EL0 use of some cache operations
     .nTWE     -- should EL0 WFE suspend or trap to EL1?
     .nTWI     -- ditto, for WFI
     .UCT      -- enable EL0 use of CTR_EL0
     .DZE      -- enable EL0 use of DC ZVA (data cache zero operation)
     .UMA      -- enable EL0 use of interrupt masks
     .SED      -- disable SETEND
     .ITD      -- disable deprecated IT blocks
     .CP15BEN  -- enable cp15 barrier insns
     .SA0      -- enable EL0 stack alignment checks
     .SA       -- enable stack alignment checks
  CPACR_EL1       new definition ('trap on FP/Neon' bits have moved)
  TCR_EL1         different for 64 bit: will be implemented as part of page walk
  ESR_EL1         exception syndrome -- new for 64 bits
                  this will take some effort, as we now need to record syndrome information (ie "why did we take this exception") everywhere we currently just have "take UNDEF [etc] exception now"
  FAR_EL1         fault address register (similar to existing IFAR/DFAR)
  RVBAR_EL1       new for AArch64: constant, defines the reset address
  RMR_EL1         not needed for QEMU as we will insist that EL1 is AArch64

The following is a list of the v8 "system instructions" which are actually in the system-register encoding space (like their v7 predecessors) and can be coded as such:
  IC    -- icache maintenance (noop for QEMU)
  DC    -- dcache maintenance.
           Mostly noop for QEMU, except:
     DC ZVA              -- new in v8, effectively a "zero block of memory" instruction
     DC IVAC             -- noop but requires write permission check
     DC CVAU/CVAC/CIVAC  -- noop but if from EL0 need read permission check
  AT    -- address translation: simple adjustments to existing code
  TLBI  -- tlb maintenance (noop for QEMU)

FP sysregs
 -- minor changes here

System only instructions:
 -- SVC
 -- MRS/MSR (for reading/writing system registers)
 -- LDRT & friends
 -- system-emulation versions of exclusive load/store insns

=== for SMP support ===

Major issue here is how we boot the secondary CPUs. v8 kernels expect PSCI boot, which we don't yet support for TCG.

=== for 32 bit userspace process support ===

General purpose and FP register mapping between AArch32 and AArch64
Exception entry/exit changes to correctly switch mode

=== minor missing features (could potentially be dropped/postponed) ===

SError
 -- effectively a new h/w interrupt (like IRQ, FIQ) for asynchronous system errors
 -- QEMU won't ever generate this so a trivial no-op implementation of the new enable bits will suffice
EL1 can choose which SP to use
 -- fairly easy to implement, but Linux doesn't need this
Catch illegal process states
 -- setting CPSR.MODE to bad values, exception return to wrong level, etc
Software stepping
 -- handled via a PSTATE SS bit
 -- needed to run gdb inside the guest
Non-4K page tables
 -- architecture allows 16K and 64K pages; Linux will cope with 4K only
AArch32 insns which should UNDEF now
 -- SWP, SWPB should UNDEF
 -- VFP vectors should UNDEF
 -- CP15 barriers may optionally UNDEF
 -- deprecated IT blocks may UNDEF
CPU save/restore support

=== not included in this card because covered elsewhere ===

AArch32 v8 only instructions [see other card for list]
AArch64 A64 user level instructions [see other card]

=== not included ===

Not included (because not implemented for existing v7 emulation):
 -- big-endian
 -- perf counters
 -- debug support
 -- Hyp mode (EL2)
 -- Secure mode (EL3)
 -- trapping
    on misaligned loads/stores/etc (including the new-to-v8 PC and SP misalignment traps)
 -- modelling of CPU cache

Not included (because not necessary for our purposes):
 -- supporting v8 CPUs with 32 bit EL1 (configurably or otherwise)
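
Illustrative sketch (not QEMU code): the "System registers" approach described earlier, where existing register entries are flagged as accessible in AArch64 and MRS/MSR only look for flagged entries, could look roughly like the toy model below. It is written in Python purely for brevity. Every name in it (AA64, SysReg, RegTable, V7_ONLY_REG) is hypothetical and chosen for this example; the real register table and its fields may differ considerably.

```python
from dataclasses import dataclass

# Hypothetical flag value for this sketch; not a real QEMU constant.
AA64 = 1 << 0  # register is accessible via the AArch64 MRS/MSR instructions


@dataclass
class SysReg:
    name: str
    flags: int = 0   # 0 means the register has no AArch64 view
    width: int = 32  # width of the stored state in bits
    value: int = 0


class RegTable:
    """Toy stand-in for the hashtable of system register entries."""

    def __init__(self, regs):
        self.regs = {r.name: r for r in regs}

    def _lookup64(self, name):
        reg = self.regs.get(name)
        # Unflagged registers are effectively non-existent to an AArch64 guest.
        if reg is None or not (reg.flags & AA64):
            raise LookupError(f"{name}: not accessible from AArch64")
        return reg

    def mrs(self, name):
        """Read a system register from AArch64."""
        return self._lookup64(name).value

    def msr(self, name, value):
        """Write a system register from AArch64, truncating to its width."""
        reg = self._lookup64(name)
        reg.value = value & ((1 << reg.width) - 1)

    def save_state64(self):
        """State to save for a 64 bit CPU: AArch64-accessible registers only."""
        return {n: r.value for n, r in self.regs.items() if r.flags & AA64}


table = RegTable([
    SysReg("CONTEXTIDR_EL1", flags=AA64),      # unchanged from v7, just flagged
    SysReg("VBAR_EL1", flags=AA64, width=64),  # field extended to 64 bits
    SysReg("V7_ONLY_REG"),                     # placeholder: never flagged
])
```

In this model, a write to VBAR_EL1 keeps all 64 bits, a write to CONTEXTIDR_EL1 is truncated to 32 bits, and any access to V7_ONLY_REG raises LookupError, which stands in for the guest seeing the register as non-existent. The same flag also selects what save_state64() returns, mirroring the save/restore bullet in the approach list.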