## Transparent Superpages Support for FreeBSD on ARM

Zbigniew Bodek zbb@semihalf.com zbb@freebsd.org

17.05.2014 Ottawa



## Presentation outline

- Virtual Memory
  - principles of operation
  - drawbacks
- Introduction to Superpages
  - basic concepts
  - implementation for ARM
- Validation and benchmarking
- Future work






## Virtual Memory





















- Accessing memory on ARM
- Limitations
  - Small TLBs (due to speed restrictions)
  - 4 KB page size
    - to keep the mapping granularity fine and hence fragmentation low

# SMALL TLB COVERAGE



- Accessing memory on ARM
- How to overcome?
  - Enlarge TLB?
  - Use bigger pages?
    - Allow user to decide which page size to use?



# Superpages technique overcomes this issue

Reducing TLB misses





#### Reservation-based allocation






sys/arm/include/vmparam.h

- VM\_NRESERVLEVEL specifies the number of promotion levels enabled for the architecture. Effectively, this indicates how many superpage sizes are used.



- VM\_NRESERVLEVEL is set to 1 (one superpage size will be used)
- VM\_LEVEL\_0\_ORDER is set to 8 (a superpage will consist of 256 (1 << 8) base pages)
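The arithmetic behind these two settings can be sketched in C. This is only an illustration: `VM_LEVEL_0_ORDER` mirrors the slide, while `BASE_PAGE_SIZE` and both helper functions are hypothetical names, not definitions from vmparam.h.

```c
#include <assert.h>
#include <stddef.h>

/* Values from the slides; BASE_PAGE_SIZE is an illustrative name. */
#define BASE_PAGE_SIZE    4096u   /* 4 KB base page */
#define VM_LEVEL_0_ORDER  8u      /* log2 of base pages per superpage */

/* Number of base pages in one reservation of the given order. */
static unsigned int
superpage_npages(unsigned int order)
{
	assert(order < 20);   /* keeps the byte size below 2^32 */
	return (1u << order);
}

/* Size in bytes of one superpage of the given order. */
static size_t
superpage_size(unsigned int order)
{
	return ((size_t)superpage_npages(order) * BASE_PAGE_SIZE);
}
```

With order 8 this gives 256 base pages, i.e. one 1 MB section.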



## Introduced support for machine-dependent portion of Superpages mechanism

- promotion: pmap\_promote\_section()
- demotion: pmap\_demote\_section()
- creation: pmap\_enter\_section()
- removal: pmap\_remove\_section()
- shared mappings management: pmap\_pv\_promote/demote\_section()
- other modifications of the pmap(9) module



[Figure: mapping between the Virtual Address Space and the Physical Address Space]





- Summarize general functionalities
  - Superpage creation
    1. Check for contiguity and attribute consistency
    2. Allocate and set up a single PV entry for the superpage
    3. Create a 1 MB section mapping (don't deallocate the L2 table)
    4. Cache and TLB maintenance (invalidate old data)
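Step 1 above can be illustrated with a simplified model. The sketch below is not the pmap code: it assumes a hypothetical entry layout in which the upper 20 bits hold the physical frame and the lower 12 bits hold type and attribute bits, and every name in it is made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NPTE_PER_SECTION 256u          /* 1 MB / 4 KB */
#define BASE_PAGE_SIZE   4096u
#define PTE_FRAME_MASK   0xfffff000u   /* physical frame bits (simplified) */
#define PTE_ATTR_MASK    0x00000fffu   /* type and attribute bits (simplified) */
#define SECTION_OFFSET   0x000fffffu   /* offset bits within a 1 MB section */

/*
 * Return true when the 256 entries describe one physically contiguous,
 * 1 MB-aligned region with identical attribute bits.
 */
static bool
section_promotable(const uint32_t pte[NPTE_PER_SECTION])
{
	assert(pte != NULL);

	uint32_t first_pa = pte[0] & PTE_FRAME_MASK;
	uint32_t attrs = pte[0] & PTE_ATTR_MASK;

	/* In this model an all-zero attribute field means "not mapped". */
	if (attrs == 0)
		return (false);
	/* The first frame must start on a section boundary. */
	if ((first_pa & SECTION_OFFSET) != 0)
		return (false);
	for (uint32_t i = 1; i < NPTE_PER_SECTION; i++) {
		if ((pte[i] & PTE_FRAME_MASK) != first_pa + i * BASE_PAGE_SIZE)
			return (false);   /* physical contiguity broken */
		if ((pte[i] & PTE_ATTR_MASK) != attrs)
			return (false);   /* attributes differ */
	}
	return (true);
}
```

A single mismatching entry is enough to reject promotion, which is why the check comes before any PV entry is allocated.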



- Summarize general functionalities
  - Superpage creation
    - Promotion or direct mapping
    - Preferred read-only mappings (minimize disk traffic)
    - Contiguity (PA/VA) and attributes check required
    - Corresponding L2 table (and l2\_bucket) preserved
    - Single PV entry for entire superpage area



## Summarize general functionalities

Superpage creation



[Figure: VA to PA translation, range 0 to 4095]

Change L1 descriptor to a section mapping
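A rough sketch of what a section mapping means for address translation, assuming the ARMv7 short-descriptor format (L1 index from VA bits [31:20], L2 index from bits [19:12], type bits 0b10 for a section). The helper names are hypothetical, and the model ignores details such as the PXN and supersection bits.

```c
#include <assert.h>
#include <stdint.h>

#define L1_TYPE_MASK   0x3u
#define L1_TYPE_SECT   0x2u          /* entry maps a 1 MB section directly */
#define SECT_BASE_MASK 0xfff00000u   /* section base address bits */
#define SECT_OFFSET    0x000fffffu   /* offset within the section */

/* Index into the L1 table: one entry per megabyte of virtual space. */
static uint32_t
l1_index(uint32_t va)
{
	return (va >> 20);
}

/* Index into an L2 table: one entry per 4 KB page within the megabyte. */
static uint32_t
l2_index(uint32_t va)
{
	return ((va >> 12) & 0xffu);
}

/* Build a section descriptor; attrs is a hypothetical attribute field. */
static uint32_t
make_section_desc(uint32_t pa, uint32_t attrs)
{
	assert((pa & SECT_OFFSET) == 0);                        /* 1 MB aligned */
	assert((attrs & (SECT_BASE_MASK | L1_TYPE_MASK)) == 0); /* no overlap */
	return (pa | attrs | L1_TYPE_SECT);
}

/* Translate through a section: no L2 lookup is needed. */
static uint32_t
section_translate(uint32_t desc, uint32_t va)
{
	assert((desc & L1_TYPE_MASK) == L1_TYPE_SECT);
	return ((desc & SECT_BASE_MASK) | (va & SECT_OFFSET));
}
```

Once the L1 entry is a section descriptor, the L2 index is no longer consulted, which is where the shorter table walk comes from.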







- Summarize general functionalities
  - Superpage removal

#### Demote superpage when:

- Changing attributes of a base page within the superpage
- Paging out a base page
- Write attempt to a read-only (RO) superpage

#### Remove superpage when:

- The address map region to remove is at least superpage size
- Quick recreation of the L2 table is not possible



- Summarize general functionalities
  - Superpage removal
    - During demotion:
      - Recall old L2 table
        - recreate if there is none
        - fix-up if it is obsolete
      - Fix-up L1 table accordingly
      - Recreate PV entries based on the superpage PV entry
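The L2 recreation step during demotion can be sketched as follows. This is a simplified model rather than the pmap implementation: `demote_fill_l2` and `page_attrs` are hypothetical, and the translation of section attributes into small-page attributes is assumed to have happened already.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NPTE_PER_SECTION 256u
#define BASE_PAGE_SIZE   4096u

/*
 * Fill a recalled or newly allocated L2 table so that its 256 entries
 * cover the same physical megabyte the section mapping covered.
 */
static void
demote_fill_l2(uint32_t l2[NPTE_PER_SECTION], uint32_t section_pa,
    uint32_t page_attrs)
{
	assert(l2 != NULL);
	assert((section_pa & 0x000fffffu) == 0);   /* 1 MB aligned */
	assert((page_attrs & 0xfffff000u) == 0);   /* attributes in low bits */

	for (uint32_t i = 0; i < NPTE_PER_SECTION; i++)
		l2[i] = (section_pa + i * BASE_PAGE_SIZE) | page_attrs;
}
```

After the table is filled, the L1 entry would be switched back to point at it and the per-page PV entries recreated.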





## Introduced support for machine-dependent portion of Superpages mechanism

- Support for two page sizes
  - 4 KB small page (base page)
  - 1 MB section (superpage)
- One superpage instead of 256 base pages
  - Fewer TLB misses
  - Shorter translation table walk
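To put a number on the gain, TLB coverage is simply the entry count times the page size. The sketch below uses a hypothetical helper, and the entry count in the usage note is an assumption for illustration, not a figure from the slides.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes of address space reachable without a TLB miss. */
static uint64_t
tlb_coverage(uint32_t tlb_entries, uint64_t page_size)
{
	/* Page sizes are powers of two. */
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	return ((uint64_t)tlb_entries * page_size);
}
```

For an assumed 64-entry TLB, `tlb_coverage(64, 4096)` is 256 KB, while `tlb_coverage(64, 1048576)` is 64 MB, a factor of 256 regardless of the entry count.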



- Test tools
  - GUPS (Giga Updates Per Second)
  - LMbench (STREAM)
  - Self-hosted world build
  - forkbomb
  - Hardware performance counters
- Test platform
  - Armada XP (quad core ARMv7)







- HW performance counters
  - Per-CPU TLB miss counter
  - Per-CPU cycles counter
  - Goals:
    - Measure/estimate TLB miss penalty
    - Check TLB miss reduction due to superpages



#### Test plan

1. Allocate a memory region of 2 x (TLB size) x (superpage size)
2. Configure PMU hardware (on ARMv7 this CP15 write sets PMUSERENR, which allows user-mode access to the counters)

   asm volatile("mcr p15, 0, %0, c9, c14, 0"::"r"(1));

3. Prefault: touch all 4 KB pages in the range, Addr: [0 : end]
4. Enable PMU counters
5. Touch 64 pages at 1 MB intervals, Addr: [0 : (TLB size) x (superpage size)]
6. Disable PMU counters
7. Get the CPU cycle count and the TLB miss count
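The two touch passes of the plan could look roughly like the sketch below. It covers only the memory access pattern. `touch_stride` is a hypothetical helper, and PMU programming is left out because it is platform specific.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Write one byte every `stride` bytes in buf[0 .. len) and return the
 * number of locations touched. The volatile qualifier keeps the
 * compiler from optimizing the accesses away.
 */
static size_t
touch_stride(volatile uint8_t *buf, size_t len, size_t stride)
{
	size_t touched = 0;

	assert(buf != NULL && stride != 0);
	for (size_t off = 0; off < len; off += stride) {
		buf[off] = (uint8_t)(buf[off] + 1);
		touched++;
	}
	return (touched);
}
```

The prefault pass would call it with a 4 KB stride over the whole region. The measured pass would use a 1 MB stride over the first half, between enabling and disabling the counters.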





- X: CPU cycles recorded during the test
- Y: CPU cycles for all loop iterations without TLB miss
- T: number of all TLB misses

Cycles per TLB miss: CPM = (X - Y) / T
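The same estimate written as a small C helper. The function and parameter names are hypothetical, and the numbers in the usage note are made up to show the arithmetic; they are not measurements.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Estimate the TLB miss penalty in cycles:
 * CPM = (X - Y) / T, with X the cycles recorded during the test,
 * Y the cycles the loop takes without TLB misses, T the miss count.
 */
static double
cycles_per_miss(uint64_t total_cycles, uint64_t baseline_cycles,
    uint64_t tlb_misses)
{
	assert(tlb_misses != 0);
	assert(total_cycles >= baseline_cycles);
	return ((double)(total_cycles - baseline_cycles) / (double)tlb_misses);
}
```

For example, X = 2570, Y = 1000 and T = 10 would give a penalty of 157 cycles per miss.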



#### Test results



| Cycles per TLB miss | Number of TLB misses |
|---------------------|----------------------|
| 157                 | 32882                |
| 60                  | 193                  |



# What's next?

- Support for 64 KB pages
- Further performance improvement
- More applications can use superpages
- Enable superpages by default (sp\_enabled = 1)
- Move all status flags from PV to PTE
- Less overhead on promotion failure
- Faster page management



## References

- Project's wiki page: http://wiki.freebsd.org/ARMSuperpage
- Paper: http://semihalf.com/download.html



## Acknowledgments

- Special thanks go to:
  - Grzegorz Bernacki
  - Alan Cox
- Project mentors and sponsors:
  - Rafał Jaworowski & Bartłomiej Sięka (www.semihalf.com)
  - The FreeBSD Foundation (www.freebsdfoundation.org)




# Any questions?

