rpi4's bare-metal hashing performance is poor without caching #155

nihalpasham started this conversation in General

Disclaimer: I'm assuming this topic can be discussed here. If not, please let me know and I will remove it.

Question: I ran into an odd issue. I'm working on a secure bootloader that's written entirely in Rust. Most of the boot code for the rpi4 is from this repo. I managed to get all the pieces working, but I've hit a strange performance issue. The gist of it is:

  • when I compute the hash of a large file (like 30MB) on a Raspberry Pi 4, it takes way too long (i.e. I'm expecting a 30MB file to be hashed in 3 seconds, but it takes about 36-40 seconds).
  • my bare-metal bootloader's results are way off when compared with OpenSSL and the sha2 crate running on a standard Linux OS + Raspberry Pi 4, i.e. the hashing speed for OpenSSL is 121 MiB/s and for sha2 it is 82 MiB/s, which translates to well under 3 seconds for a 30MB file.
  • My suspicion is that it's some kind of hardware misconfiguration issue, but I can't seem to figure it out.
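As a rough sanity check on the figures above, the expected hashing time follows directly from throughput. This is only a sketch: it assumes "30MB" means 30 × 10^6 bytes, takes the quoted OpenSSL and sha2 rates at face value, and `expected_seconds` is a hypothetical helper, not something from rustBoot.

```rust
/// Bytes in one MiB.
const MIB: f64 = 1024.0 * 1024.0;

/// Seconds needed to hash `bytes` at a sustained throughput given in MiB/s.
fn expected_seconds(bytes: f64, mib_per_sec: f64) -> f64 {
    bytes / (mib_per_sec * MIB)
}

fn main() {
    let file_bytes = 30.0e6; // assumption: 30 MB = 30 * 10^6 bytes

    // Rates quoted in the post for Linux on the same board.
    for (name, rate) in [("openssl", 121.0), ("sha2", 82.0)] {
        println!("{}: {:.2} s", name, expected_seconds(file_bytes, rate));
    }

    // Throughput implied by the bare-metal run (30 MB in roughly 36 s).
    println!("bare metal: {:.2} MiB/s", file_bytes / MIB / 36.0);
}
```

Both Linux figures come out well under a second, so the bare-metal run is off by roughly two orders of magnitude rather than by a small constant factor.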

I'm hoping folks here who have more experience with a rpi can offer some insight into what's probably missing/wrong.

A link to the implementation. The boot code is present in /boards/bootloaders/rpi4/src/boot.rs

Serial output from an rpi4: as you can see from the logs below, computing the hashes of the kernel and ramdisk takes an additional 80 secs (give or take).

    boards\bootloaders\rpi4 on main is 📦 v0.1.0 via 🦀 v1.61.0-nightly
    ❯ terminal-s.exe
    --- COM3 is connected. Press Ctrl+] to quit ---
    [    2.170921] EMMC2 driver initialized...
    ............
    [   42.699906] loaded fit: 62202019 bytes, starting at addr: 0x200000
    [   42.703127] authenticating fit-image...
    [   42.712671] [INFO]  computing "kernel" hash
    [   42.714672]          - rustBoot::dt::fit @ line:289
    [   78.644641] [INFO]  computed "kernel" hash: 97dcbff24ad0a60514e31a7a6b34a765681fea81f8dd11e4644f3ec81e1044fb
    [   78.652289]          - rustBoot::dt::fit @ line:294
    [   78.657293] [INFO]  kernel integrity consistent with supplied itb...
    [   78.664885]          - rustBoot::dt::fit @ line:306
    [   78.670539] [INFO]  computing "fdt" hash
    [   78.674268]          - rustBoot::dt::fit @ line:289
    [   78.710473] [INFO]  computed "fdt" hash: 3572783be74511b710ed7fca9b3131e97fd8073c620a94269a4e4ce79d331540
    [   78.717861]          - rustBoot::dt::fit @ line:294
    [   78.722847] [INFO]  fdt integrity consistent with supplied itb...
    [   78.730197]          - rustBoot::dt::fit @ line:306
    [   78.735997] [INFO]  computing "ramdisk" hash
    [   78.739927]          - rustBoot::dt::fit @ line:289
    [  119.074666] [INFO]  computed "ramdisk" hash: f1290587e2155e3a5c2c870fa1d6e3e2252fb0dddf74992113d2ed86bc67f37c
    [  119.082401]          - rustBoot::dt::fit @ line:294
    [  119.087369] [INFO]  ramdisk integrity consistent with supplied itb...
    [  119.095084]          - rustBoot::dt::fit @ line:306
    [  119.101018] [INFO]  computing "rbconfig" hash
    [  119.104902]          - rustBoot::dt::fit @ line:289
    [  119.110001] [INFO]  computed "rbconfig" hash: b16d058c4f09abdb8da98561f3a15d06ff271c38a4655c2be11dec23567fd519
    [  119.120365]          - rustBoot::dt::fit @ line:294
    [  119.125330] [INFO]  rbconfig integrity consistent with supplied itb...
    [  119.133135]          - rustBoot::dt::fit @ line:306
    ######## ecdsa signature checks out, image is authentic ########
    [  120.415416] relocating kernel to addr: 0x4200000
    [  121.660402] relocating initrd to addr: 0x6200000
    [  121.662056] load rbconfig...
    [  121.666328] patching dtb...
    [  121.671186] relocating dtb to addr: 0x6000000
    ***************************************** Starting kernel ********************************************
    [    0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd083]

Replies: 5 comments 17 replies


I presume the file is already entirely copied to RAM when your loader does computation on it?

Do you have virtual memory and caching enabled?

3 replies
@nihalpasham

Yes, the file (to be hashed) is loaded into RAM.

The MMU is disabled, so no virtual memory. I assume by caching, you mean d-cache. If yes, that's not enabled either. (one of the goals is to ensure that the bootloader has the smallest possible trusted computing base)

But that's an interesting point. I assumed the only variable to consider was the single-core frequency. Would enabling them improve performance?

If yes, I'd be curious to know why?

@andre-richter

Then you have your case, I'd say.

Your hashing code will inevitably use some temporary storage (on the stack) when doing its computation. Having that readily available in the cache will boost performance.

Caches are filled in quanta of the cache-line size (usually 64 bytes on AArch64 CPUs). So for every load, you get the next few bytes "for free".

Also, when you operate on a file that is laid out sequentially in memory, the CPU's prefetchers will most likely kick in and pre-load even more upcoming data in the background.

I-Cache will help for similar reasons.
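To make the "for free" point concrete, here is a minimal sketch that counts how many cache lines a sequential read touches. The 64-byte line size is the usual value mentioned above, and `cache_lines_touched` is a hypothetical helper for illustration only; it models nothing about prefetching or associativity.

```rust
/// Cache line size assumed for illustration (typical for AArch64 cores).
const LINE: u64 = 64;

/// Number of distinct cache lines covered by `len` bytes starting at `addr`.
fn cache_lines_touched(addr: u64, len: u64) -> u64 {
    if len == 0 {
        return 0;
    }
    let first = addr / LINE;
    let last = (addr + len - 1) / LINE;
    last - first + 1
}

fn main() {
    // Sixteen consecutive 32-bit words (one 64-byte SHA-256 input block),
    // starting on a line boundary.
    let loads = 16u64;
    let lines = cache_lines_touched(0x20_0000, loads * 4);
    println!("{} word loads touch {} cache line(s)", loads, lines);
}
```

With the data cache on, only the first load of each line has to wait for DRAM; with it off, every one of those loads does.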

@nihalpasham

Makes sense. I'll test this and report back. Thank you!


Just in case this wasn't already discussed: caching is predicated on the MMU being enabled on Arm v{7,8}-A. Cacheability needs to be expressed via page table descriptors. This is true irrespective of whether address translation is required or not. A typical setup for such early boot code is to set up identity maps via a suitable set of page table entries.

On Fri, 15 Apr 2022 at 18:35, nihalpasham wrote:

> Yes, the file (to be hashed) is loaded into RAM.
>
> The MMU is disabled, so no virtual memory. I assume by caching, you mean d-cache. If yes, that's not enabled either. (one of the goals is to ensure that the bootloader has the smallest possible trusted computing base)
>
> But that's an interesting point. I assumed the only variable to consider was the single-core frequency. Would enabling them improve performance?
>
> If yes, I'd be curious to know why?
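
A rough sketch of what "expressing cacheability via page table descriptors" can look like for an identity map with a 64 KiB granule. The bit positions follow the VMSAv8-64 stage 1 page descriptor layout. The function name, and the assumption that MAIR_EL1 index 1 holds the Normal write-back attribute and index 0 the Device attribute (which would match the MAIR_EL1 value 0xff04 logged later in this thread), are mine and not taken from the tutorial code.

```rust
/// Descriptor type bits for a level-3 page entry (valid + page).
const VALID_PAGE: u64 = 0b11;
/// Access flag; if clear, the first access raises an access flag fault.
const AF: u64 = 1 << 10;
/// SH[1:0] = 0b11: inner shareable.
const SH_INNER: u64 = 0b11 << 8;
/// AP[2:1] = 0b10: read-only at EL1, no EL0 access.
const AP_RO_EL1: u64 = 0b10 << 6;
/// Privileged execute-never.
const PXN: u64 = 1 << 53;
/// Unprivileged execute-never.
const UXN: u64 = 1 << 54;

/// Build an identity-mapped level-3 descriptor for one 64 KiB page.
/// `mair_index` selects one of the eight attributes programmed in MAIR_EL1;
/// this is where "cacheable or not" is decided.
fn identity_page_descriptor(phys: u64, mair_index: u64, writable: bool, executable: bool) -> u64 {
    assert_eq!(phys & 0xffff, 0, "page must be 64 KiB aligned");
    assert!(phys < (1 << 48), "output address must fit in 48 bits");
    assert!(mair_index < 8);

    // Identity map: the output address is the input address.
    let mut desc = phys | VALID_PAGE | AF | SH_INNER | (mair_index << 2) | UXN;
    if !writable {
        desc |= AP_RO_EL1;
    }
    if !executable {
        desc |= PXN;
    }
    desc
}

fn main() {
    // Assumed attribute indices: 1 = Normal cacheable, 0 = Device.
    let code = identity_page_descriptor(0x0008_0000, 1, false, true);
    let mmio = identity_page_descriptor(0xfe00_0000, 0, true, false);
    println!("code page: {:#018x}", code);
    println!("mmio page: {:#018x}", mmio);
}
```

Nothing here translates anything; the tables exist only to attach memory attributes and permissions to physical ranges.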
4 replies
@nihalpasham

Thank you, this helps. I'm not aware of identity maps. Would you know of any reading material that I could use to learn more?

@andre-richter

@nihalpasham

yeah, was just looking at this. thanks again.

@nihalpasham

So, tried this. It kind of works but I'm a bit stuck.

I added the MMU-specific code from exercise-10 to my bootloader. It seems to crash right away. So, I moved the snippet of code for enabling the MMU + caching to right after we acquire logging capabilities.

The output below indicates that attempts to modify the SCTLR_EL1 register simply crash the entire system. The odd thing here is that it doesn't panic either. The red status LED turns on and stays on (until we perform a hard reset).

PS: I've captured the register's value just before we try to modify it: 0xc50838. I cross-checked it with ARM's register docs and couldn't find anything wrong with it.

    boards\bootloaders\rpi4 on main [✘!?] is 📦 v0.1.0 via 🦀 v1.61.0-nightly
    ❯ terminal-s.exe
    --- COM3 is connected. Press Ctrl+] to quit ---
    [    1.696665] EMMC: reset card.
    [    1.696758] control1: 16143
    [    1.699378] Divisor = 63, Freq Set = 396825
    [    2.106809] CSD Contents: 00 40 0e 00 32 5b 59 00 00ed c8 7f 80 0a 40 40
    [    2.110637] cemmc_structure=1, spec_vers=0, taac=0x0E, nsac=0x00, tran_speed=0x32,ccc=0x05B5, read_bl_len=0x09, read_bl_partial=0b, write_blk_misalign=0b,read_blk_misalign=0b, dsr_imp=0b, sector_size =0x7F, erase_blk_en=1b
    [    2.130268] CSD 2.0: ver2_c_size = 0xEFFC, card capacity: 31914459136 bytes or 31.91GiB
    [    2.138174] wp_grp_size=0x0000000b, wp_grp_enable=0b, default_ecc=00b, r2w_factor=010b, write_bl_len=0x09, write_bl_partial=0b, file_format_grp=0, copy=1b, perm_write_protect=0b, tmp_write_protect=0b, file_format=0b ecc=00b
    [    2.157897] control1: 271
    [    2.160414] Divisor = 1, Freq Set = 25000000
    [    2.166935] EMMC: Bus width set to 4
    [    2.168068] EMMC: SD Card Type 2 HC, 30436Mb, mfr_id: 3, 'SD:ACLCD', r8.0, mfr_date: 1/2017, serial: 0xbbce119c, RCA: 0xaaaa
    [    2.179179] EMMC2 driver initialized...
    [    2.183002] mmu not enabled check
    [    2.186216] translation granularity supported
    [    2.190473] MAIR_EL1 set
    [    2.473125] translation tables populated
    [    2.474084] TTBR0_EL1 SET
    [    2.476603] TCR SET
    [    2.478601] first isb passed
    [    2.481381] SCTLR_EL1: c50838

Ok, I compiled exercise-10 and flashed the (kernel8) image onto my rpi4. It works as expected.

Note: I moved the MMU activation code so that we're able to log the activation flow.

    [    0.007482] MAIR_EL1: 0xff04
    [    0.075640] Special regions:
    [    0.075698]       0x00080000 - 0x0008ffff|  64 KiB| C   RO PX| Kernel code and RO data
    [    0.076652]       0x1fff0000 - 0x1fffffff|  64 KiB| Dev RW PXN| Remapped Device MMIO
    [    0.077638]       0xfe000000 - 0xff84ffff|  24 MiB| Dev RW PXN| Device MMIO
    [    0.078527] BASE ADDR: 0x120000
    [    0.078905] TTBR0_EL1: 0x120000
    [    0.079285] TCR_EL1: 0x200807520
    [    0.079675] SCTLR_EL1: 0xc50838
    [    0.080054] After enabling MMU, SCTLR_EL1: 0xc5183d
    [    0.080648] mingo version 0.10.0
    [    0.081038] Booting on: Raspberry Pi 4
    [    0.081493] MMU online. Special regions:
    [    0.081970]       0x00080000 - 0x0008ffff|  64 KiB| C   RO PX| Kernel code and RO data
    [    0.082988]       0x1fff0000 - 0x1fffffff|  64 KiB| Dev RW PXN| Remapped Device MMIO
    [    0.083974]       0xfe000000 - 0xff84ffff|  24 MiB| Dev RW PXN| Device MMIO
    [    0.084862] Current privilege level: EL1
    [    0.085339] Exception handling state:
    [    0.085783]       Debug:  Masked
    [    0.086173]       SError: Masked
    [    0.086563]       IRQ:    Masked
    [    0.086953]       FIQ:    Masked
    [    0.087343] Architectural timer resolution: 18 ns
    [    0.087917] Drivers loaded:
    [    0.088253]       1. BCM GPIO
    [    0.088610]       2. BCM PL011 UART
    [    0.089033] Timer test, spinning for 1 second
    [!!!    ] Writing through the remapped UART at 0x1FFF_1000
    [    1.089900] Echoing input now
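
The two SCTLR_EL1 values in these logs (0xc50838 before, 0xc5183d after) differ in exactly three bits. A small sketch that decodes them, using the architectural bit positions for M (bit 0), C (bit 2) and I (bit 12); `enabled_features` is a hypothetical helper for illustration, not part of the tutorial code.

```rust
const SCTLR_M: u64 = 1 << 0; // MMU enable
const SCTLR_C: u64 = 1 << 2; // data/unified cacheability
const SCTLR_I: u64 = 1 << 12; // instruction cacheability

/// Names of the MMU/cache features switched on in a raw SCTLR_EL1 value.
fn enabled_features(sctlr: u64) -> Vec<&'static str> {
    let table = [(SCTLR_M, "MMU"), (SCTLR_C, "D-cache"), (SCTLR_I, "I-cache")];
    table
        .iter()
        .filter(|entry| (sctlr & entry.0) != 0)
        .map(|entry| entry.1)
        .collect()
}

fn main() {
    let before: u64 = 0xc50838;
    let after: u64 = 0xc5183d;
    println!("before: {:?}", enabled_features(before));
    println!("after:  {:?}", enabled_features(after));
    println!("changed bits: {:#x}", before ^ after);
}
```

So the register write itself asks for nothing unusual: MMU on, both caches on, and every other bit left as it was.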

However, when I copy and paste the (same) MMU code from exercise-10 into my bootloader, it ends up crashing the entire system (...perplexing).

Note:

  • all MMU-related code has been pulled into a single folder called memory and
  • I think I've used every possible permutation and combination to set the relevant fields in the SCTLR_EL1 register (i.e. set, write, modify, modify_on_read) and even tried to write the raw value into the register, but I can't seem to get it to work.

I plan on getting a hardware debugger.

But in the meantime, any thoughts on what I'm doing wrong here?

    ❯ terminal-s.exe
    --- COM3 is connected. Press Ctrl+] to quit ---
    ............
    [    2.211136] MAIR_EL1: 0xff04
    [    2.485412] translation tables populated
    [    2.486370] Special regions:
    [    2.489151]       0x00080000 - 0x000a2fff| 140 KiB| C   RO PX| Kernel code and RO data
    [    2.497317]       0x1fff0000 - 0x1fffffff|  64 KiB| Dev RW PXN| Remapped Device MMIO
    [    2.505223]       0xfe000000 - 0xff84ffff|  24 MiB| Dev RW PXN| Device MMIO
    [    2.512347] BASE ADDR: 0x280000
    [    2.515387] TTBR0_EL1: 0x280000
    [    2.518427] TCR_EL1: 0x200807520
    [    2.521555] first isb passed
    [    2.524334] SCTLR_EL1: 0xc50838
    [    2.527375] new SCTLR_EL1: 0xc5183d
    [ ---- crashes ----- a red led turns on and stays on
8 replies
@nihalpasham

I ran objdump on my ELF binary and I think I understand the cause of the error.

So, as suggested, I examined the contents of address 0x09fa6c (i.e. the one that contains the faulting instruction) and observed the following:

  • the instruction is part of the write_str subroutine and
  • it attempts to store the contents of the x10 register to the address [x8 + 8]


The value in x8 at the time of the exception is 0xad018, which happens to be an address in the .data section (it contains the memory-mapped address for the PL011_UART peripheral). However, as we're adding an offset of 8 to x8, the faulting instruction attempts to store the contents of x10 to address 0xad020, which results in a (level-3 table) permission fault.

  • note: FAR_EL1 also contains 0xad020.
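
The "level-3 permission fault on a write" reading can be cross-checked against the ESR_EL1 value 0x9600004f from the panic output in this comment. A minimal decoding sketch, assuming the standard ESR_EL1 layout for data aborts (EC in bits [31:26], WnR in ISS bit 6, DFSC in ISS bits [5:0]); the struct and function names are made up for illustration.

```rust
/// Fields of ESR_EL1 that matter for a data abort.
struct DataAbort {
    ec: u64,     // exception class, bits [31:26]
    write: bool, // WnR: true if the faulting access was a write
    dfsc: u64,   // data fault status code, ISS bits [5:0]
}

fn decode(esr: u64) -> DataAbort {
    DataAbort {
        ec: (esr >> 26) & 0x3f,
        write: (esr & (1 << 6)) != 0,
        dfsc: esr & 0x3f,
    }
}

/// Human-readable name for the most common fault status codes.
fn describe_dfsc(dfsc: u64) -> String {
    let level = dfsc & 0b11;
    match dfsc >> 2 {
        0b0001 => format!("translation fault, level {}", level),
        0b0010 => format!("access flag fault, level {}", level),
        0b0011 => format!("permission fault, level {}", level),
        _ => format!("other (0x{:02x})", dfsc),
    }
}

fn main() {
    let abort = decode(0x9600_004f);
    println!("EC    = {:#x}", abort.ec);
    println!("write = {}", abort.write);
    println!("DFSC  = {}", describe_dfsc(abort.dfsc));
}
```

The translation itself succeeded; the page is mapped, just not writable.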


So, my previous suspicion that it had something to do with writing to SCTLR_EL1 turned out to be wrong. The relevant SCTLR_EL1 bits are set and the MMU is enabled, but later on, when we try to log/print anything to serial output, we get the above (bad-write) exception.

A couple of things that I haven't figured out:

  • why does printing fail only after enabling the MMU? I noticed we're able to log a single character, [, just before the panic.
  • I have not been able to figure out the exact execution path in kernel_main. I see that the execution flow passes from kernel_init to kernel_main, but how we end up in write_str is still a mystery, or at least I'm not sure we can answer that with just static code analysis.
  • the other thing is: why do we get a write-permission fault? Address 0x0ad020 is basically the .data section, which should be writeable, right?

Would adding an extra table descriptor to the LAYOUT for the .data section and making it ReadWrite-able solve this?

    [    2.130593] mmu not enabled check
    [    2.130994] translation granularity supported
    [    2.131525] MAIR_EL1 set
    [    2.131828] MAIR_EL1: 0xff04
    [    2.226713] translation tables populated
    [    2.226837] Special regions:
    [    2.227183]       0x00080000 - 0x000acfff| 180 KiB| C   RO PX| Kernel code and RO data
    [    2.228202]       0x1fff0000 - 0x1fffffff|  64 KiB| Dev RW PXN| Remapped Device MMIO
    [    2.229187]       0xfe000000 - 0xff84ffff|  24 MiB| Dev RW PXN| Device MMIO
    [    2.230076] BASE ADDR: 0x280000
    [    2.230455] TTBR0_EL1: 0x280000
    [    2.230834] TCR_EL1: 0x200807520
    [    2.231224] first isb passed
    [    2.231571] SCTLR_EL1: 0xc50838
    [    2.231950] new SCTLR_EL1: 0xc5183d
    [[    2.232384] Kernel panic!
    Panic location:
          File 'hal\src\rpi\rpi4\exception\exception.rs', line 64, column 5
    CPU Exception!
    ESR_EL1: 0x9600004f
          Exception Class         (EC): 0x25 - Data Abort, current EL
          Instr Specific Syndrome (ISS): 0x4f
    FAR_EL1: 0x00000000000ad020
    SPSR_EL1: 0x600003c5
          Flags:
                Negative (N): Not set
                Zero     (Z): Set
                Carry    (C): Set
                Overflow (V): Not set
          Exception handling state:
                Debug  (D): Masked
                SError (A): Masked
                IRQ    (I): Masked
                FIQ    (F): Masked
          Illegal Execution State (IL): Not set
    ELR_EL1: 0x000000000009fa6c
    General purpose register:
          x0: 0x000000000007fbf0         x1: 0x00000000000ac111
          x2: 0x0000000000000003         x3: 0x000000000009fa50
          x4: 0x0000000000000006         x5: 0x000000000007ff44
          x6: 0x000000000007ff48         x7: 0x000000000007ff4c
          x8: 0x00000000000ad018         x9: 0x00000000000ac113
          x10: 0x00000000000006bb         x11: 0x00000000fe201000
          x12: 0x0000000000000009         x13: 0x00000000000a6bf8
          x14: 0x0000000000000006         x15: 0x0000000000000057
          x16: 0x000000000007fc8b         x17: 0x0000000000000005
          x18: 0x0000000000000002         x19: 0x000000000007f540
          x20: 0x0000000000000005         x21: 0x0000000000000118
          x22: 0x00000000000a6b88         x23: 0x00000000000aab70
          x24: 0x0000000000081d88         x25: 0x00000000000abea0
          x26: 0x0000000100000000         x27: 0x000000000007f7e0
          x28: 0x0000000000081480         x29: 0x0000000000081ddc
          lr: 0x0000000000082c14
@andre-richter

I think I spotted it. The end of your code section and the start of your data section are not 64 KiB aligned, but that is the paging granularity. You get the permission fault because the start of your data is still covered by the last code page, which is mapped RO.
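
The overlap can be checked with the numbers from the panic log: the RO region ends at 0x000acfff and FAR_EL1 is 0xad020. A quick sketch, where `page_index` is just an illustrative helper:

```rust
/// Translation granule used by the tutorial's page tables.
const GRANULE: u64 = 64 * 1024;

/// Index of the 64 KiB page that contains `addr`.
fn page_index(addr: u64) -> u64 {
    addr / GRANULE
}

fn main() {
    let code_end: u64 = 0x000a_cfff; // last byte of "Kernel code and RO data"
    let fault_addr: u64 = 0x000a_d020; // FAR_EL1 from the panic output

    println!("code ends in page {:#x}", page_index(code_end));
    println!("fault is in page  {:#x}", page_index(fault_addr));
    println!("same page: {}", page_index(code_end) == page_index(fault_addr));
}
```

Both addresses fall into the page covering 0xa0000 to 0xaffff, so the read-only attribute of the code region extends over the first bytes of .data.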

@nihalpasham

Ah, makes sense. I'll change that and report back.

@nihalpasham

Yep, that works 🙌🏾. I added a (64KiB) alignment constraint to the .data section of the linker script.

    .data : ALIGN(65536) { *(.data*) } :segment_data
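
For reference, the effect of that constraint is a round-up of the section's start address to the next multiple of 64 KiB. A small sketch of the arithmetic; the input 0xad000 is an assumption on my part (the byte right after the old RO region end 0x000acfff), and `align_up` is an illustrative helper, not linker or tutorial code.

```rust
/// Round `addr` up to the next multiple of `align` (a power of two).
fn align_up(addr: u64, align: u64) -> u64 {
    assert!(align.is_power_of_two());
    (addr + align - 1) & !(align - 1)
}

fn main() {
    let unaligned_data_start: u64 = 0xad000; // assumed: first byte after the RO region
    let aligned = align_up(unaligned_data_start, 65536);
    println!(".data would start at {:#x} instead of {:#x}", aligned, unaligned_data_start);
}
```

The cost is some padding between the sections; the gain is that no 64 KiB page holds both read-only and writable data.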

As for the original issue: performance is way better than I could have hoped for. What took about 100 seconds before now completes in less than 1.5 seconds, and that includes

  • hashing and validating the integrity of 4 files (with a total size of 62MB) along with verifying an ECC signature.
  • guess, caching is a wondrous thing (until it is not).

I run into an instruction abort exception at the end. The faulting instruction is at address 0x4600000, located in the .bss section (which is where I've loaded the Linux kernel). It's another permission fault. I guess the fix here is to mark the kernel load range as a special region with read + execute permissions, right?

    [    1.714893] EMMC: reset card.
    [    1.714982] control1: 16143
    [    1.715250] Divisor = 63, Freq Set = 396825
    [    2.119087] CSD Contents: 00 40 0e 00 32 5b 59 00 00ed c8 7f 80 0a 40 40
    [    2.119571] cemmc_structure=1, spec_vers=0, taac=0x0E, nsac=0x00, tran_speed=0x32,ccc=0x05B5, read_bl_len=0x09, read_bl_partial=0b, write_blk_misalign=0b,read_blk_misalign=0b, dsr_imp=0b, sector_size =0x7F, erase_blk_en=1b
    [    2.122018] CSD 2.0: ver2_c_size = 0xEFFC, card capacity: 31914459136 bytes or 31.91GiB
    [    2.123004] wp_grp_size=0x0000000b, wp_grp_enable=0b, default_ecc=00b, r2w_factor=010b, write_bl_len=0x09, write_bl_partial=0b, file_format_grp=0, copy=1b, perm_write_protect=0b, tmp_write_protect=0b, file_format=0b ecc=00b
    [    2.125465] control1: 271
    [    2.125778] Divisor = 1, Freq Set = 25000000
    [    2.128635] EMMC: Bus width set to 4
    [    2.128721] EMMC: SD Card Type 2 HC, 30436Mb, mfr_id: 3, 'SD:ACLCD', r8.0, mfr_date: 1/2017, serial: 0xbbce119c, RCA: 0xaaaa
    [    2.130102] EMMC2 driver initialized...
    [    2.232355] rpi4 version 0.1.0
    [    2.232721] Booting on: Raspberry Pi 4
    [    2.233176] MMU online. Special regions:
    [    2.233653]       0x00080000 - 0x000acfff| 180 KiB| C   RO PX| Kernel code and RO data
    [    2.234671]       0x1fff0000 - 0x1fffffff|  64 KiB| Dev RW PXN| Remapped Device MMIO
    [    2.235657]       0xfe000000 - 0xff84ffff|  24 MiB| Dev RW PXN| Device MMIO
    [    2.236546] Current privilege level: EL1
    [    2.237022] Exception handling state:
    [    2.237466]       Debug:  Masked
    [    2.237856]       SError: Masked
    [    2.238246]       IRQ:    Masked
    [    2.238636]       FIQ:    Masked
    [    2.239026] Architectural timer resolution: 18 ns
    [    2.239600] Drivers loaded:
    [    2.239936]       1. BCM GPIO
    [    2.240294]       2. BCM PL011 UART
    [    2.240716] Chars written: 2494
    [!!!    ] Writing through the remapped UART at 0x1FFF_1000
    [    2.241790] [INFO]  create new emmc-fat controller...
    [    2.242504]          - rustBoot::fs::controller @ line:200
    [    2.247239] Listing root directory:
    [    2.250831]      - Found: SIGNED~1.ITB
    [    2.251027] loading fit-image...
    [   33.920214] loaded fit: 62202019 bytes, starting at addr: 0x600000
    [   33.920617] authenticating fit-image...
    [   33.921360] [INFO]  computing "kernel" hash
    [   33.921830]          - rustBoot::dt::fit @ line:289
    [   34.612911] [INFO]  computed "kernel" hash: 97dcbff24ad0a60514e31a7a6b34a765681fea81f8dd11e4644f3ec81e1044fb
    [   34.613864]          - rustBoot::dt::fit @ line:294
    [   34.614467] [INFO]  kernel integrity consistent with supplied itb...
    [   34.615435]          - rustBoot::dt::fit @ line:308
    [   34.616054] [INFO]  computing "fdt" hash
    [   34.616605]          - rustBoot::dt::fit @ line:289
    [   34.617811] [INFO]  computed "fdt" hash: 3572783be74511b710ed7fca9b3131e97fd8073c620a94269a4e4ce79d331540
    [   34.618732]          - rustBoot::dt::fit @ line:294
    [   34.619333] [INFO]  fdt integrity consistent with supplied itb...
    [   34.620270]          - rustBoot::dt::fit @ line:308
    [   34.620891] [INFO]  computing "ramdisk" hash
    [   34.621484]          - rustBoot::dt::fit @ line:289
    [   35.398004] [INFO]  computed "ramdisk" hash: f1290587e2155e3a5c2c870fa1d6e3e2252fb0dddf74992113d2ed86bc67f37c
    [   35.398968]          - rustBoot::dt::fit @ line:294
    [   35.399570] [INFO]  ramdisk integrity consistent with supplied itb...
    [   35.400550]          - rustBoot::dt::fit @ line:308
    [   35.401174] [INFO]  computing "rbconfig" hash
    [   35.401774]          - rustBoot::dt::fit @ line:289
    [   35.402376] [INFO]  computed "rbconfig" hash: b16d058c4f09abdb8da98561f3a15d06ff271c38a4655c2be11dec23567fd519
    [   35.403702]          - rustBoot::dt::fit @ line:294
    [   35.404303] [INFO]  rbconfig integrity consistent with supplied itb...
    [   35.405295]          - rustBoot::dt::fit @ line:308
    ######## ecdsa signature checks out, image is authentic ########
    [   35.434296] relocating kernel to addr: 0x4600000
    [   35.456677] relocating initrd to addr: 0x6400000
    [   35.456885] load rbconfig...
    [   35.457266] patching dtb...
    [   35.457772] relocating dtb to addr: 0x400000
    ***************************************** Starting kernel ********************************************
    [   35.459487] Kernel panic!
    Panic location:
          File 'hal\src\rpi\rpi4\exception\exception.rs', line 64, column 5
    CPU Exception!
    ESR_EL1: 0x8600000f
          Exception Class         (EC): 0x21 - N/A
          Instr Specific Syndrome (ISS): 0xf
    FAR_EL1: 0x0000000004600000
    SPSR_EL1: 0x600003c5
          Flags:
                Negative (N): Not set
                Zero     (Z): Set
                Carry    (C): Set
                Overflow (V): Not set
          Exception handling state:
                Debug  (D): Masked
                SError (A): Masked
                IRQ    (I): Masked
                FIQ    (F): Masked
          Illegal Execution State (IL): Not set
    ELR_EL1: 0x0000000004600000
    General purpose register:
          x0: 0x0000000000400000         x1: 0x0000000000000000
          x2: 0x0000000000000000         x3: 0x0000000000000000
          x4: 0x0000000000000006         x5: 0x0000000000005ea8
          x6: 0x0000000000000001         x7: 0x0000000000000000
          x8: 0x0000000004600000         x9: 0x00000000000a7014
          x10: 0x00000000000013de         x11: 0x00000000fe201000
          x12: 0x0000000000000019         x13: 0x000000000007f810
          x14: 0x0000000000000000         x15: 0x0000000000000000
          x16: 0x0000000000000030         x17: 0x0000000000000078
          x18: 0x0000000000400000         x19: 0x00000000000b0018
          x20: 0x000000004e650000         x21: 0x0000000000083bec
          x22: 0x00000000000000bc         x23: 0x000000003b9aca00
          x24: 0x0000000000000244         x25: 0x00000000000f4240
          x26: 0x00000000000abea0         x27: 0x0000000000006521
          x28: 0x0000000000000264         x29: 0x0000000000081ddc
          lr: 0x000000000008bd5c
@andre-richter

Well, the first thing that Linux will do is to set up its own page tables. I don’t know by heart what the expectation from a previous boot loader stage is with respect to the architectural state of the memory subsystem.

For starters, I would probably just disable the MMU again before jumping to Linux.


This doc outlines the AArch64 Linux boot protocol and architectural/micro-architectural expectations: https://www.kernel.org/doc/Documentation/arm64/booting.txt

The MMU needs to be off, and caching additionally needs to be explicitly disabled.

On Thu, Apr 21, 2022 at 7:38 PM Andre Richter wrote:

> Well, the first thing that Linux will do is to set up its own page tables. I don't know by heart what the expectation from a previous boot loader stage is with respect to the architectural state of the memory subsystem.
>
> For starters, I would probably just disable caching again before jumping to Linux.
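
As a sketch of the register side of that hand-off: the value to write back to SCTLR_EL1 is the current one with M (bit 0) and C (bit 2) cleared. This only shows the bit arithmetic as a pure function with a made-up name. On real hardware the write would also need the loaded kernel image cleaned out of the data cache beforehand and an `isb` afterwards, and none of that is shown here.

```rust
const SCTLR_M: u64 = 1 << 0; // MMU enable
const SCTLR_C: u64 = 1 << 2; // data/unified cacheability

/// SCTLR_EL1 value with the MMU and data cache switched off,
/// leaving every other bit as it was.
fn sctlr_for_kernel_handoff(sctlr: u64) -> u64 {
    sctlr & !(SCTLR_M | SCTLR_C)
}

fn main() {
    let current: u64 = 0xc5183d; // value logged after enabling MMU + caches
    println!("hand-off SCTLR_EL1: {:#x}", sctlr_for_kernel_handoff(current));
}
```

The instruction cache bit is left alone in this sketch; clear it as well if you want to return to the exact pre-MMU value 0xc50838.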
1 reply
@nihalpasham

Yeah, completely forgot about this (got lost in MMU translation - 😁). I'll need to reset most of the hardware to an early state for Linux to boot.


@nihalpasham can you do me a favor and check what the speedup is with instruction caching alone?

Would be a nice datapoint to have.

1 reply
@nihalpasham

enabled instruction-caching alone.

    // Enable the MMU and turn on instruction caching alone.
    SCTLR_EL1.modify(SCTLR_EL1::M::Enable + SCTLR_EL1::I::Cacheable);

Results: for the same set of operations

  • approx time: 47.5 seconds or half the original amount of time.
    ..........
    [    2.399815] rpi4 version 0.1.0
    [    2.400183] Booting on: Raspberry Pi 4
    [    2.400638] MMU online. Special regions:
    [    2.401115]       0x00080000 - 0x000a4fff| 148 KiB| C   RO PX| Kernel code and RO data
    [    2.402134]       0x1fff0000 - 0x1fffffff|  64 KiB| Dev RW PXN| Remapped Device MMIO
    [    2.403119]       0xfe000000 - 0xff84ffff|  24 MiB| Dev RW PXN| Device MMIO
    [    2.404008] Current privilege level: EL1
    [    2.404484] Exception handling state:
    [    2.404928]       Debug:  Masked
    [    2.405318]       SError: Masked
    [    2.405708]       IRQ:    Masked
    [    2.406098]       FIQ:    Masked
    [    2.406488] Architectural timer resolution: 18 ns
    [    2.407062] Drivers loaded:
    [    2.407398]       1. BCM GPIO
    [    2.407756]       2. BCM PL011 UART
    [    2.408178] Chars written: 1793
    [!!!    ] Writing through the remapped UART at 0x1FFF_1000
    [    2.409253] [INFO]  create new emmc-fat controller...
    [    2.409966]          - rustBoot::fs::controller @ line:200
    [    2.414702] Listing root directory:
    [    2.418335]      - Found: SIGNED~1.ITB
    [    2.418538] loading fit-image...
    [   34.053223] loaded fit: 62202019 bytes, starting at addr: 0x290000
    [   34.053627] authenticating fit-image...
    [   34.055365] [INFO]  computing "kernel" hash
    [   34.055617]          - rustBoot::dt::fit @ line:289
    [   56.309267] [INFO]  computed "kernel" hash: 97dcbff24ad0a60514e31a7a6b34a765681fea81f8dd11e4644f3ec81e1044fb
    [   56.310223]          - rustBoot::dt::fit @ line:294
    [   56.310875] [INFO]  kernel integrity consistent with supplied itb...
    [   56.311793]          - rustBoot::dt::fit @ line:308
    [   56.312622] [INFO]  computing "fdt" hash
    [   56.312963]          - rustBoot::dt::fit @ line:289
    [   56.333355] [INFO]  computed "fdt" hash: 3572783be74511b710ed7fca9b3131e97fd8073c620a94269a4e4ce79d331540
    [   56.334278]          - rustBoot::dt::fit @ line:294
    [   56.334926] [INFO]  fdt integrity consistent with supplied itb...
    [   56.335816]          - rustBoot::dt::fit @ line:308
    [   56.336685] [INFO]  computing "ramdisk" hash
    [   56.337030]          - rustBoot::dt::fit @ line:289
    [   81.346679] [INFO]  computed "ramdisk" hash: f1290587e2155e3a5c2c870fa1d6e3e2252fb0dddf74992113d2ed86bc67f37c
    [   81.347646]          - rustBoot::dt::fit @ line:294
    [   81.348290] [INFO]  ramdisk integrity consistent with supplied itb...
    [   81.349227]          - rustBoot::dt::fit @ line:308
    [   81.350135] [INFO]  computing "rbconfig" hash
    [   81.350451]          - rustBoot::dt::fit @ line:289
    [   81.351205] [INFO]  computed "rbconfig" hash: b16d058c4f09abdb8da98561f3a15d06ff271c38a4655c2be11dec23567fd519
    [   81.352380]          - rustBoot::dt::fit @ line:294
    [   81.353024] [INFO]  rbconfig integrity consistent with supplied itb...
    [   81.353972]          - rustBoot::dt::fit @ line:308
    ######## ecdsa signature checks out, image is authentic ########
    [   81.556150] relocating kernel to addr: 0x4200000
    ........

enabled data-caching alone.

    // Enable the MMU and turn on data caching alone.
    SCTLR_EL1.modify(SCTLR_EL1::M::Enable + SCTLR_EL1::C::Cacheable);

Results: for the same set of operations

  • approx time: 61.15 seconds
    ........
    [    2.399946] rpi4 version 0.1.0
    [    2.400313] Booting on: Raspberry Pi 4
    [    2.400767] MMU online. Special regions:
    [    2.401245]       0x00080000 - 0x000a4fff| 148 KiB| C   RO PX| Kernel code and RO data
    [    2.402263]       0x1fff0000 - 0x1fffffff|  64 KiB| Dev RW PXN| Remapped Device MMIO
    [    2.403249]       0xfe000000 - 0xff84ffff|  24 MiB| Dev RW PXN| Device MMIO
    [    2.404138] Current privilege level: EL1
    [    2.404614] Exception handling state:
    [    2.405058]       Debug:  Masked
    [    2.405448]       SError: Masked
    [    2.405838]       IRQ:    Masked
    [    2.406228]       FIQ:    Masked
    [    2.406618] Architectural timer resolution: 18 ns
    [    2.407192] Drivers loaded:
    [    2.407528]       1. BCM GPIO
    [    2.407885]       2. BCM PL011 UART
    [    2.408308] Chars written: 1793
    [!!!    ] Writing through the remapped UART at 0x1FFF_1000
    [    2.409383] [INFO]  create new emmc-fat controller...
    [    2.410096]          - rustBoot::fs::controller @ line:200
    [    2.414937] Listing root directory:
    [    2.419248]      - Found: SIGNED~1.ITB
    [    2.419470] loading fit-image...
    [   42.972299] loaded fit: 62202019 bytes, starting at addr: 0x290000
    [   42.972711] authenticating fit-image...
    [   42.977009] [INFO]  computing "kernel" hash
    [   42.977267]          - rustBoot::dt::fit @ line:289
    [   71.962372] [INFO]  computed "kernel" hash: 97dcbff24ad0a60514e31a7a6b34a765681fea81f8dd11e4644f3ec81e1044fb
    [   71.963334]          - rustBoot::dt::fit @ line:294
    [   71.964113] [INFO]  kernel integrity consistent with supplied itb...
    [   71.964905]          - rustBoot::dt::fit @ line:308
    [   71.966198] [INFO]  computing "fdt" hash
    [   71.966423]          - rustBoot::dt::fit @ line:289
    [   71.992601] [INFO]  computed "fdt" hash: 3572783be74511b710ed7fca9b3131e97fd8073c620a94269a4e4ce79d331540
    [   71.993531]          - rustBoot::dt::fit @ line:294
    [   71.994300] [INFO]  fdt integrity consistent with supplied itb...
    [   71.995069]          - rustBoot::dt::fit @ line:308
    [   71.996478] [INFO]  computing "ramdisk" hash
    [   71.996746]          - rustBoot::dt::fit @ line:289
    [  104.578328] [INFO]  computed "ramdisk" hash: f1290587e2155e3a5c2c870fa1d6e3e2252fb0dddf74992113d2ed86bc67f37c
    [  104.579301]          - rustBoot::dt::fit @ line:294
    [  104.580056] [INFO]  ramdisk integrity consistent with supplied itb...
    [  104.580881]          - rustBoot::dt::fit @ line:308
    [  104.582422] [INFO]  computing "rbconfig" hash
    [  104.582700]          - rustBoot::dt::fit @ line:289
    [  104.583560] [INFO]  computed "rbconfig" hash: b16d058c4f09abdb8da98561f3a15d06ff271c38a4655c2be11dec23567fd519
    [  104.584629]          - rustBoot::dt::fit @ line:294
    [  104.585383] [INFO]  rbconfig integrity consistent with supplied itb...
    [  104.586221]          - rustBoot::dt::fit @ line:308
    ######## ecdsa signature checks out, image is authentic ########
    [  106.124374] relocating kernel to addr: 0x4200000

Conclusions:

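  • for the above set of operations, instruction caching alone cuts the run from roughly 100 seconds to about 47.5 seconds, i.e. about a 50% reduction
  • for the same set of operations, data caching alone cuts it to about 61 seconds, i.e. about a 40% reduction
  • with both instruction + data caching enabled, though, the same work finishes in under 1.5 seconds, a speed-up of well over 50x, which is far more than the two individual gains would suggest.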
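
The ratios can be recomputed from the times quoted in this thread. A minimal sketch, taking "about 100 seconds" as the uncached baseline and 1.5 seconds as an upper bound for the fully cached run; both are round figures from the posts, not precise measurements, and the helper names are illustrative.

```rust
/// How many times faster `after_s` is than `before_s`.
fn speedup(before_s: f64, after_s: f64) -> f64 {
    before_s / after_s
}

/// Reduction in run time, as a percentage of `before_s`.
fn reduction_percent(before_s: f64, after_s: f64) -> f64 {
    (1.0 - after_s / before_s) * 100.0
}

fn main() {
    let baseline = 100.0; // approximate uncached run, seconds
    let runs = [("I-cache only", 47.5), ("D-cache only", 61.15), ("both caches", 1.5)];

    for (name, secs) in runs {
        println!(
            "{}: {:.1}x faster, {:.1}% less time",
            name,
            speedup(baseline, secs),
            reduction_percent(baseline, secs)
        );
    }
}
```

A plausible reading of the non-additive result is that with only one cache on, the hash loop still goes to uncached memory on every iteration for either its instructions or its data, so it stays bound by memory latency.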

The results kind of make sense, as hashing algorithms are typically implemented in 3 steps: init, update and finalize. The bulk of the work is performed in the update step, where we apply the same operations to new chunks of data, repeatedly.
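
The init/update/finalize shape can be sketched with a toy hasher. To keep the example dependency-free it uses 64-bit FNV-1a as a stand-in, not SHA-256 and not the sha2 crate's API. The point is only the structure: a small, fixed piece of code and state applied repeatedly to a long, sequential stream of data.

```rust
/// Toy streaming hasher (64-bit FNV-1a), standing in for a real digest.
struct Fnv1a(u64);

impl Fnv1a {
    /// init: set up the small internal state.
    fn new() -> Self {
        Fnv1a(0xcbf2_9ce4_8422_2325) // FNV-1a 64-bit offset basis
    }

    /// update: the hot loop, run once per chunk of input.
    fn update(&mut self, data: &[u8]) {
        for &byte in data {
            self.0 ^= u64::from(byte);
            self.0 = self.0.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
        }
    }

    /// finalize: produce the digest from the state.
    fn finalize(self) -> u64 {
        self.0
    }
}

fn main() {
    // 4 KiB of sample data standing in for the loaded image.
    let image: Vec<u8> = (0..4096u32).map(|i| (i % 251) as u8).collect();

    // Feed the data in 64-byte chunks, as a block-based hash would.
    let mut hasher = Fnv1a::new();
    for chunk in image.chunks(64) {
        hasher.update(chunk);
    }
    println!("digest: {:016x}", hasher.finalize());
}
```

The same few instructions run for every chunk, which is where the instruction cache pays off, and the input is read strictly in order, which suits the data cache and the prefetchers.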

3 participants: @nihalpasham, @raw-bin, @andre-richter
