- Notifications
You must be signed in to change notification settings - Fork13.3k
Use 32b loads to set print strings#7545
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Uh oh!
There was an error while loading.Please reload this page.
Conversation
Instead of using either a series of etc_putc or setting a series of bytesone by one, use a simple macro to define 32b constants to build up strings.Saves ~30 bytes of program code in eboot
| uint32_tfmt[2]; | ||
| fmt[0]=S('v','%','0','8'); | ||
| fmt[1]=S('x','\n',0,0); | ||
| ets_printf((constchar*)fmt,ver); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
more readable solution I guess is:
#define TINYSTR(str) [__builtin_strlen(str) + 1] = str#define TINYCHARS(chars) [__builtin_strlen(chars)] = charsand then,
const char fmt TINYSTR("v%08x\n"); ets_printf(fmt, ver);There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
OMG, greater then 4 bytes string literals will be placed into RODATA (andmemcpy will copy them)...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Sorry,
#define S(str) ((uint32_t)(str"\0\0\0")[0] | ((uint32_t)(str"\0\0\0")[1] << 8) | ((uint32_t)(str"\0\0\0")[2] << 16) | ((uint32_t)(str"\0\0\0")[3] << 24))and
const uint32_t fmt[] = { S("v%08"), S("x\n") }; ets_printf(fmt, ver);are OK (now placed into.text).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Yeah, there aren't any really good solutions. We don't allow RODATA because we don't have the C init stuff in the bootloader. Can't use PSTR because ets_printf can't handle PROGMEM byte accesses. GCC's multichar literal puts things in the wrong order for strings (endian issue). So we end up with either a 4-param or 1-param marco w/array indexing.
Honestly, I like the existing setup more than the string 4-chars because with the current macro, if you accidentally put in a >4 char element it will error (while the string[index] will silently allow it).
But, your idea to make it a const array to get in .text, that's pretty cool. Let me see what it does to the size, might buy 4-8 bytes per instance...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Argh, looks like that's not going to work. I see .rodata appear when I doconst uint32_t s={S(),S()}; Back where we were...
earlephilhowerAug 24, 2020 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Also, anyway, the string needs to be in RAM since it will be byte accessed. So if it was reduced to just a pointer to ICACHE (where eboot lives)/PROGMEM it would end up SEGV'ing. Has to be a copy to RAM step no matter what.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Haha, get to be quite weird definition...
(#include <assert.h> needed )
#define S(x, str) \ do { \ const unsigned int l = (__builtin_strlen(str) + 1 + 3) / 4; \ static_assert(sizeof(x) / sizeof(*x) == l); \ unsigned int i = 0; \ while(i < l) \ x[i] = ((uint32_t)(str"\0\0\0")[i * 4 + 0] | ((uint32_t)(str"\0\0\0")[i * 4 + 1] << 8) | ((uint32_t)(str"\0\0\0")[i * 4 + 2] << 16) | ((uint32_t)(str"\0\0\0")[i * 4 + 3] << 24)), ++i; \ } while(0)and
void test(void) { uint32_t fmt[3]; S(fmt, "foo bar baz"); extern int write(const void*); write((const char*)fmt);}emits the below (good result)
.file"test.c".text.literal_position.literal .LC0, 544173926.literal .LC1, 544366946.literal .LC2, 8020322.align4.globaltest.typetest, @functiontest:l32ra2, .LC0addisp, sp, -32s32i.na2, sp, 0l32ra2, .LC1s32i.na0, sp, 28s32i.na2, sp, 4l32ra2, .LC2s32i.na2, sp, 8mov.na2, spcall0writel32i.na0, sp, 28addisp, sp, 32ret.n.sizetest, .-test.ident"GCC: (GNU) 10.1.0"but this
void test(void) { uint32_t fmt[4]; S(fmt, "foo bar baz "); /* 1 whitespace appended */ extern int write(const void*); write((const char*)fmt);}emits bad...
.file"test.c".text.section.rodata.LC0:.string"foo bar baz ".string"".string"".string"".text.literal_position.literal .LC1, .LC0.align4.globaltest.typetest, @functiontest:addisp, sp, -32l32ra3, .LC1s32i.na0, sp, 28movi.na4, 0.L2:l8uia2, a3, 1l8uia5, a3, 2sllia2, a2, 8sllia5, a5, 16ora2, a2, a5l8uia5, a3, 0add.na6, sp, a4ora2, a2, a5l8uia5, a3, 3addi.na4, a4, 4sllia5, a5, 24ora2, a2, a5s32i.na2, a6, 0addi.na3, a3, 4bneia4, 16, .L2mov.na2, spcall0writel32i.na0, sp, 28addisp, sp, 32ret.n.sizetest, .-test.ident"GCC: (GNU) 10.1.0"mhightower83 commentedAug 24, 2020
@earlephilhower, At the moment I am having trouble remember the exact specifics; however, earlier this year (feels like years ago) I was able to get the Boot ROM loader to handle a special eboot, with two segments one for code and one for data which got loaded to DRAM. It also required changes to elft2bin.py. It was one of many tangents/attempts I went on when doing the HWDT Stack Dump. There were some issues in getting the checksum correct for each segment. Only the 1st segment is biased by 0xFE the additional segments use 0. My notes indicate there are also some payload length tweaks that are needed in creating the .bin. If you are interested I can hunt down and update the branch, I think there are changes in eboot and elft2bin.py since then that would need to be incorporated. It might take me a while, I am a bit distracted. |
earlephilhower commentedAug 24, 2020
Thanks,@mhightower83 . For now, though, I'd really like to just make it simple and save a few bytes because I know there are add'l changes coming in (fixed verify, eboot protect, OTA command in flash, etc.). The way it works now the ROM copies the first 4K to IRAM (per the flash copy structure which elf2bin.py generates and the ROM parses) and jumps to the start of code. In eboot.c it's definitely possible to manually copy from IRAM->DRAM for RODATA/etc., but that's additional code which eats up any savings. I don't remember if the ROM can do multiple region copies, bu if it does then we could do the same stuff for RODATA (just align start to 4K somewhere, not freely assigned as it would be normally). But, again, that's way more work than I'm thinking of right now, for little gain. |
mhightower83 commentedAug 24, 2020
@earlephilhower Yes, it does; however, I could not find it documented. I ended out using the empirical method to figure out what it wanted. |
d-a-v left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Does elf file need to be regenerated since aug24 ?
dirkenstein commentedOct 21, 2020
Hello, I think this should be: I think the cast was using the actual text as the pointer value, not the location of fmt. Have I misunderstood something? |
Instead of using either a series of etc_putc or setting a series of bytes
one by one, use a simple macro to define 32b constants to build up strings.
Saves ~30 bytes of program code in eboot