MicroPython 1.25.0 introduced a breaking change, aligning the behaviour of the `int()` function more closely with CPython: strings are assumed to represent a decimal number unless a base is specified, and if a base of 0 is specified, the base is inferred from the string.

This broke our parsing logic, which relied on the previous behaviour of the `int()` function to automatically determine the base of a string literal from the base prefix present in the string. Specifying base 0 was not a solution, as this resulted in parsing behaviour different from GNU as.

Additionally, we never actually parsed octal in the format `0100` correctly, even before this PR: that number would have been interpreted as 100 rather than 64.
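To make the difference concrete, here is a small sketch of how `int()` treats these literals. It was written against CPython, which is the behaviour the description says MicroPython 1.25.0 moved towards; the variable names are purely illustrative and not part of this PR.

```python
# Without a base, int() assumes decimal, so a prefixed literal is rejected.
try:
    int("0x10")
    prefixed_without_base = "accepted"
except ValueError:
    prefixed_without_base = "rejected"

# With base 0, the base is inferred from the prefix.
hex_value = int("0x10", 0)
binary_value = int("0b101", 0)

# The same string read as decimal versus as octal (the GNU as meaning).
as_decimal = int("0100")
as_octal = int("0100", 8)
```

Here `as_decimal` is 100 and `as_octal` is 64, which is exactly the mismatch with GNU as described above.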

So, to fix this, and to ensure our parsing matches the GNU assembler, this PR implements a custom `parse_int()` function, which uses the base prefix in a string to determine the correct base to pass to `int()`. The following are supported:

0x -> treated as hex
0b -> treated as binary
0... -> treated as octal
0o -> treated as octal
anything else parsed as decimal

The `parse_int` function also supports the negative prefix operator for all of the above cases.
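As an illustration of the prefix rules listed above, the following is a minimal sketch of an explicit prefix dispatcher. It is not the code from this PR: the name `parse_int_explicit` and its structure are assumptions made for the example, and it only covers the cases named in the list.

```python
def parse_int_explicit(literal):
    """Hypothetical sketch: pick the base from the prefix, then call int()."""
    # Handle the negative prefix operator separately from the base prefix.
    negative = literal.startswith("-")
    digits = literal[1:] if negative else literal
    lowered = digits.lower()

    if lowered.startswith("0x"):
        value = int(digits[2:], 16)   # hex
    elif lowered.startswith("0b"):
        value = int(digits[2:], 2)    # binary
    elif lowered.startswith("0o"):
        value = int(digits[2:], 8)    # Python style octal
    elif len(digits) > 1 and digits[0] == "0":
        value = int(digits, 8)        # GNU as style octal
    else:
        value = int(digits, 10)       # anything else is decimal

    return -value if negative else value
```

With this sketch, `parse_int_explicit("0100")` evaluates to 64 and `parse_int_explicit("-0x10")` to -16.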

This change also ensures that the `.int`, `.long` and `.word` directives correctly handle the formats mentioned above. This fixes the issue described in #104.

Note: GNU as does not actually accept the octal prefix `0o...`, but we accept it as a convenience, as it is accepted in Python code. This means, however, that our assembler accepts code which GNU as does not accept. The other way around, we still accept all code that GNU as accepts, which was one of our goals.

wnienhaus self-assigned this

Jun 19, 2025

wnienhaus requested a review from ThomasWaldmann

June 19, 2025 20:25

wnienhaus removed their assignment

Jun 19, 2025

wnienhaus mentioned this pull request

Jun 19, 2025

Update builder image to ubuntu-22.04 #107

Merged

wnienhaus force-pushed the fix-int-parsing-with-base-prefix branch from 9452423 to 23f8ab4

June 19, 2025 20:42

wnienhaus (Collaborator, Author) commented Jun 19, 2025

After merging #107 the tests now pass.

dpgeorge reviewed

Jun 20, 2025

dpgeorge (Member) left a comment

> Specifying base 0 was not a solution, as this resulted in parsing behaviour different from GNU as.

I guess the simplest fix here would be to just replace `int(x)` with `int(x, 0)`. That should restore the existing behaviour. But it looks like you want to improve things further, which is great!

mjaspers2mtu (Contributor) commented Jun 22, 2025

Hey @wnienhaus, tested with the ULP programs on my S2, and had no issues 👍

mjaspers2mtu mentioned this pull request

Jun 22, 2025

improve argument evaluation #104

Closed

wnienhaus (Collaborator, Author) commented Jun 30, 2025 (edited)

> > Specifying base 0 was not a solution, as this resulted in parsing behaviour different from GNU as.
>
> I guess the simplest fix here would be to just replace `int(x)` with `int(x, 0)`. That should restore the existing behaviour. But it looks like you want to improve things further, which is great!

Yes, `int(x, 0)` would have restored the previous behaviour, but it wasn't the behaviour we needed. Of course it wasn't the behaviour we needed even before, so this PR technically fixes two things: it adapts to the new MicroPython behaviour, and it fixes the parsing behaviour to match GNU as.

That said, since `int(x, 0)` exists, and because it's really just octal parsing that we need in addition, I just tried this simpler approach:

```python
def parse_int(literal):
    if len(literal) > 2:
        prefix_start = 1 if literal[0] == '-' else 0  # skip over negative sign if present
        if literal[prefix_start] == '0' and literal[prefix_start + 1] in '123456789':
            return int(literal, 8)
    return int(literal, 0)
```

and all tests still pass.

So it's really just the octal case that's different (and theoretically we should disallow Python style octal (`0o..`), but I had already decided to support it).

Now I am starting to overthink this:

what is better for clarity and/or long-term stability? To handle all cases we support explicitly (as I am doing now)? Or to just handle the extra octal case we need?
I see very little performance difference, and the shorter code saves perhaps a few bytes of memory, but it's probably not worth quibbling over.

I think I'll keep the current approach, as it's very explicit about what we support (including explicitly supporting python style octal). I'll just remove the comment about legacy octal format, because from the GNU as perspective, it's the currently valid and only possible octal format.

(Happy to get feedback on my chosen approach)

wnienhaus (Collaborator, Author) commented Jun 30, 2025

Ok. Fixes pushed. Will squash-merge this once approved.

wnienhaus force-pushed the fix-int-parsing-with-base-prefix branch 2 times, most recently from f7dfddc to 5c84d08

June 30, 2025 11:12

dpgeorge (Member) commented Jul 2, 2025

> That said, since `int(x, 0)` exists, and because it's really just octal parsing we need extra, I just tried this simpler approach:

IMO that's quite a bit simpler and easier to read/understand. I would vote for this approach.

Or a little simpler still:

```python
def parse_int(literal):
    if len(literal) >= 2 and literal.startswith(("0", "-0")) and literal.lstrip("-0").isdigit():
        return int(literal, 8)
    return int(literal, 0)
```

Note: I think you need the `>= 2` to cover cases like `07`.

wnienhaus (Collaborator, Author) commented Jul 3, 2025

Thanks for that feedback. Shorter is nice, so let's go with that. MicroPython's `startswith` does not support multiple values (a tuple), so I expanded that into two separate conditions.

The `>=2` is indeed needed in the shorter version of the code (both yours and my earlier test code; well spotted). In the previous longer version, all cases with length `<=2` were also valid decimal, so they worked fine via the fall-through case.

I added some extra tests to show more cases that (should) work as expected, e.g. `07` as you mentioned. I also noticed that my short version did not handle `000010` correctly, i.e. octal with extra zero padding. GNU as understands that as decimal 8. My long parsing code worked correctly for that case, but my short version did not.

Anyway, your proposed code handles all cases correctly, so I used it now. And now there are tests to verify they all work as intended.

Thanks.
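For reference, the variant described in this comment might look roughly like the sketch below: dpgeorge's proposal with the tuple argument to `startswith` expanded into two separate conditions. This is a reconstruction from the discussion, not a copy of the merged code, and the checks after it were only exercised against CPython.

```python
def parse_int(literal):
    # GNU as style octal: optional '-', a leading '0', then only digits.
    if (
        len(literal) >= 2
        and (literal.startswith("0") or literal.startswith("-0"))
        and literal.lstrip("-0").isdigit()
    ):
        return int(literal, 8)
    # Everything else (0x, 0b, 0o, plain decimal) is left to base-0 inference.
    return int(literal, 0)


# The two cases called out in the discussion, plus a few of the other formats.
checks = {
    "07": parse_int("07"),
    "000010": parse_int("000010"),
    "-0100": parse_int("-0100"),
    "0x10": parse_int("0x10"),
    "42": parse_int("42"),
}
```

Under these assumptions, `checks["07"]` is 7 and `checks["000010"]` is 8, matching the GNU as interpretation mentioned above.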

dpgeorge (Member) commented Jul 3, 2025

> MicroPython's `startswith` does not support multiple values (tuple), so I expanded that into two separate conditions.

Ah, it will in the next release! But in the interest of backwards compatibility, best not to rely on that yet.
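Until tuple arguments can be relied on, one way to keep call sites compact is a tiny helper that checks the prefixes one at a time. This is only a sketch; `startswith_any` is a hypothetical name and not something from this PR or from MicroPython.

```python
def startswith_any(text, prefixes):
    """Hypothetical helper: True if text starts with any of the given prefixes."""
    # Uses only the single-string form of str.startswith.
    for prefix in prefixes:
        if text.startswith(prefix):
            return True
    return False
```

For example, `startswith_any("-0100", ("0", "-0"))` is true, while `startswith_any("42", ("0", "-0"))` is false.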

> Anyway, your proposed code handles all cases correctly, so I used it now. And now there are tests to verify they all work as intended.

Very good!

dpgeorge approved these changes

Jul 3, 2025

Fix parsing of integer literals with base prefix

da5d928

MicroPython 1.25.0 introduced a breaking change, aligning the behaviour
of the int() function with the behaviour of CPython (assume a decimal
number, unless a base is specified. Only if a base of 0 is specified
will the base be inferred from the string).

This commit implements a new custom parsing function `parse_int`. It
can correctly parse the following string literals:

* 0x[0-9]+ -> treated as hex
* 0b[0-9]+ -> treated as binary
* 0o[0-9]+ -> treated as octal (Python style)
* 0[0-9]+ -> treated as octal (GNU as style)
* anything else parsed as decimal

It only handles the GNU as style octal case directly, letting the
original `int()` function handle the other cases (using base 0).

In fact, the GNU as octal case was not handled correctly previously,
and this commit fixes that.

Some new tests for previous functionality were added to show that
both new and previous cases are being handled correctly.

Note: GNU as does not actually accept the octal prefix 0o..., but we
accept it as a convenience, as this is accepted in Python code. This
means however, that our assembler accepts code which GNU as does not
accept. But the other way around, we still accept all code that GNU
as accepts, which was one of our goals.

wnienhaus force-pushed the fix-int-parsing-with-base-prefix branch from 57d06d9 to da5d928

July 8, 2025 08:49
