Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Fix parsing of integer literals with base prefix#106

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Conversation

@wnienhaus
Copy link
Collaborator

MicroPython 1.25.0 introduced a breaking change, aligning the behaviour of theint() function closer to the behaviour of CPython (something along the lines of: strings are assumed to represent a decimal number, unless a base is specified. if a base of 0 is specified, is the base is inferred from the string)

This broke our parsing logic, which relied on the previous behaviour of theint() function to automatically determine the base of the string literal, based on a base prefix present in the string. Specifying base 0 was not a solution, as this resulted in parsing behaviour different from GNU as.

Additionally, we never actually parsed octal in the format0100 correctly - even before this PR; that number would have been interpreted as 100 rather than 64.

So, to fix this, and to ensure our parsing matches the GNU assembler, this PR implements a customparse_int() function, using the base prefix in a string to determine the correct base to pass toint(). The following are supported:

  • 0x -> treated as hex
  • 0b -> treated as binary
  • 0... -> treated as octal
  • 0o -> treated as octal
  • anything else parsed as decimal

Theparse_int method also supports the negative prefix operator for all of the above cases.

This change also ensures.int,.long,.word directives correctly handle the above mentioned formats. This fixes the issue described in#104.

Note: GNU as does not actually accept the octal prefix0o..., but we accept it as a convenience, as this is accepted in Python code. This means however, that our assembler accepts code which GNU as does not accept. But the other way around, we still accept all code that GNU as accepts, which was one of our goals.

@wnienhauswnienhaus self-assigned thisJun 19, 2025
@wnienhauswnienhaus removed their assignmentJun 19, 2025
@wnienhauswnienhausforce-pushed thefix-int-parsing-with-base-prefix branch from9452423 to23f8ab4CompareJune 19, 2025 20:42
@wnienhaus
Copy link
CollaboratorAuthor

After merging#107 the tests now pass.

Copy link
Member

@dpgeorgedpgeorge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Specifying base 0 was not a solution, as this resulted in parsing behaviour different from GNU as.

I guess the simplest fix here would be to just replaceint(x) withint(x, 0). That should restore the existing behaviour. But it looks like you want to improve things further, which is great!

@mjaspers2mtu
Copy link
Contributor

Hey@wnienhaus , tested with the ulp programs on my s2, and had no issues 👍

wnienhaus reacted with thumbs up emojiThomasWaldmann reacted with hooray emoji

@wnienhaus
Copy link
CollaboratorAuthor

wnienhaus commentedJun 30, 2025
edited
Loading

Specifying base 0 was not a solution, as this resulted in parsing behaviour different from GNU as.

I guess the simplest fix here would be to just replaceint(x) withint(x, 0). That should restore the existing behaviour. But it looks like you want to improve things further, which is great!

Yes,int(x, 0) would have restored the previous behaviour, but it wasn't the behaviour we needed. Of course it wasn't the behaviour we needed even before, so this PR technically fixes 2 things - adapt to the new MicroPython behaviour and fix parsing behaviour to match GNU as.

That said, sinceint(x, 0) exists, and because it's really just octal parsing we need extra, I just tried this simpler approach:

defparse_int(literal):iflen(literal)>2:prefix_start=1ifliteral[0]=='-'else0# skip over negative sign if presentifliteral[prefix_start]=='0'andliteral[prefix_start+1]in'123456789':returnint(literal,8)returnint(literal,0)

and all tests still pass.

So it's really just the octal case that's different (and theoretically we should disallow python style octal (0b..), but I had already decided to support it).

Now I am starting the overthink this:

  • what is better for clarity and/or long-term stability? To handle all cases we support explicitly (as I am doing now)? Or to just handle the extra octal case we need?
  • I see very little performance difference, and the shorter code saves perhaps a few bytes of memory, but it's probably not worth quibbling over.

I think I'll keep the current approach, as it's very explicit about what we support (including explicitly supporting python style octal). I'll just remove the comment about legacy octal format, because from the GNU as perspective, it's the currently valid and only possible octal format.

(Happy to get feedback on my chosen approach)

@wnienhaus
Copy link
CollaboratorAuthor

Ok. Fixes pushed. Will squash-merge this once approved.

@wnienhauswnienhausforce-pushed thefix-int-parsing-with-base-prefix branch 2 times, most recently fromf7dfddc to5c84d08CompareJune 30, 2025 11:12
@dpgeorge
Copy link
Member

That said, sinceint(x, 0) exists, and because it's really just octal parsing we need extra, I just tried this simpler approach:

IMO that's quite a bit simpler and easier to read/understand. I would vote for this approach.

Or a little simpler still:

defparse_int(literal):iflen(literal)>=2andliteral.startswith(("0","-0"))andliteral.lstrip("-0").isdigt():returnint(literal,8)returnint(literal,0)

Note: I think you need the>= 2 to cover cases like07.

@wnienhaus
Copy link
CollaboratorAuthor

Thanks for that feedback. Shorter is nice, so let's go with that. MicroPython'sstartswith does not support multiple values (tuple), so I expanded that into two separate conditions.

The>=2 is indeed needed in the shorter version of the code (both yours and also my earlier test code - well spotted). In the previous longer version all<=2 cases were also valid decimal, so they worked fine via the fall-through case.

I added some extra tests to show more cases that (should) work as expected, e.g.07 as you mentioned. I also noticed, my short version did not handle000010 correctly, i.e. octal with extra zero padding. GNU as understands that as decimal 8. My long parsing code worked correctly for that case, but my short version did not.

Anyway, your proposed code handles all cases correctly, so I used it now. And now there are tests to verify they all work as intended.

Thanks.

@dpgeorge
Copy link
Member

MicroPython'sstartswith does not support multiple values (tuple), so I expanded that into two separate conditions.

Ah, it will in the next release! But in the interest of backwards compatibility, best not to rely on that yet.

Anyway, your proposed code handles all cases correctly, so I used it now. And now there are tests to verify they all work as intended.

Very good!

wnienhaus reacted with thumbs up emoji

MicroPython 1.25.0 introduced a breaking change, aligning the behaviourof the int() function with the behaviour of CPython (assume a decimalnumber, unless a base is specified. Only if a base of 0 is specifiedwill the base be inferred from the string).This commit implements a new custom parsing function `parse_int`. Itcan correctly parse the following string literals:* 0x[0-9]+ -> treated as hex* 0b[0-9]+ -> treated as binary* 0o[0-9]+ -> treated as octal (Python style)* 0[0-9]+ -> treated as octal (GNU as style)* anything else parsed as decimalIt only handles the GNU as style octal case directly, letting theoriginal `int()` function handle the other cases (using base 0).In fact, the GNU as octal case was not handled correctly previously,and this commit fixes that.Some new tests for previous functionality were added to show thatboth new and previous cases are being handled correctly.Note: GNU as does not actually accept the octal prefix 0o..., but weaccept it as a convenience, as this is accepted in Python code. Thismeans however, that our assembler accepts code which GNU as does notaccept. But the other way around, we still accept all code that GNUas accepts, which was one of our goals.
@wnienhauswnienhausforce-pushed thefix-int-parsing-with-base-prefix branch from57d06d9 toda5d928CompareJuly 8, 2025 08:49
@wnienhaus
Copy link
CollaboratorAuthor

Force-pushed after squashing the commits. Will merge next.

@wnienhauswnienhaus merged commit1098afb intomicropython:masterJul 8, 2025
1 check passed
Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment

Reviewers

@dpgeorgedpgeorgedpgeorge approved these changes

@ThomasWaldmannThomasWaldmannAwaiting requested review from ThomasWaldmann

Assignees

No one assigned

Labels

None yet

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

4 participants

@wnienhaus@mjaspers2mtu@dpgeorge@ThomasWaldmann

[8]ページ先頭

©2009-2025 Movatter.jp