1

I noticed that the difference between lower and uppercase is32. This seems like a perfect opportunity to utilize some clever bit manipulation. The problem is that it's been a long time since my Computer Architecture classes, and I'm a bit rusty with the concepts required. From what I do recall, depending on the CPU architecture and language representation of signed/unsigned there are a very small number of solutions that would apply to almost all programming languages with these operators. I'm interested in comparing these. I'm not interested in simply converting the case, I know there are "simpler" ways (for humans, at least). I'm interested in learning about how this problem interacts with the low-level representation of the data.

Please provide workable, minimal solutions for both lowercase->upper and uppercase->lower, for each common representation, as well as a reasonably detailed explanation of how they work.

askedOct 16, 2014 at 22:59
Floegipoky's user avatar
5
  • 1
    If there's a difference of 32 (which there is), then addition and subtraction are what you need. Nothing fancy. You may be thinking ofshifting to perform multiplication or division by powers of 2.CommentedOct 16, 2014 at 23:01
  • For ASCII assuming Cint c semantics:lower2upper(c) = c - 'a' + 'A' andupper2lower(c) = c - 'A' + 'a' and let the compiler do its work. For Unicode text it won't work.
    – jfs
    CommentedOct 17, 2014 at 1:08
  • 1
    you probably expectc | 0x60.
    – jfs
    CommentedOct 17, 2014 at 4:09
  • @J.F.Sebastian Yes, I'd like to see an explanation of how that works in relation to the low-level representation of the data, and how it would change for other common representations.CommentedOct 17, 2014 at 16:04
  • Do you want to know how various languages represent bytes as numbers and the corresponding bit-patterns if applicable? For example,How to combine two 32-bit integers into one 64-bit integer? Do you want to know what assembly/machine code is generated by various compilers/for target architectures?
    – jfs
    CommentedOct 20, 2014 at 4:47

1 Answer1

1

(Note: i'm usingpython here, but this is of course language agnostic. I also do speak aboutascii, so I'll use a 7-bit representation of things.)

If you look at the binary representation ofascii characters in the[a-z][A-Z] range, you'll notice two things:

>>> bin(ord('a'))'0b1100001'>>> bin(ord('A'))'0b1000001'>>> bin(ord('y'))'0b1111001'>>> bin(ord('Y'))'0b1011001'

First: they all have the seventh bit (from right) set.Second: the lowercase characters have the sixth bit (from right) set, the uppercase ones unset, and that's the only difference between a given uppercase character and it's lowercase version (et vice versa).

So all you have to do is flip that bit to toggle case - that would bexor 0b0100000 which isxor 0x20.

Tolower(), you have to set that bit, so you canor 0b0100000 which isor 0x20 - the already mentionedor 0x60 also works, as 0x60 is 0b1100000 and that bit is set anyway.

And to upper, you have to unset that bit, that would be "and the inverse of the mask 0b0100000", which is the same asand 0x5f.

To see it all in action, i wrote some python snipplets which check that what we just saw is true for every character in the english alphabet:

#toggle():>>> ''.join(chr(ord(c)^0x20) for c in string.ascii_lowercase) == string.ascii_uppercaseTrue>>> ''.join(chr(ord(c)^0x20) for c in string.ascii_uppercase) == string.ascii_lowercaseTrue#lower():>>> ''.join(chr(ord(c)|0x20) for c in string.ascii_lowercase) == string.ascii_lowercaseTrue>>> ''.join(chr(ord(c)|0x20) for c in string.ascii_uppercase) == string.ascii_lowercase#upper():>>> ''.join(chr(ord(c)&0x5f) for c in string.ascii_lowercase) == string.ascii_uppercaseTrue>>> ''.join(chr(ord(c)&0x5f) for c in string.ascii_uppercase) == string.ascii_uppercaseTrue

it does not do anything useful to' ','\n', and such though!

answeredOct 21, 2014 at 13:20
ch3ka's user avatar

Your Answer

Sign up orlog in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

By clicking “Post Your Answer”, you agree to ourterms of service and acknowledge you have read ourprivacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.