Movatterモバイル変換


[0]ホーム

URL:


Skip to content
DEV Community
Log in Create account

DEV Community

Cover image for Obfuscating “Hello world!” obfuscate on Python
Teddy Zugana
Teddy Zugana

Posted on

Obfuscating “Hello world!” obfuscate on Python

create the weirdest obfuscated program that prints the string “Hello world!”. I decided to write up an explanation of how the hell it works. So, here’s the entry, in Python 2.7:

(lambda _, __, ___, ____, _____, ______, _______, ________:    getattr(        __import__(True.__class__.__name__[_] + [].__class__.__name__[__]),        ().__class__.__eq__.__class__.__name__[:__] +        ().__iter__().__class__.__name__[_____:________]    )(        _, (lambda _, __, ___: _(_, __, ___))(            lambda _, __, ___:                chr(___ % __) + _(_, __, ___ // __) if ___ else                (lambda: _).func_code.co_lnotab,            _ << ________,            (((_____ << ____) + _) << ((___ << _____) - ___)) + (((((___ << __)            - _) << ___) + _) << ((_____ << ____) + (_ << _))) + (((_______ <<            __) - _) << (((((_ << ___) + _)) << ___) + (_ << _))) + (((_______            << ___) + _) << ((_ << ______) + _)) + (((_______ << ____) - _) <<            ((_______ << ___))) + (((_ << ____) - _) << ((((___ << __) + _) <<            __) - _)) - (_______ << ((((___ << __) - _) << __) + _)) + (_______            << (((((_ << ___) + _)) << __))) - ((((((_ << ___) + _)) << __) +            _) << ((((___ << __) + _) << _))) + (((_______ << __) - _) <<            (((((_ << ___) + _)) << _))) + (((___ << ___) + _) << ((_____ <<            _))) + (_____ << ______) + (_ << ___)        )    ))(    *(lambda _, __, ___: _(_, __, ___))(        (lambda _, __, ___:            [__(___[(lambda: _).func_code.co_nlocals])] +            _(_, __, ___[(lambda _: _).func_code.co_nlocals:]) if ___ else []        ),        lambda _: _.func_code.co_argcount,        (            lambda _: _,            lambda _, __: _,            lambda _, __, ___: _,            lambda _, __, ___, ____: _,            lambda _, __, ___, ____, _____: _,            lambda _, __, ___, ____, _____, ______: _,            lambda _, __, ___, ____, _____, ______, _______: _,            lambda _, __, ___, ____, _____, ______, _______, ________: _        )    ))
Enter fullscreen modeExit fullscreen mode

String literals weren’t allowed, but I set some other restrictions for fun: it had to be a single expression (so no print statement) with minimal builtin usage and no integer literals.
Getting started

Since we can’t use print, we can write to the stdout file object:

import syssys.stdout.write("Hello world!\n")
Enter fullscreen modeExit fullscreen mode

But let’s use something lower-level: os.write(). We need stdout’s file descriptor, which is 1 (you can check with print sys.stdout.fileno()).

import osos.write(1, "Hello world!\n")
Enter fullscreen modeExit fullscreen mode

We want a single expression, so we’ll useimport():

__import__("os").write(1, "Hello world!\n")
Enter fullscreen modeExit fullscreen mode

We also want to be able to obfuscate the write(), so we’ll throw in a getattr():

getattr(__import__("os"), "write")(1, "Hello world!\n")
Enter fullscreen modeExit fullscreen mode

This is the starting point. Everything from now on will be obfuscating the three strings and the int.
Stringing together strings

"os" and "write" are fairly simple, so we’ll create them by joining parts of the names of various built-in classes. There are many different ways to do this, but I chose the following:

"o" from the second letter of bool: True.__class__.__name__[1]"s" from the third letter of list: [].__class__.__name__[2]"wr" from the first two letters of wrapper_descriptor, an implementation detail in CPython found as the type of some builtin classes’ methods (more on that here): ().__class__.__eq__.__class__.__name__[:2]"ite" from the sixth through eighth letters of tupleiterator, the type of object returned by calling iter() on a tuple: ().__iter__().__class__.__name__[5:8]
Enter fullscreen modeExit fullscreen mode

We’re starting to make some progress!

getattr(    __import__(True.__class__.__name__[1] + [].__class__.__name__[2]),    ().__class__.__eq__.__class__.__name__[:2] +    ().__iter__().__class__.__name__[5:8])(1, "Hello world!\n")
Enter fullscreen modeExit fullscreen mode

"Hello world!\n" is more complicated. We’re going to encode it as a big integer, which will be formed of the ASCII code of each character multiplied by 256 to the power of the character’s index in the string. In other words, the following sum:
∑n=0L−1cn(256n)

where L
is the length of the string and cn is the ASCII code of the n

th character in the string. To create the number:

>>> codes = [ord(c) for c in "Hello world!\n"]>>> num = sum(codes[i] * 256 ** i for i in xrange(len(codes)))>>> print num802616035175250124568770929992
Enter fullscreen modeExit fullscreen mode

Now we need the code to convert this number back into a string. We use a simple recursive algorithm:

>>> def convert(num):...     if num:...         return chr(num % 256) + convert(num // 256)...     else:...         return ""...>>> convert(802616035175250124568770929992)'Hello world!\n'
Enter fullscreen modeExit fullscreen mode

Rewriting in one line with lambda:

convert = lambda num: chr(num % 256) + convert(num // 256) if num else ""
Enter fullscreen modeExit fullscreen mode

Now we use anonymous recursion to turn this into a single expression. This requires a combinator. Start with this:

>>> comb = lambda f, n: f(f, n)>>> convert = lambda f, n: chr(n % 256) + f(f, n // 256) if n else "">>> comb(convert, 802616035175250124568770929992)'Hello world!\n'
Enter fullscreen modeExit fullscreen mode

Now we just substitute the two definitions into the expression, and we have our function:

>>> (lambda f, n: f(f, n))(...     lambda f, n: chr(n % 256) + f(f, n // 256) if n else "",...     802616035175250124568770929992)'Hello world!\n'
Enter fullscreen modeExit fullscreen mode

Now we can stick this into our code from before, replacing some variable names along the way (f →, n → _):

getattr(    __import__(True.__class__.__name__[1] + [].__class__.__name__[2]),    ().__class__.__eq__.__class__.__name__[:2] +    ().__iter__().__class__.__name__[5:8])(    1, (lambda _, __: _(_, __))(        lambda _, __: chr(__ % 256) + _(_, __ // 256) if __ else "",        802616035175250124568770929992    ))
Enter fullscreen modeExit fullscreen mode

Function internals

We’re left with a "" in the body of our convert function (remember: no string literals!), and a large number that we’ll have to hide somehow. Let’s start with the empty string. We can make one on the fly by examining the internals of some random function:

>>> (lambda: 0).func_code.co_lnotab''
Enter fullscreen modeExit fullscreen mode

What we’re really doing here is looking at the line number table of the code object contained within the function. Since it’s anonymous, there are no line numbers, so the string is empty. Replace the 0 with _ to make it more confusing (it doesn’t matter, since the function’s not being called), and stick it in. We’ll also refactor out the 256 into an argument that gets passed to our obfuscated convert() along with the number. This requires adding an argument to the combinator:

getattr(    __import__(True.__class__.__name__[1] + [].__class__.__name__[2]),    ().__class__.__eq__.__class__.__name__[:2] +    ().__iter__().__class__.__name__[5:8])(    1, (lambda _, __, ___: _(_, __, ___))(        lambda _, __, ___:            chr(___ % __) + _(_, __, ___ // __) if ___ else            (lambda: _).func_code.co_lnotab,        256,        802616035175250124568770929992    ))
Enter fullscreen modeExit fullscreen mode

A detour

Let’s tackle a different problem for a bit. We want a way to obfuscate the numbers in our code, but it’ll be cumbersome (and not particularly interesting) to recreate them each time they’re used. If we can implement, say, range(1, 9) == [1, 2, 3, 4, 5, 6, 7, 8], then we can wrap our current work in a function that takes variables containing the numbers from 1 to 8, and replace occurrences of integer literals in the body with these variables:

(lambda n1, n2, n3, n4, n5, n6, n7, n8:    getattr(        __import__(True.__class__.__name__[n1] + [].__class__.__name__[n2]),        ...    )(        ...    ))(*range(1, 9))
Enter fullscreen modeExit fullscreen mode

Even though we need to form 256 and 802616035175250124568770929992 as well, these can be created using arithmetic operations on these eight “fundamental” numbers. The choice of 1–8 is arbitrary, but seems to be a good middle ground.

We can get the number of arguments a function takes via its code object:

>>> (lambda a, b, c: 0).func_code.co_argcount
Enter fullscreen modeExit fullscreen mode

Build a tuple of functions with argcounts between 1 and 8:

funcs = (    lambda _: _,    lambda _, __: _,    lambda _, __, ___: _,    lambda _, __, ___, ____: _,    lambda _, __, ___, ____, _____: _,    lambda _, __, ___, ____, _____, ______: _,    lambda _, __, ___, ____, _____, ______, _______: _,    lambda _, __, ___, ____, _____, ______, _______, ________: _)
Enter fullscreen modeExit fullscreen mode

Using a recursive algorithm, we can turn this into the output of range(1, 9):

>>> def convert(L):...     if L:...         return [L[0].func_code.co_argcount] + convert(L[1:])...     else:...         return []...>>> convert(funcs)[1, 2, 3, 4, 5, 6, 7, 8]
Enter fullscreen modeExit fullscreen mode

As before, we convert this into lambda form:

convert = lambda L: [L[0].func_code.co_argcount] + convert(L[1:]) if L else []
Enter fullscreen modeExit fullscreen mode

Then, into anonymous-recursive form:

>>> (lambda f, L: f(f, L))(...     lambda f, L: [L[0].func_code.co_argcount] + f(f, L[1:]) if L else [],...     funcs)[1, 2, 3, 4, 5, 6, 7, 8]
Enter fullscreen modeExit fullscreen mode

For fun, we’ll factor out the argcount operation into an additional function argument, and obfuscate some variable names:

(lambda _, __, ___: _(_, __, ___))(    (lambda _, __, ___:        [__(___[0])] + _(_, __, ___[1:]) if ___ else []    ),    lambda _: _.func_code.co_argcount,    funcs)
Enter fullscreen modeExit fullscreen mode

There’s a new problem now: we still need a way to hide 0 and 1. We can get these by examining the number of local variables within arbitrary functions:

>>> (lambda: _).func_code.co_nlocals0>>> (lambda _: _).func_code.co_nlocals1
Enter fullscreen modeExit fullscreen mode

Even though the function bodies look the same, _ in the first function is not an argument, nor is it defined in the function, so Python interprets it as a global variable:

>>> import dis>>> dis.dis(lambda: _)  1           0 LOAD_GLOBAL              0 (_)              3 RETURN_VALUE>>> dis.dis(lambda _: _)  1           0 LOAD_FAST                0 (_)              3 RETURN_VALUE
Enter fullscreen modeExit fullscreen mode

This happens regardless of whether _ is actually defined in the global scope.

Putting this into practice:

(lambda _, __, ___: _(_, __, ___))(    (lambda _, __, ___:        [__(___[(lambda: _).func_code.co_nlocals])] +        _(_, __, ___[(lambda _: _).func_code.co_nlocals:]) if ___ else []    ),    lambda _: _.func_code.co_argcount,    funcs)
Enter fullscreen modeExit fullscreen mode

Now we can substitute the value of funcs in, and then using * to pass the resulting list of integers as eight separate variables, we get this:

(lambda n1, n2, n3, n4, n5, n6, n7, n8:    getattr(        __import__(True.__class__.__name__[n1] + [].__class__.__name__[n2]),        ().__class__.__eq__.__class__.__name__[:n2] +        ().__iter__().__class__.__name__[n5:n8]    )(        n1, (lambda _, __, ___: _(_, __, ___))(            lambda _, __, ___:                chr(___ % __) + _(_, __, ___ // __) if ___ else                (lambda: _).func_code.co_lnotab,            256,            802616035175250124568770929992        )    ))(    *(lambda _, __, ___: _(_, __, ___))(        (lambda _, __, ___:            [__(___[(lambda: _).func_code.co_nlocals])] +            _(_, __, ___[(lambda _: _).func_code.co_nlocals:]) if ___ else []        ),        lambda _: _.func_code.co_argcount,        (            lambda _: _,            lambda _, __: _,            lambda _, __, ___: _,            lambda _, __, ___, ____: _,            lambda _, __, ___, ____, _____: _,            lambda _, __, ___, ____, _____, ______: _,            lambda _, __, ___, ____, _____, ______, _______: _,            lambda _, __, ___, ____, _____, ______, _______, ________: _        )    ))
Enter fullscreen modeExit fullscreen mode

Shifting bits

Almost there! We’ll replace the n{1..8} variables with, _,, _, etc., since it creates confusion with the variables used in our inner functions. This doesn’t cause actual problems, since scoping rules mean the right ones will be used. This is also one of the reasons why we refactored 256 out to where _ refers to 1 instead of our obfuscated convert() function. It’s getting long, so I’ll paste only the first half:

(lambda _, __, ___, ____, _____, ______, _______, ________:    getattr(        __import__(True.__class__.__name__[_] + [].__class__.__name__[__]),        ().__class__.__eq__.__class__.__name__[:__] +        ().__iter__().__class__.__name__[_____:________]    )(        _, (lambda _, __, ___: _(_, __, ___))(            lambda _, __, ___:                chr(___ % __) + _(_, __, ___ // __) if ___ else                (lambda: _).func_code.co_lnotab,            256,            802616035175250124568770929992        )    ))
Enter fullscreen modeExit fullscreen mode

Only two more things are left. We’ll start with the easy one: 256. 256=28

, so we can rewrite it as 1 << 8 (using a left bit shift), or _ << ________ with our obfuscated variables.

We’ll use the same idea with 802616035175250124568770929992. A simple divide-and-conquer algorithm can break it up into sums of numbers which are themselves sums of numbers that are shifted together, and so on. For example, if we had 112, we could break it up into 96 + 16 and then (3 << 5) + (2 << 3). I like using bit shifts because the << reminds me of std::cout << "foo" in C++, or print chevron (print >>) in Python, both of which are red herrings involving other ways of doing I/O.

The number can be decomposed in a variety of ways; no one method is correct (after all, we could just break it up into (1 << 0) + (1 << 0) + ..., but that’s not interesting). We should have some substantial amount of nesting, but still use most of our numerical variables. Obviously, doing this by hand isn’t fun, so we’ll come up with an algorithm. In pseudocode:

func encode(num):    if num <= 8:        return "_" * num    else:        return "(" + convert(num) + ")"func convert(num):    base = shift = 0    diff = num    span = ...    for test_base in range(span):        for test_shift in range(span):            test_diff = |num| - (test_base << test_shift)            if |test_diff| < |diff|:                diff = test_diff                base = test_base                shift = test_shift    encoded = "(" + encode(base) + " << " + encode(shift) + ")"    if diff == 0:        return encoded    else:        return encoded + " + " + convert(diff)convert(802616035175250124568770929992)
Enter fullscreen modeExit fullscreen mode

The basic idea here is that we test various combinations of numbers in a certain range until we come up with two numbers, base and shift, such that base << shift is as closest to num as possible (i.e. we minimize their absolute difference, diff). We then use our divide-and-conquer algorithm to break up best_base and best_shift, and then repeat the procedure on diff until it reaches zero, summing the terms along the way.

The argument to range(), span, represents the width of the search space. This can’t be too large, or we’ll end getting num as our base and 0 as our shift (because diff is zero), and since base can’t be represented as a single variable, it’ll repeat, recursing infinitely. If it’s too small, we’ll end up with something like the (1 << 0) + (1 << 0) + ... mentioned above. In practice, we want span to get smaller as the recursion depth increases. Through trial and error, I found this equation to work well:
span=⌈log1.5|num|⌉+⌊24−depth⌋

Translating the pseudocode into Python and making some tweaks (support for the depth argument, and some caveats involving negative numbers), we get this:

from math import ceil, logdef encode(num, depth):    if num == 0:        return "_ - _"    if num <= 8:        return "_" * num    return "(" + convert(num, depth + 1) + ")"def convert(num, depth=0):    result = ""    while num:        base = shift = 0        diff = num        span = int(ceil(log(abs(num), 1.5))) + (16 >> depth)        for test_base in xrange(span):            for test_shift in xrange(span):                test_diff = abs(num) - (test_base << test_shift)                if abs(test_diff) < abs(diff):                    diff = test_diff                    base = test_base                    shift = test_shift        if result:            result += " + " if num > 0 else " - "        elif num < 0:            base = -base        if shift == 0:            result += encode(base, depth)        else:            result += "(%s << %s)" % (encode(base, depth),                                      encode(shift, depth))        num = diff if num > 0 else -diff    return result
Enter fullscreen modeExit fullscreen mode

Now, when we call convert(802616035175250124568770929992), we get a nice decomposition:

>>> convert(802616035175250124568770929992)(((_____ << ____) + _) << ((___ << _____) - ___)) + (((((___ << __) - _) << ___) + _) << ((_____ << ____) + (_ << _))) + (((_______ << __) - _) << (((((_ << ___) + _)) << ___) + (_ << _))) + (((_______ << ___) + _) << ((_ << ______) + _)) + (((_______ << ____) - _) << ((_______ << ___))) + (((_ << ____) - _) << ((((___ << __) + _) << __) - _)) - (_______ << ((((___ << __) - _) << __) + _)) + (_______ << (((((_ << ___) + _)) << __))) - ((((((_ << ___) + _)) << __) + _) << ((((___ << __) + _) << _))) + (((_______ << __) - _) << (((((_ << ___) + _)) << _))) + (((___ << ___) + _) << ((_____ << _))) + (_____ << ______) + (_ << ___)
Enter fullscreen modeExit fullscreen mode

Stick this in as a replacement for 802616035175250124568770929992, and put all the parts together:

(lambda _, __, ___, ____, _____, ______, _______, ________:    getattr(        __import__(True.__class__.__name__[_] + [].__class__.__name__[__]),        ().__class__.__eq__.__class__.__name__[:__] +        ().__iter__().__class__.__name__[_____:________]    )(        _, (lambda _, __, ___: _(_, __, ___))(            lambda _, __, ___:                chr(___ % __) + _(_, __, ___ // __) if ___ else                (lambda: _).func_code.co_lnotab,            _ << ________,            (((_____ << ____) + _) << ((___ << _____) - ___)) + (((((___ << __)            - _) << ___) + _) << ((_____ << ____) + (_ << _))) + (((_______ <<            __) - _) << (((((_ << ___) + _)) << ___) + (_ << _))) + (((_______            << ___) + _) << ((_ << ______) + _)) + (((_______ << ____) - _) <<            ((_______ << ___))) + (((_ << ____) - _) << ((((___ << __) + _) <<            __) - _)) - (_______ << ((((___ << __) - _) << __) + _)) + (_______            << (((((_ << ___) + _)) << __))) - ((((((_ << ___) + _)) << __) +            _) << ((((___ << __) + _) << _))) + (((_______ << __) - _) <<            (((((_ << ___) + _)) << _))) + (((___ << ___) + _) << ((_____ <<            _))) + (_____ << ______) + (_ << ___)        )    ))(    *(lambda _, __, ___: _(_, __, ___))(        (lambda _, __, ___:            [__(___[(lambda: _).func_code.co_nlocals])] +            _(_, __, ___[(lambda _: _).func_code.co_nlocals:]) if ___ else []        ),        lambda _: _.func_code.co_argcount,        (            lambda _: _,            lambda _, __: _,            lambda _, __, ___: _,            lambda _, __, ___, ____: _,            lambda _, __, ___, ____, _____: _,            lambda _, __, ___, ____, _____, ______: _,            lambda _, __, ___, ____, _____, ______, _______: _,            lambda _, __, ___, ____, _____, ______, _______, ________: _        )    ))
Enter fullscreen modeExit fullscreen mode

And there you have it.
Addendum: Python 3 support

Since writing this post, several people have asked about Python 3 support. I didn’t think of it at the time, but as Python 3 continues to gain traction (and thank you for that!), this post is clearly long overdue for an update.

Fortunately, Python 3 (as of writing, 3.6) doesn’t require us to change much:

The func_code function object attribute has been renamed to __code__. Easy fix with a find-and-replace.The tupleiterator type name has been changed to tuple_iterator. Since we use this to extract the substring "ite", we can get around this by changing our indexing in ().__iter__().__class__.__name__ from [_____:________] to [_:][_____:________].os.write() requires bytes now instead of a str, so chr(...) needs to be changed to bytes([...]).
Enter fullscreen modeExit fullscreen mode

Here is the full Python 3 version:

(lambda _, __, ___, ____, _____, ______, _______, ________:    getattr(        __import__(True.__class__.__name__[_] + [].__class__.__name__[__]),        ().__class__.__eq__.__class__.__name__[:__] +        ().__iter__().__class__.__name__[_:][_____:________]    )(        _, (lambda _, __, ___: _(_, __, ___))(            lambda _, __, ___:                bytes([___ % __]) + _(_, __, ___ // __) if ___ else                (lambda: _).__code__.co_lnotab,            _ << ________,            (((_____ << ____) + _) << ((___ << _____) - ___)) + (((((___ << __)            - _) << ___) + _) << ((_____ << ____) + (_ << _))) + (((_______ <<            __) - _) << (((((_ << ___) + _)) << ___) + (_ << _))) + (((_______            << ___) + _) << ((_ << ______) + _)) + (((_______ << ____) - _) <<            ((_______ << ___))) + (((_ << ____) - _) << ((((___ << __) + _) <<            __) - _)) - (_______ << ((((___ << __) - _) << __) + _)) + (_______            << (((((_ << ___) + _)) << __))) - ((((((_ << ___) + _)) << __) +            _) << ((((___ << __) + _) << _))) + (((_______ << __) - _) <<            (((((_ << ___) + _)) << _))) + (((___ << ___) + _) << ((_____ <<            _))) + (_____ << ______) + (_ << ___)        )    ))(    *(lambda _, __, ___: _(_, __, ___))(        (lambda _, __, ___:            [__(___[(lambda: _).__code__.co_nlocals])] +            _(_, __, ___[(lambda _: _).__code__.co_nlocals:]) if ___ else []        ),        lambda _: _.__code__.co_argcount,        (            lambda _: _,            lambda _, __: _,            lambda _, __, ___: _,            lambda _, __, ___, ____: _,            lambda _, __, ___, ____, _____: _,            lambda _, __, ___, ____, _____, ______: _,            lambda _, __, ___, ____, _____, ______, _______: _,            lambda _, __, ___, ____, _____, ______, _______, ________: _        )    ))
Enter fullscreen modeExit fullscreen mode

Thank you for reading! I continue to be amazed by this post’s popularity.

Top comments(0)

Subscribe
pic
Create template

Templates let you quickly answer FAQs or store snippets for re-use.

Dismiss

Are you sure you want to hide this comment? It will become hidden in your post, but will still be visible via the comment'spermalink.

For further actions, you may consider blocking this person and/orreporting abuse

vi veri veniversum vivus vici Noob Teams
  • Location
    Indonesia Jakarta
  • Education
    Computer Science Sriwijaya University
  • Work
    Programmer at Icon Plus, Ezeelink Jakarta
  • Joined

More fromTeddy Zugana

DEV Community

We're a place where coders share, stay up-to-date and grow their careers.

Log in Create account

[8]ページ先頭

©2009-2025 Movatter.jp