Movatterモバイル変換


[0]ホーム

URL:


Skip to content
DEV Community
Log in Create account

DEV Community

Danie Palm
Danie Palm

Posted on

     

Programs as proteins - a ribosome programmed in Joy

Ribosomes are enzymes that are part protein and part RNA. They play a central role in the translation of messenger RNA to proteins according to the genetic code. And yet all the protein subunits of ribosomes are themselves the products of messenger RNA that was translated according to the genetic code by... a ribosome. Ultimately, the messenger RNA that codes for the protein subunits and the ribosomal RNA that directly contributes to the translation function of the ribosome are both the products of DNA transcription, so that the ribosome, like other enzymes, are fully represented in the genome. Without the ribosome there would be no interpreter of the genetic code and no point in having any genes. The ribosome is therefore the most suitable starting place for our Joy-based artificial life system.

We have previously touched on the Y combinator, a function that is able to produce a fixed point of other functions (or function compositions, such as entire Joy programs). In this post we will finally put the Y combinator and its anonymous recursion to good use when we loop through the catalytic steps of the ribosome.

Biology

Ribosomes are roughly half rRNA and half protein. The rRNA component is made up of 4 rRNA strands and the protein component comprises about 80 proteins. The exact composition differs between bacteria, archaea, and eukaryotes, but the overall function and structure is largely comparable: all ribosomes are made up of a small and a large subunit, both of which is a mix of rRNA and proteins.

The process of translation begins when the small ribosomal subunit binds to mRNA, either at the start (the 5'-end) or just prior to the start codon (AUG). It then scans towards the 3'-end until it reaches the start codon, at which point the large ribosomal subunit is also recruited to form the full ribosome.

Once the ribosomal complex is fully initiated, an amino acid-charged tRNA that is complementary to the start codon of the mRNA strand binds to a position called site A. Right after binding, the ribosome moves one codon to the right, so that the amino acid-charged tRNA is now located at a position known as site P. The next codon of the mRNA is now located at site A and once again a complementary charged tRNA binds to it. The amino acid at site P is now transfered to the amino acid at site A, and then the ribosome once again moves one codon to the right. Site P is now bound to a tRNA molecule that is charged with a peptide chain of two amino acids. This process continues until a stop codon is reached and on each iteration the peptide chain at site P grows by one amino acid, building up a protein of which the amino acid sequence is dictated by the mRNA strand being translated.

It is somewhat interesting to note that the protein subunits of the ribosome are not actively involved at the translation sites, but rather seem to support the rRNA that catalyses the elongation reaction. This could perhaps be an indication that primitive ribosomes were entirely made up of rRNA.

Code

Let's dive right in and take a look at the high-level logic that we want from a ribosome in the form of an Elixir function. The function takes a polypeptide (protein) list as the first argument and an mRNA list as the second. The first invocation should be with an empty polypeptide and an mRNA list of any length. It calls some undefined functions of which the names are hopefully self-explanatory.

defribosome_worker(polypeptide,mrna)doifEnum.count(mrna)>=3doifpolypeptide==[]do{codon,mrna}=Enum.split(mrna,3)ifcodon==[:a,:u,:g]do# Initiate translationamino_acid=translate(codon)polypeptide=polypeptide++amino_acidribosome_worker(polypeptide,mrna)else# Continue searching for start codon (AUG)[_first_base|last_two_bases]=codonmrna=last_two_bases++mrnaribosome_worker(polypeptide,mrna)endelse# Elongate previously initiated polypeptide{codon,mrna}=Enum.split(mrna,3)amino_acid=translate(codon)ifamino_acid==[]do# the proper Elixir counterpart of what we're doing here# detracts from the argument, so we move along insteadstash_current_polypeptide_somewhere_and_start_a_new_one()elsepolypeptide=polypeptide++amino_acidendribosome_worker(polypeptide,mrna)endelsepolypeptideendend
Enter fullscreen modeExit fullscreen mode

In short, we want to scan over a messenger RNA (mRNA) strand (a list of atoms:a,:c,:g or:u) for as long as it is longer than or equal to 3 bases. As soon as the sublist[:a, :u, :g] is reached, protein synthesis is initiated. This initiation involves translating theAUG codon to its corresponding amino acid (methionine in this case), which forms the first amino acid of the polypeptide (protein) chain. After initiation we recurse. This time however, the polypeptide is not empty and so we don't search for the initiatingAUG codon, but instead translate the next codon and attach the corresponding amino acid to the end of the growing polypeptide. Unless the codon corresponds to a stop codon, returning an empty amino acid, in which case the existing polypeptide chain is stashed (code for this is not shown) and a new chain is started. In either case, we once a again recurse, so that we either elongate the polypeptide or search for a new start codon. This process continues until the mRNA strand has less than 3 bases left, in which case it can contain no more codons so that the translation process is terminated.

The sequence as described above does not correspond exactly to the real process of translation of which there are many variations. For instance, when a ribosome encounters a stop codon, it detaches from the mRNA strand instead of continuing to search for the next occurence ofAUG.

The problem we now face, is to turn the above pseudocode into Joy. Some of the challenges we encounter in this process are:

  • Joy, even without the restrictions we place on it, does not have the notion of variables
  • theribosome_worker function is recursive, but since it is not one of our primitives (likedup,swap, etc.) there is no direct way to make a recursive call in the Joy version
  • we are using a minimal subset of Joy in which we don't have numbers
  • the pseudocode is in a mixture of Polish and infix notation, but Joy exclusively makes use of the reverse Polish notation

Joy doesn't have variables. Functions can only manipulate or act on values on the stack. Most Joy primitives only operate on the topmost one or two stack values. Before executing a function, it is therefore usually necessary to first manipulate the stack to make sure that the values expected by the function are located at the correct stack positions. Similarly, it may be necessary to do some stack cleanup after a function has been executed. In other words, it is sometimes necessary to dig out values from deep in the stack and place them on top, and other times it is necessary to bury values from the top of the stack deep underneath some others. Let's look at a family of such dig and bury functions.

We are already familiar with a function that will bury the top element (and simultaneously dig out the second element) -swap. But it is possible to bury the top elementn positions deep with:

bury1 == [[] cons] dip swap ibury2 == [[] cons cons] dip swap ibury3 == [[] cons cons cons] dip swap i
Enter fullscreen modeExit fullscreen mode

in whichcons is repeatedn times.

For example:

[A] [B] [C] bury2 == [C] [A] [B]

Here is how it works:

[A] [B] [C] bury2[A] [B] [C] [[] cons cons] dip swap i     (definition)[A] [B] [] cons cons [C] swap i           (dip)[A] [[B]] cons [C] swap i                 (cons)[[A] [B]] [C] swap i                      (cons)[C] [[A] [B]] i                           (swap)[C] [A] [B]                               (i)
Enter fullscreen modeExit fullscreen mode

Wecons up two items into a single list. Then weswap the list with the top element and unpack the consed up list again. Similary, we can dig outn elements with:

dig1 == [] cons dipdig2 == [] cons cons dipdig3 == [] cons cons cons dip
Enter fullscreen modeExit fullscreen mode

Note thatdig1 andbury1 are both equivalent toswap. SeeThe Theory of Concatenative Combinators for a more detailed treatment. We will not be usingdign orburyn directly because we generally want to employ as few as possible amino acids. Instead, we will be writing out thebury anddig family of functions in full whenever they are needed.

Let's put this to the test. Suppose we have a function that takes 3 parameters: a numbern and two stringseven_string andodd_string. The function must print the value ofeven_string ifn is even and the value ofodd_string otherwise. In Elixir, it could look like this:

defodd_or_even(n,even_string,odd_string)doifrem(n,2)==0doIO.puts(even_string)elseIO.puts(odd_string)endend
Enter fullscreen modeExit fullscreen mode

A corresponding Joy function, allowing the use of numbers and strings, could look like this:

[odd_or_even][    [        # check if the remainder of n divided by 2 is 0    ]    [        # print odd string    ]    [        # print even string    ]    ifte]define
Enter fullscreen modeExit fullscreen mode

Remember thatifte is the Joy counterpart of an if-then-else block. Now, the remainder check could look like:

2 rem 0 equal
Enter fullscreen modeExit fullscreen mode

This assumes that the value representingn is at the top of the stack. However, if the function is to be called like4 "I am odd" "I am even" odd_or_even, thenn is two places below the top and needs to be dug up first.

Similarly, for the case in which we want to print the value ofeven_string, we can't just writeputs (the Joy built-in that writes to the standard output), because that would expecteven_string to be at the top of the stack. Instead we have to dig once (or swap) to get it. Lastly, in order to print the value ofodd_string, we can simply useputs becauseodd_string is indeed at the top. Here is the full Joy version:

[odd_or_even][    [        dig2          # dig up n        2 rem 0 equal # check if n is even    ]    [        swap puts     # print even_string if n is even    ]    [        puts          # print odd_string otherwise    ]    ifte]define
Enter fullscreen modeExit fullscreen mode

Now that we know how to manipulate the stack, it is time to tackle recursion. Consider the following recursive function (assuming only non-negative values ofn):

defsum(total,n)doifn==0doIO.puts(total)elsesum(total+n,n-1)endend
Enter fullscreen modeExit fullscreen mode

Callingsum(0, 3) should print out the number 6. Here is the corresponding Joy version, with all the necessary digging and burying:

[sum][    [0 equal] # n is at the top, so no digging    [        swap  # dig out total, which is below n        puts  # ... and print it    ]    [        dup   # duplicate n, which is at the top        dig2  # dig out total which is now buried under two n's        +     # sum total and the n below it        swap  # so that the new total is under the n that remains from the earlier dup        1 -   # subtract 1 from the n on top        sum   # recurse    ]    ifte]define
Enter fullscreen modeExit fullscreen mode

The else-block that makes the recursive call is particularly involved, because it needs to first manipulate the stack so that it is ready for the first calculationtotal = total + n and then to prepare for the secod calculationn = n - 1. And finally it has to placetotal andn on the stack in the order that is expected by thesum function in preparation for the recursive call.

While the function above works, we are interested in a version of it that uses anonymous recursion. That is, instead of directly callingsum recursively, we need to execute something that is equivalent but unnamed. In a previous post, we have established the following effect of the Y combinator:

[p] y == [q] p
Enter fullscreen modeExit fullscreen mode

where[q] (short for[[dup cons p] dup cons p]), when executed, again yields[q] p. This means that if the programp decides to executeq by for instance invokingi when[q] is on top of the stack, then the result is that[q] is once again placed on top of the stack andp is once again executed, having again the option to unquote[q] whenever its logic requires it to do so. In this wayp can effectively call itself anonymously.

We can now modify the definition ofsum to make use of the Y combinator as follows:

[sum][    [        [dig1 0 equal] # [q] is now above n, so we have to dig n out        [            dig2       # dig out total, which is below n and [q]            puts       # ... and print it        ]        [            dig1 dup   # dig out and duplicate n            dig3       # dig out total which is now buried under two n's and [q]            +          # sum total and the n below it            bury2      # bury the new total under [q]            1 -        # subtract 1 from the n that is now on top            bury1      # bury n under [q]            i          # recurse: execute [q] which is on top        ]        ifte    ] y]define
Enter fullscreen modeExit fullscreen mode

We have now directly or indirectly addressed all the challenges we faced in translating pseudcode to restricted Joy. Perhaps a final note on the absence of numbers is warranted. While we won't support numbers (there are too many of them and we aim to use as few symbols as possible), there is no limitation on performing certain operations a number of times. For instance, the predicateEnum.count(mrna) >= 3, can be calculated without using numbers:

not Enum.empty?(mrna) and not Enum.empty?(tl(mrna)) and not Enum.empty?(tl(tl(mrna)))
Enter fullscreen modeExit fullscreen mode

which is readily converted to Joy.

Without further ado, here is our ribosome, first with high-level functions likeburyn anddign and itermediate definitions, and then in austere form using only primitives.

High-level ribosome:

[ribosome][    [] swap    [ribosome-worker]    y]define
Enter fullscreen modeExit fullscreen mode
[ribosome-worker][    # check if the mrna list contains at least 3 items    [swap empty] [ ] [[swap rest empty] [ ] [[swap rest rest empty] [ ] [        [dig2 empty]            # if polypeptide is empty        [                       # then initiate            swap split3 swap    # dig up mrna and split at 3            [[a u g] equal]     # if codon equals [a u g]            [                   # then                translate       # translate codon to amino acid                dig3            # dig out polypeptide                swap cat        # dig out amino acid and concatenate to polypeptide                bury2           # bury polypeptide                swap            # bury mrna below [q]                i               # recurse            ]            [                   # else                rest            # drop the first element of the codon                swap cat        # prepend the rest to mrna                 swap            # bury mrna below [q]                i               # recurse            ]            ifte        ]        [                       # else elongate            swap split3 swap    # dig up mrna and split at 3            translate           # translate codon to amino acid            [empty]             # if amino acid is empty            [                   # then                bury2           # bury it as a new polypeptide            ]            [                   # else                dig3            # dig out polypeptide                swap cat        # dig out amino acid and concatenate to polypeptide                bury2           # bury polypeptide            ]            ifte            swap                # bury mrna below [q]            i                   # recurse        ]        ifte    ] ifte] ifte] ifte]define
Enter fullscreen modeExit fullscreen mode
[split3][    [] swap    dup [first unit cat] dip rest    dup [first unit cat] dip rest    dup [first unit cat] dip rest]define
Enter fullscreen modeExit fullscreen mode

Thesplit3 function is the counterpart of the Elixir functionEnum.split(mrna, 3). Its first effect is to bury an empty list underneath the mRNA, which it expects to be on top of the stack. It then duplicates the mRNA and concatenates the head of the first copy to the list underneath it on the stack, and then it drops the head of the second copy. This duplication process is repeated 3 times to correspond to the 3 insplit3.

Austere, but verbose ribosome:

[ribosome][    [] swap    [        [swap empty] [ ] [[swap rest empty] [ ] [[swap rest rest empty] [ ] [            [[] cons cons dip empty]            [                swap                [] swap                dup [first unit cat] dip rest                dup [first unit cat] dip rest                dup [first unit cat] dip rest                swap                [[a u g] equal]                [                    translate                    [] cons cons cons dip                    swap cat                    [[] cons cons] dip swap i                    swap                    i                ]                [                    rest                    swap cat                    swap                    i                ]                ifte            ]            [                swap                dup [first unit] dip rest                dup [first unit cat] dip rest                dup [first unit cat] dip rest                swap                translate                [empty]                [                    [[] cons cons] dip swap i                ]                [                    [] cons cons cons dip                    swap cat                    [[] cons cons] dip swap i                ]                ifte                swap                i            ]            ifte        ] ifte] ifte] ifte    ]    [dup cons] swap cat dup cons i pop pop]define
Enter fullscreen modeExit fullscreen mode

We've sneaked in a few things not present in the initial Elixir code, but only so that we don't have to provide the initial empty polypeptide to the function. I.e. we can callribosome with just an mRNA sequence:

[c a u g a a c a u u g a a a u g u g a a a a a u g a a a c] ribosome
Enter fullscreen modeExit fullscreen mode

The output of the above program are two lists of amino acids (a stop codon in the mix triggered the start of a new polypeptide):

[met asn ile glu met] [met lys]
Enter fullscreen modeExit fullscreen mode

So the Joy ribosome does what we set out to do at the beginning of this post, but it is not quite what we ultimately want and I've slipped in some cheating. Here are some things that still need to be addressed:

  1. The ribosome translates mRNA to amino acids according to thereal genetic code, whereas we really want it to translate mRNA to primitive Joy functions or combinators (artificial amino acids) according to an artificial genetic code.

  2. The ribosome can output multiple polypeptides, but these peptides are hopelessly flat - they contain no quotations. In essence the produced lists correspond to unfolded (primary structure) proteins, but Joy programs without quotations would hardly be capable of any interesting behaviour.

  3. I've treated thetranslate function as a primitive of Joy. This explains why I haven't provided its definition above. I wasn't able to provide it, because it is only defined at a language level, i.e. in the underlying Elixir implementation of Joy. The end result is that I've hardcoded the genetic code, injecting it from the outside, instead of representing it within the system. For completeness, here is the definition oftranslate:

deftranslate(stack)do[head|rest]=stackamino_acid=caseheaddo[:u,:u,:u]->[:phe][:u,:c,:u]->[:ser][:u,:a,:u]->[:tyr][:u,:g,:u]->[:cys][:u,:u,:c]->[:phe][:u,:c,:c]->[:ser][:u,:a,:c]->[:tyr][:u,:g,:c]->[:cys][:u,:u,:a]->[:leu][:u,:c,:a]->[:ser][:u,:a,:a]->[]# stop codon[:u,:g,:a]->[]# stop codon[:u,:u,:g]->[:leu][:u,:c,:g]->[:ser][:u,:a,:g]->[]# stop codon[:u,:g,:g]->[:trp][:c,:u,:u]->[:leu][:c,:c,:u]->[:pro][:c,:a,:u]->[:his][:c,:g,:u]->[:arg][:c,:u,:c]->[:leu][:c,:c,:c]->[:pro][:c,:a,:c]->[:his][:c,:g,:c]->[:arg][:c,:u,:a]->[:leu][:c,:c,:a]->[:pro][:c,:a,:a]->[:gln][:c,:g,:a]->[:arg][:c,:u,:g]->[:leu][:c,:c,:g]->[:pro][:c,:a,:g]->[:gln][:c,:g,:g]->[:arg][:a,:u,:u]->[:ile][:a,:c,:u]->[:thr][:a,:a,:u]->[:asn][:a,:g,:u]->[:ser][:a,:u,:c]->[:ile][:a,:c,:c]->[:thr][:a,:a,:c]->[:asn][:a,:g,:c]->[:ser][:a,:u,:a]->[:ile][:a,:c,:a]->[:thr][:a,:a,:a]->[:lys][:a,:g,:a]->[:arg][:a,:u,:g]->[:met][:a,:c,:g]->[:thr][:a,:a,:g]->[:lys][:a,:g,:g]->[:arg][:g,:u,:u]->[:val][:g,:c,:u]->[:ala][:g,:a,:u]->[:asp][:g,:g,:t]->[:gly][:g,:u,:c]->[:val][:g,:c,:c]->[:ala][:g,:a,:c]->[:asp][:g,:g,:c]->[:gly][:g,:u,:a]->[:val][:g,:c,:a]->[:ala][:g,:a,:a]->[:glu][:g,:g,:a]->[:gly][:g,:u,:g]->[:val][:g,:c,:g]->[:ala][:g,:a,:g]->[:glu][:g,:g,:g]->[:gly]end[amino_acid|rest]end
Enter fullscreen modeExit fullscreen mode

To end with, the Joy programs I've presented here, while very pleasing to me, are by no means readable; not to the untrained eye. I should remind the reader that firstly I'm not an expert in Joy and am probably not using the most idiomatic code and secondly that I'm purposefully using tediously repetitive code in order to limit the number of different functions that are used. Anonymous recursion is for instance built into the standard Joy libraries, which I'm not using. Using real Joy is probably much closer to real joy.

Be that as it may. We have now built a workable first iteration of the most important enzyme of our artificial life system. In the next few posts we will focus on ironing out the remaining three issues with this implementation. This will bring us to a critical milestone: having a ribosome that can produce itself given an appropriate mRNA template.

Top comments(0)

Subscribe
pic
Create template

Templates let you quickly answer FAQs or store snippets for re-use.

Dismiss

Are you sure you want to hide this comment? It will become hidden in your post, but will still be visible via the comment'spermalink.

For further actions, you may consider blocking this person and/orreporting abuse

  • Joined

More fromDanie Palm

DEV Community

We're a place where coders share, stay up-to-date and grow their careers.

Log in Create account

[8]ページ先頭

©2009-2025 Movatter.jp