Program Synthesis with CodeGen

April 16, 2024 by Phillip Dang.

2 min read. | 611 total words.

CodeGen is a family of standard transformer-based auto-regressive language models for program synthesis, which the authors define as a method for generating computer programs that solve specified problems, using input-output examples or natural language descriptions.

The specific CodeGen model that we’ll be testing is fine-tuned on a dataset of 71.7B tokens of Python code. For a deeper dive into the inner workings of CodeGen, we recommend that users take a look at this paper from Salesforce.

In this blog, we run several inferences with CodeGen and demonstrate how it works out-of-the-box with AMD GPUs and ROCm.

Prerequisites#

Software:
- ROCm
- PyTorch
- Linux OS

For a list of supported GPUs and OS, please refer to this page. For convenience and stability, we recommend pulling and running the ROCm/PyTorch Docker container on your Linux system with the following command:

docker run -it --ipc=host --network=host --device=/dev/kfd --device=/dev/dri \
    --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined \
    --name=olmo rocm/pytorch:rocm6.0_ubuntu20.04_py3.9_pytorch_2.1.1 /bin/bash

Hardware:

Make sure the system recognizes your AMD GPU:

! rocm-smi --showproductname

========================= ROCm System Management Interface =========================
=================================== Product Info ===================================
GPU[0] : Card series: Instinct MI210
GPU[0] : Card model: 0x0c34
GPU[0] : Card vendor: Advanced Micro Devices, Inc. [AMD/ATI]
GPU[0] : Card SKU: D67301
====================================================================================
=============================== End of ROCm SMI Log ================================

Let’s check if we have the right version of ROCm installed.

! apt show rocm-libs -a

Package: rocm-libs
Version: 5.7.0.50700-63~22.04
Priority: optional
Section: devel
Maintainer: ROCm Libs Support <rocm-libs.support@amd.com>
Installed-Size: 13.3 kB
Depends: hipblas (= 1.1.0.50700-63~22.04), hipblaslt (= 0.3.0.50700-63~22.04), hipfft (= 1.0.12.50700-63~22.04), hipsolver (= 1.8.1.50700-63~22.04), hipsparse (= 2.3.8.50700-63~22.04), miopen-hip (= 2.20.0.50700-63~22.04), rccl (= 2.17.1.50700-63~22.04), rocalution (= 2.1.11.50700-63~22.04), rocblas (= 3.1.0.50700-63~22.04), rocfft (= 1.0.23.50700-63~22.04), rocrand (= 2.10.17.50700-63~22.04), rocsolver (= 3.23.0.50700-63~22.04), rocsparse (= 2.5.4.50700-63~22.04), rocm-core (= 5.7.0.50700-63~22.04), hipblas-dev (= 1.1.0.50700-63~22.04), hipblaslt-dev (= 0.3.0.50700-63~22.04), hipcub-dev (= 2.13.1.50700-63~22.04), hipfft-dev (= 1.0.12.50700-63~22.04), hipsolver-dev (= 1.8.1.50700-63~22.04), hipsparse-dev (= 2.3.8.50700-63~22.04), miopen-hip-dev (= 2.20.0.50700-63~22.04), rccl-dev (= 2.17.1.50700-63~22.04), rocalution-dev (= 2.1.11.50700-63~22.04), rocblas-dev (= 3.1.0.50700-63~22.04), rocfft-dev (= 1.0.23.50700-63~22.04), rocprim-dev (= 2.13.1.50700-63~22.04), rocrand-dev (= 2.10.17.50700-63~22.04), rocsolver-dev (= 3.23.0.50700-63~22.04), rocsparse-dev (= 2.5.4.50700-63~22.04), rocthrust-dev (= 2.18.0.50700-63~22.04), rocwmma-dev (= 1.2.0.50700-63~22.04)
Homepage: https://github.com/RadeonOpenCompute/ROCm
Download-Size: 1012 B
APT-Manual-Installed: yes
APT-Sources: http://repo.radeon.com/rocm/apt/5.7 jammy/main amd64 Packages
Description: Radeon Open Compute (ROCm) Runtime software stack

Make sure PyTorch also recognizes the GPU:

import torch

print(f"number of GPUs: {torch.cuda.device_count()}")
print([torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())])

number of GPUs: 1
['AMD Radeon Graphics']

Let’s start testing CodeGen.

Libraries#

Before you begin, make sure you have all the necessary libraries installed:

! pip install transformers

Next, import the modules you’ll be working with for this blog:

import torch
import time
from transformers import AutoModelForCausalLM, AutoTokenizer

Loading the model#

Let’s load the model and its tokenizer. CodeGen has several variants at different sizes, from 350M to 16.1B parameters. In this blog, we’ll be running inference on the 350M-parameter variant of the model.

torch.set_default_device("cuda")
start_time = time.time()
checkpoint = "Salesforce/codegen-350M-mono"
model = AutoModelForCausalLM.from_pretrained(checkpoint)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
print(f"Loaded in {time.time() - start_time:.2f} seconds")
print(model)

Loaded in 6.89 seconds
CodeGenForCausalLM(
  (transformer): CodeGenModel(
    (wte): Embedding(51200, 1024)
    (drop): Dropout(p=0.0, inplace=False)
    (h): ModuleList(
      (0-19): 20 x CodeGenBlock(
        (ln_1): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (attn): CodeGenAttention(
          (attn_dropout): Dropout(p=0.0, inplace=False)
          (resid_dropout): Dropout(p=0.0, inplace=False)
          (qkv_proj): Linear(in_features=1024, out_features=3072, bias=False)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=False)
        )
        (mlp): CodeGenMLP(
          (fc_in): Linear(in_features=1024, out_features=4096, bias=True)
          (fc_out): Linear(in_features=4096, out_features=1024, bias=True)
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.0, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
  )
  (lm_head): Linear(in_features=1024, out_features=51200, bias=True)
)

Running inference#

Let’s create a function that takes in some input prompt and generates the output. We’ll also estimate the following 2 inference metrics:

Latency: The total time it takes for the model to generate the output
Throughput: The number of output tokens per second

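def run_inference(raw_inputs):
    start_time = time.time()
    # Tokenize the prompt into PyTorch tensors
    inputs = tokenizer(raw_inputs, return_tensors="pt", return_attention_mask=False)
    # max_length caps the total sequence length, prompt tokens included
    outputs = model.generate(**inputs, max_length=1000)
    latency = time.time() - start_time
    # outputs[0] holds the prompt tokens followed by the generated tokens
    throughput = len(outputs[0]) / latency
    print(f"Latency: {latency:.2f} seconds")
    print(f"Throughput: {throughput:.2f} tokens/s")
    text = tokenizer.batch_decode(outputs)[0]
    print(text)
    return text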
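Note that len(outputs[0]) counts the prompt tokens as well as the generated ones, because generate returns the prompt followed by the completion. As a rough sketch of how one might report throughput for newly generated tokens only, the snippet below times a generic generation callable. The names measure_generation and fake_generate are our own illustrative placeholders and are not part of transformers; fake_generate simply stands in for model.generate so that the example runs without a GPU.

```python
import time


def measure_generation(generate_fn, prompt_ids):
    """Time one generation call; return latency, new-token count, and throughput."""
    start = time.perf_counter()
    output_ids = generate_fn(prompt_ids)
    latency = time.perf_counter() - start
    # Subtract the prompt length so only newly generated tokens are counted
    new_tokens = len(output_ids) - len(prompt_ids)
    throughput = new_tokens / latency if latency > 0 else float("inf")
    return latency, new_tokens, throughput


def fake_generate(prompt_ids, n_new=50):
    """Hypothetical stand-in for model.generate: prompt followed by n_new tokens."""
    time.sleep(0.01)  # pretend the model is doing some work
    return list(prompt_ids) + [0] * n_new


latency, new_tokens, throughput = measure_generation(fake_generate, [1, 2, 3])
print(f"Latency: {latency:.2f} seconds")
print(f"New tokens: {new_tokens}")
print(f"Throughput: {throughput:.2f} new tokens/s")
```

With a real model, the same subtraction could be applied to the lengths of the prompt tensor and outputs[0]; for long prompts the two throughput figures can differ noticeably.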

With this, we’re ready to run inference and have some fun with CodeGen! We’ll be testing the model’s ability to generate code.

Generate code#

Let’s give CodeGen a medium-difficulty LeetCode question and see how it does.

raw_inputs = '''Given an integer array nums, return all the triplets [nums[i], nums[j], nums[k]] such that i != j, i != k, and j != k, and nums[i] + nums[j] + nums[k] == 0.

Notice that the solution set must not contain duplicate triplets.
'''
text = run_inference(raw_inputs)

Output:

Latency: 14.45 seconds
Throughput: 36.12 tokens/s
Given an integer array nums, return all the triplets [nums[i], nums[j], nums[k]] such that i != j, i != k, and j != k, and nums[i] + nums[j] + nums[k] == 0.

Notice that the solution set must not contain duplicate triplets.

Example 1:
Input: nums = [-1,0,1,2,-1,-4]
Output: [[-1,-1,2],[-1,0,1]]
Explanation:
-1 and -1 are triplets.
-1 and 0 are not triplets.
-1 and 1 are not triplets.
-4 and -1 are not triplets.
-4 and -1 are triplets.
-4 and 0 are not triplets.
-4 and 1 are triplets.
-1 and 2 are not triplets.

Example 2:
Input: nums = []
Output: []

Example 3:
Input: nums = [0]
Output: []

Constraints:
1 <= nums.length <= 104
-104 <= nums[i] <= 104
"""

class Solution:
    def threeSum(self, nums: List[int]) -> List[List[int]]:
        nums.sort()
        res = []
        for i in range(len(nums)):
            if i > 0 and nums[i] == nums[i-1]:
                continue
            l, r = i+1, len(nums)-1
            while l < r:
                if nums[i] + nums[l] + nums[r] == 0:
                    res.append([nums[i], nums[l], nums[r]])
                    while l < r and nums[l] == nums[l+1]:
                        l += 1
                    while l < r and nums[r] == nums[r-1]:
                        r -= 1
                    l += 1
                    r -= 1
                elif nums[i] + nums[l] + nums[r] > 0:
                    r -= 1
                else:
                    l += 1
        return res<|endoftext|>

While the answer is correct and accepted by LeetCode, the explanation the model generated for Example 1 (for instance, “-1 and -1 are triplets”) does not make much sense.
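Acceptance by LeetCode is one check; as a quick local sanity check, we can also compare the generated logic against a brute-force reference on small random inputs. The sketch below copies the generated algorithm into a plain function (it sorts a copy rather than sorting in place), while three_sum_brute_force and agrees_with_brute_force are helper names we introduce purely for illustration.

```python
import random
from itertools import combinations
from typing import List


def three_sum(nums: List[int]) -> List[List[int]]:
    """Same logic as the generated threeSum method, as a plain function."""
    nums = sorted(nums)  # work on a sorted copy instead of sorting in place
    res = []
    for i in range(len(nums)):
        if i > 0 and nums[i] == nums[i - 1]:
            continue  # skip duplicate first elements
        l, r = i + 1, len(nums) - 1
        while l < r:
            total = nums[i] + nums[l] + nums[r]
            if total == 0:
                res.append([nums[i], nums[l], nums[r]])
                # step past duplicates on both sides
                while l < r and nums[l] == nums[l + 1]:
                    l += 1
                while l < r and nums[r] == nums[r - 1]:
                    r -= 1
                l += 1
                r -= 1
            elif total > 0:
                r -= 1
            else:
                l += 1
    return res


def three_sum_brute_force(nums: List[int]) -> List[tuple]:
    """Reference answer: try every combination of three positions."""
    return sorted({tuple(sorted(c)) for c in combinations(nums, 3) if sum(c) == 0})


def agrees_with_brute_force(trials: int = 200, seed: int = 0) -> bool:
    """Compare both implementations on small random arrays."""
    rng = random.Random(seed)
    for _ in range(trials):
        nums = [rng.randint(-5, 5) for _ in range(rng.randint(0, 8))]
        if sorted(map(tuple, three_sum(nums))) != three_sum_brute_force(nums):
            return False
    return True


print(three_sum([-1, 0, 1, 2, -1, -4]))
print(agrees_with_brute_force())
```

A check like this does not prove correctness, but it catches most off-by-one and duplicate-handling mistakes quickly.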

Let’s try another problem, this time with a bit of a twist:

raw_inputs = '''Given a 1-indexed array of integers numbers that is already sorted in non-decreasing order, find two numbers such that they add up to a specific target number. Let these two numbers be numbers[index1] and numbers[index2] where 1 <= index1 < index2 <= numbers.length.

Return the indices of the two numbers, index1 and index2, added by one as an integer array [index1, index2] of length 2.

The tests are generated such that there is exactly one solution. You may not use the same element twice.

Your solution must use only constant extra space.
'''
text = run_inference(raw_inputs)

Output:

Latency: 13.03 seconds
Throughput: 41.05 tokens/s
Given a 1-indexed array of integers numbers that is already sorted in non-decreasing order, find two numbers such that they add up to a specific target number. Let these two numbers be numbers[index1] and numbers[index2] where 1 <= index1 < index2 <= numbers.length.

Return the indices of the two numbers, index1 and index2, added by one as an integer array [index1, index2] of length 2.

The tests are generated such that there is exactly one solution. You may not use the same element twice.

Your solution must use only constant extra space.

Example 1:
Input: numbers = [2,7,11,15], target = 9
Output: [1,2]
Explanation: The sum of 2 and 7 is 9. Therefore index1 = 1, index2 = 2.

Example 2:
Input: numbers = [2,3,4], target = 6
Output: [1,3]
Explanation: The sum of 2 and 3 is 6. Therefore index1 = 1, index2 = 3.

Example 3:
Input: numbers = [2,3,4], target = 18
Output: [1,3]
Explanation: The sum of 2 and 3 is 6. Therefore index1 = 1, index2 = 3.

Example 4:
Input: numbers = [2,3,4], target = 0
Output: [1,2]
Explanation: The sum of 2 and 0 is 0. Therefore index1 = 1, index2 = 2.

Example 5:
Input: numbers = [2,3,4], target = 10
Output: [1,3]
Explanation: The sum of 2 and 3 is 6. Therefore index1 = 1, index2 = 3.

Constraints:
1 <= numbers.length <= 10^4
-10^9 <= numbers[i] <= 10^9
-10^9 <= target <= 10^9
"""

class Solution:
    def twoSum(self, numbers: List[int], target: int) -> List[int]:
        for i in range(len(numbers)):
            for j in range(i+1, len(numbers)):
                if numbers[i] + numbers[j] == target:
                    return [i, j]
        return []<|endoftext|>

This time the answer is incorrect because it misses one of the requirements in the question: “Return the indices of the two numbers, index1 and index2, added by one.” The generated code returns the zero-based pair [i, j] instead.
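For comparison, here is a minimal sketch of one way to meet the stated requirements. It is our own illustration rather than model output: it uses the two-pointer technique on the sorted array instead of the nested loops the model produced, keeps only two index variables (constant extra space), and returns one-based indices. The function name two_sum_sorted is an assumption made for this example.

```python
from typing import List


def two_sum_sorted(numbers: List[int], target: int) -> List[int]:
    """Return the 1-based indices of the two values in a sorted list that sum to target."""
    left, right = 0, len(numbers) - 1
    while left < right:
        current = numbers[left] + numbers[right]
        if current == target:
            # Add one to each position because the problem is 1-indexed
            return [left + 1, right + 1]
        if current < target:
            left += 1   # need a larger sum, move the left pointer up
        else:
            right -= 1  # need a smaller sum, move the right pointer down
    return []  # no pair found


print(two_sum_sorted([2, 7, 11, 15], 9))
print(two_sum_sorted([2, 3, 4], 6))
```

Because the input is already sorted, each step can safely discard one end of the remaining range, so the loop finishes in linear time.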

Next, let’s ask the model to generate a commonly used loss function in training ML models. This time we do not constrain it as much as the two previous examples.

raw_inputs = '''Implement the cross entropy loss function'''
text = run_inference(raw_inputs)

Output:

Latency: 32.24 seconds
Throughput: 31.02 tokens/s
Implement the cross entropy loss function
"""

import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossEntropyLoss(nn.Module):
    def __init__(self, ignore_index=255):
        super(CrossEntropyLoss, self).__init__()
        self.ignore_index = ignore_index

    def forward(self, inputs, targets):
        n = inputs.size(0)
        log_probs = F.log_softmax(inputs, dim=1)
        loss = -torch.sum(targets * log_probs, dim=1)
        loss = torch.mean(loss)
        return loss

class DiceLoss(nn.Module):
    def __init__(self, ignore_index=255):
        super(DiceLoss, self).__init__()
        self.ignore_index = ignore_index

    def forward(self, inputs, targets):
        n = inputs.size(0)
        log_probs = F.log_softmax(inputs, dim=1)
        loss = -torch.sum(targets * log_probs, dim=1)
        loss = torch.mean(loss)
        return loss

class DiceLoss2(nn.Module):
    def __init__(self, ignore_index=255):
        super(DiceLoss2, self).__init__()
        self.ignore_index = ignore_index

    def forward(self, inputs, targets):
        n = inputs.size(0)
        log_probs = F.log_softmax(inputs, dim=1)
        loss = -torch.sum(targets * log_probs, dim=1)
        loss = torch.mean(loss)
        return loss

Here we see that even though the model has already finished generating the CrossEntropyLoss class, it keeps generating unnecessary code until it reaches the max_length=1000 limit.
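One simple way to cope with this kind of run-on output is to post-process the decoded text. The sketch below is a heuristic of our own, not something CodeGen or transformers provides: it keeps everything up to the second top-level class or def statement, so that only the first definition survives. The helper name keep_first_definition and the toy sample string are assumptions made for illustration.

```python
def keep_first_definition(code: str) -> str:
    """Keep the text up to (but not including) the second top-level class/def."""
    lines = code.splitlines()
    # Top-level definitions start at column 0 with "class " or "def "
    starts = [i for i, line in enumerate(lines)
              if line.startswith(("class ", "def "))]
    if len(starts) < 2:
        return code  # nothing extra to trim
    return "\n".join(lines[:starts[1]]).rstrip()


# Toy sample that mimics the shape of the run-on output above
sample = (
    "import torch\n"
    "\n"
    "class CrossEntropyLoss:\n"
    "    pass\n"
    "\n"
    "class DiceLoss:\n"
    "    pass\n"
)
print(keep_first_definition(sample))
```

This heuristic would also cut off legitimate helper functions that follow the first definition, so it is best suited to prompts that ask for a single function or class.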

Finally, let’s ask the model to implement binary search.

raw_inputs = '''Implement binary search'''
text = run_inference(raw_inputs)

Output:

Latency: 4.60 seconds
Throughput: 30.65 tokens/s
Implement binary search
"""

def binary_search(arr, target):
    low = 0
    high = len(arr) - 1
    while low <= high:
        mid = (low + high) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return -1

arr = [1,2,3,4,5,6,7,8,9,10]
target = 10
print(binary_search(arr, target))<|endoftext|>

This time, we see the model is able to perfectly implement binary search!
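To back this up with more than a single example, we can compare the generated function against Python's standard bisect module on random sorted arrays of distinct values. The function binary_search below is copied from the model output; reference_search and matches_reference are helper names we add for this sketch.

```python
import bisect
import random


def binary_search(arr, target):
    """Binary search as generated by the model: index of target, or -1."""
    low = 0
    high = len(arr) - 1
    while low <= high:
        mid = (low + high) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return -1


def reference_search(arr, target):
    """Reference lookup built on the standard library's bisect_left."""
    i = bisect.bisect_left(arr, target)
    return i if i < len(arr) and arr[i] == target else -1


def matches_reference(trials: int = 500, seed: int = 0) -> bool:
    """Compare both functions on sorted arrays of distinct integers."""
    rng = random.Random(seed)
    for _ in range(trials):
        arr = sorted(rng.sample(range(-50, 50), rng.randint(0, 20)))
        target = rng.randint(-60, 60)
        if binary_search(arr, target) != reference_search(arr, target):
            return False
    return True


print(binary_search([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 10))
print(matches_reference())
```

The arrays contain distinct values on purpose: with duplicates, the two functions may both be correct yet return different matching positions.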

From the examples above, we see that CodeGen works quite well, barring some strange behaviors such as not knowing when to stop or missing minor details in its responses. This could be because we used the smallest variant, with 350M parameters, which is quite small for a language model. Readers are encouraged to explore larger variants and test the quality of the generated responses.

Disclaimers#

Third-party content is licensed to you directly by the third party that owns the content and is not licensed to you by AMD. ALL LINKED THIRD-PARTY CONTENT IS PROVIDED “AS IS” WITHOUT A WARRANTY OF ANY KIND. USE OF SUCH THIRD-PARTY CONTENT IS DONE AT YOUR SOLE DISCRETION AND UNDER NO CIRCUMSTANCES WILL AMD BE LIABLE TO YOU FOR ANY THIRD-PARTY CONTENT. YOU ASSUME ALL RISK AND ARE SOLELY RESPONSIBLE FOR ANY DAMAGES THAT MAY ARISE FROM YOUR USE OF THIRD-PARTY CONTENT.
