DEV Community

Chris White

The inner workings of boto

The boto library provides the Python bindings for working with AWS. It's so central that it even powers the AWS CLI itself (through botocore). In this guide we'll take a look at some of the inner workings of boto.

boto and boto3

Originally, boto was one big monolithic package. For historical purposes, you can find the original source code on GitHub. The problem with the original was that it contained hand-written logic for every service. Once AWS really started to expand, maintaining every service that way became too much of a challenge.

Then came boto3. One of the first changes was moving much of the low-level logic into the botocore project. Instead of declaring all the service logic in Python, services are described in JSON data files. The boto3 library then handles some cases that the service file abstraction doesn't cover well, such as batch writing in DynamoDB.
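To make the "services as data" idea concrete, here is a minimal sketch, assuming a trimmed, hand-written JSON fragment in the style of botocore's SQS model (not the real file). It derives an operation table and one exception class per declared error shape purely from the data. `build_from_model` and `ServiceError` are hypothetical names, not botocore's API.

```python
import json

# Trimmed, hand-written fragment in the style of a botocore service JSON.
# It is NOT the real SQS model, just enough structure for the sketch.
SERVICE_JSON = """
{
  "operations": {
    "CreateQueue": {
      "name": "CreateQueue",
      "http": {"method": "POST", "requestUri": "/"},
      "input": {"shape": "CreateQueueRequest"},
      "errors": [{"shape": "QueueDeletedRecently"}, {"shape": "QueueNameExists"}]
    },
    "DeleteQueue": {
      "name": "DeleteQueue",
      "http": {"method": "POST", "requestUri": "/"},
      "input": {"shape": "DeleteQueueRequest"}
    }
  }
}
"""


class ServiceError(Exception):
    """Hypothetical base class for errors declared in the model."""


def build_from_model(raw_json):
    """Build an operation table and exception classes from data alone."""
    model = json.loads(raw_json)
    operations = {}
    exceptions = {}
    for name, op in model["operations"].items():
        error_names = [err["shape"] for err in op.get("errors", [])]
        operations[name] = {
            "method": op["http"]["method"],
            "input_shape": op["input"]["shape"],
            "errors": error_names,
        }
        for err_name in error_names:
            if err_name not in exceptions:
                # type() creates a new Exception subclass at runtime.
                exceptions[err_name] = type(err_name, (ServiceError,), {})
    return operations, exceptions


operations, exceptions = build_from_model(SERVICE_JSON)
print(sorted(operations))                   # ['CreateQueue', 'DeleteQueue']
print(operations["CreateQueue"]["errors"])  # ['QueueDeletedRecently', 'QueueNameExists']
```

In this style, supporting a new operation is an edit to the JSON rather than new Python code.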

Service JSON

Each service's model is generally made up of three kinds of JSON file:

  • service JSON
  • paginator JSON
  • waiter JSON

Let's start with the service definition itself. SQS is a fairly simple API, so I'll use it as the example, starting with the CreateQueue operation:

"CreateQueue":{
  "name":"CreateQueue",
  "http":{
    "method":"POST",
    "requestUri":"/"
  },
  "input":{"shape":"CreateQueueRequest"},
  "output":{
    "shape":"CreateQueueResult",
    "resultWrapper":"CreateQueueResult"
  },
  "errors":[
    {"shape":"QueueDeletedRecently"},
    {"shape":"QueueNameExists"}
  ],
  "documentation":"<p>Creates a new standard or FIFO queue. You can pass one or more attributes in the request. Keep the following in mind:</p><ul><li><p>If you don't specify the <code>FifoQueue</code> attribute, Amazon SQS creates a standard queue.</p><note><p>You can't change the queue type after you create it and you can't convert an existing standard queue into a FIFO queue. You must either create a new FIFO queue for your application or delete your <snip>"
},

Interestingly enough, if you check the documentation for create_queue in boto3's API reference, you'll see that it shows the contents of the documentation key. The errors list also maps to the exceptions listed at the bottom of that page. In fact, a large part of the boto documentation is generated automatically by parsing the service files. The input and output shapes define the request parameters and the structure of the result. Looking at CreateQueueRequest:

"CreateQueueRequest":{
  "type":"structure",
  "required":["QueueName"],
  "members":{
    "QueueName":{
      "shape":"String",
      "documentation":"<snip>"
    },
    "Attributes":{
      "shape":"QueueAttributeMap",
      "documentation":"<snip>",
      "locationName":"Attribute"
    },
    "tags":{
      "shape":"TagMap",
      "documentation":"<snip>",
      "locationName":"Tag"
    }
  },
  "documentation":"<snip>"
},

The documentation values have been snipped to keep the paste from getting really long. There's also a declaration of required parameters (you can't create a queue without a queue name). Attributes also has its own custom shape:

"QueueAttributeMap":{
  "type":"map",
  "key":{
    "shape":"QueueAttributeName",
    "locationName":"Name"
  },
  "value":{
    "shape":"String",
    "locationName":"Value"
  },
  "flattened":true,
  "locationName":"Attribute"
},
"QueueAttributeName":{
  "type":"string",
  "enum":[
    "All","Policy","VisibilityTimeout","MaximumMessageSize",
    "MessageRetentionPeriod","ApproximateNumberOfMessages",
    "ApproximateNumberOfMessagesNotVisible","CreatedTimestamp",
    "LastModifiedTimestamp","QueueArn","ApproximateNumberOfMessagesDelayed",
    "DelaySeconds","ReceiveMessageWaitTimeSeconds","RedrivePolicy",
    "FifoQueue","ContentBasedDeduplication","KmsMasterKeyId",
    "KmsDataKeyReusePeriodSeconds","DeduplicationScope","FifoThroughputLimit",
    "RedriveAllowPolicy","SqsManagedSseEnabled"
  ]
},

These shapes eventually drill down to a primitive type such as String or Integer. Results follow much the same structure:

"CreateQueueResult":{
  "type":"structure",
  "members":{
    "QueueUrl":{
      "shape":"String",
      "documentation":"<p>The URL of the created Amazon SQS queue.</p>"
    }
  },
  "documentation":"<p>Returns the <code>QueueUrl</code> attribute of the created queue.</p>"
},
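The recursive structure of shapes is what lets a generic client check parameters before sending anything. botocore performs its own parameter validation; the following is only a minimal sketch of the idea, assuming a trimmed shape table modeled on the SQS excerpts above. `SHAPES` and `validate` are hypothetical, and the enum check is a choice made for this sketch rather than a claim about botocore's behavior.

```python
# Hypothetical, trimmed shape table modeled on the SQS excerpts above.
SHAPES = {
    "CreateQueueRequest": {
        "type": "structure",
        "required": ["QueueName"],
        "members": {
            "QueueName": {"shape": "String"},
            "Attributes": {"shape": "QueueAttributeMap"},
        },
    },
    "QueueAttributeMap": {
        "type": "map",
        "key": {"shape": "QueueAttributeName"},
        "value": {"shape": "String"},
    },
    "QueueAttributeName": {
        "type": "string",
        "enum": ["DelaySeconds", "FifoQueue", "VisibilityTimeout"],
    },
    "String": {"type": "string"},
}


def validate(shape_name, value, path="params"):
    """Walk a shape down to its primitive types, collecting error strings."""
    shape = SHAPES[shape_name]
    kind = shape["type"]
    errors = []
    if kind == "structure":
        if not isinstance(value, dict):
            return [f"{path}: expected a dict"]
        for req in shape.get("required", []):
            if req not in value:
                errors.append(f"{path}: missing required parameter {req!r}")
        for key, member_value in value.items():
            member = shape["members"].get(key)
            if member is None:
                errors.append(f"{path}: unknown parameter {key!r}")
            else:
                errors += validate(member["shape"], member_value, f"{path}.{key}")
    elif kind == "map":
        if not isinstance(value, dict):
            return [f"{path}: expected a dict"]
        for key, item in value.items():
            errors += validate(shape["key"]["shape"], key, f"{path} key")
            errors += validate(shape["value"]["shape"], item, f"{path}.{key}")
    elif kind == "string":
        if not isinstance(value, str):
            errors.append(f"{path}: expected a string")
        elif "enum" in shape and value not in shape["enum"]:
            # Enum enforcement is this sketch's choice, not a botocore claim.
            errors.append(f"{path}: {value!r} is not one of {shape['enum']}")
    return errors


print(validate("CreateQueueRequest", {}))
# ["params: missing required parameter 'QueueName'"]
```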

Paginators

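The paginator JSON lists the operations that support pagination and, for each one, names the request parameter that carries the continuation token (input_token), the response field that returns the next token (output_token), the parameter that caps the page size (limit_key), and the response key that holds the results (result_key). boto uses this to handle token management and result iteration for you:

{
  "pagination":{
    "ListDeadLetterSourceQueues":{
      "input_token":"NextToken",
      "limit_key":"MaxResults",
      "output_token":"NextToken",
      "result_key":"queueUrls"
    },
    "ListQueues":{
      "input_token":"NextToken",
      "limit_key":"MaxResults",
      "output_token":"NextToken",
      "result_key":"QueueUrls"
    }
  }
}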

boto handles this behind the scenes via botocore's pagination module.
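In boto3 you reach this through `client.get_paginator("list_queues").paginate()`. Under the hood, the four keys above are enough to drive a generic loop. A minimal sketch, assuming a fake `fake_list_queues` function in place of a real client call; `paginate` is a hypothetical helper, not botocore's implementation:

```python
# Pagination config copied from the ListQueues entry above.
LIST_QUEUES_CONFIG = {
    "input_token": "NextToken",
    "limit_key": "MaxResults",
    "output_token": "NextToken",
    "result_key": "QueueUrls",
}


def paginate(operation, config, page_size=None, **params):
    """Yield result items across pages, threading the token between calls."""
    token = None
    while True:
        request = dict(params)
        if page_size is not None:
            request[config["limit_key"]] = page_size
        if token is not None:
            request[config["input_token"]] = token
        page = operation(**request)
        yield from page.get(config["result_key"], [])
        token = page.get(config["output_token"])
        if not token:  # no token in the response means this was the last page
            break


# Fake stand-in for ListQueues: two pages linked by a NextToken.
FAKE_PAGES = {
    None: {"QueueUrls": ["q1", "q2"], "NextToken": "page-2"},
    "page-2": {"QueueUrls": ["q3"]},
}


def fake_list_queues(MaxResults=None, NextToken=None):
    return FAKE_PAGES[NextToken]


print(list(paginate(fake_list_queues, LIST_QUEUES_CONFIG, page_size=2)))
# ['q1', 'q2', 'q3']
```

Because `paginate` is a generator, the next page is only requested once the caller has consumed the current one.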

Waiters

As the name suggests, waiters are definitions for polling a describe-style API call until a resource reaches a specific state. Each one covers not only the polling itself but also the delay between polls and the maximum number of attempts. For example, the EC2 InstanceRunning waiter:

"InstanceRunning":{
  "delay":15,
  "operation":"DescribeInstances",
  "maxAttempts":40,
  "acceptors":[
    {
      "expected":"running",
      "matcher":"pathAll",
      "state":"success",
      "argument":"Reservations[].Instances[].State.Name"
    },
    {
      "expected":"shutting-down",
      "matcher":"pathAny",
      "state":"failure",
      "argument":"Reservations[].Instances[].State.Name"
    },
    {
      "expected":"terminated",
      "matcher":"pathAny",
      "state":"failure",
      "argument":"Reservations[].Instances[].State.Name"
    },
    {
      "expected":"stopping",
      "matcher":"pathAny",
      "state":"failure",
      "argument":"Reservations[].Instances[].State.Name"
    },
    {
      "matcher":"error",
      "expected":"InvalidInstanceID.NotFound",
      "state":"retry"
    }
  ]
},

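It calls DescribeInstances and uses a JMESPath query to drill down to the relevant attribute in the resulting JSON. Each acceptor maps a match to a state: pathAll requires every matched value to equal the expected one, pathAny requires at least one, and the error matcher checks the returned error code. Here the waiter succeeds once every instance is running, fails as soon as any instance is shutting down, terminated, or stopping, and keeps retrying while the instance ID isn't found yet.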
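In boto3 this is exposed as `client.get_waiter("instance_running").wait(...)`. Here is a minimal sketch of how a generic loop could interpret the InstanceRunning definition, assuming a hand-rolled `search` that supports only dotted keys and `[]` flattening (a tiny subset of JMESPath, not a real implementation) and a hypothetical `FakeClientError` standing in for a service error. `wait` and `matches` are illustrative names, not botocore's:

```python
import time

# Copied from the InstanceRunning waiter definition above.
INSTANCE_RUNNING = {
    "delay": 15,
    "operation": "DescribeInstances",
    "maxAttempts": 40,
    "acceptors": [
        {"expected": "running", "matcher": "pathAll", "state": "success",
         "argument": "Reservations[].Instances[].State.Name"},
        {"expected": "shutting-down", "matcher": "pathAny", "state": "failure",
         "argument": "Reservations[].Instances[].State.Name"},
        {"expected": "terminated", "matcher": "pathAny", "state": "failure",
         "argument": "Reservations[].Instances[].State.Name"},
        {"expected": "stopping", "matcher": "pathAny", "state": "failure",
         "argument": "Reservations[].Instances[].State.Name"},
        {"matcher": "error", "expected": "InvalidInstanceID.NotFound", "state": "retry"},
    ],
}


class FakeClientError(Exception):
    """Hypothetical stand-in for a service error carrying an error code."""

    def __init__(self, code):
        super().__init__(code)
        self.code = code


def search(path, data):
    """Tiny JMESPath subset: dotted keys, with "[]" flattening lists."""
    values = [data]
    for part in path.split("."):
        flatten = part.endswith("[]")
        key = part[:-2] if flatten else part
        found = []
        for value in values:
            if isinstance(value, dict) and key in value:
                if flatten and isinstance(value[key], list):
                    found.extend(value[key])
                elif not flatten:
                    found.append(value[key])
        values = found
    return values


def matches(acceptor, response, error_code):
    if acceptor["matcher"] == "error":
        return error_code == acceptor["expected"]
    if response is None:
        return False
    values = search(acceptor["argument"], response)
    if acceptor["matcher"] == "pathAll":
        return bool(values) and all(v == acceptor["expected"] for v in values)
    if acceptor["matcher"] == "pathAny":
        return any(v == acceptor["expected"] for v in values)
    raise ValueError(f"matcher not handled in this sketch: {acceptor['matcher']}")


def wait(operation, config, sleep=time.sleep, **params):
    for attempt in range(config["maxAttempts"]):
        try:
            response, error_code = operation(**params), None
        except FakeClientError as exc:
            response, error_code = None, exc.code
        for acceptor in config["acceptors"]:
            if matches(acceptor, response, error_code):
                if acceptor["state"] == "success":
                    return response
                if acceptor["state"] == "failure":
                    raise RuntimeError(f"waiter failed: matched {acceptor['expected']!r}")
                break  # "retry": stop checking acceptors and poll again
        else:
            if error_code is not None:
                raise RuntimeError(f"unexpected error: {error_code}")
        if attempt + 1 < config["maxAttempts"]:
            sleep(config["delay"])
    raise RuntimeError("max attempts exceeded")
```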

Client

The client code is where a lot of the magic that makes boto work happens. Of particular interest is the method mapping code:

def _create_methods(self, service_model):
    op_dict = {}
    for operation_name in service_model.operation_names:
        py_operation_name = xform_name(operation_name)
        op_dict[py_operation_name] = self._create_api_method(
            py_operation_name, operation_name, service_model
        )
    return op_dict

def _create_name_mapping(self, service_model):
    # py_name -> OperationName, for every operation available
    # for a service.
    mapping = {}
    for operation_name in service_model.operation_names:
        py_operation_name = xform_name(operation_name)
        mapping[py_operation_name] = operation_name
    return mapping

def _create_api_method(
    self, py_operation_name, operation_name, service_model
):
    def _api_call(self, *args, **kwargs):
        # We're accepting *args so that we can give a more helpful
        # error message than TypeError: _api_call takes exactly
        # 1 argument.
        if args:
            raise TypeError(
                f"{py_operation_name}() only accepts keyword arguments."
            )
        # The "self" in this scope is referring to the BaseClient.
        return self._make_api_call(operation_name, kwargs)

    _api_call.__name__ = str(py_operation_name)

    # Add the docstring to the client method
    operation_model = service_model.operation_model(operation_name)
    docstring = ClientMethodDocstring(
        operation_model=operation_model,
        method_name=operation_name,
        event_emitter=self._event_emitter,
        method_description=operation_model.documentation,
        example_prefix='response = client.%s' % py_operation_name,
        include_signature=False,
    )
    _api_call.__doc__ = docstring
    return _api_call

This builds the method mapping that links each Python method back to the API operation it calls. The more Pythonic method names are produced by the xform_name function:

def xform_name(name, sep='_', _xform_cache=_xform_cache):
    """Convert camel case to a "pythonic" name.

    If the name contains the ``sep`` character, then it is
    returned unchanged.

    """
    if sep in name:
        # If the sep is in the name, assume that it's already
        # transformed and return the string unchanged.
        return name
    key = (name, sep)
    if key not in _xform_cache:
        if _special_case_transform.search(name) is not None:
            is_special = _special_case_transform.search(name)
            matched = is_special.group()
            # Replace something like ARNs, ACLs with _arns, _acls.
            name = f"{name[: -len(matched)]}{sep}{matched.lower()}"
        s1 = _first_cap_regex.sub(r'\1' + sep + r'\2', name)
        transformed = _end_cap_regex.sub(r'\1' + sep + r'\2', s1).lower()
        _xform_cache[key] = transformed
    return _xform_cache[key]

It converts CamelCase operation names such as DescribeInstances into lowercase, underscore-separated names such as describe_instances, with a few exceptions.
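Putting the two pieces together, here is a minimal sketch of generating client methods from operation names. The regexes in `to_snake` are an assumption (botocore's module-level regexes aren't shown above, so these are a reconstruction that handles the examples here), and `BaseMiniClient` and `create_client_class` are hypothetical stand-ins. As in botocore, the client class is built at runtime with `type()`, and each generated `_api_call` simply forwards to `_make_api_call`:

```python
import re

# Assumed regexes for this sketch (a reconstruction, not copied from botocore).
_special_case = re.compile(r"[A-Z]{2,}s$")  # pluralized acronyms: ACLs, ARNs
_first_cap = re.compile(r"(.)([A-Z][a-z]+)")
_end_cap = re.compile(r"([a-z0-9])([A-Z])")


def to_snake(name, sep="_"):
    """CamelCase operation name -> pythonic name, in the spirit of xform_name."""
    if sep in name:
        return name  # assume it is already transformed
    special = _special_case.search(name)
    if special is not None:
        matched = special.group()
        name = f"{name[:-len(matched)]}{sep}{matched.lower()}"
    s1 = _first_cap.sub(r"\1" + sep + r"\2", name)
    return _end_cap.sub(r"\1" + sep + r"\2", s1).lower()


class BaseMiniClient:
    """Hypothetical base class; a real client would serialize, sign and send."""

    def __init__(self):
        self.calls = []

    def _make_api_call(self, operation_name, params):
        self.calls.append((operation_name, params))
        return {"Operation": operation_name, "Params": params}


def _make_method(py_name, operation_name):
    # A separate function per method so each closure captures its own names.
    def _api_call(self, *args, **kwargs):
        if args:
            raise TypeError(f"{py_name}() only accepts keyword arguments.")
        return self._make_api_call(operation_name, kwargs)

    _api_call.__name__ = py_name
    return _api_call


def create_client_class(class_name, operation_names):
    methods = {}
    for op in operation_names:
        py_name = to_snake(op)
        methods[py_name] = _make_method(py_name, op)
    return type(class_name, (BaseMiniClient,), methods)


SQS = create_client_class("SQS", ["CreateQueue", "GetQueueUrl", "ListQueues"])
client = SQS()
print(client.create_queue(QueueName="jobs"))
# {'Operation': 'CreateQueue', 'Params': {'QueueName': 'jobs'}}
```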

Request Sending

Despite all the mapping, the end result of boto's code still has to be an AWS API call sent over HTTP. botocore's AWS request module handles this, along with some of the more interesting edge cases such as HTTP 100 Continue handling. Signing-related code lives in the auth module, for example SigV2 auth (which, I'll add, is deprecated):

def calc_signature(self, request, params):
    logger.debug("Calculating signature using v2 auth.")
    split = urlsplit(request.url)
    path = split.path
    if len(path) == 0:
        path = '/'
    string_to_sign = f"{request.method}\n{split.netloc}\n{path}\n"
    lhmac = hmac.new(
        self.credentials.secret_key.encode("utf-8"), digestmod=sha256
    )
    pairs = []
    for key in sorted(params):
        # Any previous signature should not be a part of this
        # one, so we skip that particular key. This prevents
        # issues during retries.
        if key == 'Signature':
            continue
        value = str(params[key])
        quoted_key = quote(key.encode('utf-8'), safe='')
        quoted_value = quote(value.encode('utf-8'), safe='-_~')
        pairs.append(f'{quoted_key}={quoted_value}')
    qs = '&'.join(pairs)
    string_to_sign += qs
    logger.debug('String to sign: %s', string_to_sign)
    lhmac.update(string_to_sign.encode('utf-8'))
    b64 = base64.b64encode(lhmac.digest()).strip().decode('utf-8')
    return (qs, b64)

Thankfully, this handles the fun parts like HMAC signing for us. The module also covers the even more complex SigV4 signing.
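For a self-contained view of the same steps, here is a minimal sketch that takes the HTTP method, URL, and secret key as plain arguments instead of botocore's request and credentials objects. It follows the calc_signature flow above. Note that the real SigV2 signer also adds parameters such as AWSAccessKeyId, SignatureVersion, SignatureMethod and Timestamp before signing; those are left out here. `sigv2_sign` is a hypothetical name and the key is a dummy value:

```python
import base64
import hmac
from hashlib import sha256
from urllib.parse import quote, urlsplit


def sigv2_sign(method, url, params, secret_key):
    """Return (canonical_query_string, base64_signature) for SigV2-style signing."""
    split = urlsplit(url)
    path = split.path or "/"
    string_to_sign = f"{method}\n{split.netloc}\n{path}\n"

    pairs = []
    for key in sorted(params):  # parameters are signed in sorted key order
        if key == "Signature":  # never include a stale signature (e.g. on retry)
            continue
        quoted_key = quote(key.encode("utf-8"), safe="")
        quoted_value = quote(str(params[key]).encode("utf-8"), safe="-_~")
        pairs.append(f"{quoted_key}={quoted_value}")
    qs = "&".join(pairs)
    string_to_sign += qs

    mac = hmac.new(secret_key.encode("utf-8"), string_to_sign.encode("utf-8"), sha256)
    return qs, base64.b64encode(mac.digest()).decode("utf-8")


qs, signature = sigv2_sign(
    "POST",
    "https://sqs.us-east-1.amazonaws.com",
    {"Action": "CreateQueue", "QueueName": "my queue", "Signature": "stale"},
    "dummy-secret",
)
print(qs)  # Action=CreateQueue&QueueName=my%20queue
```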

Credentials

With a few minor exceptions, you can't do much with the AWS API unless you're authenticated. The credentials module handles this across a variety of sources. It also handles the all-important job of refreshing credentials, for example temporary credentials obtained from STS via AssumeRole:

def _refresh(self):
    # In the common case where we don't need a refresh, we
    # can immediately exit and not require acquiring the
    # refresh lock.
    if not self.refresh_needed(self._advisory_refresh_timeout):
        return

    # acquire() doesn't accept kwargs, but False is indicating
    # that we should not block if we can't acquire the lock.
    # If we aren't able to acquire the lock, we'll trigger
    # the else clause.
    if self._refresh_lock.acquire(False):
        try:
            if not self.refresh_needed(self._advisory_refresh_timeout):
                return
            is_mandatory_refresh = self.refresh_needed(
                self._mandatory_refresh_timeout
            )
            self._protected_refresh(is_mandatory=is_mandatory_refresh)
            return
        finally:
            self._refresh_lock.release()
    elif self.refresh_needed(self._mandatory_refresh_timeout):
        # If we're within the mandatory refresh window,
        # we must block until we get refreshed credentials.
        with self._refresh_lock:
            if not self.refresh_needed(self._mandatory_refresh_timeout):
                return
            self._protected_refresh(is_mandatory=True)
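The two-window pattern is easier to see without the surrounding class. A minimal sketch, assuming a 15-minute advisory window and a 10-minute mandatory window (numbers chosen for the sketch) and a caller-supplied `refresh_fn` that returns `(token, expiry_timestamp)`. `RefreshingToken` is hypothetical. Inside the advisory window only one thread refreshes while others keep using the current token; inside the mandatory window callers block until fresh credentials arrive:

```python
import threading
import time


class RefreshingToken:
    """Hypothetical credential holder with advisory and mandatory refresh windows."""

    def __init__(self, refresh_fn, advisory=15 * 60, mandatory=10 * 60, clock=time.time):
        self._refresh_fn = refresh_fn  # returns (token, expiry_timestamp)
        self._advisory = advisory
        self._mandatory = mandatory
        self._clock = clock
        self._lock = threading.Lock()
        self.token, self.expiry = refresh_fn()

    def _refresh_needed(self, window):
        return self.expiry - self._clock() < window

    def _protected_refresh(self, is_mandatory):
        try:
            self.token, self.expiry = self._refresh_fn()
        except Exception:
            if is_mandatory:
                raise  # no usable credentials left, so surface the error
            # Advisory refresh failed: keep using the still-valid token.

    def get(self):
        if not self._refresh_needed(self._advisory):
            return self.token  # fast path, no locking
        if self._lock.acquire(blocking=False):
            try:
                if self._refresh_needed(self._advisory):
                    self._protected_refresh(self._refresh_needed(self._mandatory))
            finally:
                self._lock.release()
        elif self._refresh_needed(self._mandatory):
            with self._lock:  # another thread is refreshing; wait for it
                if self._refresh_needed(self._mandatory):
                    self._protected_refresh(is_mandatory=True)
        return self.token


now = [0.0]
issued = []


def fake_sts():
    issued.append(now[0])
    return f"token-{len(issued)}", now[0] + 3600


creds = RefreshingToken(fake_sts, clock=lambda: now[0])
print(creds.get())  # token-1 (an hour left, no refresh)
now[0] = 3600 - 60  # one minute before expiry: inside the mandatory window
print(creds.get())  # token-2
```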

Of particular interest is the credentials resolver:

env_provider = EnvProvider()
container_provider = ContainerProvider()
instance_metadata_provider = InstanceMetadataProvider(
    iam_role_fetcher=InstanceMetadataFetcher(
        timeout=metadata_timeout,
        num_attempts=num_attempts,
        user_agent=session.user_agent(),
        config=imds_config,
    )
)

profile_provider_builder = ProfileProviderBuilder(
    session, cache=cache, region_name=region_name
)
assume_role_provider = AssumeRoleProvider(
    load_config=lambda: session.full_config,
    client_creator=_get_client_creator(session, region_name),
    cache=cache,
    profile_name=profile_name,
    credential_sourcer=CanonicalNameCredentialSourcer(
        [env_provider, container_provider, instance_metadata_provider]
    ),
    profile_provider_builder=profile_provider_builder,
)

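boto resolves credentials from a number of sources, including profile-based credentials, environment variables, container credential providers, instance metadata for EC2 instance profiles, and AssumeRole providers. The providers are consulted in order and the first one that finds credentials wins. Environment variables come first in that chain, so they take priority over the other sources listed here (credentials passed directly to a client or session still override all of them).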
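The resolution order boils down to "ask each provider in turn and take the first answer." A minimal sketch of that chain, assuming only two providers: one that reads the standard AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY / AWS_SESSION_TOKEN environment variables, and a hypothetical in-memory profile store standing in for the shared credentials file. The environment is passed in as a mapping so the example doesn't touch the real `os.environ`; `resolve` and `make_profile_provider` are illustrative names:

```python
from typing import Callable, Mapping, NamedTuple, Optional, Sequence


class Credentials(NamedTuple):
    access_key: str
    secret_key: str
    token: Optional[str]
    source: str  # which provider produced these credentials


Provider = Callable[[Mapping[str, str]], Optional[Credentials]]


def env_provider(environ: Mapping[str, str]) -> Optional[Credentials]:
    access = environ.get("AWS_ACCESS_KEY_ID")
    secret = environ.get("AWS_SECRET_ACCESS_KEY")
    if access and secret:
        return Credentials(access, secret, environ.get("AWS_SESSION_TOKEN"), "env")
    return None  # not configured here; let the next provider try


def make_profile_provider(profiles: Mapping[str, Mapping[str, str]]) -> Provider:
    """Hypothetical stand-in for reading ~/.aws/credentials."""

    def provider(environ: Mapping[str, str]) -> Optional[Credentials]:
        profile = profiles.get(environ.get("AWS_PROFILE", "default"))
        if not profile:
            return None
        return Credentials(
            profile["aws_access_key_id"],
            profile["aws_secret_access_key"],
            profile.get("aws_session_token"),
            "profile",
        )

    return provider


def resolve(providers: Sequence[Provider], environ: Mapping[str, str]) -> Optional[Credentials]:
    for provider in providers:  # list order defines priority
        creds = provider(environ)
        if creds is not None:
            return creds
    return None


profiles = {"default": {"aws_access_key_id": "PROFILE-KEY", "aws_secret_access_key": "profile-secret"}}
chain = [env_provider, make_profile_provider(profiles)]
print(resolve(chain, {}).source)  # profile
print(resolve(chain, {"AWS_ACCESS_KEY_ID": "ENV-KEY", "AWS_SECRET_ACCESS_KEY": "env-secret"}).source)  # env
```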

Resources

Resources are an abstraction around the lower-level API calls that wraps them in an easier-to-use, object-oriented form. This part is handled on the boto3 side rather than in botocore. Despite being an alternative to the client interface, it still uses a client under the hood:

if identifiers is None:
    identifiers = []
#: (``list``) List of identifier names
self.identifiers = identifiers
#: (:py:class:`~botocore.client.BaseClient`) Low-level Botocore client
self.client = client
#: (``dict``) Loaded resource data attributes
self.data = data
# The resource model for that resource
self.resource_model = resource_model

Resources also have their own service file layout:

"Queue":{
  "identifiers":[
    {"name":"Url"}
  ],
  "shape":"GetQueueAttributesResult",
  "load":{
    "request":{
      "operation":"GetQueueAttributes",
      "params":[
        {"target":"QueueUrl","source":"identifier","name":"Url"},
        {"target":"AttributeNames[]","source":"string","value":"All"}
      ]
    },
    "path":"@"
  },

There are a few things going on here. First is the load definition, which refreshes a queue's attributes via GetQueueAttributes: the Url identifier (exposed as the queue's url property) is passed as the QueueUrl parameter, and AttributeNames is set to All. There are also constructors for obtaining a resource from a unique identifier, which appear as has declarations in the resource JSON:

"has":{
  "Queue":{
    "resource":{
      "type":"Queue",
      "identifiers":[
        {"target":"Url","source":"input"}
      ]
    }
  }
},

In this case aQueue resource can be instantiated by providing the URL to the queue.
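Here is a minimal sketch of how a generic resource class could interpret the Queue definition above, assuming a fake client object in place of a real SQS client. `Resource`, `build_params` and `FakeSQSClient` are hypothetical; the snake-case conversion is a simplified regex rather than xform_name, and only the identifier and string parameter sources from the excerpt are handled. Data is loaded lazily on first access and then cached:

```python
import re

# The Queue definition from the resource JSON above, as a Python dict.
QUEUE_DEF = {
    "identifiers": [{"name": "Url"}],
    "shape": "GetQueueAttributesResult",
    "load": {
        "request": {
            "operation": "GetQueueAttributes",
            "params": [
                {"target": "QueueUrl", "source": "identifier", "name": "Url"},
                {"target": "AttributeNames[]", "source": "string", "value": "All"},
            ],
        },
        "path": "@",
    },
}


def build_params(param_defs, identifiers):
    """Turn the "params" list into keyword arguments for the client call."""
    params = {}
    for p in param_defs:
        value = identifiers[p["name"]] if p["source"] == "identifier" else p["value"]
        if p["target"].endswith("[]"):  # "Name[]" means append to a list
            params.setdefault(p["target"][:-2], []).append(value)
        else:
            params[p["target"]] = value
    return params


class Resource:
    def __init__(self, definition, client, **identifiers):
        self._definition = definition
        self._client = client
        self.identifiers = identifiers  # e.g. {"Url": "..."}
        self.data = None

    def load(self):
        request = self._definition["load"]["request"]
        # Simplified CamelCase -> snake_case (GetQueueAttributes -> get_queue_attributes).
        method_name = re.sub(r"(?<!^)(?=[A-Z])", "_", request["operation"]).lower()
        params = build_params(request.get("params", []), self.identifiers)
        # "path": "@" keeps the whole response as the resource's data.
        self.data = getattr(self._client, method_name)(**params)

    def get(self, member):
        if self.data is None:
            self.load()  # lazy: only call the API on first access
        return self.data[member]


class FakeSQSClient:
    """Hypothetical stand-in that records requests instead of calling AWS."""

    def __init__(self):
        self.requests = []

    def get_queue_attributes(self, **params):
        self.requests.append(params)
        return {"Attributes": {"DelaySeconds": "0"}}


client = FakeSQSClient()
queue = Resource(QUEUE_DEF, client, Url="https://example.invalid/jobs")
print(queue.get("Attributes"))  # {'DelaySeconds': '0'}
print(client.requests)
# [{'QueueUrl': 'https://example.invalid/jobs', 'AttributeNames': ['All']}]
```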

Collections

Collections are an abstraction around pagination logic. One of the more interesting things they do is provide iteration in the form of a Python generator:

def __iter__(self):
    """
    A generator which yields resource instances after doing the
    appropriate service operation calls and handling any pagination
    on your behalf.

    Page size, item limit, and filter parameters are applied
    if they have previously been set.

        >>> bucket = s3.Bucket('boto3')
        >>> for obj in bucket.objects.all():
        ...     print(obj.key)
        'key1'
        'key2'

    """
    limit = self._params.get('limit', None)

    count = 0
    for page in self.pages():
        for item in page:
            yield item

            # If the limit is set and has been reached, then
            # we stop processing items here.
            count += 1
            if limit is not None and count >= limit:
                return

Because this is a generator, results are fetched lazily; to load everything eagerly and work with the results as a regular list, wrap the collection in list(). Collections show up in the boto resource JSON files as hasMany declarations:

"hasMany":{
  "Queues":{
    "request":{"operation":"ListQueues"},
    "resource":{
      "type":"Queue",
      "identifiers":[
        {"target":"Url","source":"response","path":"QueueUrls[]"}
      ]
    }
  }
}

The path entries also hand you the items at the level you usually need them, without having to reference the top-level response key ("QueueUrls" for example). This is especially handy for the EC2 resource, where instances are nested two levels deep:

"Instances":{
  "request":{"operation":"DescribeInstances"},
  "resource":{
    "type":"Instance",
    "identifiers":[
      {
        "target":"Id",
        "source":"response",
        "path":"Reservations[].Instances[].InstanceId"
      }
    ],
    "path":"Reservations[].Instances[]"
  }
},
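Here is a minimal sketch of what a hasMany collection does with these definitions: walk each page, pull identifiers out with the path, and stop early once a limit is reached. It assumes a hand-rolled `extract` (dotted keys plus `[]` flattening, a tiny subset of JMESPath) and an in-memory list of DescribeInstances-style pages; `iterate_collection` is hypothetical. Because it is a generator, pages past the limit are never fetched, which is also why `list()` is needed for eager loading:

```python
def extract(path, data):
    """Tiny JMESPath subset: dotted keys, with "[]" flattening lists."""
    values = [data]
    for part in path.split("."):
        flatten = part.endswith("[]")
        key = part[:-2] if flatten else part
        found = []
        for value in values:
            if isinstance(value, dict) and key in value:
                if flatten and isinstance(value[key], list):
                    found.extend(value[key])
                elif not flatten:
                    found.append(value[key])
        values = found
    return values


def iterate_collection(pages, identifier_path, limit=None):
    """Lazily yield identifiers across pages, stopping once limit is reached."""
    count = 0
    for page in pages:
        for item in extract(identifier_path, page):
            yield item
            count += 1
            if limit is not None and count >= limit:
                return


PAGES = [
    {"Reservations": [{"Instances": [{"InstanceId": "i-1"}, {"InstanceId": "i-2"}]}]},
    {"Reservations": [{"Instances": [{"InstanceId": "i-3"}]}]},
]
print(list(iterate_collection(PAGES, "Reservations[].Instances[].InstanceId")))
# ['i-1', 'i-2', 'i-3']
```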

Conclusion

I hope you enjoyed this look at how boto works on the inside. It's a rather fascinating use of Python and shows how a service-heavy cloud provider like AWS can offer a scalable API interface through service model mapping. Given that the AWS CLI is built on it, it's also a nice peek at how the CLI works as well.
